Monday, August 13, 2007

Diffing files in a different way

The diff utility compares the contents of file1 and file2 and writes to standard output a list of changes necessary to convert file1 into file2. That output depends on line order, so diff is not a convenient tool when you simply want to find out which lines in file1 are not present in file2, and vice versa.

Let's say you have one output file with the expected results and another with the current results:

expected-output.txt:
12 something some blah
15 ok ok and not ok
14 someone at somewhere
20 and many more such records

current-output.txt:
15 ok ok and not ok
13 this is not present in expected output
20 and many more such records

One quick way to do this is to use the power of Unix pipes and the sed, sort and uniq commands.

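# Tag every line with the name of the file it came from.
sed 's/^/expected-output.txt /' expected-output.txt > mixed.txt
sed 's/^/current-output.txt /' current-output.txt >> mixed.txt
# Sort on everything after the tag, keep only lines that occur once (ignoring the tag), then group by file.
sort -k2 mixed.txt | uniq -u -f1 | sort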

I still need to test this on files with a really large number of records, but for now I am satisfied with the above solution.

4 comments:

Here n Now said...

Hey man!
Tell me a way to put a time-out on a command.

Let's say I want to kill a command if it does not complete within 40 seconds.

Is there a way to specify a time-out like that for any command?

Rahul Upakare said...

One option I can think of quickly is:
- get the PID of the process to be timed out
- sleep for the time-out period
- try to kill the process using its PID
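The three steps above can be sketched as a small shell function. This is only a sketch under stated assumptions: `run_with_timeout` is a hypothetical name, not a standard command; it assumes a POSIX shell with `kill`, `sleep` and `wait`; and it sends the default SIGTERM rather than SIGKILL, so a command that ignores SIGTERM will keep running.

```shell
#!/bin/sh
# run_with_timeout SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND and sends it SIGTERM if it is still
# running after SECONDS seconds. Returns COMMAND's exit status, which is
# non-zero (typically 128 + 15 = 143) if the command was killed.
run_with_timeout() {
    secs=$1
    shift

    "$@" &                    # start the command in the background
    cmd_pid=$!                # step 1: get the PID of the process to be timed out

    # Steps 2 and 3: a watchdog sleeps for the time-out period, then tries to
    # kill the PID. Its output goes to /dev/null so a leftover sleep never
    # holds the caller's stdout or stderr open.
    ( sleep "$secs" && kill "$cmd_pid" ) >/dev/null 2>&1 &
    watchdog_pid=$!

    status=0
    wait "$cmd_pid" || status=$?   # wait until the command exits or is killed

    # If the command finished first, kill the watchdog subshell so its
    # pending kill never runs.
    kill "$watchdog_pid" 2>/dev/null || true
    return "$status"
}
```

For the 40-second case in the question this would be called as `run_with_timeout 40 some-command arg1`, where `some-command` is a placeholder. Newer GNU coreutils releases also include a ready-made `timeout` command (`timeout 40 some-command`), which is worth preferring where it is available.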

Sasi-pository said...

Try the "comm" command: it has options to show only the lines unique to each file.
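A minimal sketch of the comm approach on the sample files from the post. comm only works on sorted input, so both files are sorted first; the intermediate names `expected.sorted` and `current.sorted` are hypothetical, and the script recreates (overwriting) `expected-output.txt` and `current-output.txt` with the sample data shown above. `LC_ALL=C` is set for both sort and comm so they agree on the collation order.

```shell
#!/bin/sh
# Recreate the two sample files from the post.
printf '%s\n' '12 something some blah' '15 ok ok and not ok' \
    '14 someone at somewhere' '20 and many more such records' > expected-output.txt
printf '%s\n' '15 ok ok and not ok' '13 this is not present in expected output' \
    '20 and many more such records' > current-output.txt

# comm requires sorted input, in the same locale comm itself uses.
LC_ALL=C sort expected-output.txt > expected.sorted
LC_ALL=C sort current-output.txt > current.sorted

# comm prints three columns: lines only in file 1, lines only in file 2,
# and lines in both. -1, -2 and -3 suppress the corresponding column.
echo "Only in expected-output.txt:"
LC_ALL=C comm -23 expected.sorted current.sorted
echo "Only in current-output.txt:"
LC_ALL=C comm -13 expected.sorted current.sorted
```

On the sample data this reports `12 something some blah` and `14 someone at somewhere` as present only in the expected output, and `13 this is not present in expected output` as present only in the current output, without the tagging step the sed/uniq pipeline needs.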

Rahul Upakare said...

Thanks for the info. This command is really useful in this case.