I am comparing the values in column 3 of two files, file1 and file2. When a value does not match across file1 and file2, I want to code it as 0; when it does match, I want to code it as 1. For example:

file1
fid1 iid1 693 900 399
fid2 iid2 589 209 485

file2
fid0 iid0 693 448 932
fid8 iid8 482 548 589

desired output
fid1 iid1 693 900 399 1
fid2 iid2 589 209 485 0

I can print just the matching rows from file1 in awk, using awk 'FNR==NR{a[$3]++;next}a[$3]' file2 file1

output
fid1 iid1 693 900 399

But I cannot figure out how to add a new column coded from the a[$3] array lookup, rather than printing only the rows from file1 that match.

  • You want to compare the third field of each line and add a column at the end of the lines from the first file that indicates whether the value in the second file was the same or different? (Ignoring all the other fields?) Commented Aug 27, 2015 at 17:02

1 Answer

You can do:

$ awk 'NR==FNR{a[$3]++;next}{$(NF+1)=(($3 in a) ? 1 : 0)}1' file2 file1
fid1 iid1 693 900 399 1
fid2 iid2 589 209 485 0

Note:

  • Assigning to $(NF+1) may not work in some old, broken awk implementations.
  • This does not do a line-by-line comparison. It only checks whether the third-column value of each file1 line appears anywhere in the third column of file2.
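
If a line-by-line comparison is what you actually need, a minimal sketch could look like the following. It assumes both files have the same number of lines and that line N of file1 should be compared only with line N of file2; the array name col3 and the scratch directory are just illustrative. It also prints the flag with print rather than assigning to $(NF+1), which sidesteps the first note.

```shell
# Recreate the sample files from the question in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'fid1 iid1 693 900 399' 'fid2 iid2 589 209 485' > "$tmpdir/file1"
printf '%s\n' 'fid0 iid0 693 448 932' 'fid8 iid8 482 548 589' > "$tmpdir/file2"

# First pass (file2): remember column 3, keyed by line number.
# Second pass (file1): append 1 if the same-numbered line of file2
# had the same column 3 value, otherwise append 0.
line_by_line=$(awk '
    NR == FNR { col3[FNR] = $3; next }
    { print $0, ((FNR in col3) && col3[FNR] == $3 ? 1 : 0) }
' "$tmpdir/file2" "$tmpdir/file1")

printf '%s\n' "$line_by_line"
rm -rf "$tmpdir"
```

On the sample data this gives the same two lines as the command above (flags 1 and 0), because 693 sits on line 1 of both files. The two approaches diverge once a matching value appears on a different line in file2.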