Compare two .csv files using awk / Unix

Question

file1:

SA, 5006, 12, , DJ
CN, BN, , BBB, 13
22, 67, GG, FF, 88
33, BB, AA, CC, 22

file2:

SA, 5006, 12, 15 , DJ
CN, BN, , BBB, 13
empty line
33, CC, AA, dd, 22

output:

SA, 5006, 12, 15 , DJ, unmatch, 4
CN, BN, , BBB, 13, match
empt, empt, empt, empt, empt, unmatch, 12345
33, CC, AA, dd, 22, unmatch, 24

I need to compare two .csv files line by line, but some of the fields/lines can be empty. The output should go to file3: the 5 columns from file2, then match/unmatch, then the unmatched fields, like this:

c1, c2, c3, c4, c5, match/unmatch, concatenation of digits representing the unmatched fields.

I tried something, but I'm new to awk. Can anyone help? :)

This is the code I use. I think the problem is the empty fields, and I don't know how I can print the :

 ##Set input and output field separators to ':'.
BEGIN {
    FS = OFS = ":"
}


NR == FNR {
    ## save all the line in an array, so lines will be saved like:
    ## c1::c2::c3::c4::c5

    ++a[$0]

    ## Process next line from the beginning.
    next
}

## for every line of second file.
{
    ## Search for the line in the array, if not exists it means that any field is different
    ## print the line.
    if ( !a[$0] ) {
        $6 = "same"
        print
    } else {
        $6 = " not same"
        print
    }
}

Shouldn't the output for the last line be just ... unmatch, 2 instead of 24?
– jaypal singh, Commented Aug 19, 2014 at 4:30
Why are you setting the field separator to : when the file uses ,?
– Barmar, Commented Aug 19, 2014 at 4:31
@YifatKatrielUdi The output is still wrong even so. It should be dd in the output and not CC? Should the output contain lines from file1 or file2?
– jaypal singh, Commented Aug 19, 2014 at 4:43

Barmar · Accepted Answer · 2014-08-20 19:33:07Z

2

You need to use the line number as the index of the array that you save between files, so you can compare corresponding lines in the two files.

BEGIN { FS = OFS = ", " }  # Split on ", " and rejoin with it, so the output stays comma-separated
NR == FNR { a[FNR] = $0; next }  # In first file, just save each line in an array
{
    if (a[FNR] == $0) {  # Compare line in 2nd file to corresponding line in first file
        $6 = "match"
    } else {
        $6 = "unmatch"
        split(a[FNR], b)  # Split up the fields from the first file
        $7 = ""
        for (i = 1; i <= 5; i++) {  # Compare each field
            if ($i != b[i]) { $7 = $7 i }  # Add non-matching field numbers to output
        }
    }
    print
}
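
To try this out, here is a minimal sketch that runs the line-number approach on the sample rows from the question. The temporary directory and the way the files are created are assumptions made only for illustration; file1, file2 and file3 follow the names used in the question. The program sets both FS and OFS to ", ", because assigning to $6 makes awk rebuild the line with OFS between fields.

```shell
#!/bin/sh
# Sample data copied from the question; the temp directory is illustrative only.
dir=$(mktemp -d)

cat > "$dir/file1" <<'EOF'
SA, 5006, 12, , DJ
CN, BN, , BBB, 13
22, 67, GG, FF, 88
33, BB, AA, CC, 22
EOF

# The third line of file2 is empty, as in the question.
cat > "$dir/file2" <<'EOF'
SA, 5006, 12, 15 , DJ
CN, BN, , BBB, 13

33, CC, AA, dd, 22
EOF

awk '
BEGIN { FS = OFS = ", " }
NR == FNR { a[FNR] = $0; next }      # first file: remember each line by its number
{
    if (a[FNR] == $0) {
        $6 = "match"
    } else {
        $6 = "unmatch"
        split(a[FNR], b)             # fields of the corresponding line in file1
        $7 = ""
        for (i = 1; i <= 5; i++) {
            if ($i != b[i]) { $7 = $7 i }
        }
    }
    print
}' "$dir/file1" "$dir/file2" > "$dir/file3"

cat "$dir/file3"
```

With this data, file3 contains:

SA, 5006, 12, 15 , DJ, unmatch, 4
CN, BN, , BBB, 13, match
, , , , , unmatch, 12345
33, CC, AA, dd, 22, unmatch, 24

The empty line of file2 comes out as five empty fields, which corresponds to the "empt" placeholders in the question's expected output.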

edited Aug 20, 2014 at 19:33

answered Aug 19, 2014 at 4:37

Barmar


6 Comments

Yifat Katriel Udi Over a year ago

That was fast! Thanks @Barmar. I would be very happy if you could add comments so I can learn from it, if it's not too hard :)

Barmar Over a year ago

I've added comments, are they helpful?

Yifat Katriel Udi Over a year ago

OK (: I see. One thing I don't understand: yesterday I saw 2 answers, and today only one. Why?

Yifat Katriel Udi Over a year ago

First, the match/unmatch appears in field 5 and not in a separate field. Second, if I have two empty lines in parallel, it shows as unmatch /:

Barmar Over a year ago

The author of the other answer deleted it yesterday.
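
The thread does not say what caused the two problems reported in the comments. One possible explanation, which is only an assumption here, is that the real files differ from the sample in invisible ways: commas without a following space, Windows line endings, or "empty" lines that actually contain spaces. Under that assumption, a more tolerant variant could trim each line, split on a comma with optional surrounding blanks, and compare field by field. The file names old.csv, new.csv and out.csv below are hypothetical.

```shell
#!/bin/sh
dir=$(mktemp -d)

# Hypothetical inputs: uneven spacing, a Windows line ending,
# and a "blank" line that actually holds three spaces.
printf 'SA,5006 , 12,,DJ\r\n   \n33, BB, AA, CC, 22\n' > "$dir/old.csv"
printf 'SA, 5006, 12, , DJ\n\n33, CC, AA, dd, 22\n' > "$dir/new.csv"

awk '
BEGIN { FS = "[ \t]*,[ \t]*"; OFS = ", " }
{
    sub(/\r$/, "")                   # drop a trailing carriage return, if any
    gsub(/^[ \t]+|[ \t]+$/, "")      # trim the line; changing $0 re-splits the fields
}
NR == FNR { for (i = 1; i <= 5; i++) a[FNR, i] = $i; next }
{
    diff = ""
    for (i = 1; i <= 5; i++) {
        if ($i != a[FNR, i]) { diff = diff i }
    }
    status = (diff == "") ? "match" : ("unmatch" OFS diff)
    print $1, $2, $3, $4, $5, status  # always prints five columns plus the status
}' "$dir/old.csv" "$dir/new.csv" > "$dir/out.csv"

cat "$dir/out.csv"
```

Unlike the accepted answer, this variant also treats values that differ only in surrounding blanks as equal, so "15 " and "15" would count as a match. Whether that is wanted depends on the data.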
