
file1:

SA, 5006, 12, , DJ
CN, BN, , BBB, 13
22, 67, GG, FF, 88
33, BB, AA, CC, 22

file2:

SA, 5006, 12, 15 , DJ
CN, BN, , BBB, 13
empty line
33, CC, AA, dd, 22

output:

SA, 5006, 12, 15 , DJ, unmatch, 4
CN, BN, , BBB, 13, match
empt, empt, empt, empt, empt, unmatch, 12345
33, CC, AA, dd, 22, unmatch, 24

I need to compare two .csv files line by line. Some of the fields or lines can be empty. The output should go to file3 and contain the 5 columns from file 2, then match/unmatch, then the unmatched fields, like this:

c1, c2, c3, c4, c5, match/unmatch, concatenation of digits representing unmatch fields.

I tried something, but I am new to awk. Can anyone help? :)

This is the code that I use. I think the problem is the empty fields, and I don't know how I can print the :

## Set input and output field separators to ':'.
BEGIN {
    FS = OFS = ":"
}

NR == FNR {
    ## Save the whole line in an array, so lines will be saved like:
    ## c1::c2::c3::c4::c5
    ++a[$0]

    ## Process next line from the beginning.
    next
}

## For every line of the second file.
{
    ## Search for the line in the array. If it does not exist, it means
    ## that some field is different. Print the line.
    if ( !a[$0] ) {
        $6 = "same"
        print
    } else {
        $6 = " not same"
        print
    }
}
  • Shouldn't the output for the last line be just ... unmatch, 2 instead of 24? Commented Aug 19, 2014 at 4:30
  • Why are you setting the field separator to : when the file uses ,? Commented Aug 19, 2014 at 4:31
  • Hi @jaypal, it should be 24, look again (: Commented Aug 19, 2014 at 4:36
  • @YifatKatrielUdi The output is still wrong even so. Shouldn't it be dd in the output and not CC? Should the output contain lines from file1 or file2? Commented Aug 19, 2014 at 4:43
  • @jaypal, yes! My mistake (: you are right. Commented Aug 19, 2014 at 4:47

1 Answer

You need to use the line number as the index of the array that you save between files, so you can compare corresponding lines in the two files.

BEGIN { FS = OFS = ", " }   # Use ", " for output too, so rebuilt lines stay comma-separated
NR == FNR { a[FNR] = $0 }   # In first file, just save each line in an array
NR != FNR { if (a[FNR] == $0) { # Compare line in 2nd file to corresponding line in first file
                $6 = "match"
            } else {
                $6 = "unmatch"
                split(a[FNR], b) # Split up the fields from the first file
                $7 = ""
                for (i = 1; i <= 5; i++) { # Compare each field
                    if ($i != b[i]) { $7 = $7 i } # Add non-matching field numbers to output
                }
            }
            print
        }
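For anyone who wants to try the idea end to end, here is a minimal sketch of a variant, wrapped in a shell script so it can be run as-is. It rests on a few assumptions that are not in the question: the temporary directory and the names `file1`, `file2`, and `file3` inside it are only for illustration, the separator is taken to be a bare comma, and the hypothetical helper `trim` strips blanks around each field before comparing. As a side effect of the trimming, a field such as `15 ` is written out as `15`.

```sh
#!/bin/sh
# Sketch only: the temp directory, file names, and whitespace trimming
# are assumptions for illustration.
dir=$(mktemp -d)

# Sample data copied from the question (line 3 of file2 is empty).
printf '%s\n' 'SA, 5006, 12, , DJ' 'CN, BN, , BBB, 13' \
    '22, 67, GG, FF, 88' '33, BB, AA, CC, 22' > "$dir/file1"
printf '%s\n' 'SA, 5006, 12, 15 , DJ' 'CN, BN, , BBB, 13' \
    '' '33, CC, AA, dd, 22' > "$dir/file2"

awk '
BEGIN { FS = ","; OFS = ", " }

# Hypothetical helper: strip leading and trailing blanks from a field.
# Appending "" makes the result a string, so later comparisons are textual.
function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s "" }

# First file: remember each line under its line number.
NR == FNR { a[FNR] = $0; next }

# Second file: compare field by field with the same line of the first file.
{
    split(a[FNR], b, FS)
    diff = ""
    for (i = 1; i <= 5; i++) {
        $i = trim($i)                       # also pads empty lines to 5 fields
        if ($i != trim(b[i])) diff = diff i # collect numbers of differing fields
    }
    if (diff == "") { $6 = "match" }
    else            { $6 = "unmatch"; $7 = diff }
    print
}
' "$dir/file1" "$dir/file2" > "$dir/file3"

cat "$dir/file3"
```

Because every field is compared after trimming, two lines that differ only in spacing (including two blank lines) should be reported as `match`, and an empty line in the second file is printed as five empty fields followed by `unmatch` and the field numbers.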

6 Comments

That was fast! Thanks @barmar. I would be very happy if you could add notes so I can learn from it, if it's not too hard :)
I've added comments, are they helpful?
Ok (: I see. One thing I don't understand: yesterday I saw 2 answers, and today only one. Why?
First, the match/unmatch appears in field 5 and not in a separate field. Second, if I have two empty lines in parallel, they appear as unmatched /:
The author of the other answer deleted it yesterday.
