Compare multiple columns and only replace if matching

Question

I have two files (FILE1 and FILE2).
I am trying to compare columns 1 and 2 of FILE1 with columns 4 and 5 of FILE2. In addition to this match, column 6 of FILE2 must match a certain string, SO or CO (columns 3 and 4 of FILE1 hold the SO and CO values respectively). If everything matches, column 7 of FILE2 should be replaced with the corresponding value from FILE1 (column 3 for SO, column 4 for CO); otherwise the line should be left unchanged.

I tried to adapt a solution posted in the forum for a similar problem, but it did not work.

FILE1
type  code     SO  CO other

7757    1       6941.958        138.922 149.17
7757    2       8666.123        198.908 225.67
7757    4       2795.885        334.875 378.68
7759    GT3     222.104    13.5    734.62
7768    CT2     0       0       0
7805    6       3796.677        75.175  79.09 

FILE2
"US","01073",,"7757","1","SO","10","299"
"US","01073",,"7758","1","SO","10","299"
"US","01073",,"7757","1","NO","10","299"
"US","01073",,"7757","1","CO","10","299"
"US","01073",,"7757","4","MO","10","299"
"US","01073",,"7757","1","GO","10","299"
"US","01073",,"7805","6","CO","10","299"

Required output:
"US","01073",,"7757","1","SO","6941.958","299"
"US","01073",,"7758","1","SO","10","299"
"US","01073",,"7757","1","NO","10","299"
"US","01073",,"7757","1","CO","138.922","299"
"US","01073",,"7757","4","MO","10","299"
"US","01073",,"7757","1","GO","10","299"
"US","01073",,"7805","6","CO","75.175","299"

Solution I tried (for CO only) :

tr -d '"' < FILE2 > temp  # to remove double quote
awk 'NR==FNR{A[$1,$2]=$3;next} A[$4,$5] && $6=="CO" {$7=A[$1,$2]; print}' FS=" " OFS="," FILE1 temp > out

Thank you so much for helping to edit my code, Randomir!

– kelly, Commented Nov 6, 2017 at 14:44

RomanPerekhrest · Accepted Answer · 2017-11-05 23:16:25Z

2

Complex awk solution:

awk 'function unquote(f){ 
         return substr(f, 2, length(f)-2) 
     }
     NR==FNR{ 
         if (NR==1){ f3=$3; f4=$4 }
         else if (NF){ a[$1,$2,f3]=$3; a[$1,$2,f4]=$4 }
         next; 
     }
     { k=unquote($4) SUBSEP unquote($5) SUBSEP unquote($6) }
     k in a{ $7=a[k] }1' file1 FS=',' OFS=',' file2

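function unquote(f) { ... } - strips the surrounding double quotes from a field (strictly speaking, it returns everything between the first and last characters of the string)
a[$1,$2,f3]=$3; a[$1,$2,f4]=$4 - stores the SO and CO values of each file1 row under a composite key of type, code, and column name (f3 and f4 hold the names "SO" and "CO" read from the header line)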
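
The two notes above can be checked in isolation. The following is a minimal sketch, not part of the original answer: it stores one value under a comma-separated subscript, looks it up with a key built by hand from SUBSEP (as the main rule does), and applies the same substr() call that unquote() uses. The literal values are taken from the sample data; the variable names k and q are only illustrative.

```shell
# Sketch: a[x, y, z] and a[x SUBSEP y SUBSEP z] address the same array element.
result=$(awk 'BEGIN {
    a["7757", "1", "SO"] = "6941.958"      # stored with a comma-separated subscript
    k = "7757" SUBSEP "1" SUBSEP "SO"      # key assembled by hand, as in the main rule
    if (k in a) print "found " a[k]; else print "missing"

    q = "\"7757\""                         # a quoted field as it appears in file2
    print substr(q, 2, length(q) - 2)      # what unquote() returns for it
}')
printf '%s\n' "$result"
```

Because the lookup uses `k in a` rather than testing the value of a[k], a stored value of 0 (as in the 7768 row of FILE1) would still count as a match.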

The output:

"US","01073",,"7757","1","SO",6941.958,"299"
"US","01073",,"7758","1","SO","10","299"
"US","01073",,"7757","1","NO","10","299"
"US","01073",,"7757","1","CO",138.922,"299"
"US","01073",,"7757","4","MO","10","299"
"US","01073",,"7757","1","GO","10","299"
"US","01073",,"7805","6","CO",75.175,"299"

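Note that the replaced values in this output are not quoted, while the required output in the question shows them in double quotes. If the quotes matter, one possible variant (a sketch, not the answerer's code) wraps the new value in quotes before assigning it. The scratch directory and the variable names dir and result are assumptions made only so the example is self-contained.

```shell
# Recreate the sample inputs from the question in a scratch directory.
dir=$(mktemp -d)
cat > "$dir/file1" <<'EOF'
type  code     SO  CO other

7757    1       6941.958        138.922 149.17
7757    2       8666.123        198.908 225.67
7757    4       2795.885        334.875 378.68
7759    GT3     222.104    13.5    734.62
7768    CT2     0       0       0
7805    6       3796.677        75.175  79.09
EOF
cat > "$dir/file2" <<'EOF'
"US","01073",,"7757","1","SO","10","299"
"US","01073",,"7758","1","SO","10","299"
"US","01073",,"7757","1","NO","10","299"
"US","01073",,"7757","1","CO","10","299"
"US","01073",,"7757","4","MO","10","299"
"US","01073",,"7757","1","GO","10","299"
"US","01073",,"7805","6","CO","10","299"
EOF

result=$(awk 'function unquote(f) { return substr(f, 2, length(f) - 2) }
     NR == FNR {
         if (NR == 1) { f3 = $3; f4 = $4 }          # remember the header names SO and CO
         else if (NF) { a[$1, $2, f3] = $3; a[$1, $2, f4] = $4 }
         next
     }
     { k = unquote($4) SUBSEP unquote($5) SUBSEP unquote($6) }
     k in a { $7 = "\"" a[k] "\"" }                 # wrap the new value in double quotes
     1' "$dir/file1" FS=',' OFS=',' "$dir/file2")
printf '%s\n' "$result"
rm -rf "$dir"
```

The only change from the answer is the assignment to $7; lines without a match are printed untouched.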


5 Comments

kelly Over a year ago

Hello RomanPerekhrest, thank you for your help. Your script looks great to me. However, I keep getting the same output as "file2", which means nothing in column 7 is replaced. Any hints?

RomanPerekhrest Over a year ago

@kelly, a hint: make sure that you have posted the actual input samples, because they were copied and tested. The solution works fine for the currently posted samples.

kelly Over a year ago

@RomanPerekhrest, it was my mistake; your code works perfectly. I appreciate your help and time very much.

RomanPerekhrest Over a year ago

@kelly, that's ok

kelly Over a year ago

The solution from @RomanPerekhrest works perfectly with the test data. However, there is a problem with the real data in FILE2: column 2 looks like "abc,45" or "abc23", which means some values have a comma inside the double quotes and some do not. Since I cannot use the double quote as the delimiter for this problem, how can I deal with that? Thank you for your help.
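
One possible way to cope with the embedded comma, sketched here under a strong assumption: column 2 is the only field that can contain a comma, and every row has the same eight logical columns. In that case columns 4 to 7 can be addressed relative to the last field, so an extra split inside column 2 does not shift them. The sample rows below are hypothetical, built from the "abc,45" and "abc23" values mentioned in the comment, and the replacement is re-quoted as in the required output. For data where commas may appear in other quoted fields, a real CSV parser or gawk's FPAT variable would probably be a better fit.

```shell
dir=$(mktemp -d)
cat > "$dir/file1" <<'EOF'
type  code     SO  CO other
7757    1       6941.958        138.922 149.17
7805    6       3796.677        75.175  79.09
EOF
# Hypothetical rows: column 2 sometimes contains a comma inside the quotes.
cat > "$dir/file2" <<'EOF'
"US","abc,45",,"7757","1","SO","10","299"
"US","abc23",,"7757","1","CO","10","299"
"US","abc,45",,"7805","6","NO","10","299"
EOF

result=$(awk 'function unquote(f) { return substr(f, 2, length(f) - 2) }
     NR == FNR {
         if (NR == 1) { f3 = $3; f4 = $4 }
         else if (NF) { a[$1, $2, f3] = $3; a[$1, $2, f4] = $4 }
         next
     }
     # Address columns 4 to 7 relative to the last field, so an extra comma
     # inside column 2 does not shift them.
     { k = unquote($(NF-4)) SUBSEP unquote($(NF-3)) SUBSEP unquote($(NF-2)) }
     k in a { $(NF-1) = "\"" a[k] "\"" }
     1' "$dir/file1" FS=',' OFS=',' "$dir/file2")
printf '%s\n' "$result"
rm -rf "$dir"
```

Since OFS is also a comma, rebuilding the record after the assignment puts the split pieces of column 2 back together unchanged.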
