I have two files (FILE1 and FILE2).
I am trying to compare columns 1 and 2 of FILE1 with columns 4 and 5 of FILE2. In addition to that match, column 6 of FILE2 must equal either SO or CO (the names of columns 3 and 4 of FILE1). When both conditions hold, column 7 of FILE2 should be replaced with the FILE1 value from the column named in column 6 (column 3 for SO, column 4 for CO); all other lines should stay unchanged.
I tried to adapt a solution posted on the forum for a similar problem, but it did not work.
FILE1:

type code SO CO other
7757 1 6941.958 138.922 149.17
7757 2 8666.123 198.908 225.67
7757 4 2795.885 334.875 378.68
7759 GT3 222.104 13.5 734.62
7768 CT2 0 0 0
7805 6 3796.677 75.175 79.09

FILE2:

"US","01073",,"7757","1","SO","10","299"
"US","01073",,"7758","1","SO","10","299"
"US","01073",,"7757","1","NO","10","299"
"US","01073",,"7757","1","CO","10","299"
"US","01073",,"7757","4","MO","10","299"
"US","01073",,"7757","1","GO","10","299"
"US","01073",,"7805","6","CO","10","299"

Required output:

"US","01073",,"7757","1","SO","6941.958","299"
"US","01073",,"7758","1","SO","10","299"
"US","01073",,"7757","1","NO","10","299"
"US","01073",,"7757","1","CO","138.922","299"
"US","01073",,"7757","4","MO","10","299"
"US","01073",,"7757","1","GO","10","299"
"US","01073",,"7805","6","CO","75.175","299"

Solution I tried (for CO only):

tr -d '"' < FILE2 > temp # to remove double quote
awk 'NR==FNR{A[$1,$2]=$3;next} A[$4,$5] && $6=="CO" {$7=A[$1,$2]; print}' FS=" " OFS="," FILE1 temp > out
Thank you so much for helping edit my code, Randomir. – kelly, Nov 6, 2017 at 14:44
1 Answer
Complex awk solution:
awk 'function unquote(f){
return substr(f, 2, length(f)-2)
}
NR==FNR{
if (NR==1){ f3=$3; f4=$4 }
else if (NF){ a[$1,$2,f3]=$3; a[$1,$2,f4]=$4 }
next;
}
{ k=unquote($4) SUBSEP unquote($5) SUBSEP unquote($6) }
k in a{ $7=a[k] }1' file1 FS=',' OFS=',' file2
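function unquote(f) { ... } - strips the surrounding double quotes from a field (strictly speaking, it returns everything between the first and last characters of the string)
a[$1,$2,f3]=$3; a[$1,$2,f4]=$4 - stores each file1 value under a composite key of type, code and column name (SO or CO), so it can be looked up from columns 4, 5 and 6 of file2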
The output:
"US","01073",,"7757","1","SO",6941.958,"299"
"US","01073",,"7758","1","SO","10","299"
"US","01073",,"7757","1","NO","10","299"
"US","01073",,"7757","1","CO",138.922,"299"
"US","01073",,"7757","4","MO","10","299"
"US","01073",,"7757","1","GO","10","299"
"US","01073",,"7805","6","CO",75.175,"299"
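Note that the replaced values in the output above are no longer wrapped in double quotes, while the required output in the question keeps them. A minimal sketch of one possible variation that re-quotes the new value is shown here; the temporary directory, the shortened sample files and the `result` variable are assumptions made for illustration, not part of the original answer.

```shell
#!/bin/sh
# Hypothetical scratch directory and shortened samples taken from the question.
tmp=$(mktemp -d)

cat > "$tmp/file1" <<'EOF'
type code SO CO other
7757 1 6941.958 138.922 149.17
7805 6 3796.677 75.175 79.09
EOF

cat > "$tmp/file2" <<'EOF'
"US","01073",,"7757","1","SO","10","299"
"US","01073",,"7758","1","SO","10","299"
"US","01073",,"7805","6","CO","10","299"
EOF

result=$(awk 'function unquote(f){ return substr(f, 2, length(f)-2) }
NR==FNR{
    # file1: remember the header names, then index values by type, code, name
    if (NR==1){ f3=$3; f4=$4 }
    else if (NF){ a[$1,$2,f3]=$3; a[$1,$2,f4]=$4 }
    next
}
{ k=unquote($4) SUBSEP unquote($5) SUBSEP unquote($6) }
# Same lookup as in the answer, but the new value is wrapped in double quotes.
(k in a){ $7="\"" a[k] "\"" }1' "$tmp/file1" FS=',' OFS=',' "$tmp/file2")

printf '%s\n' "$result"
rm -rf "$tmp"
```

Only the assignment to `$7` differs from the solution above; lines without a matching key are still printed untouched.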
5 Comments
kelly
Hello RomanPerekhrest, thank you for your help. Your script looks great to me. However, I keep getting the same output as file2, which means nothing is replaced in column 7. Any hints?
RomanPerekhrest
@kelly, a hint: make sure that you have posted the actual input samples, because they were copied and tested. The solution works fine for the samples as currently posted.
kelly
RomanPerekhrest, it was my mistake; your code works perfectly. I appreciate your help and time very much.
RomanPerekhrest
@kelly, that's ok
kelly
The solution from @RomanPerekhrest works perfectly with the test data. However, there is a problem with the real data in FILE2: column 2 looks like "abc,45" or "abc23", meaning some values contain a comma inside the double quotes and some do not. Since I cannot use the double quote as the delimiter for this problem, how should I deal with that? Thank you for your help.
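One possible way to handle this, offered only as a sketch: if the embedded commas occur only in columns before column 4 (as in the "abc,45" example) and each row otherwise has 8 columns, the key and target fields can be addressed from the end of the record, so extra commas in column 2 do not shift them. The sample rows, the temporary directory and the `result` variable below are hypothetical; if commas can also appear inside later columns, a real CSV parser (or gawk's `FPAT`) would be a safer choice.

```shell
#!/bin/sh
# Hypothetical scratch directory and sample rows built from the comment above.
tmp=$(mktemp -d)

cat > "$tmp/file1" <<'EOF'
type code SO CO other
7757 1 6941.958 138.922 149.17
EOF

cat > "$tmp/file2" <<'EOF'
"US","abc,45",,"7757","1","SO","10","299"
"US","abc23",,"7757","1","CO","10","299"
"US","abc,45",,"7757","1","NO","10","299"
EOF

result=$(awk 'function unquote(f){ return substr(f, 2, length(f)-2) }
NR==FNR{
    if (NR==1){ f3=$3; f4=$4 }
    else if (NF){ a[$1,$2,f3]=$3; a[$1,$2,f4]=$4 }
    next
}
# Rows that are too short to hold 8 columns are printed unchanged.
NF<8{ print; next }
# Columns 4 to 7 of an 8-column row are fields NF-4 to NF-1 counted from
# the end, so a comma inside column 2 does not move them.
{ k=unquote($(NF-4)) SUBSEP unquote($(NF-3)) SUBSEP unquote($(NF-2)) }
(k in a){ $(NF-1)=a[k] }1' "$tmp/file1" FS=',' OFS=',' "$tmp/file2")

printf '%s\n' "$result"
rm -rf "$tmp"
```

Because the record is rebuilt with `OFS=','`, the pieces of a split column 2 are joined back with the same comma, so `"abc,45"` comes out intact.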