  • I have two files (File 1 and File 2)
  • I am trying to match columns 1 and 2 of FILE1 against columns 4 and 5 of FILE2. In addition, column 6 of FILE2 must match a specific string, SO or CO (the header names of columns 3 and 4 of FILE1). When all three match, column 7 of FILE2 should be replaced with the corresponding value from FILE1 (column 3 for SO, column 4 for CO); all other lines should stay unchanged.

  • I tried to modify and use a solution posted on the forum for a similar problem, but it did not work.

    FILE1
    type  code     SO  CO other
    
    7757    1       6941.958        138.922 149.17
    7757    2       8666.123        198.908 225.67
    7757    4       2795.885        334.875 378.68
    7759    GT3     222.104    13.5    734.62
    7768    CT2     0       0       0
    7805    6       3796.677        75.175  79.09 
    
    FILE2
    "US","01073",,"7757","1","SO","10","299"
    "US","01073",,"7758","1","SO","10","299"
    "US","01073",,"7757","1","NO","10","299"
    "US","01073",,"7757","1","CO","10","299"
    "US","01073",,"7757","4","MO","10","299"
    "US","01073",,"7757","1","GO","10","299"
    "US","01073",,"7805","6","CO","10","299"
    
    Required output:
    "US","01073",,"7757","1","SO","6941.958","299"
    "US","01073",,"7758","1","SO","10","299"
    "US","01073",,"7757","1","NO","10","299"
    "US","01073",,"7757","1","CO","138.922","299"
    "US","01073",,"7757","4","MO","10","299"
    "US","01073",,"7757","1","GO","10","299"
    "US","01073",,"7805","6","CO","75.175","299"
    

    Solution I tried (for CO only) :

    tr -d '"' < FILE2 > temp  # to remove double quote
    awk 'NR==FNR{A[$1,$2]=$3;next} A[$4,$5] && $6=="CO" {$7=A[$1,$2]; print}' FS=" " OFS="," FILE1 temp > out
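
A few things in the attempt above appear to explain why it produces nothing useful: `FS=" "` is set before FILE1 and so also applies to the comma-separated temp file, the array stores column 3 (SO) rather than column 4 (CO), the replacement looks up `A[$1,$2]` instead of `A[$4,$5]`, and `print` runs only for matched lines. A minimal corrected sketch for the CO case, assuming the quotes may be dropped as in the original attempt (the heredocs only recreate a shortened copy of the sample data):

```shell
# Shortened copies of the sample files from the question.
cat > FILE1 <<'EOF'
type  code     SO  CO other

7757    1       6941.958        138.922 149.17
7805    6       3796.677        75.175  79.09
EOF

cat > FILE2 <<'EOF'
"US","01073",,"7757","1","SO","10","299"
"US","01073",,"7757","1","CO","10","299"
"US","01073",,"7805","6","CO","10","299"
EOF

tr -d '"' < FILE2 > temp   # remove the double quotes, as in the attempt

# FILE1 is read with the default whitespace separator; FS=, takes effect
# only for temp because the assignment sits between the two file names.
awk 'NR == FNR { if (FNR > 1 && NF) co[$1, $2] = $4; next }   # skip header and blank lines
     ($4, $5) in co && $6 == "CO" { $7 = co[$4, $5] }          # replace column 7 on a match
     1' FILE1 FS=',' OFS=',' temp > out                         # print every line

cat out
```

Using `($4, $5) in co` instead of testing the array value directly also keeps a stored value of 0 from being treated as "no match".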
    
  • Thank you so much for helping to edit my code, Randomir! Commented Nov 6, 2017 at 14:44

1 Answer


Complex awk solution:

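awk 'function unquote(f){
         # drop the first and last character (the surrounding double quotes)
         return substr(f, 2, length(f)-2)
     }
     NR==FNR{                             # first file (file1), whitespace-separated
         if (NR==1){ f3=$3; f4=$4 }       # header line: remember the labels SO and CO
         else if (NF){ a[$1,$2,f3]=$3; a[$1,$2,f4]=$4 }   # data lines; blank lines are skipped
         next;
     }
     { k=unquote($4) SUBSEP unquote($5) SUBSEP unquote($6) }   # lookup key for a file2 line
     k in a{ $7=a[k] }1' file1 FS=',' OFS=',' file2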
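  • function unquote(f) { ... } - strips the surrounding double quotes from a field (strictly, it drops the first and last characters of the string, whatever they are)

  • a[$1,$2,f3]=$3; a[$1,$2,f4]=$4 - builds a lookup table keyed by type, code and header label (SO or CO), so that columns 4, 5 and 6 of each file2 line can be checked with a single lookup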


The output:

"US","01073",,"7757","1","SO",6941.958,"299"
"US","01073",,"7758","1","SO","10","299"
"US","01073",,"7757","1","NO","10","299"
"US","01073",,"7757","1","CO",138.922,"299"
"US","01073",,"7757","4","MO","10","299"
"US","01073",,"7757","1","GO","10","299"
"US","01073",,"7805","6","CO",75.175,"299"
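
Note that the script writes the file1 value into column 7 without quotes, whereas the required output in the question keeps that field quoted. A minimal variant, assuming the only change wanted is to put the double quotes back around the replaced value, might look like this (the heredocs recreate a shortened copy of the sample data, and `out_quoted.csv` is just a hypothetical output name):

```shell
# Shortened copies of the sample files from the question.
cat > file1 <<'EOF'
type  code     SO  CO other

7757    1       6941.958        138.922 149.17
7805    6       3796.677        75.175  79.09
EOF

cat > file2 <<'EOF'
"US","01073",,"7757","1","SO","10","299"
"US","01073",,"7758","1","SO","10","299"
"US","01073",,"7757","1","CO","10","299"
"US","01073",,"7805","6","CO","10","299"
EOF

awk 'function unquote(f) { return substr(f, 2, length(f) - 2) }
     NR == FNR {
         if (NR == 1) { f3 = $3; f4 = $4 }                          # header labels
         else if (NF) { a[$1, $2, f3] = $3; a[$1, $2, f4] = $4 }    # lookup table
         next
     }
     { k = unquote($4) SUBSEP unquote($5) SUBSEP unquote($6) }
     k in a { $7 = "\"" a[k] "\"" }                                 # re-add the quotes
     1' file1 FS=',' OFS=',' file2 > out_quoted.csv

cat out_quoted.csv
```

Lines without a match are printed untouched, so their original quoting is preserved either way.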

5 Comments

Hello RomanPerekhrest, thank you for your help. Your script looks great to me. However, I keep getting the same output as file2, which means nothing is replaced in column 7 of the output. Any hints?
@kelly, a hint: make sure that you have posted the actual input samples, because they were copied and tested. The solution works fine for the currently posted samples.
RomanPerekhrest, it was my problem; your code works perfectly. I appreciate your help and time very much.
@kelly, that's ok
The solution from @RomanPerekhrest works perfectly with the test data. However, there is a problem with the real data in FILE2: column 2 looks like "abc,45" or "abc23", so some values contain a comma inside the double quotes and some do not. Since I cannot use the double quote as the delimiter for this problem, how can I deal with that? Thank you for your help.
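
One possible way to handle commas inside quoted fields, sketched here under assumptions rather than taken from the answer: read each file2 line whole and split it with a small helper that treats a quoted string as a single field. `csvsplit` is a hypothetical helper name, the sample column 2 values are made up to mirror the comment, and escaped quotes (`""`) inside a field are not handled. With GNU awk, the `FPAT` variable would be another option.

```shell
cat > file1 <<'EOF'
type  code     SO  CO other

7757    1       6941.958        138.922 149.17
7805    6       3796.677        75.175  79.09
EOF

cat > file2 <<'EOF'
"US","abc,45",,"7757","1","SO","10","299"
"US","abc23",,"7757","1","NO","10","299"
"US","abc,45",,"7805","6","CO","10","299"
EOF

awk '
# Split a CSV line into arr[1..n]; a quoted field may contain commas.
function csvsplit(line, arr,    n, rest, p, len) {
    n = 0; rest = line
    while (1) {
        if (match(rest, /^"[^"]*"/)) {         # quoted field, quotes kept
            len = RLENGTH
        } else {                               # unquoted, possibly empty field
            p = index(rest, ",")
            len = p ? p - 1 : length(rest)
        }
        arr[++n] = substr(rest, 1, len)
        rest = substr(rest, len + 1)
        if (substr(rest, 1, 1) != ",") break   # no separator left: end of line
        rest = substr(rest, 2)                 # skip the separator
    }
    return n
}
function unquote(f) { return substr(f, 2, length(f) - 2) }
NR == FNR {                                    # file1: build the lookup table
    if (NR == 1) { f3 = $3; f4 = $4 }
    else if (NF) { a[$1, $2, f3] = $3; a[$1, $2, f4] = $4 }
    next
}
{                                              # file2: parse, replace, rebuild
    n = csvsplit($0, c)
    k = unquote(c[4]) SUBSEP unquote(c[5]) SUBSEP unquote(c[6])
    if (k in a) c[7] = "\"" a[k] "\""
    line = c[1]
    for (i = 2; i <= n; i++) line = line "," c[i]
    print line
}' file1 file2 > out_csv.csv

cat out_csv.csv
```

Because file2 is never split on `FS`, no `FS=','` assignment is needed here; file1 is still read with the default whitespace separator.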
