3

I have two CSV files, file1.csv and file2.csv.
file1.csv contains 4 columns.

file1:

Header1,Header2,Header3,Header4
aaaaaaa,bbbbbbb,ccccccc,ddddddd
eeeeeee,fffffff,ggggggg,hhhhhhh
iiiiiii,jjjjjjj,kkkkkkk,lllllll
mmmmmmm,nnnnnnn,ooooooo,ppppppp

file2:

"Header1","Header2","Header3"
"aaaaaaa","cat","dog"
"iiiiiii","doctor","engineer"
"mmmmmmm","sky","blue"

What I am trying to do is read file1.csv line by line, split each line into an array, and compare the first element of that array with the first column of file2.csv. If a match exists, I want to replace column 1 and column 2 of file1.csv with column 2 and column 3 of the matching row in file2.csv.

So my desired output is:

cat,dog,ccccccc,ddddddd
eeeeeee,fffffff,ggggggg,hhhhhhh
doctor,engineer,kkkkkkk,lllllll
sky,blue,ooooooo,ppppppp

I am able to do it when there is only one column to replace.
Here is my code:

awk -F'"(,")?' '
NR==FNR { r[$2] = $3; next }
{ for (n in r) gsub(n,r[n]) } IGNORECASE=1' file2.csv file1.csv>output.csv

My final step is to dump the entire array into a file with 10 columns. Any suggestions on where I can improve or correct my code?

6
  • From your profile I can see that you have not accepted an answer on your last few questions. Please do so: some time after posting, once you have enough answers, you can select one of them as the correct one. @Anuj Kulkarni Commented Jan 21, 2019 at 6:41
  • @Tiw yes my file2.csv may have quotes. I create file2.csv on my own while file1.csv is a tool generated file. So while creating file2.csv I had used quotes. Commented Jan 21, 2019 at 6:46
  • @Tiw yes that's correct. There will be no commas. I use the quotes because file2 may have more than two space separated words(like a sentence). Commented Jan 21, 2019 at 6:52
  • @AnujKulkarni. It is always advisable to post samples that are close to your actual Input_file(s); kindly edit your post and let us know. Commented Jan 21, 2019 at 6:59
  • Anuj, if your input is comma-separated then a blank or a tab is just like any other character; you don't need to put quotes around your fields unless they contain a comma or a newline. Can they contain newlines? Commented Jan 21, 2019 at 16:08
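
To illustrate the last comment, here is a minimal sketch of how awk splits on a plain comma. The helper name count_fields and the sample lines are made up for this illustration; they are not from the question's files.

```shell
# count_fields: hypothetical helper that prints the number of
# comma-separated fields awk sees on each input line.
count_fields() {
  awk -F, '{ print NF }'
}

# Spaces inside a field do not matter: this line has 3 fields.
printf 'iiiiiii,family doctor,chief engineer\n' | count_fields   # 3

# A comma inside quotes still splits the field when FS is a plain comma,
# so this line is seen as 3 fields, not 2.
printf '"sky, blue",x\n' | count_fields                          # 3
```

So quotes only become necessary (and then need real CSV parsing) once a field can itself contain a comma or a newline.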

3 Answers

3

EDIT: If the data in your Input_file2 is quoted, in a format like "ytest","test2", then try the following. (Thanks to Tiw for providing these samples in his/her post.)

awk '
BEGIN{
  FS=OFS=","
}
FNR==NR{
  gsub(/"/,"")                  # strip the quotes around file2.csv fields
  a[tolower($1)]=$2 OFS $3      # replacement values for columns 1 and 2
  next
}
(tolower($1) in a){
  print a[tolower($1)],$3,$4    # keep columns 3 and 4 of the 4-column file1.csv
  next
}
1' file2.csv file1.csv


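Please try the following (for an unquoted Input_file2).

awk '
BEGIN{
  FS=OFS=","
}
FNR==NR{
  a[$1]=$2 OFS $3       # replacement values for columns 1 and 2
  next
}
($1 in a){
  print a[$1],$3,$4     # keep columns 3 and 4 of the 4-column Input_file1
  next
}
1'  Input_file2  Input_file1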

Or, in case your Input_file(s) may mix lowercase and uppercase letters, try the following.

awk '
BEGIN{
  FS=OFS=","
}
FNR==NR{
  a[tolower($1)]=$2 OFS $3      # key is lowercased so the match ignores case
  next
}
(tolower($1) in a){
  print a[tolower($1)],$3,$4    # keep columns 3 and 4 of the 4-column Input_file1
  next
}
1'  Input_file2  Input_file1

1 Comment

Thanks for the efforts mate I am experimenting with your code right now.
2

With any awk and any number of fields in either file:

$ cat tst.awk
BEGIN { FS=OFS="," }
{
    gsub(/"/,"")
    key = tolower($1)
}
NR==FNR {
    for (i=2; i<=NF; i++) {
        map[key,i] = $i
    }
    next
}
{
    for (i=2; i<=NF; i++) {
        $(i-1) = ((key,i) in map ? map[key,i] : $(i-1))
    }
    print
}

$ awk -f tst.awk file2 file1
Header2,Header3,Header3,Header4
cat,dog,ccccccc,ddddddd
eeeeeee,fffffff,ggggggg,hhhhhhh
doctor,engineer,kkkkkkk,lllllll
sky,blue,ooooooo,ppppppp

1 Comment

The code within the NR==FNR block stores the values from file2 in the map array one field at a time and the loop within the 2nd block accesses the values from that array one field at a time.
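
To make that concrete, here is a small sketch that builds the same map array from file2-style input and prints what ends up in it. The helper name dump_map is made up for this illustration, and the output is piped through sort because the order of a for-in loop over an awk array is unspecified.

```shell
# dump_map: hypothetical helper that reads file2-style lines on stdin and
# prints every (key, field-number) pair stored in the map array.
dump_map() {
  awk 'BEGIN { FS = "," }
  {
      gsub(/"/, "")                      # strip the quotes, as in tst.awk
      key = tolower($1)
      for (i = 2; i <= NF; i++) map[key, i] = $i
  }
  END {
      for (k in map) {
          split(k, part, SUBSEP)         # part[1] = key, part[2] = field number
          print part[1] "[" part[2] "] = " map[k]
      }
  }' | LC_ALL=C sort
}

printf '"aaaaaaa","cat","dog"\n"iiiiiii","doctor","engineer"\n' | dump_map
# aaaaaaa[2] = cat
# aaaaaaa[3] = dog
# iiiiiii[2] = doctor
# iiiiiii[3] = engineer
```

The second block of tst.awk then looks up map[key,i] for each field number i of the file1 line and writes the value one position to the left, into $(i-1).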
2

Given your sample data, and the description from your comments, please try this:
(Judging from your own code, the fields in file2 may have quotes around them, so the code below strips them.)

awk 'BEGIN{FS=OFS=","}
    NR==FNR{gsub(/^"|"$/,"");gsub(/","/,",");a[$1]=$2;b[$1]=$3;next}
    $1 in a{$2=b[$1];$1=a[$1];}
    1' file2.csv file1.csv

Eg:

$ cat file1.csv
Header1,Header2,Header3,Header4
aaaaaaa,bbbbbbb,ccccccc,ddddddd
eeeeeee,fffffff,ggggggg,hhhhhhh
iiiiiii,jjjjjjj,kkkkkkk,lllllll
mmmmmmm,nnnnnnn,ooooooo,ppppppp

$ cat file2.csv
"Header1","Header2","Header3"
"aaaaaaa","cat","dog"
"iiiiiii","doctor","engineer"
"mmmmmmm","sky","blue"

$ awk 'BEGIN{FS=OFS=","}
NR==FNR{gsub(/^"|"$/,"");gsub(/","/,",");a[$1]=$2;b[$1]=$3;next}
$1 in a{$2=b[$1];$1=a[$1];}
1' file2.csv file1.csv
Header2,Header3,Header3,Header4
cat,dog,ccccccc,ddddddd
eeeeeee,fffffff,ggggggg,hhhhhhh
doctor,engineer,kkkkkkk,lllllll
sky,blue,ooooooo,ppppppp

Another way, more verbose, but I think it's better for you to understand (GNU awk):

awk 'BEGIN{FS=OFS=","}
    NR==FNR{for(i=1;i<=NF;i++)$i=gensub(/^"(.*)"$/,"\\1",1,$i);a[$1]=$2;b[$1]=$3;next}
    $1 in b{$2=b[$1];}
    $1 in a{$1=a[$1];}
    1' file2.csv file1.csv

Note a pitfall here: since $1 is the lookup key, we should change $1 last.
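
A minimal sketch of that pitfall, using one inline record and a hard-coded lookup instead of file2.csv. The function name replace_cols and its order argument are made up for this illustration; the arrays a and b play the same role as above.

```shell
# replace_cols ORDER: hypothetical helper. ORDER is "key_first" (the pitfall)
# or "key_last" (the safe order).
replace_cols() {
  printf 'aaaaaaa,bbbbbbb,ccccccc,ddddddd\n' |
  awk -v order="$1" '
  BEGIN { FS = OFS = ","; a["aaaaaaa"] = "cat"; b["aaaaaaa"] = "dog" }
  order == "key_first" {
      if ($1 in a) $1 = a[$1]   # $1 is now "cat", so the original key is gone
      if ($1 in b) $2 = b[$1]   # "cat" is not a key of b, so $2 is left alone
  }
  order == "key_last" {
      if ($1 in b) $2 = b[$1]   # look up column 2 while $1 is still the key
      if ($1 in a) $1 = a[$1]   # overwrite the key only at the end
  }
  { print }'
}

replace_cols key_first   # cat,bbbbbbb,ccccccc,ddddddd
replace_cols key_last    # cat,dog,ccccccc,ddddddd
```

An alternative that avoids the ordering issue altogether is to copy the key into a variable first, as the case-insensitive version does with k.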

A case insensitive solution:

awk 'BEGIN{FS=OFS=","}
    NR==FNR{gsub(/^"|"$/,"");gsub(/","/,",");k=tolower($1);a[k]=$2;b[k]=$3;next}
    {k=tolower($1);if(k in a){$2=b[k];$1=a[k]}}
    1' file2.csv file1.csv

To keep the code concise, I added the variable k and moved the "if" inside the action.

1 Comment

Thanks. Like you said, the second method was easier to understand. I tried using it and it works.
