How to read a csv file into arrays and compare and replace it with entries from another csv file?

Question

I have two csv files file1.csv and file2.csv.
file1.csv contains 4 columns.

file1:

Header1,Header2,Header3,Header4
aaaaaaa,bbbbbbb,ccccccc,ddddddd
eeeeeee,fffffff,ggggggg,hhhhhhh
iiiiiii,jjjjjjj,kkkkkkk,lllllll
mmmmmmm,nnnnnnn,ooooooo,ppppppp

file2:

"Header1","Header2","Header3"
"aaaaaaa","cat","dog"
"iiiiiii","doctor","engineer"
"mmmmmmm","sky","blue"

What I am trying to do is read file1.csv line by line, put each entry into an array, and compare the first element of that array with the first column of file2.csv. If a match exists, I want to replace column 1 and column 2 of file1.csv with the corresponding columns (column 2 and column 3) of file2.csv.

So my desired output is:

cat,dog,ccccccc,ddddddd
eeeeeee,fffffff,ggggggg,hhhhhhh
doctor,engineer,kkkkkkk,lllllll
sky,blue,ooooooo,ppppppp

I am able to do it when there is only one column to replace.
Here is my code:

awk -F'"(,")?' '
NR==FNR { r[$2] = $3; next }
{ for (n in r) gsub(n,r[n]) } IGNORECASE=1' file2.csv file1.csv>output.csv

My final step is to dump the entire array into a file with 10 columns. Any suggestions where I can improve or correct my code?

From your profile I see that you have not accepted an answer on your last few questions. Please consider doing so: some time after posting, once you have enough answers, you can select one of them as the correct one. @Anuj Kulkarni
– RavinderSingh13, Commented Jan 21, 2019 at 6:41
@Tiw yes, my file2.csv may have quotes. I create file2.csv on my own, while file1.csv is a tool-generated file. So while creating file2.csv I used quotes.
– Anuj Kulkarni, Commented Jan 21, 2019 at 6:46
@Tiw yes, that's correct. There will be no commas. I use the quotes because file2 may have more than two space-separated words (like a sentence).
– Anuj Kulkarni, Commented Jan 21, 2019 at 6:52
@AnujKulkarni It is always advisable to post samples that are close to your actual Input_file(s). Kindly edit your post and let us know.
– RavinderSingh13, Commented Jan 21, 2019 at 6:59
Anuj, if your input is comma-separated then a blank or a tab is just like any other character; you don't need to put quotes around your fields unless they contain a comma or a newline. Can they contain newlines?
– Ed Morton, Commented Jan 21, 2019 at 16:08

RavinderSingh13 · Accepted Answer · 2019-01-21 07:52:37Z

3

EDIT: If your Input_file2 has data in the "ytest","test2" etc. format, then try the following. (Thanks to Tiw for providing these samples in his/her post.)

awk '
BEGIN{
  FS=OFS=","
}
FNR==NR{
  gsub(/\"/,"")
  a[tolower($1)]=$0
  next
}
a[tolower($1)]{
  print a[tolower($1)],$NF
  next
}
1' file2.csv file1.csv

Could you please try the following.

awk '
BEGIN{
  FS=OFS=","
}
FNR==NR{
  a[$1]=$0
  next
}
a[$1]{
  print a[$1],$NF
  next
}
1'  Input_file2  Input_file1

Or, in case the Input_file(s) may contain a mix of lowercase and capital letters, try the following.

awk '
BEGIN{
  FS=OFS=","
}
FNR==NR{
  a[tolower($1)]=$0
  next
}
a[tolower($1)]{
  print a[tolower($1)],$NF
  next
}
1'  Input_file2  Input_file1
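
On the sample data from the question, storing the whole Input_file2 line and printing it together with $NF keeps the key in the first column, for example aaaaaaa,cat,dog,ddddddd. If the output has to match the question exactly, one possible variant stores only the two replacement columns. This is a sketch, not a definitive fix: it assumes file1.csv always has exactly 4 columns (as stated in the question), and the temporary sample files are created only for illustration.

```shell
# Recreate the sample files from the question in a temporary directory.
tmp=$(mktemp -d)
cat > "$tmp/file2.csv" <<'EOF'
"Header1","Header2","Header3"
"aaaaaaa","cat","dog"
"iiiiiii","doctor","engineer"
"mmmmmmm","sky","blue"
EOF
cat > "$tmp/file1.csv" <<'EOF'
Header1,Header2,Header3,Header4
aaaaaaa,bbbbbbb,ccccccc,ddddddd
eeeeeee,fffffff,ggggggg,hhhhhhh
iiiiiii,jjjjjjj,kkkkkkk,lllllll
mmmmmmm,nnnnnnn,ooooooo,ppppppp
EOF

result=$(awk '
BEGIN { FS = OFS = "," }
FNR == NR {
    gsub(/"/, "")                  # strip the quotes used in file2.csv
    a[tolower($1)] = $2 OFS $3     # keep only the two replacement columns
    next
}
tolower($1) in a {                 # key found: swap in the stored columns
    print a[tolower($1)], $3, $4   # assumes exactly 4 columns in file1.csv
    next
}
1' "$tmp/file2.csv" "$tmp/file1.csv")

printf '%s\n' "$result"
rm -rf "$tmp"
```

As with the other commands in this thread, the header line is treated like any other row, so it becomes Header2,Header3,Header3,Header4.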

edited Jan 21, 2019 at 7:52

answered Jan 21, 2019 at 6:27

1 Comment

Anuj Kulkarni Over a year ago

Thanks for the efforts, mate. I am experimenting with your code right now.

Ed Morton · Accepted Answer · 2019-01-21 16:24:37Z

2

With any awk and any number of fields in either file:

$ cat tst.awk
BEGIN { FS=OFS="," }
{
    gsub(/"/,"")
    key = tolower($1)
}
NR==FNR {
    for (i=2; i<=NF; i++) {
        map[key,i] = $i
    }
    next
}
{
    for (i=2; i<=NF; i++) {
        $(i-1) = ((key,i) in map ? map[key,i] : $(i-1))
    }
    print
}

$ awk -f tst.awk file2 file1
Header2,Header3,Header3,Header4
cat,dog,ccccccc,ddddddd
eeeeeee,fffffff,ggggggg,hhhhhhh
doctor,engineer,kkkkkkk,lllllll
sky,blue,ooooooo,ppppppp

answered Jan 21, 2019 at 16:24

1 Comment

Ed Morton Over a year ago

The code within the NR==FNR block stores the values from file2 in the map array one field at a time and the loop within the 2nd block accesses the values from that array one field at a time.
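
To make the storage step concrete, here is a small sketch that dumps what ends up in the map array for a single file2 row. The one-row input file and the END block are added only for illustration and are not part of the answer's script. In awk, map[key,i] is shorthand for map[key SUBSEP i], so the index can be split back into its two parts.

```shell
# Hypothetical one-row file2, just to inspect the stored values.
tmp=$(mktemp -d)
printf '"aaaaaaa","cat","dog"\n' > "$tmp/file2"

stored=$(awk 'BEGIN { FS = "," }
{
    gsub(/"/, "")
    key = tolower($1)
    for (i = 2; i <= NF; i++) map[key, i] = $i
}
END {
    # map[key,i] is really map[key SUBSEP i]; split the index to show both parts.
    for (idx in map) {
        split(idx, part, SUBSEP)
        print part[1], part[2], map[idx]
    }
}' "$tmp/file2" | sort)   # sort, because "for (idx in map)" order is unspecified

printf '%s\n' "$stored"
# aaaaaaa 2 cat
# aaaaaaa 3 dog
rm -rf "$tmp"
```

Each value is stored under the pair (key, field number), which is why the second block can look up field i of file2 and write it into field i-1 of file1.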

Tyl · Accepted Answer · 2019-01-21 17:09:59Z

Given your sample data, and the description from your comments, please try this:
(Judging from your own code, you may have quotes around fields, thus I didn't try to answer.)

awk 'BEGIN{FS=OFS=","}
    NR==FNR{gsub(/^"|"$/,"");gsub(/","/,",");a[$1]=$2;b[$1]=$3;next}
    $1 in a{$2=b[$1];$1=a[$1];}
    1' file2.csv file1.csv

E.g.:

$ cat file1.csv
Header1,Header2,Header3,Header4
aaaaaaa,bbbbbbb,ccccccc,ddddddd
eeeeeee,fffffff,ggggggg,hhhhhhh
iiiiiii,jjjjjjj,kkkkkkk,lllllll
mmmmmmm,nnnnnnn,ooooooo,ppppppp

$ cat file2.csv
"Header1","Header2","Header3"
"aaaaaaa","cat","dog"
"iiiiiii","doctor","engineer"
"mmmmmmm","sky","blue"

$ awk 'BEGIN{FS=OFS=","}
NR==FNR{gsub(/^"|"$/,"");gsub(/","/,",");a[$1]=$2;b[$1]=$3;next}
$1 in a{$2=b[$1];$1=a[$1];}
1' file2.csv file1.csv
Header2,Header3,Header3,Header4
cat,dog,ccccccc,ddddddd
eeeeeee,fffffff,ggggggg,hhhhhhh
doctor,engineer,kkkkkkk,lllllll
sky,blue,ooooooo,ppppppp

Another way, more verbose, but I think it's better for you to understand (GNU awk):

awk 'BEGIN{FS=OFS=","}
    NR==FNR{for(i=1;i<=NF;i++)$i=gensub(/^"(.*)"$/,"\\1",1,$i);a[$1]=$2;b[$1]=$3;next}
    $1 in b{$2=b[$1];}
    $1 in a{$1=a[$1];}
    1' file2.csv file1.csv

Note a pitfall here: since $1 is the key, we should change $1 last.
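
A minimal sketch of that pitfall, using hypothetical one-row inputs (already unquoted, to keep the example short). When $1 is overwritten first, the later lookup b[$1] uses the new value instead of the original key and finds nothing.

```shell
# Hypothetical one-row inputs, just to show the ordering pitfall.
tmp=$(mktemp -d)
printf 'aaaaaaa,cat,dog\n' > "$tmp/f2.csv"
printf 'aaaaaaa,bbbbbbb,ccccccc,ddddddd\n' > "$tmp/f1.csv"

# Wrong order: $1 becomes "cat" first, so b[$1] looks up b["cat"], which is unset.
wrong=$(awk 'BEGIN{FS=OFS=","}
    NR==FNR{a[$1]=$2;b[$1]=$3;next}
    $1 in a{$1=a[$1];$2=b[$1]}
    1' "$tmp/f2.csv" "$tmp/f1.csv")

# Right order: $2 is set while $1 still holds the key.
right=$(awk 'BEGIN{FS=OFS=","}
    NR==FNR{a[$1]=$2;b[$1]=$3;next}
    $1 in a{$2=b[$1];$1=a[$1]}
    1' "$tmp/f2.csv" "$tmp/f1.csv")

printf 'wrong: %s\nright: %s\n' "$wrong" "$right"
# wrong: cat,,ccccccc,ddddddd
# right: cat,dog,ccccccc,ddddddd
rm -rf "$tmp"
```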

A case-insensitive solution:

awk 'BEGIN{FS=OFS=","}
    NR==FNR{gsub(/^"|"$/,"");gsub(/","/,",");k=tolower($1);a[k]=$2;b[k]=$3;next}
    {k=tolower($1);if(k in a){$2=b[k];$1=a[k]}}
    1' file2.csv file1.csv

To keep the code concise, I added a variable k and moved the "if" inside the action block.

Thanks. Like you said, the second method was easier to understand. I tried using it and it works.
