How to print/store duplicate values in an array in awk

Question

I have two files, and I want to print data from file1 based on file2.

File1:

a 1 
b 2 
c 3 
d 4 
e 1 
f 5 
g 1

File2:

1 100
2 200
3 400
4 600
5 700

Using the command below:

 awk 'NR==FNR{a[$2]=$1;next}$1 in a{print a[$1] " " $2}' file1 file2

I got the following output:

g 100
b 200
c 400
d 600
f 700

But I don't want duplicate values to be overwritten in the array. Desired output:

a 100
e 100
g 100
b 200
c 400
d 600
f 700

Is it possible to store duplicate keys in an awk array, like a multimap in C++? Or is there another way to do this? Please help me out.

ilkkachu · Accepted Answer · 2017-04-20 12:26:36Z

4

If (and only if) the first fields of the second file (the single-digit numbers) are unique, you could turn the logic around and use that field as the key to the array:

$ awk 'FNR==NR { a[$1] = $2; next } $2 in a {print $1, a[$2]} ' file2 file1
a 100
b 200
c 400
d 600
e 100
f 700
g 100

The output now follows the order of file1, which is not what you wanted, but piping it to sort -nk2 fixes that.
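A minimal, self-contained sketch of that pipeline. It assumes file2 holds the pairs 1 100, 2 200, 3 400, 4 600 and 5 700, which is inferred from the outputs in the answers rather than stated in the question. sort -nk2 orders the lines by their numeric value; lines with the same value (a, e, g) fall back to a whole-line comparison, so they come out alphabetically.

```shell
# Scratch directory, removed again when the script exits.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

# Sample data: file1 from the question; file2 contents are an assumption.
printf '%s\n' 'a 1' 'b 2' 'c 3' 'd 4' 'e 1' 'f 5' 'g 1' > file1
printf '%s\n' '1 100' '2 200' '3 400' '4 600' '5 700' > file2

# Read file2 first (unique keys), look up each file1 line by its 2nd field,
# then sort numerically on field 2 to group the lines by value.
result=$(awk 'FNR==NR { a[$1] = $2; next } $2 in a { print $1, a[$2] }' file2 file1 | sort -nk2)
printf '%s\n' "$result"
```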

There's the border case of what to do if file1 has a line whose second field isn't in file2 (say, h 9). The $2 in a condition skips such lines entirely. Without the condition, they are printed with an empty second field (just h followed by a space in the output).
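To see both behaviours side by side, here is a sketch that appends a hypothetical line h 9 to file1 (again assuming file2 holds 1 100 through 5 700) and runs the lookup with and without the $2 in a condition:

```shell
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

# file1 plus a hypothetical line "h 9" whose key 9 does not appear in file2.
printf '%s\n' 'a 1' 'b 2' 'c 3' 'd 4' 'e 1' 'f 5' 'g 1' 'h 9' > file1
printf '%s\n' '1 100' '2 200' '3 400' '4 600' '5 700' > file2

# With the "$2 in a" condition: the h line is skipped.
with_cond=$(awk 'FNR==NR { a[$1] = $2; next } $2 in a { print $1, a[$2] }' file2 file1)
# Without it: the h line is printed as "h", the output separator, and nothing else.
without_cond=$(awk 'FNR==NR { a[$1] = $2; next } { print $1, a[$2] }' file2 file1)

printf 'with condition:\n%s\n\nwithout condition:\n%s\n' "$with_cond" "$without_cond"
```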

edited Apr 20, 2017 at 12:26

answered Apr 20, 2017 at 12:21

ilkkachu



2 Comments

ilkkachu Over a year ago

@Sundeep, true, that's better since it allows changing OFS if desired.

Ed Morton Over a year ago

Got my vote. I was too focused on the "Is it possible to store duplicate key in array in awk script" part of the question and got sucked in by the OP's approach...

Ed Morton · Accepted Answer · 2017-04-20 12:19:40Z

3

With GNU awk for true multi-dimensional arrays:

$ awk 'NR==FNR{a[$2][$1]=$1;next} $1 in a{for (i in a[$1]) print a[$1][i], $2}' file1 file2
a 100
e 100
g 100
b 200
c 400
d 600
f 700

With other awks:

$ awk 'NR==FNR{a[$2]=a[$2] FS $1;next} $1 in a{split(a[$1],b); for (i in b) print b[i], $2}' file1 file2
a 100
e 100
g 100
b 200
c 400
d 600
f 700

The output order within each key is unspecified (effectively random) because of the for (i in ...) loop. If that's a problem, let us know what order you need.
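If the order within each key should follow file1, one option (a sketch, not taken from the answer) is to number the entries per key and loop over those numbers instead of using for (i in ...). It only uses POSIX awk comma subscripts, so it should not need GNU awk. The array names cnt and name are hypothetical, and file2 is assumed to hold 1 100 through 5 700 as implied by the outputs in the thread.

```shell
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

printf '%s\n' 'a 1' 'b 2' 'c 3' 'd 4' 'e 1' 'f 5' 'g 1' > file1
printf '%s\n' '1 100' '2 200' '3 400' '4 600' '5 700' > file2

result=$(awk '
NR == FNR {                    # first file (file1): name in $1, key in $2
    cnt[$2]++                  # how many names share this key so far
    name[$2, cnt[$2]] = $1     # store under (key, index) instead of overwriting
    next
}
$1 in cnt {                    # second file (file2): key in $1, value in $2
    for (i = 1; i <= cnt[$1]; i++)   # numeric loop keeps the file1 order
        print name[$1, i], $2
}' file1 file2)
printf '%s\n' "$result"
```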

edited Apr 20, 2017 at 12:19

answered Apr 20, 2017 at 12:10

Ed Morton



signjing · Accepted Answer · 2017-04-20 12:35:53Z

2

You can use the join command.
Before running join, you must sort both files on their join fields with the sort command.

$ sort -k 2 file1 > file1_sort
$ sort -k 1 file2 > file2_sort
$ join -1 2 -2 1 -o 1.1,2.2 file1_sort file2_sort > new_file
$ rm file1_sort
$ rm file2_sort
$ cat new_file
a 100
e 100
g 100
b 200
c 400
d 600
f 700

With process substitution (bash, ksh, zsh):

$ join -1 2 -2 1 -o 1.1,2.2 <(sort -k2 file1) <(sort file2)
a 100
e 100
g 100
b 200
c 400
d 600
f 700
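An annotated sketch of the same pipeline, with comments explaining each join option. It assumes the same inferred file2 contents (1 100 through 5 700), restricts each sort key to the join field with -k2,2 and -k1,1, and places -o before the file operands as the POSIX synopsis of join does.

```shell
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

printf '%s\n' 'a 1' 'b 2' 'c 3' 'd 4' 'e 1' 'f 5' 'g 1' > file1
printf '%s\n' '1 100' '2 200' '3 400' '4 600' '5 700' > file2

# join needs both inputs sorted on the join field:
# file1 on field 2 (the number), file2 on field 1 (the number).
sort -k2,2 file1 > file1_sort
sort -k1,1 file2 > file2_sort

# -1 2       : the join field of the first input is field 2
# -2 1       : the join field of the second input is field 1
# -o 1.1,2.2 : print field 1 of the first input, then field 2 of the second
# join emits one line per matching pair, so a, e and g are all kept.
result=$(join -1 2 -2 1 -o 1.1,2.2 file1_sort file2_sort)
printf '%s\n' "$result"
```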

edited Apr 20, 2017 at 12:35

Sundeep


answered Apr 20, 2017 at 12:19

signjing


1 Comment

Sundeep Over a year ago

upvote for different and working solution... would be better if you could add explanation as well

NeronLeVelu · Accepted Answer · 2017-04-21 06:52:50Z

1

My first reply answered the title but not the OP's actual question, so here is a second version for the OP (the original reply has been fully rewritten):

awk 'FNR==NR{R[$1]=$2;next}{$2=R[$2]}7' File2 File1
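The same one-liner spread over several lines with comments, as a sketch that assumes File2 holds 1 100 through 5 700 (inferred from the other answers). The final 7 is simply an always-true pattern (any non-zero number works), so awk's default action prints every File1 line after field 2 has been replaced. A File1 key with no match in File2 would be printed with an empty second field.

```shell
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

printf '%s\n' 'a 1' 'b 2' 'c 3' 'd 4' 'e 1' 'f 5' 'g 1' > File1
printf '%s\n' '1 100' '2 200' '3 400' '4 600' '5 700' > File2

result=$(awk '
FNR == NR { R[$1] = $2; next }   # File2: remember the value ($2) for each key ($1)
{ $2 = R[$2] }                   # File1: replace the key in field 2 by its value
7                                # non-zero constant is a true pattern, so print the line
' File2 File1)
printf '%s\n' "$result"
```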

edited Apr 21, 2017 at 6:52

answered Apr 20, 2017 at 12:03

NeronLeVelu


2 Comments

NeronLeVelu Over a year ago

Sorry, you are right. It was the OP that was a bit confusing with "But i don't want duplicate values to be overwritten in array", meaning "I want all values associated with the lines of File1 that correspond in File2". The code removes duplicates (which don't actually exist).

Ed Morton Over a year ago

You should just delete it rather than adding a comment asking us to forget it.
