2

I have two files. I want to print the data from file1, using file2 to look up the value for each line.

File1:

a 1 
b 2 
c 3 
d 4 
e 1 
f 5 
g 1

File2:

1 100
2 200
3 400
4 600
5 700

Using the command below:

 awk 'NR==FNR{a[$2]=$1;next}$1 in a{print a[$1] " " $2}' file1 file2

I got the following output:

g 100
b 200
c 400
d 600
f 700

But I don't want duplicate values to be overwritten in the array. Desired output:

a 100
e 100
g 100
b 200
c 400
d 600
f 700

Is it possible to store duplicate keys in an awk array, like a multimap in C++? Or is there another way to do this? Please help me out.

4 Answers

4

If (and only if) the first fields of the second file (the single-digit numbers) are unique, you could turn the logic around and use that field as the key to the array:

$ awk 'FNR==NR { a[$1] = $2; next } $2 in a {print $1, a[$2]} ' file2 file1
a 100
b 200
c 400
d 600
e 100
f 700
g 100

Now the output order is the order of file1, so not what you wanted, but a pipe to sort -nk2 will fix that.
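As a rough sketch of that pipe, the script below recreates the sample files from the question in a scratch directory (the mktemp directory and the out variable are just scaffolding for the example) and sorts numerically on the second column only:

```shell
# Recreate the sample files from the question in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 'a 1' 'b 2' 'c 3' 'd 4' 'e 1' 'f 5' 'g 1' > "$tmp/file1"
printf '%s\n' '1 100' '2 200' '3 400' '4 600' '5 700' > "$tmp/file2"

# Look up each file1 line in file2, then sort numerically on column 2.
out=$(awk 'FNR==NR { a[$1] = $2; next } $2 in a { print $1, a[$2] }' \
      "$tmp/file2" "$tmp/file1" | sort -k2,2n)
printf '%s\n' "$out"
rm -r "$tmp"
```

Lines that share the same value (a, e, g here) are ordered by sort's fallback comparison of the whole line, which happens to give the order shown in the question.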

There's the border case of what to do if the first file has a line where the second field isn't in the second file (say, h 9). The $2 in a condition would skip those entirely. Without the condition they would be printed, with an empty second field (just h[space] in the output).
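To make the border case concrete, here is a small sketch with a shortened file1 that contains the unmatched line h 9. The second variant keeps unmatched lines and marks them with a placeholder; the string "NA" is an arbitrary choice for this example, not something from the question:

```shell
tmp=$(mktemp -d)
printf '%s\n' 'a 1' 'b 2' 'h 9' > "$tmp/file1"   # "h 9" has no match in file2
printf '%s\n' '1 100' '2 200' > "$tmp/file2"

# Variant 1: unmatched lines are skipped by the "$2 in a" condition.
skipped=$(awk 'FNR==NR { a[$1] = $2; next } $2 in a { print $1, a[$2] }' \
          "$tmp/file2" "$tmp/file1")

# Variant 2: unmatched lines are kept and marked with a placeholder.
marked=$(awk 'FNR==NR { a[$1] = $2; next }
              { print $1, ($2 in a ? a[$2] : "NA") }' \
         "$tmp/file2" "$tmp/file1")

printf '%s\n' "$skipped" "---" "$marked"
rm -r "$tmp"
```

Which variant is appropriate depends on whether a missing key should be treated as an error, ignored, or reported.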


2 Comments

@Sundeep, true, that's better since it allows changing OFS if desired.
Got my vote. I was too focused on the "Is it possible to store duplicate key in array in awk script" part of the question and got sucked in by the OP's approach...
3

With GNU awk for true multi-dimensional arrays:

$ awk 'NR==FNR{a[$2][$1]=$1;next} $1 in a{for (i in a[$1]) print a[$1][i], $2}' file1 file2
a 100
e 100
g 100
b 200
c 400
d 600
f 700

With other awks:

$ awk 'NR==FNR{a[$2]=a[$2] FS $1;next} $1 in a{split(a[$1],b); for (i in b) print b[i], $2}' file1 file2
a 100
e 100
g 100
b 200
c 400
d 600
f 700

Output order per key is unspecified, because for (i in ...) visits array elements in an arbitrary order. If that's a problem, let us know what order you need.
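If the order in which the names appear in file1 is the one that matters, one possible variation of the portable version is to use the count returned by split and loop over the indices numerically instead of using in. This is only a sketch; the scratch directory and the out variable exist just to make it self-contained:

```shell
tmp=$(mktemp -d)
printf '%s\n' 'a 1' 'b 2' 'c 3' 'd 4' 'e 1' 'f 5' 'g 1' > "$tmp/file1"
printf '%s\n' '1 100' '2 200' '3 400' '4 600' '5 700' > "$tmp/file2"

# a[key] accumulates the names as a space-separated string, in file1 order.
# split() returns the number of pieces, so an index loop preserves that order.
out=$(awk 'NR==FNR { a[$2] = a[$2] FS $1; next }
           $1 in a { n = split(a[$1], b)
                     for (i = 1; i <= n; i++) print b[i], $2 }' \
      "$tmp/file1" "$tmp/file2")
printf '%s\n' "$out"
rm -r "$tmp"
```

This relies on the default field separator, so the leading space in the accumulated string is ignored by split.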


2

You can use the join command.
Before running join, you must sort both files on their join fields with the sort command.

$ sort -k2,2 file1 > file1_sort
$ sort -k1,1 file2 > file2_sort
$ join -1 2 -2 1 -o 1.1,2.2 file1_sort file2_sort > new_file
$ rm file1_sort file2_sort
$ cat new_file
a 100
e 100
g 100
b 200
c 400
d 600
f 700


With Process Substitution

$ join -1 2 -2 1 -o 1.1,2.2 <(sort -k2,2 file1) <(sort -k1,1 file2)
a 100
e 100
g 100
b 200
c 400
d 600
f 700
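For readers unfamiliar with join, here is an annotated sketch of the same approach. It uses temporary files rather than process substitution so that it also runs in a plain POSIX shell; the scratch directory and the out variable are only there to make the example self-contained:

```shell
tmp=$(mktemp -d)
printf '%s\n' 'a 1' 'b 2' 'c 3' 'd 4' 'e 1' 'f 5' 'g 1' > "$tmp/file1"
printf '%s\n' '1 100' '2 200' '3 400' '4 600' '5 700' > "$tmp/file2"

# join requires both inputs to be sorted on their join field.
sort -k2,2 "$tmp/file1" > "$tmp/file1_sort"   # file1 is joined on column 2
sort -k1,1 "$tmp/file2" > "$tmp/file2_sort"   # file2 is joined on column 1

# -1 2       : the join field of the first file is its 2nd column
# -2 1       : the join field of the second file is its 1st column
# -o 1.1,2.2 : print column 1 of the first file, then column 2 of the second
out=$(join -1 2 -2 1 -o 1.1,2.2 "$tmp/file1_sort" "$tmp/file2_sort")
printf '%s\n' "$out"
rm -r "$tmp"
```

Because join pairs every line of the first file with every matching line of the second, the three lines of file1 that share the key 1 each get their own output line.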

1 Comment

upvote for different and working solution... would be better if you could add explanation as well
1

My first reply answered the title rather than the OP's actual question, so here is a second version for the OP (the original reply has been completely rewritten):

awk 'FNR==NR{R[$1]=$2;next}{$2=R[$2]}7' File2 File1
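The one-liner is quite terse, so here is a spelled-out sketch of what it appears to do. The trailing 7 is simply a constant pattern that is always true, so awk applies its default action, print; the version below writes that print explicitly. The scratch directory and the out variable are scaffolding for the example:

```shell
tmp=$(mktemp -d)
printf '%s\n' 'a 1' 'b 2' 'c 3' 'd 4' 'e 1' 'f 5' 'g 1' > "$tmp/File1"
printf '%s\n' '1 100' '2 200' '3 400' '4 600' '5 700' > "$tmp/File2"

out=$(awk '
    FNR == NR { R[$1] = $2; next }   # first file (File2): map key to value
    { $2 = R[$2]; print }            # second file (File1): replace key with value
' "$tmp/File2" "$tmp/File1")
printf '%s\n' "$out"
rm -r "$tmp"
```

The output follows the order of File1, and a key that is missing from File2 would leave the second field empty.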

2 Comments

Sorry, you are right. The OP's wording was a bit confusing: "But i don't want duplicate values to be overwritten in array" actually means "I want every line of File1 paired with its corresponding value from File2". The code removed duplicates (which don't in fact exist).
You should just delete it rather than adding a comment for us to forget it.
