Merge multiple input files with awk

Question

I am trying to merge the contents of multiple files on a matching key with awk. I have only seen solutions for two input files, not for more. The input files look like this:

file1

1#a1
2#b1
3#c1
4#d1
6#f1

file2

1#a2
2#b2
3#c2
5#e2
6#f2

file3

1#a3#extra_field_1
2#b3#extra_field_2
3#c3#extra_field_3
4#d3#extra_field_4
5#e3#extra_field_5

The desired output is the following:

output

a1;a2;a3;extra_field_1
b1;b2;b3;extra_field_2
c1;c2;c3;extra_field_3
d1;;d3;extra_field_4
;e2;e3;extra_field_5

For this, I am using a bash script built around an awk command like the following:

$ awk -v OFS=';' -F '#' 'FNR==NR{a[$1]=$2;next} FNR!=NR{b[$1]=$2;next} NF==3{print a[$1],b[$1],$2,$3}' file1 file2 file3 > output

However, it seems to skip some of the inputs, because it doesn't produce any output. Any ideas?

Thanks.

Dima Chubarov · Accepted Answer · 2017-08-17 13:23:15Z

2

You could do that using just the join command:

join -t\# file1 file2 -j 1 |\
    join -t\# - file3 -j 1 |\
    cut -d\# --output-delimiter=\; -f2-5

Outputs

a1;a2;a3;extra_field_1
b1;b2;b3;extra_field_2
c1;c2;c3;extra_field_3
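
The pipeline above is an inner join, so keys 4 and 5 are dropped. As a rough sketch (not part of the original answer), the unpaired lines can be kept with `-a` and an explicit `-o` field list, which leaves unmatched fields empty. This assumes the inputs are sorted on the key, as in the question, and uses `tr` instead of `cut` to change the delimiter; the variable name `join_out` is only illustrative.

```shell
# Recreate the sample files from the question in a scratch directory.
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf '%s\n' '1#a1' '2#b1' '3#c1' '4#d1' '6#f1' > file1
printf '%s\n' '1#a2' '2#b2' '3#c2' '5#e2' '6#f2' > file2
printf '%s\n' '1#a3#extra_field_1' '2#b3#extra_field_2' '3#c3#extra_field_3' \
              '4#d3#extra_field_4' '5#e3#extra_field_5' > file3

# Step 1: full outer join of file1 and file2; field 0 is the join key.
# Step 2: keep every file3 line (-a 2); fields missing on the left stay empty.
join_out=$(
    join -t'#' -a 1 -a 2 -o 0,1.2,2.2 file1 file2 |
    join -t'#' -a 2 -o 1.2,1.3,2.2,2.3 - file3 |
    tr '#' ';'
)
printf '%s\n' "$join_out"
```

With the question's files this should print one line per record of file3, with empty fields where file1 or file2 has no matching key.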


1 Comment

BernatL Over a year ago

Simplifying the inputs with join could be a nice approach, thanks for the tip. However, it's hard to express more complex logic with this command.

James Brown · Accepted Answer · 2017-08-17 13:45:09Z

1

Here's one in awk. It doesn't take missing data into consideration, as you did not state in the question how it should be handled. It stores all data in an array keyed on the first field and outputs it in the END block:

$ awk '
BEGIN { FS="#"; OFS=";" }
{
    for(i=2;i<=NF;i++)
        a[$1]=a[$1] (a[$1]==""?"":OFS) $i
}
END {
    for(i in a)
        print a[i]
}' f1 f2 f3
a1;a2;a3;extra_field_1
b1;b2;b3;extra_field_2
c1;c2;c3;extra_field_3
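
Note that `for (i in a)` visits keys in an unspecified order. A possible variation, sketched here with the question's sample data, records each key the first time it appears and prints in that order. The arrays `seen` and `order` and the variable `concat_out` are names introduced for this illustration only.

```shell
# Recreate the sample files from the question in a scratch directory.
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf '%s\n' '1#a1' '2#b1' '3#c1' '4#d1' '6#f1' > file1
printf '%s\n' '1#a2' '2#b2' '3#c2' '5#e2' '6#f2' > file2
printf '%s\n' '1#a3#extra_field_1' '2#b3#extra_field_2' '3#c3#extra_field_3' \
              '4#d3#extra_field_4' '5#e3#extra_field_5' > file3

concat_out=$(awk '
BEGIN { FS = "#"; OFS = ";" }
!seen[$1]++ { order[++n] = $1 }     # remember each key the first time it appears
{
    # append every non-key field to the string stored for this key
    for (i = 2; i <= NF; i++)
        a[$1] = a[$1] (a[$1] == "" ? "" : OFS) $i
}
END {
    # print in first-seen key order instead of the unspecified for-in order
    for (k = 1; k <= n; k++)
        print a[order[k]]
}' file1 file2 file3)
printf '%s\n' "$concat_out"
```

With these inputs the keys are first seen in the order 1, 2, 3, 4, 6, 5, so six lines are printed. As in the answer, a missing value is simply skipped rather than left as an empty field.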


2 Comments

BernatL Over a year ago

I guess my example could have been more exhaustive. In fact, the desired output would gather each record of the third file, adding every 2nd field of the other two files with matching keys.

James Brown Over a year ago

Sure. As there is an infinite number of possible questions, you tend to get better results if you provide us with the facts instead of having us guess which one is yours.
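
For the requirement as clarified in the comment above (one output line per record of the third file, plus the 2nd field of the other two files where the key matches), a minimal sketch follows. In the question's command every line satisfies either FNR==NR or FNR!=NR, and both blocks end in next, so the third block never runs. Counting files explicitly avoids that. The counter `fnum` and the variable `merged` are names made up for this example, and the sketch assumes no input file is empty.

```shell
# Recreate the sample files from the question in a scratch directory.
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf '%s\n' '1#a1' '2#b1' '3#c1' '4#d1' '6#f1' > file1
printf '%s\n' '1#a2' '2#b2' '3#c2' '5#e2' '6#f2' > file2
printf '%s\n' '1#a3#extra_field_1' '2#b3#extra_field_2' '3#c3#extra_field_3' \
              '4#d3#extra_field_4' '5#e3#extra_field_5' > file3

merged=$(awk -F'#' -v OFS=';' '
FNR == 1  { fnum++ }                  # a new input file starts (assumes none is empty)
fnum == 1 { a[$1] = $2; next }        # file1: remember the 2nd field by key
fnum == 2 { b[$1] = $2; next }        # file2: remember the 2nd field by key
NF == 3   { print a[$1], b[$1], $2, $3 }   # file3: emit the merged record
' file1 file2 file3)
printf '%s\n' "$merged"
```

Keys that are absent from file1 or file2 produce an empty field, and key 6, which is not in file3, produces no line.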

Guru · Accepted Answer · 2017-08-17 13:49:15Z

1

One more way using paste and awk:

paste -d"#" file1 file2 file3 | awk -F"#" '{print $2,$4,$6,$7}' OFS=";"


jared_mamrot · Accepted Answer · 2023-03-23 10:58:09Z

This solution merges two or more files and fills missing/blank fields with "NA" (requires GNU awk):

awk 'BEGIN {
        FS = OFS = "#"
        PROCINFO["sorted_in"] = "@val_str_asc"
}

FNR == 1 {
        filecount++
        numfields[filecount] = NF
        if (NR == 1) {
                a = split($0, header, FS)
        } else {
                for (i = 2; i <= NF; i++) {
                        header[++a] = $i
                }
        }
}

FNR > 1 {
        for (j = 2; j <= NF; j++) {
                b[$1][filecount, j] = $j
        }
}

END {
        for (k = 1; k <= length(header); k++) {
                printf "%s%s", header[k], ((k < length(header)) ? OFS : ORS)
        }
        for (l in b) {
                printf "%s", l OFS
                for (m = 1; m <= filecount; m++) {
                        for (n = 2; n <= numfields[m]; n++) {
                                printf "%s%s",
                                (b[l][m, n] == "" ? "NA" : b[l][m, n]),
                                ((m + n < filecount + numfields[m]) ? OFS : ORS)
                        }
                }
        }
}' file*
1#a1#a2#a3#extra_field_1
2#b1#b2#b3#extra_field_2
3#c1#c2#c3#extra_field_3
4#d1#NA#d3#extra_field_4
5#NA#e2#e3#extra_field_5
6#f1#f2#NA#NA
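
If GNU awk is not available, something along these lines might work in a POSIX awk, using SUBSEP-style multi-index keys instead of arrays of arrays. This is only a sketch: unlike the script above it treats every line as data (no header row), takes the widest line of each file as that file's column count, and sorts the keys with `sort` afterwards. All array and variable names (`val`, `width`, `keys`, `na_out`) are made up for the illustration.

```shell
# Recreate the sample files from the question in a scratch directory.
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf '%s\n' '1#a1' '2#b1' '3#c1' '4#d1' '6#f1' > file1
printf '%s\n' '1#a2' '2#b2' '3#c2' '5#e2' '6#f2' > file2
printf '%s\n' '1#a3#extra_field_1' '2#b3#extra_field_2' '3#c3#extra_field_3' \
              '4#d3#extra_field_4' '5#e3#extra_field_5' > file3

na_out=$(awk -F'#' -v OFS='#' '
FNR == 1 { nfiles++ }                          # a new input file starts
{
    if (!seen[$1]++) keys[++nkeys] = $1        # remember keys in first-seen order
    if (NF > width[nfiles]) width[nfiles] = NF # widest line seen in this file
    for (j = 2; j <= NF; j++)
        val[$1, nfiles, j] = $j                # value indexed by key, file, column
}
END {
    for (k = 1; k <= nkeys; k++) {
        line = keys[k]
        for (f = 1; f <= nfiles; f++)
            for (j = 2; j <= width[f]; j++) {
                cell = val[keys[k], f, j]      # empty string when never stored
                line = line OFS (cell == "" ? "NA" : cell)
            }
        print line
    }
}' file1 file2 file3 | sort -t'#' -k1,1n)
printf '%s\n' "$na_out"
```

On the question's three files this should reproduce the six lines shown above.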

Different example data:

head file*
==> file1 <==
ID,Value
A1,10
A2,20
A3,30
A4,40

==> file2 <==
ID,Score,Extra
A2,200,True
A1,100,False

==> file3 <==
ID,Evaluation
A1,Correct
A3,Incorrect

==> file4 <==
ID,Value1,Value2,Value3,Value4
A1,,1,1
A2,3,3,3,3

awk 'BEGIN {
        FS = OFS = ","
        PROCINFO["sorted_in"] = "@val_str_asc"
}

FNR == 1 {
        filecount++
        numfields[filecount] = NF
        if (NR == 1) {
                a = split($0, header, FS)
        } else {
                for (i = 2; i <= NF; i++) {
                        header[++a] = $i
                }
        }
}

FNR > 1 {
        for (j = 2; j <= NF; j++) {
                b[$1][filecount, j] = $j
        }
}

END {
        for (k = 1; k <= length(header); k++) {
                printf "%s%s", header[k], ((k < length(header)) ? OFS : ORS)
        }
        for (l in b) {
                printf "%s", l OFS
                for (m = 1; m <= filecount; m++) {
                        for (n = 2; n <= numfields[m]; n++) {
                                printf "%s%s",
                                (b[l][m, n] == "" ? "NA" : b[l][m, n]),
                                ((m + n < filecount + numfields[m]) ? OFS : ORS)
                        }
                }
        }
}' file1 file2 file3 file4
ID,Value,Score,Extra,Evaluation,Value1,Value2,Value3,Value4
A1,10,100,False,Correct,NA,1,1,NA
A2,20,200,True,NA,3,3,3,3
A3,30,NA,NA,Incorrect,NA,NA,NA,NA
A4,40,NA,NA,NA,NA,NA,NA,NA
