Adding column values from multiple different files

Question

I have ~100 files and I would like to perform an arithmetic operation (e.g. a sum) on the second column across all of them: the value in the first row of one file is added to the value in the first row of the next file, and so on for every row of column 2.

My actual files have ~30,000 rows each, so manipulating the rows by hand is not an option.

fileA

fileB

fileC

output:

I tried the script below and ran it as: script.sh listofnames.txt. All the files have the same name but live in different directories, so listofnames.txt holds the list of directory names and the script refers to each one as $line. This gives me a syntax error, and I am looking for another way to define the "sum".

while IFS='' read -r line || [[ -n "$line" ]]; do
    awk '{"'$sum'"+=$3; print $1,$2,"'$sum'"}' ../$line/file.txt >> output.txt
    echo $sum
done < "$1"

Ed Morton · Accepted Answer · 2017-03-24 19:27:05Z


$ paste fileA fileB fileC | awk '{sum=0; for (i=2;i<=NF;i+=2) sum+=$i; print $1, sum}'
1 12
2 900
3 15000
4 105000

or if you wanted to do it all in awk:

$ awk '{key[FNR]=$1; sum[FNR]+=$2} END{for (i=1; i<=FNR;i++) print key[i], sum[i]}' fileA fileB fileC
1 12
2 900
3 15000
4 105000
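
To see why the two commands agree, here is a minimal sketch run on made-up data. The files a.txt, b.txt and c.txt and their contents are hypothetical stand-ins, not the asker's fileA, fileB and fileC. It assumes every file has exactly two columns and the same number of rows.

```shell
#!/usr/bin/env bash
# Hypothetical sample data (not the asker's files): key in column 1, value in column 2.
tmpdir=$(mktemp -d)
printf '1 1\n2 10\n3 100\n' > "$tmpdir/a.txt"
printf '1 2\n2 20\n3 200\n' > "$tmpdir/b.txt"
printf '1 3\n2 30\n3 300\n' > "$tmpdir/c.txt"

# paste glues row N of every file into one line, e.g. "1 1 <tab> 1 2 <tab> 1 3".
# awk splits on spaces and tabs alike, so the values land in fields 2, 4 and 6.
paste_result=$(paste "$tmpdir/a.txt" "$tmpdir/b.txt" "$tmpdir/c.txt" |
    awk '{sum=0; for (i=2;i<=NF;i+=2) sum+=$i; print $1, sum}')

# FNR restarts at 1 for each input file, so sum[FNR] accumulates per row number.
awk_result=$(awk '{key[FNR]=$1; sum[FNR]+=$2} END{for (i=1; i<=FNR;i++) print key[i], sum[i]}' \
    "$tmpdir/a.txt" "$tmpdir/b.txt" "$tmpdir/c.txt")

printf '%s\n' "$paste_result"
rm -rf "$tmpdir"
```

The paste version needs every file open at once and builds very wide lines, while the awk-only version reads the files one after another and keeps one key and one running sum per row in memory.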

If you have a list of directories in a file named "foo" and every file you're interested in in every directory is named "bar" then you can do:

IFS=$'\n' files=( $(awk '{print $0 "/bar"}' foo) )
cmd "${files[@]}"

where cmd is awk or paste or anything else you want to run on those files. Look:

$ cat foo
abc
def
ghi klm

$ IFS=$'\n' files=( $(awk '{print $0 "/bar"}' foo) )

$ awk 'BEGIN{ for (i=1;i<ARGC;i++) print "<" ARGV[i] ">"; exit}' "${files[@]}"
<abc/bar>
<def/bar>
<ghi klm/bar>

So if your files are all named file.txt and your directory names are stored in listofnames.txt then your script would be:

IFS=$'\n' files=( $(awk '{print $0 "/file.txt"}' listofnames.txt) )

followed by whichever of these you prefer:

paste "${files[@]}" | awk '{sum=0; for (i=2;i<=NF;i+=2) sum+=$i; print $1, sum}'

awk '{key[FNR]=$1; sum[FNR]+=$2} END{for (i=1; i<=FNR;i++) print key[i], sum[i]}' "${files[@]}"
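
One caveat with the IFS=$'\n' files=( $(...) ) idiom: it leaves IFS changed for the rest of the shell session, and the unquoted command substitution is still subject to filename expansion, so a directory name containing * or ? could be expanded. A possible variant, shown here as a sketch rather than as part of the original answer, fills the array with a read loop instead. The temporary directory layout and file contents below are hypothetical and exist only to make the example runnable.

```shell
#!/usr/bin/env bash
# Hypothetical layout for demonstration only: three directories, each holding file.txt.
tmpdir=$(mktemp -d)
mkdir -p "$tmpdir/abc" "$tmpdir/def" "$tmpdir/ghi klm"
printf '1 1\n2 10\n' > "$tmpdir/abc/file.txt"
printf '1 2\n2 20\n' > "$tmpdir/def/file.txt"
printf '1 3\n2 30\n' > "$tmpdir/ghi klm/file.txt"
printf '%s\n' abc def 'ghi klm' > "$tmpdir/listofnames.txt"

# Read one directory name per line. "$dir" is quoted everywhere, so no word
# splitting or globbing is applied to it, and IFS is only changed for read.
files=()
while IFS= read -r dir || [[ -n "$dir" ]]; do
    if [[ -n "$dir" ]]; then            # skip blank lines in the list
        files+=( "$tmpdir/$dir/file.txt" )
    fi
done < "$tmpdir/listofnames.txt"

result=$(awk '{key[FNR]=$1; sum[FNR]+=$2} END{for (i=1; i<=FNR;i++) print key[i], sum[i]}' "${files[@]}")
printf '%s\n' "$result"
rm -rf "$tmpdir"
```

In a real script the "$tmpdir/" prefix would be replaced by whatever path leads to the directories, for example ../ as in the question's attempt.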

edited Mar 24, 2017 at 19:27

answered Mar 24, 2017 at 16:23


10 Comments

Layla_K Over a year ago

I have over a hundred files (with irregular string names,not just numbers) so writing fileA fileB fileC at the end is not really feasible. Can that part be scripted somehow?

Ed Morton Over a year ago

If there's nothing else in your directory then awk '...' file* is all you need.

Layla_K Over a year ago

They are all in separate directories and I usually used a file for referencing to those directories so I don't know how to apply it to this solution

Ed Morton Over a year ago

That's a completely different question to the one you asked. Can you cd to the directory under which your files exist and do awk '...' */file.txt or use find to find them? If not - how do you provide them to your script today and why can't you just do that again?

Ed Morton Over a year ago

Yup that'll do it. Obviously you should have included that info in your question and made your sample input look like your real input. It's an easy fix of course - just change +=2 to +=4 in the awk script.
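
The comment that prompted this reply is not preserved here, so the following is only a sketch of what the +=4 change does, assuming each real file has four columns with the value of interest still in column 2. The file names and contents are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical four-column files (assumed layout: key, value, two extra columns).
tmpdir=$(mktemp -d)
printf '1 5 x y\n2 50 x y\n' > "$tmpdir/a.txt"
printf '1 7 x y\n2 70 x y\n' > "$tmpdir/b.txt"

# After paste each row has 8 fields; the column-2 values sit in fields 2 and 6,
# so the loop has to step by the number of columns per file (4), not by 2.
wide_result=$(paste "$tmpdir/a.txt" "$tmpdir/b.txt" |
    awk '{sum=0; for (i=2;i<=NF;i+=4) sum+=$i; print $1, sum}')

printf '%s\n' "$wide_result"
rm -rf "$tmpdir"
```

The awk-only version that indexes by FNR reads $2 from each file directly, so it would not need this adjustment under the same assumption.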
