I have ~100 files and I would like to perform an arithmetic operation (e.g. a sum) on the second column across all of them: add the value in the first row of one file to the value in the first row of the next file, and so on for every row of column 2 in each file.

My actual files have ~30 000 rows, so manipulating the rows by hand is not feasible.

fileA

1   1  
2   100  
3   1000  
4   15000   

fileB

1   7  
2   500  
3   6000    
4   20000  

fileC

1   4  
2   300  
3   8000    
4   70000

output:

1   12  
2   900  
3   15000  
4   105000  

I used the script below and ran it as: script.sh listofnames.txt. (All the files have the same name but sit in different directories, so I refer to each one through $line, which is read from a file listing the directory names.) It gives me a syntax error, and I am looking for another way to define the "sum".

while IFS='' read -r line || [[ -n "$line" ]]; do
    awk '{"'$sum'"+=$3; print $1,$2,"'$sum'"}' ../$line/file.txt >> output.txt
    echo $sum
done < "$1"

1 Answer

$ paste fileA fileB fileC | awk '{sum=0; for (i=2;i<=NF;i+=2) sum+=$i; print $1, sum}'
1 12
2 900
3 15000
4 105000

or if you wanted to do it all in awk:

$ awk '{key[FNR]=$1; sum[FNR]+=$2} END{for (i=1; i<=FNR;i++) print key[i], sum[i]}' fileA fileB fileC
1 12
2 900
3 15000
4 105000
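
One caveat about the all-awk version: in common awk implementations, FNR inside the END block holds the record count of the last file read, so if the files could differ in length, rows beyond the last file's length would not be printed. A minimal sketch that tracks the largest row number instead is below. The temporary directory, the shortened fileB, and the variable names max and result are assumptions made up for this demonstration, not part of the question:

```shell
# Hypothetical demo files, created only for illustration.
tmp=$(mktemp -d)
printf '1 1\n2 100\n3 1000\n' > "$tmp/fileA"
printf '1 7\n2 500\n'          > "$tmp/fileB"   # deliberately one row shorter

# Remember the largest FNR seen in any file and loop up to that in END,
# instead of relying on the FNR left over from the last file.
result=$(awk '{ key[FNR] = $1; sum[FNR] += $2; if (FNR > max) max = FNR }
              END { for (i = 1; i <= max; i++) print key[i], sum[i] }' \
         "$tmp/fileA" "$tmp/fileB")
printf '%s\n' "$result"

rm -rf "$tmp"
```

If all of your files really do have the same number of rows, the simpler `i<=FNR` loop above is fine as it is.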

If you have a list of directories in a file named "foo" and every file you're interested in in every directory is named "bar" then you can do:

IFS=$'\n' files=( $(awk '{print $0 "/bar"}' foo) )
cmd "${files[@]}"

where cmd is awk or paste or anything else you want to run on those files. Look:

$ cat foo
abc
def
ghi klm

$ IFS=$'\n' files=( $(awk '{print $0 "/bar"}' foo) )

$ awk 'BEGIN{ for (i=1;i<ARGC;i++) print "<" ARGV[i] ">"; exit}' "${files[@]}"
<abc/bar>
<def/bar>
<ghi klm/bar>

So if your files are all named file.txt and your directory names are stored in listofnames.txt then your script would be:

IFS=$'\n' files=( $(awk '{print $0 "/file.txt"}' listofnames.txt) )

followed by whichever of these you prefer:

paste "${files[@]}" | awk '{sum=0; for (i=2;i<=NF;i+=2) sum+=$i; print $1, sum}'

awk '{key[FNR]=$1; sum[FNR]+=$2} END{for (i=1; i<=FNR;i++) print key[i], sum[i]}' "${files[@]}"
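
Note that `IFS=$'\n' files=( ... )` is two plain assignments, so IFS stays changed for the rest of the script, and the unquoted `$(...)` is also subject to filename globbing. If either of those matters to you, a sketch of an alternative that fills the same array with a read loop is below. The temporary directory, its sample contents, and the variable names dir and result are assumptions for the demonstration only:

```shell
# Hypothetical demo layout: two directories (one with a space in its name),
# each holding a file.txt, plus a list of the directory names.
tmp=$(mktemp -d)
mkdir -p "$tmp/abc" "$tmp/ghi klm"
printf '1 1\n2 100\n' > "$tmp/abc/file.txt"
printf '1 7\n2 500\n' > "$tmp/ghi klm/file.txt"
printf 'abc\nghi klm\n' > "$tmp/listofnames.txt"

# Build the array one line at a time; IFS is only changed for the read itself.
files=()
while IFS= read -r dir || [[ -n $dir ]]; do
    [[ -n $dir ]] && files+=( "$tmp/$dir/file.txt" )   # skip empty lines
done < "$tmp/listofnames.txt"

# Same awk command as above, run on the collected files.
result=$(awk '{key[FNR]=$1; sum[FNR]+=$2} END{for (i=1; i<=FNR;i++) print key[i], sum[i]}' "${files[@]}")
printf '%s\n' "$result"

rm -rf "$tmp"
```

In your own script you would drop the setup lines and the "$tmp/" prefix and read from listofnames.txt (or "$1") directly.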

10 Comments

I have over a hundred files (with irregular string names, not just numbers), so writing fileA fileB fileC at the end is not really feasible. Can that part be scripted somehow?
If there's nothing else in your directory then awk '...' file* is all you need.
They are all in separate directories, and I usually use a file that lists those directories to refer to them, so I don't know how to apply that to this solution.
That's a completely different question to the one you asked. Can you cd to the directory under which your files exist and do awk '...' */file.txt or use find to find them? If not - how do you provide them to your script today and why can't you just do that again?
Yup that'll do it. Obviously you should have included that info in your question and made your sample input look like your real input. It's an easy fix of course - just change +=2 to +=4 in the awk script.