1

Say I have a tab-delimited data file with 10 columns. With awk, it's easy to extract column 7, for example, and output that into a separate file. (See this question, for example.)

What if I have 5 such data files, and I would like to extract column 7 from each of them and make a new file with 5 data columns, one for the column 7 of each input file? Can this be done from the command line with awk and other commands?

Or should I just write up a Python script to handle it?

2 Answers

1

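awk -F'\t' '{a[FNR] = a[FNR] (NR==FNR ? "" : "\t") $7} END{for(i=1;i<=FNR;i++) print a[i]}' file1 file2 file3 file4 file5

a is an array indexed by line number. a[FNR] accumulates column 7 of line FNR from each file in turn, separated by tabs.

FNR is the number of records read so far from the current input file. It restarts at the beginning of each file, so the first line of every file has FNR equal to 1. NR keeps counting across files, so NR==FNR is true only while the first file is being read, which is when no leading separator is needed.

END{for(i=1;i<=FNR;i++) print a[i]} prints the contents of a once all input has been read. This assumes every file has the same number of lines, because in END FNR still holds the line count of the last file.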
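To see how the per-line array accumulates, here is a minimal sketch on two tiny sample files. The file names and cell values are made up for illustration, and the input is assumed to be tab-delimited as in the question. The loop starts at 1 because FNR counts from 1.

```shell
# Hypothetical sample data: two tab-delimited files, 7 columns and 2 rows each.
tmpdir=$(mktemp -d)
printf 'a\tb\tc\td\te\tf\tx1\na\tb\tc\td\te\tf\tx2\n' > "$tmpdir/f1.tsv"
printf 'a\tb\tc\td\te\tf\ty1\na\tb\tc\td\te\tf\ty2\n' > "$tmpdir/f2.tsv"

# Line FNR of every file appends its 7th field to the same array slot.
# NR == FNR holds only for the first file, so no separator is added there.
merged=$(awk -F'\t' '
    { a[FNR] = a[FNR] (NR == FNR ? "" : "\t") $7 }
    END { for (i = 1; i <= FNR; i++) print a[i] }
' "$tmpdir/f1.tsv" "$tmpdir/f2.tsv")

printf '%s\n' "$merged"
rm -r "$tmpdir"
```

After the first file a[1] is "x1" and a[2] is "x2"; the second file then extends them to "x1<TAB>y1" and "x2<TAB>y2".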

0

If the data is small enough to store it all in memory then this should work:

awk 'BEGIN {FS=OFS="\t"} {out[FNR]=out[FNR] (out[FNR]?OFS:"") $7; max=(FNR>max)?FNR:max} END {for (i=1; i<=max; i++) {print out[i]}}' file1 file2 file3 file4 file5

If it isn't, you would need something fancier that can seek around file streams or read one line at a time from each of the files (a shell loop with N calls to read could do this).
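One way to avoid holding everything in memory, sketched here with paste and cut rather than the read loop mentioned above, is to let paste join the files one line at a time and then pick the wanted fields. This is only a sketch under a strong assumption: every line of every file has exactly 10 tab-separated fields, so column 7 of file k lands at position 7 + 10*(k-1). The sample files and their contents are hypothetical.

```shell
# Hypothetical sample data: 5 tab-delimited files, 10 columns and 2 rows each.
# The cell in file f, row r, column c is written as "f<f>r<r>c<c>".
tmpdir=$(mktemp -d)
for f in 1 2 3 4 5; do
    awk -v f="$f" 'BEGIN {
        OFS = "\t"
        for (r = 1; r <= 2; r++) {
            line = ""
            for (c = 1; c <= 10; c++)
                line = line (c > 1 ? OFS : "") "f" f "r" r "c" c
            print line
        }
    }' > "$tmpdir/file$f"
done

# paste glues line N of every file into one 50-column row, one row at a time.
# cut then keeps fields 7, 17, 27, 37 and 47, i.e. column 7 of each input file.
# Assumption: exactly 10 tab-separated fields on every line of every file.
streamed=$(paste "$tmpdir/file1" "$tmpdir/file2" "$tmpdir/file3" "$tmpdir/file4" "$tmpdir/file5" |
    cut -f7,17,27,37,47)

printf '%s\n' "$streamed"
rm -r "$tmpdir"
```

If the column count varies between files, the field positions passed to cut would have to be recomputed, or column 7 cut from each file separately before pasting.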
