1

I'm looking for some help with my script. I'm trying to loop over a bunch of CSV files, cut out the 3rd column, and append that to an output file as a new column. Here's what I have so far:

#!/bin/bash

for n in ~/sampledir/*
do
    awk -F "," '{print $3","}' $n >> output.csv
done

The output looks like this:

Column3,
3,
33,
333,
3333,
33333,
Column3,
3,
33,
333,
3333,
33333,
Column3,
3,
33,
333,
3333,
33333,
Column3,
3,
33,
333,
3333,
33333,
Column3,
3,
33,
333,
3333,
33333,

What I want is for the new information to be appended to the CSV in columns, so rather than the output above, I want this:

Column3,Column3,Column3,Column3,Column3,Column3,
3,3,3,3,3,3,
33,33,33,33,33,33,
333,333,333,333,333,333,
3333,3333,3333,3333,3333,3333,
33333,33333,33333,33333,33333,33333,

Any guidance would be helpful. Thanks y'all.

2 Answers 2

2

If the awk that you're using is gawk, you could have an awk script like this:

BEGIN { 
    FS="," 
    file_num = 0
    max_num_rows = 0
}

BEGINFILE { file_num++  }

{ 
    data[FNR SUBSEP file_num] = $3 
    if (FNR > max_num_rows) { max_num_rows++ }
}

END {

    for (i = 1; i <= max_num_rows; i++) {
        printf data[i SUBSEP 1]

        for (j = 2; j <= file_num; j++) {
            printf "," data[i SUBSEP j]
        }
        printf "\n"
    }
}

and then use like:

awk -f script.awk ~/sampledir/* > output.csv

Basic idea is to store the data you want into a multidimensional awk array, and then loop over the rows and columns and print the data. If you don't have BEGINFILE available, you could do some extra logic to do something similar. This should also work if the number of rows are different in the files.

Sign up to request clarification or add additional context in comments.

2 Comments

How would I adapt that gawk script to add in the filename as well? Basically, I'm looking to add the filename on top of each column to make it easier to identify which column belongs to which file.
There is a built in variable FILENAME that you can use. If the first row of the data files is headers, you could change the data[FNR SUBSEP file_num] = $3 to data[FNR SUBSEP file_num] = FNR == 1 ? FILENAME : $3
1

Using awk:

awk -F, '{row[FNR] = (FNR==NR? $3: row[FNR] FS $3)}
     END {for (i=1;i<=length(row);i++) print row[i]}' *.csv

Using paste (bash shell):

For a known number of files, you could have

paste -d, <(cut -d, -f3 file1.csv) <(cut -d, -f3 file2.csv)

but this doesn't scale for unknown number of csv files. A hacky way could be:

cmd="paste -d, "
for f in *.csv; do cmd+=' <(cut -d, -f3 '"$f"')'; done
eval "$cmd"

but better go with awk.


Note: Your csv files should not have delimiters nested into quoted fields. In this case you have to use a more complex field separator with GNU awk.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.