Append Columns to CSV in For Loop

Question

I'm looking for some help with my script. I'm trying to loop over a bunch of CSV files, cut out the 3rd column, and append that to an output file as a new column. Here's what I have so far:

#!/bin/bash

for n in ~/sampledir/*
do
    awk -F "," '{print $3","}' $n >> output.csv
done

The output looks like this:

Column3,
3,
33,
333,
3333,
33333,
Column3,
3,
33,
333,
3333,
33333,
Column3,
3,
33,
333,
3333,
33333,
Column3,
3,
33,
333,
3333,
33333,
Column3,
3,
33,
333,
3333,
33333,

What I want is for the new information to be appended to the CSV in columns, so rather than the output above, I want this:

Column3,Column3,Column3,Column3,Column3,Column3,
3,3,3,3,3,3,
33,33,33,33,33,33,
333,333,333,333,333,333,
3333,3333,3333,3333,3333,3333,
33333,33333,33333,33333,33333,33333,

Any guidance would be helpful. Thanks y'all.

Mitchell P · Accepted Answer · 2020-10-01 02:28:13Z

2

If the awk that you're using is gawk, you could have an awk script like this:

BEGIN { 
    FS="," 
    file_num = 0
    max_num_rows = 0
}

BEGINFILE { file_num++  }

{ 
    data[FNR SUBSEP file_num] = $3 
    if (FNR > max_num_rows) { max_num_rows++ }
}

END {

    for (i = 1; i <= max_num_rows; i++) {
        printf data[i SUBSEP 1]

        for (j = 2; j <= file_num; j++) {
            printf "," data[i SUBSEP j]
        }
        printf "\n"
    }
}

and then use like:

awk -f script.awk ~/sampledir/* > output.csv

Basic idea is to store the data you want into a multidimensional awk array, and then loop over the rows and columns and print the data. If you don't have BEGINFILE available, you could do some extra logic to do something similar. This should also work if the number of rows are different in the files.

answered Oct 1, 2020 at 2:28

Mitchell P

3083 silver badges8 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

John Perkins Over a year ago

How would I adapt that gawk script to add in the filename as well? Basically, I'm looking to add the filename on top of each column to make it easier to identify which column belongs to which file.

Mitchell P Over a year ago

There is a built in variable FILENAME that you can use. If the first row of the data files is headers, you could change the data[FNR SUBSEP file_num] = $3 to data[FNR SUBSEP file_num] = FNR == 1 ? FILENAME : $3

thanasisp · Accepted Answer · 2020-10-01 02:46:26Z

1

Using awk:

awk -F, '{row[FNR] = (FNR==NR? $3: row[FNR] FS $3)}
     END {for (i=1;i<=length(row);i++) print row[i]}' *.csv

Using paste (bash shell):

For a known number of files, you could have

paste -d, <(cut -d, -f3 file1.csv) <(cut -d, -f3 file2.csv)

but this doesn't scale for unknown number of csv files. A hacky way could be:

cmd="paste -d, "
for f in *.csv; do cmd+=' <(cut -d, -f3 '"$f"')'; done
eval "$cmd"

but better go with awk.

Note: Your csv files should not have delimiters nested into quoted fields. In this case you have to use a more complex field separator with GNU awk.

answered Oct 1, 2020 at 2:46

thanasisp

6,0053 gold badges18 silver badges32 bronze badges

Collectives™ on Stack Overflow

Append Columns to CSV in For Loop

2 Answers 2

2 Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

2 Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related