Adding new columns to the CSV file

Question

I have a file that contains 5 columns; the number of lines varies. I want to append three columns populated from variables. The variable values stay the same for every line.

At the moment I am doing it in the following way:

#!/bin/bash

newvar1="abcd6"
newvar2="abcd7"
newvar3="abcd8"

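# remove helper files left over from a previous run
rm -f *.txtyy
number_of_lines=$(wc -l smallsample.txt | awk '{print $1}')
# write one copy of each variable per input line
for i in $(seq "$number_of_lines"); do
    echo "$newvar1" >> paste1.txtyy
    echo "$newvar2" >> paste2.txtyy
    echo "$newvar3" >> paste3.txtyy
done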

paste -d "," smallsample.txt paste1.txtyy paste2.txtyy paste3.txtyy

Script output is:

# bash paste.sh
abcd1,abcd2,abcd3,abcd4,abcd5,abcd6,abcd7,abcd8
abcd1,abcd2,abcd3,abcd4,abcd5,abcd6,abcd7,abcd8
abcd1,abcd2,abcd3,abcd4,abcd5,abcd6,abcd7,abcd8
abcd1,abcd2,abcd3,abcd4,abcd5,abcd6,abcd7,abcd8

Execution time on 1,000,000 lines on my machine is:

time bash paste.sh

real    0m24.257s
user    0m14.668s
sys     0m9.380s

Input:

abcd1,abcd2,abcd3,abcd4,abcd5
abcd1,abcd2,abcd3,abcd4,abcd5
abcd1,abcd2,abcd3,abcd4,abcd5
abcd1,abcd2,abcd3,abcd4,abcd5
...
abcd1,abcd2,abcd3,abcd4,abcd5

Required output:

abcd1,abcd2,abcd3,abcd4,abcd5,abcd6,abcd7,abcd8
abcd1,abcd2,abcd3,abcd4,abcd5,abcd6,abcd7,abcd8
abcd1,abcd2,abcd3,abcd4,abcd5,abcd6,abcd7,abcd8
abcd1,abcd2,abcd3,abcd4,abcd5,abcd6,abcd7,abcd8
...
abcd1,abcd2,abcd3,abcd4,abcd5,abcd6,abcd7,abcd8

I believe that what I am doing here is overkill and wastes available resources. Can I do this better and faster on Debian 9.4 using the tools available in that distro?

RavinderSingh13 · Accepted Answer · 2018-10-03 10:40:06Z

4

Could you please try the following? It saves the output back into Input_file itself.

cat script.ksh
newvar1="abcd6"
newvar2="abcd7"
newvar3="abcd8"

awk -v var1="$newvar1" -v var2="$newvar2" -v var3="$newvar3" 'BEGIN{OFS=","}{print $0,var1,var2,var3}' Input_file > temp_file && mv temp_file Input_file
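
For what it's worth, the temp-file-and-rename step could also be wrapped in a small function. This is only a sketch and not part of the answer above: append_columns is a hypothetical helper name, and it assumes mktemp is available (it is on Debian). Keep in mind that awk -v interprets backslash escapes in the value, so this suits plain values like the ones in the question.

```shell
#!/bin/bash

# Hypothetical helper: append the given constant values as extra
# comma-separated columns to every line of a file, replacing the file
# only if awk succeeded.
append_columns() {
    [ "$#" -ge 2 ] || return 2          # need a file and at least one value
    local file=$1 tmp suffix
    shift
    local IFS=,                         # makes "$*" join the values with commas
    suffix="$*"
    tmp=$(mktemp "${file}.XXXXXX") || return 1
    if awk -v suffix="$suffix" '{ print $0 "," suffix }' "$file" > "$tmp"; then
        mv -- "$tmp" "$file"
    else
        rm -f -- "$tmp"                 # leave the original untouched on failure
        return 1
    fi
}

# Small demonstration on a throwaway file
demo=$(mktemp) || exit 1
printf 'abcd1,abcd2,abcd3,abcd4,abcd5\n' > "$demo"
append_columns "$demo" abcd6 abcd7 abcd8
cat "$demo"    # abcd1,abcd2,abcd3,abcd4,abcd5,abcd6,abcd7,abcd8
rm -f -- "$demo"
```

One caveat with this sketch: mktemp creates its file with mode 0600, so after the rename the file's permissions may differ from the original's.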


wizard · Accepted Answer · 2018-10-04 05:29:30Z

2

I think you could try something like this

#!/bin/bash

newvar1="abcd6"
newvar2="abcd7"
newvar3="abcd8"

awk -v var1="$newvar1" -v var2="$newvar2" -v var3="$newvar3" -v OFS="," '{print $0,var1,var2,var3}' smallsample.txt > outputfile.txt

I haven't tested its performance, but I think it shouldn't be so bad.
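
If anyone wants to check the performance themselves, something along these lines might do. It is a rough sketch under a few assumptions: the file names and the default line count are placeholders I picked, the input is synthetic (every line identical, as in the question's sample), and time is the bash keyword, so run it with bash.

```shell
#!/bin/bash

# Sketch: build a synthetic input like the question's sample and time
# the awk one-liner on it. Paths and line count are placeholders.
lines=${1:-1000000}
workdir=$(mktemp -d) || exit 1
trap 'rm -rf "$workdir"' EXIT           # clean up the scratch directory on exit
input="$workdir/sample.txt"
output="$workdir/output.txt"

# Generate $lines identical five-column rows
awk -v n="$lines" 'BEGIN { for (i = 0; i < n; i++) print "abcd1,abcd2,abcd3,abcd4,abcd5" }' > "$input"

newvar1="abcd6"
newvar2="abcd7"
newvar3="abcd8"

# Only the awk pass itself is timed; timing figures go to stderr
time awk -v var1="$newvar1" -v var2="$newvar2" -v var3="$newvar3" -v OFS="," \
    '{ print $0, var1, var2, var3 }' "$input" > "$output"

# Sanity check: line count and the last line of the result
wc -l < "$output"
tail -n 1 "$output"
```

Timings will of course depend on the machine and on which awk implementation is installed.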

edited Oct 4, 2018 at 5:29

answered Oct 3, 2018 at 10:35


4 Comments

Cyrus Over a year ago

I suggest: awk -v var1="$newvar1" -v var2="$newvar2" -v var3="$newvar3" -v OFS="," '{print $0,var1,var2,var3}' file

creed Over a year ago

Amazing! With this approach it yielded:

time bash paste.sh

real    0m0.436s
user    0m0.224s
sys     0m0.208s

wizard Over a year ago

@Cyrus Yes, it is actually better and cleaner to use OFS than to concatenate the output with commas :) I'll edit the response for future reference.

Ed Morton Over a year ago

Also, -vvar=value (no space after -v) is gawk-only; use -v var=value for portability. And, of course, always quote your shell variables: -v var="$newvar1", etc.
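
To illustrate the quoting point, here is a small sketch. The value with a space in it is made up for the example (the question's values have none), and count_args is a hypothetical helper that only reports how many arguments the shell passed along.

```shell
#!/bin/bash

newvar1="abcd 6"    # hypothetical value containing a space

# Hypothetical helper: print the number of arguments received
count_args() { echo "$#"; }

count_args -v var1=$newvar1      # 3: the shell split the value into two words
count_args -v var1="$newvar1"    # 2: the value stays in a single argument

# With quoting, awk receives the whole value as one assignment
printf 'abcd1,abcd2\n' |
    awk -v var1="$newvar1" -v OFS="," '{ print $0, var1 }'
# abcd1,abcd2,abcd 6
```

Without the quotes, awk would get var1=abcd as the assignment and 6 as a separate argument, which is not what was intended.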
