
I have a comma-separated file with 5 columns and a varying number of lines. I want to append three columns populated from shell variables. The variable values are the same for every line.

At the moment I am doing it in the following way:

#!/bin/bash

newvar1="abcd6"
newvar2="abcd7"
newvar3="abcd8"

rm -rf *.txtyy
number_of_lines=`wc -l smallsample.txt|awk {'print $1'}`
for i in `seq $number_of_lines`; do
echo $newvar1 >> paste1.txtyy
echo $newvar2 >> paste2.txtyy
echo $newvar3 >> paste3.txtyy
done

paste -d "," smallsample.txt paste1.txtyy paste2.txtyy paste3.txtyy

Script output is:

# bash paste.sh
abcd1,abcd2,abcd3,abcd4,abcd5,abcd6,abcd7,abcd8
abcd1,abcd2,abcd3,abcd4,abcd5,abcd6,abcd7,abcd8
abcd1,abcd2,abcd3,abcd4,abcd5,abcd6,abcd7,abcd8
abcd1,abcd2,abcd3,abcd4,abcd5,abcd6,abcd7,abcd8

Execution time on 1,000,000 lines on my machine is:

time bash paste.sh

real    0m24.257s
user    0m14.668s
sys     0m9.380s

Input:

abcd1,abcd2,abcd3,abcd4,abcd5
abcd1,abcd2,abcd3,abcd4,abcd5
abcd1,abcd2,abcd3,abcd4,abcd5
abcd1,abcd2,abcd3,abcd4,abcd5
...
abcd1,abcd2,abcd3,abcd4,abcd5

Required output:

abcd1,abcd2,abcd3,abcd4,abcd5,abcd6,abcd7,abcd8
abcd1,abcd2,abcd3,abcd4,abcd5,abcd6,abcd7,abcd8
abcd1,abcd2,abcd3,abcd4,abcd5,abcd6,abcd7,abcd8
abcd1,abcd2,abcd3,abcd4,abcd5,abcd6,abcd7,abcd8
...
abcd1,abcd2,abcd3,abcd4,abcd5,abcd6,abcd7,abcd8

I believe what I am doing here is overkill and wastes available resources. Can I do this better and faster on Debian 9.4, using the tools available in that distro?

2 Answers


Could you please try the following? It writes the output to a temporary file and then moves that file over Input_file, so the result ends up in Input_file itself.

cat script.ksh
newvar1="abcd6"
newvar2="abcd7"
newvar3="abcd8"

awk -v var1="$newvar1" -v var2="$newvar2" -v var3="$newvar3" 'BEGIN{OFS=","}{print $0,var1,var2,var3}' Input_file > temp_file && mv temp_file Input_file
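
As a minimal sketch of the same idea, the temporary-file step can be wrapped in a small function. The function name append_columns and the throwaway sample file are hypothetical and only for illustration; mktemp is assumed to be available (it is part of coreutils on Debian).

```shell
#!/bin/bash
# Sketch: append three constant columns to a comma-separated file in place.
# append_columns and the sample file below are hypothetical names.

append_columns() {
    local file=$1 tmp
    tmp=$(mktemp) || return 1
    # Write to a temporary file first; replace the original only if awk succeeded.
    if awk -v var1="$2" -v var2="$3" -v var3="$4" \
        'BEGIN { OFS = "," } { print $0, var1, var2, var3 }' "$file" > "$tmp"
    then
        mv "$tmp" "$file"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Small demonstration on a throwaway file.
sample=$(mktemp)
printf 'abcd1,abcd2,abcd3,abcd4,abcd5\n' > "$sample"
append_columns "$sample" "abcd6" "abcd7" "abcd8"
cat "$sample"
```

Note that the replaced file takes on the permissions of the file created by mktemp, so this sketch may need adjusting if the original permissions matter.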


I think you could try something like this

#!/bin/bash

newvar1="abcd6"
newvar2="abcd7"
newvar3="abcd8"

awk -v var1="$newvar1" -v var2="$newvar2" -v var3="$newvar3" -vOFS="," '{print $0,var1,var2,var3}' smallsample.txt > outputfile.txt

I haven't tested its performance, but I think it shouldn't be so bad.
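
To check this yourself, here is a rough sketch that builds a 1,000,000-line input like the one in the question and runs the same awk command over it. The temporary file names are assumptions for illustration, and no timing figures are implied here.

```shell
#!/bin/bash
# Sketch: generate a large sample file and append the three columns with awk.
# The files created with mktemp stand in for smallsample.txt and outputfile.txt.

newvar1="abcd6"
newvar2="abcd7"
newvar3="abcd8"

input=$(mktemp)
output=$(mktemp)

# Repeat the sample line from the question 1,000,000 times.
yes 'abcd1,abcd2,abcd3,abcd4,abcd5' | head -n 1000000 > "$input"

# Single pass over the input; no per-line shell loop and no extra paste files.
awk -v var1="$newvar1" -v var2="$newvar2" -v var3="$newvar3" -v OFS="," \
    '{ print $0, var1, var2, var3 }' "$input" > "$output"

# Quick sanity check: line count and first line of the result.
wc -l < "$output"
head -n 1 "$output"
```

Saving this as a script and running it under time, as in the question, gives a figure you can compare with the loop-and-paste version on the same machine.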

4 Comments

I suggest: awk -v var1="$newvar1" -v var2="$newvar2" -v var3="$newvar3" -v OFS="," '{print $0,var1,var2,var3}' file
Amazing! With this approach I got: time bash paste.sh real 0m0.436s user 0m0.224s sys 0m0.208s
@Cyrus Yes, it is actually better and cleaner to use OFS than to concatenate the output with commas :) I'll edit the answer for future reference.
Also -vvar=value (no space after -v) is gawk-only, use -v var=value for portability. And, of course, always quote your shell variables - -v var="$newvar1" etc.
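
A small sketch of why the quoting advice matters, using hypothetical values that contain a space and a glob character (these values are not from the question). With the expansions quoted and a space after each -v, every value reaches awk as a single assignment:

```shell
#!/bin/bash
# Sketch: quoted shell variables passed to awk with the portable "-v var=value" form.
# The values below are hypothetical and chosen to show word splitting and globbing risks.

newvar1="abcd 6"   # contains a space
newvar2="*"        # would be glob-expanded by the shell if left unquoted
newvar3="abcd8"

result=$(printf 'abcd1,abcd2,abcd3,abcd4,abcd5\n' |
    awk -v var1="$newvar1" -v var2="$newvar2" -v var3="$newvar3" -v OFS="," \
        '{ print $0, var1, var2, var3 }')

printf '%s\n' "$result"
```

Without the double quotes, the shell would split "abcd 6" into two words and expand the asterisk against the current directory before awk ever saw the arguments.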
