2

Is there a way to run an awk script within a bash script? I have a large file (~40GB) that I want to split based on the 3rd field. The third field can be any of chr1, chr2 ... chr22, chrX and chrY (24 values in total). When I run

awk 'BEGIN{OFS=FS="\t"}$3=="chr1"{print $0}' inputfile.txt > inputfile_chr1.txt

It runs fine but when I try to loop it doesn't:

for i in {1..22} X Y; do 
awk 'BEGIN{OFS=FS="\t"}$3=="chr${i}"{print $0}' inputfile.txt > inputfile_chr${i}.txt
done

I tried putting single quotes around $3 and escaping $3 with a backslash, but everything failed. Is there a better way to do this?

3 Answers

3

You don't want to use your current bash approach: it reads the 40GB inputfile.txt 24 times, once per loop iteration. Parse the file once with awk instead and let the third field choose the output file:

awk '{file="inputfile_"$3".txt";print >> file;close(file)}' inputfile.txt 

Demo:

$ ls
inputfile.txt

$ cat inputfile.txt 
1 foo chr1
2 bar chr1
3 abc chr2
4 zyz chr3
5 123 chr2

$ awk '{file="inputfile_"$3".txt";print >> file;close(file)}' inputfile.txt

$ ls
inputfile_chr1.txt  inputfile_chr2.txt  inputfile_chr3.txt  inputfile.txt

$ cat inputfile_chr1.txt 
1 foo chr1
2 bar chr1

$ cat inputfile_chr2.txt 
3 abc chr2
5 123 chr2

$ cat inputfile_chr3.txt 
4 zyz chr3
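
A possible variation on the same one-pass idea, offered as a sketch rather than something tested on the real 40GB file: with only 24 distinct values, awk can usually keep every output file open, so the per-line >> and close() are not strictly needed. The file name sample.txt and its contents are made up for illustration, and -F'\t' assumes the real file is tab-separated, as the FS="\t" in the question suggests. If your awk complains about too many open files, fall back to the close() version above.

```shell
# Tiny tab-separated stand-in for the real input (hypothetical data).
printf '1\tfoo\tchr1\n2\tbar\tchr1\n3\tabc\tchr2\n4\txyz\tchrX\n' > sample.txt

# Split on tabs only and send each line to the file named after field 3.
# ">" inside awk truncates a file the first time it is opened and then
# keeps it open, so later lines for the same value are appended to it.
awk -F'\t' '{ print > ("sample_" $3 ".txt") }' sample.txt

# Show one of the resulting files.
cat sample_chr1.txt
```

The parentheses around the concatenated file name are there because some awk implementations parse an unparenthesised expression after > differently.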

1

Looks like you just need to get the i out of the single quotes so the shell can expand it:

'BEGIN{OFS=FS="\t"}$3=="chr'${i}'"{print $0}'
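
To see what the quoting does, here is a small sketch that prints the text awk would receive in each case. The variable names i and spliced are only for illustration. The point is not the number of quotes: the variable has to sit outside the single-quoted parts for the shell to expand it.

```shell
i=X

# Inside single quotes the shell leaves ${i} alone, so awk would get it literally.
printf '%s\n' '$3=="chr${i}"'

# Close the single quote, expand "$i", then reopen the single quote:
# the three pieces are joined into one word with the value spliced in.
spliced='$3=="chr'"$i"'"'
printf '%s\n' "$spliced"
```

The first printf shows the unexpanded ${i}; the second shows the condition with chrX in place.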

3 Comments

Whilst this may be a quick fix, it really isn't the fix the OP wants. Their current approach would read that 40GB input file 24 times. Not to mention this isn't the way shell variable values should be passed to awk.
@djechlin: Thanks very much! So if I understood correctly: if two single quotes are used, it will expand the variable, but if I use one it won't, right?
@sudo_O: Can you please suggest a better way? I greatly appreciate any help. Thanks.
0

Or, better in my opinion, pass i to awk as a variable with -v:

for i in {1..22} X Y; do
awk -v i="$i" 'BEGIN{OFS=FS="\t"}$3=="chr" i {print $0}' inputfile.txt > "inputfile_chr${i}.txt"
done
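
A rough illustration of why -v is the safer habit: a value passed with -v stays data, while a value spliced into the program text becomes awk code. The file demo.txt and the deliberately awkward label are hypothetical, chosen only to make the difference visible.

```shell
# Hypothetical two-line, tab-separated sample.
printf '1\tfoo\tchr1\n2\tbar\tchrX\n' > demo.txt

# A deliberately awkward value containing double quotes.
label='chrX" || "1'

# Spliced into the program text, the value is parsed as awk code: the
# condition becomes $3=="chrX" || "1", which is true for every line.
spliced_count=$(awk -F'\t' '$3=="'"$label"'"' demo.txt | wc -l)

# Passed with -v, the same value is only ever compared as a string,
# and no line has it in field 3.
var_count=$(awk -F'\t' -v want="$label" '$3 == want' demo.txt | wc -l)

echo "spliced matches: $spliced_count, -v matches: $var_count"
```

With the plain chr1 ... chrY values from the question both approaches behave the same; the difference only shows up once a value contains characters that mean something to awk.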
