
I am trying to run queries against a big file, using awk from a bash script. The script reads parameters line by line from a parameter file and stores them in variables, which are then passed to awk. The result of each query needs to be stored in a separate file, named as specified in the parameter file:

#!/bin/bash

while IFS=\t read chr start end name
do 

echo $chr $start $end $name

awk -v "chr=$chr" -v "start=$start" -v "end=$end" '$1==chr && $3>start && $3<end && $11<5E-2 {print $0}' bigfile.out > ${name}.out

done < parameterfile

Unfortunately, the awk command does not produce any output. Any idea what might be wrong? (Judging by the echo output, the bash variables are assigned correctly.)

1
  • Why not have awk process the input directly? Commented Aug 13, 2012 at 3:31

3 Answers

1

Bash does not treat an unquoted \t as a tab: IFS=\t drops the backslash and sets IFS to the letter t. Try this:

while IFS=$(echo -e "\t") read chr start end name
do
        echo =$chr=$start=$end=$name=
done <<EOF
11      1       10      aaa bbb
12      3       30      ccc bbb
EOF

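This splits tab-delimited text (the fields in the sample lines above are separated by tabs). Your variant splits on the letter t instead, so a line that contains no t ends up entirely in $chr. Your echo still looked right because the unquoted $chr was word-split again on the tabs it contains. When debugging, always print variables with a visible delimiter around them, '=' for example. :)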
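A minimal sketch of the difference, assuming bash (the `$'\t'` quoting and the `<<<` here-string are bash features) and a made-up sample line. It reads the same line once with the unquoted `\t` and once with a real tab, and prints each variable in brackets so that empty fields are visible:

```shell
#!/bin/bash
# One tab-separated sample line (hypothetical data).
line=$'chr1\t100\t200\tregionA'

# Unquoted \t: bash drops the backslash, so IFS is the letter "t".
IFS=\t read -r chr start end name <<< "$line"
bad="[$chr][$start][$end][$name]"

# ANSI-C quoting: IFS is a real tab character.
IFS=$'\t' read -r chr start end name <<< "$line"
good="[$chr][$start][$end][$name]"

printf '%s\n' "$bad" "$good"
```

In the first result the whole line sits inside the first pair of brackets and the other three are empty; in the second each field has its own brackets.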


1

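The key is the IFS assignment:

while IFS='	' read chr start end name

where the character between the single quotes is a literal tab (in an interactive shell, type Ctrl-V and then Tab). In bash, IFS=$'\t' is equivalent and easier to read.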
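Applied to the loop from the question, the fix could look like the sketch below. The sample files are made up for illustration, and the column layout of bigfile.out (chromosome in column 1, position in column 3, p-value in column 11, whitespace-separated) is an assumption taken from the awk condition in the question:

```shell
#!/bin/bash
# Work in a scratch directory with made-up sample data (hypothetical).
cd "$(mktemp -d)" || exit 1
printf 'chr1\t100\t200\tregionA\n' > parameterfile
# 11 columns: $1 = chromosome, $3 = position, $11 = p-value (assumed layout).
printf 'chr1 x 150 a b c d e f g 0.01\n'  > bigfile.out
printf 'chr1 x 150 a b c d e f g 0.5\n'  >> bigfile.out
printf 'chr2 x 150 a b c d e f g 0.01\n' >> bigfile.out

# IFS is a real tab only for the duration of each read.
while IFS=$'\t' read -r chr start end name
do
    awk -v chr="$chr" -v start="$start" -v end="$end" \
        '$1 == chr && $3 > start && $3 < end && $11 < 5e-2' \
        bigfile.out > "${name}.out"
done < parameterfile

cat regionA.out
```

Only the first sample line satisfies all four conditions, so regionA.out should contain just that line. Quoting "${name}.out" also keeps names with spaces from breaking the redirection.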


0

I do not know why bash has to sit in between. If the requirement is only that the parameters are read from a file (or from the user), awk can do the whole job, and this should work:

#!/bin/bash
awk 'BEGIN {
    FS = "\t"
}
{
    # To allow comment lines in parameterfile, you could add:
    #   if ($0 ~ /^[ \t]*#/) next
    # which skips lines starting with # (after any leading spaces or tabs).
    #
    # Read the parameters. parameterfile is assumed to be tab-separated,
    # hence FS = "\t" in BEGIN. For a CSV file use split($0, A, ",")
    # and then chr = A[1], and so on.
    chr = $1
    start = $2
    end = $3
    name = $4
    # The data file is hard-coded here; it could also come from the
    # parameter file or from -v var=value on the awk command line.
    file_to_read_from = "bigfile.out"
    out = name ".out"
    while ((getline line_of_data < file_to_read_from) > 0) {
        # The format of the data file is not given, so this is an example.
        # If it is separated by one or more spaces:
        #   split(line_of_data, B, / +/)
        # If it is separated by tabs:
        split(line_of_data, B, "\t")

        # Check whether the line matches the condition.
        if (B[1] == chr && B[3] > start && B[3] < end && B[11] < 5E-2) {
            # getline var < file does not set $0, so print the variable.
            print line_of_data > out
        }
    }
    # Done reading all lines from file_to_read_from.
    # Close both files, so that the data file is reread from the start for
    # the next parameter line and we can handle millions of output files.
    close(file_to_read_from)
    close(out)
    # Any further lines of parameterfile are processed the same way.
    # To process only the first line, add:
    #   exit 0
}' parameterfile
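As an alternative to rereading bigfile.out once per parameter line, the parameters can be loaded first and the big file scanned a single time. The sketch below uses the common NR == FNR idiom for that. It assumes that both files are tab-separated and that parameterfile is not empty; the sample data and the array names (chr, lo, hi, out) are made up for illustration:

```shell
#!/bin/bash
# Scratch directory with made-up, tab-separated sample data (hypothetical).
cd "$(mktemp -d)" || exit 1
printf 'chr1\t100\t200\tregionA\nchr2\t100\t200\tregionB\n' > parameterfile
{
  printf 'chr1\tx\t150\ta\tb\tc\td\te\tf\tg\t0.01\n'
  printf 'chr1\tx\t150\ta\tb\tc\td\te\tf\tg\t0.5\n'
  printf 'chr2\tx\t150\ta\tb\tc\td\te\tf\tg\t0.01\n'
  printf 'chr2\tx\t250\ta\tb\tc\td\te\tf\tg\t0.01\n'
} > bigfile.out

awk -F'\t' '
    # First file (parameterfile): remember every query.
    NR == FNR {
        chr[NR] = $1; lo[NR] = $2 + 0; hi[NR] = $3 + 0
        out[NR] = $4 ".out"; n = NR
        next
    }
    # Second file (bigfile.out): test each significant line against every query.
    $11 + 0 < 5e-2 {
        for (i = 1; i <= n; i++)
            if ($1 == chr[i] && $3 + 0 > lo[i] && $3 + 0 < hi[i])
                print > out[i]
    }
' parameterfile bigfile.out
```

Two differences from the per-query approach are worth knowing: a query with no matching lines produces no output file at all, and every output file stays open until awk exits, which may hit the open-file limit of some awk implementations when there are very many queries.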
