Currently I am working on a script that should produce PBS scripts which can be submitted to the cluster. My normal scripts work well, but now I am facing the problem of having two input files for one program. One of my scripts, for example, looks like this:

#!/bin/bash

echo -e "#!/bin/bash\n
#SBATCH --job-name=whatever
#SBATCH --export=NONE
#SBATCH --nodes=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=80G
#SBATCH --partition=blabla
#SBATCH --blabla" >> $1

echo -e "touch log_file_$1\n" >> $1

x=$( cd $( dirname ${BASH_SOURCE[0]} ) && pwd ) 

for file in /foo/bar/foo/bar/*; do
rl=$(readlink -f $file)
kw=${rl##*/} 
id=${kw%%.*} 
gz_weg=${kw%.*} 

if [ ! -d "$id" ]; then
    mkdir "$id"
fi

echo "echo $kw >> log_file_$1" >> $1
printf "foo-bar --mode barbar -e 0.001 --index /barz/barz/barz.index --inFile $rl --output $x/$id/$gz_weg.rma 2>> $x/log_file_$1 \n" >> $1
echo "echo -e '"\\n"' >> log_file_$1" >> $1
echo -e "\n" >> $1
done

Not a beauty, I guess, but it works for me. But now, as stated above, I am facing the problem of having two input files. They are both in the same folder, and I tried something like:

for file in /ifs/data/nfs_share/sukmb241/raw_data/samples/iceman_old/iceman.UDG.*/*.fastq.gz; do

bs=$(basename $file)

if  [[ "$bs" == *R1* ]]; then
    r1=$(readlink -f $file)
    k1=${r1##*/}
    id1=${k1%%.*}
    gz_weg1=${k1%.*}
fi


if  [[ "$bs" == *R2* ]]; then
    r2=$(readlink -f $file)
    k2=${r2##*/}
    id2=${k2%%.*}
    gz_weg2=${k2%.*}
fi


if [ ! -d "$id1" ]; then
    mkdir "$id1"
fi

echo "echo $kw >> log_file_$1" >> $1
printf "blablabla -in1 $r1 -in2 $r2 -f foo -r bar -l 25 -qt -q 20 -o $x/$id1/whatever -verbose 2>> $x/log_file_$1 \n" >> $1
echo "echo -e '"\\n"' >> log_file_$1" >> $1
echo -e "\n" >> $1
done

The files differ only in R1 or R2 in their filenames. However, I realised this will not work properly, because it will only give me one file. So how can I make -in1 point to the file containing R1 and -in2 to the file containing R2?

Thanks in advance :)

1 Answer

If you save your arguments in variables beforehand, then you can replace the arguments with the list of files and consume them two at a time:

out_file=$1
set -- /ifs/data/nfs_share/sukmb241/raw_data/samples/iceman_old/iceman.UDG.*/*.fastq.gz

# Loop while at least two filenames remain (-z would only be true for an empty $1)
while (( $# >= 2 ))
do
    # Get the next two filenames
    file1=$1
    file2=$2
    # discard them from arguments
    shift 2

    # Then the rest of the script 
    bs1=...
    # Use $out_file instead of $1
done
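
As a minimal sketch of consuming a list two items at a time, the loop can be tried on plain strings before pointing it at real files. The function name pair_up and the sample filenames are hypothetical and only serve the illustration. It relies on bash expanding globs in sorted order, so an R1 file would normally sit directly before its R2 mate, provided no other matching file sorts between them.

```shell
#!/bin/bash
# Hypothetical helper: print its arguments as tab-separated pairs.
pair_up() {
    # Stop as soon as fewer than two arguments are left,
    # so a trailing unpaired file is skipped instead of looping forever.
    while (( $# >= 2 )); do
        printf '%s\t%s\n' "$1" "$2"
        shift 2
    done
}

# Illustrative filenames only; in the real script this would be the glob.
pair_up a_R1.fastq.gz a_R2.fastq.gz b_R1.fastq.gz b_R2.fastq.gz
```

Checking `$#` rather than `$1` avoids an endless loop when an odd number of files matches, because `shift 2` does nothing if only one argument remains.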

This might run the risk of running out of space for arguments, so you could save a bit by trimming out the path:

out_file=$1
dirpath=/ifs/data/nfs_share/sukmb241/raw_data/samples/iceman_old/
cd "$dirpath"
set -- iceman.UDG.*/*.fastq.gz
cd "$OLDPWD"
# Loop while at least two filenames remain (-z would only be true for an empty $1)
while (( $# >= 2 ))
do
    # Get the next two filenames
    file1="$dirpath/$1"
    file2="$dirpath/$2"
    # discard them from arguments
    shift 2
    ...

If all R1 files have a corresponding R2 file, then you don't need to take files two at a time - just loop over all R1 files, and then take the corresponding R2 file:

for file in /ifs/data/nfs_share/sukmb241/raw_data/samples/iceman_old/iceman.UDG.*/*R1*.fastq.gz; do
    # Resolve the R1 file and derive its basename and sample ID
    r1=$(readlink -f "$file")
    k1=${r1##*/}
    id1=${k1%%.*}
    gz_weg1=${k1%.*}

    # Change R1 to R2 in filename
    file=${file//R1/R2}
    r2=$(readlink -f "$file")
    k2=${r2##*/}
    id2=${k2%%.*}
    gz_weg2=${k2%.*}

    if [ ! -d "$id1" ]; then
        mkdir "$id1"
    fi

    # $k1 is used here because $kw from the original script is never set in this loop
    echo "echo $k1 >> log_file_$1" >> "$1"
    # Pass the command as data, not as the printf format, so a % in a path is harmless
    printf '%s\n' "blablabla -in1 $r1 -in2 $r2 -f foo -r bar -l 25 -qt -q 20 -o $x/$id1/whatever -verbose 2>> $x/log_file_$1" >> "$1"
    echo "echo -e '"\\n"' >> log_file_$1" >> "$1"
    echo -e "\n" >> "$1"
done

file=${file//R1/R2} replaces R1 in the filename with R2, thus giving the other filename.
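
One thing to keep in mind is that `//` replaces every occurrence of R1 in the whole path, including directory names. A slightly more defensive sketch, assuming the read tag always appears as `_R1_` in the basename (as in D0770_S23_L001_R1_001.fastq.gz), could look like this. The function name mate_of and the directory /data/R1_run are made up for the example.

```shell
#!/bin/bash
# Hypothetical helper: derive the R2 path from an R1 path,
# touching only the basename and only the first "_R1_" in it.
mate_of() {
    local path=$1
    local name=${path##*/}      # basename, e.g. D0770_S23_L001_R1_001.fastq.gz
    local dir=${path%"$name"}   # directory part with trailing slash (may be empty)
    printf '%s%s\n' "$dir" "${name/_R1_/_R2_}"
}

# Made-up path whose directory also contains "R1"
p=/data/R1_run/D0770_S23_L001_R1_001.fastq.gz

echo "${p//R1/R2}"   # /data/R2_run/D0770_S23_L001_R2_001.fastq.gz (directory changed too)
mate_of "$p"         # /data/R1_run/D0770_S23_L001_R2_001.fastq.gz
```

For the paths in the question the plain `${file//R1/R2}` gives the same result, since R1 only occurs in the filename there.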

5 Comments

Would you be so kind as to include that in my script so that it works? At the moment I don't have enough brain power to get it working.
@JFS31 do all R1 files have a corresponding R2 file?
Yes, they do. In each folder there are two files, for example D0770_S23_L001_R1_001.fastq.gz and D0770_S23_L001_R2_001.fastq.gz, which I need to process.
@JFS31 see update - you can just loop over R1 files and get the R2 filename using the R1 filename.
Smart solution. I could have thought of that earlier ;) Thanks a lot.