How to get two files in one for loop

Question

Currently I am working on a script that should produce a PBS script which can be submitted to the cluster. My normal scripts work well, but now I am facing the problem of having two input files for one program. One of my scripts, for example, looks like this:

#!/bin/bash

echo -e "#!/bin/bash\n
#SBATCH --job-name=whatever
#SBATCH --export=NONE
#SBATCH --nodes=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=80G
#SBATCH --partition=blabla
#SBATCH --blabla" >> $1

echo -e "touch log_file_$1\n" >> $1

x=$( cd $( dirname ${BASH_SOURCE[0]} ) && pwd ) 

for file in /foo/bar/foo/bar/*; do
rl=$(readlink -f $file)
kw=${rl##*/} 
id=${kw%%.*} 
gz_weg=${kw%.*} 

if [ ! -d "$id" ]; then
    mkdir "$id"
fi

echo "echo $kw >> log_file_$1" >> $1
printf "foo-bar --mode barbar -e 0.001 --index /barz/barz/barz.index --inFile $rl --output $x/$id/$gz_weg.rma 2>> $x/log_file_$1 \n" >> $1
echo "echo -e '"\\n"' >> log_file_$1" >> $1
echo -e "\n" >> $1
done

Not a beauty, I guess, but it works for me. But now, as stated above, I am facing the problem of having two input files. They are both in the same folder, and I tried something like:

for file in /ifs/data/nfs_share/sukmb241/raw_data/samples/iceman_old/iceman.UDG.*/*.fastq.gz; do

bs=$(basename $file)

if  [[ "$bs" == *R1* ]]; then
    r1=$(readlink -f $file)
    k1=${r1##*/}
    id1=${k1%%.*}
    gz_weg1=${k1%.*}
fi


if  [[ "$bs" == *R2* ]]; then
    r2=$(readlink -f $file)
    k2=${r2##*/}
    id2=${k2%%.*}
    gz_weg2=${k2%.*}
fi


if [ ! -d "$id1" ]; then
    mkdir "$id1"
fi

echo "echo $kw >> log_file_$1" >> $1
printf "blablabla -in1 $r1 -in2 $r2 -f foo -r bar -l 25 -qt -q 20 -o $x/$id1/whatever -verbose 2>> $x/log_file_$1 \n" >> $1
echo "echo -e '"\\n"' >> log_file_$1" >> $1
echo -e "\n" >> $1
done

The files differ only in having R1 or R2 in their filenames. However, I realised this will not work properly, because each loop iteration sees only one file. So how can I make -in1 point to the file containing R1 and -in2 to the file containing R2?

Thanks in advance :)

muru · Accepted Answer · 2017-03-02 10:26:38Z

If you save your arguments in variables beforehand, then you can replace the arguments with the list of files and consume them two at a time:

out_file=$1
set -- /ifs/data/nfs_share/sukmb241/raw_data/samples/iceman_old/iceman.UDG.*/*.fastq.gz

while [[ -n $1 ]]
do
    # Get the next two filenames
    file1=$1
    file2=$2
    # discard them from arguments
    shift 2

    # Then the rest of the script 
    bs1=...
    # Use $out_file instead of $1
done

With many files the argument list can get long, so you could save a bit of space by trimming the common directory prefix from the paths:

out_file=$1
dirpath=/ifs/data/nfs_share/sukmb241/raw_data/samples/iceman_old/
cd "$dirpath"
set -- iceman.UDG.*/*.fastq.gz
cd "$OLDPWD"
while [[ -n $1 ]]
do
    # Get the next two filenames
    file1="$dirpath/$1"
    file2="$dirpath/$2"
    # discard them from arguments
    shift 2
    ...
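
The two-at-a-time idea can be tried out in isolation. What follows is only a minimal sketch: pair_files is a hypothetical helper name and the demo files are throwaway placeholders, neither comes from the original script. It tests the argument count rather than whether $1 is empty, so a leftover unpaired file is silently skipped.

```shell
#!/bin/bash
# Sketch: consume a list of files two at a time via the positional parameters.
# pair_files is a hypothetical helper, not part of the original script.
pair_files() {
    # Keep going while at least two arguments remain;
    # a single leftover file is ignored in this sketch.
    while [ "$#" -ge 2 ]; do
        printf '%s %s\n' "$1" "$2"
        shift 2
    done
}

# Demo on throwaway files so the sketch runs anywhere.
demo_dir=$(mktemp -d)
touch "$demo_dir"/A_R1_001.fastq.gz "$demo_dir"/A_R2_001.fastq.gz \
      "$demo_dir"/B_R1_001.fastq.gz "$demo_dir"/B_R2_001.fastq.gz
# The glob expands in sorted order, so each R1 is followed by its R2.
pair_files "$demo_dir"/*.fastq.gz
rm -r "$demo_dir"
```

This relies on the glob's sort order placing every R1 file directly before its R2 mate, which holds for names that differ only in that token but breaks as soon as one mate is missing.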

If all R1 files have a corresponding R2 file, then you don't need to take files two at a time - just loop over all R1 files, and then take the corresponding R2 file:

for file in /ifs/data/nfs_share/sukmb241/raw_data/samples/iceman_old/iceman.UDG.*/*R1*.fastq.gz; do
    r1=$(readlink -f $file)
    k1=${r1##*/}
    id1=${k1%%.*}
    gz_weg1=${k1%.*}


    # Change R1 to R2 in filename
    file=${file//R1/R2}
    r2=$(readlink -f $file)
    k2=${r2##*/}
    id2=${k2%%.*}
    gz_weg2=${k2%.*}

    if [ ! -d "$id1" ]; then
        mkdir "$id1"
    fi

    echo "echo $k1 >> log_file_$1" >> $1
    printf "blablabla -in1 $r1 -in2 $r2 -f foo -r bar -l 25 -qt -q 20 -o $x/$id1/whatever -verbose 2>> $x/log_file_$1 \n" >> $1
    echo "echo -e '"\\n"' >> log_file_$1" >> $1
    echo -e "\n" >> $1
done

file=${file//R1/R2} replaces every occurrence of R1 in the path stored in file with R2, thus giving the other filename.
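
Because the double-slash form rewrites every R1 in the whole path, a directory name that happens to contain R1 would be changed as well. A more conservative variant, sketched here under the assumption that the files use the "_R1_" token seen in the example names from the comments, touches only the file name. mate_of is a hypothetical helper and the path is made up for illustration.

```shell
#!/bin/bash
# mate_of is a hypothetical helper: given an R1 path, print the matching R2 path.
# It assumes an "_R1_" token in the file name and rewrites only the first match
# in the base name, so directory names stay untouched.
mate_of() {
    local dir base
    dir=$(dirname -- "$1")
    base=$(basename -- "$1")
    printf '%s/%s\n' "$dir" "${base/_R1_/_R2_}"
}

r1=/data/R1_run/D0770_S23_L001_R1_001.fastq.gz   # made-up path for illustration
r2=$(mate_of "$r1")
echo "$r2"   # prints /data/R1_run/D0770_S23_L001_R2_001.fastq.gz

# Inside the real loop one might skip samples whose mate is missing:
#   [ -e "$r2" ] || { echo "no R2 for $r1" >&2; continue; }
```

With the plain ${file//R1/R2} form, the same made-up path would come out as /data/R2_run/..., which points at a directory that does not exist.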

edited Mar 2, 2017 at 10:26

answered Mar 2, 2017 at 9:53

muru


5 Comments

JFS31 Over a year ago

Would you be so kind as to include that in my script so that it works? Atm I don't have enough brain power to get it to work.

muru Over a year ago

@JFS31 do all R1 files have a corresponding R2 file?

JFS31 Over a year ago

Yes, they have. In each folder there are two files, for example D0770_S23_L001_R1_001.fastq.gz and D0770_S23_L001_R2_001.fastq.gz, which I need to process.

muru Over a year ago

@JFS31 see update - you can just loop over R1 files and get the R2 filename using the R1 filename.

JFS31 Over a year ago

Smart solution. Could have thought earlier about that ;) Thank you a lot.
