I am working with RNA-seq data and doing the mapping on Unix. I have several individual bash scripts, each of which takes the output of the previous one as its input. Now I want to merge them all and run a single script. I am pretty new to the Unix environment and I don't know how to do this. I guess it's not just copy-pasting, right?

PS: I suppose the first thing I have to do is rename the variable f in one of the loops to something else, let's say z.

First script:

for f in `ls ../reads/*.fastq.gz | sed 's/_[12].fastq.gz//g'`
do
    hisat2 -x ../genome -1 ${f}_01.fastq.gz -2 ${f}_02.fastq.gz > ${f}.mapped.sam
done

Second script:

for f in `ls *.mapped.sam | sed 's/\.mapped\.sam$//'`
do
    samtools view -b ${f}.mapped.sam > ${f}.mapped.bam
done
  • copy pasting will work Commented Oct 18, 2019 at 10:20
  • You can also create a third script that runs those two in order: sh script1.sh && sh script2.sh Commented Oct 18, 2019 at 10:21
  • And the first person to suggest copy'n'paste gets evicted from their computer science class for … oh, this is not allowed to be NSFW. Commented Oct 18, 2019 at 10:30
  • When writing shell scripts, you're often best off taking the file names to be processed as arguments, rather than trying to parse the output of ls or working with other fixed names. Using for file in "$@"; do …; done and then running bash script.sh *.mapped.sam is generally better (and a lot more flexible) than a version with ls *.mapped.sam in it. Commented Oct 18, 2019 at 10:37
  • Thank you both, I will try it and see. Commented Oct 18, 2019 at 10:46
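
The argument-based pattern suggested in the comments could look roughly like the sketch below. The function names sam_to_bam_name and convert_all are made up for illustration, and the sketch assumes samtools is on the PATH and that each input name ends in .sam.

```shell
#!/usr/bin/env bash
# Sketch only: convert every SAM file named on the command line to BAM.
# sam_to_bam_name and convert_all are hypothetical helper names.

# Map "x.mapped.sam" to "x.mapped.bam" by swapping the final extension.
sam_to_bam_name() {
    printf '%s\n' "${1%.sam}.bam"
}

# Loop over the arguments instead of parsing the output of ls.
convert_all() {
    local sam
    for sam in "$@"; do
        samtools view -b -o "$(sam_to_bam_name "$sam")" "$sam"
    done
}

# With no arguments the loop body never runs.
convert_all "$@"
```

Saved under a name such as sam2bam.sh (hypothetical), it would be run as bash sam2bam.sh *.mapped.sam. The shell expands the glob, so file names containing spaces are passed through intact.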

1 Answer

With a single loop and bash's variable substitution:

for f in ../reads/*.fastq.gz
do
    f="${f/_[12].fastq.gz/}"
    hisat2 -x ../genome -1 "${f}_01.fastq.gz" -2 "${f}_02.fastq.gz" | samtools view -b -o "${f}.mapped.bam" -
done

Related note: "Samtools is designed to work on a stream. It regards an input file - as the standard input (stdin)"
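
One thing to watch with a loop over every *.fastq.gz file: if both mates of a pair match the glob, each sample is mapped twice. A minimal sketch that visits each pair once is shown below. It assumes the reads are named <sample>_1.fastq.gz and <sample>_2.fastq.gz, which may not match the real files (the question mixes _[12] and _01/_02), so adjust the suffixes as needed. The function name map_pairs and its two parameters are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: map each read pair once and write BAM directly.
# Assumed naming: <sample>_1.fastq.gz and <sample>_2.fastq.gz.
# map_pairs, reads_dir and index are illustrative names.

map_pairs() {
    local reads_dir=$1 index=$2 r1 prefix
    # Iterate over mate 1 only, so each sample is handled exactly once.
    for r1 in "$reads_dir"/*_1.fastq.gz; do
        [ -e "$r1" ] || continue          # the glob matched nothing
        prefix=${r1%_1.fastq.gz}          # strip the mate suffix
        hisat2 -x "$index" -1 "${prefix}_1.fastq.gz" -2 "${prefix}_2.fastq.gz" \
            | samtools view -b -o "${prefix}.mapped.bam" -
    done
}

# Example call (not executed here): map_pairs ../reads ../genome
```

In bash, adding set -o pipefail near the top makes the pipeline report failure when hisat2 fails, not only when samtools does.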

1 Comment

It's curious that for each file in ../reads/ that matches xyz_[12].fastq.gz, there are two other files in the directory xyz_01.fastq.gz and xyz_02.fastq.gz — but those names also match *.fastq.gz. This is a problem in the original first script — not unique to your solution which is perfectly reasonable. If the intermediate xyz.mapped.sam file is wanted for some reason, you can add | tee ${f}.mapped.sam before the pipe in your solution.
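
The tee variant mentioned in this comment might be sketched as follows for a single sample. The function name map_keep_sam is hypothetical, and the _1/_2 mate suffixes are an assumption about the file naming.

```shell
#!/usr/bin/env bash
# Sketch only: keep the intermediate SAM while still streaming into samtools.
# map_keep_sam is a hypothetical name; the _1/_2 suffixes are assumed.

map_keep_sam() {
    local prefix=$1 index=$2
    # tee writes a copy of the stream to the SAM file and passes it on.
    hisat2 -x "$index" -1 "${prefix}_1.fastq.gz" -2 "${prefix}_2.fastq.gz" \
        | tee "${prefix}.mapped.sam" \
        | samtools view -b -o "${prefix}.mapped.bam" -
}

# Example call (not executed here): map_keep_sam ../reads/xyz ../genome
```

The SAM copy costs extra disk space, so this only makes sense when the uncompressed alignments are needed later.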
