I want to write a for loop that uses a function that takes two variables, both of which are files. I have a set of FASTA files and a set of text files that correspond to each FASTA file, and I want the for loop to iterate over both lists in parallel. This is what the function would look like for one pair of files:

anvi-gen-contigs-database --contigs-fasta fastafile.fna --project-name ProjectName --output-db-path /path/to/file/fastafile.fna.db --external-gene-calls /path/to/file/textfile.txt --ignore-internal-stop-codons

I know how to set up a for loop that takes only one variable, and previously I have used a for loop that looks like this:

for f in *.fna; do 
    anvi-gen-contigs-database --contigs-fasta $f --project-name ProjectName --output-db-path /path/to/file/${f}_out.db; 
done

I've been looking around this and other sites for a way to modify the preceding for loop so that it will take two files as variables and iterate over them in pairs, and I've found two ways that might work, but none of the examples I have found involve using files as variables, so I wanted to check and make sure the for loop will do what I want it to do before I try running it. Here is the first way:

for f in *.fna and for e in *.txt; do 
    anvi-gen-contigs-database --contigs-fasta $f --project-name ProjectName --output-db-path /path/to/file/${f}_out.db --external-gene-calls /path/to/file/$e --ignore-internal-stop-codons; 
done

And here is the second way:

for f, e in zip(*.fna, *.txt); do 
    anvi-gen-contigs-database --contigs-fasta $f --project-name ProjectName --output-db-path /path/to/file/${f}_out.db --external-gene-calls /path/to/file/$e --ignore-internal-stop-codons; 
done

Can someone please confirm that at least one of those ways is syntactically correct and will iterate over the two lists of files in parallel as I want it to do, or else suggest a correct or better way? Thanks!

  • None of your example loops are actually Python. Are you actually trying to solve this in Python, or in a shell script? Commented Jul 24, 2021 at 18:00
  • @larsks It must be a shell script then. I would be typing them directly into the Terminal window. I thought the loops were Python based, but I must have been mistaken. Commented Jul 24, 2021 at 18:08
  • Put the two lists of filenames in arrays. Then loop over the array indexes, and use that to index both arrays. Commented Jul 24, 2021 at 18:18
  • zip() is a Python function, not bash. Commented Jul 24, 2021 at 18:18

1 Answer

Put the two lists of filenames in arrays, then iterate over the array indexes.

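# Collect both file lists into arrays (each glob expands in sorted order).
fastas=(*.fna)
texts=(*.txt)

# "${!fastas[@]}" expands to the array indexes (0, 1, 2, ...);
# ${#fastas[@]} would expand to just the element count.
for i in "${!fastas[@]}"; do
    f=${fastas[i]}
    e=${texts[i]}
    anvi-gen-contigs-database --contigs-fasta "$f" --project-name ProjectName --output-db-path "/path/to/file/${f}_out.db" --external-gene-calls "/path/to/file/$e" --ignore-internal-stop-codons
done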
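
Looping over array indexes pairs the files purely by position, so it only gives the right pairs when both globs expand to the same number of files in matching order. If each text file shares its FASTA file's base name (an assumption; the question does not say how the files are named), a sketch like the one below derives the partner name from the FASTA name instead. The names `text_for` and `dir` are hypothetical helpers introduced for illustration, and the leading `echo` makes this a dry run.

```shell
#!/usr/bin/env bash
# Sketch only: pair files by name rather than by position.
# ASSUMPTION (not stated in the question): sample1.fna pairs with sample1.txt.

dir=/path/to/file   # placeholder path copied from the question

# Print the gene-calls file name expected for a given FASTA file name.
text_for() {
    printf '%s\n' "${1%.fna}.txt"
}

for f in *.fna; do
    [ -e "$f" ] || continue          # the glob matched nothing
    e=$(text_for "$f")
    if [ ! -f "$dir/$e" ]; then
        echo "skipping $f: $dir/$e not found" >&2
        continue
    fi
    # Leading "echo" makes this a dry run that only prints the command;
    # remove it to run anvi-gen-contigs-database for real.
    echo anvi-gen-contigs-database --contigs-fasta "$f" \
        --project-name ProjectName \
        --output-db-path "$dir/${f}_out.db" \
        --external-gene-calls "$dir/$e" \
        --ignore-internal-stop-codons
done
```

Checking the printed commands before removing `echo` is a cheap way to confirm that every FASTA file is matched with the intended text file.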

2 Comments

Thank you! Is it not necessary to add a semicolon after the body of the function, before 'done'?
No. Semicolons are only needed between commands on the same line, not different lines. Do you know basic shell syntax?
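
As a small illustration of the semicolon point, the two loops in this sketch should behave identically: a newline ends a command the same way a semicolon does. The variable names `one_line` and `multi_line` are made up for the example.

```shell
#!/usr/bin/env bash
# One-line form: semicolons separate the word list, the body, and "done".
one_line=$(for x in a b; do printf '%s ' "$x"; done)

# Multi-line form: newlines do the same job, so no semicolons are needed.
multi_line=$(
    for x in a b
    do
        printf '%s ' "$x"
    done
)

echo "$one_line"
echo "$multi_line"
```

A trailing semicolon before `done` on its own line, as in the loops in the question, is harmless but redundant.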
