I'm just getting started with Nextflow to develop a bioinformatics pipeline. Below, I've created a params.files variable which contains my FASTQ files, and then passed it into the fasta_files channel. The trimming process and its script take this channel as input and, ideally, I would output all the "$sample"_trimmed.fq.gz files into the output channel, trimmed_channel. However, when I run this script, I get the following error:

Missing output file(s) `trimmed_files` expected by process `trimming` (1)

The nextflow script I'm trying to run is:

#! /usr/bin/env nextflow

params.files = files("$baseDir/FASTQ/*.fastq.gz")
println "fastq files for trimming:$params.files"
 
fasta_files = Channel.fromPath(params.files)
println "files in the fasta channel: $fasta_files"

process trimming {

input: 
file fasta_file from fasta_files

output: 
path trimmed_files into trimmed_channel

// the shell script to be run: 
"""
#!/usr/bin/env bash
mkdir trimming_report
cd /home/usr/Nextflow

#Finding and renaming my FASTQ files
for file in FASTQ/*.fastq.gz; do
    [ -f "\$file" ] || continue 
    name=\$(echo "\$file" | awk -F'[/]' '{ print \$2 }') #renaming fastq files. 
    sample=\$(echo "\$name" | awk -F'[.]' '{ print \$1 }') #renaming fastq files.
    echo "Found" "\$name" "from:" "\$sample"
    if [ ! -e FASTQ/"\$sample"_trimmed.fq.gz ]; then
        trim_galore -j 8 "\$file" -o FASTQ #trim the files
        mv "\$file"_trimming_report.txt trimming_report #moves to the directory trimming report 
    else
      echo ""\$sample".trimmed.fq.gz exists skipping trim galore"
    fi
done

trimmed_files="FASTQ/*_trimmed.fq.gz"
echo \$trimmed_files
"""
}

The script in the process works fine on its own. However, I'm wondering if I'm misunderstanding or missing something obvious. If I've forgotten to include something, please let me know. Any help is appreciated!

2 Answers

4

Pallie has already provided some sound advice and, of course, the right answer, which is: environment variables must be declared using the env qualifier.
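To make the env qualifier concrete, here is a minimal sketch in DSL 2. The process name report_name and the file name sample_trimmed.fq.gz are made up for illustration, and the unquoted form of the variable name is an assumption that may need quoting on newer Nextflow releases. Note that env captures the string value of the Bash variable, not the files it names:

```nextflow
nextflow.enable.dsl=2

process report_name {

    output:
    // capture the value of the Bash variable 'trimmed_files' after the script runs
    env trimmed_files

    """
    # hypothetical file name, only here to give the variable a value
    trimmed_files="sample_trimmed.fq.gz"
    """
}

workflow {
    // emits the captured string; no file is staged or checked
    report_name().view()
}
```

Because only a string comes out, downstream processes would not receive the trimmed FASTQ files themselves, which is one reason a path output is usually the better fit here.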

However, given your script definition, I think there might be some misunderstanding about how best to skip steps whose results have already been generated. The cache directive is enabled by default, so when the pipeline is launched with the -resume option, any process invoked again with the same set of inputs is skipped and its stored outputs are used as the actual results.

This example uses Nextflow DSL 2 for my convenience, but DSL 2 is not strictly required:

nextflow.enable.dsl=2

params.fastq_files = "${baseDir}/FASTQ/*.fastq.gz"
params.publish_dir = "./results"


process trim_galore {

    tag { "${sample}:${fastq_file}" }

    publishDir "${params.publish_dir}/TrimGalore", saveAs: { fn ->
        fn.endsWith('.txt') ? "trimming_reports/${fn}" : fn
    }

    cpus 8

    input:
    tuple val(sample), path(fastq_file)

    output:
    tuple val(sample), path('*_trimmed.fq.gz'), emit: trimmed_fastq_files
    path "${fastq_file}_trimming_report.txt", emit: trimming_report

    """
    trim_galore \\
        -j ${task.cpus} \\
        "${fastq_file}"
    """
}

workflow {

    Channel.fromPath( params.fastq_files )
        | map { tuple( it.getSimpleName(), it ) }
        | set { sample_fastq_files }

    results = trim_galore( sample_fastq_files )

    results.trimmed_fastq_files.view()
}

Run using:

nextflow run script.nf \
    -ansi-log false \
    --fastq_files '/home/usr/Nextflow/FASTQ/*.fastq.gz'

3

Nextflow does not export the variable trimmed_files to its own scope unless you tell it to do so using the env output qualifier. However, doing it that way would not be very idiomatic.

Since you know the pattern of your output files ("FASTQ/*_trimmed.fq.gz"), simply pass that pattern as output:

path "FASTQ/*_trimmed.fq.gz" into trimmed_channel

Some things you do, but probably want to avoid:

  • Changing directory inside your NF process. Don't do this; it entirely breaks the whole concept of Nextflow's /work folder setup.
  • Writing a Bash loop inside a NF process. If you set up your channels correctly, each spawned task should only need to handle one file.
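Putting these points together, a rough sketch of the question's process in the same DSL 1 syntax might look like the following. It assumes the FASTQ directory sits next to the script, as in the question, and that trim_galore writes its output to the current (task) directory when no -o option is given. The channel name fastq_files is my own renaming:

```nextflow
#!/usr/bin/env nextflow

// glob pattern as a string; Channel.fromPath expands it into one item per file
params.files = "$baseDir/FASTQ/*.fastq.gz"

fastq_files = Channel.fromPath(params.files)

process trimming {

    input:
    // one FASTQ file per task, staged into the task's work directory
    file fastq_file from fastq_files

    output:
    // match the trimmed file that trim_galore leaves in the work directory
    path "*_trimmed.fq.gz" into trimmed_channel

    """
    trim_galore -j 8 "${fastq_file}"
    """
}

// print each trimmed file as it arrives
trimmed_channel.view()
```

There is no cd and no loop: Nextflow spawns one task per file emitted by the channel, and each task only sees its own input. Be aware that recent Nextflow releases no longer support DSL 1, so this form only applies to versions that still accept the question's syntax.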
