Nextflow: Missing output file(s) expected by process

Question

I'm getting started with Nextflow to develop a bioinformatics pipeline. Below, I've created a params.files variable that holds my FASTQ files and passed it into the fasta_files channel. The trimming process takes this channel as its input, and ideally I would send all the "$sample"_trimmed.fq.gz files to the output channel, trimmed_channel. However, when I run this script, I get the following error:

Missing output file(s) `trimmed_files` expected by process `trimming` (1)

The nextflow script I'm trying to run is:

#! /usr/bin/env nextflow

params.files = files("$baseDir/FASTQ/*.fastq.gz")
println "fastq files for trimming:$params.files"
 
fasta_files = Channel.fromPath(params.files)
println "files in the fasta channel: $fasta_files"

process trimming {

input: 
file fasta_file from fasta_files

output: 
path trimmed_files into trimmed_channel

// the shell script to be run: 
"""
#!/usr/bin/env bash
mkdir trimming_report
cd /home/usr/Nextflow

#Finding and renaming my FASTQ files
for file in FASTQ/*.fastq.gz; do
    [ -f "\$file" ] || continue 
    name=\$(echo "\$file" | awk -F'[/]' '{ print \$2 }') #renaming fastq files. 
    sample=\$(echo "\$name" | awk -F'[.]' '{ print \$1 }') #renaming fastq files.
    echo "Found" "\$name" "from:" "\$sample"
    if [ ! -e FASTQ/"\$sample"_trimmed.fq.gz ]; then
        trim_galore -j 8 "\$file" -o FASTQ #trim the files
        mv "\$file"_trimming_report.txt trimming_report #moves to the directory trimming report 
    else
      echo ""\$sample".trimmed.fq.gz exists skipping trim galore"
    fi
done

trimmed_files="FASTQ/*_trimmed.fq.gz"
echo \$trimmed_files
"""
}

The shell script inside the process works fine on its own, so I'm wondering whether I'm misunderstanding or missing something obvious. If I've forgotten to include something, please let me know. Any help is appreciated!

Steve · Accepted Answer · 2022-03-09 10:46:30Z

Pallie has already provided some sound advice and, of course, the right answer, which is: environment variables must be declared using the env qualifier.

However, given your script definition, I think there may be a misunderstanding about how best to skip work whose results have already been generated. The cache directive is enabled by default. When the pipeline is launched with the -resume option, any further attempt to execute a process with the same set of inputs causes the execution to be skipped, and the stored data is used as the actual results.

This example uses Nextflow DSL 2 for my convenience, but DSL 2 is not strictly required:

nextflow.enable.dsl=2

params.fastq_files = "${baseDir}/FASTQ/*.fastq.gz"
params.publish_dir = "./results"


process trim_galore {

    tag { "${sample}:${fastq_file}" }

    publishDir "${params.publish_dir}/TrimGalore", saveAs: { fn ->
        fn.endsWith('.txt') ? "trimming_reports/${fn}" : fn
    }

    cpus 8

    input:
    tuple val(sample), path(fastq_file)

    output:
    tuple val(sample), path('*_trimmed.fq.gz'), emit: trimmed_fastq_files
    path "${fastq_file}_trimming_report.txt", emit: trimming_report

    """
    trim_galore \\
        -j ${task.cpus} \\
        "${fastq_file}"
    """
}

workflow {

    Channel.fromPath( params.fastq_files )
        | map { tuple( it.getSimpleName(), it ) }
        | set { sample_fastq_files }

    results = trim_galore( sample_fastq_files )

    results.trimmed_fastq_files.view()
}

Run using:

nextflow run script.nf \
    -ansi-log false \
    --fastq_files '/home/usr/Nextflow/FASTQ/*.fastq.gz'
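
To see the caching described above in practice, the run command can be wrapped in a small helper and called a second time with -resume. This is only a sketch: run_pipeline is a hypothetical helper name, and the script name and FASTQ path are simply the ones from the command above.

```shell
#!/usr/bin/env bash

# Hypothetical wrapper around the run command above. Any extra arguments
# (for example -resume) are passed straight through to `nextflow run`.
run_pipeline() {
    nextflow run script.nf \
        -ansi-log false \
        "$@" \
        --fastq_files '/home/usr/Nextflow/FASTQ/*.fastq.gz'
}
```

Calling run_pipeline once executes every trim_galore task. Calling run_pipeline -resume afterwards should restore those tasks from the cache instead of running them again, provided the inputs and the process script have not changed.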

Pallie · Answer · 2022-03-07 14:28 (edited by Steve, 2022-03-09 06:14:23Z)

Nextflow does not export the shell variable trimmed_files to its own scope unless you tell it to do so using the env output qualifier. However, doing it that way would not be very idiomatic.

Since you know the pattern of your output files ("FASTQ/*_trimmed.fq.gz"), simply pass that pattern as output:

path "FASTQ/*_trimmed.fq.gz" into trimmed_channel
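
The difference between a shell variable and an output file can be sketched outside Nextflow. The error message shows that Nextflow looked for a file literally named trimmed_files; the snippet below imitates that check in a scratch directory standing in for a task work directory. The sampleA and sampleB file names are made up for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Scratch directory standing in for a task work directory (illustrative only).
workdir=$(mktemp -d)
cd "$workdir"
mkdir FASTQ
touch FASTQ/sampleA_trimmed.fq.gz FASTQ/sampleB_trimmed.fq.gz

# Assigning a shell variable creates nothing on disk.
trimmed_files="FASTQ/*_trimmed.fq.gz"

# Is there a file literally called "trimmed_files"? There is not.
if [ -e trimmed_files ]; then
    literal_status="found"
else
    literal_status="missing"
fi

# The glob pattern, on the other hand, matches the real output files.
shopt -s nullglob
matches=(FASTQ/*_trimmed.fq.gz)
match_count=${#matches[@]}

echo "file named 'trimmed_files': $literal_status"
echo "files matching the glob: $match_count"
```

The first line printed reports the file as missing and the second reports two matches, which is why declaring the glob pattern as the output works where the variable name does not.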

Some things you do that you probably want to avoid:

Changing directory inside your NF process. Don't do this: it entirely breaks the whole concept of Nextflow's work directory setup, because outputs end up outside the directory Nextflow checks.
Writing a bash loop inside an NF process. If you set up your channels correctly, each spawned task should only handle one file, so no loop is needed.
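
With one file per task, the awk-based name parsing from the question's loop reduces to two parameter expansions on a single path. A minimal sketch, assuming a hypothetical file called sampleA.fastq.gz in the FASTQ directory from the question:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical input path, used only for illustration.
fastq_file="/home/usr/Nextflow/FASTQ/sampleA.fastq.gz"

# Strip everything up to the last slash to get the file name.
name="${fastq_file##*/}"

# Strip everything from the first dot onwards to get the sample name.
sample="${name%%.*}"

echo "Found $name from: $sample"
```

This avoids spawning echo and awk twice per file, and it takes the sample name up to the first dot, just as the awk call with the [.] separator does.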

answered Mar 7, 2022 at 14:28 by Pallie; edited Mar 9, 2022 at 6:14 by Steve
