I have started reading Nextflow's documentation and found that one can specify a scratch directory for task execution. Once the task is complete, one can use the stageOutMode directive to copy the output files from scratch to storeDir.

The output files to be copied are specified by the output directive. My question is the following: is it possible to specify entire directories as output so that they would be copied recursively from scratch to storeDir? If so, how?

1 Answer

By default, the path output qualifier will capture process outputs (files, directories, etc) recursively. All you need to do is specify the (top-level) directory in your output declaration, like in the example below:

nextflow.enable.dsl=2

process test {

    scratch '/tmp/my/path'

    stageOutMode 'copy'

    storeDir '/store/results'

    input:
    val myint

    output:
    path "outdir-${myint}"

    script:
    def outdir = "outdir-${myint}/foo/bar/baz"

    """
    mkdir -p "${outdir}" 
    touch "${outdir}/${myint}.txt"
    """
}

workflow {

    ch = Channel.of( 1..3 )

    test(ch)
}

Setting the stageOutMode directive only changes how the output files are staged out from the scratch directory to the work directory. In other words, this directive does not change how process results are staged into the storeDir directory.

The storeDir directive changes what finally happens to the files listed in the output declaration such that they are moved from the work directory into the specified storeDir directory.

2 Comments

Thank you. Please allow me one more question: if I define scratch '$tmppath' and execute the script on a SLURM cluster, will $tmppath be substituted at runtime? My scheduler allocates a temporary directory with the JOBID in its name, which only exists once the job starts running.
@Botond Yes, as long as the variable is single quoted like you've got above. Our PBS Pro scheduler does the same thing and sets a variable called $TMPDIR which will point to something like /scratch/pbs.36569355.hpcpbs01 when the job starts. This directory is only created when the job starts and will be cleaned up automatically when it finishes. Nextflow will try to write all output in a sub-directory using a unique id, for example: /scratch/pbs.36569355.hpcpbs01/nxf.6S4nL9mY3p/outdir-1/foo/bar/baz/1.txt.
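
For reference, here is a minimal sketch of the process from the answer adapted to a scheduler-provided scratch location. It assumes the scheduler exports a variable named TMPDIR in the job environment (as described for PBS Pro above); the process name test_tmpdir is only illustrative, and on a SLURM setup the variable name would need to be replaced with whatever your site actually exports (the $tmppath in the question, for instance).

```nextflow
nextflow.enable.dsl=2

process test_tmpdir {

    // Single quotes keep '$TMPDIR' as a literal string, so it is resolved by
    // the shell on the compute node when the task starts, not by Nextflow
    // when the script is parsed. TMPDIR is an assumed variable name.
    scratch '$TMPDIR'

    // Copy outputs from the scratch directory back to the work directory.
    stageOutMode 'copy'

    // Final location of the files listed in the output declaration.
    storeDir '/store/results'

    input:
    val myint

    output:
    // Declaring the top-level directory captures everything beneath it.
    path "outdir-${myint}"

    script:
    def outdir = "outdir-${myint}/foo/bar/baz"

    """
    mkdir -p "${outdir}"
    touch "${outdir}/${myint}.txt"
    """
}

workflow {

    ch = Channel.of( 1..3 )

    test_tmpdir(ch)
}
```

If the variable is not set on the compute node, my understanding of the scratch directive's documentation is that Nextflow falls back to a temporary directory under /tmp, so it is worth confirming on your cluster that the task really ran in the scheduler-allocated path.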
