I have the following nextflow script:

echo true                                                                       
                                                                                
wd = "$params.wd"                                                               
geoid = "$params.geoid"                                                         
                                                                                
                                                                                
process step1 {                                                                 
                                                                                
 publishDir = "$wd/data/"                                                       
                                                                                
 input:                                                                         
  val celFiles from "$wd/data/$geoid"                                           
                                                                                
 output:                                                                        
  file "${geoid}_datFiles.RData" into channel                                   
                                                                                
 """                                                                            
 Rscript $wd/scripts/step1.R $celFiles $wd/data/${geoid}_datFiles.RData         
                                                                                
 """                                                                            
                                                                                
}                                                                               
  

The Rscript contains the following commands:

step1=function(WD,
               celFiles,
               output) {
 
  library(affy)

  datFiles=ReadAffy(celfile.path=paste0(WD,"/",celFiles))
  
  save(datFiles,file=output)

}

args=commandArgs(trailingOnly=TRUE)
WD=args[1]
celFiles=args[2]
output=args[3]

step1(WD,celFiles,output)

When it runs, the output file is saved in the directory I want ($wd/data/${geoid}_datFiles.RData). Given that publishDir points to the same directory, I would expect output (defined as "${geoid}_datFiles.RData") to be available under the publishDir directory.

However, I get the following error:

  Missing output file(s) `GSE4290_datFiles.RData` expected by process `step1`

The log file suggests that Nextflow is still looking for the output in the work directory it created for the task:

Process `step1` is unable to find [UnixPath]: `/Users/rebeccaeliscu/Desktop/workflow/affymetrix/nextflow/work/92/42afb131a36eb32ed780bd1bf3bc3b/GSE4290_datFiles.RData`

The complete log file:

Nov-12 17:55:39.611 [main] DEBUG nextflow.cli.Launcher - $> nextflow run main.nf
Nov-12 17:55:39.945 [main] INFO  nextflow.cli.CmdRun - N E X T F L O W  ~  version 20.07.1
Nov-12 17:55:39.968 [main] INFO  nextflow.cli.CmdRun - Launching `main.nf` [infallible_brahmagupta] - revision: d68e496ea0
Nov-12 17:55:40.026 [main] DEBUG nextflow.config.ConfigBuilder - Found config local: /Users/rebeccaeliscu/Desktop/workflow/affymetrix/nextflow/nextflow.config
Nov-12 17:55:40.029 [main] DEBUG nextflow.config.ConfigBuilder - Parsing config file: /Users/rebeccaeliscu/Desktop/workflow/affymetrix/nextflow/nextflow.config
Nov-12 17:55:40.140 [main] DEBUG nextflow.config.ConfigBuilder - Applying config profile: `standard`
Nov-12 17:55:41.288 [main] DEBUG nextflow.Session - Session uuid: 94f22a74-2a63-4a87-9fb3-33cf925a5a74
Nov-12 17:55:41.288 [main] DEBUG nextflow.Session - Run name: infallible_brahmagupta
Nov-12 17:55:41.289 [main] DEBUG nextflow.Session - Executor pool size: 4
Nov-12 17:55:41.326 [main] DEBUG nextflow.cli.CmdRun -
  Version: 20.07.1 build 5412
  Created: 24-07-2020 15:18 UTC (08:18 PDT)
  System: Mac OS X 10.15.7
  Runtime: Groovy 2.5.11 on Java HotSpot(TM) 64-Bit Server VM 1.8.0_111-b14
  Encoding: UTF-8 (UTF-8)
  Process: [email protected] [10.49.41.197]
  CPUs: 4 - Mem: 8 GB (708.4 MB) - Swap: 2 GB (927 MB)
Nov-12 17:55:41.353 [main] DEBUG nextflow.Session - Work-dir: /Users/rebeccaeliscu/Desktop/workflow/affymetrix/nextflow/work [Mac OS X]
Nov-12 17:55:41.354 [main] DEBUG nextflow.Session - Script base path does not exist or is not a directory: /Users/rebeccaeliscu/Desktop/workflow/affymetrix/nextflow/bin
Nov-12 17:55:41.594 [main] DEBUG nextflow.Session - Observer factory: TowerFactory
Nov-12 17:55:41.598 [main] DEBUG nextflow.Session - Observer factory: DefaultObserverFactory
Nov-12 17:55:41.911 [main] DEBUG nextflow.Session - Session start invoked
Nov-12 17:55:42.309 [main] DEBUG nextflow.script.ScriptRunner - > Launching execution
Nov-12 17:55:42.331 [main] DEBUG nextflow.Session - Workflow process names [dsl1]: step1
Nov-12 17:55:42.334 [main] WARN  nextflow.script.BaseScript - The use of `echo` method has been deprecated
Nov-12 17:55:42.495 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: null
Nov-12 17:55:42.496 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'local'
Nov-12 17:55:42.508 [main] DEBUG nextflow.executor.Executor - [warm up] executor > local
Nov-12 17:55:42.521 [main] DEBUG n.processor.LocalPollingMonitor - Creating local task monitor for executor 'local' > cpus=4; memory=8 GB; capacity=4; pollInterval=100ms; dumpInterval=5m

1 Answer

Your output declaration tells Nextflow to look for "${geoid}_datFiles.RData" in the task's own working directory (under work/...), but your Rscript writes to $wd/data/${geoid}_datFiles.RData instead. If you change your command to:

Rscript $wd/scripts/step1.R $celFiles ${geoid}_datFiles.RData

Then Nextflow should be able to find the output file. The publishDir directive will then 'publish' it to the defined publishDir.
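
Putting the pieces together, a minimal sketch of the process might look like the following. It assumes the same DSL1 syntax as the question (Nextflow 20.07.1); the channel name datFiles_ch and the mode: 'copy' option are illustrative assumptions and not part of the original script, and the Rscript arguments are kept exactly as in the command above, so adjust them to whatever step1.R actually expects.

```nextflow
wd = "$params.wd"
geoid = "$params.geoid"

process step1 {

    // Directive syntax: no '=' between publishDir and its value.
    // mode: 'copy' is an assumption; the default publishes symlinks.
    publishDir "$wd/data", mode: 'copy'

    input:
    val celFiles from "$wd/data/$geoid"

    output:
    // Relative name: Nextflow looks for it in the task's work directory.
    // datFiles_ch is a hypothetical channel name.
    file "${geoid}_datFiles.RData" into datFiles_ch

    """
    Rscript $wd/scripts/step1.R $celFiles ${geoid}_datFiles.RData
    """
}
```

The script writes the file into the task's work directory, Nextflow collects it from there, and publishDir then places it under $wd/data.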

4 Comments

Hi @Steve, thanks for your response. I have adjusted the naming of the output files to be just the file name, as you suggest. While the workflow now succeeds, I still don't find the outputs where I want per publishDir, i.e. "$wd/data". They are instead under the unique workflow "work/..." directory.
Update: I had moved the 'publishDir' directive to the top of the document. I guess it wasn't reaching the processes, because when I added 'publishDir' to each process, I got the desired behavior. Is there a better way to do this, i.e. without redundantly including publishDir in each process?
Well, Nextflow runs each process in a unique workDir (i.e. "work/..."), so you should always expect output files under that directory (assuming your process succeeds). But looking at your code again, I think you have a typo on your publishDir line: there's an = sign where there shouldn't be. All you need is: publishDir "$wd/data". I think it's OK to be explicit in each process, perhaps with something like: publishDir(path: "${params.publish_dir}/${task.process.replaceAll(':', '/')}", enabled: params.publish_everything || params.publish_step1, mode: params.publish_mode)
Thanks again for your response!
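
On the question of avoiding a publishDir line in every process: one option worth trying is to set a default in nextflow.config, where the process scope applies a directive to all processes. The sketch below assumes params.wd is defined (in the config or on the command line); mode: 'copy' is again an illustrative choice. Note that inside the config file the assignment form with = is the expected syntax, unlike inside a process block.

```nextflow
// nextflow.config
process {
    // Default for every process; an individual process can still
    // declare its own publishDir directive.
    publishDir = [path: "${params.wd}/data", mode: 'copy']
}
```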
