I have the following function:

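import dask

@dask.delayed
def load_ds(p):
    # Local import, resolved wherever the task runs
    import xarray as xr
    # combine='by_coords' orders the files by their coordinate values
    multi_file_dataset = xr.open_mfdataset(p, combine='by_coords', parallel=True)
    # Mean of the 'tas' variable over the time dimension
    mean = multi_file_dataset['tas'].mean(dim='time')
    return mean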

which opens a set of NetCDF files (identified by the path p) and calculates the mean value over time.

I'm trying to run the function in parallel over two different paths (i.e. two datasets):

results = []
result1 = dask.delayed(load_ds)(path1)
results.append(result1)
result2 = dask.delayed(load_ds)(path2)
results.append(result2)
   
results = dask.compute(*results)

I've also tried

results = []
result1 = dask.delayed(load_ds)(path1)
results.append(result1)
result2 = dask.delayed(load_ds)(path2)
results.append(result2)
  
futures = dask.persist(*results)
results = dask.compute(*futures)

But I noticed that the execution only actually starts when I try to retrieve the results:

print(results[0].values)

And it starts again when I retrieve the second one:

print(results[1].values)

What's wrong? Is there a way to retrieve the results object just once?

  • Running lazily is the whole principle of delayed, so nothing is wrong. It is the results list itself that must be passed as the argument to a delayed-decorated function. Commented Feb 9, 2021 at 13:35
  • Is there a way to run the function in parallel over the two datasets? Commented Feb 9, 2021 at 13:39

1 Answer

Given what you have done so far, what about:

delayed_task = dask.delayed(
    lambda L: (L[0].values, L[1].values)
)(results)

And "later",

tup = delayed_task.compute()
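
A likely reason the extra step is needed: load_ds returns a dask-backed (still lazy) object, so computing the delayed wrapper only hands that lazy object back, and the real work happens on .values. Here is a minimal sketch of that behaviour and of one possible alternative, which is to skip the delayed wrapper and pass both lazy results to a single dask.compute call. It uses a small dask array as a stand-in for the NetCDF data; lazy_mean and its offset argument are hypothetical names, not part of the question's code.

```python
import dask
import dask.array as da
import numpy as np


def lazy_mean(offset):
    """Hypothetical stand-in for load_ds: builds a lazy mean over axis 0."""
    data = da.from_array(np.arange(12.0).reshape(4, 3) + offset, chunks=(2, 3))
    return data.mean(axis=0)  # a lazy dask array, nothing is computed yet


# Computing a delayed wrapper only runs the function body,
# so what comes back is still a lazy dask array.
wrapped = dask.delayed(lazy_mean)(0.0)
(still_lazy,) = dask.compute(wrapped)
print(type(still_lazy))

# Alternative: no delayed wrapper, one compute call for both lazy results.
mean_a = lazy_mean(0.0)
mean_b = lazy_mean(10.0)
result_a, result_b = dask.compute(mean_a, mean_b)
print(result_a, result_b)  # concrete NumPy arrays
```

With xarray the analogous call should be dask.compute(mean1, mean2) on the two lazy DataArrays, since xarray objects are dask collections as well. Both graphs then go to the scheduler in one pass and the returned objects already hold their values.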


4 Comments

@Fab see edit. What does print(tup) return?
Computation does not start... No tasks show up in the dask dashboard.
I was able to test it. It seems OK! Is this approach better than, or equivalent to, running client.submit(load_ds, path2) and then results = client.gather(futures)?
@Fab Actually, .compute() is synchronous, meaning that it blocks the interpreter until it completes. So it depends on whether you want to block until the result is returned. Put differently, I would go for asynchronous techniques, such as .gather() indeed.
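
For reference, the submit/gather pattern mentioned in these comments could look roughly like the sketch below. It assumes dask.distributed is installed and uses an in-process client so that it runs without an external cluster. load_mean is a hypothetical stand-in for an undecorated load_ds that returns concrete values (plain NumPy here instead of NetCDF files). client.submit returns a future immediately, while client.gather blocks until the futures have finished.

```python
import numpy as np
from dask.distributed import Client


def load_mean(offset):
    """Hypothetical stand-in for load_ds: returns a concrete NumPy result."""
    data = np.arange(12.0).reshape(4, 3) + offset
    return data.mean(axis=0)


def run_both():
    # In-process workers, dashboard disabled: enough for a local sketch.
    with Client(processes=False, dashboard_address=None) as client:
        # submit() returns right away; both tasks can run concurrently.
        futures = [client.submit(load_mean, 0.0),
                   client.submit(load_mean, 10.0)]
        # gather() waits for the futures and returns their results in order.
        return client.gather(futures)


means = run_both()
print(means)
```

Between the submit and gather calls the interpreter is free to do other work, which is the practical difference from calling .compute() directly.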
