This is a follow-up question to a potential answer to one of my previous questions, on using Dask compute to access one element in a large array.

Why does calling Dask compute cause execution to hang in the code below? Here's the code snippet:

# Suppose you created a scheduler at the IP address 111.111.11.11:8786


from dask.distributed import Client
import dask.array as da

# client1
client1 = Client("111.111.11.11:8786")
x = da.ones(10000000, chunks=(100000,))  # 1e7 size array cut into 1e5 size chunks
x = x.persist()
client1.publish_dataset(x=x)

# client2
client2 = Client("111.111.11.11:8786")
x = client2.get_dataset('x')  # get the lazy collection x
result = x[0].compute()  # code execution hangs here
print(result)

1 Answer

persist behaves differently depending on whether a distributed client is active. In your case, you call it before creating any client, so all of the data is packed into the graph description. That is fine on the threaded scheduler, where workers share memory, but when you publish, you send the whole thing to the scheduler, which apparently chokes on it.

If you make client1 first, you will notice that persist happens very quickly (the scheduler is only getting pointers to the data in this case), and the publish-fetch cycle will work as expected.

3 Comments

Hi MDurant, when I define client1 before persist, the lag still occurs. I updated the code in the example above for consistency.
A lag is OK - you are still creating data in the workers when you persist - but does it now succeed? Your code runs for me almost instantly on my laptop.
I was able to get it working. I had two problems. First, instead of using 111.111.11.11:8786, I had to use tcp://111.111.11.11:8786. Second, I hadn't created any workers for the scheduler. What's interesting is that I was able to get the code working when defining the client after using persist.
