
I have a process which moves lots of data into a database. I use multiprocessing for this.

It runs nice and quickly, but even when it's finished (all the rows have been moved), the script never seems to exit.

I've added join() because I thought it means the main process will wait until all the worker processes are complete.

Is there something I have missed here? Why doesn't it end?

import multiprocessing as mp

p = mp.Pool(mp.cpu_count())   # one worker per CPU core
p.map(do_process, result)     # blocks until every item has been processed
p.close()                     # no more tasks will be submitted to the pool
p.join()                      # wait for the worker processes to exit
  • I've noticed that multiprocessing tends to swallow exceptions without reporting them. Try wrapping the body of do_process in a try/except block and printing any exceptions. Commented Jul 8, 2022 at 7:31
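
A minimal sketch of that suggestion, assuming do_process takes a single item (the body is a placeholder, since the original function isn't shown):

import traceback

def do_process(item):
    try:
        # ... the actual per-row database work goes here ...
        pass
    except Exception:
        traceback.print_exc()  # surface errors that the pool would otherwise swallow
        raise                  # re-raise so the failure still propagates to map()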

1 Answer


So a nicer way of doing it is to use joblib:

import joblib

with joblib.parallel_backend('loky'):
    results = joblib.Parallel(n_jobs=-1)(   # n_jobs=-1 uses all available cores
        joblib.delayed(do_process)(item)
        for item in result
    )

I am assuming here that result is the iterable whose items get mapped over do_process. The with statement ensures the worker pool is cleaned up when the block exits. If you want to see what is happening, you can pass a verbosity setting to the joblib.Parallel object.
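
For example, a small sketch of the same call with progress reporting turned on (verbose is a standard joblib.Parallel parameter; 10 is just an illustrative level):

results = joblib.Parallel(n_jobs=-1, verbose=10)(  # prints periodic progress messages
    joblib.delayed(do_process)(item)
    for item in result
)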

Hope it helps


2 Comments

Thanks a lot - what does 'loky' mean here?
loky is the backend. It is the default backend for spawning extra processes and running work in parallel. There are backends like dask that support compute clusters, and various GPU-accelerated backends. The TL;DR is: loky is the default, and probably the fastest on a local machine :)
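
For illustration, switching backends only means changing the name passed to parallel_backend; this sketch uses the 'threading' backend, which ships with joblib (dask would require installing dask separately):

with joblib.parallel_backend('threading'):  # threads instead of loky worker processes
    results = joblib.Parallel(n_jobs=-1)(
        joblib.delayed(do_process)(item) for item in result
    )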
