
I'm trying to understand what changed between python2 and python3 in the multiprocessing module. On python2, running this code works like a charm:

import multiprocessing

def RunPrice(items, price):
    print("There is %s items, price is: %s" % (items, price))

def GetTargetItemsAndPrice(cursor):
    res = cursor.execute("SELECT DISTINCT items, price FROM SELLS")
    threads = []
    for row in res.fetchall():
        p = multiprocessing.Process(target=RunPrice, args=(row[0],row[1]))
        threads.append(p)
        p.start()
    for proc in threads:
        proc.join()

Let's say there are 2000 entries to be processed in SELLS. On python2 this script runs and exits as expected. On python3 I get a:

  File "/usr/lib/python3.8/multiprocessing/popen_fork.py", line 69, in _launch
    child_r, parent_w = os.pipe()
OSError: [Errno 24] Too many open files

Any idea what happened between python2 and python3?

  • What's the output of ulimit -a (on the shell)? Commented Jul 13, 2021 at 23:42
  • This script runs on linux only. Commented Jul 13, 2021 at 23:48
  • This is likely due to 2 things: 1. multiprocessing in python2 starts processes very differently from how multiprocessing starts processes in python3; and 2. Don't create a Process object for every entry. Create a Pool and distribute the work. Commented Jul 13, 2021 at 23:48
  • Output of ulimit -a:
    core file size (blocks, -c) 0
    data seg size (kbytes, -d) unlimited
    scheduling priority (-e) 0
    file size (blocks, -f) unlimited
    pending signals (-i) 47145
    max locked memory (kbytes, -l) 65536
    open files (-n) 1024
    pipe size (512 bytes, -p) 8
    POSIX message queues (bytes, -q) 819200
    stack size (kbytes, -s) 8192
    cpu time (seconds, -t) unlimited
    max user processes (-u) 47145
    file locks (-x) unlimited
    Commented Jul 13, 2021 at 23:49
  • @Abdou could you provide an example based on the code I provided? Commented Jul 13, 2021 at 23:50
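The traceback shows os.pipe() failing, and the ulimit -a output above shows an open-files limit of 1024, so roughly 2000 concurrent children (each holding pipe descriptors in the parent) will exhaust it. As a hedged workaround sketch (Linux/Unix only), the soft limit can be raised from within the script using the stdlib resource module; capping concurrency, as the answer below does with a Pool, is still the better fix:

```python
import resource

# The soft limit is what `ulimit -n` reports (1024 here); the hard
# limit is the ceiling an unprivileged process may raise it to.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("soft limit:", soft, "hard limit:", hard)

# Raise the soft limit up to the hard limit so more pipes can be
# opened. This treats the symptom rather than the cause: the process
# count is still unbounded.
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
```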

1 Answer


I am assuming that your actual RunPrice function is more CPU-intensive than what you show; otherwise this would not be a good candidate for multiprocessing. If RunPrice is CPU-bound and does not relinquish the CPU to wait for I/O, there is no advantage to a pool with more processes than you have CPU cores, especially since creating a process is not a particularly cheap operation (although not nearly as expensive as it would be on Windows).

from multiprocessing import Pool

def RunPrice(items, price):
    print("There is %s items, price is: %s" % (items, price))

def GetTargetItemsAndPrice(cursor):
    res = cursor.execute("SELECT DISTINCT items, price FROM SELLS")
    rows = res.fetchall()
    MAX_POOL_SIZE = 1024
    # if RunPrice is very CPU-intensive, it may not pay to have a pool size
    # greater than the number of CPU cores you have. In that case:
    #from multiprocessing import cpu_count
    #MAX_POOL_SIZE = cpu_count()
    pool_size = min(MAX_POOL_SIZE, len(rows))
    with Pool(pool_size) as pool:
        # starmap distributes the rows across at most pool_size worker
        # processes and collects the return values; only pool_size
        # processes (and their pipes) exist at any one time.
        results = pool.starmap(RunPrice, [(row[0], row[1]) for row in rows])