
I'm trying to understand what changed between python2 and python3 in the multiprocessing module. On python2, running this code works like a charm:

import multiprocessing

def RunPrice(items, price):
    print("There is %s items, price is: %s" % (items, price))

def GetTargetItemsAndPrice(cursor):
    res = cursor.execute("SELECT DISTINCT items, price FROM SELLS")
    threads = []
    for row in res.fetchall():
        p = multiprocessing.Process(target=RunPrice, args=(row[0],row[1]))
        threads.append(p)
        p.start()
    for proc in threads:
        proc.join()

Let's say there are 2000 entries to be processed in SELLS. On python2 this script runs and exits as expected. On python3 I get a:

  File "/usr/lib/python3.8/multiprocessing/popen_fork.py", line 69, in _launch
    child_r, parent_w = os.pipe()
OSError: [Errno 24] Too many open files

Any idea what happened between python2 and python3?

  • What's the output of ulimit -a (on the shell)? Commented Jul 13, 2021 at 23:42
  • This script runs on linux only. Commented Jul 13, 2021 at 23:48
  • This is likely due to 2 things: 1. multiprocessing in python2 starts processes very differently from how multiprocessing starts processes in python3; and 2. Don't create a Process object for every entry. Create a Pool and distribute the work. Commented Jul 13, 2021 at 23:48
  • Output of ulimit -a:
    core file size (blocks, -c) 0
    data seg size (kbytes, -d) unlimited
    scheduling priority (-e) 0
    file size (blocks, -f) unlimited
    pending signals (-i) 47145
    max locked memory (kbytes, -l) 65536
    open files (-n) 1024
    pipe size (512 bytes, -p) 8
    POSIX message queues (bytes, -q) 819200
    stack size (kbytes, -s) 8192
    cpu time (seconds, -t) unlimited
    max user processes (-u) 47145
    file locks (-x) unlimited
    Commented Jul 13, 2021 at 23:49
  • @Abdou could you provide an example based on the code I provided? Commented Jul 13, 2021 at 23:50
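The traceback shows os.pipe() failing, and the ulimit -a output above shows an open-files limit of 1024, so roughly 2000 concurrent children (each holding pipe descriptors in the parent) will exhaust it. As a hedged workaround sketch (Linux/Unix only), the soft limit can be raised from within the script using the stdlib resource module; capping concurrency, as the answer below does with a Pool, is still the better fix:

```python
import resource

# The soft limit is what `ulimit -n` reports (1024 here); the hard
# limit is the ceiling an unprivileged process may raise it to.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("soft limit:", soft, "hard limit:", hard)

# Raise the soft limit up to the hard limit so more pipes can be
# opened. This treats the symptom rather than the cause: the process
# count is still unbounded.
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
```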

1 Answer


I am assuming that your actual RunPrice function is more CPU-intensive than what you show; otherwise this would not be a good candidate for multiprocessing. If RunPrice is CPU-bound and does not relinquish the CPU to wait for I/O, there is no advantage to a pool with more processes than you have CPU cores, especially since creating a process is not a particularly cheap operation (although not nearly as expensive as it would be on Windows).

from multiprocessing import Pool

def RunPrice(items, price):
    print("There is %s items, price is: %s" % (items, price))

def GetTargetItemsAndPrice(cursor):
    res = cursor.execute("SELECT DISTINCT items, price FROM SELLS")
    rows = res.fetchall()
    MAX_POOL_SIZE = 1024
    # if RunPrice is very CPU-intensive, it may not pay to have a pool size
    # greater than the number of CPU cores you have. In that case:
    #from multiprocessing import cpu_count
    #MAX_POOL_SIZE = cpu_count()
    pool_size = min(MAX_POOL_SIZE, len(rows))
    with Pool(pool_size) as pool:
        # starmap distributes the rows across at most pool_size worker
        # processes and collects the return values; only pool_size
        # processes (and their pipes) exist at any one time.
        results = pool.starmap(RunPrice, [(row[0], row[1]) for row in rows])