When I try to implement a parallel operation in python with multiprocessing library, I saw some processes do not terminate in non-intuitive manner.
My program consists of:
- a queue, used for data transfer between processes
- a user process, which calculates something using data received via the queue
- two maker processes, which generate data and push to the queue
Below is a simplified example. make_data generates random numbers and push to the queue, and the use_data receives the data and compute the average. In total, 2*1000=2000 numbers are generated, and all of them are used. This code runs as expected. After all, all processes becomes terminated and no data is left in the queue.
import random
from multiprocessing import Process, Queue
q = Queue(maxsize=10000)
def make_data(q):
for i in range(1000):
x = random.random()
q.put(x)
print("final line of make data")
def use_data(q):
i = 0
res = 0.0
while i < 2000:
if q.empty():
continue
i += 1
x = q.get()
res = res*(i-1)/i + x/i
print("iter %6d, avg = %.5f" % (i, res))
u = Process(target=use_data, args=(q,))
u.start()
p1 = Process(target=make_data, args=(q,))
p1.start()
p2 = Process(target=make_data, args=(q,))
p2.start()
u.join(timeout=10)
p1.join(timeout=10)
p2.join(timeout=10)
print(u.is_alive(), p1.is_alive(), p2.is_alive(), q.qsize())
Outcome:
final line of make data
final line of make data
iter 2000, avg = 0.49655
False False False 0
Things change when I let the makers generate more than necessary data.
The code below differs from the above only in that each maker generates 5000 data, hence not all data are used. When this is run, it prints message of the final lines, but the maker processes never terminate (needs Ctrl-C to stop).
import random
from multiprocessing import Process, Queue
q = Queue(maxsize=10000)
def make_data(q):
for i in range(5000):
x = random.random()
q.put(x)
print("final line of make data")
def use_data(q):
i = 0
res = 0.0
while i < 2000:
if q.empty():
continue
i += 1
x = q.get()
res = res*(i-1)/i + x/i
print("iter %6d, avg = %.5f" % (i, res))
u = Process(target=use_data, args=(q,))
u.start()
p1 = Process(target=make_data, args=(q,))
p1.start()
p2 = Process(target=make_data, args=(q,))
p2.start()
u.join(timeout=10)
p1.join(timeout=10)
p2.join(timeout=10)
print(u.is_alive(), p1.is_alive(), p2.is_alive(), q.qsize())
Outcome:
final line of make data
final line of make data
iter 2000, avg = 0.49388
False True True 8000
# and never finish
It looks to me that all processes run to the end, so wonder why they keep alive. Can someone help me understand this phenomenon?
I ran this program on python 3.6.6 from miniconda distribution.