I've been playing around with a Pool object while using an instance method as the func argument. The behaviour with regard to instance state has been a bit surprising: it seems like the instance gets reset on every chunk. For example:
import multiprocessing as mp
import logging

class Worker(object):
    def __init__(self):
        self.consumed = set()

    def consume(self, i):
        if i not in self.consumed:
            logging.info(i)
            self.consumed.add(i)

if __name__ == '__main__':
    n = 1
    logging.basicConfig(level='INFO', format='%(process)d: %(message)s')
    worker = Worker()
    with mp.Pool(processes=2) as pool:
        pool.map(worker.consume, [1] * 100, chunksize=n)
If n is set to 1, then 1 gets logged every time. If n is set to 20, it's logged 5 times, and so on. What is the reason for this, and is there any way around it? I also wanted to use the initializer argument of Pool with an instance method, but hit similar issues.
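The bound method is pickled together with each chunk, so every chunk gets its own copy of Worker in memory, and the jobs within a chunk share that copy. So if you have chunksize 2 and 10 jobs, there are 5 copies, and each pair of jobs shares the same worker.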
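Here is a rough sketch of both halves, in case it helps. The first function only simulates the per-chunk copying with pickle and starts no processes, so it is not the pool's actual code. The second half is one possible workaround: keep the state in a module-level object that the Pool initializer creates once per process. The names count_first_sightings, init_worker, consume and _worker are made up for this example, and consume returns a bool here instead of logging so the effect can be counted.

```python
import multiprocessing as mp
import pickle


class Worker:
    def __init__(self):
        self.consumed = set()

    def consume(self, i):
        """Return True only the first time this instance sees i."""
        if i in self.consumed:
            return False
        self.consumed.add(i)
        return True


def count_first_sightings(items, chunksize):
    """Simulate per-chunk pickling of a bound method (no processes involved)."""
    worker = Worker()
    # Pickling a bound method pickles the instance it is bound to as well.
    payload = pickle.dumps(worker.consume)
    firsts = 0
    for start in range(0, len(items), chunksize):
        # Each chunk unpickles its own, still empty, copy of Worker.
        consume_copy = pickle.loads(payload)
        chunk = items[start:start + chunksize]
        firsts += sum(consume_copy(i) for i in chunk)
    return firsts


# Workaround sketch: one Worker per process, created by the initializer.
_worker = None


def init_worker():
    """Runs once in every pool process and sets up that process's state."""
    global _worker
    _worker = Worker()


def consume(i):
    """Module-level function, so only its name is pickled, not any state."""
    return _worker.consume(i)


if __name__ == "__main__":
    print(count_first_sightings([1] * 100, 1))   # 100
    print(count_first_sightings([1] * 100, 20))  # 5
    with mp.Pool(processes=2, initializer=init_worker) as pool:
        firsts = sum(pool.map(consume, [1] * 100, chunksize=1))
    print(firsts)  # at most 2, one per pool process
```

Note that the state is still per process, not global: with 2 processes the value can be seen up to twice. If you need state that is truly shared across processes, you would have to look at something like a Manager, which is a different trade-off.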