
I have a function, say fun(), which returns a generator. I want to check whether the generator is empty, and since I want to save as much run time as possible, I don't convert it to a list to check whether it's empty. Instead I do this:

import itertools

def peek(iterable):
    try:
        first = next(iterable)
    except StopIteration:
        return None
    return first, itertools.chain([first], iterable)
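For illustration, here's how peek behaves on a couple of throwaway generators (not my real data):

assert peek(iter([])) is None                # empty iterator -> None
first, rest = peek(x * x for x in range(3))  # non-empty -> first item plus a rebuilt iterator
assert first == 0
assert list(rest) == [0, 1, 4]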

I use multiprocessing like this:

def call_generator_obj(ret_val):
    peeked = peek(ret_val)  # avoid shadowing the built-in next()
    if peeked is not None:
        return False
    else:
        return True


def main():
    import multiprocessing as mp
    pool = mp.Pool(processes=mp.cpu_count() - 1)
    results = []
    # for loop over here
        ret_val = fun(args, kwargs)
        results.append(pool.apply(call_generator_obj, args=(ret_val,)))
        # the above line throws the error

As far as I know, pickling is converting some object in memory to a byte stream, and I don't think I am doing anything like that in any of my functions.

Traceback (from the line marked above):

  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/pool.py", line 253, in apply
    return self.apply_async(func, args, kwds).get()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/pool.py", line 608, in get
    raise self._value
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/pool.py", line 385, in _handle_tasks
    put(task)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
TypeError: can't pickle generator objects

1 Answer


As far as I know, pickling is converting some object in memory to a byte stream, and I don't think I am doing anything like that over here.

Well, you are doing that.

You can't pass Python values directly between processes. Even the simplest variable holds a pointer to a structure somewhere in the process's memory space, and just copying that pointer to a different process would give you a segfault or garbage, depending on whether the same memory space in the other process is unmapped or mapped to something completely different. Something as complex as a generator—which is basically a live stack frame—would be even more impossible.

The way multiprocessing solves that is to transparently pickle everything you pass to it: functions, their arguments, and their return values all need to be pickled.

If you want to know how it works under the covers: a Pool essentially works by having a Queue that the parent puts tasks on—where the tasks are basically (func, args) pairs—and the children take tasks off. And a Queue essentially works by calling pickle.dumps(value) and then writing the result to a pipe or other inter-process-communication mechanism.
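You can reproduce the failure without a Pool at all. Here's a minimal sketch of roughly what the task queue ends up doing with your ret_val (the generator function below is just a stand-in, not your real fun):

import pickle

def some_generator():
    yield 1  # stand-in for the real generator

try:
    pickle.dumps(some_generator())  # roughly what the Pool's task queue does with ret_val
except TypeError as e:
    print(e)  # can't pickle generator objects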


8 Comments

Hmmm... any solution/workaround then, without having to convert my generator into a list? It's a matter of saving run time and I am dealing with very large data ☹️
@J.Doe There is no fully-general solution. For most apps, there are specific solutions, where you can rewrite your iteration so that it's possible to grab a value or chunk of values in the parent and pass that to the pool function, instead of passing the iterator. But for your code as posted here, the only visible work is calling next on a generator. If that's true for your real code as well, it would probably require much larger changes.
What do you mean by "the only visible work is calling next on a generator"? You mean that is the thing creating the problem, right? And yes, that is a top-level summary of my code. I guess I will have to run a profiler and trade off between generators and multiprocessing.
@J.Doe Well, yeah, but the point is that there's presumably real work being done inside the generator functions, and if you could surface that real work—e.g., maybe yield a bunch of values that can be passed to a function, instead of yielding the result of calling that function—then you could multiprocess that work without rewriting everything from scratch (a sketch of this idea appears after these comments). Of course that would almost certainly mean breaking an abstraction that makes your code readable/maintainable/generic, but sometimes you have to do that for the sake of performance.
Sorry man, it's open source and the guys will go crazy if I try to do that 😛
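For reference, a minimal sketch of the restructuring described in the comments above, assuming the generator's real work can be expressed as a per-item function (all names here are hypothetical):

import multiprocessing as mp

def process(item):
    # Hypothetical stand-in for the real per-item work the generator used to do itself.
    return item * item

def produce_items(n):
    # Hypothetical stand-in: yield the raw inputs instead of the processed results.
    for item in range(n):
        yield item

def main():
    with mp.Pool(processes=mp.cpu_count() - 1) as pool:
        # Only plain, picklable items cross the process boundary;
        # the generator itself never leaves the parent process.
        results = pool.map(process, produce_items(10))
    print(results)

if __name__ == '__main__':
    main()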