
I often hit problems where I want to do something simple over a set of many, many objects quickly. My natural choice is IPython Parallel for its simplicity, but often I have to deal with unpicklable objects. After trying for a few hours I usually resign myself to running my tasks overnight on a single computer, or do something clumsy like dividing the work semi-manually to run in multiple Python scripts.

To give a concrete example, suppose I want to delete all keys in a given S3 bucket.

What I'd normally do without thinking is:

import boto
from IPython.parallel import Client

# awskey / awssec are placeholders for real credentials
connection = boto.connect_s3(awskey, awssec)
bucket = connection.get_bucket('mybucket')

client = Client()
loadbalancer = client.load_balanced_view()

keyList = list(bucket.list())
loadbalancer.map(lambda key: key.delete(), keyList)

The problem is that boto's Key object is unpicklable (*). This comes up for me very often, in many different contexts. It's a problem with multiprocessing, execnet, and every other framework and library I've tried too (for obvious reasons: they all use the same pickler to serialize objects).

Do you guys also run into these problems? Is there a way I can serialize these more complex objects? Do I have to write my own pickler for these particular objects? If so, how do I tell IPython Parallel to use it? How do I write a pickler?
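As a sketch of what "writing your own pickler" can look like: Python's copyreg module (copy_reg in Python 2) lets you register a reduction function for a class you don't control, telling pickle how to rebuild an instance from plain data. The Key class below is a hypothetical stand-in, not boto's:

```python
import copyreg
import pickle

class Key(object):
    """Hypothetical stand-in for a third-party class you don't control."""
    def __init__(self, name):
        self.name = name

def reduce_key(key):
    # Tell pickle to rebuild a Key from just its picklable state.
    return (Key, (key.name,))

# Register the reduction for all Key instances.
copyreg.pickle(Key, reduce_key)

restored = pickle.loads(pickle.dumps(Key("myobject")))
print(restored.name)  # myobject
```

Whether this is enough in practice depends on how much live state (connections, file handles) the object carries; anything not captured by the reduction function has to be rebuilt on the receiving side.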

Thanks!


(*) I'm aware that I can simply make a list of the key names and do something like this:

loadbalancer.map(lambda keyname: getKey(keyname).delete(), keyNameList)

and define the getKey function on each engine of the IPython cluster. But this is just a particular instance of a more general problem that I hit often. Maybe it's a bad example, since it can easily be solved another way.
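For completeness, the shape of that workaround, with a hypothetical in-memory dict standing in for the S3 bucket so the sketch is self-contained (on a real cluster, the delete function would open its own boto connection on each engine):

```python
# Hypothetical stand-in for the bucket: maps key names to stored data.
BUCKET = {"a.txt": b"data", "b.txt": b"more"}

def delete_by_name(keyname):
    # Only the plain (picklable) string travels to the worker;
    # the live Key object never has to cross the wire.
    BUCKET.pop(keyname, None)

# Stand-in for loadbalancer.map(delete_by_name, keyNames):
for name in list(BUCKET):
    delete_by_name(name)

print(BUCKET)  # {}
```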

Comment: I'm sure the tasks are embarrassed enough without you making fun of them on SO! – Commented Mar 5, 2013 at 15:23

2 Answers


IPython has a use_dill option: if you have the dill serializer installed, you can serialize most "unpicklable" objects.

How can I use dill instead of pickle with load_balanced_view
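To see why dill helps here: the standard pickler refuses the very lambda passed to map above. A quick self-contained check (dill itself is omitted since it's a third-party install; dill.dumps(f) would succeed where pickle fails):

```python
import pickle

f = lambda key: key  # the kind of callable handed to load_balanced_view.map

try:
    pickle.dumps(f)
    plain_pickle_ok = True
except (pickle.PicklingError, AttributeError):
    # CPython raises PicklingError for module-level lambdas, and
    # AttributeError ("Can't pickle local object") for nested ones.
    plain_pickle_ok = False

print("plain pickle handled the lambda:", plain_pickle_ok)
```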



That IPython sure brings people together ;). From what I've been able to gather, the problem with pickling these objects is their methods. So instead of using the key's own method to delete it, you could write a function that takes a key and deletes it. Maybe first get a list of dicts with the relevant information on each key, and afterwards call a function delete_key(dict), which I leave up to you to write because I've no idea how to handle S3 keys.
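The idea can be demonstrated with a stand-in class (hypothetical, not boto's Key): an object holding a live resource won't pickle, but a dict of its plain fields will:

```python
import pickle
import threading

class LiveKey(object):
    """Hypothetical stand-in for boto's Key."""
    def __init__(self, name):
        self.name = name
        self._conn = threading.Lock()  # like a live connection: unpicklable

key = LiveKey("myobject")
try:
    pickle.dumps(key)
    whole_object_ok = True
except (TypeError, pickle.PicklingError, AttributeError):
    # e.g. TypeError: cannot pickle '_thread.lock' object
    whole_object_ok = False

# A dict of just the relevant fields round-trips fine:
info = {"name": key.name}
assert pickle.loads(pickle.dumps(info)) == info
print("whole object picklable:", whole_object_ok)
```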

Would that work?


Alternatively, this might work: instead of calling the method on the instance, call the method on the class with the instance as an argument. So instead of lambda key: key.delete() you would do lambda key: Key.delete(key). Of course you then have to push the class to the engines, but that shouldn't be a problem. A minimal example:

class stuff(object):
    def __init__(self, a=1):
        self.list = []
    def append(self, a):
        self.list.append(a)

import IPython.parallel as p
c = p.Client()
dview = c[:]

li = map(stuff, [[]] * 10)  # creates 10 stuff instances

dview.map(lambda x: x.append(1), li)  # should append 1 to all lists, but fails

dview.push({'stuff': stuff})  # push the class to the engines
dview.map(lambda x: stuff.append(x, 1), li)  # this works

