
I have a luigi pipeline. We have lots of external files that change on a regular basis, and we want to be able to build the pipeline from metadata.

I create classes dynamically, and have found two ways to do it:

Using exec:

exec("""
class {system}(DeliverySystem):
    pass
""".format(system='ClassUsingExec'))

Using type:

name = 'ClassUsingType'
globals()[name] = type(name, (DeliverySystem,), {})

Both of these work fine in a single-threaded environment. But when I run luigi with many workers spawning child processes, the exec version is fine while the type version gives errors as described in this post and this post (see them for more complete stack traces):

PicklingError: Can't pickle <class 'abc.ClassUsingType'>: attribute lookup abc.ClassUsingType failed.

The only diff I can find between the two is the module:

print(ClassUsingExec.__dict__) #=>

mappingproxy({'__module__': '__main__',
              '__doc__': None,
              '__abstractmethods__': frozenset(),
              '_abc_impl': <_abc_data at 0x15b5063c120>,
              '_namespace_at_class_time': ''})


print(ClassUsingType.__dict__) #=>

mappingproxy({'__module__': 'abc',
              '__doc__': None,
              '__abstractmethods__': frozenset(),
              '_abc_impl': <_abc_data at 0x15b3f870450>,
              '_namespace_at_class_time': ''})

It seems the module is different, and that might be the source of the difference in behaviour.

Using Python 3.6, Windows 10, luigi 2.8.9.

Questions:

Is there a way to use type to create a class so that its module is the module in which it is defined, and not abc?

Is there some other difference I am missing between the methods? According to this post there should be no difference, but I am not finding that to be the case.

  • Why do you not want to use the exec method? Commented Nov 5, 2019 at 12:26
  • I want to understand the difference, as I find the type method easier to extend. Like many others, I also have hard feelings toward exec. Commented Nov 5, 2019 at 13:01
  • I understand hard feelings towards exec, but I think in this case they're outweighed by having to assign to globals()[...]. FWIW, if I create a class using type, its __module__ is the module I defined it in. Commented Nov 5, 2019 at 13:20
  • That is very helpful, there must be something going on with the Luigi Task metaclass tinkering with the creation of the classes then! Thanks, that helps a lot! Commented Nov 5, 2019 at 13:55

1 Answer


The problem arises because:

  • On Windows, child processes are spawned rather than forked, so they do not have access to the parent's variables.
  • Luigi Tasks use Register (an extension of ABCMeta) as their metaclass.
  • When a class that has an ABC (Abstract Base Class) metaclass is created dynamically with type, the __module__ of that class will be abc, not the name of the module where the class was defined.

When a worker is assigned a Task, it looks the class up in the module named by the class's __module__. Since that is abc, and not the module where the class was dynamically created, the lookup fails.

To make it work, all that is needed is to set the module explicitly when creating the class:

type(name, (DeliverySystem,), {})

becomes

type(name, (DeliverySystem,), {'__module__': __name__})

Now when the worker gets assigned the task, it will go into the right module and re-create the class, and things will work!
