Original version of my problem

I'm trying to do a brute-force search using scipy.optimize.brute.

The cost function can be evaluated once 4 parameters are given, but those 4 parameters must satisfy certain conditions.

To deal with this and some other complicated issues, I wrote my own Python class (simplified as Parameter in the example below), but some of its attributes get lost when I use multiprocessing via the workers keyword.

Simplified version of my problem

import numpy as np
from multiprocessing import Pool

class Parameter(np.ndarray):
    def __new__(cls, maximum):
        self = np.asarray([0., 0., 0., 0.], dtype=np.float64).view(cls)
        return self

    def __init__(self, maximum):
        self.maximum = maximum
        self.validity = True

    def isvalid(self):
        if self.sum() <= self.maximum:
            return True
        else:
            return False

    def set(self, arg):
        for i in range(4):
            self[i] = arg[i]
        self.validity = self.isvalid()

def cost(arg, para):
    para.set(arg)
    if para.validity:
        return para.sum()
    else:
        return para.maximum

class CostWrapper:
    def __init__(self, f, args):
        self.f = f
        self.args = [] if args is None else args

    def __call__(self, x):
        return self.f(np.asarray(x), *self.args)

if __name__ == '__main__':
    parameter = Parameter(100)
    wrapped_cost = CostWrapper(cost, (parameter,))
    parameters_to_be_evaluated = [np.random.rand(4) for _ in range(4)]
    with Pool(2) as p:
        res = p.map(wrapped_cost, parameters_to_be_evaluated)

This raises:

  File "\_bug_attribute_lose.py", line 126, in isvalid
    if self.sum() <= self.maximum:
AttributeError: 'Parameter' object has no attribute 'maximum'

However, calling wrapped_cost directly, without p.map, does not raise an error:

wrapped_cost(np.random.rand(4))

What I've tried

By adding print statements throughout my code, I found that both __new__ and __init__ are called only once, so I suspect that the multiprocessing library somehow copies parameter.

I also found that the copied parameter contains only the attributes inherited from np.ndarray plus the methods defined in my class (isvalid, set):

dir(para) = ['T', '__abs__', '__add__', '__and__', '__array__', '__array_finalize__', '__array_function__', '__array_interface__', '__array_prepare__', '__array_priority__', '__array_struct__', '__array_ufunc__', '__array_wrap__', '__bool__', '__class__', '__complex__', '__contains__', '__copy__', '__deepcopy__', '__delattr__', '__delitem__', '__dict__', '__dir__', '__divmod__', '__doc__', '__eq__', '__float__', '__floordiv__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__iadd__', '__iand__', '__ifloordiv__', '__ilshift__', '__imatmul__', '__imod__', '__imul__', '__index__', '__init__', '__init_subclass__', '__int__', '__invert__', '__ior__', '__ipow__', '__irshift__', '__isub__', '__iter__', '__itruediv__', '__ixor__', '__le__', '__len__', '__lshift__', '__lt__', '__matmul__', '__mod__', '__module__', '__mul__', '__ne__', '__neg__', '__new__', '__or__', '__pos__', '__pow__', '__radd__', '__rand__', '__rdivmod__', '__reduce__', '__reduce_ex__', '__repr__', '__rfloordiv__', '__rlshift__', '__rmatmul__', '__rmod__', '__rmul__', '__ror__', '__rpow__', '__rrshift__', '__rshift__', '__rsub__', '__rtruediv__', '__rxor__', '__setattr__', '__setitem__', '__setstate__', '__sizeof__', '__str__', '__sub__', '__subclasshook__', '__truediv__', '__xor__', 'all', 'any', 'argmax', 'argmin', 'argpartition', 'argsort', 'astype', 'base', 'byteswap', 'choose', 'clip', 'compress', 'conj', 'conjugate', 'copy', 'ctypes', 'cumprod', 'cumsum', 'data', 'diagonal', 'dot', 'dtype', 'dump', 'dumps', 'fill', 'flags', 'flat', 'flatten', 'getfield', 'imag', 'isvalid', 'item', 'itemset', 'itemsize', 'max', 'mean', 'min', 'nbytes', 'ndim', 'newbyteorder', 'nonzero', 'partition', 'prod', 'ptp', 'put', 'ravel', 'real', 'repeat', 'reshape', 'resize', 'round', 'searchsorted', 'set', 'setfield', 'setflags', 'shape', 'size', 'sort', 'squeeze', 'std', 'strides', 'sum', 'swapaxes', 'take', 'tobytes', 'tofile', 'tolist', 'tostring', 'trace', 'transpose', 'var', 'view']

(note that neither 'maximum' nor 'validity' is present)
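The same loss can be reproduced without Pool at all. Below is a minimal sketch, assuming a stripped-down Parameter and a hypothetical calls counter that is not part of the original code: a single pickle round trip, which is what Pool applies to every argument it sends to a worker, re-runs neither __new__ nor __init__, so nothing re-creates maximum on the copy unless the pickled state carries it.

```python
import pickle

import numpy as np

# Hypothetical counters, only here to show which constructors actually run.
calls = {"new": 0, "init": 0}


class Parameter(np.ndarray):
    """Stripped-down copy of the question's class, instrumented with counters."""

    def __new__(cls, maximum):
        calls["new"] += 1
        return np.zeros(4, dtype=np.float64).view(cls)

    def __init__(self, maximum):
        calls["init"] += 1
        self.maximum = maximum
        self.validity = True


parameter = Parameter(100)

# Pool pickles each argument it ships to a worker; this is the same round trip.
restored = pickle.loads(pickle.dumps(parameter))

print(calls)                          # {'new': 1, 'init': 1}: unpickling ran neither
print(hasattr(parameter, "maximum"))  # True
print(hasattr(restored, "maximum"))   # expected False, matching the AttributeError
```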

Therefore, I tried implementing a __copy__ method in the Parameter class:

def __copy__(self):
    print('__copy__')
    new = Parameter(self.maximum)
    new.__dict__.update(self.__dict__)
    return new

but it did not help.
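One plausible reason the __copy__ attempt had no effect is that pickle never consults __copy__; only the copy module does. A minimal sketch to check this, using a trimmed Parameter and a hypothetical copy_calls log added purely for illustration:

```python
import copy
import pickle

import numpy as np

# Hypothetical log of __copy__ calls, used only for this demonstration.
copy_calls = []


class Parameter(np.ndarray):
    def __new__(cls, maximum):
        obj = np.zeros(4, dtype=np.float64).view(cls)
        obj.maximum = maximum
        return obj

    def __copy__(self):
        copy_calls.append("__copy__")
        new = Parameter(self.maximum)
        new[...] = self                     # copy the array values
        new.__dict__.update(self.__dict__)  # and the custom attributes
        return new


parameter = Parameter(100)

copy.copy(parameter)                   # the copy module dispatches to __copy__
pickle.loads(pickle.dumps(parameter))  # pickle goes through __reduce__ instead

print(copy_calls)  # ['__copy__']: only copy.copy used the custom method
```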

My questions:

  1. Some of the attributes that the Parameter object should have are lost. My guess is that the multiprocessing library somehow copies the variable parameter and that I did not implement the copy method properly. Am I right?

  2. If so, how can I implement it correctly? If not, what causes the error?

Answer


It's a bit tricky, but it's possible.

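First, when inheriting from np.ndarray, you should define an __array_finalize__ method that copies your custom attributes from the source array onto every newly created one. NumPy calls __array_finalize__ whenever it creates a new instance of the subclass (view casting with .view(), new-from-template operations such as slicing, and explicit construction), not just once, and during explicit construction obj is None, so you need a None guard. More about this in the docs.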

def __array_finalize__(self, obj):
    if obj is None: return
    self.maximum = getattr(obj, 'maximum', None)
    self.validity = getattr(obj, 'validity', None)
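To see what the guard and the getattr defaults do, here is a small sketch with a trimmed Parameter that has only __new__ and __array_finalize__ (the variable names head and plain are illustrative): slicing carries the attributes over, while view-casting a plain array falls back to None.

```python
import numpy as np


class Parameter(np.ndarray):
    def __new__(cls, maximum):
        obj = np.zeros(4, dtype=np.float64).view(cls)
        obj.maximum = maximum
        obj.validity = True
        return obj

    def __array_finalize__(self, obj):
        # obj is None when the array comes from an explicit ndarray.__new__ call;
        # otherwise it is the array this one was derived from.
        if obj is None:
            return
        self.maximum = getattr(obj, "maximum", None)
        self.validity = getattr(obj, "validity", None)


parameter = Parameter(100)

head = parameter[:2]                # new-from-template: a slice of a Parameter
plain = np.ones(4).view(Parameter)  # view casting from a plain ndarray

print(head.maximum, head.validity)  # 100 True
print(plain.maximum)                # None: the source had no such attribute
```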

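Second, multiprocessing.Pool uses pickle to serialize the data before sending it to the worker processes. ndarray's default pickling stores only the array itself (shape, dtype, and raw data), not the instance __dict__, so your extra attributes are lost in the process. We therefore have to add them to the pickled state and put them back when unpickling.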

Override the __reduce__ method:

def __reduce__(self):
    pickled_state = super().__reduce__()
    new_state = pickled_state[2] + (self.__dict__, )
    return (*pickled_state[0:2], new_state)
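The override relies on the shape of ndarray's own __reduce__ result. This sketch unpacks it for an assumed example (a plain float64 array of length 4): the third element is the state tuple that ndarray.__setstate__ understands, and the answer's approach simply appends __dict__ to it as an extra last element, which __setstate__ must strip off again.

```python
import numpy as np

arr = np.zeros(4, dtype=np.float64)

# ndarray.__reduce__ returns (reconstructor, args, state).
reconstruct, args, state = arr.__reduce__()

# state is what ndarray.__setstate__ expects:
# (version, shape, dtype, is_fortran, raw data)
print(len(state), state[1], state[2])

# The answer's __reduce__ appends __dict__ as one more element ...
extended = state + ({"maximum": 100, "validity": True},)

# ... and its __setstate__ peels it off before calling ndarray.__setstate__.
restored = reconstruct(*args)
restored.__setstate__(extended[:-1])
print(restored)
```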

Then override the __setstate__ method:

def __setstate__(self, state):
    self.__dict__.update(state[-1])
    super().__setstate__(state[0:-1])

The implementation was borrowed from this answer.
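Before going through Pool, the pickling half can be checked on its own. A minimal sketch, assuming the trimmed Parameter below (the same __array_finalize__, __reduce__, and __setstate__ as above, with __reduce__ written via tuple unpacking but otherwise equivalent, and without isvalid/set): a direct pickle round trip should now keep both custom attributes.

```python
import pickle

import numpy as np


class Parameter(np.ndarray):
    def __new__(cls, maximum):
        obj = np.zeros(4, dtype=np.float64).view(cls)
        obj.maximum = maximum
        obj.validity = True
        return obj

    def __array_finalize__(self, obj):
        if obj is None:
            return
        self.maximum = getattr(obj, "maximum", None)
        self.validity = getattr(obj, "validity", None)

    def __reduce__(self):
        # Append the instance __dict__ to ndarray's own state tuple.
        reconstruct, args, state = super().__reduce__()
        return reconstruct, args, state + (self.__dict__,)

    def __setstate__(self, state):
        # Restore the custom attributes, then hand the rest to ndarray.
        self.__dict__.update(state[-1])
        super().__setstate__(state[:-1])


parameter = Parameter(100)
parameter[:] = [1.0, 2.0, 3.0, 4.0]

restored = pickle.loads(pickle.dumps(parameter))
print(restored, restored.maximum, restored.validity)
```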

OK, now let's combine everything into runnable code:

import numpy as np
from multiprocessing import Pool

class Parameter(np.ndarray):
    def __new__(cls, maximum):
        obj = np.asarray([0, 0, 0, 0], dtype=np.float64).view(cls)
        obj.maximum = maximum
        obj.validity = True
        return obj
    
    def __array_finalize__(self, obj):
        if obj is None: return
        self.maximum = getattr(obj, 'maximum', None)
        self.validity = getattr(obj, 'validity', None)

    def __reduce__(self):
        pickled_state = super().__reduce__()
        new_state = pickled_state[2] + (self.__dict__, )
        return (*pickled_state[0:2], new_state)
    
    def __setstate__(self, state):
        self.__dict__.update(state[-1])
        super().__setstate__(state[0:-1])

    def isvalid(self):
        return self.sum() <= self.maximum

    def set(self, arg):
        for i in range(4):
            self[i] = arg[i]
        self.validity = self.isvalid()

def cost(arg, para):
    para.set(arg)
    return para.sum() if para.validity else para.maximum

class CostWrapper:
    def __init__(self, f, args):
        self.f = f
        self.args = () if args is None else args

    def __call__(self, x):
        return self.f(np.asarray(x), *self.args)

if __name__ == '__main__':
    parameter = Parameter(100)
    wrapped_cost = CostWrapper(cost, (parameter,))
    parameters_to_be_evaluated = [np.random.rand(4) for _ in range(4)]
    with Pool(2) as p:
        res = p.map(wrapped_cost, parameters_to_be_evaluated)

By the way, did you know that this question has already been asked? Here. However, that question does not involve multiple attributes as yours does (an easy extension), so I'll cut you some slack this time.

Comment

I didn't notice that the same question had been posted before. Both your answer and the post you linked were really helpful to me. Thank you very much!
