I am subclassing pandas DataFrame and I want to have an attribute.

import pandas as pd

class MyFrame(pd.DataFrame):
    _metadata = ['myattr']
    myattr = []  # class-level list, shared by every instance

    def __init__(self, *args, **kwargs):
        pd.DataFrame.__init__(self, *args, **kwargs)
        self.myattr.append(0)

    @property
    def _constructor(self):
        return MyFrame

My issue is that myattr is a class attribute. When I modify it on one instance of my class, every instance is modified:

mf2 = mf
mf2.myattr.append(1)
print(mf.myattr)
>>> [0, 1]

But I want the attribute to be attached to its instance. In other words, modifying myattr should affect only mf2, not mf. Thank you.

  • You can simply define the attribute in __init__ with self.myattr = []; that way you are sure it is an instance attribute ;) Commented Jan 27, 2019 at 2:07
  • Yes, but then it raises two issues: (1) myattr will not be attached to copies of my object, and (2) it gives this warning: UserWarning: Pandas doesn't allow columns to be created via a new attribute name Commented Jan 27, 2019 at 2:15
  • Well, I don't know how problematic this would be for you, but consider creating a class that does not inherit from DataFrame and instead contains one, for example self.df = pd.DataFrame. Composition looks like the better solution here because you would not need to adapt to the DataFrame implementation, at the cost of wrapping it. Of course this isn't the best solution in all cases, but still, consider it :) Commented Jan 27, 2019 at 2:22
  • Thank you for your advice. I finally managed to find a solution by defining the attribute in __init__ and redefining the copy method, which now also copies my attribute to the new DataFrame. Then I use copy() to duplicate my object. Commented Jan 28, 2019 at 1:18

1 Answer

An instance attribute can be assigned to a pd.DataFrame subclass as follows:

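import pandas as pd

class MyFrame(pd.DataFrame):
    # Names listed here are stored as plain attributes, not columns
    _metadata = ['myattr']

    def __init__(self, *args, **kwargs):
        pd.DataFrame.__init__(self, *args, **kwargs)
        # A new list per instance, so nothing is shared through the class
        self.myattr = [0]

    @property
    def _constructor(self):
        # Make pandas operations return MyFrame instead of a plain DataFrame
        return MyFrame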

The _metadata list names the attributes that should not be treated as columns. Names listed in _metadata are checked in the __setattr__() and __getattr__() methods of NDFrame, the parent class of pd.DataFrame, and are set as ordinary object attributes without raising the UserWarning.
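To check this behaviour, here is a minimal sketch using the class above. The column name x and the variable names a, b, c and column_warnings are only illustrative. As far as I can tell, pandas propagates _metadata values to copies by plain assignment, so a copy may end up pointing at the same list; rebinding it with list(...) is a defensive step under that assumption, not something the pandas documentation prescribes.

```python
import warnings

import pandas as pd


class MyFrame(pd.DataFrame):
    _metadata = ['myattr']

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.myattr = [0]  # fresh list for every instance

    @property
    def _constructor(self):
        return MyFrame


# Record warnings so we can look for the "columns to be created" UserWarning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    a = MyFrame({'x': [1, 2]})
    b = MyFrame({'x': [3, 4]})

column_warnings = [w for w in caught
                   if "columns to be created" in str(w.message)]

# Changing one instance leaves the other untouched.
a.myattr.append(1)

# copy() carries myattr over; rebind to a new list before mutating
# (assumption: the copy may otherwise reference the same list object as `a`).
c = a.copy()
c.myattr = list(c.myattr)
c.myattr.append(2)

print(a.myattr, b.myattr, c.myattr, len(column_warnings))
```

If the copy should always get its own list, the rebinding step could also live in an overridden copy method, which is what the asker reports doing in the comments.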

Attributes listed in _metadata are normal properties; it is also possible to define temporary properties with _internal_names, as described in the pandas documentation. Temporary properties are not kept after DataFrame modifications.
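The difference between the two kinds of property can be sketched as follows, modelled on the pattern in the pandas documentation on subclassing. The class name CachedFrame and the attribute name scratch are hypothetical, chosen only for this illustration.

```python
import pandas as pd


class CachedFrame(pd.DataFrame):
    # Temporary property: not passed on to the results of operations
    _internal_names = pd.DataFrame._internal_names + ['scratch']
    _internal_names_set = set(_internal_names)

    # Normal property: passed on to the results of operations
    _metadata = ['myattr']

    @property
    def _constructor(self):
        return CachedFrame


df = CachedFrame({'A': [1, 2], 'B': [3, 4], 'C': [5, 6]})
df.scratch = 'temporary'   # hypothetical temporary attribute
df.myattr = [0]

# Selecting columns builds a new CachedFrame.
subset = df[['A', 'B']]

kept = subset.myattr                      # carried over from df
dropped = not hasattr(subset, 'scratch')  # temporary attribute is gone
print(kept, dropped)
```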
