4

I have an instance variable that seems to be treated like a class variable: modifying it through one instance changes it for every instance of the class.

import random
import pandas as pd

class DNA(object):

    def __init__(self, genes=pd.DataFrame(), sizecontrol=500, name='1'):
        self.name = name
        self.genes = genes  # This attribute should be an instance variable
        self.GeneLen = self.genes.shape[1]
        self.sizecontrol = sizecontrol
        self.Features = []
        self.BaseFeats = []
        random.seed(self.name)

When I run this I get the following:

 In[68]: df = pd.DataFrame(data)

 In[69]: x1 = DNA(genes=df)

 In[70]: x2 = DNA(genes=df)

 In[71]: x1.genes["dummy"] = 'test'

 In[72]: x2.genes["dummy"].head(4) 
 Out[72]:   
  0 test 
  1 test 
  2 test 
  3 test 

How can I make sure x1.genes does not affect x2.genes?

  • Try passing the df variable as DNA(genes=df.copy()) Commented May 19, 2017 at 14:48
  • Both your instances are using the same dataframe as their .genes attribute. Commented May 19, 2017 at 14:50
  • Where @PM2Ring means literally the same object in memory. Commented May 19, 2017 at 14:55

2 Answers

6

There are two issues here.

First, data frames are mutable objects and both of your instances are referencing the same object. You'll want to supply a new copy to each instance using df.copy(). Alternatively, you could copy the data frame in the __init__ function itself. This would be "safer" in that you can be sure no two instances share a data frame, but it might also create unnecessary copies.

Second, and not relevant in your example, there is an issue with supplying a mutable default argument, genes = pd.DataFrame(). This data frame is created once, when the function is defined, and is saved on the unbound __init__ function as if it were member data of that function (see DNA.__init__.__func__.func_defaults in Python 2, or DNA.__init__.__defaults__ in Python 3). Instead, use a default argument of None or some other sentinel value and then instantiate a new data frame when genes is None.
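As a rough illustration of the second point, here is a minimal sketch that uses a plain list in place of the data frame so it runs without pandas. The SafeDNA name is made up for the comparison, and both classes are trimmed down to the one attribute that matters here.

```python
class DNA(object):
    # Pitfall: the default list is created once, when the def statement runs,
    # and every call that omits genes receives that same list.
    def __init__(self, genes=[]):
        self.genes = genes


class SafeDNA(object):  # hypothetical name, used only for this comparison
    # None is the sentinel; a fresh list is built on each call that omits genes.
    def __init__(self, genes=None):
        self.genes = [] if genes is None else genes


a, b = DNA(), DNA()
a.genes.append("dummy")
shared_default = b.genes      # b sees the element appended through a

c, d = SafeDNA(), SafeDNA()
c.genes.append("dummy")
separate_default = d.genes    # d is unaffected
```

The same pattern should carry over to the original class by writing genes=None in the signature and creating pd.DataFrame() inside the body when genes is None.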

2 Comments

"This data frame is saved on the class sort of like member data of the class" Not exactly, but I know what you mean. It's just that default args are evaluated once, when the class definition itself is created. And that's why it can lead to problems with default mutable args.
@SyrtisMajor That's not really relevant to the OP's question since they aren't actually using that default arg. But I fully agree that if they want a default there, they should be using None and testing for None in the body of the __init__
5

Your code is working fine in the sense that genes is an attribute of instances of the DNA class.

However, you only ever created one dataframe. You assign the name df to it and also make it the attribute genes of both x1 and x2 with the

self.genes = genes

assignment. Since assignment never copies data, you still have only one dataframe, which is shared by x1 and x2.
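The sharing can be checked directly with the is operator. The sketch below uses a hypothetical Holder class and a plain dict instead of DNA and a dataframe, on the assumption that any mutable object behaves the same way under assignment.

```python
class Holder(object):  # hypothetical stand-in for the DNA class
    def __init__(self, genes):
        # Binds another name to the same object; nothing is copied here.
        self.genes = genes


data = {"a": [1, 2, 3]}  # made-up data; a dict stands in for the dataframe
h1 = Holder(data)
h2 = Holder(data)

# All three names refer to one object in memory.
same_object = h1.genes is h2.genes and h1.genes is data

# A change made through h1 is therefore visible through h2 and data.
h1.genes["dummy"] = "test"
visible_through_h2 = "dummy" in h2.genes
```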

To solve the issue, you could either make a copy of your dataframe before passing it to the DNA constructor or use

self.genes = genes.copy()

in the __init__ method.
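A minimal sketch of the second option, copying inside __init__, might look like the following. The class is cut down to the relevant attributes, the sample data is made up, and the None default is an addition that is not part of the original signature.

```python
import pandas as pd


class DNA(object):
    def __init__(self, genes=None, name="1"):
        self.name = name
        # Copy on the way in so each instance owns its own dataframe.
        # The None default is an assumption, not the original signature.
        self.genes = pd.DataFrame() if genes is None else genes.copy()


df = pd.DataFrame({"a": [1, 2, 3]})  # made-up sample data
x1 = DNA(genes=df)
x2 = DNA(genes=df)

x1.genes["dummy"] = "test"           # only x1's copy gains the column
x2_columns = list(x2.genes.columns)
df_columns = list(df.columns)
```

With this version neither x2.genes nor the original df picks up the "dummy" column, at the cost of one copy per instance.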
