DataFrame.apply in python pandas alters both original and duplicate DataFrames

Question

I'm having a bit of trouble altering a duplicated pandas DataFrame and not having the edits apply to both the duplicate and the original DataFrame.

Here's an example. Say I create an arbitrary DataFrame from a list of dictionaries:

In [66]: from pandas import DataFrame

In [67]: d = [{'a':3, 'b':5}, {'a':1, 'b':1}]

In [68]: d = DataFrame(d)

In [69]: d
Out[69]:
   a  b
0  3  5
1  1  1

Then I assign the 'd' dataframe to variable 'e' and apply some arbitrary math to column 'a' using apply:

In [70]: e = d

In [71]: e['a'] = e['a'].apply(lambda x: x + 1)

The problem is that the change apparently applies to both the duplicate DataFrame 'e' and the original DataFrame 'd', and I cannot for the life of me figure out why:

In [72]: e # duplicate DataFrame
Out[72]: 
   a  b
0  4  5
1  2  1

In [73]: d # original DataFrame, notice the alterations to frame 'e' were also applied
Out[73]:  
   a  b
0  4  5
1  2  1

I've searched both the pandas documentation and Google for a reason why this would be so, but to no avail. I can't understand what is going on here at all.

I've also tried the math operations using an element-wise operation (e.g., e['a'] = [i + 1 for i in e['a']] ), but the problem persists. Is there a quirk in the pandas DataFrame type that I'm not aware of? I appreciate any insight someone might be able to offer.

BrenBarn · Accepted Answer · 2012-06-01 05:27:19Z

14

This is not a pandas-specific issue. In Python, assignment never copies anything:

>>> a = [1,2,3]
>>> b = a
>>> b[0] = 'WHOA!'
>>> a
['WHOA!', 2, 3]

If you want a new DataFrame, make a copy with e = d.copy().
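As a minimal sketch of that fix, here is the question's example rerun with d.copy() in place of plain assignment. The names d and e come from the question; the pd alias and the printed checks are only there for illustration.

```python
import pandas as pd

# Same data as in the question.
d = pd.DataFrame([{'a': 3, 'b': 5}, {'a': 1, 'b': 1}])

# copy() returns a new DataFrame (a deep copy by default),
# so e and d no longer refer to the same object.
e = d.copy()
e['a'] = e['a'].apply(lambda x: x + 1)

print(e['a'].tolist())  # [4, 2]  (the copy was modified)
print(d['a'].tolist())  # [3, 1]  (the original is unchanged)
print(e is d)           # False
```

With e = d instead, the last line would print True, and both frames would show the incremented column because there is only one frame.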

Edit: I should clarify that assignment to a bare name never copies anything. Assignment to an item or attribute (e.g., a[1] = x or a.foo = bar) is converted into method calls under the hood and may do copying depending on what kind of object a is.
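To illustrate the distinction, here is a small sketch in plain Python. CopyingBox is a hypothetical class invented for this example (it is not part of pandas or the standard library); its __setitem__ happens to store a copy, which is one way item assignment can behave differently from bare-name assignment.

```python
class CopyingBox:
    """Hypothetical container whose item assignment stores a copy."""

    def __init__(self):
        self._items = {}

    def __setitem__(self, key, value):
        # box[key] = value ends up here; this class chooses to copy.
        self._items[key] = list(value)

    def __getitem__(self, key):
        return self._items[key]


data = [1, 2, 3]
alias = data          # bare-name assignment: a second name for the same list
box = CopyingBox()
box['x'] = data       # item assignment: dispatched to __setitem__, which copies here

data.append(4)

print(alias is data)  # True
print(alias)          # [1, 2, 3, 4]  (the alias sees the change)
print(box['x'])       # [1, 2, 3]     (the stored copy does not)
```

Whether an item or attribute assignment copies is up to the object being assigned into; a plain list or dict stores a reference rather than a copy.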

edited Jun 1, 2012 at 5:27

answered Jun 1, 2012 at 5:13

BrenBarn


2 Comments

MikeGruz Over a year ago

And just when I thought I was getting a decent grip on Python! Thanks much for the response. That's a big help.

BrenBarn Over a year ago

I don't understand what you mean by "why doesn't it work". It does work. Maybe you're confused by the variable names? I said e = d.copy() because those were the names used in the original question.
