I'm having trouble altering a duplicated pandas DataFrame without the edits also being applied to the original DataFrame.

Here's an example. Say I create an arbitrary DataFrame from a list of dictionaries:

In [66]: from pandas import DataFrame

In [67]: d = [{'a':3, 'b':5}, {'a':1, 'b':1}]

In [68]: d = DataFrame(d)

In [69]: d

Out[69]: 
   a  b
0  3  5
1  1  1

Then I assign the 'd' dataframe to variable 'e' and apply some arbitrary math to column 'a' using apply:

In [70]: e = d

In [71]: e['a'] = e['a'].apply(lambda x: x + 1)

The problem arises in that the apply function apparently applies to both the duplicate DataFrame 'e' and original DataFrame 'd', which I cannot for the life of me figure out:

In [72]: e # duplicate DataFrame
Out[72]: 
   a  b
0  4  5
1  2  1

In [73]: d # original DataFrame, notice the alterations to frame 'e' were also applied
Out[73]:  
   a  b
0  4  5
1  2  1

I've searched both the pandas documentation and Google for a reason why this would be so, but to no avail. I can't understand what is going on here at all.

I've also tried doing the math with an element-wise operation (e.g., e['a'] = [i + 1 for i in e['a']]), but the problem persists. Is there a quirk in the pandas DataFrame type that I'm not aware of? I appreciate any insight someone might be able to offer.

1 Answer

This is not a pandas-specific issue. In Python, assignment never copies anything:

>>> a = [1,2,3]
>>> b = a
>>> b[0] = 'WHOA!'
>>> a
['WHOA!', 2, 3]

If you want a new DataFrame, make a copy with e = d.copy().
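As a rough sketch of the difference, here is the question's example redone with both a plain assignment and a copy. The name `alias` is made up for illustration; everything else follows the question. It assumes pandas is installed and relies on `DataFrame.copy()` making a deep copy by default.

```python
import pandas as pd

# Same data as in the question.
d = pd.DataFrame([{'a': 3, 'b': 5}, {'a': 1, 'b': 1}])

alias = d      # plain assignment: a second name for the same object
e = d.copy()   # a new, independent DataFrame (deep copy by default)

# Modify only the copy.
e['a'] = e['a'].apply(lambda x: x + 1)

print(alias is d)        # True: both names refer to one DataFrame
print(e is d)            # False: e is a separate object
print(d['a'].tolist())   # [3, 1]: the original is unchanged
print(e['a'].tolist())   # [4, 2]: only the copy was modified
```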

Edit: I should clarify that assignment to a bare name never copies anything. Assignment to an item or attribute (e.g., a[1] = x or a.foo = bar) is converted into method calls under the hood and may do copying depending on what kind of object a is.
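To make the "method calls under the hood" part concrete, here is a minimal sketch using a made-up class, `Recorder`, that just logs those calls. It is not how pandas implements anything; it only shows that a[1] = x goes through `__setitem__` and a.foo = bar goes through `__setattr__`, while binding a bare name calls nothing and copies nothing.

```python
class Recorder:
    """Hypothetical class that records item and attribute assignments."""

    def __init__(self):
        # Bypass our own __setattr__ so the log itself is stored normally.
        object.__setattr__(self, 'calls', [])

    def __setitem__(self, key, value):
        # Invoked for: obj[key] = value
        self.calls.append(('__setitem__', key, value))

    def __setattr__(self, name, value):
        # Invoked for: obj.name = value
        self.calls.append(('__setattr__', name, value))


r = Recorder()
alias = r            # bare-name assignment: no method call, no copy

alias[1] = 'x'       # becomes Recorder.__setitem__(alias, 1, 'x')
alias.foo = 'bar'    # becomes Recorder.__setattr__(alias, 'foo', 'bar')

print(alias is r)    # True
print(r.calls)       # [('__setitem__', 1, 'x'), ('__setattr__', 'foo', 'bar')]
```

What such a method does with the value (store a reference, copy it, or something else) is up to the class, which is why item and attribute assignment can behave differently from one type to another.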

2 Comments

And just when I thought I was getting a decent grip on Python! Thanks much for the response. That's a big help.
I don't understand what you mean by "why doesn't it work". It does work. Maybe you're confused by the variable names? I said e = d.copy() because those were the names used in the original question.