I'm having a bit of trouble modifying a duplicated pandas DataFrame without the edits being applied to both the duplicate and the original DataFrame.
Here's an example. Say I create an arbitrary DataFrame from a list of dictionaries:
In [67]: d = [{'a':3, 'b':5}, {'a':1, 'b':1}]
In [68]: d = DataFrame(d)
In [69]: d
Out[69]:
   a  b
0  3  5
1  1  1
Then I assign DataFrame 'd' to the variable 'e' and apply some arbitrary math to column 'a' using apply:
In [70]: e = d
In [71]: e['a'] = e['a'].apply(lambda x: x + 1)
The problem is that the change made with apply shows up in both the duplicate DataFrame 'e' and the original DataFrame 'd', and I cannot for the life of me figure out why:
In [72]: e # duplicate DataFrame
Out[72]:
   a  b
0  4  5
1  2  1
In [73]: d # original DataFrame, notice the alterations to frame 'e' were also applied
Out[73]:
   a  b
0  4  5
1  2  1
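For reference, here is a minimal standalone reproduction of the session above. It assumes `import pandas as pd` in place of the bare `DataFrame` name used in the transcript. The `is` check is an added diagnostic: in Python, `e = d` binds a second name to the same object rather than copying it, so the check shows whether 'e' and 'd' are in fact one DataFrame.

```python
import pandas as pd

# Rebuild the same two-row frame as in the session
d = pd.DataFrame([{'a': 3, 'b': 5}, {'a': 1, 'b': 1}])

# Plain assignment: 'e' becomes a second name for the very same object
e = d
print(e is d)  # True

# Modify column 'a' through 'e', exactly as in the session
e['a'] = e['a'].apply(lambda x: x + 1)

# 'd' reflects the change because it is the same object as 'e'
print(d['a'].tolist())  # [4, 2]
```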
I've searched both the pandas documentation and Google for a reason why this would be so, but to no avail. I can't understand what is going on here at all.
I've also tried doing the math with an element-wise operation (e.g., e['a'] = [i + 1 for i in e['a']]), but the problem persists. Is there a quirk in the pandas DataFrame type that I'm not aware of? I'd appreciate any insight someone can offer.
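The list-comprehension variant can be compared side by side against an explicit copy. The sketch below assumes that `DataFrame.copy()`, which makes a deep copy by default, is what separates the two frames. The names `d2` and `e2` are only for illustration.

```python
import pandas as pd

rows = [{'a': 3, 'b': 5}, {'a': 1, 'b': 1}]

# Case 1: plain assignment, then the list comprehension.
# 'e' and 'd' are one object, so 'd' changes as well.
d = pd.DataFrame(rows)
e = d
e['a'] = [i + 1 for i in e['a']]
print(d['a'].tolist())  # [4, 2]

# Case 2: explicit copy, then the same list comprehension.
# 'e2' owns its own data, so 'd2' is left untouched.
d2 = pd.DataFrame(rows)
e2 = d2.copy()
e2['a'] = [i + 1 for i in e2['a']]
print(d2['a'].tolist())  # [3, 1]
print(e2['a'].tolist())  # [4, 2]
```

The two cases run the identical list comprehension. The only difference is whether the second name points at a new object.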