I have a pandas DataFrame, sdm. I want to create a copy of that DataFrame and work on it, and later create another copy from sdm for a different analysis. However, when I create a new DataFrame like this,

new_df = sdm

It appears to create a copy; however, when I alter new_df, it also changes my old DataFrame sdm. How can I handle this without using =?

  • To copy a dataframe: sdm.copy() Commented Aug 9, 2017 at 20:46
  • pandas.pydata.org/pandas-docs/stable/generated/…. You probably want a deep copy, too, so don't forget deep=True. Commented Aug 9, 2017 at 20:46
  • May I ask why = does not work as it does in R? Commented Aug 9, 2017 at 20:47
  • You might want to look at "Facts and myths about Python names and values" (blog post by Ned Batchelder). The third "fact" is already appropriate here: "Fact: Many names can refer to one value." In short: = isn't a trick. It's an assignment. It assigns the right-hand side to the variable on the left-hand side. Commented Aug 9, 2017 at 20:50
  • @MSeifert got it, thank you, I will read that! Commented Aug 9, 2017 at 20:55

2 Answers

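In Python, assignment binds a name to an existing object rather than copying it, so new_df = sdm makes both names refer to the same DataFrame. Try this: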

new_df = sdm.copy()
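
A minimal sketch of the workflow from the question, assuming a small made-up sdm (the column names and values are hypothetical). Each analysis gets its own copy, and sdm itself stays untouched. Note that deep=True is already the default for DataFrame.copy(), so passing it explicitly is optional:

```python
import pandas as pd

# Hypothetical stand-in for the asker's sdm DataFrame.
sdm = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

# First analysis: work on an independent copy (deep=True is the default).
first_df = sdm.copy()
first_df["a"] = first_df["a"] * 10

# Second analysis: take a fresh copy from the unchanged original.
second_df = sdm.copy(deep=True)
second_df.loc[0, "b"] = -1.0

# The original still holds its initial values.
original_a = sdm["a"].tolist()
original_b = sdm["b"].tolist()
```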

I think you should have searched more; I am sure there are lots of questions on this topic!

2 Comments

I know I should have searched more, but I was confused about why this was happening. As @PaSTE pointed out, I used deep=True too, and it works!
@Jesse as mentioned in the answer, there is plenty of information online for this question. Good Luck!

You need to use new_df = sdm.copy() instead, which is described here in the official documentation. new_df = sdm doesn't work because the assignment does not copy the DataFrame: it only binds the name new_df to the same object, which means, in a nutshell, that both new_df and sdm reference the same data in memory.
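
The shared reference can be checked directly with the is operator. This is a rough illustration, assuming a made-up sdm with hypothetical column names:

```python
import pandas as pd

# Hypothetical stand-in for the asker's sdm DataFrame.
sdm = pd.DataFrame({"score": [10, 20, 30]})

# Plain assignment: both names point at the very same object.
new_df = sdm
same_object = new_df is sdm

# Adding a column through new_df therefore shows up in sdm as well.
new_df["flag"] = True
sdm_columns = list(sdm.columns)

# copy() returns a separate object; changing it leaves sdm alone.
copied_df = sdm.copy()
copied_df = copied_df.drop(columns="flag")
copy_is_separate = copied_df is not sdm
sdm_columns_after_copy = list(sdm.columns)
```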
