Pandas: setting a column in a new DataFrame replaces the old DataFrame

Question

I have two DataFrames and want to update a column in one based on the other. The problem is that when I update the column, the original DataFrame is overwritten as well.

(One DataFrame contains the correlation between each column and the target variable; the other is supposed to show the ranking.)

import numpy as np
import pandas as pd
from scipy.stats import pearsonr
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data[:100]
y = iris.target[:100]
clmns = iris.feature_names

out = pd.DataFrame(index=np.arange(0,len(clmns)), columns=['coef'])

feat_coef = pd.DataFrame(columns=['Feature_name','pearson_koef_FM']) 

feat_coef['Feature_name'] = clmns
feat_rank = feat_coef

X_np = np.array(X)
y_np = np.array(y)
for idx,name in enumerate(clmns):
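    # Assign with a single .loc call; chained indexing (out['coef'].loc[idx] = ...) may write to a temporary copy
    out.loc[idx, 'coef'] = pearsonr(X_np[:,idx], y_np)[0]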

feat_coef['pearson_koef_FM'] = np.absolute(out['coef'])

print('----BEFORE----')
print(feat_coef)

feat_rank['pearson_koef_FM'] = feat_coef['pearson_koef_FM'].rank(ascending=False)

print('----AFTER----')
print(feat_coef)

Which returns this:

----BEFORE----
        Feature_name pearson_koef_FM
0  sepal length (cm)         0.72829
1   sepal width (cm)        0.684019
2  petal length (cm)        0.969955
3   petal width (cm)        0.960158
----AFTER----
        Feature_name  pearson_koef_FM
0  sepal length (cm)              3.0
1   sepal width (cm)              4.0
2  petal length (cm)              1.0
3   petal width (cm)              2.0

Obviously, I expect feat_coef to remain unchanged. If I print feat_rank, I get the correct output. I suspect it has something to do with setting a copy vs. a view when copying DataFrames.

feat_rank is a reference, so replace feat_rank = feat_coef with feat_rank = feat_coef.copy()
– MaxU - stand with Ukraine, Commented Jan 4, 2017 at 8:56
I knew it would be something like this. Works perfectly! Could you explain why, please?
– HonzaB, Commented Jan 4, 2017 at 8:59

MaxU - stand with Ukraine · Accepted Answer · 2017-01-04 09:07:12Z

After this line:

feat_rank = feat_coef

feat_rank refers to the same object as feat_coef (no data is copied):

In [9]: feat_rank is feat_coef
Out[9]: True

In [10]: id(feat_rank)
Out[10]: 177476664

In [11]: id(feat_coef)
Out[11]: 177476664

In [12]: id(feat_coef) == id(feat_rank)
Out[12]: True

In [13]: feat_rank['new'] = 100

In [14]: feat_coef
Out[14]:
        Feature_name pearson_koef_FM  new
0  sepal length (cm)         0.72829  100
1   sepal width (cm)        0.684019  100
2  petal length (cm)        0.969955  100
3   petal width (cm)        0.960158  100

So any change made through feat_rank, whether adding a column or overwriting an existing one, also shows up in feat_coef, because both names point to the same DataFrame.
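
This is ordinary Python name binding rather than pandas' copy-versus-view behaviour. A minimal sketch with a plain list (the variable names here are illustrative, not from the question) shows the same effect:

```python
a = [1, 2, 3]
b = a          # b is a second name for the same list object
c = list(a)    # c is a new list holding the same items

b.append(4)    # mutate the object through the name b

print(b is a)  # True: one object, two names
print(a)       # [1, 2, 3, 4]: the change is visible through a
print(c)       # [1, 2, 3]: the separate list is unaffected
```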

Solution: if you need an independent DF use .copy():

feat_rank = feat_coef.copy()
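
A short sketch of the difference, assuming a small made-up DataFrame in place of the iris correlations from the question (the feature names and values are hypothetical):

```python
import pandas as pd

# Hypothetical stand-in for feat_coef; the values are illustrative only.
feat_coef = pd.DataFrame({
    "Feature_name": ["a", "b", "c"],
    "pearson_koef_FM": [0.7, 0.9, 0.6],
})

alias = feat_coef              # second name for the same DataFrame
feat_rank = feat_coef.copy()   # independent DataFrame (deep copy by default)

# Overwrite the column in the copy only.
feat_rank["pearson_koef_FM"] = feat_coef["pearson_koef_FM"].rank(ascending=False)

print(alias is feat_coef)                      # True
print(feat_rank is feat_coef)                  # False
print(feat_coef["pearson_koef_FM"].tolist())   # [0.7, 0.9, 0.6]
print(feat_rank["pearson_koef_FM"].tolist())   # [2.0, 1.0, 3.0]
```

With the copy in place, feat_coef keeps the correlation values and only feat_rank holds the ranks.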
