I have a pandas dataframe in python, let's call it df

In this dataframe I create a new column based on an existing column as follows:

df.loc[:, 'new_col'] = df['col']

Then I do the following:

df[df['new_col']=='Above Average'] = 'Good'

However, I noticed that this operation also changes the values in df['col']

What should I do so that the values in df['col'] are not affected by operations I perform on df['new_col']?

  • I tried that and it does not work. Commented May 14, 2019 at 9:01

2 Answers

Use DataFrame.loc with boolean indexing:

df.loc[df['new_col']=='Above Average', 'new_col'] = 'Good'

If no column is specified, every column in the rows matching the condition is set to 'Good', which is why df['col'] changed as well.
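To make the difference concrete, here is a minimal sketch on made-up data (the three sample values and the names all_cols and one_col are assumptions for illustration, not from the question):

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({'col': ['Above Average', 'Below Average', 'Above Average']})
df['new_col'] = df['col']

mask = df['new_col'] == 'Above Average'

# No column label: every column in the matching rows is overwritten.
all_cols = df.copy()
all_cols[mask] = 'Good'

# Column label given: only new_col is overwritten.
one_col = df.copy()
one_col.loc[mask, 'new_col'] = 'Good'

print(all_cols)
print(one_col)
```

In all_cols both columns show 'Good' in the matching rows, while in one_col the original col is left as it was.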


Alternatively, both lines of code can be combined into one with numpy.where or Series.mask, testing the condition on the original column:

df['new_col'] = np.where(df['col']=='Above Average', 'Good', df['col'])

df['new_col'] = df['col'].mask(df['col']=='Above Average', 'Good')
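As a rough check that the two one-liners agree, the sketch below runs both on made-up data (the sample values and the names cond, via_where and via_mask are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({'col': ['Above Average', 'Average', 'Above Average']})
cond = df['col'] == 'Above Average'

# np.where returns a NumPy array: 'Good' where cond is True, else the value from col.
via_where = pd.Series(np.where(cond, 'Good', df['col']), index=df.index)

# Series.mask replaces values where cond is True and keeps the rest.
via_mask = df['col'].mask(cond, 'Good')

print(via_where.tolist())
print(via_mask.tolist())
```

Neither call modifies df['col']; both build a new object that can then be assigned to df['new_col'].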

EDIT: To change many values, use Series.replace or Series.map with a dictionary of the values to replace:

d = {'Good':['Above average','effective'], 'Very Good':['Really effective']}

# invert the dict so that each old value maps to its replacement
# http://stackoverflow.com/a/31674731/2901002
d1 = {k: oldk for oldk, oldv in d.items() for k in oldv}
print(d1)
{'Above average': 'Good', 'effective': 'Good', 'Really effective': 'Very Good'}

df['new_col'] = df['col'].replace(d1)
# on large data, map with fillna usually performs better
df['new_col'] = df['col'].map(d1).fillna(df['col'])
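A small sketch of both dictionary approaches on made-up data follows. The 'Poor' row is an assumption added to show what happens to a value that is not in the dictionary. Note that the keys are matched exactly, so 'Above average' and 'Above Average' count as different values.

```python
import pandas as pd

# Hypothetical sample data; 'Poor' has no entry in the mapping.
df = pd.DataFrame({'col': ['Above average', 'effective', 'Really effective', 'Poor']})
d1 = {'Above average': 'Good', 'effective': 'Good', 'Really effective': 'Very Good'}

# replace leaves values without a dictionary entry unchanged.
replaced = df['col'].replace(d1)

# map yields NaN for values without an entry, so fillna restores the original value.
mapped = df['col'].map(d1).fillna(df['col'])

print(replaced.tolist())
print(mapped.tolist())
```

Both results keep 'Poor' as it is and leave df['col'] untouched.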

1 Comment

And what if I have multiple conditions, say, changing both 'Above average' and 'effective' to 'Good'? And what about other cases, for example 'Really effective' becoming 'Very Good'?

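There is also the option of using the Series.where method (assign the result back rather than using inplace=True on a selected column, which may not update df in newer pandas versions):

df['new_col'] = df['col']
df['new_col'] = df['new_col'].where(df['new_col'] != 'Above Average', other='Good')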

That said, np.where is usually the fastest option:

m = df['col'] == 'Above Average'
df['new_column'] = np.where(m, 'Good', df['col'])

df['new_column'] is the new column. Where mask m is True, 'Good' is assigned; otherwise the value from df['col'] is kept.


+----+---------------+
|    | col           |
|----+---------------|
|  0 | Nan           |
|  1 | Above Average |
|  2 | 1.0           |
+----+---------------+
+----+---------------+--------------+
|    | col           | new_column   |
|----+---------------+--------------|
|  0 | Nan           | Nan          |
|  1 | Above Average | Good         |
|  2 | 1.0           | 1.0          |
+----+---------------+--------------+

Here are also some notes on masking with df.loc:

m = df['col']=='Above Average'
print(m)
df.loc[m, 'new_column'] = 'Good'

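The result is the same as long as df['new_column'] already holds the values from df['col']. Unlike np.where, df.loc only writes the rows where m is True; rows where m is False keep whatever the column already contains, or become NaN if the column did not exist before.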
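The behaviour of df.loc for rows where the mask is False can be seen in a short sketch on made-up data (the sample values and the names fresh and copied are assumptions for illustration):

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({'col': ['x', 'Above Average', 'y']})
m = df['col'] == 'Above Average'

# Case 1: new_column does not exist yet, so rows where m is False become NaN.
fresh = df.copy()
fresh.loc[m, 'new_column'] = 'Good'

# Case 2: new_column is first copied from col, so rows where m is False keep the copied value.
copied = df.copy()
copied['new_column'] = copied['col']
copied.loc[m, 'new_column'] = 'Good'

print(fresh)
print(copied)
```

Only the second case matches the np.where result, which is why the column is copied before the masked assignment.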
