3

My dataframe is something like

df

group cat_col
g1    r
g1    nr
g1    r
g1    nr
g2    nr
g2    nr

I need to replace "nr" with "r" whenever the group has at least one "r". In this case, I need it to return:

df_new

group cat_col
g1    r
g1    r
g1    r
g1    r
g2    nr
g2    nr

I know this question is elementary, but I've been stuck for hours and haven't figured out how to solve it. Does anyone know?

1
  • I assume you're using Pandas, so I added the pandas tag for you. If that's incorrect, you can edit to fix it. Commented Aug 29, 2021 at 23:57

4 Answers

3

Use groupby.transform:

df.cat_col.groupby(df.group).transform(lambda g: 'r' if g.eq('r').any() else g)

0     r
1     r
2     r
3     r
4    nr
5    nr
Name: cat_col, dtype: object

If you only need to replace nr with r (and leave any other values unchanged):

df.cat_col = df.cat_col.groupby(df.group).transform(
  lambda g: g.replace('nr', 'r') if g.eq('r').any() else g
)

1 Comment

Nice one. However, this will not work if the requirement is to change only nr to r and the column contains values other than r and nr. The OP must clarify this.
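To illustrate the caveat raised in this comment, here is a minimal sketch comparing the two variants from the answer. The third category 'x' and the names demo, has_r, overwrite_all and replace_nr_only are assumptions made up for the example; they do not appear in the question.

```python
import pandas as pd

# Hypothetical data: 'x' is an assumed third category, not part of the question
demo = pd.DataFrame({
    'group': ['g1', 'g1', 'g1', 'g2', 'g2'],
    'cat_col': ['r', 'nr', 'x', 'nr', 'x'],
})


def has_r(g):
    """Return True if the group contains at least one 'r'."""
    return g.eq('r').any()


# Variant 1: overwrite every value in a group that contains an 'r'
overwrite_all = demo.cat_col.groupby(demo.group).transform(
    lambda g: 'r' if has_r(g) else g
)

# Variant 2: only swap 'nr' for 'r' in those groups, leaving 'x' as it is
replace_nr_only = demo.cat_col.groupby(demo.group).transform(
    lambda g: g.replace('nr', 'r') if has_r(g) else g
)

print(pd.DataFrame({'overwrite_all': overwrite_all,
                    'replace_nr_only': replace_nr_only}))
```

In group g1 the first variant turns 'x' into 'r', while the second keeps it, so the choice depends on what the OP actually needs.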
1

We can also use groupby transform to check whether any value in each group equals r, and then use the resulting Boolean index to set every row in those groups to r:

m = df['cat_col'].eq('r').groupby(df['group']).transform('any')
df.loc[m, 'cat_col'] = 'r'

Or conditionally replace nr with r using the same boolean index (in case there are multiple replace values):

m = df['cat_col'].eq('r').groupby(df['group']).transform('any')
df.loc[m, 'cat_col'] = df.loc[m, 'cat_col'].replace({'nr': 'r'})

df:

  group cat_col
0    g1       r
1    g1       r
2    g1       r
3    g1       r
4    g2      nr
5    g2      nr

Boolean index steps in a DataFrame:

steps_df = pd.DataFrame({
    # Find where cat_col is r
    'step 1': df['cat_col'].eq('r'),
    # Find groups which have an r value
    'step 2': df['cat_col'].eq('r').groupby(df['group']).transform('any')
})
   step 1  step 2
0    True    True
1   False    True
2    True    True
3   False    True
4   False   False
5   False   False

Setup (DataFrame and imports):

import pandas as pd

df = pd.DataFrame({
    'group': ['g1', 'g1', 'g1', 'g1', 'g2', 'g2'],
    'cat_col': ['r', 'nr', 'r', 'nr', 'nr', 'nr']
})
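If the original df should stay untouched (the question asks for a separate df_new), the same Boolean index can be combined with Series.mask instead of an in-place assignment. This is only a sketch under that assumption; the name group_has_r is made up for the example.

```python
import pandas as pd

df = pd.DataFrame({
    'group': ['g1', 'g1', 'g1', 'g1', 'g2', 'g2'],
    'cat_col': ['r', 'nr', 'r', 'nr', 'nr', 'nr']
})

# True for every row whose group contains at least one 'r'
group_has_r = df['cat_col'].eq('r').groupby(df['group']).transform('any')

# mask() replaces values where the condition is True and returns a new Series,
# so df itself is not modified
df_new = df.assign(
    cat_col=df['cat_col'].mask(group_has_r & df['cat_col'].eq('nr'), 'r')
)

print(df_new)
```

Because the condition also requires the current value to be nr, any other categories in the column would be left alone.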

0

Here is one way to do it:

# Groups that contain at least one 'r'
gr = df[df['cat_col'] == 'r']['group'].to_list()

# Within those groups, replace 'nr' with 'r'
df.loc[df['group'].isin(gr), 'cat_col'] = df.loc[df['group'].isin(gr), 'cat_col'].replace('nr', 'r')

Output:

>>> print(df)
  group cat_col
0    g1       r
1    g1       r
2    g1       r
3    g1       r
4    g2      nr
5    g2      nr

0

This has a few more steps, but I think it is quite clear to follow:

groups_that_pass_the_condition = []
groups = df.group.unique()

for group in groups:
    cat_col_by_group = df.loc[df.group == group]['cat_col']
    value_counts = cat_col_by_group.value_counts()
    if 'r' in value_counts.index:
        if value_counts.r >= 1:
            groups_that_pass_the_condition.append(group)

for group_that_passed in groups_that_pass_the_condition:
    in_group = df.group == group_that_passed
    # Restrict the replacement to cat_col so other columns are left untouched
    df.loc[in_group, 'cat_col'] = df.loc[in_group, 'cat_col'].replace('nr', 'r')

print(df)

OUT:

  group cat_col
0    g1       r
1    g1       r
2    g1       r
3    g1       r
4    g2      nr
5    g2      nr
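As a sanity check, a loop like this can be compared against a groupby-based version on slightly larger data. The sketch below is an illustration only: the function names replace_with_loop and replace_vectorized, the extra group g3 and the value 'x' are all hypothetical, and the loop is a condensed restatement of the idea rather than the exact code above.

```python
import pandas as pd


def replace_with_loop(df):
    """Loop over groups; replace 'nr' with 'r' where the group has an 'r'."""
    out = df.copy()
    for group in out['group'].unique():
        in_group = out['group'] == group
        if (out.loc[in_group, 'cat_col'] == 'r').any():
            out.loc[in_group, 'cat_col'] = (
                out.loc[in_group, 'cat_col'].replace('nr', 'r')
            )
    return out


def replace_vectorized(df):
    """Same rule, using groupby.transform('any') to build a Boolean index."""
    out = df.copy()
    has_r = out['cat_col'].eq('r').groupby(out['group']).transform('any')
    out.loc[has_r, 'cat_col'] = out.loc[has_r, 'cat_col'].replace('nr', 'r')
    return out


# Hypothetical sample: g3 and 'x' are assumptions added for the check
sample = pd.DataFrame({
    'group': ['g1', 'g1', 'g2', 'g2', 'g3', 'g3', 'g3'],
    'cat_col': ['r', 'nr', 'nr', 'nr', 'nr', 'r', 'x'],
})

loop_result = replace_with_loop(sample)
vectorized_result = replace_vectorized(sample)
print(loop_result.equals(vectorized_result))
```

Both functions work on a copy, so sample can be reused for further comparisons.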
