1

I have a dataframe that looks like this

import pandas as pd
    
all_df = pd.DataFrame(
    [
        ('A','Area1','AA','A B D E','A B','D E'),
        ('B','Area1','AA','A B D E','A B','D E'),
        ('C','Area2','BB','C','C','C'),
        ('E','Area1','CC','A B D E','A B','D E'),
        ('F','Area3','BB','F G','G','F')
    ],
    columns = ['Name','Area','Type','Group','AA members','CC members']
)

    
 Name   Area Type    Group AA members CC members
0    A  Area1   AA  A B D E        A B        D E
1    B  Area1   AA  A B D E        A B        D E
2    C  Area2   BB        C          C          C
3    E  Area1   CC  A B D E        A B        D E
4    F  Area3   BB      F G          G          F

The last row (row 4) is incorrect. Anything that is type BB should only have itself (F) in Group, AA members, CC members. In other words, it should look like this:

4    F  Area3   BB        F          F          F

To do this I tried:

  1. Check when Type is BB and length of Group is 2:

    all_df.loc[all_df['Type']== 'BB']['Group'].str.split().str.len() == 2
    
  2. Iterate over every row and to find the cases like this

  3. Make a new df with all these rows and make the Group, AA members, CC members = Name

  4. Drop the rows where that happens in all_df.

  5. Merge 3 back in to all_df.

Is there a better pandas way to do this?

2 Answers 2

2

We can use boolean indexing to replace values here. The key here is that we don't really need to check if the Group value is correct or not; we can simply assign the Name to the latter three columns if Type is BB.

# identify rows where Type is BB
m = all_df['Type'] == 'BB'
# for Type BB rows, replace Group, AA members and CC members values by Name
all_df.loc[m, ['Group', 'AA members', 'CC members']] = all_df.loc[m, 'Name']

The above filters the rows where Type is BB and replaces the values in the relevant columns. Another way is to use mask to replace values in the relevant columns.

cols = ['Group', 'AA members', 'CC members']
all_df[cols] = all_df[cols].mask(all_df['Type'] == 'BB', all_df['Name'], axis=0)

Both return the desired output:

  Name   Area Type    Group AA members CC members
0    A  Area1   AA  A B D E        A B        D E
1    B  Area1   AA  A B D E        A B        D E
2    C  Area2   BB        C          C          C
3    E  Area1   CC  A B D E        A B        D E
4    F  Area3   BB        F          F          F
Sign up to request clarification or add additional context in comments.

Comments

2

You can try iloc and for loop.

for row in all_df.index:
    if all_df.iloc[row,2] == "BB":
        all_df.iloc[row,3:] = all_df["Name"][row]
        
all_df

  Name   Area Type    Group AA members CC members
0    A  Area1   AA  A B D E        A B        D E
1    B  Area1   AA  A B D E        A B        D E
2    C  Area2   BB        C          C          C
3    E  Area1   CC  A B D E        A B        D E
4    F  Area3   BB        F          F          F

2 Comments

I only picked the other one as an answer because it didnt have a loop. But this is super cool too. Can you explain how all_df.iloc[row,3:] works? Is it sayin from column 3 and on wards ?
No problem, as long as it solves your problem, it's always a good answer :D and yes, in fact, the values from row 3 till the last column will be replaced (If you are planning to do so). If not, the solution from @not a robot will be a better solution.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.