I have a dataframe that looks like this
import pandas as pd
all_df = pd.DataFrame(
[
('A','Area1','AA','A B D E','A B','D E'),
('B','Area1','AA','A B D E','A B','D E'),
('C','Area2','BB','C','C','C'),
('E','Area1','CC','A B D E','A B','D E'),
('F','Area3','BB','F G','G','F')
],
columns = ['Name','Area','Type','Group','AA members','CC members']
)
Name Area Type Group AA members CC members
0 A Area1 AA A B D E A B D E
1 B Area1 AA A B D E A B D E
2 C Area2 BB C C C
3 E Area1 CC A B D E A B D E
4 F Area3 BB F G G F
The last row (row 4) is incorrect. Anything that is type BB should only have itself (F) in Group, AA members, CC members. In other words, it should look like this:
4 F Area3 BB F F F
To do this I tried:
Check when Type is
BBand length ofGroupis 2:all_df.loc[all_df['Type']== 'BB']['Group'].str.split().str.len() == 2Iterate over every row and to find the cases like this
Make a new df with all these rows and make the Group, AA members, CC members = Name
Drop the rows where that happens in
all_df.Merge 3 back in to
all_df.
Is there a better pandas way to do this?