Determining if a Pandas dataframe row has multiple specific values

Question

I have a Pandas data frame represented by the one below:

     A    B    C    D
 |   1    1    1    3    |
 |   1    1    1    2    |
 |   2    3    4    5    |

I need to iterate through this data frame, looking for rows where the values in columns A, B, and C match. When they do, I want to compare the values in column D for those rows and delete the row with the smaller value. For the example above, the result would look like this:

         A    B    C    D
    |    1    1    1    3    |
    |    2    3    4    5    |

I've written the following code, but something isn't right and it's causing an error. It also looks more complicated than it may need to be, so I am wondering if there is a better, more concise way to write this.

for col, row in df.iterrows():
    df1 = df.copy()
    df1.drop(col, inplace = True)
    for col1, row1 in df1.iterrows():
        if df[0].iloc[col] == df1[0].iloc[col1] & df[1].iloc[col] == df1[1].iloc[col1] & 
           df[2].iloc[col] == df1[2].iloc[col1] & df1[3].iloc[col1] > df[3].iloc[col]:
            df.drop(col, inplace = True)

Is this what you are after: df.groupby(["A", "B", "C"], as_index=False).max()?
– sammywemmy, Commented Jan 8, 2021 at 21:23
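
A minimal sketch of the suggestion in the comment above, run on the sample frame from the question. The variable name `result` is only illustrative.

```python
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({"A": [1, 1, 2], "B": [1, 1, 3], "C": [1, 1, 4], "D": [3, 2, 5]})

# One row per (A, B, C) group, holding the largest D found in that group
result = df.groupby(["A", "B", "C"], as_index=False).max()
print(result)
```

Note that this collapses every group to a single row and renumbers the index. If the frame had further columns besides D, the maximum would be taken per column independently, so a result row could mix values from different original rows.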

IoaTzimas · Accepted Answer · 2021-01-08 16:42:17Z

1

Here is one solution:

df[~((df[['A', 'B', 'C']].duplicated(keep=False)) & (df.groupby(['A', 'B', 'C'])['D'].transform('min')==df['D']))]

Explanation:

df[['A', 'B', 'C']].duplicated(keep=False)

returns a mask marking the rows whose values in columns ['A', 'B', 'C'] are duplicated.

df.groupby(['A', 'B', 'C'])['D'].transform('min')==df['D']

returns a mask marking the rows that hold the minimum value of column 'D' within their ['A', 'B', 'C'] group.

Combining these masks with & selects the rows that are both duplicated on ['A', 'B', 'C'] and hold the minimum 'D' for their group. Negating with ~ keeps every other row.

Result for the provided input:

   A  B  C  D
0  1  1  1  3
2  2  3  4  5
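
For reference, here is the same approach as a self-contained sketch with each mask given its own name. The names `keys`, `is_dup`, and `is_group_min` are illustrative and not part of the answer.

```python
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({"A": [1, 1, 2], "B": [1, 1, 3], "C": [1, 1, 4], "D": [3, 2, 5]})
keys = ["A", "B", "C"]

# True for every row whose (A, B, C) combination occurs more than once
is_dup = df.duplicated(subset=keys, keep=False)

# True for every row whose D equals the smallest D in its group
is_group_min = df.groupby(keys)["D"].transform("min") == df["D"]

# Drop the rows that are both duplicated and the group minimum
result = df[~(is_dup & is_group_min)]
print(result)
```

One caveat: if all duplicated rows in a group share the same D, every one of them equals the group minimum, so the whole group is dropped.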

answered Jan 8, 2021 at 16:42


ggaurav · Accepted Answer · 2021-01-09 17:42:46Z

1

You can group by all the columns that have to be equal (using groupby(['A', 'B', 'C'])) and then, within each group that has more than one unique value of D, exclude the row with the minimum D (using func). This gives the boolean index of the rows to retain.

def func(x):
    if len(x.unique()) != 1:
        return x != x.min()
    else:
        return x == x

# group_keys=False keeps the original row index so the mask aligns with df
df[df.groupby(['A', 'B', 'C'], group_keys=False)['D'].apply(func)]
    
    A   B   C   D
0   1   1   1   3
2   2   3   4   5

If only the row with the maximum value of D in each group has to be retained, you can use the following instead:

df[df.groupby(['A', 'B', 'C'], group_keys=False)['D'].apply(lambda x: x == x.max())]
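
The two variants only differ once a group has more than two rows. The sketch below uses hypothetical data (a third row added to the first group) and rewrites both variants with string-based `transform` calls, which is an assumption of this example rather than the code from the answer.

```python
import pandas as pd

# Hypothetical data: the (1, 1, 1) group now has three rows
df = pd.DataFrame({
    "A": [1, 1, 1, 2],
    "B": [1, 1, 1, 3],
    "C": [1, 1, 1, 4],
    "D": [3, 2, 4, 5],
})
grouped_d = df.groupby(["A", "B", "C"])["D"]

# Variant 1: drop only the group minimum, but keep everything
# when the group has a single distinct value of D
drop_min = df[(grouped_d.transform("min") != df["D"])
              | (grouped_d.transform("nunique") == 1)]

# Variant 2: keep only the rows holding the group maximum
keep_max = df[grouped_d.transform("max") == df["D"]]

print(drop_min)
print(keep_max)
```

Here the first variant keeps the rows with D equal to 3, 4, and 5, while the second keeps only 4 and 5. Neither expression modifies df in place, so the result has to be assigned to a variable to take effect.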

edited Jan 9, 2021 at 17:42

answered Jan 8, 2021 at 16:28


11 Comments

IoaTzimas Over a year ago

This is not correct. The row with the minimum value of 'D' must be removed. For group [1,1,1] min value is 2

ggaurav Over a year ago

Read the question wrongly. Corrected the answer

HelpWithR Over a year ago

Thank you for your help. When I ran this with the random data that I provided above it worked. However for some reason this didn't work on the actual data I'm working with. I verified that the values in columns a, b, and c matched, respectively and that the values in 'd' entered the first step of the if statement in func. I then confirmed that the larger value is what is being returned but then nothing appears to have changed in the final data frame. Any idea what might be happening and how to possibly fix?

ggaurav Over a year ago

@HelpWithR if you can replicate the issue with again some random data, that would be great.

ggaurav Over a year ago

@HelpWithR just to confirm out of curiosity, you have to delete only the group row where D value is minimum?
