
I have a pandas DataFrame like the one below:

     A    B    C    D
 |   1    1    1    3    |
 |   1    1    1    2    |
 |   2    3    4    5    |

I need to iterate through this data frame, looking for rows where the values in columns A, B, and C match. For those rows, I want to compare the values in column D and delete the row with the smaller value. The example above would look like this afterwards:

     A    B    C    D
 |   1    1    1    3    |
 |   2    3    4    5    |

I've written the following code, but something isn't right and it's causing an error. It also looks more complicated than it may need to be, so I am wondering if there is a better, more concise way to write this.

for col, row in df.iterrows():
    df1 = df.copy()
    df1.drop(col, inplace=True)
    for col1, row1 in df1.iterrows():
        if df[0].iloc[col] == df1[0].iloc[col1] & df[1].iloc[col] == df1[1].iloc[col1] &
                df[2].iloc[col] == df1[2].iloc[col1] & df1[3].iloc[col1] > df[3].iloc[col]:
            df.drop(col, inplace=True)
  • Is this what you are after: df.groupby(["A", "B", "C"], as_index=False).max()? Commented Jan 8, 2021 at 21:23

2 Answers

1

Here is one solution:

df[~((df[['A', 'B', 'C']].duplicated(keep=False)) & (df.groupby(['A', 'B', 'C'])['D'].transform('min')==df['D']))]

Explanation:

df[['A', 'B', 'C']].duplicated(keep=False)

returns a mask for the rows whose ['A', 'B', 'C'] values are duplicated.

df.groupby(['A', 'B', 'C'])['D'].transform('min')==df['D']

returns a mask for the rows that hold the minimum value of the 'D' column within each group of ['A', 'B', 'C'].

Combining these masks with & selects the rows that are both duplicated on ['A', 'B', 'C'] and hold the minimum 'D' for their group. Negating the result with ~ selects every other row.

Result for the provided input:

   A  B  C  D
0  1  1  1  3
2  2  3  4  5
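
To make the two masks easier to inspect, here is a minimal, self-contained sketch that rebuilds the sample frame from the question and computes each mask separately. The variable names (keys, is_dup, is_group_min, tied) are only illustrative and do not come from the original post. The second part shows a caveat that seems worth checking against real data: when two rows tie on A, B, C and D, both count as the group minimum, so both are dropped.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({"A": [1, 1, 2], "B": [1, 1, 3], "C": [1, 1, 4], "D": [3, 2, 5]})
keys = ["A", "B", "C"]

# True for every row whose (A, B, C) combination occurs more than once.
is_dup = df[keys].duplicated(keep=False)

# True for every row whose D equals the smallest D in its (A, B, C) group.
is_group_min = df["D"] == df.groupby(keys)["D"].transform("min")

# Drop only the rows that are both duplicated and the group minimum.
result = df[~(is_dup & is_group_min)]
print(result)

# Caveat (illustrative data): rows that tie on A, B, C and D are all
# the group minimum, so every one of them is removed.
tied = pd.DataFrame({"A": [1, 1], "B": [1, 1], "C": [1, 1], "D": [2, 2]})
tied_mask = tied[keys].duplicated(keep=False) & (
    tied["D"] == tied.groupby(keys)["D"].transform("min")
)
tied_result = tied[~tied_mask]
print(tied_result)
```

If exact ties can occur and one of the tied rows should survive, the masks would need an extra tie-breaking step.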

1

You can group by all the columns that have to be equal (using groupby(['A', 'B', 'C'])) and then, within each group that has more than one distinct value of D, exclude the row with the minimum value of D (using func). This gives a boolean mask of the rows to retain:

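def func(x):
    # x holds the D values of one (A, B, C) group
    if len(x.unique()) != 1:
        # several distinct values: keep everything except the minimum
        return x != x.min()
    else:
        # a single distinct value: keep every row in the group
        return x == x

# transform keeps the original index, so the mask aligns with df
df[df.groupby(['A', 'B', 'C'])['D'].transform(func)]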
    
    A   B   C   D
0   1   1   1   3
2   2   3   4   5

If only the row with the maximum value of D in each group has to be retained, you can use the following:

df[df['D'] == df.groupby(['A', 'B', 'C'])['D'].transform('max')]
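
The two variants only differ once a group has more than two rows. The sketch below uses hypothetical data (not from the question) with a three-row group to show the difference. It expresses both ideas with transform rather than the helper function, which is an assumption about how one might write it, not the answer's own code.

```python
import pandas as pd

# Hypothetical data: one (A, B, C) group with three rows plus a singleton group.
df = pd.DataFrame({
    "A": [1, 1, 1, 2],
    "B": [1, 1, 1, 3],
    "C": [1, 1, 1, 4],
    "D": [3, 2, 1, 5],
})
keys = ["A", "B", "C"]
grouped_d = df.groupby(keys)["D"]

# Variant 1: drop only the smallest D, and only in groups
# that contain more than one distinct D value.
drop_min = df[
    (df["D"] != grouped_d.transform("min"))
    | (grouped_d.transform("nunique") == 1)
]

# Variant 2: keep only the rows holding the largest D in their group.
keep_max = df[df["D"] == grouped_d.transform("max")]

print(drop_min)  # rows 0, 1 and 3 remain
print(keep_max)  # rows 0 and 3 remain
```

Note that boolean indexing returns a new DataFrame and leaves df itself unchanged, so the result has to be assigned (for example df = keep_max) for the change to persist.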

11 Comments

This is not correct. The row with the minimum value of 'D' must be removed. For group [1,1,1] the minimum value is 2.
I misread the question. I have corrected the answer.
Thank you for your help. When I ran this with the random data that I provided above it worked. However for some reason this didn't work on the actual data I'm working with. I verified that the values in columns a, b, and c matched, respectively and that the values in 'd' entered the first step of the if statement in func. I then confirmed that the larger value is what is being returned but then nothing appears to have changed in the final data frame. Any idea what might be happening and how to possibly fix?
@HelpWithR if you can replicate the issue with some random data again, that would be great.
@HelpWithR just to confirm, out of curiosity: do you have to delete only the row in each group where the D value is the minimum?