1

Suppose I have the following (example) dataframe:

   a  b  c  d  e
0  9  9  0  9  9
1  1  2  1  9  9
2  8  8  0  2  3
3  7  7  0  7  8
4  1  2  0  3  4
5  6  2  3  6  6
6  1  2  0  1  2
7  1  3  0  1  2

Also suppose I have generated an (arbitrary) list of indices, e.g. [3,4]. For each element in the list, I want to drop all rows from the data frame that have the same values in column 'a' and column 'b' as rows 3 and 4.

Since row 3 has a=7 and b=7, and no other rows have a=7 and b=7, only row 3 is dropped.

Since row 4 has a=1 and b=2, and rows 1 and 6 also have a=1 and b=2, I drop rows 4, 1, and 6.

So the resulting dataframe would look like this:

   a  b  c  d  e
0  9  9  0  9  9
1  8  8  0  2  3
2  6  2  3  6  6
3  1  3  0  1  2

Does anyone know how to come up with a solution to do this quickly (for use in a much larger data frame)? Thank you.

1 Answer 1

1

Make use of numpy broadcasting;

  • Extract values at indices and columns with loc and reshape it to 3d array:

    df.loc[indices,cols].values[:,None]

  • compare it with columns a and b, which will compare rows 3 and 4 with all other rows because of dimension mismatch and numpy broadcasting

    df[cols].values == df.loc[indices,cols].values[:,None]

  • use .all(2) to make sure both columns match, and any(0) to get matches for either row 3 or row 4

  • Negate ~ and drop matched rows

gives:

indices = [3,4]
cols = ['a','b']
df[~(df[cols].values == df.loc[indices,cols].values[:,None]).all(2).any(0)]

#   a  b  c  d  e
#0  9  9  0  9  9
#2  8  8  0  2  3
#5  6  2  3  6  6
#7  1  3  0  1  2
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.