How to select dataframe rows by a multi-value condition on another column, evaluated per group?

Question

Copy the following dataframe to your clipboard:

  textId   score              textInfo
0  name1     1.0            text_stuff
1  name1     2.0  different_text_stuff
2  name1     2.0            text_stuff
3  name2     1.0  different_text_stuff
4  name2     1.3  different_text_stuff
5  name2     2.0  still_different_text
6  name2     1.0              yoko ono
7  name2     3.0     I lika da Gweneth
8  name3     1.0     Always a tradeoff
9  name3     3.0                What?!

Now use

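import pandas as pd
# A raw-string regex of two or more whitespace characters keeps values such as "yoko ono" in one column
df = pd.read_clipboard(sep=r'\s\s+')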
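If clipboard access isn't available (for example in a headless environment), a minimal sketch that builds the same frame directly with the `pd.DataFrame` constructor should work. The values are transcribed from the table above:

```python
import pandas as pd

# Build the example frame directly instead of reading it from the clipboard
df = pd.DataFrame({
    "textId": ["name1"] * 3 + ["name2"] * 5 + ["name3"] * 2,
    "score": [1.0, 2.0, 2.0, 1.0, 1.3, 2.0, 1.0, 3.0, 1.0, 3.0],
    "textInfo": [
        "text_stuff", "different_text_stuff", "text_stuff",
        "different_text_stuff", "different_text_stuff", "still_different_text",
        "yoko ono", "I lika da Gweneth",
        "Always a tradeoff", "What?!",
    ],
})
print(df)
```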

to load it into your environment. How does one slice this dataframe so that all the rows of a particular textId are returned if, and only if, the scores in that textId's group include each of the values 1.0, 2.0, and 3.0 at least once? Here, the desired result would exclude the name1 rows, because that group's scores lack a 3.0, and the name3 rows, because that group's scores lack a 2.0:
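The condition itself can be checked per group before any slicing. A minimal sketch, assuming exact float equality is appropriate here (the scores are typed literals); `required` and `qualifies` are illustrative names, not part of the question:

```python
import pandas as pd

df = pd.DataFrame({
    "textId": ["name1"] * 3 + ["name2"] * 5 + ["name3"] * 2,
    "score": [1.0, 2.0, 2.0, 1.0, 1.3, 2.0, 1.0, 3.0, 1.0, 3.0],
})

# Values that must all appear at least once within a textId group
required = {1.0, 2.0, 3.0}

# For each textId, test whether its set of scores contains every required value
qualifies = df.groupby("textId")["score"].agg(lambda s: required <= set(s))
print(qualifies)
```

This should report True only for name2, matching the desired result.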

  textId   score              textInfo
0  name2     1.0  different_text_stuff
1  name2     1.3  different_text_stuff
2  name2     2.0  still_different_text
3  name2     1.0              yoko ono
4  name2     3.0     I lika da Gweneth

Attempts

df[(df.textId == "textIdRowName") & (df.score == 1.0) & (df.score == 2.0) & (df.score == 3.0)] isn't right, because the condition is evaluated row by row rather than on the textId group: no single row can have a score equal to 1.0, 2.0, and 3.0 at once, so the result is always empty. If the condition could be rewritten to act on textId groups, it could be placed in a for loop and fed each unique textIdRowName. Such a loop would collect the matching textId names in a series (say textIdThatMatchScore123), which could then be used to slice the original df with df[df.textId.isin(textIdThatMatchScore123)].
I haven't managed to make this work with groupby.
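The loop described above can be made to work by iterating over the groups that `groupby` yields. A sketch under the same exact-equality assumption; `required` is an illustrative name, while `textIdThatMatchScore123` is the name proposed in the question (here a plain list rather than a series):

```python
import pandas as pd

df = pd.DataFrame({
    "textId": ["name1"] * 3 + ["name2"] * 5 + ["name3"] * 2,
    "score": [1.0, 2.0, 2.0, 1.0, 1.3, 2.0, 1.0, 3.0, 1.0, 3.0],
})

required = {1.0, 2.0, 3.0}

# Iterate over (name, sub-frame) pairs and keep names whose scores cover all required values
textIdThatMatchScore123 = []
for name, group in df.groupby("textId"):
    if required.issubset(group["score"]):
        textIdThatMatchScore123.append(name)

# Slice the original frame with isin, as proposed above
result = df[df["textId"].isin(textIdThatMatchScore123)]
print(result)
```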

chrisb · Accepted Answer · 2016-04-13 17:41:15Z


Here's one solution: group by textId, then keep only those groups whose set of unique score values is a superset (>=) of {1.0, 2.0, 3.0}.

In [58]: df.groupby('textId').filter(lambda x: set(x['score']) >= set([1.,2.,3.]))
Out[58]: 
  textId  score              textInfo
3  name2    1.0  different_text_stuff
4  name2    1.3  different_text_stuff
5  name2    2.0  still_different_text
6  name2    1.0              yoko ono
7  name2    3.0     I lika da Gweneth
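Another option, offered as a sketch rather than as part of the answer above, avoids calling a Python lambda per group: keep only the rows whose score is one of the required values, count the distinct values per textId with `nunique`, and select the textIds that have all of them. This assumes exact float matches, since `isin` compares by equality; `required`, `hits`, `n_found`, and `keep` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    "textId": ["name1"] * 3 + ["name2"] * 5 + ["name3"] * 2,
    "score": [1.0, 2.0, 2.0, 1.0, 1.3, 2.0, 1.0, 3.0, 1.0, 3.0],
})

required = [1.0, 2.0, 3.0]

# Keep only rows whose score is one of the required values
hits = df[df["score"].isin(required)]
# Count how many distinct required values each textId contains
n_found = hits.groupby("textId")["score"].nunique()
# A textId qualifies when it contains all of them
keep = n_found[n_found == len(required)].index
result = df[df["textId"].isin(keep)]
print(result)
```

If the scores are computed values that may carry rounding error, they would need rounding (for example with `Series.round`) before the comparison.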
