Returning subset of each group from a pandas groupby object

Question

I have a multi-level DataFrame that looks like this:

                      date_time      name  note   value
list index                                    
1    0     2015-05-22 05:37:59       Tom   129    False
     1     2015-05-22 05:38:59       Tom     0    True
     2     2015-05-22 05:39:59       Tom     0    False
     3     2015-05-22 05:40:59       Tom    45    True
2    4     2015-05-22 05:37:59       Kate   129    True
     5     2015-05-22 05:41:59       Kate     0    False
     5     2015-05-22 05:37:59       Kate     0    True

I want to iterate over list and, for the first row of each list group, check the value column; if it is False, delete that row. So the final goal is to delete all the first rows in list that have False in value. I use this code, which seems logical:

def delete_first_false():
    for list, new_df in df.groupby(level=0):
        for index, row in new_df.iterrows():
            new_df=new_df.groupby('name').first().loc([new_df['value']!='False'])
        return new_df
    return df

but I get this error:

AttributeError: '_LocIndexer' object has no attribute 'groupby'

Could you explain what's wrong with my method?

Do you mind if I take a stab at revising the title to make it more searchable?
– Paul H, Commented Nov 4, 2015 at 0:14

Paul H · Accepted Answer · 2015-11-03 17:49:03Z


Your general approach -- using loops -- rarely works the way you want in pandas.
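As for the error message itself, one plausible reading (an assumption based on the traceback, not something verified against the original data) is that .loc was called with parentheses instead of square brackets. .loc is an indexer, so calling it hands back an indexer object rather than a DataFrame, and on the next pass of the loop new_df.groupby fails. The small frame below is made up for illustration, and what happens when an argument is passed to the call may differ between pandas versions:

```python
import pandas as pd

# Hypothetical three-row frame, only to show the two spellings of .loc
df = pd.DataFrame({'value': [False, True, False]})

# Parentheses: this calls the indexer and returns another indexer, not rows
indexer = df.loc()
print(type(indexer).__name__)

# Square brackets with a boolean mask: this is what actually selects rows
rows = df.loc[df['value']]
print(rows)
```

The indexer has no groupby method, which matches the AttributeError in the question.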

If you have a groupby object, you should use the apply, agg, filter or transform methods. In your case apply is appropriate.
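To make the difference between those four methods concrete, here is a rough sketch on a made-up frame (the column values and the lambdas are illustrative assumptions, not part of the question's data):

```python
import pandas as pd

# Small illustrative frame: two groups identified by the 'list' column
df = pd.DataFrame({
    'list': [1, 1, 1, 2, 2],
    'note': [129, 0, 45, 129, 0],
})
grouped = df.groupby('list')

# agg: reduce each group to a single value (one row per group)
totals = grouped['note'].agg('sum')

# transform: return a result aligned with the original rows
group_max = grouped['note'].transform('max')

# filter: keep or drop whole groups based on a per-group test
big_groups = grouped.filter(lambda g: len(g) > 2)

# apply: run an arbitrary function on each group; here, drop each group's first row
trimmed = grouped['note'].apply(lambda s: s.iloc[1:])

print(totals, group_max, big_groups, trimmed, sep='\n\n')
```

apply is the most flexible of the four because the function may return something of a different length than the group it received, which is exactly what dropping a first row requires.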

Your main goal is the following:

So the final goal is to delete all the first rows in (each group defined by) list that have False in (the) value (column).

So let's write a simple function to do just that on a single, stand-alone dataframe:

def filter_firstrow_falses(df):
    if not df['value'].iloc[0]:
        return df.iloc[1:]
    else:
        return df
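Before applying it to groups, it can help to try the function on a couple of tiny stand-alone frames. The two frames below are invented for the check; the function is repeated so the snippet runs on its own:

```python
import pandas as pd

def filter_firstrow_falses(df):
    # Same function as above: drop the first row only when its value is False
    if not df['value'].iloc[0]:
        return df.iloc[1:]
    else:
        return df

# Hypothetical frames: one starts with False, the other with True
starts_false = pd.DataFrame({'name': ['Tom'] * 3, 'value': [False, True, False]})
starts_true = pd.DataFrame({'name': ['Kate'] * 2, 'value': [True, False]})

trimmed = filter_firstrow_falses(starts_false)    # first row removed
untouched = filter_firstrow_falses(starts_true)   # returned unchanged

print(trimmed)
print(untouched)
```

Note that only the first row is ever examined; the later False in starts_false is left alone.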

OK. Simple enough.

Now, let's apply that to each group of your real dataframe:

import pandas
from io import StringIO

csv = StringIO("""\
list,date_time,name,note,value
1,2015-05-22 05:37:59,Tom,129,False
1,2015-05-22 05:38:59,Tom,0,True
1,2015-05-22 05:39:59,Tom,0,False
1,2015-05-22 05:40:59,Tom,45,True
2,2015-05-22 05:37:59,Kate,129,True
2,2015-05-22 05:41:59,Kate,0,False
2,2015-05-22 05:37:59,Kate,0,True
""")

df = pandas.read_csv(csv)

final = (
    df.groupby(by=['list']) # create the groupby object
      .apply(filter_firstrow_falses) # apply our function to each group
      .reset_index(drop=True) # clean up the index
)
print(final)


   list            date_time  name  note  value
0     1  2015-05-22 05:38:59   Tom     0   True
1     1  2015-05-22 05:39:59   Tom     0  False
2     1  2015-05-22 05:40:59   Tom    45   True
3     2  2015-05-22 05:37:59  Kate   129   True
4     2  2015-05-22 05:41:59  Kate     0  False
5     2  2015-05-22 05:37:59  Kate     0   True
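For comparison, the same result can probably be reached without a per-group function by building a boolean mask. This is an alternative sketch, not the approach used above; it assumes value holds real booleans and uses groupby(...).cumcount() to mark the first row of each group:

```python
import pandas as pd

# Same data as in the example above, built directly instead of read from CSV
df = pd.DataFrame({
    'list': [1, 1, 1, 1, 2, 2, 2],
    'name': ['Tom'] * 4 + ['Kate'] * 3,
    'note': [129, 0, 0, 45, 129, 0, 0],
    'value': [False, True, False, True, True, False, True],
})

# True for the first row of each 'list' group
is_first = df.groupby('list').cumcount() == 0

# Drop rows that are both first in their group and False in 'value'
final = df[~(is_first & ~df['value'])].reset_index(drop=True)
print(final)
```

On large frames a mask like this tends to be faster than apply, at the cost of being a little less readable.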



2 Comments

user5421875 Over a year ago

I just tested it on my dataframe, @PaulH. Maybe because my False is a string rather than a boolean value, it doesn't delete the first rows that have False in the value column.

user5421875 Over a year ago

It's OK, I just changed the line if not df['value'].iloc[0]: to if df['value'].iloc[0]=='False': and it works! Thank you very much for the excellent explanation of the methods in the answer!
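A short sketch of why the string case behaves differently: any non-empty string, including 'False', is truthy in Python, so the not check never fires. The function name and the sample group below are hypothetical; the second half shows one possible alternative, converting the column to real booleans once with map:

```python
import pandas as pd

def filter_firstrow_false_strings(df):
    # Variant for a 'value' column that stores the strings 'True' / 'False'
    if df['value'].iloc[0] == 'False':
        return df.iloc[1:]
    return df

# Hypothetical single group whose values are strings, not booleans
group = pd.DataFrame({'name': ['Tom'] * 3, 'value': ['False', 'True', 'False']})

# The original test does not trigger: 'False' is a non-empty, truthy string
original_check = not group['value'].iloc[0]

trimmed = filter_firstrow_false_strings(group)

# Alternative: convert to booleans up front so the original function works as written
as_bool = group.assign(value=group['value'].map({'True': True, 'False': False}))

print(original_check)
print(trimmed)
print(as_bool)
```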
