
I have a multi-level DataFrame that looks like this:

                      date_time      name  note   value
list index                                    
1    0     2015-05-22 05:37:59       Tom   129    False
     1     2015-05-22 05:38:59       Tom     0    True
     2     2015-05-22 05:39:59       Tom     0    False
     3     2015-05-22 05:40:59       Tom    45    True
2    4     2015-05-22 05:37:59       Kate   129    True
     5     2015-05-22 05:41:59       Kate     0    False
     5     2015-05-22 05:37:59       Kate     0    True

I want to iterate over the groups in list and, for the first row of each group, check the value in the value column; if it is False, delete that row. So the final goal is to delete all the first rows in list that have False in value. I use this code, which seems logical:

def delete_first_false():
    for list, new_df in df.groupby(level=0):
        for index, row in new_df.iterrows():
            new_df=new_df.groupby('name').first().loc([new_df['value']!='False'])
        return new_df
    return df

But I get this error:

AttributeError: '_LocIndexer' object has no attribute 'groupby'

Could you explain what's wrong with my method?

  • Do you mind if I take a stab at revising the title to make it more searchable? Commented Nov 4, 2015 at 0:14
  • @PaulH, sure, if you think it will make it more searchable! Commented Nov 4, 2015 at 8:26

1 Answer

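Your general approach of looping over groups and rows rarely works the way you want in pandas.

If you have a groupby object, you should use its apply, agg, filter, or transform methods. In your case, apply is appropriate, because the function has to return each group with a possibly different number of rows.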
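As a rough illustration of how these four methods differ, here is a small sketch on a made-up frame. The names toy, key, and x are hypothetical and are not taken from the question; the point is only the shape of each result.

```python
import pandas

# Hypothetical toy data, only to show the shape of each result.
toy = pandas.DataFrame({"key": ["a", "a", "b"], "x": [1, 2, 10]})
grouped = toy.groupby("key")

# agg: reduces each group to a single value (one row per group)
sums = grouped["x"].agg("sum")

# transform: returns a result aligned with the original rows
group_means = grouped["x"].transform("mean")

# filter: keeps or drops whole groups based on a predicate
big_groups = grouped.filter(lambda g: len(g) > 1)

# apply: runs an arbitrary function per group; each group may
# return any number of rows
first_rows = grouped["x"].apply(lambda s: s.iloc[:1])
```

Dropping a single row from some groups does not fit agg (one value per group), transform (same length as the input), or filter (whole groups only), which is why apply is the natural fit here.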

Your main goal is the following:

So the final goal is to delete all the first rows in (each group defined by) list that have False in (the) value (column).

So let's write a simple function to do just that on a single, stand-alone dataframe:

def filter_firstrow_falses(df):
    # Drop the first row when its 'value' is falsy; otherwise keep every row
    if not df['value'].iloc[0]:
        return df.iloc[1:]
    else:
        return df

OK. Simple enough.
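Before grouping anything, it may help to check the function on two tiny stand-alone frames. This is only a sketch: starts_false and starts_true are hypothetical frames invented for the check, and the value column is assumed to hold real booleans.

```python
import pandas

def filter_firstrow_falses(df):
    # Drop the first row when its 'value' is falsy; otherwise keep every row
    if not df['value'].iloc[0]:
        return df.iloc[1:]
    else:
        return df

# Hypothetical frames: one starts with False, the other with True
starts_false = pandas.DataFrame({"value": [False, True, False]})
starts_true = pandas.DataFrame({"value": [True, False]})

trimmed = filter_firstrow_falses(starts_false)    # first row removed
untouched = filter_firstrow_falses(starts_true)   # returned as-is

print(trimmed)
print(untouched)
```

Only the leading False is removed; the later False in starts_false is kept.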

Now, let's apply that to each group of your real dataframe:

import pandas
from io import StringIO

csv = StringIO("""\
list,date_time,name,note,value
1,2015-05-22 05:37:59,Tom,129,False
1,2015-05-22 05:38:59,Tom,0,True
1,2015-05-22 05:39:59,Tom,0,False
1,2015-05-22 05:40:59,Tom,45,True
2,2015-05-22 05:37:59,Kate,129,True
2,2015-05-22 05:41:59,Kate,0,False
2,2015-05-22 05:37:59,Kate,0,True
""")

df = pandas.read_csv(csv)

final = (
    df.groupby(by=['list']) # create the groupby object
      .apply(filter_firstrow_falses) # apply our function to each group
      .reset_index(drop=True) # clean up the index
)
print(final)


   list            date_time  name  note  value
0     1  2015-05-22 05:38:59   Tom     0   True
1     1  2015-05-22 05:39:59   Tom     0  False
2     1  2015-05-22 05:40:59   Tom    45   True
3     2  2015-05-22 05:37:59  Kate   129   True
4     2  2015-05-22 05:41:59  Kate     0  False
5     2  2015-05-22 05:37:59  Kate     0   True
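
For larger frames, the same result can probably be reached without calling a Python function per group. The following is a sketch of an alternative, not the method used above: it marks the first row of each group with cumcount and drops it when value is False. It assumes value has a boolean dtype, and the frame is a cut-down copy of the question's data.

```python
import pandas

# Cut-down copy of the question's data (boolean 'value' column assumed)
df = pandas.DataFrame({
    "list": [1, 1, 1, 1, 2, 2, 2],
    "name": ["Tom"] * 4 + ["Kate"] * 3,
    "value": [False, True, False, True, True, False, True],
})

# cumcount numbers the rows within each group starting at 0,
# so 0 marks the first row of every group
is_first = df.groupby("list").cumcount() == 0

# Drop a row only when it is a first row AND its value is False
result = df[~(is_first & ~df["value"])].reset_index(drop=True)
print(result)
```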

2 Comments

I just tested it on my dataframe, @PaulH. Maybe because my False is a string rather than a boolean value, it doesn't delete the first rows that have False in the value column.
It's OK, I just changed the line if not df['value'].iloc[0]: to if df['value'].iloc[0]=='False': and it works! Thank you very much for the excellent explanation of the methods in your answer!!
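
The string case discussed in these comments comes down to Python truthiness: any non-empty string, including 'False', is truthy, so not 'False' evaluates to False and the row is kept. One possible alternative to comparing against the string is to convert the column to booleans first. This is a sketch that assumes every entry is exactly 'True' or 'False'; the small frame is hypothetical.

```python
import pandas

# Hypothetical frame where 'value' holds strings instead of booleans
df = pandas.DataFrame({
    "list": [1, 1, 2, 2],
    "value": ["False", "True", "True", "False"],
})

# A non-empty string is always truthy, so this check never fires
string_check = not df["value"].iloc[0]

# Map the two expected strings to real booleans.
# Assumption: no other strings occur; unmapped entries would become NaN
# and would need handling before the astype(bool) call.
df["value"] = df["value"].map({"True": True, "False": False}).astype(bool)

# With real booleans the original test behaves as intended
bool_check = not df["value"].iloc[0]
```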
