Deleting a row based on values of the next row on multiple columns with pandas

Question

I'd like to delete rows from my dataframe when the next one meets certain conditions. Let's say that my dataset is:

import pandas as pd

raw_data = {'SessionID': ['S1', 'S1', 'S1', 'S2', 'S2', 'S2', 'S2', 'S2', 'S3', 'S3', 'S3', 'S3', 'S3', 'S3'], 
    'Event Action': ['Action', 'Action', 'Filter', 'Action', 'Action', 'Action', 'Filter', 'Filter', 'Action', 'Filter','Action', 'Filter', 'Filter', 'Action'], 
    'Timestamp': ['T1.1', 'T1.2', 'T1.3', 'T1.1', 'T1.2', 'T1.3', 'T1.3', 'T1.4', 'T1.4', 'T1.5', 'T1.7', 'T1.7', 'T1.8', 'T1.9']}

df = pd.DataFrame(raw_data, columns = ['SessionID', 'Event Action', 'Timestamp'])

df

 SessionID  Event Action    Timestamp
0   S1         Action          T1.1
1   S1         Action          T1.2
2   S1         Filter          T1.3
3   S2         Action          T1.1
4   S2         Action          T1.2
5   S2         Action          T1.3
6   S2         Filter          T1.3
7   S2         Filter          T1.4
8   S3         Action          T1.4
9   S3         Filter          T1.5
10  S3         Action          T1.7
11  S3         Filter          T1.7
12  S3         Filter          T1.8
13  S3         Action          T1.9

Given any row, and letting row1 be the row immediately after it, I want to delete row when:

if df[row:'SessionID'] == df[row1:'SessionID'] 
and df[row:'Event Action'] == 'Action' 
and df[row1:'Event Action'] == 'Filter' 
and df[row:'Timestamp'] == df[row1:'Timestamp']
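In valid pandas syntax a single cell is read with `df.loc[label, column]` or, positionally, with `df.iloc[i][column]`; `df[row:'SessionID']` is parsed as a row slice, not as a cell lookup. A minimal sketch of the rule above for one pair of neighbouring rows, assuming the default 0..n-1 index so that labels and positions coincide (`meets_rule` is a hypothetical helper name, not a pandas function):

```python
import pandas as pd

raw_data = {'SessionID': ['S1', 'S1', 'S1', 'S2', 'S2', 'S2', 'S2', 'S2', 'S3', 'S3', 'S3', 'S3', 'S3', 'S3'],
    'Event Action': ['Action', 'Action', 'Filter', 'Action', 'Action', 'Action', 'Filter', 'Filter', 'Action', 'Filter', 'Action', 'Filter', 'Filter', 'Action'],
    'Timestamp': ['T1.1', 'T1.2', 'T1.3', 'T1.1', 'T1.2', 'T1.3', 'T1.3', 'T1.4', 'T1.4', 'T1.5', 'T1.7', 'T1.7', 'T1.8', 'T1.9']}
df = pd.DataFrame(raw_data, columns=['SessionID', 'Event Action', 'Timestamp'])

def meets_rule(df, i):
    """Hypothetical helper: True if the row at position i should be deleted
    because of the row at position i + 1."""
    if i + 1 >= len(df):
        return False  # the last row has no next row to compare with
    row, row1 = df.iloc[i], df.iloc[i + 1]  # each is a Series indexed by column name
    return (row['SessionID'] == row1['SessionID']
            and row['Event Action'] == 'Action'
            and row1['Event Action'] == 'Filter'
            and row['Timestamp'] == row1['Timestamp'])

print([i for i in range(len(df)) if meets_rule(df, i)])  # [5, 10]
```

This loops in Python, so it is only meant to make the rule explicit; it does not scale as well as a vectorised version.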

For instance, in the dataset above the rows that should be deleted are 5 and 10. I'm not much of an expert with functions in Python, but I've tried:

def cleanfilter(row):
    row1 = row + 1
    if df[row:'SessionID'] == df[row1:'SessionID'] and df[row:'Event Action'] == 'Search Action' and df[row1:'Event Action'] == 'Search Filter' and df[row:'Timestamp'] == df[row1:'Timestamp']:
        df.drop(df.index[row])

df.apply(cleanfilter,axis=1)

But I keep receiving: "TypeError: ('must be str, not int', 'occurred at index 0')". I don't know what to google anymore... Any advice would be much appreciated! Thanks in advance.
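Two things go wrong in that attempt. With `axis=1`, `DataFrame.apply` passes each row to the function as a `pd.Series` (its index label is available as `row.name`), not as an integer, so `row + 1` tries to add 1 to every string in the row and raises a `TypeError`. Separately, `df.drop` returns a new DataFrame (unless `inplace=True` is passed), so the result computed inside the function is discarded. A small sketch illustrating both points on a made-up two-row frame:

```python
import pandas as pd

df = pd.DataFrame({'SessionID': ['S1', 'S1'],
                   'Event Action': ['Action', 'Filter'],
                   'Timestamp': ['T1.1', 'T1.1']})

# 1) apply(axis=1) hands the function a Series (one row), not a row number.
kinds = df.apply(lambda row: type(row).__name__, axis=1)
labels = df.apply(lambda row: row.name, axis=1)   # row.name is the index label
print(kinds.tolist(), labels.tolist())            # ['Series', 'Series'] [0, 1]

# ...so "row + 1" adds 1 to each string in the row, which fails.
first_row = df.iloc[0]
try:
    first_row + 1
    raised = False
except TypeError as exc:
    raised = True
    print('TypeError:', exc)

# 2) drop() returns a new DataFrame; df itself is unchanged unless reassigned.
dropped = df.drop(df.index[0])
print(len(df), len(dropped))  # 2 1
```

The exact wording of the `TypeError` message depends on the Python and pandas versions.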

My apologies for the bad formulation. row1 is the row immediately after row: for any given row, row1 is the next one.
– E. Faslo, Commented Jul 12, 2018 at 15:02

harpan · Accepted Answer · 2018-07-12 15:14:14Z


You can create a boolean mask for each condition, using shift(-1) to line each row up with the next one, and then index your df with the negation (~) of their combination, since we are looking to delete the rows that meet all the conditions.

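m1 = (df['SessionID'] == df['SessionID'].shift(-1))    # same session as the next row
m2 = (df['Event Action'] == 'Action')                  # this row is an Action
m3 = (df['Event Action'].shift(-1) == 'Filter')        # the next row is a Filter
m4 = (df['Timestamp'] == df['Timestamp'].shift(-1))    # same timestamp as the next row
df[~(m1 & m2 & m3 & m4)]                               # keep rows that do NOT meet all four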

Output:

         SessionID Event Action Timestamp
0         S1       Action      T1.1
1         S1       Action      T1.2
2         S1       Filter      T1.3
3         S2       Action      T1.1
4         S2       Action      T1.2
6         S2       Filter      T1.3
7         S2       Filter      T1.4
8         S3       Action      T1.4
9         S3       Filter      T1.5
11        S3       Filter      T1.7
12        S3       Filter      T1.8
13        S3       Action      T1.9
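For reference, a self-contained version of the same masking approach. It also prints what `shift(-1)` produces for the last few rows: each row receives the value from the row below it, and the last row gets `NaN` because it has no successor (comparisons against `NaN` evaluate to False, so the last row is never dropped). The side-by-side frame is only for illustration and is not part of the answer.

```python
import pandas as pd

raw_data = {'SessionID': ['S1', 'S1', 'S1', 'S2', 'S2', 'S2', 'S2', 'S2', 'S3', 'S3', 'S3', 'S3', 'S3', 'S3'],
    'Event Action': ['Action', 'Action', 'Filter', 'Action', 'Action', 'Action', 'Filter', 'Filter', 'Action', 'Filter', 'Action', 'Filter', 'Filter', 'Action'],
    'Timestamp': ['T1.1', 'T1.2', 'T1.3', 'T1.1', 'T1.2', 'T1.3', 'T1.3', 'T1.4', 'T1.4', 'T1.5', 'T1.7', 'T1.7', 'T1.8', 'T1.9']}
df = pd.DataFrame(raw_data, columns=['SessionID', 'Event Action', 'Timestamp'])

# shift(-1) moves every value up one row, aligning each row with its successor.
print(pd.DataFrame({'Event Action': df['Event Action'],
                    'next Event Action': df['Event Action'].shift(-1)}).tail(3))

m1 = df['SessionID'] == df['SessionID'].shift(-1)
m2 = df['Event Action'] == 'Action'
m3 = df['Event Action'].shift(-1) == 'Filter'
m4 = df['Timestamp'] == df['Timestamp'].shift(-1)

to_drop = m1 & m2 & m3 & m4
print(df.index[to_drop].tolist())  # [5, 10]

result = df[~to_drop]
print(result.index.tolist())  # [0, 1, 2, 3, 4, 6, 7, 8, 9, 11, 12, 13]
```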

edited Jul 12, 2018 at 15:14

answered Jul 12, 2018 at 15:04


2 Comments

E. Faslo Over a year ago

That works perfectly and is way more elegant than what I was trying to do. I've now read the pandas .shift documentation, cool. However, I don't fully understand the last line: df[~(m1 & m2 & m3 & m4)]. Does it mean: print the dataframe excluding the rows that meet those conditions?

harpan Over a year ago

@E.Faslo, it returns a copy of the df containing only the rows that do not satisfy (m1 & m2 & m3 & m4), i.e. all the conditions at once. Rather than deleting the rows that satisfy the conditions, we select the rows that do not satisfy them. I hope that clears up your doubts.
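To make that concrete: `~` flips a boolean Series elementwise, and indexing with a boolean Series returns a new DataFrame holding only the rows marked True, leaving the original `df` untouched. To keep the filtered result, assign it back; `reset_index(drop=True)` is optional and only renumbers the rows from 0. A small sketch on a made-up toy frame:

```python
import pandas as pd

df = pd.DataFrame({'Event Action': ['Action', 'Filter', 'Action']})

mask = df['Event Action'] == 'Action'   # True, False, True
print((~mask).tolist())                 # [False, True, False]

kept = df[~mask]            # new DataFrame with only the rows where ~mask is True
print(len(df), len(kept))   # 3 1  (df itself is unchanged)

df = df[~mask].reset_index(drop=True)   # overwrite df and renumber the index
print(df)
```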
