I'd like to delete a row from my DataFrame when the row that follows it meets certain conditions. Let's say that my dataset is:
import pandas as pd
raw_data = {'SessionID': ['S1', 'S1', 'S1', 'S2', 'S2', 'S2', 'S2', 'S2', 'S3', 'S3', 'S3', 'S3', 'S3', 'S3'],
'Event Action': ['Action', 'Action', 'Filter', 'Action', 'Action', 'Action', 'Filter', 'Filter', 'Action', 'Filter','Action', 'Filter', 'Filter', 'Action'],
'Timestamp': ['T1.1', 'T1.2', 'T1.3', 'T1.1', 'T1.2', 'T1.3', 'T1.3', 'T1.4', 'T1.4', 'T1.5', 'T1.7', 'T1.7', 'T1.8', 'T1.9']}
df = pd.DataFrame(raw_data, columns = ['SessionID', 'Event Action', 'Timestamp'])
df
    SessionID  Event Action  Timestamp
0          S1        Action       T1.1
1          S1        Action       T1.2
2          S1        Filter       T1.3
3          S2        Action       T1.1
4          S2        Action       T1.2
5          S2        Action       T1.3
6          S2        Filter       T1.3
7          S2        Filter       T1.4
8          S3        Action       T1.4
9          S3        Filter       T1.5
10         S3        Action       T1.7
11         S3        Filter       T1.7
12         S3        Filter       T1.8
13         S3        Action       T1.9
Given any row, with row1 being the row that immediately follows it, I want to delete row when all of the following hold:
if df[row:'SessionID'] == df[row1:'SessionID']
and df[row:'Event Action'] == 'Action'
and df[row1:'Event Action'] == 'Filter'
and df[row:'Timestamp'] == df[row1:'Timestamp']
For instance, in the dataset above, the rows that should be removed are 5 and 10. I'm not very experienced with functions in Python, but I've tried:
def cleanfilter(row):
    row1 = row + 1
    if df[row:'SessionID'] == df[row1:'SessionID'] and df[row:'Event Action'] == 'Search Action' and df[row1:'Event Action'] == 'Search Filter' and df[row:'Timestamp'] == df[row1:'Timestamp']:
        df.drop(df.index[row])
df.apply(cleanfilter,axis=1)
But I keep receiving: "TypeError: ('must be str, not int', 'occurred at index 0')". I don't know what else to search for. Any advice would be much appreciated! Thanks in advance.
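For context, a likely source of this error: with axis=1, DataFrame.apply passes each row to the function as a pandas Series rather than as an integer position, so row + 1 tries to add 1 to the strings in that row. The sketch below uses a small hypothetical two-row frame (demo) and a hypothetical helper (record_type), neither of which is part of the original code, to show what apply hands to the function and that adding an integer to a string raises a TypeError in plain Python. The exact wording of the message depends on the Python and pandas versions in use.

```python
import pandas as pd

# Hypothetical two-row frame with the same columns as the question's data.
demo = pd.DataFrame({
    'SessionID': ['S1', 'S1'],
    'Event Action': ['Action', 'Filter'],
    'Timestamp': ['T1.1', 'T1.1'],
})

seen_types = []

def record_type(row):
    # Record the type of the object that apply passes in for each row.
    seen_types.append(type(row).__name__)
    return None

demo.apply(record_type, axis=1)
print(set(seen_types))  # each row arrives as a Series, not as an int

# Adding an int to a str fails in plain Python, which is what row + 1
# ends up doing element by element when the row holds strings.
try:
    'S1' + 1
    raised = False
except TypeError as exc:
    raised = True
    print(type(exc).__name__, exc)
```

Because the function never receives a row number, df[row:'SessionID'] cannot address "this row" and "the next row" in the intended way either.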
row1:'SessionID', I don't see a row 1
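One possible way to express the four conditions from the question without a row-wise function is to line each row up with the row that follows it using DataFrame.shift(-1) and then build a boolean mask. This is a minimal sketch, not the only approach; the names nxt, to_drop, cleaned, and dropped_labels are illustrative, and it assumes the rows are already in the order in which they should be compared.

```python
import pandas as pd

raw_data = {
    'SessionID': ['S1', 'S1', 'S1', 'S2', 'S2', 'S2', 'S2', 'S2',
                  'S3', 'S3', 'S3', 'S3', 'S3', 'S3'],
    'Event Action': ['Action', 'Action', 'Filter', 'Action', 'Action',
                     'Action', 'Filter', 'Filter', 'Action', 'Filter',
                     'Action', 'Filter', 'Filter', 'Action'],
    'Timestamp': ['T1.1', 'T1.2', 'T1.3', 'T1.1', 'T1.2', 'T1.3', 'T1.3',
                  'T1.4', 'T1.4', 'T1.5', 'T1.7', 'T1.7', 'T1.8', 'T1.9'],
}
df = pd.DataFrame(raw_data, columns=['SessionID', 'Event Action', 'Timestamp'])

# Row i of nxt holds the values of row i + 1 of df; the last row has
# no successor, so its values are missing and never match.
nxt = df.shift(-1)

# True for an 'Action' row that is directly followed by a 'Filter' row
# with the same SessionID and the same Timestamp.
to_drop = (
    (df['SessionID'] == nxt['SessionID'])
    & (df['Event Action'] == 'Action')
    & (nxt['Event Action'] == 'Filter')
    & (df['Timestamp'] == nxt['Timestamp'])
)

dropped_labels = df.index[to_drop].tolist()
cleaned = df[~to_drop]

print(dropped_labels)  # [5, 10]
print(cleaned)
```

With the sample data this marks rows 5 and 10, which matches the expected result stated in the question. Note that df.drop and boolean filtering both return a new DataFrame, so the result has to be assigned to a name, as cleaned is here.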