I have a multi-index df:
This df outlines somebodies path through a website, sid is the session, vid is the visitor id, pid are the web pages and ts is the time in which they landed on the site
pid ts
sid vid
1 A page1 t1
A page2 t2
A page3 t3
A page4 t4
A page5 t5
2 B page1 t4
3 C page1 t5
C page2 t6
Some users have ridiculously long pid paths (1000+) which I imagine could be an error. However when I transpose/pivot this data, it takes ages to transpose because a few paths which are so long.
So I want to impose some threshold where for every session after some number (lets say for example 3) it deletes the session sid
I can impose a threshold for the amount of rows which equals lets say 3, then the df would look like this:
pid ts
sid vid
2 B page1 t4
3 C page1 t5
C page2 t6
Any idea on how to do this?