
I'm trying to write a function that takes as input a DataFrame with a 'timestamp' column and a list of tuples, where each tuple contains a start and an end time.

What I want to do is "split" the DataFrame into two new ones: the first contains the rows whose timestamp does not fall between the start and end of any tuple, and the second contains the complementary rows. The number of filter tuples is not known in advance, though.

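```python
import pandas as pd
df = pd.DataFrame({'timestamp': [0, 1, 2, 5, 6, 7, 11, 22, 33, 100], 'x': [1, 2, 3, 4, 5, 6, 7, 8, 9, 1]})
filt = [(1, 4), (10, 40)]
left, removed = func(df, filt)  # func is the function I'm trying to write
```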

This should give me two DataFrames:

  • left: with rows with timestamp [0,5,6,7,100]
  • removed: with rows with timestamp [1,2,11,22,33]

I believe the right approach is to write a custom function that can be used as a filter and then call it somehow to filter/mask the DataFrame, but I could not find a proper example of how to implement this.
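The custom-filter idea can be sketched as an ordinary function that ORs together one Series.between mask per tuple, so the number of tuples does not matter. This is a minimal sketch rather than a definitive implementation: the name split_by_intervals and its col parameter are hypothetical, and it assumes both interval bounds are inclusive (between's default), which matches the expected output above.

```python
import pandas as pd


def split_by_intervals(df, intervals, col="timestamp"):
    """Hypothetical helper: split df into (outside, inside) the given intervals.

    Bounds are treated as inclusive on both ends, matching Series.between's default.
    """
    # Start from "in no interval" and OR in one boolean mask per (start, end) tuple,
    # so any number of tuples (including zero) is handled.
    inside = pd.Series(False, index=df.index)
    for start, end in intervals:
        inside |= df[col].between(start, end)
    return df[~inside], df[inside]


df = pd.DataFrame({'timestamp': [0, 1, 2, 5, 6, 7, 11, 22, 33, 100],
                   'x': [1, 2, 3, 4, 5, 6, 7, 8, 9, 1]})
filt = [(1, 4), (10, 40)]
left, removed = split_by_intervals(df, filt)
print(left['timestamp'].tolist())     # [0, 5, 6, 7, 100]
print(removed['timestamp'].tolist())  # [1, 2, 11, 22, 33]
```

With an empty list of tuples, every row ends up in the first DataFrame and the second one is empty.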

2 Answers


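Check this: build one boolean mask per tuple with Series.between, OR the masks together row by row, and use the result to split the frame:

```python
import pandas as pd

# One mask per tuple; groupby(level=0).any() ORs them per original row
# (the original .any(level=0) no longer exists in pandas 2.0).
mask = pd.concat([df.timestamp.between(*x) for x in filt]).groupby(level=0).any()
left, removed = df[~mask], df[mask]
print(left)
#    timestamp  x
# 0          0  1
# 3          5  4
# 4          6  5
# 5          7  6
# 9        100  1
```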

2 Comments

I guess it works, thanks. I'm not sure concat is very efficient when handling many values and conditions, right?
@AndreaRonco yes it is
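On the efficiency concern raised in the comments: pd.concat builds a stacked Series with rows × tuples entries before reducing it. One alternative, sketched here under the assumption that timestamps and bounds are plain numeric values with inclusive bounds, compares every row against every tuple in a single NumPy broadcast. The helper name interval_mask is hypothetical. Memory still grows with rows × tuples, so this mainly saves the concat and groupby overhead rather than changing the asymptotic cost.

```python
import numpy as np
import pandas as pd


def interval_mask(values, intervals):
    """Sketch: True where a value lies in at least one inclusive (start, end) interval.

    Compares every row against every interval at once via NumPy broadcasting, so
    the temporary boolean matrix has n_rows x n_intervals entries.
    """
    v = np.asarray(values)[:, None]                   # shape (n, 1)
    bounds = np.asarray(intervals).reshape(-1, 2)     # shape (k, 2); (0, 2) if empty
    starts, ends = bounds[:, 0], bounds[:, 1]         # each shape (k,)
    return ((v >= starts) & (v <= ends)).any(axis=1)  # shape (n,)


df = pd.DataFrame({'timestamp': [0, 1, 2, 5, 6, 7, 11, 22, 33, 100],
                   'x': [1, 2, 3, 4, 5, 6, 7, 8, 9, 1]})
filt = [(1, 4), (10, 40)]
mask = interval_mask(df['timestamp'], filt)
left, removed = df[~mask], df[mask]
print(left['timestamp'].tolist())     # [0, 5, 6, 7, 100]
print(removed['timestamp'].tolist())  # [1, 2, 11, 22, 33]
```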

Can't you just filter with .isin()?

```python
left, removed = (df[df['timestamp'].isin([0, 5, 6, 7, 100])],
                 df[df['timestamp'].isin([1, 2, 11, 22, 33])])
```

1 Comment

No, it needs to scale to a variable number of filter tuples.
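If the number of tuples itself gets large, another option (a different technique from both answers) is to merge overlapping intervals, sort them by start, and locate each timestamp with a binary search via np.searchsorted. That costs roughly O((n + k) log k) for n rows and k tuples instead of n × k comparisons. A minimal sketch, assuming numeric timestamps and inclusive bounds; merge_intervals and interval_mask_sorted are hypothetical helper names.

```python
import numpy as np
import pandas as pd


def merge_intervals(intervals):
    """Sort inclusive (start, end) tuples and merge any that overlap or share an endpoint."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def interval_mask_sorted(values, intervals):
    """Sketch: inclusive interval membership via binary search over merged intervals."""
    values = np.asarray(values)
    merged = merge_intervals(intervals)
    if not merged:
        return np.zeros(len(values), dtype=bool)
    starts = np.array([s for s, _ in merged])
    ends = np.array([e for _, e in merged])
    # Index of the last interval whose start is <= value (-1 if there is none).
    idx = np.searchsorted(starts, values, side="right") - 1
    # Clip only so ends[...] is a valid lookup; rows with idx == -1 are masked out anyway.
    return (idx >= 0) & (values <= ends[np.clip(idx, 0, None)])


df = pd.DataFrame({'timestamp': [0, 1, 2, 5, 6, 7, 11, 22, 33, 100],
                   'x': [1, 2, 3, 4, 5, 6, 7, 8, 9, 1]})
filt = [(1, 4), (10, 40)]
mask = interval_mask_sorted(df['timestamp'], filt)
left, removed = df[~mask], df[mask]
print(left['timestamp'].tolist())     # [0, 5, 6, 7, 100]
print(removed['timestamp'].tolist())  # [1, 2, 11, 22, 33]
```

Merging first matters: after it, the intervals are disjoint and sorted, so the last interval starting at or before a timestamp is the only one that can contain it.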
