I have a dataframe that looks similar to this:

Price        From           To
 300€        2020-01-01     2020-01-07
 250€        2020-01-04     2020-01-08
 150€        2020-02-01     2020-02-04
 350€        2020-02-04     2020-02-08

I also have a list of dates, for example: dates = ['2020-01-03', '2020-02-04']

I would like to keep only the rows where at least one of the dates in the list falls between the From and To columns (bounds included).

So, after the transformation, I would have the following dataframe:

Price        From           To
 300€        2020-01-01     2020-01-07
 150€        2020-02-01     2020-02-04
 350€        2020-02-04     2020-02-08

My first idea was to use apply with a lambda, but I suspect that would be inefficient because my dataset is very large. Is there a simpler way to do this with pandas?

The result should be contained in a single dataframe.
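For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the example frame and applies the row-wise apply approach mentioned above. The variable names (df, dates, keep, result) and the exact lambda are assumptions for illustration. It is only meant as a baseline: it runs Python code once per row, which is why it is likely to be slow on a large dataset.

```python
import pandas as pd

# Example data from the question, with From/To parsed as datetimes
df = pd.DataFrame({
    "Price": ["300€", "250€", "150€", "350€"],
    "From": pd.to_datetime(["2020-01-01", "2020-01-04", "2020-02-01", "2020-02-04"]),
    "To": pd.to_datetime(["2020-01-07", "2020-01-08", "2020-02-04", "2020-02-08"]),
})
dates = pd.to_datetime(["2020-01-03", "2020-02-04"])

# Baseline: for each row, check whether any date lies in [From, To]
keep = df.apply(
    lambda row: any(row["From"] <= d <= row["To"] for d in dates),
    axis=1,
)
result = df[keep]
print(result)
```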

  • Can you specify the list of dates more precisely? Is it guaranteed to have one entry for each row of the pandas DataFrame? Or is it a two-element list where you want to compare the first element to the From column and the second element to the To column? Commented Dec 31, 2020 at 13:32
  • The list contains dates in the format year-month-day (they could be strings or date objects; I can convert them if needed). The dates in the list have the same format as the dates in the dataframe. There are no NaN values in the dataframe and the list will contain at least one date. Commented Dec 31, 2020 at 13:35
  • The list could contain more dates. In the example I only put 2 dates, but it could have been 3 or even 4. Commented Dec 31, 2020 at 13:36
  • Okay, so which date in the list should be compared to which date in the dataframe? Or is the outcome several dataframes, one for each item in the list? Commented Dec 31, 2020 at 13:37
  • Looking at the example above, I would take the first date in the list and keep all the rows where this date is between From and To. Then I would take the second date in the list and also keep all the rows where that date is between From and To. Is that clear enough? Commented Dec 31, 2020 at 13:39

2 Answers

Let's try with numpy broadcasting:

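import numpy as np

# From and To must already be datetime64 (convert with pd.to_datetime if not)
x, y = df[['From', 'To']].values.T
a = np.array(['2020-01-03', '2020-02-04'], dtype=np.datetime64)
# Compare every row with every date, then keep rows matching at least one date
mask = ((x[:, None] <= a) & (y[:, None] >= a)).any(axis=1)

df[mask]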

  Price       From         To
0  300€ 2020-01-01 2020-01-07
2  150€ 2020-02-01 2020-02-04
3  350€ 2020-02-04 2020-02-08
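To unpack the mask line, here is a minimal sketch that splits it into its intermediate steps. The frame is rebuilt from the question's example, and the names inside and the use of to_numpy() are my own choices for illustration rather than part of the answer above. The idea is that x[:, None] turns the From values into a column of shape (4, 1), so comparing it with the 2 dates broadcasts to a (4, 2) table with one row per dataframe row and one column per date.

```python
import numpy as np
import pandas as pd

# Example data from the question
df = pd.DataFrame({
    "Price": ["300€", "250€", "150€", "350€"],
    "From": pd.to_datetime(["2020-01-01", "2020-01-04", "2020-02-01", "2020-02-04"]),
    "To": pd.to_datetime(["2020-01-07", "2020-01-08", "2020-02-04", "2020-02-08"]),
})

x = df["From"].to_numpy()  # shape (4,)
y = df["To"].to_numpy()    # shape (4,)
a = np.array(["2020-01-03", "2020-02-04"], dtype="datetime64[ns]")  # shape (2,)

# (4, 1) compared with (2,) broadcasts to (4, 2):
# inside[i, j] is True when date j lies within [From_i, To_i]
inside = (x[:, None] <= a) & (y[:, None] >= a)

# A row is kept if it contains at least one of the dates
mask = inside.any(axis=1)

print(inside)
print(df[mask])
```

Note that the intermediate array has one cell per (row, date) pair, so memory grows with the number of rows times the number of dates.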

6 Comments

Nice solution, Shubham! You got my upvote!
Thanks @DanailPetrov, happy holidays!
Thank you very much, but I get only False in the mask when I should get some True values. I am not sure I understand the last line (mask = ...). Could you please provide a short explanation?
@colla Check df.dtypes: the data type of the From and To columns should be datetime64. If it is not, you first need to convert them with pd.to_datetime.
Both my To and From columns were converted as datetime64 : df['From'] = df['From'].astype('datetime64[ns]')
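The dtype check suggested above could look like the sketch below. It assumes the columns start out as year-month-day strings, which is only a guess about the original data, and the format argument is likewise an assumption. It shows the check and the conversion only; it does not diagnose the all-False mask reported in this thread.

```python
import pandas as pd

# Hypothetical starting point: From/To stored as plain strings
df = pd.DataFrame({
    "Price": ["300€", "250€", "150€", "350€"],
    "From": ["2020-01-01", "2020-01-04", "2020-02-01", "2020-02-04"],
    "To": ["2020-01-07", "2020-01-08", "2020-02-04", "2020-02-08"],
})

# Convert both columns to datetimes (format assumed to be year-month-day)
for col in ["From", "To"]:
    df[col] = pd.to_datetime(df[col], format="%Y-%m-%d")

# Both columns should now report a datetime64 dtype
print(df.dtypes)

# Convert the list of dates the same way so both sides are comparable
dates = pd.to_datetime(["2020-01-03", "2020-02-04"])
```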

One option is with Pandas IntervalIndex:

import pandas as pd

dates = pd.to_datetime(['2020-01-03', '2020-02-04'])
intervals = pd.IntervalIndex.from_arrays(df['From'], df['To'], closed='both')

# A row matched by several dates shows up more than once; deduplicate if needed
df.iloc[intervals.get_indexer_for(dates)]
 
  Price       From         To
0  300€ 2020-01-01 2020-01-07
2  150€ 2020-02-01 2020-02-04
3  350€ 2020-02-04 2020-02-08
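One caveat, sketched below under the assumption that some dates may fall outside every interval: the indexer marks such dates with -1, and passing -1 to iloc would silently select the last row. Filtering those out and deduplicating the remaining positions avoids both problems. The extra date 2021-06-01 and the name pos are made up for illustration.

```python
import numpy as np
import pandas as pd

# Example data from the question
df = pd.DataFrame({
    "Price": ["300€", "250€", "150€", "350€"],
    "From": pd.to_datetime(["2020-01-01", "2020-01-04", "2020-02-01", "2020-02-04"]),
    "To": pd.to_datetime(["2020-01-07", "2020-01-08", "2020-02-04", "2020-02-08"]),
})

# The last date is hypothetical and lies outside every interval
dates = pd.to_datetime(["2020-01-03", "2020-02-04", "2021-06-01"])
intervals = pd.IntervalIndex.from_arrays(df["From"], df["To"], closed="both")

# Positions of matching rows; unmatched dates are reported as -1
pos = intervals.get_indexer_for(dates)

# Drop the -1 markers, then sort and deduplicate the row positions
pos = np.unique(pos[pos >= 0])

result = df.iloc[pos]
print(result)
```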
