I have the following code and data:

import pandas as pd, numpy as np
from datetime import datetime
end_dt = datetime.today()
st_dt = (end_dt + pd.DateOffset(-10)).date()
df_index = pd.date_range(st_dt, end_dt)
df = pd.DataFrame(index=df_index, columns=['in_range'])

data = [pd.to_datetime(['2022-11-08','2022-11-10']), pd.to_datetime(['2022-11-13','2022-11-15'])]
dt_ranges = pd.DataFrame(data, columns=['st_dt', 'end_dt'])

This produces the following two dataframes:
df:

            in_range
2022-11-08  NaN
2022-11-09  NaN
2022-11-10  NaN
2022-11-11  NaN
2022-11-12  NaN
2022-11-13  NaN
2022-11-14  NaN
2022-11-15  NaN
2022-11-16  NaN
2022-11-17  NaN
2022-11-18  NaN

and dt_ranges:

    st_dt       end_dt
0   2022-11-08  2022-11-10
1   2022-11-13  2022-11-15

I would like to update the 'in_range' column to indicate whether each index date falls within any of the start/end date pairs in the second dataframe, so I should end up with this:

            in_range
2022-11-08  True
2022-11-09  True
2022-11-10  True
2022-11-11  NaN
2022-11-12  NaN
2022-11-13  True
2022-11-14  True
2022-11-15  True
2022-11-16  NaN
2022-11-17  NaN
2022-11-18  NaN

I've gone down the path of trying to do this with a lambda and iteration, but that seems inefficient to me.

    def in_range(index_date, date_ranges):
        for r in date_ranges.values:
            # r[0] is the range start, r[1] the range end (both inclusive)
            if (r[0] <= index_date) & (index_date <= r[1]):
                return True
        return False

    df['in_range'] = df.reset_index().apply(lambda x: in_range(x['index'], dt_ranges), axis=1)

The above always sets in_range to NaN, even though the function returns the correct values. I suspect that is because I am resetting the index, so the result cannot align with df's datetime index. Also, as mentioned, this solution is probably pretty inefficient.

Is there a more Pythonic/pandas-idiomatic way of doing this?
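
The alignment suspicion above can be checked in isolation. The following is a minimal sketch, assuming a three-row frame and a hand-written `flags` Series (both hypothetical stand-ins for the real data and for the result of `reset_index().apply(...)`): assigning a Series aligns on index labels, while assigning a plain array is positional.

```python
import pandas as pd

# Hypothetical three-row stand-in for df, with a DatetimeIndex.
df = pd.DataFrame(index=pd.date_range('2022-11-08', periods=3), columns=['in_range'])

# Stand-in for the apply() result: it carries a RangeIndex (0, 1, 2).
flags = pd.Series([True, False, True])

# Series assignment aligns on labels; 0, 1, 2 match no dates, so every row is NaN.
df['in_range'] = flags
all_nan_after_alignment = bool(df['in_range'].isna().all())

# Dropping the index (plain NumPy array) makes the assignment positional.
df['in_range'] = flags.to_numpy()
print(df)
```

This only explains the NaNs; it does not address the row-by-row inefficiency.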

2 Answers

Use merge_asof and boolean indexing:

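s = df.index.to_series()
# For each index date, merge_asof finds the last range whose st_dt is <= that date;
# the date is inside that range only if the matched end_dt is on or after it.
m = (pd.merge_asof(s.rename('st_dt'), dt_ranges)
     ['end_dt'].ge(s.to_numpy()).to_numpy()
     )

# Rows outside every range keep their NaN.
df.loc[m, 'in_range'] = True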

NB. The intervals in dt_ranges should be non-overlapping, and dt_ranges must be sorted by st_dt, since merge_asof requires sorted keys.

Output:

           in_range
2022-11-08     True
2022-11-09     True
2022-11-10     True
2022-11-11      NaN
2022-11-12      NaN
2022-11-13     True
2022-11-14     True
2022-11-15     True
2022-11-16      NaN
2022-11-17      NaN
2022-11-18      NaN
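
For small inputs, the same mask can also be built by comparing every index date against every range with NumPy broadcasting. This is a sketch of a different technique, not the merge_asof method above: it needs memory proportional to (number of dates) x (number of ranges), but it does not care whether the ranges overlap or are sorted. The fixed dates are taken from the question's example output.

```python
import pandas as pd

df = pd.DataFrame(index=pd.date_range('2022-11-08', '2022-11-18'), columns=['in_range'])
dt_ranges = pd.DataFrame({
    'st_dt': pd.to_datetime(['2022-11-08', '2022-11-13']),
    'end_dt': pd.to_datetime(['2022-11-10', '2022-11-15']),
})

# Column vector of dates against row vectors of starts/ends gives an (n_dates, n_ranges) grid.
dates = df.index.to_numpy()[:, None]
starts = dt_ranges['st_dt'].to_numpy()
ends = dt_ranges['end_dt'].to_numpy()

# A date is in range if it falls inside at least one interval (bounds inclusive).
mask = ((dates >= starts) & (dates <= ends)).any(axis=1)

df.loc[mask, 'in_range'] = True
print(df)
```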
4 Comments

That works! Thank you. I did not know about this merge_asof function.
If dt_ranges are overlapping, would the solution be to aggregate/combine the overlaps into a single range?
This is actually only an issue if one interval is fully inside another, so I guess it is unlikely. Yes, you would need to merge or delete the nested intervals.
For more details on why this is an issue: if you have [1,5] and [2,3] as intervals, then for a value of 4, merge_asof would map it to [2,3], because the mapping depends on only one boundary (the start), and the subsequent end check would fail. If you remove the inner interval [2,3], the mapping will be correct (to [1,5]).
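
The [1,5] / [2,3] case from the comment above can be reproduced with plain integers. This is a small sketch; the column names `value`, `start`, and `end` are chosen here for illustration only.

```python
import pandas as pd

# [2, 3] is nested inside [1, 5]; rows are sorted by start, as merge_asof requires.
intervals = pd.DataFrame({'start': [1, 2], 'end': [5, 3]})
values = pd.DataFrame({'value': [4]})

# Backward search: the last interval with start <= 4 is [2, 3].
matched = pd.merge_asof(values, intervals, left_on='value', right_on='start')
matched_start = int(matched['start'].iloc[0])
naive_hit = bool((matched['end'] >= matched['value']).iloc[0])  # end check fails

# After dropping the nested interval, 4 is matched to [1, 5] and the check passes.
outer_only = intervals[intervals['start'] != 2]
matched_outer = pd.merge_asof(values, outer_only, left_on='value', right_on='start')
fixed_hit = bool((matched_outer['end'] >= matched_outer['value']).iloc[0])

print(naive_hit, fixed_hit)
```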

One option is to compute the non-equi join with conditional_join from pyjanitor, which can handle overlaps:

# pip install pyjanitor
import pandas as pd
import janitor
(
df
.reset_index()
.conditional_join(
    dt_ranges, 
    ('index', 'st_dt', '>='), 
    ('index', 'end_dt', '<='), 
    # depending on your data size
    # setting use_numba to True
    # can improve performance
    # of course, this requires numba installed
    use_numba = False,
    how = 'left', 
    # performance is better when
    # sort_by_appearance is False
    sort_by_appearance=True)
.assign(in_range = lambda df: df.in_range.mask(df.st_dt.notna(), True))
.iloc[:, :2]
.set_index('index')
)

           in_range
index              
2022-11-08     True
2022-11-09     True
2022-11-10     True
2022-11-11      NaN
2022-11-12      NaN
2022-11-13     True
2022-11-14     True
2022-11-15     True
2022-11-16      NaN
2022-11-17      NaN
2022-11-18      NaN
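
If installing pyjanitor is not an option, what the non-equi join computes can be sketched in plain pandas with a cross merge followed by a filter. This is an illustration rather than an equivalent of conditional_join: it materialises every date/range pair, so it only suits small inputs. The third range below is a hypothetical addition, included only to show that overlapping ranges are harmless here.

```python
import pandas as pd

df = pd.DataFrame(index=pd.date_range('2022-11-08', '2022-11-18'), columns=['in_range'])
dt_ranges = pd.DataFrame({
    # The last range is an assumption added for illustration; it overlaps the second one.
    'st_dt': pd.to_datetime(['2022-11-08', '2022-11-13', '2022-11-14']),
    'end_dt': pd.to_datetime(['2022-11-10', '2022-11-15', '2022-11-16']),
})

# Pair every index date with every range (reset_index names the date column 'index').
pairs = df.reset_index().merge(dt_ranges, how='cross')

# Keep the dates that fall inside the range they were paired with (bounds inclusive).
hits = pairs.loc[pairs['index'].between(pairs['st_dt'], pairs['end_dt']), 'index']

# A date matched by several ranges appears several times in hits; isin ignores duplicates.
df.loc[df.index.isin(hits), 'in_range'] = True
print(df)
```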

2 Comments

Sometimes I wonder why janitor doesn't get incorporated into pandas. It would be nice.
This is a nice solution as well. Thanks for posting. And again, I wasn't familiar with this janitor library.