How to slice rows between time intervals on a case-by-case basis in Python

Question

I want to slice the rows of df1 that fall between StartTime and EndTime in df2, on a case-by-case basis (matched by the values in column Group_Id of df2), and then concatenate the slices, since they all have the same format.

So this is df1:

      Timestamp           Group_Id      Data
2013-10-20 00:00:05.143    11           14
2013-10-21 00:05:10.377    11           15
2013-10-22 14:22:15.501    11           19
                   ...
2019-02-05 00:00:05.743    101          21
2019-02-10 00:00:10.407    101          33

and df2:

EndTime          StartTime             Group_Id
27/10/13 16:08   20/10/13 16:08          11
03/12/16 16:11   26/11/16 16:11          2
24/10/14 12:08   17/10/14 12:08          11
04/07/17 08:00   27/06/17 08:00          100
03/04/13 14:10   27/03/13 14:10          26
15/11/18 17:00   08/11/18 17:00          46
11/02/19 00:20   04/02/19 00:20          101

Step 1: Start from the first row of df2, where Group_Id is 11.

Step 2: Copy the rows of df1 with Group_Id == 11 whose Timestamp lies between StartTime and EndTime.

Step 3: Repeat for every row of df2 and concatenate all the sliced subsets.

Hopefully the final dataset df3 looks like this:

Group_Id EndTime         StartTime      Timestamp                 Data
11       27/10/13 16:08  20/10/13 16:08 2013-10-20 20:00:05.143   14
11       27/10/13 16:08  20/10/13 16:08 2013-10-21 00:05:10.377   15
11       27/10/13 16:08  20/10/13 16:08 2013-10-22 14:22:15.501   19
                             ...
101      11/02/19 00:20  04/02/19 00:20 2019-02-05 00:00:05.743   21
101      11/02/19 00:20  04/02/19 00:20 2019-02-10 00:00:10.407   33
                             ...

Some rough pseudocode (not runnable):

for i in df2['Group_Id']:
    if i == df1['Group_Id']:
        dfxx = df1[(df1['Timestamp'] <= df2.loc[i, 'EndTime']) & (df1['Timestamp'] > df2.loc[i, 'EndTime'] - dt.timedelta(days=7))]
        pd.concat(dfxx for all i)

Hope this helps to better illustrate the problem.
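
The loop described in the steps and pseudocode above can be made runnable roughly as follows. This is a minimal sketch, not a definitive solution: the sample frames are trimmed from the question (with one extra out-of-window row added for illustration), the strings in df2 are assumed to be day-first (`%d/%m/%y %H:%M`), both interval ends are assumed to be inclusive, and the names `start`, `end`, `slices` and `part` are hypothetical.

```python
import pandas as pd

# Illustrative sample data modelled on the question, not the real dataset.
df1 = pd.DataFrame({
    "Timestamp": pd.to_datetime([
        "2013-10-20 20:00:05.143",
        "2013-10-21 00:05:10.377",
        "2013-10-22 14:22:15.501",
        "2014-01-01 00:00:00.000",  # group 11, but outside every interval
        "2019-02-05 00:00:05.743",
        "2019-02-10 00:00:10.407",
    ]),
    "Group_Id": [11, 11, 11, 11, 101, 101],
    "Data": [14, 15, 19, 99, 21, 33],
})
df2 = pd.DataFrame({
    "EndTime": ["27/10/13 16:08", "24/10/14 12:08", "11/02/19 00:20"],
    "StartTime": ["20/10/13 16:08", "17/10/14 12:08", "04/02/19 00:20"],
    "Group_Id": [11, 11, 101],
})

# Parse the day-first strings once; the original text columns stay untouched.
start = pd.to_datetime(df2["StartTime"], format="%d/%m/%y %H:%M")
end = pd.to_datetime(df2["EndTime"], format="%d/%m/%y %H:%M")

slices = []
for i, row in df2.iterrows():
    # Rows of df1 in the same group whose Timestamp is inside this row's window
    # (Series.between is inclusive on both ends by default).
    mask = (df1["Group_Id"] == row["Group_Id"]) & df1["Timestamp"].between(start[i], end[i])
    part = df1.loc[mask].assign(StartTime=row["StartTime"], EndTime=row["EndTime"])
    if not part.empty:  # skip intervals that match nothing
        slices.append(part)

df3 = pd.concat(slices, ignore_index=True)
print(df3)
```

A row-by-row loop like this is easy to follow, but it can be slow when df2 has many rows.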

The df1.Timestamp value 2013-10-20 00:00:05.143 is outside the interval from 20/10/13 16:08 to 27/10/13 16:08. Why is it in the output?
– Andy L., Commented Oct 24, 2019 at 2:37

Andy L. · Accepted Answer · 2019-10-24 07:28:50Z

0

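Convert df1.Timestamp to datetime, then merge df2 with df1 on Group_Id to get df3. Build an IntervalIndex from the StartTime and EndTime columns of df3 (parsed day-first, closed on both ends). Finally, use a list comprehension to build a True/False mask m that tests each Timestamp against the interval on the same row, and slice df3 with it.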

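import pandas as pd

# Parse the timestamps, then pair each df2 interval with the df1 rows of the same group
df1.Timestamp = pd.to_datetime(df1.Timestamp)
df3 = df2.merge(df1, on='Group_Id')
# One closed interval [StartTime, EndTime] per merged row, parsed day-first
iix = pd.IntervalIndex.from_tuples([*df3[['StartTime','EndTime']].apply(pd.to_datetime, dayfirst=True).to_records(index=False)],
                                   closed='both')
# m[i] is True when the Timestamp in row i falls inside the interval of row i
m = [x in iix[i] for i, x in enumerate(df3.Timestamp)]

df3.loc[m]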

Out[494]:
          EndTime       StartTime  Group_Id               Timestamp  Data
0  27/10/13 16:08  20/10/13 16:08        11 2013-10-20 20:00:05.143    14
1  27/10/13 16:08  20/10/13 16:08        11 2013-10-21 00:05:10.377    15
2  27/10/13 16:08  20/10/13 16:08        11 2013-10-22 14:22:15.501    19
6  11/02/19 00:20  04/02/19 00:20       101 2019-02-05 00:00:05.743    21
7  11/02/19 00:20  04/02/19 00:20       101 2019-02-10 00:00:10.407    33
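
As an alternative to the IntervalIndex and list comprehension, the same filter can probably be written as two vectorised comparisons on the merged frame. The block below is a sketch of that swapped-in technique, not the method used in this answer. It assumes the df2 strings are day-first (`%d/%m/%y %H:%M`) and that both ends of each interval are inclusive; the sample data and the names `merged`, `fmt`, `start`, `end` and `mask` are illustrative.

```python
import pandas as pd

# Illustrative sample data modelled on the question, not the asker's real dataset.
df1 = pd.DataFrame({
    "Timestamp": pd.to_datetime([
        "2013-10-20 20:00:05.143",
        "2013-10-22 14:22:15.501",
        "2013-11-30 09:00:00.000",  # group 11, outside both of its intervals
        "2019-02-05 00:00:05.743",
    ]),
    "Group_Id": [11, 11, 11, 101],
    "Data": [14, 19, 99, 21],
})
df2 = pd.DataFrame({
    "EndTime": ["27/10/13 16:08", "24/10/14 12:08", "11/02/19 00:20"],
    "StartTime": ["20/10/13 16:08", "17/10/14 12:08", "04/02/19 00:20"],
    "Group_Id": [11, 11, 101],
})

# Pair every interval in df2 with every df1 row of the same group.
merged = df2.merge(df1, on="Group_Id")

# Parse the interval bounds row by row, aligned with the merged frame.
fmt = "%d/%m/%y %H:%M"
start = pd.to_datetime(merged["StartTime"], format=fmt)
end = pd.to_datetime(merged["EndTime"], format=fmt)

# Keep only the rows whose Timestamp lies inside their own interval.
mask = (merged["Timestamp"] >= start) & (merged["Timestamp"] <= end)
df3 = merged.loc[mask]
print(df3)
```

Because the comparison is element-wise on aligned Series, no Python-level loop over the rows is needed.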

edited Oct 24, 2019 at 7:28

answered Oct 24, 2019 at 7:23

4 Comments

nilsinelabore Over a year ago

Thank you, but I don't know why the output is empty, with only the headings.

Andy L. Over a year ago

@nilsinelabore: there is something unusual in your real dataset that is not in the sample data you provided. You can run the commands above one line at a time and check the result of each to see where it fails on your real dataset.

nilsinelabore Over a year ago

Thanks, I can run it now, but it seems Timestamp is not filtered by StartTime and EndTime.

Andy L. Over a year ago

@nilsinelabore: after creating iix on your real dataset, check that it is an IntervalIndex whose values come from df3.StartTime and df3.EndTime (note: df3 is the result of the merge), and check that df1.Timestamp has a datetime dtype.
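
The checks suggested in the comment above could look roughly like this. It is only a sketch on made-up sample data: the names `was_datetime`, `is_datetime` and `bounds` are hypothetical, the df2 strings are assumed to be day-first (`%d/%m/%y %H:%M`), and `IntervalIndex.from_arrays` is used here in place of `from_tuples` purely for brevity.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

# Illustrative sample data; Timestamp is deliberately left as text at first.
df1 = pd.DataFrame({
    "Timestamp": ["2013-10-20 20:00:05.143", "2019-02-05 00:00:05.743"],
    "Group_Id": [11, 101],
    "Data": [14, 21],
})
df2 = pd.DataFrame({
    "EndTime": ["27/10/13 16:08", "11/02/19 00:20"],
    "StartTime": ["20/10/13 16:08", "04/02/19 00:20"],
    "Group_Id": [11, 101],
})

# Check 1: Timestamp should be a datetime dtype after conversion, not text.
was_datetime = is_datetime64_any_dtype(df1["Timestamp"])
df1["Timestamp"] = pd.to_datetime(df1["Timestamp"])
is_datetime = is_datetime64_any_dtype(df1["Timestamp"])
print(was_datetime, is_datetime)

# Build the intervals from the merged frame, as in the answer.
df3 = df2.merge(df1, on="Group_Id")
bounds = df3[["StartTime", "EndTime"]].apply(pd.to_datetime, format="%d/%m/%y %H:%M")
iix = pd.IntervalIndex.from_arrays(bounds["StartTime"], bounds["EndTime"], closed="both")

# Check 2: iix should be an IntervalIndex with one interval per merged row.
print(type(iix).__name__, iix.dtype, len(iix) == len(df3))

# Check 3: every parsed StartTime should not be later than its EndTime.
print((bounds["StartTime"] <= bounds["EndTime"]).all())
```

If the first check reports that Timestamp is still not a datetime column, or the interval bounds look swapped or shifted, date parsing is a likely place to look.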

sharder · Accepted Answer · 2019-10-24 02:28:25Z

0

You should be able to accomplish this with a merge based on your example.

df1.merge(df2, on='Group_Id', how='left')

answered Oct 24, 2019 at 2:28

1 Comment

nilsinelabore Over a year ago

Thanks, but I don't think it'll work, as Group_Id is not unique.
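
To illustrate the concern about non-unique Group_Id values, here is a small sketch on made-up data. When a Group_Id appears more than once in df2, the left merge returns one row per matching pair, so df1 rows are repeated and a time filter is still needed afterwards. The sample values, the day-first format (`%d/%m/%y %H:%M`) and the names `merged`, `fmt`, `start`, `end` and `kept` are assumptions for illustration.

```python
import pandas as pd

# Illustrative sample data: Group_Id 11 has two intervals in df2.
df1 = pd.DataFrame({
    "Timestamp": pd.to_datetime(["2013-10-21 00:05:10.377", "2019-02-05 00:00:05.743"]),
    "Group_Id": [11, 101],
    "Data": [15, 21],
})
df2 = pd.DataFrame({
    "EndTime": ["27/10/13 16:08", "24/10/14 12:08", "11/02/19 00:20"],
    "StartTime": ["20/10/13 16:08", "17/10/14 12:08", "04/02/19 00:20"],
    "Group_Id": [11, 11, 101],
})

# The merge alone pairs each df1 row with every interval of its group.
merged = df1.merge(df2, on="Group_Id", how="left")
print(len(df1), len(merged))  # the merged frame has more rows than df1

# Filtering on the interval bounds removes the pairs that do not match in time.
fmt = "%d/%m/%y %H:%M"
start = pd.to_datetime(merged["StartTime"], format=fmt)
end = pd.to_datetime(merged["EndTime"], format=fmt)
kept = merged[(merged["Timestamp"] >= start) & (merged["Timestamp"] <= end)]
print(kept)
```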
