
I'm working with 5-minute data that only includes timestamps between 09:30 and 16:00 (Datetime is stored as a column, not as the index).

After applying an operation to the groups, I get additional bins beyond the expected 16:00 cutoff, even though there's no data beyond 16:00. These extra groups appear with empty values or NaNs.

Here is my code:


filtered = self.df.groupby(pd.Grouper(key='Datetime', freq='30min', origin='start'))
self.other_df['Avg_Volatility'] = filtered['Volatility'].mean()

(The original data doesn't include any Datetime beyond 16:00 on any date. I suspect it has to do with freq='30min'?)

Here's the original data frame (df):

                    Datetime        Open       Close        High         Low   Volume  VWAP_Price  Volatility
0  2025-06-10 09:30:00-04:00  200.600006  202.136993  203.149994  200.570007  4374637  201.614250    2.579987
1  2025-06-10 09:35:00-04:00  202.134995  202.139999  202.389999  201.695007  1077512  202.090000    0.694992
2  2025-06-10 09:40:00-04:00  202.139999  201.798996  202.324997  201.434998   897000  201.924747    0.889999
..                       ...         ...         ...         ...         ...      ...         ...         ...
78  2025-06-10 15:55:00-04:00  201.804993  201.934998  202.020004  201.380005   805672  201.785000    0.639999
79  2025-06-11 09:30:00-04:00  201.927200  202.869995  203.110001  201.865005  1176969  202.443050    1.244995

This is my output for other_df:

                          Avg_Volatility
Datetime                                 
2025-06-10 09:30:00-04:00        1.146612
2025-06-10 10:00:00-04:00        0.556870
...
2025-06-10 15:00:00-04:00        0.259351
2025-06-10 15:30:00-04:00        0.317085
2025-06-10 16:00:00-04:00             NaN
2025-06-10 16:30:00-04:00             NaN
2025-06-10 17:00:00-04:00             NaN
2025-06-10 17:30:00-04:00             NaN
2025-06-10 18:00:00-04:00             NaN
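The behavior above can be reproduced with synthetic data (timestamps and column names are hypothetical, modeled on the frames shown): pd.Grouper lays down contiguous 30-minute bins from the first to the last timestamp in the column, so the overnight gap between 16:00 and the next day's 09:30 becomes a run of empty bins that aggregate to NaN.

```python
import pandas as pd

# Hypothetical 5-min bars: two trading days, in-session timestamps only
idx = pd.date_range("2025-06-10 09:30", "2025-06-10 15:55", freq="5min").append(
    pd.date_range("2025-06-11 09:30", "2025-06-11 15:55", freq="5min")
)
df = pd.DataFrame({"Datetime": idx, "Volatility": 1.0})

out = df.groupby(pd.Grouper(key="Datetime", freq="30min", origin="start"))["Volatility"].mean()

# The grouper fills the overnight gap (16:00 -> next 09:30) with empty
# bins, which show up as NaN rows in the aggregated result:
print(out.isna().sum(), "empty overnight bins out of", len(out))
```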

I don't want to keep bins that fall outside the 09:30 to 16:00 session.

Thank you.
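For reference, two common ways to keep only the in-session bins are dropping the all-NaN groups after aggregating, or masking the result by time of day. A minimal sketch, with hypothetical data and column names taken from the frames above:

```python
import pandas as pd

# Hypothetical 5-min data restricted to the trading session
idx = pd.date_range("2025-06-10 09:30", "2025-06-10 15:55", freq="5min").append(
    pd.date_range("2025-06-11 09:30", "2025-06-11 15:55", freq="5min")
)
df = pd.DataFrame({"Datetime": idx, "Volatility": 1.0})

grouped = df.groupby(pd.Grouper(key="Datetime", freq="30min", origin="start"))

# Option 1: drop the empty overnight bins after aggregating
avg = grouped["Volatility"].mean().dropna()

# Option 2: keep only bins whose label falls inside the session
avg2 = grouped["Volatility"].mean()
avg2 = avg2[(avg2.index.time >= pd.Timestamp("09:30").time())
            & (avg2.index.time < pd.Timestamp("16:00").time())]
```

With data confined to the session, both options yield the same result; the time-of-day mask additionally guards against partial bins if stray out-of-session rows ever appear.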

2 Comments

  • Why not just filter that out in your other_df, or simply drop the rows with null? Commented Jul 24 at 6:32
  • @chitown88, I thought about it, but dropping the rows would cause unnecessary computation, and I was trying to avoid it! Commented Jul 24 at 9:37

2 Answers


With a minimal example:

import pandas as pd

test = pd.DataFrame(
    {
        "Publish date": [
            pd.Timestamp("2000-01-02 09:00:00"),
            pd.Timestamp("2000-01-02 09:00:05"),
            pd.Timestamp("2000-01-02 15:55:00"),
            pd.Timestamp("2000-01-12 16:00:00"),
        ],
        "Open": [0, 1, 2, 3],
        "Close": [10, 20, 30, 40],
    }
)
grouped = test.groupby(pd.Grouper(key='Publish date', freq='30min', origin='end')).mean().reset_index()

So in your case you just need to adjust the origin, and that should lead to the desired output:

filtered = self.df.groupby(pd.Grouper(key='Datetime', freq='30min', origin='end')).mean().reset_index()
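A quick check with illustrative data (not the asker's) suggests that origin only shifts where the bin edges are anchored; the empty bins spanning the overnight gap remain as NaN rows with either setting:

```python
import pandas as pd

# Two sessions of hypothetical 5-min data, with an overnight gap between them
idx = pd.date_range("2000-01-02 09:30", "2000-01-02 15:55", freq="5min").append(
    pd.date_range("2000-01-03 09:30", "2000-01-03 15:55", freq="5min")
)
df = pd.DataFrame({"Datetime": idx, "x": 1.0})

results = {}
for origin in ("start", "end"):
    out = df.groupby(pd.Grouper(key="Datetime", freq="30min", origin=origin))["x"].mean()
    results[origin] = out
    # both runs still contain the empty overnight bins as NaN rows
    print(origin, out.isna().any())
```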

1 Comment

  • Ahh, thank you. I ran the code, but it still generates data outside the range (09:30 to 16:00). I thought origin just determines how the intervals are anchored, not necessarily restricting the data range.

pd.Grouper creates full-length bins, even when there’s no data in the tail bin (like 16:00–16:30). This is normal behavior for time-based grouping.

Try using the closed parameter.

filtered = self.df.groupby(
    pd.Grouper(key='Datetime', freq='30min', origin='start', closed='left')
).mean().reset_index()
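As an illustrative sketch (hypothetical timestamps, column names from the question), closed decides which edge a boundary timestamp belongs to; it changes how points are assigned to bins rather than which bins exist:

```python
import pandas as pd

s = pd.DataFrame({
    "Datetime": pd.to_datetime(["2025-06-10 10:00:00", "2025-06-10 10:15:00"]),
    "Volatility": [1.0, 2.0],
})

left = s.groupby(pd.Grouper(key="Datetime", freq="30min", closed="left"))["Volatility"].mean()
right = s.groupby(pd.Grouper(key="Datetime", freq="30min", closed="right"))["Volatility"].mean()

# closed='left': 10:00 falls in [10:00, 10:30), so both points share one bin
# closed='right': 10:00 falls in (09:30, 10:00], so it lands in an earlier bin
print(len(left), len(right))
```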

Or simply drop the nulls when you aggregate / take the mean:

self.other_df['Avg_Volatility'] = filtered['Volatility'].mean().dropna()

1 Comment

Thank you for the help! I'm a bit confused, though. I thought closed itself does not prevent Grouper from creating bins outside the range of your data; instead, it defines the boundary rules for assigning timestamps within the overall range of bins. How does it help to create an exclusive bin?
