
I'm working with 5-minute data that only includes timestamps between 09:30 and 16:00 (Datetime is stored as a column, not as the index).

After applying an operation to the groups, I get additional bins beyond the expected 16:00 cutoff, even though there's no data beyond 16:00. These extra groups appear with empty values or NaNs.

Here is my code:


filtered = self.df.groupby(pd.Grouper(key='Datetime', freq='30min', origin='start'))
self.other_df['Avg_Volatility'] = filtered['Volatility'].mean()

(The original data doesn't include any Datetime beyond 16:00 on any date. I suspect it has to do with freq='30min'?)

Here's the original data frame (df):

                    Datetime        Open       Close        High         Low   Volume  VWAP_Price  Volatility
0  2025-06-10 09:30:00-04:00  200.600006  202.136993  203.149994  200.570007  4374637  201.614250    2.579987
1  2025-06-10 09:35:00-04:00  202.134995  202.139999  202.389999  201.695007  1077512  202.090000    0.694992
2  2025-06-10 09:40:00-04:00  202.139999  201.798996  202.324997  201.434998   897000  201.924747    0.889999
..                       ...         ...         ...         ...         ...      ...         ...         ...
78  2025-06-10 15:55:00-04:00  201.804993  201.934998  202.020004  201.380005   805672  201.785000    0.639999
79  2025-06-11 09:30:00-04:00  201.927200  202.869995  203.110001  201.865005  1176969  202.443050    1.244995

This is my output for other_df:

                          Avg_Volatility
Datetime                                 
2025-06-10 09:30:00-04:00        1.146612
2025-06-10 10:00:00-04:00        0.556870
...
2025-06-10 15:00:00-04:00        0.259351
2025-06-10 15:30:00-04:00        0.317085
2025-06-10 16:00:00-04:00             NaN
2025-06-10 16:30:00-04:00             NaN
2025-06-10 17:00:00-04:00             NaN
2025-06-10 17:30:00-04:00             NaN
2025-06-10 18:00:00-04:00             NaN
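The behavior above can be reproduced with synthetic data (timestamps and column names are hypothetical, modeled on the frames shown): pd.Grouper lays down contiguous 30-minute bins from the first to the last timestamp in the column, so the overnight gap between 16:00 and the next day's 09:30 becomes a run of empty bins that aggregate to NaN.

```python
import pandas as pd

# Hypothetical 5-min bars: two trading days, in-session timestamps only
idx = pd.date_range("2025-06-10 09:30", "2025-06-10 15:55", freq="5min").append(
    pd.date_range("2025-06-11 09:30", "2025-06-11 15:55", freq="5min")
)
df = pd.DataFrame({"Datetime": idx, "Volatility": 1.0})

out = df.groupby(pd.Grouper(key="Datetime", freq="30min", origin="start"))["Volatility"].mean()

# The grouper fills the overnight gap (16:00 -> next 09:30) with empty
# bins, which show up as NaN rows in the aggregated result:
print(out.isna().sum(), "empty overnight bins out of", len(out))
```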

I don't want to keep bins that fall outside the 09:30 to 16:00 session.

Thank you.
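For reference, two common ways to keep only the in-session bins are dropping the all-NaN groups after aggregating, or masking the result by time of day. A minimal sketch, with hypothetical data and column names taken from the frames above:

```python
import pandas as pd

# Hypothetical 5-min data restricted to the trading session
idx = pd.date_range("2025-06-10 09:30", "2025-06-10 15:55", freq="5min").append(
    pd.date_range("2025-06-11 09:30", "2025-06-11 15:55", freq="5min")
)
df = pd.DataFrame({"Datetime": idx, "Volatility": 1.0})

grouped = df.groupby(pd.Grouper(key="Datetime", freq="30min", origin="start"))

# Option 1: drop the empty overnight bins after aggregating
avg = grouped["Volatility"].mean().dropna()

# Option 2: keep only bins whose label falls inside the session
avg2 = grouped["Volatility"].mean()
avg2 = avg2[(avg2.index.time >= pd.Timestamp("09:30").time())
            & (avg2.index.time < pd.Timestamp("16:00").time())]
```

With data confined to the session, both options yield the same result; the time-of-day mask additionally guards against partial bins if stray out-of-session rows ever appear.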

2 Comments

  • Why not just filter that out in your other_df, or simply drop the rows with null? Commented Jul 24 at 6:32
  • @chitown88, I thought about it, but dropping the rows would cause unnecessary computation, and I was trying to avoid it! Commented Jul 24 at 9:37

2 Answers


With a minimal example:

import pandas as pd

test = pd.DataFrame(
    {
        "Publish date": [
            pd.Timestamp("2000-01-02 09:00:00"),
            pd.Timestamp("2000-01-02 09:00:05"),
            pd.Timestamp("2000-01-02 15:55:00"),
            pd.Timestamp("2000-01-12 16:00:00"),
        ],
        "Open": [0, 1, 2, 3],
        "Close": [10, 20, 30, 40],
    }
)
grouped = test.groupby(pd.Grouper(key='Publish date', freq='30min', origin='end')).mean().reset_index()

So in your case you just need to adjust the origin, and that should lead to the desired output:

filtered = self.df.groupby(pd.Grouper(key='Datetime', freq='30min', origin='end')).mean().reset_index()
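A quick check with illustrative data (not the asker's) suggests that origin only shifts where the bin edges are anchored; the empty bins spanning the overnight gap remain as NaN rows with either setting:

```python
import pandas as pd

# Two sessions of hypothetical 5-min data, with an overnight gap between them
idx = pd.date_range("2000-01-02 09:30", "2000-01-02 15:55", freq="5min").append(
    pd.date_range("2000-01-03 09:30", "2000-01-03 15:55", freq="5min")
)
df = pd.DataFrame({"Datetime": idx, "x": 1.0})

results = {}
for origin in ("start", "end"):
    out = df.groupby(pd.Grouper(key="Datetime", freq="30min", origin=origin))["x"].mean()
    results[origin] = out
    # both runs still contain the empty overnight bins as NaN rows
    print(origin, out.isna().any())
```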

1 Comment

  • Ahh, thank you. I ran the code, but it still generates data outside the range (09:30 to 16:00). I thought origin just determines how the intervals are anchored, not necessarily restricting the data range.

pd.Grouper creates full-length bins, even when there’s no data in the tail bin (like 16:00–16:30). This is normal behavior for time-based grouping.

Try using the closed parameter.

filtered = self.df.groupby(
    pd.Grouper(key='Datetime', freq='30min', origin='start', closed='left')
).mean().reset_index()
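As an illustrative sketch (hypothetical timestamps, column names from the question), closed decides which edge a boundary timestamp belongs to; it changes how points are assigned to bins rather than which bins exist:

```python
import pandas as pd

s = pd.DataFrame({
    "Datetime": pd.to_datetime(["2025-06-10 10:00:00", "2025-06-10 10:15:00"]),
    "Volatility": [1.0, 2.0],
})

left = s.groupby(pd.Grouper(key="Datetime", freq="30min", closed="left"))["Volatility"].mean()
right = s.groupby(pd.Grouper(key="Datetime", freq="30min", closed="right"))["Volatility"].mean()

# closed='left': 10:00 falls in [10:00, 10:30), so both points share one bin
# closed='right': 10:00 falls in (09:30, 10:00], so it lands in an earlier bin
print(len(left), len(right))
```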

Or simply drop the nulls when you aggregate / take the mean:

self.other_df['Avg_Volatility'] = filtered['Volatility'].mean().dropna()

1 Comment

Thank you for the help! I'm a bit confused, though. I thought closed itself does not prevent Grouper from creating bins outside the range of your data; instead, it defines the boundary rules for assigning timestamps within the overall range of bins. How does it help to create an exclusive bin?
