4

I have the following dataframe:

import pandas as pd
import csv

lst = [['SPXW 250715C06310000', '7/14/2025', 2.74, 2.87, 2.60, 2.65, 14, '8:30:00'],
       ['SPXW 250715C06310000', '7/14/2025', 2.80, 2.80, 2.50, 2.53, 61, '8:31:00'],
       ['SPXW 250715C06310000', '7/14/2025', 2.45, 2.45, 2.45, 2.45, 2, '8:32:00'],
       ['SPXW 250715C06310000', '7/14/2025', 2.58, 2.80, 2.58, 2.60, 32, '8:33:00'],
       ['SPXW 250715C06310000', '7/14/2025', 2.50, 2.50, 2.25, 2.30, 5, '8:34:00'],
       ['SPXW 250709C06345000', '7/9/2025', 0.05, 0.05, 0.03, 0.03, 246, '8:30:00'],
       ['SPXW 250709C06345000', '7/9/2025', 0.05, 0.10, 0.03, 0.07, 452, '8:31:00'],
       ['SPXW 250709C06345000', '7/9/2025', 0.07, 0.10, 0.05, 0.07, 137, '8:32:00'],
       ['SPXW 250709C06345000', '7/9/2025', 0.07, 0.07, 0.07, 0.07, 5, '8:33:00'],
       ['SPXW 250709C06345000', '7/9/2025', 0.07, 0.07, 0.05, 0.05, 225, '8:34:00'],
       ['SPXW 250715C06310000', '7/11/2025', 7.30, 7.30, 7.30, 7.30, 2, '8:30:00'],
       ['SPXW 250715C06310000', '7/11/2025', 7.20, 7.20, 7.20, 7.20, 2, '8:31:00'],
       ['SPXW 250715C06310000', '7/11/2025', 6.92, 6.92, 6.92, 6.92, 20, '8:32:00'],
       ['SPXW 250715C06310000', '7/11/2025', 6.58, 6.58, 6.58, 6.58, 1, '8:34:00'],
       ['SPXW 250715C06310000', '7/11/2025', 6.41, 6.41, 6.41, 6.41, 2, '8:35:00']
      ]

df = pd.DataFrame(lst, columns = ['Symbol', 'Date', 'open', 'high', 'low', 'close', 'volume', 'hour'])

df['Date'] = pd.to_datetime(df['Date'])
df['hour'] = pd.to_datetime(df['hour'], format='%H:%M:%S')
df.index = pd.MultiIndex.from_arrays([df['Symbol'], df['Date']], names=['Symbol', 'Date'])
df.drop(columns=['Symbol', 'Date'], inplace=True)

When run, it yields this:

                                 open  high   low  close  volume                hour day_high day_low
Symbol               Date                                                                            
SPXW 250715C06310000 2025-07-14  2.74  2.87  2.60   2.65      14 1900-01-01 08:30:00                 
                     2025-07-14  2.80  2.80  2.50   2.53      61 1900-01-01 08:31:00                 
                     2025-07-14  2.45  2.45  2.45   2.45       2 1900-01-01 08:32:00                 
                     2025-07-14  2.58  2.80  2.58   2.60      32 1900-01-01 08:33:00                 
                     2025-07-14  2.50  2.50  2.25   2.30       5 1900-01-01 08:34:00                 
SPXW 250709C06345000 2025-07-09  0.05  0.05  0.03   0.03     246 1900-01-01 08:30:00                 
                     2025-07-09  0.05  0.10  0.03   0.07     452 1900-01-01 08:31:00                 
                     2025-07-09  0.07  0.10  0.05   0.07     137 1900-01-01 08:32:00                 
                     2025-07-09  0.07  0.07  0.07   0.07       5 1900-01-01 08:33:00                 
                     2025-07-09  0.07  0.07  0.05   0.05     225 1900-01-01 08:34:00                 
SPXW 250715C06310000 2025-07-11  7.30  7.30  7.30   7.30       2 1900-01-01 08:30:00                 
                     2025-07-11  7.20  7.20  7.20   7.20       2 1900-01-01 08:31:00                 
                     2025-07-11  6.92  6.92  6.92   6.92      20 1900-01-01 08:32:00                 
                     2025-07-11  6.58  6.58  6.58   6.58       1 1900-01-01 08:34:00                 
                     2025-07-11  6.41  6.41  6.41   6.41       2 1900-01-01 08:35:00  

This is a fragment of a dataframe that has 820,000 lines. These are options contracts trading over approximately two weeks, with 1 minute bars. I have set up a multiindex with Symbol and Date because it just makes sense to me to filter data that way.

What I want to do, is find max value for a contract on a given day, and append it to the column 'day_high' where the time in 'hour' column is equal to 08:30:00.

I've tried the following line of code:

df.loc[('SPXW 250715C06310000', '2025-07-14'), 'day_high'] = df.loc[('SPXW 250715C06310000', '2025-07-14')]['high'].max()

This line of code puts the max value in all rows. I've tried a few different modifiers to stick specifically in the row where hour = 8:30:00, but I've had no luck.

So... here are my questions:

  1. How can I assign the max, or whatever value from a given day to a different column where 'hour'=8:30:00?

  2. Same as question #2, but for all contracts in the dataframe. Will I need to set up a for loop or is there an idiomatic way of doing this with pandas?

TIA

1 Answer 1

4

You need assign by mask for 8:30 and MultiIndex:

#simplier set MutiIndex
df['Date'] = pd.to_datetime(df['Date'])
df['hour'] = pd.to_datetime(df['hour'], format='%H:%M:%S')

df = df.set_index(['Symbol', 'Date'])

mask = df['hour'].dt.strftime('%H:%M').eq('08:30') & 
       (df.index == ('SPXW 250715C06310000', pd.Timestamp('2025-07-14')))

df.loc[mask, 'day_high'] = df.loc[('SPXW 250715C06310000', '2025-07-14')]['high'].max()

print (df)

                                 open  high   low  close  volume  \
Symbol               Date                                          
SPXW 250715C06310000 2025-07-14  2.74  2.87  2.60   2.65      14   
                     2025-07-14  2.80  2.80  2.50   2.53      61   
                     2025-07-14  2.45  2.45  2.45   2.45       2   
                     2025-07-14  2.58  2.80  2.58   2.60      32   
                     2025-07-14  2.50  2.50  2.25   2.30       5   
SPXW 250709C06345000 2025-07-09  0.05  0.05  0.03   0.03     246   
                     2025-07-09  0.05  0.10  0.03   0.07     452   
                     2025-07-09  0.07  0.10  0.05   0.07     137   
                     2025-07-09  0.07  0.07  0.07   0.07       5   
                     2025-07-09  0.07  0.07  0.05   0.05     225   
SPXW 250715C06310000 2025-07-11  7.30  7.30  7.30   7.30       2   
                     2025-07-11  7.20  7.20  7.20   7.20       2   
                     2025-07-11  6.92  6.92  6.92   6.92      20   
                     2025-07-11  6.58  6.58  6.58   6.58       1   
                     2025-07-11  6.41  6.41  6.41   6.41       2   

                                               hour  day_high  
Symbol               Date                                      
SPXW 250715C06310000 2025-07-14 1900-01-01 08:30:00      2.87  
                     2025-07-14 1900-01-01 08:31:00       NaN  
                     2025-07-14 1900-01-01 08:32:00       NaN  
                     2025-07-14 1900-01-01 08:33:00       NaN  
                     2025-07-14 1900-01-01 08:34:00       NaN  
SPXW 250709C06345000 2025-07-09 1900-01-01 08:30:00       NaN  
                     2025-07-09 1900-01-01 08:31:00       NaN  
                     2025-07-09 1900-01-01 08:32:00       NaN  
                     2025-07-09 1900-01-01 08:33:00       NaN  
                     2025-07-09 1900-01-01 08:34:00       NaN  
SPXW 250715C06310000 2025-07-11 1900-01-01 08:30:00       NaN  
                     2025-07-11 1900-01-01 08:31:00       NaN  
                     2025-07-11 1900-01-01 08:32:00       NaN  
                     2025-07-11 1900-01-01 08:34:00       NaN  
                     2025-07-11 1900-01-01 08:35:00       NaN  

For second use GroupBy.max:

mask = df['hour'].dt.strftime('%H:%M').eq('08:30')
df.loc[mask, 'day_high'] = df.groupby(['Symbol','Date'])['high'].max()

Or GroupBy.transform with Series.where for set only rows with 8:30:

mask = df['hour'].dt.strftime('%H:%M').eq('08:30')
df['day_high'] = df.groupby(['Symbol','Date'])['high'].transform('max').where(mask)

print (df)

                                 open  high   low  close  volume  \
Symbol               Date                                          
SPXW 250715C06310000 2025-07-14  2.74  2.87  2.60   2.65      14   
                     2025-07-14  2.80  2.80  2.50   2.53      61   
                     2025-07-14  2.45  2.45  2.45   2.45       2   
                     2025-07-14  2.58  2.80  2.58   2.60      32   
                     2025-07-14  2.50  2.50  2.25   2.30       5   
SPXW 250709C06345000 2025-07-09  0.05  0.05  0.03   0.03     246   
                     2025-07-09  0.05  0.10  0.03   0.07     452   
                     2025-07-09  0.07  0.10  0.05   0.07     137   
                     2025-07-09  0.07  0.07  0.07   0.07       5   
                     2025-07-09  0.07  0.07  0.05   0.05     225   
SPXW 250715C06310000 2025-07-11  7.30  7.30  7.30   7.30       2   
                     2025-07-11  7.20  7.20  7.20   7.20       2   
                     2025-07-11  6.92  6.92  6.92   6.92      20   
                     2025-07-11  6.58  6.58  6.58   6.58       1   
                     2025-07-11  6.41  6.41  6.41   6.41       2   

                                               hour  day_high  
Symbol               Date                                      
SPXW 250715C06310000 2025-07-14 1900-01-01 08:30:00      2.87  
                     2025-07-14 1900-01-01 08:31:00       NaN  
                     2025-07-14 1900-01-01 08:32:00       NaN  
                     2025-07-14 1900-01-01 08:33:00       NaN  
                     2025-07-14 1900-01-01 08:34:00       NaN  
SPXW 250709C06345000 2025-07-09 1900-01-01 08:30:00      0.10  
                     2025-07-09 1900-01-01 08:31:00       NaN  
                     2025-07-09 1900-01-01 08:32:00       NaN  
                     2025-07-09 1900-01-01 08:33:00       NaN  
                     2025-07-09 1900-01-01 08:34:00       NaN  
SPXW 250715C06310000 2025-07-11 1900-01-01 08:30:00      7.30  
                     2025-07-11 1900-01-01 08:31:00       NaN  
                     2025-07-11 1900-01-01 08:32:00       NaN  
                     2025-07-11 1900-01-01 08:34:00       NaN  
                     2025-07-11 1900-01-01 08:35:00       NaN  
Sign up to request clarification or add additional context in comments.

3 Comments

I tried your answer to the first question: mask = df['hour'].dt.strftime('%H:%M').eq('08:30') & (df.index == ('SPXW 250715C06310000', pd.Timestamp('2025-07-14'))). It didn't give me any errors, but it didn't stick that max value either. I deleted pd.Timestamp, since the date was already a datetime object, and it worked.
Ugh, I just deleted the wrong comment. I wanted to thank you :) - I'm going to go study your code and try it out as soon as I can.
For answer to question 2, when I copy/pasted your first solution, it did not give me any errors, but it did not put the max values in the day_high columns either. When I copy pasted the second solution, it worked.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.