I am building a backtesting project in Python using Pandas.
I have:
A large tick / 1-minute level DataFrame (df) with full market data.
A 15-minute interval DataFrame (df_15) created from it using resampling.
In my strategy, I calculate entry points from the 15-minute candles, but I need to check the stop loss trigger from the underlying 1-minute data.
So during the backtest, I keep "going back" to the original DataFrame to find the exact stop loss hit.
This works, but is it efficient to repeatedly go back to the full df for each row of df_15 to check stop loss conditions?
Would it be better to pre-compute stop loss conditions when generating the 15-minute candles (e.g., by merging/joining df with df_15 beforehand)?
For large datasets (millions of rows), what is the most efficient pattern in Pandas:
- per-row lookups into the original df; or
- vectorized/pre-merged stop loss computations?
Here's a simplified version of my code:
import pandas as pd
df_15 = df.resample('15min').agg({
'Open': 'first',
'High': 'max',
'Low': 'min',
'Close': 'last'
})
results = []
> for idx, row in df_15.iterrows():
entry_price = row['Open']
stop_loss = entry_price - 10 # example fixed SL
# Go back to original df to check if SL was hit in this 15-min interval
interval_slice = df[(df.index >= idx) & (df.index < idx + pd.Timedelta('15min'))]
sl_hit = interval_slice[interval_slice['Low'] <= stop_loss]
if not sl_hit.empty:
results.append({"time": idx, "stop_loss_hit": True})
else:
results.append({"time": idx, "stop_loss_hit": False})
out_df = pd.DataFrame(results)
print(out_df)
What I tried:
The above iterative approach works fine for small data.
But when running on large datasets, performance slows down noticeably because of repeated slicing into the original df.
I want to know the best practice: keep the logic as-is, or restructure my data so stop loss checks are pre-computed / vectorized.
merge_asofat which point you can perform your checks on the primary dataframe, or you may be able to us apandas.DataFrame.rollingto just do it all at once. Either way, if you provide a reproducible example on a small subset of data it would be possible to help you solve the problem.if/else){"stop_loss_hit": not sl_hit.empty}