I have a dataframe with multiple columns, including analysis_date (datetime) and forecast_hour (int). I want to add a new column called total_hours, which is the sum of the hour component of analysis_date and the forecast_hour in the same row. Here's a visual example:
original dataframe:
analysis_date | forecast_hour
12-2-19-05 | 3
12-2-19-06 | 3
12-2-19-07 | 3
12-2-19-08 | 3
dataframe after calculation:
analysis_date | forecast_hour | total_hours
12-2-19-05 | 3 | 8
12-2-19-06 | 3 | 9
12-2-19-07 | 3 | 10
12-2-19-08 | 3 | 11
Here is the current logic that does what I want:
df['total_hours'] = df.apply(lambda row: row.analysis_date.hour + row.forecast_hour, axis=1)
Unfortunately, this is too slow for my application: it takes around 15 seconds for a dataframe with a few hundred thousand rows. I have tried the swifter library, but it took at least as long as my current implementation.
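One alternative that may be worth trying is a vectorized version that uses the .dt accessor instead of a row-wise apply. This is only a minimal sketch: it assumes analysis_date has a datetime64 dtype (not object), and the sample data is made up to match the example above, with the exact dates assumed.

```python
import pandas as pd

# Hypothetical sample data mirroring the example table; the dates are assumed.
df = pd.DataFrame({
    "analysis_date": pd.to_datetime([
        "2019-12-02 05:00", "2019-12-02 06:00",
        "2019-12-02 07:00", "2019-12-02 08:00",
    ]),
    "forecast_hour": [3, 3, 3, 3],
})

# .dt.hour extracts the hour for the whole column at once, so no Python
# function is called per row as it is with apply(..., axis=1).
df["total_hours"] = df["analysis_date"].dt.hour + df["forecast_hour"]
```

If analysis_date is stored as strings, it would need to be converted with pd.to_datetime first for the .dt accessor to work.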