I am seeking some help and/or perspectives on solving a problem.
I have a dataset (accessible here) with the following columns:
- DATE: this is the date in dd/mm/yyyy format
- HH: this is the "half-hour" window of the day. In a day there are 48 HH windows starting from 12 AM - 12:30 AM, 12:30 AM - 01:00 AM, and so on.
- WIND_SPEED_MS: this is the forecasted wind speed in m/s
- GENERATION_MW: this is the actual power generated by a windfarm in MW
- SEASONAL_NORMAL_WIND_SPEED_KNOTS: this is the long-term, historically averaged wind speed for a given time of day and day of year, in knots (1 knot = 1 nautical mile per hour)
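To make the DATE and HH columns concrete, here is a minimal Python sketch that turns one row into the start time of its half-hour window. It assumes HH is numbered 1 to 48, which the description above does not confirm; if the data uses 0 to 47, pass `first_hh=0`. The names `window_start` and `first_hh` are hypothetical, chosen only for this illustration.

```python
from datetime import datetime, timedelta

def window_start(date_str, hh, first_hh=1):
    """Return the start of the half-hour window as a datetime.

    date_str is in dd/mm/yyyy format, as in the DATE column.
    first_hh is an ASSUMPTION: the HH value used for the
    12:00 AM - 12:30 AM window (1 here; use 0 if HH runs 0..47).
    """
    day = datetime.strptime(date_str, "%d/%m/%Y")
    return day + timedelta(minutes=30 * (hh - first_hh))

# Illustrative values only, not taken from the dataset.
first = window_start("01/02/2020", 1)   # first window of 1 February 2020
last = window_start("01/02/2020", 48)   # last window of the same day
```

Under the 1-to-48 assumption, HH = 1 maps to 00:00 and HH = 48 maps to 23:30 on the same date.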
The goal: to correlate wind power generation (MW) with the forecasted wind speed (m/s), so that for a future wind speed forecast I can reliably predict the expected power output.
Optional side questions:
- What explains the power generation at very high wind speeds (when in reality most turbines should be stopped for safety reasons beyond 30-35 m/s)?
- What explains really high levels of wind generation, far above the "ceiling" that can be observed in the scatter plot?
- Why is the seasonal normal wind speed in knots, when converted to m/s, so low? It doesn't seem right.
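For reference, the knots to m/s conversion behind the last question can be sketched in Python as follows. One knot is one nautical mile (1852 m) per hour (3600 s), so about 0.514 m/s. The function names are hypothetical, and the 10-knot input is an illustrative value rather than a figure from the dataset.

```python
# 1 knot = 1 nautical mile per hour = 1852 m / 3600 s
KNOT_IN_MS = 1852.0 / 3600.0

def knots_to_ms(speed_knots):
    """Convert a speed in knots to metres per second."""
    return speed_knots * KNOT_IN_MS

def ms_to_knots(speed_ms):
    """Convert a speed in metres per second to knots."""
    return speed_ms / KNOT_IN_MS

example_ms = knots_to_ms(10.0)  # roughly 5.14 m/s
```

So a value in knots should come out at roughly half its size in m/s. Results much lower than that would point to something other than the unit conversion.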
Your help would be greatly appreciated!
My approach: average the generation data (MW) in wind speed bins of 0.5 m/s width. This gives me a smoother curve that I can interpolate using a piecewise polynomial. Then I can look up the expected power output for a given (future) forecast wind speed.
Any comments on this approach would also be appreciated.
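For concreteness, the binning approach above could be sketched in Python with NumPy roughly as follows. This is a simplified illustration, not a finished method. It uses piecewise linear interpolation via `np.interp` (a degree-1 piecewise polynomial) in place of a higher-order fit, and it assumes non-negative wind speeds with no missing values. The function names, the `min_count` parameter, and the sample numbers are all hypothetical.

```python
import numpy as np

def binned_power_curve(wind_speed, generation, bin_width=0.5, min_count=1):
    """Average generation (MW) within wind speed bins of bin_width m/s.

    Returns (bin centres, mean generation per bin) for bins holding at
    least min_count observations. ASSUMES wind_speed >= 0, no NaNs,
    and min_count >= 1.
    """
    wind_speed = np.asarray(wind_speed, dtype=float)
    generation = np.asarray(generation, dtype=float)
    # Bin k covers the half-open interval [k * bin_width, (k + 1) * bin_width)
    idx = np.floor(wind_speed / bin_width).astype(int)
    n_bins = idx.max() + 1
    counts = np.bincount(idx, minlength=n_bins)
    sums = np.bincount(idx, weights=generation, minlength=n_bins)
    keep = counts >= min_count  # drop empty or sparsely populated bins
    centres = (np.arange(n_bins)[keep] + 0.5) * bin_width
    means = sums[keep] / counts[keep]
    return centres, means

def predict_power(forecast_speed, centres, means):
    """Look up expected MW by linear interpolation between bin centres.

    np.interp holds the first/last bin mean constant outside the range.
    """
    return np.interp(forecast_speed, centres, means)

# Made-up illustrative observations, NOT values from the dataset.
speeds = [1.1, 1.3, 1.6, 1.9, 2.2, 2.4]       # m/s
output = [0.0, 2.0, 4.0, 6.0, 9.0, 11.0]      # MW
centres, means = binned_power_curve(speeds, output)
estimate = predict_power(2.0, centres, means)  # halfway between two bin centres
```

With these made-up numbers the bin centres are 1.25, 1.75 and 2.25 m/s with means of 1, 5 and 10 MW, so a forecast of 2.0 m/s gives 7.5 MW. Raising `min_count` is one way to keep thinly populated bins at the extremes from distorting the curve.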