Wind Power Data Analysis - Python

Question

I am looking for help and/or perspectives on solving a problem.

I have a dataset (accessible here) with the following columns:

DATE: this is the date in dd/mm/yyyy format
HH: this is the "half-hour" window of the day. There are 48 HH windows in a day: 12:00 AM - 12:30 AM, 12:30 AM - 1:00 AM, and so on.
WIND_SPEED_MS: this is the forecasted wind speed in m/s
GENERATION_MW: this is the actual power generated by a windfarm in MW
SEASONAL_NORMAL_WIND_SPEED_KNOTS: this is the long-term, historically averaged wind speed for a given time of day and day of year, in knots (1 knot = 1 nautical mile per hour)

The goal: to correlate wind power generation (MWh) with the forecasted wind speed (m/s), so that, given a future wind speed forecast, I can reliably predict the expected power output.
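Because the column is GENERATION_MW (power) while the goal is stated in MWh (energy), it may help to make the conversion explicit. This is a minimal sketch assuming each GENERATION_MW value is the average power over its half-hour window; the function names are hypothetical.

```python
# Assumption: each GENERATION_MW value is the average power over its
# half-hour (HH) window, so the energy in that window is power x 0.5 h.
HALF_HOUR_H = 0.5  # length of one HH window, in hours


def half_hour_energy_mwh(avg_power_mw: float) -> float:
    """Energy (MWh) delivered in one HH window at a given average power (MW)."""
    return avg_power_mw * HALF_HOUR_H


def daily_energy_mwh(avg_powers_mw) -> float:
    """Total energy (MWh) over a day's worth of HH windows."""
    return sum(half_hour_energy_mwh(p) for p in avg_powers_mw)


print(half_hour_energy_mwh(120.0))     # 60.0
print(daily_energy_mwh([100.0] * 48))  # 2400.0
```

Under this assumption, a model that predicts MW per half-hour can be turned into MWh by halving each prediction.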

Optional side questions:

What explains the power generation at very high wind speeds, when in reality most turbines should be stopped for safety reasons beyond 30-35 m/s?
What explains the very high levels of wind generation, far above the "ceiling" visible in the scatter plot?
Why is the seasonal normal wind speed so low once converted from knots to m/s? It doesn't seem right.

Your help would be greatly appreciated!

My approach: average the generation data (MW) in wind speed bins 0.5 m/s wide. This gives a smoother curve that I can interpolate using a piecewise polynomial. I can then look up the power output for a given (future) forecast wind speed.
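The binning-plus-interpolation approach could look roughly like this. It is a sketch, assuming the data is in a pandas DataFrame with the original column names. It uses SciPy's `PchipInterpolator` as one possible piecewise polynomial. The `min_count` threshold and the synthetic power curve (a hypothetical 100 MW farm with made-up cut-in and rated speeds) exist only to make the example runnable.

```python
import numpy as np
import pandas as pd
from scipy.interpolate import PchipInterpolator


def binned_power_curve(df, bin_width=0.5, min_count=5):
    """Mean GENERATION_MW per WIND_SPEED_MS bin; returns (bin centres, means)."""
    # Map each wind speed to the lower edge of its bin
    lower_edge = np.floor(df["WIND_SPEED_MS"] / bin_width) * bin_width
    stats = df.groupby(lower_edge)["GENERATION_MW"].agg(["mean", "count"])
    # Drop sparsely populated bins (min_count is an arbitrary choice)
    stats = stats[stats["count"] >= min_count]
    centres = stats.index.to_numpy(dtype=float) + bin_width / 2
    return centres, stats["mean"].to_numpy()


def fit_power_curve(centres, means):
    """Piecewise-cubic (PCHIP) curve through the bin means; NaN outside the data range."""
    return PchipInterpolator(centres, means, extrapolate=False)


# Made-up toy data: a hypothetical 100 MW farm, cut-in 3 m/s, rated 12 m/s
rng = np.random.default_rng(0)
speed = rng.uniform(0, 25, 5000)
gen = 100 * np.clip((speed - 3) / 9, 0, 1) ** 3 + rng.normal(0, 5, speed.size)
toy = pd.DataFrame({"WIND_SPEED_MS": speed, "GENERATION_MW": gen})

centres, means = binned_power_curve(toy)
curve = fit_power_curve(centres, means)
print(curve([2.0, 8.0, 15.0]))  # expected MW at three forecast speeds
```

PCHIP is shape-preserving, so it does not overshoot between neighbouring bin means the way an ordinary cubic spline can. With `extrapolate=False`, a forecast outside the observed speed range returns NaN instead of an extrapolated guess.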

Any comments on this approach would also be appreciated.

Mireia Gómez · Accepted Answer · 2025-04-16 13:17:56Z

You want to predict power output (in MW or MWh) based on forecasted wind speed (in m/s). Here’s a structured approach:

Preprocessing

Convert DATE and HH into a timestamp for easier handling.
Convert SEASONAL_NORMAL_WIND_SPEED_KNOTS into m/s: 1 knot ≈ 0.51444 m/s
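Both preprocessing steps can be sketched in pandas. The mapping from HH to a clock time is an assumption: it treats HH as numbered 1 to 48, with HH = 1 starting at 00:00. If the dataset numbers its windows differently, the offset needs adjusting. The sample rows are made up.

```python
import pandas as pd

KNOT_TO_MS = 1852 / 3600  # 1 knot = 1 nautical mile (1852 m) per hour ≈ 0.51444 m/s


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Add a TIMESTAMP column and a seasonal-normal wind speed in m/s."""
    out = df.copy()
    date = pd.to_datetime(out["DATE"], format="%d/%m/%Y")  # dd/mm/yyyy
    # Assumption: HH is numbered 1..48, with HH=1 starting at 00:00
    out["TIMESTAMP"] = date + pd.to_timedelta((out["HH"] - 1) * 30, unit="min")
    out["SEASONAL_NORMAL_WIND_SPEED_MS"] = (
        out["SEASONAL_NORMAL_WIND_SPEED_KNOTS"] * KNOT_TO_MS
    )
    return out


# Made-up rows, only to show the transformation
sample = pd.DataFrame({
    "DATE": ["01/03/2024", "01/03/2024"],
    "HH": [1, 48],
    "SEASONAL_NORMAL_WIND_SPEED_KNOTS": [10.0, 12.0],
})
print(preprocess(sample)[["TIMESTAMP", "SEASONAL_NORMAL_WIND_SPEED_MS"]])
```

Since 1 knot is only about half a metre per second, values in m/s come out at roughly half their knot figures. For example, 10 knots is about 5.14 m/s.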

Visual Analysis

Plot GENERATION_MW vs. WIND_SPEED_MS in an interactive scatter plot, using Plotly. Expect a nonlinear "power curve" shape:
- Low speeds → low generation
- Moderate speeds → sharp increase
- High speeds → plateau or drop

Modeling

Use a regression model to learn the wind power curve. Try these baseline options: Polynomial Regression, Random Forest, Gradient Boosting (e.g., XGBoost), Support Vector Regression (SVR).
Analyze the errors (RMSE and MAE, broken down by time of day or season).
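This sketch shows the modelling and error-analysis steps together. It uses scikit-learn's `GradientBoostingRegressor` as a stand-in for the gradient boosting option, since XGBoost would need its own package. The other regressors listed can be swapped in with the same `fit`/`predict` calls. It assumes a TIMESTAMP column built from DATE and HH, as in the Preprocessing step. The split is chronological rather than random: neighbouring half-hours are strongly correlated, so a shuffled split would likely overstate accuracy. The toy data is made up.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error


def fit_and_evaluate(df: pd.DataFrame, test_fraction: float = 0.2):
    """Fit generation ~ wind speed on earlier data; score on later data."""
    # Chronological split, so the model is tested on data from after training
    df = df.sort_values("TIMESTAMP")
    n_train = int(len(df) * (1 - test_fraction))
    train, test = df.iloc[:n_train], df.iloc[n_train:]

    features = ["WIND_SPEED_MS"]
    model = GradientBoostingRegressor(random_state=0)
    model.fit(train[features], train["GENERATION_MW"])

    pred = model.predict(test[features])
    err = pd.Series(pred - test["GENERATION_MW"].to_numpy(), index=test.index)
    overall = {
        "RMSE": float(np.sqrt(mean_squared_error(test["GENERATION_MW"], pred))),
        "MAE": float(mean_absolute_error(test["GENERATION_MW"], pred)),
    }
    # Error broken down by half-hour of day; group by TIMESTAMP.dt.month for seasons
    by_hh = err.groupby(test["HH"]).agg(
        RMSE=lambda e: float(np.sqrt(np.mean(e ** 2))),
        MAE=lambda e: float(np.mean(np.abs(e))),
    )
    return model, overall, by_hh


# Made-up toy data: 60 days of half-hourly rows for a hypothetical 100 MW farm
rng = np.random.default_rng(1)
n_days = 60
ts = pd.date_range("2024-01-01", periods=n_days * 48, freq="30min")
speed = rng.gamma(shape=2.0, scale=4.0, size=ts.size)
gen = 100 * np.clip((speed - 3) / 9, 0, 1) ** 3 + rng.normal(0, 5, ts.size)
toy = pd.DataFrame({
    "TIMESTAMP": ts,
    "HH": np.tile(np.arange(1, 49), n_days),
    "WIND_SPEED_MS": speed,
    "GENERATION_MW": gen,
})

model, overall, by_hh = fit_and_evaluate(toy)
print(overall)
print(by_hh.head())
```

A per-HH table with noticeably higher errors in some windows would suggest adding time-of-day (or month) as an extra feature.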

You might also want to bin wind speeds and look at average generation per bin to get a smoothed empirical curve before fitting.
