What data is needed for renewable generation forecasting?

Kumo connects directly to your existing relational tables: GENERATORS, WEATHER_FORECASTS, HISTORICAL_GENERATION, GRID_DEMAND. No ETL or feature engineering required. Write a PQL query and get explainable predictions in minutes.

Use Case #4 · Regression · Generation Forecasting

Renewable Generation Forecasting

“What will solar/wind generation be tomorrow?”

Book a demo and get a free trial of the full platform: research agent, fine-tune capabilities, and forward-deployed engineer support.


A real-world example

What will solar/wind generation be tomorrow?

Renewable intermittency costs grid operators $5-15B annually in curtailment, balancing, and reserve capacity. Solar and wind forecasting errors of 15-25% force operators to maintain expensive spinning reserves. As renewable penetration increases, forecast accuracy becomes critical for grid stability and cost control. For a grid with 5 GW of renewable capacity, a 5% improvement in day-ahead forecasting saves $40-60M annually in reduced curtailment and reserve requirements.

Quick answer

Renewable generation forecasting AI predicts hourly solar and wind output per site for the next 24-48 hours by modeling spatial weather propagation, wake effects between turbines, and grid demand context. Traditional weather-to-power models forecast each site independently, missing how cloud fronts move across solar farm clusters and how wind wake effects reduce downstream turbine output. Graph-based models improve day-ahead accuracy by 5%+, saving grids with 5 GW of renewable capacity $40-60M annually in reduced curtailment and reserve requirements.

Approaches compared

4 ways to solve this problem

1. Numerical Weather Prediction (NWP) + Power Curves

Use weather forecast outputs (irradiance, wind speed) with manufacturer power curves to estimate generation. The standard approach used by most grid operators.

Best for

Initial renewable integration where you have weather forecasts but limited historical generation data.

Watch out for

Power curves assume ideal conditions. Real-world generation depends on panel soiling, inverter efficiency, wake effects, and grid curtailment decisions. NWP resolution (typically 1-3 km) misses micro-climate effects that matter for individual sites. Errors of 15-25% are common for day-ahead forecasts.
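As a rough illustration of the power-curve step, the sketch below maps a forecast wind speed to farm output with a generic piecewise curve. The cut-in, rated, and cut-out speeds are illustrative assumptions, not manufacturer values and not anything Kumo publishes. The example input uses GEN02 from the sample tables further down this page.

```python
def wind_power_mw(wind_speed_ms: float, capacity_mw: float,
                  cut_in: float = 3.0, rated: float = 12.0,
                  cut_out: float = 25.0) -> float:
    """Idealized power curve (all thresholds are assumed values).

    Zero below cut-in and at or above cut-out, a cubic ramp between
    cut-in and rated speed, and flat at nameplate capacity above rated.
    """
    if wind_speed_ms < cut_in or wind_speed_ms >= cut_out:
        return 0.0
    if wind_speed_ms >= rated:
        return capacity_mw
    fraction = (wind_speed_ms**3 - cut_in**3) / (rated**3 - cut_in**3)
    return capacity_mw * fraction


MPH_TO_MS = 0.44704

# GEN02 in the sample data: 180 MW wind farm, 22 mph forecast wind speed.
estimate = wind_power_mw(22 * MPH_TO_MS, capacity_mw=180)
print(round(estimate, 1))
```

A curve like this knows nothing about soiling, wake losses, or curtailment, which is why the approach tends to overstate output in real operating conditions.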

2. Site-Level ML (Per-Site Regression/LSTM)

Train separate ML models per generation site using historical generation, weather data, and site-specific features. Learns the site's actual power curve rather than relying on manufacturer curves.

Best for

Well-established sites with 2+ years of generation history and consistent operating conditions.

Watch out for

Treats each site independently. Cannot capture how a cloud front moving east-to-west will hit Solar Farm A at 10 AM and Solar Farm B at 11 AM. Wind farm models miss wake effects between turbines and neighboring farms. Also cannot incorporate grid demand context that affects curtailment decisions.

3. Ensemble Models (Blending Multiple Forecasts)

Combine multiple forecast sources (NWP, satellite imagery, ML models) into a blended forecast using weighted averaging or stacking. Reduces individual model errors.

Best for

Operational forecasting where you have access to multiple forecast streams and want to minimize worst-case errors.

Watch out for

Ensembles reduce average errors but cannot capture spatial correlations or grid context. If all constituent models miss the same cloud front, the ensemble will too. The improvement is statistical (reducing noise) rather than structural (capturing new information).
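One common way to blend forecasts is to weight each model by the inverse of its recent mean squared error. The sketch below is a minimal version of that idea with made-up numbers; the function names are hypothetical. It also shows the structural limit described above: a weighted average always lies between the lowest and highest input forecast, so it cannot recover an event that every input missed.

```python
import numpy as np


def inverse_mse_weights(past_forecasts: np.ndarray, actuals: np.ndarray) -> np.ndarray:
    """Weights proportional to 1/MSE per model; rows are models, columns are hours."""
    mse = np.mean((past_forecasts - actuals) ** 2, axis=1)
    inverse = 1.0 / np.maximum(mse, 1e-12)  # guard against division by zero
    return inverse / inverse.sum()


def blend(forecasts: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Weighted average across models for each hour."""
    return weights @ forecasts


# Illustrative history: model A runs 10 MW high, model B runs 20 MW low.
actuals = np.array([100.0, 120.0, 90.0])
history = np.vstack([actuals + 10.0, actuals - 20.0])
weights = inverse_mse_weights(history, actuals)

# New forecasts for one hour from the same two models.
new_forecasts = np.array([[110.0], [80.0]])
blended = blend(new_forecasts, weights)
print(weights, blended)
```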

4. Graph Neural Networks (Kumo's Approach)

Connect generators, weather forecasts, historical generation, and grid demand into a renewable energy graph. GNNs learn generation patterns from spatial weather propagation, inter-site correlations, and grid demand context.

Best for

Grid regions with multiple renewable sites where spatial weather effects, wake interactions, and grid context drive generation outcomes.

Watch out for

Requires multi-site generation data and spatially resolved weather forecasts. Less value-add for isolated single-site installations without neighboring generation or complex grid interactions.
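To make the graph idea concrete, here is a toy message-passing step in NumPy. It is a teaching sketch only and not Kumo's architecture: the site coordinates, the 50 km radius, and the feature values are all assumptions. Each site's features are replaced by the mean over itself and its nearby neighbors, which is the basic mechanism that lets information about weather at one site reach another.

```python
import numpy as np

# ASSUMPTION: hypothetical site coordinates in km on a local grid.
coords = np.array([[0.0, 0.0], [30.0, 5.0], [200.0, 10.0]])
# Per-site features: [cloud fraction, wind speed in mph], illustrative values.
features = np.array([[0.60, 12.0], [0.20, 10.0], [0.10, 8.0]])


def normalized_adjacency(coords: np.ndarray, radius_km: float) -> np.ndarray:
    """Connect sites within radius_km (self-loops included) and row-normalize."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    adj = (dist <= radius_km).astype(float)
    return adj / adj.sum(axis=1, keepdims=True)


def message_pass(adj_norm: np.ndarray, h: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """One layer: mean over neighbors, linear map, then ReLU."""
    return np.maximum(adj_norm @ h @ weight, 0.0)


adj_norm = normalized_adjacency(coords, radius_km=50.0)
weight = np.eye(2)  # identity, so the output is simply the neighborhood mean
h1 = message_pass(adj_norm, features, weight)
print(h1)
```

In this toy setup the first two sites end up sharing information, while the third site, 200 km away, keeps its own features. That mirrors the caveat above: an isolated site gains little from a graph.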

Key metric: A 5% improvement in day-ahead renewable forecasting saves $40-60M annually for a 5 GW grid. The improvement comes from spatial weather propagation, wake effects, and grid demand context that site-level models miss entirely.

Why relational data changes the answer

Renewable generation is inherently spatial. A cloud front moving across California does not hit all solar farms simultaneously. It arrives at coastal GEN03 at 10 AM, reduces generation by about 27%, and reaches inland farms 2-3 hours later. Wind farms have wake effects: upstream turbines reduce wind speed for downstream turbines, and the effect varies with wind direction and atmospheric stability. Grid context also matters: when grid demand is low on a sunny afternoon, operators curtail solar generation even though the sun is shining. All of these effects are relational, connecting generation sites to each other, to weather, and to grid conditions.
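A simple way to see the propagation delay in data is a lagged cross-correlation between two sites. The sketch below uses synthetic series (not real generation data) in which the inland site follows the coastal site by two hours, and recovers that lag. The function name and the six-hour search window are assumptions for illustration.

```python
import numpy as np


def best_lag_hours(upstream: np.ndarray, downstream: np.ndarray, max_lag: int = 6) -> int:
    """Return the lag (in hourly steps) that maximizes correlation between
    the upstream series and the downstream series shifted back by that lag."""
    best_lag, best_corr = 0, -np.inf
    for lag in range(max_lag + 1):
        a = upstream[: len(upstream) - lag]
        b = downstream[lag:]
        corr = np.corrcoef(a, b)[0, 1]
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag


# Synthetic hourly output: the inland site repeats the coastal pattern
# two hours later, plus a little noise.
rng = np.random.default_rng(0)
coastal = rng.random(200)
inland = np.roll(coastal, 2) + 0.05 * rng.random(200)

lag = best_lag_hours(coastal, inland)
print(lag)
```

A per-site model never sees the other site's series, so it has no way to use this lag. A model that connects the sites can.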

Site-level models forecast each generator independently. They see GEN03's cloud cover forecast at 60% and predict generation accordingly, but cannot anticipate that the same cloud system will reduce GEN01's output 2 hours later. Graph-based models connect generators through their spatial weather relationships and grid topology. SAP's SALT benchmark shows graph models at 91% accuracy vs 63% for gradient-boosted trees on relational prediction. RelBench shows 76.71 vs 62.44 for GNNs vs tree-based approaches. In renewable forecasting, a 5% accuracy improvement translates to $40-60M annually for a 5 GW grid, through reduced curtailment (dispatching storage instead of curtailing generation) and lower spinning reserve requirements.

Forecasting renewable generation site-by-site is like predicting rainfall at individual weather stations without tracking the storm system moving across the region. You would get each station's forecast roughly right on average but completely miss the timing and sequence. Graph-based forecasting tracks the entire weather system as it propagates across the generator network, predicting not just how much each site will generate but when, and how the pattern will unfold across the grid.

How KumoRFM solves this

Graph-powered intelligence for energy and utilities

Kumo connects generators, weather forecasts, historical generation, and grid demand into a renewable energy graph. The GNN learns generation patterns that depend on spatial weather propagation (cloud fronts moving across solar farms), wake effects between wind turbines, and how grid demand context affects curtailment decisions. PQL predicts hourly generation per site for the next 24-48 hours, enabling optimized dispatch and storage decisions.

From data to predictions

See the full pipeline in action

Connect your tables, write a PQL query, and get predictions with built-in explainability — all in minutes, not months.

Your data

The relational tables Kumo learns from

GENERATORS

generator_id	type	capacity_mw	location	age_years
GEN01	Solar Farm	250	Arizona	4
GEN02	Wind Farm	180	Texas	6
GEN03	Solar Farm	120	California	2

WEATHER_FORECASTS

location	date	hour	cloud_pct	wind_speed_mph	temp_f
Arizona	2025-03-06	12:00	10%	8	95
Texas	2025-03-06	14:00	25%	22	72
California	2025-03-06	13:00	60%	12	68

HISTORICAL_GENERATION

generator_id	date	total_mwh	capacity_factor	curtailed_mwh
GEN01	2025-03-05	1,450	72.5%	0
GEN02	2025-03-05	980	54.4%	45
GEN03	2025-03-05	520	43.3%	0

GRID_DEMAND

region	date	peak_demand_mw	renewable_pct	storage_available_mwh
Southwest	2025-03-06	18,500	32%	2,400
South Central	2025-03-06	25,200	28%	1,800
Pacific	2025-03-06	22,000	38%	3,200
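The four tables link through shared keys: generator_id ties GENERATORS to HISTORICAL_GENERATION, and location ties GENERATORS to WEATHER_FORECASTS. The sample data does not show how a location maps to a GRID_DEMAND region, so the mapping below is an assumption. This pandas sketch is only meant to make the links visible; it is not a step you would need to run before using Kumo.

```python
import pandas as pd

generators = pd.DataFrame({
    "generator_id": ["GEN01", "GEN02", "GEN03"],
    "type": ["Solar Farm", "Wind Farm", "Solar Farm"],
    "capacity_mw": [250, 180, 120],
    "location": ["Arizona", "Texas", "California"],
})
weather = pd.DataFrame({
    "location": ["Arizona", "Texas", "California"],
    "date": ["2025-03-06"] * 3,
    "cloud_pct": [10, 25, 60],
    "wind_speed_mph": [8, 22, 12],
})
history = pd.DataFrame({
    "generator_id": ["GEN01", "GEN02", "GEN03"],
    "date": ["2025-03-05"] * 3,
    "total_mwh": [1450, 980, 520],
    "curtailed_mwh": [0, 45, 0],
})
demand = pd.DataFrame({
    "region": ["Southwest", "South Central", "Pacific"],
    "date": ["2025-03-06"] * 3,
    "peak_demand_mw": [18500, 25200, 22000],
})

# ASSUMPTION: the location-to-region mapping is not given in the sample tables.
location_to_region = {
    "Arizona": "Southwest",
    "Texas": "South Central",
    "California": "Pacific",
}

# Follow the key relationships; date columns are renamed so they do not collide.
linked = (
    generators.assign(region=generators["location"].map(location_to_region))
    .merge(history.rename(columns={"date": "history_date"}), on="generator_id")
    .merge(weather.rename(columns={"date": "forecast_date"}), on="location")
    .merge(demand.rename(columns={"date": "demand_date"}), on="region")
)
print(linked[["generator_id", "region", "total_mwh", "cloud_pct", "peak_demand_mw"]])
```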

Write your PQL query

Describe what to predict in 2–3 lines — Kumo handles the rest

PQL

PREDICT SUM(HISTORICAL_GENERATION.total_mwh, 0, 24, hours)
FOR EACH GENERATORS.generator_id
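One way to read this query: for each generator, the target is total_mwh summed over the 24 hours after the prediction time. The pandas sketch below computes that kind of label from hypothetical hourly readings. The table layout, the column names, and the choice of an exclusive start and inclusive end for the window are assumptions made for illustration, so check the PQL documentation for the exact window semantics.

```python
import pandas as pd


def window_sum_label(readings: pd.DataFrame, anchor: pd.Timestamp, hours: int = 24) -> pd.Series:
    """Sum mwh per generator over the window (anchor, anchor + hours]."""
    end = anchor + pd.Timedelta(hours=hours)
    in_window = (readings["timestamp"] > anchor) & (readings["timestamp"] <= end)
    return readings.loc[in_window].groupby("generator_id")["mwh"].sum()


# Hypothetical readings: 48 hourly values of 10 MWh for a single generator,
# starting at 2025-03-05 01:00.
start = pd.Timestamp("2025-03-05 01:00")
timestamps = start + pd.to_timedelta(list(range(48)), unit="h")
readings = pd.DataFrame({
    "generator_id": "GEN01",
    "timestamp": timestamps,
    "mwh": 10.0,
})

labels = window_sum_label(readings, anchor=pd.Timestamp("2025-03-06 00:00"))
print(labels)
```

With 24 readings of 10 MWh inside the window, the label for GEN01 is 240 MWh. The model's job is to predict that number before the window starts.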

Prediction output

Every entity gets a score, updated continuously

GENERATOR_ID	TYPE	PREDICTED_MWH	CAPACITY_FACTOR	VS_YESTERDAY
GEN01	Solar	1,520	76.0%	+4.8%
GEN02	Wind	1,120	62.2%	+14.3%
GEN03	Solar	380	31.7%	-26.9%
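The VS_YESTERDAY column can be reproduced from the sample tables as the percentage change from yesterday's total_mwh. The same arithmetic also shows what the capacity factor column seems to assume: the figures work out to nameplate capacity times roughly 8 to 10 generating hours, rather than the conventional 24-hour denominator. That reading is inferred from the numbers and is not stated anywhere on this page.

```python
capacity_mw = {"GEN01": 250, "GEN02": 180, "GEN03": 120}
yesterday_mwh = {"GEN01": 1450, "GEN02": 980, "GEN03": 520}
predicted_mwh = {"GEN01": 1520, "GEN02": 1120, "GEN03": 380}
reported_cf = {"GEN01": 0.760, "GEN02": 0.622, "GEN03": 0.317}


def pct_change(new: float, old: float) -> float:
    """Percentage change from old to new."""
    return 100.0 * (new - old) / old


vs_yesterday = {
    gen: round(pct_change(predicted_mwh[gen], yesterday_mwh[gen]), 1)
    for gen in predicted_mwh
}

# Hours of full-capacity output implied by each reported capacity factor.
implied_hours = {
    gen: predicted_mwh[gen] / (reported_cf[gen] * capacity_mw[gen])
    for gen in predicted_mwh
}

print(vs_yesterday)
print({gen: round(hours, 1) for gen, hours in implied_hours.items()})
```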

Understand why

Every prediction includes feature attributions — no black boxes

Generator GEN03 -- 120 MW Solar Farm in California

Predicted: 380 MWh (31.7% capacity factor, -26.9% vs yesterday)

Top contributing features

Feature	Value	Attribution
Cloud cover forecast	60% (vs 25% yesterday)	35%
Cloud front propagation from coast	Arriving 10 AM	24%
Temperature impact on panel efficiency	68°F (optimal)	17%
Historical pattern for overcast days	30-35% capacity factor	14%
Grid curtailment likelihood (low demand)	Possible PM	10%

Feature attributions are computed automatically for every prediction. No separate tooling required.

PQL Documentation

Learn the Predictive Query Language — SQL-like syntax for defining any prediction task in 2–3 lines.


Python SDK

Integrate Kumo predictions into your pipelines. Train, evaluate, and deploy models programmatically.


Explainability Docs

Understand feature attributions, model evaluation metrics, and how to build trust with stakeholders.


Frequently asked questions

Common questions about renewable generation forecasting

How accurate is AI at forecasting solar and wind generation?

Graph-based models achieve 8-12% MAPE for day-ahead solar forecasting and 12-18% MAPE for wind (wind is inherently more variable). This compares to 15-25% MAPE for NWP-based power curve methods and 10-15% for site-level ML. The improvement is largest for partially cloudy days (where cloud front tracking matters most) and for wind farms with significant wake effects. Clear-sky solar forecasting is already highly accurate and benefits less.
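For reference, MAPE is the mean of absolute errors divided by actual values, expressed as a percentage. The sketch below is a minimal implementation with illustrative numbers. It skips hours where actual generation is zero (night hours for solar), because the ratio is undefined there. That masking is an assumption about how the metric is applied; this page does not say how zero-generation hours are treated.

```python
import numpy as np


def mape(actual, forecast, min_actual: float = 1e-9) -> float:
    """Mean absolute percentage error over hours with non-zero actuals."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    mask = np.abs(actual) > min_actual  # drop hours with (near) zero generation
    errors = np.abs(actual[mask] - forecast[mask]) / np.abs(actual[mask])
    return 100.0 * float(np.mean(errors))


# Illustrative hourly values in MWh; the first hour is a night hour for solar.
actual = [0.0, 50.0, 100.0, 80.0]
forecast = [0.0, 55.0, 90.0, 88.0]
score = mape(actual, forecast)
print(round(score, 1))
```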

What data do you need for renewable generation forecasting?

At minimum: 1-2 years of historical generation data per site at hourly or sub-hourly resolution, and spatially resolved weather forecasts (irradiance, wind speed, temperature, cloud cover). High-value additions include satellite cloud imagery (for nowcasting 0-6 hours ahead), grid demand data (for curtailment prediction), and SCADA data from individual turbines (for wake effect modeling). Most grid operators have this data, but it is often spread across multiple systems.

How does renewable forecasting help with battery storage decisions?

Accurate generation forecasting determines when to charge and discharge grid-scale batteries. If the model predicts a solar dip at 2 PM due to an incoming cloud front, operators can dispatch batteries to fill the gap rather than starting a gas peaker. The value of better forecasting is proportional to storage capacity: a 500 MWh battery that is optimally dispatched based on accurate forecasts generates 15-25% more value than one dispatched on conservative or inaccurate forecasts.
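The dispatch logic can be sketched as a greedy rule: whenever the forecast falls below a target output, discharge the battery to cover the shortfall, limited by its power rating and remaining energy. The function, the target level, and all numbers below are hypothetical, and the sketch ignores charging, efficiency losses, and prices that a real dispatch optimizer would include.

```python
def dispatch_battery(forecast_mw, target_mw, energy_mwh, max_power_mw, step_hours=1.0):
    """Greedy discharge plan that fills forecast shortfalls against a target.

    Returns the discharge power per step (MW) and the energy left (MWh).
    """
    plan = []
    remaining = energy_mwh
    for forecast in forecast_mw:
        shortfall = max(target_mw - forecast, 0.0)
        # Limited by the shortfall, the power rating, and the energy left.
        power = min(shortfall, max_power_mw, remaining / step_hours)
        remaining -= power * step_hours
        plan.append(power)
    return plan, remaining


# Hypothetical hourly solar forecast with a dip in hours 2 and 3.
forecast = [100, 100, 60, 40, 100]
plan, remaining = dispatch_battery(
    forecast, target_mw=100, energy_mwh=80, max_power_mw=50
)
print(plan, remaining)
```

Here the battery covers the full 40 MW gap in hour 2 but runs out of energy in hour 3, where it can only supply 40 MW of the 60 MW shortfall. A forecast that gets the timing or depth of the dip wrong changes how that limited energy should be spent.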

Can forecasting AI reduce renewable curtailment?

Yes, and this is a major value driver. Curtailment occurs when generation exceeds grid capacity or ramp rates exceed what the grid can absorb. Better forecasting enables operators to pre-position storage, adjust conventional generation, or activate demand response before the renewable ramp arrives. Grids with high renewable penetration (>30%) curtail 3-8% of potential generation. Graph-based forecasting can reduce curtailment by 30-50%, recovering generation worth $10-25M annually for a 5 GW system.
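To put the percentages in energy terms, the sketch below estimates the generation recovered per year for a 5 GW fleet using the 3-8% curtailment and 30-50% reduction ranges quoted above. The 30% fleet capacity factor is an assumption chosen for illustration and does not come from this page, so treat the result as an order-of-magnitude check rather than a forecast.

```python
HOURS_PER_YEAR = 8760


def recovered_mwh(capacity_mw: float, capacity_factor: float,
                  curtailed_share: float, reduction_share: float) -> float:
    """Annual energy recovered when a share of curtailment is avoided."""
    potential_mwh = capacity_mw * HOURS_PER_YEAR * capacity_factor
    return potential_mwh * curtailed_share * reduction_share


ASSUMED_CAPACITY_FACTOR = 0.30  # ASSUMPTION: not stated in the source

# Low end: 3% curtailed, 30% of that avoided. High end: 8% curtailed, 50% avoided.
low = recovered_mwh(5000, ASSUMED_CAPACITY_FACTOR, 0.03, 0.30)
high = recovered_mwh(5000, ASSUMED_CAPACITY_FACTOR, 0.08, 0.50)
print(f"{low:,.0f} to {high:,.0f} MWh per year")
```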

How does forecasting accuracy change with forecast horizon?

Accuracy degrades with horizon, but the rate depends on weather stability. For solar: 0-6 hour forecasts use satellite imagery and achieve 5-8% MAPE. Day-ahead (12-36 hours) uses NWP and achieves 8-12% MAPE with graph models. Week-ahead is 15-20% MAPE and is primarily useful for maintenance scheduling, not dispatch. For wind: add 4-6 percentage points of MAPE at each horizon. The most commercially valuable horizon is day-ahead, where dispatch and market participation decisions are made.

Bottom line: A grid with 5 GW of renewable capacity saves $40-60M annually by improving day-ahead generation forecasting by 5%. Kumo's renewable graph captures spatial weather propagation, wake effects, and grid demand context that site-level weather models miss.



One Platform. One Model. Infinite Predictions.

KumoRFM

Relational Foundation Model

Turn structured relational data into predictions in seconds. KumoRFM delivers zero-shot predictions that rival months of traditional data science. No training, feature engineering, or infrastructure required. Just connect your data and start predicting.

For critical use cases, fine-tune KumoRFM on your data using the Kumo platform and Research Agent for 30%+ higher accuracy than traditional models.
