Stochastic Modeling

Hydrothermal dispatch is inherently uncertain. Reservoir inflows depend on rainfall and snowmelt that cannot be known in advance, and electrical load varies in ways that are predictable in aggregate but noisy at any given moment. A dispatch policy that ignores uncertainty will systematically under-prepare for dry periods and over-commit thermal capacity in wet years.

Cobre addresses this by treating inflows and loads as stochastic processes. During training, the solver samples many scenario trajectories and builds a policy that performs well across the distribution of possible futures — not just for a single forecast. The stochastic layer is responsible for generating those scenario trajectories in a statistically sound, reproducible way.

The stochastic models are driven by historical statistics provided by the user in the scenarios/ directory of the case. If no scenarios/ directory is present, Cobre falls back to white-noise generation using only the stage definitions in stages.json. For any study with real hydro plants, providing historical inflow statistics is strongly recommended.


The scenarios/ Directory

The scenarios/ directory sits alongside the other input files in the case directory:

my_study/
  config.json
  stages.json
  ...
  scenarios/
    inflow_seasonal_stats.parquet
    load_seasonal_stats.parquet
    inflow_ar_coefficients.parquet    (only when ar_order > 0)

The directory is optional. When it is absent, Cobre generates independent standard-normal noise at each stage for each hydro plant and scales it by a default standard deviation — effectively treating all uncertainty as white noise. This is sufficient for verifying a case loads correctly, but is not representative of real inflow dynamics.
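
The fallback can be sketched as follows. This is a simplified illustration, not Cobre's internal code, and the mean/std values are placeholder assumptions rather than Cobre's actual defaults:

```python
import numpy as np

def white_noise_inflows(n_stages, n_hydros, mean=0.0, std=1.0, seed=42):
    """Draw independent N(mean, std**2) noise for each (stage, hydro) cell.

    There is no memory between stages: every cell is an i.i.d. draw.
    """
    rng = np.random.default_rng(seed)
    return rng.normal(loc=mean, scale=std, size=(n_stages, n_hydros))

noise = white_noise_inflows(n_stages=4, n_hydros=1)
print(noise.shape)  # (4, 1)
```

Because the generator is seeded, repeated calls with the same arguments reproduce the same draws.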

When scenarios/ is present, Cobre reads the Parquet files and fits a Periodic Autoregressive (PAR(p)) model for each hydro plant and each bus. The fitted model generates correlated, seasonally-varying inflow and load trajectories that reflect the historical statistics you supply.


Inflow Statistics

inflow_seasonal_stats.parquet provides the seasonal distribution of historical inflows for every (hydro plant, stage) pair.

Schema

Column    Type    Nullable  Description
hydro_id  INT32   No        Hydro plant identifier (matches id in hydros.json)
stage_id  INT32   No        Stage identifier (matches id in stages.json)
mean_m3s  DOUBLE  No        Seasonal mean inflow in m³/s
std_m3s   DOUBLE  No        Seasonal standard deviation in m³/s (must be >= 0)
ar_order  INT32   No        Number of AR lags in the PAR(p) model (0 = white noise)

The file must contain exactly one row per (hydro_id, stage_id) pair. Every hydro plant defined in hydros.json must have a row for every stage defined in stages.json. The validator will reject the case if any combination is missing.

For the 1dtoy example, the file has 4 rows — one for each of the four monthly stages — for the single hydro plant UHE1 (hydro_id = 0).
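
You can mirror this completeness check yourself before handing the case to the validator. A minimal sketch in pure Python, with the id sets and rows hard-coded to match the 1dtoy example (in practice you would read them from hydros.json, stages.json, and the Parquet file):

```python
# Hypothetical ids, as they would be read from hydros.json / stages.json.
hydro_ids = {0}
stage_ids = {0, 1, 2, 3}

# (hydro_id, stage_id) pairs present in inflow_seasonal_stats.parquet.
present = {(0, 0), (0, 1), (0, 2), (0, 3)}

required = {(h, s) for h in hydro_ids for s in stage_ids}
missing = required - present
extra = present - required

assert not missing, f"missing (hydro_id, stage_id) rows: {sorted(missing)}"
assert not extra, f"unexpected rows: {sorted(extra)}"
print(f"OK: {len(present)} rows cover the full grid")
```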

Inspecting the file

# Polars
import polars as pl
df = pl.read_parquet("scenarios/inflow_seasonal_stats.parquet")
print(df)

# Pandas
import pandas as pd
df = pd.read_parquet("scenarios/inflow_seasonal_stats.parquet")
print(df)

-- DuckDB
SELECT * FROM read_parquet('scenarios/inflow_seasonal_stats.parquet');

# R with arrow
library(arrow)
df <- read_parquet("scenarios/inflow_seasonal_stats.parquet")
print(df)

Load Statistics

load_seasonal_stats.parquet provides the seasonal distribution of electrical demand at each bus. It drives the stochastic load model used during training and simulation.

Schema

Column    Type    Nullable  Description
bus_id    INT32   No        Bus identifier (matches id in buses.json)
stage_id  INT32   No        Stage identifier (matches id in stages.json)
mean_mw   DOUBLE  No        Seasonal mean load in MW
std_mw    DOUBLE  No        Seasonal standard deviation in MW (must be >= 0)
ar_order  INT32   No        Number of AR lags in the PAR(p) model (0 = white noise)

One row per (bus_id, stage_id) pair is required. Every bus in buses.json must have a row for every stage. The load mean and standard deviation determine both the expected demand level and how much it varies across scenarios in each stage.


The PAR(p) Model

PAR(p) stands for Periodic Autoregressive model of order p. It is the standard model for hydro inflow time series in long-term hydrothermal planning because inflows have two key properties the model captures well: seasonal patterns (wet seasons and dry seasons recur predictably each year) and autocorrelation (a wet month tends to be followed by another wet month, and vice versa).

What ar_order controls

The ar_order column in the seasonal statistics files sets the number of autoregressive lags for each (entity, stage) pair.

ar_order = 0 — white noise. The inflow at each stage is drawn independently from a normal distribution with the specified mean and standard deviation. There is no memory between stages: knowing last month’s inflow tells you nothing about this month’s. This is the simplest setting and appropriate when you lack historical data to fit AR coefficients, or when the inflow series shows very little autocorrelation.

ar_order > 0 — periodic autoregressive. The inflow at each stage depends on the inflows at the preceding p stages, weighted by coefficients that reflect the seasonal autocorrelation structure. A wet period is followed by another wet period with the probability implied by the coefficients. Higher AR orders capture longer-range dependencies: ar_order = 1 captures month-to-month persistence, ar_order = 2 adds two-month memory, and so on. Most hydro inflow series are well-described by ar_order = 1 or ar_order = 2.
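
A minimal PAR(1) sampler illustrates the recursion. This is a sketch, not Cobre's implementation: the lag-1 coefficients (phis) are hard-coded here, whereas Cobre fits them from the statistics you supply. The recursion runs on the standardized series and is then rescaled by each stage's seasonal mean and standard deviation:

```python
import numpy as np

def sample_par1(means, stds, phis, n_years=3, seed=0):
    """Sample a PAR(1) trajectory over repeated seasonal cycles.

    means, stds, phis: per-stage seasonal mean, std, and lag-1 coefficient.
    Standardized recursion: z_s = phi_s * z_{s-1} + e_s, with
    e_s ~ N(0, 1 - phi_s**2) so each z_s keeps unit variance.
    Actual inflow: a_s = mean_s + std_s * z_s.
    """
    rng = np.random.default_rng(seed)
    n_stages = len(means)
    z = 0.0
    out = []
    for t in range(n_years * n_stages):
        s = t % n_stages  # which seasonal stage this step falls in
        innov_std = np.sqrt(max(1.0 - phis[s] ** 2, 0.0))
        z = phis[s] * z + rng.normal(0.0, innov_std)
        out.append(means[s] + stds[s] * z)
    return np.array(out)

traj = sample_par1(
    means=[120.0, 95.0, 60.0, 80.0],
    stds=[30.0, 25.0, 15.0, 20.0],
    phis=[0.6, 0.6, 0.6, 0.6],
)
print(traj.shape)  # (12,)
```

With phi = 0, the innovation variance is 1 and the sampler reduces exactly to the white-noise case above.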

AR coefficients file

When any stage in inflow_seasonal_stats.parquet has ar_order > 0, Cobre also requires an inflow_ar_coefficients.parquet file in the scenarios/ directory. This file contains the fitted AR coefficients in standardized form (as produced by the Yule-Walker equations). The schema and the fitting procedure are documented in the Case Format Reference.

The 1dtoy example uses ar_order = 0 for all stages, so no coefficients file is needed.

When to use higher AR orders

In general:

  • Use ar_order = 0 when historical data is short or when you want to establish a baseline with the simplest possible model.
  • Use ar_order = 1 for most real hydro systems. Monthly inflows have strong one-month autocorrelation, and a first-order model captures the bulk of it.
  • Use ar_order = 2 or higher when the inflow series shows multi-month persistence (common in systems with large upstream catchments or snowmelt storage). Validate with autocorrelation plots of your historical data.
  • Setting ar_order > 0 with std_m3s = 0 is a validation error — the model requires non-zero variance to be identifiable.

For the theoretical derivation of the PAR(p) model, see Stochastic Modeling and PAR(p) Autoregressive Models in the methodology reference.


Correlation

Hydro plants that share a watershed tend to have correlated inflows: when the upstream basin receives heavy rainfall, all plants along the river benefit simultaneously. Ignoring this correlation can cause the optimizer to underestimate the risk of a system-wide dry spell.

Default behavior: independent noise

When no correlation configuration is provided, Cobre treats each hydro plant’s inflow as independent of all others. Each plant draws its own noise realization at each stage without any coupling. This is the correct setting for the 1dtoy example, which has only one hydro plant.

Configuring spatial correlation

For multi-plant systems, Cobre supports Cholesky-based spatial correlation. A correlation model is specified in correlation.json in the case directory and defines named correlation groups, each with a symmetric positive-definite correlation matrix.

{
  "method": "cholesky",
  "profiles": {
    "default": {
      "groups": [
        {
          "name": "basin_south",
          "entities": [
            { "type": "inflow", "id": 0 },
            { "type": "inflow", "id": 1 }
          ],
          "matrix": [
            [1.0, 0.7],
            [0.7, 1.0]
          ]
        }
      ]
    }
  }
}

Entities not listed in any group retain independent noise. Multiple profiles can be defined and scheduled to activate for specific stages (for example, using a wet-season correlation structure in January through March and a dry-season structure for the remaining months). Detailed correlation configuration documentation will be added with future multi-plant example cases.
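
The Cholesky approach is easy to demonstrate directly: factor the correlation matrix R = L·Lᵀ and multiply independent standard-normal draws by L. A sketch (not Cobre's internal code) using the basin_south matrix from the example above:

```python
import numpy as np

# Correlation matrix from the "basin_south" group above.
R = np.array([
    [1.0, 0.7],
    [0.7, 1.0],
])
L = np.linalg.cholesky(R)  # R = L @ L.T

rng = np.random.default_rng(0)
z = rng.standard_normal((100_000, 2))  # independent draws, one column per plant
correlated = z @ L.T                   # columns now correlate at ~0.7

sample_corr = np.corrcoef(correlated, rowvar=False)[0, 1]
print(f"sample correlation: {sample_corr:.2f}")
```

This is also why the matrix must be symmetric positive-definite: the Cholesky factorization fails otherwise.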


Scenario Count and Seeds

num_scenarios in stages.json

Each stage in stages.json has a num_scenarios field that controls how many scenario branches are pre-generated for the opening scenario tree used during the backward pass. A larger value gives the backward pass more diverse inflow realizations to evaluate cuts against, at the cost of a proportionally larger opening tree in memory. For the 1dtoy example this is set to 10. Production studies typically use 50 to 200.

forward_passes in config.json

The forward_passes field in config.json controls how many scenario trajectories are sampled during each training iteration’s forward pass. This is distinct from num_scenarios: the forward pass draws new trajectories on each iteration using a deterministic per-iteration seed, while num_scenarios controls the pre-generated backward-pass tree.

The seed field

The seed field in the training section of config.json is the base seed for all stochastic generation in the run:

{
  "training": {
    "forward_passes": 50,
    "seed": 42,
    "stopping_rules": [{ "type": "iteration_limit", "limit": 200 }]
  }
}

The default value is 42 when seed is omitted. When a seed is provided, every run with the same case directory and the same seed produces bitwise-identical scenarios, training trajectories, and simulation results. This reproducibility is guaranteed regardless of the number of MPI ranks, because each rank derives its scenario seeds independently from the base seed using a deterministic hash — no inter-rank coordination is required.

To get a non-reproducible run (different scenarios each time), set "seed": null in config.json. Cobre will then derive the base seed from OS entropy at startup.
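
The per-rank derivation can be sketched as deterministic hashing of the base seed together with the rank and scenario indices. The hash function and key layout below are illustrative assumptions, not Cobre's actual scheme; the point is that any collision-resistant hash gives every rank the same answer with no communication:

```python
import hashlib

def derive_seed(base_seed, rank, scenario):
    """Derive a child seed deterministically — no inter-rank coordination.

    SHA-256 is an illustrative choice; any collision-resistant hash works.
    """
    msg = f"{base_seed}:{rank}:{scenario}".encode()
    return int.from_bytes(hashlib.sha256(msg).digest()[:8], "little")

# Same inputs always give the same seed, on any rank, in any order...
assert derive_seed(42, 0, 7) == derive_seed(42, 0, 7)
# ...while different ranks or scenarios get independent streams.
assert derive_seed(42, 0, 7) != derive_seed(42, 1, 7)
print(hex(derive_seed(42, 0, 7)))
```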


Inflow Non-Negativity

Normal distributions used in PAR(p) models have unbounded support: even with a positive mean, there is a non-zero probability of drawing a negative noise realization that, after applying the AR dynamics, produces a negative inflow value. Negative inflow has no physical meaning and, if uncorrected, would violate water balance constraints in the LP.
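
The size of the effect is easy to quantify in the white-noise case: for inflow ~ N(μ, σ²), the probability of a negative draw is Φ(−μ/σ). A stdlib-only check, using hypothetical stage statistics:

```python
from math import erf, sqrt

def p_negative(mean, std):
    """P(X < 0) for X ~ N(mean, std**2), via the standard normal CDF."""
    return 0.5 * (1.0 + erf((-mean / std) / sqrt(2.0)))

# A dry-season stage with mean 60 m³/s and std 30 m³/s sits two
# standard deviations above zero:
p = p_negative(60.0, 30.0)
print(f"P(inflow < 0) = {p:.4f}")  # about 2.3% of draws
```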

Method in v0.1.0: penalty

Cobre v0.1.0 uses the penalty method to handle negative inflow realizations. A high-cost slack variable is added to each water balance row. When the LP solver encounters a scenario where the inflow would be negative, it draws on this virtual inflow at the penalty cost rather than violating the balance constraint. The penalty cost is configurable via the inflow_non_negativity field in the case configuration; the default keeps it high enough that the slack is used only when necessary.

In practice, the penalty is rarely activated in well-specified studies. It acts as a backstop for low-probability tail realizations.

Truncation methods: planned for a future release

Two additional methods from the literature — truncation (modifying LP row bounds based on external AR evaluation) and truncation with penalty (combining bounded slack with modified bounds) — are planned for a future release. These require evaluating the full inflow value a_h as a scalar before LP patching, which is a non-trivial architectural change in v0.1.0.

For the mathematical theory behind all three methods, see the Inflow Non-Negativity page in the methodology reference, or Oliveira et al. (2022), Energies 15(3):1115.


See Also

  • Anatomy of a Case — introductory walkthrough of the scenarios/ directory and Parquet schemas
  • Configuration — full documentation of config.json fields including seed and forward_passes
  • cobre-stochastic — internal architecture of the stochastic crate: PAR preprocessing, Cholesky correlation, opening tree, and seed derivation