cobre-stochastic
experimental
cobre-stochastic provides the stochastic process models for the Cobre power
systems ecosystem. It builds probabilistic representations of hydro inflow
time series — using Periodic Autoregressive (PAR(p)) models — and generates
correlated noise scenarios for use by iterative scenario-based optimization
algorithms. The crate is solver-agnostic: it supplies fully-initialized
stochastic infrastructure components that any scenario-based iterative
optimization algorithm can consume read-only, with no dependency on any
particular solver vertical.
The crate has no dependency on cobre-solver or cobre-comm. It depends only
on cobre-core for entity types and on a small set of RNG and hashing crates
for deterministic noise generation.
Module overview
| Module | Purpose |
|---|---|
par | PAR(p) coefficient preprocessing: validation, original-unit conversion, and the PrecomputedParLp cache |
noise | Deterministic noise generation: SipHash-1-3 seed derivation (seed) and Pcg64 RNG construction (rng) |
correlation | Cholesky-based spatial correlation: decomposition (cholesky) and profile resolution (resolve) |
tree | Opening scenario tree: flat storage structure (opening_tree) and tree generation (generate) |
sampling | InSample scenario selection: sample_forward for picking an opening for a given iteration/scenario/stage |
context | StochasticContext integration type and build_stochastic_context pipeline entry point |
error | StochasticError with five variants covering all failure domains of the stochastic layer |
Architecture
PAR(p) preprocessing and flat array layout
PAR(p) (Periodic Autoregressive) models describe the seasonal autocorrelation
structure of hydro inflow time series. Each hydro plant at each stage has an
InflowModel with a mean (mean_m3s), a standard deviation (std_m3s), and
a vector of AR coefficients in standardized form (ar_coefficients).
PrecomputedParLp is built once at initialization from raw InflowModel
parameters. It converts AR coefficients from standardized form (ψ*,
direct Yule-Walker output) to original-unit form at build time:
ψ_{m,ℓ} = ψ*_{m,ℓ} · s_m / s_{m-ℓ}
where s_m is std_m3s for the current stage’s season and s_{m-ℓ} is
std_m3s for the season ℓ stages prior. The converted coefficients and their
derived intercepts (base) are stored in stage-major flat arrays:
array[stage * n_hydros + hydro] (2-D: means, stds, base terms)
psi[stage * n_hydros * max_order + hydro * max_order + lag] (3-D: AR coefficients)
This layout ensures that all per-stage data for every hydro plant is contiguous in memory, maximizing cache utilization during sequential stage iteration within a scenario trajectory.
All hot-path arrays use Box<[f64]> (via Vec::into_boxed_slice()) rather
than Vec<f64>. The boxed-slice type communicates the no-resize invariant and
eliminates the capacity word from each allocation.
Deterministic noise via SipHash-1-3 seed derivation (DEC-017)
Each scenario realization in an iterative optimization run requires a draw from the noise distribution. Rather than broadcasting seeds across compute nodes — which would require communication — each node independently derives its own seed from a small tuple using SipHash-1-3 (DEC-017).
Two derivation functions are provided:
derive_forward_seed(base_seed, iteration, scenario, stage) -> u64: hashes a 20-byte little-endian wire formatbase_seed (8B) ++ iteration (4B) ++ scenario (4B) ++ stage (4B).derive_opening_seed(base_seed, opening_index, stage) -> u64: hashes a 16-byte wire formatbase_seed (8B) ++ opening_index (4B) ++ stage (4B).
The different wire lengths provide domain separation without explicit prefixes,
preventing hash collisions between forward-pass seeds and opening-tree seeds.
stage in both functions is always stage.id (the domain identifier), never
stage.index (the array position), because array positions shift under stage
filtering while IDs are stable.
From the derived seed, a Pcg64 RNG is constructed via rng_from_seed. The
PCG family provides good statistical quality with fast generation, suitable
for producing large numbers of standard-normal samples via the StandardNormal
distribution.
Cholesky-based spatial correlation
Hydro inflow series at neighboring plants are spatially correlated. cobre-stochastic
applies a Cholesky transformation to convert independent standard-normal samples
into correlated samples.
The Cholesky decomposition is hand-rolled using the Cholesky-Banachiewicz
algorithm (~150 lines). No external linear algebra crate is added to the
dependency tree. The lower-triangular factor L (such that Sigma = L * L^T)
is stored in packed lower-triangular format: element (i, j) with j <= i is at index i*(i+1)/2 + j. This eliminates the zero upper-triangle
entries and halves memory usage.
Correlation profiles can be defined per-season. DecomposedCorrelation holds
all profiles in a BTreeMap<String, Vec<GroupFactor>> — the BTreeMap
guarantees deterministic iteration order, which is required for
declaration-order invariance.
Before entering the hot optimization loop, callers must invoke
DecomposedCorrelation::resolve_positions(&mut self, entity_order: &[EntityId])
once. This pre-computes the positions of each group’s entities within the
canonical entity order and stores them on each GroupFactor as
Option<Box<[usize]>>. With positions pre-computed, apply_correlation
avoids a per-call O(n) linear scan and heap allocation on the hot path.
If a correlation group’s entity IDs are only partially present in
entity_order, the Cholesky transform is skipped for that group entirely.
Entities not in any group retain their independent noise values unchanged.
Opening tree structure
The opening scenario tree pre-generates all noise realizations used during the backward pass of the optimization algorithm, before the iterative loop begins. This avoids per-iteration recomputation and ensures the backward pass always operates on a fixed, reproducible set of scenarios.
OpeningTree stores all noise values in a single flat contiguous array with
stage-major ordering:
data[stage_offsets[stage] + opening_idx * dim .. + dim]
The stage_offsets array has length n_stages + 1. The sentinel entry
stage_offsets[n_stages] equals data.len(), making bounds checks exact
without special-casing the last stage. This sentinel pattern is used
consistently in PrecomputedParLp, OpeningTree, and throughout
StochasticContext.
Pre-study stages (those with negative stage.id) are excluded from the
opening tree but remain in inflow_models for PAR lag initialization.
StochasticContext as the integration entry point
StochasticContext bundles the three independently-built components into a
single ready-to-use value:
PrecomputedParLp— PAR coefficient cache for LP RHS patching.DecomposedCorrelation— pre-decomposed Cholesky factors for all profiles.OpeningTree— pre-generated noise realizations for the backward pass.
build_stochastic_context(&system, base_seed) runs the full preprocessing
pipeline in a fixed order: validate PAR parameters, build the coefficient
cache, decompose correlation matrices, generate the opening tree. After
construction, all fields are immutable. StochasticContext is Send + Sync,
verified by a compile-time assertion and a unit test.
sample_forward for InSample scenario selection
sample_forward implements the InSample scenario selection strategy: for each
(iteration, scenario, stage) triple, it deterministically selects one opening
from the tree by deriving a seed via derive_forward_seed and sampling a
Pcg64 RNG. The selected opening index and its noise slice are returned together,
so the caller can both log which opening was chosen and immediately use the
noise values.
Public types
StochasticContext
Owns all three preprocessing pipeline outputs: PrecomputedParLp,
DecomposedCorrelation, and OpeningTree. Constructed by
build_stochastic_context and then consumed read-only. Accessors:
par_lp(), correlation(), opening_tree(), tree_view(), base_seed(),
dim(), n_stages(). Both Send and Sync.
PrecomputedParLp
Cache-friendly PAR(p) model data for LP RHS patching. Stores means, standard
deviations, original-unit AR coefficients (ψ), and intercept terms (base) in
stage-major flat arrays (Box<[f64]>). Built via PrecomputedParLp::build.
Accessors: n_hydros(), n_stages(), max_order(), mean(), std(),
base(), psi().
DecomposedCorrelation
Holds Cholesky-decomposed correlation factors for all profiles, keyed by
profile name in a BTreeMap. Built via DecomposedCorrelation::build, which
validates and decomposes all profiles eagerly — errors surface at initialization,
not at per-stage lookup time. Call resolve_positions once with the canonical
entity order before entering the optimization loop.
OpeningTree
Fixed opening scenario tree holding pre-generated noise realizations. All noise
values are in a flat Box<[f64]> with stage-major ordering and a sentinel
offset array of length n_stages + 1. Provides opening(stage_idx, opening_idx) -> &[f64] for element access and view() -> OpeningTreeView<'_> for a
zero-copy borrowed view.
OpeningTreeView<'a>
A zero-copy borrowed view over an OpeningTree, with the same accessor API:
opening(stage_idx, opening_idx), n_stages(), n_openings(stage_idx),
dim(). Passed to sample_forward to avoid cloning the tree data.
StochasticError
Returned by all fallible APIs. Five variants:
| Variant | When it occurs |
|---|---|
InvalidParParameters | AR order > 0 with zero standard deviation, or ill-conditioned coefficients |
CholeskyDecompositionFailed | Correlation matrix is not positive-definite |
InvalidCorrelation | Missing default profile, ambiguous profile set, or out-of-range correlation entry |
InsufficientData | Fewer historical records than the PAR order requires |
SeedDerivationError | Hash computation produces an invalid result during seed derivation |
Implements std::error::Error, Send, and Sync.
ParValidationReport
Return type of validate_par_parameters. Contains a list of ParWarning
values for non-fatal issues (e.g., high AR coefficients that may indicate
numerical instability) that the caller can inspect or log before proceeding
to PrecomputedParLp::build.
ParWarning
A non-fatal PAR parameter warning. Carries the hydro ID, stage ID, and a human-readable description of the potential issue.
GroupFactor
A single correlation group’s Cholesky factor with its associated entity ID
mapping. Fields: factor: CholeskyFactor, entity_ids: Vec<EntityId>, and
pre-computed positions: Option<Box<[usize]>> (filled by resolve_positions).
CholeskyFactor
The lower-triangular Cholesky factor L of a correlation matrix, stored in
packed row-major form. Element (i, j) with j <= i is at index
i*(i+1)/2 + j. Constructed via CholeskyFactor::decompose(&matrix) and
applied via transform(&input, &mut output).
Usage example
The following shows how to construct a stochastic context from a loaded system and use it to sample a forward-pass scenario.
#![allow(unused)]
fn main() {
use cobre_stochastic::{
build_stochastic_context,
sampling::insample::sample_forward,
};
// `system` is a `cobre_core::System` produced by `cobre_io::load_case`.
// `base_seed` comes from the study configuration (application layer handles
// the Option<i64> -> u64 conversion and OS-entropy fallback).
let ctx = build_stochastic_context(&system, base_seed)?;
println!(
"stochastic context: {} hydros, {} study stages",
ctx.dim(),
ctx.n_stages(),
);
// Obtain a borrowed view over the opening tree (zero-copy).
let tree_view = ctx.tree_view();
// In the iterative optimization loop, select a forward scenario for each
// (iteration, scenario, stage) triple.
let iteration: u32 = 0;
let scenario: u32 = 0;
for (stage_idx, stage) in study_stages.iter().enumerate() {
// stage.id is the domain identifier; stage_idx is the array position.
let (opening_idx, noise_slice) = sample_forward(
&tree_view,
ctx.base_seed(),
iteration,
scenario,
stage.id as u32,
stage_idx,
);
// `noise_slice` has length `ctx.dim()` (one value per hydro plant).
// Pass to LP RHS patching together with `ctx.par_lp()`.
let _ = (opening_idx, noise_slice);
}
Ok::<(), cobre_stochastic::StochasticError>(())
}
Performance notes
cobre-stochastic is designed so that all performance-critical preprocessing
happens once at initialization. The iterative optimization loop consumes
already-materialized data through slice indexing, with no re-allocation on the
hot path.
Pre-computed entity positions (resolve_positions)
DecomposedCorrelation::resolve_positions must be called once before entering
the optimization loop. It pre-computes the mapping from each correlation group’s
entity IDs to their positions in the canonical entity_order slice and stores
the result as Option<Box<[usize]>> on each GroupFactor. Without this
pre-computation, apply_correlation would perform an O(n) linear scan and a
Vec allocation for every noise draw.
Stack-allocated buffers for small groups (MAX_STACK_DIM = 64)
Inside apply_correlation, intermediate working buffers for correlation groups
with at most 64 entities are stack-allocated (using arrayvec or a fixed-size
array on the stack). Groups larger than this threshold fall back to
heap-allocated Vec. The fast path covers the overwhelming majority of
practical correlation groups, eliminating heap allocation from the inner loop
for typical study configurations.
Incremental row_base in Cholesky transform
The packed lower-triangular storage index for element (i, j) is
i*(i+1)/2 + j. Rather than recomputing the triangular index from scratch for
each row, the transform method maintains an incremental row_base variable
that is incremented by i+1 at the end of each row. This eliminates a
multiplication per row iteration on the hot path of the Cholesky forward
substitution.
Box<[f64]> for the no-resize invariant
All fixed-size hot-path arrays in PrecomputedParLp, OpeningTree, and
CholeskyFactor use Box<[f64]> rather than Vec<f64>. The boxed-slice
type communicates that these arrays are immutable after construction, eliminates
the capacity word from each allocation, and allows the optimizer to treat the
length as a compile-time-stable bound.
Feature flags
cobre-stochastic has no optional feature flags. All dependencies are always
compiled. No external system libraries are required (HiGHS, MPI, etc.).
# Cargo.toml
cobre-stochastic = { version = "0.1" }
Testing
Running the test suite
cargo test -p cobre-stochastic
No external dependencies or system libraries are required. All dependencies
(siphasher, rand, rand_pcg, rand_distr, thiserror) are Cargo-managed. The
--all-features flag is not needed — there are no feature flags.
Test suite overview
The crate has 125 tests total: 105 unit tests, 5 conformance integration tests, 4 reproducibility integration tests, and 11 doc-tests.
Conformance suite (tests/conformance.rs)
The conformance test suite verifies the PAR(p) preprocessing pipeline against hand-computed fixtures with known exact outputs.
Two fixtures are used:
- AR(0) fixture: a zero-order AR model (pure noise, no lagged terms). The
precomputed
psiarray must be all-zeros and thebasevalues must equal the raw means. Tolerance: 1e-10. - AR(1) fixture: a first-order AR model with a pre-study stage (negative
stage.id) that supplies the lag mean and standard deviation for coefficient unit conversion. The conversion formulaψ = ψ* · s_m / s_lagis tested against a hand-computed value. Tolerance: 1e-10.
Reproducibility suite (tests/reproducibility.rs)
Four tests verify the determinism and invariance properties that are required for correct behavior in a distributed, multi-run setting:
- Seed determinism: calling
derive_forward_seedandderive_opening_seedwith the same inputs always returns bitwise-identical seeds. Golden-value regression pins the exact hash output for a known(base_seed, ...)tuple. - Opening tree seed sensitivity: different
base_seedvalues produce different opening trees (verified by checking that at least one noise value differs across the full tree). Usesany()over all tree entries rather thanassert_ne!on the whole tree, to handle the astronomically unlikely case where two seeds produce one identical value. - Declaration-order invariance: inserting hydros in reversed order into a
SystemBuilder(which sorts byEntityIdinternally) produces aStochasticContextwith bitwise-identical PAR arrays, opening tree, and Cholesky transform output. This verifies the canonical-order invariant across the full preprocessing pipeline. - Infrastructure genericity gate: a grep audit confirms that no algorithm-specific
references appear anywhere in the crate source tree. The gate is encoded as a
#[test]usingstd::process::Commandso it runs automatically in CI.
Design notes
Communication-free noise generation (DEC-017)
The original design considered broadcasting a seed from the root rank to all workers before each iteration. DEC-017 rejected this approach because it adds an MPI collective on the hot path and creates a serialization point as the number of ranks grows.
The alternative — deriving each rank’s seeds independently from a common
base_seed plus a context tuple — requires no communication and produces
identical results regardless of the number of ranks. SipHash-1-3 was chosen
because it is non-cryptographic (fast), produces high-quality 64-bit hashes
suitable for seeding a CSPRNG, and is available in the siphasher crate with
no system dependencies.
The two wire formats (20 bytes for forward seeds, 16 bytes for opening seeds) use length-based domain separation rather than an explicit prefix byte, which is slightly more efficient and equally correct given that the two sets of input tuples have different shapes and lengths.