Method-equivalence validation: the double-well on one engine

This page records a reproducibility check: every sampling strategy and every code path in PyRETIS must reproduce the same crossing probability and rate constant on one simple system, and that result must match an absolute analytical reference rather than merely agree with itself. The system is a 1D double well sampled on the internal engine; the cases live in examples/validation/methods/ and are launched by the driver script examples/validation/run_validation.py.

The test rests on one fact: the in-process RETIS loop, the infinite-swapping sampler analysed with WHAM, and the native run routed through the infinite-swapping scheduler (the Stage C route, including the multi-worker pool) all integrate the identical potential. If their rate estimates agree with each other and land on the analytical Kramers rate, the sampling machinery is reproducible across strategies and across code paths. This complements the cross-engine validation, which checks engine equivalence on a shared force field.

The shared system

Every method case models the same 1D double well, so they must give the same rate within statistical error:

  • Potential: the quartic double well \(V(x) = a x^4 - b x^2 + c\) with a = 1, b = 2 and c = 0. The minima sit at \(x = \pm\sqrt{b / (2 a)} = \pm 1\) and the barrier top at x = 0, so the barrier is \(\Delta V = b^2 / (4 a) = 1\).
  • Dynamics: inertial Langevin at temperature T = 0.07 with a time step dt = 0.025 and friction gamma = 0.3 (reduced units), unit mass. With these numbers \(\beta\,\Delta V = 1 / 0.07 \approx 14.3\), so escape over the barrier is a genuine rare event.
  • Order parameter: the particle position, with eight interfaces spanning \([-0.99 \ldots 1.0]\).
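The barrier height and the rare-event ratio quoted in the list above follow directly from the potential parameters. A minimal sketch, assuming reduced units with k_B = 1; the function names are illustrative and are not part of PyRETIS:

```python
import math


def potential(x, a=1.0, b=2.0, c=0.0):
    """Quartic double well V(x) = a x^4 - b x^2 + c."""
    return a * x**4 - b * x**2 + c


def barrier_height(a=1.0, b=2.0, c=0.0):
    """Energy difference between the barrier top (x = 0) and a minimum."""
    x_min = math.sqrt(b / (2.0 * a))  # from V'(x) = 4 a x^3 - 2 b x = 0
    return potential(0.0, a, b, c) - potential(x_min, a, b, c)


temperature = 0.07  # reduced units; k_B = 1 is an assumption of this sketch
delta_v = barrier_height()
beta_delta_v = delta_v / temperature

print(f"barrier = {delta_v:.3f}, beta * barrier = {beta_delta_v:.1f}")
```

The constant c shifts both the minimum and the barrier top by the same amount, so it drops out of the barrier height.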

The system parameters are not hard-coded into the analysis – they are read back out of each case’s input TOML (read_system_params), so the analytical reference below tracks the actual runs, and every method case is cross-checked to confirm it really models the same well.

The analytical reference – a known truth

The double well is simple enough to have a closed-form escape rate from Kramers’ theory, so the suite checks the simulations against an absolute truth, not only against each other. analytical_double_well_rate() evaluates it in three increasingly complete forms:

Table 58 Analytical double-well escape rate (reduced units).
Estimator              Rate      Meaning
k_TST                  2.81e-07  transition-state theory, no recrossing (upper bound)
k_Kramers (spatial)    2.61e-07  moderate/high-friction Kramers prefactor
k_Kramers-MM (truth)   2.52e-07  Mel’nikov–Meshkov turnover factor (headline)

With \(\beta\,\Delta V = 14.3\) and reduced energy loss \(\delta = 11.4\) the turnover factor is \(\Upsilon = 0.97\) – the system sits firmly in the spatial-diffusion regime, so the Kramers result is accurate to a few percent. Converged cases should land on 2.5e-07 within their statistical error. A case can agree with every other case yet sit several sigma from the analytical rate – a shared systematic that the self-consistency check alone cannot see.
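The numbers in this section can be re-derived by hand from the system parameters. The sketch below is not the suite's analytical_double_well_rate(); it is an independent illustration under stated assumptions: k_B = 1, gamma is a friction rate, the energy loss is \(\delta = \gamma S / T\) with S the action of the orbit at the barrier energy, and the turnover factor takes the symmetric double-well form \(\Upsilon = A(\delta)^2 / A(2\delta)\), with the Mel’nikov–Meshkov depopulation factor A written as an erfc series. All function names are hypothetical.

```python
import math


def log_depopulation(delta, tol=1e-16, max_terms=10_000):
    """ln A(delta) = -sum_{n>=1} erfc(sqrt(n * delta) / 2) / n."""
    total = 0.0
    for n in range(1, max_terms + 1):
        term = math.erfc(math.sqrt(n * delta) / 2.0) / n
        total += term
        if term < tol:
            break
    return -total


def kramers_sketch(a=1.0, b=2.0, temperature=0.07, gamma=0.3, mass=1.0):
    """Escape-rate estimates for V(x) = a x^4 - b x^2 (k_B = 1 assumed)."""
    barrier = b**2 / (4.0 * a)
    omega_0 = math.sqrt(4.0 * b / mass)  # well frequency, V''(x_min) = 4 b
    omega_b = math.sqrt(2.0 * b / mass)  # barrier frequency, |V''(0)| = 2 b

    # Transition-state theory: no recrossing.
    k_tst = omega_0 / (2.0 * math.pi) * math.exp(-barrier / temperature)

    # Kramers moderate/high-friction prefactor.
    prefactor = (math.sqrt(0.25 * gamma**2 + omega_b**2) - 0.5 * gamma) / omega_b
    k_kramers = prefactor * k_tst

    # Action of the closed orbit in one well at the barrier energy.
    action = 2.0 * math.sqrt(2.0 * mass) * b**1.5 / (3.0 * a)
    delta = gamma * action / temperature

    # Symmetric double-well turnover factor (assumed form).
    upsilon = math.exp(2.0 * log_depopulation(delta) - log_depopulation(2.0 * delta))

    return {
        "k_tst": k_tst,
        "k_kramers": k_kramers,
        "delta": delta,
        "upsilon": upsilon,
        "k_mm": upsilon * k_kramers,
    }


rates = kramers_sketch()
for name, value in rates.items():
    print(f"{name:10s} {value:.3e}")
```

With the parameters of this page the sketch reproduces the entries of Table 58 and the quoted \(\delta\) and \(\Upsilon\) to the precision given there, which suggests the assumed forms match the reference closely enough for a sanity check.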

The methods

The same double well is sampled with every available strategy and through every code path. The native_* cases run the in-process RETIS loop via pyretisrun. infswap_wham runs the infinite-swapping sampler and is analysed with WHAM. The scheduler_* cases run a native config through the infinite-swapping scheduler (the Stage C native->scheduler route, PYRETIS_NATIVE_VIA_INFSWAP); this route emits native-format output, so each scheduler case is analysed exactly like a native case and must reproduce its native sibling’s rate.

Table 59 Method cases on the shared double well.
Case               Strategy / what it adds
native_sh          standard shooting (pyretisrun, in-process loop)
native_ss          stone skipping (pyretisrun)
native_wt          web throwing (pyretisrun)
native_wf          wire fencing (pyretisrun)
native_wf_ha       wire fencing with high acceptance (pyretisrun)
native_wf_cap_*    wire fencing with a capped window ([tis] interface_cap)
native_wt_sour_*   web throwing with a shifted source sub-interface ([tis] interface_sour)
native_relshoot    RETIS with relative per-ensemble shoot frequencies ([retis] relative_shoots)
native_*_pcg64     the same native methods driven by the PCG64 generator (rgen = "pcg64") – the A3.4 RNG-migration validation set
infswap_wham       infinite swapping (pyretisrun) analysed with WHAM – a cross-check of the second sampler / code path
scheduler_sh       standard shooting routed through the infinite-swapping scheduler (Stage C native->scheduler route; native-format output, analysed like native_sh)
scheduler_sh_w3    the same, with a multi-worker pool of three workers (PYRETIS_NATIVE_WORKERS=3) – the parallel-worker route
scheduler_wf       wire fencing routed through the scheduler (WHAM Cxy/HA unweighting on the native route)
scheduler_ss       stone skipping routed through the scheduler
scheduler_wt       web throwing routed through the scheduler

The native_wf_cap_* / native_wt_sour_* / native_relshoot cases (group params) vary non-default TIS knobs that change the sampling but not the physics, so each must still reach the same analytical rate. The wf_convention group (native_wf/native_wf_ha, infswap_wham, scheduler_wf) cross-checks wire fencing across the code paths. On the scheduler’s native-output route the wf occupancy is HA-weighted (the compute_weight crossing count drives the swap, as in infretis), so the native writer applies the WHAM Cxy/HA unweighting (Weight = compute_weight / frac) to recover the per-ensemble crossing probability. Without that unweighting the rate is over-counted by roughly two orders of magnitude. scheduler_ss carries the same treatment for stone skipping. The suite config’s select key takes group tags such as wf_convention or params and runs the chosen subset without dropping the rest.

The native sh/ss/wt/wf cases (analysed with the standard PyRETIS crossing probability), the infinite-swapping infswap_wham (analysed with WHAM), and the scheduler-routed cases (the Stage C path, analysed as native) are a direct cross-check of the distinct code paths. The native_*_pcg64 cases are the A3.4 step 2 go/no-go: they run the identical native methods with the canonical PCG64 generator instead of the legacy MT19937 and must reproduce the MT19937 rate within statistical error before the default generator is flipped (see MERGE_TODO.md A3.4). As the in-process and infinite-swapping code paths unify, the infswap_* and scheduler_* cases should collapse onto their native equivalents.

Note

scheduler_ss (stone skipping through the scheduler) earlier crashed the internal engine’s streaming dump (FileNotFoundError on ss_shoot.xyz) because the move re-pointed a persistent path’s phase point at the transient dump file. That is fixed (stone skipping now dumps a copy), and the case is enabled like the others.

Running it and reading the output

The suite is not a unit test and not a tutorial: the GitLab CI cannot afford runs long enough to drive the rare-event statistics down, so it is launched manually, once, on a cluster. It is driven by a per-machine config, validation.toml – an ordered list of cases with their target cycle counts plus a per-machine seed and a reverse_list toggle. The first positional argument selects the action:

cd examples/validation

# usage helper (also printed when no action is given)
python run_validation.py

# show the recorded results -- READ-ONLY: runs nothing, writes nothing
python run_validation.py status
python run_validation.py status native_sh   # ... for one case

# (re)analyse the runs already on disk and update the results table
python run_validation.py analyze
python run_validation.py analyze native_sh   # ... for one case

# run every case listed in ./validation.toml, then analyse
python run_validation.py run
python run_validation.py run native_sh       # ... for one case

One-off overrides, usable with run:

python run_validation.py run --config machineB.toml  # different per-machine config
python run_validation.py run --cycles 20000          # override every case's target
python run_validation.py run --seed 2                # this machine's seed
python run_validation.py run --jobs 8                # internal cases in parallel

--jobs N runs that many internal-engine (methods/) cases at once; they are independent single-process runs, so this only uses more cores and does not change any result (each case has its own directory and seed). --cycles is a target total, not an increment – a case with existing output is continued up to that count, otherwise it starts fresh from seed. python run_validation.py --write-config validation.toml writes a fresh config template with every case enabled.
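The difference between a target total and an increment is easy to get wrong, so here is the arithmetic as a minimal sketch. The helper name is hypothetical and this is not the driver's code; in particular, clamping at zero when a case already exceeds the target is an assumption of the sketch.

```python
def cycles_still_to_run(target, completed):
    """Cycles left when --cycles is a target total, not an increment.

    `completed` is the number of cycles already on disk (0 for a fresh case).
    Clamping at zero for an over-target case is an assumption of this sketch.
    """
    return max(0, target - completed)


fresh = cycles_still_to_run(20000, 0)        # fresh case: run the full target
resumed = cycles_still_to_run(20000, 15000)  # continued case: only the remainder
print(fresh, resumed)
```

Under this reading, repeating the same run command with an unchanged --cycles value leaves a finished case with nothing more to do.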

Each case is analysed as soon as it finishes, so its rate prints as the suite progresses; a combined inventory, convergence plot and comparison follow at the end. The analysis prints a “Rate vs analytical Kramers reference” table (rate, k/k_ref, |d|/sigma, agree?) next to the suite-mean table. Two persistent summaries are written. validation_results.json holds one entry per case: rate, relative error, cycles, number of independent runs, agreement flag, plus provenance (when, at which git commit, and whether the tree was dirty). validation_results.html is the human-readable version: the same table with the convergence plot rate_vs_cycles.png embedded. A CHECK in the comparison almost always means the case has not converged yet: raise that case’s cycles, continue the run and re-analyse.
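The columns of the reference table can be reproduced from a case's rate and relative error. The sketch below is a hedged illustration, not the suite's implementation: it assumes the relative error is one standard deviation divided by the rate, and the two-sigma agreement threshold is a placeholder, since the actual criterion is not stated on this page. The example rates are invented for illustration and are not results.

```python
K_REF = 2.52e-07  # k_Kramers-MM from Table 58


def compare_to_reference(rate, rel_err, k_ref=K_REF, n_sigma=2.0):
    """Return (k / k_ref, |d| / sigma, agree) for one case.

    n_sigma = 2 is an assumed threshold for this sketch only.
    """
    sigma = rate * rel_err              # absolute one-sigma error on the rate
    ratio = rate / k_ref
    deviation = abs(rate - k_ref) / sigma
    return ratio, deviation, deviation <= n_sigma


# Invented example inputs: (rate, relative error).
examples = {"close_case": (2.70e-07, 0.10), "far_case": (2.00e-07, 0.05)}
results = {name: compare_to_reference(*vals) for name, vals in examples.items()}

for name, (ratio, deviation, agree) in results.items():
    flag = "yes" if agree else "CHECK"
    print(f"{name:12s} k/k_ref = {ratio:.2f}  |d|/sigma = {deviation:.2f}  {flag}")
```

Comparing against the fixed analytical value rather than the suite mean is what exposes a shared systematic: every case could pass a mutual comparison and still fail this one.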

See examples/validation/README.rst for the full driver reference, including the two-machine combine-for-statistics workflow.