.. _developer-method-validation:

###############################################################
Method-equivalence validation: the double-well on one engine
###############################################################

This page records a reproducibility check that **every sampling strategy
and every code path** in |pyretis| reproduces the **same** crossing
probability and rate constant on **one simple system**, and that the
result matches an **absolute analytical truth** rather than merely being
self-consistent. The system is a 1D **double well** sampled on the internal
engine; the cases and the driver script live in
``examples/validation/methods/`` and are launched by
``examples/validation/run_validation.py``.

The test works because the in-process RETIS loop, the infinite-swapping
sampler analysed with WHAM, and the native run routed through the
infinite-swapping scheduler (the Stage C route, including the multi-worker
pool) all integrate the *identical* potential. If their rate estimates
agree, and also land on the analytical Kramers rate, the sampling machinery
is reproducible across strategies and across code paths. This complements
the :ref:`cross-engine validation <developer-engine-validation>`, which
checks engine equivalence on a shared force field.

The shared system
=================

Every method case models the **same** 1D double well, so they must give
the same rate within statistical error:

* **Potential:** the quartic double well :math:`V = a x^4 - b x^2 + c`
  with ``a = 1``, ``b = 2`` and ``c = 0``, so the minima sit at
  :math:`x = \pm\sqrt{b / (2 a)} = \pm 1` and the barrier is
  :math:`\Delta V = b^2 / (4 a) = 1`.
* **Dynamics:** inertial **Langevin** at temperature **T = 0.07** with a
  time step **dt = 0.025** and friction **gamma = 0.3** (reduced units,
  :math:`k_B = 1`), unit mass. With these numbers
  :math:`\beta\,\Delta V = 1/0.07 \approx 14.3`, so escape over the barrier
  is a genuine rare event.
* **Order parameter:** the particle position, with **eight interfaces**
  spanning :math:`[-0.99, 1.0]`.
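
The potential, its force and the barrier arithmetic above can be checked in
a few lines. The sketch below is an illustration only, not the internal
engine's code: it assumes :math:`k_B = 1` and uses the BAOAB Langevin
splitting as a stand-in integrator, which may differ from the scheme
|pyretis| actually uses. The names ``potential``, ``force`` and
``baoab_trajectory`` are hypothetical.

.. code-block:: python

    import math
    import random

    # Shared system (reduced units, k_B = 1, unit mass).
    A, B, C = 1.0, 2.0, 0.0
    TEMPERATURE, DT, GAMMA, MASS = 0.07, 0.025, 0.3, 1.0


    def potential(x):
        """Quartic double well V(x) = a x^4 - b x^2 + c."""
        return A * x**4 - B * x**2 + C


    def force(x):
        """F(x) = -dV/dx."""
        return -(4.0 * A * x**3 - 2.0 * B * x)


    def baoab_trajectory(x, v, n_steps, seed=0):
        """Inertial Langevin dynamics with the BAOAB splitting (a stand-in
        integrator, not necessarily the one PyRETIS uses)."""
        rng = random.Random(seed)
        c1 = math.exp(-GAMMA * DT)                            # friction damping per step
        c2 = math.sqrt((1.0 - c1 * c1) * TEMPERATURE / MASS)  # matching thermal noise
        f = force(x)
        positions = []
        for _ in range(n_steps):
            v += 0.5 * DT * f / MASS               # B: half kick
            x += 0.5 * DT * v                      # A: half drift
            v = c1 * v + c2 * rng.gauss(0.0, 1.0)  # O: friction + noise
            x += 0.5 * DT * v                      # A: half drift
            f = force(x)
            v += 0.5 * DT * f / MASS               # B: half kick
            positions.append(x)
        return positions


    x_min = math.sqrt(B / (2.0 * A))             # minima at x = +-1
    barrier = potential(0.0) - potential(x_min)  # b^2 / (4a) = 1
    beta_barrier = barrier / TEMPERATURE         # about 14.3

Started from the left minimum, a run of a few thousand steps is expected to
stay in the left well: with :math:`\beta\,\Delta V \approx 14.3` a
spontaneous crossing in that time is very unlikely, which is why path
sampling is needed in the first place.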

The system parameters are not hard-coded into the analysis -- they are
read back out of each case's input TOML (``read_system_params``), so the
analytical reference below tracks the actual runs, and every method case
is cross-checked to confirm it really models the same well.
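
The cross-check amounts to comparing a handful of numbers across cases. A
minimal sketch, assuming each case's input TOML has already been parsed
into a flat dictionary (for example with the standard-library
``tomllib``). The key names in ``WELL_KEYS`` and the helper
``check_same_well`` are hypothetical; they are not the layout or API used by
``read_system_params``.

.. code-block:: python

    import math

    # Hypothetical key names for the well and dynamics parameters of one case.
    WELL_KEYS = ("a", "b", "c", "temperature", "timestep", "gamma")


    def check_same_well(case_params, rel_tol=1e-12):
        """Return the shared parameter set, or raise ValueError naming the
        first case that models a different well.

        case_params maps case name -> dict of already-parsed parameters.
        """
        items = list(case_params.items())
        if not items:
            raise ValueError("no cases given")
        ref_name, ref = items[0]
        for name, params in items[1:]:
            for key in WELL_KEYS:
                if not math.isclose(params[key], ref[key], rel_tol=rel_tol):
                    raise ValueError(
                        f"{name}: {key}={params[key]} differs from "
                        f"{ref_name} ({ref[key]})"
                    )
        return {key: ref[key] for key in WELL_KEYS}

Returning the shared set lets the analytical reference be computed once from
parameters that every case has been shown to use.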

The analytical reference -- a known truth
=========================================

The double well is simple enough to have a **closed-form escape rate from
Kramers' theory**, so the suite checks the simulations against an
*absolute* truth, not only against each other.
``analytical_double_well_rate()`` evaluates it in three increasingly
complete forms:

.. table:: Analytical double-well escape rate (reduced units).

   ==========================  =========  ==========================================================
   Estimator                   Rate       Meaning
   ==========================  =========  ==========================================================
   ``k_TST``                   2.81e-07   transition-state theory, no recrossing (upper bound)
   ``k_Kramers`` (spatial)     2.61e-07   moderate/high-friction Kramers prefactor
   ``k_Kramers-MM`` (truth)    2.52e-07   + Mel'nikov--Meshkov turnover factor (**headline**)
   ==========================  =========  ==========================================================

With :math:`\beta\,\Delta V = 14.3` and reduced energy loss
:math:`\delta = 11.4` (the energy lost to friction over one barrier-energy
orbit, in units of :math:`k_B T`), the turnover factor is
:math:`\Upsilon = 0.97`. The system therefore sits firmly in the
spatial-diffusion regime, and the Kramers result is accurate to a few
percent. Converged cases should land on **2.5e-07** within their
statistical error. The absolute reference matters because a case can agree
with every *other* case yet sit several sigma from the analytical rate: a
shared systematic error that the self-consistency check alone cannot see.
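
The three rows of the table can be reproduced from the shared parameters
with textbook formulas. This is a sketch, not
``analytical_double_well_rate()`` itself. It assumes unit mass and
:math:`k_B = 1`, defines the well and barrier frequencies as
:math:`\omega_a = \sqrt{V''(x_\mathrm{min})}` and
:math:`\omega_b = \sqrt{|V''(0)|}`, takes the harmonic TST rate
:math:`k_\mathrm{TST} = \omega_a/(2\pi)\,e^{-\beta\Delta V}`, multiplies it
by the Kramers factor
:math:`(\sqrt{\gamma^2/4 + \omega_b^2} - \gamma/2)/\omega_b`, and then by
the Mel'nikov-Meshkov depopulation factor :math:`\Upsilon(\delta)`,
evaluated here by simple trapezoidal quadrature. The function name
``kramers_rates`` is hypothetical.

.. code-block:: python

    import math


    def kramers_rates(a=1.0, b=2.0, temperature=0.07, gamma=0.3, mass=1.0):
        """Return (k_TST, k_Kramers, k_Kramers_MM, upsilon) for V = a x^4 - b x^2 + c.

        Assumes k_B = 1 and gamma > 0 (at gamma = 0 the logarithm in the
        Mel'nikov-Meshkov integral is undefined).
        """
        beta = 1.0 / temperature
        barrier = b * b / (4.0 * a)          # Delta V
        omega_a = math.sqrt(4.0 * b / mass)  # V''(x_min) = 4b in the wells
        omega_b = math.sqrt(2.0 * b / mass)  # |V''(0)| = 2b at the barrier

        # Harmonic transition-state theory: no recrossing.
        k_tst = omega_a / (2.0 * math.pi) * math.exp(-beta * barrier)

        # Kramers moderate/high-friction transmission factor.
        kappa = (math.sqrt(0.25 * gamma**2 + omega_b**2) - 0.5 * gamma) / omega_b
        k_kramers = kappa * k_tst

        # Reduced energy loss delta = gamma * S / (k_B T), where S is the action
        # of the barrier-energy orbit in one well: 2 sqrt(2m) b^(3/2) / (3a).
        action = 2.0 * math.sqrt(2.0 * mass) * b**1.5 / (3.0 * a)
        delta = gamma * action * beta

        # Upsilon = exp[(1/pi) * integral of ln(1 - exp(-delta (x^2 + 1/4))) / (x^2 + 1/4)]
        # over the real line; the integrand is negligible beyond |x| = 10.
        n, half_width = 20000, 10.0
        h = 2.0 * half_width / n
        total = 0.0
        for i in range(n + 1):
            u = (-half_width + i * h) ** 2 + 0.25
            weight = 0.5 if i in (0, n) else 1.0  # trapezoidal end weights
            total += weight * math.log1p(-math.exp(-delta * u)) / u
        upsilon = math.exp(total * h / math.pi)

        return k_tst, k_kramers, upsilon * k_kramers, upsilon

With the default (shared-system) arguments this gives roughly 2.81e-07,
2.61e-07 and 2.52e-07 with :math:`\Upsilon \approx 0.97`, matching the
table. Because the parameters are arguments, the same arithmetic follows any
change made to the case inputs.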

The methods
===========

The same double well is sampled with every available strategy and through
every code path:

* the ``native_*`` cases run the in-process RETIS loop via ``pyretisrun``;
* ``infswap_wham`` runs the infinite-swapping sampler and is analysed with
  WHAM;
* the ``scheduler_*`` cases run a **native** config *through the
  infinite-swapping scheduler* (the Stage C native->scheduler route,
  ``PYRETIS_NATIVE_VIA_INFSWAP``). This route emits native-format output,
  so it is analysed exactly like a native case and must reproduce its
  native sibling's rate.

.. table:: Method cases on the shared double well.

   =====================  ===============================================================
   Case                   Strategy / what it adds
   =====================  ===============================================================
   ``native_sh``          standard shooting (``pyretisrun``, in-process loop)
   ``native_ss``          stone skipping (``pyretisrun``)
   ``native_wt``          web throwing (``pyretisrun``)
   ``native_wf``          wire fencing (``pyretisrun``)
   ``native_wf_ha``       wire fencing with high acceptance (``pyretisrun``)
   ``native_wf_cap_*``    wire fencing with a capped window (``[tis] interface_cap``)
   ``native_wt_sour_*``   web throwing with a shifted source sub-interface
                          (``[tis] interface_sour``)
   ``native_relshoot``    RETIS with relative per-ensemble shoot frequencies
                          (``[retis] relative_shoots``)
   ``native_*_pcg64``     the same native methods driven by the PCG64 generator
                          (``rgen = "pcg64"``) -- the A3.4 RNG-migration validation set
   ``infswap_wham``       infinite swapping (``pyretisrun``) analysed with WHAM --
                          a cross-check of the second sampler / code path
   ``scheduler_sh``       standard shooting routed through the infinite-swapping
                          scheduler (Stage C native->scheduler route; native-format
                          output, analysed like ``native_sh``)
   ``scheduler_sh_w3``    the same, with a **multi-worker** pool of three workers
                          (``PYRETIS_NATIVE_WORKERS=3``) -- the parallel-worker route
   ``scheduler_wf``       wire fencing routed through the scheduler (WHAM
                          ``Cxy/HA`` unweighting on the native route)
   ``scheduler_ss``       stone skipping routed through the scheduler
   ``scheduler_wt``       web throwing routed through the scheduler
   =====================  ===============================================================

The ``native_wf_cap_*``, ``native_wt_sour_*`` and ``native_relshoot``
cases (group ``params``) vary non-default TIS knobs that change the
*sampling* but not the *physics*, so each must still reach the same
analytical rate. The ``wf_convention`` group (``native_wf``/``native_wf_ha``,
``infswap_wham``, ``scheduler_wf``) cross-checks wire fencing across the
code paths. On the scheduler's native-output route the wire-fencing
occupancy is HA-weighted: as in infretis, the ``compute_weight`` crossing
count drives the swap. The native writer therefore applies the WHAM
``Cxy/HA`` unweighting (``Weight = compute_weight / frac``) to recover the
per-ensemble crossing probability; without it the rate over-counts by
about two orders of magnitude. ``scheduler_ss`` carries the same treatment
for stone skipping. The ``select`` key of the suite config takes group tags
such as ``wf_convention`` or ``params`` and runs only the matching subset,
while the other cases stay listed in the config.
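
The effect of ``select`` can be pictured as a filter over the ordered case
list. A minimal sketch, assuming each case carries a list of group tags; the
dictionary layout and the helper ``select_cases`` are hypothetical and are
not the driver's actual config schema.

.. code-block:: python

    def select_cases(cases, select=None):
        """Return the names of the cases to run, keeping config order.

        cases:  ordered list of {"name": str, "groups": [str, ...]}
                (hypothetical layout); it is never modified.
        select: optional list of group tags; None or empty runs everything.
        """
        if not select:
            return [case["name"] for case in cases]
        wanted = set(select)
        return [case["name"] for case in cases if wanted & set(case["groups"])]

Cases outside the selected groups are skipped for this invocation only, so a
later run without ``select`` still sees the full list.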

Together, the native ``sh/ss/wt/wf`` cases (analysed with the standard
|pyretis| crossing probability), the infinite-swapping ``infswap_wham``
case (analysed with WHAM) and the scheduler-routed cases (the Stage C path,
analysed as native) form a direct cross-check of the distinct code paths. The
``native_*_pcg64`` cases are the **A3.4 step 2** go/no-go: they run the
identical native methods with the canonical PCG64 generator instead of the
legacy MT19937 and must reproduce the MT19937 rate within statistical
error before the default generator is flipped (see ``MERGE_TODO.md``
A3.4). As the in-process and infinite-swapping code paths unify, the
``infswap_*`` and ``scheduler_*`` cases should collapse onto their native
equivalents.
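
The go/no-go has to be statistical rather than bitwise: the same seed
drives entirely different streams through the two generators, so the
trajectories differ and only the rates can agree. A minimal sketch, assuming
the two settings correspond to NumPy's legacy ``RandomState`` (MT19937) and
``Generator(PCG64)``. The helper ``rate_difference_sigma`` is hypothetical,
and the rates in it are illustrative, not validation results.

.. code-block:: python

    import math

    import numpy as np

    seed = 1
    mt_draws = np.random.RandomState(seed).random_sample(5)           # legacy MT19937
    pcg_draws = np.random.Generator(np.random.PCG64(seed)).random(5)  # PCG64


    def rate_difference_sigma(k1, sigma1, k2, sigma2):
        """Separation of two independent rate estimates in units of their
        combined standard error."""
        return abs(k1 - k2) / math.sqrt(sigma1**2 + sigma2**2)


    # Illustrative numbers only: an MT19937 run and a PCG64 run of one method.
    z = rate_difference_sigma(2.50e-7, 0.10e-7, 2.60e-7, 0.10e-7)

Combining the two errors in quadrature treats the runs as independent,
which holds when they share no trajectories.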

.. note::

   ``scheduler_ss`` (stone skipping through the scheduler) previously crashed
   the internal engine's streaming dump (``FileNotFoundError`` on
   ``ss_shoot.xyz``) because the move re-pointed a *persistent* path's
   phase point at the transient dump file. That is fixed (stone skipping
   now dumps a copy), and the case is enabled like the others.
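
The failure mode in the note is an aliasing bug, which a toy model makes
concrete. A minimal sketch with a hypothetical ``PhasePoint`` class and
illustrative file names; it is not the |pyretis| path or engine code.
Re-pointing the persistent object leaves it referring to a transient file
that may no longer exist when the path is read again, while re-pointing a
copy leaves the persistent path intact.

.. code-block:: python

    import copy
    from dataclasses import dataclass


    @dataclass
    class PhasePoint:
        """Hypothetical stand-in: a phase point that refers to a frame on disk."""
        config_file: str
        frame: int


    def dump_by_repointing(point, transient_file):
        # Buggy pattern: the persistent path's own phase point now refers to
        # the transient dump file.
        point.config_file = transient_file
        point.frame = 0
        return point


    def dump_a_copy(point, transient_file):
        # Fixed pattern: only a copy is re-pointed; the original is untouched.
        snapshot = copy.copy(point)
        snapshot.config_file = transient_file
        snapshot.frame = 0
        return snapshot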

Running it and reading the output
=================================

The suite is **not** a unit test and **not** a tutorial: the GitLab CI
cannot afford runs long enough to drive the rare-event statistics down, so
it is launched **manually, once, on a cluster**. It is driven by a
per-machine config, ``validation.toml`` -- an ordered list of cases with
their target cycle counts plus a per-machine ``seed`` and a
``reverse_list`` toggle. The first positional argument selects the action:

.. code-block:: bash

    cd examples/validation

    # usage helper (also printed when no action is given)
    python run_validation.py

    # show the recorded results -- READ-ONLY: runs nothing, writes nothing
    python run_validation.py status
    python run_validation.py status native_sh   # ... for one case

    # (re)analyse the runs already on disk and update the results table
    python run_validation.py analyze
    python run_validation.py analyze native_sh   # ... for one case

    # run every case listed in ./validation.toml, then analyse
    python run_validation.py run
    python run_validation.py run native_sh       # ... for one case

One-off overrides, usable with ``run``:

.. code-block:: bash

    python run_validation.py run --config machineB.toml  # different per-machine config
    python run_validation.py run --cycles 20000          # override every case's target
    python run_validation.py run --seed 2                # this machine's seed
    python run_validation.py run --jobs 8                # internal-engine cases in parallel

``--jobs N`` runs that many **internal-engine** (``methods/``) cases at
once; they are independent single-process runs, so this only uses more
cores and does **not** change any result (each case has its own directory
and seed). ``--cycles`` is a **target total**, not an increment -- a case
with existing output is continued up to that count, otherwise it starts
fresh from ``seed``. ``python run_validation.py --write-config
validation.toml`` writes a fresh config template with every case enabled.
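
The ``--cycles`` rule, together with the ``reverse_list`` toggle from the
config, can be summarised as a small planning step. A minimal sketch under
stated assumptions: ``completed`` maps a case to the cycles already on disk
(absent when there is no output), a case already past its target is assumed
to run zero further cycles, and ``reverse_list`` is assumed simply to
reverse the run order. The helper ``plan_runs`` is hypothetical.

.. code-block:: python

    def plan_runs(cases, target, completed, reverse_list=False):
        """Return [(case, cycles_to_run, fresh_start)] in run order.

        target:    case -> target *total* cycles (what --cycles overrides)
        completed: case -> cycles already on disk; missing means no output
        """
        order = list(reversed(cases)) if reverse_list else list(cases)
        plan = []
        for case in order:
            done = completed.get(case)
            if done is None:
                # No existing output: start fresh from the machine's seed.
                plan.append((case, target[case], True))
            else:
                # Existing output: continue up to the target total.
                plan.append((case, max(target[case] - done, 0), False))
        return plan

Treating ``--cycles`` as a total rather than an increment makes repeated
invocations idempotent: re-running with the same target does no extra work.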

Each case is analysed **as soon as it finishes**, so its rate is printed
as the suite progresses; a combined inventory, convergence plot and
comparison follow at the end. The analysis prints a **"Rate vs analytical
Kramers reference"** table (rate, ``k/k_ref``, ``|d|/sigma``, agree?) next
to the suite-mean table. The persistent summaries are written to
``validation_results.json`` (one entry per case: rate, relative error,
cycles, number of independent runs, agreement flag, plus provenance --
when, at which git commit, and whether the tree was dirty) and the
human-readable ``validation_results.html`` (the same table with the
convergence ``rate_vs_cycles.png`` embedded). A ``CHECK`` in the
comparison almost always means *not converged yet* -- raise that case's
``cycles`` and re-analyse.
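
The ``k/k_ref`` and ``|d|/sigma`` columns reduce to two divisions per case.
A minimal sketch against the headline ``k_Kramers-MM`` value; the 3-sigma
threshold behind the ``agrees`` flag and the helper name
``compare_to_reference`` are assumptions, not necessarily the driver's
criterion, and the inputs are illustrative.

.. code-block:: python

    K_REF = 2.52e-7  # headline Kramers + Mel'nikov-Meshkov rate (reduced units)


    def compare_to_reference(rate, sigma, k_ref=K_REF, max_sigma=3.0):
        """Return (k/k_ref, |d|/sigma, agrees) for one case's rate estimate."""
        if sigma <= 0.0:
            raise ValueError("sigma must be positive")
        ratio = rate / k_ref
        deviation = abs(rate - k_ref) / sigma
        return ratio, deviation, deviation <= max_sigma


    # Illustrative inputs: a converged-looking case and a biased one.
    good = compare_to_reference(2.55e-7, 0.05e-7)
    biased = compare_to_reference(3.20e-7, 0.10e-7)

Two cases biased the same way would pass a pairwise comparison with each
other yet both fail this check, which is exactly the shared systematic the
analytical reference is there to expose.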

See ``examples/validation/README.rst`` for the full driver reference,
including the two-machine combine-for-statistics workflow.