.. _developer-method-validation:

###############################################################
Method-equivalence validation: the double-well on one engine
###############################################################

This page records a reproducibility check that
**every sampling strategy and every code path** in |pyretis| reproduces the
**same** crossing probability and rate constant on **one simple system**,
and that the result matches an **absolute analytical truth** rather than only
agreeing with itself. The system is a 1D **double well** sampled on the
internal engine; the cases and the driver script live in
``examples/validation/methods/`` and are launched by
``examples/validation/run_validation.py``.

The point of the test is that the in-process RETIS loop, the
infinite-swapping sampler analysed with WHAM, and the native run routed
through the infinite-swapping scheduler (the Stage C route, including the
multi-worker pool) all integrate the *identical* potential, so if their rate
estimates agree -- and land on the analytical Kramers rate -- the sampling
machinery is reproducible across strategies and across code paths. This
complements the :ref:`cross-engine validation `, which checks engine
equivalence on a shared force field.

The shared system
=================

Every method case models the **same** 1D double well, so they must give the
same rate within statistical error:

* **Potential:** the quartic double well :math:`V = a x^4 - b x^2 + c` with
  ``a = 1`` and ``b = 2`` (so the barrier is
  :math:`\Delta V = b^2 / (4 a) = 1`), ``c = 0``.
* **Dynamics:** inertial **Langevin** at temperature **T = 0.07** with a time
  step **dt = 0.025** and friction **gamma = 0.3** (reduced units), unit
  mass. With these numbers :math:`\beta\,\Delta V = 14.3`, so escape over the
  barrier is a genuine rare event.
* **Order parameter:** the particle position, with **eight interfaces**
  spanning :math:`[-0.99 \ldots 1.0]`.

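
The barrier height and :math:`\beta\,\Delta V` quoted in the list above
follow directly from the potential parameters. The sketch below recomputes
them by hand; it is an illustration only, not the suite's own code. The
function names are hypothetical, and it assumes a Boltzmann constant of 1 in
reduced units.

.. code-block:: python

   import math

   # Parameters quoted in the list above (reduced units).
   A, B, C = 1.0, 2.0, 0.0
   TEMPERATURE = 0.07
   K_B = 1.0  # assumption: Boltzmann constant is 1 in reduced units


   def potential(x, a=A, b=B, c=C):
       """Quartic double well V(x) = a x^4 - b x^2 + c."""
       return a * x**4 - b * x**2 + c


   def well_summary(a=A, b=B, c=C, temperature=TEMPERATURE):
       """Return the minimum position, barrier height and beta * barrier."""
       # V'(x) = 4 a x^3 - 2 b x vanishes at x = 0 (barrier top)
       # and at x = +/- sqrt(b / (2 a)) (the two minima).
       x_min = math.sqrt(b / (2.0 * a))
       barrier = potential(0.0, a, b, c) - potential(x_min, a, b, c)
       beta = 1.0 / (K_B * temperature)
       return {
           "x_min": x_min,
           "barrier": barrier,
           "beta_barrier": beta * barrier,
       }


   summary = well_summary()
   print(summary)

With the values above the minima sit at :math:`x = \pm 1`, the barrier is 1,
and :math:`\beta\,\Delta V` rounds to 14.3, matching the numbers in the list.
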
The system parameters are not hard-coded into the analysis -- they are read
back out of each case's input TOML (``read_system_params``), so the
analytical reference below tracks the actual runs, and every method case is
cross-checked to confirm it really models the same well.

The analytical reference -- a known truth
=========================================

The double well is simple enough to have a
**closed-form escape rate from Kramers' theory**, so the suite checks the
simulations against an *absolute* truth, not only against each other.
``analytical_double_well_rate()`` evaluates it in three increasingly complete
forms:

.. table:: Analytical double-well escape rate (reduced units).

   ========================== ========= ==========================================================
   Estimator                  Rate      Meaning
   ========================== ========= ==========================================================
   ``k_TST``                  2.81e-07  transition-state theory, no recrossing (upper bound)
   ``k_Kramers`` (spatial)    2.61e-07  moderate/high-friction Kramers prefactor
   ``k_Kramers-MM`` (truth)   2.52e-07  + Mel'nikov--Meshkov turnover factor (**headline**)
   ========================== ========= ==========================================================

With :math:`\beta\,\Delta V = 14.3` and reduced energy loss
:math:`\delta = 11.4` the turnover factor is :math:`\Upsilon = 0.97` -- the
system sits firmly in the spatial-diffusion regime, so the Kramers result is
accurate to a few percent. Converged cases should land on **2.5e-07** within
their statistical error. A case can agree with every *other* case yet sit
several sigma from the analytical rate -- a shared systematic that the
self-consistency check alone cannot see.

The methods
===========

The same double well is sampled with every available strategy and through
every code path.
The ``native_*`` cases run the in-process RETIS loop via ``pyretisrun``;
``infswap_wham`` runs the infinite-swapping sampler and is analysed with
WHAM; the ``scheduler_*`` cases run a **native** config
*through the infinite-swapping scheduler* (the Stage C native->scheduler
route, ``PYRETIS_NATIVE_VIA_INFSWAP``), which emits native-format output and
is therefore analysed exactly like a native case and must reproduce its
native sibling's rate.

.. table:: Method cases on the shared double well.

   ===================== ===============================================================
   Case                  Strategy / what it adds
   ===================== ===============================================================
   ``native_sh``         standard shooting (``pyretisrun``, in-process loop)
   ``native_ss``         stone skipping (``pyretisrun``)
   ``native_wt``         web throwing (``pyretisrun``)
   ``native_wf``         wire fencing (``pyretisrun``)
   ``native_wf_ha``      wire fencing with high acceptance (``pyretisrun``)
   ``native_wf_cap_*``   wire fencing with a capped window (``[tis] interface_cap``)
   ``native_wt_sour_*``  web throwing with a shifted source sub-interface
                         (``[tis] interface_sour``)
   ``native_relshoot``   RETIS with relative per-ensemble shoot frequencies
                         (``[retis] relative_shoots``)
   ``native_*_pcg64``    the same native methods driven by the PCG64 generator
                         (``rgen = "pcg64"``) -- the A3.4 RNG-migration validation set
   ``infswap_wham``      infinite swapping (``pyretisrun``) analysed with WHAM -- a
                         cross-check of the second sampler / code path
   ``scheduler_sh``      standard shooting routed through the infinite-swapping
                         scheduler (Stage C native->scheduler route; native-format
                         output, analysed like ``native_sh``)
   ``scheduler_sh_w3``   the same, with a **multi-worker** pool of three workers
                         (``PYRETIS_NATIVE_WORKERS=3``) -- the parallel-worker route
   ``scheduler_wf``      wire fencing routed through the scheduler (WHAM ``Cxy/HA``
                         unweighting on the native route)
   ``scheduler_ss``      stone skipping routed through the scheduler
   ``scheduler_wt``      web throwing routed through the scheduler
   ===================== ===============================================================

The ``native_wf_cap_*`` / ``native_wt_sour_*`` / ``native_relshoot`` cases
(group ``params``) vary non-default TIS knobs that change the *sampling* but
not the *physics*, so each must still reach the same analytical rate.

The ``wf_convention`` group (``native_wf``/``native_wf_ha``,
``infswap_wham``, ``scheduler_wf``) cross-checks wire fencing across the code
paths. On the scheduler's native-output route the wf occupancy is HA-weighted
(the ``compute_weight`` crossing count drives the swap, as in infretis), so
the native writer applies the WHAM ``Cxy/HA`` unweighting
(``Weight = compute_weight / frac``) to recover the per-ensemble crossing
probability -- without it the rate over-counts by ~two orders of magnitude;
``scheduler_ss`` carries the same treatment for stone skipping.

The suite config's ``select`` key (group tags such as ``wf_convention`` or
``params``) runs a chosen subset without dropping the rest.

The native ``sh/ss/wt/wf`` cases (analysed with the standard |pyretis|
crossing probability), the infinite-swapping ``infswap_wham`` (analysed with
WHAM), and the scheduler-routed cases (the Stage C path, analysed as native)
are a direct cross-check of the distinct code paths. The ``native_*_pcg64``
cases are the **A3.4 step 2** go/no-go: they run the identical native methods
with the canonical PCG64 generator instead of the legacy MT19937 and must
reproduce the MT19937 rate within statistical error before the default
generator is flipped (see ``MERGE_TODO.md`` A3.4). As the in-process and
infinite-swapping code paths unify, the ``infswap_*`` and ``scheduler_*``
cases should collapse onto their native equivalents.

.. note::

   ``scheduler_ss`` (stone skipping through the scheduler) earlier crashed
   the internal engine's streaming dump (``FileNotFoundError`` on
   ``ss_shoot.xyz``) because the move re-pointed a *persistent* path's phase
   point at the transient dump file. That is fixed (stone skipping now dumps
   a copy), and the case is enabled like the others.

Running it and reading the output
=================================

The suite is **not** a unit test and **not** a tutorial: the GitLab CI cannot
afford runs long enough to drive the rare-event statistics down, so it is
launched **manually, once, on a cluster**. It is driven by a per-machine
config, ``validation.toml`` -- an ordered list of cases with their target
cycle counts plus a per-machine ``seed`` and a ``reverse_list`` toggle. The
first positional argument selects the action:

.. code-block:: pyretis

   cd examples/validation

   # usage helper (also printed when no action is given)
   python run_validation.py

   # show the recorded results -- READ-ONLY: runs nothing, writes nothing
   python run_validation.py status
   python run_validation.py status native_sh     # ... for one case

   # (re)analyse the runs already on disk and update the results table
   python run_validation.py analyze
   python run_validation.py analyze native_sh    # ... for one case

   # run every case listed in ./validation.toml, then analyse
   python run_validation.py run
   python run_validation.py run native_sh        # ... for one case

One-off overrides, usable with ``run``:

.. code-block:: pyretis

   python run_validation.py run --config machineB.toml   # different per-machine config
   python run_validation.py run --cycles 20000           # override every case's target
   python run_validation.py run --seed 2                 # this machine's seed
   python run_validation.py run --jobs 8                 # internal cases in parallel

``--jobs N`` runs that many **internal-engine** (``methods/``) cases at once;
they are independent single-process runs, so this only uses more cores and
does **not** change any result (each case has its own directory and seed).
``--cycles`` is a **target total**, not an increment -- a case with existing
output is continued up to that count, otherwise it starts fresh from
``seed``. ``python run_validation.py --write-config validation.toml`` writes
a fresh config template with every case enabled.

Each case is analysed **as soon as it finishes** -- its rate prints on the
go -- and a combined inventory, convergence plot and comparison follow at the
end. The analysis prints a **"Rate vs analytical Kramers reference"** table
(rate, ``k/k_ref``, ``|d|/sigma``, agree?) next to the suite-mean table. The
persistent summaries land in ``validation_results.json`` (one entry per case:
rate, relative error, cycles, number of independent runs, agreement flag,
plus provenance -- when, at which git commit, and whether the tree was dirty)
and the human-readable ``validation_results.html`` (the same table with the
convergence ``rate_vs_cycles.png`` embedded). A ``CHECK`` in the comparison
almost always means *not converged yet* -- raise that case's ``cycles`` and
re-analyse.

See ``examples/validation/README.rst`` for the full driver reference,
including the two-machine combine-for-statistics workflow.

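
The ``k/k_ref`` and ``|d|/sigma`` columns of the comparison table can be
reproduced by hand from a case's rate and relative error. The sketch below
shows one plausible way to do that; it is not the suite's implementation. The
function name, the linear (rather than logarithmic) deviation, the treatment
of the reference as exact, and the two-sigma threshold are all assumptions,
and the example inputs are made up to exercise the function rather than
taken from any run.

.. code-block:: python

   def compare_to_reference(rate, rel_error, k_ref, n_sigma=2.0):
       """Return (k/k_ref, |d|/sigma, agrees) for one case.

       rel_error is the relative statistical error of rate; the reference
       is treated as exact. n_sigma is a hypothetical agreement threshold.
       """
       if rate <= 0.0 or rel_error <= 0.0 or k_ref <= 0.0:
           raise ValueError("rate, rel_error and k_ref must be positive")
       sigma = rel_error * rate              # absolute error on the rate
       ratio = rate / k_ref                  # the k/k_ref column
       deviation = abs(rate - k_ref) / sigma  # the |d|/sigma column
       return ratio, deviation, deviation <= n_sigma


   K_REF = 2.52e-07  # k_Kramers-MM from the analytical table

   # Made-up inputs, not results of any run: a rate of 2.80e-07 with a
   # 10 % relative error.
   ratio, deviation, agrees = compare_to_reference(2.80e-07, 0.10, K_REF)
   print(f"k/k_ref = {ratio:.2f}, |d|/sigma = {deviation:.2f}, agree = {agrees}")

A case whose deviation stays large after more cycles points at a systematic
difference rather than at missing statistics.
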