pyretis.analysis package¶

This package defines analysis tools for the PyRETIS program.

The analysis tools are intended to be used for analysis of the simulation output from the PyRETIS program. The typical use of this package is in post-processing of the results from a simulation (or several simulations).

Package structure¶

Modules¶

__init__.py: This file, which imports from the other modules. The method to analyse results from MD flux simulations is defined here since it makes use of analysis tools from energy_analysis.py and order_analysis.py.
analysis.py (pyretis.analysis.analysis): General methods for numerical analysis.
energy_analysis.py (pyretis.analysis.energy_analysis): Defines methods useful for analysing the energy output.
flux_analysis.py (pyretis.analysis.flux_analysis): Defines methods for analysis of crossings for flux data.
histogram.py (pyretis.analysis.histogram): Defines methods useful for generating histograms.
order_analysis.py (pyretis.analysis.order_analysis): Defines methods useful for analysis of order parameters.
path_analysis.py (pyretis.analysis.path_analysis): Defines methods for analysis of path ensembles.
repptis_analysis.py (pyretis.analysis.repptis_analysis): Defines methods for local and global crossing probabilities of REPPTIS partial paths.
repptis_msm.py (pyretis.analysis.repptis_msm): Defines methods for Markov-state-model kinetics of REPPTIS simulations.

Important methods defined in this package¶

analyse_energies (analyse_energies()): Analyse energy data from a simulation. It will calculate a running average, a distribution and do a block error analysis.
analyse_flux (analyse_flux()): Analyse flux data from a MD flux simulation. It will calculate a running average, a distribution and do a block error analysis.
analyse_orderp (analyse_orderp()): Analyse order parameter data. It will calculate a running average, a distribution and do a block error analysis. In addition, it will analyse the mean square displacement (if requested).
analyse_path_ensemble (analyse_path_ensemble()): Analyse the results from a single path ensemble. It will calculate a running average of the probabilities, a crossing probability, perform a block error analysis, analyse lengths of paths, type of Monte Carlo moves and calculate an efficiency.
match_probabilities (match_probabilities()): Method to match probabilities from several path simulations. Useful for obtaining the overall crossing probability.
histogram (histogram()): Generate a histogram; essentially a wrapper around numpy’s histogram.
match_all_histograms (match_all_histograms()): Method to match histograms from umbrella simulations.
retis_flux (retis_flux()): Method for calculating the initial flux for RETIS simulations.
retis_rate (retis_rate()): Method for calculating the rate constant for RETIS simulations.
construct_M (construct_M()): Build the partial-path (REPPTIS) MSM transition matrix from the local crossing probabilities.
global_pcross_msm (global_pcross_msm()): Global crossing probability from a REPPTIS MSM transition matrix.
mfpt_to_first_last_state (mfpt_to_first_last_state()): Mean first passage time between the boundary states of a REPPTIS MSM (used for the flux and rate of slow permeation processes).

pyretis.analysis.analyse_md_flux(crossdata, energydata, orderdata, settings)[source]¶

Analyse the output from a MD-flux simulation.

The obtained results will be returned as a convenient structure for plotting or reporting.

Parameters:

crossdata (numpy.array) – This is the data containing information about crossings.
energydata (numpy.array) – This is the raw data for the energies.
orderdata (numpy.array) – This is the raw data for the order parameter.
settings (dict) – The settings for the analysis (e.g. block length for error analysis) and some settings from the simulation (interfaces, time step etc.).

Returns:

results (dict) – This dict contains the results from the different analyses. It can be used further for plotting or for generating reports.

pyretis.analysis.analysis module¶

Module defining functions useful in the analysis of simulation data.

Important methods defined here¶

running_average (running_average()): Method to calculate a running average.
block_error (block_error()): Perform block error analysis.
block_error_corr (block_error_corr()): Method to run a block error analysis and calculate relative errors and correlation length.

pyretis.analysis.analysis.analyse_data(data, settings)[source]¶

Analyse the given data and run some common analysis procedures.

Specifically, it will:

Calculate a running average.
Obtain a histogram.
Run a block error analysis.

Parameters:

data (numpy.array, 1D) – This numpy.array contains the data as a function of time.
settings (dict) – This dictionary contains settings for the analysis.

Returns:

result (dict) – This dict contains the results.

pyretis.analysis.analysis.block_error(data, maxblock=None, blockskip=1, weights=None)[source]¶

Perform block error analysis.

This function will estimate the standard deviation in the input data by performing a block analysis. The maximum block length to consider can be specified; otherwise it is taken as half the length of the input data. Averages and variances are calculated using an on-the-fly algorithm [1].

Parameters:

data (numpy.array (or iterable with data points)) – The data to analyse.
maxblock (int, optional) – Can be used to set the maximum length of the blocks to consider. Note that the maxblock will never be set longer than half the length in data.
blockskip (int, optional) – This can be used to skip certain block lengths, i.e. blockskip = 1 will consider all blocks up to maxblock, while blockskip = n will consider every n’th block up to maxblock, i.e. it will use block lengths equal to 1, 1 + n, 1 + 2*n, and so on.

Returns:

blocklen (numpy.array) – The block lengths considered.
block_avg (numpy.array) – The averages as a function of the block length.
block_err (numpy.array) – Estimate of errors as a function of the block length.
block_err_avg (float) – Average of the error estimate using blocks where length > maxblock//2.
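The on-the-fly averaging mentioned above can be illustrated with Welford's update, a common single-pass scheme for the mean and variance. Whether block_error uses exactly this variant is an assumption; the class name RunningStats is hypothetical and not part of PyRETIS.

```python
class RunningStats:
    """Accumulate the mean and variance one sample at a time (Welford)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # Sum of squared deviations from the current mean.

    def add(self, value):
        """Update the running mean and the squared-deviation sum."""
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        # Uses the deviation from the old mean and from the new mean.
        self.m2 += delta * (value - self.mean)

    def variance(self):
        """Return the sample variance (n - 1 in the denominator)."""
        return self.m2 / (self.count - 1) if self.count > 1 else 0.0
```

The point of the update is that no sample needs to be stored, which is what makes it usable while looping over blocks.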

References

pyretis.analysis.analysis.block_error_corr(data, maxblock=None, blockskip=1)[source]¶

Run block error analysis on the given data.

This will run the block error analysis and return the relative errors and correlation length.

Parameters:

data (numpy.array) – Data to analyse.
maxblock (int, optional) – The maximum block length to consider.
blockskip (int, optional) – This can be used to skip certain block lengths, i.e. blockskip = 1 will consider all blocks up to maxblock, while blockskip = n will consider every n’th block up to maxblock, i.e. it will use block lengths equal to 1, 1 + n, 1 + 2*n, and so on.

Returns:

out[0] (numpy.array) – The block lengths considered (blen).
out[1] (numpy.array) – Estimate of errors as a function of the block length (berr).
out[2] (float) – Average of the error estimate for blocks (berr_avg) with length > maxblock // 2.
out[3] (numpy.array) – Estimate of relative errors normalised by the overall average as a function of block length (rel_err).
out[4] (float) – The average relative error (avg_rel_err), for blocks with length > maxblock // 2.
out[5] (numpy.array) – The estimated correlation length as a function of the block length (ncor).
out[6] (float) – The average (for blocks with length > maxblock // 2) estimated correlation length (avg_ncor).
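A minimal sketch of the block analysis described for block_error and block_error_corr, written with the standard library only. The function name block_error_sketch is hypothetical, and the error formula (standard error of the block averages) is an assumption about the estimator; only the handling of maxblock, blockskip and the averaging over lengths > maxblock // 2 follows the parameter descriptions above.

```python
from math import nan, sqrt
from statistics import mean, variance


def block_error_sketch(data, maxblock=None, blockskip=1):
    """Estimate the error of the mean for a range of block lengths."""
    data = list(data)
    half = len(data) // 2
    # maxblock is never allowed to exceed half the data length.
    if maxblock is None or maxblock > half:
        maxblock = half
    blocklen, block_err = [], []
    # Block lengths 1, 1 + blockskip, 1 + 2 * blockskip, ...
    for length in range(1, maxblock + 1, blockskip):
        nblock = len(data) // length  # Always >= 2 since length <= half.
        averages = [mean(data[i * length:(i + 1) * length])
                    for i in range(nblock)]
        # Assumed estimator: standard error of the block averages.
        block_err.append(sqrt(variance(averages) / nblock))
        blocklen.append(length)
    long_blocks = [err for length, err in zip(blocklen, block_err)
                   if length > maxblock // 2]
    block_err_avg = mean(long_blocks) if long_blocks else nan
    return blocklen, block_err, block_err_avg
```

For correlated data the error estimate typically grows with the block length and then levels off, which is why the average is taken over the longer blocks only.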

pyretis.analysis.analysis.mean_square_displacement(data, ndt=None)[source]¶

Calculate the mean square displacement for the given data.

Parameters:

data (numpy.array, 1D) – This numpy.array contains the data as a function of time.
ndt (int, optional) – The number of time origins, i.e. points up to ndt will be used as time origins. If not specified, data.size // 5 will be used.

Returns:

msd (numpy.array, 2D) – The first column is the mean squared displacement and the second column is the corresponding standard deviation.

pyretis.analysis.analysis.running_average(data)[source]¶

Create a running average of the given data.

The running average will be calculated over the rows.

Parameters:

data (numpy.array) – This is the data we will average.

Returns:

out (numpy.array) – The running average.
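As a small illustration of what a running average is, the sketch below handles the 1D case with the standard library. The name running_average_sketch is hypothetical; the actual function operates on numpy arrays and averages over the rows.

```python
from itertools import accumulate


def running_average_sketch(data):
    """Return the average of the first 1, 2, ..., n data points."""
    # accumulate() yields the cumulative sums; divide each by its count.
    return [total / (i + 1) for i, total in enumerate(accumulate(data))]
```

The last element of the result equals the plain average of all the data.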

pyretis.analysis.energy_analysis module¶

Methods for analysing energy data from simulations.

Important methods defined here¶

analyse_energies (analyse_energies()): Run the analysis for energies (kinetic, potential etc.).

pyretis.analysis.energy_analysis.analyse_energies(energies, settings)[source]¶

Run the energy analysis on several energy types.

The function will run the energy analysis on several energy types and collect the energies into a structure which is convenient for plotting the results.

Parameters:

energies (dict) – This dict contains the energies to analyse.
settings (dict) – This dictionary contains settings for the analysis.

Returns:

results (dict) – For each energy key results[key] contains the result from the energy analysis.

pyretis.analysis.flux_analysis module¶

Methods for analysis of crossings for flux data.

Important methods defined here¶

analyse_flux (analyse_flux()): Run analysis for simulation flux data. This will calculate the initial flux for a simulation.

pyretis.analysis.flux_analysis.analyse_flux(fluxdata, settings)[source]¶

Run the analysis on the given flux data.

This will run the flux analysis and collect the results into a structure which is convenient for plotting and reporting the results.

Parameters:

fluxdata (list of tuples of integers) – This array contains the data obtained from a MD simulation for the fluxes.
settings (dict) – This dict contains the settings for the analysis. Note that this dictionary also needs some settings from the simulation, in particular the number of cycles, the interfaces and information about the time step.

Returns:

results (dict) – This dict contains the results from the flux analysis. The keys are defined in the results variable.

pyretis.analysis.flux_analysis.find_crossings(order, interfaces)[source]¶

Find crossings with interfaces for given order parameter data.

Parameters:

order (numpy.array (1D)) – Order parameters, as a function of time.
interfaces (list of floats) – The interfaces for which we will investigate crossings.

Returns:

out (list of tuple) – Each tuple contains a crossing in the following form: (step, interface-number, direction), where direction = ‘+’ if the interface was crossed while moving to the right and ‘-’ if the movement was towards the left.
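The crossing detection can be sketched as a comparison of consecutive order parameter values against each interface. The name find_crossings_sketch is hypothetical, and two details are assumptions rather than documented behaviour: the step reported is the index of the point after the crossing, and a value exactly on an interface counts as being on its left.

```python
def find_crossings_sketch(order, interfaces):
    """Return (step, interface-number, direction) for each crossing."""
    crossings = []
    for step in range(1, len(order)):
        previous, current = order[step - 1], order[step]
        for number, interface in enumerate(interfaces):
            if previous <= interface < current:
                # Moved from the left side to the right side.
                crossings.append((step, number, '+'))
            elif previous > interface >= current:
                # Moved from the right side back to the left side.
                crossings.append((step, number, '-'))
    return crossings
```

A single step can cross several interfaces, in which case one tuple is produced per interface.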

pyretis.analysis.histogram module¶

Histogram functions for data analysis.

This module defines some simple functions for histograms.

Important methods defined here¶

histogram (histogram()): Create a histogram from given data.
match_all_histograms (match_all_histograms()): Function to match histograms, for instance from an umbrella sampling simulation.
histogram_and_avg (histogram_and_avg()): Create a histogram and return bins, midpoints and simple statistics.

pyretis.analysis.histogram.histogram(data, bins=10, limits=(-1, 1), density=False, weights=None)[source]¶

Create a histogram of the given data.

Parameters:

data (numpy.array) – The data for making the histogram.
bins (int, optional) – The number of bins to divide the data into.
limits (tuple/list, optional) – The (min, max) values to consider.
density (boolean, optional) – If True the histogram will be normalized.
weights (numpy.array, optional) – Weighting factors for data.

Returns:

hist (numpy.array) – The binned counts.
bins (numpy.array) – The edges of the bins.
bin_mid (numpy.array) – The midpoint of the bins.
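To make the three return values concrete, here is a pure-Python sketch of equal-width binning. The actual function wraps numpy's histogram; histogram_sketch is a hypothetical stand-in that ignores weights and may place values that lie within rounding error of a bin edge differently.

```python
def histogram_sketch(data, bins=10, limits=(-1, 1), density=False):
    """Return the counts, the bin edges and the bin midpoints."""
    low, high = limits
    width = (high - low) / bins
    edges = [low + i * width for i in range(bins + 1)]
    counts = [0.0] * bins
    for value in data:
        if low <= value <= high:
            # The last bin also includes its right edge.
            index = min(int((value - low) / width), bins - 1)
            counts[index] += 1
    total = sum(counts)
    if density and total > 0:
        # Normalise so that the histogram integrates to one.
        counts = [count / (total * width) for count in counts]
    bin_mid = [0.5 * (edges[i] + edges[i + 1]) for i in range(bins)]
    return counts, edges, bin_mid
```

Values outside limits are dropped, so the counts need not sum to the number of data points.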

pyretis.analysis.histogram.histogram_and_avg(data, bins, density=True, weights=None)[source]¶

Create a histogram and return bins, midpoints and simple statistics.

The simple statistics include the mean value and the standard deviation. The return structure is useful for plotting routines. The midpoints returned are the midpoints of the bins.

Parameters:

data (either 1D numpy.array or 2D numpy.array) – This is the data to create the histogram from. The second dimension, if present, contains the weights.
bins (int) – The number of bins to use for the histogram.
density (boolean, optional) – If density is true, the histogram will be normalized.
weights (numpy.array, optional) – Weighting factors for data. Not used if data contains them.

Returns:

out[0] (numpy.array) – The histogram (frequency) values.
out[1] (numpy.array) – The midpoints for the bins.
out[2] (tuple of floats) – Some simple statistics: out[2][0] is the average and out[2][1] is the standard deviation.

pyretis.analysis.histogram.match_all_histograms(histograms, umbrellas)[source]¶

Match several histograms from an umbrella sampling.

Parameters:

histograms (list of numpy.arrays) – The histograms to match.
umbrellas (list of lists) – The umbrella windows used in the computation.

Returns:

histograms_s (list of numpy.arrays) – The scaled histograms.
scale_factor (list of floats) – The scale factors.
matched_count (numpy.array) – Count for overall matched histogram (an “averaged” histogram).

pyretis.analysis.order_analysis module¶

Module defining functions for analysis of order parameters.

Important methods defined here¶

analyse_orderp (analyse_orderp()): Run a simple order parameter analysis.

pyretis.analysis.order_analysis.analyse_orderp(orderdata, settings)[source]¶

Run the analysis on several order parameters.

The results are collected into a structure which is convenient for plotting.

Parameters:

orderdata (numpy.arrays) – The data read from the order parameter file.
settings (dict) – This dictionary contains settings for the analysis.

Returns:

results (numpy.array) – For each order parameter key, results[key] contains the result of the analysis.

pyretis.analysis.path_analysis module¶

Methods for analysis of path ensembles.

Important methods defined here¶

analyse_path_ensemble (analyse_path_ensemble()): Method to analyse a path ensemble. It will calculate crossing probabilities and information about moves etc. This method can be applied to files as well as path ensemble objects.
analyse_path_ensemble_object (analyse_path_ensemble_object()): Method to analyse a path ensemble. It will calculate crossing probabilities and information about moves etc. This method is intended to work directly on path ensemble objects.
match_probabilities (match_probabilities()): Match probabilities from several path ensembles and calculate efficiencies and the error for the matched probability.
retis_flux (retis_flux()): Calculate the initial flux with errors for a RETIS simulation.
retis_rate (retis_rate()): Calculate the rate constant with errors for a RETIS simulation.

pyretis.analysis.path_analysis.analyse_path_ensemble(path_ensemble, settings)[source]¶

Analyse a path ensemble.

This function will make use of the different analysis functions and analyse a path ensemble. This function is more general than the analyse_path_ensemble_object function in that it should work on both PathEnsemble and PathEnsembleFile objects. The running average is updated on-the-fly, see Wikipedia for details [wikimov].

Parameters:

path_ensemble (object like PathEnsemble) – This is the path ensemble to analyse.
settings (dict) – This dictionary contains settings for the analysis. We make use of the following keys:
- ngrid: The number of grid points for calculating the crossing probability as a function of the order parameter.
- maxblock: The maximum length of the blocks for the block error analysis. Note that this is at most half the length of the data, see block_error in .analysis.
- blockskip: Can be used to skip certain block lengths. A blockskip equal to n will consider every n’th block up to maxblock, i.e. it will use block lengths equal to 1, 1+n, 1+2n, etc.
- bins: The number of bins to use for creating histograms.

Returns:

out (dict) – This dictionary contains the main results for the analysis which can be used for plotting or other kinds of output.

pyretis.analysis.repptis_analysis module¶

This module ports the local/global crossing-probability analysis of REPPTIS partial paths from tistools (see [vervust2026loc]).

Local and global crossing probabilities for partial path TIS (REPPTIS).

A REPPTIS run samples short partial paths, each classified by where it enters and leaves the ensemble: LML / LMR / RML / RMR (Left/Right entry, Middle, Left/Right exit). From the weighted counts of these path types this module computes:

the local crossing probabilities of one ensemble (local_crossing_probabilities()) – the per-ensemble p_mm (LML) / p_mp (LMR) / p_pm (RML) / p_pp (RMR) that feed the REPPTIS Markov state model (pyretis.analysis.repptis_msm), and
the global crossing probability across all ensembles (global_crossing_probabilities_from_local()) via the closed-form recursion, an alternative to the MSM that does not need the transition-time vector.

This is a faithful port of the local/global crossing-probability part of the tistools repptis_analysis module (the analysis of Vervust, Wils, Safaei, Zhang and Ghysels, J. Chem. Theory Comput. 2026); the path-type weighting is preserved, only adapted to consume the lmrs / weights arrays that pyretis.analysis.path_analysis.analyse_repptis_ensemble() already produces, rather than the tistools reader objects.

References

[vervust2026loc]

W. Vervust, E. Wils, S. Safaei, D. T. Zhang and A. Ghysels, “Estimating full path lengths and kinetics from partial path transition interface sampling simulations”, J. Chem. Theory Comput. (2026), https://doi.org/10.1021/acs.jctc.5c01498.

pyretis.analysis.repptis_analysis.global_crossing_probabilities_from_local(p_minus_plus, p_minus_minus, p_plus_plus, p_plus_minus)[source]¶

Return global crossing probabilities from the local ones.

Combine the per-ensemble local crossing probabilities into the global crossing probabilities using the closed-form REPPTIS recursion

\[P^+_j = \frac{p^{LMR}_{j-1} P^+_{j-1}} {p^{LMR}_{j-1} + p^{LML}_{j-1} P^-_{j-1}}, \quad P^-_j = \frac{p^{RML}_{j-1} P^-_{j-1}} {p^{LMR}_{j-1} + p^{LML}_{j-1} P^-_{j-1}},\]

with \(P^+_0 = P^-_0 = 1\). P_cross is the corresponding TIS probability of crossing interface i + 1.

Parameters:

p_minus_plus (sequence of float) – Per-ensemble local probability of type LMR (forward crossing).
p_minus_minus (sequence of float) – Per-ensemble local probability of type LML (return left).
p_plus_plus (sequence of float) – Per-ensemble local probability of type RMR (return right).
p_plus_minus (sequence of float) – Per-ensemble local probability of type RML (cross left).

Returns:

p_min (list of float) – Per ensemble i, the probability of reaching state A before interface i + 1 given a crossing of i.
p_plus (list of float) – Per ensemble i, the probability of reaching interface i + 1 before A given a crossing of i.
p_cross (list of float) – Per ensemble i, the TIS probability of crossing i + 1.

Notes

If any input contains nan, or a denominator vanishes, the function returns [nan, nan, nan] – matching the upstream behaviour, so a degenerate ensemble does not silently produce a spurious finite rate.
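The recursion above can be written out directly. This is a sketch under stated assumptions: the name global_from_local_sketch is hypothetical, element k of each input list is taken to belong to ensemble k, all denominators are assumed nonzero (the nan handling described in the Notes is left out), and P_cross is not computed.

```python
def global_from_local_sketch(p_lmr, p_lml, p_rml):
    """Return the lists P^-_j and P^+_j, starting from P_0 = 1."""
    p_plus, p_min = [1.0], [1.0]
    for lmr, lml, rml in zip(p_lmr, p_lml, p_rml):
        # Shared denominator, built from the values at index j - 1.
        denom = lmr + lml * p_min[-1]
        new_plus = lmr * p_plus[-1] / denom
        new_min = rml * p_min[-1] / denom
        p_plus.append(new_plus)
        p_min.append(new_min)
    return p_min, p_plus
```

Both new values are computed before either list is extended, so each step only uses quantities with index j - 1, as in the formula.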

pyretis.analysis.repptis_analysis.local_crossing_probabilities(lmrs, weights, tr=False)[source]¶

Return the local crossing probabilities of one REPPTIS ensemble.

The accepted partial paths of an [i^+-] / [0^+-'] ensemble are classified by type (RMR / RML / LMR / LML) and weighted by their Monte-Carlo weights. The local crossing probability of a type is its weight divided by the total weight of paths entering from the same side: paths entering from the left (LMR + LML) and from the right (RMR + RML) are normalised separately. This routine is not for the [0^-'] ensemble.

Parameters:

lmrs (sequence of string) – One "LMR"-style type tag per accepted path (the joined left/middle/right interface flags). Tags that are not one of PATH_TYPES (partial * tags) contribute zero weight.
weights (sequence of int or float) – The Monte-Carlo weight of each accepted path (same length and order as lmrs).
tr (boolean, optional) – If True, apply infinite time-reversal reweighting: double the RMR and LML weights and symmetrise RML / LMR. Default is False.

Returns:

p (dict) – Local crossing probabilities keyed by path type (RMR / RML / LMR / LML), plus the milestoning arrival probabilities 2R / 2L. A probability is nan when its normalising weight is zero.
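The normalisation described above, where left-entering and right-entering paths are normalised separately, can be sketched as follows. The name local_probabilities_sketch is hypothetical; the time-reversal option tr and the milestoning entries 2R / 2L are deliberately left out.

```python
from math import nan

PATH_TYPES = ("LML", "LMR", "RML", "RMR")


def local_probabilities_sketch(lmrs, weights):
    """Return the weighted fraction of each path type per entry side."""
    totals = dict.fromkeys(PATH_TYPES, 0.0)
    for tag, weight in zip(lmrs, weights):
        if tag in totals:  # Other tags contribute zero weight.
            totals[tag] += weight
    left = totals["LML"] + totals["LMR"]    # Paths entering from the left.
    right = totals["RML"] + totals["RMR"]   # Paths entering from the right.
    return {
        "LML": totals["LML"] / left if left else nan,
        "LMR": totals["LMR"] / left if left else nan,
        "RML": totals["RML"] / right if right else nan,
        "RMR": totals["RMR"] / right if right else nan,
    }
```

By construction the LML and LMR entries sum to one, and likewise RML and RMR, whenever the corresponding side has been sampled.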

pyretis.analysis.repptis_analysis.path_type_taus(orders_list, lmrs, weights, interfaces)[source]¶

Return the weighted-average transition times per path type.

For every accepted path the total interior time (tau_total()), the time before the middle crossing (tau_before_middle()) and the time after it (tau_after_middle()) are computed and weight-averaged per path type. The middle (recrossing) time taum is the remainder tau - tau1 - tau2. These per-type, per-ensemble averages feed the REPPTIS MSM mean-first-passage-time vector (pyretis.analysis.repptis_msm.construct_tau_vector()).

Parameters:

orders_list (sequence of numpy.ndarray) – One order-parameter trajectory per accepted path.
lmrs (sequence of string) – The path type of each accepted path (same length/order).
weights (sequence of int or float) – The Monte-Carlo weight of each accepted path.
interfaces (sequence of float) – The ensemble’s (left, middle, right) interfaces.

Returns:

dict – Maps each path type (LML / LMR / RML / RMR) to a dict with keys tau / tau1 / tau2 / taum (the weighted averages, or nan when the type is unsampled).
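The weight-averaging step can be sketched in isolation, assuming the three times of every path have already been computed. The name weighted_taus_sketch and its argument layout are hypothetical; only the per-type weighted average and the relation taum = tau - tau1 - tau2 follow the description above.

```python
from math import nan


def weighted_taus_sketch(lmrs, weights, taus, taus1, taus2):
    """Return the weighted-average times for each path type."""
    result = {}
    for ptype in ("LML", "LMR", "RML", "RMR"):
        rows = [(w, t, t1, t2)
                for tag, w, t, t1, t2 in zip(lmrs, weights, taus, taus1, taus2)
                if tag == ptype]
        total = sum(row[0] for row in rows)
        if total == 0:
            # Unsampled path type: report nan for every time.
            result[ptype] = dict.fromkeys(("tau", "tau1", "tau2", "taum"), nan)
            continue
        tau = sum(w * t for w, t, _, _ in rows) / total
        tau1 = sum(w * t1 for w, _, t1, _ in rows) / total
        tau2 = sum(w * t2 for w, _, _, t2 in rows) / total
        result[ptype] = {"tau": tau, "tau1": tau1, "tau2": tau2,
                         "taum": tau - tau1 - tau2}
    return result
```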

pyretis.analysis.repptis_analysis.tau_after_middle(orders, ptype, interfaces)[source]¶

Return the steps a path takes after its last middle crossing.

The mirror of tau_before_middle(), measured from the end of the path inward: for a path that exits left (LML / RML) from the left interface, for a path that exits right (LMR / RMR) from the right interface.

Parameters:

orders (numpy.ndarray) – The per-phase-point order parameters of the path.
ptype (string) – The path type (LMR / LML / RML / RMR).
interfaces (sequence of float) – The ensemble’s (left, middle, right) interfaces.

Returns:

int – The number of steps after the last middle crossing.

Raises:

ValueError – If the path type is not recognised.

pyretis.analysis.repptis_analysis.tau_before_middle(orders, ptype, interfaces)[source]¶

Return the steps a path takes to first cross the middle interface.

For a left-entering path (LMR / LML) this is the number of steps between the first crossing of the left interface interfaces[0] and the first crossing of the middle interface interfaces[1]; for a right-entering path (RML / RMR) it is measured from the right interface interfaces[2] inward.

Parameters:

orders (numpy.ndarray) – The per-phase-point order parameters of the path; column 0 is the main order parameter.
ptype (string) – The path type (LMR / LML / RML / RMR).
interfaces (sequence of float) – The ensemble’s (left, middle, right) interfaces.

Returns:

int – The number of steps before the middle interface is crossed.

Raises:

ValueError – If the path type is not recognised.

pyretis.analysis.repptis_analysis.tau_total(orders, ptype, interfaces)[source]¶

Return the path length between its first and last boundary crossing.

The total time the path spends inside the ensemble, excluding the leading segment before the entry crossing and the trailing segment after the exit crossing. Equals tau_before_middle + tau_after_middle + the middle (recrossing) time.

Parameters:

orders (numpy.ndarray) – The per-phase-point order parameters of the path.
ptype (string) – The path type (LMR / LML / RML / RMR).
interfaces (sequence of float) – The ensemble’s (left, middle, right) interfaces.

Returns:

int – The number of interior steps of the path.

Raises:

ValueError – If the path type is not recognised.

pyretis.analysis.repptis_msm module¶

Markov-state-model kinetics for partial path TIS (REPPTIS).

REPPTIS samples short partial paths, so the crossing probability, mean first passage time (MFPT), flux and rate cannot be read off the path ensembles directly the way they are for full RETIS. This module reconstructs them by building a Markov state model (MSM) over the partial-path ensembles from the local crossing probabilities and the per-ensemble transition times, following the closed-form analysis of Vervust, Wils, Safaei, Zhang and Ghysels [vervust2026] (building on the permeability-from-(RE)TIS formalism of Ghysels, Roet, Davoudi and van Erp [ghysels2021], and the REPPTIS replica-exchange scheme of Vervust, Zhang, van Erp and Ghysels [vervust2023]).

This is a faithful port of the repptis_msm module from the tistools analysis package (Elias W., May 2025); the numerical construction is preserved verbatim, only adapted to the PyRETIS coding conventions.

Note on the index convention¶

In the paper, N + 1 is the number of interfaces, while in this code N is the number of interfaces directly. Both yield the same state-space size NS = 4 * N - 5 (paper: 4 * (N - 1) - 1).
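The two conventions can be checked against each other with a few lines of arithmetic. The helper state_space_size is hypothetical and only encodes the formula NS = 4 * N - 5 stated above, with N the number of interfaces as used in the code.

```python
def state_space_size(n_interfaces):
    """Return NS = 4 * N - 5 for N interfaces (code convention)."""
    if n_interfaces < 3:
        raise ValueError("At least 3 interfaces are required.")
    return 4 * n_interfaces - 5


# The paper counts N + 1 interfaces, so its N is the code's N minus one.
for n_paper in range(2, 10):
    n_code = n_paper + 1
    assert state_space_size(n_code) == 4 * (n_code - 1) - 1
    assert state_space_size(n_code) == 4 * n_paper - 1
```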

Important methods defined here¶

construct_M (construct_M()): Build the PPTIS transition matrix from the local crossing probabilities.
construct_M_milestoning (construct_M_milestoning()): Build the milestoning transition matrix.
global_pcross_msm (global_pcross_msm()): Global crossing probability from the transition matrix.
mfpt_to_absorbing_states (mfpt_to_absorbing_states()): Mean first passage time to a set of absorbing states.
mfpt_to_first_last_state (mfpt_to_first_last_state()): Mean first passage time to the first or last state.
construct_tau_vector (construct_tau_vector()): Flatten the per-path-type transition times into the MSM vector.

References

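[vervust2026]

W. Vervust, E. Wils, S. Safaei, D. T. Zhang and A. Ghysels, “Estimating full path lengths and kinetics from partial path transition interface sampling simulations”, J. Chem. Theory Comput. (2026), https://doi.org/10.1021/acs.jctc.5c01498.
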
[ghysels2021]

A. Ghysels, S. Roet, S. Davoudi and T. S. van Erp, “Exact non-Markovian permeability from rare event simulations”, Phys. Rev. Research 3, 033068 (2021), https://doi.org/10.1103/PhysRevResearch.3.033068.

[vervust2023]

W. Vervust, D. T. Zhang, T. S. van Erp and A. Ghysels, “Path sampling with memory reduction and replica exchange to reach long permeation timescales”, Biophys. J. 122, 2960-2972 (2023), https://doi.org/10.1016/j.bpj.2023.02.021.

pyretis.analysis.repptis_msm.check_valid_indices(M, absor, kept)[source]¶

Validate the absorbing and non-absorbing state indices.

Ensure that the indices of absorbing (absor) and non-absorbing (kept) states are correctly defined, in range, and partition the state space (each state in exactly one of the two sets).

Parameters:

M (numpy.ndarray) – The transition matrix of the Markov process.
absor (array-like) – Indices of absorbing states.
kept (array-like) – Indices of non-absorbing (nonboundary) states.

Raises:

ValueError – If there are duplicate indices in absor or kept, if any index is out of bounds, or if any state is neither in absor nor in kept (or in both).
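The three checks listed above amount to verifying that absor and kept form a partition of the state indices. A minimal sketch, assuming non-negative indices and taking the number of states instead of the matrix M (check_partition_sketch is a hypothetical name):

```python
def check_partition_sketch(n_states, absor, kept):
    """Raise ValueError unless absor and kept partition range(n_states)."""
    absor, kept = list(absor), list(kept)
    if len(set(absor)) != len(absor) or len(set(kept)) != len(kept):
        raise ValueError("Duplicate indices in absor or kept.")
    if any(not 0 <= index < n_states for index in absor + kept):
        raise ValueError("Index out of bounds.")
    # Every state must appear exactly once across the two lists.
    if sorted(absor + kept) != list(range(n_states)):
        raise ValueError("absor and kept do not partition the states.")
```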

pyretis.analysis.repptis_msm.construct_M(p_mm, p_mp, p_pm, p_pp, N)[source]¶

Construct the PPTIS transition matrix M.

The matrix describes the probabilities of transitioning between states in the PPTIS framework, built from the local crossing probabilities for the different path types (LML, LMR, RML, RMR).

Parameters:

p_mm (list of float) – Local crossing probabilities for the LML (Left-to-Left) path type. Must have length N - 1.
p_mp (list of float) – Local crossing probabilities for the LMR (Left-to-Right) path type. Must have length N - 1.
p_pm (list of float) – Local crossing probabilities for the RML (Right-to-Left) path type. Must have length N - 1.
p_pp (list of float) – Local crossing probabilities for the RMR (Right-to-Right) path type. Must have length N - 1.
N (int) – The number of interfaces. Must be at least 3.

Returns:

M (numpy.ndarray) – The transition matrix of shape (NS, NS), where NS = 4 * N - 5.

Raises:

ValueError – If any of the input constraints are not met (e.g. invalid lengths or values).

pyretis.analysis.repptis_msm.construct_M_N3(p_mm, p_mp, NS)[source]¶

Construct the PPTIS transition matrix M for the case N=3.

Build the transition matrix when there are exactly 3 interfaces.

Parameters:

p_mm (list of float) – Local crossing probabilities for the LML (Left-to-Left) path type. Must have length 2.
p_mp (list of float) – Local crossing probabilities for the LMR (Left-to-Right) path type. Must have length 2.
NS (int) – The dimension of the transition matrix. Must satisfy NS = 4 * N - 5 with N = 3.

Returns:

M (numpy.ndarray) – The transition matrix of shape (NS, NS).

Raises:

ValueError – If the input probability lists do not have the correct length or if NS is invalid.

pyretis.analysis.repptis_msm.construct_M_milestoning(p_min, p_plus, N)[source]¶

Construct the transition matrix M for milestoning PPTIS.

Build a transition matrix for a milestoning framework. The states are lambda0-, lambda0+, lambda1, …, lambda(N-1) = B.

Parameters:

p_min (list of float) – Probabilities of transitioning to the previous milestone. Must have length N - 1.
p_plus (list of float) – Probabilities of transitioning to the next milestone. Must have length N - 1.
N (int) – The number of interfaces (milestones). Must be at least 3.

Returns:

M (numpy.ndarray) – The transition matrix of shape (NS, NS), where NS = N + 1.

Raises:

ValueError – If N is less than 3 or if the lengths of p_min and p_plus are not N - 1.

pyretis.analysis.repptis_msm.construct_tau_vector(N, NS, taumm, taump, taupm, taupp)[source]¶

Construct the flattened vector of PPTIS transition times.

The per-path-type transition times (LML, LMR, RML, RMR) are organised into a single flattened vector following the PPTIS ensemble structure.

Parameters:

N (int) – The number of ensembles. Must be at least 3.
NS (int) – The expected size of the output vector. Must satisfy NS = 4 * N - 5.
taumm (list of float) – Transition times for the LML (Left-to-Left) path type. Must have length N.
taump (list of float) – Transition times for the LMR (Left-to-Right) path type. Must have length N.
taupm (list of float) – Transition times for the RML (Right-to-Left) path type. Must have length N.
taupp (list of float) – Transition times for the RMR (Right-to-Right) path type. Must have length N.

Returns:

tau (numpy.ndarray) – A flattened vector of transition times with length NS.

Raises:

ValueError – If any of the input constraints are not met (e.g. invalid lengths or values).

pyretis.analysis.repptis_msm.create_labels_states(N)[source]¶

Generate labels for absorbing and non-absorbing states.

Create two lists of state labels: labels1 for the two absorbing states (0- and B), and labels2 for the N-2 non-absorbing states.

Parameters:

N (int) – The total number of states. Must be at least 3.

Returns:

labels1 (list of str) – Labels for the two absorbing states.
labels2 (list of str) – Labels for the N-2 non-absorbing states.

Raises:

ValueError – If N is less than 3.

pyretis.analysis.repptis_msm.create_labels_states_all(N)[source]¶

Generate labels for all states in sequential order.

Create a single list of state labels: the absorbing state 0-, all non-absorbing states, then the absorbing state B.

Parameters:

N (int) – The total number of states. Must be at least 3.

Returns:

labels (list of str) – Labels for all N states in sequential order.

Raises:

ValueError – If N is less than 3.

pyretis.analysis.repptis_msm.get_pieces_matrix(M, absor, kept)[source]¶

Partition a transition matrix into absorbing/nonboundary blocks.

Split M into four submatrices: Mp (nonboundary to nonboundary), D (nonboundary to absorbing), E (absorbing to nonboundary), and M11 (absorbing to absorbing).

Parameters:

M (numpy.ndarray) – The transition matrix of the Markov process.
absor (array-like) – Indices of absorbing states.
kept (array-like) – Indices of non-absorbing (nonboundary) states.

Returns:

Mp (numpy.ndarray) – Submatrix for nonboundary-to-nonboundary transitions.
D (numpy.ndarray) – Submatrix for nonboundary-to-absorbing transitions.
E (numpy.ndarray) – Submatrix for absorbing-to-nonboundary transitions.
M11 (numpy.ndarray) – Submatrix for absorbing-to-absorbing transitions.

Raises:

ValueError – If absor and kept contain invalid or overlapping indices.
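The four blocks can be illustrated with nested lists. This sketch assumes that row i, column j of M holds the probability of going from state i to state j, and it skips the index validation; split_matrix_sketch is a hypothetical name, and with numpy the same selection would typically be done by fancy indexing.

```python
def split_matrix_sketch(matrix, absor, kept):
    """Return the blocks Mp, D, E and M11 of a transition matrix."""
    def block(rows, cols):
        # Select the given rows ("from" states) and columns ("to" states).
        return [[matrix[i][j] for j in cols] for i in rows]

    return (block(kept, kept),    # Mp: nonboundary to nonboundary.
            block(kept, absor),   # D: nonboundary to absorbing.
            block(absor, kept),   # E: absorbing to nonboundary.
            block(absor, absor))  # M11: absorbing to absorbing.
```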

pyretis.analysis.repptis_msm.get_pieces_vector(vec, absor, kept)[source]¶

Partition a vector into absorbing/nonboundary sub-vectors.

Split vec into v1 (elements at absorbing-state indices) and v2 (elements at nonboundary-state indices).

Parameters:

vec (numpy.ndarray) – A 1D array of state values.
absor (array-like) – Indices of absorbing states.
kept (array-like) – Indices of non-absorbing (nonboundary) states.

Returns:

v1 (numpy.ndarray) – A column vector (len(absor) x 1) of elements at absor.
v2 (numpy.ndarray) – A column vector (len(kept) x 1) of elements at kept.

Raises:

ValueError – If absor and kept contain invalid or overlapping indices, if vec is not 1D, or if vec does not have the expected length len(absor) + len(kept).

pyretis.analysis.repptis_msm.global_pcross_msm(M, doprint=False)[source]¶

Compute global crossing probabilities in a Markov process.

Calculate the probability of reaching state -1 before state 0 under different conditions, using the transition matrix. The function solves a linear system rather than inverting matrices, for better numerical stability.

Parameters:

M (numpy.ndarray) – A square transition matrix representing the Markov process. Must have at least 3 states.
doprint (bool, optional) – If True, log intermediate computation details. Default False.

Returns:

z1 (numpy.ndarray) – A 2-element array with crossing probabilities for states 0 and -1.
z2 (numpy.ndarray) – An (NS-2)-element array with crossing probabilities for the other states.
y1 (numpy.ndarray) – A 2-element array with adjusted crossing probabilities for states 0 and -1. y1[0] is the global crossing probability from state 0 to -1, given that the process starts at 0 and leaves it.
y2 (numpy.ndarray) – An (NS-2)-element array with adjusted crossing probabilities for the other states.

Raises:

ValueError – If the transition matrix has fewer than three states.
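As an illustration of solving a linear system instead of inverting a matrix, the sketch below computes the textbook absorption probabilities for a chain whose first and last states are the boundaries. It is an assumption-laden example, not the library code: the function name crossing_probabilities and the example chain are hypothetical, and only two of the quantities listed above (the analogue of z2, and of y1[0]) are computed.

```python
import numpy as np


def crossing_probabilities(M):
    """Probability of reaching the last state before the first state."""
    M = np.asarray(M, dtype=float)
    if M.ndim != 2 or M.shape[0] != M.shape[1] or M.shape[0] < 3:
        raise ValueError("M must be square with at least three states")
    ns = M.shape[0]
    kept = np.arange(1, ns - 1)          # the nonboundary states
    Mp = M[np.ix_(kept, kept)]           # nonboundary -> nonboundary
    to_last = M[kept, ns - 1]            # one-step jumps into the last state
    # Solve (I - Mp) z2 = to_last rather than forming inv(I - Mp).
    z2 = np.linalg.solve(np.eye(ns - 2) - Mp, to_last)
    # Start in state 0, take one step out, then continue from where it lands.
    y_first = M[0, ns - 1] + M[0, kept] @ z2
    return z2, y_first


M = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.0, 1.0],
])
z2, y_first = crossing_probabilities(M)
```

For this symmetric random walk the two inner states reach the last state first with probability 1/3 and 2/3, and a walker leaving state 0 does so with probability 1/3.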

pyretis.analysis.repptis_msm.mfpt_to_absorbing_states(M, tau1, taum, tau2, absor, kept, doprint=False, remove_initial_m=True)[source]¶

Compute the mean first passage time (MFPT) to absorbing states.

Calculate the MFPT to reach any of the specified absorbing states, both unconditionally (g) and conditionally on leaving the current state (h). The system is solved without explicit matrix inversion, for numerical stability.

Parameters:

M (numpy.ndarray) – The transition matrix of the Markov process.
tau1 (numpy.ndarray) – Time before the first visit to an absorbing state.
taum (numpy.ndarray) – Time spent between the first and last visit to an absorbing state.
tau2 (numpy.ndarray) – Time after the last visit to an absorbing state.
absor (list or numpy.ndarray) – Indices of absorbing states.
kept (list or numpy.ndarray) – Indices of nonboundary (non-absorbing) states.
doprint (bool, optional) – If True, log intermediate computation details. Default False.
remove_initial_m (bool or str, optional) – If True or "m", the middle part (taum) is removed from the initial state’s MFPT. Default True.

Returns:

g1 (numpy.ndarray) – Array of size (n_absorb, 1) with unconditional MFPTs for the absorbing states.
g2 (numpy.ndarray) – Array of size (n_kept, 1) with unconditional MFPTs for the nonboundary states.
h1 (numpy.ndarray) – Array of size (n_absorb, 1) with conditional MFPTs for the absorbing states.
h2 (numpy.ndarray) – Array of size (n_kept, 1) with conditional MFPTs for the nonboundary states.

Raises:

ValueError – If the transition matrix has fewer than three states, or if the absorbing and nonboundary states do not partition the state space.

pyretis.analysis.repptis_msm.mfpt_to_first_last_state(M, tau1, taum, tau2, doprint=False)[source]¶

Compute the MFPT to reach either state 0 or state -1.

Calculate the MFPT to reach the first (0) or last (NS-1) state by treating these two states as absorbing. The key result is h1[0], the MFPT from state 0 to either 0 or -1, given that the process leaves state 0. Calls mfpt_to_absorbing_states() with remove_initial_m="m" so the intermediate passage time is excluded.

Parameters:

M (numpy.ndarray) – The transition matrix of the Markov process.
tau1 (numpy.ndarray) – Time before the first visit to an absorbing state.
taum (numpy.ndarray) – Time spent between the first and last visit to an absorbing state.
tau2 (numpy.ndarray) – Time after the last visit to an absorbing state.
doprint (bool, optional) – If True, log intermediate computation details. Default False.

Returns:

g1 (numpy.ndarray) – Unconditional MFPT for the absorbing states (0 and NS-1).
g2 (numpy.ndarray) – Unconditional MFPT for the nonboundary states.
h1 (numpy.ndarray) – Conditional MFPT for the absorbing states.
h2 (numpy.ndarray) – Conditional MFPT for the nonboundary states.

Raises:

ValueError – If the transition matrix has fewer than three states.

pyretis.analysis.repptis_msm.print_all_tau(pathensembles, taumm, taump, taupm, taupp)[source]¶

Print all tau values for each path ensemble.

Parameters:

pathensembles (list) – List of path ensemble objects, each with a .name attribute.
taumm (array-like) – First tau metric for each path.
taump (array-like) – Second tau metric for each path.
taupm (array-like) – Third tau metric for each path.
taupp (array-like) – Fourth tau metric for each path.

Raises:

ValueError – If the input arrays do not have the same length as pathensembles.

pyretis.analysis.repptis_msm.print_vector(g, states=None, sel=None)[source]¶

Print a vector g with the corresponding state labels.

Parameters:

g (array-like) – The vector to print.
states (list of str, optional) – Labels corresponding to each state. If None, indices are used.
sel (list of int, optional) – Indices of the selected states to print. If None, all states are printed.

Raises:

ValueError – If sel is provided but does not match the length of g.

pyretis.analysis.wham_analysis module¶

WHAM crossing-probability analysis for infinite-swapping RETIS.

This module computes per-interface and total crossing probabilities, the initial flux, and the rate constant from the path-data matrix of the infinite-swapping (replica-exchange) sampler. The matrix is read from a literal infswap_data.txt if one still exists (an old run, or a pyretis.analysis.combine_data merge output); otherwise it is reconstructed from the per-ensemble output the scheduler now writes (see get_path_data_matrix()).

The estimator follows the standard weighted-histogram (WHAM) crossing-probability procedure for transition interface sampling: the high-acceptance (HA) weight unweighting, the eta normalisation, the Q-factor WHAM stitch (Lervik et al., J. Chem. Theory Comput. 2015), a point-matching cross-check and a block-error analysis. The implementation returns its results as values rather than writing report files.

Data-file column layout (after str.split on a data line):

0 – path index
1 – path length
2 – maximum order parameter (lambda_max)
3 .. 3 + nintf - 1 – Cxy (fractional sampling occurrence) for ensembles [0-], [0+], [1+], …
3 + nintf .. 3 + 2*nintf - 1 – the high-acceptance (HA) weights for the same ensembles.

Here nintf is the number of interfaces and i0plus = 4 is the column of the [0+] ensemble.
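The column layout can be expressed as a small lookup of indices and slices. This is a sketch for orientation only; the helper name column_layout and the example data line are hypothetical and not taken from a real run.

```python
def column_layout(nintf):
    """Map field names to column positions for nintf interfaces."""
    return {
        "path_index": 0,
        "length": 1,
        "lambda_max": 2,
        "cxy": slice(3, 3 + nintf),                  # [0-], [0+], [1+], ...
        "ha_weights": slice(3 + nintf, 3 + 2 * nintf),
    }


# A made-up data line for nintf = 3.
fields = "7 153 0.42 ---- 1.0 0.5 ---- 2.0 1.0".split()
layout = column_layout(3)
cxy_tokens = fields[layout["cxy"]]
ha_tokens = fields[layout["ha_weights"]]
i0plus = 4  # column of the [0+] ensemble, i.e. the second Cxy column
```

With this layout, fields[i0plus] and cxy_tokens[1] refer to the same [0+] entry.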

pyretis.analysis.wham_analysis.INFSWAP_DATA_NAMES = ('infswap_data.txt', 'infretis_data.txt')¶: Names of the infinite-swapping data file, newest first. infswap_data.txt is the current name; infretis_data.txt is the pre-debrand legacy name, still produced by runs whose restart.toml persists the old data_file (so a restart started before the rename keeps writing the old name).

pyretis.analysis.wham_analysis._default_lamres(interfaces)[source]¶

Pick a default order-parameter resolution from the interfaces.

The WHAM stitch places every interface on a uniform grid of step lamres via round((lambda - lambda_A) / lamres). To keep adjacent interfaces on distinct grid points the step must be small relative to the smallest interface spacing – not the first one. Using the first gap (the historical default) silently collapses two interfaces onto the same grid index whenever a later gap is smaller, which corrupts the per-interface crossing probabilities. We therefore take one tenth of the smallest spacing.

Parameters:: interfaces (list of float) – Strictly increasing interface positions.
Returns:: lamres (float) – The chosen resolution.
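The difference between the two choices can be seen in a few lines. This is a minimal sketch of the rule stated above (one tenth of the smallest spacing), using a hypothetical helper name and made-up interfaces; using the full first gap as the grid step is an assumption made only to show the collapse.

```python
def default_resolution(interfaces):
    """One tenth of the smallest gap between adjacent interfaces."""
    if len(interfaces) < 2:
        raise ValueError("need at least two interfaces")
    gaps = [b - a for a, b in zip(interfaces, interfaces[1:])]
    if min(gaps) <= 0:
        raise ValueError("interfaces must be strictly increasing")
    return min(gaps) / 10.0


def grid_indices(interfaces, lamres):
    """Grid index of each interface, round((lambda - lambda_A) / lamres)."""
    return [round((lam - interfaces[0]) / lamres) for lam in interfaces]


interfaces = [0.0, 1.0, 1.3, 2.0]
# A step equal to the first gap puts 1.0 and 1.3 on the same grid index.
collapsed = grid_indices(interfaces, interfaces[1] - interfaces[0])
# One tenth of the smallest gap keeps every interface distinct.
fine = grid_indices(interfaces, default_resolution(interfaces))
```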

pyretis.analysis.wham_analysis._load_run_config(directory='.')[source]¶

Load a run’s output.toml (or a legacy run file) as a dict.

Returns the parsed config from the first of output.toml / restart.toml / infswap.toml found in directory, or None if none exists. output.toml is the current single run file; the other two are the legacy names still present in runs created before the consolidation.

pyretis.analysis.wham_analysis._rec_blocks(reduced)[source]¶: Recover block averages from a reduced running-average array.
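How block averages can be recovered from running averages is sketched below. The sketch assumes equal-length blocks and that reduced[k] holds the running average at the end of block k; whether that matches the exact meaning of "reduced" in this function is an assumption, and the helper name is hypothetical.

```python
def blocks_from_running_average(reduced):
    """Recover per-block averages from running averages at block ends."""
    blocks = []
    previous = 0.0
    for k, value in enumerate(reduced):
        # (k + 1) * value is the sum of the first k + 1 block averages;
        # subtracting k * previous leaves the average of block k alone.
        blocks.append((k + 1) * value - k * previous)
        previous = value
    return blocks


# Block averages 2, 4, 6 give running averages 2, 3, 4.
recovered = blocks_from_running_average([2.0, 3.0, 4.0])
```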

pyretis.analysis.wham_analysis._running_flux(matrix, interval=1.0)[source]¶

Running-average conventional flux, one entry per path.

The [0-]/[0+] path lengths are in simulation steps; interval (the time per step, timestep * subcycles) converts the running flux to a flux per unit time. See wham_crossing_probability() for the interval convention.

pyretis.analysis.wham_analysis._running_lengths(matrix)[source]¶: Weighted mean path length in the [0-] and [0+] ensembles.

pyretis.analysis.wham_analysis._running_pm(matrix, interfaces, lamres)[source]¶

Compute the running point-matching total crossing probability.

At each path, this is the product over ensembles of P_A(lambda_{i+1} | lambda_i), evaluated from the local crossing probabilities accumulated so far.

pyretis.analysis.wham_analysis._unweight_matrix(matrix, nintf)[source]¶

Unweight the Cxy columns by the HA-weights, in place.

Standard WHAM unweighting step: each Cxy value is divided by its HA-weight, the (now redundant) HA-weight column is overwritten with the running sum of the pre-unweighting Cxy (needed for the running WHAM averages), and finally each Cxy column is divided by the average inverse HA-weight so the eta values are comparable across ensembles that use different moves (e.g. [0+] shooting vs [i+] wire fencing).

Parameters:

matrix (list of list of float) – The data matrix from read_data_matrix(); modified in place.
nintf (int) – Number of interfaces.

pyretis.analysis.wham_analysis._validate_lamres(interfaces, lamres)[source]¶

Check that lamres resolves every interface separately.

Raises if two interfaces round to the same grid index (the stitch would double-count an ensemble) and warns if an interface does not sit exactly on the grid (the per-interface crossing probability is then read at the nearest grid point; the interface is mis-placed by at most lamres / 2 in lambda, and the probability error that induces depends on the local crossing-probability slope). Failing loud here is deliberate: a wrong crossing probability is worse than a refusal to run.

Parameters:

interfaces (list of float) – Strictly increasing interface positions.
lamres (float) – Candidate order-parameter resolution.
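The two checks (raise on a collision, warn on an off-grid interface) might look as follows. This is a sketch under stated assumptions: the function name check_resolution, the tolerance tol, and the choice of ValueError and UserWarning are all hypothetical.

```python
import warnings


def check_resolution(interfaces, lamres, tol=1e-9):
    """Raise if two interfaces share a grid index; warn if one is off-grid."""
    lambda_a = interfaces[0]
    indices = [round((lam - lambda_a) / lamres) for lam in interfaces]
    if len(set(indices)) != len(indices):
        raise ValueError("lamres is too coarse: two interfaces collide")
    for lam, idx in zip(interfaces, indices):
        # Distance between the interface and its nearest grid point.
        if abs(lambda_a + idx * lamres - lam) > tol:
            warnings.warn(f"interface {lam} does not sit on the lamres grid")
    return indices


indices = check_resolution([0.0, 1.0, 1.3, 2.0], 0.1)
```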

pyretis.analysis.wham_analysis._wham_pq(n_plus_ens, interfaces, lamres, eta, v_alpha)[source]¶

WHAM crossing probabilities at the interfaces and Q-factors.

Uses the Q-factor recursion of Lervik et al., JCTC 2015. Returns P (crossing probability at each interface, with lambda_B appended) and Q (the per-ensemble normalisation factors for v_alpha).

pyretis.analysis.wham_analysis._wham_ptot_run(interfaces, ploc_matrix, sum_pxy)[source]¶

Running-average total crossing probability via WHAM.

ploc_matrix[j][i] is P_A(lambda_i | lambda_j) from the [j+] ensemble using the data so far.

Each ensemble j enters the q-factor stitch weighted by sum_pxy[j], its effective sample count: the sum of the HA-weights of its paths (the pre-unweighting Cxy sum from _unweight_matrix()). Because an HA-weight is a sample multiplicity, WHAM, unlike point matching, is not invariant when the same paths are re-expressed with larger HA-weights: a larger weight means more effective samples and hence more statistical weight. That is intentional and faithful to the q-factor recursion of Lervik et al., JCTC 11, 2440 (2015); see the test_wham_synthetic_weighted regression test.

pyretis.analysis.wham_analysis.analyse_wham_output(run_dir, interfaces, lamres=None, nskip=0, minblocks=5, interval=1.0)[source]¶

Run the standard WHAM analysis on an infinite-swapping run.

Parameters:

run_dir (str) – The run directory (see get_path_data_matrix(): a literal infswap_data.txt inside it takes precedence, falling back to reconstruction from per-ensemble output).
interfaces (sequence of float) – The interface positions [lambda_0, ..., lambda_B] (from the run’s restart.toml / config).
lamres (float, optional) – Order-parameter resolution (see wham_crossing_probability()).
nskip (int, optional) – Number of initial records to skip.
minblocks (int, optional) – Minimum number of blocks for the block-error analysis.
interval (float, optional) – Simulation time per recorded step (timestep * subcycles), used to convert the flux to a rate per unit time. See wham_crossing_probability(); read it from the run config with read_engine_interval(). The default 1.0 yields a rate per step.

Returns:

results (dict) – The dictionary returned by wham_crossing_probability(), plus path_lengths (the path_length_statistics() dict).

pyretis.analysis.wham_analysis.detect_infswap_output(directory='.')[source]¶

Check if a directory contains infinite-swapping sampler output.

Both the current infswap_data.txt and the legacy infretis_data.txt are recognised, so runs created (or restarted) before the data-file rename are still analysed; the current name takes precedence when both are present.

Parameters:: directory (str) – Path to check.
Returns:: data_file (str or None) – Path to the infinite-swapping data file if found, else None.
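The precedence rule can be sketched with the standard library alone. This is not the PyRETIS source: the function name find_data_file is hypothetical, and the tuple simply mirrors the documented INFSWAP_DATA_NAMES order.

```python
import os

# Current name first, legacy name second (mirrors INFSWAP_DATA_NAMES).
DATA_NAMES = ("infswap_data.txt", "infretis_data.txt")


def find_data_file(directory="."):
    """Return the first matching data file in directory, or None."""
    for name in DATA_NAMES:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None
```

Because the names are tried in order, the current file name wins whenever both files are present.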

pyretis.analysis.wham_analysis.get_path_data_matrix(run_dir='.', nskip=0)[source]¶

Return the path-data matrix for a run, from whichever source exists.

The scheduler no longer writes infswap_data.txt (the per-ensemble route was made the only output format – see MERGE_TODO.md); WHAM-style analysis instead reconstructs the same matrix from the pathensemble.txt / ha_weight.txt files (pyretis.inout.pathensemble_output.reconstruct_path_data_matrix()). A literal infswap_data.txt/infretis_data.txt (from an already-completed old run, or a pyretis.analysis.combine_data merge output) still takes precedence when present, via detect_infswap_output(). This function is therefore the single entry point that every consumer (pyretis.analysis.interface_estimation.interfaces_from_data(), the pyretisanalyse --method wham report) should call instead of calling read_data_matrix() directly.

Parameters:

run_dir (str, optional) – The run directory: either a literal data file lives directly in it, or it holds the numbered per-ensemble output directories.
nskip (int, optional) – Number of initial rows/paths to discard.

Returns:

matrix (list of list of float) – See read_data_matrix().

Raises:

FileNotFoundError – Neither a literal data file nor per-ensemble output directories were found in run_dir.

pyretis.analysis.wham_analysis.has_pathensemble_output(directory='.')[source]¶

Check if a directory holds numbered per-ensemble output directories.

Parameters:: directory (str) – Path to check.
Returns:: boolean – True if directory has at least one numeric-named subdirectory (the 000, 001, … layout).
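A check of this kind can be sketched as below. The helper name is hypothetical, and reading "numeric-named" as str.isdigit() is an assumption about the actual test.

```python
import os


def has_numbered_subdirs(directory="."):
    """True if directory holds at least one all-digit subdirectory."""
    if not os.path.isdir(directory):
        return False
    for entry in os.listdir(directory):
        # Only directories count; a file named "000" does not.
        if entry.isdigit() and os.path.isdir(os.path.join(directory, entry)):
            return True
    return False
```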

pyretis.analysis.wham_analysis.path_length_statistics(matrix)[source]¶

Compute path-length statistics from a data matrix.

Parameters:: matrix (list of list of float) – Data matrix from read_data_matrix().
Returns:: stats (dict) – mean, std, min, max, median of path lengths.
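The same five statistics can be reproduced with the standard library, which is handy for spot-checking a result. This sketch assumes the path length sits in column 1 (as in the data-file layout) and uses the population standard deviation; which standard-deviation convention the library uses is not stated here, so treat that choice as an assumption.

```python
import statistics


def length_stats(matrix):
    """Mean, std, min, max and median of the path-length column."""
    lengths = [row[1] for row in matrix]
    if not lengths:
        raise ValueError("the data matrix is empty")
    return {
        "mean": statistics.fmean(lengths),
        "std": statistics.pstdev(lengths),   # population std (assumption)
        "min": min(lengths),
        "max": max(lengths),
        "median": statistics.median(lengths),
    }


# Made-up rows: [path index, path length, lambda_max].
stats = length_stats([[0, 10.0, 0.1], [1, 20.0, 0.3], [2, 30.0, 0.2]])
```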

pyretis.analysis.wham_analysis.read_data_matrix(filename, nskip=0)[source]¶

Read an infswap_data.txt file into a numeric matrix.

Parameters:

filename (str) – Path to the infswap_data.txt file.
nskip (int, optional) – Number of initial data rows to discard (loaded paths / equilibration).

Returns:

matrix (list of list of float) – One row per path; "----" entries are mapped to 0.0.
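The parsing rules above ("----" becomes 0.0, the first nskip data rows are dropped) can be sketched as follows. The helper works on an iterable of lines rather than a file name, and skipping blank lines and lines starting with "#" is an assumption of this sketch, not a documented behaviour.

```python
def parse_data_lines(lines, nskip=0):
    """Turn data lines into rows of floats, mapping "----" to 0.0."""
    matrix = []
    for line in lines:
        stripped = line.strip()
        # Assumption: blank lines and "#" comment lines carry no data.
        if not stripped or stripped.startswith("#"):
            continue
        matrix.append(
            [0.0 if token == "----" else float(token)
             for token in stripped.split()]
        )
    # nskip counts data rows, not raw lines.
    return matrix[nskip:]
```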

pyretis.analysis.wham_analysis.read_engine_interval(directory='.')[source]¶

Read the time per step (timestep * subcycles) from a run config.

The infinite-swapping data file records path lengths in steps, so the WHAM flux/rate needs this interval to be expressed per unit time (the convention of the PyRETIS flux analysis). The value is read from the run’s [engine] section, mirroring pyretis.analysis.flux_analysis.analyse_flux() (timestep * subcycles, with subcycles defaulting to 1).

Parameters:: directory (str) – Directory containing restart.toml (or infswap.toml).
Returns:: interval (float or None) – timestep * subcycles, or None if no config is found or it has no engine timestep (so the caller can decide whether to fail or fall back to a per-step rate rather than silently assuming a timestep).
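The interval rule (timestep * subcycles, subcycles defaulting to 1, None when no timestep is available) is sketched here on an already-parsed config dict. The helper name is hypothetical, and the key names "engine", "timestep" and "subcycles" are taken from the description above rather than from a real config file.

```python
def interval_from_config(config):
    """Return timestep * subcycles from a parsed config, or None."""
    if not config:
        return None
    engine = config.get("engine", {})
    timestep = engine.get("timestep")
    if timestep is None:
        # No silent default: let the caller decide how to proceed.
        return None
    return float(timestep) * int(engine.get("subcycles", 1))


interval = interval_from_config({"engine": {"timestep": 0.002, "subcycles": 5}})
```

Returning None instead of 1.0 keeps the "per step" fallback an explicit decision of the caller.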

pyretis.analysis.wham_analysis.read_interfaces(directory='.')[source]¶

Read the interface list from a run’s restart.toml.

Parameters:: directory (str) – Directory containing restart.toml (or infswap.toml).
Returns:: interfaces (list of float or None) – The interface positions, or None if no config is found.

pyretis.analysis.wham_analysis.rec_block_errors(runav, minblocks)[source]¶

Block-error analysis of a running-average series.

Standard recursive block-error analysis of the running average.

Parameters:

runav (sequence of float) – The running average of an observable, one entry per path.
minblocks (int) – Minimum number of blocks in the block-error analysis.

Returns:

half_av_err (float) – Average relative error over the second half of the block lengths.
n_stat_ineff (float) – Statistical inefficiency estimate.
rel_errors (list of float) – Relative error as a function of block length.

pyretis.analysis.wham_analysis.wham_crossing_probability(matrix, interfaces, lamres=None, minblocks=5, interval=1.0)[source]¶

Compute WHAM crossing probabilities and rate from a data matrix.

Core WHAM computation of the crossing probability, flux and rate, with the file/plot output replaced by a returned dictionary. The matrix is modified in place by the HA-weight unweighting step.

Parameters:

matrix (list of list of float) – Data matrix from read_data_matrix() (already nskip-ped).
interfaces (sequence of float) – The interface positions [lambda_0, ..., lambda_B].
lamres (float, optional) – Order-parameter resolution. Defaults to one tenth of the smallest interface spacing (see _default_lamres()); this keeps non-uniform interfaces on distinct grid points. Whatever value is used is validated by _validate_lamres().
minblocks (int, optional) – Minimum number of blocks for the block-error analysis.
interval (float, optional) – Simulation time per recorded path step, i.e. timestep * subcycles. The [0-]/[0+] path lengths are in steps, so the flux is divided by this to give a rate per unit time – matching the PyRETIS flux analysis (pyretis.analysis.flux_analysis.analyse_flux()). The default 1.0 reproduces the upstream inftools convention of a rate per step; pass the real interval to obtain a physical rate.

Returns:

results (dict) – Keys: pcross_pm / pcross_wham (total crossing probability, point-matching and WHAM), pcross_pm_relerr / pcross_wham_relerr (relative block errors), pcross_at_intf (WHAM crossing probability at each interface), pcross_at_intf_pm (point-matching version), flux, rate_pm / rate_wham, rate_pm_relerr, length_0minus / length_0plus, n_records, lambda_values and pcross_curve_wham / pcross_curve_pm (the full P_A(lambda) profiles).

pyretis.analysis.combine_data module¶

Combine several infinite-swapping runs into one data file.

Ported from the upstream inftools combine_results.py. Several runs – possibly with different interface sets, e.g. successive infinit interface-placement steps, or independent runs to be pooled for better statistics – are merged into a single infswap_data.txt-shaped data file + config whose interfaces are the union of the inputs. Each run’s path data is read via pyretis.analysis.training_set.get_path_data() (a literal infswap_data.txt if the run still has one, else reconstructed from its per-ensemble output – the scheduler itself no longer writes the file). Each run’s ensemble columns are re-mapped onto the union interfaces, and the rows are proportionally interleaved so a downstream block-error estimate sees a representative mix from every run. The merged output keeps the historical infswap_data.txt matrix shape deliberately: it is a derived analysis artifact (an explicit merge a user asks for), not the scheduler’s own primary output, and downstream WHAM / interfaces_from_data consumers already read that shape.

pyretis.analysis.combine_data.combine_data(tomls, run_dirs, skip=(100,), out='combo')[source]¶

Combine several runs into <out>.txt + <out>.toml.

Parameters:

tomls (list of str) – The input TOML config files, one per run (interfaces are read from each).
run_dirs (list of str) – The matching run directories, in the same order – a literal infswap_data.txt inside one is used if present, otherwise its path data is reconstructed from per-ensemble output (see pyretis.analysis.training_set.get_path_data()).
skip (sequence of int, optional) – Initial rows to skip per run – either one value for all runs or one per run.
out (str, optional) – Output basename; writes <out>.txt and <out>.toml.

Returns:

interfaces (list of float) – The union interface set written to <out>.toml.
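Two ingredients of the merge, the union of the interface sets and a proportional interleaving of rows, can be illustrated as below. This is one possible scheme chosen for the sketch, not necessarily the one combine_data() uses; both helper names are hypothetical, and the re-mapping of ensemble columns is not shown.

```python
def union_interfaces(interface_sets):
    """Sorted union of several interface lists."""
    return sorted({lam for interfaces in interface_sets for lam in interfaces})


def interleave_proportionally(runs):
    """Merge rows so each run is spread evenly over the output."""
    keyed = []
    for run_index, rows in enumerate(runs):
        for i, row in enumerate(rows):
            # Fractional position of the row inside its own run.
            keyed.append(((i + 0.5) / len(rows), run_index, row))
    keyed.sort(key=lambda item: item[:2])
    return [row for _, _, row in keyed]


merged = interleave_proportionally([["a1", "a2", "a3", "a4"], ["b1", "b2"]])
```

Sorting by fractional position means any contiguous block of the merged output contains rows from each run roughly in proportion to the run lengths.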

pyretis.analysis.interface_estimation module¶

Scientific core of the iterative infinit interface-placement driver.

Ported from the upstream inftools misc/infinit_helper.py. The infinit tool repeatedly runs a short simulation, re-estimates the crossing probability with WHAM, and re-places the interfaces so every ensemble carries roughly the same local crossing probability – converging on a good interface set automatically.

This module ports the pure, testable heart of one such iteration:

binless_pcross() – build the binless WHAM crossing-probability curve p(x) from the per-path weights returned by pyretis.analysis.path_weights.get_path_weights();
estimate_updated_interfaces() – given that curve and the current interfaces, compute the next iteration’s interfaces (and matching shooting-move list), reproducing the upstream update_toml_interfaces logic: interface-cap filtering, the “not enough data” guard, geometric placement via estimate_interface_positions(), rounding to the lambda resolution, and the land-on-state-A/B fix-ups;
interfaces_from_data() – the convenience chain data file -> weights -> Pcross -> updated interfaces.

The outer driver loop (running pyretisrun, threading the [infinit] cstep state through restart.toml, and accumulating combo data files across iterations) is orchestration and is not included here; it can be built on top of these functions plus pyretis.analysis.combine_data.combine_data().

pyretis.analysis.interface_estimation.binless_pcross(max_op, weight, first_interface)[source]¶

Build the binless crossing-probability curve from path weights.

A path that reaches order parameter max_op with unbiased weight weight contributes to the crossing probability at every order parameter up to max_op. Sorting the paths by max_op and accumulating their weights from the top down therefore gives the (unnormalised) probability to cross each order-parameter value; dividing by the total weight makes it a probability that starts at 1 at the first interface.

Parameters:

max_op (array_like) – The maximum order parameter of every positive-ensemble path.
weight (array_like) – The matching WHAM-unbiased path weight (see pyretis.analysis.path_weights.get_path_weights()).
first_interface (float) – The first interface (state-A) order parameter; the curve is anchored at (first_interface, 1.0).

Returns:

x (numpy.ndarray) – The order-parameter values (no duplicates, increasing).
p (numpy.ndarray) – The crossing probability at each x (decreasing, p[0] == 1).
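The top-down accumulation can be sketched in pure Python. This is a hedged illustration, not the ported code: the function name is hypothetical, paths whose maximum does not exceed first_interface contribute weight but no curve point, and a path is counted as crossing its own maximum (a convention assumed here).

```python
def binless_curve(max_op, weight, first_interface):
    """Return (x, p): crossing probability at each distinct max_op."""
    pairs = sorted(zip(max_op, weight), reverse=True)
    total = float(sum(w for _, w in pairs))
    if total <= 0.0:
        raise ValueError("the total weight must be positive")
    xs, ps = [], []
    running = 0.0
    for op, w in pairs:
        running += w  # weight of all paths reaching at least op
        if op <= first_interface:
            continue
        if xs and xs[-1] == op:
            ps[-1] = running / total  # duplicate maximum: keep one point
        else:
            xs.append(op)
            ps.append(running / total)
    # Anchor the curve at (first_interface, 1.0) and return it increasing in x.
    xs.append(first_interface)
    ps.append(1.0)
    xs.reverse()
    ps.reverse()
    return xs, ps


x, p = binless_curve([0.2, 0.5, 0.5, 0.9], [1.0, 1.0, 1.0, 1.0], 0.1)
```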

pyretis.analysis.interface_estimation.estimate_updated_interfaces(x, p, interfaces, *, pl_target=0.3, num_ens=None, n_workers=1, lamres=0.001, interface_cap=None)[source]¶

Compute the next iteration’s interfaces from a Pcross curve.

Reproduces the order-parameter mathematics of the upstream update_toml_interfaces (without the restart.toml / combo bookkeeping): filter at the interface cap, bail out if the crossing probability could not be constructed, place interfaces at geometric local-Pcross spacing, round down to the lambda resolution, and nudge any interface that landed on a state boundary.

Parameters:

x, p (array_like) – The binless crossing-probability curve (see binless_pcross()).
interfaces (list of float) – The current interface positions (only the first and last, state A and state B, are used here).
pl_target (float, optional) – Target per-ensemble local crossing probability (default 0.3). The effective target is max(pl_target, p[-1] ** (1 / (2 * n_workers))) so the ensemble count cannot blow up past about twice the worker count.
num_ens (int, optional) – If given, place exactly this many ensembles (overrides pl_target).
n_workers (int, optional) – The number of simulation workers; the per-ensemble target and a lower bound on the ensemble count both depend on it.
lamres (float, optional) – The lambda resolution; interfaces are rounded down to a multiple of it (default 0.001).
interface_cap (float, optional) – Do not place interfaces above this order parameter (default: the last/state-B interface).

Returns:

result (dict or None) – None if the crossing probability could not be constructed (the caller should keep the current interfaces). Otherwise a dict with interfaces (the new list), shooting_moves ("sh","sh" then "wf" per positive ensemble) and pl_used (the actual per-ensemble local crossing probability).
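The effect of the effective-target formula quoted for pl_target can be checked numerically. The helper name is hypothetical; only the expression max(pl_target, p[-1] ** (1 / (2 * n_workers))) comes from the description above.

```python
import math


def effective_target(pl_target, p_last, n_workers):
    """Per-ensemble target after applying the worker-count bound."""
    return max(pl_target, p_last ** (1.0 / (2 * n_workers)))


# With one worker the requested target is kept.
single = effective_target(0.3, 1e-6, n_workers=1)
# With eight workers the target is raised to p_last ** (1 / 16).
many = effective_target(0.3, 1e-6, n_workers=8)
# Number of equal factors of size `many` needed to reach p_last.
n_ens = math.log(1e-6) / math.log(many)
```

In the second case about 16 ensembles (twice the worker count) are enough to span a total crossing probability of 1e-6.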

pyretis.analysis.interface_estimation.interfaces_from_data(run_dir, interfaces, nskip=0, **kwargs)[source]¶

Estimate updated interfaces directly from a run’s path data.

Convenience chain: read the data matrix (a literal infswap_data.txt if one exists in run_dir, else reconstructed from the per-ensemble output – see pyretis.analysis.wham_analysis.get_path_data_matrix()), compute the WHAM-unbiased path weights, build the binless crossing probability, and re-estimate the interfaces. Keyword arguments are forwarded to estimate_updated_interfaces().

Parameters:

run_dir (str) – The run directory (see pyretis.analysis.wham_analysis.get_path_data_matrix()).
interfaces (list of float) – The current interface positions.
nskip (int, optional) – Number of initial data rows to discard.
**kwargs – Forwarded to estimate_updated_interfaces() (pl_target, num_ens, n_workers, lamres, interface_cap).

Returns:

result (dict or None) – As returned by estimate_updated_interfaces() (None if the crossing probability could not be constructed).

pyretis.analysis.interface_placement module¶

Automatic interface placement from a crossing-probability curve.

Ported from the upstream inftools infinit tool (misc/infinit_helper.py): given a (binless) crossing-probability curve – order parameter x versus total crossing probability p – place TIS/RETIS interfaces so that every ensemble carries approximately the same target local crossing probability pl_target (geometric spacing in p).

This is the self-contained scientific core of infinit; the iterative “run a short simulation, re-estimate the crossing probability with WHAM, re-place the interfaces, repeat” driver is a separate orchestration layer (it runs pyretisrun and rewrites the input) and is not included here. The crossing-probability curve consumed here is exactly the one pyretis.analysis.path_weights.get_path_weights() can emit (its binless Pcross), or any monotonically-decreasing p(x).

pyretis.analysis.interface_placement.estimate_interface_positions(x, p, pl_target=0.3, num_ens=None)[source]¶

Estimate interfaces equally spaced with respect to pl_target.

The interfaces are placed so the cumulative crossing probability drops by a constant factor (pl_target) from one interface to the next – i.e. each ensemble has a local crossing probability of about pl_target. The final (state-B / cap) interface is not added, so the probability to reach it stays pl_target.

Parameters:

x (array_like) – The order-parameter values of the crossing-probability curve (assumed to contain no duplicates, as produced by the binless Pcross).
p (array_like) – The total crossing probability at each x (monotonically decreasing, p[0] == 1).
pl_target (float, optional) – The target local crossing probability per ensemble (default 0.3).
num_ens (int, optional) – The number of ensembles to place. If None (default) it is derived from pl_target and the final crossing probability p[-1].

Returns:

interfaces (list of float) – The estimated interface positions (the first is x[0]; the state-B interface is not included).
pl_used (float) – The actual per-interface local crossing probability used (so the num_ens interfaces tile p[-1] exactly).
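Geometric placement can be sketched as follows. The ensemble count and pl_used follow the description above; interpolating linearly in log(p) between curve points is an assumption of this sketch (the ported code may select points differently), and the function name and example curve are hypothetical. The sketch also assumes 0 < pl_target < 1 and a curve with at least two points.

```python
import math


def place_interfaces(x, p, pl_target=0.3, num_ens=None):
    """Return (interfaces, pl_used) for a decreasing curve with p[0] == 1."""
    if not 0.0 < p[-1] < 1.0:
        raise ValueError("p[-1] must lie strictly between 0 and 1")
    if num_ens is None:
        # Smallest count whose per-ensemble probability is >= pl_target.
        num_ens = max(1, math.ceil(math.log(p[-1]) / math.log(pl_target)))
    # Adjust the local probability so num_ens equal factors give p[-1].
    pl_used = p[-1] ** (1.0 / num_ens)
    interfaces = [x[0]]
    j = 0
    for k in range(1, num_ens):
        target = pl_used ** k
        # Find the segment [j, j + 1] in which the curve drops through target.
        while j < len(p) - 2 and p[j + 1] > target:
            j += 1
        # Interpolate linearly in log(p) (an assumption of this sketch).
        span = math.log(p[j]) - math.log(p[j + 1])
        frac = (math.log(p[j]) - math.log(target)) / span if span > 0 else 1.0
        interfaces.append(x[j] + frac * (x[j + 1] - x[j]))
    return interfaces, pl_used


# Made-up exponential curve p(x) = 10 ** (-2 x) sampled on five points.
x = [0.0, 0.25, 0.5, 0.75, 1.0]
p = [10 ** (-2 * xi) for xi in x]
interfaces, pl_used = place_interfaces(x, p, pl_target=0.3)
```

For an exponential curve the resulting interfaces are equally spaced in x, which is the expected outcome of a constant per-ensemble crossing probability.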

pyretis.analysis.path_weights module¶

Unbiased per-path weights and the transmission coefficient.

This module ports two infinite-swapping analysis tools from the upstream inftools package (tistools/path_weights.py and tistools/calc_transmission.py):

get_path_weights() – the WHAM-unbiased equilibrium weight of every positive-ensemble path, so that an observable can be averaged as <O> = sum_i w_i O_i (predictive-power analysis, binless crossing probability, …).
transmission_coefficient() – the conditional transmission coefficient for a proposed transition-state value along a chosen collective variable, defined as in Figure 8 of J. Chem. Theory Comput. (doi:10.1021/acs.jctc.5c01814).

The path weights reuse the same WHAM unweighting already implemented in pyretis.analysis.wham_analysis; the data are the infswap_data.txt-shaped matrix returned by pyretis.analysis.wham_analysis.get_path_data_matrix() (a literal file if one still exists, otherwise reconstructed from the per-ensemble output the scheduler writes now).

pyretis.analysis.path_weights.get_path_weights(matrix, interfaces)[source]¶

Compute the WHAM-unbiased weight of every positive-ensemble path.

The [0-] ensemble (the first Cxy/HA-weight column) is excluded; weights are computed for the paths sampled in the positive ensembles.

Parameters:

matrix (list of list of float) – The infswap_data.txt matrix as returned by pyretis.analysis.wham_analysis.read_data_matrix(). Each row is [path_nr, length, max_op, Cxy_0 ... Cxy_{n-1}, HA_0 ... HA_{n-1}] with "----" already mapped to 0.0.
interfaces (list of float) – The interface positions (n interfaces => n ensemble columns, ensemble 0 being [0-]).

Returns:

path_nr (numpy.ndarray) – The path index of every positive-ensemble path.
max_op (numpy.ndarray) – The maximum order parameter of each such path.
weight (numpy.ndarray) – The WHAM-unbiased equilibrium weight of each such path.
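Once the weights are available, a weighted path average is a one-liner. The sketch below normalises by the total weight explicitly, since it is an assumption here whether the returned weights already sum to one; the helper name and the numbers are hypothetical.

```python
def weighted_average(values, weights):
    """Weighted mean of a per-path observable."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("the total weight must be positive")
    return sum(w * v for w, v in zip(weights, values)) / total


# Made-up observable values and weights for two paths.
average = weighted_average([1.0, 3.0], [3.0, 1.0])
```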

pyretis.analysis.path_weights.transmission_coefficient(matrix, interfaces, order_files, ts, *, dim=1, nskip=0)[source]¶

Estimate the conditional transmission coefficient for a CV.

Counts, for every reactive-or-recrossing positive-ensemble path, how many times the chosen collective variable dim crosses the proposed transition-state value ts, and forms the WHAM-weighted ratio of reactive paths to total positive crossings.

Parameters:

matrix (list of list of float) – The infswap_data.txt matrix (see get_path_weights()).
interfaces (list of float) – The interface positions.
order_files (callable) – order_files(path_nr) must return the per-frame order-parameter array of that path (columns [time, cv_0, cv_1, ...]), or None if it is unavailable.
ts (float) – The proposed transition-state value (along the original order parameter; reactive paths are still defined by it).
dim (int, optional) – The collective-variable column of the order array to count crossings for (default 1, the first CV after the time column).
nskip (int, optional) – Number of initial paths in matrix to discard.

Returns:

tcoeff (float) – The conditional transmission coefficient.
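
The crossing count described above can be sketched with NumPy. Both functions are hypothetical illustrations: count_crossings() counts sign changes of one column relative to ts, and weighted_ratio() is one plausible reading of "WHAM-weighted ratio of reactive paths to total positive crossings", not necessarily the exact estimator used by transmission_coefficient().

```python
import numpy as np


def count_crossings(order, ts, dim=1):
    """Return (positive, total) crossings of column ``dim`` through ``ts``."""
    cv = np.asarray(order, dtype=float)[:, dim]
    above = cv > ts
    # A crossing happens wherever consecutive frames lie on opposite sides.
    change = above[1:] != above[:-1]
    positive = int(np.count_nonzero(change & above[1:]))
    total = int(np.count_nonzero(change))
    return positive, total


def weighted_ratio(weights, reactive, positive):
    """Assumed estimator: sum(w * reactive) / sum(w * positive crossings)."""
    w = np.asarray(weights, dtype=float)
    num = float(np.sum(w * np.asarray(reactive, dtype=float)))
    den = float(np.sum(w * np.asarray(positive, dtype=float)))
    return num / den if den > 0.0 else float("nan")


# Columns are [time, cv_0]; the CV crosses 0.5 upward, downward, then upward.
order = [[0, 0.1], [1, 0.6], [2, 0.4], [3, 0.7]]
crossings = count_crossings(order, ts=0.5, dim=1)  # (2, 3)
```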

pyretis.analysis.training_set module¶

Collect a machine-learning training set from path-sampling output.

Ported from the upstream inftools tool tistools/collect_trainingset.py. It selects a set of configurations spread across the order-parameter range (one frame per chosen path, drawn from the interface band that path belongs to) so they can be used to train, e.g., a machine-learned committor / shooting-point selector.

The upstream tool drew its selections with numpy.random (not reproducible) and read trajectories with ASE only. This port:

threads an explicit pyretis.core.random_gen.RandomGenerator so a given seed reproduces the same selection (PyRETIS determinism);
separates the reproducible selection (select_training_frames(), no trajectory I/O) from the trajectory export (collect_training_set(), which reads/writes frames with ASE and is therefore limited to ASE-readable trajectory formats).
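
The effect of threading an explicit generator can be illustrated with a small stand-in. The sketch uses random.Random from the standard library in place of pyretis.core.random_gen.RandomGenerator (whose interface is not shown here), and pick_paths() is a hypothetical helper.

```python
import random


def pick_paths(path_numbers, n_pick, rgen):
    """Draw n_pick distinct path numbers using only the generator passed in."""
    return sorted(rgen.sample(list(path_numbers), n_pick))


# Two generators built from the same seed give the same selection,
# because no hidden global random state is involved.
first = pick_paths(range(100), 5, random.Random(2024))
second = pick_paths(range(100), 5, random.Random(2024))
```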

pyretis.analysis.training_set.collect_training_set(paths, interfaces, input_dir, n_frames, rgen, out='trainingset')[source]¶

Select training frames and export them to an .xyz file plus a report.

This reads and writes trajectory frames with ASE, so only ASE-readable trajectory formats (multi-frame .xyz, .traj, NetCDF, …) are supported. The reproducible selection step (select_training_frames()) has no such restriction.

Parameters:

paths, interfaces, input_dir, n_frames, rgen – See select_training_frames().
out (str, optional) – Output basename; writes <out>.xyz and <out>_report.txt.

Returns:

selected (list of tuple) – The selected frames (see select_training_frames()).

pyretis.analysis.training_set.get_path_data(run_dir, nskip=0)[source]¶

Return the path data for a run, in the shape produced by read_path_data(), from either source.

Like pyretis.analysis.wham_analysis.get_path_data_matrix(), a literal infswap_data.txt or infretis_data.txt in run_dir takes precedence and is read via read_path_data(). Otherwise, the same per-path dicts are built from the per-ensemble output: the result of pyretis.inout.pathensemble_output.reconstruct_path_data_matrix() is converted column-for-column to the [Cxy, HA-weight] string-pair shape that read_path_data() returns. Existing consumers (select_training_frames(), pyretis.analysis.combine_data) therefore need no further change.

Parameters:

run_dir (str) – The run directory.
nskip (int, optional) – Number of initial rows/paths to discard.

Returns:

paths (list of dict) – See read_path_data().
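
The "literal file takes precedence" rule can be sketched as a simple lookup. find_literal_data_file() is a hypothetical helper, and checking infswap_data.txt before infretis_data.txt is an assumption; a caller would fall back to the per-ensemble reconstruction when it returns None.

```python
import os
import tempfile

# Assumed lookup order; the text above only says either file takes precedence.
DATA_FILES = ("infswap_data.txt", "infretis_data.txt")


def find_literal_data_file(run_dir):
    """Return the first existing literal data file in run_dir, or None."""
    for name in DATA_FILES:
        candidate = os.path.join(run_dir, name)
        if os.path.isfile(candidate):
            return candidate
    return None


with tempfile.TemporaryDirectory() as tmp:
    empty_result = find_literal_data_file(tmp)   # no literal file yet
    open(os.path.join(tmp, "infretis_data.txt"), "w").close()
    found_name = os.path.basename(find_literal_data_file(tmp))
```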

pyretis.analysis.training_set.read_path_data(filename)[source]¶

Parse an infswap_data.txt file into per-path dictionaries.

Port of the upstream inftools data_reader: each returned path records the ensembles it participated in (those columns where both the Cxy and the high-acceptance weight are present).

Parameters:

filename (str) – Path to the infswap_data.txt file.

Returns:

paths (list of dict) – One dict per path with keys pn (path number, str), len (str), max_op (float) and cols (a dict mapping the ensemble column index to its [Cxy, HA-weight] strings).
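
A minimal sketch of how one data line could map to the dictionary described above. parse_line() is hypothetical; it assumes whitespace-separated fields in the order [path_nr, length, max_op, Cxy columns, HA columns] and "----" as the marker for a missing entry.

```python
def parse_line(line, missing="----"):
    """Parse one whitespace-separated data line into a per-path dict."""
    fields = line.split()
    n_ens = (len(fields) - 3) // 2
    cxy = fields[3:3 + n_ens]
    ha = fields[3 + n_ens:3 + 2 * n_ens]
    # Keep only the ensembles where both the Cxy and the HA weight are present.
    cols = {
        idx: [c, h]
        for idx, (c, h) in enumerate(zip(cxy, ha))
        if c != missing and h != missing
    }
    return {
        "pn": fields[0],
        "len": fields[1],
        "max_op": float(fields[2]),
        "cols": cols,
    }


path = parse_line("12 340 0.731 ---- 1.0 0.5 ---- 2.0 1.0")
# path["cols"] == {1: ["1.0", "2.0"], 2: ["0.5", "1.0"]}
```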

pyretis.analysis.training_set.select_training_frames(paths, interfaces, input_dir, n_frames, rgen)[source]¶

Reproducibly select training frames spread over the interfaces.

Paths are grouped by the ensembles they visited; from each group an even share of paths is drawn (without replacement, no path reused across groups), and from each path one frame is drawn from that group’s interface band. All draws use rgen, so a given generator state reproduces the selection exactly.

Parameters:

paths (list of dict) – As returned by read_path_data().
interfaces (list of float) – The interface positions.
input_dir (str) – Directory holding the per-path <pn>/order.txt files.
n_frames (int) – Target number of frames (split evenly across the groups).
rgen (object like RandomGenerator) – The random generator used for every selection.

Returns:

selected (list of tuple) – (path_nr, frame_index, order_value, ensemble) for each chosen frame, ordered by path number.
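
The even-share draw without path reuse can be sketched as below. This is an illustration under assumptions, not the implementation of select_training_frames(): draw_even_share() is hypothetical, groups is assumed to map a group label to its path numbers, and random.Random stands in for the rgen object.

```python
import random


def draw_even_share(groups, n_frames, rgen):
    """Draw about n_frames / len(groups) paths per group, never reusing one."""
    if not groups:
        return {}
    share = n_frames // len(groups)
    used = set()
    chosen = {}
    for label in sorted(groups):
        # Paths already taken by an earlier group are no longer candidates.
        candidates = [pn for pn in sorted(groups[label]) if pn not in used]
        picks = rgen.sample(candidates, min(share, len(candidates)))
        used.update(picks)
        chosen[label] = sorted(picks)
    return chosen


groups = {0: [1, 2, 3, 4], 1: [3, 4, 5, 6]}
run_a = draw_even_share(groups, 4, random.Random(1))
run_b = draw_even_share(groups, 4, random.Random(1))
```

Iterating over sorted labels and sorted candidates keeps the draw independent of dictionary or list ordering, so the result depends only on the generator state.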