pyretis.analysis package¶

This package defines analysis tools for the PyRETIS program.

The analysis tools are intended to be used for analysis of the simulation output from the PyRETIS program. The typical use of this package is in post-processing of the results from a simulation (or several simulations).

Package structure¶

Modules¶

__init__.py: This file, imports from the other modules. The method to analyse results from MD flux simulations is defined here since it will make use of analysis tools from energy_analysis.py and order_analysis.py.
analysis.py (pyretis.analysis.analysis): General methods for numerical analysis.
energy_analysis.py (pyretis.analysis.energy_analysis): Defines methods useful for analysing the energy output.
histogram.py (pyretis.analysis.histogram): Defines methods useful for generating histograms.
order_analysis.py (pyretis.analysis.order_analysis): Defines methods useful for analysis of order parameters.
path_analysis.py (pyretis.analysis.path_analysis): Defines methods for analysis of path ensembles.

Important methods defined in this package¶

analyse_energies (analyse_energies()): Analyse energy data from a simulation. It will calculate a running average, a distribution and do a block error analysis.
analyse_flux (analyse_flux()): Analyse flux data from a MD flux simulation. It will calculate a running average, a distribution and do a block error analysis.
analyse_orderp (analyse_orderp()): Analyse order parameter data. It will calculate a running average, a distribution and do a block error analysis. In addition, it will analyse the mean square displacement (if requested).
analyse_path_ensemble (analyse_path_ensemble()): Analyse the results from a single path ensemble. It will calculate a running average of the probabilities, a crossing probability, perform a block error analysis, analyse lengths of paths, type of Monte Carlo moves and calculate an efficiency.
match_probabilities (match_probabilities()): Method to match probabilities from several path simulations. Useful for obtaining the overall crossing probability.
histogram (histogram()): Generates histogram, basically a wrapper around numpy’s histogram.
match_all_histograms (match_all_histograms()): Method to match histograms from umbrella simulations.
retis_flux (retis_flux()): Method for calculating the initial flux for RETIS simulations.
retis_rate (retis_rate()): Method for calculating the rate constant for RETIS simulations.

pyretis.analysis.analyse_md_flux(crossdata, energydata, orderdata, settings)[source]¶

Analyse the output from a MD-flux simulation.

The obtained results will be returned as a convenient structure for plotting or reporting.

Parameters:

Parameters:	crossdata (numpy.array) – This is the data containing information about crossings. energydata (numpy.array) – This is the raw data for the energies. orderdata (numpy.array) – This is the raw data for the order parameter. settings (dict) – The settings for the analysis (e.g block length for error analysis) and some settings from the simulation (interfaces, time step etc.).
Returns:	results (dict) – This dict contains the results from the different analysis as a dictionary. This dict can be used further for plotting or for generating reports.

crossdata (numpy.array) – This is the data containing information about crossings.
energydata (numpy.array) – This is the raw data for the energies.
orderdata (numpy.array) – This is the raw data for the order parameter.
settings (dict) – The settings for the analysis (e.g block length for error analysis) and some settings from the simulation (interfaces, time step etc.).

Returns:

results (dict) – This dict contains the results from the different analysis as a dictionary. This dict can be used further for plotting or for generating reports.

pyretis.analysis.analysis module¶

Module defining functions useful in the analysis of simulation data.

Important methods defined here¶

running_average (running_average()): Method to calculate a running average.
block_error (block_error()): Perform block error analysis.
block_error_corr (block_error_corr()): Method to run a block error analysis and calculate relative errors and correlation length.

pyretis.analysis.analysis.running_average(data)[source]¶

Create a running average of the given data.

The running average will be calculated over the rows.

Parameters:	data (numpy.array) – This is the data we will average.
Returns:	out (numpy.array) – The running average.

pyretis.analysis.analysis.block_error(data, maxblock=None, blockskip=1)[source]¶

Perform block error analysis.

This function will estimate the standard deviation in the input data by performing a block analysis. The number of blocks to consider can be specified or it will be taken as the half of the length of the input data. Averages and variance are calculated using an on-the-fly algorithm [1].

Parameters:

Parameters:	data (numpy.array (or iterable with data points)) – The data to analyse. maxblock (int, optional) – Can be used to set the maximum length of the blocks to consider. Note that the maxbloc will never be set longer than half the length in data. blockskip (int, optional) – This can be used to skip certain block lengths, i.e. blockskip = 1 will consider all blocks up to maxblock, while blockskip = n will consider every n’th block up to maxblock, i.e. it will use block lengths equal to 1, 1 + n, 1 + 2*n, and so on.
Returns:	blocklen (numpy.array) – These contain the block lengths considered. block_avg (numpy.array) – The averages as a function of the block length. block_err (numpy.array) – Estimate of errors as a function of the block length. block_err_avg (float) – Average of the error estimate using blocks where `length > maxblock//2`.

data (numpy.array (or iterable with data points)) – The data to analyse.
maxblock (int, optional) – Can be used to set the maximum length of the blocks to consider. Note that the maxbloc will never be set longer than half the length in data.
blockskip (int, optional) – This can be used to skip certain block lengths, i.e. blockskip = 1 will consider all blocks up to maxblock, while blockskip = n will consider every n’th block up to maxblock, i.e. it will use block lengths equal to 1, 1 + n, 1 + 2*n, and so on.

Returns:

blocklen (numpy.array) – These contain the block lengths considered.
block_avg (numpy.array) – The averages as a function of the block length.
block_err (numpy.array) – Estimate of errors as a function of the block length.
block_err_avg (float) – Average of the error estimate using blocks where length > maxblock//2.

References

[1]	Wikipedia, “Algorithms for calculating variance”, http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance

pyretis.analysis.analysis.block_error_corr(data, maxblock=None, blockskip=1)[source]¶

Run block error analysis on the given data.

This will run the block error analysis and return the relative errors and correlation length.

Parameters:

Parameters:	data (numpy.array) – Data to analyse. maxblock (int, optional) – The maximum block length to consider. blockskip (int, optional) – This can be used to skip certain block lengths, i.e. blockskip = 1 will consider all blocks up to maxblock, while blockskip = n will consider every n’th block up to maxblock, i.e. it will use block lengths equal to 1, 1 + n, 1 + 2*n, and so on.
Returns:	out[0] (numpy.array) – These contains the block lengths considered (blen). out[1] (numpy.array) – Estimate of errors as a function of the block length (berr). out[2] (float) – Average of the error estimate for blocks (berr_avg) with `length > maxblock // 2`. out[3] (numpy.array) – Estimate of relative errors normalised by the overall average as a function of block length (rel_err). out[4] (float) – The average relative error (avg_rel_err), for blocks with `length > maxblock // 2`. out[5] (numpy.array) – The estimated correlation length as a function of the block length (ncor). out[6] (float) – The average (for blocks with length > maxblock // 2) estimated correlation length (avg_ncor).

data (numpy.array) – Data to analyse.
maxblock (int, optional) – The maximum block length to consider.
blockskip (int, optional) – This can be used to skip certain block lengths, i.e. blockskip = 1 will consider all blocks up to maxblock, while blockskip = n will consider every n’th block up to maxblock, i.e. it will use block lengths equal to 1, 1 + n, 1 + 2*n, and so on.

Returns:

out[0] (numpy.array) – These contains the block lengths considered (blen).
out[1] (numpy.array) – Estimate of errors as a function of the block length (berr).
out[2] (float) – Average of the error estimate for blocks (berr_avg) with length > maxblock // 2.
out[3] (numpy.array) – Estimate of relative errors normalised by the overall average as a function of block length (rel_err).
out[4] (float) – The average relative error (avg_rel_err), for blocks with length > maxblock // 2.
out[5] (numpy.array) – The estimated correlation length as a function of the block length (ncor).
out[6] (float) – The average (for blocks with length > maxblock // 2) estimated correlation length (avg_ncor).

pyretis.analysis.energy_analysis module¶

Methods for analysing energy data from simulations.

Important methods defined here¶

analyse_energies (py:func:.analyse_energies): Run the analysis for energies (kinetic, potential etc.).

pyretis.analysis.energy_analysis.analyse_energies(energies, settings)[source]¶

Run the energy analysis on several energy types.

The function will run the energy analysis on several energy types and collect the energies into a structure which is convenient for plotting the results.

Parameters:	energies (dict) – This dict contains the energies to analyse. settings (dict) – This dictionary contains settings for the analysis.
Returns:	results (dict) – For each energy key results[key] contains the result from the energy analysis.

See also

analyse_data()

pyretis.analysis.flux_analysis module¶

Methods for analysis of crossings for flux data.

Important methods defined here¶

analyse_flux (analyse_flux()): Run analysis for simulation flux data. This will calculate the initial flux for a simulation.

pyretis.analysis.flux_analysis.analyse_flux(fluxdata, settings)[source]¶

Run the analysis on the given flux data.

This will run the flux analysis and collect the results into a structure which is convenient for plotting and reporting the results.

Parameters:	fluxdata (list of tuples of integers) – This array contains the data obtained from a MD simulation for the fluxes. settings (dict) – This dict contains the settings for the analysis. Note that this dictionary also needs some settings from the simulation, in particular the number of cycles, the interfaces and information about the time step.
Returns:	results (dict) – This dict contains the results from the flux analysis. The keys are defined in the results variable.

pyretis.analysis.histogram module¶

Histogram functions for data analysis.

This module defines some simple functions for histograms.

Important methods defined here¶

histogram (histogram()): Create a histogram from given data.
match_all_histograms (match_all_histograms()): Function to match histograms, for instance from an umbrella sampling simulation.
histogram_and_avg (histogram_and_avg()): Create a histogram and return bins, midpoints and simple statistics.

pyretis.analysis.histogram.histogram(data, bins=10, limits=(-1, 1), density=False, weights=None)[source]¶

Create a histogram of the given data.

Parameters:

Parameters:	data (numpy.array) – The data for making the histogram. bins (int, optional) – The number of bins to divide the data into. limits (tuple/list, optional) – The max/min values to consider. density (boolean, optional) – If True the histogram will be normalized. weights (numpy.array, optional) – Weighting factors for data.
Returns:	hist (numpy.array) – The binned counts. bins (numpy.array) – The edges of the bins. bin_mid (numpy.array) – The midpoint of the bins.

data (numpy.array) – The data for making the histogram.
bins (int, optional) – The number of bins to divide the data into.
limits (tuple/list, optional) – The max/min values to consider.
density (boolean, optional) – If True the histogram will be normalized.
weights (numpy.array, optional) – Weighting factors for data.

Returns:

hist (numpy.array) – The binned counts.
bins (numpy.array) – The edges of the bins.
bin_mid (numpy.array) – The midpoint of the bins.

Examples

>>> import numpy as np
>>> from pyretis.analysis.histogram import histogram
>>> data = np.random.randn(50000)
>>> hist, bins, bin_mid = histogram(data, bins=30, limits=(-5, 5))

For plotting the histogram:

>>> from matplotlib import pyplot as plt
>>> plt.plot(bin_mid, hist, '-o', lw=3, alpha=0.8, ms=9)
>>> plt.show()

pyretis.analysis.histogram.match_all_histograms(histograms, umbrellas)[source]¶

Match several histograms from an umbrella sampling.

Parameters:

Parameters:	histograms (list of numpy.arrays) – The histograms to match. umbrellas (list of lists) – The umbrella windows used in the computation.
Returns:	histograms_s (list of numpy.arrays) – The scaled histograms. scale_factor (list of floats) – The scale factors. matched_count (numpy.array) – Count for overall matched histogram (an “averaged” histogram).

histograms (list of numpy.arrays) – The histograms to match.
umbrellas (list of lists) – The umbrella windows used in the computation.

Returns:

histograms_s (list of numpy.arrays) – The scaled histograms.
scale_factor (list of floats) – The scale factors.
matched_count (numpy.array) – Count for overall matched histogram (an “averaged” histogram).

pyretis.analysis.histogram.histogram_and_avg(data, bins, density=True)[source]¶

Create histogram an return bins, midpoints and simple statistics.

The simple statistics include the mean value and the standard deviation. The return structure is useful for plotting routines. The midpoints returned are the midpoints of the bins.

Parameters:

Parameters:	data (1D numpy.array) – This is the data to create the histogram from. bins (int) – The number of bins to use for the histogram. density (boolean, optional) – If density is true, the histogram will be normalized.
Returns:	out[0] (numpy.array) – The histogram (frequency) values. out[1] (numpy.array) – The midpoints for the bins. out[2] (tuple of floats) – These are some simple statistics, out[2][0] is the average out[2][1] is the standard deviation.

data (1D numpy.array) – This is the data to create the histogram from.
bins (int) – The number of bins to use for the histogram.
density (boolean, optional) – If density is true, the histogram will be normalized.

Returns:

out[0] (numpy.array) – The histogram (frequency) values.
out[1] (numpy.array) – The midpoints for the bins.
out[2] (tuple of floats) – These are some simple statistics, out[2][0] is the average out[2][1] is the standard deviation.

See also

histogram()

Examples

>>> import numpy as np
>>> from pyretis.analysis.histogram import histogram_and_avg
>>> data = np.random.randn(50000)
>>> hist_data = histogram_and_avg(data, bins=30)

Print out the average and standard deviation:

>>> print(hist_data[2])

For plotting with matplotlib:

>>> from matplotlib import pyplot as plt
>>> plt.plot(hist_data[1], hist_data[0], '-o', lw=3, alpha=0.8, ms=9)
>>> plt.show()

pyretis.analysis.order_analysis module¶

Module defining functions for analysis of order parameters.

Important methods defined here¶

analyse_orderp (analyse_orderp()): Run a simple order parameter analysis.

pyretis.analysis.order_analysis.analyse_orderp(orderdata, settings)[source]¶

Run the analysis on several order parameters.

The results are collected into a structure which is convenient for plotting.

Parameters:	orderdata (numpy.arrays) – The data read from the order parameter file. settings (dict) – This dictionary contains settings for the analysis.
Returns:	results (numpy.array) – For each order parameter key, results[key] contains the result of the analysis.

See also

None()

Note

We here (and in the plotting routines) make certain assumptions about the structure, i.e. the positions are assumed to have a specific meaning: column zero is the time, column one the order parameter and so on.

pyretis.analysis.path_analysis module¶

Methods for analysis of path ensembles.

Important methods defined here¶

analyse_path_ensemble (analyse_path_ensemble()): Method to analyse a path ensemble, it will calculate crossing probabilities and information about moves etc. This method can be applied to files as well as path ensemble objects.
analyse_path_ensemble_object (analyse_path_ensemble_object()): Method to analyse a path ensemble, it will calculate crossing probabilities and information about moves etc. This method is intended to work directly on path ensemble objects.
match_probabilities (match_probabilities()): Match probabilities from several path ensembles and calculate efficiencies and the error for the matched probability.
retis_flux (retis_flux()): Calculate the initial flux with errors for a RETIS simulation.
retis_rate (retis_rate()): Calculate the rate constant with errors for a RETIS simulation.

pyretis.analysis.path_analysis.analyse_path_ensemble(path_ensemble, settings)[source]¶

Analyse a path ensemble.

This function will make use of the different analysis functions and analyse a path ensemble. This function is more general than the analyse_path_ensemble_object function in that it should work on both PathEnsemble and PathEnsembleFile objects. The running average is updated on-the-fly, see Wikipedia for details [wikimov].

Parameters:

Parameters:	path_ensemble (object like `PathEnsemble`) – This is the path ensemble to analyse. settings (dict) – This dictionary contains settings for the analysis. We make use of the following keys: ngrid: The number of grid points for calculating the crossing probability as a function of the order parameter. maxblock: The max length of the blocks for the block error analysis. Note that this will maximum be equal the half of the length of the data, see block_error in .analysis. blockskip: Can be used to skip certain block lengths. A blockskip equal to n will consider every n’th block up to maxblock, i.e. it will use block lengths equal to 1, 1+n, 1+2n, etc. bins: The number of bins to use for creating histograms.
Returns:	out (dict) – This dictionary contains the main results for the analysis which can be used for plotting or other kinds of output.

path_ensemble (object like PathEnsemble) – This is the path ensemble to analyse.
settings (dict) – This dictionary contains settings for the analysis. We make use of the following keys:
- ngrid: The number of grid points for calculating the crossing probability as a function of the order parameter.
- maxblock: The max length of the blocks for the block error analysis. Note that this will maximum be equal the half of the length of the data, see block_error in .analysis.
- blockskip: Can be used to skip certain block lengths. A blockskip equal to n will consider every n’th block up to maxblock, i.e. it will use block lengths equal to 1, 1+n, 1+2n, etc.
- bins: The number of bins to use for creating histograms.

Returns:

out (dict) – This dictionary contains the main results for the analysis which can be used for plotting or other kinds of output.

See also

_update_shoot_stats(), pcross_lambda_cumulative()

and(): py:func:._create_shoot_histograms.

References

[wikimov]

Wikipedia, “Moving Average”, https://en.wikipedia.org/wiki/Moving_average

pyretis.analysis.path_analysis.analyse_path_ensemble_object(path_ensemble, settings)[source]¶

Analyse a path ensemble object.

This function will make use of the different analysis functions and analyse a path ensemble. This analysis function assumes that the given path ensemble is an object like PathEnsemble and that this path ensemble contains all the paths that are needed.

Parameters:

Parameters:	path_ensemble (object like `PathEnsemble`) – The path ensemble to analyse. settings (dict) – This dictionary contains settings for the analysis. Here we make use of the following keys from the analysis section: ngrid: The number of grid points for calculating the crossing probability as a function of the order parameter. maxblock: The max length of the blocks for the block error analysis. Note that this will maximum be equal the half of the length of the data, see block_error in .analysis. blockskip: Can be used to skip certain block lengths. A blockskip equal to n will consider every n’th block up to maxblock, i.e. it will use block lengths equal to 1, 1+n, 1+2n, etc. bins: The number of bins to use for creating histograms.
Returns:	out (dict) – This dictionary contains the main results for the analysis which can be used for plotting or other kinds of output.

path_ensemble (object like PathEnsemble) – The path ensemble to analyse.
settings (dict) – This dictionary contains settings for the analysis. Here we make use of the following keys from the analysis section:
- ngrid: The number of grid points for calculating the crossing probability as a function of the order parameter.
- maxblock: The max length of the blocks for the block error analysis. Note that this will maximum be equal the half of the length of the data, see block_error in .analysis.
- blockskip: Can be used to skip certain block lengths. A blockskip equal to n will consider every n’th block up to maxblock, i.e. it will use block lengths equal to 1, 1+n, 1+2n, etc.
- bins: The number of bins to use for creating histograms.

Returns:

out (dict) – This dictionary contains the main results for the analysis which can be used for plotting or other kinds of output.

See also

_pcross_lambda(), _running_pcross(), _get_path_distribution()

pyretis.analysis.path_analysis.match_probabilities(path_results, detect, settings=None)[source]¶

Match probabilities from several path ensembles.

It calculates the efficiencies and errors for the matched probability.

Parameters:

Parameters:	path_results (list) – These are the results from the path analysis. path_results[i] contains the output from analyse_path_ensemble applied to ensemble no. i. Here we make use of the following keys from path_results[i]: * pcross: The crossing probability. * prun: The running average of the crossing probability. * blockerror: The output from the block error analysis. * efficiency: The output from the efficiency analysis. detect (list of floats) – These are the detect interfaces used in the analysis. settings (dict, optional) – This dictionary contains settings for the analysis. Here we make use of the following keys from the analysis section: ngrid: The number of grid points for calculating the crossing probability as a function of the order parameter. maxblock: The max length of the blocks for the block error analysis. Note that this will maximum be equal the half of the length of the data, see block_error in .analysis. blockskip: Can be used to skip certain block lengths. A blockskip equal to n will consider every n’th block up to maxblock, i.e. it will use block lengths equal to 1, 1+n, 1+2n, etc. bins: The number of bins to use for creating histograms.
Returns:	results (dict) – These are results for the over-all probability and error and also some over-all TIS efficiencies.

path_results (list) – These are the results from the path analysis. path_results[i] contains the output from analyse_path_ensemble applied to ensemble no. i. Here we make use of the following keys from path_results[i]: * pcross: The crossing probability. * prun: The running average of the crossing probability. * blockerror: The output from the block error analysis. * efficiency: The output from the efficiency analysis.
detect (list of floats) – These are the detect interfaces used in the analysis.
settings (dict, optional) – This dictionary contains settings for the analysis. Here we make use of the following keys from the analysis section:
- ngrid: The number of grid points for calculating the crossing probability as a function of the order parameter.
- maxblock: The max length of the blocks for the block error analysis. Note that this will maximum be equal the half of the length of the data, see block_error in .analysis.
- blockskip: Can be used to skip certain block lengths. A blockskip equal to n will consider every n’th block up to maxblock, i.e. it will use block lengths equal to 1, 1+n, 1+2n, etc.
- bins: The number of bins to use for creating histograms.

Returns:

results (dict) – These are results for the over-all probability and error and also some over-all TIS efficiencies.

pyretis.analysis.path_analysis.retis_flux(results0, results1, timestep)[source]¶

Calculate the initial flux for RETIS.

Parameters:

Parameters:	results0 (dict) – Results from the analysis of ensemble [0^-] results1 (dict) – Results from the analysis of ensemble [0^+] timestep (float) – The simulation timestep.
Returns:	flux (float) – The initial flux. flux_error (float) – The relative error in the initial flux.

results0 (dict) – Results from the analysis of ensemble [0^-]
results1 (dict) – Results from the analysis of ensemble [0^+]
timestep (float) – The simulation timestep.

Returns:

flux (float) – The initial flux.
flux_error (float) – The relative error in the initial flux.

pyretis.analysis.path_analysis.retis_rate(pcross, pcross_relerror, flux, flux_relerror)[source]¶

Calculate the rate constant for RETIS.

Parameters:

Parameters:	pcross (float) – Estimated crossing probability pcross_relerror (float) – Relative error in crossing probability. flux (float) – The initial flux. flux_relerror (float) – Relative error in the initial flux.
Returns:	rate (float) – The rate constant rate_error (float) – The relative error in the rate constant.

pcross (float) – Estimated crossing probability
pcross_relerror (float) – Relative error in crossing probability.
flux (float) – The initial flux.
flux_relerror (float) – Relative error in the initial flux.

Returns:

rate (float) – The rate constant
rate_error (float) – The relative error in the rate constant.