.. _examples-pyvisa-analysis:

Post-processing and visualization with |pyvisa|
===============================================

In this example, we post-process the methane hydrate system from the earlier
example "Using GROMACS". All post-processing is done with the |pyvisa|
analysis tab, shown in :numref:`fig_analysis_tab`, and with the recalculation
tool. We continue the study of cage-to-cage diffusion within an S1 hydrate and
add new collective variables. We then perform clustering, principal component
analysis (PCA), and visualization of the new results. These analysis methods
require the ``scikit-learn`` and ``scipy`` packages.

**Verification status:** engine -- see :ref:`example-tutorial-map`.

Tutorial quick start
--------------------

* **Best starting point:** first complete :ref:`examples-gromacs-hydrate` or
  another path-sampling tutorial that produces an ``out.rst`` file and
  trajectories.
* **Requirements:** ``scikit-learn`` and ``scipy`` for clustering and PCA,
  plus PyQt5 for the GUI.
* **Compress original data:** ``pyvisa -i out.rst -cmp``.
* **Recalculate added variables:** edit ``retis.rst`` and ``orderp.py``, then
  run ``pyvisa -i retis.rst -recalculate``.
* **Expected output:** compressed HDF5 data, recalculated collective
  variables, and clustering/PCA data saved by |pyvisa|.
* **Related check:** :file:`examples/tests/test-pyvisa/`; see
  :ref:`example-test-status`.

.. _fig_analysis_tab:

.. figure:: /_static/examples/pyvisa-analysis/fig_analysis_tab.png
   :alt: PyVisA analysis tab
   :width: 50%
   :align: center

   Analysis tab of |pyvisa| with options for interactivity, animation, and
   post-processing.

Before we proceed with the analysis, we need some simulation results, so
please run the example :ref:`Using GROMACS ` first.

The example is structured as follows:

.. contents::
   :local:

Visualization and compression
-----------------------------

When the simulation has finished, the results can be visualized and
post-processed.
It is always important to save the original data, because the recalculation
creates new |pyvisa| data from the trajectory files of the old simulation.
Frames that were not written to the trajectory files are therefore missing
from the recalculated data. The following steps are suggested:

1. Compress the original data into an HDF5 file using:

   .. code-block:: pyretis

      pyvisa -i out.rst -cmp

2. Compress only the order parameter data into an ``.hdf5.zip`` file using:

   .. code-block:: pyretis

      pyvisa -i out.rst -cmp -oo

3. Rename the compressed output, for example to
   ``standard_simulation.hdf5.zip``, so that the original data are kept
   separate from the recalculated data.

The results can be visualized either from the ``.rst`` files or from the
compressed files with one of the following commands:

.. code-block:: pyretis

   pyvisa -i 

.. code-block:: pyretis

   pyvisa -i -data

where ```` can be either an ``.rst`` file or an ``.hdf5.zip`` file. If you
use the ``-data`` option, ```` can be ``all`` to load all data, a trajectory
file, or a simulation directory. For example, the first path ensemble can be
loaded by using ``000`` as ````.

With |pyvisa|, you can visualize trajectories from all path ensembles and sort
them by status and Monte Carlo move. Try to find and visualize a reactive
pathway by clicking on the plot and pressing "Show trj" in the Analysis tab.
This highlights the points that belong to the selected trajectory. You can
also customize the appearance of the plots. A reactive pathway is highlighted
in :numref:`fig_dens_reactive`.

.. _fig_dens_reactive:

.. figure:: /_static/examples/pyvisa-analysis/fig_dens_reactive.png
   :alt: A reactive pathway
   :width: 70%
   :align: center

   Density plot of the order parameter and the kinetic energy from the
   methane hydrate example with a reactive pathway shown in green.

Recalculation of new collective variables
-----------------------------------------

Now that the original data have been stored, we can add two more collective
variables to the simulation using |pyvisa|.
The collective variables we introduce are the area of the six-membered ring
that the methane molecule jumps through and the volume of the starting cage.
These descriptors are intended to capture the breathing of the starting cage.
The following lines must be added to the ``retis.rst`` file:

.. literalinclude:: /_static/examples/pyvisa-analysis/order_rec.rst
   :language: rst

With the collective variables added to ``retis.rst``, we also need to add the
Python code that calculates them to the ``orderp.py`` file. This code needs
the ``scipy`` package, so make sure that it is installed. The code for the new
descriptors is given here:

.. literalinclude:: /_static/examples/pyvisa-analysis/orderp.py
   :language: python
   :lines: 20-59

Because the z-coordinate of the methane molecule is also one of the collective
variables, it must be added to ``orderp.py`` as well so that the
recalculation tool can use it. The full code for the ``Position`` descriptor
is shown here:

.. literalinclude:: /_static/examples/pyvisa-analysis/orderp.py
   :language: python
   :lines: 62-125

To begin the recalculation, start |pyvisa| and load all the data with the
modified ``retis.rst`` file:

.. code-block:: pyretis

   pyvisa -i retis.rst -recalculate

It is important to run |pyvisa| with the modified ``retis.rst`` file so that
the program knows which collective variables have been added. The
recalculation can also be started from the |pyvisa| GUI by selecting
"Recalculate data" in the file menu. The warning shown in
:numref:`fig_recalc_note` will then appear. Press "Yes" if you have added the
post-processing requirements to both ``retis.rst`` and ``orderp.py``.

.. _fig_recalc_note:

.. figure:: /_static/examples/pyvisa-analysis/recalc_note.png
   :alt: Warning issued before recalculation of new collective variables.
   :width: 50%
   :align: center

   The warning issued by |pyvisa| before the recalculations.

The recalculation will now start.
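While the recalculation runs, it may help to see what a ring-area descriptor
computes. The sketch below is a minimal, hypothetical implementation and not
the code in ``orderp.py``: it assumes that the ring atoms (for example, the six
oxygen atoms of the ring) are given in order around the ring and that the
simulation box is orthorhombic. The names ``make_whole`` and ``ring_area`` are
chosen for this illustration only. The area is taken as the magnitude of the
vector area, :math:`\frac{1}{2}\lvert\sum_i \mathbf{r}_i \times \mathbf{r}_{i+1}\rvert`,
which equals the polygon area for a planar ring and the maximal projected area
for a puckered one.

.. code-block:: python

   import numpy as np


   def make_whole(ring, box):
       """Undo periodic wrapping of a ring in an orthorhombic box.

       Each atom is moved to its minimum-image position relative to the
       first atom; this assumes the ring is smaller than half the box.
       """
       ring = np.asarray(ring, dtype=float)
       box = np.asarray(box, dtype=float)
       delta = ring - ring[0]
       delta -= box * np.round(delta / box)
       return ring[0] + delta


   def ring_area(ring):
       """Area of a closed ring given as ordered (n, 3) coordinates.

       Uses the vector area 0.5 * sum(r_i x r_{i+1}); for a planar ring this
       is the ordinary polygon area.
       """
       ring = np.asarray(ring, dtype=float)
       centred = ring - ring.mean(axis=0)
       following = np.roll(centred, -1, axis=0)  # r_{i+1}, wrapping to r_0
       vector_area = 0.5 * np.cross(centred, following).sum(axis=0)
       return float(np.linalg.norm(vector_area))


   # Regular hexagon with side length 1 in the xy-plane.
   angles = np.arange(6) * np.pi / 3
   hexagon = np.column_stack([np.cos(angles), np.sin(angles), np.zeros(6)])
   print(round(ring_area(hexagon), 4))  # 2.5981, i.e. 3*sqrt(3)/2

   # The same ring shifted and wrapped into a periodic box of side 5.
   box = np.array([5.0, 5.0, 5.0])
   wrapped = (hexagon + [4.5, 2.0, 2.0]) % box
   print(round(ring_area(make_whole(wrapped, box)), 4))  # 2.5981

For the cage volume, one option is the volume of the convex hull of the cage
atoms, available as the ``volume`` attribute of ``scipy.spatial.ConvexHull``;
whether ``orderp.py`` uses this approach is not stated here.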
When the procedure is finished, |pyvisa| loads the new data into the GUI and
displays a message that the new data can be visualized.

Post-processing and clustering
------------------------------

Now that we have the new data, we can use |pyvisa| to perform clustering and
PCA. The data exploration can proceed as follows:

1. Go to the Analysis tab of |pyvisa| and press the "Show correlations"
   button. This produces the correlation matrix. Based on these results, plot
   the order parameter against one of the collective variables.
2. Select the number of components to use in the clustering, pick a
   clustering algorithm from the analysis selection, and press the Analysis
   button. This produces a cluster plot of the chosen variables. Start with
   k-means, and then try Gaussian mixture and spectral clustering to see
   whether the methods give different results.
3. Perform a principal component analysis of the results. Begin by selecting
   3 components and PCA as the method, and press the Analysis button. This
   produces the loading matrix, the scores plot of the first two components,
   and the cumulative explained variance. How much of the variance was
   retained? Were three components enough? Are there strong correlations
   between the principal components and the original descriptors?

If you want to study the results from the principal component analysis
further, they are stored in an HDF5 file in your simulation directory. This
file contains all the simulation data together with the data from the
analysis.

Tested by
---------

The automated coverage for this workflow is represented by
:file:`examples/tests/test-pyvisa/`. The GUI portions remain
environment-dependent, while the command-line compression and recalculation
paths are the right targets for repeatable checks.
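In the same spirit of GUI-independent checks, the PCA from step 3 of the data
exploration can be reproduced outside |pyvisa|. The sketch below is a plain
NumPy PCA via singular value decomposition; it is not the |pyvisa|
implementation, which requires ``scikit-learn``. It assumes that the
collective variables have already been collected into an array with one row
per frame and one column per descriptor (extracting them from the HDF5 file is
not shown), and it standardizes each column because the descriptors have
different units; whether |pyvisa| scales the data in the same way is an
assumption. The function name ``pca`` is hypothetical, and the example data
are synthetic, not results from the hydrate simulation.

.. code-block:: python

   import numpy as np


   def pca(data, n_components=3):
       """PCA of a (frames x descriptors) array via singular value decomposition.

       Returns the scores, the loading matrix and the cumulative fraction of
       explained variance for the first ``n_components`` components.
       """
       data = np.asarray(data, dtype=float)
       # Standardize each descriptor, since they have different units
       # (assumption: pyvisa may scale the data differently).
       std = data.std(axis=0, ddof=1)
       std[std == 0.0] = 1.0  # leave constant descriptors unscaled
       scaled = (data - data.mean(axis=0)) / std
       # Rows of vt are the principal axes; s holds the singular values.
       _, s, vt = np.linalg.svd(scaled, full_matrices=False)
       explained = s**2 / np.sum(s**2)  # fraction of variance per component
       loadings = vt[:n_components].T  # shape: (descriptors, components)
       scores = scaled @ loadings  # projected frames: (frames, components)
       cumulative = np.cumsum(explained)[:n_components]
       return scores, loadings, cumulative


   # Synthetic data (NOT hydrate results): an "order parameter", two
   # descriptors correlated with it, and one uncorrelated descriptor.
   rng = np.random.default_rng(seed=1)
   n_frames = 500
   orderp = rng.normal(size=n_frames)
   area = 0.8 * orderp + 0.2 * rng.normal(size=n_frames)
   volume = -0.5 * orderp + 0.5 * rng.normal(size=n_frames)
   noise = rng.normal(size=n_frames)
   data = np.column_stack([orderp, area, volume, noise])

   scores, loadings, cumulative = pca(data, n_components=3)
   print(scores.shape, loadings.shape)  # (500, 3) (4, 3)
   print(f"Variance retained by 3 components: {cumulative[-1]:.1%}")

The last entry of ``cumulative`` answers the question of how much variance the
chosen number of components retains, and large entries in a column of
``loadings`` show which descriptors dominate that component.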