1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265
|
# -*- Mode: python; tab-width: 4; indent-tabs-mode:nil; coding:utf-8 -*-
# vim: tabstop=4 expandtab shiftwidth=4 softtabstop=4
#
# MDAnalysis --- https://www.mdanalysis.org
# Copyright (c) 2006-2017 The MDAnalysis Development Team and contributors
# (see the file AUTHORS for the full list of names)
#
# Released under the Lesser GNU Public Licence, v2.1 or any higher version
#
# Please cite your use of MDAnalysis in published work:
#
# R. J. Gowers, M. Linke, J. Barnoud, T. J. E. Reddy, M. N. Melo, S. L. Seyler,
# D. L. Dotson, J. Domanski, S. Buchoux, I. M. Kenney, and O. Beckstein.
# MDAnalysis: A Python package for the rapid analysis of molecular dynamics
# simulations. In S. Benthall and S. Rostrup editors, Proceedings of the 15th
# Python in Science Conference, pages 102-109, Austin, TX, 2016. SciPy.
# doi: 10.25080/majora-629e541a-00e
#
# N. Michaud-Agrawal, E. J. Denning, T. B. Woolf, and O. Beckstein.
# MDAnalysis: A Toolkit for the Analysis of Molecular Dynamics Simulations.
# J. Comput. Chem. 32 (2011), 2319--2327, doi:10.1002/jcc.21787
#
"""Correlations utilities --- :mod:`MDAnalysis.lib.correlations`
=================================================================================
:Authors: Paul Smith & Mateusz Bieniek
:Year: 2020
:Copyright: Lesser GNU Public License v2.1+
.. versionadded:: 1.0.0
This module is primarily for internal use by other analysis modules. It
provides functionality for calculating the time autocorrelation function
of a binary variable (i.e one that is either true or false at each
frame for a given atom/molecule/set of molecules). This module includes
functions for calculating both the time continuous autocorrelation and
the intermittent autocorrelation. The function :func:`autocorrelation`
calculates the continuous autocorrelation only. The data may be
pre-processed using the function :func:`intermittency` in order to
acount for intermittency before passing the results to
:func:`autocorrelation`.
This module is inspired by seemingly disparate analyses that rely on the same
underlying calculation, including the survival probability of water around
proteins :footcite:p:`ArayaSecchi2014`, hydrogen bond lifetimes
:footcite:p:`Gowers2015,ArayaSecchi2014`, and the rate of cholesterol
flip-flop in lipid bilayers :footcite:p:`Gu2019`.
.. seeAlso::
Analysis tools that make use of modules:
* :class:`MDAnalysis.analysis.waterdynamics.SurvivalProbability`
Calculates the continuous or intermittent survival probability
of an atom group in a region of interest.
* :class:`MDAnalysis.analysis.hbonds.hbond_analysis`
Calculates the continuous or intermittent hydrogen bond
lifetime.
.. rubric:: References
.. footbibliography::
"""
import numpy as np
from copy import deepcopy
def autocorrelation(list_of_sets, tau_max, window_step=1):
r"""Implementation of a discrete autocorrelation function.
The autocorrelation of a property :math:`x` from a time :math:`t=t_0` to :math:`t=t_0 + \tau`
is given by:
.. math::
C(\tau) = \langle \frac{ x(t_0)x(t_0 +\tau) }{ x(t_0)x(t_0) } \rangle
where :math:`x` may represent any property of a particle, such as velocity or
potential energy.
This function is an implementation of a special case of the time
autocorrelation function in which the property under consideration can
be encoded with indicator variables, :math:`0` and :math:`1`, to represent the binary
state of said property. This special case is often referred to as the
survival probability (:math:`S(\tau)`). As an example, in calculating the survival
probability of water molecules within 5 Å of a protein, each water
molecule will either be within this cutoff range (:math:`1`) or not (:math:`0`). The
total number of water molecules within the cutoff at time :math:`t_0` will be
given by :math:`N(t_0)`. Other cases include the Hydrogen Bond Lifetime as
well as the translocation rate of cholesterol across a bilayer.
The survival probability of a property of a set of particles is
given by:
.. math::
S(\tau) = \langle \frac{ N(t_0, t_0 + \tau )} { N(t_0) }\rangle
where :math:`N(t0)` is the number of particles at time :math:`t_0` for which the feature
is observed, and :math:`N(t0, t_0 + \tau)` is the number of particles for which
this feature is present at every frame from :math:`t_0` to :math:`N(t0, t_0 + \tau)`.
The angular brackets represent an average over all time origins, :math:`t_0`.
See :footcite:`ArayaSecchi2014` for a description survival probability.
Parameters
----------
list_of_sets : list
List of sets. Each set corresponds to data from a single frame. Each element in a set
may be, for example, an atom id or a tuple of atoms ids. In the case of calculating the
survival probability of water around a protein, these atom ids in a given set will be
those of the atoms which are within a cutoff distance of the protein at a given frame.
tau_max : int
The last tau (lag time, inclusive) for which to calculate the autocorrelation. e.g if tau_max = 20,
the survival probability will be calculated over 20 frames.
window_step : int, optional
The step size for t0 to perform autocorrelation. Ideally, window_step will be larger than
tau_max to ensure independence of each window for which the calculation is performed.
Default is 1.
Returns
--------
tau_timeseries : list of int
the values of tau for which the autocorrelation was calculated
timeseries : list of int
the autocorelation values for each of the tau values
timeseries_data : list of list of int
the raw data from which the autocorrelation is computed, i.e :math:`S(\tau)` at each window.
This allows the time dependant evolution of :math:`S(\tau)` to be investigated.
.. versionadded:: 0.19.2
"""
# check types
if (type(list_of_sets) != list and len(list_of_sets) != 0) or type(
list_of_sets[0]
) != set:
raise TypeError(
"list_of_sets must be a one-dimensional list of sets"
) # pragma: no cover
# Check dimensions of parameters
if len(list_of_sets) < tau_max:
raise ValueError(
"tau_max cannot be greater than the length of list_of_sets"
) # pragma: no cover
tau_timeseries = list(range(1, tau_max + 1))
timeseries_data = [[] for _ in range(tau_max)]
# calculate autocorrelation
for t in range(0, len(list_of_sets), window_step):
Nt = len(list_of_sets[t])
if Nt == 0:
continue
for tau in tau_timeseries:
if t + tau >= len(list_of_sets):
break
# continuous: IDs that survive from t to t + tau and at every frame in between
Ntau = len(set.intersection(*list_of_sets[t : t + tau + 1]))
timeseries_data[tau - 1].append(Ntau / float(Nt))
timeseries = [np.mean(x) for x in timeseries_data]
# at time 0 the value has to be one
tau_timeseries.insert(0, 0)
timeseries.insert(0, 1)
return tau_timeseries, timeseries, timeseries_data
def correct_intermittency(list_of_sets, intermittency):
r"""Preprocess data to allow intermittent behaviour prior to calling :func:`autocorrelation`.
Survival probabilty may be calculated either with a strict continuous requirement or
a less strict intermittency. If calculating the survival probability water around a
protein for example, in the former case the water must be within a cutoff distance
of the protein at every frame from :math:`t_0` to :math:`t_0 + \tau` in order for it to be considered
present at :math:`t_0 + \tau`. In the intermittent case, the water molecule is allowed to
leave the region of interest for up to a specified consecutive number of frames whilst still
being considered present at :math:`t_0 + \tau`.
This function pre-processes data, such as the atom ids of water molecules within a cutoff
distance of a protein at each frame, in order to allow for intermittent behaviour, with a
single pass over the data.
For example, if an atom is absent for a number of frames equal or smaller than the parameter
:attr:`intermittency`, then this absence will be removed and thus the atom is considered to have
not left.
e.g 7,A,A,7 with `intermittency=2` will be replaced by 7,7,7,7, where A=absence.
The returned data can be used as input to the function :func:`autocorrelation` in order
to calculate the survival probability with a given intermittency.
See :footcite:p:`Gowers2015` for a description of intermittency in the
calculation of hydrogen bond lifetimes.
# TODO - is intermittency consitent with list of sets of sets? (hydrogen bonds)
Parameters
----------
list_of_sets: list
In the simple case of e.g survival probability, a list of sets of atom ids present at each frame, where a
single set contains atom ids at a given frame, e.g [{0, 1}, {0}, {0}, {0, 1}]
intermittency : int
The maximum gap allowed. The default `intermittency=0` means that if the datapoint is missing at any frame, no
changes are made to the data. With the value of `intermittency=2`, all datapoints missing for up to two
consecutive frames will be instead be considered present.
Returns
-------
list_of_sets: list
returns a new list with the IDs with added IDs which disappeared for <= :attr:`intermittency`.
e.g If [{0, 1}, {0}, {0}, {0, 1}] is a list of sets of atom ids present at each frame and `intermittency=2`,
both atoms will be considered present throughout and thus the returned list of sets will be
[{0, 1}, {0, 1}, {0, 1}, {0, 1}].
"""
if intermittency == 0:
return list_of_sets
list_of_sets = deepcopy(list_of_sets)
# an element (a superset) takes the form of:
# - an atom pair when computing hydrogen bonds lifetime,
# - atom ID in the case of water survival probability,
for i, elements in enumerate(list_of_sets):
# initially update each frame as seen 0 ago (now)
seen_frames_ago = {i: 0 for i in elements}
for j in range(1, intermittency + 2):
for element in seen_frames_ago.keys():
# no more frames
if i + j >= len(list_of_sets):
continue
# if the element is absent now
if element not in list_of_sets[i + j]:
# increase its absence counter
seen_frames_ago[element] += 1
continue
# the element is present
if seen_frames_ago[element] == 0:
# the element was present in the last frame
continue
# element was absent more times than allowed
if seen_frames_ago[element] > intermittency:
continue
# The element was absent but returned, i.e.
# within <= intermittency_value.
# Add it to the frames where it was absent.
# Introduce the corrections.
for k in range(seen_frames_ago[element], 0, -1):
list_of_sets[i + j - k].add(element)
seen_frames_ago[element] = 0
return list_of_sets
|