File: msd.py

package info (click to toggle)
mdanalysis 2.10.0-1
  • links: PTS, VCS
  • area: main
  • in suites: forky, sid
  • size: 116,696 kB
  • sloc: python: 92,135; ansic: 8,156; makefile: 215; sh: 138
file content (491 lines) | stat: -rw-r--r-- 17,942 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
# -*- Mode: python; tab-width: 4; indent-tabs-mode:nil; coding:utf-8 -*-
# vim: tabstop=4 expandtab shiftwidth=4 softtabstop=4
#
# MDAnalysis --- https://www.mdanalysis.org
# Copyright (c) 2006-2017 The MDAnalysis Development Team and contributors
# (see the file AUTHORS for the full list of names)
#
# Released under the Lesser GNU Public Licence, v2.1 or any higher version
#
# Please cite your use of MDAnalysis in published work:
#
# R. J. Gowers, M. Linke, J. Barnoud, T. J. E. Reddy, M. N. Melo, S. L. Seyler,
# D. L. Dotson, J. Domanski, S. Buchoux, I. M. Kenney, and O. Beckstein.
# MDAnalysis: A Python package for the rapid analysis of molecular dynamics
# simulations. In S. Benthall and S. Rostrup editors, Proceedings of the 15th
# Python in Science Conference, pages 102-109, Austin, TX, 2016. SciPy.
# doi: 10.25080/majora-629e541a-00e
#
# N. Michaud-Agrawal, E. J. Denning, T. B. Woolf, and O. Beckstein.
# MDAnalysis: A Toolkit for the Analysis of Molecular Dynamics Simulations.
# J. Comput. Chem. 32 (2011), 2319--2327, doi:10.1002/jcc.21787
#

r"""
Mean Squared Displacement --- :mod:`MDAnalysis.analysis.msd`
==============================================================

:Authors: Hugo MacDermott-Opeskin
:Year: 2020
:Copyright: Lesser GNU Public License v2.1+

This module implements the calculation of Mean Squared Displacements (MSDs)
by the Einstein relation. MSDs can be used to characterize the speed at
which particles move and has its roots in the study of Brownian motion.
For a full explanation of the theory behind MSDs and the subsequent calculation
of self-diffusivities the reader is directed to :footcite:p:`Maginn2019`.
MSDs can be computed from the following expression, known as the
**Einstein formula**:

.. math::

   MSD(r_{d}) = \bigg{\langle} \frac{1}{N} \sum_{i=1}^{N} |r_{d}
    - r_{d}(t_0)|^2 \bigg{\rangle}_{t_{0}}

where :math:`N` is the number of equivalent particles the MSD is calculated
over, :math:`r` are their coordinates and :math:`d` the desired dimensionality
of the MSD. Note that while the definition of the MSD is universal, there are
many practical considerations to computing the MSD that vary between
implementations. In this module, we compute a "windowed" MSD, where the MSD
is averaged over all possible lag-times :math:`\tau \le \tau_{max}`,
where :math:`\tau_{max}` is the length of the trajectory, thereby maximizing
the number of samples.

The computation of the MSD in this way can be computationally intensive due to
its :math:`N^2` scaling with respect to :math:`\tau_{max}`. An algorithm to
compute the MSD with :math:`N log(N)` scaling based on a Fast Fourier
Transform is known and can be accessed by setting ``fft=True`` [Calandri2011]_
[Buyl2018]_. The FFT-based approach requires that the
`tidynamics <https://github.com/pdebuyl-lab/tidynamics>`_ package is
installed; otherwise the code will raise an :exc:`ImportError`.

Please cite [Calandri2011]_ [Buyl2018]_ if you use this module in addition to
the normal MDAnalysis citations.

.. warning::
   To correctly compute the MSD using this analysis module, you must supply
   coordinates in the **unwrapped** convention, also known as **no-jump**. 
   That is, when atoms pass the periodic boundary, they must not be wrapped 
   back into the primary simulation cell.
   
   In MDAnalysis you can use the 
   :class:`~MDAnalysis.transformations.nojump.NoJump`
   transformation. 
   
   In GROMACS, for example, this can be done using `gmx trjconv`_ with the
   ``-pbc nojump`` flag.

.. _`gmx trjconv`: https://manual.gromacs.org/current/onlinehelp/gmx-trjconv.html

.. SeeAlso::
   :mod:`MDAnalysis.transformations.nojump`


Computing an MSD
----------------
This example computes a 3D MSD for the movement of 100 particles undergoing a
random walk. Files provided as part of the MDAnalysis test suite are used
(in the variables :data:`~MDAnalysis.tests.datafiles.RANDOM_WALK` and
:data:`~MDAnalysis.tests.datafiles.RANDOM_WALK_TOPO`)

First load all modules and test data

.. code-block:: python

    import MDAnalysis as mda
    import MDAnalysis.analysis.msd as msd
    from MDAnalysis.tests.datafiles import RANDOM_WALK_TOPO, RANDOM_WALK

Given a universe containing trajectory data we can extract the MSD
analysis by using the class :class:`EinsteinMSD`

.. code-block:: python

    u = mda.Universe(RANDOM_WALK_TOPO, RANDOM_WALK)
    MSD = msd.EinsteinMSD(u, select='all', msd_type='xyz', fft=True)
    MSD.run()

The MSD can then be accessed as

.. code-block:: python

    msd =  MSD.results.timeseries
    lagtimes = MSD.results.delta_t_values

Visual inspection of the MSD is important, so let's take a look at it with a simple plot.

.. code-block:: python

    import matplotlib.pyplot as plt
    nframes = MSD.n_frames
    fig = plt.figure()
    ax = plt.axes()
    # plot the actual MSD
    ax.plot(lagtimes, msd, lc="black", ls="-", label=r'3D random walk')
    exact = lagtimes*6
    # plot the exact result
    ax.plot(lagtimes, exact, lc="black", ls="--", label=r'$y=2 D\tau$')
    plt.show()

This gives us the plot of the MSD with respect to lag-time (:math:`\tau`).
We can see that the MSD is approximately linear with respect to :math:`\tau`.
This is a numerical example of a known theoretical result that the MSD of a
random walk is linear with respect to lag-time, with a slope of :math:`2d`.
In this expression :math:`d` is the dimensionality of the MSD. For our 3D MSD,
this is 3. For comparison we have plotted the line :math:`y=6\tau` to which an
ensemble of 3D random walks should converge.

.. _figure-msd:

.. figure:: /images/msd_demo_plot.png
    :scale: 100 %
    :alt: MSD plot

Note that a segment of the MSD is required to be linear to accurately
determine self-diffusivity. This linear segment represents the so called
"middle" of the MSD plot, where ballistic trajectories at short time-lags are
excluded along with poorly averaged data at long time-lags. We can select the
"middle" of the MSD by indexing the MSD and the time-lags. Appropriately
linear segments of the MSD can be confirmed with a log-log plot as is often
reccomended :footcite:p:`Maginn2019` where the "middle" segment can be identified
as having a slope of 1.

.. code-block:: python

    plt.loglog(lagtimes, msd)
    plt.show()

Now that we have identified what segment of our MSD to analyse, let's compute
a self-diffusivity.

Computing Self-Diffusivity
--------------------------------
Self-diffusivity is closely related to the MSD.

.. math::

   D_d = \frac{1}{2d} \lim_{t \to \infty} \frac{d}{dt} MSD(r_{d})

From the MSD, self-diffusivities :math:`D` with the desired dimensionality
:math:`d` can be computed by fitting the MSD with respect to the lag-time to
a linear model. An example of this is shown below, using the MSD computed in
the example above. The segment between :math:`\tau = 20` and :math:`\tau = 60`
is used to demonstrate selection of a MSD segment.

.. code-block:: python

    from scipy.stats import linregress
    start_time = 20
    start_index = int(start_time/timestep)
    end_time = 60
    linear_model = linregress(lagtimes[start_index:end_index], msd[start_index:end_index])
    slope = linear_model.slope
    error = linear_model.stderr
    # dim_fac is 3 as we computed a 3D msd with 'xyz'
    D = slope * 1/(2*MSD.dim_fac)

We have now computed a self-diffusivity!

Combining Multiple Replicates
--------------------------------
It is common practice to combine replicates when calculating MSDs. An example
of this is shown below using MSD1 and MSD2.

.. code-block:: python

    u1 = mda.Universe(RANDOM_WALK_TOPO, RANDOM_WALK)
    MSD1 = msd.EinsteinMSD(u1, select='all', msd_type='xyz', fft=True)
    MSD1.run()

    u2 = mda.Universe(RANDOM_WALK_TOPO, RANDOM_WALK)
    MSD2 = msd.EinsteinMSD(u2, select='all', msd_type='xyz', fft=True)
    MSD2.run()

    combined_msds = np.concatenate((MSD1.results.msds_by_particle,
                                    MSD2.results.msds_by_particle), axis=1)
    average_msd = np.mean(combined_msds, axis=1)

The same cannot be achieved by concatenating the replicas in a single run as
the jump between the last frame of the first trajectory and frame 0 of the
next trajectory will lead to an artificial inflation of the MSD and hence
any subsequent diffusion coefficient calculated.

Notes
_____

There are several factors that must be taken into account when setting up and
processing trajectories for computation of self-diffusivities.
These include specific instructions around simulation settings, using
unwrapped trajectories and maintaining a relatively small elapsed time between
saved frames. Additionally, corrections for finite size effects are sometimes
employed along with various means of estimating errors
:footcite:p:`Yeh2004,Bulow2020` The reader is directed to the following review,
which describes many of the common pitfalls :footcite:p:`Maginn2019`. There are
other ways to compute self-diffusivity, such as from a Green-Kubo integral. At
this point in time, these methods are beyond the scope of this module.


Note also that computation of MSDs is highly memory intensive. If this is
proving a problem, judicious use of the ``start``, ``stop``, ``step`` keywords
to control which frames are incorporated may be required.

References
----------

.. footbibliography::


Classes
-------

.. autoclass:: EinsteinMSD
    :members:
    :inherited-members:

"""

import numpy as np
import logging
from ..due import due, Doi
from .base import AnalysisBase
from ..core import groups
from tqdm import tqdm
import collections

logger = logging.getLogger("MDAnalysis.analysis.msd")

due.cite(
    Doi("10.21105/joss.00877"),
    description="Mean Squared Displacements with tidynamics",
    path="MDAnalysis.analysis.msd",
    cite_module=True,
)
due.cite(
    Doi("10.1051/sfn/201112010"),
    description="FCA fast correlation algorithm",
    path="MDAnalysis.analysis.msd",
    cite_module=True,
)
del Doi


class EinsteinMSD(AnalysisBase):
    r"""Class to calculate Mean Squared Displacement by the Einstein relation.

    Parameters
    ----------
    u : Universe or AtomGroup
        An MDAnalysis :class:`Universe` or :class:`AtomGroup`.
        Note that :class:`UpdatingAtomGroup` instances are not accepted.
    select : str
        A selection string. Defaults to "all" in which case
        all atoms are selected.
    msd_type : {'xyz', 'xy', 'yz', 'xz', 'x', 'y', 'z'}
        Desired dimensions to be included in the MSD. Defaults to 'xyz'.
    fft : bool
        If ``True``, uses a fast FFT based algorithm for computation of
        the MSD. Otherwise, use the simple "windowed" algorithm.
        The tidynamics package is required for `fft=True`.
        Defaults to ``True``.
    non_linear : bool
        If ``True``, calculates MSD for trajectory where frames are
        non-linearly dumped. To use this set `fft=False`.
        Defaults to ``False``.

        .. versionadded:: 2.10.0


    Attributes
    ----------
    dim_fac : int
        Dimensionality :math:`d` of the MSD.
    results.timeseries : :class:`numpy.ndarray`
        The averaged MSD over all the particles with respect to constant lag-time or
        unique Δt intervals.
    results.msds_by_particle : :class:`numpy.ndarray`
        The MSD of each individual particle with respect to constant lag-time or
        unique Δt intervals.
            - for `non_linear=False`: a 2D array of shape (n_lagtimes, n_atoms)
            - for `non_linear=True`: a 2D array of shape (n_delta_t_values, n_atoms)
    results.delta_t_values : :class:`numpy.ndarray`
        Array of unique Δt (time differences) at which time-averaged MSD values are
        computed.

        .. versionadded:: 2.10.0

    ag : :class:`AtomGroup`
        The :class:`AtomGroup` resulting from your selection
    n_frames : int
        Number of frames included in the analysis.
    n_particles : int
        Number of particles MSD was calculated over.


    .. versionadded:: 2.0.0
    .. versionchanged:: 2.10.0
       Added ability to calculate MSD from samples that are not linearly spaced with the
       new `non_linear` keyword argument.
    """

    def __init__(
        self,
        u,
        select="all",
        msd_type="xyz",
        fft=True,
        non_linear=False,
        **kwargs,
    ):
        if isinstance(u, groups.UpdatingAtomGroup):
            raise TypeError(
                "UpdatingAtomGroups are not valid for MSD computation"
            )

        super(EinsteinMSD, self).__init__(u.universe.trajectory, **kwargs)

        # args
        self.select = select
        self.msd_type = msd_type
        self._parse_msd_type()
        self.fft = fft
        self.non_linear = non_linear

        # local
        self.ag = u.select_atoms(self.select)
        self.n_particles = len(self.ag)
        self._position_array = None

        # result
        self.results.msds_by_particle = None
        self.results.timeseries = None
        self.results.delta_t_values = None

    def _prepare(self):
        # self.n_frames only available here
        # these need to be zeroed prior to each run() call
        self.results.msds_by_particle = np.zeros(
            (self.n_frames, self.n_particles)
        )
        self._position_array = np.zeros(
            (self.n_frames, self.n_particles, self.dim_fac)
        )
        # self.results.timeseries not set here

    def _parse_msd_type(self):
        r"""Sets up the desired dimensionality of the MSD."""
        keys = {
            "x": [0],
            "y": [1],
            "z": [2],
            "xy": [0, 1],
            "xz": [0, 2],
            "yz": [1, 2],
            "xyz": [0, 1, 2],
        }

        self.msd_type = self.msd_type.lower()

        try:
            self._dim = keys[self.msd_type]
        except KeyError:
            raise ValueError(
                "invalid msd_type: {} specified, please specify one of xyz, "
                "xy, xz, yz, x, y, z".format(self.msd_type)
            )

        self.dim_fac = len(self._dim)

    def _single_frame(self):
        r"""Constructs array of positions for MSD calculation."""
        # shape of position array set here, use span in last dimension
        # from this point on
        self._position_array[self._frame_index] = self.ag.positions[
            :, self._dim
        ]

    def _conclude(self):
        if self.non_linear:
            self._conclude_non_linear()
        else:
            if self.fft:
                self._conclude_fft()
            else:
                self._conclude_simple()

    def _conclude_simple(self):
        r"""Calculates the MSD via the simple "windowed" algorithm."""
        lagtimes = np.arange(1, self.n_frames)
        positions = self._position_array.astype(np.float64)
        for lag in tqdm(lagtimes):
            disp = positions[:-lag, :, :] - positions[lag:, :, :]
            sqdist = np.square(disp).sum(axis=-1)
            self.results.msds_by_particle[lag, :] = np.mean(sqdist, axis=0)
        self.results.timeseries = self.results.msds_by_particle.mean(axis=1)
        self.results.delta_t_values = np.arange(self.n_frames) * (
            self.times[1] - self.times[0]
        )

    def _conclude_fft(self):  # with FFT, np.float64 bit prescision required.
        r"""Calculates the MSD via the FCA fast correlation algorithm."""
        try:
            import tidynamics
        except ImportError:
            raise ImportError(
                """ERROR --- tidynamics was not found!

                tidynamics is required to compute an FFT based MSD (default)

                try installing it using pip eg:

                    pip install tidynamics

                or set fft=False"""
            )

        positions = self._position_array.astype(np.float64)
        for n in tqdm(range(self.n_particles)):
            self.results.msds_by_particle[:, n] = tidynamics.msd(
                positions[:, n, :]
            )
        self.results.timeseries = self.results.msds_by_particle.mean(axis=1)
        self.results.delta_t_values = np.arange(self.n_frames) * (
            self.times[1] - self.times[0]
        )

    def _conclude_non_linear(self):

        n_frames = self.n_frames
        n_atoms = self.n_particles
        positions = self._position_array.astype(np.float64)
        # Dictionary to collect MSDs: {Δt: [msd1, msd2, ...]}
        msd_dict = collections.defaultdict(list)
        msds_by_particle_dict = collections.defaultdict(list)

        # TODO: optimize the code
        # Looping over all the frames as if the referenced gets shifted frame to frame
        for i in range(n_frames):
            for j in range(i + 1, n_frames):
                delta_t = self.times[j] - self.times[i]
                # Compute displacement and squared displacement
                disp = positions[j] - positions[i]
                squared_disp = np.sum(disp**2, axis=1)
                msd = np.mean(squared_disp)
                # Store MSD under corresponding Δt
                msd_dict[delta_t].append(msd)
                msds_by_particle_dict[delta_t].append(squared_disp)

        msd_dict[0] = [0]
        msds_by_particle_dict[0.0] = [np.zeros(n_atoms)]

        # For each delta_t, stacked all squared_disp arrays and averaging over axis=0 (time origins)
        delta_t_values = sorted(msd_dict.keys())
        avg_msds = [np.mean(msd_dict[dt]) for dt in delta_t_values]
        msds_by_particle_array = np.zeros((len(delta_t_values), n_atoms))
        for idx, dt in enumerate(delta_t_values):
            # Stack list of arrays like -- (n_time_origins, n_atoms)
            arr = np.vstack(msds_by_particle_dict[dt])
            msds_by_particle_array[idx, :] = np.mean(arr, axis=0)

        self.results.timeseries = np.array(avg_msds)
        self.results.delta_t_values = np.array(delta_t_values)
        self.results.msds_by_particle = msds_by_particle_array