1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431
|
r"""
===============
Decoding (MVPA)
===============
.. contents:: Contents
:local:
:depth: 3
.. include:: ../git_links.inc
Design philosophy
=================
Decoding (a.k.a. MVPA) in MNE largely follows the machine
learning API of the scikit-learn package.
Each estimator implements ``fit``, ``transform``, ``fit_transform``, and
(optionally) ``inverse_transform`` methods. For more details on this design,
visit scikit-learn_. For additional theoretical insights into the decoding
framework in MNE, see [1]_.
For ease of comprehension, we will denote instantiations of the class using
the same name as the class but in small caps instead of camel cases.
Let's start by loading data for a simple two-class problem:
"""
# sphinx_gallery_thumbnail_number = 6
import numpy as np
import matplotlib.pyplot as plt
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
import mne
from mne.datasets import sample
from mne.decoding import (SlidingEstimator, GeneralizingEstimator, Scaler,
cross_val_multiscore, LinearModel, get_coef,
Vectorizer, CSP)
data_path = sample.data_path()
raw_fname = data_path + '/MEG/sample/sample_audvis_raw.fif'
tmin, tmax = -0.200, 0.500
event_id = {'Auditory/Left': 1, 'Visual/Left': 3} # just use two
raw = mne.io.read_raw_fif(raw_fname, preload=True)
# The subsequent decoding analyses only capture evoked responses, so we can
# low-pass the MEG data. Usually a value more like 40 Hz would be used,
# but here low-pass at 20 so we can more heavily decimate, and allow
# the examlpe to run faster. The 2 Hz high-pass helps improve CSP.
raw.filter(2, 20)
events = mne.find_events(raw, 'STI 014')
# Set up pick list: EEG + MEG - bad channels (modify to your needs)
raw.info['bads'] += ['MEG 2443', 'EEG 053'] # bads + 2 more
picks = mne.pick_types(raw.info, meg='grad', eeg=False, stim=True, eog=True,
exclude='bads')
# Read epochs
epochs = mne.Epochs(raw, events, event_id, tmin, tmax, proj=True,
picks=picks, baseline=(None, 0.), preload=True,
reject=dict(grad=4000e-13, eog=150e-6), decim=10)
epochs.pick_types(meg=True, exclude='bads') # remove stim and EOG
X = epochs.get_data() # MEG signals: n_epochs, n_meg_channels, n_times
y = epochs.events[:, 2] # target: Audio left or right
###############################################################################
# Transformation classes
# ======================
#
# Scaler
# ^^^^^^
# The :class:`mne.decoding.Scaler` will standardize the data based on channel
# scales. In the simplest modes ``scalings=None`` or ``scalings=dict(...)``,
# each data channel type (e.g., mag, grad, eeg) is treated separately and
# scaled by a constant. This is the approach used by e.g.,
# :func:`mne.compute_covariance` to standardize channel scales.
#
# If ``scalings='mean'`` or ``scalings='median'``, each channel is scaled using
# empirical measures. Each channel is scaled independently by the mean and
# standand deviation, or median and interquartile range, respectively, across
# all epochs and time points during :class:`~mne.decoding.Scaler.fit`
# (during training). The :meth:`~mne.decoding.Scaler.transform` method is
# called to transform data (training or test set) by scaling all time points
# and epochs on a channel-by-channel basis. To perform both the ``fit`` and
# ``transform`` operations in a single call, the
# :meth:`~mne.decoding.Scaler.fit_transform` method may be used. To invert the
# transform, :meth:`~mne.decoding.Scaler.inverse_transform` can be used. For
# ``scalings='median'``, scikit-learn_ version 0.17+ is required.
#
# .. note:: Using this class is different from directly applying
# :class:`sklearn.preprocessing.StandardScaler` or
# :class:`sklearn.preprocessing.RobustScaler` offered by
# scikit-learn_. These scale each *classification feature*, e.g.
# each time point for each channel, with mean and standard
# deviation computed across epochs, whereas
# :class:`mne.decoding.Scaler` scales each *channel* using mean and
# standard deviation computed across all of its time points
# and epochs.
#
# Vectorizer
# ^^^^^^^^^^
# Scikit-learn API provides functionality to chain transformers and estimators
# by using :class:`sklearn.pipeline.Pipeline`. We can construct decoding
# pipelines and perform cross-validation and grid-search. However scikit-learn
# transformers and estimators generally expect 2D data
# (n_samples * n_features), whereas MNE transformers typically output data
# with a higher dimensionality
# (e.g. n_samples * n_channels * n_frequencies * n_times). A Vectorizer
# therefore needs to be applied between the MNE and the scikit-learn steps
# like:
# Uses all MEG sensors and time points as separate classification
# features, so the resulting filters used are spatio-temporal
clf = make_pipeline(Scaler(epochs.info),
Vectorizer(),
LogisticRegression(solver='lbfgs'))
scores = cross_val_multiscore(clf, X, y, cv=5, n_jobs=1)
# Mean scores across cross-validation splits
score = np.mean(scores, axis=0)
print('Spatio-temporal: %0.1f%%' % (100 * score,))
###############################################################################
# PSDEstimator
# ^^^^^^^^^^^^
# The :class:`mne.decoding.PSDEstimator`
# computes the power spectral density (PSD) using the multitaper
# method. It takes a 3D array as input, converts it into 2D and computes the
# PSD.
#
# FilterEstimator
# ^^^^^^^^^^^^^^^
# The :class:`mne.decoding.FilterEstimator` filters the 3D epochs data.
#
# Spatial filters
# ===============
#
# Just like temporal filters, spatial filters provide weights to modify the
# data along the sensor dimension. They are popular in the BCI community
# because of their simplicity and ability to distinguish spatially-separated
# neural activity.
#
# Common spatial pattern
# ^^^^^^^^^^^^^^^^^^^^^^
#
# :class:`mne.decoding.CSP` is a technique to analyze multichannel data based
# on recordings from two classes [2]_ (see also
# http://en.wikipedia.org/wiki/Common_spatial_pattern).
#
# Let :math:`X \in R^{C\times T}` be a segment of data with
# :math:`C` channels and :math:`T` time points. The data at a single time point
# is denoted by :math:`x(t)` such that :math:`X=[x(t), x(t+1), ..., x(t+T-1)]`.
# Common spatial pattern (CSP) finds a decomposition that projects the signal
# in the original sensor space to CSP space using the following transformation:
#
# .. math:: x_{CSP}(t) = W^{T}x(t)
# :label: csp
#
# where each column of :math:`W \in R^{C\times C}` is a spatial filter and each
# row of :math:`x_{CSP}` is a CSP component. The matrix :math:`W` is also
# called the de-mixing matrix in other contexts. Let
# :math:`\Sigma^{+} \in R^{C\times C}` and :math:`\Sigma^{-} \in R^{C\times C}`
# be the estimates of the covariance matrices of the two conditions.
# CSP analysis is given by the simultaneous diagonalization of the two
# covariance matrices
#
# .. math:: W^{T}\Sigma^{+}W = \lambda^{+}
# :label: diagonalize_p
# .. math:: W^{T}\Sigma^{-}W = \lambda^{-}
# :label: diagonalize_n
#
# where :math:`\lambda^{C}` is a diagonal matrix whose entries are the
# eigenvalues of the following generalized eigenvalue problem
#
# .. math:: \Sigma^{+}w = \lambda \Sigma^{-}w
# :label: eigen_problem
#
# Large entries in the diagonal matrix corresponds to a spatial filter which
# gives high variance in one class but low variance in the other. Thus, the
# filter facilitates discrimination between the two classes.
#
# .. topic:: Examples
#
# * :ref:`sphx_glr_auto_examples_decoding_plot_decoding_csp_eeg.py`
# * :ref:`sphx_glr_auto_examples_decoding_plot_decoding_csp_timefreq.py`
#
# .. note::
#
# The winning entry of the Grasp-and-lift EEG competition in Kaggle used
# the :class:`~mne.decoding.CSP` implementation in MNE and was featured as
# a `script of the week`_.
#
# .. _script of the week: http://blog.kaggle.com/2015/08/12/july-2015-scripts-of-the-week/ # noqa
#
# We can use CSP with these data with:
csp = CSP(n_components=3, norm_trace=False)
clf = make_pipeline(csp, LogisticRegression(solver='lbfgs'))
scores = cross_val_multiscore(clf, X, y, cv=5, n_jobs=1)
print('CSP: %0.1f%%' % (100 * scores.mean(),))
###############################################################################
# Source power comodulation (SPoC)
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# Source Power Comodulation (:class:`mne.decoding.SPoC`) [3]_
# identifies the composition of
# orthogonal spatial filters that maximally correlate with a continuous target.
#
# SPoC can be seen as an extension of the CSP where the target is driven by a
# continuous variable rather than a discrete variable. Typical applications
# include extraction of motor patterns using EMG power or audio patterns using
# sound envelope.
#
# .. topic:: Examples
#
# * :ref:`sphx_glr_auto_examples_decoding_plot_decoding_spoc_CMC.py`
#
# xDAWN
# ^^^^^
# :class:`mne.preprocessing.Xdawn` is a spatial filtering method designed to
# improve the signal to signal + noise ratio (SSNR) of the ERP responses [4]_.
# Xdawn was originally
# designed for P300 evoked potential by enhancing the target response with
# respect to the non-target response. The implementation in MNE-Python is a
# generalization to any type of ERP.
#
# .. topic:: Examples
#
# * :ref:`sphx_glr_auto_examples_preprocessing_plot_xdawn_denoising.py`
# * :ref:`sphx_glr_auto_examples_decoding_plot_decoding_xdawn_eeg.py`
#
# Effect-matched spatial filtering
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# The result of :class:`mne.decoding.EMS` is a spatial filter at each time
# point and a corresponding time course [5]_.
# Intuitively, the result gives the similarity between the filter at
# each time point and the data vector (sensors) at that time point.
#
# .. topic:: Examples
#
# * :ref:`sphx_glr_auto_examples_decoding_plot_ems_filtering.py`
#
# Patterns vs. filters
# ^^^^^^^^^^^^^^^^^^^^
#
# When interpreting the components of the CSP (or spatial filters in general),
# it is often more intuitive to think about how :math:`x(t)` is composed of
# the different CSP components :math:`x_{CSP}(t)`. In other words, we can
# rewrite Equation :eq:`csp` as follows:
#
# .. math:: x(t) = (W^{-1})^{T}x_{CSP}(t)
# :label: patterns
#
# The columns of the matrix :math:`(W^{-1})^T` are called spatial patterns.
# This is also called the mixing matrix. The example
# :ref:`sphx_glr_auto_examples_decoding_plot_linear_model_patterns.py`
# discusses the difference between patterns and filters.
#
# These can be plotted with:
# Fit CSP on full data and plot
csp.fit(X, y)
csp.plot_patterns(epochs.info)
csp.plot_filters(epochs.info, scalings=1e-9)
###############################################################################
# Decoding over time
# ==================
#
# This strategy consists in fitting a multivariate predictive model on each
# time instant and evaluating its performance at the same instant on new
# epochs. The :class:`mne.decoding.SlidingEstimator` will take as input a
# pair of features :math:`X` and targets :math:`y`, where :math:`X` has
# more than 2 dimensions. For decoding over time the data :math:`X`
# is the epochs data of shape n_epochs x n_channels x n_times. As the
# last dimension of :math:`X` is the time, an estimator will be fit
# on every time instant.
#
# This approach is analogous to SlidingEstimator-based approaches in fMRI,
# where here we are interested in when one can discriminate experimental
# conditions and therefore figure out when the effect of interest happens.
#
# When working with linear models as estimators, this approach boils
# down to estimating a discriminative spatial filter for each time instant.
#
# Temporal decoding
# ^^^^^^^^^^^^^^^^^
#
# We'll use a Logistic Regression for a binary classification as machine
# learning model.
# We will train the classifier on all left visual vs auditory trials on MEG
clf = make_pipeline(StandardScaler(), LogisticRegression(solver='lbfgs'))
time_decod = SlidingEstimator(clf, n_jobs=1, scoring='roc_auc', verbose=True)
scores = cross_val_multiscore(time_decod, X, y, cv=5, n_jobs=1)
# Mean scores across cross-validation splits
scores = np.mean(scores, axis=0)
# Plot
fig, ax = plt.subplots()
ax.plot(epochs.times, scores, label='score')
ax.axhline(.5, color='k', linestyle='--', label='chance')
ax.set_xlabel('Times')
ax.set_ylabel('AUC') # Area Under the Curve
ax.legend()
ax.axvline(.0, color='k', linestyle='-')
ax.set_title('Sensor space decoding')
###############################################################################
# You can retrieve the spatial filters and spatial patterns if you explicitly
# use a LinearModel
clf = make_pipeline(StandardScaler(),
LinearModel(LogisticRegression(solver='lbfgs')))
time_decod = SlidingEstimator(clf, n_jobs=1, scoring='roc_auc', verbose=True)
time_decod.fit(X, y)
coef = get_coef(time_decod, 'patterns_', inverse_transform=True)
evoked = mne.EvokedArray(coef, epochs.info, tmin=epochs.times[0])
joint_kwargs = dict(ts_args=dict(time_unit='s'),
topomap_args=dict(time_unit='s'))
evoked.plot_joint(times=np.arange(0., .500, .100), title='patterns',
**joint_kwargs)
###############################################################################
# Temporal generalization
# ^^^^^^^^^^^^^^^^^^^^^^^
#
# Temporal generalization is an extension of the decoding over time approach.
# It consists in evaluating whether the model estimated at a particular
# time instant accurately predicts any other time instant. It is analogous to
# transferring a trained model to a distinct learning problem, where the
# problems correspond to decoding the patterns of brain activity recorded at
# distinct time instants.
#
# The object to for Temporal generalization is
# :class:`mne.decoding.GeneralizingEstimator`. It expects as input :math:`X`
# and :math:`y` (similarly to :class:`~mne.decoding.SlidingEstimator`) but
# generates predictions from each model for all time instants. The class
# :class:`~mne.decoding.GeneralizingEstimator` is generic and will treat the
# last dimension as the one to be used for generalization testing. For
# convenience, here, we refer to it as different tasks. If :math:`X`
# corresponds to epochs data then the last dimension is time.
#
# This runs the analysis used in [6]_ and further detailed in [7]_:
# define the Temporal generalization object
time_gen = GeneralizingEstimator(clf, n_jobs=1, scoring='roc_auc',
verbose=True)
scores = cross_val_multiscore(time_gen, X, y, cv=5, n_jobs=1)
# Mean scores across cross-validation splits
scores = np.mean(scores, axis=0)
# Plot the diagonal (it's exactly the same as the time-by-time decoding above)
fig, ax = plt.subplots()
ax.plot(epochs.times, np.diag(scores), label='score')
ax.axhline(.5, color='k', linestyle='--', label='chance')
ax.set_xlabel('Times')
ax.set_ylabel('AUC')
ax.legend()
ax.axvline(.0, color='k', linestyle='-')
ax.set_title('Decoding MEG sensors over time')
###############################################################################
# Plot the full (generalization) matrix:
fig, ax = plt.subplots(1, 1)
im = ax.imshow(scores, interpolation='lanczos', origin='lower', cmap='RdBu_r',
extent=epochs.times[[0, -1, 0, -1]], vmin=0., vmax=1.)
ax.set_xlabel('Testing Time (s)')
ax.set_ylabel('Training Time (s)')
ax.set_title('Temporal generalization')
ax.axvline(0, color='k')
ax.axhline(0, color='k')
plt.colorbar(im, ax=ax)
###############################################################################
# Source-space decoding
# =====================
#
# Source space decoding is also possible, but because the number of features
# can be much larger than in the sensor space, univariate feature selection
# using ANOVA f-test (or some other metric) can be done to reduce the feature
# dimension. Interpreting decoding results might be easier in source space as
# compared to sensor space.
#
# .. topic:: Examples
#
# * :ref:`tut_dec_st_source`
#
# Exercise
# ========
#
# - Explore other datasets from MNE (e.g. Face dataset from SPM to predict
# Face vs. Scrambled)
#
# References
# ==========
# .. [1] Jean-Rémi King et al. (2018) "Encoding and Decoding Neuronal Dynamics:
# Methodological Framework to Uncover the Algorithms of Cognition",
# in press. https://hal.archives-ouvertes.fr/hal-01848442/
# .. [2] Zoltan J. Koles. The quantitative extraction and topographic mapping
# of the abnormal components in the clinical EEG. Electroencephalography
# and Clinical Neurophysiology, 79(6):440--447, December 1991.
# .. [3] Dahne, S., Meinecke, F. C., Haufe, S., Hohne, J., Tangermann, M.,
# Muller, K. R., & Nikulin, V. V. (2014). SPoC: a novel framework for
# relating the amplitude of neuronal oscillations to behaviorally
# relevant parameters. NeuroImage, 86, 111-122.
# .. [4] Rivet, B., Souloumiac, A., Attina, V., & Gibert, G. (2009). xDAWN
# algorithm to enhance evoked potentials: application to
# brain-computer interface. Biomedical Engineering, IEEE Transactions
# on, 56(8), 2035-2043.
# .. [5] Aaron Schurger, Sebastien Marti, and Stanislas Dehaene, "Reducing
# multi-sensor data to a single time course that reveals experimental
# effects", BMC Neuroscience 2013, 14:122
# .. [6] Jean-Remi King, Alexandre Gramfort, Aaron Schurger, Lionel Naccache
# and Stanislas Dehaene, "Two distinct dynamic modes subtend the
# detection of unexpected sounds", PLOS ONE, 2013,
# http://www.ncbi.nlm.nih.gov/pubmed/24475052
# .. [7] King & Dehaene (2014) 'Characterizing the dynamics of mental
# representations: the temporal generalization method', Trends In
# Cognitive Sciences, 18(4), 203-210.
# http://www.ncbi.nlm.nih.gov/pubmed/24593982
|