Advanced SEC-SAXS processing – Evolving factor analysis (EFA)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. _raw_efa:

Sometimes SEC fails to fully separate different species, and you end up with overlapping
peaks in your SEC-SAXS curve. It is possible to apply more advanced mathematical techniques
to determine if there are multiple species of macromolecule in a SEC-SAXS peak, and to attempt
to extract scattering profiles for each component in an overlapping peak.
:ref:`Singular value decomposition (SVD) <raw_svd>` can be used to help determine how many distinct scatterers are in a
SEC-SAXS peak. Evolving factor analysis (EFA) is an extension of SVD that can extract individual
components from overlapping SEC-SAXS peaks. Note that the first step of EFA is
doing SVD, but that happens entirely within the EFA analysis window. The SVD
window does not need to be opened before doing EFA. This tutorial covers
EFA.
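
As a rough illustration of how SVD indicates the number of scatterers, the
following Python sketch builds a synthetic two-component dataset and counts the
components whose singular vectors are smooth. It is not RAW's implementation:
the Guinier-like profiles, the peak shapes, the noise level, the particular
lag-1 autocorrelation formula, and the 0.6 cutoff are all assumptions made for
this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a SEC-SAXS dataset (all values are illustrative).
q = np.linspace(0.01, 0.25, 100)   # scattering vector
frames = np.arange(120)            # frame numbers


def peak(start, end):
    """Smooth elution peak that is exactly zero outside start..end."""
    c = np.zeros(frames.size)
    c[start:end + 1] = np.hanning(end - start + 1)
    return c


profiles = np.column_stack([np.exp(-(q * 15.0) ** 2 / 3.0),
                            np.exp(-(q * 40.0) ** 2 / 3.0)])
conc = np.column_stack([peak(20, 60), peak(45, 90)])

# Rows are q points, columns are frames.
data = profiles @ conc.T + rng.normal(0.0, 0.01, (q.size, frames.size))

u, s, vt = np.linalg.svd(data, full_matrices=False)


def lag1_autocorr(vectors):
    """Absolute lag-1 autocorrelation of each unit-norm column."""
    return np.abs(np.sum(vectors[:-1] * vectors[1:], axis=0))


auto_u = lag1_autocorr(u)      # left singular vectors (functions of q)
auto_v = lag1_autocorr(vt.T)   # right singular vectors (functions of frame)

# Count components whose left and right vectors are both smooth.
n_significant = int(np.sum((auto_u[:10] > 0.6) & (auto_v[:10] > 0.6)))

for i in range(5):
    print(f"SV {i}: value={s[i]:.3f}  "
          f"U autocorr={auto_u[i]:.2f}  V autocorr={auto_v[i]:.2f}")
print("Estimated number of components:", n_significant)
```

For this synthetic data the first two singular values should stand well above
the rest, and only the first two pairs of singular vectors should have
autocorrelations near 1. Real data is rarely this clean, so treat a count like
this as a starting point to check against the plots.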

:ref:`REGALS <raw_regals>` is a similar deconvolution technique, but can be
applied in cases where there are components that are not strictly
first-in-first-out and EFA would fail. EFA is recommended for standard
SEC-SAXS data, but for more complex data, such as ion exchange chromatography,
time-resolved, or titration data, you should use REGALS. REGALS can also
handle deconvolution of SEC-SAXS data with a sloping baseline.

If you use EFA in RAW, in addition to citing the RAW paper, please cite the
EFA paper: S. P. Meisburger, A. B. Taylor, C. A. Khan, S. Zhang, P. F.
Fitzpatrick, N. Ando. Journal of the American Chemical Society (2016). 138(20),
6506-6516. DOI: `10.1021/jacs.6b01563 <https://doi.org/10.1021/jacs.6b01563>`_

A video version of this tutorial is available:

.. raw:: html

    <style>.embed-container { position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden; max-width: 100%; } .embed-container iframe, .embed-container object, .embed-container embed { position: absolute; top: 0; left: 0; width: 100%; height: 100%; }</style><div class='embed-container'><iframe src='https://www.youtube.com/embed/U2bSg20mU8s' frameborder='0' allowfullscreen></iframe></div>

The written version of the tutorial follows.


#.  Clear all of the data in RAW. Load the **phehc_sec.hdf5** file in the **series_data** folder.

    *   *Note:* The data were provided by the Ando group at Cornell University
        and are some of the data used in the paper: *Domain Movements upon Activation of
        Phenylalanine Hydroxylase Characterized by Crystallography and Chromatography-Coupled
        Small-Angle X-ray Scattering*\ . Steve P. Meisburger, Alexander B. Taylor, Crystal
        A. Khan, Shengnan Zhang, Paul F. Fitzpatrick, and Nozomi Ando. Journal of the
        American Chemical Society 2016 138 (20), 6506-6516. `DOI: 10.1021/jacs.6b01563
        <https://dx.doi.org/10.1021/jacs.6b01563>`_

    |efa_series_plot_png|

#.  We will use EFA to extract out the two scattering components in the main
    peak in this data. Right click on the **phehc_sec.hdf5** item in the Series
    list. Select the “EFA” option.

#.  The EFA window will be displayed. On the left are controls; on the right are plots of
    the singular values and of the first autocorrelation of the left and right
    singular vectors.

    *   *Note:* Large singular values indicate significant components. What matters is the relative
        magnitude, that is, whether the value is large relative to the mostly flat/unchanging
        value of high index singular values.

    *   *Note:* A large autocorrelation indicates that the singular vector is varying smoothly,
        while a low autocorrelation indicates the vector is very noisy. Vectors corresponding to
        significant components will tend to have autocorrelations near 1 (roughly, >0.6-0.7) and
        vectors corresponding to insignificant components will tend to have autocorrelations near 0.

    |efa_panel_png|

#.  For successful EFA, you want to use Subtracted data, and you often want to have
    a buffer region before and after the sample. For this data set, using the entire
    frame range (from 0 to 385) is appropriate. With other data sets, you may need to
    change the frame range to, for example, remove other, well separated, peaks from the
    analysis.

        *   *Tip:* If you have a dataset where you have a large number of components,
            such as 4+, it can be useful to set the EFA range to isolate just
            2-3 of those components. The more components you have, the harder
            it is to do the EFA. There is a trade-off between the amount of data
            used (more is better) and the number of components in the
            deconvolution (fewer is better) that requires some experimentation
            to find the right balance for a given dataset.

#.  RAW attempts to automatically determine how many significant singular values (SVs) there
    are in the selected range. This corresponds to the number of significant scattering
    components in solution that EFA will attempt to deconvolve. At the bottom of
    the control panel, you should see that RAW thinks there are three significant
    SVs (scattering components) in our data. For this data set, that is accurate.
    We evaluate the number of significant components by how many singular values
    are above the baseline, and how many components have both left and right singular
    vectors with autocorrelations near one. For this data there are three singular
    values above baseline, and three singular vectors with autocorrelations near
    1 (see step 3).

    |efa_components_png|

    *   *Note:* Typically you want the number of significant singular values and
        the number of singular vectors with autocorrelations near 1 to be equal.
        If they aren't, it likely indicates a weak or otherwise poorly resolved
        component in the dataset. Try the deconvolution first with the lower, then
        with the higher, number of components.

    *   *Note:* RAW can find the wrong number of components automatically. You will
        always want to double check this automatic determination against the SVD results in
        the plots. If you change the data range used (or data type), the number
        of components will not automatically update so you should check and update
        it if necessary.

#.  Click the “Next” button in the lower right-hand corner of the window to advance to
    the second stage of the EFA analysis.

    *   *Note:* It may take some time to compute the necessary values for this next step,
        so be patient.

    |efa_panel_2_png|

#.  This step shows you the “Forward EFA” and “Backward EFA” plots. These plots represent
    the value of the singular values as a function of frame.

    *   *Note:* There is one more singular value displayed on each plot than is available in
        the controls. This is so that in the following steps you can determine where each
        component deviates from the baseline.

#.  In the User Input panel, tweak the “Forward” value start frames so that the frame
    number, as indicated by the open circle on the plot, aligns with where the singular
    value first starts to increase quickly. This should be around 147, 164, and 322.

    *   *Note:* For the Forward EFA plot, SVD is run on just the first two frames, then
        the first three, and so on, until all frames in the range are included. As more
        frames are added, the singular values change, as shown on the plot. When a singular
        value starts increasing sharply, it indicates that there is a new scattering
        component in the scattering profile measured at that point. So, for the first ~147
        frames, there are no new scattering components (i.e. just buffer scattering). At
        frame ~147, we see the first singular value (the singular value with index 0,
        labeled SV 0 on the plot) start to strongly increase, showing that we have gained
        a scattering component. We see SV 1 start to increase at ~164, indicating another
        scattering component starting to be present in the data. SV 2 starts to increase
        at ~322, marking the start of the third component.

#.  In the User Input panel, tweak the “Backward” value start frames so that the frame
    number, as indicated by the open circle on the plot, aligns with where
    the singular value drops back to near the baseline. This should be around
    383, 360, and 200.

    *   *Note:* For the Backward EFA plot, SVD is run on just the last two frames, then the
        last three, and so on, until all frames in the range are included. As more frames are
        added, the singular values change, as shown on the plot. When a singular value
        drops back to baseline, it indicates that a scattering component is leaving
        the dataset at that point.

    *   *Note:* The algorithm for determining the start and end points is not particularly
        advanced. For some datasets you may need to do significantly more adjustment of these values.

    |efa_ranges_png|

#.  Click the “Next” button in the bottom right corner to move to the last stage of the
    EFA analysis.

    |efa_panel_3_png|

#.  This window shows controls on the left and results on the right. In the controls area,
    at the top is a plot showing the SEC-SAXS curve, along with the ranges occupied by
    each scattering component, as determined from the input on the Forward and Backward
    EFA curves in stage 2 of the analysis. The colors of the ranges correspond to the
    colors labeled in the Scattering Profiles plot on the top right and the Concentration
    plot in the lower right. This panel takes the SVD vectors and rotates them back into
    scattering vectors corresponding to real components.

    *   *Note:* This rotation is not guaranteed to be successful, or to give you valid
        scattering vectors. Any data obtained via this method should be supported in other
        ways, either using other methods of deconvolving the peak, other biophysical or
        biochemical data, or both!

#.  This rotation looks quite good, as judged by the reasonable profiles, concentration
    peaks, and relatively flat chi^2 vs. frame plot. However, you don't always pick the
    right ranges the first time; sometimes some fine tuning is necessary. To simulate
    this, in the “Component Range Controls” set the ranges back to the original
    default values found by RAW: 151 to 193, 164 to 322, and 319 to 347.

#.  After making these adjustments, you should see some spikes in the chi^2 values.

    |efa_poor_rotation_png|

#.  Fine tune the ranges using the controls in the “Component Range Controls” box.
    Adjust the starts and ends of Ranges 0 and 1 and the end of Range 2 by a few points
    until the spikes in the chi-squared plot go away. After these adjustments, Range 0
    should be about 142 to 198, Range 1 from 161 to 322, and Range 2 from 319 to 360.

    *   *Note:* These ranges are a little different from what you previously found,
        particularly the ends of Ranges 1 and 2. This likely means that there is
        very little (or no) contribution of those components in the extended range.
        You can verify this by setting the ends of those ranges back to 360 and 383
        respectively and looking at the concentration profiles. You'll see that the
        profiles are essentially zero in the more extended ranges. It's usually
        a good idea to minimize the component range to avoid introducing contamination
        from other components. In this case, you could do that by narrowing the
        range of the components until you start to see chi^2 spikes, then returning
        to the last good value.

#.  To see these changes on the Forward and Backward EFA plots, click the “Back” button
    at the bottom right of the page. Verify that all of your start and end values are
    close to where the components become significant, as discussed in Steps 8 and 9.

#.  Click the “Next” button to return to the final stage of the EFA analysis.

#.  In the Rotation Controls box, you can set the method, the number of iterations, the
    convergence threshold, and whether you're starting with the previous results.
    As you can see in the Status window, the rotation was successful for this
    data. If it was not, you could try changing methods or adjusting the number
    of iterations or threshold.

    *   *Tip:* If it takes a while to run EFA every time you change a component,
        you can speed up the convergence by starting with the previous results.
        To do so, check the "Start with previous results" box. This
        will allow you to quickly iterate on changes, as long as the magnitude
        of the change is relatively small. Just be sure to uncheck the box
        for your final EFA run, as starting from the previous results can bias
        the rotation and guide the EFA into a solution that is path dependent
        and thus isn't reproducible later.

#.  Examine the chi-squared plot. It should be uniformly close to 1 for good EFA. For
    this data, it is.

#.  Examine the concentration plot. You’ll see three peaks, corresponding to the
    concentrations for the three components. In the Range Controls, uncheck the Range
    0 C>=0 box. That removes the constraint that the concentration must be non-negative.
    If this results in a significant change in the peak, your EFA analysis is likely
    poor, and you should not trust your results.

    *   *Note:* The height of the concentration peaks is arbitrary; all peaks are
        normalized to have an area of 1.

#.  Uncheck all of the C>=0 controls.

    *   *Question:* Do you observe any significant changes in the scattering profiles,
        chi-squared, or concentration when you do this? How about if you uncheck one and
        leave the others checked?

#.  Recheck all of the C>=0 controls. You have now verified, as much as you can, that
    the EFA analysis is giving you reasonable results.

#.  *Reminder:* Here are the verification steps we have carried out, and you should carry
    out every time you do EFA:

        #.  Confirm that your selected ranges correspond to the start points of the
            Forward and Backward EFA values (Steps 8-9 and 15).

        #.  Confirm that your chi-squared plot is close to 1, without any major
            spikes (Step 18).

        #.  Confirm that your concentrations are not significantly altered by
            constraining the concentration to be non-negative (Steps 19-21).

#.  Click the “Save EFA Data (not profiles)” button to save the EFA data, including the SVD,
    the Forward and Backward EFA data, the chi-squared, and the concentration, along
    with information about the selected ranges and the rotation method used.

#.  Click the “Done” button to send the scattering profiles to the Profiles Plot.

#.  In the main RAW window, go to the Profiles control tab and the Profiles plot. If
    it is not already, put the Profiles plot on a semi-Log or Log-Log scale.

    |efa_profiles_png|

#.  The three scattering profiles from EFA are in the manipulation list. The labels _0,
    _1, and _2 correspond to the 0, 1, and 2 components/ranges.

    *   *Note:* Regardless of whether you use subtracted or unsubtracted data, these
        scattering profiles will be buffer subtracted, as the buffer represents a
        scattering component itself, and so (in theory) even if it is present will be
        separated out by successful EFA.
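
The Forward and Backward EFA plots used in the tutorial above can be sketched
in a few lines of Python. The example below runs SVD on a growing window of
frames of a synthetic two-component dataset, from the start and from the end,
and then picks start and end frames with a simple threshold. The synthetic
data, the threshold rule, and the names ``evolving_svs`` and
``first_crossing`` are assumptions for illustration and are not RAW's code.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative synthetic data: two components eluting first-in-first-out.
q = np.linspace(0.01, 0.25, 100)
n_frames = 120


def peak(start, end):
    """Smooth elution peak that is exactly zero outside start..end."""
    c = np.zeros(n_frames)
    c[start:end + 1] = np.hanning(end - start + 1)
    return c


profiles = np.column_stack([np.exp(-(q * 15.0) ** 2 / 3.0),
                            np.exp(-(q * 40.0) ** 2 / 3.0)])
conc = np.column_stack([peak(20, 60), peak(45, 90)])
data = profiles @ conc.T + rng.normal(0.0, 0.01, (q.size, n_frames))

n_comp = 2             # number of significant components chosen from the SVD
n_track = n_comp + 1   # track one extra singular value as a noise baseline


def evolving_svs(matrix):
    """Leading singular values of the first 2, 3, ... columns of matrix."""
    rows = []
    for n_cols in range(2, matrix.shape[1] + 1):
        sv = np.linalg.svd(matrix[:, :n_cols], compute_uv=False)
        row = np.zeros(n_track)
        row[:min(n_track, sv.size)] = sv[:n_track]
        rows.append(row)
    return np.array(rows)


forward = evolving_svs(data)             # window grows from the first frame
backward = evolving_svs(data[:, ::-1])   # window grows from the last frame

# Hypothetical detection rule: a component is present once its singular value
# exceeds a multiple of the largest value reached by the extra (noise) one.
threshold = 3.0 * forward[:, n_comp].max()


def first_crossing(trace):
    """Number of columns in the window when trace first exceeds threshold."""
    return int(np.argmax(trace > threshold)) + 2


starts = [first_crossing(forward[:, i]) - 1 for i in range(n_comp)]
ends = [n_frames - first_crossing(backward[:, i]) for i in range(n_comp)]

# First-in-first-out pairing: the first component to appear leaves first.
ranges = list(zip(starts, ends[::-1]))
print("Forward start frames:", starts)
print("Backward end frames:", ends)
print("Component ranges:", ranges)
```

A fixed threshold like this tends to find each component a few frames after it
really starts (and before it really ends), which is one reason the start and
end values usually need manual adjustment against the plots.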


Note: By default, RAW bins the profiles before doing an SVD and calculating the
evolving factor plots, in order to speed up the process. The final EFA rotation is
done on the full unbinned dataset. You can turn binning on and off and adjust
the binning parameters in the Series options panel in the Advanced Options window.
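
As a minimal sketch of what binning does to the size of the problem, the
example below averages groups of adjacent q points in each profile. This
assumes that binning means averaging neighbouring q points; the function
``bin_profiles`` and its ``bin_size`` parameter are hypothetical and do not
correspond to RAW's settings.

```python
import numpy as np


def bin_profiles(q, intensity, bin_size=4):
    """Average groups of bin_size adjacent q points (trailing remainder dropped)."""
    n_bins = q.size // bin_size
    keep = n_bins * bin_size
    q_binned = q[:keep].reshape(n_bins, bin_size).mean(axis=1)
    i_binned = intensity[:keep].reshape(n_bins, bin_size, -1).mean(axis=1)
    return q_binned, i_binned


# Illustrative data: 100 q points by 50 frames of an identical profile.
q = np.linspace(0.01, 0.25, 100)
intensity = np.outer(np.exp(-(q * 15.0) ** 2 / 3.0), np.ones(50))

q_b, i_b = bin_profiles(q, intensity, bin_size=4)
print(intensity.shape, "->", i_b.shape)
```

The binned matrix has a quarter of the rows, so each SVD in the evolving
factor calculation works on a smaller matrix.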

Note 2: If you save a pdf report of a series that has EFA analysis done on it, a
summary of the EFA analysis and the various plots is saved in the report.
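
The rotation in the final stage of the tutorial, and the chi^2 spikes caused by
poorly chosen ranges, can be illustrated with a simplified sketch. The example
below finds, for each component, the combination of right singular vectors that
is closest to zero outside that component's range, then fits the scattering
profiles by least squares. This null-space construction is a swapped-in
simplification and is not necessarily any of the rotation methods RAW offers.
The synthetic data, the uniform ``sigma``, and the function names are
assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative synthetic data: two overlapping components plus noise.
q = np.linspace(0.01, 0.25, 100)
n_frames = 120
sigma = 0.01   # assumed uniform measurement uncertainty


def peak(start, end):
    """Smooth elution peak that is exactly zero outside start..end."""
    c = np.zeros(n_frames)
    c[start:end + 1] = np.hanning(end - start + 1)
    return c


true_profiles = np.column_stack([np.exp(-(q * 15.0) ** 2 / 3.0),
                                 np.exp(-(q * 40.0) ** 2 / 3.0)])
true_conc = np.column_stack([peak(20, 60), peak(45, 90)])
data = true_profiles @ true_conc.T + rng.normal(0.0, sigma, (q.size, n_frames))


def rotate(data, ranges):
    """Rotate right singular vectors into concentrations confined to ranges."""
    n_comp = len(ranges)
    _, _, vt = np.linalg.svd(data, full_matrices=False)
    v = vt[:n_comp].T                       # frames x components
    frames = np.arange(data.shape[1])
    conc = np.zeros_like(v)
    for j, (start, end) in enumerate(ranges):
        outside = (frames < start) | (frames > end)
        # Combination of singular vectors closest to zero outside the range.
        _, _, wt = np.linalg.svd(v[outside], full_matrices=False)
        c = v @ wt[-1]
        c[outside] = 0.0
        c *= np.sign(c.sum())               # resolve the arbitrary sign
        conc[:, j] = c / c.sum()            # normalize each peak to unit area
    # Least-squares scattering profiles given the concentrations.
    profiles = data @ np.linalg.pinv(conc.T)
    return profiles, conc


def chi2_per_frame(data, profiles, conc, sigma):
    """Mean squared, uncertainty-weighted residual of the fit in each frame."""
    resid = (data - profiles @ conc.T) / sigma
    return np.mean(resid ** 2, axis=0)


good_profiles, good_conc = rotate(data, [(20, 60), (45, 90)])
poor_profiles, poor_conc = rotate(data, [(30, 60), (45, 90)])

chi2_good = chi2_per_frame(data, good_profiles, good_conc, sigma)
chi2_poor = chi2_per_frame(data, poor_profiles, poor_conc, sigma)
print("Max chi^2, good ranges:", round(float(chi2_good.max()), 2))
print("Max chi^2, poor ranges:", round(float(chi2_poor.max()), 2))
```

Starting the first range ten frames late leaves real signal unaccounted for in
those frames, which shows up as a large spike in chi^2, while the well-chosen
ranges should give values close to 1 in every frame.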

.. |efa_series_plot_png| image:: images/efa_series_plot.png
    :target: ../_images/efa_series_plot.png

.. |efa_panel_png| image:: images/efa_panel.png
    :target: ../_images/efa_panel.png

.. |efa_components_png| image:: images/efa_components.png
    :target: ../_images/efa_components.png

.. |efa_panel_2_png| image:: images/efa_panel_2.png
    :target: ../_images/efa_panel_2.png

.. |efa_ranges_png| image:: images/efa_ranges.png
    :width: 200 px
    :target: ../_images/efa_ranges.png

.. |efa_panel_3_png| image:: images/efa_panel_3.png
    :target: ../_images/efa_panel_3.png

.. |efa_poor_rotation_png| image:: images/efa_poor_rotation.png
    :target: ../_images/efa_poor_rotation.png

.. |efa_profiles_png| image:: images/efa_profiles.png
    :target: ../_images/efa_profiles.png