<!-- header fragment for html documentation -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
<META NAME="description" CONTENT="Estimation of population parameters from
genetic data using a maximum likelihood approach with Metropolis-Hastings
Markov chain Monte Carlo importance sampling">
<META NAME="keywords" CONTENT="MCMC, Markov chain, Monte Carlo,
Metropolis-Hastings, population, parameters, migration rate, population
size, recombination rate, maximum likelihood">
<TITLE>LAMARC Documentation: Panel Corrections</title>
</HEAD>
<BODY BGCOLOR="#FFFFFF"> <!-- coalescent, coalescence, Markov chain Monte
Carlo simulation, migration rate, effective population size, recombination
rate, maximum likelihood -->
<P>(<A HREF="divergence.html">Back</A> | <A HREF="index.html">Contents</A> |
<A HREF="converter_cmd.html">Next</A>)</P>
<H2>Panel Corrections</H2>
<UL>
<LI><A HREF="panels.html#overview">Overview </A></LI>
<LI><A HREF="panels.html#must">Must I Use Panel Correction? </A></LI>
<LI> <A HREF="panels.html#converter">Defining Panel Counts via Converter</a></LI>
<LI> <A HREF="panels.html#xmlsyntax">Defining Panel Counts via XML</a></LI>
</UL>
<H3><A NAME="overview">Overview</A></H3>
<p>
SNP panel data is an increasingly popular input to sequence analysis programs.
Unlike fully sequenced data, this input type is the result of a two-step
process:
<ul>
<li> a <u>panel</u> of samples is chosen to span the potential variation in a given organism or population,
and these samples are fully sequenced, after which
<li> a <u>SNP chip</u> is created to quickly assay the variation in additional samples, but
only at those positions at which the original panel members varied.
</ul>
</p>
<p>
This process is popular because it can be used to inexpensively gather precise estimates of the
rates of variation at the measured SNPs without the cost of exhaustively sequencing all study samples.
While this type of data can be used to measure patterns of common variation in genetic data,
it is not ideal for coalescent analysis.
</p>
<p>
Like other coalescent programs, LAMARC searches the space of coalescent trees, concentrating
on those matching the input sequence data best.
The SNP panel process removes most of the information available about recent variation,
which is where most of the tree structure exists.
LAMARC's panel correction feature compensates for this loss of data.
If it is not used when it should be, estimates of Theta will be artificially low.
</p>
<H3><A NAME="must">Must I Use Panel Correction?</A></H3>
<p> If your sequence data came from a SNP chip, you must use panel correction.
Otherwise you will underestimate Theta, and if you turn on estimation of growth
you will badly underestimate the growth rate. This bias
is strongest for small panels, but we do not recommend uncorrected use of panel SNP
data from panels of any size.</p>
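<P>To see why small panels are affected most, consider a rough illustration.
It assumes, purely for the sake of the example, that the panel haplotypes are drawn
independently from a population in which one allele at a biallelic site has frequency
<i>p</i>. The site reaches the SNP chip only if the panel itself is variable there, which
happens with probability 1 - p^n - (1 - p)^n for a panel of <i>n</i> haplotypes.
The function name and the example values below are hypothetical, and this sketch is not
the correction that LAMARC applies.</P>
<PRE>
```python
def prob_variable_in_panel(p, n):
    """Chance that a biallelic site is variable among n panel haplotypes.

    Assumes (for illustration only) that the n haplotypes are drawn
    independently from a population where one allele has frequency p.
    """
    # The site is missed when all n haplotypes carry the same allele.
    return 1.0 - p ** n - (1.0 - p) ** n


# Hypothetical allele frequencies and panel sizes, chosen only to show the trend.
for n in (4, 20, 100):
    row = ", ".join(
        "p=%.2f: %.3f" % (p, prob_variable_in_panel(p, n))
        for p in (0.01, 0.05, 0.30)
    )
    print("panel of %d haplotypes  %s" % (n, row))
```
</PRE>
<P>Under these assumptions, rare alleles are the ones most likely to be absent from the chip,
and the smaller the panel, the more of them are lost.</P>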
<p> Additional information beyond the produced sequences is required. You
need to know:
<ul>
<li> how many (haplotype) sequences were used to create the panel, and
<li> (if the organism has population structure) which sub-populations
the panel members came from.
</ul>
</p>
<p>
This information is not always readily available.
If you are using a commercial SNP chip, you may need to contact
the manufacturer for this information.
</p>
<p> If you designed a SNP chip from your own panel members and had it
fabricated, you should consider performing your coalescent analysis on
the original, completely sequenced set of panel members.
The SNP chip will be useful for measuring allele frequencies, but searching
coalescent trees is much more accurate on data that is not missing rare
and low-frequency SNPs.
</p>
<p> Additionally, you should determine whether your SNP chip was constructed with
a minor allele frequency cutoff, under which alleles occurring at low frequency
in the panel are omitted from the chip. It is critical to know what cutoff was
used.</p>
<H3>Possible Panel Correction Gotchas</H3>
<P>
<font color="#FF0000">Caution:</font>
results are highly dependent on having accurate panel size information. Therefore,
</P>
<ul>
<li><p>If you don't know how many panel sequences were used to create the SNP chip,
<b>Do Not Estimate The Count!</b> You cannot perform an accurate coalescent
analysis on your data.</p>
<li><p>If your panel SNPs are limited to positions which met a Minor Allele Frequency
(MAF) cutoff, your panel size must be adjusted to compensate for this effect. The
procedure is described in McGill, Walkup and Kuhner, Genetics 193(4): 1185-1196
(2013). We are happy to help with this correction on email request.
</p>
</ul>
<H3><A NAME="converter">Defining Panel Counts via the Converter</A></H3>
<p> By default, panel corrections are not turned on in the converter.</p>
<p><img src="batch_converter/images/DataPartitionsMigTab.png" alt="Converter before panels are turned on"/></p>
<p> To turn panels on, double click inside the box labeled "Use Panels".
The initial panel size will be zero, which is equivalent to no panel correction.</P>
<p><img src="batch_converter/images/PanelCorrectionOn.png" alt="Panel Correction button on so Panel data is visible"/></p>
<p>To edit the count and optionally name the panel, double click inside the box labeled "panel". It will look like this:</p>
<p><img src="batch_converter/images/EditPanelCorrection.png" alt="Panel Correction edit window"/></p>
<p>One possible source of confusion is that Panels are associated with Region / Population pairs, not with Contiguous Segments
(loci close enough to each other to be in linkage disequilibrium).
If you combine two or more Regions in the Converter Editor, the Loci remain separate, but they are now associated with the same Region.
The other selected Regions are eliminated by the combine process. This means that Panel information associated with any
Population that spanned the initial Region set must be consolidated. As there is no way to know which member count is correct, the
largest value becomes the value for the combined Region. The resulting screen (created by separating the original input file into two separate files,
reading them both in, and combining the Regions) looks like this:</p>
<p><img src="batch_converter/images/CombinedPanels.png" alt="Panel definition spanning two Loci"/></p>
<p>The other side of this is that if you decide to separate multiple Loci that are associated with a single Region (as above), multiple new Regions are produced. The Panel member count associated with the original Region will be associated with the Panels now attached to each of the new Regions.</p>
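<P>The two bookkeeping rules described above (take the largest count when Regions are combined,
copy the count when a Region is separated) can be sketched as follows. This is only an
illustration of the rules, not the Converter's actual code. The dictionary layout, the function
names, and the example Region and Population names are all hypothetical.</P>
<PRE>
```python
def combine_regions(panel_counts, regions, combined_name):
    """Merge regions; each population keeps the largest panel count seen."""
    merged = {}
    for region in regions:
        for population, count in panel_counts[region].items():
            merged[population] = max(count, merged.get(population, 0))
    # Regions that were not selected are carried over unchanged.
    result = {
        name: dict(pops)
        for name, pops in panel_counts.items()
        if name not in regions
    }
    result[combined_name] = merged
    return result


def separate_region(panel_counts, region, new_names):
    """Split one region; every new region inherits the original counts."""
    result = {
        name: dict(pops)
        for name, pops in panel_counts.items()
        if name != region
    }
    for new_name in new_names:
        result[new_name] = dict(panel_counts[region])
    return result


# Hypothetical input: panel counts keyed by region, then by population.
counts = {
    "region1": {"north": 4, "south": 10},
    "region2": {"north": 6, "south": 8},
}
combined = combine_regions(counts, ["region1", "region2"], "region1")
print(combined)
print(separate_region(combined, "region1", ["locusA", "locusB"]))
```
</PRE>
<P>Because combining keeps only the largest count, separating the Regions again does not restore
the counts that were originally entered. Check the panel sizes after either operation.</P>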
<H3><A NAME="xmlsyntax">Defining Panel Counts via XML</A></H3>
<p>You can add a panel definition directly into an existing LAMARC XML file via hand-editing.
See the <a href="xmlinput.html#panel-xml">XML Input Format documentation on panels</a>
for an example.
</p>
<P>(<A HREF="divergence.html">Back</A> | <A HREF="index.html">Contents</A> |
<A HREF="converter_cmd.html">Next</A>)</P>
<!--
//$Id: panels.html,v 1.8 2014/09/08 21:05:23 mkkuhner Exp $
-->
</BODY>
</HTML>