<!DOCTYPE html>
<html>

<head>
<link rel=stylesheet href=style.css />
<link rel=icon href=CZI-new-logo.png />
</head>

<body>
<main>
<div class="goto-index"><a href="index.html">Table of contents</a></div>

<h1>Assembly configurations</h1>

<p>
Shasta provides a number of 
<a href="CommandLineOptions.html">command line options</a> 
that can be used to set computational parameters and thresholds
for assemblies. 

All of these options have default values,
but the defaults are not necessarily optimal for
any particular combination of the following factors:

<ul>
<li>The technology used to generate the reads.
Technologies currently available to generate the long reads
supported by Shasta are 
<a href="https://nanoporetech.com/">Oxford Nanopore</a> (ONT) and 
<a href="https://www.pacb.com">Pacific Biosciences</a> (HiFi and others).
<li>The amount of coverage available (average number of reads
overlapping each genome region).
<li>The characteristics of the genome being sequenced, including
heterozygosity, ploidy, and repeat content.
</ul>

Adjusting options to account for these and other factors
is generally necessary to achieve
good quality assemblies.
To facilitate the process of generating useful assembly options
for a particular situation, Shasta uses <b>assembly configurations</b>.
An assembly configuration is a predefined set of assembly options
that can be stored in a <b>configuration file</b>
in a format defined 
<a href="#ConfigFile">below</a>.
A number of sample configuration files applicable to specific situations
are provided in <code>shasta/conf</code>.
The applicability of each of the files is described in comments 
embedded in each file.

<p>
Shasta command line option <code>--config</code>
is used to specify the configuration to be used, as described
in detail below. This option is <b>mandatory</b>
when running an assembly.
If any option is specified both in a configuration
and explicitly on the command line, the value
on the command line takes precedence.
This allows you to use a configuration as a useful
set of defaults, while still overriding some of its
options as desired.

<p>
In addition to configuration files, Shasta also provides 
a set of built-in configurations that are compiled 
in the Shasta executable. These built-in configurations
can be used without the need for a configuration file.
Each built-in configuration has a corresponding configuration
file with the same name in <code>shasta/conf</code>, with 
the extension <code>.conf</code>. 
For example, configuration <code>Nanopore-May2022</code>
can be specified in one of two ways:

<pre>
shasta --config Nanopore-May2022
</pre>
or
<pre>
shasta --config .../shasta/conf/Nanopore-May2022.conf
</pre>

When using the second form, the file must be available,
and the <code>...</code> should be replaced depending on the
location of the <code>shasta</code> directory.

<p>
To obtain a list of available built-in configurations,
use Shasta command <code>listConfigurations</code> as follows:

<br>
<pre>
shasta --command listConfigurations
</pre>

At the time of writing (May 2024), this outputs the following
list of built-in configurations:

<pre>
Nanopore-Dec2019
Nanopore-UL-Dec2019
Nanopore-Sep2020
Nanopore-UL-Sep2020
Nanopore-UL-iterative-Sep2020
Nanopore-OldGuppy-Sep2020
Nanopore-Plants-Apr2021
Nanopore-Oct2021
Nanopore-UL-Oct2021
HiFi-Oct2021
Nanopore-UL-Jan2022
Nanopore-Phased-Jan2022
Nanopore-UL-Phased-Jan2022
Nanopore-May2022
Nanopore-Phased-May2022
Nanopore-UL-May2022
Nanopore-UL-Phased-May2022
Nanopore-Human-SingleFlowcell-May2022
Nanopore-Human-SingleFlowcell-Phased-May2022
Nanopore-UL-Phased-Nov2022
Nanopore-R10-Fast-Nov2022
Nanopore-R10-Slow-Nov2022
Nanopore-Phased-R10-Fast-Nov2022
Nanopore-Phased-R10-Slow-Nov2022
Nanopore-ncm23-May2024
Nanopore-r10.4.1_e8.2-400bps_sup-Herro-Sep2024
Nanopore-r10.4.1_e8.2-400bps_sup-Raw-Sep2024
</pre>

<p>
The following table summarizes configurations 
recommended at the time of writing (November 2022, Shasta 0.11.0)
under the following conditions:
<ul>
<li>Human assemblies.
<li>Oxford Nanopore reads.
<li>Guppy 5 or 6 basecaller with "super" accuracy.
</ul>


<p>
<table>
<tr><th>ONT chemistry<th>Read length<th>Coverage<th>Haploid assembly<th>Phased assembly

<tr><th>R9<th>Standard<th>40x to 80x
<td class=centered><code>Nanopore-May2022</code>
<td class=centered><code>Nanopore-Phased-May2022</code>

<tr><th>R9<th class=centered>Ultra-Long (UL)
<th>40x to 80x
<td class=centered><code>Nanopore-UL-May2022</code>
<td class=centered><code>Nanopore-UL-Phased-Nov2022</code>

<tr><th>R9<th>Standard<th>Human genome with a single flowcell
(about 30x)
<td class=centered><code>Nanopore-Human-SingleFlowcell-May2022</code>
<td class=centered><code>Nanopore-Human-SingleFlowcell-Phased-May2022</code>

<tr><th>R10, fast mode<th>Standard<th>Human genome with a single flowcell
(about 30x)
<td class=centered><code>Nanopore-R10-Fast-Nov2022</code>
<td class=centered><code>Nanopore-Phased-R10-Fast-Nov2022</code>

<tr><th>R10, slow mode<br>(no longer in use)<th>Standard<th>Human genome with two flowcells
(about 45x)
<td class=centered><code>Nanopore-R10-Slow-Nov2022</code>
<td class=centered><code>Nanopore-Phased-R10-Slow-Nov2022</code>

<tr>
<th><a href='https://labs.epi2me.io/gm24385_ncm23_preview/'>
ONT December 2023 Data release</a><br>
(<i>"Experimental extremely high-accuracy, ultra-long
sequencing kit"</i>)
<th>Ultra-Long (UL)
<th>Tested at 40x to 60x but may be functional outside this range
<td>
<td class=centered><code>Nanopore-ncm23-May2024</code>

<tr>
<th>
r10.4.1_e8.2-400bps_sup, error corrected with Herro
<th>
Ultra-Long (UL)
<th>
Tested at 45x but may be functional at higher or lower coverage
<td>
<td class=centered><code>Nanopore-r10.4.1_e8.2-400bps_sup-Herro-Sep2024</code>

<tr>
<th>
r10.4.1_e8.2-400bps_sup, without error correction
<th>
Ultra-Long (UL)
<th>
Tested at 45x but may be functional at higher or lower coverage
<td>
<td class=centered><code>Nanopore-r10.4.1_e8.2-400bps_sup-Raw-Sep2024</code>

</table>


<p>
To get details of a specific built-in configuration,
use Shasta command <code>listConfiguration</code> as follows,
specifying the built-in configuration of interest after <code>--config</code>:

<pre>
shasta --command listConfiguration --config Nanopore-May2022
</pre>

<p>
This output includes comments that describe the 
applicability of the selected configuration.
Details of the configuration are written out in the configuration
file format defined below. This allows you to
create your own configuration file using a built-in configuration 
as a starting point.
 



<p>
Shasta command line option <code>--config</code> must be used
to specify the configuration to be used for an assembly.
The option must specify either a built-in configuration 
or a path to a configuration file.





<h2 id=ConfigFile>Configuration file</h2>
<p>
Some options are only allowed on the command line,
but most of them can also optionally be specified using a configuration file.
Values specified on the command line take precedence over
values specified in the configuration file.
This makes it easy to override specific values in a 
configuration file.
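
<p>
The precedence rule above can be illustrated with a minimal Python sketch.
This is not Shasta's actual implementation: the function name
<code>merge_options</code> and the sample values are assumptions made
purely for illustration.

```python
def merge_options(config_values, command_line_values):
    """Return the effective options: command line values override configuration values."""
    effective = dict(config_values)         # start from the configuration
    effective.update(command_line_values)   # the command line wins on conflicts
    return effective


# Hypothetical values, for illustration only.
config_values = {
    "MarkerGraph.minCoverage": "0",
    "Assembly.storeCoverageData": "False",
}
command_line_values = {"MarkerGraph.minCoverage": "6"}

effective = merge_options(config_values, command_line_values)
print(effective["MarkerGraph.minCoverage"])  # prints 6
```

<p>
Options that appear only in the configuration keep their configured values,
so the configuration still acts as a set of defaults.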

<p>
Options that can be specified both on the command line 
and in a configuration file are of the form 
<code>--SectionName.optionName</code>. The format of the configuration file
is as follows:

<pre>
[SectionA]
option1 = valueA1
option2 = valueA2
[SectionB]
option1 = valueB1
option2 = valueB2
</pre>
The above is equivalent to using the following command line options:

<pre>
--SectionA.option1 valueA1 
--SectionA.option2 valueA2 
--SectionB.option1 valueB1 
--SectionB.option2 valueB2 
</pre>
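
<p>
As a rough sketch of this equivalence, the Python snippet below reads
text in the format shown above and produces the corresponding
command line tokens. It uses Python's standard <code>configparser</code>
module, whose dialect may differ in details from the parser Shasta
actually uses; the function name <code>config_to_command_line</code>
is hypothetical.

```python
import configparser


def config_to_command_line(text):
    """Convert configuration file text to the equivalent command line tokens."""
    parser = configparser.ConfigParser(
        delimiters=("=",),        # only "=" separates names from values
        comment_prefixes=("#",),  # only "#" starts a comment line
        interpolation=None,       # treat values as plain text
    )
    parser.optionxform = str      # keep option names case-sensitive
    parser.read_string(text)

    tokens = []
    for section in parser.sections():
        for name, value in parser[section].items():
            tokens.append(f"--{section}.{name}")
            tokens.append(value)
    return tokens


sample = """
[SectionA]
option1 = valueA1
option2 = valueA2
[SectionB]
option1 = valueB1
option2 = valueB2
"""

tokens = config_to_command_line(sample)
print(" ".join(tokens))
```

<p>
Setting <code>optionxform</code> matters here because
<code>configparser</code> lowercases option names by default,
which would alter names such as <code>minCoverage</code>.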

<p>
For example, the value for option <code>MarkerGraph.minCoverage</code>
can be specified in the <code>[MarkerGraph]</code>
section of the configuration file as follows:

<pre>
[MarkerGraph]
minCoverage = 0
</pre>

<p>
In the configuration file, blank lines and lines beginning with <code>#</code>
are ignored and can be used to add comments and to improve readability
of the configuration file.



<h2 id=BooleanSwitches>Boolean switches</h2>
<p>
Some command line options are boolean switches,
that is, control options that can be turned on or off
rather than being given a value. 
<p>
To turn on one of these switches on the command line,
just add it to the command line without any value, for
example <code>--Assembly.storeCoverageData</code>.
To turn it off, just omit it from the command line
(the default value is turned off).
<p>
To turn on one of these switches in a
configuration file, you can either enter it without a value 
<pre>
storeCoverageData =
</pre>
or assign to it one of the following values:
<code>1, true, True, yes, Yes</code>.
To turn off one of these switches in a
configuration file, assign to it one of the following values:
<code>0, false, False, no, No</code>. 
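
<p>
The rules for boolean switches in a configuration file can be summarized
with the following Python sketch. It is an illustration only, not Shasta's
own code: the function name <code>parse_switch</code> is hypothetical, and
rejecting unlisted values with an error is an assumption, since the
behavior for other values is not described here.

```python
# Values that turn a switch on; the empty string covers "switchName =".
TRUE_VALUES = {"", "1", "true", "True", "yes", "Yes"}
# Values that turn a switch off.
FALSE_VALUES = {"0", "false", "False", "no", "No"}


def parse_switch(value):
    """Interpret the value assigned to a boolean switch in a configuration file."""
    text = value.strip()
    if text in TRUE_VALUES:
        return True
    if text in FALSE_VALUES:
        return False
    # Assumption: values outside the two lists are treated as errors.
    raise ValueError(f"not a recognized boolean switch value: {value!r}")


print(parse_switch(""))     # prints True
print(parse_switch("Yes"))  # prints True
print(parse_switch("no"))   # prints False
```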

<p>
Boolean switches are indicated as such in the Description column in the tables below.



<div class="goto-index"><a href="index.html">Table of contents</a></div>

</main>
</body>
</html>