###############
Usage examples
###############
.. note::

    In the following examples, ``test_data`` can be replaced with the directory containing the FAST5 files
    from your own runs. If you are new to ONT sequencing, the ``test_data`` directory is shipped with ``poretools``
    for experimentation.

===================
poretools ``fastq``
===================
Extract sequences in FASTQ format from a set of FAST5 files.

.. code-block:: bash

    poretools fastq test_data/*.fast5

Or, if there are too many files for your OS to do the wildcard expansion, just provide a directory.
``poretools`` will automatically find all of the FAST5 files in the directory.

.. code-block:: bash

    poretools fastq test_data/

Reads can also be filtered by length or by read type (``fwd`` is the template read, ``rev`` the complement read).

.. code-block:: bash

    poretools fastq --min-length 5000 test_data/
    poretools fastq --max-length 5000 test_data/
    poretools fastq --type all test_data/
    poretools fastq --type fwd test_data/
    poretools fastq --type rev test_data/
    poretools fastq --type 2D test_data/
    poretools fastq --type fwd,rev test_data/

A type of ``best`` extracts the 2D read, if it exists. If not, it extracts either the template or the complement read, whichever is available and has the better average Phred score.

.. code-block:: bash

    poretools fastq --type best test_data/

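
To make "better average Phred score" concrete, the following minimal sketch computes the mean Phred score of each read in ``poretools fastq`` output. It is an illustration only, not how ``poretools`` selects the ``best`` read internally. ``mean_phred`` is a hypothetical helper, and the sketch assumes Phred+33 quality encoding (as in the ``qualdist`` table below, where ``!`` is 0) and unwrapped four-line FASTQ records.

.. code-block:: bash

    # mean_phred: hypothetical helper, not a poretools subcommand.
    # Reads FASTQ on stdin (assumed: four lines per record, Phred+33 qualities)
    # and prints "<read name><TAB><mean Phred score>" for each record.
    mean_phred() {
        awk '
            BEGIN {
                # lookup table: printable ASCII character -> Phred score (code - 33)
                for (i = 33; i <= 126; i++) phred[sprintf("%c", i)] = i - 33
            }
            NR % 4 == 1 { name = substr($1, 2) }   # header line, drop the leading "@"
            NR % 4 == 0 && length($0) > 0 {        # quality line
                sum = 0
                for (j = 1; j <= length($0); j++) sum += phred[substr($0, j, 1)]
                printf "%s\t%.2f\n", name, sum / length($0)
            }
        '
    }

    # usage: poretools fastq --type best test_data/ | mean_phred
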
Use ``--high-quality`` to extract only reads with more complement events than template events. These are the so-called "high quality 2D reads" and are the most accurate sequences from a given run.

.. code-block:: bash

    poretools fastq --type 2D --high-quality test_data/

FASTQ records are written to standard output, so redirect them to save a file (for example, ``poretools fastq test_data/ > reads.fastq``).

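
The ``--high-quality`` criterion compares the number of complement and template events in a read. As a rough illustration only (not how ``poretools`` implements the flag), the sketch below counts both kinds of events per file from ``poretools events`` output, which is described further down this page. ``count_strand_events`` is a hypothetical helper; it assumes whitespace-separated columns with the file in column 1 and the strand in column 2, and it assumes complement events are labelled ``complement``, which this page does not show.

.. code-block:: bash

    # count_strand_events: hypothetical helper, not a poretools subcommand.
    # Reads `poretools events` output on stdin (column 1 = file, column 2 = strand)
    # and prints "<file><TAB><template events><TAB><complement events><TAB><flag>",
    # where the flag is "high_quality" when complement events outnumber template events.
    count_strand_events() {
        awk '
            NR == 1 { next }                           # skip the header line
            $2 == "template"   { t[$1]++; seen[$1] = 1 }
            $2 == "complement" { c[$1]++; seen[$1] = 1 }
            END {
                for (f in seen) {
                    flag = (c[f] > t[f]) ? "high_quality" : "-"
                    printf "%s\t%d\t%d\t%s\n", f, t[f], c[f], flag
                }
            }
        ' | sort
    }

    # usage: poretools events test_data/ | count_strand_events
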
===================
poretools ``fasta``
===================
Extract sequences in FASTA format from a set of FAST5 files.

.. code-block:: bash

    poretools fasta test_data/
    poretools fasta --min-length 5000 test_data/
    poretools fasta --max-length 5000 test_data/
    poretools fasta --type all test_data/
    poretools fasta --type fwd test_data/
    poretools fasta --type rev test_data/
    poretools fasta --type 2D test_data/
    poretools fasta --type fwd,rev test_data/
    poretools fasta --type best test_data/

FASTA records are written to standard output, so redirect them to save a file (for example, ``poretools fasta test_data/ > reads.fasta``).

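
Because the records go to standard output, they can be piped straight into other tools. A minimal sketch, using a hypothetical ``fasta_lengths`` shell function (not part of ``poretools``), that prints the name and length of each FASTA record and tolerates wrapped sequence lines:

.. code-block:: bash

    # fasta_lengths: hypothetical helper, not a poretools subcommand.
    # Reads FASTA on stdin and prints "<name><TAB><length>" per record;
    # sequence lines may be wrapped over several lines.
    fasta_lengths() {
        awk '
            /^>/ {
                if (name != "") print name "\t" len   # flush the previous record
                name = substr($1, 2); len = 0; next   # name = header up to the first space
            }
            { len += length($0) }
            END { if (name != "") print name "\t" len }
        '
    }

    # usage: poretools fasta --min-length 5000 test_data/ | fasta_lengths | sort -k2,2nr | head
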
=====================
poretools ``combine``
=====================
Create a tarball from a set of FAST5 (HDF5) files.

.. code-block:: bash

    # plain tar (recommended for speed)
    poretools combine -o foo.fast5.tar test_data/*.fast5

    # gzip
    poretools combine -o foo.fast5.tar.gz test_data/*.fast5

    # bzip2
    poretools combine -o foo.fast5.tar.bz2 test_data/*.fast5

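
The resulting tarball can be inspected and unpacked with ``tar`` itself. A minimal sketch; ``list_tarball_fast5`` is a hypothetical helper, and it relies on GNU tar or bsdtar detecting gzip and bzip2 compression automatically when reading an archive file.

.. code-block:: bash

    # list_tarball_fast5: hypothetical helper, not a poretools subcommand.
    # Lists the FAST5 members of a tarball such as one written by `poretools combine`.
    list_tarball_fast5() {
        tar -tf "$1" | grep '\.fast5$'
    }

    # usage: list_tarball_fast5 foo.fast5.tar.gz
    # unpack into a directory that can be passed back to poretools:
    #   mkdir -p unpacked && tar -xf foo.fast5.tar.gz -C unpacked
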
========================
poretools ``yield_plot``
========================
Create a collector's curve reflecting the sequencing yield over time for a set of reads. There are two types of plots. The first is the yield of reads over time:

.. code-block:: bash

    poretools yield_plot --plot-type reads test_data/

The result should look something like:

.. image:: _images/yield.reads.png
    :width: 400pt

The second is the yield of base pairs over time:

.. code-block:: bash

    poretools yield_plot --plot-type basepairs test_data/

The result should look something like:

.. image:: _images/yield.bp.png
    :width: 400pt

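
A collector's curve is a running total plotted against time. As an illustration (not the plotting code ``poretools`` uses), the sketch below derives both running totals from the output of ``poretools times``, shown further down this page. ``cumulative_yield`` is a hypothetical helper; it assumes whitespace-separated columns in the order shown there (3 = ``read_length``, 4 = ``exp_starttime``, 5 = ``unix_timestamp``) and orders reads by their start time.

.. code-block:: bash

    # cumulative_yield: hypothetical helper, not a poretools subcommand.
    # Reads `poretools times` output on stdin and prints, in order of read start time,
    # "<hours since experiment start><TAB><cumulative reads><TAB><cumulative base pairs>".
    cumulative_yield() {
        tail -n +2 |                      # drop the header line
        sort -k5,5n |                     # order reads by unix_timestamp (read start)
        awk '{
            reads++; bp += $3
            printf "%.3f\t%d\t%d\n", ($5 - $4) / 3600, reads, bp
        }'
    }

    # usage: poretools times test_data/ | cumulative_yield
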

Of course, you can save the plot to PDF or PNG with ``--saveas``:

.. code-block:: bash

    poretools yield_plot \
        --plot-type basepairs \
        --saveas foo.pdf \
        test_data/

    poretools yield_plot \
        --plot-type basepairs \
        --saveas foo.png \
        test_data/

If you don't like the default aesthetics, try ``--theme-bw``:

.. code-block:: bash

    poretools yield_plot --theme-bw test_data/

======================
poretools ``squiggle``
======================
Make a "squiggle" plot of the signal over time for a given read or set of reads.

.. code-block:: bash

    poretools squiggle test_data/foo.fast5

The result should look something like:

.. image:: _images/foo.fast5.png
    :width: 400pt

If you don't like the default aesthetics, try ``--theme-bw``:

.. code-block:: bash

    poretools squiggle --theme-bw test_data/

Other options:

.. code-block:: bash

    # save as PNG
    poretools squiggle --saveas png test_data/foo.fast5

    # save as PDF
    poretools squiggle --saveas pdf test_data/foo.fast5

    # make a PNG for each FAST5 file in a directory
    poretools squiggle --saveas png test_data/

====================
poretools ``winner``
====================
Report the longest read among a set of FAST5 files.

.. code-block:: bash

    poretools winner test_data/
    poretools winner --type all test_data/
    poretools winner --type fwd test_data/
    poretools winner --type rev test_data/
    poretools winner --type 2D test_data/
    poretools winner --type fwd,rev test_data/
    poretools winner --type best test_data/

===================
poretools ``stats``
===================
Collect read size statistics from a set of FAST5 files.

.. code-block:: bash

    poretools stats test_data/
    total reads 2286.000000
    total base pairs 8983574.000000
    mean 3929.822397
    median 4011.500000
    min 13.000000
    max 6864.000000

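
These summary fields are standard descriptive statistics over read lengths. A minimal sketch that recomputes them from one length per line; ``length_stats`` is a hypothetical helper, not part of ``poretools``, and for an even number of reads it takes the median as the mean of the two middle values, which may differ in detail from what ``poretools`` does.

.. code-block:: bash

    # length_stats: hypothetical helper, not a poretools subcommand.
    # Reads one read length per line on stdin and prints the same fields as `poretools stats`.
    length_stats() {
        sort -n | awk '
            { len[NR] = $1; total += $1 }
            END {
                if (NR == 0) exit
                if (NR % 2) median = len[(NR + 1) / 2]
                else median = (len[NR / 2] + len[NR / 2 + 1]) / 2
                printf "total reads\t%d\ntotal base pairs\t%d\n", NR, total
                printf "mean\t%f\nmedian\t%f\n", total / NR, median
                printf "min\t%d\nmax\t%d\n", len[1], len[NR]
            }
        '
    }

    # usage (lengths are the first column of `poretools tabular`, after its header line):
    #   poretools tabular test_data/*.fast5 | awk 'NR > 1 { print $1 }' | length_stats
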
===================
poretools ``hist``
===================
Plot a histogram of read sizes from a set of FAST5 files.

.. code-block:: bash

    poretools hist test_data/
    poretools hist --min-length 1000 --max-length 10000 test_data/
    poretools hist --num-bins 20 --max-length 10000 test_data/

If you don't like the default aesthetics, try ``--theme-bw``:

.. code-block:: bash

    poretools hist --theme-bw test_data/

The result should look something like:

.. image:: _images/hist.png
    :width: 400pt

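
A text-only histogram can be sketched the same way. Note that ``poretools hist`` is configured with a number of bins (``--num-bins``), whereas this hypothetical ``length_histogram`` helper takes a fixed bin width in base pairs (default 1000, an arbitrary choice) and reads one length per line.

.. code-block:: bash

    # length_histogram: hypothetical helper, not a poretools subcommand.
    # Reads one read length per line on stdin and prints "<bin start><TAB><count>"
    # for fixed-width bins; the bin width in bp is the first argument (default 1000).
    length_histogram() {
        awk -v width="${1:-1000}" '
            { counts[int($1 / width) * width]++ }
            END { for (b in counts) print b "\t" counts[b] }
        ' | sort -n
    }

    # usage: poretools tabular test_data/*.fast5 | awk 'NR > 1 { print $1 }' | length_histogram 500
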
=====================
poretools ``nucdist``
=====================
Look at the nucleotide composition of a set of FAST5 files.

.. code-block:: bash

    poretools nucdist test_data/
    A 78287 335291 0.233489714904
    C 75270 335291 0.224491561062
    T 92575 335291 0.276103444471
    G 84754 335291 0.252777438106
    N 4405 335291 0.0131378414571

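
The four columns appear to be the base, its count, the total number of bases, and the fraction (count divided by total); the counts above sum to the total of 335291. A minimal sketch of the same tally over FASTA input, using a hypothetical ``base_composition`` helper (not part of ``poretools``):

.. code-block:: bash

    # base_composition: hypothetical helper, not a poretools subcommand.
    # Reads FASTA on stdin and prints "<base><TAB><count><TAB><total><TAB><fraction>".
    base_composition() {
        awk '
            /^>/ { next }                              # skip header lines
            {
                for (i = 1; i <= length($0); i++) {
                    counts[toupper(substr($0, i, 1))]++
                    total++
                }
            }
            END { for (b in counts) printf "%s\t%d\t%d\t%.4f\n", b, counts[b], total, counts[b] / total }
        ' | sort
    }

    # usage: poretools fasta test_data/ | base_composition
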
======================
poretools ``qualdist``
======================
Look at the quality score composition of a set of FAST5 files.

.. code-block:: bash

    poretools qualdist test_data/
    ! 0 83403 335291 0.248748102395
    " 1 46151 335291 0.137644613187
    # 2 47463 335291 0.141557632027
    $ 3 34471 335291 0.102809201559
    % 4 24879 335291 0.0742012162569
    & 5 20454 335291 0.0610037251224
    ' 6 16783 335291 0.0500550268274
    ( 7 13699 335291 0.0408570465655
    ) 8 11356 335291 0.0338690868529
    * 9 9077 335291 0.0270720061081
    + 10 6492 335291 0.0193622852984
    , 11 4891 335291 0.014587328619
    - 12 3643 335291 0.0108651887465
    . 13 2585 335291 0.00770972080968
    / 14 1969 335291 0.0058725107444
    0 15 1475 335291 0.00439916371152
    1 16 1146 335291 0.00341792651756
    2 17 902 335291 0.00269020045274
    3 18 790 335291 0.00235616225905
    4 19 619 335291 0.0018461575169
    5 20 532 335291 0.00158668142002
    6 21 440 335291 0.00131229290378
    7 22 397 335291 0.00118404609727
    8 23 379 335291 0.00113036138757
    9 24 313 335291 0.000933517452004
    : 25 327 335291 0.000975272226215
    ; 26 138 335291 0.000411582774366
    < 27 121 335291 0.000360880548538
    = 28 96 335291 0.000286318451733
    > 29 76 335291 0.000226668774289
    ? 30 69 335291 0.000205791387183
    @ 31 61 335291 0.000181931516205
    A 32 48 335291 0.000143159225866
    B 33 23 335291 6.8597129061e-05
    C 34 14 335291 4.17547742111e-05
    D 35 6 335291 1.78949032333e-05
    F 37 3 335291 8.94745161666e-06

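
Each row appears to give a quality character, the Phred score it encodes, its count, the total, and the fraction. The character-to-score mapping in the table is the Phred+33 offset (``!`` is 0, ``+`` is 10, ``5`` is 20). A minimal sketch that decodes characters the same way, with a hypothetical ``phred_of`` helper (not part of ``poretools``):

.. code-block:: bash

    # phred_of: hypothetical helper, not a poretools subcommand.
    # Prints "<character><TAB><Phred score>" for each quality character given as an
    # argument, using the Phred+33 offset.
    phred_of() {
        for ch in "$@"; do
            # printf '%d' "'X" prints the character code of X (POSIX printf behaviour)
            code=$(printf '%d' "'$ch")
            printf '%s\t%d\n' "$ch" "$((code - 33))"
        done
    }

    # usage: phred_of '!' '+' '5' 'F'
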
======================
poretools ``qualpos``
======================
Produce a box-and-whisker plot of the quality score distribution at each position in the reads.

.. code-block:: bash

    poretools qualpos test_data/

The result should look something like:

.. image:: _images/qualpos.png
    :width: 400pt

=====================
poretools ``tabular``
=====================
Dump the length, name, sequence, and quality string of each read in one or more FAST5 files.

.. code-block:: bash

    poretools tabular foo.fast5
    length name sequence quals
    10 @channel_100_read_14_complement GTCCCCAACAACAC $%%'"$"%!)

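
Because the name column already carries the FASTQ ``@`` prefix, the tabular output converts back to FASTQ with a one-line ``awk`` program. A minimal sketch, assuming whitespace-separated columns in the order shown (length, name, sequence, quals) and a single header line; ``tabular_to_fastq`` is a hypothetical helper, not part of ``poretools``.

.. code-block:: bash

    # tabular_to_fastq: hypothetical helper, not a poretools subcommand.
    # Converts `poretools tabular` rows (length, name, sequence, quals) into
    # four-line FASTQ records, skipping the header line.
    tabular_to_fastq() {
        awk 'NR > 1 { print $2; print $3; print "+"; print $4 }'
    }

    # usage: poretools tabular foo.fast5 | tabular_to_fastq > foo.fastq
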
====================
poretools ``events``
====================
Extract the raw nanopore events from each FAST5 file.

.. code-block:: bash

    poretools events test_data/ | head -5
    file strand mean start stdv length model_state model_level move p_model_state mp_model_state p_mp_model_state p_A p_C p_G p_T raw_index
    test_data/2016_3_4_3507_1_ch120_read240_strand.fast5 template 58.3245290305 1559.89409031 1.34165996292 0.0146082337317 CGACTT 58.1304809188 0 0.0226559 CATCTT 0.0229866 0.284469 0.130683 0.137386 0.447461
    test_data/2016_3_4_3507_1_ch120_read240_strand.fast5 template 50.1420877511 1559.90869854 0.921372775302 0.0348605577689 GACTTT 49.3934875964 1 0.0849836 GACTTT 0.0849836 0.257314 0.350541 0.101351 0.290794
    test_data/2016_3_4_3507_1_ch120_read240_strand.fast5 template 47.5841029424 1559.9435591 0.771398562801 0.00763612217795 ACTTTG 48.2080162623 1 0.108899 TCTTTG 0.13079 0.000477931 0.00853333 0.306356 0.684632
    test_data/2016_3_4_3507_1_ch120_read240_strand.fast5 template 51.5879264562 1559.95119522 0.684238307171 0.0112881806109 CTTTGA 52.7784154546 1 0.110625 CTTTGG 0.121103 4.69995e-06 0.00382846 0.0169048 0.979262

Extract the pre-basecalled events from each FAST5 file.

.. code-block:: bash

    poretools events --pre-basecalled test_data/ | head -5
    file strand mean start stdv length model_state model_level move p_model_state mp_model_state p_mp_model_state p_A p_C p_G p_T raw_index
    burn-in-run-2/ch100_file15_strand.fast5 pre_basecalled 51.4652695313 5352344 0.655003995591 35
    burn-in-run-2/ch100_file15_strand.fast5 pre_basecalled 60.1776123047 5352379 1.05143911309 18
    burn-in-run-2/ch100_file15_strand.fast5 pre_basecalled 48.9152374359 5352397 0.864834628834 67
    burn-in-run-2/ch100_file15_strand.fast5 pre_basecalled 55.4002178596 5352464 1.75915620083 17

===================
poretools ``times``
===================
Tabulate the timing information (experiment start, read start, duration, and read end) of each read in a set of FAST5 files.

.. code-block:: bash

    poretools times test_data/ | head -5
    channel filename read_length exp_starttime unix_timestamp duration unix_timestamp_end iso_timestamp day hour minute
    120 test_data/2016_3_4_3507_1_ch120_read240_strand.fast5 5826 1457127309 1457128868 47 1457128915 2016-03-04T15:01:08-0700 04 15 01
    120 test_data/2016_3_4_3507_1_ch120_read353_strand.fast5 3399 1457127309 1457129863 28 1457129891 2016-03-04T15:17:43-0700 04 15 17
    120 test_data/2016_3_4_3507_1_ch120_read415_strand.fast5 2640 1457127309 1457130808 24 1457130832 2016-03-04T15:33:28-0700 04 15 33
    120 test_data/2016_3_4_3507_1_ch120_read418_strand.fast5 3487 1457127309 1457130851 31 1457130882 2016-03-04T15:34:11-0700 04 15 34

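
In the sample rows, ``unix_timestamp_end`` equals ``unix_timestamp`` plus ``duration`` (1457128868 + 47 = 1457128915), which suggests ``duration`` is the read's time in seconds. Dividing ``read_length`` by it gives a rough bases-per-second figure for each read. A minimal sketch with a hypothetical ``read_speed`` helper (not part of ``poretools``), assuming whitespace-separated columns in the order shown (2 = ``filename``, 3 = ``read_length``, 6 = ``duration``):

.. code-block:: bash

    # read_speed: hypothetical helper, not a poretools subcommand.
    # Reads `poretools times` output on stdin and prints
    # "<filename><TAB><read_length / duration>" for reads with a non-zero duration.
    read_speed() {
        awk 'NR > 1 && $6 > 0 { printf "%s\t%.1f\n", $2, $3 / $6 }'
    }

    # usage: poretools times test_data/ | read_speed
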
=======================
poretools ``occupancy``
=======================
Plot the throughput performance of each pore on the flowcell during a given sequencing run.

.. code-block:: bash

    poretools occupancy test_data/

The result should look something like:

.. image:: _images/occupancy.png
    :width: 400pt

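
For a text summary along similar lines, reads and bases can be counted per channel from ``poretools times`` output (shown above). This hypothetical ``reads_per_channel`` helper is not part of ``poretools``, is not necessarily the metric the ``occupancy`` plot uses, and assumes the ``times`` columns shown above (1 = ``channel``, 3 = ``read_length``).

.. code-block:: bash

    # reads_per_channel: hypothetical helper, not a poretools subcommand.
    # Reads `poretools times` output on stdin and prints
    # "<channel><TAB><reads><TAB><base pairs>", sorted by channel number.
    reads_per_channel() {
        awk '
            NR > 1 { reads[$1]++; bp[$1] += $3 }
            END { for (ch in reads) printf "%s\t%d\t%d\n", ch, reads[ch], bp[ch] }
        ' | sort -n
    }

    # usage: poretools times test_data/ | reads_per_channel
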
===================
poretools ``index``
===================
Tabulate all file location information and metadata, such as ASIC ID and temperature, for a set of FAST5 files.

.. code-block:: bash

    poretools index test_data | head -5 | column -t
    source_filename template_fwd_length complement_rev_length 2d_length asic_id asic_temp heatsink_temp channel exp_start_time exp_start_time_string_date exp_start_time_string_time start_time start_time_string_date start_time_string_time duration fast5_version
    test_data/2016_3_4_3507_1_ch120_read240_strand.fast5 5826 5011 5079 3571011476 30.37 36.99 120 1457127309 2016-Mar-04 (Fri) 14:35:09 1457128868 2016-Mar-04 (Fri) 15:01:08 47 metrichor1.16
    test_data/2016_3_4_3507_1_ch120_read353_strand.fast5 3399 2962 2940 3571011476 30.37 36.99 120 1457127309 2016-Mar-04 (Fri) 14:35:09 1457129863 2016-Mar-04 (Fri) 15:17:43 28 metrichor1.16
    test_data/2016_3_4_3507_1_ch120_read415_strand.fast5 2640 2244 2428 3571011476 30.37 36.99 120 1457127309 2016-Mar-04 (Fri) 14:35:09 1457130808 2016-Mar-04 (Fri) 15:33:28 24 metrichor1.16
    test_data/2016_3_4_3507_1_ch120_read418_strand.fast5 3487 2950 3384 3571011476 30.37 36.99 120 1457127309 2016-Mar-04 (Fri) 14:35:09 1457130851 2016-Mar-04 (Fri) 15:34:11 31 metrichor1.16


======================
poretools ``metadata``
======================
Extract the metadata from a FAST5 file.

.. code-block:: bash

    poretools metadata 013731_11rx_v2_3135_1_ch20_file19_strand.fast5
    asic_id asic_temp heatsink_temp
    31037 28.11 37.88

    poretools metadata --read 013731_11rx_v2_3135_1_ch20_file19_strand.fast5
    filename scaling_used abasic_peak_height hairpin_polyt_level median_before start_time read_id read_number hairpin_peak_height abasic_found abasic_event_index duration start_mux hairpin_found hairpin_event_index
    013731_11rx_v2_3135_1_ch20_file19_strand.fast5 1 124.31769966 0.413218809334 226.393825112 4648221 3b4e45bf-6d42-45bc-9314-1d8a630971c2 19 125.783167256 1 2 195322 4 1 1478