# Shasta long read assembler
De novo assembler for long reads, optimized for Oxford Nanopore (ONT) reads.
🆕 [Mode 3 assembly: presentation of assembly results](https://paoloshasta.github.io/shasta/Shasta-0.12.0.pdf)
🆕 [Mode 3 assembly: usage notes](https://paoloshasta.github.io/shasta/Mode3-0.12.0.html)
___
**Shasta development continues in this fork.**
New releases will appear in the
[Releases page](https://github.com/paoloshasta/shasta/releases) of this repository.
Previous releases (up to 0.10.0) are available from the
[Releases page](https://github.com/chanzuckerberg/shasta/releases) of the pre-fork
repository `chanzuckerberg/shasta`.
___
**The complete user documentation is available [here](https://paoloshasta.github.io/shasta/).**
**For quick start information see [here](https://paoloshasta.github.io/shasta/QuickStart.html).**
The main paper describing Shasta and its methods and results is
[Shafin et al., Nature Biotechnology 2020](https://www.nature.com/articles/s41587-020-0503-6).
Reads from this paper are available
[here](https://s3-us-west-2.amazonaws.com/human-pangenomics/index.html).
The assembly results are
[here](https://s3-us-west-2.amazonaws.com/human-pangenomics/index.html?prefix=publications/SHASTA2019/assemblies/).
**Requests for help:** please file GitHub issues to report problems, request help, or ask questions.
**Please keep each issue on a single topic when possible.**
___
Main features of the Shasta long read assembler:
* Optimized to rapidly
produce accurate assembled sequence from DNA reads
generated by [Oxford Nanopore](https://nanoporetech.com) flow cells.
* High performance (a few hours for a human assembly
using a single machine of appropriate size).
* Haploid or phased diploid assembly.
Computational methods used by the Shasta assembler include:
* Using a
[run-length](https://en.wikipedia.org/wiki/Run-length_encoding)
representation of the read sequence.
This makes the assembly process more resilient to errors in
homopolymer repeat counts, which are the most common type
of error in Oxford Nanopore reads.
* Most phases of the computation use a representation
of the read sequence based on *markers*, a fixed
subset of short k-mers (k ≈ 10).
See [this documentation page](https://paoloshasta.github.io/shasta/ComputationalMethods.html)
for more information on computational methods.
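
The two ideas above can be illustrated with a minimal Python sketch. It is not taken from the Shasta code base: the function names, the toy reads, and in particular the hash-threshold rule used to pick the marker subset are assumptions made for illustration only. The sketch shows that two reads differing only in homopolymer repeat counts share the same run-length base sequence, and that markers are the occurrences of a fixed subset of k-mers in that sequence.

```python
import hashlib
from itertools import groupby


def run_length_encode(read):
    """Collapse each homopolymer run into one base plus a repeat count."""
    bases = []
    counts = []
    for base, run in groupby(read):
        bases.append(base)
        counts.append(len(list(run)))
    return "".join(bases), counts


def is_marker(kmer, density):
    """Hypothetical selection rule (an assumption, not Shasta's method):
    keep a k-mer if its hash falls below a threshold set by `density`.
    The decision depends only on the k-mer, so the subset is fixed."""
    digest = hashlib.sha256(kmer.encode()).digest()
    value = int.from_bytes(digest[:8], "big")
    return value < int(density * 2**64)


def find_markers(rle_bases, k=10, density=0.1):
    """Return (position, k-mer) for each marker occurrence in the sequence."""
    return [
        (i, rle_bases[i:i + k])
        for i in range(len(rle_bases) - k + 1)
        if is_marker(rle_bases[i:i + k], density)
    ]


# Two toy reads that differ only in homopolymer repeat counts.
read_a = "ACCCGTTTTAG"
read_b = "ACCGTTTAG"
bases_a, counts_a = run_length_encode(read_a)
bases_b, counts_b = run_length_encode(read_b)

# The base sequences agree; only the repeat counts differ.
print(bases_a, counts_a)
print(bases_b, counts_b)
```

Because marker selection in this sketch runs on the run-length base sequence, both toy reads yield identical markers even though their raw sequences differ.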
#### Acknowledgments
The Shasta software uses various external software packages.
See [here](https://paoloshasta.github.io/shasta/Acknowledgments.html) for more information.