# Shasta long read assembler

De novo assembler for long reads, optimized for Oxford Nanopore (ONT) reads.

🆕 [Mode 3 assembly: presentation of assembly results](https://paoloshasta.github.io/shasta/Shasta-0.12.0.pdf)

🆕 [Mode 3 assembly: usage notes](https://paoloshasta.github.io/shasta/Mode3-0.12.0.html)

___

**Shasta development continues in this fork.** 

New releases will appear in the 
[Releases page](https://github.com/paoloshasta/shasta/releases) of this repository.
Previous releases (up to 0.10.0) are available from the
[Releases page](https://github.com/chanzuckerberg/shasta/releases) of the pre-fork
repository `chanzuckerberg/shasta`.
___

**The complete user documentation is available [here](https://paoloshasta.github.io/shasta/).**

**For quick start information see [here](https://paoloshasta.github.io/shasta/QuickStart.html).**

The main paper describing Shasta and its methods and results is 
[Shafin et al., Nature Biotechnology 2020](https://www.nature.com/articles/s41587-020-0503-6).
Reads from this paper are available 
[here](https://s3-us-west-2.amazonaws.com/human-pangenomics/index.html).
The assembly results are
[here](https://s3-us-west-2.amazonaws.com/human-pangenomics/index.html?prefix=publications/SHASTA2019/assemblies/).

**Requests for help:** please file GitHub issues to report problems, request help, or ask questions.
**Please keep each issue on a single topic when possible.** 
___

Main features of the Shasta long read assembler:

* Optimized to rapidly
produce accurate assembled sequence from DNA reads
generated by [Oxford Nanopore](https://nanoporetech.com) flow cells.
* High performance (a few hours for a human assembly
on a single machine of appropriate size).
* Haploid or phased diploid assembly.

Computational methods used by the Shasta assembler include:

* A
[run-length](https://en.wikipedia.org/wiki/Run-length_encoding)
representation of the read sequence.
This makes the assembly process more resilient to errors in
homopolymer repeat counts, which are the most common type
of error in Oxford Nanopore reads.

* A representation of the read sequence based on *markers*,
a fixed subset of short k-mers (k ≈ 10),
used in most phases of the computation.

See [this documentation page](https://paoloshasta.github.io/shasta/ComputationalMethods.html)
for more information on computational methods.
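
The two representations described above can be illustrated with a minimal Python sketch. This is not Shasta's implementation: the function names, the hash-based marker selection rule, and the small `k` used in the demo are assumptions made purely for illustration. The sketch shows that two reads differing only in homopolymer lengths share the same run-length base sequence, and therefore the same markers.

```python
import hashlib
from itertools import groupby


def run_length_encode(seq):
    """Split a sequence into (bases, repeat_counts), one entry per homopolymer run."""
    bases, counts = [], []
    for base, run in groupby(seq):
        bases.append(base)
        counts.append(len(list(run)))
    return "".join(bases), counts


def is_marker(kmer, fraction=0.1):
    """Hypothetical selection rule (an assumption, not Shasta's method):
    keep a k-mer if a deterministic hash of it falls below a threshold,
    so the same fixed subset of k-mers is chosen for every read."""
    digest = hashlib.sha256(kmer.encode()).digest()
    return int.from_bytes(digest[:4], "big") < fraction * 2**32


def find_markers(rle_bases, k=10, fraction=0.1):
    """Return (position, kmer) for each k-mer of the read that is in the marker subset."""
    return [
        (i, rle_bases[i:i + k])
        for i in range(len(rle_bases) - k + 1)
        if is_marker(rle_bases[i:i + k], fraction)
    ]


# Two reads of the same region that disagree only in homopolymer lengths.
read_a = "GATTTTACCCAG"
read_b = "GATTTACCCCAG"

bases_a, counts_a = run_length_encode(read_a)
bases_b, counts_b = run_length_encode(read_b)

print(bases_a, counts_a)  # GATACAG [1, 1, 4, 1, 3, 1, 1]
print(bases_b, counts_b)  # GATACAG [1, 1, 3, 1, 4, 1, 1]

# The run-length base sequences are identical, so the markers are too.
# k=3 is used only because the demo reads are very short.
print(find_markers(bases_a, k=3, fraction=0.5) == find_markers(bases_b, k=3, fraction=0.5))
```

The homopolymer length disagreement ends up entirely in the repeat counts, so any step that works only on the run-length bases (such as locating markers) is unaffected by it.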

#### Acknowledgments

The Shasta software uses various external software packages.
See [here](https://paoloshasta.github.io/shasta/Acknowledgments.html) for more information.
