.\" DO NOT MODIFY THIS FILE!  It was generated by help2man 1.46.4.
.TH FASTAHACK "1" "June 2016" "fastahack 0.0+20160309" "User Commands"
.SH NAME
fastahack \- indexing and sequence extraction from FASTA files
.SH SYNOPSIS
.B fastahack
[options] <fasta reference>
.SH DESCRIPTION
fastahack is a small application for indexing and extracting sequences and
subsequences from FASTA files.  The included Fasta.cpp library provides a FASTA
reader and indexer that can be embedded into applications which would benefit
from directly reading subsequences from FASTA files.  The library automatically
handles index file generation and use.
.P
Features:
.IP
FASTA index (.fai) generation for FASTA files
.IP
Sequence extraction
.IP
Subsequence extraction
.IP
Sequence statistics (currently only entropy is provided)
.P
Sequence and subsequence extraction use fseek64 to jump directly to the
requested bytes, avoiding RAM\-intensive loading of the whole file.  This makes
fastahack a useful tool for bioinformaticians who need to quickly extract many
subsequences from a reference FASTA sequence.
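.P
The seek\-based extraction described above can be sketched in Python.
This is an illustration only, not fastahack's own code.  It assumes the index
uses the common five\-column .fai layout (name, length, byte offset, bases per
line, bytes per line); the function names are hypothetical.
.P
.nf
```python
def parse_fai_line(line):
    # Assumed column order: name, length, offset, linebases, linewidth.
    name, length, offset, linebases, linewidth = line.split()[:5]
    return name, int(length), int(offset), int(linebases), int(linewidth)


def byte_offset(offset, linebases, linewidth, pos):
    # pos is 1-based; convert to a 0-based index into the sequence.
    index = pos - 1
    # Each full line of bases occupies linewidth bytes (bases plus newline).
    return offset + (index // linebases) * linewidth + index % linebases


def fetch(path, fai_entry, start, end):
    # Read the inclusive 1-based range [start, end] without loading the file.
    name, length, offset, linebases, linewidth = fai_entry
    first = byte_offset(offset, linebases, linewidth, start)
    last = byte_offset(offset, linebases, linewidth, end)
    with open(path, "rb") as handle:
        handle.seek(first)
        raw = handle.read(last - first + 1)
    # Drop the line terminators that fall inside the byte range.
    return b"".join(raw.split()).decode("ascii")
```
.fi
.P
Only the bytes between the first and last requested base are read, which is
why the cost depends on the region size rather than on the size of the file.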
.SH OPTIONS
.TP
\fB\-i\fR, \fB\-\-index\fR
generate the FASTA index <fasta reference>.fai
.TP
\fB\-r\fR, \fB\-\-region\fR REGION
print the specified region
.TP
\fB\-c\fR, \fB\-\-stdin\fR
read a stream of line\-delimited region specifiers on stdin
and print the corresponding sequence for each on stdout
.TP
\fB\-e\fR, \fB\-\-entropy\fR
print the Shannon entropy of the specified region
.TP
\fB\-d\fR, \fB\-\-dump\fR
print the FASTA file in the form 'seq_name <tab> sequence'
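.P
As a rough illustration of the statistic reported by \fB\-\-entropy\fR, the
Python sketch below computes Shannon entropy from per\-character frequencies.
The base\-2 logarithm and the absence of any normalization are assumptions;
fastahack's exact computation may differ.
.P
.nf
```python
import math
from collections import Counter


def shannon_entropy(sequence):
    # Entropy in bits per symbol from the observed character frequencies.
    if not sequence:
        return 0.0
    counts = Counter(sequence)
    total = len(sequence)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```
.fi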
.P
REGION is of the form
.IP
<seq>, <seq>:<start>[sep]<end>, <seq1>:<start>[sep]<seq2>:<end>
.P
where start and end are 1\-based, and the region includes the end position.
[sep] is "\-" or ".."
.PP
Specifying a sequence name alone will return the entire sequence, specifying
a range will return that range, and specifying a single position, e.g.
<seq>:<start>, will return just that base.
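.P
The region forms above can be illustrated with a small Python parser.  It is
a hypothetical sketch, not fastahack's implementation: it covers only the
single\-sequence forms, and it assumes sequence names contain no colon.
.P
.nf
```python
def parse_region(region):
    # Returns (seq, start, end); start and end are None for a whole sequence.
    seq, colon, coords = region.partition(":")
    if not colon:
        return seq, None, None
    # Try the two documented separators, longest first.
    for sep in ("..", "-"):
        if sep in coords:
            start, end = coords.split(sep, 1)
            return seq, int(start), int(end)
    # A single coordinate selects exactly one base.
    pos = int(coords)
    return seq, pos, pos
```
.fi
.P
Coordinates are returned unchanged, so they remain 1\-based with an inclusive
end, matching the convention described above.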
.SH AUTHOR
This software was written by Erik Garrison <erik.garrison@bc.edu>.
.P
This manpage was written by Andreas Tille for the Debian distribution and may
be used with the program elsewhere.