.\" DO NOT MODIFY THIS FILE!  It was generated by help2man 1.48.5.
.TH SEQKIT "1" "January 2022" "seqkit 2.1.0+ds" "User Commands"
.SH NAME
seqkit \- cross-platform and ultrafast toolkit for FASTA/Q file manipulation
.SH DESCRIPTION
SeqKit \fB\-\-\fR a cross\-platform and ultrafast toolkit for FASTA/Q file manipulation
.PP
Version: 2.1.0
.PP
Author: Wei Shen <shenwei356@gmail.com>
.PP
Documents  : http://bioinf.shenwei.me/seqkit
.br
Source code: https://github.com/shenwei356/seqkit
.br
Please cite: https://doi.org/10.1371/journal.pone.0163962
.PP
Seqkit utilizes the pgzip (https://github.com/klauspost/pgzip) package to
read and write gzip files, and the gzip files it writes are slightly
larger than files generated by GNU gzip.
.PP
Seqkit writes gzip files very fast, much faster than the multi\-threaded pigz,
so there is no need to pipe the result to gzip/pigz.
.SS "Usage:"
.IP
seqkit [command]
.SS "Available Commands:"
.TP
amplicon
extract amplicon (or specific region around it) via primer(s)
.TP
bam
monitoring and online histograms of BAM record features
.TP
common
find common sequences of multiple files by id/name/sequence
.TP
concat
concatenate sequences with same ID from multiple files
.TP
convert
convert FASTQ quality encoding between Sanger, Solexa and Illumina
.TP
duplicate
duplicate sequences N times
.TP
faidx
create FASTA index file and extract subsequence
.TP
fish
look for short sequences in larger sequences using local alignment
.TP
fq2fa
convert FASTQ to FASTA
.TP
fx2tab
convert FASTA/Q to tabular format (and length, GC content, average quality...)
.TP
genautocomplete
generate shell autocompletion script (bash|zsh|fish|powershell)
.TP
grep
search sequences by ID/name/sequence/sequence motifs, mismatch allowed
.TP
head
print first N FASTA/Q records
.TP
head\-genome
print sequences of the first genome with common prefixes in name
.TP
locate
locate subsequences/motifs, mismatch allowed
.TP
mutate
edit sequence (point mutation, insertion, deletion)
.TP
pair
match up paired\-end reads from two fastq files
.TP
range
print FASTA/Q records in a range (start:end)
.TP
rename
rename duplicated IDs
.TP
replace
replace name/sequence by regular expression
.TP
restart
reset start position for circular genome
.TP
rmdup
remove duplicated sequences by ID/name/sequence
.TP
sample
sample sequences by number or proportion
.TP
sana
sanitize broken single line FASTQ files
.TP
scat
real time recursive concatenation and streaming of fastx files
.TP
seq
transform sequences (extract ID, filter by length, remove gaps...)
.TP
shuffle
shuffle sequences
.TP
sliding
extract subsequences in sliding windows
.TP
sort
sort sequences by id/name/sequence/length
.TP
split
split sequences into files by id/seq region/size/parts (mainly for FASTA)
.TP
split2
split sequences into files by size/parts (FASTA, PE/SE FASTQ)
.TP
stats
simple statistics of FASTA/Q files
.TP
subseq
get subsequences by region/gtf/bed, including flanking sequences
.TP
tab2fx
convert tabular format to FASTA/Q format
.TP
translate
translate DNA/RNA to protein sequence (supporting ambiguous bases)
.TP
version
print version information and check for update
.TP
watch
monitoring and online histograms of sequence features
.SS "Flags:"
.TP
\fB\-\-alphabet\-guess\-seq\-length\fR int
length of sequence prefix of the first FASTA record based on which seqkit guesses the sequence type (0 for whole seq) (default 10000)
.TP
\fB\-h\fR, \fB\-\-help\fR
help for seqkit
.TP
\fB\-\-id\-ncbi\fR
FASTA header is NCBI\-style, e.g. >gi|110645304|ref|NC_002516.2| Pseud...
.TP
\fB\-\-id\-regexp\fR string
regular expression for parsing ID (default "^(\e\eS+)\e\es?")
.TP
\fB\-\-infile\-list\fR string
file listing input files (one file per line); if given, they are appended to the files from command\-line arguments
.TP
\fB\-w\fR, \fB\-\-line\-width\fR int
line width when outputting FASTA format (0 for no wrap) (default 60)
.TP
\fB\-o\fR, \fB\-\-out\-file\fR string
out file ("\-" for stdout, suffix .gz for gzipped out) (default "\-")
.TP
\fB\-\-quiet\fR
be quiet and do not show extra information
.TP
\fB\-t\fR, \fB\-\-seq\-type\fR string
sequence type (dna|rna|protein|unlimit|auto) (for auto, it is automatically detected from the first sequence) (default "auto")
.TP
\fB\-j\fR, \fB\-\-threads\fR int
number of CPUs (can also be set with the environment variable SEQKIT_THREADS) (default 4)
.PP
Use "seqkit [command] \fB\-\-help\fR" for more information about a command.
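.SH EXAMPLES
The following is a minimal sketch, not taken from the seqkit documentation.
Its first part uses plain shell parameter expansion to approximate what the
default \fB\-\-id\-regexp\fR captures, namely the first run of non\-whitespace
characters of the header. It does not run seqkit's own parser, and the header
line is made up for illustration. Its second part combines global flags
listed above with the fq2fa command; the input file reads.fq is hypothetical,
and the command only runs when seqkit and that file are present.
.PP
.nf
```shell
# Made-up FASTA header, used only for illustration.
header='>seq1 Homo sapiens chromosome 1'

# Drop the leading '>' and keep everything before the first whitespace.
# This approximates the default ID regular expression; it is not seqkit code.
name=${header#">"}
id=${name%%[[:space:]]*}
echo "$id"

# Assumption: reads.fq is a hypothetical FASTQ file in the current directory.
# Writes unwrapped FASTA (-w 0) to a gzipped file (-o with suffix .gz),
# setting the thread count through SEQKIT_THREADS.
if command -v seqkit >/dev/null 2>&1 && [ -f reads.fq ]; then
    SEQKIT_THREADS=8 seqkit fq2fa reads.fq -w 0 -o reads.fa.gz
fi
```
.fi
.PP
Because the out file name ends in .gz, seqkit compresses the output itself,
so no pipe to gzip or pigz is needed.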
.SH AUTHOR
This manpage was written by Nilesh Patra for the Debian distribution and
can be used for any other usage of the program.