1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202
|
Command line interface
======================
New in ``pyfastx`` 0.5.0
.. code:: bash
$ pyfastx -h
usage: pyfastx COMMAND [OPTIONS]
A command line tool for FASTA/Q file manipulation
optional arguments:
-h, --help show this help message and exit
-v, --version show program version number and exit
Commands:
index build index for fasta/q file
stat show detailed statistics information of fasta/q file
split split fasta/q file into multiple files
fq2fa convert fastq file to fasta file
subseq get subsequences from fasta file by region
sample randomly sample sequences from fasta or fastq file
extract extract full sequences or reads from fasta/q file
Build index
-----------
New in ``pyfastx`` 0.6.10
.. code:: bash
$ pyfastx index -h
usage: pyfastx index [-h] [-f] fastx [fastx ...]
positional arguments:
fastx fasta or fastq file, gzip support
optional arguments:
-h, --help show this help message and exit
-f, --full build full index, base composition will be calculated
The --full option was used to count bases in FASTA/Q file and speedup calculation of GC content.
Show statistics information
---------------------------
.. code:: bash
$ pyfastx stat -h
usage: pyfastx stat [-h] fastx [fastx ...]
positional arguments:
fastx input fasta or fastq file, gzip support
optional arguments:
-h, --help show this help message and exit
For example:
.. code:: bash
$ pyfastx info tests/data/*.fa*
fileName seqType seqCounts totalBases GC% avgLen medianLen maxLen minLen N50 L50
protein.fa protein 17 2265 - 133.24 80.0 419 23 263 4
rna.fa RNA 2 720 65.283 360.0 360.0 360 360 360 1
test.fa DNA 211 86262 43.529 408.82 386.0 821 118 516 66
test.fa.gz DNA 211 86262 43.529 408.82 386.0 821 118 516 66
seqType: sequence type (DNA, RNA, or protein); seqCounts: total sequence counts; totalBases: total number of bases; GC%: GC content; avgLen: average sequence length; medianLen: median sequence length; maxLen: maximum sequence length; minLen: minimum sequence length; N50: N50 length; L50: L50 sequence counts.
.. code:: bash
$ pyfastx info tests/data/*.fq*
fileName readCounts totalBases GC% avgLen maxLen minLen maxQual minQual qualEncodingSystem
test.fq 800 120000 66.175 150.0 150 150 70 35 Sanger Phred+33,Illumina 1.8+ Phred+33
test.fq.gz 800 120000 66.175 150.0 150 150 70 35 Sanger Phred+33,Illumina 1.8+ Phred+33
readCounts: total read counts; totalBases: total number of bases; GC%: GC content; avgLen: average sequence length; maxLen: maximum sequence length; minLen: minimum sequence length; maxQual: maximum quality score; minQual: minimum quality score; qualEncodingSystem: quality encoding system.
Split FASTA/Q file
------------------
.. code:: bash
$ pyfastx split -h
usage: pyfastx split [-h] (-n int | -c int) [-o str] fastx
positional arguments:
fastx fasta or fastq file, gzip support
optional arguments:
-h, --help show this help message and exit
-n int split a fasta/q file into N new files with even size
-c int split a fasta/q file into multiple files containing the same sequence counts
-o str, --out-dir str
output directory, default is current folder
Convert FASTQ to FASTA file
---------------------------
.. code:: bash
$ pyfastx fq2fa -h
usage: pyfastx fq2fa [-h] [-o str] fastx
positional arguments:
fastx input fastq file, gzip support
optional arguments:
-h, --help show this help message and exit
-o str, --out-file str
output file, default: output to stdout
Get subsequence with region
---------------------------
.. code:: bash
$ pyfastx subseq -h
usage: pyfastx subseq [-h] [-r str | -b str] [-o str]
fastx [region [region ...]]
positional arguments:
fastx input fasta file, gzip support
region format is chr:start-end, start and end position is
1-based, multiple regions were separated by space
optional arguments:
-h, --help show this help message and exit
-r str, --region-file str
tab-delimited file, one region per line, both start
and end position are 1-based
-b str, --bed-file str
tab-delimited BED file, 0-based start position and
1-based end position
-o str, --out-file str
output file, default: output to stdout
Sample sequences
----------------
.. code:: bash
$ pyfastx sample -h
usage: pyfastx sample [-h] (-n int | -p float) [-s int] [--sequential-read]
[-o str]
fastx
positional arguments:
fastx fasta or fastq file, gzip support
optional arguments:
-h, --help show this help message and exit
-n int number of sequences to be sampled
-p float proportion of sequences to be sampled, 0~1
-s int, --seed int random seed, default is the current system time
--sequential-read start sequential reading, particularly suitable for
sampling large numbers of sequences
-o str, --out-file str
output file, default: output to stdout
Extract sequences
-----------------
New in ``pyfastx`` 0.6.10
.. code:: bash
$ pyfastx extract -h
usage: pyfastx extract [-h] [-l str] [--reverse-complement] [--out-fasta]
[-o str] [--sequential-read]
fastx [name [name ...]]
positional arguments:
fastx fasta or fastq file, gzip support
name sequence name or read name, multiple names were
separated by space
optional arguments:
-h, --help show this help message and exit
-l str, --list-file str
a file containing sequence or read names, one name per
line
--reverse-complement output reverse complement sequence
--out-fasta output fasta format when extract reads from fastq,
default output fastq format
-o str, --out-file str
output file, default: output to stdout
--sequential-read start sequential reading, particularly suitable for
extracting large numbers of sequences
|