File: strainphlan.1

.TH METAPHLAN2_STRAINER "1" "July 2016" "metaphlan2_strainer 2.5.0" "User Commands"
.SH NAME
metaphlan2_strainer \- METAgenomic PHyLogenetic ANalysis for metagenomic taxonomic profiling (strainer)
.SH SYNOPSIS
.B metaphlan2_strainer.py
[\-h] \fB\-\-ifn_samples\fR IFN_SAMPLES [IFN_SAMPLES ...]
\fB\-\-mpa_pkl\fR MPA_PKL \fB\-\-output_dir\fR OUTPUT_DIR
[\-\-ifn_markers IFN_MARKERS]
[\-\-nprocs_main NPROCS_MAIN]
[\-\-nprocs_load_samples NPROCS_LOAD_SAMPLES]
[\-\-nprocs_align_clean NPROCS_ALIGN_CLEAN]
[\-\-nprocs_raxml NPROCS_RAXML]
[\-\-bootstrap_raxml BOOTSTRAP_RAXML]
[\-\-ifn_ref_genomes IFN_REF_GENOMES [IFN_REF_GENOMES ...]]
[\-\-N_in_marker N_IN_MARKER]
[\-\-marker_strip_length MARKER_STRIP_LENGTH]
[\-\-marker_in_clade MARKER_IN_CLADE]
[\-\-sample_in_clade SAMPLE_IN_CLADE]
[\-\-sample_in_marker SAMPLE_IN_MARKER]
[\-\-gap_in_trailing_col GAP_IN_TRAILING_COL]
[\-\-gap_trailing_col_limit GAP_TRAILING_COL_LIMIT]
[\-\-gap_in_internal_col GAP_IN_INTERNAL_COL]
[\-\-gap_in_sample GAP_IN_SAMPLE] [\-\-N_col N_COL]
[\-\-N_count N_COUNT]
[\-\-long_gap_length LONG_GAP_LENGTH]
[\-\-long_gap_percentage LONG_GAP_PERCENTAGE]
[\-\-p_value P_VALUE]
[\-\-clades CLADES [CLADES ...]]
[\-\-marker_list_fn MARKER_LIST_FN]
[\-\-print_clades_only]
[\-\-alignment_program {muscle,mafft}]
[\-\-relaxed_parameters] [\-\-relaxed_parameters2]
[\-\-keep_alignment_files]
[\-\-keep_full_alignment_files]
[\-\-save_sample2fullfreq] [\-\-use_threads]
.SH DESCRIPTION
Metaphlan2_strainer is a computational tool for tracking individual strains
across a large set of samples. The input of metaphlan2_strainer is a set of
metagenomic samples and the output is a set of phylogenetic trees.
.PP
For each sample, metaphlan2_strainer extracts the strain of a specific species
by merging and concatenating all reads mapped against that species' markers in
the MetaPhlAn2 database.
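.PP
A minimal Python sketch of assembling an invocation from the required options
shown in the synopsis. The helper build_strainer_command, the sample and
database file names, and the clade name are hypothetical placeholders; only
the option names are taken from this manual page.
.PP
.nf
```python
import shlex


def build_strainer_command(samples, mpa_pkl, output_dir, clades=None, nprocs_main=1):
    """Assemble an argument list for metaphlan2_strainer.py (hypothetical helper)."""
    cmd = ["metaphlan2_strainer.py", "--ifn_samples"]
    cmd.extend(samples)  # one or more sample files, space separated
    cmd.extend(["--mpa_pkl", mpa_pkl, "--output_dir", output_dir])
    cmd.extend(["--nprocs_main", str(nprocs_main)])
    if clades:
        cmd.append("--clades")
        cmd.extend(clades)
    return cmd


# All file names and the clade name are placeholders, not shipped files.
command = build_strainer_command(
    ["sample1.dat", "sample2.dat"],
    "mpa_db.pkl",
    "output",
    clades=["s__Example_species"],
    nprocs_main=4,
)
print(" ".join(shlex.quote(part) for part in command))
```
.fi
.PP
The resulting list can be passed to subprocess.run once the placeholder paths
are replaced with real files.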
.SH OPTIONS
.SS optional arguments
.TP
\fB\-h\fR, \fB\-\-help\fR
show this help message and exit
.TP
\fB\-\-ifn_samples\fR IFN_SAMPLES [IFN_SAMPLES ...]
The list of sample files (space separated). Wildcards
can also be used.
.TP
\fB\-\-mpa_pkl\fR MPA_PKL
The MetaPhlAn2 database file.
.TP
\fB\-\-output_dir\fR OUTPUT_DIR
The output directory.
.TP
\fB\-\-ifn_markers\fR IFN_MARKERS
The marker file in fasta format.
.TP
\fB\-\-nprocs_main\fR NPROCS_MAIN
The number of processors used for the main
threads. Default 1.
.TP
\fB\-\-nprocs_load_samples\fR NPROCS_LOAD_SAMPLES
The number of processors used for loading samples.
Default nprocs_main.
.TP
\fB\-\-nprocs_align_clean\fR NPROCS_ALIGN_CLEAN
The number of processors used for aligning and
cleaning markers. Default nprocs_main.
.TP
\fB\-\-nprocs_raxml\fR NPROCS_RAXML
The number of processors used for running raxml.
Default nprocs_main.
.TP
\fB\-\-bootstrap_raxml\fR BOOTSTRAP_RAXML
The number of bootstrapping runs when building the
tree. Default 0.
.TP
\fB\-\-ifn_ref_genomes\fR IFN_REF_GENOMES [IFN_REF_GENOMES ...]
The reference genome file names. They are separated by
spaces.
.TP
\fB\-\-N_in_marker\fR N_IN_MARKER
Consensus markers with a rate of N nucleotides
greater than this threshold are removed. Default 0.2.
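.IP
A rough Python sketch of this filter, assuming the rate is the fraction of N
characters in a consensus marker sequence. The function names and the toy
marker names are hypothetical and are not part of the program.
.IP
.nf
```python
def n_rate(marker):
    """Fraction of positions in a marker sequence that are N."""
    return marker.upper().count("N") / len(marker) if marker else 0.0


def keep_markers(markers, n_in_marker=0.2):
    """Keep only markers whose N rate does not exceed the threshold."""
    return {name: seq for name, seq in markers.items()
            if n_rate(seq) <= n_in_marker}


# Toy data: marker_a has 1 N in 10 positions, marker_b has 4 in 10.
toy = {"marker_a": "ACGTNACGTA", "marker_b": "NNNNACGTAC"}
kept = keep_markers(toy)
print(sorted(kept))
```
.fi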
.TP
\fB\-\-marker_strip_length\fR MARKER_STRIP_LENGTH
The number of nucleotides deleted from each
end of a marker. Default 50.
.TP
\fB\-\-marker_in_clade\fR MARKER_IN_CLADE
In each sample, clades whose rate of present
markers is less than this threshold are removed. Default
0.8.
.TP
\fB\-\-sample_in_clade\fR SAMPLE_IN_CLADE
Only clades present in at least sample_in_clade
samples are kept. Default 2.
.TP
\fB\-\-sample_in_marker\fR SAMPLE_IN_MARKER
If the percentage of samples in which a marker is
present is less than this threshold, that marker is removed.
Default 0.8.
.TP
\fB\-\-gap_in_trailing_col\fR GAP_IN_TRAILING_COL
Trailing nucleotide columns in aligned markers whose
percentage of gaps is greater than gap_in_trailing_col
are removed, provided that the number of such columns
is less than gap_trailing_col_limit.
Default 0.2.
.TP
\fB\-\-gap_trailing_col_limit\fR GAP_TRAILING_COL_LIMIT
Trailing nucleotide columns in aligned markers whose
percentage of gaps is greater than gap_in_trailing_col
are removed, provided that the number of such columns
is less than gap_trailing_col_limit.
Default 101.
.TP
\fB\-\-gap_in_internal_col\fR GAP_IN_INTERNAL_COL
The internal nucleotide columns in aligned markers
with the percentage of gaps greater than
gap_in_internal_col will be removed. Default 0.3.
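.IP
A minimal Python sketch of this column filter, assuming the threshold is a
fraction between 0 and 1 (as the default suggests) and that gaps are written
as "-". It ignores the distinction between internal and trailing columns, and
the function name drop_gappy_columns is hypothetical.
.IP
.nf
```python
def drop_gappy_columns(aligned, gap_in_internal_col=0.3):
    """Remove alignment columns whose gap fraction exceeds the threshold."""
    if not aligned:
        return []
    n_rows = len(aligned)
    keep = [
        col for col in range(len(aligned[0]))
        if sum(seq[col] == "-" for seq in aligned) / n_rows <= gap_in_internal_col
    ]
    return ["".join(seq[col] for col in keep) for seq in aligned]


# Toy alignment: column 2 has 1 gap in 4 rows, column 3 has 3 gaps in 4 rows.
toy = ["ACG-T", "AC--T", "ACGAT", "ACG-T"]
cleaned = drop_gappy_columns(toy)
print(cleaned)
```
.fi
.IP
In the toy alignment only the fourth column exceeds the 0.3 threshold, so it
is the only one dropped.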
.TP
\fB\-\-gap_in_sample\fR GAP_IN_SAMPLE
Samples whose full sequence, concatenated from all
markers, has a percentage of gaps greater than this
threshold will be removed. Default 0.2.
.TP
\fB\-\-N_col\fR N_COL
In aligned markers, if the percentage of nucleotide
columns containing more than N_count Ns is less than this
threshold, these columns will be removed. Default 0.8.
.TP
\fB\-\-N_count\fR N_COUNT
In aligned markers, if the percentage of nucleotide
columns containing more than N_count Ns is less than
the N_col threshold, these columns will be removed.
Default 0.
.TP
\fB\-\-long_gap_length\fR LONG_GAP_LENGTH
In each concatenated sequence of a sample, a run of
consecutive gap positions is a gap group. A gap group
with length greater than this threshold is considered
a long gap group. If the ratio between the number of unique
positions in all long gap groups and the concatenated
sequence length is less than long_gap_percentage,
these positions will be removed from all concatenated
sequences. Default 2.
.TP
\fB\-\-long_gap_percentage\fR LONG_GAP_PERCENTAGE
This threshold is combined with long_gap_length to
remove long gaps. Default 0.8.
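.IP
A rough Python sketch of how long_gap_length and long_gap_percentage could
interact, following the description above. It assumes gaps are written as "-",
that all concatenated sequences have equal length, and that "unique positions"
means the union of long-gap positions over all samples. The function names are
hypothetical.
.IP
.nf
```python
import re


def long_gap_positions(sequence, long_gap_length=2):
    """Return positions covered by gap runs longer than long_gap_length."""
    positions = set()
    for run in re.finditer("-+", sequence):
        if run.end() - run.start() > long_gap_length:
            positions.update(range(run.start(), run.end()))
    return positions


def remove_long_gaps(sequences, long_gap_length=2, long_gap_percentage=0.8):
    """Drop long-gap positions from every sequence when they are rare enough."""
    if not sequences:
        return []
    length = len(sequences[0])
    positions = set()
    for seq in sequences:
        positions |= long_gap_positions(seq, long_gap_length)
    if len(positions) / length < long_gap_percentage:
        return ["".join(c for i, c in enumerate(seq) if i not in positions)
                for seq in sequences]
    return list(sequences)


# Toy data: only the first sequence has a gap run longer than 2.
toy = ["ACGT---ACGT", "ACGTAC-ACGT", "A--TACGACGT"]
trimmed = remove_long_gaps(toy)
print(trimmed)
```
.fi
.IP
Here the three long-gap positions cover 3 of 11 columns, which is below 0.8,
so those columns are removed from every sequence; the shorter gap runs are
left in place.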
.TP
\fB\-\-p_value\fR P_VALUE
The p_value to reject a non\-polymorphic site. Default
0.05.
.TP
\fB\-\-clades\fR CLADES [CLADES ...]
The clades (space separated) for which the script will
compute the marker alignments in fasta format and the
phylogenetic trees. If a file name is specified, the
clade list is read from that file, which must contain
one clade name per line. Default "automatically identify all
clades".
.TP
\fB\-\-marker_list_fn\fR MARKER_LIST_FN
The file name containing the list of considered
markers. The other markers will be discarded. Default
"None".
.TP
\fB\-\-print_clades_only\fR
Only print the potential clades and stop without
building any tree. This option is useful when you want
to check quickly all possible clades and rerun only
for some specific ones. Default "False".
.TP
\fB\-\-alignment_program\fR {muscle,mafft}
The alignment program. Default "muscle".
.TP
\fB\-\-relaxed_parameters\fR
Set marker_in_clade=0.5, sample_in_marker=0.5,
N_in_marker=0.5, gap_in_sample=0.5. Default "False".
.TP
\fB\-\-relaxed_parameters2\fR
Set marker_in_clade=0.2, sample_in_marker=0.2,
N_in_marker=0.8, gap_in_sample=0.8. Default "False".
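.IP
The two presets can be summarised as overrides of the four defaults listed
above. The Python sketch below only restates the documented values; the names
DEFAULTS, RELAXED, RELAXED2 and effective_thresholds are hypothetical, and
applying relaxed_parameters2 last when both flags are given is an assumption,
not documented behaviour.
.IP
.nf
```python
# Default thresholds as documented for the individual options.
DEFAULTS = {
    "marker_in_clade": 0.8,
    "sample_in_marker": 0.8,
    "N_in_marker": 0.2,
    "gap_in_sample": 0.2,
}
# Values set by --relaxed_parameters.
RELAXED = {
    "marker_in_clade": 0.5,
    "sample_in_marker": 0.5,
    "N_in_marker": 0.5,
    "gap_in_sample": 0.5,
}
# Values set by --relaxed_parameters2.
RELAXED2 = {
    "marker_in_clade": 0.2,
    "sample_in_marker": 0.2,
    "N_in_marker": 0.8,
    "gap_in_sample": 0.8,
}


def effective_thresholds(relaxed=False, relaxed2=False):
    """Return the thresholds in effect for a given combination of presets."""
    thresholds = dict(DEFAULTS)
    if relaxed:
        thresholds.update(RELAXED)
    if relaxed2:
        # Assumption: the second preset wins if both are requested.
        thresholds.update(RELAXED2)
    return thresholds


print(effective_thresholds(relaxed=True))
```
.fi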
.TP
\fB\-\-keep_alignment_files\fR
Keep the alignment files of all markers before
the cleaning step.
.TP
\fB\-\-keep_full_alignment_files\fR
Keep the alignment files of all markers before
truncating the starting and ending parts and before the
cleaning step. This is equivalent to \fB\-\-keep_alignment_files\fR
\fB\-\-marker_strip_length\fR 0.
.TP
\fB\-\-save_sample2fullfreq\fR
Save sample2fullfreq to a msgpack file
sample2fullfreq.msgpack.
.TP
\fB\-\-use_threads\fR
Use multithreading. Default "Use multiprocessing".
.SH AUTHOR
This manpage was written by Andreas Tille for the Debian distribution and can be used for any other usage of the program.