.TH PROTEINORTHO6 "1" "November 2015" "proteinortho6 6.0.6" "User Commands"
.SH NAME
proteinortho6 \- orthology detection tool
.SH SYNOPSIS
.B proteinortho6
[\fI\,OPTIONS\/\fR] \fI\,FASTA1 FASTA2 \/\fR[\fI\,FASTA\/\fR...]
.SH DESCRIPTION
Proteinortho is a stand-alone tool that is geared towards large datasets
and makes use of distributed computing techniques when run on multi-core
hardware. It implements an extended version of the reciprocal best
alignment heuristic. Proteinortho was applied to compute orthologous
proteins in the complete set of all 717 eubacterial genomes available at
NCBI at the beginning of 2009. The authors succeeded in identifying thirty
proteins present in 99% of all bacterial proteomes.
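.PP
As an illustration of the heuristic that Proteinortho extends, the following
minimal sketch computes plain reciprocal best hits between two proteomes. It is
not Proteinortho's implementation: the (query, subject, score) tuple format and
the function names best_hits and reciprocal_best_hits are assumptions made for
this example, and the extended version used by the tool is not reproduced here.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    `hits` is an iterable of (query, subject, score) tuples; this input
    format is an assumption made for the sketch.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return pairs (a, b) where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Toy alignment results for proteome A against B and B against A.
a_vs_b = [("a1", "b1", 200.0), ("a1", "b2", 150.0), ("a2", "b2", 90.0)]
b_vs_a = [("b1", "a1", 198.0), ("b2", "a1", 150.0), ("b2", "a2", 80.0)]

pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(pairs)
```

In this toy input only a1 and b1 pick each other, so a2 and b2 are not reported
as a pair.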
.SH OPTIONS
.TP
\fB\-e=\fR
E\-value for blast [default: 1e\-05]
.TP
\fB\-p=\fR
blast program {blastp+|blastn+|tblastx+|diamond|usearch|ublast|lastp|lastn|rapsearch|topaz|blatp|blatn|mmseqsp|mmseqsn}
[default: diamond]
.TP
\fB\-project=\fR
prefix for all result file names [default: myproject]
.TP
\fB\-synteny\fR
activate PoFF extension to separate similar sequences
by contextual adjacencies (requires .gff for each .fasta)
.TP
\fB\-dups=\fR
PoFF: number of reiterations of the adjacencies heuristic,
to determine duplicated regions [default: 0]
.TP
\fB\-cs=\fR
PoFF: size of a maximum common substring (MCS) for
adjacency matches [default: 3]
.TP
\fB\-alpha=\fR
PoFF: weight of adjacencies vs. sequence similarity
[default: 0.5]
.TP
\fB\-desc\fR
write description files (for NCBI FASTA input only)
.TP
\fB\-keep\fR
stores temporary blast results for reuse
.TP
\fB\-force\fR
forces recalculation of blast results in any case
.TP
\fB\-cpus=\fR
number of processors to use [default: auto]
.TP
\fB\-selfblast\fR
apply selfblast, detects paralogs without orthologs
.TP
\fB\-singles\fR
report singleton genes without any hit
.TP
\fB\-identity=\fR
min. percent identity of best blast hits [default: 25]
.TP
\fB\-cov=\fR
min. coverage of best blast alignments in % [default: 50]
.TP
\fB\-conn=\fR
min. algebraic connectivity [default: 0.1]
.TP
\fB\-sim=\fR
min. similarity for additional hits (0..1) [default: 0.95]
.TP
\fB\-step=\fR
1 \-> generate indices
.br
2 \-> run blast (and ff\-adj, if \fB\-synteny\fR is set)
.br
3 \-> clustering
.br
0 \-> all (default)
.TP
\fB\-binpath=\fR
path to your local blast/diamond/... (if not installed globally)
.TP
\fB\-verbose\fR
keeps you informed about the progress
.TP
\fB\-clean\fR
remove all unnecessary files after processing
.TP
\fB\-debug\fR
gives detailed information for bug tracking
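.PP
To show how the \fB\-e\fR, \fB\-identity\fR and \fB\-cov\fR thresholds could
interact, here is a minimal sketch that filters alignment hits using the
documented defaults. The Hit record, the passes function and the coverage
definition (aligned length relative to each sequence's length, both required
to reach the minimum) are assumptions for this example and may differ from
what Proteinortho actually computes.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical alignment record; field names are assumptions."""
    query: str
    subject: str
    evalue: float
    identity: float   # percent identity, 0..100
    aln_len: int      # number of aligned positions
    query_len: int
    subject_len: int


def passes(hit, evalue=1e-5, identity=25.0, cov=50.0):
    """Apply the three thresholds using the defaults listed above.

    Coverage is taken here as aligned length over sequence length, checked
    for both query and subject (an assumption of this sketch).
    """
    query_cov = 100.0 * hit.aln_len / hit.query_len
    subject_cov = 100.0 * hit.aln_len / hit.subject_len
    return (
        hit.evalue <= evalue
        and hit.identity >= identity
        and min(query_cov, subject_cov) >= cov
    )


hits = [
    Hit("a1", "b1", 1e-30, 62.0, 180, 200, 210),  # passes all thresholds
    Hit("a1", "b2", 1e-3, 40.0, 150, 200, 190),   # E-value too high
    Hit("a2", "b3", 1e-12, 20.0, 190, 200, 200),  # identity below 25
    Hit("a3", "b4", 1e-20, 55.0, 80, 200, 120),   # query coverage below 50
]

kept = [hit.subject for hit in hits if passes(hit)]
print(kept)
```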
.PP
More specific blast parameters can be defined by
.TP
\fB\-subparaBLAST=\fR'[parameters]'
e.g. \fB\-subparaBLAST=\fR'\-seg no'
.PP
In case jobs should be distributed onto several machines, use
.TP
\fB\-jobs=M/N\fR
Distribute a Proteinortho run across multiple machines, or split it into
smaller chunks. N is the number of job groups the run is divided into, and
M selects the group to be processed by this invocation.
First, run 'proteinortho6.pl \-step=1 ...' to generate the indices.
Then run 'proteinortho6.pl \-step=2 \-jobs=M/N ...' once for each value of M
to process the chunks separately.
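.PP
The idea behind splitting step 2 into chunks can be sketched as follows. This
is an illustration only: it assumes that M is the 1\-based index of the chunk
handled by a process, that N is the total number of chunks, that the unit of
work is one pairwise comparison of input files, and that work is dealt out
round\-robin. The function names parse_jobs and jobs_for are hypothetical, and
Proteinortho's real scheduling may differ.

```python
from itertools import combinations


def parse_jobs(spec):
    """Parse an 'M/N' string into integers (M, N), with 1 <= M <= N."""
    m_text, n_text = spec.split("/")
    m, n = int(m_text), int(n_text)
    if n < 1 or not 1 <= m <= n:
        raise ValueError(f"invalid job spec: {spec!r}")
    return m, n


def jobs_for(spec, files):
    """Return the pairwise comparisons assigned to chunk M of N.

    Round-robin assignment over all file pairs is an assumption of this
    sketch, not documented Proteinortho behaviour.
    """
    m, n = parse_jobs(spec)
    all_jobs = list(combinations(files, 2))
    return [job for index, job in enumerate(all_jobs) if index % n == m - 1]


files = ["A.faa", "B.faa", "C.faa", "D.faa"]
chunks = [jobs_for(f"{m}/3", files) for m in (1, 2, 3)]
for number, chunk in enumerate(chunks, start=1):
    print(number, chunk)
```

Each comparison lands in exactly one chunk, so running every chunk once covers
the whole workload without duplication.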