SOAPaligner/soap2(1) Bioinformatics tool SOAPaligner/soap2(1)
NAME
SOAPaligner/soap2 - Short Oligonucleotide Analysis Package aligner
SYNOPSIS
soap reference.index short_reads.fast[a|q] alignment.out [options]
DESCRIPTION
SOAPaligner/soap2 is a member of SOAP (Short Oligonucleotide Analysis
Package). It is an updated version of the SOAP software for short
oligonucleotide alignment. The new program provides very fast and
accurate alignment of the huge numbers of short reads generated by the
Illumina/Solexa Genome Analyzer. It is one order of magnitude faster
than soap v1: aligning one million single-end reads to the human
reference genome takes only 2 minutes. Another notable improvement of
SOAPaligner is that it now supports a wide range of read lengths.
SOAPaligner gains its time and space efficiency from a fundamental
redesign of the underlying data structures and algorithms. The core
algorithms and the indexing data structure (2way-BWT) were developed by
the algorithms research group of the Department of Computer Science,
the University of Hong Kong (T.W. Lam, Alan Tam, Simon Wong, Edward Wu
and S.M. Yiu).
COMMAND AND OPTIONS
soap -D <in.fasta.index> -a <query.file.a> [-b <query.file.b>] -o
<alignment.output> [-2 <unpaired.output>] [options]
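When scripting many runs, it can help to assemble the command line programmatically. The sketch below uses a hypothetical helper, build_soap_command, which is not part of soap (soap has no Python API). It emits only the options documented on this page: -D, -a, -b, -o, -2, -m, -x and -p. Two checks are design choices of the helper and are not documented behaviour of soap: it rejects -m values larger than -x, and it rejects -2 when no -b file is given, because the option list describes -2 as a PE-only output.

```python
import shlex
from typing import List, Optional


def build_soap_command(
    index_prefix: str,
    reads_a: str,
    output: str,
    reads_b: Optional[str] = None,
    unpaired_output: Optional[str] = None,
    min_insert: int = 400,
    max_insert: int = 600,
    threads: int = 1,
) -> List[str]:
    """Hypothetical helper: build a soap argv from the options in this man page."""
    cmd = ["soap", "-D", index_prefix, "-a", reads_a, "-o", output]
    if reads_b is not None:
        # Paired-end run: -b is the other end; -m/-x bound the allowed insert size.
        if min_insert > max_insert:
            raise ValueError("-m (minimal insert) must not exceed -x (maximal insert)")
        cmd += ["-b", reads_b, "-m", str(min_insert), "-x", str(max_insert)]
        if unpaired_output is not None:
            # -2 collects reads that map but cannot be paired.
            cmd += ["-2", unpaired_output]
    elif unpaired_output is not None:
        # Helper's own check: -2 is described only for PE alignment.
        raise ValueError("-2 only applies to paired-end alignment (requires -b)")
    cmd += ["-p", str(threads)]
    return cmd


if __name__ == "__main__":
    argv = build_soap_command(
        "hg18.fa.index", "reads_1.fq", "pe.soap",
        reads_b="reads_2.fq", unpaired_output="se.soap", threads=4,
    )
    # Print a shell-ready command; pass argv to subprocess.run() to execute it.
    print(shlex.join(argv))
```

The file names above (hg18.fa.index, reads_1.fq and so on) are placeholders. The helper returns a list rather than a string, so it can go straight to subprocess.run without shell quoting issues.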
OPTIONS:
-D STR Prefix name of the reference index [*.index]. See APPENDIX for
how to build the reference index.
-a STR Query file a: reads for single-end (SE) alignment, or one end of
paired-end (PE) reads
-b STR Query file b: the other end of PE reads
-o STR Output file for alignment results
-2 STR Output file for reads that are mapped but not paired during PE
alignment
-u STR Output file for unmapped reads, [none]
-m INT Minimal insert size allowed for PE alignment, [400]
-x INT Maximal insert size allowed for PE alignment, [600]
-n INT Filter out low-quality reads containing more than INT Ns, [5]
-t Output read IDs instead of read names, [none]
-r INT How to report repeat hits: 0 = none; 1 = one at random; 2 = all,
[1]
-R Use RF alignment for PE data with long insert sizes (>= 2 kbp),
[none] (FR alignment by default)
-l INT For long reads with a high error rate at the 3' end that cannot
be aligned over their whole length, first align the 5' INT bp
subsequence as a seed, [256] (use the whole length of the read)
-s INT Minimal alignment length (for soft clipping)
-v INT Total number of mismatches allowed in one read when a
subsequence is used as a seed, [5]
-g INT Gap size allowed in one read, [0]
-M INT Match mode for each read or for the seed part of a read, which
may contain no more than 2 mismatches, [4]
0: exact match only
1: 1 mismatch match only
2: 2 mismatch match only
4: find the best hits
-p INT Number of threads, [1]
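The -n and -r options describe per-read decisions. The sketch below shows how those two descriptions can be read. It is not soap's implementation, and it makes two assumptions. First, "more than INT Ns" is taken to mean a read is dropped when its count of 'N' bases exceeds INT. Second, a read with a single hit is not a repeat, so -r 0 still reports it. Hits are plain strings such as "chr1:100", which is an illustrative format only.

```python
import random
from typing import List, Optional, Sequence


def passes_n_filter(seq: str, max_n: int = 5) -> bool:
    """-n INT (assumed reading): keep a read only if it has at most max_n Ns."""
    return seq.upper().count("N") <= max_n


def report_repeat_hits(hits: Sequence[str], mode: int = 1,
                       rng: Optional[random.Random] = None) -> List[str]:
    """-r INT: 0 = report no repeat hits, 1 = one chosen at random, 2 = all.

    Assumption: a read with zero or one hit is not a repeat and is returned as-is.
    """
    if mode not in (0, 1, 2):
        raise ValueError("mode must be 0, 1 or 2")
    if len(hits) <= 1:
        return list(hits)
    if mode == 0:
        return []
    if mode == 1:
        # Use a caller-supplied RNG for reproducibility, else the module-level one.
        return [(rng or random).choice(list(hits))]
    return list(hits)


if __name__ == "__main__":
    print(passes_n_filter("ACGTNNNNNA"))   # 5 Ns, within the default limit
    print(passes_n_filter("ACGTNNNNNNA"))  # 6 Ns, filtered out
    print(report_repeat_hits(["chr1:100", "chr2:200"], mode=2))
```

Passing a seeded random.Random to report_repeat_hits makes the -r 1 choice repeatable in tests. soap's own random choice may not be reproducible in the same way.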
OUTPUT FORMAT
The SOAP2 output contains the following columns:
1. read name, or read ID (if -t is used)
2. read sequence (if the read aligns to the reverse strand, this is the
reverse complement of the original read)
3. quality sequence (if the input reads are in FASTA format, this column
is all 'h'; the quality string is reversed if the read maps to the
reverse strand)
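Downstream tools sometimes need the read as it was sequenced. This sketch undoes the reverse-strand transformation described for columns 2 and 3. It assumes that column 2 holds the reverse complement of the read and that column 3 is simply reversed. The strand is passed in by the caller because the strand column is not described on this page. Treating each output line as tab-separated is also an assumption. The function names are hypothetical.

```python
from typing import Tuple

# Complement table for nucleotide letters; N stays N.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def parse_first_columns(line: str) -> Tuple[str, str, str]:
    """Return (name/ID, sequence, quality) from one output line (assumed tab-separated)."""
    fields = line.rstrip("\n").split("\t")
    return fields[0], fields[1], fields[2]


def original_read(seq_col: str, qual_col: str,
                  reverse_strand: bool) -> Tuple[str, str]:
    """Undo the reverse-strand transformation of columns 2 and 3."""
    if not reverse_strand:
        return seq_col, qual_col
    # Reverse-complement the sequence; only reverse the quality string.
    return seq_col.translate(COMPLEMENT)[::-1], qual_col[::-1]


if __name__ == "__main__":
    name, seq, qual = parse_first_columns("read_1\tAACG\thhgf\n")
    print(name, *original_read(seq, qual, reverse_strand=True))
```

For FASTA input the all-'h' quality column is unchanged by reversal, so only the sequence needs attention in that case.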
4.
APPENDIX
Before using soap2 for alignment, you must generate the reference index
with 2bwt-builder:
2bwt-builder <reference.fasta>
NOTE: 1. The reference input must be in FASTA format. 2. The program
automatically writes the index files to the directory that contains the
FASTA file, so first make sure you have write permission there.
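The NOTE above can be checked before a long indexing run starts. This sketch assumes that 2bwt-builder names its output files with the prefix "<reference.fasta>.index", which is the prefix then passed to soap -D. That convention is suggested by the [*.index] hint under -D but is not spelled out on this page. The helper names are hypothetical.

```python
import os


def index_prefix(reference_fasta: str) -> str:
    """Assumed naming: index files share the prefix <reference.fasta>.index."""
    return reference_fasta + ".index"


def can_write_index(reference_fasta: str) -> bool:
    """True if the directory holding the FASTA file is writable by this user."""
    directory = os.path.dirname(os.path.abspath(reference_fasta))
    return os.access(directory, os.W_OK)


if __name__ == "__main__":
    ref = "hg18.fa"  # placeholder reference file name
    if not can_write_index(ref):
        print("No write permission next to", ref, "- copy it to a writable directory first.")
    else:
        print("Index prefix for soap -D:", index_prefix(ref))
```

os.access tests permissions for the current user. It does not check free disk space, which also matters given the hardware requirements listed below.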
ENVIRONMENT
The data structure is incompatible with 32-bit systems, so the program
cannot be ported to any 32-bit platform. Because MMX instructions are
used to optimize parts of the code, the current version runs only on
the x86_64 platform. We will provide a universal version for most
64-bit platforms later.
HARDWARE REQUIREMENT
1. 8 GB RAM (for a genome as large as the human genome)
2. At least 8 GB of hard disk space to store the index (for a genome as
large as the human genome)
SYSTEM REQUIREMENT
Linux x86_64
SEE ALSO
Website for SOAP <http://soap.genomics.org.cn>,
Google Group for SOAP <http://groups.google.com/group/bgi-soap>
Publication:
"SOAP: short oligonucleotide alignment program", Bioinformatics,
Vol. 24, No. 5, 2008, pages 713-714.
AUTHOR
BGI Shenzhen SOAP team. The core algorithm, Bidirect-BWT, was written
by Prof. T.W. Lam and his team at the University of Hong Kong.
REPORT BUGS
Report bugs to <soap@genomics.org.cn>
ACKNOWLEDGEMENTS
We thank Prof. T.W. Lam, Alan Tam, Simon Wong, Edward Wu and S.M. Yiu
for their prominent work on Bidirect-BWT.
SOAPaligner-2.1X 25 May 2009 SOAPaligner/soap2(1)