# Ray assembler

Ray is a parallel de novo genome assembler that uses the message-passing interface (MPI) throughout
and is implemented using peer-to-peer communication.

Ray is free software distributed under the terms of the GNU General Public License,
version 3 (GPLv3).

Ray is implemented using RayPlatform, a message-passing-interface programming framework.

Ray is documented in

- Documentation/  (many files)
- MANUAL_PAGE.txt  (command-line options, same as Ray -help)
- README.md  (general)
- INSTALL.txt  (quick installation)

## Solutions (all bundled in a single product called Ray)

Standard:

- de novo genome assembly (works by default)
  => http://online.liebertpub.com/doi/abs/10.1089/cmb.2009.0238
  => Documentation/README-heuristics
- quantification of contig abundances (works by default)

Metagenomics:

- Ray Meta: de novo metagenome assembly (works by default)
  => http://genomebiology.com/2012/13/12/R122
- Ray Communities: quantification of microbiome consortia members (with -search)
  => Documentation/BiologicalAbundances.txt
- Ray Communities: taxonomy profiling of samples (with -search and -with-taxonomy)
  => Documentation/Taxonomy.txt
- Ray Ontology: gene ontology profiling of samples (with -search and -gene-ontology)
  => Documentation/GeneOntology.txt
- Ray Surveyor: compare genomic content between samples (with -run-surveyor)
  => Documentation/Ray-Surveyor.md

Transcriptomics:

- de novo transcriptome assembly (works, but not extensively tested)
- quantification of transcript expression



# Distributors

- Geeknet, Inc. http://sourceforge.net/projects/denovoassembler/
- GitHub, Inc. https://github.com/sebhtml/ray
- Debian (Software in the Public Interest, Inc.) http://packages.debian.org/sid/ray
- Ubuntu (Canonical Ltd.) https://launchpad.net/ubuntu/+source/ray
- Arch Linux (AUR Development Team) https://aur.archlinux.org/packages/ray/
- DNAnexus, Inc. https://platform.dnanexus.com/app/ray
- CloudBioLinux https://github.com/chapmanb/cloudbiolinux/blob/master/cloudbio/custom/bio_nextgen.py

In progress:

- Fedora (Red Hat, Inc.) https://bugzilla.redhat.com/show_bug.cgi?id=872783
- Galaxy (Galaxy Team) http://user.list.galaxyproject.org/How-do-I-add-Ray-to-Galaxy-Central-in-the-tool-shed-td4655623.html#none
- BaseSpace (Illumina, Inc.)

# Website

- http://denovoassembler.sf.net

# Code repositories

- http://github.com/sebhtml/ray  (Ray plugins for genomics)

- http://github.com/sebhtml/RayPlatform  (the engine RayPlatform)

If you want to contribute, clone the repository, make changes
and I (Sébastien Boisvert) will pull from you after reviewing
the code changes.

# Other related repositories

- http://github.com/sebhtml/Ray-TestSuite  (system tests & unit tests)

- http://github.com/sebhtml/Ray.web  (Ray SourceForge web site)

- http://github.com/sebhtml/ray-logo  (artworks)

# Mailing lists

- Users: denovoassembler-users AT lists.sourceforge.net

- Read it on gmane: http://blog.gmane.org/gmane.science.biology.ray-genome-assembler

- Development/hacking: denovoassembler-devel AT lists.sourceforge.net

- SEQanswers: http://seqanswers.com/forums/showthread.php?t=4301

# Installation

You need a C++ compiler (supporting C++98), make, and an implementation of MPI (supporting MPI 2.2).
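
Before compiling, it can help to confirm that the required tools are on the PATH. The following is a minimal sketch, not part of Ray: the helper name missing_tools is hypothetical, and the tool names make, mpicxx and mpiexec are assumptions (your MPI installation may name its C++ wrapper differently).

```shell
# Print every command from the argument list that is not found on PATH,
# one per line. Returns non-zero if at least one is missing.
# (missing_tools is a hypothetical helper, not something shipped with Ray.)
missing_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            rc=1
        fi
    done
    return "$rc"
}

# Assumed tool names: adjust them to match your compiler and MPI installation.
if missing=$(missing_tools make mpicxx mpiexec); then
    echo "all build prerequisites found"
else
    echo "missing: $missing"
fi
```

This only checks that the commands exist; it does not verify the C++ or MPI standard versions they support.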

## Compilation

	tar xjf Ray-x.y.z.tar.bz2
	cd Ray-x.y.z
	make PREFIX=build
	make install
	ls build

## Compilation using CMake

	tar xjf Ray-x.y.z.tar.bz2
	cd Ray-x.y.z
	mkdir build
	cd build
	cmake ..
	make

## Change the compiler

	make PREFIX=build2000 MPICXX=/software/openmpi-1.4.3/bin/mpicxx
	make install

Tested C++ compilers: see Documentation/COMPILERS.txt

## Parallel I/O

To compile with MPI I/O, use this:

	make MPI_IO=y

## Faster execution

Some processors have the popcnt instruction and other cool instructions.
With gcc, add -march=native to build Ray for the processor used for
the compilation.

	make PREFIX=Build.native DEBUG=n ASSERT=n EXTRA=" -march=native"
	make install
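
To see whether the build machine reports popcnt before choosing -march=native, something like the sketch below may be useful. It assumes an x86 Linux host, where /proc/cpuinfo lists CPU features on lines starting with "flags"; the helper name cpu_has_flag is hypothetical and not part of Ray.

```shell
# Return success if a cpuinfo-style file lists the given CPU flag.
# The file defaults to /proc/cpuinfo (assumption: x86 Linux host).
# (cpu_has_flag is a hypothetical helper, not something shipped with Ray.)
cpu_has_flag() {
    flag=$1
    cpuinfo=${2:-/proc/cpuinfo}
    [ -r "$cpuinfo" ] &&
        grep -E '^flags[[:space:]]*:' "$cpuinfo" | grep -w -- "$flag" >/dev/null
}

if cpu_has_flag popcnt; then
    echo "popcnt is reported by this processor"
else
    echo "popcnt not reported (or this is not an x86 Linux host)"
fi
```

Keep in mind that a binary built with -march=native targets the processor used for the compilation, so it may not run on compute nodes with older processors.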

Another way to build Ray is to use whole-program optimization.
With gcc, use this script:

	./scripts/Build-Link-Time-Optimization.sh

## Use large k-mers

	make PREFIX=Ray-Large-k-mers MAXKMERLENGTH=64
	# wait
	make install
	mpirun -np 512 Ray-Large-k-mers/Ray -k 63 -p lib1_1.fastq lib1_2.fastq \
	-p lib2_1.fastq lib2_2.fastq -o DeadlyBug,Assembler=Ray,K=63
	# wait
	ls DeadlyBug,Assembler=Ray,K=63/Scaffolds.fasta

## Compilation options

	make PREFIX=build-3000 MAXKMERLENGTH=64 HAVE_LIBZ=y HAVE_LIBBZ2=y \
	ASSERT=n FORCE_PACKING=y
	# wait
	make install
	ls build-3000

See the Makefile for more options.


# Run Ray

To run Ray on paired reads:

	mpiexec -n 25 Ray -k31 -p lib1.left.fasta lib1.right.fasta -p lib2.left.fasta lib2.right.fasta -o RayOutput
	ls RayOutput/Contigs.fasta
	ls RayOutput/Scaffolds.fasta
	ls RayOutput/
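
With many paired libraries the command line gets long. The sketch below builds it from a list of library prefixes; it assumes files are named prefix.left.fasta and prefix.right.fasta as in the example above, and that no file name contains spaces. The helper name ray_command is hypothetical, and the command is only printed, not executed.

```shell
# Print a Ray command line for paired libraries.
# Usage: ray_command RANKS KMER OUTPUT_DIR PREFIX...
# (ray_command is a hypothetical helper; it uses only the -k, -p and -o
# options shown in this README.)
ray_command() {
    ranks=$1
    kmer=$2
    output=$3
    shift 3
    cmd="mpiexec -n $ranks Ray -k $kmer"
    for prefix in "$@"; do
        # One -p option per library: left reads, then right reads.
        cmd="$cmd -p $prefix.left.fasta $prefix.right.fasta"
    done
    echo "$cmd -o $output"
}

ray_command 25 31 RayOutput lib1 lib2
# prints: mpiexec -n 25 Ray -k 31 -p lib1.left.fasta lib1.right.fasta -p lib2.left.fasta lib2.right.fasta -o RayOutput
```

Printing the command first makes it easy to review or paste into a job script before submitting it.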

# Using a configuration file

Ray can be run with a configuration file instead.

	mpiexec -n 16 Ray Ray.conf

Content of Ray.conf:

 
	-k 31  # this is a comment
	-p
	    lib1.left.fasta
	    lib1.right.fasta

	-p
	    lib2.left.fasta
	    lib2.right.fasta

	-o RayOutput
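
A configuration file like the one above can also be generated from a script. This is a minimal sketch: the scratch directory from mktemp is an assumption made so the example does not overwrite anything, and the mpiexec command is printed rather than executed so that the sketch needs neither MPI nor the input reads.

```shell
# Write a Ray.conf with the same options as the example above
# into a scratch directory (assumption: any writable location works).
workdir=$(mktemp -d)
cat > "$workdir/Ray.conf" <<'EOF'
-k 31
-p
    lib1.left.fasta
    lib1.right.fasta
-p
    lib2.left.fasta
    lib2.right.fasta
-o RayOutput
EOF

# Show the command that would consume the file.
echo "mpiexec -n 16 Ray $workdir/Ray.conf"
```

Keeping the options in a file makes a run easier to reproduce than retyping a long command line.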

# Output files

The main output files are RayOutput/Contigs.fasta and RayOutput/Scaffolds.fasta.

Type Ray -help for a full list of options and outputs.
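
A quick sanity check after a run is to count the sequences in the FASTA outputs. The sketch below relies only on the FASTA convention that each record starts with a '>' header line; the helper name count_fasta_records is hypothetical and not part of Ray.

```shell
# Count the records in a FASTA file: each record begins with a '>' header line.
# (count_fasta_records is a hypothetical helper, not something shipped with Ray.)
count_fasta_records() {
    grep -c '^>' "$1"
}

# Report on the two main outputs if they exist in the current directory.
for f in RayOutput/Contigs.fasta RayOutput/Scaffolds.fasta; do
    if [ -f "$f" ]; then
        echo "$f: $(count_fasta_records "$f") sequences"
    fi
done
```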


# Color space

Ray assembles color-space reads and generates color-space contigs.
Files must have the .csfasta extension. Nucleotide reads cannot be mixed
with color-space reads. This is an experimental feature.

# Publications

http://denovoassembler.sf.net/publications.html

# Code

## Code documentation

	cd code
	doxygen DoxygenConfigurationFile
	cd DoxygenDocumentation/html
	firefox index.html

# Useful links

## Cloud computing

- http://aws.amazon.com/ec2/hpc-applications/
- https://cloud.genomics.cn/
- http://szdaily.sznews.com/html/2011-08/04/content_1689998.htm
- http://www.nature.com/nbt/journal/v28/n7/full/nbt0710-691.html

## Message-passing interface

- http://dskernel.blogspot.com/2011/07/understand-main-loop-of-message-passing.html
- http://cw.squyres.com/
- http://blogs.cisco.com/performance/
- http://www.parawiki.org/index.php/Message_Passing_Interface#Peer_to_Peer_Communication

# Funding

Doctoral Award to S.B., Canadian Institutes of Health Research (CIHR)

# Authors

See AUTHORS

# Compile Ray on Microsoft Windows with Microsoft Visual Studio

See Documentation/VISUAL_STUDIO.txt