$Id: INSTALL.Chado,v 1.10 2008-11-04 20:36:20 scottcain Exp $

* IMPORTANT NOTE for the 1.31 release:

While this package can be used both to update older Chado schemas and to
install new ones, newly installed schemas will be of limited utility
due to issues with the Relationship Ontology/Relations Ontology.  When
those issues are sorted out, we will create a new release.

* COMMAND-LINE INSTALL

If you experience problems, please report them to the gmod-schema mailing list
at gmod-schema@lists.sourceforge.net.

This release works with the Generic Genome Browser (GBrowse) version 1.68
or later.  If you experience difficulties with GBrowse and Chado, you might
want to get an svn checkout of the gbrowse-stable branch.  The installation
instructions for GBrowse are included in that package.  Additionally, for
working with GBrowse, you will need the Bio::DB::Das::Chado modules, which
you can get from CPAN.

PREREQUISITES

- PostgreSQL

Currently GMOD developers are using 8.1 or better (PostgreSQL 9 has not
been tested).

Items to do with Postgres to make it ready to go:

    * Make it accept TCP/IP connections by adding this line to postgresql.conf
      (must be done as either user root or postgres; the database must be
      restarted for this change to take effect):

        tcpip_socket = true

      (This option is neither available nor needed in PostgreSQL 8.0 or
      later, where listen_addresses replaces it.)

    * Create a database user with permission to drop and add databases;
      the database user name should be the same as your Unix user name to
      allow the software build to progress smoothly (must be done as user
      postgres; createuser is a commandline program that comes with the
      PostgreSQL package):
        
        $ sudo su - postgres 
        $ createuser --createdb <your username> 
        $ exit                     # to exit out of the postgres user's shell

    * Tell postgres that it can use the plpgsql language (as user
      postgres; createlang is a commandline program that comes with
      the PostgreSQL package):

        $ sudo su - postgres 
        $ createlang plpgsql template1
        $ exit                     # to exit out of the postgres user's shell

    * Edit the pg_hba.conf (either as the user 'root' or 'postgres') to give
      the user created above permission to access the database.  Read
      the comments in pg_hba.conf regarding permissions.  An example
      pg_hba.conf looks like this (note that these permissions are very loose):

      # TYPE  DATABASE    USER        CIDR-ADDRESS          METHOD
                                                                                
      local   all         all                               trust
      # IPv4 local connections:
      host    all         all         127.0.0.1/32          trust
      # IPv6 local connections:
      host    all         all         ::1/128               trust

      NOTE: If you are setting up a production Chado instance, now is a
      good time to decide how you want to define users and do client
      authentication for your database.  PostgreSQL supports several methods
      for defining users, including using operating system users, LDAP,
      Kerberos and many others.  See the PostgreSQL manual for more:
      http://www.postgresql.org/docs/8.4/interactive/client-authentication.html
      http://www.postgresql.org/docs/8.4/interactive/user-manag.html

    * For Pg 8.1+, if you want to allow remote connections, the
      listen_addresses option may need to be modified; it does
      allow a wildcard '*', which corresponds to all available
      IP interfaces (it does *not* specify the IP addresses that
      are allowed to connect).  Set this in the postgresql.conf file, which
      is in the same directory as pg_hba.conf.
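
To confirm that the server accepts TCP/IP connections after these changes,
a check along the following lines may help.  This is only a minimal sketch:
the function name check_pg_tcp is hypothetical, and it assumes that the psql
client is installed and that the template1 database is reachable by your
database user.

```shell
# Hypothetical helper: run a trivial query over TCP/IP.
# Giving psql a host name with -h makes it connect over TCP
# rather than through the local Unix socket.
check_pg_tcp() {
    host="${1:-localhost}"   # host to test
    port="${2:-5432}"        # 5432 is the usual PostgreSQL port
    psql -h "$host" -p "$port" -d template1 -c 'SELECT 1;'
}
```

Calling check_pg_tcp with no arguments tests localhost on port 5432; a
connection error suggests rechecking pg_hba.conf and listen_addresses.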

For information on tuning postgres for performance, see

    http://gmod.org/wiki/PostgreSQL_Performance_Tips

and

    http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html

The two most critical parameters to tune are shared_buffers and
effective_cache_size.  Adjusting these parameters may require
modification of memory settings in /etc/sysctl.conf; see the sysctl
manpage for details.  Also critical for continued performance of
postgres is the regular execution of the VACUUM FULL ANALYZE command.
This command clears out old, deleted data and analyzes the structure
of the database so that the execution planner can predict the
fastest way to execute a given query.
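
One way to run that command from the shell is the vacuumdb client program
that ships with PostgreSQL.  The wrapper below is a sketch, not part of
Chado: the function name vacuum_chado is hypothetical, and it assumes the
CHADO_DB_NAME variable holds your database name when no argument is given.
Note that VACUUM FULL locks each table exclusively while it works, so it is
probably best scheduled for a quiet period.

```shell
# Hypothetical helper: VACUUM FULL ANALYZE on one database.
# --full and --analyze are vacuumdb's long forms of -f and -z.
vacuum_chado() {
    vacuumdb --full --analyze "${1:-$CHADO_DB_NAME}"
}
```

Dropping --full gives a lighter VACUUM ANALYZE that does not take the
exclusive lock.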

While the above links describe tuning in general, the examples given
for tuning kernel parameters are Linux specific.  For setting 
shmmax on Mac OS X boxes, edit 

   /System/Library/StartupItems/SystemTuning/SystemTuning (for OS X 10.2)
        
or
 
   /etc/rc (for OS X 10.3)

or

  /etc/sysctl.conf (for OS X 10.5)

to increase the values of shmmax and shmall, like this:

          sysctl -w kern.sysv.shmmax=52428800 # bytes: 50 megs
          sysctl -w kern.sysv.shmmin=1
          sysctl -w kern.sysv.shmmni=32
          sysctl -w kern.sysv.shmseg=8
          sysctl -w kern.sysv.shmall=25600 # 4k pages: 100 megs

and then reboot (these are the values used on a Mac that has 1.2 GB of RAM).
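
The Mac OS X figures can be recomputed for other memory sizes with shell
arithmetic.  This is an illustrative sketch: the helper names are
hypothetical, and the 4096-byte page size is an assumption taken from the
"4k pages" comment in the example.

```shell
# Assumed page size in bytes; `getconf PAGESIZE` reports the real value.
PAGE_SIZE=4096

# Megabytes to bytes (the unit of shmmax).
mb_to_bytes() { echo $(( $1 * 1024 * 1024 )); }

# Megabytes to pages (the unit of shmall in the example).
mb_to_pages() { echo $(( $1 * 1024 * 1024 / PAGE_SIZE )); }

echo "shmmax for 50 MB:  $(mb_to_bytes 50)"    # 52428800
echo "shmall for 100 MB: $(mb_to_pages 100)"   # 25600
```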

For a Linux box with 512M RAM, use these values in /etc/sysctl.conf:

         kernel.shmall = 134217728
         kernel.shmmax = 134217728

and make these changes to the postgresql.conf file:

         tcpip_socket = true  # Replaced with listen_addresses in Postgres 8.0+
         work_mem=2048        # This is "sort_mem" if using Postgres 7.x
         max_connections = 32

- BioPerl
		
bioperl-live, or release 1.6.1 or later.  See http://bioperl.org.


- go-perl
      
Can be obtained from CPAN using the cpan shell with the command

  cpan> install GO::Parser


- ant
      
When installing from svn, ant is needed to move GMODTools files
from schema/GMODTools into schema/chado.  When installing from
a distribution, this is not necessary, as the files will have
already been moved as part of the build process.


- Perl modules
      
The perl modules can be installed via the CPAN shell by issuing
the command 'install Bundle::GMOD', which will install all of the
modules below except for SQL::Translator, which is optional.

    * CGI                      (GBrowse)
    * GD                       (GBrowse)
    * DBI                      (GBrowse, Chado)
    * DBD::Pg                  (GBrowse, Chado)
    * SQL::Translator          (Chado) (only for a custom Chado schema)
    * Digest::MD5              (GBrowse)
    * Text::Shellwords         (GBrowse)
    * Graph                    (Bio-Chaos)
    * Data::Stag               (Chado)
    * XML::Parser::PerlSAX     (Chado)
    * Module::Build            (Chado)
    * Class::DBI               (GMODWeb, or with a custom Chado schema)
    * Class::DBI::Pg           (GMODWeb, or with a custom Chado schema)
    * Class::DBI::Pager        (GMODWeb, or with a custom Chado schema)
    * DBIx::DBStag             (Chado)
    * XML::Simple              (Chado)
    * LWP                      (Chado)
    * Template                 (Chado)
    * Bio::Chado::Schema       (Chado)
  

INSTALL THE CHADO SCHEMA

- Set environment variables

First, you must set some variables in your environment.
If you are using bash or a bash-like shell, this is done via a command
like this:

       $ export VARNAME=value

If you are using tcsh or another csh-like shell, it is done like this:

       $ setenv VARNAME value

To make life easier, you will probably also want to put those
commands in your .tcshrc or .bashrc file so that the environment variables
are always available when you log in.

   * GMOD_ROOT: The location of your Chado installation 
     (e.g., "/usr/local/gmod").  Will contain the source files that define 
     the schema, as well as configuration settings and temp space.

   * CHADO_DB_NAME: The name of your Chado database

   * CHADO_DB_USERNAME: The username to connect to Chado

   * CHADO_DB_PASSWORD: The password for the database user [opt]

   * CHADO_DB_HOST: The host on which the database runs (e.g. "localhost") [opt]

   * CHADO_DB_PORT: The port on which the database is listening [opt]

As indicated, the host, port, and password are optional.
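
For a bash-like shell, the settings might look like the sketch below.  All
values are examples only, and the check_chado_env function is a hypothetical
helper, not something the Chado installer provides; it only reports required
variables that are still empty.

```shell
# Example values only; substitute your own path and database name.
export GMOD_ROOT=/usr/local/gmod
export CHADO_DB_NAME=dev_chado_01
export CHADO_DB_USERNAME="$(id -un)"   # same as your Unix user name
export CHADO_DB_HOST=localhost         # optional
export CHADO_DB_PORT=5432              # optional

# Hypothetical helper: fail if a required variable is unset or empty.
check_chado_env() {
    missing=0
    for name in GMOD_ROOT CHADO_DB_NAME CHADO_DB_USERNAME; do
        eval "value=\${$name:-}"       # look up the variable by name
        if [ -z "$value" ]; then
            echo "missing: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

check_chado_env && echo "Chado environment looks complete"
```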


*   Note: a mechanism exists to pass these variables directly to the
    installer during the "perl Makefile.PL" step.  By giving key=value pairs,
    it is possible to avoid setting environment variables.  The syntax is:

       $ perl Makefile.PL GMOD_ROOT=/usr/local/gmod CHADO_DB_NAME=dev_chado_01

Backward compatibility may not be maintained for this method of configuring
the install process.


- Create the Makefile and other configuration files

From the chado directory (the same directory INSTALL.Chado is in) run the 
following command:

  $ perl Makefile.PL   

You will be prompted for several configuration values
used by Chado and its associated tools:

        *   Use the simple install (uses default database schema) [Y]

        Answering yes eliminates the need to have SQL::Translator installed.
        This is recommended, and it is all that is necessary in order to use
        the full schema and run GBrowse and GMODWeb on top of it.

        *   Use values in '/home/scott/gmod/build.conf'? [Y]

        If `perl Makefile.PL` has been run before, answering yes to this
        will cause Makefile.PL to use the configuration options from the
        previous build.

        *   What database server will you be using? [PostgreSQL] 

        Specify what database vendor to use.  Currently only PostgreSQL works.

        *   What is the Chado database name? [dev_chado_allenday_05] 

        This will be the name of the created chado database.

        *   What is the database username? [allenday] 

        Default user that the installed libraries should try to
        connect to the database as.

        *   What is the password for 'allenday'?  

        Password for the default user.

        *   What is the database host? [localhost] 

        Host of the database daemon.

        *   What is your database port? [5432] 

        Port of the database daemon.

        *   Where shall downloaded ontologies go? [./tmp]

        The directory where ontology files and their lock files will be stored.

        *   What is the default organism (common name, or "none")?

        The organism name should be one that will be in the organism table.
        When the database is created, several organisms will be there
        by default; these include: human, fruitfly, mouse, mosquito,
        rat, Arabidopsis thaliana, worm, zebrafish, rice, and yeast.  (The
        insert statements that create these default organisms are 
        contained in load/etc/initialize.sql).

        *   Do you want to make this the default chado instance? [y]

        You can have more than one Chado instance on a server, each with a
        different name.  You can supply one of those names when loading
        GFF, for example "--dbprofile fly_staging".  If you don't supply the
        --dbprofile option, it will just use the default database parameters.


        If you answered 'No' to the simple install question, AutoDBI.pm
        will now be created by SQL::Translator; see the CUSTOM DATABASE
        SCHEMAS section below for more information.

- Make the schema

    $ make

- Install the scripts and modules

   $ make install

or

   $ sudo make install

This probably needs to be run as root.  It installs data loading scripts
in perl's path (typically /usr/local/bin or /usr/bin) and perl modules,
places various files in $GMOD_ROOT, and creates the
infrastructure for logging errors by creating $GMOD_ROOT/logs and
the file /etc/log4perl.conf if it does not already exist.

- Install the schema

   $ make load_schema

Creates database, installs schema.

This wipes out any database with the same name in the process!

- Insert baseline data

   $ make prepdb

Inserts a few useful items into fundamental Chado tables, using the
file load/etc/initialize.sql.  That file contains information for several
common organisms and source databases (e.g. Genbank). This file
can be edited to add any organism or source database, using the
INSERT statements for the examples as a template.  Note also that
the prepdb target needs to be executed before the ontologies target,
but it can be executed again later, if more insert statements are
added (for instance to add a new organism or database).
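
As an illustration of adding an organism, the sketch below prints an INSERT
statement.  It assumes the organism table has the columns abbreviation,
genus, species and common_name; compare it against the existing statements
in load/etc/initialize.sql before relying on it.  The function name and the
honeybee values are hypothetical examples.

```shell
# Hypothetical helper: print an INSERT for one organism.
# Arguments: abbreviation, genus, species, common name.
# Values are not escaped, so they must not contain single quotes.
organism_insert_sql() {
    printf "INSERT INTO organism (abbreviation, genus, species, common_name) VALUES ('%s', '%s', '%s', '%s');\n" \
        "$1" "$2" "$3" "$4"
}

organism_insert_sql 'A.mellifera' 'Apis' 'mellifera' 'honeybee'
```

The printed statement could be appended to load/etc/initialize.sql before
running the prepdb target.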

- Load ontologies

   $ make ontologies

Gets and installs various ontologies.  Requires a network 
connection.  Absolutely required are the Relationship Ontology and
the Sequence Ontology (SO).  All others are optional, though the Feature
Property controlled vocabulary will typically be useful for loading GFF
files, and the Gene Ontology is generally useful for a wide variety of
gene feature annotations.  Note that retrieved ontology files are stored in
the directory specified when 'perl Makefile.PL' was run (the default
is ./tmp).  In order to do a repeat installation, the directory
containing the downloaded ontology must be removed.  In addition
to 'rm -rf ./tmp', you can also issue the `make clean` command,
which will clear out all of the files and directories created
up until this point in the installation.   Also note that loading
a large ontology like the Gene Ontology will take several minutes
(perhaps as long as an hour).

Note that since `make ontologies` downloads ontology files from their
online repositories, this step is prone to failure due to network
problems.  
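
The make targets described so far run in a fixed order, which can be
captured in a small wrapper.  This is a sketch under several assumptions:
the function name run_chado_steps and the DRY_RUN variable are hypothetical,
"perl Makefile.PL" has already been run, and sudo is how you become root.
By default it only prints the commands.

```shell
# Hypothetical wrapper: run the build targets in order, stopping at the
# first failure.  With DRY_RUN unset or 1 the commands are only printed.
# WARNING: load_schema wipes out any database with the configured name.
run_chado_steps() {
    while IFS= read -r step; do
        echo "+ $step"
        if [ "${DRY_RUN:-1}" != 1 ]; then
            # </dev/null keeps the command from consuming the step list.
            sh -c "$step" </dev/null || { echo "failed: $step" >&2; return 1; }
        fi
    done <<'EOF'
make
sudo make install
make load_schema
make prepdb
make ontologies
EOF
}

run_chado_steps
```

Setting DRY_RUN=0 before calling the function executes the steps for real.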

If you already have the desired ontology files locally, you
can execute a command for each file to load it.  Note again that
the Relationship Ontology is required before all others, and
the Sequence Ontology (SO) is absolutely required for proper
functioning of the database.  The commands to load an ontology are:

      $  go2fmt -p obo_text -w xml /path/to/obofile | \
            go-apply-xslt oboxml_to_chadoxml - > obo_text.xml

This creates a chadoxml file from the obo file.  Then execute:

     $ stag-storenode \
     -d "dbi:Pg:dbname=$CHADO_DB_NAME;host=$CHADO_DB_HOST;port=$CHADO_DB_PORT" \
     --user $CHADO_DB_USERNAME --password $CHADO_DB_PASSWORD obo_text.xml

If you have other ontology format files, the commands are similar;
consult the documentation for go2fmt and go-apply-xslt for your
file format.

It is a good idea at this point to make a backup of the database,
particularly if you loaded a large ontology like GO.  To make a complete
dump of the database, issue this command:

        $ pg_dump db_name  >  db_dump.sql

and to restore the database, issue this command:

        $ psql db_name <  db_dump.sql
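
The two commands can be wrapped so that each dump gets a date-stamped file
name.  This is a minimal sketch: the function names are hypothetical, and
the restore helper assumes the target database does not exist yet, since a
plain pg_dump file does not create the database itself.

```shell
# Hypothetical helper: dump a database to <name>-<YYYY-MM-DD>.sql in the
# current directory and print the file name.
backup_chado() {
    db="${1:-$CHADO_DB_NAME}"
    out="${db}-$(date +%Y-%m-%d).sql"
    pg_dump "$db" > "$out" && echo "$out"
}

# Hypothetical helper: create an empty database, then load a dump into it.
# Arguments: database name, dump file.
restore_chado() {
    createdb "$1" && psql "$1" < "$2"
}
```

For example, "backup_chado dev_chado_01" writes the dump and prints the
name of the file it created.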


LOADING DATA

With that, the installation of the schema is complete. Please see the HOWTOs
at http://gmod.org/ for information on loading the Chado schema with data.


CUSTOM DATABASE SCHEMAS

If you answered 'No' to the question about doing a simple install
during `perl Makefile.PL`, you must provide the files default_schema.sql
and default_nofuncs.sql.  The best way to create these files is using
bin/chado-build-schema.pl, a perl script with a graphical user interface
for interactively building a Chado schema.  If you are providing table
definitions of your own, you will also have to edit the file 
chado-module-metadata.xml to define how your tables relate to other
tables in the Chado schema.  While there is no documentation of the DTD
of this file, it is relatively straightforward.  See INSTALL.Custom
for more information on how chado-build-schema.pl relates to the build 
process.  Once your default_schema.sql and default_nofuncs.sql files 
are in place in the modules directory you can run `perl Makefile.PL`.