File: TODO

package info (click to toggle)
libchado-perl 1.31-6
  • links: PTS, VCS
  • area: main
  • in suites: bookworm, bullseye, sid
  • size: 44,716 kB
  • sloc: sql: 282,721; xml: 192,553; perl: 25,524; sh: 102; python: 73; makefile: 57
file content (135 lines) | stat: -rw-r--r-- 6,137 bytes parent folder | download | duplicates (5)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
In preparation for the 0.1 release and a talk/poster at the Biology
of Genomes meeting, here is a new TODO list:

BioMart bridge layer (functions and views)
Complete most or all items in chado TODO list (needs prioritized, see below)
  -put todo items in sf system to allow comments
Build demo system (based on current, 0.1 schema) on new gmod server, with
  (organism?  yeast, rice, fly, other?)
  -chado
  -gbrowse
  -apollo functions? (ie, do we want to allow live, guest editting?)
  -turnkey
  -biomart
  -drupal (separate from gmod.org, for community stuff)?
  -cmap
  -BLAST/blastGraphic?
  -AmiGO?
  -DAS2 server?


For the 0.1 release:
- make the gff3 loaders validate that the uniquename is really unique
  before trying to insert. (HI-Done, should change to get a complete list
  and cache it to increase speed (although that would be LOW))
- add a function to the apollo triggers to simplify the merge and split
  operations (LOW-Apollo development currently unfunded)
- add the ability to pass a dbprofile name to scripts (HI - should be easy)
- add bioperl patching functionality similar to GBrowse (though make sure
  they don't stomp on each other (HI)
- address flybase's use of of analysisfeature combined with feature to
  give source-type information (in GFF terms).  This will need to 
  be addressed in the GBrowse adaptor. (MED - not sure what is involved)
- modify the bulk loader to allow 'mixed' GFF3 files (that is, containing 
  both analysis results and annotations). See perldoc gmod_bulk_load_gff3.pl
  for more info (HI - this is a major pain for non-me users)
- modify the bulk loader to optionally spit INSERT statements in the the
  files rather than the COPY FROM STDIN format as a fall back for really
  big files (HI)
- modify the bulk loader to (optionally) make gene (foster) parents for 
  orphan transcript types. (MED)
- deprecate the use of gmod_load_gff3.pl in favor of the bulk loader
- migrate ontology loading to use go-perl (instead of Class::DBI--which
  may completely eliminate the need for having Class::DBI installed 
  (except for turnkey)) (Done)
- Flesh out Apollo stuff to be more transparent to the person doing the
  installation (MED)
- Make a decision on sequence shredding and implement appropriate views 
  and functions (LOW - what we have works)
- fix perl modules to conform with Todd's Bio::GMOD namespace (MED - shouldn't
  be too hard)
- fix Bio::FeatureIO to return at least a simple value for Gap attributes (HI -
  should be easy)
- update docs for changes in installation procedures (HI)

The things below here are mostly done and just left here for historical interest
--------------------------------------------------------------------
MORE TODO
  - make cdbi optional for the make process

TODO for the next alpha release (SOON!)

* Have things install in GMOD_ROOT:
  DONE   - in src/  
            chado/modules
   - in doc/
            chado/READMEs, INSTALL, doc/
  DONE - in conf/
            create gmod.conf, chado.conf
   - in examples/
            chado/data  sample data
   - in bin/
            gmod_*.pl (including what is in load/bin)
  in docs, suggest that distribution directory be moved to /usr/local/gmod/src.

  Add docs for how to load genbank, blast and pfam results

Tue Jan  6 01:34:51 PST 2004 - allenday

Add documentation to README and/or INSTALL regarding Postgres VACUUM and ANALYZE
functions.  We might want to give users an example of how to set up a cron job
to do this nightly.  I had a 3 order of magnitude speed increase today from a 
10 second ANALYZE command.  See 

   http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html

which has useful information for tuning postgresql.conf as well as a suggestion
for setting up the auto vacuum daemon.

We might also want to build an analyze() method into the loaders, or possibly
the middleware layer itself.  This will dramatically speed up large loads, but
requires changing the cache/flush strategy to the more complex
begin-transaction/cache/flush/end-transaction/analyze strategy.
--


where to start?

For the next release:
- fix Makefile.PL to install scripts in $GMOD_ROOT/bin (i'm not convinced I want this; I think /usr/local/bin is fine sc-3/26/04)
- add documentation for creating Class::DBI classes for an existing schema--
this will no doubt be necessary for future upgrade paths (I'm thinking the way
to suggest doing this is dumping the schema from the existing database, via
`pg_dump -s DBNAME > sqlfile`; `pg2cdbi.pl DBNAME USERNAME PASSWORD sqlfile`
- add documentation for loading ontologies when the files are already present
locally.
- remove hard coded passwords and other conf stuff from Class::DBI classes.
- integrate Apollo
- integrate a web front end (probably turnkey)
- provide a custom tag processing module for load_gff3.pl
- create a method for updating ontologies
- Makefile.PL should also track optional modules included and allow
  skipping the prompting for them
- create a method for populating cvtermpath (make_cvtermpath.pl is a starting
point).
- add support for relative coordinates in gff3 and the gbrowse adaptor
- ontology name in cv table needs to be the same as the name of the root node in the dag-edit ontology file,
  (requested by mungall).
- gene ontology (GO) needs to have its three aspects split into separate ontologies, so the bullet above would
  use one term below the root (eg, "biological process") as the cv.name and root node in cvterm/cvtermpath/cvtermrelationship.
  (requested by mungall).

For following releases:
- incorporate Pub*
- incorporate cmap
- write a bulk load utility that will quickly load GFF3 data to chado
- write a utility that will delete data from the database with a GFF
file as a template
--
Load order needs to change.  I have started adding "bridge" relationships between
ontologies to the prepdb step, and this has to happen after the ontologies have
already been entered so the INSERT statement can find the terms.

Documentation needs to reflect this, of course.

We should start using the bug tracker on SF  --start using 'Bugs' in http://sourceforge.net/tracker/?group_id=27707