Description of Schema

Tables

analysisfeature

analysisfeatureprop

analysis

analysisprop

Comments:

$Id: companalysis.sql,v 1.37 2007-03-23 15:18:02 scottcain Exp $
==========================================
Chado companalysis module
=================================================================
Dependencies:
:import feature from sequence
:import cvterm from cv
=================================================================
================================================
TABLE: analysis
================================================
An analysis is a particular computational analysis: it may be a blast of one sequence against another, an all-by-all blast, or a different kind of analysis altogether. It is a single unit of computation.

Field Name	Data Type	Size	Default Value	Other	Foreign Key
analysis_id	integer	11		PRIMARY KEY, NOT NULL
name	varchar	255		A way of grouping analyses. This should be a handy short identifier that helps people find the analysis they want, for instance "tRNAscan", "cDNA", "FlyPep", or "SwissProt". It should not be assumed to be unique; for instance, there may be many separate analyses done against a cDNA database.
description	text	64000
program	varchar	255		UNIQUE, NOT NULL, Program name, e.g. blastx, blastp, sim4, genscan.
programversion	varchar	255		UNIQUE, NOT NULL, Version description, e.g. TBLASTX 2.0MP-WashU [09-Nov-2000].
algorithm	varchar	255		Algorithm name, e.g. blast.
sourcename	varchar	255		UNIQUE, Source name, e.g. cDNA, SwissProt.
sourceversion	varchar	255
sourceuri	text	64000		This is an optional, permanent URL or URI for the source of the analysis. The idea is that someone could recreate the analysis directly by going to this URI and fetching the source data (e.g. the blast database, or the training model).
timeexecuted	timestamp	0	current_timestamp	NOT NULL

Constraints

Type	Fields
NOT NULL	analysis_id
NOT NULL	program
NOT NULL	programversion
NOT NULL	timeexecuted
UNIQUE	program, programversion, sourcename
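
The fields and constraints above can be sketched as DDL. This is a minimal sketch, assuming SQLite (through Python's sqlite3 module) as a stand-in for the database the module is normally deployed on; the column types are simplified, and the helper insert_analysis and the sample values are hypothetical.

```python
import sqlite3

# Simplified SQLite rendering of the analysis table (an assumption made
# for illustration; sizes and types are approximated).
ANALYSIS_DDL = """
CREATE TABLE analysis (
    analysis_id    INTEGER PRIMARY KEY,
    name           VARCHAR(255),
    description    TEXT,
    program        VARCHAR(255) NOT NULL,
    programversion VARCHAR(255) NOT NULL,
    algorithm      VARCHAR(255),
    sourcename     VARCHAR(255),
    sourceversion  VARCHAR(255),
    sourceuri      TEXT,
    timeexecuted   TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    UNIQUE (program, programversion, sourcename)
)
"""


def insert_analysis(conn, program, programversion, sourcename, name=None):
    """Insert one analysis row and return its analysis_id (hypothetical helper)."""
    cur = conn.execute(
        "INSERT INTO analysis (name, program, programversion, sourcename) "
        "VALUES (?, ?, ?, ?)",
        (name, program, programversion, sourcename),
    )
    return cur.lastrowid


conn = sqlite3.connect(":memory:")
conn.execute(ANALYSIS_DDL)

first_id = insert_analysis(conn, "blastx", "2.0", "SwissProt", name="SwissProt")
# The same program and version run against a different source is a distinct analysis.
second_id = insert_analysis(conn, "blastx", "2.0", "cDNA", name="cDNA")

# Repeating the full (program, programversion, sourcename) triple is rejected.
try:
    insert_analysis(conn, "blastx", "2.0", "SwissProt")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# timeexecuted was never supplied, so the column default filled it in.
stamps = [row[0] for row in conn.execute("SELECT timeexecuted FROM analysis")]
```

Because name is not part of the unique key, several analyses may share the same name, which matches the note that name should not be assumed to be unique.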

analysisprop

Comments:

================================================
TABLE: analysisprop
================================================

Field Name	Data Type	Size	Default Value	Other	Foreign Key
analysisprop_id	integer	11		PRIMARY KEY, NOT NULL
analysis_id	integer	10		UNIQUE, NOT NULL	analysis.analysis_id
type_id	integer	10		UNIQUE, NOT NULL	cvterm.cvterm_id
value	text	64000
rank	integer	10	0	UNIQUE, NOT NULL

Indices

Name	Fields
analysisprop_idx1	analysis_id
analysisprop_idx2	type_id

Constraints

Type	Fields
NOT NULL	analysisprop_id
NOT NULL	analysis_id
FOREIGN KEY	analysis_id
NOT NULL	type_id
FOREIGN KEY	type_id
NOT NULL	rank
UNIQUE	analysis_id, type_id, rank
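
The UNIQUE constraint on (analysis_id, type_id, rank) suggests that rank is what allows one analysis to carry several values of the same property type. The following is a minimal sketch of that reading, assuming SQLite through Python's sqlite3 module; the analysis and cvterm tables are reduced to stand-ins, and the 'comment' term and the add_prop helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
-- Stand-ins for the referenced tables (key columns only).
CREATE TABLE analysis (analysis_id INTEGER PRIMARY KEY);
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT);

CREATE TABLE analysisprop (
    analysisprop_id INTEGER PRIMARY KEY,
    analysis_id     INTEGER NOT NULL REFERENCES analysis (analysis_id),
    type_id         INTEGER NOT NULL REFERENCES cvterm (cvterm_id),
    value           TEXT,
    rank            INTEGER NOT NULL DEFAULT 0,
    UNIQUE (analysis_id, type_id, rank)
);
CREATE INDEX analysisprop_idx1 ON analysisprop (analysis_id);
CREATE INDEX analysisprop_idx2 ON analysisprop (type_id);

INSERT INTO analysis (analysis_id) VALUES (1);
INSERT INTO cvterm (cvterm_id, name) VALUES (10, 'comment');  -- hypothetical term
""")


def add_prop(conn, analysis_id, type_id, value, rank=0):
    """Attach one property value to an analysis (hypothetical helper)."""
    conn.execute(
        "INSERT INTO analysisprop (analysis_id, type_id, value, rank) "
        "VALUES (?, ?, ?, ?)",
        (analysis_id, type_id, value, rank),
    )


add_prop(conn, 1, 10, "first comment")           # stored at rank 0
add_prop(conn, 1, 10, "second comment", rank=1)  # same type, next rank

# A second value at an occupied (analysis_id, type_id, rank) is rejected.
try:
    add_prop(conn, 1, 10, "clashes with rank 0")
    clash_rejected = False
except sqlite3.IntegrityError:
    clash_rejected = True

# A property pointing at a missing analysis violates the foreign key.
try:
    add_prop(conn, 99, 10, "no such analysis")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

values = [
    row[0]
    for row in conn.execute(
        "SELECT value FROM analysisprop "
        "WHERE analysis_id = 1 AND type_id = 10 ORDER BY rank"
    )
]
```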

analysisfeature

Comments:

================================================
TABLE: analysisfeature
================================================
Computational analyses generate features (e.g. Genscan generates transcripts and exons; sim4 alignments generate similarity/match features). analysisfeatures are stored using the feature table from the sequence module. The analysisfeature table is used to decorate these features with analysis-specific attributes. A feature is an analysisfeature if and only if there is a corresponding entry in the analysisfeature table. analysisfeatures will have two or more featureloc entries, with rank indicating query/subject.

Field Name	Data Type	Size	Default Value	Other	Foreign Key
analysisfeature_id	integer	11		PRIMARY KEY, NOT NULL
feature_id	integer	10		UNIQUE, NOT NULL	feature.feature_id
analysis_id	integer	10		UNIQUE, NOT NULL	analysis.analysis_id
rawscore	float	20		This is the native score generated by the program; for example, the bitscore generated by blast, sim4 or genscan scores. One should not assume that high is necessarily better than low.
normscore	float	20		This is the rawscore but semi-normalized. Complete normalization to allow comparison of features generated by different programs would be nice but too difficult. Instead the normalization should strive to enforce the following semantics: (a) normscores are floating point numbers >= 0; (b) high normscores are better than low ones. For most programs, it would be sufficient to make the normscore the same as the rawscore, provided these semantics are satisfied.
significance	float	20		This is some kind of expectation or probability metric, representing the probability that the analysis would appear randomly given the model. As such, any program or person querying this table can assume the following semantics: (a) 0 <= significance <= n, where n is a positive number, theoretically unbounded but unlikely to be more than 10; (b) low numbers are better than high numbers.
identity	float	20		Percent identity between the locations compared. Note that these four metrics do not cover the full range of scores possible; it would be undesirable to list every score possible, as this should be kept extensible. Instead, for non-standard scores, use the analysisprop table.
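
The semantics stated for normscore and significance can be checked before a row is loaded. The following is a minimal sketch in Python; the function names, the dictionary layout of a hit, and the sample values are all hypothetical and not part of the schema.

```python
def check_scores(normscore=None, significance=None):
    """Return a list of violations of the documented score semantics.

    None means the score was not supplied, which the nullable columns allow.
    """
    problems = []
    # normscores are floating point numbers >= 0 (NaN also fails this test).
    if normscore is not None and not normscore >= 0:
        problems.append("normscore must be >= 0")
    # 0 <= significance <= n; n is unbounded, so only the lower bound is checked.
    if significance is not None and not significance >= 0:
        problems.append("significance must be >= 0")
    return problems


def best_by_normscore(hits):
    """High normscores are better than low ones."""
    return max(hits, key=lambda hit: hit["normscore"])


def best_by_significance(hits):
    """Low significance values are better than high ones."""
    return min(hits, key=lambda hit: hit["significance"])


# Hypothetical hits, for illustration only.
hits = [
    {"name": "hit-a", "normscore": 230.0, "significance": 1e-60},
    {"name": "hit-b", "normscore": 55.0, "significance": 0.002},
]

ok = check_scores(normscore=230.0, significance=1e-60)
bad = check_scores(normscore=-1.0, significance=-0.5)
```

No comparison helper is given for rawscore, since one should not assume that a high rawscore is better than a low one.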

Indices

Name	Fields
analysisfeature_idx1	feature_id
analysisfeature_idx2	analysis_id

Constraints

Type	Fields
NOT NULL	analysisfeature_id
NOT NULL	feature_id
FOREIGN KEY	feature_id
NOT NULL	analysis_id
FOREIGN KEY	analysis_id
UNIQUE	feature_id, analysis_id
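
The rule that a feature is an analysisfeature if and only if it has a row in the analysisfeature table can be expressed as a query. The following is a minimal sketch, assuming SQLite through Python's sqlite3 module; the feature and analysis tables are reduced to stand-ins, and the helper names and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
-- Stand-ins for the referenced tables (a few columns only).
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE analysis (analysis_id INTEGER PRIMARY KEY, program TEXT NOT NULL);

CREATE TABLE analysisfeature (
    analysisfeature_id INTEGER PRIMARY KEY,
    feature_id   INTEGER NOT NULL REFERENCES feature (feature_id),
    analysis_id  INTEGER NOT NULL REFERENCES analysis (analysis_id),
    rawscore     REAL,
    normscore    REAL,
    significance REAL,
    identity     REAL,
    UNIQUE (feature_id, analysis_id)
);
CREATE INDEX analysisfeature_idx1 ON analysisfeature (feature_id);
CREATE INDEX analysisfeature_idx2 ON analysisfeature (analysis_id);

INSERT INTO feature (feature_id, name) VALUES (1, 'hit-1'), (2, 'gene-1');
INSERT INTO analysis (analysis_id, program) VALUES (1, 'blastx');
""")

# Decorate feature 1 with scores from analysis 1 (sample values only).
conn.execute(
    "INSERT INTO analysisfeature "
    "(feature_id, analysis_id, rawscore, normscore, significance, identity) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    (1, 1, 230.0, 230.0, 0.001, 98.5),
)


def is_analysisfeature(conn, feature_id):
    """True if the feature has at least one analysisfeature row."""
    row = conn.execute(
        "SELECT EXISTS (SELECT 1 FROM analysisfeature WHERE feature_id = ?)",
        (feature_id,),
    ).fetchone()
    return bool(row[0])


def features_for_program(conn, program):
    """List (feature name, normscore, significance) for one program."""
    return conn.execute(
        "SELECT f.name, af.normscore, af.significance "
        "FROM analysisfeature af "
        "JOIN feature f ON f.feature_id = af.feature_id "
        "JOIN analysis a ON a.analysis_id = af.analysis_id "
        "WHERE a.program = ? ORDER BY af.normscore DESC",
        (program,),
    ).fetchall()


blastx_hits = features_for_program(conn, "blastx")
```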

analysisfeatureprop

Field Name	Data Type	Size	Default Value	Other	Foreign Key
analysisfeatureprop_id	integer	11		PRIMARY KEY, NOT NULL
analysisfeature_id	integer	10		UNIQUE, NOT NULL	analysisfeature.analysisfeature_id
type_id	integer	10		UNIQUE, NOT NULL	cvterm.cvterm_id
value	text	64000
rank	integer	10		UNIQUE, NOT NULL

Constraints

Type	Fields
NOT NULL	analysisfeature_id
FOREIGN KEY	analysisfeature_id
NOT NULL	type_id
FOREIGN KEY	type_id
NOT NULL	rank
UNIQUE	analysisfeature_id, type_id, rank

Created by SQL::Translator 0.11003