README.coa
The permanent result files from a COA created by CodonW have the extension
.coa; for a description of their contents see Table 1.
Table 1. Short description of output files created by correspondence
analysis in CodonW.
summary.coa
This file contains a summary of all the information generated by
correspondence analysis, including all the data written to files listed
below, except for the output written to cusort.coa.
eigen.coa
Each axis generated in the correspondence analysis is represented by a row
of information. Each row consists of four columns: (1) the number of the
axis, (2) the axis eigenvalue, (3) the relative inertia of the axis, and
(4) the cumulative sum of the relative inertia.
amino.coa or codon.coa
Each codon or amino acid included in the correspondence analysis is
represented by a row. The first column is a description of the variable;
the subsequent columns contain the coordinates of the codon or amino acid on
the axes. The number of axes is user-definable.
genes.coa
Each row represents one gene. The first column contains a unique description
for each gene, and subsequent columns contain the coordinates on each of
the recorded axes. If additional genes are added to the correspondence
analysis (advanced correspondence analysis option), the coordinates of these
genes are appended to this file.
cusort.coa
Contains the codon usage of each gene, sorted by the gene's coordinate on
the principal axis. This information is used to generate the table in
hilo.coa.
hilo.coa
This file records a two-way chi-squared contingency test between two subsets
of genes (as defined by the advanced correspondence analysis options)
positioned at the extremes of axis 1 (see cusort.coa).
cai.coa
Contains the relative usage of each codon within each synonym family. The
most frequent codon is assigned the value one and all other codons are
expressed relative to this. This file can be used to calculate
species-specific CAI values.
fop.coa and cbi.coa
Contain a list of the optimal and non-optimal codons as identified in the
file hilo.coa. The format of these files can be read by CodonW to
calculate Fop and CBI using a specific choice of optimal codons.
inertia.coa
This file is only generated if the exhaustive output option is selected
under the advanced correspondence analysis menu. It contains four tables of
information. The first two report the absolute contribution of each gene and
codon (or amino acid) to the inertia explained by each axis. The second two
report the fraction of variation in each gene and codon (or amino acid)
explained by each axis.
codon.coa and hilo.coa are not generated during the correspondence analysis
of amino acids.
Detailed explanation of file contents
summary.coa
========================================
Correspondence analysis generates a large volume of data; CodonW writes the
essential data necessary to interpret the correspondence analysis to the
file summary.coa.
genes.coa codon.coa amino.coa
========================================
The most complex analysis that CodonW performs is correspondence analysis
(COA). COA creates a series of orthogonal axes to identify trends that
explain the data variation, with each subsequent axis explaining a
decreasing amount of the variation. COA positions each gene and codon (or
amino acid) on these axes. An important property is that the ordinations of
the rows (genes) and columns (codons or amino acids) are superimposable.
eigen.coa
========================================
The eigenvalues of the principal trends, as well as the more accessible
fraction (with the cumulative total) of the total data inertia that each
axis explains, are recorded in summary.coa and eigen.coa.
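As an illustration of how the four columns relate to one another, the
following is a minimal sketch in Python, not CodonW's own code. It assumes
that the relative inertia of an axis is its eigenvalue divided by the sum of
all eigenvalues, and that the list passed in holds every eigenvalue of the
analysis. The function name inertia_table and the eigenvalues used are
hypothetical.

```python
from itertools import accumulate


def inertia_table(eigenvalues):
    """Return rows of (axis, eigenvalue, relative inertia, cumulative inertia).

    Assumes `eigenvalues` lists every axis of the analysis, largest first,
    so that their sum is the total inertia of the data.
    """
    total = sum(eigenvalues)
    relative = [ev / total for ev in eigenvalues]
    cumulative = list(accumulate(relative))
    return [
        (axis, ev, rel, cum)
        for axis, (ev, rel, cum) in enumerate(
            zip(eigenvalues, relative, cumulative), start=1
        )
    ]


# Hypothetical eigenvalues, for illustration only.
rows = inertia_table([0.20, 0.10, 0.06, 0.04])
for axis, ev, rel, cum in rows:
    print(f"{axis}\t{ev:.4f}\t{rel:.4f}\t{cum:.4f}")
```

If only the first few axes are recorded, the total must still be taken over
all eigenvalues, otherwise the fractions are overstated.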
cusort.coa
========================================
To simplify the analysis of codon usage, CodonW assumes that the principal
trend is correlated with gene expression. It uses this assumption to
identify putative optimal codons. The adage GIGO (garbage in, garbage out)
must be stressed, however: it is the researcher's responsibility to
establish that the principal trend is correlated with gene expression (see
the tutorial for some examples of how to do this).
To identify the putative optimal codons, the genes are sorted according to
their position on the principal axis, and the sorted codon usage of these
genes is written to the file cusort.coa. Then a number of genes, set by the
advanced correspondence analysis menu option "number of genes used to
identify optimal codons", are read from the start and end of this file
(i.e. from the extremes of the principal axis), and the codon usage of each
set of genes is totalled. The set of genes with the lower Nc (more highly
biased) is putatively identified as the more highly expressed.
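The sorting and totalling steps described above can be sketched in Python as
follows. This is an illustration under stated assumptions, not CodonW's
implementation: the function name extreme_gene_sets, the in-memory gene
tuples and the codon counts are all hypothetical, and the final comparison
of Nc between the two sets is left out.

```python
from collections import Counter


def extreme_gene_sets(genes, n):
    """Total the codon usage of the n genes at each end of axis 1.

    genes: list of (name, axis1_coordinate, codon_counts) tuples, where
           codon_counts maps a codon to its count in that gene.
    Returns (totals for the low-coordinate end, totals for the high end).
    """
    if n < 1 or 2 * n > len(genes):
        raise ValueError("n must be at least 1 and at most half the gene count")
    # Sort by coordinate on the principal axis (the order used in cusort.coa).
    ordered = sorted(genes, key=lambda gene: gene[1])

    def total(subset):
        counts = Counter()
        for _name, _coord, codon_counts in subset:
            counts.update(codon_counts)  # adds counts codon by codon
        return counts

    return total(ordered[:n]), total(ordered[-n:])


# Hypothetical genes and counts, for illustration only.
genes = [
    ("geneC", 0.10, {"CTG": 5, "CTA": 5}),
    ("geneA", -0.90, {"CTG": 10, "CTA": 1}),
    ("geneD", 0.80, {"CTG": 2, "CTA": 9}),
    ("geneB", -0.20, {"CTG": 7, "CTA": 3}),
]
low_end, high_end = extreme_gene_sets(genes, 2)
print(dict(low_end), dict(high_end))
```

Which of the two totals represents the highly expressed genes is not decided
here; as described above, CodonW makes that choice by comparing the Nc of the
two sets.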
hilo.coa
========================================
Optimal codons are defined as those codons that occur significantly more
often in highly expressed genes relative to their frequency in lowly
expressed genes. Significance is assessed by a two-way chi-squared
contingency test with the criterion p < 0.01. The advantage of using a
test of significance to identify optimal codons is that variation in codon
usage between highly and lowly expressed genes that is due to random noise
is suppressed; a disadvantage is that the test is dependent on sample
size.
After CodonW performs a two-way chi-squared test on the genes taken from the
extremes of axis 1, their codon usage and RSCU are output as a table to
summary.coa and hilo.coa. Codons that have been putatively identified as
optimal (p < 0.01) are indicated with an asterisk (*). Codons that occur
more frequently in the highly expressed dataset at 0.01 < p < 0.05 are not
considered optimal by CodonW, but are indicated with an at sign (@).
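The test and the two markers can be illustrated with a short Python sketch.
It is not CodonW's code and rests on assumptions: the 2 x 2 table is taken to
be (this codon, other synonymous codons) by (high set, low set), the
statistic is the plain Pearson chi-squared with one degree of freedom and no
continuity correction, and the function names and counts are hypothetical.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared and p-value (1 d.f.) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0  # a row or column is empty, so no test is possible
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Survival function of the chi-squared distribution with 1 d.f.
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p


def flag_codon(high_codon, high_family, low_codon, low_family):
    """Return '*', '@' or '' for one codon.

    high_codon / low_codon:   count of the codon in the high / low gene set.
    high_family / low_family: count of all codons of the same synonym family
                              in the high / low gene set.
    """
    a, b = high_codon, high_family - high_codon
    c, d = low_codon, low_family - low_codon
    _chi2, p = chi2_2x2(a, b, c, d)
    # Only codons that are relatively more frequent in the high set qualify.
    if a * (c + d) <= c * (a + b):
        return ""
    if p < 0.01:
        return "*"
    if p < 0.05:
        return "@"
    return ""


# Hypothetical counts, for illustration only.
print(repr(flag_codon(80, 100, 20, 100)))
print(repr(flag_codon(60, 100, 45, 100)))
print(repr(flag_codon(20, 100, 80, 100)))
```

The dependence on sample size noted above is visible here: multiplying every
count by a constant leaves the proportions unchanged but scales the
chi-squared statistic by that constant.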
fop.coa cbi.coa cai.coa
========================================
CodonW measures the degree to which the codon usage of a gene has adapted
towards the usage of optimal codons. It does this by calculating three
indices: the frequency of optimal codons (Fop), the codon bias index (CBI),
and the codon adaptation index (CAI). To calculate these indices, information
about codon usage in the species being analysed is needed. The indices Fop
and CBI use the optimal codons for the species. The index CAI uses codon
adaptiveness values.
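As a rough illustration of how a set of optimal codons feeds into such an
index, the sketch below computes Fop in Python as the number of optimal
codons divided by the number of codons that have synonymous alternatives.
It is a simplified sketch, not CodonW's implementation: it assumes the
standard genetic code, excludes ATG, TGG and the stop codons, and the
function name fop, the optimal codon set and the counts are hypothetical.

```python
# Codons with no synonymous alternative, plus stop codons (standard genetic
# code assumed); these are left out of the calculation.
NON_SYNONYMOUS = {"ATG", "TGG", "TAA", "TAG", "TGA"}


def fop(codon_counts, optimal_codons):
    """Fraction of a gene's synonymous codons that are optimal codons."""
    counted = {
        codon: n for codon, n in codon_counts.items()
        if codon not in NON_SYNONYMOUS
    }
    total = sum(counted.values())
    if total == 0:
        return 0.0
    optimal = sum(n for codon, n in counted.items() if codon in optimal_codons)
    return optimal / total


# Hypothetical gene counts and optimal codons, for illustration only.
gene_counts = {"CTG": 6, "CTA": 2, "AAA": 2, "ATG": 5}
example_fop = fop(gene_counts, {"CTG", "AAA"})
print(round(example_fop, 3))
```

The optimal codon set passed in here plays the role of the species-specific
information described above.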
For some species this information is known, and for these the optimal codons
and codon adaptiveness values are built into CodonW (see the Change
Defaults menu). For other species these indices cannot be calculated unless
the additional information is known. During calculation of these indices the
user is prompted for input files.
During a COA CodonW generates the output files cai.coa, fop.coa and
cbi.coa. These files can be used as input files for their respective
indices (they are already in the correct format).
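To show how relative usage values of the kind stored in cai.coa relate to
CAI, here is a minimal Python sketch. It assumes the usual definition of CAI
as the geometric mean of the relative adaptiveness values of the codons in a
gene. It does not read or write the cai.coa file format, and the function
names and counts are hypothetical. Codons with a value of zero are simply
skipped here, which is a simplification.

```python
import math


def relative_adaptiveness(family_counts):
    """Scale the counts of one synonym family so the most frequent codon is 1."""
    top = max(family_counts.values())
    if top == 0:
        return {codon: 0.0 for codon in family_counts}
    return {codon: n / top for codon, n in family_counts.items()}


def cai(gene_codons, w):
    """Geometric mean of w over the codons of a gene.

    gene_codons: list of codons in the gene, in any order.
    w:           mapping from codon to its relative adaptiveness value.
    Codons missing from w, or with w == 0, are skipped (a simplification).
    """
    logs = [math.log(w[c]) for c in gene_codons if w.get(c, 0.0) > 0.0]
    if not logs:
        return 0.0
    return math.exp(sum(logs) / len(logs))


# Hypothetical counts from a highly expressed gene set, for illustration only.
w = relative_adaptiveness({"CTG": 40, "CTA": 10})
example_cai = cai(["CTG", "CTA"], w)
print(w, round(example_cai, 3))
```

A gene made up entirely of the most frequent codon of each family scores 1
under this definition.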
Again it must be stressed that CodonW must make a number of assumptions to
generate these files. These are: that the major trend in the codon usage is
correlated with expression level; that the dataset contains highly expressed
genes; and that the genes used to identify optimal codons were highly
expressed. If these assumptions are valid then the files cbi.coa,
cai.coa and fop.coa can be used to calculate the indices CBI, CAI and
Fop respectively.