The permanent result files from a COA created by CodonW have the extension
".coa"; for a description of their names and contents see Table 1.
Table 1. Short description of the output files created by correspondence
analysis.
summary.coa
This file contains a summary of all the information generated by
correspondence analysis, including all the data written to the files listed
below, except for the output written to cusort.coa.
eigen.coa
Each axis generated in the correspondence analysis is represented by a row
of information. Each row consists of four columns: (1) the number of the
axis, (2) the axis eigenvalue, (3) the relative inertia of the axis, and (4)
the cumulative sum of the relative inertia.
amino.coa or codon.coa
Each codon or amino acid included in the correspondence analysis is
represented by a row. The first column is a description of the variable; the
subsequent columns contain the coordinates of the codon or amino acid on the
axes. The number of axes is user definable.
genes.coa
Each row represents one gene. The first column contains a unique description
for each gene, and subsequent columns contain the coordinates for each of
the recorded axes. If additional genes are added to the correspondence
analysis (advanced correspondence analysis option), the coordinates of these
genes are appended to this file.
cusort.coa
Contains the codon usage of each gene, sorted by the gene's coordinate on
the principal axis. This information is used to generate the table in
hilo.coa.
hilo.coa
This file records a two-way chi-squared contingency test between two subsets
(as defined by the "advanced correspondence analysis options") of genes
positioned at the extremes of axis 1 (see cusort.coa).
cai.coa
Contains the relative usage of each codon within each synonymous family; the
most frequent codon is assigned the value one and all other codons are
expressed relative to this. This file can be used to calculate
species-specific CAI values.
fop.coa and cbi.coa
Contain a list of the optimal codons and non-optimal codons as identified
in the file "hilo.coa". The format of these files can be used by CodonW to
calculate Fop and CBI using a specific choice of optimal codons.
inertia.coa
This file is only generated if the exhaustive output option is selected
under the advanced correspondence analysis menu. It contains four tables of
information. The first two report the absolute contribution of each gene and
codon (or amino acid) to the inertia explained by each axis. The second two
tables report the fraction of variation in each gene and codon (or amino
acid) explained by each axis.
codon.coa and hilo.coa are not generated during the correspondence analysis
of amino acids.
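The relative usage values described above for the CAI input file (the most
frequent codon in each synonymous family set to one, all others expressed
relative to it) can be sketched as follows. This is only an illustration:
the codon table is a hypothetical two-family subset, the counts are made up,
and the function name is not part of CodonW.

```python
from collections import defaultdict

# Hypothetical, partial codon -> amino acid map (two synonymous families only).
CODON_TO_AA = {
    "GAA": "Glu", "GAG": "Glu",
    "AAA": "Lys", "AAG": "Lys",
}


def relative_usage(codon_counts):
    """Scale each codon count by the largest count in its synonymous family."""
    family_max = defaultdict(int)
    for codon, count in codon_counts.items():
        aa = CODON_TO_AA[codon]
        family_max[aa] = max(family_max[aa], count)
    # Families with no observations are skipped to avoid dividing by zero.
    return {
        codon: count / family_max[CODON_TO_AA[codon]]
        for codon, count in codon_counts.items()
        if family_max[CODON_TO_AA[codon]] > 0
    }


# Made-up pooled counts, for illustration only.
counts = {"GAA": 80, "GAG": 20, "AAA": 30, "AAG": 60}
w = relative_usage(counts)
```

In this example GAA and AAG receive the value one, and GAG and AAA are
expressed as fractions of the most frequent codon in their families.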
Detailed explanation of file contents
Correspondence analysis generates a large volume of data. CodonW writes the
essential data necessary to interpret the correspondence analysis to
summary.coa.
genes.coa codon.coa amino.coa
The most complex analysis that CodonW performs is correspondence analysis
(COA). COA creates a series of orthogonal axes to identify trends that
explain the data variation, with each subsequent axis explaining a
decreasing amount of the variation. COA positions each gene and codon (or
amino acid) on these axes. An important property is that the ordinations of
the rows (genes) and columns (codons or amino acids) are superimposable.
The eigenvalues of the principal trends, as well as the more accessible
fraction (with the cumulative total) of the total data inertia that each
axis explains, are recorded in summary.coa and eigen.coa.
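As a rough illustration of how the fraction of inertia and its cumulative
total relate to the eigenvalues, the sketch below builds rows with the same
four columns described for eigen.coa. It assumes that the list holds the
eigenvalues of all axes, so that their sum is the total inertia; the
function name and the eigenvalues are invented for the example.

```python
def inertia_table(eigenvalues):
    """Return (axis, eigenvalue, relative inertia, cumulative inertia) rows.

    Assumes `eigenvalues` covers every axis, so their sum is the total inertia.
    """
    total = sum(eigenvalues)
    rows = []
    cumulative = 0.0
    for axis, value in enumerate(eigenvalues, start=1):
        relative = value / total
        cumulative += relative
        rows.append((axis, value, relative, cumulative))
    return rows


# Made-up eigenvalues, for illustration only.
rows = inertia_table([0.25, 0.15, 0.10])
for axis, value, relative, cumulative in rows:
    print(f"{axis}\t{value:.4f}\t{relative:.4f}\t{cumulative:.4f}")
```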
To simplify the analysis of codon usage, CodonW assumes that the principal
trend is correlated with gene expression. It uses this assumption to
identify putative optimal codons. However, the adage GIGO ("garbage in,
garbage out") must be stressed: it is the researcher's responsibility to
establish that the principal trend is correlated with gene expression (see
the tutorial for some examples of how to do this).
To identify the putative optimal codons, the genes are sorted according to
their position on the principal axis, and the sorted codon usage of these
genes is written to the file "cusort.coa". Then a number of genes, set by
the advanced correspondence analysis menu option "number of genes used to
identify optimal codons", are read from the start and end of this file (i.e.
from the extremes of the principal axis), and the codon usage of each set of
genes is totalled. The set of genes with the lower Nc (more highly biased)
is putatively identified as the more highly expressed.
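The sorting and pooling steps just described can be sketched as follows.
This is a minimal illustration, not CodonW's own code: the gene tuples, the
function name and the counts are hypothetical, and the final step of
comparing the Nc of the two pooled sets is left out.

```python
from collections import Counter


def pool_extremes(genes, n):
    """Sort genes by their axis 1 coordinate and total the codon usage of the
    n genes at each end.

    `genes` is a list of (name, axis1_coordinate, codon_counts) tuples.
    """
    if 2 * n > len(genes):
        raise ValueError("n is too large: the two subsets would overlap")
    ordered = sorted(genes, key=lambda gene: gene[1])

    def total(subset):
        pooled = Counter()
        for _name, _coordinate, counts in subset:
            pooled.update(counts)  # adds the counts codon by codon
        return pooled

    return total(ordered[:n]), total(ordered[-n:])


# Made-up genes, for illustration only.
genes = [
    ("g1", 0.8, {"GAA": 10, "GAG": 2}),
    ("g2", -0.5, {"GAA": 3, "GAG": 9}),
    ("g3", 0.1, {"GAA": 5, "GAG": 5}),
    ("g4", -0.9, {"GAA": 1, "GAG": 7}),
]
start_set, end_set = pool_extremes(genes, n=2)
```

Which of the two pooled sets is treated as highly expressed would then be
decided by comparing their Nc values, as stated above.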
Optimal codons are defined as those codons that occur significantly more
often in highly expressed genes relative to their frequency in lowly
expressed genes. Significance is assessed by a two-way chi-squared
contingency test with the criterion of p < 0.01. The advantage of using a
test of significance to identify optimal codons is that variation in codon
usage between highly and lowly expressed genes that is due to random noise
is suppressed; a disadvantage is that the test is dependent on sample size.
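A minimal sketch of such a test for a single codon is given below. It
assumes a 2 x 2 table whose rows are the highly and lowly expressed sets and
whose columns are the counts of the codon and of the other synonymous codons
for the same amino acid. That layout, the absence of a continuity
correction, and the function names are assumptions made for this example and
may differ from what CodonW does internally.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-squared statistic and p-value for the table [[a, b], [c, d]].

    No continuity correction is applied (an assumption of this sketch).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        return 0.0, 1.0  # degenerate table: no evidence of a difference
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom the upper tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


def is_putatively_optimal(high_codon, high_other, low_codon, low_other,
                          alpha=0.01):
    """True if the codon is significantly more frequent in the high set."""
    _stat, p_value = chi_square_2x2(high_codon, high_other,
                                    low_codon, low_other)
    # Cross-multiplied comparison of the two proportions (avoids division).
    more_frequent_in_high = (high_codon * (low_codon + low_other)
                             > low_codon * (high_codon + high_other))
    return more_frequent_in_high and p_value < alpha


# Made-up counts: 90 of 100 in the high set versus 50 of 100 in the low set.
stat, p_value = chi_square_2x2(90, 10, 50, 50)
```

The dependence on sample size is visible in the formula: scaling every count
by the same factor scales the statistic by that factor, so the same
proportions become more significant in larger datasets.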
After CodonW performs a two-way chi-squared test on the genes taken from the
extremes of axis 1, their codon usage and RSCU are output as a table to
"summary.coa" and "hilo.coa". Those codons which have been putatively
identified as optimal (p < 0.01) are indicated with an asterisk (*). Though
not considered optimal by CodonW, codons that occur more frequently in the
highly expressed dataset at 0.01 < p < 0.05 are indicated with an ampersand.
fop.coa cbi.coa cai.coa
CodonW measures the degree to which the codon usage of a gene has adapted
towards the usage of optimal codons. It does this by calculating three
indices: the frequency of optimal codons (Fop), the codon bias index (CBI),
and the codon adaptation index (CAI). To calculate these indices,
information about codon usage in the species being analysed is needed. The
indices Fop and CBI use the optimal codons for the species. The index CAI
uses codon adaptiveness values.
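To make the role of this species-specific information concrete, the sketch
below computes Fop as the proportion of optimal codons among the optimal and
non-optimal codons, and CAI as the geometric mean of the adaptiveness values
of the codons in a gene. These are the textbook definitions in simplified
form; CodonW's own handling of special cases (for example rare codons or
single-codon amino acids) is not reproduced, CBI is omitted, and all names
and numbers are hypothetical.

```python
import math


def fop(codon_counts, optimal, non_optimal):
    """Frequency of optimal codons: optimal / (optimal + non-optimal)."""
    n_opt = sum(codon_counts.get(codon, 0) for codon in optimal)
    n_non = sum(codon_counts.get(codon, 0) for codon in non_optimal)
    if n_opt + n_non == 0:
        return 0.0
    return n_opt / (n_opt + n_non)


def cai(codon_counts, adaptiveness):
    """Geometric mean of the adaptiveness values over the codons of a gene.

    Codons without a positive adaptiveness value are ignored in this sketch.
    """
    log_sum = 0.0
    n_codons = 0
    for codon, count in codon_counts.items():
        value = adaptiveness.get(codon, 0.0)
        if value > 0:
            log_sum += count * math.log(value)
            n_codons += count
    if n_codons == 0:
        return 0.0
    return math.exp(log_sum / n_codons)


# Made-up gene and species information, for illustration only.
gene = {"GAA": 8, "GAG": 2}
gene_fop = fop(gene, optimal={"GAA"}, non_optimal={"GAG"})
gene_cai = cai(gene, adaptiveness={"GAA": 1.0, "GAG": 0.25})
```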
For some species this information is known, and for these the optimal codons
and codon adaptiveness values are built into CodonW (see the "Change
Defaults" menu). For other species these indices cannot be calculated unless
the additional information is known. During calculation of these indices the
user is prompted for input files.
During a COA CodonW generates the output files "cai.coa", "fop.coa" and
"cbi.coa". These files can be used as input files for their respective
indices (they are already in the correct format).
Again it must be stressed that CodonW must make a number of assumptions to
generate these files. These are: that the major trend in the codon usage is
correlated with expression level; that the dataset contains highly expressed
genes; and that the genes used to identify the optimal codons were highly
expressed. If these assumptions are valid then the files "cbi.coa",
"cai.coa" and "fop.coa" can be used to calculate the indices CBI, CAI and