1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328
|
cai
Wiki
The master copies of EMBOSS documentation are available at
http://emboss.open-bio.org/wiki/Appdocs on the EMBOSS Wiki.
Please help by correcting and extending the Wiki pages.
Function
Calculate codon adaptation index
Description
cai calculates the Codon Adaptation Index for a given nucleotide
sequence, given a reference codon usage table. The CAI index is a
simple, effective measure of synonymous codon usage bias. It index
assesses the extent to which selection has been effective in moulding
the pattern of codon usage. In that respect it is useful for predicting
the level of expression of a gene, for assessing the adaptation of
viral genes to their hosts, and for making comparisons of codon usage
in different organisms. The index may also give an approximate
indication of the likely success of heterologous gene expression.
Algorithm
The CAI index uses a reference set of highly expressed genes from a
species to assess the relative merits of each codon. A score for a gene
sequence is calculated from the frequency of use of all codons in that
gene sequence.
Usage
Here is a sample session with cai
% cai TEMBL:AB009602
Calculate codon adaptation index
Codon usage file [Eyeast_cai.cut]:
Output file [ab009602.cai]:
Go to the input files for this example
Go to the output files for this example
Command line arguments
Calculate codon adaptation index
Version: EMBOSS:6.6.0.0
Standard (Mandatory) qualifiers:
[-seqall] seqall Nucleotide sequence(s) filename and optional
format, or reference (input USA)
-cfile codon [Eyeast_cai.cut] Codon usage table name
[-outfile] outfile [*.cai] Output file name
Additional (Optional) qualifiers: (none)
Advanced (Unprompted) qualifiers: (none)
Associated qualifiers:
"-seqall" associated qualifiers
-sbegin1 integer Start of each sequence to be used
-send1 integer End of each sequence to be used
-sreverse1 boolean Reverse (if DNA)
-sask1 boolean Ask for begin/end/reverse
-snucleotide1 boolean Sequence is nucleotide
-sprotein1 boolean Sequence is protein
-slower1 boolean Make lower case
-supper1 boolean Make upper case
-scircular1 boolean Sequence is circular
-squick1 boolean Read id and sequence only
-sformat1 string Input sequence format
-iquery1 string Input query fields or ID list
-ioffset1 integer Input start position offset
-sdbname1 string Database name
-sid1 string Entryname
-ufo1 string UFO features
-fformat1 string Features format
-fopenfile1 string Features file name
"-cfile" associated qualifiers
-format string Data format
"-outfile" associated qualifiers
-odirectory2 string Output directory
General qualifiers:
-auto boolean Turn off prompts
-stdout boolean Write first file to standard output
-filter boolean Read first file from standard input, write
first file to standard output
-options boolean Prompt for standard and additional values
-debug boolean Write debug output to program.dbg
-verbose boolean Report some/full command line options
-help boolean Report command line options and exit. More
information on associated and general
qualifiers can be found with -help -verbose
-warning boolean Report warnings
-error boolean Report errors
-fatal boolean Report fatal errors
-die boolean Report dying program messages
-version boolean Report version number and exit
Input file format
cai reads a nucleic acid sequence of a gene.
Input files for usage example
Database entry: TEMBL:AB009602
ID AB009602; SV 1; linear; mRNA; STD; FUN; 561 BP.
XX
AC AB009602;
XX
DT 15-DEC-1997 (Rel. 53, Created)
DT 14-APR-2005 (Rel. 83, Last updated, Version 2)
XX
DE Schizosaccharomyces pombe mRNA for MET1 homolog, partial cds.
XX
KW MET1 homolog.
XX
OS Schizosaccharomyces pombe (fission yeast)
OC Eukaryota; Fungi; Dikarya; Ascomycota; Taphrinomycotina;
OC Schizosaccharomycetes; Schizosaccharomycetales; Schizosaccharomycetaceae;
OC Schizosaccharomyces.
XX
RN [1]
RP 1-561
RA Kawamukai M.;
RT ;
RL Submitted (07-DEC-1997) to the INSDC.
RL Makoto Kawamukai, Shimane University, Life and Environmental Science; 1060
RL Nishikawatsu, Matsue, Shimane 690, Japan
RL (E-mail:kawamuka@life.shimane-u.ac.jp, Tel:0852-32-6587, Fax:0852-32-6499)
XX
RN [2]
RP 1-561
RA Kawamukai M.;
RT "S.pmbe MET1 homolog";
RL Unpublished.
XX
DR EnsemblGenomes; SPCC1739.06c; Schizosaccharomyces_pombe.
DR EnsemblGenomes; SPCC1739.06c.1; Schizosaccharomyces_pombe.
DR PomBase; SPCC1739.06c.
DR PomBase; SPCC1739.06c.1.
XX
FH Key Location/Qualifiers
FH
FT source 1..561
FT /organism="Schizosaccharomyces pombe"
FT /mol_type="mRNA"
FT /clone_lib="pGAD GH"
FT /db_xref="taxon:4896"
FT CDS <1..275
FT /codon_start=3
FT /transl_table=1
FT /product="MET1 homolog"
FT /db_xref="GOA:O74468"
FT /db_xref="InterPro:IPR000878"
FT /db_xref="InterPro:IPR003043"
FT /db_xref="InterPro:IPR006366"
FT /db_xref="InterPro:IPR012066"
FT /db_xref="InterPro:IPR014776"
FT /db_xref="InterPro:IPR014777"
FT /db_xref="InterPro:IPR016040"
FT /db_xref="UniProtKB/Swiss-Prot:O74468"
FT /protein_id="BAA23999.1"
FT /translation="SMPKIPSFVPTQTTVFLMALHRLEILVQALIESGWPRVLPVCIAE
FT RVSCPDQRFIFSTLEDVVEEYNKYESLPPGLLITGYSCNTLRNTA"
XX
SQ Sequence 561 BP; 135 A; 106 C; 98 G; 222 T; 0 other;
gttcgatgcc taaaatacct tcttttgtcc ctacacagac cacagttttc ctaatggctt 60
tacaccgact agaaattctt gtgcaagcac taattgaaag cggttggcct agagtgttac 120
cggtttgtat agctgagcgc gtctcttgcc ctgatcaaag gttcattttc tctactttgg 180
aagacgttgt ggaagaatac aacaagtacg agtctctccc ccctggtttg ctgattactg 240
gatacagttg taataccctt cgcaacaccg cgtaactatc tatatgaatt attttccctt 300
tattatatgt agtaggttcg tctttaatct tcctttagca agtcttttac tgttttcgac 360
ctcaatgttc atgttcttag gttgttttgg ataatatgcg gtcagtttaa tcttcgttgt 420
ttcttcttaa aatatttatt catggtttaa tttttggttt gtacttgttc aggggccagt 480
tcattattta ctctgtttgt atacagcagt tcttttattt ttagtatgat tttaatttaa 540
aacaattcta atggtcaaaa a 561
//
Output file format
cai writes the Codon Adaptation Index to the output file.
Output files for usage example
File: ab009602.cai
Sequence: AB009602 CAI: 0.188
Data files
cai requires a reference codon usage table prepared from a set of genes
which are known to be highly expressed. This is specified by the -cfile
option and must exist in the EMBOSS data directory. The default codon
usage table Eyeastcai.cut is the standard set of Saccharomyces
cerevisiae highly expressed gene codon frequiencies. Another table
(Eschpo_cai.cut) was prepared from a set of Schizosaccharomyces pombe
genes by Peter Rice for the S. pombe sequencing team at the Sanger
Centre, and is available in the EMBOSS data directory. You should
prepare your own codon usage table for your organism of interest.
EMBOSS data files are distributed with the application and stored in
the standard EMBOSS data directory, which is defined by the EMBOSS
environment variable EMBOSS_DATA.
To see the available EMBOSS data files, run:
% embossdata -showall
To fetch one of the data files (for example 'Exxx.dat') into your
current directory for you to inspect or modify, run:
% embossdata -fetch -file Exxx.dat
Users can provide their own data files in their own directories.
Project specific files can be put in the current directory, or for
tidier directory listings in a subdirectory called ".embossdata". Files
for all EMBOSS runs can be put in the user's home directory, or again
in a subdirectory called ".embossdata".
The directories are searched in the following order:
* . (your current directory)
* .embossdata (under your current directory)
* ~/ (your home directory)
* ~/.embossdata
Notes
Codons are nucleotide triplet that encode an amino acid residue in a
polypeptide chain. There are four possible nucleotides in DNA; adenine
(A), guanine (G), cytosine (C) and thymine (T), therefore 64 possible
triplets to encode the 20 amino acids plus the translation termination
signal. The encoding is therefore redundant, with all but two amino
acids coded for by more than one triplet. Organisms often have a
particular preference for one of the possible codons for a given amino
acid.
Codon preferences reflect a balance between mutational bias and
selection for efficiency of translation. In fast-growing microorganisms
there are optimal codons that reflect the composition of the genomic
tRNA pool and probably help achieve faster translation rates and high
accuracy. Such selection is expected to be strong in highly expressed
genes, as is the case for Escherichia coli or Saccharomyces cerevisiae.
In contrast, codon usage optimization is normally absent in organisms
with slower growing rates such as Homo sapiens (human), where codon
preferences are determined by mutational biases characteristic to a
particular genome.
Various factors are thought to influence codon usage bias in baceteria,
including gene expression level already mentioned, %G+C composition
(reflecting horizontal gene transfer or mutational bias), GC skew
(reflecting strand-specific mutational bias), amino acid conservation,
protein hydropathy, transcriptional selection, RNA stability, and
optimal growth temperature.
Various methods have been used to analyze codon usage bias. CAI and
methods such as the 'frequency of optimal codons' (Fop) are commonly
used to predict gene expression levels. Others such as the 'effective
number of codons' (Nc) and Shannon entropy are used to measure codon
usage evenness, whereas multivariate statistical methods, iincluding
correspondence analysis and principal component analysis, may be used
to analyze variations in codon usage between genes.
References
1. Sharp PM., Li W-H. "The codon adaptation index - a measure of
directional synonymous codon usage bias, and its potential
applications." Nucleic Acids Research 1987 vol 15, pp 1281-1295.
2. Synonymous codon usage in bacteria. Curr Issues Mol Biol. 2001
Oct;3(4):91-7.
Warnings
None.
Diagnostic Error Messages
None.
Exit status
It always exits with status 0.
Known bugs
None.
See also
Program name Description
chips Calculate Nc codon usage statistic
codcmp Codon usage table comparison
codcopy Copy and reformat a codon usage table
cusp Create a codon usage table from nucleotide sequence(s)
syco Draw synonymous codon usage statistic plot for a nucleotide
sequence
Author(s)
Alan Bleasby
European Bioinformatics Institute, Wellcome Trust Genome Campus,
Hinxton, Cambridge CB10 1SD, UK
Please report all bugs to the EMBOSS bug team
(emboss-bug (c) emboss.open-bio.org) not to the original author.
History
Written (March 2001) - Alan Bleasby.
Target users
This program is intended to be used by everyone and everything, from
naive users to embedded scripts.
Comments
None
|