File: cai.txt

                                     cai



Wiki

   The master copies of EMBOSS documentation are available at
   http://emboss.open-bio.org/wiki/Appdocs on the EMBOSS Wiki.

   Please help by correcting and extending the Wiki pages.

Function

   Calculate codon adaptation index

Description

   cai calculates the Codon Adaptation Index for a given nucleotide
   sequence, given a reference codon usage table. The CAI is a
   simple, effective measure of synonymous codon usage bias. The index
   assesses the extent to which selection has been effective in moulding
   the pattern of codon usage. In that respect it is useful for predicting
   the level of expression of a gene, for assessing the adaptation of
   viral genes to their hosts, and for making comparisons of codon usage
   in different organisms. The index may also give an approximate
   indication of the likely success of heterologous gene expression.

Algorithm

   The CAI index uses a reference set of highly expressed genes from a
   species to assess the relative merits of each codon. A score for a gene
   sequence is calculated from the frequency of use of all codons in that
   gene sequence.
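
   As an illustration of the calculation described above, the following
   Python sketch follows the definition in Sharp and Li (1987): each codon
   gets a relative adaptiveness w (its reference count divided by the count
   of the most used synonymous codon), and the score is the geometric mean
   of w over the codons of the gene. It is not the EMBOSS source code. The
   function names, the plain dictionary of reference counts, the 0.5
   substitute for zero counts, and the exclusion of Met, Trp and stop
   codons are assumptions of this sketch; cai itself may treat these cases
   differently.

```python
import math
from collections import defaultdict

# Standard genetic code (translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def relative_adaptiveness(ref_counts):
    """Return w for each codon: count / highest count in its synonymous family."""
    families = defaultdict(list)
    for codon, aa in CODON_TO_AA.items():
        families[aa].append(codon)

    weights = {}
    for aa, codons in families.items():
        # Stop codons and single-codon amino acids (Met, Trp) carry no
        # information about synonymous usage, so they are left out.
        if aa == "*" or len(codons) == 1:
            continue
        # Assumption of this sketch: a zero count is replaced by 0.5 so
        # that the logarithm below is always defined.
        counts = {c: max(ref_counts.get(c, 0), 0.5) for c in codons}
        top = max(counts.values())
        for codon, n in counts.items():
            weights[codon] = n / top
    return weights


def cai(sequence, weights):
    """Geometric mean of w over the scorable codons of an in-frame sequence."""
    seq = sequence.upper().replace("U", "T")
    logs = []
    for i in range(0, len(seq) - 2, 3):
        w = weights.get(seq[i:i + 3])
        if w is not None:  # skips Met, Trp, stops and ambiguous codons
            logs.append(math.log(w))
    if not logs:
        raise ValueError("sequence contains no scorable codons")
    return math.exp(sum(logs) / len(logs))
```

   A gene built only from the most used codon of each family scores 1.0;
   every less favoured codon pulls the geometric mean down.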

Usage

   Here is a sample session with cai


% cai TEMBL:AB009602
Calculate codon adaptation index
Codon usage file [Eyeast_cai.cut]:
Output file [ab009602.cai]:



Command line arguments

Calculate codon adaptation index
Version: EMBOSS:6.6.0.0

   Standard (Mandatory) qualifiers:
  [-seqall]            seqall     Nucleotide sequence(s) filename and optional
                                  format, or reference (input USA)
   -cfile              codon      [Eyeast_cai.cut] Codon usage table name
  [-outfile]           outfile    [*.cai] Output file name

   Additional (Optional) qualifiers: (none)
   Advanced (Unprompted) qualifiers: (none)
   Associated qualifiers:

   "-seqall" associated qualifiers
   -sbegin1            integer    Start of each sequence to be used
   -send1              integer    End of each sequence to be used
   -sreverse1          boolean    Reverse (if DNA)
   -sask1              boolean    Ask for begin/end/reverse
   -snucleotide1       boolean    Sequence is nucleotide
   -sprotein1          boolean    Sequence is protein
   -slower1            boolean    Make lower case
   -supper1            boolean    Make upper case
   -scircular1         boolean    Sequence is circular
   -squick1            boolean    Read id and sequence only
   -sformat1           string     Input sequence format
   -iquery1            string     Input query fields or ID list
   -ioffset1           integer    Input start position offset
   -sdbname1           string     Database name
   -sid1               string     Entryname
   -ufo1               string     UFO features
   -fformat1           string     Features format
   -fopenfile1         string     Features file name

   "-cfile" associated qualifiers
   -format             string     Data format

   "-outfile" associated qualifiers
   -odirectory2        string     Output directory

   General qualifiers:
   -auto               boolean    Turn off prompts
   -stdout             boolean    Write first file to standard output
   -filter             boolean    Read first file from standard input, write
                                  first file to standard output
   -options            boolean    Prompt for standard and additional values
   -debug              boolean    Write debug output to program.dbg
   -verbose            boolean    Report some/full command line options
   -help               boolean    Report command line options and exit. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning            boolean    Report warnings
   -error              boolean    Report errors
   -fatal              boolean    Report fatal errors
   -die                boolean    Report dying program messages
   -version            boolean    Report version number and exit
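
   For scripted use, the qualifiers listed above can be assembled into a
   command that runs without prompting. The Python sketch below only
   builds the argument list; the helper name build_cai_command is
   hypothetical, and the qualifiers it uses (-seqall, -cfile, -outfile,
   -auto) are the ones documented in this section.

```python
def build_cai_command(sequence_usa, outfile, cfile="Eyeast_cai.cut"):
    """Return an argument list for a non-interactive cai run.

    sequence_usa is the input sequence reference (USA), for example
    "TEMBL:AB009602"; -auto turns off the interactive prompts.
    """
    return [
        "cai",
        "-seqall", sequence_usa,
        "-cfile", cfile,
        "-outfile", outfile,
        "-auto",
    ]


# Hypothetical usage, assuming EMBOSS is installed and on the PATH:
#   import subprocess
#   subprocess.run(build_cai_command("TEMBL:AB009602", "ab009602.cai"), check=True)
```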


Input file format

   cai reads a nucleic acid sequence of a gene.

  Input files for usage example

  Database entry: TEMBL:AB009602

ID   AB009602; SV 1; linear; mRNA; STD; FUN; 561 BP.
XX
AC   AB009602;
XX
DT   15-DEC-1997 (Rel. 53, Created)
DT   14-APR-2005 (Rel. 83, Last updated, Version 2)
XX
DE   Schizosaccharomyces pombe mRNA for MET1 homolog, partial cds.
XX
KW   MET1 homolog.
XX
OS   Schizosaccharomyces pombe (fission yeast)
OC   Eukaryota; Fungi; Dikarya; Ascomycota; Taphrinomycotina;
OC   Schizosaccharomycetes; Schizosaccharomycetales; Schizosaccharomycetaceae;
OC   Schizosaccharomyces.
XX
RN   [1]
RP   1-561
RA   Kawamukai M.;
RT   ;
RL   Submitted (07-DEC-1997) to the INSDC.
RL   Makoto Kawamukai, Shimane University, Life and Environmental Science; 1060
RL   Nishikawatsu, Matsue, Shimane 690, Japan
RL   (E-mail:kawamuka@life.shimane-u.ac.jp, Tel:0852-32-6587, Fax:0852-32-6499)
XX
RN   [2]
RP   1-561
RA   Kawamukai M.;
RT   "S.pmbe MET1 homolog";
RL   Unpublished.
XX
DR   EnsemblGenomes; SPCC1739.06c; Schizosaccharomyces_pombe.
DR   EnsemblGenomes; SPCC1739.06c.1; Schizosaccharomyces_pombe.
DR   PomBase; SPCC1739.06c.
DR   PomBase; SPCC1739.06c.1.
XX
FH   Key             Location/Qualifiers
FH
FT   source          1..561
FT                   /organism="Schizosaccharomyces pombe"
FT                   /mol_type="mRNA"
FT                   /clone_lib="pGAD GH"
FT                   /db_xref="taxon:4896"
FT   CDS             <1..275
FT                   /codon_start=3
FT                   /transl_table=1
FT                   /product="MET1 homolog"
FT                   /db_xref="GOA:O74468"
FT                   /db_xref="InterPro:IPR000878"
FT                   /db_xref="InterPro:IPR003043"
FT                   /db_xref="InterPro:IPR006366"
FT                   /db_xref="InterPro:IPR012066"
FT                   /db_xref="InterPro:IPR014776"
FT                   /db_xref="InterPro:IPR014777"
FT                   /db_xref="InterPro:IPR016040"
FT                   /db_xref="UniProtKB/Swiss-Prot:O74468"
FT                   /protein_id="BAA23999.1"
FT                   /translation="SMPKIPSFVPTQTTVFLMALHRLEILVQALIESGWPRVLPVCIAE
FT                   RVSCPDQRFIFSTLEDVVEEYNKYESLPPGLLITGYSCNTLRNTA"
XX
SQ   Sequence 561 BP; 135 A; 106 C; 98 G; 222 T; 0 other;
     gttcgatgcc taaaatacct tcttttgtcc ctacacagac cacagttttc ctaatggctt        60
     tacaccgact agaaattctt gtgcaagcac taattgaaag cggttggcct agagtgttac       120
     cggtttgtat agctgagcgc gtctcttgcc ctgatcaaag gttcattttc tctactttgg       180
     aagacgttgt ggaagaatac aacaagtacg agtctctccc ccctggtttg ctgattactg       240
     gatacagttg taataccctt cgcaacaccg cgtaactatc tatatgaatt attttccctt       300
     tattatatgt agtaggttcg tctttaatct tcctttagca agtcttttac tgttttcgac       360
     ctcaatgttc atgttcttag gttgttttgg ataatatgcg gtcagtttaa tcttcgttgt       420
     ttcttcttaa aatatttatt catggtttaa tttttggttt gtacttgttc aggggccagt       480
     tcattattta ctctgtttgt atacagcagt tcttttattt ttagtatgat tttaatttaa       540
     aacaattcta atggtcaaaa a                                                 561
//

Output file format

   cai writes the Codon Adaptation Index to the output file.

  Output files for usage example

  File: ab009602.cai

Sequence: AB009602 CAI: 0.188
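
   Downstream scripts may want the values rather than the text. The
   following Python sketch reads lines of the form shown above into a
   dictionary; the function name parse_cai_output is hypothetical, and the
   pattern assumes one "Sequence: <name> CAI: <value>" line per input
   sequence, as in this example.

```python
import re

# Matches lines such as "Sequence: AB009602 CAI: 0.188".
CAI_LINE = re.compile(r"^Sequence:\s+(\S+)\s+CAI:\s+([0-9.]+)$")


def parse_cai_output(text):
    """Map each sequence name in a cai output file to its CAI value."""
    results = {}
    for line in text.splitlines():
        match = CAI_LINE.match(line.strip())
        if match:
            results[match.group(1)] = float(match.group(2))
    return results
```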

Data files

   cai requires a reference codon usage table prepared from a set of genes
   which are known to be highly expressed. This is specified by the -cfile
   option and must exist in the EMBOSS data directory. The default codon
   usage table Eyeast_cai.cut is the standard set of Saccharomyces
   cerevisiae highly expressed gene codon frequencies. Another table
   (Eschpo_cai.cut) was prepared from a set of Schizosaccharomyces pombe
   genes by Peter Rice for the S. pombe sequencing team at the Sanger
   Centre, and is available in the EMBOSS data directory. You should
   prepare your own codon usage table for your organism of interest.

   EMBOSS data files are distributed with the application and stored in
   the standard EMBOSS data directory, which is defined by the EMBOSS
   environment variable EMBOSS_DATA.

   To see the available EMBOSS data files, run:

% embossdata -showall

   To fetch one of the data files (for example 'Exxx.dat') into your
   current directory for you to inspect or modify, run:

% embossdata -fetch -file Exxx.dat


   Users can provide their own data files in their own directories.
   Project specific files can be put in the current directory, or for
   tidier directory listings in a subdirectory called ".embossdata". Files
   for all EMBOSS runs can be put in the user's home directory, or again
   in a subdirectory called ".embossdata".

   The directories are searched in the following order:
     * . (your current directory)
     * .embossdata (under your current directory)
     * ~/ (your home directory)
     * ~/.embossdata
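
   The search order above can be sketched in Python as follows. This
   covers only the four user directories listed; it does not model the
   lookup in the standard EMBOSS data directory, and the function name
   find_user_data_file is hypothetical.

```python
from pathlib import Path


def find_user_data_file(name, cwd=None, home=None):
    """Return the first match for a data file in the user directories.

    Directories are tried in the documented order: current directory,
    ./.embossdata, home directory, ~/.embossdata. Returns None if the
    file is in none of them.
    """
    cwd = Path(cwd) if cwd is not None else Path.cwd()
    home = Path(home) if home is not None else Path.home()
    for directory in (cwd, cwd / ".embossdata", home, home / ".embossdata"):
        candidate = directory / name
        if candidate.is_file():
            return candidate
    return None
```

   Because the current directory is tried first, a project-specific copy
   of a codon usage table takes precedence over one kept in the home
   directory.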

Notes

   Codons are nucleotide triplets, each of which encodes an amino acid
   residue in a polypeptide chain. There are four possible nucleotides in
   DNA: adenine (A), guanine (G), cytosine (C) and thymine (T). There are
   therefore 64 possible triplets to encode the 20 amino acids plus the
   translation termination signal. The encoding is redundant, with all but
   two amino acids (methionine and tryptophan) coded for by more than one
   triplet. Organisms often have a particular preference for one of the
   possible codons for a given amino acid.
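
   The counts in the paragraph above can be checked with a few lines of
   Python. The sketch assumes the standard genetic code (translation
   table 1, the table named in the example entry's /transl_table
   qualifier); the variable names are illustrative only.

```python
from collections import Counter
from itertools import product

# Standard genetic code, with bases ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]
codon_to_aa = dict(zip(codons, AMINO))

# Number of codons per amino acid ("*" marks translation termination).
degeneracy = Counter(codon_to_aa.values())
amino_acids = sorted(aa for aa in degeneracy if aa != "*")
single_codon = sorted(aa for aa in amino_acids if degeneracy[aa] == 1)

print(len(codons))        # 64 possible triplets
print(len(amino_acids))   # 20 amino acids
print(degeneracy["*"])    # 3 termination codons
print(single_codon)       # ['M', 'W']: methionine and tryptophan
```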

   Codon preferences reflect a balance between mutational bias and
   selection for efficiency of translation. In fast-growing microorganisms
   there are optimal codons that reflect the composition of the genomic
   tRNA pool and probably help achieve faster translation rates and high
   accuracy. Such selection is expected to be strong in highly expressed
   genes, as is the case for Escherichia coli or Saccharomyces cerevisiae.
   In contrast, codon usage optimization is normally absent in organisms
   with slower growth rates such as Homo sapiens (human), where codon
   preferences are determined by mutational biases characteristic of a
   particular genome.

   Various factors are thought to influence codon usage bias in bacteria,
   including the gene expression level already mentioned, %G+C composition
   (reflecting horizontal gene transfer or mutational bias), GC skew
   (reflecting strand-specific mutational bias), amino acid conservation,
   protein hydropathy, transcriptional selection, RNA stability, and
   optimal growth temperature.

   Various methods have been used to analyze codon usage bias. CAI and
   methods such as the 'frequency of optimal codons' (Fop) are commonly
   used to predict gene expression levels. Others such as the 'effective
   number of codons' (Nc) and Shannon entropy are used to measure codon
   usage evenness, whereas multivariate statistical methods, including
   correspondence analysis and principal component analysis, may be used
   to analyze variations in codon usage between genes.

References

    1. Sharp PM., Li W-H. "The codon adaptation index - a measure of
       directional synonymous codon usage bias, and its potential
       applications." Nucleic Acids Research 1987 vol 15, pp 1281-1295.
    2. Synonymous codon usage in bacteria. Curr Issues Mol Biol. 2001
       Oct;3(4):91-7.

Warnings

   None.

Diagnostic Error Messages

   None.

Exit status

   It always exits with status 0.

Known bugs

   None.

See also

   Program name     Description
   chips            Calculate Nc codon usage statistic
   codcmp           Codon usage table comparison
   codcopy          Copy and reformat a codon usage table
   cusp             Create a codon usage table from nucleotide sequence(s)
   syco             Draw synonymous codon usage statistic plot for a nucleotide
                    sequence

Author(s)

   Alan Bleasby
   European Bioinformatics Institute, Wellcome Trust Genome Campus,
   Hinxton, Cambridge CB10 1SD, UK

   Please report all bugs to the EMBOSS bug team
   (emboss-bug (c) emboss.open-bio.org) not to the original author.

History

   Written (March 2001) - Alan Bleasby.

Target users

   This program is intended to be used by everyone and everything, from
   naive users to embedded scripts.

Comments

   None