complex

 

Wiki

The master copies of EMBOSS documentation are available at http://emboss.open-bio.org/wiki/Appdocs on the EMBOSS Wiki.

Please help by correcting and extending the Wiki pages.

Function

Find the linguistic complexity in nucleotide sequences

Description

Usage

Here is a sample session with complex


% complex -omnia 
Find the linguistic complexity in nucleotide sequences
Input nucleotide sequence(s): tembl:x*
Window length [100]: 
Step size [5]: 
Minimum word length [4]: 
Maximum word length [6]: 
Program complex output file [x59796.complex]: 
output sequence(s) [x59796.fasta]: 

Go to the input files for this example
Go to the output files for this example

Command line arguments

Find the linguistic complexity in nucleotide sequences
Version: EMBOSS:6.3.1

   Standard (Mandatory) qualifiers (* if not always prompted):
  [-sequence]          seqall     Nucleotide sequence(s) filename and optional
                                  format, or reference (input USA)
   -lwin               integer    [100] Window length (Integer 1 or more)
   -step               integer    [5] Displacement of the window over the
                                  sequence (Integer 1 or more)
   -jmin               integer    [4] Minimum word length (Integer from 2 to
                                  20)
   -jmax               integer    [6] Maximum word length (Integer from 2 to
                                  50)
  [-outfile]           outfile    [*.complex] Program complex output file
*  -outseq             seqoutall  [.] Sequence set(s)
                                  filename and optional format (output USA)

   Additional (Optional) qualifiers: (none)
   Advanced (Unprompted) qualifiers:
   -omnia              toggle     [N] Calculate over a set of sequences
   -sim                integer    [0] Calculate the linguistic complexity by
                                  comparison with a number of simulations
                                  having a uniform distribution of bases (Any
                                  integer value)
   -freq               boolean    [N] Execute the simulation of a sequence
                                  based on the base frequency of the original
                                  sequence
   -print              boolean    [N] Generate a file named UjTable containing
                                  the values of Uj for each word j in the
                                  real sequence(s) and in any simulated
                                  sequences
   -ujtablefile        outfile    [complex.ujtable] Program complex temporary
                                  output file

   Associated qualifiers:

   "-sequence" associated qualifiers
   -sbegin1            integer    Start of each sequence to be used
   -send1              integer    End of each sequence to be used
   -sreverse1          boolean    Reverse (if DNA)
   -sask1              boolean    Ask for begin/end/reverse
   -snucleotide1       boolean    Sequence is nucleotide
   -sprotein1          boolean    Sequence is protein
   -slower1            boolean    Make lower case
   -supper1            boolean    Make upper case
   -sformat1           string     Input sequence format
   -sdbname1           string     Database name
   -sid1               string     Entryname
   -ufo1               string     UFO features
   -fformat1           string     Features format
   -fopenfile1         string     Features file name

   "-outfile" associated qualifiers
   -odirectory2        string     Output directory

   "-ujtablefile" associated qualifiers
   -odirectory         string     Output directory

   "-outseq" associated qualifiers
   -osformat           string     Output seq format
   -osextension        string     File name extension
   -osname             string     Base file name
   -osdirectory        string     Output directory
   -osdbname           string     Database name to add
   -ossingle           boolean    Separate file for each entry
   -oufo               string     UFO features
   -offormat           string     Features format
   -ofname             string     Features file name
   -ofdirectory        string     Output directory

   General qualifiers:
   -auto               boolean    Turn off prompts
   -stdout             boolean    Write first file to standard output
   -filter             boolean    Read first file from standard input, write
                                  first file to standard output
   -options            boolean    Prompt for standard and additional values
   -debug              boolean    Write debug output to program.dbg
   -verbose            boolean    Report some/full command line options
   -help               boolean    Report command line options and exit. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning            boolean    Report warnings
   -error              boolean    Report errors
   -fatal              boolean    Report fatal errors
   -die                boolean    Report dying program messages
   -version            boolean    Report version number and exit

Qualifier Type Description Allowed values Default
Standard (Mandatory) qualifiers
[-sequence]
(Parameter 1)
seqall Nucleotide sequence(s) filename and optional format, or reference (input USA) Readable sequence(s) Required
-lwin integer Window length Integer 1 or more 100
-step integer Displacement of the window over the sequence Integer 1 or more 5
-jmin integer Minimum word length Integer from 2 to 20 4
-jmax integer Maximum word length Integer from 2 to 50 6
[-outfile]
(Parameter 2)
outfile Program complex output file Output file <*>.complex
-outseq seqoutall Sequence set(s) filename and optional format (output USA) Writeable sequence(s) <*>.format
Additional (Optional) qualifiers
(none)
Advanced (Unprompted) qualifiers
-omnia toggle Calculate over a set of sequences Toggle value Yes/No No
-sim integer Calculate the linguistic complexity by comparison with a number of simulations having a uniform distribution of bases Any integer value 0
-freq boolean Execute the simulation of a sequence based on the base frequency of the original sequence Boolean value Yes/No No
-print boolean Generate a file named UjTable containing the values of Uj for each word j in the real sequence(s) and in any simulated sequences Boolean value Yes/No No
-ujtablefile outfile Program complex temporary output file Output file complex.ujtable
Associated qualifiers
"-sequence" associated seqall qualifiers
-sbegin1
-sbegin_sequence
integer Start of each sequence to be used Any integer value 0
-send1
-send_sequence
integer End of each sequence to be used Any integer value 0
-sreverse1
-sreverse_sequence
boolean Reverse (if DNA) Boolean value Yes/No N
-sask1
-sask_sequence
boolean Ask for begin/end/reverse Boolean value Yes/No N
-snucleotide1
-snucleotide_sequence
boolean Sequence is nucleotide Boolean value Yes/No N
-sprotein1
-sprotein_sequence
boolean Sequence is protein Boolean value Yes/No N
-slower1
-slower_sequence
boolean Make lower case Boolean value Yes/No N
-supper1
-supper_sequence
boolean Make upper case Boolean value Yes/No N
-sformat1
-sformat_sequence
string Input sequence format Any string  
-sdbname1
-sdbname_sequence
string Database name Any string  
-sid1
-sid_sequence
string Entryname Any string  
-ufo1
-ufo_sequence
string UFO features Any string  
-fformat1
-fformat_sequence
string Features format Any string  
-fopenfile1
-fopenfile_sequence
string Features file name Any string  
"-outfile" associated outfile qualifiers
-odirectory2
-odirectory_outfile
string Output directory Any string  
"-ujtablefile" associated outfile qualifiers
-odirectory string Output directory Any string  
"-outseq" associated seqoutall qualifiers
-osformat string Output seq format Any string  
-osextension string File name extension Any string  
-osname string Base file name Any string  
-osdirectory string Output directory Any string  
-osdbname string Database name to add Any string  
-ossingle boolean Separate file for each entry Boolean value Yes/No N
-oufo string UFO features Any string  
-offormat string Features format Any string  
-ofname string Features file name Any string  
-ofdirectory string Output directory Any string  
General qualifiers
-auto boolean Turn off prompts Boolean value Yes/No N
-stdout boolean Write first file to standard output Boolean value Yes/No N
-filter boolean Read first file from standard input, write first file to standard output Boolean value Yes/No N
-options boolean Prompt for standard and additional values Boolean value Yes/No N
-debug boolean Write debug output to program.dbg Boolean value Yes/No N
-verbose boolean Report some/full command line options Boolean value Yes/No Y
-help boolean Report command line options and exit. More information on associated and general qualifiers can be found with -help -verbose Boolean value Yes/No N
-warning boolean Report warnings Boolean value Yes/No Y
-error boolean Report errors Boolean value Yes/No Y
-fatal boolean Report fatal errors Boolean value Yes/No Y
-die boolean Report dying program messages Boolean value Yes/No Y
-version boolean Report version number and exit Boolean value Yes/No N

Input file format

Input files for usage example

'tembl:x*' is a sequence entry in the example nucleic acid database 'tembl'

Output file format

Sequence TEMBL:HHTETRA contains repeats and is included in the test database for repeat analysis.

Output files for usage example

File: complex.ujtable


File: x59796.complex

Length of window : 100 
jmin : 4 
jmax : 6 
step : 5 
Execution without simulation 
----------------------------------------------------------------------------
|                  |                  |                  |                  |
|     number of    |      name of     |     length of    |      value of    |
|     sequence     |     sequence     |     sequence     |     complexity   |
|                  |                  |                  |                  |
----------------------------------------------------------------------------
         1                      X59796           3170             0.6921 
         2                      X65923            518             0.6739 
         3                      X65921           2016             0.7105 
         4                      X51466           3075             0.6925 
         5                      X07523           1658             0.7314 
         6                      X03487            512             0.5609 
         7                      X03488           1132             0.7217 
         8                      X07797           1675             0.6201 
         9                      X51872           1832             0.6916 
        10                      X77160           1212             0.6596 
        11                      X13776           2167             0.6562 
        12                      X77161           1130             0.6989 

File: x59796.fasta

>X59796 X59796.1 H.sapiens mRNA for cadherin-5
ctccactcacgctcagccctggacggacaggcagtccaacggaacagaaacatccctcag
cccacaggcacgatctgttcctcctgggaagatgcagaggctcatgatgctcctcgccac
atcgggcgcctgcctgggcctgctggcagtggcagcagtggcagcagcaggtgctaaccc
tgcccaacgggacacccacagcctgctgcccacccaccggcgccaaaagagagattggat
ttggaaccagatgcacattgatgaagagaaaaacacctcacttccccatcatgtaggcaa
gatcaagtcaagcgtgagtcgcaagaatgccaagtacctgctcaaaggagaatatgtggg
caaggtcttccgggtcgatgcagagacaggagacgtgttcgccattgagaggctggaccg
ggagaatatctcagagtaccacctcactgctgtcattgtggacaaggacactggcgaaaa
cctggagactccttccagcttcaccatcaaagttcatgacgtgaacgacaactggcctgt
gttcacgcatcggttgttcaatgcgtccgtgcctgagtcgtcggctgtggggacctcagt
catctctgtgacagcagtggatgcagacgaccccactgtgggagaccacgcctctgtcat
gtaccaaatcctgaaggggaaagagtattttgccatcgataattctggacgtattatcac
aataacgaaaagcttggaccgagagaagcaggccaggtatgagatcgtggtggaagcgcg
agatgcccagggcctccggggggactcgggcacggccaccgtgctggtcactctgcaaga
catcaatgacaacttccccttcttcacccagaccaagtacacatttgtcgtgcctgaaga
cacccgtgtgggcacctctgtgggctctctgtttgttgaggacccagatgagccccagaa
ccggatgaccaagtacagcatcttgcggggcgactaccaggacgctttcaccattgagac
aaaccccgcccacaacgagggcatcatcaagcccatgaagcctctggattatgaatacat
ccagcaatacagcttcatagtcgaggccacagaccccaccatcgacctccgatacatgag
ccctcccgcgggaaacagagcccaggtcattatcaacatcacagatgtggacgagccccc
cattttccagcagcctttctaccacttccagctgaaggaaaaccagaagaagcctctgat
tggcacagtgctggccatggaccctgatgcggctaggcatagcattggatactccatccg
caggaccagtgacaagggccagttcttccgagtcacaaaaaagggggacatttacaatga
gaaagaactggacagagaagtctacccctggtataacctgactgtggaggccaaagaact
ggattccactggaacccccacaggaaaagaatccattgtgcaagtccacattgaagtttt
ggatgagaatgacaatgccccggagtttgccaagccctaccagcccaaagtgtgtgagaa
cgctgtccatggccagctggtcctgcagatctccgcaatagacaaggacataacaccacg
aaacgtgaagttcaaattcatcttgaatactgagaacaactttaccctcacggataatca
cgataacacggccaacatcacagtcaagtatgggcagtttgaccgggagcataccaaggt
ccacttcctacccgtggtcatctcagacaatgggatgccaagtcgcacgggcaccagcac
gctgaccgtggccgtgtgcaagtgcaacgagcagggcgagttcaccttctgcgaggatat
ggccgcccaggtgggcgtgagcatccaggcagtggtagccatcttactctgcatcctcac
catcacagtgatcaccctgctcatcttcctgcggcggcggctccggaagcaggcccgcgc
gcacggcaagagcgtgccggagatccacgagcagctggtcacctacgacgaggagggcgg
cggcgagatggacaccaccagctacgatgtgtcggtgctcaactcggtgcgccgcggcgg
ggccaagcccccgcggcccgcgctggacgcccggccttccctctatgcgcaggtgcagaa
gccaccgaggcacgcgcctggggcacacggagggcccggggagatggcagccatgatcga
ggtgaagaaggacgaggcggaccacgacggcgacggccccccctacgacacgctgcacat
ctacggctacgagggctccgagtccatagccgagtccctcagctccctgggcaccgactc
atccgactctgacgtggattacgacttccttaacgactggggacccaggtttaagatgct
ggctgagctgtacggctcggacccccgggaggagctgctgtattaggcggccgaggtcac
tctgggcctggggacccaaaccccctgcagcccaggccagtcagactccaggcaccacag
cvncadctccaaaaatggcagtgactccccagcccagcaccccttcctcgtgggtcccag
agacctcatcagccttgggatagcaaactccaggttcctgaaatatccaggaatatatgt
cagtgatgactattctcaaatgctggcaaatccaggctggtgttctgtctgggctcagac
atccacataaccctgtcacccacagaccgccgtctaactcaaagacttcctctggctccc
caaggctgcaaagcaaaacagactgtgtttaactgctgcagggtctttttctagggtccc
tgaacgccctggtaaggctggtgaggtcctggtgcctatctgcctggaggcaaaggcctg
gacagcttgacttgtggggcaggattctctgcagcccattcccaagggagactgaccatc


  [Part of this file has been deleted for brevity]

tacggttcctcgtgggctgctacatgtcgcacacgcgcaaggcggtgatgccggtggtcg
agcgcgccgacgcgctgctctgctacccgaccccctacgagggcttcgagtattcgccga
acatcgtctacggcggtccggcgccgaaccagaacagtgcgccgctggcggcgtacctga
ttcgccactacggcgagcgggtggtgttcatcggctcggactacatctatccgcgggaaa
gcaaccatgtgatgcgccacctgtatcgccagcacggcggcacggtgctcgaggaaatct
acattccgctgtatccctccgacgacgacttgcagcgcgccgtcgagcgcatctaccagg
cgcgcgccgacgtggtcttctccaccgtggtgggcaccggcaccgccgagctgtatcgcg
ccatcgcccgtcgctacggcgacggcaggcggccgccgatcgccagcctgaccaccagcg
aggcggaggtggcgaagatggagagtgacgtggcagaggggcaggtggtggtcgcgcctt
acttctccagcatcgatacgcccgccagccgggccttcgtccaggcctgccatggtttct
tcccggagaacgcgaccatcaccgcctgggccgaggcggcctactggcagaccttgttgc
tcggccgcgccgcgcaggccgcaggcaactggcgggtggaagacgtgcagcggcacctgt
acgacatcgacatcgacgcgccacaggggccggtccgggtggagcgccagaacaaccaca
gccgcctgtcttcgcgcatcgcggaaatcgatgcgcgcggcgtgttccaggtccgctggc
agtcgcccgaaccgattcgccccgacccttatgtcgtcgtgcataacctcgacgactggt
ccgccagcatgggcgggggaccgctcccatgagcgccaactcgctgctcggcagcctgcg
cgagttgcaggtgctggtcctcaacccgccgggggaggtcagcgacgccctggtcttgca
gctgatccgcatcggttgttcggtgcgccagtgctggccgccgccggaagccttcgacgt
gccggtggacgtggtcttcaccagcattttccagaatggccaccacgacgagatcgctgc
gctgctcgccgccgggactccgcgcactaccctggtggcgctggtggagtacgaaagccc
cgcggtgctctcgcagatcatcgagctggagtgccacggcgtgatcacccagccgctcga
tgcccaccgggtgctgcctgtgctggtatcggcgcggcgcatcagcgaggaaatggcgaa
gctgaagcagaagaccgagcagctccaggaccgcatcgccggccaggcccggatcaacca
ggccaaggtgttgctgatgcagcgccatggctgggacgagcgcgaggcgcaccagcacct
gtcgcgggaagcgatgaagcggcgcgagccgatcctgaagatcgctcaggagttgctggg
aaacgagccgtccgcctgagcgatccgggccgaccagaacaataacaagaggggtatcgt
catcatgctgggactggttctgctgtacgttggcgcggtgctgtttctcaatgccgtctg
gttgctgggcaagatcagcggtcgggaggtggcggtgatcaacttcctggtcggcgtgct
gagcgcctgcgtcgcgttctacctgatcttttccgcagcagccgggcagggctcgctgaa
ggccggagcgctgaccctgctattcgcttttacctatctgtgggtggccgccaaccagtt
cctcgag
>X77161 X77161.1 Pseudomonas aeruginosa (PAC1) amiS gene.
gagccgtccgcctgagcgatccgggccgaccagaacaataacaagaggggtatcgtcatc
atgctgggactggttctgctgtacgttggcgcggtgctgtttctcaatgccgtctggttg
ctgggcaagatcagcggtcgggaggtggcggtgatcaacttcctggtcggcgtgctgagc
gcctgcgtcgcgttctacctgatcttttccgcagcagccgggcagggctcgctgaaggcc
ggagcgctgaccctgctattcgcttttacctatctgtgggtggccgccaaccagttcctc
gaggtggacggcaagggcctcggctggttctgcctgttcgtcagcctcaccgcctgcacc
gtggcgatcgagtcgttcgccggcgccagtggtccgttcggcctgtggaacgcggtcaac
tggacagtctgggcgttgctctggttctgtttcttcctgctgctggggctgtcccgcggc
atccagaagccggtggcctacctgaccctggccagcgccatattcaccgcctggttgccc
ggcctgctgctgctcggacaggtgctcaaggcatagcaggaagtcggaaagggatgacgg
cttgccgccatcccgtcccttccgaacgcctagccgagcggccagttgatcaccacgacg
gcgtcgttgtagtcgttgtcggtgccgtcttcagagccgaccagggcgaagttcagctcg
ttggtcaggattacctgtgccgagaccagatccgaggggcggccgttgacgctgacctgg
acctgtaccttgccactgctgccggagttgagcacctgggtgccgatgacggcgttattg
gtgctttgcccgctgaaggtcgcggccgtgctcgttgttgaccagcacgttcaccgtctg
ggttccggacgagttggcgaaggcggtgacgccggaacctggttgttggcgggaagggtg
aacactccttgtggttgccatggtggtatctccactgaatacctggccccttccttttca
ggcagccgtctggcgcgcggtatggcgtgtcgggagaaatccgcagtccttggcggcagg
cgatgcgcaggcaggaaggacgcatcgttcagccaatctacgccgtcgac

Data files

None

Notes

None

References

None

Warnings

None

Diagnostic Error Messages

None

Exit status

Always exits with status 0

Known bugs

None

See also

Program name Description
banana Plot bending and curvature data for B-DNA
btwisted Calculate the twisting in a B-DNA sequence
chaos Draw a chaos game representation plot for a nucleotide sequence
compseq Calculate the composition of unique words in sequences
dan Calculates nucleic acid melting temperature
density Draw a nucleic acid density plot
freak Generate residue/base frequency table or plot
isochore Plots isochores in DNA sequences
sirna Finds siRNA duplexes in mRNA
wordcount Count and extract unique words in molecular sequence(s)

Author(s)

Target users

This program is intended to be used by everyone and everything, from naive users to embedded scripts.

Comments

None