## File: wise2api.tex

\documentstyle{article}
\pdftrailerid{}
\begin{document}

\newcommand{\programtext}[1]{{\tt #1}}

\title{Wise2 API (version 2.1.19b)}
\author{Ewan Birney\\
Sanger Centre\\
Wellcome Trust Genome Campus\\
Hinxton, Cambridge CB10 1SA,\\
England.\\
Email: birney@sanger.ac.uk}
\date{18/6/2001}

\maketitle
\newpage
\tableofcontents
\newpage

\section{Overview}

This document describes the API of the Wise2 system. The API
(application programming interface) allows other programmers to use the
functionality in the Wise2 package directly, rather than treating the
executables as a black box through which you get ASCII output.
If you want to learn more about the Wise2 package itself, the
algorithms in it or what it is used for, look for the Wise2
documentation (available as postscript), probably in the same place
that you found this documentation!

The API is accessible in 3 different ways:

\begin{itemize}
\item As C function calls made inside the Wise2 package namespace.
This is the way the current executables (eg, genewise) access the API.
\item As C function calls made from outside the Wise2 package
namespace. This is for people writing C programs with their own set of
functions who do not want name clashes of things like ``Sequence''
(in this API the name is exported as ``Wise2\_Sequence'').
\item As a Perl API, using the XS extension code, where C function
calls which are dynamically loaded into the Perl interpreter can be
executed as if they were standard Perl commands.
\end{itemize}

Probably the most usable is the Perl API. Perl is a very forgiving
language, and it is easier to learn for novice programmers - in
particular memory management is handled for you. For people who want
to use the Wise2 API from inside their own C program, I would use the
external API. For people who want to extend Wise2 programs to do other
things, I would use the internal API.

\section{WARNING - still in alpha}

After playing around with the API for a while, I have realised that a
number of things are not clean enough in the interface. I do not
currently consider the API of the 2.1.x series stable. An aim for the
2.2 series is to make a stable and useful API that is well documented.

However, this API does work, and there is this documentation for it,
so it may be worth playing around with for people who like this sort
of thing. Anyone who uses the API gets huge guru points from me...

\section{API generation}

The API is not manually generated but rather is generated by the
Dynamite compiler.
Dynamite is a language which I wrote specifically for the Wise2
project: it is a cranky but useful language based heavily on C (it
converts its source code to C), with a portion dedicated to dynamic
programming code (a common algorithm in bioinformatics). It also has a
lightweight object model that supports scalars and lists of types.

Because the API is generated through Dynamite, you can expect
consistent documentation and memory handling of all the functions and
objects.

\section{Getting Started for the impatient}

Here are three different ways of using the Wise2 API to reverse
complement a sequence: once in Perl, once using the namespace-protected
API, and once using the internal API. These three programs all produce
the same output, using the same underlying code. It is only how the
programming is presented to the user (once in Perl, twice in C) which
changes.

\subsection{Perl reverse complement}

\begin{verbatim}
#!/usr/local/bin/perl

use Wise2; # loads in Wise2 api

$file = shift; # first argument

if( !defined $file ) {
    print "You must give a file to revcom for a reverse to work!\n";
    exit(1);
}

# read the sequence, then build its reverse complement
$seq = &Wise2::Sequence::read_fasta_file_Sequence($file);
$rev = $seq->revcomp();

print "Original sequence\n\n";
$seq->write_fasta(STDOUT);
print "Reversed sequence\n\n";
$rev->write_fasta(STDOUT);
\end{verbatim}

\subsection{Wise2 external API calls}

\begin{verbatim}
#include <stdio.h>
#include <stdlib.h>
#include "dyna_api.h"

int main(int argc,char ** argv)
{
  Wise2_Sequence * seq;
  Wise2_Sequence * rev;

  if( argc != 2 ) {
    fprintf(stderr,"have to give an argument for a file\n");
    exit(1);
  }

  seq = Wise2_read_fasta_file_Sequence(argv[1]);
  if( seq == NULL ) {
    fprintf(stderr,"Unable to read fasta file in %s\n",argv[1]);
    exit(1);
  }

  rev = Wise2_reverse_complement_Sequence(seq);

  printf("Original sequence\n\n");
  Wise2_write_fasta_Sequence(seq,stdout);
  printf("Revcomp sequence\n\n");
  Wise2_write_fasta_Sequence(rev,stdout);

  /* both objects are owned by this program, so release them */
  Wise2_free_Sequence(seq);
  Wise2_free_Sequence(rev);
  return 0;
}
\end{verbatim}

\subsection{Wise2 internal API calls}

\begin{verbatim}
#include <stdio.h>
#include <stdlib.h>
#include "dyna.h"

int main(int argc,char ** argv)
{
  Sequence * seq;
  Sequence * rev;

  if( argc != 2 ) {
    fprintf(stderr,"have to give an argument for a file\n");
    exit(1);
  }

  seq = read_fasta_file_Sequence(argv[1]);
  if( seq == NULL ) {
    fprintf(stderr,"Unable to read fasta file in %s\n",argv[1]);
    exit(1);
  }

  rev = reverse_complement_Sequence(seq);

  printf("Original sequence\n\n");
  write_fasta_Sequence(seq,stdout);
  printf("Revcomp sequence\n\n");
  write_fasta_Sequence(rev,stdout);

  /* both objects are owned by this program, so release them */
  free_Sequence(seq);
  free_Sequence(rev);
  return 0;
}
\end{verbatim}

\section{Navigating the source code}

The Wise2 API has a bewildering number of objects and functions, and
the biggest problem in using the API is knowing which objects can be
made from what. This next section walks you through, at an object
level, how to do some common tasks. This list is in no way complete,
but it is better than just browsing around the index.

A very good place to start is to read the scripts in the perl/scripts
area (halfwise.pl does not use the Wise2 API but all the others do).

\subsection{Making a translation of a DNA sequence}

\begin{itemize}
\item Build a codon table object from a file (\ref{object_CodonTable})
\item Build a sequence object, from a file or strings (\ref{object_Sequence})
\item Use the translate function on the Sequence object
\end{itemize}

\subsection{Comparing two sequences using Smith-Waterman}

\begin{itemize}
\item Build a Comparison matrix object from a file (\ref{object_CompMat})
\item Build two Sequence objects, from a file or strings (\ref{object_Sequence})
\item Optionally convert the Sequence objects into Protein objects (\ref{object_Protein}).
This ensures you have proteins
\item Read in the comparison matrix using CompMat (\ref{object_CompMat})
\item Use one of the algorithm calls in the sw\_wrap module (\ref{module_sw_wrap})
\item Show the alignment using a call in the seqaligndisplay module (\ref{module_seqaligndisplay})
\end{itemize}

\subsection{Running a Smith-Waterman search of a single protein sequence vs a db}

\begin{itemize}
\item Read in a sequence object and convert it to a protein object (\ref{object_Protein},\ref{object_Sequence})
\item Make a protein database from the single protein object (\ref{object_ProteinDB})
\item Make a protein database from a single fasta file (\ref{object_ProteinDB})
\item Using one of the calls to the sw\_wrap module, make a Hscore object (\ref{module_sw_wrap})
\item Show the Hscore object using a show function (\ref{object_Hscore})
\item Retrieve individual protein objects from the database by taking
out the DataEntry objects (\ref{object_DataEntry}) and passing them
into the ProteinDB object (\ref{object_ProteinDB}), giving you a
protein object
\item Optionally align them as in the above section
\end{itemize}

\subsection{Running a genewise on a single protein vs a single DNA sequence}

See the script genewise.pl in the distribution.

\begin{itemize}
\item Make a Sequence object from a string or a file (\ref{object_Sequence})
\item Make that a protein object (\ref{object_Protein})
\item Make a Sequence object from a string or a file (\ref{object_Sequence})
\item Make that a Genomic object (\ref{object_Genomic})
\item Add any additional repeat areas from external information to the genomic object
\item Read in gene frequency counts (\ref{object_GeneFrequency})
\item Read in a codon table (\ref{object_CodonTable})
\item Make a random DNA model (\ref{object_RandomModelDNA})
\item Make an algorithm type (\ref{module_gwrap})
\item Build an entire parameter set for genewise using Wise2::GeneParameter21\_wrap (\ref{module_gwrap})
\item Run the actual algorithm
(\ref{module_gwrap})
\item Show the alignment using genedisplay (\ref{module_gwrap})
\end{itemize}

\section{Concepts and overview of the API}

The API is organised in the following way. There are 4 main areas of
source code in the Wise2 package:

\begin{itemize}
\item wisebase - base memory, string and error handling libraries
\item dynlibsrc - generic bioinformatics objects
\item models - specific Wise2 objects
\item HMMer2 - HMMER 2 (Sean Eddy's HMM package)
\end{itemize}

The API is mainly derived from the dynlibsrc and models directories.
The API makes no distinction between one directory and another.

\section{The reference section}

The reference section is built automatically from the Dynamite source.
This means that the function names, argument lists and in nearly all
cases the documentation should be completely up to date with whatever
version you got this documentation from.

The code is divided up into modules: each module potentially has a
number of objects in it and a number of free-standing functions
(factory functions).

The documentation lists each object and the fields in the object which
are accessible by the Perl API and the external API (more fields may
be accessible by the internal API, but generally these are not fields
that you are expected to use).

Fields can either be scalar or list types. In either case the scalar
or list can either be a basic type or another object type. The access
methods available for scalar and list fields are as follows.

\subsection{Accessing fields in objects}
\label{accessing_fields}

In both the external API and the Perl API you can access all the
fields via function calls. In Perl these function calls are placed in
the correct namespaces, so they can be called using the OOP syntax of
Perl.
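As a rough illustration of reaching a field through function calls
rather than through the struct directly, here is a minimal,
self-contained C sketch. It does not use the Wise2 headers. The
\verb|ToySeq| type and every \verb|toy_| function are hypothetical
names made up for this example, and the rule that the setter copies
its argument is an assumption of the sketch, not a statement about how
Wise2 manages strings.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy object: NOT the real Wise2_Sequence layout. */
typedef struct ToySeq {
    char *name;   /* heap copy owned by the object */
} ToySeq;

/* Duplicate a string with malloc; returns NULL if allocation fails. */
static char *toy_strdup(const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, s, n);
    return copy;
}

/* Hypothetical constructor; returns NULL if allocation fails. */
ToySeq *toy_new_ToySeq(const char *name)
{
    ToySeq *obj;
    assert(name != NULL);
    obj = malloc(sizeof *obj);
    if (obj == NULL)
        return NULL;
    obj->name = toy_strdup(name);
    if (obj->name == NULL) {
        free(obj);
        return NULL;
    }
    return obj;
}

/* Getter: hands back the stored pointer; the caller must not free it. */
const char *toy_access_name_ToySeq(const ToySeq *obj)
{
    assert(obj != NULL);
    return obj->name;
}

/* Setter: stores a copy of new_name (an assumption of this sketch).
 * Returns 1 on success, 0 on failure, in which case the old value stays. */
int toy_replace_name_ToySeq(ToySeq *obj, const char *new_name)
{
    char *copy;
    assert(obj != NULL && new_name != NULL);
    copy = toy_strdup(new_name);
    if (copy == NULL)
        return 0;
    free(obj->name);
    obj->name = copy;
    return 1;
}

/* Release the object and the name it owns; NULL is accepted. */
void toy_free_ToySeq(ToySeq *obj)
{
    if (obj == NULL)
        return;
    free(obj->name);
    free(obj);
}
```

A caller would build an object with \verb|toy_new_ToySeq|, read and
change the name only through the two accessor functions, and finish
with \verb|toy_free_ToySeq|. For the real objects, the reference
section is the place to check which calls hand back memory that needs
an explicit free.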
\subsubsection{Perl scalar accessors}

\begin{itemize}
\item \$obj-$>$\emph{fieldname}() gets the value of this field
\item \$obj-$>$set\_\emph{fieldname}(\emph{new value}) sets the value of this field
\end{itemize}

For example

\begin{verbatim}
$name = $seq->name();        # get the name of a sequence
$seq->set_name('NewName');   # set the name of a sequence
\end{verbatim}

\subsubsection{External C scalar accessors}

\begin{itemize}
\item Wise2\_access\_\emph{fieldname}\_\emph{ObjectName}(obj) gets the value of this field
\item Wise2\_replace\_\emph{fieldname}\_\emph{ObjectName}(obj,\emph{new value}) sets the value of this field
\end{itemize}

For example

\begin{verbatim}
char * name;
Wise2_Sequence * seq;

/* ... get a sequence object somehow ... */

name = Wise2_access_name_Sequence(seq);
Wise2_replace_name_Sequence(seq,"NewName");
\end{verbatim}

\subsubsection{Perl List accessors}

\begin{itemize}
\item \$obj-$>$each\_\emph{fieldname}() Gives a Perl array of all the items in a list
\item \$obj-$>$length\_\emph{fieldname} Length of the list
\item \$obj-$>$\emph{fieldname}(\$i) The ith member of the list
\item \$obj-$>$add\_\emph{fieldname}(\$another\_obj) Adds another object to the list
\item \$obj-$>$flush\_\emph{fieldname}() Destroys all the items in a list, sets list size to zero
\end{itemize}

\subsubsection{External API List accessors}

\begin{itemize}
\item Wise2\_access\_\emph{fieldname}\_\emph{ObjectName}(obj,i) Accesses the ith position in the list
\item Wise2\_flush\_\emph{fieldname}\_\emph{ObjectName}(obj) Flushes the list
\item Wise2\_add\_\emph{fieldname}\_\emph{ObjectName}(obj,added\_object) Adds an object onto the end of the list
\end{itemize}

\subsection{Object Construction and handling}

The good news is that in the Perl API \emph{all} the memory handling
is managed between the Perl memory handling method and the Wise2
handling method. Basically you can completely forget about these
things and code normally in Perl and all the memory is handled for
you.
In the C external API, as in any C program, the programmer is
responsible for the memory, and you need to read the documentation as
to whether the objects you receive from function calls need explicit
frees or not.

\subsubsection{Low level Object Constructing}

In both Perl and in C you have the possibility of making a new object
from scratch.

\begin{itemize}
\item In Perl it is \$obj = new Wise2::\emph{ObjectName};
\item In C it is obj = \emph{ObjectName}\_alloc for objects with no
lists, and obj = \emph{ObjectName}\_alloc\_std for objects with lists
(this is a mistake in the API I know).
\end{itemize}

However I would read the documentation for an object carefully first,
as in some cases the objects have to be made through specific
functions. These are likely to be things like new\_\emph{ObjectName}
or such like. They are likely to be ``factory'' functions, that is
functions not attached to any object.

\subsubsection{Object deconstructors}

In Perl you don't have to worry about this (heaven).

In the C API you have two functions to handle the memory of objects.
The objects have a reference counted memory: when the free function is
called it decrements the object reference count and if this count hits
0 then the object itself is freed. To up the reference count you call
the hard\_link\_\emph{ObjectName} function.

\begin{itemize}
\item free\_\emph{ObjectName}(obj) Releases this pointer on this object
\item obj = hard\_link\_\emph{ObjectName}(obj) Adds this pointer to the object, increasing the reference count
\end{itemize}

\section{Wise2 Specific Modules}

There are a number of modules which are specific to Wise2 algorithms.
These should be the starting point for how to use Wise2 algorithms:
try to find a function in these modules which provides the
functionality that you want. Then figure out how to make the
appropriate objects to use this functionality.
\begin{description}
\item[gwrap] \ref{module_gwrap} The gwrap module has the main entry points for the genewise algorithm and how to build parameters for it
\item[estwrap] \ref{module_estwrap} The estwrap module has the main entry points for the estwise algorithm
\item[sw\_wrap] \ref{module_sw_wrap} The sw\_wrap module has the main entry points for the Smith-Waterman algorithm
\item[genedisplay] \ref{module_genedisplay} The pretty ASCII output used for genewise and estwise output
\item[seqaligndisplay] \ref{module_seqaligndisplay} The pretty ASCII output used for Smith-Waterman alignments
\item[threestatemodel] \ref{module_threestatemodel} Profile-HMM support
\item[threestatedb] \ref{module_threestatedb} Profile-HMM database support
\item[genefrequency] \ref{module_genefrequency} Raw counts for the genewise model
\item[geneparameter] \ref{module_geneparameter} Probabilities for the genewise model
\item[cdparser] \ref{module_cdparser} Probabilities for the estwise model
\end{description}

\section{Dynamite library modules}

\subsection{Sequence modules}

\begin{description}
\item[sequence] \ref{module_sequence} Basic sequences
\item[sequencedb] \ref{module_sequencedb} Basic sequence database
\item[protein] \ref{module_protein} Protein specific type
\item[proteindb] \ref{module_proteindb} Protein database
\item[genomic] \ref{module_genomic} Genomic specific type
\item[genomicdb] \ref{module_genomicdb} Genomic database
\item[cdna] \ref{module_cdna} cDNA specific type
\item[cdnadb] \ref{module_cdnadb} cDNA database
\end{description}

\subsection{Generic probabilistic modelling support}

\begin{description}
\item[probability] \ref{module_probability} Probability to log space conversions
\item[codon] \ref{module_codon} Codon Table support
\item[compmat] \ref{module_compmat} Protein Comparison matrix support
\item[codonmat] \ref{module_codonmat} Codon Matrix comparison matrix
\item[codonmapper] \ref{module_codonmapper} Codon bias/substitution errors support
\end{description}

\subsection{Generic Database Searching}

\begin{description}
\item[hscore] \ref{module_hscore} High Score list
\item[histogram] \ref{module_histogram} Extreme Value distribution fitting
\item[dbimpl] \ref{module_dbimpl} Database Implementation
\end{description}

\subsection{Generic Dynamite algorithm support}

\begin{description}
\item[aln] \ref{module_aln} Label alignments
\item[packaln] \ref{module_packaln} Raw (low level) alignments
\item[basematrix] \ref{module_basematrix} Memory management for DP implementations
\end{description}