\documentclass{article}
\begin{document}
\newcommand{\programtext}[1]{\texttt{#1}}

\title{Wise3 Open Architecture}
\author{Ewan Birney, Guy Slater\\
Sanger Centre\\
Wellcome Trust Genome Campus\\
Hinxton, Cambridge CB10 1SA,\\
England.\\
Email: birney@sanger.ac.uk, gslater@hgmp.mrc.ac.uk}

\maketitle
 
\newpage
\tableofcontents
\newpage

\section{Introduction}

The aim of this paper is to lay out some architectural goals for the next generation
of the Wise package, Wise3, and to describe some of the planned changes to the Wise
software. The architecture is designed to be open, allowing additional code to work
seamlessly with the Wise package. This open architecture is aimed at two main groups
of people:

% FIXME: the document should focus more on general Wise3 issues.


\begin{itemize}
\item Large sites whose sequence databases are not kept as simple FASTA files.
\item Hardware manufacturers, including makers of specialised hardware, who would
like to improve the speed or sensitivity of database searches.
\end{itemize}

A further aim is to encourage a consistent framework for using genewise and
genewise-type algorithms sensibly, with people conforming to common standards
wherever that makes sense.

The main goal is to stop the frustrating practice of asking hardware manufacturers
to ``implement genewise'' without a clear definition of what that means, and,
worse, of forcing them to implement not merely genewise but also the entire
supporting framework, such as alignment output and post-processing. This places a
large additional burden on hardware manufacturers and helps no one.

For consumers of hardware or database systems, this provides a single document
to which they can point to specify what they want from a system. It should clear
up a considerable amount of confusion about what makes a system compliant.

\subsection{Commitment to open source, freely available code}

The Wise2 package has been licensed under the GNU General Public License since
its inception, and parts of the package carry even less restrictive licenses.
I have a strong commitment to keeping Wise a freely available, open source
package. The aim of the open architecture is to allow Wise to be compiled with
additional extensions provided by third parties. Because users will compile in
these extensions themselves, the third parties will not be required to license
their code under the GPL. They will be free to license their code under any
license they see fit, including keeping their source code closed and charging
for it.

\subsection{Potential conflicts of interest}

I have been a consultant to Compugen (a company which builds specialised
hardware), am currently a consultant to Paracel (again, a specialised hardware
company), and have been a consultant to a number of bioinformatics and
pharmaceutical companies worldwide. I do \emph{not believe} that my involvement
with these companies prejudices the open source nature of the Wise package, nor
has this architecture been written to favour one company over another. I would
like to point out both that my commitment to the scientific endeavour behind the
Wise package is far greater than any commitment to any company, and that if I
had wanted to make money out of the Wise package in a serious manner I would
have gone private \emph{myself} some time ago.

I am sympathetic to people who are concerned about any potential conflicts of
interest I have: I welcome anyone to email their concerns so that we can discuss
them. I am very happy to involve independent researchers who can verify the open
nature of Wise; all suggestions are welcome.

\subsection{A collaborative approach}

This document is a first attempt at defining an open architecture. I would very
much welcome feedback from manufacturers, corporate users and academic users
alike on what would help them make better use of Wise in a large-scale software
environment. Please feel free to make your own suggestions and corrections.

\section{Overview of the architecture}

The architecture will define three main interfaces:

\begin{itemize}
\item A C-based interface for opening, iterating over and closing a database
of sequences.
\item A CORBA-based interface for opening, iterating over and closing a database
of sequences: this reuses the BioSource IDL from bioperl for greater code reuse
between packages.
\item A C-based interface for running a database search with a particular
algorithm type, searching a single query structure against a database of
sequences provided through the C interface above.
\end{itemize}
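This draft does not yet define the search interface. Purely as an illustration,
the following sketch assumes that an algorithm type can be represented by a
scoring function applied to an opaque query and to each target sequence, with
hits reported through a callback. All names (\programtext{example\_search},
\programtext{ExampleScoreFunc}, \programtext{ExampleHitFunc},
\programtext{identity\_score}) are hypothetical, the targets are a plain array
rather than a database stream, and the toy ungapped identity score only stands
in for a genewise-type algorithm.

\begin{verbatim}
#include <stddef.h>

typedef int WOpenStatus;   /* 0 on success, non-zero on error */

/* Hypothetical: an "algorithm type" is a function scoring an opaque
   query against one target sequence. */
typedef long (*ExampleScoreFunc)(const void *query, const char *target);

/* Hypothetical: called for each target scoring at or above the cutoff. */
typedef void (*ExampleHitFunc)(const char *target_id, long score, void *user);

/* Scores one query against n targets and reports every hit. */
WOpenStatus example_search(const void *query, ExampleScoreFunc score,
                           const char **ids, const char **seqs, size_t n,
                           long cutoff, ExampleHitFunc on_hit, void *user)
{
    size_t i;

    if (score == NULL || on_hit == NULL)
        return 1;
    if (n > 0 && (ids == NULL || seqs == NULL))
        return 1;
    for (i = 0; i < n; i++) {
        long s = score(query, seqs[i]);
        if (s >= cutoff)
            on_hit(ids[i], s, user);
    }
    return 0;
}

/* Toy stand-in algorithm: counts positions where query and target carry
   the same residue, with no gaps allowed. */
long identity_score(const void *query, const char *target)
{
    const char *q = query;
    long s = 0;

    for (; *q != '\0' && *target != '\0'; q++, target++)
        if (*q == *target)
            s++;
    return s;
}
\end{verbatim}

Under this assumption a hardware vendor would supply only the scoring step, or a
faster replacement for \programtext{example\_search}, while query handling and
post-processing of hits stay in Wise.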

In addition, there need to be rules for propagating command-line arguments into
the initialisation of the database and search routines, and conventions for how
to find and link against the libraries that provide these interfaces, regardless
of where they came from.
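One possible convention, sketched here as an assumption rather than a decision,
is that every argument starting with a backend-specific prefix (here
\programtext{-db-}) is read as a key followed by its value and handed to that
backend's initialisation routine. \programtext{BackendOpts},
\programtext{collect\_backend\_opts} and \programtext{backend\_opt} are
hypothetical names.

\begin{verbatim}
#include <stddef.h>
#include <string.h>

#define MAX_BACKEND_OPTS 16

typedef int WOpenStatus;   /* 0 on success, non-zero on error */

/* Hypothetical container for the options destined for one backend. */
typedef struct {
    const char *key[MAX_BACKEND_OPTS];
    const char *value[MAX_BACKEND_OPTS];
    int n;
} BackendOpts;

/* Collects every "<prefix>key value" pair from argv (argv[0] is skipped).
   Keys and values point into argv; nothing is copied. Returns non-zero
   if an option lacks its value or there are too many options. */
WOpenStatus collect_backend_opts(int argc, char **argv, const char *prefix,
                                 BackendOpts *opts)
{
    size_t plen = strlen(prefix);
    int i;

    opts->n = 0;
    for (i = 1; i < argc; i++) {
        if (strncmp(argv[i], prefix, plen) != 0)
            continue;
        if (i + 1 >= argc || opts->n >= MAX_BACKEND_OPTS)
            return 1;
        opts->key[opts->n] = argv[i] + plen;
        opts->value[opts->n] = argv[i + 1];
        opts->n++;
        i++;   /* skip the value just consumed */
    }
    return 0;
}

/* Returns the value for key, or NULL if the option was not given. */
const char *backend_opt(const BackendOpts *opts, const char *key)
{
    int i;

    for (i = 0; i < opts->n; i++)
        if (strcmp(opts->key[i], key) == 0)
            return opts->value[i];
    return NULL;
}
\end{verbatim}

With this convention, \programtext{-db-host server1} on the command line would
reach the database backend as key \programtext{host} with value
\programtext{server1}; the main Wise parser only needs to skip arguments
carrying the prefix, not understand them.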

Initially, the Wise source code is expected to be compiled and then linked
against the additional functionality provided as C libraries; dynamic loading
routines might be considered later.
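Should dynamic loading be adopted, a POSIX system could use
\programtext{dlopen} and \programtext{dlsym}. This sketch assumes every backend
library exports an initialisation function whose type,
\programtext{WOpenBackendInit}, is hypothetical, as is
\programtext{load\_backend}; on glibc versions before 2.34 the program also
needs to be linked with \programtext{-ldl}.

\begin{verbatim}
#include <dlfcn.h>
#include <stdio.h>

typedef int WOpenStatus;   /* 0 on success, non-zero on error */

/* Hypothetical signature every backend library would export. */
typedef WOpenStatus (*WOpenBackendInit)(int argc, char **argv);

/* Loads libpath, looks up symbol and runs it as the backend's init
   function. On success *handle_out holds the library handle (to be
   passed to dlclose later); on failure it is NULL. */
WOpenStatus load_backend(const char *libpath, const char *symbol,
                         int argc, char **argv, void **handle_out)
{
    void *handle;
    WOpenBackendInit init;

    *handle_out = NULL;
    handle = dlopen(libpath, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        fprintf(stderr, "cannot load %s: %s\n", libpath, dlerror());
        return 1;
    }
    /* Assign through a void * lvalue to avoid an object-to-function
       pointer cast. */
    *(void **) (&init) = dlsym(handle, symbol);
    if (init == NULL) {
        fprintf(stderr, "no symbol %s in %s\n", symbol, libpath);
        dlclose(handle);
        return 1;
    }
    if (init(argc, argv) != 0) {
        dlclose(handle);
        return 1;
    }
    *handle_out = handle;
    return 0;
}
\end{verbatim}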

It was also tempting to define a CORBA-only interface to the code. However,
even with free, lightweight, effective ORBs such as ORBit, the technology is
overkill for what is effectively a series of very simple interfaces, and it
ties the portability of the code to the portability of the ORB. In addition,
people may be concerned about performance, even though ORBit makes direct
function calls for in-process object requests.

A CORBA definition has been provided for the database layer, but it will
always sit behind the pure C interface (ORBit will be our ORB of choice for
this mapping).

The C interfaces will be designed with the following features in mind:

\begin{itemize}
\item All definitions will pass opaque pointers to functions.
\item Functions that return a status value will return an integer, with 0
indicating success and non-zero indicating an error of some sort.
\end{itemize}
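A minimal sketch of both conventions, using a hypothetical handle type
\programtext{ExampleDB} that is not part of the proposed interface: callers see
only an incomplete \programtext{struct}, so they can hold and pass pointers but
cannot touch the fields, and status-returning functions use 0 for success.

\begin{verbatim}
#include <stdlib.h>

/* --- public header: callers see only an incomplete type --- */
typedef struct ExampleDB ExampleDB;   /* hypothetical handle */
typedef int WOpenStatus;              /* 0 on success, non-zero on error */

ExampleDB  *ExampleDB_new(void);
WOpenStatus ExampleDB_record_seq(ExampleDB *db);
long        ExampleDB_size(const ExampleDB *db);
void        ExampleDB_free(ExampleDB *db);

/* --- private implementation: normally in the library's .c file --- */
struct ExampleDB {
    long n_seqs;   /* number of sequences recorded so far */
};

ExampleDB *ExampleDB_new(void)
{
    ExampleDB *db = malloc(sizeof *db);
    if (db != NULL)
        db->n_seqs = 0;
    return db;
}

WOpenStatus ExampleDB_record_seq(ExampleDB *db)
{
    if (db == NULL)
        return 1;   /* non-zero: error */
    db->n_seqs++;
    return 0;       /* zero: success */
}

long ExampleDB_size(const ExampleDB *db)
{
    return db == NULL ? -1 : db->n_seqs;
}

void ExampleDB_free(ExampleDB *db)
{
    free(db);
}
\end{verbatim}

In a real library the struct definition would live only in the implementation
file, so a backend can change its internal layout without Wise having to be
recompiled against it.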

\section{Database Access, C definition}

\begin{verbatim}
/* Opaque handles: each implementation defines these structs privately */
typedef struct WOpenSeq       WOpenSeq;
typedef struct WOpenSeqStream WOpenSeqStream;
typedef struct WOpenSeqDB     WOpenSeqDB;

/* Status code: 0 on success, non-zero on error */
typedef int WOpenStatus;

/* Accessors for a single sequence */
char * WOpenSeq_seq(WOpenSeq * seq);
char * WOpenSeq_subseq(WOpenSeq * seq,long start,long end);
long   WOpenSeq_length(WOpenSeq * seq);
char * WOpenSeq_desc(WOpenSeq * seq);
char * WOpenSeq_identifier(WOpenSeq * seq);

/* Sequence streams */
WOpenStatus WOpenSeqStream_open(WOpenSeqStream * stream);
\end{verbatim}
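The declarations leave coordinate conventions, memory ownership and iteration
open. The following sketch implements them for sequences held in memory under
assumptions that the final definition may settle differently:
\programtext{WOpenSeq\_subseq} uses 0-based, half-open coordinates and returns a
newly allocated string that the caller frees; \programtext{WOpenSeqStream\_open}
rewinds the stream; and \programtext{WOpenSeqStream\_next} is a hypothetical
iterator that is not among the declarations above. The struct layouts are
likewise hypothetical.

\begin{verbatim}
#include <stdlib.h>
#include <string.h>

typedef struct WOpenSeq       WOpenSeq;
typedef struct WOpenSeqStream WOpenSeqStream;
typedef int WOpenStatus;

/* Hypothetical in-memory layouts; a real backend keeps these private. */
struct WOpenSeq {
    char *identifier;
    char *desc;
    char *seq;          /* NUL-terminated residues */
};

struct WOpenSeqStream {
    WOpenSeq *seqs;     /* fixed array of sequences */
    long      n;        /* number of sequences */
    long      pos;      /* next index to return; -1 until opened */
};

char *WOpenSeq_seq(WOpenSeq *seq)        { return seq->seq; }
long  WOpenSeq_length(WOpenSeq *seq)     { return (long) strlen(seq->seq); }
char *WOpenSeq_desc(WOpenSeq *seq)       { return seq->desc; }
char *WOpenSeq_identifier(WOpenSeq *seq) { return seq->identifier; }

/* Assumption: 0-based, half-open [start, end). The result is malloc'd and
   owned by the caller; NULL signals invalid coordinates. */
char *WOpenSeq_subseq(WOpenSeq *seq, long start, long end)
{
    long len = WOpenSeq_length(seq);
    char *out;

    if (start < 0 || start > end || end > len)
        return NULL;
    out = malloc((size_t) (end - start) + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, seq->seq + start, (size_t) (end - start));
    out[end - start] = '\0';
    return out;
}

/* Assumption: opening rewinds the stream to its first sequence. */
WOpenStatus WOpenSeqStream_open(WOpenSeqStream *stream)
{
    if (stream == NULL || (stream->n > 0 && stream->seqs == NULL))
        return 1;
    stream->pos = 0;
    return 0;
}

/* Hypothetical iterator: next sequence, or NULL when the stream is
   exhausted or has not been opened. */
WOpenSeq *WOpenSeqStream_next(WOpenSeqStream *stream)
{
    if (stream == NULL || stream->pos < 0 || stream->pos >= stream->n)
        return NULL;
    return &stream->seqs[stream->pos++];
}
\end{verbatim}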


\section{Error Libraries}

The plan is to drop the error-handling libraries of Wise2 (the
\programtext{/base} code) and switch over to using glib routines.
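A sketch of what glib-based error reporting could look like, assuming backends
keep the integer status convention and report details through a
\programtext{GError} out-parameter. \programtext{g\_set\_error},
\programtext{g\_quark\_from\_static\_string}, \programtext{g\_file\_test} and
\programtext{g\_error\_matches} are real glib calls; the error domain, the
codes and \programtext{wopen\_db\_check\_path} are hypothetical. Compile with
the flags from \programtext{pkg-config --cflags --libs glib-2.0}.

\begin{verbatim}
#include <glib.h>

typedef int WOpenStatus;   /* 0 on success, non-zero on error */

/* Hypothetical error domain and codes for a database backend. */
#define WOPEN_DB_ERROR wopen_db_error_quark()

typedef enum {
    WOPEN_DB_ERROR_NO_PATH,
    WOPEN_DB_ERROR_NOT_FOUND
} WOpenDbError;

GQuark wopen_db_error_quark(void)
{
    return g_quark_from_static_string("wopen-db-error-quark");
}

/* Hypothetical check run before opening a database: returns 0 if path
   exists, otherwise non-zero with the details stored in *error. */
WOpenStatus wopen_db_check_path(const char *path, GError **error)
{
    if (path == NULL || *path == '\0') {
        g_set_error(error, WOPEN_DB_ERROR, WOPEN_DB_ERROR_NO_PATH,
                    "no database path given");
        return 1;
    }
    if (!g_file_test(path, G_FILE_TEST_EXISTS)) {
        g_set_error(error, WOPEN_DB_ERROR, WOPEN_DB_ERROR_NOT_FOUND,
                    "database '%s' not found", path);
        return 1;
    }
    return 0;
}
\end{verbatim}

Passing \programtext{NULL} as the \programtext{GError **} argument is allowed by
glib and simply discards the error details, so callers that only care about
the status code need not allocate anything.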

\end{document}