File: introtowebdb.html

package info (click to toggle)
sqlrelay 1%3A0.37.1-3.1
links: PTS
area: main
in suites: etch, etch-m68k
size: 13,084 kB
ctags: 6,691
sloc: cpp: 48,136; python: 10,118; ansic: 9,673; java: 9,195; php: 8,839; perl: 8,827; sh: 8,554; ruby: 8,516; tcl: 5,039; makefile: 3,665
file content (282 lines) | stat: -rw-r--r-- 15,703 bytes
parent folder | download | duplicates (2)
<html>
<head>
</head>
<title>firstworks   Introduction to Database Access from Web-Based Applications</title>
<link href="css/styles.css" rel="stylesheet">
</head>
<body>

<span class="heading1">Introduction to Database Access from Web-Based Applications</span><br>

<p>It's rare these days that a substantial web-based application is written 
that doesn't need to access some sort of database.  This discussion examines (in
reasonably high level detail) the different tools available for developing web
based applications and the database access options a developer has with each.
</p>

<span class="heading1">The definition of "Database"</span><br>

<p>First, the question "What do you mean when you say database?" needs to be 
answered.  That may seem like a no-brainer, but I've argued long and hard with 
people about the definition of the word "database" in the past.  For the 
purposes of this discussion, a database is an organized store of 
information that an application can access.  That's a very general definition, 
and there is probably a more specific and perhaps more accurate definition 
somewhere on the net, but that's what "database" means in this discussion.</p>

<span class="heading1">Different Types of Databases</span><br>

<p>There are quite a few different types of databases.  Heirarchical and
relational databases are probably the only ones relevant to a developer today.
</p>

<p>Heirarchical databases are constructed like a tree (actually an acyclic 
(or infrequently a cyclic) graph) of nodes.  Each node must be referenced by 
the path to that node from some other node.  Individual nodes can contain a variety of information, but the tree structure is what defines the database 
type.  Examples of heirarchical databases are LDAP databases and filesystems.  
Yes, a filesystem is a actually a database; a heirarchical one.</p>

<p>Relational databases are collections of tables.  Each table consists of
rows and columns.  The intersection of a row and column is called a field.
Columns have a data type such as character, numeric or boolean and usually
a length.  What makes this whole thing relational is that a field in one
table can correspond to a field in another table, relating the rows in the two
tables.</p>

<span class="heading1">Relational Databases</span><br>

<p>There are actually different types of relational databases.</p>

<p>Structured Query Language (or SQL) databases are probably the most popular. 
In an SQL database, clients access the data store by submitting queries to the 
RDBMS (Relational Database Management System).  The RDBMS collects the data
requested in the query and returns it to the client in a result set.
Examples of SQL databases are the open source MySQL and mSQL databases and
commercial products like Oracle, Sybase and Microsoft SQL Server databases.</p>

<p>Another kind of relational database is the xBase or BerkeleyDB style 
database.  The word xBase is derived from the dBase I,II,III,IV and FoxBase 
products.  In this kind of database, tables are stored as individual files 
and methods are provided for searching through and accessing rows and 
columns in the files directly.  No high level query language is provided.</p>

<p>Between xBase and SQL databases lie databases like Microsoft Access and
FoxPro.  They both provide both direct table access methods and a high level 
query language, though not SQL proper.</p>

<p>Flat file databases are another kind of relational database.  In flat
file databases, the data store is kept in some kind of human readable format.  
Flat file databases can be opened and edited by a text editor.  They are most 
often used to store small amounts of data.  Though file formats vary, columns 
are often quoted and seperated by commas or left unquoted and seperated by 
tabs.  Alternatively, columns can be fixed length and white space padded.  
Rows are usually seperated by carriage returns.  These kinds of databases are 
often proprietary to a particular software package and are accessible by 
using low level file manipulation commands or through an API provided by the 
package.</p>

<span class="heading1">Transactional Databases</span><br>

<p>In the RDBMS world, there are 2 varieties: transactional and 
non-transactional.  A transactional RDBMS maintains discrete sessions 
and provides commit and rollback methods. Changes made by one session are 
invisible to other sessions until a commit is executed. Changes can 
also be rolled back before they are committed.  During a rollback, the state 
of the database prior to the changes is restored.  Most expensive commercial 
RDBMS's are transactional while lower end commercial and free databases
are not.</p>

<span class="heading1">Accessing Databases</span><br>

<p>"How can I get data from my database onto the web?"</p>

<p>That is the burning question.  There are many answers.</p>

<p>Significant factors in determining a solution are the choices of operating
system, database and development tools.  Being constrained by one constrains 
the others.  Another significant factor is the way the database information 
needs to be displayed.</p>

<p>If the application merely displays information from an
infrequently updated database, then generating web-presentable reports may
be an effective solition.  Commercial reporting tools like Crystal Reports
and Oracle Reports can print straight to HTML.  These tools only run on 
Microsoft Windows though there may be a similar tools for other platforms.  
At a lower level, most databases have scriptable tools for executing queries.  
Presentations can be generated periodically with one of these tools and 
processsed for the web with some template html files and crafty sed
scripting.  For databases with no scriptable query tools, programs can be 
written in a variety of languages to accomplish the same end result using API 
calls.</p>

<p>For building dynamic, database-driven applications, the simplest 
solution is to use a web application server.  Most have built in connectivity 
with some set of relational databases and come with a web page 
builder where database objects can by simply dragged and dropped onto 
web pages.  Web application servers are usually constrained in OS support, 
functionality, extensibility, scalability and in the databases that they can 
connect to.</p>

<p>Microsoft Visual InterDev is more flexible.  It's not a web 
application server, but a web-based application development system.  In 
InterDev, relational database objects can be dragged and dropped into web pages,
but the pages are Active Server Pages (ASP's) and the database objects are 
(ultimately) accessed from them through ODBC (Open Database Connectivity).  
ASP's are very extensible and ODBC can connect to a lot of different 
relational databases, but in the end the application will most likely have to 
run through Microsoft Internet Information System (IIS) on a Microsoft 
Operating System.</p>

<p>Oracle provides a unique solution with Oracle 8i.  In Oracle 8i the
RDBMS attaches to an external web server like Apache, Netscape or Microsoft IIS
and serves applications from within.  Applications reside and execute inside 
the database as PL/SQL procedures.</p>

<p>Most non-drag-and-drop solutions require some programming and an API.  
Popular programming languages for writing web-based applications include PHP, 
Perl, Python, C/C++ and Java.</p>

<p>PHP provides methods for accessing most databases including ODBC, LDAP, 
BerkeleyDB, xBase and a variety of SQL databases.  It also provides filesystem 
access methods.</p>

<p>Perl provides DBI modules which can access almost any SQL database that 
exists.  The DBI modules provide perl interfaces to database API calls.
LDAP and filesystem access modules exist too.</p>

<p>Python has modules similar to Perl.</p>

<p>Database API's are almost always C libraries and may be used directly by 
any C or C++ program or serverlet.  Alternatively, the ODBC API may be used in 
place of a specific database API on many platforms.  xBase databases can be 
accessed using Sequiter CodeBASE or a number of freeware libraries.  The 
BerkeleyDB database libraries come with Unix.  Various freeware LDAP API's are 
available as C libraries.</p>

<p>For connecting to Oracle databases from C or C++ programs, Pro C is another 
option.  Pro C is a preprocessor that translates embedded SQL into Oracle API 
calls.</p>

<p>Java provides JDBC (Java Database Connectivity) which, like ODBC, can
connect to a lot of different relational databases.  Additionally, some 
commercial database vendors provide a Java API for their databases along with 
the C API.</p>

<span class="heading1">Accessing Databases Without an API</span><br>

<p>Sometimes it's necessary to write a web-based application using a language
for which no database API's exist. Most languages can either execute 
DLL (shared object libraries on Unix) or COM (Common Object Model) object 
methods or make command line calls.  Shared object libraries can be written in 
C and command line programs can be written in some language listed above.  
Though some extra work (perhaps a lot of extra work) is involved, there is 
almost always a way to access a database from a language without an API.</p>

<p>An even more limiting issue is the platform issue.  Database API's only exist
for a limited set of platforms.  Some databases only have API's for Microsoft
operating systems on Intel-based hardware.  Others support both Microsoft and 
Unix, but only some Unixes.  What about MacOS?  What about PowerPC Linux?  
What about Microsoft NT on Alpha?  If the database API doesn't exist for a 
particular platform then the language-specific interface to it doesn't exist 
for that platform either.  A program which makes non-existant interface
calls won't run or compile.</p>

<p>There are solutions to the "no-API-on-my-platform" challenge.</p>

<p>If you need to access Microsoft SQL Server from Unix.  SQL Server and Sybase
both use the TDS (Tabular Data Stream) protocol, so if you're using ODBC on 
Unix the Sybase driver may work.  Unfortunately, SQL Server 7 uses a slightly
modified protocol.  Fortunately, the FreeTDS project provides API's which work 
with all releases of Sybase and Microsoft SQL Server.</p>

<p>Distributed processing is a generic solution.  RPC (Remote Procedure Call), 
Corba, DCOM (Distributed COM) and Java RMI (Remote Method Invocation)
allow a program to invoke methods on other computers and get the results back
locally.  These are useful if you have another computer somewhere to farm out
the database work to.  C and C++ support RPC and Corba on almost any platform
and DCOM on some platforms (including some Unixes, but it's really expensive).
Java supports RMI.  Perl and Python support some of these methods outright, and
may be extended with modules to support more.</p>

<p><b>SQL Relay</b> provides another solution for accessing many different
databases from software on unsupported platforms.  The <b>SQL Relay</b> API can
be compiled for platforms on which the server doesn't run.  The server can be 
run on a different machine where it communicates over the network with the 
client on one hand and the database on the other.  <b>SQL Relay</b> provides a 
service similar to the distributed processing solutions above, only through a 
highly specialized interface.</p>

<p>Another solution is to write command line programs on a remote machine (for
which the database API of interest exists) which execute database API calls 
and return the results to standard output or to a file on a shared volume.  
These command line programs can then be executed remtotely using rcmd or 
rexec.  The rcmd and rexec client/server system is available on most Unix 
systems out of the box and as freeware for other platforms.</p>

<span class="heading1">Performance Issues and Non-Issues</span><br>

<p>Some RDBMS's, especially transactional SQL RDBMS's have time consuming
login procedures, presumably because they were designed 
with an older client-server paradigm in mind.  In that paradigm, a client 
logs into an RDBMS when a user starts it up in the morning then accesses 
the database through the same connection over and over, all day long. 
The client doesn't logout until the user shuts it down at the end of the day
and doesn't login again until the next day.  Since clients login infrequently 
and at non-mission critical times, the amount of time associated with that 
particular activity is irrelevant.  Another reason may be that RDBMS's place a 
lower priority on accepting new connections than on handling currently 
exectuing queries.  Or it could just be that logins are really expensive in 
transactional RDBMS's because of the whole session/commit/rollback thing.  At 
any rate, web-based applications may need to log into an RDBMS every time a 
new page is loaded and the time those operations take can make an application 
intolerably slow.</p>

<p>Some databases were designed a long time ago and have no RDBMS overhead,
transactional or otherwise.  These include filesystems, BerkeleyDB and xBase 
databases.  Since the programmer is responsible for accessing the database 
using low level commands, database operations can be highly optimized.  The 
most signifigant performance issue encountered with these kinds of databases 
results from programmers using poorly optimized access algorithms because they 
are easier to implement than more highly optimized algorithms.</p>

<p>LDAP databases are designed for very high performance access and lower 
performance updates.  Everything about accessing an LDAP database is optimized
for speed, including logging in.</p>

<span class="heading1">Performance Solutions</span><br>

<p>Recognizing the performance issues associated with accessing RDBMS's from 
web-based applications, developers have devised some solutions.</p>

<p>One solution is to tune the database to make logins faster.  This
sounds like a good solution at first, but inevitably has performance 
consequences impacting other database activities.</p>

<p>Another solution is to attach the web-server directly to the RDBMS.  Oracle 
8i does just that.  It has limitations though.  Platform support is
extensive but may exclude your particular platform, PL/SQL is no speed demon, 
and Oracle 8i might be too expensive for some pursuits.</p>

<p>The third and most popular solution is to maintain pools of RDBMS 
connections and loan them out to needy executables.  Web application servers
do this by logging in when the application server starts up and allowing 
application threads to access the database through the open connection.
When run as an Apache module, PHP has the same capability.  Persistent CGI's 
written using the FastCGI library can be written to do the same thing.
<b>SQL Relay</b> provides a solution along these lines for accessing 
many different databases.  There are other examples, but those come to mind 
immediately.  This solution suffers with regard to dynamic scalability.  As the 
demand on an application increases, more immediately available database 
connections are needed.  Otherwise, the application stalls while new connections
start up or old ones finish their current job.  Application load must be 
predictable ahead of time for this solution to be flawlessly effective.</p>

<span class="heading1">Next...</span><br>

<p>Hopefully, this discussion has shed some light on an unfamiliar topic for
some and struck a familiar chord with others.  For more information about
<b>SQL Relay</b>, press your browser's back button and proceed to the next 
discussion in the list:  <b>Introduction to SQL Relay</b>.</p>

</body>
</html>