<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
 <META NAME="GENERATOR" CONTENT="SGML-Tools 1.0.9">
 <TITLE>NoSQL: Foreword and Introduction</TITLE>
 <LINK HREF="NoSQL-2.html" REL=next>

 <LINK HREF="NoSQL.html#toc1" REL=contents>
</HEAD>
<BODY BGCOLOR="#fff0e0">
<A HREF="NoSQL-2.html">Next</A>
Previous
<A HREF="NoSQL.html#toc1">Contents</A>
<HR>
<H2><A NAME="s1">1. Foreword and Introduction</A>  </H2>

<H2><A NAME="ss1.1">1.1 Copyright</A>
    </H2>

<P>NoSQL RDBMS, Copyright (C) 1998-2001 by Carlo Strozzi. Part of
the NoSQL code comes from RDB, a similar package written by
W. Hobbs, and has been included in NoSQL with permission
from the author.
<P>This program comes with ABSOLUTELY NO WARRANTY; for details
refer to the GNU General Public License.
<P>A copy of the GNU General Public License is included with this
program, in the file COPYING.
<P>
<H2><A NAME="ss1.2">1.2 Preface</A>
    </H2>

<P>This working draft describes NoSQL and provides instructions for
its use. NoSQL (I personally like to pronounce it
<EM>noseequel</EM>) is a derivative of the RDB database
system.
The RDB system was (and still is) developed
at RAND Corporation by Walter W. Hobbs. Part of the NoSQL
code, as well as large parts of the text of this document,
has been taken directly from RDB, so a good share of the credit
goes to the original author.
<P>Other major contributors to the original RDB system, besides W. Hobbs,
were:
<P>Chuck Bush
<P>Don Emerson
<P>Judy Lender
<P>Roy Gates
<P>Rae Starr
<P>People who helped with turning RDB into NoSQL:
<P>David Frey
<P>Maurizio (Masar) Sartori
<P>Vincenzo (Vicky) Belloli
<P>Giuseppe Patern&ograve;
<P>Paul Lussier
<P>Seth LaForge
<P>The NoSQL.png logo has been kindly provided by Kyle Hart.
<P>NoSQL tends to be biased in favour of
<A HREF="http://www.linux.org">Linux</A>. This means that,
wherever it matters, NoSQL makes use of the GNU
versions of the various UNIX commands, as those are the ones
normally found on this UNIX workalike. NoSQL is
<A HREF="http://www.gnu.org/philosophy/categories.html">Free Software</A>, released under the terms of the 
<A HREF="http://www.gnu.org/copyleft/gpl.html">GNU General Public License</A>. As such, it fully qualifies as
<A HREF="http://www.opensource.org">Open Source</A>
Software.
<P>
<H2><A NAME="sec-intro"></A> <A NAME="ss1.3">1.3 Introduction </A>
    </H2>

<P>A good question one could ask is "With all the relational
database management systems available today, why do we need
another one?". The main reasons are:
<OL>
<LI>Several times I have found myself writing applications
that needed to rely upon <EM>simple</EM> database management
tasks. Most commercial database products are too
costly and too feature-packed to encourage casual use.
There are also plenty of good free databases around, but they
too tend to provide far more than I need most of the time,
and they too lack the shell-level approach of NoSQL.
By contrast, NoSQL takes a very simple approach (even simplistic,
as some may argue :-), but that is exactly its distinguishing
feature. Admittedly, having
been written mostly in interpreted languages (Shell, Perl, AWK),
NoSQL is not the fastest DBMS of all, at least not always
(a lot depends on the application).</LI>
<LI>NoSQL is easy to use by non-computer people. The concept
is straightforward and logical. To select rows of data,
the 'row' operator is used; to select columns of
data, the 'column' operator is used.</LI>
<LI>The data is highly portable to and from other types of
machines, like Macintoshes or DOS computers.</LI>
<LI>The system should run on any UNIX machine that has perl(1)
and mawk(1) installed.</LI>
<LI>NoSQL essentially has no arbitrary limits, and can work where
other products can't. For example, there is no limit on data field
size, the number of columns, or file size (although the number of
columns in a table may actually be limited to 32,768 by some
implementations of the AWK interpreter, including <CODE>mawk</CODE>,
I think).</LI>
</OL>
<P>Note: NoSQL has only been tested with mawk(1), Mike Brennan's
implementation of the AWK programming language, and will most
likely <EM>not</EM> work out of the box with any other AWK, including
gawk(1). While getting NoSQL to work with gawk(1) as well should not
be difficult, making it work with other AWKs may prove hard, if
possible at all.
<P>As its name implies, NoSQL is <EM>not</EM> an SQL database.
The rationale behind this is well explained in the accompanying
paper 
<A HREF="4gl.ps">4gl.ps</A> (Postscript), or
<A HREF="4gl.txt">4gl.txt</A> (ASCII).
<P>The data is contained in regular UNIX ASCII files, and so
can be manipulated by regular UNIX utilities, e.g. ls, wc,
mv, cp, cat, more, less, editors like 'vi', head, RCS, etc.
<P>The form of each file of data is that of a relation, or table,
with rows and columns of information.
<P>To extract information, a file of data is fed to one or more
"operators" via the UNIX Input/Output redirection mechanism.
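<P>As a rough illustration of the operator idea just described, the
sketch below chains two small filters with a pipe. It deliberately uses
plain awk rather than the real NoSQL operators: the function names
<CODE>make_table</CODE>, <CODE>select_rows</CODE> and
<CODE>pick_columns</CODE>, the tab-separated layout with a single header
line, and the sample data are all assumptions made up for this example,
and do not reflect the actual NoSQL commands or table format.
<PRE>
```shell
# Hypothetical stand-ins for database "operators", written in plain awk.
# These are NOT the real NoSQL commands; they only mimic the general idea.

# Emit a tiny table: one header line, then one row per line,
# with tab-separated fields (layout assumed for this sketch only).
make_table() {
    printf 'name\tdept\tphone\n'
    printf 'anna\tsales\t101\n'
    printf 'bob\tadmin\t102\n'
    printf 'carl\tsales\t103\n'
}

# Keep the header line plus every row whose second field equals $1.
select_rows() {
    awk -F '\t' -v dept="$1" 'NR == 1 || $2 == dept'
}

# Keep only the first and third fields of every line, header included.
pick_columns() {
    awk -F '\t' -v OFS='\t' '{ print $1, $3 }'
}

# Each stage reads a table on STDIN and writes a table on STDOUT,
# so stages can be chained freely with pipes.
make_table | select_rows sales | pick_columns
```
</PRE>
<P>Because every stage is an ordinary filter, the result of the
pipeline is again a plain text table that can be piped into further
stages or into any regular UNIX utility.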
<P>There are also programs to generate, modify, and validate the data.
A thorough discussion on why this type of relational
database structure makes sense is found in the book, "UNIX
Relational Database Management", Reference #2.
<P>It is assumed that the reader has at least a minimum
knowledge of the UNIX Operating System, including knowledge
of Input/Output redirection (e.g., STDIN, STDOUT, pipes).
<P>Again, the key feature of NoSQL (and other similar packages
mentioned in this manual) is its <B>close integration with
UNIX</B>. Real-world problems are typically more complex than
the data models provided by many DBMSes.
Actual applications, Web-based ones included, are complex
puzzles made up of many small pieces, several of which are data-related.
Unlike other fourth generation systems, NoSQL is an
extension of the UNIX environment, making available the full
power of UNIX during application development and usage.
NoSQL was designed with the UNIX shell language as its user
interface. This
level of integration removes the need to learn yet another set of
commands to use and administer the database system. A database
is just a file, and can be maintained like all other files
that the user owns or has access to. Because NoSQL commands are
executable programs, the UNIX shell is inherited as the primary
command language of the database; no other proprietary database
scripting language, to my knowledge, is as powerful and 
flexible as the UNIX shell. The shell-level nature of NoSQL
encourages casual use of the system, and successful casual use leads
to familiarity and successful formal use.
This concept is much more thoroughly
explained in the paper "The UNIX Shell As a Fourth Generation
Language", included in the NoSQL documentation tree with the
file name 
<A HREF="4gl.ps">4gl.ps</A> (Postscript) or
<A HREF="4gl.txt">4gl.txt</A> (ASCII),
that shows why the UNIX shell is an excellent tool for scripting
database access.
<H2><A NAME="sec-philosophy"></A> <A NAME="ss1.4">1.4 Perl and the Operator/Stream Paradigm </A>
    </H2>

<P>As stated in the Abstract, NoSQL uses the Operator/Stream
DBMS Paradigm. The main reason why I decided to turn
the original RDB system into NoSQL is that the former
is entirely written in Perl. Perl is a good programming
language for writing <EM>self-contained</EM> programs,
but Perl's pre-compilation phase and long start-up time
are worth paying only if once the program has loaded it
can do everything in one go. This contrasts sharply with
the Operator/Stream model, where operators are chained
together in pipelines of two, three or more programs.
The overhead associated with initializing Perl at every stage of
the pipeline makes pipelining somewhat inefficient. A better
way of manipulating structured ASCII files is to use
the AWK programming language, which is <EM>much</EM> smaller than
Perl, is more specialized for this task and is very fast at startup
(on my Pentium II Linux box, /usr/bin/mawk (POSIX AWK) is just 99K,
while Perl 5 is almost 500K; you get the point).
<H2><A NAME="ss1.5">1.5 Bug reports</A>
    </H2>

<P>There is a mailing list for discussions related to NoSQL. The address is
<A HREF="mailto:noseequel@texne.com">noseequel@texne.com</A>.
To subscribe, simply send a message to
<A HREF="mailto:minimalist@texne.com">minimalist@texne.com</A> with the
word "subscribe" (without the quotes) in the message subject.
<P>Please send bug reports (fixes are most welcome :-) to the same list,
<A HREF="mailto:noseequel@texne.com">noseequel@texne.com</A>.
Always include as much information as possible, especially the content
of file <CODE>nosql_version</CODE>, which is created in the NoSQL
installation directory during install.
By 'bug reports' I mean not just errors in
the code, but also grammatical mistakes, typos and bad English 
constructions in the documentation, as English isn't my native
language :-)
</BODY>
</HTML>