File: dbz.man

package info (click to toggle)
fidogate 4.2.8-3
links: PTS
area: main
in suites: hamm, slink
size: 5,804 kB
ctags: 2,843
sloc: ansic: 22,020; perl: 2,885; sh: 1,550; yacc: 671; makefile: 582
file content (396 lines) | stat: -rw-r--r-- 17,385 bytes
parent folder | download | duplicates (4)



DBZ(3Z)                                                   DBZ(3Z)


NNAAMMEE
       dbminit,  fetch, store, dbmclose - somewhat dbm-compatible
       database routines
       dbzfresh, dbzagain, dbzfetch, dbzstore - database routines
       dbzsync,   dbzsize,   dbzincore,   dbzcancel,  dbzdebug  -
       database routines

SSYYNNOOPPSSIISS
       ##iinncclluuddee <<ddbbzz..hh>>

       ddbbmmiinniitt((bbaassee))
       cchhaarr **bbaassee;;

       ddaattuumm
       ffeettcchh((kkeeyy))
       ddaattuumm kkeeyy;;

       ssttoorree((kkeeyy,, vvaalluuee))
       ddaattuumm kkeeyy;;
       ddaattuumm vvaalluuee;;

       ddbbmmcclloossee(())

       ddbbzzffrreesshh((bbaassee,, ssiizzee,, ffiieellddsseepp,, ccmmaapp,, ttaaggmmaasskk))
       cchhaarr **bbaassee;;
       lloonngg ssiizzee;;
       iinntt ffiieellddsseepp;;
       iinntt ccmmaapp;;
       lloonngg ttaaggmmaasskk;;

       ddbbzzaaggaaiinn((bbaassee,, oollddbbaassee))
       cchhaarr **bbaassee;;
       cchhaarr **oollddbbaassee;;

       ddaattuumm
       ddbbzzffeettcchh((kkeeyy))
       ddaattuumm kkeeyy;;

       ddbbzzssttoorree((kkeeyy,, vvaalluuee))
       ddaattuumm kkeeyy;;
       ddaattuumm vvaalluuee;;

       ddbbzzssyynncc(())

       lloonngg
       ddbbzzssiizzee((nneennttrriieess))
       lloonngg nneennttrriieess;;

       ddbbzziinnccoorree((nneewwvvaalluuee))

       ddbbzzccaanncceell(())

       ddbbzzddeebbuugg((nneewwvvaalluuee))




                            3 Feb 1991                          1





DBZ(3Z)                                                   DBZ(3Z)


DDEESSCCRRIIPPTTIIOONN
       These functions provide an indexing system for rapid  ran-
       dom  access  to  a  text file (the _b_a_s_e _f_i_l_e).  Subject to
       certain constraints, they are call-compatible with _d_b_m(3),
       although  they  also  provide some extensions.  (Note that
       they are _n_o_t  file-compatible  with  _d_b_m  or  any  variant
       thereof.)

       In  principle,  _d_b_z stores key-value pairs, where both key
       and value are arbitrary sequences of bytes,  specified  to
       the  functions  by  values of type _d_a_t_u_m, typedefed in the
       header file to be a structure with members _d_p_t_r  (a  value
       of  type  _c_h_a_r _* pointing to the bytes) and _d_s_i_z_e (a value
       of type _i_n_t indicating how long the byte sequence is).

       In practice, _d_b_z is  more  restricted  than  _d_b_m.   A  _d_b_z
       database  must  be  an  index  into  a base file, with the
       database _v_a_l_u_es being _f_s_e_e_k(3) offsets into the base file.
       Each such _v_a_l_u_e must ``point to'' a place in the base file
       where the corresponding _k_e_y sequence is found.  A key  can
       be  no  longer  than  DBZMAXKEY (a constant defined in the
       header file) bytes.  No key can be an initial  subsequence
       of  another, which in most applications requires that keys
       be either bracketed or terminated in  some  way  (see  the
       discussion  of  the _f_i_e_l_d_s_e_p parameter of _d_b_z_f_r_e_s_h, below,
       for a fine point on terminators).

       _D_b_m_i_n_i_t opens a database, an  index  into  the  base  file
       _b_a_s_e, consisting of files _b_a_s_e..ddiirr and _b_a_s_e..ppaagg which must
       already exist.  (If the database is new,  they  should  be
       zero-length   files.)   Subsequent  accesses  go  to  that
       database until _d_b_m_c_l_o_s_e is called to close  the  database.
       The  base  file need not exist at the time of the _d_b_m_i_n_i_t,
       but it must exist before accesses are attempted.

       _F_e_t_c_h searches the database for the specified _k_e_y, return-
       ing the corresponding _v_a_l_u_e if any.  _S_t_o_r_e stores the _k_e_y-
       _v_a_l_u_e pair in the database.  _S_t_o_r_e will  fail  unless  the
       database  files  are writeable.  See below for a complica-
       tion arising from case mapping.

       _D_b_z_f_r_e_s_h is a  variant  of  _d_b_m_i_n_i_t  for  creating  a  new
       database  with  more  control  over  details.   Unlike for
       _d_b_m_i_n_i_t, the database files need not exist: they  will  be
       created if necessary, and truncated in any case.

       _D_b_z_f_r_e_s_h's  _s_i_z_e parameter specifies the size of the first
       hash table within the database, in key-value pairs.   Per-
       formance  will  be  best if _s_i_z_e is a prime number and the
       number of key-value pairs stored in the database does  not
       exceed  about  2/3  of _s_i_z_e.  (The _d_b_z_s_i_z_e function, given
       the expected number of key-value  pairs,  will  suggest  a
       database  size  that meets these criteria.)  Assuming that
       an _f_s_e_e_k offset is 4 bytes, the ..ppaagg file will  be  4*_s_i_z_e



                            3 Feb 1991                          2





DBZ(3Z)                                                   DBZ(3Z)


       bytes (the ..ddiirr file is tiny and roughly constant in size)
       until the number of key-value pairs exceeds about  80%  of
       _s_i_z_e.   (Nothing  awful  will happen if the database grows
       beyond 100% of _s_i_z_e, but accesses will slow down  somewhat
       and the ..ppaagg file will grow somewhat.)

       _D_b_z_f_r_e_s_h's  _f_i_e_l_d_s_e_p parameter specifies the field separa-
       tor in the base file.  If this is not  NUL  (0),  and  the
       last character of a _k_e_y argument is NUL, that NUL compares
       equal to either a NUL or a  _f_i_e_l_d_s_e_p  in  the  base  file.
       This  permits  use of NUL to terminate key strings without
       requiring that NULs appear in the base file.  The _f_i_e_l_d_s_e_p
       of  a  database created with _d_b_m_i_n_i_t is the horizontal-tab
       character.

       For use in news systems, various  forms  of  case  mapping
       (e.g.  uppercase to lowercase) in keys are available.  The
       _c_m_a_p parameter to _d_b_z_f_r_e_s_h is a single character  specify-
       ing which of several mapping algorithms to use.  Available
       algorithms are:

              00      case-sensitive:  no case mapping

              BB      same as 00

              NNUULL    same as 00

              ==      case-insensitive:  uppercase  and  lowercase
                     equivalent

              bb      same as ==

              CC      RFC822   message-ID   rules,  case-sensitive
                     before `@'  (with  certain  exceptions)  and
                     case-insensitive after

              ??      whatever the local default is, normally CC

       Mapping algorithm 00 (no mapping) is faster than the others
       and is overwhelmingly the correct choice for most applica-
       tions.   Unless compatibility constraints interfere, it is
       more efficient to pre-map the keys, storing mapped keys in
       the  base  file,  than to have _d_b_z do the mapping on every
       search.

       For historical reasons, _f_e_t_c_h and _s_t_o_r_e expect  their  _k_e_y
       arguments  to  be  pre-mapped, but expect unmapped keys in
       the base file.  _D_b_z_f_e_t_c_h and _d_b_z_s_t_o_r_e do the same jobs but
       handle  all  case mapping internally, so the customer need
       not worry about it.

       _D_b_z stores only the database _v_a_l_u_es in its files,  relying
       on  reference  to the base file to confirm a hit on a key.
       References to the base  file  can  be  minimized,  greatly



                            3 Feb 1991                          3





DBZ(3Z)                                                   DBZ(3Z)


       speeding up searches, if a little bit of information about
       the keys can be stored in the _d_b_z files.  This is ``free''
       if  there are some unused bits in an _f_s_e_e_k offset, so that
       the offset can be _t_a_g_g_e_d with some information  about  the
       key.   The _t_a_g_m_a_s_k parameter of _d_b_z_f_r_e_s_h allows specifying
       the location of unused bits.  _T_a_g_m_a_s_k  should  be  a  mask
       with one group of contiguous 11 bits.  The bits in the mask
       should be unused (0) in _m_o_s_t offsets.  The bit immediately
       above  the mask (the _f_l_a_g bit) should be unused (0) in _a_l_l
       offsets; _(_d_b_z_)_s_t_o_r_e will reject attempts to store  a  key-
       value  pair in which the _v_a_l_u_e has the flag bit on.  Apart
       from this restriction, tagging is invisible to  the  user.
       As  a  special  case, a _t_a_g_m_a_s_k of 1 means ``no tagging'',
       for use with  enormous  base  files  or  on  systems  with
       unusual offset representations.

       A _s_i_z_e of 0 given to _d_b_z_f_r_e_s_h is synonymous with the local
       default; the normal default  is  suitable  for  tables  of
       90-100,000  key-value pairs.  A _c_m_a_p of 0 (NUL) is synony-
       mous with the character  00,  signifying  no  case  mapping
       (note  that  the  character ??  specifies the local default
       mapping, normally CC).  A _t_a_g_m_a_s_k of 0 is  synonymous  with
       the  local default tag mask, normally 0x7f000000 (specify-
       ing the top bit in a 32-bit offset as the  flag  bit,  and
       the  next  7  bits as the mask, which is suitable for base
       files up to circa 24MB).  Calling _d_b_m_i_n_i_t_(_n_a_m_e_)  with  the
       database    files   empty   is   equivalent   to   calling
       _d_b_z_f_r_e_s_h_(_n_a_m_e_,_0_,_'_\_t_'_,_'_?_'_,_0_).

       When databases are regenerated periodically, as  in  news,
       it  is  simplest to pick the parameters for a new database
       based on the old one.  This also permits  some  memory  of
       past  sizes  of  the  old database, so that a new database
       size can be chosen to cover expected fluctuations.   _D_b_z_a_-
       _g_a_i_n  is  a variant of _d_b_m_i_n_i_t for creating a new database
       as a new generation of  an  old  database.   The  database
       files  for  _o_l_d_b_a_s_e must exist.  _D_b_z_a_g_a_i_n is equivalent to
       calling _d_b_z_f_r_e_s_h with the same field separator, case  map-
       ping,  and  tag mask as the old database, and a _s_i_z_e equal
       to the result of applying _d_b_z_s_i_z_e to the largest number of
       entries in the _o_l_d_b_a_s_e database and its previous 10 gener-
       ations.

       When many accesses are being done by the same program, _d_b_z
       is  massively faster if its first hash table is in memory.
       If an internal flag is 1, an attempt is made to  read  the
       table  in when the database is opened, and _d_b_m_c_l_o_s_e writes
       it out to disk again (if it was read successfully and  has
       been  modified).   _D_b_z_i_n_c_o_r_e  sets  the  flag  to _n_e_w_v_a_l_u_e
       (which should be 0 or 1) and returns the  previous  value;
       this  does  not  affect  the status of a database that has
       already been opened.  The default is 0.   The  attempt  to
       read the table in may fail due to memory shortage; in this
       case _d_b_z quietly  falls  back  on  its  default  behavior.



                            3 Feb 1991                          4





DBZ(3Z)                                                   DBZ(3Z)


       _S_t_o_r_es to an in-memory database are not (in general) writ-
       ten out to the file  until  _d_b_m_c_l_o_s_e  or  _d_b_z_s_y_n_c,  so  if
       robustness  in  the  presence  of  crashes  or  concurrent
       accesses is crucial, in-memory databases  should  probably
       be avoided.

       _D_b_z_s_y_n_c  causes  all buffers etc. to be flushed out to the
       files.  It is  typically  used  as  a  precaution  against
       crashes  or  concurrent  accesses when a _d_b_z-using process
       will be running for a long time.  It is a somewhat  expen-
       sive operation, especially for an in-memory database.

       _D_b_z_c_a_n_c_e_l  cancels  any pending writes from buffers.  This
       is typically useful  only  for  in-core  databases,  since
       writes  are  otherwise done immediately.  Its main purpose
       is to let a child process, in the wake of  a  _f_o_r_k,  do  a
       _d_b_m_c_l_o_s_e without writing its parent's data to disk.

       If  _d_b_z has been compiled with debugging facilities avail-
       able (which makes it bigger and a  bit  slower),  _d_b_z_d_e_b_u_g
       alters  the  value  (and returns the previous value) of an
       internal flag which (when 1; default is 0) causes  verbose
       and cryptic debugging output on standard output.

       Concurrent  reading of databases is fairly safe, but there
       is no (inter)locking, so concurrent updating is not.

       The database files include a record of the byte  order  of
       the  processor creating the database, and accesses by pro-
       cessors with different byte order will work, although they
       will be slightly slower.  Byte order is preserved by _d_b_z_a_-
       _g_a_i_n.  However, agreement on the size and internal  struc-
       ture  of  an _f_s_e_e_k offset is necessary, as is consensus on
       the character set.

       An open database occupies three _s_t_d_i_o  streams  and  their
       corresponding  file descriptors; a fourth is needed for an
       in-memory  database.   Memory  consumption  is  negligible
       (except for _s_t_d_i_o buffers) except for in-memory databases.

SSEEEE AALLSSOO
       dbz(1), dbm(3)

DDIIAAGGNNOOSSTTIICCSS
       Functions returning _i_n_t values return 0  for  success,  -1
       for  failure.   Functions  returning _d_a_t_u_m values return a
       value with _d_p_t_r set to NULL for failure.  _D_b_m_i_n_i_t attempts
       to  have _e_r_r_n_o set plausibly on return, but otherwise this
       is not guaranteed.  An _e_r_r_n_o of EEDDOOMM  from  _d_b_m_i_n_i_t  indi-
       cates  that  the database did not appear to be in _d_b_z for-
       mat.

HHIISSTTOORRYY
       The original  _d_b_z  was  written  by  Jon  Zeeff  (zeeff@b-



                            3 Feb 1991                          5





DBZ(3Z)                                                   DBZ(3Z)


       tech.ann-arbor.mi.us).   Later contributions by David But-
       ler and Mark Moraes.  Extensive reworking, including  this
       documentation, by Henry Spencer (henry@zoo.toronto.edu) as
       part of the C News project.   Hashing  function  by  Peter
       Honeyman.

BBUUGGSS
       The  _d_p_t_r members of returned _d_a_t_u_m values point to static
       storage which is overwritten by later calls.

       Unlike _d_b_m, _d_b_z will misbehave if  an  existing  key-value
       pair  is  `overwritten'  by a new _(_d_b_z_)_s_t_o_r_e with the same
       key.  The user is responsible for avoiding this  by  using
       _(_d_b_z_)_f_e_t_c_h  first  to  check  for  duplicates; an internal
       optimization remembers the result of the first  search  so
       there is minimal overhead in this.

       Waiting  until  after  _d_b_m_i_n_i_t to bring the base file into
       existence will fail if _c_h_d_i_r(2) has been used meanwhile.

       The RFC822 case mapper implements only a first  approxima-
       tion to the hideously-complex RFC822 case rules.

       The prime finder in _d_b_z_s_i_z_e is not particularly quick.

       Should  implement  the _d_b_m functions _d_e_l_e_t_e, _f_i_r_s_t_k_e_y, and
       _n_e_x_t_k_e_y.

       On C implementations which trap integer overflow, _d_b_z will
       refuse to _(_d_b_z_)_s_t_o_r_e an _f_s_e_e_k offset equal to the greatest
       representable positive number, as this would  cause  over-
       flow in the biased representation used.

       _D_b_z_a_g_a_i_n  perhaps ought to notice when many offsets in the
       old database were too big for tagging, and shrink the  tag
       mask to match.

       Marking  _d_b_z's  file  descriptors close-on-_e_x_e_c would be a
       better approach to the problem _d_b_z_c_a_n_c_e_l tries to address,
       but that's harder to do portably.

















                            3 Feb 1991                          6