WhiteDB Tutorial
================
Introduction
------------
This tutorial will cover the basic usage of WhiteDB's C API. Most examples
you will encounter here will also be available in the Examples directory
of the WhiteDB source distribution.
A thorough reference of the API is available in the 'Manual.txt' file. If
you're looking for information on how to use WhiteDB from Python, please see
'python.txt' (also in semi-tutorial form).
Compiling the examples
----------------------
Before we can get started with the tutorial, we need to know how to compile
and run programs that use WhiteDB.
If you invoked the standard `./configure; make; make install`, things are
quite simple. Let's try to compile 'Examples/demo.c' (it's already compiled,
but let's do it again).
gcc -o Examples/mydemo Examples/demo.c -lwgdb
That's it! This is how you'd normally compile and link a WhiteDB program.
The `-lwgdb` tells the linker to use 'libwgdb.a' that sits somewhere in your
library path. If you get an error at this point, it may be that your
computer has libraptor installed. In that case WhiteDB automatically decides
that it wants to use it; just add `-lraptor` to the command line and all
should be fine.
You can now run `Examples/mydemo` and see what it does. Also, you may skip the
rest of this section and go directly to "Connecting to the database".
In the event that you haven't run `make install` (let's say you're still
evaluating and getting acquainted with WhiteDB) or you've installed it
in a location that's not in your standard library path, you'll need to
add a few things.
gcc -L/alternate/libpath -o Examples/mydemo Examples/demo.c -lwgdb
Alternate libpath is where 'libwgdb.a' is on your machine. If you didn't
`make install` at all, it is still in 'Main/.libs'. Also, your system probably
does not know where to look for the libraries, so if you just run
`Examples/mydemo`, it will exit with "error while loading shared libraries".
LD_LIBRARY_PATH=/alternate/libpath Examples/mydemo
will take care of that. However, things will definitely be easier once you've
done `make install`. Another issue that you might run into is that the compiler
does not know where the API header files are. The programs in the 'Examples'
directory work around that by referring to the headers in the 'Db' directory
directly, but you might prefer a more flexible way. In this case, try
gcc -L/alternate/libpath -I/path/to/headers -o myprog myprog.c -lwgdb
where '/path/to/headers' is where 'dbapi.h' is located.
Finally, in the event that your system does not have the make
program or some other part of the toolchain required for the standard
installation, but you still have the C compiler, such as gcc, you may
directly compile the examples or your own programs with the WhiteDB sources.
Have a look at 'Examples/compile_demo.sh' to see what source files should
be compiled.
So you're on Windows
~~~~~~~~~~~~~~~~~~~~
So far there hasn't been a peep about following the tutorial on a Windows
computer, but don't feel left out - the compilation is different enough that
it deserves its own section.
You need the MSVC compiler (provided by Microsoft Visual Studio, Express
Edition, for example). Set it up so you can run `cl.exe` from the command
prompt. Visual Studio includes its own command prompt menu entry that has
the environment set up correctly for you.
First, we recommend that you compile the 'wgdb.lib'. If you followed the
installation documentation, you probably have it in WhiteDB's directory
already. If not, run `compile.bat`. This produces everything you'll
need for the tutorial. Now try:
cl.exe /FeMYDEMO Examples\demo.c wgdb.lib
This produces 'mydemo.exe' in the current directory. As long as the file
'wgdb.dll' is also in the same directory, you can run `mydemo.exe` and see
the output.
The above command works for the distributed examples, but it should be
pointed out that the following way is more flexible, once you've started
creating your own programs:
cl.exe /I"\path\to\whitedb\headers" yourprog.c \path\to\wgdb.lib
Replace the '\path\to..'-s with the actual directories where the WhiteDB files
are on your computer.
Connecting to the database
--------------------------
Before you can read from or write to the WhiteDB database, your program
needs to connect to it. Let's look at how we might do that ('Examples/tut1.c'):
[source,C]
----
#include <whitedb/dbapi.h> /* or #include <dbapi.h> on Windows */
int main(int argc, char **argv) {
  void *db;
  db = wg_attach_database("1000", 2000000);
  return 0;
}
----
First, the program needs to include the API headers. There are a few other
header files distributed with WhiteDB, but 'dbapi.h' is the one we'll need
for now.
NOTE: The programs in the 'Examples' directory use a different way of including
the headers, by referring to their location directly. This is so that the
examples can be compiled before the installation of the database and it is
perfectly acceptable - but let's stick to the standard way of using library
headers in this tutorial.
`void *db` is the database handle. Once we have the handle, we can use it in
all the subsequent database operations - it will always point to the same
database we originally attached to. Why stress that it is the same database?
WhiteDB allows using multiple databases in parallel, without any prior
configuration. The number "1000" we give to the `wg_attach_database()` function
is the key that refers to the shared memory segment containing our database.
Observe that when using `wg_attach_database()`, it does not matter whether the
database already exists or not - it will be created, if necessary. The size
of the database will be the one we supplied, 2MB in this case. When the
program exits, the database will remain in memory.
When you have already created a database in shared memory, you can later use
`wg_attach_existing_database(dbname)`, which works exactly like
`wg_attach_database(dbname,...)` but does not create a new database.
If no database with the name `dbname` is found, it simply returns NULL.
This is quite handy when you want to avoid creating a new base or just want
to check whether it exists already.
Adding data to the database
---------------------------
An empty database usually isn't of much practical use, so we need to learn
how to populate it with data. It is actually a three-step process: creating
a record, encoding the data and writing to the fields of the record.
Records
~~~~~~~
A WhiteDB record is an n-tuple of encoded data. The n refers to the length of
the record and there is no specific limit except that it must fit inside
the database memory segment (of course, the size is given as `wg_int`, the
universal datatype of WhiteDB, which itself has a maximum value, but this is
quite large, especially on a 64-bit system).
void *rec = wg_create_record(db, 10);
The datatype of the record is `void *`, just like the database handle. Now we
can use `rec` any time we need to do something with the record we've created.
By the way, the records do not all need to be the same size, so we could do
void *rec2 = wg_create_record(db, 2);
and have two records, one of them 10 fields and the other 2 fields in size.
However, the size is final and cannot be changed later.
Data in WhiteDB
~~~~~~~~~~~~~~~
An important distinction between WhiteDB and traditional databases is that
the user can and in some cases must pop the hood open and get their hands
dirty. Data encoding is one such case.
Everything inside the database is a "WhiteDB int", or a `wg_int` when we're
writing C code. These are basically numbers (32-bit or 64-bit integers,
depending on your system), but for WhiteDB they contain encoded pieces of
information - type of a value and the value itself or some way to access the
value.
So whenever we need to write something, be it a string, a number or a date
to the database, first we have to encode it so that WhiteDB is ready to handle
it.
wg_int enc = wg_encode_int(db, 443);
wg_int enc2 = wg_encode_str(db, "this is my string", NULL);
The first line should be self-explanatory - `enc` is now 443 in WhiteDB's
internal format. When encoding a string, be aware that the string itself
will be written to the database memory segment at that point - the encoded
value `enc2` will merely contain a reference to it. Also, there is a third
parameter which we can ignore for simple applications.
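To make the idea of an encoded value more concrete, here is a toy sketch of
packing a type tag and a small integer into a single machine word. The layout
(tag in the low 3 bits) and every `toy_` name are assumptions made up for this
illustration. WhiteDB's real encoding is different and should only be accessed
through the `wg_encode_*()` and `wg_decode_*()` functions.

```c
#include <stdint.h>

/* Toy layout, NOT WhiteDB's real one: the low 3 bits of a word hold a
 * type tag and the remaining bits hold a small signed integer. */
typedef intptr_t toy_int;

#define TOY_TAG_BITS 3
#define TOY_TAG_MASK 7
#define TOY_NULLTYPE 0
#define TOY_INTTYPE  1

/* Pack a small integer together with its type tag.
 * Multiplying by 8 leaves the low 3 bits free for the tag. */
toy_int toy_encode_int(long value) {
  return (toy_int)value * (1 << TOY_TAG_BITS) + TOY_INTTYPE;
}

/* Read the type tag without touching the payload. */
int toy_get_type(toy_int enc) {
  return (int)((uintptr_t)enc & TOY_TAG_MASK);
}

/* Recover the integer; only meaningful if the tag is TOY_INTTYPE. */
long toy_decode_int(toy_int enc) {
  return (long)((enc - TOY_INTTYPE) / (1 << TOY_TAG_BITS));
}
```

In this toy layout `toy_encode_int(443)` yields 3545 rather than 443, which is
why an encoded value cannot be used as a plain number before it is decoded.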
Setting field values
~~~~~~~~~~~~~~~~~~~~
You may be asking yourself why we need to bother with encoding the values
when we could simply write things like integers or character arrays directly.
The main reason for that is that WhiteDB is schemaless. When we created
records, we did not specify what type any of the fields were - they can be of
any type. The encoded value is how WhiteDB can tell what type of data it is
dealing with, since field 1 could be an integer in one record, a
floating-point number in another one and so on.
With that out of the way, let's take our encoded data and store it properly
in the database:
wg_set_field(db, rec, 7, enc);
wg_set_field(db, rec2, 0, enc2);
Field 7 of the first record now contains 443 and field 0 of the second record
(which has two fields, field 0 and field 1) contains "this is my string".
We didn't touch any of the other fields and if we were to look at the
contents of the records now, these would be filled with NULL values. Each time
a new record is created, it initially contains a row of NULL-s which the user
can then overwrite with their own data.
Here is our complete example ('Examples/tut2.c'):
[source,C]
----
#include <whitedb/dbapi.h>
int main(int argc, char **argv) {
  void *db, *rec, *rec2;
  wg_int enc, enc2;
  db = wg_attach_database("1000", 2000000);
  rec = wg_create_record(db, 10);
  rec2 = wg_create_record(db, 2);
  enc = wg_encode_int(db, 443);
  enc2 = wg_encode_str(db, "this is my string", NULL);
  wg_set_field(db, rec, 7, enc);
  wg_set_field(db, rec2, 0, enc2);
  return 0;
}
----
It is likely that you need to deal with more types than just strings and
integers. The manual will provide a full list of supported types.
The wgdb utility
----------------
Once you've started working with WhiteDB, the `wgdb` tool may come in handy
to manage the databases, so let's take a quick look at it. First we deal with
database persistence; you may skip to "Looking at data" if you're not on
Windows.
Database persistence on Windows
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The way shared memory works on Windows is that it is only present as long as
there is a program holding a handle to it. So when we compile and run the
previous example, the data gets written to the memory but then the program
terminates and the database immediately disappears. To get around that, run
wgdb.exe 1000 server 2000000
in another window. That will keep the shared memory present, until you press
CTRL+C. You can now run the tutorial programs and the following examples
should work.
Looking at data
~~~~~~~~~~~~~~~
If you ran the program from the previous section, there should be some records
in memory now. Let's take a look:
wgdb 1000 select 20
It should return something like this:
[NULL,NULL,NULL,NULL,NULL,NULL,NULL,443,NULL,NULL]
["this is my string",NULL]
The "1000" in the command is the same shared memory key we used earlier.
"select" prints records from the database and "20" limits the maximum number
of records that will be shown. There is also a query command that lets you
specify which records you are interested in:
wgdb 1000 query 7 = 443
That will only return the first record, the one where field 7 equals 443.
There are other comparison operators: "!=" for not equal, "<" for less than,
"<=" for less than or equal and so forth. Currently the query command does not
have a row limit parameter.
Modifying data
~~~~~~~~~~~~~~
The command line tool allows some data manipulation: deleting and adding
records. The "del" command has the same syntax as the query command, so
wgdb 1000 del 7 = 443
will delete the first row from the database. We can also add records, but
only integer and string values are recognized this way - dealing with other
types unambiguously would become complicated.
wgdb 1000 add 1 2 3
This created a record with the length 3 and inserted three integer values
in it. Let's see what the database now contains. By the way, since "1000" is
the default key, we may omit it:
wgdb select 20
The entire contents of the database will now be:
["this is my string",NULL]
[1,2,3]
Freeing the memory
~~~~~~~~~~~~~~~~~~
At some point we may need to delete the database for whatever reason. The
`wgdb` tool will help:
wgdb 1000 free
The database with the given key will be freed. Again, "1000" may be omitted
as it is the default.
Making queries
--------------
Finding matching records
~~~~~~~~~~~~~~~~~~~~~~~~
Finding records that match some condition is easy:
void *rec = wg_find_record_int(db, 7, WG_COND_EQUAL, 443, NULL);
This returns the first record that has the integer 443 in field 7. That much
is obvious, but some of the parameters might need extra explanation.
First, just a reminder that `db` is the database handle we've been using
each time we call a WhiteDB function. As a second parameter we give the
number of the field that the database engine should check against the
value we've given.
The third parameter is the condition: we need some way of stating that we want
records where "field 7" "equals" "443" so that is the "equals" part. There are
other conditions, for example, if we substituted WG_COND_LESSTHAN there, we
would receive a record where the value in field 7 is less than 443. The full
list of possible conditions is given in 'Manual.txt'.
The fourth parameter is, of course, the value. The function we called ended
with '_int' and that parameter should also be of the type `int`. There is
a function for most of WhiteDB's datatypes, so if we were looking for a
string we would use `wg_find_record_str()` instead.
Now let's turn our attention to the mysterious NULL parameter. Remember
that our example function call returned the *first* record that matched
our parameters? What if there are more matching records and we want to find
those too? That can be done:
void *nextrec = wg_find_record_int(db, 7, WG_COND_EQUAL, 443, rec);
Instead of the NULL, we can give the record that the function returned last
time and WhiteDB will return the next one that matches the same condition.
The following example will call `wg_find_record_int()` in a loop, finding
all the matching records from the database. We're adding some records, so it
will print "Found a record..." at least once. Run it multiple times and see
the number of matching records increase ('Examples/tut3.c'):
[source,C]
----
#include <stdio.h>
#include <whitedb/dbapi.h>
int main(int argc, char **argv) {
  void *db, *rec;
  wg_int enc;
  db = wg_attach_database("1000", 2000000);
  /* create some records for testing */
  rec = wg_create_record(db, 10);
  enc = wg_encode_int(db, 443); /* will match */
  wg_set_field(db, rec, 7, enc);
  rec = wg_create_record(db, 10);
  enc = wg_encode_int(db, 442);
  wg_set_field(db, rec, 7, enc); /* will not match */
  /* now find the records that match our condition
   * "field 7 equals 443"
   */
  rec = wg_find_record_int(db, 7, WG_COND_EQUAL, 443, NULL);
  while(rec) {
    printf("Found a record where field 7 is 443\n");
    rec = wg_find_record_int(db, 7, WG_COND_EQUAL, 443, rec);
  }
  return 0;
}
----
Full query interface
~~~~~~~~~~~~~~~~~~~~
The above method of finding data is convenient, but it can be too limited
(and in some specific cases inefficient), so eventually we may need to make
use of full queries. The query API has a number of features we will not
be discussing here; instead we'll look at only the basic steps.
Running a query in WhiteDB consists of preparing the argument list, creating
the query itself and then fetching the matching records from the query.
wg_query_arg arglist[2];
arglist[0].column = 7;
arglist[0].cond = WG_COND_EQUAL;
arglist[0].value = wg_encode_query_param_int(db, 443);
The `wg_query_arg` type is where we store one condition that the returned
records should match (or, a "clause" of the query, if you like more exact
terminology). Here we've specified again that we'd like to find records where
"field 7 equals 443".
Notice that we declared `arglist` as an array of 2 elements? Well, this is
because we can give more than one argument:
arglist[1].column = 6;
arglist[1].cond = WG_COND_EQUAL;
arglist[1].value = wg_encode_query_param_null(db, NULL);
Now we're looking for records where "field 7 equals 443 and field 6 equals
NULL". The value for both arguments is encoded and it's recommended to use the
special `wg_encode_query_param_*()` functions for that purpose.
wg_query *query = wg_make_query(db, NULL, 0, arglist, 2);
We pass the argument list and its size, which is 2, to WhiteDB (ignore the
other parameters for now, they're not used if you use the argument list).
We receive a query object in return and are finally ready to start fetching
records:
void *rec = wg_fetch(db, query);
The `wg_fetch()` function will return a different record each time you call it
and eventually it will return NULL, meaning that you've already fetched all
the records that match our argument list. Finally we should do some
housekeeping, as queries may take up quite a bit of memory:
wg_free_query(db, query);
wg_free_query_param(db, arglist[0].value);
wg_free_query_param(db, arglist[1].value);
That was quite a bit of work to do essentially the same thing we achieved
with the help of just one function earlier, but it will also give you more
power and flexibility. The following program ('Examples/tut4.c') summarizes
what we've looked at here:
[source,C]
----
#include <stdio.h>
#include <whitedb/dbapi.h>
int main(int argc, char **argv) {
  void *db, *rec;
  wg_int enc;
  wg_query_arg arglist[2]; /* holds the arguments to the query */
  wg_query *query; /* used to fetch the query results */
  db = wg_attach_database("1000", 2000000);
  /* just in case, create some records for testing */
  rec = wg_create_record(db, 10);
  enc = wg_encode_int(db, 443); /* will match */
  wg_set_field(db, rec, 7, enc);
  rec = wg_create_record(db, 10);
  enc = wg_encode_int(db, 442);
  wg_set_field(db, rec, 7, enc); /* will not match */
  /* now find the records that match the condition
   * "field 7 equals 443 and field 6 equals NULL". The
   * second part is a bit redundant but we're adding it
   * to show the use of the argument list.
   */
  arglist[0].column = 7;
  arglist[0].cond = WG_COND_EQUAL;
  arglist[0].value = wg_encode_query_param_int(db, 443);
  arglist[1].column = 6;
  arglist[1].cond = WG_COND_EQUAL;
  arglist[1].value = wg_encode_query_param_null(db, NULL);
  query = wg_make_query(db, NULL, 0, arglist, 2);
  while((rec = wg_fetch(db, query))) {
    printf("Found a record where field 7 is 443 and field 6 is NULL\n");
  }
  /* Free the memory allocated for the query */
  wg_free_query(db, query);
  wg_free_query_param(db, arglist[0].value);
  wg_free_query_param(db, arglist[1].value);
  return 0;
}
----
You may run this program a couple of times, then run `wgdb select 20` to
verify that the tutorial program prints the correct number of rows.
Doing things properly
---------------------
The examples we've been following up to now have been a bit sloppy. We haven't
bothered to check whether the WhiteDB functions fail or succeed, nor to clean
up after we were done - there was only the bit about freeing queries
which was just too important to ignore.
First thing you should consider is that attaching to a database can sometimes
fail. For example, it is possible that you requested more shared memory than
the system configuration allows. So do a check like this:
void *db = wg_attach_database("1000", 1000000);
if(db == NULL) { /* do something to handle the error */ }
Creating records or encoding data can fail if the database is full - actually
a common occurrence since memory databases are naturally smaller than the
traditional disk-based ones.
void *rec = wg_create_record(db, 1000);
if(rec == NULL) { /* record was not created, can't use it */ }
wg_int enc = wg_encode_str(db, "This could fail", NULL);
if(enc == WG_ILLEGAL) { /* string encoding failed */ }
Notice the WG_ILLEGAL value? This is a special encoded value that WhiteDB uses
to tell you that something went wrong. All `wg_encode_*()` functions return
that so it is easy to check for encode errors.
Decoding values can also fail. The most obvious case is when you expect some
field in a record to be of a certain type, but some other program or user has
written a different value there. This can sometimes be tricky to detect
so you should consult the 'Manual.txt' file to see how a particular `wg_decode_*()`
function behaves.
If the database is used in a way that makes it difficult to predict what
type of data a field contains, there is a way to find out:
if(wg_get_field_type(db, rec, 0) == WG_STRTYPE) {
printf("Field 0 in rec is a string\n");
}
Finally, here's something that is nice to do whenever a program stops using
a database:
wg_detach_database(db);
This detaches us from the shared memory and also frees any memory that may
be allocated for the database handle.
Let's try to apply all of those things in practice ('Examples/tut5.c'). The
program creates records in a loop and writes a short string value in each
of them. We have a small database, so soon we'll run out of space, causing
some errors which we are now able to cope with:
[source,C]
----
#include <stdio.h>
#include <stdlib.h>
#include <whitedb/dbapi.h>
int main(int argc, char **argv) {
  void *db, *rec, *lastrec;
  wg_int enc;
  int i;
  db = wg_attach_database("1000", 1000000); /* 1MB should fill up fast */
  if(!db) {
    printf("ERR: Could not attach to database.\n");
    exit(1);
  }
  lastrec = NULL;
  for(i=0;;i++) {
    char buf[20];
    rec = wg_create_record(db, 1);
    if(!rec) {
      printf("ERR: Failed to create a record (made %d so far)\n", i);
      break;
    }
    lastrec = rec;
    sprintf(buf, "%d", i); /* better to use snprintf() in real applications */
    enc = wg_encode_str(db, buf, NULL);
    if(enc == WG_ILLEGAL) {
      printf("ERR: Failed to encode a string (%d records currently)\n", i+1);
      break;
    }
    if(wg_set_field(db, rec, 0, enc)) {
      printf("ERR: This error is less likely, but wg_set_field() failed.\n");
      break;
    }
  }
  /* For educational purposes, let's pretend we're interested in what's
   * stored in the last record.
   */
  if(lastrec) {
    char *str = wg_decode_str(db, wg_get_field(db, lastrec, 0));
    if(!str) {
      printf("ERR: Decoding the string field failed.\n");
      if(wg_get_field_type(db, lastrec, 0) != WG_STRTYPE) {
        printf("ERR: The field type is not string - "
          "should have checked that first!\n");
      }
    }
  }
  wg_detach_database(db);
  return 0;
}
----
To try this out, let's first try to cause an error on attaching. Type these
commands:
wgdb free
wgdb create 999999
Then run the example program. Since we've already created a database and it's
smaller than what the program is requesting, it should complain and exit early.
When done with this, type `wgdb free` again to delete the smaller database.
Run the example program again; this time there is nothing in the way, so it
can create the database named "1000" itself.
The results can be a bit unpredictable, but you should either see a record
creation error or a number of errors related to a string field. Of course,
this does not mean that the program failed or did nothing useful - a number
of records were created successfully, we just eventually ran out of database
space. Everything our program printed has "ERR:" in front of it - the rest
of the messages come from WhiteDB.
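The comment next to `sprintf()` in the example recommends `snprintf()` for
real applications. A minimal sketch of what that could look like is given
here. The helper name `format_int` is hypothetical and not part of WhiteDB; it
only relies on the standard C `snprintf()`, which reports how many characters
the full text would have needed.

```c
#include <stdio.h>

/* Formats `value` as decimal text into `buf`, which holds `size` bytes.
 * Returns 0 on success, -1 if the text did not fit or formatting failed.
 * (Hypothetical helper for illustration only.) */
int format_int(char *buf, size_t size, int value) {
  int n = snprintf(buf, size, "%d", value);
  /* snprintf() returns the length the complete text needs, so a result
   * of `size` or more means the output was truncated. */
  if(n < 0 || (size_t)n >= size)
    return -1;
  return 0;
}
```

A caller would check the return value before passing `buf` on to
`wg_encode_str()`, in the same way the example checks the other calls.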
Parallel use
------------
WhiteDB never locks the database for you. Whenever something is read or
written, the engine just goes and does it without checking whether some other
program (or user) is currently using the database. This goes with the
philosophy of speed and simplicity.
But there are many use cases where parallel use of the database is needed and
in those cases not everybody can just crowd the database at the same time and
start making changes to the shared memory area - that could result in
inconsistent data, or worse, a corrupt and useless database. Fortunately,
WhiteDB does provide the user with the tools to handle that.
The rule of thumb is that you need these concurrency control functions
whenever the database is *both read and written* by several processes and
possibly at the same time. For example, if a database is serving data to
a webserver and there are occasional updates to the data (without shutting
down the webserver), that would qualify as needing concurrency control.
To implement this concurrency control, we first request permission to read
whenever we are about to read something from the database and similarly
declare our intention to write to the database. Once we're done we inform
the database engine that we're finished so others may proceed. So,
wg_int lock_id = wg_start_read(db);
/* do some reading */
wg_end_read(db, lock_id);
requests permission to read. There may be some time until the function
`wg_start_read()` returns - it may need to wait for some other process to
finish whatever it is doing with the database. Once it returns the `lock_id`,
we have shared access to the database - we may read it safely and so may
other processes, but no one can write anything. `wg_end_read()` declares
that we no longer need the read access.
It is quite possible that `wg_start_read()` fails - it can happen under heavy
load or if some other process is hogging the database for a long time. We
should always check:
lock_id = wg_start_read(db);
if(!lock_id) {
printf("wg_start_read() timed out\n");
exit(1); /* or go and retry, whatever is appropriate */
}
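If retrying is the appropriate reaction, the loop might be sketched as
follows. To keep the sketch compilable without the WhiteDB headers, the lock
call is passed in as a function pointer; `acquire_fn`, `acquire_with_retry`
and `flaky_acquire` are all hypothetical names, and the only assumption about
the real call is the one stated above: zero means failure.

```c
#include <stddef.h>

/* Stand-in for a lock-acquiring call such as wg_start_read(): returns a
 * non-zero lock id on success and 0 on timeout. Hypothetical typedef. */
typedef long (*acquire_fn)(void *db);

/* Calls acquire(db) up to max_attempts times and returns the first
 * non-zero lock id, or 0 if every attempt failed. */
long acquire_with_retry(acquire_fn acquire, void *db, int max_attempts) {
  int attempt;
  for(attempt = 0; attempt < max_attempts; attempt++) {
    long lock_id = acquire(db);
    if(lock_id)
      return lock_id;
  }
  return 0;
}

/* Demonstration-only lock source: `db` points to an int that says how
 * many more times the call should fail before it succeeds. */
long flaky_acquire(void *db) {
  int *failures_left = db;
  if(*failures_left > 0) {
    (*failures_left)--;
    return 0;
  }
  return 42; /* arbitrary non-zero lock id */
}
```

In a real program a thin wrapper around `wg_start_read()` or
`wg_start_write()` would take the place of `flaky_acquire`. A bounded number
of attempts keeps the program from hanging forever if another process never
releases the database.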
Getting write access is similar; the major difference is that once we get
the permission, we have exclusive access - everyone else has to wait until
we're done adding or updating the data:
lock_id = wg_start_write(db);
if(!lock_id) {
printf("wg_start_write() timed out\n");
exit(1);
}
/* do some writing */
wg_end_write(db, lock_id);
NOTE: At present these functions behave exactly like operations on a
single big database-level lock. This tutorial does not make a secret of it;
however, in the future they may start behaving more like
transactions. The important thing to remember is that the purpose is to allow
you to read and write data safely, without corruption and inconsistency. By
following the pattern described here, your program will continue to work
unmodified, no matter how WhiteDB implements things internally.
To illustrate parallel use, we will implement a counter that is incremented
from two programs simultaneously. This kind of example is frequently used
in parallel programming tutorials: when done naively, the counter counts
incorrectly because the processes end up ignoring some of the increments done
by the other process. We will place this counter value inside a WhiteDB
database ('Examples/tut6.c').
[source,C]
----
#include <stdio.h>
#include <stdlib.h>
#include <whitedb/dbapi.h>
#define NUM_INCREMENTS 100000
void die(void *db, int err) {
  wg_detach_database(db);
  exit(err);
}
int main(int argc, char **argv) {
  void *db, *rec;
  wg_int lock_id;
  int i, val;
  if(!(db = wg_attach_database("1000", 1000000))) {
    exit(1); /* failed to attach */
  }
  /* First we need to make sure both counting programs start at the
   * same time (otherwise the example would be boring).
   */
  lock_id = wg_start_read(db);
  rec = wg_get_first_record(db); /* our database only contains one record,
                                  * so we don't need to make a query.
                                  */
  wg_end_read(db, lock_id);
  if(!rec) {
    /* There is no record yet, we're the first to run and have
     * to set up the counter.
     */
    lock_id = wg_start_write(db);
    if(!lock_id) die(db, 2);
    rec = wg_create_record(db, 1);
    wg_end_write(db, lock_id);
    if(!rec) die(db, 3);
    printf("Press a key when all the counter programs have been started.");
    fgetc(stdin);
    /* Setting the counter to 0 lets each counting program know it can
     * start counting now.
     */
    lock_id = wg_start_write(db);
    if(!lock_id) die(db, 2);
    wg_set_field(db, rec, 0, wg_encode_int(db, 0));
    wg_end_write(db, lock_id);
  } else {
    /* Some other program has started first, we wait until the counter
     * is ready.
     */
    int ready = 0;
    while(!ready) {
      lock_id = wg_start_read(db);
      if(!lock_id) die(db, 2);
      if(wg_get_field_type(db, rec, 0) == WG_INTTYPE)
        ready = 1;
      wg_end_read(db, lock_id);
    }
  }
  /* Now start the actual counting. */
  for(i=0; i<NUM_INCREMENTS; i++) {
    lock_id = wg_start_write(db);
    if(!lock_id) die(db, 2);
    /* This is the "critical section" for the counter. We read the value,
     * increment it and write it back.
     */
    val = wg_decode_int(db, wg_get_field(db, rec, 0));
    wg_set_field(db, rec, 0, wg_encode_int(db, ++val));
    wg_end_write(db, lock_id);
  }
  printf("\nCounting done. My last value was %d\n", val);
  return 0;
}
----
Before running the program, type `wgdb free`. We've tried to keep the example
as simple as possible, so it needs a completely empty database to start. Next
start the program in one window; it will prompt you to press a key. Now go
to another window and start the program one more time - this one will just
wait and do nothing.
Both programs are now ready to start counting, but the first program is
waiting for a key press and the second one is waiting for the counter to get
its initial value. Go back to the first window and press a key. The counter
will be initialized and both programs proceed to increment it 100000 times.
Type `wgdb select 1` to check what the value of the counter is - it should be
200000.
If these two programs had been reading different sensors in a real-life
application, each counting some event, we would have ended up with the correct
total number of events.
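To see why the write lock has to wrap the whole read-increment-write sequence,
the following plain C sketch replays the two possible schedules by hand. It
does not use WhiteDB at all: `struct worker`, `read_counter()`,
`write_counter()` and the two `run_*()` functions are hypothetical names made
up for this illustration, and the two "programs" are simulated inside a single
process so that the result is deterministic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one counting program: it remembers the
 * value it last read from the shared counter. */
struct worker {
  int seen;
};

/* First half of an increment: read the shared counter. */
void read_counter(struct worker *w, const int *counter) {
  assert(w != NULL && counter != NULL);
  w->seen = *counter;
}

/* Second half of an increment: write back the value read earlier, plus one. */
void write_counter(const struct worker *w, int *counter) {
  assert(w != NULL && counter != NULL);
  *counter = w->seen + 1;
}

/* Unprotected schedule: both workers read before either one writes,
 * so the second write overwrites the first and one increment is lost. */
int run_interleaved(int rounds) {
  int i, counter = 0;
  struct worker a, b;
  for(i = 0; i < rounds; i++) {
    read_counter(&a, &counter);
    read_counter(&b, &counter);
    write_counter(&a, &counter);
    write_counter(&b, &counter);
  }
  return counter;
}

/* Locked schedule: each read-then-write pair completes before the other
 * worker gets its turn, which is what a write lock enforces. */
int run_serialized(int rounds) {
  int i, counter = 0;
  struct worker a, b;
  for(i = 0; i < rounds; i++) {
    read_counter(&a, &counter);
    write_counter(&a, &counter);
    read_counter(&b, &counter);
    write_counter(&b, &counter);
  }
  return counter;
}
```

Calling `run_interleaved(100000)` returns 100000, because every round loses one
of the two increments, while `run_serialized(100000)` returns 200000. Real
unlocked programs would interleave unpredictably and lose a varying number of
increments; the sketch fixes the worst-case schedule only to keep the outcome
repeatable.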
Network database
----------------
If you can picture the content of a WhiteDB database, there isn't necessarily
much structure there - just many, many records which may be of different
sizes. Because you can query records by the field contents, the fields
can be thought of as columns and the entire database as a flat table where
the records form the rows.
To fit any kind of data model in WhiteDB, the user needs to create the
structure themselves. Carrying over the RDBMS mindset and creating, for
example, 'employee', 'address' and 'customer' records that can be referred
to by some id field would certainly work, but there is another way.
Before there were relational databases, there was the network model. It is
a fairly simple concept where instead of referring to something by key, you
include it directly: if I want to add an address to a customer, I just insert
the address into the customer record. The difference may seem superficial, but
the underlying mechanics are very different.
Record linking in WhiteDB
~~~~~~~~~~~~~~~~~~~~~~~~~
WhiteDB implements the network model in a straightforward way. This tutorial
has already shown how to write an integer or a string into a field inside
a record; putting another record there works exactly the same way:

  void *rec = wg_create_record(db, 2); /* this is some record */
  void *rec2 = wg_create_record(db, 3); /* this is another record */
  wg_int enc = wg_encode_record(db, rec); /* encode as a WhiteDB value */
  wg_set_field(db, rec, 1, wg_encode_str(db, "hello", NULL));
  wg_set_field(db, rec2, 2, enc);

Assuming we don't add any other fields, the first record now looks like this:
`[NULL, "hello"]`. But the second one is more interesting: `[NULL, NULL,
[NULL, "hello"]]`. Of course, the data isn't actually duplicated inside the
second record; its field 2 just contains a reference. But if your program
already holds `rec2`, getting to the contents of `rec` is very fast.
So how does one make use of such data? If I have the second record, I can
get to the message inside the first record like this (checking the type
isn't mandatory, of course; it is done here just to show that a record is
a datatype just like an integer or a double):

  if(wg_get_field_type(db, rec2, 2) == WG_RECORDTYPE) {
    void *rec = wg_decode_record(db, wg_get_field(db, rec2, 2));
    char *message = wg_decode_str(db, wg_get_field(db, rec, 1));
    /* we find that the message is "hello" */
  }

Linking records together can produce a graph, a tree or whatever is needed.
Or we can treat the records as containing other records inside, so our
initial view of them as flat table rows was a bit deceptive and a set would
perhaps be a more accurate description.
NOTE: A field containing a reference to a record isn't different from other
fields. It can still be indexed and you could write a query like "give me
a record where field 3 is [ "owner", [ "name", "Peter", "age", 25 ]]" and
it would work. However, while record references are fast, keeping track of
such recursion can be expensive so it is best to avoid deep chains of records
on indexed fields.
The next program ('Examples/tut7.c') creates some records and links them
together. Run it, then use `wgdb select 20` to examine the database contents.
The records will appear nested inside each other:
[source,C]
----
#include <stdlib.h>
#include <whitedb/dbapi.h>

int main(int argc, char **argv) {
  void *db, *rec, *rec2, *rec3;
  wg_int enc;

  if(!(db = wg_attach_database("1000", 2000000)))
    exit(1); /* failed to attach */

  rec = wg_create_record(db, 2); /* this is some record */
  rec2 = wg_create_record(db, 3); /* this is another record */
  rec3 = wg_create_record(db, 4); /* this is a third record */

  if(!rec || !rec2 || !rec3)
    exit(2);

  /* Add some content */
  wg_set_field(db, rec, 1, wg_encode_str(db, "hello", NULL));
  wg_set_field(db, rec2, 0, wg_encode_str(db,
    "I'm pointing to other records", NULL));
  wg_set_field(db, rec3, 0, wg_encode_str(db,
    "I'm linked from two records", NULL));

  /* link the records to each other */
  enc = wg_encode_record(db, rec);
  wg_set_field(db, rec2, 2, enc); /* rec2[2] points to rec */
  enc = wg_encode_record(db, rec3);
  wg_set_field(db, rec2, 1, enc); /* rec2[1] points to rec3 */
  wg_set_field(db, rec, 0, enc); /* rec[0] points to rec3 */

  wg_detach_database(db);
  return 0;
}
----
When to use the network model
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NOTE: The code included in this section isn't here to show you how to do
things, but rather to illustrate what would happen if things *were* done this
way.
Consider our earlier example with two records, one of them containing the
important message "hello". Let's try to use techniques that work in relational
databases to accomplish the same thing. Storing the data isn't complicated:

  void *rec = wg_create_record(db, 2);
  void *rec2 = wg_create_record(db, 3);
  wg_int hello_id = wg_encode_int(db, 11913); /* an arbitrary record id */
  wg_set_field(db, rec, 0, hello_id); /* assign the id */
  wg_set_field(db, rec, 1, wg_encode_str(db, "hello", NULL));
  wg_set_field(db, rec2, 2, hello_id); /* reference by id */

So far so good. The records should now contain `[11913, "hello"]` and `[NULL,
NULL, 11913]`. Pretend again that we have just the `rec2` (as a result of a
query, for example) and need the contents of `rec`:

  int hello_id = wg_decode_int(db, wg_get_field(db, rec2, 2));
  /* this will search the database for 11913 */
  void *rec = wg_find_record_int(db, 0, WG_COND_EQUAL, hello_id, NULL);
  char *message = wg_decode_str(db, wg_get_field(db, rec, 1));

If you have an index on field 0, this doesn't look that bad. Sure, the value
"11913" needs to be looked up, but `wg_find_record_int()` can manage it
reasonably fast. What if you had to do this inside a loop, though? How about
a nested loop? Executing a million queries isn't that fun anymore, even if
each one takes very little time on its own - `wg_decode_record()` would have
taken a fraction of that.
The lesson to learn from here is that whenever you have a data model where
objects have relations to each other and you need to perform JOIN type queries
on them, it can be much faster in WhiteDB to implement the "join" in advance
by linking the object records to each other. Those links can then be navigated
rapidly to collect all the data.
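The difference between the two approaches can be made concrete with a small
model in plain C. This is only a sketch under stated assumptions: it does not
call WhiteDB, `struct record`, `find_by_id()` and `follow_link()` are
hypothetical names, and a sorted array searched by bisection merely stands in
for an indexed field (WhiteDB's actual index structure is not modelled here).
The `probes` counter records how many records had to be inspected.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-memory record: an id, a payload and a direct link. */
struct record {
  int id;
  const char *message;
  struct record *link; /* network model: points straight at the target */
};

/* Relational style: find a record by id in an array sorted by id,
 * roughly the way an index lookup would. Each inspected record adds
 * one to *probes. Returns NULL if the id is not present. */
struct record *find_by_id(struct record *recs, size_t n, int id,
  long *probes) {
  size_t lo = 0, hi = n;
  assert(recs != NULL && probes != NULL);
  while(lo < hi) {
    size_t mid = lo + (hi - lo) / 2;
    (*probes)++;
    if(recs[mid].id == id)
      return &recs[mid];
    if(recs[mid].id < id)
      lo = mid + 1;
    else
      hi = mid;
  }
  return NULL;
}

/* Network style: following the stored link needs no search at all. */
struct record *follow_link(const struct record *rec) {
  assert(rec != NULL);
  return rec->link;
}
```

In this model, every call to `find_by_id()` inspects several records, and the
number grows with the logarithm of the record count, whereas `follow_link()`
inspects none. Repeated a million times inside nested loops, the per-lookup
cost is what adds up.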
Another use case is storing semi-structured data. We have used the network
model in our implementation of JSON documents in WhiteDB (a work in progress)
where for example an array can contain other arrays or objects: the record
linking feature allows us to accomplish this elegantly by making the array
a record whose fields contain links to other records holding the child
elements.
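The nesting described above can be pictured with a small plain C model. It is
an illustration only and not how WhiteDB or its JSON support is implemented:
`struct node`, `struct field`, the `FIELD_*` constants and `count_ints()` are
hypothetical names, and the model assumes the links never form a cycle. A
field either holds a value or a link to another record, and a recursive walk
collects data from the whole structure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical field types: a field is empty, an integer or a link. */
enum field_type { FIELD_NULL, FIELD_INT, FIELD_RECORD };

struct node;

struct field {
  enum field_type type;
  union {
    int i;            /* used when type == FIELD_INT */
    struct node *rec; /* used when type == FIELD_RECORD */
  } v;
};

/* A record is a fixed-length row of fields, like an array or object
 * in a semi-structured document. */
struct node {
  size_t len;
  struct field *fields;
};

/* Count the integer fields reachable from rec, descending into linked
 * child records. Assumes the links do not form a cycle. */
size_t count_ints(const struct node *rec) {
  size_t i, total = 0;
  assert(rec != NULL);
  for(i = 0; i < rec->len; i++) {
    const struct field *f = &rec->fields[i];
    if(f->type == FIELD_INT)
      total++;
    else if(f->type == FIELD_RECORD && f->v.rec != NULL)
      total += count_ints(f->v.rec);
  }
  return total;
}
```

For a structure shaped like the array `[1, [2, 3], null]`, the outer record
has three fields, the middle one linking to a two-field record, and
`count_ints()` on the outer record returns 3.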
More examples
-------------
There are a few more examples distributed with WhiteDB that were not covered
in this tutorial. You may look at 'Examples/demo.c' and 'Examples/query.c',
which should be commented well enough to be understandable by now.
Examples in Examples/speed are geared towards speed testing and are covered
in http://whitedb.org/speed.html
A somewhat more involved example is 'Examples/dserve.c', a simple REST CGI
program that queries WhiteDB and gives JSON or CSV output.
`dserve` is useful both as it is and as an example/template for making your own
tools for WhiteDB data handling: see http://whitedb.org/tools.html