File: server.html

package info (click to toggle)
gadfly 1.0-2
  • links: PTS
  • area: main
  • in suites: slink
  • size: 796 kB
  • ctags: 1,697
  • sloc: python: 10,962; makefile: 35; sh: 19
file content (287 lines) | stat: -rw-r--r-- 10,376 bytes parent folder | download | duplicates (7)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
<html>
<head>
<title>Gadfly: server operations</title>
</head>
<body bgcolor="#ffffff">

<table><tr><td>
<img src="gadfly.JPG">
</td><td>
<h1>Gadfly Server Operations</h1>
</td></tr></table>

To permit multiple processes to access and modify a
single database instance, and to reduce the overhead per process
of connecting to a Gadfly database a Gadfly database may be
run in server mode.  A Gadfly server can use a DBA
(data base administrator) configured
start-up script to set up optimized query accesses and certain
forms of security.
<p>
For example to startup a server for the
test database "test" in directory "dbtest" (created by gftest.py)
use:
<pre>
   python gfserve.py 2222 test dbtest admin
</pre>
or to start up the same server with some non-priviledged
policies and some named prepared queries (as initialized
in gfstest.startup(...)) use
<pre>
   python gfserve.py 2222 test dbtest admin gfstest
</pre>
In both cases the admin password for the server is "admin"
and the server runs on port 2222.
See the doc string for gfserve.py for more information on
the command line arguments.
<p>
Only one process should directly access a gadfly database at once
(not mediated by a server),
so if a server is running, no other server for that database
should be started and no other process should connect in "non-server"
mode to that database.

<h1>Motivation</h1>

There are several reasons to run a server: to allow multiple
processes to access the same database; to allow password protected
restricted access to the database by non-priviledged agents;
and to permit faster access to the database 
by providing globally shared
prepared statements.  Using a server also eliminates the need
to start up and load the database many times -- and startup
time could be considerable if the database is large.
<p>
For example I imagine that simple Gadfly servers may be of
use to implement database enabled CGI scripts, whereas the
"non-server" Gadfly will only run with CGI scripts that do not modify
the database, and the startup time for Gadfly might make those
scripts unacceptibly slow if the database is large.  Furthermore,
by using the security features a Gadfly server could be configured
to allow restricted data distribution across a network without
compromising the integrity of the database.

<h1>Security</h1>

The primary goal of Gadfly server security is to prevent
accidental or malicious destruction of a database.
<p>
Security is arbitrated by policies.  Policies have passwords
that are never transmitted in clear text.  However, a "captured"
command could potentially be repeated by a hostile program
even without knowing the password.  It is not a good idea to
run admin or other unrestricted commands on a network that may
have hostile parties sniffing the network.  As with the rest
of the system I provide no guarantees, but for many purposes
the level of security provided may be acceptible.  To be specific
passwords are used to generate md5 certificates for all server
accesses (please see gfsocket.py for implementation details).
<p>
A server always has
an "admin" policy that is permitted to shutdown, restart, or
force a checkpoint on the server.  By default the admin
policy also has the ability to run arbitrary SQL statements
such as "drop table x".  This ability can be disabled in
a startup function if needed.
<pre>
admin_policy.general_queries=0
</pre>
<p>
Other policies can be created that have very restricted access.
For example the following startup function initializes two
policies beyond the admin policy that can only access certain
tables in specific ways (from gfstest.py):
<pre>
def startup(admin_policy, connection, Server_instance):
    """example startup script.

       add a policies test and test1 passwords same
         test1 is allowed to query the likess table by name
         test is allowed to update likes where drinker='nan'
       also add prepared query dumpwork to admin_policy.
    """
    from gfserve import Policy
    admin_policy["dumpwork"] = "select * from work"
    test1 = Policy("test1", "test1", connection, queries=0)
    test = Policy("test", "test", connection, queries=0)
    test1["qlike"] = "select * from likes where drinker=?"
    test["updatenan"] = """
      update likes
      set beer=?, perday=?
      where drinker='nan'
    """
    test["getnan"] = """
      select * from likes where drinker='nan'
    """
    return {"test": test, "test1": test1}
</pre>
Please see the doc string for gfserve.py for more information
on creating startup functions.
<p>
A policy with queries disabled (queries=0) can only execute
named queries.  By using such policies a DBA can configure
a server such that client programs can only read certain tables,
can only update certain rows of certain tables in certain ways,
and so forth.
<p>
Even policies with "unrestricted access" (queries=1)
can provide performance benefits if they have associated
named, prepared queries (like "dumpwork" above).  At the moment
the SQL parser slows down gadfly a bit, and prepared queries
will only be parsed once for all clients.  After the first
access subsequent accesses may be noticably faster (10x faster
in some cases), especially
if the server has the kjbuckets builtin C module.  However,
with queries=1 the policy can execute any SQL statement.
<p>
<strong>NOTE: The server runs all accesses (once the complete
message has been read from the network) serially -- there is
no concurrent access permitted to
a Gadfly instance at this time.  For this
reason a "large query" may cause the server to "freeze" and
delay other accesses.
</strong>  Incomplete requests due to network delays or
other problems will not freeze the server, however (sockets
are polled using select.select).
<p>
<strong>NOTE: All server accesses run in "autocommit mode" at
this time.  A successful access automatically triggers a database
commit (but an unsuccessful access will rollback).</strong>
<p>
As an optimization, however, checkpoints only occur occasionally,
once per a given number of accesses, configurable by setting:
<pre>
Server_instance.check_loop = 100
</pre>

<h1>Start up</h1>

Servers can be started from the command line using the gfserve.py
script interpretation 
(as shown above)
or using gfserve.Server(...) from another
program.  See the doc strings and source for gfserve.py and gfstest.py
for more information.

<h1>Shut down</h1>

Servers can be shut down from the command line interpretation of
gfclient.py or from another program using the gfclient(...) class
shutdown() method, but only using the admin policy with the admin
password.  For example to shut down the server started above:
<pre>
python gfclient.py shutdown 2222 admin
</pre>
See the doc strings and source for gfserve.py 
and gfstest.py
for more information.

<h1>Client Access</h1>

Client access to a gadfly server is similar to the normal 
Python DB-SIG DBAPI
access to gadfly, except that it is sometimes faster and can
potentially
be run from any machine reachable on the network (if the client
program knows the password).
<p>
To access a gadfly server from a remote machine the only
python modules required (in addition to the standard libraries)
are gfclient.py and gfsocket.py.
<p>
Initialize a connection with a given "POLICY" with "PASSWORD"
 to a running server
on "machine.domain.com" using port number 2222 with:
<pre>
   from gfclient import gfclient
   conn = gfclient("POLICY", 2222, "PASSWORD", "machine.domain.com")
</pre>
Note that policy names and passwords are case sensitive.
<p>
Queries and other statements are normally executed via cursors.
Obtain a cursor from a connection using:
<pre>
   cursor = connection.cursor()
</pre>
Execute a statement in a cursor using:
<pre>
   cursor.execute(statement)
</pre>
or to provide dynamic parameters:
<pre>
   cursor.execute(statement, dynamic_parameters)
</pre>
For example
<pre>
   cursor.execute("select * from work")
   ...
   cursor.execute("select * from work where name=?", ("carla",))
</pre>
The dynamic parameters work the same as described in the
<a href="gadfly.html">the main gadfly documentation page</a>.
In particular INSERT VALUES can insert several rows at once
by using a list of tuples for the rows.
<p>
If there is any problem (bad policy name, bad password, server
not running, queries not allowed for this policy) 
the execute will generate an exception.
<p>
To run a named/prepared query (initialized at startup) use
execute_prepared, which takes a prepared statement name
rather than a query string:
<pre>
cursor.execute_prepared("updatenan", ("rollingrock", 1))
...
cursor.execute_prepared("getnan")
</pre>
The execute_prepared method works just like the execute
method except that the "name" must be the name of a query initialized
by the startup(...) function at server startup.
<p><strong>
NOTE: by default any execution that sends or recieves "too much
data" will be aborted.  Edit gfsocket.py (both on the client end
and on the server end if different) if you wish to disable this
sanity check feature.
<pre>
LEN_LIMIT=10e8
</pre></strong>
<p>
As with other dbapi cursors the results of a query can be
extracted as a list of tuples using (after execute):
<pre>
   result_list = cursor.fetchall()
</pre>
The other fetches (fetchone and fetchmany) have not been
implemented yet (partially since they don't make much sense
in this context).
<p>
Both named and unnamed statements may be semicolon separated
sequences of several SQL statements, but if they are they will return
no results.

<h1>Implementation Comments</h1>

For your information the server/client interaction is much like
"finger" or "http" -- each client access is a separate TCP/Stream
connection where the client sends a request and the server sends
a response.  After each access the connection is closed and the
next access generates a new connection.
I did it that way, because it was a simple and
robust strategy (witness the success of HTTP).

<p>
<strong>
Please note: Although I have attempted to provide a robust
implementation
for this software I do not guarantee its correctness.  I hope
it will work well for you but I do not assume any legal
responsibility for problems anyone may have during use
of these programs.
</strong>

<p>
<a href="mailto:arw@ifu.net">feedback</a><br>
<a href="../index.html">home</a><br>
<a href="index.html">Gadfly home</a>
</Body>
</html>