1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408
|
<HTML>
<HEAD>
<TITLE>TaskKit QuickStart</TITLE>
</HEAD>
<BODY BGCOLOR="WHITE">
<H4>Tom's Webware Documentation</H4>
<H1>Scheduling with Python and Webware</H1>
<a href="mailto:Tom.Schwaller@linux-community.de" style="text-decoration: none; color : #009999;">Tom Schwaller</a>
<P>
<HR NOSHADE>
<I><FONT COLOR="#0000ff">
Since version 0.5 the web application framework Webware has a scheduling plug-in called TaskKit.
This QuickStart Guide describes how to use it in your daily work with Webware and also with
normal Python programs.
</FONT></I>
<HR NOSHADE>
<P>
Scheduling periodic tasks is a very common activity for users of a modern operating system.
System administrators for example know very well how to start new <code>cron</code> jobs or the
corresponding Windows analogues. So, why does a web application server like Webware/WebKit
need its own scheduling framework. The answer is simple: Because it knows better how to
react to a failed job, has access to internal data structures, which otherwise would have
to be exposed to the outside world and last but not least it needs scheduling capabilities
anyway (e.g. for session sweeping and other memory cleaning operations).
<P>
Webware is developped with the object oriented scripting language Python so it seemed natural
to write a general purpose Python based scheduling framework. One could think that this
problem is already solved (remember the Python slogan: batteries included), but strange
enough there has not much work been done in this area. The two standard Python modules
<code>sched.py</code> and <code>bisect.py</code> are way too simple, not really object oriented
and also not multithreaded. This was the reason to develop a new scheduling framework, which
can not only be used with Webware but also with general purpose Python programs. Unfortunately
scheduling has an annoying side effect. The more you delve into the subject the more it becomes
difficult.
<P>
After some test implementations I discovered the Java scheduling framework of the
<a href="http://www.arlut.utexas.edu/gash2/">Ganymede</a> network directory management
system and took it as a model for the Python implementation. Like any other Webware Kit
or plug-in the TaskKit is self contained and can be used in other Python projects.
This modularity is one of the real strengths of Webware and in sharp contrast to Zope
where people tend to think in Zope and not in Python terms. In a perfect world one should
be able to use web wrappers (for Zope, Webware, Quixote,..) around clearly designed Python
classes and not be forced to use one framework. Time will tell if this is just a dream
or if people will reinvent the "python wheels" over and over again.
<h2>Tasks</h2>
<P>
The TaskKit implements the three classes <code>Scheduler, TaskHandler</code> and <code>Task</code>.
Let's begin with the simplest one, i.e. Task. It's an abstract base class, from which you
have to derive your own task classes by overriding the <code>run()</code>-method like in
the following example:
<p>
<table border=0 bgcolor="#eeeeee" width="100%"><tr><td>
<pre>
<FONT COLOR=black><B>from</B></FONT> TaskKit.Task <FONT COLOR=black><B>import</B></FONT> Task
<FONT COLOR=black><B>from</B></FONT> time <FONT COLOR=black><B>import</B></FONT> time, strftime, localtime
<FONT COLOR=black><B>class</B></FONT><A NAME="SimpleTask"><FONT COLOR=#AA00AA><B> SimpleTask</B></FONT></A>(Task):
<FONT COLOR=black><B>def</B></FONT><A NAME="run"><FONT COLOR=#AA00AA><B> run</B></FONT></A>(self):
<FONT COLOR=black><B>print</B></FONT> self.name(), strftime(<FONT COLOR=#FF0000>"%H:%M:%S"</FONT>, localtime(time()))
</pre>
</td></tr></table>
<p>
<code>self.name()</code> returns the name under which the task was registered by the scheduler.
It is unique among all tasks and scheduling tasks with the same name will delete the old
task with that name (so beware of that feature!). Another simple example which is used
by WebKit itself is found in <code>WebKit/Tasks/SessionTask.py</code>.
<p>
<table border=0 bgcolor="#eeeeee" width="100%"><tr><td>
<pre>
<FONT COLOR=black><B>from</B></FONT> TaskKit.Task <FONT COLOR=black><B>import</B></FONT> Task
<FONT COLOR=black><B>class</B></FONT><A NAME="SessionTask"><FONT COLOR=#AA00AA><B> SessionTask</B></FONT></A>(Task):
<FONT COLOR=black><B>def</B></FONT><A NAME="__init__"><FONT COLOR=#AA00AA><B> __init__</B></FONT></A>(self, sessions):
Task.__init__(self)
self._sessionstore = sessions
<FONT COLOR=black><B>def</B></FONT><A NAME="run"><FONT COLOR=#AA00AA><B> run</B></FONT></A>(self):
<FONT COLOR=black><B>if</B></FONT> self.proceed():
self._sessionstore.cleanStaleSessions(self)
</pre>
</td></tr></table>
<p>
Here you see the <code>proceed()</code> method in action. It can be used by long running tasks
to check if they should terminate. This is the case when the scheduler or the task itself has been
stoped. The latter is achieved with a <code>stopTask()</code> call which is not recommented though.
It's generally better to let the task finish and use the <code>unregister()</code> and
<code>disable()</code> methods. The first really deletes the task after termination while
the second only disables its rescheduling. You can still use it afterwards. Right now the implementation
of <code>proceed()</code>
<p>
<table border=0 bgcolor="#eeeeee" width="100%"><tr><td>
<pre>
<FONT COLOR=black><B>def</B></FONT><A NAME="proceed"><FONT COLOR=#AA00AA><B> proceed</B></FONT></A>(self):
<FONT COLOR=#FF0000>"""
Should this task continue running? Should be called periodically
by long tasks to check if the system wants them to exit.
Returns 1 if its OK to continue, 0 if it's time to quit.
"""</FONT>
<FONT COLOR=black><B>return</B></FONT> <FONT COLOR=black><B>not</B></FONT>( self._close.isSet() <FONT COLOR=black><B>or</B></FONT> (<FONT COLOR=black><B>not</B></FONT> self._handle._isRunning) )
</pre>
</td></tr></table>
<P>
uses the <code>_close</code> Event variable, which was also available trough the <code>close()</code> method.
Don't count on that in future versions, it will probably be removed. Just use <code>proceed()</code> instead
(take a look at <code>TaskKit/Tests/BasicTest.py</code>). Another API change after version 0.5 of Webware was
the removal of the <code>close</code> variable in <code>run()</code>. If you plan to make serious use of TaskKit
it's better to take the newest CVS snapshot of Webware, otherwise you will have to delete all occurences of
<code>close</code> afterwards. Another thing to remember about tasks is, that they know nothing about
scheduling, how often they will run (periodically or just once) or if they are on hold. All this is managed
by the task wrapper class <code>TaskManager</code>, which will be discussed shortly. Let's look at some more
examples first.
<h2>Generating static pages</h2>
<P>
On a high trafic web site (a la <a href="http://slashdot.org">slashdot</a>) it's
common practice to use semistatic page generation techniques. For example
you can generate the entry page as a static page once per minute. During this time
the content will not be completely accurate (e.g. the number of comments will certainly
increase), but nobody really cares about that. The benefit is a dramatic reduction of
database requests. For other pages (like older news with comments attached) it makes
more sense to generate static versions on demand. This is the case when the discussion
has come to an end, but somebody adds a comment afterwards and implicitely changes the
page by this action. Generating a static version will happen very seldom after the
"hot phase" when getting data directly out of the database is more appropriate. So
you need a periodic task which checks if there are new "dead" stories (e.g. no comments
for 2 days) and marks them with a flag for static generation on demand. It should
be clear by now, that an integrated Webware scheduling mechnism is very useful for
this kind of things and the better approach than external <code>cron</code> jobs.
Let's look a litle bit closer at the static generation technique now. First of all we
need a <code>PageGenerator</code> class. To keep the example simple we just
write the actual date into a file. In real life you will assemble much more complex data
into such static pages.
<p>
<table border=0 bgcolor="#eeeeee" width="100%"><tr><td>
<pre>
<FONT COLOR=black><B>from</B></FONT> TaskKit.Task <FONT COLOR=black><B>import</B></FONT> Task
<FONT COLOR=black><B>from</B></FONT> time <FONT COLOR=black><B>import</B></FONT> *
html = <FONT COLOR=#FF0000>'''<html>
<head><title>%s</title></head>
<body bgcolor="white">
<h1>%s</h1>
</body>
</html>
'''</FONT>
<FONT COLOR=black><B>class</B></FONT><A NAME="PageGenerator"><FONT COLOR=#AA00AA><B> PageGenerator</B></FONT></A>(Task):
<FONT COLOR=black><B>def</B></FONT><A NAME="__init__"><FONT COLOR=#AA00AA><B> __init__</B></FONT></A>(self, filename):
Task.__init__(self)
self._filename = filename
<FONT COLOR=black><B>def</B></FONT><A NAME="run"><FONT COLOR=#AA00AA><B> run</B></FONT></A>(self):
f = open(self._filename, <FONT COLOR=#FF0000>'w'</FONT>)
now = asctime(localtime(time()))
f.write( html % (<FONT COLOR=#FF0000>'Static Page'</FONT>, now) )
f.close()
</pre>
</td></tr></table>
<h2>Scheduling</h2>
<P>
That was easy. Now it's time to schedule our task. In the following example you can see how this is
accomplished with TaskKit. As a general recommendation you should put all your tasks in a
separate folder (with an empty <code>__init__.py</code> file to make this folder a Python
package). First of all we create a new <code>Scheduler</code> object, start it as a thread and add
a periodic page generation object (of type <code>PageGenerator</code>) with the
<code>addPeriodicAction</code> method (this will probably be changed in the near future to
the more constitent name <code>addPeriodicTask</code>). The first parameter here is the first
execution time (which can be in the future), the second is the period (in seconds), the third an instance
of our task class and the last parameter is a unique task name which allows us to find the task
later on (e.g. if we want to change the period or put the task on hold).
<p>
<table border=0 bgcolor="#eeeeee" width="100%"><tr><td>
<PRE>
<FONT COLOR=black><B>from</B></FONT> TaskKit.Scheduler <FONT COLOR=black><B>import</B></FONT> Scheduler
<FONT COLOR=black><B>from</B></FONT> Tasks.PageGenerator <FONT COLOR=black><B>import</B></FONT> PageGenerator
<FONT COLOR=black><B>from</B></FONT> time <FONT COLOR=black><B>import</B></FONT> *
<FONT COLOR=black><B>def</B></FONT><A NAME="main"><FONT COLOR=#AA00AA><B> main</B></FONT></A>():
scheduler = Scheduler()
scheduler.start()
scheduler.addPeriodicAction(time(), 5, PageGenerator(<FONT COLOR=#FF0000>'static.html'</FONT>), <FONT COLOR=#FF0000>'PageGenerator'</FONT>)
sleep(20)
scheduler.stop()
<FONT COLOR=black><B>if</B></FONT> __name__==<FONT COLOR=#FF0000>'__main__'</FONT>:
main()
</PRE>
</td></tr></table>
<P>
When you fire up this example you will notice that the timing is not 100% accurate. The reason for this
seems to be an imprecise <code>wait()</code> function in the Python <code>threading</code> module.
Unfortunately this method in indispensible because we need to be able to wake up a sleeping scheduler when
scheduling new tasks with first execution times smaller than <code>scheduler.nextTime()</code>. This is
achieved through the <code>notify()</code> method, which sets the <code>notifyEvent</code>
(<code>scheduler._notifyEvent.set()</code>). On Unix we could use <code>sleep</code> and a <code>signal</code>
to interrupt this system call, but TaskKit has to be plattform independant to be of any use. But don't worry,
this impreciseness is not important for normal usage, because we are talking about scheduling in the minute
(not second) range here. Unix <code>cron</code> jobs have a granularity of one minute, which is a good value
for TaskKit too. Of course nobody can stop you starting tasks with a period of one second (but you have been
warned that this is not a good idea, except for testing purposes).
<h2>Generating static pages again</h2>
<P>
Let's refine our example a little bit and plug it into Webware. We will write a Python servlet which loks like this:
<P>
<center>
<form method="post">
<input type=submit name=_action_ value=Generate>
<input type=text name=filename value="static.html" size=20> every
<input type=text name=seconds value=60 size=5> seconds</form>
<table width=50% border=1 cellspacing=0>
<tr bgcolor=00008B><th colspan=2><font color=white>Task List</font></th></tr>
<tr bgcolor=#dddddd><td><b>Task Name</b></td><td><b>Period</b></td></tr>
<tr><td>SessionSweeper</td><td>360</td></tr>
<tr><td>PageGenerator for static3.html</td><td>30</td></tr>
<tr><td>PageGenerator for static1.html</td><td>60</td></tr>
<tr><td>PageGenerator for static2.html</td><td>120</td></tr>
</table>
</center>
<P>
When you click on the <code>Generate</code> button a new periodic <code>PageGenerator</code> task
will be added to the Webware scheduler. Remember that this will generate a static page <code>static.html</code>
every 60 seconds (if you use the default values). The new task name is <code>"PageGenerator for filename"</code>,
so you can use this servlet to change the settings of already scheduled tasks (by rescheduling) or
add new <code>PageGenerator</code> tasks with different filenames. This is quite useless here, but as
soon as you begin to parametrize your <code>Task</code> classes this approach can become quite powerful
(consider for example a mail reminder form or collecting news from different news channels as periodic tasks
with user defined parameters). In any case, don't be shy and contribute other interesting examples
(the sky's the limit!).
<P>
Finally we come to the servlet code, which should be more or less self explanatory, except for the
<code>_action_</code> construct which is very well explained in the Webware documentation though (just
in case you forgot that). <code>app.taskManager()</code> gives you the WebKit scheduler, which can be
used to add new tasks. In real life you will have to make the scheduling information persistent
and reschedule all tasks after a WebKit restart because it would be quite annoying to enter this
data again and again. <code>PersistantScheduler</code> is a class which is on the ToDo list for the
next TaskKit version and will probably be implemented with the new MiddleKit from Chuck Esterbrook.
MiddleKit is a new object relational mapping framework for Python and greatly simplyfies this
kind of developments. You'll certainly read more about it in the future.
<p>
<table border=0 bgcolor="#eeeeee" width="100%"><tr><td>
<PRE>
<FONT COLOR=black><B>import</B></FONT> os, string, time
<FONT COLOR=black><B>from</B></FONT> ExamplePage <FONT COLOR=black><B>import</B></FONT> ExamplePage
<FONT COLOR=black><B>from</B></FONT> Tasks.PageGenerator <FONT COLOR=black><B>import</B></FONT> PageGenerator
<FONT COLOR=black><B>class</B></FONT><A NAME="Schedule"><FONT COLOR=#AA00AA><B> Schedule</B></FONT></A>(ExamplePage):
<FONT COLOR=black><B>def</B></FONT><A NAME="writeContent"><FONT COLOR=#AA00AA><B> writeContent</B></FONT></A>(self):
wr = self.write
wr(<FONT COLOR=#FF0000>'<center><form method="post">'</FONT>)
wr(<FONT COLOR=#FF0000>'<input type=submit name=_action_ value=Generate> '</FONT>)
wr(<FONT COLOR=#FF0000>'<input type=text name=filename value="static.html" size=20> every '</FONT>)
wr(<FONT COLOR=#FF0000>'<input type=text name=seconds value=60 size=5> seconds'</FONT>)
wr(<FONT COLOR=#FF0000>'</form>'</FONT>)
wr(<FONT COLOR=#FF0000>'<table width=50% border=1 cellspacing=0>'</FONT>)
wr(<FONT COLOR=#FF0000>'<tr bgcolor=00008B><th colspan=2><font color=white>Task List</font></th></tr>'</FONT>)
wr(<FONT COLOR=#FF0000>'<tr bgcolor=#dddddd><td><b>Task Name</b></td><td><b>Period</b></td></tr>'</FONT>)
<FONT COLOR=black><B>for</B></FONT> taskname, handler <FONT COLOR=black><B>in</B></FONT> self.application().taskManager().scheduledTasks().items():
wr(<FONT COLOR=#FF0000>'<tr><td>%s</td><td>%s</td></tr>'</FONT> % (taskname, handler.period()))
wr(<FONT COLOR=#FF0000>'</table></center>'</FONT>)
<FONT COLOR=black><B>def</B></FONT><A NAME="generate"><FONT COLOR=#AA00AA><B> generate</B></FONT></A>(self, trans):
app = self.application()
tm = app.taskManager()
req = self.request()
<FONT COLOR=black><B>if</B></FONT> req.hasField(<FONT COLOR=#FF0000>'filename'</FONT>) <FONT COLOR=black><B>and</B></FONT> req.hasField(<FONT COLOR=#FF0000>'seconds'</FONT>):
self._filename = req.field(<FONT COLOR=#FF0000>'filename'</FONT>)
self._seconds = string.atoi(req.field(<FONT COLOR=#FF0000>'seconds'</FONT>))
task = PageGenerator(app.serverSidePath(<FONT COLOR=#FF0000>'Examples/'</FONT> + self._filename))
taskname = <FONT COLOR=#FF0000>'PageGenerator for '</FONT> + self._filename
tm.addPeriodicAction(time.time(), self._seconds, task, taskname)
self.writeBody()
<FONT COLOR=black><B>def</B></FONT><A NAME="methodNameForAction"><FONT COLOR=#AA00AA><B> methodNameForAction</B></FONT></A>(self, name):
<FONT COLOR=black><B>return</B></FONT> string.lower(name)
<FONT COLOR=black><B>def</B></FONT><A NAME="actions"><FONT COLOR=#AA00AA><B> actions</B></FONT></A>(self):
<FONT COLOR=black><B>return</B></FONT> ExamplePage.actions(self) + [<FONT COLOR=#FF0000>'generate'</FONT>]
</PRE>
</td></tr></table>
<p>
<h2>The Scheduler</h2>
<P>
Now it's time to take a closer look at the <code>Scheduler</code> class itself. As you have seen in the examples above,
writing tasks is only a matter of overloading the <code>run()</code> method in a derived class and adding it to the
scheduler with <code>addTimedAction, addActionOnDemand, addDailyAction</code> or <code>addPeriodicAction</code>.
The scheduler will wrap the Task in a <code>TaskHandler</code> structure which knows all the scheduling details
and add it to its <code>_scheduled</code> or <code>_onDemand</code> dictionaries. The latter is populated by
<code>addActionOnDemand</code> and contains tasks which can be called any time by
<code>scheduler.runTaskNow('taskname')</code> as you can see in the following example. After that
the task has gone.
<p>
<table border=0 bgcolor="#eeeeee" width="100%"><tr><td>
<PRE>
scheduler = Scheduler()
scheduler.start()
scheduler.addActionOnDemand(SimpleTask(), <FONT COLOR=#FF0000>'SimpleTask'</FONT>)
sleep(5)
<FONT COLOR=black><B>print</B></FONT> <FONT COLOR=#FF0000>"Demanding SimpleTask"</FONT>
scheduler.runTaskNow(<FONT COLOR=#FF0000>'SimpleTask'</FONT>)
sleep(5)
scheduler.stop()
</PRE>
</td></tr></table>
<p>
If you need a task more than one time it's better to start it regularly with one of
the <code>add*Action</code> methods first. It will be added to the <code>_scheduled</code>
dictionary. If you do not need the task for a certain time disable it with
<code>scheduler.disableTask('taskname')</code> and enable it later with
<code>scheduler.enableTask('taskname')</code>. There are some more methods
(e.g. <code>demandTask(), stopTask(), ...</code>) in the <code>Scheduler</code> class
which are all documented by doc strings. Take a look at them and write your own examples
to understand the methods (and maybe find bugs ;-)).
<P>
When a periodic task is scheduled it is added in a wrapped version to the
<code>_scheduled</code> dictionary first. The (most of the time sleeping) scheduler thread
always knows when to wake up and start the next task whose wrapper is moved to the
<code>_runnning</code> dictionary. After completion of the task thread the handle
reschedules the task by putting it back from <code>_running</code> to <code>_scheduled</code>),
calculating the next execution time <code>nextTime</code> and possibly waking up the scheduler.
It is important to know that you can manipulate the handle while the task is running, eg. change
the period or call <code>runOnCompletion</code> to request that a task be re-run after its
current completion. For normal use you will probably not need the handles at all, but the
more you want to manipulate the task execution, the more you will appreciate the TaskHandler API.
You get all the available handles from the scheduler with the <code>running('taskname),
scheduled('taskname')</code> and <code>onDemand('taskname')</code> methods.
<P>
In our last example which was contributed by Jay Love, who debugged, stress tested and contributed
a lot of refinements to TaskKit, you see how to write a period modifying Task. This is quite weird
but shows the power of handle manipulations. The last thing to remember is, that the scheduler does
not start a separate thread for each periodic task. It uses a thread for each task run instead and
at any time keeps the number of threads as small as possible.
<p>
<table border=0 bgcolor="#eeeeee" width="100%"><tr><td>
<PRE>
<FONT COLOR=black><B>class</B></FONT><A NAME="SimpleTask"><FONT COLOR=#AA00AA><B> SimpleTask</B></FONT></A>(Task):
<FONT COLOR=black><B>def</B></FONT><A NAME="run"><FONT COLOR=#AA00AA><B> run</B></FONT></A>(self):
<FONT COLOR=black><B>if</B></FONT> self.proceed():
<FONT COLOR=black><B>print</B></FONT> self.name(), time()
<B>print</b> "Increasing period"
self.handle().setPeriod(self.handle().period()+2)
<FONT COLOR=black><B>else</B></FONT>:
<FONT COLOR=black><B>print</B></FONT> <FONT COLOR=#FF0000>"Should not proceed"</FONT>, self.name()
</PRE>
</td></tr></table>
<p>
As you can see the TaskKit framework is quite sophisticated and will hopefully be used by many people
from the Python community. If you have further question, please feel free to ask them on the Webware
mailing list. (last changes: 2. March 2001)
<P>
<TABLE BORDER=1 CELLSPACING=0 CELLPADDING=0 WIDTH="100%">
<TR ALIGN=CENTER BGCOLOR="#DFEFFF">
<TD ALIGN=CENTER><FONT SIZE=+2>Info</FONT>
</TD>
</TR>
<TR ALIGN=LEFT VALIGN=TOP>
<TD>
<BR>
[1] Webware: <a href="http://webware.sourceforge.net/">http://webware.sourceforge.net/</a>
<br>
[2] Ganymede: <a href="http://www.arlut.utexas.edu/gash2/">http://www.arlut.utexas.edu/gash2/</a>
<br>
</TD>
</TR>
</TABLE>
<P>
Published under the <a href="http://www.gnu.org/copyleft/fdl.html">GNU Free Documentation License</a>.
</BODY>
</HTML>
|