File: timed_event_thread_pool.txt

      Functional Specification: Timed Event Thread Pool 
      =================================================

      Version  Comment                                Date      Author
      -------  -------------------------------------  --------  -------------
      0.1      Initial version                        03-04-09  Ernst Bablick 
      0.2      Changes according to comments from     07-04-09  Ernst Bablick
               JG, RD, CR  

1     INTRODUCTION
      ============

      In Grid Engine 6.2, qmaster threads can define one-time and
      recurring events that trigger an event handler function.
      These event handler functions are registered with, and also
      executed by, the timed event thread.

      Executing event handler functions in the timed event thread is
      not a problem as long as their execution time is short. If such
      a function takes longer, it delays the start of other events.
      Recurring events in particular might then not be handled at the
      expected point in time.

      This limitation can be addressed by introducing an additional
      thread pool that takes over the execution of event handler
      functions when they are triggered.
      
      This enhancement would make it possible to move spooling-related
      code, which is currently executed at the end of a job deletion
      request, into an event handler function. The job deletion request
      could then release the global lock earlier, which might increase
      cluster throughput, especially in clusters with a large number of
      short-running jobs.

2     PROJECT OVERVIEW
      ================

2.1   Project Aim

      The aim of the project is to provide the necessary infrastructure
      in qmaster so that event handling functions are no longer executed
      by the timed event thread. Instead, they will run in the threads of
      a thread pool dedicated to executing event handling functions.
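
      The intended division of labour can be illustrated with a minimal
      sketch. It is written in Python for brevity and is not the qmaster
      implementation; the class name EventHandlerPool and its methods
      submit() and shutdown() are hypothetical. The timed event thread
      only enqueues a task, and one of the pool threads executes the
      handler.

```python
import queue
import threading


class EventHandlerPool:
    """Hypothetical pool of event handling threads fed by one task queue."""

    _STOP = object()  # sentinel that tells exactly one worker to exit

    def __init__(self, num_threads):
        self._tasks = queue.Queue()  # thread-safe FIFO queue
        self._threads = [
            threading.Thread(target=self._worker, name=f"EHT{i + 1}")
            for i in range(num_threads)
        ]
        for thread in self._threads:
            thread.start()

    def submit(self, handler, *args):
        """Called by the timed event thread: enqueue and return at once."""
        self._tasks.put((handler, args))

    def _worker(self):
        while True:
            task = self._tasks.get()
            if task is self._STOP:
                break
            handler, args = task
            try:
                handler(*args)
            except Exception:
                # A failing handler must not kill the worker thread.
                # A real implementation would log the error here.
                pass

    def shutdown(self):
        """Let queued tasks finish, then stop and join all workers."""
        for _ in self._threads:
            self._tasks.put(self._STOP)
        for thread in self._threads:
            thread.join()


# Usage: the producer submits handlers and is never blocked by them.
handled = []
pool = EventHandlerPool(num_threads=2)
for event_id in range(4):
    pool.submit(handled.append, event_id)
pool.shutdown()
```

      Because the stop sentinels are queued behind all pending tasks,
      shutdown() returns only after every submitted handler has run.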

      Additionally, the current code should be changed so that all
      existing event handling functions are handled by that thread pool.

      A new event handling function will be introduced that executes
      the code that removes job-related spool files after a job has
      finished. This is currently done while holding the global lock
      when the deletion of a job is triggered via GDI.

2.2   Project Benefit

      Throughput in the cluster is expected to increase, especially in
      heavily loaded clusters with a large number of short-running jobs.

2.3   Project Duration

      WP DURATION DESCRIPTION
      -- -------- ------------------------------------------------
      1  5d       (CLEANUP) Generic list implementation 
                  - makes the list implementation obsolete that stores
                    the work packages for GDI worker threads
                  - implementation must be thread safe 
                  - can be used for new thread pool queue
                  - prerequisite of additional performance tests to compare
                    a generic list implementation with the CULL implementation
                  - module test 
                  - TS run / Review / Checkin

      2  5d       Introduce thread pool of event handling threads (EHTs)
                  - bootstrap enhancements
                  - installation enhancements
                  - startup/shutdown 
                  - TS run / Review / Checkin

      3  1d       Introduce task queue for EHTs
                  - EHT is consumer of tasks
                  - statistics output for logging/profiling
                  - TS run / Review / Checkin
      
      4  5d       Change timed event thread (TET)
                  - TET is producer of tasks 
                  - change behaviour for one-time events
                  - change behaviour for recurring events
                  - TS run
  
      5  10d      Change code that is triggered when jobs should be deleted
                  - introduce a new state (finished but still spooled)
                  - job ids for jobs in that state have to be stored in
                    a global data structure that is set up during
                    startup of the master.
                  - make it possible to disable the spooling part
                    when jobs finish or when they are deleted (GDI DEL and
                    JOB DEL ORDER)
                  - make it possible to execute that code when a recurring
                    event is triggered. The event handler has to handle
                    the deletion of all job-related files for all jobs
                    that entered this state since the event last occurred.
                  - make it configurable where the deletion code is
                    executed
                  - TS run

      6  4d       Performance tests
                  - current 6.2u2
                  - with new threads but old deletion behaviour
                  - with new threads and new deletion behaviour

      7  5d       Testsuite
                  - Scenario: qmaster shutdown/restart when 
                              jobs are in the new state
                  - ...

      ------------

         35d 
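
      The deferred deletion described in work package 5 can be sketched
      as follows. This is an illustration in Python under stated
      assumptions, not the planned qmaster code: the names
      FinishedJobRegistry, finish_job and spool_cleanup_handler are
      hypothetical, and remove_spool_files stands in for the existing
      spool deletion code. Repopulating the registry at master startup
      is not shown.

```python
import threading


class FinishedJobRegistry:
    """Hypothetical global store of ids of jobs that are finished
    but still spooled."""

    def __init__(self):
        self._lock = threading.Lock()
        self._job_ids = set()

    def add(self, job_id):
        with self._lock:
            self._job_ids.add(job_id)

    def drain(self):
        """Atomically take all pending ids and leave the registry empty."""
        with self._lock:
            pending, self._job_ids = self._job_ids, set()
        return pending


def finish_job(job_id, registry, remove_spool_files, deferred=True):
    """Called when a job finishes or is deleted."""
    if deferred:
        registry.add(job_id)        # cheap: only records the id
    else:
        remove_spool_files(job_id)  # old behaviour: clean up immediately


def spool_cleanup_handler(registry, remove_spool_files):
    """Recurring event handler: removes the spool files of all jobs
    recorded since the previous run and returns how many were handled."""
    pending = registry.drain()
    for job_id in sorted(pending):
        remove_spool_files(job_id)
    return len(pending)
```

      The deferred flag models the requirement that the place where the
      deletion code runs is configurable. With deferred=True the request
      path only records the job id, and the file removal happens later
      in an event handling thread.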
 
2.4   Project Dependencies

      There are no known dependencies on other projects. In particular,
      projects that try to break up the global lock or to improve qmaster
      performance by introducing a read-only thread should pose no issues.


3     SYSTEM ARCHITECTURE
      ===================

3.1   Enhancement Functions

      There are no user interface changes planned.

      Additional threads will be shown in the debug output and also
      in qping output, when profiling is enabled.

3.2   Overall Block Diagram

      Scenario: Two activities are registered in the timed event
         thread. The first one can begin at point A in time, the
         second one at point B. The procedure that is triggered at A
         runs longer than B-A.

      Current              Future 
      -------              ------
                                   
        TET                TET      EHT1     EHT2    ...  EHTn
         |                  |        |        |            |
        +-+ ----+ A        +-+ ---> +-+       |  A         |
        | |     |          | |      | |       |            .
       +---+ <--+          | |      | |       |            .
       |   |               | |      | |       |            .
       |   | ---+ B        | | ------------> +-+ B
       |   |    |          | |      | |      | |
       |   |    |          | |      +-+      | |
       |   |    |          | |       |       | |
       +---+    |          | |       |       | |
        | |     |          | |       |       +-+ 
       +---+ <--+          | |       |        |
       |   |               +-+       |        |
       |   |                |        |        |
       |   |  
       |   |  
       +---+    
        | |       
        +-+
         | 

      The left side shows the current behaviour of GE. All functions
      are executed by the timed event thread (TET) itself. The execution
      of the procedure that should be triggered at point B in time
      is postponed until the first procedure finishes.

      The right side shows the behaviour with a thread pool of event
      handling threads (EHTs). While the first event handling thread
      is still executing the first function, the second function is
      started on another thread at time B.
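
      The difference between the two sides of the diagram can be made
      concrete with a small worked example. The sketch below is written
      in Python and is purely illustrative; the function start_times and
      the chosen times are assumptions. It computes when each handler
      actually starts if events are dispatched in trigger order to the
      thread that becomes free first.

```python
import heapq


def start_times(events, num_handlers):
    """Return the actual start time of each event handler.

    events:       list of (trigger_time, duration), sorted by trigger time
    num_handlers: number of threads that run handlers
                  (1 models the current TET-only behaviour)
    """
    free_at = [0] * num_handlers  # min-heap: time each thread becomes free
    heapq.heapify(free_at)
    starts = []
    for trigger, duration in events:
        earliest_free = heapq.heappop(free_at)
        start = max(trigger, earliest_free)  # wait if no thread is free yet
        starts.append(start)
        heapq.heappush(free_at, start + duration)
    return starts


# A triggers at t=0 and runs for 5 time units; B triggers at t=2.
# The first procedure therefore runs longer than B-A.
events = [(0, 5), (2, 3)]
current = start_times(events, 1)  # [0, 5]: B is postponed until A finishes
future = start_times(events, 2)   # [0, 2]: B starts on time on a second EHT
```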

4     FUNCTIONAL DEFINITION
      =====================

4.1   Performance

4.2   Reliability, Availability, Serviceability

4.3   Diagnostics

4.4   User Experience

4.5   Manufacturing

4.6   Quality Assurance

4.7   Security & Privacy

4.8   Mitigation Path

4.9   Documentation

4.10  Installation

4.11  Packaging

4.12  Issues/Risks and Proposed Mitigation


5     COMPONENT DESCRIPTION
      =====================

5.1   Component: Command line  

5.1.1 Overview

5.1.2 Functionality

5.1.3 Interfaces

5.1.4 Other Requirements