File: README-dtrace.txt

                Monitoring Grid Engine Masters with dtrace
                ------------------------------------------

Content
-------
1. Introduction
2. Master bottleneck analysis with dtrace
3. Copyright

1. Introduction
---------------

   Dtrace is a comprehensive framework for dynamic tracing in
   Solaris 10. See

      http://www.sun.com/bigadmin/content/dtrace/

   for more detailed information about dtrace.

2. Master bottleneck analysis with dtrace
-----------------------------------------

   Understanding the bottlenecks of distributed systems is crucial for
   performance tuning. The script $SGE_ROOT/util/dtrace/monitor.sh allows
   a Grid Engine master to be monitored, provided Solaris 10 dtrace(1) is
   available.

   monitor.sh measures throughput-relevant data of your running Grid Engine
   master and condenses this data into a few indices that are printed as
   one line per interval, with the columns described below.

      Spooling:
        #wrt
           Number of qmaster write operations via spool_write_object() and
           spool_delete_object(). Almost any significant write operation goes
           through these functions, in both bdb and classic spooling.

        wrt/ms
           Total time all threads spent in spool_write_object(), in
           microseconds.

      Message processing:
        #rep
           Number of reports qmaster processed through sge_c_report().
           Most data sent by execds to qmaster arrives as such a report
           (job/load/config report).

        #gdi
           Number of GDI requests qmaster processed through do_gdi_request().
           Almost anything sent from client commands arrives at qmaster as a
           GDI request, but execds and the scheduler also use GDI requests.

        #ack
           Number of ACK messages qmaster processed through do_c_ack().
           High numbers of ACK messages can be an indication of job
           signalling, but they are also used for other purposes.

      Scheduling:
         #dsp
           Number of calls to dispatch_jobs() in schedd. Each call
           to dispatch_jobs() can be seen as a scheduling run.

         dsp/ms
           Total time the scheduler spent in all calls to dispatch_jobs().

         #sad
           Number of calls to select_assign_debit(). Each call to
           select_assign_debit() can be seen as an attempt by the scheduler
           to find an assignment or a reservation for a job.

      Qmaster/Schedd synchronization:
         #snd
           Number of event packages sent by qmaster to schedd. If that
           number drops to zero and stays there for a longer time,
           something is wrong and qmaster and schedd are out of sync.

         #rcv
           Number of event packages received by schedd from qmaster.
           If that number drops to zero and stays there for a longer
           time, something is wrong and qmaster and schedd are out of sync.

      Qmaster communication:
         #in++
           Number of messages added to qmaster's received-messages
           buffer.

         #in--
           Number of messages removed from qmaster's received-messages
           buffer. If more messages are added than removed during an
           interval, the total of messages not yet processed is
           growing.

         #out++
           Number of messages added to qmaster's send-messages
           buffer.

         #out--
           Number of messages removed from qmaster's send-messages
           buffer. If more messages are added than removed during an
           interval, the total of messages not yet delivered is
           growing.

      Qmaster locks:
         #lck0/#ulck0
           Number of calls to sge_lock()/sge_unlock() for qmaster's
           "global" lock. This lock must always be obtained when
           qmaster-internal lists (job list, queue list, etc.) are
           accessed.

         #lck1/#ulck1
           Number of calls to sge_lock()/sge_unlock() for qmaster's
           "master_config" lock. This lock is a secondary lock, but
           it also plays its role.
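
   The buffer and synchronization columns above lend themselves to two
   simple checks: net backlog growth (#in++ minus #in--, #out++ minus
   #out--) and a stalled qmaster/schedd event flow (#snd or #rcv at zero
   over several intervals). The Python sketch below illustrates this
   under stated assumptions: the Interval record and its field names are
   hypothetical, the counters are assumed to have been copied from
   monitor.sh output by hand or by a parser not shown here, and the
   threshold of four intervals is an arbitrary stand-in for "a longer
   time".

```python
from dataclasses import dataclass


@dataclass
class Interval:
    """Counters from one monitor.sh output line (field names are hypothetical)."""
    in_added: int     # "#in++"  messages added to the received-messages buffer
    in_removed: int   # "#in--"  messages removed from the received-messages buffer
    out_added: int    # "#out++" messages added to the send-messages buffer
    out_removed: int  # "#out--" messages removed from the send-messages buffer
    snd: int          # "#snd"   event packages sent by qmaster to schedd
    rcv: int          # "#rcv"   event packages received by schedd from qmaster


def backlog_growth(samples):
    """Return (inbound, outbound) net buffer growth summed over all samples."""
    inbound = sum(s.in_added - s.in_removed for s in samples)
    outbound = sum(s.out_added - s.out_removed for s in samples)
    return inbound, outbound


def sync_stalled(samples, min_intervals=4):
    """True if #snd or #rcv was zero in each of the last min_intervals samples.

    min_intervals is an assumed threshold, not a value taken from Grid Engine.
    """
    recent = samples[-min_intervals:]
    if len(recent) < min_intervals:
        return False  # not enough history to judge
    return all(s.snd == 0 for s in recent) or all(s.rcv == 0 for s in recent)


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    samples = [
        Interval(120, 100, 80, 80, 3, 3),
        Interval(150, 110, 90, 85, 0, 0),
    ]
    print(backlog_growth(samples))
    print(sync_stalled(samples, min_intervals=1))
```

   A positive first value from backlog_growth() means that qmaster
   received more messages than it processed over the sampled intervals.
   A positive second value means the same for messages waiting to be
   delivered.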

   Note: currently the following options are supported:

      -interval <time>

          Use a statistics interval other than the default "15sec".

      -spooling

          Shows qmaster spooling probes in addition to the statistics.
          This option allows diving into a presumed spooling bottleneck.

      -requests

          Shows incoming qmaster request probes. This option allows
          diving into cases where you suspect that someone is flooding
          your qmaster.

      -verify

          Just verify that the probes are functioning, then exit with
          status 0.
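
   When monitor.sh is started from a wrapper, the options listed above
   can be assembled programmatically. The Python sketch below only builds
   the argument list and does not run anything. The helper name
   monitor_command() is hypothetical, as is the install path /opt/sge. The
   "30sec" value assumes that -interval accepts the same notation as
   "15sec", and the sketch also assumes that the options may be combined.

```python
import os
import shlex


def monitor_command(interval=None, spooling=False, requests=False,
                    verify=False, sge_root=None):
    """Build the argv list for monitor.sh from the documented options."""
    # Fall back to $SGE_ROOT when no root directory is passed explicitly.
    root = (sge_root or os.environ.get("SGE_ROOT", "")).rstrip("/")
    argv = [root + "/util/dtrace/monitor.sh"]
    if interval is not None:
        argv += ["-interval", interval]   # e.g. "30sec" (assumed notation)
    if spooling:
        argv.append("-spooling")          # show qmaster spooling probes
    if requests:
        argv.append("-requests")          # show incoming request probes
    if verify:
        argv.append("-verify")            # only verify the probes, then exit
    return argv


if __name__ == "__main__":
    # Print the command line instead of executing it.
    cmd = monitor_command(interval="30sec", spooling=True, sge_root="/opt/sge")
    print(shlex.join(cmd))
```

   The resulting list can be handed to subprocess.run() on a host where
   dtrace and a running qmaster are available.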

   In addition, for ease of use, any critical/error/warning log messages
   appear in the monitor.sh output.

3. Copyright
------------
___INFO__MARK_BEGIN__
The Contents of this file are made available subject to the terms of the Sun
Industry Standards Source License Version 1.2

Sun Microsystems Inc., March, 2001

Sun Industry Standards Source License Version 1.2
=================================================

The contents of this file are subject to the Sun Industry Standards Source
License Version 1.2 (the "License"); You may not use this file except in
compliance with the License. You may obtain a copy of the License at
http://gridengine.sunsource.net/Gridengine_SISSL_license.html

Software provided under this License is provided on an "AS IS" basis,
WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING,
WITHOUT LIMITATION, WARRANTIES THAT THE SOFTWARE IS FREE OF DEFECTS,
MERCHANTABLE, FIT FOR A PARTICULAR PURPOSE, OR NON-INFRINGING.

See the License for the specific provisions governing your rights and
obligations concerning the Software.

The Initial Developer of the Original Code is: Sun Microsystems, Inc.

Copyright: 2001 by Sun Microsystems, Inc.

All Rights Reserved.
___INFO__MARK_END__