Expiration Timers
==================

.. automodule:: torch.distributed.elastic.timer
.. currentmodule:: torch.distributed.elastic.timer

Client Methods
---------------
.. autofunction:: torch.distributed.elastic.timer.configure

.. autofunction:: torch.distributed.elastic.timer.expires

Server/Client Implementations
------------------------------
torchelastic provides the following timer server and client pairs.

.. note:: A timer server and its client must always be implemented and
          used as a pair, because the two communicate over a shared
          messaging protocol.

The following timer server and client pair is implemented on top of
a ``multiprocessing.Queue``.

.. autoclass:: LocalTimerServer

.. autoclass:: LocalTimerClient
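
As a rough illustration of how this pair fits together with ``configure``
and ``expires``, the sketch below runs both ends in a single process for
brevity. That is an assumption made for the example only: in practice the
server runs in the parent (agent) process and each worker subprocess creates
its own client, because the server kills the process that owns an expired
timer. The helper name ``run_guarded`` is hypothetical, and the example
assumes Linux.

```python
import multiprocessing as mp

import torch.distributed.elastic.timer as timer


def run_guarded(work, timeout_s=60.0):
    """Run work() under an expiration timer and return its result.

    Hypothetical helper for illustration; not part of torchelastic.
    """
    queue = mp.Queue()
    # The server reads timer requests from the queue and checks for
    # expired timers roughly every max_interval seconds.
    server = timer.LocalTimerServer(queue, max_interval=0.05)
    server.start()  # non-blocking; the watchdog runs in a background thread
    try:
        # The matching client writes its requests to the same queue.
        timer.configure(timer.LocalTimerClient(queue))
        # If work() ran longer than timeout_s, the server would kill the
        # process that acquired the timer (here, this very process).
        with timer.expires(after=timeout_s):
            return work()
    finally:
        server.stop()


if __name__ == "__main__":
    print(run_guarded(lambda: sum(range(10))))
```

Leaving the ``with`` block releases the timer, so a block that finishes in
time is never reported as expired.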

The following timer server and client pair is implemented on top of
a named pipe.

.. autoclass:: FileTimerServer

.. autoclass:: FileTimerClient


Writing a custom timer server/client
--------------------------------------

To write your own timer server and client, extend
``torch.distributed.elastic.timer.TimerServer`` for the server and
``torch.distributed.elastic.timer.TimerClient`` for the client. The two
sides exchange messages as ``TimerRequest`` objects.

.. autoclass:: TimerRequest
   :members:

.. autoclass:: TimerServer
   :members:

.. autoclass:: TimerClient
   :members:
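
A minimal sketch of such a pair, assuming an in-memory transport. The names
``InMemoryRequestQueue``, ``InMemoryTimerServer``, ``InMemoryTimerClient``,
and ``demo`` are hypothetical, and treating a negative ``expiration_time`` as
a release message is a convention chosen for this sketch rather than a
requirement of the base classes. ``RequestQueue`` is imported from
``torch.distributed.elastic.timer.api``. To keep the result deterministic,
the example drains the queue by hand instead of starting the server's
watchdog thread.

```python
import queue
import time

from torch.distributed.elastic.timer import TimerClient, TimerRequest, TimerServer
from torch.distributed.elastic.timer.api import RequestQueue


class InMemoryRequestQueue(RequestQueue):
    """Hypothetical request queue backed by a thread-safe queue.Queue."""

    def __init__(self):
        self._q = queue.Queue()

    def put(self, request):
        self._q.put(request)

    def size(self):
        return self._q.qsize()

    def get(self, size, timeout):
        # Return up to `size` requests, waiting at most `timeout` seconds.
        requests = []
        deadline = time.monotonic() + timeout
        for _ in range(size):
            remaining = max(0.0, deadline - time.monotonic())
            try:
                requests.append(self._q.get(timeout=remaining))
            except queue.Empty:
                break
        return requests


class InMemoryTimerServer(TimerServer):
    def __init__(self, request_queue, max_interval=1.0):
        super().__init__(request_queue, max_interval)
        self._timers = {}  # (worker_id, scope_id) -> TimerRequest
        self.reaped = []   # worker ids passed to _reap_worker

    def register_timers(self, timer_requests):
        for request in timer_requests:
            key = (request.worker_id, request.scope_id)
            if request.expiration_time < 0:
                self._timers.pop(key, None)  # sketch convention: release
            else:
                self._timers[key] = request

    def clear_timers(self, worker_ids):
        for key in [k for k in self._timers if k[0] in worker_ids]:
            del self._timers[key]

    def get_expired_timers(self, deadline):
        expired = {}
        for request in self._timers.values():
            if request.expiration_time <= deadline:
                expired.setdefault(request.worker_id, []).append(request)
        return expired

    def _reap_worker(self, worker_id):
        # A real server would terminate the worker; this one only records it.
        self.reaped.append(worker_id)
        return True


class InMemoryTimerClient(TimerClient):
    def __init__(self, request_queue, worker_id):
        super().__init__()
        self._queue = request_queue
        self._worker_id = worker_id

    def acquire(self, scope_id, expiration_time):
        self._queue.put(TimerRequest(self._worker_id, scope_id, expiration_time))

    def release(self, scope_id):
        self._queue.put(TimerRequest(self._worker_id, scope_id, -1))


def demo():
    requests = InMemoryRequestQueue()
    server = InMemoryTimerServer(requests)
    client = InMemoryTimerClient(requests, worker_id="worker-0")

    now = time.time()
    client.acquire("load_data", now + 5)
    client.acquire("train_step", now + 60)
    client.release("train_step")

    # Drain the pending messages by hand and ask which timers have
    # expired as of ten seconds from now.
    server.register_timers(requests.get(requests.size(), timeout=0))
    expired = server.get_expired_timers(deadline=now + 10)
    return {w: [r.scope_id for r in reqs] for w, reqs in expired.items()}
```

In a real deployment, ``_reap_worker`` would terminate the worker process,
and the client would be installed with ``configure`` so that ``expires``
sends its requests through it.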


Debug info logging
-------------------

.. automodule:: torch.distributed.elastic.timer.debug_info_logging

.. autofunction:: torch.distributed.elastic.timer.debug_info_logging.log_debug_info_for_expired_timers