File: README.md

package info (click to toggle)
pytorch-cuda 2.6.0%2Bdfsg-7
  • links: PTS, VCS
  • area: contrib
  • in suites: forky, sid, trixie
  • size: 161,620 kB
  • sloc: python: 1,278,832; cpp: 900,322; ansic: 82,710; asm: 7,754; java: 3,363; sh: 2,811; javascript: 2,443; makefile: 597; ruby: 195; xml: 84; objc: 68
file content (62 lines) | stat: -rw-r--r-- 2,480 bytes parent folder | download | duplicates (4)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
# RPC PS Benchmark

## How to add your experiment

1. Data
    - Create a data class and add it to the data directory
    - Update benchmark_class_helper.py to include your data class in the data_map
    - Add configurations to data_configurations.json in the configurations directory
2. Model
    - Create a model class and add it to the model directory
    - Update benchmark_class_helper.py to include your model class in the model_map
    - Add configurations to model_configurations.json in the configurations directory
3. Trainer
    - Create a trainer class and add it to the trainer directory
    - Update benchmark_class_helper.py to include your trainer class in the trainer_map
    - Add configurations to trainer_configurations.json in the configurations directory
4. Parameter Server
    - Create a parameter server class and add it to the parameter_servers directory
    - Update benchmark_class_helper.py to include your parameter_server class in the ps_map
    - Add configurations to parameter_server_configurations.json in the configurations directory
5. Script
    - Create a bash script for your experiment and add it to the experiment_scripts directory
6. Testing
    - Add a test method for your script to test_scripts.py

## Trainer class

The trainer directory contains base classes to provide a starting point for implementing a trainer.
Inherit from a base class and implement your trainer. The benchmark has two requirements for trainers.

1. It must implement a __init__ method that takes rank, trainer_count, and ps_rref as arguments

    ```python
    def __init__(self, rank, trainer_count, ps_rref, backend, use_cuda_rpc):
    ```

2. It must implement a train method that takes model and data as arguments.

    ```python
    def train(self, model, data):
    ```

## Parameter Server class

The parameter_server directory contains base classes to provide a starting point for implementing a parameter server.
Inherit from a base class and implement your parameter server. The benchmark has two requirements for parameter servers.

1. It must implement a __init__ method that takes rank and ps_trainer_count as arguments

    ```python
    def __init__(self, rank, ps_trainer_count, backend, use_cuda_rpc):
    ```

2. It must implement a reset_state method

    ```python
    def reset_state(ps_rref):
    ```

## Testing

Use `pytest` to run the test methods added to test_scripts.py. To test all the scripts added use `pytest test_scripts.py`.