File: Thread_Private_Storage.md

# Thread Private Storage

While the MPI standard is fairly careful to be thread-safe, and in
particular to avoid the need for thread-private state, there are a few
places where the MPICH implementation may need storage that is
specific to each thread. These places include

- Error return from predefined MPI reduction operations
- Nesting count used to ensure that `MPI_ERRORS_RETURN` is used when
  an MPI routine is used internally (e.g., if a routine in MPICH calls
  `MPI_Send`, in case of an error, `MPI_Send` should return and not
  invoke the error handler, since we will want the calling routine to
  handle the error).
- Performance counters

These are needed in relatively few places, and not in most of the
performance-critical paths.

Unfortunately, access to thread-private (or thread-specific) storage is
not easy in C or C++, since the compiler knows nothing about threads. At
least one compiler, the C compiler for the NEC SX-4, provides a pragma
to mark a variable as thread-private, making it somewhat easier to make
use of thread-private data. In most cases, however, it is necessary to
call a routine provided by the thread package to access thread-private
data; this data is usually identified by an integer id. For performance
reasons, we'll want to make only one such call within a routine. The
result of that call must be assigned to a variable allocated on the
stack, since the stack is the only thread-private storage on which we
can depend.
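
As an illustration of the library-based access described above, here is a minimal sketch using POSIX thread-specific data. The names `per_thread_t`, `get_per_thread`, and `bump_nest_twice` are hypothetical and are not part of MPICH; the point is only that the routine makes one library call and then works through a stack variable.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-thread record; not the MPICH structure. */
typedef struct {
    int nest_count;
    int op_errno;
} per_thread_t;

static pthread_key_t per_thread_key;
static pthread_once_t per_thread_once = PTHREAD_ONCE_INIT;

static void make_key(void)
{
    /* free() is called at thread exit on any non-NULL value stored under the key. */
    pthread_key_create(&per_thread_key, free);
}

/* Return the calling thread's record, allocating it on first use. */
static per_thread_t *get_per_thread(void)
{
    per_thread_t *p;
    pthread_once(&per_thread_once, make_key);
    p = pthread_getspecific(per_thread_key);
    if (p == NULL) {
        p = calloc(1, sizeof(*p));
        if (p != NULL)
            pthread_setspecific(per_thread_key, p);
    }
    return p;
}

/* One library call per routine: fetch once into a stack variable, then reuse it. */
static int bump_nest_twice(void)
{
    per_thread_t *pt = get_per_thread();
    if (pt == NULL)
        return -1;
    pt->nest_count++;
    pt->nest_count++;
    return pt->nest_count;
}

static void *worker(void *arg)
{
    *(int *)arg = bump_nest_twice();
    return NULL;
}

/* Run bump_nest_twice() in a fresh thread and return what that thread saw. */
static int nest_count_in_new_thread(void)
{
    pthread_t t;
    int result = -1;
    if (pthread_create(&t, NULL, worker, &result) != 0)
        return -1;
    pthread_join(t, NULL);
    return result;
}
```

A new thread starts from its own zeroed record, so `nest_count_in_new_thread()` is unaffected by earlier calls to `bump_nest_twice()` in the calling thread.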

These considerations lead to the following design:

1.  There is one structure, of type `MPICH_PerThread_t` (see
    `src/include/mpiimpl.h`) that contains all thread-private data. The
    thread-private data that we will have the thread package manage is a
    pointer to this storage, which will be allocated on first use.
2.  The id for this storage is in the `MPIR_Process` structure, of type
    `MPIR_PerProcess_t`, which is also defined in
    `src/include/mpiimpl.h`. The id is in the thread_storage field.
3.  In any routine that needs access to the thread-private data, the
    following macros are used. These are macros so that they can be
    rendered into null statements when single-threaded versions of MPICH
    are built.

    - `MPIU_THREADPRIV_DECL` - Used to declare the variables needed to
      access the thread-private storage. To enable a check that the
      thread-private storage has been set, this pointer is initialized to zero.
    - `MPIU_THREADPRIV_GET` - Used to acquire the pointer to the
      thread-private structure. This must be called before any of the
      routines that may access the thread-private storage.
    - `MPIU_THREADPRIV_FIELD` - Used to access a field within the per-thread
      instance of `MPICH_PerThread_t`. This is used by the implementations of the
      "nest" macros, the error returns for the collective operations, and
      the performance statistics collection. This macro allows the
      single-threaded code to use a statically allocated `MPIR_Thread`
      variable while the multi-threaded code must use a pointer to the
      thread-private storage:

      ```
      #ifdef MPICH_MULTI_THREADED
      #define MPIU_THREADPRIV_FIELD(name) MPIR_Thread->name
      #else
      #define MPIU_THREADPRIV_FIELD(name) MPIR_Thread.name
      #endif
      ```

      This ensures the best performance in the single-threaded case by
      avoiding the extra indirection that is unnecessary in that case.
    - `MPIU_THREADPRIV_INITKEY` - This macro is used to initialize the key
      that is used to identify the thread-private storage. This macro must
      be invoked early in `MPI_Init` (or `MPI_Init_thread`) to ensure that
      the key is available to all subsequently created threads.
    - `MPIU_THREADPRIV_INIT` - This macro is used to allocate and initialize
      the thread-private storage. It is used within the definition of
      `MPIU_THREADPRIV_GET` to handle the first use of thread-private storage
      within a thread and in the `MPI_Init` routine to make sure that the storage
      is allocated for the master thread.
    - `MPE_Thread_tls_t` - The type of the key for the thread-private storage.
      It may be initialized to the value `MPE_THREAD_TLS_T_NULL`.

A special routine, `MPIR_GetPerThread`, returns through its argument a
pointer to the thread-private storage for `MPIR_Thread`.
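
The macro pattern described above can be sketched as follows. This is not the MPICH source: the `DEMO_` names, the `demo_per_thread_t` type, and the `DEMO_MULTI_THREADED` switch are hypothetical stand-ins, and the multi-threaded branch assumes POSIX thread-specific data. It only shows how one routine body can compile to a plain static access in a single-threaded build and to a pointer access in a multi-threaded build.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the per-thread structure. */
typedef struct {
    int nest_count;
    int op_errno;
} demo_per_thread_t;

#ifdef DEMO_MULTI_THREADED
#include <pthread.h>

static pthread_key_t demo_key;
static pthread_once_t demo_once = PTHREAD_ONCE_INIT;

static void demo_make_key(void)
{
    pthread_key_create(&demo_key, free);
}

/* Fetch the calling thread's storage, allocating it on first use.
 * Error handling is omitted in this sketch. */
static demo_per_thread_t *demo_get(void)
{
    demo_per_thread_t *p;
    pthread_once(&demo_once, demo_make_key);
    p = pthread_getspecific(demo_key);
    if (p == NULL) {
        p = calloc(1, sizeof(*p));
        pthread_setspecific(demo_key, p);
    }
    return p;
}

#define DEMO_THREADPRIV_DECL        demo_per_thread_t *demo_thread = NULL
#define DEMO_THREADPRIV_GET         (demo_thread = demo_get())
#define DEMO_THREADPRIV_FIELD(name) demo_thread->name
#else
/* Single-threaded build: one static instance, no indirection. */
static demo_per_thread_t demo_thread;

#define DEMO_THREADPRIV_DECL
#define DEMO_THREADPRIV_GET         ((void)0)
#define DEMO_THREADPRIV_FIELD(name) demo_thread.name
#endif

/* The routine body is identical in both builds (C99 or later,
 * since the declaration macro may expand to an empty statement). */
static int demo_enter_nested(void)
{
    DEMO_THREADPRIV_DECL;
    DEMO_THREADPRIV_GET;
    DEMO_THREADPRIV_FIELD(nest_count)++;
    return DEMO_THREADPRIV_FIELD(nest_count);
}
```

Compiling with `-DDEMO_MULTI_THREADED` selects the pointer-based branch; without it, the declaration and get macros expand to nothing useful and the field macro touches the static instance directly.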

## An Alternative Proposal for Thread-Private Storage

Thread-private data is needed for some global state (such as the error
handler to be used in a thread) or by performance counters. In POSIX and
many other thread libraries, there are routines to access thread-private
data. Some compilers allow variables to be declared as thread-private;
the compiler then inserts the code needed to give each thread its own
copy. This proposal is intended to replace the form described above to
permit the use of compiler-supported thread-private variables.

The GCC form for thread-private variables is:

```
__thread <type> name;
```

e.g.,

```
__thread int errno;
```
as a global (or file-scoped) variable.
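
To make the behavior concrete, here is a small sketch (assuming GCC or a compatible compiler, plus POSIX threads) in which each thread sees its own copy of a `__thread` variable. The names `call_count`, `bump`, and `count_in_new_thread` are invented for this illustration.

```c
#include <pthread.h>

/* Each thread gets its own zero-initialized copy of this variable. */
static __thread int call_count;

/* Increment and return the calling thread's counter. */
static int bump(void)
{
    return ++call_count;
}

static void *worker(void *arg)
{
    bump();
    *(int *)arg = bump();
    return NULL;
}

/* Call bump() twice in a fresh thread and return the second result. */
static int count_in_new_thread(void)
{
    pthread_t t;
    int result = -1;
    if (pthread_create(&t, NULL, worker, &result) != 0)
        return -1;
    pthread_join(t, NULL);
    return result;
}
```

No library call is needed to reach `call_count`; the compiler and runtime arrange for each thread to address its own instance.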

Where a library is used, there are the following declarations in MPICH:

```
MPID_Thread_tls_t <keyname>
```

a **local** (stack) variable for the thread private variable.

One way to unify these is to make all thread-private variables pointers.
Then we can do something like the following (this is simplified; for
example, there is no error handling):

```
#if HAVE__THREAD
/* Compiler-supported case: the thread-private variable is the pointer itself. */
#define MPID_Threadpriv __thread void *
#define MPID_ThreadprivGet( name, localpointer ) localpointer = name
#define MPID_ThreadprivInit( name, size ) name = (void *)calloc(1, size)
#else
/* Library-supported case: the variable is a key used to look up the pointer. */
#define MPID_Threadpriv MPID_Thread_tls_t
#define MPID_ThreadprivGet( name, localpointer ) \
    MPID_Thread_tls_get( name, &localpointer )
#define MPID_ThreadprivInit( name, size ) \
    {void * _tmp = (void *)calloc(1, size);\
    MPID_Thread_tls_create( MPIR_FreeTLSSpace, &name,_tmp );\
    MPID_Thread_tls_set( name, _tmp );}
#endif
```

(`MPIR_FreeTLSSpace` is used to free the allocated space when the
thread exits.)

An even better approach is to integrate this approach with the use of
thread-private structs, as used in MPICH for most of the thread-private
data.
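
One possible reading of that integration, offered only as a sketch: when the compiler supports `__thread`, the whole per-thread struct can be declared thread-private, so no allocation or key lookup is needed and fields are reached directly. The `sketch_` names and the field macro below are hypothetical, and the non-GCC branch is a single-threaded placeholder rather than a real library-based fallback.

```c
#include <pthread.h>

/* Hypothetical stand-in for a struct holding all thread-private data. */
typedef struct {
    int nest_count;
    int op_errno;
} sketch_per_thread_t;

#if defined(__GNUC__)
/* One zero-initialized struct instance per thread. */
static __thread sketch_per_thread_t sketch_thread;
#else
/* Placeholder for this sketch only: a single shared instance. */
static sketch_per_thread_t sketch_thread;
#endif

#define SKETCH_THREADPRIV_FIELD(name) (sketch_thread.name)

static void sketch_set_errno(int value)
{
    SKETCH_THREADPRIV_FIELD(op_errno) = value;
}

static int sketch_get_errno(void)
{
    return SKETCH_THREADPRIV_FIELD(op_errno);
}

static void *sketch_worker(void *arg)
{
    *(int *)arg = sketch_get_errno();
    return NULL;
}

/* Return the op_errno value observed by a freshly created thread. */
static int sketch_errno_in_new_thread(void)
{
    pthread_t t;
    int result = -1;
    if (pthread_create(&t, NULL, sketch_worker, &result) != 0)
        return -1;
    pthread_join(t, NULL);
    return result;
}
```

With a `__thread`-capable compiler, a value stored by one thread through `sketch_set_errno` is not visible to a newly created thread, which sees its own zeroed struct.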