File: librpmem.7.md

---
layout: manual
Content-Style: 'text/css'
title: _MP(LIBRPMEM, 7)
collection: librpmem
header: PMDK
date: rpmem API version 1.3
...

[comment]: <> (SPDX-License-Identifier: BSD-3-Clause)
[comment]: <> (Copyright 2016-2019, Intel Corporation)

[comment]: <> (librpmem.7 -- man page for librpmem)

[NAME](#name)<br />
[SYNOPSIS](#synopsis)<br />
[DESCRIPTION](#description)<br />
[TARGET NODE ADDRESS FORMAT](#target-node-address-format)<br />
[REMOTE POOL ATTRIBUTES](#remote-pool-attributes)<br />
[SSH](#ssh)<br />
[FORK](#fork)<br />
[CAVEATS](#caveats)<br />
[LIBRARY API VERSIONING](#library-api-versioning-1)<br />
[ENVIRONMENT](#environment)<br />
[DEBUGGING AND ERROR HANDLING](#debugging-and-error-handling)<br />
[EXAMPLE](#example)<br />
[NOTE](#note)<br />
[ACKNOWLEDGEMENTS](#acknowledgements)<br />
[SEE ALSO](#see-also)

# NAME #

**librpmem** - remote persistent memory support library (EXPERIMENTAL)

# SYNOPSIS #

```c
#include <librpmem.h>
cc ... -lrpmem
```

##### Library API versioning: #####

```c
const char *rpmem_check_version(
	unsigned major_required,
	unsigned minor_required);
```

##### Error handling: #####

```c
const char *rpmem_errormsg(void);
```

##### Other library functions: #####

A description of other **librpmem** functions can be found on the following
manual pages:

+ **rpmem_create**(3), **rpmem_persist**(3)

# DESCRIPTION #

**librpmem** provides low-level support for remote access to
*persistent memory* (pmem) utilizing RDMA-capable RNICs. The library can be
used to remotely replicate a memory region over the RDMA protocol. It utilizes
an appropriate persistency mechanism based on the remote node's platform
capabilities. **librpmem** utilizes the **ssh**(1) client to authenticate
a user on the remote node, and for encryption of the connection's out-of-band
configuration data. See **SSH**, below, for details.

The size of a replicated memory region cannot exceed the maximum
locked-in-memory address space limit. See **memlock** in **limits.conf**(5)
for more details.
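
As a rough pre-flight check, an application could compare the intended region
size against the soft **RLIMIT_MEMLOCK** limit before creating a pool. The
sketch below assumes a Linux-like system that provides **getrlimit**(2). The
helper name `fits_memlock_limit` is hypothetical and not part of **librpmem**,
and comparing against the soft limit is an assumption made for illustration.

```c
#include <stddef.h>
#include <sys/resource.h>

/*
 * Hypothetical helper, not part of librpmem.
 * Returns 1 if `size` bytes fit within the soft locked-memory limit,
 * 0 if they do not, and -1 if the limit could not be queried.
 */
static int
fits_memlock_limit(size_t size)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
		return -1;

	/* an unlimited soft limit accepts any size */
	if (rl.rlim_cur == RLIM_INFINITY)
		return 1;

	return (rlim_t)size <= rl.rlim_cur;
}
```

A caller might invoke `fits_memlock_limit(POOL_SIZE)` before
**rpmem_create**(3) and report a clear message if the result is 0.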

This library is for applications that use remote persistent memory directly,
without the help of any library-supplied transactions or memory
allocation. Higher-level libraries that build on **libpmem**(7) are
available and are recommended for most applications, see:

+ **libpmemobj**(7), a general use persistent memory API, providing memory
allocation and transactional operations on variable-sized objects.

# TARGET NODE ADDRESS FORMAT #

```
[<user>@]<hostname>[:<port>]
```

The target node address consists of the *hostname* to which the client
connects, with an optional *user* name. The user must be able to authenticate
to the remote machine without being prompted for a password or passphrase.
The optional *port* number is used to establish the SSH connection. The default
port number is 22.
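
The address format above can be illustrated with a small parser. This is
a minimal sketch, not the parser used by **librpmem**: the `target_addr`
structure, the buffer sizes and the `parse_target` function are hypothetical,
and the sketch assumes a plain host name or IPv4 address (an IPv6 literal
containing colons is not handled).

```c
#include <stdlib.h>
#include <string.h>

#define DEFAULT_SSH_PORT 22

/* hypothetical container for the parsed address parts */
struct target_addr {
	char user[64];		/* empty string when no user was given */
	char host[256];
	unsigned port;		/* DEFAULT_SSH_PORT when no port was given */
};

/* returns 0 on success, -1 on malformed or oversized input */
static int
parse_target(const char *target, struct target_addr *out)
{
	const char *host = target;
	const char *at = strchr(target, '@');

	out->user[0] = '\0';
	out->port = DEFAULT_SSH_PORT;

	/* optional "<user>@" prefix */
	if (at != NULL) {
		size_t ulen = (size_t)(at - target);
		if (ulen == 0 || ulen >= sizeof(out->user))
			return -1;
		memcpy(out->user, target, ulen);
		out->user[ulen] = '\0';
		host = at + 1;
	}

	/* mandatory "<hostname>", up to an optional ":<port>" suffix */
	const char *colon = strrchr(host, ':');
	size_t hlen = colon ? (size_t)(colon - host) : strlen(host);
	if (hlen == 0 || hlen >= sizeof(out->host))
		return -1;
	memcpy(out->host, host, hlen);
	out->host[hlen] = '\0';

	if (colon != NULL) {
		char *end;
		unsigned long port = strtoul(colon + 1, &end, 10);
		if (end == colon + 1 || *end != '\0' ||
		    port == 0 || port > 65535)
			return -1;
		out->port = (unsigned)port;
	}

	return 0;
}
```

For example, `user@node1:2222` yields the user `user`, the host `node1` and
port 2222, while `node1` alone yields an empty user and the default port.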

# REMOTE POOL ATTRIBUTES #

The *rpmem_pool_attr* structure describes a remote pool and is stored in the
remote pool's metadata. The caller must pass this structure to the
**rpmem_create**(3) function when creating a pool on a remote node. When a pool
is opened using the **rpmem_open**(3) function, the appropriate fields are read
from the pool's metadata and returned to the caller.

```c
#define RPMEM_POOL_HDR_SIG_LEN    8
#define RPMEM_POOL_HDR_UUID_LEN   16
#define RPMEM_POOL_USER_FLAGS_LEN 16

struct rpmem_pool_attr {
	char signature[RPMEM_POOL_HDR_SIG_LEN];
	uint32_t major;
	uint32_t compat_features;
	uint32_t incompat_features;
	uint32_t ro_compat_features;
	unsigned char poolset_uuid[RPMEM_POOL_HDR_UUID_LEN];
	unsigned char uuid[RPMEM_POOL_HDR_UUID_LEN];
	unsigned char next_uuid[RPMEM_POOL_HDR_UUID_LEN];
	unsigned char prev_uuid[RPMEM_POOL_HDR_UUID_LEN];
	unsigned char user_flags[RPMEM_POOL_USER_FLAGS_LEN];
};
```

The *signature* field is an 8-byte field which describes the pool's on-media
format.

The *major* field is the major version number of the pool's on-media format.

The *compat_features* field is a mask describing the optional features of the
pool's on-media format.

The *incompat_features* field is a mask describing the required features of the
pool's on-media format.

The *ro_compat_features* field is a mask describing features of the pool's
on-media format. If these features are not available,
the pool shall be opened in read-only mode.

The *poolset_uuid* field is a UUID of the pool with which the remote pool is
associated.

The *uuid* field is a UUID of the first part of the remote pool. This field can
be used to connect the remote pool with other pools in a list.

The *next_uuid* and *prev_uuid* fields are UUIDs of the next and previous
replicas, respectively. These fields can be used to connect the remote pool
with other pools in a list.

The *user_flags* field holds 16 bytes of user-defined flags.

# SSH #

**librpmem** utilizes the **ssh**(1) client to log in to the remote node and
execute the **rpmemd**(1) process there. By default, **ssh**(1)
is executed with the **-4** option, which forces the use of **IPv4** addressing.

For debugging purposes, both the ssh client and the commands executed
on the remote node may be overridden by setting the **RPMEM_SSH** and
**RPMEM_CMD** environment variables, respectively. See **ENVIRONMENT**
for details.

# FORK #

The **ssh**(1) client is executed
by **rpmem_open**(3) and **rpmem_create**(3) after forking a child process
using **fork**(2). The application must take this into account when
using **wait**(2) and **waitpid**(2), which may return the *PID* of
the **ssh**(1) process executed by **librpmem**.

If **fork**(2) support is not enabled in **libibverbs**,
**rpmem_open**(3) and **rpmem_create**(3) will fail.
By default, **fabric**(7) initializes **libibverbs** with **fork**(2) support
by calling the **ibv_fork_init**(3) function. See **fi_verbs**(7) for more
details.
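
One way to avoid reaping the **ssh**(1) child by accident is to wait only for
specific process IDs that the application itself created, instead of calling
**wait**(2) or **waitpid**(2) with a *pid* of -1. The following is a minimal
sketch of that pattern using only POSIX calls. The helper names
`wait_for_child` and `spawn_and_wait` are hypothetical and not part of
**librpmem**.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Waits only for the given child, so children forked by other
 * libraries are left alone. Returns the child's exit status,
 * or -1 if it could not be reaped or did not exit normally.
 */
static int
wait_for_child(pid_t pid)
{
	int status;
	pid_t ret;

	/* retry if the wait is interrupted by a signal */
	do {
		ret = waitpid(pid, &status, 0);
	} while (ret == -1 && errno == EINTR);

	if (ret != pid || !WIFEXITED(status))
		return -1;

	return WEXITSTATUS(status);
}

/* forks a child that exits with `exit_code`, then reaps exactly that child */
static int
spawn_and_wait(int exit_code)
{
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(exit_code);	/* child: terminate immediately */

	return wait_for_child(pid);
}
```

Because `wait_for_child` names the *PID* explicitly, it returns only for the
process the application started itself.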

# CAVEATS #

**librpmem** relies on the library destructor being called from the main thread.
For this reason, all functions that might trigger destruction (e.g.
**dlclose**(3)) should be called in the main thread. Otherwise some of the
resources associated with that thread might not be cleaned up properly.

**librpmem** registers a pool as a single memory region. Due to a hardware bug,
Chelsio T4 and T5 hardware cannot handle a memory region of 8GB or more.
Therefore, the *pool_size* value passed to **rpmem_create**(3) and
**rpmem_open**(3) when using this hardware must be less than 8GB.

# LIBRARY API VERSIONING #

This section describes how the library API is versioned,
allowing applications to work with an evolving API.

The **rpmem_check_version**() function is used to see if the installed
**librpmem** supports the version of the library API required by an
application. The easiest way to do this is for the application to pass
the compile-time version information, provided by defines in
**\<librpmem.h\>**, like this:

```c
const char *reason = rpmem_check_version(RPMEM_MAJOR_VERSION,
                                         RPMEM_MINOR_VERSION);
if (reason != NULL) {
	/* version check failed, reason string tells you why */
}
```

Any mismatch in the major version number is considered a failure, but a
library with a newer minor version number will pass this check since
increasing minor versions imply backwards compatibility.
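
The compatibility rule described above can be modeled as a small predicate.
This is only an illustration of the rule as worded here, not the library's
implementation. The function name `version_compatible` and its parameters are
hypothetical, and treating an older installed minor version as a failure is
an assumption drawn from the wording above.

```c
#include <stdbool.h>

/*
 * Hypothetical helper, not part of librpmem: models the rule that the
 * major versions must match exactly, while the installed minor version
 * may be equal to or newer than the required one.
 */
static bool
version_compatible(unsigned lib_major, unsigned lib_minor,
		unsigned major_required, unsigned minor_required)
{
	/* any mismatch in the major version is a failure */
	if (lib_major != major_required)
		return false;

	/* a newer (or equal) minor version is backward compatible */
	return lib_minor >= minor_required;
}
```

With this model, a library at version 1.3 satisfies an application that
requires 1.0, but not one that requires 2.0.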

An application can also check specifically for the existence of an
interface by checking for the version where that interface was
introduced. These versions are documented in this man page as follows:
unless otherwise specified, all interfaces described here are available
in version 1.0 of the library. Interfaces added after version 1.0 will
contain the text *introduced in version x.y* in the section of this
manual describing the feature.

When the version check performed by **rpmem_check_version**() is
successful, the return value is NULL. Otherwise the return value is a
static string describing the reason for failing the version check. The
string returned by **rpmem_check_version**() must not be modified or
freed.

# ENVIRONMENT #

**librpmem** can change its default behavior based on the following
environment variables. These are largely intended for testing and are
not normally required.

+ **RPMEM_SSH**=*ssh_client*

Setting this environment variable overrides the default **ssh**(1) client
command name.

+ **RPMEM_CMD**=*cmd*

Setting this environment variable overrides the default command executed on
the remote node using either **ssh**(1) or the alternative remote shell command
specified by **RPMEM_SSH**.

**RPMEM_CMD** can contain multiple commands separated by a vertical bar (`|`).
Consecutive commands are executed on the remote nodes in the order in which
those nodes are read from a pool set file. This environment variable is read
when the library is initialized, so **RPMEM_CMD** must be set prior to
application launch (or prior to **dlopen**(3) if **librpmem** is being
dynamically loaded).

+ **RPMEM_ENABLE_SOCKETS**=0\|1

Setting this variable to 1 enables the **fi_sockets**(7) provider for the
in-band RDMA connection. The *sockets* provider does not support IPv6.
IPv6 must be disabled system-wide if **RPMEM_ENABLE_SOCKETS** is 1,
*target* is localhost (or any other loopback interface address), and the
**SSH_CONNECTION** variable (see **ssh**(1) for more details) contains an IPv6
address after connecting over ssh to the loopback interface. By default the
*sockets* provider is disabled.

+ **RPMEM_ENABLE_VERBS**=0\|1

Setting this variable to 0 disables the **fi_verbs**(7) provider for the
in-band RDMA connection. The *verbs* provider is enabled by default.

+ **RPMEM_MAX_NLANES**=*num*

Limits the maximum number of lanes to *num*. See **LANES**, in **rpmem_create**(3), for details.

+ **RPMEM_WORK_QUEUE_SIZE**=*size*

Suggests the work queue size. The effective work queue size can be greater than
suggested if **librpmem** requires it, or smaller if the underlying hardware
does not support the suggested size. The work queue size affects the performance
of communication with the remote node.
**rpmem_flush**(3) operations can be added to the work queue up to the size of
this queue. When the work queue is full, any subsequent call has to wait until
the work queue is drained. **rpmem_drain**(3) and **rpmem_persist**(3), among
other things, also drain the work queue.
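
The `|`-separated form of **RPMEM_CMD** described above can be split with
a few lines of C. This is a sketch only and does not reproduce how
**librpmem** parses the variable: the function name `nth_cmd` is hypothetical,
and no whitespace trimming is performed.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of librpmem.
 * Copies the n-th (0-based) '|'-separated command from `cmds` into `buf`.
 * Returns 0 on success, or -1 if there is no such command or it does
 * not fit in `buf`.
 */
static int
nth_cmd(const char *cmds, size_t n, char *buf, size_t buflen)
{
	const char *start = cmds;

	for (;;) {
		const char *bar = strchr(start, '|');
		size_t len = bar ? (size_t)(bar - start) : strlen(start);

		if (n == 0) {
			if (len >= buflen)
				return -1;
			memcpy(buf, start, len);
			buf[len] = '\0';
			return 0;
		}

		/* no further separator, so the requested command is absent */
		if (bar == NULL)
			return -1;

		start = bar + 1;
		n--;
	}
}
```

Given the value `cmd_a|cmd_b` (placeholder command names), index 0 selects
`cmd_a`, index 1 selects `cmd_b`, and index 2 reports failure.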

# DEBUGGING AND ERROR HANDLING #

If an error is detected during the call to a **librpmem** function, the
application may retrieve an error message describing the reason for the failure
from **rpmem_errormsg**(). This function returns a pointer to a static buffer
containing the last error message logged for the current thread. If *errno*
was set, the error message may include a description of the corresponding
error code as returned by **strerror**(3). The error message buffer is
thread-local; errors encountered in one thread do not affect its value in
other threads. The buffer is never cleared by any library function; its
content is significant only when the return value of the immediately preceding
call to a **librpmem** function indicated an error, or if *errno* was set.
The application must not modify or free the error message string, but it may
be modified by subsequent calls to other library functions.
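
Since the message may be overwritten by later library calls, an application
that needs it afterwards should copy it right away. The sketch below models
this with a stand-in thread-local buffer so that it can be compiled without
**librpmem**. The names `set_last_error`, `last_error` and `snapshot_error`
are hypothetical; in a real program `last_error()` would be replaced by
**rpmem_errormsg**(). The sketch assumes a C11 compiler for `_Thread_local`.

```c
#include <stdio.h>
#include <string.h>

#define ERRMSG_MAX 256

/* stand-in for the library's per-thread error message buffer */
static _Thread_local char Last_error[ERRMSG_MAX];

/* stand-in for a library call that fails and records a message */
static void
set_last_error(const char *msg)
{
	snprintf(Last_error, sizeof(Last_error), "%s", msg);
}

/* stand-in for rpmem_errormsg(): returns the current thread's buffer */
static const char *
last_error(void)
{
	return Last_error;
}

/* copies the current message so that later calls cannot overwrite it */
static void
snapshot_error(char *dst, size_t dstlen)
{
	snprintf(dst, dstlen, "%s", last_error());
}
```

The copy taken by `snapshot_error` stays valid even after a later failing
call replaces the content of the thread-local buffer.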

Two versions of **librpmem** are typically available on a development
system. The normal version, accessed when a program is linked using the
**-lrpmem** option, is optimized for performance. That version skips checks
that impact performance and never logs any trace information or performs any
run-time assertions.

A second version of **librpmem**, accessed when a program uses the libraries
under _DEBUGLIBPATH(), contains run-time assertions and trace points. The
typical way to access the debug version is to set the environment variable
**LD_LIBRARY_PATH** to _LDLIBPATH(). Debugging output is
controlled using the following environment variables. These variables have
no effect on the non-debug version of the library.

>NOTE:
On Debian/Ubuntu systems, this extra debug version of the library is
shipped in the respective **-debug** Debian package and placed in
the **/usr/lib/$ARCH/pmdk_dbg/** directory.

+ **RPMEM_LOG_LEVEL**

The value of **RPMEM_LOG_LEVEL** enables trace points in the debug version
of the library, as follows:

+ **0** - This is the default level when **RPMEM_LOG_LEVEL** is not set.
No log messages are emitted at this level.

+ **1** - Additional details on any errors detected are logged
(in addition to returning the *errno*-based errors as usual).
The same information may be retrieved using **rpmem_errormsg**().

+ **2** - A trace of basic operations is logged.

+ **3** - Enables a very verbose amount of function call
tracing in the library.

+ **4** - Enables voluminous and fairly obscure tracing information
that is likely only useful to the **librpmem** developers.

Unless **RPMEM_LOG_FILE** is set, debugging output is written to *stderr*.

+ **RPMEM_LOG_FILE**

Specifies the name of a file where all logging information should be written.
If the last character in the name is "-", the *PID* of the current process will
be appended to the file name when the log file is created. If
**RPMEM_LOG_FILE** is not set, logging output is written to *stderr*.
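
The file naming rule for **RPMEM_LOG_FILE** can be sketched as follows. This
is an illustration of the rule as described above, not the library's code: the
function name `log_file_name` is hypothetical, and formatting the *PID* as
a plain decimal number is an assumption.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of librpmem.
 * Builds the effective log file name in `buf`: when `name` ends with "-",
 * the PID of the current process is appended, otherwise `name` is used
 * unchanged. Returns 0 on success, -1 if the result does not fit.
 */
static int
log_file_name(const char *name, char *buf, size_t buflen)
{
	size_t len = strlen(name);
	int n;

	if (len > 0 && name[len - 1] == '-')
		n = snprintf(buf, buflen, "%s%ld", name, (long)getpid());
	else
		n = snprintf(buf, buflen, "%s", name);

	return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}
```

Ending the name with "-" is convenient when several processes log at the
same time, since each of them then writes to its own file.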

# EXAMPLE #

The following example uses **librpmem** to create a remote pool on a given
target node, identified by a given pool set name. The associated local memory
pool is zeroed and the data is made persistent on the remote node. Upon success,
the remote pool is closed.

```c
#include <assert.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include <librpmem.h>

#define POOL_SIGNATURE	"MANPAGE"
#define POOL_SIZE	(32 * 1024 * 1024)
#define NLANES		4

#define DATA_OFF	4096
#define DATA_SIZE	(POOL_SIZE - DATA_OFF)

static void
parse_args(int argc, char *argv[], const char **target, const char **poolset)
{
	if (argc < 3) {
		fprintf(stderr, "usage:\t%s <target> <poolset>\n", argv[0]);
		exit(1);
	}

	*target = argv[1];
	*poolset = argv[2];
}

static void *
alloc_memory(void)
{
	long pagesize = sysconf(_SC_PAGESIZE);
	if (pagesize < 0) {
		perror("sysconf");
		exit(1);
	}

	/* allocate a page size aligned local memory pool */
	void *mem;
	int ret = posix_memalign(&mem, (size_t)pagesize, POOL_SIZE);
	if (ret) {
		fprintf(stderr, "posix_memalign: %s\n", strerror(ret));
		exit(1);
	}

	assert(mem != NULL);

	return mem;
}

int
main(int argc, char *argv[])
{
	const char *target, *poolset;
	parse_args(argc, argv, &target, &poolset);

	unsigned nlanes = NLANES;
	void *pool = alloc_memory();
	int ret;

	/* fill pool_attributes */
	struct rpmem_pool_attr pool_attr;
	memset(&pool_attr, 0, sizeof(pool_attr));
	strncpy(pool_attr.signature, POOL_SIGNATURE, RPMEM_POOL_HDR_SIG_LEN);

	/* create a remote pool */
	RPMEMpool *rpp = rpmem_create(target, poolset, pool, POOL_SIZE,
			&nlanes, &pool_attr);
	if (!rpp) {
		fprintf(stderr, "rpmem_create: %s\n", rpmem_errormsg());
		return 1;
	}

	/* store data on local pool */
	memset(pool, 0, POOL_SIZE);

	/* make local data persistent on remote node */
	ret = rpmem_persist(rpp, DATA_OFF, DATA_SIZE, 0, 0);
	if (ret) {
		fprintf(stderr, "rpmem_persist: %s\n", rpmem_errormsg());
		return 1;
	}

	/* close the remote pool */
	ret = rpmem_close(rpp);
	if (ret) {
		fprintf(stderr, "rpmem_close: %s\n", rpmem_errormsg());
		return 1;
	}

	free(pool);

	return 0;
}
```

# NOTE #

The **librpmem** API is experimental and may be subject to change in the future.
However, using the remote replication in **libpmemobj**(7) is safe and backward
compatibility will be preserved.

# ACKNOWLEDGEMENTS #

**librpmem** builds on the persistent memory programming model
recommended by the SNIA NVM Programming Technical Work Group:
<https://snia.org/nvmp>

# SEE ALSO #

**rpmemd**(1), **ssh**(1), **fork**(2), **dlclose**(3), **dlopen**(3),
**ibv_fork_init**(3), **rpmem_create**(3), **rpmem_drain**(3), **rpmem_flush**(3),
**rpmem_open**(3), **rpmem_persist**(3), **strerror**(3), **limits.conf**(5),
**fabric**(7), **fi_sockets**(7), **fi_verbs**(7), **libpmem**(7), **libpmemblk**(7),
**libpmemlog**(7), **libpmemobj**(7)
and **<https://pmem.io>**