File: chirp_protocol.html

package info (click to toggle)
cctools 3.5.1-2
  • links: PTS, VCS
  • area: main
  • in suites: wheezy
  • size: 5,704 kB
  • sloc: ansic: 49,398; cpp: 15,568; perl: 12,324; sh: 2,668; python: 1,422; makefile: 632; yacc: 433; lex: 152; xml: 109
file content (367 lines) | stat: -rwxr-xr-x 21,191 bytes parent folder | download | duplicates (2)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
<style type="text/css">

#example {
	background: #ffffaa;
	font-family: courier;
	white-space: pre;
	margin: 10px;
	padding: 10px;
	border: 1 solid black;
}

#cmd {
	margin: 10px;
	padding: 10px;
	border: 1 solid black;
	background: #ffffaa;
	font-family: courier;
	font-weight: bold;
}

</style>

<title>Chirp Protocol Version 2</title>

<h1>Chirp Protocol Version 2</h2>
<address>Douglas Thain<br>
University of Notre Dame<br>
23 June 2008<br>
</address>

<h2>Introduction</h2>

Chirp is a simple and lightweight remote I/O protocol with multiple implementations.  It is used in several distributed computing systems where users wish to access data over the wide area using a Unix filesystem like protocol.  This document lays out the underlying protocol so as to promote interoperability between implementations, and permit additional implementations by other parties.  At the time of writing, there are four independent implementations used for both research and production:
<p>
<dir>
<li> A standalone C server and client maintained by Douglas Thain at the University of Notre Dame.
<li> A Java client implementation by Justin Wozniak used by the GEMS project at the University of Notre Dame.
<li> A C server and Java client embedded in the Condor distributed processing system at the University of Wisconsin.
<li> A C client implementation that is part of the ROOT I/O library used in the high energy physics community.
</dir>

<h2>Authentication Protocol</h2>

Chirp is carried over a stream protocol such as TCP or Unix domain sockets.  A server waits for a client to connect by the appropriate mechanism, and then waits for the client to identify itself.  There are currently two methods of authentication: cookie authentication and negotiated authentication.

<h3>Cookie Authentication</h3>

In certain situations, the client may be provided in advance with a "cookie" that provides expedited access to a server with a fixed identity.  This method is typically used within Condor when communicating with the embedded Chirp proxy.
<p>
In this situation, the client is provided with a file named <tt>chirp.config</tt> which contains the host and port of the server, and a cookie string.  The client must connect to the server and immediately send <tt>cookie</tt>, a space, the cookie string, and a newline.  If the response from the server is <tt>0\n</tt> then the cookie is accepted, and the client may proceed to the main Chirp protocol described below.  Any other response indicates the cookie was rejected, and the client must abort. 

<h3>Negotiated Authentication</h3>

In the general case, the client must prove its identity to the server.  There are several possible methods of authentication, and both client and server may only accept some subset of them.  The client drives the negotiation process by simply sending the requested authentication type as a plain string followed by a newline.  If the server does not support the method, it responds <tt>no\n</tt> and waits for the client to request another.  If it does support the method, it responds <tt>yes\n</tt> and then performs that method.  At the time of writing, the supported authentication methods are <tt>hostname</tt>, <tt>unix</tt>, <tt>kerberos</tt>, and <tt>globus</tt>.
<p>
In this example, a client first requests <tt>kerberos</tt> authentication, and then <tt>hostname</tt>:
<pre id=example>
client: kerberos
server: no
client: unix
server: yes
</pre>

<h4>Hostname Authentication</h4>

This method of authentication simply asks that the server identify by the client by the name of the host it is calling from.  The server responds by performing a remote domain name lookup on the callers IP address.  If this succeeds, the server responds <tt>yes\n</tt>, otherwise <tt>no\n</tt> and fails (see below).  If the client is authorized to connect, the server sends an additional <tt>yes\n</tt> otherwise <tt>no\n</tt> and fails.

<h4>Unix Authentication</h4>

This method of authentication causes the server the challenge the client to prove its identity by creating a file on the local file system.  The server transmits the name of a non-existent file in a writable directory.  The client must then attempt to create it.  If successful, the client should respond <tt>yes\n</tt> otherwise <tt>no\n</tt> and fails.  The server then checks that the file was actually created, and if satisfied responds <tt>yes\n</tt> otherwise <tt>no\n</tt> and fails.

<h4>Kerberos Authentication</h4>

This method is performed by invoking the standard Kerberos 5 libraries to authenticate in the usual manner, and is defined by that standard.

<h4>Globus Authentication</h4>

This method is performed by invoking the Secure Sockets Layer to authenticate using Globus credentials and is defined by that standard.

<h3>Completing Authentication</h3>

If the authentication method succeeds, then the server will send two lines to the client, giving the successful authentication method and the client's identity.  The client may then proceed to the main protocol below.  If the method fails, then the server and client return to negotiating a method.

<h2>Chirp Protocol</h2>

A request consists of an LF-terminated line possibly followed by binary data. At a minimum, a Chirp server must accept requests of 1024 characters. It may accept longer lines.  If a line exceeds a server's internal maximum, it must gracefully consume the excess characters and return a TOO_BIG error code, defined below. Certain requests are immediately followed by binary data.  The number of bytes is dependent on the semantics of the request, also described below.
<p>
A request is parsed into words separated by any amount of white space consisting of spaces and tabs.  The first word of the request is called the "command" and the rest are called "arguments."
<p>

Words fall into two categories: strings and decimals. A string is any arbitrary sequence of characters, with whitespaces and unprintables encoding according to RFC 2396.  (Very briefly, RFC 2396 says that whitespace must be encoding using a percent sign and two hex digits: <tt>ab cd</tt> becomes <tt>ab%20cd</tt>.)  A decimal is any sequence of the digits 0-9, optionally prefixed by a single + or -.
<p>

At the protocol level, words have no maximum size.  They may stretch on to fill any number of bytes.  Of course, each implementation has limits on concepts such a file name lengths and integer sizes.  If a receiver cannot parse, store, or execute a word contained in a request, it must gracefully consume the excess characters and return a TOO_BIG error response, defined below.
<p>
A response consists of an LF-terminated line, bearing an ASCII integer.  A valid response may also contain whitespace and additional optional material following the response.  If the response is greater than or equal to zero, the response indicates the operation succeeded. If negative, the operation failed, and the exact value tells why:
<p>
<pre id=example>
-1   NOT_AUTHENTICATED The client has not authenticated its identity.
-2   NOT_AUTHORIZED    The client is not authorized to perform that action.
-3   DOESNT_EXIST      There is no object by that name.
-4   ALREADY_EXISTS    There is already an object by that name.
-5   TOO_BIG           That request is too big to execute.
-6   NO_SPACE          There is not enough space to store that.
-7   NO_MEMORY         The server is out of memory.
-8   INVALID_REQUEST   The form of the request is invalid.
-9   TOO_MANY_OPEN     There are too many resources in use.
-10  BUSY              That object is in use by someone else.
-11  TRY_AGAIN         A temporary condition prevented the request.
-12  BAD_FD            The file descriptor requested is invalid.
-13  IS_DIR            A file-only operation was attempted on a directory.
-14  NOT_DIR           A directory operation was attempted on a file.
-15  NOT_EMPTY         A directory cannot be removed because it is not empty.
-16  CROSS_DEVICE_LINK A hard link was attempted across devices.
-17  OFFLINE           The requested resource is temporarily not available.
-127  UNKNOWN          An unknown error occurred.
</pre>
<p>

Negative values beyond -17 are reserved for future expansion.  The
receipt of such a values should be interpreted in the same way as UNKNOWN.

<h2>Chirp Commands</h2>

Following are the available commands.  Each argument to a command is specific 
Here are the available commands that are standardized.  Note that each implementation may support some additional commands which are not (yet) documented because they are still experimental.  Each argument to a command is specified with a type and a name.

<div id=cmd>open  (string:name) (string:flags) (decimal:mode)</div>

Open the file named "name"  "mode" is interpreted in the same manner as a POSIX file mode.  Note that the mode is printed in decimal representation, although the UNIX custom is to use octal.  For example, the octal mode 0755, representing readable and executable by everyone would be printed as 493. "flags" may contain any of these characters which affect the nature of the call:

<dir>
<li> r - open for reading
<li> w - open for writing
<li> a - force all writes to append
<li> t - truncate before use
<li> c - create if it doesn't exist
<li> x - fail if 'c' is given and the file already exists
</dir>

The open command returns an integer file description which may be used with later calls referring to open files.  The implementation may optionally return file metadata in the same form as the stat command, immediately following the result.  A client implementation must be prepared for both forms of result.  On failure, returns a negative value in the response.

Note that a file descriptor returned by open only has meaning within the current connection between the client and the server.  If the connection is lost, the server will implicitly close all open files.  A file descriptor opened in one connection has no meaning in another connection.

<div id=cmd>close (decimal:fd)</div>

Complete access to this file descriptor.  If the connection between a client an server is lost, all files are implicitly closed.

<div id=cmd>read (decimal:fd) (decimal:length)</div>

Read up to "length" bytes from the file descriptor "fd."  If successful, the server may respond with any value between zero and "length."   Immediately following the response will be binary data of exactly as many bytes indicated in the response.  If zero, the end of the file has been reached.  If any other value is given, no assumptions about the file size may be made.

<div id=cmd>pread (decimal:fd) (decimal:length) (decimal:offset)</div>

Read up to length bytes from the file descriptor fd, starting at offset.  Return value is identical to that of read.

<div id=cmd>read (decimal:fd) (decimal:length) (decimal:offset) (decimal:stride_length) (decimal:stride_skip)</div>

Read up to "length" bytes from the file descriptor "fd", starting at offset, retrieving stride_length bytes every stride_skip bytes.  Return value is identical to that of read.

<div id=cmd>write (decimal:fd) (decimal:length)</div>

Write up to "length" bytes to the file descriptor "fd."  This request should be immediately followed by "length" binary bytes.  If successful, may return any value between 0 and "length," indicating the number of bytes actually accepted for writing.  An efficient server should avoid accepting fewer bytes than requested, but the client must be prepared for this possibility.

<div id=cmd>pwrite  (decimal:fd) (decimal:length) (decimal:offset)</div>

Write up to "length" bytes to the file descriptor "fd" at offset offset.  Return value is identical to that of write.

<div id=cmd>swrite   (decimal:fd) (decimal:length) (decimal:offset) (decimal:stride_length) (decimal:stride_skip)</div>

Write up to "length" bytes to the file descriptor "fd", starting at offset, writing stride_length bytes every stride_skip bytes.  Return value is identical to that of write.

<div id=cmd>fstat (decimal:fd)</div>

Get metadata 

<div id=cmd>fsync (decimal:fd)</div>

Block until all uncommitted changes to this file descriptor have been written to stable storage.

<div id=cmd>lseek (decimal:fd) (decimal:offset) (decimal:whence)</div>

Move the current seek pointer of "fd" by "offset" bytes from the base given by "whence." "whence" may be:

<dir>
<li> 0 - from the beginning of the file
<li> 1 - from the current seek position
<li> 2 - from the end of the file.
</dir>

Returns the current seek position.

<div id=cmd>rename (string:oldpath) (string:newpath)</div>

Rename the file "oldpath" to be renamed to "newpath".

<div id=cmd>unlink (string:path)</div>

Delete the file named "path".

<div id=cmd>rmdir  (string:path)</div>

Delete a directory by this name.

<div id=cmd>rmall  (string:path)</div>

Recursively delete an entire directory.

<div id=cmd>mkdir  (string:name) (decimal:mode)</div>

Create a new directory by this name. "mode" is interpreted in the same manner as a POSIX file mode.  Note that mode is expressed in decimal rather than octal form.

<div id=cmd>fstat (decimal:fd)</div>

Get metadata describing an open file.  If the response line indicates success, it is followed by a second line of thirteen integer values, indicating the following values:
<dir>
<li> st_dev - Device number.
<li> st_ino - Inode number
<li> st_mode - Mode bits.
<li> st_nlink - Number of hard links.
<li> st_uid - User ID of the files owner.
<li> st_gid - Group ID of the file.
<li> st_rdev - Device number, if this file represents a device.
<li> st_size - Size of the file in bytes.
<li> st_blksize - Recommended transfer block size for accessing this file.
<li> st_blocks - Number of physical blocks consumed by this file.
<li> st_atime - Last time file was accessed in Unix time() format.
<li> st_mtime - Last time file data was modified in Unix time() format.
<li> st_ctime - Last time the inode was changed, in Unix time() format.
</dir>

Note that not all fields may have meaning to all implementations.  For example, in the standalone Chirp server, st_uid has no bearing on access control, and the user should employ setacl and getacl instead.

<div id=cmd>fstatfs (string:path)</div>

Get filesystem metadata for an open file.  If the response line indicates success, it is followed by a second line of seven decimals with the following interpretation:

<dir>
<li>f_type  - The integer type of the filesystem. 
<li>f_blocks  - The total number of blocks in the filesystem. 
<li>f_bavail  - The number of blocks available to an ordinary user. 
<li>f_bsize  - The size in bytes of a block. 
<li>f_bfree  - The number of blocks free. 
<li>f_files  - The maximum number of files (inodes) on the filesystem. 
<li>f_ffree  - The number of files (inodes) currently in use
</dir>

<div id=cmd>fchown (decimal:fd) (decimal:uid) (decimal:gid)</div>

Change the ownership of an open file to the UID and GID indicated.

<div id=cmd>fchmod (decimal:fd) (decimal:mode)</div>

Change the Unix mode bits on an open file.

<div id=cmd>ftruncate (decimal:fd) (decimal:length)</div>

Truncate an open file.

<div id=cmd>getfile (string:path)</div>

Retrieves an entire file efficiently.  A positive response indicates the number of bytes in the file, and is followed by exactly that many bytes of binary data which are the files contents.

<div id=cmd>putfile (string:path) (decimal:mode) (decimal:length)</div>

Stores an entire file efficiently.  The client first sends the request line indicating the file name, the desired mode bits, and the length in bytes.  The response indicates whether the client may proceed.  If it indicates success, the client must write exactly length bytes of data, which are the files contents.  If the response indicates failure, the client must not send any additional data.

<div id=cmd>getlongdir (string:path)</div>

Lists a directory and all metadata.  If the response indicates success, it will be followed by a series of lines, alternating the name of a directory entry with its metadata in the same form as returned by fstat.  The end of the list is indicated by a single blank line.

<div id=cmd>getdir (string:path)</div>

Lists a directory.  If the response indicates success, it will be followed by a series of lines indicating the name of each directory entry. The end of the list is indicated by a single blank line.

<div id=cmd>getacl (string:path)</div>

Get an access control list.  If the response indicates success, it will be followed by a series of lines indicating each entry of the access control list.  The end of the list is indicated by a single blank line.

<div id=cmd>setacl (string:path) (string:subject) (string:rights)</div>

Modify an access control list.  On an object identified by path, set rights rights for a given subject.  The string - may be used to indicate no rights.

<div id=cmd>whoami</div>

Get the users current identity with respect to this server.  If the response is greater than zero, it is followed by exactly that many bytes in data, containing the users identity.

<div id=cmd>whoareyou (string:rhost)</div>

Get the servers current identity with respect to a remote host.  If the response is greater than zero, it is followed by exactly that many bytes in data, containing the servers identity.

<div id=cmd>link (string:oldpath) (string:newpath)</div>

Create a hard link from newpath to oldpath.

<div id=cmd>symlink (string:oldpath) (string:newpath)</div>

Create a symlink from newpath to oldpath.

<div id=cmd>readlink (string:path)</div>

Read the contents of a symbolic link.  If the response is greater than zero, it is followed by exactly that many bytes in data, containing the contents of the link.

<div id=cmd>mkdir (string:path) (decimal:mode)</div>

Create a new directory with the given Unix mode bits.

<div id=cmd>stat (string:path)</div>

Get file status.  If the response indicates success, it is followed by a second line in the same format as the command fstat.  If the pathname represents a symbolic link, this command examines the target of the link.

<div id=cmd>lstat (string:path)</div>

Get file status.  If the response indicates success, it is followed by a second line in the same format as the command fstat.  If the pathname represents a symbolic link, this command examines the link itself.

<div id=cmd>statfs (string:path)</div>

Get filesystem status.  If the response indicates success, it is followed by a second line in the same format as the command fstatfs.

<div id=cmd>access (string:path) (decimal:mode)</div>

Check access permissions.  Returns success if the user may access the file according to the specified mode.  The mode may be any of the values logical-ord together:
<dir>
<li> 0 (F_OK)  Test for existence of the file.
<li> 1 (X_OK)  Test if the file is executable.
<li> 2 (W_OK)  Test if the file is readable.
<li> 4 (R_OK)  Test if the file is executable.
</dir>

<div id=cmd>chmod (string:path) (decimal:mode)</div>

Change the Unix mode bits on a given path.  (NOTE 1)

<div id=cmd>chown (string:path) (decimal:uid) (decimal:gid)</div>

Change the Unix UID or GID on a given path.  If the path is a symbolic link, change its target.  (NOTE 1)

<div id=cmd>lchown (string:path) (decimal:uid) (decimal:gid)</div>

Change the Unix UID or GID on a given path.  If the path is a symbolic link, change the link.  (NOTE 1)

<div id=cmd>truncate (string:path) (decimal:length)</div>

Truncate a file to a given number of bytes.

<div id=cmd>utime (string:path) (decimal:actime) (decimal:mtime)</div>

Change the access and modification times of a file.

<div id=cmd>md5 (string:path)</div>

Checksum a remote file using the MD5 message digest algorithm.  If successful, the response will be 16, and will be followed by 16 bytes of data representing the checksum in binary form.

<div id=cmd>thirdput (string:path) (string:remotehost) (string:remotepath)</div>

Direct the server to transfer the path to a remote host and remote path.  If the indicated path is a directory, it will be transferred recursively, preserving metadata such as access control lists.

<div id=cmd>mkalloc (string:path) (decimal:size) (decimal:mode)</div>

Create a new space allocation at the given path that can contain size bytes of data and has initial mode of mode.

<div id=cmd>lsalloc (string:path)</div>

List the space allocation state on a directory.  If the response indicates success, it is followed by a second line which gives the path of the containing allocation, the total size of the allocation, and the 
<p>
<b>(NOTE 1)</b> The standalone Chirp server ignores the traditional Unix mode bits when performing access control.  Calls such as fchmod, chmod, chown, and fchown are essentially ignored, and the caller should employ setacl and getacl instead.