TIPC Programmer's Guide
=======================

Version 1.2.2 (03 July 2007)

This document is intended to assist software developers who are writing 
applications that use TIPC.  For information about setting up and operating a 
network that supports TIPC please see the TIPC User's Guide.

ADVISORY:
Many topics discussed in this document are presented in a very compact format
that conveys as much information as possible in as few words as possible.
Readers will gain the most benefit from this document by carefully reading
(and re-reading) the material, as this will provide a better understanding of
TIPC than is likely to be obtained by quickly skimming through it.


Table of Contents
-----------------
1. TIPC Fundamentals
2. Socket API
3. Native API
4. FAQ
5. Tips and Techniques


1. TIPC Fundamentals
--------------------
A brief summary of the major concepts is provided in the following sections.

For a comprehensive description of TIPC, please consult the latest version of 
the TIPC specification at http://sourceforge.net/projects/tipc.

IMPORTANT: 
At the time of this writing, the TIPC specification has not been updated to 
include the latest modifications incorporated into TIPC version 1.5 (or 
later).  In cases of conflict between the specification and this document, 
the information in this document takes precedence.

1.1 TIPC Network Structure
--------------------------
Conceptually, a TIPC network consists of individual processing elements or 
"nodes".  A set of related nodes forms a "cluster", while a set of related 
clusters forms a "zone".  Typically the grouping of nodes into clusters and 
zones is based on location; for example, all nodes in the same shelf or the 
same room may be assigned to the same cluster, while all clusters in the same 
building may be assigned to the same zone.

Each node in a TIPC network is assigned a unique network address consisting 
of a zone, cluster, and node identifier, usually denoted <Z.C.N>.  Each of 
these identifiers is an integer value in the range from 1 to the maximum 
value defined by the network administrator.  (Exception: A TIPC node which is 
not yet part of a larger network uses the default network address of <0.0.0> 
until a real network address is assigned.)  By default the maximum values for 
zone, cluster, and node identifiers are 3, 1, and 63, respectively, while the 
theoretical maximums are 255, 4095, and 2047, respectively.

A TIPC node is also assigned a "network identifier".  This allows multiple 
TIPC networks to use the same physical medium (for example, the same Ethernet 
cables) without interfering with one another -- each node only recognizes 
traffic originating on nodes having the same network identifier.

Nodes in a TIPC network communicate with each other by using one or more 
network interfaces to send and receive messages.  Each network interface must 
be connected to a physical medium that is supported by TIPC (such as 
Ethernet).  When properly configured, TIPC automatically establishes "links" 
to enable communication with the other nodes in the network, and takes care 
of routing traffic over the appropriate link, retransmitting messages in the 
event of errors, etc.

THINGS TO REMEMBER:
- TIPC network addressing is NOT like IP network addressing!!!  There is only 
one network address per node in TIPC, even if the node has multiple network 
interfaces.  A node's network interfaces are NOT assigned network addresses 
at all.
- The network administrator takes care of assigning the network address and 
the network identifier for each node in the network, so programmers don't 
have to worry about this.
- The network administrator also takes care of configuring each node's 
network interfaces to enable communication with all other nodes in the 
network, so programmers don't have to worry about this either.

CURRENT LIMITATIONS: 
- TIPC only supports a single cluster per zone.
- TIPC does not support inter-zone communication, so nodes in different zones 
are effectively part of distinct networks even if they have the same network 
identifier.
- TIPC does not support the "secondary nodes" concept mentioned in the 
specification document.

1.2 Messaging Overview
----------------------
Applications in a TIPC network typically communicate with one another by 
sending data units called "messages" between communication endpoints called 
"ports".

From an application's perspective, a message is a byte string from 1 to 66000 
bytes long, whose internal structure is determined by the application.  A 
port is an entity that can send and receive messages in either a connection-
oriented manner or a connectionless manner.

Connection-oriented messaging allows a port to establish a connection to a 
peer port elsewhere in the network, and then exchange messages with that 
peer.  A connection can be established using an explicit handshake mechanism 
prior to the exchange of any application messages (a variation of the SYN/ACK 
mechanism used by TCP) or an implicit handshake mechanism that occurs during 
the first exchange of application messages.  Once a connection has been 
established it remains active until it is terminated by one of the ports, or 
until the communication path between the ports is severed (for example, by 
the failure of the node on which one of the ports is running); TIPC then 
immediately notifies the affected port(s) that the connection has terminated.

Connectionless messaging allows a port to exchange messages with one or more 
ports elsewhere in the network.  A given message can be sent to a single port 
(unicast) or to a collection of ports (multicast), depending on the destination
address specified when the message is sent.

TIPC is designed to be a reliable messaging mechanism, in which an application
can send a message and assume that the message will be delivered to the
specified destination as long as that destination is reachable.  If a message
cannot be delivered the message sender can specify whether it should be returned
to its point of origin or discarded, according to the needs of the application;
however, it should be noted that some conditions that prevent TIPC from
delivering a message may also prevent it from returning the message.
(See the section on "Message Delivery" that appears later in this document for
more information.)

THINGS TO REMEMBER:
- In a TIPC network where different nodes may be running on different CPU 
types and/or operating systems, applications must ensure that the internal 
structure of a message is well-defined, and accounts for any differences in 
message content endianness, field size, and field padding.
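
One way to meet this requirement is to serialize each field explicitly rather
than copying a C structure into the message.  The following is a minimal
sketch and is not part of TIPC; the message layout (a 16-bit opcode followed
by a 32-bit value, both big-endian, with no padding) and all names beginning
with "demo_" are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wire layout: 2-byte opcode, 4-byte value, big-endian. */
enum { DEMO_MSG_SIZE = 6 };

/* Encode into buf (at least DEMO_MSG_SIZE bytes); returns bytes written. */
size_t demo_msg_encode(uint8_t *buf, uint16_t opcode, uint32_t value)
{
    buf[0] = (uint8_t)(opcode >> 8);
    buf[1] = (uint8_t)(opcode & 0xff);
    buf[2] = (uint8_t)(value >> 24);
    buf[3] = (uint8_t)((value >> 16) & 0xff);
    buf[4] = (uint8_t)((value >> 8) & 0xff);
    buf[5] = (uint8_t)(value & 0xff);
    return DEMO_MSG_SIZE;
}

/* Decode from buf; returns 0 on success, -1 if the buffer is too short. */
int demo_msg_decode(const uint8_t *buf, size_t len,
                    uint16_t *opcode, uint32_t *value)
{
    if (len < DEMO_MSG_SIZE)
        return -1;
    *opcode = (uint16_t)((buf[0] << 8) | buf[1]);
    *value = ((uint32_t)buf[2] << 24) | ((uint32_t)buf[3] << 16) |
             ((uint32_t)buf[4] << 8) | (uint32_t)buf[5];
    return 0;
}
```

Because every byte position is fixed by the code, the same message is produced
and understood regardless of the CPU type or compiler used on each node.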

1.3 TIPC Addressing
-------------------
TIPC uses 3 distinct forms of addressing within a network.

1.3.1 Network Address
---------------------
The network address was introduced in section 1.1, and is typically denoted 
as <Z.C.N>.  Applications use this address format with certain operations to 
specify the portion of a TIPC network the operation applies to:

a) <Z.C.N> indicates a network node
b) <Z.C.0> indicates a network cluster
c) <Z.0.0> indicates a network zone
d) <0.0.0> has special meaning, which is operation-specific

When coding, a network address is represented as an unsigned 32-bit value, 
comprising 3 fields: an 8-bit zone field, a 12-bit cluster field, and a 12-
bit node field.  (See section 1.6 for a description of routines for 
constructing and deconstructing network addresses.)
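
As an illustration of the 8/12/12-bit layout, the sketch below packs and
unpacks a network address by hand.  It assumes the zone field occupies the
most significant bits, followed by the cluster and node fields; the "demo_"
names are hypothetical and are not the routines supplied by TIPC.

```c
#include <stdint.h>

/* Assumed layout: zone in bits 31..24, cluster in 23..12, node in 11..0. */
uint32_t demo_addr(unsigned zone, unsigned cluster, unsigned node)
{
    return ((uint32_t)(zone & 0xff) << 24) |
           ((uint32_t)(cluster & 0xfff) << 12) |
           (uint32_t)(node & 0xfff);
}

unsigned demo_zone(uint32_t addr)    { return addr >> 24; }
unsigned demo_cluster(uint32_t addr) { return (addr >> 12) & 0xfff; }
unsigned demo_node(uint32_t addr)    { return addr & 0xfff; }

/* The four address forms a) to d) listed above. */
enum demo_addr_kind {
    DEMO_ADDR_SPECIAL,  /* <0.0.0> */
    DEMO_ADDR_ZONE,     /* <Z.0.0> */
    DEMO_ADDR_CLUSTER,  /* <Z.C.0> */
    DEMO_ADDR_NODE      /* <Z.C.N> */
};

enum demo_addr_kind demo_addr_classify(uint32_t addr)
{
    if (addr == 0)
        return DEMO_ADDR_SPECIAL;
    if (demo_node(addr) != 0)
        return DEMO_ADDR_NODE;
    if (demo_cluster(addr) != 0)
        return DEMO_ADDR_CLUSTER;
    return DEMO_ADDR_ZONE;
}
```

Under these assumptions <1.1.10> is represented by the value 0x0100100A.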

1.3.2 Port Identifier
---------------------
Each port in a TIPC network has a unique "port identifier" or "port ID", 
which is typically denoted as <Z.C.N:ref>.  The port ID is assigned 
automatically by TIPC when the port is created, and consists of the 32-bit 
network address of the port's node and a 32-bit reference value.  The 
reference value is guaranteed to be unique on a per-node basis and will not 
be reused for a long time once the port ceases to exist. 

1.3.3 Port Naming
-----------------
While a TIPC port can send messages to another port by specifying the port ID 
of the destination port, it is usually more convenient to use a "functional 
address" that does not require the sending port to know the physical location 
of the destination within the network.  This simplifies communication when 
server ports are being created, deleted, or relocated dynamically, or when 
multiple ports are providing a given service.

The basic unit of functional addressing within TIPC is the "port name", which 
is typically denoted as {type,instance}.  A port name consists of a 32-bit 
type field and a 32-bit instance field, both of which are chosen by the 
application.  Typically, the type field is used to indicate the class of 
service provided by the port, while the instance field can be used as a sub-
class indicator.

Unlike port IDs, port names need not be unique within a TIPC network.  
Applications are permitted to assign a given port name to multiple ports, to 
assign multiple port names to a given port, or both.  Port names can also 
be unbound from a port if they are no longer required.

Whenever an application binds a port name to a port, it must specify the level
of visibility, or "scope", that the name has within the TIPC network: either
"node scope", "cluster scope", or "zone scope".  TIPC then ensures that only 
applications within that portion of the network (i.e. the same node, the same 
cluster, or the same zone) can access the port using that name.

To simplify the task of specifying a range of similar port name instances, 
TIPC supports the concept of the "port name sequence", which is typically 
denoted as {type,lower bound,upper bound}.  A port name sequence consists 
of a 32-bit type field and a pair of 32-bit instance fields, and represents 
the set of port names from {type,lower bound} through {type,upper bound}, 
inclusive.  The lower bound of a name sequence cannot be larger than the 
upper bound.

There are a number of restrictions on port naming that programmers need to 
be aware of.

1) The type values from 0 to 63 are reserved by TIPC and cannot be used to 
designate an application service.

2) Port names and name sequences are designed for use by server ports.  TIPC 
does not allow a named server port to initiate a connection (as if it were a 
client port), nor does it allow the assignment of names to a connected client 
port (as if it were a server port).

3) TIPC does not currently allow the creation of partially overlapping port 
name sequences (unless the name sequences cannot be seen simultaneously).  For 
example, once a node has a port having the name sequence {100,1000,2000} -- 
or the node is notified that another node has a port with this name sequence --
it cannot then assign the name sequences {100,500,1200} or {100,1100,1500}
to any of its ports; however, the node is permitted to assign {100,1000,2000}
to other ports or to use name sequences having other type values such as 
{150,500,1200}.  Overlapping name sequences *are* permitted if they are
published by different nodes and are published with non-overlapping scopes; 
for example, you can publish {100,500,1200} on node <1.1.1> and {100,1100,1500}
on node <1.1.2> as long as they both are published with node scope.
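
The overlap rule in restriction 3 can be expressed as a simple range check.
The sketch below is illustrative only: the structure and function names are
hypothetical, and the check ignores scope, so it models only the case in
which both name sequences can be seen simultaneously.

```c
#include <stdbool.h>
#include <stdint.h>

struct demo_name_seq {
    uint32_t type;
    uint32_t lower;
    uint32_t upper;
};

/* True if the sequences share a type and at least one instance value. */
bool demo_seq_overlaps(const struct demo_name_seq *a,
                       const struct demo_name_seq *b)
{
    return a->type == b->type &&
           a->lower <= b->upper && b->lower <= a->upper;
}

/* True if the sequences overlap without being identical, which is the
 * case that restriction 3 does not allow. */
bool demo_seq_conflicts(const struct demo_name_seq *a,
                        const struct demo_name_seq *b)
{
    bool identical = a->lower == b->lower && a->upper == b->upper;

    return demo_seq_overlaps(a, b) && !identical;
}
```

With {100,1000,2000} already present, this check reports a conflict for
{100,500,1200} and {100,1100,1500}, but not for a second {100,1000,2000} or
for {150,500,1200}.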

THINGS TO REMEMBER:
- Programmers typically only have to worry about the selection of TIPC names 
and name sequences.  (Network addresses are chosen by the TIPC network 
administrator, while port IDs are chosen by TIPC automatically.)
- Well-known names (or name sequences) are used in a TIPC network in the same 
way that well-known port numbers are used in an IP network.
- An application must use TIPC names (or name sequences) that do not conflict 
with the names used by other applications.

1.4 Using Port Names
--------------------
This section discusses more advanced aspects of TIPC's functional addressing.

1.4.1 Address Resolution 
------------------------
Whenever an application specifies a port name as the destination address of a
message, it must also indicate where within the network TIPC should look to
find the destination by specifying a "lookup domain".

The most commonly used lookup domain is <0.0.0>, which tells TIPC to use a
"closest first" approach.  TIPC first looks on the sending node to find a port
having the specified port name; if more than one such port exists, TIPC selects
one in a round-robin manner.  If the sending node does not contain a matching
port, TIPC then looks to all other nodes in the sending node's cluster to see
if any ports have published that name using cluster scope or zone scope; again,
if more than one such port exists, TIPC selects one in a round-robin manner.
Finally, if no matching port is found within the sending node's cluster, TIPC
looks at all other nodes in the sending node's zone for ports with a matching
name and having zone scope, and selects one in a round-robin manner.  (In short,
address resolution is performed using 3 lookup domains in succession: first
using <Z.C.N>, then <Z.C.0>, and finally <Z.0.0>.)  This algorithm delivers 
the message to a suitable destination as quickly as possible, and also 
shares the load of similarly named messages among all such destinations at 
the same distance from the sender.

Alternatively, an application can specify a single lookup domain to be used
for address resolution.  Specifying a lookup domain of the form <Z.C.N>, 
<Z.C.0>, or <Z.0.0> tells TIPC to take all ports with compatible name and scope
values within the specified node, cluster, or zone, respectively, and then 
select one in a round-robin manner.  These forms can be useful in preventing 
a message from being sent off-node, or for evenly distributing work to all 
servers scattered throughout a cluster or zone.
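
The lookup rules above can be modelled with a small table search.  This is a
simplified sketch, not TIPC's implementation: the publication table, the
"demo_" names, the single round-robin cursor, and the assumed 8/12/12-bit
address packing (zone in the most significant bits) are all assumptions made
for illustration.

```c
#include <stddef.h>
#include <stdint.h>

enum demo_scope { DEMO_SCOPE_NODE, DEMO_SCOPE_CLUSTER, DEMO_SCOPE_ZONE };

struct demo_pub {
    uint32_t type, instance;  /* published port name */
    uint32_t node;            /* publisher's <Z.C.N>, packed 8/12/12 bits */
    uint32_t ref;             /* port reference on that node */
    enum demo_scope scope;
};

/* Is node address addr inside lookup domain <Z.C.N>, <Z.C.0> or <Z.0.0>? */
static int demo_in_domain(uint32_t addr, uint32_t domain)
{
    if (domain & 0xfffu)
        return addr == domain;
    if (domain & 0xfff000u)
        return (addr >> 12) == (domain >> 12);
    return (addr >> 24) == (domain >> 24);
}

/* Can a sender on node 'sender' see publication p, given p's scope? */
static int demo_visible(const struct demo_pub *p, uint32_t sender)
{
    if (p->scope == DEMO_SCOPE_NODE)
        return p->node == sender;
    if (p->scope == DEMO_SCOPE_CLUSTER)
        return (p->node >> 12) == (sender >> 12);
    return (p->node >> 24) == (sender >> 24);
}

static int demo_match(const struct demo_pub *p, uint32_t type, uint32_t inst,
                      uint32_t sender, uint32_t domain)
{
    return p->type == type && p->instance == inst &&
           demo_visible(p, sender) && demo_in_domain(p->node, domain);
}

/* Round-robin choice among the matches inside one lookup domain. */
static const struct demo_pub *demo_pick(const struct demo_pub *tab, size_t n,
        uint32_t type, uint32_t inst, uint32_t sender, uint32_t domain,
        unsigned *cursor)
{
    size_t i, count = 0, want;

    for (i = 0; i < n; i++)
        count += demo_match(&tab[i], type, inst, sender, domain);
    if (count == 0)
        return NULL;
    want = (*cursor)++ % count;
    for (i = 0; i < n; i++)
        if (demo_match(&tab[i], type, inst, sender, domain) && want-- == 0)
            return &tab[i];
    return NULL;
}

/* Resolve a port name; a domain of 0 means <0.0.0>, i.e. closest first. */
const struct demo_pub *demo_lookup(const struct demo_pub *tab, size_t n,
        uint32_t type, uint32_t inst, uint32_t sender, uint32_t domain,
        unsigned *cursor)
{
    uint32_t stages[3];
    const struct demo_pub *p;
    int s;

    if (domain != 0)
        return demo_pick(tab, n, type, inst, sender, domain, cursor);
    stages[0] = sender;                /* <Z.C.N> */
    stages[1] = sender & 0xfffff000u;  /* <Z.C.0> */
    stages[2] = sender & 0xff000000u;  /* <Z.0.0> */
    for (s = 0; s < 3; s++) {
        p = demo_pick(tab, n, type, inst, sender, stages[s], cursor);
        if (p != NULL)
            return p;
    }
    return NULL;
}
```

A sender that has a matching port on its own node always resolves to that
port when using <0.0.0>; a sender without one rotates through the matching
ports that are visible within its cluster.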

It should be noted that the round-robin selection mechanism used by TIPC is
shared by all applications using a given node.  So, for example, if there are
two ports within the specified lookup domain that have the desired port name,
an application cannot assume that two successive messages it sends to that name
will be distributed one to each port.  This is because a similarly named message
sent by another application may arrive in the interim, thereby causing the
second message sent by the first application to go to the same destination as
its first message.

1.4.2 Multicast Messaging
-------------------------
Whenever an application specifies a port name sequence as the destination 
address of a message (rather than a port name), this instructs TIPC to send a
copy of the message to every port in the sender's cluster that has at least one 
port name within the destination name sequence.

This is most easily illustrated using an example.  Suppose a multicast message
is sent to {1000,100,200}.  The following ports will each receive exactly
one copy of the message:

    <1.1.10:1234> having {1000,100}                 - one matching name
    <1.1.11:4321> having {1000,123} and {1000,175}  - two matching names
    <1.1.10:5678> having {1000,150} and {2000,150}  - non-matching name ignored
    <1.1.12:5555> having {1000,110,120}             - subset overlap
    <1.1.10:8888> having {1000,50,500}              - superset overlap
    <1.1.14:9999> having {1000,170,300}             - partial overlap

while the following ports will not receive a copy at all:

    <1.1.10:1111> having {2000,100,200}             - name type mismatch
    <1.1.10:4444> having {1000,50,75}               - no overlap
    <1.1.10:6666> having no names bound to it       - no overlap

Note that a port never receives more than one copy of the multicast message,
even if it has several port names (or port name sequences) bound to it that
overlap the specified destination name sequence.
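
The matching rule used in the example above amounts to an interval overlap
test applied to each name bound to a port.  The following sketch shows the
idea; the structure and function names are hypothetical, and a single port
name {type,instance} is represented as the sequence {type,instance,instance}.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct mc_seq {
    uint32_t type;
    uint32_t lower;
    uint32_t upper;
};

/* True if a port with the given bindings receives a multicast sent to dest.
 * The result is a yes/no answer, so a port with several overlapping
 * bindings still receives only one copy. */
bool mc_port_receives(const struct mc_seq *dest,
                      const struct mc_seq *bound, size_t nbound)
{
    size_t i;

    for (i = 0; i < nbound; i++) {
        if (bound[i].type == dest->type &&
            bound[i].lower <= dest->upper &&
            dest->lower <= bound[i].upper)
            return true;
    }
    return false;
}
```

For the destination {1000,100,200}, a port bound to {1000,170,300} receives
the message because 170 <= 200 and 100 <= 300, while a port bound to
{1000,50,75} does not because 100 > 75.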

Also note that the requirement that the destination address for a multicast
message be a name sequence does not prevent applications from multicasting to
a single port name; an application can simply specify a name sequence that
encompasses a single instance value, such as {1000,123,123}.

THINGS TO REMEMBER:
- Multicast messaging can only be done in a connectionless manner, as TIPC
does not support the concept of a "one to many" or "many to many" connection.
- It is not possible to limit the distribution of a multicast message to the
ports within a given node by specifying a "lookup domain", as can be done with 
unicast messages.

1.4.3 Name Subscriptions
------------------------
TIPC provides a network topology service that applications can use to receive
information about what port names exist within the application's network zone.

An application accesses the topology service by opening a message-based 
connection to port name {1,1} and then sending "subscription" messages to the 
topology service that indicate the port names of interest to the application; 
in return, the topology service sends "event" messages to the application when 
these names are published or withdrawn by ports within the network.  
Applications are allowed to have multiple subscriptions active at the same 
time; issuing a new subscription does not affect any existing subscription.

A subscription request message must contain the following information:

1) The port name sequence of interest to the application.

   Applications that are interested in a single port name can specify a port
   name sequence in which the lower and upper instance values are the same.

2) An event filter specifying which events are of interest to the application.

   The value TIPC_SUB_PORTS causes the topology service to generate a 
   TIPC_PUBLISHED event for each port name or port name sequence it finds that
   overlaps the specified port name sequence; a TIPC_WITHDRAWN event is issued
   each time a previously reported name becomes unavailable.  The value
   TIPC_SUB_SERVICE causes the topology service to generate a single publish 
   event for the first port it finds with an overlapping name and a single 
   withdraw event when the last such port becomes unavailable.  Thus, the latter
   event filter allows the topology service to inform the application if there 
   are *any* ports of interest, while the former informs it about *all* such
   ports.
   
3) A subscription timeout value. 
 
   If the subscription is still active after the specified number of
   milliseconds, a TIPC_SUBSCR_TIMEOUT event message is sent to the application 
   and the topology service deletes the subscription.  (The value 
   TIPC_WAIT_FOREVER can be specified if no time limit is desired.)
   
4) An 8-byte "user handle" that is application-defined.

   This value is returned to the application as part of all events associated
   with the subscription request.  Applications may find it useful to use this
   field to hold a unique subscription identifier when multiple subscription
   requests are active simultaneously.
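
As an illustration of the four items above, the sketch below models a
subscription request and stores a 64-bit subscription identifier in the user
handle.  The structure is hypothetical (the real layout is defined by the
TIPC header files, and the field order here is an assumption), as are all
"demo_" names.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the information carried by a subscription request. */
struct demo_sub_request {
    uint32_t type, lower, upper;  /* port name sequence of interest */
    uint32_t timeout_ms;          /* subscription timeout */
    uint32_t filter;              /* event filter */
    char     usr_handle[8];       /* opaque, application-defined */
};

/* Store an application-chosen identifier in the 8-byte user handle. */
void demo_set_handle(struct demo_sub_request *req, uint64_t id)
{
    memcpy(req->usr_handle, &id, sizeof req->usr_handle);
}

/* Recover the identifier from a handle returned in an event. */
uint64_t demo_get_handle(const char handle[8])
{
    uint64_t id;

    memcpy(&id, handle, sizeof id);
    return id;
}
```

Since the handle is only ever interpreted by the application that supplied
it, it can be stored in the node's native byte order.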

An event message contains the following information:

1) A code indicating the type of event that has occurred.

   This may be either TIPC_PUBLISHED, TIPC_WITHDRAWN, or TIPC_SUBSCR_TIMEOUT.

2) The instance values denoting the lower and upper bounds of the port name
   sequence that overlaps the name sequence specified by the subscription.
   
   The name type value is not supplied as it is always equal to the value
   specified by the subscription request.

3) The port ID of the associated port.

4) The subscription request associated with the event.
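
The difference between the two event filters described earlier can be shown
with a small simulation that counts the matching publications known for one
subscription.  This is a sketch of the described behavior only, not the
topology service's code; the "DEMO_" constants and "demo_" functions are
hypothetical stand-ins.

```c
enum demo_filter { DEMO_SUB_PORTS, DEMO_SUB_SERVICE };
enum demo_event  { DEMO_NONE, DEMO_PUBLISHED, DEMO_WITHDRAWN };

struct demo_sub {
    enum demo_filter filter;
    unsigned matching;  /* overlapping publications currently known */
};

/* A matching name was published; returns the event to report, if any. */
enum demo_event demo_on_publish(struct demo_sub *s)
{
    s->matching++;
    if (s->filter == DEMO_SUB_PORTS || s->matching == 1)
        return DEMO_PUBLISHED;  /* every port, or only the first one */
    return DEMO_NONE;
}

/* A matching name was withdrawn; returns the event to report, if any. */
enum demo_event demo_on_withdraw(struct demo_sub *s)
{
    if (s->matching == 0)
        return DEMO_NONE;       /* nothing was known, nothing to report */
    s->matching--;
    if (s->filter == DEMO_SUB_PORTS || s->matching == 0)
        return DEMO_WITHDRAWN;  /* every port, or only the last one */
    return DEMO_NONE;
}
```

With two matching ports, the ports-style filter reports two publish and two
withdraw events, while the service-style filter reports one of each.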

The exchange of messages between application and topology service is entirely 
asynchronous.  The application may issue new subscription requests at any time, 
while the topology service may send event messages about these subscriptions 
to the application at any time.

The connection between the application and the topology service continues until 
the application terminates it, or until the topology service encounters an 
error that requires it to terminate the connection.  When the connection ends,
any active subscription requests are automatically cancelled by TIPC.

THINGS TO REMEMBER:
- It is not possible to limit the range of a subscription request to a specific
node, cluster, or zone by specifying a lookup domain; the topology service
always monitors the requestor's entire zone for matching port names.
- Every node in a TIPC network automatically publishes a port name of the form
{0,<Z.C.N>}, where <Z.C.N> is the node's network address.  Applications can
determine what nodes currently comprise the network, and track the subsequent
arrival and departure of nodes from the network, by creating a subscription
that tracks the publication and withdrawal of names for type 0.  The dummy
subscription example in section 5.1 below illustrates this technique.

CURRENT LIMITATIONS: 
- The only way to cancel a subscription (other than letting it time out) is to
close the connection to the topology service, thereby cancelling all 
subscriptions issued on that connection.

1.5 Message Delivery
--------------------
On the surface, message delivery in TIPC is a simple series of steps: a message
is created by a sender, TIPC carries it to the specified destination, and the 
receiver consumes the message.  And, in practice, this is exactly what happens 
most of the time.  However, there are a number of places along the way where 
things can get complicated, and in these cases it is important for application 
designers to understand exactly what TIPC will do.

The sections that follow describe the various steps performed by TIPC during the
exchange of a unicast message; the final section outlines the differences
that occur when dealing with a multicast message.

1.5.1 Message Creation
----------------------
The first step in sending a message is to create it.  The most common reason
TIPC is unable to create a message is that the sender passes in one or more
invalid arguments to the send routine.  The term "invalid" refers both to values
that are never acceptable under any circumstances (such as specifying a message
length greater than 66000 bytes) and to values that are not acceptable for the
current sender (such as requesting a send operation on a socket that has been 
turned into a listening socket).

Other reasons that TIPC may be unable to create a message:

- There are no more message buffers available that TIPC can use.

- The link TIPC selected to carry the message to its destination was congested
  and the sender did not want to block until the congestion cleared (see 1.5.3
  below).
  
- The peer socket on a connection was congested (i.e. had too many unconsumed
  messages in its receive queue) and the sender did not want to block until the
  congestion cleared (see 1.5.6 below).
  
In all of these cases the send operation will return a failure code indicating 
that the intended message was not sent.  If the message is created successfully
the send operation returns a success indication.

THINGS TO REMEMBER:
- If the sender specifies a destination address that does not currently exist
within the TIPC network, TIPC does *NOT* treat this as an invalid send request
(i.e. it's not the sender's fault that the destination doesn't exist).  Instead
TIPC creates the message and then "rejects" it because it is undeliverable
(see 1.5.6 below).  The return value for the send operation will indicate 
success since the message was successfully created and processed by TIPC.
 
1.5.2 Source Routing
--------------------
Once a message has been created, TIPC then determines what node the message
should be sent to.  If the specified destination address is a port ID, the
destination node is pre-determined; if the address is a port name, TIPC performs
a name table lookup to select a port (see 1.4.1 above), and then uses the node
associated with that port.  The message is then passed to a link for off-node
transmission (see 1.5.3 below) or is handed off to the destination port directly
if it is on the same node as the sender (see 1.5.5 below).

Problems that can arise during the source routing phase of message delivery:

- No matching port can be located during a name table lookup when sending by
  port name.
- No working link to the specified destination node can be found when sending
  by port ID.

In all cases of source routing failure, the message is rejected (see section
1.5.6 below).

THINGS TO REMEMBER:
- A message that specifies a port name and is sent off-node may not actually 
end up going to the port selected during the name table lookup, since the 
destination node will perform a second name table lookup when it receives the 
message (see 1.5.4 below).

1.5.3 Link Transmission
-----------------------
Once a message is given to a link for transmission to another node, the link
will normally deliver the message to that node even if problems arise.  For
example, the link will automatically detect lost messages and retransmit them,
or will re-route messages over an alternate link if it loses contact with
its peer link endpoint on the other node.  Such error recovery is possible
because TIPC keeps a copy of each outgoing message in a transmit queue until
it is notified that the message has been successfully received by the peer link
endpoint.

If a link endpoint's transmit queue grows too large because the peer link 
endpoint falls behind in acknowledging the successful arrival of messages 
(typically around 50 messages), TIPC declares "link congestion" on that link.
When a link becomes congested, the link only accepts a new message for 
transmission if it is important enough (i.e. the more important the message, 
the longer the queue is allowed to be).

Whenever a message cannot be sent because of link congestion, TIPC checks the
"source droppable" setting of the sending port.  If the setting is enabled
(indicating that the message is being sent in an unreliable manner) TIPC
discards the message, but provides no indication of this to the sender.  If the
source droppable setting is disabled (which is the default case), TIPC will
normally block the sending application until the congestion clears, and then
resume the send operation; however, if the application has requested a non-
blocking send, the application will not block when link congestion occurs and
the send operation returns a failure indication.
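
The decision described above can be summarized as a small model.  This is
only a sketch of the rules stated in this section, not TIPC's internal code;
the enum and the function name send_outcome_model() are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical names: these model the rules in the text, not TIPC itself. */
enum send_outcome {
    SEND_QUEUED,            /* the link accepted the message              */
    SEND_DROPPED_SILENTLY,  /* discarded; the sender gets no indication   */
    SEND_BLOCKS,            /* sender waits until the congestion clears   */
    SEND_FAILS              /* non-blocking send returns a failure        */
};

enum send_outcome send_outcome_model(bool link_accepts,
                                     bool source_droppable,
                                     bool non_blocking)
{
    if (link_accepts)
        return SEND_QUEUED;
    if (source_droppable)
        return SEND_DROPPED_SILENTLY;
    return non_blocking ? SEND_FAILS : SEND_BLOCKS;
}
```

Note that the "source droppable" check comes first: an unreliable sender never
blocks and never sees a failure caused by link congestion.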

In the event that a link to a destination node fails and there are no other
links available that can be used to re-route traffic, any messages in the link's
transmit queue are simply discarded.  The messages are *not* rejected (and
potentially returned to their originating ports) because TIPC does not know
whether or not they were successfully delivered.

1.5.4 Destination Routing
-------------------------
Once a message arrives at the specified destination node over a link, TIPC then
determines what port it should be sent to on that node.

If the specified destination address is a port ID, the destination port is 
pre-determined; if no such port exists the message is considered undeliverable 
and rejected (see 1.5.6 below).

If the destination address is a port name, TIPC performs a name table lookup 
and selects a port (see 1.4.1 above).  If no such port exists TIPC repeats
the source routing operation and tries to send the message to another node;
if no such node can be found, or if the message has been previously re-routed
too many times, the message is considered undeliverable and rejected (see 1.5.6
below).

1.5.5 Message Consumption
-------------------------
When a message (finally!) reaches the destination port it is either consumed
immediately (if the controlling application is using the native API) or added
to a receive queue (if the controlling application is using the socket API).
In the latter case, the message typically remains in the socket's receive queue
until it is received by the application that owns the socket.  Queued messages
are consumed by the application in a FIFO manner, and once the contents of a
message have been passed to the application the message is discarded.
 
If an application terminates access to the socket (using either the close() or 
shutdown() APIs) before all messages in the receive queue are consumed, all
unconsumed messages are considered undeliverable and are rejected (see 1.5.6
below).

It is very important that TIPC applications be engineered to consume their 
incoming messages at a rate that prevents them from accumulating in large 
numbers in any socket receive queue.  Failure to do so can result in TIPC
declaring either "port congestion" or "socket congestion".

Port congestion can occur once more than 512 messages have accumulated in the 
receive queue of a connection-oriented socket.  Once this is detected, TIPC
may block the peer socket from sending messages until the congestion clears
(see 1.5.1 above) or, if the sender is sending in an unreliable manner, cause 
such messages to be discarded (see 1.5.3 above).

Socket congestion can occur once TIPC detects that too many unreceived messages
exist on a node or on an individual socket.  More precisely, a node can have
up to 5,000 messages sitting in socket receive queues before congestion handling
kicks in; once this happens, low importance messages will be rejected (see
1.5.6 below) but higher importance messages will continue to be accepted.
Medium importance messages get screened out once the number of pending messages
hits 10,000, and high importance messages at 500,000; critical importance
messages are always accepted.  Similar congestion handling occurs on a
per-socket basis, but the thresholds are one half the global threshold values
(i.e. at 2,500, 5,000, and 250,000).
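
The thresholds above can be expressed as a short model.  This is a sketch
only: the names are hypothetical, and the exact boundary (a message is
rejected once the count reaches the threshold) is an assumption, since the
text does not say whether the comparison is inclusive.

```c
#include <stdbool.h>

/* Hypothetical names used only for this illustration. */
enum imp_model { IMP_LOW, IMP_MEDIUM, IMP_HIGH, IMP_CRITICAL };

#define NODE_LIMIT_BASE   5000UL                 /* per-node base threshold   */
#define SOCKET_LIMIT_BASE (NODE_LIMIT_BASE / 2)  /* per-socket base threshold */

/* True if "queued" has reached the threshold for this importance level. */
static bool over_limit(enum imp_model imp, unsigned long queued,
                       unsigned long base)
{
    switch (imp) {
    case IMP_LOW:    return queued >= base;        /* 5000 / 2500     */
    case IMP_MEDIUM: return queued >= base * 2;    /* 10000 / 5000    */
    case IMP_HIGH:   return queued >= base * 100;  /* 500000 / 250000 */
    default:         return false;                 /* critical: never */
    }
}

/* True if an arriving message would be rejected due to socket congestion. */
bool message_rejected_model(enum imp_model imp,
                            unsigned long node_queued,
                            unsigned long socket_queued)
{
    return over_limit(imp, node_queued, NODE_LIMIT_BASE) ||
           over_limit(imp, socket_queued, SOCKET_LIMIT_BASE);
}
```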

Since socket congestion has a more significant impact on a connection-oriented
socket than port congestion does (i.e. it terminates the connection),
the smaller port congestion threshold has been chosen so that it will normally
kick in first and prevent the socket receive queue from growing larger.
However, the existence of the per-node socket congestion threshold means that 
it is possible for socket congestion to occur before port congestion occurs.

NOTE: These message congestion thresholds may become more configurable in
future releases of TIPC, since a one-size-fits-all solution is unlikely to work
well on a wide variety of hardware configurations (e.g. a resource-constrained
DSP will probably need lower thresholds than a resource-rich Linux box).

1.5.6 Message Rejection
-----------------------
When a message is "rejected" because it cannot be delivered, TIPC checks the
message's "destination droppable" setting to see what the sender wanted done
with the message.

If the destination droppable setting is enabled, TIPC simply discards the 
message.  This setting is the default for messages sent using a connectionless
socket, and was chosen to simplify the job of porting applications written for
UDP to use TIPC.

If the destination droppable setting is disabled, TIPC attempts to send the
first 1024 bytes of the message back to the message originator; such a message
is called a "returned message".  An error code is also incorporated into each
returned message to allow the sender to determine why the message was returned.
In the case of a connection-oriented message, the return of an undeliverable
message also causes the connection to be terminated at both ends.  By default,
the destination droppable setting is disabled for messages sent using
connection-oriented sockets; this decision was made to simplify the job of
porting applications written for TCP to use TIPC.
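
As a rough illustration of the two cases above, the following sketch computes
how much of a rejected message would travel back to its originator.  The
helper name returned_payload_model() is hypothetical; the 1024 byte limit is
the one stated in this section.

```c
#include <stdbool.h>
#include <stddef.h>

#define RETURNED_DATA_MAX 1024  /* at most the first 1024 bytes are returned */

/*
 * Returns the number of payload bytes carried by the returned message,
 * or -1 if the rejected message is simply discarded.
 */
long returned_payload_model(size_t msg_len, bool dest_droppable)
{
    if (dest_droppable)
        return -1;
    return (long)(msg_len < RETURNED_DATA_MAX ? msg_len : RETURNED_DATA_MAX);
}
```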

NOTE: In some instances, TIPC may be unable to return an undeliverable message
to the message sender, even though the message's destination droppable setting
is disabled.  For example, if a message cannot be delivered to its destination
because of congestion within the TIPC network, the same congestion may also
prevent TIPC from returning the message to the originator.

TIPC's socket API has been designed so that applications that don't want to
concern themselves with returned messages can easily ignore them.  However,
the ability for a sending application to examine returned messages can be
helpful in debugging problems during the design and testing of a new TIPC
application.

THINGS TO REMEMBER:
- The returned message capability of TIPC must NOT be used by a sending 
application to determine what messages were successfully consumed by the
receiving application!  While the return of a message indicates that the
receiver did not consume the message, the non-return of a message does not
indicate that it was successfully consumed.  (For example, if a destination
node suffers a power failure, TIPC will be unable to return any messages that
are sitting unprocessed in a socket receive queue.)  The only way for the
sending application to know that a message was consumed is for it to receive
an explicit acknowledgement message generated by the receiving application.

1.5.7 Multicast Message Delivery
--------------------------------
Multicast message creation is done the same way as for unicast messages, but
since multicasting is always done in a connectionless manner it is not possible
for peer port congestion to occur.

Multicast source routing always involves a name table lookup of a port name
sequence.  If no port within the cluster overlaps the specified name sequence
the message is simply discarded.  Otherwise, TIPC sends a copy of the message 
to each overlapping port on the sending node and also determines if any off-node
ports have an overlap; if there is at least one such port then the message is
passed to a special multicast link.
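
The "overlap" test used during the lookup can be pictured as a closed-range
intersection check.  The sketch below assumes that two name sequences overlap
when they have the same type and their instance ranges share at least one
value; the struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a TIPC name sequence {type, lower, upper}. */
struct name_seq_model {
    uint32_t type;
    uint32_t lower;
    uint32_t upper;
};

/* True if both sequences have the same type and intersecting ranges. */
bool name_seq_overlaps(const struct name_seq_model *a,
                       const struct name_seq_model *b)
{
    return a->type == b->type &&
           a->lower <= b->upper &&
           b->lower <= a->upper;
}
```

For example, {1000, 10, 20} overlaps {1000, 20, 30} because both contain
instance 20, but it does not overlap {1000, 21, 30} or {2000, 10, 20}.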

The multicast link operates much like a regular unicast link, except that it
sends its messages to *all* nodes in the cluster rather than just one, and has
a smaller congestion threshold (around 20 messages).   

Whenever the multicast link delivers the message to a node, TIPC repeats the 
name table lookup and sends a copy of the message to all overlapping ports it
finds on that node; if there are no such ports the message is discarded.  Once
a multicast message arrives at a destination port, it is treated just like a
unicast message and is subject to the same socket congestion and message
rejection handling.  

THINGS TO REMEMBER:
- TIPC currently does not permit an application to send a multicast message
with the "destination droppable" setting disabled; consequently, TIPC will
never try to return an undeliverable multicast message to its sender.

1.6 Routines
------------
The following utility routines are available to programmers:

  tipc_addr( ) - combine zone, cluster, and node numbers into a TIPC address
  tipc_cluster( ) - take a TIPC network address and return the cluster number
  tipc_node( ) - take a TIPC address and return the node number
  tipc_zone( ) - take a TIPC address and return the zone number

Further information about each of these routines is provided in the following 
sections.

1.6.1 tipc_addr
---------------
u32 tipc_addr(unsigned int zone, unsigned int cluster, unsigned int node)

This routine takes individual zone, cluster, and node numbers and combines 
them into a 32-bit TIPC network address. 

1.6.2 tipc_cluster
------------------
unsigned int tipc_cluster(u32 addr)

This routine takes a 32-bit TIPC network address and returns the cluster 
number contained in the address. 

1.6.3 tipc_node
---------------
unsigned int tipc_node(u32 addr)

This routine takes a 32-bit TIPC network address and returns the node number 
contained in the address. 

1.6.4 tipc_zone
---------------
unsigned int tipc_zone(u32 addr)

This routine takes a 32-bit TIPC network address and returns the zone number 
contained in the address. 
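
To show what these four routines compute, here is a standalone sketch that
mirrors them.  It assumes the usual <Z.C.N> layout of 8 bits for the zone,
12 bits for the cluster, and 12 bits for the node; the demo_ names are
hypothetical, and real applications should call the routines supplied by
TIPC instead.

```c
#include <stdint.h>

/* Assumed layout: zone in bits 31..24, cluster in 23..12, node in 11..0. */
#define DEMO_CLUSTER_SHIFT 12
#define DEMO_ZONE_SHIFT    24
#define DEMO_FIELD_MASK    0xfffu   /* 12-bit cluster and node fields */

uint32_t demo_tipc_addr(unsigned int zone, unsigned int cluster,
                        unsigned int node)
{
    return ((uint32_t)zone << DEMO_ZONE_SHIFT) |
           ((uint32_t)cluster << DEMO_CLUSTER_SHIFT) |
           (uint32_t)node;
}

unsigned int demo_tipc_zone(uint32_t addr)
{
    return addr >> DEMO_ZONE_SHIFT;
}

unsigned int demo_tipc_cluster(uint32_t addr)
{
    return (addr >> DEMO_CLUSTER_SHIFT) & DEMO_FIELD_MASK;
}

unsigned int demo_tipc_node(uint32_t addr)
{
    return addr & DEMO_FIELD_MASK;
}
```

Under this layout, the address <1.1.10> is packed as 0x0100100a.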


2. Socket API
-------------
The TIPC socket API allows programmers to access the capabilities of TIPC 
using the well-known socket paradigm.

IMPORTANT:
TIPC does not support all socket API routines available with other socket-
based protocols, nor does it support all possible capabilities for the 
routines that are provided.  Likewise, certain new capabilities provided by 
TIPC have been made available by adapting the socket API to accommodate them.

Programmers who have used the socket API with other protocols are strongly 
advised to read this section carefully to ensure they understand where the 
TIPC socket API differs from their previous experiences.

GENERAL LIMITATIONS:
TIPC's socket API is currently tailored for use by single-threaded applications;
consequently, if multiple threads of control try to perform I/O operations on a
given socket it is possible that some threads may become blocked unexpectedly.
Most significantly, if there is a thread blocked trying to read from a socket,
any other thread that writes to the socket will block until the read operation
is completed.  Until this situation is rectified, multi-threaded applications
should use select() or poll() to test whether a socket is ready for reading 
before beginning a blocking read operation.
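
A minimal sketch of that workaround follows.  The helper name
recv_when_ready() is hypothetical, and the sketch assumes that only one
thread reads from the socket, so that a message reported by poll() is still
there when recv() is called.

```c
#include <errno.h>
#include <poll.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Wait up to timeout_ms for the socket to become readable, then receive.
 * Returns the result of recv(), or -1 with errno set to EAGAIN if nothing
 * became readable in time.
 */
ssize_t recv_when_ready(int sockfd, void *buf, size_t len, int timeout_ms)
{
    struct pollfd pfd = { .fd = sockfd, .events = POLLIN };
    int rc = poll(&pfd, 1, timeout_ms);

    if (rc < 0)
        return -1;          /* poll() failed; errno is already set */
    if (rc == 0) {
        errno = EAGAIN;     /* timed out with nothing to read */
        return -1;
    }
    return recv(sockfd, buf, len, 0);   /* should not block now */
}
```

Because the thread never sits inside a blocking read, other threads that
write to the same socket are not held up behind it.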

2.1 Routines
------------
The following socket API routines are available to programmers:

  accept( )      - accept a new connection on a socket
  bind( )        - bind or unbind a TIPC name to the socket
  close( )       - close the socket
  connect( )     - connect the socket
  getpeername( ) - get the port ID of the peer socket
  getsockname( ) - get the port ID of the socket
  getsockopt( )  - get the value of an option for the socket
  listen( )      - listen for socket connections
  poll( )        - input/output multiplexing
  recv( )        - receive a message from the socket
  recvfrom( )    - receive a message from the socket
  recvmsg( )     - receive a message from the socket
  send( )        - send a message on the socket
  sendmsg( )     - send a message on the socket
  sendto( )      - send a message on the socket
  setsockopt( )  - set the value of an option for the socket
  shutdown( )    - shut down socket send and receive operations
  socket( )      - create an endpoint for communication

Further information about each of these routines can be found in the 
following sections.

2.1.1 accept
------------
int accept(int sockfd, struct sockaddr *cliaddr, socklen_t *addrlen)

Accepts a new connection on a socket.

Comments:
- If non-NULL, "cliaddr" is set to the port ID of the peer socket.  This info 
can be used to perform basic validation of the connection requestor's 
identity (e.g. disallow connections originating from certain network nodes).

2.1.2 bind
----------
int bind(int sockfd, const struct sockaddr *myaddr, socklen_t addrlen)

Binds or unbinds a TIPC name (or name sequence) to the socket.  A bind 
operation is requested by setting "myaddr->scope" to TIPC_NODE_SCOPE, 
TIPC_CLUSTER_SCOPE, or TIPC_ZONE_SCOPE, as appropriate.  An unbind operation 
is requested by setting "myaddr->scope" to the arithmetic inverse of the scope
used when the name was bound (e.g. -TIPC_NODE_SCOPE).  Specifying zero for
"addrlen" unbinds all names and name sequences currently bound to the socket.

Comments:
- It is legal to bind more than one TIPC name or name sequence to a socket.
- If a socket is currently connected to a peer it is considered to be 
unavailable to serve other clients and cannot use bind() to bind an 
additional TIPC name to itself.
- bind() cannot be used to change the port ID of a socket.
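
The following sketch shows one way to set up the address for a bind or
unbind request.  It assumes the sockaddr_tipc layout and constants from the
Linux <linux/tipc.h> header; the helper names fill_bind_addr(),
bind_name_seq(), and unbind_name_seq() are hypothetical.

```c
#include <string.h>
#include <sys/socket.h>
#include <linux/tipc.h>

/* Describe the name sequence {type, lower, upper} with the given scope. */
void fill_bind_addr(struct sockaddr_tipc *addr, unsigned int type,
                    unsigned int lower, unsigned int upper, int scope)
{
    memset(addr, 0, sizeof(*addr));
    addr->family = AF_TIPC;
    addr->addrtype = TIPC_ADDR_NAMESEQ;
    addr->scope = scope;            /* a negative scope requests an unbind */
    addr->addr.nameseq.type = type;
    addr->addr.nameseq.lower = lower;
    addr->addr.nameseq.upper = upper;
}

/* Bind the name sequence to the socket; scope is e.g. TIPC_CLUSTER_SCOPE. */
int bind_name_seq(int sockfd, unsigned int type, unsigned int lower,
                  unsigned int upper, int scope)
{
    struct sockaddr_tipc addr;

    fill_bind_addr(&addr, type, lower, upper, scope);
    return bind(sockfd, (struct sockaddr *)&addr, sizeof(addr));
}

/* Unbind it again by passing the arithmetic inverse of the same scope. */
int unbind_name_seq(int sockfd, unsigned int type, unsigned int lower,
                    unsigned int upper, int scope)
{
    struct sockaddr_tipc addr;

    fill_bind_addr(&addr, type, lower, upper, -scope);
    return bind(sockfd, (struct sockaddr *)&addr, sizeof(addr));
}
```

A single TIPC name can be bound the same way by using equal "lower" and
"upper" values.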

2.1.3 close
-----------
int close(int sockfd)

Closes the socket.

Comments:
- Any unprocessed messages remaining in the socket's receive queue are 
rejected (i.e. discarded or returned), as appropriate.

2.1.4 connect
-------------
int connect(int sockfd, const struct sockaddr *servaddr, socklen_t addrlen)

Attempts to make a connection on the socket using TIPC's explicit handshake 
mechanism.

Comments:
- If "servaddr" is set to a TIPC name (but not a TIPC name sequence or a TIPC 
port ID), the "addr.name.domain" field can be used to affect the name lookup 
process.  The "scope" field of "servaddr" is ignored by connect().
- If a socket has a name bound to it, it is considered to be a server and 
cannot use connect() to initiate a connection as a client.
- TIPC does not support the use of connect() with connectionless sockets. 
(POSIX non-conformity)
- TIPC does not support the non-blocking form of connect(); use sendto() to
establish connections using implicit handshaking to achieve this effect.
(POSIX non-conformity)

2.1.5 getpeername
-----------------
int getpeername(int sockfd, struct sockaddr *peeraddr, socklen_t *addrlen)

Gets the port ID of the peer socket.

Comments:
- The use of "name" in getpeername() can be confusing, as the routine does 
not actually return the TIPC names or name sequences that have been bound to 
the peer socket.

2.1.6 getsockname
-----------------
int getsockname(int sockfd, struct sockaddr *localaddr, socklen_t *addrlen)

Gets the port ID of the socket.

Comments:
- The use of "name" in getsockname() can be confusing, as the routine does 
not actually return the TIPC names or name sequences that have been bound to 
the socket.

2.1.7 getsockopt
----------------
int getsockopt(int sockfd, int level, int optname, 
               void *optval, socklen_t *optlen)

Gets the current value of a socket option.  For a description of the options 
supported when "level" is SOL_TIPC, see the description for setsockopt() 
below.

Comments:
- TIPC does not currently support socket options for level SOL_SOCKET, such 
as SO_SNDBUF.
- TIPC does not currently support socket options for level IPPROTO_TCP, such 
as TCP_MAXSEG.  Attempting to get the value of these options on a SOCK_STREAM 
socket returns the value 0.

2.1.8 listen
------------
int listen(int sockfd, int backlog)

Enables a socket to listen for connection requests.

Comments:
- The "backlog" parameter is currently ignored.

2.1.9 poll
----------
int poll(struct pollfd *fdarray, unsigned long nfds, int timeout)

Indicates the readiness of the specified TIPC sockets for I/O operations, 
using the standard poll() mechanism.

TIPC currently sets the returned event flags as follows:

a) POLLRDNORM and POLLIN are set when the socket's receive queue is non-
empty. (They are also set for a connection-oriented, non-listening socket 
whose connection has been terminated or has not yet been initiated, since a 
receive operation on such a socket will fail immediately.)                                  
b) POLLOUT is set except when a socket's connection has been terminated.
c) POLLHUP is set when a socket's connection has been terminated.                                    

Comments:
- It is important to realize that the poll() bits indicate that the 
associated input/output operation will not block, NOT that the operation will 
be successful!
- The POLLOUT event flag cannot be used in isolation to guarantee that a send 
operation performed on the socket will not block.  Since outgoing messages 
are queued on a per-link basis rather than a per-socket basis, sending to a 
destination that is routed through link A may be blocked while sending to a 
destination that is routed through link B may not be blocked.  To ensure that 
an application does not block during a send operation, it should use the 
MSG_DONTWAIT flag or set the socket for nonblocking I/O using fcntl().
- A socket that is connecting asynchronously is considered writeable, since 
attempting a second send operation during an implied connection setup will 
immediately fail. (POSIX non-conformity)
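
A send that must never block can be wrapped as shown below.  This is a
sketch under the assumption that a congested send made with MSG_DONTWAIT
fails with EAGAIN or EWOULDBLOCK, as is usual for sockets; the helper name
try_send() is hypothetical.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Attempt to send without blocking.
 * Returns 1 if the message was sent, 0 if the send would have blocked,
 * and -1 for any other error.
 */
int try_send(int sockfd, const void *buf, size_t len)
{
    if (send(sockfd, buf, len, MSG_DONTWAIT) >= 0)
        return 1;
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return 0;
    return -1;
}
```

An application that gets 0 back can retry later rather than relying on
POLLOUT alone.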

2.1.10 recv
-----------
ssize_t recv(int sockfd, void *buff, size_t nbytes, int flags)

Attempts to receive a message from the socket.

Comments:
- When used with a connectionless socket, a return value of 0 indicates the 
return of an undelivered data message that was originally sent by this 
socket.
- When used with a connection-oriented socket, a return value of 0 or -1 
indicates connection termination.  A value of 0 indicates that the connection 
was terminated by the peer using shutdown(); connection termination by any 
other means causes a return value of -1.
- Applications can determine the exact cause of connection termination and/or 
message non-delivery by using recvmsg() instead of recv().
- TIPC supports the MSG_PEEK flag when receiving, as well as the MSG_WAITALL 
flag when receiving on a SOCK_STREAM socket; all other flags are ignored.

2.1.11 recvfrom
---------------
ssize_t recvfrom(int sockfd, void *buff, size_t nbytes, int flags,
                 struct sockaddr *from, socklen_t *addrlen)

Attempts to receive a message from the socket.  If successful, the port ID of 
the message sender is returned in "from".

Comments:
- See the comments section for recv().

2.1.12 recvmsg
--------------
ssize_t recvmsg(int sockfd, struct msghdr *msg, int flags)

Attempts to receive a message from the socket.  If successful, the port ID of 
the message sender is captured in the "msg_name" field of "msg" (if non-NULL) 
and ancillary data relating to the message is captured in the "msg_control" 
field of "msg" (if non-NULL).

The following ancillary data objects may be captured:

1) TIPC_ERRINFO - The TIPC error code associated with a returned data message 
or a connection termination message, and the length of the returned data.  (8 
bytes: error code + data length)
2) TIPC_RETDATA - The contents of a returned data message, up to a maximum of 
1024 bytes.
3) TIPC_DESTNAME - The TIPC name or name sequence that was specified by the 
sender of the message.  (12 bytes: type + lower instance + upper instance; 
the latter two values are the same for a TIPC name, but may differ for a name 
sequence)

Each of these objects is only created where relevant.  For example, receipt 
of a normal data message never creates the TIPC_ERRINFO and TIPC_RETDATA 
objects, and only creates the TIPC_DESTNAME object if the message was sent 
using a TIPC name or name sequence as the destination rather than a TIPC port 
ID.  Those objects that are created will always appear in the relative order 
shown above.

If ancillary data objects capture is requested (i.e. "msg->msg_control" is 
non-NULL) but insufficient space is provided, the MSG_CTRUNC flag is set to 
indicate that one or more available objects were not captured.

Comments:
- When used with a connectionless socket, a return value of 0 indicates the 
arrival of a returned data message that was originally sent by this socket.
- When used with a connection-oriented socket, a return value of 0 or -1 
indicates connection termination.  The exact return value upon connection 
termination is influenced by the "msg_control" field of "msg".  If 
"msg_control" is NULL, a return value of 0 indicates that the connection was 
terminated by the peer using shutdown(); connection termination by any other 
means causes a return value of -1.  If "msg_control" is non-NULL, a return 
value of 0 is always used; the application must examine the TIPC_ERRINFO 
object to determine if the connection was explicitly terminated by the peer.  
(POSIX non-conformity)
- When used with connection-oriented sockets, TIPC_DESTNAME is captured for 
each data message received by the socket if the connection was established 
using a TIPC name or name sequence as the destination address.  Note: There is
currently no way for the destination socket to capture TIPC_DESTNAME following 
accept() until the originator sends a data message.
- TIPC supports the MSG_PEEK flag when receiving, as well as the MSG_WAITALL 
flag when receiving on a SOCK_STREAM socket; all other flags are ignored.
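
The ancillary data objects can be extracted with the standard CMSG macros
after recvmsg() returns.  The sketch below assumes the constants from the
Linux <linux/tipc.h> header and that the values are stored as 32-bit integers
in host byte order; the struct tipc_anc_info and the function
parse_tipc_ancillary() are hypothetical names.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/tipc.h>

#ifndef SOL_TIPC
#define SOL_TIPC 271    /* fallback in case the C library does not define it */
#endif

/* Hypothetical container for the objects this sketch decodes. */
struct tipc_anc_info {
    int      has_errinfo;
    uint32_t err_code;      /* TIPC error code                 */
    uint32_t ret_len;       /* length of the returned data     */
    int      has_destname;
    uint32_t name_type, name_lower, name_upper;
};

/* Walk the control data filled in by recvmsg() and decode TIPC objects. */
void parse_tipc_ancillary(struct msghdr *msg, struct tipc_anc_info *out)
{
    struct cmsghdr *cm;
    uint32_t v[3];

    memset(out, 0, sizeof(*out));
    for (cm = CMSG_FIRSTHDR(msg); cm != NULL; cm = CMSG_NXTHDR(msg, cm)) {
        if (cm->cmsg_level != SOL_TIPC)
            continue;
        if (cm->cmsg_type == TIPC_ERRINFO && cm->cmsg_len >= CMSG_LEN(8)) {
            memcpy(v, CMSG_DATA(cm), 8);
            out->has_errinfo = 1;
            out->err_code = v[0];
            out->ret_len = v[1];
        } else if (cm->cmsg_type == TIPC_DESTNAME &&
                   cm->cmsg_len >= CMSG_LEN(12)) {
            memcpy(v, CMSG_DATA(cm), 12);
            out->has_destname = 1;
            out->name_type = v[0];
            out->name_lower = v[1];
            out->name_upper = v[2];
        }
    }
}
```

The caller should also check "msg_flags" for MSG_CTRUNC, since a truncated
control buffer means that some objects were not captured.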

2.1.13 send
-----------
ssize_t send(int sockfd, const void *buff, size_t nbytes, int flags)

Attempts to send a message from the socket to its peer socket.

Comments:
- send() should not be used until a connection has been fully established 
using either explicit or implicit handshaking.
- TIPC supports the MSG_DONTWAIT flag when sending; all other flags are 
ignored.

2.1.14 sendmsg
--------------
ssize_t sendmsg(int sockfd, struct msghdr *msg, int flags)

Attempts to send a message from the socket to the specified destination.  If 
the destination is denoted by a TIPC name or a port ID the message is unicast 
to a single port; if the destination is denoted by a TIPC name sequence the 
message is multicast to all ports having a TIPC name or name sequence that 
overlaps the destination name sequence.

Comments:
- See the comments section for sendto().
- TIPC does not currently support the use of ancillary data with sendmsg().

2.1.15 sendto
-------------
ssize_t sendto(int sockfd, const void *buff, size_t nbytes, int flags,
               const struct sockaddr *to, socklen_t addrlen)

Attempts to send a message from the socket to the specified destination.  If 
the destination is denoted by a TIPC name or a port ID the message is unicast 
to a single port; if the destination is denoted by a TIPC name sequence the 
message is multicast to all ports having a TIPC name or name sequence that 
overlaps the destination name sequence.

Comments:
- If the destination address is a TIPC name the "addr.name.domain" field 
indicates the search domain used during the name lookup process.  (In contrast, 
if the destination address is a TIPC name sequence the default "closest first" 
algorithm is always used; if it is a TIPC port ID no name lookup occurs.)
The "scope" field of the destination address is always ignored when sending.
- TIPC supports the MSG_DONTWAIT flag when sending; all other flags are 
ignored.
- A connection-oriented socket that is unconnected can initiate connection 
establishment using implicit handshaking by simply sending a message to a 
specified destination, rather than using connect().  However, the connection 
is not fully established until the socket successfully receives a message 
sent by the destination using recv(), recvfrom(), or recvmsg().
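
As an illustration, the sketch below sends a message to a TIPC name without
blocking.  It assumes the sockaddr_tipc layout from the Linux <linux/tipc.h>
header, and it assumes that a domain value of 0 places no restriction on the
name lookup; fill_name_addr() and send_to_name() are hypothetical helpers.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <linux/tipc.h>

/* Describe the TIPC name {type, instance} and its lookup domain. */
void fill_name_addr(struct sockaddr_tipc *to, unsigned int type,
                    unsigned int instance, unsigned int domain)
{
    memset(to, 0, sizeof(*to));     /* "scope" is ignored when sending */
    to->family = AF_TIPC;
    to->addrtype = TIPC_ADDR_NAME;
    to->addr.name.name.type = type;
    to->addr.name.name.instance = instance;
    to->addr.name.domain = domain;
}

/* Unicast one message to the named port; fails rather than blocks. */
ssize_t send_to_name(int sockfd, const void *buf, size_t len,
                     unsigned int type, unsigned int instance)
{
    struct sockaddr_tipc to;

    fill_name_addr(&to, type, instance, 0);
    return sendto(sockfd, buf, len, MSG_DONTWAIT,
                  (struct sockaddr *)&to, sizeof(to));
}
```

On an unconnected connection-oriented socket, the same call starts the
implicit handshake described above.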

2.1.16 setsockopt
-----------------
int setsockopt(int sockfd, int level, int optname, 
               const void *optval, socklen_t optlen)

Sets a socket option to the specified value.  Currently, the following values 
of "optname" are supported when "level" is SOL_TIPC:

1) TIPC_IMPORTANCE
This option governs how likely a message sent by the socket is to be affected 
by congestion.  A message with higher importance is less likely to be delayed 
or dropped due to link congestion, and also less likely to be rejected due to 
receiver congestion.  The following values are defined: TIPC_LOW_IMPORTANCE, 
TIPC_MEDIUM_IMPORTANCE, TIPC_HIGH_IMPORTANCE, and TIPC_CRITICAL_IMPORTANCE.

By default, TIPC_LOW_IMPORTANCE is used for all TIPC socket types.

2) TIPC_SRC_DROPPABLE
This option governs the handling of messages sent by the socket if link 
congestion occurs.  If enabled, the message is discarded; otherwise the 
system queues the message for later transmission.

By default, this option is disabled for SOCK_SEQPACKET, SOCK_STREAM, and 
SOCK_RDM socket types (resulting in "reliable" data transfer), and enabled 
for SOCK_DGRAM (resulting in "unreliable" data transfer).

3) TIPC_DEST_DROPPABLE
This option governs the handling of messages sent by the socket if the 
message cannot be delivered to its destination, either because the receiver 
is congested or because the specified receiver does not exist.  If enabled, 
the message is discarded; otherwise the message is returned to the sender.

By default, this option is disabled for SOCK_SEQPACKET and SOCK_STREAM socket 
types, and enabled for SOCK_RDM and SOCK_DGRAM.  This arrangement ensures 
proper teardown of failed connections when connection-oriented data transfer 
is used, without increasing the complexity of connectionless data transfer.

4) TIPC_CONN_TIMEOUT
This option specifies the number of milliseconds connect() will wait before 
aborting a connection attempt because the destination has not responded.  By 
default, 8000 (i.e. 8 seconds) is used.

This option has no effect when establishing connections using sendto().

Comments:
- TIPC does not currently support socket options for level SOL_SOCKET, such 
as SO_SNDBUF.
- TIPC does not currently support socket options for level IPPROTO_TCP, such 
as TCP_MAXSEG.  Setting these options on a SOCK_STREAM socket has no effect.

2.1.17 shutdown
---------------
int shutdown(int sockfd, int howto)

Shuts down socket send and receive operations on a connection-oriented 
socket.  The socket's peer is notified that the connection was deliberately 
terminated by the application (by means of the TIPC_CONN_SHUTDOWN error 
code), rather than as the result of an error.

Comments:
- Applications should normally call shutdown() to terminate a connection 
before calling close().
- TIPC does not support partial shutdown of a connection; attempting to shut 
down either send or receive operations always shuts down both.
- A socket that has been shut down using shutdown() cannot be re-used for a
new connection; this prevents any "stale" incoming messages from an earlier
connection from interfering with the new connection.
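
The recommended teardown order can be wrapped in a small helper.  This is a
sketch only, and end_connection() is a hypothetical name.

```c
#include <sys/socket.h>
#include <unistd.h>

/*
 * Terminate a connection deliberately, then release the socket.
 * Returns 0 if both steps succeed and -1 otherwise; the socket is
 * closed even if shutdown() fails.
 */
int end_connection(int sockfd)
{
    int rc = shutdown(sockfd, SHUT_RDWR);   /* TIPC always shuts down both */

    if (close(sockfd) != 0)
        return -1;
    return rc;
}
```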

2.1.18 socket
-------------
int socket(int family, int type, int protocol)

Creates an endpoint for communication.

TIPC currently supports the following values for "type":

a) SOCK_DGRAM - for unreliable connectionless messages
b) SOCK_RDM - for reliable connectionless messages
c) SOCK_SEQPACKET - for reliable connection-oriented messages
d) SOCK_STREAM - for reliable connection-oriented byte streams

Comments:
- The "family" parameter should always be set to AF_TIPC.
- The "protocol" parameter should always be set to 0.

2.2 Examples
------------
A variety of demo programs can be found at http://sourceforge.net/projects/tipc,
which may be useful in understanding how to write an application that uses TIPC.


3. Native API
-------------
The TIPC native API allows programmers to access the capabilities of TIPC 
in a more direct manner than with the socket API.

Benefits of native API:

1. Low-level operation can lead to faster execution speed.
2. Can exclude socket code from system to reduce object code size.

Limitations of native API:

1. Not available to user-space applications.
2. Low-level operation places a greater burden on programmer.

3.1 Concepts
------------
There are a number of important conceptual differences between programming
with the native API and programming with the socket API.  Understanding these
concepts is an essential pre-requisite for using the native API effectively.

Ports:
The fundamental communication endpoint of the native API is a "port", which
operates at a much more primitive level than a socket.  Applications using
TIPC ports are sometimes required to deal with aspects of the TIPC protocol
that were hidden by the socket API, including handling undeliverable messages
that are returned to the sending port and managing the handshaking required
to set up and tear down port-to-port connections.

Port reference:
Every TIPC port has a unique "reference" value, which is analogous to the
file descriptor value that is associated with a socket.  Native API routines
that manipulate ports use the port reference argument to identify the port,
rather than a pointer to the actual port data structure; this allows TIPC to
gracefully handle cases where an application inadvertently attempts to utilize
a port that no longer exists.

User registration:
TIPC allows an application using the native API to register as a "user", and
assigns it a user identifier.  If this user identifier is provided by the
application when it creates a port, TIPC will delete the port automatically
if the application later deregisters itself.  This feature can simplify things
for a programmer whose application uses a constantly changing set of ports,
since TIPC takes care of deleting all ports currently in use by the application
when the application terminates.  (Applications not wishing to take advantage
of this capability can skip the optional registration process entirely and
simply create their ports anonymously using a user identifier value of 0.)

Sending messages:
Applications can send messages using the native API in much the same way as
with the socket API.  The message can be specified either as a set of one or
more byte arrays (using the "iovec" structure) or as a socket buffer (using
the "sk_buff" structure), as long as it does not exceed TIPC's 66000 byte limit
on message size.  The latter form can improve performance by eliminating the
need for TIPC to copy the data into a socket buffer, but for best results
the application that creates the buffer should reserve 80 bytes of headroom
to allow a TIPC message header and data link header to be prepended easily.

Receiving messages:
The native API does not provide any synchronous mechanism for receiving messages
sent to a port.  (That is, there is no equivalent of the recv(), recvfrom(), or
recvmsg() routines that the socket API provides.)  Instead, an application
specifies a set of message handling callback routines when it creates a port;
TIPC then invokes the appropriate routine each time a message is received by
the port.

Individual callback routines may be specified to handle:

1) a direct message              (i.e. one sent to a port ID)
2) a named message               (i.e. one sent to a port name or name sequence)
3) a connection message          (i.e. one sent on an established connection)
4) an errored direct message     (i.e. a direct message that was returned)
5) an errored named message      (i.e. a named message that was returned)
6) an errored connection message (i.e. a connection message that was returned)

An application only needs to supply callback routines for the messages that the
port actually needs to handle.  If TIPC receives a message for which no callback
routine has been specified, it automatically rejects the message (or, in the
case of an errored message, discards it).
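
The rule described above can be modelled in a few lines of ordinary C.  This is
only a sketch of the behaviour as the text states it, not TIPC's actual
implementation; every name in it (port_model, dispatch(), count_msg() and the
enumerations) is hypothetical.

```c
#include <stddef.h>

/* The six message categories listed above, in the same order */
enum msg_kind {
	DIRECT_MSG, NAMED_MSG, CONN_MSG,
	DIRECT_MSG_ERR, NAMED_MSG_ERR, CONN_MSG_ERR,
	NUM_MSG_KINDS
};

enum outcome { DELIVERED, REJECTED, DISCARDED };

typedef void (*msg_handler)(const unsigned char *data, unsigned int size);

/* Hypothetical stand-in for a port: one optional handler per category */
struct port_model {
	msg_handler handler[NUM_MSG_KINDS];
};

/* Example handler that just counts the messages it is given */
static unsigned int handled_count;

static void count_msg(const unsigned char *data, unsigned int size)
{
	(void)data;
	(void)size;
	handled_count++;
}

static enum outcome dispatch(const struct port_model *p, enum msg_kind kind,
			     const unsigned char *data, unsigned int size)
{
	if (p->handler[kind] != NULL) {
		p->handler[kind](data, size);
		return DELIVERED;
	}
	/* No handler: errored messages are discarded, others are rejected */
	return (kind >= DIRECT_MSG_ERR) ? DISCARDED : REJECTED;
}
```

A port that only registers count_msg() for NAMED_MSG would thus see named
messages delivered, other regular messages rejected, and all errored messages
discarded.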
  
Since the callback routine executes in a TIPC kernel thread, rather than one
of the application's threads, the programmer must be prepared to handle any 
critical section issues that arise between the various threads.  Alternatively,
the callback routine can transfer responsibility to an application thread
(as outlined in section 3.3.3 below), thereby allowing the application to
emulate a synchronous receive capability of its own.

3.2 Routines
------------
The native API routines listed below are available to programmers.  More
detail about the arguments and return value for each of these routines can be
found by looking at the function prototypes in tipc.h.  In many cases the
use of the routine will be obvious.  Unfortunately, a comprehensive description
of each routine is not currently available.  (Feel free to write one!)  You
can also consult the examples section below and/or the source code for each 
routine to learn more about what these routines do and how to use them.

  WARNING!  The native API is still under development at this time
  WARNING!  and has not been finalized.  Expect changes in future
  WARNING!  versions of TIPC.

  /* TIPC operating mode routines */

  tipc_get_addr() - get <Z.C.N> of own node
  tipc_get_mode() - get TIPC operating mode
  tipc_attach()   - register application as a TIPC user
  tipc_detach()   - deregister TIPC user & free all associated ports

  /* TIPC port manipulation routines */

  tipc_createport()           - create a TIPC port & generate reference
  tipc_deleteport()           - delete a TIPC port & obsolete reference
  tipc_ref_valid()            - determine if port reference is valid
  tipc_ownidentity()          - get port ID of port
  tipc_set_portimportance()   - set port traffic importance level
  tipc_portimportance()       - get port traffic importance level
  tipc_set_portunreliable()   - set port traffic "source droppable" setting
  tipc_portunreliable()       - get port traffic "source droppable" setting
  tipc_set_portunreturnable() - set port traffic "destination droppable" setting
  tipc_portunreturnable()     - get port traffic "destination droppable" setting
  tipc_publish()              - bind name/name sequence to port
  tipc_withdraw()             - unbind name/name sequence from port
  tipc_connect2port()         - associate port with peer
  tipc_disconnect()           - disassociate port from peer
  tipc_shutdown()             - shut down connection to peer & disassociate
  tipc_isconnected()          - determine if port is currently connected
  tipc_peer()                 - get port ID of peer port

  /* TIPC messaging routines */

  tipc_send()             - send iovec(s) on connection
  tipc_send_buf()         - send sk_buff on connection
  tipc_send2name()        - send iovec(s) to port name
  tipc_send_buf2name()    - send sk_buff to port name
  tipc_send2port()        - send iovec(s) to port ID
  tipc_send_buf2port()    - send sk_buff to port ID
  tipc_multicast()        - multicast iovec(s) to port name sequence

  tipc_forward2name()     - [may be obsoleted] 
  tipc_forward_buf2name() - [may be obsoleted]
  tipc_forward2port()     - [may be obsoleted]
  tipc_forward_buf2port() - [may be obsoleted]
  
  /* TIPC subscription routines */

  tipc_ispublished()     - determine if a specific name has been published
  tipc_available_nodes() - [likely to be obsoleted]
  
3.3 Examples
------------

3.3.1 Basic port operations 
---------------------------
Create a port:

    static u32 port_ref;

    tipc_createport(0, NULL, TIPC_LOW_IMPORTANCE,
		    NULL, NULL, NULL,
		    NULL, named_msg_event, NULL,
		    NULL, &port_ref);

Bind the name {100,123} with "cluster" scope to the port:

    struct tipc_name_seq seq;
    
    seq.type  = 100 ;
    seq.lower = 123 ;
    seq.upper = 123 ;
    tipc_publish(port_ref, TIPC_CLUSTER_SCOPE, &seq);

Process messages sent to port {100,123}:

    /* Note: This callback routine was specified during port creation above */
    
    static void named_msg_event(void *usr_handle,
			        u32 port_ref,
			        struct sk_buff **buf,
			        unsigned char const *data,
			        unsigned int size,
			        unsigned int importance, 
			        struct tipc_portid const *orig,
			        struct tipc_name_seq const *dest)
    {
	struct iovec my_iov;
	char reply_info[30];

	/* 'data' points to message content, 'size' indicates how much; */
	/* the content need not be NUL-terminated, so bound the print   */

	printk("%.*s", (int)size, data);

	/* can send reply message(s) back to originator, if desired */

	strcpy(reply_info, "here is the reply");
	my_iov.iov_base = reply_info;
	my_iov.iov_len = strlen(reply_info) + 1;
	tipc_send2port(port_ref, orig, 1, &my_iov);
	
	/* TIPC discards the received message upon exit */
    }

Delete the port:

    tipc_deleteport(port_ref);
   
3.3.2 TIPC user registration 
----------------------------
Register TIPC user:

    static u32 user_ref;
    
    tipc_attach(&user_ref, NULL, NULL);
    
Create port and associate with registered TIPC user:

    static u32 port_ref;

    tipc_createport(user_ref, NULL, TIPC_LOW_IMPORTANCE,
		    NULL, NULL, NULL,
		    NULL, named_msg_event, NULL,
		    NULL, &port_ref);

Deregister TIPC user (and all associated ports):

    tipc_detach(user_ref);
   
3.3.3 Synchronous message receive 
---------------------------------
Application thread:

    /* Initialize data structures (declared at file scope, since the
       callback routine also uses them) */
    
    struct sk_buff_head message_q;
    wait_queue_head_t wait_q;
        
    skb_queue_head_init(&message_q);
    init_waitqueue_head(&wait_q);

    /* Wait for messages; process & discard each one in turn */
    
    while (1) {
        struct sk_buff *skb;

        if (wait_event_interruptible(wait_q, !skb_queue_empty(&message_q)))
	    continue;

	skb = skb_dequeue(&message_q);
    
        < ... Process message as required ... >

	kfree_skb(skb);
    }

Callback routine converts asynchronous receive into synchronous receive:

    static void named_msg_event(void *usr_handle,
			        u32 port_ref,
			        struct sk_buff **buf,
			        unsigned char const *data,
			        unsigned int size,
			        unsigned int importance, 
			        struct tipc_portid const *orig,
			        struct tipc_name_seq const *dest)
    {
	/* Add message to queue of unprocessed messages */

	skb_queue_tail(&message_q, *buf);
	
	/* Tell TIPC *not* to discard the received message upon exit */
	
	*buf = NULL;
	
	/* Wake up application */

        wake_up_interruptible(&wait_q);
    }

3.3.4 More examples 
-------------------
Demo programs utilizing the native API can be found at http://tipc.sf.net.

In addition, the TIPC source code itself contains a couple of sections that
utilize the native API just like an application might:

1) tipc_cfg_init() in net/tipc/config.c

This file contains the TIPC configuration service (using port name {0,<Z.C.N>}),
which handles messages sent by the tipc-config application.  It utilizes a very
simple connectionless request-and-reply approach to messaging.

2) tipc_subscr_start() in net/tipc/subscr.c

This file contains the TIPC topology service (using port name {1,1}), which
handles subscription requests from applications and returns subscription events.
It demonstrates the correct way to handle connection establishment (both
explicit and implied) and tear down (both self-initiated and peer-initiated). 


4. FAQ
------
This section contains the answers to frequently asked questions.

4.1 How can I determine what node a socket is running on?
---------------------------------------------------------
Use getsockname() to determine the port ID of the socket, then examine the
"node" field to determine the network address of its node.  For example, if
getsockname() returns a port ID of <1.1.19:1234567> then the node field will
have the value 0x01001013 (representing network address <1.1.19>).
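
The following sketch shows how such a node value can be split back into its
<Z.C.N> parts.  It assumes the 8-bit zone, 12-bit cluster, 12-bit node layout
that the example value above implies; the helper names are hypothetical and
exist only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helpers; field widths are inferred from the example above */
static unsigned int addr_zone(uint32_t addr)    { return addr >> 24; }
static unsigned int addr_cluster(uint32_t addr) { return (addr >> 12) & 0xfff; }
static unsigned int addr_node(uint32_t addr)    { return addr & 0xfff; }

/* Write the address in <Z.C.N> form into 'buf'; returns snprintf()'s result */
static int format_addr(uint32_t addr, char *buf, size_t len)
{
	return snprintf(buf, len, "<%u.%u.%u>",
			addr_zone(addr), addr_cluster(addr), addr_node(addr));
}
```

Passing the node field value 0x01001013 to format_addr() yields the string
"<1.1.19>".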

4.2 How can I cancel a subscription?
------------------------------------
The only way to cancel a subscription (other than letting it time out) is to
close the connection to the topology service, thereby cancelling all 
subscriptions issued on that connection.  The ability to cancel a single 
subscription will be added in an upcoming release.

4.3 When should I use an implied connect instead of an explicit connect?
------------------------------------------------------------------------
The simplest approach is to assume that you will be using an explicit connect
and design your code accordingly.  If you end up with a connect() followed by
a send routine followed by a receive routine, then you should be able to combine
the connect() and send routine into a sendto(), thereby saving yourself the
overhead of an additional system call and the exchange of empty handshaking
messages during connection establishment.  On the other hand, if you end up with
a connect() followed by a receive, or a connect() followed by two sends, then
you can't use the implied connect approach.
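
A minimal sketch of the implied connect pattern using the socket API is shown
below.  It assumes a Linux system whose headers provide <linux/tipc.h>, and a
socket descriptor that the caller has already created as a connection-oriented
TIPC socket (for example with SOCK_SEQPACKET).  The helper names
make_name_addr() and request_reply() are hypothetical.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <linux/tipc.h>

/* Fill in a TIPC port name address for use with connect() or sendto() */
static void make_name_addr(struct sockaddr_tipc *addr,
			   unsigned int type, unsigned int instance)
{
	memset(addr, 0, sizeof(*addr));
	addr->family = AF_TIPC;
	addr->addrtype = TIPC_ADDR_NAME;
	addr->addr.name.name.type = type;
	addr->addr.name.name.instance = instance;
	addr->addr.name.domain = 0;
}

/*
 * Implied connect: a single sendto() on the unconnected socket replaces
 * the connect() + send pair, and is followed by a receive.
 * Returns the number of bytes placed in 'reply', or -1 on error.
 */
static ssize_t request_reply(int sd, const struct sockaddr_tipc *server,
			     const void *req, size_t req_len,
			     void *reply, size_t reply_len)
{
	if (sendto(sd, req, req_len, 0,
		   (const struct sockaddr *)server, sizeof(*server)) < 0)
		return -1;
	return recv(sd, reply, reply_len, 0);
}
```

The same make_name_addr() result could be passed to connect() instead when the
explicit approach is required, so the choice between the two styles can be
deferred until the message sequence is known.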


5. Tips and Techniques
----------------------
This section illustrates some techniques that may be of interest to programmers
designing applications that use TIPC.

5.1 Dummy Subscriptions
-----------------------
It is sometimes useful to issue a "dummy" subscription to the TIPC topology
server -- that is, a subscription that has a time limit but will never match
any published name.  Such a subscription will never generate any publish or 
withdraw events, but will generate a timeout event when it expires.

For example, suppose an application wishes to report what nodes are present
in the network every minute.  This can be accomplished as follows:

        connect to TIPC topology server
        send subscription for {0,1,2^^31-1}, time limit = none
        send subscription for {1,0,0}, time limit = 60 seconds
        initialize set of available nodes to "empty"
        loop
                receive event
                if (event == publish)
                        add node to set of available nodes
                else if (event == withdraw)
                        remove node from set of available nodes
                else
                        send subscription for {1,0,0}, time limit = 60 seconds
                        print list of available nodes
        end loop
        
The use of a dummy subscription having the name sequence {1,0,0} allows the
application to print the desired information at the required time without having
to use additional timers or threads of control, and without having to deal with
the complexities of determining how long to continue waiting when the main
thread is awoken to process a publish or withdraw event.  The name sequence
specified by the dummy subscription can be anything that the application
designer knows will not generate any publish or withdraw events, and need not
be {1,0,0}.
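
The body of the loop in the pseudocode above could be sketched in C roughly as
follows.  The code models only the bookkeeping: receiving events from the
topology server and re-issuing the subscription are left to the caller.  All
names (node_set, handle_event(), the EV_* values) and the MAX_NODES capacity
are hypothetical choices made for this example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NODES 64	/* arbitrary capacity for this sketch */

enum event_kind { EV_PUBLISH, EV_WITHDRAW, EV_TIMEOUT };

struct node_set {
	uint32_t addr[MAX_NODES];
	size_t count;
};

static void node_set_add(struct node_set *s, uint32_t node)
{
	size_t i;

	for (i = 0; i < s->count; i++)
		if (s->addr[i] == node)
			return;		/* already present */
	if (s->count < MAX_NODES)
		s->addr[s->count++] = node;
}

static void node_set_remove(struct node_set *s, uint32_t node)
{
	size_t i;

	for (i = 0; i < s->count; i++) {
		if (s->addr[i] == node) {
			/* Fill the gap with the last entry */
			s->addr[i] = s->addr[--s->count];
			return;
		}
	}
}

/*
 * Apply one event to the set.  Returns true when the dummy subscription
 * timed out, i.e. when the caller should re-issue it and print the set.
 */
static bool handle_event(struct node_set *s, enum event_kind kind,
			 uint32_t node)
{
	switch (kind) {
	case EV_PUBLISH:
		node_set_add(s, node);
		return false;
	case EV_WITHDRAW:
		node_set_remove(s, node);
		return false;
	default:
		return true;
	}
}
```

Because the timeout arrives as just another event, the caller needs only a
single blocking receive in its loop.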

Note: This code ignores processing overhead that will result in each successive
display occurring slightly more than 60 seconds after the previous one.  This
could be compensated for by measuring the overhead and subtracting it from the
specified time limit each time the dummy subscription is re-issued.
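
One possible way to apply this compensation is sketched below.  It assumes the
time limit is expressed in milliseconds and that the application has measured
its own overhead by some means not shown here; the helper name and the choice
to clamp the result to 1 ms are assumptions made for the example.

```c
#include <stdint.h>

#define REPORT_PERIOD_MS 60000u	/* 60 seconds, as in the example above */

/* Hypothetical helper: time limit to use for the next dummy subscription */
static uint32_t next_time_limit_ms(uint32_t overhead_ms)
{
	/* If the overhead used up the whole period, ask for the shortest wait */
	if (overhead_ms >= REPORT_PERIOD_MS)
		return 1;
	return REPORT_PERIOD_MS - overhead_ms;
}
```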

[END OF DOCUMENT]