OpenSM Release Notes 3.2
=============================
Version: OpenSM 3.2.x
Repo: git://git.openfabrics.org/~sashak/management.git
Date: May 2009
1 Overview
----------
This document describes the contents of the OpenSM 3.2 release.
OpenSM is an InfiniBand compliant Subnet Manager and Administration,
and runs on top of OpenIB. The OpenSM version for this release
is opensm-3.2.5.
This document includes the following sections:
1 This Overview section (describing new features and software
dependencies)
2 Known Issues And Limitations
3 Unsupported IB Compliance Statements
4 Bug Fixes
5 Main Verification Flows
6 Qualified Software Stacks and Devices
1.1 Major New Features
* Cached Routing
OpenSM provides an optional unicast routing cache (enabled by the '-A' or
'--ucast_cache' options). When enabled, the unicast routing cache prevents
routing recalculation (which is a heavy task in a large cluster) when
no topology change was detected during the heavy sweep, or when
the topology change does not require a new routing calculation, e.g. when
one or more CAs/RTRs/leaf switches go down, or one or more of these
nodes come back after being down.
* Routing Chaining
Routing chaining is the ability to configure the order in which routing
algorithms are applied in opensm, e.g. '-R ftree,updn,minhop' means: try
ftree routing first. If ftree fails, try updn. If updn fails, try
minhop.
* IPv6 Solicited Node Multicast addresses consolidation
When this mode is used (enabled with the --consolidate_ipv6_snm_req option),
OpenSM will map all IPv6 Solicited Node Multicast address join requests
into a single Multicast group with address ff10:601b::1:ff00:0. This
saves limited MLID space. This IBA noncompliant feature is very
useful with large clusters (more than ~1024 nodes).
* OpenSM sweep state machine rework
The huge and buggy OpenSM sweep state machine was fully rewritten in a
safer and more effective synchronous manner.
* Multi lid routing balancing for updn/minhop routing algorithms
When LMC > 0 is used, OpenSM will generate routing paths via
different switches and, when possible, different chassis.
* Preserve base lid routes when LMC > 0
When LMC > 0 is used OpenSM will preserve routing paths for base lids
as it would be with LMC = 0. In this way traffic on each LID level is
not affected by LMC changes.
* Ordered routing paths balancing
This adds the ability to predefine the port order in which routing path
balancing is performed by OpenSM. It helps to improve performance
dramatically (40-50%) for applications with a known communication
pattern. Activated with the --guid_routing_order_file command line option.
* Unified OpenSM configuration
There is now a "conventional" config file instead of the hidden option cache
file (opensm.opts). OpenSM will find it in a default place (consult the
man page for the exact value) or the file name can be specified with the '-F'
command line option. There is also an option ('-c') to generate a config
file template.
* Query remote SMs during light sweep
The master OpenSM will query remote standby SMs periodically to catch their
possible state changes and react accordingly (as required by the IBA spec).
* Predefined port ids for Up/Down algorithm
This is useful as an Up/Down fine-tuning tool: the algorithm will use
predefined port IDs instead of GUIDs for its decision about direction.
Activated with the --ids_guid_file command line option.
* Improved plugin API version 2.
OpenSM now gives plugins access to all data structures.
This makes it possible to implement powerful multi-purpose plugins. All
OpenSM header files are now installed, and specific configuration/build
options are exported via the generated osm_config.h header file.
* Many code improvements, optimizations and cleanups
* Automatic daily snapshots generation.
This is not a "feature", but it simplifies access to recent OpenSM
bits.
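
As an illustration of the routing chaining feature listed above, the
following Python sketch models the fallback order for
'-R ftree,updn,minhop'. It is not OpenSM code: the engine functions, the
fabric dictionary and its keys are hypothetical stand-ins, and each engine
is assumed to return None when it cannot route the fabric.

```python
# Hypothetical sketch of routing chaining; not the OpenSM implementation.
from typing import Callable, Dict, Optional


# Stand-in engines: each returns a routing result (here just a label),
# or None to signal that it cannot route the given fabric.
def ftree(fabric: dict) -> Optional[str]:
    return "ftree" if fabric.get("is_fat_tree") else None


def updn(fabric: dict) -> Optional[str]:
    return "updn" if fabric.get("has_roots") else None


def minhop(fabric: dict) -> Optional[str]:
    return "minhop"  # assumed to always succeed in this sketch


ENGINES: Dict[str, Callable[[dict], Optional[str]]] = {
    "ftree": ftree,
    "updn": updn,
    "minhop": minhop,
}


def route(chain: str, fabric: dict) -> Optional[str]:
    """Try each engine named in the comma-separated chain, in order."""
    for name in chain.split(","):
        result = ENGINES[name.strip()](fabric)
        if result is not None:
            return result  # first engine that succeeds wins
    return None  # every engine in the chain failed
```

With this model, route("ftree,updn,minhop", {}) falls through ftree and
updn and ends up with the minhop result.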
1.2 Minor New Features:
* Cleanup cl_qlock_pool memory allocator - speedup memory allocations
* Support for configurable (via OSM_UMAD_MAX_PENDING environment variable)
size of pending MADs pool.
* Set packet life time to subnet timeout option rather than default
* Enforce routing paths rebalancing on switch reconnection
* In Up/Down routing algorithm compare GUID values in host byte order
* Add 'switchbalance' and 'lidbalance' commands for OpenSM console
* Respond to new trap 144 node description update flag
* Add '--connect_roots' command line option. This preserves connectivity
between root nodes in Up/Down routing algorithm
* Setting SL in the IPoIB MCast groups in accordance with QoS policy
* Dump auto detected root node guids in Up/Down routing algorithm
* Unify OpenSM dumpers code
* Unify various guid files parsers - add generic nodenamemap style parser
* When root node guids were provided in file update the list on each
Up/Down run
* During ./configure show values of configuration dirs and files
* Make prefix routes config file name configurable
* Add a Performance Manager HOWTO to the docs and the dist
* Support separate SA and SM keys as clarified in IBA 1.2.1
* Remove AM_MAINTAINER_MODE in ./configure
* Make vendor type OSM_VENDOR_INTF_OPENIB (libibumad) to be default
* Build osm_perfmgr_db.* content only when PerfMgr is enabled.
* Move PerfMgr event_db_dump_file to common OpenSM dump dir
* Allow space separated strings as values in OpenSM config
* Support for multiple event plugins
* Add '--version' command line option
* Add '--create-config <file-name>' command line option
* Speedup and simplify logging code
* Speedup multicast processing in SA DB
* In log messages convert unicast LIDs from hex to decimal format and
GIDs from hex to IPv6 address format
* Handle all possible ports in "ignore-guids" file
* Add 'reroute' console command
* Remove many install-exec-hook from Makefiles
* Some cleanups in LASH routing algorithm code
* In Makefiles remove -rpath and explicit -lpthread, -ldl from LDFLAGS
(move to configurator)
* Install all OpenSM header files
* Improve locking in SM Info receiver
* Add new OSM_EVENT_ID_SUBNET_UP event for plugins
* Redo lex and yacc files generation in conventional way
* Add a missing Node Description check on light sweep.
* Move vendor specific compilation defines from command to generated
config.h file
* Provide useful error message when log file opening fails
* Add generated osm_config.h file with OpenSM specific defines
* Display port number in decimal in log messages
* Replace osm_vendor_select.h by generated osm_config.h
* Unify options listing in OpenSM usage message
* LFT buffers handling simplification
* Add 'dump_conf' console command
* OpenSM performs sweep on SIGCONT (coming out of suspend).
* When our SM is in Standby state and its priority is increased
(via console command), notify master SM by sending Trap 144.
* When entering standby state (after discovery) notify master SM
with Trap 144.
* Support more PortInfo:CapabilityMask bits
* When babbling port policy is on, disable the port with the least hop
count.
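
The log format change listed above (unicast LIDs in decimal, GIDs in IPv6
address notation) can be pictured with the short Python sketch below. The
function names are hypothetical and the exact text OpenSM writes to its log
may differ; the sketch only shows the two conversions.

```python
# Illustrative conversions only; OpenSM's actual log text may differ.
import ipaddress


def format_lid(lid: int) -> str:
    """Render a unicast LID in decimal instead of hex."""
    return str(lid)


def format_gid(gid: bytes) -> str:
    """Render a 128-bit GID (16 raw bytes) in IPv6 address notation."""
    if len(gid) != 16:
        raise ValueError("a GID is exactly 16 bytes")
    return str(ipaddress.IPv6Address(gid))


# Example values (made up for illustration).
example_lid = format_lid(0x1A)
example_gid = format_gid(bytes.fromhex("fe800000000000000002c90300001234"))
```

Here example_lid is "26" and example_gid is "fe80::2:c903:0:1234".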
1.3 Library API Changes
None
1.4 Software Dependencies
OpenSM depends on the installation of either OFED 1.x, OpenIB gen2 (e.g.
IBG2 distribution), OpenIB gen1 (e.g. IBGD distribution), or Mellanox
VAPI stacks. The qualified driver versions are provided in Table 2,
"Qualified IB Stacks".
Also, building of QoS manager policy file parser requires flex, and either
bison or byacc installed.
1.5 Supported Devices Firmware
The main task of OpenSM is to initialize InfiniBand devices. The
qualified devices and their corresponding firmware versions
are listed in Table 3.
2 Known Issues And Limitations
------------------------------
* No Service / Key associations:
There is no way to manage Service access by Keys.
* No SM to SM SMDB synchronization:
Puts the burden of re-registering services, multicast groups, and
inform-info on the client application (or IB access layer core).
* When running with QoS with the default configuration (opensm -Q),
OpenSM prints a list of "Invalid Cached Option" error messages.
This does not affect OpenSM functionality.
* SMs do not hand-over when running on ConnectX in a switch-based topology.
3 Unsupported IB Compliance Statements
--------------------------------------
The following section lists all the IB compliance statements which
OpenSM does not support. Please refer to the IB specification for detailed
information regarding each compliance statement.
* C14-22 (Authentication):
M_Key, M_KeyProtectBits and M_KeyLeasePeriod shall be set in one
SubnSet method. As a work-around, an OpenSM option is provided for
defining the protect bits.
* C14-67 (Authentication):
On SubnGet(SMInfo) and SubnSet(SMInfo) - if M_Key is not zero then
the SM shall generate a SubnGetResp if the M_Key matches, or
silently drop the packet if M_Key does not match.
* C15-0.1.23.4 (Authentication):
InformInfoRecords shall always be provided with the QPN set to 0,
except for the case of a trusted request, in which case the actual
subscriber QPN shall be returned.
* o13-17.1.2 (Event-FWD):
If no permission to forward, the subscription should be removed and
no further forwarding should occur.
* C14-24.1.1.5 and C14-62.1.1.22 (Initialization):
GUIDInfo - SM should enable assigning Port GUIDInfo.
* C14-44 (Initialization):
If the SM discovers that it is missing an M_Key to update CA/RT/SW,
it should notify the higher level.
* C14-62.1.1.12 (Initialization):
PortInfo:M_Key - Set the M_Key to a node based random value.
* C14-62.1.1.13 (Initialization):
PortInfo:P_KeyProtectBits - set according to an optional policy.
* C14-62.1.1.24 (Initialization):
SwitchInfo:DefaultPort - should be configured for random FDB.
* C14-62.1.1.32 (Initialization):
RandomForwardingTable should be configured.
* o15-0.1.12 (Multicast):
If the JoinState is SendOnlyNonMember = 1 (only), then the endport
should join as sender only.
* o15-0.1.8 (Multicast):
If a request for creating an MCG with fields that cannot be met,
return ERR_REQ_INVALID (currently ignores SL and FlowLabelTClass).
* C15-0.1.8.6 (SA-Query):
Respond to SubnAdmGetTraceTable - this is an optional attribute.
* C15-0.1.13 (Services):
Reject ServiceRecord create, modify or delete if the given
ServiceP_Key does not match the one included in the ServiceGID port
and the port that sent the request.
* C15-0.1.14 (Services):
Provide means to associate service name and ServiceKeys.
4 Bug Fixes
-----------
4.1 Major Bug Fixes
* Set SA attribute offset to 0 when no records are returned
* Send trap 64 only after new ports are in ACTIVE state.
* Fix in sending client reregistration bit
* Fix default OpenSM SM (and SA) Key byte order
* Fix in sending Multicast groups creation/deletion notification (Traps
66,67)
* Don't startup automatically on SuSE based systems
* Discovery bug, where some ports were left unlinked (without remote side).
4.2 Other Bug Fixes
* opensm/osm_console.c: fix seg fault when running "portstatus ca" in
the console
* opensm: fix potential core dumps where osm_node_get_physp_ptr can
return NULL
* opensm/osm_mcast_mgr: limit spanning tree creation recursion to value
of max hops (64)
* opensm: switch LFTs incremental update fix
* opensm/osm_state_mgr.c: fix segmentation fault
* opensm: eliminate some potential NULL pointer dereferences
* opensm/osm_console.c: fix guid parsing
* opensm: fix off by 1 issue with max_lid and max_multicast_lid_ho
* opensm: fix potentially wrong port_guid initialization
* opensm/configure.in: fix wrong HAVE_DEFAULT_OPENSM_CONFIG_FILE define
generation
* opensm: fix snprintf() usage
* opensm/osm_sa_lft_record: validate LFT block number
* opensm/osm_sa_lft_record: pass block parameter in host byte order
* opensm/include/Makefile.am: don't duplicate header files in EXTRA_DIST
* opensm/osm_sa_class_port_info.c: fix over bound array access
* osmtest/osmt_service.c: fix over bound array access
* osmtest: fix qpn encoding in osmtest_informinfo_request()
* opensm/osm_vendor_mlx_sa.c: handling attribute offset of 0
* opensm: fix segfault corner case when osm_console_init fails
* opensm/console: close console socket on cleanup path
* opensm/osm_ucast_lash: fix buffer overflow
* opensm: fix broken IPv6 SNM consolidation code
* opensm/osm_sa_lft_record.c: fix block number encoding byte order
* opensm/osm_sa: fix memory leak in SA responder
* opensm/osm_mcast_mgr: fix memory leak
* opensm: fix qos config parsing bugs
* opensm/osm_mcast_tbl.c: fix sending invalid MF block due to max mlid
overflow
* opensm: log_max_size config parameter in MB
* opensm/osm_ucast_lash: fix extra memory allocations
* opensm: fix race in main OpenSM flow
* opensm/ftree: fix GUID check against cn_guid_file
* opensm/ftree: save LFT buffers memory allocations
* opensm/osm_sa_link_record.c: prevent potential endless recursion
* opensm: remove SM from sm_guid_tbl when IsSM port capability flag is
not set
* opensm: fix QoS config bug
* opensm: don't reassign zeroed params from config file
* Other less critical or visible bugs were also fixed.
* opensm: update LFTs when entering master
* opensm: invalidate routing cache when entering master state
* opensm/osm_port_info_rcv.c: don't clear sw->need_update if port 0 is active
5 Main Verification Flows
-------------------------
OpenSM verification is run using the following activities:
* osmtest - a stand-alone program
* ibmgtsim (IB management simulator) based - a set of flows that
simulate clusters, inject errors and verify OpenSM capability to
respond and bring up the network correctly.
* small cluster regression testing - where the SM is used on back to
back or single switch configurations. The regression includes
multiple OpenSM dedicated tests.
* cluster testing - when we run OpenSM to setup a large cluster, perform
hand-off, reboots and reconnects, verify routing correctness and SA
responsiveness at the ULP level (IPoIB and SDP).
5.1 osmtest
osmtest is an automated verification tool used for OpenSM
testing. Its verification flows are described in the list below.
* Inventory File: Obtain and verify all port info, node info, link and path
records parameters.
* Service Record:
- Register new service
- Register another service (with a lease period)
- Register another service (with service p_key set to zero)
- Get all services by name
- Delete the first service
- Delete the third service
- Bad flows: get/delete of a non-valid service
- Add / Get same service with different data
- Add / Get / Delete by different component mask values (services
by Name & Key / Name & Data / Name & Id / Id only )
* Multicast Member Record:
- Query of existing Groups (IPoIB)
- BAD Join with insufficient comp mask (o15.0.1.3)
- Create given MGID=0 (o15.0.1.4)
- Create given MGID=0xFF12A01C,FE800000,00000000,12345678 (o15.0.1.4)
- Create BAD MGID=0xFA. (o15.0.1.6)
- Create BAD MGID=0xFF12A01B w/ link-local not set (o15.0.1.6)
- New MGID with invalid join state (o15.0.1.9)
- Retry of existing MGID - See JoinState update (o15.0.1.11)
- BAD RATE when connecting to existing MGID (o15.0.1.13)
- Partial JoinState delete request - removing FullMember (o15.0.1.14)
- Full Delete of a group (o15.0.1.14)
- Verify Delete by trying to Join deleted group (o15.0.1.14)
- BAD Delete of IPoIB membership (no prev join) (o15.0.1.15)
* GUIDInfo Record:
- All GUIDInfoRecords in subnet are obtained
* MultiPathRecord:
- Perform some compliant and noncompliant MultiPathRecord requests
- Validation is via status in responses and IB analyzer
* PKeyTableRecord:
- Perform some compliant and noncompliant PKeyTableRecord queries
- Validation is via status in responses and IB analyzer
* LinearForwardingTableRecord:
- Perform some compliant and noncompliant LinearForwardingTableRecord queries
- Validation is via status in responses and IB analyzer
* Event Forwarding: Register for trap forwarding using reports
- Send a trap and wait for report
- Unregister non-existing
* Trap 64/65 Flow: Register to Trap 64-65, create traps (by
disconnecting/connecting ports) and wait for report, then unregister.
* Stress Test: send PortInfoRecord queries, both single and RMPP and
check for the rate of responses as well as their validity.
5.2 IB Management Simulator OpenSM Test Flows:
The simulator provides ability to simulate the SM handling of virtual
topologies that are not limited to actual lab equipment availability.
OpenSM was simulated to bring up clusters of up to 10,000 nodes. Daily
regressions use smaller clusters (16 and 128 nodes).
The following test flows are run on the IB management simulator:
* Stability:
Up to 12 links from the fabric are randomly selected to drop packets
at drop rates up to 90%. The SM is required to succeed in bringing the
fabric up. The resulting routing is verified to be correct as well.
* LID Manager:
Using LMC = 2 the fabric is initialized with LIDs. Faults such as
zero LID, Duplicated LID, non-aligned (to LMC) LIDs are
randomly assigned to various nodes and other errors are randomly
output to the guid2lid cache file. The SM sweep is run 5 times and
after each iteration a complete verification is made to ensure that all
LIDs that could possibly be maintained are kept, as well as that all nodes
were assigned a legal LID range.
* Multicast Routing:
Nodes randomly join the 0xc000 group and eventually the
resulting routing is verified for completeness and adherence to
Up/Down routing rules.
* osmtest:
The complete osmtest flow as described in section 5.1 is run on
the simulated fabrics.
* Stress Test:
This flow merges fabric, LID and stability issues with continuous
PathRecord, ServiceRecord and Multicast Join/Leave activity to
stress the SM/SA during continuous sweeps. InformInfo Set/Delete/Get
were added to the test such that both existing and non-existing nodes
perform them in random order.
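
The LID faults exercised by the LID Manager flow above can be made concrete
with a small Python sketch. It relies only on the general IBA rules that a
port with a given LMC owns 2^LMC consecutive LIDs starting at a base LID
whose low LMC bits are zero, and that unicast LIDs lie in 0x0001-0xBFFF.
The function names and the port-to-LID dictionary are hypothetical and are
not taken from OpenSM or ibmgtsim.

```python
# Hypothetical checker for the fault classes named in the LID Manager flow.
UNICAST_MAX = 0xBFFF  # highest unicast LID


def lid_range(base_lid: int, lmc: int) -> range:
    """All LIDs owned by a port: 2**lmc consecutive values from base_lid."""
    return range(base_lid, base_lid + (1 << lmc))


def is_legal_base_lid(base_lid: int, lmc: int) -> bool:
    """Reject zero LIDs, out-of-range LIDs and LIDs not aligned to the LMC."""
    if base_lid < 1 or base_lid + (1 << lmc) - 1 > UNICAST_MAX:
        return False
    return base_lid & ((1 << lmc) - 1) == 0


def ports_with_conflicts(base_lids: dict, lmc: int) -> set:
    """Return the ports whose LID ranges overlap another port's range."""
    owner = {}
    conflicts = set()
    for port, base in base_lids.items():
        for lid in lid_range(base, lmc):
            if lid in owner:
                conflicts.update({port, owner[lid]})
            else:
                owner[lid] = port
    return conflicts
```

With LMC = 2, base LID 4 is legal and covers LIDs 4 to 7, while base LID 6
is non-aligned and base LID 0 is the zero-LID fault.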
5.3 OpenSM Regression
Using a back-to-back or single switch connection, the following set of
tests is run nightly on the stacks described in Table 2. The included
tests are:
* Stress Testing: Flood the SA with queries from multiple channel
adapters to check the robustness of the entire stack up to the SA.
* Dynamic Changes: Dynamic Topology changes, through randomly
dropping SMP packets, used to test OpenSM adaptation to an unstable
network & verify DB correctness.
* Trap Injection: This flow injects traps to the SM and verifies that it
handles them gracefully.
* SA Query Test: This test exhaustively checks the SA responses to every
possible single component mask. To do that the test examines the
entire set of records the SA can provide, classifies them by their
field values and then selects every field (using component mask and a
value) and verifies that the response matches the expected set of records.
A random selection using multiple component mask bits is also performed.
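
The idea behind the SA Query Test above can be sketched in Python as
follows. This is a simplified model, not the regression test itself:
records are plain dictionaries, the field list and its bit order are
hypothetical, and the SA under test is represented by a query callable
taking a component mask and a template record.

```python
# Simplified model of single-component-mask checking; names are hypothetical.
FIELDS = ("lid", "port_num", "link_width")  # bit i selects FIELDS[i]


def sa_match(records, comp_mask, template):
    """Reference behaviour: keep records equal to the template on every
    field whose component mask bit is set."""
    selected = [f for i, f in enumerate(FIELDS) if comp_mask & (1 << i)]
    return [r for r in records if all(r[f] == template[f] for f in selected)]


def check_single_masks(records, query):
    """Query every (field, value) pair seen in the records using a single
    mask bit and collect the cases where the response is unexpected."""
    failures = []
    for i, field in enumerate(FIELDS):
        for value in sorted({r[field] for r in records}):
            expected = [r for r in records if r[field] == value]
            got = query(1 << i, {field: value})
            if got != expected:
                failures.append((field, value))
    return failures
```

A correct SA model yields an empty failure list, while one that ignores the
component mask and returns every record is reported for each value that
does not match all records.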
5.4 Cluster testing:
Cluster testing is usually run before a distribution release. It
involves real hardware setups of 16 to 32 nodes (or more if a beta site
is available). Each test is validated by running all-to-all ping through the IB
interface. The test procedure includes:
* Cluster bringup
* Hand-off between 2 or 3 SMs while performing:
- Node reboots
- Switch power cycles (disconnecting the SMs)
* Unresponsive port detection and recovery
* osmtest from multiple nodes
* Trap injection and recovery
6 Qualified Software Stacks and Devices
---------------------------------------
OpenSM Compatibility
--------------------
Note that OpenSM version 3.2.1 and earlier used a value of 1 in host
byte order for the default SM_Key, so there is a compatibility issue
with these earlier versions of OpenSM when the 3.2.2 or later version
is running on a little endian machine. This affects SM handover as well
as SA queries (saquery tool in infiniband-diags).
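
The effect described above can be reproduced with a few lines of Python.
The sketch assumes that the fixed versions encode the 64-bit default
SM_Key value 1 in network (big endian) byte order, which is how the
"byte order" fix is read here; the variable and function names are
illustrative only.

```python
# Illustration of the default SM_Key byte order mismatch; names are made up.
import struct

SM_KEY = 1  # default SM_Key value

# 3.2.1 and earlier: value written in host byte order.
old_on_little_endian = struct.pack("<Q", SM_KEY)
old_on_big_endian = struct.pack(">Q", SM_KEY)

# Assumed behaviour of 3.2.2 and later: always network byte order.
new_on_any_host = struct.pack(">Q", SM_KEY)


def wire_value(raw: bytes) -> int:
    """Interpret the 8 key bytes as a peer reading network byte order would."""
    return struct.unpack(">Q", raw)[0]


keys_match_on_big_endian = old_on_big_endian == new_on_any_host
keys_match_on_little_endian = old_on_little_endian == new_on_any_host
```

On a little endian host the old encoding is seen by a peer as
0x0100000000000000 rather than 1, so the keys do not match; on a big endian
host both encodings are identical.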
Table 2 - Qualified IB Stacks
=============================
Stack                                    | Version
-----------------------------------------|--------------------------
OFED                                     | 1.4
OFED                                     | 1.3
OFED                                     | 1.2
OFED                                     | 1.1
OFED                                     | 1.0
OpenIB Gen2 (IBG2 distribution)          | 1.0
OpenIB Gen1 (IBGD distribution)          | 1.8.0
VAPI (Mellanox InfiniBand HCA Driver)    | 3.2 and later
Table 3 - Qualified Devices and Corresponding Firmware
======================================================
Mellanox
Device                              | FW versions
------------------------------------|-------------------------------
InfiniScale                         | fw-43132 5.2.000 (and later)
InfiniScale III                     | fw-47396 0.5.000 (and later)
InfiniScale IV                      | fw-48436 7.1.000 (and later)
InfiniHost                          | fw-23108 3.5.000 (and later)
InfiniHost III Lx                   | fw-25204 1.2.000 (and later)
InfiniHost III Ex (InfiniHost Mode) | fw-25208 4.8.200 (and later)
InfiniHost III Ex (MemFree Mode)    | fw-25218 5.3.000 (and later)
ConnectX IB                         | fw-25408 2.3.000 (and later)
QLogic/PathScale
Device  | Note
--------|-----------------------------------------------------------
iPath   | QHT6040 (PathScale InfiniPath HT-460)
iPath   | QHT6140 (PathScale InfiniPath HT-465)
iPath   | QLE6140 (PathScale InfiniPath PE-880)
iPath   | QLE7240
iPath   | QLE7280
Note 1: OpenSM does not run on an IBM Galaxy (eHCA) as it does not expose
QP0 and QP1. However, it does support it as a device on the subnet.
Note 2: QoS firmware and Mellanox devices
HCAs: QoS supported by ConnectX. QoS-enabled FW release is 2_5_000 and
later.
Switches: QoS supported by InfiniScale III
Any InfiniScale III FW that is supported by OpenSM supports QoS.