# Level Zero Validation Layer
## Introduction
The Level Zero driver implementations [by design](https://oneapi-src.github.io/level-zero-spec/level-zero/latest/core/INTRO.html#error-handling) do minimal error checking and do not guard against invalid API programming.
The Level Zero Validation layer is intended to be the primary Level Zero API error handling mechanism. It can be enabled at runtime with environment settings. When the validation layer is enabled, the Level Zero loader injects calls to the validation layer into the Level Zero API DDI tables. When it is not enabled, it is completely removed from the call path and has no performance cost.
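The injection into the DDI tables can be pictured with a small Python model. This is only a conceptual sketch, not the loader's code: the table is modeled as a dictionary, `driver_context_create` is a hypothetical stand-in for a driver entry point, and the single null check stands in for whatever checks the enabled validation modes perform.

```python
from typing import Callable, Dict


def driver_context_create(desc):
    """Hypothetical driver entry point; it does no error checking of its own."""
    return "ZE_RESULT_SUCCESS"


def make_ddi_table() -> Dict[str, Callable]:
    """Model of a DDI table: API name -> function pointer."""
    return {"zeContextCreate": driver_context_create}


def with_validation(driver_fn: Callable) -> Callable:
    """Wrap a driver entry point so a check runs before the driver is called."""
    def layered(desc):
        if desc is None:  # simplified stand-in for the layer's checks
            return "ZE_RESULT_ERROR_INVALID_NULL_POINTER"
        return driver_fn(desc)
    return layered


def load_ddi_table(validation_enabled: bool) -> Dict[str, Callable]:
    """Build the table the application ends up calling through."""
    table = make_ddi_table()
    if validation_enabled:
        # Each entry is replaced by a validating wrapper.
        table = {name: with_validation(fn) for name, fn in table.items()}
    return table


disabled = load_ddi_table(False)
enabled = load_ddi_table(True)
```

With the layer disabled, the table entry is the driver function itself, which is why the disabled case adds no overhead in this model.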
## Usage
The validation layer is built into a shared library named `libze_validation_layer.so` or `ze_validation_layer.dll`. This library must be in your library search path.
The validation layer can be enabled at runtime by setting `ZE_ENABLE_VALIDATION_LAYER=1`.
The Level Zero loader reads this environment setting when either `zeInit` or `zesInit` is called and sets up the DDI function pointer tables accordingly.
By default, no validation modes will be enabled. The individual validation modes must be enabled with the following environment settings:
- `ZE_ENABLE_PARAMETER_VALIDATION`
- `ZE_ENABLE_HANDLE_LIFETIME`
- `ZEL_ENABLE_EVENTS_CHECKER`
- `ZEL_ENABLE_BASIC_LEAK_CHECKER`
- `ZE_ENABLE_THREADING_VALIDATION` (not yet implemented)
- `ZEL_ENABLE_CERTIFICATION_CHECKER`
- `ZEL_ENABLE_SYSTEM_RESOURCE_TRACKER_CHECKER`
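
Because the loader reads these settings when `zeInit` or `zesInit` is called, they have to be in the process environment before the application starts. The sketch below builds such an environment for a child process. The helper name `validation_env` is hypothetical, and setting each mode variable to `1` is an assumption that mirrors how `ZE_ENABLE_VALIDATION_LAYER` is enabled.

```python
import os
from typing import Dict, Iterable, Optional


def validation_env(modes: Iterable[str],
                   base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with the layer and given modes enabled."""
    env = dict(os.environ if base is None else base)
    env["ZE_ENABLE_VALIDATION_LAYER"] = "1"
    for mode in modes:
        env[mode] = "1"  # assumption: modes are switched on with the value 1
    return env


env = validation_env(
    ["ZE_ENABLE_PARAMETER_VALIDATION", "ZEL_ENABLE_BASIC_LEAK_CHECKER"],
    base={},  # start from an empty environment to keep the example deterministic
)
```

The resulting dictionary can be passed as the `env` argument of `subprocess.run` when launching the application under test.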
## Validation Modes
### `ZE_ENABLE_PARAMETER_VALIDATION`
Parameter Validation mode maintains no internal state. It performs the following checks on each API call before calling into the driver:
- Non-optional input pointers must not be `nullptr`
- Non-optional input handles must not be `0`
- Input flags must only have valid flag values set
- Input enum values must not be greater than the maximum defined value
- (Planned) `stype` must be set to a valid `ze_structure_type_t` for struct
- (Planned) `pNext` must be `nullptr` or point to a valid extension struct
If a check fails, the appropriate error code is returned and the driver API is not called.
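
A stateless check of this kind might look like the sketch below. The function, the flag mask, and the enum limit are all hypothetical. The result names are written as plain strings borrowed from the Level Zero specification, and the order of the checks is an assumption.

```python
from typing import Optional

VALID_FLAGS_MASK = 0x3  # hypothetical: only the two lowest flag bits are defined
MAX_MODE_ENUM = 2       # hypothetical: highest defined value of an enum parameter


def validate_create(h_context: int, desc: Optional[dict],
                    flags: int, mode: int) -> str:
    """Run the checks in order; the first failure decides the result."""
    if h_context == 0:
        return "ZE_RESULT_ERROR_INVALID_NULL_HANDLE"
    if desc is None:
        return "ZE_RESULT_ERROR_INVALID_NULL_POINTER"
    if flags & ~VALID_FLAGS_MASK:
        return "ZE_RESULT_ERROR_INVALID_ENUMERATION"
    if mode > MAX_MODE_ENUM:
        return "ZE_RESULT_ERROR_INVALID_ENUMERATION"
    # Only when every check passes would the driver API be called.
    return "ZE_RESULT_SUCCESS"
```

No state is kept between calls, which matches the description of this mode: each call is judged only on its own arguments.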
### `ZE_ENABLE_HANDLE_LIFETIME`
This mode maintains an internal mapping of each handle type to a state structure.
- When a handle is created, it is added to the map
- When a handle is destroyed, it is removed from the map
- When the application passes a handle as input, it is validated against the map
- Handles are checked to confirm they are properly destroyed
- Additional per-handle state checks are added as needed
  - Example: check whether a `ze_command_list_handle_t` is open or closed
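
The bookkeeping described above can be sketched as a small tracker. The class and method names are hypothetical, handles are modeled as integers, and the only per-handle state kept is an open/closed flag for command lists.

```python
class HandleLifetimeTracker:
    """Toy model of a handle -> state map."""

    def __init__(self):
        self._live = {}  # handle -> state dictionary

    def on_create(self, handle: int, kind: str) -> None:
        self._live[handle] = {"kind": kind, "open": True}

    def on_destroy(self, handle: int) -> bool:
        """Remove the handle; return False if it was not live."""
        return self._live.pop(handle, None) is not None

    def is_valid(self, handle: int) -> bool:
        return handle in self._live

    def on_close(self, handle: int) -> None:
        self._live[handle]["open"] = False

    def is_open(self, handle: int) -> bool:
        return self._live[handle]["open"]

    def still_live(self) -> list:
        """Handles that were created but never destroyed."""
        return sorted(self._live)


tracker = HandleLifetimeTracker()
tracker.on_create(0x10, "command_list")
tracker.on_create(0x20, "event")
tracker.on_close(0x10)
tracker.on_destroy(0x20)
```

After these calls, handle `0x20` is no longer valid as an input, and `0x10` is still live and marked closed.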
### `ZEL_ENABLE_EVENTS_CHECKER`
The Events Checker validates usage of events.
- It is designed to detect potential deadlocks that might occur due to improper event usage in the Level Zero API. It prints warning messages when it detects a potential deadlock.
- In some cases it may also detect an event that is used more than once without being reset, for example a single event that is signaled twice.
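
The second case, an event signaled twice without a reset in between, can be illustrated with a minimal sketch. The class name and the warning text are hypothetical and do not reproduce the checker's actual messages.

```python
class EventReuseChecker:
    """Toy model: warn when an event is signaled again before being reset."""

    def __init__(self):
        self._signaled = set()
        self.warnings = []

    def on_signal(self, event: str) -> None:
        if event in self._signaled:
            self.warnings.append(f"event {event} signaled again without reset")
        self._signaled.add(event)

    def on_reset(self, event: str) -> None:
        self._signaled.discard(event)


checker = EventReuseChecker()
checker.on_signal("ev0")
checker.on_signal("ev0")  # second signal without a reset: flagged
checker.on_reset("ev0")
checker.on_signal("ev0")  # fine, the event was reset first
```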
### `ZEL_ENABLE_BASIC_LEAK_CHECKER`
The basic leak checker tracks the Create and Destroy calls for each handle type and reports any mismatch between the two counts.
#### Sample Output
```
----------------------------------------------------------------------
             zeContextCreate = 1     \--->        zeContextDestroy = 1
        zeCommandQueueCreate = 1     \--->   zeCommandQueueDestroy = 1
              zeModuleCreate = 1     \--->         zeModuleDestroy = 1
              zeKernelCreate = 1     \--->         zeKernelDestroy = 1
           zeEventPoolCreate = 1     \--->      zeEventPoolDestroy = 1
zeCommandListCreateImmediate = 1     |
         zeCommandListCreate = 1     \--->    zeCommandListDestroy = 1     ---> LEAK = 1
               zeEventCreate = 2     \--->          zeEventDestroy = 2
               zeFenceCreate = 1     \--->          zeFenceDestroy = 1
               zeImageCreate = 0     \--->          zeImageDestroy = 0
             zeSamplerCreate = 0     \--->        zeSamplerDestroy = 0
            zeMemAllocDevice = 0     |
              zeMemAllocHost = 1     |
            zeMemAllocShared = 0     \--->               zeMemFree = 1
```
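
In the sample output, several create calls can share one destroy call: both `zeCommandListCreateImmediate` and `zeCommandListCreate` are balanced against `zeCommandListDestroy`. The counting can be sketched as follows. The mapping covers only a subset of the APIs shown, and the function name `find_leaks` is hypothetical.

```python
from collections import Counter

# Create call -> destroy call that releases it (subset taken from the sample).
DESTROY_FOR = {
    "zeCommandListCreate": "zeCommandListDestroy",
    "zeCommandListCreateImmediate": "zeCommandListDestroy",
    "zeEventCreate": "zeEventDestroy",
}


def find_leaks(calls):
    """Return {destroy_api: creates - destroys} for every unbalanced group."""
    counts = Counter(calls)
    created = Counter()
    for create_api, destroy_api in DESTROY_FOR.items():
        created[destroy_api] += counts[create_api]
    return {d: n - counts[d] for d, n in created.items() if n != counts[d]}


calls = [
    "zeCommandListCreateImmediate", "zeCommandListCreate", "zeCommandListDestroy",
    "zeEventCreate", "zeEventCreate", "zeEventDestroy", "zeEventDestroy",
]
leaks = find_leaks(calls)
```

For this call sequence the command list group has two creates and one destroy, which corresponds to the `LEAK = 1` line in the sample.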
### `ZE_ENABLE_THREADING_VALIDATION` (Not Yet Implemented)
Validates:
- Objects are not concurrently reused in free-threaded API calls
### `ZEL_ENABLE_CERTIFICATION_CHECKER`
When this mode is enabled, the certification checker validates API usage against the version supported by the driver or an explicitly specified version.
If an API is used that was introduced in a version higher than the supported version, the checker will return `ZE_RESULT_ERROR_UNSUPPORTED_VERSION`.
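
The version comparison can be sketched as below. The API names and the versions they were introduced in are hypothetical placeholders, versions are modeled as `(major, minor)` tuples, and the result names are plain strings.

```python
from typing import Tuple

Version = Tuple[int, int]  # (major, minor)

# Hypothetical table: API name -> version in which it was introduced.
INTRODUCED_IN = {
    "zeExampleBaseApi": (1, 0),
    "zeExampleNewerApi": (1, 9),
}


def check_api(api: str, supported: Version) -> str:
    """Reject APIs introduced after the supported (or explicitly chosen) version."""
    if INTRODUCED_IN[api] > supported:  # tuples compare major first, then minor
        return "ZE_RESULT_ERROR_UNSUPPORTED_VERSION"
    return "ZE_RESULT_SUCCESS"
```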
### `ZEL_ENABLE_SYSTEM_RESOURCE_TRACKER_CHECKER` (Linux Only)
The System Resource Tracker monitors both Level Zero API resources and system resources in real-time. It tracks:
- **L0 Resources**: Contexts, command queues, modules, kernels, event pools, command lists, events, fences, images, samplers, and memory allocations
- **System Metrics**: Virtual memory (VmSize, VmRSS, VmData, VmPeak), thread count, file descriptors
- **Deltas**: Resource changes for each API call
- **Cumulative Totals**: Running summaries of all resource types
The tracker can log to the Level Zero debug log and optionally export data to CSV for graphing and analysis:
```bash
export ZE_ENABLE_VALIDATION_LAYER=1
export ZEL_ENABLE_SYSTEM_RESOURCE_TRACKER_CHECKER=1
export ZEL_SYSTEM_RESOURCE_TRACKER_CSV=tracker_output.csv # Optional: enable CSV export
export ZEL_ENABLE_LOADER_LOGGING=1
export ZEL_LOADER_LOGGING_LEVEL=debug
```
**CSV Output Features:**
- Per-process unique filenames (PID appended automatically)
- 22 columns of metrics including timestamps, system resources, L0 resource counts, and deltas
- Atomic line writes for thread safety
- Companion Python plotting script (`scripts/plot_resource_tracker.py`) for visualization
**Use Cases:**
- Performance analysis and memory leak detection
- Resource lifecycle tracking and optimization
- Debugging and benchmarking
- CI/CD integration for automated resource monitoring
**Platform Support:** This checker is Linux-only and uses `/proc/self/status` for system metrics. It is automatically excluded from Windows and macOS builds.
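
For reference, the memory and thread fields named above can be read from `/proc/self/status` with a few lines of Python. This is an independent sketch of parsing that file, not the tracker's implementation. The sample text is made up, and `parse_status` is a hypothetical helper.

```python
WANTED = {"VmSize", "VmRSS", "VmData", "VmPeak", "Threads"}


def parse_status(text: str) -> dict:
    """Extract selected fields from /proc/<pid>/status content.

    Memory fields are reported by the kernel in kB; Threads is a plain count.
    """
    metrics = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key in WANTED:
            metrics[key] = int(value.split()[0])
    return metrics


# Made-up sample in the format of /proc/self/status.
SAMPLE = (
    "Name:\tapp\n"
    "VmPeak:\t  204800 kB\n"
    "VmSize:\t  102400 kB\n"
    "VmRSS:\t    5120 kB\n"
    "VmData:\t    2048 kB\n"
    "Threads:\t4\n"
)
metrics = parse_status(SAMPLE)

# On Linux, the live values can be read with:
#   with open("/proc/self/status") as f:
#       live = parse_status(f.read())
```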
See [System Resource Tracker documentation](checkers/system_resource_tracker/system_resource_tracker.md) for detailed usage and CSV format.
## Testing
There is a small set of negative test cases designed to test the validation layer in the [Level Zero tests repo](https://github.com/oneapi-src/level-zero-tests/tree/master/negative_tests).
We would like to add new unit tests directly to the validation layer repo that run against the null driver and have no additional dependencies. Help Wanted!
## Contributing
See [CONTRIBUTING](CONTRIBUTING.md) for more information.