1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983
|
========================================================================
Release Notes for Intel(R) Multi-Buffer Crypto for IPsec Library
v2.0 October 2024
======================================================================
General
- OpenSSF scorecard badge added.
- YASM support removed.
- CMake library only build option added.
- CET support added to CMake build.
- Replaced Makefiles with CMake as default build system.
- Man pages installation path fixed.
- Improved CMake project definitions and installation paths.
- Added FreeBSD CMake builds to workflows.
- Updated style check to clang-format version 18.
- Marked direct API for wireless algorithms (KASUMI, SNOW3G and ZUC) as deprecated,
to be removed in the next release.
Library
- AES-GCM changes
- Reduced binary size of AVX512 type 2 and AVX2 type 1 code by re-using internal GHASH functions.
- Optimized small packets for AVX512 type 2 (1 to 256 bytes).
- Removed specialized AVX512 type 1 and AVX2 type 1 is used instead.
- Implemented multiply reduce optimization for GHASH AVX2 type 1.
- Slightly improved large buffer performance for AVX2 type 1.
- Added new AVX2 type 2 implementation.
- DES, 3DES/TDES and DES-DOCSIS binary size reduction.
- reduced stack frame size for DES and DES-DOCSIS.
- re-used common transpose macros in the implementation.
- Fixed LFSR update in single buffer ZUC API implementation.
- SM4 changes:
- Added SM4-CTR and SM4-GCM SSE implementations.
- Added AVX2-SM4-NI implementation for SM4-GCM, SM4-CTR, SM4-CBC and SM4-ECB.
- SHA2-512/384 changes:
- Added SHA2-512/384 update AVX2-SHA512-NI single-buffer implementation.
- Added SHA2-512/384 and HMAC-SHA2-512/384 AVX2-SHA512-NI x2 multi-buffer implementations.
- Added SM3 and SM3-HMAC SM3-NI implementations.
- Added AES-CFB SSE type 1 and AVX512 type 2 implementations.
- Removed features:
- Removed AESNI emulation support.
- Removed AVX Type 2 implementation.
- Removed AES-CMAC, AES-CCM, AES-CBC and AES-ECB x4 and by4 implementations from SSE type 1.
- They are replaced with x8 and by8 implementations from SSE type 3.
- Removed AVX type 1 implementations: SHA/MD5, CHACHA20-POLY1305, SNOW3G and KASUMI.
- Moved remaining AVX type 1 implementations into AVX2 type 1.
- Removed AVX architecture type.
- Changed SHA1 on AVX2 type 4 architecture to use multi-buffer implementation.
- Added check for XSAVE and OSXSAVE CPUID features for any AVX architecture type.
- Extended cipher burst API support with: AES-ECB, AES-CFB.
- Extended hash burst API support with: SHA1, SHA2-384/512, AES-CMAC.
- Added AEAD burst API with AES-CCM support.
- Added new API to retrieve optimal minimum burst size for hash, cipher and AEAD API's.
Test Applications
- Reduced false positive hit ratio in the cross validation safe check mode.
- Improved performance of safe check pattern search in the cross validation tool.
- Added new test vectors to KAT application for AES-CFB, SM4-CTR and SM4-GCM.
- Added new multi-process test to exercise active-passive scenarios.
- Removed Makefile support.
- Removed AVX architecture type.
- Added tests for AES-CFB.
- Added burst API tests for SHA1, SHA2, AES-ECB, AES-CFB, AES-CMAC and AES-CCM.
- Added AES-CFB to ACVP application.
- Extended ctest infrastructure with improved test granularity.
Performance Applications
- Removed Makefile support.
- Removed AVX architecture type.
- Added display of time-box and measurement mode details at start.
- Added burst API tests for SHA1, SHA2, AES-ECB, AES-CFB, AES-CMAC and AES-CCM.
- Added new throughput test mode option to `imb-perf`.
- It works together with new set time box option to report throughput for selected period of time.
- Added `imb-speed.py` tool that mimics `openssl speed`.
Example Applications
- Removed Makefile support.
v1.5 November 2023
=======================================================================
General
- CMake MinGW support added.
Library
- QUIC CHACHA20-POLY1305 and CHACHA20 HP API added.
- AVX2-VAES AES-CTR implementation added.
- SM4-ECB SSE implementation added.
- SM4-CBC SSE implementation added.
- x86-64 SM3 and SM3-HMAC implementation added.
- Self-Test callback functionality added with message corrupt option.
- Implemented AES-GCM with VAES AVX2.
- Implemented AES-CTR with VAES AVX2.
- Implemented a workaround for false load-block condition in SSE AES-CBC implementations.
- Optimized CRC32 algorithms.
- Optimized AES-GCM AVX2 and AVX512 implementations.
Test Applications
- QUIC CHACHA20-POLY1305 and CHACHA20 HP tests added.
- SM4-ECB and SM4-CBC tests added.
- SM3 and SM3-HMAC tests added.
- Self-Test callback support added to the KAT-APP.
- Updated ACVP app (imb-acvp) to support libacvp v2.0+.
- Test vector standardized for various algorithms (CBC/CFB/CTR/ECB/DES/GCM/CCM/CHACHA20-POLY/SNOW3G/ZUC/KASUMI/SNOW-V).
- Extended xvalid app to test burst API.
Performance Applications
- New parameter added to benchmark QUIC `--quic-api`.
- Burst API is benchmarked by default now.
- SM4-ECB and SM4-CBC support added.
- SM3 and SM3-HMAC support added.
v1.4 June 2023
========================================================================
General
- Experimental CMake support for Linux, FreeBSD and Windows added
Library
- POLY1305 AVX2 with AVX-IFMA instructions added.
- Optimized GHASH component in AVX512 VAES (type2) AES-GCM implementation.
- Implemented a workaround for false load-block condition in SSE and AVX2 AES-GCM implementations.
- Removed AVX AES-GCM implementation, its API symbols map to the SSE implementation.
- QUIC AES-ECB header protection API added.
- QUIC AES-GCM-128/256 AEAD API added.
- Removed v0.53 (and older) compatibility symbol mapping (NO_COMPAT_IMB_API_053 not defined).
- ZUC AVX2-GFNI implementation added.
- SHA-NI instructions enabled to use in SHA1/224/256 direct API
- New API (imb_set_session) added to be used with burst API, helping speeding up the crypto scheduling.
- New API added to calculate IPAD/OPAD for SHAx-HMAC.
- New direct API added to calculate DES-CFB and AES-CFB-256 on a single block.
Test Applications
- ACVP test application extended to support: AES-ECB and 3DES-CBC.
- Added sample applications showcasing how to use the new burst API.
- CMake support added, including ability to run tests with it.
- Extended fuzzing app to cover remaining direct APIs.
- Test vector standardized for various algorithms (SHA/XCBC/POLY1305/CMAC/GMAC/GHASH/HMAC-SHAx/MD5).
- Changed `test` directory structure and test application names. Each test application has its own subdirectory.
Performance Application
- New parameter added to benchmark crypto on unaligned buffers
- Renamed performance application to `imb-perf`
Fixes
- Fixed MB_MGR initialization corruption (issue #115)
- Fixed performance scaling issue resulting from the misuse of global `errno` variable (issue #112)
v1.3 September 2022
========================================================================
Library
- ZUC-EIA3-256 8-byte and 16-byte tag support added for SSE, AVX, AVX2 and AVX512
- AES-ECB AVX512-VAES implementation added
- AES-ECB optimizations for AVX and SSE
- AES-ECB AVX2-VAES implementation added
- JOB API GHASH support added
- SHA1/224/256/384/512 multi-buffer implementation added
- Multi-buffer SHA1, SHA224 and SHA256 use SHANI if available
- Synchronous cipher and hash burst API added
- cipher API only supports AES-CBC and AES-CTR
- hash API only supports HMAC-SHA1, HMAC-224, HMAC-256, HMAC-384 and HMAC-512
- Asynchronous burst API added that supports all cipher and hash modes
- SNOW3G-UEA2 SSE multi-buffer implementation added
- SNOW3G-UIA2 SSE multi-buffer initialization and key-stream generation added
- SNOW3G-UEA2 and SNOW3G-UIA2 SSE implementation used in JOB API for
AVX and AVX2 architectures
- API documentation added (doxygen generated)
- New SGL job API (AES-GCM and CHACHA20-POLY1305 only)
- Enforced EVEX PMADD52 encoding in AVX512 code
- Restructured reset flow of architecture managers
- SSE, AVX, AVX2 and AVX512 managers were split to better cover different types
- Added library self-test functionality
- enbranch64 not emitted on Windows builds (CET related)
- use SHANI extensions in AVX2 type-2 and AVX type-2 for SHA224, HMAC-SHA224,
SHA256 and HMAC-SHA256
- use SHANI extensions in AVX type-2 for SHA1, HMAC-SHA1
- no-GFNI option added to help with testing
- single buffer SHANI implementation of SHA1 and SHA256 added
- single buffer SHANI implementation used in HMAC-SHA1, HMAC-SHA224 and
HMAC-SHA256 flush operation
Test Applications
- GHASH JOB API support added in the test application, fuzzing and xvalid tools
- Burst API support added for supported algorithms
- ACVP test application extended to support: AES-GCM, AES-GMAC, AES-CCM,
AES-CBC, AES-CTR, AES-CMAC, SHA1, SHA224, SHA256, SHA384, SHA512, HMAC-SHA1,
HMAC-SHA224, HMAC-SHA256, HMAC-SHA384, HMAC-SHA512
- Cross validation (xvalid) tool improvements in pattern search functionality
- FreeBSD added to github CI
- Added AVX-SSE transition check to the cross validation tool (xvalid)
- Wycheproof AES-GCM, AES-CCM, CHACHA20-POLY1305, AES-CMAC, AES-GMAC, HMAC-SHA1,
HMAC-SHA224, HMAC-SHA256, HMAC-SHA384 and HMAC-SHA512 test vectors added
to a new test tool
- no-GFNI option added
- Fuzzing application extended to cover new burst API's
Performance Application
- GHASH support added (through JOB and direct API)
- CHACHA20-POLY1305 support through direct API
- Support added for SHA1/224/256/384/512
- Burst API support added for supported algorithms
- SGL support added (AES-GCM and CHACHA20-POLY1305 only)
- no-GFNI option added
Fixes
- Fixed 23-byte IV expansion for ZUC-256 (issue #102)
- Fixed incorrect 8-buffer SNOW3G key-stream generation (issue #104)
- Numerous AVX-SSE transition fixes with SAFE_OPTIONS=n
- [ZUC-EIA3] allow unaligned digest load/stores
- AES-CCM authentication flush may load out of scope data (issue #107)
- AES-CMAC authentication flush may load out of scope data (similar to issue #107)
v1.2 February 2022
========================================================================
General
- Windows CET support
- Disabled AESNI emulation support by default and make it optional
Library
- Generation of PDB in release build on Windows added
- SAFE_OPTIONS option added to unify SAFE_DATA, SAFE_PARAM, SAFE_LOOKUP options
Test Applications
- Extended invalid IV length tests
- Test application improvements added
- Fuzz testing tool improvements added
- Auto-generation of direct API invalid parameters tests added
- ACVP test application added
Performance Application
- GCM support for SGL API added
Fixes
- Fixed incorrect job length calculation in CBCS encryption
- Fixed FreeBSD build (intel/intel-ipsec-mb#94)
- Added missing checks for HMAC IPAD and opad
- Added missing checks for XCBC K1, K2 and K3
v1.1 October 2021
========================================================================
General
- Added support to build with Mingw-w64 on Windows
Library
- PON algorithm AVX512-VAES implementation added
- SNOW3G-UIA2 AVX512 and AVX512-VPCLMULQDQ implementations added
- SNOW3G-UEA2 AVX512 and AVX512-VAES implementations added
- SNOW-V AVX implementation added
- ZUC optimizations for AVX512
- GCM optimizations for AVX512
- Poly1305 optimizations for AVX512-VAES
- Improved error code handling (missing mainly on assembly modules)
- ZUC-256 23-byte IV support added
Test Applications
- Error handling tests added (for job and direct API)
- Fuzz testing added
v1.0 April 2021
========================================================================
General
- Top level `lib` directory tidy up
- build scripts and header file left at the top level
- `lib/x86_64` directory created
- files requiring compilation moved from `lib/include`
- Symbols not stripped from static library at installation
- API name changes and unification
- mapping provided for backwards compatibility
- NASM version check in the build script
- CET enabling in the build scripts
Library
- CET enabling (endbranch opcodes added)
- ZUC-EIA3-256 support for SSE, AVX, AVX2 and AVX512 (VAES)
- 4 byte tag length only
- Chacha20 optimizations for SSE, AVX and AVX2
- ZUC-EEA3-256 support for SSE, AVX, AVX2 and AVX512 (VAES)
- SNOW-V and SNOW-V-AEAD support for SSE
- Poly1305 AVX512 and AVX512-IFMA implementations added
- Chacha20-Poly1305 AEAD implementations extended to AVX512 and AVX512-IFMA
- CBCS AVX512 optimizations
- Extended CBCS to return last cipher block to maintain context between calls
- AVX/SSE transition fixes
- Added SGL support for AEAD Chacha20-Poly1305
- Poly1305 minor optimization in the scalar code
- GHASH API change
- IFMA CPU feature detection
- SGL support added for AES-GCM through job API
- Added CRC functions through job API
Test Applications
- ZUC-EEA3-256 tests added to test and xvalidation applications
- SNOW-V and SNOW-V-AEAD tests added to test and xvalidation applications
- IMIX support added to the xvalidation application
- AEAD Chacha20-Poly1305 tests added
- SGL tests added for AES-GCM through job API
- CRC function tests added through job API
Performance Application
- ZUC-EEA3-256 support added
- SNOW-V and SNOW-V-AEAD support added
- AEAD Chacha20-Poly1305 support added
- Created `ipsec_perf_tool.py` to run multiple `ipsec_perf`
instances at the same time
- DOCSIS cipher combined with CRC32 treated as AEAD algorithm
- CRC functions added
v0.55 October 2020
========================================================================
General
- Restructured project to move all library code into new 'lib' directory
- Renamed LibPerfApp directory to perf
- Renamed LibTestApp directory to test
Library
- AES-CCM-256 implementation for SSE, AVX and AVX512 (VAES)
- AES-CMAC-256 implementation for SSE, AVX and AVX512 (VAES)
- 32bit and 64bit HEC compute API added
- AES-GMAC direct API added to support Scatter-Gather list (SGL)
- CALC_AAD_HASH macro improved for AVX512 (VAES), boosting performance
for AES-GMAC, GHASH and hash calculation for AAD in AES-GCM
- ZUC-EEA3 and ZUC-EIA3 Multi-buffer implemented for SSE using
GFNI instructions.
- AES-XCBC-128 implementation for AVX512 (VAES)
- AES-CBCS-128 implementation for SSE and AVX (1:9 crypt:skip pattern)
- Chacha20 SSE, AVX and AVX512 implementations
- Automatic multi-buffer manager initialization API added
- Error handling API added
- Build with SAFE_DATA and SAFE_PARAM options by default
- Poly1305 scalar implementation
- AEAD Chacha20-Poly1305 implementation
- CRC implementation for RNC, LTE, WiMAX, SCTP, Ethernet and CRC16 CCIT
Test Applications
- CCM tests extended to test AES-CCM-256
- CMAC tests extended to test AES-CMAC-256
- HEC tests added to test app
- AES-GMAC SGL tests added to test app
- AES-XCBC-128 tests added to test app
- AES-CBCS-128 tests added to test app
- AES-CBCS-128 support added to xvalidation app
- Chacha20 tests added to test and xvalidation app
- Poly1305 tests added to test and xvalidation app
- AEAD Chacha20-Poly1305 tests added to test and xvalidation app
- CRC tests added
- Automatic architecture detection done by default
Performance Application
- AES-CCM-256 support added
- AES-CMAC-256 support added
- AES-CBCS-128 support added
- Chacha20 support added
- Poly1305 support added
- AEAD Chacha20-Poly1305 support added
v0.54 April 2020
========================================================================
Library
- ZUC-EEA3 and ZUC-EIA3 algorithms added in job API (using cipher mode
IMB_CIPHER_ZUC_EEA3 and hash_alg IMB_AUTH_ZUC_EIA3_BITLEN)
- SNOW3G-UEA2 and SNOW3G-UIA2 algorithms added in job API (using cipher type
IMB_CIPHER_SNOW3G_UEA2_BITLEN and hash type IMB_AUTH_SNOW3G_UIA2_BITLEN)
- AVX512 implementation of stitched DOCSIS cipher with CRC32 calculations
- KASUMI-UEA1 and KASUMI-UIA1 algorithms added in job API (using cipher type
IMB_CIPHER_KASUMI_UEA1_BITLEN and hash type IMB_AUTH_KASUMI_UIA1)
- New GHASH API added
- ZUC-EIA3 Multi-buffer API added and implemented for SSE and AVX.
- Added support for any IV size in AES-GCM, through the job API and new
direct API
- Check for new flag NO_COMPAT_IMB_API_053, which exposes only new API,
removing backwards compatibility with version v0.53.
- AES-CMAC implementation for VAES added
- SSE AES128-CTR, AES192-CTR, AES256-CTR and AES128-CCM by8 cipher
implementations added
- Added by8/x8 implementations of SSE AES128-CBC, AES192-CBC, AES256-CBC,
DOCSIS SEC BPI, AES-CCM and AES-CMAC.
- AES256-DOCSIS algorithm added.
- ZUC-EEA3 and ZUC-EIA3 Multi-buffer implemented for AVX2 and AVX512.
- ZUC-EEA3 and ZUC-EIA3 Multi-buffer implemented for AVX2 and AVX512,
using this latter GFNI and VAES instructions where these are present
in the CPU.
- Minimum required version for NASM is now 2.14.
- ZUC-EEA3 and ZUC-EIA3 Multi-buffer implemented with AESNI emulation instructions.
- SNOW3G-UIA2 and SNOW3G-UEA2 reimplemented for increased security and performance.
- AES-CBC improvement for VAES
- AES-CCM implementation for VAES added
LibTestApp
- Extended ZUC tests to validate ZUC-EEA3 and ZUC-EIA3 algorithms through
job API
- Extended SNOW3G tests to validate SNOW3G-UEA2 and SNOW3G-UIA2 algorithms
through job API
- Extended DOCSIS tests with combined CRC32 calculation cases
- Extended KASUMI tests to validate KASUMI-UEA1 and KASUMI-UIA1 algorithms
through job API
- Extended ZUC tests to validate ZUC-EIA3 multi-buffer implementation
through direct and job API
- Extended AES-DOCSIS tests with 256-bit keys
LibPerfApp
- Added support for ZUC-EEA3 and ZUC-EIA3 algorithms
- Added support for SNOW3G-UEA2 and SNOW3G-UIA2 algorithms
- Added support for DOCSIS combined with CRC32
- Added support for KASUMI-UEA1 and KASUMI-UIA1 algorithms
v0.53 October 2019
========================================================================
Library
- AES-CCM performance optimizations done
- full assembly implementation
- authentication decoupled from cipher
- CCM chain order expected to be HASH_CIPHER for encryption and
CIPHER_HASH for decryption
- AES-CTR implementation for VAES added
- AES-CBC implementation for VAES added
- Single buffer AES-GCM performance improvements added for VPCLMULQDQ + VAES
- Multi-buffer AES-GCM implementation added for VPCLMULQDQ + VAES
- Data transposition optimizations and unification across the library
implemented
- Generation of make dependency files for Linux added
- AES-ECB implementation added
- PON specific stitched algorithm implementation added
- stitched AES-CTR-128 (optional) with CRC32 and BIP (running 32-bit XOR)
- AES-CMAC-128 implementation for bit length messages added
- ZUC-EEA3 and ZUC-EIA3 implementation added
- FreeBSD experimental support added
- KASUMI-F8 and KASUMI-F9 implementation added
- SNOW3G-UEA2 and SNOW3G-UIA2 implementation added
- AES-CTR implementation for bit length (128-NEA2/192-NEA2/256-NEA2) messages added
- SAFE_PARAM, SAFE_DATA and SAFE_LOOKUP compile time options added.
Find more about these options in the README file or on-line at
https://github.com/intel/intel-ipsec-mb/blob/master/README.
LibTestApp
- New API tests added
- CMAC test vectors extended
- New chained operation tests added
- Out-of-place chained operation tests added
- AES-ECB tests added
- PON algorithm tests added
- Extra AES-CTR test vectors added
- Extra AES-CBC test vectors added
- AES-CMAC-128 bit length message tests added
- CPU capability detection used to disable tests if instruction not present
- ZUC-EEA3 and ZUC-EIA3 tests added
- New cross architecture test application (ipsec_xvalid) added,
which mixes different implementations (based on different architectures),
to double check their correctness
- SNOW3G-UEA2 and SNOW3G-UIA2 tests added
- AES-CTR-128 bit length message tests added
- Negative tests extended to cover all API's
LibPerfApp
- Job size and number of iterations options added
- Single architecture test option added
- AAD size option added
- Allow zero length source buffer option added
- Custom performance test combination added:
cipher-algo, hash-algo and aead-algo arguments.
- Cipher direction option added
- The maximum buffer size extended from 2K to 16K
- Support for user defined range of job sizes added
Fixes
- Uninitialized memory reported by Valgrind fixed
- Flush decryption job fixed (issue #33)
- NULL_CIPHER order check removed (issue #30)
- Save XMM registers when emulating AES fixed (issue #28)
- SSE & AVX AES-CMAC fixed (issue #27)
- Missing GCM pointers fixed for AES-NI emulation (issue #29)
v0.52 December 2018
========================================================================
03 Dec, 2018
General
- Added AESNI emulation implementation
- Added AES-GCM multi-buffer implementation for AVX512
- Added flexible job chain order support
- GCM submit and flush functions moved into architecture MB manager modules
- AVX512/AVX2/AVX/SSE AAD GHASH computation performance improvement
- GCM API's added to MB_MGR structure
- Added plain SHA support in JOB API
- Added architectural compiler optimizations for GCC/CC
LibTestApp
- Added option not to run GCM tests
- Added AESNI emulation tests
- Added plain SHA tests
- Updated to take advantage of new GCM macros
LibPerfApp
- Buffer alignment update
- Updated to take advantage of new GCM macros
v0.51 September 2018
========================================================================
13 Sep, 2018
General
- AES-CMAC performance optimizations
- Implemented store to load optimizations in
- AES-CMAC submit and flush jobs for SSE and AVX
- HMAC-MD5, HMAC-SHA submit jobs for AVX
- HMAC-MD5 submit job for AVX2
- Added zero-sized message support in GCM
- Stack execution flag disabled in new asm modules
LibTestApp
- Added AES vectors
- Added DOCSIS AES vectors
- Added CFB validation
LibPerfApp
- Smoke test option added
v0.50 June 2018
========================================================================
13 Jun, 2018
General
- Added support for compile time and runtime library version checking
- Added support for full MD5 digest size
- Replaced defines for API with symbols for binary compatibility
- Added HMAC-SHA & HMAC-MD5 vectors to LibTestApp
- Added support for zero cipher length in AES-CCM
- Added new API's to compute SHA1, SHA224, SHA256, SHA384 and SHA512 hashes
to support key reduction cases where key is longer than a block size
- Extended support for HMAC full digest sizes for HMAC-SHA1, HMAC-SHA224,
HMAC-SHA256, HMAC-SHA384 and HMAC-SHA512. Previously only truncated sizes
were supported.
- Added AES-CMAC support for output digest size between 4 and 16 bytes
- Added GHASH support for output digest size up to 16 bytes
- Optimized submit job API's with store to load optimization in SSE, AVX,
AVX2 (excluding MD5)
- Improved performance application accuracy by increase number of
test iterations
- Extended multi-thread features of LibPerfApp Windows version to match
Linux version of the application
v0.49 March 2018
========================================================================
21 Mar, 2018
General
- AES-CMAC support added (AES-CMAC-128 and AES-CMAC-96)
- 3DES support added
- Library compiles to SO/DLL by default
- Install/uninstall targets added to makefiles
- Multiple API header files consolidated into one (intel-ipsec-mb.h)
- Unhalted cycles support added to LibPerfApp (Linux at the moment)
- ELF stack execute protection added for assembly files
- VZEROUPPER instruction issued after AVX2/AVX512 code to avoid
expensive SSE<->AVX transitions
- MAN page added
- README documentation extensions and updates
- AVX512 DES performance smoothed out
- Multi-buffer manager instance allocate and free API's added
- Core affinity support added in LibPerfApp
v0.48 December 2017
========================================================================
12 Dec, 2017
General
- Linux SO compilation option added
- Windows DLL compilation option added
- AES CCM 128 support added
- Multithread command line option added to LibPerfApp
- Coding style fixes
- Coding style target added to Makefile
v0.47 October 2017
========================================================================
Oct 5, 2017
Intel(R) AVX-512 Instructions
- DES CBC AVX512 implementation
- DOCSIS DES AVX512 implementation
General
- DES CBC cipher added (generic x86 implementation)
- DOCSIS DES cipher added (generic x86 implementation)
- DES and DOCSIS DES tests added
- RPM SPEC file created
v0.46 June 2017
========================================================================
Jun 27, 2017
General
- AES GCM optimizations for AVX2
- Change of AES GCM API: renamed and expanded keys separated from the context
- New AES GCM API via job structure and API's
- use of the interface may simplify application design at the expense of
slightly lower performance vs direct AES GCM API's
- AES GCM IV automatically padded with block counter (no need for application to do it)
- IV in AES CTR mode can be 12 bytes (no block counter); 16 byte format still allowed
- Macros added to ease access to job API for specific architecture
- use of these macros can simplify application design but it may produce worse
performance than calling architecture job API's directly
- Submit_job_nocheck() API added to gain some cycles by not validating job structure
- Result stability improvements in LibPerfApp
v0.45 March 2017
========================================================================
Mar 29, 2017
Intel(R) AVX-512 Instructions
- Added optimized HMAC-SHA224 and HMAC-SHA256
- Added optimized HMAC-SHA384 and HMAC-SHA512
General
- Windows x64 compilation target
- New DOCSIS SEC BPI V3.1 cipher
- GCM128 and GCM256 updates (with new API that is scatter gather list friendly)
- GCM192 added
- Added library API benchmark tool 'ipsec_perf' and
script to compare results 'ipsec_diff_tool.py'
Bug Fixes (vs v0.44)
- AES CTR mode fix to allow message size not to be multiple of AES block size
- RSI and RDI registers clobbered when running HMAC-SHA224 or HMAC-SHA256
on Windows using SHA extensions
v0.44 November 2016
========================================================================
Nov 21, 2016
Intel(R) AVX-512 Instructions
- AVX512 multi buffer manager added (uses AVX2 implementations by default)
- Optimized SHA1 implementation added
Intel(R) SHA Extensions
- SHA1, SHA224 and SHA256 implementations added for Intel(R) SSE
General
- NULL cipher added
- NULL hash added
- NASM tool chain compilation added (default)
=======================================
Feb 11, 2015
Fixed, so that the job auth_tag_output_len_in_bytes takes a different
value for different MAC types. In particular, the valid values are(in bytes):
SHA1 - 12
sha224 - 14
SHA256 - 16
sha384 - 24
SHA512 - 32
XCBC - 12
MD5 - 12
=======================================
Oct 24, 2011
SHA_256 added to multibuffer
------------------------
12 Aug 2011
API
The GCM API is distinct from the Multi-buffer API. This is because
the GCM code is an optimized single-buffer implementation. By
packaging them separately, the application has the option of where,
when, and how to call the GCM code, independent of how it is calling
the multi-buffer code.
For example, the application might be enqueuing multi-buffer requests
for a separate thread to process. In this scenario, if a particular
packet used GCM, then the application could choose whether to call
the GCM routines directly, or whether to enqueue those requests and
have the compute thread call the GCM routines.
GCM API
The GCM functions are defined as described the the header
files. They are simple computational routines, with no state
associated with them.
Multi-Buffer API: Two Sets of Functions
There are two parallel interfaces, one suffixed with "_sse" and one
suffixed with "_avx". These are functionally equivalent. The "_sse"
functions work on WSM and later processors. The "_avx" functions
offer better performance, but they only run on processors after WSM.
The same interface object structures are used for both sets of
interfaces, although one cannot mix the two interfaces on the same
initialized object (e.g. it would be wrong to initialize with
init_mb_mgr_sse() and then to pass that to submit_job_avx() ). After
the MB_MGR structure has been initialized with one of the two
initialization functions (init_mb_mgr_sse() or init_mb_mgr_avx()),
only the corresponding functions should be used on it.
There are several ways in which an application could use these
interfaces.
1) Direct
If an application is only going to be run on a post-WSM machine,
it can just call the "_avx" functions directly. Conversely, if it
is just going to be run on WSM machines, it can call the "_sse"
functions directly.
2) Via Branches
If an application can run on both WSM and SNB and wants the
improved performance on SNB, then it can use some method to
determine if it is on SNB, and then use a conditional branch to
determine which function to call. E.g. this could be wrapped in a
macro along the lines of:
#define submit_job(mb_mgr) \
if (_use_avx) submit_job_avx(mb_mgr); \
else submit_job_sse(mb_mgr)
3) Via a Function Table
One can embed the function addresses into a structure, call them
through this structure, and change the structure based on which
set of functions one wishes to use, e.g.
struct funcs_t {
init_mb_mgr_t init_mb_mgr;
get_next_job_t get_next_job;
submit_job_t submit_job;
get_completed_job_t get_completed_job;
flush_job_t flush_job;
};
funcs_t funcs_sse = {
init_mb_mgr_sse,
get_next_job_sse,
submit_job_sse,
get_completed_job_sse,
flush_job_sse
};
funcs_t funcs_avx = {
init_mb_mgr_avx,
get_next_job_avx,
submit_job_avx,
get_completed_job_avx,
flush_job_avx
};
funcs_t *funcs = &funcs_sse;
...
if (do_avx)
funcs = &funcs_avx;
...
funcs->init_mb_mgr(&mb_mgr);
For simplicity in the rest of this document, the functions will be
referred to no suffix.
API: Overview
The basic unit of work is a "job". It is represented by a
JOB_AES_HMAC structure. It contains all of the information needed to
perform encryption/decryption and SHA1/HMAC authentication on one
buffer for IPSec processing.
The basic paradigm is that the application needs to be able to
provide new jobs before old jobs have completed processing. One
might call this an "asynchronous" interface.
The basic interface is that the application "submits" a job to the
multi-buffer manager (MB_MGR), and it may receive a completed job
back, or it may receive NULL. The returned job, if there is one,
will not be the same as the submitted job, but the jobs will be
returned in the same order in which they are submitted.
Since there can be a semi-arbitrary number of outstanding jobs,
management of the job object is handled by the MB_MGR. The
application gets a pointer to a new job object by calling
get_next_job(). It then fills in the data fields and submits it by
calling submit_job(). If a job is returned, then that job has been
completed, and the application should do whatever it needs to do in
order to further process that buffer.
The job object is not explicitly returned to the MB_MGR. Rather it
is implicitly returned by the next call to get_next_job(). Another
way to put this is that the data within the job object is
guaranteed to be valid until the next call to get_next_job().
In order to reduce latency, there is an optional function that may
be called, get_completed_job(). This returns the next job if that
job has previously been completed. But if that job has not been
completed, no processing is done, and the function returns
NULL. This may be used to reduce the number of outstanding jobs
within the MB_MGR.
At times, it may be necessary to process the jobs currently within
the MB_MGR without providing new jobs as input. This process is
called "flushing", and it is invoked by calling flush_job(). If
there are any jobs within the MB_MGR, this will complete processing
on the earliest job and return it. It will only return NULL if there
are no jobs within the MB_MGR.
Flushing will be described in more detail below.
The presumption is that the same AES key will apply to a number of
buffers. For increased efficiency, it requires that the AES key
expansion happens as a distinct step apart from buffer
encryption/decryption. The expanded keys are stored in a data
structure (array), and this expanded key structure is used by the
job object.
There are two variants provided, MB_MGR and MB_MGR2. They are
functionally equivalent. The reason that two are provided is that
they differ slightly in their implementation, and so they may have
slightly different characteristics in terms of latency and overhead.
API: Usage Skeleton
The basic usage is illustrated in the following pseudo_code:
init_mb_mgr(&mb_mgr);
...
aes_keyexp_128(key, enc_exp_keys, dec_exp_keys);
...
while (work_to_be_done) {
job = get_next_job(&mb_mgr);
// TODO: Fill in job fields
job = submit_job(&mb_mgr);
while (job) {
// TODO: Complete processing on job
job = get_completed_job(&mb_mgr);
}
}
API: Job Fields
The mode is determined by the fields "cipher_direction" and
"chain_order". The first specifies encrypt or decrypt, and the
second specifies whether whether the hash should be done before or
after the cipher operation.
In the current implementation, only two combinations of these are
supported. For encryption, these should be set to "ENCRYPT" and
"CIPHER_HASH", and for decryption, these should be set to "DECRYPT"
and "HASH_CIPHER".
The expanded keys are pointed to by "aes_enc_key_expanded" and
"aes_dec_key_expanded". These arrays must be aligned on a 16-byte
boundary. Only one of these is necessary (as determined by
"cipher_direction").
One selects AES128 vs AES256 by using the "aes_key_len_in_bytes"
field. The only valid values are 16 (AES128) and 32 (AES256).
One selects the AES mode (CBC versus counter-mode) using
"cipher_mode".
One selects the hash algorithm (SHA1-HMAC, AES-XCBC, or MD5-HMAC)
using "hash_alg".
The data to be encrypted/decrypted is defined by
"src + cipher_start_src_offset_in_bytes". The length of data is
given by "msg_len_to_cipher_in_bytes". It must be a multiple of
16 bytes.
The destination for the cipher operation is given by "dst" (NOT by
"dst + cipher_start_src_offset_in_bytes". In many/most applications,
the destination pointer may overlap the source pointer. That is,
"dst" may be equal to "src + cipher_start_src_offset_in_bytes".
The IV for the cipher operation is given by "iv". The
"iv_len_in_bytes" should be 16. This pointer does not need to be
aligned.
The data to be hashed is defined by
"src + hash_start_src_offset_in_bytes". The length of data is
given by "msg_len_to_hash_in_bytes".
The output of the hash operation is defined by
"auth_tag_output". The number of bytes written is given by
"auth_tag_output_len_in_bytes". Currently the only valid value for
this parameter is 12.
The ipad and opad are given as the result of hashing the HMAC key
xor'ed with the appropriate value. That is, rather than passing in
the HMAC key and rehashing the initial block for every buffer, the
hashing of the initial block is done separately, and the results of
this hash are used as input in the job structure.
Similar to the expanded AES keys, the premise here is that one HMAC
key will apply to many buffers, so we want to do that hashing once
and not for each buffer.
The "status" reflects the status of the returned job. It should be
"STS_COMPLETED".
The "user_data" field is ignored. It can be used to attach
application data to the job object.
Flushing Concerns
As long as jobs are coming in at a reasonable rate, jobs should be
returned at a reasonable rate. However, if there is a lull in the
arrival of new jobs, the last few jobs that were submitted tend to
stay in the MB_MGR until new jobs arrive. This might result in there
being an unreasonable latency for these jobs.
In this case, flush_job() should be used to complete processing on
these outstanding jobs and prevent them from having excessive
latency.
Exactly when and how to use flush_job() is up to the application,
and is a balancing act. The processing of flush_job() is less
efficient than that of submit_job(), so calling flush_job() too
often will lower the system efficiency. Conversely, calling
flush_job() too rarely may result in some jobs seeing excessive
latency.
There are several strategies that the application may employ for
flushing. One usage model is that there is a (thread-safe) queue
containing work items. One or more threads puts work onto this
queue, and one or more processing threads removes items from this
queue and processes them through the MB_MGR. In this usage, a simple
flushing strategy is that when the processing thread wants to do
more work, but the queue is empty, it then proceeds to flush jobs
until either the queue contains more work, or the MB_MGR no longer
contains jobs (i.e. that flush_job() returns NULL). A variation on
this is that when the work queue is empty, the processing thread
might pause for a short time to see if any new work appears, before
it starts flushing.
In other usage models, there may be no such queue. An alternate
flushing strategy is that have a separate "flush thread" hanging
around. It wakes up periodically and checks to see if any work has
been requested since the last time it woke up. If some period of
time has gone by with no new work appearing, it would proceed to
flush the MB_MGR.
AES Key Usage
If the AES mode is CBC, then the fields aes_enc_key_expanded or
aes_dec_key_expanded are using depending on whether the data is
being encrypted or decrypted. However, if the AES mode is CNTR
(counter mode), then only aes_enc_key_expanded is used, even for a
decrypt operation.
The application can handle this dichotomy, or it might choose to
simply set both fields in all cases.
Thread Safety
The MB_MGR and the associated functions ARE NOT thread safe. If
there are multiple threads that may be calling these functions
(e.g. a processing thread and a flushing thread), it is the
responsibility of the application to put in place sufficient locking
so that no two threads will make calls to the same MB_MGR object at
the same time.
XMM Register Usage
The current implementation is designed for integration in the Linux
Kernel. All of the functions satisfy the Linux ABI with respect to
general purpose registers. However, the submit_job() and flush_job()
functions use XMM registers without saving/restoring any of them. It
is up to the application to manage the saving/restoring of XMM
registers itself.
Auxiliary Functions
There are several auxiliary functions packed with MB_MGR. These may
be used, or the application may choose to use their own version. Two
of these, aes_keyexp_128() and aes_keyexp_256() expand AES keys into
a form that is acceptable for reference in the job structure.
In the case of AES128, the expanded key structure should be an array
of 11 128-bit words, aligned on a 16-byte boundary. In the case of
AES256, it should be an array of 15 128-bit words, aligned on a
16-byte boundary.
There is also a function, sha1(), which will compute the SHA1 digest
of a single 64-byte block. It can be used to compute the ipad and
opad digests. There is a similar function, md5(), which can be used
when using MD5-HMAC.
For further details on the usage of these functions, see the sample
test application.
|