summaryrefslogtreecommitdiff
path: root/arch/x86/crypto
AgeCommit message (Collapse)Author
2025-05-05crypto: x86/sha256 - implement library instead of shashEric Biggers
Instead of providing crypto_shash algorithms for the arch-optimized SHA-256 code, instead implement the SHA-256 library. This is much simpler, it makes the SHA-256 library functions be arch-optimized, and it fixes the longstanding issue where the arch-optimized SHA-256 was disabled by default. SHA-256 still remains available through crypto_shash, but individual architectures no longer need to handle it. To match sha256_blocks_arch(), change the type of the nblocks parameter of the assembly functions from int to size_t. The assembly functions actually already treated it as size_t. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-28crypto: x86/polyval - Use API partial block handlingHerbert Xu
Use the Crypto API partial block handling. Also remove the unnecessary SIMD fallback path. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-28crypto: x86 - move library functions to arch/x86/lib/crypto/Eric Biggers
Continue disentangling the crypto library functions from the generic crypto infrastructure by moving the x86 BLAKE2s, ChaCha, and Poly1305 library functions into a new directory arch/x86/lib/crypto/ that does not depend on CRYPTO. This mirrors the distinction between crypto/ and lib/crypto/. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-28crypto: x86 - drop redundant dependencies on X86Eric Biggers
arch/x86/crypto/Kconfig is sourced only when CONFIG_X86=y, so there is no need for the symbols defined inside it to depend on X86. In the case of CRYPTO_TWOFISH_586 and CRYPTO_TWOFISH_X86_64, the dependency was actually on '(X86 || UML_X86)', which suggests that these two symbols were intended to be available under user-mode Linux as well. Yet, again these symbols were defined only when CONFIG_X86=y, so that was not the case. Just remove this redundant dependency. Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-23crypto: x86/sm3 - Use API partial block handlingHerbert Xu
Use the Crypto API partial block handling. Also remove the unnecessary SIMD fallback path. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-23crypto: x86/sha512 - Use API partial block handlingHerbert Xu
Use the Crypto API partial block handling. Also remove the unnecessary SIMD fallback path. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-23crypto: sha256_base - Remove partial block helpersHerbert Xu
Now that all sha256_base users have been converted to use the API partial block handling, remove the partial block helpers. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-23crypto: x86/sha256 - Use API partial block handlingHerbert Xu
Use the Crypto API partial block handling. Also remove the unnecessary SIMD fallback path. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-23crypto: x86/sha1 - Use API partial block handlingHerbert Xu
Use the Crypto API partial block handling. Also remove the unnecessary SIMD fallback path. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-23crypto: x86/ghash - Use API partial block handlingHerbert Xu
Use the Crypto API partial block handling. Also remove the unnecessary SIMD fallback path. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-19crypto: lib/poly1305 - restore ability to remove modulesEric Biggers
Though the module_exit functions are now no-ops, they should still be defined, since otherwise the modules become unremovable. Fixes: 1f81c58279c7 ("crypto: arm/poly1305 - remove redundant shash algorithm") Fixes: f4b1a73aec5c ("crypto: arm64/poly1305 - remove redundant shash algorithm") Fixes: 378a337ab40f ("crypto: powerpc/poly1305 - implement library instead of shash") Fixes: 21969da642a2 ("crypto: x86/poly1305 - remove redundant shash algorithm") Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-19crypto: lib/chacha - restore ability to remove modulesEric Biggers
Though the module_exit functions are now no-ops, they should still be defined, since otherwise the modules become unremovable. Fixes: 08820553f33a ("crypto: arm/chacha - remove the redundant skcipher algorithms") Fixes: 8c28abede16c ("crypto: arm64/chacha - remove the skcipher algorithms") Fixes: f7915484c020 ("crypto: powerpc/chacha - remove the skcipher algorithms") Fixes: ceba0eda8313 ("crypto: riscv/chacha - implement library instead of skcipher") Fixes: 632ab0978f08 ("crypto: x86/chacha - remove the skcipher algorithms") Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-16crypto: x86/poly1305 - don't select CRYPTO_LIB_POLY1305_GENERICEric Biggers
The x86 Poly1305 code never falls back to the generic code, so selecting CRYPTO_LIB_POLY1305_GENERIC is unnecessary. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-16crypto: x86/poly1305 - remove redundant shash algorithmEric Biggers
Since crypto/poly1305.c now registers a poly1305-$(ARCH) shash algorithm that uses the architecture's Poly1305 library functions, individual architectures no longer need to do the same. Therefore, remove the redundant shash algorithm from the arch-specific code and leave just the library functions there. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-16crypto: poly1305 - centralize the shash wrappers for arch codeEric Biggers
Following the example of the crc32, crc32c, and chacha code, make the crypto subsystem register both generic and architecture-optimized poly1305 shash algorithms, both implemented on top of the appropriate library functions. This eliminates the need for every architecture to implement the same shash glue code. Note that the poly1305 shash requires that the key be prepended to the data, which differs from the library functions where the key is simply a parameter to poly1305_init(). Previously this was handled at a fairly low level, polluting the library code with shash-specific code. Reorganize things so that the shash code handles this quirk itself. Also, to register the architecture-optimized shashes only when architecture-optimized code is actually being used, add a function poly1305_is_arch_optimized() and make each arch implement it. Change each architecture's Poly1305 module_init function to arch_initcall so that the CPU feature detection is guaranteed to run before poly1305_is_arch_optimized() gets called by crypto/poly1305.c. (In cases where poly1305_is_arch_optimized() just returns true unconditionally, using arch_initcall is not strictly needed, but it's still good to be consistent across architectures.) Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-16crypto: lib/sm3 - Move sm3 library into lib/cryptoHerbert Xu
Move the sm3 library code into lib/crypto. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-09crypto: x86/chacha - Restore SSSE3 fallback pathHerbert Xu
The chacha_use_simd static branch is required for x86 machines that lack SSSE3 support. Restore it and the generic fallback code. Reported-by: Eric Biggers <ebiggers@kernel.org> Fixes: 9b4400215e0e ("crypto: x86/chacha - Remove SIMD fallback path") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-07crypto: x86/chacha - remove the skcipher algorithmsEric Biggers
Since crypto/chacha.c now registers chacha20-$(ARCH), xchacha20-$(ARCH), and xchacha12-$(ARCH) skcipher algorithms that use the architecture's ChaCha and HChaCha library functions, individual architectures no longer need to do the same. Therefore, remove the redundant skcipher algorithms and leave just the library functions. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-07crypto: chacha - centralize the skcipher wrappers for arch codeEric Biggers
Following the example of the crc32 and crc32c code, make the crypto subsystem register both generic and architecture-optimized chacha20, xchacha20, and xchacha12 skcipher algorithms, all implemented on top of the appropriate library functions. This eliminates the need for every architecture to implement the same skcipher glue code. To register the architecture-optimized skciphers only when architecture-optimized code is actually being used, add a function chacha_is_arch_optimized() and make each arch implement it. Change each architecture's ChaCha module_init function to arch_initcall so that the CPU feature detection is guaranteed to run before chacha_is_arch_optimized() gets called by crypto/chacha.c. In the case of s390, remove the CPU feature based module autoloading, which is no longer needed since the module just gets pulled in via function linkage. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-07crypto: x86/aes-xts - optimize _compute_first_set_of_tweaks for AVX-512Eric Biggers
Optimize the AVX-512 version of _compute_first_set_of_tweaks by using vectorized shifts to compute the first vector of tweak blocks, and by using byte-aligned shifts when multiplying by x^8. AES-XTS performance on AMD Ryzen 9 9950X (Zen 5) improves by about 2% for 4096-byte messages or 6% for 512-byte messages. AES-XTS performance on Intel Sapphire Rapids improves by about 1% for 4096-byte messages or 3% for 512-byte messages. Code size decreases by 75 bytes which outweighs the increase in rodata size of 16 bytes. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-07crypto: x86 - Remove CONFIG_AS_AVX512 handlingUros Bizjak
Current minimum required version of binutils is 2.25, which supports AVX-512 instruction mnemonics. Remove check for assembler support of AVX-512 instructions and all relevant macros for conditional compilation. No functional change intended. Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "David S. Miller" <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Reviewed-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-07crypto: x86 - Remove CONFIG_AS_SHA256_NIUros Bizjak
Current minimum required version of binutils is 2.25, which supports SHA-256 instruction mnemonics. Remove check for assembler support of SHA-256 instructions and all relevant macros for conditional compilation. No functional change intended. Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "David S. Miller" <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Reviewed-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-07crypto: x86 - Remove CONFIG_AS_SHA1_NIUros Bizjak
Current minimum required version of binutils is 2.25, which supports SHA-1 instruction mnemonics. Remove check for assembler support of SHA-1 instructions and all relevant macros for conditional compilation. No functional change intended. Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "David S. Miller" <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Reviewed-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-07crypto: x86/chacha - Remove SIMD fallback pathHerbert Xu
Get rid of the fallback path as SIMD is now always usable in softirq context. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-07crypto: x86/twofish - stop using the SIMD helperEric Biggers
Stop wrapping skcipher and aead algorithms with the crypto SIMD helper (crypto/simd.c). The only purpose of doing so was to work around x86 not always supporting kernel-mode FPU in softirqs. Specifically, if a hardirq interrupted a task context kernel-mode FPU section and then a softirqs were run at the end of that hardirq, those softirqs could not use kernel-mode FPU. This has now been fixed. In combination with the fact that the skcipher and aead APIs only support task and softirq contexts, these can now just use kernel-mode FPU unconditionally on x86. This simplifies the code and improves performance. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-07crypto: x86/sm4 - stop using the SIMD helperEric Biggers
Stop wrapping skcipher and aead algorithms with the crypto SIMD helper (crypto/simd.c). The only purpose of doing so was to work around x86 not always supporting kernel-mode FPU in softirqs. Specifically, if a hardirq interrupted a task context kernel-mode FPU section and then a softirqs were run at the end of that hardirq, those softirqs could not use kernel-mode FPU. This has now been fixed. In combination with the fact that the skcipher and aead APIs only support task and softirq contexts, these can now just use kernel-mode FPU unconditionally on x86. This simplifies the code and improves performance. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-07crypto: x86/serpent - stop using the SIMD helperEric Biggers
Stop wrapping skcipher and aead algorithms with the crypto SIMD helper (crypto/simd.c). The only purpose of doing so was to work around x86 not always supporting kernel-mode FPU in softirqs. Specifically, if a hardirq interrupted a task context kernel-mode FPU section and then a softirqs were run at the end of that hardirq, those softirqs could not use kernel-mode FPU. This has now been fixed. In combination with the fact that the skcipher and aead APIs only support task and softirq contexts, these can now just use kernel-mode FPU unconditionally on x86. This simplifies the code and improves performance. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-07crypto: x86/cast - stop using the SIMD helperEric Biggers
Stop wrapping skcipher and aead algorithms with the crypto SIMD helper (crypto/simd.c). The only purpose of doing so was to work around x86 not always supporting kernel-mode FPU in softirqs. Specifically, if a hardirq interrupted a task context kernel-mode FPU section and then a softirqs were run at the end of that hardirq, those softirqs could not use kernel-mode FPU. This has now been fixed. In combination with the fact that the skcipher and aead APIs only support task and softirq contexts, these can now just use kernel-mode FPU unconditionally on x86. This simplifies the code and improves performance. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-07crypto: x86/camellia - stop using the SIMD helperEric Biggers
Stop wrapping skcipher and aead algorithms with the crypto SIMD helper (crypto/simd.c). The only purpose of doing so was to work around x86 not always supporting kernel-mode FPU in softirqs. Specifically, if a hardirq interrupted a task context kernel-mode FPU section and then a softirqs were run at the end of that hardirq, those softirqs could not use kernel-mode FPU. This has now been fixed. In combination with the fact that the skcipher and aead APIs only support task and softirq contexts, these can now just use kernel-mode FPU unconditionally on x86. This simplifies the code and improves performance. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-07crypto: x86/aria - stop using the SIMD helperEric Biggers
Stop wrapping skcipher and aead algorithms with the crypto SIMD helper (crypto/simd.c). The only purpose of doing so was to work around x86 not always supporting kernel-mode FPU in softirqs. Specifically, if a hardirq interrupted a task context kernel-mode FPU section and then a softirqs were run at the end of that hardirq, those softirqs could not use kernel-mode FPU. This has now been fixed. In combination with the fact that the skcipher and aead APIs only support task and softirq contexts, these can now just use kernel-mode FPU unconditionally on x86. This simplifies the code and improves performance. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-07crypto: x86/aes - stop using the SIMD helperEric Biggers
Stop wrapping skcipher and aead algorithms with the crypto SIMD helper (crypto/simd.c). The only purpose of doing so was to work around x86 not always supporting kernel-mode FPU in softirqs. Specifically, if a hardirq interrupted a task context kernel-mode FPU section and then a softirqs were run at the end of that hardirq, those softirqs could not use kernel-mode FPU. This has now been fixed. In combination with the fact that the skcipher and aead APIs only support task and softirq contexts, these can now just use kernel-mode FPU unconditionally on x86. This simplifies the code and improves performance. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-07crypto: x86/aegis - stop using the SIMD helperEric Biggers
Stop wrapping skcipher and aead algorithms with the crypto SIMD helper (crypto/simd.c). The only purpose of doing so was to work around x86 not always supporting kernel-mode FPU in softirqs. Specifically, if a hardirq interrupted a task context kernel-mode FPU section and then a softirqs were run at the end of that hardirq, those softirqs could not use kernel-mode FPU. This has now been fixed. In combination with the fact that the skcipher and aead APIs only support task and softirq contexts, these can now just use kernel-mode FPU unconditionally on x86. This simplifies the code and improves performance. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-07crypto: x86/aes - drop the avx10_256 AES-XTS and AES-CTR codeEric Biggers
Intel made a late change to the AVX10 specification that removes support for a 256-bit maximum vector length and enumeration of the maximum vector length. AVX10 will imply a maximum vector length of 512 bits. I.e. there won't be any such thing as AVX10/256 or AVX10/512; there will just be AVX10, and it will essentially just consolidate AVX512 features. As a result of this new development, my strategy of providing both *_avx10_256 and *_avx10_512 functions didn't turn out to be that useful. The only remaining motivation for the 256-bit AVX512 / AVX10 functions is to avoid downclocking on older Intel CPUs. But in the case of AES-XTS and AES-CTR, I already wrote *_avx2 code too (primarily to support CPUs without AVX512), which performs almost as well as *_avx10_256. So we should just use that. Therefore, remove the *_avx10_256 AES-XTS and AES-CTR functions and algorithms, and rename the *_avx10_512 AES-XTS and AES-CTR functions and algorithms to *_avx512. Make Ice Lake and Tiger Lake use *_avx2 instead of *_avx10_256 which they previously used. I've left AES-GCM unchanged for now. There is no VAES+AVX2 optimized AES-GCM in the kernel yet, so the path forward for that is not as clear. However, I did write a VAES+AVX2 optimized AES-GCM for BoringSSL. So one option is to port that to the kernel and then do the same cleanup. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-03-29Merge tag 'v6.15-p1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto updates from Herbert Xu: "API: - Remove legacy compression interface - Improve scatterwalk API - Add request chaining to ahash and acomp - Add virtual address support to ahash and acomp - Add folio support to acomp - Remove NULL dst support from acomp Algorithms: - Library options are fuly hidden (selected by kernel users only) - Add Kerberos5 algorithms - Add VAES-based ctr(aes) on x86 - Ensure LZO respects output buffer length on compression - Remove obsolete SIMD fallback code path from arm/ghash-ce Drivers: - Add support for PCI device 0x1134 in ccp - Add support for rk3588's standalone TRNG in rockchip - Add Inside Secure SafeXcel EIP-93 crypto engine support in eip93 - Fix bugs in tegra uncovered by multi-threaded self-test - Fix corner cases in hisilicon/sec2 Others: - Add SG_MITER_LOCAL to sg miter - Convert ubifs, hibernate and xfrm_ipcomp from legacy API to acomp" * tag 'v6.15-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (187 commits) crypto: testmgr - Add multibuffer acomp testing crypto: acomp - Fix synchronous acomp chaining fallback crypto: testmgr - Add multibuffer hash testing crypto: hash - Fix synchronous ahash chaining fallback crypto: arm/ghash-ce - Remove SIMD fallback code path crypto: essiv - Replace memcpy() + NUL-termination with strscpy() crypto: api - Call crypto_alg_put in crypto_unregister_alg crypto: scompress - Fix incorrect stream freeing crypto: lib/chacha - remove unused arch-specific init support crypto: remove obsolete 'comp' compression API crypto: compress_null - drop obsolete 'comp' implementation crypto: cavium/zip - drop obsolete 'comp' implementation crypto: zstd - drop obsolete 'comp' implementation crypto: lzo - drop obsolete 'comp' implementation crypto: lzo-rle - drop obsolete 'comp' implementation crypto: lz4hc - drop obsolete 'comp' implementation crypto: lz4 - drop obsolete 'comp' implementation crypto: deflate - drop obsolete 'comp' implementation crypto: 842 - drop obsolete 'comp' implementation crypto: nx - Migrate to scomp API ...
2025-03-25Merge tag 'crc-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux Pull CRC updates from Eric Biggers: "Another set of improvements to the kernel's CRC (cyclic redundancy check) code: - Rework the CRC64 library functions to be directly optimized, like what I did last cycle for the CRC32 and CRC-T10DIF library functions - Rewrite the x86 PCLMULQDQ-optimized CRC code, and add VPCLMULQDQ support and acceleration for crc64_be and crc64_nvme - Rewrite the riscv Zbc-optimized CRC code, and add acceleration for crc_t10dif, crc64_be, and crc64_nvme - Remove crc_t10dif and crc64_rocksoft from the crypto API, since they are no longer needed there - Rename crc64_rocksoft to crc64_nvme, as the old name was incorrect - Add kunit test cases for crc64_nvme and crc7 - Eliminate redundant functions for calculating the Castagnoli CRC32, settling on just crc32c() - Remove unnecessary prompts from some of the CRC kconfig options - Further optimize the x86 crc32c code" * tag 'crc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: (36 commits) x86/crc: drop the avx10_256 functions and rename avx10_512 to avx512 lib/crc: remove unnecessary prompt for CONFIG_CRC64 lib/crc: remove unnecessary prompt for CONFIG_LIBCRC32C lib/crc: remove unnecessary prompt for CONFIG_CRC8 lib/crc: remove unnecessary prompt for CONFIG_CRC7 lib/crc: remove unnecessary prompt for CONFIG_CRC4 lib/crc7: unexport crc7_be_syndrome_table lib/crc_kunit.c: update comment in crc_benchmark() lib/crc_kunit.c: add test and benchmark for crc7_be() x86/crc32: optimize tail handling for crc32c short inputs riscv/crc64: add Zbc optimized CRC64 functions riscv/crc-t10dif: add Zbc optimized CRC-T10DIF function riscv/crc32: reimplement the CRC32 functions using new template riscv/crc: add "template" for Zbc optimized CRC functions x86/crc: add ANNOTATE_NOENDBR to suppress objtool warnings x86/crc32: improve crc32c_arch() code generation with clang x86/crc64: implement crc64_be and crc64_nvme using new template x86/crc-t10dif: implement crc_t10dif using new template x86/crc32: implement crc32_le using new template x86/crc: add "template" for [V]PCLMULQDQ based CRC functions ...
2025-03-21crypto: lib/chacha - remove unused arch-specific init supportEric Biggers
All implementations of chacha_init_arch() just call chacha_init_generic(), so it is pointless. Just delete it, and replace chacha_init() with what was previously chacha_init_generic(). Signed-off-by: Eric Biggers <ebiggers@google.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-03-15crypto: skcipher - Make skcipher_walk src.virt.addr constHerbert Xu
Mark the src.virt.addr field in struct skcipher_walk as a pointer to const data. This guarantees that the user won't modify the data which should be done through dst.virt.addr to ensure that flushing is done when necessary. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-03-15crypto: scatterwalk - Change scatterwalk_next calling conventionHerbert Xu
Rather than returning the address and storing the length into an argument pointer, add an address field to the walk struct and use that to store the address. The length is returned directly. Change the done functions to use this stored address instead of getting them from the caller. Split the address into two using a union. The user should only access the const version so that it is never changed. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-03-02crypto: lib/Kconfig - Hide arch options from userHerbert Xu
The ARCH_MAY_HAVE patch missed arm64, mips and s390. But it may also lead to arch options being enabled but ineffective because of modular/built-in conflicts. As the primary user of all these options wireguard is selecting the arch options anyway, make the same selections at the lib/crypto option level and hide the arch options from the user. Instead of selecting them centrally from lib/crypto, simply set the default of each arch option as suggested by Eric Biggers. Change the Crypto API generic algorithms to select the top-level lib/crypto options instead of the generic one as otherwise there is no way to enable the arch options (Eric Biggers). Introduce a set of INTERNAL options to work around dependency cycles on the CONFIG_CRYPTO symbol. Fixes: 1047e21aecdf ("crypto: lib/Kconfig - Fix lib built-in failure when arch is modular") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Arnd Bergmann <arnd@kernel.org> Closes: https://lore.kernel.org/oe-kbuild-all/202502232152.JC84YDLp-lkp@intel.com/ Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-03-02crypto: x86/aegis - use the new scatterwalk functionsEric Biggers
In crypto_aegis128_aesni_process_ad(), use scatterwalk_next() which consolidates scatterwalk_clamp() and scatterwalk_map(). Use scatterwalk_done_src() which consolidates scatterwalk_unmap(), scatterwalk_advance(), and scatterwalk_done(). Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-03-02crypto: x86/aes-gcm - use the new scatterwalk functionsEric Biggers
In gcm_process_assoc(), use scatterwalk_next() which consolidates scatterwalk_clamp() and scatterwalk_map(). Use scatterwalk_done_src() which consolidates scatterwalk_unmap(), scatterwalk_advance(), and scatterwalk_done(). Also rename some variables to avoid implying that anything is actually mapped (it's not), or that the loop is going page by page (it is for now, but nothing actually requires that to be the case). Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-02-22crypto: x86/ghash - Use proper helpers to clone requestHerbert Xu
Rather than copying a request by hand with memcpy, use the correct API helpers to setup the new request. This will matter once the API helpers start setting up chained requests as a simple memcpy will break chaining. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-02-22crypto: lib/Kconfig - Fix lib built-in failure when arch is modularHerbert Xu
The HAVE_ARCH Kconfig options in lib/crypto try to solve the modular versus built-in problem, but it still fails when the the LIB option (e.g., CRYPTO_LIB_CURVE25519) is selected externally. Fix this by introducing a level of indirection with ARCH_MAY_HAVE Kconfig options, these then go on to select the ARCH_HAVE options if the ARCH Kconfig options matches that of the LIB option. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202501230223.ikroNDr1-lkp@intel.com/ Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-02-22crypto: x86/aes-xts - change license to Apache-2.0 OR BSD-2-ClauseEric Biggers
As with the other AES modes I've implemented, I've received interest in my AES-XTS assembly code being reused in other projects. Therefore, change the license to Apache-2.0 OR BSD-2-Clause like what I used for AES-GCM. Apache-2.0 is the license of OpenSSL and BoringSSL. Note that it is difficult to *directly* share code between the kernel, OpenSSL, and BoringSSL for various reasons such as perlasm vs. plain asm, Windows ABI support, different divisions of responsibility between C and asm in each project, etc. So whether that will happen instead of just doing ports is still TBD. But this dual license should at least make it possible to port changes between the projects. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-02-22crypto: x86/aes-ctr - rewrite AESNI+AVX optimized CTR and add VAES supportEric Biggers
Delete aes_ctrby8_avx-x86_64.S and add a new assembly file aes-ctr-avx-x86_64.S which follows a similar approach to aes-xts-avx-x86_64.S in that it uses a "template" to provide AESNI+AVX, VAES+AVX2, VAES+AVX10/256, and VAES+AVX10/512 code, instead of just AESNI+AVX. Wire it up to the crypto API accordingly. This greatly improves the performance of AES-CTR and AES-XCTR on VAES-capable CPUs, with the best case being AMD Zen 5 where an over 230% increase in throughput is seen on long messages. Performance on non-VAES-capable CPUs remains about the same, and the non-AVX AES-CTR code (aesni_ctr_enc) is also kept as-is for now. There are some slight regressions (less than 10%) on some short message lengths on some CPUs; these are difficult to avoid, given how the previous code was so heavily unrolled by message length, and they are not particularly important. Detailed performance results are given in the tables below. Both CTR and XCTR support is retained. The main loop remains 8-vector-wide, which differs from the 4-vector-wide main loops that are used in the XTS and GCM code. A wider loop is appropriate for CTR and XCTR since they have fewer other instructions (such as vpclmulqdq) to interleave with the AES instructions. Similar to what was the case for AES-GCM, the new assembly code also has a much smaller binary size, as it fixes the excessive unrolling by data length and key length present in the old code. Specifically, the new assembly file compiles to about 9 KB of text vs. 28 KB for the old file. This is despite 4x as many implementations being included. The tables below show the detailed performance results. The tables show percentage improvement in single-threaded throughput for repeated encryption of the given message length; an increase from 6000 MB/s to 12000 MB/s would be listed as 100%. They were collected by directly measuring the Linux crypto API performance using a custom kernel module. The tested CPUs were all server processors from Google Compute Engine except for Zen 5 which was a Ryzen 9 9950X desktop processor. Table 1: AES-256-CTR throughput improvement, CPU microarchitecture vs. message length in bytes: | 16384 | 4096 | 4095 | 1420 | 512 | 500 | ---------------------+-------+-------+-------+-------+-------+-------+ AMD Zen 5 | 232% | 203% | 212% | 143% | 71% | 95% | Intel Emerald Rapids | 116% | 116% | 117% | 91% | 78% | 79% | Intel Ice Lake | 109% | 103% | 107% | 81% | 54% | 56% | AMD Zen 4 | 109% | 91% | 100% | 70% | 43% | 59% | AMD Zen 3 | 92% | 78% | 87% | 57% | 32% | 43% | AMD Zen 2 | 9% | 8% | 14% | 12% | 8% | 21% | Intel Skylake | 7% | 7% | 8% | 5% | 3% | 8% | | 300 | 200 | 64 | 63 | 16 | ---------------------+-------+-------+-------+-------+-------+ AMD Zen 5 | 57% | 39% | -9% | 7% | -7% | Intel Emerald Rapids | 37% | 42% | -0% | 13% | -8% | Intel Ice Lake | 39% | 30% | -1% | 14% | -9% | AMD Zen 4 | 42% | 38% | -0% | 18% | -3% | AMD Zen 3 | 38% | 35% | 6% | 31% | 5% | AMD Zen 2 | 24% | 23% | 5% | 30% | 3% | Intel Skylake | 9% | 1% | -4% | 10% | -7% | Table 2: AES-256-XCTR throughput improvement, CPU microarchitecture vs. message length in bytes: | 16384 | 4096 | 4095 | 1420 | 512 | 500 | ---------------------+-------+-------+-------+-------+-------+-------+ AMD Zen 5 | 240% | 201% | 216% | 151% | 75% | 108% | Intel Emerald Rapids | 100% | 99% | 102% | 91% | 94% | 104% | Intel Ice Lake | 93% | 89% | 92% | 74% | 50% | 64% | AMD Zen 4 | 86% | 75% | 83% | 60% | 41% | 52% | AMD Zen 3 | 73% | 63% | 69% | 45% | 21% | 33% | AMD Zen 2 | -2% | -2% | 2% | 3% | -1% | 11% | Intel Skylake | -1% | -1% | 1% | 2% | -1% | 9% | | 300 | 200 | 64 | 63 | 16 | ---------------------+-------+-------+-------+-------+-------+ AMD Zen 5 | 78% | 56% | -4% | 38% | -2% | Intel Emerald Rapids | 61% | 55% | 4% | 32% | -5% | Intel Ice Lake | 57% | 42% | 3% | 44% | -4% | AMD Zen 4 | 35% | 28% | -1% | 17% | -3% | AMD Zen 3 | 26% | 23% | -3% | 11% | -6% | AMD Zen 2 | 13% | 24% | -1% | 14% | -3% | Intel Skylake | 16% | 8% | -4% | 35% | -3% | Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-02-14x86/cfi: Clean up linkagePeter Zijlstra
With the introduction of kCFI the addition of ENDBR to SYM_FUNC_START* no longer suffices to make the function indirectly callable. This now requires the use of SYM_TYPED_FUNC_START. As such, remove the implicit ENDBR from SYM_FUNC_START* and add some explicit annotations to fix things up again. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Sami Tolvanen <samitolvanen@google.com> Link: https://lore.kernel.org/r/20250207122546.409116003@infradead.org
2025-02-14x86,kcfi: Fix EXPORT_SYMBOL vs kCFIPeter Zijlstra
The expectation is that all EXPORT'ed symbols are free to have their address taken and called indirectly. The majority of the assembly defined functions currently violate this expectation. Make then all use SYM_TYPED_FUNC_START() in order to emit the proper kCFI preamble. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Sami Tolvanen <samitolvanen@google.com> Link: https://lore.kernel.org/r/20250207122546.302679189@infradead.org
2025-02-10x86: move ZMM exclusion list into CPU feature flagEric Biggers
Lift zmm_exclusion_list in aesni-intel_glue.c into the x86 CPU setup code, and add a new x86 CPU feature flag X86_FEATURE_PREFER_YMM that is set when the CPU is on this list. This allows other code in arch/x86/, such as the CRC library code, to apply the same exclusion list when deciding whether to execute 256-bit or 512-bit optimized functions. Note that full AVX512 support including ZMM registers is still exposed to userspace and is still supported for in-kernel use. This flag just indicates whether in-kernel code should prefer to use YMM registers. Acked-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Keith Busch <kbusch@kernel.org> Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20250210174540.161705-2-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2025-02-09crypto: x86/aes-xts - make the fast path 64-bit specificEric Biggers
Remove 32-bit support from the fast path in xts_crypt(). Then optimize it for 64-bit, and simplify the code, by switching to sg_virt() and removing the now-unnecessary checks for crossing a page boundary. The result is simpler code that is slightly smaller and faster in the case that actually matters (64-bit). Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-01-24Merge tag 'v6.14-p1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto updates from Herbert Xu: "API: - Remove physical address skcipher walking - Fix boot-up self-test race Algorithms: - Optimisations for x86/aes-gcm - Optimisations for x86/aes-xts - Remove VMAC - Remove keywrap Drivers: - Remove n2 Others: - Fixes for padata UAF - Fix potential rhashtable deadlock by moving schedule_work outside lock" * tag 'v6.14-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (75 commits) rhashtable: Fix rhashtable_try_insert test dt-bindings: crypto: qcom,inline-crypto-engine: Document the SM8750 ICE dt-bindings: crypto: qcom,prng: Document SM8750 RNG dt-bindings: crypto: qcom-qce: Document the SM8750 crypto engine crypto: asymmetric_keys - Remove unused key_being_used_for[] padata: avoid UAF for reorder_work padata: fix UAF in padata_reorder padata: add pd get/put refcnt helper crypto: skcipher - call cond_resched() directly crypto: skcipher - optimize initializing skcipher_walk fields crypto: skcipher - clean up initialization of skcipher_walk::flags crypto: skcipher - fold skcipher_walk_skcipher() into skcipher_walk_virt() crypto: skcipher - remove redundant check for SKCIPHER_WALK_SLOW crypto: skcipher - remove redundant clamping to page size crypto: skcipher - remove unnecessary page alignment of bounce buffer crypto: skcipher - document skcipher_walk_done() and rename some vars crypto: omap - switch from scatter_walk to plain offset crypto: powerpc/p10-aes-gcm - simplify handling of linear associated data crypto: bcm - Drop unused setting of local 'ptr' variable crypto: hisilicon/qm - support new function communication ...