Commit Graph

396 Commits

Author SHA1 Message Date
Ard Biesheuvel
1c0cf6d196 crypto: arm64/neonbs - fix out-of-bounds access on short input
The bit-sliced implementation of AES-CTR operates on blocks of 128
bytes, and will fall back to the plain NEON version for tail blocks or
inputs that are shorter than 128 bytes to begin with.

It will call straight into the plain NEON asm helper, which performs all
memory accesses in granules of 16 bytes (the size of a NEON register).
For this reason, the associated plain NEON glue code will copy inputs
shorter than 16 bytes into a temporary buffer, given that this is a rare
occurrence and it is not worth the effort to work around this in the asm
code.

The fallback from the bit-sliced NEON version fails to take this into
account, potentially resulting in out-of-bounds accesses. So clone the
same workaround, and use a temp buffer for short in/outputs.
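
The workaround being cloned is essentially a bounce-buffer copy; a minimal
sketch in glue-code style (the helper names and exact layout are illustrative,
not the actual kernel code):

  #include <crypto/aes.h>
  #include <linux/string.h>
  #include <linux/types.h>

  /*
   * Sketch only: stage a short input in a 16-byte bounce buffer so the
   * plain NEON asm helper, which always accesses full 16-byte granules,
   * never reads or writes outside the caller's buffer.
   */
  static void ctr_crypt_short(u8 *dst, const u8 *src, unsigned int len,
                              void (*neon_helper)(u8 *out, const u8 *in))
  {
          u8 buf[AES_BLOCK_SIZE];

          if (len < AES_BLOCK_SIZE) {
                  memcpy(buf, src, len);      /* pad the short input        */
                  neon_helper(buf, buf);      /* 16-byte accesses stay here */
                  memcpy(dst, buf, len);      /* copy back only len bytes   */
          } else {
                  neon_helper(dst, src);
          }
  }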

Fixes: fc074e1300 ("crypto: arm64/aes-neonbs-ctr - fallback to plain NEON for final chunk")
Cc: <stable@vger.kernel.org>
Reported-by: syzbot+f1ceaa1a09ab891e1934@syzkaller.appspotmail.com
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-02-24 08:37:24 +08:00
Ard Biesheuvel
f691d444f9 crypto: arm64/aes-ccm - Merge finalization into en/decrypt asm helpers
The C glue code already infers whether or not the current iteration is
the final one, by comparing walk.nbytes with walk.total. This means we
can easily inform the asm helpers of this as well, by conditionally
passing a pointer to the original IV, which is used in the finalization
of the MAC. This removes the need for a separate call into the asm code
to perform the finalization.
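
A rough sketch of the resulting glue-code shape, assuming the usual walk/mac/IV
locals and a hypothetical ccm_crypt() helper that takes the optional final-IV
pointer (simplified from the actual code):

  /* Sketch: tell the asm helper whether this is the last chunk by passing
   * the original IV pointer (or NULL), so it can fold the MAC finalization
   * into the final en/decrypt call. */
  while (walk.nbytes) {
          u32 tail = walk.nbytes % AES_BLOCK_SIZE;
          const u8 *final_iv = NULL;

          if (walk.nbytes == walk.total) {
                  tail = 0;             /* last chunk: process the tail too */
                  final_iv = orig_iv;   /* asm finalizes the MAC            */
          }

          ccm_crypt(walk.dst.virt.addr, walk.src.virt.addr,
                    walk.nbytes - tail, mac, ctr, final_iv);

          err = skcipher_walk_done(&walk, tail);
  }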

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-01-26 16:39:32 +08:00
Ard Biesheuvel
7150528849 crypto: arm64/aes-ccm - Merge encrypt and decrypt tail handling
The encryption and decryption code paths are mostly identical, except
for a small difference where the plaintext input into the MAC is taken
from either the input or the output block.

We can factor this in quite easily using a vector bit select, and a few
additional XORs, without the need for branches. This way, we can use the
same tail handling logic on the encrypt and decrypt code paths, allowing
further consolidation of the asm helpers in a subsequent patch.

(In the main loop, adding just a handful of ALU instructions results in
a noticeable performance hit [around 5% on Apple M2], so those routines
are kept separate)
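
The idea of the bit select, reduced to scalar C for illustration only (the real
code uses the NEON BSL instruction on vector registers; names are made up):

  #include <linux/types.h>

  /* Scalar illustration of the NEON BSL (bit select) instruction: take
   * each bit from 'a' where the mask bit is set, else from 'b'. */
  static inline u64 bit_select(u64 mask, u64 a, u64 b)
  {
          return (a & mask) | (b & ~mask);
  }

  /* The MAC must absorb the plaintext: the input block when encrypting,
   * the freshly produced output block when decrypting.  One mask value
   * (all ones or all zeroes) lets both paths share the same tail code. */
  static u64 mac_input(u64 enc_mask, u64 in_blk, u64 out_blk)
  {
          return bit_select(enc_mask, in_blk, out_blk);
  }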

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-01-26 16:39:32 +08:00
Ard Biesheuvel
565def1542 crypto: arm64/aes-ccm - Cache round keys and unroll AES loops
The CCM code as originally written attempted to use as few NEON
registers as possible, to avoid having to eagerly preserve/restore the
entire NEON register file at every call to kernel_neon_begin/end. At
that time, this API took a number of NEON registers as a parameter, and
only preserved that many registers.

Today, the NEON register file is restored lazily, and the old API is
long gone. This means we can use as many NEON registers as we can make
meaningful use of, which means in the AES case that we can keep all
round keys in registers rather than reloading each of them for each AES
block processed.

On Cortex-A53, this results in a speedup of more than 50% (from 4
cycles per byte to 2.6 cycles per byte).

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-01-26 16:39:32 +08:00
Ard Biesheuvel
948ffc66e5 crypto: arm64/aes-ccm - Reuse existing MAC update for AAD input
CCM combines the counter (CTR) encryption mode with a MAC based on the
same block cipher. This MAC construction is a bit clunky: it invokes the
block cipher in a way that cannot be parallelized, resulting in poor CPU
pipeline efficiency.

The arm64 CCM code mitigates this by interleaving the encryption and MAC
at the AES round level, resulting in a substantial speedup. But this
approach does not apply to the additional authenticated data (AAD) which
is not encrypted.

This means the special asm routine dealing with the AAD is not any
better than the MAC update routine used by the arm64 AES block
encryption driver, so let's reuse that, and drop the special AES-CCM
version.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-01-26 16:39:32 +08:00
Ard Biesheuvel
c131098d6d crypto: arm64/aes-ccm - Replace bytewise tail handling with NEON permute
Implement the CCM tail handling using a single sequence that uses
permute vectors and overlapping loads and stores, rather than going over
the tail byte by byte in a loop, and using scalar operations. This is
more efficient, even though the measured speedup is only around 1-2% on
the CPUs I have tried.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-01-26 16:39:32 +08:00
Ard Biesheuvel
97c4c10daf crypto: arm64/aes-ccm - Pass short inputs via stack buffer
In preparation for optimizing the CCM core asm code using permutation
vectors and overlapping loads and stores, ensure that inputs shorter
than the size of an AES block are passed via a buffer on the stack, in a
way that positions the data at the end of a 16 byte buffer. This removes
the need for the asm code to reason about a rare corner case where the
tail of the data cannot be read/written using a single NEON load/store
instruction.

While at it, tweak the copyright header and authorship to bring it up to
date.
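
A minimal sketch of the staging idea, assuming a hypothetical helper rather
than the actual glue code:

  #include <crypto/aes.h>
  #include <linux/string.h>
  #include <linux/types.h>

  /* Sketch only: place an n-byte input (n < 16) at the *end* of a 16-byte
   * stack buffer, so a single (possibly overlapping) 16-byte access that
   * ends at the last data byte never strays outside the buffer. */
  static const u8 *stage_short_input(u8 buf[AES_BLOCK_SIZE],
                                     const u8 *src, unsigned int len)
  {
          if (len >= AES_BLOCK_SIZE)
                  return src;

          memcpy(buf + AES_BLOCK_SIZE - len, src, len);
          return buf + AES_BLOCK_SIZE - len;
  }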

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-01-26 16:39:32 +08:00
Ard Biesheuvel
88c6d50f64 crypto: arm64/aes-ccm - Keep NEON enabled during skcipher walk
Now that kernel mode NEON no longer disables preemption, we no longer
have to take care to disable and re-enable use of the NEON when calling
into the skcipher walk API. So just keep it enabled until done.
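
In glue-code terms the change amounts to hoisting the begin/end pair out of the
walk loop; a simplified sketch (error handling and tail logic elided):

  /* Before: NEON had to be given up around every call back into the
   * skcipher walk API.
   *
   *   while (walk.nbytes) {
   *           kernel_neon_begin();
   *           ... process this chunk with NEON ...
   *           kernel_neon_end();
   *           err = skcipher_walk_done(&walk, tail);
   *   }
   *
   * After: kernel-mode NEON no longer disables preemption, so keep it
   * enabled across the whole walk. */
  kernel_neon_begin();
  while (walk.nbytes) {
          /* ... process this chunk with NEON ... */
          err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE);
  }
  kernel_neon_end();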

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-01-26 16:39:32 +08:00
Ard Biesheuvel
f722002441 crypto: arm64/aes-ccm - Revert "Rewrite skcipher walker loop"
This reverts commit 57ead1bf1c, which updated the CCM code to only
rely on walk.nbytes to check for failures returned from the skcipher
walk API, mostly for the common good rather than to fix a particular
problem in the code.

This change introduces a problem of its own: the skcipher walk is
started with the 'atomic' argument set to false, which means that the
skcipher walk API is permitted to sleep. Subsequently, it invokes
skcipher_walk_done() with preemption disabled on the final iteration of
the loop. This appears to work by accident, but it is arguably a bad
example, and providing a better example was the point of the original
patch.

Given that future changes to the CCM code will rely on the original
behavior of entering the loop even for zero sized inputs, let's just
revert this change entirely, and proceed from there.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-01-26 16:39:32 +08:00
Herbert Xu
29fa12e918 crypto: arm64/sm4 - Remove cfb(sm4)
Remove the unused CFB implementation.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-12-08 11:59:45 +08:00
Eric Biggers
1be7505933 crypto: arm64/sha512 - clean up backwards function names
In the Linux kernel, a function whose name has two leading underscores
is conventionally called by the same-named function without leading
underscores -- not the other way around.  __sha512_block_data_order()
got this backwards.  Fix this, albeit without changing the name in the
perlasm since that is OpenSSL code.  No change in behavior.
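
The convention in miniature, with purely illustrative names:

  #include <linux/types.h>

  /* Conventional layering: the double-underscore function is the
   * lower-level helper, called by the plain-named wrapper --
   * not the other way around. */
  static void __transform_block(void *state, const u8 *src)
  {
          /* low-level block transform */
  }

  static void transform_block(void *state, const u8 *src)
  {
          __transform_block(state, src);   /* thin wrapper calls helper */
  }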

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-10-20 13:39:26 +08:00
Eric Biggers
455951b5e1 crypto: arm64/sha256 - clean up backwards function names
In the Linux kernel, a function whose name has two leading underscores
is conventionally called by the same-named function without leading
underscores -- not the other way around.  __sha256_block_data_order()
and __sha256_block_neon() got this backwards.  Fix this, albeit without
changing the names in the perlasm since that is OpenSSL code.  No change
in behavior.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-10-20 13:39:26 +08:00
Eric Biggers
5f720a3df3 crypto: arm64/sha512-ce - clean up backwards function names
In the Linux kernel, a function whose name has two leading underscores
is conventionally called by the same-named function without leading
underscores -- not the other way around.  __sha512_ce_transform() and
__sha512_block_data_order() got this backwards.  Fix this, albeit
without changing "sha512_block_data_order" in the perlasm since that is
OpenSSL code.  No change in behavior.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-10-20 13:39:25 +08:00
Eric Biggers
ba30d31121 crypto: arm64/sha2-ce - clean up backwards function names
In the Linux kernel, a function whose name has two leading underscores
is conventionally called by the same-named function without leading
underscores -- not the other way around.  __sha2_ce_transform() and
__sha256_block_data_order() got this backwards.  Fix this, albeit
without changing "sha256_block_data_order" in the perlasm since that is
OpenSSL code.  No change in behavior.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-10-20 13:39:25 +08:00
Eric Biggers
1f9f3a5218 crypto: arm64/sha1-ce - clean up backwards function names
In the Linux kernel, a function whose name has two leading underscores
is conventionally called by the same-named function without leading
underscores -- not the other way around.  __sha1_ce_transform() got this
backwards.  Fix this.  No change in behavior.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-10-20 13:39:25 +08:00
Eric Biggers
ddefde7b2a crypto: arm64/nhpoly1305 - implement ->digest
Implement the ->digest method to improve performance on single-page
messages by reducing the number of indirect calls.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-10-20 13:39:25 +08:00
Eric Biggers
1efcbf0eff crypto: arm64/sha2-ce - implement ->digest for sha256
Implement a ->digest function for sha256-ce.  This improves the
performance of crypto_shash_digest() with this algorithm by reducing the
number of indirect calls that are made.  This only adds ~112 bytes of
code, mostly for the inlined init, as the finup function is tail-called.

For now, don't bother with this for sha224, since sha224 is rarely used.
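
A sketch of what such a ->digest method can look like in the shash glue (the
helper names are assumptions, not necessarily the exact ones used here):

  /* Sketch: ->digest is just an inlined init plus a tail call to finup,
   * saving the separate indirect init/update/final calls that
   * crypto_shash_digest() would otherwise make. */
  static int sha256_ce_digest(struct shash_desc *desc, const u8 *data,
                              unsigned int len, u8 *out)
  {
          sha256_base_init(desc);                       /* inlined init */
          return sha256_ce_finup(desc, data, len, out); /* tail call    */
  }

  /* wired up via the alg definition, e.g.  .digest = sha256_ce_digest, */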

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-10-20 13:39:25 +08:00
Masahiro Yamada
ac2d838fb7 crypto: arm64/aes - remove Makefile hack
Do it more simply. This also fixes single-target builds.

[before]

  $ make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- arch/arm64/crypto/aes-glue-ce.i
    [snip]
  make[4]: *** No rule to make target 'arch/arm64/crypto/aes-glue-ce.i'.  Stop.

[after]

  $ make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- arch/arm64/crypto/aes-glue-ce.i
    [snip]
    CPP     arch/arm64/crypto/aes-glue-ce.i

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-08-11 19:19:27 +08:00
Herbert Xu
f573db7aa5 crypto: arm64/sha256-glue - Include module.h
Include module.h in arch/arm64/crypto/sha256-glue.c as it uses
various macros (such as MODULE_AUTHOR) that are defined there.

Also fix the ordering of types.h.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202305191953.PIB1w80W-lkp@intel.com/
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-05-19 20:56:59 +08:00
Eric Biggers
47446d7cd4 crypto: arm64/aes-neonbs - fix crash with CFI enabled
aesbs_ecb_encrypt(), aesbs_ecb_decrypt(), aesbs_xts_encrypt(), and
aesbs_xts_decrypt() are called via indirect function calls.  Therefore
they need to use SYM_TYPED_FUNC_START instead of SYM_FUNC_START to cause
their type hashes to be emitted when the kernel is built with
CONFIG_CFI_CLANG=y.  Otherwise, the code crashes with a CFI failure if
the compiler doesn't happen to optimize out the indirect calls.

Fixes: c50d32859e ("arm64: Add types to indirect called assembly functions")
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-03-14 17:06:44 +08:00
Herbert Xu
4e4a08868f crypto: arm64/sm4-gcm - Fix possible crash in GCM cryption
An often overlooked aspect of the skcipher walker API is that an
error is not just indicated by a non-zero return value, but by the
fact that walk->nbytes is zero.

Thus it is an error to call skcipher_walk_done after getting back
walk->nbytes == 0 from the previous interaction with the walker.

This is because when walk->nbytes is zero the walker is left in
an undefined state and any further calls to it may try to free
uninitialised stack memory.

The sm4 arm64 ccm code gets this wrong and ends up calling
skcipher_walk_done even when walk->nbytes is zero.

This patch rewrites the loop in a form that resembles other callers.

Reported-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Fixes: ae1b83c7d5 ("crypto: arm64/sm4 - add CE implementation for GCM mode")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-02-10 17:20:19 +08:00
Tianjia Zhang
3b9d902153 crypto: arm64/sm4-ccm - Rewrite skcipher walker loop
The fact that an error in the skcipher walker API is indicated
not only by a non-zero return value, but also by walk->nbytes
being zero, causes the layout of the skcipher walker loop to be
sufficiently different from the usual layout. This is not a
problem in itself, but it is likely to confuse readers and make
the code harder to maintain.

This patch rewrites the skcipher walker loop and separates the
handling of the final chunk from the loop, to avoid incorrect
calls to the skcipher walker API. In addition to following the
usual convention of checking walk->nbytes, it also makes the
loop's logic clearer and easier to understand.

Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-02-10 17:20:19 +08:00
Herbert Xu
57ead1bf1c crypto: arm64/aes-ccm - Rewrite skcipher walker loop
An often overlooked aspect of the skcipher walker API is that an
error is not just indicated by a non-zero return value, but by the
fact that walk->nbytes is zero.

Thus it is an error to call skcipher_walk_done after getting back
walk->nbytes == 0 from the previous interaction with the walker.

This is because when walk->nbytes is zero the walker is left in
an undefined state and any further calls to it may try to free
uninitialised stack memory.

The arm64 ccm code has to deal with zero-length messages, and
it needs to process data even when walk->nbytes == 0 is returned.
It doesn't have this bug because there is an explicit check for
walk->nbytes != 0 prior to the skcipher_walk_done call.

However, the loop is still sufficiently different from the usual
layout and it appears to have been copied into other code which
then ended up with this bug.  This patch rewrites it to follow the
usual convention of checking walk->nbytes.
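
The usual convention, sketched here for an AEAD encryption path (per-chunk
processing elided; simplified):

  err = skcipher_walk_aead_encrypt(&walk, req, false);

  while (walk.nbytes) {
          unsigned int tail = walk.nbytes % AES_BLOCK_SIZE;

          if (walk.nbytes == walk.total)
                  tail = 0;    /* final chunk: process the partial block too */

          /* ... en/decrypt walk.nbytes - tail bytes of this chunk ... */

          err = skcipher_walk_done(&walk, tail);
  }
  /* once walk.nbytes is zero, skcipher_walk_done() must not be called again */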

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-02-10 17:20:19 +08:00
Ard Biesheuvel
9e3457112b crypto: arm64/gcm - add RFC4106 support
Add support for RFC4106 ESP encapsulation to the accelerated GCM
implementation. This results in a ~10% speedup for IPsec frames of
typical size (~1420 bytes) on Cortex-A53.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-01-20 18:29:31 +08:00
Tianjia Zhang
736f88689c crypto: arm64/sm4 - fix possible crash with CFI enabled
The SM4 CCM/GCM assembly functions for encryption and decryption are
called via indirect function calls.  Therefore they need to use
SYM_TYPED_FUNC_START instead of SYM_FUNC_START to cause their type
hashes to be emitted when the kernel is built with CONFIG_CFI_CLANG=y.
Otherwise, the code crashes with a CFI failure (if the compiler didn't
happen to optimize out the indirect call).

Fixes: 67fa3a7fdf ("crypto: arm64/sm4 - add CE implementation for CCM mode")
Fixes: ae1b83c7d5 ("crypto: arm64/sm4 - add CE implementation for GCM mode")
Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-12-30 17:57:42 +08:00
Ard Biesheuvel
a428636d4c crypto: arm64/ghash-ce - use frame_push/pop macros consistently
Use the frame_push and frame_pop macros to set up the stack frame so
that return address protections will be enabled automatically when
configured.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-12-09 18:45:00 +08:00
Ard Biesheuvel
489a4a05fe crypto: arm64/crct10dif - use frame_push/pop macros consistently
Use the frame_push and frame_pop macros to set up the stack frame so
that return address protections will be enabled automatically when
configured.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-12-09 18:45:00 +08:00
Ard Biesheuvel
7d709af180 crypto: arm64/aes-modes - use frame_push/pop macros consistently
Use the frame_push and frame_pop macros to create the stack frames in
the AES chaining mode wrappers so that they will get PAC and/or shadow
call stack protection when configured.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-12-09 18:45:00 +08:00
Ard Biesheuvel
67ab02dce3 crypto: arm64/aes-neonbs - use frame_push/pop consistently
Use the frame_push and frame_pop macros consistently to create the stack
frame, so that we will get PAC and/or shadow call stack handling as well
when enabled.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-12-09 18:45:00 +08:00
Herbert Xu
14386d4713 crypto: Prepare to move crypto_tfm_ctx
The helper crypto_tfm_ctx is only used by the Crypto API algorithm
code and should really be in algapi.h.  However, for historical
reasons many files relied on it to be in crypto.h.  This patch
changes those files to use algapi.h instead in preparation for a
move.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-12-02 18:12:40 +08:00
Eric Biggers
be8f6b6496 crypto: arm64/sm3 - fix possible crash with CFI enabled
sm3_neon_transform() is called via indirect function calls.  Therefore
it needs to use SYM_TYPED_FUNC_START instead of SYM_FUNC_START to cause
its type hash to be emitted when the kernel is built with
CONFIG_CFI_CLANG=y.  Otherwise, the code crashes with a CFI failure (if
the compiler didn't happen to optimize out the indirect call).

Fixes: c50d32859e ("arm64: Add types to indirect called assembly functions")
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-11-25 17:39:19 +08:00
Eric Biggers
e5e1c67e2f crypto: arm64/nhpoly1305 - eliminate unnecessary CFI wrapper
Since the CFI implementation now supports indirect calls to assembly
functions, take advantage of that rather than use a wrapper function.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-11-25 17:39:19 +08:00
Tianjia Zhang
824db5cd1e crypto: arm64 - Fix unused variable compilation warnings of cpu_feature
The cpu feature table referenced by MODULE_DEVICE_TABLE is only used
when the code is compiled as a module, so an unused-variable warning is
emitted for in-tree (built-in) builds. The warning can be removed by
adding the __maybe_unused attribute.
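
The shape of the fix, sketched with an illustrative table name and feature bit:

  #include <linux/cpufeature.h>
  #include <linux/module.h>

  /* Only MODULE_DEVICE_TABLE() references this table, and that reference
   * only exists in modular builds -- hence __maybe_unused for built-in
   * (in-tree) builds. */
  static const struct cpu_feature __maybe_unused example_cpu_feature[] = {
          { cpu_feature(AES) },
          { }
  };
  MODULE_DEVICE_TABLE(cpu, example_cpu_feature);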

Fixes: 03c9a333fe ("crypto: arm64/ghash - add NEON accelerated fallback for 64-bit PMULL")
Fixes: ae1b83c7d5 ("crypto: arm64/sm4 - add CE implementation for GCM mode")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-11-18 16:59:34 +08:00
Ard Biesheuvel
61c581a46a crypto: move gf128mul library into lib/crypto
The gf128mul library does not depend on the crypto API at all, so it can
be moved into lib/crypto. This will allow us to use it in other library
code in a subsequent patch without having to depend on CONFIG_CRYPTO.

While at it, change the Kconfig symbol name to align with other crypto
library implementations. However, the source file name is retained, as
it is reflected in the module .ko filename, and changing this might
break things for users.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-11-11 18:14:59 +08:00
Tianjia Zhang
ae1b83c7d5 crypto: arm64/sm4 - add CE implementation for GCM mode
This patch is a CE-optimized assembly implementation for GCM mode.

Benchmarked on a T-Head Yitian-710 at 2.75 GHz. The data comes from the 224
and 224 modes of tcrypt and compares the performance before and after this
patch (the driver used before this patch is gcm_base(ctr-sm4-ce,ghash-generic)).
The columns are block sizes in bytes and the unit is Mb/s:

Before (gcm_base(ctr-sm4-ce,ghash-generic)):

gcm(sm4)     |     16      64      256      512     1024     1420     4096     8192
-------------+---------------------------------------------------------------------
  GCM enc    |  25.24   64.65   104.66   116.69   123.81   125.12   129.67   130.62
  GCM dec    |  25.40   64.80   104.74   116.70   123.81   125.21   129.68   130.59
  GCM mb enc |  24.95   64.06   104.20   116.38   123.55   124.97   129.63   130.61
  GCM mb dec |  24.92   64.00   104.13   116.34   123.55   124.98   129.56   130.48

After:

gcm-sm4-ce   |     16      64      256      512     1024     1420     4096     8192
-------------+---------------------------------------------------------------------
  GCM enc    | 108.62  397.18   971.60  1283.92  1522.77  1513.39  1777.00  1806.96
  GCM dec    | 116.36  398.14  1004.27  1319.11  1624.21  1635.43  1932.54  1974.20
  GCM mb enc | 107.13  391.79   962.05  1274.94  1514.76  1508.57  1769.07  1801.58
  GCM mb dec | 113.40  389.36   988.51  1307.68  1619.10  1631.55  1931.70  1970.86

Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-11-04 17:35:44 +08:00
Tianjia Zhang
67fa3a7fdf crypto: arm64/sm4 - add CE implementation for CCM mode
This patch is a CE-optimized assembly implementation for CCM mode.

Benchmarked on a T-Head Yitian-710 at 2.75 GHz. The data comes from the 223
and 225 modes of tcrypt and compares the performance before and after this
patch (the driver used before this patch is ccm_base(ctr-sm4-ce,cbcmac-sm4-ce)).
The columns are block sizes in bytes and the unit is Mb/s:

Before (rfc4309(ccm_base(ctr-sm4-ce,cbcmac-sm4-ce))):

ccm(sm4)     |     16      64     256     512    1024    1420    4096    8192
-------------+---------------------------------------------------------------
  CCM enc    |  35.07  125.40  336.47  468.17  581.97  619.18  712.56  736.01
  CCM dec    |  34.87  124.40  335.08  466.75  581.04  618.81  712.25  735.89
  CCM mb enc |  34.71  123.96  333.92  465.39  579.91  617.49  711.45  734.92
  CCM mb dec |  34.42  122.80  331.02  462.81  578.28  616.42  709.88  734.19

After (rfc4309(ccm-sm4-ce)):

ccm-sm4-ce   |     16      64     256     512    1024    1420    4096    8192
-------------+---------------------------------------------------------------
  CCM enc    |  77.12  249.82  569.94  725.17  839.27  867.71  952.87  969.89
  CCM dec    |  75.90  247.26  566.29  722.12  836.90  865.95  951.74  968.57
  CCM mb enc |  75.98  245.25  562.91  718.99  834.76  864.70  950.17  967.90
  CCM mb dec |  75.06  243.78  560.58  717.13  833.68  862.70  949.35  967.11

Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-11-04 17:35:32 +08:00
Tianjia Zhang
6b5360a5e0 crypto: arm64/sm4 - add CE implementation for cmac/xcbc/cbcmac
This patch is a CE-optimized assembly implementation for cmac/xcbc/cbcmac.

Benchmarked on a T-Head Yitian-710 at 2.75 GHz. The data comes from the 300
mode of tcrypt and compares the performance before and after this patch (the
driver used before this patch is XXXmac(sm4-ce)). The columns are update sizes
in bytes and the unit is Mb/s:

Before:

update-size    |      16      64     256    1024    2048    4096    8192
---------------+--------------------------------------------------------
cmac(sm4-ce)   |  293.33  403.69  503.76  527.78  531.10  535.46  535.81
xcbc(sm4-ce)   |  292.83  402.50  504.02  529.08  529.87  536.55  538.24
cbcmac(sm4-ce) |  318.42  415.79  497.12  515.05  523.15  521.19  523.01

After:

update-size    |      16      64     256    1024    2048    4096    8192
---------------+--------------------------------------------------------
cmac-sm4-ce    |  371.99  675.28  903.56  971.65  980.57  990.40  991.04
xcbc-sm4-ce    |  372.11  674.55  903.47  971.61  980.96  990.42  991.10
cbcmac-sm4-ce  |  371.63  675.33  903.23  972.07  981.42  990.93  991.45

Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-11-04 17:34:43 +08:00
Tianjia Zhang
01f633113b crypto: arm64/sm4 - add CE implementation for XTS mode
This patch is a CE-optimized assembly implementation for XTS mode.

Benchmarked on a T-Head Yitian-710 at 2.75 GHz. The data comes from the 218
mode of tcrypt and compares the performance before and after this patch (the
driver used before this patch is xts(ecb-sm4-ce)). The columns are block sizes
in bytes and the unit is Mb/s:

Before:

xts(ecb-sm4-ce) |      16       64      128      256     1024     1420     4096
----------------+--------------------------------------------------------------
        XTS enc |  117.17   430.56   732.92  1134.98  2007.03  2136.23  2347.20
        XTS dec |  116.89   429.02   733.40  1132.96  2006.13  2130.50  2347.92

After:

xts-sm4-ce      |      16       64      128      256     1024     1420     4096
----------------+--------------------------------------------------------------
        XTS enc |  224.68   798.91  1248.08  1714.60  2413.73  2467.84  2612.62
        XTS dec |  229.85   791.34  1237.79  1720.00  2413.30  2473.84  2611.95

Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-11-04 17:34:42 +08:00
Tianjia Zhang
b1863fd074 crypto: arm64/sm4 - add CE implementation for CTS-CBC mode
This patch is a CE-optimized assembly implementation for CTS-CBC mode.

Benchmarked on a T-Head Yitian-710 at 2.75 GHz. The data comes from the 218
mode of tcrypt and compares the performance before and after this patch (the
driver used before this patch is cts(cbc-sm4-ce)). The columns are block sizes
in bytes and the unit is Mb/s:

Before:

cts(cbc-sm4-ce) |      16       64      128      256     1024     1420     4096
----------------+--------------------------------------------------------------
    CTS-CBC enc |  286.09   297.17   457.97   627.75   868.58   900.80   957.69
    CTS-CBC dec |  286.67   285.63   538.35   947.08  2241.03  2577.32  3391.14

After:

cts-cbc-sm4-ce  |      16       64      128      256     1024     1420     4096
----------------+--------------------------------------------------------------
    CTS-CBC enc |  288.19   428.80   593.57   741.04   911.73   931.80   950.00
    CTS-CBC dec |  292.22   468.99   838.23  1380.76  2741.17  3036.42  3409.62

Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-11-04 17:34:42 +08:00
Tianjia Zhang
45089dbe59 crypto: arm64/sm4 - export reusable CE acceleration functions
The accelerated implementation of the SM4 algorithm using the Crypto
Extension instructions contains some functions that can be reused in the
upcoming accelerated implementation of the GCM/CCM modes, and the CBC/CFB
encryption is also reused in the optimized SVE SM4 implementation.

Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-11-04 17:34:42 +08:00
Tianjia Zhang
cb9ba02b07 crypto: arm64/sm4 - simplify sm4_ce_expand_key() of CE implementation
Use a 128-bit swap mask and tbl instruction to simplify the implementation
for generating SM4 rkey_dec.

Also fix an issue where calls to the sm4_ce_expand_key() function were
not wrapped in kernel_neon_begin()/kernel_neon_end().

Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-11-04 17:34:31 +08:00
Tianjia Zhang
ce41fefd24 crypto: arm64/sm4 - refactor and simplify CE implementation
This patch does not add new features, but only refactors and simplifies the
implementation of the Crypto Extension acceleration of the SM4 algorithm:

Extract the SM4 Crypto Extension macros for reuse in the subsequent
optimization of the CCM/GCM modes.

Encryption in CBC and CFB modes processes four blocks at a time instead of
one, allowing the ld1 instruction to load 64 bytes of data at a time, which
reduces unnecessary memory accesses.

CBC/CFB/CTR makes full use of free registers to reduce redundant memory
accesses, and rearranges some instructions to improve out-of-order execution
capabilities.

Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-11-04 17:34:31 +08:00
Tianjia Zhang
62508017a2 crypto: arm64/sm4 - refactor and simplify NEON implementation
This patch does not add new features. The main work is to refactor and
simplify the implementation of SM4 NEON, which is reflected in the
following aspects:

The accelerated implementation now supports an arbitrary number of blocks,
not just multiples of 8, which simplifies the implementation and speeds up
processing of data that is not a multiple of 8 blocks.

When loading the input data, use the ld4 instruction in place of the
original ld1 instruction wherever possible, which saves the cost of the
matrix transposition of the input data.

Use 8-block parallelism whenever possible to speed up the matrix transpose
and rotation operations, instead of at most 4-block parallelism.

Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-11-04 17:34:21 +08:00
Tianjia Zhang
a41b212946 crypto: arm64/sm3 - add NEON assembly implementation
This patch adds the NEON acceleration implementation of the SM3 hash
algorithm. The main algorithm is based on SM3 NEON accelerated work of
the libgcrypt project.

Benchmarked on a T-Head Yitian-710 at 2.75 GHz. The data comes from the 326
mode of tcrypt and compares the performance of sm3-generic and sm3-ce.
The columns are update sizes in bytes and the unit is Mb/s:

update-size    |      16      64     256    1024    2048    4096    8192
---------------+--------------------------------------------------------
sm3-generic    |  185.24  221.28  301.26  307.43  300.83  308.82  308.91
sm3-neon       |  171.81  220.20  322.94  339.28  334.09  343.61  343.87
sm3-ce         |  227.48  333.48  502.62  527.87  520.45  534.91  535.40

Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-11-04 17:34:21 +08:00
Tianjia Zhang
e1fa51aa2b crypto: arm64/sm3 - raise the priority of the CE implementation
Raise the priority of the sm3-ce algorithm from 200 to 400; this makes
room for the sm3-neon implementation.
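
In the shash_alg definition this is just the cra_priority field; a sketch with
most fields elided:

  static struct shash_alg sm3_ce_alg = {
          /* ... digest size and init/update/final ops elided ... */
          .base.cra_name        = "sm3",
          .base.cra_driver_name = "sm3-ce",
          .base.cra_priority    = 400,  /* raised from 200, sm3-neon fits below */
  };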

Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-11-04 17:33:22 +08:00
Linus Torvalds
3604a7f568 This update includes the following changes:
API:
 
 - Feed untrusted RNGs into /dev/random.
 - Allow HWRNG sleeping to be more interruptible.
 - Create lib/utils module.
 - Setting private keys no longer required for akcipher.
 - Remove tcrypt mode=1000.
 - Reorganised Kconfig entries.
 
 Algorithms:
 
 - Load x86/sha512 based on CPU features.
 - Add AES-NI/AVX/x86_64/GFNI assembler implementation of aria cipher.
 
 Drivers:
 
 - Add HACE crypto driver aspeed.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEn51F/lCuNhUwmDeSxycdCkmxi6cFAmM785cACgkQxycdCkmx
 i6dveBAAmGVYtrPmcGfA6CmzZ8ps9KdZxhjHjzLKwuqrOMulZvE2IYeUV4QtNqpQ
 6NLY2+TkqL0XIbCXoByIk32lMYIlXBaJdMYdHHDTeo7E2wqZn/46SPSWeNKazyJx
 dkL8Oj62nqDc2s0LOi3vLvod+sENFQ69R+vkHOa0fZhX0UBsac3NIXo+74Y2A7bE
 0+iQFKTWdNnoQzQ0j4q8WMiolKYh21iPZ9l5sjgMgichLCaE6PrITlRcaWrtPhey
 U1OmJtbTPsg+5X1r9KyLtoAXtBDONl66GQyne+p/ZYD8cMhxomjJaPlMhwWE/n4d
 d2KJKvoXoPPo4c+yNIS9hBav07ZriPl0q0jd2M1rd6oYTmFpaodTgIBfjvxO+wfV
 GoqDS8PEc42U1uwkuKC/cvfr6pB8WiybfXy+vSXBm/jUgIOO3y+eqsC8Jx9ZoQeG
 F+d34PYfJrJbmDRtcA6ZKdzN0OmKq7aCilx1kGKGPg0D+uq64FBo7zsT6XzTK8HL
 2Za9AACPn87xLQwGrKDSBfyrlSSIJm2FaIIPayUXHEo7cyoiZwbTpXRRJ1mDR+v9
 jzI+xPEXCthtjysuRmufNhTkiZUv3lZ8ORfQ0QFKR53tjZUm+dVQo0V/N/ZSXoSV
 SyRvXYO+ToXePAofNWl1LcO1grX/vxtFNedMkDLHXooRcnCaIYo=
 =rq2f
 -----END PGP SIGNATURE-----

Merge tag 'v6.1-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto updates from Herbert Xu:
 "API:
   - Feed untrusted RNGs into /dev/random
   - Allow HWRNG sleeping to be more interruptible
   - Create lib/utils module
   - Setting private keys no longer required for akcipher
   - Remove tcrypt mode=1000
   - Reorganised Kconfig entries

  Algorithms:
   - Load x86/sha512 based on CPU features
   - Add AES-NI/AVX/x86_64/GFNI assembler implementation of aria cipher

  Drivers:
   - Add HACE crypto driver aspeed"

* tag 'v6.1-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (124 commits)
  crypto: aspeed - Remove redundant dev_err call
  crypto: scatterwalk - Remove unused inline function scatterwalk_aligned()
  crypto: aead - Remove unused inline functions from aead
  crypto: bcm - Simplify obtain the name for cipher
  crypto: marvell/octeontx - use sysfs_emit() to instead of scnprintf()
  hwrng: core - start hwrng kthread also for untrusted sources
  crypto: zip - remove the unneeded result variable
  crypto: qat - add limit to linked list parsing
  crypto: octeontx2 - Remove the unneeded result variable
  crypto: ccp - Remove the unneeded result variable
  crypto: aspeed - Fix check for platform_get_irq() errors
  crypto: virtio - fix memory-leak
  crypto: cavium - prevent integer overflow loading firmware
  crypto: marvell/octeontx - prevent integer overflows
  crypto: aspeed - fix build error when only CRYPTO_DEV_ASPEED is enabled
  crypto: hisilicon/qm - fix the qos value initialization
  crypto: sun4i-ss - use DEFINE_SHOW_ATTRIBUTE to simplify sun4i_ss_debugfs
  crypto: tcrypt - add async speed test for aria cipher
  crypto: aria-avx - add AES-NI/AVX/x86_64/GFNI assembler implementation of aria cipher
  crypto: aria - prepare generic module for optimized implementations
  ...
2022-10-10 13:04:25 -07:00
Sami Tolvanen
c50d32859e arm64: Add types to indirect called assembly functions
With CONFIG_CFI_CLANG, assembly functions indirectly called from C
code must be annotated with type identifiers to pass CFI checking. Use
SYM_TYPED_FUNC_START for the indirectly called functions, and ensure
we emit `bti c` also with SYM_TYPED_FUNC_START.

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Kees Cook <keescook@chromium.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220908215504.3686827-10-samitolvanen@google.com
2022-09-26 10:13:13 -07:00
Lukas Bulwahn
611d451e40 crypto: arm64 - revert unintended config name change for CRYPTO_SHA1_ARM64_CE
Commit 3f342a2325 ("crypto: Kconfig - simplify hash entries") makes
various changes to the config descriptions as part of some consolidation
and clean-up, but among all those changes, it also accidentally renames
CRYPTO_SHA1_ARM64_CE to CRYPTO_SHA1_ARM64.

Revert this unintended config name change.

See Link for the author's confirmation of this happening accidentally.

Fixes: 3f342a2325 ("crypto: Kconfig - simplify hash entries")
Link: https://lore.kernel.org/all/MW5PR84MB18424AB8C095BFC041AE33FDAB479@MW5PR84MB1842.NAMPRD84.PROD.OUTLOOK.COM/
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-09-24 16:14:43 +08:00
Robert Elliott
cf514b2a59 crypto: Kconfig - simplify cipher entries
Shorten menu titles and make them consistent:
- acronym
- name
- architecture features in parenthesis
- no suffixes like "<something> algorithm", "support", or
  "hardware acceleration", or "optimized"

Simplify help text descriptions, update references, and ensure that
https references are still valid.

Signed-off-by: Robert Elliott <elliott@hpe.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-08-26 18:50:43 +08:00
Robert Elliott
3f342a2325 crypto: Kconfig - simplify hash entries
Shorten menu titles and make them consistent:
- acronym
- name
- architecture features in parenthesis
- no suffixes like "<something> algorithm", "support", or
  "hardware acceleration", or "optimized"

Simplify help text descriptions, update references, and ensure that
https references are still valid.

Signed-off-by: Robert Elliott <elliott@hpe.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-08-26 18:50:43 +08:00
Robert Elliott
ec84348da4 crypto: Kconfig - simplify CRC entries
Shorten menu titles and make them consistent:
- acronym
- name
- architecture features in parenthesis
- no suffixes like "<something> algorithm", "support", or
  "hardware acceleration", or "optimized"

Simplify help text descriptions, update references, and ensure that
https references are still valid.

Signed-off-by: Robert Elliott <elliott@hpe.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-08-26 18:50:42 +08:00
Robert Elliott
9e5647eb06 crypto: Kconfig - sort the arm64 entries
Sort the arm64 entries so all like entries are together.

Signed-off-by: Robert Elliott <elliott@hpe.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-08-26 18:50:42 +08:00
Robert Elliott
4a329fecc9 crypto: Kconfig - submenus for arm and arm64
Move ARM- and ARM64-accelerated menus into a submenu under
the Crypto API menu (paralleling all the architectures).

Make each submenu always appear if the corresponding architecture
is supported. Get rid of the ARM_CRYPTO and ARM64_CRYPTO symbols.

The "ARM Accelerated" or "ARM64 Accelerated" entry disappears from:
    General setup  --->
    Platform selection  --->
    Kernel Features  --->
    Boot options  --->
    Power management options  --->
    CPU Power Management  --->
[*] ACPI (Advanced Configuration and Power Interface) Support  --->
[*] Virtualization  --->
[*] ARM Accelerated Cryptographic Algorithms  --->
     (or)
[*] ARM64 Accelerated Cryptographic Algorithms  --->
    ...
-*- Cryptographic API  --->
    Library routines  --->
    Kernel hacking  --->

and moves into the Cryptographic API menu, which now contains:
      ...
      Accelerated Cryptographic Algorithms for CPU (arm) --->
      (or)
      Accelerated Cryptographic Algorithms for CPU (arm64) --->
[*]   Hardware crypto devices  --->
      ...

Suggested-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Robert Elliott <elliott@hpe.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-08-26 18:50:41 +08:00
GUO Zihua
7ae19d422c crypto: arm64/poly1305 - fix a read out-of-bound
A kasan error was reported during fuzzing:

BUG: KASAN: slab-out-of-bounds in neon_poly1305_blocks.constprop.0+0x1b4/0x250 [poly1305_neon]
Read of size 4 at addr ffff0010e293f010 by task syz-executor.5/1646715
CPU: 4 PID: 1646715 Comm: syz-executor.5 Kdump: loaded Not tainted 5.10.0.aarch64 #1
Hardware name: Huawei TaiShan 2280 /BC11SPCD, BIOS 1.59 01/31/2019
Call trace:
 dump_backtrace+0x0/0x394
 show_stack+0x34/0x4c arch/arm64/kernel/stacktrace.c:196
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x158/0x1e4 lib/dump_stack.c:118
 print_address_description.constprop.0+0x68/0x204 mm/kasan/report.c:387
 __kasan_report+0xe0/0x140 mm/kasan/report.c:547
 kasan_report+0x44/0xe0 mm/kasan/report.c:564
 check_memory_region_inline mm/kasan/generic.c:187 [inline]
 __asan_load4+0x94/0xd0 mm/kasan/generic.c:252
 neon_poly1305_blocks.constprop.0+0x1b4/0x250 [poly1305_neon]
 neon_poly1305_do_update+0x6c/0x15c [poly1305_neon]
 neon_poly1305_update+0x9c/0x1c4 [poly1305_neon]
 crypto_shash_update crypto/shash.c:131 [inline]
 shash_finup_unaligned+0x84/0x15c crypto/shash.c:179
 crypto_shash_finup+0x8c/0x140 crypto/shash.c:193
 shash_digest_unaligned+0xb8/0xe4 crypto/shash.c:201
 crypto_shash_digest+0xa4/0xfc crypto/shash.c:217
 crypto_shash_tfm_digest+0xb4/0x150 crypto/shash.c:229
 essiv_skcipher_setkey+0x164/0x200 [essiv]
 crypto_skcipher_setkey+0xb0/0x160 crypto/skcipher.c:612
 skcipher_setkey+0x3c/0x50 crypto/algif_skcipher.c:305
 alg_setkey+0x114/0x2a0 crypto/af_alg.c:220
 alg_setsockopt+0x19c/0x210 crypto/af_alg.c:253
 __sys_setsockopt+0x190/0x2e0 net/socket.c:2123
 __do_sys_setsockopt net/socket.c:2134 [inline]
 __se_sys_setsockopt net/socket.c:2131 [inline]
 __arm64_sys_setsockopt+0x78/0x94 net/socket.c:2131
 __invoke_syscall arch/arm64/kernel/syscall.c:36 [inline]
 invoke_syscall+0x64/0x100 arch/arm64/kernel/syscall.c:48
 el0_svc_common.constprop.0+0x220/0x230 arch/arm64/kernel/syscall.c:155
 do_el0_svc+0xb4/0xd4 arch/arm64/kernel/syscall.c:217
 el0_svc+0x24/0x3c arch/arm64/kernel/entry-common.c:353
 el0_sync_handler+0x160/0x164 arch/arm64/kernel/entry-common.c:369
 el0_sync+0x160/0x180 arch/arm64/kernel/entry.S:683

This error can be reproduced by the following code, compiled as a kernel
module (.ko) on a system with KASAN enabled:

#include <linux/module.h>
#include <linux/crypto.h>
#include <crypto/hash.h>
#include <crypto/poly1305.h>

char test_data[] = "\x00\x01\x02\x03\x04\x05\x06\x07"
                   "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f"
                   "\x10\x11\x12\x13\x14\x15\x16\x17"
                   "\x18\x19\x1a\x1b\x1c\x1d\x1e";

int init(void)
{
        struct crypto_shash *tfm = NULL;
        char *data = NULL, *out = NULL;

        tfm = crypto_alloc_shash("poly1305", 0, 0);
        data = kmalloc(POLY1305_KEY_SIZE - 1, GFP_KERNEL);
        out = kmalloc(POLY1305_DIGEST_SIZE, GFP_KERNEL);
        memcpy(data, test_data, POLY1305_KEY_SIZE - 1);
        crypto_shash_tfm_digest(tfm, data, POLY1305_KEY_SIZE - 1, out);

        kfree(data);
        kfree(out);
        return 0;
}

void deinit(void)
{
}

module_init(init)
module_exit(deinit)
MODULE_LICENSE("GPL");

The root cause of the bug sits in neon_poly1305_blocks(). When called
with both s[] and r[] uninitialized, it first tries to initialize them
with data from the first "block", which it assumes to be 32 bytes long:
the first 16 bytes are used as the key and the next 16 bytes for s[].
This leads to the aforementioned out-of-bounds read. However, after
calling poly1305_init_arch(), only 16 bytes are deducted from the input
and s[] is initialized again with the following 16 bytes. The second
initialization of s[] is clearly redundant, which indicates that the
first initialization should cover r[] only.

This patch fixes the issue by calling poly1305_init_arm64() instead of
poly1305_init_arch(), matching the implementation of the same algorithm
on the arm platform.
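
The essence of the fix, as a sketch of the affected fragment (simplified, based
on the description above, where poly1305_init_arm64() derives only r[] from the
first 16 input bytes):

  /* Inside the "r[] not yet set" branch of the NEON blocks helper: */

  /* before: consumed a full 32-byte key although only 16 bytes of input
   * were meant to be used here:
   *
   *   poly1305_init_arch(dctx, src);
   */

  /* after: derive only r[] from the first 16 bytes of input */
  poly1305_init_arm64(&dctx->h, src);
  src += POLY1305_BLOCK_SIZE;
  len -= POLY1305_BLOCK_SIZE;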

Fixes: f569ca1647 ("crypto: arm64/poly1305 - incorporate OpenSSL/CRYPTOGAMS NEON implementation")
Cc: stable@vger.kernel.org
Signed-off-by: GUO Zihua <guozihua@huawei.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-07-29 18:29:17 +08:00
Qian Cai
fac76f2260 crypto: arm64/gcm - Select AEAD for GHASH_ARM64_CE
Otherwise, we could fail to compile.

ld: arch/arm64/crypto/ghash-ce-glue.o: in function 'ghash_ce_mod_exit':
ghash-ce-glue.c:(.exit.text+0x24): undefined reference to 'crypto_unregister_aead'
ld: arch/arm64/crypto/ghash-ce-glue.o: in function 'ghash_ce_mod_init':
ghash-ce-glue.c:(.init.text+0x34): undefined reference to 'crypto_register_aead'

Fixes: 537c1445ab ("crypto: arm64/gcm - implement native driver using v8 Crypto Extensions")
Signed-off-by: Qian Cai <quic_qiancai@quicinc.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-06-30 15:56:58 +08:00
Jilin Yuan
1b069597c2 crypto: arm64/aes-neon - Fix typo in comment
Delete the redundant word 'the'.

Signed-off-by: Jilin Yuan <yuanjilin@cdjrlc.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-06-30 15:56:57 +08:00
Nathan Huckleberry
9d2c0b485c crypto: arm64/polyval - Add PMULL accelerated implementation of POLYVAL
Add hardware accelerated version of POLYVAL for ARM64 CPUs with
Crypto Extensions support.

This implementation is accelerated using PMULL instructions to perform
the finite field computations.  For added efficiency, 8 blocks of the
message are processed simultaneously by precomputing the first 8
powers of the key.

Karatsuba multiplication is used instead of Schoolbook multiplication
because it was found to be slightly faster on ARM64 CPUs.  Montgomery
reduction must be used instead of Barrett reduction due to the
difference in modulus between POLYVAL's field and other finite fields.

More information on POLYVAL can be found in the HCTR2 paper:
"Length-preserving encryption with HCTR2":
https://eprint.iacr.org/2021/1441.pdf

Signed-off-by: Nathan Huckleberry <nhuck@google.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-06-10 16:40:18 +08:00
Nathan Huckleberry
c0eb7591c1 crypto: arm64/aes-xctr - Improve readability of XCTR and CTR modes
Added some clarifying comments, changed the register allocations to make
the code clearer, and added register aliases.

Signed-off-by: Nathan Huckleberry <nhuck@google.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-06-10 16:40:17 +08:00
Nathan Huckleberry
23a251cc16 crypto: arm64/aes-xctr - Add accelerated implementation of XCTR
Add hardware accelerated version of XCTR for ARM64 CPUs with ARMv8
Crypto Extension support.  This XCTR implementation is based on the CTR
implementation in aes-modes.S.

More information on XCTR can be found in
the HCTR2 paper: "Length-preserving encryption with HCTR2":
https://eprint.iacr.org/2021/1441.pdf

Signed-off-by: Nathan Huckleberry <nhuck@google.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-06-10 16:40:17 +08:00
Tianjia Zhang
b45b0a1220 crypto: arm64/sm4 - Fix wrong dependency of NEON/CE implementation
Commit d2825fa936 ("crypto: sm3,sm4 - move into crypto directory")
moved the sm4 library implementation from the lib/crypto directory to
the crypto directory and renamed its Kconfig symbol to CRYPTO_SM4. The
arm64 SM4 NEON/CE implementations depend on this symbol and need to be
updated to match.

Fixes: 4f1aef9b80 ("crypto: arm64/sm4 - add ARMv8 NEON implementation")
Fixes: 5b33e0ec88 ("crypto: arm64/sm4 - add ARMv8 Crypto Extensions implementation")
Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-04-15 16:34:30 +08:00
Tianjia Zhang
5b33e0ec88 crypto: arm64/sm4 - add ARMv8 Crypto Extensions implementation
This adds ARMv8 implementations of SM4 in ECB, CBC, CFB and CTR modes
using the Crypto Extensions. It also includes the key expansion
operations, because the Crypto Extensions instructions are much faster
than the software implementations.

The Crypto Extensions for SM4 can only run on ARMv8 implementations
that have support for these optional extensions.

Benchmarked on a T-Head Yitian-710 at 2.75 GHz. The data comes from the
218 mode of tcrypt. The columns are block sizes in bytes and the unit
is Mb/s:

sm4-generic |     16       64      128      256     1024     1420     4096
    ECB enc |  80.05    91.42    93.66    94.77    95.69    95.77    95.86
    ECB dec |  79.98    91.41    93.64    94.76    95.66    95.77    95.85
    CBC enc |  78.55    86.50    88.02    88.77    89.36    89.42    89.48
    CBC dec |  76.82    89.06    91.52    92.77    93.75    93.83    93.96
    CFB enc |  77.64    86.13    87.62    88.42    89.08    88.83    89.18
    CFB dec |  77.57    88.34    90.36    91.45    92.34    92.00    92.44
    CTR enc |  77.80    88.28    90.23    91.22    92.11    91.81    92.25
    CTR dec |  77.83    88.22    90.22    91.22    92.04    91.82    92.28
sm4-neon
    ECB enc |  28.31   112.77   203.03   209.89   215.49   202.11   210.59
    ECB dec |  28.36   113.45   203.23   210.00   215.52   202.13   210.65
    CBC enc |  79.32    87.02    88.51    89.28    89.85    89.89    89.97
    CBC dec |  28.29   112.20   203.30   209.82   214.99   201.51   209.95
    CFB enc |  79.59    87.16    88.54    89.30    89.83    89.62    89.92
    CFB dec |  28.12   111.05   202.47   209.02   214.21   210.90   209.12
    CTR enc |  28.04   108.81   200.62   206.65   211.78   208.78   206.74
    CTR dec |  28.02   108.82   200.45   206.62   211.78   208.74   206.70
sm4-ce-cipher
    ECB enc | 336.79   587.13   682.70   747.37   803.75   811.52   818.06
    ECB dec | 339.18   584.52   679.72   743.68   798.82   803.83   811.54
    CBC enc | 316.63   521.47   597.00   647.14   690.82   695.21   700.55
    CBC dec | 291.80   503.79   585.66   640.82   689.86   695.16   701.72
    CFB enc | 294.79   482.31   552.13   594.71   631.60   628.91   638.92
    CFB dec | 293.09   466.44   526.56   563.17   594.41   592.26   601.97
    CTR enc | 309.61   506.13   576.86   620.47   656.38   654.51   665.10
    CTR dec | 306.69   505.57   576.84   620.18   657.09   654.52   665.32
sm4-ce
    ECB enc | 366.96  1329.81  2024.29  2755.50  3790.07  3861.91  4051.40
    ECB dec | 367.30  1323.93  2018.72  2747.43  3787.39  3862.55  4052.62
    CBC enc | 358.09   682.68   807.24   885.35   958.29   963.60   973.73
    CBC dec | 366.51  1303.63  1978.64  2667.93  3624.53  3683.41  3856.08
    CFB enc | 351.51   681.26   807.81   893.10   968.54   969.17   985.83
    CFB dec | 354.98  1266.61  1929.63  2634.81  3614.23  3611.59  3841.68
    CTR enc | 324.23  1121.25  1689.44  2256.70  2981.90  3007.79  3060.74
    CTR dec | 324.18  1120.44  1694.31  2258.32  2982.01  3010.09  3060.99

Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-04-08 16:13:29 +08:00
Tianjia Zhang
4f1aef9b80 crypto: arm64/sm4 - add ARMv8 NEON implementation
This adds ARMv8 NEON implementations of SM4 in ECB, CBC, CFB and CTR
modes. The implementation uses the plain NEON instruction set; all
S-box substitutions use the tbl/tbx instructions of ARMv8. Combined
with out-of-order execution in the CPU, this optimization supports
encryption of up to 8 blocks at the same time.

The performance of encrypting a single block is not as good as the
software implementation, so the CBC and CFB encryption operations
still use the pure software algorithm.

Benchmarked on a T-Head Yitian-710 at 2.75 GHz. The data comes from the
218 mode of tcrypt. The columns are block sizes in bytes and the unit
is Mb/s:

sm4-generic |     16       64      128      256     1024     1420     4096
    ECB enc |  80.05    91.42    93.66    94.77    95.69    95.77    95.86
    ECB dec |  79.98    91.41    93.64    94.76    95.66    95.77    95.85
    CBC enc |  78.55    86.50    88.02    88.77    89.36    89.42    89.48
    CBC dec |  76.82    89.06    91.52    92.77    93.75    93.83    93.96
    CFB enc |  77.64    86.13    87.62    88.42    89.08    88.83    89.18
    CFB dec |  77.57    88.34    90.36    91.45    92.34    92.00    92.44
    CTR enc |  77.80    88.28    90.23    91.22    92.11    91.81    92.25
    CTR dec |  77.83    88.22    90.22    91.22    92.04    91.82    92.28
sm4-neon
    ECB enc |  28.31   112.77   203.03   209.89   215.49   202.11   210.59
    ECB dec |  28.36   113.45   203.23   210.00   215.52   202.13   210.65
    CBC enc |  79.32    87.02    88.51    89.28    89.85    89.89    89.97
    CBC dec |  28.29   112.20   203.30   209.82   214.99   201.51   209.95
    CFB enc |  79.59    87.16    88.54    89.30    89.83    89.62    89.92
    CFB dec |  28.12   111.05   202.47   209.02   214.21   210.90   209.12
    CTR enc |  28.04   108.81   200.62   206.65   211.78   208.78   206.74
    CTR dec |  28.02   108.82   200.45   206.62   211.78   208.74   206.70

Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-04-08 16:13:13 +08:00
Tianjia Zhang
02436762f5 crypto: arm64/sm4-ce - rename to sm4-ce-cipher
Subsequent patches in this series will add an implementation of
SM4-ECB/CBC/CFB/CTR accelerated by the CE instruction set, which
conflicts with the current module name. To keep the naming rules
consistent with the AES algorithm, the sm4-ce algorithm is renamed
to sm4-ce-cipher.

In addition, sm4-ce-cipher is faster than SM4 NEON, so the priority
of the algorithm is adjusted to 300, which also leaves room for the
priority of SM4 NEON.

Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-04-08 16:12:47 +08:00
Jason A. Donenfeld
d2825fa936 crypto: sm3,sm4 - move into crypto directory
The lib/crypto libraries live in lib because they are used by various
drivers of the kernel. In contrast, the various helper functions in
crypto are there because they're used exclusively by the crypto API. The
SM3 and SM4 helper functions were erroneously moved into lib/crypto/
instead of crypto/, even though those functions have no in-kernel users
outside of the crypto API. This commit moves them into crypto/.

Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Cc: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-04-08 16:11:48 +08:00
Tom Rix
cd6714f940 crypto: arm64 - cleanup comments
For SPDX, use // for *.c files

Replacements
significanty to significantly

Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-03-09 15:12:32 +12:00
Ard Biesheuvel
dfc6031ec9 crypto: arm64/aes-neonbs-xts - use plain NEON for non-power-of-2 input sizes
Even though the kernel's implementations of AES-XTS were updated to
implement ciphertext stealing and can operate on inputs of any size
larger than or equal to the AES block size, this feature is rarely used
in practice.

In fact, in the kernel, AES-XTS is only used to operate on 4096 or 512
byte blocks, which means that not only the ciphertext stealing is
effectively dead code, the logic in the bit sliced NEON implementation
to deal with fewer than 8 blocks at a time is also never used.

Since the bit-sliced NEON driver already depends on the plain NEON
version, which is slower but can operate on smaller data quantities more
straightforwardly, let's fall back to the plain NEON implementation of
XTS for any residual inputs that are not multiples of 128 bytes. This
allows us to remove a lot of complicated logic that rarely gets
exercised in practice.
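
A rough sketch of the glue-level split being described, with made-up
helper names standing in for the real asm routines (the two drivers
actually use differently formatted round keys, which is glossed over
here, and len is assumed to be a multiple of the AES block size):

  #include <linux/types.h>

  void aesbs_xts_encrypt_8way(u8 *dst, const u8 *src, const u8 *rk,
                              int rounds, int blocks, u8 iv[16]);
  void neon_aes_xts_encrypt(u8 *dst, const u8 *src, const u8 *rk,
                            int rounds, int blocks, u8 iv[16]);

  static void xts_encrypt_sketch(u8 *dst, const u8 *src, unsigned int len,
                                 const u8 *rk, int rounds, u8 iv[16])
  {
          unsigned int bulk = len & ~127U;   /* largest multiple of 128 */

          if (bulk)                          /* 8 blocks per iteration */
                  aesbs_xts_encrypt_8way(dst, src, rk, rounds,
                                         bulk / 16, iv);

          if (len > bulk)                    /* residual 16..112 bytes */
                  neon_aes_xts_encrypt(dst + bulk, src + bulk, rk, rounds,
                                       (len - bulk) / 16, iv);
  }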

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-02-05 15:10:51 +11:00
Ard Biesheuvel
fc074e1300 crypto: arm64/aes-neonbs-ctr - fallback to plain NEON for final chunk
Instead of processing the entire input with the 8-way bit sliced
algorithm, which is sub-optimal for inputs that are not a multiple of
128 bytes in size, invoke the plain NEON version of CTR for the
remainder of the input after processing the bulk using 128 byte strides.

This allows us to greatly simplify the asm code that implements CTR, and
get rid of all the branches and special code paths. It also gains us a
couple of percent of performance.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-02-05 15:10:51 +11:00
Ard Biesheuvel
8daa399ede crypto: arm64/aes-neon-ctr - improve handling of single tail block
Instead of falling back to C code to do a memcpy of the output of the
last block, handle this in the asm code directly if possible, which is
the case if the entire input is longer than 16 bytes.

Cc: Nathan Huckleberry <nhuck@google.com>
Cc: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-02-05 15:10:51 +11:00
Tianjia Zhang
f3a03d319d crypto: arm64/sm3-ce - make dependent on sm3 library
The SM3 generic library is a stand-alone implementation, so sm3-ce can
depend on the SM3 library instead of sm3-generic.

Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-01-28 16:51:10 +11:00
Mark Brown
9be34be87c arm64: Add macro version of the BTI instruction
BTI is only available from v8.5 so we need to encode it using HINT in
generic code and for older toolchains. Add an assembler macro based on
one written by Mark Rutland which lets us use the mnemonic and update
the existing users.

Suggested-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20211214152714.2380849-2-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-12-14 18:12:58 +00:00
Ard Biesheuvel
898387e40c crypto: arm64/aes-ccm - avoid by-ref argument for ce_aes_ccm_auth_data
With the SIMD code path removed, we can clean up the CCM auth-only path
a bit further, by passing the 'macp' input buffer pointer by value,
rather than by reference, and taking the output value from the
function's return value.

This way, the compiler is no longer forced to allocate macp on the
stack. This is not expected to make any difference in practice; it just
makes for slightly cleaner code.
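
In prototype terms the change looks roughly like this (parameter names
and qualifiers are an approximation of the real declaration):

  #include <linux/types.h>

  /*
   * before: the count was passed and updated by reference, forcing the
   * caller to keep it in a stack slot:
   *
   *   void ce_aes_ccm_auth_data(u8 mac[], const u8 in[], u32 abytes,
   *                             u32 *macp, const u32 rk[], u32 rounds);
   */

  /* after: the count is passed by value and the updated value returned */
  u32 ce_aes_ccm_auth_data(u8 mac[], const u8 in[], u32 abytes,
                           u32 macp, const u32 rk[], u32 rounds);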

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-09-17 11:05:11 +08:00
Ard Biesheuvel
741691c446 crypto: arm64/aes-ccm - reduce NEON begin/end calls for common case
AES-CCM (as used in WPA2 CCMP, for instance) typically involves
authenticate-only data and operates on a single network packet, so the
common case is for the authenticate, en/decrypt and finalize SIMD
helpers to all be called exactly once in sequence. Since
kernel_neon_end() now involves manipulation of the preemption state as
well as the softirq mask state, let's reduce the number of times we are
forced to call it to only once if we are handling this common case.
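
A minimal sketch of the resulting fast path, with made-up names standing
in for the real asm helpers and all argument marshalling omitted:

  #include <linux/types.h>
  #include <asm/neon.h>

  /* hypothetical stand-ins for the asm helpers */
  void ccm_auth_asm(u8 mac[16], const u8 *adata, u32 alen,
                    const u32 *rk, int rounds);
  void ccm_crypt_asm(u8 *dst, const u8 *src, u32 len,
                     const u32 *rk, int rounds, u8 mac[16], u8 ctr[16]);
  void ccm_final_asm(u8 mac[16], const u8 *orig_iv,
                     const u32 *rk, int rounds);

  /* whole packet handled under one kernel_neon_begin()/end() pair */
  static void ccm_one_packet(u8 *dst, const u8 *src, u32 len,
                             const u8 *adata, u32 alen,
                             u8 mac[16], u8 ctr[16], const u8 orig_iv[16],
                             const u32 *rk, int rounds)
  {
          kernel_neon_begin();
          ccm_auth_asm(mac, adata, alen, rk, rounds);
          ccm_crypt_asm(dst, src, len, rk, rounds, mac, ctr);
          ccm_final_asm(mac, orig_iv, rk, rounds);
          kernel_neon_end();
  }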

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-09-17 11:05:11 +08:00
Ard Biesheuvel
b3482635e5 crypto: arm64/aes-ccm - remove non-SIMD fallback path
AES/CCM on arm64 is implemented as a synchronous AEAD, and so it is
guaranteed by the API that it is only invoked in task or softirq
context. Since softirqs are now only handled when the SIMD is not
being used in the task context that was interrupted to service the
softirq, we no longer need a fallback path. Let's remove it.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-09-17 11:05:11 +08:00
Ard Biesheuvel
36a916af64 crypto: arm64/aes-ccm - yield NEON when processing auth-only data
In SIMD accelerated crypto drivers, we typically yield the SIMD unit
after processing 4 KiB of input, to avoid scheduling blackouts caused by
the fact that claiming the SIMD unit disables preemption as well as
softirq processing.

The arm64 CCM driver does this implicitly for the ciphertext, due to the
fact that the skcipher API never processes more than a single page at a
time. However, the scatterwalk performed by this driver when processing
the authenticate-only data will keep the SIMD unit occupied until it
completes.

So cap the scatterwalk steps to 4 KiB.
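
One possible shape of the capped walk (a single scatterlist entry is
assumed, and mac_update_asm is a made-up name for the asm helper):

  #include <linux/types.h>
  #include <linux/minmax.h>
  #include <linux/sizes.h>
  #include <linux/scatterlist.h>
  #include <crypto/scatterwalk.h>
  #include <asm/neon.h>

  void mac_update_asm(u8 mac[16], const u8 *in, u32 len,
                      const u32 *rk, int rounds);   /* hypothetical */

  static void ccm_auth_walk(struct scatterlist *sg, u32 assoclen,
                            u8 mac[16], const u32 *rk, int rounds)
  {
          struct scatter_walk walk;

          scatterwalk_start(&walk, sg);

          do {
                  u32 n = scatterwalk_clamp(&walk, assoclen);
                  u8 *p;

                  n = min_t(u32, n, SZ_4K);       /* cap each SIMD section */
                  p = scatterwalk_map(&walk);

                  kernel_neon_begin();
                  mac_update_asm(mac, p, n, rk, rounds);
                  kernel_neon_end();              /* preemption may occur here */

                  scatterwalk_unmap(p);
                  scatterwalk_advance(&walk, n);
                  assoclen -= n;
                  scatterwalk_done(&walk, 0, assoclen);
          } while (assoclen);
  }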

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-09-17 11:05:10 +08:00
Ard Biesheuvel
676e508122 crypto: arm64/aes-ce - stop using SIMD helper for skciphers
Calls into the skcipher API can only occur from contexts where the SIMD
unit is available, so there is no need for the SIMD helper.

Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-09-17 11:05:10 +08:00
Ard Biesheuvel
96c34e1436 crypto: arm64/aes-neonbs - stop using SIMD helper for skciphers
Calls into the skcipher API can only occur from contexts where the SIMD
unit is available, so there is no need for the SIMD helper.

Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-09-17 11:05:10 +08:00
Ard Biesheuvel
b9e699f912 crypto: arm64/gcm-aes-ce - remove non-SIMD fallback path
Now that kernel mode SIMD is guaranteed to be available when executing
in task or softirq context, we no longer need scalar fallbacks to use
when the NEON is unavailable. So get rid of them.

Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-09-17 11:05:10 +08:00
Tianjia Zhang
c59de48e12 crypto: arm64/sm4-ce - Make dependent on sm4 library instead of sm4-generic
The SM4 library is abstracted from the sm4-generic algorithm, so sm4-ce
can depend on the SM4 library instead of sm4-generic, and some functions
in sm4-generic no longer need to be exported.

Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-07-30 10:58:30 +08:00
Masahiro Yamada
2063257d4b crypto: arm64 - use a pattern rule for generating *.S files
Unify similar build rules.

sha256-core.S is excluded because it is generated from sha512-armv8.pl.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-05-14 19:07:54 +08:00
Masahiro Yamada
12dd461ebd crypto: arm64 - generate *.S by Perl at build time instead of shipping them
Generate the *.S files with Perl at build time, as
arch/{mips,x86}/crypto/Makefile does.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-05-14 19:07:54 +08:00
Linus Torvalds
31a24ae89c arm64 updates for 5.13:

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Catalin Marinas:

 - MTE asynchronous support for KASan. Previously only synchronous
   (slower) mode was supported. Asynchronous is faster but does not
   allow precise identification of the illegal access.

 - Run kernel mode SIMD with softirqs disabled. This allows using NEON
   in softirq context for crypto performance improvements. The
   conditional yield support is modified to take softirqs into account
   and reduce the latency.

 - Preparatory patches for Apple M1: handle CPUs that only have the VHE
   mode available (host kernel running at EL2), add FIQ support.

 - arm64 perf updates: support for HiSilicon PA and SLLC PMU drivers,
   new functions for the HiSilicon HHA and L3C PMU, cleanups.

 - Re-introduce support for execute-only user permissions but only when
   the EPAN (Enhanced Privileged Access Never) architecture feature is
   available.

 - Disable fine-grained traps at boot and improve the documented boot
   requirements.

 - Support CONFIG_KASAN_VMALLOC on arm64 (only with KASAN_GENERIC).

 - Add hierarchical eXecute Never permissions for all page tables.

 - Add arm64 prctl(PR_PAC_{SET,GET}_ENABLED_KEYS) allowing user programs
   to control which PAC keys are enabled in a particular task.

 - arm64 kselftests for BTI and some improvements to the MTE tests.

 - Minor improvements to the compat vdso and sigpage.

 - Miscellaneous cleanups.

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (86 commits)
  arm64/sve: Add compile time checks for SVE hooks in generic functions
  arm64/kernel/probes: Use BUG_ON instead of if condition followed by BUG.
  arm64: pac: Optimize kernel entry/exit key installation code paths
  arm64: Introduce prctl(PR_PAC_{SET,GET}_ENABLED_KEYS)
  arm64: mte: make the per-task SCTLR_EL1 field usable elsewhere
  arm64/sve: Remove redundant system_supports_sve() tests
  arm64: fpsimd: run kernel mode NEON with softirqs disabled
  arm64: assembler: introduce wxN aliases for wN registers
  arm64: assembler: remove conditional NEON yield macros
  kasan, arm64: tests supports for HW_TAGS async mode
  arm64: mte: Report async tag faults before suspend
  arm64: mte: Enable async tag check fault
  arm64: mte: Conditionally compile mte_enable_kernel_*()
  arm64: mte: Enable TCO in functions that can read beyond buffer limits
  kasan: Add report for async mode
  arm64: mte: Drop arch_enable_tagging()
  kasan: Add KASAN mode kernel parameter
  arm64: mte: Add asynchronous mode support
  arm64: Get rid of CONFIG_ARM64_VHE
  arm64: Cope with CPUs stuck in VHE mode
  ...
2021-04-26 10:25:03 -07:00
Ard Biesheuvel
0f19dbc994 crypto: arm64/aes-ce - deal with oversight in new CTR carry code
The new carry handling code in the CTR driver can deal with a carry
occurring in the 4x/5x parallel code path, by using a computed goto to
jump into the carry sequence at the right place as to only apply the
carry to a subset of the blocks being processed.

If the lower half of the counter wraps and ends up at exactly 0x0, a
carry needs to be applied to the counter, but not to the counter values
taken for the 4x/5x parallel sequence. In this case, the computed goto
skips all register assignments, and branches straight to the jump
instruction that gets us back to the fast path. This produces the
correct result, but due to the fact that this branch target does not
carry the correct BTI annotation, this fails when BTI is enabled.

Let's omit the computed goto entirely in this case, and jump straight
back to the fast path after applying the carry to the main counter.

Fixes: 5318d3db46 ("crypto: arm64/aes-ctr - improve tail handling")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-04-16 21:16:31 +10:00
Ard Biesheuvel
13150149aa arm64: fpsimd: run kernel mode NEON with softirqs disabled
Kernel mode NEON can be used in task or softirq context, but only in
a non-nesting manner, i.e., softirq context is only permitted if the
interrupt was not taken at a point where the kernel was using the NEON
in task context.

This means all users of kernel mode NEON have to be aware of this
limitation, and either need to provide scalar fallbacks that may be much
slower (up to 20x for AES instructions) and potentially less safe, or
use an asynchronous interface that defers processing to a later time
when the NEON is guaranteed to be available.

Given that grabbing and releasing the NEON is cheap, we can relax this
restriction, by increasing the granularity of kernel mode NEON code, and
always disabling softirq processing while the NEON is being used in task
context.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210302090118.30666-4-ardb@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-04-12 11:55:34 +01:00
Arnd Bergmann
8d195e7a8a crypto: poly1305 - fix poly1305_core_setkey() declaration
gcc-11 points out a mismatch between the declaration and the definition
of poly1305_core_setkey():

lib/crypto/poly1305-donna32.c:13:67: error: argument 2 of type ‘const u8[16]’ {aka ‘const unsigned char[16]’} with mismatched bound [-Werror=array-parameter=]
   13 | void poly1305_core_setkey(struct poly1305_core_key *key, const u8 raw_key[16])
      |                                                          ~~~~~~~~~^~~~~~~~~~~
In file included from lib/crypto/poly1305-donna32.c:11:
include/crypto/internal/poly1305.h:21:68: note: previously declared as ‘const u8 *’ {aka ‘const unsigned char *’}
   21 | void poly1305_core_setkey(struct poly1305_core_key *key, const u8 *raw_key);

This is harmless in principle, as the calling conventions are the same,
but the more specific prototype allows better type checking in the
caller.

Change the declaration to match the actual function definition.
The poly1305_simd_init() is a bit suspicious here, as it previously
had a 32-byte argument type, but looks like it needs to take the
16-byte POLY1305_BLOCK_SIZE array instead.

Fixes: 1c08a10436 ("crypto: poly1305 - add new 32 and 64-bit generic versions")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-04-02 18:28:12 +11:00
Ard Biesheuvel
fc754c024a crypto: arm64/crc-t10dif - move NEON yield to C code
Instead of yielding from the bowels of the asm routine if a reschedule
is needed, divide up the input into 4 KB chunks in the C glue. This
simplifies the code substantially, and avoids scheduling out the task
with the asm routine on the call stack, which is undesirable from a
CFI/instrumentation point of view.
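
The resulting glue pattern is roughly the following (the helper name and
its signature are simplified stand-ins for the real PMULL routine):

  #include <linux/types.h>
  #include <linux/minmax.h>
  #include <linux/sizes.h>
  #include <asm/neon.h>

  u16 crc_t10dif_pmull_asm(u16 crc, const u8 *buf, size_t len); /* stand-in */

  static u16 crc_t10dif_update_sketch(u16 crc, const u8 *data, size_t length)
  {
          while (length) {
                  size_t chunk = min_t(size_t, length, SZ_4K);

                  kernel_neon_begin();
                  crc = crc_t10dif_pmull_asm(crc, data, chunk);
                  kernel_neon_end();      /* natural reschedule point */

                  data += chunk;
                  length -= chunk;
          }

          return crc;
  }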

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-02-10 17:55:58 +11:00
Ard Biesheuvel
f0070f4a79 crypto: arm64/aes-ce-mac - simplify NEON yield
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-02-10 17:55:58 +11:00
Ard Biesheuvel
f5943ef456 crypto: arm64/aes-neonbs - remove NEON yield calls
There is no need for elaborate yield handling in the bit-sliced NEON
implementation of AES, given that skciphers are naturally bounded by the
size of the chunks returned by the skcipher_walk API. So remove the
yield calls from the asm code.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-02-10 17:55:58 +11:00
Ard Biesheuvel
5f6cb2e617 crypto: arm64/sha512-ce - simplify NEON yield
Instead of calling into kernel_neon_end() and kernel_neon_begin() (and
potentially into schedule()) from the assembler code when running in
task mode and a reschedule is pending, perform only the preempt count
check in assembler, but simply return early in this case, and let the C
code deal with the consequences.

This reverts commit 6caf7adc5e.
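
The resulting division of labour looks roughly like this on the C side
(the helper name is illustrative; the asm routine consumes as many
blocks as it can and returns the number it did not process):

  #include <linux/types.h>
  #include <crypto/sha2.h>
  #include <asm/neon.h>

  int sha512_ce_transform_sketch(struct sha512_state *sst,
                                 const u8 *src, int blocks);

  static void sha512_do_blocks(struct sha512_state *sst,
                               const u8 *src, int blocks)
  {
          do {
                  int rem;

                  kernel_neon_begin();
                  rem = sha512_ce_transform_sketch(sst, src, blocks);
                  kernel_neon_end();      /* C code may be rescheduled here */

                  src += (blocks - rem) * SHA512_BLOCK_SIZE;
                  blocks = rem;
          } while (blocks);
  }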

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-02-10 17:55:58 +11:00
Ard Biesheuvel
9ecc9f31d0 crypto: arm64/sha3-ce - simplify NEON yield
Instead of calling into kernel_neon_end() and kernel_neon_begin() (and
potentially into schedule()) from the assembler code when running in
task mode and a reschedule is pending, perform only the preempt count
check in assembler, but simply return early in this case, and let the C
code deal with the consequences.

This reverts commit 7edc86cb1c.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-02-10 17:55:58 +11:00
Ard Biesheuvel
b2eadbf40e crypto: arm64/sha2-ce - simplify NEON yield
Instead of calling into kernel_neon_end() and kernel_neon_begin() (and
potentially into schedule()) from the assembler code when running in
task mode and a reschedule is pending, perform only the preempt count
check in assembler, but simply return early in this case, and let the C
code deal with the consequences.

This reverts commit d82f37ab5e.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-02-10 17:55:57 +11:00
Ard Biesheuvel
5a69e1b73d crypto: arm64/sha1-ce - simplify NEON yield
Instead of calling into kernel_neon_end() and kernel_neon_begin() (and
potentially into schedule()) from the assembler code when running in
task mode and a reschedule is pending, perform only the preempt count
check in assembler, but simply return early in this case, and let the C
code deal with the consequences.

This reverts commit 7df8d16475.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-02-10 17:55:57 +11:00
Ard Biesheuvel
0df07d8117 crypto: arm64/sha - add missing module aliases
The accelerated, instruction-based implementations of SHA1, SHA2 and
SHA3 are autoloaded based on CPU capabilities, given that the code is
modest in size and widely used; this avoids resolving the algo name,
loading all compatible modules and picking the one with the highest
priority, which is considered suboptimal.

However, if these algorithms are requested before this CPU feature
based matching and autoloading occurs, these modules are not even
considered, and we end up with suboptimal performance.

So add the missing module aliases for the various SHA implementations.
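
Concretely, this amounts to alias lines of the following sort in the
respective glue files (the exact set added per module differs):

  #include <linux/crypto.h>

  MODULE_ALIAS_CRYPTO("sha224");
  MODULE_ALIAS_CRYPTO("sha256");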

Cc: <stable@vger.kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-01-22 14:58:01 +11:00
Ard Biesheuvel
5318d3db46 crypto: arm64/aes-ctr - improve tail handling
Counter mode is a stream cipher chaining mode that is typically used
with inputs of arbitrary length, and so a tail block that is smaller
than a full AES block is the rule rather than the exception.

The current ctr(aes) implementation for arm64 always makes a separate
call into the assembler routine to process this tail block, which is
suboptimal, given that it requires reloading of the AES round keys,
and prevents us from handling this tail block using the 5-way stride
that we use for better performance on deep pipelines.

So let's update the assembler routine so it can handle any input size,
and uses NEON permutation instructions and overlapping loads and stores
to handle the tail block. This results in a ~16% speedup for 1420 byte
blocks on cores with deep pipelines such as ThunderX2.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-01-03 08:41:37 +11:00
Ard Biesheuvel
15deb4333c crypto: arm64/aes-ce - really hide slower algos when faster ones are enabled
Commit 69b6f2e817 ("crypto: arm64/aes-neon - limit exposed routines if
faster driver is enabled") intended to hide modes from the plain NEON
driver that are also implemented by the faster bit sliced NEON one if
both are enabled. However, the defined() CPP function does not detect
if the bit sliced NEON driver is enabled as a module. So instead, let's
use IS_ENABLED() here.
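
The distinction in a generic form (the registration helper below is made
up; the real change affects which algorithms the driver exposes):

  #include <linux/init.h>
  #include <linux/kconfig.h>

  static int register_overlapping_modes(void) { return 0; }  /* stub */

  static int __init aes_init_sketch(void)
  {
          /*
           * !defined(CONFIG_CRYPTO_AES_ARM64_BS) is true for both =n and
           * =m builds (with =m, only CONFIG_CRYPTO_AES_ARM64_BS_MODULE is
           * defined), so a preprocessor check keeps the slower modes
           * registered even when the bit-sliced driver is modular.
           * IS_ENABLED() is true for both =y and =m:
           */
          if (!IS_ENABLED(CONFIG_CRYPTO_AES_ARM64_BS))
                  return register_overlapping_modes();

          return 0;
  }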

Fixes: 69b6f2e817 ("crypto: arm64/aes-neon - limit exposed routines if ...")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-01-03 08:41:37 +11:00
Eric Biggers
a24d22b225 crypto: sha - split sha.h into sha1.h and sha2.h
Currently <crypto/sha.h> contains declarations for both SHA-1 and SHA-2,
and <crypto/sha3.h> contains declarations for SHA-3.

This organization is inconsistent, but more importantly SHA-1 is no
longer considered to be cryptographically secure.  So to the extent
possible, SHA-1 shouldn't be grouped together with any of the other SHA
versions, and usage of it should be phased out.

Therefore, split <crypto/sha.h> into two headers <crypto/sha1.h> and
<crypto/sha2.h>, and make everyone explicitly specify whether they want
the declarations for SHA-1, SHA-2, or both.

This avoids making the SHA-1 declarations visible to files that don't
want anything to do with SHA-1.  It also prepares for potentially moving
sha1.h into a new insecure/ or dangerous/ directory.
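
After the split, users include exactly the header(s) they need, e.g.:

  #include <crypto/sha1.h>   /* SHA-1 only */
  #include <crypto/sha2.h>   /* SHA-224/256/384/512 */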

Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-11-20 14:45:33 +11:00
Ard Biesheuvel
3ad99c22ce crypto: arm64/gcm - move authentication tag check to SIMD domain
Instead of copying the calculated authentication tag to memory and
calling crypto_memneq() to verify it, use vector bytewise compare and
min across vector instructions to decide whether the tag is valid. This
is more efficient, and given that the tag is only transiently held in a
NEON register, it is also safer, given that calculated tags for failed
decryptions should be withheld.
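
The idea, expressed with C intrinsics rather than the driver's actual
assembler (a loose illustration only):

  #include <arm_neon.h>
  #include <stdint.h>

  /* eq has 0xff in every lane iff every byte of the two tags matches,
   * so the unsigned minimum across the vector decides acceptance */
  static int gcm_tag_ok(const uint8_t computed[16], const uint8_t expected[16])
  {
          uint8x16_t eq = vceqq_u8(vld1q_u8(computed), vld1q_u8(expected));

          return vminvq_u8(eq) == 0xff;
  }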

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-11-20 14:45:32 +11:00
Ard Biesheuvel
c4fc6328d6 crypto: arm64/chacha - simplify tail block handling
Based on lessons learnt from optimizing the 32-bit version of this driver,
we can simplify the arm64 version considerably, by reordering the final
two stores when the last block is not a multiple of 64 bytes. This removes
the need to use permutation instructions to calculate the elements that are
clobbered by the final overlapping store, given that the store of the
penultimate block now follows it, and that one carries the correct values
for those elements already.

While at it, simplify the overlapping loads as well, by calculating the
address of the final overlapping load upfront, and switching to this
address for every load that would otherwise extend past the end of the
source buffer.

There is no impact on performance, but the resulting code is substantially
smaller and easier to follow.

Cc: Eric Biggers <ebiggers@google.com>
Cc: "Jason A . Donenfeld" <Jason@zx2c4.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-11-13 20:38:55 +11:00
Ard Biesheuvel
519a0d7e49 crypto: arm64/poly1305-neon - reorder PAC authentication with SP update
PAC pointer authentication signs the return address against the value
of the stack pointer, to prevent stack overrun exploits from corrupting
the control flow. However, this requires that the AUTIASP is issued with
SP holding the same value as it held when the PAC value was generated.
The Poly1305 NEON code got this wrong, resulting in crashes on PAC
capable hardware.

Fixes: f569ca1647 ("crypto: arm64/poly1305 - incorporate OpenSSL/CRYPTOGAMS ...")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-11-06 14:29:11 +11:00
Arvind Sankar
458c0480dc crypto: hash - Use memzero_explicit() for clearing state
Without the barrier_data() inside memzero_explicit(), the compiler may
optimize away the state-clearing if it can tell that the state is not
used afterwards.
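
A made-up example of the failure mode being guarded against (len is
assumed not to exceed the temporary buffer):

  #include <linux/string.h>
  #include <linux/types.h>

  static void digest_and_wipe(u8 *out, const u8 *state, size_t len)
  {
          u8 tmp[64];

          memcpy(tmp, state, len);
          memcpy(out, tmp, len);

          /*
           * A plain memset(tmp, 0, sizeof(tmp)) may be elided because tmp
           * is dead afterwards; memzero_explicit() uses barrier_data() so
           * the clearing cannot be optimized away.
           */
          memzero_explicit(tmp, sizeof(tmp));
  }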

Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-10-30 17:35:03 +11:00
Linus Torvalds
39a5101f98 Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto updates from Herbert Xu:
 "API:
   - Allow DRBG testing through user-space af_alg
   - Add tcrypt speed testing support for keyed hashes
   - Add type-safe init/exit hooks for ahash

  Algorithms:
   - Mark arc4 as obsolete and pending for future removal
   - Mark anubis, khazad, seed and tea as obsolete
   - Improve boot-time xor benchmark
   - Add OSCCA SM2 asymmetric cipher algorithm and use it for integrity

  Drivers:
   - Fixes and enhancement for XTS in caam
   - Add support for XIP8001B hwrng in xiphera-trng
   - Add RNG and hash support in sun8i-ce/sun8i-ss
   - Allow imx-rngc to be used by kernel entropy pool
   - Use crypto engine in omap-sham
   - Add support for Ingenic X1830 with ingenic"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (205 commits)
  X.509: Fix modular build of public_key_sm2
  crypto: xor - Remove unused variable count in do_xor_speed
  X.509: fix error return value on the failed path
  crypto: bcm - Verify GCM/CCM key length in setkey
  crypto: qat - drop input parameter from adf_enable_aer()
  crypto: qat - fix function parameters descriptions
  crypto: atmel-tdes - use semicolons rather than commas to separate statements
  crypto: drivers - use semicolons rather than commas to separate statements
  hwrng: mxc-rnga - use semicolons rather than commas to separate statements
  hwrng: iproc-rng200 - use semicolons rather than commas to separate statements
  hwrng: stm32 - use semicolons rather than commas to separate statements
  crypto: xor - use ktime for template benchmarking
  crypto: xor - defer load time benchmark to a later time
  crypto: hisilicon/zip - fix the uninitalized 'curr_qm_qp_num'
  crypto: hisilicon/zip - fix the return value when device is busy
  crypto: hisilicon/zip - fix zero length input in GZIP decompress
  crypto: hisilicon/zip - fix the uncleared debug registers
  lib/mpi: Fix unused variable warnings
  crypto: x86/poly1305 - Remove assignments with no effect
  hwrng: npcm - modify readl to readb
  ...
2020-10-13 08:50:16 -07:00