Make use of the t8 register for vgetrandom-chacha, so we don't need to
reuse a register and rematerialize a constant. I
Signed-off-by: Xi Ruoyao <xry111@xry111.site>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
As Christophe pointed out, tuning the chacha implementation by
scheduling the instructions like what GCC does can improve the
performance.
The tuning does not introduce too much complexity (basically it's just
reordering some instructions). And the tuning does not hurt readibility
too much: actually the tuned code looks even more similar to a
textbook-style implementation based on 128-bit vectors. So overall it's
a good deal to me.
Tested with vdso_test_getchacha and benched with vdso_test_getrandom.
On a LA664 the speedup is 5%, and I expect a larger speedup on LA[2-4]64
with a lower issue rate.
Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Link: https://lore.kernel.org/all/77655d9e-fc05-4300-8f0d-7b2ad840d091@csgroup.eu/
Signed-off-by: Xi Ruoyao <xry111@xry111.site>
Reviewed-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Hook up the generic vDSO implementation to the LoongArch vDSO data page
by providing the required __arch_chacha20_blocks_nostack,
__arch_get_k_vdso_rng_data, and getrandom_syscall implementations. Also
wire up the selftests.
Signed-off-by: Xi Ruoyao <xry111@xry111.site>
Acked-by: Huacai Chen <chenhuacai@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>