mirror of
https://git.proxmox.com/git/mirror_ubuntu-kernels.git
synced 2025-12-02 21:40:45 +00:00
When retpolines are enabled they have high overhead in the inner loop inside kvm_handle_hva_range() that iterates over the provided memory area. Let's mark this function and its TDP MMU equivalent __always_inline so compiler will be able to change the call to the actual handler function inside each of them into a direct one. This significantly improves performance on the unmap test on the existing kernel memslot code (tested on a Xeon 8167M machine): 30 slots in use: Test Before After Improvement Unmap 0.0353s 0.0334s 5% Unmap 2M 0.00104s 0.000407s 61% 509 slots in use: Test Before After Improvement Unmap 0.0742s 0.0740s None Unmap 2M 0.00221s 0.00159s 28% Looks like having an indirect call in these functions (and, so, a retpoline) might have interfered with unrolling of the whole loop in the CPU. Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Message-Id: <732d3fe9eb68aa08402a638ab0309199fa89ae56.1612810129.git.maciej.szmigiero@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> |
||
|---|---|---|
| .. | ||
| mmu_audit.c | ||
| mmu_internal.h | ||
| mmu.c | ||
| mmutrace.h | ||
| page_track.c | ||
| paging_tmpl.h | ||
| spte.c | ||
| spte.h | ||
| tdp_iter.c | ||
| tdp_iter.h | ||
| tdp_mmu.c | ||
| tdp_mmu.h | ||