Commit Graph

1620 Commits

Author SHA1 Message Date
Linus Torvalds
56ec8e4cd8 arm64 updates for 6.7:
* Major refactoring of the CPU capability detection logic resulting in
   the removal of the cpus_have_const_cap() function and migrating the
   code to "alternative" branches where possible
 
 * Backtrace/kgdb: use IPIs and pseudo-NMI
 
 * Perf and PMU:
 
   - Add support for Ampere SoC PMUs
 
   - Multi-DTC improvements for larger CMN configurations with multiple
     Debug & Trace Controllers
 
   - Rework the Arm CoreSight PMU driver to allow separate registration of
     vendor backend modules
 
   - Fixes: add missing MODULE_DEVICE_TABLE to the amlogic perf
     driver; use device_get_match_data() in the xgene driver; fix NULL
     pointer dereference in the hisi driver caused by calling
     cpuhp_state_remove_instance(); use-after-free in the hisi driver
 
 * HWCAP updates:
 
   - FEAT_SVE_B16B16 (BFloat16)
 
   - FEAT_LRCPC3 (release consistency model)
 
   - FEAT_LSE128 (128-bit atomic instructions)
 
 * SVE: remove a couple of pseudo registers from the cpufeature code.
   There is logic in place already to detect mismatched SVE features
 
 * Miscellaneous:
 
   - Reduce the default swiotlb size (currently 64MB) if no ZONE_DMA
     bouncing is needed. The buffer is still required for small kmalloc()
     buffers
 
   - Fix module PLT counting with !RANDOMIZE_BASE
 
   - Restrict CPU_BIG_ENDIAN to LLVM IAS 15.x or newer move
     synchronisation code out of the set_ptes() loop
 
   - More compact cpufeature displaying enabled cores
 
   - Kselftest updates for the new CPU features
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmU7/QUACgkQa9axLQDI
 XvEx3xAAjICmHm+ryKJxS1IGXLYu2DXMcHUjeW6w1SxkK/vKhTMlHRx/CIWDze2l
 eENu7TcDLtTw+Gv9kqg30TSwzLfJhP9oFpX2T5TKkh5qlJlbz8fBtm+as14DTLCZ
 p2sra3J0w4B5JwTVqnj2RHOlEftMKvbyLGRkz3ve6wIUbsp5pXMkxAd/k3wOf0lC
 m6d9w1OMA2sOsw9YCgjcCNQGEzFMJk+13w7K+4w6A8Djn/Jxkt4fAFVn2ZlCiZzD
 NA2lTDWJqGmeGHo3iFdCTensWXmWTqjzxsNEf7PyBk5mBOdzDVxlTfEL7vnJg7gf
 BlTQ/nhIpra7rHQ9q2rwqEzbF+4Tn3uWlQfdDb7+/4goPjDh7tlBhEOYyOwTCEIT
 0t9cCSvBmSCKeXC3lKWWtJ+QJKhZHSmXN84EotTs65KyyfIsi4RuSezvV/+aIL86
 06sHYlYxETuujZP1cgOjf69Wsdsgizx0mqXJXf/xOjp22HFDcL4Bki6Rgi6t5OZj
 GEHG15kSE+eJ+RIpxpuAN8fdrlxYubsVLIksCqK7cZf9zXbQGIlifKAIrYiEx6kz
 FD+o+j/5niRWR6yJZCtCcGxqpSlwnYWPqc1Ds0GES8A/BphWMPozXUAZ0ll4Fnp1
 yyR2/Due/eBsCNESn579kP8989rashubB8vxvdx2fcWVtLC7VgE=
 =QaEo
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Catalin Marinas:
 "No major architecture features this time around, just some new HWCAP
  definitions, support for the Ampere SoC PMUs and a few fixes/cleanups.

  The bulk of the changes is reworking of the CPU capability checking
  code (cpus_have_cap() etc).

   - Major refactoring of the CPU capability detection logic resulting
     in the removal of the cpus_have_const_cap() function and migrating
     the code to "alternative" branches where possible

   - Backtrace/kgdb: use IPIs and pseudo-NMI

   - Perf and PMU:

      - Add support for Ampere SoC PMUs

      - Multi-DTC improvements for larger CMN configurations with
        multiple Debug & Trace Controllers

      - Rework the Arm CoreSight PMU driver to allow separate
        registration of vendor backend modules

      - Fixes: add missing MODULE_DEVICE_TABLE to the amlogic perf
        driver; use device_get_match_data() in the xgene driver; fix
        NULL pointer dereference in the hisi driver caused by calling
        cpuhp_state_remove_instance(); use-after-free in the hisi driver

   - HWCAP updates:

      - FEAT_SVE_B16B16 (BFloat16)

      - FEAT_LRCPC3 (release consistency model)

      - FEAT_LSE128 (128-bit atomic instructions)

   - SVE: remove a couple of pseudo registers from the cpufeature code.
     There is logic in place already to detect mismatched SVE features

   - Miscellaneous:

      - Reduce the default swiotlb size (currently 64MB) if no ZONE_DMA
        bouncing is needed. The buffer is still required for small
        kmalloc() buffers

      - Fix module PLT counting with !RANDOMIZE_BASE

      - Restrict CPU_BIG_ENDIAN to LLVM IAS 15.x or newer move
        synchronisation code out of the set_ptes() loop

      - More compact cpufeature displaying enabled cores

      - Kselftest updates for the new CPU features"

 * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (83 commits)
  arm64: Restrict CPU_BIG_ENDIAN to GNU as or LLVM IAS 15.x or newer
  arm64: module: Fix PLT counting when CONFIG_RANDOMIZE_BASE=n
  arm64, irqchip/gic-v3, ACPI: Move MADT GICC enabled check into a helper
  perf: hisi: Fix use-after-free when register pmu fails
  drivers/perf: hisi_pcie: Initialize event->cpu only on success
  drivers/perf: hisi_pcie: Check the type first in pmu::event_init()
  arm64: cpufeature: Change DBM to display enabled cores
  arm64: cpufeature: Display the set of cores with a feature
  perf/arm-cmn: Enable per-DTC counter allocation
  perf/arm-cmn: Rework DTC counters (again)
  perf/arm-cmn: Fix DTC domain detection
  drivers: perf: arm_pmuv3: Drop some unused arguments from armv8_pmu_init()
  drivers: perf: arm_pmuv3: Read PMMIR_EL1 unconditionally
  drivers/perf: hisi: use cpuhp_state_remove_instance_nocalls() for hisi_hns3_pmu uninit process
  clocksource/drivers/arm_arch_timer: limit XGene-1 workaround
  arm64: Remove system_uses_lse_atomics()
  arm64: Mark the 'addr' argument to set_ptes() and __set_pte_at() as unused
  drivers/perf: xgene: Use device_get_match_data()
  perf/amlogic: add missing MODULE_DEVICE_TABLE
  arm64/mm: Hoist synchronization out of set_ptes() loop
  ...
2023-11-01 09:34:55 -10:00
Catalin Marinas
14dcf78a6c Merge branch 'for-next/cpus_have_const_cap' into for-next/core
* for-next/cpus_have_const_cap: (38 commits)
  : cpus_have_const_cap() removal
  arm64: Remove cpus_have_const_cap()
  arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_REPEAT_TLBI
  arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_NVIDIA_CARMEL_CNP
  arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_CAVIUM_23154
  arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_2645198
  arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_1742098
  arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_1542419
  arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_843419
  arm64: Avoid cpus_have_const_cap() for ARM64_UNMAP_KERNEL_AT_EL0
  arm64: Avoid cpus_have_const_cap() for ARM64_{SVE,SME,SME2,FA64}
  arm64: Avoid cpus_have_const_cap() for ARM64_SPECTRE_V2
  arm64: Avoid cpus_have_const_cap() for ARM64_SSBS
  arm64: Avoid cpus_have_const_cap() for ARM64_MTE
  arm64: Avoid cpus_have_const_cap() for ARM64_HAS_TLB_RANGE
  arm64: Avoid cpus_have_const_cap() for ARM64_HAS_WFXT
  arm64: Avoid cpus_have_const_cap() for ARM64_HAS_RNG
  arm64: Avoid cpus_have_const_cap() for ARM64_HAS_EPAN
  arm64: Avoid cpus_have_const_cap() for ARM64_HAS_PAN
  arm64: Avoid cpus_have_const_cap() for ARM64_HAS_GIC_PRIO_MASKING
  arm64: Avoid cpus_have_const_cap() for ARM64_HAS_DIT
  ...
2023-10-26 17:10:18 +01:00
Andrey Konovalov
416a616e54 arm64, kasan: update comment in kasan_init
Patch series "kasan: assorted fixes and improvements".


This patch (of 5):

Update the comment in kasan_init to also mention the Hardware Tag-Based
KASAN mode.

Link: https://lkml.kernel.org/r/cover.1696605143.git.andreyknvl@google.com
Link: https://lkml.kernel.org/r/4186aefd368b019eaf27c907c4fa692a89448d66.1696605143.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Marco Elver <elver@google.com>
Cc: kernel test robot <lkp@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-18 14:34:15 -07:00
Mark Rutland
412cb3801d arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_2645198
We use cpus_have_const_cap() to check for ARM64_WORKAROUND_2645198 but
this is not necessary and alternative_has_cap() would be preferable.

For historical reasons, cpus_have_const_cap() is more complicated than
it needs to be. Before cpucaps are finalized, it will perform a bitmap
test of the system_cpucaps bitmap, and once cpucaps are finalized it
will use an alternative branch. This used to be necessary to handle some
race conditions in the window between cpucap detection and the
subsequent patching of alternatives and static branches, where different
branches could be out-of-sync with one another (or w.r.t. alternative
sequences). Now that we use alternative branches instead of static
branches, these are all patched atomically w.r.t. one another, and there
are only a handful of cases that need special care in the window between
cpucap detection and alternative patching.

Due to the above, it would be nice to remove cpus_have_const_cap(), and
migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(),
or cpus_have_cap() depending on when their requirements. This will
remove redundant instructions and improve code generation, and will make
it easier to determine how each callsite will behave before, during, and
after alternative patching.

The ARM64_WORKAROUND_2645198 cpucap is detected and patched before any
userspace translation table exist, and the workaround is only necessary
when manipulating usrspace translation tables which are in use. Thus it
is not necessary to use cpus_have_const_cap(), and alternative_has_cap()
is equivalent.

This patch replaces the use of cpus_have_const_cap() with
alternative_has_cap_unlikely(), which will avoid generating code to test
the system_cpucaps bitmap and should be better for all subsequent calls
at runtime.  The ARM64_WORKAROUND_2645198 cpucap is added to
cpucap_is_possible() so that code can be elided entirely when this is
not possible, and redundant IS_ENABLED() checks are removed.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16 14:17:07 +01:00
Mark Rutland
4e00f1d9b7 arm64: Avoid cpus_have_const_cap() for ARM64_HAS_EPAN
We use cpus_have_const_cap() to check for ARM64_HAS_EPAN but this is not
necessary and alternative_has_cap() or cpus_have_cap() would be
preferable.

For historical reasons, cpus_have_const_cap() is more complicated than
it needs to be. Before cpucaps are finalized, it will perform a bitmap
test of the system_cpucaps bitmap, and once cpucaps are finalized it
will use an alternative branch. This used to be necessary to handle some
race conditions in the window between cpucap detection and the
subsequent patching of alternatives and static branches, where different
branches could be out-of-sync with one another (or w.r.t. alternative
sequences). Now that we use alternative branches instead of static
branches, these are all patched atomically w.r.t. one another, and there
are only a handful of cases that need special care in the window between
cpucap detection and alternative patching.

Due to the above, it would be nice to remove cpus_have_const_cap(), and
migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(),
or cpus_have_cap() depending on when their requirements. This will
remove redundant instructions and improve code generation, and will make
it easier to determine how each callsite will behave before, during, and
after alternative patching.

The ARM64_HAS_EPAN cpucap is used to affect two things:

1) The permision bits used for userspace executable mappings, which are
   chosen by adjust_protection_map(), which is an arch_initcall. This is
   called after the ARM64_HAS_EPAN cpucap has been detected and
   alternatives have been patched, and before any userspace translation
   tables exist.

2) The handling of faults taken from (user or kernel) accesses to
   userspace executable mappings in do_page_fault(). Userspace
   translation tables are created after adjust_protection_map() is
   called, and hence after the ARM64_HAS_EPAN cpucap has been detected
   and alternatives have been patched.

Neither of these run until after ARM64_HAS_EPAN cpucap has been detected
and alternatives have been patched, and hence there's no need to use
cpus_have_const_cap(). Since adjust_protection_map() is only executed
once at boot time it would be best for it to use cpus_have_cap(), and
since do_page_fault() is executed frequently it would be best for it to
use alternatives_have_cap_unlikely().

This patch replaces the uses of cpus_have_const_cap() with
cpus_have_cap() and alternative_has_cap_unlikely(), which will avoid
generating redundant code, and should be better for all subsequent calls
at runtime. The ARM64_HAS_EPAN cpucap is added to cpucap_is_possible()
so that code can be elided entirely when this is not possible.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Vladimir Murzin <vladimir.murzin@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16 14:17:04 +01:00
Mark Rutland
34f66c4c4d arm64: Use a positive cpucap for FP/SIMD
Currently we have a negative cpucap which describes the *absence* of
FP/SIMD rather than *presence* of FP/SIMD. This largely works, but is
somewhat awkward relative to other cpucaps that describe the presence of
a feature, and it would be nicer to have a cpucap which describes the
presence of FP/SIMD:

* This will allow the cpucap to be treated as a standard
  ARM64_CPUCAP_SYSTEM_FEATURE, which can be detected with the standard
  has_cpuid_feature() function and ARM64_CPUID_FIELDS() description.

* This ensures that the cpucap will only transition from not-present to
  present, reducing the risk of unintentional and/or unsafe usage of
  FP/SIMD before cpucaps are finalized.

* This will allow using arm64_cpu_capabilities::cpu_enable() to enable
  the use of FP/SIMD later, with FP/SIMD being disabled at boot time
  otherwise. This will ensure that any unintentional and/or unsafe usage
  of FP/SIMD prior to this is trapped, and will ensure that FP/SIMD is
  never unintentionally enabled for userspace in mismatched big.LITTLE
  systems.

This patch replaces the negative ARM64_HAS_NO_FPSIMD cpucap with a
positive ARM64_HAS_FPSIMD cpucap, making changes as described above.
Note that as FP/SIMD will now be trapped when not supported system-wide,
do_fpsimd_acc() must handle these traps in the same way as for SVE and
SME. The commentary in fpsimd_restore_current_state() is updated to
describe the new scheme.

No users of system_supports_fpsimd() need to know that FP/SIMD is
available prior to alternatives being patched, so this is updated to
use alternative_has_cap_likely() to check for the ARM64_HAS_FPSIMD
cpucap, without generating code to test the system_cpucaps bitmap.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16 14:17:03 +01:00
Catalin Marinas
65033574ad arm64: swiotlb: Reduce the default size if no ZONE_DMA bouncing needed
With CONFIG_DMA_BOUNCE_UNALIGNED_KMALLOC enabled, the arm64 kernel still
allocates the default SWIOTLB buffer (64MB) even if ZONE_DMA is disabled
or all the RAM fits into this zone. However, this potentially wastes a
non-negligible amount of memory on platforms with little RAM.

Reduce the SWIOTLB size to 1MB per 1GB of RAM if only needed for
kmalloc() buffer bouncing.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Suggested-by: Ross Burton <ross.burton@arm.com>
Cc: Ross Burton <ross.burton@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
2023-10-13 16:10:39 +01:00
Baoquan He
fdc268232d arm64: kdump: use generic interface to simplify crashkernel reservation
With the help of newly changed function parse_crashkernel() and generic
reserve_crashkernel_generic(), crashkernel reservation can be simplified
by steps:

1) Add a new header file <asm/crash_core.h>, and define CRASH_ALIGN,
   CRASH_ADDR_LOW_MAX, CRASH_ADDR_HIGH_MAX and
   DEFAULT_CRASH_KERNEL_LOW_SIZE in <asm/crash_core.h>;

2) Add arch_reserve_crashkernel() to call parse_crashkernel() and
   reserve_crashkernel_generic();

3) Add ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION Kconfig in
   arch/arm64/Kconfig.

The old reserve_crashkernel_low() and reserve_crashkernel() can be
removed.

Link: https://lkml.kernel.org/r/20230914033142.676708-8-bhe@redhat.com
Signed-off-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen Jiahao <chenjiahao16@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-04 10:41:58 -07:00
Baoquan He
a9e1a3d84e crash_core: change the prototype of function parse_crashkernel()
Add two parameters 'low_size' and 'high' to function parse_crashkernel(),
later crashkernel=,high|low parsing will be added.  Make adjustments in
all call sites of parse_crashkernel() in arch.

Link: https://lkml.kernel.org/r/20230914033142.676708-3-bhe@redhat.com
Signed-off-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen Jiahao <chenjiahao16@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-04 10:41:58 -07:00
Ryan Roberts
6f1bace9a9 arm64: hugetlb: fix set_huge_pte_at() to work with all swap entries
When called with a swap entry that does not embed a PFN (e.g. 
PTE_MARKER_POISONED or PTE_MARKER_UFFD_WP), the previous implementation of
set_huge_pte_at() would either cause a BUG() to fire (if CONFIG_DEBUG_VM
is enabled) or cause a dereference of an invalid address and subsequent
panic.

arm64's huge pte implementation supports multiple huge page sizes, some of
which are implemented in the page table with multiple contiguous entries. 
So set_huge_pte_at() needs to work out how big the logical pte is, so that
it can also work out how many physical ptes (or pmds) need to be written. 
It previously did this by grabbing the folio out of the pte and querying
its size.

However, there are cases when the pte being set is actually a swap entry. 
But this also used to work fine, because for huge ptes, we only ever saw
migration entries and hwpoison entries.  And both of these types of swap
entries have a PFN embedded, so the code would grab that and everything
still worked out.

But over time, more calls to set_huge_pte_at() have been added that set
swap entry types that do not embed a PFN.  And this causes the code to go
bang.  The triggering case is for the uffd poison test, commit
99aa77215a ("selftests/mm: add uffd unit test for UFFDIO_POISON"), which
causes a PTE_MARKER_POISONED swap entry to be set, coutesey of commit
8a13897fb0 ("mm: userfaultfd: support UFFDIO_POISON for hugetlbfs") -
added in v6.5-rc7.  Although review shows that there are other call sites
that set PTE_MARKER_UFFD_WP (which also has no PFN), these don't trigger
on arm64 because arm64 doesn't support UFFD WP.

Arguably, the root cause is really due to commit 18f3962953 ("mm:
hugetlb: kill set_huge_swap_pte_at()"), which aimed to simplify the
interface to the core code by removing set_huge_swap_pte_at() (which took
a page size parameter) and replacing it with calls to set_huge_pte_at()
where the size was inferred from the folio, as descibed above.  While that
commit didn't break anything at the time, it did break the interface
because it couldn't handle swap entries without PFNs.  And since then new
callers have come along which rely on this working.  But given the
brokeness is only observable after commit 8a13897fb0 ("mm: userfaultfd:
support UFFDIO_POISON for hugetlbfs"), that one gets the Fixes tag.

Now that we have modified the set_huge_pte_at() interface to pass the huge
page size in the previous patch, we can trivially fix this issue.

Link: https://lkml.kernel.org/r/20230922115804.2043771-3-ryan.roberts@arm.com
Fixes: 8a13897fb0 ("mm: userfaultfd: support UFFDIO_POISON for hugetlbfs")
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Cc: <stable@vger.kernel.org>	[6.5+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-29 17:20:47 -07:00
Ryan Roberts
935d4f0c6d mm: hugetlb: add huge page size param to set_huge_pte_at()
Patch series "Fix set_huge_pte_at() panic on arm64", v2.

This series fixes a bug in arm64's implementation of set_huge_pte_at(),
which can result in an unprivileged user causing a kernel panic.  The
problem was triggered when running the new uffd poison mm selftest for
HUGETLB memory.  This test (and the uffd poison feature) was merged for
v6.5-rc7.

Ideally, I'd like to get this fix in for v6.6 and I've cc'ed stable
(correctly this time) to get it backported to v6.5, where the issue first
showed up.


Description of Bug
==================

arm64's huge pte implementation supports multiple huge page sizes, some of
which are implemented in the page table with multiple contiguous entries. 
So set_huge_pte_at() needs to work out how big the logical pte is, so that
it can also work out how many physical ptes (or pmds) need to be written. 
It previously did this by grabbing the folio out of the pte and querying
its size.

However, there are cases when the pte being set is actually a swap entry. 
But this also used to work fine, because for huge ptes, we only ever saw
migration entries and hwpoison entries.  And both of these types of swap
entries have a PFN embedded, so the code would grab that and everything
still worked out.

But over time, more calls to set_huge_pte_at() have been added that set
swap entry types that do not embed a PFN.  And this causes the code to go
bang.  The triggering case is for the uffd poison test, commit
99aa77215a ("selftests/mm: add uffd unit test for UFFDIO_POISON"), which
causes a PTE_MARKER_POISONED swap entry to be set, coutesey of commit
8a13897fb0 ("mm: userfaultfd: support UFFDIO_POISON for hugetlbfs") -
added in v6.5-rc7.  Although review shows that there are other call sites
that set PTE_MARKER_UFFD_WP (which also has no PFN), these don't trigger
on arm64 because arm64 doesn't support UFFD WP.

If CONFIG_DEBUG_VM is enabled, we do at least get a BUG(), but otherwise,
it will dereference a bad pointer in page_folio():

    static inline struct folio *hugetlb_swap_entry_to_folio(swp_entry_t entry)
    {
        VM_BUG_ON(!is_migration_entry(entry) && !is_hwpoison_entry(entry));

        return page_folio(pfn_to_page(swp_offset_pfn(entry)));
    }


Fix
===

The simplest fix would have been to revert the dodgy cleanup commit
18f3962953 ("mm: hugetlb: kill set_huge_swap_pte_at()"), but since
things have moved on, this would have required an audit of all the new
set_huge_pte_at() call sites to see if they should be converted to
set_huge_swap_pte_at().  As per the original intent of the change, it
would also leave us open to future bugs when people invariably get it
wrong and call the wrong helper.

So instead, I've added a huge page size parameter to set_huge_pte_at(). 
This means that the arm64 code has the size in all cases.  It's a bigger
change, due to needing to touch the arches that implement the function,
but it is entirely mechanical, so in my view, low risk.

I've compile-tested all touched arches; arm64, parisc, powerpc, riscv,
s390, sparc (and additionally x86_64).  I've additionally booted and run
mm selftests against arm64, where I observe the uffd poison test is fixed,
and there are no other regressions.


This patch (of 2):

In order to fix a bug, arm64 needs to be told the size of the huge page
for which the pte is being set in set_huge_pte_at().  Provide for this by
adding an `unsigned long sz` parameter to the function.  This follows the
same pattern as huge_pte_clear().

This commit makes the required interface modifications to the core mm as
well as all arches that implement this function (arm64, parisc, powerpc,
riscv, s390, sparc).  The actual arm64 bug will be fixed in a separate
commit.

No behavioral changes intended.

Link: https://lkml.kernel.org/r/20230922115804.2043771-1-ryan.roberts@arm.com
Link: https://lkml.kernel.org/r/20230922115804.2043771-2-ryan.roberts@arm.com
Fixes: 8a13897fb0 ("mm: userfaultfd: support UFFDIO_POISON for hugetlbfs")
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>	[powerpc 8xx]
Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com>	[vmalloc change]
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Cc: <stable@vger.kernel.org>	[6.5+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-29 17:20:47 -07:00
Linus Torvalds
df57721f9a Add x86 shadow stack support
Convert IBT selftest to asm to fix objtool warning
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEV76QKkVc4xCGURexaDWVMHDJkrAFAmTv1QQACgkQaDWVMHDJ
 krAUwhAAn6TOwHJK8BSkHeiQhON1nrlP3c5cv0AyZ2NP8RYDrZrSZvhpYBJ6wgKC
 Cx5CGq5nn9twYsYS3KsktLKDfR3lRdsQ7K9qtyFtYiaeaVKo+7gEKl/K+klwai8/
 gninQWHk0zmSCja8Vi77q52WOMkQKapT8+vaON9EVDO8dVEi+CvhAIfPwMafuiwO
 Rk4X86SzoZu9FP79LcCg9XyGC/XbM2OG9eNUTSCKT40qTTKm5y4gix687NvAlaHR
 ko5MTsdl0Wfp6Qk0ohT74LnoA2c1g/FluvZIM33ci/2rFpkf9Hw7ip3lUXqn6CPx
 rKiZ+pVRc0xikVWkraMfIGMJfUd2rhelp8OyoozD7DB7UZw40Q4RW4N5tgq9Fhe9
 MQs3p1v9N8xHdRKl365UcOczUxNAmv4u0nV5gY/4FMC6VjldCl2V9fmqYXyzFS4/
 Ogg4FSd7c2JyGFKPs+5uXyi+RY2qOX4+nzHOoKD7SY616IYqtgKoz5usxETLwZ6s
 VtJOmJL0h//z0A7tBliB0zd+SQ5UQQBDC2XouQH2fNX2isJMn0UDmWJGjaHgK6Hh
 8jVp6LNqf+CEQS387UxckOyj7fu438hDky1Ggaw4YqowEOhQeqLVO4++x+HITrbp
 AupXfbJw9h9cMN63Yc0gVxXQ9IMZ+M7UxLtZ3Cd8/PVztNy/clA=
 =3UUm
 -----END PGP SIGNATURE-----

Merge tag 'x86_shstk_for_6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 shadow stack support from Dave Hansen:
 "This is the long awaited x86 shadow stack support, part of Intel's
  Control-flow Enforcement Technology (CET).

  CET consists of two related security features: shadow stacks and
  indirect branch tracking. This series implements just the shadow stack
  part of this feature, and just for userspace.

  The main use case for shadow stack is providing protection against
  return oriented programming attacks. It works by maintaining a
  secondary (shadow) stack using a special memory type that has
  protections against modification. When executing a CALL instruction,
  the processor pushes the return address to both the normal stack and
  to the special permission shadow stack. Upon RET, the processor pops
  the shadow stack copy and compares it to the normal stack copy.

  For more information, refer to the links below for the earlier
  versions of this patch set"

Link: https://lore.kernel.org/lkml/20220130211838.8382-1-rick.p.edgecombe@intel.com/
Link: https://lore.kernel.org/lkml/20230613001108.3040476-1-rick.p.edgecombe@intel.com/

* tag 'x86_shstk_for_6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (47 commits)
  x86/shstk: Change order of __user in type
  x86/ibt: Convert IBT selftest to asm
  x86/shstk: Don't retry vm_munmap() on -EINTR
  x86/kbuild: Fix Documentation/ reference
  x86/shstk: Move arch detail comment out of core mm
  x86/shstk: Add ARCH_SHSTK_STATUS
  x86/shstk: Add ARCH_SHSTK_UNLOCK
  x86: Add PTRACE interface for shadow stack
  selftests/x86: Add shadow stack test
  x86/cpufeatures: Enable CET CR4 bit for shadow stack
  x86/shstk: Wire in shadow stack interface
  x86: Expose thread features in /proc/$PID/status
  x86/shstk: Support WRSS for userspace
  x86/shstk: Introduce map_shadow_stack syscall
  x86/shstk: Check that signal frame is shadow stack mem
  x86/shstk: Check that SSP is aligned on sigreturn
  x86/shstk: Handle signals for shadow stack
  x86/shstk: Introduce routines modifying shstk
  x86/shstk: Handle thread shadow stack
  x86/shstk: Add user-mode shadow stack support
  ...
2023-08-31 12:20:12 -07:00
Linus Torvalds
6c1b980a7e dma-maping updates for Linux 6.6
- allow dynamic sizing of the swiotlb buffer, to cater for secure
    virtualization workloads that require all I/O to be bounce buffered
    (Petr Tesarik)
  - move a declaration to a header (Arnd Bergmann)
  - check for memory region overlap in dma-contiguous (Binglei Wang)
  - remove the somewhat dangerous runtime swiotlb-xen enablement and
    unexport is_swiotlb_active (Christoph Hellwig, Juergen Gross)
  - per-node CMA improvements (Yajun Deng)
 -----BEGIN PGP SIGNATURE-----
 
 iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAmTuDHkLHGhjaEBsc3Qu
 ZGUACgkQD55TZVIEUYOqvhAApMk2/ceTgVH17sXaKE822+xKvgv377O6TlggMeGG
 W4zA0KD69DNz0AfaaCc5U5f7n8Ld/YY1RsvkHW4b3jgw+KRTeQr0jjitBgP5kP2M
 A1+qxdyJpCTwiPt9s2+JFVPeyZ0s52V6OJODKRG3s0ore55R+U09VySKtASON+q3
 GMKfWqQteKC+thg7NkrQ7JUixuo84oICws+rZn4K9ifsX2O0HYW6aMW0feRfZjJH
 r0TgqZc4RdPTSaF22oapR9Ls39+7hp/pBvoLm5sBNA3cl5C3X4VWo9ERMU1jW9h+
 VYQv39NycUspgskWJmpbU06/+ooYqQlwHSR/vdNusmFIvxo4tf6/UX72YO5F8Dar
 ap0wYGauiEwTjSnhVxPTXk3obWyWEsgFAeRnPdTlH2CNmv38QZU2HLb8eU1pcXxX
 j+WI2Ewy9z22uBVYiPOKpdW1jkSfmlmfPp/8SbAdua7I3YQ90rQN6AvU06zAi/cL
 NQTgO81E4jPkygqAVgS/LeYziWAQ73yM7m9ExThtTgqFtHortwhJ4Fd8XKtvtvEb
 viXAZ/WZtQBv/CIKAW98NhgIDP/SPOT8ym6V35WK+kkNFMS6LMSQUfl9GgbHGyFa
 n9icMm7BmbDtT1+AKNafG9En4DtAf9M9QNidAVOyfrsIk6S0gZoZwvIStkA7on8a
 cNY=
 =kVVr
 -----END PGP SIGNATURE-----

Merge tag 'dma-mapping-6.6-2023-08-29' of git://git.infradead.org/users/hch/dma-mapping

Pull dma-maping updates from Christoph Hellwig:

 - allow dynamic sizing of the swiotlb buffer, to cater for secure
   virtualization workloads that require all I/O to be bounce buffered
   (Petr Tesarik)

 - move a declaration to a header (Arnd Bergmann)

 - check for memory region overlap in dma-contiguous (Binglei Wang)

 - remove the somewhat dangerous runtime swiotlb-xen enablement and
   unexport is_swiotlb_active (Christoph Hellwig, Juergen Gross)

 - per-node CMA improvements (Yajun Deng)

* tag 'dma-mapping-6.6-2023-08-29' of git://git.infradead.org/users/hch/dma-mapping:
  swiotlb: optimize get_max_slots()
  swiotlb: move slot allocation explanation comment where it belongs
  swiotlb: search the software IO TLB only if the device makes use of it
  swiotlb: allocate a new memory pool when existing pools are full
  swiotlb: determine potential physical address limit
  swiotlb: if swiotlb is full, fall back to a transient memory pool
  swiotlb: add a flag whether SWIOTLB is allowed to grow
  swiotlb: separate memory pool data from other allocator data
  swiotlb: add documentation and rename swiotlb_do_find_slots()
  swiotlb: make io_tlb_default_mem local to swiotlb.c
  swiotlb: bail out of swiotlb_init_late() if swiotlb is already allocated
  dma-contiguous: check for memory region overlap
  dma-contiguous: support numa CMA for specified node
  dma-contiguous: support per-numa CMA for all architectures
  dma-mapping: move arch_dma_set_mask() declaration to header
  swiotlb: unexport is_swiotlb_active
  x86: always initialize xen-swiotlb when xen-pcifront is enabling
  xen/pci: add flag for PCI passthrough being possible
2023-08-29 20:32:10 -07:00
Linus Torvalds
b96a3e9142 - Some swap cleanups from Ma Wupeng ("fix WARN_ON in add_to_avail_list")
- Peter Xu has a series (mm/gup: Unify hugetlb, speed up thp") which
   reduces the special-case code for handling hugetlb pages in GUP.  It
   also speeds up GUP handling of transparent hugepages.
 
 - Peng Zhang provides some maple tree speedups ("Optimize the fast path
   of mas_store()").
 
 - Sergey Senozhatsky has improved te performance of zsmalloc during
   compaction (zsmalloc: small compaction improvements").
 
 - Domenico Cerasuolo has developed additional selftest code for zswap
   ("selftests: cgroup: add zswap test program").
 
 - xu xin has doe some work on KSM's handling of zero pages.  These
   changes are mainly to enable the user to better understand the
   effectiveness of KSM's treatment of zero pages ("ksm: support tracking
   KSM-placed zero-pages").
 
 - Jeff Xu has fixes the behaviour of memfd's
   MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED sysctl ("mm/memfd: fix sysctl
   MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED").
 
 - David Howells has fixed an fscache optimization ("mm, netfs, fscache:
   Stop read optimisation when folio removed from pagecache").
 
 - Axel Rasmussen has given userfaultfd the ability to simulate memory
   poisoning ("add UFFDIO_POISON to simulate memory poisoning with UFFD").
 
 - Miaohe Lin has contributed some routine maintenance work on the
   memory-failure code ("mm: memory-failure: remove unneeded PageHuge()
   check").
 
 - Peng Zhang has contributed some maintenance work on the maple tree
   code ("Improve the validation for maple tree and some cleanup").
 
 - Hugh Dickins has optimized the collapsing of shmem or file pages into
   THPs ("mm: free retracted page table by RCU").
 
 - Jiaqi Yan has a patch series which permits us to use the healthy
   subpages within a hardware poisoned huge page for general purposes
   ("Improve hugetlbfs read on HWPOISON hugepages").
 
 - Kemeng Shi has done some maintenance work on the pagetable-check code
   ("Remove unused parameters in page_table_check").
 
 - More folioification work from Matthew Wilcox ("More filesystem folio
   conversions for 6.6"), ("Followup folio conversions for zswap").  And
   from ZhangPeng ("Convert several functions in page_io.c to use a
   folio").
 
 - page_ext cleanups from Kemeng Shi ("minor cleanups for page_ext").
 
 - Baoquan He has converted some architectures to use the GENERIC_IOREMAP
   ioremap()/iounmap() code ("mm: ioremap: Convert architectures to take
   GENERIC_IOREMAP way").
 
 - Anshuman Khandual has optimized arm64 tlb shootdown ("arm64: support
   batched/deferred tlb shootdown during page reclamation/migration").
 
 - Better maple tree lockdep checking from Liam Howlett ("More strict
   maple tree lockdep").  Liam also developed some efficiency improvements
   ("Reduce preallocations for maple tree").
 
 - Cleanup and optimization to the secondary IOMMU TLB invalidation, from
   Alistair Popple ("Invalidate secondary IOMMU TLB on permission
   upgrade").
 
 - Ryan Roberts fixes some arm64 MM selftest issues ("selftests/mm fixes
   for arm64").
 
 - Kemeng Shi provides some maintenance work on the compaction code ("Two
   minor cleanups for compaction").
 
 - Some reduction in mmap_lock pressure from Matthew Wilcox ("Handle most
   file-backed faults under the VMA lock").
 
 - Aneesh Kumar contributes code to use the vmemmap optimization for DAX
   on ppc64, under some circumstances ("Add support for DAX vmemmap
   optimization for ppc64").
 
 - page-ext cleanups from Kemeng Shi ("add page_ext_data to get client
   data in page_ext"), ("minor cleanups to page_ext header").
 
 - Some zswap cleanups from Johannes Weiner ("mm: zswap: three
   cleanups").
 
 - kmsan cleanups from ZhangPeng ("minor cleanups for kmsan").
 
 - VMA handling cleanups from Kefeng Wang ("mm: convert to
   vma_is_initial_heap/stack()").
 
 - DAMON feature work from SeongJae Park ("mm/damon/sysfs-schemes:
   implement DAMOS tried total bytes file"), ("Extend DAMOS filters for
   address ranges and DAMON monitoring targets").
 
 - Compaction work from Kemeng Shi ("Fixes and cleanups to compaction").
 
 - Liam Howlett has improved the maple tree node replacement code
   ("maple_tree: Change replacement strategy").
 
 - ZhangPeng has a general code cleanup - use the K() macro more widely
   ("cleanup with helper macro K()").
 
 - Aneesh Kumar brings memmap-on-memory to ppc64 ("Add support for memmap
   on memory feature on ppc64").
 
 - pagealloc cleanups from Kemeng Shi ("Two minor cleanups for pcp list
   in page_alloc"), ("Two minor cleanups for get pageblock migratetype").
 
 - Vishal Moola introduces a memory descriptor for page table tracking,
   "struct ptdesc" ("Split ptdesc from struct page").
 
 - memfd selftest maintenance work from Aleksa Sarai ("memfd: cleanups
   for vm.memfd_noexec").
 
 - MM include file rationalization from Hugh Dickins ("arch: include
   asm/cacheflush.h in asm/hugetlb.h").
 
 - THP debug output fixes from Hugh Dickins ("mm,thp: fix sloppy text
   output").
 
 - kmemleak improvements from Xiaolei Wang ("mm/kmemleak: use
   object_cache instead of kmemleak_initialized").
 
 - More folio-related cleanups from Matthew Wilcox ("Remove _folio_dtor
   and _folio_order").
 
 - A VMA locking scalability improvement from Suren Baghdasaryan
   ("Per-VMA lock support for swap and userfaults").
 
 - pagetable handling cleanups from Matthew Wilcox ("New page table range
   API").
 
 - A batch of swap/thp cleanups from David Hildenbrand ("mm/swap: stop
   using page->private on tail pages for THP_SWAP + cleanups").
 
 - Cleanups and speedups to the hugetlb fault handling from Matthew
   Wilcox ("Change calling convention for ->huge_fault").
 
 - Matthew Wilcox has also done some maintenance work on the MM subsystem
   documentation ("Improve mm documentation").
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZO1JUQAKCRDdBJ7gKXxA
 jrMwAP47r/fS8vAVT3zp/7fXmxaJYTK27CTAM881Gw1SDhFM/wEAv8o84mDenCg6
 Nfio7afS1ncD+hPYT8947UnLxTgn+ww=
 =Afws
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2023-08-28-18-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - Some swap cleanups from Ma Wupeng ("fix WARN_ON in
   add_to_avail_list")

 - Peter Xu has a series (mm/gup: Unify hugetlb, speed up thp") which
   reduces the special-case code for handling hugetlb pages in GUP. It
   also speeds up GUP handling of transparent hugepages.

 - Peng Zhang provides some maple tree speedups ("Optimize the fast path
   of mas_store()").

 - Sergey Senozhatsky has improved te performance of zsmalloc during
   compaction (zsmalloc: small compaction improvements").

 - Domenico Cerasuolo has developed additional selftest code for zswap
   ("selftests: cgroup: add zswap test program").

 - xu xin has doe some work on KSM's handling of zero pages. These
   changes are mainly to enable the user to better understand the
   effectiveness of KSM's treatment of zero pages ("ksm: support
   tracking KSM-placed zero-pages").

 - Jeff Xu has fixes the behaviour of memfd's
   MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED sysctl ("mm/memfd: fix sysctl
   MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED").

 - David Howells has fixed an fscache optimization ("mm, netfs, fscache:
   Stop read optimisation when folio removed from pagecache").

 - Axel Rasmussen has given userfaultfd the ability to simulate memory
   poisoning ("add UFFDIO_POISON to simulate memory poisoning with
   UFFD").

 - Miaohe Lin has contributed some routine maintenance work on the
   memory-failure code ("mm: memory-failure: remove unneeded PageHuge()
   check").

 - Peng Zhang has contributed some maintenance work on the maple tree
   code ("Improve the validation for maple tree and some cleanup").

 - Hugh Dickins has optimized the collapsing of shmem or file pages into
   THPs ("mm: free retracted page table by RCU").

 - Jiaqi Yan has a patch series which permits us to use the healthy
   subpages within a hardware poisoned huge page for general purposes
   ("Improve hugetlbfs read on HWPOISON hugepages").

 - Kemeng Shi has done some maintenance work on the pagetable-check code
   ("Remove unused parameters in page_table_check").

 - More folioification work from Matthew Wilcox ("More filesystem folio
   conversions for 6.6"), ("Followup folio conversions for zswap"). And
   from ZhangPeng ("Convert several functions in page_io.c to use a
   folio").

 - page_ext cleanups from Kemeng Shi ("minor cleanups for page_ext").

 - Baoquan He has converted some architectures to use the
   GENERIC_IOREMAP ioremap()/iounmap() code ("mm: ioremap: Convert
   architectures to take GENERIC_IOREMAP way").

 - Anshuman Khandual has optimized arm64 tlb shootdown ("arm64: support
   batched/deferred tlb shootdown during page reclamation/migration").

 - Better maple tree lockdep checking from Liam Howlett ("More strict
   maple tree lockdep"). Liam also developed some efficiency
   improvements ("Reduce preallocations for maple tree").

 - Cleanup and optimization to the secondary IOMMU TLB invalidation,
   from Alistair Popple ("Invalidate secondary IOMMU TLB on permission
   upgrade").

 - Ryan Roberts fixes some arm64 MM selftest issues ("selftests/mm fixes
   for arm64").

 - Kemeng Shi provides some maintenance work on the compaction code
   ("Two minor cleanups for compaction").

 - Some reduction in mmap_lock pressure from Matthew Wilcox ("Handle
   most file-backed faults under the VMA lock").

 - Aneesh Kumar contributes code to use the vmemmap optimization for DAX
   on ppc64, under some circumstances ("Add support for DAX vmemmap
   optimization for ppc64").

 - page-ext cleanups from Kemeng Shi ("add page_ext_data to get client
   data in page_ext"), ("minor cleanups to page_ext header").

 - Some zswap cleanups from Johannes Weiner ("mm: zswap: three
   cleanups").

 - kmsan cleanups from ZhangPeng ("minor cleanups for kmsan").

 - VMA handling cleanups from Kefeng Wang ("mm: convert to
   vma_is_initial_heap/stack()").

 - DAMON feature work from SeongJae Park ("mm/damon/sysfs-schemes:
   implement DAMOS tried total bytes file"), ("Extend DAMOS filters for
   address ranges and DAMON monitoring targets").

 - Compaction work from Kemeng Shi ("Fixes and cleanups to compaction").

 - Liam Howlett has improved the maple tree node replacement code
   ("maple_tree: Change replacement strategy").

 - ZhangPeng has a general code cleanup - use the K() macro more widely
   ("cleanup with helper macro K()").

 - Aneesh Kumar brings memmap-on-memory to ppc64 ("Add support for
   memmap on memory feature on ppc64").

 - pagealloc cleanups from Kemeng Shi ("Two minor cleanups for pcp list
   in page_alloc"), ("Two minor cleanups for get pageblock
   migratetype").

 - Vishal Moola introduces a memory descriptor for page table tracking,
   "struct ptdesc" ("Split ptdesc from struct page").

 - memfd selftest maintenance work from Aleksa Sarai ("memfd: cleanups
   for vm.memfd_noexec").

 - MM include file rationalization from Hugh Dickins ("arch: include
   asm/cacheflush.h in asm/hugetlb.h").

 - THP debug output fixes from Hugh Dickins ("mm,thp: fix sloppy text
   output").

 - kmemleak improvements from Xiaolei Wang ("mm/kmemleak: use
   object_cache instead of kmemleak_initialized").

 - More folio-related cleanups from Matthew Wilcox ("Remove _folio_dtor
   and _folio_order").

 - A VMA locking scalability improvement from Suren Baghdasaryan
   ("Per-VMA lock support for swap and userfaults").

 - pagetable handling cleanups from Matthew Wilcox ("New page table
   range API").

 - A batch of swap/thp cleanups from David Hildenbrand ("mm/swap: stop
   using page->private on tail pages for THP_SWAP + cleanups").

 - Cleanups and speedups to the hugetlb fault handling from Matthew
   Wilcox ("Change calling convention for ->huge_fault").

 - Matthew Wilcox has also done some maintenance work on the MM
   subsystem documentation ("Improve mm documentation").

* tag 'mm-stable-2023-08-28-18-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (489 commits)
  maple_tree: shrink struct maple_tree
  maple_tree: clean up mas_wr_append()
  secretmem: convert page_is_secretmem() to folio_is_secretmem()
  nios2: fix flush_dcache_page() for usage from irq context
  hugetlb: add documentation for vma_kernel_pagesize()
  mm: add orphaned kernel-doc to the rst files.
  mm: fix clean_record_shared_mapping_range kernel-doc
  mm: fix get_mctgt_type() kernel-doc
  mm: fix kernel-doc warning from tlb_flush_rmaps()
  mm: remove enum page_entry_size
  mm: allow ->huge_fault() to be called without the mmap_lock held
  mm: move PMD_ORDER to pgtable.h
  mm: remove checks for pte_index
  memcg: remove duplication detection for mem_cgroup_uncharge_swap
  mm/huge_memory: work on folio->swap instead of page->private when splitting folio
  mm/swap: inline folio_set_swap_entry() and folio_swap_entry()
  mm/swap: use dedicated entry for swap in folio
  mm/swap: stop using page->private on tail pages for THP_SWAP
  selftests/mm: fix WARNING comparing pointer to 0
  selftests: cgroup: fix test_kmem_memcg_deletion kernel mem check
  ...
2023-08-29 14:25:26 -07:00
David Hildenbrand
cfeed8ffe5 mm/swap: stop using page->private on tail pages for THP_SWAP
Patch series "mm/swap: stop using page->private on tail pages for THP_SWAP
+ cleanups".

This series stops using page->private on tail pages for THP_SWAP, replaces
folio->private by folio->swap for swapcache folios, and starts using
"new_folio" for tail pages that we are splitting to remove the usage of
page->private for swapcache handling completely.


This patch (of 4):

Let's stop using page->private on tail pages, making it possible to just
unconditionally reuse that field in the tail pages of large folios.

The remaining usage of the private field for THP_SWAP is in the THP
splitting code (mm/huge_memory.c), that we'll handle separately later.

Update the THP_SWAP documentation and sanity checks in mm_types.h and
__split_huge_page_tail().

[david@redhat.com: stop using page->private on tail pages for THP_SWAP]
  Link: https://lkml.kernel.org/r/6f0a82a3-6948-20d9-580b-be1dbf415701@redhat.com
Link: https://lkml.kernel.org/r/20230821160849.531668-1-david@redhat.com
Link: https://lkml.kernel.org/r/20230821160849.531668-2-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>	[arm64]
Reviewed-by: Yosry Ahmed <yosryahmed@google.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24 16:20:28 -07:00
Qi Zheng
00de2c9f26 arm64: mm: use ptep_clear() instead of pte_clear() in clear_flush()
In clear_flush(), the original pte may be a present entry, so we should
use ptep_clear() to let page_table_check track the pte clearing operation,
otherwise it may cause false positive in subsequent set_pte_at().

Link: https://lkml.kernel.org/r/20230810093241.1181142-1-qi.zheng@linux.dev
Fixes: 42b2547137 ("arm64/mm: enable ARCH_SUPPORTS_PAGE_TABLE_CHECK")
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Acked-by: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24 16:20:27 -07:00
Matthew Wilcox (Oracle)
4a169d61c2 arm64: implement the new page table range API
Add set_ptes(), update_mmu_cache_range() and flush_dcache_folio().  Change
the PG_dcache_clean flag from being per-page to per-folio.

Link: https://lkml.kernel.org/r/20230802151406.3735276-11-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24 16:20:20 -07:00
Suren Baghdasaryan
4089eef0e6 mm: drop per-VMA lock when returning VM_FAULT_RETRY or VM_FAULT_COMPLETED
handle_mm_fault returning VM_FAULT_RETRY or VM_FAULT_COMPLETED means
mmap_lock has been released.  However with per-VMA locks behavior is
different and the caller should still release it.  To make the rules
consistent for the caller, drop the per-VMA lock when returning
VM_FAULT_RETRY or VM_FAULT_COMPLETED.  Currently the only path returning
VM_FAULT_RETRY under per-VMA locks is do_swap_page and no path returns
VM_FAULT_COMPLETED for now.

[willy@infradead.org: fix riscv]
  Link: https://lkml.kernel.org/r/CAJuCfpE6GWEx1rPBmNpUfoD5o-gNFz9-UFywzCE2PbEGBiVz7g@mail.gmail.com
Link: https://lkml.kernel.org/r/20230630211957.1341547-4-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Peter Xu <peterx@redhat.com>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Hillf Danton <hdanton@sina.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Michel Lespinasse <michel@lespinasse.org>
Cc: Minchan Kim <minchan@google.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Punit Agrawal <punit.agrawal@bytedance.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24 16:20:17 -07:00
Vishal Moola (Oracle)
11b4fa8b2a arm64: convert various functions to use ptdescs
As part of the conversions to replace pgtable constructor/destructors with
ptdesc equivalents, convert various page table functions to use ptdescs.

Link: https://lkml.kernel.org/r/20230807230513.102486-19-vishal.moola@gmail.com
Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Guo Ren <guoren@kernel.org>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Palmer Dabbelt <palmer@rivosinc.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-21 13:37:55 -07:00
Matthew Wilcox (Oracle)
284e059204 mm: remove CONFIG_PER_VMA_LOCK ifdefs
Patch series "Handle most file-backed faults under the VMA lock", v3.

This patchset adds the ability to handle page faults on parts of files
which are already in the page cache without taking the mmap lock.


This patch (of 10):

Provide lock_vma_under_rcu() when CONFIG_PER_VMA_LOCK is not defined to
eliminate ifdefs in the users.

Link: https://lkml.kernel.org/r/20230724185410.1124082-1-willy@infradead.org
Link: https://lkml.kernel.org/r/20230724185410.1124082-2-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Cc: Punit Agrawal <punit.agrawal@bytedance.com>
Cc: Arjun Roy <arjunroy@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18 10:12:50 -07:00
Baoquan He
8f03d74f71 arm64 : mm: add wrapper function ioremap_prot()
Since hook functions ioremap_allowed() and iounmap_allowed() will be
obsoleted, add wrapper function ioremap_prot() to contain the the specific
handling in addition to generic_ioremap_prot() invocation.

Link: https://lkml.kernel.org/r/20230706154520.11257-19-bhe@redhat.com
Signed-off-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Brian Cain <bcain@quicinc.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Chris Zankel <chris@zankel.net>
Cc: David Laight <David.Laight@ACULAB.COM>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Niklas Schnelle <schnelle@linux.ibm.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18 10:12:36 -07:00
Zhang Jianhua
4e0bacd65e arm64: fix build warning for ARM64_MEMSTART_SHIFT
When building with W=1, the following warning occurs.

arch/arm64/include/asm/kernel-pgtable.h:129:41: error: "PUD_SHIFT" is not defined, evaluates to 0 [-Werror=undef]
  129 | #define ARM64_MEMSTART_SHIFT            PUD_SHIFT
      |                                         ^~~~~~~~~
arch/arm64/include/asm/kernel-pgtable.h:142:5: note: in expansion of macro ‘ARM64_MEMSTART_SHIFT’
  142 | #if ARM64_MEMSTART_SHIFT < SECTION_SIZE_BITS
      |     ^~~~~~~~~~~~~~~~~~~~

The generic PUD_SHIFT was defined in include/asm-generic/pgtable-nopud.h,
however the #ifndef __ASSEMBLY__ guard in this header file makes it unavailable
for assembly files. While someone .S file include the <asm/kernel-pgtable.h>,
the build warning would occur. Now move the macro ARM64_MEMSTART_SHIFT and
ARM64_MEMSTART_ALIGN to arch/arm64/mm/init.c where it is used only, to avoid
this issue.

Signed-off-by: Zhang Jianhua <chris.zjh@huawei.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20230804075615.3334756-1-chris.zjh@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
2023-08-04 17:19:44 +01:00
Yajun Deng
22e4a348f8 dma-contiguous: support per-numa CMA for all architectures
In the commit b7176c261c ("dma-contiguous: provide the ability to
reserve per-numa CMA"), Barry adds DMA_PERNUMA_CMA for ARM64.

But this feature is architecture independent, so support per-numa CMA
for all architectures, and enable it by default if NUMA.

Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
Tested-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-07-31 17:54:28 +02:00
Anshuman Khandual
d0999555e3 arm64/mm: Replace an open coding with ID_AA64MMFR1_EL1_HAFDBS_MASK
Replace '0xf' with ID_AA64MMFR1_EL1_HAFDBS_MASK while evaluating if the cpu
supports implicit page table entry access flag update in HW.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20230711090458.238346-1-anshuman.khandual@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2023-07-27 11:01:46 +01:00
Nikhil V
a8bd38dbc5 arm64: mm: Make hibernation aware of KFENCE
In the restore path, swsusp_arch_suspend_exit uses copy_page() to
over-write memory. However, with features like KFENCE enabled, there could
be situations where it may have marked some pages as not valid, due to
which it could be reported as invalid accesses.

Consider a situation where page 'P' was part of the hibernation image.
Now, when the resume kernel tries to restore the pages, the same page 'P'
is already in use in the resume kernel and is kfence protected, due to
which its mapping is removed from linear map. Since restoring pages happens
with the resume kernel page tables, we would end up accessing 'P' during
copy and results in kernel pagefault.

The proposed fix tries to solve this issue by marking PTE as valid for such
kfence protected pages.

Co-developed-by: Pavankumar Kondeti <quic_pkondeti@quicinc.com>
Signed-off-by: Pavankumar Kondeti <quic_pkondeti@quicinc.com>
Signed-off-by: Nikhil V <quic_nprakash@quicinc.com>
Link: https://lore.kernel.org/r/20230713070757.4093-1-quic_nprakash@quicinc.com
Signed-off-by: Will Deacon <will@kernel.org>
2023-07-20 11:44:50 +01:00
Rick Edgecombe
6ecc21bb43 mm: Move pte/pmd_mkwrite() callers with no VMA to _novma()
The x86 Shadow stack feature includes a new type of memory called shadow
stack. This shadow stack memory has some unusual properties, which requires
some core mm changes to function properly.

One of these unusual properties is that shadow stack memory is writable,
but only in limited ways. These limits are applied via a specific PTE
bit combination. Nevertheless, the memory is writable, and core mm code
will need to apply the writable permissions in the typical paths that
call pte_mkwrite(). Future patches will make pte_mkwrite() take a VMA, so
that the x86 implementation of it can know whether to create regular
writable or shadow stack mappings.

But there are a couple of challenges to this. Modifying the signatures of
each arch pte_mkwrite() implementation would be error prone because some
are generated with macros and would need to be re-implemented. Also, some
pte_mkwrite() callers operate on kernel memory without a VMA.

So this can be done in a three step process. First pte_mkwrite() can be
renamed to pte_mkwrite_novma() in each arch, with a generic pte_mkwrite()
added that just calls pte_mkwrite_novma(). Next callers without a VMA can
be moved to pte_mkwrite_novma(). And lastly, pte_mkwrite() and all callers
can be changed to take/pass a VMA.

Earlier work did the first step, so next move the callers that don't have
a VMA to pte_mkwrite_novma(). Also do the same for pmd_mkwrite().  This
will be ok for the shadow stack feature, as these callers are on kernel
memory which will not need to be made shadow stack, and the other
architectures only currently support one type of memory in pte_mkwrite()

Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/all/20230613001108.3040476-3-rick.p.edgecombe%40intel.com
2023-07-11 14:10:57 -07:00
SeongJae Park
24be4d0b46 arch/arm64/mm/fault: Fix undeclared variable error in do_page_fault()
Commit ae870a68b5 ("arm64/mm: Convert to using
lock_mm_and_find_vma()") made do_page_fault() to use 'vma' even if
CONFIG_PER_VMA_LOCK is not defined, but the declaration is still in the
ifdef.

As a result, building kernel without the config fails with undeclared
variable error as below:

    arch/arm64/mm/fault.c: In function 'do_page_fault':
    arch/arm64/mm/fault.c:624:2: error: 'vma' undeclared (first use in this function); did you mean 'vmap'?
      624 |  vma = lock_mm_and_find_vma(mm, addr, regs);
          |  ^~~
          |  vmap

Fix it by moving the declaration out of the ifdef.

Fixes: ae870a68b5 ("arm64/mm: Convert to using lock_mm_and_find_vma()")
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-07-03 19:04:32 -07:00
Linus Torvalds
9471f1f2f5 Merge branch 'expand-stack'
This modifies our user mode stack expansion code to always take the
mmap_lock for writing before modifying the VM layout.

It's actually something we always technically should have done, but
because we didn't strictly need it, we were being lazy ("opportunistic"
sounds so much better, doesn't it?) about things, and had this hack in
place where we would extend the stack vma in-place without doing the
proper locking.

And it worked fine.  We just needed to change vm_start (or, in the case
of grow-up stacks, vm_end) and together with some special ad-hoc locking
using the anon_vma lock and the mm->page_table_lock, it all was fairly
straightforward.

That is, it was all fine until Ruihan Li pointed out that now that the
vma layout uses the maple tree code, we *really* don't just change
vm_start and vm_end any more, and the locking really is broken.  Oops.

It's not actually all _that_ horrible to fix this once and for all, and
do proper locking, but it's a bit painful.  We have basically three
different cases of stack expansion, and they all work just a bit
differently:

 - the common and obvious case is the page fault handling. It's actually
   fairly simple and straightforward, except for the fact that we have
   something like 24 different versions of it, and you end up in a maze
   of twisty little passages, all alike.

 - the simplest case is the execve() code that creates a new stack.
   There are no real locking concerns because it's all in a private new
   VM that hasn't been exposed to anybody, but lockdep still can end up
   unhappy if you get it wrong.

 - and finally, we have GUP and page pinning, which shouldn't really be
   expanding the stack in the first place, but in addition to execve()
   we also use it for ptrace(). And debuggers do want to possibly access
   memory under the stack pointer and thus need to be able to expand the
   stack as a special case.

None of these cases are exactly complicated, but the page fault case in
particular is just repeated slightly differently many many times.  And
ia64 in particular has a fairly complicated situation where you can have
both a regular grow-down stack _and_ a special grow-up stack for the
register backing store.

So to make this slightly more manageable, the bulk of this series is to
first create a helper function for the most common page fault case, and
convert all the straightforward architectures to it.

Thus the new 'lock_mm_and_find_vma()' helper function, which ends up
being used by x86, arm, powerpc, mips, riscv, alpha, arc, csky, hexagon,
loongarch, nios2, sh, sparc32, and xtensa.  So we not only convert more
than half the architectures, we now have more shared code and avoid some
of those twisty little passages.

And largely due to this common helper function, the full diffstat of
this series ends up deleting more lines than it adds.

That still leaves eight architectures (ia64, m68k, microblaze, openrisc,
parisc, s390, sparc64 and um) that end up doing 'expand_stack()'
manually because they are doing something slightly different from the
normal pattern.  Along with the couple of special cases in execve() and
GUP.

So there's a couple of patches that first create 'locked' helper
versions of the stack expansion functions, so that there's a obvious
path forward in the conversion.  The execve() case is then actually
pretty simple, and is a nice cleanup from our old "grow-up stackls are
special, because at execve time even they grow down".

The #ifdef CONFIG_STACK_GROWSUP in that code just goes away, because
it's just more straightforward to write out the stack expansion there
manually, instead od having get_user_pages_remote() do it for us in some
situations but not others and have to worry about locking rules for GUP.

And the final step is then to just convert the remaining odd cases to a
new world order where 'expand_stack()' is called with the mmap_lock held
for reading, but where it might drop it and upgrade it to a write, only
to return with it held for reading (in the success case) or with it
completely dropped (in the failure case).

In the process, we remove all the stack expansion from GUP (where
dropping the lock wouldn't be ok without special rules anyway), and add
it in manually to __access_remote_vm() for ptrace().

Thanks to Adrian Glaubitz and Frank Scheiner who tested the ia64 cases.
Everything else here felt pretty straightforward, but the ia64 rules for
stack expansion are really quite odd and very different from everything
else.  Also thanks to Vegard Nossum who caught me getting one of those
odd conditions entirely the wrong way around.

Anyway, I think I want to actually move all the stack expansion code to
a whole new file of its own, rather than have it split up between
mm/mmap.c and mm/memory.c, but since this will have to be backported to
the initial maple tree vma introduction anyway, I tried to keep the
patches _fairly_ minimal.

Also, while I don't think it's valid to expand the stack from GUP, the
final patch in here is a "warn if some crazy GUP user wants to try to
expand the stack" patch.  That one will be reverted before the final
release, but it's left to catch any odd cases during the merge window
and release candidates.

Reported-by: Ruihan Li <lrh2000@pku.edu.cn>

* branch 'expand-stack':
  gup: add warning if some caller would seem to want stack expansion
  mm: always expand the stack with the mmap write lock held
  execve: expand new process stack manually ahead of time
  mm: make find_extend_vma() fail if write lock not held
  powerpc/mm: convert coprocessor fault to lock_mm_and_find_vma()
  mm/fault: convert remaining simple cases to lock_mm_and_find_vma()
  arm/mm: Convert to using lock_mm_and_find_vma()
  riscv/mm: Convert to using lock_mm_and_find_vma()
  mips/mm: Convert to using lock_mm_and_find_vma()
  powerpc/mm: Convert to using lock_mm_and_find_vma()
  arm64/mm: Convert to using lock_mm_and_find_vma()
  mm: make the page fault mmap locking killable
  mm: introduce new 'lock_mm_and_find_vma()' page fault helper
2023-06-28 20:35:21 -07:00
Linus Torvalds
6e17c6de3d - Yosry Ahmed brought back some cgroup v1 stats in OOM logs.
- Yosry has also eliminated cgroup's atomic rstat flushing.
 
 - Nhat Pham adds the new cachestat() syscall.  It provides userspace
   with the ability to query pagecache status - a similar concept to
   mincore() but more powerful and with improved usability.
 
 - Mel Gorman provides more optimizations for compaction, reducing the
   prevalence of page rescanning.
 
 - Lorenzo Stoakes has done some maintanance work on the get_user_pages()
   interface.
 
 - Liam Howlett continues with cleanups and maintenance work to the maple
   tree code.  Peng Zhang also does some work on maple tree.
 
 - Johannes Weiner has done some cleanup work on the compaction code.
 
 - David Hildenbrand has contributed additional selftests for
   get_user_pages().
 
 - Thomas Gleixner has contributed some maintenance and optimization work
   for the vmalloc code.
 
 - Baolin Wang has provided some compaction cleanups,
 
 - SeongJae Park continues maintenance work on the DAMON code.
 
 - Huang Ying has done some maintenance on the swap code's usage of
   device refcounting.
 
 - Christoph Hellwig has some cleanups for the filemap/directio code.
 
 - Ryan Roberts provides two patch series which yield some
   rationalization of the kernel's access to pte entries - use the provided
   APIs rather than open-coding accesses.
 
 - Lorenzo Stoakes has some fixes to the interaction between pagecache
   and directio access to file mappings.
 
 - John Hubbard has a series of fixes to the MM selftesting code.
 
 - ZhangPeng continues the folio conversion campaign.
 
 - Hugh Dickins has been working on the pagetable handling code, mainly
   with a view to reducing the load on the mmap_lock.
 
 - Catalin Marinas has reduced the arm64 kmalloc() minimum alignment from
   128 to 8.
 
 - Domenico Cerasuolo has improved the zswap reclaim mechanism by
   reorganizing the LRU management.
 
 - Matthew Wilcox provides some fixups to make gfs2 work better with the
   buffer_head code.
 
 - Vishal Moola also has done some folio conversion work.
 
 - Matthew Wilcox has removed the remnants of the pagevec code - their
   functionality is migrated over to struct folio_batch.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZJejewAKCRDdBJ7gKXxA
 joggAPwKMfT9lvDBEUnJagY7dbDPky1cSYZdJKxxM2cApGa42gEA6Cl8HRAWqSOh
 J0qXCzqaaN8+BuEyLGDVPaXur9KirwY=
 =B7yQ
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2023-06-24-19-15' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull mm updates from Andrew Morton:

 - Yosry Ahmed brought back some cgroup v1 stats in OOM logs

 - Yosry has also eliminated cgroup's atomic rstat flushing

 - Nhat Pham adds the new cachestat() syscall. It provides userspace
   with the ability to query pagecache status - a similar concept to
   mincore() but more powerful and with improved usability

 - Mel Gorman provides more optimizations for compaction, reducing the
   prevalence of page rescanning

 - Lorenzo Stoakes has done some maintanance work on the
   get_user_pages() interface

 - Liam Howlett continues with cleanups and maintenance work to the
   maple tree code. Peng Zhang also does some work on maple tree

 - Johannes Weiner has done some cleanup work on the compaction code

 - David Hildenbrand has contributed additional selftests for
   get_user_pages()

 - Thomas Gleixner has contributed some maintenance and optimization
   work for the vmalloc code

 - Baolin Wang has provided some compaction cleanups,

 - SeongJae Park continues maintenance work on the DAMON code

 - Huang Ying has done some maintenance on the swap code's usage of
   device refcounting

 - Christoph Hellwig has some cleanups for the filemap/directio code

 - Ryan Roberts provides two patch series which yield some
   rationalization of the kernel's access to pte entries - use the
   provided APIs rather than open-coding accesses

 - Lorenzo Stoakes has some fixes to the interaction between pagecache
   and directio access to file mappings

 - John Hubbard has a series of fixes to the MM selftesting code

 - ZhangPeng continues the folio conversion campaign

 - Hugh Dickins has been working on the pagetable handling code, mainly
   with a view to reducing the load on the mmap_lock

 - Catalin Marinas has reduced the arm64 kmalloc() minimum alignment
   from 128 to 8

 - Domenico Cerasuolo has improved the zswap reclaim mechanism by
   reorganizing the LRU management

 - Matthew Wilcox provides some fixups to make gfs2 work better with the
   buffer_head code

 - Vishal Moola also has done some folio conversion work

 - Matthew Wilcox has removed the remnants of the pagevec code - their
   functionality is migrated over to struct folio_batch

* tag 'mm-stable-2023-06-24-19-15' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (380 commits)
  mm/hugetlb: remove hugetlb_set_page_subpool()
  mm: nommu: correct the range of mmap_sem_read_lock in task_mem()
  hugetlb: revert use of page_cache_next_miss()
  Revert "page cache: fix page_cache_next/prev_miss off by one"
  mm/vmscan: fix root proactive reclaim unthrottling unbalanced node
  mm: memcg: rename and document global_reclaim()
  mm: kill [add|del]_page_to_lru_list()
  mm: compaction: convert to use a folio in isolate_migratepages_block()
  mm: zswap: fix double invalidate with exclusive loads
  mm: remove unnecessary pagevec includes
  mm: remove references to pagevec
  mm: rename invalidate_mapping_pagevec to mapping_try_invalidate
  mm: remove struct pagevec
  net: convert sunrpc from pagevec to folio_batch
  i915: convert i915_gpu_error to use a folio_batch
  pagevec: rename fbatch_count()
  mm: remove check_move_unevictable_pages()
  drm: convert drm_gem_put_pages() to use a folio_batch
  i915: convert shmem_sg_free_table() to use a folio_batch
  scatterlist: add sg_set_folio()
  ...
2023-06-28 10:28:11 -07:00
Linus Torvalds
2605e80d34 arm64 updates for 6.5:
- Support for the Armv8.9 Permission Indirection Extensions. While this
   feature doesn't add new functionality, it enables future support for
   Guarded Control Stacks (GCS) and Permission Overlays.
 
 - User-space support for the Armv8.8 memcpy/memset instructions.
 
 - arm64 perf: support the HiSilicon SoC uncore PMU, Arm CMN sysfs
   identifier, support for the NXP i.MX9 SoC DDRC PMU, fixes and
   cleanups.
 
 - Removal of superfluous ISBs on context switch (following retrospective
   architecture tightening).
 
 - Decode the ISS2 register during faults for additional information to
   help with debugging.
 
 - KPTI clean-up/simplification of the trampoline exit code.
 
 - Addressing several -Wmissing-prototype warnings.
 
 - Kselftest improvements for signal handling and ptrace.
 
 - Fix TPIDR2_EL0 restoring on sigreturn
 
 - Clean-up, robustness improvements of the module allocation code.
 
 - More sysreg conversions to the automatic register/bitfields
   generation.
 
 - CPU capabilities handling cleanup.
 
 - Arm documentation updates: ACPI, ptdump.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmSZyXwACgkQa9axLQDI
 XvEM3BAAkMzHGTDhNVNGLSO07PVmdzTiuoNFlfX7bktdIb+El76VhGXhHeEywTje
 wAq9JIYBf/Src2HbgZLwuly8Fn2vCrhyp++bRJW82o9SiBnx91+0mH7zLf+XHiQ4
 FHKZxvaE6PaDc9o8WXr+IeucPRb5W2HgH37mktxh7ShMLsxorwS94V1oL29A2mV9
 t4XQY7/tdmrDKMKMuQnIr1DurNXBhJ1OKvDnSN/Zzm96JOU/QQ32N2wEE7Y0aHOh
 bBzClksx2mguQqV515mySGFe5yy9NqaAfx2hTAciq+1rwbiCSjqQQmEswoUH8WLX
 JNLylxADWT2qXThFe8W6uyFzEshSAoI1yKxlCGuOsQpu4sFJtR8oh8dDj5669g4Y
 j0jR87r9rWm0iyYI5I+XDMxFVyuh2eFInvjtynRbj+mtS3f/SkO8fXG6Uya+I76C
 UGLlBUKnLr/zHuIGN0LE/V4dYTqsi9EtHoc2Am2xCZsS9jqkxKJG8C93Zsm4GlJC
 OcUtBSjW0rYJq+tLk0yhR6hbh59QbiRh05KnZsPpOKi8purlKSL9ZNPRi7TndLdm
 HjHUY+vQwNIpPIb6pyK4aYZuTdGEQIsQykQ8CULiIGlHi7kc4g9029ouLc5bBAeU
 mU8D62I2ztzPoYljYWNtO7K6g/Dq8c4lpsaMAJ+1Wp2iq2xBJjo=
 =rNBK
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Catalin Marinas:
 "Notable features are user-space support for the memcpy/memset
  instructions and the permission indirection extension.

   - Support for the Armv8.9 Permission Indirection Extensions. While
     this feature doesn't add new functionality, it enables future
     support for Guarded Control Stacks (GCS) and Permission Overlays

   - User-space support for the Armv8.8 memcpy/memset instructions

   - arm64 perf: support the HiSilicon SoC uncore PMU, Arm CMN sysfs
     identifier, support for the NXP i.MX9 SoC DDRC PMU, fixes and
     cleanups

   - Removal of superfluous ISBs on context switch (following
     retrospective architecture tightening)

   - Decode the ISS2 register during faults for additional information
     to help with debugging

   - KPTI clean-up/simplification of the trampoline exit code

   - Addressing several -Wmissing-prototype warnings

   - Kselftest improvements for signal handling and ptrace

   - Fix TPIDR2_EL0 restoring on sigreturn

   - Clean-up, robustness improvements of the module allocation code

   - More sysreg conversions to the automatic register/bitfields
     generation

   - CPU capabilities handling cleanup

   - Arm documentation updates: ACPI, ptdump"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (124 commits)
  kselftest/arm64: Add a test case for TPIDR2 restore
  arm64/signal: Restore TPIDR2 register rather than memory state
  arm64: alternatives: make clean_dcache_range_nopatch() noinstr-safe
  Documentation/arm64: Add ptdump documentation
  arm64: hibernate: remove WARN_ON in save_processor_state
  kselftest/arm64: Log signal code and address for unexpected signals
  docs: perf: Fix warning from 'make htmldocs' in hisi-pmu.rst
  arm64/fpsimd: Exit streaming mode when flushing tasks
  docs: perf: Add new description for HiSilicon UC PMU
  drivers/perf: hisi: Add support for HiSilicon UC PMU driver
  drivers/perf: hisi: Add support for HiSilicon H60PA and PAv3 PMU driver
  perf: arm_cspmu: Add missing MODULE_DEVICE_TABLE
  perf/arm-cmn: Add sysfs identifier
  perf/arm-cmn: Revamp model detection
  perf/arm_dmc620: Add cpumask
  arm64: mm: fix VA-range sanity check
  arm64/mm: remove now-superfluous ISBs from TTBR writes
  Documentation/arm64: Update ACPI tables from BBR
  Documentation/arm64: Update references in arm-acpi
  Documentation/arm64: Update ARM and arch reference
  ...
2023-06-26 17:11:53 -07:00
Linus Torvalds
ae870a68b5 arm64/mm: Convert to using lock_mm_and_find_vma()
This converts arm64 to use the new page fault helper.  It was very
straightforward, but still needed a fix for the "obvious" conversion I
initially did.  Thanks to Suren for the fix and testing.

Fixed-and-tested-by: Suren Baghdasaryan <surenb@google.com>
Unnecessary-code-removal-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-06-24 14:12:58 -07:00
Catalin Marinas
abc17128c8 Merge branch 'for-next/feat_s1pie' into for-next/core
* for-next/feat_s1pie:
  : Support for the Armv8.9 Permission Indirection Extensions (stage 1 only)
  KVM: selftests: get-reg-list: add Permission Indirection registers
  KVM: selftests: get-reg-list: support ID register features
  arm64: Document boot requirements for PIE
  arm64: transfer permission indirection settings to EL2
  arm64: enable Permission Indirection Extension (PIE)
  arm64: add encodings of PIRx_ELx registers
  arm64: disable EL2 traps for PIE
  arm64: reorganise PAGE_/PROT_ macros
  arm64: add PTE_WRITE to PROT_SECT_NORMAL
  arm64: add PTE_UXN/PTE_WRITE to SWAPPER_*_FLAGS
  KVM: arm64: expose ID_AA64MMFR3_EL1 to guests
  KVM: arm64: Save/restore PIE registers
  KVM: arm64: Save/restore TCR2_EL1
  arm64: cpufeature: add Permission Indirection Extension cpucap
  arm64: cpufeature: add TCR2 cpucap
  arm64: cpufeature: add system register ID_AA64MMFR3
  arm64/sysreg: add PIR*_ELx registers
  arm64/sysreg: update HCRX_EL2 register
  arm64/sysreg: add system registers TCR2_ELx
  arm64/sysreg: Add ID register ID_AA64MMFR3
2023-06-23 18:34:16 +01:00
Catalin Marinas
f42039d10b Merge branches 'for-next/kpti', 'for-next/missing-proto-warn', 'for-next/iss2-decode', 'for-next/kselftest', 'for-next/misc', 'for-next/feat_mops', 'for-next/module-alloc', 'for-next/sysreg', 'for-next/cpucap', 'for-next/acpi', 'for-next/kdump', 'for-next/acpi-doc', 'for-next/doc' and 'for-next/tpidr2-fix', remote-tracking branch 'arm64/for-next/perf' into for-next/core
* arm64/for-next/perf:
  docs: perf: Fix warning from 'make htmldocs' in hisi-pmu.rst
  docs: perf: Add new description for HiSilicon UC PMU
  drivers/perf: hisi: Add support for HiSilicon UC PMU driver
  drivers/perf: hisi: Add support for HiSilicon H60PA and PAv3 PMU driver
  perf: arm_cspmu: Add missing MODULE_DEVICE_TABLE
  perf/arm-cmn: Add sysfs identifier
  perf/arm-cmn: Revamp model detection
  perf/arm_dmc620: Add cpumask
  dt-bindings: perf: fsl-imx-ddr: Add i.MX93 compatible
  drivers/perf: imx_ddr: Add support for NXP i.MX9 SoC DDRC PMU driver
  perf/arm_cspmu: Decouple APMT dependency
  perf/arm_cspmu: Clean up ACPI dependency
  ACPI/APMT: Don't register invalid resource
  perf/arm_cspmu: Fix event attribute type
  perf: arm_cspmu: Set irq affinitiy only if overflow interrupt is used
  drivers/perf: hisi: Don't migrate perf to the CPU going to teardown
  drivers/perf: apple_m1: Force 63bit counters for M2 CPUs
  perf/arm-cmn: Fix DTC reset
  perf: qcom_l2_pmu: Make l2_cache_pmu_probe_cluster() more robust
  perf/arm-cci: Slightly optimize cci_pmu_sync_counters()

* for-next/kpti:
  : Simplify KPTI trampoline exit code
  arm64: entry: Simplify tramp_alias macro and tramp_exit routine
  arm64: entry: Preserve/restore X29 even for compat tasks

* for-next/missing-proto-warn:
  : Address -Wmissing-prototype warnings
  arm64: add alt_cb_patch_nops prototype
  arm64: move early_brk64 prototype to header
  arm64: signal: include asm/exception.h
  arm64: kaslr: add kaslr_early_init() declaration
  arm64: flush: include linux/libnvdimm.h
  arm64: module-plts: inline linux/moduleloader.h
  arm64: hide unused is_valid_bugaddr()
  arm64: efi: add efi_handle_corrupted_x18 prototype
  arm64: cpuidle: fix #ifdef for acpi functions
  arm64: kvm: add prototypes for functions called in asm
  arm64: spectre: provide prototypes for internal functions
  arm64: move cpu_suspend_set_dbg_restorer() prototype to header
  arm64: avoid prototype warnings for syscalls
  arm64: add scs_patch_vmlinux prototype
  arm64: xor-neon: mark xor_arm64_neon_*() static

* for-next/iss2-decode:
  : Add decode of ISS2 to data abort reports
  arm64/esr: Add decode of ISS2 to data abort reporting
  arm64/esr: Use GENMASK() for the ISS mask

* for-next/kselftest:
  : Various arm64 kselftest improvements
  kselftest/arm64: Log signal code and address for unexpected signals
  kselftest/arm64: Add a smoke test for ptracing hardware break/watch points

* for-next/misc:
  : Miscellaneous patches
  arm64: alternatives: make clean_dcache_range_nopatch() noinstr-safe
  arm64: hibernate: remove WARN_ON in save_processor_state
  arm64/fpsimd: Exit streaming mode when flushing tasks
  arm64: mm: fix VA-range sanity check
  arm64/mm: remove now-superfluous ISBs from TTBR writes
  arm64: consolidate rox page protection logic
  arm64: set __exception_irq_entry with __irq_entry as a default
  arm64: syscall: unmask DAIF for tracing status
  arm64: lockdep: enable checks for held locks when returning to userspace
  arm64/cpucaps: increase string width to properly format cpucaps.h
  arm64/cpufeature: Use helper for ECV CNTPOFF cpufeature

* for-next/feat_mops:
  : Support for ARMv8.8 memcpy instructions in userspace
  kselftest/arm64: add MOPS to hwcap test
  arm64: mops: allow disabling MOPS from the kernel command line
  arm64: mops: detect and enable FEAT_MOPS
  arm64: mops: handle single stepping after MOPS exception
  arm64: mops: handle MOPS exceptions
  KVM: arm64: hide MOPS from guests
  arm64: mops: don't disable host MOPS instructions from EL2
  arm64: mops: document boot requirements for MOPS
  KVM: arm64: switch HCRX_EL2 between host and guest
  arm64: cpufeature: detect FEAT_HCX
  KVM: arm64: initialize HCRX_EL2

* for-next/module-alloc:
  : Make the arm64 module allocation code more robust (clean-up, VA range expansion)
  arm64: module: rework module VA range selection
  arm64: module: mandate MODULE_PLTS
  arm64: module: move module randomization to module.c
  arm64: kaslr: split kaslr/module initialization
  arm64: kasan: remove !KASAN_VMALLOC remnants
  arm64: module: remove old !KASAN_VMALLOC logic

* for-next/sysreg: (21 commits)
  : More sysreg conversions to automatic generation
  arm64/sysreg: Convert TRBIDR_EL1 register to automatic generation
  arm64/sysreg: Convert TRBTRG_EL1 register to automatic generation
  arm64/sysreg: Convert TRBMAR_EL1 register to automatic generation
  arm64/sysreg: Convert TRBSR_EL1 register to automatic generation
  arm64/sysreg: Convert TRBBASER_EL1 register to automatic generation
  arm64/sysreg: Convert TRBPTR_EL1 register to automatic generation
  arm64/sysreg: Convert TRBLIMITR_EL1 register to automatic generation
  arm64/sysreg: Rename TRBIDR_EL1 fields per auto-gen tools format
  arm64/sysreg: Rename TRBTRG_EL1 fields per auto-gen tools format
  arm64/sysreg: Rename TRBMAR_EL1 fields per auto-gen tools format
  arm64/sysreg: Rename TRBSR_EL1 fields per auto-gen tools format
  arm64/sysreg: Rename TRBBASER_EL1 fields per auto-gen tools format
  arm64/sysreg: Rename TRBPTR_EL1 fields per auto-gen tools format
  arm64/sysreg: Rename TRBLIMITR_EL1 fields per auto-gen tools format
  arm64/sysreg: Convert OSECCR_EL1 to automatic generation
  arm64/sysreg: Convert OSDTRTX_EL1 to automatic generation
  arm64/sysreg: Convert OSDTRRX_EL1 to automatic generation
  arm64/sysreg: Convert OSLAR_EL1 to automatic generation
  arm64/sysreg: Standardise naming of bitfield constants in OSL[AS]R_EL1
  arm64/sysreg: Convert MDSCR_EL1 to automatic register generation
  ...

* for-next/cpucap:
  : arm64 cpucap clean-up
  arm64: cpufeature: fold cpus_set_cap() into update_cpu_capabilities()
  arm64: cpufeature: use cpucap naming
  arm64: alternatives: use cpucap naming
  arm64: standardise cpucap bitmap names

* for-next/acpi:
  : Various arm64-related ACPI patches
  ACPI: bus: Consolidate all arm specific initialisation into acpi_arm_init()

* for-next/kdump:
  : Simplify the crashkernel reservation behaviour of crashkernel=X,high on arm64
  arm64: add kdump.rst into index.rst
  Documentation: add kdump.rst to present crashkernel reservation on arm64
  arm64: kdump: simplify the reservation behaviour of crashkernel=,high

* for-next/acpi-doc:
  : Update ACPI documentation for Arm systems
  Documentation/arm64: Update ACPI tables from BBR
  Documentation/arm64: Update references in arm-acpi
  Documentation/arm64: Update ARM and arch reference

* for-next/doc:
  : arm64 documentation updates
  Documentation/arm64: Add ptdump documentation

* for-next/tpidr2-fix:
  : Fix the TPIDR2_EL0 register restoring on sigreturn
  kselftest/arm64: Add a test case for TPIDR2 restore
  arm64/signal: Restore TPIDR2 register rather than memory state
2023-06-23 18:32:20 +01:00
Catalin Marinas
1c1a429efd arm64: enable ARCH_WANT_KMALLOC_DMA_BOUNCE for arm64
With the DMA bouncing of unaligned kmalloc() buffers now in place, enable
it for arm64 to allow the kmalloc-{8,16,32,48,96} caches.  In addition,
always create the swiotlb buffer even when the end of RAM is within the
32-bit physical address range (the swiotlb buffer can still be disabled on
the kernel command line).

Link: https://lkml.kernel.org/r/20230612153201.554742-18-catalin.marinas@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Isaac J. Manjarres <isaacmanjarres@google.com>
Cc: Will Deacon <will@kernel.org>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Jerry Snitselaar <jsnitsel@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Jonathan Cameron <jic23@kernel.org>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Lars-Peter Clausen <lars@metafoo.de>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Mike Snitzer <snitzer@kernel.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Saravana Kannan <saravanak@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19 16:19:23 -07:00
Hugh Dickins
cafcb9ca5a arm64/hugetlb: pte_alloc_huge() pte_offset_huge()
pte_alloc_map() expects to be followed by pte_unmap(), but hugetlb omits
that: to keep balance in future, use the recently added pte_alloc_huge()
instead; with pte_offset_huge() a better name for pte_offset_kernel().

Link: https://lkml.kernel.org/r/5849464-7191-40c5-c55f-fba9c3802e5d@google.com
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexandre Ghiti <alexghiti@rivosinc.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: John David Anglin <dave.anglin@bell.net>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19 16:19:06 -07:00
Hugh Dickins
52924726f4 arm64: allow pte_offset_map() to fail
In rare transient cases, not yet made possible, pte_offset_map() and
pte_offset_map_lock() may not find a page table: handle appropriately.

Link: https://lkml.kernel.org/r/35e46485-8499-4337-c51f-b8fa495a1a93@google.com
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexandre Ghiti <alexghiti@rivosinc.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: John David Anglin <dave.anglin@bell.net>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19 16:19:06 -07:00
Mark Rutland
ab9b400809 arm64: mm: fix VA-range sanity check
Both create_mapping_noalloc() and update_mapping_prot() sanity-check
their 'virt' parameter, but the check itself doesn't make much sense.
The condition used today appears to be a historical accident.

The sanity-check condition:

	if ((virt >= PAGE_END) && (virt < VMALLOC_START)) {
		[ ... warning here ... ]
		return;
	}

... can only be true for the KASAN shadow region or the module region,
and there's no reason to exclude these specifically for creating and
updateing mappings.

When arm64 support was first upstreamed in commit:

  c1cc155261 ("arm64: MMU initialisation")

... the condition was:

	if (virt < VMALLOC_START) {
		[ ... warning here ... ]
		return;
	}

At the time, VMALLOC_START was the lowest kernel address, and this was
checking whether 'virt' would be translated via TTBR1.

Subsequently in commit:

  14c127c957 ("arm64: mm: Flip kernel VA space")

... the condition was changed to:

	if ((virt >= VA_START) && (virt < VMALLOC_START)) {
		[ ... warning here ... ]
		return;
	}

This appear to have been a thinko. The commit moved the linear map to
the bottom of the kernel address space, with VMALLOC_START being at the
halfway point. The old condition would warn for changes to the linear
map below this, and at the time VA_START was the end of the linear map.

Subsequently we cleaned up the naming of VA_START in commit:

  77ad4ce693 ("arm64: memory: rename VA_START to PAGE_END")

... keeping the erroneous condition as:

	if ((virt >= PAGE_END) && (virt < VMALLOC_START)) {
		[ ... warning here ... ]
		return;
	}

Correct the condition to check against the start of the TTBR1 address
space, which is currently PAGE_OFFSET. This simplifies the logic, and
more clearly matches the "outside kernel range" message in the warning.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Steve Capper <steve.capper@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/20230615102628.1052103-1-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-06-15 17:51:07 +01:00
Jamie Iles
b9293d457f arm64/mm: remove now-superfluous ISBs from TTBR writes
At the time of authoring 7655abb953 ("arm64: mm: Move ASID from TTBR0
to TTBR1"), the Arm ARM did not specify any ordering guarantees for
direct writes to TTBR0_ELx and TTBR1_ELx and so an ISB was required
after each write to ensure TLBs would only be populated from the
expected (or reserved tables).

In a recent update to the Arm ARM, the requirements have been relaxed to
reflect the implementation of current CPUs and required implementation
of future CPUs to read (RDYDPX in D8.2.3 Translation table base address
register):

  Direct writes to TTBR0_ELx and TTBR1_ELx occur in program order
  relative to one another, without the need for explicit
  synchronization. For any one translation, all indirect reads of
  TTBR0_ELx and TTBR1_ELx that are made as part of the translation
  observe only one point in that order of direct writes.

Remove the superfluous ISBs to optimize uaccess helpers and context
switch.

Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Jamie Iles <quic_jiles@quicinc.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20230613141959.92697-1-quic_jiles@quicinc.com
[catalin.marinas@arm.com: rename __cpu_set_reserved_ttbr0 to ..._nosync]
[catalin.marinas@arm.com: move the cpu_set_reserved_ttbr0_nosync() call to cpu_do_switch_mm()]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-06-15 17:47:54 +01:00
Russell King
601eaec513 arm64: consolidate rox page protection logic
Consolidate the arm64 decision making for the page protections used
for executable pages, used by both the trampoline code and the kernel
text mapping code.

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/E1q9T3v-00EDmW-BH@rmk-PC.armlinux.org.uk
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-06-14 18:03:10 +01:00
Arnd Bergmann
bb6e04a173 kasan: use internal prototypes matching gcc-13 builtins
gcc-13 warns about function definitions for builtin interfaces that have a
different prototype, e.g.:

In file included from kasan_test.c:31:
kasan.h:574:6: error: conflicting types for built-in function '__asan_register_globals'; expected 'void(void *, long int)' [-Werror=builtin-declaration-mismatch]
  574 | void __asan_register_globals(struct kasan_global *globals, size_t size);
kasan.h:577:6: error: conflicting types for built-in function '__asan_alloca_poison'; expected 'void(void *, long int)' [-Werror=builtin-declaration-mismatch]
  577 | void __asan_alloca_poison(unsigned long addr, size_t size);
kasan.h:580:6: error: conflicting types for built-in function '__asan_load1'; expected 'void(void *)' [-Werror=builtin-declaration-mismatch]
  580 | void __asan_load1(unsigned long addr);
kasan.h:581:6: error: conflicting types for built-in function '__asan_store1'; expected 'void(void *)' [-Werror=builtin-declaration-mismatch]
  581 | void __asan_store1(unsigned long addr);
kasan.h:643:6: error: conflicting types for built-in function '__hwasan_tag_memory'; expected 'void(void *, unsigned char,  long int)' [-Werror=builtin-declaration-mismatch]
  643 | void __hwasan_tag_memory(unsigned long addr, u8 tag, unsigned long size);

The two problems are:

 - Addresses are passes as 'unsigned long' in the kernel, but gcc-13
   expects a 'void *'.

 - sizes meant to use a signed ssize_t rather than size_t.

Change all the prototypes to match these.  Using 'void *' consistently for
addresses gets rid of a couple of type casts, so push that down to the
leaf functions where possible.

This now passes all randconfig builds on arm, arm64 and x86, but I have
not tested it on the other architectures that support kasan, since they
tend to fail randconfig builds in other ways.  This might fail if any of
the 32-bit architectures expect a 'long' instead of 'int' for the size
argument.

The __asan_allocas_unpoison() function prototype is somewhat weird, since
it uses a pointer for 'stack_top' and an size_t for 'stack_bottom'.  This
looks like it is meant to be 'addr' and 'size' like the others, but the
implementation clearly treats them as 'top' and 'bottom'.

Link: https://lkml.kernel.org/r/20230509145735.9263-2-arnd@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Marco Elver <elver@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-09 16:25:19 -07:00
Baoquan He
6c4dcaddbd arm64: kdump: simplify the reservation behaviour of crashkernel=,high
On arm64, reservation for 'crashkernel=xM,high' is taken by searching for
suitable memory region top down. If the 'xM' of crashkernel high memory
is reserved from high memory successfully, it will try to reserve
crashkernel low memory later accoringly. Otherwise, it will try to search
low memory area for the 'xM' suitable region. Please see the details in
Documentation/admin-guide/kernel-parameters.txt.

While we observed an unexpected case where a reserved region crosses the
high and low meomry boundary. E.g on a system with 4G as low memory end,
user added the kernel parameters like: 'crashkernel=512M,high', it could
finally have [4G-126M, 4G+386M], [1G, 1G+128M] regions in running kernel.
The crashkernel high region crossing low and high memory boudary will bring
issues:

1) For crashkernel=x,high, if getting crashkernel high region across
low and high memory boundary, then user will see two memory regions in
low memory, and one memory region in high memory. The two crashkernel
low memory regions are confusing as shown in above example.

2) If people explicityly specify "crashkernel=x,high crashkernel=y,low"
and y <= 128M, when crashkernel high region crosses low and high memory
boundary and the part of crashkernel high reservation below boundary is
bigger than y, the expected crahskernel low reservation will be skipped.
But the expected crashkernel high reservation is shrank and could not
satisfy user space requirement.

3) The crossing boundary behaviour of crahskernel high reservation is
different than x86 arch. On x86_64, the low memory end is 4G fixedly,
and the memory near 4G is reserved by system, e.g for mapping firmware,
pci mapping, so the crashkernel reservation crossing boundary never happens.
From distros point of view, this brings inconsistency and confusion. Users
need to dig into x86 and arm64 system details to find out why.

For kernel itself, the impact of issue 3) could be slight. While issue
1) and 2) cause actual impact because it brings obscure semantics and
behaviour to crashkernel=,high reservation.

Here, for crashkernel=xM,high, search the high memory for the suitable
region only in high memory. If failed, try reserving the suitable
region only in low memory. Like this, the crashkernel high region will
only exist in high memory, and crashkernel low region only exists in low
memory. The reservation behaviour for crashkernel=,high is clearer and
simpler.

Note: RPi4 has different zone ranges than normal memory. Its DMA zone is
0~1G, and DMA32 zone is 1G~4G if CONFIG_ZONE_DMA|DMA32 are enabled by
default. The low memory end is 1G in order to validate all devices, high
memory starts at 1G memory. However, for being consistent with normal
arm64 system, its low memory end is still 1G, while reserving crashkernel
high memory from 4G if crashkernel=size,high specified. This will remove
confusion.

With above change applied, summary of arm64 crashkernel reservation range:
1)
RPi4(zone DMA:0~1G; DMA32:1G~4G):
 crashkernel=size
  0~1G: low memory | 1G~top: high memory

 crashkernel=size,high
  0~1G: low memory | 4G~top: high memory

2)
Other normal system:
 crashkernel=size
 crashkernel=size,high
  0~4G: low memory | 4G~top: high memory

3)
Systems w/o zone DMA|DMA32
 crashkernel=size
 crashkernel=size,high
  0~top: low memory

Signed-off-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com>
Link: https://lore.kernel.org/r/ZGIBSEoZ7VRVvP8H@MiWiFi-R3L-srv
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-06-09 17:37:28 +01:00
Mark Rutland
55123afffe arm64: kasan: remove !KASAN_VMALLOC remnants
Historically, KASAN could be selected with or without KASAN_VMALLOC, but
since commit:

  f6f37d9320 ("arm64: select KASAN_VMALLOC for SW/HW_TAGS modes")

... we can never select KASAN without KASAN_VMALLOC on arm64, and thus
arm64 code for KASAN && !KASAN_VMALLOC is redundant and can be removed.

Remove the redundant code kasan_init.c

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Alexander Potapenko <glider@google.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Will Deacon <will@kernel.org>
Tested-by: Shanker Donthineni <sdonthineni@nvidia.com>
Link: https://lore.kernel.org/r/20230530110328.2213762-3-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-06-06 17:39:05 +01:00
Joey Gouly
9e9bb6ede0 arm64: enable Permission Indirection Extension (PIE)
Now that the necessary changes have been made, set the Permission Indirection
registers and enable the Permission Indirection Extension.

Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20230606145859.697944-17-joey.gouly@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-06-06 16:52:41 +01:00
Joey Gouly
f0af339fc4 arm64: add PTE_UXN/PTE_WRITE to SWAPPER_*_FLAGS
With PIE enabled, the swapper PTEs would have a Permission Indirection Index
(PIIndex) of 0. A PIIndex of 0 is not currently used by any other PTEs.

To avoid using index 0 specifically for the swapper PTEs, mark them as
PTE_UXN and PTE_WRITE, so that they map to a PAGE_KERNEL_EXEC equivalent.

This also adds PTE_WRITE to KPTI_NG_PTE_FLAGS, which was tested by booting
with kpti=on.

Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20230606145859.697944-12-joey.gouly@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-06-06 16:52:41 +01:00
Jisheng Zhang
0e2aba6948 arm64: mm: pass original fault address to handle_mm_fault() in PER_VMA_LOCK block
When reading the arm64's PER_VMA_LOCK support code, I found a bit
difference between arm64 and other arch when calling handle_mm_fault()
during VMA lock-based page fault handling: the fault address is masked
before passing to handle_mm_fault(). This is also different from the
usage in mmap_lock-based handling. I think we need to pass the
original fault address to handle_mm_fault() as we did in
commit 84c5e23ede ("arm64: mm: Pass original fault address to
handle_mm_fault()").

If we go through the code path further, we can find that the "masked"
fault address can cause mismatched fault address between perf sw
major/minor page fault sw event and perf page fault sw event:

do_page_fault
  perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, ..., addr)   // orig addr
  handle_mm_fault
    mm_account_fault
      perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, ...) // masked addr

Fixes: cd7f176aea ("arm64/mm: try VMA lock-based page fault handling first")
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20230524131305.2808-1-jszhang@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2023-06-02 13:02:44 +01:00
Mark Brown
1f9d4ba683 arm64/esr: Add decode of ISS2 to data abort reporting
The architecture has added more information about faults to ISS2 within
ESR. Add decode of this to our data abort fault decode to aid diagnostics.
Features that are not currently enabled are included here for completeness.

Since the architecture specifies the values of bits within ISS2 in terms
of ISS2 rather than in terms of the register as a whole we do so for our
definitions as well, this makes it easier to review bitfield definitions.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20230417-arm64-iss2-dabt-decode-v3-2-c1fa503e503a@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-05-26 10:11:42 +01:00
Arnd Bergmann
e13d32e992 arm64: move early_brk64 prototype to header
The prototype used for calling early_brk64() is in the file that calls
it, which is the wrong place, as it is not included for the definition:

arch/arm64/kernel/traps.c:1100:12: error: no previous prototype for 'early_brk64' [-Werror=missing-prototypes]

Move it to an appropriate header instead.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20230516160642.523862-15-arnd@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-05-25 17:44:03 +01:00
Arnd Bergmann
1a11839389 arm64: flush: include linux/libnvdimm.h
The two cache management functions are declared in libnvdimm.h
but provided by architecture specific code. Without including
the header, this causes a W=1 warning:

arch/arm64/mm/flush.c:96:6: error: no previous prototype for 'arch_wb_cache_pmem' [-Werror=missing-prototypes]
arch/arm64/mm/flush.c:104:6: error: no previous prototype for 'arch_invalidate_pmem' [-Werror=missing-prototypes]

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20230516160642.523862-12-arnd@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-05-25 17:44:02 +01:00
Peter Collingbourne
2efbafb91e arm64: Also reset KASAN tag if page is not PG_mte_tagged
Consider the following sequence of events:

1) A page in a PROT_READ|PROT_WRITE VMA is faulted.
2) Page migration allocates a page with the KASAN allocator,
   causing it to receive a non-match-all tag, and uses it
   to replace the page faulted in 1.
3) The program uses mprotect() to enable PROT_MTE on the page faulted in 1.

As a result of step 3, we are left with a non-match-all tag for a page
with tags accessible to userspace, which can lead to the same kind of
tag check faults that commit e74a684680 ("arm64: Reset KASAN tag in
copy_highpage with HW tags only") intended to fix.

The general invariant that we have for pages in a VMA with VM_MTE_ALLOWED
is that they cannot have a non-match-all tag. As a result of step 2, the
invariant is broken. This means that the fix in the referenced commit
was incomplete and we also need to reset the tag for pages without
PG_mte_tagged.

Fixes: e5b8d92189 ("arm64: mte: reset the page tag in page->flags")
Cc: <stable@vger.kernel.org> # 5.15
Link: https://linux-review.googlesource.com/id/I7409cdd41acbcb215c2a7417c1e50d37b875beff
Signed-off-by: Peter Collingbourne <pcc@google.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20230420210945.2313627-1-pcc@google.com
Signed-off-by: Will Deacon <will@kernel.org>
2023-05-16 14:58:54 +01:00
Min-Hua Chen
d91d580878 arm64/mm: mark private VM_FAULT_X defines as vm_fault_t
This patch fixes several sparse warnings for fault.c:

arch/arm64/mm/fault.c:493:24: sparse: warning: incorrect type in return expression (different base types)
arch/arm64/mm/fault.c:493:24: sparse:    expected restricted vm_fault_t
arch/arm64/mm/fault.c:493:24: sparse:    got int
arch/arm64/mm/fault.c:501:32: sparse: warning: incorrect type in return expression (different base types)
arch/arm64/mm/fault.c:501:32: sparse:    expected restricted vm_fault_t
arch/arm64/mm/fault.c:501:32: sparse:    got int
arch/arm64/mm/fault.c:503:32: sparse: warning: incorrect type in return expression (different base types)
arch/arm64/mm/fault.c:503:32: sparse:    expected restricted vm_fault_t
arch/arm64/mm/fault.c:503:32: sparse:    got int
arch/arm64/mm/fault.c:511:24: sparse: warning: incorrect type in return expression (different base types)
arch/arm64/mm/fault.c:511:24: sparse:    expected restricted vm_fault_t
arch/arm64/mm/fault.c:511:24: sparse:    got int
arch/arm64/mm/fault.c:670:13: sparse: warning: restricted vm_fault_t degrades to integer
arch/arm64/mm/fault.c:670:13: sparse: warning: restricted vm_fault_t degrades to integer
arch/arm64/mm/fault.c:713:39: sparse: warning: restricted vm_fault_t degrades to integer

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Min-Hua Chen <minhuadotchen@gmail.com>
Link: https://lore.kernel.org/r/20230502151909.128810-1-minhuadotchen@gmail.com
Signed-off-by: Will Deacon <will@kernel.org>
2023-05-16 14:50:43 +01:00
Linus Torvalds
671e148d07 arm64 fixes for -rc1
- Fix regression in CPU erratum workaround when disabling the MMU
 
 - Fix detection of pointer authentication hwcaps
 
 - Avoid writeable, executable ELF sections in vmlinux
 -----BEGIN PGP SIGNATURE-----
 
 iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmRTe+UQHHdpbGxAa2Vy
 bmVsLm9yZwAKCRC3rHDchMFjNJXqB/9C9DrrbHUg9ZPqAIUpXkyaxem4gpIS+kyU
 +ard53uweuQHchuR/x2s2K9Sp/ano5jGnQXEjikNy29Opu2UYI/wmsqdJEn3km8q
 kohTRsiFgQ40Y85/3iJ8ug6+llxCxK6AXdZCskdWTP56Jur0WpNiQd0a/ShYQLdX
 wBHdInT3QpDVzd5bEWDtUEj4H//tTCy4rESQyGsLhrHgb/x8uZKgZMtPJp6+Q3Eq
 ofs+PQc0qHr/Ri3ahQOCMbxTbNaLIgUzkyXbZN+y2JtxgE+l8E3Gsir2+Pv7mcSx
 1gCSLCmpwE7rVJpTykN+jA6OSsoSUSJHXs6565nF4n8+ugdL7aqR
 =Tba+
 -----END PGP SIGNATURE-----

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fixes from Will Deacon:
 "A few arm64 fixes that came in during the merge window for -rc1.

  The main thing is restoring the pointer authentication hwcaps, which
  disappeared during some recent refactoring

   - Fix regression in CPU erratum workaround when disabling the MMU

   - Fix detection of pointer authentication hwcaps

   - Avoid writeable, executable ELF sections in vmlinux"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: lds: move .got section out of .text
  arm64: kernel: remove SHF_WRITE|SHF_EXECINSTR from .idmap.text
  arm64: cpufeature: Fix pointer auth hwcaps
  arm64: Fix label placement in record_mmu_state()
2023-05-04 12:45:32 -07:00
ndesaulniers@google.com
4df69e0df2 arm64: kernel: remove SHF_WRITE|SHF_EXECINSTR from .idmap.text
commit d54170812e ("arm64: fix .idmap.text assertion for large kernels")
modified some of the section assembler directives that declare
.idmap.text to be SHF_ALLOC instead of
SHF_ALLOC|SHF_WRITE|SHF_EXECINSTR.

This patch fixes up the remaining stragglers that were left behind.  Add
Fixes tag so that this doesn't precede related change in stable.

Fixes: d54170812e ("arm64: fix .idmap.text assertion for large kernels")
Reported-by: Greg Thelen <gthelen@google.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Link: https://lore.kernel.org/r/20230428-awx-v2-1-b197ffa16edc@google.com
Signed-off-by: Will Deacon <will@kernel.org>
2023-05-02 12:42:22 +01:00
Linus Torvalds
7fa8a8ee94 - Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of
switching from a user process to a kernel thread.
 
 - More folio conversions from Kefeng Wang, Zhang Peng and Pankaj Raghav.
 
 - zsmalloc performance improvements from Sergey Senozhatsky.
 
 - Yue Zhao has found and fixed some data race issues around the
   alteration of memcg userspace tunables.
 
 - VFS rationalizations from Christoph Hellwig:
 
   - removal of most of the callers of write_one_page().
 
   - make __filemap_get_folio()'s return value more useful
 
 - Luis Chamberlain has changed tmpfs so it no longer requires swap
   backing.  Use `mount -o noswap'.
 
 - Qi Zheng has made the slab shrinkers operate locklessly, providing
   some scalability benefits.
 
 - Keith Busch has improved dmapool's performance, making part of its
   operations O(1) rather than O(n).
 
 - Peter Xu adds the UFFD_FEATURE_WP_UNPOPULATED feature to userfaultd,
   permitting userspace to wr-protect anon memory unpopulated ptes.
 
 - Kirill Shutemov has changed MAX_ORDER's meaning to be inclusive rather
   than exclusive, and has fixed a bunch of errors which were caused by its
   unintuitive meaning.
 
 - Axel Rasmussen give userfaultfd the UFFDIO_CONTINUE_MODE_WP feature,
   which causes minor faults to install a write-protected pte.
 
 - Vlastimil Babka has done some maintenance work on vma_merge():
   cleanups to the kernel code and improvements to our userspace test
   harness.
 
 - Cleanups to do_fault_around() by Lorenzo Stoakes.
 
 - Mike Rapoport has moved a lot of initialization code out of various
   mm/ files and into mm/mm_init.c.
 
 - Lorenzo Stoakes removd vmf_insert_mixed_prot(), which was added for
   DRM, but DRM doesn't use it any more.
 
 - Lorenzo has also coverted read_kcore() and vread() to use iterators
   and has thereby removed the use of bounce buffers in some cases.
 
 - Lorenzo has also contributed further cleanups of vma_merge().
 
 - Chaitanya Prakash provides some fixes to the mmap selftesting code.
 
 - Matthew Wilcox changes xfs and afs so they no longer take sleeping
   locks in ->map_page(), a step towards RCUification of pagefaults.
 
 - Suren Baghdasaryan has improved mmap_lock scalability by switching to
   per-VMA locking.
 
 - Frederic Weisbecker has reworked the percpu cache draining so that it
   no longer causes latency glitches on cpu isolated workloads.
 
 - Mike Rapoport cleans up and corrects the ARCH_FORCE_MAX_ORDER Kconfig
   logic.
 
 - Liu Shixin has changed zswap's initialization so we no longer waste a
   chunk of memory if zswap is not being used.
 
 - Yosry Ahmed has improved the performance of memcg statistics flushing.
 
 - David Stevens has fixed several issues involving khugepaged,
   userfaultfd and shmem.
 
 - Christoph Hellwig has provided some cleanup work to zram's IO-related
   code paths.
 
 - David Hildenbrand has fixed up some issues in the selftest code's
   testing of our pte state changing.
 
 - Pankaj Raghav has made page_endio() unneeded and has removed it.
 
 - Peter Xu contributed some rationalizations of the userfaultfd
   selftests.
 
 - Yosry Ahmed has fixed an issue around memcg's page recalim accounting.
 
 - Chaitanya Prakash has fixed some arm-related issues in the
   selftests/mm code.
 
 - Longlong Xia has improved the way in which KSM handles hwpoisoned
   pages.
 
 - Peter Xu fixes a few issues with uffd-wp at fork() time.
 
 - Stefan Roesch has changed KSM so that it may now be used on a
   per-process and per-cgroup basis.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZEr3zQAKCRDdBJ7gKXxA
 jlLoAP0fpQBipwFxED0Us4SKQfupV6z4caXNJGPeay7Aj11/kQD/aMRC2uPfgr96
 eMG3kwn2pqkB9ST2QpkaRbxA//eMbQY=
 =J+Dj
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2023-04-27-15-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of
   switching from a user process to a kernel thread.

 - More folio conversions from Kefeng Wang, Zhang Peng and Pankaj
   Raghav.

 - zsmalloc performance improvements from Sergey Senozhatsky.

 - Yue Zhao has found and fixed some data race issues around the
   alteration of memcg userspace tunables.

 - VFS rationalizations from Christoph Hellwig:
     - removal of most of the callers of write_one_page()
     - make __filemap_get_folio()'s return value more useful

 - Luis Chamberlain has changed tmpfs so it no longer requires swap
   backing. Use `mount -o noswap'.

 - Qi Zheng has made the slab shrinkers operate locklessly, providing
   some scalability benefits.

 - Keith Busch has improved dmapool's performance, making part of its
   operations O(1) rather than O(n).

 - Peter Xu adds the UFFD_FEATURE_WP_UNPOPULATED feature to userfaultd,
   permitting userspace to wr-protect anon memory unpopulated ptes.

 - Kirill Shutemov has changed MAX_ORDER's meaning to be inclusive
   rather than exclusive, and has fixed a bunch of errors which were
   caused by its unintuitive meaning.

 - Axel Rasmussen give userfaultfd the UFFDIO_CONTINUE_MODE_WP feature,
   which causes minor faults to install a write-protected pte.

 - Vlastimil Babka has done some maintenance work on vma_merge():
   cleanups to the kernel code and improvements to our userspace test
   harness.

 - Cleanups to do_fault_around() by Lorenzo Stoakes.

 - Mike Rapoport has moved a lot of initialization code out of various
   mm/ files and into mm/mm_init.c.

 - Lorenzo Stoakes removd vmf_insert_mixed_prot(), which was added for
   DRM, but DRM doesn't use it any more.

 - Lorenzo has also coverted read_kcore() and vread() to use iterators
   and has thereby removed the use of bounce buffers in some cases.

 - Lorenzo has also contributed further cleanups of vma_merge().

 - Chaitanya Prakash provides some fixes to the mmap selftesting code.

 - Matthew Wilcox changes xfs and afs so they no longer take sleeping
   locks in ->map_page(), a step towards RCUification of pagefaults.

 - Suren Baghdasaryan has improved mmap_lock scalability by switching to
   per-VMA locking.

 - Frederic Weisbecker has reworked the percpu cache draining so that it
   no longer causes latency glitches on cpu isolated workloads.

 - Mike Rapoport cleans up and corrects the ARCH_FORCE_MAX_ORDER Kconfig
   logic.

 - Liu Shixin has changed zswap's initialization so we no longer waste a
   chunk of memory if zswap is not being used.

 - Yosry Ahmed has improved the performance of memcg statistics
   flushing.

 - David Stevens has fixed several issues involving khugepaged,
   userfaultfd and shmem.

 - Christoph Hellwig has provided some cleanup work to zram's IO-related
   code paths.

 - David Hildenbrand has fixed up some issues in the selftest code's
   testing of our pte state changing.

 - Pankaj Raghav has made page_endio() unneeded and has removed it.

 - Peter Xu contributed some rationalizations of the userfaultfd
   selftests.

 - Yosry Ahmed has fixed an issue around memcg's page recalim
   accounting.

 - Chaitanya Prakash has fixed some arm-related issues in the
   selftests/mm code.

 - Longlong Xia has improved the way in which KSM handles hwpoisoned
   pages.

 - Peter Xu fixes a few issues with uffd-wp at fork() time.

 - Stefan Roesch has changed KSM so that it may now be used on a
   per-process and per-cgroup basis.

* tag 'mm-stable-2023-04-27-15-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (369 commits)
  mm,unmap: avoid flushing TLB in batch if PTE is inaccessible
  shmem: restrict noswap option to initial user namespace
  mm/khugepaged: fix conflicting mods to collapse_file()
  sparse: remove unnecessary 0 values from rc
  mm: move 'mmap_min_addr' logic from callers into vm_unmapped_area()
  hugetlb: pte_alloc_huge() to replace huge pte_alloc_map()
  maple_tree: fix allocation in mas_sparse_area()
  mm: do not increment pgfault stats when page fault handler retries
  zsmalloc: allow only one active pool compaction context
  selftests/mm: add new selftests for KSM
  mm: add new KSM process and sysfs knobs
  mm: add new api to enable ksm per process
  mm: shrinkers: fix debugfs file permissions
  mm: don't check VMA write permissions if the PTE/PMD indicates write permissions
  migrate_pages_batch: fix statistics for longterm pin retry
  userfaultfd: use helper function range_in_vma()
  lib/show_mem.c: use for_each_populated_zone() simplify code
  mm: correct arg in reclaim_pages()/reclaim_clean_pages_from_list()
  fs/buffer: convert create_page_buffers to folio_create_buffers
  fs/buffer: add folio_create_empty_buffers helper
  ...
2023-04-27 19:42:02 -07:00
Will Deacon
1bb31cc7af Merge branch 'for-next/mm' into for-next/core
* for-next/mm:
  arm64: mm: always map fixmap at page granularity
  arm64: mm: move fixmap code to its own file
  arm64: add FIXADDR_TOT_{START,SIZE}
  Revert "Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()""
  arm: uaccess: Remove memcpy_page_flushcache()
  mm,kfence: decouple kfence from page granularity mapping judgement
2023-04-20 11:22:33 +01:00
Baoquan He
504cae453f arm64: kdump: defer the crashkernel reservation for platforms with no DMA memory zones
In commit 031495635b ("arm64: Do not defer reserve_crashkernel() for
platforms with no DMA memory zones"), reserve_crashkernel() is called
much earlier in arm64_memblock_init() to avoid causing base apge
mapping on platforms with no DMA meomry zones.

With taking off protection on crashkernel memory region, no need to call
reserve_crashkernel() specially in advance. The deferred invocation of
reserve_crashkernel() in bootmem_init() can cover all cases. So revert
the whole commit now.

Signed-off-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com>
Link: https://lore.kernel.org/r/20230407011507.17572-4-bhe@redhat.com
Signed-off-by: Will Deacon <will@kernel.org>
2023-04-11 19:24:46 +01:00
Baoquan He
04a2a7af3d arm64: kdump: do not map crashkernel region specifically
After taking off the protection functions on crashkernel memory region,
there's no need to map crashkernel region with page granularity during
linear mapping.

With this change, the system can make use of block or section mapping
on linear region to largely improve perforcemence during system bootup
and running.

Signed-off-by: Baoquan He <bhe@redhat.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com>
Link: https://lore.kernel.org/r/20230407011507.17572-3-bhe@redhat.com
Signed-off-by: Will Deacon <will@kernel.org>
2023-04-11 19:24:46 +01:00
Mark Rutland
414c109bdf arm64: mm: always map fixmap at page granularity
Today the fixmap code largely maps elements at PAGE_SIZE granularity,
but we special-case the FDT mapping such that it can be mapped with 2M
block mappings when 4K pages are in use. The original rationale for this
was simplicity, but it has some unfortunate side-effects, and
complicates portions of the fixmap code (i.e. is not so simple after
all).

The FDT can be up to 2M in size but is only required to have 8-byte
alignment, and so it may straddle a 2M boundary. Thus when using 2M
block mappings we may map up to 4M of memory surrounding the FDT. This
is unfortunate as most of that memory will be unrelated to the FDT, and
any pages which happen to share a 2M block with the FDT will by mapped
with Normal Write-Back Cacheable attributes, which might not be what we
want elsewhere (e.g. for carve-outs using Non-Cacheable attributes).

The logic to handle mapping the FDT with 2M blocks requires some special
cases in the fixmap code, and ties it to the early page table
configuration by virtue of the SWAPPER_TABLE_SHIFT and
SWAPPER_BLOCK_SIZE constants used to determine the granularity used to
map the FDT.

This patch simplifies the FDT logic and removes the unnecessary mappings
of surrounding pages by always mapping the FDT at page granularity as
with all other fixmap mappings. To do so we statically reserve multiple
PTE tables to cover the fixmap VA range. Since the FDT can be at most
2M, for 4K pages we only need to allocate a single additional PTE table,
and for 16K and 64K pages the existing single PTE table is sufficient.

The PTE table allocation scales with the number of slots reserved in the
fixmap, and so this also makes it easier to add more fixmap entries if
we require those in future.

Our VA layout means that the fixmap will always fall within a single PMD
table (and consequently, within a single PUD/P4D/PGD entry), which we
can verify at compile time with a static_assert(). With that assert a
number of runtime warnings become impossible, and are removed.

I've boot-tested this patch with both 4K and 64K pages.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Link: https://lore.kernel.org/r/20230406152759.4164229-4-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2023-04-11 18:55:28 +01:00
Mark Rutland
b97547761b arm64: mm: move fixmap code to its own file
Over time, arm64's mm/mmu.c has become increasingly large and painful to
navigate. Move the fixmap code to its own file where it can be understood in
isolation.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Link: https://lore.kernel.org/r/20230406152759.4164229-3-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2023-04-11 18:55:28 +01:00
Mark Rutland
32f5b6995f arm64: add FIXADDR_TOT_{START,SIZE}
Currently arm64's FIXADDR_{START,SIZE} definitions only cover the
runtime fixmap slots (and not the boot-time fixmap slots), but the code
for creating the fixmap assumes that these definitions cover the entire
fixmap range. This means that the ptdump boundaries are reported in a
misleading way, missing the VA region of the runtime slots. In theory
this could also cause the fixmap creation to go wrong if the boot-time
fixmap slots end up spilling into a separate PMD entry, though luckily
this is not currently the case in any configuration.

While it seems like we could extend FIXADDR_{START,SIZE} to cover the
entire fixmap area, core code relies upon these *only* covering the
runtime slots. For example, fix_to_virt() and virt_to_fix() try to
reject manipulation of the boot-time slots based upon
FIXADDR_{START,SIZE}, while __fix_to_virt() and __virt_to_fix() can
handle any fixmap slot.

This patch follows the lead of x86 in commit:

  55f49fcb87 ("x86/mm: Fix overlap of i386 CPU_ENTRY_AREA with FIX_BTMAP")

... and add new FIXADDR_TOT_{START,SIZE} definitions which cover the
entire fixmap area, using these for the fixmap creation and ptdump code.

As the boot-time fixmap slots are now rejected by fix_to_virt(),
the early_fixmap_init() code is changed to consistently use
__fix_to_virt(), as it already does in a few cases.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Link: https://lore.kernel.org/r/20230406152759.4164229-2-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2023-04-11 18:55:28 +01:00
Suren Baghdasaryan
cd7f176aea arm64/mm: try VMA lock-based page fault handling first
Attempt VMA lock-based page fault handling first, and fall back to the
existing mmap_lock-based handling if that fails.

Link: https://lkml.kernel.org/r/20230227173632.3292573-31-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-05 20:03:01 -07:00
Will Deacon
7bd6680b47 Revert "Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()""
This reverts commit b7d9aae404.

With the Qualcomm remoteproc driver now modified to use a carveout
memory region in 57f72170a2 ("remoteproc: qcom_q6v5_mss: Use a
carveout to authenticate modem headers"), we can reinstate c44094eee3
("arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()")
which relaxes the arm64 implementation of arch_dma_prep_coherent() to
perform only a data cache clean operation, rather than a
clean-and-invalidate.

Signed-off-by: Will Deacon <will@kernel.org>
2023-03-30 16:23:03 +01:00
Zhenhua Huang
bfa7965b33 mm,kfence: decouple kfence from page granularity mapping judgement
Kfence only needs its pool to be mapped as page granularity, if it is
inited early. Previous judgement was a bit over protected. From [1], Mark
suggested to "just map the KFENCE region a page granularity". So I
decouple it from judgement and do page granularity mapping for kfence
pool only. Need to be noticed that late init of kfence pool still requires
page granularity mapping.

Page granularity mapping in theory cost more(2M per 1GB) memory on arm64
platform. Like what I've tested on QEMU(emulated 1GB RAM) with
gki_defconfig, also turning off rodata protection:
Before:
[root@liebao ]# cat /proc/meminfo
MemTotal:         999484 kB
After:
[root@liebao ]# cat /proc/meminfo
MemTotal:        1001480 kB

To implement this, also relocate the kfence pool allocation before the
linear mapping setting up, arm64_kfence_alloc_pool is to allocate phys
addr, __kfence_pool is to be set after linear mapping set up.

LINK: [1] https://lore.kernel.org/linux-arm-kernel/Y+IsdrvDNILA59UN@FVFF77S0Q05N/
Suggested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Zhenhua Huang <quic_zhenhuah@quicinc.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Marco Elver <elver@google.com>
Link: https://lore.kernel.org/r/1679066974-690-1-git-send-email-quic_zhenhuah@quicinc.com
Signed-off-by: Will Deacon <will@kernel.org>
2023-03-27 16:15:20 +01:00
Linus Torvalds
39ce4395c3 arm64 fixes:
- In copy_highpage(), only reset the tag of the destination pointer if
   KASAN_HW_TAGS is enabled so that user-space MTE does not interfere
   with KASAN_SW_TAGS (which relies on top-byte-ignore).
 
 - Remove warning if SME is detected without SVE, the kernel can cope
   with such configuration (though none in the field currently).
 
 - In cfi_handler(), pass the ESR_EL1 value to die() for consistency with
   other die() callers.
 
 - Disable HUGETLB_PAGE_OPTIMIZE_VMEMMAP on arm64 since the pte
   manipulation from the generic vmemmap_remap_pte() does not follow the
   required ARM break-before-make sequence (clear the pte, flush the
   TLBs, set the new pte). It may be re-enabled once this sequence is
   sorted.
 
 - Fix possible memory leak in the arm64 ACPI code if the SMCCC version
   and conduit checks fail.
 
 - Forbid CALL_OPS with CC_OPTIMIZE_FOR_SIZE since gcc ignores
   -falign-functions=N with -Os.
 
 - Don't pretend KASLR is enabled if offset < MIN_KIMG_ALIGN as no
   randomisation would actually take place.
 -----BEGIN PGP SIGNATURE-----
 
 iQIyBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmQBHFEACgkQa9axLQDI
 XvHf9Q/3Zg8o/8HchnWSvzgV//9ljGrrDfAjbfZHrE2W4PCniSd0op0uXYsVK3IH
 Nk6ZDiRe5uIXKgHuSq5caOoL4aRk0hk1TpQ3RKCuh8E3ybhQe9gwYm8xEWXDSSWh
 QzcfENsKlZLpuMoSMILJ2NlMPMbMLprXNCUlgENBbRT7KUToHZKTwE6BL2AUI3tg
 RdMntccorybxk1hiXV1YKT8482i+x2gAnylYXFsq3eI+G54rdfiks+tft0CQV3ng
 1/i1PfbnGC45sBoxXPqYXzBSUDNHpAqb5dwvtlVinGo3J6STxIvbM6Zi5Ma5hl3u
 QrhwyduwCTZ6wVOqzd4KAH9gmhJSzRG75OzCek2dTwU9KXVMOPEvp1ZfTwUXDx7J
 5j8UkjGgrbtj6IioGqBAO/HiFfoty8EBtmlSZIj0thwxkM73ZBG6efQOJaVWh85m
 ioUzMC2Y5yfKLfHEcy9yKIQVizMYoz6fl+QHOEbVSoFhJKNRc4wt5CCJCvsbMHsu
 K8rvD/CI9jFMP9GEK7ObTaC7ICjUz/+8wbIrRrm5ObRQ65Tm2zv3OLqGnK8O5O4W
 gcDEraTnSPHDUtgG6dAEPFN5Wi9hT3zYC0xAcNhc3aZC5ofS5RD6YXIWJvqjWrvL
 5k8G1gfa57C/hfxO6pPw7bg/nY8vvYpxUkZ9erRWD430g7y0Sg==
 =jfPa
 -----END PGP SIGNATURE-----

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fixes from Catalin Marinas:

 - In copy_highpage(), only reset the tag of the destination pointer if
   KASAN_HW_TAGS is enabled so that user-space MTE does not interfere
   with KASAN_SW_TAGS (which relies on top-byte-ignore).

 - Remove warning if SME is detected without SVE, the kernel can cope
   with such configuration (though none in the field currently).

 - In cfi_handler(), pass the ESR_EL1 value to die() for consistency
   with other die() callers.

 - Disable HUGETLB_PAGE_OPTIMIZE_VMEMMAP on arm64 since the pte
   manipulation from the generic vmemmap_remap_pte() does not follow the
   required ARM break-before-make sequence (clear the pte, flush the
   TLBs, set the new pte). It may be re-enabled once this sequence is
   sorted.

 - Fix possible memory leak in the arm64 ACPI code if the SMCCC version
   and conduit checks fail.

 - Forbid CALL_OPS with CC_OPTIMIZE_FOR_SIZE since gcc ignores
  -falign-functions=N with -Os.

 - Don't pretend KASLR is enabled if offset < MIN_KIMG_ALIGN as no
   randomisation would actually take place.

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: kaslr: don't pretend KASLR is enabled if offset < MIN_KIMG_ALIGN
  arm64: ftrace: forbid CALL_OPS with CC_OPTIMIZE_FOR_SIZE
  arm64: acpi: Fix possible memory leak of ffh_ctxt
  arm64: mm: hugetlb: Disable HUGETLB_PAGE_OPTIMIZE_VMEMMAP
  arm64: pass ESR_ELx to die() of cfi_handler
  arm64/fpsimd: Remove warning for SME without SVE
  arm64: Reset KASAN tag in copy_highpage with HW tags only
2023-03-02 14:57:53 -08:00
Linus Torvalds
3822a7c409 - Daniel Verkamp has contributed a memfd series ("mm/memfd: add
F_SEAL_EXEC") which permits the setting of the memfd execute bit at
   memfd creation time, with the option of sealing the state of the X bit.
 
 - Peter Xu adds a patch series ("mm/hugetlb: Make huge_pte_offset()
   thread-safe for pmd unshare") which addresses a rare race condition
   related to PMD unsharing.
 
 - Several folioification patch serieses from Matthew Wilcox, Vishal
   Moola, Sidhartha Kumar and Lorenzo Stoakes
 
 - Johannes Weiner has a series ("mm: push down lock_page_memcg()") which
   does perform some memcg maintenance and cleanup work.
 
 - SeongJae Park has added DAMOS filtering to DAMON, with the series
   "mm/damon/core: implement damos filter".  These filters provide users
   with finer-grained control over DAMOS's actions.  SeongJae has also done
   some DAMON cleanup work.
 
 - Kairui Song adds a series ("Clean up and fixes for swap").
 
 - Vernon Yang contributed the series "Clean up and refinement for maple
   tree".
 
 - Yu Zhao has contributed the "mm: multi-gen LRU: memcg LRU" series.  It
   adds to MGLRU an LRU of memcgs, to improve the scalability of global
   reclaim.
 
 - David Hildenbrand has added some userfaultfd cleanup work in the
   series "mm: uffd-wp + change_protection() cleanups".
 
 - Christoph Hellwig has removed the generic_writepages() library
   function in the series "remove generic_writepages".
 
 - Baolin Wang has performed some maintenance on the compaction code in
   his series "Some small improvements for compaction".
 
 - Sidhartha Kumar is doing some maintenance work on struct page in his
   series "Get rid of tail page fields".
 
 - David Hildenbrand contributed some cleanup, bugfixing and
   generalization of pte management and of pte debugging in his series "mm:
   support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with swap
   PTEs".
 
 - Mel Gorman and Neil Brown have removed the __GFP_ATOMIC allocation
   flag in the series "Discard __GFP_ATOMIC".
 
 - Sergey Senozhatsky has improved zsmalloc's memory utilization with his
   series "zsmalloc: make zspage chain size configurable".
 
 - Joey Gouly has added prctl() support for prohibiting the creation of
   writeable+executable mappings.  The previous BPF-based approach had
   shortcomings.  See "mm: In-kernel support for memory-deny-write-execute
   (MDWE)".
 
 - Waiman Long did some kmemleak cleanup and bugfixing in the series
   "mm/kmemleak: Simplify kmemleak_cond_resched() & fix UAF".
 
 - T.J.  Alumbaugh has contributed some MGLRU cleanup work in his series
   "mm: multi-gen LRU: improve".
 
 - Jiaqi Yan has provided some enhancements to our memory error
   statistics reporting, mainly by presenting the statistics on a per-node
   basis.  See the series "Introduce per NUMA node memory error
   statistics".
 
 - Mel Gorman has a second and hopefully final shot at fixing a CPU-hog
   regression in compaction via his series "Fix excessive CPU usage during
   compaction".
 
 - Christoph Hellwig does some vmalloc maintenance work in the series
   "cleanup vfree and vunmap".
 
 - Christoph Hellwig has removed block_device_operations.rw_page() in ths
   series "remove ->rw_page".
 
 - We get some maple_tree improvements and cleanups in Liam Howlett's
   series "VMA tree type safety and remove __vma_adjust()".
 
 - Suren Baghdasaryan has done some work on the maintainability of our
   vm_flags handling in the series "introduce vm_flags modifier functions".
 
 - Some pagemap cleanup and generalization work in Mike Rapoport's series
   "mm, arch: add generic implementation of pfn_valid() for FLATMEM" and
   "fixups for generic implementation of pfn_valid()"
 
 - Baoquan He has done some work to make /proc/vmallocinfo and
   /proc/kcore better represent the real state of things in his series
   "mm/vmalloc.c: allow vread() to read out vm_map_ram areas".
 
 - Jason Gunthorpe rationalized the GUP system's interface to the rest of
   the kernel in the series "Simplify the external interface for GUP".
 
 - SeongJae Park wishes to migrate people from DAMON's debugfs interface
   over to its sysfs interface.  To support this, we'll temporarily be
   printing warnings when people use the debugfs interface.  See the series
   "mm/damon: deprecate DAMON debugfs interface".
 
 - Andrey Konovalov provided the accurately named "lib/stackdepot: fixes
   and clean-ups" series.
 
 - Huang Ying has provided a dramatic reduction in migration's TLB flush
   IPI rates with the series "migrate_pages(): batch TLB flushing".
 
 - Arnd Bergmann has some objtool fixups in "objtool warning fixes".
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY/PoPQAKCRDdBJ7gKXxA
 jlvpAPsFECUBBl20qSue2zCYWnHC7Yk4q9ytTkPB/MMDrFEN9wD/SNKEm2UoK6/K
 DmxHkn0LAitGgJRS/W9w81yrgig9tAQ=
 =MlGs
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - Daniel Verkamp has contributed a memfd series ("mm/memfd: add
   F_SEAL_EXEC") which permits the setting of the memfd execute bit at
   memfd creation time, with the option of sealing the state of the X
   bit.

 - Peter Xu adds a patch series ("mm/hugetlb: Make huge_pte_offset()
   thread-safe for pmd unshare") which addresses a rare race condition
   related to PMD unsharing.

 - Several folioification patch serieses from Matthew Wilcox, Vishal
   Moola, Sidhartha Kumar and Lorenzo Stoakes

 - Johannes Weiner has a series ("mm: push down lock_page_memcg()")
   which does perform some memcg maintenance and cleanup work.

 - SeongJae Park has added DAMOS filtering to DAMON, with the series
   "mm/damon/core: implement damos filter".

   These filters provide users with finer-grained control over DAMOS's
   actions. SeongJae has also done some DAMON cleanup work.

 - Kairui Song adds a series ("Clean up and fixes for swap").

 - Vernon Yang contributed the series "Clean up and refinement for maple
   tree".

 - Yu Zhao has contributed the "mm: multi-gen LRU: memcg LRU" series. It
   adds to MGLRU an LRU of memcgs, to improve the scalability of global
   reclaim.

 - David Hildenbrand has added some userfaultfd cleanup work in the
   series "mm: uffd-wp + change_protection() cleanups".

 - Christoph Hellwig has removed the generic_writepages() library
   function in the series "remove generic_writepages".

 - Baolin Wang has performed some maintenance on the compaction code in
   his series "Some small improvements for compaction".

 - Sidhartha Kumar is doing some maintenance work on struct page in his
   series "Get rid of tail page fields".

 - David Hildenbrand contributed some cleanup, bugfixing and
   generalization of pte management and of pte debugging in his series
   "mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with
   swap PTEs".

 - Mel Gorman and Neil Brown have removed the __GFP_ATOMIC allocation
   flag in the series "Discard __GFP_ATOMIC".

 - Sergey Senozhatsky has improved zsmalloc's memory utilization with
   his series "zsmalloc: make zspage chain size configurable".

 - Joey Gouly has added prctl() support for prohibiting the creation of
   writeable+executable mappings.

   The previous BPF-based approach had shortcomings. See "mm: In-kernel
   support for memory-deny-write-execute (MDWE)".

 - Waiman Long did some kmemleak cleanup and bugfixing in the series
   "mm/kmemleak: Simplify kmemleak_cond_resched() & fix UAF".

 - T.J. Alumbaugh has contributed some MGLRU cleanup work in his series
   "mm: multi-gen LRU: improve".

 - Jiaqi Yan has provided some enhancements to our memory error
   statistics reporting, mainly by presenting the statistics on a
   per-node basis. See the series "Introduce per NUMA node memory error
   statistics".

 - Mel Gorman has a second and hopefully final shot at fixing a CPU-hog
   regression in compaction via his series "Fix excessive CPU usage
   during compaction".

 - Christoph Hellwig does some vmalloc maintenance work in the series
   "cleanup vfree and vunmap".

 - Christoph Hellwig has removed block_device_operations.rw_page() in
   ths series "remove ->rw_page".

 - We get some maple_tree improvements and cleanups in Liam Howlett's
   series "VMA tree type safety and remove __vma_adjust()".

 - Suren Baghdasaryan has done some work on the maintainability of our
   vm_flags handling in the series "introduce vm_flags modifier
   functions".

 - Some pagemap cleanup and generalization work in Mike Rapoport's
   series "mm, arch: add generic implementation of pfn_valid() for
   FLATMEM" and "fixups for generic implementation of pfn_valid()"

 - Baoquan He has done some work to make /proc/vmallocinfo and
   /proc/kcore better represent the real state of things in his series
   "mm/vmalloc.c: allow vread() to read out vm_map_ram areas".

 - Jason Gunthorpe rationalized the GUP system's interface to the rest
   of the kernel in the series "Simplify the external interface for
   GUP".

 - SeongJae Park wishes to migrate people from DAMON's debugfs interface
   over to its sysfs interface. To support this, we'll temporarily be
   printing warnings when people use the debugfs interface. See the
   series "mm/damon: deprecate DAMON debugfs interface".

 - Andrey Konovalov provided the accurately named "lib/stackdepot: fixes
   and clean-ups" series.

 - Huang Ying has provided a dramatic reduction in migration's TLB flush
   IPI rates with the series "migrate_pages(): batch TLB flushing".

 - Arnd Bergmann has some objtool fixups in "objtool warning fixes".

* tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (505 commits)
  include/linux/migrate.h: remove unneeded externs
  mm/memory_hotplug: cleanup return value handing in do_migrate_range()
  mm/uffd: fix comment in handling pte markers
  mm: change to return bool for isolate_movable_page()
  mm: hugetlb: change to return bool for isolate_hugetlb()
  mm: change to return bool for isolate_lru_page()
  mm: change to return bool for folio_isolate_lru()
  objtool: add UACCESS exceptions for __tsan_volatile_read/write
  kmsan: disable ftrace in kmsan core code
  kasan: mark addr_has_metadata __always_inline
  mm: memcontrol: rename memcg_kmem_enabled()
  sh: initialize max_mapnr
  m68k/nommu: add missing definition of ARCH_PFN_OFFSET
  mm: percpu: fix incorrect size in pcpu_obj_full_size()
  maple_tree: reduce stack usage with gcc-9 and earlier
  mm: page_alloc: call panic() when memoryless node allocation fails
  mm: multi-gen LRU: avoid futile retries
  migrate_pages: move THP/hugetlb migration support check to simplify code
  migrate_pages: batch flushing TLB
  migrate_pages: share more code between _unmap and _move
  ...
2023-02-23 17:09:35 -08:00
Peter Collingbourne
e74a684680 arm64: Reset KASAN tag in copy_highpage with HW tags only
During page migration, the copy_highpage function is used to copy the
page data to the target page. If the source page is a userspace page
with MTE tags, the KASAN tag of the target page must have the match-all
tag in order to avoid tag check faults during subsequent accesses to the
page by the kernel. However, the target page may have been allocated in
a number of ways, some of which will use the KASAN allocator and will
therefore end up setting the KASAN tag to a non-match-all tag. Therefore,
update the target page's KASAN tag to match the source page.

We ended up unintentionally fixing this issue as a result of a bad
merge conflict resolution between commit e059853d14 ("arm64: mte:
Fix/clarify the PG_mte_tagged semantics") and commit 20794545c1 ("arm64:
kasan: Revert "arm64: mte: reset the page tag in page->flags""), which
preserved a tag reset for PG_mte_tagged pages which was considered to be
unnecessary at the time. Because SW tags KASAN uses separate tag storage,
update the code to only reset the tags when HW tags KASAN is enabled.

Signed-off-by: Peter Collingbourne <pcc@google.com>
Link: https://linux-review.googlesource.com/id/If303d8a709438d3ff5af5fd85706505830f52e0c
Reported-by: "Kuan-Ying Lee (李冠穎)" <Kuan-Ying.Lee@mediatek.com>
Cc: <stable@vger.kernel.org> # 6.1
Fixes: 20794545c1 ("arm64: kasan: Revert "arm64: mte: reset the page tag in page->flags"")
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Link: https://lore.kernel.org/r/20230215050911.1433132-1-pcc@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-02-22 15:53:48 +00:00
Linus Torvalds
8bf1a529cd arm64 updates for 6.3:
- Support for arm64 SME 2 and 2.1. SME2 introduces a new 512-bit
   architectural register (ZT0, for the look-up table feature) that Linux
   needs to save/restore.
 
 - Include TPIDR2 in the signal context and add the corresponding
   kselftests.
 
 - Perf updates: Arm SPEv1.2 support, HiSilicon uncore PMU updates, ACPI
   support to the Marvell DDR and TAD PMU drivers, reset DTM_PMU_CONFIG
   (ARM CMN) at probe time.
 
 - Support for DYNAMIC_FTRACE_WITH_CALL_OPS on arm64.
 
 - Permit EFI boot with MMU and caches on. Instead of cleaning the entire
   loaded kernel image to the PoC and disabling the MMU and caches before
   branching to the kernel bare metal entry point, leave the MMU and
   caches enabled and rely on EFI's cacheable 1:1 mapping of all of
   system RAM to populate the initial page tables.
 
 - Expose the AArch32 (compat) ELF_HWCAP features to user in an arm64
   kernel (the arm32 kernel only defines the values).
 
 - Harden the arm64 shadow call stack pointer handling: stash the shadow
   stack pointer in the task struct on interrupt, load it directly from
   this structure.
 
 - Signal handling cleanups to remove redundant validation of size
   information and avoid reading the same data from userspace twice.
 
 - Refactor the hwcap macros to make use of the automatically generated
   ID registers. It should make new hwcaps writing less error prone.
 
 - Further arm64 sysreg conversion and some fixes.
 
 - arm64 kselftest fixes and improvements.
 
 - Pointer authentication cleanups: don't sign leaf functions, unify
   asm-arch manipulation.
 
 - Pseudo-NMI code generation optimisations.
 
 - Minor fixes for SME and TPIDR2 handling.
 
 - Miscellaneous updates: ARCH_FORCE_MAX_ORDER is now selectable, replace
   strtobool() to kstrtobool() in the cpufeature.c code, apply dynamic
   shadow call stack in two passes, intercept pfn changes in set_pte_at()
   without the required break-before-make sequence, attempt to dump all
   instructions on unhandled kernel faults.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmP0/QsACgkQa9axLQDI
 XvG+gA/+JDVEH9wRzAIZvbp9hSuohPc48xgAmIMP1eiVB0/5qeRjYAJwS33H0rXS
 BPC2kj9IBy/eQeM9ICg0nFd0zYznSVacITqe6NrqeJ1F+ftS4rrHdfxd+J7kIoCs
 V2L8e+BJvmHdhmNV2qMAgJdGlfxfQBA7fv2cy52HKYcouoOh1AUVR/x+yXVXAsCd
 qJP3+dlUKccgm/oc5unEC1eZ49u8O+EoasqOyfG6K5udMgzhEX3K6imT9J3hw0WT
 UjstYkx5uGS/prUrRCQAX96VCHoZmzEDKtQuHkHvQXEYXsYPF3ldbR2CziNJnHe7
 QfSkjJlt8HAtExA+BkwEe9i0MQO/2VF5qsa2e4fA6l7uqGu3LOtS/jJd23C9n9fR
 Id8aBMeN6S8+MjqRA9L2uf4t6e4ISEHoG9ZRdc4WOwloxEEiJoIeun+7bHdOSZLj
 AFdHFCz4NXiiwC0UP0xPDI2YeCLqt5np7HmnrUqwzRpVO8UUagiJD8TIpcBSjBN9
 J68eidenHUW7/SlIeaMKE2lmo8AUEAJs9AorDSugF19/ThJcQdx7vT2UAZjeVB3j
 1dbbwajnlDOk/w8PQC4thFp5/MDlfst0htS3WRwa+vgkweE2EAdTU4hUZ8qEP7FQ
 smhYtlT1xUSTYDTqoaG/U2OWR6/UU79wP0jgcOsHXTuyYrtPI/Q=
 =VmXL
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Catalin Marinas:

 - Support for arm64 SME 2 and 2.1. SME2 introduces a new 512-bit
   architectural register (ZT0, for the look-up table feature) that
   Linux needs to save/restore

 - Include TPIDR2 in the signal context and add the corresponding
   kselftests

 - Perf updates: Arm SPEv1.2 support, HiSilicon uncore PMU updates, ACPI
   support to the Marvell DDR and TAD PMU drivers, reset DTM_PMU_CONFIG
   (ARM CMN) at probe time

 - Support for DYNAMIC_FTRACE_WITH_CALL_OPS on arm64

 - Permit EFI boot with MMU and caches on. Instead of cleaning the
   entire loaded kernel image to the PoC and disabling the MMU and
   caches before branching to the kernel bare metal entry point, leave
   the MMU and caches enabled and rely on EFI's cacheable 1:1 mapping of
   all of system RAM to populate the initial page tables

 - Expose the AArch32 (compat) ELF_HWCAP features to user in an arm64
   kernel (the arm32 kernel only defines the values)

 - Harden the arm64 shadow call stack pointer handling: stash the shadow
   stack pointer in the task struct on interrupt, load it directly from
   this structure

 - Signal handling cleanups to remove redundant validation of size
   information and avoid reading the same data from userspace twice

 - Refactor the hwcap macros to make use of the automatically generated
   ID registers. It should make new hwcaps writing less error prone

 - Further arm64 sysreg conversion and some fixes

 - arm64 kselftest fixes and improvements

 - Pointer authentication cleanups: don't sign leaf functions, unify
   asm-arch manipulation

 - Pseudo-NMI code generation optimisations

 - Minor fixes for SME and TPIDR2 handling

 - Miscellaneous updates: ARCH_FORCE_MAX_ORDER is now selectable,
   replace strtobool() to kstrtobool() in the cpufeature.c code, apply
   dynamic shadow call stack in two passes, intercept pfn changes in
   set_pte_at() without the required break-before-make sequence, attempt
   to dump all instructions on unhandled kernel faults

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (130 commits)
  arm64: fix .idmap.text assertion for large kernels
  kselftest/arm64: Don't require FA64 for streaming SVE+ZA tests
  kselftest/arm64: Copy whole EXTRA context
  arm64: kprobes: Drop ID map text from kprobes blacklist
  perf: arm_spe: Print the version of SPE detected
  perf: arm_spe: Add support for SPEv1.2 inverted event filtering
  perf: Add perf_event_attr::config3
  arm64/sme: Fix __finalise_el2 SMEver check
  drivers/perf: fsl_imx8_ddr_perf: Remove set-but-not-used variable
  arm64/signal: Only read new data when parsing the ZT context
  arm64/signal: Only read new data when parsing the ZA context
  arm64/signal: Only read new data when parsing the SVE context
  arm64/signal: Avoid rereading context frame sizes
  arm64/signal: Make interface for restore_fpsimd_context() consistent
  arm64/signal: Remove redundant size validation from parse_user_sigframe()
  arm64/signal: Don't redundantly verify FPSIMD magic
  arm64/cpufeature: Use helper macros to specify hwcaps
  arm64/cpufeature: Always use symbolic name for feature value in hwcaps
  arm64/sysreg: Initial unsigned annotations for ID registers
  arm64/sysreg: Initial annotation of signed ID registers
  ...
2023-02-21 15:27:48 -08:00
Catalin Marinas
156010ed9c Merge branches 'for-next/sysreg', 'for-next/sme', 'for-next/kselftest', 'for-next/misc', 'for-next/sme2', 'for-next/tpidr2', 'for-next/scs', 'for-next/compat-hwcap', 'for-next/ftrace', 'for-next/efi-boot-mmu-on', 'for-next/ptrauth' and 'for-next/pseudo-nmi', remote-tracking branch 'arm64/for-next/perf' into for-next/core
* arm64/for-next/perf:
  perf: arm_spe: Print the version of SPE detected
  perf: arm_spe: Add support for SPEv1.2 inverted event filtering
  perf: Add perf_event_attr::config3
  drivers/perf: fsl_imx8_ddr_perf: Remove set-but-not-used variable
  perf: arm_spe: Support new SPEv1.2/v8.7 'not taken' event
  perf: arm_spe: Use new PMSIDR_EL1 register enums
  perf: arm_spe: Drop BIT() and use FIELD_GET/PREP accessors
  arm64/sysreg: Convert SPE registers to automatic generation
  arm64: Drop SYS_ from SPE register defines
  perf: arm_spe: Use feature numbering for PMSEVFR_EL1 defines
  perf/marvell: Add ACPI support to TAD uncore driver
  perf/marvell: Add ACPI support to DDR uncore driver
  perf/arm-cmn: Reset DTM_PMU_CONFIG at probe
  drivers/perf: hisi: Extract initialization of "cpa_pmu->pmu"
  drivers/perf: hisi: Simplify the parameters of hisi_pmu_init()
  drivers/perf: hisi: Advertise the PERF_PMU_CAP_NO_EXCLUDE capability

* for-next/sysreg:
  : arm64 sysreg and cpufeature fixes/updates
  KVM: arm64: Use symbolic definition for ISR_EL1.A
  arm64/sysreg: Add definition of ISR_EL1
  arm64/sysreg: Add definition for ICC_NMIAR1_EL1
  arm64/cpufeature: Remove 4 bit assumption in ARM64_FEATURE_MASK()
  arm64/sysreg: Fix errors in 32 bit enumeration values
  arm64/cpufeature: Fix field sign for DIT hwcap detection

* for-next/sme:
  : SME-related updates
  arm64/sme: Optimise SME exit on syscall entry
  arm64/sme: Don't use streaming mode to probe the maximum SME VL
  arm64/ptrace: Use system_supports_tpidr2() to check for TPIDR2 support

* for-next/kselftest: (23 commits)
  : arm64 kselftest fixes and improvements
  kselftest/arm64: Don't require FA64 for streaming SVE+ZA tests
  kselftest/arm64: Copy whole EXTRA context
  kselftest/arm64: Fix enumeration of systems without 128 bit SME for SSVE+ZA
  kselftest/arm64: Fix enumeration of systems without 128 bit SME
  kselftest/arm64: Don't require FA64 for streaming SVE tests
  kselftest/arm64: Limit the maximum VL we try to set via ptrace
  kselftest/arm64: Correct buffer size for SME ZA storage
  kselftest/arm64: Remove the local NUM_VL definition
  kselftest/arm64: Verify simultaneous SSVE and ZA context generation
  kselftest/arm64: Verify that SSVE signal context has SVE_SIG_FLAG_SM set
  kselftest/arm64: Remove spurious comment from MTE test Makefile
  kselftest/arm64: Support build of MTE tests with clang
  kselftest/arm64: Initialise current at build time in signal tests
  kselftest/arm64: Don't pass headers to the compiler as source
  kselftest/arm64: Remove redundant _start labels from FP tests
  kselftest/arm64: Fix .pushsection for strings in FP tests
  kselftest/arm64: Run BTI selftests on systems without BTI
  kselftest/arm64: Fix test numbering when skipping tests
  kselftest/arm64: Skip non-power of 2 SVE vector lengths in fp-stress
  kselftest/arm64: Only enumerate power of two VLs in syscall-abi
  ...

* for-next/misc:
  : Miscellaneous arm64 updates
  arm64/mm: Intercept pfn changes in set_pte_at()
  Documentation: arm64: correct spelling
  arm64: traps: attempt to dump all instructions
  arm64: Apply dynamic shadow call stack patching in two passes
  arm64: el2_setup.h: fix spelling typo in comments
  arm64: Kconfig: fix spelling
  arm64: cpufeature: Use kstrtobool() instead of strtobool()
  arm64: Avoid repeated AA64MMFR1_EL1 register read on pagefault path
  arm64: make ARCH_FORCE_MAX_ORDER selectable

* for-next/sme2: (23 commits)
  : Support for arm64 SME 2 and 2.1
  arm64/sme: Fix __finalise_el2 SMEver check
  kselftest/arm64: Remove redundant _start labels from zt-test
  kselftest/arm64: Add coverage of SME 2 and 2.1 hwcaps
  kselftest/arm64: Add coverage of the ZT ptrace regset
  kselftest/arm64: Add SME2 coverage to syscall-abi
  kselftest/arm64: Add test coverage for ZT register signal frames
  kselftest/arm64: Teach the generic signal context validation about ZT
  kselftest/arm64: Enumerate SME2 in the signal test utility code
  kselftest/arm64: Cover ZT in the FP stress test
  kselftest/arm64: Add a stress test program for ZT0
  arm64/sme: Add hwcaps for SME 2 and 2.1 features
  arm64/sme: Implement ZT0 ptrace support
  arm64/sme: Implement signal handling for ZT
  arm64/sme: Implement context switching for ZT0
  arm64/sme: Provide storage for ZT0
  arm64/sme: Add basic enumeration for SME2
  arm64/sme: Enable host kernel to access ZT0
  arm64/sme: Manually encode ZT0 load and store instructions
  arm64/esr: Document ISS for ZT0 being disabled
  arm64/sme: Document SME 2 and SME 2.1 ABI
  ...

* for-next/tpidr2:
  : Include TPIDR2 in the signal context
  kselftest/arm64: Add test case for TPIDR2 signal frame records
  kselftest/arm64: Add TPIDR2 to the set of known signal context records
  arm64/signal: Include TPIDR2 in the signal context
  arm64/sme: Document ABI for TPIDR2 signal information

* for-next/scs:
  : arm64: harden shadow call stack pointer handling
  arm64: Stash shadow stack pointer in the task struct on interrupt
  arm64: Always load shadow stack pointer directly from the task struct

* for-next/compat-hwcap:
  : arm64: Expose compat ARMv8 AArch32 features (HWCAPs)
  arm64: Add compat hwcap SSBS
  arm64: Add compat hwcap SB
  arm64: Add compat hwcap I8MM
  arm64: Add compat hwcap ASIMDBF16
  arm64: Add compat hwcap ASIMDFHM
  arm64: Add compat hwcap ASIMDDP
  arm64: Add compat hwcap FPHP and ASIMDHP

* for-next/ftrace:
  : Add arm64 support for DYNAMICE_FTRACE_WITH_CALL_OPS
  arm64: avoid executing padding bytes during kexec / hibernation
  arm64: Implement HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS
  arm64: ftrace: Update stale comment
  arm64: patching: Add aarch64_insn_write_literal_u64()
  arm64: insn: Add helpers for BTI
  arm64: Extend support for CONFIG_FUNCTION_ALIGNMENT
  ACPI: Don't build ACPICA with '-Os'
  Compiler attributes: GCC cold function alignment workarounds
  ftrace: Add DYNAMIC_FTRACE_WITH_CALL_OPS

* for-next/efi-boot-mmu-on:
  : Permit arm64 EFI boot with MMU and caches on
  arm64: kprobes: Drop ID map text from kprobes blacklist
  arm64: head: Switch endianness before populating the ID map
  efi: arm64: enter with MMU and caches enabled
  arm64: head: Clean the ID map and the HYP text to the PoC if needed
  arm64: head: avoid cache invalidation when entering with the MMU on
  arm64: head: record the MMU state at primary entry
  arm64: kernel: move identity map out of .text mapping
  arm64: head: Move all finalise_el2 calls to after __enable_mmu

* for-next/ptrauth:
  : arm64 pointer authentication cleanup
  arm64: pauth: don't sign leaf functions
  arm64: unify asm-arch manipulation

* for-next/pseudo-nmi:
  : Pseudo-NMI code generation optimisations
  arm64: irqflags: use alternative branches for pseudo-NMI logic
  arm64: add ARM64_HAS_GIC_PRIO_RELAXED_SYNC cpucap
  arm64: make ARM64_HAS_GIC_PRIO_MASKING depend on ARM64_HAS_GIC_CPUIF_SYSREGS
  arm64: rename ARM64_HAS_IRQ_PRIO_MASKING to ARM64_HAS_GIC_PRIO_MASKING
  arm64: rename ARM64_HAS_SYSREG_GIC_CPUIF to ARM64_HAS_GIC_CPUIF_SYSREGS
2023-02-10 18:51:49 +00:00
Matthew Wilcox (Oracle)
6bc56a4d85 mm: add vma_alloc_zeroed_movable_folio()
Replace alloc_zeroed_user_highpage_movable().  The main difference is
returning a folio containing a single page instead of returning the page,
but take the opportunity to rename the function to match other allocation
functions a little better and rewrite the documentation to place more
emphasis on the zeroing rather than the highmem aspect.

Link: https://lkml.kernel.org/r/20230116191813.2145215-2-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02 22:33:18 -08:00
Anshuman Khandual
004fc58f91 arm64/mm: Intercept pfn changes in set_pte_at()
Changing pfn on a user page table mapped entry, without first going through
break-before-make (BBM) procedure is unsafe. This just updates set_pte_at()
to intercept such changes, via an updated pgattr_change_is_safe(). This new
check happens via __check_racy_pte_update(), which has now been renamed as
__check_safe_pte_update().

Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20230130121457.1607675-1-anshuman.khandual@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-01-31 15:58:09 +00:00
Ard Biesheuvel
6178617038 efi: arm64: enter with MMU and caches enabled
Instead of cleaning the entire loaded kernel image to the PoC and
disabling the MMU and caches before branching to the kernel's bare metal
entry point, we can leave the MMU and caches enabled, and rely on EFI's
cacheable 1:1 mapping of all of system RAM (which is mandated by the
spec) to populate the initial page tables.

This removes the need for managing coherency in software, which is
tedious and error prone.

Note that we still need to clean the executable region of the image to
the PoU if this is required for I/D coherency, but only if we actually
decided to move the image in memory, as otherwise, this will have been
taken care of by the loader.

This change affects both the builtin EFI stub as well as the zboot
decompressor, which now carries the entire EFI stub along with the
decompression code and the compressed image.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20230111102236.1430401-7-ardb@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-01-24 11:51:08 +00:00
Ard Biesheuvel
af7249b317 arm64: kernel: move identity map out of .text mapping
Reorganize the ID map slightly so that only code that is executed with
the MMU off or via the 1:1 mapping remains. This allows us to move the
identity map out of the .text segment, as it will no longer need
executable permissions via the kernel mapping.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20230111102236.1430401-3-ardb@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-01-24 11:51:07 +00:00
Anshuman Khandual
5db568e748 arm64: errata: Workaround possible Cortex-A715 [ESR|FAR]_ELx corruption
If a Cortex-A715 cpu sees a page mapping permissions change from executable
to non-executable, it may corrupt the ESR_ELx and FAR_ELx registers, on the
next instruction abort caused by permission fault.

Only user-space does executable to non-executable permission transition via
mprotect() system call which calls ptep_modify_prot_start() and ptep_modify
_prot_commit() helpers, while changing the page mapping. The platform code
can override these helpers via __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION.

Work around the problem via doing a break-before-make TLB invalidation, for
all executable user space mappings, that go through mprotect() system call.
This overrides ptep_modify_prot_start() and ptep_modify_prot_commit(), via
defining HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION on the platform thus giving
an opportunity to intercept user space exec mappings, and do the necessary
TLB invalidation. Similar interceptions are also implemented for HugeTLB.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20230102061651.34745-1-anshuman.khandual@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2023-01-06 17:14:55 +00:00
Linus Torvalds
77856d911a arm64 fixes for -rc1
- Fix Kconfig dependencies to re-allow the enabling of function graph
   tracer and shadow call stacks at the same time.
 
 - Revert the workaround for CPU erratum #2645198 since the CONFIG_
   guards were incorrect and the code has therefore not seen any real
   exposure in -next.
 -----BEGIN PGP SIGNATURE-----
 
 iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmOcVWkQHHdpbGxAa2Vy
 bmVsLm9yZwAKCRC3rHDchMFjNF/7B/wIGicobLxXMMkuao+ipm5V/eLnVRuVt6WD
 T//5ZG+3Td3xON+mdt/byIm/Npl1I2l+NDjK9jIFcS5A/Q7DmwbcxJV+6BhRdb7o
 XQxkHsKR2DTbmbeqd0+AkZGJc4jk5D+vuyLeo8jcc6bSpQhepCOV5R5JVOadg9mg
 WxuITwsodI9GfQGmupggF6C31yMCYibmlD9WWNW8tNx8TBojU97pJbQaf1h3bC9i
 JE9CxBVYmt3Qg5BAB46EdH60lxELyHpEjJNvgvZZpFz4a/bBB47mG7Cy+ECldc5p
 LukJnAImydedwgQqgBD0e0HyXoIQ8r8NZ+yNgig2asQxA5DYsI1L
 =IZRW
 -----END PGP SIGNATURE-----

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fixes from Will Deacon:

 - Fix Kconfig dependencies to re-allow the enabling of function graph
   tracer and shadow call stacks at the same time.

 - Revert the workaround for CPU erratum #2645198 since the CONFIG_
   guards were incorrect and the code has therefore not seen any real
   exposure in -next.

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  Revert "arm64: errata: Workaround possible Cortex-A715 [ESR|FAR]_ELx corruption"
  ftrace: Allow WITH_ARGS flavour of graph tracer with shadow call stack
2022-12-16 13:46:41 -06:00
Linus Torvalds
8fa590bf34 ARM64:
* Enable the per-vcpu dirty-ring tracking mechanism, together with an
   option to keep the good old dirty log around for pages that are
   dirtied by something other than a vcpu.
 
 * Switch to the relaxed parallel fault handling, using RCU to delay
   page table reclaim and giving better performance under load.
 
 * Relax the MTE ABI, allowing a VMM to use the MAP_SHARED mapping option,
   which multi-process VMMs such as crosvm rely on (see merge commit 382b5b87a9:
   "Fix a number of issues with MTE, such as races on the tags being
   initialised vs the PG_mte_tagged flag as well as the lack of support
   for VM_SHARED when KVM is involved.  Patches from Catalin Marinas and
   Peter Collingbourne").
 
 * Merge the pKVM shadow vcpu state tracking that allows the hypervisor
   to have its own view of a vcpu, keeping that state private.
 
 * Add support for the PMUv3p5 architecture revision, bringing support
   for 64bit counters on systems that support it, and fix the
   no-quite-compliant CHAIN-ed counter support for the machines that
   actually exist out there.
 
 * Fix a handful of minor issues around 52bit VA/PA support (64kB pages
   only) as a prefix of the oncoming support for 4kB and 16kB pages.
 
 * Pick a small set of documentation and spelling fixes, because no
   good merge window would be complete without those.
 
 s390:
 
 * Second batch of the lazy destroy patches
 
 * First batch of KVM changes for kernel virtual != physical address support
 
 * Removal of a unused function
 
 x86:
 
 * Allow compiling out SMM support
 
 * Cleanup and documentation of SMM state save area format
 
 * Preserve interrupt shadow in SMM state save area
 
 * Respond to generic signals during slow page faults
 
 * Fixes and optimizations for the non-executable huge page errata fix.
 
 * Reprogram all performance counters on PMU filter change
 
 * Cleanups to Hyper-V emulation and tests
 
 * Process Hyper-V TLB flushes from a nested guest (i.e. from a L2 guest
   running on top of a L1 Hyper-V hypervisor)
 
 * Advertise several new Intel features
 
 * x86 Xen-for-KVM:
 
 ** Allow the Xen runstate information to cross a page boundary
 
 ** Allow XEN_RUNSTATE_UPDATE flag behaviour to be configured
 
 ** Add support for 32-bit guests in SCHEDOP_poll
 
 * Notable x86 fixes and cleanups:
 
 ** One-off fixes for various emulation flows (SGX, VMXON, NRIPS=0).
 
 ** Reinstate IBPB on emulated VM-Exit that was incorrectly dropped a few
    years back when eliminating unnecessary barriers when switching between
    vmcs01 and vmcs02.
 
 ** Clean up vmread_error_trampoline() to make it more obvious that params
    must be passed on the stack, even for x86-64.
 
 ** Let userspace set all supported bits in MSR_IA32_FEAT_CTL irrespective
    of the current guest CPUID.
 
 ** Fudge around a race with TSC refinement that results in KVM incorrectly
    thinking a guest needs TSC scaling when running on a CPU with a
    constant TSC, but no hardware-enumerated TSC frequency.
 
 ** Advertise (on AMD) that the SMM_CTL MSR is not supported
 
 ** Remove unnecessary exports
 
 Generic:
 
 * Support for responding to signals during page faults; introduces
   new FOLL_INTERRUPTIBLE flag that was reviewed by mm folks
 
 Selftests:
 
 * Fix an inverted check in the access tracking perf test, and restore
   support for asserting that there aren't too many idle pages when
   running on bare metal.
 
 * Fix build errors that occur in certain setups (unsure exactly what is
   unique about the problematic setup) due to glibc overriding
   static_assert() to a variant that requires a custom message.
 
 * Introduce actual atomics for clear/set_bit() in selftests
 
 * Add support for pinning vCPUs in dirty_log_perf_test.
 
 * Rename the so called "perf_util" framework to "memstress".
 
 * Add a lightweight psuedo RNG for guest use, and use it to randomize
   the access pattern and write vs. read percentage in the memstress tests.
 
 * Add a common ucall implementation; code dedup and pre-work for running
   SEV (and beyond) guests in selftests.
 
 * Provide a common constructor and arch hook, which will eventually be
   used by x86 to automatically select the right hypercall (AMD vs. Intel).
 
 * A bunch of added/enabled/fixed selftests for ARM64, covering memslots,
   breakpoints, stage-2 faults and access tracking.
 
 * x86-specific selftest changes:
 
 ** Clean up x86's page table management.
 
 ** Clean up and enhance the "smaller maxphyaddr" test, and add a related
    test to cover generic emulation failure.
 
 ** Clean up the nEPT support checks.
 
 ** Add X86_PROPERTY_* framework to retrieve multi-bit CPUID values.
 
 ** Fix an ordering issue in the AMX test introduced by recent conversions
    to use kvm_cpu_has(), and harden the code to guard against similar bugs
    in the future.  Anything that tiggers caching of KVM's supported CPUID,
    kvm_cpu_has() in this case, effectively hides opt-in XSAVE features if
    the caching occurs before the test opts in via prctl().
 
 Documentation:
 
 * Remove deleted ioctls from documentation
 
 * Clean up the docs for the x86 MSR filter.
 
 * Various fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmOaFrcUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroPemQgAq49excg2Cc+EsHnZw3vu/QWdA0Rt
 KhL3OgKxuHNjCbD2O9n2t5di7eJOTQ7F7T0eDm3xPTr4FS8LQ2327/mQePU/H2CF
 mWOpq9RBWLzFsSTeVA2Mz9TUTkYSnDHYuRsBvHyw/n9cL76BWVzjImldFtjYjjex
 yAwl8c5itKH6bc7KO+5ydswbvBzODkeYKUSBNdbn6m0JGQST7XppNwIAJvpiHsii
 Qgpk0e4Xx9q4PXG/r5DedI6BlufBsLhv0aE9SHPzyKH3JbbUFhJYI8ZD5OhBQuYW
 MwxK2KlM5Jm5ud2NZDDlsMmmvd1lnYCFDyqNozaKEWC1Y5rq1AbMa51fXA==
 =QAYX
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "ARM64:

   - Enable the per-vcpu dirty-ring tracking mechanism, together with an
     option to keep the good old dirty log around for pages that are
     dirtied by something other than a vcpu.

   - Switch to the relaxed parallel fault handling, using RCU to delay
     page table reclaim and giving better performance under load.

   - Relax the MTE ABI, allowing a VMM to use the MAP_SHARED mapping
     option, which multi-process VMMs such as crosvm rely on (see merge
     commit 382b5b87a9: "Fix a number of issues with MTE, such as
     races on the tags being initialised vs the PG_mte_tagged flag as
     well as the lack of support for VM_SHARED when KVM is involved.
     Patches from Catalin Marinas and Peter Collingbourne").

   - Merge the pKVM shadow vcpu state tracking that allows the
     hypervisor to have its own view of a vcpu, keeping that state
     private.

   - Add support for the PMUv3p5 architecture revision, bringing support
     for 64bit counters on systems that support it, and fix the
     no-quite-compliant CHAIN-ed counter support for the machines that
     actually exist out there.

   - Fix a handful of minor issues around 52bit VA/PA support (64kB
     pages only) as a prefix of the oncoming support for 4kB and 16kB
     pages.

   - Pick a small set of documentation and spelling fixes, because no
     good merge window would be complete without those.

  s390:

   - Second batch of the lazy destroy patches

   - First batch of KVM changes for kernel virtual != physical address
     support

   - Removal of a unused function

  x86:

   - Allow compiling out SMM support

   - Cleanup and documentation of SMM state save area format

   - Preserve interrupt shadow in SMM state save area

   - Respond to generic signals during slow page faults

   - Fixes and optimizations for the non-executable huge page errata
     fix.

   - Reprogram all performance counters on PMU filter change

   - Cleanups to Hyper-V emulation and tests

   - Process Hyper-V TLB flushes from a nested guest (i.e. from a L2
     guest running on top of a L1 Hyper-V hypervisor)

   - Advertise several new Intel features

   - x86 Xen-for-KVM:

      - Allow the Xen runstate information to cross a page boundary

      - Allow XEN_RUNSTATE_UPDATE flag behaviour to be configured

      - Add support for 32-bit guests in SCHEDOP_poll

   - Notable x86 fixes and cleanups:

      - One-off fixes for various emulation flows (SGX, VMXON, NRIPS=0).

      - Reinstate IBPB on emulated VM-Exit that was incorrectly dropped
        a few years back when eliminating unnecessary barriers when
        switching between vmcs01 and vmcs02.

      - Clean up vmread_error_trampoline() to make it more obvious that
        params must be passed on the stack, even for x86-64.

      - Let userspace set all supported bits in MSR_IA32_FEAT_CTL
        irrespective of the current guest CPUID.

      - Fudge around a race with TSC refinement that results in KVM
        incorrectly thinking a guest needs TSC scaling when running on a
        CPU with a constant TSC, but no hardware-enumerated TSC
        frequency.

      - Advertise (on AMD) that the SMM_CTL MSR is not supported

      - Remove unnecessary exports

  Generic:

   - Support for responding to signals during page faults; introduces
     new FOLL_INTERRUPTIBLE flag that was reviewed by mm folks

  Selftests:

   - Fix an inverted check in the access tracking perf test, and restore
     support for asserting that there aren't too many idle pages when
     running on bare metal.

   - Fix build errors that occur in certain setups (unsure exactly what
     is unique about the problematic setup) due to glibc overriding
     static_assert() to a variant that requires a custom message.

   - Introduce actual atomics for clear/set_bit() in selftests

   - Add support for pinning vCPUs in dirty_log_perf_test.

   - Rename the so called "perf_util" framework to "memstress".

   - Add a lightweight psuedo RNG for guest use, and use it to randomize
     the access pattern and write vs. read percentage in the memstress
     tests.

   - Add a common ucall implementation; code dedup and pre-work for
     running SEV (and beyond) guests in selftests.

   - Provide a common constructor and arch hook, which will eventually
     be used by x86 to automatically select the right hypercall (AMD vs.
     Intel).

   - A bunch of added/enabled/fixed selftests for ARM64, covering
     memslots, breakpoints, stage-2 faults and access tracking.

   - x86-specific selftest changes:

      - Clean up x86's page table management.

      - Clean up and enhance the "smaller maxphyaddr" test, and add a
        related test to cover generic emulation failure.

      - Clean up the nEPT support checks.

      - Add X86_PROPERTY_* framework to retrieve multi-bit CPUID values.

      - Fix an ordering issue in the AMX test introduced by recent
        conversions to use kvm_cpu_has(), and harden the code to guard
        against similar bugs in the future. Anything that tiggers
        caching of KVM's supported CPUID, kvm_cpu_has() in this case,
        effectively hides opt-in XSAVE features if the caching occurs
        before the test opts in via prctl().

  Documentation:

   - Remove deleted ioctls from documentation

   - Clean up the docs for the x86 MSR filter.

   - Various fixes"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (361 commits)
  KVM: x86: Add proper ReST tables for userspace MSR exits/flags
  KVM: selftests: Allocate ucall pool from MEM_REGION_DATA
  KVM: arm64: selftests: Align VA space allocator with TTBR0
  KVM: arm64: Fix benign bug with incorrect use of VA_BITS
  KVM: arm64: PMU: Fix period computation for 64bit counters with 32bit overflow
  KVM: x86: Advertise that the SMM_CTL MSR is not supported
  KVM: x86: remove unnecessary exports
  KVM: selftests: Fix spelling mistake "probabalistic" -> "probabilistic"
  tools: KVM: selftests: Convert clear/set_bit() to actual atomics
  tools: Drop "atomic_" prefix from atomic test_and_set_bit()
  tools: Drop conflicting non-atomic test_and_{clear,set}_bit() helpers
  KVM: selftests: Use non-atomic clear/set bit helpers in KVM tests
  perf tools: Use dedicated non-atomic clear/set bit helpers
  tools: Take @bit as an "unsigned long" in {clear,set}_bit() helpers
  KVM: arm64: selftests: Enable single-step without a "full" ucall()
  KVM: x86: fix APICv/x2AVIC disabled when vm reboot by itself
  KVM: Remove stale comment about KVM_REQ_UNHALT
  KVM: Add missing arch for KVM_CREATE_DEVICE and KVM_{SET,GET}_DEVICE_ATTR
  KVM: Reference to kvm_userspace_memory_region in doc and comments
  KVM: Delete all references to removed KVM_SET_MEMORY_ALIAS ioctl
  ...
2022-12-15 11:12:21 -08:00
Will Deacon
c0cd1d5417 Revert "arm64: errata: Workaround possible Cortex-A715 [ESR|FAR]_ELx corruption"
This reverts commit 44ecda71fd.

All versions of this patch on the mailing list, including the version
that ended up getting merged, have portions of code guarded by the
non-existent CONFIG_ARM64_WORKAROUND_2645198 option. Although Anshuman
says he tested the code with some additional debug changes [1], I'm
hesitant to fix the CONFIG option and light up a bunch of code right
before I (and others) disappear for the end of year holidays, during
which time we won't be around to deal with any fallout.

So revert the change for now. We can bring back a fixed, tested version
for a later -rc when folks are thinking about things other than trees
and turkeys.

[1] https://lore.kernel.org/r/b6f61241-e436-5db1-1053-3b441080b8d6@arm.com
Reported-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Link: https://lore.kernel.org/r/20221215094811.23188-1-lukas.bulwahn@gmail.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-12-15 17:59:12 +00:00
Linus Torvalds
e2ca6ba6ba MM patches for 6.2-rc1.
- More userfaultfs work from Peter Xu.
 
 - Several convert-to-folios series from Sidhartha Kumar and Huang Ying.
 
 - Some filemap cleanups from Vishal Moola.
 
 - David Hildenbrand added the ability to selftest anon memory COW handling.
 
 - Some cpuset simplifications from Liu Shixin.
 
 - Addition of vmalloc tracing support by Uladzislau Rezki.
 
 - Some pagecache folioifications and simplifications from Matthew Wilcox.
 
 - A pagemap cleanup from Kefeng Wang: we have VM_ACCESS_FLAGS, so use it.
 
 - Miguel Ojeda contributed some cleanups for our use of the
   __no_sanitize_thread__ gcc keyword.  This series shold have been in the
   non-MM tree, my bad.
 
 - Naoya Horiguchi improved the interaction between memory poisoning and
   memory section removal for huge pages.
 
 - DAMON cleanups and tuneups from SeongJae Park
 
 - Tony Luck fixed the handling of COW faults against poisoned pages.
 
 - Peter Xu utilized the PTE marker code for handling swapin errors.
 
 - Hugh Dickins reworked compound page mapcount handling, simplifying it
   and making it more efficient.
 
 - Removal of the autonuma savedwrite infrastructure from Nadav Amit and
   David Hildenbrand.
 
 - zram support for multiple compression streams from Sergey Senozhatsky.
 
 - David Hildenbrand reworked the GUP code's R/O long-term pinning so
   that drivers no longer need to use the FOLL_FORCE workaround which
   didn't work very well anyway.
 
 - Mel Gorman altered the page allocator so that local IRQs can remnain
   enabled during per-cpu page allocations.
 
 - Vishal Moola removed the try_to_release_page() wrapper.
 
 - Stefan Roesch added some per-BDI sysfs tunables which are used to
   prevent network block devices from dirtying excessive amounts of
   pagecache.
 
 - David Hildenbrand did some cleanup and repair work on KSM COW
   breaking.
 
 - Nhat Pham and Johannes Weiner have implemented writeback in zswap's
   zsmalloc backend.
 
 - Brian Foster has fixed a longstanding corner-case oddity in
   file[map]_write_and_wait_range().
 
 - sparse-vmemmap changes for MIPS, LoongArch and NIOS2 from Feiyang
   Chen.
 
 - Shiyang Ruan has done some work on fsdax, to make its reflink mode
   work better under xfstests.  Better, but still not perfect.
 
 - Christoph Hellwig has removed the .writepage() method from several
   filesystems.  They only need .writepages().
 
 - Yosry Ahmed wrote a series which fixes the memcg reclaim target
   beancounting.
 
 - David Hildenbrand has fixed some of our MM selftests for 32-bit
   machines.
 
 - Many singleton patches, as usual.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY5j6ZwAKCRDdBJ7gKXxA
 jkDYAP9qNeVqp9iuHjZNTqzMXkfmJPsw2kmy2P+VdzYVuQRcJgEAgoV9d7oMq4ml
 CodAgiA51qwzId3GRytIo/tfWZSezgA=
 =d19R
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2022-12-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - More userfaultfs work from Peter Xu

 - Several convert-to-folios series from Sidhartha Kumar and Huang Ying

 - Some filemap cleanups from Vishal Moola

 - David Hildenbrand added the ability to selftest anon memory COW
   handling

 - Some cpuset simplifications from Liu Shixin

 - Addition of vmalloc tracing support by Uladzislau Rezki

 - Some pagecache folioifications and simplifications from Matthew
   Wilcox

 - A pagemap cleanup from Kefeng Wang: we have VM_ACCESS_FLAGS, so use
   it

 - Miguel Ojeda contributed some cleanups for our use of the
   __no_sanitize_thread__ gcc keyword.

   This series should have been in the non-MM tree, my bad

 - Naoya Horiguchi improved the interaction between memory poisoning and
   memory section removal for huge pages

 - DAMON cleanups and tuneups from SeongJae Park

 - Tony Luck fixed the handling of COW faults against poisoned pages

 - Peter Xu utilized the PTE marker code for handling swapin errors

 - Hugh Dickins reworked compound page mapcount handling, simplifying it
   and making it more efficient

 - Removal of the autonuma savedwrite infrastructure from Nadav Amit and
   David Hildenbrand

 - zram support for multiple compression streams from Sergey Senozhatsky

 - David Hildenbrand reworked the GUP code's R/O long-term pinning so
   that drivers no longer need to use the FOLL_FORCE workaround which
   didn't work very well anyway

 - Mel Gorman altered the page allocator so that local IRQs can remnain
   enabled during per-cpu page allocations

 - Vishal Moola removed the try_to_release_page() wrapper

 - Stefan Roesch added some per-BDI sysfs tunables which are used to
   prevent network block devices from dirtying excessive amounts of
   pagecache

 - David Hildenbrand did some cleanup and repair work on KSM COW
   breaking

 - Nhat Pham and Johannes Weiner have implemented writeback in zswap's
   zsmalloc backend

 - Brian Foster has fixed a longstanding corner-case oddity in
   file[map]_write_and_wait_range()

 - sparse-vmemmap changes for MIPS, LoongArch and NIOS2 from Feiyang
   Chen

 - Shiyang Ruan has done some work on fsdax, to make its reflink mode
   work better under xfstests. Better, but still not perfect

 - Christoph Hellwig has removed the .writepage() method from several
   filesystems. They only need .writepages()

 - Yosry Ahmed wrote a series which fixes the memcg reclaim target
   beancounting

 - David Hildenbrand has fixed some of our MM selftests for 32-bit
   machines

 - Many singleton patches, as usual

* tag 'mm-stable-2022-12-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (313 commits)
  mm/hugetlb: set head flag before setting compound_order in __prep_compound_gigantic_folio
  mm: mmu_gather: allow more than one batch of delayed rmaps
  mm: fix typo in struct pglist_data code comment
  kmsan: fix memcpy tests
  mm: add cond_resched() in swapin_walk_pmd_entry()
  mm: do not show fs mm pc for VM_LOCKONFAULT pages
  selftests/vm: ksm_functional_tests: fixes for 32bit
  selftests/vm: cow: fix compile warning on 32bit
  selftests/vm: madv_populate: fix missing MADV_POPULATE_(READ|WRITE) definitions
  mm/gup_test: fix PIN_LONGTERM_TEST_READ with highmem
  mm,thp,rmap: fix races between updates of subpages_mapcount
  mm: memcg: fix swapcached stat accounting
  mm: add nodes= arg to memory.reclaim
  mm: disable top-tier fallback to reclaim on proactive reclaim
  selftests: cgroup: make sure reclaim target memcg is unprotected
  selftests: cgroup: refactor proactive reclaim code to reclaim_until()
  mm: memcg: fix stale protection of reclaim target memcg
  mm/mmap: properly unaccount memory on mas_preallocate() failure
  omfs: remove ->writepage
  jfs: remove ->writepage
  ...
2022-12-13 19:29:45 -08:00
Linus Torvalds
fc4c9f4504 EFI updates for v6.2:
- Refactor the zboot code so that it incorporates all the EFI stub
   logic, rather than calling the decompressed kernel as a EFI app.
 - Add support for initrd= command line option to x86 mixed mode.
 - Allow initrd= to be used with arbitrary EFI accessible file systems
   instead of just the one the kernel itself was loaded from.
 - Move some x86-only handling and manipulation of the EFI memory map
   into arch/x86, as it is not used anywhere else.
 - More flexible handling of any random seeds provided by the boot
   environment (i.e., systemd-boot) so that it becomes available much
   earlier during the boot.
 - Allow improved arch-agnostic EFI support in loaders, by setting a
   uniform baseline of supported features, and adding a generic magic
   number to the DOS/PE header. This should allow loaders such as GRUB or
   systemd-boot to reduce the amount of arch-specific handling
   substantially.
 - (arm64) Run EFI runtime services from a dedicated stack, and use it to
   recover from synchronous exceptions that might occur in the firmware
   code.
 - (arm64) Ensure that we don't allocate memory outside of the 48-bit
   addressable physical range.
 - Make EFI pstore record size configurable
 - Add support for decoding CXL specific CPER records
 -----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE+9lifEBpyUIVN1cpw08iOZLZjyQFAmOTQ1cACgkQw08iOZLZ
 jyQRkAv+LqaZFWeVwhAQHiw/N3RnRM0nZHea6++D2p1y/ZbCpwv3pdLl2YHQ1KmW
 wDG9Nr4C1ITLtfy1YZKeYpwloQtq9S1GZDWnFpVv/hdo7L924eRAwIlxowWn1OnP
 ruxv2PaYXyb0plh1YD1f6E1BqrfUOtajET55Kxs9ZsxmnMtDpIX3NiYy4LKMBIZC
 +Eywt41M3uBX+wgmSujFBMVVJjhOX60WhUYXqy0RXwDKOyrz/oW5td+eotSCreB6
 FVbjvwQvUdtzn4s1FayOMlTrkxxLw4vLhsaUGAdDOHd3rg3sZT9Xh1HqFFD6nss6
 ZAzAYQ6BzdiV/5WSB9meJe+BeG1hjTNKjJI6JPO2lctzYJqlnJJzI6JzBuH9vzQ0
 dffLB8NITeEW2rphIh+q+PAKFFNbXWkJtV4BMRpqmzZ/w7HwupZbUXAzbWE8/5km
 qlFpr0kmq8GlVcbXNOFjmnQVrJ8jPYn+O3AwmEiVAXKZJOsMH0sjlXHKsonme9oV
 Sk71c6Em
 =JEXz
 -----END PGP SIGNATURE-----

Merge tag 'efi-next-for-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi

Pull EFI updates from Ard Biesheuvel:
 "Another fairly sizable pull request, by EFI subsystem standards.

  Most of the work was done by me, some of it in collaboration with the
  distro and bootloader folks (GRUB, systemd-boot), where the main focus
  has been on removing pointless per-arch differences in the way EFI
  boots a Linux kernel.

   - Refactor the zboot code so that it incorporates all the EFI stub
     logic, rather than calling the decompressed kernel as a EFI app.

   - Add support for initrd= command line option to x86 mixed mode.

   - Allow initrd= to be used with arbitrary EFI accessible file systems
     instead of just the one the kernel itself was loaded from.

   - Move some x86-only handling and manipulation of the EFI memory map
     into arch/x86, as it is not used anywhere else.

   - More flexible handling of any random seeds provided by the boot
     environment (i.e., systemd-boot) so that it becomes available much
     earlier during the boot.

   - Allow improved arch-agnostic EFI support in loaders, by setting a
     uniform baseline of supported features, and adding a generic magic
     number to the DOS/PE header. This should allow loaders such as GRUB
     or systemd-boot to reduce the amount of arch-specific handling
     substantially.

   - (arm64) Run EFI runtime services from a dedicated stack, and use it
     to recover from synchronous exceptions that might occur in the
     firmware code.

   - (arm64) Ensure that we don't allocate memory outside of the 48-bit
     addressable physical range.

   - Make EFI pstore record size configurable

   - Add support for decoding CXL specific CPER records"

* tag 'efi-next-for-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: (43 commits)
  arm64: efi: Recover from synchronous exceptions occurring in firmware
  arm64: efi: Execute runtime services from a dedicated stack
  arm64: efi: Limit allocations to 48-bit addressable physical region
  efi: Put Linux specific magic number in the DOS header
  efi: libstub: Always enable initrd command line loader and bump version
  efi: stub: use random seed from EFI variable
  efi: vars: prohibit reading random seed variables
  efi: random: combine bootloader provided RNG seed with RNG protocol output
  efi/cper, cxl: Decode CXL Error Log
  efi/cper, cxl: Decode CXL Protocol Error Section
  efi: libstub: fix efi_load_initrd_dev_path() kernel-doc comment
  efi: x86: Move EFI runtime map sysfs code to arch/x86
  efi: runtime-maps: Clarify purpose and enable by default for kexec
  efi: pstore: Add module parameter for setting the record size
  efi: xen: Set EFI_PARAVIRT for Xen dom0 boot on all architectures
  efi: memmap: Move manipulation routines into x86 arch tree
  efi: memmap: Move EFI fake memmap support into x86 arch tree
  efi: libstub: Undeprecate the command line initrd loader
  efi: libstub: Add mixed mode support to command line initrd loader
  efi: libstub: Permit mixed mode return types other than efi_status_t
  ...
2022-12-13 14:31:47 -08:00
Linus Torvalds
06cff4a58e arm64 updates for 6.2
ACPI:
 	* Enable FPDT support for boot-time profiling
 	* Fix CPU PMU probing to work better with PREEMPT_RT
 	* Update SMMUv3 MSI DeviceID parsing to latest IORT spec
 	* APMT support for probing Arm CoreSight PMU devices
 
 CPU features:
 	* Advertise new SVE instructions (v2.1)
 	* Advertise range prefetch instruction
 	* Advertise CSSC ("Common Short Sequence Compression") scalar
 	  instructions, adding things like min, max, abs, popcount
 	* Enable DIT (Data Independent Timing) when running in the kernel
 	* More conversion of system register fields over to the generated
 	  header
 
 CPU misfeatures:
 	* Workaround for Cortex-A715 erratum #2645198
 
 Dynamic SCS:
 	* Support for dynamic shadow call stacks to allow switching at
 	  runtime between Clang's SCS implementation and the CPU's
 	  pointer authentication feature when it is supported (complete
 	  with scary DWARF parser!)
 
 Tracing and debug:
 	* Remove static ftrace in favour of, err, dynamic ftrace!
 	* Seperate 'struct ftrace_regs' from 'struct pt_regs' in core
 	  ftrace and existing arch code
 	* Introduce and implement FTRACE_WITH_ARGS on arm64 to replace
 	  the old FTRACE_WITH_REGS
 	* Extend 'crashkernel=' parameter with default value and fallback
 	  to placement above 4G physical if initial (low) allocation
 	  fails
 
 SVE:
 	* Optimisation to avoid disabling SVE unconditionally on syscall
 	  entry and just zeroing the non-shared state on return instead
 
 Exceptions:
 	* Rework of undefined instruction handling to avoid serialisation
 	  on global lock (this includes emulation of user accesses to the
 	  ID registers)
 
 Perf and PMU:
 	* Support for TLP filters in Hisilicon's PCIe PMU device
 	* Support for the DDR PMU present in Amlogic Meson G12 SoCs
 	* Support for the terribly-named "CoreSight PMU" architecture
 	  from Arm (and Nvidia's implementation of said architecture)
 
 Misc:
 	* Tighten up our boot protocol for systems with memory above
           52 bits physical
 	* Const-ify static keys to satisty jump label asm constraints
 	* Trivial FFA driver cleanups in preparation for v1.1 support
 	* Export the kernel_neon_* APIs as GPL symbols
 	* Harden our instruction generation routines against
 	  instrumentation
 	* A bunch of robustness improvements to our arch-specific selftests
 	* Minor cleanups and fixes all over (kbuild, kprobes, kfence, PMU, ...)
 -----BEGIN PGP SIGNATURE-----
 
 iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmOPLFAQHHdpbGxAa2Vy
 bmVsLm9yZwAKCRC3rHDchMFjNPRcCACLyDTvkimiqfoPxzzgdkx/6QOvw9s3/mXg
 UcTORSZBR1VnYkiMYEKVz/tTfG99dnWtD8/0k/rz48NbhBfsF2sN4ukyBBXVf0zR
 fjnaVyVC11LUgBgZKPo6maV+jf/JWf9hJtpPl06KTiPb2Hw2JX4DXg+PeF8t2hGx
 NLH4ekQOrlDM8mlsN5mc0YsHbiuO7Xe/NRuet8TsgU4bEvLAwO6bzOLVUMqDQZNq
 bQe2ENcGVAzAf7iRJb38lj9qB/5hrQTHRXqLXMSnJyyVjQEwYca0PeJMa7x30bXF
 ZZ+xQ8Wq0mxiffZraf6SE34yD4gaYS4Fziw7rqvydC15vYhzJBH1
 =hV+2
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Will Deacon:
 "The highlights this time are support for dynamically enabling and
  disabling Clang's Shadow Call Stack at boot and a long-awaited
  optimisation to the way in which we handle the SVE register state on
  system call entry to avoid taking unnecessary traps from userspace.

  Summary:

  ACPI:
   - Enable FPDT support for boot-time profiling
   - Fix CPU PMU probing to work better with PREEMPT_RT
   - Update SMMUv3 MSI DeviceID parsing to latest IORT spec
   - APMT support for probing Arm CoreSight PMU devices

  CPU features:
   - Advertise new SVE instructions (v2.1)
   - Advertise range prefetch instruction
   - Advertise CSSC ("Common Short Sequence Compression") scalar
     instructions, adding things like min, max, abs, popcount
   - Enable DIT (Data Independent Timing) when running in the kernel
   - More conversion of system register fields over to the generated
     header

  CPU misfeatures:
   - Workaround for Cortex-A715 erratum #2645198

  Dynamic SCS:
   - Support for dynamic shadow call stacks to allow switching at
     runtime between Clang's SCS implementation and the CPU's pointer
     authentication feature when it is supported (complete with scary
     DWARF parser!)

  Tracing and debug:
   - Remove static ftrace in favour of, err, dynamic ftrace!
   - Seperate 'struct ftrace_regs' from 'struct pt_regs' in core ftrace
     and existing arch code
   - Introduce and implement FTRACE_WITH_ARGS on arm64 to replace the
     old FTRACE_WITH_REGS
   - Extend 'crashkernel=' parameter with default value and fallback to
     placement above 4G physical if initial (low) allocation fails

  SVE:
   - Optimisation to avoid disabling SVE unconditionally on syscall
     entry and just zeroing the non-shared state on return instead

  Exceptions:
   - Rework of undefined instruction handling to avoid serialisation on
     global lock (this includes emulation of user accesses to the ID
     registers)

  Perf and PMU:
   - Support for TLP filters in Hisilicon's PCIe PMU device
   - Support for the DDR PMU present in Amlogic Meson G12 SoCs
   - Support for the terribly-named "CoreSight PMU" architecture from
     Arm (and Nvidia's implementation of said architecture)

  Misc:
   - Tighten up our boot protocol for systems with memory above 52 bits
     physical
   - Const-ify static keys to satisty jump label asm constraints
   - Trivial FFA driver cleanups in preparation for v1.1 support
   - Export the kernel_neon_* APIs as GPL symbols
   - Harden our instruction generation routines against instrumentation
   - A bunch of robustness improvements to our arch-specific selftests
   - Minor cleanups and fixes all over (kbuild, kprobes, kfence, PMU, ...)"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (151 commits)
  arm64: kprobes: Return DBG_HOOK_ERROR if kprobes can not handle a BRK
  arm64: kprobes: Let arch do_page_fault() fix up page fault in user handler
  arm64: Prohibit instrumentation on arch_stack_walk()
  arm64:uprobe fix the uprobe SWBP_INSN in big-endian
  arm64: alternatives: add __init/__initconst to some functions/variables
  arm_pmu: Drop redundant armpmu->map_event() in armpmu_event_init()
  kselftest/arm64: Allow epoll_wait() to return more than one result
  kselftest/arm64: Don't drain output while spawning children
  kselftest/arm64: Hold fp-stress children until they're all spawned
  arm64/sysreg: Remove duplicate definitions from asm/sysreg.h
  arm64/sysreg: Convert ID_DFR1_EL1 to automatic generation
  arm64/sysreg: Convert ID_DFR0_EL1 to automatic generation
  arm64/sysreg: Convert ID_AFR0_EL1 to automatic generation
  arm64/sysreg: Convert ID_MMFR5_EL1 to automatic generation
  arm64/sysreg: Convert MVFR2_EL1 to automatic generation
  arm64/sysreg: Convert MVFR1_EL1 to automatic generation
  arm64/sysreg: Convert MVFR0_EL1 to automatic generation
  arm64/sysreg: Convert ID_PFR2_EL1 to automatic generation
  arm64/sysreg: Convert ID_PFR1_EL1 to automatic generation
  arm64/sysreg: Convert ID_PFR0_EL1 to automatic generation
  ...
2022-12-12 09:50:05 -08:00
Feiyang Chen
2045a3b891 mm/sparse-vmemmap: generalise vmemmap_populate_hugepages()
Generalise vmemmap_populate_hugepages() so ARM64 & X86 & LoongArch can
share its implementation.

Link: https://lkml.kernel.org/r/20221027125253.3458989-4-chenhuacai@loongson.cn
Signed-off-by: Feiyang Chen <chenfeiyang@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
Cc: Min Zhou <zhoumin@loongson.cn>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Philippe Mathieu-Daudé <philmd@linaro.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Xuefeng Li <lixuefeng@loongson.cn>
Cc: Xuerui Wang <kernel@xen0n.name>
Cc: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-12-11 18:12:12 -08:00
Ard Biesheuvel
e8dfdf3162 arm64: efi: Recover from synchronous exceptions occurring in firmware
Unlike x86, which has machinery to deal with page faults that occur
during the execution of EFI runtime services, arm64 has nothing like
that, and a synchronous exception raised by firmware code brings down
the whole system.

With more EFI based systems appearing that were not built to run Linux
(such as the Windows-on-ARM laptops based on Qualcomm SOCs), as well as
the introduction of PRM (platform specific firmware routines that are
callable just like EFI runtime services), we are more likely to run into
issues of this sort, and it is much more likely that we can identify and
work around such issues if they don't bring down the system entirely.

Since we already use a EFI runtime services call wrapper in assembler,
we can quite easily add some code that captures the execution state at
the point where the call is made, allowing us to revert to this state
and proceed execution if the call triggered a synchronous exception.

Given that the kernel and the firmware don't share any data structures
that could end up in an indeterminate state, we can happily continue
running, as long as we mark the EFI runtime services as unavailable from
that point on.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
2022-12-08 18:33:34 +01:00
Linus Torvalds
8ed710da28 Revert the dropping of the cache invalidation from the arm64
arch_dma_prep_coherent() as it caused a regression in the qcom_q6v5_mss
 remoteproc driver. The driver is already buggy but the original arm64
 change made the problem obvious. The change will be re-introduced once
 the driver is fixed.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmOPfrIACgkQa9axLQDI
 XvH2SxAAolN3oris9RYVlVsldd38XSklgEjKhotJulYUzK2+9u1Chv757sX1UlyN
 YkZD2uZfHYIVMXzgEVlkecepMRNj7Vp8i4fzTRqUcYBeRjhNz8dfkAwS5L5ezA9f
 NFYnh3tv4YYn5LNv03Gd2VQMrniOHYfyZKLextZyJC0OuabIHXGAgbR8vt0cJeAe
 CzZkoGzmtv1IPd81DEeUZYsW+KGOHqKHj8aI+0DWHNGLmc3H/VNoQ5JlNOZSgR8F
 DOAmM+UgeTVNJfTv81u4skTcBm3Dr8aDyIUlHTrjXXRPAmpeFoy1WwN22jeq0P2z
 6nnbHGufgqVfhebd1Wy5sKQHg3tVZ8FEXJW9Tpw85m019v4jaPF3Pmz6e1Lvlzuz
 pUX5fbLkwNHBJMUkkw5iVR9W5P3vrKkcq/XGa2y8sMPUDK7IaiMGQhVTT724oHgy
 UW9j0G7iUguXNii6ZHyUPOdaA2Vrjj3AjmJX6bHhwDSvvkNZGZgA4iBY+yRoUU2V
 xBuOkLE9oa5QE62jGoipajgi3CIB3cXwbNZyfPuZkkosHuqqRDkuLX5TJ2snAHhs
 lSSwDlAGY+9p4jiTd8/36E2GPXZOGPDn3L/3UdYsi9ScNKYFNJREKL8b2ZBlTo9X
 P0qIeCmiyMPp5s6i3D9M8xRbVySgu542tcQG3+KhbGUsl3XElr0=
 =l6Q7
 -----END PGP SIGNATURE-----

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fix from Catalin Marinas:
 "Revert the dropping of the cache invalidation from the arm64
  arch_dma_prep_coherent() as it caused a regression in the
  qcom_q6v5_mss remoteproc driver.

  The driver is already buggy but the original arm64 change made
  the problem obvious. The change will be re-introduced once the
  driver is fixed"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()"
2022-12-06 11:03:03 -08:00
Will Deacon
b7d9aae404 Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()"
This reverts commit c44094eee3.

Although the semantics of the DMA API require only a clean operation
here, it turns out that the Qualcomm 'qcom_q6v5_mss' remoteproc driver
(ab)uses the DMA API for transferring the modem firmware to the secure
world via calls to Trustzone [1].

Once the firmware buffer has changed hands, _any_ access from the
non-secure side (i.e. Linux) will be detected on the bus and result in a
full system reset [2]. Although this is possible even with this revert
in place (due to speculative reads via the cacheable linear alias of
memory), anecdotally the problem occurs considerably more frequently
when the lines have not been invalidated, assumedly due to some
micro-architectural interactions with the cache hierarchy.

Revert the offending change for now, along with a comment, so that the
Qualcomm developers have time to fix the driver [3] to use a firmware
buffer which does not have a cacheable alias in the linear map.

Link: https://lore.kernel.org/r/20221114110329.68413-1-manivannan.sadhasivam@linaro.org [1]
Link: https://lore.kernel.org/r/CAMi1Hd3H2k1J8hJ6e-Miy5+nVDNzv6qQ3nN-9929B0GbHJkXEg@mail.gmail.com/ [2]
Link: https://lore.kernel.org/r/20221206092152.GD15486@thinkpad [2]
Reported-by: Amit Pundir <amit.pundir@linaro.org>
Reported-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Cc: Thorsten Leemhuis <regressions@leemhuis.info>
Cc: Sibi Sankar <quic_sibis@quicinc.com>
Signed-off-by: Will Deacon <will@kernel.org>
Acked-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20221206103403.646-1-will@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-12-06 17:30:39 +00:00
Will Deacon
9d84ad425d Merge branch 'for-next/trivial' into for-next/core
* for-next/trivial:
  arm64: alternatives: add __init/__initconst to some functions/variables
  arm64/asm: Remove unused assembler DAIF save/restore macros
  arm64/kpti: Move DAIF masking to C code
  Revert "arm64/mm: Drop redundant BUG_ON(!pgtable_alloc)"
  arm64/mm: Drop unused restore_ttbr1
  arm64: alternatives: make apply_alternatives_vdso() static
  arm64/mm: Drop idmap_pg_end[] declaration
  arm64/mm: Drop redundant BUG_ON(!pgtable_alloc)
  arm64: make is_ttbrX_addr() noinstr-safe
  arm64/signal: Document our convention for choosing magic numbers
  arm64: atomics: lse: remove stale dependency on JUMP_LABEL
  arm64: paravirt: remove conduit check in has_pv_steal_clock
  arm64: entry: Fix typo
  arm64/booting: Add missing colon to FA64 entry
  arm64/mm: Drop ARM64_KERNEL_USES_PMD_MAPS
  arm64/asm: Remove unused enable_da macro
2022-12-06 11:33:29 +00:00
Will Deacon
c947948f7a Merge branch 'for-next/mm' into for-next/core
* for-next/mm:
  arm64: booting: Require placement within 48-bit addressable memory
  arm64: mm: kfence: only handle translation faults
  arm64/mm: Simplify and document pte_to_phys() for 52 bit addresses
2022-12-06 11:21:21 +00:00
Will Deacon
926939c734 Merge branch 'for-next/kdump' into for-next/core
* for-next/kdump:
  arm64: kdump: Support crashkernel=X fall back to reserve region above DMA zones
  arm64: kdump: Provide default size when crashkernel=Y,low is not specified
2022-12-06 11:16:20 +00:00
Marc Zyngier
382b5b87a9 Merge branch kvm-arm64/mte-map-shared into kvmarm-master/next
* kvm-arm64/mte-map-shared:
  : .
  : Update the MTE support to allow the VMM to use shared mappings
  : to back the memslots exposed to MTE-enabled guests.
  :
  : Patches courtesy of Catalin Marinas and Peter Collingbourne.
  : .
  : Fix a number of issues with MTE, such as races on the tags
  : being initialised vs the PG_mte_tagged flag as well as the
  : lack of support for VM_SHARED when KVM is involved.
  :
  : Patches from Catalin Marinas and Peter Collingbourne.
  : .
  Documentation: document the ABI changes for KVM_CAP_ARM_MTE
  KVM: arm64: permit all VM_MTE_ALLOWED mappings with MTE enabled
  KVM: arm64: unify the tests for VMAs in memslots when MTE is enabled
  arm64: mte: Lock a page for MTE tag initialisation
  mm: Add PG_arch_3 page flag
  KVM: arm64: Simplify the sanitise_mte_tags() logic
  arm64: mte: Fix/clarify the PG_mte_tagged semantics
  mm: Do not enable PG_arch_2 for all 64-bit architectures

Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-12-05 14:38:24 +00:00
Linus Torvalds
355479c70a Final EFI fix for v6.1
- Revert runtime service sync exception recovery on arm64
 -----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE+9lifEBpyUIVN1cpw08iOZLZjyQFAmOIsLoACgkQw08iOZLZ
 jyQspgv7BFM/z+PXeKCPee6uGUhNCjUoKP3Pnoo9jwe3sbSVagYUZZ5/vx01Ei33
 FHDi4FeSxo+adg2tb8xf8X5tZmfHYiQM5gEqK3xZXaBSJRT7FFY9+eRL6753Qe5W
 yxn220/nI9xTErGeEAabAHIngIOpth5nGQhRDQbM8r7EZDobGUu0NrJSRVnS7IBF
 Ov14VMkfqryInnJ8tzHwP4hz40Rrkf54dgNTPyJvUxJ3LCR5TqmeRp+16CZg87By
 CnmySayLnr+YLbZjp8vmQy9eUGVf4acPoDcXfBTe/lkE1BA4Qi+Hd/TaYgE8Fj0C
 B5g+sZa3khhaQc4Oc2RMOqpxK80DK41n1Vvy1UOZGOBFgZVoxT2bmDbAPohLyU/4
 lw6LqioDR2N4ZxiANGHuOVa8so8iY+Et30fLqRkeXj1TXqC84tDoxXGhsTEjF1Ad
 Vi0GGmrj4No4l0RM+4yGZIhNMaMGd5OhGdbwAiKUaLS9UyjvfqkaLGJ9S2wYVgZj
 R1PoDRWX
 =r/C6
 -----END PGP SIGNATURE-----

Merge tag 'efi-fixes-for-v6.1-4' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi

Pull EFI fix from Ard Biesheuvel:
 "A single revert for some code that I added during this cycle. The code
  is not wrong, but it should be a bit more careful about how to handle
  the shadow call stack pointer, so it is better to revert it for now
  and bring it back later in improved form.

  Summary:

   - Revert runtime service sync exception recovery on arm64"

* tag 'efi-fixes-for-v6.1-4' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
  arm64: efi: Revert "Recover from synchronous exceptions ..."
2022-12-01 11:25:11 -08:00
Ard Biesheuvel
7572ac3c97 arm64: efi: Revert "Recover from synchronous exceptions ..."
This reverts commit 23715a26c8, which introduced some code in
assembler that manipulates both the ordinary and the shadow call stack
pointer in a way that could potentially be taken advantage of. So let's
revert it, and do a better job the next time around.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2022-12-01 14:48:26 +01:00
Catalin Marinas
d77e59a8fc arm64: mte: Lock a page for MTE tag initialisation
Initialising the tags and setting PG_mte_tagged flag for a page can race
between multiple set_pte_at() on shared pages or setting the stage 2 pte
via user_mem_abort(). Introduce a new PG_mte_lock flag as PG_arch_3 and
set it before attempting page initialisation. Given that PG_mte_tagged
is never cleared for a page, consider setting this flag to mean page
unlocked and wait on this bit with acquire semantics if the page is
locked:

- try_page_mte_tagging() - lock the page for tagging, return true if it
  can be tagged, false if already tagged. No acquire semantics if it
  returns true (PG_mte_tagged not set) as there is no serialisation with
  a previous set_page_mte_tagged().

- set_page_mte_tagged() - set PG_mte_tagged with release semantics.

The two-bit locking is based on Peter Collingbourne's idea.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Peter Collingbourne <pcc@google.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Peter Collingbourne <pcc@google.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221104011041.290951-6-pcc@google.com
2022-11-29 09:26:07 +00:00
Catalin Marinas
e059853d14 arm64: mte: Fix/clarify the PG_mte_tagged semantics
Currently the PG_mte_tagged page flag mostly means the page contains
valid tags and it should be set after the tags have been cleared or
restored. However, in mte_sync_tags() it is set before setting the tags
to avoid, in theory, a race with concurrent mprotect(PROT_MTE) for
shared pages. However, a concurrent mprotect(PROT_MTE) with a copy on
write in another thread can cause the new page to have stale tags.
Similarly, tag reading via ptrace() can read stale tags if the
PG_mte_tagged flag is set before actually clearing/restoring the tags.

Fix the PG_mte_tagged semantics so that it is only set after the tags
have been cleared or restored. This is safe for swap restoring into a
MAP_SHARED or CoW page since the core code takes the page lock. Add two
functions to test and set the PG_mte_tagged flag with acquire and
release semantics. The downside is that concurrent mprotect(PROT_MTE) on
a MAP_SHARED page may cause tag loss. This is already the case for KVM
guests if a VMM changes the page protection while the guest triggers a
user_mem_abort().

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
[pcc@google.com: fix build with CONFIG_ARM64_MTE disabled]
Signed-off-by: Peter Collingbourne <pcc@google.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Peter Collingbourne <pcc@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221104011041.290951-3-pcc@google.com
2022-11-29 09:26:07 +00:00
Mark Brown
a8bf2fc43f arm64/kpti: Move DAIF masking to C code
We really don't want to take an exception while replacing TTBR1 so we mask
DAIF during the actual update. Currently this is done in the assembly
function idmap_cpu_replace_ttbr1() but it could equally be done in the only
caller of that function, cpu_replace_ttbr1(). This simplifies the assembly
code slightly and means that when working with the code around masking DAIF
flags there is one less piece of assembly code which needs to be considered.

While we're at it add a comment which makes explicit why we are masking
DAIF in this code.

There should be no functional effect.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221123180209.634650-2-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2022-11-25 12:17:52 +00:00
Will Deacon
32d495b0c3 Revert "arm64/mm: Drop redundant BUG_ON(!pgtable_alloc)"
This reverts commit 9ed2b4616d.

Nathan reports early boot failures bisected to this change which look
related to the kPTI nG repainting. In any case, consolidating the
BUG_ON()s to a single location needs more thought, so revert the change
until this is figured out properly.

Link: https://lore.kernel.org/r/Y3pS5fdZ3MdLZ00t@dev-arch.thelio-3990X
Reported-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
2022-11-21 12:52:07 +00:00
Anshuman Khandual
44ecda71fd arm64: errata: Workaround possible Cortex-A715 [ESR|FAR]_ELx corruption
If a Cortex-A715 cpu sees a page mapping permissions change from executable
to non-executable, it may corrupt the ESR_ELx and FAR_ELx registers, on the
next instruction abort caused by permission fault.

Only user-space does executable to non-executable permission transition via
mprotect() system call which calls ptep_modify_prot_start() and ptep_modify
_prot_commit() helpers, while changing the page mapping. The platform code
can override these helpers via __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION.

Work around the problem via doing a break-before-make TLB invalidation, for
all executable user space mappings, that go through mprotect() system call.
This overrides ptep_modify_prot_start() and ptep_modify_prot_commit(), via
defining HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION on the platform thus giving
an opportunity to intercept user space exec mappings, and do the necessary
TLB invalidation. Similar interceptions are also implemented for HugeTLB.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20221116140915.356601-3-anshuman.khandual@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-11-18 16:52:40 +00:00
Zhen Lei
a9ae89df73 arm64: kdump: Support crashkernel=X fall back to reserve region above DMA zones
For crashkernel=X without '@offset', select a region within DMA zones
first, and fall back to reserve region above DMA zones. This allows
users to use the same configuration on multiple platforms.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Acked-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20221116121044.1690-3-thunder.leizhen@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-11-18 14:14:35 +00:00
Zhen Lei
a149cf00b1 arm64: kdump: Provide default size when crashkernel=Y,low is not specified
Try to allocate at least 128 MiB low memory automatically for the case
that crashkernel=,high is explicitly specified, while crashkenrel=,low
is omitted. This allows users to focus more on the high memory
requirements of their business rather than the low memory requirements
of the crash kernel booting.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Acked-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20221116121044.1690-2-thunder.leizhen@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-11-18 14:14:35 +00:00
Anshuman Khandual
9ed2b4616d arm64/mm: Drop redundant BUG_ON(!pgtable_alloc)
__create_pgd_mapping_locked() expects a page allocator used while mapping a
virtual range. This page allocator function propagates down the call chain,
while building intermediate levels in the page table. Passed page allocator
is a necessary ingredient required to build the page table but its presence
can be asserted just once in the very beginning rather than in all the down
stream functions. This consolidates BUG_ON(!pgtable_alloc) checks just in a
single place i.e __create_pgd_mapping_locked().

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20221118053102.500216-1-anshuman.khandual@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-11-18 14:10:41 +00:00
Mark Rutland
0bb1fbffc6 arm64: mm: kfence: only handle translation faults
Alexander noted that KFENCE only expects to handle faults from invalid page
table entries (i.e. translation faults), but arm64's fault handling logic will
call kfence_handle_page_fault() for other types of faults, including alignment
faults caused by unaligned atomics. This has the unfortunate property of
causing those other faults to be reported as "KFENCE: use-after-free",
which is misleading and hinders debugging.

Fix this by only forwarding unhandled translation faults to the KFENCE
code, similar to what x86 does already.

Alexander has verified that this passes all the tests in the KFENCE test
suite and avoids bogus reports on misaligned atomics.

Link: https://lore.kernel.org/all/20221102081620.1465154-1-zhongbaisong@huawei.com/
Fixes: 840b239863 ("arm64, kfence: enable KFENCE for ARM64")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Alexander Potapenko <glider@google.com>
Tested-by: Alexander Potapenko <glider@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marco Elver <elver@google.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20221114104411.2853040-1-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-11-15 13:29:05 +00:00
Linus Torvalds
5ad6e7ba98 arm64 fixes:
- Another fix for rodata=full. Since rodata= is not a simple boolean on
   arm64 (accepting 'full' as well), it got inadvertently broken by
   changes in the core code. If rodata=on is the default and rodata=off
   is passed on the kernel command line, rodata_full is never disabled.
 
 - Fix gcc compiler warning of shifting 0xc0 into bits 31:24 without an
   explicit conversion to u32 (triggered by the AMPERE1 MIDR definition).
 
 - Include asm/ptrace.h in asm/syscall_wrapper.h to fix an incomplete
   struct pt_regs type causing the BPF verifier to refuse to load a
   tracing program which accesses pt_regs.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmNu1ZQACgkQa9axLQDI
 XvHdaQ/+NmdLFZuAbajhpV3NTmiC6oJmPZ9oGKxtBRGoHmH6WxoyB1X3Dj9ZMGBc
 qZ1mUohHHwf+xWtX85aPUx4aJKImFuqSOU4MF5gectzXFEr4jR2nx5CYBaRmG+0d
 axAUHIID4xWozid79KFUR46myG9GhTv8EGMlrY9VKY2GsFSD/EtNTQZhq+ISc05N
 SQAkHN3iiIJhitgTtxxA2jQ8sEOvq41N4VRj4trNUnAp5Gis4nnl9KGpcayN/60z
 8/9IGwcaLpnQnlf74E74y53Tfq8ZHSpiNcl3J5Mk1eMNrbkFePakyH/OtpHHYiZ0
 /Y1cqUqvK2oZODE1Ro4mBMFLq3+lFMedla5S0jZtjApYnJ8MmUqFQwvJHIn0K02h
 oH1OstQtYLUINTF3k1+TaYnmFjenx1eOJ/tdLjkcgI9WFbAYcqu1Z4bcO0HR0fqj
 7d7HKCElHg+KIQBTfDtLvO1lNRvdpZYWe5uN3ItzPpYqrah4haHPJXPxcxVChkMv
 A445kWMX5pV7CTNzlkrGgWjWkF8EwJ+lgr/xASrpHPxW80cZ7b+eg2lFkU0fptsz
 FoHdUduGDB91hHo/k0Adx2gaS37SNOcBczr/1xFcId1ZpcxWwuBM1JeEDz1ccRDR
 P9qKykOn6sQ8ykQGDIlyY9Hgaz8KgihT2VDixRyY0NNGq0XO4LM=
 =W6FH
 -----END PGP SIGNATURE-----

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fixes from Catalin Marinas:

 - Another fix for rodata=full. Since rodata= is not a simple boolean on
   arm64 (accepting 'full' as well), it got inadvertently broken by
   changes in the core code. If rodata=on is the default and rodata=off
   is passed on the kernel command line, rodata_full is never disabled

 - Fix gcc compiler warning of shifting 0xc0 into bits 31:24 without an
   explicit conversion to u32 (triggered by the AMPERE1 MIDR definition)

 - Include asm/ptrace.h in asm/syscall_wrapper.h to fix an incomplete
   struct pt_regs type causing the BPF verifier to refuse to load a
   tracing program which accesses pt_regs

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64/syscall: Include asm/ptrace.h in syscall_wrapper header.
  arm64: Fix bit-shifting UB in the MIDR_CPU_MODEL() macro
  arm64: fix rodata=full again
2022-11-11 17:10:13 -08:00
Kefeng Wang
e025ab842e mm: remove kern_addr_valid() completely
Most architectures (except arm64/x86/sparc) simply return 1 for
kern_addr_valid(), which is only used in read_kcore(), and it calls
copy_from_kernel_nofault() which could check whether the address is a
valid kernel address.  So as there is no need for kern_addr_valid(), let's
remove it.

Link: https://lkml.kernel.org/r/20221018074014.185687-1-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>	[m68k]
Acked-by: Heiko Carstens <hca@linux.ibm.com>		[s390]
Acked-by: Christoph Hellwig <hch@lst.de>
Acked-by: Helge Deller <deller@gmx.de>			[parisc]
Acked-by: Michael Ellerman <mpe@ellerman.id.au>		[powerpc]
Acked-by: Guo Ren <guoren@kernel.org>			[csky]
Acked-by: Catalin Marinas <catalin.marinas@arm.com>	[arm64]
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: <aou@eecs.berkeley.edu>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Chris Zankel <chris@zankel.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Palmer Dabbelt <palmer@rivosinc.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Xuerui Wang <kernel@xen0n.name>
Cc: Yoshinori Sato <ysato@users.osdn.me>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-08 17:37:18 -08:00
Ard Biesheuvel
2081b3bd0c arm64: fix rodata=full again
Commit 2e8cff0a0e ("arm64: fix rodata=full") addressed a couple of
issues with the rodata= kernel command line option, which is not a
simple boolean on arm64, and inadvertently got broken due to changes in
the generic bool handling.

Unfortunately, the resulting code never clears the rodata_full boolean
variable if it defaults to true and rodata=on or rodata=off is passed,
as the generic code is not aware of the existence of this variable.

Given the way this code is plumbed together, clearing rodata_full when
returning false from arch_parse_debug_rodata() may result in
inconsistencies if the generic code decides that it cannot parse the
right hand side, so the best way to deal with this is to only take
rodata_full in account if rodata_enabled is also true.

Fixes: 2e8cff0a0e ("arm64: fix rodata=full")
Cc: <stable@vger.kernel.org> # 6.0.x
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20221103170015.4124426-1-ardb@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-11-08 16:21:40 +00:00
Anshuman Khandual
38e4b6605e arm64/mm: Drop ARM64_KERNEL_USES_PMD_MAPS
Currently ARM64_KERNEL_USES_PMD_MAPS is an unnecessary abstraction. Kernel
mapping at PMD (aka huge page aka block) level, is only applicable with 4K
base page, which makes it 2MB aligned, a necessary requirement for linear
mapping and physical memory start address. This can be easily achieved by
directly checking against base page size itself. This drops off the macro
ARM64_KERNE_USES_PMD_MAPS which is redundant.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20221108034406.2950071-1-anshuman.khandual@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-11-08 13:59:36 +00:00
Ard Biesheuvel
23715a26c8 arm64: efi: Recover from synchronous exceptions occurring in firmware
Unlike x86, which has machinery to deal with page faults that occur
during the execution of EFI runtime services, arm64 has nothing like
that, and a synchronous exception raised by firmware code brings down
the whole system.

With more EFI based systems appearing that were not built to run Linux
(such as the Windows-on-ARM laptops based on Qualcomm SOCs), as well as
the introduction of PRM (platform specific firmware routines that are
callable just like EFI runtime services), we are more likely to run into
issues of this sort, and it is much more likely that we can identify and
work around such issues if they don't bring down the system entirely.

Since we already use a EFI runtime services call wrapper in assembler,
we can quite easily add some code that captures the execution state at
the point where the call is made, allowing us to revert to this state
and proceed execution if the call triggered a synchronous exception.

Given that the kernel and the firmware don't share any data structures
that could end up in an indeterminate state, we can happily continue
running, as long as we mark the EFI runtime services as unavailable from
that point on.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
2022-11-03 18:01:15 +01:00
Linus Torvalds
1df046ab1c arm64 fixes:
- Cortex-A55 errata workaround (repeat TLBI).
 
 - AMPERE1 added to the Spectre-BHB affected list.
 
 - MTE fix to avoid setting PG_mte_tagged if no tags have been touched on
   a page.
 
 - Fixed typo in the SCTLR_EL1.SPINTMASK bit naming (the commit log has
   other typos).
 
 - perf: return value check in ali_drw_pmu_probe(),
   ALIBABA_UNCORE_DRW_PMU dependency on ACPI.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmNJrpAACgkQa9axLQDI
 XvFwWQ/+O71bVQPXf43p+O3LapX3IsOxDCWLTMjNcxVgSBSK+TPwtjXN7vIPZDvv
 Ibx4Y10vmHo8Copbs7C8USx+hGo7hzknk/s2zoeqJQX13WQkqpuAwTDshzMp60La
 nQoJXab3KapQ3UIPL5El/cbvAD9+DGJSiWdyvC8GBHwtWKWi1WDpSNFN3WMJm97P
 uQqERiWaf3XOI9BhsuOlCzQE5eemCllycdWoRBelCjIQByuo6SaDPEpTUZDICCPp
 f4Ji7U1hfORmXg/DJcjSJbtkSshVRqjhSAtAmP/sWUic7+kWGBiC+zJQ0PxwiNQH
 Bfryz90ETa/INA65hA1iC51lE7hvt1DKueZAMKjozxYSCSVAxUNonSkEOfKegPeU
 hLhTowmveryqxYGuQ75p5tZjdpvML0Sa/lx7p/GUEhaV77dca/EJ0B68x8WrBpO5
 TCsW3iDq2V+ErWgYL7n6nFoMhZQnNvq9jxhAPuJ8Y47ZkeQ8HcvooKLHUSDSjMk2
 f/7A7rUJh0piYf0FEPSjRBTO/HyPb1D90n1t2wJoCqwrICZ/mmWzVqua0fgmrbvS
 H33YQiSEIkwsfLktIIJRGknYgC0P/JALKlAQPAcmsd+njWsThXJ/WwwRrpvCZdMj
 9CVuDfhw7Ipt4Iz5Tg61lLDkzi7bPRqPpEKc8zzsI3nmY0KC/iA=
 =vjYu
 -----END PGP SIGNATURE-----

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fixes from Catalin Marinas:

 - Cortex-A55 errata workaround (repeat TLBI)

 - AMPERE1 added to the Spectre-BHB affected list

 - MTE fix to avoid setting PG_mte_tagged if no tags have been touched
   on a page

 - Fixed typo in the SCTLR_EL1.SPINTMASK bit naming (the commit log has
   other typos)

 - perf: return value check in ali_drw_pmu_probe(),
   ALIBABA_UNCORE_DRW_PMU dependency on ACPI

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: Add AMPERE1 to the Spectre-BHB affected list
  arm64: mte: Avoid setting PG_mte_tagged if no tags cleared or restored
  MAINTAINERS: rectify file entry in ALIBABA PMU DRIVER
  drivers/perf: ALIBABA_UNCORE_DRW_PMU should depend on ACPI
  drivers/perf: fix return value check in ali_drw_pmu_probe()
  arm64: errata: Add Cortex-A55 to the repeat tlbi list
  arm64/sysreg: Fix typo in SCTR_EL1.SPINTMASK
2022-10-14 12:38:03 -07:00
Catalin Marinas
a8e5e5146a arm64: mte: Avoid setting PG_mte_tagged if no tags cleared or restored
Prior to commit 69e3b846d8 ("arm64: mte: Sync tags for pages where PTE
is untagged"), mte_sync_tags() was only called for pte_tagged() entries
(those mapped with PROT_MTE). Therefore mte_sync_tags() could safely use
test_and_set_bit(PG_mte_tagged, &page->flags) without inadvertently
setting PG_mte_tagged on an untagged page.

The above commit was required as guests may enable MTE without any
control at the stage 2 mapping, nor a PROT_MTE mapping in the VMM.
However, the side-effect was that any page with a PTE that looked like
swap (or migration) was getting PG_mte_tagged set automatically. A
subsequent page copy (e.g. migration) copied the tags to the destination
page even if the tags were owned by KASAN.

This issue was masked by the page_kasan_tag_reset() call introduced in
commit e5b8d92189 ("arm64: mte: reset the page tag in page->flags").
When this commit was reverted (20794545c1), KASAN started reporting
access faults because the overriding tags in a page did not match the
original page->flags (with CONFIG_KASAN_HW_TAGS=y):

  BUG: KASAN: invalid-access in copy_page+0x10/0xd0 arch/arm64/lib/copy_page.S:26
  Read at addr f5ff000017f2e000 by task syz-executor.1/2218
  Pointer tag: [f5], memory tag: [f2]

Move the PG_mte_tagged bit setting from mte_sync_tags() to the actual
place where tags are cleared (mte_sync_page_tags()) or restored
(mte_restore_tags()).

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: syzbot+c2c79c6d6eddc5262b77@syzkaller.appspotmail.com
Fixes: 69e3b846d8 ("arm64: mte: Sync tags for pages where PTE is untagged")
Cc: <stable@vger.kernel.org> # 5.14.x
Cc: Steven Price <steven.price@arm.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/0000000000004387dc05e5888ae5@google.com/
Reviewed-by: Steven Price <steven.price@arm.com>
Link: https://lore.kernel.org/r/20221006163354.3194102-1-catalin.marinas@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-10-12 10:00:19 +01:00
Linus Torvalds
27bc50fc90 - Yu Zhao's Multi-Gen LRU patches are here. They've been under test in
linux-next for a couple of months without, to my knowledge, any negative
   reports (or any positive ones, come to that).
 
 - Also the Maple Tree from Liam R.  Howlett.  An overlapping range-based
   tree for vmas.  It it apparently slight more efficient in its own right,
   but is mainly targeted at enabling work to reduce mmap_lock contention.
 
   Liam has identified a number of other tree users in the kernel which
   could be beneficially onverted to mapletrees.
 
   Yu Zhao has identified a hard-to-hit but "easy to fix" lockdep splat
   (https://lkml.kernel.org/r/CAOUHufZabH85CeUN-MEMgL8gJGzJEWUrkiM58JkTbBhh-jew0Q@mail.gmail.com).
   This has yet to be addressed due to Liam's unfortunately timed
   vacation.  He is now back and we'll get this fixed up.
 
 - Dmitry Vyukov introduces KMSAN: the Kernel Memory Sanitizer.  It uses
   clang-generated instrumentation to detect used-unintialized bugs down to
   the single bit level.
 
   KMSAN keeps finding bugs.  New ones, as well as the legacy ones.
 
 - Yang Shi adds a userspace mechanism (madvise) to induce a collapse of
   memory into THPs.
 
 - Zach O'Keefe has expanded Yang Shi's madvise(MADV_COLLAPSE) to support
   file/shmem-backed pages.
 
 - userfaultfd updates from Axel Rasmussen
 
 - zsmalloc cleanups from Alexey Romanov
 
 - cleanups from Miaohe Lin: vmscan, hugetlb_cgroup, hugetlb and memory-failure
 
 - Huang Ying adds enhancements to NUMA balancing memory tiering mode's
   page promotion, with a new way of detecting hot pages.
 
 - memcg updates from Shakeel Butt: charging optimizations and reduced
   memory consumption.
 
 - memcg cleanups from Kairui Song.
 
 - memcg fixes and cleanups from Johannes Weiner.
 
 - Vishal Moola provides more folio conversions
 
 - Zhang Yi removed ll_rw_block() :(
 
 - migration enhancements from Peter Xu
 
 - migration error-path bugfixes from Huang Ying
 
 - Aneesh Kumar added ability for a device driver to alter the memory
   tiering promotion paths.  For optimizations by PMEM drivers, DRM
   drivers, etc.
 
 - vma merging improvements from Jakub Matěn.
 
 - NUMA hinting cleanups from David Hildenbrand.
 
 - xu xin added aditional userspace visibility into KSM merging activity.
 
 - THP & KSM code consolidation from Qi Zheng.
 
 - more folio work from Matthew Wilcox.
 
 - KASAN updates from Andrey Konovalov.
 
 - DAMON cleanups from Kaixu Xia.
 
 - DAMON work from SeongJae Park: fixes, cleanups.
 
 - hugetlb sysfs cleanups from Muchun Song.
 
 - Mike Kravetz fixes locking issues in hugetlbfs and in hugetlb core.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY0HaPgAKCRDdBJ7gKXxA
 joPjAQDZ5LlRCMWZ1oxLP2NOTp6nm63q9PWcGnmY50FjD/dNlwEAnx7OejCLWGWf
 bbTuk6U2+TKgJa4X7+pbbejeoqnt5QU=
 =xfWx
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2022-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - Yu Zhao's Multi-Gen LRU patches are here. They've been under test in
   linux-next for a couple of months without, to my knowledge, any
   negative reports (or any positive ones, come to that).

 - Also the Maple Tree from Liam Howlett. An overlapping range-based
   tree for vmas. It it apparently slightly more efficient in its own
   right, but is mainly targeted at enabling work to reduce mmap_lock
   contention.

   Liam has identified a number of other tree users in the kernel which
   could be beneficially onverted to mapletrees.

   Yu Zhao has identified a hard-to-hit but "easy to fix" lockdep splat
   at [1]. This has yet to be addressed due to Liam's unfortunately
   timed vacation. He is now back and we'll get this fixed up.

 - Dmitry Vyukov introduces KMSAN: the Kernel Memory Sanitizer. It uses
   clang-generated instrumentation to detect used-unintialized bugs down
   to the single bit level.

   KMSAN keeps finding bugs. New ones, as well as the legacy ones.

 - Yang Shi adds a userspace mechanism (madvise) to induce a collapse of
   memory into THPs.

 - Zach O'Keefe has expanded Yang Shi's madvise(MADV_COLLAPSE) to
   support file/shmem-backed pages.

 - userfaultfd updates from Axel Rasmussen

 - zsmalloc cleanups from Alexey Romanov

 - cleanups from Miaohe Lin: vmscan, hugetlb_cgroup, hugetlb and
   memory-failure

 - Huang Ying adds enhancements to NUMA balancing memory tiering mode's
   page promotion, with a new way of detecting hot pages.

 - memcg updates from Shakeel Butt: charging optimizations and reduced
   memory consumption.

 - memcg cleanups from Kairui Song.

 - memcg fixes and cleanups from Johannes Weiner.

 - Vishal Moola provides more folio conversions

 - Zhang Yi removed ll_rw_block() :(

 - migration enhancements from Peter Xu

 - migration error-path bugfixes from Huang Ying

 - Aneesh Kumar added ability for a device driver to alter the memory
   tiering promotion paths. For optimizations by PMEM drivers, DRM
   drivers, etc.

 - vma merging improvements from Jakub Matěn.

 - NUMA hinting cleanups from David Hildenbrand.

 - xu xin added aditional userspace visibility into KSM merging
   activity.

 - THP & KSM code consolidation from Qi Zheng.

 - more folio work from Matthew Wilcox.

 - KASAN updates from Andrey Konovalov.

 - DAMON cleanups from Kaixu Xia.

 - DAMON work from SeongJae Park: fixes, cleanups.

 - hugetlb sysfs cleanups from Muchun Song.

 - Mike Kravetz fixes locking issues in hugetlbfs and in hugetlb core.

Link: https://lkml.kernel.org/r/CAOUHufZabH85CeUN-MEMgL8gJGzJEWUrkiM58JkTbBhh-jew0Q@mail.gmail.com [1]

* tag 'mm-stable-2022-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (555 commits)
  hugetlb: allocate vma lock for all sharable vmas
  hugetlb: take hugetlb vma_lock when clearing vma_lock->vma pointer
  hugetlb: fix vma lock handling during split vma and range unmapping
  mglru: mm/vmscan.c: fix imprecise comments
  mm/mglru: don't sync disk for each aging cycle
  mm: memcontrol: drop dead CONFIG_MEMCG_SWAP config symbol
  mm: memcontrol: use do_memsw_account() in a few more places
  mm: memcontrol: deprecate swapaccounting=0 mode
  mm: memcontrol: don't allocate cgroup swap arrays when memcg is disabled
  mm/secretmem: remove reduntant return value
  mm/hugetlb: add available_huge_pages() func
  mm: remove unused inline functions from include/linux/mm_inline.h
  selftests/vm: add selftest for MADV_COLLAPSE of uffd-minor memory
  selftests/vm: add file/shmem MADV_COLLAPSE selftest for cleared pmd
  selftests/vm: add thp collapse shmem testing
  selftests/vm: add thp collapse file and tmpfs testing
  selftests/vm: modularize thp collapse memory operations
  selftests/vm: dedup THP helpers
  mm/khugepaged: add tracepoint to hpage_collapse_scan_file()
  mm/madvise: add file and shmem support to MADV_COLLAPSE
  ...
2022-10-10 17:53:04 -07:00
Linus Torvalds
f23cdfcd04 IOMMU Updates for Linux v6.1:
Including:
 
 	- Removal of the bus_set_iommu() interface which became
 	  unnecesary because of IOMMU per-device probing
 
 	- Make the dma-iommu.h header private
 
 	- Intel VT-d changes from Lu Baolu:
 	  - Decouple PASID and PRI from SVA
 	  - Add ESRTPS & ESIRTPS capability check
 	  - Cleanups
 
 	- Apple DART support for the M1 Pro/MAX SOCs
 
 	- Support for AMD IOMMUv2 page-tables for the DMA-API layer. The
 	  v2 page-tables are compatible with the x86 CPU page-tables.
 	  Using them for DMA-API prepares support for hardware-assisted
 	  IOMMU virtualization
 
 	- Support for MT6795 Helio X10 M4Us in the Mediatek IOMMU driver
 
 	- Some smaller fixes and cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmNEC5oACgkQK/BELZcB
 GuNcOQ/6A5SXmcvDRLYZW1ENM5Z6xsZ1LabSZkjhYSpmbJyu8Uny/Z2aRWqxPMLJ
 hJeHTsWSLhrTq1VfjFhELHB3kgT2DRr7H3LXXaMNC6qz690EcavX1wKX2AxH0m22
 8YrktkyAmFQ3BG6rsQLdlMMasLph/x06ix/xO9opQZVFdj/fV0Jx7ekX1JK+U3hx
 MI96i5W3G5PBVHBypAvjxSlmA4saj9Fhk7l3IZL7py9AOKz7NypuwWRs+86PMBiO
 EzLt5aF4g8pmKChF/c9BsoIbjBYvTG/s3NbycIng0ACc2SOvf+EvtoVZQclWifbT
 lwti9PLdsoVUnPOZHLYOTx4xSf/UyoLVzaLxJ52aoXnNYe2qaX5DANXhT2mWIY/Y
 z1mzOkShmK7WF7a8arRyqJeLJ4SvDx8GrbvLiom3DAzmqVHzzFGadHtt5fvGYN4F
 Jet/JIN3HjECQbamqtPBpWquBFhLmgusPksIiyMFscRvYdZqkaVkTkElcF3WqAMm
 QkeecfoTQ9Vdtdz44ZVLRjKpS77yRZmHshp1r/rfSI+9Ok8uRI+xmmcyrAI6ElqH
 DH14tLHPzw694rTHF+bTCd+pPMGOoFLi0xAfUXAeGWm1uzC1JIRrVu5JeQNOUOSD
 5SQDXB7dPrhXngaws5Fx2u3amCO3688mslcGgM7q54kC+LyVo0E=
 =h0sT
 -----END PGP SIGNATURE-----

Merge tag 'iommu-updates-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull iommu updates from Joerg Roedel:

 - remove the bus_set_iommu() interface which became unnecesary because
   of IOMMU per-device probing

 - make the dma-iommu.h header private

 - Intel VT-d changes from Lu Baolu:
	  - Decouple PASID and PRI from SVA
	  - Add ESRTPS & ESIRTPS capability check
	  - Cleanups

 - Apple DART support for the M1 Pro/MAX SOCs

 - support for AMD IOMMUv2 page-tables for the DMA-API layer.

   The v2 page-tables are compatible with the x86 CPU page-tables. Using
   them for DMA-API prepares support for hardware-assisted IOMMU
   virtualization

 - support for MT6795 Helio X10 M4Us in the Mediatek IOMMU driver

 - some smaller fixes and cleanups

* tag 'iommu-updates-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (59 commits)
  iommu/vt-d: Avoid unnecessary global DMA cache invalidation
  iommu/vt-d: Avoid unnecessary global IRTE cache invalidation
  iommu/vt-d: Rename cap_5lp_support to cap_fl5lp_support
  iommu/vt-d: Remove pasid_set_eafe()
  iommu/vt-d: Decouple PASID & PRI enabling from SVA
  iommu/vt-d: Remove unnecessary SVA data accesses in page fault path
  dt-bindings: iommu: arm,smmu-v3: Relax order of interrupt names
  iommu: dart: Support t6000 variant
  iommu/io-pgtable-dart: Add DART PTE support for t6000
  iommu/io-pgtable: Add DART subpage protection support
  iommu/io-pgtable: Move Apple DART support to its own file
  iommu/mediatek: Add support for MT6795 Helio X10 M4Us
  iommu/mediatek: Introduce new flag TF_PORT_TO_ADDR_MT8173
  dt-bindings: mediatek: Add bindings for MT6795 M4U
  iommu/iova: Fix module config properly
  iommu/amd: Fix sparse warning
  iommu/amd: Remove outdated comment
  iommu/amd: Free domain ID after domain_flush_pages
  iommu/amd: Free domain id in error path
  iommu/virtio: Fix compile error with viommu_capable()
  ...
2022-10-10 13:20:53 -07:00
Linus Torvalds
18fd049731 arm64 updates for 6.1:
- arm64 perf: DDR PMU driver for Alibaba's T-Head Yitian 710 SoC, SVE
   vector granule register added to the user regs together with SVE perf
   extensions documentation.
 
 - SVE updates: add HWCAP for SVE EBF16, update the SVE ABI documentation
   to match the actual kernel behaviour (zeroing the registers on syscall
   rather than "zeroed or preserved" previously).
 
 - More conversions to automatic system registers generation.
 
 - vDSO: use self-synchronising virtual counter access in gettimeofday()
   if the architecture supports it.
 
 - arm64 stacktrace cleanups and improvements.
 
 - arm64 atomics improvements: always inline assembly, remove LL/SC
   trampolines.
 
 - Improve the reporting of EL1 exceptions: rework BTI and FPAC exception
   handling, better EL1 undefs reporting.
 
 - Cortex-A510 erratum 2658417: remove BF16 support due to incorrect
   result.
 
 - arm64 defconfig updates: build CoreSight as a module, enable options
   necessary for docker, memory hotplug/hotremove, enable all PMUs
   provided by Arm.
 
 - arm64 ptrace() support for TPIDR2_EL0 (register provided with the SME
   extensions).
 
 - arm64 ftraces updates/fixes: fix module PLTs with mcount, remove
   unused function.
 
 - kselftest updates for arm64: simple HWCAP validation, FP stress test
   improvements, validation of ZA regs in signal handlers, include larger
   SVE and SME vector lengths in signal tests, various cleanups.
 
 - arm64 alternatives (code patching) improvements to robustness and
   consistency: replace cpucap static branches with equivalent
   alternatives, associate callback alternatives with a cpucap.
 
 - Miscellaneous updates: optimise kprobe performance of patching
   single-step slots, simplify uaccess_mask_ptr(), move MTE registers
   initialisation to C, support huge vmalloc() mappings, run softirqs on
   the per-CPU IRQ stack, compat (arm32) misalignment fixups for
   multiword accesses.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmM9W4cACgkQa9axLQDI
 XvEy3w/+LJ3KCFowWiz5gTAWikjv+UVssHjLMJixn47V7hsEFQ26Xnam/438rTMI
 kE95u6DHUpw2SMIxKzFRO7oI5cQtP+cWGwTtOUnjVO+U1oN+HqDOIbO9DbylWDcU
 eeeqMMmawMfTPuZrYklpOhXscsorbrKIvYBg7wHYOcwBYV3EPhWr89lwMvTVRuyJ
 qpX628KlkGMaBcONNhv3nS3qZcAOs0oHQCAVS4C8czLDL+vtJlumXUS3xr1Mqm72
 xtFe7sje8Djr2kZ8mzh0GbFiZEBoBD3F/l7ayq8gVRaVpToUt8sk36Stjs4LojF1
 6imuAfji/5TItkScq5KhGqj6MIugwp/eUVbRN74OLNTYx7msF1ZADNFQ+Q0UuY0H
 SYK13KvmOji0xjS8qAfhqrwNB79sk3fb+zF9LjETbdz4ZJCgg9gcFbSUTY0DvMfS
 MXZk/jVeB07olA8xYbjh0BRt4UV9xU628FPQzK5k7e4Nzl4jSvgtJZCZanfuVtjy
 /ZS1vbN8o7tQLBAlVnw+Exi/VedkKxkkMgm8tPKsMgERTFDx0Pc4Gs72hRpDnPWT
 MRbeCCGleAf3JQ5vF0coBDNOCEVvweQgShHOyHTz0GyhWXLCFx3RJICo5I4EIpps
 LLUk4JK0fO3LVrf1AEpu5ZP4+Sact0zfsH3gB7qyLPYFDmjDXD8=
 =jl3Z
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Catalin Marinas:

 - arm64 perf: DDR PMU driver for Alibaba's T-Head Yitian 710 SoC, SVE
   vector granule register added to the user regs together with SVE perf
   extensions documentation.

 - SVE updates: add HWCAP for SVE EBF16, update the SVE ABI
   documentation to match the actual kernel behaviour (zeroing the
   registers on syscall rather than "zeroed or preserved" previously).

 - More conversions to automatic system registers generation.

 - vDSO: use self-synchronising virtual counter access in gettimeofday()
   if the architecture supports it.

 - arm64 stacktrace cleanups and improvements.

 - arm64 atomics improvements: always inline assembly, remove LL/SC
   trampolines.

 - Improve the reporting of EL1 exceptions: rework BTI and FPAC
   exception handling, better EL1 undefs reporting.

 - Cortex-A510 erratum 2658417: remove BF16 support due to incorrect
   result.

 - arm64 defconfig updates: build CoreSight as a module, enable options
   necessary for docker, memory hotplug/hotremove, enable all PMUs
   provided by Arm.

 - arm64 ptrace() support for TPIDR2_EL0 (register provided with the SME
   extensions).

 - arm64 ftraces updates/fixes: fix module PLTs with mcount, remove
   unused function.

 - kselftest updates for arm64: simple HWCAP validation, FP stress test
   improvements, validation of ZA regs in signal handlers, include
   larger SVE and SME vector lengths in signal tests, various cleanups.

 - arm64 alternatives (code patching) improvements to robustness and
   consistency: replace cpucap static branches with equivalent
   alternatives, associate callback alternatives with a cpucap.

 - Miscellaneous updates: optimise kprobe performance of patching
   single-step slots, simplify uaccess_mask_ptr(), move MTE registers
   initialisation to C, support huge vmalloc() mappings, run softirqs on
   the per-CPU IRQ stack, compat (arm32) misalignment fixups for
   multiword accesses.

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (126 commits)
  arm64: alternatives: Use vdso/bits.h instead of linux/bits.h
  arm64/kprobe: Optimize the performance of patching single-step slot
  arm64: defconfig: Add Coresight as module
  kselftest/arm64: Handle EINTR while reading data from children
  kselftest/arm64: Flag fp-stress as exiting when we begin finishing up
  kselftest/arm64: Don't repeat termination handler for fp-stress
  ARM64: reloc_test: add __init/__exit annotations to module init/exit funcs
  arm64/mm: fold check for KFENCE into can_set_direct_map()
  arm64: ftrace: fix module PLTs with mcount
  arm64: module: Remove unused plt_entry_is_initialized()
  arm64: module: Make plt_equals_entry() static
  arm64: fix the build with binutils 2.27
  kselftest/arm64: Don't enable v8.5 for MTE selftest builds
  arm64: uaccess: simplify uaccess_mask_ptr()
  arm64: asm/perf_regs.h: Avoid C++-style comment in UAPI header
  kselftest/arm64: Fix typo in hwcap check
  arm64: mte: move register initialization to C
  arm64: mm: handle ARM64_KERNEL_USES_PMD_MAPS in vmemmap_populate()
  arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()
  arm64/sve: Add Perf extensions documentation
  ...
2022-10-06 11:51:49 -07:00
Linus Torvalds
865dad2022 kcfi updates for v6.1-rc1
This replaces the prior support for Clang's standard Control Flow
 Integrity (CFI) instrumentation, which has required a lot of special
 conditions (e.g. LTO) and work-arounds. The current implementation
 ("Kernel CFI") is specific to C, directly designed for the Linux kernel,
 and takes advantage of architectural features like x86's IBT. This
 series retains arm64 support and adds x86 support. Additional "generic"
 architectural support is expected soon:
 https://github.com/samitolvanen/llvm-project/commits/kcfi_generic
 
 - treewide: Remove old CFI support details
 
 - arm64: Replace Clang CFI support with Clang KCFI support
 
 - x86: Introduce Clang KCFI support
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmM4aAUWHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJkgWD/4mUgb7xewNIG/+fuipGd620Iao
 K0T8q4BNxLNRltOxNc3Q0WMDCggX0qJGCeds7EdFQJQOGxWcbifM8MAS4idAGM0G
 fc3Gxl1imC/oF6goCAbQgndA6jYFIWXGsv8LsRjAXRidWLFr3GFAqVqYJyokSySr
 8zMQsEDuF4I1gQnOhEWdtPZbV3MQ4ZjfFzpv+33agbq6Gb72vKvDh3G6g2VXlxjt
 1qnMtS+eEpbBU65cJkOi4MSLgymWbnIAeTMb0dbsV4kJ08YoTl8uz1B+weeH6GgT
 WP73ZJ4nqh1kkkT9EqS9oKozNB9fObhvCokEuAjuQ7i1eCEZsbShvRc0iL7OKTGG
 UfuTJa5qQ4h7Z0JS35FCSJETa+fcG0lTyEd133nLXLMZP9K2antf+A6O//fd0J1V
 Jg4VN7DQmZ+UNGOzRkL6dTtQUy4PkxhniIloaClfSYXxhNirA+v//sHTnTK3z2Bl
 6qceYqmFmns2Laual7+lvnZgt6egMBcmAL/MOdbU74+KIR9Xw76wxQjifktHX+WF
 FEUQkUJDB5XcUyKlbvHoqobRMxvEZ8RIlC5DIkgFiPRE3TI0MqfzNSFnQ/6+lFNg
 Y0AS9HYJmcj8sVzAJ7ji24WPFCXzsbFn6baJa9usDNbWyQZokYeiv7ZPNPHPDVrv
 YEBP6aYko0lVSUS9qw==
 =Li4D
 -----END PGP SIGNATURE-----

Merge tag 'kcfi-v6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull kcfi updates from Kees Cook:
 "This replaces the prior support for Clang's standard Control Flow
  Integrity (CFI) instrumentation, which has required a lot of special
  conditions (e.g. LTO) and work-arounds.

  The new implementation ("Kernel CFI") is specific to C, directly
  designed for the Linux kernel, and takes advantage of architectural
  features like x86's IBT. This series retains arm64 support and adds
  x86 support.

  GCC support is expected in the future[1], and additional "generic"
  architectural support is expected soon[2].

  Summary:

   - treewide: Remove old CFI support details

   - arm64: Replace Clang CFI support with Clang KCFI support

   - x86: Introduce Clang KCFI support"

Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107048 [1]
Link: https://github.com/samitolvanen/llvm-project/commits/kcfi_generic [2]

* tag 'kcfi-v6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (22 commits)
  x86: Add support for CONFIG_CFI_CLANG
  x86/purgatory: Disable CFI
  x86: Add types to indirectly called assembly functions
  x86/tools/relocs: Ignore __kcfi_typeid_ relocations
  kallsyms: Drop CONFIG_CFI_CLANG workarounds
  objtool: Disable CFI warnings
  objtool: Preserve special st_shndx indexes in elf_update_symbol
  treewide: Drop __cficanonical
  treewide: Drop WARN_ON_FUNCTION_MISMATCH
  treewide: Drop function_nocfi
  init: Drop __nocfi from __init
  arm64: Drop unneeded __nocfi attributes
  arm64: Add CFI error handling
  arm64: Add types to indirect called assembly functions
  psci: Fix the function type for psci_initcall_t
  lkdtm: Emit an indirect call for CFI tests
  cfi: Add type helper macros
  cfi: Switch to -fsanitize=kcfi
  cfi: Drop __CFI_ADDRESSABLE
  cfi: Remove CONFIG_CFI_CLANG_SHADOW
  ...
2022-10-03 17:11:07 -07:00
Catalin Marinas
53630a1f61 Merge branch 'for-next/misc' into for-next/core
* for-next/misc:
  : Miscellaneous patches
  arm64/kprobe: Optimize the performance of patching single-step slot
  ARM64: reloc_test: add __init/__exit annotations to module init/exit funcs
  arm64/mm: fold check for KFENCE into can_set_direct_map()
  arm64: uaccess: simplify uaccess_mask_ptr()
  arm64: mte: move register initialization to C
  arm64: mm: handle ARM64_KERNEL_USES_PMD_MAPS in vmemmap_populate()
  arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()
  arm64: support huge vmalloc mappings
  arm64: spectre: increase parameters that can be used to turn off bhb mitigation individually
  arm64: run softirqs on the per-CPU IRQ stack
  arm64: compat: Implement misalignment fixups for multiword loads
2022-09-30 09:18:26 +01:00
Mike Rapoport
b9dd04a20f arm64/mm: fold check for KFENCE into can_set_direct_map()
KFENCE requires linear map to be mapped at page granularity, so that it
is possible to protect/unprotect single pages, just like with
rodata_full and DEBUG_PAGEALLOC.

Instead of repating

	can_set_direct_map() || IS_ENABLED(CONFIG_KFENCE)

make can_set_direct_map() handle the KFENCE case.

This also prevents potential false positives in kernel_page_present()
that may return true for non-present page if CONFIG_KFENCE is enabled.

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20220921074841.382615-1-rppt@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-09-29 17:59:28 +01:00
Peter Xu
0d206b5d2e mm/swap: add swp_offset_pfn() to fetch PFN from swap entry
We've got a bunch of special swap entries that stores PFN inside the swap
offset fields.  To fetch the PFN, normally the user just calls
swp_offset() assuming that'll be the PFN.

Add a helper swp_offset_pfn() to fetch the PFN instead, fetching only the
max possible length of a PFN on the host, meanwhile doing proper check
with MAX_PHYSMEM_BITS to make sure the swap offsets can actually store the
PFNs properly always using the BUILD_BUG_ON() in is_pfn_swap_entry().

One reason to do so is we never tried to sanitize whether swap offset can
really fit for storing PFN.  At the meantime, this patch also prepares us
with the future possibility to store more information inside the swp
offset field, so assuming "swp_offset(entry)" to be the PFN will not stand
any more very soon.

Replace many of the swp_offset() callers to use swp_offset_pfn() where
proper.  Note that many of the existing users are not candidates for the
replacement, e.g.:

  (1) When the swap entry is not a pfn swap entry at all, or,
  (2) when we wanna keep the whole swp_offset but only change the swp type.

For the latter, it can happen when fork() triggered on a write-migration
swap entry pte, we may want to only change the migration type from
write->read but keep the rest, so it's not "fetching PFN" but "changing
swap type only".  They're left aside so that when there're more
information within the swp offset they'll be carried over naturally in
those cases.

Since at it, dropping hwpoison_entry_to_pfn() because that's exactly what
the new swp_offset_pfn() is about.

Link: https://lkml.kernel.org/r/20220811161331.37055-4-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-26 19:46:05 -07:00
Sami Tolvanen
c50d32859e arm64: Add types to indirect called assembly functions
With CONFIG_CFI_CLANG, assembly functions indirectly called from C
code must be annotated with type identifiers to pass CFI checking. Use
SYM_TYPED_FUNC_START for the indirectly called functions, and ensure
we emit `bti c` also with SYM_TYPED_FUNC_START.

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Kees Cook <keescook@chromium.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220908215504.3686827-10-samitolvanen@google.com
2022-09-26 10:13:13 -07:00
Joerg Roedel
38713c6028 Merge branches 'apple/dart', 'arm/mediatek', 'arm/omap', 'arm/smmu', 'virtio', 'x86/vt-d', 'x86/amd' and 'core' into next 2022-09-26 15:52:31 +02:00
Peter Collingbourne
973b9e3733 arm64: mte: move register initialization to C
If FEAT_MTE2 is disabled via the arm64.nomte command line argument on a
CPU that claims to support FEAT_MTE2, the kernel will use Tagged Normal
in the MAIR. If we interpret arm64.nomte to mean that the CPU does not
in fact implement FEAT_MTE2, setting the system register like this may
lead to UNSPECIFIED behavior. Fix it by arranging for MAIR to be set
in the C function cpu_enable_mte which is called based on the sanitized
version of the system register.

There is no need for the rest of the MTE-related system register
initialization to happen from assembly, with the exception of TCR_EL1,
which must be set to include at least TBI1 because the secondary CPUs
access KASan-allocated data structures early. Therefore, make the TCR_EL1
initialization unconditional and move the rest of the initialization to
cpu_enable_mte so that we no longer have a dependency on the unsanitized
ID register value.

Co-developed-by: Evgenii Stepanov <eugenis@google.com>
Signed-off-by: Peter Collingbourne <pcc@google.com>
Signed-off-by: Evgenii Stepanov <eugenis@google.com>
Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: kernel test robot <lkp@intel.com>
Fixes: 3b714d24ef ("arm64: mte: CPU feature detection and initial sysreg configuration")
Cc: <stable@vger.kernel.org> # 5.10.x
Link: https://lore.kernel.org/r/20220915222053.3484231-1-eugenis@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-09-22 18:02:50 +01:00
Kefeng Wang
739e49e0fc arm64: mm: handle ARM64_KERNEL_USES_PMD_MAPS in vmemmap_populate()
Directly check ARM64_SWAPPER_USES_SECTION_MAPS to choose base page
or PMD level huge page mapping in vmemmap_populate() to simplify
code a bit.

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Link: https://lore.kernel.org/r/20220920014951.196191-1-wangkefeng.wang@huawei.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-09-22 16:25:51 +01:00
Will Deacon
c44094eee3 arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()
arch_dma_prep_coherent() is called when preparing a non-cacheable region
for a consistent DMA buffer allocation. Since the buffer pages may
previously have been written via a cacheable mapping and consequently
allocated as dirty cachelines, the purpose of this function is to remove
these dirty lines from the cache, writing them back so that the
non-coherent device is able to see them.

On arm64, this operation can be achieved with a clean to the point of
coherency; a subsequent invalidation is not required and serves little
purpose in the presence of a cacheable alias (e.g. the linear map),
since clean lines can be speculatively fetched back into the cache after
the invalidation operation has completed.

Relax the cache maintenance in arch_dma_prep_coherent() so that only a
clean, and not a clean-and-invalidate operation is performed.

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20220823122111.17439-1-will@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-09-22 15:09:31 +01:00
Mark Rutland
61d2d1808b arm64: mm: don't acquire mutex when rewriting swapper
Since commit:

  47546a1912 ("arm64: mm: install KPTI nG mappings with MMU enabled)"

... when building with CONFIG_DEBUG_ATOMIC_SLEEP=y and booting under
QEMU TCG with '-cpu max', there's a boot-time splat:

| BUG: sleeping function called from invalid context at kernel/locking/mutex.c:580
| in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 15, name: migration/0
| preempt_count: 1, expected: 0
| RCU nest depth: 0, expected: 0
| no locks held by migration/0/15.
| irq event stamp: 28
| hardirqs last  enabled at (27): [<ffff8000091ed180>] _raw_spin_unlock_irq+0x3c/0x7c
| hardirqs last disabled at (28): [<ffff8000081b8d74>] multi_cpu_stop+0x150/0x18c
| softirqs last  enabled at (0): [<ffff80000809a314>] copy_process+0x594/0x1964
| softirqs last disabled at (0): [<0000000000000000>] 0x0
| CPU: 0 PID: 15 Comm: migration/0 Not tainted 6.0.0-rc3-00002-g419b42ff7eef #3
| Hardware name: linux,dummy-virt (DT)
| Stopper: multi_cpu_stop+0x0/0x18c <- stop_cpus.constprop.0+0xa0/0xfc
| Call trace:
|  dump_backtrace.part.0+0xd0/0xe0
|  show_stack+0x1c/0x5c
|  dump_stack_lvl+0x88/0xb4
|  dump_stack+0x1c/0x38
|  __might_resched+0x180/0x230
|  __might_sleep+0x4c/0xa0
|  __mutex_lock+0x5c/0x450
|  mutex_lock_nested+0x30/0x40
|  create_kpti_ng_temp_pgd+0x4fc/0x6d0
|  kpti_install_ng_mappings+0x2b8/0x3b0
|  cpu_enable_non_boot_scope_capabilities+0x7c/0xd0
|  multi_cpu_stop+0xa0/0x18c
|  cpu_stopper_thread+0x88/0x11c
|  smpboot_thread_fn+0x1ec/0x290
|  kthread+0x118/0x120
|  ret_from_fork+0x10/0x20

Since commit:

  ee017ee353 ("arm64/mm: avoid fixmap race condition when create pud mapping")

... once the kernel leave the SYSTEM_BOOTING state, the fixmap pagetable
entries are protected by the fixmap_lock mutex.

The new KPTI rewrite code uses __create_pgd_mapping() to create a
temporary pagetable. This happens in atomic context, after secondary
CPUs are brought up and the kernel has left the SYSTEM_BOOTING state.
Hence we try to acquire a mutex in atomic context, which is generally
unsound (though benign in this case as the mutex should be free and all
other CPUs are quiescent).

This patch avoids the issue by pulling the mutex out of alloc_init_pud()
and calling it at a higher level in the pagetable manipulation code.
This allows it to be used without locking where one CPU is known to be
in exclusive control of the machine, even after having left the
SYSTEM_BOOTING state.

Fixes: 47546a1912 ("arm64: mm: install KPTI nG mappings with MMU enabled")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220920134731.1625740-1-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-09-22 12:55:39 +01:00
Mark Brown
2e75b393ff arm64/sysreg: Standardise naming for MTE feature enumeration
In preparation for conversion to automatic generation refresh the names
given to the items in the MTE feture enumeration to reflect our standard
pattern for naming, corresponding to the architecture feature names they
reflect. No functional change.

Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Kristina Martsenko <kristina.martsenko@arm.com>
Link: https://lore.kernel.org/r/20220905225425.1871461-17-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-09-09 10:59:04 +01:00
Mark Brown
07d7d848b9 arm64/sysreg: Standardise naming of ID_AA64MMFR0_EL1.ASIDBits
For some reason we refer to ID_AA64MMFR0_EL1.ASIDBits as ASID. Add BITS
into the name, bringing the naming into sync with DDI0487H.a. Due to the
large amount of MixedCase in this register which isn't really consistent
with either the kernel style or the majority of the architecture the use of
upper case is preserved. No functional changes.

Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Kristina Martsenko <kristina.martsenko@arm.com>
Link: https://lore.kernel.org/r/20220905225425.1871461-10-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-09-09 10:59:02 +01:00
Mark Brown
6ca2b9ca45 arm64/sysreg: Add _EL1 into ID_AA64PFR1_EL1 constant names
Our standard is to include the _EL1 in the constant names for registers but
we did not do that for ID_AA64PFR1_EL1, update to do so in preparation for
conversion to automatic generation. No functional change.

Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Kristina Martsenko <kristina.martsenko@arm.com>
Link: https://lore.kernel.org/r/20220905225425.1871461-8-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-09-09 10:59:02 +01:00
Mark Brown
2d987e64e8 arm64/sysreg: Add _EL1 into ID_AA64MMFR0_EL1 definition names
Normally we include the full register name in the defines for fields within
registers but this has not been followed for ID registers. In preparation
for automatic generation of defines add the _EL1s into the defines for
ID_AA64MMFR0_EL1 to follow the convention. No functional changes.

Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Kristina Martsenko <kristina.martsenko@arm.com>
Link: https://lore.kernel.org/r/20220905225425.1871461-5-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-09-09 10:59:02 +01:00
Robin Murphy
fa49364cd5 iommu/dma: Move public interfaces to linux/iommu.h
The iommu-dma layer is now mostly encapsulated by iommu_dma_ops, with
only a couple more public interfaces left pertaining to MSI integration.
Since these depend on the main IOMMU API header anyway, move their
declarations there, taking the opportunity to update the half-baked
comments to proper kerneldoc along the way.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/9cd99738f52094e6bed44bfee03fa4f288d20695.1660668998.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-09-07 14:47:00 +02:00
Ard Biesheuvel
3fc24ef32d arm64: compat: Implement misalignment fixups for multiword loads
The 32-bit ARM kernel implements fixups on behalf of user space when
using LDM/STM or LDRD/STRD instructions on addresses that are not 32-bit
aligned. This is not something that is supported by the architecture,
but was done anyway to increase compatibility with user space software,
which mostly targeted x86 at the time and did not care about aligned
accesses.

This feature is one of the remaining impediments to being able to switch
to 64-bit kernels on 64-bit capable hardware running 32-bit user space,
so let's implement it for the arm64 compat layer as well.

Note that the intent is to implement the exact same handling of
misaligned multi-word loads and stores as the 32-bit kernel does,
including what appears to be missing support for user space programs
that rely on SETEND to switch to a different byte order and back. Also,
like the 32-bit ARM version, we rely on the faulting address reported by
the CPU to infer the memory address, instead of decoding the instruction
fully to obtain this information.

This implementation is taken from the 32-bit ARM tree, with all pieces
removed that deal with instructions other than LDRD/STRD and LDM/STM, or
that deal with alignment exceptions taken in kernel mode.

Cc: debian-arm@lists.debian.org
Cc: Vagrant Cascadian <vagrant@debian.org>
Cc: Riku Voipio <riku.voipio@iki.fi>
Cc: Steve McIntyre <steve@einval.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20220701135322.3025321-1-ardb@kernel.org
[catalin.marinas@arm.com: change the option to 'default n']
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-09-06 09:34:53 +01:00
Mark Rutland
2e8cff0a0e arm64: fix rodata=full
On arm64, "rodata=full" has been suppored (but not documented) since
commit:

  c55191e96c ("arm64: mm: apply r/o permissions of VM areas to its linear alias as well")

As it's necessary to determine the rodata configuration early during
boot, arm64 has an early_param() handler for this, whereas init/main.c
has a __setup() handler which is run later.

Unfortunately, this split meant that since commit:

  f9a40b0890 ("init/main.c: return 1 from handled __setup() functions")

... passing "rodata=full" would result in a spurious warning from the
__setup() handler (though RO permissions would be configured
appropriately).

Further, "rodata=full" has been broken since commit:

  0d6ea3ac94 ("lib/kstrtox.c: add "false"/"true" support to kstrtobool()")

... which caused strtobool() to parse "full" as false (in addition to
many other values not documented for the "rodata=" kernel parameter.

This patch fixes this breakage by:

* Moving the core parameter parser to an __early_param(), such that it
  is available early.

* Adding an (optional) arch hook which arm64 can use to parse "full".

* Updating the documentation to mention that "full" is valid for arm64.

* Having the core parameter parser handle "on" and "off" explicitly,
  such that any undocumented values (e.g. typos such as "ful") are
  reported as errors rather than being silently accepted.

Note that __setup() and early_param() have opposite conventions for
their return values, where __setup() uses 1 to indicate a parameter was
handled and early_param() uses 0 to indicate a parameter was handled.

Fixes: f9a40b0890 ("init/main.c: return 1 from handled __setup() functions")
Fixes: 0d6ea3ac94 ("lib/kstrtox.c: add "false"/"true" support to kstrtobool()")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jagdish Gediya <jvgediya@linux.ibm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220817154022.3974645-1-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-08-23 11:02:02 +01:00
Muchun Song
2da1c30929 mm: hugetlb_vmemmap: delete hugetlb_optimize_vmemmap_enabled()
Patch series "Simplify hugetlb vmemmap and improve its readability", v2.

This series aims to simplify hugetlb vmemmap and improve its readability.


This patch (of 8):

The name hugetlb_optimize_vmemmap_enabled() a bit confusing as it tests
two conditions (enabled and pages in use).  Instead of coming up to an
appropriate name, we could just delete it.  There is already a discussion
about deleting it in thread [1].

There is only one user of hugetlb_optimize_vmemmap_enabled() outside of
hugetlb_vmemmap, that is flush_dcache_page() in arch/arm64/mm/flush.c. 
However, it does not need to call hugetlb_optimize_vmemmap_enabled() in
flush_dcache_page() since HugeTLB pages are always fully mapped and only
head page will be set PG_dcache_clean meaning only head page's flag may
need to be cleared (see commit cf5a501d98).  So it is easy to remove
hugetlb_optimize_vmemmap_enabled().

Link: https://lore.kernel.org/all/c77c61c8-8a5a-87e8-db89-d04d8aaab4cc@oracle.com/ [1]
Link: https://lkml.kernel.org/r/20220628092235.91270-2-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-08-08 18:06:42 -07:00
Linus Torvalds
6614a3c316 - The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe
Lin, Yang Shi, Anshuman Khandual and Mike Rapoport
 
 - Some kmemleak fixes from Patrick Wang and Waiman Long
 
 - DAMON updates from SeongJae Park
 
 - memcg debug/visibility work from Roman Gushchin
 
 - vmalloc speedup from Uladzislau Rezki
 
 - more folio conversion work from Matthew Wilcox
 
 - enhancements for coherent device memory mapping from Alex Sierra
 
 - addition of shared pages tracking and CoW support for fsdax, from
   Shiyang Ruan
 
 - hugetlb optimizations from Mike Kravetz
 
 - Mel Gorman has contributed some pagealloc changes to improve latency
   and realtime behaviour.
 
 - mprotect soft-dirty checking has been improved by Peter Xu
 
 - Many other singleton patches all over the place
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCYuravgAKCRDdBJ7gKXxA
 jpqSAQDrXSdII+ht9kSHlaCVYjqRFQz/rRvURQrWQV74f6aeiAD+NHHeDPwZn11/
 SPktqEUrF1pxnGQxqLh1kUFUhsVZQgE=
 =w/UH
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2022-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:
 "Most of the MM queue. A few things are still pending.

  Liam's maple tree rework didn't make it. This has resulted in a few
  other minor patch series being held over for next time.

  Multi-gen LRU still isn't merged as we were waiting for mapletree to
  stabilize. The current plan is to merge MGLRU into -mm soon and to
  later reintroduce mapletree, with a view to hopefully getting both
  into 6.1-rc1.

  Summary:

   - The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe
     Lin, Yang Shi, Anshuman Khandual and Mike Rapoport

   - Some kmemleak fixes from Patrick Wang and Waiman Long

   - DAMON updates from SeongJae Park

   - memcg debug/visibility work from Roman Gushchin

   - vmalloc speedup from Uladzislau Rezki

   - more folio conversion work from Matthew Wilcox

   - enhancements for coherent device memory mapping from Alex Sierra

   - addition of shared pages tracking and CoW support for fsdax, from
     Shiyang Ruan

   - hugetlb optimizations from Mike Kravetz

   - Mel Gorman has contributed some pagealloc changes to improve
     latency and realtime behaviour.

   - mprotect soft-dirty checking has been improved by Peter Xu

   - Many other singleton patches all over the place"

 [ XFS merge from hell as per Darrick Wong in

   https://lore.kernel.org/all/YshKnxb4VwXycPO8@magnolia/ ]

* tag 'mm-stable-2022-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (282 commits)
  tools/testing/selftests/vm/hmm-tests.c: fix build
  mm: Kconfig: fix typo
  mm: memory-failure: convert to pr_fmt()
  mm: use is_zone_movable_page() helper
  hugetlbfs: fix inaccurate comment in hugetlbfs_statfs()
  hugetlbfs: cleanup some comments in inode.c
  hugetlbfs: remove unneeded header file
  hugetlbfs: remove unneeded hugetlbfs_ops forward declaration
  hugetlbfs: use helper macro SZ_1{K,M}
  mm: cleanup is_highmem()
  mm/hmm: add a test for cross device private faults
  selftests: add soft-dirty into run_vmtests.sh
  selftests: soft-dirty: add test for mprotect
  mm/mprotect: fix soft-dirty check in can_change_pte_writable()
  mm: memcontrol: fix potential oom_lock recursion deadlock
  mm/gup.c: fix formatting in check_and_migrate_movable_page()
  xfs: fail dax mount if reflink is enabled on a partition
  mm/memcontrol.c: remove the redundant updating of stats_flush_threshold
  userfaultfd: don't fail on unrecognized features
  hugetlb_cgroup: fix wrong hugetlb cgroup numa stat
  ...
2022-08-05 16:32:45 -07:00
Linus Torvalds
97a77ab14f EFI updates for v5.20
- Enable mirrored memory for arm64
 - Fix up several abuses of the efivar API
 - Refactor the efivar API in preparation for moving the 'business logic'
   part of it into efivarfs
 - Enable ACPI PRM on arm64
 -----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE+9lifEBpyUIVN1cpw08iOZLZjyQFAmLhuDIACgkQw08iOZLZ
 jyS9IQv/Wc2nhjN50S3gfrL+68/el/hGdP/J0FK5BOOjNosG2t1ZNYZtSthXqpPH
 hRrTU2m6PpQUalRpFDyLiHkJvdBFQe4VmvrzBa3TIBIzyflLQPJzkWrqThPchV+B
 qi4lmCtTDNIEJmayewqx1wWA+QmUiyI5zJ8wrZp84LTctBPL75seVv0SB20nqai0
 3/I73omB2RLVGpCpeWvb++vePXL8euFW3FEwCTM8hRboICjORTyIZPy8Y5os+3xT
 UgrIgVDOtn1Xwd4tK0qVwjOVA51east4Fcn3yGOrL40t+3SFm2jdpAJOO3UvyNPl
 vkbtjvXsIjt3/oxreKxXHLbamKyueWIfZRyCLsrg6wrr96oypPk6ID4iDCQoen/X
 Zf0VjM2vmvSd4YgnEIblOfSBxVg48cHJA4iVHVxFodNTrVnzGGFYPTmNKmJqo+Xn
 JeUILM7jlR4h/t0+cTTK3Busu24annTuuz5L5rjf4bUm6pPf4crb1yJaFWtGhlpa
 er233D6O
 =zI0R
 -----END PGP SIGNATURE-----

Merge tag 'efi-next-for-v5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi

Pull EFI updates from Ard Biesheuvel:

 - Enable mirrored memory for arm64

 - Fix up several abuses of the efivar API

 - Refactor the efivar API in preparation for moving the 'business
   logic' part of it into efivarfs

 - Enable ACPI PRM on arm64

* tag 'efi-next-for-v5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: (24 commits)
  ACPI: Move PRM config option under the main ACPI config
  ACPI: Enable Platform Runtime Mechanism(PRM) support on ARM64
  ACPI: PRM: Change handler_addr type to void pointer
  efi: Simplify arch_efi_call_virt() macro
  drivers: fix typo in firmware/efi/memmap.c
  efi: vars: Drop __efivar_entry_iter() helper which is no longer used
  efi: vars: Use locking version to iterate over efivars linked lists
  efi: pstore: Omit efivars caching EFI varstore access layer
  efi: vars: Add thin wrapper around EFI get/set variable interface
  efi: vars: Don't drop lock in the middle of efivar_init()
  pstore: Add priv field to pstore_record for backend specific use
  Input: applespi - avoid efivars API and invoke EFI services directly
  selftests/kexec: remove broken EFI_VARS secure boot fallback check
  brcmfmac: Switch to appropriate helper to load EFI variable contents
  iwlwifi: Switch to proper EFI variable store interface
  media: atomisp_gmin_platform: stop abusing efivar API
  efi: efibc: avoid efivar API for setting variables
  efi: avoid efivars layer when loading SSDTs from variables
  efi: Correct comment on efi_memmap_alloc
  memblock: Disable mirror feature if kernelcore is not specified
  ...
2022-08-03 14:38:02 -07:00
Linus Torvalds
0cec3f24a7 arm64 updates for 5.20
- Remove unused generic cpuidle support (replaced by PSCI version)
 
 - Fix documentation describing the kernel virtual address space
 
 - Handling of some new CPU errata in Arm implementations
 
 - Rework of our exception table code in preparation for handling
   machine checks (i.e. RAS errors) more gracefully
 
 - Switch over to the generic implementation of ioremap()
 
 - Fix lockdep tracking in NMI context
 
 - Instrument our memory barrier macros for KCSAN
 
 - Rework of the kPTI G->nG page-table repainting so that the MMU remains
   enabled and the boot time is no longer slowed to a crawl for systems
   which require the late remapping
 
 - Enable support for direct swapping of 2MiB transparent huge-pages on
   systems without MTE
 
 - Fix handling of MTE tags with allocating new pages with HW KASAN
 
 - Expose the SMIDR register to userspace via sysfs
 
 - Continued rework of the stack unwinder, particularly improving the
   behaviour under KASAN
 
 - More repainting of our system register definitions to match the
   architectural terminology
 
 - Improvements to the layout of the vDSO objects
 
 - Support for allocating additional bits of HWCAP2 and exposing
   FEAT_EBF16 to userspace on CPUs that support it
 
 - Considerable rework and optimisation of our early boot code to reduce
   the need for cache maintenance and avoid jumping in and out of the
   kernel when handling relocation under KASLR
 
 - Support for disabling SVE and SME support on the kernel command-line
 
 - Support for the Hisilicon HNS3 PMU
 
 - Miscellanous cleanups, trivial updates and minor fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmLeccUQHHdpbGxAa2Vy
 bmVsLm9yZwAKCRC3rHDchMFjNCysB/4ml92RJLhVwRAofbtFfVgVz3JLTSsvob9x
 Z7FhNDxfM/G32wKtOHU9tHkGJ+PMVWOPajukzxkMhxmilfTyHBbiisNWVRjKQxj4
 wrd07DNXPIv3bi8SWzS1y2y8ZqujZWjNJlX8SUCzEoxCVtuNKwrh96kU1jUjrkFZ
 kBo4E4wBWK/qW29nClGSCgIHRQNJaB/jvITlQhkqIb0pwNf3sAUzW7QoF1iTZWhs
 UswcLh/zC4q79k9poegdCt8chV5OBDLtLPnMxkyQFvsLYRp3qhyCSQQY/BxvO5JS
 jT9QR6d+1ewET9BFhqHlIIuOTYBCk3xn/PR9AucUl+ZBQd2tO4B1
 =LVH0
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Will Deacon:
 "Highlights include a major rework of our kPTI page-table rewriting
  code (which makes it both more maintainable and considerably faster in
  the cases where it is required) as well as significant changes to our
  early boot code to reduce the need for data cache maintenance and
  greatly simplify the KASLR relocation dance.

  Summary:

   - Remove unused generic cpuidle support (replaced by PSCI version)

   - Fix documentation describing the kernel virtual address space

   - Handling of some new CPU errata in Arm implementations

   - Rework of our exception table code in preparation for handling
     machine checks (i.e. RAS errors) more gracefully

   - Switch over to the generic implementation of ioremap()

   - Fix lockdep tracking in NMI context

   - Instrument our memory barrier macros for KCSAN

   - Rework of the kPTI G->nG page-table repainting so that the MMU
     remains enabled and the boot time is no longer slowed to a crawl
     for systems which require the late remapping

   - Enable support for direct swapping of 2MiB transparent huge-pages
     on systems without MTE

   - Fix handling of MTE tags with allocating new pages with HW KASAN

   - Expose the SMIDR register to userspace via sysfs

   - Continued rework of the stack unwinder, particularly improving the
     behaviour under KASAN

   - More repainting of our system register definitions to match the
     architectural terminology

   - Improvements to the layout of the vDSO objects

   - Support for allocating additional bits of HWCAP2 and exposing
     FEAT_EBF16 to userspace on CPUs that support it

   - Considerable rework and optimisation of our early boot code to
     reduce the need for cache maintenance and avoid jumping in and out
     of the kernel when handling relocation under KASLR

   - Support for disabling SVE and SME support on the kernel
     command-line

   - Support for the Hisilicon HNS3 PMU

   - Miscellanous cleanups, trivial updates and minor fixes"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (136 commits)
  arm64: Delay initialisation of cpuinfo_arm64::reg_{zcr,smcr}
  arm64: fix KASAN_INLINE
  arm64/hwcap: Support FEAT_EBF16
  arm64/cpufeature: Store elf_hwcaps as a bitmap rather than unsigned long
  arm64/hwcap: Document allocation of upper bits of AT_HWCAP
  arm64: enable THP_SWAP for arm64
  arm64/mm: use GENMASK_ULL for TTBR_BADDR_MASK_52
  arm64: errata: Remove AES hwcap for COMPAT tasks
  arm64: numa: Don't check node against MAX_NUMNODES
  drivers/perf: arm_spe: Fix consistency of SYS_PMSCR_EL1.CX
  perf: RISC-V: Add of_node_put() when breaking out of for_each_of_cpu_node()
  docs: perf: Include hns3-pmu.rst in toctree to fix 'htmldocs' WARNING
  arm64: kasan: Revert "arm64: mte: reset the page tag in page->flags"
  mm: kasan: Skip page unpoisoning only if __GFP_SKIP_KASAN_UNPOISON
  mm: kasan: Skip unpoisoning of user pages
  mm: kasan: Ensure the tags are visible before the tag in page->flags
  drivers/perf: hisi: add driver for HNS3 PMU
  drivers/perf: hisi: Add description for HNS3 PMU driver
  drivers/perf: riscv_pmu_sbi: perf format
  perf/arm-cci: Use the bitmap API to allocate bitmaps
  ...
2022-08-01 10:37:00 -07:00
Will Deacon
f96d67a8af Merge branch 'for-next/boot' into for-next/core
* for-next/boot: (34 commits)
  arm64: fix KASAN_INLINE
  arm64: Add an override for ID_AA64SMFR0_EL1.FA64
  arm64: Add the arm64.nosve command line option
  arm64: Add the arm64.nosme command line option
  arm64: Expose a __check_override primitive for oddball features
  arm64: Allow the idreg override to deal with variable field width
  arm64: Factor out checking of a feature against the override into a macro
  arm64: Allow sticky E2H when entering EL1
  arm64: Save state of HCR_EL2.E2H before switch to EL1
  arm64: Rename the VHE switch to "finalise_el2"
  arm64: mm: fix booting with 52-bit address space
  arm64: head: remove __PHYS_OFFSET
  arm64: lds: use PROVIDE instead of conditional definitions
  arm64: setup: drop early FDT pointer helpers
  arm64: head: avoid relocating the kernel twice for KASLR
  arm64: kaslr: defer initialization to initcall where permitted
  arm64: head: record CPU boot mode after enabling the MMU
  arm64: head: populate kernel page tables with MMU and caches on
  arm64: head: factor out TTBR1 assignment into a macro
  arm64: idreg-override: use early FDT mapping in ID map
  ...
2022-07-25 10:59:15 +01:00
Will Deacon
c436500d9f Merge branch 'for-next/mte' into for-next/core
* for-next/mte:
  arm64: kasan: Revert "arm64: mte: reset the page tag in page->flags"
  mm: kasan: Skip page unpoisoning only if __GFP_SKIP_KASAN_UNPOISON
  mm: kasan: Skip unpoisoning of user pages
  mm: kasan: Ensure the tags are visible before the tag in page->flags
2022-07-25 10:57:08 +01:00
Will Deacon
02eab44c71 Merge branch 'for-next/misc' into for-next/core
* for-next/misc:
  arm64/mm: use GENMASK_ULL for TTBR_BADDR_MASK_52
  arm64: numa: Don't check node against MAX_NUMNODES
  arm64: mm: Remove assembly DMA cache maintenance wrappers
  arm64/mm: Define defer_reserve_crashkernel()
  arm64: fix oops in concurrently setting insn_emulation sysctls
  arm64: Do not forget syscall when starting a new thread.
  arm64: boot: add zstd support
2022-07-25 10:56:57 +01:00
Will Deacon
8184a8bc1c Merge branch 'for-next/kpti' into for-next/core
* for-next/kpti:
  arm64: correct the effect of mitigations off on kpti
  arm64: entry: simplify trampoline data page
  arm64: mm: install KPTI nG mappings with MMU enabled
  arm64: kpti-ng: simplify page table traversal logic
2022-07-25 10:56:49 +01:00
Will Deacon
84d8857af4 Merge branch 'for-next/ioremap' into for-next/core
* for-next/ioremap:
  arm64: Add HAVE_IOREMAP_PROT support
  arm64: mm: Convert to GENERIC_IOREMAP
  mm: ioremap: Add ioremap/iounmap_allowed()
  mm: ioremap: Setup phys_addr of struct vm_struct
  mm: ioremap: Use more sensible name in ioremap_prot()
  ARM: mm: kill unused runtime hook arch_iounmap()
2022-07-25 10:56:23 +01:00
Anshuman Khandual
42251045cc arm64/mm: move protection_map[] inside the platform
This moves protection_map[] inside the platform and makes it a static.

Link: https://lkml.kernel.org/r/20220711070600.2378316-6-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Brian Cain <bcain@quicinc.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chris Zankel <chris@zankel.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-17 17:14:37 -07:00
Baolin Wang
1bcdb769f9 arm64/hugetlb: implement arm64 specific hugetlb_mask_last_page
The HugeTLB address ranges are linearly scanned during fork, unmap and
remap operations, and the linear scan can skip to the end of range mapped
by the page table page if hitting a non-present entry, which can help to
speed linear scanning of the HugeTLB address ranges.

So hugetlb_mask_last_page() is introduced to help to update the address in
the loop of HugeTLB linear scanning with getting the last huge page mapped
by the associated page table page[1], when a non-present entry is
encountered.

Considering ARM64 specific cont-pte/pmd size HugeTLB, this patch
implemented an ARM64 specific hugetlb_mask_last_page() to help this case.

[1] https://lore.kernel.org/linux-mm/20220527225849.284839-1-mike.kravetz@oracle.com/

[baolin.wang@linux.alibaba.com: fix build]
  Link: https://lkml.kernel.org/r/a14e7b39-6a8a-4609-b4a1-84ac574f5c96@linux.alibaba.com
Link: https://lkml.kernel.org/r/20220621235620.291305-3-mike.kravetz@oracle.com
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Muchun Song <songmuchun@bytedance.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: James Houghton <jthoughton@google.com>
Cc: kernel test robot <lkp@intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rolf Eike Beer <eike-kernel@sf-tec.de>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-17 17:14:34 -07:00
Catalin Marinas
20794545c1 arm64: kasan: Revert "arm64: mte: reset the page tag in page->flags"
This reverts commit e5b8d92189.

Pages mapped in user-space with PROT_MTE have the allocation tags either
zeroed or copied/restored to some user values. In order for the kernel
to access such pages via page_address(), resetting the tag in
page->flags was necessary. This tag resetting was deferred to
set_pte_at() -> mte_sync_page_tags() but it can race with another CPU
reading the flags (via page_to_virt()):

P0 (mte_sync_page_tags):	P1 (memcpy from virt_to_page):
				  Rflags!=0xff
  Wflags=0xff
  DMB (doesn't help)
  Wtags=0
				  Rtags=0   // fault

Since now the post_alloc_hook() function resets the page->flags tag when
unpoisoning is skipped for user pages (including the __GFP_ZEROTAGS
case), revert the arm64 commit calling page_kasan_tag_reset().

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Peter Collingbourne <pcc@google.com>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Acked-by: Andrey Konovalov <andreyknvl@gmail.com>
Link: https://lore.kernel.org/r/20220610152141.2148929-5-catalin.marinas@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-07-07 10:48:37 +01:00
Catalin Marinas
70c248aca9 mm: kasan: Skip unpoisoning of user pages
Commit c275c5c6d5 ("kasan: disable freed user page poisoning with HW
tags") added __GFP_SKIP_KASAN_POISON to GFP_HIGHUSER_MOVABLE. A similar
argument can be made about unpoisoning, so also add
__GFP_SKIP_KASAN_UNPOISON to user pages. To ensure the user page is
still accessible via page_address() without a kasan fault, reset the
page->flags tag.

With the above changes, there is no need for the arm64
tag_clear_highpage() to reset the page->flags tag.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Link: https://lore.kernel.org/r/20220610152141.2148929-3-catalin.marinas@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-07-07 10:48:37 +01:00
Will Deacon
7eacf1858b arm64: mm: Remove assembly DMA cache maintenance wrappers
Remove the __dma_{flush,map,unmap}_area assembly wrappers and call the
appropriate cache maintenance functions directly from the DMA mapping
callbacks.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220610151228.4562-3-will@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2022-07-05 13:06:31 +01:00
Anshuman Khandual
4890cc18f9 arm64/mm: Define defer_reserve_crashkernel()
Crash kernel memory reservation gets deferred, when either CONFIG_ZONE_DMA
or CONFIG_ZONE_DMA32 config is enabled on the platform. This deferral also
impacts overall linear mapping creation including the crash kernel itself.
Just encapsulate this deferral check in a new helper for better clarity.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20220705062556.1845734-1-anshuman.khandual@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-07-05 11:43:47 +01:00
Qi Zheng
18f3962953 mm: hugetlb: kill set_huge_swap_pte_at()
Commit e5251fd430 ("mm/hugetlb: introduce set_huge_swap_pte_at()
helper") add set_huge_swap_pte_at() to handle swap entries on
architectures that support hugepages consisting of contiguous ptes.  And
currently the set_huge_swap_pte_at() is only overridden by arm64.

set_huge_swap_pte_at() provide a sz parameter to help determine the number
of entries to be updated.  But in fact, all hugetlb swap entries contain
pfn information, so we can find the corresponding folio through the pfn
recorded in the swap entry, then the folio_size() is the number of entries
that need to be updated.

And considering that users will easily cause bugs by ignoring the
difference between set_huge_swap_pte_at() and set_huge_pte_at().  Let's
handle swap entries in set_huge_pte_at() and remove the
set_huge_swap_pte_at(), then we can call set_huge_pte_at() anywhere, which
simplifies our coding.

Link: https://lkml.kernel.org/r/20220626145717.53572-1-zhengqi.arch@bytedance.com
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Acked-by: Muchun Song <songmuchun@bytedance.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-03 18:08:50 -07:00
Will Deacon
4109823037 arm64: hugetlb: Restore TLB invalidation for BBM on contiguous ptes
Commit fb396bb459 ("arm64/hugetlb: Drop TLB flush from get_clear_flush()")
removed TLB invalidation from get_clear_flush() [now get_clear_contig()]
on the basis that the core TLB invalidation code is aware of hugetlb
mappings backed by contiguous page-table entries and will cover the
correct virtual address range.

However, this change also resulted in the TLB invalidation being removed
from the "break" step in the break-before-make (BBM) sequence used
internally by huge_ptep_set_{access_flags,wrprotect}(), therefore
making the BBM sequence unsafe irrespective of later invalidation.

Although the architecture is desperately unclear about how exactly
contiguous ptes should be updated in a live page-table, restore TLB
invalidation to our BBM sequence under the assumption that BBM is the
right thing to be doing in the first place.

Fixes: fb396bb459 ("arm64/hugetlb: Drop TLB flush from get_clear_flush()")
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Steve Capper <steve.capper@arm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20220629095349.25748-1-will@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-07-01 18:29:26 +01:00
Ard Biesheuvel
0aaa68532e arm64: mm: fix booting with 52-bit address space
Joey reports that booting 52-bit VA capable builds on 52-bit VA capable
CPUs is broken since commit 0d9b1ffefa ("arm64: mm: make vabits_actual
a build time constant if possible"). This is due to the fact that the
primary CPU reads the vabits_actual variable before it has been
assigned.

The reason for deferring the assignment of vabits_actual was that we try
to perform as few stores to memory as we can with the MMU and caches
off, due to the cache coherency issues it creates.

Since __cpu_setup() [which is where the read of vabits_actual occurs] is
also called on the secondary boot path, we cannot just read the CPU ID
registers directly, given that the size of the VA space is decided by
the capabilities of the primary CPU. So let's read vabits_actual only on
the secondary boot path, and read the CPU ID registers directly on the
primary boot path, by making it a function parameter of __cpu_setup().

To ensure that all users of vabits_actual (including kasan_early_init())
observe the correct value, move the assignment of vabits_actual back
into asm code, but still defer it to after the MMU and caches have been
enabled.

Cc: Will Deacon <will@kernel.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Fixes: 0d9b1ffefa ("arm64: mm: make vabits_actual a build time constant if possible")
Reported-by: Joey Gouly <joey.gouly@arm.com>
Co-developed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220701111045.2944309-1-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2022-07-01 15:19:07 +01:00
Tong Tiangen
bacac63702 arm64: extable: cleanup redundant extable type EX_TYPE_FIXUP
Currently, extable type EX_TYPE_FIXUP is no place to use, We can safely
remove it.

Suggested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Tong Tiangen <tongtiangen@huawei.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20220621072638.1273594-7-tongtiangen@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-06-28 12:11:47 +01:00
Tong Tiangen
4953fc3d32 arm64: extable: add new extable type EX_TYPE_KACCESS_ERR_ZERO support
Currently, The extable type EX_TYPE_UACCESS_ERR_ZERO is used by
__get/put_kernel_nofault(), but those helpers are not uaccess type, so we
add a new extable type EX_TYPE_KACCESS_ERR_ZERO which can be used by
__get/put_kernel_no_fault().

This is also to prepare for distinguishing the two types in machine check
safe process.

Suggested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Tong Tiangen <tongtiangen@huawei.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20220621072638.1273594-2-tongtiangen@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-06-28 12:11:00 +01:00
akpm
46a3b11253 Merge branch 'master' into mm-stable 2022-06-27 10:31:34 -07:00
Kefeng Wang
893dea9ccd arm64: Add HAVE_IOREMAP_PROT support
With ioremap_prot() definition from generic ioremap, also move
pte_pgprot() from hugetlbpage.c into pgtable.h, then arm64 could
have HAVE_IOREMAP_PROT, which will enable generic_access_phys()
code, it is useful for debug, eg, gdb.

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Link: https://lore.kernel.org/r/20220607125027.44946-7-wangkefeng.wang@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-06-27 12:22:31 +01:00
Kefeng Wang
f23eab0bfa arm64: mm: Convert to GENERIC_IOREMAP
Add hook for arm64's special operation when ioremap(), then
ioremap_wc/np/cache is converted to use ioremap_prot() from
GENERIC_IOREMAP, update the Copyright and kill the unused
inclusions.

Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Link: https://lore.kernel.org/r/20220607125027.44946-6-wangkefeng.wang@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-06-27 12:22:31 +01:00
Ard Biesheuvel
005e12676a arm64: head: record CPU boot mode after enabling the MMU
In order to avoid having to touch memory with the MMU and caches
disabled, and therefore having to invalidate it from the caches
explicitly, just defer storing the value until after the MMU has been
turned on, unless we are giving up with an error.

While at it, move the associated variable definitions into C code.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220624150651.1358849-19-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2022-06-24 17:18:10 +01:00
Ard Biesheuvel
c3cee924bd arm64: head: cover entire kernel image in initial ID map
As a first step towards avoiding the need to create, tear down and
recreate the kernel virtual mapping with MMU and caches disabled, start
by expanding the ID map so it covers the page tables as well as all
executable code. This will allow us to populate the page tables with the
MMU and caches on, and call KASLR init code before setting up the
virtual mapping.

Since this ID map is only needed at boot, create it as a temporary set
of page tables, and populate the permanent ID map after enabling the MMU
and caches. While at it, switch to read-only attributes for the where
possible, as writable permissions are only needed for the initial kernel
page tables. Note that on 4k granule configurations, the permanent ID
map will now be reduced to a single page rather than a 2M block mapping.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220624150651.1358849-13-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2022-06-24 17:18:10 +01:00
Ard Biesheuvel
1682c45b92 arm64: mm: provide idmap pointer to cpu_replace_ttbr1()
In preparation for changing the way we initialize the permanent ID map,
update cpu_replace_ttbr1() so we can use it with the initial ID map as
well.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220624150651.1358849-11-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2022-06-24 17:18:10 +01:00
Ard Biesheuvel
ebd9aea1f2 arm64: head: drop idmap_ptrs_per_pgd
The assignment of idmap_ptrs_per_pgd lacks any cache invalidation, even
though it is updated with the MMU and caches disabled. However, we never
bother to read the value again except in the very next instruction, and
so we can just drop the variable entirely.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20220624150651.1358849-5-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2022-06-24 17:18:09 +01:00
Ard Biesheuvel
e8d13cced5 arm64: head: move assignment of idmap_t0sz to C code
Setting idmap_t0sz involves fiddling with the caches if done with the
MMU off. Since we will be creating an initial ID map with the MMU and
caches off, and the permanent ID map with the MMU and caches on, let's
move this assignment of idmap_t0sz out of the startup code, and replace
it with a macro that simply issues the three instructions needed to
calculate the value wherever it is needed before the MMU is turned on.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220624150651.1358849-4-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2022-06-24 17:18:09 +01:00
Ard Biesheuvel
0d9b1ffefa arm64: mm: make vabits_actual a build time constant if possible
Currently, we only support 52-bit virtual addressing on 64k pages
configurations, and in all other cases, vabits_actual is guaranteed to
equal VA_BITS (== VA_BITS_MIN). So get rid of the variable entirely in
that case.

While at it, move the assignment out of the asm entry code - it has no
need to be there.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220624150651.1358849-3-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2022-06-24 17:18:09 +01:00
Ard Biesheuvel
475031b6ed arm64: head: move kimage_vaddr variable into C file
This variable definition does not need to be in head.S so move it out.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20220624150651.1358849-2-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2022-06-24 17:18:09 +01:00
Ard Biesheuvel
1c9a8e8768 arm64: entry: simplify trampoline data page
Get rid of some clunky open coded arithmetic on section addresses, by
emitting the trampoline data variables into a separate, dedicated r/o
data section, and putting it at the next page boundary. This way, we can
access the literals via single LDR instruction.

While at it, get rid of other, implicit literals, and use ADRP/ADD or
MOVZ/MOVK sequences, as appropriate. Note that the latter are only
supported for CONFIG_RELOCATABLE=n (which is usually the case if
CONFIG_RANDOMIZE_BASE=n), so update the CPP conditionals to reflect
this.

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220622161010.3845775-1-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2022-06-24 13:08:30 +01:00
Ard Biesheuvel
47546a1912 arm64: mm: install KPTI nG mappings with MMU enabled
In cases where we unmap the kernel while running in user space, we rely
on ASIDs to distinguish the minimal trampoline from the full kernel
mapping, and this means we must use non-global attributes for those
mappings, to ensure they are scoped by ASID and will not hit in the TLB
inadvertently.

We only do this when needed, as this is generally more costly in terms
of TLB pressure, and so we boot without these non-global attributes, and
apply them to all existing kernel mappings once all CPUs are up and we
know whether or not the non-global attributes are needed. At this point,
we cannot simply unmap and remap the entire address space, so we have to
update all existing block and page descriptors in place.

Currently, we go through a lot of trouble to perform these updates with
the MMU and caches off, to avoid violating break before make (BBM) rules
imposed by the architecture. Since we make changes to page tables that
are not covered by the ID map, we gain access to those descriptors by
disabling translations altogether. This means that the stores to memory
are issued with device attributes, and require extra care in terms of
coherency, which is costly. We also rely on the ID map to access a
shared flag, which requires the ID map to be executable and writable at
the same time, which is another thing we'd prefer to avoid.

So let's switch to an approach where we replace the kernel mapping with
a minimal mapping of a few pages that can be used for a minimal, ad-hoc
fixmap that we can use to map each page table in turn as we traverse the
hierarchy.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220609174320.4035379-3-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2022-06-23 18:26:13 +01:00
Ard Biesheuvel
c7eff738cf arm64: kpti-ng: simplify page table traversal logic
Simplify the KPTI G-to-nG asm helper code by:
- pulling the 'table bit' test into the get/put macros so we can combine
  them and incorporate the entire loop;
- moving the 'table bit' test after the update of bit #11 so we no
  longer need separate next_xxx and skip_xxx labels;
- redefining the pmd/pud register aliases and the next_pmd/next_pud
  labels instead of branching to them if the number of configured page
  table levels is less than 3 or 4, respectively.

No functional change intended, except for the fact that we now descend
into a next level table after setting bit #11 on its descriptor but this
should make no difference in practice.

While at it, switch to .L prefixed local labels so they don't clutter up
the symbol tables, kallsyms, etc, and clean up the indentation for
legibility.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20220609174320.4035379-2-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2022-06-23 18:26:13 +01:00
Will Deacon
c50f11c619 arm64: mm: Don't invalidate FROM_DEVICE buffers at start of DMA transfer
Invalidating the buffer memory in arch_sync_dma_for_device() for
FROM_DEVICE transfers

When using the streaming DMA API to map a buffer prior to inbound
non-coherent DMA (i.e. DMA_FROM_DEVICE), we invalidate any dirty CPU
cachelines so that they will not be written back during the transfer and
corrupt the buffer contents written by the DMA. This, however, poses two
potential problems:

  (1) If the DMA transfer does not write to every byte in the buffer,
      then the unwritten bytes will contain stale data once the transfer
      has completed.

  (2) If the buffer has a virtual alias in userspace, then stale data
      may be visible via this alias during the period between performing
      the cache invalidation and the DMA writes landing in memory.

Address both of these issues by cleaning (aka writing-back) the dirty
lines in arch_sync_dma_for_device(DMA_FROM_DEVICE) instead of discarding
them using invalidation.

Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20220606152150.GA31568@willie-the-truck
Signed-off-by: Will Deacon <will@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220610151228.4562-2-will@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-06-17 19:06:06 +01:00
Peter Xu
d92725256b mm: avoid unnecessary page fault retires on shared memory types
I observed that for each of the shared file-backed page faults, we're very
likely to retry one more time for the 1st write fault upon no page.  It's
because we'll need to release the mmap lock for dirty rate limit purpose
with balance_dirty_pages_ratelimited() (in fault_dirty_shared_page()).

Then after that throttling we return VM_FAULT_RETRY.

We did that probably because VM_FAULT_RETRY is the only way we can return
to the fault handler at that time telling it we've released the mmap lock.

However that's not ideal because it's very likely the fault does not need
to be retried at all since the pgtable was well installed before the
throttling, so the next continuous fault (including taking mmap read lock,
walk the pgtable, etc.) could be in most cases unnecessary.

It's not only slowing down page faults for shared file-backed, but also add
more mmap lock contention which is in most cases not needed at all.

To observe this, one could try to write to some shmem page and look at
"pgfault" value in /proc/vmstat, then we should expect 2 counts for each
shmem write simply because we retried, and vm event "pgfault" will capture
that.

To make it more efficient, add a new VM_FAULT_COMPLETED return code just to
show that we've completed the whole fault and released the lock.  It's also
a hint that we should very possibly not need another fault immediately on
this page because we've just completed it.

This patch provides a ~12% perf boost on my aarch64 test VM with a simple
program sequentially dirtying 400MB shmem file being mmap()ed and these are
the time it needs:

  Before: 650.980 ms (+-1.94%)
  After:  569.396 ms (+-1.38%)

I believe it could help more than that.

We need some special care on GUP and the s390 pgfault handler (for gmap
code before returning from pgfault), the rest changes in the page fault
handlers should be relatively straightforward.

Another thing to mention is that mm_account_fault() does take this new
fault as a generic fault to be accounted, unlike VM_FAULT_RETRY.

I explicitly didn't touch hmm_vma_fault() and break_ksm() because they do
not handle VM_FAULT_RETRY even with existing code, so I'm literally keeping
them as-is.

Link: https://lkml.kernel.org/r/20220530183450.42886-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Vineet Gupta <vgupta@kernel.org>
Acked-by: Guo Ren <guoren@kernel.org>
Acked-by: Max Filippov <jcmvbkbc@gmail.com>
Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Alistair Popple <apopple@nvidia.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>	[arm part]
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Stafford Horne <shorne@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Brian Cain <bcain@quicinc.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Richard Weinberger <richard@nod.at>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Janosch Frank <frankja@linux.ibm.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Will Deacon <will@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Chris Zankel <chris@zankel.net>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Rich Felker <dalias@libc.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Helge Deller <deller@gmx.de>
Cc: Yoshinori Sato <ysato@users.osdn.me>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-06-16 19:48:27 -07:00
Ma Wupeng
c0b978fedf arm64: mm: Only remove nomap flag for initrd
Commit 177e15f0c1 ("arm64: add the initrd region to the linear mapping explicitly")
remove all the flags of the memory used by initrd. This is fine since
MEMBLOCK_MIRROR is not used in arm64.

However with mirrored feature introduced to arm64, this will clear the mirrored
flag used by initrd, which will lead to error log printed by
find_zone_movable_pfns_for_nodes() if the lower 4G range has some non-mirrored
memory.

To solve this problem, only MEMBLOCK_NOMAP flag will be removed via
memblock_clear_nomap().

Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220614092156.1972846-5-mawupeng1@huawei.com
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2022-06-15 12:14:15 +02:00
Oleksandr Tyshchenko
9bf22421dc arm/xen: Introduce xen_setup_dma_ops()
This patch introduces new helper and places it in new header.
The helper's purpose is to assign any Xen specific DMA ops in
a single place. For now, we deal with xen-swiotlb DMA ops only.
The one of the subsequent commits in current series will add
xen-grant DMA ops case.

Also re-use the xen_swiotlb_detect() check on Arm32.

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
[For arm64]
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/1654197833-25362-2-git-send-email-olekstysh@gmail.com
Signed-off-by: Juergen Gross <jgross@suse.com>
2022-06-06 08:54:33 +02:00
Baolin Wang
e68b823ab0 arm64/hugetlb: Fix building errors in huge_ptep_clear_flush()
Fix the arm64 build error which was caused by commit ae07562909 ("mm:
change huge_ptep_clear_flush() to return the original pte") interacting
with commit fb396bb459 ("arm64/hugetlb: Drop TLB flush from
get_clear_flush()"):

  arch/arm64/mm/hugetlbpage.c: In function ‘huge_ptep_clear_flush’:
  arch/arm64/mm/hugetlbpage.c:515:9: error: implicit declaration of function ‘get_clear_flush’; did you mean ‘ptep_clear_flush’? [-Werror=implicit-function-declaration]
    515 |  return get_clear_flush(vma->vm_mm, addr, ptep, pgsize, ncontig);
        |         ^~~~~~~~~~~~~~~
        |         ptep_clear_flush

Due to the new get_clear_contig() has dropped TLB flush, we should add
an explicit TLB flush in huge_ptep_clear_flush() to keep original
semantics when changing to use new get_clear_contig().

Fixes: fb396bb459 ("arm64/hugetlb: Drop TLB flush from get_clear_flush()").
Fixes: ae07562909 ("mm: change huge_ptep_clear_flush() to return the original pte")
Reported-and-tested-by: Linux Kernel Functional Testing <lkft@linaro.org>
Reported-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-05-27 10:56:35 -07:00
Linus Torvalds
98931dd95f Yang Shi has improved the behaviour of khugepaged collapsing of readonly
file-backed transparent hugepages.
 
 Johannes Weiner has arranged for zswap memory use to be tracked and
 managed on a per-cgroup basis.
 
 Munchun Song adds a /proc knob ("hugetlb_optimize_vmemmap") for runtime
 enablement of the recent huge page vmemmap optimization feature.
 
 Baolin Wang contributes a series to fix some issues around hugetlb
 pagetable invalidation.
 
 Zhenwei Pi has fixed some interactions between hwpoisoned pages and
 virtualization.
 
 Tong Tiangen has enabled the use of the presently x86-only
 page_table_check debugging feature on arm64 and riscv.
 
 David Vernet has done some fixup work on the memcg selftests.
 
 Peter Xu has taught userfaultfd to handle write protection faults against
 shmem- and hugetlbfs-backed files.
 
 More DAMON development from SeongJae Park - adding online tuning of the
 feature and support for monitoring of fixed virtual address ranges.  Also
 easier discovery of which monitoring operations are available.
 
 Nadav Amit has done some optimization of TLB flushing during mprotect().
 
 Neil Brown continues to labor away at improving our swap-over-NFS support.
 
 David Hildenbrand has some fixes to anon page COWing versus
 get_user_pages().
 
 Peng Liu fixed some errors in the core hugetlb code.
 
 Joao Martins has reduced the amount of memory consumed by device-dax's
 compound devmaps.
 
 Some cleanups of the arch-specific pagemap code from Anshuman Khandual.
 
 Muchun Song has found and fixed some errors in the TLB flushing of
 transparent hugepages.
 
 Roman Gushchin has done more work on the memcg selftests.
 
 And, of course, many smaller fixes and cleanups.  Notably, the customary
 million cleanup serieses from Miaohe Lin.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCYo52xQAKCRDdBJ7gKXxA
 jtJFAQD238KoeI9z5SkPMaeBRYSRQmNll85mxs25KapcEgWgGQD9FAb7DJkqsIVk
 PzE+d9hEfirUGdL6cujatwJ6ejYR8Q8=
 =nFe6
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2022-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:
 "Almost all of MM here. A few things are still getting finished off,
  reviewed, etc.

   - Yang Shi has improved the behaviour of khugepaged collapsing of
     readonly file-backed transparent hugepages.

   - Johannes Weiner has arranged for zswap memory use to be tracked and
     managed on a per-cgroup basis.

   - Munchun Song adds a /proc knob ("hugetlb_optimize_vmemmap") for
     runtime enablement of the recent huge page vmemmap optimization
     feature.

   - Baolin Wang contributes a series to fix some issues around hugetlb
     pagetable invalidation.

   - Zhenwei Pi has fixed some interactions between hwpoisoned pages and
     virtualization.

   - Tong Tiangen has enabled the use of the presently x86-only
     page_table_check debugging feature on arm64 and riscv.

   - David Vernet has done some fixup work on the memcg selftests.

   - Peter Xu has taught userfaultfd to handle write protection faults
     against shmem- and hugetlbfs-backed files.

   - More DAMON development from SeongJae Park - adding online tuning of
     the feature and support for monitoring of fixed virtual address
     ranges. Also easier discovery of which monitoring operations are
     available.

   - Nadav Amit has done some optimization of TLB flushing during
     mprotect().

   - Neil Brown continues to labor away at improving our swap-over-NFS
     support.

   - David Hildenbrand has some fixes to anon page COWing versus
     get_user_pages().

   - Peng Liu fixed some errors in the core hugetlb code.

   - Joao Martins has reduced the amount of memory consumed by
     device-dax's compound devmaps.

   - Some cleanups of the arch-specific pagemap code from Anshuman
     Khandual.

   - Muchun Song has found and fixed some errors in the TLB flushing of
     transparent hugepages.

   - Roman Gushchin has done more work on the memcg selftests.

  ... and, of course, many smaller fixes and cleanups. Notably, the
  customary million cleanup serieses from Miaohe Lin"

* tag 'mm-stable-2022-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (381 commits)
  mm: kfence: use PAGE_ALIGNED helper
  selftests: vm: add the "settings" file with timeout variable
  selftests: vm: add "test_hmm.sh" to TEST_FILES
  selftests: vm: check numa_available() before operating "merge_across_nodes" in ksm_tests
  selftests: vm: add migration to the .gitignore
  selftests/vm/pkeys: fix typo in comment
  ksm: fix typo in comment
  selftests: vm: add process_mrelease tests
  Revert "mm/vmscan: never demote for memcg reclaim"
  mm/kfence: print disabling or re-enabling message
  include/trace/events/percpu.h: cleanup for "percpu: improve percpu_alloc_percpu event trace"
  include/trace/events/mmflags.h: cleanup for "tracing: incorrect gfp_t conversion"
  mm: fix a potential infinite loop in start_isolate_page_range()
  MAINTAINERS: add Muchun as co-maintainer for HugeTLB
  zram: fix Kconfig dependency warning
  mm/shmem: fix shmem folio swapoff hang
  cgroup: fix an error handling path in alloc_pagecache_max_30M()
  mm: damon: use HPAGE_PMD_SIZE
  tracing: incorrect isolate_mote_t cast in mm_vmscan_lru_isolate
  nodemask.h: fix compilation error with GCC12
  ...
2022-05-26 12:32:41 -07:00
Linus Torvalds
3f306ea2e1 dma-mapping updates for Linux 5.19
- don't over-decrypt memory (Robin Murphy)
  - takes min align mask into account for the swiotlb max mapping size
    (Tianyu Lan)
  - use GFP_ATOMIC in dma-debug (Mikulas Patocka)
  - fix DMA_ATTR_NO_KERNEL_MAPPING on xen/arm (me)
  - don't fail on highmem CMA pages in dma_direct_alloc_pages (me)
  - cleanup swiotlb initialization and share more code with swiotlb-xen
    (me, Stefano Stabellini)
 -----BEGIN PGP SIGNATURE-----
 
 iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAmKObTQLHGhjaEBsc3Qu
 ZGUACgkQD55TZVIEUYObmA//dIcDB/q4iFGD+WJh4MhM+asx0ZsdF2OJz42WEhgT
 Z9duOrgcneEQundCamqJP9rNTs980LHDA8uWQC5rZEc9vxuRVOdS7bSgYRUwWh6B
 r0ZjOsvQCn+ChoZML8uyk4rfmEINq+EvJuec3G5fgecZOhPuJS2i2uzzv5cHwqgP
 ChC0fwyZlkfdECXgvZXbEoCJLfTgGNlziN6Ai8dirSoqgEQUoCsY89/M7OiEBvV2
 R4XUWD7OvQERfB4t6xLuUHyzf9PAuWB+OiblRVNeAmK3lMjxVrc3k4kIowgklnzD
 8hfmphAa9Zou3zdfi6Gd4fiQRHRVOwKVp1rtqUmJ+lPSiwyMzu64z9ld2+2qac0h
 V4sSr/yJkhxnBT4/0MkTChvhnRobisackpUzNRpiM4ck7cNVb7eAvkISsbH+pWI9
 aEexPhbyskjlV+GOyM4QL4ygG0dpXY0HSyoh6uaSVsaXMycnWIsJCPidXxV1HGV0
 q2/RLHuHwYxia8cYCF01/DQvwOKSjwbU0zModxtRezGD5GYh2C0a+SrA1aX+qiTu
 yGJCs2UHtSQstAt78tTVp499YeDeL/oGSQkPAu8zyRkSczzF+CncGTuXyoJbAWyK
 otcgERWljgZ4scxjfu1uacfoVhKQ7nOu7hiJokL0U80FESAennLC3ZlocvB9h/ff
 HNA=
 =n2rk
 -----END PGP SIGNATURE-----

Merge tag 'dma-mapping-5.19-2022-05-25' of git://git.infradead.org/users/hch/dma-mapping

Pull dma-mapping updates from Christoph Hellwig:

 - don't over-decrypt memory (Robin Murphy)

 - takes min align mask into account for the swiotlb max mapping size
   (Tianyu Lan)

 - use GFP_ATOMIC in dma-debug (Mikulas Patocka)

 - fix DMA_ATTR_NO_KERNEL_MAPPING on xen/arm (me)

 - don't fail on highmem CMA pages in dma_direct_alloc_pages (me)

 - cleanup swiotlb initialization and share more code with swiotlb-xen
   (me, Stefano Stabellini)

* tag 'dma-mapping-5.19-2022-05-25' of git://git.infradead.org/users/hch/dma-mapping: (23 commits)
  dma-direct: don't over-decrypt memory
  swiotlb: max mapping size takes min align mask into account
  swiotlb: use the right nslabs-derived sizes in swiotlb_init_late
  swiotlb: use the right nslabs value in swiotlb_init_remap
  swiotlb: don't panic when the swiotlb buffer can't be allocated
  dma-debug: change allocation mode from GFP_NOWAIT to GFP_ATIOMIC
  dma-direct: don't fail on highmem CMA pages in dma_direct_alloc_pages
  swiotlb-xen: fix DMA_ATTR_NO_KERNEL_MAPPING on arm
  x86: remove cruft from <asm/dma-mapping.h>
  swiotlb: remove swiotlb_init_with_tbl and swiotlb_init_late_with_tbl
  swiotlb: merge swiotlb-xen initialization into swiotlb
  swiotlb: provide swiotlb_init variants that remap the buffer
  swiotlb: pass a gfp_mask argument to swiotlb_init_late
  swiotlb: add a SWIOTLB_ANY flag to lift the low memory restriction
  swiotlb: make the swiotlb_init interface more useful
  x86: centralize setting SWIOTLB_FORCE when guest memory encryption is enabled
  x86: remove the IOMMU table infrastructure
  MIPS/octeon: use swiotlb_init instead of open coding it
  arm/xen: don't check for xen_initial_domain() in xen_create_contiguous_region
  swiotlb: rename swiotlb_late_init_with_default_size
  ...
2022-05-25 19:18:36 -07:00
Linus Torvalds
143a6252e1 arm64 updates for 5.19:
- Initial support for the ARMv9 Scalable Matrix Extension (SME). SME
   takes the approach used for vectors in SVE and extends this to provide
   architectural support for matrix operations. No KVM support yet, SME
   is disabled in guests.
 
 - Support for crashkernel reservations above ZONE_DMA via the
   'crashkernel=X,high' command line option.
 
 - btrfs search_ioctl() fix for live-lock with sub-page faults.
 
 - arm64 perf updates: support for the Hisilicon "CPA" PMU for monitoring
   coherent I/O traffic, support for Arm's CMN-650 and CMN-700
   interconnect PMUs, minor driver fixes, kerneldoc cleanup.
 
 - Kselftest updates for SME, BTI, MTE.
 
 - Automatic generation of the system register macros from a 'sysreg'
   file describing the register bitfields.
 
 - Update the type of the function argument holding the ESR_ELx register
   value to unsigned long to match the architecture register size
   (originally 32-bit but extended since ARMv8.0).
 
 - stacktrace cleanups.
 
 - ftrace cleanups.
 
 - Miscellaneous updates, most notably: arm64-specific huge_ptep_get(),
   avoid executable mappings in kexec/hibernate code, drop TLB flushing
   from get_clear_flush() (and rename it to get_clear_contig()),
   ARCH_NR_GPIO bumped to 2048 for ARCH_APPLE.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmKH19IACgkQa9axLQDI
 XvEFWg//bf0p6zjeNaOJmBbyVFsXsVyYiEaLUpFPUs3oB+81s2YZ+9i1rgMrNCft
 EIDQ9+/HgScKxJxnzWf68heMdcBDbk76VJtLALExbge6owFsjByQDyfb/b3v/bLd
 ezAcGzc6G5/FlI1IP7ct4Z9MnQry4v5AG8lMNAHjnf6GlBS/tYNAqpmj8HpQfgRQ
 ZbhfZ8Ayu3TRSLWL39NHVevpmxQm/bGcpP3Q9TtjUqg0r1FQ5sK/LCqOksueIAzT
 UOgUVYWSFwTpLEqbYitVqgERQp9LiLoK5RmNYCIEydfGM7+qmgoxofSq5e2hQtH2
 SZM1XilzsZctRbBbhMit1qDBqMlr/XAy/R5FO0GauETVKTaBhgtj6mZGyeC9nU/+
 RGDljaArbrOzRwMtSuXF+Fp6uVo5spyRn1m8UT/k19lUTdrV9z6EX5Fzuc4Mnhed
 oz4iokbl/n8pDObXKauQspPA46QpxUYhrAs10B/ELc3yyp/Qj3jOfzYHKDNFCUOq
 HC9mU+YiO9g2TbYgCrrFM6Dah2E8fU6/cR0ZPMeMgWK4tKa+6JMEINYEwak9e7M+
 8lZnvu3ntxiJLN+PrPkiPyG+XBh2sux1UfvNQ+nw4Oi9xaydeX7PCbQVWmzTFmHD
 q7UPQ8220e2JNCha9pULS8cxDLxiSksce06DQrGXwnHc1Ir7T04=
 =0DjE
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Catalin Marinas:

 - Initial support for the ARMv9 Scalable Matrix Extension (SME).

   SME takes the approach used for vectors in SVE and extends this to
   provide architectural support for matrix operations. No KVM support
   yet, SME is disabled in guests.

 - Support for crashkernel reservations above ZONE_DMA via the
   'crashkernel=X,high' command line option.

 - btrfs search_ioctl() fix for live-lock with sub-page faults.

 - arm64 perf updates: support for the Hisilicon "CPA" PMU for
   monitoring coherent I/O traffic, support for Arm's CMN-650 and
   CMN-700 interconnect PMUs, minor driver fixes, kerneldoc cleanup.

 - Kselftest updates for SME, BTI, MTE.

 - Automatic generation of the system register macros from a 'sysreg'
   file describing the register bitfields.

 - Update the type of the function argument holding the ESR_ELx register
   value to unsigned long to match the architecture register size
   (originally 32-bit but extended since ARMv8.0).

 - stacktrace cleanups.

 - ftrace cleanups.

 - Miscellaneous updates, most notably: arm64-specific huge_ptep_get(),
   avoid executable mappings in kexec/hibernate code, drop TLB flushing
   from get_clear_flush() (and rename it to get_clear_contig()),
   ARCH_NR_GPIO bumped to 2048 for ARCH_APPLE.

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (145 commits)
  arm64/sysreg: Generate definitions for FAR_ELx
  arm64/sysreg: Generate definitions for DACR32_EL2
  arm64/sysreg: Generate definitions for CSSELR_EL1
  arm64/sysreg: Generate definitions for CPACR_ELx
  arm64/sysreg: Generate definitions for CONTEXTIDR_ELx
  arm64/sysreg: Generate definitions for CLIDR_EL1
  arm64/sve: Move sve_free() into SVE code section
  arm64: Kconfig.platforms: Add comments
  arm64: Kconfig: Fix indentation and add comments
  arm64: mm: avoid writable executable mappings in kexec/hibernate code
  arm64: lds: move special code sections out of kernel exec segment
  arm64/hugetlb: Implement arm64 specific huge_ptep_get()
  arm64/hugetlb: Use ptep_get() to get the pte value of a huge page
  arm64: kdump: Do not allocate crash low memory if not needed
  arm64/sve: Generate ZCR definitions
  arm64/sme: Generate defintions for SVCR
  arm64/sme: Generate SMPRI_EL1 definitions
  arm64/sme: Automatically generate SMPRIMAP_EL2 definitions
  arm64/sme: Automatically generate SMIDR_EL1 defines
  arm64/sme: Automatically generate defines for SMCR
  ...
2022-05-23 21:06:11 -07:00
Catalin Marinas
0616ea3f1b Merge branch 'for-next/esr-elx-64-bit' into for-next/core
* for-next/esr-elx-64-bit:
  : Treat ESR_ELx as a 64-bit register.
  KVM: arm64: uapi: Add kvm_debug_exit_arch.hsr_high
  KVM: arm64: Treat ESR_EL2 as a 64-bit register
  arm64: Treat ESR_ELx as a 64-bit register
  arm64: compat: Do not treat syscall number as ESR_ELx for a bad syscall
  arm64: Make ESR_ELx_xVC_IMM_MASK compatible with assembly
2022-05-20 18:51:54 +01:00
Catalin Marinas
e003d5335c Merge branch 'for-next/sysreg-gen' into for-next/core
* for-next/sysreg-gen: (32 commits)
  : Automatic system register definition generation.
  arm64/sysreg: Generate definitions for FAR_ELx
  arm64/sysreg: Generate definitions for DACR32_EL2
  arm64/sysreg: Generate definitions for CSSELR_EL1
  arm64/sysreg: Generate definitions for CPACR_ELx
  arm64/sysreg: Generate definitions for CONTEXTIDR_ELx
  arm64/sysreg: Generate definitions for CLIDR_EL1
  arm64/sve: Generate ZCR definitions
  arm64/sme: Generate defintions for SVCR
  arm64/sme: Generate SMPRI_EL1 definitions
  arm64/sme: Automatically generate SMPRIMAP_EL2 definitions
  arm64/sme: Automatically generate SMIDR_EL1 defines
  arm64/sme: Automatically generate defines for SMCR
  arm64/sysreg: Support generation of RAZ fields
  arm64/sme: Remove _EL0 from name of SVCR - FIXME sysreg.h
  arm64/sme: Standardise bitfield names for SVCR
  arm64/sme: Drop SYS_ from SMIDR_EL1 defines
  arm64/fp: Rename SVE and SME LEN field name to _WIDTH
  arm64/fp: Make SVE and SME length register definition match architecture
  arm64/sysreg: fix odd line spacing
  arm64/sysreg: improve comment for regs without fields
  ...
2022-05-20 18:50:57 +01:00
Catalin Marinas
201729d53a Merge branches 'for-next/sme', 'for-next/stacktrace', 'for-next/fault-in-subpage', 'for-next/misc', 'for-next/ftrace' and 'for-next/crashkernel', remote-tracking branch 'arm64/for-next/perf' into for-next/core
* arm64/for-next/perf:
  perf/arm-cmn: Decode CAL devices properly in debugfs
  perf/arm-cmn: Fix filter_sel lookup
  perf/marvell_cn10k: Fix tad_pmu_event_init() to check pmu type first
  drivers/perf: hisi: Add Support for CPA PMU
  drivers/perf: hisi: Associate PMUs in SICL with CPUs online
  drivers/perf: arm_spe: Expose saturating counter to 16-bit
  perf/arm-cmn: Add CMN-700 support
  perf/arm-cmn: Refactor occupancy filter selector
  perf/arm-cmn: Add CMN-650 support
  dt-bindings: perf: arm-cmn: Add CMN-650 and CMN-700
  perf: check return value of armpmu_request_irq()
  perf: RISC-V: Remove non-kernel-doc ** comments

* for-next/sme: (30 commits)
  : Scalable Matrix Extensions support.
  arm64/sve: Move sve_free() into SVE code section
  arm64/sve: Make kernel FPU protection RT friendly
  arm64/sve: Delay freeing memory in fpsimd_flush_thread()
  arm64/sme: More sensibly define the size for the ZA register set
  arm64/sme: Fix NULL check after kzalloc
  arm64/sme: Add ID_AA64SMFR0_EL1 to __read_sysreg_by_encoding()
  arm64/sme: Provide Kconfig for SME
  KVM: arm64: Handle SME host state when running guests
  KVM: arm64: Trap SME usage in guest
  KVM: arm64: Hide SME system registers from guests
  arm64/sme: Save and restore streaming mode over EFI runtime calls
  arm64/sme: Disable streaming mode and ZA when flushing CPU state
  arm64/sme: Add ptrace support for ZA
  arm64/sme: Implement ptrace support for streaming mode SVE registers
  arm64/sme: Implement ZA signal handling
  arm64/sme: Implement streaming SVE signal handling
  arm64/sme: Disable ZA and streaming mode when handling signals
  arm64/sme: Implement traps and syscall handling for SME
  arm64/sme: Implement ZA context switching
  arm64/sme: Implement streaming SVE context switching
  ...

* for-next/stacktrace:
  : Stacktrace cleanups.
  arm64: stacktrace: align with common naming
  arm64: stacktrace: rename stackframe to unwind_state
  arm64: stacktrace: rename unwinder functions
  arm64: stacktrace: make struct stackframe private to stacktrace.c
  arm64: stacktrace: delete PCS comment
  arm64: stacktrace: remove NULL task check from unwind_frame()

* for-next/fault-in-subpage:
  : btrfs search_ioctl() live-lock fix using fault_in_subpage_writeable().
  btrfs: Avoid live-lock in search_ioctl() on hardware with sub-page faults
  arm64: Add support for user sub-page fault probing
  mm: Add fault_in_subpage_writeable() to probe at sub-page granularity

* for-next/misc:
  : Miscellaneous patches.
  arm64: Kconfig.platforms: Add comments
  arm64: Kconfig: Fix indentation and add comments
  arm64: mm: avoid writable executable mappings in kexec/hibernate code
  arm64: lds: move special code sections out of kernel exec segment
  arm64/hugetlb: Implement arm64 specific huge_ptep_get()
  arm64/hugetlb: Use ptep_get() to get the pte value of a huge page
  arm64: mm: Make arch_faults_on_old_pte() check for migratability
  arm64: mte: Clean up user tag accessors
  arm64/hugetlb: Drop TLB flush from get_clear_flush()
  arm64: Declare non global symbols as static
  arm64: mm: Cleanup useless parameters in zone_sizes_init()
  arm64: fix types in copy_highpage()
  arm64: Set ARCH_NR_GPIO to 2048 for ARCH_APPLE
  arm64: cputype: Avoid overflow using MIDR_IMPLEMENTOR_MASK
  arm64: document the boot requirements for MTE
  arm64/mm: Compute PTRS_PER_[PMD|PUD] independently of PTRS_PER_PTE

* for-next/ftrace:
  : ftrace cleanups.
  arm64/ftrace: Make function graph use ftrace directly
  ftrace: cleanup ftrace_graph_caller enable and disable

* for-next/crashkernel:
  : Support for crashkernel reservations above ZONE_DMA.
  arm64: kdump: Do not allocate crash low memory if not needed
  docs: kdump: Update the crashkernel description for arm64
  of: Support more than one crash kernel regions for kexec -s
  of: fdt: Add memory for devices by DT property "linux,usable-memory-range"
  arm64: kdump: Reimplement crashkernel=X
  arm64: Use insert_resource() to simplify code
  kdump: return -ENOENT if required cmdline option does not exist
2022-05-20 18:50:35 +01:00
Ard Biesheuvel
01142791b0 arm64: mm: avoid writable executable mappings in kexec/hibernate code
The temporary mappings of the low-level kexec and hibernate helpers are
created with both writable and executable attributes, which is not
necessary here, and generally best avoided. So use read-only, executable
attributes instead.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20220429131347.3621090-3-ardb@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-05-17 09:32:45 +01:00
Baolin Wang
bc5dfb4fd7 arm64/hugetlb: Implement arm64 specific huge_ptep_get()
Now we use huge_ptep_get() to get the pte value of a hugetlb page,
however it will only return one specific pte value for the CONT-PTE
or CONT-PMD size hugetlb on ARM64 system, which can contain several
continuous pte or pmd entries with same page table attributes. And it
will not take into account the subpages' dirty or young bits of a
CONT-PTE/PMD size hugetlb page.

So the huge_ptep_get() is inconsistent with huge_ptep_get_and_clear(),
which already takes account the dirty or young bits for any subpages
in this CONT-PTE/PMD size hugetlb [1]. Meanwhile we can miss dirty or
young flags statistics for hugetlb pages with current huge_ptep_get(),
such as the gather_hugetlb_stats() function, and CONT-PTE/PMD hugetlb
monitoring with DAMON.

Thus define an ARM64 specific huge_ptep_get() implementation as well as
enabling __HAVE_ARCH_HUGE_PTEP_GET, that will take into account any
subpages' dirty or young bits for CONT-PTE/PMD size hugetlb page, for
those functions that want to check the dirty and young flags of a hugetlb
page.

[1] https://lore.kernel.org/linux-mm/85bd80b4-b4fd-0d3f-a2e5-149559f2f387@oracle.com/

Suggested-by: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/624109a80ac4bbdf1e462dfa0b49e9f7c31a7c0d.1652496622.git.baolin.wang@linux.alibaba.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-05-16 23:39:07 +01:00
Baolin Wang
f0d9d79ec7 arm64/hugetlb: Use ptep_get() to get the pte value of a huge page
The original huge_ptep_get() on ARM64 is just a wrapper of ptep_get(),
which will not take into account any contig-PTEs dirty and access bits.
Meanwhile we will implement a new ARM64-specific huge_ptep_get()
interface in following patch, which will take into account any contig-PTEs
dirty and access bits. To keep the same efficient logic to get the pte
value, change to use ptep_get() as a preparation.

Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/5113ed6e103f995e1d0f0c9fda0373b761bbcad2.1652496622.git.baolin.wang@linux.alibaba.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-05-16 23:39:07 +01:00
Zhen Lei
8f0f104e2a arm64: kdump: Do not allocate crash low memory if not needed
When "crashkernel=X,high" is specified, the specified "crashkernel=Y,low"
memory is not required in the following corner cases:
1. If both CONFIG_ZONE_DMA and CONFIG_ZONE_DMA32 are disabled, it means
   that the devices can access any memory.
2. If the system memory is small, the crash high memory may be allocated
   from the DMA zones. If that happens, there's no need to allocate
   another crash low memory because there's already one.

Add condition '(crash_base >= CRASH_ADDR_LOW_MAX)' to determine whether
the 'high' memory is allocated above DMA zones. Note: when both
CONFIG_ZONE_DMA and CONFIG_ZONE_DMA32 are disabled, the entire physical
memory is DMA accessible, CRASH_ADDR_LOW_MAX equals 'PHYS_MASK + 1'.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Acked-by: Baoquan He <bhe@redhat.com>
Link: https://lore.kernel.org/r/20220511032033.426-1-thunder.leizhen@huawei.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-05-16 20:01:38 +01:00
Baolin Wang
ae07562909 mm: change huge_ptep_clear_flush() to return the original pte
Patch series "Fix CONT-PTE/PMD size hugetlb issue when unmapping or migrating", v4.

presently, migrating a hugetlb page or unmapping a poisoned hugetlb page,
we'll use ptep_clear_flush() and set_pte_at() to nuke the page table entry
and remap it, and this is incorrect for CONT-PTE or CONT-PMD size hugetlb
page, which will cause potential data consistent issue.  This patch set
will change to use hugetlb related APIs to fix this issue.

Note: Mike pointed out the huge_ptep_get() will only return the one
specific value, and it would not take into account the dirty or young bits
of CONT-PTE/PMDs like the huge_ptep_get_and_clear() [1].  This
inconsistent issue is not introduced by this patch set, and this issue
will be addressed in another thread [2].  Meanwhile the uffd for hugetlb
case [3] pointed out by Gerald also needs another patch to address.

[1] https://lore.kernel.org/linux-mm/85bd80b4-b4fd-0d3f-a2e5-149559f2f387@oracle.com/
[2] https://lore.kernel.org/all/cover.1651998586.git.baolin.wang@linux.alibaba.com/
[3] https://lore.kernel.org/linux-mm/20220503120343.6264e126@thinkpad/


This patch (of 3):

It is incorrect to use ptep_clear_flush() to nuke a hugetlb page table
when unmapping or migrating a hugetlb page, and will change to use
huge_ptep_clear_flush() instead in the following patches.

So this is a preparation patch, which changes the huge_ptep_clear_flush()
to return the original pte to help to nuke a hugetlb page table.

[baolin.wang@linux.alibaba.com: fix build in several more architectures]
  Link: https://lkml.kernel.org/r/0009a4cd-2826-e8be-e671-f050d4f18d5d@linux.alibaba.com
[sfr@canb.auug.org.au: fixup]
  Link: https://lkml.kernel.org/r/20220511181531.7f27a5c1@canb.auug.org.au
Link: https://lkml.kernel.org/r/cover.1652270205.git.baolin.wang@linux.alibaba.com
Link: https://lkml.kernel.org/r/20f77ddab90baa249bd24504c413189b82acde69.1652270205.git.baolin.wang@linux.alibaba.com
Link: https://lkml.kernel.org/r/cover.1652147571.git.baolin.wang@linux.alibaba.com
Link: https://lkml.kernel.org/r/dcf065868cce35bceaf138613ad27f17bb7c0c19.1652147571.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Yoshinori Sato <ysato@users.osdn.me>
Cc: Rich Felker <dalias@libc.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13 16:48:55 -07:00
Anshuman Khandual
fb396bb459 arm64/hugetlb: Drop TLB flush from get_clear_flush()
This drops now redundant TLB flush in get_clear_flush() which is no longer
required after recent commit 697a1d44af ("tlb: hugetlb: Add more sizes to
tlb_remove_huge_tlb_entry"). It also renames this function i.e dropping off
'_flush' and replacing it with '__contig' as appropriate.

Cc: Will Deacon <will@kernel.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20220510043930.2410985-1-anshuman.khandual@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-05-11 13:53:19 +01:00
Mike Rapoport
260364d112 arm[64]/memremap: don't abuse pfn_valid() to ensure presence of linear map
The semantics of pfn_valid() is to check presence of the memory map for a
PFN and not whether a PFN is covered by the linear map.  The memory map
may be present for NOMAP memory regions, but they won't be mapped in the
linear mapping.  Accessing such regions via __va() when they are
memremap()'ed will cause a crash.

On v5.4.y the crash happens on qemu-arm with UEFI [1]:

<1>[    0.084476] 8<--- cut here ---
<1>[    0.084595] Unable to handle kernel paging request at virtual address dfb76000
<1>[    0.084938] pgd = (ptrval)
<1>[    0.085038] [dfb76000] *pgd=5f7fe801, *pte=00000000, *ppte=00000000

...

<4>[    0.093923] [<c0ed6ce8>] (memcpy) from [<c16a06f8>] (dmi_setup+0x60/0x418)
<4>[    0.094204] [<c16a06f8>] (dmi_setup) from [<c16a38d4>] (arm_dmi_init+0x8/0x10)
<4>[    0.094408] [<c16a38d4>] (arm_dmi_init) from [<c0302e9c>] (do_one_initcall+0x50/0x228)
<4>[    0.094619] [<c0302e9c>] (do_one_initcall) from [<c16011e4>] (kernel_init_freeable+0x15c/0x1f8)
<4>[    0.094841] [<c16011e4>] (kernel_init_freeable) from [<c0f028cc>] (kernel_init+0x8/0x10c)
<4>[    0.095057] [<c0f028cc>] (kernel_init) from [<c03010e8>] (ret_from_fork+0x14/0x2c)

On kernels v5.10.y and newer the same crash won't reproduce on ARM because
commit b10d6bca87 ("arch, drivers: replace for_each_membock() with
for_each_mem_range()") changed the way memory regions are registered in
the resource tree, but that merely covers up the problem.

On ARM64 memory resources registered in yet another way and there the
issue of wrong usage of pfn_valid() to ensure availability of the linear
map is also covered.

Implement arch_memremap_can_ram_remap() on ARM and ARM64 to prevent access
to NOMAP regions via the linear mapping in memremap().

Link: https://lore.kernel.org/all/Yl65zxGgFzF1Okac@sirena.org.uk
Link: https://lkml.kernel.org/r/20220426060107.7618-1-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reported-by: "kernelci.org bot" <bot@kernelci.org>
Tested-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Mark-PK Tsai <mark-pk.tsai@mediatek.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Tony Lindgren <tony@atomide.com>
Cc: Will Deacon <will@kernel.org>
Cc: <stable@vger.kernel.org>	[5.4+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-09 17:34:28 -07:00
Chen Zhou
944a45abfa arm64: kdump: Reimplement crashkernel=X
There are following issues in arm64 kdump:
1. We use crashkernel=X to reserve crashkernel in DMA zone, which
will fail when there is not enough low memory.
2. If reserving crashkernel above DMA zone, in this case, crash dump
kernel will fail to boot because there is no low memory available
for allocation.

To solve these issues, introduce crashkernel=X,[high,low].
The "crashkernel=X,high" is used to select a region above DMA zone, and
the "crashkernel=Y,low" is used to allocate specified size low memory.

Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Co-developed-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Link: https://lore.kernel.org/r/20220506114402.365-4-thunder.leizhen@huawei.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-05-07 19:54:33 +01:00
Zhen Lei
e6b394425c arm64: Use insert_resource() to simplify code
insert_resource() traverses the subtree layer by layer from the root node
until a proper location is found. Compared with request_resource(), the
parent node does not need to be determined in advance.

In addition, move the insertion of node 'crashk_res' into function
reserve_crashkernel() to make the associated code close together.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Acked-by: John Donnelly  <john.p.donnelly@oracle.com>
Acked-by: Baoquan He <bhe@redhat.com>
Link: https://lore.kernel.org/r/20220506114402.365-3-thunder.leizhen@huawei.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-05-07 19:54:33 +01:00
Kefeng Wang
f41ef4c2ee arm64: mm: Cleanup useless parameters in zone_sizes_init()
Directly use max_pfn for max and no one use min, kill them.

Reviewed-by: Vijay Balakrishna <vijayb@linux.microsoft.com>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Link: https://lore.kernel.org/r/20220411092455.1461-4-wangkefeng.wang@huawei.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-05-05 08:48:09 +01:00
Tong Tiangen
921d161f15 arm64: fix types in copy_highpage()
In copy_highpage() the `kto` and `kfrom` local variables are pointers to
struct page, but these are used to hold arbitrary pointers to kernel memory
. Each call to page_address() returns a void pointer to memory associated
with the relevant page, and copy_page() expects void pointers to this
memory.

This inconsistency was introduced in commit 2563776b41 ("arm64: mte:
Tags-aware copy_{user_,}highpage() implementations") and while this
doesn't appear to be harmful in practice it is clearly wrong.

Correct this by making `kto` and `kfrom` void pointers.

Fixes: 2563776b41 ("arm64: mte: Tags-aware copy_{user_,}highpage() implementations")
Signed-off-by: Tong Tiangen <tongtiangen@huawei.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Link: https://lore.kernel.org/r/20220420030418.3189040-3-tongtiangen@huawei.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-05-04 20:00:13 +01:00
Mark Brown
bc249e37b9 arm64/mte: Make TCF field values and naming more standard
In preparation for automatic generation of the defines for system registers
make the values used for the enumeration in SCTLR_ELx.TCF suitable for use
with the newly defined SYS_FIELD_PREP_ENUM helper, removing the shift from
the define and using the helper to generate it on use instead. Since we
only ever interact with this field in EL1 and in preparation for generation
of the defines also rename from SCTLR_ELx to SCTLR_EL1. SCTLR_EL2 is not
quite the same as SCTLR_EL1 so the conversion does not share the field
definitions.

There should be no functional change from this patch.

Signed-off-by: Mark Brown <broonie@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20220503170233.507788-4-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-05-04 15:30:27 +01:00
Alexandru Elisei
8d56e5c5a9 arm64: Treat ESR_ELx as a 64-bit register
In the initial release of the ARM Architecture Reference Manual for
ARMv8-A, the ESR_ELx registers were defined as 32-bit registers. This
changed in 2018 with version D.a (ARM DDI 0487D.a) of the architecture,
when they became 64-bit registers, with bits [63:32] defined as RES0. In
version G.a, a new field was added to ESR_ELx, ISS2, which covers bits
[36:32].  This field is used when the Armv8.7 extension FEAT_LS64 is
implemented.

As a result of the evolution of the register width, Linux stores it as
both a 64-bit value and a 32-bit value, which hasn't affected correctness
so far as Linux only uses the lower 32 bits of the register.

Make the register type consistent and always treat it as 64-bit wide. The
register is redefined as an "unsigned long", which is an unsigned
double-word (64-bit quantity) for the LP64 machine (aapcs64 [1], Table 1,
page 14). The type was chosen because "unsigned int" is the most frequent
type for ESR_ELx and because FAR_ELx, which is used together with ESR_ELx
in exception handling, is also declared as "unsigned long". The 64-bit type
also makes adding support for architectural features that use fields above
bit 31 easier in the future.

The KVM hypervisor will receive a similar update in a subsequent patch.

[1] https://github.com/ARM-software/abi-aa/releases/download/2021Q3/aapcs64.pdf

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220425114444.368693-4-alexandru.elisei@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-04-29 19:26:27 +01:00
Muchun Song
47010c040d mm: hugetlb_vmemmap: cleanup CONFIG_HUGETLB_PAGE_FREE_VMEMMAP*
The word of "free" is not expressive enough to express the feature of
optimizing vmemmap pages associated with each HugeTLB, rename this keywork
to "optimize".  In this patch , cheanup configs to make code more
expressive.

Link: https://lkml.kernel.org/r/20220404074652.68024-4-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-28 23:16:15 -07:00
Muchun Song
f10f1442c3 mm: hugetlb_vmemmap: cleanup hugetlb_free_vmemmap_enabled*
The word of "free" is not expressive enough to express the feature of
optimizing vmemmap pages associated with each HugeTLB, rename this keywork
to "optimize".  In this patch , cheanup the static key and
hugetlb_free_vmemmap_enabled() to make code more expressive.

Link: https://lkml.kernel.org/r/20220404074652.68024-3-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-28 23:16:15 -07:00
Anshuman Khandual
b3aca728fb arm64/mm: enable ARCH_HAS_VM_GET_PAGE_PROT
This defines and exports a platform specific custom vm_get_page_prot() via
subscribing ARCH_HAS_VM_GET_PAGE_PROT. It localizes arch_vm_get_page_prot()
and moves it near vm_get_page_prot().

Link: https://lkml.kernel.org/r/20220414062125.609297-4-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Khalid Aziz <khalid.aziz@oracle.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-28 23:16:13 -07:00
Muchun Song
1e63ac088f arm64: mm: hugetlb: enable HUGETLB_PAGE_FREE_VMEMMAP for arm64
The feature of minimizing overhead of struct page associated with each
HugeTLB page aims to free its vmemmap pages (used as struct page) to save
memory, where is ~14GB/16GB per 1TB HugeTLB pages (2MB/1GB type).  In
short, when a HugeTLB page is allocated or freed, the vmemmap array
representing the range associated with the page will need to be remapped. 
When a page is allocated, vmemmap pages are freed after remapping.  When a
page is freed, previously discarded vmemmap pages must be allocated before
remapping.  More implementations and details can be found here [1].

The infrastructure of freeing vmemmap pages associated with each HugeTLB
page is already there, we can easily enable HUGETLB_PAGE_FREE_VMEMMAP for
arm64, the only thing to be fixed is flush_dcache_page() .

flush_dcache_page() need to be adapted to operate on the head page's flags
since the tail vmemmap pages are mapped with read-only after the feature
is enabled (clear operation is not permitted).

There was some discussions about this in the thread [2], but there was no
conclusion in the end.  And I copied the concern proposed by Anshuman to
here and explain why those concern is superfluous.  It is safe to enable
it for x86_64 as well as arm64.

1st concern:
'''
But what happens when a hot remove section's vmemmap area (which is
being teared down) is nearby another vmemmap area which is either created
or being destroyed for HugeTLB alloc/free purpose. As you mentioned
HugeTLB pages inside the hot remove section might be safe. But what about
other HugeTLB areas whose vmemmap area shares page table entries with
vmemmap entries for a section being hot removed ? Massive HugeTLB alloc
/use/free test cycle using memory just adjacent to a memory hotplug area,
which is always added and removed periodically, should be able to expose
this problem.
'''

Answer: At the time memory is removed, all HugeTLB pages either have been
migrated away or dissolved.  So there is no race between memory hot remove
and free_huge_page_vmemmap().  Therefore, HugeTLB pages inside the hot
remove section is safe.  Let's talk your question "what about other
HugeTLB areas whose vmemmap area shares page table entries with vmemmap
entries for a section being hot removed ?", the question is not
established.  The minimal granularity size of hotplug memory 128MB (on
arm64, 4k base page), any HugeTLB smaller than 128MB is within a section,
then, there is no share PTE page tables between HugeTLB in this section
and ones in other sections and a HugeTLB page could not cross two
sections.  In this case, the section cannot be freed.  Any HugeTLB bigger
than 128MB (section size) whose vmemmap pages is an integer multiple of
2MB (PMD-mapped).  As long as:

  1) HugeTLBs are naturally aligned, power-of-two sizes
  2) The HugeTLB size >= the section size
  3) The HugeTLB size >= the vmemmap leaf mapping size

Then a HugeTLB will not share any leaf page table entries with *anything
else*, but will share intermediate entries.  In this case, at the time
memory is removed, all HugeTLB pages either have been migrated away or
dissolved.  So there is also no race between memory hot remove and
free_huge_page_vmemmap().

2nd concern:
'''
differently, not sure if ptdump would require any synchronization.

Dumping an wrong value is probably okay but crashing because a page table
entry is being freed after ptdump acquired the pointer is bad. On arm64,
ptdump() is protected against hotremove via [get|put]_online_mems().
'''

Answer: The ptdump should be fine since vmemmap_remap_free() only
exchanges PTEs or splits the PMD entry (which means allocating a PTE page
table).  Both operations do not free any page tables (PTE), so ptdump
cannot run into a UAF on any page tables.  The worst case is just dumping
an wrong value.

[1] https://lore.kernel.org/all/20210510030027.56044-1-songmuchun@bytedance.com/
[2] https://lore.kernel.org/all/20210518091826.36937-1-songmuchun@bytedance.com/

[songmuchun@bytedance.com: restructure the code comment inside flush_dcache_page()]
  Link: https://lkml.kernel.org/r/20220414072646.21910-1-songmuchun@bytedance.com
Link: https://lkml.kernel.org/r/20220331065640.5777-2-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Barry Song <baohua@kernel.org>
Tested-by: Barry Song <baohua@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Bodeddula Balasubramaniam <bodeddub@amazon.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Cc: Fam Zheng <fam.zheng@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-28 23:16:03 -07:00
Christoph Hellwig
c6af2aa9ff swiotlb: make the swiotlb_init interface more useful
Pass a boolean flag to indicate if swiotlb needs to be enabled based on
the addressing needs, and replace the verbose argument with a set of
flags, including one to force enable bounce buffering.

Note that this patch removes the possibility to force xen-swiotlb use
with the swiotlb=force parameter on the command line on x86 (arm and
arm64 never supported that), but this interface will be restored shortly.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2022-04-18 07:21:11 +02:00
Julia Lawall
dd671f16b1 arm64: fix typos in comments
Various spelling mistakes in comments.
Detected with the help of Coccinelle.

Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Link: https://lore.kernel.org/r/20220318103729.157574-10-Julia.Lawall@inria.fr
[will: Squashed in 20220318103729.157574-28-Julia.Lawall@inria.fr]
Signed-off-by: Will Deacon <will@kernel.org>
2022-04-04 10:32:50 +01:00
Andrey Konovalov
36c4a73bf8 kasan, arm64: don't tag executable vmalloc allocations
Besides asking vmalloc memory to be executable via the prot argument of
__vmalloc_node_range() (see the previous patch), the kernel can skip that
bit and instead mark memory as executable via set_memory_x().

Once tag-based KASAN modes start tagging vmalloc allocations, executing
code from such allocations will lead to the PC register getting a tag,
which is not tolerated by the kernel.

Generic kernel code typically allocates memory via module_alloc() if it
intends to mark memory as executable.  (On arm64 module_alloc() uses
__vmalloc_node_range() without setting the executable bit).

Thus, reset pointer tags of pointers returned from module_alloc().

However, on arm64 there's an exception: the eBPF subsystem.  Instead of
using module_alloc(), it uses vmalloc() (via bpf_jit_alloc_exec()) to
allocate its JIT region.

Thus, reset pointer tags of pointers returned from bpf_jit_alloc_exec().

Resetting tags for these pointers results in untagged pointers being
passed to set_memory_x().  This causes conflicts in arithmetic checks in
change_memory_common(), as vm_struct->addr pointer returned by
find_vm_area() is tagged.

Reset pointer tag of find_vm_area(addr)->addr in change_memory_common().

Link: https://lkml.kernel.org/r/b7b2595423340cd7d76b770e5d519acf3b72f0ab.1643047180.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-24 19:06:48 -07:00
Linus Torvalds
52deda9551 Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:
 "Various misc subsystems, before getting into the post-linux-next
  material.

  41 patches.

  Subsystems affected by this patch series: procfs, misc, core-kernel,
  lib, checkpatch, init, pipe, minix, fat, cgroups, kexec, kdump,
  taskstats, panic, kcov, resource, and ubsan"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (41 commits)
  Revert "ubsan, kcsan: Don't combine sanitizer with kcov on clang"
  kernel/resource: fix kfree() of bootmem memory again
  kcov: properly handle subsequent mmap calls
  kcov: split ioctl handling into locked and unlocked parts
  panic: move panic_print before kmsg dumpers
  panic: add option to dump all CPUs backtraces in panic_print
  docs: sysctl/kernel: add missing bit to panic_print
  taskstats: remove unneeded dead assignment
  kasan: no need to unset panic_on_warn in end_report()
  ubsan: no need to unset panic_on_warn in ubsan_epilogue()
  panic: unset panic_on_warn inside panic()
  docs: kdump: add scp example to write out the dump file
  docs: kdump: update description about sysfs file system support
  arm64: mm: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef
  x86/setup: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef
  riscv: mm: init: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef
  kexec: make crashk_res, crashk_low_res and crash_notes symbols always visible
  cgroup: use irqsave in cgroup_rstat_flush_locked().
  fat: use pointer to simple type in put_user()
  minix: fix bug when opening a file with O_DIRECT
  ...
2022-03-24 14:14:07 -07:00
Jisheng Zhang
d339f1584f arm64: mm: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef
Replace the conditional compilation using "#ifdef CONFIG_KEXEC_CORE" by a
check for "IS_ENABLED(CONFIG_KEXEC_CORE)", to simplify the code and
increase compile coverage.

Link: https://lkml.kernel.org/r/20211206160514.2000-5-jszhang@kernel.org
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Palmer Dabbelt <palmer@rivosinc.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-23 19:00:34 -07:00
Linus Torvalds
9030fb0bb9 Folio changes for 5.18
- Rewrite how munlock works to massively reduce the contention
    on i_mmap_rwsem (Hugh Dickins):
    https://lore.kernel.org/linux-mm/8e4356d-9622-a7f0-b2c-f116b5f2efea@google.com/
  - Sort out the page refcount mess for ZONE_DEVICE pages (Christoph Hellwig):
    https://lore.kernel.org/linux-mm/20220210072828.2930359-1-hch@lst.de/
  - Convert GUP to use folios and make pincount available for order-1
    pages. (Matthew Wilcox)
  - Convert a few more truncation functions to use folios (Matthew Wilcox)
  - Convert page_vma_mapped_walk to use PFNs instead of pages (Matthew Wilcox)
  - Convert rmap_walk to use folios (Matthew Wilcox)
  - Convert most of shrink_page_list() to use a folio (Matthew Wilcox)
  - Add support for creating large folios in readahead (Matthew Wilcox)
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEejHryeLBw/spnjHrDpNsjXcpgj4FAmI4ucgACgkQDpNsjXcp
 gj69Wgf6AwqwmO5Tmy+fLScDPqWxmXJofbocae1kyoGHf7Ui91OK4U2j6IpvAr+g
 P/vLIK+JAAcTQcrSCjymuEkf4HkGZOR03QQn7maPIEe4eLrZRQDEsmHC1L9gpeJp
 s/GMvDWiGE0Tnxu0EOzfVi/yT+qjIl/S8VvqtCoJv1HdzxitZ7+1RDuqImaMC5MM
 Qi3uHag78vLmCltLXpIOdpgZhdZexCdL2Y/1npf+b6FVkAJRRNUnA0gRbS7YpoVp
 CbxEJcmAl9cpJLuj5i5kIfS9trr+/QcvbUlzRxh4ggC58iqnmF2V09l2MJ7YU3XL
 v1O/Elq4lRhXninZFQEm9zjrri7LDQ==
 =n9Ad
 -----END PGP SIGNATURE-----

Merge tag 'folio-5.18c' of git://git.infradead.org/users/willy/pagecache

Pull folio updates from Matthew Wilcox:

 - Rewrite how munlock works to massively reduce the contention on
   i_mmap_rwsem (Hugh Dickins):

     https://lore.kernel.org/linux-mm/8e4356d-9622-a7f0-b2c-f116b5f2efea@google.com/

 - Sort out the page refcount mess for ZONE_DEVICE pages (Christoph
   Hellwig):

     https://lore.kernel.org/linux-mm/20220210072828.2930359-1-hch@lst.de/

 - Convert GUP to use folios and make pincount available for order-1
   pages. (Matthew Wilcox)

 - Convert a few more truncation functions to use folios (Matthew
   Wilcox)

 - Convert page_vma_mapped_walk to use PFNs instead of pages (Matthew
   Wilcox)

 - Convert rmap_walk to use folios (Matthew Wilcox)

 - Convert most of shrink_page_list() to use a folio (Matthew Wilcox)

 - Add support for creating large folios in readahead (Matthew Wilcox)

* tag 'folio-5.18c' of git://git.infradead.org/users/willy/pagecache: (114 commits)
  mm/damon: minor cleanup for damon_pa_young
  selftests/vm/transhuge-stress: Support file-backed PMD folios
  mm/filemap: Support VM_HUGEPAGE for file mappings
  mm/readahead: Switch to page_cache_ra_order
  mm/readahead: Align file mappings for non-DAX
  mm/readahead: Add large folio readahead
  mm: Support arbitrary THP sizes
  mm: Make large folios depend on THP
  mm: Fix READ_ONLY_THP warning
  mm/filemap: Allow large folios to be added to the page cache
  mm: Turn can_split_huge_page() into can_split_folio()
  mm/vmscan: Convert pageout() to take a folio
  mm/vmscan: Turn page_check_references() into folio_check_references()
  mm/vmscan: Account large folios correctly
  mm/vmscan: Optimise shrink_page_list for non-PMD-sized folios
  mm/vmscan: Free non-shmem folios without splitting them
  mm/rmap: Constify the rmap_walk_control argument
  mm/rmap: Convert rmap_walk() to take a folio
  mm: Turn page_anon_vma() into folio_anon_vma()
  mm/rmap: Turn page_lock_anon_vma_read() into folio_lock_anon_vma_read()
  ...
2022-03-22 17:03:12 -07:00
Linus Torvalds
3bf03b9a08 Merge branch 'akpm' (patches from Andrew)
Merge updates from Andrew Morton:

 - A few misc subsystems: kthread, scripts, ntfs, ocfs2, block, and vfs

 - Most the MM patches which precede the patches in Willy's tree: kasan,
   pagecache, gup, swap, shmem, memcg, selftests, pagemap, mremap,
   sparsemem, vmalloc, pagealloc, memory-failure, mlock, hugetlb,
   userfaultfd, vmscan, compaction, mempolicy, oom-kill, migration, thp,
   cma, autonuma, psi, ksm, page-poison, madvise, memory-hotplug, rmap,
   zswap, uaccess, ioremap, highmem, cleanups, kfence, hmm, and damon.

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (227 commits)
  mm/damon/sysfs: remove repeat container_of() in damon_sysfs_kdamond_release()
  Docs/ABI/testing: add DAMON sysfs interface ABI document
  Docs/admin-guide/mm/damon/usage: document DAMON sysfs interface
  selftests/damon: add a test for DAMON sysfs interface
  mm/damon/sysfs: support DAMOS stats
  mm/damon/sysfs: support DAMOS watermarks
  mm/damon/sysfs: support schemes prioritization
  mm/damon/sysfs: support DAMOS quotas
  mm/damon/sysfs: support DAMON-based Operation Schemes
  mm/damon/sysfs: support the physical address space monitoring
  mm/damon/sysfs: link DAMON for virtual address spaces monitoring
  mm/damon: implement a minimal stub for sysfs-based DAMON interface
  mm/damon/core: add number of each enum type values
  mm/damon/core: allow non-exclusive DAMON start/stop
  Docs/damon: update outdated term 'regions update interval'
  Docs/vm/damon/design: update DAMON-Idle Page Tracking interference handling
  Docs/vm/damon: call low level monitoring primitives the operations
  mm/damon: remove unnecessary CONFIG_DAMON option
  mm/damon/paddr,vaddr: remove damon_{p,v}a_{target_valid,set_operations}()
  mm/damon/dbgfs-test: fix is_target_id() change
  ...
2022-03-22 16:11:53 -07:00
Anshuman Khandual
16785bd774 mm: merge pte_mkhuge() call into arch_make_huge_pte()
Each call into pte_mkhuge() is invariably followed by
arch_make_huge_pte().  Instead arch_make_huge_pte() can accommodate
pte_mkhuge() at the beginning.  This updates generic fallback stub for
arch_make_huge_pte() and available platforms definitions.  This makes huge
pte creation much cleaner and easier to follow.

Link: https://lkml.kernel.org/r/1643860669-26307-1-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22 15:57:04 -07:00
Linus Torvalds
356a1adca8 arm64 updates for 5.18
- Support for including MTE tags in ELF coredumps
 
 - Instruction encoder updates, including fixes to 64-bit immediate
   generation and support for the LSE atomic instructions
 
 - Improvements to kselftests for MTE and fpsimd
 
 - Symbol aliasing and linker script cleanups
 
 - Reduce instruction cache maintenance performed for user mappings
   created using contiguous PTEs
 
 - Support for the new "asymmetric" MTE mode, where stores are checked
   asynchronously but loads are checked synchronously
 
 - Support for the latest pointer authentication algorithm ("QARMA3")
 
 - Support for the DDR PMU present in the Marvell CN10K platform
 
 - Support for the CPU PMU present in the Apple M1 platform
 
 - Use the RNDR instruction for arch_get_random_{int,long}()
 
 - Update our copy of the Arm optimised string routines for str{n}cmp()
 
 - Fix signal frame generation for CPUs which have foolishly elected to
   avoid building in support for the fpsimd instructions
 
 - Workaround for Marvell GICv3 erratum #38545
 
 - Clarification to our Documentation (booting reqs. and MTE prctl())
 
 - Miscellanous cleanups and minor fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmIvta8QHHdpbGxAa2Vy
 bmVsLm9yZwAKCRC3rHDchMFjNAIhB/oDSva5FryAFExVuIB+mqRkbZO9kj6fy/5J
 ctN9LEVO2GI/U1TVAUWop1lXmP8Kbq5UCZOAuY8sz7dAZs7NRUWkwTrXVhaTpi6L
 oxCfu5Afu76d/TGgivNz+G7/ewIJRFj5zCPmHezLF9iiWPUkcAsP0XCp4a0iOjU4
 04O4d7TL/ap9ujEes+U0oEXHnyDTPrVB2OVE316FKD1fgztcjVJ2U+TxX5O4xitT
 PPIfeQCjQBq1B2OC1cptE3wpP+YEr9OZJbx+Ieweidy1CSInEy0nZ13tLoUnGPGU
 KPhsvO9daUCbhbd5IDRBuXmTi/sHU4NIB8LNEVzT1mUPnU8pCizv
 =ziGg
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Will Deacon:

 - Support for including MTE tags in ELF coredumps

 - Instruction encoder updates, including fixes to 64-bit immediate
   generation and support for the LSE atomic instructions

 - Improvements to kselftests for MTE and fpsimd

 - Symbol aliasing and linker script cleanups

 - Reduce instruction cache maintenance performed for user mappings
   created using contiguous PTEs

 - Support for the new "asymmetric" MTE mode, where stores are checked
   asynchronously but loads are checked synchronously

 - Support for the latest pointer authentication algorithm ("QARMA3")

 - Support for the DDR PMU present in the Marvell CN10K platform

 - Support for the CPU PMU present in the Apple M1 platform

 - Use the RNDR instruction for arch_get_random_{int,long}()

 - Update our copy of the Arm optimised string routines for str{n}cmp()

 - Fix signal frame generation for CPUs which have foolishly elected to
   avoid building in support for the fpsimd instructions

 - Workaround for Marvell GICv3 erratum #38545

 - Clarification to our Documentation (booting reqs. and MTE prctl())

 - Miscellanous cleanups and minor fixes

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (90 commits)
  docs: sysfs-devices-system-cpu: document "asymm" value for mte_tcf_preferred
  arm64/mte: Remove asymmetric mode from the prctl() interface
  arm64: Add cavium_erratum_23154_cpus missing sentinel
  perf/marvell: Fix !CONFIG_OF build for CN10K DDR PMU driver
  arm64: mm: Drop 'const' from conditional arm64_dma_phys_limit definition
  Documentation: vmcoreinfo: Fix htmldocs warning
  kasan: fix a missing header include of static_keys.h
  drivers/perf: Add Apple icestorm/firestorm CPU PMU driver
  drivers/perf: arm_pmu: Handle 47 bit counters
  arm64: perf: Consistently make all event numbers as 16-bits
  arm64: perf: Expose some Armv9 common events under sysfs
  perf/marvell: cn10k DDR perf event core ownership
  perf/marvell: cn10k DDR perfmon event overflow handling
  perf/marvell: CN10k DDR performance monitor support
  dt-bindings: perf: marvell: cn10k ddr performance monitor
  arm64: clean up tools Makefile
  perf/arm-cmn: Update watchpoint format
  perf/arm-cmn: Hide XP PUB events for CMN-600
  arm64: drop unused includes of <linux/personality.h>
  arm64: Do not defer reserve_crashkernel() for platforms with no DMA memory zones
  ...
2022-03-21 10:46:39 -07:00
Will Deacon
641d804157 Merge branch 'for-next/spectre-bhb' into for-next/core
Merge in the latest Spectre mess to fix up conflicts with what was
already queued for 5.18 when the embargo finally lifted.

* for-next/spectre-bhb: (21 commits)
  arm64: Do not include __READ_ONCE() block in assembly files
  arm64: proton-pack: Include unprivileged eBPF status in Spectre v2 mitigation reporting
  arm64: Use the clearbhb instruction in mitigations
  KVM: arm64: Allow SMCCC_ARCH_WORKAROUND_3 to be discovered and migrated
  arm64: Mitigate spectre style branch history side channels
  arm64: proton-pack: Report Spectre-BHB vulnerabilities as part of Spectre-v2
  arm64: Add percpu vectors for EL1
  arm64: entry: Add macro for reading symbol addresses from the trampoline
  arm64: entry: Add vectors that have the bhb mitigation sequences
  arm64: entry: Add non-kpti __bp_harden_el1_vectors for mitigations
  arm64: entry: Allow the trampoline text to occupy multiple pages
  arm64: entry: Make the kpti trampoline's kpti sequence optional
  arm64: entry: Move trampoline macros out of ifdef'd section
  arm64: entry: Don't assume tramp_vectors is the start of the vectors
  arm64: entry: Allow tramp_alias to access symbols after the 4K boundary
  arm64: entry: Move the trampoline data page before the text page
  arm64: entry: Free up another register on kpti's tramp_exit path
  arm64: entry: Make the trampoline cleanup optional
  KVM: arm64: Allow indirect vectors to be used without SPECTRE_V3A
  arm64: spectre: Rename spectre_v4_patch_fw_mitigation_conduit
  ...
2022-03-14 19:08:31 +00:00
Will Deacon
20fd2ed10f Merge branch 'for-next/mm' into for-next/core
* for-next/mm:
  Documentation: vmcoreinfo: Fix htmldocs warning
  arm64/mm: Drop use_1G_block()
  arm64: avoid flushing icache multiple times on contiguous HugeTLB
  arm64: crash_core: Export MODULES, VMALLOC, and VMEMMAP ranges
  arm64/hugetlb: Define __hugetlb_valid_size()
  arm64/mm: avoid fixmap race condition when create pud mapping
  arm64/mm: Consolidate TCR_EL1 fields
2022-03-14 19:01:18 +00:00
Will Deacon
b3ea0eafa9 Merge branch 'for-next/misc' into for-next/core
* for-next/misc:
  arm64: mm: Drop 'const' from conditional arm64_dma_phys_limit definition
  arm64: clean up tools Makefile
  arm64: drop unused includes of <linux/personality.h>
  arm64: Do not defer reserve_crashkernel() for platforms with no DMA memory zones
  arm64: prevent instrumentation of bp hardening callbacks
  arm64: cpufeature: Remove cpu_has_fwb() check
  arm64: atomics: remove redundant static branch
  arm64: entry: Save some nops when CONFIG_ARM64_PSEUDO_NMI is not set
2022-03-14 19:01:12 +00:00
Will Deacon
563c463595 Merge branch 'for-next/linkage' into for-next/core
* for-next/linkage:
  arm64: module: remove (NOLOAD) from linker script
  linkage: remove SYM_FUNC_{START,END}_ALIAS()
  x86: clean up symbol aliasing
  arm64: clean up symbol aliasing
  linkage: add SYM_FUNC_ALIAS{,_LOCAL,_WEAK}()
2022-03-14 19:01:05 +00:00
Linus Torvalds
e7e19defa5 - Fix compilation of eBPF object files that indirectly include
mte-kasan.h.
 
 - Fix test for execute-only permissions with EPAN (Enhanced Privileged
   Access Never, ARMv8.7 feature).
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmIoyFYACgkQa9axLQDI
 XvGvuw/+OFBDYhvIY8C845RTzmpjrukTusy7GQcin5XpplBzxr2z6AnGuxN+Fvez
 UJZdzJLocwZRNiNqzdbIC0ycMmtEPKn/QZzGFpmsFs42wQOlztrQx7PjdOCnn0HR
 Mtcd0BTHRAogkPKqfvkuiUqCrkorzQ4ka+EN7TavzxMEfegzqBsZk5r9eE7xgGvc
 KLPmz9pFB3K3dFfUhfneHdWrPwERrCjk8ygT3Ia9Sg3UcyT7jzNGOtXBAOLgVuXY
 w/0z32H1TIBbmIVgakXHE0XqXmh5Z53zPO6T2wsOJNEVbHTnLbq1aRcbw2K5dvWc
 hoSZWharQ72yWn8VHu8w3zropNHiSdCSYBIK3jeVzh4edxCvuRmPuTk2g9oDoSUp
 zVHVA8v5GeGHZdJ2Jk5mPK/mRlwN/GbRg4lhhUhkglx9mWaAdE9j8ouGQPSXFjbr
 J3rsVxqYb2948IHz5WOlXJc2baVf9MVS49yZI03cFWyBl1FMTYMDcDkQc0EtM7J2
 Z/VMc6r+22vW/IFKmyCqxJbQh+BnO5X5HS6+1r08uoMYvyynV+ua7MO7qaVI+6cX
 zFbSfkGkyGCOdJGng7BrlmVABeO0VQqb3rsL1OEiYqOm45ekiwM99HiodxaUkC0K
 mlbDxslBf8ei2XzaPz1bg8T9gov19PmJ38NaYmUDWy59mW/ryOM=
 =qWQy
 -----END PGP SIGNATURE-----

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fixes from Catalin Marinas:

 - Fix compilation of eBPF object files that indirectly include
   mte-kasan.h.

 - Fix test for execute-only permissions with EPAN (Enhanced Privileged
   Access Never, ARMv8.7 feature).

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: kasan: fix include error in MTE functions
  arm64: Ensure execute-only permissions are not allowed without EPAN
2022-03-09 12:59:21 -08:00
Will Deacon
770093459b arm64: mm: Drop 'const' from conditional arm64_dma_phys_limit definition
Commit 031495635b ("arm64: Do not defer reserve_crashkernel() for
platforms with no DMA memory zones") introduced different definitions
for 'arm64_dma_phys_limit' depending on CONFIG_ZONE_DMA{,32} based on
a late suggestion from Pasha. Sadly, this results in a build error when
passing W=1:

  | arch/arm64/mm/init.c:90:19: error: conflicting type qualifiers for 'arm64_dma_phys_limit'

Drop the 'const' for now and use '__ro_after_init' consistently.

Link: https://lore.kernel.org/r/202203090241.aj7paWeX-lkp@intel.com
Link: https://lore.kernel.org/r/CA+CK2bDbbx=8R=UthkMesWOST8eJMtOGJdfMRTFSwVmo0Vn0EA@mail.gmail.com
Fixes: 031495635b ("arm64: Do not defer reserve_crashkernel() for platforms with no DMA memory zones")
Signed-off-by: Will Deacon <will@kernel.org>
2022-03-09 12:21:37 +00:00
Vijay Balakrishna
031495635b arm64: Do not defer reserve_crashkernel() for platforms with no DMA memory zones
The following patches resulted in deferring crash kernel reservation to
mem_init(), mainly aimed at platforms with DMA memory zones (no IOMMU),
in particular Raspberry Pi 4.

commit 1a8e1cef76 ("arm64: use both ZONE_DMA and ZONE_DMA32")
commit 8424ecdde7 ("arm64: mm: Set ZONE_DMA size based on devicetree's dma-ranges")
commit 0a30c53573 ("arm64: mm: Move reserve_crashkernel() into mem_init()")
commit 2687275a58 ("arm64: Force NO_BLOCK_MAPPINGS if crashkernel reservation is required")

Above changes introduced boot slowdown due to linear map creation for
all the memory banks with NO_BLOCK_MAPPINGS, see discussion[1].  The proposed
changes restore crash kernel reservation to earlier behavior thus avoids
slow boot, particularly for platforms with IOMMU (no DMA memory zones).

Tested changes to confirm no ~150ms boot slowdown on our SoC with IOMMU
and 8GB memory.  Also tested with ZONE_DMA and/or ZONE_DMA32 configs to confirm
no regression to deferring scheme of crash kernel memory reservation.
In both cases successfully collected kernel crash dump.

[1] https://lore.kernel.org/all/9436d033-579b-55fa-9b00-6f4b661c2dd7@linux.microsoft.com/

Signed-off-by: Vijay Balakrishna <vijayb@linux.microsoft.com>
Cc: stable@vger.kernel.org
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Link: https://lore.kernel.org/r/1646242689-20744-1-git-send-email-vijayb@linux.microsoft.com
[will: Add #ifdef CONFIG_KEXEC_CORE guards to fix 'crashk_res' references in allnoconfig build]
Signed-off-by: Will Deacon <will@kernel.org>
2022-03-08 10:22:33 +00:00
Catalin Marinas
6e2edd6371 arm64: Ensure execute-only permissions are not allowed without EPAN
Commit 18107f8a2d ("arm64: Support execute-only permissions with
Enhanced PAN") re-introduced execute-only permissions when EPAN is
available. When EPAN is not available, arch_filter_pgprot() is supposed
to change a PAGE_EXECONLY permission into PAGE_READONLY_EXEC. However,
if BTI or MTE are present, such check does not detect the execute-only
pgprot in the presence of PTE_GP (BTI) or MT_NORMAL_TAGGED (MTE),
allowing the user to request PROT_EXEC with PROT_BTI or PROT_MTE.

Remove the arch_filter_pgprot() function, change the default VM_EXEC
permissions to PAGE_READONLY_EXEC and update the protection_map[] array
at core_initcall() if EPAN is detected.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Fixes: 18107f8a2d ("arm64: Support execute-only permissions with Enhanced PAN")
Cc: <stable@vger.kernel.org> # 5.13.x
Acked-by: Will Deacon <will@kernel.org>
Reviewed-by: Vladimir Murzin <vladimir.murzin@arm.com>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com>
2022-03-08 10:03:51 +00:00
Anshuman Khandual
1310222c27 arm64/mm: Drop use_1G_block()
pud_sect_supported() already checks for PUD level block mapping support i.e
on ARM64_4K_PAGES config. Hence pud_sect_supported(), along with some other
required alignment checks can help completely drop use_1G_block().

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/1644988012-25455-1-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-03-07 21:44:22 +00:00
Muchun Song
cf5a501d98 arm64: avoid flushing icache multiple times on contiguous HugeTLB
When a contiguous HugeTLB page is mapped, set_pte_at() will be called
CONT_PTES/CONT_PMDS times.  Therefore, __sync_icache_dcache() will
flush cache multiple times if the page is executable (to ensure
the I-D cache coherency).  However, the first flushing cache already
covers subsequent cache flush operations.  So only flusing cache
for the head page if it is a HugeTLB page to avoid redundant cache
flushing.  In the next patch, it is also depends on this change
since the tail vmemmap pages of HugeTLB is mapped with read-only
meanning only head page struct can be modified.

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20220302084624.33340-1-songmuchun@bytedance.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-03-07 21:42:34 +00:00
Christoph Hellwig
dc90f0846d mm: don't include <linux/memremap.h> in <linux/mm.h>
Move the check for the actual pgmap types that need the free at refcount
one behavior into the out of line helper, and thus avoid the need to
pull memremap.h into mm.h.

Link: https://lkml.kernel.org/r/20220210072828.2930359-7-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Tested-by: "Sierra Guiza, Alejandro (Alex)" <alex.sierra@amd.com>

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Chaitanya Kulkarni <kch@nvidia.com>
Cc: Karol Herbst <kherbst@redhat.com>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-03-03 12:47:33 -05:00
Anshuman Khandual
a8a733b201 arm64/hugetlb: Define __hugetlb_valid_size()
arch_hugetlb_valid_size() can be just factored out to create another helper
to be used in arch_hugetlb_migration_supported() as well. This just defines
__hugetlb_valid_size() for that purpose.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/1645073557-6150-1-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-02-22 22:11:54 +00:00
Mark Rutland
0f61f6be1f arm64: clean up symbol aliasing
Now that we have SYM_FUNC_ALIAS() and SYM_FUNC_ALIAS_WEAK(), use those
to simplify and more consistently define function aliases across
arch/arm64.

Aliases are now defined in terms of a canonical function name. For
position-independent functions I've made the __pi_<func> name the
canonical name, and defined other alises in terms of this.

The SYM_FUNC_{START,END}_PI(func) macros obscure the __pi_<func> name,
and make this hard to seatch for. The SYM_FUNC_START_WEAK_PI() macro
also obscures the fact that the __pi_<func> fymbol is global and the
<func> symbol is weak. For clarity, I have removed these macros and used
SYM_FUNC_{START,END}() directly with the __pi_<func> name.

For example:

	SYM_FUNC_START_WEAK_PI(func)
	... asm insns ...
	SYM_FUNC_END_PI(func)
	EXPORT_SYMBOL(func)

... becomes:

	SYM_FUNC_START(__pi_func)
	... asm insns ...
	SYM_FUNC_END(__pi_func)

	SYM_FUNC_ALIAS_WEAK(func, __pi_func)
	EXPORT_SYMBOL(func)

For clarity, where there are multiple annotations such as
EXPORT_SYMBOL(), I've tried to keep annotations grouped by symbol. For
example, where a function has a name and an alias which are both
exported, this is organised as:

	SYM_FUNC_START(func)
	... asm insns ...
	SYM_FUNC_END(func)
	EXPORT_SYMBOL(func)

	SYM_FUNC_ALIAS(alias, func)
	EXPORT_SYMBOL(alias)

For consistency with the other string functions, I've defined strrchr as
a position-independent function, as it can safely be used as such even
though we have no users today.

As we no longer use SYM_FUNC_{START,END}_ALIAS(), our local copies are
removed. The common versions will be removed by a subsequent patch.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Mark Brown <broonie@kernel.org>
Cc: Joey Gouly <joey.gouly@arm.com>
Cc: Will Deacon <will@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220216162229.1076788-3-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-02-22 16:21:34 +00:00
Catalin Marinas
ab1e435ca7 arm64: mte: Define the number of bytes for storing the tags in a page
Rather than explicitly calculating the number of bytes for a compact tag
storage format corresponding to a page, just add a MTE_PAGE_TAG_STORAGE
macro. With the current MTE implementation of 4 bits per tag, we store
2 tags in a byte.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Luis Machado <luis.machado@linaro.org>
Link: https://lore.kernel.org/r/20220131165456.2160675-4-catalin.marinas@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-02-15 22:53:29 +00:00
James Morse
a9c406e646 arm64: entry: Allow the trampoline text to occupy multiple pages
Adding a second set of vectors to .entry.tramp.text will make it
larger than a single 4K page.

Allow the trampoline text to occupy up to three pages by adding two
more fixmap slots. Previous changes to tramp_valias allowed it to reach
beyond a single page.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
2022-02-15 17:40:28 +00:00
Jianyong Wu
ee017ee353 arm64/mm: avoid fixmap race condition when create pud mapping
The 'fixmap' is a global resource and is used recursively by
create pud mapping(), leading to a potential race condition in the
presence of a concurrent call to alloc_init_pud():

kernel_init thread                          virtio-mem workqueue thread
==================                          ===========================

  alloc_init_pud(...)                       alloc_init_pud(...)
  pudp = pud_set_fixmap_offset(...)         pudp = pud_set_fixmap_offset(...)
  READ_ONCE(*pudp)
  pud_clear_fixmap(...)
                                            READ_ONCE(*pudp) // CRASH!

As kernel may sleep during creating pud mapping, introduce a mutex lock to
serialise use of the fixmap entries by alloc_init_pud(). However, there is
no need for locking in early boot stage and it doesn't work well with
KASLR enabled when early boot. So, enable lock when system_state doesn't
equal to "SYSTEM_BOOTING".

Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Fixes: f471044545 ("arm64: mm: use fixmap when creating page tables")
Link: https://lore.kernel.org/r/20220201114400.56885-1-jianyong.wu@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-02-15 15:45:24 +00:00
Anshuman Khandual
e921da6bc7 arm64/mm: Consolidate TCR_EL1 fields
This renames and moves SYS_TCR_EL1_TCMA1 and SYS_TCR_EL1_TCMA0 definitions
into pgtable-hwdef.h thus consolidating all TCR fields in a single header.
This does not cause any functional change.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/1643121513-21854-1-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-02-15 15:34:22 +00:00
Evgenii Stepanov
3758a6c74e arm64: extable: fix load_unaligned_zeropad() reg indices
In ex_handler_load_unaligned_zeropad() we erroneously extract the data and
addr register indices from ex->type rather than ex->data. As ex->type will
contain EX_TYPE_LOAD_UNALIGNED_ZEROPAD (i.e. 4):
 * We'll always treat X0 as the address register, since EX_DATA_REG_ADDR is
   extracted from bits [9:5]. Thus, we may attempt to dereference an
   arbitrary address as X0 may hold an arbitrary value.
 * We'll always treat X4 as the data register, since EX_DATA_REG_DATA is
   extracted from bits [4:0]. Thus we will corrupt X4 and cause arbitrary
   behaviour within load_unaligned_zeropad() and its caller.

Fix this by extracting both values from ex->data as originally intended.

On an MTE-enabled QEMU image we are hitting the following crash:
 Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
 Call trace:
  fixup_exception+0xc4/0x108
  __do_kernel_fault+0x3c/0x268
  do_tag_check_fault+0x3c/0x104
  do_mem_abort+0x44/0xf4
  el1_abort+0x40/0x64
  el1h_64_sync_handler+0x60/0xa0
  el1h_64_sync+0x7c/0x80
  link_path_walk+0x150/0x344
  path_openat+0xa0/0x7dc
  do_filp_open+0xb8/0x168
  do_sys_openat2+0x88/0x17c
  __arm64_sys_openat+0x74/0xa0
  invoke_syscall+0x48/0x148
  el0_svc_common+0xb8/0xf8
  do_el0_svc+0x28/0x88
  el0_svc+0x24/0x84
  el0t_64_sync_handler+0x88/0xec
  el0t_64_sync+0x1b4/0x1b8
 Code: f8695a69 71007d1f 540000e0 927df12a (f940014a)

Fixes: 753b323687 ("arm64: extable: add load_unaligned_zeropad() handler")
Cc: <stable@vger.kernel.org> # 5.16.x
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Evgenii Stepanov <eugenis@google.com>
Link: https://lore.kernel.org/r/20220125182217.2605202-1-eugenis@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-01-26 18:58:12 +00:00
Linus Torvalds
b21bae9af1 arm64 fixes/cleanups:
- Add brackets to the io_stop_wc macro.
 
 - Avoid -Warray-bounds warning with the LSE atomics inline asm.
 
 - Apply __ro_after_init to memory_limit.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmHq44IACgkQa9axLQDI
 XvFbHBAAqVG6jcW7LfNYbGED28iIj08c2o0Y4+p87caXR0xNDobrPYtUl3zy76l9
 lJ+OTrUdBCBrd5exHghyJPvnC+dB1yjSu595DrsN4eSJ0/jfkfyv+vGHTASoyoeK
 HHL6L9rj3jBlpgDMuRiJFggdUddTSocWMpLDjUbZCNDpQOkf8TakFYrAOosooixS
 g0Psq5fJ6aP+yymQMQuIJyGiRzskSilTdjFSbMSqFvrCPfzQMZWLnMtF5OAs9Gbi
 Wcxb1AKAJLZFfy+XslgK709jIeTZ6CExTbWoY/yP9sxqgz+ALbj/dQBY4ryyUMVp
 C/Vu/bMLEnPW5jgppGXftsnsjivmVj1frb2/5+IY8SHI6Niq+8dna774jOLy6Mad
 PjYWSg5QxQSw4loZJpGzYduf6G3keKjSSkgeNns3n4f1ZqzAgHCuFF7TglIOeQvp
 /8AmR3u5fHjp0sMl6DE6zJ+cMTXwAxdXZYPHY9PWexLEsKiS1vkMbN2LYcb4tIjt
 xjymVGUUZ5/qKQSeZoKCyhjdgc06G6A5W+A/TmgVDGqkBa/U33/PvwoAxYwmaIf8
 z02/MEjliP3IRvG0KzX+Cw7SHLtJ+nIgaZQgSK4QZBmKrJQs6CI1M5V1g4gVhpmM
 sqwtAiIHpUspmu+dP/fF4vcGzvgnZam2kQDv5mv1vFy3Dh1yn0k=
 =LonE
 -----END PGP SIGNATURE-----

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fixes/cleanups from Catalin Marinas:
 "Some fixes that turned up during the merge window:

   - Add brackets to the io_stop_wc macro

   - Avoid -Warray-bounds warning with the LSE atomics inline asm

   - Apply __ro_after_init to memory_limit"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: mm: apply __ro_after_init to memory_limit
  arm64: atomics: lse: Dereference matching size
  asm-generic: Add missing brackets for io_stop_wc macro
2022-01-22 09:22:10 +02:00
Peng Fan
bb425a7598 arm64: mm: apply __ro_after_init to memory_limit
This variable is only set during initialization, so mark with
__ro_after_init.

Signed-off-by: Peng Fan <peng.fan@nxp.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20211215064559.2843555-1-peng.fan@oss.nxp.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-01-20 09:15:16 +00:00
Linus Torvalds
35ce8ae9ae Merge branch 'signal-for-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull signal/exit/ptrace updates from Eric Biederman:
 "This set of changes deletes some dead code, makes a lot of cleanups
  which hopefully make the code easier to follow, and fixes bugs found
  along the way.

  The end-game which I have not yet reached yet is for fatal signals
  that generate coredumps to be short-circuit deliverable from
  complete_signal, for force_siginfo_to_task not to require changing
  userspace configured signal delivery state, and for the ptrace stops
  to always happen in locations where we can guarantee on all
  architectures that the all of the registers are saved and available on
  the stack.

  Removal of profile_task_ext, profile_munmap, and profile_handoff_task
  are the big successes for dead code removal this round.

  A bunch of small bug fixes are included, as most of the issues
  reported were small enough that they would not affect bisection so I
  simply added the fixes and did not fold the fixes into the changes
  they were fixing.

  There was a bug that broke coredumps piped to systemd-coredump. I
  dropped the change that caused that bug and replaced it entirely with
  something much more restrained. Unfortunately that required some
  rebasing.

  Some successes after this set of changes: There are few enough calls
  to do_exit to audit in a reasonable amount of time. The lifetime of
  struct kthread now matches the lifetime of struct task, and the
  pointer to struct kthread is no longer stored in set_child_tid. The
  flag SIGNAL_GROUP_COREDUMP is removed. The field group_exit_task is
  removed. Issues where task->exit_code was examined with
  signal->group_exit_code should been examined were fixed.

  There are several loosely related changes included because I am
  cleaning up and if I don't include them they will probably get lost.

  The original postings of these changes can be found at:
     https://lkml.kernel.org/r/87a6ha4zsd.fsf@email.froward.int.ebiederm.org
     https://lkml.kernel.org/r/87bl1kunjj.fsf@email.froward.int.ebiederm.org
     https://lkml.kernel.org/r/87r19opkx1.fsf_-_@email.froward.int.ebiederm.org

  I trimmed back the last set of changes to only the obviously correct
  once. Simply because there was less time for review than I had hoped"

* 'signal-for-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (44 commits)
  ptrace/m68k: Stop open coding ptrace_report_syscall
  ptrace: Remove unused regs argument from ptrace_report_syscall
  ptrace: Remove second setting of PT_SEIZED in ptrace_attach
  taskstats: Cleanup the use of task->exit_code
  exit: Use the correct exit_code in /proc/<pid>/stat
  exit: Fix the exit_code for wait_task_zombie
  exit: Coredumps reach do_group_exit
  exit: Remove profile_handoff_task
  exit: Remove profile_task_exit & profile_munmap
  signal: clean up kernel-doc comments
  signal: Remove the helper signal_group_exit
  signal: Rename group_exit_task group_exec_task
  coredump: Stop setting signal->group_exit_task
  signal: Remove SIGNAL_GROUP_COREDUMP
  signal: During coredumps set SIGNAL_GROUP_EXIT in zap_process
  signal: Make coredump handling explicit in complete_signal
  signal: Have prepare_signal detect coredumps using signal->core_state
  signal: Have the oom killer detect coredumps using signal->core_state
  exit: Move force_uaccess back into do_exit
  exit: Guarantee make_task_dead leaks the tsk when calling do_task_exit
  ...
2022-01-17 05:49:30 +02:00
Linus Torvalds
f56caedaf9 Merge branch 'akpm' (patches from Andrew)
Merge misc updates from Andrew Morton:
 "146 patches.

  Subsystems affected by this patch series: kthread, ia64, scripts,
  ntfs, squashfs, ocfs2, vfs, and mm (slab-generic, slab, kmemleak,
  dax, kasan, debug, pagecache, gup, shmem, frontswap, memremap,
  memcg, selftests, pagemap, dma, vmalloc, memory-failure, hugetlb,
  userfaultfd, vmscan, mempolicy, oom-kill, hugetlbfs, migration, thp,
  ksm, page-poison, percpu, rmap, zswap, zram, cleanups, hmm, and
  damon)"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (146 commits)
  mm/damon: hide kernel pointer from tracepoint event
  mm/damon/vaddr: hide kernel pointer from damon_va_three_regions() failure log
  mm/damon/vaddr: use pr_debug() for damon_va_three_regions() failure logging
  mm/damon/dbgfs: remove an unnecessary variable
  mm/damon: move the implementation of damon_insert_region to damon.h
  mm/damon: add access checking for hugetlb pages
  Docs/admin-guide/mm/damon/usage: update for schemes statistics
  mm/damon/dbgfs: support all DAMOS stats
  Docs/admin-guide/mm/damon/reclaim: document statistics parameters
  mm/damon/reclaim: provide reclamation statistics
  mm/damon/schemes: account how many times quota limit has exceeded
  mm/damon/schemes: account scheme actions that successfully applied
  mm/damon: remove a mistakenly added comment for a future feature
  Docs/admin-guide/mm/damon/usage: update for kdamond_pid and (mk|rm)_contexts
  Docs/admin-guide/mm/damon/usage: mention tracepoint at the beginning
  Docs/admin-guide/mm/damon/usage: remove redundant information
  Docs/admin-guide/mm/damon/usage: update for scheme quotas and watermarks
  mm/damon: convert macro functions to static inline functions
  mm/damon: modify damon_rand() macro to static inline function
  mm/damon: move damon_rand() definition into damon.h
  ...
2022-01-15 20:37:06 +02:00
Qi Zheng
36ef159f44 mm: remove redundant check about FAULT_FLAG_ALLOW_RETRY bit
Since commit 4064b98270 ("mm: allow VM_FAULT_RETRY for multiple
times") allowed VM_FAULT_RETRY for multiple times, the
FAULT_FLAG_ALLOW_RETRY bit of fault_flag will not be changed in the page
fault path, so the following check is no longer needed:

	flags & FAULT_FLAG_ALLOW_RETRY

So just remove it.

[akpm@linux-foundation.org: coding style fixes]

Link: https://lkml.kernel.org/r/20211110123358.36511-1-zhengqi.arch@bytedance.com
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Kirill Shutemov <kirill@shutemov.name>
Cc: Peter Xu <peterx@redhat.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-15 16:30:27 +02:00
Linus Torvalds
8efd0d9c31 Networking changes for 5.17.
Core
 ----
 
  - Defer freeing TCP skbs to the BH handler, whenever possible,
    or at least perform the freeing outside of the socket lock section
    to decrease cross-CPU allocator work and improve latency.
 
  - Add netdevice refcount tracking to locate sources of netdevice
    and net namespace refcount leaks.
 
  - Make Tx watchdog less intrusive - avoid pausing Tx and restarting
    all queues from a single CPU removing latency spikes.
 
  - Various small optimizations throughout the stack from Eric Dumazet.
 
  - Make netdev->dev_addr[] constant, force modifications to go via
    appropriate helpers to allow us to keep addresses in ordered data
    structures.
 
  - Replace unix_table_lock with per-hash locks, improving performance
    of bind() calls.
 
  - Extend skb drop tracepoint with a drop reason.
 
  - Allow SO_MARK and SO_PRIORITY setsockopt under CAP_NET_RAW.
 
 BPF
 ---
 
  - New helpers:
    - bpf_find_vma(), find and inspect VMAs for profiling use cases
    - bpf_loop(), runtime-bounded loop helper trading some execution
      time for much faster (if at all converging) verification
    - bpf_strncmp(), improve performance, avoid compiler flakiness
    - bpf_get_func_arg(), bpf_get_func_ret(), bpf_get_func_arg_cnt()
      for tracing programs, all inlined by the verifier
 
  - Support BPF relocations (CO-RE) in the kernel loader.
 
  - Further the support for BTF_TYPE_TAG annotations.
 
  - Allow access to local storage in sleepable helpers.
 
  - Convert verifier argument types to a composable form with different
    attributes which can be shared across types (ro, maybe-null).
 
  - Prepare libbpf for upcoming v1.0 release by cleaning up APIs,
    creating new, extensible ones where missing and deprecating those
    to be removed.
 
 Protocols
 ---------
 
  - WiFi (mac80211/cfg80211):
    - notify user space about long "come back in N" AP responses,
      allow it to react to such temporary rejections
    - allow non-standard VHT MCS 10/11 rates
    - use coarse time in airtime fairness code to save CPU cycles
 
  - Bluetooth:
    - rework of HCI command execution serialization to use a common
      queue and work struct, and improve handling errors reported
      in the middle of a batch of commands
    - rework HCI event handling to use skb_pull_data, avoiding packet
      parsing pitfalls
    - support AOSP Bluetooth Quality Report
 
  - SMC:
    - support net namespaces, following the RDMA model
    - improve connection establishment latency by pre-clearing buffers
    - introduce TCP ULP for automatic redirection to SMC
 
  - Multi-Path TCP:
    - support ioctls: SIOCINQ, OUTQ, and OUTQNSD
    - support socket options: IP_TOS, IP_FREEBIND, IP_TRANSPARENT,
      IPV6_FREEBIND, and IPV6_TRANSPARENT, TCP_CORK and TCP_NODELAY
    - support cmsgs: TCP_INQ
    - improvements in the data scheduler (assigning data to subflows)
    - support fastclose option (quick shutdown of the full MPTCP
      connection, similar to TCP RST in regular TCP)
 
  - MCTP (Management Component Transport) over serial, as defined by
    DMTF spec DSP0253 - "MCTP Serial Transport Binding".
 
 Driver API
 ----------
 
  - Support timestamping on bond interfaces in active/passive mode.
 
  - Introduce generic phylink link mode validation for drivers which
    don't have any quirks and where MAC capability bits fully express
    what's supported. Allow PCS layer to participate in the validation.
    Convert a number of drivers.
 
  - Add support to set/get size of buffers on the Rx rings and size of
    the tx copybreak buffer via ethtool.
 
  - Support offloading TC actions as first-class citizens rather than
    only as attributes of filters, improve sharing and device resource
    utilization.
 
  - WiFi (mac80211/cfg80211):
    - support forwarding offload (ndo_fill_forward_path)
    - support for background radar detection hardware
    - SA Query Procedures offload on the AP side
 
 New hardware / drivers
 ----------------------
 
  - tsnep - FPGA based TSN endpoint Ethernet MAC used in PLCs with
    real-time requirements for isochronous communication with protocols
    like OPC UA Pub/Sub.
 
  - Qualcomm BAM-DMUX WWAN - driver for data channels of modems
    integrated into many older Qualcomm SoCs, e.g. MSM8916 or
    MSM8974 (qcom_bam_dmux).
 
  - Microchip LAN966x multi-port Gigabit AVB/TSN Ethernet Switch
    driver with support for bridging, VLANs and multicast forwarding
    (lan966x).
 
  - iwlmei driver for co-operating between Intel's WiFi driver and
    Intel's Active Management Technology (AMT) devices.
 
  - mse102x - Vertexcom MSE102x Homeplug GreenPHY chips
 
  - Bluetooth:
    - MediaTek MT7921 SDIO devices
    - Foxconn MT7922A
    - Realtek RTL8852AE
 
 Drivers
 -------
 
  - Significantly improve performance in the datapaths of:
    lan78xx, ax88179_178a, lantiq_xrx200, bnxt.
 
  - Intel Ethernet NICs:
    - igb: support PTP/time PEROUT and EXTTS SDP functions on
      82580/i354/i350 adapters
    - ixgbevf: new PF -> VF mailbox API which avoids the risk of
      mailbox corruption with ESXi
    - iavf: support configuration of VLAN features of finer granularity,
      stacked tags and filtering
    - ice: PTP support for new E822 devices with sub-ns precision
    - ice: support firmware activation without reboot
 
  - Mellanox Ethernet NICs (mlx5):
    - expose control over IRQ coalescing mode (CQE vs EQE) via ethtool
    - support TC forwarding when tunnel encap and decap happen between
      two ports of the same NIC
    - dynamically size and allow disabling various features to save
      resources for running in embedded / SmartNIC scenarios
 
  - Broadcom Ethernet NICs (bnxt):
    - use page frag allocator to improve Rx performance
    - expose control over IRQ coalescing mode (CQE vs EQE) via ethtool
 
  - Other Ethernet NICs:
    - amd-xgbe: add Ryzen 6000 (Yellow Carp) Ethernet support
 
  - Microsoft cloud/virtual NIC (mana):
    - add XDP support (PASS, DROP, TX)
 
  - Mellanox Ethernet switches (mlxsw):
    - initial support for Spectrum-4 ASICs
    - VxLAN with IPv6 underlay
 
  - Marvell Ethernet switches (prestera):
    - support flower flow templates
    - add basic IP forwarding support
 
  - NXP embedded Ethernet switches (ocelot & felix):
    - support Per-Stream Filtering and Policing (PSFP)
    - enable cut-through forwarding between ports by default
    - support FDMA to improve packet Rx/Tx to CPU
 
  - Other embedded switches:
    - hellcreek: improve trapping management (STP and PTP) packets
    - qca8k: support link aggregation and port mirroring
 
  - Qualcomm 802.11ax WiFi (ath11k):
    - qca6390, wcn6855: enable 802.11 power save mode in station mode
    - BSS color change support
    - WCN6855 hw2.1 support
    - 11d scan offload support
    - scan MAC address randomization support
    - full monitor mode, only supported on QCN9074
    - qca6390/wcn6855: report signal and tx bitrate
    - qca6390: rfkill support
    - qca6390/wcn6855: regdb.bin support
 
  - Intel WiFi (iwlwifi):
    - support SAR GEO Offset Mapping (SGOM) and Time-Aware-SAR (TAS)
      in cooperation with the BIOS
    - support for Optimized Connectivity Experience (OCE) scan
    - support firmware API version 68
    - lots of preparatory work for the upcoming Bz device family
 
  - MediaTek WiFi (mt76):
    - Specific Absorption Rate (SAR) support
    - mt7921: 160 MHz channel support
 
  - RealTek WiFi (rtw88):
    - Specific Absorption Rate (SAR) support
    - scan offload
 
  - Other WiFi NICs
    - ath10k: support fetching (pre-)calibration data from nvmem
    - brcmfmac: configure keep-alive packet on suspend
    - wcn36xx: beacon filter support
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmHbkZAACgkQMUZtbf5S
 IruYkQ//XX7BggcwBfukPK83j0dONolClijqKcKR08g4vB5L8GXvv6OErKIWrh4k
 h8JanCH352ZkbCSw3MvFdm825UYQv8vPMd6Qks/LJ4aSKqCuy4MIlAo+yOw4Km3O
 i7++lRfma6DqHHI59wvLjWoxZSPu8lL+rI8UsZ5qMOlnNlGAOXsNrzRjaqQ3FddY
 AMxZeBUtrPqUCCQZFq3U8apkYzUp7CA/3XR9zRcja3uPbrtOV2G+4whRF90qGNWz
 Tm/QvJ9F/Ab292cbhxR4KuaQ3hUhaCQyDjbZk3+FZzZpAVhYTVqcNjny6+yXmbiP
 NXRtwemnl1NlWKMnJM8lEeY48u626tRIkxA/Wtd61uoO5uKUSxfGP+UpUi+DfXbF
 yIw50VQ7L2bpxXP/HjtmhVgZDaWKYyh22Zw4Hp/muMJz0hgUB0KODY3tf2jUWbjJ
 0oEgocWyzhhwMQKqupTDCIaRgIs2ewYr4ZrFDhI3HnHC/vv1VjoPRUPIyxwppD2N
 cXvZb3B1sWK8iX5gCbISGzyU4bB7I0rvJSTU42ueti7n6NqRFZ79qHQpYnnY+JdO
 z1qOwY/d/yWfBoXVKRtRg2qz6CdEt5BQklwAgVEBgrFpf58gp694EwGMb1htY14J
 r/k9bVpmyIFpUnBH2CPMRfBVA3tUTqzyzzFV4AMw40NYLKmhLdo=
 =KLm3
 -----END PGP SIGNATURE-----

Merge tag '5.17-net-next' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Jakub Kicinski:
 "Core
  ----

   - Defer freeing TCP skbs to the BH handler, whenever possible, or at
     least perform the freeing outside of the socket lock section to
     decrease cross-CPU allocator work and improve latency.

   - Add netdevice refcount tracking to locate sources of netdevice and
     net namespace refcount leaks.

   - Make Tx watchdog less intrusive - avoid pausing Tx and restarting
     all queues from a single CPU removing latency spikes.

   - Various small optimizations throughout the stack from Eric Dumazet.

   - Make netdev->dev_addr[] constant, force modifications to go via
     appropriate helpers to allow us to keep addresses in ordered data
     structures.

   - Replace unix_table_lock with per-hash locks, improving performance
     of bind() calls.

   - Extend skb drop tracepoint with a drop reason.

   - Allow SO_MARK and SO_PRIORITY setsockopt under CAP_NET_RAW.

  BPF
  ---

   - New helpers:
      - bpf_find_vma(), find and inspect VMAs for profiling use cases
      - bpf_loop(), runtime-bounded loop helper trading some execution
        time for much faster (if at all converging) verification
      - bpf_strncmp(), improve performance, avoid compiler flakiness
      - bpf_get_func_arg(), bpf_get_func_ret(), bpf_get_func_arg_cnt()
        for tracing programs, all inlined by the verifier

   - Support BPF relocations (CO-RE) in the kernel loader.

   - Further the support for BTF_TYPE_TAG annotations.

   - Allow access to local storage in sleepable helpers.

   - Convert verifier argument types to a composable form with different
     attributes which can be shared across types (ro, maybe-null).

   - Prepare libbpf for upcoming v1.0 release by cleaning up APIs,
     creating new, extensible ones where missing and deprecating those
     to be removed.

  Protocols
  ---------

   - WiFi (mac80211/cfg80211):
      - notify user space about long "come back in N" AP responses,
        allow it to react to such temporary rejections
      - allow non-standard VHT MCS 10/11 rates
      - use coarse time in airtime fairness code to save CPU cycles

   - Bluetooth:
      - rework of HCI command execution serialization to use a common
        queue and work struct, and improve handling errors reported in
        the middle of a batch of commands
      - rework HCI event handling to use skb_pull_data, avoiding packet
        parsing pitfalls
      - support AOSP Bluetooth Quality Report

   - SMC:
      - support net namespaces, following the RDMA model
      - improve connection establishment latency by pre-clearing buffers
      - introduce TCP ULP for automatic redirection to SMC

   - Multi-Path TCP:
      - support ioctls: SIOCINQ, OUTQ, and OUTQNSD
      - support socket options: IP_TOS, IP_FREEBIND, IP_TRANSPARENT,
        IPV6_FREEBIND, and IPV6_TRANSPARENT, TCP_CORK and TCP_NODELAY
      - support cmsgs: TCP_INQ
      - improvements in the data scheduler (assigning data to subflows)
      - support fastclose option (quick shutdown of the full MPTCP
        connection, similar to TCP RST in regular TCP)

   - MCTP (Management Component Transport) over serial, as defined by
     DMTF spec DSP0253 - "MCTP Serial Transport Binding".

  Driver API
  ----------

   - Support timestamping on bond interfaces in active/passive mode.

   - Introduce generic phylink link mode validation for drivers which
     don't have any quirks and where MAC capability bits fully express
     what's supported. Allow PCS layer to participate in the validation.
     Convert a number of drivers.

   - Add support to set/get size of buffers on the Rx rings and size of
     the tx copybreak buffer via ethtool.

   - Support offloading TC actions as first-class citizens rather than
     only as attributes of filters, improve sharing and device resource
     utilization.

   - WiFi (mac80211/cfg80211):
      - support forwarding offload (ndo_fill_forward_path)
      - support for background radar detection hardware
      - SA Query Procedures offload on the AP side

  New hardware / drivers
  ----------------------

   - tsnep - FPGA based TSN endpoint Ethernet MAC used in PLCs with
     real-time requirements for isochronous communication with protocols
     like OPC UA Pub/Sub.

   - Qualcomm BAM-DMUX WWAN - driver for data channels of modems
     integrated into many older Qualcomm SoCs, e.g. MSM8916 or MSM8974
     (qcom_bam_dmux).

   - Microchip LAN966x multi-port Gigabit AVB/TSN Ethernet Switch driver
     with support for bridging, VLANs and multicast forwarding
     (lan966x).

   - iwlmei driver for co-operating between Intel's WiFi driver and
     Intel's Active Management Technology (AMT) devices.

   - mse102x - Vertexcom MSE102x Homeplug GreenPHY chips

   - Bluetooth:
      - MediaTek MT7921 SDIO devices
      - Foxconn MT7922A
      - Realtek RTL8852AE

  Drivers
  -------

   - Significantly improve performance in the datapaths of: lan78xx,
     ax88179_178a, lantiq_xrx200, bnxt.

   - Intel Ethernet NICs:
      - igb: support PTP/time PEROUT and EXTTS SDP functions on
        82580/i354/i350 adapters
      - ixgbevf: new PF -> VF mailbox API which avoids the risk of
        mailbox corruption with ESXi
      - iavf: support configuration of VLAN features of finer
        granularity, stacked tags and filtering
      - ice: PTP support for new E822 devices with sub-ns precision
      - ice: support firmware activation without reboot

   - Mellanox Ethernet NICs (mlx5):
      - expose control over IRQ coalescing mode (CQE vs EQE) via ethtool
      - support TC forwarding when tunnel encap and decap happen between
        two ports of the same NIC
      - dynamically size and allow disabling various features to save
        resources for running in embedded / SmartNIC scenarios

   - Broadcom Ethernet NICs (bnxt):
      - use page frag allocator to improve Rx performance
      - expose control over IRQ coalescing mode (CQE vs EQE) via ethtool

   - Other Ethernet NICs:
      - amd-xgbe: add Ryzen 6000 (Yellow Carp) Ethernet support

   - Microsoft cloud/virtual NIC (mana):
      - add XDP support (PASS, DROP, TX)

   - Mellanox Ethernet switches (mlxsw):
      - initial support for Spectrum-4 ASICs
      - VxLAN with IPv6 underlay

   - Marvell Ethernet switches (prestera):
      - support flower flow templates
      - add basic IP forwarding support

   - NXP embedded Ethernet switches (ocelot & felix):
      - support Per-Stream Filtering and Policing (PSFP)
      - enable cut-through forwarding between ports by default
      - support FDMA to improve packet Rx/Tx to CPU

   - Other embedded switches:
      - hellcreek: improve trapping management (STP and PTP) packets
      - qca8k: support link aggregation and port mirroring

   - Qualcomm 802.11ax WiFi (ath11k):
      - qca6390, wcn6855: enable 802.11 power save mode in station mode
      - BSS color change support
      - WCN6855 hw2.1 support
      - 11d scan offload support
      - scan MAC address randomization support
      - full monitor mode, only supported on QCN9074
      - qca6390/wcn6855: report signal and tx bitrate
      - qca6390: rfkill support
      - qca6390/wcn6855: regdb.bin support

   - Intel WiFi (iwlwifi):
      - support SAR GEO Offset Mapping (SGOM) and Time-Aware-SAR (TAS)
        in cooperation with the BIOS
      - support for Optimized Connectivity Experience (OCE) scan
      - support firmware API version 68
      - lots of preparatory work for the upcoming Bz device family

   - MediaTek WiFi (mt76):
      - Specific Absorption Rate (SAR) support
      - mt7921: 160 MHz channel support

   - RealTek WiFi (rtw88):
      - Specific Absorption Rate (SAR) support
      - scan offload

   - Other WiFi NICs
      - ath10k: support fetching (pre-)calibration data from nvmem
      - brcmfmac: configure keep-alive packet on suspend
      - wcn36xx: beacon filter support"

* tag '5.17-net-next' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2048 commits)
  tcp: tcp_send_challenge_ack delete useless param `skb`
  net/qla3xxx: Remove useless DMA-32 fallback configuration
  rocker: Remove useless DMA-32 fallback configuration
  hinic: Remove useless DMA-32 fallback configuration
  lan743x: Remove useless DMA-32 fallback configuration
  net: enetc: Remove useless DMA-32 fallback configuration
  cxgb4vf: Remove useless DMA-32 fallback configuration
  cxgb4: Remove useless DMA-32 fallback configuration
  cxgb3: Remove useless DMA-32 fallback configuration
  bnx2x: Remove useless DMA-32 fallback configuration
  et131x: Remove useless DMA-32 fallback configuration
  be2net: Remove useless DMA-32 fallback configuration
  vmxnet3: Remove useless DMA-32 fallback configuration
  bna: Simplify DMA setting
  net: alteon: Simplify DMA setting
  myri10ge: Simplify DMA setting
  qlcnic: Simplify DMA setting
  net: allwinner: Fix print format
  page_pool: remove spinlock in page_pool_refill_alloc_cache()
  amt: fix wrong return type of amt_send_membership_update()
  ...
2022-01-10 19:06:09 -08:00
Catalin Marinas
945409a6ef Merge branches 'for-next/misc', 'for-next/cache-ops-dzp', 'for-next/stacktrace', 'for-next/xor-neon', 'for-next/kasan', 'for-next/armv8_7-fp', 'for-next/atomics', 'for-next/bti', 'for-next/sve', 'for-next/kselftest' and 'for-next/kcsan', remote-tracking branch 'arm64/for-next/perf' into for-next/core
* arm64/for-next/perf: (32 commits)
  arm64: perf: Don't register user access sysctl handler multiple times
  drivers: perf: marvell_cn10k: fix an IS_ERR() vs NULL check
  perf/smmuv3: Fix unused variable warning when CONFIG_OF=n
  arm64: perf: Support new DT compatibles
  arm64: perf: Simplify registration boilerplate
  arm64: perf: Support Denver and Carmel PMUs
  drivers/perf: hisi: Add driver for HiSilicon PCIe PMU
  docs: perf: Add description for HiSilicon PCIe PMU driver
  dt-bindings: perf: Add YAML schemas for Marvell CN10K LLC-TAD pmu bindings
  drivers: perf: Add LLC-TAD perf counter support
  perf/smmuv3: Synthesize IIDR from CoreSight ID registers
  perf/smmuv3: Add devicetree support
  dt-bindings: Add Arm SMMUv3 PMCG binding
  perf/arm-cmn: Add debugfs topology info
  perf/arm-cmn: Add CI-700 Support
  dt-bindings: perf: arm-cmn: Add CI-700
  perf/arm-cmn: Support new IP features
  perf/arm-cmn: Demarcate CMN-600 specifics
  perf/arm-cmn: Move group validation data off-stack
  perf/arm-cmn: Optimise DTC counter accesses
  ...

* for-next/misc:
  : Miscellaneous patches
  arm64: Use correct method to calculate nomap region boundaries
  arm64: Drop outdated links in comments
  arm64: errata: Fix exec handling in erratum 1418040 workaround
  arm64: Unhash early pointer print plus improve comment
  asm-generic: introduce io_stop_wc() and add implementation for ARM64
  arm64: remove __dma_*_area() aliases
  docs/arm64: delete a space from tagged-address-abi
  arm64/fp: Add comments documenting the usage of state restore functions
  arm64: mm: Use asid feature macro for cheanup
  arm64: mm: Rename asid2idx() to ctxid2asid()
  arm64: kexec: reduce calls to page_address()
  arm64: extable: remove unused ex_handler_t definition
  arm64: entry: Use SDEI event constants
  arm64: Simplify checking for populated DT
  arm64/kvm: Fix bitrotted comment for SVE handling in handle_exit.c

* for-next/cache-ops-dzp:
  : Avoid DC instructions when DCZID_EL0.DZP == 1
  arm64: mte: DC {GVA,GZVA} shouldn't be used when DCZID_EL0.DZP == 1
  arm64: clear_page() shouldn't use DC ZVA when DCZID_EL0.DZP == 1

* for-next/stacktrace:
  : Unify the arm64 unwind code
  arm64: Make some stacktrace functions private
  arm64: Make dump_backtrace() use arch_stack_walk()
  arm64: Make profile_pc() use arch_stack_walk()
  arm64: Make return_address() use arch_stack_walk()
  arm64: Make __get_wchan() use arch_stack_walk()
  arm64: Make perf_callchain_kernel() use arch_stack_walk()
  arm64: Mark __switch_to() as __sched
  arm64: Add comment for stack_info::kr_cur
  arch: Make ARCH_STACKWALK independent of STACKTRACE

* for-next/xor-neon:
  : Use SHA3 instructions to speed up XOR
  arm64/xor: use EOR3 instructions when available

* for-next/kasan:
  : Log potential KASAN shadow aliases
  arm64: mm: log potential KASAN shadow alias
  arm64: mm: use die_kernel_fault() in do_mem_abort()

* for-next/armv8_7-fp:
  : Add HWCAPS for ARMv8.7 FEAT_AFP amd FEAT_RPRES
  arm64: cpufeature: add HWCAP for FEAT_RPRES
  arm64: add ID_AA64ISAR2_EL1 sys register
  arm64: cpufeature: add HWCAP for FEAT_AFP

* for-next/atomics:
  : arm64 atomics clean-ups and codegen improvements
  arm64: atomics: lse: define RETURN ops in terms of FETCH ops
  arm64: atomics: lse: improve constraints for simple ops
  arm64: atomics: lse: define ANDs in terms of ANDNOTs
  arm64: atomics lse: define SUBs in terms of ADDs
  arm64: atomics: format whitespace consistently

* for-next/bti:
  : BTI clean-ups
  arm64: Ensure that the 'bti' macro is defined where linkage.h is included
  arm64: Use BTI C directly and unconditionally
  arm64: Unconditionally override SYM_FUNC macros
  arm64: Add macro version of the BTI instruction
  arm64: ftrace: add missing BTIs
  arm64: kexec: use __pa_symbol(empty_zero_page)
  arm64: update PAC description for kernel

* for-next/sve:
  : SVE code clean-ups and refactoring in prepararation of Scalable Matrix Extensions
  arm64/sve: Minor clarification of ABI documentation
  arm64/sve: Generalise vector length configuration prctl() for SME
  arm64/sve: Make sysctl interface for SVE reusable by SME

* for-next/kselftest:
  : arm64 kselftest additions
  kselftest/arm64: Add pidbench for floating point syscall cases
  kselftest/arm64: Add a test program to exercise the syscall ABI
  kselftest/arm64: Allow signal tests to trigger from a function
  kselftest/arm64: Parameterise ptrace vector length information

* for-next/kcsan:
  : Enable KCSAN for arm64
  arm64: Enable KCSAN
2022-01-05 18:14:32 +00:00
Mark Rutland
c2c529b27c arm64: remove __dma_*_area() aliases
The __dma_inv_area() and __dma_clean_area() aliases make cache.S harder
to navigate, but don't gain us anything in practice.

For clarity, let's remove them along with their redundant comments. The
only users are __dma_map_area() and __dma_unmap_area(), which need to be
position independent, and can call __pi_dcache_inval_poc() and
__pi_dcache_clean_poc() directly.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Fuad Tabba <tabba@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Mark Brown <broonie@kernel.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20211206124715.4101571-4-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-12-15 11:19:41 +00:00
Mark Rutland
07b742a4d9 arm64: mm: log potential KASAN shadow alias
When the kernel is built with KASAN_GENERIC or KASAN_SW_TAGS, shadow
memory is allocated and mapped for all legitimate kernel addresses, and
prior to a regular memory access instrumentation will read from the
corresponding shadow address.

Due to the way memory addresses are converted to shadow addresses, bogus
pointers (e.g. NULL) can generate shadow addresses out of the bounds of
allocated shadow memory. For example, with KASAN_GENERIC and 48-bit VAs,
NULL would have a shadow address of dfff800000000000, which falls
between the TTBR ranges.

To make such cases easier to debug, this patch makes die_kernel_fault()
dump the real memory address range for any potential KASAN shadow access
using kasan_non_canonical_hook(), which results in fault information as
below when KASAN is enabled:

| Unable to handle kernel paging request at virtual address dfff800000000017
| KASAN: null-ptr-deref in range [0x00000000000000b8-0x00000000000000bf]
| Mem abort info:
|   ESR = 0x96000004
|   EC = 0x25: DABT (current EL), IL = 32 bits
|   SET = 0, FnV = 0
|   EA = 0, S1PTW = 0
|   FSC = 0x04: level 0 translation fault
| Data abort info:
|   ISV = 0, ISS = 0x00000004
|   CM = 0, WnR = 0
| [dfff800000000017] address between user and kernel address ranges

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Will Deacon <will@kernel.org>
Tested-by: Andrey Konovalov <andreyknvl@gmail.com>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20211207183226.834557-3-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-12-13 18:47:09 +00:00
Mark Rutland
6f6cfa5867 arm64: mm: use die_kernel_fault() in do_mem_abort()
If we take an unhandled fault from EL1, either:

a) The xFSC handler calls die_kernel_fault() directly. In this case,
   die_kernel_fault() calls:

   pr_alert(..., msg, addr);
   mem_abort_decode(esr);
   show_pte(addr);
   die();
   bust_spinlocks(0);
   do_exit(SIGKILL);

b) The xFSC handler returns to do_mem_abort(), indicating failure. In
   this case, do_mem_abort() calls:

   pr_alert(..., addr);
   mem_abort_decode(esr);
   show_pte(addr);
   arm64_notify_die() {
     die();
   }

This inconstency is unfortunatem, and in theory in case (b) registered
notifiers can prevent us from terminating the faulting thread by
returning NOTIFY_STOP, whereupon we'll end up returning from the fault,
replaying, and almost certainly get stuck in a livelock spewing errors
into dmesg. We don't expect notifers to fix things up, since we dump
state to dmesg before invoking them, so it would be more sensible to
consistently terminate the thread in this case.

This patch has do_mem_abort() call die_kernel_fault() for unhandled
faults taken from EL1. Where we would previously have logged a messafe
of the form:

| Unhandled fault at ${ADDR}

... we will now log a message of the form:

| Unable to handle kernel ${FAULT_NAME} at virtual address ${ADDR}

... and we will consistently terminate the thread from which the fault
was taken.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Tested-by: Andrey Konovalov <andreyknvl@gmail.com>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20211207183226.834557-2-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-12-13 18:47:09 +00:00
Eric W. Biederman
0e25498f8c exit: Add and use make_task_dead.
There are two big uses of do_exit.  The first is it's design use to be
the guts of the exit(2) system call.  The second use is to terminate
a task after something catastrophic has happened like a NULL pointer
in kernel code.

Add a function make_task_dead that is initialy exactly the same as
do_exit to cover the cases where do_exit is called to handle
catastrophic failure.  In time this can probably be reduced to just a
light wrapper around do_task_dead. For now keep it exactly the same so
that there will be no behavioral differences introducing this new
concept.

Replace all of the uses of do_exit that use it for catastraphic
task cleanup with make_task_dead to make it clear what the code
is doing.

As part of this rename rewind_stack_do_exit
rewind_stack_and_make_dead.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-12-13 12:04:45 -06:00
Yunfeng Ye
386a74677b arm64: mm: Use asid feature macro for cheanup
The commit 95b54c3e4c ("KVM: arm64: Add feature register flag
definitions") introduce the ID_AA64MMFR0_ASID_8 and ID_AA64MMFR0_ASID_16
macros.

We can use these macros for cheanup in get_cpu_asid_bits().

No functional change.

Signed-off-by: Yunfeng Ye <yeyunfeng@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Link: https://lore.kernel.org/r/f71c75d3-735e-b32a-8414-b3e513c77240@huawei.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-12-10 18:24:20 +00:00
Yunfeng Ye
a3a5b76341 arm64: mm: Rename asid2idx() to ctxid2asid()
The commit 0c8ea531b7 ("arm64: mm: Allocate ASIDs in pairs") introduce
the asid2idx and idx2asid macro, but these macros are not really useful
after the commit f88f42f853 ("arm64: context: Free up kernel ASIDs if
KPTI is not in use").

The code "(asid & ~ASID_MASK)" can be instead by a macro, which is the
same code with asid2idx(). So rename it to ctxid2asid() for a better
understanding.

Also we add asid2ctxid() macro, the contextid can be generated based on
the asid and generation through this macro.

Signed-off-by: Yunfeng Ye <yeyunfeng@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Link: https://lore.kernel.org/r/c31516eb-6d15-94e0-421c-305fc010ea79@huawei.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-12-10 18:24:19 +00:00
Jisheng Zhang
fde046e07d arm64: extable: remove unused ex_handler_t definition
The ex_handler_t type was introduced in commit d6e2cc5647 ("arm64:
extable: add `type` and `data` fields"), but has never been used, and
is unnecessary. Remove it.

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20211119124608.3f03380b@xhacker
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-12-06 16:28:26 +00:00
Jakub Kicinski
a5bdc36354 Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2021-11-15

We've added 72 non-merge commits during the last 13 day(s) which contain
a total of 171 files changed, 2728 insertions(+), 1143 deletions(-).

The main changes are:

1) Add btf_type_tag attributes to bring kernel annotations like __user/__rcu to
   BTF such that BPF verifier will be able to detect misuse, from Yonghong Song.

2) Big batch of libbpf improvements including various fixes, future proofing APIs,
   and adding a unified, OPTS-based bpf_prog_load() low-level API, from Andrii Nakryiko.

3) Add ingress_ifindex to BPF_SK_LOOKUP program type for selectively applying the
   programmable socket lookup logic to packets from a given netdev, from Mark Pashmfouroush.

4) Remove the 128M upper JIT limit for BPF programs on arm64 and add selftest to
   ensure exception handling still works, from Russell King and Alan Maguire.

5) Add a new bpf_find_vma() helper for tracing to map an address to the backing
   file such as shared library, from Song Liu.

6) Batch of various misc fixes to bpftool, fixing a memory leak in BPF program dump,
   updating documentation and bash-completion among others, from Quentin Monnet.

7) Deprecate libbpf bpf_program__get_prog_info_linear() API and migrate its users as
   the API is heavily tailored around perf and is non-generic, from Dave Marchevsky.

8) Enable libbpf's strict mode by default in bpftool and add a --legacy option as an
   opt-out for more relaxed BPF program requirements, from Stanislav Fomichev.

9) Fix bpftool to use libbpf_get_error() to check for errors, from Hengqi Chen.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (72 commits)
  bpftool: Use libbpf_get_error() to check error
  bpftool: Fix mixed indentation in documentation
  bpftool: Update the lists of names for maps and prog-attach types
  bpftool: Fix indent in option lists in the documentation
  bpftool: Remove inclusion of utilities.mak from Makefiles
  bpftool: Fix memory leak in prog_dump()
  selftests/bpf: Fix a tautological-constant-out-of-range-compare compiler warning
  selftests/bpf: Fix an unused-but-set-variable compiler warning
  bpf: Introduce btf_tracing_ids
  bpf: Extend BTF_ID_LIST_GLOBAL with parameter for number of IDs
  bpftool: Enable libbpf's strict mode by default
  docs/bpf: Update documentation for BTF_KIND_TYPE_TAG support
  selftests/bpf: Clarify llvm dependency with btf_tag selftest
  selftests/bpf: Add a C test for btf_type_tag
  selftests/bpf: Rename progs/tag.c to progs/btf_decl_tag.c
  selftests/bpf: Test BTF_KIND_DECL_TAG for deduplication
  selftests/bpf: Add BTF_KIND_TYPE_TAG unit tests
  selftests/bpf: Test libbpf API function btf__add_type_tag()
  bpftool: Support BTF_KIND_TYPE_TAG
  libbpf: Support BTF_KIND_TYPE_TAG
  ...
====================

Link: https://lore.kernel.org/r/20211115162008.25916-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-15 08:49:23 -08:00
Kuan-Ying Lee
b873e98681 kasan: add kasan mode messages when kasan init
There are multiple kasan modes.  It makes sense that we add some
messages to know which kasan mode is active when booting up [1].

Link: https://bugzilla.kernel.org/show_bug.cgi?id=212195 [1]
Link: https://lkml.kernel.org/r/20211020094850.4113-1-Kuan-Ying.Lee@mediatek.com
Signed-off-by: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com>
Reviewed-by: Marco Elver <elver@google.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: Chinwen Chang <chinwen.chang@mediatek.com>
Cc: Yee Lee <yee.lee@mediatek.com>
Cc: Nicholas Tang <nicholas.tang@mediatek.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-11 09:34:35 -08:00
Linus Torvalds
89fa0be0a0 arm64 fixes for -rc1
- Fix double-evaluation of 'pte' macro argument when using 52-bit PAs
 
 - Fix signedness of some MTE prctl PR_* constants
 
 - Fix kmemleak memory usage by skipping early pgtable allocations
 
 - Fix printing of CPU feature register strings
 
 - Remove redundant -nostdlib linker flag for vDSO binaries
 -----BEGIN PGP SIGNATURE-----
 
 iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmGL8ukQHHdpbGxAa2Vy
 bmVsLm9yZwAKCRC3rHDchMFjNB0XCADIFxVeLM3YMPsnDC6ttmGp/w1TXh5RoNBm
 IY5utdu8ybdrdCYO//Z6i7rQiYwPBRbR8Xts1LAfZDVuD/IPKqxiY+wjMe9FYu0U
 lb14dnT5iI7L3l0yo/5s1EC8FC/pfCMwi50+trAS5H5jo2wg/+6B/bgbymzTd1ZB
 vzVVzyGv1L/3xKHT4uEDjY1jMPRBH4Ugx2bJ1tzRzsC3Gz0D7ZeVd1Dg3ftPMsqI
 MA4Q5QeE9MsafcV8sKDRLnzNSYEr50goA0xtCre81VmJDNcm/y6UybUCF75KU72M
 kAOImXs0+wrDqNIocYgYlSWRu9xjw8qjoxjnyqikFx47HBesjY8T
 =pHQk
 -----END PGP SIGNATURE-----

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fixes from Will Deacon:

 - Fix double-evaluation of 'pte' macro argument when using 52-bit PAs

 - Fix signedness of some MTE prctl PR_* constants

 - Fix kmemleak memory usage by skipping early pgtable allocations

 - Fix printing of CPU feature register strings

 - Remove redundant -nostdlib linker flag for vDSO binaries

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: pgtable: make __pte_to_phys/__phys_to_pte_val inline functions
  arm64: Track no early_pgtable_alloc() for kmemleak
  arm64: mte: change PR_MTE_TCF_NONE back into an unsigned long
  arm64: vdso: remove -nostdlib compiler flag
  arm64: arm64_ftr_reg->name may not be a human-readable string
2021-11-10 11:29:30 -08:00
Russell King
b89ddf4cca arm64/bpf: Remove 128MB limit for BPF JIT programs
Commit 91fc957c9b ("arm64/bpf: don't allocate BPF JIT programs in module
memory") restricts BPF JIT program allocation to a 128MB region to ensure
BPF programs are still in branching range of each other. However this
restriction should not apply to the aarch64 JIT, since BPF_JMP | BPF_CALL
are implemented as a 64-bit move into a register and then a BLR instruction -
which has the effect of being able to call anything without proximity
limitation.

The practical reason to relax this restriction on JIT memory is that 128MB of
JIT memory can be quickly exhausted, especially where PAGE_SIZE is 64KB - one
page is needed per program. In cases where seccomp filters are applied to
multiple VMs on VM launch - such filters are classic BPF but converted to
BPF - this can severely limit the number of VMs that can be launched. In a
world where we support BPF JIT always on, turning off the JIT isn't always an
option either.

Fixes: 91fc957c9b ("arm64/bpf: don't allocate BPF JIT programs in module memory")
Suggested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Russell King <russell.king@oracle.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/bpf/1636131046-5982-2-git-send-email-alan.maguire@oracle.com
2021-11-08 22:16:26 +01:00
Qian Cai
c6975d7cab arm64: Track no early_pgtable_alloc() for kmemleak
After switched page size from 64KB to 4KB on several arm64 servers here,
kmemleak starts to run out of early memory pool due to a huge number of
those early_pgtable_alloc() calls:

  kmemleak_alloc_phys()
  memblock_alloc_range_nid()
  memblock_phys_alloc_range()
  early_pgtable_alloc()
  init_pmd()
  alloc_init_pud()
  __create_pgd_mapping()
  __map_memblock()
  paging_init()
  setup_arch()
  start_kernel()

Increased the default value of DEBUG_KMEMLEAK_MEM_POOL_SIZE by 4 times
won't be enough for a server with 200GB+ memory. There isn't much
interesting to check memory leaks for those early page tables and those
early memory mappings should not reference to other memory. Hence, no
kmemleak false positives, and we can safely skip tracking those early
allocations from kmemleak like we did in the commit fed84c7852
("mm/memblock.c: skip kmemleak for kasan_init()") without needing to
introduce complications to automatically scale the value depends on the
runtime memory size etc. After the patch, the default value of
DEBUG_KMEMLEAK_MEM_POOL_SIZE becomes sufficient again.

Signed-off-by: Qian Cai <quic_qiancai@quicinc.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Link: https://lore.kernel.org/r/20211105150509.7826-1-quic_qiancai@quicinc.com
Signed-off-by: Will Deacon <will@kernel.org>
2021-11-08 10:05:22 +00:00
Linus Torvalds
512b7931ad Merge branch 'akpm' (patches from Andrew)
Merge misc updates from Andrew Morton:
 "257 patches.

  Subsystems affected by this patch series: scripts, ocfs2, vfs, and
  mm (slab-generic, slab, slub, kconfig, dax, kasan, debug, pagecache,
  gup, swap, memcg, pagemap, mprotect, mremap, iomap, tracing, vmalloc,
  pagealloc, memory-failure, hugetlb, userfaultfd, vmscan, tools,
  memblock, oom-kill, hugetlbfs, migration, thp, readahead, nommu, ksm,
  vmstat, madvise, memory-hotplug, rmap, zsmalloc, highmem, zram,
  cleanups, kfence, and damon)"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (257 commits)
  mm/damon: remove return value from before_terminate callback
  mm/damon: fix a few spelling mistakes in comments and a pr_debug message
  mm/damon: simplify stop mechanism
  Docs/admin-guide/mm/pagemap: wordsmith page flags descriptions
  Docs/admin-guide/mm/damon/start: simplify the content
  Docs/admin-guide/mm/damon/start: fix a wrong link
  Docs/admin-guide/mm/damon/start: fix wrong example commands
  mm/damon/dbgfs: add adaptive_targets list check before enable monitor_on
  mm/damon: remove unnecessary variable initialization
  Documentation/admin-guide/mm/damon: add a document for DAMON_RECLAIM
  mm/damon: introduce DAMON-based Reclamation (DAMON_RECLAIM)
  selftests/damon: support watermarks
  mm/damon/dbgfs: support watermarks
  mm/damon/schemes: activate schemes based on a watermarks mechanism
  tools/selftests/damon: update for regions prioritization of schemes
  mm/damon/dbgfs: support prioritization weights
  mm/damon/vaddr,paddr: support pageout prioritization
  mm/damon/schemes: prioritize regions within the quotas
  mm/damon/selftests: support schemes quotas
  mm/damon/dbgfs: support quotas of schemes
  ...
2021-11-06 14:08:17 -07:00
Mike Rapoport
3ecc68349b memblock: rename memblock_free to memblock_phys_free
Since memblock_free() operates on a physical range, make its name
reflect it and rename it to memblock_phys_free(), so it will be a
logical counterpart to memblock_phys_alloc().

The callers are updated with the below semantic patch:

    @@
    expression addr;
    expression size;
    @@
    - memblock_free(addr, size);
    + memblock_phys_free(addr, size);

Link: https://lkml.kernel.org/r/20210930185031.18648-6-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Juergen Gross <jgross@suse.com>
Cc: Shahab Vahedi <Shahab.Vahedi@synopsys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 13:30:41 -07:00
Kefeng Wang
3252b1d830 kasan: arm64: fix pcpu_page_first_chunk crash with KASAN_VMALLOC
With KASAN_VMALLOC and NEED_PER_CPU_PAGE_FIRST_CHUNK the kernel crashes:

  Unable to handle kernel paging request at virtual address ffff7000028f2000
  ...
  swapper pgtable: 64k pages, 48-bit VAs, pgdp=0000000042440000
  [ffff7000028f2000] pgd=000000063e7c0003, p4d=000000063e7c0003, pud=000000063e7c0003, pmd=000000063e7b0003, pte=0000000000000000
  Internal error: Oops: 96000007 [#1] PREEMPT SMP
  Modules linked in:
  CPU: 0 PID: 0 Comm: swapper Not tainted 5.13.0-rc4-00003-gc6e6e28f3f30-dirty #62
  Hardware name: linux,dummy-virt (DT)
  pstate: 200000c5 (nzCv daIF -PAN -UAO -TCO BTYPE=--)
  pc : kasan_check_range+0x90/0x1a0
  lr : memcpy+0x88/0xf4
  sp : ffff80001378fe20
  ...
  Call trace:
   kasan_check_range+0x90/0x1a0
   pcpu_page_first_chunk+0x3f0/0x568
   setup_per_cpu_areas+0xb8/0x184
   start_kernel+0x8c/0x328

The vm area used in vm_area_register_early() has no kasan shadow memory,
Let's add a new kasan_populate_early_vm_area_shadow() function to
populate the vm area shadow memory to fix the issue.

[wangkefeng.wang@huawei.com: fix redefinition of 'kasan_populate_early_vm_area_shadow']
  Link: https://lkml.kernel.org/r/20211011123211.3936196-1-wangkefeng.wang@huawei.com

Link: https://lkml.kernel.org/r/20210910053354.26721-4-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: Marco Elver <elver@google.com>		[KASAN]
Acked-by: Andrey Konovalov <andreyknvl@gmail.com>	[KASAN]
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 13:30:37 -07:00
Will Deacon
e6359798f6 Merge branch 'for-next/fixes' into for-next/core
Merge for-next/fixes to resolve conflicts in arm64_hugetlb_cma_reserve().

* for-next/fixes:
  acpi/arm64: fix next_platform_timer() section mismatch error
  arm64/hugetlb: fix CMA gigantic page order for non-4K PAGE_SIZE
2021-10-29 12:27:53 +01:00
Will Deacon
16c200e040 Merge branch 'for-next/pfn-valid' into for-next/core
* for-next/pfn-valid:
  arm64/mm: drop HAVE_ARCH_PFN_VALID
  dma-mapping: remove bogus test for pfn_valid from dma_map_resource
2021-10-29 12:25:19 +01:00
Will Deacon
dc6bab18fb Merge branch 'for-next/mm' into for-next/core
* for-next/mm:
  arm64: mm: update max_pfn after memory hotplug
  arm64/mm: Add pud_sect_supported()
  arm64: mm: Drop pointless call to set_max_mapnr()
2021-10-29 12:25:04 +01:00
Will Deacon
d8a2c0fba5 Merge branch 'for-next/kexec' into for-next/core
* for-next/kexec:
  arm64: trans_pgd: remove trans_pgd_map_page()
  arm64: kexec: remove cpu-reset.h
  arm64: kexec: remove the pre-kexec PoC maintenance
  arm64: kexec: keep MMU enabled during kexec relocation
  arm64: kexec: install a copy of the linear-map
  arm64: kexec: use ld script for relocation function
  arm64: kexec: relocate in EL1 mode
  arm64: kexec: configure EL2 vectors for kexec
  arm64: kexec: pass kimage as the only argument to relocation function
  arm64: kexec: Use dcache ops macros instead of open-coding
  arm64: kexec: skip relocation code for inplace kexec
  arm64: kexec: flush image and lists during kexec load time
  arm64: hibernate: abstract ttrb0 setup function
  arm64: trans_pgd: hibernate: Add trans_pgd_copy_el2_vectors
  arm64: kernel: add helper for booted at EL2 and not VHE
2021-10-29 12:24:47 +01:00
Mark Rutland
753b323687 arm64: extable: add load_unaligned_zeropad() handler
For inline assembly, we place exception fixups out-of-line in the
`.fixup` section such that these are out of the way of the fast path.
This has a few drawbacks:

* Since the fixup code is anonymous, backtraces will symbolize fixups as
  offsets from the nearest prior symbol, currently
  `__entry_tramp_text_end`. This is confusing, and painful to debug
  without access to the relevant vmlinux.

* Since the exception handler adjusts the PC to execute the fixup, and
  the fixup uses a direct branch back into the function it fixes,
  backtraces of fixups miss the original function. This is confusing,
  and violates requirements for RELIABLE_STACKTRACE (and therefore
  LIVEPATCH).

* Inline assembly and associated fixups are generated from templates,
  and we have many copies of logically identical fixups which only
  differ in which specific registers are written to and which address is
  branched to at the end of the fixup. This is potentially wasteful of
  I-cache resources, and makes it hard to add additional logic to fixups
  without significant bloat.

* In the case of load_unaligned_zeropad(), the logic in the fixup
  requires a temporary register that we must allocate even in the
  fast-path where it will not be used.

This patch address all four concerns for load_unaligned_zeropad() fixups
by adding a dedicated exception handler which performs the fixup logic
in exception context and subsequent returns back after the faulting
instruction. For the moment, the fixup logic is identical to the old
assembly fixup logic, but in future we could enhance this by taking the
ESR and FAR into account to constrain the faults we try to fix up, or to
specialize fixups for MTE tag check faults.

Other than backtracing, there should be no functional change as a result
of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20211019160219.5202-13-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2021-10-21 10:45:22 +01:00
Mark Rutland
2e77a62cb3 arm64: extable: add a dedicated uaccess handler
For inline assembly, we place exception fixups out-of-line in the
`.fixup` section such that these are out of the way of the fast path.
This has a few drawbacks:

* Since the fixup code is anonymous, backtraces will symbolize fixups as
  offsets from the nearest prior symbol, currently
  `__entry_tramp_text_end`. This is confusing, and painful to debug
  without access to the relevant vmlinux.

* Since the exception handler adjusts the PC to execute the fixup, and
  the fixup uses a direct branch back into the function it fixes,
  backtraces of fixups miss the original function. This is confusing,
  and violates requirements for RELIABLE_STACKTRACE (and therefore
  LIVEPATCH).

* Inline assembly and associated fixups are generated from templates,
  and we have many copies of logically identical fixups which only
  differ in which specific registers are written to and which address is
  branched to at the end of the fixup. This is potentially wasteful of
  I-cache resources, and makes it hard to add additional logic to fixups
  without significant bloat.

This patch address all three concerns for inline uaccess fixups by
adding a dedicated exception handler which updates registers in
exception context and subsequent returns back into the function which
faulted, removing the need for fixups specialized to each faulting
instruction.

Other than backtracing, there should be no functional change as a result
of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20211019160219.5202-12-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2021-10-21 10:45:22 +01:00
Mark Rutland
d6e2cc5647 arm64: extable: add type and data fields
Subsequent patches will add specialized handlers for fixups, in addition
to the simple PC fixup and BPF handlers we have today. In preparation,
this patch adds a new `type` field to struct exception_table_entry, and
uses this to distinguish the fixup and BPF cases. A `data` field is also
added so that subsequent patches can associate data specific to each
exception site (e.g. register numbers).

Handlers are named ex_handler_*() for consistency, following the exmaple
of x86. At the same time, get_ex_fixup() is split out into a helper so
that it can be used by other ex_handler_*() functions ins subsequent
patches.

This patch will increase the size of the exception tables, which will be
remedied by subsequent patches removing redundant fixup code. There
should be no functional change as a result of this patch.

Since each entry is now 12 bytes in size, we must reduce the alignment
of each entry from `.align 3` (i.e. 8 bytes) to `.align 2` (i.e. 4
bytes), which is the natrual alignment of the `insn` and `fixup` fields.
The current 8-byte alignment is a holdover from when the `insn` and
`fixup` fields was 8 bytes, and while not harmful has not been necessary
since commit:

  6c94f27ac8 ("arm64: switch to relative exception tables")

Similarly, RO_EXCEPTION_TABLE_ALIGN is dropped to 4 bytes.

Concurrently with this patch, x86's exception table entry format is
being updated (similarly to a 12-byte format, with 32-bytes of absolute
data). Once both have been merged it should be possible to unify the
sorttable logic for the two.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: James Morse <james.morse@arm.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20211019160219.5202-11-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2021-10-21 10:45:22 +01:00
Mark Rutland
5d0e790514 arm64: extable: use ex for exception_table_entry
Subsequent patches will extend `struct exception_table_entry` with more
fields, and the distinction between the entry and its `fixup` field will
become more important.

For clarity, let's consistently use `ex` to refer to refer to an entire
entry. In subsequent patches we'll use `fixup` to refer to the fixup
field specifically. This matches the naming convention used today in
arch/arm64/net/bpf_jit_comp.c.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20211019160219.5202-10-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2021-10-21 10:45:22 +01:00
Mark Rutland
e8c328d7de arm64: extable: make fixup_exception() return bool
The return values of fixup_exception() and arm64_bpf_fixup_exception()
represent a boolean condition rather than an error code, so for clarity
it would be better to return `bool` rather than `int`.

This patch adjusts the code accordingly. While we're modifying the
prototype, we also remove the unnecessary `extern` keyword, so that this
won't look out of place when we make subsequent additions to the header.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: James Morse <james.morse@arm.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20211019160219.5202-9-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2021-10-21 10:45:22 +01:00
Mike Kravetz
2e5809a4dd arm64/hugetlb: fix CMA gigantic page order for non-4K PAGE_SIZE
For non-4K PAGE_SIZE configs, the largest gigantic huge page size is
CONT_PMD_SHIFT order. On arm64 with 64K PAGE_SIZE, the gigantic page is
16G. Therefore, one should be able to specify 'hugetlb_cma=16G' on the
kernel command line so that one gigantic page can be allocated from CMA.
However, when adding such an option the following message is produced:

hugetlb_cma: cma area should be at least 8796093022208 MiB

This is because the calculation for non-4K gigantic page order is
incorrect in the arm64 specific routine arm64_hugetlb_cma_reserve().

Fixes: abb7962adc ("arm64/hugetlb: Reserve CMA areas for gigantic pages on 16K and 64K configs")
Cc: <stable@vger.kernel.org> # 5.9.x
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20211005202529.213812-1-mike.kravetz@oracle.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-10-11 18:45:19 +01:00
Anshuman Khandual
3de360c3fd arm64/mm: drop HAVE_ARCH_PFN_VALID
CONFIG_SPARSEMEM_VMEMMAP is now the only available memory model on arm64
platforms and free_unused_memmap() would just return without creating any
holes in the memmap mapping.  There is no need for any special handling in
pfn_valid() and HAVE_ARCH_PFN_VALID can just be dropped.  This also moves
the pfn upper bits sanity check into generic pfn_valid().

[rppt: rebased on v5.15-rc3]

Link: https://lkml.kernel.org/r/1621947349-25421-1-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Mike Rapoport <rppt@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Link: https://lore.kernel.org/r/20210930013039.11260-3-rppt@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2021-10-01 14:54:45 +01:00
Pasha Tatashin
6091dd9eaf arm64: trans_pgd: remove trans_pgd_map_page()
The intend of trans_pgd_map_page() was to map contiguous range of VA
memory to the memory that is getting relocated during kexec. However,
since we are now using linear map instead of contiguous range this
function is not needed

Suggested-by: Pingfan Liu <kernelfans@gmail.com>
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20210930143113.1502553-16-pasha.tatashin@soleen.com
Signed-off-by: Will Deacon <will@kernel.org>
2021-10-01 13:31:01 +01:00
Pasha Tatashin
08eae0ef61 arm64: kexec: configure EL2 vectors for kexec
If we have a EL2 mode without VHE, the EL2 vectors are needed in order
to switch to EL2 and jump to new world with hypervisor privileges.

In preparation to MMU enabled relocation, configure our EL2 table now.

Kexec uses #HVC_SOFT_RESTART to branch to the new world, so extend
el1_sync vector that is provided by trans_pgd_copy_el2_vectors() to
support this case.

Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20210930143113.1502553-9-pasha.tatashin@soleen.com
Signed-off-by: Will Deacon <will@kernel.org>
2021-10-01 13:31:00 +01:00
Pasha Tatashin
788bfdd974 arm64: trans_pgd: hibernate: Add trans_pgd_copy_el2_vectors
Users of trans_pgd may also need a copy of vector table because it is
also may be overwritten if a linear map can be overwritten.

Move setup of EL2 vectors from hibernate to trans_pgd, so it can be
later shared with kexec as well.

Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20210930143113.1502553-3-pasha.tatashin@soleen.com
Signed-off-by: Will Deacon <will@kernel.org>
2021-10-01 13:30:59 +01:00
Sudarshan Rajagopalan
8fac67ca23 arm64: mm: update max_pfn after memory hotplug
After new memory blocks have been hotplugged, max_pfn and max_low_pfn
needs updating to reflect on new PFNs being hot added to system.
Without this patch, debug-related functions that use max_pfn such as
get_max_dump_pfn() or read_page_owner() will not work with any page in
memory that is hot-added after boot.

Fixes: 4ab2150615 ("arm64: Add memory hotplug support")
Signed-off-by: Sudarshan Rajagopalan <quic_sudaraja@quicinc.com>
Signed-off-by: Chris Goldsworthy <quic_cgoldswo@quicinc.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Georgi Djakov <quic_c_gdjako@quicinc.com>
Tested-by: Georgi Djakov <quic_c_gdjako@quicinc.com>
Link: https://lore.kernel.org/r/a51a27ee7be66024b5ce626310d673f24107bcb8.1632853776.git.quic_cgoldswo@quicinc.com
Signed-off-by: Will Deacon <will@kernel.org>
2021-09-29 17:11:06 +01:00
Anshuman Khandual
f8b46c4b51 arm64/mm: Add pud_sect_supported()
Section mapping at PUD level is supported only on 4K pages and currently it
gets verified with explicit #ifdef or IS_ENABLED() constructs. This adds a
new helper pud_sect_supported() for this purpose, which particularly cleans
up the HugeTLB code path. It updates relevant switch statements with checks
for __PAGETABLE_PMD_FOLDED in order to avoid build failures caused with two
identical switch case values in those code blocks.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Suggested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/1632130171-472-1-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2021-09-29 16:54:33 +01:00