Commit Graph

80 Commits

Author SHA1 Message Date
Robin Murphy
860a831de1 perf/arm: Add missing .suppress_bind_attrs
PMU drivers should set .suppress_bind_attrs so that userspace is denied
the opportunity to pull the driver out from underneath an in-use PMU
(with predictably unpleasant consequences). Somehow both the CMN and NI
drivers have managed to miss this; put that right.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Leo Yan <leo.yan@arm.com>
Link: https://lore.kernel.org/r/acd48c341b33b96804a3969ee00b355d40c546e2.1751465293.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2025-07-04 18:07:26 +01:00
Robin Murphy
a7bfae2145 perf/arm-cmn: Reduce stack usage during discovery
Arnd reports that Clang's aggressive inlining of arm_cmn_discover() can
lead to stack frame size warnings, and while we could simply prevent
such inlining to hide the issue, it seems more productive to actually
heed the warning and do something about the overall stack footprint.
The xp_region array is already rather large, and CMN_MAX_XPS might only
grow larger in future, however it only serves as a convenience to save
repeating the first level's worth of register reads in the second pass
of discovery. There's no performance concern here, and it only takes a
small tweak to the flow to re-extract the offsets instead of stashing
them, so let's just do that and save several hundred bytes of stack.

Reported-by: Arnd Bergmann <arnd@kernel.org>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-and-tested-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Link: https://lore.kernel.org/r/e7dd41bf0f1b098e2e4b01ef91318a4b272abff8.1751046159.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2025-07-04 18:06:55 +01:00
Zhiyuan Dai
8b177e9a4e perf/arm-cmn: Broaden module description for wider interconnect support
The current MODULE_DESCRIPTION only mentions CMN-600, but this driver
now supports several Arm mesh interconnects including CMN-650, CMN-700,
CI-700, and CMN-S3.

Update the MODULE_DESCRIPTION to reflect the expanded scope.

Signed-off-by: Zhiyuan Dai <daizhiyuan@phytium.com.cn>
Link: https://lore.kernel.org/r/20250522032122.949373-1-daizhiyuan@phytium.com.cn
Signed-off-by: Will Deacon <will@kernel.org>
2025-07-04 18:05:07 +01:00
Robin Murphy
8c138a189f perf/arm-cmn: Add CMN S3 ACPI binding
An ACPI binding for CMN S3 was not yet finalised when the driver support
was originally written, but v1.2 of DEN0093 "ACPI for Arm Components"
has at last been published; support ACPI systems using the proper HID.

Cc: stable@vger.kernel.org
Fixes: 0dc2f4963f ("perf/arm-cmn: Support CMN S3")
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/7dafe147f186423020af49d7037552ee59c60e97.1747652164.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2025-05-19 12:31:54 +01:00
Robin Murphy
597704e201 perf/arm-cmn: Initialise cmn->cpu earlier
For all the complexity of handling affinity for CPU hotplug, what we've
apparently managed to overlook is that arm_cmn_init_irqs() has in fact
always been setting the *initial* affinity of all IRQs to CPU 0, not the
CPU we subsequently choose for event scheduling. Oh dear.

Cc: stable@vger.kernel.org
Fixes: 0ba64770a2 ("perf: Add Arm CMN-600 PMU driver")
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Link: https://lore.kernel.org/r/b12fccba6b5b4d2674944f59e4daad91cd63420b.1747069914.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2025-05-16 15:18:58 +01:00
Robin Murphy
11b0f576e0 perf/arm-cmn: Fix REQ2/SNP2 mixup
Somehow the encodings for REQ2/SNP2 channels in XP events
got mixed up... Unmix them.

CC: stable@vger.kernel.org
Fixes: 23760a0144 ("perf/arm-cmn: Add CMN-700 support")
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/087023e9737ac93d7ec7a841da904758c254cb01.1746717400.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2025-05-09 12:15:31 +01:00
Robin Murphy
674cd77402 perf/arm-cmn: Remove CMN-600 DTC domain special case
The special case for trying to infer the DTC domain for DTC-adjacent
nodes on CMN-600 is fragile and buggy - currently resulting in subtly
messed up DTC counter allocation - and the theoretical benefit it
offers to a tiny minority of use-cases arguably doesn't outweigh the
inconsistency it offers to others anyway. Just get rid of it.

Fixes: ab33c66fd8 ("perf/arm-cmn: Enable per-DTC counter allocation")
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/67985e39f53b56385d79a4f1264cf7f9cacedb58.1742308248.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2025-04-17 14:08:10 +01:00
Robin Murphy
678a5d3d6d perf/arm-cmn: Minor event type housekeeping
While handling RN-D nodes under the functionally-identical RN-I type
works fine for perf tool users using the "rnid_" event aliases, and that
is the documented and expected ABI, there's little reason not to be
permissive and accept the actual RN-D type as an additional encoding for
the same events as well. This may be convenient for other tooling
generating event configs directly from its own topology data.

In the RN-I event mood, it also seems as good a time as any to clean up
a forgotten macro for CCLA_RNI events which ended up being unnecessary.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Link: https://lore.kernel.org/r/ef46a47fc4ab909093f14b2b4289a4835836ab6c.1738851844.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2025-03-01 05:18:40 +00:00
Robin Murphy
e49ecdf79a perf/arm-cmn: Permit more exhaustive groups
The group validation logic still somewhat assumes the original CMN-600
case of events counting globally, such that if one tries to group 9
events where the first 8 target a single DTC domain, the 9th will be
rejected because *a* DTC domain is full, even though it might only
target other non-overlapping domains and thus still be schedulable.
Improve matters by only counting the DTCs that the new event actually
needs (as arm_cmn_val_add_event() was already clever enough to do).

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-and-tested-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Link: https://lore.kernel.org/r/bdfd1e58dac449e407c5cacfd6bf8577dc0a5899.1733943898.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-12-19 15:33:42 +00:00
Namhyung Kim
dfdf714fed perf/arm-cmn: Ensure port and device id bits are set properly
The portid_bits and deviceid_bits were set only for XP type nodes in
the arm_cmn_discover() and it confused other nodes to find XP nodes.
Copy the both bits from the XP nodes directly when it sets up a new
node.

Fixes: e79634b53e ("perf/arm-cmn: Refactor node ID handling. Again.")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/20241121001334.331334-1-namhyung@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-11-25 18:53:05 +00:00
Uwe Kleine-König
845fd2cbed perf: Switch back to struct platform_driver::remove()
After commit 0edb555a65 ("platform: Make platform_driver::remove()
return void") .remove() is (again) the right callback to implement for
platform drivers.

Convert all platform drivers below drivers/perf to use .remove(), with
the eventual goal to drop struct platform_driver::remove_new(). As
.remove() and .remove_new() have the same prototypes, conversion is done
by just changing the structure member name in the driver initializer.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
Link: https://lore.kernel.org/r/20241027180313.410964-2-u.kleine-koenig@baylibre.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-11-06 14:16:20 +00:00
Robin Murphy
f32efa3e4b perf/arm-cmn: Improve format attr printing
Take full advantage of our formats being stored in bitfield form, and
make the printing even more robust and simple by letting printk do all
the hard work of formatting bitlists.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Link: https://lore.kernel.org/r/50459f2d48fc62310a566863dbf8a7c14361d363.1725474584.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-09-06 12:58:06 +01:00
Robin Murphy
f04b611e66 perf/arm-cmn: Clean up unnecessary NUMA_NO_NODE check
Checking for NUMA_NO_NODE is a misleading and, on reflection, entirely
unnecessary micro-optimisation. If it ever did happen that an incoming
CPU has no NUMA affinity while the current CPU does, a questionably-
useful PMU migration isn't the biggest thing wrong with that picture...

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/00634da33c21269a00844140afc7cc3a2ac1eb4d.1725474584.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-09-06 12:58:06 +01:00
Robin Murphy
0dc2f4963f perf/arm-cmn: Support CMN S3
CMN S3 is the latest and greatest evolution for 2024, although most of
the new features don't impact the PMU, so from our point of view it ends
up looking a lot like CMN-700 r3 still. We have some new device types to
ignore, a mildly irritating rearrangement of the register layouts, and a
scary new configuration option that makes it potentially unsafe to even
walk the full discovery tree, let alone attempt to use the PMU.

Acked-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/2ec9eec5b6bf215a9886f3b69e3b00e4cd85095c.1725296395.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-09-04 16:04:08 +01:00
Robin Murphy
67acca3504 perf/arm-cmn: Refactor DTC PMU register access
Annoyingly, we're soon going to have to cope with PMU registers moving
about. This will mostly be straightforward, except for the hard-coding
of CMN_PMU_OFFSET for the DTC PMU registers. As a first step, refactor
those accessors to allow for encapsulating a variable offset without
making a big mess all over. As a bonus, we can repack the arm_cmn_dtc
structure to accommodate the new pointer without growing any larger,
since irq_friend only encodes a range of +/-3.

Acked-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/fc677576fae7b5b55780e5b245a4ef6ea1b30daf.1725296395.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-09-04 16:04:08 +01:00
Robin Murphy
c5b15ddf11 perf/arm-cmn: Make cycle counts less surprising
By default, CMN has automatic clock-gating with the implication that
a DTC's cycle counter may not increment while the DTC is sufficiently
idle. Given that we may have up to 4 DTCs to choose from when scheduling
a cycles event, this may potentially lead to surprising results if
trying to measure metrics based on activity in a different DTC domain
from where cycles end up being counted. Furthermore, since the details
of internal clock gating are not documented, we can't even reason about
what "active" cycles for a DTC actually mean relative to the activity of
other nodes within the same nominal DTC domain.

Make the reasonable assumption that if the user wants to count cycles,
they almost certainly want to count all of the cycles, and disable clock
gating while a DTC's cycle counter is in use.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/c47cfdc09e907b1d7753d142a7e659982cceb246.1725296395.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-09-04 16:04:08 +01:00
Robin Murphy
ff436cee69 perf/arm-cmn: Improve build-time assertion
These days we can use static_assert() in the logical place rather than
jamming a BUILD_BUG_ON() into the nearest function scope.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/224ee8286f299100f1c768edb254edc898539f50.1725296395.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-09-04 16:04:08 +01:00
Robin Murphy
359414b33e perf/arm-cmn: Ensure dtm_idx is big enough
While CMN_MAX_DIMENSION was bumped to 12 for CMN-650, that only supports
up to a 10x10 mesh, so bumping dtm_idx to 256 bits at the time worked
out OK in practice. However CMN-700 did finally support up to 144 XPs,
and thus needs a worst-case 288 bits of dtm_idx for an aggregated XP
event on a maxed-out config. Oops.

Fixes: 23760a0144 ("perf/arm-cmn: Add CMN-700 support")
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/e771b358526a0d7fc06efee2c3a2fdc0c9f51d44.1725296395.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-09-04 16:04:08 +01:00
Robin Murphy
88b63a82c8 perf/arm-cmn: Fix CCLA register offset
Apparently pmu_event_sel is offset by 8 for all CCLA nodes, not just
the CCLA_RNI combination type.

Fixes: 23760a0144 ("perf/arm-cmn: Add CMN-700 support")
Acked-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/6e7bb06fef6046f83e7647aad0e5be544139763f.1725296395.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-09-04 16:04:07 +01:00
Robin Murphy
e79634b53e perf/arm-cmn: Refactor node ID handling. Again.
The scope of the "extra device ports" configuration is not made clear by
the CMN documentation - so far we've assumed it applies globally, based
on the sole example which suggests as much. However it transpires that
this is incorrect, and the format does in fact vary based on each
individual XP's port configuration. As a consequence, we're currenly
liable to decode the port/device indices from a node ID incorrectly,
thus program the wrong event source in the DTM leading to bogus event
counts, and also show device topology on the wrong ports in debugfs.

To put this right, rework node IDs yet again to carry around the
additional data necessary to decode them properly per-XP. At this point
the notion of fully decomposing an ID becomes more impractical than it's
worth, so unabstracting the XY mesh coordinates (where 2/3 users were
just debug anyway) ends up leaving things a bit simpler overall.

Fixes: 60d1504070 ("perf/arm-cmn: Support new IP features")
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/5195f990152fc37adba5fbf5929a6b11063d9f09.1725296395.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-09-04 16:04:07 +01:00
Ilkka Koskinen
f9a7a91f64 perf/arm-cmn: Enable support for tertiary match group
Add support for tertiary match group.

Signed-off-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Link: https://lore.kernel.org/r/20240618005056.3092866-3-ilkka@os.amperecomputing.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-07-01 14:55:08 +01:00
Ilkka Koskinen
4a112585eb perf/arm-cmn: Decouple wp_config registers from filter group number
Previously, wp_config0/2 registers were used for primary match group and
wp_config1/3 registers for secondary match group. In order to support
tertiary match group, this patch decouples the registers and the groups.

Signed-off-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Link: https://lore.kernel.org/r/20240618005056.3092866-2-ilkka@os.amperecomputing.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-07-01 14:55:08 +01:00
Robin Murphy
8f9f5041c6 perf/arm-cmn: Set PMU device parent
Now that perf supports giving the PMU device a parent, we can use our
platform device to make the relationship between CMN instances and PMU
IDs trivially discoverable, from either nominal direction:

root@crazy-taxi:~# ls /sys/devices/platform/ARMHC600:00 | grep cmn
arm_cmn_0
root@crazy-taxi:~# realpath /sys/bus/event_source/devices/arm_cmn_0/..
/sys/devices/platform/ARMHC600:00

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/25d4428df1ddad966c74a3ed60171cd3ca6c8b66.1712682917.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-04-10 16:48:17 +01:00
Dawei Li
60c73240f3 perf/arm-cmn: Avoid placing cpumask on the stack
In general it's preferable to avoid placing cpumasks on the stack, as
for large values of NR_CPUS these can consume significant amounts of
stack space and make stack overflows more likely.

Use cpumask_any_and_but() to avoid the need for a temporary cpumask on
the stack.

Suggested-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Dawei Li <dawei.li@shingroup.cn>
Link: https://lore.kernel.org/r/20240403155950.2068109-4-dawei.li@shingroup.cn
Signed-off-by: Will Deacon <will@kernel.org>
2024-04-09 16:47:15 +01:00
Linus Torvalds
6d75c6f40a arm64 updates for 6.9:
* Reorganise the arm64 kernel VA space and add support for LPA2 (at
   stage 1, KVM stage 2 was merged earlier) - 52-bit VA/PA address range
   with 4KB and 16KB pages
 
 * Enable Rust on arm64
 
 * Support for the 2023 dpISA extensions (data processing ISA), host only
 
 * arm64 perf updates:
 
   - StarFive's StarLink (integrates one or more CPU cores with a shared
     L3 memory system) PMU support
 
   - Enable HiSilicon Erratum 162700402 quirk for HIP09
 
   - Several updates for the HiSilicon PCIe PMU driver
 
   - Arm CoreSight PMU support
 
   - Convert all drivers under drivers/perf/ to use .remove_new()
 
 * Miscellaneous:
 
   - Don't enable workarounds for "rare" errata by default
 
   - Clean up the DAIF flags handling for EL0 returns (in preparation for
     NMI support)
 
   - Kselftest update for ptrace()
 
   - Update some of the sysreg field definitions
 
   - Slight improvement in the code generation for inline asm I/O
     accessors to permit offset addressing
 
   - kretprobes: acquire regs via a BRK exception (previously done via a
     trampoline handler)
 
   - SVE/SME cleanups, comment updates
 
   - Allow CALL_OPS+CC_OPTIMIZE_FOR_SIZE with clang (previously disabled
     due to gcc silently ignoring -falign-functions=N)
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmXxiSgACgkQa9axLQDI
 XvHd7hAAjQrQqxJogPT2ahM5/gxct8qTrXpIgX0B1Y7bb5R8ztvOUN9MJNuDyRsj
 0s28SSZw387LReM5OUu+U6G/iahcuNAyP/8d9qeac32Tidd255fV3KPEh4C4eC+u
 0HeOqLBZ+stmNoa71tBC2K6SmchizhYyYduvRnri8km8K4OMDawHWqWRTXl0PNRT
 RMVJvZTDJMPfMBFeD4+B7EnSFOoP14tKCw9MZvlbpT2PEV0kINjhCQiojW2jJgqv
 w36vm/dhwsg1avSzT1xhy3KE+m+7n28+IC/wr1HB7c1WumvYKv7Z84ieCp3PlO3Z
 owvVO7dKJC6X3RkoY6Kge5p2RHU6poDerDVHYiAvG+Zi57nrDmHyAubskThsGTGR
 AibSEeJ5nQ0yM6hx7zAIQa5XEo4l0svD1ZM7NynY+5JR44W9cdAH3SnEsvIBMGIf
 /ja+iZ1W4ZQnIESQXD5uDPSxILfqQ8Ebhdorpw+Qg3rB7OhdTdGSSGQCi6V2PcJH
 d/ErFO+i0lFRBPJtBbUAN4EEu3HJcVYEoEnVJYQahC+6KyNGLxO+7L6sH0YO7Pag
 P1LRa6h8ktuBMrbCrOPWdmJYNDYCbb5rRtmcCwO0ItZ4g5tYWp9djFc8pyctCaNB
 MZxxRrUCNwXTOcFTDiYzyk+JCvpf3EvXfvj8AH+P8BMjFWgqHqw=
 =KTD/
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Catalin Marinas:
 "The major features are support for LPA2 (52-bit VA/PA with 4K and 16K
  pages), the dpISA extension and Rust enabled on arm64. The changes are
  mostly contained within the usual arch/arm64/, drivers/perf, the arm64
  Documentation and kselftests. The exception is the Rust support which
  touches some generic build files.

  Summary:

   - Reorganise the arm64 kernel VA space and add support for LPA2 (at
     stage 1, KVM stage 2 was merged earlier) - 52-bit VA/PA address
     range with 4KB and 16KB pages

   - Enable Rust on arm64

   - Support for the 2023 dpISA extensions (data processing ISA), host
     only

   - arm64 perf updates:

      - StarFive's StarLink (integrates one or more CPU cores with a
        shared L3 memory system) PMU support

      - Enable HiSilicon Erratum 162700402 quirk for HIP09

      - Several updates for the HiSilicon PCIe PMU driver

      - Arm CoreSight PMU support

      - Convert all drivers under drivers/perf/ to use .remove_new()

   - Miscellaneous:

      - Don't enable workarounds for "rare" errata by default

      - Clean up the DAIF flags handling for EL0 returns (in preparation
        for NMI support)

      - Kselftest update for ptrace()

      - Update some of the sysreg field definitions

      - Slight improvement in the code generation for inline asm I/O
        accessors to permit offset addressing

      - kretprobes: acquire regs via a BRK exception (previously done
        via a trampoline handler)

      - SVE/SME cleanups, comment updates

      - Allow CALL_OPS+CC_OPTIMIZE_FOR_SIZE with clang (previously
        disabled due to gcc silently ignoring -falign-functions=N)"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (134 commits)
  Revert "mm: add arch hook to validate mmap() prot flags"
  Revert "arm64: mm: add support for WXN memory translation attribute"
  Revert "ARM64: Dynamically allocate cpumasks and increase supported CPUs to 512"
  ARM64: Dynamically allocate cpumasks and increase supported CPUs to 512
  kselftest/arm64: Add 2023 DPISA hwcap test coverage
  kselftest/arm64: Add basic FPMR test
  kselftest/arm64: Handle FPMR context in generic signal frame parser
  arm64/hwcap: Define hwcaps for 2023 DPISA features
  arm64/ptrace: Expose FPMR via ptrace
  arm64/signal: Add FPMR signal handling
  arm64/fpsimd: Support FEAT_FPMR
  arm64/fpsimd: Enable host kernel access to FPMR
  arm64/cpufeature: Hook new identification registers up to cpufeature
  docs: perf: Fix build warning of hisi-pcie-pmu.rst
  perf: starfive: Only allow COMPILE_TEST for 64-bit architectures
  MAINTAINERS: Add entry for StarFive StarLink PMU
  docs: perf: Add description for StarFive's StarLink PMU
  dt-bindings: perf: starfive: Add JH8100 StarLink PMU
  perf: starfive: Add StarLink PMU support
  docs: perf: Update usage for target filter of hisi-pcie-pmu
  ...
2024-03-14 15:35:42 -07:00
Ilkka Koskinen
50572064ec perf/arm-cmn: Workaround AmpereOneX errata AC04_MESH_1 (incorrect child count)
AmpereOneX mesh implementation has a bug in HN-P nodes that makes them
report incorrect child count. The failing crosspoints report 8 children
while they only have two.

When the driver tries to access the inexistent child nodes, it believes it
has reached an invalid node type and probing fails. The workaround is to
ignore those incorrect child nodes and continue normally.

Signed-off-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
[ rm: rewrote simpler generalised version ]
Tested-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/ce4b1442135fe03d0de41859b04b268c88c854a3.1707498577.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-02-09 17:14:04 +00:00
Robin Murphy
a1083ee717 perf/arm-cmn: Improve debugfs pretty-printing for large configs
The debugfs pretty-printer was written for the CMN-600 assumptions of a
maximum 8x8 mesh, but CMN-700 now allows coordinates and ID values up to
12 and 128 respectively, which can overflow the format strings, mess up
the alignment of the table and hurt overall readability. This table does
prove useful for double-checking that the driver is picking up the
topology of new systems correctly and for verifying user expectations,
so tweak the formatting to stay nice and readable with wider values.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/1d1517eadd1bac5992fab679c9dc531b381944da.1702484646.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-02-09 16:00:08 +00:00
Uwe Kleine-König
3909cb3b5f perf: arm-cmn: Convert to platform remove callback returning void
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.

To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new(), which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().

Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Link: https://lore.kernel.org/r/8698ca612e17292f8a8bbb2d1c0f6be4b2053da7.1702648125.git.u.kleine-koenig@pengutronix.de
Signed-off-by: Will Deacon <will@kernel.org>
2024-02-09 15:59:29 +00:00
Will Deacon
db32cf8e28 Merge branch 'for-next/fixes' into for-next/core
Merge in arm64 fixes queued for 6.7 so that kpti_install_ng_mappings()
can be updated to use arm64_kernel_unmapped_at_el0() instead of checking
the ARM64_UNMAP_KERNEL_AT_EL0 CPU capability directly.

* for-next/fixes:
  arm64: mm: Always make sw-dirty PTEs hw-dirty in pte_modify
  perf/arm-cmn: Fail DTC counter allocation correctly
  arm64: Avoid enabling KPTI unnecessarily
2024-01-04 12:32:33 +00:00
Robin Murphy
1892fe103c perf/arm-cmn: Fail DTC counter allocation correctly
Calling arm_cmn_event_clear() before all DTC indices are allocated is
wrong, and can lead to arm_cmn_event_add() erroneously clearing live
counters from full DTCs where allocation fails. Since the DTC counters
are only updated by arm_cmn_init_counter() after all DTC and DTM
allocations succeed, nothing actually needs cleaning up in this case
anyway, and it should just return directly as it did before.

Fixes: 7633ec2c26 ("perf/arm-cmn: Rework DTC counters (again)")
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/ed589c0d8e4130dc68b8ad1625226d28bdc185d4.1702322847.git.robin.murphy@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-12-12 11:29:21 +00:00
Robin Murphy
590f23b092 perf/arm-cmn: Fix HN-F class_occup_id events
A subtle copy-paste error managed to slip through the reorganisation
of these patches in development, and not only give some HN-F events
the wrong type, but use that wrong type before the subsequent patch
defined it. Too late to fix history, but we can at least fix the bug.

Fixes: b1b7dc38e4 ("perf/arm-cmn: Refactor HN-F event selector macros")
Reported-by: Jing Zhang <renyu.zj@linux.alibaba.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/5a22439de84ff188ef76674798052448eb03a3e1.1700740693.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2023-12-05 12:17:02 +00:00
Linus Torvalds
56ec8e4cd8 arm64 updates for 6.7:
* Major refactoring of the CPU capability detection logic resulting in
   the removal of the cpus_have_const_cap() function and migrating the
   code to "alternative" branches where possible
 
 * Backtrace/kgdb: use IPIs and pseudo-NMI
 
 * Perf and PMU:
 
   - Add support for Ampere SoC PMUs
 
   - Multi-DTC improvements for larger CMN configurations with multiple
     Debug & Trace Controllers
 
   - Rework the Arm CoreSight PMU driver to allow separate registration of
     vendor backend modules
 
   - Fixes: add missing MODULE_DEVICE_TABLE to the amlogic perf
     driver; use device_get_match_data() in the xgene driver; fix NULL
     pointer dereference in the hisi driver caused by calling
     cpuhp_state_remove_instance(); use-after-free in the hisi driver
 
 * HWCAP updates:
 
   - FEAT_SVE_B16B16 (BFloat16)
 
   - FEAT_LRCPC3 (release consistency model)
 
   - FEAT_LSE128 (128-bit atomic instructions)
 
 * SVE: remove a couple of pseudo registers from the cpufeature code.
   There is logic in place already to detect mismatched SVE features
 
 * Miscellaneous:
 
   - Reduce the default swiotlb size (currently 64MB) if no ZONE_DMA
     bouncing is needed. The buffer is still required for small kmalloc()
     buffers
 
   - Fix module PLT counting with !RANDOMIZE_BASE
 
   - Restrict CPU_BIG_ENDIAN to LLVM IAS 15.x or newer move
     synchronisation code out of the set_ptes() loop
 
   - More compact cpufeature displaying enabled cores
 
   - Kselftest updates for the new CPU features
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmU7/QUACgkQa9axLQDI
 XvEx3xAAjICmHm+ryKJxS1IGXLYu2DXMcHUjeW6w1SxkK/vKhTMlHRx/CIWDze2l
 eENu7TcDLtTw+Gv9kqg30TSwzLfJhP9oFpX2T5TKkh5qlJlbz8fBtm+as14DTLCZ
 p2sra3J0w4B5JwTVqnj2RHOlEftMKvbyLGRkz3ve6wIUbsp5pXMkxAd/k3wOf0lC
 m6d9w1OMA2sOsw9YCgjcCNQGEzFMJk+13w7K+4w6A8Djn/Jxkt4fAFVn2ZlCiZzD
 NA2lTDWJqGmeGHo3iFdCTensWXmWTqjzxsNEf7PyBk5mBOdzDVxlTfEL7vnJg7gf
 BlTQ/nhIpra7rHQ9q2rwqEzbF+4Tn3uWlQfdDb7+/4goPjDh7tlBhEOYyOwTCEIT
 0t9cCSvBmSCKeXC3lKWWtJ+QJKhZHSmXN84EotTs65KyyfIsi4RuSezvV/+aIL86
 06sHYlYxETuujZP1cgOjf69Wsdsgizx0mqXJXf/xOjp22HFDcL4Bki6Rgi6t5OZj
 GEHG15kSE+eJ+RIpxpuAN8fdrlxYubsVLIksCqK7cZf9zXbQGIlifKAIrYiEx6kz
 FD+o+j/5niRWR6yJZCtCcGxqpSlwnYWPqc1Ds0GES8A/BphWMPozXUAZ0ll4Fnp1
 yyR2/Due/eBsCNESn579kP8989rashubB8vxvdx2fcWVtLC7VgE=
 =QaEo
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Catalin Marinas:
 "No major architecture features this time around, just some new HWCAP
  definitions, support for the Ampere SoC PMUs and a few fixes/cleanups.

  The bulk of the changes is reworking of the CPU capability checking
  code (cpus_have_cap() etc).

   - Major refactoring of the CPU capability detection logic resulting
     in the removal of the cpus_have_const_cap() function and migrating
     the code to "alternative" branches where possible

   - Backtrace/kgdb: use IPIs and pseudo-NMI

   - Perf and PMU:

      - Add support for Ampere SoC PMUs

      - Multi-DTC improvements for larger CMN configurations with
        multiple Debug & Trace Controllers

      - Rework the Arm CoreSight PMU driver to allow separate
        registration of vendor backend modules

      - Fixes: add missing MODULE_DEVICE_TABLE to the amlogic perf
        driver; use device_get_match_data() in the xgene driver; fix
        NULL pointer dereference in the hisi driver caused by calling
        cpuhp_state_remove_instance(); use-after-free in the hisi driver

   - HWCAP updates:

      - FEAT_SVE_B16B16 (BFloat16)

      - FEAT_LRCPC3 (release consistency model)

      - FEAT_LSE128 (128-bit atomic instructions)

   - SVE: remove a couple of pseudo registers from the cpufeature code.
     There is logic in place already to detect mismatched SVE features

   - Miscellaneous:

      - Reduce the default swiotlb size (currently 64MB) if no ZONE_DMA
        bouncing is needed. The buffer is still required for small
        kmalloc() buffers

      - Fix module PLT counting with !RANDOMIZE_BASE

      - Restrict CPU_BIG_ENDIAN to LLVM IAS 15.x or newer move
        synchronisation code out of the set_ptes() loop

      - More compact cpufeature displaying enabled cores

      - Kselftest updates for the new CPU features"

 * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (83 commits)
  arm64: Restrict CPU_BIG_ENDIAN to GNU as or LLVM IAS 15.x or newer
  arm64: module: Fix PLT counting when CONFIG_RANDOMIZE_BASE=n
  arm64, irqchip/gic-v3, ACPI: Move MADT GICC enabled check into a helper
  perf: hisi: Fix use-after-free when register pmu fails
  drivers/perf: hisi_pcie: Initialize event->cpu only on success
  drivers/perf: hisi_pcie: Check the type first in pmu::event_init()
  arm64: cpufeature: Change DBM to display enabled cores
  arm64: cpufeature: Display the set of cores with a feature
  perf/arm-cmn: Enable per-DTC counter allocation
  perf/arm-cmn: Rework DTC counters (again)
  perf/arm-cmn: Fix DTC domain detection
  drivers: perf: arm_pmuv3: Drop some unused arguments from armv8_pmu_init()
  drivers: perf: arm_pmuv3: Read PMMIR_EL1 unconditionally
  drivers/perf: hisi: use cpuhp_state_remove_instance_nocalls() for hisi_hns3_pmu uninit process
  clocksource/drivers/arm_arch_timer: limit XGene-1 workaround
  arm64: Remove system_uses_lse_atomics()
  arm64: Mark the 'addr' argument to set_ptes() and __set_pte_at() as unused
  drivers/perf: xgene: Use device_get_match_data()
  perf/amlogic: add missing MODULE_DEVICE_TABLE
  arm64/mm: Hoist synchronization out of set_ptes() loop
  ...
2023-11-01 09:34:55 -10:00
Robin Murphy
ab33c66fd8 perf/arm-cmn: Enable per-DTC counter allocation
Finally enable independent per-DTC-domain counter allocation, except on
CMN-600 where we still need to cope with not knowing the domain topology
and thus keep counter indices sychronised across domains. This allows
users to simultaneously count up to 8 targeted events per domain, rather
than 8 globally, for up to 4x wider coverage on maximum configurations.

Even though this now looks deceptively simple, I stand by my previous
assertion that it was a flippin' nightmare to implement; all the real
head-scratchers are hidden in the foundations in the previous patch...

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/849f65566582cb102c6d0843d0f26e231180f8ac.1697824215.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2023-10-23 13:45:42 +01:00
Robin Murphy
7633ec2c26 perf/arm-cmn: Rework DTC counters (again)
The bitmap-based scheme for tracking DTC counter usage turns out to be a
complete dead-end for its imagined purpose, since by the time we have to
keep track of a per-DTC counter index anyway, we already have enough
information to make the bitmap itself redundant. Revert the remains of
it back to almost the original scheme, but now expanded to track per-DTC
indices, in preparation for making use of them in anger.

Note that since cycle count events always use a dedicated counter on a
single DTC, we reuse the field to encode their DTC index directly.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Link: https://lore.kernel.org/r/5f6ade76b47f033836d7a36c03555da896dfb4a3.1697824215.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2023-10-23 13:45:42 +01:00
Robin Murphy
e3e73f511c perf/arm-cmn: Fix DTC domain detection
It transpires that dtm_unit_info is another register which got shuffled
in CMN-700 without me noticing. Fix that in a way which also proactively
fixes the fragile laziness of its consumer, just in case any further
fields ever get added alongside dtc_domain.

Fixes: 23760a0144 ("perf/arm-cmn: Add CMN-700 support")
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Link: https://lore.kernel.org/r/3076ee83d0554f6939fbb6ee49ab2bdb28d8c7ee.1697824215.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2023-10-23 13:45:42 +01:00
Jing Zhang
7f949f6f54 perf/arm-cmn: Fix the unhandled overflow status of counter 4 to 7
The register por_dt_pmovsr Bits[7:0] indicates overflow from counters 7
to 0. But in arm_cmn_handle_irq(), only handled the overflow status of
Bits[3:0] which results in unhandled overflow status of counters 4 to 7.

So let the overflow status of DTC counters 4 to 7 to be handled.

Fixes: 0ba64770a2 ("perf: Add Arm CMN-600 PMU driver")
Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/1695612152-123633-1-git-send-email-renyu.zj@linux.alibaba.com
Signed-off-by: Will Deacon <will@kernel.org>
2023-09-29 16:24:59 +01:00
Robin Murphy
ac18ea1a89 perf/arm-cmn: Add CMN-700 r3 support
CMN-700 r3 has a special configuration option for a so-called "Super
Home Node", which is a superset of the standard HN-F that also manages
remote-chip coherency for multi-chip setups. As such it has a similar
but expanded set of PMU events compared to HN-F, with some additional
filtering options to boot.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/49153b72253f6af0e625cb55b9e1b825b110c49c.1688746690.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2023-07-28 14:31:06 +01:00
Robin Murphy
b1b7dc38e4 perf/arm-cmn: Refactor HN-F event selector macros
Refactor the macros for defining HN-F events with additional selectors,
so they can be shared with another upcoming similar-but-distinct HN
type. No functional change intended.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/0f05327941e06c665dbfd47e03fad29276b9e63c.1688746690.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2023-07-28 14:31:06 +01:00
Robin Murphy
00df90934c perf/arm-cmn: Remove spurious event aliases
As the name suggests, the "partial DAT flit" event is only counted for
the DAT channel, and furthermore is only applicable to device ports, not
mesh links (strictly it's only device ports with CHI-A requesters
connected, but detecting that degree of detail is more bother than it's
worth). Stop generating spurious event aliases for other combinations
which aren't meaningful.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/b01a58e3ff05c322547fbfd015f6dbfedf555ed3.1688746690.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2023-07-28 14:31:05 +01:00
Robin Murphy
a1c45d3ebd perf/arm-cmn: Add sysfs identifier
Expose a sysfs identifier encapsulating the CMN part number and revision
so that jevents can narrow down a fundamental set of possible events for
calculating metrics. Configuration-dependent aspects - such as whether a
given node type is present, and/or a given node ID is valid - are still
not covered, and in general it's hard to see how userspace could handle
them, so we won't be removing any data or validation logic from the
driver any time soon, but at least it's a step in a useful direction.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-and-tested-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Tested-by: Jing Zhang <renyu.zj@linux.alibaba.com>
Link: https://lore.kernel.org/r/b8a14c14fcdf028939ebf57849863e8ae01743de.1686588640.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2023-06-16 10:28:21 +01:00
Robin Murphy
7819e05a0d perf/arm-cmn: Revamp model detection
CMN implements a set of CoreSight-format peripheral ID registers which
in principle we should be able to use to identify the hardware. However
so far we have avoided trying to use the part number field since the
TRMs have all described it as "configuration dependent". It turns out,
though, that this is a quirk of the documentation generation process,
and in fact the part number should always be a stable well-defined field
which we can trust.

To that end, revamp our model detection to rely less on ACPI/DT, and
pave the way towards further using the hardware information as an
identifier for userspace jevent metrics. This includes renaming the
revision constants to maximise readability.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-and-tested-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Link: https://lore.kernel.org/r/3c791eaae814b0126f9adbd5419bfb4a600dade7.1686588640.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2023-06-16 10:28:21 +01:00
Robin Murphy
71746c995c perf/arm-cmn: Fix DTC reset
It turns out that my naive DTC reset logic fails to work as intended,
since, after checking with the hardware designers, the PMU actually
needs to be fully enabled in order to correctly clear any pending
overflows. Therefore, invert the sequence to start with turning on both
enables so that we can reliably get the DTCs into a known state, then
moving to our normal counters-stopped state from there. Since all the
DTM counters have already been unpaired during the initial discovery
pass, we just need to additionally reset the cycle counters to ensure
that no other unexpected overflows occur during this period.

Fixes: 0ba64770a2 ("perf: Add Arm CMN-600 PMU driver")
Reported-by: Geoff Blake <blakgeof@amazon.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/0ea4559261ea394f827c9aee5168c77a60aaee03.1684946389.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2023-06-05 15:35:52 +01:00
Robin Murphy
2ad91e44e6 perf/arm-cmn: Fix port detection for CMN-700
When the "extra device ports" configuration was first added, the
additional mxp_device_port_connect_info registers were added around the
existing mxp_mesh_port_connect_info registers. What I missed about
CMN-700 is that it shuffled them around to remove this discontinuity.
As such, tweak the definitions and factor out a helper for reading these
registers so we can deal with this discrepancy easily, which does at
least allow nicely tidying up the callsites. With this we can then also
do the nice thing and skip accesses completely rather than relying on
RES0 behaviour where we know the extra registers aren't defined.

Fixes: 23760a0144 ("perf/arm-cmn: Add CMN-700 support")
Reported-by: Jing Zhang <renyu.zj@linux.alibaba.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/71d129241d4d7923cde72a0e5b4c8d2f6084525f.1681295193.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2023-04-14 13:45:00 +01:00
Robin Murphy
23b2fd8394 perf/arm-cmn: Validate cycles events fully
DTC cycle count events don't have anything to validate or initialise in
themselves, but we should not forget to still validate their whole group
context. Otherwise, we may fail to correctly reject a contrived group
containing an impossible number of cycles events.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/3124e8c276a1f513c1a415dc839ca4181b3c8bc8.1680522545.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2023-04-06 16:07:51 +01:00
Ilkka Koskinen
f87e9114b5 perf/arm-cmn: Move overlapping wp_combine field
As eventid field was expanded to support new mesh versions, it started to
overlap with wp_combine field. Move wp_combine to fix the issue.

Fixes: 23760a0144 ("perf/arm-cmn: Add CMN-700 support")
Signed-off-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Link: https://lore.kernel.org/r/20230301175540.19891-1-ilkka@os.amperecomputing.com
Signed-off-by: Will Deacon <will@kernel.org>
2023-03-27 15:00:59 +01:00
Linus Torvalds
8bf1a529cd arm64 updates for 6.3:
- Support for arm64 SME 2 and 2.1. SME2 introduces a new 512-bit
   architectural register (ZT0, for the look-up table feature) that Linux
   needs to save/restore.
 
 - Include TPIDR2 in the signal context and add the corresponding
   kselftests.
 
 - Perf updates: Arm SPEv1.2 support, HiSilicon uncore PMU updates, ACPI
   support to the Marvell DDR and TAD PMU drivers, reset DTM_PMU_CONFIG
   (ARM CMN) at probe time.
 
 - Support for DYNAMIC_FTRACE_WITH_CALL_OPS on arm64.
 
 - Permit EFI boot with MMU and caches on. Instead of cleaning the entire
   loaded kernel image to the PoC and disabling the MMU and caches before
   branching to the kernel bare metal entry point, leave the MMU and
   caches enabled and rely on EFI's cacheable 1:1 mapping of all of
   system RAM to populate the initial page tables.
 
 - Expose the AArch32 (compat) ELF_HWCAP features to user in an arm64
   kernel (the arm32 kernel only defines the values).
 
 - Harden the arm64 shadow call stack pointer handling: stash the shadow
   stack pointer in the task struct on interrupt, load it directly from
   this structure.
 
 - Signal handling cleanups to remove redundant validation of size
   information and avoid reading the same data from userspace twice.
 
 - Refactor the hwcap macros to make use of the automatically generated
   ID registers. It should make new hwcaps writing less error prone.
 
 - Further arm64 sysreg conversion and some fixes.
 
 - arm64 kselftest fixes and improvements.
 
 - Pointer authentication cleanups: don't sign leaf functions, unify
   asm-arch manipulation.
 
 - Pseudo-NMI code generation optimisations.
 
 - Minor fixes for SME and TPIDR2 handling.
 
 - Miscellaneous updates: ARCH_FORCE_MAX_ORDER is now selectable, replace
   strtobool() to kstrtobool() in the cpufeature.c code, apply dynamic
   shadow call stack in two passes, intercept pfn changes in set_pte_at()
   without the required break-before-make sequence, attempt to dump all
   instructions on unhandled kernel faults.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmP0/QsACgkQa9axLQDI
 XvG+gA/+JDVEH9wRzAIZvbp9hSuohPc48xgAmIMP1eiVB0/5qeRjYAJwS33H0rXS
 BPC2kj9IBy/eQeM9ICg0nFd0zYznSVacITqe6NrqeJ1F+ftS4rrHdfxd+J7kIoCs
 V2L8e+BJvmHdhmNV2qMAgJdGlfxfQBA7fv2cy52HKYcouoOh1AUVR/x+yXVXAsCd
 qJP3+dlUKccgm/oc5unEC1eZ49u8O+EoasqOyfG6K5udMgzhEX3K6imT9J3hw0WT
 UjstYkx5uGS/prUrRCQAX96VCHoZmzEDKtQuHkHvQXEYXsYPF3ldbR2CziNJnHe7
 QfSkjJlt8HAtExA+BkwEe9i0MQO/2VF5qsa2e4fA6l7uqGu3LOtS/jJd23C9n9fR
 Id8aBMeN6S8+MjqRA9L2uf4t6e4ISEHoG9ZRdc4WOwloxEEiJoIeun+7bHdOSZLj
 AFdHFCz4NXiiwC0UP0xPDI2YeCLqt5np7HmnrUqwzRpVO8UUagiJD8TIpcBSjBN9
 J68eidenHUW7/SlIeaMKE2lmo8AUEAJs9AorDSugF19/ThJcQdx7vT2UAZjeVB3j
 1dbbwajnlDOk/w8PQC4thFp5/MDlfst0htS3WRwa+vgkweE2EAdTU4hUZ8qEP7FQ
 smhYtlT1xUSTYDTqoaG/U2OWR6/UU79wP0jgcOsHXTuyYrtPI/Q=
 =VmXL
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Catalin Marinas:

 - Support for arm64 SME 2 and 2.1. SME2 introduces a new 512-bit
   architectural register (ZT0, for the look-up table feature) that
   Linux needs to save/restore

 - Include TPIDR2 in the signal context and add the corresponding
   kselftests

 - Perf updates: Arm SPEv1.2 support, HiSilicon uncore PMU updates, ACPI
   support to the Marvell DDR and TAD PMU drivers, reset DTM_PMU_CONFIG
   (ARM CMN) at probe time

 - Support for DYNAMIC_FTRACE_WITH_CALL_OPS on arm64

 - Permit EFI boot with MMU and caches on. Instead of cleaning the
   entire loaded kernel image to the PoC and disabling the MMU and
   caches before branching to the kernel bare metal entry point, leave
   the MMU and caches enabled and rely on EFI's cacheable 1:1 mapping of
   all of system RAM to populate the initial page tables

 - Expose the AArch32 (compat) ELF_HWCAP features to user in an arm64
   kernel (the arm32 kernel only defines the values)

 - Harden the arm64 shadow call stack pointer handling: stash the shadow
   stack pointer in the task struct on interrupt, load it directly from
   this structure

 - Signal handling cleanups to remove redundant validation of size
   information and avoid reading the same data from userspace twice

 - Refactor the hwcap macros to make use of the automatically generated
   ID registers. It should make new hwcaps writing less error prone

 - Further arm64 sysreg conversion and some fixes

 - arm64 kselftest fixes and improvements

 - Pointer authentication cleanups: don't sign leaf functions, unify
   asm-arch manipulation

 - Pseudo-NMI code generation optimisations

 - Minor fixes for SME and TPIDR2 handling

 - Miscellaneous updates: ARCH_FORCE_MAX_ORDER is now selectable,
   replace strtobool() to kstrtobool() in the cpufeature.c code, apply
   dynamic shadow call stack in two passes, intercept pfn changes in
   set_pte_at() without the required break-before-make sequence, attempt
   to dump all instructions on unhandled kernel faults

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (130 commits)
  arm64: fix .idmap.text assertion for large kernels
  kselftest/arm64: Don't require FA64 for streaming SVE+ZA tests
  kselftest/arm64: Copy whole EXTRA context
  arm64: kprobes: Drop ID map text from kprobes blacklist
  perf: arm_spe: Print the version of SPE detected
  perf: arm_spe: Add support for SPEv1.2 inverted event filtering
  perf: Add perf_event_attr::config3
  arm64/sme: Fix __finalise_el2 SMEver check
  drivers/perf: fsl_imx8_ddr_perf: Remove set-but-not-used variable
  arm64/signal: Only read new data when parsing the ZT context
  arm64/signal: Only read new data when parsing the ZA context
  arm64/signal: Only read new data when parsing the SVE context
  arm64/signal: Avoid rereading context frame sizes
  arm64/signal: Make interface for restore_fpsimd_context() consistent
  arm64/signal: Remove redundant size validation from parse_user_sigframe()
  arm64/signal: Don't redundantly verify FPSIMD magic
  arm64/cpufeature: Use helper macros to specify hwcaps
  arm64/cpufeature: Always use symbolic name for feature value in hwcaps
  arm64/sysreg: Initial unsigned annotations for ID registers
  arm64/sysreg: Initial annotation of signed ID registers
  ...
2023-02-21 15:27:48 -08:00
Robin Murphy
a428eb4b99 Partially revert "perf/arm-cmn: Optimise DTC counter accesses"
It turns out the optimisation implemented by commit 4f2c3872dd is
totally broken, since all the places that consume hw->dtcs_used for
events other than cycle count are still not expecting it to be sparsely
populated, and fail to read all the relevant DTC counters correctly if
so.

If implemented correctly, the optimisation potentially saves up to 3
register reads per event update, which is reasonably significant for
events targeting a single node, but still not worth a massive amount of
additional code complexity overall. Getting it right within the current
design looks a fair bit more involved than it was ever intended to be,
so let's just make a functional revert which restores the old behaviour
while still backporting easily.

Fixes: 4f2c3872dd ("perf/arm-cmn: Optimise DTC counter accesses")
Reported-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/b41bb4ed7283c3d8400ce5cf5e6ec94915e6750f.1674498637.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2023-01-26 13:55:38 +00:00
Robin Murphy
bb21ef19a3 perf/arm-cmn: Reset DTM_PMU_CONFIG at probe
Although we treat the DTM counters as free-running such that we're not
too concerned about the initial DTM state, it's possible for a previous
user to have left DTM counters enabled and paired with DTC counters.
Thus if the first events are scheduled using some, but not all, DTMs,
the as-yet-unused ones could end up adding spurious increments to the
event counts at the DTC. Make sure we sync our initial DTM_PMU_CONFIG
state to all the DTMs at probe time to avoid that possibility.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/ba5f38b3dc733cd06bfb5e659b697e76d18c2183.1670269572.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2023-01-19 18:30:21 +00:00
Ilkka Koskinen
05d6f6d346 perf/arm-cmn: Add more bits to child node address offset field
CMN-600 uses bits [27:0] for child node address offset while bits [30:28]
are required to be zero.

For CMN-650, the child node address offset field has been increased
to include bits [29:0] while leaving only bit 30 set to zero.

Let's include the missing two bits and assume older implementations
comply with the spec and set bits [29:28] to 0.

Signed-off-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Fixes: 60d1504070 ("perf/arm-cmn: Support new IP features")
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/20220808195455.79277-1-ilkka@os.amperecomputing.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-09-22 14:30:00 +01:00
Robin Murphy
c578121298 perf/arm-cmn: Decode CAL devices properly in debugfs
The debugfs code is lazy, and since it only keeps the bottom byte of
each connect_info register to save space, it also treats the whole thing
as the device_type since the other bits were reserved anyway. Upon
closer inspection, though, this is no longer true on newer IP versions,
so let's be good and decode the exact field properly. This should help
it not get confused when a Component Aggregation Layer is present (which
is already implied if Node IDs are found for both device addresses
represented by the next two lines of the table).

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/6a13a6128a28cfe2eec6d09cf372a167ec9c3b65.1652274773.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-05-12 13:44:56 +01:00