Commit Graph

2110 Commits

Author SHA1 Message Date
David Thompson
e419675754 EDAC/bluefield: Use Arm SMC for EMI access on BlueField-2
The BlueField EDAC driver supports the first generation BlueField-1 SoC, but
not the second generation BlueField-2 SoC. The BlueField-2 SoC is different in
that only secure accesses are allowed to the External Memory Interface (EMI)
register block. On BlueField-2, all read/write accesses from Linux to EMI
registers are routed via the Arm Secure Monitor Call (SMC) through Arm Trusted
Firmware (ATF), which runs at EL3 privileged state.

On BlueField-1, EMI registers are mapped and accessed directly. In order to
support BlueField-2, the driver's read and write access methods must be
extended with additional logic to include secure access to the EMI registers
via SMCs.

  [ bp: Move struct member comments above them, simplify. ]

Signed-off-by: David Thompson <davthompson@nvidia.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Shravan Kumar Ramani <shravankr@nvidia.com>
Link: https://lore.kernel.org/r/20241021233013.18405-1-davthompson@nvidia.com
2024-10-22 18:36:13 +02:00
David Thompson
1fe774a93b EDAC/bluefield: Fix potential integer overflow
The 64-bit argument for the "get DIMM info" SMC call consists of mem_ctrl_idx
left-shifted 16 bits and OR-ed with DIMM index.  With mem_ctrl_idx defined as
32-bits wide the left-shift operation truncates the upper 16 bits of
information during the calculation of the SMC argument.

The mem_ctrl_idx stack variable must be defined as 64-bits wide to prevent any
potential integer overflow, i.e. loss of data from upper 16 bits.

Fixes: 82413e562e ("EDAC, mellanox: Add ECC support for BlueField DDR4")
Signed-off-by: David Thompson <davthompson@nvidia.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Shravan Kumar Ramani <shravankr@nvidia.com>
Link: https://lore.kernel.org/r/20240930151056.10158-1-davthompson@nvidia.com
2024-10-17 14:10:18 +02:00
Lili Li
0be9f1af39 EDAC/igen6: Add Intel Panther Lake-H SoCs support
Panther Lake-H SoCs share the same IBECC registers with Meteor Lake-P
SoCs. Add Panther Lake-H SoC compute die IDs for EDAC support.

Signed-off-by: Lili Li <lili.li@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Link: https://lore.kernel.org/r/20241012071439.54165-1-qiuxu.zhuo@intel.com
2024-10-14 11:28:41 -07:00
Rajendra Nayak
0a97195d21 EDAC/qcom: Make irq configuration optional
On most modern qualcomm SoCs, the configuration necessary to enable the
Tag/Data RAM related irqs being propagated to the SoC irq controller is
already done in firmware (in DSF or 'DDR System Firmware')

On some like the x1e80100, these registers aren't even accesible to the
kernel causing a crash when edac device is probed.

Hence, make the irq configuration optional in the driver and mark x1e80100
as the SoC on which this should be avoided.

Fixes: af16b00578 ("arm64: dts: qcom: Add base X1E80100 dtsi and the QCP dts")
Reported-by: Bjorn Andersson <andersson@kernel.org>
Signed-off-by: Rajendra Nayak <quic_rjendra@quicinc.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Reviewed-by: Abel Vesa <abel.vesa@linaro.org>
Link: https://lore.kernel.org/r/20240903101510.3452734-1-quic_rjendra@quicinc.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2024-10-05 22:17:08 -05:00
Linus Torvalds
7dfc15c473 - Drop a now obsolete ppc4xx_edac driver
- Fix conversion to physical memory addresses on Intel's Elkhart Lake and Ice
   Lake hardware when the system address is above the (Top-Of-Memory) TOM
   address
 
 - Pay attention to the memory hole on Zynq UltraScale+ MPSoC DDR controllers
   when injecting errors for testing purposes
 
 - Add support for translating normalized error addresses reported by an AMD
   memory controller into system physical addresses using an UEFI mechanism
   called platform runtime mechanism (PRM).
 
 - The usual cleanups and fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmbeuNcACgkQEsHwGGHe
 VUoELw//fZaWbfYg7yYw8iTMojc01LCmS5m6nQeJc6PewcIfLp6FXr4V4Rq99NUn
 FBVIMunm0unRAqep9WTY+xphxlP9u9VovyaLR0cxRf1aEi3xRFit7PIG7P3RyTUn
 ipDKBnx0plTlwB9US5XllhGCM6xAvrNBoKPe1LV+bd7z9wOJvIy3GeV/65ajLsLV
 +7wNBJ8CMXIJ+319FK35ZUM1butp2XFLVtLqKL53nPsumowZcegfaD1u6sfsX4SO
 je8BpNMXKHl0ftZ3DPAMAGrr4M54lsXX/62k3PqcUr4LMbVGLzQmDGyoHUWwdruT
 OGb5tVWqBXoR6DA03/P25q1SGKwGsbuzK33E8T9vkwIqBrj73vA+tVBv03U3QFMO
 RSb4/BS09q/GtA70OFCnigumLoKMmuZu0tcLGQaUMP6sWVVVMp1vVctTapl22h57
 sonEUf0+GMsVu4ueS/vSfU3R3Dqadg/4LxZPG7njc06hCNDAu7u4/0gGdGuiQwqF
 ZyLUZO3SlJX/SkWfNyW4Lc4GNWRWgtFfh5sgODxATCE5NyUrazsQZg5Jsxr/5Jwv
 aBDsbHEUHO0zKRGfDBfHyaWK8318z+my8zvVhIGLuQCKEY8GSTK35rfthkp6vbEe
 UNrCgea+HaDZt6jN4ahaZjK/0DjiMSO12gA3GPt7tdO6v+U46/0=
 =+/Fq
 -----END PGP SIGNATURE-----

Merge tag 'edac_updates_for_v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras

Pull EDAC updates from Borislav Petkov:

 - Drop a now obsolete ppc4xx_edac driver

 - Fix conversion to physical memory addresses on Intel's Elkhart Lake
   and Ice Lake hardware when the system address is above the
   (Top-Of-Memory) TOM address

 - Pay attention to the memory hole on Zynq UltraScale+ MPSoC DDR
   controllers when injecting errors for testing purposes

 - Add support for translating normalized error addresses reported by an
   AMD memory controller into system physical addresses using an UEFI
   mechanism called platform runtime mechanism (PRM).

 - The usual cleanups and fixes

* tag 'edac_updates_for_v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
  EDAC: Drop obsolete PPC4xx driver
  EDAC/sb_edac: Fix the compile warning of large frame size
  EDAC/{skx_common,i10nm}: Remove the AMAP register for determing DDR5
  EDAC/{skx_common,skx,i10nm}: Move the common debug code to skx_common
  EDAC/igen6: Fix conversion of system address to physical memory address
  EDAC/synopsys: Fix error injection on Zynq UltraScale+
  RAS/AMD/ATL: Translate normalized to system physical addresses using PRM
  ACPI: PRM: Add PRM handler direct call support
2024-09-16 06:36:37 +02:00
Borislav Petkov (AMD)
92f8358bce Merge remote-tracking branches 'ras/edac-amd-atl', 'ras/edac-misc' and 'ras/edac-drivers' into edac-updates
* ras/edac-amd-atl:
  RAS/AMD/ATL: Translate normalized to system physical addresses using PRM
  ACPI: PRM: Add PRM handler direct call support

* ras/edac-misc:
  EDAC/synopsys: Fix error injection on Zynq UltraScale+

* ras/edac-drivers:
  EDAC: Drop obsolete PPC4xx driver
  EDAC/sb_edac: Fix the compile warning of large frame size
  EDAC/{skx_common,i10nm}: Remove the AMAP register for determing DDR5
  EDAC/{skx_common,skx,i10nm}: Move the common debug code to skx_common
  EDAC/igen6: Fix conversion of system address to physical memory address

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2024-09-09 10:51:30 +02:00
Rob Herring (Arm)
a5f285d9cf EDAC: Drop obsolete PPC4xx driver
Since

  47d13a269b ("powerpc/40x: Remove 40x platforms.")

support for PPC40x platforms has been removed. While the EDAC driver also
mentions PPC440 and PPC460 processors, the driver refuses to probe on anything
other than PPC405. It's unlikely support will ever be added at this point for
these other old platforms, so the driver can be removed.

Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Link: https://lore.kernel.org/r/20240904192224.3060307-2-robh@kernel.org
2024-09-05 16:56:38 +02:00
Qiuxu Zhuo
43247abd09 EDAC/sb_edac: Fix the compile warning of large frame size
Compiling sb_edac driver with GCC 11.4.0 and the W=1 option reported
the following warning:

  drivers/edac/sb_edac.c: In function ‘sbridge_mce_output_error’:
  drivers/edac/sb_edac.c:3249:1: warning: the frame size of 1032 bytes is larger than 1024 bytes [-Wframe-larger-than=]

As there is no concurrent invocation of sbridge_mce_output_error(),
fix this warning by moving the large-size variables 'msg' and 'msg_full'
from the stack to the pre-allocated data segment.

[Tony: Fix checkpatch warnings for code alignment & use of strcpy()]

Reported-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/all/20240829120903.84152-1-qiuxu.zhuo@intel.com
2024-09-03 15:09:22 -07:00
Qiuxu Zhuo
7a33c144c2 EDAC/{skx_common,i10nm}: Remove the AMAP register for determing DDR5
The configuration flag 'res_config->support_ddr5 = true' sufficiently
indicates DDR5 memory support for Sapphire Rapids and Granite Rapids.
Additionally, the i10nm_edac driver doesn't need to use the AMAP
register for setting the 'fine_grain_bank' of each DIMM. Therefore,
remove the AMAP register for determining DDR5.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/all/20240829061309.57738-1-qiuxu.zhuo@intel.com
2024-09-03 12:36:59 -07:00
Qiuxu Zhuo
8b93582353 EDAC/{skx_common,skx,i10nm}: Move the common debug code to skx_common
Commit

  afdb82fd763c ("EDAC, i10nm: make skx_common.o a separate module")

made skx_common.o a separate module. With skx_common.o now a separate
module, move the common debug code setup_{skx,i10nm}_debug() and
teardown_{skx,i10nm}_debug() in {skx,i10nm}_base.c to skx_common.c to
reduce code duplication. Additionally, prefix these function names with
'skx' to maintain consistency with other names in the file.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/all/20240829055101.56245-1-qiuxu.zhuo@intel.com
2024-09-03 12:35:06 -07:00
Qiuxu Zhuo
0ad875f442 EDAC/igen6: Fix conversion of system address to physical memory address
The conversion of system address to physical memory address (as viewed by
the memory controller) by igen6_edac is incorrect when the system address
is above the TOM (Total amount Of populated physical Memory) for Elkhart
Lake and Ice Lake (Neural Network Processor). Fix this conversion.

Fixes: 10590a9d4f ("EDAC/igen6: Add EDAC driver for Intel client SoCs using IBECC")
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/stable/20240814061011.43545-1-qiuxu.zhuo%40intel.com
2024-09-03 12:27:19 -07:00
Shubhrajyoti Datta
35e6dbfe18 EDAC/synopsys: Fix error injection on Zynq UltraScale+
The Zynq UltraScale+ MPSoC DDR has a disjoint memory from 2GB to 32GB.
The DDR host interface has a contiguous memory so while injecting
errors, the driver should remove the hole else the injection fails as
the address translation is incorrect.

Introduce a get_mem_info() function pointer and set it for Zynq
UltraScale+ platform to return host address.

Fixes: 1a81361f75 ("EDAC, synopsys: Add Error Injection support for ZynqMP DDR controller")
Signed-off-by: Shubhrajyoti Datta <shubhrajyoti.datta@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20240711100656.31376-1-shubhrajyoti.datta@amd.com
2024-08-01 16:27:46 +02:00
Linus Torvalds
1a251f52cf minmax: make generic MIN() and MAX() macros available everywhere
This just standardizes the use of MIN() and MAX() macros, with the very
traditional semantics.  The goal is to use these for C constant
expressions and for top-level / static initializers, and so be able to
simplify the min()/max() macros.

These macro names were used by various kernel code - they are very
traditional, after all - and all such users have been fixed up, with a
few different approaches:

 - trivial duplicated macro definitions have been removed

   Note that 'trivial' here means that it's obviously kernel code that
   already included all the major kernel headers, and thus gets the new
   generic MIN/MAX macros automatically.

 - non-trivial duplicated macro definitions are guarded with #ifndef

   This is the "yes, they define their own versions, but no, the include
   situation is not entirely obvious, and maybe they don't get the
   generic version automatically" case.

 - strange use case #1

   A couple of drivers decided that the way they want to describe their
   versioning is with

	#define MAJ 1
	#define MIN 2
	#define DRV_VERSION __stringify(MAJ) "." __stringify(MIN)

   which adds zero value and I just did my Alexander the Great
   impersonation, and rewrote that pointless Gordian knot as

	#define DRV_VERSION "1.2"

   instead.

 - strange use case #2

   A couple of drivers thought that it's a good idea to have a random
   'MIN' or 'MAX' define for a value or index into a table, rather than
   the traditional macro that takes arguments.

   These values were re-written as C enum's instead. The new
   function-line macros only expand when followed by an open
   parenthesis, and thus don't clash with enum use.

Happily, there weren't really all that many of these cases, and a lot of
users already had the pattern of using '#ifndef' guarding (or in one
case just using '#undef MIN') before defining their own private version
that does the same thing. I left such cases alone.

Cc: David Laight <David.Laight@aculab.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-07-28 15:49:18 -07:00
Linus Torvalds
4477b39c32 minmax: add a few more MIN_T/MAX_T users
Commit 3a7e02c040 ("minmax: avoid overly complicated constant
expressions in VM code") added the simpler MIN_T/MAX_T macros in order
to avoid some excessive expansion from the rather complicated regular
min/max macros.

The complexity of those macros stems from two issues:

 (a) trying to use them in situations that require a C constant
     expression (in static initializers and for array sizes)

 (b) the type sanity checking

and MIN_T/MAX_T avoids both of these issues.

Now, in the whole (long) discussion about all this, it was pointed out
that the whole type sanity checking is entirely unnecessary for
min_t/max_t which get a fixed type that the comparison is done in.

But that still leaves min_t/max_t unnecessarily complicated due to
worries about the C constant expression case.

However, it turns out that there really aren't very many cases that use
min_t/max_t for this, and we can just force-convert those.

This does exactly that.

Which in turn will then allow for much simpler implementations of
min_t()/max_t().  All the usual "macros in all upper case will evaluate
the arguments multiple times" rules apply.

We should do all the same things for the regular min/max() vs MIN/MAX()
cases, but that has the added complexity of various drivers defining
their own local versions of MIN/MAX, so that needs another level of
fixes first.

Link: https://lore.kernel.org/all/b47fad1d0cf8449886ad148f8c013dae@AcuMS.aculab.com/
Cc: David Laight <David.Laight@aculab.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-07-28 13:41:14 -07:00
Linus Torvalds
222dfb8326 - Make error checking of AMD SMN accesses more robust in the callers as
they're the only ones who can interpret the results properly
 
  - The usual cleanups and fixes, left and right
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmaVOU0ACgkQEsHwGGHe
 VUqeFBAAl9X4bj08GwSAXfqBangXaGpKO4Nx0VZiFCYDkQ/TDnchMEBbpRWSuVzS
 SEnVSrcAXCxKqhv295UyFMmv2a+q3UUidkxTzRfznekMZMMylHYcfCFrg16w9ZNJ
 N/cBquTu96hSJHd2/usNUvNPLllTrMoIg3gofBav+NTaHQQDmzvM5htfewREY9OF
 SRS/86o3u5oIsRKKiJRyzfLzzX9lEGUvU+lvxv/yu1x2Q6SG0guhfM3HeaSxCIOs
 yeB23bwe/N/pO5KlqOtEJJL49Ypu2k/jfiS2rhH6AxSqNfXVpBlDbnahu9sA973n
 irzWwycJhVU4OQ3pqmPXdcKDqn7GmUWDsjrkEIOqJeBCSukmlM7APi8Ss8yGZ3X4
 HgDw10c900ldrxSo0H5PdpeULvowpeptpzBY8gzcdum4s0vNUvZLy/n1AKo7ydea
 oJ+ZBdXvywnR66uGQLkTxLvpGTNgyFrKDORHuyOAwJTN5CbLuco2SV/82mkcQCZt
 sAgyiWFvIcLoHZPfY8BNztYWVX01lWDIxFHJE8ca/B97mBeZCC3w1DnHJla8Kxsg
 zCMV0yn61BdMvjVS9AGaKqEuN0gYYrs/QOjtOp5ggAv7QC1ke/wqgZoFGvLbmcP9
 pIf8GzCt34u3tACGAl76toP0rtnMjGvKD8xXdHGHf7AAj1jKo28=
 =rd6Q
 -----END PGP SIGNATURE-----

Merge tag 'x86_misc_for_v6.11_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull misc x86 updates from Borislav Petkov:

 - Make error checking of AMD SMN accesses more robust in the callers as
   they're the only ones who can interpret the results properly

 - The usual cleanups and fixes, left and right

* tag 'x86_misc_for_v6.11_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/kmsan: Fix hook for unaligned accesses
  x86/platform/iosf_mbi: Convert PCIBIOS_* return codes to errnos
  x86/pci/xen: Fix PCIBIOS_* return code handling
  x86/pci/intel_mid_pci: Fix PCIBIOS_* return code handling
  x86/of: Return consistent error type from x86_of_pci_irq_enable()
  hwmon: (k10temp) Rename _data variable
  hwmon: (k10temp) Remove unused HAVE_TDIE() macro
  hwmon: (k10temp) Reduce k10temp_get_ccd_support() parameters
  hwmon: (k10temp) Define a helper function to read CCD temperature
  x86/amd_nb: Enhance SMN access error checking
  hwmon: (k10temp) Check return value of amd_smn_read()
  EDAC/amd64: Check return value of amd_smn_read()
  EDAC/amd64: Remove unused register accesses
  tools/x86/kcpuid: Add missing dir via Makefile
  x86, arm: Add missing license tag to syscall tables files
2024-07-15 19:53:07 -07:00
Linus Torvalds
8028e290b6 - The AMD memory controllers data fabric version 4.5 supports
non-power-of-2 denormalization in the sense that certain bits of the
   system physical address cannot be reconstructed from the normalized
   address reported by the RAS hardware. Add support for handling such
   addresses
 
 - Switch the EDAC drivers to the new Intel CPU model defines
 
 - The usual fixes and cleanups all over the place
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmaU9aQACgkQEsHwGGHe
 VUpkKQ//eWbeC4JosmRohUECE7MtZppAJ7iX7I7DbQkpKAjdeN4qnPESIQleFN9o
 qg7CYkLRUOi8sYJ3MKmIG5l+yxgztKZl7EvzfAaKiCPDt2EK0DDLmhO3VTE1muTn
 bYo3kk0HpxCVFfuWxmDCu36CC11wkGmjUo5k6XCE5L4hFlywvVwrktc55jQWsbWk
 Kc5iAJxxSc+C8/7oTjqnYuARNl/6Fl4S376GYoxHXzlZI8VoFLO/sW20fz7gQjZg
 n/y25CEHki/K9y+bU8Gsexcwhd0jbU02HYtKQI7klcDqyamm8IlmLcTEXZ6Ozlhg
 C/dYs2FI9vi6V8B3f8tGHSA3jZgFmcU0OJV9Zl1Pr/ORax9+nbhfxyJbYgp/SgT5
 1so5d3iqM2vD+UHnyld0WftVO/HxurhhKPgfCHvcagQnseFwNNqSKGUuwcJ33RCs
 iUMBtwmupJL4nAoF+7ZskYbT2zTUduxgCjRiw0ok3h/mxZ+HvmPne5T8y1c1nzUC
 +GJbPmprLhKhxKaBrd8w2vrWZHb3X0OccZzfyoS/Eiy0VTdZsVGZfhFEYHvRxYHA
 rpM2ex0HrrI3RwrGRmp80PJjMVdGTVbue9yWRBN7LTyBmB+GkUPzCnGpFzyxibNe
 iKnwwUjIzhZ48ImImbiCcVA+VMUHSqvLvBMEeYD3nyrZO1x9OKI=
 =kLNX
 -----END PGP SIGNATURE-----

Merge tag 'edac_updates_for_v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras

Pull EDAC updates from Borislav Petkov:

 - The AMD memory controllers data fabric version 4.5 supports
   non-power-of-2 denormalization in the sense that certain bits of the
   system physical address cannot be reconstructed from the normalized
   address reported by the RAS hardware. Add support for handling such
   addresses

 - Switch the EDAC drivers to the new Intel CPU model defines

 - The usual fixes and cleanups all over the place

* tag 'edac_updates_for_v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
  EDAC: Add missing MODULE_DESCRIPTION() macros
  EDAC/dmc520: Use devm_platform_ioremap_resource()
  EDAC/igen6: Add Intel Arrow Lake-U/H SoCs support
  RAS/AMD/FMPM: Use atl internal.h for INVALID_SPA
  RAS/AMD/ATL: Implement DF 4.5 NP2 denormalization
  RAS/AMD/ATL: Validate address map when information is gathered
  RAS/AMD/ATL: Expand helpers for adding and removing base and hole
  RAS/AMD/ATL: Read DRAM hole base early
  RAS/AMD/ATL: Add amd_atl pr_fmt() prefix
  RAS/AMD/ATL: Add a missing module description
  EDAC, i10nm: make skx_common.o a separate module
  EDAC/skx: Switch to new Intel CPU model defines
  EDAC/sb_edac: Switch to new Intel CPU model defines
  EDAC, pnd2: Switch to new Intel CPU model defines
  EDAC/i10nm: Switch to new Intel CPU model defines
  EDAC/ghes: Add missing newline to pr_info() statement
  RAS/AMD/ATL: Add missing newline to pr_info() statement
  EDAC/thunderx: Remove unused struct error_syndrome
2024-07-15 18:20:24 -07:00
Jeff Johnson
3afa157f43 EDAC: Add missing MODULE_DESCRIPTION() macros
With ARCH=arm64

  make allmodconfig && make W=1 C=1

reports:

  WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/edac/layerscape_edac_mod.o

Add the missing invocation of the MODULE_DESCRIPTION() macro to all
files which have a MODULE_LICENSE().

This includes mpc85xx_edac.c and four octeon_edac-*.c files which,
although they did not produce a warning with the arm64 allmodconfig
configuration, may cause this warning with other configurations.

  [ bp: s/module/driver/ for layerscape_edac ]

Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20240617-md-arm64-drivers-edac-v2-1-6d6c5dd1e5da@quicinc.com
2024-06-29 16:21:01 +02:00
Jai Arora
420c324d59 EDAC/dmc520: Use devm_platform_ioremap_resource()
platform_get_resource() and devm_ioremap_resource() are wrapped up in the
devm_platform_ioremap_resource() helper. Use the helper and get rid of the
local variable for struct resource *.

Signed-off-by: Jai Arora <jai.arora@samsung.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20240618110226.97395-1-jai.arora@samsung.com
2024-06-23 10:48:55 +02:00
Qiuxu Zhuo
88150cd950 EDAC/igen6: Add Intel Arrow Lake-U/H SoCs support
Arrow Lake-U/H SoCs share same IBECC registers with Meteor Lake-P
SoCs. Add Arrow Lake-U/H SoC compute die IDs for EDAC support.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20240614030354.69180-1-qiuxu.zhuo@intel.com
2024-06-14 08:08:12 -07:00
Yazen Ghannam
5ac6293047 EDAC/amd64: Check return value of amd_smn_read()
Check the return value of amd_smn_read() before saving a value. This
ensures invalid values aren't saved. The struct umc instance is
initialized to 0 during memory allocation. Therefore, a bad read will
keep the value as 0 providing the expected Read-as-Zero behavior.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20240606-fix-smn-bad-read-v4-2-ffde21931c3f@amd.com
2024-06-12 11:33:45 +02:00
Yazen Ghannam
f97a8b9170 EDAC/amd64: Remove unused register accesses
A number of UMC registers are read only for the purpose of debug printing. They
are not used in any calculations. Nor do they have any specific debug value.

Remove them.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20240606-fix-smn-bad-read-v4-1-ffde21931c3f@amd.com
2024-06-12 11:33:45 +02:00
Ilpo Järvinen
f8367a74ae EDAC/igen6: Convert PCIBIOS_* return codes to errnos
errcmd_enable_error_reporting() uses pci_{read,write}_config_word()
that return PCIBIOS_* codes. The return code is then returned all the
way into the probe function igen6_probe() that returns it as is. The
probe functions, however, should return normal errnos.

Convert PCIBIOS_* returns code using pcibios_err_to_errno() into normal
errno before returning it from errcmd_enable_error_reporting().

Fixes: 10590a9d4f ("EDAC/igen6: Add EDAC driver for Intel client SoCs using IBECC")
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240527132236.13875-2-ilpo.jarvinen@linux.intel.com
2024-06-04 11:29:52 +02:00
Ilpo Järvinen
3ec8ebd8a5 EDAC/amd64: Convert PCIBIOS_* return codes to errnos
gpu_get_node_map() uses pci_read_config_dword() that returns PCIBIOS_*
codes. The return code is then returned all the way into the module
init function amd64_edac_init() that returns it as is. The module init
functions, however, should return normal errnos.

Convert PCIBIOS_* returns code using pcibios_err_to_errno() into normal
errno before returning it from gpu_get_node_map().

For consistency, convert also the other similar cases which return
PCIBIOS_* codes even if they do not have any bugs at the moment.

Fixes: 4251566ebc ("EDAC/amd64: Cache and use GPU node map")
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240527132236.13875-1-ilpo.jarvinen@linux.intel.com
2024-06-04 11:24:16 +02:00
Arnd Bergmann
123b158635 EDAC, i10nm: make skx_common.o a separate module
Commit 598afa0504 ("kbuild: warn objects shared among multiple modules")
was added to track down cases where the same object is linked into
multiple modules. This can cause serious problems if some modules are
builtin while others are not.

That test triggers this warning:

scripts/Makefile.build:236: drivers/edac/Makefile: skx_common.o is added to multiple modules: i10nm_edac skx_edac

Make this a separate module instead.

[Tony: Added more background details to commit message]

Fixes: d4dc89d069 ("EDAC, i10nm: Add a driver for Intel 10nm server processors")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/all/20240529095132.1929397-1-arnd@kernel.org/
2024-05-29 13:30:10 -07:00
Tony Luck
c2c887e9f9 EDAC/skx: Switch to new Intel CPU model defines
New CPU #defines encode vendor and family as well as model.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20240520224620.9480-39-tony.luck@intel.com
2024-05-28 16:04:44 -07:00
Tony Luck
9593189cf0 EDAC/sb_edac: Switch to new Intel CPU model defines
New CPU #defines encode vendor and family as well as model.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20240520224620.9480-38-tony.luck@intel.com
2024-05-28 16:04:17 -07:00
Tony Luck
e09d576c86 EDAC, pnd2: Switch to new Intel CPU model defines
New CPU #defines encode vendor and family as well as model.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20240520224620.9480-37-tony.luck@intel.com
2024-05-28 16:03:43 -07:00
Tony Luck
bc39bfbaa2 EDAC/i10nm: Switch to new Intel CPU model defines
New CPU #defines encode vendor and family as well as model.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20240520224620.9480-36-tony.luck@intel.com
2024-05-28 16:02:44 -07:00
Vasyl Gomonovych
e6f53274c0 EDAC/ghes: Add missing newline to pr_info() statement
Add a missing newline character even if printk() adds newlines to
non-\n-terminated strings because in the unlikely case a KERN_CONT print
statement is added after the unterminated statement, the two will get
glued together which is not the expected behavior.

[ bp: Rewrite commit message. ]

Signed-off-by: Vasyl Gomonovych <gomonovych@gmail.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20240517204951.2019031-1-gomonovych@gmail.com
2024-05-28 16:13:09 +02:00
Dr. David Alan Gilbert
9aa31612d9 EDAC/thunderx: Remove unused struct error_syndrome
struct error_syndrome appears never to have been used. Remove it,
together with the MAX_SYNDROME_REGS it used.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20240516133404.251397-1-linux@treblig.org
2024-05-27 14:42:04 +02:00
Linus Torvalds
eba77c0477 - Have skx_edac decode error addresses belonging to SGX properly
- Remove a bunch of unused struct members
 
 - Other cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmZB1k0ACgkQEsHwGGHe
 VUryTQ/8DuVnwHwPcRMrQmge6x2ZPuKZ73RBuFrDqnAcJdNau6YTzd1Iav2r5DE3
 Op6ubfUT1RJv7pmE5Q8EBZ5qoWJQ3SnIifFXT8HDqg2iqTlibXS7NUJCxHzeOzTs
 Z+YgAU618x18IZ0j+Dq55U7yUtvQTviwY8FkO+D+mr/4TFt7w6zCfKNomZm5sDi8
 3RfQD10OGVAlDBFdHVziKyhj82dNyQ20OMLrQ0RnhSG6D2e/3+gB88t9SaM0yiJ2
 ogWHlGiB9vLQEnGuru9+HXahHqJd0DZQPJc5ygO4EufNTpuDWFctm5zGzNcnk1rz
 tMvvyaN8ix7KTo5a9gWRqb5ElW7dDHJkM86z/uvGsNhD1DjVGZl5VUwgJp+sSuL6
 oepW1t6zqmNw81OgiZuhvWWk99HPEDQT2u1zxzmTkjXKEa2cY6Ju1KpzPxNECD4Y
 WwJPyUZhUsdEJ8+oQdZT2MzG3enAE/CxGlxcDEKbZU6WL19N+ofDiWYgMJaLLmW4
 5k0zejE6GMgts6seFNu7NfEAVieaT7proar0GPdi4WR+oERrlEDExyzkNPyUHShR
 H+Q7tlEQlKQQdApoa4H6WuKiSpPZtxkRgOW5W7AE4LHEvd3MGzT5PC+qu/s/vo/H
 uzL9rCnYjBdKNwNEg0bpWGXHk/hXIRJcXWdPtcIXMP5w+vTSIY4=
 =SVW7
 -----END PGP SIGNATURE-----

Merge tag 'edac_updates_for_v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras

Pull EDAC updates from Borislav Petkov:

 - Have skx_edac decode error addresses belonging to SGX properly

 - Remove a bunch of unused struct members

 - Other cleanups

* tag 'edac_updates_for_v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
  EDAC/skx_common: Allow decoding of SGX addresses
  EDAC/mc_sysfs: Convert sprintf()/snprintf() to sysfs_emit()
  EDAC: Remove unused struct members
  EDAC: Remove dynamic attributes from edac_device_alloc_ctl_info()
  EDAC/device: Remove edac_dev_sysfs_block_attribute::store()
  EDAC/device: Remove edac_dev_sysfs_block_attribute::{block,value}
  EDAC/amd64: Remove unused struct member amd64_pvt::ext_nbcfg
2024-05-14 08:31:10 -07:00
Serge Semin
591c946675 EDAC/synopsys: Fix ECC status and IRQ control race condition
The race condition around the ECCCLR register access happens in the IRQ
disable method called in the device remove() procedure and in the ECC IRQ
handler:

  1. Enable IRQ:
     a. ECCCLR = EN_CE | EN_UE
  2. Disable IRQ:
     a. ECCCLR = 0
  3. IRQ handler:
     a. ECCCLR = CLR_CE | CLR_CE_CNT | CLR_CE | CLR_CE_CNT
     b. ECCCLR = 0
     c. ECCCLR = EN_CE | EN_UE

So if the IRQ disabling procedure is called concurrently with the IRQ
handler method the IRQ might be actually left enabled due to the
statement 3c.

The root cause of the problem is that ECCCLR register (which since
v3.10a has been called as ECCCTL) has intermixed ECC status data clear
flags and the IRQ enable/disable flags. Thus the IRQ disabling (clear EN
flags) and handling (write 1 to clear ECC status data) procedures must
be serialised around the ECCCTL register modification to prevent the
race.

So fix the problem described above by adding the spin-lock around the
ECCCLR modifications and preventing the IRQ-handler from modifying the
IRQs enable flags (there is no point in disabling the IRQ and then
re-enabling it again within a single IRQ handler call, see the
statements 3a/3b and 3c above).

Fixes: f7824ded41 ("EDAC/synopsys: Add support for version 3 of the Synopsys EDAC DDR")
Signed-off-by: Serge Semin <fancer.lancer@gmail.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20240222181324.28242-2-fancer.lancer@gmail.com
2024-05-06 14:19:07 +02:00
Shubhrajyoti Datta
1a24733e80 EDAC/versal: Do not log total error counts
When logging errors, the driver currently logs the total error count.
However, it should log the current error only. Fix it.

  [ bp: Rewrite text. ]

Fixes: 6f15b178cd ("EDAC/versal: Add a Xilinx Versal memory controller driver")
Signed-off-by: Shubhrajyoti Datta <shubhrajyoti.datta@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20240425121942.26378-4-shubhrajyoti.datta@amd.com
2024-04-25 18:08:05 +02:00
Shubhrajyoti Datta
de87ba848d EDAC/versal: Check user-supplied data before injecting an error
The function inject_data_ue_store() lacks a NULL check for the user
passed values. To prevent below kernel crash include a NULL check.

Call trace:

  kstrtoull
  kstrtou8
  inject_data_ue_store
  full_proxy_write
  vfs_write
  ksys_write
  __arm64_sys_write
  invoke_syscall
  el0_svc_common.constprop.0
  do_el0_svc
  el0_svc
  el0t_64_sync_handler
  el0t_64_sync

Fixes: 83bf24051a ("EDAC/versal: Make the bit position of injected errors configurable")
Signed-off-by: Shubhrajyoti Datta <shubhrajyoti.datta@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20240425121942.26378-3-shubhrajyoti.datta@amd.com
2024-04-25 18:04:47 +02:00
Shubhrajyoti Datta
edbe59428e EDAC/versal: Do not register for NOC errors
The NOC errors are not handled in the driver. Remove the request for
registration.

Signed-off-by: Shubhrajyoti Datta <shubhrajyoti.datta@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20240425121942.26378-2-shubhrajyoti.datta@amd.com
2024-04-25 18:04:14 +02:00
Qiuxu Zhuo
e0d3350778 EDAC/skx_common: Allow decoding of SGX addresses
There are no "struct page" associations with SGX pages, causing the check
pfn_to_online_page() to fail. This results in the inability to decode the
SGX addresses and warning messages like:

  Invalid address 0x34cc9a98840 in IA32_MC17_ADDR

Add an additional check to allow the decoding of the error address and to
skip the warning message, if the error address is an SGX address.

Fixes: 1e92af09fa ("EDAC/skx_common: Filter out the invalid address")
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20240408120419.50234-1-qiuxu.zhuo@intel.com
2024-04-08 09:49:45 -07:00
Li Zhijian
d7518ad4ed EDAC/mc_sysfs: Convert sprintf()/snprintf() to sysfs_emit()
Per Documentation/filesystems/sysfs.rst, show() should only use
sysfs_emit() or sysfs_emit_at() when formatting the value to be returned
to user space.

Generated by:

  make coccicheck M=<path/to/file> MODE=patch \
    COCCI=scripts/coccinelle/api/device_attr_show.cocci

No functional change intended.

  [ bp: Massage. ]

Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20240314084628.1322006-1-lizhijian@fujitsu.com
2024-04-04 18:23:57 +02:00
Jiri Slaby (SUSE)
c8d37084e9 EDAC: Remove unused struct members
Remove unused

- edac_pci_ctl_info::edac_subsys
- edac_pci_ctl_info::complete
- edac_device_ctl_info::removal_complete

members.

Found by https://github.com/jirislaby/clang-struct.

  [ bp: Squash three almost identical trivial patches into one. ]

Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20240213112051.27715-6-jirislaby@kernel.org
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2024-03-27 18:26:58 +01:00
Jiri Slaby (SUSE)
48bc8869c5 EDAC: Remove dynamic attributes from edac_device_alloc_ctl_info()
Dynamic attributes are not passed from any caller of
edac_device_alloc_ctl_info(). Drop this unused/untested functionality
completely.

Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20240213112051.27715-5-jirislaby@kernel.org
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2024-03-27 18:26:58 +01:00
Jiri Slaby (SUSE)
9186695ef7 EDAC/device: Remove edac_dev_sysfs_block_attribute::store()
No one uses this store hook (both BLOCK_ATTR() pass NULL). It actually
never was since its addition in

  fd309a9d8e ("drivers/edac: fix leaf sysfs attribute")

so drop it.

Found by https://github.com/jirislaby/clang-struct.

Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20240213112051.27715-4-jirislaby@kernel.org
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2024-03-27 18:26:57 +01:00
Jiri Slaby (SUSE)
3667a35a50 EDAC/device: Remove edac_dev_sysfs_block_attribute::{block,value}
They're unused. And they were never used since their addition in

  fd309a9d8e ("drivers/edac: fix leaf sysfs attribute")

Drop it.

Found by https://github.com/jirislaby/clang-struct.

Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20240213112051.27715-3-jirislaby@kernel.org
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2024-03-27 18:26:57 +01:00
Jiri Slaby (SUSE)
f5ca0d5156 EDAC/amd64: Remove unused struct member amd64_pvt::ext_nbcfg
Commit

  cfe40fdb4a ("amd64_edac: add driver header")

added amd64_pvt struct with ext_nbcfg in it. But no one used that member
since then.

Therefore, remove it.

Found by https://github.com/jirislaby/clang-struct.

Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Link: https://lore.kernel.org/r/20240213112051.27715-2-jirislaby@kernel.org
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2024-03-27 18:26:57 +01:00
Linus Torvalds
b0402403e5 - Add a FRU (Field Replaceable Unit) memory poison manager which
collects and manages previously encountered hw errors in order to
    save them to persistent storage across reboots. Previously recorded
    errors are "replayed" upon reboot in order to poison memory which has
    caused said errors in the past.
 
    The main use case is stacked, on-chip memory which cannot simply be
    replaced so poisoning faulty areas of it and thus making them
    inaccessible is the only strategy to prolong its lifetime.
 
  - Add an AMD address translation library glue which converts the
    reported addresses of hw errors into system physical addresses in
    order to be used by other subsystems like memory failure, for
    example. Add support for MI300 accelerators to that library.
 
  - igen6: Add support for Alder Lake-N SoC
 
  - i10nm: Add Grand Ridge support
 
  - The usual fixlets and cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmXvKHcACgkQEsHwGGHe
 VUo4Lg/+OwXDI1EaCDyaHJ+f6JRmNok1EGjKMVjpp71/XmE3eUjiXfCv/b0bwl3V
 oIXGlXpJ5RSME+9aFDWADaE3h5zAGzTwQXuKtOUQPiJ6UuCebXodm8SaIG8V8trG
 yaW/hhP98AoJD+fN6qzv4XWYvTG8VRQs4tdISg9FXiljTjv4mKA+sxuCu8KpfrDh
 Tg+9F4Rre6gyR5GaB6N7Cc0k97DM7n5yKBZZGKucv+oYzDyf6n631ZSJ2zA9NC51
 CJlux917hCXI/IWrCQ2nkyfPPXxn8AaznUAA30wKgwlt8TFSdKTW+DvRA2zyuAU3
 0UDHO4FezOKuzVnWkzdnKsIMAnDyTGOz3Fi2LU4mC+JHaHHmI2quSWDxp5phWBuy
 S+T3XHxpbSsLGEI7zxT5F9u1oAlCvYu1C7HJw+yxNSn2iCy5LoNo0H/kl/nhR8Xr
 FgVp8SYgQRU2Pp8vgGOibMYY/TAHX55EticKdxvBI0yY+iqoJyAbZ0fb0XyLNc7s
 GqoWfvrK1KQzf5/Ya1Mm//0/QTPyFmJwujMJ2eEnMRRER+23bYpGvVBBT8E1sG9s
 gqEJkKjmVCPt9xJTcivm96sLJ7CG36w8+r/axSqpKXdcvDG9ec8G8PRqjlo5pcvh
 gYevmCBIcKny1xuhALwD6Rn2mkPip7araycDx9X9nd5z1qCxBaU=
 =FR2l
 -----END PGP SIGNATURE-----

Merge tag 'edac_updates_for_v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras

Pull EDAC updates from Borislav Petkov:

 - Add a FRU (Field Replaceable Unit) memory poison manager which
   collects and manages previously encountered hw errors in order to
   save them to persistent storage across reboots. Previously recorded
   errors are "replayed" upon reboot in order to poison memory which has
   caused said errors in the past.

   The main use case is stacked, on-chip memory which cannot simply be
   replaced so poisoning faulty areas of it and thus making them
   inaccessible is the only strategy to prolong its lifetime.

 - Add an AMD address translation library glue which converts the
   reported addresses of hw errors into system physical addresses in
   order to be used by other subsystems like memory failure, for
   example. Add support for MI300 accelerators to that library.

 - igen6: Add support for Alder Lake-N SoC

 - i10nm: Add Grand Ridge support

 - The usual fixlets and cleanups

* tag 'edac_updates_for_v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
  EDAC/versal: Convert to platform remove callback returning void
  RAS/AMD/FMPM: Fix off by one when unwinding on error
  RAS/AMD/FMPM: Add debugfs interface to print record entries
  RAS/AMD/FMPM: Save SPA values
  RAS: Export helper to get ras_debugfs_dir
  RAS/AMD/ATL: Fix bit overflow in denorm_addr_df4_np2()
  RAS: Introduce a FRU memory poison manager
  RAS/AMD/ATL: Add MI300 row retirement support
  Documentation: Move RAS section to admin-guide
  EDAC/versal: Make the bit position of injected errors configurable
  EDAC/i10nm: Add Intel Grand Ridge micro-server support
  EDAC/igen6: Add one more Intel Alder Lake-N SoC support
  RAS/AMD/ATL: Add MI300 DRAM to normalized address translation support
  RAS/AMD/ATL: Fix array overflow in get_logical_coh_st_fabric_id_mi300()
  RAS/AMD/ATL: Add MI300 support
  Documentation: RAS: Add index and address translation section
  EDAC/amd64: Use new AMD Address Translation Library
  RAS: Introduce AMD Address Translation Library
  EDAC/synopsys: Convert to devm_platform_ioremap_resource()
2024-03-11 18:14:06 -07:00
Borislav Petkov (AMD)
af65545a0f Merge remote-tracking branches 'ras/edac-drivers', 'ras/edac-misc' and 'ras/edac-amd-atl' into edac-updates-for-v6.9
* ras/edac-drivers:
  EDAC/i10nm: Add Intel Grand Ridge micro-server support
  EDAC/igen6: Add one more Intel Alder Lake-N SoC support

* ras/edac-misc:
  EDAC/versal: Convert to platform remove callback returning void
  EDAC/versal: Make the bit position of injected errors configurable
  EDAC/synopsys: Convert to devm_platform_ioremap_resource()

* ras/edac-amd-atl:
  RAS/AMD/FMPM: Fix off by one when unwinding on error
  RAS/AMD/FMPM: Add debugfs interface to print record entries
  RAS/AMD/FMPM: Save SPA values
  RAS: Export helper to get ras_debugfs_dir
  RAS/AMD/ATL: Fix bit overflow in denorm_addr_df4_np2()
  RAS: Introduce a FRU memory poison manager
  RAS/AMD/ATL: Add MI300 row retirement support
  Documentation: Move RAS section to admin-guide
  RAS/AMD/ATL: Add MI300 DRAM to normalized address translation support
  RAS/AMD/ATL: Fix array overflow in get_logical_coh_st_fabric_id_mi300()
  RAS/AMD/ATL: Add MI300 support
  Documentation: RAS: Add index and address translation section
  EDAC/amd64: Use new AMD Address Translation Library
  RAS: Introduce AMD Address Translation Library

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2024-03-11 16:24:20 +01:00
Uwe Kleine-König
4527a2194e EDAC/versal: Convert to platform remove callback returning void
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.

To improve this, there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new(), which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().

Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Shubhrajyoti Datta <shubhrajyoti.datta@amd.com>
Link: https://lore.kernel.org/r/83deca1ce260f7e17ff3cb106c9a6946d4ca4505.1709886922.git.u.kleine-koenig@pengutronix.de
2024-03-08 13:57:49 +01:00
Thomas Gleixner
7e3ec62867 x86/cpu/amd: Provide a separate accessor for Node ID
AMD (ab)uses topology_die_id() to store the Node ID information and
topology_max_dies_per_pkg to store the number of nodes per package.

This collides with the proper processor die level enumeration which is
coming on AMD with CPUID 8000_0026, unless there is a correlation between
the two. There is zero documentation about that.

So provide new storage and new accessors which for now still access die_id
and topology_max_die_per_pkg(). Will be mopped up after AMD and HYGON are
converted over.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Wang Wendy <wendy.wang@intel.com>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://lore.kernel.org/r/20240212153624.956116738@linutronix.de
2024-02-15 22:07:37 +01:00
Shubhrajyoti Datta
83bf24051a EDAC/versal: Make the bit position of injected errors configurable
Currently, the bit positions to inject correctable and uncorrectable
errors are hardcoded. To make that configurable add separate sysfs entries
to set the bit positions for injecting CE and UE errors. Allow for
single bit error for CE and two bits errors for UE injection.

  [ bp: Massage. ]

Signed-off-by: Shubhrajyoti Datta <shubhrajyoti.datta@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20240208094653.11704-1-shubhrajyoti.datta@amd.com
2024-02-14 08:57:35 +01:00
Qiuxu Zhuo
e77086c375 EDAC/i10nm: Add Intel Grand Ridge micro-server support
The Grand Ridge CPU model uses similar memory controller registers with
Granite Rapids server. Add Grand Ridge CPU model ID for EDAC support.

Tested-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20240129062040.60809-3-qiuxu.zhuo@intel.com
2024-02-01 12:36:50 -08:00
Lili Li
65c441ec58 EDAC/igen6: Add one more Intel Alder Lake-N SoC support
Add a new Intel Alder Lake-N SoC compute die ID for EDAC support.

Signed-off-by: Lili Li <lili.li@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Link: https://lore.kernel.org/r/20240129062040.60809-2-qiuxu.zhuo@intel.com
2024-02-01 12:36:50 -08:00
Yazen Ghannam
6c9058f490 EDAC/amd64: Use new AMD Address Translation Library
Remove old address translation code and use the new AMD Address
Translation Library.

Use "imply" in Kconfig so that the "AMD_ATL" config option takes the
value of "EDAC_AMD64" as its default.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20240123041401.79812-3-yazen.ghannam@amd.com
2024-01-24 12:55:00 +01:00
Yangtao Li
b57c1a1e7e EDAC/synopsys: Convert to devm_platform_ioremap_resource()
Use devm_platform_ioremap_resource() to simplify code.

Signed-off-by: Yangtao Li <frank.li@vivo.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Michal Simek <michal.simek@amd.com>
Link: https://lore.kernel.org/r/20230704101811.49637-3-frank.li@vivo.com
2024-01-23 14:32:41 +01:00
Linus Torvalds
80955ae955 Driver core changes for 6.8-rc1
Here are the set of driver core and kernfs changes for 6.8-rc1.  Nothing
 major in here this release cycle, just lots of small cleanups and some
 tweaks on kernfs that in the very end, got reverted and will come back
 in a safer way next release cycle.
 
 Included in here are:
   - more driver core 'const' cleanups and fixes
   - fw_devlink=rpm is now the default behavior
   - kernfs tiny changes to remove some string functions
   - cpu handling in the driver core is updated to work better on many
     systems that add topologies and cpus after booting
   - other minor changes and cleanups
 
 All of the cpu handling patches have been acked by the respective
 maintainers and are coming in here in one series.  Everything has been
 in linux-next for a while with no reported issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZaeOrg8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ymtcwCffzvKKkSY9qAp6+0v2WQNkZm1JWoAoJCPYUwF
 If6wEoPLWvRfKx4gIoq9
 =D96r
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updates from Greg KH:
 "Here are the set of driver core and kernfs changes for 6.8-rc1.
  Nothing major in here this release cycle, just lots of small cleanups
  and some tweaks on kernfs that in the very end, got reverted and will
  come back in a safer way next release cycle.

  Included in here are:

   - more driver core 'const' cleanups and fixes

   - fw_devlink=rpm is now the default behavior

   - kernfs tiny changes to remove some string functions

   - cpu handling in the driver core is updated to work better on many
     systems that add topologies and cpus after booting

   - other minor changes and cleanups

  All of the cpu handling patches have been acked by the respective
  maintainers and are coming in here in one series. Everything has been
  in linux-next for a while with no reported issues"

* tag 'driver-core-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (51 commits)
  Revert "kernfs: convert kernfs_idr_lock to an irq safe raw spinlock"
  kernfs: convert kernfs_idr_lock to an irq safe raw spinlock
  class: fix use-after-free in class_register()
  PM: clk: make pm_clk_add_notifier() take a const pointer
  EDAC: constantify the struct bus_type usage
  kernfs: fix reference to renamed function
  driver core: device.h: fix Excess kernel-doc description warning
  driver core: class: fix Excess kernel-doc description warning
  driver core: mark remaining local bus_type variables as const
  driver core: container: make container_subsys const
  driver core: bus: constantify subsys_register() calls
  driver core: bus: make bus_sort_breadthfirst() take a const pointer
  kernfs: d_obtain_alias(NULL) will do the right thing...
  driver core: Better advertise dev_err_probe()
  kernfs: Convert kernfs_path_from_node_locked() from strlcpy() to strscpy()
  kernfs: Convert kernfs_name_locked() from strlcpy() to strscpy()
  kernfs: Convert kernfs_walk_ns() from strlcpy() to strscpy()
  initramfs: Expose retained initrd as sysfs file
  fs/kernfs/dir: obey S_ISGID
  kernel/cgroup: use kernfs_create_dir_ns()
  ...
2024-01-18 09:48:40 -08:00
Linus Torvalds
296455ade1 Char/Misc and other Driver changes for 6.8-rc1
Here is the big set of char/misc and other driver subsystem changes for
 6.8-rc1.  Lots of stuff in here, but first off, you will get a merge
 conflict in drivers/android/binder_alloc.c when merging this tree due to
 changing coming in through the -mm tree.
 
 The resolution of the merge issue can be found here:
 	https://lore.kernel.org/r/20231207134213.25631ae9@canb.auug.org.au
 or in a simpler patch form in that thread:
 	https://lore.kernel.org/r/ZXHzooF07LfQQYiE@google.com
 
 If there are issues with the merge of this file, please let me know.
 
 Other than lots of binder driver changes (as you can see by the merge
 conflicts) included in here are:
  - lots of iio driver updates and additions
  - spmi driver updates
  - eeprom driver updates
  - firmware driver updates
  - ocxl driver updates
  - mhi driver updates
  - w1 driver updates
  - nvmem driver updates
  - coresight driver updates
  - platform driver remove callback api changes
  - tags.sh script updates
  - bus_type constant marking cleanups
  - lots of other small driver updates
 
 All of these have been in linux-next for a while with no reported issues
 (other than the binder merge conflict.)
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZaeMMQ8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ynWNgCfQ/Yz7QO6EMLDwHO5LRsb3YMhjL4AoNVdanjP
 YoI7f1I4GBcC0GKNfK6s
 =+Kyv
 -----END PGP SIGNATURE-----

Merge tag 'char-misc-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc and other driver updates from Greg KH:
 "Here is the big set of char/misc and other driver subsystem changes
  for 6.8-rc1.

  Other than lots of binder driver changes (as you can see by the merge
  conflicts) included in here are:

   - lots of iio driver updates and additions

   - spmi driver updates

   - eeprom driver updates

   - firmware driver updates

   - ocxl driver updates

   - mhi driver updates

   - w1 driver updates

   - nvmem driver updates

   - coresight driver updates

   - platform driver remove callback api changes

   - tags.sh script updates

   - bus_type constant marking cleanups

   - lots of other small driver updates

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'char-misc-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (341 commits)
  android: removed duplicate linux/errno
  uio: Fix use-after-free in uio_open
  drivers: soc: xilinx: add check for platform
  firmware: xilinx: Export function to use in other module
  scripts/tags.sh: remove find_sources
  scripts/tags.sh: use -n to test archinclude
  scripts/tags.sh: add local annotation
  scripts/tags.sh: use more portable -path instead of -wholename
  scripts/tags.sh: Update comment (addition of gtags)
  firmware: zynqmp: Convert to platform remove callback returning void
  firmware: turris-mox-rwtm: Convert to platform remove callback returning void
  firmware: stratix10-svc: Convert to platform remove callback returning void
  firmware: stratix10-rsu: Convert to platform remove callback returning void
  firmware: raspberrypi: Convert to platform remove callback returning void
  firmware: qemu_fw_cfg: Convert to platform remove callback returning void
  firmware: mtk-adsp-ipc: Convert to platform remove callback returning void
  firmware: imx-dsp: Convert to platform remove callback returning void
  firmware: coreboot_table: Convert to platform remove callback returning void
  firmware: arm_scpi: Convert to platform remove callback returning void
  firmware: arm_scmi: Convert to platform remove callback returning void
  ...
2024-01-17 16:47:17 -08:00
Linus Torvalds
3edbe8afb6 - Convert the hw error storm handling into a finer-grained, per-bank
solution which allows for more timely detection and reporting of
   errors
 
 - Start a documentation section which will hold down relevant
   RAS features description and how they should be used
 
 - Add new AMD error bank types
 
 - Slim down and remove error type descriptions from the kernel side of
   error decoding to rasdaemon which can be used from now on to decode
   hw errors on AMD
 
 - Mark pages containing uncorrectable errors as poison so that kdump can
   avoid them and thus not cause another panic
 
 - The usual cleanups and fixlets
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmWamqkACgkQEsHwGGHe
 VUrkMw/+Jv//z5pKFMXF2GlI/Xefh8pXDB57D22B1T6zNJm+Qq1b58U44WpxoU24
 b9gPtkgxjzA3JwoQG+cBDkCSs1xLV3McUS0UoEqQ28QvFN+mzxYKu4ura3F+rcZG
 PiCT4gPEYgZirYl0PKYeypnBPq3Krx/RYeTUE0vQ9HtmeBCmH71X1egyB4TqFbFU
 7ui9aITLAnLBwgO6On1qviGPvJoEDGAsQ656XoY0Js+8dUeYwAI4qpaaUwtXUo1D
 ARGzss55/qTRo2G+IkDDhbJ8e8G+eX22oa0n1tdhNFBqfYAN6gM+t4NiFlNnn+18
 nlbaSZfDygciE8DVjEnkVIrRJtkq7uj0dk7LvnqEI2y7J0LybHojC3hDKrqYLK3o
 PRgfPwmykOCwZQldGFYbShvmY8KoEQc/V9OWi/+A/M/uTJsForQmHn78Z2YkO9kG
 K6VaLuYszSCqz47wf56pHBwtMrivEPmkcxaz9ErkK90vM/NmV7kfLl+R8IK8apNJ
 cJkuLBjfgGpBP+AlpXGl9OE0lRJK5MCbEBlbGBBl58REBaB4DNkM4QHItrUSRR82
 ADLcfLAIRWT8UwwXieDbWF1jb+4L+IJnXCKGCCQ7+eYxcFI9V9TABD2B8io+Dzvz
 evZwLCPKmjuPc2CMcDu/eUdBKLNTn3QAoN/NLcVmzJ23lguQW/M=
 =o3UM
 -----END PGP SIGNATURE-----

Merge tag 'ras_core_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 RAS updates from Borislav Petkov:

 - Convert the hw error storm handling into a finer-grained, per-bank
   solution which allows for more timely detection and reporting of
   errors

 - Start a documentation section which will hold down relevant RAS
   features description and how they should be used

 - Add new AMD error bank types

 - Slim down and remove error type descriptions from the kernel side of
   error decoding to rasdaemon which can be used from now on to decode
   hw errors on AMD

 - Mark pages containing uncorrectable errors as poison so that kdump
   can avoid them and thus not cause another panic

 - The usual cleanups and fixlets

* tag 'ras_core_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce: Handle Intel threshold interrupt storms
  x86/mce: Add per-bank CMCI storm mitigation
  x86/mce: Remove old CMCI storm mitigation code
  Documentation: Begin a RAS section
  x86/MCE/AMD: Add new MA_LLC, USR_DP, and USR_CP bank types
  EDAC/mce_amd: Remove SMCA Extended Error code descriptions
  x86/mce/amd, EDAC/mce_amd: Move long names to decoder module
  x86/mce/inject: Clear test status value
  x86/mce: Remove redundant check from mce_device_create()
  x86/mce: Mark fatal MCE's page as poison to avoid panic in the kdump kernel
2024-01-08 16:03:00 -08:00
Linus Torvalds
1dee7f509d - The EDAC drivers part of the effort to make the ->remove() platform
driver callback return void
 
 - Add support for AMD AI accelerators
 
 - Add support for a number of Intel SoCs: Alder Lake-N, Raptor Lake-P,
   Meteor Lake-{P,PS}
 
 - Random fixes and cleanups all over the place
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmWZRIAACgkQEsHwGGHe
 VUpesxAAux/mgyqJ4lfzpP4I3LkDA7X9uU3Iv6j9NtS64TGvulT4ZMiSL/U/nqTY
 /jsJYNJE4yIXm2Bf/T3zx+Dxh6MmUZNEYzSgZyyNx9NPEkHi4gX+b9F2nUx9i66s
 D0Pn9IVbM1liNfGr2RSSsKwFa/5UBg+58nGF1DSUz+rOaQ0KOoeta9WE98ne4n81
 fC0o4RCzcPswy3NVhi3wPoBiAfyMYmrtvBcSezIbV6CaVJB8aFkvZK4xGKkEBFoJ
 ApVrSr6bEjJ7I7DdudisfIhLfU6iji7ZGfkwiouN2TIfVCaOZdAVan/OETpPLC6b
 IPNac1iIIl6v2OfxQp6s6b/Vq+dL+vVa4SW5tNj242s9eHunuevEcly37x4W254m
 vt91K9dvZI3I5DzAbVagw3Z9tFHbIH7uiSYyjMoyYw11EvwY+bZPXfZExvEUuPQK
 2f+VL4ELlBj0imrWSCur5hF1g8R2ejTjb+kYlTcULyaXB4ANeBVzeShff3qc/21g
 3QojVA7v+7xkPwURKxRwlqkwE113cz0Rr/kKjXhXnBOuPulX0T5eSzX8WExZzSCT
 4FsEvAwZPFt7iCTxB+cspykNxyxLqzMb8GZFG+Ji8szGNbVgiidn/QEQzohs8yBu
 aFenuLEM7XwvpPCjmLGGnRkFV9mBDfPATLegTFvMpPQVYhjPdp4=
 =Bptq
 -----END PGP SIGNATURE-----

Merge tag 'edac_updates_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras

Pull EDAC updates from Borislav Petkov:

 - The EDAC drivers part of the effort to make the ->remove() platform
   driver callback return void

 - Add support for AMD AI accelerators

 - Add support for a number of Intel SoCs: Alder Lake-N, Raptor Lake-P,
   Meteor Lake-{P,PS}

 - Random fixes and cleanups all over the place

* tag 'edac_updates_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: (39 commits)
  EDAC/skx_common: Filter out the invalid address
  EDAC, pnd2: Sort headers alphabetically
  EDAC, pnd2: Correct misleading error message in mk_region_mask()
  EDAC, pnd2: Apply bit macros and helpers where it makes sense
  EDAC, pnd2: Replace custom definition by one from sizes.h
  EDAC/igen6: Add Intel Meteor Lake-P SoCs support
  EDAC/igen6: Add Intel Meteor Lake-PS SoCs support
  EDAC/igen6: Add Intel Raptor Lake-P SoCs support
  EDAC/igen6: Add Intel Alder Lake-N SoCs support
  EDAC/igen6: Make get_mchbar() helper function
  EDAC/amd64: Add support for family 0x19, models 0x90-9f devices
  EDAC/mc: Add support for HBM3 memory type
  EDAC/{sb,i7core}_edac: Do not use a plain integer for a NULL pointer
  EDAC/armada_xp: Explicitly include correct DT includes
  EDAC/pci_sysfs: Use PCI_HEADER_TYPE_MASK instead of literals
  EDAC/thunderx: Fix possible out-of-bounds string access
  EDAC/fsl_ddr: Convert to platform remove callback returning void
  EDAC/zynqmp: Convert to platform remove callback returning void
  EDAC/xgene: Convert to platform remove callback returning void
  EDAC/ti: Convert to platform remove callback returning void
  ...
2024-01-08 13:22:30 -08:00
Jay Buddhabhatti
97d62760e4 drivers: soc: xilinx: add check for platform
Some error event IDs for Versal and Versal NET are different.
Both the platforms should access their respective error event
IDs so use sub_family_code to check for platform and check
error IDs for respective platforms. The family code is passed
via platform data to avoid platform detection again.
Platform data is setup when even driver is registered.

Signed-off-by: Jay Buddhabhatti <jay.buddhabhatti@amd.com>
Link: https://lore.kernel.org/r/20231219055025.27570-3-jay.buddhabhatti@amd.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-01-04 17:02:49 +01:00
Greg Kroah-Hartman
f36be9ce81 EDAC: constantify the struct bus_type usage
In many places in the edac code, struct bus_type pointers are passed
around and then eventually sent to the driver core, which can handle a
constant pointer.  So constantify all of the edac usage of these as well
because the data in them is never modified by the edac code either.

Cc: Borislav Petkov <bp@alien8.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Robert Richter <rric@kernel.org>
Cc:  <linux-edac@vger.kernel.org>
Link: https://lore.kernel.org/r/2023121909-tribute-punctuate-4b22@gregkh
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-01-04 14:34:27 +01:00
Qiuxu Zhuo
1e92af09fa EDAC/skx_common: Filter out the invalid address
Decoding an invalid address with certain firmware decoders could
cause a #PF (Page Fault) in the EFI runtime context, which could
subsequently hang the system. To make {i10nm,skx}_edac more robust
against such bogus firmware decoders, filter out invalid addresses
before allowing the firmware decoder to process them.

Suggested-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20231207014512.78564-1-qiuxu.zhuo@intel.com
2024-01-02 09:20:08 -08:00
Shubhrajyoti Datta
9483aa4491 EDAC/versal: Read num_csrows and num_chans using the correct bitfield macro
Fix the extraction of num_csrows and num_chans. The extraction of the
num_rows is wrong. Instead of extracting using the FIELD_GET it is
calling FIELD_PREP.

The issue was masked as the default design has the rows as 0.

Fixes: 6f15b178cd ("EDAC/versal: Add a Xilinx Versal memory controller driver")
Closes: https://lore.kernel.org/all/60ca157e-6eff-d12c-9dc0-8aeab125edda@linux-m68k.org/
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Shubhrajyoti Datta <shubhrajyoti.datta@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231215053352.8740-1-shubhrajyoti.datta@amd.com
2023-12-15 13:01:27 +01:00
Andy Shevchenko
a69badad73 EDAC, pnd2: Sort headers alphabetically
Sort the headers in alphabetic order in order to ease
the maintenance for this part.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2023-12-05 13:04:57 -08:00
Andy Shevchenko
f1b0b1167f EDAC, pnd2: Correct misleading error message in mk_region_mask()
The mask parameter is expected to be of a sequence of the set bits.
It does not mean it must be power of two (only single bit set).
Correct misleading error message.

Suggested-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2023-12-05 13:04:57 -08:00
Andy Shevchenko
530258f872 EDAC, pnd2: Apply bit macros and helpers where it makes sense
Apply bit macros (BIT()/BIT_ULL()/GENMASK()/etc) and helpers
(for_each_set_bit()/etc) where it makes sense.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2023-12-05 13:04:57 -08:00
Andy Shevchenko
a50cc8de99 EDAC, pnd2: Replace custom definition by one from sizes.h
The sizes.h provides a set of common size definitions, use it.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2023-12-05 13:04:57 -08:00
Qiuxu Zhuo
6807434ff0 EDAC/igen6: Add Intel Meteor Lake-P SoCs support
Add Intel Meteor Lake-P SoC compute die IDs for EDAC support.
These Meteor Lake-P SoCs share similar IBECC registers with
Alder Lake-P SoCs.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2023-12-05 10:59:13 -08:00
Qiuxu Zhuo
3c77090c12 EDAC/igen6: Add Intel Meteor Lake-PS SoCs support
Add Intel Meteor Lake-PS SoC compute die IDs for EDAC support.
These SoCs share similar IBECC registers with Alder Lake-P SoCs.
The only difference is that IBECC presence is detected through an
MMIO-mapped register instead of the capability register in the
PCI configuration space.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2023-12-05 10:59:13 -08:00
Qiuxu Zhuo
d23627a768 EDAC/igen6: Add Intel Raptor Lake-P SoCs support
Add Intel Raptor Lake-P SoC compute die IDs for EDAC support.
These Raptor Lake-P SoCs share similar IBECC registers with Alder Lake-P
SoCs but extend the most significant bit of the error address logged
in IBECC from bit 38 to bit 45.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2023-12-05 10:59:13 -08:00
Qiuxu Zhuo
c4a5398991 EDAC/igen6: Add Intel Alder Lake-N SoCs support
Add Intel Alder Lake-N SoC compute die IDs for EDAC support.
Alder Lake-N, with one memory controller, is a reduced version of
Alder Lake-P, which has two memory controllers.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2023-12-05 10:59:13 -08:00
Qiuxu Zhuo
a264f715ec EDAC/igen6: Make get_mchbar() helper function
Make get_mchbar() helper function to retrieve the BAR address of
the memory controller. No function changes.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2023-12-05 10:59:13 -08:00
Muralidhara M K
12f230c07a EDAC/amd64: Add support for family 0x19, models 0x90-9f devices
AMD Models 90h-9fh are APUs. They have built-in HBM3 memory. ECC support
is enabled by default.

APU models have a single Data Fabric (DF) per Package. Each DF is
visible to the OS in the same way as chiplet-based systems like Zen2
CPUs and later. However, the Unified Memory Controllers (UMCs) are
arranged in the same way as GPU-based MI200 devices rather than
CPU-based systems.

Use the existing gpu_ops for hetergeneous systems to support enumeration
of nodes and memory topology with few fixups.

  [ bp: Massage comments. ]

Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231102114225.2006878-5-muralimk@amd.com
2023-11-29 11:21:05 +01:00
Muralidhara M K
9a5f580c1c EDAC/mc: Add support for HBM3 memory type
AMD MI300A models use HBM3 (High Bandwidth Memory Gen 3) memory. HBM is
a high-speed computer memory interface for 3D-stacked synchronous
dynamic random-access memory (SDRAM).

Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231102114225.2006878-4-muralimk@amd.com
2023-11-28 16:39:12 +01:00
Abhinav Singh
a2f99fbae4 EDAC/{sb,i7core}_edac: Do not use a plain integer for a NULL pointer
Sparse warns about the use of the integer constant 0 as a NULL pointer
with the -Wnon-pointer-null switch.

Even though the C standard requires that 0 == NULL and type conversion
rules turn an integer constant 0 into a NULL pointer when cast to a void
* type, Linus notes that this is a very poor situation from a type
safety angle and a pointer should be initialized with a pointer type
- not an integer constant.

See https://www.spinics.net/lists/linux-sparse/msg10066.html for more
info.

  [ bp: Rewrite commit message, drop useless comments in the code. ]

Signed-off-by: Abhinav Singh <singhabhinav9051571833@gmail.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231128141703.614605-1-singhabhinav9051571833@gmail.com
2023-11-28 15:43:43 +01:00
Muralidhara M K
9f988030e8 EDAC/mce_amd: Remove SMCA Extended Error code descriptions
On AMD systems with Scalable MCA each machine check error of a SMCA bank
type has an associated bit position in the bank's control (CTL)
register.

An error's bit position in the CTL register is used during error decoding
for offsetting into the corresponding bank's error description structure.
As new errors are being added in newer AMD systems for existing SMCA bank
types, the underlying SMCA architecture guarantees that the bit positions
of existing errors are not altered.

However, on some AMD systems some of the existing bit definitions in the
CTL register of SMCA bank type are reassigned without defining new HWID
and McaType. Consequently, the errors whose bit definitions have been
reassigned in the CTL register are being erroneously decoded.

Remove SMCA Extended Error Code descriptions, this avoids decoding
issues for incorrectly reassigned bits, and avoids the related
maintenance burden in the kernel. But the bank type and Extended Error
Code value for an error will continue to be printed as a convenience.

The decoding of SMCA Extended Error Code description can be done by
referring to AMD documentation or use external tools such as rasdaemon.

Offline decoding can be done using below option in rasdaemon. For example:

  $ rasdaemon -p --status <STATUS> --ipid <IPID> --smca

Also, the user can pass particular family and model to decode the error
string.

$ rasdaemon -p --status <STATUS> --ipid <IPID> --smca --family <CPU Family>
	--model <CPU Model> --bank <BANK_NUM>

Refer to the rasdaemon commit for details:

  https://github.com/mchehab/rasdaemon/commit/932118b04a04104dfac6b8536

Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Link: https://lore.kernel.org/r/20231102114225.2006878-2-muralimk@amd.com
2023-11-28 15:17:09 +01:00
Rob Herring
9e08ac1b5e EDAC/armada_xp: Explicitly include correct DT includes
The DT of_device.h and of_platform.h date back to the separate
of_platform_bus_type before it was merged into the regular platform bus.
As part of that merge prepping Arm DT support 13 years ago, they
"temporarily" include each other. They also include platform_device.h
and of.h.

As a result, there's a pretty much random mix of those include files
used throughout the tree. In order to detangle these headers and replace
the implicit includes with struct declarations, users need to explicitly
include the correct includes.

Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Jan Luebbe <jlu@pengutronix.de>
Link: https://lore.kernel.org/r/20231013190342.246973-1-robh@kernel.org
2023-11-27 18:20:51 +01:00
Yazen Ghannam
ff03ff328f x86/mce/amd, EDAC/mce_amd: Move long names to decoder module
The long names of the SMCA banks are only used by the MCE decoder
module.

Move them out of the arch code and into the decoder module.

  [ bp: Name the long names array "smca_long_names", drop local ptr in
    decode_smca_error(), constify arrays. ]

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231118193248.1296798-5-yazen.ghannam@amd.com
2023-11-27 12:16:51 +01:00
Ilpo Järvinen
5f57b717cc EDAC/pci_sysfs: Use PCI_HEADER_TYPE_MASK instead of literals
Replace literal 0x7f with PCI_HEADER_TYPE_MASK.

No functional changes:

  $ sha1sum drivers/edac/edac_pci_sysfs.o.*
  805b33a090d8019d8b3b348191f630c72c748c9c drivers/edac/edac_pci_sysfs.o.before
  805b33a090d8019d8b3b348191f630c72c748c9c drivers/edac/edac_pci_sysfs.o.after

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231124090919.23687-5-ilpo.jarvinen@linux.intel.com
2023-11-24 15:08:37 +01:00
Arnd Bergmann
475c58e1a4 EDAC/thunderx: Fix possible out-of-bounds string access
Enabling -Wstringop-overflow globally exposes a warning for a common bug
in the usage of strncat():

  drivers/edac/thunderx_edac.c: In function 'thunderx_ocx_com_threaded_isr':
  drivers/edac/thunderx_edac.c:1136:17: error: 'strncat' specified bound 1024 equals destination size [-Werror=stringop-overflow=]
   1136 |                 strncat(msg, other, OCX_MESSAGE_SIZE);
        |                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   ...
   1145 |                                 strncat(msg, other, OCX_MESSAGE_SIZE);
   ...
   1150 |                                 strncat(msg, other, OCX_MESSAGE_SIZE);

   ...

Apparently the author of this driver expected strncat() to behave the
way that strlcat() does, which uses the size of the destination buffer
as its third argument rather than the length of the source buffer. The
result is that there is no check on the size of the allocated buffer.

Change it to strlcat().

  [ bp: Trim compiler output, fixup commit message. ]

Fixes: 41003396f9 ("EDAC, thunderx: Add Cavium ThunderX EDAC driver")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/20231122222007.3199885-1-arnd@kernel.org
2023-11-23 17:44:09 +01:00
Uwe Kleine-König
0c7c7ba0c7 EDAC/fsl_ddr: Convert to platform remove callback returning void
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.

To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new(), which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().

fsl_mc_err_remove() is used as callback in two drivers. So these have to
be converted together to the void returning remove callback.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231013100422.1382040-2-u.kleine-koenig@pengutronix.de
2023-11-20 23:34:04 +01:00
Uwe Kleine-König
ec886cf881 EDAC/zynqmp: Convert to platform remove callback returning void
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.

To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new(), which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().

Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Michal Simek <michal.simek@amd.com>
Link: https://lore.kernel.org/r/20231004131254.2673842-22-u.kleine-koenig@pengutronix.de
2023-11-20 23:33:21 +01:00
Uwe Kleine-König
9441e5ca3a EDAC/xgene: Convert to platform remove callback returning void
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.

To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new(), which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().

Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231004131254.2673842-21-u.kleine-koenig@pengutronix.de
2023-11-20 23:31:44 +01:00
Uwe Kleine-König
8312b2bbdd EDAC/ti: Convert to platform remove callback returning void
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.

To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new(), which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().

Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231004131254.2673842-20-u.kleine-koenig@pengutronix.de
2023-11-20 23:30:09 +01:00
Uwe Kleine-König
f30e2fac7d EDAC/synopsys: Convert to platform remove callback returning void
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.

To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new(), which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().

Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231004131254.2673842-19-u.kleine-koenig@pengutronix.de
2023-11-20 23:29:21 +01:00
Uwe Kleine-König
bfee05aa38 EDAC/qcom: Convert to platform remove callback returning void
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.

To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new(), which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().

Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231004131254.2673842-18-u.kleine-koenig@pengutronix.de
2023-11-20 23:28:17 +01:00
Uwe Kleine-König
58758ffa11 EDAC/ppc4xx: Convert to platform remove callback returning void
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.

To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new(), which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().

Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231004131254.2673842-17-u.kleine-koenig@pengutronix.de
2023-11-20 23:24:46 +01:00
Uwe Kleine-König
524d3e56fb EDAC/octeon-pci: Convert to platform remove callback returning void
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.

To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new(), which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().

Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Link: https://lore.kernel.org/r/20231004131254.2673842-16-u.kleine-koenig@pengutronix.de
2023-11-20 23:11:21 +01:00
Uwe Kleine-König
a92dd68e16 EDAC/octeon-pc: Convert to platform remove callback returning void
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.

To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new(), which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().

Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Link: https://lore.kernel.org/r/20231004131254.2673842-15-u.kleine-koenig@pengutronix.de
2023-11-20 23:10:37 +01:00
Uwe Kleine-König
c2a962933c EDAC/octeon-lmc: Convert to platform remove callback returning void
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.

To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new(), which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().

Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Link: https://lore.kernel.org/r/20231004131254.2673842-14-u.kleine-koenig@pengutronix.de
2023-11-20 23:09:49 +01:00
Uwe Kleine-König
01314f2772 EDAC/octeon-l2c: Convert to platform remove callback returning void
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.

To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new(), which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().

Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Link: https://lore.kernel.org/r/20231004131254.2673842-13-u.kleine-koenig@pengutronix.de
2023-11-20 23:09:07 +01:00
Uwe Kleine-König
8510e004d5 EDAC/npcm: Convert to platform remove callback returning void
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.

To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new(), which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().

Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231004131254.2673842-12-u.kleine-koenig@pengutronix.de
2023-11-20 22:52:14 +01:00
Uwe Kleine-König
1baf49724e EDAC/mpc85xx: Convert to platform remove callback returning void
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.

To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new(), which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().

Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231004131254.2673842-11-u.kleine-koenig@pengutronix.de
2023-11-20 22:44:37 +01:00
Uwe Kleine-König
81b3e87411 EDAC/highbank_mc: Convert to platform remove callback returning void
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.

To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new(), which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().

Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Link: https://lore.kernel.org/r/20231004131254.2673842-10-u.kleine-koenig@pengutronix.de
2023-11-20 22:33:52 +01:00
Uwe Kleine-König
7aca2e9b7b EDAC/highbank_l2: Convert to platform remove callback returning void
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.

To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new(), which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().

Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Link: https://lore.kernel.org/r/20231004131254.2673842-9-u.kleine-koenig@pengutronix.de
2023-11-20 22:33:22 +01:00
Uwe Kleine-König
d27cb32e00 EDAC/dmc520: Convert to platform remove callback returning void
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.

To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new(), which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().

Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231004131254.2673842-8-u.kleine-koenig@pengutronix.de
2023-11-20 22:01:16 +01:00
Uwe Kleine-König
0576ded05b EDAC/cpc925: Convert to platform remove callback returning void
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.

To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new(), which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().

Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231004131254.2673842-7-u.kleine-koenig@pengutronix.de
2023-11-20 21:56:49 +01:00
Uwe Kleine-König
d8d9f99fd0 EDAC/cell: Convert to platform remove callback returning void
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.

To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new(), which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().

Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231004131254.2673842-6-u.kleine-koenig@pengutronix.de
2023-11-20 21:39:13 +01:00
Uwe Kleine-König
a5347591eb EDAC/bluefield: Convert to platform remove callback returning void
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.

To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new(), which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().

Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231004131254.2673842-5-u.kleine-koenig@pengutronix.de
2023-11-20 21:28:08 +01:00
Uwe Kleine-König
2546fffd91 EDAC/aspeed: Convert to platform remove callback returning void
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.

To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new(), which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().

Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231004131254.2673842-4-u.kleine-koenig@pengutronix.de
2023-11-20 21:26:01 +01:00
Uwe Kleine-König
5aafd02da7 EDAC/armada_xp: Convert to platform remove callback returning void
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.

To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new(), which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().

Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231004131254.2673842-3-u.kleine-koenig@pengutronix.de
2023-11-20 20:07:53 +01:00
Uwe Kleine-König
b73e11c873 EDAC/altera: Convert to platform remove callback returning void
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.

To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new(), which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().

Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231004131254.2673842-2-u.kleine-koenig@pengutronix.de
2023-11-20 19:43:18 +01:00
Rob Herring
1b09892962 EDAC/altera: Use device_get_match_data()
Use preferred device_get_match_data() instead of of_match_device() to
get the driver match data. With this, adjust the includes to explicitly
include the correct headers.

Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231115210201.3743564-1-robh@kernel.org
2023-11-20 09:59:16 +01:00
Linus Torvalds
befaa609f4 hardening updates for v6.7-rc1
- Add LKDTM test for stuck CPUs (Mark Rutland)
 
 - Improve LKDTM selftest behavior under UBSan (Ricardo Cañuelo)
 
 - Refactor more 1-element arrays into flexible arrays (Gustavo A. R. Silva)
 
 - Analyze and replace strlcpy and strncpy uses (Justin Stitt, Azeem Shaikh)
 
 - Convert group_info.usage to refcount_t (Elena Reshetova)
 
 - Add __counted_by annotations (Kees Cook, Gustavo A. R. Silva)
 
 - Add Kconfig fragment for basic hardening options (Kees Cook, Lukas Bulwahn)
 
 - Fix randstruct GCC plugin performance mode to stay in groups (Kees Cook)
 
 - Fix strtomem() compile-time check for small sources (Kees Cook)
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmU/3cUWHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJsEoEACBGPSiOmfSWdH3TOnIG270PD24
 jGjg8KFv7RC/JTOdYmpLl0okdlGT9LvjN/ToSSDEw3PIayxoXUdhkbYy0MYtiV3m
 yz2ozDTzJuplQX/W2fPE+nXSzIwHao2zjPPFjHnT7lt8IIjhgjiOtLfZ2gGUkW99
 Mdu2aWh3u0r4tC8OS23++yN5ibRc5l72efsjDWjZ0aPXnxE1bjmLMiIPiizpndIf
 beasPuDBs98sJVYouemCwnsPXuXOPz3Q1Cpo/fTd+TMTJCLSemCQZCTuOBU0acI/
 ZjLCgCaJU1yIYKBMtrIN4G9kITZniXX3/Nm4o6NQMVlcCqMeNaHuflomqWoqWfhE
 UPbRo2eghZOaMNiCKLLvZDIqPrh1IcsiEl6Ef3W4hICc42GTK96IuGisIvDXwQ4N
 /SzTOupJuN42noh3z1M3XuZy5RoXJ99IYDNY5CTKf9IdqvA0bbGkU3nb1gZH/xw9
 BjTqKzR/7K1kTXuSgagDZ1Wceej9pZxhX7E3IHYsP8ZOvKug3EeL4yybVwQ3HRfq
 Qnzcp/qPB9cOkLSQXveRTFTsj2mX28Gixct/iDuc1jIYwGQlY1gI6dcUcqby6ptM
 BrQti7eR2NH2+T3aE2UVCIWsZVhx7NaSF+z8JxfAuu56jicc4xJVsi8zrNveWX5M
 m2VXyBl3121BVtKi4w==
 =0iVF
 -----END PGP SIGNATURE-----

Merge tag 'hardening-v6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull hardening updates from Kees Cook:
 "One of the more voluminous set of changes is for adding the new
  __counted_by annotation[1] to gain run-time bounds checking of
  dynamically sized arrays with UBSan.

   - Add LKDTM test for stuck CPUs (Mark Rutland)

   - Improve LKDTM selftest behavior under UBSan (Ricardo Cañuelo)

   - Refactor more 1-element arrays into flexible arrays (Gustavo A. R.
     Silva)

   - Analyze and replace strlcpy and strncpy uses (Justin Stitt, Azeem
     Shaikh)

   - Convert group_info.usage to refcount_t (Elena Reshetova)

   - Add __counted_by annotations (Kees Cook, Gustavo A. R. Silva)

   - Add Kconfig fragment for basic hardening options (Kees Cook, Lukas
     Bulwahn)

   - Fix randstruct GCC plugin performance mode to stay in groups (Kees
     Cook)

   - Fix strtomem() compile-time check for small sources (Kees Cook)"

* tag 'hardening-v6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (56 commits)
  hwmon: (acpi_power_meter) replace open-coded kmemdup_nul
  reset: Annotate struct reset_control_array with __counted_by
  kexec: Annotate struct crash_mem with __counted_by
  virtio_console: Annotate struct port_buffer with __counted_by
  ima: Add __counted_by for struct modsig and use struct_size()
  MAINTAINERS: Include stackleak paths in hardening entry
  string: Adjust strtomem() logic to allow for smaller sources
  hardening: x86: drop reference to removed config AMD_IOMMU_V2
  randstruct: Fix gcc-plugin performance mode to stay in group
  mailbox: zynqmp: Annotate struct zynqmp_ipi_pdata with __counted_by
  drivers: thermal: tsens: Annotate struct tsens_priv with __counted_by
  irqchip/imx-intmux: Annotate struct intmux_data with __counted_by
  KVM: Annotate struct kvm_irq_routing_table with __counted_by
  virt: acrn: Annotate struct vm_memory_region_batch with __counted_by
  hwmon: Annotate struct gsc_hwmon_platform_data with __counted_by
  sparc: Annotate struct cpuinfo_tree with __counted_by
  isdn: kcapi: replace deprecated strncpy with strscpy_pad
  isdn: replace deprecated strncpy with strscpy
  NFS/flexfiles: Annotate struct nfs4_ff_layout_segment with __counted_by
  nfs41: Annotate struct nfs4_file_layout_dsaddr with __counted_by
  ...
2023-10-30 19:09:55 -10:00
Shubhrajyoti Datta
6f15b178cd EDAC/versal: Add a Xilinx Versal memory controller driver
Add a EDAC driver for the RAS capabilities on the Xilinx integrated DDR
Memory Controllers (DDRMCs) which support both DDR4 and LPDDR4/4X memory
interfaces. It has four programmable Network-on-Chip (NoC) interface
ports and is designed to handle multiple streams of traffic. The driver
reports correctable and uncorrectable errors, and also creates debugfs
entries for testing through error injection.

  [ bp:
   - Add a pointer to the documentation about the register unlock code.
   - Squash in a fix for a Smatch static checker issue as reported by
     Dan Carpenter:
     https://lore.kernel.org/r/a4db6f93-8e5f-4d55-a7b8-b5a987d48a58@moroto.mountain
  ]

Co-developed-by: Sai Krishna Potthuri <sai.krishna.potthuri@amd.com>
Signed-off-by: Sai Krishna Potthuri <sai.krishna.potthuri@amd.com>
Signed-off-by: Shubhrajyoti Datta <shubhrajyoti.datta@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231005101242.14621-3-shubhrajyoti.datta@amd.com
2023-10-23 19:41:27 +02:00
Justin Stitt
6b343a4642 EDAC/mc_sysfs: Replace deprecated strncpy() with memcpy()
`strncpy` is deprecated for use on NUL-terminated destination strings [1].

We've already calculated bounds, possible truncation with '\0' or '\n'
and manually NUL-terminated. The situation is now just a literal byte
copy from one buffer to another, let's treat it as such and use a less
ambiguous interface in memcpy.

Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1]
Link: https://github.com/KSPP/linux/issues/90
Cc: linux-hardening@vger.kernel.org
Signed-off-by: Justin Stitt <justinstitt@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20230918-strncpy-drivers-edac-edac_mc_sysfs-c-v4-1-38a23d2fcdd8@google.com
Signed-off-by: Kees Cook <keescook@chromium.org>
2023-09-29 14:48:32 -07:00
Linus Torvalds
bb511d4b25 Intel EDAC fixes:
- Old igen6 driver could lose pending events during initialization
 - Sapphire Rapids workstations have fewer memory controllers than their
   bigger siblings. This confused the driver.
 -----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEENIoOqscayAmBOQ5Iq6sjH5ffWIEFAmTudg4UHHRvbnkubHVj
 a0BpbnRlbC5jb20ACgkQq6sjH5ffWIH4wA//Z+pbRElvnWyK8rTx6SbWFu82D8a/
 dAXx5V+8I6v64MPb9VZXP6KEiBQgk2jD2AsC0+2QrZL9FUnKwnBSDC3rgVWPTxBo
 dTxu8j1PDTlnffU+wuaB+3cCRikwa1h+Fr/SQaphwTLA3nm13CHj+dUOp3ZUR8fT
 vz+M4t3SRgcU/0W40jcLnn1h5hsTNjQWr//zVVdctGr++sl7xtVh7wxZPakTC9RL
 FBMx3elqdroeQ5ILMxC5e1V02tAZVrXxZbSNpLWhH25MBwe8P7rc+SHYfNaddnpx
 3qrOOzRZl3fGifoM+GU/JsMeIYh6FYUhOfBNTjUFWQZP+6mDvgj9WaLxVgw9V99R
 W384K7KnjLSnE01/REZ0x9R1sehXyQIv2zGosJitRuKyLuw5UODx/khzpCG6a0P3
 RPi4tNemscCIr5djX8VBqmyxS5tqUzlBBDskDnsHHS7NXLuYv1O6SqR/7kvCqhFQ
 7/qGWNFbzZOMJZiLGUmmxEv3Pk+tfTlZdYOipfaHpSlNNr9zO07VXBRNK18aqQVp
 3GCpRp3IhTL3EmOE2RaV2uhyRIcpSnjvqi8shoN6p1wy8jQwNKoe3/nt7QobKhCl
 4kYC9q0jNDWgh/QWxgtoB6UzWHIieeVZQQcW0Da4fvlsIBwbzcpu5+j3qaCxUNBD
 jUt/DwSD+D91yPI=
 =4Uqu
 -----END PGP SIGNATURE-----

Merge tag 'edac_updates_for_v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras

Pull intel EDAC fixes from Tony Luck:

 - Old igen6 driver could lose pending events during initialization

 - Sapphire Rapids workstations have fewer memory controllers than their
   bigger siblings. This confused the driver.

* tag 'edac_updates_for_v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
  EDAC/igen6: Fix the issue of no error events
  EDAC/i10nm: Skip the absent memory controllers
2023-08-30 19:23:00 -07:00
Linus Torvalds
ef2a0b7cdb Devicetree include cleanups for v6.6:
These are the remaining few clean-ups of DT related includes which
 didn't get applied to subsystem trees.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEktVUI4SxYhzZyEuo+vtdtY28YcMFAmTucUoACgkQ+vtdtY28
 YcOYoQ//RwIPeWc74PHQbOb6eQR95eTHDcDE1MR9Fw8amqxFaomGlSMpbyVyP4ag
 8p82c6qfJIZautyEikbKFO+iYjFMua0KuOTMVuDxHErQOl6ym4P4Uk3+1h5stVSj
 IdfK4CACtMKxKBOPAcyxJU6HKoWcUtMKsKV6OLdDh7M2Fy/G4RCjv4w1Xf3VAn59
 VOa0KF7FhHU3dhIB/tGsj0t13+3e3kF5+l4+pdoMoZWhR4gac5FJRxiR5dMZG6jr
 VY8i9FZb7DW2VtY78FVVOaYDDVf4vNrc+0kqnCbWUaKACHPgNXC375LvS7jFGXvc
 HYVN3teqhFxNOyoSehn2bdBVwJxjQFgy2gTt2vRWTa/CaUDES90cue2R9GT2Sz0b
 eBc3DQtNeT5m8mrLkuEfZrJjKjaEy2Pr6FjNDhNcmkJak7dkMMgkG/Y/SpNmpZOe
 2C3T6i4i6FUxni/2/rWHSVLnYBGfhPNdwWAZcQOi8rqtzp3tF46wVa345+Ev3VDG
 ECDndH8Qk3gtOmGyeTIvPc51yDP6Hpuh7+0jydtehkXHB+cUJtR+g0efIGf7BDgo
 sQpa1vRxkOolrCxyzKwcogEY7jjeccv/FM7BwaZQKXEibiKGkxeDuahdwbfvDuVq
 br16Uj9VzG8Jl6KK0gexV7kzZAAdw1y3JqPGUZf7hn4zmk099ow=
 =eLMf
 -----END PGP SIGNATURE-----

Merge tag 'devicetree-header-cleanups-for-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux

Pull devicetree include cleanups from Rob Herring:
 "These are the remaining few clean-ups of DT related includes which
  didn't get applied to subsystem trees"

* tag 'devicetree-header-cleanups-for-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
  ipmi: Explicitly include correct DT includes
  tpm: Explicitly include correct DT includes
  lib/genalloc: Explicitly include correct DT includes
  parport: Explicitly include correct DT includes
  sbus: Explicitly include correct DT includes
  mux: Explicitly include correct DT includes
  macintosh: Explicitly include correct DT includes
  hte: Explicitly include correct DT includes
  EDAC: Explicitly include correct DT includes
  clocksource: Explicitly include correct DT includes
  sparc: Explicitly include correct DT includes
  riscv: Explicitly include correct DT includes
2023-08-30 17:04:28 -07:00
Linus Torvalds
1a7c611546 Perf events changes for v6.6:
- AMD IBS improvements
 - Intel PMU driver updates
 - Extend core perf facilities & the ARM PMU driver to better handle ARM big.LITTLE events
 - Micro-optimize software events and the ring-buffer code
 - Misc cleanups & fixes
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmTtBscRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1hHoQ/+IBQ8Xi/rcdd40n8OqEB/VBWVuSjNT3uN
 3pHHcTl2Pio9CxBeat42NekNijlRILCKJrZ3Lt3JWBmWyWv5l3KFabelj+lDF2xa
 TVCjTnQNe1+HvrODYnF4ECIs5vaoMVjcJ9jg8+VDgAcOQr1nZs4m5TVAd6TLqPpV
 urBEQVULkkzk7ZRhfrugKhw+wrpWFefgGCx0RV8ijZB7TLMHc2wE+Q/sTxKdKceL
 wNaJaDgV33pZh0aImwR9pKUE532hF1FiBdLuehkh61PZa1L82jzAX1xjw2s1hSa4
 eIWemPHJIYfivRlENbJsDWc4N8gk6ijVHwrxGcr4Axu+NN+zPtQ3ddhaGMAyKdTo
 qUKXH3MZSMIl++jI5Fkc6xM+XLvY1rML62epSzMwu/cc7Z5MeyWdQcri0N9YFuO7
 wUUNnFpU00lwQBLbyyUQ3Zi8E0QV7NuPW4axTkmntiIjMpLagaEvVSf6nf8qLpbE
 WTT16s707t19hUZNazNZ7ONmhly4ALbHFQEH65J2KoYn99fYqy9z68Hwk+xnmykw
 bc3qvfhpw0MImQQ+DqHiBwb4n4UuvY2WlkkZI3FfNeSG63DaM2mZikfpElpXYjn6
 9iOIXvx21Wiq/n0cbLhidI2q/ZzFCzYLCk6ikZ320wb+rhvd7EoSlZil6QSzn3pH
 Qdk+NEZgWQY=
 =ZT6+
 -----END PGP SIGNATURE-----

Merge tag 'perf-core-2023-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf event updates from Ingo Molnar:

 - AMD IBS improvements

 - Intel PMU driver updates

 - Extend core perf facilities & the ARM PMU driver to better handle ARM big.LITTLE events

 - Micro-optimize software events and the ring-buffer code

 - Misc cleanups & fixes

* tag 'perf-core-2023-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/uncore: Remove unnecessary ?: operator around pcibios_err_to_errno() call
  perf/x86/intel: Add Crestmont PMU
  x86/cpu: Update Hybrids
  x86/cpu: Fix Crestmont uarch
  x86/cpu: Fix Gracemont uarch
  perf: Remove unused extern declaration arch_perf_get_page_size()
  perf: Remove unused PERF_PMU_CAP_HETEROGENEOUS_CPUS capability
  arm_pmu: Remove unused PERF_PMU_CAP_HETEROGENEOUS_CPUS capability
  perf/x86: Remove unused PERF_PMU_CAP_HETEROGENEOUS_CPUS capability
  arm_pmu: Add PERF_PMU_CAP_EXTENDED_HW_TYPE capability
  perf/x86/ibs: Set mem_lvl_num, mem_remote and mem_hops for data_src
  perf/mem: Add PERF_MEM_LVLNUM_NA to PERF_MEM_NA
  perf/mem: Introduce PERF_MEM_LVLNUM_UNC
  perf/ring_buffer: Use local_try_cmpxchg in __perf_output_begin
  locking/arch: Avoid variable shadowing in local_try_cmpxchg()
  perf/core: Use local64_try_cmpxchg in perf_swevent_set_period
  perf/x86: Use local64_try_cmpxchg
  perf/amd: Prevent grouping of IBS events
2023-08-28 16:35:01 -07:00
Rob Herring
408d808893 EDAC: Explicitly include correct DT includes
The DT of_device.h and of_platform.h date back to the separate
of_platform_bus_type before it was merged into the regular platform bus.
As part of that merge prepping Arm DT support 13 years ago, they
"temporarily" include each other. They also include platform_device.h
and of.h. As a result, there's a pretty much random mix of those include
files used throughout the tree. In order to detangle these headers and
replace the implicit includes with struct declarations, users need to
explicitly include the correct includes.

Link: https://lore.kernel.org/r/20230714174434.4054728-1-robh@kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
2023-08-28 13:31:01 -05:00
Avadhut Naik
c4d07c3712 EDAC/amd64: Add support for AMD family 1Ah models 00h-1Fh and 40h-4Fh
Add support for family 1Ah-based models 00h-1Fh and 40h-4Fh.

  [ bp: Simplify. ]

Signed-off-by: Avadhut Naik <Avadhut.Naik@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230809035244.2722455-4-avadhut.naik@amd.com
2023-08-10 14:25:21 +02:00
Peter Zijlstra
0cfd8fbadd x86/cpu: Fix Crestmont uarch
Sierra Forest and Grand Ridge are both E-core only using Crestmont
micro-architecture, They fit the pre-existing naming scheme prefectly
fine, adhere to it.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20230807150405.757666627@infradead.org
2023-08-09 21:51:06 +02:00
Qiuxu Zhuo
ce53ad81ed EDAC/igen6: Fix the issue of no error events
Current igen6_edac checks for pending errors before the registration
of the error handler. However, there is a possibility that the error
occurs during the registration process, leading to unhandled pending
errors and no future error events. This issue can be reproduced by
repeatedly injecting errors during the loading of the igen6_edac.

Fix this issue by moving the pending error handler after the registration
of the error handler, ensuring that no pending errors are left unhandled.

Fixes: 10590a9d4f ("EDAC/igen6: Add EDAC driver for Intel client SoCs using IBECC")
Reported-by: Ee Wey Lim <ee.wey.lim@intel.com>
Tested-by: Ee Wey Lim <ee.wey.lim@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20230725080427.23883-1-qiuxu.zhuo@intel.com
2023-08-02 13:09:56 -07:00
Qiuxu Zhuo
c545f5e412 EDAC/i10nm: Skip the absent memory controllers
Some Sapphire Rapids workstations' absent memory controllers
still appear as PCIe devices that fool the i10nm_edac driver
and result in "shift exponent -66 is negative" call traces
from skx_get_dimm_info().

Skip the absent memory controllers to avoid the call traces.

Reported-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Closes: https://lore.kernel.org/linux-edac/CAAd53p41Ku1m1rapeqb1xtD+kKuk+BaUW=dumuoF0ZO3GhFjFA@mail.gmail.com/T/#m5de16dce60a8c836ec235868c7c16e3fefad0cc2
Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Reported-by: Koba Ko <koba.ko@canonical.com>
Closes: https://lore.kernel.org/linux-edac/SA1PR11MB71305B71CCCC3D9305835202892AA@SA1PR11MB7130.namprd11.prod.outlook.com/T/#t
Tested-by: Koba Ko <koba.ko@canonical.com>
Fixes: d4dc89d069 ("EDAC, i10nm: Add a driver for Intel 10nm server processors")
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20230710013232.59712-1-qiuxu.zhuo@intel.com
2023-07-24 08:57:26 -07:00
Linus Torvalds
aa35a4835e - Add initial support for RAS hardware found on AMD server GPUs (MI200).
Those GPUs and CPUs are connected together through the coherent fabric
   and the GPU memory controllers report errors through x86's MCA so EDAC
   needs to support them. The amd64_edac driver supports now HBM (High
   Bandwidth Memory) and thus such heterogeneous memory controller
   systems
 
 - Other small cleanups and improvements
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmSZiUwACgkQEsHwGGHe
 VUphSQ/+JLXTAQ06CNos98MR8iCGdThVujhWt1pBIgjhQFJuf4JlEEtKs9htjbud
 9HZvgnGbHahRoO8pMCB0jwtz0ATrPbaOvz4BofVp3SIRiR5jMI0tfmyl8iSrnA3Q
 m5pbMh6uiIAlH8aPqQXret2iwp7JXOjnBWksgbmUWkI7d2qseKu98ikXyC4QoCaD
 AGRJJ6OCA3P85rdT9qabOuXh6yoELOPKw3j243s22sTLiqn+EuoTE+QX5ZjrQ8Ts
 DyXN/pYI/vGVP7sECkWf7PsEf1BkL6m5KeXDB4Ij2YJesQnBlBZQdAcxdGdY8z3M
 f/qpLdrYvpcLHQy42Jm5VnnISOvMvAl8YWqCEyUmBjXcLwSPNIKHN9LQuznhnQHr
 vssRVqQUg1J+/UWAoIzHdrAQ6zvgv1xlX2dG2YOw3t1WMDnMhztW3eoQv04etD3d
 fqQH3MrkGHI4qeq1Mice1Gz+NWQG/PXVhgBzbTBDDCiRJkg1Dhxce1OMRUiM4tUW
 0JABoU+KS0RZAKXAwine6v5duYmwK36Vl1SSCCWjqFMeR7XMwWWHA9d7t8+wdT1l
 KBIEiRTcRnXaZXyLUPSPRbEF5ALS25RgWVPCA3ibuSUnJjGU7Z7/rbwlQryAefVB
 nqjATed0zat4fbL9bvnDuOKQEzkuySvUWpU+Eozxbct6oRu5ms0=
 =Vcif
 -----END PGP SIGNATURE-----

Merge tag 'ras_core_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull RAS updates from Borislav Petkov:

 - Add initial support for RAS hardware found on AMD server GPUs (MI200).

   Those GPUs and CPUs are connected together through the coherent
   fabric and the GPU memory controllers report errors through x86's MCA
   so EDAC needs to support them. The amd64_edac driver supports now HBM
   (High Bandwidth Memory) and thus such heterogeneous memory controller
   systems

 - Other small cleanups and improvements

* tag 'ras_core_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  EDAC/amd64: Cache and use GPU node map
  EDAC/amd64: Add support for AMD heterogeneous Family 19h Model 30h-3Fh
  EDAC/amd64: Document heterogeneous system enumeration
  x86/MCE/AMD, EDAC/mce_amd: Decode UMC_V2 ECC errors
  x86/amd_nb: Re-sort and re-indent PCI defines
  x86/amd_nb: Add MI200 PCI IDs
  ras/debugfs: Fix error checking for debugfs_create_dir()
  x86/MCE: Check a hw error's address to determine proper recovery action
2023-06-26 15:09:18 -07:00
Linus Torvalds
e5ce2f196f - amd64_edac: Add support for Zen4 client hardware
- amd64_edac: Remove the version string as it is useless and actively
   confusing when looking at backported versions of the driver
 
 - Add a driver for the Nuvoton NPCM memory controller
 
 - A debugfs error checking cleanup
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmSZSk4ACgkQEsHwGGHe
 VUqPaQ/8Dwc8vS4iibztAFyYisRGrJRfVP6k7Nl2tSgCi+Tg0BWNFTSMyBzoLNuY
 ewGUe1ZKNkKb3Vs8OE3E48vstVd6J/jcoMAxUmtl4uHxzjhVfoIruBD/xK2Q6mO1
 UDfRrfT2LZv/0/Tn7++QP3R3aQLvDqJC6IVAG1Hn4hqHSnhw7CqgCetbBY/M+hQR
 p9Xjtb2Gbm1UwMEK+z9DG9jNZR2vtPRfOeieAcHpOnDwTe2QY1jQGoeeVDfdfJbC
 iU2D87ad1V7o4p+7Eur0wwg8smuWqSVslWId6+qmtL4xePK6JUL9D+3kPEO4AjWV
 iYqDi4EcdXOglYnAEvKhRbN8eCFMaYyoZqpC10DUTccyWv5w/CW2tRc7ZOKDPgyZ
 LVpupz87rKaJ2C6ymQ41vv98hpHEiGSSHserK0aY4K03ecL+pnHp4Qu3ZID8YLCo
 V6P1R7S63YFO1TU0LSWiVBBcmoWg0Zy5MQkKc+2PcWYm6soGDYFoD5lURVoVAiw4
 YZhReq58NQwyZQYhxgpBmdZYaLlrvGiGQZx/dhuR5C2qF3uL3wdi5mYvP/vSmKbG
 vLPMl/DrqGQEHJnCU2U8Xo3kss3mf/Qv7qvusaxkjcub8wvfKRbX7w4QhXSU7+qb
 1sf6LPWBOk+xb2daUM1tzaMUnF3Pr+8gbzlAxlu1SmtG/HiC7JA=
 =e4qx
 -----END PGP SIGNATURE-----

Merge tag 'edac_updates_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras

Pull EDAC updates from Borislav Petkov:

 - amd64_edac: Add support for Zen4 client hardware

 - amd64_edac: Remove the version string as it is useless and actively
   confusing when looking at backported versions of the driver

 - Add a driver for the Nuvoton NPCM memory controller

 - A debugfs error checking cleanup

* tag 'edac_updates_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
  EDAC/npcm: Add NPCM memory controller driver
  dt-bindings: memory-controllers: nuvoton: Add NPCM memory controller
  EDAC/thunderx: Check debugfs file creation retval properly
  EDAC/amd64: Add support for ECC on family 19h model 60h-7Fh
  EDAC/amd64: Remove module version string
2023-06-26 15:06:42 -07:00
Yazen Ghannam
4251566ebc EDAC/amd64: Cache and use GPU node map
AMD systems have historically provided an "AMD Node ID" that is a unique
identifier for each die in a multi-die package. This was associated with
a unique instance of the AMD Northbridge on a legacy system. And now it
is associated with a unique instance of the AMD Data Fabric on modern
systems. Each instance is referred to as a "Node"; this is an
AMD-specific term not to be confused with NUMA nodes.

The data fabric provides a number of interfaces accessible through a set
of functions in a single PCI device. There is one PCI device per Data
Fabric (AMD Node), and multi-die systems will see multiple such PCI
devices. The AMD Node ID matches a Node's position in the PCI hierarchy.
For example, the Node 0 is accessed using the first PCI device, Node 1
is accessed using the second, and so on. A logical CPU can find its AMD
Node ID using CPUID. Furthermore, the AMD Node ID is used within the
hardware fabric, so it is not purely a logical value.

Heterogeneous AMD systems, with a CPU Data Fabric connected to GPU data
fabrics, follow a similar convention. Each CPU and GPU die has a unique
AMD Node ID value, and each Node ID corresponds to PCI devices in
sequential order.

However, there are two caveats:
1) GPUs are not x86, and they don't have CPUID to read their AMD Node ID
like on CPUs. This means the value is more implicit and based on PCI
enumeration and hardware-specifics.
2) There is a gap in the hardware values for AMD Node IDs. Values 0-7
are for CPUs and values 8-15 are for GPUs.

For example, a system with one CPU die and two GPUs dies will have the
following values:
  CPU0 -> AMD Node 0
  GPU0 -> AMD Node 8
  GPU1 -> AMD Node 9

EDAC is the only subsystem where this has a practical effect. Memory
errors on AMD systems are commonly reported through MCA to a CPU on the
local AMD Node. The error information is passed along to EDAC where the
AMD EDAC modules use the AMD Node ID of reporting logical CPU to access
AMD Node information.

However, memory errors from a GPU die will be reported to the CPU die.
Therefore, the logical CPU's AMD Node ID can't be used since it won't
match the AMD Node ID of the GPU die. The AMD Node ID of the GPU die is
provided as part of the MCA information, and the value will match the
hardware enumeration (e.g. 8-15).

Handle this situation by discovering GPU dies the same way as CPU dies
in the AMD NB code. But do a "node id" fixup in AMD64 EDAC where it's
needed.

The GPU data fabrics provide a register with the base AMD Node ID for
their local "type", i.e. GPU data fabric. This value is the same for all
fabrics of the same type in a system.

Read and cache the base AMD Node ID from one of the GPU devices during
module initialization. Use this to fixup the "node id" when reporting
memory errors at runtime.

  [ bp: Squash a fix making gpu_node_map static as reported by
        Tom Rix <trix@redhat.com>.
    Link: https://lore.kernel.org/r/20230610210930.174074-1-trix@redhat.com ]

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Co-developed-by: Muralidhara M K <muralidhara.mk@amd.com>
Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230515113537.1052146-6-muralimk@amd.com
2023-06-19 13:01:44 +02:00
Borislav Petkov (AMD)
852667c317 Merge ras/edac-drivers into for-next
* ras/edac-drivers:
  EDAC/npcm: Add NPCM memory controller driver
  dt-bindings: memory-controllers: nuvoton: Add NPCM memory controller

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2023-06-12 15:15:36 +02:00
Marvin Lin
d244c610f1 EDAC/npcm: Add NPCM memory controller driver
Add driver for memory controller present on Nuvoton NPCM SoCs. The
memory controller supports single bit error correction and double bit
error detection.

Signed-off-by: Marvin Lin <milkfafa@gmail.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230111093245.318745-4-milkfafa@gmail.com
2023-06-12 15:14:10 +02:00
Borislav Petkov (AMD)
0a81fa5d74 Merge ras/edac-misc into for-next
* ras/edac-misc:
  EDAC/thunderx: Check debugfs file creation retval properly

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2023-06-07 10:50:07 +02:00
Yeqi Fu
bf5c04ddd3 EDAC/thunderx: Check debugfs file creation retval properly
edac_debugfs_create_file() returns ERR_PTR by way of the respective
debugfs function it calls, if an error occurs.

The appropriate way to verify for errors is to use IS_ERR(). Do so.

  [ bp: Rewrite all text. ]

Signed-off-by: Yeqi Fu <asuk4.q@gmail.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230517173111.365787-1-asuk4.q@gmail.com
2023-06-06 23:04:56 +02:00
Muralidhara M K
9c42edd571 EDAC/amd64: Add support for AMD heterogeneous Family 19h Model 30h-3Fh
AMD Family 19h Model 30h-3Fh systems can be connected to AMD MI200
accelerator/GPU devices such that the CPU and GPU data fabrics are
connected together. In this configuration, the CPU manages error logging
and reporting for MCA banks located on the GPUs. This includes HBM memory
errors reported from Unified Memory Controllers (UMCs) on the GPUs.
The GPU memory errors are handled like CPU memory errors.

AMD CPU UMC support in EDAC can be re-used for GPU UMC support. However,
keeping them separate means drastic changes in one path (e.g. to support
newer products) should have less impact on the other path.

Also, simplify the "gpu_" helper functions where possible. GPU product
configuration, like memory type and channel count, is fixed compared to
CPU products.

GPU UMCs each have four physical connections (phys) connected to eight
channels. There is a single "chip select". This differs from CPUs where
each UMC has one physical connection connected to one channel, and each
channel has up to four "chip selects".

Enumerate each UMC "phy" as an EDAC CSROW, since there is only a single
chip select for each physical connection. This is similar to how a CPU
UMC "phy" is enumerated as an EDAC CHANNEL, since there is only a single
channel for each physical connection.

Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230515113537.1052146-5-muralimk@amd.com
2023-06-05 12:27:18 +02:00
Yazen Ghannam
c35977b00f x86/MCE/AMD, EDAC/mce_amd: Decode UMC_V2 ECC errors
The MI200 (Aldebaran) series of devices introduced a new SMCA bank type
for Unified Memory Controllers. The MCE subsystem already has support
for this new type. The MCE decoder module will decode the common MCA
error information for the new bank type, but it will not pass the
information to the AMD64 EDAC module for detailed memory error decoding.

Have the MCE decoder module recognize the new bank type as an SMCA UMC
memory error and pass the MCA information to AMD64 EDAC.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Co-developed-by: Muralidhara M K <muralidhara.mk@amd.com>
Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230515113537.1052146-3-muralimk@amd.com
2023-06-05 12:27:11 +02:00
Manivannan Sadhasivam
cbd77119b6 EDAC/qcom: Get rid of hardcoded register offsets
The LLCC EDAC register offsets varies between each SoC. Hardcoding the
register offsets won't work and will often result in crash due to
accessing the wrong locations.

Hence, get the register offsets from the LLCC driver matching the
individual SoCs.

Cc: <stable@vger.kernel.org> # 6.0: 5365cea199 ("soc: qcom: llcc: Rename reg_offset structs to reflect LLCC version")
Cc: <stable@vger.kernel.org> # 6.0: c13d7d261e ("soc: qcom: llcc: Pass LLCC version based register offsets to EDAC driver")
Cc: <stable@vger.kernel.org> # 6.0
Fixes: a6e9d7ef25 ("soc: qcom: llcc: Add configuration data for SM8450 SoC")
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Link: https://lore.kernel.org/r/20230517114635.76358-3-manivannan.sadhasivam@linaro.org
2023-05-26 20:56:55 -07:00
Manivannan Sadhasivam
3d49f7406b EDAC/qcom: Remove superfluous return variable assignment in qcom_llcc_core_setup()
"ret" variable will be assigned on both success and failure cases. So there
is no need to initialize it during start of qcom_llcc_core_setup().

Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Link: https://lore.kernel.org/r/20230517114635.76358-2-manivannan.sadhasivam@linaro.org
2023-05-26 20:56:54 -07:00
Hristo Venev
6c79e42169 EDAC/amd64: Add support for ECC on family 19h model 60h-7Fh
Ryzen 9 7950X uses model 61h. Treat it as Epyc 9004, but with 2 channels
instead of 12.

With two 32GB dual-rank DIMMs the sizes appear to be reported correctly:

  EDAC MC0: Giving out device to module amd64_edac controller F19h_M60h: DEV 0000:00:18.3 (INTERRUPT)
  EDAC amd64: F19h_M60h detected (node 0).
  EDAC MC: UMC0 chip selects:
  EDAC amd64: MC: 0:     0MB 1:     0MB
  EDAC amd64: MC: 2: 16384MB 3: 16384MB
  EDAC MC: UMC1 chip selects:
  EDAC amd64: MC: 0:     0MB 1:     0MB
  EDAC amd64: MC: 2: 16384MB 3: 16384MB
  AMD64 EDAC driver v3.5.0

ECC errors can also be detected:

  mce: [Hardware Error]: Machine check events logged
  [Hardware Error]: Corrected error, no action required.
  [Hardware Error]: CPU:0 (19:61:2) MC21_STATUS[Over|CE|MiscV|AddrV|-|-|SyndV|CECC|-|-|-]: 0xdc2040000400011b
  [Hardware Error]: Error Addr: 0x00000007ff7e93c0
  [Hardware Error]: IPID: 0x0000009600050f00, Syndrome: 0x000100010a801203
  [Hardware Error]: Unified Memory Controller Ext. Error Code: 0, DRAM ECC error.
  EDAC MC0: 1 CE Cannot decode normalized address on mc#0csrow#3channel#0 (csrow:3 channel:0 page:0x0 offset:0x0 grain:64 syndrome:0x1)
  [Hardware Error]: cache level: L3/GEN, tx: GEN, mem-tx: RD

According to Mario Limonciello, the same code should also work for
models 70h-7Fh (follow thread in Link).

  [ bp: Massage, the translation logic updates are pending. ]

Signed-off-by: Hristo Venev <hristo@venev.name>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20230425201239.324476-1-hristo@venev.name
Link: https://lore.kernel.org/r/20230511174506.875153-2-hristo@venev.name
2023-05-15 16:32:47 +02:00
Yazen Ghannam
b34348a0d7 EDAC/amd64: Remove module version string
The AMD64 EDAC module version information is not exposed through ABI
like MODULE_VERSION(). Instead it is printed during module init.

Version numbers can be confusing in cases where module updates are
partly backported resulting in a difference between upstream and
backported module versions.

Remove the AMD64 EDAC module version information to avoid user
confusion.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230410190959.3367528-1-yazen.ghannam@amd.com
2023-05-10 15:49:52 +02:00
Linus Torvalds
556eb8b791 Driver core changes for 6.4-rc1
Here is the large set of driver core changes for 6.4-rc1.
 
 Once again, a busy development cycle, with lots of changes happening in
 the driver core in the quest to be able to move "struct bus" and "struct
 class" into read-only memory, a task now complete with these changes.
 
 This will make the future rust interactions with the driver core more
 "provably correct" as well as providing more obvious lifetime rules for
 all busses and classes in the kernel.
 
 The changes required for this did touch many individual classes and
 busses as many callbacks were changed to take const * parameters
 instead.  All of these changes have been submitted to the various
 subsystem maintainers, giving them plenty of time to review, and most of
 them actually did so.
 
 Other than those changes, included in here are a small set of other
 things:
   - kobject logging improvements
   - cacheinfo improvements and updates
   - obligatory fw_devlink updates and fixes
   - documentation updates
   - device property cleanups and const * changes
   - firwmare loader dependency fixes.
 
 All of these have been in linux-next for a while with no reported
 problems.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZEp7Sw8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ykitQCfamUHpxGcKOAGuLXMotXNakTEsxgAoIquENm5
 LEGadNS38k5fs+73UaxV
 =7K4B
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updates from Greg KH:
 "Here is the large set of driver core changes for 6.4-rc1.

  Once again, a busy development cycle, with lots of changes happening
  in the driver core in the quest to be able to move "struct bus" and
  "struct class" into read-only memory, a task now complete with these
  changes.

  This will make the future rust interactions with the driver core more
  "provably correct" as well as providing more obvious lifetime rules
  for all busses and classes in the kernel.

  The changes required for this did touch many individual classes and
  busses as many callbacks were changed to take const * parameters
  instead. All of these changes have been submitted to the various
  subsystem maintainers, giving them plenty of time to review, and most
  of them actually did so.

  Other than those changes, included in here are a small set of other
  things:

   - kobject logging improvements

   - cacheinfo improvements and updates

   - obligatory fw_devlink updates and fixes

   - documentation updates

   - device property cleanups and const * changes

   - firwmare loader dependency fixes.

  All of these have been in linux-next for a while with no reported
  problems"

* tag 'driver-core-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (120 commits)
  device property: make device_property functions take const device *
  driver core: update comments in device_rename()
  driver core: Don't require dynamic_debug for initcall_debug probe timing
  firmware_loader: rework crypto dependencies
  firmware_loader: Strip off \n from customized path
  zram: fix up permission for the hot_add sysfs file
  cacheinfo: Add use_arch[|_cache]_info field/function
  arch_topology: Remove early cacheinfo error message if -ENOENT
  cacheinfo: Check cache properties are present in DT
  cacheinfo: Check sib_leaf in cache_leaves_are_shared()
  cacheinfo: Allow early level detection when DT/ACPI info is missing/broken
  cacheinfo: Add arm64 early level initializer implementation
  cacheinfo: Add arch specific early level initializer
  tty: make tty_class a static const structure
  driver core: class: remove struct class_interface * from callbacks
  driver core: class: mark the struct class in struct class_interface constant
  driver core: class: make class_register() take a const *
  driver core: class: mark class_release() as taking a const *
  driver core: remove incorrect comment for device_create*
  MIPS: vpe-cmp: remove module owner pointer from struct class usage.
  ...
2023-04-27 11:53:57 -07:00
Linus Torvalds
a907047732 ARM: SoC drivers for v6.4
The most notable updates this time are for Qualcomm Snapdragon platforms.
 The Inline-Crypto-Engine gets a new DT binding and driver. A number of
 drivers now support additional Snapdragon variants, in particular the
 rsc, scm, geni, bwm, glink and socinfo, while the llcc (edac) and rpm
 drivers get notable functionality updates.
 
 Updates on other platforms include:
 
  - Various updates to the Mediatek mutex and mmsys drivers, including
    support for the Helio X10 SoC
 
  - Support for unidirectional mailbox channels in Arm SCMI firmware
 
  - Support for per cpu asynchronous notification in OP-TEE firmware
 
  - Minor updates for memory controller drivers.
 
  - Minor updates for Renesas, TI, Amlogic, Apple, Broadcom, Tegra,
    Allwinner, Versatile Express, Canaan, Microchip, Mediatek and i.MX
    SoC drivers, mainly updating the use of MODULE_LICENSE() macros and
    obsolete DT driver interfaces.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEiK/NIGsWEZVxh/FrYKtH/8kJUicFAmRGmncACgkQYKtH/8kJ
 Uif6ghAAw1TiPTJzJLLCNx+txOVFB62WDglv3T1CufjfcWp0Eh0RJSCcsCOPV+/7
 UHi4+X4nPAcudeOFMFtslCR8ExLRWY4j7t2ZYo/k+VI3jdB8Qkbr6NAQgAuRdLYX
 WZ1cV6o76B3bhO2HqSVNVZ8/3Z7OAYw4j9VDD/4AbW+l3GyentlQTjabpJNREvSS
 5HzT3ZI33o7M8mM4uYmmEXVrg8sCupbRyL9S7jTiFXRLcfqujclhfezJ4UrJJv7b
 wxGf+e2YNMqKH6PiKYufzN1TYI2D0YQeB1m56Y9FsAKxgAyHh2xWpsHeyVnaw0jc
 KaKjRN/H3JDlW/VCMAjQOIShCZdAs02xHnEXxY6pKLMM6i8/FkzzNIxNQwXrx5KH
 zYESXVd6suOI0eCZT8zkKKLHRT5EJRaliUv5Z+Qp2BBe3vJVZD0JqSlZ7lOznplF
 lviwL6ydAMr2cfTgfMxbRiYQVDzncFkfnR3t55SC6rYjGt6QWjeS0dDbGHf4WVC4
 FDbnST4JaBmi+frh55VooX7EpzIv9wa0/taayaChd9qvXnh22uqaqho1sPYKZ6BI
 OXduHQ3qojJhKKKK1VJKzN5Ef3OHLQLNrvcc1DsKILrrES4w4LX1C9dmyh2CLXLo
 q5cX6L1iB1Hx5tujalDYBsHBBmbiT/1tNM2S7pAGigiGy4KEc28=
 =r6jm
 -----END PGP SIGNATURE-----

Merge tag 'soc-drivers-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc

Pull ARM SoC driver updates from Arnd Bergmann:
 "The most notable updates this time are for Qualcomm Snapdragon
  platforms. The Inline-Crypto-Engine gets a new DT binding and driver,
  and a number of drivers now support additional Snapdragon variants, in
  particular the rsc, scm, geni, bwm, glink and socinfo, while the llcc
  (edac) and rpm drivers get notable functionality updates.

  Updates on other platforms include:

   - Various updates to the Mediatek mutex and mmsys drivers, including
     support for the Helio X10 SoC

   - Support for unidirectional mailbox channels in Arm SCMI firmware

   - Support for per cpu asynchronous notification in OP-TEE firmware

   - Minor updates for memory controller drivers.

   - Minor updates for Renesas, TI, Amlogic, Apple, Broadcom, Tegra,
     Allwinner, Versatile Express, Canaan, Microchip, Mediatek and i.MX
     SoC drivers, mainly updating the use of MODULE_LICENSE() macros and
     obsolete DT driver interfaces"

* tag 'soc-drivers-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (165 commits)
  soc: ti: smartreflex: Simplify getting the opam_sr pointer
  bus: vexpress-config: Add explicit of_platform.h include
  soc: mediatek: Kconfig: Add MTK_CMDQ dependency to MTK_MMSYS
  memory: mtk-smi: mt8365: Add SMI Support
  dt-bindings: memory-controllers: mediatek,smi-larb: add mt8365
  dt-bindings: memory-controllers: mediatek,smi-common: add mt8365
  memory: tegra: read values from correct device
  dt-bindings: crypto: Add Qualcomm Inline Crypto Engine
  soc: qcom: Make the Qualcomm UFS/SDCC ICE a dedicated driver
  dt-bindings: firmware: document Qualcomm QCM2290 SCM
  soc: qcom: rpmh-rsc: Support RSC v3 minor versions
  soc: qcom: smd-rpm: Use GFP_ATOMIC in write path
  soc/tegra: fuse: Remove nvmem root only access
  soc/tegra: cbb: tegra194: Use of_address_count() helper
  soc/tegra: cbb: Remove MODULE_LICENSE in non-modules
  ARM: tegra: Remove MODULE_LICENSE in non-modules
  soc/tegra: flowctrl: Use devm_platform_get_and_ioremap_resource()
  soc: tegra: cbb: Drop empty platform remove function
  firmware: arm_scmi: Add support for unidirectional mailbox channels
  dt-bindings: firmware: arm,scmi: Support mailboxes unidirectional channels
  ...
2023-04-25 12:02:16 -07:00
Borislav Petkov (AMD)
ce8ac91130 Merge branches 'edac-drivers', 'edac-amd64' and 'edac-misc' into edac-updates
Combine all queued EDAC changes for submission into v6.4:

* ras/edac-drivers:
  EDAC/i10nm: Add Intel Sierra Forest server support
  EDAC/skx: Fix overflows on the DRAM row address mapping arrays

* ras/edac-amd64: (27 commits)
  EDAC/amd64: Fix indentation in umc_determine_edac_cap()
  EDAC/amd64: Add get_err_info() to pvt->ops
  EDAC/amd64: Split dump_misc_regs() into dct/umc functions
  EDAC/amd64: Split init_csrows() into dct/umc functions
  EDAC/amd64: Split determine_edac_cap() into dct/umc functions
  EDAC/amd64: Rename f17h_determine_edac_ctl_cap()
  EDAC/amd64: Split setup_mci_misc_attrs() into dct/umc functions
  EDAC/amd64: Split ecc_enabled() into dct/umc functions
  EDAC/amd64: Split read_mc_regs() into dct/umc functions
  EDAC/amd64: Split determine_memory_type() into dct/umc functions
  EDAC/amd64: Split read_base_mask() into dct/umc functions
  EDAC/amd64: Split prep_chip_selects() into dct/umc functions
  EDAC/amd64: Rework hw_info_{get,put}
  EDAC/amd64: Merge struct amd64_family_type into struct amd64_pvt
  EDAC/amd64: Do not discover ECC symbol size for Family 17h and later
  EDAC/amd64: Drop dbam_to_cs() for Family 17h and later
  EDAC/amd64: Split get_csrow_nr_pages() into dct/umc functions
  EDAC/amd64: Rename debug_display_dimm_sizes()

* ras/edac-misc:
  EDAC/altera: Remove MODULE_LICENSE in non-module
  EDAC: Sanitize MODULE_AUTHOR strings
  EDAC/amd81[13]1: Remove trailing newline from MODULE_AUTHOR
  EDAC/i5100: Fix typo in comment
  EDAC/altera: Remove redundant error logging

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2023-04-24 09:14:30 +02:00
Qiuxu Zhuo
96ae3995c6 EDAC/i10nm: Add Intel Sierra Forest server support
The Sierra Forest CPU model uses similar memory controller registers as
Granite Rapids server. Add Sierra Forest CPU model ID for EDAC support.

Tested-by: Li Zhang <li4.zhang@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20230410131531.11914-1-qiuxu.zhuo@intel.com
2023-04-10 09:33:51 -07:00
Yang Li
49aba1c589 EDAC/amd64: Fix indentation in umc_determine_edac_cap()
Use consistent indentation to improve the readability and fix:

  drivers/edac/amd64_edac.c:1279 umc_determine_edac_cap() warn: inconsistent indenting

Fixes: f6a4b4a1aa ("EDAC/amd64: Split determine_edac_cap() into dct/umc functions")
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230404022557.46409-1-yang.lee@linux.alibaba.com
2023-04-04 17:22:55 +02:00
Nick Alcock
e088d80e2a EDAC/altera: Remove MODULE_LICENSE in non-module
Since

  8b41fc4454 ("kbuild: create modules.builtin without Makefile.modbuiltin or tristate.conf"),

MODULE_LICENSE declarations are used to identify modules. As
a consequence, uses of the macro in non-modules will cause modprobe to
misidentify their containing object file as a module when it is not
(false positives), and modprobe might succeed rather than failing with
a suitable error message.

altera_edac is not a module for a while now, remove the macro call.

Suggested-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Nick Alcock <nick.alcock@oracle.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230217141059.392471-24-nick.alcock@oracle.com
2023-04-01 13:18:50 +02:00
Borislav Petkov (AMD)
371b27f2f3 EDAC: Sanitize MODULE_AUTHOR strings
Fixup the remaining MODULE_AUTHOR strings to not contain newlines.
Shorten and unbreak others.

No functional changes.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230328134309.23159-1-bp@alien8.de
2023-03-28 15:43:30 +02:00
Jonathan Neuschäfer
01db1030f1 EDAC/amd81[13]1: Remove trailing newline from MODULE_AUTHOR
MODULE_AUTHOR strings don't usually include a newline character.

Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230129165054.1675554-1-j.neuschaefer@gmx.net
2023-03-28 15:26:52 +02:00
Muralidhara M K
b3ece3a6a2 EDAC/amd64: Add get_err_info() to pvt->ops
GPU Nodes will use a different method to determine the chip select
and channel of an error. A function pointer should be used rather than
introduce another branching condition.

Prepare for this by adding get_err_info() to pvt->ops. This function is
only called from the modern code path, so a legacy function is not
defined.

Make sure to call this after MCA_STATUS[SyndV] is checked, since the
csrow value is found in MCA_SYND.

  [ Yazen: rebased/reworked patch and reworded commit message. ]

Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-23-yazen.ghannam@amd.com
2023-03-24 13:03:21 +01:00
Muralidhara M K
f6f36382d6 EDAC/amd64: Split dump_misc_regs() into dct/umc functions
Add a function pointer to pvt->ops.

No functional change is intended.

  [ Yazen: Rebased/reworked patch and reworded commit message. ]

Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-22-yazen.ghannam@amd.com
2023-03-24 13:03:21 +01:00
Muralidhara M K
6fb8b5fb9e EDAC/amd64: Split init_csrows() into dct/umc functions
Call them from their respective setup_mci_misc_attrs() paths.

Also, drop the check for an "empty" device, i.e. one without memory.
This is redundant and already done in instance_has_memory() earlier in
the init path.

No functional change is intended.

  [ Yazen: rebased/reworked patch and reworded commit message. ]

Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-21-yazen.ghannam@amd.com
2023-03-24 13:03:21 +01:00
Muralidhara M K
f6a4b4a1aa EDAC/amd64: Split determine_edac_cap() into dct/umc functions
Call them from their respective setup_mci_misc_attrs() paths.

No functional change is intended.

  [ Yazen: rebased/reworked patch and reworded commit message. ]

Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-20-yazen.ghannam@amd.com
2023-03-24 13:03:21 +01:00
Yazen Ghannam
9369239e8d EDAC/amd64: Rename f17h_determine_edac_ctl_cap()
...to match the "umc_" prefix convention.

No functional change is intended.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-19-yazen.ghannam@amd.com
2023-03-24 13:03:20 +01:00
Muralidhara M K
0a42a37f65 EDAC/amd64: Split setup_mci_misc_attrs() into dct/umc functions
The init_one_instance() path is shared between legacy and modern
systems. So add the new functions to a function pointer in pvt->ops.

No functional change is intended.

  [ Yazen: Rebased/reworked patch and reworded commit message. ]

Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-18-yazen.ghannam@amd.com
2023-03-24 13:03:20 +01:00
Muralidhara M K
eb2bcdfc37 EDAC/amd64: Split ecc_enabled() into dct/umc functions
Call them using a function pointer in pvt->ops. The "ECC enabled"
check is done outside of the hardware information gathering done in
hw_info_get(). So a high-level function pointer is needed to separate
the legacy and modern paths.

No functional change is intended.

  [Yazen: rebased/reworked patch and reworded commit message. ]

Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-17-yazen.ghannam@amd.com
2023-03-24 13:03:20 +01:00
Muralidhara M K
32ecdf8688 EDAC/amd64: Split read_mc_regs() into dct/umc functions
Call them from their respective hw_info_get() paths.

ECC symbol size is not needed on UMC systems, so determine_ecc_sym_sz()
is left out of the UMC path. Do not save TOP_MEM* values on modern
controllers because they're not needed there (read: they were used only
for debugging, if anything).

  [ Yazen: rebased/reworked patch and reworded commit message. ]

Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-16-yazen.ghannam@amd.com
2023-03-24 13:03:20 +01:00
Muralidhara M K
78ec161a91 EDAC/amd64: Split determine_memory_type() into dct/umc functions
Call them from their respective hw_info_get() paths.

Call them after all other hardware registers have been saved, since the
memory type for a device will be determined based on the saved
information.

  [ Yazen: rebased/reworked patch and reworded commit message. ]

Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-15-yazen.ghannam@amd.com
2023-03-24 13:03:20 +01:00
Muralidhara M K
b29dad9bf3 EDAC/amd64: Split read_base_mask() into dct/umc functions
Call them from their respective hw_info_get() paths.

Call the new functions after the setting the chip select base and mask
counts, since those are need to read the correct number of chip select
base and mask registers. And call the new functions before the remaining
set up, because the base and mask register values will be needed later.

  [Yazen: Rebased/reworked patch and reworded commit message. ]

Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-14-yazen.ghannam@amd.com
2023-03-24 13:03:20 +01:00
Muralidhara M K
637f60ef2c EDAC/amd64: Split prep_chip_selects() into dct/umc functions
Call them from their respective hw_info_get() function. Avoid the
need for family/model-based function pointers.

Add the calls before reading hardware registers from the memory
controllers, since the number of chip select bases and masks needs to be
known first.

  [ Yazen: rebased/reworked patch and reworded commit message. ]

Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-13-yazen.ghannam@amd.com
2023-03-24 13:03:20 +01:00
Yazen Ghannam
9a97a7f4d7 EDAC/amd64: Rework hw_info_{get,put}
The bulk of system-specific information is gathered at init time with
hw_info_get(). This function calls a number of helper functions, and
many of these helper functions are split between a modern UMC/DF path
and a legacy DCT path.

Split hw_info_get() into legacy and modern versions. This creates two
separate code paths early on, and legacy and modern helper functions can
be called directly in the appropriate code path.

Also, simplify hw_info_put() and share it between legacy and modern
systems. NULL pointer checks are done in pci_dev_put() and kfree(), so
they can be called unconditionally.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-12-yazen.ghannam@amd.com
2023-03-24 13:03:20 +01:00
Muralidhara M K
ed623d55ee EDAC/amd64: Merge struct amd64_family_type into struct amd64_pvt
Future AMD systems will support heterogeneous "AMD Node" types, e.g.
CPU and GPU types. Therefore, a global family type shared across all
AMD nodes is no longer appropriate.

Move struct low_ops routines and members of struct amd64_family_type
to struct amd64_pvt.

Currently, there are many code branches that split between "modern" and
"legacy" systems. Another code branch will be needed in order to cover
GPU cases. However, rather than introduce another branching case in
multiple functions, the current branching code should be switched to a
set of function pointers. This change makes the code more readable and
simplifies adding support for new families/models.

In order to reuse code, define two sets of function pointers. Use one
for modern systems (Family 17h and later). This will not change between
current CPU families. Use another set of function pointers for legacy
systems (before Family 17h). Use the Family 16h versions as default
for the legacy ops since these are the latest, and adjust the function
pointers as needed for older families.

  [ Yazen: rebased/reworked patch and reworded commit message. ]
  [  bp: Fix rev8 or later check. ]

Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-11-yazen.ghannam@amd.com
2023-03-24 13:03:19 +01:00
Yazen Ghannam
5a1adb375d EDAC/amd64: Do not discover ECC symbol size for Family 17h and later
The ECC symbol size was needed on legacy system to lookup the ECC syndrome.
This is not needed on modern systems because the ECC syndrome is explicitly
provided in the MCA information.

Remove the ECC symbol size discovery code for modern UMC-based systems.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-10-yazen.ghannam@amd.com
2023-03-24 13:03:19 +01:00
Yazen Ghannam
a2e59ab8e9 EDAC/amd64: Drop dbam_to_cs() for Family 17h and later
The same function is used to calculate chip select size for all Zen-based
family/models. Therefore, a family/model function pointer is not necessary.

Drop the dbam_to_cs() function pointer for Family 17h and later systems.
Also, move the Family 17h function to avoid a forward declaration. Rename
it to indicate that the UMC Address Mask is used rather than the legacy
DBAM value.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-9-yazen.ghannam@amd.com
2023-03-24 13:03:19 +01:00
Yazen Ghannam
c0984666fd EDAC/amd64: Split get_csrow_nr_pages() into dct/umc functions
Split get_csrow_nr_pages() into a legacy and modern versions in preparation
for further legacy/modern refactoring.

Also, rename f17_get_cs_mode() to match the new convention.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-8-yazen.ghannam@amd.com
2023-03-24 13:03:19 +01:00
Yazen Ghannam
00e4feb8c0 EDAC/amd64: Rename debug_display_dimm_sizes()
Use the "dct" and "umc" prefixes for legacy and modern versions
respectively.

Also, move the "dct" version to avoid a forward declaration, and fixup
some checkpatch warnings in the process.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-7-yazen.ghannam@amd.com
2023-03-24 12:54:47 +01:00
Jongwoo Han
5b6cb45072 EDAC/i5100: Fix typo in comment
Correct typo from 'preform' to 'perform' in comment.

Signed-off-by: Jongwoo Han <jongwooo.han@gmail.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230302021120.56794-1-jongwooo.han@gmail.com
2023-03-23 12:04:04 +01:00
Greg Kroah-Hartman
cb4a0bec0b EDAC/sysfs: move to use bus_get_dev_root()
Direct access to the struct bus_type dev_root pointer is going away soon
so replace that with a call to bus_get_dev_root() instead, which is what
it is there for.

Cc: Borislav Petkov <bp@alien8.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Robert Richter <rric@kernel.org>
Cc: linux-edac@vger.kernel.org
Link: https://lore.kernel.org/r/20230313182918.1312597-1-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-22 09:25:49 +01:00