Commit Graph

846 Commits

Author SHA1 Message Date
Li Zhijian
49ba7b515c cxl/region: Fix memregion leaks in devm_cxl_add_region()
Move the mode verification to __create_region(), before the memregion is
allocated, to avoid leaking the memregion.

Fixes: 6e09926418 ("cxl/region: Add volatile region creation support")
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Link: https://lore.kernel.org/r/20240507053421.456439-1-lizhijian@fujitsu.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-05-28 16:09:17 -07:00
Steven Rostedt (Google)
2c92ca849f tracing/treewide: Remove second parameter of __assign_str()
With the rework of how __string() handles dynamic strings, where it
saves off the source string in a field of the helper structure[1], the
value assigned to the trace event field is taken from that helper and
does not need to be passed in again.

This means that with:

  __string(field, mystring)

which used to be assigned with __assign_str(field, mystring), the second
parameter is no longer needed as it is unused. With this, __assign_str()
now takes a single parameter.
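
For illustration, a minimal sketch of what the change looks like in a
TRACE_EVENT() definition (the event and field names below are made up;
only the __string()/__assign_str() usage reflects the actual change):

  TRACE_EVENT(sample_event,
          TP_PROTO(const char *name),
          TP_ARGS(name),
          TP_STRUCT__entry(
                  __string(field, name)   /* helper saves 'name' and its length */
          ),
          TP_fast_assign(
                  /* before: __assign_str(field, name); */
                  __assign_str(field);    /* source string comes from the helper */
          ),
          TP_printk("field=%s", __get_str(field))
  );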

There are over 700 users of __assign_str(), and because coccinelle does not
handle the TRACE_EVENT() macro, I ended up using the following sed script:

  git grep -l __assign_str | while read a ; do
      sed -e 's/\(__assign_str([^,]*[^ ,]\) *,[^;]*/\1)/' $a > /tmp/test-file;
      mv /tmp/test-file $a;
  done

I then searched for __assign_str() calls that did not end with ';', as those
were multi-line assignments that the sed script above would fail to catch.

Note, the same updates will need to be done for:

  __assign_str_len()
  __assign_rel_str()
  __assign_rel_str_len()

I tested this with both an allmodconfig and an allyesconfig (build only for both).

[1] https://lore.kernel.org/linux-trace-kernel/20240222211442.634192653@goodmis.org/

Link: https://lore.kernel.org/linux-trace-kernel/20240516133454.681ba6a0@rorschach.local.home

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Julia Lawall <Julia.Lawall@inria.fr>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Acked-by: Christian König <christian.koenig@amd.com> for the amdgpu parts.
Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> #for
Acked-by: Rafael J. Wysocki <rafael@kernel.org> # for thermal
Acked-by: Takashi Iwai <tiwai@suse.de>
Acked-by: Darrick J. Wong <djwong@kernel.org>	# xfs
Tested-by: Guenter Roeck <linux@roeck-us.net>
2024-05-22 20:14:47 -04:00
Linus Torvalds
f0bae243b2 pci-v6.10-changes
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmZLzNIUHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vwr/Q//STe2XGKI8bAKqP2wbbkzm+ISnK4A
 Lqf3FEAIXunxDRspszfXKKV2p4vaIkmOFiwIdtp/kWvd0DQn5+ATXJ/iQtp8aFX/
 R+6BQ7EZc2G7fN5fbQuK54+CvmWEpkKEMbXYbd6ivQ14Cijdb3Nbu+w+DYFjS+6C
 k2a9lS1bTW7Xcy0fyiO1w6GQiWqtmOH8U3OlQtIrI0EVkDG9OG1LsLuc92/FgkOo
 REN+sU+hX1K5fHrvm2CtjYDn/9/B6bJ/It22H1dPgUL9nKvKC67fYzosMtUCOX1M
 6XSPjZIuXOmQGeZXHhpSlVwaidxoUjYO98I7nMquxKdCy6yct3geK7ULG/xeQCgD
 ML7MGQB4+sTiSWalXUQaziKqF1FIDEvU3HMGXFWnoBL5l56eRp8KS1EI9Eqk9pU3
 pk9fJaCkcFnkzPtMFzqPOm5q9zUZ6bGbfYb0hs72TUKplmVDhFo2T1YsW2AOyHZ7
 mjuDzUYZX0H7uM1tntA56IgZX+oNOrLvhBt5L5M/BQeCsZFBBUfIcAEaYoL9LwXO
 AYgIG3jdqzHHyAUzutJF+XHKinJLMHm0XVYbFmO6saPhFzrUJSNHqT7NzW1DGGTl
 OnO8e1WNMX1EcnKvnc6fXyGmM3SgVwy45FsbG/zRnhn4uBKqKtjrh6uX/myA22LK
 CSeqSUK9XmXxFNA=
 =xjoS
 -----END PGP SIGNATURE-----

Merge tag 'pci-v6.10-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci

Pull pci updates from Bjorn Helgaas:
 "Enumeration:

   - Skip E820 checks for MCFG ECAM regions for new (2016+) machines,
     since there's no requirement to describe them in E820 and some
     platforms require ECAM to work (Bjorn Helgaas)

   - Rename PCI_IRQ_LEGACY to PCI_IRQ_INTX to be more specific (Damien
     Le Moal)

   - Remove the last user of pci_enable_device_io() and then the function
     itself (Heiner Kallweit)

   - Wait for Link Training==0 to avoid possible race (Ilpo Järvinen)

   - Skip waiting for devices that have been disconnected while
     suspended (Ilpo Järvinen)

   - Clear Secondary Status errors after enumeration since Master Aborts
     and Unsupported Request errors are an expected part of enumeration
     (Vidya Sagar)

  MSI:

   - Remove unused IMS (Interrupt Message Store) support (Bjorn Helgaas)

  Error handling:

   - Mask Genesys GL975x SD host controller Replay Timer Timeout
     correctable errors caused by a hardware defect; the errors cause
     interrupts that prevent system suspend (Kai-Heng Feng)

   - Fix EDR-related _DSM support, which previously evaluated revision 5
     but assumed revision 6 behavior (Kuppuswamy Sathyanarayanan)

  ASPM:

   - Simplify link state definitions and mask calculation (Ilpo
     Järvinen)

  Power management:

   - Avoid D3cold for HP Pavilion 17 PC/1972 PCIe Ports, where BIOS
     apparently doesn't know how to put them back in D0 (Mario
     Limonciello)

  CXL:

   - Support resetting CXL devices; special handling required because
     CXL Ports mask Secondary Bus Reset by default (Dave Jiang)

  DOE:

   - Support DOE Discovery Version 2 (Alexey Kardashevskiy)

  Endpoint framework:

   - Set endpoint BAR to be 64-bit if the driver says that's all the
     device supports, in addition to doing so if the size is >2GB
     (Niklas Cassel)

   - Simplify endpoint BAR allocation and setting interfaces (Niklas
     Cassel)

  Cadence PCIe controller driver:

   - Drop DT binding redundant msi-parent and pci-bus.yaml (Krzysztof
     Kozlowski)

  Cadence PCIe endpoint driver:

   - Configure endpoint BARs to be 64-bit based on the BAR type, not the
     BAR value (Niklas Cassel)

  Freescale Layerscape PCIe controller driver:

   - Convert DT binding to YAML (Frank Li)

  MediaTek MT7621 PCIe controller driver:

   - Add DT binding missing 'reg' property for child Root Ports
     (Krzysztof Kozlowski)

   - Fix theoretical string truncation in PHY name (Sergio Paracuellos)

  NVIDIA Tegra194 PCIe controller driver:

   - Return success for endpoint probe instead of falling through to the
     failure path (Vidya Sagar)

  Renesas R-Car PCIe controller driver:

   - Add DT binding missing IOMMU properties (Geert Uytterhoeven)

   - Add DT binding R-Car V4H compatible for host and endpoint mode
     (Yoshihiro Shimoda)

  Rockchip PCIe controller driver:

   - Configure endpoint BARs to be 64-bit based on the BAR type, not the
     BAR value (Niklas Cassel)

   - Add DT binding missing maxItems to ep-gpios (Krzysztof Kozlowski)

   - Set the Subsystem Vendor ID, which was previously zero because it
     was masked incorrectly (Rick Wertenbroek)

  Synopsys DesignWare PCIe controller driver:

   - Restructure DBI register access to accommodate devices where this
     requires Refclk to be active (Manivannan Sadhasivam)

   - Remove the deinit() callback, which was only needed by
     pcie-rcar-gen4, and do it directly in that driver (Manivannan
     Sadhasivam)

   - Add dw_pcie_ep_cleanup() so drivers that support PERST# can clean
     up things like eDMA (Manivannan Sadhasivam)

   - Rename dw_pcie_ep_exit() to dw_pcie_ep_deinit() to make it parallel
     to dw_pcie_ep_init() (Manivannan Sadhasivam)

   - Rename dw_pcie_ep_init_complete() to dw_pcie_ep_init_registers() to
     reflect the actual functionality (Manivannan Sadhasivam)

   - Call dw_pcie_ep_init_registers() directly from all the glue
     drivers, not just those that require active Refclk from the host
     (Manivannan Sadhasivam)

   - Remove the "core_init_notifier" flag, which was an obscure way for
     glue drivers to indicate that they depend on Refclk from the host
     (Manivannan Sadhasivam)

  TI J721E PCIe driver:

   - Add DT binding J784S4 SoC Device ID (Siddharth Vadapalli)

   - Add DT binding J722S SoC support (Siddharth Vadapalli)

  TI Keystone PCIe controller driver:

   - Add DT binding missing num-viewport, phys and phy-name properties
     (Jan Kiszka)

  Miscellaneous:

   - Constify and annotate with __ro_after_init (Heiner Kallweit)

   - Convert DT bindings to YAML (Krzysztof Kozlowski)

   - Check for kcalloc() failure in of_pci_prop_intr_map() (Duoming
     Zhou)"

* tag 'pci-v6.10-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (97 commits)
  PCI: Do not wait for disconnected devices when resuming
  x86/pci: Skip early E820 check for ECAM region
  PCI: Remove unused pci_enable_device_io()
  ata: pata_cs5520: Remove unnecessary call to pci_enable_device_io()
  PCI: Update pci_find_capability() stub return types
  PCI: Remove PCI_IRQ_LEGACY
  scsi: vmw_pvscsi: Do not use PCI_IRQ_LEGACY instead of PCI_IRQ_LEGACY
  scsi: pmcraid: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
  scsi: mpt3sas: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
  scsi: megaraid_sas: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
  scsi: ipr: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
  scsi: hpsa: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
  scsi: arcmsr: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
  wifi: rtw89: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
  dt-bindings: PCI: rockchip,rk3399-pcie: Add missing maxItems to ep-gpios
  Revert "genirq/msi: Provide constants for PCI/IMS support"
  Revert "x86/apic/msi: Enable PCI/IMS"
  Revert "iommu/vt-d: Enable PCI/IMS"
  Revert "iommu/amd: Enable PCI/IMS"
  Revert "PCI/MSI: Provide IMS (Interrupt Message Store) support"
  ...
2024-05-21 10:09:28 -07:00
Linus Torvalds
2e9250022e CXL changes for v6.10 merge window
Topics:
 - Add CXL log related mailbox commands
   - Add Get Log Capabilities command
   - Add Get Supported Log Sub-List Commands command
   - Add Clear Log command
 - Add series for DPA to HPA translation for the CXL events cxl_dram and cxl_general_media
 - Add support to send CPER records to CXL for more detailed parsing.
 
 Misc changes and fixes:
 - Fix for compile warning of cxl_security_ops
 - Add debug message for invalid interleave granularity
 - Enhancement to cxl-test event testing
 - Add dev_warn() on unsupported mixed mode decoder
 - Fix use of phys_to_target_node() for x86
 - Use helper function for decoder enum instead of open coding
 - Include missing headers for cxl-event
 - Fix MAINTAINERS file entry
 - Fix cxlr_pmem memory leak
 - Cleanup __cxl_parse_cfmws via scope-based resource management
 - Convert cxl_pmem_region_alloc() to scope-based resource management
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE5DAy15EJMCV1R6v9YGjFFmlTOEoFAmZE5YcACgkQYGjFFmlT
 OEo5ww/+KaerkOcC+B11R7+HHf4yfnbbOzBmBFIrRncWtaSoObq4CJyvjmG3cWL8
 bP+SQ4MazruyPgW0MFUBBeTvJTXUrXDbv0vcCLLRPgcHAP/PROMh7u8LueCFXo/T
 vd0II5OtmBm8aub3Ty5nNNdkgo1/rRmtlCLnBUD5q/om11ZbXWaEZS1ivBC9Dzlf
 qVRe4EYYEldrQWeqRI8LNZIstNtYF/ifzN0T1DQ4ndPuc/9r+s2YLyohcVn1tCZx
 mqeNj5wBldiLPe0QL8perqDbgJzZj8JX9rXgdj+4xUQcnoCBLMxiSn5XFmbIKgzf
 c0Y9LeNR3Y3Ivy57Qd/dakeI6yhz9F0J8MR0ifEyHcmFNCXT13mikN0OoAuSeVRT
 pkGuK/V8D3eF8a3wWn6CT4mJGdGDC8v2DDH2WK75+JrNAfgi4+V6GYUu1bHZMofY
 C0BEolMBZQMmWtR+CYOYm8UAHSRfDyQg55JFaxbbQkc/PekY8TMVR4klUA/+54az
 VL3RONeBxAYUksEoN7vVDhyAGCe1uiW3NyZOlLyRVqtB+X8ZFdh5eXxzhxxBaLKo
 W1M7xo1nDlc/SAqIFOfU9mt99krVJyBi12IuFeTKG9tgO2nYhoazbN9Q4W3wbyEj
 KU84YPFpuwEA9fIyRd8n7vDfe3QbMVL3Xt84MlMcHLlFgBSeUKE=
 =rCAo
 -----END PGP SIGNATURE-----

Merge tag 'cxl-for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl

Pull CXL updates from Dave Jiang:

 - Three CXL mailbox passthrough commands are added to support the
   populating and clearing of vendor debug logs:
     - Get Log Capabilities
     - Get Supported Log Sub-List Commands
     - Clear Log

 - Add support for Device Physical Address (DPA) to Host Physical Address
   (HPA) translation for the cxl_dram and cxl_general_media CXL events.

   This allows user space to figure out, via the trace event, which CXL
   region the event occurred in.

 - Connect CXL to CPER reporting.

   If a device is configured for firmware first, CXL event records are
   not sent directly to the host. Those records are reported through EFI
   Common Platform Error Records (CPER). Add support to route the CPER
   records through the CXL sub-system in order to provide DPA to HPA
   translation and also event decoding and tracing. This is useful for
   users to determine which system issues may correspond to specific
   hardware events.

 - A number of misc cleanups and fixes:
     - Fix for compile warning of cxl_security_ops
     - Add debug message for invalid interleave granularity
     - Enhancement to cxl-test event testing
     - Add dev_warn() on unsupported mixed mode decoder
     - Fix use of phys_to_target_node() for x86
     - Use helper function for decoder enum instead of open coding
     - Include missing headers for cxl-event
     - Fix MAINTAINERS file entry
     - Fix cxlr_pmem memory leak
     - Cleanup __cxl_parse_cfmws via scope-based resource management
     - Convert cxl_pmem_region_alloc() to scope-based resource management

* tag 'cxl-for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (21 commits)
  cxl/cper: Remove duplicated GUID defines
  cxl/cper: Fix non-ACPI-APEI-GHES build
  cxl/pci: Process CPER events
  acpi/ghes: Process CXL Component Events
  cxl/region: Convert cxl_pmem_region_alloc to scope-based resource management
  cxl/acpi: Cleanup __cxl_parse_cfmws()
  cxl/region: Fix cxlr_pmem leaks
  cxl/core: Add region info to cxl_general_media and cxl_dram events
  cxl/region: Move cxl_trace_hpa() work to the region driver
  cxl/region: Move cxl_dpa_to_region() work to the region driver
  cxl/trace: Correct DPA field masks for general_media & dram events
  MAINTAINERS: repair file entry in COMPUTE EXPRESS LINK
  cxl/cxl-event: include missing <linux/types.h> and <linux/uuid.h>
  cxl/hdm: Debug, use decoder name function
  cxl: Fix use of phys_to_target_node() for x86
  cxl/hdm: dev_warn() on unsupported mixed mode decoder
  cxl/test: Enhance event testing
  cxl/hdm: Add debug message for invalid interleave granularity
  cxl: Fix compile warning for cxl_security_ops extern
  cxl/mbox: Add Clear Log mailbox command
  ...
2024-05-15 14:32:27 -07:00
Dave Jiang
934edcd436 cxl: Add post-reset warning if reset results in loss of previously committed HDM decoders
Secondary Bus Reset (SBR) is equivalent to a device being hot removed and
inserted again. Doing an SBR on a CXL type 3 device is problematic if the
exported device memory is part of system memory that cannot be offlined.
The event is equivalent to violently ripping out that range of memory from
the kernel. While the hardware requires the "Unmask SBR" bit to be set in
the Port Control Extensions register and the kernel currently does not
unmask it, a user can unmask this bit via setpci or a similar tool.

The driver does not have a way to detect whether a reset coming from the
PCI subsystem is a Function Level Reset (FLR) or an SBR. The only way to
detect it is to note that a decoder is marked as enabled in software while
the decoder control register indicates it is not committed.

Add a helper function to find a discrepancy between the decoder software
state and the hardware register state.
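
A rough sketch of the kind of check described above (the helper name and
placement are illustrative; the register and flag macros are the ones
already defined in drivers/cxl/cxl.h):

  static bool decoder_reset_detected(struct cxl_decoder *cxld,
                                     void __iomem *hdm, int which)
  {
          u32 ctrl = readl(hdm + CXL_HDM_DECODER0_CTRL_OFFSET(which));

          /* enabled in software, but no longer committed in hardware */
          return (cxld->flags & CXL_DECODER_F_ENABLE) &&
                 !FIELD_GET(CXL_HDM_DECODER0_CTRL_COMMITTED, ctrl);
  }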

Suggested-by: Dan Williams <dan.j.williams@intel.com>
Link: https://lore.kernel.org/r/20240502165851.1948523-6-dave.jiang@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
2024-05-08 13:25:46 -05:00
Dave Jiang
962f1e79e7 PCI/CXL: Move CXL Vendor ID to pci_ids.h
Move PCI_DVSEC_VENDOR_ID_CXL from CXL private code to PCI_VENDOR_ID_CXL in
pci_ids.h so that it can be used by the PCI subsystem.

While the CXL Vendor ID (0x1e98) is not listed in the PCI SIG "Member
Companies" database at https://pcisig.com/membership/member-companies, the
SIG has confirmed that it is reserved by CXL.
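
The net effect is a single new define in include/linux/pci_ids.h (value per
the commit text above):

  #define PCI_VENDOR_ID_CXL               0x1e98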

Link: https://lore.kernel.org/r/20240502165851.1948523-2-dave.jiang@intel.com
Suggested-by: Bjorn Helgaas <helgaas@kernel.org>
Link: https://lore.kernel.org/linux-cxl/20240402172323.GA1818777@bhelgaas/
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
[bhelgaas: update commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
2024-05-08 13:18:33 -05:00
Dave Jiang
df2a8f4b44 Merge remote-tracking branch 'cxl/for-6.10/cper' into cxl-for-next
Add support to send CPER records to CXL for more detailed parsing.
2024-05-01 09:53:32 -07:00
Ira Weiny
c19ac30eda cxl/pci: Process CPER events
If the firmware has configured CXL event support to be firmware first,
the OS will receive those events through CPER records. The CXL layer has
unique DPA-to-HPA knowledge and existing event trace parsing in
place [0].

Add a CXL CPER work item and register it with the GHES code to process
CPER events.

Link: http://lore.kernel.org/r/cover.1711598777.git.alison.schofield@intel.com [0]
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Tested-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Reviewed-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Link: https://lore.kernel.org/r/20240426-cxl-cper3-v4-2-58076cce1624@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-05-01 09:24:08 -07:00
Dan Williams
d357dd8ad2 cxl/region: Convert cxl_pmem_region_alloc to scope-based resource management
A recent bugfix to cxl_pmem_region_alloc() for an error-unwind memleak [1]
highlighted a use case for scope-based resource management.

Delete the goto for releasing @cxl_region_rwsem, and return error codes
directly from error condition paths.

The caller, devm_cxl_add_pmem_region(), is no longer given @cxlr_pmem
directly; it must retrieve it from @cxlr->cxlr_pmem. This retrieval from
@cxlr was already in place for @cxlr->cxl_nvb, and converting
cxl_pmem_region_alloc() to return an int makes it less awkward to handle
no_free_ptr().
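
A minimal sketch of the scope-based pattern (from <linux/cleanup.h>) that
this conversion relies on; the function below is illustrative, not the
actual cxl_pmem_region_alloc() body:

  static int example_alloc(struct cxl_region *cxlr)
  {
          struct cxl_pmem_region *cxlr_pmem __free(kfree) =
                  kzalloc(sizeof(*cxlr_pmem), GFP_KERNEL);

          guard(rwsem_read)(&cxl_region_rwsem);     /* released on scope exit */

          if (!cxlr_pmem)
                  return -ENOMEM;

          /* ... error paths simply 'return err;' with no goto/unwind ... */

          cxlr->cxlr_pmem = no_free_ptr(cxlr_pmem); /* transfer ownership */
          return 0;
  }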

Cc: Li Zhijian <lizhijian@fujitsu.com>
Reported-by: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
Closes: http://lore.kernel.org/r/20240430174540.000039ce@Huawei.com
Link: http://lore.kernel.org/r/20240428030748.318985-1-lizhijian@fujitsu.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/171451430965.1147997.15782562063090960666.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-05-01 09:17:04 -07:00
Dan Williams
e4ff70a8e3 cxl/acpi: Cleanup __cxl_parse_cfmws()
As a follow-on to the recent rework of __cxl_parse_cfmws() to always
return errors [1], use cleanup.h helpers to remove the goto and apply other
cleanups now that logging has moved to the cxl_parse_cfmws() wrapper.

This ends up adding more code than it deletes, but __cxl_parse_cfmws()
itself does get smaller. The takeaway from the cond_no_free_ptr()
discussion [2] was to not add new macros to handle the cases where
no_free_ptr() is awkward, but instead to rework the code to have helpers
and a clearer delineation of responsibility.

Now one might say that __free(del_cxl_resource) is excessive given it
is immediately registered with add_or_reset_cxl_resource(). The
rationale for keeping it is that it forces use of "no_free_ptr()" on the
argument passed to add_or_reset_cxl_resource(). That in turn makes it
clear that @res is NULL for the rest of the function, which is part of
the point of the cleanup helpers: turning subtle use-after-free errors
[3] into loud NULL pointer dereferences.

Link: http://lore.kernel.org/r/170820177238.631006.1012639681618409284.stgit@dwillia2-xfh.jf.intel.com [1]
Link: http://lore.kernel.org/r/CAHk-=whBVhnh=KSeBBRet=E7qJAwnPR_aj5em187Q3FiD+LXnA@mail.gmail.com [2]
Link: http://lore.kernel.org/r/20230714093146.2253438-1-leitao@debian.org [3]
Reported-by: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
Closes: http://lore.kernel.org/r/20240219124041.00002bda@Huawei.com
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/171235474028.2718248.14109646123143505522.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-05-01 09:01:14 -07:00
Li Zhijian
1c987cf22d cxl/region: Fix cxlr_pmem leaks
Before this error path, cxlr_pmem points to kzalloc()'d memory; free it
to avoid leaking that memory.

Fixes: f17b558d66 ("cxl/pmem: Refactor nvdimm device registration, delete the workqueue")
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20240428030748.318985-1-lizhijian@fujitsu.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-04-30 14:04:52 -07:00
Dave Jiang
660c0a8679 Merge remote-tracking branch 'cxl/for-6.10/dpa-to-hpa' into cxl-for-next
Support for DPA to HPA translation for the CXL events cxl_dram and
cxl_general_media.
2024-04-30 13:45:43 -07:00
Alison Schofield
6aec00139d cxl/core: Add region info to cxl_general_media and cxl_dram events
User space may need to know which region, if any, maps the DPAs
(device physical addresses) reported in a cxl_general_media or
cxl_dram event. Since the mapping can change, the kernel provides
this information at the time the event occurs. This informs user
space that at event <timestamp> this <region> mapped this <DPA>
to this <HPA>.

Add the same region info that is included in the cxl_poison trace
event: the DPA->HPA translation, region name, and region uuid.

The new fields are inserted in the trace event and no existing
fields are modified. If the DPA is not mapped, the user will see:
hpa=ULLONG_MAX, region="", and uuid=0

This work must be protected by dpa_rwsem & region_rwsem since
it is looking up region mappings.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/dd8d708b7a7ebfb64a27020a5eb338091336b34d.1714496730.git.alison.schofield@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-04-30 12:24:44 -07:00
Alison Schofield
86954ff503 cxl/region: Move cxl_trace_hpa() work to the region driver
This work belongs in the region driver as it is only useful with
CONFIG_CXL_REGION. Add a stub in core.h for when the region driver
is not built.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/183222631f11a43c5e6debc42ec22fe1bd4b818a.1714496730.git.alison.schofield@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-04-30 12:24:42 -07:00
Alison Schofield
b98d042698 cxl/region: Move cxl_dpa_to_region() work to the region driver
This helper belongs in the region driver as it is only useful
with CONFIG_CXL_REGION. Add a stub in core.h for when the region
driver is not built.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/05e30f788d62b3dd398aff2d2ea50a6aaa7c3313.1714496730.git.alison.schofield@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-04-30 12:24:42 -07:00
Alison Schofield
2042d11cb5 cxl/trace: Correct DPA field masks for general_media & dram events
The Physical Address field in General Media and DRAM event records
is 64 bits, so the field mask for extracting the DPA should also be
64 bits; otherwise the trace event reports DPAs with the upper 32 bits
of the address masked off. If users do DPA-to-HPA translations this
could lead to incorrect page retirement decisions.

Use GENMASK_ULL() for CXL_DPA_MASK to get all the DPA address bits.

Tidy up CXL_DPA_FLAGS_MASK by using GENMASK() to only mask the exact
flag bits.

These bits are defined as part of the event record physical address
descriptions of General Media and DRAM events in CXL Spec 3.1
Section 8.2.9.2 Events.
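
The shape of the fix, as a sketch (the bit positions follow the CXL 3.1
Section 8.2.9.2 physical address layout; treat the values as illustrative
rather than a verbatim copy of drivers/cxl/core/trace.h):

  /* CXL_DPA_MASK previously ended up as a 32-bit mask, truncating the DPA */
  #define CXL_DPA_FLAGS_MASK      GENMASK(1, 0)        /* only the flag bits */
  #define CXL_DPA_MASK            GENMASK_ULL(63, 6)   /* full 64-bit DPA field */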

Fixes: d54a531a43 ("cxl/mem: Trace General Media Event Record")
Co-developed-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/2867fc43c57720a4a15a3179431829b8dbd2dc16.1714496730.git.alison.schofield@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-04-30 12:24:26 -07:00
Dave Jiang
db4fdb73f9 Merge remote-tracking branch 'cxl/for-6.10/add-log-mbox-cmds' into cxl-for-next
Add CXL log related mailbox commands
- Add Get Log Capabilities command
- Add Get Supported Log Sub-List Commands command
- Add Clear Log command
2024-04-30 10:46:21 -07:00
Ira Weiny
6ef37af6f4 cxl/hdm: Debug, use decoder name function
The decoder enum has a name conversion function defined now.

Use that instead of open coding.

Suggested-by: Navneet Singh <navneet.singh@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20230604-dcd-type2-upstream-v2-1-f740c47e7916@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-04-30 10:43:48 -07:00
Robert Richter
4cce9c6d4b cxl: Fix use of phys_to_target_node() for x86
The CXL driver uses both functions phys_to_target_node() and
memory_add_physaddr_to_nid(). The x86 architecture relies on the
NUMA_KEEP_MEMINFO kernel option enabled for both functions to work
correct. Update Kconfig to make sure the option is always enabled for
the driver.

Suggested-by: Dan Williams <dan.j.williams@intel.com>
Link: http://lore.kernel.org/r/65f8b191c0422_aa222941b@dwillia2-mobl3.amr.corp.intel.com.notmuch
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Robert Richter <rrichter@amd.com>
Link: https://lore.kernel.org/r/20240424154756.2152614-1-rrichter@amd.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-04-30 10:43:48 -07:00
Alison Schofield
4afaed94bc cxl/hdm: dev_warn() on unsupported mixed mode decoder
A mixed mode decoder is programmed with device physical addresses
that span both ram and pmem partitions of a memdev.

Linux does not support mixed mode decoders. The driver rejects
sysfs writes that try to set the decoder mode to mixed, and a
resource being allocated that is not wholly contained in either the
pmem or ram partition of a memdev is also rejected. Basically,
the CXL region driver is not going to create regions with mixed
mode decoders, but the BIOS could.

If the kernel driver sees a mixed mode decoder, it will fail to
enable the region and emit a dev_dbg() message.

A dev_dbg() is not noisy enough in this case. Change the message
to be a dev_warn() that explicitly says mixed mode is not supported.

Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20230218013834.31237-1-alison.schofield@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-04-30 10:43:48 -07:00
Huang Ying
54e8dd59a7 cxl/hdm: Add debug message for invalid interleave granularity
There is no debug message for an invalid interleave granularity, which
makes related bugs hard to debug. Add one in this patch.

Signed-off-by: Huang, Ying <ying.huang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20240402061016.388408-1-ying.huang@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-04-30 10:43:48 -07:00
Dave Jiang
3381586a40 cxl: Fix compile warning for cxl_security_ops extern
Jonathan reported observing a compiler warning when building with W=1 C=1
for cxl_security_ops, which is declared as an extern in cxl/pmem.c.
Move the declaration to cxl.h to make it visible to all cxl sources.

Suggested-by: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Tested-by: Alison Schofield <alison.schofield@intel.com>
Link: https://lore.kernel.org/linux-cxl/167771196186.3285982.18283746206612049722.stgit@djiang5-mobl3.local/
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-04-30 10:43:48 -07:00
Srinivasulu Thanneeru
206f9fa9d5 cxl/mbox: Add Clear Log mailbox command
Add UAPI support for the CXL r3.1 8.2.9.5.4 Clear Log command.

This is useful for clearing and re-populating the Vendor debug log in
certain scenarios, allowing for the aggregation of results over time.

Signed-off-by: Srinivasulu Thanneeru <sthanneeru.opensrc@micron.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20240313071218.729-3-sthanneeru.opensrc@micron.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-04-30 08:48:10 -07:00
Srinivasulu Thanneeru
940325add1 cxl/mbox: Add Get Log Capabilities and Get Supported Logs Sub-List commands
Add UAPI support for:
1. CXL r3.1 8.2.9.5.3 Get Log Capabilities.
2. CXL r3.1 8.2.9.5.6 Get Supported Logs Sub-List.

Signed-off-by: Srinivasulu Thanneeru <sthanneeru.opensrc@micron.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20240313071218.729-2-sthanneeru.opensrc@micron.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-04-30 08:48:10 -07:00
Dave Jiang
5d211c7090 cxl: Fix cxl_endpoint_get_perf_coordinate() support for RCH
Robert reported the following when booting a CXL host with Restricted CXL
Host (RCH) topology:
 [   39.815379] cxl_acpi ACPI0017:00: not a cxl_port device
 [   39.827123] WARNING: CPU: 46 PID: 1754 at drivers/cxl/core/port.c:592 to_cxl_port+0x56/0x70 [cxl_core]

... plus some related subsequent NULL pointer dereference:

 [   40.718708] BUG: kernel NULL pointer dereference, address: 00000000000002d8

The iterator to walk the PCIe path did not account for the RCH topology.
However, RCH does not support hotplug, and the memory exported by the
Restricted CXL Device (RCD) should be covered by HMAT, so no
access_coordinate is needed. Add a check to see if the endpoint device is
an RCD and skip the calculation.

Also add a call to cxl_endpoint_get_perf_coordinates() in cxl_test in order
to exercise the topology iterator. The dev_is_pci() check added is to help
with this test and should be harmless for normal operation.

Reported-by: Robert Richter <rrichter@amd.com>
Closes: https://lore.kernel.org/all/Ziv8GfSMSbvlBB0h@rric.localdomain/
Fixes: 592780b839 ("cxl: Fix retrieving of access_coordinates in PCIe path")
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Tested-by: Robert Richter <rrichter@amd.com>
Reviewed-by: Robert Richter <rrichter@amd.com>
Link: https://lore.kernel.org/r/20240426224913.1027420-1-dave.jiang@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-04-29 09:03:26 -07:00
Dan Williams
4b759dd576 cxl/core: Fix potential payload size confusion in cxl_mem_get_poison()
A recent change to cxl_mem_get_records_log() [1] highlighted a subtle
nuance of looping calls to cxl_internal_send_cmd(), i.e. that
cxl_internal_send_cmd() modifies the 'size_out' member of the @mbox_cmd
argument. That mechanism is useful for communicating underflow, but it
is unwanted when reusing @mbox_cmd for a subsequent submission. It turns
out that cxl_xfer_log() avoids this scenario by always redefining
@mbox_cmd each iteration.

Update cxl_mem_get_records_log() and cxl_mem_get_poison() to follow the
same style as cxl_xfer_log(), i.e. re-define @mbox_cmd each iteration.
The cxl_mem_get_records_log() change is just a style fixup, but the
cxl_mem_get_poison() change is a potential fix, per Alison [2]:

    Poison list retrieval can hit this case if the MORE flag is set and
    a follow on read of the list delivers more records than the previous
    read.  ie. device gives one record, sets the _MORE flag, then gives 5.

Not an urgent fix since this behavior has not been seen in the wild,
but worth tracking as a fix.
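
Roughly, the style being standardized on looks like this (a simplified
sketch of the poison-list loop; @pi/@po and the surrounding setup are
omitted, and the initializers use the existing struct cxl_mbox_cmd members):

  do {
          struct cxl_mbox_cmd mbox_cmd = (struct cxl_mbox_cmd) {
                  .opcode = CXL_MBOX_OP_GET_POISON,
                  .size_in = sizeof(*pi),
                  .payload_in = pi,
                  .size_out = mds->payload_size,  /* re-initialized every pass */
                  .payload_out = po,
                  .min_out = struct_size(po, record, 0),
          };
          int rc = cxl_internal_send_cmd(mds, &mbox_cmd);

          if (rc)
                  return rc;

          /* ... trace the returned po->record[] entries ... */
  } while (po->flags & CXL_POISON_FLAG_MORE);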

Cc: Kwangjin Ko <kwangjin.ko@sk.com>
Cc: Alison Schofield <alison.schofield@intel.com>
Fixes: ed83f7ca39 ("cxl/mbox: Add GET_POISON_LIST mailbox command")
Link: http://lore.kernel.org/r/20240402081404.1106-2-kwangjin.ko@sk.com [1]
Link: http://lore.kernel.org/r/ZhAhAL/GOaWFrauw@aschofie-mobl2 [2]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Link: https://lore.kernel.org/r/171235441633.2716581.12330082428680958635.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-04-22 08:58:59 -07:00
Linus Torvalds
586b5dfb51 cxl fixes for v6.9-rc4
- Fix index of Clear Event Record handles in cxl_clear_event_record().
 - Fix use before init of map->reg_type in cxl_decode_regblock().
 - Fix initialization of mbox_cmd.size_out in cxl_mem_get_records_log().
 - Series fixing CXL path access_coordinate computation:
   - Remove unneeded check of iter in loop.
   - Fix retrieval of access_coordinate in the PCIe topology walk.
   - Fix incorrect region access_coordinate data calculation.
   - Consolidate access_coordinates attached to downstream port
     context.
   - Add check to validate access_coordinate validity to prevent
     incorrect data being exposed via sysfs.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE5DAy15EJMCV1R6v9YGjFFmlTOEoFAmYYa+YACgkQYGjFFmlT
 OErU2A/+MOjbUrgHAm2NECLR2SrXb7JHJA6J5glaWLwjUpuV97BopHpEAU5Whlf4
 sLk5o1j7DcNjKDBQQTtBvefYDdfzMQGS2amZdu9Z7FZJtWW1DRiVYjuKdMS3y4mC
 I6U0jRHWp6ojhf6Wa/09LYrRzOxu+sPLV8t3MGkkIpdYFwunJXl169H22EIjCuWD
 BALjl2jCqKSPIwxZMnM7hR817s1z6sDM25XK2Wr1oSCGaIeV0uGvZyx0PnY1jFSG
 z2iFN/ZntivbT554JTNEFMeHheOlkzZL7liy5QZRGCKmrfTM0WnVtFyMWGpQ85XI
 GMoi/xSCDozrmOk3aMTPqyhabCX9VGdUO5IZDyiMwfofCXKZQrGs6IbzQLvTC/MV
 Ngtzb8CExvel+N24UAiWDBilhsgvzrRLCBRWc8Scl08cGXF0/C+n2+Nq3brTqAaP
 aDn4Zj9IOpSG0POawN4mqLb90A7JkbCNux35ssQ6b/lXVjIe7uRqrmlFcXMnV6ja
 dQ1fw5dxZBCr1wtTSOOAqOqVt1XNw16VP85nmQ6SwWed++4Ja2U/cMZVwtRLnf1A
 sz53Po209RJODhwjzyQ5kxj6oTss3voqQ7MlVUiWrOnXPohQsGMRk3gGW/fC1DG3
 prvNFZlHfeeWyw5H7goJ+Newx/fcY991ytuI9X7II/cG+TLQ0cM=
 =SGRd
 -----END PGP SIGNATURE-----

Merge tag 'cxl-fixes-6.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl

Pull cxl fixes from Dave Jiang:

 - Fix index of Clear Event Record handles in cxl_clear_event_record()

 - Fix use before init of map->reg_type in cxl_decode_regblock()

 - Fix initialization of mbox_cmd.size_out in cxl_mem_get_records_log()

 - Fix CXL path access_coordinate computation:
     - Remove unneeded check of iter in loop
     - Fix retrieval of access_coordinate in the PCIe topology walk
     - Fix incorrect region access_coordinate data calculation
     - Consolidate access_coordinates attached to downstream port
       context
     - Add check to validate access_coordinate validity to prevent
       incorrect data being exposed via sysfs

* tag 'cxl-fixes-6.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
  cxl: Add checks to access_coordinate calculation to fail missing data
  cxl: Consolidate dport access_coordinate ->hb_coord and ->sw_coord into ->coord
  cxl: Fix incorrect region perf data calculation
  cxl: Fix retrieving of access_coordinates in PCIe path
  cxl: Remove checking of iter in cxl_endpoint_get_perf_coordinates()
  cxl/core: Fix initialization of mbox_cmd.size_out in get event
  cxl/core/regs: Fix usage of map->reg_type in cxl_decode_regblock() before assigned
  cxl/mem: Fix for the index of Clear Event Record Handle
2024-04-11 16:49:11 -07:00
Dave Jiang
7bcf809b1e cxl: Add checks to access_coordinate calculation to fail missing data
Jonathan noted that the coordinates for host bridges and switches can be
0s if no actual data were retrieved, yet the calculation continues. The
resulting number would be inaccurate. Add checks to ensure that the
calculation completes only if the numbers are valid.

While not seen in the wild, the issue may show up with a BIOS that reports
CXL root ports via Generic Ports (via a PCI handle in the SRAT entry).

Fixes: 14a6960b3e ("cxl: Add helper function that calculate performance data for downstream ports")
Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Link: https://lore.kernel.org/r/20240403154844.3403859-6-dave.jiang@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-04-08 08:25:21 -07:00
Dave Jiang
001c5d1934 cxl: Consolidate dport access_coordinate ->hb_coord and ->sw_coord into ->coord
The driver stores the host bridge access_coordinate in ->hb_coord and the
switch CDAT access_coordinate in ->sw_coord. Since neither of these
access_coordinates clobbers the other, they can be consolidated
into ->coord to simplify the code.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Link: https://lore.kernel.org/r/20240403154844.3403859-5-dave.jiang@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-04-08 08:25:21 -07:00
Dave Jiang
51293c565c cxl: Fix incorrect region perf data calculation
The current math in cxl_region_perf_data_calculate() divides the latency by
1000 every time the function is called. This causes the region latency to be
divided by 1000 per memory device, which is incorrect. This is user
visible: the latency access_coordinate exposed via sysfs shows
incorrect latency data.

Normalize values from CDAT to nanoseconds. Adjust sub-nanoseconds latency
to at least 1. Remove adjustment of perf numbers from the generic target
since hmat handling code has already normalized those numbers. Now all
computation and stored numbers should be in nanoseconds.

cxl_hb_get_perf_coordinates() is removed and HB coordinates are calculated
in the port access_coordinate calculation path since they no longer need
to be treated specially.

Fixes: 3d9f4a1972 ("cxl/region: Calculate performance data for a region")
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Link: https://lore.kernel.org/r/20240403154844.3403859-4-dave.jiang@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-04-08 08:25:21 -07:00
Dave Jiang
592780b839 cxl: Fix retrieving of access_coordinates in PCIe path
The current loop in cxl_endpoint_get_perf_coordinates() incorrectly assumes
the Root Port (RP) dport is the one with the generic port access_coordinate.
However, those coordinates are one level up, in the Host Bridge (HB).
The current code causes the computation to pick up 0s as the coordinates,
which causes the minimal bandwidth to result in 0.

Add a check to skip the RP when combining coordinates.

Fixes: 14a6960b3e ("cxl: Add helper function that calculate performance data for downstream ports")
Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Link: https://lore.kernel.org/r/20240403154844.3403859-3-dave.jiang@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-04-08 08:24:45 -07:00
Dave Jiang
648dae58a8 cxl: Remove checking of iter in cxl_endpoint_get_perf_coordinates()
The while() loop in cxl_endpoint_get_perf_coordinates() checks whether
'iter' is valid as part of the condition for breaking out of the loop.
is_cxl_root() will stop the loop before the next iteration could go NULL.
Remove the iter check.

Keeping or removing the iter check does not impact the behavior of the
code. This is a code cleanup and not a bug fix.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Link: https://lore.kernel.org/r/20240403154844.3403859-2-dave.jiang@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-04-05 08:15:50 -07:00
Kwangjin Ko
f7c52345cc cxl/core: Fix initialization of mbox_cmd.size_out in get event
Since mbox_cmd.size_out is overwritten with the actual output size in
the function below, it needs to be initialized every time.

cxl_internal_send_cmd -> __cxl_pci_mbox_send_cmd

Problem scenario:

1) The size_out variable is initially set to the size of the mailbox.
2) Read an event.
   - size_out is set to 160 bytes(header 32B + one event 128B).
   - Two events are created while reading.
3) Read the new *two* events.
   - size_out is still set to 160 bytes.
   - Although the value of out_len is 288 bytes, only 160 bytes are
     copied from the mailbox register to the local variable.
   - record_count is set to 2.
   - Accessing records[1] will result in reading incorrect data.

Fixes: 6ebe28f9ec ("cxl/mem: Read, trace, and clear events on driver load")
Tested-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Kwangjin Ko <kwangjin.ko@sk.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-04-03 14:25:32 -07:00
Dave Jiang
5c88a9ccd4 cxl/core/regs: Fix usage of map->reg_type in cxl_decode_regblock() before assigned
In the error path, map->reg_type is used in a kernel warning before its
value is set up. Found by code inspection. The user-visible impact is that
the wrong reg_type is emitted via the kernel log. Use a local variable for
reg_type and retrieve the value before use.

Fixes: 6c7f4f1e51 ("cxl/core/regs: Make cxl_map_{component, device}_regs() device generic")
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-03-26 12:06:21 -07:00
Yuquan Wang
b7c59b038c cxl/mem: Fix for the index of Clear Event Record Handle
The dev_dbg info for the Clear Event Records mailbox command would report
the handle of the next record to clear, not the current one.

This was because the index 'i' had been incremented before the current
handle value was printed.
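
Illustratively, the pattern and the fix look something like this (a
simplified sketch of the loop in cxl_clear_event_record(); 'hdr' stands in
for the current record's header and the handle lookup is abbreviated):

  for (cnt = 0, i = 0; cnt < total; cnt++) {
          payload->handles[i++] = hdr->handle;            /* handle of record 'cnt' */
          dev_dbg(dev, "Event log '%d': Clearing %u\n", log,
                  le16_to_cpu(payload->handles[i - 1]));
          /* before the fix this printed handles[i], i.e. the *next* slot */
  }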

Fixes: 6ebe28f9ec ("cxl/mem: Read, trace, and clear events on driver load")
Signed-off-by: Yuquan Wang <wangyuquan1236@phytium.com.cn>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-03-26 12:06:21 -07:00
Masahiro Yamada
a46aba14cf cxl: remove CONFIG_CXL_PMU entry in drivers/cxl/Kconfig
Commit 5d7107c727 ("perf: CXL Performance Monitoring Unit driver")
added the config entries for CXL_PMU in drivers/cxl/Kconfig and
drivers/perf/Kconfig, so it can be toggled from multiple locations:

[1] Device Drivers
     -> PCI support
       -> CXL (Compute Express Link) Devices
         -> CXL Performance Monitoring Unit

[2] Device Drivers
     -> Performance monitor support
       -> CXL Performance Monitoring Unit

This complicates things, and nobody else does this.

I kept the one in drivers/perf/Kconfig because CONFIG_CXL_PMU controls
the compilation of drivers/perf/cxl_pmu.c.

Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2024-03-27 01:58:34 +09:00
Linus Torvalds
ad584d73a2 Tracing updates for 6.9:
Main user visible change:
 
 - User events can now have "multi formats"
 
   The current user events have a single format. If another event is created
   with a different format, it will fail to be created. That is, once an
   event name is used, it cannot be used again with a different format. This
   can cause issues if a library is using an event and updates its format.
   An application using the older format will prevent an application using
   the new library from registering its event.
 
   A task could also DOS another application if it knows the event names, and
   it creates events with different formats.
 
   The multi-format event is in a different name space from the single
   format. Both the event name and its format are the unique identifier.
   This will allow two different applications to use the same user event name
   but with different payloads.
 
 - Added support to have ftrace_dump_on_oops dump out instances and
   not just the main top level tracing buffer.
 
 Other changes:
 
 - Add eventfs_root_inode
 
   Only the root inode has a dentry that is static (never goes away) and
   stores it upon creation. There's no reason that the thousands of other
   eventfs inodes should have a pointer that never gets set in their
   descriptors. Create an eventfs_root_inode descriptor that has an
   eventfs_inode descriptor and a dentry pointer, and only the root inode
   will use this.
 
 - Added WARN_ON()s in eventfs
 
   There are some conditionals remaining in eventfs that should never be hit,
   but instead of removing them, add WARN_ON() around them to make sure that
   they are never hit.
 
 - Have saved_cmdlines allocation also include the map_cmdline_to_pid array
 
   The saved_cmdlines structure allocates a large amount of data to hold its
   mappings. Within it, it has three arrays. Two are already a part of it:
   map_pid_to_cmdline[] and saved_cmdlines[]. More memory can be saved by
   also including the map_cmdline_to_pid[] array as well.
 
 - Restructure __string() and __assign_str() macros used in TRACE_EVENT().
 
   Dynamic strings in TRACE_EVENT() are declared with:
 
       __string(name, source)
 
   And assigned with:
 
      __assign_str(name, source)
 
   In the tracepoint callback of the event, the __string() is used to get the
   size needed to allocate on the ring buffer and __assign_str() is used to
   copy the string into the ring buffer. There's a helper structure that is
   created in the TRACE_EVENT() macro logic that will hold the string length
   and its position in the ring buffer which is created by __string().
 
   There are several trace events that have a function to create the string
   to save. This function is executed twice. Once for __string() and again
   for __assign_str(). There's no reason for this. The helper structure could
   also save the string it used in __string() and simply copy that into
   __assign_str() (it also already has its length).
 
   By using the structure to store the source string for the assignment, it
   means that the second argument to __assign_str() is no longer needed.
 
   It will be removed in the next merge window, but for now add a warning if
   the source string given to __string() is different than the source string
   given to __assign_str(), as the source to __assign_str() isn't even used
   and will be going away.
 
 - Added checks to make sure that the source of __string() is also the
   source of __assign_str() so that it can be safely removed in the next
   merge window.
 
   Included fixes that the above check found.
 
 - Other minor clean ups and fixes
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZfhbUBQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qrhJAP9bfnYO7tfNGZVNPmTT7Fz0z4zCU1Pb
 P8M+24yiFTeFWwD/aIPlMFZONVkTdFAlLdffl6kJOKxZ7vW4XzUjfNWb6wo=
 =z/D6
 -----END PGP SIGNATURE-----

Merge tag 'trace-v6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing updates from Steven Rostedt:
 "Main user visible change:

   - User events can now have "multi formats"

     The current user events have a single format. If another event is
     created with a different format, it will fail to be created. That
     is, once an event name is used, it cannot be used again with a
     different format. This can cause issues if a library is using an
     event and updates its format. An application using the older format
     will prevent an application using the new library from registering
     its event.

     A task could also DOS another application if it knows the event
     names, and it creates events with different formats.

     The multi-format event is in a different name space from the single
     format. Both the event name and its format are the unique
     identifier. This will allow two different applications to use the
     same user event name but with different payloads.

   - Added support to have ftrace_dump_on_oops dump out instances and
     not just the main top level tracing buffer.

  Other changes:

   - Add eventfs_root_inode

     Only the root inode has a dentry that is static (never goes away)
     and stores it upon creation. There's no reason that the thousands
     of other eventfs inodes should have a pointer that never gets set
     in their descriptors. Create an eventfs_root_inode descriptor that
     has an eventfs_inode descriptor and a dentry pointer, and only the
     root inode will use this.

   - Added WARN_ON()s in eventfs

     There are some conditionals remaining in eventfs that should never
     be hit, but instead of removing them, add WARN_ON() around them to
     make sure that they are never hit.

   - Have saved_cmdlines allocation also include the map_cmdline_to_pid
     array

     The saved_cmdlines structure allocates a large amount of data to
     hold its mappings. Within it, it has three arrays. Two are already
     a part of it: map_pid_to_cmdline[] and saved_cmdlines[]. More
     memory can be saved by also including the map_cmdline_to_pid[]
     array as well.

   - Restructure __string() and __assign_str() macros used in
     TRACE_EVENT()

     Dynamic strings in TRACE_EVENT() are declared with:

         __string(name, source)

     And assigned with:

        __assign_str(name, source)

     In the tracepoint callback of the event, the __string() is used to
     get the size needed to allocate on the ring buffer and
     __assign_str() is used to copy the string into the ring buffer.
     There's a helper structure that is created in the TRACE_EVENT()
     macro logic that will hold the string length and its position in
     the ring buffer which is created by __string().

     There are several trace events that have a function to create the
     string to save. This function is executed twice. Once for
     __string() and again for __assign_str(). There's no reason for
     this. The helper structure could also save the string it used in
     __string() and simply copy that into __assign_str() (it also
     already has its length).

     By using the structure to store the source string for the
     assignment, it means that the second argument to __assign_str() is
     no longer needed.

     It will be removed in the next merge window, but for now add a
     warning if the source string given to __string() is different than
     the source string given to __assign_str(), as the source to
     __assign_str() isn't even used and will be going away.

   - Added checks to make sure that the source of __string() is also the
     source of __assign_str() so that it can be safely removed in the
     next merge window.

     Included fixes that the above check found.

   - Other minor clean ups and fixes"

* tag 'trace-v6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (34 commits)
  tracing: Add __string_src() helper to help compilers not to get confused
  tracing: Use strcmp() in __assign_str() WARN_ON() check
  tracepoints: Use WARN() and not WARN_ON() for warnings
  tracing: Use div64_u64() instead of do_div()
  tracing: Support to dump instance traces by ftrace_dump_on_oops
  tracing: Remove second parameter to __assign_rel_str()
  tracing: Add warning if string in __assign_str() does not match __string()
  tracing: Add __string_len() example
  tracing: Remove __assign_str_len()
  ftrace: Fix most kernel-doc warnings
  tracing: Decrement the snapshot if the snapshot trigger fails to register
  tracing: Fix snapshot counter going between two tracers that use it
  tracing: Use EVENT_NULL_STR macro instead of open coding "(null)"
  tracing: Use ? : shortcut in trace macros
  tracing: Do not calculate strlen() twice for __string() fields
  tracing: Rework __assign_str() and __string() to not duplicate getting the string
  cxl/trace: Properly initialize cxl_poison region name
  net: hns3: tracing: fix hclgevf trace event strings
  drm/i915: Add missing ; to __assign_str() macros in tracepoint code
  NFSD: Fix nfsd_clid_class use of __string_len() macro
  ...
2024-03-18 15:11:44 -07:00
Alison Schofield
6c87126096 cxl/trace: Properly initialize cxl_poison region name
The TP_STRUCT__entry that gets assigned the region name, or an
empty string if no region is present, is erroneously initialized
to the cxl_region pointer. It needs to be properly initialized,
otherwise its length is wrong and garbage characters can appear in
the kernel trace output: /sys/kernel/tracing/trace

The bad initialization was due in part to a naming conflict with
the parameter: struct cxl_region *region. The field 'region' is
already exposed externally as the region name, so changing that
to something logical, like 'region_name', is not an option. Instead,
rename the internal-only struct cxl_region parameter to the commonly
used 'cxlr'.

Impact is that tooling depending on that trace data can miss
picking up a valid event when searching by region name. The
TP_printk() output, if enabled, does emit the correct region
names in the dmesg log.

This was found during testing of the cxl-list option to report
media-errors for a region.

Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: stable@vger.kernel.org
Fixes: ddf49d57b8 ("cxl/trace: Add TRACE support for CXL media-error records")
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-03-18 10:26:03 -04:00
Dan Williams
88482878c3 Merge branch 'for-6.9/cxl-fixes' into for-6.9/cxl
Pick up a parsing fix for the CDAT SSLBIS structure for v6.9.
2024-03-13 00:17:18 -07:00
Dan Williams
75f4d93ee8 Merge branch 'for-6.9/cxl-einj' into for-6.9/cxl
Pick up support for injecting errors via ACPI EINJ into the CXL protocol
for v6.9.
2024-03-13 00:09:20 -07:00
Dan Williams
d5c0078033 Merge branch 'for-6.9/cxl-qos' into for-6.9/cxl
Pick up support for CXL "HMEM reporting" for v6.9, i.e. build an HMAT
from CXL CDAT and PCIe switch information.
2024-03-13 00:07:36 -07:00
Robert Richter
c6c3187d66 lib/firmware_table: Provide buffer length argument to cdat_table_parse()
There exist card implementations with a CDAT table that use a fixed size
buffer, with entries filled in that do not fill the whole table
length. The last entry in the CDAT table then may not mark the
end of the CDAT table buffer specified by the length field in the CDAT
header; the table can be shorter, with trailing unused (zeroed) data. The
actual table length is determined while reading all CDAT entries of
the table with DOE.

If the table is larger than expected (containing zeroed trailing
data), the CDAT parser fails with:

 [   48.691717] Malformed DSMAS table length: (24:0)
 [   48.702084] [CDAT:0x00] Invalid zero length
 [   48.711460] cxl_port endpoint1: Failed to parse CDAT: -22

In addition, a check of the table buffer length is missing to prevent
an out-of-bounds access when parsing the CDAT table.

Harden the code against a device returning a borked table. Fix this by
providing an optional buffer length argument to
acpi_parse_entries_array() that can be used by cdat_table_parse() to
propagate the buffer size down to its users to check the buffer
length. This also prevents the possible out-of-bounds access mentioned
above.

Add a check to warn about a malformed CDAT table length.

Cc: Rafael J. Wysocki <rafael@kernel.org>
Cc: Len Brown <lenb@kernel.org>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Robert Richter <rrichter@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/ZdEnopFO0Tl3t2O1@rric.localdomain
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-03-13 00:03:21 -07:00
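
A minimal sketch of the kind of bounds check this enables (names are
illustrative; the real check lives in the shared ACPI table parsing
code):

    /* reject an entry that would run past the buffer handed to the parser */
    if (entry->length == 0 ||
        (u8 *)entry + entry->length > (u8 *)table + buffer_length) {
            pr_warn("Malformed CDAT entry length: %u\n", entry->length);
            return -EINVAL;
    }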
Robert Richter
e0c818e004 cxl/pci: Get rid of pointer arithmetic reading CDAT table
Reading the CDAT table using DOE requires a Table Access Response
Header in addition to the CDAT entry. In the current implementation this
has caused offsets of sizeof(__le32) into the actual buffers. This led
to hard-to-read code and even bugs. E.g., see the fix of devm_kfree()
in read_cdat_data():

 commit c65efe3685 ("cxl/cdat: Free correct buffer on checksum error")

Rework the code to avoid calculations with sizeof(__le32). Introduce
struct cdat_doe_rsp for this; it contains the Table Access Response
Header followed by a variably sized payload for the various data
structures, so the CDAT table and its CDAT Data Structures can be
accessed without recalculating buffer offsets.

Cc: Lukas Wunner <lukas@wunner.de>
Cc: Fan Ni <nifan.cxl@gmail.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Robert Richter <rrichter@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20240216155844.406996-3-rrichter@amd.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-03-12 23:52:29 -07:00
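
A minimal sketch of the response framing described above, assuming the
DOE Table Access Response Header occupies the leading dword (field
names illustrative):

    struct cdat_doe_rsp {
            __le32 doe_header;  /* DOE Table Access Response Header */
            u8 data[];          /* CDAT header or CDAT Data Structure */
    } __packed;

Consumers can then reference rsp->data directly instead of offsetting a
raw buffer by sizeof(__le32).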
Robert Richter
ec8ffff3a9 cxl/pci: Rename DOE mailbox handle to doe_mb
Trivial variable rename for the DOE mailbox handle from cdat_doe to
doe_mb. The variable name cdat_doe is too ambiguous; use doe_mb, which
is commonly used for the mailbox.

Signed-off-by: Robert Richter <rrichter@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20240216155844.406996-2-rrichter@amd.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-03-12 23:52:29 -07:00
Dave Jiang
99b52aac2d cxl: Fix the incorrect assignment of SSLBIS entry pointer initial location
The 'entry' pointer in cdat_sslbis_handler() is set to header +
sizeof(common header). However, the math missed the addition of the SSLBIS
main header. It should be header + sizeof(common header) + sizeof(*sslbis).
Use a defined struct for all the SSLBIS parts in order to avoid pointer
math errors.

The bug causes incorrect parsing of the SSLBIS table and introduces incorrect
performance values into the access_coordinates during the CXL access_coordinate
calculation path if there are CXL switches present in the topology.

The issue was found during testing of new code being added to add additional
checks for invalid CDAT values during CXL access_coordinate calculation. The
testing was done on qemu with a CXL topology including a CXL switch.

Fixes: 80aa780dda ("cxl: Add callback to parse the SSLBIS subtable from CDAT")
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Link: https://lore.kernel.org/r/20240301210948.1298075-1-dave.jiang@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-03-12 23:41:59 -07:00
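
A sketch of the defined-struct approach, assuming the ACPI CDAT types
are used for the subtable pieces (treat the type names as illustrative):

    struct cdat_sslbis_table {
            struct acpi_cdat_header header;         /* common CDAT header */
            struct acpi_cdat_sslbis sslbis_header;  /* SSLBIS main header */
            struct acpi_cdat_sslbe entries[];       /* latency/bandwidth entries */
    } __packed;

With the layout spelled out, entries[0] begins after both headers and
the missing sizeof(*sslbis) offset can no longer be forgotten.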
Ben Cheatham
8039804cfa cxl/core: Add CXL EINJ debugfs files
Export CXL helper functions in einj-cxl.c for getting/injecting
available CXL protocol error types to sysfs under kernel/debug/cxl.

The kernel/debug/cxl/einj_types file will print the available CXL
protocol errors in the same format as the available_error_types
file provided by the einj module. The
kernel/debug/cxl/$dport_dev/einj_inject file is functionally the same
as the error_type and error_inject files provided by the EINJ module,
i.e.: writing an error type into $dport_dev/einj_inject will inject
said error type into the CXL dport represented by $dport_dev.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
Link: https://lore.kernel.org/r/20240311142508.31717-4-Benjamin.Cheatham@amd.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-03-12 23:08:30 -07:00
Dave Jiang
debdce20c4 cxl/region: Deal with numa nodes not enumerated by SRAT
For the NUMA nodes that are not created by SRAT, no memory_target is
allocated and they are not managed by the HMAT_REPORTING code. Therefore
the hmat_callback() memory hotplug notifier will exit early on those NUMA
nodes. The CXL memory hotplug notifier will need to call
node_set_perf_attrs() directly in order to setup the access sysfs
attributes.

In acpi_numa_init(), the last proximity domain (pxm) id created by SRAT is
stored. Add a helper function acpi_node_backed_by_real_pxm() in order to
check if a NUMA node id is defined by SRAT or created by CFMWS.

node_set_perf_attrs() symbol is exported to allow update of perf attribs
for a node. The sysfs path of
/sys/devices/system/node/nodeX/access0/initiators/* is created by
node_set_perf_attrs() for the various attributes where nodeX is matched
to the NUMA node of the CXL region.

Cc: Rafael J. Wysocki <rafael@kernel.org>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20240308220055.2172956-13-dave.jiang@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-03-12 14:54:03 -07:00
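
A sketch of the helper's intent, assuming the last SRAT-defined
proximity domain is recorded during acpi_numa_init() (the variable name
is illustrative):

    bool acpi_node_backed_by_real_pxm(int nid)
    {
            int pxm = node_to_pxm(nid);

            /* pxm values beyond the last SRAT entry were created for CFMWS windows */
            return pxm <= last_real_pxm;
    }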
Dave Jiang
067353a46d cxl/region: Add memory hotplug notifier for cxl region
When the CXL region is formed, the driver computes the performance data
for the region. However this data is not available at the node data
collection that has been populated by the HMAT during kernel
initialization. Add a memory hotplug notifier to update the access
coordinates to the 'struct memory_target' context kept by the
HMAT_REPORTING code.

Add CXL_CALLBACK_PRI for a memory hotplug callback priority. Set the
priority number to be called before HMAT_CALLBACK_PRI. The CXL update must
happen before hmat_callback().

A new HMAT_REPORTING helper hmat_update_target_coordinates() is added in
order to allow CXL to update the memory_target access coordinates.

A new ext_updated member is added to the memory_target to indicate that
the access coordinates within the memory_target have been updated by an
external agent such as CXL. This prevents the data being overwritten by
hmat_update_target_attrs() triggered by hmat_callback().

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Rafael J. Wysocki <rafael@kernel.org>
Reviewed-by: Huang, Ying <ying.huang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20240308220055.2172956-12-dave.jiang@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-03-12 12:34:12 -07:00
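
A sketch of the registration ordering described above; the callback
name is illustrative, the priority relationship is the point:

    static struct notifier_block cxl_region_node_nb = {
            .notifier_call = cxl_region_perf_attrs_callback,
            .priority = CXL_CALLBACK_PRI,   /* runs before HMAT_CALLBACK_PRI */
    };

    register_memory_notifier(&cxl_region_node_nb);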
Dave Jiang
c20eaf4411 cxl/region: Add sysfs attribute for locality attributes of CXL regions
Add read/write latencies and bandwidth sysfs attributes for the enabled CXL
region. The bandwidth is the aggregated bandwidth of all devices that
contribute to the CXL region. The latency is the worst latency of the
device amongst all the devices that contribute to the CXL region.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20240308220055.2172956-11-dave.jiang@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-03-12 12:34:11 -07:00
Dave Jiang
3d9f4a1972 cxl/region: Calculate performance data for a region
Calculate and store the performance data for a CXL region. Find the worst
read and write latency across all the included ranges from each of the
devices that contribute to the region and designate that as the latency
data. Sum all the read and write bandwidth data for each of the device
regions; that is the total bandwidth for the region.

The perf list is expected to be constructed before the endpoint decoders
are registered and thus there should be no early reading of the entries
from the region assemble action. The calling of the region qos calculate
function is under the protection of cxl_dpa_rwsem and will ensure that
all DPA associated work has completed.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20240308220055.2172956-10-dave.jiang@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-03-12 12:34:11 -07:00
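
In rough form, the per-member aggregation looks like the following,
using the 'struct access_coordinate' field names (the surrounding loop
and variable names are illustrative):

    /* worst-case latency across members, summed bandwidth across members */
    coord->read_latency = max(coord->read_latency, perf->read_latency);
    coord->write_latency = max(coord->write_latency, perf->write_latency);
    coord->read_bandwidth += perf->read_bandwidth;
    coord->write_bandwidth += perf->write_bandwidth;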
Dave Jiang
3d8be8b398 cxl: Set cxlmd->endpoint before adding port device
Move setting of cxlmd->endpoint to before calling add_device() on the port
device. Otherwise when referencing cxlmd->endpoint in region discovery code
that is triggered by the port driver probe function, the endpoint port
pointer is not valid.

Current code does not hit this issue yet since cxlmd->endpoint is not being
referenced during region discovery. However, follow-on code that does
performance calculations will.

Tested-by: Wonjae Lee <wj28.lee@samsung.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20240308220055.2172956-9-dave.jiang@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-03-12 12:34:11 -07:00
Dave Jiang
6ef83c4e19 cxl: Move QoS class to be calculated from the nearest CPU
Retrieve the qos_class (QTG ID) using the access coordinates from the
nearest CPU rather than the nearest initiator, which may not be a CPU.
This may be the more appropriate number that applications care about.

For most cases, access0 and access1 have the same values.

Link: https://lore.kernel.org/linux-cxl/20240112113023.00006c50@Huawei.com/
Suggested-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20240308220055.2172956-8-dave.jiang@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-03-12 12:34:11 -07:00
Dave Jiang
863027d409 cxl: Split out host bridge access coordinates
The difference between access class 0 and access class 1 for 'struct
access_coordinate', if any, is that class 0 is for the distance from
the target to the closest initiator and that class 1 is for the distance
from the target to the closest CPU. For CXL memory, the nearest initiator
may not necessarily be a CPU node. The performance path from the CXL
endpoint to the host bridge should remain the same. However, the numbers
extracted and stored from HMAT is the difference for the two access
classes. Split out the performance numbers for the host bridge (generic
target) from the calculation of the entire path in order to allow
calculation of both access classes for a CXL region.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20240308220055.2172956-7-dave.jiang@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-03-12 12:34:11 -07:00
Dave Jiang
032f7b37ad cxl: Split out combine_coordinates() for common shared usage
Refactor the common code for combining coordinates in order to reduce
duplication. Create a new function cxl_coordinates_combine() that combines
two 'struct access_coordinate' instances.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20240308220055.2172956-6-dave.jiang@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-03-12 12:34:11 -07:00
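
A minimal sketch of the combine rule, consistent with the path
latency/bandwidth math described elsewhere in this log (latencies add
along a path, bandwidth is limited by the narrowest hop):

    static void cxl_coordinates_combine(struct access_coordinate *out,
                                        struct access_coordinate *c1,
                                        struct access_coordinate *c2)
    {
            out->read_latency = c1->read_latency + c2->read_latency;
            out->write_latency = c1->write_latency + c2->write_latency;
            out->read_bandwidth = min(c1->read_bandwidth, c2->read_bandwidth);
            out->write_bandwidth = min(c1->write_bandwidth, c2->write_bandwidth);
    }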
Dave Jiang
bd98cbbbf8 ACPI: HMAT / cxl: Add retrieval of generic port coordinates for both access classes
Update acpi_get_genport_coordinates() to allow retrieval of both access
classes of the 'struct access_coordinate' for a generic target. The update
will allow CXL code to compute access coordinates for both access classes.

Cc: Rafael J. Wysocki <rafael@kernel.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20240308220055.2172956-5-dave.jiang@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-03-12 12:34:11 -07:00
Dan Williams
5c6224bfab cxl/acpi: Fix load failures due to single window creation failure
The expectation is that cxl_parse_cfmws() continues in the face of
failure, as evidenced by code like:

    cxlrd = cxl_root_decoder_alloc(root_port, ways, cxl_calc_hb);
    if (IS_ERR(cxlrd))
    	return 0;

There are other error paths in that function which mistakenly follow
idiomatic expectations and return an error when they should not. Most of
those mistakes are innocuous checks that hardly ever fail in practice.
However, a recent change succeeded in making the implementation more
fragile by applying an idiomatic, but still wrong, "fix" [1]. In this
failure case the kernel reports:

    cxl root0: Failed to populate active decoder targets
    cxl_acpi ACPI0017:00: Failed to add decode range: [mem 0x00000000-0x7fffffff flags 0x200]

...which is a real issue with that one window (to be fixed separately),
but ends up failing the entirety of cxl_acpi_probe().

Undo that recent breakage while also removing the confusion about
ignoring errors. Update all exit paths to return an error per typical
expectations and let an outer wrapper function handle dropping the
error.

Fixes: 91019b5bc7 ("cxl/acpi: Return 'rc' instead of '0' in cxl_parse_cfmws()") [1]
Cc: <stable@vger.kernel.org>
Cc: Breno Leitao <leitao@debian.org>
Cc: Alison Schofield <alison.schofield@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-02-20 22:58:05 -08:00
Dan Williams
40de53fd00 Merge branch 'for-6.8/cxl-cper' into for-6.8/cxl
Pick up CXL CPER notification removal for v6.8-rc6, to return in a later
merge window.
2024-02-20 22:57:35 -08:00
Dan Williams
f3e6b3ae9c acpi/ghes: Remove CXL CPER notifications
Initial tests with the CXL CPER implementation identified that error
reports were being duplicated in the log and the trace event [1].  Then
it was discovered that the notification handler took sleeping locks
while the GHES event handling runs in spin_lock_irqsave() context [2]

While the duplicate reporting was fixed in v6.8-rc4, the fix for the
sleeping-lock-vs-atomic collision would enjoy more time to settle and
gain some test cycles.  Given how late it is in the development cycle,
remove the CXL hookup for now and try again during the next merge
window.

Note that end result is that v6.8 does not emit CXL CPER payloads to the
kernel log, but this is in line with the CXL trend to move error
reporting to trace events instead of the kernel log.

Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Rafael J. Wysocki <rafael@kernel.org>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: http://lore.kernel.org/r/20240108165855.00002f5a@Huawei.com [1]
Closes: http://lore.kernel.org/r/b963c490-2c13-4b79-bbe7-34c6568423c7@moroto.mountain [2]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-02-20 22:50:52 -08:00
Robert Richter
0cab687205 cxl/pci: Fix disabling memory if DVSEC CXL Range does not match a CFMWS window
The Linux CXL subsystem is built on the assumption that HPA == SPA.
That is, the host physical address (HPA) the HDM decoder registers are
programmed with are system physical addresses (SPA).

During HDM decoder setup, the DVSEC CXL range registers (cxl-3.1,
8.1.3.8) are checked if the memory is enabled and the CXL range is in
a HPA window that is described in a CFMWS structure of the CXL host
bridge (cxl-3.1, 9.18.1.3).

Now, if the HPA is not an SPA, the CXL range does not match a CFMWS
window and the CXL memory range will be disabled then. The HDM decoder
stops working which causes system memory being disabled and further a
system hang during HDM decoder initialization, typically when a CXL
enabled kernel boots.

Prevent a system hang and do not disable the HDM decoder if the
decoder's CXL range is not found in a CFMWS window.

Note the change only fixes a hardware hang, but does not implement
HPA/SPA translation. Support for this can be added in a follow on
patch series.

Signed-off-by: Robert Richter <rrichter@amd.com>
Fixes: 34e37b4c43 ("cxl/port: Enable HDM Capability after validating DVSEC Ranges")
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20240216160113.407141-1-rrichter@amd.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-02-16 23:20:34 -08:00
Dave Jiang
cc214417f0 cxl: Fix sysfs export of qos_class for memdev
Current implementation exports only to
/sys/bus/cxl/devices/.../memN/qos_class. With both ram and pmem exposed,
the second registered sysfs attribute is rejected as duplicate. It's not
possible to create qos_class under the dev_groups via the driver due to
the ram and pmem sysfs sub-directories already created by the device sysfs
groups. Move the ram and pmem qos_class to the device sysfs groups and add
a call to sysfs_update() after the perf data are validated so the
qos_class can be visible. The end results should be
/sys/bus/cxl/devices/.../memN/ram/qos_class and
/sys/bus/cxl/devices/.../memN/pmem/qos_class.

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20240206190431.1810289-4-dave.jiang@intel.com
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-02-16 23:20:34 -08:00
Dave Jiang
10cb393d76 cxl: Remove unnecessary type cast in cxl_qos_class_verify()
The passed in host bridge parameter for device_for_each_child() has
unnecessary void * type cast. Remove the type cast.

Suggested-by: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20240206190431.1810289-3-dave.jiang@intel.com
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-02-16 23:20:34 -08:00
Dave Jiang
00413c1506 cxl: Change 'struct cxl_memdev_state' *_perf_list to single 'struct cxl_dpa_perf'
In order to address the issue with being able to expose qos_class sysfs
attributes under 'ram' and 'pmem' sub-directories, the attributes must
be defined as static attributes rather than under driver->dev_groups.
To avoid implementing locking for accessing the 'struct cxl_dpa_perf`
lists, convert the list to a single 'struct cxl_dpa_perf' entry in
preparation to move the attributes to statically defined.

While theoretically a partition may have multiple qos_class via CDAT, this
has not been encountered with testing on available hardware. The code is
simplified for now to not support the complex case until a use case is
needed to support that.

Link: https://lore.kernel.org/linux-cxl/65b200ba228f_2d43c29468@dwillia2-mobl3.amr.corp.intel.com.notmuch/
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20240206190431.1810289-2-dave.jiang@intel.com
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-02-16 23:20:34 -08:00
Alison Schofield
cb66b1d60c cxl/region: Allow out of order assembly of autodiscovered regions
Autodiscovered regions can fail to assemble if they are not discovered
in HPA decode order. The user will see failure messages like:

[] cxl region0: endpoint5: HPA order violation region1
[] cxl region0: endpoint5: failed to allocate region reference

The check that is causing the failure helps the CXL driver enforce
a CXL spec mandate that decoders be committed in HPA order. The
check is needless for autodiscovered regions since their decoders
are already programmed. Trying to enforce order in the assembly of
these regions is useless because they are assembled once all their
member endpoints arrive, and there is no guarantee on the order in
which endpoints are discovered during probe.

Keep the existing check, but for autodiscovered regions, allow the
out of order assembly after a sanity check that the lesser numbered
decoder has the lesser HPA starting address.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Tested-by: Wonjae Lee <wj28.lee@samsung.com>
Link: https://lore.kernel.org/r/3dec69ee97524ab229a20c6739272c3000b18408.1706736863.git.alison.schofield@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-02-16 23:20:34 -08:00
Alison Schofield
453a7fde80 cxl/region: Handle endpoint decoders in cxl_region_find_decoder()
In preparation for adding a new caller of cxl_region_find_decoder(),
teach it to find a decoder from a cxl_endpoint_decoder structure.

Combining switch and endpoint decoder lookup in one function prevents
code duplication in call sites.

Update the existing caller.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Tested-by: Wonjae Lee <wj28.lee@samsung.com>
Link: https://lore.kernel.org/r/79ae6d72978ef9f3ceec9722e1cb793820553c8e.1706736863.git.alison.schofield@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-02-16 23:20:34 -08:00
Linus Torvalds
e6f39a90de EFI fixes for v6.8 #1
- Tighten ELF relocation checks on the RISC-V EFI stub
 - Give up if the new EFI memory attributes protocol fails spuriously on
   x86
 - Take care not to place the kernel in the lowest 16 MB of DRAM on x86
 - Omit special purpose EFI memory from memblock
 - Some fixes for the CXL CPER reporting code
 - Make the PE/COFF layout of mixed-mode capable images comply with a
   strict interpretation of the spec
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQQQm/3uucuRGn1Dmh0wbglWLn0tXAUCZcDtKAAKCRAwbglWLn0t
 XMDfAP9ttq8Ir4+hp8A0DGE79x6eSgBIkl5ztGmMQGybzEkzdAEAgxfDUieQW4TT
 GmbyGGUouvSYxfZf4gVTQn8b/bd57AI=
 =Af8A
 -----END PGP SIGNATURE-----

Merge tag 'efi-fixes-for-v6.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi

Pull EFI fixes from Ard Biesheuvel:
 "The only notable change here is the patch that changes the way we deal
  with spurious errors from the EFI memory attribute protocol. This will
  be backported to v6.6, and is intended to ensure that we will not
  paint ourselves into a corner when we tighten this further in order to
  comply with MS requirements on signed EFI code.

  Note that this protocol does not currently exist in x86 production
  systems in the field, only in Microsoft's fork of OVMF, but it will be
  mandatory for Windows logo certification for x86 PCs in the future.

   - Tighten ELF relocation checks on the RISC-V EFI stub

   - Give up if the new EFI memory attributes protocol fails spuriously
     on x86

   - Take care not to place the kernel in the lowest 16 MB of DRAM on
     x86

   - Omit special purpose EFI memory from memblock

   - Some fixes for the CXL CPER reporting code

   - Make the PE/COFF layout of mixed-mode capable images comply with a
     strict interpretation of the spec"

* tag 'efi-fixes-for-v6.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
  x86/efistub: Use 1:1 file:memory mapping for PE/COFF .compat section
  cxl/trace: Remove unnecessary memcpy's
  cxl/cper: Fix errant CPER prints for CXL events
  efi: Don't add memblocks for soft-reserved memory
  efi: runtime: Fix potential overflow of soft-reserved region size
  efi/libstub: Add one kernel-doc comment
  x86/efistub: Avoid placing the kernel below LOAD_PHYSICAL_ADDR
  x86/efistub: Give up if memory attribute protocol returns an error
  riscv/efistub: Tighten ELF relocation check
  riscv/efistub: Ensure GP-relative addressing is not used
2024-02-09 10:40:50 -08:00
Ira Weiny
dbea519d68 cxl/trace: Remove unnecessary memcpy's
CPER events don't have UUIDs.  Therefore UUIDs were removed from the
records passed to trace events and replaced with hard coded values.

As pointed out by Jonathan, the new defines for the UUIDs present a more
efficient way to assign UUID in trace records.[1]

Replace memcpy's with the use of static data.

[1] https://lore.kernel.org/all/20240108132325.00000e9c@Huawei.com/

Suggested-by: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2024-02-03 18:31:17 +01:00
Li Ming
eef5c7b28d cxl/pci: Skip to handle RAS errors if CXL.mem device is detached
The PCI AER model is an awkward fit for CXL error handling. While the
expectation is that a PCI device can escalate to link reset to recover
from an AER event, the same reset on CXL amounts to a surprise memory
hotplug of massive amounts of memory.

At present, the CXL error handler attempts some optimistic error
handling to unbind the device from the cxl_mem driver after reaping some
RAS register values. This results in a "hopeful" attempt to unplug the
memory, but there is no guarantee that will succeed.

A subsequent AER notification after the memdev unbind event can no
longer assume the registers are mapped. Check for memdev bind before
reaping status register values to avoid crashes of the form:

 BUG: unable to handle page fault for address: ffa00000195e9100
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 [...]
 RIP: 0010:__cxl_handle_ras+0x30/0x110 [cxl_core]
 [...]
 Call Trace:
  <TASK>
  ? __die+0x24/0x70
  ? page_fault_oops+0x82/0x160
  ? kernelmode_fixup_or_oops+0x84/0x110
  ? exc_page_fault+0x113/0x170
  ? asm_exc_page_fault+0x26/0x30
  ? __pfx_dpc_reset_link+0x10/0x10
  ? __cxl_handle_ras+0x30/0x110 [cxl_core]
  ? find_cxl_port+0x59/0x80 [cxl_core]
  cxl_handle_rp_ras+0xbc/0xd0 [cxl_core]
  cxl_error_detected+0x6c/0xf0 [cxl_core]
  report_error_detected+0xc7/0x1c0
  pci_walk_bus+0x73/0x90
  pcie_do_recovery+0x23f/0x330

Longer term, the unbind and PCI_ERS_RESULT_DISCONNECT behavior might
need to be replaced with a new PCI_ERS_RESULT_PANIC.

Fixes: 6ac07883db ("cxl/pci: Add RCH downstream port error logging")
Cc: stable@vger.kernel.org
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Li Ming <ming4.li@intel.com>
Link: https://lore.kernel.org/r/20240129131856.2458980-1-ming4.li@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-01-29 12:59:07 -08:00
Quanquan Cao
d76779dd36 cxl/region:Fix overflow issue in alloc_hpa()
Creating a region with 16 memory devices caused a problem. The div_u64_rem()
function, used for dividing an unsigned 64-bit number by a 32-bit one,
faced an issue when the divisor was SZ_256M * p->interleave_ways. The
result surpassed the maximum value representable by the 32-bit divisor
(4G), leading to an overflow and a remainder of 0.
Note: at this point, p->interleave_ways is 16, meaning 16 * 256M = 4G.

To fix this issue, replace div_u64_rem() with div64_u64_rem() and adjust
the type of the remainder accordingly.

Signed-off-by: Quanquan Cao <caoqq@fujitsu.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Fixes: 23a22cd1c9 ("cxl/region: Allocate HPA capacity to regions")
Cc: <stable@vger.kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-01-24 21:03:03 -08:00
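
The substitution, sketched with illustrative variable names:

    /* before: the u32 divisor overflows when 16 * SZ_256M == 4G */
    u32 rem32;
    div_u64_rem(size, SZ_256M * p->interleave_ways, &rem32);

    /* after: 64-bit divisor and remainder */
    u64 rem;
    div64_u64_rem(size, (u64)SZ_256M * p->interleave_ways, &rem);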
Ira Weiny
d72a4caf68 cxl/pci: Skip irq features if MSI/MSI-X are not supported
CXL 3.1 Section 3.1.1 states:

	"A Function on a CXL device must not generate INTx messages if
	that Function participates in CXL.cache protocol or CXL.mem
	protocols."

The generic CXL memory driver only supports devices which use the
CXL.mem protocol.  The current driver attempts to allocate MSI/MSI-X
vectors in anticipation of their need for mailbox interrupts or event
processing.  However, the above requirement does not require a device to
support interrupts, only that they use MSI/MSI-X.  For example, a device
may disable mailbox interrupts and either be configured for firmware
first or skip event processing and function.

Dave Larsen reported that the following Intel / Agilex card does not
support interrupts on function 0.

	CXL: Intel Corporation Device 0ddb (rev 01) (prog-if 10 [CXL Memory Device (CXL 2.x)])

Rather than fail device probe if interrupts are not supported, flag that
irqs are not enabled and avoid features which require interrupts.
Emit messages appropriate for the situation to aid in debugging should
device behavior be unexpected due to a failure to allocate vectors.

Note that it is possible for a device to have host based event
processing through polling.  However, the driver does not support
polling and it is not anticipated to be generally required.  Leave that
functionality to a future patch if such a device comes along.

Reported-by: Dave Larsen <davelarsen58@gmail.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-and-tested-by: Davidlohr Bueso <dave@stgolabs.net>
Link: https://lore.kernel.org/r/20240117-dont-fail-irq-v2-1-f33f26b0e365@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-01-22 10:42:00 -08:00
Shiyang Ruan
73bf93edee cxl/core: use sysfs_emit() for attr's _show()
sprintf() is deprecated for sysfs, use the preferred sysfs_emit() instead.

Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Link: https://lore.kernel.org/r/20240112062709.2490947-1-ruansy.fnst@fujitsu.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-01-12 14:47:04 -08:00
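
The conversion pattern, shown with a hypothetical attribute:

    static ssize_t example_show(struct device *dev,
                                struct device_attribute *attr, char *buf)
    {
            /* sysfs_emit() bounds the output to PAGE_SIZE and warns if buf
             * is not the page-aligned sysfs buffer, unlike raw sprintf() */
            return sysfs_emit(buf, "%d\n", example_value);
    }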
Dan Williams
3601311593 Merge branch 'for-6.8/cxl-cper' into for-6.8/cxl
Pick up the CPER to CXL driver integration work for v6.8. Some
additional cleanup of cper_estatus_print() messages is needed, but that
is to be handled incrementally.
2024-01-09 19:21:44 -08:00
Ira Weiny
dc97f6344f cxl/pci: Register for and process CPER events
If the firmware has configured CXL event support to be firmware first
the OS can process those events through CPER records.  The CXL layer has
unique DPA to HPA knowledge and standard event trace parsing in place.

CPER records contain Bus, Device, Function information which can be used
to identify the PCI device which is sending the event.

Change the PCI driver registration to include registration of a CXL
CPER callback to process events through the trace subsystem.

Use new scoped based management to simplify the handling of the PCI
device object.

Tested-by: Smita-Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Reviewed-by: Smita-Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Link: https://lore.kernel.org/r/20231220-cxl-cper-v5-9-1bb8a4ca2c7a@intel.com
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
[djbw: use new pci_dev guard, flip init order]
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-01-09 15:41:23 -08:00
Ira Weiny
f9c683386f cxl/events: Create a CXL event union
The CXL CPER and event log records share everything but a UUID/GUID in
their structures.

Define a cxl_event union without the UUID/GUID to be shared between the
CPER and event log record formats.  Adjust the code to use this union.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20231220-cxl-cper-v5-6-1bb8a4ca2c7a@intel.com
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-01-09 15:41:22 -08:00
Ira Weiny
6eade11075 cxl/events: Separate UUID from event structures
The UEFI CXL CPER structure does not include the UUID.  Now that the
UUID is passed separately to the trace event there is no need to have
the UUID in those structures.

Move the UUID from the event record header to the raw structures.  Adjust
cxl-test to create dummy structures for creating test records.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20231220-cxl-cper-v5-5-1bb8a4ca2c7a@intel.com
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-01-09 15:39:38 -08:00
Ira Weiny
207a1f8230 cxl/events: Remove passing a UUID to known event traces
The UUID data is redundant in the known event trace types.  The addition
of static defines allows the trace macros to create the UUID data inside
the trace thus removing unnecessary code.

Have well known trace events use static data to set the uuid field based
on the event type.

Suggested-by: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20231220-cxl-cper-v5-4-1bb8a4ca2c7a@intel.com
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-01-09 15:39:38 -08:00
Ira Weiny
4c115c9c1f cxl/events: Create common event UUID defines
Dan points out in review that the cxl_test code could be made better
through the use of UUID's defines rather than being open coded.[1]

Create UUID defines and use them rather than open coding them.

Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Link: http://lore.kernel.org/r/65738d09e30e2_45e0129451@dwillia2-xfh.jf.intel.com.notmuch [1]
Link: https://lore.kernel.org/r/20231220-cxl-cper-v5-3-1bb8a4ca2c7a@intel.com
[djbw: clang-format uuid definitions]
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-01-09 15:39:38 -08:00
Dan Williams
e16bf7e015 Merge branch 'for-6.7/cxl' into for-6.8/cxl
Pick up a late locking change + fixup that is better as merge window
material than rc material.
2024-01-05 19:24:33 -08:00
Dan Williams
80dda9a69a Merge branch 'for-6.8/cxl-misc' into for-6.8/cxl
Pick up some miscellaneous fixups for v6.8.
2024-01-05 19:03:06 -08:00
Dan Williams
d3953c78fc Merge branch 'for-6.8/cxl-cdat' into for-6.8/cxl
Pick up some follow-on fixes for 'cxl_root' reference count leaks.
2024-01-05 18:59:06 -08:00
Ira Weiny
26a1a86dd0 cxl/events: Promote CXL event structures to a core header
UEFI code can process CXL events through CPER records.  Those records
use almost the same format as the CXL events.

Lift the CXL event structures to a core header to be shared in later
patches.

[jic123: drop "CXL rev 3.0" mention]

Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20231220-cxl-cper-v5-2-1bb8a4ca2c7a@intel.com
[djbw: add F: entry to maintainers for include/linux/cxl-event.h]
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-01-05 18:49:40 -08:00
Dave Jiang
321dd36c28 cxl: Refactor to use __free() for cxl_root allocation in cxl_endpoint_port_probe()
Use scope-based resource management __free() macro to drop the open coded
put_device() in cxl_endpoint_port_probe().

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/170449247973.3779673.15088722836135359275.stgit@djiang5-mobl3
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-01-05 14:36:29 -08:00
Dave Jiang
66f11890d3 cxl: Refactor to use __free() for cxl_root allocation in cxl_find_nvdimm_bridge()
Use scope-based resource management __free() macro to drop the open coded
put_device() in cxl_find_nvdimm_bridge().

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/170449247353.3779673.5963704495491343135.stgit@djiang5-mobl3
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-01-05 14:36:29 -08:00
Dave Jiang
98e7ab3345 cxl: Fix device reference leak in cxl_port_perf_data_calculate()
cxl_port_perf_data_calculate() calls find_cxl_root() and does not drop
the reference on the 'struct device' in the cxl_root->port. find_cxl_root()
calls get_device() and takes a reference on the port 'struct device'
member. Use the __free() macro to ensure the reference is dropped.

Fixes: 7a4f148dd8 ("cxl: Compute the entire CXL path latency and bandwidth data")
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/170449246681.3779673.2288926019977963333.stgit@djiang5-mobl3
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-01-05 14:36:29 -08:00
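
A sketch of the scope-based pattern these cleanups converge on, assuming
DEFINE_FREE() from <linux/cleanup.h> pairs find_cxl_root() with
put_cxl_root():

    DEFINE_FREE(put_cxl_root, struct cxl_root *, if (_T) put_cxl_root(_T))

    struct cxl_root *cxl_root __free(put_cxl_root) = find_cxl_root(port);

    if (!cxl_root)
            return -ENODEV;
    /* use cxl_root; the reference is dropped automatically on scope exit */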
Dave Jiang
44cd71ef7b cxl: Convert find_cxl_root() to return a 'struct cxl_root *'
Commit 790815902e ("cxl: Add support for _DSM Function for retrieving QTG ID")
introduced 'struct cxl_root'; however, all usages have worked with it
indirectly through cxl_port. Refactor code such as the find_cxl_root()
function to use 'struct cxl_root' directly.

Suggested-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/170449246044.3779673.13035770941393418591.stgit@djiang5-mobl3
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-01-05 14:36:29 -08:00
Dave Jiang
98856b2ea3 cxl: Introduce put_cxl_root() helper
Add a helper function put_cxl_root() to maintain symmetry with the
find_cxl_root() function instead of relying on open coding of the
put_device() needed to drop the reference taken via get_device() in
find_cxl_root().

Suggested-by: Robert Richter <rrichter@amd.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Robert Richter <rrichter@amd.com>
Link: https://lore.kernel.org/r/170449245417.3779673.4566146351673989387.stgit@djiang5-mobl3
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-01-05 14:36:29 -08:00
Dan Williams
5459e186a5 cxl/port: Fix missing target list lock
cxl_port_setup_targets() modifies the ->targets[] array of a switch
decoder. target_list_show() expects to be able to emit a coherent
snapshot of that array by "holding" ->target_lock for read. The
target_lock is held for write during initialization of the ->targets[]
array, but it is not held for write during cxl_port_setup_targets().

The ->target_lock() predates the introduction of @cxl_region_rwsem. That
semaphore protects changes to host-physical-address (HPA) decode which
is precisely what writes to a switch decoder's target list affects.

Replace ->target_lock with @cxl_region_rwsem.

Now, the side-effect of snapshotting an unstable view of a decoder's
target list is likely benign, so the Fixes: tag is presumptive.

Fixes: 27b3f8d138 ("cxl/region: Program target lists")
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-01-04 13:59:50 -08:00
Huang Ying
d6488fee66 cxl/port: Fix decoder initialization when nr_targets > interleave_ways
The decoder_populate_targets() helper walks all of the targets in a port
and makes sure they can be looked up in @target_map. Where @target_map
is a lookup table from target position to target id (corresponding to a
cxl_dport instance). However @target_map is only responsible for
conveying the active dport instances as indicated by interleave_ways.

When nr_targets > interleave_ways it results in
decoder_populate_targets() walking off the end of the valid entries in
@target_map. Given target_map is initialized to 0 it results in the
dport lookup failing if position 0 is not mapped to a dport with an id
of 0:

  cxl_port port3: Failed to populate active decoder targets
  cxl_port port3: Failed to add decoder
  cxl_port port3: Failed to add decoder3.0
  cxl_bus_probe: cxl_port port3: probe: -6

This bug also highlights that when the decoder's ->targets[] array is
written in cxl_port_setup_targets() it is missing a hold of the
targets_lock to synchronize against sysfs readers of the target list. A
fix for that is saved for a later patch.

Fixes: a5c2580216 ("cxl/bus: Populate the target list at decoder create")
Cc:  <stable@vger.kernel.org>
Signed-off-by: Huang, Ying <ying.huang@intel.com>
[djbw: rewrite the changelog, find the Fixes: tag]
Co-developed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-01-04 13:59:50 -08:00
Jim Harris
c7ad3dc364 cxl/region: fix x9 interleave typo
CXL supports x3, x6 and x12 - not x9.

Fixes: 80d10a6cee ("cxl/region: Add interleave geometry attributes")
Signed-off-by: Jim Harris <jim.harris@samsung.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Link: https://lore.kernel.org/r/169904271254.204936.8580772404462743630.stgit@ubuntu
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-01-03 17:55:20 -08:00
Ira Weiny
6d0fc416c4 cxl/trace: Pass UUID explicitly to event traces
CXL CPER events are identified by the CPER Section Type GUID. The GUID
correlates with the CXL UUID for the event record. It turns out that a
CXL CPER record is a strict subset of the CXL event record; only the
UUID header field is chopped.

In order to unify handling between native and CPER flavors of CXL
events, prepare the code for the UUID to be passed in rather than
inferred from the record itself.

Later patches update the passed in record to only refer to the common
data between the formats.

Pass the UUID explicitly to each trace event to be able to remove the
UUID from the event structures.

Originally it was desirable to remove the UUID from the well known event
because the UUID value was redundant.  However, the trace API was
already in place.[1]

Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/all/36f2d12934d64a278f2c0313cbd01abc@huawei.com [1]
Link: https://lore.kernel.org/r/20231220-cxl-cper-v5-1-1bb8a4ca2c7a@intel.com
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-01-03 14:15:52 -08:00
Dan Williams
11c8393202 Merge branch 'for-6.8/cxl-cdat' into for-6.8/cxl
Pick up the CDAT parsing and QOS class infrastructure for v6.8.
2024-01-02 11:03:04 -08:00
Randy Dunlap
58f1e9d3a3 cxl/region: use %pap format to print resource_size_t
Use "%pap" to print a resource_size_t (phys_addr_t derived type)
to prevent build warnings on 32-bit arches (seen on i386 and
riscv-32).

../drivers/cxl/core/region.c: In function 'alloc_hpa':
../drivers/cxl/core/region.c:556:25: warning: format '%llx' expects argument of type 'long long unsigned int', but argument 5 has type 'resource_size_t' {aka 'unsigned int'} [-Wformat=]
  556 |                         "HPA allocation error (%ld) for size:%#llx in %s %pr\n",

Fixes: 7984d22f13 ("cxl/region: Add dev_dbg() detail on failure to allocate HPA space")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Fan Ni <fan.ni@samsung.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Alison Schofield <alison.schofield@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc:  <linux-cxl@vger.kernel.org>
Link: https://lore.kernel.org/r/20240102173917.19718-1-rdunlap@infradead.org
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-01-02 11:02:49 -08:00
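
For reference, %pap consumes a pointer to the phys_addr_t-sized value,
so the call site passes the address of the size rather than the value
itself (surrounding names illustrative):

    resource_size_t size = SZ_256M;

    /* %pap dereferences the pointer and prints the value in hex */
    dev_dbg(dev, "size:%pap\n", &size);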
Alison Schofield
7984d22f13 cxl/region: Add dev_dbg() detail on failure to allocate HPA space
When the region driver fails while allocating HPA space for a
new region it can be because the parent resource, the CXL Window,
has no more available space.

In that case, the debug user sees this message:
cxl_core:alloc_hpa:555: cxl region2: failed to allocate HPA: -34

Expand the message like this:
cxl_core:alloc_hpa:555: cxl region8: HPA allocation error (-34) for size:0x20000000 in CXL Window 0 [mem 0xf010000000-0xf04fffffff flags 0x200]

Now the debug user can examine /proc/iomem and consider actions
like removing other allocations in that space or reducing the
size of their region request.

Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Link: https://lore.kernel.org/r/20231223004740.1401858-1-alison.schofield@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-12-24 13:43:48 -08:00
Dave Jiang
185c1a489f cxl: Check qos_class validity on memdev probe
Add a check to make sure the qos_class for the device will match one of
the root decoders' qos_class. If no match is found, then the qos_class for
the device is set to invalid. Also add a check to ensure that the device's
host bridge matches to one of the root decoder's downstream targets.

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/170319626313.2212653.9021004640856081917.stgit@djiang5-mobl3
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-12-22 15:46:02 -08:00
Dave Jiang
42834b17cf cxl: Export sysfs attributes for memory device QoS class
Export qos_class sysfs attributes for the CXL memory device. The QoS class
should show up as /sys/bus/cxl/devices/memX/ram/qos_class for the volatile
partition and /sys/bus/cxl/devices/memX/pmem/qos_class for the persistent
partition. The QTG ID is retrieved via _DSM after supplying the
calculated bandwidth and latency for the entire CXL path from device to
the CPU. This ID is used to match up to the root decoder QoS class to
determine which CFMWS the memory range of a hotplugged CXL mem device
should be assigned under.

While there may be multiple DSMAS exported by the device CDAT, the driver
will only expose the first QTG ID per partition in sysfs for now. In the
future when multiple QTG IDs are necessary, they can be exposed. [1]

[1]: https://lore.kernel.org/linux-cxl/167571650007.587790.10040913293130712882.stgit@djiang5-mobl3.local/T/#md2a47b1ead3e1ba08f50eab29a4af1aed1d215ab

Suggested-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/170319625698.2212653.17544381274847420961.stgit@djiang5-mobl3
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-12-22 15:46:02 -08:00
Dave Jiang
86557b7edf cxl: Store QTG IDs and related info to the CXL memory device context
Once the QTG ID _DSM is executed successfully, the QTG ID is retrieved from
the return package. Create a list of entries in the cxl_memdev context and
store the QTG ID as qos_class token and the associated DPA range. This
information can be exposed to user space via sysfs in order to help region
setup for hot-plugged CXL memory devices.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/170319625109.2212653.11872111896220384056.stgit@djiang5-mobl3
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-12-22 15:31:53 -08:00
Dave Jiang
7a4f148dd8 cxl: Compute the entire CXL path latency and bandwidth data
CXL Memory Device SW Guide [1] rev1.0 2.11.2 provides instructions on how to
calculate latency and bandwidth for CXL memory device. Calculate minimum
bandwidth and total latency for the path from the CXL device to the root
port. The QTG id is retrieved by providing the performance data as input
and calling the root port callback ->get_qos_class(). The retrieved id is
stored with the cxl_port of the CXL device.

For example for a device that is directly attached to a host bus:
Total Latency = Device Latency (from CDAT) + Dev to Host Bus (HB) Link
		Latency + Generic Port Latency
Min Bandwidth = Min bandwidth for link bandwidth between HB
		and CXL device, device CDAT bandwidth, and Generic Port
		Bandwidth

For a device that has a switch in between host bus and CXL device:
Total Latency = Device (CDAT) Latency + Dev to Switch Link Latency +
		Switch (CDAT) Latency + Switch to HB Link Latency +
		Generic Port Latency
Min Bandwidth = Min bandwidth for link bandwidth between CXL device
		to CXL switch, CXL device CDAT bandwidth, CXL switch CDAT
		bandwidth, CXL switch to HB bandwidth, and Generic Port
		Bandwidth.

[1]: https://cdrdv2-public.intel.com/643805/643805_CXL%20Memory%20Device%20SW%20Guide_Rev1p0.pdf

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/170319624458.2212653.13252496567443656371.stgit@djiang5-mobl3
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-12-22 15:31:52 -08:00
Dave Jiang
14a6960b3e cxl: Add helper function that calculate performance data for downstream ports
The CDAT information from the switch, Switch Scoped Latency and Bandwidth
Information Structure (SSLBIS), is parsed and stored under a cxl_dport
based on the correlated downstream port id from the SSLBIS entry. Walk
the entire CXL port paths and collect all the performance data. Also
pick up the link latency number that's stored under the dports. The
entire path PCIe bandwidth can be retrieved using the
pcie_bandwidth_available() call.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/170319623824.2212653.10302079766473698427.stgit@djiang5-mobl3
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-12-22 15:31:52 -08:00
Dave Jiang
1037b82fcc cxl: Store the access coordinates for the generic ports
Each CXL host bridge is represented by an ACPI0016 device. A generic port
device handle that is an ACPI device is represented by a string of
ACPI0016 device HID and UID. Create a device handle from the ACPI device
and retrieve the access coordinates from the stored memory targets. The
access coordinates are stored under the cxl_dport that is associated with
the CXL host bridge.

The access coordinates struct is dynamically allocated under cxl_dport in
order for code later on to detect whether the data exists or not.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/170319623196.2212653.17916695743464172534.stgit@djiang5-mobl3
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-12-22 15:31:52 -08:00
Dave Jiang
4d07a05397 cxl: Calculate and store PCI link latency for the downstream ports
The latency is calculated by dividing the flit size by the bandwidth. Add
support to retrieve the flit size for the CXL switch device and calculate
the latency of the PCIe link. Cache the latency number with cxl_dport.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/170319621931.2212653.6800240203604822886.stgit@djiang5-mobl3
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-12-22 14:53:49 -08:00
Dave Jiang
790815902e cxl: Add support for _DSM Function for retrieving QTG ID
CXL spec v3.0 9.17.3 CXL Root Device Specific Methods (_DSM)

Add support to retrieve QTG ID via ACPI _DSM call. The _DSM call requires
an input of an ACPI package with 4 dwords (read latency, write latency,
read bandwidth, write bandwidth). The call returns a package with 1 WORD
that provides the max supported QTG ID and a package that may contain 0 or
more WORDs as the recommended QTG IDs in the recommended order.

Create a cxl_root container for the root cxl_port and provide a callback
->get_qos_class() in order to retrieve the QoS class. For the ACPI case,
the _DSM helper is used to retrieve the QTG ID and returned. A
devm_cxl_add_root() function is added for root port setup and registration
of the cxl_root callback operation(s).

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/170319621294.2212653.1649682083061569256.stgit@djiang5-mobl3
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-12-22 14:33:28 -08:00
Dave Jiang
80aa780dda cxl: Add callback to parse the SSLBIS subtable from CDAT
Provide a callback to parse the Switch Scoped Latency and Bandwidth
Information Structure (SSLBIS) in the CDAT structures. The SSLBIS
contains the bandwidth and latency information that's tied to the
CXL switch that the data table has been read from. The extracted
values are stored to the cxl_dport correlated by the port_id
depending on the SSLBIS entry.

Coherent Device Attribute Table 1.03 2.1 Switch Scoped Latency
and Bandwidth Information Structure (SSLBIS)

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/170319620635.2212653.5194389158785365150.stgit@djiang5-mobl3
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-12-22 14:33:28 -08:00
Dave Jiang
63cef81b9d cxl: Add callback to parse the DSLBIS subtable from CDAT
Provide a callback to parse the Device Scoped Latency and Bandwidth
Information Structure (DSLBIS) in the CDAT structures. The DSLBIS
contains the bandwidth and latency information that's tied to a DSMAS
handle. The driver will retrieve the read and write latency and
bandwidth associated with the DSMAS which is tied to a DPA range.

Coherent Device Attribute Table 1.03 2.1 Device Scoped Latency and
Bandwidth Information Structure (DSLBIS)

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/170319620005.2212653.7475488478229720542.stgit@djiang5-mobl3
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-12-22 14:33:28 -08:00
Dave Jiang
ad6f04c026 cxl: Add callback to parse the DSMAS subtables from CDAT
Provide a callback function to the CDAT parser in order to parse the
Device Scoped Memory Affinity Structure (DSMAS). Each DSMAS structure
contains the DPA range and its associated attributes in each entry. See
the CDAT specification for details. The device handle and the DPA range
is saved and to be associated with the DSLBIS locality data when the
DSLBIS entries are parsed. The xarray is a local variable. When the
total path performance data is calculated and storred this xarray can be
discarded.

Coherent Device Attribute Table 1.03 2.1 Device Scoped Memory Affinity
Structure (DSMAS)

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/170319619355.2212653.2675953129671561293.stgit@djiang5-mobl3
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-12-22 14:33:10 -08:00
Dave Jiang
ace196de69 cxl: Fix unregister_region() callback parameter assignment
In devm_cxl_add_region(), devm_add_action_or_reset() is called by
passing in unregister_region() with data ptr of 'cxlr'. However, in
unregister_region(), the passed in parameter is incorrectly assumed to
be a 'struct device' rather than the 'cxlr' pointer. The code has been
working because 'struct device' is the first member of 'struct
cxl_region'. Issue found by inspection. Fix the assignment so that cxlr
is pointing directly to the passed in parameter.

Not flagged for -stable since there is no functional impact of this fix.

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/170258123810.952211.3907381447996426480.stgit@djiang5-mobl3
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-12-18 15:35:00 -08:00
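
The corrected shape of the callback, with the remaining teardown elided:

    static void unregister_region(void *_cxlr)
    {
            /* devm_add_action_or_reset() hands back exactly the pointer that
             * was registered, i.e. the cxl_region, not a struct device */
            struct cxl_region *cxlr = _cxlr;

            device_unregister(&cxlr->dev);
    }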
Ira Weiny
ef3d5cf9c5 cxl/pmu: Ensure put_device on pmu devices
The following kmemleaks were detected when removing the cxl module
stack:

unreferenced object 0xffff88822616b800 (size 1024):
...
  backtrace:
    [<00000000bedc6f83>] kmalloc_trace+0x26/0x90
    [<00000000448d1afc>] devm_cxl_pmu_add+0x3a/0x110 [cxl_core]
    [<00000000ca3bfe16>] 0xffffffffa105213b
    [<00000000ba7f78dc>] local_pci_probe+0x41/0x90
    [<000000005bb027ac>] pci_device_probe+0xb0/0x1c0
...
unreferenced object 0xffff8882260abcc0 (size 16):
...
  hex dump (first 16 bytes):
    70 6d 75 5f 6d 65 6d 30 2e 30 00 26 82 88 ff ff  pmu_mem0.0.&....
  backtrace:
...
    [<00000000152b5e98>] dev_set_name+0x43/0x50
    [<00000000c228798b>] devm_cxl_pmu_add+0x102/0x110 [cxl_core]
    [<00000000ca3bfe16>] 0xffffffffa105213b
    [<00000000ba7f78dc>] local_pci_probe+0x41/0x90
    [<000000005bb027ac>] pci_device_probe+0xb0/0x1c0
...
unreferenced object 0xffff8882272af200 (size 256):
...
  backtrace:
    [<00000000bedc6f83>] kmalloc_trace+0x26/0x90
    [<00000000a14d1813>] device_add+0x4ea/0x890
    [<00000000a3f07b47>] devm_cxl_pmu_add+0xbe/0x110 [cxl_core]
    [<00000000ca3bfe16>] 0xffffffffa105213b
    [<00000000ba7f78dc>] local_pci_probe+0x41/0x90
    [<000000005bb027ac>] pci_device_probe+0xb0/0x1c0
...

devm_cxl_pmu_add() correctly registers a device remove function, but it
only calls device_del(), which is only part of device unregistration.

Properly call device_unregister() to free up the memory associated with
the device.
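
A minimal sketch of the fix pattern (illustrative names): device_unregister()
is device_del() plus the final put_device(), and it is that put which
releases the name and the device allocation shown in the leak reports:

static void remove_dev(void *dev)
{
    /* was: device_del(dev); -- drops the device but leaks the reference */
    device_unregister(dev);
}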

Fixes: 1ad3f701c3 ("cxl/pci: Find and register CXL PMU devices")
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20231016-pmu-unregister-fix-v1-1-1e2eb2fa3c69@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-12-14 21:54:45 -08:00
Ira Weiny
c65efe3685 cxl/cdat: Free correct buffer on checksum error
The new 6.7-rc1 kernel now checks the checksum on CDAT data.  While
using a branch of Fan's DCD qemu work (and specifying DCD devices), the
following splat was observed.

	WARNING: CPU: 1 PID: 1384 at drivers/base/devres.c:1064 devm_kfree+0x4f/0x60
	...
	RIP: 0010:devm_kfree+0x4f/0x60
	...
 	? devm_kfree+0x4f/0x60
 	read_cdat_data+0x1a0/0x2a0 [cxl_core]
 	cxl_port_probe+0xdf/0x200 [cxl_port]
	...

The issue in qemu is still unknown, but the splat is a straightforward
bug in the CDAT checksum processing code.  Use a CDAT buffer variable to
ensure the devm_kfree() works correctly on error.
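
A sketch of the idea (variable and helper names hypothetical): keep a
pointer to the exact devm allocation so the error path frees what
devm_kzalloc() returned rather than a derived pointer:

void *cdat_buf = devm_kzalloc(dev, length, GFP_KERNEL);

if (!cdat_buf)
    return;

if (read_cdat(port, cdat_buf, length) || cdat_checksum(cdat_buf, length)) {
    /* free the original allocation, not an advanced/cast pointer */
    devm_kfree(dev, cdat_buf);
    return;
}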

Fixes: 670e4e88f3 ("cxl: Add checksum verification to CDAT from CXL")
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Reviewed-by: Robert Richter <rrichter@amd.com>
Link: http://lore.kernel.org/r/20231116-fix-cdat-devm-free-v1-1-b148b40707d7@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-12-08 16:14:28 -08:00
Dan Williams
6f5c4eca48 cxl/hdm: Fix dpa translation locking
The helper, cxl_dpa_resource_start(), snapshots the dpa-address of an
endpoint-decoder after acquiring the cxl_dpa_rwsem. However, it is
sufficient to assert that cxl_dpa_rwsem is held rather than acquire it
in the helper. Otherwise, it triggers multiple lockdep reports:

1/ Tracing callbacks are in an atomic context that can not acquire sleeping
locks:

    BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1525
    in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1288, name: bash
    preempt_count: 2, expected: 0
    RCU nest depth: 0, expected: 0
    [..]
    Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS edk2-20230524-3.fc38 05/24/2023
    Call Trace:
     <TASK>
     dump_stack_lvl+0x71/0x90
     __might_resched+0x1b2/0x2c0
     down_read+0x1a/0x190
     cxl_dpa_resource_start+0x15/0x50 [cxl_core]
     cxl_trace_hpa+0x122/0x300 [cxl_core]
     trace_event_raw_event_cxl_poison+0x1c9/0x2d0 [cxl_core]

2/ The rwsem is already held in the inject poison path:

    WARNING: possible recursive locking detected
    6.7.0-rc2+ #12 Tainted: G        W  OE    N
    --------------------------------------------
    bash/1288 is trying to acquire lock:
    ffffffffc05f73d0 (cxl_dpa_rwsem){++++}-{3:3}, at: cxl_dpa_resource_start+0x15/0x50 [cxl_core]

    but task is already holding lock:
    ffffffffc05f73d0 (cxl_dpa_rwsem){++++}-{3:3}, at: cxl_inject_poison+0x7d/0x1e0 [cxl_core]
    [..]
    Call Trace:
     <TASK>
     dump_stack_lvl+0x71/0x90
     __might_resched+0x1b2/0x2c0
     down_read+0x1a/0x190
     cxl_dpa_resource_start+0x15/0x50 [cxl_core]
     cxl_trace_hpa+0x122/0x300 [cxl_core]
     trace_event_raw_event_cxl_poison+0x1c9/0x2d0 [cxl_core]
     __traceiter_cxl_poison+0x5c/0x80 [cxl_core]
     cxl_inject_poison+0x1bc/0x1e0 [cxl_core]

This appears to have been an issue since the initial implementation and
was uncovered by the new cxl-poison.sh test [1]. That test is now passing with
these changes.
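
One way to express that requirement, as a sketch (not the verbatim
patch): assert the lock in the helper and leave acquisition to callers
that are allowed to sleep:

resource_size_t cxl_dpa_resource_start(struct cxl_endpoint_decoder *cxled)
{
    resource_size_t base = -1;

    /* callers must already hold cxl_dpa_rwsem (read or write) */
    lockdep_assert_held(&cxl_dpa_rwsem);
    if (cxled->dpa_res)
        base = cxled->dpa_res->start;

    return base;
}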

Fixes: 28a3ae4ff6 ("cxl/trace: Add an HPA to cxl_poison trace events")
Link: http://lore.kernel.org/r/e4f2716646918135ddbadf4146e92abb659de734.1700615159.git.alison.schofield@intel.com [1]
Cc: <stable@vger.kernel.org>
Cc: Alison Schofield <alison.schofield@intel.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-12-07 19:14:04 -08:00
Davidlohr Bueso
cb46fca88d cxl: Add Support for Get Timestamp
Add the call to the UAPI such that userspace may correlate the
timestamps from the device log with system wall time if, for
example, there's any sort of inaccuracy or skew in the device.

Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20230829152014.15452-1-dave@stgolabs.net
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-12-07 12:44:19 -08:00
Alison Schofield
0e33ac9c3f cxl/memdev: Hold region_rwsem during inject and clear poison ops
Poison inject and clear are supported via debugfs where a privileged
user can inject and clear poison to a device physical address.

Commit 458ba8189c ("cxl: Add cxl_decoders_committed() helper")
added a lockdep assert that highlighted a gap in poison inject and
clear functions where holding the dpa_rwsem does not assure that a
DPA is not added to a region.

The impact for inject and clear is that if the DPA address being
injected or cleared has been attached to a region, but not yet
committed, the dev_dbg() message intended to alert the debug user
that they are acting on a mapped address is not emitted. Also, the
cxl_poison trace event that serves as a log of the inject and clear
activity will not include region info.

Close this gap by snapshotting an unchangeable region state during
poison inject and clear operations. That means holding both the
region_rwsem and the dpa_rwsem during the inject and clear ops.
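
A sketch of the resulting locking shape (illustrative; validation and
mailbox details elided, and the region-before-dpa ordering is an
assumption based on the region driver):

int cxl_inject_poison(struct cxl_memdev *cxlmd, u64 dpa)
{
    int rc;

    rc = down_read_interruptible(&cxl_region_rwsem);
    if (rc)
        return rc;

    rc = down_read_interruptible(&cxl_dpa_rwsem);
    if (rc) {
        up_read(&cxl_region_rwsem);
        return rc;
    }

    /* ... warn via dev_dbg() if mapped, send mailbox cmd, trace ... */

    up_read(&cxl_dpa_rwsem);
    up_read(&cxl_region_rwsem);
    return 0;
}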

Fixes: d2fbc48658 ("cxl/memdev: Add support for the Inject Poison mailbox command")
Fixes: 9690b07748 ("cxl/memdev: Add support for the Clear Poison mailbox command")
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/08721dc1df0a51e4e38fecd02425c3475912dfd5.1701041440.git.alison.schofield@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-11-29 17:23:03 -08:00
Alison Schofield
5558b92e8d cxl/core: Always hold region_rwsem while reading poison lists
A read of a device poison list is triggered via a sysfs attribute
and the results are logged as kernel trace events of type cxl_poison.
The work is managed by either: a) the region driver when one or more
regions map the device, or b) the memdev driver when no regions
map the device.

In the case of a) the region driver holds the region_rwsem while
reading the poison by committed endpoint decoder mappings and for
any unmapped resources. This makes sure that the cxl_poison trace
event reports valid region info (region name, HPA, and UUID).

In the case of b) the memdev driver holds the dpa_rwsem preventing
new DPA resources from being attached to a region. However, it leaves
a gap between region attach and decoder commit actions. If a DPA in
the gap is in the poison list, the cxl_poison trace event will omit
the region info.

Close the gap by holding the region_rwsem and the dpa_rwsem when
reading poison per memdev. Since both methods now hold both locks,
down_read both from the caller. Doing so also addresses the lockdep
assert that found this issue:
Commit 458ba8189c ("cxl: Add cxl_decoders_committed() helper")

Fixes: f0832a5863 ("cxl/region: Provide region info to the cxl_poison trace event")
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/08e8e7ec9a3413b91d51de39e385653494b1eed0.1701041440.git.alison.schofield@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-11-29 17:03:53 -08:00
Dave Jiang
36a1c2ee50 cxl/hdm: Fix a benign lockdep splat
The new helper "cxl_num_decoders_committed()" added a lockdep assertion
to validate that port->commit_end is protected against modification.
That assertion fires in init_hdm_decoder() where it is initializing
port->commit_end. Given that it is both accessing and writing that
property, it ostensibly needs the lock.

In practice, CXL decoder commit rules (must commit in order) and the
in-order discovery of device decoders makes the manipulation of
->commit_end in init_hdm_decoder() safe. However, rather than rely on
the subtle rules of CXL hardware, just make the implementation obviously
correct from a software perspective.

The Fixes: tag is only for cleaning up a lockdep splat, there is no
functional issue addressed by this fix.

Fixes: 458ba8189c ("cxl: Add cxl_decoders_committed() helper")
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/170025232811.2147250.16376901801315194121.stgit@djiang5-mobl3
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-11-22 16:34:30 -08:00
Terry Bowman
b3741ac86c cxl/pci: Change CXL AER support check to use native AER
Native CXL protocol errors are delivered to the OS through AER
reporting. The owner of AER owns CXL Protocol error management with
respect to _OSC negotiation.[1] CXL device errors are handled by a
separate interrupt with native control gated by _OSC control field
'CXL Memory Error Reporting Control'.

The CXL driver incorrectly checks for 'CXL Memory Error Reporting
Control' before accessing AER registers and caching RCH downstream
port AER registers. Replace the current check in these 2 cases with
native AER checks.

[1] CXL 3.0 - 9.17.2 CXL _OSC, Table-9-26, Interpretation of CXL
_OSC Support Fields, p.641

Fixes: f05fd10d13 ("cxl/pci: Add RCH downstream port AER register discovery")
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Link: https://lore.kernel.org/r/20231102155232.1421261-1-terry.bowman@amd.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-11-02 14:09:01 -07:00
Dan Williams
5d09c63f11 cxl/hdm: Remove broken error path
Dan reports that cxl_decoder_commit() potentially leaks a hold of
cxl_dpa_rwsem. The potential error case is a "should not happen"
scenario; turn it into a "cannot happen" scenario by adding the error
check to cxl_port_setup_targets() where other setting validation occurs.

Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: http://lore.kernel.org/r/63295673-5d63-4919-b851-3b06d48734c0@moroto.mountain
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Fixes: 176baefb2e ("cxl/hdm: Commit decoder state to hardware")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-31 14:10:04 -07:00
Dan Carpenter
69d56b15a7 cxl/hdm: Fix && vs || bug
If "info" is NULL then this code will crash.  || was intended instead of
&&.
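
The short-circuit hazard, as a standalone sketch (struct and field
names illustrative):

#include <stdbool.h>
#include <stdio.h>

struct info { bool mem_enabled; };

static bool info_unusable(const struct info *info)
{
    /*
     * Intended: "||" short-circuits when info == NULL, so the
     * dereference only happens for a valid pointer. The buggy form,
     * "!info && !info->mem_enabled", evaluates the second operand
     * exactly when info == NULL and crashes there.
     */
    return !info || !info->mem_enabled;
}

int main(void)
{
    printf("%d\n", info_unusable(NULL));    /* prints 1, no crash */
    return 0;
}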

Fixes: 8ce520fdea ("cxl/hdm: Use stored Component Register mappings to map HDM decoder capability")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Robert Richter <rrichter@amd.com>
Link: https://lore.kernel.org/r/60028378-d3d5-4d6d-90fd-f915f061e731@moroto.mountain
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-31 14:09:50 -07:00
Dan Williams
b3cfdbf6a0 Merge branch 'for-6.7/cxl-commited' into cxl/next
Add the committed decoder sysfs attribute for v6.7.
2023-10-31 11:00:08 -07:00
Dan Williams
de5512b2a2 Merge branch 'for-6.7/cxl' into cxl/next
Pickup some misc. CXL updates for v6.7.
2023-10-31 10:59:44 -07:00
Dan Williams
624eda92ab Merge branch 'for-6.7/cxl-qtg' into cxl/next
Merge some prep-work for CXL QOS class support. This cycle saw large
collisions with mm on this topic, so the bulk of this topic needs to
wait.
2023-10-31 10:59:26 -07:00
Dan Williams
7f946e6d83 Merge branch 'for-6.7/cxl-rch-eh' into cxl/next
Restricted CXL Host (RCH) Error Handling undoes the topology munging of
CXL 1.1 to enable some AER recovery, and lands some base infrastructure
for handling Root-Complex-Event-Collectors (RCECs) with CXL. Include
this long-running series finally for v6.7.
2023-10-31 10:59:00 -07:00
Dave Jiang
8358e8f159 cxl: Add support for reading CXL switch CDAT table
Add a read_cdat_data() call in cxl_switch_port_probe() to allow
reading of CDAT data for CXL switches. read_cdat_data() needs to be
adjusted to retrieve the PCIe device depending on whether the passed-in
port is an endpoint or a switch.

Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/169713682855.2205276.6418370379144967443.stgit@djiang5-mobl3
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-27 20:48:02 -07:00
Dave Jiang
670e4e88f3 cxl: Add checksum verification to CDAT from CXL
A CDAT table is available from a CXL device. The table is read by the
driver and cached in software. With the CXL subsystem needing to parse the
CDAT table, the checksum should be verified. Add checksum verification
after the CDAT table is read from the device.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/169713682277.2205276.2687265961314933628.stgit@djiang5-mobl3
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-27 20:48:02 -07:00
Dave Jiang
529c0a4404 cxl: Export QTG ids from CFMWS to sysfs as qos_class attribute
Export the QoS Throttling Group ID from the CXL Fixed Memory Window
Structure (CFMWS) under the root decoder sysfs attributes as qos_class.

CXL rev3.0 9.17.1.3 CXL Fixed Memory Window Structure (CFMWS)

The cxl CLI will use this id to match against the _DSM-retrieved id for a
hot-plugged CXL memory device DPA memory range to make sure that the
DPA range is under the right CFMWS window.

Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/169713681699.2205276.14475306324720093079.stgit@djiang5-mobl3
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-27 20:48:02 -07:00
Dave Jiang
05e37b2138 cxl: Add decoders_committed sysfs attribute to cxl_port
This attribute allows cxl-cli to determine whether there are decoders
committed to a memdev.  This is only a snapshot of the state, and
doesn't offer any protection or serialization against a concurrent
disable-region operation.

Reviewed-by: Jim Harris <jim.harris@samsung.com>
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/169747907439.272156.10261062080830155662.stgit@djiang5-mobl3
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-27 20:29:41 -07:00
Dave Jiang
458ba8189c cxl: Add cxl_decoders_committed() helper
Add a helper to retrieve the number of decoders committed for the port.
Replace all the open coding of the calculation with the helper.
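
A sketch of what such a helper can look like (the lock it asserts and
the use of ->commit_end are assumptions about the implementation):

static inline int cxl_num_decoders_committed(struct cxl_port *port)
{
    lockdep_assert_held(&cxl_region_rwsem); /* assumed guard for ->commit_end */

    /* ->commit_end is the index of the last committed decoder, -1 if none */
    return port->commit_end + 1;
}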

Link: https://lore.kernel.org/linux-cxl/651c98472dfed_ae7e729495@dwillia2-xfh.jf.intel.com.notmuch/
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Jim Harris <jim.harris@samsung.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Link: https://lore.kernel.org/r/169747906849.272156.1729290904857372335.stgit@djiang5-mobl3
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-27 20:29:41 -07:00
Robert Richter
e8db070160 cxl/core/regs: Rework cxl_map_pmu_regs() to use map->dev for devm
struct cxl_register_map carries a @dev parameter for devm operations.
Simplify the function interface to use that instead of a separate @dev
argument.

Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Signed-off-by: Robert Richter <rrichter@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20231018171713.1883517-21-rrichter@amd.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-27 20:13:39 -07:00
Robert Richter
d3970f006f cxl/core/regs: Rename phys_addr in cxl_map_component_regs()
Trivial change that renames the variable phys_addr in
cxl_map_component_regs() to a shorter name, to keep lines within the
80-character limit and for consistency between the different
paths.

Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Signed-off-by: Robert Richter <rrichter@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20231018171713.1883517-20-rrichter@amd.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-27 20:13:39 -07:00
Terry Bowman
d1a9def33d cxl/pci: Disable root port interrupts in RCH mode
The RCH root port contains root command AER registers that should not be
enabled.[1] Disable these to prevent root port interrupts.

[1] CXL 3.0 - 12.2.1.1 RCH Downstream Port-detected Errors

Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Signed-off-by: Robert Richter <rrichter@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20231018171713.1883517-17-rrichter@amd.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-27 20:13:38 -07:00
Terry Bowman
6ac07883db cxl/pci: Add RCH downstream port error logging
RCH downstream port error logging is missing in the current CXL driver. The
missing AER and RAS error logging is needed for communicating driver error
details to userspace. Update the driver to include PCIe AER and CXL RAS
error logging.

Add RCH downstream port error handling into the existing RCiEP handler.
The downstream port error handler is added to the RCiEP error handler
because the downstream port is implemented in an RCRB, is not PCI
enumerable, and as a result is not directly accessible to the PCI AER
root port driver. The AER root port driver calls the RCiEP handler for
handling RCD errors and RCH downstream port protocol errors.

Update existing RCiEP correctable and uncorrectable handlers to also call
the RCH handler. The RCH handler will read the RCH AER registers, check for
error severity, and if an error exists will log using an existing kernel
AER trace routine. The RCH handler will also log downstream port RAS errors
if they exist.

Co-developed-by: Robert Richter <rrichter@amd.com>
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Signed-off-by: Robert Richter <rrichter@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20231018171713.1883517-16-rrichter@amd.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-27 20:13:38 -07:00
Terry Bowman
6c5f3aacb2 cxl/pci: Map RCH downstream AER registers for logging protocol errors
The restricted CXL host (RCH) error handler will log protocol errors
using AER and RAS status registers. The AER and RAS registers need to
be virtually memory mapped before enabling interrupts. Create the
initializer function devm_cxl_setup_parent_dport() for this when the
endpoint is connected with the dport. The initialization sets up the
RCH RAS and AER mappings.

Add 'struct cxl_regs' to 'struct cxl_dport' for saving a pointer to
the RCH downstream port's AER and RAS registers.

Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Co-developed-by: Robert Richter <rrichter@amd.com>
Signed-off-by: Robert Richter <rrichter@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20231018171713.1883517-15-rrichter@amd.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-27 20:13:38 -07:00
Terry Bowman
bf6c9fa846 cxl/pci: Update CXL error logging to use RAS register address
The CXL error handler currently only logs endpoint RAS status. The CXL
topology includes several components providing RAS details to be logged
during error handling.[1] Update the current handler's RAS logging to use a
RAS register address. Also, update the error handler function names to be
consistent with correctable and uncorrectable RAS. This will allow for
adding support to log other CXL component's RAS details in the future.

[1] CXL3.0 Table 8-22 CXL_Capability_ID Assignment

Co-developed-by: Robert Richter <rrichter@amd.com>
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Signed-off-by: Robert Richter <rrichter@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20231018171713.1883517-14-rrichter@amd.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-27 20:13:38 -07:00
Terry Bowman
6777877eb7 PCI/AER: Refactor cper_print_aer() for use by CXL driver module
The CXL driver plans to use cper_print_aer() for logging restricted CXL
host (RCH) AER errors. cper_print_aer() is not currently exported and
therefore not usable by the CXL drivers built as loadable modules. Export
the cper_print_aer() function. Use the EXPORT_SYMBOL_NS_GPL() variant
to restrict the export to CXL drivers.

The CONFIG_ACPI_APEI_PCIEAER kernel config is currently used to enable
cper_print_aer(). cper_print_aer() logs the AER registers and is
useful in PCIE AER logging outside of APEI. Remove the
CONFIG_ACPI_APEI_PCIEAER dependency to enable cper_print_aer().

The cper_print_aer() function name implies CPER specific use but is useful
in non-CPER cases as well. Rename cper_print_aer() to pci_print_aer().

Also, update cxl_core to import the CXL namespace.

Co-developed-by: Robert Richter <rrichter@amd.com>
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Signed-off-by: Robert Richter <rrichter@amd.com>
Cc: Mahesh J Salgaonkar <mahesh@linux.ibm.com>
Cc: Oliver O'Halloran <oohall@gmail.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-pci@vger.kernel.org
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20231018171713.1883517-13-rrichter@amd.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-27 20:13:38 -07:00
Robert Richter
f05fd10d13 cxl/pci: Add RCH downstream port AER register discovery
Restricted CXL host (RCH) downstream port AER information is not currently
logged while in the error state. One problem preventing the error logging
is that the AER and RAS registers are not accessible. The CXL driver requires
changes to find the RCH downstream port AER and RAS registers for the
purpose of error logging.

RCH downstream ports are not enumerated during a PCI bus scan and are
instead discovered using system firmware, ACPI in this case.[1] The
downstream port is implemented as a Root Complex Register Block (RCRB).
The RCRB is a 4k memory block containing PCIe registers based on the PCIe
root port.[2] The RCRB includes AER extended capability registers used for
reporting errors. Note, the RCH's AER Capability is located in the RCRB
memory space instead of PCI configuration space, thus its register access
is different. Existing kernel PCIe AER functions can not be used to manage
the downstream port AER capabilities and RAS registers because the port was
not enumerated during PCI scan and the registers are not PCI config
accessible.

Discover RCH downstream port AER extended capability registers. Use MMIO
accesses to search for extended AER capability in RCRB register space.

[1] CXL 3.0 Spec, 9.11.2 - System Firmware View of CXL 1.1 Hierarchy
[2] CXL 3.0 Spec, 8.2.1.1 - RCH Downstream Port RCRB

Co-developed-by: Robert Richter <rrichter@amd.com>
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Signed-off-by: Robert Richter <rrichter@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20231018171713.1883517-12-rrichter@amd.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-27 20:13:38 -07:00
Robert Richter
a2fcb84a19 cxl/port: Remove Component Register base address from struct cxl_port
The Component Register base address @component_reg_phys is no longer
used after the rework of the Component Register setup which now uses
struct member @reg_map instead. Remove the base address.

Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Signed-off-by: Robert Richter <rrichter@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20231018171713.1883517-10-rrichter@amd.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-27 20:13:37 -07:00
Robert Richter
f611d98a00 cxl/pci: Remove Component Register base address from struct cxl_dev_state
The Component Register base address @component_reg_phys is no longer
used after the rework of the Component Register setup which now uses
struct member @reg_map instead. Remove the base address.

Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Signed-off-by: Robert Richter <rrichter@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20231018171713.1883517-9-rrichter@amd.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-27 20:13:37 -07:00
Robert Richter
8ce520fdea cxl/hdm: Use stored Component Register mappings to map HDM decoder capability
Now, that the Component Register mappings are stored, use them to
enable and map the HDM decoder capabilities. The Component Registers
do not need to be probed again for this, remove probing code.

The HDM capability applies to Endpoints, USPs and VH Host Bridges. The
Endpoint's component register mappings are located in the cxlds, while
the others are in the port's structure. Duplicate the cxlds->reg_map in
port->reg_map for endpoint ports.

Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Signed-off-by: Robert Richter <rrichter@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
[rework to drop cxl_port_get_comp_map()]
Link: https://lore.kernel.org/r/20231018171713.1883517-8-rrichter@amd.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-27 20:13:37 -07:00
Robert Richter
2dd1827920 cxl/pci: Store the endpoint's Component Register mappings in struct cxl_dev_state
Same as for ports and dports, also store the endpoint's Component
Register mappings, use struct cxl_dev_state for that.

Keep the Component Register base address @component_reg_phys a bit to
not break functionality. It will be removed after the transition in a
later patch.

Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Signed-off-by: Robert Richter <rrichter@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20231018171713.1883517-7-rrichter@amd.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-27 20:13:37 -07:00
Robert Richter
4d758764e7 cxl/port: Pre-initialize component register mappings
The component registers of a component may not exist, and
cxl_setup_comp_regs() will fail for that reason. In another case,
software may not have set those registers up, and cxl_setup_comp_regs()
is then called with a base address of CXL_RESOURCE_NONE. Both are
valid cases, but the function returns without initializing the
register map.

Now, a missing component register block is not necessarily a reason to
fail (feature is optional or its existence checked later). Change
cxl_setup_comp_regs() to also use components with the component
register block missing. Thus, always initialize struct
cxl_register_map with valid values, set @dev and make @resource
CXL_RESOURCE_NONE.

The change is in preparation of follow-on patches.

Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Signed-off-by: Robert Richter <rrichter@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20231018171713.1883517-6-rrichter@amd.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-27 20:13:37 -07:00
Robert Richter
d8add49263 cxl/port: Rename @comp_map to @reg_map in struct cxl_register_map
Name the field @reg_map, because @reg_map->host will be used for
mapping operations beyond component registers (i.e. AER registers).
This is valid for all occurrences of @comp_map. Change them all.

Signed-off-by: Robert Richter <rrichter@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20231018171713.1883517-5-rrichter@amd.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-27 20:13:37 -07:00
Dan Williams
33d9c987bf cxl/port: Fix @host confusion in cxl_dport_setup_regs()
commit 5d2ffbe4b8 ("cxl/port: Store the downstream port's Component Register mappings in struct cxl_dport")

...moved the dport component registers from a raw component_reg_phys
passed in at dport instantiation time to a 'struct cxl_register_map'
populated with both the component register data *and* the "host" device
for mapping operations.

While typical CXL switch dports are mapped by their associated 'struct
cxl_port', an RCH host bridge dport registered by cxl_acpi needs to wait
until the cxl_mem driver makes the attachment to map the registers. This
is because there are no intervening 'struct cxl_port' instances between
the root cxl_port and the endpoint port in an RCH topology.

For now just mark the host as NULL in the RCH dport case until code that
needs to map the dport registers arrives.

This patch is not flagged for -stable since nothing in the current
driver uses the dport->comp_map.

Now, I am slightly uneasy that cxl_setup_comp_regs() sets map->host to a
wrong value and then cxl_dport_setup_regs() fixes it up, but the
alternatives I came up with are more messy. For example, adding an
@logdev to 'struct cxl_register_map' that the dev_printk()s can fall
back to when @host is NULL. I settled on "post-fixup+comment" since it
is only RCH dports that have this special case where register probing is
split between a host-bridge RCRB lookup and when cxl_mem_probe() does
the association of the cxl_memdev and endpoint port.

[moved rename of @comp_map to @reg_map into next patch]

Fixes: 5d2ffbe4b8 ("cxl/port: Store the downstream port's Component Register mappings in struct cxl_dport")
Signed-off-by: Robert Richter <rrichter@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20231018171713.1883517-4-rrichter@amd.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-27 20:13:36 -07:00
Robert Richter
dd22581f89 cxl/core/regs: Rename @dev to @host in struct cxl_register_map
The primary role of @dev is to host the mappings for devm operations.
@dev is too ambiguous as a name. I.e. when does @dev refer to the
'struct device *' instance that the registers belong, and when does
@dev refer to the 'struct device *' instance hosting the mapping for
devm operations?

Clarify the role of @dev in cxl_register_map by renaming it to @host.
Also, rename local variables to 'host' where map->host is used.

Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Signed-off-by: Robert Richter <rrichter@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20231018171713.1883517-3-rrichter@amd.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-27 20:13:36 -07:00
Dan Williams
8d2ad999ca cxl/port: Fix delete_endpoint() vs parent unregistration race
The CXL subsystem, at cxl_mem ->probe() time, establishes a lineage of
ports (struct cxl_port objects) between an endpoint and the root of a
CXL topology. Each port including the endpoint port is attached to the
cxl_port driver.

Given that setup, it follows that when either any port in that lineage
goes through a cxl_port ->remove() event, or the memdev goes through a
cxl_mem ->remove() event, the hierarchy below the removed port (or the
entire hierarchy if the memdev is removed) needs to come down.

The delete_endpoint() callback is careful to check whether it is being
called to tear down the hierarchy, or if it is only being called to
tear down the memdev because an ancestor port is going through
->remove().

That check needs to take the device_lock() of the endpoint's parent,
which requires 2 bugs to be fixed (a minimal sketch of the resulting
pattern follows the list):

1/ A reference on the parent is needed to prevent use-after-free
   scenarios like this signature:

    BUG: spinlock bad magic on CPU#0, kworker/u56:0/11
    Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS edk2-20230524-3.fc38 05/24/2023
    Workqueue: cxl_port detach_memdev [cxl_core]
    RIP: 0010:spin_bug+0x65/0xa0
    Call Trace:
      do_raw_spin_lock+0x69/0xa0
     __mutex_lock+0x695/0xb80
     delete_endpoint+0xad/0x150 [cxl_core]
     devres_release_all+0xb8/0x110
     device_unbind_cleanup+0xe/0x70
     device_release_driver_internal+0x1d2/0x210
     detach_memdev+0x15/0x20 [cxl_core]
     process_one_work+0x1e3/0x4c0
     worker_thread+0x1dd/0x3d0

2/ In the case of RCH topologies, the parent device that needs to be
   locked is not always @port->dev as returned by cxl_mem_find_port(), use
   endpoint->dev.parent instead.
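
A minimal sketch of the resulting pattern (illustrative; the actual
teardown logic is elided):

struct device *parent = endpoint->dev.parent;   /* not cxl_mem_find_port()'s port */

get_device(parent);     /* 1/ keep the parent alive across the lock */
device_lock(parent);
if (parent->driver) {
    /* parent still attached: tear down the endpoint hierarchy */
}
device_unlock(parent);
put_device(parent);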

Fixes: 8dd2bc0f8e ("cxl/mem: Add the cxl_mem driver")
Cc: <stable@vger.kernel.org>
Reported-by: Robert Richter <rrichter@amd.com>
Closes: http://lore.kernel.org/r/20231018171713.1883517-2-rrichter@amd.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-27 20:13:23 -07:00
Jim Harris
98a04c7ace cxl/region: Fix x1 root-decoder granularity calculations
Root decoder granularity must match the value from the CFMWS, which may
not be the region's granularity for non-interleaved root decoders.

So when calculating granularities for host bridge decoders, use the
region's granularity instead of the root decoder's granularity to ensure
the correct granularities are set for the host bridge decoders and any
downstream switch decoders.

Test configuration is 1 host bridge * 2 switches * 2 endpoints per switch.

Region created with 2048 granularity using following command line:

cxl create-region -m -d decoder0.0 -w 4 mem0 mem2 mem1 mem3 \
		  -g 2048 -s 2048M

Use "cxl list -PDE | grep granularity" to get a view of the granularity
set at each level of the topology.

Before this patch:
        "interleave_granularity":2048,
        "interleave_granularity":2048,
    "interleave_granularity":512,
        "interleave_granularity":2048,
        "interleave_granularity":2048,
    "interleave_granularity":512,
"interleave_granularity":256,

After:
        "interleave_granularity":2048,
        "interleave_granularity":2048,
    "interleave_granularity":4096,
        "interleave_granularity":2048,
        "interleave_granularity":2048,
    "interleave_granularity":4096,
"interleave_granularity":2048,

Fixes: 27b3f8d138 ("cxl/region: Program target lists")
Cc: <stable@vger.kernel.org>
Signed-off-by: Jim Harris <jim.harris@samsung.com>
Link: https://lore.kernel.org/r/169824893473.1403938.16110924262989774582.stgit@bgt-140510-bm03.eng.stellus.in
[djbw: fixup the prebuilt cxl_test region]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-27 13:04:52 -07:00
Li Zhijian
3531b27f1f cxl/region: Fix cxl_region_rwsem lock held when returning to user space
Fix a missed "goto out" to unlock on error to cleanup this splat:

    WARNING: lock held when returning to user space!
    6.6.0-rc3-lizhijian+ #213 Not tainted
    ------------------------------------------------
    cxl/673 is leaving the kernel with locks still held!
    1 lock held by cxl/673:
     #0: ffffffffa013b9d0 (cxl_region_rwsem){++++}-{3:3}, at: commit_store+0x7d/0x3e0 [cxl_core]

In terms of the user-visible impact of this bug for backports:

cxl_region_invalidate_memregion() on x86 invokes wbinvd which is a
problematic instruction for virtualized environments. So, on virtualized
x86, cxl_region_invalidate_memregion() returns an error. This failure
case got missed because CXL memory-expander device passthrough is not a
production use case, and emulation of CXL devices is typically limited
to kernel development builds with CONFIG_CXL_REGION_INVALIDATION_TEST=y,
which makes cxl_region_invalidate_memregion() succeed.

In other words, the expected exposure of this bug is limited to CXL
subsystem development environments using QEMU that neglected
CONFIG_CXL_REGION_INVALIDATION_TEST=y.
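
A sketch of the error-path shape (illustrative; the commit/decommit
logic is elided):

static ssize_t commit_store(struct device *dev, struct device_attribute *attr,
                            const char *buf, size_t len)
{
    struct cxl_region *cxlr = to_cxl_region(dev);
    ssize_t rc;

    rc = down_write_killable(&cxl_region_rwsem);
    if (rc)
        return rc;

    rc = cxl_region_invalidate_memregion(cxlr);
    if (rc)
        goto out;   /* previously returned here with the rwsem held */

    /* ... commit or decommit the region ... */
out:
    up_write(&cxl_region_rwsem);
    if (rc)
        return rc;
    return len;
}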

Fixes: d1257d098a ("cxl/region: Move cache invalidation before region teardown, and before setup")
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20231025085450.2514906-1-lizhijian@fujitsu.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-27 13:04:51 -07:00
Alison Schofield
0cf36a85c1 cxl/region: Use cxl_calc_interleave_pos() for auto-discovery
For auto-discovered regions the driver must assign each target to
a valid position in the region interleave set based on the decoder
topology.

The current implementation fails to parse valid decode topologies,
as it does not consider the child offset into a parent port. The sort
put all targets of one port ahead of another port when an interleave
was expected, causing the region assembly to fail.

Replace the existing relative sort with cxl_calc_interleave_pos() that
finds the exact position in a region interleave for an endpoint based
on a walk up the ancestral tree from endpoint to root decoder.

cxl_calc_interleave_pos() was introduced in a prior patch, so the work
here is to use it in cxl_region_sort_targets().

Remove the obsoleted helper functions from the prior sort.

Testing passes on pre-production hardware with BIOS defined regions
that natively trigger this autodiscovery path of the region driver.
Testing passes a CXL unit test using the dev_dbg() calculation test
(see cxl_region_attach()) across an expanded set of region configs:
1, 1, 1+1, 1+1+1, 2, 2+2, 2+2+2, 2+2+2+2, 4, 4+4, where each number
represents the count of endpoints per host bridge.

Fixes: a32320b71f ("cxl/region: Add region autodiscovery")
Reported-by: Dmytro Adamenko <dmytro.adamenko@intel.com>
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jim Harris <jim.harris@samsung.com>
Link: https://lore.kernel.org/r/3946cc55ddc19678733eddc9de2c317749f43f3b.1698263080.git.alison.schofield@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-27 13:04:51 -07:00
Alison Schofield
a3e00c964f cxl/region: Calculate a target position in a region interleave
Introduce a calculation to find a target's position in a region
interleave. Perform a self-test of the calculation on user-defined
regions.

The region driver uses the kernel sort() function to put region
targets in relative order. Positions are assigned based on each
target's index in that sorted list. That relative sort doesn't
consider the offset of a port into its parent port which causes
some auto-discovered regions to fail creation. In one failure case,
a 2 + 2 config (2 host bridges each with 2 endpoints), the sort
puts all the targets of one port ahead of another port when they
were expected to be interleaved.

In preparation for repairing the autodiscovery region assembly,
introduce a new method for discovering a target position in the
region interleave.

cxl_calc_interleave_pos() adds a method to find the target position by
ascending from an endpoint to a root decoder. The calculation starts
with the endpoint's local position and position in the parent port. It
traverses towards the root decoder and examines both position and ways
in order to allow the position to be refined all the way to the root
decoder.

This calculation: position = position * parent_ways + parent_pos;
applied iteratively yields the correct position.

Include a self-test that exercises this new position calculation against
every successfully configured user-defined region.
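
The arithmetic, as a standalone sketch (the real helper walks struct
cxl_port/decoder objects from the endpoint toward the root decoder;
per-level arrays stand in for that walk here, ordered from the
endpoint's parent port up to the root):

#include <stdio.h>

/* position = position * parent_ways + parent_pos, applied per level */
static int calc_interleave_pos(const int *pos, const int *ways, int levels)
{
    int p = 0;

    for (int i = 0; i < levels; i++)
        p = p * ways[i] + pos[i];

    return p;
}

int main(void)
{
    /* 2 host bridges x 2 endpoints: this endpoint sits at position 1 in
     * its host bridge, and that host bridge is at position 1 in the
     * root decoder
     */
    int pos[]  = { 1, 1 };
    int ways[] = { 2, 2 };

    printf("region position: %d\n", calc_interleave_pos(pos, ways, 2)); /* 3 */
    return 0;
}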

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Link: https://lore.kernel.org/r/0ac32c75cf81dd8b86bf07d70ff139d33c2300bc.1698263080.git.alison.schofield@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-27 13:04:48 -07:00
Alison Schofield
1110581412 cxl/region: Prepare the decoder match range helper for reuse
match_decoder_by_range() and decoder_match_range() both determine
if an HPA range matches a decoder. The first does it for root
decoders and the second one operates on switch decoders.

Tidy these up with clear naming and make the switch helper more
like the root decoder helper in style and functionality. Make it
take the actual range, rather than an endpoint decoder from which
it extracts the range. Require an exact match on switch decoders,
because unlike a root decoder that maps an entire region, Linux
only supports 1:1 mapping of switch to endpoint decoders. Note that
root-decoders are a super-set of switch-decoders and the range they
cover is a super-set of a region, hence the use of range_contains() for
that case.

Aside from aesthetics and maintainability, this is in preparation
for reuse.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Jim Harris <jim.harris@samsung.com>
Link: https://lore.kernel.org/r/011b1f498e1758bb8df17c5951be00bd8d489e3b.1698263080.git.alison.schofield@intel.com
[djbw: fixup root decoder vs switch decoder range checks]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-26 08:46:54 -07:00
Alison Schofield
9214c9d56c cxl/mbox: Remove useless cast in cxl_mem_create_range_info()
DEFINE_RES_MEM() is a wrapper around the DEFINE_RES_NAMED() macro
which already has the (struct resource) for the compound literal.
The user of the macro should not repeat the cast.

Cleans up these sparse warnings:
drivers/cxl/core/mbox.c:1184:18: warning: cast to non-scalar
drivers/cxl/core/mbox.c:1184:18: warning: cast from non-scalar
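
A sketch of the change (member names illustrative):

/* before: DEFINE_RES_MEM() already expands to a (struct resource)
 * compound literal, so the explicit cast is redundant and trips sparse
 */
mds->dpa_res = (struct resource)DEFINE_RES_MEM(0, mds->total_bytes);

/* after */
mds->dpa_res = DEFINE_RES_MEM(0, mds->total_bytes);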

Fixes: 52c4d11f1d ("resource: Convert DEFINE_RES_NAMED() to be compound literal")
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20230815172052.22514-1-alison.schofield@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-24 16:04:10 -07:00
Jim Harris
0718588c7a cxl/region: Do not try to cleanup after cxl_region_setup_targets() fails
Commit 5e42bcbc3f ("cxl/region: decrement ->nr_targets on error in
cxl_region_attach()") tried to avoid 'eiw' initialization errors when
->nr_targets exceeded 16, by just decrementing ->nr_targets when
cxl_region_setup_targets() failed.

Commit 86987c7662 ("cxl/region: Cleanup target list on attach error")
extended that cleanup to also clear cxled->pos and p->targets[pos]. The
initialization error was incidentally fixed separately by:
Commit 8d42854257 ("cxl/region: Fix port setup uninitialized variable
warnings") which was merged a few days after 5e42bcbc3f.

But now the original cleanup when cxl_region_setup_targets() fails
prevents endpoint and switch decoder resources from being reused:

1) the cleanup does not set the decoder's region to NULL, which results
   in future dpa_size_store() calls returning -EBUSY
2) the decoder is not properly freed, which results in future commit
   errors associated with the upstream switch

Now that the initialization errors were fixed separately, the proper
cleanup for this case is to just return immediately. Then the resources
associated with this target get cleaned up as normal when the failed
region is deleted.

The ->nr_targets decrement in the error case also helped prevent
a p->targets[] array overflow, so add a new check to prevent against
that overflow.

Tested by trying to create an invalid region for a 2 switch * 2 endpoint
topology, and then following up with creating a valid region.

Fixes: 5e42bcbc3f ("cxl/region: decrement ->nr_targets on error in cxl_region_attach()")
Cc: <stable@vger.kernel.org>
Signed-off-by: Jim Harris <jim.harris@samsung.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/169703589120.1202031.14696100866518083806.stgit@bgt-140510-bm03.eng.stellus.in
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-24 14:43:37 -07:00
Dan Williams
88d3917f82 cxl/mem: Fix shutdown order
Ira reports that removing cxl_mock_mem causes a crash with the following
trace:

 BUG: kernel NULL pointer dereference, address: 0000000000000044
 [..]
 RIP: 0010:cxl_region_decode_reset+0x7f/0x180 [cxl_core]
 [..]
 Call Trace:
  <TASK>
  cxl_region_detach+0xe8/0x210 [cxl_core]
  cxl_decoder_kill_region+0x27/0x40 [cxl_core]
  cxld_unregister+0x29/0x40 [cxl_core]
  devres_release_all+0xb8/0x110
  device_unbind_cleanup+0xe/0x70
  device_release_driver_internal+0x1d2/0x210
  bus_remove_device+0xd7/0x150
  device_del+0x155/0x3e0
  device_unregister+0x13/0x60
  devm_release_action+0x4d/0x90
  ? __pfx_unregister_port+0x10/0x10 [cxl_core]
  delete_endpoint+0x121/0x130 [cxl_core]
  devres_release_all+0xb8/0x110
  device_unbind_cleanup+0xe/0x70
  device_release_driver_internal+0x1d2/0x210
  bus_remove_device+0xd7/0x150
  device_del+0x155/0x3e0
  ? lock_release+0x142/0x290
  cdev_device_del+0x15/0x50
  cxl_memdev_unregister+0x54/0x70 [cxl_core]

This crash is due to clearing out the cxl_memdev's driver context
(@cxlds) before the subsystem is done with it. This is ultimately due to
the region(s) that this memdev is a member of being torn down while still
expecting to be able to de-reference @cxlds, like here:

static int cxl_region_decode_reset(struct cxl_region *cxlr, int count)
...
                if (cxlds->rcd)
                        goto endpoint_reset;
...

Fix it by keeping the driver context valid until memdev-device
unregistration, and subsequently the entire stack of related
dependencies, unwinds.

Fixes: 9cc238c7a5 ("cxl/pci: Introduce cdevm_file_operations")
Reported-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Tested-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-09 11:35:45 -07:00
Dan Williams
3398183808 cxl/memdev: Fix sanitize vs decoder setup locking
The sanitize operation is destructive and the expectation is that the
device is unmapped while in progress. The current implementation does a
lockless check for decoders being active, but then does nothing to
prevent decoders from racing to be committed. Introduce state tracking
to resolve this race.

This incidentally stops unprivileged userspace from triggering MMIO
read cycles by spinning on reads of the 'security/state' attribute,
which at a minimum is a waste since the kernel state machine can cache
the completion result.

Lastly, cxl_mem_sanitize() was mistakenly marked EXPORT_SYMBOL() in the
original implementation, but an export was never required.

Fixes: 0c36b6ad43 ("cxl/mbox: Add sanitization handling machinery")
Cc: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-06 00:12:45 -07:00
Dan Williams
5f2da19714 cxl/pci: Fix sanitize notifier setup
Fix a race condition between the mailbox-background command interrupt
firing and the security-state sysfs attribute being removed.

The race is difficult to see due to the awkward placement of the
sanitize-notifier setup code and the multiple places the teardown calls
are made, cxl_memdev_security_init() and cxl_memdev_security_shutdown().

Unify setup in one place, cxl_sanitize_setup_notifier(). Arrange for
the paired cxl_sanitize_teardown_notifier() to safely quiet the notifier
and let the cxl_memdev + irq be unregistered later in the flow.

Note: The special wrinkle of the sanitize notifier is that it interacts
with interrupts, which are enabled early in the flow, and it interacts
with memdev sysfs which is not initialized until late in the flow. Hence
why this setup routine takes an @cxlmd argument, and not just @mds.

This fix is also needed as a preparation fix for a memdev unregistration
crash.

Reported-by: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
Closes: http://lore.kernel.org/r/20230929100316.00004546@Huawei.com
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Fixes: 0c36b6ad43 ("cxl/mbox: Add sanitization handling machinery")
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-06 00:12:45 -07:00
Dan Williams
f29a824b0b cxl/pci: Clarify devm host for memdev relative setup
It is all too easy to get confused about @dev usage in the CXL driver
stack. Before adding a new cxl_pci_probe() setup operation that has a
devm lifetime dependent on @cxlds->dev binding, but also references
@cxlmd->dev, and prints messages, rework the devm_cxl_add_memdev() and
cxl_memdev_setup_fw_upload() function signatures to make this
distinction explicit. I.e. pass in the devm context as an @host argument
rather than infer it from other objects.

This is in preparation for adding a devm_cxl_sanitize_setup_notifier().

Note the whitespace fixup near the change of the devm_cxl_add_memdev()
signature. That uncaught typo originated in the patch that added
cxl_memdev_security_init().

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-06 00:12:44 -07:00
Dan Williams
2627c995c1 cxl/pci: Remove inconsistent usage of dev_err_probe()
If dev_err_probe() is to be used it should at least be used consistently
within the same function. It is also worth questioning whether
every potential -ENOMEM needs an explicit error message.

Remove the cxl_setup_fw_upload() error prints for what are rare /
hardware-independent failures.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-06 00:12:44 -07:00
Dan Williams
08b8a8c054 cxl/pci: Remove hardirq handler for cxl_request_irq()
Now that all callers of cxl_request_irq() are using threaded irqs, drop
the hardirq handler option.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-06 00:12:44 -07:00
Dan Williams
e30a106558 cxl/pci: Cleanup 'sanitize' to always poll
In preparation for fixing the init/teardown of the 'sanitize' workqueue
and sysfs notification mechanism, arrange for cxl_mbox_sanitize_work()
to be the single location where the sysfs attribute is notified. With
that change there is no distinction between polled mode and interrupt
mode. All the interrupt does is accelerate the polling interval.

The change to check for "mds->security.sanitize_node" under the lock is
there to ensure that the interrupt, the work routine and the
setup/teardown code can all have a consistent view of the registered
notifier and the workqueue state. I.e. the expectation is that the
interrupt is live past the point that the sanitize sysfs attribute is
published, and it may race teardown, so it must be consulted under a
lock. Given that new locking requirement, cxl_pci_mbox_irq() is moved
from hard to thread irq context.

Lastly, some opportunistic replacements of
"queue_delayed_work(system_wq, ...)", which is just open coded
schedule_delayed_work(), are included.

Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-09-29 15:43:34 -07:00
Dan Williams
76fe8713dd cxl/pci: Remove unnecessary device reference management in sanitize work
Given that any particular put_device() could be the final put of the
device, the fact that there are usages of cxlds->dev after
put_device(cxlds->dev) is a red flag. Drop the reference counting since
the device is pinned by being registered and will not be unregistered
without triggering the driver + workqueue to shutdown.

Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-09-29 14:44:46 -07:00
Kees Cook
c66650d297 cxl/acpi: Annotate struct cxl_cxims_data with __counted_by
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run time via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).

As found with Coccinelle[1], add __counted_by for struct cxl_cxims_data.
Additionally, since the element count member must be set before accessing
the annotated flexible array member, move its initialization earlier.

[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci
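
The shape of the annotation, as a sketch (member names assumed from the
subject struct):

struct cxl_cxims_data {
    int nr_maps;
    u64 xormaps[] __counted_by(nr_maps);
};

/*
 * Because the bounds check reads nr_maps, it must be assigned before
 * any access to xormaps[], hence moving the count initialization
 * earlier in the patch.
 */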

Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Alison Schofield <alison.schofield@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-cxl@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20230922175319.work.096-kees@kernel.org
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-09-22 14:31:04 -07:00
Dan Williams
a76b62518e cxl/port: Fix cxl_test register enumeration regression
The cxl_test unit test environment models a CXL topology for
sysfs/user-ABI regression testing. It uses interface mocking via the
"--wrap=" linker option to redirect cxl_core routines that parse
hardware registers with versions that just publish objects, like
devm_cxl_enumerate_decoders().

Starting with:

Commit 19ab69a60e ("cxl/port: Store the port's Component Register mappings in struct cxl_port")

...port register enumeration is moved into devm_cxl_add_port(). This
conflicts with the "cxl_test avoids emulating registers" stance, so
either the port code needs to be refactored (too violent), or modified
so that register enumeration is skipped on "fake" cxl_test ports
(annoying, but straightforward).

This conflict has happened previously and the "check for platform
device" workaround to avoid instrusive refactoring was deployed in those
scenarios. In general, refactoring should only benefit production code,
test code needs to remain minimally instrusive to the greatest extent
possible.

This was missed previously because it may sometimes just cause warning
messages to be emitted, but it can also cause test failures. The
backport to -stable is only nice to have for clean cxl_test runs.

Fixes: 19ab69a60e ("cxl/port: Store the port's Component Register mappings in struct cxl_port")
Cc: stable@vger.kernel.org
Reported-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Tested-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/169476525052.1013896.6235102957693675187.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-09-22 14:28:42 -07:00
Ira Weiny
1b27978d69 cxl/pci: Update comment
The existence of struct cxl_dev_id containing a single member is odd.
The comment made sense when I wrote it but could be clarified.

Update the comment and place it next to the odd looking structure.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20230426-cxl-fixes-v1-2-870c4c8b463a@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-09-15 22:02:04 -07:00
Dan Williams
7914992b37 cxl/port: Quiet warning messages from the cxl_test environment
The cxl_test platform device CXL port hierarchy is useful for testing,
but throws warning messages of the form:

    cxl_mem mem2: at cxl_root_port.1 no parent for dport: platform
    cxl_mem mem3: at cxl_root_port.2 no parent for dport: platform
    cxl_mem mem4: at cxl_root_port.3 no parent for dport: platform
    cxl_mem mem5: at cxl_root_port.0 no parent for dport: platform
    cxl_mem mem6: at cxl_root_port.1 no parent for dport: platform
    cxl_mem mem7: at cxl_root_port.2 no parent for dport: platform
    cxl_mem mem8: at cxl_root_port.3 no parent for dport: platform
    cxl_mem mem9: at cxl_root_port.4 no parent for dport: platform
    cxl_mem mem10: at cxl_root_port.4 no parent for dport: platform

...and this message when running testing in QEMU:

    cxl_region region4: Bypassing cpu_cache_invalidate_memregion() for testing!

Noisy cxl_test warnings have caused other regressions to be missed. In
the interest of using cxl_test for early detection of dev_err() and
dev_warn() messages, silence platform device topology and
cache-invalidation messages.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-09-15 21:45:55 -07:00
Alison Schofield
18f35dc931 cxl/region: Refactor granularity select in cxl_port_setup_targets()
In cxl_port_setup_targets() the region driver validates the
configuration of auto-discovered region decoders, as well
as decoders the driver is preparing to program.

The existing calculations use the encoded interleave granularity
value to create an interleave granularity that properly fans out
when routing an x1 interleave to a greater than x1 interleave.

That all worked well, until this config came along:
Host Bridge: 2 way at 256 granularity
    Switch Decoder_A:	1 way at 512
        Endpoint_X:	2 way at 256
    Switch Decoder_B:	1 way at 512
        Endpoint_Y:	2 way at 256

When the Host Bridge interleave is greater than 1 and the root
decoder interleave is exactly 1, the region driver needs to
consider the number of targets in the region when calculating
the expected granularity.

While examining the existing logic, and trying to cover the case
above, a couple of simplifications appeared, hence this proposed
refactoring.

The first simplification is to apply the logic to the nominal
values and use the existing helper function granularity_to_eig() to
translate the desired granularity to the encoded form. This means
the comment and code regarding setting address bits is discarded.
Although that logic is not wrong, it adds a level of complexity that
is not required in the granularity selection. The eig and eiw are
indeed part of the routing instructions programmed into the decoders.
Up-level the discussion to nominal ways and granularity for clearer
analysis.

The second simplification reduces the logic to a single granularity
calculation that works for all cases. The new calculation doesn't
care whether parent_iw is 1 or greater, because parent_iw is used as a multiplier.

The refactor cleans up a useless assignment of eiw made after the
iw is already calculated.

Regression testing included an examination of all of the ways and
granularity selections made during a run of the cxl_test unit tests.
There were no differences in selections before and after this patch.

Fixes: ("27b3f8d13830 cxl/region: Program target lists")
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20230822180928.117596-1-alison.schofield@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-09-14 20:47:42 -07:00
Alison Schofield
9e4edf1a21 cxl/region: Match auto-discovered region decoders by HPA range
Currently, when the region driver attaches a region to a port, it
selects the port's next available decoder to program.

With the addition of auto-discovered regions, a port decoder has
already been programmed so grabbing the next available decoder can
be a mismatch when there is more than one region using the port.

The failure appears like this with CXL DEBUG enabled:

[] cxl_core:alloc_region_ref:754: cxl region0: endpoint9: HPA order violation region0:[mem 0x14780000000-0x1478fffffff flags 0x200] vs [mem 0x880000000-0x185fffffff flags 0x200]
[] cxl_core:cxl_port_attach_region:972: cxl region0: endpoint9: failed to allocate region reference

When CXL DEBUG is not enabled, there is no failure message. The region
just never materializes. Users can suspect this issue if they know their
firmware has programmed decoders so that more than one region is using
a port. Note that the problem may appear intermittently, i.e. not on
every reboot.

Add a matching method for auto-discovered regions that finds a decoder
based on an HPA range. The decoder range must exactly match the region
resource parameter.
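Conceptually the match is an exact range comparison; a minimal sketch (the
helper name is illustrative, not the exact driver symbol):

    /* pick the already-programmed decoder whose HPA range equals the region's resource */
    static bool decoder_matches_region(struct cxl_decoder *cxld, struct resource *res)
    {
            return cxld->hpa_range.start == res->start &&
                   cxld->hpa_range.end == res->end;
    }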

Fixes: a32320b71f ("cxl/region: Add region autodiscovery")
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20230905211007.256385-1-alison.schofield@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-09-14 20:46:12 -07:00
Ira Weiny
d2f7060588 cxl/mbox: Fix CEL logic for poison and security commands
The following debug output was observed while testing CXL

cxl_core:cxl_walk_cel:721: cxl_mock_mem cxl_mem.0: Opcode 0x4300 unsupported by driver

opcode 0x4300 (Get Poison) is supported by the driver and the mock
device supports it.  The logic should be checking that the opcode is
both not poison and not security.

Fix the logic to allow poison and security commands.
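A sketch of the corrected condition in cxl_walk_cel() (illustrative):

    struct cxl_mem_command *cmd = cxl_mem_find_command(opcode);

    /* only skip an opcode that is unknown to the driver *and* is neither a
     * poison nor a security command */
    if (!cmd && !cxl_is_poison_command(opcode) && !cxl_is_security_command(opcode)) {
            dev_dbg(dev, "Opcode 0x%04x unsupported by driver\n", opcode);
            continue;
    }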

Fixes: ad64f5952c ("cxl/memdev: Only show sanitize sysfs files when supported")
Cc: <stable@vger.kernel.org>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20230903-cxl-cel-fix-v1-1-e260c9467be3@intel.com
[cleanup cxl_walk_cel() to centralized "enabled" checks]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-09-14 13:48:49 -07:00
Smita Koralahalli
55b8ff06a0 cxl/pci: Replace host_bridge->native_aer with pcie_aer_is_native()
Use pcie_aer_is_native() to determine the native AER ownership as the
usage of host_bride->native_aer does not cover command line override of
AER ownership.

Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Robert Richter <rrichter@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20230823234305.27333-4-Smita.KoralahalliChannabasappa@amd.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-09-11 15:24:30 -07:00
Smita Koralahalli
0339dc39a5 cxl/pci: Fix appropriate checking for _OSC while handling CXL RAS registers
cxl_pci fails to unmask CXL protocol errors when CXL memory error reporting
is not granted native control. Given that CXL memory error reporting uses
the event interface and protocol errors use AER, unmask protocol errors
based only on the native AER setting. Without this change end user
deployments will fail to report protocol errors in the case where native
memory error handling is not granted to Linux.

Also, return zero instead of an error code to not block the communication
with the cxl device when in native memory error reporting mode.
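A sketch of the resulting gate in the RAS unmask path (illustrative; the
sibling patch in this series further replaces the host bridge flag with
pcie_aer_is_native()):

    /* unmask protocol errors only when Linux owns AER for this device; otherwise
     * return 0 (not an error) so communication with the device is not blocked */
    if (!host_bridge->native_aer)
            return 0;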

Fixes: 248529edc8 ("cxl: add RAS status unmasking for CXL")
Cc: <stable@vger.kernel.org>
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Reviewed-by: Robert Richter <rrichter@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20230823234305.27333-2-Smita.KoralahalliChannabasappa@amd.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-09-11 15:23:20 -07:00
Davidlohr Bueso
ad64f5952c cxl/memdev: Only show sanitize sysfs files when supported
If the device does not support Sanitize or Secure Erase commands,
hide the respective sysfs interfaces such that the operation can
never be attempted.

In order to be generic, keep track of the enabled security commands
found in the CEL - the driver does not support Security Passthrough.

Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Link: https://lore.kernel.org/r/20230726051940.3570-4-dave@stgolabs.net
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
2023-07-28 13:16:54 -06:00
Davidlohr Bueso
3de8cd2242 cxl/memdev: Document security state in kern-doc
... as is the case with all members of struct cxl_memdev_state.

Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Link: https://lore.kernel.org/r/20230726051940.3570-3-dave@stgolabs.net
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
2023-07-28 13:16:54 -06:00
Breno Leitao
91019b5bc7 cxl/acpi: Return 'rc' instead of '0' in cxl_parse_cfmws()
Driver initialization returned success (return 0) even if the
initialization (cxl_decoder_add() or acpi_table_parse_cedt()) failed.

Return the error instead of swallowing it.
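A minimal sketch of the shape of the fix (illustrative):

    rc = cxl_decoder_add(cxld, target_map);
    if (rc) {
            dev_err(dev, "Failed to add decoder\n");
            return rc;      /* previously "return 0", which reported success */
    }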

Fixes: f4ce1f766f ("cxl/acpi: Convert CFMWS parsing to ACPI sub-table helpers")
Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://lore.kernel.org/r/20230714093146.2253438-2-leitao@debian.org
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
2023-07-18 11:55:09 -06:00
Breno Leitao
4cf67d3cc9 cxl/acpi: Fix a use-after-free in cxl_parse_cfmws()
KASAN and KFENCE detected a use-after-free in the CXL driver. This
happens in the cxl_decoder_add() fail path. KASAN prints the following
error:

   BUG: KASAN: slab-use-after-free in cxl_parse_cfmws (drivers/cxl/acpi.c:299)

This happens in cxl_parse_cfmws(), where put_device() is called,
releasing cxld, which is accessed later.

Use the local variables in the dev_err() instead of pointing to the
released memory. Since the dev_err() is printing a resource, change the open
coded print format to use the %pr format specifier.
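A sketch of the reworked message (illustrative; 'res' is the locally held
resource, so nothing freed by put_device() is dereferenced):

    /* print the saved resource rather than the released decoder's range */
    dev_err(dev, "Failed to add decode range: %pr", res);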

Fixes: e50fe01e1f ("cxl/core: Drop ->platform_res attribute for root decoders")
Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://lore.kernel.org/r/20230714093146.2253438-1-leitao@debian.org
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
2023-07-18 11:55:02 -06:00
Dan Carpenter
95c6bff72b cxl/mem: Fix a double shift bug
The CXL_FW_CANCEL macro is used with set/test_bit() so it should be a
bit number and not the shifted value.  The original code is the
equivalent of using BIT(BIT(0)) so it's 0x2 instead of 0x1.  This has
no effect on runtime because it's done consistently and nothing else
was using the 0x2 bit.
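In terms of the define, the before/after is roughly (illustrative):

    /* before (buggy): defined as BIT(0); set_bit()/test_bit() take a bit
     * *number*, so this effectively operated on bit 1 (0x2) */
    /* after (fixed): a plain bit number */
    #define CXL_FW_CANCEL   0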

Fixes: 9521875bbe ("cxl: add a firmware update mechanism using the sysfs firmware loader")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://lore.kernel.org/r/a11b0c78-4717-4f4e-90be-f47f300d607c@moroto.mountain
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
2023-07-14 14:33:21 -06:00
Arnd Bergmann
9171dfcda4 cxl: fix CONFIG_FW_LOADER dependency
When FW_LOADER is disabled, cxl fails to link:

arm-linux-gnueabi-ld: drivers/cxl/core/memdev.o: in function `cxl_memdev_setup_fw_upload':
memdev.c:(.text+0x90e): undefined reference to `firmware_upload_register'
memdev.c:(.text+0x93c): undefined reference to `firmware_upload_unregister'

In order to use the firmware_upload_register() function, both FW_LOADER
and FW_UPLOAD have to be enabled, which is a bit confusing. In addition,
the dependency is on the wrong symbol, as the caller is part of the
cxl_core.ko module, not the cxl_mem.ko module.

Fixes: 9521875bbe ("cxl: add a firmware update mechanism using the sysfs firmware loader")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20230703112928.332321-1-arnd@kernel.org
Reviewed-by: Xiao Yang <yangx.jy@fujitsu.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
2023-07-14 14:32:22 -06:00
Yang Li
fe77cc2e5a cxl: Fix one kernel-doc comment
Fix a merge error that updated the argument to cxl_mem_get_fw_info() but
not the kernel-doc.

drivers/cxl/core/memdev.c:678: warning: Function parameter or member
'mds' not described in 'cxl_mem_get_fw_info'
drivers/cxl/core/memdev.c:678: warning: Excess function parameter
'cxlds' description in 'cxl_mem_get_fw_info'

Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230629021118.102744-1-yang.lee@linux.alibaba.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-06-29 16:03:58 -07:00
Davidlohr Bueso
71baec7b85 cxl/pci: Use correct flag for sanitize polling
This is a bogus value, left behind from a previous version.

Fixes: 0c36b6ad43 ("cxl/mbox: Add sanitization handling machinery")
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Link: https://lore.kernel.org/r/7q3vcjqidtmxmys4n34g6b3mygvhaen7yikzxanpz56lw43fz7@7subbtbfkmyx
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-06-27 15:51:40 -07:00
Dan Williams
0c0df63177 Merge branch 'for-6.5/cxl-rch-eh' into for-6.5/cxl
Pick up the first half of the RCH error handling series. The back half
needs some fixups for test regressions. Small conflicts with the PMU
work around register enumeration and setup helpers.
2023-06-25 18:56:13 -07:00
Dan Williams
d2f9fe6953 Merge branch 'for-6.5/cxl-perf' into for-6.5/cxl
Pick up initial support for the CXL 3.0 performance monitoring
definition. Small conflicts with the firmware update work as they both
placed their init code in the same location.
2023-06-25 17:53:18 -07:00
Jonathan Cameron
5d7107c727 perf: CXL Performance Monitoring Unit driver
CXL rev 3.0 introduces a standard performance monitoring hardware
block to CXL. Instances are discovered using CXL Register Locator DVSEC
entries. Each CXL component may have multiple PMUs.

This initial driver supports a subset of counter types. It supports
counters that are either fixed or configurable, but requires that they
support the ability to freeze and to write a value whilst frozen.

Development was done with a QEMU model, which will be posted shortly.

Example:

$ perf stat -a -e cxl_pmu_mem0.0/h2d_req_snpcur/ -e cxl_pmu_mem0.0/h2d_req_snpdata/ -e cxl_pmu_mem0.0/clock_ticks/ sleep 1

Performance counter stats for 'system wide':

96,757,023,244,321      cxl_pmu_mem0.0/h2d_req_snpcur/
96,757,023,244,365      cxl_pmu_mem0.0/h2d_req_snpdata/
193,514,046,488,653      cxl_pmu_mem0.0/clock_ticks/

       1.090539600 seconds time elapsed

Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20230526095824.16336-5-Jonathan.Cameron@huawei.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-06-25 17:47:09 -07:00
Dan Williams
e2c18eb50c Merge branch 'for-6.5/cxl-region-fixes' into for-6.5/cxl
Pick up the recent fixes to how CPU caches are managed relative to
region setup / teardown, and make sure that all decoders transition
successfully before updating the region state from COMMIT => ACTIVE.
2023-06-25 17:23:50 -07:00
Dan Williams
aeaefabc59 Merge branch 'for-6.5/cxl-type-2' into for-6.5/cxl
Pick up the driver cleanups identified in preparation for CXL "type-2"
(accelerator) device support. The major change here from a conflict
generation perspective is the split of 'struct cxl_memdev_state' from
the core 'struct cxl_dev_state', since an accelerator may not care about
all the optional features that are standard on a CXL "type-3" (host-only
memory expander) device.

A silent conflict also occurs with the move of the endpoint port to be a
formal property of a 'struct cxl_memdev' rather than drvdata.
2023-06-25 17:16:51 -07:00
Dan Williams
867eab655d Merge branch 'for-6.5/cxl-fwupd' into for-6.5/cxl
Add the first typical (non-sanitization) consumer of the new background
command infrastructure, firmware update. Given both firmware-update and
sanitization were developed in parallel from the common
background-command baseline, resolve some minor context conflicts.
2023-06-25 16:12:26 -07:00
Dan Williams
dcfb70610d Merge branch 'for-6.5/cxl-background' into for-6.5/cxl
Pick up the sanitization work and the infrastructure for other
background commands for 6.5. Sanitization has a different completion
path than typical background commands so it was important to have both
thought out and implemented before either went upstream.
2023-06-25 16:01:45 -07:00
Vishal Verma
9521875bbe cxl: add a firmware update mechanism using the sysfs firmware loader
The sysfs based firmware loader mechanism was created to easily allow
userspace to upload firmware images to FPGA cards. This also happens to
be pretty suitable to create a user-initiated but kernel-controlled
firmware update mechanism for CXL devices, using the CXL specified
mailbox commands.

Since firmware update commands can be long-running, and can be processed
in the background by the endpoint device, it is desirable to have the
ability to chunk the firmware transfer down to smaller pieces, so that
one operation does not monopolize the mailbox, locking out any other
long running background commands entirely - e.g. security commands like
'sanitize' or poison scanning operations.

The firmware loader mechanism allows a natural way to perform this
chunking, as after each mailbox command, that is restricted to the
maximum mailbox payload size, the cxl memdev driver relinquishes control
back to the fw_loader system and awaits the next chunk of data to
transfer. This opens opportunities for other background commands to
access the mailbox and send their own slices of background commands.

Add the necessary helpers and state tracking to be able to perform the
'Get FW Info', 'Transfer FW', and 'Activate FW' mailbox commands as
described in the CXL spec. Wire these up to the firmware loader
callbacks, and register with that system to create the memX/firmware/
sysfs ABI.

Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
Cc: Russ Weight <russell.h.weight@intel.com>
Cc: Alison Schofield <alison.schofield@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Ben Widawsky <bwidawsk@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Link: https://lore.kernel.org/r/20230602-vv-fw_update-v4-1-c6265bd7343b@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-06-25 15:58:40 -07:00
Davidlohr Bueso
180ffd338c cxl/mem: Support Secure Erase
Implement support for the non-pmem exclusive secure erase, per
CXL specs. Create a write-only 'security/erase' sysfs file to
perform the requested operation.

As with sanitization, this requires the device to be offline
and thus no active HPA-DPA decoding.

The expectation is that userspace can use it such as:

	cxl disable-memdev memX
	echo 1 > /sys/bus/cxl/devices/memX/security/erase
	cxl enable-memdev memX

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Link: https://lore.kernel.org/r/20230612181038.14421-7-dave@stgolabs.net
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-06-25 15:34:41 -07:00
Davidlohr Bueso
48dcdbb16e cxl/mem: Wire up Sanitization support
Implement support for CXL 3.0 8.2.9.8.5.1 Sanitize. This is done by
adding a 'security/sanitize' memdev sysfs file to trigger the operation
and extending the status file to make it poll(2)-capable for completion.
Unlike other background commands, this operation is special in that it
monopolizes the device for long periods of time.

In addition to the traditional pmem security requirements, all regions
must also be offline in order to perform the operation. This permits
avoiding explicit global CPU cache management, relying instead on the
implicit cache management when a region transitions between
CXL_CONFIG_ACTIVE and CXL_CONFIG_COMMIT.

The expectation is that userspace can use it such as:

    cxl disable-memdev memX
    echo 1 > /sys/bus/cxl/devices/memX/security/sanitize
    cxl wait-sanitize memX
    cxl enable-memdev memX

Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Link: https://lore.kernel.org/r/20230612181038.14421-5-dave@stgolabs.net
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-06-25 15:21:16 -07:00
Davidlohr Bueso
0c36b6ad43 cxl/mbox: Add sanitization handling machinery
Sanitization is by definition a device-monopolizing operation, and thus
the timeslicing rules for other background commands do not apply.
As such handle this special case asynchronously and return immediately.
Subsequent changes will allow completion to be pollable from userspace
via a sysfs file interface.

For devices that don't support interrupts for notifying background
command completion, self-poll with the caveat that the poller can
be out of sync with the ready hardware, and therefore care must be
taken to not allow any new commands to go through until the poller
sees the hw completion. The poller takes the mbox_mutex to stabilize
the flagging, minimizing any runtime overhead in the send path to
check for 'sanitize_tmo' for uncommon poll scenarios.

The irq case is much simpler as hardware will serialize/error
appropriately.

Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20230612181038.14421-4-dave@stgolabs.net
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-06-25 14:54:51 -07:00
Davidlohr Bueso
9968c9dd56 cxl/mem: Introduce security state sysfs file
Add a read-only sysfs file to display the security state
of a device (currently only pmem):

    /sys/bus/cxl/devices/memX/security/state

This introduces a cxl_security_state structure that is
to be the placeholder for common CXL security features.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20230612181038.14421-3-dave@stgolabs.net
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-06-25 14:54:51 -07:00
Davidlohr Bueso
8ea9c33d48 cxl/mbox: Allow for IRQ_NONE case in the isr
For cases when the mailbox background operation is not complete,
do not "handle" the interrupt, as it was not from this device.
Furthermore, there are no racy scenarios such as the hw being
out of sync with the driver and starting a new background op
behind its back.
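A sketch of the resulting check at the top of the mailbox isr
(illustrative; assumes the existing background-completion helper):

    /* the background operation has not completed, so this interrupt was not
     * from this device: do not claim it */
    if (!cxl_mbox_background_complete(cxlds))
            return IRQ_NONE;

    /* ... wake the blocked command ... */
    return IRQ_HANDLED;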

Reported-by: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
Fixes: ccadf1310f ("cxl/mbox: Add background cmd handling machinery")
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20230612181038.14421-2-dave@stgolabs.net
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-06-25 14:54:51 -07:00
Dan Williams
8f0220af58 Revert "cxl/port: Enable the HDM decoder capability for switch ports"
commit eb0764b822 ("cxl/port: Enable the HDM decoder capability for switch ports")

...was added on the observation of CXL memory not being accessible after
setting up a region on a "cold-plugged" device. A "cold-plugged" CXL
device is one that was not present at boot, so platform-firmware/BIOS
has no chance to set it up.

While it is true that the debug found the enable bit clear in the
host-bridge's instance of the global control register (CXL 3.0
8.2.4.19.2 CXL HDM Decoder Global Control Register), that bit is
described as:

"This bit is only applicable to CXL.mem devices and shall
return 0 on CXL Host Bridges and Upstream Switch Ports."

So it is meant to be zero, and further testing confirmed that this "fix"
had no effect on the failure. Revert it, and be more vigilant about
proposed fixes in the future. Since the original copied stable@, flag
this revert for stable@ as well.

Cc: <stable@vger.kernel.org>
Fixes: eb0764b822 ("cxl/port: Enable the HDM decoder capability for switch ports")
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/168685882012.3475336.16733084892658264991.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-06-25 14:32:18 -07:00
Dan Williams
516b300c4c cxl/memdev: Formalize endpoint port linkage
Move the endpoint port that the cxl_mem driver establishes from drvdata
to a first class attribute. This is in preparation for device-memory
drivers reusing the CXL core for memory region management. Those drivers
need a type-safe method to retrieve their CXL port linkage. Leave
drvdata for private usage of the cxl_mem driver, not external consumers
of a 'struct cxl_memdev' object.

Reviewed-by: Fan Ni <fan.ni@samsung.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/168679264292.3436160.3901392135863405807.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-06-25 14:31:33 -07:00
Dan Williams
f3c8a37a43 cxl/pci: Unconditionally unmask 256B Flit errors
The current check for 256B Flit mode is incomplete and unnecessary. It
is incomplete because it fails to consider the link speed, or check for
CXL link capabilities. It is unnecessary because unconditionally
unmasking 256B Flit errors is a nop when 256B Flit operation is not
available.

Remove this check in preparation for creating a cxl_probe_link() helper
to centralize this detection.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/168679263124.3436160.6228910132469454346.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-06-25 14:31:09 -07:00
Dan Williams
8c897b366c cxl/region: Manage decoder target_type at decoder-attach time
Switch-level (mid-level) decoders between the platform root and an
endpoint can dynamically switch modes between HDM-H and HDM-D[B]
depending on which region they target. Use the region type to fixup each
decoder that gets allocated to map the given region.

Note that endpoint decoders are meant to determine the region type, so
warn if those ever need to be fixed up, but since it is possible to
continue do so.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/168679262543.3436160.13053831955768440312.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-06-25 14:31:09 -07:00
Dan Williams
cecbb5da92 cxl/hdm: Default CXL_DEVTYPE_DEVMEM decoders to CXL_DECODER_DEVMEM
In preparation for device-memory region creation, arrange for decoders
of CXL_DEVTYPE_DEVMEM memdevs to default to CXL_DECODER_DEVMEM for their
target type.

Revisit this if a device ever shows up that wants to offer mixed HDM-H
(Host-Only Memory) and HDM-DB support, or a CXL_DEVTYPE_DEVMEM device
that supports HDM-H.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/168679261945.3436160.11673393474107374595.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-06-25 14:31:08 -07:00
Dan Williams
5aa39a9165 cxl/port: Rename CXL_DECODER_{EXPANDER, ACCELERATOR} => {HOSTONLYMEM, DEVMEM}
In preparation for support for HDM-D and HDM-DB configuration
(device-memory, and device-memory with back-invalidate), rename the
current type designators to use HOSTONLYMEM and DEVMEM as a suffix.

HDM-DB can be supported by devices that are not accelerators, so DEVMEM is
a more generic term for that case.

Fixup one location where this type value was open coded.
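After the rename the designators read roughly as follows (a sketch; the
values mirror the HDM decoder target device type encoding):

    enum cxl_decoder_type {
            CXL_DECODER_DEVMEM = 2,      /* was CXL_DECODER_ACCELERATOR */
            CXL_DECODER_HOSTONLYMEM = 3, /* was CXL_DECODER_EXPANDER */
    };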

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/168679261369.3436160.7042443847605280593.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-06-25 14:31:08 -07:00
Dan Williams
f6b8ab32e3 cxl/memdev: Make mailbox functionality optional
In support of the Linux CXL core scaling for a wider set of CXL devices,
allow for the creation of memdevs with some memory device capabilities
disabled. Specifically, allow for CXL devices outside of those claiming
to be compliant with the generic CXL memory device class code, like
vendor specific Type-2/3 devices that host CXL.mem. This implies allowing
for the creation of memdevs that only support component-registers, not
necessarily memory-device-registers (like mailbox registers). A memdev
derived from a CXL endpoint that does not support generic class code
expectations is tagged "CXL_DEVTYPE_DEVMEM", while a memdev derived from a
class-code compliant endpoint is tagged "CXL_DEVTYPE_CLASSMEM".

The primary assumption of a CXL_DEVTYPE_DEVMEM memdev is that it
optionally may not host a mailbox. Disable the command passthrough ioctl
for memdevs that are not CXL_DEVTYPE_CLASSMEM, and return empty strings
from memdev attributes associated with data retrieved via the
class-device-standard IDENTIFY command. Note that empty strings were
chosen over attribute visibility to maintain compatibility with shipping
versions of cxl-cli that expect those attributes to always be present.
Once cxl-cli has dropped that requirement this workaround can be
deprecated.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/168679260782.3436160.7587293613945445365.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-06-25 14:31:08 -07:00
Dan Williams
59f8d15107 cxl/mbox: Move mailbox related driver state to its own data structure
'struct cxl_dev_state' makes too many assumptions about the capabilities
of a CXL device. In particular it assumes a CXL device has a mailbox and
all of the infrastructure and state that comes along with that.

In preparation for supporting accelerator / Type-2 devices that may not
have a mailbox and in general maintain a minimal core context structure,
make mailbox functionality a super-set of 'struct cxl_dev_state' with
'struct cxl_memdev_state'.

With this reorganization it allows for CXL devices that support HDM
decoder mapping, but not other general-expander / Type-3 capabilities,
to only enable that subset without the rest of the mailbox
infrastructure coming along for the ride.
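In outline, the split looks roughly like this (illustrative; only a few
representative members shown):

    struct cxl_memdev_state {
            struct cxl_dev_state cxlds;  /* minimal, device-generic context */
            size_t payload_size;         /* mailbox-specific state from here on */
            struct mutex mbox_mutex;
            /* ... command enables, event/poison/firmware state, ... */
    };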

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/168679260240.3436160.15520641540463704524.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-06-25 14:31:08 -07:00
Dan Williams
3fe7feb0f3 cxl: Remove leftover attribute documentation in 'struct cxl_dev_state'
commit 14d7887407 ("cxl/mem: Consolidate CXL DVSEC Range enumeration
in the core")

...removed @info from 'struct cxl_dev_state', but neglected to remove
the corresponding kernel-doc entry. Complete the removal.

Reported-by: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
Closes: http://lore.kernel.org/r/20230606121054.000069e1@Huawei.com
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/168679259703.3436160.12583306507362357946.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-06-25 14:31:08 -07:00
Dan Williams
c192e5432f cxl: Fix kernel-doc warnings
After Jonathan noticed [1] that 'struct cxl_dev_state' had a kernel-doc
entry without a corresponding struct attribute I ran the kernel-doc
script to see what else might be broken. Fix these warnings:

drivers/cxl/cxlmem.h:199: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
 * Event Interrupt Policy
drivers/cxl/cxlmem.h:224: warning: Function parameter or member 'buf' not described in 'cxl_event_state'
drivers/cxl/cxlmem.h:224: warning: Function parameter or member 'log_lock' not described in 'cxl_event_state'

Note that scripts/kernel-doc only finds missing kernel-doc entries. It
does not warn on too many kernel-doc entries, i.e. it did not catch the
fact that @info refers to a not present member.

Link: http://lore.kernel.org/r/20230606121054.000069e1@Huawei.com [1]
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/168679259170.3436160.3686460404739136336.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-06-25 14:31:08 -07:00
Dan Williams
688baac109 cxl/regs: Clarify when a 'struct cxl_register_map' is input vs output
The @map parameter to cxl_probe_X_registers() is filled in with the
mapping parameters of the register block. The @map parameter to
cxl_map_X_registers() only reads that information to perform the
mapping. Mark @map const for cxl_map_X_registers() to clarify that it is
only an input to those helpers.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/168679258103.3436160.4941603739448763855.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-06-25 14:31:08 -07:00
Dan Williams
adfe19738b cxl/region: Fix state transitions after reset failure
Jonathan reports that failed attempts to reset a region (teardown its
HDM decoder configuration) mistakenly advance the state of the region
to "not committed". Revert to the previous state of the region on reset
failure so that the reset can be re-attempted.

Reported-by: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
Closes: http://lore.kernel.org/r/20230316171441.0000205b@Huawei.com
Fixes: 176baefb2e ("cxl/hdm: Commit decoder state to hardware")
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/168696507968.3590522.14484000711718573626.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-06-25 13:45:40 -07:00
Dan Williams
2ab47045ac cxl/region: Flag partially torn down regions as unusable
cxl_region_decode_reset() walks all the decoders associated with a given
region and disables them. Due to decoder ordering rules it is possible
that a switch in the topology notices that a given decoder can not be
shutdown before another region with a higher HPA is shutdown first. That
can leave the region in a partially committed state.

Capture that state in a new CXL_REGION_F_NEEDS_RESET flag and require
that a successful cxl_region_decode_reset() attempt must be completed
before cxl_region_probe() accepts the region.

This is a corollary for the bug that Jonathan identified in "CXL/region
:  commit reset of out of order region appears to succeed." [1].

Cc: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
Link: http://lore.kernel.org/r/20230316171441.0000205b@Huawei.com [1]
Fixes: 176baefb2e ("cxl/hdm: Commit decoder state to hardware")
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/168696507423.3590522.16254212607926684429.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-06-25 13:45:40 -07:00
Dan Williams
d1257d098a cxl/region: Move cache invalidation before region teardown, and before setup
Vikram raised a concern with the theoretical case of a CPU sending
MemClnEvict to a device that is not prepared to receive. MemClnEvict is
a message that is sent after a CPU has taken ownership of a cacheline
from accelerator memory (HDM-DB). In the case of hotplug or HDM decoder
reconfiguration it is possible that the CPU is holding old contents for
a new device that has taken over the physical address range being cached
by the CPU.

To avoid this scenario, invalidate caches prior to tearing down an HDM
decoder configuration.

Now, this poses another problem that it is possible for something to
speculate into that space while the decode configuration is still up, so
to close that gap also invalidate prior to establish new contents behind
a given physical address range.

With this change the cache invalidation is now explicit and need not be
checked in cxl_region_probe(), and that obviates the need for
CXL_REGION_F_INCOHERENT.

Cc: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
Fixes: d18bc74ace ("cxl/region: Manage CPU caches relative to DPA invalidation events")
Reported-by: Vikram Sethi <vsethi@nvidia.com>
Closes: http://lore.kernel.org/r/BYAPR12MB33364B5EB908BF7239BB996BBD53A@BYAPR12MB3336.namprd12.prod.outlook.com
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/168696506886.3590522.4597053660991916591.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-06-25 13:45:40 -07:00
Robert Richter
5d2ffbe4b8 cxl/port: Store the downstream port's Component Register mappings in struct cxl_dport
Same as for ports, also store the downstream port's Component Register
mappings; use struct cxl_dport for that.

Signed-off-by: Robert Richter <rrichter@amd.com>
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20230622205523.85375-16-terry.bowman@amd.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-06-25 12:22:53 -07:00
Robert Richter
19ab69a60e cxl/port: Store the port's Component Register mappings in struct cxl_port
CXL capabilities are stored in the Component Registers. To use them,
the specific I/O ranges of the capabilities must be determined by
probing the registers. For this, the whole Component Register range
needs to be mapped temporarily to detect the offset and length of a
capability range.

In order to use more than one capability of a component (e.g. RAS and
HDM) the Component Registers are probed and their mappings created
multiple times. This also causes overlapping I/O ranges as the whole
Component Register range must be mapped again while a capability's I/O
range is already mapped.

Different capabilities cannot be set up at the same time. E.g. the RAS
capability must be made available as soon as the PCI driver is bound,
while the HDM decoder is set up later during port enumeration. Moreover,
during early setup it is still unknown if a certain capability is
needed. A central capability setup is therefore not possible,
capabilities must be individually enabled once needed during
initialization.

To avoid a duplicate register probe and overlapping I/O mappings, only
probe the Component Registers one time and store the Component
Register mapping in struct cxl_port. The stored mappings can be used later
to iomap the capability register range when enabling the capability,
which will be implemented in a follow-on patch.

Signed-off-by: Robert Richter <rrichter@amd.com>
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20230622205523.85375-15-terry.bowman@amd.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-06-25 12:22:44 -07:00
Robert Richter
733b57f262 cxl/pci: Early setup RCH dport component registers from RCRB
CXL RAS capabilities must be enabled and accessible as soon as the CXL
endpoint is detected in the PCI hierarchy and bound to the cxl_pci
driver. This needs to be independent of other modules such as cxl_port
or cxl_mem.

CXL RAS capabilities reside in the Component Registers. For an RCH
this is determined by probing the RCRB, which is implemented very late,
once the CXL Memory Device is created.

Change this by moving the RCRB probe to the cxl_pci driver. Do this by
using a new introduced function cxl_pci_find_port() similar to
cxl_mem_find_port() to determine the involved dport by the endpoint's
PCI handle. Plug this into the existing cxl_pci_setup_regs() function
to setup Component Registers. Probe the RCRB in case the Component
Registers cannot be located through the CXL Register Locator
capability.

This unifies the code and sets up the Component Registers early, at the
same time, for both VH and RCH modes. Only the cxl_pci driver is
involved in this. This allows an early mapping of the CXL RAS
capability registers.

Signed-off-by: Robert Richter <rrichter@amd.com>
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20230622205523.85375-14-terry.bowman@amd.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-06-25 11:57:11 -07:00
Robert Richter
86917c560d cxl/mem: Prepare for early RCH dport component register setup
In order to move the RCH dport component register setup to cxl_pci the
base address must be stored in CXL device state (cxlds) for both
modes, RCH and VH. Store it in cxlds->component_reg_phys and use it
for endpoint creation.

Signed-off-by: Robert Richter <rrichter@amd.com>
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20230622205523.85375-13-terry.bowman@amd.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-06-25 11:57:02 -07:00
Robert Richter
f1d0525eff cxl/regs: Remove early capability checks in Component Register setup
When probing the Component Registers in function cxl_probe_regs()
there are also checks for the existence of the HDM and RAS
capabilities. The checks may fail for components that do not implement
the HDM capability, causing the Component Register setup to fail too.

Remove the checks for a generalized use of cxl_probe_regs() and check
them directly before mapping the RAS or HDM capabilities. This allows
it to set up other Component Registers, especially those of an RCH Downstream Port,
which will be implemented in a follow-on patch.

Signed-off-by: Robert Richter <rrichter@amd.com>
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20230622205523.85375-12-terry.bowman@amd.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-06-25 11:51:36 -07:00
Robert Richter
d8bffff201 cxl/port: Remove Component Register base address from struct cxl_dport
The Component Register base address @component_reg_phys is no longer
used after the rework of the Component Register setup which now uses
struct member @comp_map instead. Remove the base address.

Signed-off-by: Robert Richter <rrichter@amd.com>
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20230622205523.85375-11-terry.bowman@amd.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-06-25 11:51:17 -07:00
Robert Richter
d02034b402 cxl/acpi: Directly bind the CEDT detected CHBCR to the Host Bridge's port
During a Host Bridge's downstream port enumeration the CHBS entries in
the CEDT table are parsed, its Component Register base address
extracted and then stored in struct cxl_dport. The CHBS may contain
either the RCRB (RCH mode) or the Host Bridge's Component Registers
(CHBCR, VH mode). The RCRB further contains the CXL downstream port
register base address, while in VH mode the CXL Downstream Switch
Ports are visible in the PCI hierarchy and the DP's component regs are
discovered using the CXL DVSEC register locator capability. The
Component Registers derived from the CHBS for both modes are different
and thus also must be treated differently. That is, in RCH mode, the
component regs base should be bound to the dport, but in VH mode to
the CXL host bridge's port object.

The current implementation stores the CHBCR in addition in struct
cxl_dport and copies it later from there to struct cxl_port. As a
result, the dport contains the wrong Component Registers base address
and, e.g. the RAS capability of a CXL Root Port cannot be detected.

To fix the CHBCR binding, attach it directly to the Host Bridge's
@cxl_port structure. Do this during port creation of the Host Bridge
in add_host_bridge_uport(). Factor out CHBS parsing code in
add_host_bridge_dport() and use it in both functions.

Co-developed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Robert Richter <rrichter@amd.com>
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20230622205523.85375-10-terry.bowman@amd.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-06-25 11:41:45 -07:00
Robert Richter
f44c7b7ad9 cxl/acpi: Move add_host_bridge_uport() after cxl_get_chbs()
Just moving code to reorder functions to later share cxl_get_chbs()
with add_host_bridge_uport().

This makes changes in the next patch visible. No other changes at all.

Signed-off-by: Robert Richter <rrichter@amd.com>
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20230622205523.85375-9-terry.bowman@amd.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-06-25 11:41:34 -07:00
Terry Bowman
d076bb8c4c cxl/pci: Refactor component register discovery for reuse
The endpoint implements component register setup code. Refactor it for
reuse with RCRB, downstream port, and upstream port setup.

Move PCI specifics from cxl_setup_regs() into cxl_pci_setup_regs().

Move cxl_setup_regs() into cxl/core/regs.c and export it. This also
includes supporting static functions cxl_map_registerblock(),
cxl_unmap_register_block() and cxl_probe_regs().

Co-developed-by: Robert Richter <rrichter@amd.com>
Signed-off-by: Robert Richter <rrichter@amd.com>
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20230622205523.85375-8-terry.bowman@amd.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-06-25 11:39:38 -07:00
Robert Richter
573408049b cxl/core/regs: Add @dev to cxl_register_map
The corresponding device of a register mapping is used for devm
operations and logging. For operations with struct cxl_register_map
the device needs to be tracked separately. To simplify the involved
function interfaces, add @dev to cxl_register_map.

While at it also reorder function arguments of cxl_map_device_regs()
and cxl_map_component_regs() to have the object @cxl_register_map
first.

As a result a bunch of functions are available to be used with a
@cxl_register_map object.

This patch is in preparation of reworking the component register setup
code.

Signed-off-by: Robert Richter <rrichter@amd.com>
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20230622205523.85375-7-terry.bowman@amd.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-06-25 11:37:58 -07:00
Dan Williams
7481653dee cxl: Rename 'uport' to 'uport_dev'
For symmetry with the recent rename of ->dport_dev for a 'struct
cxl_dport', add the "_dev" suffix to the ->uport property of a 'struct
cxl_port'. These devices represent the downstream-port-device and
upstream-port-device respectively in the CXL/PCIe topology.

Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20230622205523.85375-6-terry.bowman@amd.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-06-25 11:37:54 -07:00
Robert Richter
227db57459 cxl: Rename member @dport of struct cxl_dport to @dport_dev
Reading code like dport->dport does not immediately suggest that this
points to the corresponding device structure of the dport. Rename
struct member @dport to @dport_dev.

While at it, also rename the @new argument of add_dport() to @dport. This
better describes the variable as a dport (e.g. new->dport becomes
dport->dport_dev).

Co-developed-by: Terry Bowman <terry.bowman@amd.com>
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Signed-off-by: Robert Richter <rrichter@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20230622205523.85375-5-terry.bowman@amd.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-06-25 11:37:49 -07:00
Dan Williams
0619337856 cxl/rch: Prepare for caching the MMIO mapped PCIe AER capability
Prepare cxl_probe_rcrb() for retrieving more than just the component
register block. The RCH AER handling code wants to get back to the AER
capability that happens to be MMIO mapped rather than accessed via
configuration cycles.

Move RCRB specific downstream port data, like the RCRB base and the
AER capability offset, into its own data structure ('struct
cxl_rcrb_info') for cxl_probe_rcrb() to fill. Extend 'struct
cxl_dport' to include a 'struct cxl_rcrb_info' attribute.

This centralizes all RCRB scanning in one routine.
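The new structure is intentionally small; roughly (illustrative):

    struct cxl_rcrb_info {
            resource_size_t base;   /* RCRB base address */
            u16 aer_cap;            /* offset of the MMIO-mapped AER capability */
    };

    /* 'struct cxl_dport' then carries a 'struct cxl_rcrb_info' member */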

Co-developed-by: Robert Richter <rrichter@amd.com>
Signed-off-by: Robert Richter <rrichter@amd.com>
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20230622205523.85375-4-terry.bowman@amd.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-06-25 11:35:26 -07:00
Robert Richter
eb4663b07e cxl/acpi: Probe RCRB later during RCH downstream port creation
The RCRB is already extracted during ACPI CEDT table parsing, while its
data is not needed until dport creation. This
implementation comes with drawbacks: During ACPI table scan there is
already MMIO access including mapping and unmapping, but only ACPI
data should be collected here. The collected data must be transferred
through a couple of interfaces until it is finally consumed when
creating the dport. This causes complex data structures and function
interfaces. Additionally, RCRB parsing will be extended to also
extract AER data; it would be much easier to do this at a later point
during port and dport creation when the data structures are available
to hold that data.

To simplify all that, probe the RCRB at a later point during RCH
downstream port creation. Change ACPI table parser to only extract the
base address of either the component registers or the RCRB. Parse and
extract the RCRB in devm_cxl_add_rch_dport().

This is in preparation to centralize all RCRB scanning.

Signed-off-by: Robert Richter <rrichter@amd.com>
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20230622205523.85375-2-terry.bowman@amd.com
Co-developed-by: Dan Williams <dan.j.williams@intel.com>
Link: https://lore.kernel.org/r/20230622205523.85375-3-terry.bowman@amd.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-06-25 11:35:20 -07:00
Jonathan Cameron
1ad3f701c3 cxl/pci: Find and register CXL PMU devices
CXL PMU devices can be found from entries in the Register
Locator DVSEC.

Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20230526095824.16336-4-Jonathan.Cameron@huawei.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-05-30 11:20:35 -07:00
Jonathan Cameron
d717d7f3df cxl: Add functions to get an instance of / count regblocks of a given type
Until the recently released CXL 3.0 specification, there
was only ever one instance of any given register block pointed
to by the Register Block Locator DVSEC. Now, the specification allows
for multiple CXL PMU instances, each with their own register block.

To enable this add cxl_find_regblock_instance() that takes an index
parameter and use that to implement cxl_count_regblock() and
cxl_find_regblock().

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20230526095824.16336-3-Jonathan.Cameron@huawei.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-05-30 11:20:35 -07:00
Dave Jiang
793a539ac7 cxl: Explicitly initialize resources when media is not ready
When media is not ready, do not assume that the capacity information from
the identify command is valid, i.e. ->total_bytes,
->partition_align_bytes, and ->{volatile,persistent}_only_bytes. Explicitly
zero out the capacity resources and exit early.

Given zero-init of those fields this patch is functionally equivalent to
the prior state, but it improves readability and robustness going
forward.

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/168506118166.3004974.13523455340007852589.stgit@djiang5-mobl3
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-05-26 13:34:39 -07:00
Davidlohr Bueso
ccadf1310f cxl/mbox: Add background cmd handling machinery
This adds support for handling background operations, as defined in
the CXL 3.0 spec. Commands that can take too long (over ~2 seconds)
can run in the background asynchronously (to the hardware).

The driver will deal with such commands synchronously, blocking all
other incoming commands for a specified period of time, allowing
time-slicing the command such that the caller can send incremental
requests to avoid monopolizing the driver/device. Any out of sync
(timeout) between the driver and hardware is just disregarded as
an invalid state until the next successful submission. Such timeouts
are considered a rare occurrence, either a real device problem or a
driver issue that needs to reduce the size of the background operation
to fit the timeout.

On devices where mbox interrupts are supported, this will still use
a poller that will wakeup in the specified wait intervals. The irq
handler will simply awake the blocked cmd, which is also safe vs a
task that is either waking (timing out) or already awoken. Similarly
any irq setup error during probing falls back to polling, thus
avoiding unnecessarily erroring out.

Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Link: https://lore.kernel.org/r/20230523170927.20685-5-dave@stgolabs.net
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-05-23 12:55:12 -07:00
Davidlohr Bueso
9f7a320d16 cxl/pci: Introduce cxl_request_irq()
Factor out common functionality/semantics for cxl shared interrupts
into a new helper on top of devm_request_irq().

Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Link: https://lore.kernel.org/r/20230523170927.20685-4-dave@stgolabs.net
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-05-23 12:55:12 -07:00
Davidlohr Bueso
f279d0bc13 cxl/pci: Allocate irq vectors earlier during probe
Move the cxl_alloc_irq_vectors() call further up in the probing
in order to allow for mailbox interrupt usage. No change in
semantics.

Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Link: https://lore.kernel.org/r/20230523170927.20685-3-dave@stgolabs.net
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-05-23 12:55:12 -07:00
Robert Richter
a70fc4ed20 cxl/port: Fix NULL pointer access in devm_cxl_add_port()
In devm_cxl_add_port() the port creation may fail, in which case its
associated pointer does not contain a valid address. Error message
generation then uses this invalid port address. Fix that wrong address
access.
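
The general shape of the fix, for illustration only: report the failure via
a device that is known-valid instead of dereferencing the ERR_PTR() result.

  port = devm_cxl_add_port(host, uport, component_reg_phys, parent_dport);
  if (IS_ERR(port)) {
          /* 'port' is an ERR_PTR() here, so &port->dev must not be used */
          dev_dbg(uport, "Failed to add port (%ld)\n", PTR_ERR(port));
          return PTR_ERR(port);
  }
  dev_dbg(&port->dev, "port added\n");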

Fixes: f3cd264c4e ("cxl: Unify debug messages when calling devm_cxl_add_port()")
Signed-off-by: Robert Richter <rrichter@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20230519215436.3394532-1-rrichter@amd.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-05-19 17:47:01 -07:00
Dave Jiang
e764f12208 cxl: Move cxl_await_media_ready() to before capacity info retrieval
Move cxl_await_media_ready() to cxl_pci probe before the driver starts
issuing IDENTIFY and retrieving memory device information, to ensure that
the device is ready to provide the information. Allow cxl_pci_probe() to
succeed even if media is not ready. Cache the media failure in cxlds and
don't ask the device for any media information.

The rationale for proceeding in the !media_ready case is to allow for
mailbox operations to interrogate and/or remediate the device. After
media is repaired then rebinding the cxl_pci driver is expected to
restart the capacity scan.
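
A minimal sketch of the resulting probe-time ordering (the media_ready
caching follows the commit text; surrounding code is simplified):

  rc = cxl_await_media_ready(cxlds);
  if (rc)
          dev_warn(&pdev->dev, "Media not active (%d), mailbox-only mode\n", rc);
  else
          cxlds->media_ready = true;

  rc = cxl_dev_state_identify(cxlds);     /* capacity setup checks media_ready */
  if (rc)
          return rc;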

Suggested-by: Dan Williams <dan.j.williams@intel.com>
Fixes: b39cb1052a ("cxl/mem: Register CXL memX devices")
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/168445310026.3251520.8124296540679268206.stgit@djiang5-mobl3
[djbw: fixup cxl_test]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-05-18 16:43:45 -07:00
Dave Jiang
ce17ad0d54 cxl: Wait Memory_Info_Valid before access memory related info
The Memory_Info_Valid bit (CXL 3.0 8.1.3.8.2) indicates that the CXL
Range Size High and Size Low registers are valid. The bit must be set
within 1 second of reset deassertion to the device. Check the valid bit
before checking the Memory_Active bit in cxl_await_media_ready() to
ensure that the memory info is valid for consumption. Also ensure both
DVSEC ranges 1 and 2 are ready if the DVSEC Capability indicates they
are both supported.
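
A hedged sketch of the wait (macro and helper names approximate the
driver's; the spec requires the bit to become valid within 1 second):

  static int dvsec_range_info_valid(struct pci_dev *pdev, int dvsec, int which)
  {
          unsigned long timeout = jiffies + HZ;   /* 1 second budget */
          u32 size_low;

          do {
                  pci_read_config_dword(pdev,
                                        dvsec + CXL_DVSEC_RANGE_SIZE_LOW(which),
                                        &size_low);
                  if (size_low & CXL_DVSEC_MEM_INFO_VALID)
                          return 0;
                  msleep(100);
          } while (!time_after(jiffies, timeout));

          return -ETIMEDOUT;
  }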

Fixes: 523e594d9c ("cxl/pci: Implement wait for media active")
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/168444687469.3134781.11033518965387297327.stgit@djiang5-mobl3
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-05-18 16:42:41 -07:00
Dan Williams
eb0764b822 cxl/port: Enable the HDM decoder capability for switch ports
Derick noticed, when testing hot plug, that hot-add behaves nominally
after a removal. However, if the hot-add is done without a prior
removal, CXL.mem accesses fail. It turns out that the original
implementation of the port driver and region programming wrongly assumed
that platform-firmware always enables the host-bridge HDM decoder
capability. Add support for turning on switch-level HDM decoders in the
case where platform-firmware has not.

The implementation is careful to only arrange for the enable to be
undone if the current instance of the driver was the one that did the
enable. This is to interoperate with platform-firmware that may expect
CXL.mem to remain active after the driver is shutdown. This comes at the
cost of potentially not shutting down the enable on kexec flows, but it
is mitigated by the fact that the related HDM decoders still need to be
enabled on an individual basis.
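
A sketch of the "only undo what this instance enabled" idea (register and
bit names paraphrase the spec and may differ from the driver's macros):

  static void disable_hdm(void *hdm)
  {
          void __iomem *base = hdm;
          u32 ctrl = readl(base + CXL_HDM_DECODER_CTRL_OFFSET);

          writel(ctrl & ~CXL_HDM_DECODER_ENABLE,
                 base + CXL_HDM_DECODER_CTRL_OFFSET);
  }

  static int devm_cxl_enable_hdm(struct device *host, void __iomem *hdm)
  {
          u32 ctrl = readl(hdm + CXL_HDM_DECODER_CTRL_OFFSET);

          if (ctrl & CXL_HDM_DECODER_ENABLE)
                  return 0;       /* firmware enabled it; leave it on at exit */

          writel(ctrl | CXL_HDM_DECODER_ENABLE,
                 hdm + CXL_HDM_DECODER_CTRL_OFFSET);

          /* arm an undo action only for the enable performed here */
          return devm_add_action_or_reset(host, disable_hdm, (void *)hdm);
  }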

Cc: <stable@vger.kernel.org>
Reported-by: Derick Marks <derick.w.marks@intel.com>
Fixes: 54cdbf845c ("cxl/port: Add a driver for 'struct cxl_port' objects")
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/168437998331.403037.15719879757678389217.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-05-18 13:18:49 -07:00
Dave Jiang
764d102ef9 cxl: Add missing return to cdat read error path
Add a return to the error path when cxl_cdat_read_table() fails. The
current code continues with the table pointer pointing to freed memory.
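
For illustration, the shape of the corrected path (helper and field names
are approximate):

  rc = cxl_cdat_read_table(dev, cdat_doe, cdat_buf, &cdat_length);
  if (rc) {
          /* cdat_buf is freed here, so it must not be published below */
          devm_kfree(dev, cdat_buf);
          dev_err(dev, "CDAT data read error\n");
          return;                 /* the previously missing early return */
  }

  port->cdat.table = cdat_buf;
  port->cdat.length = cdat_length;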

Fixes: 7a877c9239 ("cxl/pci: Simplify CDAT retrieval error path")
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Link: https://lore.kernel.org/r/168382793506.3510737.4792518576623749076.stgit@djiang5-mobl3
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-05-13 00:20:06 -07:00
Linus Torvalds
7acc137211 cxl for v6.4
- Refactor the DOE infrastructure (Data Object Exchange PCI-config-cycle
   mailbox) to be a facility of the PCI core rather than the CXL core.
   This is foundational for upcoming support for PCI device-attestation and
   PCIe / CXL link encryption.
 
 - Add support for retrieving and injecting poison for CXL memory
   expanders. This enabling uses trace-events to convey CXL media error
   records to user tooling. It includes translation of device-local
   addresses (DPA) to system physical addresses (SPA) and their
   corresponding CXL region.
 
 - Fixes for decoder enumeration that missed v6.3-final
 
 - Miscellaneous fixups
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQSbo+XnGs+rwLz9XGXfioYZHlFsZwUCZE2nNwAKCRDfioYZHlFs
 Z5c2AQCTWebok6CD+HN01xnIx+CBWAUQe0QIGR40dT2P6/WGEgEA8wMae0w/FDlc
 lQDvSoIyPvy1hGO7Ppb0K2AT6jrQAgU=
 =blcC
 -----END PGP SIGNATURE-----

Merge tag 'cxl-for-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl

Pull compute express link updates from Dan Williams:
 "DOE support is promoted from drivers/cxl/ to drivers/pci/ with Bjorn's
  blessing, and the CXL core continues to mature its media management
  capabilities with support for listing and injecting media errors. Some
  late fixes that missed v6.3-final are also included:

   - Refactor the DOE infrastructure (Data Object Exchange
     PCI-config-cycle mailbox) to be a facility of the PCI core rather
     than the CXL core.

     This is foundational for upcoming support for PCI
     device-attestation and PCIe / CXL link encryption.

   - Add support for retrieving and injecting poison for CXL memory
     expanders.

     This enabling uses trace-events to convey CXL media error records
     to user tooling. It includes translation of device-local addresses
     (DPA) to system physical addresses (SPA) and their corresponding
     CXL region.

   - Fixes for decoder enumeration that missed v6.3-final

   - Miscellaneous fixups"

* tag 'cxl-for-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (38 commits)
  cxl/test: Add mock test for set_timestamp
  cxl/mbox: Update CMD_RC_TABLE
  tools/testing/cxl: Require CONFIG_DEBUG_FS
  tools/testing/cxl: Add a sysfs attr to test poison inject limits
  tools/testing/cxl: Use injected poison for get poison list
  tools/testing/cxl: Mock the Clear Poison mailbox command
  tools/testing/cxl: Mock the Inject Poison mailbox command
  cxl/mem: Add debugfs attributes for poison inject and clear
  cxl/memdev: Trace inject and clear poison as cxl_poison events
  cxl/memdev: Warn of poison inject or clear to a mapped region
  cxl/memdev: Add support for the Clear Poison mailbox command
  cxl/memdev: Add support for the Inject Poison mailbox command
  tools/testing/cxl: Mock support for Get Poison List
  cxl/trace: Add an HPA to cxl_poison trace events
  cxl/region: Provide region info to the cxl_poison trace event
  cxl/memdev: Add trigger_poison_list sysfs attribute
  cxl/trace: Add TRACE support for CXL media-error records
  cxl/mbox: Add GET_POISON_LIST mailbox command
  cxl/mbox: Initialize the poison state
  cxl/mbox: Restrict poison cmds to debugfs cxl_raw_allow_all
  ...
2023-04-30 11:51:51 -07:00
Linus Torvalds
556eb8b791 Driver core changes for 6.4-rc1
Here is the large set of driver core changes for 6.4-rc1.
 
 Once again, a busy development cycle, with lots of changes happening in
 the driver core in the quest to be able to move "struct bus" and "struct
 class" into read-only memory, a task now complete with these changes.
 
 This will make the future rust interactions with the driver core more
 "provably correct" as well as providing more obvious lifetime rules for
 all busses and classes in the kernel.
 
 The changes required for this did touch many individual classes and
 busses as many callbacks were changed to take const * parameters
 instead.  All of these changes have been submitted to the various
 subsystem maintainers, giving them plenty of time to review, and most of
 them actually did so.
 
 Other than those changes, included in here are a small set of other
 things:
   - kobject logging improvements
   - cacheinfo improvements and updates
   - obligatory fw_devlink updates and fixes
   - documentation updates
   - device property cleanups and const * changes
   - firmware loader dependency fixes.
 
 All of these have been in linux-next for a while with no reported
 problems.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZEp7Sw8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ykitQCfamUHpxGcKOAGuLXMotXNakTEsxgAoIquENm5
 LEGadNS38k5fs+73UaxV
 =7K4B
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updates from Greg KH:
 "Here is the large set of driver core changes for 6.4-rc1.

  Once again, a busy development cycle, with lots of changes happening
  in the driver core in the quest to be able to move "struct bus" and
  "struct class" into read-only memory, a task now complete with these
  changes.

  This will make the future rust interactions with the driver core more
  "provably correct" as well as providing more obvious lifetime rules
  for all busses and classes in the kernel.

  The changes required for this did touch many individual classes and
  busses as many callbacks were changed to take const * parameters
  instead. All of these changes have been submitted to the various
  subsystem maintainers, giving them plenty of time to review, and most
  of them actually did so.

  Other than those changes, included in here are a small set of other
  things:

   - kobject logging improvements

   - cacheinfo improvements and updates

   - obligatory fw_devlink updates and fixes

   - documentation updates

   - device property cleanups and const * changes

   - firmware loader dependency fixes.

  All of these have been in linux-next for a while with no reported
  problems"

* tag 'driver-core-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (120 commits)
  device property: make device_property functions take const device *
  driver core: update comments in device_rename()
  driver core: Don't require dynamic_debug for initcall_debug probe timing
  firmware_loader: rework crypto dependencies
  firmware_loader: Strip off \n from customized path
  zram: fix up permission for the hot_add sysfs file
  cacheinfo: Add use_arch[|_cache]_info field/function
  arch_topology: Remove early cacheinfo error message if -ENOENT
  cacheinfo: Check cache properties are present in DT
  cacheinfo: Check sib_leaf in cache_leaves_are_shared()
  cacheinfo: Allow early level detection when DT/ACPI info is missing/broken
  cacheinfo: Add arm64 early level initializer implementation
  cacheinfo: Add arch specific early level initializer
  tty: make tty_class a static const structure
  driver core: class: remove struct class_interface * from callbacks
  driver core: class: mark the struct class in struct class_interface constant
  driver core: class: make class_register() take a const *
  driver core: class: mark class_release() as taking a const *
  driver core: remove incorrect comment for device_create*
  MIPS: vpe-cmp: remove module owner pointer from struct class usage.
  ...
2023-04-27 11:53:57 -07:00
Davidlohr Bueso
bfe58458fd cxl/mbox: Update CMD_RC_TABLE
As of CXL 3.0 there are some added return codes; update the
driver accordingly.

Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20230307042655.6714-1-dave@stgolabs.net
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-04-23 12:10:26 -07:00
Dan Williams
ca899f4021 Merge branch 'for-6.3/cxl-autodetect-fixes' into for-6.4/cxl
Pick up late v6.3 fixes for v6.4.
2023-04-23 12:10:13 -07:00
Dan Williams
856ef55e7e Merge branch 'for-6.4/cxl-poison' into for-6.4/cxl
Include the poison list and injection infrastructure from Alison for
v6.4.
2023-04-23 12:09:56 -07:00
Alison Schofield
50d527f52c cxl/mem: Add debugfs attributes for poison inject and clear
Inject and Clear Poison commands are optionally supported by CXL
memdev devices and are intended for use in debug environments only.
Add debugfs attributes for user access.

Documentation/ABI/testing/debugfs-cxl describes the usage.
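
A sketch of the debugfs wiring (helper names and the exact fops are
illustrative, not necessarily the driver's):

  static int cxl_poison_inject_set(void *data, u64 dpa)
  {
          struct cxl_memdev *cxlmd = data;

          return cxl_inject_poison(cxlmd, dpa);
  }
  DEFINE_DEBUGFS_ATTRIBUTE(cxl_poison_inject_fops, NULL, cxl_poison_inject_set,
                           "%llx\n");

  /* in memdev setup, compiled only when CONFIG_DEBUG_FS is enabled */
  struct dentry *dentry = cxl_debugfs_create_dir(dev_name(&cxlmd->dev));

  debugfs_create_file("inject_poison", 0200, dentry, cxlmd,
                      &cxl_poison_inject_fops);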

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/0c9ea8e671b8e58465d18722788b60d325c675c7.1681874357.git.alison.schofield@intel.com
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-04-23 12:08:39 -07:00
Alison Schofield
98b6926562 cxl/memdev: Trace inject and clear poison as cxl_poison events
The cxl_poison trace event allows users to view the history of poison
list reads. With the addition of inject and clear poison capabilities,
users will expect similar tracing.

Add trace types 'Inject' and 'Clear' to the cxl_poison trace_event and
trace successful operations only.

If the driver finds that the DPA being injected or cleared of poison
is mapped in a region, that region info is included in the cxl_poison
trace event. Region reconfigurations can make this extra info useless
if the debug operations are not carefully managed.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/e20eb7c3029137b480ece671998c183da0477e2e.1681874357.git.alison.schofield@intel.com
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-04-23 12:08:39 -07:00
Alison Schofield
0a105ab28a cxl/memdev: Warn of poison inject or clear to a mapped region
Inject and clear poison capabilities are intended for debug usage only.
In order to be useful in debug environments, the driver needs to allow
inject and clear operations on DPAs mapped in regions.

Emit a dev_warn_once() when either operation occurs.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Link: https://lore.kernel.org/r/f911ca5277c9d0f9757b72d7e6842871bfff4fa2.1681874357.git.alison.schofield@intel.com
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-04-23 11:46:22 -07:00
Alison Schofield
9690b07748 cxl/memdev: Add support for the Clear Poison mailbox command
CXL devices optionally support the CLEAR POISON mailbox command. Add
memdev driver support for clearing poison.

Per the CXL Specification (3.0 8.2.9.8.4.3), after receiving a valid
clear poison request, the device removes the address from the device's
Poison List and writes 0 (zero) for 64 bytes starting at address. If
the device cannot clear poison from the address, it returns a permanent
media error and -ENXIO is returned to the user.

Additionally, per the spec, it is not an error to clear poison
from an address that is not poisoned.

If the address is not contained in the device's dpa resource, or is
not 64 byte aligned, the driver returns -EINVAL without sending the
command to the device.

Poison clearing is intended for debug only and will be exposed to
userspace through debugfs. Restrict compilation to CONFIG_DEBUG_FS.

Implementation note: Although the CXL specification defines the clear
command to accept 64 bytes of 'write-data', this implementation always
uses zeroes as write-data.
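
A simplified sketch of the checks and payload described above (struct
layout and helper names are illustrative):

  struct cxl_mbox_clear_poison {
          __le64 address;
          u8 write_data[64];      /* always zeroes in this implementation */
  } __packed;

  int cxl_clear_poison(struct cxl_memdev *cxlmd, u64 dpa)
  {
          struct cxl_dev_state *cxlds = cxlmd->cxlds;
          struct cxl_mbox_clear_poison clear = {
                  .address = cpu_to_le64(dpa),
          };
          struct cxl_mbox_cmd mbox_cmd = {
                  .opcode = CXL_MBOX_OP_CLEAR_POISON,
                  .size_in = sizeof(clear),
                  .payload_in = &clear,
          };

          /* reject misaligned requests or those outside the DPA resource */
          if (!IS_ALIGNED(dpa, 64) ||
              dpa < cxlds->dpa_res.start || dpa + 64 - 1 > cxlds->dpa_res.end)
                  return -EINVAL;

          return cxl_internal_send_cmd(cxlds, &mbox_cmd);
  }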

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/8682c30ec24bd9c45af5feccb04b02be51e58c0a.1681874357.git.alison.schofield@intel.com
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-04-23 11:46:22 -07:00
Alison Schofield
d2fbc48658 cxl/memdev: Add support for the Inject Poison mailbox command
CXL devices optionally support the INJECT POISON mailbox command. Add
memdev driver support for the mailbox command.

Per the CXL Specification (3.0 8.2.9.8.4.2), after receiving a valid
inject poison request, the device will return poison when the address
is accessed through CXL.mem. Injecting poison adds the address
to the device's Poison List and the error source is set to Injected.
In addition, the device adds a poison creation event to its internal
Informational Event log, updates the Event Status register, and if
configured, interrupts the host.

Also, per the CXL Specification, it is not an error to inject poison
into an address that already has poison present; no error is
returned from the device.

If the address is not contained in the device's dpa resource, or is
not 64 byte aligned, return -EINVAL without issuing the mbox command.

Poison injection is intended for debug only and will be exposed to
userspace through debugfs. Restrict compilation to CONFIG_DEBUG_FS.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/241c64115e6bd2effed9c7a20b08b3908dd7be8f.1681874357.git.alison.schofield@intel.com
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-04-23 11:46:22 -07:00
Alison Schofield
28a3ae4ff6 cxl/trace: Add an HPA to cxl_poison trace events
When a cxl_poison trace event is reported for a region, the poisoned
Device Physical Address (DPA) can be translated to a Host Physical
Address (HPA) for consumption by user space.

Translate and add the resulting HPA to the cxl_poison trace event.
Follow the device decode logic as defined in the CXL Spec 3.0 Section
8.2.4.19.13.

If no region currently maps the poison, assign ULLONG_MAX to the
cxl_poison event hpa field.
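
For the trivial x1 (no interleave) case, the translation reduces to the
sketch below; the driver's real logic follows the full spec decode
algorithm and the field names here are approximate:

  static u64 poison_dpa_to_hpa(struct cxl_region *cxlr,
                               struct cxl_endpoint_decoder *cxled, u64 dpa)
  {
          if (!cxlr || !cxled->dpa_res ||
              dpa < cxled->dpa_res->start || dpa > cxled->dpa_res->end)
                  return ULLONG_MAX;      /* no region maps this poison */

          /* offset into the decoder's DPA window maps 1:1 into the region */
          return cxlr->params.res->start + (dpa - cxled->dpa_res->start);
  }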

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/6d3cd726f9042a59902785b0a2cb3ddfb70e0219.1681838292.git.alison.schofield@intel.com
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-04-23 11:46:13 -07:00
Alison Schofield
f0832a5863 cxl/region: Provide region info to the cxl_poison trace event
User space may need to know which region, if any, maps the poison
address(es) logged in a cxl_poison trace event. Since the mapping
of DPAs (device physical addresses) to a region can change, the
kernel must provide this information at the time the poison list
is read. The event informs user space that at event <timestamp>
this <region> mapped to this <DPA>, which is poisoned.

The cxl_poison trace event is already wired up to log the region
name and uuid if it receives a 'struct cxl_region' parameter.

In order to provide that cxl_region, add another method for gathering
poison - by committed endpoint decoder mappings. This method is only
available with CONFIG_CXL_REGION and is only used if a region actually
maps the memdev where poison is being read. After the region driver
reads the poison list for all the mapped resources, poison is read for
any remaining unmapped resources.

The default method remains: read the poison by memdev resource.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/438b01ccaa70592539e8eda4eb2b1d617ba03160.1681838292.git.alison.schofield@intel.com
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-04-23 11:46:06 -07:00
Alison Schofield
7ff6ad1075 cxl/memdev: Add trigger_poison_list sysfs attribute
When a boolean 'true' is written to this attribute the memdev driver
retrieves the poison list from the device. The list consists of
addresses that are poisoned, or would result in poison if accessed,
and the source of the poison. This attribute is only visible for
devices supporting the capability. The retrieved errors are logged
as kernel events when cxl_poison event tracing is enabled.
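
A sketch of a write-1-to-trigger attribute of this shape (names and the
poison-read entry point are approximate):

  static ssize_t trigger_poison_list_store(struct device *dev,
                                           struct device_attribute *attr,
                                           const char *buf, size_t len)
  {
          struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
          bool trigger;
          ssize_t rc;

          if (kstrtobool(buf, &trigger) || !trigger)
                  return -EINVAL;

          /* read the full poison list; records emit cxl_poison trace events */
          rc = cxl_mem_get_poison(cxlmd, 0, cxlmd->cxlds->total_bytes, NULL);

          return rc ? rc : len;
  }
  static DEVICE_ATTR_WO(trigger_poison_list);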

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/1081cfdc8a349dc754779642d584707e56db26ba.1681838291.git.alison.schofield@intel.com
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-04-23 11:46:02 -07:00
Alison Schofield
ddf49d57b8 cxl/trace: Add TRACE support for CXL media-error records
CXL devices may support the retrieval of a device poison list.
Add a new trace event that the CXL subsystem may use to log
the media-error records returned in the poison list.

Log each media-error record as a cxl_poison trace event of
type 'List'.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/de6196f5269483d886ab1834744f82d27189a666.1681838291.git.alison.schofield@intel.com
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-04-23 11:45:53 -07:00
Alison Schofield
ed83f7ca39 cxl/mbox: Add GET_POISON_LIST mailbox command
CXL devices maintain a list of locations that are poisoned or result
in poison if the addresses are accessed by the host.

Per the spec, (CXL 3.0 8.2.9.8.4.1), the device returns this Poison
list as a set of Media Error Records that include the source of the
error, the starting device physical address, and length. The length is
the number of adjacent DPAs in the record and is in units of 64 bytes.

Retrieve the poison list.
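
A sketch of a Media Error Record as described above (layout per the commit
text; the driver's struct may differ in detail):

  struct cxl_poison_record {
          __le64 address;         /* start DPA; low-order bits carry the source */
          __le32 length;          /* adjacent DPAs, in units of 64 bytes */
          __le32 rsvd;
  } __packed;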

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/a1f332e817834ef8e89c0ff32e760308fb903346.1681838291.git.alison.schofield@intel.com
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-04-23 11:45:49 -07:00
Alison Schofield
d0abf5787a cxl/mbox: Initialize the poison state
Driver reads of the poison list are synchronized to ensure that a
reader does not get an incomplete list because their request
overlapped (was interrupted or preceded by) another read request
of the same DPA range. (CXL Spec 3.0 Section 8.2.9.8.4.1). The
driver maintains state information to achieve this goal.

To initialize the state, first recognize the poison commands in
the CEL (Command Effects Log). If the device supports Get Poison
List, allocate a single buffer for the poison list and protect it
with a lock.
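
A sketch of that init step (bit, field, and helper names are approximate):

  static int cxl_poison_state_init(struct cxl_dev_state *cxlds)
  {
          if (!test_bit(CXL_POISON_ENABLED_LIST, cxlds->poison.enabled_cmds))
                  return 0;       /* device does not offer Get Poison List */

          /* one buffer, sized to the mailbox payload, shared by all readers */
          cxlds->poison.list_out = kvzalloc(cxlds->payload_size, GFP_KERNEL);
          if (!cxlds->poison.list_out)
                  return -ENOMEM;

          mutex_init(&cxlds->poison.lock);
          return 0;
  }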

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Link: https://lore.kernel.org/r/9078d180769be28a5087288b38cdfc827cae58bf.1681838291.git.alison.schofield@intel.com
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-04-23 11:45:26 -07:00
Alison Schofield
dec441d32a cxl/mbox: Restrict poison cmds to debugfs cxl_raw_allow_all
The Get, Inject, and Clear poison commands are not available for
direct user access because they require kernel driver controls to
perform safely.

Further restrict access to these commands by requiring the selection
of the debugfs attribute 'cxl_raw_allow_all' to enable in raw mode.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Link: https://lore.kernel.org/r/0e5cb41ffae2bab800957d3b9003eedfd0a2dfd5.1681838291.git.alison.schofield@intel.com
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-04-23 11:45:09 -07:00
Dan Williams
3db166d6cf cxl/mbox: Deprecate poison commands
The CXL subsystem is adding formal mechanisms for managing device
poison. Minimize the maintenance burden going forward, and maximize
the investment in common tooling by deprecating direct user access
to poison commands outside of CXL_MEM_RAW_COMMANDS debug scenarios.

A new cxl_deprecated_commands[] list is created for querying which
command ids defined in previous kernels are now deprecated.

CXL Media and Poison Management commands, opcodes 0x43XX, defined in
CXL 3.0 Spec, Table 8-93 are deprecated with one exception: Get Scan
Media Capabilities. Keep Get Scan Media Capabilities as it simply
provides information and has no impact on the device state.

Effectively all of the commands defined in:

commit 87815ee9d0 ("cxl/pci: Add media provisioning required commands")

...were defined prematurely and should have waited until the kernel
implementation was decided. To my knowledge there are no shipping
devices with poison support and no known tools that would regress with
this change.

Co-developed-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Link: https://lore.kernel.org/r/652197e9bc8885e6448d989405b9e50ee9d6b0a6.1681838291.git.alison.schofield@intel.com
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-04-22 14:41:30 -07:00
Dan Williams
267214a231 cxl/port: Fix port to pci device assumptions in read_cdat_data()
Not all endpoint CXL ports are associated with PCI devices. The cxl_test
infrastructure models 'struct cxl_port' instances hosted by platform
devices. Teach read_cdat_data() to be careful about non-pci hosted
cxl_memdev instances. Otherwise, cxl_test crashes with this signature:

 RIP: 0010:xas_start+0x6d/0x290
 [..]
 Call Trace:
  <TASK>
  xas_load+0xa/0x50
  xas_find+0x25b/0x2f0
  xa_find+0x118/0x1d0
  pci_find_doe_mailbox+0x51/0xc0
  read_cdat_data+0x45/0x190 [cxl_core]
  cxl_port_probe+0x10a/0x1e0 [cxl_port]
  cxl_bus_probe+0x17/0x50 [cxl_core]

Some other cleanups are included like removing the single-use @uport
variable, and removing the indirection through 'struct cxl_dev_state' to
lookup the device that registered the memdev and may be a pci device.
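
A sketch of the guard (assuming the memdev's parent is the registering
device; constants and helpers as used elsewhere in the CXL/PCI code):

  struct device *host = cxlmd->dev.parent;

  if (!dev_is_pci(host))
          return;         /* e.g. cxl_test memdevs hosted by platform devices */

  doe_mb = pci_find_doe_mailbox(to_pci_dev(host), PCI_DVSEC_VENDOR_ID_CXL,
                                CXL_DOE_PROTOCOL_TABLE_ACCESS);
  if (!doe_mb)
          return;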

Fixes: af0a6c3587 ("cxl/pci: Use CDAT DOE mailbox created by PCI core")
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/168213190748.708404.16215095414060364800.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-04-22 14:04:43 -07:00
Lukas Wunner
f960e57dca cxl/pci: Rightsize CDAT response allocation
Jonathan notes that cxl_cdat_get_length() and cxl_cdat_read_table()
allocate 32 dwords for the DOE response even though it may be smaller.

In the case of cxl_cdat_get_length(), only the second dword of the
response is of interest (it contains the length).  So reduce the
allocation to 2 dwords and let DOE discard the remainder.

In the case of cxl_cdat_read_table(), a correctly sized allocation for
the full CDAT already exists.  Let DOE write each table entry directly
into that allocation.  There's a snag in that the table entry is
preceded by a Table Access Response Header (1 dword, CXL 3.0 table 8-14).
Save the last dword of the previous table entry, let DOE overwrite it
with the header of the next entry and restore it afterwards.

The resulting CDAT is preceded by 4 unavoidable useless bytes.  Increase
the allocation size accordingly.

The buffer overflow check in cxl_cdat_read_table() becomes unnecessary
because the remaining bytes in the allocation are tracked in "length",
which is passed to DOE and limits how many bytes it writes to the
allocation.  Additionally, cxl_cdat_read_table() bails out if the DOE
response is truncated due to insufficient space.
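
A fragment illustrating the save/overwrite/restore trick described above
(cdat_read_entry(), entry_handle, and the buffer handling are illustrative
stand-ins for the DOE exchange):

  /* rsp points one dword before where the next entry's data belongs, so
   * the 1-dword Table Access Response Header lands on the previous tail */
  __le32 saved = *rsp;
  size_t entry_len;

  rc = cdat_read_entry(doe_mb, entry_handle, rsp,
                       remaining + sizeof(__le32), &entry_len);
  if (rc)
          return rc;

  *rsp = saved;                           /* undo the header's clobbering */
  rsp += entry_len / sizeof(__le32);      /* tail of this entry */
  remaining -= entry_len;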

Tested-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/7a4e1f86958a79a70f29b96a92199522f08f8322.1678543498.git.lukas@wunner.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-04-18 10:36:58 -07:00
Dave Jiang
7a877c9239 cxl/pci: Simplify CDAT retrieval error path
The cdat.table and cdat.length fields in struct cxl_port are set before
CDAT retrieval and must therefore be unset on failure.

Simplify by setting only on success.

Suggested-by: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
Link: https://lore.kernel.org/linux-cxl/20230209113422.00007017@Huawei.com/
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
[lukas: rebase and rephrase commit message]
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/7a5c7104fb6a3016dbaec1c5d0ed34619ea11a0c.1678543498.git.lukas@wunner.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-04-18 10:36:58 -07:00
Lukas Wunner
af0a6c3587 cxl/pci: Use CDAT DOE mailbox created by PCI core
The PCI core has just been amended to create a pci_doe_mb struct for
every DOE instance on device enumeration.

Drop creation of a (duplicate) CDAT DOE mailbox on cxl probing in favor
of the one already created by the PCI core.

Tested-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/becaf70e8faf9681d474200117d62d7eaac46cca.1678543498.git.lukas@wunner.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-04-18 10:36:58 -07:00
Lukas Wunner
58709b924e cxl/pci: Use synchronous API for DOE
A synchronous API for DOE has just been introduced.  Convert CXL CDAT
retrieval over to it.

Tested-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Link: https://lore.kernel.org/r/c329c0a21c11c3b524ce2336b0bbb3c80a28c415.1678543498.git.lukas@wunner.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-04-18 10:36:58 -07:00
Dan Williams
c841ecd827 cxl/hdm: Add more HDM decoder debug messages at startup
A recent debug session yielded a couple of debug messages that were useful
for determining why the driver was or was not falling back
to CXL range register emulation, and for identifying decoder setting
enumeration problems.

Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Link: https://lore.kernel.org/r/168149845668.792294.11814353796371419167.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-04-18 10:32:47 -07:00
Dan Williams
7bba261e0a cxl/port: Scan single-target ports for decoders
Do not assume that a single-target port falls back to a passthrough
decoder configuration. Scan for decoders and only fall back after probing
confirms that the HDM decoder capability is not present.

One user-visible effect of this bug is the inability to enumerate
present CXL regions, as the decoder settings for the present decoders are
skipped.
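
For illustration, the resulting probe flow looks roughly like this
(signatures approximate the driver's helpers):

  cxlhdm = devm_cxl_setup_hdm(port, NULL);
  if (!IS_ERR(cxlhdm))
          return devm_cxl_enumerate_decoders(cxlhdm, NULL);

  /* only single-target ports may legitimately lack an HDM capability */
  if (PTR_ERR(cxlhdm) != -ENODEV || port->nr_dports != 1)
          return PTR_ERR(cxlhdm);

  dev_dbg(&port->dev, "Fallback to passthrough decoder\n");
  return devm_cxl_add_passthrough_decoder(port);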

Fixes: d17d0540a0 ("cxl/core/hdm: Add CXL standard decoder enumeration to the core")
Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: http://lore.kernel.org/r/20230227153128.8164-1-Jonathan.Cameron@huawei.com
Cc: <stable@vger.kernel.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Link: https://lore.kernel.org/r/168149845130.792294.3210421233937427962.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-04-18 10:32:47 -07:00
Dan Williams
104087a8aa cxl/core: Drop unused io-64-nonatomic-lo-hi.h
After the discovery of a case where an implementation misbehaves with
register reads larger than the definition of the register, the other
usages of readq() were audited and found to be correct. Some cases
where the io-64-nonatomic-lo-hi.h include is not needed were discovered;
delete them.

Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Link: https://lore.kernel.org/r/168149844596.792294.8273108394688012953.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-04-18 10:32:46 -07:00