Similar to how the acpi_nfit driver exports Optane dirty shutdown count,
introduce:
/sys/bus/cxl/devices/nvdimm-bridge0/ndbusX/nmemY/cxl/dirty_shutdown
Under the conditions that 1) dirty shutdown can be set, 2) Device GPF
DVSEC exists, and 3) the count itself can be retrieved.
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://patch.msgid.link/20250220220235.276831-4-dave@stgolabs.net
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Add support for GPF flows. It is found that the CXL specification
around this to be a bit too involved from the driver side. And while
this should really all handled by the hardware, this patch takes
things with a grain of salt.
Upon respective port enumeration, both phase timeouts are set to
a max of 20 seconds, which is the NMI watchdog default for lockup
detection. The premise is that the kernel does not have enough
information to set anything better than a max across the board
and hope devices finish their GPF flows within the platform energy
budget.
Timeout detection is based on dirty Shutdown semantics. The driver
will mark it as dirty, expecting that the device clear it upon a
successful GPF event. The admin may consult the device Health and
check the dirty shutdown counter to see if there was a problem
with data integrity.
[ davej: Explicitly set return to 0 in update_gpf_port_dvsec() ]
[ davej: Add spec reference for 'struct cxl_mbox_set_shutdown_state_in ]
[ davej: Fix 0-day reported issue ]
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Link: https://patch.msgid.link/20250124233533.910535-1-dave@stgolabs.net
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Provide a survey of the work-in-progress maturity (implementation
status) of various aspects of the CXL subsystem.
Clarify that in addition to ongoing upkeep relative to specification
updates, there are some long running themes in the driver that respond
to the discovery of new corner cases (bugs) and new use cases (feature
extensions).
The primary audience is distribution maintainers, but it also serves as
a guide for kernel developers to understand what aspects of the CXL
subsystem need more help. It is a landing page to document ongoing
progress, and a guide to discern exposure to work-in-progress features.
Reviewed-by: Adam Manzanares <a.manzanares@samsung.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Link: https://patch.msgid.link/172005486862.2048248.6668794717827294862.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>