... as is the case with all members of struct cxl_memdev_state.
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Link: https://lore.kernel.org/r/20230726051940.3570-3-dave@stgolabs.net
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
The CXL_FW_CANCEL macro is used with set/test_bit() so it should be a
bit number and not the shifted value. The original code is the
equivalent of using BIT(BIT(0)) so it's 0x2 instead of 0x1. This has
no effect on runtime because it's done consistently and nothing else
was using the 0x2 bit.
Fixes: 9521875bbe ("cxl: add a firmware update mechanism using the sysfs firmware loader")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://lore.kernel.org/r/a11b0c78-4717-4f4e-90be-f47f300d607c@moroto.mountain
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Pick up the first half of the RCH error handling series. The back half
needs some fixups for test regressions. Small conflicts with the PMU
work around register enumeration and setup helpers.
Pick up the driver cleanups identified in preparation for CXL "type-2"
(accelerator) device support. The major change here from a conflict
generation perspective is the split of 'struct cxl_memdev_state' from
the core 'struct cxl_dev_state'. Since an accelerator may not care about
all the optional features that are standard on a CXL "type-3" (host-only
memory expander) device.
A silent conflict also occurs with the move of the endpoint port to be a
formal property of a 'struct cxl_memdev' rather than drvdata.
Add the first typical (non-sanitization) consumer of the new background
command infrastructure, firmware update. Given both firmware-update and
sanitization were developed in parallel from the common
background-command baseline, resolve some minor context conflicts.
The sysfs based firmware loader mechanism was created to easily allow
userspace to upload firmware images to FPGA cards. This also happens to
be pretty suitable to create a user-initiated but kernel-controlled
firmware update mechanism for CXL devices, using the CXL specified
mailbox commands.
Since firmware update commands can be long-running, and can be processed
in the background by the endpoint device, it is desirable to have the
ability to chunk the firmware transfer down to smaller pieces, so that
one operation does not monopolize the mailbox, locking out any other
long running background commands entirely - e.g. security commands like
'sanitize' or poison scanning operations.
The firmware loader mechanism allows a natural way to perform this
chunking, as after each mailbox command, that is restricted to the
maximum mailbox payload size, the cxl memdev driver relinquishes control
back to the fw_loader system and awaits the next chunk of data to
transfer. This opens opportunities for other background commands to
access the mailbox and send their own slices of background commands.
Add the necessary helpers and state tracking to be able to perform the
'Get FW Info', 'Transfer FW', and 'Activate FW' mailbox commands as
described in the CXL spec. Wire these up to the firmware loader
callbacks, and register with that system to create the memX/firmware/
sysfs ABI.
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
Cc: Russ Weight <russell.h.weight@intel.com>
Cc: Alison Schofield <alison.schofield@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Ben Widawsky <bwidawsk@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Link: https://lore.kernel.org/r/20230602-vv-fw_update-v4-1-c6265bd7343b@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Implement support for the non-pmem exclusive secure erase, per
CXL specs. Create a write-only 'security/erase' sysfs file to
perform the requested operation.
As with the sanitation this requires the device being offline
and thus no active HPA-DPA decoding.
The expectation is that userspace can use it such as:
cxl disable-memdev memX
echo 1 > /sys/bus/cxl/devices/memX/security/erase
cxl enable-memdev memX
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Link: https://lore.kernel.org/r/20230612181038.14421-7-dave@stgolabs.net
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Implement support for CXL 3.0 8.2.9.8.5.1 Sanitize. This is done by
adding a security/sanitize' memdev sysfs file to trigger the operation
and extend the status file to make it poll(2)-capable for completion.
Unlike all other background commands, this is the only operation that
is special and monopolizes the device for long periods of time.
In addition to the traditional pmem security requirements, all regions
must also be offline in order to perform the operation. This permits
avoiding explicit global CPU cache management, relying instead on the
implict cache management when a region transitions between
CXL_CONFIG_ACTIVE and CXL_CONFIG_COMMIT.
The expectation is that userspace can use it such as:
cxl disable-memdev memX
echo 1 > /sys/bus/cxl/devices/memX/security/sanitize
cxl wait-sanitize memX
cxl enable-memdev memX
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Link: https://lore.kernel.org/r/20230612181038.14421-5-dave@stgolabs.net
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Sanitization is by definition a device-monopolizing operation, and thus
the timeslicing rules for other background commands do not apply.
As such handle this special case asynchronously and return immediately.
Subsequent changes will allow completion to be pollable from userspace
via a sysfs file interface.
For devices that don't support interrupts for notifying background
command completion, self-poll with the caveat that the poller can
be out of sync with the ready hardware, and therefore care must be
taken to not allow any new commands to go through until the poller
sees the hw completion. The poller takes the mbox_mutex to stabilize
the flagging, minimizing any runtime overhead in the send path to
check for 'sanitize_tmo' for uncommon poll scenarios.
The irq case is much simpler as hardware will serialize/error
appropriately.
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20230612181038.14421-4-dave@stgolabs.net
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Add a read-only sysfs file to display the security state
of a device (currently only pmem):
/sys/bus/cxl/devices/memX/security/state
This introduces a cxl_security_state structure that is
to be the placeholder for common CXL security features.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20230612181038.14421-3-dave@stgolabs.net
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Move the endpoint port that the cxl_mem driver establishes from drvdata
to a first class attribute. This is in preparation for device-memory
drivers reusing the CXL core for memory region management. Those drivers
need a type-safe method to retrieve their CXL port linkage. Leave
drvdata for private usage of the cxl_mem driver not external consumers
of a 'struct cxl_memdev' object.
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/168679264292.3436160.3901392135863405807.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
In support of the Linux CXL core scaling for a wider set of CXL devices,
allow for the creation of memdevs with some memory device capabilities
disabled. Specifically, allow for CXL devices outside of those claiming
to be compliant with the generic CXL memory device class code, like
vendor specific Type-2/3 devices that host CXL.mem. This implies, allow
for the creation of memdevs that only support component-registers, not
necessarily memory-device-registers (like mailbox registers). A memdev
derived from a CXL endpoint that does not support generic class code
expectations is tagged "CXL_DEVTYPE_DEVMEM", while a memdev derived from a
class-code compliant endpoint is tagged "CXL_DEVTYPE_CLASSMEM".
The primary assumption of a CXL_DEVTYPE_DEVMEM memdev is that it
optionally may not host a mailbox. Disable the command passthrough ioctl
for memdevs that are not CXL_DEVTYPE_CLASSMEM, and return empty strings
from memdev attributes associated with data retrieved via the
class-device-standard IDENTIFY command. Note that empty strings were
chosen over attribute visibility to maintain compatibility with shipping
versions of cxl-cli that expect those attributes to always be present.
Once cxl-cli has dropped that requirement this workaround can be
deprecated.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/168679260782.3436160.7587293613945445365.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
'struct cxl_dev_state' makes too many assumptions about the capabilities
of a CXL device. In particular it assumes a CXL device has a mailbox and
all of the infrastructure and state that comes along with that.
In preparation for supporting accelerator / Type-2 devices that may not
have a mailbox and in general maintain a minimal core context structure,
make mailbox functionality a super-set of 'struct cxl_dev_state' with
'struct cxl_memdev_state'.
With this reorganization it allows for CXL devices that support HDM
decoder mapping, but not other general-expander / Type-3 capabilities,
to only enable that subset without the rest of the mailbox
infrastructure coming along for the ride.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/168679260240.3436160.15520641540463704524.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
After Jonathan noticed [1] that 'struct cxl_dev_state' had a kernel-doc
entry without a corresponding struct attribute I ran the kernel-doc
script to see what else might be broken. Fix these warnings:
drivers/cxl/cxlmem.h:199: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
* Event Interrupt Policy
drivers/cxl/cxlmem.h:224: warning: Function parameter or member 'buf' not described in 'cxl_event_state'
drivers/cxl/cxlmem.h:224: warning: Function parameter or member 'log_lock' not described in 'cxl_event_state'
Note that scripts/kernel-doc only finds missing kernel-doc entries. It
does not warn on too many kernel-doc entries, i.e. it did not catch the
fact that @info refers to a not present member.
Link: http://lore.kernel.org/r/20230606121054.000069e1@Huawei.com [1]
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/168679259170.3436160.3686460404739136336.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
For symmetry with the recent rename of ->dport_dev for a 'struct
cxl_dport', add the "_dev" suffix to the ->uport property of a 'struct
cxl_port'. These devices represent the downstream-port-device and
upstream-port-device respectively in the CXL/PCIe topology.
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20230622205523.85375-6-terry.bowman@amd.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
This adds support for handling background operations, as defined in
the CXL 3.0 spec. Commands that can take too long (over ~2 seconds)
can run in the background asynchronously (to the hardware).
The driver will deal with such commands synchronously, blocking all
other incoming commands for a specified period of time, allowing
time-slicing the command such that the caller can send incremental
requests to avoid monopolizing the driver/device. Any out of sync
(timeout) between the driver and hardware is just disregarded as
an invalid state until the next successful submission. Such timeouts
are considered a rare occurrence, either a real device problem or a
driver issue that needs to reduce the size of the background operation
to fit the timeout.
On devices where mbox interrupts are supported, this will still use
a poller that will wakeup in the specified wait intervals. The irq
handler will simply awake the blocked cmd, which is also safe vs a
task that is either waking (timing out) or already awoken. Similarly
any irq setup error during the probing falls back to polling, thus
avoids unnecessarily erroring out.
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Link: https://lore.kernel.org/r/20230523170927.20685-5-dave@stgolabs.net
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Move cxl_await_media_ready() to cxl_pci probe before driver starts issuing
IDENTIFY and retrieving memory device information to ensure that the
device is ready to provide the information. Allow cxl_pci_probe() to succeed
even if media is not ready. Cache the media failure in cxlds and don't ask
the device for any media information.
The rationale for proceeding in the !media_ready case is to allow for
mailbox operations to interrogate and/or remediate the device. After
media is repaired then rebinding the cxl_pci driver is expected to
restart the capacity scan.
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Fixes: b39cb1052a ("cxl/mem: Register CXL memX devices")
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/168445310026.3251520.8124296540679268206.stgit@djiang5-mobl3
[djbw: fixup cxl_test]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
As of CXL 3.0 there have some added return codes, update the
driver accordingly.
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20230307042655.6714-1-dave@stgolabs.net
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
CXL devices optionally support the CLEAR POISON mailbox command. Add
memdev driver support for clearing poison.
Per the CXL Specification (3.0 8.2.9.8.4.3), after receiving a valid
clear poison request, the device removes the address from the device's
Poison List and writes 0 (zero) for 64 bytes starting at address. If
the device cannot clear poison from the address, it returns a permanent
media error and -ENXIO is returned to the user.
Additionally, and per the spec also, it is not an error to clear poison
of an address that is not poisoned.
If the address is not contained in the device's dpa resource, or is
not 64 byte aligned, the driver returns -EINVAL without sending the
command to the device.
Poison clearing is intended for debug only and will be exposed to
userspace through debugfs. Restrict compilation to CONFIG_DEBUG_FS.
Implementation note: Although the CXL specification defines the clear
command to accept 64 bytes of 'write-data', this implementation always
uses zeroes as write-data.
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/8682c30ec24bd9c45af5feccb04b02be51e58c0a.1681874357.git.alison.schofield@intel.com
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
CXL devices optionally support the INJECT POISON mailbox command. Add
memdev driver support for the mailbox command.
Per the CXL Specification (3.0 8.2.9.8.4.2), after receiving a valid
inject poison request, the device will return poison when the address
is accessed through the CXL.mem driver. Injecting poison adds the address
to the device's Poison List and the error source is set to Injected.
In addition, the device adds a poison creation event to its internal
Informational Event log, updates the Event Status register, and if
configured, interrupts the host.
Also, per the CXL Specification, it is not an error to inject poison
into an address that already has poison present and no error is
returned from the device.
If the address is not contained in the device's dpa resource, or is
not 64 byte aligned, return -EINVAL without issuing the mbox command.
Poison injection is intended for debug only and will be exposed to
userspace through debugfs. Restrict compilation to CONFIG_DEBUG_FS.
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/241c64115e6bd2effed9c7a20b08b3908dd7be8f.1681874357.git.alison.schofield@intel.com
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
When a boolean 'true' is written to this attribute the memdev driver
retrieves the poison list from the device. The list consists of
addresses that are poisoned, or would result in poison if accessed,
and the source of the poison. This attribute is only visible for
devices supporting the capability. The retrieved errors are logged
as kernel events when cxl_poison event tracing is enabled.
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/1081cfdc8a349dc754779642d584707e56db26ba.1681838291.git.alison.schofield@intel.com
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
CXL devices maintain a list of locations that are poisoned or result
in poison if the addresses are accessed by the host.
Per the spec, (CXL 3.0 8.2.9.8.4.1), the device returns this Poison
list as a set of Media Error Records that include the source of the
error, the starting device physical address, and length. The length is
the number of adjacent DPAs in the record and is in units of 64 bytes.
Retrieve the poison list.
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/a1f332e817834ef8e89c0ff32e760308fb903346.1681838291.git.alison.schofield@intel.com
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Driver reads of the poison list are synchronized to ensure that a
reader does not get an incomplete list because their request
overlapped (was interrupted or preceded by) another read request
of the same DPA range. (CXL Spec 3.0 Section 8.2.9.8.4.1). The
driver maintains state information to achieve this goal.
To initialize the state, first recognize the poison commands in
the CEL (Command Effects Log). If the device supports Get Poison
List, allocate a single buffer for the poison list and protect it
with a lock.
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Link: https://lore.kernel.org/r/9078d180769be28a5087288b38cdfc827cae58bf.1681838291.git.alison.schofield@intel.com
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The PCI core has just been amended to create a pci_doe_mb struct for
every DOE instance on device enumeration.
Drop creation of a (duplicate) CDAT DOE mailbox on cxl probing in favor
of the one already created by the PCI core.
Tested-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/becaf70e8faf9681d474200117d62d7eaac46cca.1678543498.git.lukas@wunner.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
- CXL RAM region enumeration: instantiate 'struct cxl_region' objects
for platform firmware created memory regions
- CXL RAM region provisioning: complement the existing PMEM region
creation support with RAM region support
- "Soft Reservation" policy change: Online (memory hot-add)
soft-reserved memory (EFI_MEMORY_SP) by default, but still allow for
setting aside such memory for dedicated access via device-dax.
- CXL Events and Interrupts: Takeover CXL event handling from
platform-firmware (ACPI calls this CXL Memory Error Reporting) and
export CXL Events via Linux Trace Events.
- Convey CXL _OSC results to drivers: Similar to PCI, let the CXL
subsystem interrogate the result of CXL _OSC negotiation.
- Emulate CXL DVSEC Range Registers as "decoders": Allow for
first-generation devices that pre-date the definition of the CXL HDM
Decoder Capability to translate the CXL DVSEC Range Registers into
'struct cxl_decoder' objects.
- Set timestamp: Per spec, set the device timestamp in case of hotplug,
or if platform-firwmare failed to set it.
- General fixups: linux-next build issues, non-urgent fixes for
pre-production hardware, unit test fixes, spelling and debug message
improvements.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQSbo+XnGs+rwLz9XGXfioYZHlFsZwUCY/WYcgAKCRDfioYZHlFs
Z6m3APkBUtiEEm1o8ikdu5llUS1OTLBwqjJDwGMTyf8X/WDXhgD+J2mLsCgARS7X
5IS0RAtefutrW5sQpUucPM7QiLuraAY=
=kOXC
-----END PGP SIGNATURE-----
Merge tag 'cxl-for-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull Compute Express Link (CXL) updates from Dan Williams:
"To date Linux has been dependent on platform-firmware to map CXL RAM
regions and handle events / errors from devices. With this update we
can now parse / update the CXL memory layout, and report events /
errors from devices. This is a precursor for the CXL subsystem to
handle the end-to-end "RAS" flow for CXL memory. i.e. the flow that
for DDR-attached-DRAM is handled by the EDAC driver where it maps
system physical address events to a field-replaceable-unit (FRU /
endpoint device). In general, CXL has the potential to standardize
what has historically been a pile of memory-controller-specific error
handling logic.
Another change of note is the default policy for handling RAM-backed
device-dax instances. Previously the default access mode was "device",
mmap(2) a device special file to access memory. The new default is
"kmem" where the address range is assigned to the core-mm via
add_memory_driver_managed(). This saves typical users from wondering
why their platform memory is not visible via free(1) and stuck behind
a device-file. At the same time it allows expert users to deploy
policy to, for example, get dedicated access to high performance
memory, or hide low performance memory from general purpose kernel
allocations. This affects not only CXL, but also systems with
high-bandwidth-memory that platform-firmware tags with the
EFI_MEMORY_SP (special purpose) designation.
Summary:
- CXL RAM region enumeration: instantiate 'struct cxl_region' objects
for platform firmware created memory regions
- CXL RAM region provisioning: complement the existing PMEM region
creation support with RAM region support
- "Soft Reservation" policy change: Online (memory hot-add)
soft-reserved memory (EFI_MEMORY_SP) by default, but still allow
for setting aside such memory for dedicated access via device-dax.
- CXL Events and Interrupts: Takeover CXL event handling from
platform-firmware (ACPI calls this CXL Memory Error Reporting) and
export CXL Events via Linux Trace Events.
- Convey CXL _OSC results to drivers: Similar to PCI, let the CXL
subsystem interrogate the result of CXL _OSC negotiation.
- Emulate CXL DVSEC Range Registers as "decoders": Allow for
first-generation devices that pre-date the definition of the CXL
HDM Decoder Capability to translate the CXL DVSEC Range Registers
into 'struct cxl_decoder' objects.
- Set timestamp: Per spec, set the device timestamp in case of
hotplug, or if platform-firwmare failed to set it.
- General fixups: linux-next build issues, non-urgent fixes for
pre-production hardware, unit test fixes, spelling and debug
message improvements"
* tag 'cxl-for-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (66 commits)
dax/kmem: Fix leak of memory-hotplug resources
cxl/mem: Add kdoc param for event log driver state
cxl/trace: Add serial number to trace points
cxl/trace: Add host output to trace points
cxl/trace: Standardize device information output
cxl/pci: Remove locked check for dvsec_range_allowed()
cxl/hdm: Add emulation when HDM decoders are not committed
cxl/hdm: Create emulated cxl_hdm for devices that do not have HDM decoders
cxl/hdm: Emulate HDM decoder from DVSEC range registers
cxl/pci: Refactor cxl_hdm_decode_init()
cxl/port: Export cxl_dvsec_rr_decode() to cxl_port
cxl/pci: Break out range register decoding from cxl_hdm_decode_init()
cxl: add RAS status unmasking for CXL
cxl: remove unnecessary calling of pci_enable_pcie_error_reporting()
dax/hmem: build hmem device support as module if possible
dax: cxl: add CXL_REGION dependency
cxl: avoid returning uninitialized error code
cxl/pmem: Fix nvdimm registration races
cxl/mem: Fix UAPI command comment
cxl/uapi: Tag commands from cxl_query_cmd()
...
Pick up the CXL DVSEC range register emulation for v6.3, and resolve
conflicts with the cxl_port_probe() split (from for-6.3/cxl-ram-region)
and event handling (from for-6.3/cxl-events).
Call cxl_dvsec_rr_decode() in the beginning of cxl_port_probe() and
preserve the decoded information in a local
'struct cxl_endpoint_dvsec_info'. This info can be passed to various
functions later on in order to support the HDM decoder emulation.
The invocation of cxl_dvsec_rr_decode() in cxl_hdm_decode_init() is
removed and a pointer to the 'struct cxl_endpoint_dvsec_info' is passed
in.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/167640367377.935665.2848747799651019676.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Include the support for enumerating and provisioning ram regions for
v6.3. This also include a default policy change for ram / volatile
device-dax instances to assign them to the dax_kmem driver by default.
CXL_CMD_FLAG_NONE is not used, remove it.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20221222-cxl-misc-v4-1-62f701c1cdd1@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Take two endpoints attached to the first switch on the first host-bridge
in the cxl_test topology and define a pre-initialized region. This is a
x2 interleave underneath a x1 CXL Window.
$ modprobe cxl_test
$ # cxl list -Ru
{
"region":"region3",
"resource":"0xf010000000",
"size":"512.00 MiB (536.87 MB)",
"interleave_ways":2,
"interleave_granularity":4096,
"decode_state":"commit"
}
Tested-by: Fan Ni <fan.ni@samsung.com>
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/167602000547.1924368.11613151863880268868.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Testing of ram region support [1], stimulates a long standing bug in
cxl_detach_ep() where some cxl_ep_remove() cleanup is skipped due to
inability to walk ports after dports have been unregistered. That
results in a failure to re-register a memdev after the port is
re-enabled leading to a crash like the following:
cxl_port_setup_targets: cxl region4: cxl_host_bridge.0:port4 iw: 1 ig: 256
general protection fault, ...
[..]
RIP: 0010:cxl_region_setup_targets+0x897/0x9e0 [cxl_core]
dev_name at include/linux/device.h:700
(inlined by) cxl_port_setup_targets at drivers/cxl/core/region.c:1155
(inlined by) cxl_region_setup_targets at drivers/cxl/core/region.c:1249
[..]
Call Trace:
<TASK>
attach_target+0x39a/0x760 [cxl_core]
? __mutex_unlock_slowpath+0x3a/0x290
cxl_add_to_region+0xb8/0x340 [cxl_core]
? lockdep_hardirqs_on+0x7d/0x100
discover_region+0x4b/0x80 [cxl_port]
? __pfx_discover_region+0x10/0x10 [cxl_port]
device_for_each_child+0x58/0x90
cxl_port_probe+0x10e/0x130 [cxl_port]
cxl_bus_probe+0x17/0x50 [cxl_core]
Change the port ancestry walk to be by depth rather than by dport. This
ensures that even if a port has unregistered its dports a deferred
memdev cleanup will still be able to cleanup the memdev's interest in
that port.
The parent_port->dev.driver check is only needed for determining if the
bottom up removal beat the top-down removal, but cxl_ep_remove() can
always proceed given the port is pinned. That is, the two sources of
cxl_ep_remove() are in cxl_detach_ep() and cxl_port_release(), and
cxl_port_release() can not run if cxl_detach_ep() holds a reference.
Fixes: 2703c16c75 ("cxl/core/port: Add switch port enumeration")
Link: http://lore.kernel.org/r/167564534874.847146.5222419648551436750.stgit@dwillia2-xfh.jf.intel.com [1]
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Link: https://lore.kernel.org/r/167601992789.1924368.8083994227892600608.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
CXL r3.0 section 8.2.9.4.2 "Set Timestamp" recommends that the host sets
the timestamp after every Conventional or CXL Reset to ensure accurate
timestamps. This should include on initial boot up. The time base that
is being set is used by a device for the poison list overflow timestamp
and all event timestamps. Note that the command is optional and if
not supported and the device cannot return accurate timestamps it will
fill the fields in with an appropriate marker (see the specification
description of each timestamp).
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20230130151327.32415-1-Jonathan.Cameron@huawei.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The uevent() callback in struct bus_type should not be modifying the
device that is passed into it, so mark it as a const * and propagate the
function signature changes out into all relevant subsystems that use
this callback.
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20230111113018.459199-16-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
CXL rev 3.0 section 8.2.9.2.1.3 defines the Memory Module Event Record.
Determine if the event read is memory module record and if so trace the
record.
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20221216-cxl-ev-log-v7-5-2316a5c8f7d8@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
CXL rev 3.0 section 8.2.9.2.1.2 defines the DRAM Event Record.
Determine if the event read is a DRAM event record and if so trace the
record.
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20221216-cxl-ev-log-v7-4-2316a5c8f7d8@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
CXL rev 3.0 section 8.2.9.2.1.1 defines the General Media Event Record.
Determine if the event read is a general media record and if so trace
the record as a General Media Event Record.
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20221216-cxl-ev-log-v7-3-2316a5c8f7d8@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Currently the only CXL features targeted for irq support require their
message numbers to be within the first 16 entries. The device may
however support less than 16 entries depending on the support it
provides.
Attempt to allocate these 16 irq vectors. If the device supports less
then the PCI infrastructure will allocate that number. Upon successful
allocation, users can plug in their respective isr at any point
thereafter.
CXL device events are signaled via interrupts. Each event log may have
a different interrupt message number. These message numbers are
reported in the Get Event Interrupt Policy mailbox command.
Add interrupt support for event logs. Interrupts are allocated as
shared interrupts. Therefore, all or some event logs can share the same
message number.
In addition all logs are queried on any interrupt in order of the most
to least severe based on the status register.
Finally place all event configuration logic into cxl_event_config().
Previously the logic was a simple 'read all' on start up. But
interrupts must be configured prior to any reads to ensure no events are
missed. A single event configuration function results in a cleaner over
all implementation.
Cc: Bjorn Helgaas <helgaas@kernel.org>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Co-developed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20221216-cxl-ev-log-v7-2-2316a5c8f7d8@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
CXL devices have multiple event logs which can be queried for CXL event
records. Devices are required to support the storage of at least one
event record in each event log type.
Devices track event log overflow by incrementing a counter and tracking
the time of the first and last overflow event seen.
Software queries events via the Get Event Record mailbox command; CXL
rev 3.0 section 8.2.9.2.2 and clears events via CXL rev 3.0 section
8.2.9.2.3 Clear Event Records mailbox command.
If the result of negotiating CXL Error Reporting Control is OS control,
read and clear all event logs on driver load.
Ensure a clean slate of events by reading and clearing the events on
driver load.
The status register is not used because a device may continue to trigger
events and the only requirement is to empty the log at least once. This
allows for the required transition from empty to non-empty for interrupt
generation. Handling of interrupts is in a follow on patch.
The device can return up to 1MB worth of event records per query.
Allocate a shared large buffer to handle the max number of records based
on the mailbox payload size.
This patch traces a raw event record and leaves specific event record
type tracing to subsequent patches. Macros are created to aid in
tracing the common CXL Event header fields.
Each record is cleared explicitly. A clear all bit is specified but is
only valid when the log overflows.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20221216-cxl-ev-log-v7-1-2316a5c8f7d8@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
cxl_internal_send_cmd() skips output size validation for variable output
commands which is not ideal. Most of the time internal usages want to
fail if the output size does not match what was requested. For other
commands where the caller cannot predict the size there is usually a
a header that conveys how much vaild data is in the payload. For those
cases add @min_out as a parameter to specify what the minimum response
payload needs to be for the caller to parse the rest of the payload.
In this patch only Get Supported Logs has that behavior, but going
forward records retrieval commands like Get Poison List and Get Event
Records can use @min_out to retrieve a variable amount of records.
Critically, this validation scheme skips the needs to interrogate the
cxl_mem_commands array which in turn frees up the implementation to
support internal command enabling without also enabling external / user
commands.
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/167030055918.4044561.10339573829837910505.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Internally cxl_mbox_send_cmd() converts all passed-in parameters to a
'struct cxl_mbox_cmd' instance and sends that to cxlds->mbox_send(). It
then teases the possibilty that the caller can validate the output size.
However, they cannot since the resulting output size is not conveyed to
the called. Fix that by making the caller pass in a constructed 'struct
cxl_mbox_cmd'. This prepares for a future patch to add output size
validation on a per-command basis.
Given the change in signature, also change the name to differentiate it
from the user command submission path that performs more validation
before generating the 'struct cxl_mbox_cmd' instance to execute.
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/167030055370.4044561.17788093375112783036.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Unlike a CXL memory expander in a VH topology that has at least one
intervening 'struct cxl_port' instance between itself and the CXL root
device, an RCD attaches one-level higher. For example:
VH
┌──────────┐
│ ACPI0017 │
│ root0 │
└─────┬────┘
│
┌─────┴────┐
│ dport0 │
┌─────┤ ACPI0016 ├─────┐
│ │ port1 │ │
│ └────┬─────┘ │
│ │ │
┌──┴───┐ ┌──┴───┐ ┌───┴──┐
│dport0│ │dport1│ │dport2│
│ RP0 │ │ RP1 │ │ RP2 │
└──────┘ └──┬───┘ └──────┘
│
┌───┴─────┐
│endpoint0│
│ port2 │
└─────────┘
...vs:
RCH
┌──────────┐
│ ACPI0017 │
│ root0 │
└────┬─────┘
│
┌───┴────┐
│ dport0 │
│ACPI0016│
└───┬────┘
│
┌────┴─────┐
│endpoint0 │
│ port1 │
└──────────┘
So arrange for endpoint port in the RCH/RCD case to appear directly
connected to the host-bridge in its singular role as a dport. Compare
that to the VH case where the host-bridge serves a dual role as a
'cxl_dport' for the CXL root device *and* a 'cxl_port' upstream port for
the Root Ports in the Root Complex that are modeled as 'cxl_dport'
instances in the CXL topology.
Another deviation from the VH case is that RCDs may need to look up
their component registers from the Root Complex Register Block (RCRB).
That platform firmware specified RCRB area is cached by the cxl_acpi
driver and conveyed via the host-bridge dport to the cxl_mem driver to
perform the cxl_rcrb_to_component() lookup for the endpoint port
(See 9.11.8 CXL Devices Attached to an RCH for the lookup of the
upstream port component registers).
Tested-by: Robert Richter <rrichter@amd.com>
Link: https://lore.kernel.org/r/166993045621.1882361.1730100141527044744.stgit@dwillia2-xfh.jf.intel.com
Reviewed-by: Robert Richter <rrichter@amd.com>
Reviewed-by: Jonathan Camerom <Jonathan.Cameron@huawei.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
tl;dr: Clean up an unnecessary export and enable cxl_test.
An RCD (Restricted CXL Device), in contrast to a typical CXL device in
a VH topology, obtains its component registers from the bottom half of
the associated CXL host bridge RCRB (Root Complex Register Block). In
turn this means that cxl_rcrb_to_component() needs to be called from
devm_cxl_add_endpoint().
Presently devm_cxl_add_endpoint() is part of the CXL core, but the only
user is the CXL mem module. Move it from cxl_core to cxl_mem to not only
get rid of an unnecessary export, but to also enable its call out to
cxl_rcrb_to_component(), in a subsequent patch, to be mocked by
cxl_test. Recall that cxl_test can only mock exported symbols, and since
cxl_rcrb_to_component() is itself inside the core, all callers must be
outside of cxl_core to allow cxl_test to mock it.
Reviewed-by: Robert Richter <rrichter@amd.com>
Link: https://lore.kernel.org/r/166993045072.1882361.13944923741276843683.stgit@dwillia2-xfh.jf.intel.com
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Add nominal error handling that tears down CXL.mem in response to error
notifications that imply a device reset. Given some CXL.mem may be
operating as System RAM, there is a high likelihood that these error
events are fatal. However, if the system survives the notification the
expectation is that the driver behavior is equivalent to a hot-unplug
and re-plug of an endpoint.
Note that this does not change the mask values from the default. That
awaits CXL _OSC support to determine whether platform firmware is in
control of the mask registers.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/166974413966.1608150.15522782911404473932.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The three objects 'struct cxl_nvdimm_bridge', 'struct cxl_nvdimm', and
'struct cxl_pmem_region' manage CXL persistent memory resources. The
bridge represents base platform resources, the nvdimm represents one or
more endpoints, and the region is a collection of nvdimms that
contribute to an assembled address range.
Their relationship is such that a region is torn down if any component
endpoints are removed. All regions and endpoints are torn down if the
foundational bridge device goes down.
A workqueue was deployed to manage these interdependencies, but it is
difficult to reason about, and fragile. A recent attempt to take the CXL
root device lock in the cxl_mem driver was reported by lockdep as
colliding with the flush_work() in the cxl_pmem flows.
Instead of the workqueue, arrange for all pmem/nvdimm devices to be torn
down immediately and hierarchically. A similar change is made to both
the 'cxl_nvdimm' and 'cxl_pmem_region' objects. For bisect-ability both
changes are made in the same patch which unfortunately makes the patch
bigger than desired.
Arrange for cxl_memdev and cxl_region to register a cxl_nvdimm and
cxl_pmem_region as a devres release action of the bridge device.
Additionally, include a devres release action of the cxl_memdev or
cxl_region device that triggers the bridge's release action if an endpoint
exits before the bridge. I.e. this allows either unplugging the bridge,
or unplugging and endpoint to result in the same cleanup actions.
To keep the patch smaller the cleanup of the now defunct workqueue
infrastructure is saved for a follow-on patch.
Tested-by: Robert Richter <rrichter@amd.com>
Link: https://lore.kernel.org/r/166993041773.1882361.16444301376147207609.stgit@dwillia2-xfh.jf.intel.com
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Create callback function to support the nvdimm_security_ops() ->erase()
callback. Translate the operation to send "Passphrase Secure Erase"
security command for CXL memory device.
When the mem device is secure erased, cpu_cache_invalidate_memregion() is
called in order to invalidate all CPU caches before attempting to access
the mem device again.
See CXL 3.0 spec section 8.2.9.8.6.6 for reference.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/166983615293.2734609.10358657600295932156.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Create callback function to support the nvdimm_security_ops() ->unlock()
callback. Translate the operation to send "Unlock" security command for CXL
mem device.
When the mem device is unlocked, cpu_cache_invalidate_memregion() is called
in order to invalidate all CPU caches before attempting to access the mem
device.
See CXL rev3.0 spec section 8.2.9.8.6.4 for reference.
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/166983614167.2734609.15124543712487741176.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Create callback function to support the nvdimm_security_ops() ->freeze()
callback. Translate the operation to send "Freeze Security State" security
command for CXL memory device.
See CXL rev3.0 spec section 8.2.9.8.6.5 for reference.
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/166983613019.2734609.10645754779802492122.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Create callback function to support the nvdimm_security_ops ->disable()
callback. Translate the operation to send "Disable Passphrase" security
command for CXL memory device. The operation supports disabling a
passphrase for the CXL persistent memory device. In the original
implementation of nvdimm_security_ops, this operation only supports
disabling of the user passphrase. This is due to the NFIT version of
disable passphrase only supported disabling of user passphrase. The CXL
spec allows disabling of the master passphrase as well which
nvidmm_security_ops does not support yet. In this commit, the callback
function will only support user passphrase.
See CXL rev3.0 spec section 8.2.9.8.6.3 for reference.
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/166983611878.2734609.10602135274526390127.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Create callback function to support the nvdimm_security_ops ->change_key()
callback. Translate the operation to send "Set Passphrase" security command
for CXL memory device. The operation supports setting a passphrase for the
CXL persistent memory device. It also supports the changing of the
currently set passphrase. The operation allows manipulation of a user
passphrase or a master passphrase.
See CXL rev3.0 spec section 8.2.9.8.6.2 for reference.
However, the spec leaves a gap WRT master passphrase usages. The spec does
not define any ways to retrieve the status of if the support of master
passphrase is available for the device, nor does the commands that utilize
master passphrase will return a specific error that indicates master
passphrase is not supported. If using a device does not support master
passphrase and a command is issued with a master passphrase, the error
message returned by the device will be ambiguous.
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/166983610751.2734609.4445075071552032091.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Add nvdimm_security_ops support for CXL memory device with the introduction
of the ->get_flags() callback function. This is part of the "Persistent
Memory Data-at-rest Security" command set for CXL memory device support.
The ->get_flags() function provides the security state of the persistent
memory device defined by the CXL 3.0 spec section 8.2.9.8.6.1.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/166983609611.2734609.13231854299523325319.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
CXL regions (interleave sets) are made up of a set of memory devices
where each device maps a portion of the interleave with one of its
decoders (see CXL 2.0 8.2.5.12 CXL HDM Decoder Capability Structure).
As endpoint decoders are identified by a provisioning tool they can be
added to a region provided the region interleave properties are set
(way, granularity, HPA) and DPA has been assigned to the decoder.
The attach event triggers several validation checks, for example:
- is the DPA sized appropriately for the region
- is the decoder reachable via the host-bridges identified by the
region's root decoder
- is the device already active in a different region position slot
- are there already regions with a higher HPA active on a given port
(per CXL 2.0 8.2.5.12.20 Committing Decoder Programming)
...and the attach event affords an opportunity to collect data and
resources relevant to later programming the target lists in switch
decoders, for example:
- allocate a decoder at each cxl_port in the decode chain
- for a given switch port, how many the region's endpoints are hosted
through the port
- how many unique targets (next hops) does a port need to map to reach
those endpoints
The act of reconciling this information and deploying it to the decoder
configuration is saved for a follow-on patch.
Co-developed-by: Ben Widawsky <bwidawsk@kernel.org>
Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/165784337277.1758207.4108508181328528703.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
In preparation for provisioning CXL regions, add accounting for the DPA
space consumed by existing regions / decoders. Recall, a CXL region is a
memory range comprised from one or more endpoint devices contributing a
mapping of their DPA into HPA space through a decoder.
Record the DPA ranges covered by committed decoders at initial probe of
endpoint ports relative to a per-device resource tree of the DPA type
(pmem or volatile-ram).
The cxl_dpa_rwsem semaphore is introduced to globally synchronize DPA
state across all endpoints and their decoders at once. The vast majority
of DPA operations are reads as region creation is expected to be as rare
as disk partitioning and volume creation. The device_lock() for this
synchronization is specifically avoided for concern of entangling with
sysfs attribute removal.
Co-developed-by: Ben Widawsky <bwidawsk@kernel.org>
Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/165784327682.1758207.7914919426043855876.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
DOE mailbox objects will be needed for various mailbox communications
with each memory device.
Iterate each DOE mailbox capability and create PCI DOE mailbox objects
as found.
It is not anticipated that this is the final resting place for the
iteration of the DOE devices. The support of switch ports will drive
this code into the PCIe side. In this imagined architecture the CXL
port driver would then query into the PCI device for the DOE mailbox
array.
For now creating the mailboxes in the CXL port is good enough for the
endpoints. Later PCIe ports will need to support this to support switch
ports more generically.
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20220719205249.566684-5-ira.weiny@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
In support of testing DPA allocation mechanisms in the CXL core, the
cxl_test environment needs to support establishing and retrieving the
'pmem partition boundary.
Replace the platform_device_add_resources() method for delineating DPA
within an endpoint with an emulated DEV_SIZE amount of partitionable
capacity. Set DEV_SIZE such that an endpoint has enough capacity to
simultaneously participate in 8 distinct regions.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/165603887411.551046.13234212587991192347.stgit@dwillia2-xfh
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dump the device-physical-address map for a CXL expander in /proc/iomem
style format. E.g.:
cat /sys/kernel/debug/cxl/mem1/dpamem
00000000-0fffffff : ram
10000000-1fffffff : pmem
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/165603885318.551046.8308248564880066726.stgit@dwillia2-xfh
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
To date the per-device-partition DPA range information has only been
used for enumeration purposes. In preparation for allocating regions
from available DPA capacity, convert those ranges into DPA-type resource
trees.
With resources and the new add_dpa_res() helper some open coded end
address calculations and debug prints can be cleaned.
The 'cxlds->pmem_res' and 'cxlds->ram_res' resources are child resources
of the total-device DPA space and they in turn will host DPA allocations
from cxl_endpoint_decoder instances (tracked by cxled->dpa_res).
Cc: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/165603878921.551046.8127845916514734142.stgit@dwillia2-xfh
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
CXL specification defines these as little endian.
Fixes: 60b8f17215 ("cxl/pmem: Translate NVDIMM label commands to CXL label commands")
Reported-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Link: https://lore.kernel.org/r/20220225221456.1025635-1-alison.schofield@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
In preparation for fixing the setting of the 'mem_enabled' bit in CXL
DVSEC Control register, move all CXL DVSEC range enumeration into the
same source file.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/165291688886.1426646.15046138604010482084.stgit@dwillia2-xfh
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Allow cxl_await_media_ready() to be mocked for testing purposes rather
than carrying the maintenance burden of an indirect function call in the
mainline driver.
With the move cxl_await_media_ready() can no longer reuse the mailbox
timeout override, so add a media_ready_timeout module parameter to the
core to backfill.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/165291688340.1426646.4755627801983775011.stgit@dwillia2-xfh
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The CXL specification claims S3 support at a hardware level, but at a
system software level there are some missing pieces. Section 9.4 (CXL
2.0) rightly claims that "CXL mem adapters may need aux power to retain
memory context across S3", but there is no enumeration mechanism for the
OS to determine if a given adapter has that support. Moreover the save
state and resume image for the system may inadvertantly end up in a CXL
device that needs to be restored before the save state is recoverable.
I.e. a circular dependency that is not resolvable without a third party
save-area.
Arrange for the cxl_mem driver to fail S3 attempts. This still nominaly
allows for suspend, but requires unbinding all CXL memory devices before
the suspend to ensure the typical DRAM flow is taken. The cxl_mem unbind
flow is intended to also tear down all CXL memory regions associated
with a given cxl_memdev.
It is reasonable to assume that any device participating in a System RAM
range published in the EFI memory map is covered by aux power and
save-area outside the device itself. So this restriction can be
minimized in the future once pre-existing region enumeration support
arrives, and perhaps a spec update to clarify if the EFI memory map is
sufficent for determining the range of devices managed by
platform-firmware for S3 support.
Per Rafael, if the CXL configuration prevents suspend then it should
fail early before tasks are frozen, and mem_sleep should stop showing
'mem' as an option [1]. Effectively CXL augments the platform suspend
->valid() op since, for example, the ACPI ops are not aware of the CXL /
PCI dependencies. Given the split role of platform firmware vs OS
provisioned CXL memory it is up to the cxl_mem driver to determine if
the CXL configuration has elements that platform firmware may not be
prepared to restore.
Link: https://lore.kernel.org/r/CAJZ5v0hGVN_=3iU8OLpHY3Ak35T5+JcBM-qs8SbojKrpd0VXsA@mail.gmail.com [1]
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Len Brown <len.brown@intel.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/165066828317.3907920.5690432272182042556.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Upon a completed command the caller is still expected to check
the actual return_code register to ensure it succeed. This
adds, per the spec, the potential command return codes. It maps
the hardware return code with the kernel's errno style, and by
default continues to use -ENXIO (Command completed, but device
reported an error).
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed by: Adam Manzanares <a.manzanares@samsung.com>
Link: https://lore.kernel.org/r/20220404021216.66841-4-dave@stgolabs.net
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
User space may send the SET_PARTITION_INFO mailbox command using
the IOCTL interface. Inspect the input payload and fail if the
immediate flag is set.
This is the first instance of the driver inspecting an input payload
from user space. Assume there will be more such cases and implement
with an extensible helper.
In order for the kernel to react to an immediate partition change it
needs to assert that the change will not affect any active decode. At
a minimum this requires validating that the device is using HDM
decoders instead of the CXL DVSEC for decode, and that none of the
active HDM decoders are affected by the partition change. For now,
just fail until that support arrives.
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Link: https://lore.kernel.org/r/241821186c363833980adbc389e2c547bc5a6395.1648687552.git.alison.schofield@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
At this point the subsystem can enumerate all CXL ports (CXL.mem decode
resources in upstream switch ports and host bridges) in a system. The
last mile is connecting those ports to endpoints.
The cxl_mem driver connects an endpoint device to the platform CXL.mem
protoctol decode-topology. At ->probe() time it walks its
device-topology-ancestry and adds a CXL Port object at every Upstream
Port hop until it gets to CXL root. The CXL root object is only present
after a platform firmware driver registers platform CXL resources. For
ACPI based platform this is managed by the ACPI0017 device and the
cxl_acpi driver.
The ports are registered such that disabling a given port automatically
unregisters all descendant ports, and the chain can only be registered
after the root is established.
Given ACPI device scanning may run asynchronously compared to PCI device
scanning the root driver is tasked with rescanning the bus after the
root successfully probes.
Conversely if any ports in a chain between the root and an endpoint
becomes disconnected it subsequently triggers the endpoint to
unregister. Given lock depenedencies the endpoint unregistration happens
in a workqueue asynchronously. If userspace cares about synchronizing
delayed work after port events the /sys/bus/cxl/flush attribute is
available for that purpose.
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
[djbw: clarify changelog, rework hotplug support]
Link: https://lore.kernel.org/r/164398782997.903003.9725273241627693186.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Per the CXL specification (8.1.12.2 Memory Device PCIe Capabilities and
Extended Capabilities) the Device Serial Number capability is mandatory.
Emit it for user tooling to identify devices.
It is reasonable to ask whether the attribute should be added to the
list of PCI sysfs device attributes. The PCI layer can optionally emit
it too, but the CXL subsystem is aiming to preserve its independence and
the possibility of CXL topologies with non-PCI devices in it. To date
that has only proven useful for the 'cxl_test' model, but as can be seen
with seen with ACPI0016 devices, sometimes all that is needed is a
platform firmware table to point to CXL Component Registers in MMIO
space to define a "CXL" device.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/164366608838.196598.16856227191534267098.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
CXL 2.0 8.1.3.8.2 states:
Memory_Active: When set, indicates that the CXL Range 1 memory is
fully initialized and available for software use. Must be set within
Range 1. Memory_Active_Timeout of deassertion of reset to CXL device
if CXL.mem HwInit Mode=1
Unfortunately, Memory_Active can take quite a long time depending on
media size (up to 256s per 2.0 spec). Provide a callback for the
eventual establishment of CXL.mem operations via the 'cxl_mem' driver
the 'struct cxl_memdev'. The implementation waits for 60s by default for
now and can be overridden by the mbox_ready_time module parameter.
Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
[djbw: switch to sleeping wait]
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/164298427373.3018233.9309741847039301834.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Before CXL 2.0 HDM Decoder Capability mechanisms can be utilized in a
device the driver must determine that the device is ready for CXL.mem
operation and that platform firmware, or some other agent, has
established an active decode via the legacy CXL 1.1 decoder mechanism.
This legacy mechanism is defined in the CXL DVSEC as a set of range
registers and status bits that take time to settle after a reset.
Validate the CXL memory decode setup via the DVSEC and cache it for
later consideration by the cxl_mem driver (to be added). Failure to
validate is not fatal to the cxl_pci driver since that is only providing
CXL command support over PCI.mmio, and might be needed to rectify CXL
DVSEC validation problems.
Any potential ranges that the device is already claiming via DVSEC need
to be reconciled with the dynamic provisioning ranges provided by
platform firmware (like ACPI CEDT.CFMWS). Leave that reconciliation to
the cxl_mem driver.
[djbw: shorten defines]
[djbw: change precise spin wait to generous msleep]
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
[djbw: clarify changelog]
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/164375911821.559935.7375160041663453400.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The PCIe device DVSEC, defined in the CXL 2.0 spec, 8.1.3 is required to
be implemented by CXL 2.0 endpoint devices. In preparation for consuming
this information in a new cxl_mem driver, retrieve the CXL DVSEC
position and warn about the implications of not finding it. Allow for
mailbox operation even if the CXL DVSEC is missing.
Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/164375309615.513620.7874131241128599893.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
In preparation for defining a cxl_port object to represent the decoder
resources of a memory expander capture the component register base
address.
The port driver uses the component register base to enumerate the HDM
Decoder Capability structure. Unlike other cxl_port objects the endpoint
port decodes from upstream SPA to downstream DPA rather than upstream
port to downstream port.
Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
[djbw: clarify changelog]
Link: https://lore.kernel.org/r/164375084181.484304.3919737667590006795.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Unlike the decoder enumeration for "root decoders" described by platform
firmware, standard decoders can be enumerated from the component
registers space once the base address has been identified (via PCI,
ACPI, or another mechanism).
Add common infrastructure for HDM (Host-managed-Device-Memory) Decoder
enumeration and share it between host-bridge, upstream switch port, and
cxl_test defined decoders.
The locking model for switch level decoders is to hold the port lock
over the enumeration. This facilitates moving the dport and decoder
enumeration to a 'port' driver. For now, the only enumerator of decoder
resources is the cxl_acpi root driver.
Co-developed-by: Ben Widawsky <ben.widawsky@intel.com>
Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/164374688404.395335.9239248252443123526.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The 'struct cxl_mem' object actually represents the state of a CXL
device within the driver. Comments indicating that 'struct cxl_mem' is a
device itself are incorrect. It is data layered on top of a CXL Memory
Expander class device. Rename it 'struct cxl_dev_state'. The 'struct'
cxl_memdev' structure represents a Linux CXL memory device object, and
it uses services and information provided by 'struct cxl_dev_state'.
Update the structure name, function names, and the kdocs to reflect the
real uses of this structure.
Some helper functions that were previously prefixed "cxl_mem_" are
renamed to just "cxl_".
Acked-by: Ben Widawsky <ben.widawsky@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20211102202901.3675568-3-ira.weiny@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
In preparation for cxl_test to mock responses to mailbox command
requests, move some definitions from core/mbox.c to cxlmem.h.
No functional changes intended.
Acked-by: Ben Widawsky <ben.widawsky@intel.com>
Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/163116439547.2460985.10457111177103589574.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The CXL_PMEM driver expects exclusive control of the label storage area
space. Similar to the LIBNVDIMM expectation that the label storage area
is only writable from userspace when the corresponding memory device is
not active in any region, the expectation is the native CXL_PCI UAPI
path is disabled while the cxl_nvdimm for a given cxl_memdev device is
active in LIBNVDIMM.
Add the ability to toggle the availability of a given command for the
UAPI path. Use that new capability to shutdown changes to partitions and
the label storage area while the cxl_nvdimm device is actively proxying
commands for LIBNVDIMM.
Reviewed-by: Ben Widawsky <ben.widawsky@intel.com>
Link: https://lore.kernel.org/r/163164579468.2830966.6980053377428474263.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Define enabled_cmds as an embedded member of 'struct cxl_mem' rather
than a pointer to another dynamic allocation.
As this leaves only one user of cxl_cmd_count, just open code it and
delete the helper.
Acked-by: Ben Widawsky <ben.widawsky@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/163116436415.2460985.10101824045493194813.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Now that the internals of mailbox operations are abstracted from the PCI
specifics a bulk of infrastructure can move to the core.
The CXL_PMEM driver intends to proxy LIBNVDIMM UAPI and driver requests
to the equivalent functionality provided by the CXL hardware mailbox
interface. In support of that intent move the mailbox implementation to
a shared location for the CXL_PCI driver native IOCTL path and CXL_PMEM
nvdimm command proxy path to share.
A unit test framework seeks to implement a unit test backend transport
for mailbox commands to communicate mocked up payloads. It can reuse all
of the mailbox infrastructure minus the PCI specifics, so that also gets
moved to the core.
Finally with the mailbox infrastructure and ioctl handling being
transport generic there is no longer any need to pass file
file_operations to devm_cxl_add_memdev(). That allows all the ioctl
boilerplate to move into the core for unit test reuse.
No functional change intended, just code movement.
Acked-by: Ben Widawsky <ben.widawsky@intel.com>
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/163116435233.2460985.16197340449713287180.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
In preparation for implementing a unit test backend transport for ioctl
operations, and making the mailbox available to the cxl/pmem
infrastructure, move the existing PCI specific portion of mailbox handling
to an "mbox_send" operation.
With this split all the PCI-specific transport details are comprehended
by a single operation and the rest of the mailbox infrastructure is
'struct cxl_mem' and 'struct cxl_memdev' generic.
Acked-by: Ben Widawsky <ben.widawsky@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Ben Widawsky <ben.widawsky@intel.com>
Link: https://lore.kernel.org/r/163116434098.2460985.9004760022659400540.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Commit 0b9159d0ff ("cxl/pci: Store memory capacity values") missed
updating the kernel-doc for 'struct cxl_mem' leading to the following
warnings:
./scripts/kernel-doc -v drivers/cxl/cxlmem.h 2>&1 | grep warn
drivers/cxl/cxlmem.h:107: warning: Function parameter or member 'total_bytes' not described in 'cxl_mem'
drivers/cxl/cxlmem.h:107: warning: Function parameter or member 'volatile_only_bytes' not described in 'cxl_mem'
drivers/cxl/cxlmem.h:107: warning: Function parameter or member 'persistent_only_bytes' not described in 'cxl_mem'
drivers/cxl/cxlmem.h:107: warning: Function parameter or member 'partition_align_bytes' not described in 'cxl_mem'
drivers/cxl/cxlmem.h:107: warning: Function parameter or member 'active_volatile_bytes' not described in 'cxl_mem'
drivers/cxl/cxlmem.h:107: warning: Function parameter or member 'active_persistent_bytes' not described in 'cxl_mem'
drivers/cxl/cxlmem.h:107: warning: Function parameter or member 'next_volatile_bytes' not described in 'cxl_mem'
drivers/cxl/cxlmem.h:107: warning: Function parameter or member 'next_persistent_bytes' not described in 'cxl_mem'
Also, it is redundant to describe those same parameters in the
kernel-doc for cxl_mem_get_partition_info(). Given the only user of that
routine updates the values in @cxlm, just do that implicitly internal to
the helper.
Cc: Ira Weiny <ira.weiny@intel.com>
Reported-by: Ben Widawsky <ben.widawsky@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/163157174216.2653013.1277706528753990974.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
In preparation for adding a unit test provider of a cxl_memdev, convert
the 'struct cxl_mem' driver context to carry a generic device rather
than a pci device.
Note, some dev_dbg() lines needed extra reformatting per clang-format.
This conversion also allows the cxl_mem_create() and
devm_cxl_add_memdev() calling conventions to be simplified. The "host"
for a cxl_memdev, must be the same device for the driver that allocated
@cxlm.
Acked-by: Ben Widawsky <ben.widawsky@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Ben Widawsky <ben.widawsky@intel.com>
Link: https://lore.kernel.org/r/163116432973.2460985.7553504957932024222.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Memory devices may specify volatile only, persistent only, and
partitionable space which when added together result in a total capacity.
If Identify Memory Device.Partition Alignment != 0 the device supports
partitionable space. This partitionable space can be split between
volatile and persistent space. The total volatile and persistent sizes
are reported in Get Partition Info. ie
active volatile memory = volatile only + partitionable volatile
active persistent memory = persistent only + partitionable persistent
Define cxl_mem_get_partition(), check for partitionable support, and use
cxl_mem_get_partition() if applicable.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The Identify Memory Device command returns information about the
volatile only and persistent only memory capacities. Store those values
in the cxl_mem structure for later use. While at it, reuse those
calculations to calculate the ram and pmem ranges.
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20210617221620.1904031-2-ira.weiny@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The motivation for moving cxl_memdev allocation to the core (beyond
better file organization of sysfs attributes in core/ and drivers in
cxl/), is that device lifetime is longer than module lifetime. The cxl_pci
module should be free to come and go without needing to coordinate with
devices that need the text associated with cxl_memdev_release() to stay
resident. The move fixes a use after free bug when looping driver
load / unload with CONFIG_DEBUG_KOBJECT_RELEASE=y.
Another motivation for disconnecting cxl_memdev creation from cxl_pci is
to enable other drivers, like a unit test driver, to registers memdevs.
Fixes: b39cb1052a ("cxl/mem: Register CXL memX devices")
Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/162792540495.368511.9748638751088219595.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
In preparation for moving cxl_memdev allocation to the core, introduce
cdevm_file_operations to coordinate file operations shutdown relative to
driver data release.
The motivation for moving cxl_memdev allocation to the core (beyond
better file organization of sysfs attributes in core/ and drivers in
cxl/), is that device lifetime is longer than module lifetime. The cxl_pci
module should be free to come and go without needing to coordinate with
devices that need the text associated with cxl_memdev_release() to stay
resident. The move will fix a use after free bug when looping driver
load / unload with CONFIG_DEBUG_KOBJECT_RELEASE=y.
Another motivation for passing in file_operations to the core cxl_memdev
creation flow is to allow for alternate drivers, like unit test code, to
define their own ioctl backends.
Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/162792539962.368511.2962268954245340288.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
CXL core is growing, and it's already arguably unmanageable. To support
future growth, move core functionality to a new directory and rename the
file to represent just bus support. Future work will remove non-bus
functionality.
Note that mem.h is renamed to cxlmem.h to avoid a namespace collision
with the global ARCH=um mem.h header.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/162792537866.368511.8915631504621088321.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>