For symmetry with the recent rename of ->dport_dev for a 'struct
cxl_dport', add the "_dev" suffix to the ->uport property of a 'struct
cxl_port'. These devices represent the downstream-port-device and
upstream-port-device respectively in the CXL/PCIe topology.
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20230622205523.85375-6-terry.bowman@amd.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reading code like dport->dport does not immediately suggest that this
points to the corresponding device structure of the dport. Rename
struct member @dport to @dport_dev.
While at it, also rename @new argument of add_dport() to @dport. This
better describes the variable as a dport (e.g. new->dport becomes to
dport->dport_dev).
Co-developed-by: Terry Bowman <terry.bowman@amd.com>
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Signed-off-by: Robert Richter <rrichter@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20230622205523.85375-5-terry.bowman@amd.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Prepare cxl_probe_rcrb() for retrieving more than just the component
register block. The RCH AER handling code wants to get back to the AER
capability that happens to be MMIO mapped rather then configuration
cycles.
Move RCRB specific downstream port data, like the RCRB base and the
AER capability offset, into its own data structure ('struct
cxl_rcrb_info') for cxl_probe_rcrb() to fill. Extend 'struct
cxl_dport' to include a 'struct cxl_rcrb_info' attribute.
This centralizes all RCRB scanning in one routine.
Co-developed-by: Robert Richter <rrichter@amd.com>
Signed-off-by: Robert Richter <rrichter@amd.com>
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20230622205523.85375-4-terry.bowman@amd.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The RCRB is extracted already during ACPI CEDT table parsing while the
data of this is needed not earlier than dport creation. This
implementation comes with drawbacks: During ACPI table scan there is
already MMIO access including mapping and unmapping, but only ACPI
data should be collected here. The collected data must be transferred
through a couple of interfaces until it is finally consumed when
creating the dport. This causes complex data structures and function
interfaces. Additionally, RCRB parsing will be extended to also
extract AER data, it would be much easier do this at a later point
during port and dport creation when the data structures are available
to hold that data.
To simplify all that, probe the RCRB at a later point during RCH
downstream port creation. Change ACPI table parser to only extract the
base address of either the component registers or the RCRB. Parse and
extract the RCRB in devm_cxl_add_rch_dport().
This is in preparation to centralize all RCRB scanning.
Signed-off-by: Robert Richter <rrichter@amd.com>
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20230622205523.85375-2-terry.bowman@amd.com
Co-developed-by: Dan Williams <dan.j.williams@intel.com>
Link: https://lore.kernel.org/r/20230622205523.85375-3-terry.bowman@amd.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
CXL PMU devices can be found from entries in the Register
Locator DVSEC.
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20230526095824.16336-4-Jonathan.Cameron@huawei.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Until the recently release CXL 3.0 specification, there
was only ever one instance of any given register block pointed
to by the Register Block Locator DVSEC. Now, the specification allows
for multiple CXL PMU instances, each with their own register block.
To enable this add cxl_find_regblock_instance() that takes an index
parameter and use that to implement cxl_count_regblock() and
cxl_find_regblock().
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20230526095824.16336-3-Jonathan.Cameron@huawei.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
This adds support for handling background operations, as defined in
the CXL 3.0 spec. Commands that can take too long (over ~2 seconds)
can run in the background asynchronously (to the hardware).
The driver will deal with such commands synchronously, blocking all
other incoming commands for a specified period of time, allowing
time-slicing the command such that the caller can send incremental
requests to avoid monopolizing the driver/device. Any out of sync
(timeout) between the driver and hardware is just disregarded as
an invalid state until the next successful submission. Such timeouts
are considered a rare occurrence, either a real device problem or a
driver issue that needs to reduce the size of the background operation
to fit the timeout.
On devices where mbox interrupts are supported, this will still use
a poller that will wakeup in the specified wait intervals. The irq
handler will simply awake the blocked cmd, which is also safe vs a
task that is either waking (timing out) or already awoken. Similarly
any irq setup error during the probing falls back to polling, thus
avoids unnecessarily erroring out.
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Link: https://lore.kernel.org/r/20230523170927.20685-5-dave@stgolabs.net
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Derick noticed, when testing hot plug, that hot-add behaves nominally
after a removal. However, if the hot-add is done without a prior
removal, CXL.mem accesses fail. It turns out that the original
implementation of the port driver and region programming wrongly assumed
that platform-firmware always enables the host-bridge HDM decoder
capability. Add support turning on switch-level HDM decoders in the case
where platform-firmware has not.
The implementation is careful to only arrange for the enable to be
undone if the current instance of the driver was the one that did the
enable. This is to interoperate with platform-firmware that may expect
CXL.mem to remain active after the driver is shutdown. This comes at the
cost of potentially not shutting down the enable on kexec flows, but it
is mitigated by the fact that the related HDM decoders still need to be
enabled on an individual basis.
Cc: <stable@vger.kernel.org>
Reported-by: Derick Marks <derick.w.marks@intel.com>
Fixes: 54cdbf845c ("cxl/port: Add a driver for 'struct cxl_port' objects")
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/168437998331.403037.15719879757678389217.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The find_cxl_root() helper is used to lookup root decoders and other CXL
platform topology information for a given endpoint. It turns out that
for RCDs it has never worked. The result of find_cxl_root(&cxlmd->dev)
is always NULL for the RCH topology case because it expects to find a
cxl_port at the host-bridge. RCH topologies only have the root cxl_port
object with the host-bridge as a dport. While there are no reports of
this being a problem to date, by inspection region enumeration should
crash as a result of this problem, and it does in a local unit test for
this scenario.
However, an observation that ever since:
commit f17b558d66 ("cxl/pmem: Refactor nvdimm device registration, delete the workqueue")
...all callers of find_cxl_root() occur after the memdev connection to
the port topology has been established. That means that find_cxl_root()
can be simplified to a walk of the endpoint port topology to the root.
Switch to that arrangement which also fixes the RCD bug.
Fixes: a32320b71f ("cxl/region: Add region autodiscovery")
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/168002857715.50647.344876437247313909.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
If the driver is allowed to enable memory operation itself then it can
also turn on HDM decoder support at will.
With this the second call to cxl_setup_hdm_decoder_from_dvsec(), when
an HDM decoder is not committed, is not needed.
Fixes: b777e9bec9 ("cxl/hdm: Emulate HDM decoder from DVSEC range registers")
Link: http://lore.kernel.org/r/20230220113657.000042e1@huawei.com
Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/167703068474.185722.664126485486344246.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
- CXL RAM region enumeration: instantiate 'struct cxl_region' objects
for platform firmware created memory regions
- CXL RAM region provisioning: complement the existing PMEM region
creation support with RAM region support
- "Soft Reservation" policy change: Online (memory hot-add)
soft-reserved memory (EFI_MEMORY_SP) by default, but still allow for
setting aside such memory for dedicated access via device-dax.
- CXL Events and Interrupts: Takeover CXL event handling from
platform-firmware (ACPI calls this CXL Memory Error Reporting) and
export CXL Events via Linux Trace Events.
- Convey CXL _OSC results to drivers: Similar to PCI, let the CXL
subsystem interrogate the result of CXL _OSC negotiation.
- Emulate CXL DVSEC Range Registers as "decoders": Allow for
first-generation devices that pre-date the definition of the CXL HDM
Decoder Capability to translate the CXL DVSEC Range Registers into
'struct cxl_decoder' objects.
- Set timestamp: Per spec, set the device timestamp in case of hotplug,
or if platform-firwmare failed to set it.
- General fixups: linux-next build issues, non-urgent fixes for
pre-production hardware, unit test fixes, spelling and debug message
improvements.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQSbo+XnGs+rwLz9XGXfioYZHlFsZwUCY/WYcgAKCRDfioYZHlFs
Z6m3APkBUtiEEm1o8ikdu5llUS1OTLBwqjJDwGMTyf8X/WDXhgD+J2mLsCgARS7X
5IS0RAtefutrW5sQpUucPM7QiLuraAY=
=kOXC
-----END PGP SIGNATURE-----
Merge tag 'cxl-for-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull Compute Express Link (CXL) updates from Dan Williams:
"To date Linux has been dependent on platform-firmware to map CXL RAM
regions and handle events / errors from devices. With this update we
can now parse / update the CXL memory layout, and report events /
errors from devices. This is a precursor for the CXL subsystem to
handle the end-to-end "RAS" flow for CXL memory. i.e. the flow that
for DDR-attached-DRAM is handled by the EDAC driver where it maps
system physical address events to a field-replaceable-unit (FRU /
endpoint device). In general, CXL has the potential to standardize
what has historically been a pile of memory-controller-specific error
handling logic.
Another change of note is the default policy for handling RAM-backed
device-dax instances. Previously the default access mode was "device",
mmap(2) a device special file to access memory. The new default is
"kmem" where the address range is assigned to the core-mm via
add_memory_driver_managed(). This saves typical users from wondering
why their platform memory is not visible via free(1) and stuck behind
a device-file. At the same time it allows expert users to deploy
policy to, for example, get dedicated access to high performance
memory, or hide low performance memory from general purpose kernel
allocations. This affects not only CXL, but also systems with
high-bandwidth-memory that platform-firmware tags with the
EFI_MEMORY_SP (special purpose) designation.
Summary:
- CXL RAM region enumeration: instantiate 'struct cxl_region' objects
for platform firmware created memory regions
- CXL RAM region provisioning: complement the existing PMEM region
creation support with RAM region support
- "Soft Reservation" policy change: Online (memory hot-add)
soft-reserved memory (EFI_MEMORY_SP) by default, but still allow
for setting aside such memory for dedicated access via device-dax.
- CXL Events and Interrupts: Takeover CXL event handling from
platform-firmware (ACPI calls this CXL Memory Error Reporting) and
export CXL Events via Linux Trace Events.
- Convey CXL _OSC results to drivers: Similar to PCI, let the CXL
subsystem interrogate the result of CXL _OSC negotiation.
- Emulate CXL DVSEC Range Registers as "decoders": Allow for
first-generation devices that pre-date the definition of the CXL
HDM Decoder Capability to translate the CXL DVSEC Range Registers
into 'struct cxl_decoder' objects.
- Set timestamp: Per spec, set the device timestamp in case of
hotplug, or if platform-firwmare failed to set it.
- General fixups: linux-next build issues, non-urgent fixes for
pre-production hardware, unit test fixes, spelling and debug
message improvements"
* tag 'cxl-for-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (66 commits)
dax/kmem: Fix leak of memory-hotplug resources
cxl/mem: Add kdoc param for event log driver state
cxl/trace: Add serial number to trace points
cxl/trace: Add host output to trace points
cxl/trace: Standardize device information output
cxl/pci: Remove locked check for dvsec_range_allowed()
cxl/hdm: Add emulation when HDM decoders are not committed
cxl/hdm: Create emulated cxl_hdm for devices that do not have HDM decoders
cxl/hdm: Emulate HDM decoder from DVSEC range registers
cxl/pci: Refactor cxl_hdm_decode_init()
cxl/port: Export cxl_dvsec_rr_decode() to cxl_port
cxl/pci: Break out range register decoding from cxl_hdm_decode_init()
cxl: add RAS status unmasking for CXL
cxl: remove unnecessary calling of pci_enable_pcie_error_reporting()
dax/hmem: build hmem device support as module if possible
dax: cxl: add CXL_REGION dependency
cxl: avoid returning uninitialized error code
cxl/pmem: Fix nvdimm registration races
cxl/mem: Fix UAPI command comment
cxl/uapi: Tag commands from cxl_query_cmd()
...
Pick up the CXL DVSEC range register emulation for v6.3, and resolve
conflicts with the cxl_port_probe() split (from for-6.3/cxl-ram-region)
and event handling (from for-6.3/cxl-events).
CXL rev3 spec 8.1.3
RCDs may not have HDM register blocks. Create a fake HDM with information
from the CXL PCIe DVSEC registers. The decoder count will be set to the
HDM count retrieved from the DVSEC cap register.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/167640368994.935665.15831225724059704620.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
In the case where HDM decoder register block exists but is not programmed
and at the same time the DVSEC range register range is active, populate the
CXL decoder object 'cxl_decoder' with info from DVSEC range registers.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/167640368454.935665.13806415120298330717.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Call cxl_dvsec_rr_decode() in the beginning of cxl_port_probe() and
preserve the decoded information in a local
'struct cxl_endpoint_dvsec_info'. This info can be passed to various
functions later on in order to support the HDM decoder emulation.
The invocation of cxl_dvsec_rr_decode() in cxl_hdm_decode_init() is
removed and a pointer to the 'struct cxl_endpoint_dvsec_info' is passed
in.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/167640367377.935665.2848747799651019676.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
By default the CXL RAS mask registers bits are defaulted to 1's and
suppress all error reporting. If the kernel has negotiated ownership
of error handling for CXL then unmask the mask registers by writing 0s.
PCI_EXP_DEVCTL capability is checked to see uncorrectable or correctable
errors bits are set before unmasking the respective errors.
Acked-by: Bjorn Helgaas <bhelgaas@google.com> # pci_regs.h
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/167639402301.778884.12556849214955646539.stgit@djiang5-mobl3.local
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Include the support for enumerating and provisioning ram regions for
v6.3. This also include a default policy change for ram / volatile
device-dax instances to assign them to the dax_kmem driver by default.
While platform firmware takes some responsibility for mapping the RAM
capacity of CXL devices present at boot, the OS is responsible for
mapping the remainder and hot-added devices. Platform firmware is also
responsible for identifying the platform general purpose memory pool,
typically DDR attached DRAM, and arranging for the remainder to be 'Soft
Reserved'. That reservation allows the CXL subsystem to route the memory
to core-mm via memory-hotplug (dax_kmem), or leave it for dedicated
access (device-dax).
The new 'struct cxl_dax_region' object allows for a CXL memory resource
(region) to be published, but also allow for udev and module policy to
act on that event. It also prevents cxl_core.ko from having a module
loading dependency on any drivers/dax/ modules.
Tested-by: Fan Ni <fan.ni@samsung.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/167602003896.1924368.10335442077318970468.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Take two endpoints attached to the first switch on the first host-bridge
in the cxl_test topology and define a pre-initialized region. This is a
x2 interleave underneath a x1 CXL Window.
$ modprobe cxl_test
$ # cxl list -Ru
{
"region":"region3",
"resource":"0xf010000000",
"size":"512.00 MiB (536.87 MB)",
"interleave_ways":2,
"interleave_granularity":4096,
"decode_state":"commit"
}
Tested-by: Fan Ni <fan.ni@samsung.com>
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/167602000547.1924368.11613151863880268868.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Region autodiscovery is an asynchronous state machine advanced by
cxl_port_probe(). After the decoders on an endpoint port are enumerated
they are scanned for actively enabled instances. Each active decoder is
flagged for auto-assembly CXL_DECODER_F_AUTO and attached to a region.
If a region does not already exist for the address range setting of the
decoder one is created. That creation process may race with other
decoders of the same region being discovered since cxl_port_probe() is
asynchronous. A new 'struct cxl_root_decoder' lock, @range_lock, is
introduced to mitigate that race.
Once all decoders have arrived, "p->nr_targets == p->interleave_ways",
they are sorted by their relative decode position. The sort algorithm
involves finding the point in the cxl_port topology where one leg of the
decode leads to deviceA and the other deviceB. At that point in the
topology the target order in the 'struct cxl_switch_decoder' indicates
the relative position of those endpoint decoders in the region.
>From that point the region goes through the same setup and validation
steps as user-created regions, but instead of programming the decoders
it validates that driver would have written the same values to the
decoders as were already present.
Tested-by: Fan Ni <fan.ni@samsung.com>
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/167601999958.1924368.9366954455835735048.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
In preparation for a new region type, "ram" regions, add a mode
attribute to clarify the mode of the decoders that can be added to a
region. Share the internals of mode_show() (for decoders) with the
region case.
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Gregory Price <gregory.price@memverge.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Tested-by: Fan Ni <fan.ni@samsung.com>
Link: https://lore.kernel.org/r/167601993930.1924368.4305018565539515665.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The uevent() callback in struct bus_type should not be modifying the
device that is passed into it, so mark it as a const * and propagate the
function signature changes out into all relevant subsystems that use
this callback.
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20230111113018.459199-16-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently the only CXL features targeted for irq support require their
message numbers to be within the first 16 entries. The device may
however support less than 16 entries depending on the support it
provides.
Attempt to allocate these 16 irq vectors. If the device supports less
then the PCI infrastructure will allocate that number. Upon successful
allocation, users can plug in their respective isr at any point
thereafter.
CXL device events are signaled via interrupts. Each event log may have
a different interrupt message number. These message numbers are
reported in the Get Event Interrupt Policy mailbox command.
Add interrupt support for event logs. Interrupts are allocated as
shared interrupts. Therefore, all or some event logs can share the same
message number.
In addition all logs are queried on any interrupt in order of the most
to least severe based on the status register.
Finally place all event configuration logic into cxl_event_config().
Previously the logic was a simple 'read all' on start up. But
interrupts must be configured prior to any reads to ensure no events are
missed. A single event configuration function results in a cleaner over
all implementation.
Cc: Bjorn Helgaas <helgaas@kernel.org>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Co-developed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20221216-cxl-ev-log-v7-2-2316a5c8f7d8@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
CXL devices have multiple event logs which can be queried for CXL event
records. Devices are required to support the storage of at least one
event record in each event log type.
Devices track event log overflow by incrementing a counter and tracking
the time of the first and last overflow event seen.
Software queries events via the Get Event Record mailbox command; CXL
rev 3.0 section 8.2.9.2.2 and clears events via CXL rev 3.0 section
8.2.9.2.3 Clear Event Records mailbox command.
If the result of negotiating CXL Error Reporting Control is OS control,
read and clear all event logs on driver load.
Ensure a clean slate of events by reading and clearing the events on
driver load.
The status register is not used because a device may continue to trigger
events and the only requirement is to empty the log at least once. This
allows for the required transition from empty to non-empty for interrupt
generation. Handling of interrupts is in a follow on patch.
The device can return up to 1MB worth of event records per query.
Allocate a shared large buffer to handle the max number of records based
on the mailbox payload size.
This patch traces a raw event record and leaves specific event record
type tracing to subsequent patches. Macros are created to aid in
tracing the common CXL Event header fields.
Each record is cleared explicitly. A clear all bit is specified but is
only valid when the log overflows.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20221216-cxl-ev-log-v7-1-2316a5c8f7d8@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
CXL is using tracepoints for reporting RAS capability register payloads
for AER events, and has plans to use tracepoints for the output payload
of Get Poison List and Get Event Records commands. For organization
purposes it would be nice to keep those all under a single + local CXL
trace system. This also organization also potentially helps in the
future when CXL drivers expand beyond generic memory expanders, however
that would also entail a move away from the expander-specific
cxl_dev_state context, save that for later.
Note that the powerpc-specific drivers/misc/cxl/ also defines a 'cxl'
trace system, however, it is unlikely that a single platform will ever
load both drivers simultaneously.
Cc: Steven Rostedt <rostedt@goodmis.org>
Tested-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/167051869176.436579.9728373544811641087.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Change names for interleave ways macros to clearly indicate which
variable is encoded and which is the actual ways value.
ways == interleave ways
eiw == encoded interleave ways
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/167027516228.3124679.11265039496968588580.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Change names for granularity macros to clearly indicate which
variable is encoded and which is the actual granularity.
granularity == interleave granularity
eig == encoded interleave granularity
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/167027493237.3124429.8948852388671827664.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Pick up support for "XOR" interleave math when parsing ACPI CFMWS window
structures. Fix up conflicts with the RCH emulation already pending in
cxl/next.
tl;dr: Clean up an unnecessary export and enable cxl_test.
An RCD (Restricted CXL Device), in contrast to a typical CXL device in
a VH topology, obtains its component registers from the bottom half of
the associated CXL host bridge RCRB (Root Complex Register Block). In
turn this means that cxl_rcrb_to_component() needs to be called from
devm_cxl_add_endpoint().
Presently devm_cxl_add_endpoint() is part of the CXL core, but the only
user is the CXL mem module. Move it from cxl_core to cxl_mem to not only
get rid of an unnecessary export, but to also enable its call out to
cxl_rcrb_to_component(), in a subsequent patch, to be mocked by
cxl_test. Recall that cxl_test can only mock exported symbols, and since
cxl_rcrb_to_component() is itself inside the core, all callers must be
outside of cxl_core to allow cxl_test to mock it.
Reviewed-by: Robert Richter <rrichter@amd.com>
Link: https://lore.kernel.org/r/166993045072.1882361.13944923741276843683.stgit@dwillia2-xfh.jf.intel.com
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
When the CFMWS is using XOR math, parse the corresponding
CXIMS structure and store the xormaps in the root decoder
structure. Use the xormaps in a new lookup, cxl_hb_xor(),
to find a targets entry in the host bridge interleave
target list.
Defined in CXL Specfication 3.0 Section: 9.17.1
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/5794813acdf7b67cfba3609c6aaff46932fa38d0.1669847017.git.alison.schofield@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Add nominal error handling that tears down CXL.mem in response to error
notifications that imply a device reset. Given some CXL.mem may be
operating as System RAM, there is a high likelihood that these error
events are fatal. However, if the system survives the notification the
expectation is that the driver behavior is equivalent to a hot-unplug
and re-plug of an endpoint.
Note that this does not change the mask values from the default. That
awaits CXL _OSC support to determine whether platform firmware is in
control of the mask registers.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/166974413966.1608150.15522782911404473932.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The RAS Capability Structure has some ancillary information that may be
relevant with respect to AER events, link and protcol error status
registers. Map the RAS Capability Registers in support of defining a
'struct pci_error_handlers' instance for the cxl_pci driver.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/166974412803.1608150.7096566580400947001.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The RAS Capabilitiy Structure is a CXL Component register capability
block. Unlike the HDM Decoder Capability, it will be referenced by the
cxl_pci driver in response to PCIe AER events. Due to this it is no
longer the case that cxl_map_component_regs() can assume that it should
map all component registers. Plumb a bitmask of capability ids to map
through cxl_map_component_regs().
For symmetry cxl_probe_device_regs() is updated to populate @id in
'struct cxl_reg_map' even though cxl_map_device_regs() does not have a
need to map a subset of the device registers per caller.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/166974412214.1608150.11487843455070795378.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
A downstream port must be connected to a component register block.
For restricted hosts the base address is determined from the RCRB. The
RCRB is provided by the host's CEDT CHBS entry. Rework CEDT parser to
get the RCRB and add code to extract the component register block from
it.
RCRB's BAR[0..1] point to the component block containing CXL subsystem
component registers. MEMBAR extraction follows the PCI base spec here,
esp. 64 bit extraction and memory range alignment (6.0, 7.5.1.2.1). The
RCRB base address is cached in the cxl_dport per-host bridge so that the
upstream port component registers can be retrieved later by an RCD
(RCIEP) associated with the host bridge.
Note: Right now the component register block is used for HDM decoder
capability only which is optional for RCDs. If unsupported by the RCD,
the HDM init will fail. It is future work to bypass it in this case.
Co-developed-by: Terry Bowman <terry.bowman@amd.com>
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Signed-off-by: Robert Richter <rrichter@amd.com>
Link: https://lore.kernel.org/r/Y4dsGZ24aJlxSfI1@rric.localdomain
[djbw: introduce devm_cxl_add_rch_dport()]
Link: https://lore.kernel.org/r/166993044524.1882361.2539922887413208807.stgit@dwillia2-xfh.jf.intel.com
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
A "DPA invalidation event" is any scenario where the contents of a DPA
(Device Physical Address) is modified in a way that is incoherent with
CPU caches, or if the HPA (Host Physical Address) to DPA association
changes due to a remapping event.
PMEM security events like Unlock and Passphrase Secure Erase already
manage caches through LIBNVDIMM, so that leaves HPA to DPA remap events
that need cache management by the CXL core. Those only happen when the
boot time CXL configuration has changed. That event occurs when
userspace attaches an endpoint decoder to a region configuration, and
that region is subsequently activated.
The implications of not invalidating caches between remap events is that
reads from the region at different points in time may return different
results due to stale cached data from the previous HPA to DPA mapping.
Without a guarantee that the region contents after cxl_region_probe()
are written before being read (a layering-violation assumption that
cxl_region_probe() can not make) the CXL subsystem needs to ensure that
reads that precede writes see consistent results.
A CONFIG_CXL_REGION_INVALIDATION_TEST option is added to support debug
and unit testing of the CXL implementation in QEMU or other environments
where cpu_cache_has_invalidate_memregion() returns false. This may prove
too restrictive for QEMU where the HDM decoders are emulated, but in
that case the CXL subsystem needs some new mechanism / indication that
the HDM decoder is emulated and not a passthrough of real hardware.
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/166993222098.1995348.16604163596374520890.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Now that cxl_nvdimm and cxl_pmem_region objects are torn down
sychronously with the removal of either the bridge, or an endpoint, the
cxl_pmem_wq infrastructure can be jettisoned.
Tested-by: Robert Richter <rrichter@amd.com>
Link: https://lore.kernel.org/r/166993042335.1882361.17022872468068436287.stgit@dwillia2-xfh.jf.intel.com
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The three objects 'struct cxl_nvdimm_bridge', 'struct cxl_nvdimm', and
'struct cxl_pmem_region' manage CXL persistent memory resources. The
bridge represents base platform resources, the nvdimm represents one or
more endpoints, and the region is a collection of nvdimms that
contribute to an assembled address range.
Their relationship is such that a region is torn down if any component
endpoints are removed. All regions and endpoints are torn down if the
foundational bridge device goes down.
A workqueue was deployed to manage these interdependencies, but it is
difficult to reason about, and fragile. A recent attempt to take the CXL
root device lock in the cxl_mem driver was reported by lockdep as
colliding with the flush_work() in the cxl_pmem flows.
Instead of the workqueue, arrange for all pmem/nvdimm devices to be torn
down immediately and hierarchically. A similar change is made to both
the 'cxl_nvdimm' and 'cxl_pmem_region' objects. For bisect-ability both
changes are made in the same patch which unfortunately makes the patch
bigger than desired.
Arrange for cxl_memdev and cxl_region to register a cxl_nvdimm and
cxl_pmem_region as a devres release action of the bridge device.
Additionally, include a devres release action of the cxl_memdev or
cxl_region device that triggers the bridge's release action if an endpoint
exits before the bridge. I.e. this allows either unplugging the bridge,
or unplugging and endpoint to result in the same cleanup actions.
To keep the patch smaller the cleanup of the now defunct workqueue
infrastructure is saved for a follow-on patch.
Tested-by: Robert Richter <rrichter@amd.com>
Link: https://lore.kernel.org/r/166993041773.1882361.16444301376147207609.stgit@dwillia2-xfh.jf.intel.com
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Now that a cxl_nvdimm object can only experience ->remove() via an
unregistration event (because the cxl_nvdimm bind attributes are
suppressed), additional cleanups are possible.
It is already the case that the removal of a cxl_memdev object triggers
->remove() on any associated region. With that mechanism in place there
is no need for the cxl_nvdimm removal to trigger the same. Just rely on
cxl_region_detach() to tear down the whole cxl_pmem_region.
Tested-by: Robert Richter <rrichter@amd.com>
Link: https://lore.kernel.org/r/166993041215.1882361.6321535567798911286.stgit@dwillia2-xfh.jf.intel.com
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
When reviewing the CFMWS parsing code that deals with the HDM decoders,
I noticed a couple of magic numbers. This commit replaces these magic numbers
with constants defined by the CXL 3.0 specification.
v2:
- Change references to CXL 3.0 specification (David)
- CXL_DECODER_MAX_GRANULARITY_ORDER -> CXL_DECODER_MAX_ENCODED_IG (Dan)
Signed-off-by: Adam Manzanares <a.manzanares@samsung.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20220829220249.243888-1-a.manzanares@samsung.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The function devm_cxl_iomap_block() is only used in the core
code. There are two declarations in header files of it, in
drivers/cxl/core/core.h and drivers/cxl/cxl.h. Remove its unused
declaration in drivers/cxl/cxl.h.
Fixing build error in regs.c found by kernel test robot by including
"core.h" there.
Signed-off-by: Robert Richter <rrichter@amd.com>
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Link: https://lore.kernel.org/r/20221018132341.76259-2-rrichter@amd.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
When programming port decode targets, the algorithm wants to ensure that
two devices are compatible to be programmed as peers beneath a given
port. A compatible peer is a target that shares the same dport, and
where that target's interleave position also routes it to the same
dport. Compatibility is determined by the device's interleave position
being >= to distance. For example, if a given dport can only map every
Nth position then positions less than N away from the last target
programmed are incompatible.
The @distance for the host-bridge's cxl_port in a simple dual-ported
host-bridge configuration with 2 direct-attached devices is 1, i.e. An
x2 region divided by 2 dports to reach 2 region targets.
An x4 region under an x2 host-bridge would need 2 intervening switches
where the @distance at the host bridge level is 2 (x4 region divided by
2 switches to reach 4 devices).
However, the distance between peers underneath a single ported
host-bridge is always zero because there is no limit to the number of
devices that can be mapped. In other words, there are no decoders to
program in a passthrough, all descendants are mapped and distance only
starts matters for the intervening descendant ports of the passthrough
port.
Add tracking for the number of dports mapped to a port, and use that to
detect the passthrough case for calculating @distance.
Cc: <stable@vger.kernel.org>
Reported-by: Bobo WL <lmw.bobo@gmail.com>
Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: http://lore.kernel.org/r/20221010172057.00001559@huawei.com
Fixes: 27b3f8d138 ("cxl/region: Program target lists")
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Link: https://lore.kernel.org/r/166752185440.947915.6617495912508299445.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
When a cxl_nvdimm object goes through a ->remove() event (device
physically removed, nvdimm-bridge disabled, or nvdimm device disabled),
then any associated regions must also be disabled. As highlighted by the
cxl-create-region.sh test [1], a single device may host multiple
regions, but the driver was only tracking one region at a time. This
leads to a situation where only the last enabled region per nvdimm
device is cleaned up properly. Other regions are leaked, and this also
causes cxl_memdev reference leaks.
Fix the tracking by allowing cxl_nvdimm objects to track multiple region
associations.
Cc: <stable@vger.kernel.org>
Link: https://github.com/pmem/ndctl/blob/main/test/cxl-create-region.sh [1]
Reported-by: Vishal Verma <vishal.l.verma@intel.com>
Fixes: 04ad63f086 ("cxl/region: Introduce cxl_pmem_region objects")
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Link: https://lore.kernel.org/r/166752183647.947915.2045230911503793901.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Sphinx reported undescribed parameters in cxl_region_params struct:
./drivers/cxl/cxl.h:376: warning: Function parameter or member 'targets' not described in 'cxl_region_params'
./drivers/cxl/cxl.h:376: warning: Function parameter or member 'nr_targets' not described in 'cxl_region_params'
Describe these members.
Fixes: b9686e8c8e ("cxl/region: Enable the assignment of endpoint decoders to regions")
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20220804075448.98241-3-bagasdotme@gmail.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The kernel enforces that region granularity is >= to the top-level
interleave-granularity for the given CXL window. However, when the CXL
window interleave is x1, i.e. non-interleaved at the host bridge level,
then the specified granularity does not matter. Override the window
specified granularity to the CXL minimum so that any valid region
granularity is >= to the root granularity.
Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Link: https://lore.kernel.org/r/165853776917.2430596.16823264262010844458.stgit@dwillia2-xfh.jf.intel.com
[djbw: add CXL_DECODER_MIN_GRANULARITY per vishal]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The "ways" variable comes from the user. The ways_to_cxl() function
has an upper bound but it doesn't check for negatives. Make
the "ways" variable an unsigned int to fix this bug.
Fixes: 80d10a6cee ("cxl/region: Add interleave geometry attributes")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/r/Yueo3NV2hFCXx1iV@kili
[djbw: fixup interleave_ways_store() to only accept unsigned input]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The LIBNVDIMM subsystem is a platform agnostic representation of system
NVDIMM / persistent memory resources. To date, the CXL subsystem's
interaction with LIBNVDIMM has been to register an nvdimm-bridge device
and cxl_nvdimm objects to proxy CXL capabilities into existing LIBNVDIMM
subsystem mechanics.
With regions the approach is the same. Create a new cxl_pmem_region
object to proxy CXL region details into a LIBNVDIMM definition. With
this enabling LIBNVDIMM can partition CXL persistent memory regions with
legacy namespace labels. A follow-on patch will add CXL region label and
CXL namespace label support to persist region configurations across
driver reload / system-reset events.
Co-developed-by: Ben Widawsky <bwidawsk@kernel.org>
Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/165784340111.1758207.3036498385188290968.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Be careful to only disable cxl_pmem objects related to a given
cxl_nvdimm_bridge. Otherwise, offline_nvdimm_bus() reaches across CXL
domains and disables more than is expected.
Fixes: 21083f5152 ("cxl/pmem: Register 'pmem' / cxl_nvdimm devices")
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/165784339569.1758207.1557084545278004577.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The CXL region driver is responsible for routing fully formed CXL
regions to one of libnvdimm, for persistent memory regions, device-dax
for volatile memory regions, or just act as an enumeration placeholder
if the region was setup and configuration locked by platform firmware.
In the platform-firmware-setup case the expectation is that region is
already accounted in the system memory map, i.e. already enabled as
"System RAM".
For now, just attach to CXL regions in the CXL_CONFIG_COMMIT state, and
take no further action.
Given this driver is just a small / simple router, include it in the
core rather than its own module.
Co-developed-by: Ben Widawsky <bwidawsk@kernel.org>
Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20220624041950.559155-18-dan.j.williams@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
After all the soft validation of the region has completed, convey the
region configuration to hardware while being careful to commit decoders
in specification mandated order. In addition to programming the endpoint
decoder base-address, interleave ways and granularity, the switch
decoder target lists are also established.
While the kernel can enforce spec-mandated commit order, it can not
enforce spec-mandated reset order. For example, the kernel can't stop
someone from removing an endpoint device that is occupying decoderN in a
switch decoder where decoderN+1 is also committed. To reset decoderN,
decoderN+1 must be torn down first. That "tear down the world"
implementation is saved for a follow-on patch.
Callback operations are provided for the 'commit' and 'reset'
operations. While those callbacks may prove useful for CXL accelerators
(Type-2 devices with memory) the primary motivation is to enable a
simple way for cxl_test to intercept those operations.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/165784338418.1758207.14659830845389904356.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Once the region's interleave geometry (ways, granularity, size) is
established and all the endpoint decoder targets are assigned, the next
phase is to program all the intermediate decoders. Specifically, each
CXL switch in the path between the endpoint and its CXL host-bridge
(including the logical switch internal to the host-bridge) needs to have
its decoders programmed and the target list order assigned.
The difficulty in this implementation lies in determining which endpoint
decoder ordering combinations are valid. Consider the cxl_test case of 2
host bridges, each of those host-bridges attached to 2 switches, and
each of those switches attached to 2 endpoints for a potential 8-way
interleave. The x2 interleave at the host-bridge level requires that all
even numbered endpoint decoder positions be located on the "left" hand
side of the topology tree, and the odd numbered positions on the other.
The endpoints that are peers on the same switch need to have a position
that can be routed with a dedicated address bit per-endpoint. See
check_last_peer() for the details.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/165784337827.1758207.132121746122685208.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
CXL regions (interleave sets) are made up of a set of memory devices
where each device maps a portion of the interleave with one of its
decoders (see CXL 2.0 8.2.5.12 CXL HDM Decoder Capability Structure).
As endpoint decoders are identified by a provisioning tool they can be
added to a region provided the region interleave properties are set
(way, granularity, HPA) and DPA has been assigned to the decoder.
The attach event triggers several validation checks, for example:
- is the DPA sized appropriately for the region
- is the decoder reachable via the host-bridges identified by the
region's root decoder
- is the device already active in a different region position slot
- are there already regions with a higher HPA active on a given port
(per CXL 2.0 8.2.5.12.20 Committing Decoder Programming)
...and the attach event affords an opportunity to collect data and
resources relevant to later programming the target lists in switch
decoders, for example:
- allocate a decoder at each cxl_port in the decode chain
- for a given switch port, how many the region's endpoints are hosted
through the port
- how many unique targets (next hops) does a port need to map to reach
those endpoints
The act of reconciling this information and deploying it to the decoder
configuration is saved for a follow-on patch.
Co-developed-by: Ben Widawsky <bwidawsk@kernel.org>
Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/165784337277.1758207.4108508181328528703.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The ACPI CXL Fixed Memory Window Structure (CFMWS) defines multiple
methods to determine which host bridge provides access to a given
endpoint relative to that device's position in the interleave. The
"Interleave Arithmetic" defines either a "standard modulo" /
round-random algorithm, or "xormap" based algorithm which can be defined
as a non-linear transform. Given that there are already more options
beyond "standard modulo" and that "xormap" may turn out to be ACPI CXL
specific, provide a callback for the region provisioning code to map
endpoint positions back to expected host bridge id (cxl_dport target).
For now just support the simple modulo math case and save the xormap for
a follow-on change.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20220624041950.559155-14-dan.j.williams@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The region provisioning process involves allocating DPA to a set of
endpoint decoders, and HPA plus the region geometry to a region device.
Then the decoder is assigned to the region. At this point several
validation steps can be performed to validate that the decoder is
suitable to participate in the region.
Co-developed-by: Ben Widawsky <bwidawsk@kernel.org>
Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/r/165784336184.1758207.16403282029203949622.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
After a region's interleave parameters (ways and granularity) are set,
add a way for regions to allocate HPA (host physical address space) from
the free capacity in their parent root-decoder. The allocator for this
capacity reuses the 'struct resource' based allocator used for
CONFIG_DEVICE_PRIVATE.
Once the tuple of "ways, granularity, [uuid], and size" is set the
region configuration transitions to the CXL_CONFIG_INTERLEAVE_ACTIVE
state which is a precursor to allowing endpoint decoders to be added to
a region.
Co-developed-by: Ben Widawsky <bwidawsk@kernel.org>
Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/165784335630.1758207.420216490941955417.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Add ABI to allow the number of devices that comprise a region to be
set as well as the interleave granularity for the region.
Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
[djbw: reword changelog]
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20220624041950.559155-11-dan.j.williams@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The process of provisioning a region involves triggering the creation of
a new region object, pouring in the configuration, and then binding that
configured object to the region driver to start its operation. For
persistent memory regions the CXL specification mandates that it
identified by a uuid. Add an ABI for userspace to specify a region's
uuid.
Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
[djbw: simplify locking]
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/165784334465.1758207.8224025435884752570.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
CXL 2.0 allows for dynamic provisioning of new memory regions (system
physical address resources like "System RAM" and "Persistent Memory").
Whereas DDR and PMEM resources are conveyed statically at boot, CXL
allows for assembling and instantiating new regions from the available
capacity of CXL memory expanders in the system.
Sysfs with an "echo $region_name > $create_region_attribute" interface
is chosen as the mechanism to initiate the provisioning process. This
was chosen over ioctl() and netlink() to keep the configuration
interface entirely in a pseudo-fs interface, and it was chosen over
configfs since, aside from this one creation event, the interface is
read-mostly. I.e. configfs supports cases where an object is designed to
be provisioned each boot, like an iSCSI storage target, and CXL region
creation is mostly for PMEM regions which are created usually once
per-lifetime of a server instance. This is an improvement over nvdimm
that pre-created "seed" devices that tended to confuse users looking to
determine which devices are active and which are idle.
Recall that the major change that CXL brings over previous persistent
memory architectures is the ability to dynamically define new regions.
Compare that to drivers like 'nfit' where the region configuration is
statically defined by platform firmware.
Regions are created as a child of a root decoder that encompasses an
address space with constraints. When created through sysfs, the root
decoder is explicit. When created from an LSA's region structure a root
decoder will possibly need to be inferred by the driver.
Upon region creation through sysfs, a vacant region is created with a
unique name. Regions have a number of attributes that must be configured
before the region can be bound to the driver where HDM decoder program
is completed.
An example of creating a new region:
- Allocate a new region name:
region=$(cat /sys/bus/cxl/devices/decoder0.0/create_pmem_region)
- Create a new region by name:
while
region=$(cat /sys/bus/cxl/devices/decoder0.0/create_pmem_region)
! echo $region > /sys/bus/cxl/devices/decoder0.0/create_pmem_region
do true; done
- Region now exists in sysfs:
stat -t /sys/bus/cxl/devices/decoder0.0/$region
- Delete the region, and name:
echo $region > /sys/bus/cxl/devices/decoder0.0/delete_region
Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/165784333909.1758207.794374602146306032.stgit@dwillia2-xfh.jf.intel.com
[djbw: simplify locking, reword changelog]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The port scanning algorithm in devm_cxl_enumerate_ports() walks up the
topology and adds cxl_port objects starting from the root down to the
endpoint. When those ports are initially created they know all their
dports, but they do not know the downstream cxl_port instance that
represents the next descendant in the topology. Rework create_endpoint()
into devm_cxl_add_endpoint() that enumerates the downstream cxl_port
topology into each port's 'struct cxl_ep' record for each endpoint it
that the port is an ancestor.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20220624041950.559155-7-dan.j.williams@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reduce the complexity and the overhead of walking the topology to
determine endpoint connectivity to root decoder interleave
configurations.
Note that cxl_detach_ep(), after it determines that the last @ep has
departed and decides to delete the port, now needs to walk the dport
array with the device_lock() held to remove entries. Previously
list_splice_init() could be used atomically delete all dport entries at
once and then perform entry tear down outside the lock. There is no
list_splice_init() equivalent for the xarray.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/165784331647.1758207.6345820282285119339.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
In preparation for region provisioning that needs to walk the topology
by endpoints, use an xarray to record endpoint interest in a given port.
In addition to being more space and time efficient it also reduces the
complexity of the implementation by moving locking internal to the
xarray implementation. It also allows for a single cxl_ep reference to
be recorded in multiple xarrays.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20220624041950.559155-2-dan.j.williams@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
At the time that cxl_port instances are being created, cache the dport
from the parent port that points to this new child port. This will be
useful for region provisioning when walking the tree to calculate
decoder targets, and saves rewalking the dport list after the fact to
build this information.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20220624041950.559155-1-dan.j.williams@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Recall that the primary role of the cxl_mem driver is to probe if the
given endpoint is connected to a CXL port topology. In that process it
walks its device ancestry to its PCI root port. If that root port is
also a CXL root port then the probe process adds cxl_port object
instances at switch in the path between to the root and the endpoint. As
those cxl_port instances are added, or if a previous enumeration
attempt already created the port, a 'struct cxl_ep' instance is
registered with that port to track the endpoints interested in that
port.
At the time the cxl_ep is registered the downstream egress path from the
port to the endpoint is known. Take the opportunity to record that
information as it will be needed for dynamic programming of decoder
targets during region provisioning.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/165784329944.1758207.15203961796832072116.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The CXL specification enforces that endpoint decoders are committed in
hw instance id order. In preparation for adding dynamic DPA allocation,
record the hw instance id in endpoint decoders, and enforce allocations
to occur in hw instance id order.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/165784328827.1758207.9627538529944559954.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Recall that the Device Physical Address (DPA) space of a CXL Memory
Expander is potentially partitioned into a volatile and persistent
portion. A decoder maps a Host Physical Address (HPA) range to a DPA
range and that translation depends on the value of all previous (lower
instance number) decoders before the current one.
In preparation for allowing dynamic provisioning of regions, decoders
need an ABI to indicate which DPA partition a decoder targets. This ABI
needs to be prepared for the possibility that some other agent committed
and locked a decoder that spans the partition boundary.
Add 'decoderX.Y/mode' to endpoint decoders that indicates which
partition 'ram' / 'pmem' the decoder targets, or 'mixed' if the decoder
currently spans the partition boundary.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/165603881967.551046.6007594190951596439.stgit@dwillia2-xfh
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
In preparation for provisioning CXL regions, add accounting for the DPA
space consumed by existing regions / decoders. Recall, a CXL region is a
memory range comprised from one or more endpoint devices contributing a
mapping of their DPA into HPA space through a decoder.
Record the DPA ranges covered by committed decoders at initial probe of
endpoint ports relative to a per-device resource tree of the DPA type
(pmem or volatile-ram).
The cxl_dpa_rwsem semaphore is introduced to globally synchronize DPA
state across all endpoints and their decoders at once. The vast majority
of DPA operations are reads as region creation is expected to be as rare
as disk partitioning and volume creation. The device_lock() for this
synchronization is specifically avoided for concern of entangling with
sysfs attribute removal.
Co-developed-by: Ben Widawsky <bwidawsk@kernel.org>
Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/165784327682.1758207.7914919426043855876.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Previously the target routing specifics of switch decoders and platform
CXL window resource tracking of root decoders were factored out of
'struct cxl_decoder'. While switch decoders translate from SPA to
downstream ports, endpoint decoders translate from SPA to DPA.
This patch, 3 of 3, adds a 'struct cxl_endpoint_decoder' that tracks an
endpoint-specific Device Physical Address (DPA) resource. For now this
just defines ->dpa_res, a follow-on patch will handle requesting DPA
resource ranges from a device-DPA resource tree.
Co-developed-by: Ben Widawsky <bwidawsk@kernel.org>
Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/165784327088.1758207.15502834501671201192.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Previously the target routing specifics of switch decoders were factored
out of 'struct cxl_decoder' into 'struct cxl_switch_decoder'.
This patch, 2 of 3, adds a 'struct cxl_root_decoder' as a superset of a
switch decoder that also track the associated CXL window platform
resource.
Note that the reason the resource for a given root decoder needs to be
looked up after the fact (i.e. after cxl_parse_cfmws() and
add_cxl_resource()) is because add_cxl_resource() may have merged CXL
windows in order to keep them at the top of the resource tree / decode
hierarchy.
Co-developed-by: Ben Widawsky <bwidawsk@kernel.org>
Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/165784326541.1758207.9915663937394448341.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Currently 'struct cxl_decoder' contains the superset of attributes
needed for all decoder types. Before more type-specific attributes are
added to the common definition, reorganize 'struct cxl_decoder' into type
specific objects.
This patch, the first of three, factors out a cxl_switch_decoder type.
See the new kdoc for what a 'struct cxl_switch_decoder' represents in a
CXL topology.
Co-developed-by: Ben Widawsky <bwidawsk@kernel.org>
Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/r/165784325340.1758207.5064717153608954960.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The per-device CDAT data provides performance data that is relevant for
mapping which CXL devices can participate in which CXL ranges by QTG
(QoS Throttling Group) (per ECN: CXL 2.0 CEDT CFMWS & QTG_DSM) [1]. The
QTG association specified in the ECN is advisory. Until the
cxl_acpi driver grows support for invoking the QTG _DSM method the CDAT
data is only of interest to userspace that may need it for debug
purposes.
Search the DOE mailboxes available, query CDAT data, cache the data and
make it available via a sysfs binary attribute per endpoint at:
/sys/bus/cxl/devices/endpointX/CDAT
...similar to other ACPI-structured table data in
/sys/firmware/ACPI/tables. The CDAT is relative to 'struct cxl_port'
objects since switches in addition to endpoints can host a CDAT
instance. Switch CDAT support is not implemented.
This does not support table updates at runtime. It will always provide
whatever was there when first cached. It is also the case that table
updates are not expected outside of explicit DPA address map affecting
commands like Set Partition with the immediate flag set. Given that the
driver does not support Set Partition with the immediate flag set there
is no current need for update support.
Link: https://www.computeexpresslink.org/spec-landing [1]
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
[djbw: drop in-kernel parsing infra for now, and other minor fixups]
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20220719205249.566684-7-ira.weiny@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
While there is a need to go from a LIBNVDIMM 'struct nvdimm' to a CXL
'struct cxl_nvdimm', there is no use case to go the other direction.
Likely this is a leftover from an early version of the referenced commit
before it implemented devm for releasing the created nvdimm.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20220624041950.559155-19-dan.j.williams@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Region creation has need for checking host-bridge connectivity when
adding endpoints to regions. Record, at port creation time, the
host-bridge to provide a useful shortcut from any location in the
topology to the most-significant ancestor.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20220624041950.559155-4-dan.j.williams@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Interleave granularity and ways have CXL specification defined encodings.
Promote the conversion helpers to a common header, and use them to
replace other open-coded instances.
Force caller to consider the error case of the conversion similarly to
other conversion helpers like kstrto*().
Co-developed-by: Ben Widawsky <bwidawsk@kernel.org>
Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/165603875016.551046.17236943065932132355.stgit@dwillia2-xfh
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
This helper was only used to identify the object type for lockdep
purposes. Now that lockdep support is done with explicit lock classes,
this helper can be dropped.
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Adam Manzanares <a.manzanares@samsung.com>
Link: https://lore.kernel.org/r/165603874340.551046.15491766127759244728.stgit@dwillia2-xfh
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Root decoders are responsible for hosting the available host address
space for endpoints and regions to claim. The tracking of that available
capacity can be done in iomem_resource directly. As a result, root
decoders no longer need to host their own resource tree. The
current ->platform_res attribute was added prematurely.
Otherwise, ->hpa_range fills the role of conveying the current decode
range of the decoder.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Adam Manzanares <a.manzanares@samsung.com>
Link: https://lore.kernel.org/r/165603873619.551046.791596854070136223.stgit@dwillia2-xfh
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
In preparation for growing a ->dpa_range attribute for endpoint
decoders, rename the current ->decoder_range to the more descriptive
->hpa_range.
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Adam Manzanares <a.manzanares@samsung.com>
Link: https://lore.kernel.org/r/165603872867.551046.2170426227407458814.stgit@dwillia2-xfh
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Save some characters and directly check decoder type rather than port
type. There's no need to check if the port is an endpoint port since, by
this point, cxl_endpoint_decoder_alloc() has a specified type.
Reviewed by: Adam Manzanares <a.manzanares@samsung.com>
Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Now that all CXL subsystem locking is validated with custom lock
classes, there is no need for the custom usage of the lockdep_mutex.
Cc: Alison Schofield <alison.schofield@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Ben Widawsky <ben.widawsky@intel.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/165055520383.3745911.53447786039115271.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Recall that a CXL Port is any object that publishes a CXL HDM Decoder
Capability structure. That is Host Bridge and Switches that have been
enabled so far. Now, add decoder support to the 'endpoint' CXL Ports
registered by the cxl_mem driver. They mostly share the same enumeration
as Bridges and Switches, but witout a target list. The target of
endpoint decode is device-internal DPA space, not another downstream
port.
Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
[djbw: clarify changelog, hookup enumeration in the port driver]
Link: https://lore.kernel.org/r/164386092069.765089.14895687988217608642.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
At this point the subsystem can enumerate all CXL ports (CXL.mem decode
resources in upstream switch ports and host bridges) in a system. The
last mile is connecting those ports to endpoints.
The cxl_mem driver connects an endpoint device to the platform CXL.mem
protoctol decode-topology. At ->probe() time it walks its
device-topology-ancestry and adds a CXL Port object at every Upstream
Port hop until it gets to CXL root. The CXL root object is only present
after a platform firmware driver registers platform CXL resources. For
ACPI based platform this is managed by the ACPI0017 device and the
cxl_acpi driver.
The ports are registered such that disabling a given port automatically
unregisters all descendant ports, and the chain can only be registered
after the root is established.
Given ACPI device scanning may run asynchronously compared to PCI device
scanning the root driver is tasked with rescanning the bus after the
root successfully probes.
Conversely if any ports in a chain between the root and an endpoint
becomes disconnected it subsequently triggers the endpoint to
unregister. Given lock depenedencies the endpoint unregistration happens
in a workqueue asynchronously. If userspace cares about synchronizing
delayed work after port events the /sys/bus/cxl/flush attribute is
available for that purpose.
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
[djbw: clarify changelog, rework hotplug support]
Link: https://lore.kernel.org/r/164398782997.903003.9725273241627693186.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
So far the platorm level CXL resources have been enumerated by the
cxl_acpi driver, and cxl_pci has gathered all the pre-requisite
information it needs to fire up a cxl_mem driver. However, the first
thing the cxl_mem driver will be tasked to do is validate that all the
PCIe Switches in its ancestry also have CXL capabilities and an CXL.mem
link established.
Provide a common mechanism for a CXL.mem endpoint driver to enumerate
all the ancestor CXL ports in the topology and validate CXL.mem
connectivity.
Multiple endpoints may end up racing to establish a shared port in the
topology. This race is resolved via taking the device-lock on a parent
CXL Port before establishing a new child. The winner of the race
establishes the port, the loser simply registers its interest in the
port via 'struct cxl_ep' place-holder reference.
At endpoint teardown the same parent port lock is taken as 'struct
cxl_ep' references are deleted. Last endpoint to drop its reference
unregisters the port.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/164398731146.902644.1029761300481366248.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Now that dport and decoder enumeration is centralized in the port
driver, the @host argument for these helpers can be made implicit. For
the root port the host is the port's uport device (ACPI0017 for
cxl_acpi), and for all other descendant ports the devm context is the
parent of @port.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Ben Widawsky <ben.widawsky@intel.com>
Link: https://lore.kernel.org/r/164375043390.484143.17617734732003230076.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The need for a CXL port driver and a dedicated cxl_bus_type is driven by
a need to simultaneously support 2 independent physical memory decode
domains (cache coherent CXL.mem and uncached PCI.mmio) that also
intersect at a single PCIe device node. A CXL Port is a device that
advertises a CXL Component Register block with an "HDM Decoder
Capability Structure".
>From Documentation/driver-api/cxl/memory-devices.rst:
Similar to how a RAID driver takes disk objects and assembles them into
a new logical device, the CXL subsystem is tasked to take PCIe and ACPI
objects and assemble them into a CXL.mem decode topology. The need for
runtime configuration of the CXL.mem topology is also similar to RAID in
that different environments with the same hardware configuration may
decide to assemble the topology in contrasting ways. One may choose
performance (RAID0) striping memory across multiple Host Bridges and
endpoints while another may opt for fault tolerance and disable any
striping in the CXL.mem topology.
The port driver identifies whether an endpoint Memory Expander is
connected to a CXL topology. If an active (bound to the 'cxl_port'
driver) CXL Port is not found at every PCIe Switch Upstream port and an
active "root" CXL Port then the device is just a plain PCIe endpoint
only capable of participating in PCI.mmio and DMA cycles, not CXL.mem
coherent interleave sets.
The 'cxl_port' driver lets the CXL subsystem leverage driver-core
infrastructure for setup and teardown of register resources and
communicating device activation status to userspace. The cxl_bus_type
can rendezvous the async arrival of platform level CXL resources (via
the 'cxl_acpi' driver) with the asynchronous enumeration of Memory
Expander endpoints, while also implementing a hierarchical locking model
independent of the associated 'struct pci_dev' locking model. The
locking for dport and decoder enumeration is now handled in the core
rather than callers.
For now the port driver only enumerates and registers CXL resources
(downstream port metadata and decoder resources) later it will be used
to take action on its decoders in response to CXL.mem region
provisioning requests.
Note1: cxlpci.h has long depended on pci.h, but port.c was the first to
not include pci.h. Carry that dependency in cxlpci.h.
Note2: cxl port enumeration and probing complicates CXL subsystem init
to the point that it helps to have centralized debug logging of probe
events in cxl_bus_probe().
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Co-developed-by: Dan Williams <dan.j.williams@intel.com>
Link: https://lore.kernel.org/r/164374948116.464348.1772618057599155408.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Unlike the decoder enumeration for "root decoders" described by platform
firmware, standard decoders can be enumerated from the component
registers space once the base address has been identified (via PCI,
ACPI, or another mechanism).
Add common infrastructure for HDM (Host-managed-Device-Memory) Decoder
enumeration and share it between host-bridge, upstream switch port, and
cxl_test defined decoders.
The locking model for switch level decoders is to hold the port lock
over the enumeration. This facilitates moving the dport and decoder
enumeration to a 'port' driver. For now, the only enumerator of decoder
resources is the cxl_acpi root driver.
Co-developed-by: Ben Widawsky <ben.widawsky@intel.com>
Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/164374688404.395335.9239248252443123526.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The core houses infrastructure for decoder resources. A CXL port's
dports are more closely related to decoder infrastructure than topology
enumeration. Implement generic PCI based dport enumeration in the core,
i.e. arrange for existing root port enumeration from cxl_acpi to share
code with switch port enumeration which just amounts to a small
difference in a pci_walk_bus() invocation once the appropriate 'struct
pci_bus' has been retrieved.
Set the convention that decoder objects are registered after all dports
are enumerated. This enables userspace to know when the CXL core is
finished establishing 'dportX' links underneath the 'portX' object.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/164368114191.354031.5270501846455462665.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
In preparation for switch port enumeration while also preserving the
potential for multi-domain / multi-root CXL topologies. Introduce a
'struct device' generic mechanism for retrieving a root CXL port, if one
is registered. Note that the only known multi-domain CXL configurations
are running the cxl_test unit test on a system that also publishes an
ACPI0017 device.
With this in hand the nvdimm-bridge lookup can be with
device_find_child() instead of bus_find_device() + custom mocked lookup
infrastructure in cxl_test.
The mechanism looks for a 2nd level port since the root level topology
is platform-firmware specific and the 2nd level down follows standard
PCIe topology expectations. The cxl_acpi 2nd level is associated with a
PCIe Root Port.
Reported-by: Ben Widawsky <ben.widawsky@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/164367562182.225521.9488555616768096049.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Add a helper for converting a PCI enumerated cxl_port into the pci_bus
that hosts its dports. For switch ports this is trivial, but for root
ports there is no generic way to go from a platform defined host bridge
device, like ACPI0016 to its corresponding pci_bus. Rather than spill
ACPI goop outside of the cxl_acpi driver, just arrange for it to
register an xarray translation from the uport device to the
corresponding pci_bus.
This is in preparation for centralizing dport enumeration in the core.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Ben Widawsky <ben.widawsky@intel.com>
Link: https://lore.kernel.org/r/164364745633.85488.9744017377155103992.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Lockdep reports:
======================================================
WARNING: possible circular locking dependency detected
5.16.0-rc1+ #142 Tainted: G OE
------------------------------------------------------
cxl/1220 is trying to acquire lock:
ffff979b85475460 (kn->active#144){++++}-{0:0}, at: __kernfs_remove+0x1ab/0x1e0
but task is already holding lock:
ffff979b87ab38e8 (&dev->lockdep_mutex#2/4){+.+.}-{3:3}, at: cxl_remove_ep+0x50c/0x5c0 [cxl_core]
...where cxl_remove_ep() is a helper that wants to delete ports while
holding a lock on the host device for that port. That sets up a lockdep
violation whereby target_list_show() can not rely holding the decoder's
device lock while walking the target_list. Switch to a dedicated seqlock
for this purpose.
Reported-by: Ben Widawsky <ben.widawsky@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/164367209095.208169.1171673319121271280.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
When CONFIG_PROVE_LOCKING is enabled the 'struct device' definition gets
an additional mutex that is not clobbered by
lockdep_set_novalidate_class() like the typical device_lock(). This
allows for local annotation of subsystem locks with mutex_lock_nested()
per the subsystem's object/lock hierarchy. For CXL, this primarily needs
the ability to lock ports by depth and child objects of ports by their
parent parent-port lock.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Ben Widawsky <ben.widawsky@intel.com>
Link: https://lore.kernel.org/r/164365853422.99383.1052399160445197427.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
In preparation for proving CXL subsystem usage of the device_lock()
order track the depth of ports with the expectation that shallower port
locks can be held over deeper port locks.
Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/164298419321.3018233.4469731547378993606.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Add wrappers for the creation of decoder objects at the root level and
switch level, and keep the core helper private to cxl/core/port.c. Root
decoders are static descriptors conveyed from platform firmware (e.g.
ACPI CFMWS). Switch decoders are CXL standard decoders enumerated via
the HDM decoder capability structure. The base address for the HDM
decoder capability structure may be conveyed either by PCIe or platform
firmware (ACPI CEDT.CHBS).
Additionally, the kdoc descriptions for these helpers and their
dependencies is updated.
Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
[djbw: fixup changelog, clarify kdoc]
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/164366463014.111117.9714595404002687111.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
CXL decoders manage address ranges in a hierarchical fashion whereby a
leaf is a unique subregion of its parent decoder (midlevel or root). It
therefore makes sense to use the resource API for handling this.
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> (v1)
Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Link: https://lore.kernel.org/r/164298417191.3018233.5201055578165414714.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Many CXL drivers simply want to register and unregister themselves.
module_driver already supported this. A simple wrapper around that
reduces a decent amount of boilerplate in upcoming patches.
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/164298415591.3018233.13608495220547681412.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
This implements the TODO in cxl_acpi for mapping component registers.
cxl_acpi becomes the second consumer of CXL register block enumeration
(cxl_pci being the first). Moving the functionality to cxl_core allows
both of these drivers to use the functionality. Equally importantly it
allows cxl_core to use the functionality in the future.
CXL 2.0 root ports are similar to CXL 2.0 Downstream Ports with the main
distinction being they're a part of the CXL 2.0 host bridge. While
mapping their component registers is not immediately useful for the CXL
drivers, the movement of register block enumeration into core is a vital
step towards HDM decoder programming.
Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
[djbw: fix cxl_regmap_to_base() failure cases]
Link: https://lore.kernel.org/r/164298415080.3018233.14694957480228676592.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Commit 48667f6761 ("cxl/core: Split decoder setup into alloc + add")
aimed to fix a large stack frame warning but from v5 to v6, it
introduced a new instance of the warning due to allocating
cxld_const_init on the stack, which was done due to the use of const on
the nr_target member of the cxl_decoder struct. With ARCH=arm
allmodconfig minus CONFIG_KASAN:
GCC 11.2.0:
drivers/cxl/core/bus.c: In function ‘cxl_decoder_alloc’:
drivers/cxl/core/bus.c:523:1: error: the frame size of 1032 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
523 | }
| ^
cc1: all warnings being treated as errors
Clang 12.0.1:
drivers/cxl/core/bus.c:486:21: error: stack frame size of 1056 bytes in function 'cxl_decoder_alloc' [-Werror,-Wframe-larger-than=]
struct cxl_decoder *cxl_decoder_alloc(struct cxl_port *port, int nr_targets)
^
1 error generated.
Revert that part of the change, which makes the stack frame of
cxl_decoder_alloc() much more reasonable.
Fixes: 48667f6761 ("cxl/core: Split decoder setup into alloc + add")
Link: https://github.com/ClangBuiltLinux/linux/issues/1539
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20211210213627.2477370-1-nathan@kernel.org
Signed-off-by: Dan Williams <dan.j.williams@intel.com>