irq_domain_create_simple() and irq_domain_create_legacy() use
__irq_domain_instantiate(), but have extra handling of allocating interrupt
descriptors and associating interrupts in them. Some of that is duplicated.
There are also call sites which have conditonals to invoke different
interrupt domain creator functions, where one of them is usually
irq_domain_create_legacy(). Alternatively they associate the interrupts for
the legacy case after creating the domain.
Moving the extra logic of irq_domain_create_simple()/legacy() into
__irq_domain_instantiate() allows to consolidate that.
Introduce hwirq_base and virq_base members in the irq_domain_info
structure, which allows to transport the required information and add the
conditional interrupt descriptor allocation and interrupt association into
__irq_domain_instantiate().
This reduces irq_domain_create_legacy() and irq_domain_create_simple() to
trivial wrappers which fill in the info structure and allows call sites
which must support the legacy case along with more modern mechanism to
select the domain type via the parameters of the info struct.
[ tglx: Massaged change log ]
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Matti Vaittinen <mazziesaccount@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/32d07bd79eb2b5416e24da9e9e8fe5955423dcf9.1723120028.git.mazziesaccount@gmail.com
Currently, whenever a caller is providing an affinity hint for an
interrupt, the allocation code uses it to calculate the node and copies the
cpumask into irq_desc::affinity.
If the affinity for the interrupt is not marked 'managed' then the startup
of the interrupt ignores irq_desc::affinity and uses the system default
affinity mask.
Prevent this by setting the IRQD_AFFINITY_SET flag for the interrupt in the
allocator, which causes irq_setup_affinity() to use irq_desc::affinity on
interrupt startup if the mask contains an online CPU.
[ tglx: Massaged changelog ]
Fixes: 45ddcecbfa ("genirq: Use affinity hint in irqdesc allocation")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/all/20240806072044.837827-1-shayd@nvidia.com
Various PCI controllers that mux MSIs onto a single IRQ line produce these
"IRQ%d: set affinity failed" warnings when entering suspend. This has been
discussed before [1] [2] and an example test case is included at the end of
this commit message.
Controller drivers that create MSI IRQ domain with
MSI_FLAG_USE_DEF_CHIP_OPS and do not override the .irq_set_affinity()
irqchip callback get assigned the default msi_domain_set_affinity()
callback. That is not desired on controllers where it is not possible to
set affinity of each MSI IRQ line to a specific CPU core due to hardware
limitation.
Introduce flag MSI_FLAG_NO_AFFINITY, which keeps .irq_set_affinity() unset
if the controller driver did not assign it. This way, migrate_one_irq()
can exit right away, without printing the warning. The .irq_set_affinity()
implementations which only return -EINVAL can be removed from multiple
controller drivers.
$ grep 25 /proc/interrupts
25: 0 0 0 0 0 0 0 0 PCIe MSI 0 Edge PCIe PME
$ echo core > /sys/power/pm_test ; echo mem > /sys/power/state
...
Disabling non-boot CPUs ...
IRQ25: set affinity failed(-22). <---------- This is being silenced here
psci: CPU7 killed (polled 4 ms)
...
[1] https://lore.kernel.org/all/d4a6eea3c5e33a3a4056885419df95a7@kernel.org/
[2] https://lore.kernel.org/all/5f4947b18bf381615a37aa81c2242477@kernel.org/
Link: https://lore.kernel.org/r/20240723132958.41320-2-marek.vasut+renesas@mailbox.org
Signed-off-by: Marek Vasut <marek.vasut+renesas@mailbox.org>
[bhelgaas: commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
This simplifies the handling of platform MSI and wire to MSI controllers
and removes about 500 lines of legacy code.
Aside of that it paves the way for ARM/ARM64 to utilize the dynamic
allocation of PCI/MSI interrupts and to support the upcoming non
standard IMS (Interrupt Message Store) mechanism on PCIe devices
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmaeheUTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYocX4D/wLYD+DQDpA3U1XS8jPNE4vKcmBBNX8
Mj4qdHsY8fK+FhmtLsj8FL3iSTymPtgXzFupXGS+5iFG3LhbW8JWEbqjbowcJ1c8
/4w8sKyyWdCSScrCTrH4A3RrLNDAX3DzSMqqi17sdETuwtN0RJiXgcm/CwRXETmn
kVqB7ddalyAR0Z2N/ym1fkuwyBAdeu3cBxMy/BWR6GFae1dAGe8Kr8GsmmuzBTFi
DQSmkh6kZntTn9y+K7juXF+1q8InolmHiOOUeoUJachSCyp6nu9W2+S2MVUiuOA2
X1/Ei3eKvkBHFDd7phZnIrVecuNehAQEV6BRMKOYBiDG4lwD6vCbbr9/YF5vBGni
tbZAetk9VBpIj0YRVAz7WkLC2JmVbw4znlrDwe8+xeLeDwRXl9f4Xc1Udr0qKgpd
1bNE1zG1z45v5J3OtJLJ4MCYcUCsQgv1CkUlNEdz5+NhXHT+W+oKJor/0WYJ3Qwe
iqTEJ9BA1/SzvngwIt/uoMZlEjBl/0/T1UEMJvP/7oEqjl/UAEWGlpKnID3hsDc2
GcIEOJod6hWzyPyeJUI6RpCHy4ZG93WL7Ks+lvzfp381yoDL5/KlveDtSomyuzYF
2xXHUAvw8MAYfJ/CFft/DYme8sBpn1cxAMWdctEiAn0qfS7X1RNZ/RhQ2OXxRw3q
tNpc0jEen9m72A==
=2adH
-----END PGP SIGNATURE-----
Merge tag 'irq-msi-2024-07-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull MSI interrupt updates from Thomas Gleixner:
"Switch ARM/ARM64 over to the modern per device MSI domains.
This simplifies the handling of platform MSI and wire to MSI
controllers and removes about 500 lines of legacy code.
Aside of that it paves the way for ARM/ARM64 to utilize the dynamic
allocation of PCI/MSI interrupts and to support the upcoming non
standard IMS (Interrupt Message Store) mechanism on PCIe devices"
* tag 'irq-msi-2024-07-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
irqchip/gic-v3-its: Correctly fish out the DID for platform MSI
irqchip/gic-v3-its: Correctly honor the RID remapping
genirq/msi: Move msi_device_data to core
genirq/msi: Remove platform MSI leftovers
irqchip/irq-mvebu-icu: Remove platform MSI leftovers
irqchip/irq-mvebu-sei: Switch to MSI parent
irqchip/mvebu-odmi: Switch to parent MSI
irqchip/mvebu-gicp: Switch to MSI parent
irqchip/irq-mvebu-icu: Prepare for real per device MSI
irqchip/imx-mu-msi: Switch to MSI parent
irqchip/gic-v2m: Switch to device MSI
irqchip/gic_v3_mbi: Switch over to parent domain
genirq/msi: Remove platform_msi_create_device_domain()
irqchip/mbigen: Remove platform_msi_create_device_domain() fallback
irqchip/gic-v3-its: Switch platform MSI to MSI parent
irqchip/irq-msi-lib: Prepare for DOMAIN_BUS_WIRED_TO_MSI
irqchip/mbigen: Prepare for real per device MSI
irqchip/irq-msi-lib: Prepare for DEVICE MSI to replace platform MSI
irqchip/gic-v3-its: Provide MSI parent for PCI/MSI[-X]
irqchip/irq-msi-lib: Prepare for PCI MSI/MSIX
...
- Core:
- Provide a new mechanism to create interrupt domains. The existing
interfaces have already too many parameters and it's a pain to expand
any of this for new required functionality.
The new function takes a pointer to a data structure as argument. The
data structure combines all existing parameters and allows for easy
extension.
The first extension for this is to handle the instantiation of
generic interrupt chips at the core level and to allow drivers to
provide extra init/exit callbacks.
This is necessary to do the full interrupt chip initialization before
the new domain is published, so that concurrent usage sites won't see
a half initialized interrupt domain. Similar problems exist on
teardown.
This has turned out to be a real problem due to the deferred and
parallel probing which was added in recent years.
Handling this at the core level allows to remove quite some accrued
boilerplate code in existing drivers and avoids horrible workarounds
at the driver level.
- The usual small improvements all over the place
- Drivers
- Add support for LAN966x OIC and RZ/Five SoC
- Split the STM ExtI driver into a microcontroller and a SMP version to
allow building the latter as a module for multi-platform kernels.
- Enable MSI support for Armada 370XP on platforms which do not support
IPIs.
- The usual small fixes and enhancements all over the place.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmaVJbUTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoXTuD/9Tc9BhY5CW7HQkdPQu2Db1O+esprkQ
Uo9lMpTTpPiy9btg4LONzLf4mjbufZpyKBxkRWoZFO0Zj5q4UE9NZYh7EcxrF5Tl
CIFJmyteLsYuOyCmPrtSDSovonXjQKYBE3u2LVJNNkwEkhYbYW9sqIKeT8nneLv6
53gd28ESFUEUjHNTblw/eXviweyUKSXc0qyg+3hgZQPMoh9RkdkEPvyaw9Y/s5Ce
FelLLxzMqX86dR2TJMLqiaGiMpUu/kl+Yz2m5c77TwA2D68qjhHywbtKtlH7b3C6
LMHu2dMrrKSJrLL8roVIYJdHAd1TKWVdnYhqv9WBHFTu1sDuztpR44mewbo8exUU
L2RgVSGYNmeFC3p4wztWYSQfIVa9uOg7+TnJJdh7G0jLIeKM/TbufWqDAJAuoVPL
QhGbZ5xNbZJZ8bvhhItjxpRN/kPs44p3mUGyRJBQzm+mDN118bqfmQzhLcwRbfE2
smp73SQzg9alG2rGdNVEqkKmp8zhg2Crx2VCeVdgbeOxWQRet9zLWcp4FfCEUE9e
eK3iEi8z+rmwafaf3rsxYdrdIRLaUmcni0v7R/16cJH/Cs7bU3Re8XyGhevo3lsO
pJiP5wZDxbckwXNpLm3S/qPDW7vSCnuFPF7QmOvC3a70PsD+E4NKUgiwJuHtn/ZV
pFBKzbQgCsowQA==
=QCRH
-----END PGP SIGNATURE-----
Merge tag 'irq-core-2024-07-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull interrupt subsystem updates from Thomas Gleixner:
"Core:
- Provide a new mechanism to create interrupt domains. The existing
interfaces have already too many parameters and it's a pain to
expand any of this for new required functionality.
The new function takes a pointer to a data structure as argument.
The data structure combines all existing parameters and allows for
easy extension.
The first extension for this is to handle the instantiation of
generic interrupt chips at the core level and to allow drivers to
provide extra init/exit callbacks.
This is necessary to do the full interrupt chip initialization
before the new domain is published, so that concurrent usage sites
won't see a half initialized interrupt domain. Similar problems
exist on teardown.
This has turned out to be a real problem due to the deferred and
parallel probing which was added in recent years.
Handling this at the core level allows to remove quite some accrued
boilerplate code in existing drivers and avoids horrible
workarounds at the driver level.
- The usual small improvements all over the place
Drivers:
- Add support for LAN966x OIC and RZ/Five SoC
- Split the STM ExtI driver into a microcontroller and a SMP version
to allow building the latter as a module for multi-platform
kernels
- Enable MSI support for Armada 370XP on platforms which do not
support IPIs
- The usual small fixes and enhancements all over the place"
* tag 'irq-core-2024-07-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (59 commits)
irqdomain: Fix the kernel-doc and plug it into Documentation
genirq: Set IRQF_COND_ONESHOT in request_irq()
irqchip/imx-irqsteer: Handle runtime power management correctly
irqchip/gic-v3: Pass #redistributor-regions to gic_of_setup_kvm_info()
irqchip/bcm2835: Enable SKIP_SET_WAKE and MASK_ON_SUSPEND
irqchip/gic-v4: Make sure a VPE is locked when VMAPP is issued
irqchip/gic-v4: Substitute vmovp_lock for a per-VM lock
irqchip/gic-v4: Always configure affinity on VPE activation
Revert "irqchip/dw-apb-ictl: Support building as module"
Revert "Loongarch: Support loongarch avec"
arm64: Kconfig: Allow build irq-stm32mp-exti driver as module
ARM: stm32: Allow build irq-stm32mp-exti driver as module
irqchip/stm32mp-exti: Allow building as module
irqchip/stm32mp-exti: Rename internal symbols
irqchip/stm32-exti: Split MCU and MPU code
arm64: Kconfig: Select STM32MP_EXTI on STM32 platforms
ARM: stm32: Use different EXTI driver on ARMv7m and ARMv7a
irqchip/stm32-exti: Add CONFIG_STM32MP_EXTI
irqchip/dw-apb-ictl: Support building as module
irqchip/riscv-aplic: Simplify the initialization code
...
Now that the platform MSI hack is gone, nothing needs to know about struct
msi_device_data outside of the core code.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Shivamurthy Shastri <shivamurthy.shastri@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240623142236.003295177@linutronix.de
- Intel PT support enhancements & fixes
- Fix leaked SIGTRAP events
- Improve and fix the Intel uncore driver
- Add support for Intel HBM and CXL uncore counters
- Add Intel Lake and Arrow Lake support
- AMD uncore driver fixes
- Make SIGTRAP and __perf_pending_irq() work on RT
- Micro-optimizations
- Misc cleanups and fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmaWjncRHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1iZyg//TSafjCK4N9fyXrPdPqf8L7ntX5uYf0rd
uVZpEo/+VGvuFhznHnZIV2DLetvuwYZcUWszCqQMYfokGGi6WI1/k4MeZkSpN5QE
p5mFk6gW3cmpHT9bECg7mKQH+w7Qna/b6mnA0HYTFxPGmQKdQDl1/S+ZsgWedxpC
4V3re7/FzenFVS45DwSMPi9s7uZzZhVhTSgb4XLy+0Da4S0iRULItBa8HT8HmqE5
v5aQlw3mmwKPUWvyPMi3Sw6RRWK3C+n5ZxWswSYoLSM3dsp1ZD+YYqtOv2GqAx8v
JoL0SOnGnNCfxGHh0kz5D2hztDvq61Enotih2gz7HxvdWh2DasNp4yS1USGQhu5h
VJnKNA0TfOUaYqWFVj0EgRVhDX79lMwSHTkR1DZd4vM2GDigHeRPh0zGSn2w/koV
oCRxFfBoktHBnX0Te1NE2BhojbuKp25vTGK6GriVcHt/RNpuz6hTxsjdJzHCAlVX
M349l0EpUJafvfaIN9zF22uw22J8P9y9JYqI6ebkUIKiuoT9LuafVYhQupSE9H4u
IqlozPCTNw6eAQcUo03gkl3n+SY/DZH6eU2ycKgEp3r7TDGYbJPwxY1BgOHbwi4U
lySM07leso2accSVAz7GDMI3ejj6Sx64asWS1FSwbajDflouaIK2jtey+1IOdXfv
hHY65tomV8U=
=gguT
-----END PGP SIGNATURE-----
Merge tag 'perf-core-2024-07-16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull performance events updates from Ingo Molnar:
- Intel PT support enhancements & fixes
- Fix leaked SIGTRAP events
- Improve and fix the Intel uncore driver
- Add support for Intel HBM and CXL uncore counters
- Add Intel Lake and Arrow Lake support
- AMD uncore driver fixes
- Make SIGTRAP and __perf_pending_irq() work on RT
- Micro-optimizations
- Misc cleanups and fixes
* tag 'perf-core-2024-07-16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits)
perf/x86/intel: Add a distinct name for Granite Rapids
perf/x86/intel/ds: Fix non 0 retire latency on Raptorlake
perf/x86/intel: Hide Topdown metrics events if the feature is not enumerated
perf/x86/intel/uncore: Fix the bits of the CHA extended umask for SPR
perf: Split __perf_pending_irq() out of perf_pending_irq()
perf: Don't disable preemption in perf_pending_task().
perf: Move swevent_htable::recursion into task_struct.
perf: Shrink the size of the recursion counter.
perf: Enqueue SIGTRAP always via task_work.
task_work: Add TWA_NMI_CURRENT as an additional notify mode.
perf: Move irq_work_queue() where the event is prepared.
perf: Fix event leak upon exec and file release
perf: Fix event leak upon exit
task_work: Introduce task_work_cancel() again
task_work: s/task_work_cancel()/task_work_cancel_func()/
perf/x86/amd/uncore: Fix DF and UMC domain identification
perf/x86/amd/uncore: Avoid PMU registration if counters are unavailable
perf/x86/intel: Support Perfmon MSRs aliasing
perf/x86/intel: Support PERFEVTSEL extension
perf/x86: Add config_mask to represent EVENTSEL bitmask
...
A proper task_work_cancel() API that actually cancels a callback and not
*any* callback pointing to a given function is going to be needed for
perf events event freeing. Do the appropriate rename to prepare for
that.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240621091601.18227-2-frederic@kernel.org
Currently users of the interrupt simulator don't have any way of being
notified about interrupts from the simulated domain being requested or
released. This causes a problem for one of the users - the GPIO
simulator - which is unable to lock the pins as interrupts.
Define a structure containing callbacks to be executed on various
irq_sim-related events (for now: irq request and release) and provide an
extended function for creating simulated interrupt domains that takes it
and a pointer to custom user data (to be passed to said callbacks) as
arguments.
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/20240624093934.17089-2-brgl@bgdev.pl
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Modify the comment formatting in irq_find_matching_fwspec function to
enhance code readability and maintain consistency.
Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Shivamurthy Shastri <shivamurthy.shastri@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240614102403.13610-2-shivamurthy.shastri@linutronix.de
Domain creation functions use __irq_domain_add(). With the introduction
of irq_domain_instantiate(), __irq_domain_add() becomes obsolete.
In order to fully remove __irq_domain_add(), convert domain
creation function to irq_domain_instantiate()
Signed-off-by: Herve Codina <herve.codina@bootlin.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240614173232.1184015-19-herve.codina@bootlin.com
The current API functions create an irq_domain and also publish this
newly created to domain. Once an irq_domain is published, consumers can
request IRQ in order to use them.
Some interrupt controller drivers have to perform some more operations
with the created irq_domain in order to have it ready to be used.
For instance:
- Allocate generic irq chips with irq_alloc_domain_generic_chips()
- Retrieve the generic irq chips with irq_get_domain_generic_chip()
- Initialize retrieved chips: set register base address and offsets,
set several hooks such as irq_mask, irq_unmask, ...
With the newly introduced irq_domain_alloc_generic_chips(), an interrupt
controller driver can use the irq_domain_chip_generic_info structure and
set the init() hook to perform its generic chips initialization.
In order to avoid a window where the domain is published but not yet
ready to be used, handle the generic chip creation (i.e the
irq_domain_alloc_generic_chips() call) before the domain is published.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Herve Codina <herve.codina@bootlin.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240614173232.1184015-16-herve.codina@bootlin.com
Most of generic chip drivers need to perform some more additional
initializations on the generic chips allocated before they can be fully
ready.
These additional initializations need to be performed before the IRQ
domain is published to avoid a race condition between IRQ consumers and
suppliers.
Introduce the init() hook to perform these initializations at the right
place just after the generic chip creation. Also introduce the exit() hook
to allow reverting operations done by the init() hook just before the
generic chip is destroyed.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Herve Codina <herve.codina@bootlin.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240614173232.1184015-15-herve.codina@bootlin.com
The existing __irq_alloc_domain_generic_chips() uses a bunch of parameters
to describe the generic chips that need to be allocated.
Adding more parameters and wrappers to hide new parameters in the existing
code leads to more and more code without any relevant values and without
any flexibility.
Introduce irq_domain_alloc_generic_chips() where the generic chips
description is done using the irq_domain_chip_generic_info structure
instead of the bunch of parameters to allow flexibility and easy evolution.
Also introduce irq_domain_remove_generic_chips() to revert the operations
done by irq_domain_alloc_generic_chips().
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Herve Codina <herve.codina@bootlin.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240614173232.1184015-14-herve.codina@bootlin.com
The current API does not allow additional initialization before the
domain is published. This can lead to a race condition between consumers
and supplier as a domain can be available for consumers before being
fully ready.
Introduce the init() hook to allow additional initialization before
plublishing the domain. Also introduce the exit() hook to revert
operations done in init() on domain removal.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Herve Codina <herve.codina@bootlin.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240614173232.1184015-13-herve.codina@bootlin.com
irq_domain_update_bus_token() is the only way to set the domain bus
token. This is sub-optimal as irq_domain_update_bus_token() can be called
only once the domain is created and needs to revert some operations, change
the domain name and redo the operations.
In order to avoid this revert/change/redo sequence, take the domain bus
into account token during the domain creation.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Herve Codina <herve.codina@bootlin.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240614173232.1184015-12-herve.codina@bootlin.com
__irq_domain_create() can fail for several reasons. When it fails it
returns a NULL pointer and so filters out the exact failure reason.
The only user of __irq_domain_create() is irq_domain_instantiate() which
can return a PTR_ERR value. On __irq_domain_create() failure, it uses an
arbitrary error code.
Rather than using this arbitrary error value, make __irq_domain_create()
return is own error code and use that one.
[ tglx: Remove the pointless ERR_CAST. domain is a valid return pointer ]
Signed-off-by: Herve Codina <herve.codina@bootlin.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240614173232.1184015-11-herve.codina@bootlin.com
irq_domain_instantiate() handles all needs to be used in
irq_domain_create_hierarchy()
Avoid code duplication and use directly irq_domain_instantiate() for
hierarchy domain creation.
Signed-off-by: Herve Codina <herve.codina@bootlin.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240614173232.1184015-10-herve.codina@bootlin.com
To use irq_domain_instantiate() from irq_domain_create_hierarchy(),
irq_domain_instantiate() needs to handle the domain hierarchy parent.
Add the required functionality.
Signed-off-by: Herve Codina <herve.codina@bootlin.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240614173232.1184015-9-herve.codina@bootlin.com
In order to use irq_domain_instantiate() from several places such as
irq_domain_create_hierarchy(), irq_domain_instantiate() needs to handle
additional domain flags.
Add the required infrastructure.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Herve Codina <herve.codina@bootlin.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240614173232.1184015-8-herve.codina@bootlin.com
The existing __irq_domain_create() use a bunch of parameters to create
an irq domain.
With the introduction of irq_domain_info structure, these parameters are
available in the information structure itself.
Using directly this information structure allows future flexibility to
add other parameters in a simple way without the need to change the
__irq_domain_create() prototype.
Convert __irq_domain_create() to use the information structure.
[ tglx: Fixup struct initializer ]
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Herve Codina <herve.codina@bootlin.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240614173232.1184015-7-herve.codina@bootlin.com
The interrupt domain name computation and setting is directly done in
__irq_domain_create(). This leads to a quite long __irq_domain_create()
function.
In order to simplify __irq_domain_create() and isolate the domain name
computation and setting, move the related operations to a dedicated
function.
Signed-off-by: Herve Codina <herve.codina@bootlin.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240614173232.1184015-6-herve.codina@bootlin.com
The existing irq_domain_add_*() functions used to instantiate an IRQ
domain are wrappers built on top of __irq_domain_add() and describe the
domain properties using a bunch of parameters.
Adding more parameters and wrappers to hide new parameters in the
existing code lead to more and more code without any relevant value and
without any flexibility.
Introduce irq_domain_instantiate() where the interrupt domain properties
are given using a irq_domain_info structure instead of the bunch of
parameters to allow flexibility and easy evolution.
irq_domain_instantiate() performs the same operation as the one done by
__irq_domain_add(). For compatibility reason with existing code, keep
__irq_domain_add() but convert it to irq_domain_instantiate().
[ tglx: Fixed up struct initializer coding style ]
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Herve Codina <herve.codina@bootlin.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240614173232.1184015-3-herve.codina@bootlin.com
In preparation of the introduction of the irq domain instantiation,
introduce irq_domain_free() to avoid code duplication on later
modifications.
This new function is an extraction of the current operations performed
to free the irq domain. No functional change intended.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Herve Codina <herve.codina@bootlin.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240614173232.1184015-2-herve.codina@bootlin.com
fwnode_handle_get(fwnode) is called when a domain is created with fwnode
passed as a function parameter. fwnode_handle_put(domain->fwnode) is called
when the domain is destroyed but during the creation a path exists that
does not set domain->fwnode.
If this path is taken, the fwnode get will never be put.
To avoid the unbalanced get and put, set domain->fwnode unconditionally.
Fixes: d59f6617ee ("genirq: Allow fwnode to carry name information only")
Signed-off-by: Herve Codina <herve.codina@bootlin.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240614173232.1184015-4-herve.codina@bootlin.com
During compilation, several warning of the following form were raised:
Function parameter or struct member 'x' not described in 'yyy'
Add the missing function parameter descriptions.
Signed-off-by: Herve Codina <herve.codina@bootlin.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240527161450.326615-10-herve.codina@bootlin.com
Improve the readability of irqdomain debugging information in debugfs by
printing the flags field of domain files as human-readable strings instead
of a raw bitmask, which aligned with the existing style used for irqchip
flags in the irq debug files.
Before:
#cat :cpus:cpu@0:interrupt-controller
name: :cpus:cpu@0:interrupt-controller
size: 0
mapped: 2
flags: 0x00000003
After:
#cat :cpus:cpu@0:interrupt-controller
name: :cpus:cpu@0:interrupt-controller
size: 0
mapped: 3
flags: 0x00000003
IRQ_DOMAIN_FLAG_HIERARCHY
IRQ_DOMAIN_NAME_ALLOCATED
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240529091628.3666379-1-ruanjinjie@huawei.com
Interrupts which have no action and chained interrupts can be
ignored due to the following reasons (as per tglx's comment):
1) Interrupts which have no action are completely uninteresting as
there is no real information attached.
2) Chained interrupts do not have a count at all.
So there is no point to evaluate the number of accounted interrupts before
checking for non-requested or chained interrupts.
Remove the any_count logic and simply check whether the interrupt
descriptor has the kstat_irqs member populated.
[ tglx: Adapted to upstream changes ]
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Adrian Huang <ahuang12@lenovo.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jiwei Sun <sunjw10@lenovo.com>
Link: https://lore.kernel.org/r/20240515100632.1419-1-ahuang12@lenovo.com
Link: https://lore.kernel.org/lkml/87h6f0knau.ffs@tglx/
irq_find_at_or_after() dereferences the interrupt descriptor which is
returned by mt_find() while neither holding sparse_irq_lock nor RCU read
lock, which means the descriptor can be freed between mt_find() and the
dereference:
CPU0 CPU1
desc = mt_find()
delayed_free_desc(desc)
irq_desc_get_irq(desc)
The use-after-free is reported by KASAN:
Call trace:
irq_get_next_irq+0x58/0x84
show_stat+0x638/0x824
seq_read_iter+0x158/0x4ec
proc_reg_read_iter+0x94/0x12c
vfs_read+0x1e0/0x2c8
Freed by task 4471:
slab_free_freelist_hook+0x174/0x1e0
__kmem_cache_free+0xa4/0x1dc
kfree+0x64/0x128
irq_kobj_release+0x28/0x3c
kobject_put+0xcc/0x1e0
delayed_free_desc+0x14/0x2c
rcu_do_batch+0x214/0x720
Guard the access with a RCU read lock section.
Fixes: 721255b982 ("genirq: Use a maple tree for interrupt descriptor management")
Signed-off-by: dicken.ding <dicken.ding@mediatek.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240524091739.31611-1-dicken.ding@mediatek.com
The absence of IRQD_MOVE_PCNTXT prevents immediate effectiveness of
interrupt affinity reconfiguration via procfs. Instead, the change is
deferred until the next instance of the interrupt being triggered on the
original CPU.
When the interrupt next triggers on the original CPU, the new affinity is
enforced within __irq_move_irq(). A vector is allocated from the new CPU,
but the old vector on the original CPU remains and is not immediately
reclaimed. Instead, apicd->move_in_progress is flagged, and the reclaiming
process is delayed until the next trigger of the interrupt on the new CPU.
Upon the subsequent triggering of the interrupt on the new CPU,
irq_complete_move() adds a task to the old CPU's vector_cleanup list if it
remains online. Subsequently, the timer on the old CPU iterates over its
vector_cleanup list, reclaiming old vectors.
However, a rare scenario arises if the old CPU is outgoing before the
interrupt triggers again on the new CPU.
In that case irq_force_complete_move() is not invoked on the outgoing CPU
to reclaim the old apicd->prev_vector because the interrupt isn't currently
affine to the outgoing CPU, and irq_needs_fixup() returns false. Even
though __vector_schedule_cleanup() is later called on the new CPU, it
doesn't reclaim apicd->prev_vector; instead, it simply resets both
apicd->move_in_progress and apicd->prev_vector to 0.
As a result, the vector remains unreclaimed in vector_matrix, leading to a
CPU vector leak.
To address this issue, move the invocation of irq_force_complete_move()
before the irq_needs_fixup() call to reclaim apicd->prev_vector, if the
interrupt is currently or used to be affine to the outgoing CPU.
Additionally, reclaim the vector in __vector_schedule_cleanup() as well,
following a warning message, although theoretically it should never see
apicd->move_in_progress with apicd->prev_cpu pointing to an offline CPU.
Fixes: f0383c24b4 ("genirq/cpuhotplug: Add support for cleaning up move in progress")
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240522220218.162423-1-dongli.zhang@oracle.com
- The vfio fsl-mc bus driver has become orphaned. We'll consider
removing it in future releases if a new maintainer isn't found.
(Alex Williamson)
- Improved usage of opaque data in vfio-pci INTx handling,
avoiding lookups of the eventfd through the interrupt and
irqfd runtime paths. (Alex Williamson)
- Resolve an error path memory leak introduced in vfio-pci
interrupt code. (Ye Bin)
- Addition of interrupt support for vfio devices exposed on the
CDX bus, including a new MSI allocation helper and export of
existing helpers for MSI alloc and free. (Nipun Gupta)
- A new vfio-pci variant driver supporting migration of Intel
QAT VF devices for the GEN4 PFs. (Xin Zeng & Yahui Cao)
- Resolve a possibly circular locking dependency in vfio-pci
by avoiding copy_to_user() from a PCI bus walk callback.
(Alex Williamson)
- Trivial docs update to remove a duplicate semicolon.
(Foryun Ma)
-----BEGIN PGP SIGNATURE-----
iQJPBAABCAA5FiEEQvbATlQL0amee4qQI5ubbjuwiyIFAmZLhtUbHGFsZXgud2ls
bGlhbXNvbkByZWRoYXQuY29tAAoJECObm247sIsikU4P/jzHWOU9OvpP30c1r6me
ez8V7JIGmAtLI0ci69uqn0B86h1nLAAmLg8QvcTco9s0a+4Pb3QGUmLfA6niZLUV
Ji7Z4c3Df4v6Kxzjg4e2Sb8rSvdzehV+WNB+kQ4lEGPyx7OvfiR6lHi2WYzAjm4M
lcmZCH5Y0URQ+wMSEHZcuom4OOSfHULvOovHuvN9CFyuZfEpVmA57MhAGiCNhXcD
Nr2KMADt7K2xDtfCv84ezx2kw6MP3mTQiWOwN1HHLEI5IW+pnv3DTaPnEn6KdTcn
zRHDu9a3uUnE4/HsuiAkMeOX046NYLHhZRls4IjligcjB8Es53nA3iSVm1sJL9RT
Nos/FubSuZ2TJ9AEkiqLRujSJiq40ALRC1qccjyN4a6pgmWSBe/3lbOHukPjAQ2K
6BmmO3tB/3wLSSbSumojar385NvyzGOQCOVHKTXgoqK7KFJpTQqsxT9GqwMdOQ+O
6nSOzfcnliTGQZ5GFuUVieFeOb6R2U7dQLT42pgBPIvToidjdfEcBRvL0SlvQbQe
HuyQ/Rx4XQ9tHHjSlOw6GEsiNsgY8TsmX+lqrCEc4G15nRLCHMp7RRh7gWz08y+g
/JqeB872zsKNiIlgnaskxmDA5iRZjPLdCu+85H7pZzegLC1NVhVrJJehR3LgleDQ
3WGxxjFNl1gKOGubhiUgd/B7
=EFpj
-----END PGP SIGNATURE-----
Merge tag 'vfio-v6.10-rc1' of https://github.com/awilliam/linux-vfio
Pull vfio updates from Alex Williamson:
- The vfio fsl-mc bus driver has become orphaned. We'll consider
removing it in future releases if a new maintainer isn't found (Alex
Williamson)
- Improved usage of opaque data in vfio-pci INTx handling, avoiding
lookups of the eventfd through the interrupt and irqfd runtime paths
(Alex Williamson)
- Resolve an error path memory leak introduced in vfio-pci interrupt
code (Ye Bin)
- Addition of interrupt support for vfio devices exposed on the CDX
bus, including a new MSI allocation helper and export of existing
helpers for MSI alloc and free (Nipun Gupta)
- A new vfio-pci variant driver supporting migration of Intel QAT VF
devices for the GEN4 PFs (Xin Zeng & Yahui Cao)
- Resolve a possibly circular locking dependency in vfio-pci by
avoiding copy_to_user() from a PCI bus walk callback (Alex
Williamson)
- Trivial docs update to remove a duplicate semicolon (Foryun Ma)
* tag 'vfio-v6.10-rc1' of https://github.com/awilliam/linux-vfio:
vfio/pci: Restore zero affected bus reset devices warning
vfio: remove an extra semicolon
vfio/pci: Collect hot-reset devices to local buffer
vfio/qat: Add vfio_pci driver for Intel QAT SR-IOV VF devices
vfio/cdx: add interrupt support
genirq/msi: Add MSI allocation helper and export MSI functions
vfio/pci: fix potential memory leak in vfio_intx_enable()
vfio/pci: Pass eventfd context object through irqfd
vfio/pci: Pass eventfd context to IRQ handler
MAINTAINERS: Orphan vfio fsl-mc bus driver
- Core code:
- Interrupt storm detection for the lockup watchdog:
Lockups which are caused by interrupt storms are not easy to debug
because there is no information about the events which make the lockup
detector trigger.
To make this more user friendly, provide an extenstion to interrupt
statistics which allows to take snapshots and an interface to retrieve
the delta to the snapshot. Use this new mechanism in the watchdog code
to do a two stage lockup analysis by taking the snapshot and printing
the deltas for the topmost active interrupts on the second trigger.
Note: This contains both the interrupt and the watchdog changes as
the latter depend on the former obviously.
- Avoid summation loops in the /proc/interrupts output and use the global
counter when possible
- Skip suspended interrupts on CPU hotplug operations to ensure that they
are not delivered before the system resumes the device drivers when
coming out of suspend.
- On CPU hot-unplug interrupts which are affine to the outgoing CPU are
migrated to a different CPU in the affinity mask. This can fail when
the CPUs have no vectors left. Instead of giving up try to migrate it
to any online CPU and thereby breaking the affinity setting in order to
prevent a stale device interrupt which targets an offline CPU
- The usual small cleanups
- Driver code:
- Support for the RISCV AIA MSI controller
- Make the interrupt allocation for the Loongson PCH controller more
flexible to prevent vector exhaustion
- The usual set of cleanups and fixes all over the place
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmZBCM0THHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoeZHEACqMLN3K+1HyWflYtcTHJeYCjZLHS77
2tQeKaaskOA4W6dcGXPxMw5CHqAobHVQQMqgcJxhUdqQiOJnFFnrtCD7JtqM0hWK
UORNbyeovuhAo+iJ0fTuS8p63H7vm2GIWwBLWJnOuChYv/6Yyx5Cald1skvyvbzL
zePhiiAf5mkdmJMeT5wJSCqEWSRYOXsVAJ/0YAwFG3bKkJH3bmDo6SDJY02sXT5P
pjbtD/0hum9wIVT4fNdYleHHQMdBdj9dLlcxXBikHq50mDMw7GxvjKiLcXmoerw3
rEBfVVJp3qpSofpNJZ3HH0ywcF3yUzq04/LPE9Tk2MoQ8NF0GzP8r9Ahke4B7cUj
FysWNiAlC2IisEi6th313FZkTLx0zgewdsdEBTLt8eAE9TU0wamRbo99LZ8i/Qr3
hk7jV8DzL+EDQJLgl4p1iPJgA708eW17tbCxLEa15VKVV6P58miohmhx/IfPO2Gx
FV1PPehtItsmiK/UoRtUCoFdFsqNQtOE+h8DWLyy8RDmhBqGbn9Ut4euXiQIF+rX
WJKPFfslCTR39BrBcZnZeNsgOCN7tEfFRstzjzkey1DaeTGWtxmA5UGhpC2vT74y
YyXluvZlgKr4S64ABmcqQj++hQLho0OQAih3uW5YVxt4VxEUcXYMJOsV1AQGpMjF
UnewWH5opBQdfw==
=jFLf
-----END PGP SIGNATURE-----
Merge tag 'irq-core-2024-05-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull interrupt subsystem updates from Thomas Gleixner:
"Core code:
- Interrupt storm detection for the lockup watchdog:
Lockups which are caused by interrupt storms are not easy to debug
because there is no information about the events which make the
lockup detector trigger.
To make this more user friendly, provide an extenstion to interrupt
statistics which allows to take snapshots and an interface to
retrieve the delta to the snapshot. Use this new mechanism in the
watchdog code to do a two stage lockup analysis by taking the
snapshot and printing the deltas for the topmost active interrupts
on the second trigger.
Note: This contains both the interrupt and the watchdog changes as
the latter depend on the former obviously.
- Avoid summation loops in the /proc/interrupts output and use the
global counter when possible
- Skip suspended interrupts on CPU hotplug operations to ensure that
they are not delivered before the system resumes the device drivers
when coming out of suspend.
- On CPU hot-unplug interrupts which are affine to the outgoing CPU
are migrated to a different CPU in the affinity mask. This can fail
when the CPUs have no vectors left. Instead of giving up try to
migrate it to any online CPU and thereby breaking the affinity
setting in order to prevent a stale device interrupt which targets
an offline CPU
- The usual small cleanups
Driver code:
- Support for the RISCV AIA MSI controller
- Make the interrupt allocation for the Loongson PCH controller more
flexible to prevent vector exhaustion
- The usual set of cleanups and fixes all over the place"
* tag 'irq-core-2024-05-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (51 commits)
irqchip/gic-v3-its: Remove BUG_ON in its_vpe_irq_domain_alloc
cpuidle: Avoid explicit cpumask allocation on stack
irqchip/sifive-plic: Avoid explicit cpumask allocation on stack
irqchip/riscv-aplic-direct: Avoid explicit cpumask allocation on stack
irqchip/loongson-eiointc: Avoid explicit cpumask allocation on stack
irqchip/gic-v3-its: Avoid explicit cpumask allocation on stack
irqchip/irq-bcm6345-l1: Avoid explicit cpumask allocation on stack
cpumask: Introduce cpumask_first_and_and()
irqchip/irq-brcmstb-l2: Avoid saving mask on shutdown
genirq: Reuse irq_is_nmi()
genirq/cpuhotplug: Retry with cpu_online_mask when migration fails
genirq/cpuhotplug: Skip suspended interrupts when restoring affinity
arm64: dts: st: Add interrupt parent to pinctrl on stm32mp251
arm64: dts: st: Add exti1 and exti2 nodes on stm32mp251
ARM: dts: stm32: List exti parent interrupts on stm32mp131
ARM: dts: stm32: List exti parent interrupts on stm32mp151
arm64: Kconfig.platforms: Enable STM32_EXTI for ARCH_STM32
irqchip/stm32-exti: Mark events reserved with RIF configuration check
irqchip/stm32-exti: Skip secure events
irqchip/stm32-exti: Convert driver to standard PM
...
When a CPU goes offline, the interrupts affine to that CPU are
re-configured.
Managed interrupts undergo either migration to other CPUs or shutdown if
all CPUs listed in the affinity are offline. The migration of managed
interrupts is guaranteed on x86 because there are interrupt vectors
reserved.
Regular interrupts are migrated to a still online CPU in the affinity mask
or if there is no online CPU to any online CPU.
This works as long as the still online CPUs in the affinity mask have
interrupt vectors available, but in case that none of those CPUs has a
vector available the migration fails and the device interrupt becomes
stale.
This is not any different from the case where the affinity mask does not
contain any online CPU, but there is no fallback operation for this.
Instead of giving up, retry the migration attempt with the online CPU mask
if the interrupt is not managed, as managed interrupts cannot be affected
by this problem.
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240423073413.79625-1-dongli.zhang@oracle.com
irq_restore_affinity_of_irq() restarts managed interrupts unconditionally
when the first CPU in the affinity mask comes online. That's correct during
normal hotplug operations, but not when resuming from S3 because the
drivers are not resumed yet and interrupt delivery is not expected by them.
Skip the startup of suspended interrupts and let resume_device_irqs() deal
with restoring them. This ensures that irqs are not delivered to drivers
during the noirq phase of resuming from S3, after non-boot CPUs are brought
back online.
Signed-off-by: David Stevens <stevensd@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240424090341.72236-1-stevensd@chromium.org
MSI functions for allocation and free can be directly used by
the device drivers without any wrapper provided by bus drivers.
So export these MSI functions.
Also, add a wrapper API to allocate MSIs providing only the
number of interrupts rather than range for simpler driver usage.
Signed-off-by: Nipun Gupta <nipun.gupta@amd.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240423111021.1686144-1-nipun.gupta@amd.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Since whether desc is NULL or desc->percpu_enabled is true, it returns
-EINVAL, check them together, and assign desc->percpu_affinity using a
ternary to simplify the code.
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240417085356.3785381-1-ruanjinjie@huawei.com
show_interrupts() unconditionally accumulates the per CPU interrupt
statistics to determine whether an interrupt was ever raised.
This can be avoided for all interrupts which are not strictly per CPU
and not of type NMI because those interrupts provide already an
accumulated counter. The required logic is already implemented in
kstat_irqs().
Split the inner access logic out of kstat_irqs() and use it for
kstat_irqs() and show_interrupts() to avoid the accumulation loop
when possible.
Originally-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Bitao Hu <yaoma@linux.alibaba.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Liu Song <liusong@linux.alibaba.com>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20240411074134.30922-4-yaoma@linux.alibaba.com
The soft lockup detector lacks a mechanism to identify interrupt storms as
root cause of a lockup. To enable this the detector needs a mechanism to
snapshot the interrupt count statistics on a CPU when the detector observes
a potential lockup scenario and compare that against the interrupt count
when it warns about the lockup later on. The number of interrupts in that
period give a hint whether the lockup might have been caused by an interrupt
storm.
Instead of having extra storage in the lockup detector and accessing the
internals of the interrupt descriptor directly, add a snapshot member to
the per CPU irq_desc::kstat_irq structure and provide interfaces to take a
snapshot of all interrupts on the current CPU and to retrieve the delta of
a specific interrupt later on.
Originally-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Bitao Hu <yaoma@linux.alibaba.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240411074134.30922-3-yaoma@linux.alibaba.com
The irq_desc::kstat_irqs member is a per-CPU variable of type int, which is
only capable of counting. A snapshot mechanism for interrupt statistics
will be added soon, which requires an additional variable to store the
snapshot.
To facilitate expansion, convert kstat_irqs here to a struct containing
only the count.
Originally-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Bitao Hu <yaoma@linux.alibaba.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240411074134.30922-2-yaoma@linux.alibaba.com
It's a bit hard to read the logic since the virq is used before checking it
for 0. Rearrange the code to make it better to understand.
This, in particular, should clearly answer the question whether the caller
needs to perform this check or not, and there are plenty of places for both
variants, confirming a confusion.
Fun fact that the new code is shorter:
Function old new delta
irq_dispose_mapping 278 271 -7
Total: Before=11625, After=11618, chg -0.06%
when compiled by GCC on Debian for x86_64.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240405190105.3932034-1-andriy.shevchenko@linux.intel.com
There is a problem when a driver requests a shared interrupt line to run a
threaded handler on it without IRQF_ONESHOT set if that flag has been set
already for the IRQ in question by somebody else. Namely, the request
fails which usually leads to a probe failure even though the driver might
have worked just fine with IRQF_ONESHOT, but it does not want to use it by
default. Currently, the only way to handle this is to try to request the
IRQ without IRQF_ONESHOT, but with IRQF_PROBE_SHARED set and if this fails,
try again with IRQF_ONESHOT set. However, this is a bit cumbersome and not
very clean.
When commit 7a36b901a6 ("ACPI: OSL: Use a threaded interrupt handler for
SCI") switched the ACPI subsystem over to using a threaded interrupt
handler for the SCI, it had to use IRQF_ONESHOT for it because that's
required due to the way the SCI handler works (it needs to walk all of the
enabled GPEs before the interrupt line can be unmasked). The SCI interrupt
line is not shared with other users very often due to the SCI handling
overhead, but on sone systems it is shared and when the other user of it
attempts to install a threaded handler, a flags mismatch related to
IRQF_ONESHOT may occur.
As it turned out, that happened to the pinctrl-amd driver and so commit
4451e8e841 ("pinctrl: amd: Add IRQF_ONESHOT to the interrupt request")
attempted to address the issue by adding IRQF_ONESHOT to the interrupt
flags in that driver, but this is now causing an IRQF_ONESHOT-related
mismatch to occur on another system which cannot boot as a result of it.
Clearly, pinctrl-amd can work with IRQF_ONESHOT if need be, but it should
not set that flag by default, so it needs a way to indicate that to the
interrupt subsystem.
To that end, introdcuce a new interrupt flag, IRQF_COND_ONESHOT, which will
only have effect when the IRQ line is shared and IRQF_ONESHOT has been set
for it already, in which case it will be promoted to the latter.
This is sufficient for drivers sharing the interrupt line with the SCI as
it is requested by the ACPI subsystem before any drivers are probed, so
they will always see IRQF_ONESHOT set for the interrupt in question.
Fixes: 4451e8e841 ("pinctrl: amd: Add IRQF_ONESHOT to the interrupt request")
Reported-by: Francisco Ayala Le Brun <francisco@videowindow.eu>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Cc: 6.8+ <stable@vger.kernel.org> # 6.8+
Closes: https://lore.kernel.org/lkml/CAN-StX1HqWqi+YW=t+V52-38Mfp5fAz7YHx4aH-CQjgyNiKx3g@mail.gmail.com/
Link: https://lore.kernel.org/r/12417336.O9o76ZdvQC@kreacher
- Core and platform-MSI
The core changes have been adopted from previous work which converted
ARM[64] to the new per device MSI domain model, which was merged to
support multiple MSI domain per device. The ARM[64] changes are being
worked on too, but have not been ready yet. The core and platform-MSI
changes have been split out to not hold up RISC-V and to avoid that
RISC-V builds on the scheduled for removal interfaces.
The core support provides new interfaces to handle wire to MSI bridges
in a straight forward way and introduces new platform-MSI interfaces
which are built on top of the per device MSI domain model.
Once ARM[64] is converted over the old platform-MSI interfaces and the
related ugliness in the MSI core code will be removed.
- Drivers:
- Add a new driver for the Andes hart-level interrupt controller
- Rework the SiFive PLIC driver to prepare for MSI suport
- Expand the RISC-V INTC driver to support the new RISC-V AIA
controller which provides the basis for MSI on RISC-V
- A few fixup for the fallout of the core changes.
The actual MSI parts for RISC-V were finalized late and have been
post-poned for the next merge window.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmXt7MsTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYofrMD/9Dag12ttmbE2uqzTzlTxc7RHC2MX5n
VJLt84FNNwGPA4r7WLOOqHrfuvfoGjuWT9pYMrVaXCglRG1CMvL10kHMB2f28UWv
Qpc5PzbJwpD6tqyfRSFHMoJp63DAI8IpS7J3I8bqnRD8+0PwYn3jMA1+iMZkH0B7
8uO3mxlFhQ7BFvIAeMEAhR0szuAfvXqEtpi1iTgQTrQ4Je4Rf1pmLjEe2rkwDvF4
p3SAmPIh4+F3IjO7vNsVkQ2yOarTP2cpSns6JmO8mrobLIVX7ZCQ6uVaVCfBhxfx
WttuJO6Bmh/I15yDe/waH6q9ym+0VBwYRWi5lonMpViGdq4/D2WVnY1mNeLRIfjl
X65aMWE1+bhiqyIIUfc24hacf0UgBIlMEW4kJ31VmQzb+OyLDXw+UvzWg1dO6XdA
3L6j1nRgHk0ea5yFyH6SfH/mrfeyqHuwHqo17KFyHxD3jM2H1RRMplpbwXiOIepp
KJJ/O06eMEzHqzn4B8GCT2EvX6L2ehgoWbLeEDNLQh/3LwA9OdcBzPr6gsweEl0U
Q7szJgUWZHeMr39F2rnt0GmvkEuu6muEp/nQzfnohjoYZ0PhpMLSq++4Gi+Ko3fz
2IyecJ+tlbSfyM5//8AdNnOSpsTG3f8u6B/WwhGp5lIDwMnMzCssgfQmRnc3Uyv5
kU3pdMjURJaTUA==
=7aXj
-----END PGP SIGNATURE-----
Merge tag 'irq-msi-2024-03-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull MSI updates from Thomas Gleixner:
"Updates for the MSI interrupt subsystem and initial RISC-V MSI
support.
The core changes have been adopted from previous work which converted
ARM[64] to the new per device MSI domain model, which was merged to
support multiple MSI domain per device. The ARM[64] changes are being
worked on too, but have not been ready yet. The core and platform-MSI
changes have been split out to not hold up RISC-V and to avoid that
RISC-V builds on the scheduled for removal interfaces.
The core support provides new interfaces to handle wire to MSI bridges
in a straight forward way and introduces new platform-MSI interfaces
which are built on top of the per device MSI domain model.
Once ARM[64] is converted over the old platform-MSI interfaces and the
related ugliness in the MSI core code will be removed.
The actual MSI parts for RISC-V were finalized late and have been
post-poned for the next merge window.
Drivers:
- Add a new driver for the Andes hart-level interrupt controller
- Rework the SiFive PLIC driver to prepare for MSI suport
- Expand the RISC-V INTC driver to support the new RISC-V AIA
controller which provides the basis for MSI on RISC-V
- A few fixup for the fallout of the core changes"
* tag 'irq-msi-2024-03-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (29 commits)
irqchip/riscv-intc: Fix low-level interrupt handler setup for AIA
x86/apic/msi: Use DOMAIN_BUS_GENERIC_MSI for HPET/IO-APIC domain search
genirq/matrix: Dynamic bitmap allocation
irqchip/riscv-intc: Add support for RISC-V AIA
irqchip/sifive-plic: Improve locking safety by using irqsave/irqrestore
irqchip/sifive-plic: Parse number of interrupts and contexts early in plic_probe()
irqchip/sifive-plic: Cleanup PLIC contexts upon irqdomain creation failure
irqchip/sifive-plic: Use riscv_get_intc_hwnode() to get parent fwnode
irqchip/sifive-plic: Use devm_xyz() for managed allocation
irqchip/sifive-plic: Use dev_xyz() in-place of pr_xyz()
irqchip/sifive-plic: Convert PLIC driver into a platform driver
irqchip/riscv-intc: Introduce Andes hart-level interrupt controller
irqchip/riscv-intc: Allow large non-standard interrupt number
genirq/irqdomain: Don't call ops->select for DOMAIN_BUS_ANY tokens
irqchip/imx-intmux: Handle pure domain searches correctly
genirq/msi: Provide MSI_FLAG_PARENT_PM_DEV
genirq/irqdomain: Reroute device MSI create_mapping
genirq/msi: Provide allocation/free functions for "wired" MSI interrupts
genirq/msi: Optionally use dev->fwnode for device domain
genirq/msi: Provide DOMAIN_BUS_WIRED_TO_MSI
...
A future user of the matrix allocator, does not know the size of the matrix
bitmaps at compile time.
To avoid wasting memory on unnecessary large bitmaps, size the bitmap at
matrix allocation time.
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240222094006.1030709-11-apatel@ventanamicro.com
Users of the IRQCHIP_PLATFORM_DRIVER_{BEGIN,END} helpers rely on a fwspec
containing only the fwnode (and crucially a number of parameters set to 0)
together with a DOMAIN_BUS_ANY token to check whether a parent irqchip has
probed and registered a domain.
Since de1ff306dc ("genirq/irqdomain: Remove the param count restriction
from select()"), ops->select() is called unconditionally, meaning that
irqchips implementing select() now need to handle ANY as a match.
Instead of adding more esoteric checks to the individual drivers, add that
condition to irq_find_matching_fwspec(), and let it handle the corner case,
as per the comment in the function.
This restores the functionality of the above helpers.
Fixes: de1ff306dc ("genirq/irqdomain: Remove the param count restriction from select()")
Reported-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Reported-by: Biju Das <biju.das.jz@bp.renesas.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Tested-by: Biju Das <biju.das.jz@bp.renesas.com>
Link: https://lore.kernel.org/r/20240220114731.1898534-1-maz@kernel.org
Link: https://lore.kernel.org/r/20240219-gic-fix-child-domain-v1-1-09f8fd2d9a8f@linaro.org
The affinity setting of interrupt threads happens in the context of the
thread when the thread is woken up by an hard interrupt. As this can be an
arbitrary after changing the affinity, the thread can become runnable on an
isolated CPU and cause isolation disruption.
Avoid this by checking the set affinity request in wait_for_interrupt() and
waking the threads immediately when the affinity is modified.
Note that this is of the most benefit on systems where the interrupt
affinity itself does not need to be deferred to the interrupt handler, but
even where that's not the case, the total dirsuption will be less.
Signed-off-by: Crystal Wood <crwood@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240122235353.15235-1-crwood@redhat.com
Some platform-MSI implementations require that power management is
redirected to the underlying interrupt chip device. To make this work
with per device MSI domains provide a new feature flag and let the
core code handle the setup of dev->pm_dev when set during device MSI
domain creation.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240127161753.114685-14-apatel@ventanamicro.com
Reroute interrupt allocation in irq_create_fwspec_mapping() if the domain
is a MSI device domain. This is required to convert the support for wire
to MSI bridges to per device MSI domains.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240127161753.114685-13-apatel@ventanamicro.com
To support wire to MSI bridges proper in the MSI core infrastructure it is
required to have separate allocation/free interfaces which can be invoked
from the regular irqdomain allocaton/free functions.
The mechanism for allocation is:
- Allocate the next free MSI descriptor index in the domain
- Store the hardware interrupt number and the trigger type
which was extracted by the irqdomain core from the firmware spec
in the MSI descriptor device cookie so it can be retrieved by
the underlying interrupt domain and interrupt chip
- Use the regular MSI allocation mechanism for the newly allocated
index which returns a fully initialized Linux interrupt on succes
This works because:
- the domains have a fixed size
- each hardware interrupt is only allocated once
- the underlying domain does not care about the MSI index it only cares
about the hardware interrupt number and the trigger type
The free function looks up the MSI index in the MSI descriptor of the
provided Linux interrupt number and uses the regular index based free
functions of the MSI core.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240127161753.114685-12-apatel@ventanamicro.com
To support wire to MSI domains via the MSI infrastructure it is required to
use the firmware node of the device which implements this for creating the
MSI domain. Otherwise the existing firmware match mechanisms to find the
correct irqdomain for a wired interrupt which is connected to a wire to MSI
bridge would fail.
This cannot be used for the general case because not all devices provide
firmware nodes and all regular per device MSI domains are directly
associated to the device and have not be searched for.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240127161753.114685-11-apatel@ventanamicro.com
In preparation for providing a special allocation function for wired
interrupts which are connected to a wire to MSI bridge, split the inner
workings of msi_domain_alloc_irq_at() out into a helper function so the
code can be shared.
No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240127161753.114685-9-apatel@ventanamicro.com
irq_create_fwspec_mapping() requires translation of the firmware spec to a
hardware interrupt number and the trigger type information.
Wired interrupts which are connected to a wire to MSI bridge, like MBIGEN
are allocated that way. So far MBIGEN provides a regular irqdomain which
then hooks backwards into the MSI infrastructure. That's an unholy mess and
will be replaced with per device MSI domains which are regular MSI domains.
Interrupts on MSI domains are not supported by irq_create_fwspec_mapping(),
but for making the wire to MSI bridges sane it makes sense to provide a
special allocation/free interface in the MSI infrastructure. That avoids
the backdoors into the core MSI allocation code and just shares all the
regular MSI infrastructure.
Provide an optional translation callback in msi_domain_ops which can be
utilized by these wire to MSI bridges. No other MSI domain should provide a
translation callback. The default translation callback of the MSI
irqdomains will warn when it is invoked on a non-prepared MSI domain.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240127161753.114685-8-apatel@ventanamicro.com
Now that the GIC-v3 callback can handle invocation with a fwspec parameter
count of 0 lift the restriction in the core code and invoke select()
unconditionally when the domain provides it.
Preparatory change for per device MSI domains.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240127161753.114685-3-apatel@ventanamicro.com
Use the new __free() mechanism to remove all gotos and simplify the error
paths.
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20240122124243.44002-5-brgl@bgdev.pl
For better readability and maintenance keep headers in alphabetical
order.
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240122124243.44002-4-brgl@bgdev.pl
alloc_desc() and early_irq_init() contain duplicated code to initialize
interrupt descriptors.
Replace that with a helper function.
Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Dawei Li <dawei.li@shingroup.cn>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240122085716.2999875-6-dawei.li@shingroup.cn
For a CONFIG_SPARSE_IRQ=n kernel, early_irq_init() is supposed to
initialize all interrupt descriptors.
It does except for irq_desc::resend_node, which ia only initialized for the
first descriptor.
Use the indexed decriptor and not the base pointer to address that.
Fixes: bc06a9e087 ("genirq: Use hlist for managing resend handlers")
Signed-off-by: Dawei Li <dawei.li@shingroup.cn>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240122085716.2999875-5-dawei.li@shingroup.cn
there's little I can say which isn't in the individual changelogs.
The lengthier patch series are
- "kdump: use generic functions to simplify crashkernel reservation in
arch", from Baoquan He. This is mainly cleanups and consolidation of
the "crashkernel=" kernel parameter handling.
- After much discussion, David Laight's "minmax: Relax type checks in
min() and max()" is here. Hopefully reduces some typecasting and the
use of min_t() and max_t().
- A group of patches from Oleg Nesterov which clean up and slightly fix
our handling of reads from /proc/PID/task/... and which remove
task_struct.therad_group.
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZUQP9wAKCRDdBJ7gKXxA
jmOAAQDh8sxagQYocoVsSm28ICqXFeaY9Co1jzBIDdNesAvYVwD/c2DHRqJHEiS4
63BNcG3+hM9nwGJHb5lyh5m79nBMRg0=
=On4u
-----END PGP SIGNATURE-----
Merge tag 'mm-nonmm-stable-2023-11-02-14-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
"As usual, lots of singleton and doubleton patches all over the tree
and there's little I can say which isn't in the individual changelogs.
The lengthier patch series are
- 'kdump: use generic functions to simplify crashkernel reservation
in arch', from Baoquan He. This is mainly cleanups and
consolidation of the 'crashkernel=' kernel parameter handling
- After much discussion, David Laight's 'minmax: Relax type checks in
min() and max()' is here. Hopefully reduces some typecasting and
the use of min_t() and max_t()
- A group of patches from Oleg Nesterov which clean up and slightly
fix our handling of reads from /proc/PID/task/... and which remove
task_struct.thread_group"
* tag 'mm-nonmm-stable-2023-11-02-14-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (64 commits)
scripts/gdb/vmalloc: disable on no-MMU
scripts/gdb: fix usage of MOD_TEXT not defined when CONFIG_MODULES=n
.mailmap: add address mapping for Tomeu Vizoso
mailmap: update email address for Claudiu Beznea
tools/testing/selftests/mm/run_vmtests.sh: lower the ptrace permissions
.mailmap: map Benjamin Poirier's address
scripts/gdb: add lx_current support for riscv
ocfs2: fix a spelling typo in comment
proc: test ProtectionKey in proc-empty-vm test
proc: fix proc-empty-vm test with vsyscall
fs/proc/base.c: remove unneeded semicolon
do_io_accounting: use sig->stats_lock
do_io_accounting: use __for_each_thread()
ocfs2: replace BUG_ON() at ocfs2_num_free_extents() with ocfs2_error()
ocfs2: fix a typo in a comment
scripts/show_delta: add __main__ judgement before main code
treewide: mark stuff as __ro_after_init
fs: ocfs2: check status values
proc: test /proc/${pid}/statm
compiler.h: move __is_constexpr() to compiler.h
...
- Make the quirk for non-maskable MSI interrupts in the affinity setter
functional again.
It was broken by a MSI core code update, which restructured the code in
a way that the quirk flag was not longer set correctly.
Trying to restore the core logic caused a deeper inspection and it
turned out that the extra quirk flag is not required at all because
it's the inverse of the reservation mode bit, which only can be set
when the MSI interrupt is maskable.
So the trivial fix is to use the reservation mode check in the affinity
setter function and remove almost 40 lines of code related to the
no-mask quirk flag.
- Cure a Kconfig dependency issue which causes compile fails by correcting
the conditionals in the affected heaer files.
- Clean up coding style in the UV APIC driver.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmU+w2ETHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYodRIEACugQpAiND53Cz9MTJE1XWGheBn8n+y
WZtbVckK9LXHfnf99dtigtq3ycPBg2Mx/pItT1c71iN/NyVPNCC/3mM+ntP53etX
06v7VoIgRpF+GRQsIbjvVfBkKWS3G5zqHq6hazE0lcPhPEiOIMSBcusSQaA3C3sp
BT23rrx/59JKCb/187pum0Obx+n1oH+MG91wJ1v0zNOGgfx1u5gBPyKDZ0JAsMPx
r05991Lp4K80ooEsk/PKgpcZuPZD3QRwFDLHJwh2ITjsC22ItOnhN/c/1J8MZtk9
5YBaIzy+vyQd+nksqmDtB48FD0ioW/WLUR5zV9MMpD3RQAGgrxQmXeTqScu/j8tn
I4QZGi80HZSWLwFjWL2DoAw5LHZDh8ksxIYZBHE+8LHx0ymyX+7//naZnNFn9cXM
K5orouZFQi+dvfD7MAla73fiibPs6cHjGzRKfsfNlLBNNj+/ffcEolLlKGhEX9fx
R1w7gbtNs/RDnfoa45cvmez8UxJB5zei7l2HWbSecR2DDTQ9aGfEU/0NEr9muBzK
cuZIpmgD09VvS84jCxdYbaXA5Cau3NLOE5Os6UN9Wa3dUjWP5PArWkc8RzxP3Bgx
PANtytesqJ9YeDDg9SyylPw6d3EYNldtFZDXlD32K2O2SDmUZxxJa+0og8K9YNJN
aIUcqiSaYhg5rQ==
=2lhV
-----END PGP SIGNATURE-----
Merge tag 'x86-apic-2023-10-29-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 APIC updates from Thomas Gleixner:
- Make the quirk for non-maskable MSI interrupts in the affinity setter
functional again.
It was broken by a MSI core code update, which restructured the code
in a way that the quirk flag was not longer set correctly.
Trying to restore the core logic caused a deeper inspection and it
turned out that the extra quirk flag is not required at all because
it's the inverse of the reservation mode bit, which only can be set
when the MSI interrupt is maskable.
So the trivial fix is to use the reservation mode check in the
affinity setter function and remove almost 40 lines of code related
to the no-mask quirk flag.
- Cure a Kconfig dependency issue which causes compile failures by
correcting the conditionals in the affected header files.
- Clean up coding style in the UV APIC driver.
* tag 'x86-apic-2023-10-29-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/apic/msi: Fix misconfigured non-maskable MSI quirk
x86/msi: Fix compile error caused by CONFIG_GENERIC_MSI_IRQ=y && !CONFIG_X86_LOCAL_APIC
x86/platform/uv/apic: Clean up inconsistent indenting
irq_remove_generic_chip() calculates the Linux interrupt number for removing the
handler and interrupt chip based on gc::irq_base as a linear function of
the bit positions of set bits in the @msk argument.
When the generic chip is present in an irq domain, i.e. created with a call
to irq_alloc_domain_generic_chips(), gc::irq_base contains not the base
Linux interrupt number. It contains the base hardware interrupt for this
chip. It is set to 0 for the first chip in the domain, 0 + N for the next
chip, where $N is the number of hardware interrupts per chip.
That means the Linux interrupt number cannot be calculated based on
gc::irq_base for irqdomain based chips without a domain map lookup, which
is currently missing.
Rework the code to take the irqdomain case into account and calculate the
Linux interrupt number by a irqdomain lookup of the domain specific
hardware interrupt number.
[ tglx: Massage changelog. Reshuffle the logic and add a proper comment. ]
Fixes: cfefd21e69 ("genirq: Add chip suspend and resume callbacks")
Signed-off-by: Herve Codina <herve.codina@bootlin.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20231024150335.322282-1-herve.codina@bootlin.com
commit ef8dd01538 ("genirq/msi: Make interrupt allocation less
convoluted"), reworked the code so that the x86 specific quirk for affinity
setting of non-maskable PCI/MSI interrupts is not longer activated if
necessary.
This could be solved by restoring the original logic in the core MSI code,
but after a deeper analysis it turned out that the quirk flag is not
required at all.
The quirk is only required when the PCI/MSI device cannot mask the MSI
interrupts, which in turn also prevents reservation mode from being enabled
for the affected interrupt.
This allows ot remove the NOMASK quirk bit completely as msi_set_affinity()
can instead check whether reservation mode is enabled for the interrupt,
which gives exactly the same answer.
Even in the momentary non-existing case that the reservation mode would be
not set for a maskable MSI interrupt this would not cause any harm as it
just would cause msi_set_affinity() to go needlessly through the
functionaly equivalent slow path, which works perfectly fine with maskable
interrupts as well.
Rework msi_set_affinity() to query the reservation mode and remove all
NOMASK quirk logic from the core code.
[ tglx: Massaged changelog ]
Fixes: ef8dd01538 ("genirq/msi: Make interrupt allocation less convoluted")
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Koichiro Den <den@valinux.co.jp>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20231026032036.2462428-1-den@valinux.co.jp
When a CPU is about to be offlined, x86 validates that all active
interrupts which are targeted to this CPU can be migrated to the remaining
online CPUs. If not, the offline operation is aborted.
The validation uses irq_matrix_allocated() to retrieve the number of
vectors which are allocated on the outgoing CPU. The returned number of
allocated vectors includes also vectors which are associated to managed
interrupts.
That's overaccounting because managed interrupts are:
- not migrated when the affinity mask of the interrupt targets only
the outgoing CPU
- migrated to another CPU, but in that case the vector is already
pre-allocated on the potential target CPUs and must not be taken into
account.
As a consequence the check whether the remaining online CPUs have enough
capacity for migrating the allocated vectors from the outgoing CPU might
fail incorrectly.
Let irq_matrix_allocated() return only the number of allocated non-managed
interrupts to make this validation check correct.
[ tglx: Amend changelog and fixup kernel-doc comment ]
Fixes: 2f75d9e1c9 ("genirq: Implement bitmap matrix allocator")
Reported-by: Wendy Wang <wendy.wang@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20231020072522.557846-1-yu.c.chen@intel.com
irq_init_generic_chip() only sets the name for the first chip type, which
leads to empty names for other chip types. Eventually, these names will be
shown as "-" /proc/interrupts.
Set the name for all chip types by default.
Signed-off-by: Keguang Zhang <keguang.zhang@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230925121734.93017-1-keguang.zhang@gmail.com
Add a kthread_stop_put() helper that stops a thread and puts its task
struct. Use it to replace the various instances of kthread_stop()
followed by put_task_struct().
Remove the kthread_stop_put() macro in usbip that is similar but doesn't
return the result of kthread_stop().
[agruenba@redhat.com: fix kerneldoc comment]
Link: https://lkml.kernel.org/r/20230911111730.2565537-1-agruenba@redhat.com
[akpm@linux-foundation.org: document kthread_stop_put()'s argument]
Link: https://lkml.kernel.org/r/20230907234048.2499820-1-agruenba@redhat.com
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Core:
- Prevent a deadlock of nested interrupt threads vs.
synchronize_hard()
- Removal of a stale extern declaration
Drivers:
- The first new driver since v6.2 for Amlogic-C3 SoCs
- The usual small fixes, cleanups and improvements all over
the place
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmTsjR0THHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYofmLEAC5anouyAUbGjl/cL//+2GkvWB2YgO2
+D7q8tx3Tt5US/vpiYaDvFs+at+2lAyB6M/KUkvYSCne9bm80+YqaL+73iM6YQKH
yNDrAnLR1FA4+fHIvvhmk23U1uUjWgSTL7iKufgNWf8I0aYsWLTIX3N6m0606ZLE
eUNIf7w+aZRr/axHdadRQpib6l1fvfA3C72urPRBnZDA56ZDAgE9tS0kfk9D+3sW
BgXRp4knvHBf6I4RdA10hHDTa1RuX9xkDeAC1a/ljWpbCEgEDPJ+5JI+TD+fU/d5
TCVGa7GwqJc2srRFwy76/t0jQrG7DnwW56SsMomjS+vjIu4exNFwXJ6LqZSJacwa
Z3HB0Py3awQWPfHdFqdF9LHyum+a58RHX96RenlL8Q/42qe5K6RmAIfcAaiy2OpL
xAGy9+nplMWh+qde9q1o30WPr08GhhDEXrdHZdAAODjBeoUDGmFooH5NHAFjw2+Q
ba15/f7Nl8KIl854OUJv4cftNEv5klpueLR/YUviivoO55vydRae/k/CSPhvt7TN
VIQ+vgiaiOCEwAAx2kP7Au0ADeEMCYiEqH9KWBp33dvjNZMt2DbAGLDWagcy8N9y
R8ms4c5e7Z2MvN9Z6YDihQ1XvkQsdX/dWwJq3weH3c/tP1MBFFHZYdeQhIVKTIKR
4zFKi4jrlmn0vQ==
=jiUK
-----END PGP SIGNATURE-----
Merge tag 'irq-core-2023-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
"Boring updates for the interrupt subsystem:
Core:
- Prevent a deadlock of nested interrupt threads vs.
synchronize_hard()
- Removal of a stale extern declaration
Drivers:
- The first new driver since v6.2 for Amlogic-C3 SoCs
- The usual small fixes, cleanups and improvements all over the
place"
* tag 'irq-core-2023-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip: Add support for Amlogic-C3 SoCs
dt-bindings: interrupt-controller: Add support for Amlogic-C3 SoCs
irqchip/irq-mvebu-sei: Use devm_platform_get_and_ioremap_resource()
irqchip/ls-scfg-msi: Use devm_platform_get_and_ioremap_resource()
irqchip: Explicitly include correct DT includes
irqchip/orion: Use of_address_count() helper
irqchip/irq-pruss-intc: Do not check for 0 return after calling platform_get_irq()
irqchip/imx-mu-msi: Do not check for 0 return after calling platform_get_irq()
irqchipr/i8259: Mark i8259_of_init() static
irqchip/mips-gic: Mark gic_irq_domain_free() static
irqchip/xtensa-pic: Include header for xtensa_pic_init_legacy()
irqchip/loongson-eiointc: Fix return value checking of eiointc_index
genirq: Remove unused extern declaration
genirq: Prevent nested thread vs synchronize_hardirq() deadlock
The switch to using hlist for managing software resend of interrupts
broke resend in at least two ways:
First, unconditionally adding interrupt descriptors to the resend list can
corrupt the list when the descriptor in question has already been
added. This causes the resend tasklet to loop indefinitely with interrupts
disabled as was recently reported with the Lenovo ThinkPad X13s after
threaded NAPI was disabled in the ath11k WiFi driver.
This bug is easily fixed by restoring the old semantics of irq_sw_resend()
so that it can be called also for descriptors that have already been marked
for resend.
Second, the offending commit also broke software resend of nested
interrupts by simply discarding the code that made sure that such
interrupts are retriggered using the parent interrupt.
Add back the corresponding code that adds the parent descriptor to the
resend list.
Fixes: bc06a9e087 ("genirq: Use hlist for managing resend handlers")
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/lkml/20230809073432.4193-1-johan+linaro@kernel.org/
Link: https://lore.kernel.org/r/20230826154004.1417-1-johan+linaro@kernel.org
There is a possibility of deadlock if synchronize_hardirq() is called
when the nested threaded interrupt is active. The following scenario
was observed on a uniprocessor PREEMPT_NONE system:
Thread 1 Thread 2
handle_nested_thread()
Set INPROGRESS
Call ->thread_fn()
thread_fn goes to sleep
free_irq()
__synchronize_hardirq()
Busy-loop forever waiting for INPROGRESS
to be cleared
The INPROGRESS flag is only supposed to be used for hard interrupt
handlers. Remove the incorrect usage in the nested threaded interrupt
case and instead re-use the threads_active / wait_for_threads mechanism
to wait for nested threaded interrupts to complete.
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230613-genirq-nested-v3-1-ae58221143eb@axis.com
- A number of Loogson/Loogarch fixes
- Allow the core code to retrigger an interrupt that has
fired while the same interrupt is being handled on another
CPU, papering over a GICv3 architecture issue
- Work around an integration problem on ASR8601, where the CPU
numbering isn't representable in the GIC implementation...
- Add some missing interrupt to the STM32 irqchip
- A bunch of warning squashing triggered by W=1 builds
-----BEGIN PGP SIGNATURE-----
iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmSWHlEPHG1hekBrZXJu
ZWwub3JnAAoJECPQ0LrRPXpDgDcP/AyuNjjt9ybZ8LT5hGYRBKTfO3bsssiV/ZIE
NRbfwK6nWTurA3tN9XypYouB3HfKvTvDYHzeo9Yi2y51GI20o7noSfVNHHld5dAE
WTBYcfc6n+1KqGNA1tPVeE403jd50TnmOmFaXhN1Vn/KQVQKs+LmjoVdS7HXKmwu
PLk1tbFSGh5+vgLQTlhze5Wz7nU59lJ5WfCJ8negYls/5l3FseGgxEuffWKytElJ
I6PbnRyd9y7R/KYmkobFjS98cjeNznAxJEm/2InVpcb3IwUSEyJUHgNwbs6KSeYf
4BdVyxSAQlFPgxO17UKQxblYu/HuFY/NWghkg7dn6ZgVubj52rtrl9txUId57HGC
pYbSVm0bMevqlgHDYHold+Q4I6DqltyHKgiA9UfU9IQj4MV/KXbDjtng/ih4ZkHe
rw+BnckyrhZ5K5p3nDlzS/TNQT1baKqNFgx/O5HPV27NT8ARynqoFxNUzA0z37v3
x5dj34kiIW2kAGJDOP9+9jj2uF4yX0MKdNxdMs8PM9hYAsO0AqItNsXEQoUuzJF+
u1hhGZ1H6HYmY2WphXD57fnIR/7Hm6Iz00XC17+1jUP+wsefhAbreOyJIbbcpENF
IIxrcIxAPl3PQPsj8n98/9fKQ6u/QfrvWUH+6aoFQH/Np7RTFWmDh8bqucjuXsGS
8gxR3REr
=quHC
-----END PGP SIGNATURE-----
Merge tag 'irqchip-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core
Pull irqchip updates from Marc Zyngier:
- A number of Loogson/Loogarch fixes
- Allow the core code to retrigger an interrupt that has
fired while the same interrupt is being handled on another
CPU, papering over a GICv3 architecture issue
- Work around an integration problem on ASR8601, where the CPU
numbering isn't representable in the GIC implementation...
- Add some missing interrupt to the STM32 irqchip
- A bunch of warning squashing triggered by W=1 builds
Link: https://lore.kernel.org/r/20230623224345.3577134-1-maz@kernel.org
* irq/misc-6.5:
: .
: Misc cleanups:
:
: - Add a number of missing prototypes
: - Mark global symbol as static where needed
: - Drop some now useless non-DT code paths
: - Add a missing interrupt mapping to the STM32 irqchip
: - Silence another STM32 warning when building with W=1
: - Fix the jcore-aic driver that actually never worked...
: .
Revert "irqchip/mxs: Include linux/irqchip/mxs.h"
irqchip/jcore-aic: Fix missing allocation of IRQ descriptors
irqchip/stm32-exti: Fix warning on initialized field overwritten
irqchip/stm32-exti: Add STM32MP15xx IWDG2 EXTI to GIC map
irqchip/gicv3: Add a iort_pmsi_get_dev_id() prototype
irqchip/mxs: Include linux/irqchip/mxs.h
irqchip/clps711x: Remove unused clps711x_intc_init() function
irqchip/mmp: Remove non-DT codepath
irqchip/ftintc010: Mark all function static
irqdomain: Include internals.h for function prototypes
Signed-off-by: Marc Zyngier <maz@kernel.org>
irq_domain_debugfs_init() is defined in irqdomain.c, but the
declaration is in a header that is not included here:
kernel/irq/irqdomain.c:1965:13: error: no previous prototype for 'irq_domain_debugfs_init' [-Werror=missing-prototypes]
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230516200432.554240-1-arnd@kernel.org
There is a class of interrupt controllers out there that, once they
have signalled a given interrupt number, will still signal incoming
instances of the *same* interrupt despite the original interrupt
not having been EOIed yet.
As long as the new interrupt reaches the *same* CPU, nothing bad
happens, as that CPU still has its interrupts globally disabled,
and we will only take the new interrupt once the interrupt has
been EOIed.
However, things become more "interesting" if an affinity change comes
in while the interrupt is being handled. More specifically, while
the per-irq lock is being dropped. This results in the affinity change
taking place immediately. At this point, there is nothing that prevents
the interrupt from firing on the new target CPU. We end-up with the
interrupt running concurrently on two CPUs, which isn't a good thing.
And that's where things become worse: the new CPU notices that the
interrupt handling is in progress (irq_may_run() return false), and
*drops the interrupt on the floor*.
The whole race looks like this:
CPU 0 | CPU 1
-----------------------------|-----------------------------
interrupt start |
handle_fasteoi_irq | set_affinity(CPU 1)
handler |
... | interrupt start
... | handle_fasteoi_irq -> early out
handle_fasteoi_irq return | interrupt end
interrupt end |
If the interrupt was an edge, too bad. The interrupt is lost, and
the system will eventually die one way or another. Not great.
A way to avoid this situation is to detect this problem at the point
we handle the interrupt on the new target. Instead of dropping the
interrupt, use the resend mechanism to force it to be replayed.
Also, in order to limit the impact of this workaround to the pathetic
architectures that require it, gate it behind a new irq flag aptly
named IRQD_RESEND_WHEN_IN_PROGRESS.
Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: James Gowans <jgowans@amazon.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Marc Zyngier <maz@kernel.org>
Cc: KarimAllah Raslan <karahmed@amazon.com>
Cc: Yipeng Zou <zouyipeng@huawei.com>
Cc: Zhang Jianhua <chris.zjh@huawei.com>
[maz: reworded commit mesage]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230608120021.3273400-3-jgowans@amazon.com
Adding a bit more info about what the flags are used for may help future
code readers.
Signed-off-by: James Gowans <jgowans@amazon.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Liao Chang <liaochang1@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230608120021.3273400-2-jgowans@amazon.com
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCZHGVRAAKCRCAXGG7T9hj
vqtJAQDizKasLE7tSnfs/FrZ/4xPaDLe3bQifMx2C1dtYCjRcAD/ciZSa1L0WzZP
dpEZnlYRzsR3bwLktQEMQFOvlbh1SwE=
=K860
-----END PGP SIGNATURE-----
Merge tag 'for-linus-6.4-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
- a double free fix in the Xen pvcalls backend driver
- a fix for a regression causing the MSI related sysfs entries to not
being created in Xen PV guests
- a fix in the Xen blkfront driver for handling insane input data
better
* tag 'for-linus-6.4-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
x86/pci/xen: populate MSI sysfs entries
xen/pvcalls-back: fix double frees with pvcalls_new_active_socket()
xen/blkfront: Only check REQ_FUA for writes
Commit bf5e758f02 ("genirq/msi: Simplify sysfs handling") reworked the
creation of sysfs entries for MSI IRQs. The creation used to be in
msi_domain_alloc_irqs_descs_locked after calling ops->domain_alloc_irqs.
Then it moved into __msi_domain_alloc_irqs which is an implementation of
domain_alloc_irqs. However, Xen comes with the only other implementation
of domain_alloc_irqs and hence doesn't run the sysfs population code
anymore.
Commit 6c796996ee ("x86/pci/xen: Fixup fallout from the PCI/MSI
overhaul") set the flag MSI_FLAG_DEV_SYSFS for the xen msi_domain_info
but that doesn't actually have an effect because Xen uses it's own
domain_alloc_irqs implementation.
Fix this by making use of the fallback functions for sysfs population.
Fixes: bf5e758f02 ("genirq/msi: Simplify sysfs handling")
Signed-off-by: Maximilian Heyne <mheyne@amazon.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20230503131656.15928-1-mheyne@amazon.de
Signed-off-by: Juergen Gross <jgross@suse.com>
The current implementation uses a static bitmap for interrupt descriptor
allocation and a radix tree to pointer store the pointer for lookup.
However, the size of the bitmap is constrained by the build time macro
MAX_SPARSE_IRQS, which may not be sufficient to support high-end servers,
particularly those with GICv4.1 hardware, which require a large interrupt
space to cover LPIs and vSGIs.
Replace the bitmap and the radix tree with a maple tree, which not only
stores pointers for lookup, but also provides a mechanism to find free
ranges. That removes the build time hardcoded upper limit.
Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230519134902.1495562-4-sdonthineni@nvidia.com
Move the open coded sparse bitmap handling into helper functions as
a preparatory step for converting the sparse interrupt management
to a maple tree.
No functional change.
Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230519134902.1495562-3-sdonthineni@nvidia.com
The current implementation utilizes a bitmap for managing interrupt resend
handlers, which is allocated based on the SPARSE_IRQ/NR_IRQS macros.
However, this method may not efficiently utilize memory during runtime,
particularly when IRQ_BITMAP_BITS is large.
Address this issue by using an hlist to manage interrupt resend handlers
instead of relying on a static bitmap memory allocation. Additionally, a
new function, clear_irq_resend(), is introduced and called from
irq_shutdown to ensure a graceful teardown of the interrupt.
Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230519134902.1495562-2-sdonthineni@nvidia.com
For interrupts with secondary threads, the affinity is applied when the
thread is created but if the interrupts affinity is changed later only
the primary thread is updated.
Update the secondary thread's affinity as well to keep all the interrupts
activity on the assigned CPUs.
Signed-off-by: John Keeping <john@metanate.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230406180857.588682-1-john@metanate.com
- Prevent possible NULL pointer derefences in irq_data_get_affinity_mask()
and irq_domain_create_hierarchy().
- Take the per device MSI lock before invoking code which relies
on it being hold.
- Make sure that MSI descriptors are unreferenced before freeing
them. This was overlooked when the platform MSI code was converted to
use core infrastructure and results in a fals positive warning.
- Remove dead code in the MSI subsystem.
- Clarify the documentation for pci_msix_free_irq().
- More kobj_type constification.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmQEVToTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYodc9EACg5HBOGsh5OV8pwnuEDqfThK/dOZ5k
LJ8xQjGAx29JeNWu4gkkHaSDEGjZhwLiZlB6qUeH+LPQ1NgmAlLL3T2NxEOOWa6y
z/xQv+1Ceu6XxazpCSFRWR/6w4Nyup92jhlsUIkmmsWkVvKH/pV6Uo+3ta0WagWg
heb3vqts6J0AOJaMepF8azYGbwAPSIElNLI1UtiEuQYEKU55N8jLK20VJTL6lzJ2
FyRg/0ghNWDAaBdnv4cZCQ/MzoG5UkoU3f2cqhdSce5mqnq2fKRfgBjzllNgaRgA
zxOxIR88QaKTMHIr+WKD1dyWxDQlotFbBOkmVW39XAa13rn42s4GIeW88VCjJGww
RAm52SbC48cCIyNQlh4A6Vhb4vjPx2DndWbWnnVj5fWlUevdAPRxSlm4BjfxFxe+
LbuZCRRL1jjlC0fXmhVXTTxeE1/K7jarAZwRV7Nxhr3g0gT+Zv1jyaaW9rWuHq5U
3pS+xBl89LA/VYp9tv6jDfJlocmRwgrFbGX4UlfikqtObdTFqcH0FtmqisE61fZS
n0194BMWNDfPSibSpDohf/CDPoHZ6pNxeuqkVDiisUJHPpIYOt8+lH+8//DgBL7a
oi9zS0JazPIn2VM6NB4f/WXOYmS9GZq5+loiYEWb52AYtodKUKmOoWG0SUyy6XFr
E7yJzemsUwrJVg==
=jWot
-----END PGP SIGNATURE-----
Merge tag 'irq-urgent-2023-03-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
"A set of updates for the interrupt susbsystem:
- Prevent possible NULL pointer derefences in
irq_data_get_affinity_mask() and irq_domain_create_hierarchy()
- Take the per device MSI lock before invoking code which relies on
it being hold
- Make sure that MSI descriptors are unreferenced before freeing
them. This was overlooked when the platform MSI code was converted
to use core infrastructure and results in a fals positive warning
- Remove dead code in the MSI subsystem
- Clarify the documentation for pci_msix_free_irq()
- More kobj_type constification"
* tag 'irq-urgent-2023-03-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq/msi, platform-msi: Ensure that MSI descriptors are unreferenced
genirq/msi: Drop dead domain name assignment
irqdomain: Add missing NULL pointer check in irq_domain_create_hierarchy()
genirq/irqdesc: Make kobj_type structures constant
PCI/MSI: Clarify usage of pci_msix_free_irq()
genirq/msi: Take the per-device MSI lock before validating the control structure
genirq/ipi: Fix NULL pointer deref in irq_data_get_affinity_mask()
Miquel reported a warning in the MSI core which is triggered when
interrupts are freed via platform_msi_device_domain_free().
This code got reworked to use core functions for freeing the MSI
descriptors, but nothing took care to clear the msi_desc->irq entry, which
then triggers the warning in msi_free_msi_desc() which uses desc->irq to
validate that the descriptor has been torn down. The same issue exists in
msi_domain_populate_irqs().
Up to the point that msi_free_msi_descs() grew a warning for this case,
this went un-noticed.
Provide the counterpart of msi_domain_populate_irqs() and invoke it in
platform_msi_device_domain_free() before freeing the interrupts and MSI
descriptors and also in the error path of msi_domain_populate_irqs().
Fixes: 2f2940d168 ("genirq/msi: Remove filter from msi_free_descs_free_range()")
Reported-by: Miquel Raynal <miquel.raynal@bootlin.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Miquel Raynal <miquel.raynal@bootlin.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/87mt4wkwnv.ffs@tglx
Some polishing and small fixes for iommufd:
- Remove IOMMU_CAP_INTR_REMAP, instead rely on the interrupt subsystem
- Use GFP_KERNEL_ACCOUNT inside the iommu_domains
- Support VFIO_NOIOMMU mode with iommufd
- Various typos
- A list corruption bug if HWPTs are used for attach
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCY/TgzQAKCRCFwuHvBreF
Ya3AAP4/WxTJIbDvtTyH3Fae3NxTdO8j8gsUvU1vrRYG83zdnAEAxd1yii7GEO8D
crkeq9D4FUiPAkFnJ64Exw2FHb060Qg=
=RABK
-----END PGP SIGNATURE-----
Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd
Pull iommufd updates from Jason Gunthorpe:
"Some polishing and small fixes for iommufd:
- Remove IOMMU_CAP_INTR_REMAP, instead rely on the interrupt
subsystem
- Use GFP_KERNEL_ACCOUNT inside the iommu_domains
- Support VFIO_NOIOMMU mode with iommufd
- Various typos
- A list corruption bug if HWPTs are used for attach"
* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd:
iommufd: Do not add the same hwpt to the ioas->hwpt_list twice
iommufd: Make sure to zero vfio_iommu_type1_info before copying to user
vfio: Support VFIO_NOIOMMU with iommufd
iommufd: Add three missing structures in ucmd_buffer
selftests: iommu: Fix test_cmd_destroy_access() call in user_copy
iommu: Remove IOMMU_CAP_INTR_REMAP
irq/s390: Add arch_is_isolated_msi() for s390
iommu/x86: Replace IOMMU_CAP_INTR_REMAP with IRQ_DOMAIN_FLAG_ISOLATED_MSI
genirq/msi: Rename IRQ_DOMAIN_MSI_REMAP to IRQ_DOMAIN_ISOLATED_MSI
genirq/irqdomain: Remove unused irq_domain_check_msi_remap() code
iommufd: Convert to msi_device_has_isolated_msi()
vfio/type1: Convert to iommu_group_has_isolated_msi()
iommu: Add iommu_group_has_isolated_msi()
genirq/msi: Add msi_device_has_isolated_msi()
Since commit d59f6617ee ("genirq: Allow fwnode to carry name
information only") an IRQ domain is always given a name during
allocation (e.g. used for the debugfs entry).
Drop the unused fallback name assignment when creating MSI domains.
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230224130509.27814-1-johan+linaro@kernel.org
The recent switch to per-domain locking caused a NULL dereference in
irq_domain_create_hierarchy(), as Xen code is calling
msi_create_irq_domain() with a NULL parent pointer.
Fix that by testing parent to be set before dereferencing it. For a
non-existing parent the irqdomain's root will stay to point to
itself.
Fixes: 9dbb8e3452 ("irqdomain: Switch to per-domain locking")
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230223083800.31347-1-jgross@suse.com
Core:
- Move the interrupt affinity spreading mechanism into lib/group_cpus
so it can be used for similar spreading requirements, e.g. in the
block multi-queue code.
This also contains a first usecase in the block multi-queue code which
Jens asked to take along with the librarization.
- Improve irqdomain locking to close a number race conditions which
can be observed with massive parallel device driver probing.
- Enforce and document the semantics of disable_irq() which cannot be
invoked safely from non-sleepable context.
- Move the IPI multiplexing code from the Apple AIC driver into the
core. so it can be reused by RISCV.
Drivers:
- Plug OF node refcounting leaks in various drivers.
- Correctly mark level triggered interrupts in the Broadcom L2 drivers.
- The usual small fixes and improvements.
- No new drivers for the record!
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmPzUSkTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoY3DEAC9E4yLO7VxxTrs/KrAVCgL3SnHVXQU
nE42uFbQwpCILuNmnqP3uvTHLCsXZkbuBaZEbxLBxC2iyU6+31N1Is+e6cClGMjK
kX6U9g9EqiRCdX3fgJiEU16fCgE8D1AEg+7XKLjeasQhCfKQGGtCtE9/Gmg/Ji92
gcEY/bjvm1hcoNo9dh/vR4k0k63fb13716RLScozUkS/XYVlu+LrrG349gD2WEA9
lh1twDkXvZTWkiYKWAkLorxcNyKhcnJxJw8zEIGVF5b6pCCudK8gXjBbMD5abC7W
xano6B8F455eSKNsi2TWyW47ZHUkC60sqCNDgI2MBTsI7D72UpAJoDfe0VjbMoaH
RQJnrGsUQbviBUen+LEet7nWZBQJRKZHOVtYEjA8ndB3PJUXKKcLeODdw11odyjR
bgZk+0wnowMArIaoLfeItF2oSpfSzLVxh2i8Aeus5tBesvhVCOi4LABRBKGCWvMj
cpSlMhZ4znMnr5j5lOGpcAjKFlWVh1HmF70Y2deGZi5xC8EXFL/VsB7rH5LEEEuF
7I8CO8M1mXeOTJoCchCbuAYgZyuk1DIhKUyOiYQZblaPNGcVGvCIN31SFBRT9h/8
e0VwSvVL756GhotUp/LjgTdG7MoKspWqRG00+q84SsDalsKGXMW7zmHc+1NgGN/C
Yxio1Jlly9Rwyw==
=+pu3
-----END PGP SIGNATURE-----
Merge tag 'irq-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
"Updates for the interrupt subsystem:
Core:
- Move the interrupt affinity spreading mechanism into lib/group_cpus
so it can be used for similar spreading requirements, e.g. in the
block multi-queue code
This also contains a first usecase in the block multi-queue code
which Jens asked to take along with the librarization
- Improve irqdomain locking to close a number race conditions which
can be observed with massive parallel device driver probing
- Enforce and document the semantics of disable_irq() which cannot be
invoked safely from non-sleepable context
- Move the IPI multiplexing code from the Apple AIC driver into the
core, so it can be reused by RISCV
Drivers:
- Plug OF node refcounting leaks in various drivers
- Correctly mark level triggered interrupts in the Broadcom L2
drivers
- The usual small fixes and improvements
- No new drivers for the record!"
* tag 'irq-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (42 commits)
irqchip/irq-bcm7120-l2: Set IRQ_LEVEL for level triggered interrupts
irqchip/irq-brcmstb-l2: Set IRQ_LEVEL for level triggered interrupts
irqdomain: Switch to per-domain locking
irqchip/mvebu-odmi: Use irq_domain_create_hierarchy()
irqchip/loongson-pch-msi: Use irq_domain_create_hierarchy()
irqchip/gic-v3-mbi: Use irq_domain_create_hierarchy()
irqchip/gic-v3-its: Use irq_domain_create_hierarchy()
irqchip/gic-v2m: Use irq_domain_create_hierarchy()
irqchip/alpine-msi: Use irq_domain_add_hierarchy()
x86/uv: Use irq_domain_create_hierarchy()
x86/ioapic: Use irq_domain_create_hierarchy()
irqdomain: Clean up irq_domain_push/pop_irq()
irqdomain: Drop leftover brackets
irqdomain: Drop dead domain-name assignment
irqdomain: Drop revmap mutex
irqdomain: Fix domain registration race
irqdomain: Fix mapping-creation race
irqdomain: Refactor __irq_domain_alloc_irqs()
irqdomain: Look for existing mapping only once
irqdomain: Drop bogus fwspec-mapping error handling
...
Since commit ee6d3dd4ed ("driver core: make kobj_type constant.")
the driver core allows the usage of const struct kobj_type.
Take advantage of this to constify the structure definitions which prevents
modification at runtime.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230217-kobj_type-irq-v1-1-fedfacaf8cdb@weissschuh.net
Calling msi_ctrl_valid() ultimately results in calling
msi_get_device_domain(), which requires holding the device MSI lock.
However, in msi_domain_populate_irqs() the lock is taken right after having
called msi_ctrl_valid(), which is just a tad too late.
Take the lock before invoking msi_ctrl_valid().
Fixes: 40742716f2 ("genirq/msi: Make msi_add_simple_msi_descs() device domain aware")
Reported-by: "Russell King (Oracle)" <linux@armlinux.org.uk>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/Y/Opu6ETe3ZzZ/8E@shell.armlinux.org.uk
Link: https://lore.kernel.org/r/20230220190101.314446-1-maz@kernel.org
If ipi_send_{mask|single}() is called with an invalid interrupt number, all
the local variables there will be NULL. ipi_send_verify() which is invoked
from these functions does verify its 'data' parameter, resulting in a
kernel oops in irq_data_get_affinity_mask() as the passed NULL pointer gets
dereferenced.
Add a missing NULL pointer check in ipi_send_verify()...
Found by Linux Verification Center (linuxtesting.org) with the SVACE static
analysis tool.
Fixes: 3b8e29a82d ("genirq: Implement ipi_send_mask/single()")
Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/b541232d-c2b6-1fe9-79b4-a7129459e4d0@omp.ru
- New and improved irqdomain locking, closing a number of races that
became apparent now that we are able to probe drivers in parallel
- A bunch of OF node refcounting bugs have been fixed
- We now have a new IPI mux, lifted from the Apple AIC code and
made common. It is expected that riscv will eventually benefit
from it
- Two small fixes for the Broadcom L2 drivers
- Various cleanups and minor bug fixes
-----BEGIN PGP SIGNATURE-----
iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmPw4OgPHG1hekBrZXJu
ZWwub3JnAAoJECPQ0LrRPXpDYVgP/iVFxCPs+DCWUYvyTC8rvNzOj51COHUV/7yD
mY5BTIjH3yTQPDhQmFvITCAjKaMYc3eDLml/nF4tTCU0MFig+KsRsWNIEFXtSsI0
wO+S19QhHzj5odUok5IDC+cNTXScp2HV+vFoOhhf0zDzXqwVxRr7lO5i+n37ELMp
Mm9g2+EeUt43xTQxzbmNn5Kkpq9PMEnQFU2UkvJleg+KCgzSYThcR8/KUDKySZpk
TP+mcR5PevcqGhLt7vYS2lGh8Ye1warzp54C7Je8P8Txg3BM8xBynT1d3fgrlKfm
AOAPVW3PV6bPhgVYXZJopH3ykfmYM4ZiIvhRcgLyf6tbZAU6Twpiq823TAOVHyPI
SRcW8dehuvgq1VJIpRGZOSB2qIvFrqLhl0B1CtT04gFWJW9bSa2n5Y1h4Gcqy29o
SLJiKscx2KqvPmQqarLUUnuOZ5hhIrtYhkhhJuuwqZqzS1Kkz/mSB1MkPQEGxJi1
MpoTfbQ/0KTYXCqqgs/GBnDJ0mYrcvtBoGP7bjnVYnXpANP2bs+ZpQVPVq+17uuQ
k0gjxe8iENqXjW6JMlFX5K3dxG5ygXjfECMWsCJ+JdCtJdaIL8I46X/u7wHU2mfY
bohhb7xS2+HIPxz6w8aRu3IQG00mMv06vCYPBbPh+W0dUtocdM3U2kpe5gPYm1iz
kWx3WLaM
=ONcj
-----END PGP SIGNATURE-----
Merge tag 'irqchip-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core
Pull irqchip updates from Marc Zyngier:
- New and improved irqdomain locking, closing a number of races that
became apparent now that we are able to probe drivers in parallel
- A bunch of OF node refcounting bugs have been fixed
- We now have a new IPI mux, lifted from the Apple AIC code and
made common. It is expected that riscv will eventually benefit
from it
- Two small fixes for the Broadcom L2 drivers
- Various cleanups and minor bug fixes
Link: https://lore.kernel.org/r/20230218143452.3817627-1-maz@kernel.org
* irq/irqdomain-locking:
: .
: irqdomain locking overhaul courtesy of Johan Hovold.
:
: From the cover letter:
:
: "Parallel probing (e.g. due to asynchronous probing) of devices that
: share interrupts can currently result in two mappings for the same
: hardware interrupt to be created.
:
: This series fixes this mapping race and reworks the irqdomain locking so
: that in the end the global irq_domain_mutex is only used for managing
: the likewise global irq_domain_list, while domain operations (e.g. IRQ
: allocations) use per-domain (hierarchy) locking."
: .
irqdomain: Switch to per-domain locking
irqchip/mvebu-odmi: Use irq_domain_create_hierarchy()
irqchip/loongson-pch-msi: Use irq_domain_create_hierarchy()
irqchip/gic-v3-mbi: Use irq_domain_create_hierarchy()
irqchip/gic-v3-its: Use irq_domain_create_hierarchy()
irqchip/gic-v2m: Use irq_domain_create_hierarchy()
irqchip/alpine-msi: Use irq_domain_add_hierarchy()
x86/uv: Use irq_domain_create_hierarchy()
x86/ioapic: Use irq_domain_create_hierarchy()
irqdomain: Clean up irq_domain_push/pop_irq()
irqdomain: Drop leftover brackets
irqdomain: Drop dead domain-name assignment
irqdomain: Drop revmap mutex
irqdomain: Fix domain registration race
irqdomain: Fix mapping-creation race
irqdomain: Refactor __irq_domain_alloc_irqs()
irqdomain: Look for existing mapping only once
irqdomain: Drop bogus fwspec-mapping error handling
irqdomain: Fix disassociation race
irqdomain: Fix association race
Signed-off-by: Marc Zyngier <maz@kernel.org>
The IRQ domain structures are currently protected by the global
irq_domain_mutex. Switch to using more fine-grained per-domain locking,
which can speed up parallel probing by reducing lock contention.
On a recent arm64 laptop, the total time spent waiting for the locks
during boot drops from 160 to 40 ms on average, while the maximum
aggregate wait time drops from 550 to 90 ms over ten runs for example.
Note that the domain lock of the root domain (innermost domain) must be
used for hierarchical domains. For non-hierarchical domains (as for root
domains), the new root pointer is set to the domain itself so that
&domain->root->mutex always points to the right lock.
Also note that hierarchical domains should be constructed using
irq_domain_create_hierarchy() (or irq_domain_add_hierarchy()) to avoid
having racing allocations access a not fully initialised domain. As a
safeguard, the lockdep assertion in irq_domain_set_mapping() will catch
any offenders that also fail to set the root domain pointer.
Tested-by: Hsin-Yi Wang <hsinyi@chromium.org>
Tested-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230213104302.17307-21-johan+linaro@kernel.org
The irq_domain_push_irq() interface is used to add a new (outmost) level
to a hierarchical domain after IRQs have been allocated.
Possibly due to differing mental images of hierarchical domains, the
names used for the irq_data variables make these functions much harder
to understand than what they need to be.
Rename the struct irq_data pointer to the data embedded in the
descriptor as simply 'irq_data' and refer to the data allocated by this
interface as 'parent_irq_data' so that the names reflect how
hierarchical domains are implemented.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Tested-by: Hsin-Yi Wang <hsinyi@chromium.org>
Tested-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230213104302.17307-12-johan+linaro@kernel.org
Drop some unnecessary brackets that were left in place when the
corresponding code was updated.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Tested-by: Hsin-Yi Wang <hsinyi@chromium.org>
Tested-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230213104302.17307-11-johan+linaro@kernel.org
Since commit d59f6617ee ("genirq: Allow fwnode to carry name
information only") an IRQ domain is always given a name during
allocation (e.g. used for the debugfs entry).
Drop the leftover name assignment when allocating the first IRQ.
Tested-by: Hsin-Yi Wang <hsinyi@chromium.org>
Tested-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230213104302.17307-10-johan+linaro@kernel.org
The revmap mutex is essentially only used to maintain the integrity of
the radix tree during updates (lookups use RCU).
As the global irq_domain_mutex is now held in all paths that update the
revmap structures there is strictly no longer any need for the dedicated
mutex, which can be removed.
Drop the revmap mutex and add lockdep assertions to the revmap helpers
to make sure that the global lock is always held when updating the
revmap.
Tested-by: Hsin-Yi Wang <hsinyi@chromium.org>
Tested-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230213104302.17307-9-johan+linaro@kernel.org
Hierarchical domains created using irq_domain_create_hierarchy() are
currently added to the domain list before having been fully initialised.
This specifically means that a racing allocation request might fail to
allocate irq data for the inner domains of a hierarchy in case the
parent domain pointer has not yet been set up.
Note that this is not really any issue for irqchip drivers that are
registered early (e.g. via IRQCHIP_DECLARE() or IRQCHIP_ACPI_DECLARE())
but could potentially cause trouble with drivers that are registered
later (e.g. modular drivers using IRQCHIP_PLATFORM_DRIVER_BEGIN(),
gpiochip drivers, etc.).
Fixes: afb7da83b9 ("irqdomain: Introduce helper function irq_domain_add_hierarchy()")
Cc: stable@vger.kernel.org # 3.19
Signed-off-by: Marc Zyngier <maz@kernel.org>
[ johan: add commit message ]
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230213104302.17307-8-johan+linaro@kernel.org
Parallel probing of devices that share interrupts (e.g. when a driver
uses asynchronous probing) can currently result in two mappings for the
same hardware interrupt to be created due to missing serialisation.
Make sure to hold the irq_domain_mutex when creating mappings so that
looking for an existing mapping before creating a new one is done
atomically.
Fixes: 765230b5f0 ("driver-core: add asynchronous probing support for drivers")
Fixes: b62b2cf575 ("irqdomain: Fix handling of type settings for existing mappings")
Link: https://lore.kernel.org/r/YuJXMHoT4ijUxnRb@hovoldconsulting.com
Cc: stable@vger.kernel.org # 4.8
Cc: Dmitry Torokhov <dtor@chromium.org>
Cc: Jon Hunter <jonathanh@nvidia.com>
Tested-by: Hsin-Yi Wang <hsinyi@chromium.org>
Tested-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230213104302.17307-7-johan+linaro@kernel.org
Refactor __irq_domain_alloc_irqs() so that it can be called internally
while holding the irq_domain_mutex.
This will be used to fix a shared-interrupt mapping race, hence the
Fixes tag.
Fixes: b62b2cf575 ("irqdomain: Fix handling of type settings for existing mappings")
Cc: stable@vger.kernel.org # 4.8
Tested-by: Hsin-Yi Wang <hsinyi@chromium.org>
Tested-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230213104302.17307-6-johan+linaro@kernel.org
Avoid looking for an existing mapping twice when creating a new mapping
using irq_create_fwspec_mapping() by factoring out the actual allocation
which is shared with irq_create_mapping_affinity().
The new helper function will also be used to fix a shared-interrupt
mapping race, hence the Fixes tag.
Fixes: b62b2cf575 ("irqdomain: Fix handling of type settings for existing mappings")
Cc: stable@vger.kernel.org # 4.8
Tested-by: Hsin-Yi Wang <hsinyi@chromium.org>
Tested-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230213104302.17307-5-johan+linaro@kernel.org
In case a newly allocated IRQ ever ends up not having any associated
struct irq_data it would not even be possible to dispose the mapping.
Replace the bogus disposal with a WARN_ON().
This will also be used to fix a shared-interrupt mapping race, hence the
CC-stable tag.
Fixes: 1e2a7d7849 ("irqdomain: Don't set type when mapping an IRQ")
Cc: stable@vger.kernel.org # 4.8
Tested-by: Hsin-Yi Wang <hsinyi@chromium.org>
Tested-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230213104302.17307-4-johan+linaro@kernel.org
The global irq_domain_mutex is held when mapping interrupts from
non-hierarchical domains but currently not when disposing them.
This specifically means that updates of the domain mapcount is racy
(currently only used for statistics in debugfs).
Make sure to hold the global irq_domain_mutex also when disposing
mappings from non-hierarchical domains.
Fixes: 9dc6be3d41 ("genirq/irqdomain: Add map counter")
Cc: stable@vger.kernel.org # 4.13
Tested-by: Hsin-Yi Wang <hsinyi@chromium.org>
Tested-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230213104302.17307-3-johan+linaro@kernel.org
The sanity check for an already mapped virq is done outside of the
irq_domain_mutex-protected section which means that an (unlikely) racing
association may not be detected.
Fix this by factoring out the association implementation, which will
also be used in a follow-on change to fix a shared-interrupt mapping
race.
Fixes: ddaf144c61 ("irqdomain: Refactor irq_domain_associate_many()")
Cc: stable@vger.kernel.org # 3.11
Tested-by: Hsin-Yi Wang <hsinyi@chromium.org>
Tested-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230213104302.17307-2-johan+linaro@kernel.org
Using __irq_domain_alloc_irqs() is an unnecessary complexity. Use
irq_domain_alloc_irqs(), which is simpler and makes the code more
readable.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Here are a number of small char/misc/whatever driver fixes for 6.2-rc7.
They include:
- IIO driver fixes for some reported problems
- nvmem driver fixes
- fpga driver fixes
- debugfs memory leak fix in the hv_balloon and irqdomain code
(irqdomain change was acked by the maintainer.)
All have been in linux-next with no reported problems.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCY9+Xkw8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+yl92QCfSTATuad8nUXmtBDsveUyAV3G6pkAoJFMWAIj
otmAl9XOGWxdwAURdovs
=tSdo
-----END PGP SIGNATURE-----
Merge tag 'char-misc-6.2-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
"Here are a number of small char/misc/whatever driver fixes. They
include:
- IIO driver fixes for some reported problems
- nvmem driver fixes
- fpga driver fixes
- debugfs memory leak fix in the hv_balloon and irqdomain code
(irqdomain change was acked by the maintainer)
All have been in linux-next with no reported problems"
* tag 'char-misc-6.2-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (33 commits)
kernel/irq/irqdomain.c: fix memory leak with using debugfs_lookup()
HV: hv_balloon: fix memory leak with using debugfs_lookup()
nvmem: qcom-spmi-sdam: fix module autoloading
nvmem: core: fix return value
nvmem: core: fix cell removal on error
nvmem: core: fix device node refcounting
nvmem: core: fix registration vs use race
nvmem: core: fix cleanup after dev_set_name()
nvmem: core: remove nvmem_config wp_gpio
nvmem: core: initialise nvmem->id early
nvmem: sunxi_sid: Always use 32-bit MMIO reads
nvmem: brcm_nvram: Add check for kzalloc
iio: imu: fxos8700: fix MAGN sensor scale and unit
iio: imu: fxos8700: remove definition FXOS8700_CTRL_ODR_MIN
iio: imu: fxos8700: fix failed initialization ODR mode assignment
iio: imu: fxos8700: fix incorrect ODR mode readback
iio: light: cm32181: Fix PM support on system with 2 I2C resources
iio: hid: fix the retval in gyro_3d_capture_sample
iio: hid: fix the retval in accel_3d_capture_sample
iio: imu: st_lsm6dsx: fix build when CONFIG_IIO_TRIGGERED_BUFFER=m
...
All RISC-V platforms have a single HW IPI provided by the INTC local
interrupt controller. The HW method to trigger INTC IPI can be through
external irqchip (e.g. RISC-V AIA), through platform specific device
(e.g. SiFive CLINT timer), or through firmware (e.g. SBI IPI call).
To support multiple IPIs on RISC-V, add a generic IPI multiplexing
mechanism which help us create multiple virtual IPIs using a single
HW IPI. This generic IPI multiplexing is inspired by the Apple AIC
irqchip driver and it is shared by various RISC-V irqchip drivers.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Hector Martin <marcan@marcan.st>
Tested-by: Hector Martin <marcan@marcan.st>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230103141221.772261-4-apatel@ventanamicro.com
When calling debugfs_lookup() the result must have dput() called on it,
otherwise the memory will leak over time. To make things simpler, just
call debugfs_lookup_and_remove() instead which handles all of the logic
at once.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable <stable@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230202151554.2310273-1-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
msi_create_device_irq_domain() creates a firmware node for the new domain,
which is never freed. kmemleak reports:
unreferenced object 0xffff888120ba9a00 (size 96):
comm "systemd-modules", pid 221, jiffies 4294893411 (age 635.732s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 e0 19 8b 83 ff ff ff ff ................
00 00 00 00 00 00 00 00 18 9a ba 20 81 88 ff ff ........... ....
backtrace:
[<000000008cdbc98d>] __irq_domain_alloc_fwnode+0x51/0x2b0
[<00000000c57acf9d>] msi_create_device_irq_domain+0x283/0x670
[<000000009b567982>] __pci_enable_msix_range+0x49e/0xdb0
[<0000000077cc1445>] pci_alloc_irq_vectors_affinity+0x11f/0x1c0
[<00000000532e9ef5>] mlx5_irq_table_create+0x24c/0x940 [mlx5_core]
[<00000000fabd2b80>] mlx5_load+0x1fa/0x680 [mlx5_core]
[<000000006bb22ae4>] mlx5_init_one+0x485/0x670 [mlx5_core]
[<00000000eaa5e1ad>] probe_one+0x4c2/0x720 [mlx5_core]
[<00000000df8efb43>] local_pci_probe+0xd6/0x170
[<0000000085cb9924>] pci_device_probe+0x231/0x6e0
Use the proper free operation for the firmware wnode so the name is freed
during error unwind of msi_create_device_irq_domain() and also free the
node in msi_remove_device_irq_domain() if it was automatically allocated.
To avoid extra NULL pointer checks make irq_domain_free_fwnode() tolerant
of NULL.
Fixes: 27a6dea3eb ("genirq/msi: Provide msi_create/free_device_irq_domain()")
Reported-by: Omri Barazi <obarazi@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Kalle Valo <kvalo@kernel.org>
Tested-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/0-v2-24af6665e2da+c9-msi_leak_jgg@nvidia.com
group_cpus_evenly() has become a generic function which can be used for
other subsystems than the interrupt subsystem, so move it into lib/.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/20221227022905.352674-6-ming.lei@redhat.com
Map irq vector into group, which allows to abstract the algorithm for
a generic use case outside of the interrupt core.
Rename irq_build_affinity_masks as group_cpus_evenly, so the API can be
reused for blk-mq to make default queue mapping even though irq vectors
aren't involved.
No functional change, just rename vector as group.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/20221227022905.352674-5-ming.lei@redhat.com
Prepare for abstracting irq_build_affinity_masks() into a public function
for assigning all CPUs evenly into several groups.
Don't pass irq_affinity_desc array to irq_build_affinity_masks, instead
return a cpumask array by storing each assigned group into one element of
the array.
This allows to provide a generic interface for grouping all CPUs evenly
from a NUMA and CPU locality viewpoint, and the cost is one extra allocation
in irq_build_affinity_masks(), which should be fine since it is done via
GFP_KERNEL and irq_build_affinity_masks() is a slow path anyway.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/20221227022905.352674-4-ming.lei@redhat.com
Pass affinity managed mask array to irq_build_affinity_masks() so that the
index of the first affinity managed vector is always zero.
This allows to simplify the implementation a bit.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/20221227022905.352674-3-ming.lei@redhat.com
The 'firstvec' parameter is always same with the parameter of
'startvec', so use 'startvec' directly inside irq_build_affinity_masks().
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/20221227022905.352674-2-ming.lei@redhat.com
s390 doesn't use irq_domains, so it has no place to set
IRQ_DOMAIN_FLAG_ISOLATED_MSI. Instead of continuing to abuse the iommu
subsystem to convey this information add a simple define which s390 can
make statically true. The define will cause msi_device_has_isolated() to
return true.
Remove IOMMU_CAP_INTR_REMAP from the s390 iommu driver.
Link: https://lore.kernel.org/r/8-v3-3313bb5dd3a3+10f11-secure_msi_jgg@nvidia.com
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
What x86 calls "interrupt remapping" is one way to achieve isolated MSI,
make it clear this is talking about isolated MSI, no matter how it is
achieved. This matches the new driver facing API name of
msi_device_has_isolated_msi()
No functional change.
Link: https://lore.kernel.org/r/6-v3-3313bb5dd3a3+10f11-secure_msi_jgg@nvidia.com
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
After converting the users of irq_domain_check_msi_remap() it and the
helpers are no longer needed.
The new version does not require all the #ifdef helpers and inlines
because CONFIG_GENERIC_MSI_IRQ always requires CONFIG_IRQ_DOMAIN and
IRQ_DOMAIN_HIERARCHY.
Link: https://lore.kernel.org/r/5-v3-3313bb5dd3a3+10f11-secure_msi_jgg@nvidia.com
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
This will replace irq_domain_check_msi_remap() in following patches.
The new API makes it more clear what "msi_remap" actually means from a
functional perspective instead of identifying an implementation specific
HW feature.
Isolated MSI means that HW modeled by an irq_domain on the path from the
initiating device to the CPU will validate that the MSI message specifies
an interrupt number that the device is authorized to trigger. This must
block devices from triggering interrupts they are not authorized to
trigger. Currently authorization means the MSI vector is one assigned to
the device.
This is interesting for securing VFIO use cases where a rouge MSI (eg
created by abusing a normal PCI MemWr DMA) must not allow the VFIO
userspace to impact outside its security domain, eg userspace triggering
interrupts on kernel drivers, a VM triggering interrupts on the
hypervisor, or a VM triggering interrupts on another VM.
As this is actually modeled as a per-irq_domain property, not a global
platform property, correct the interface to accept the device parameter
and scan through only the part of the irq_domains hierarchy originating
from the source device.
Locate the new code in msi.c as it naturally only works with
CONFIG_GENERIC_MSI_IRQ, which also requires CONFIG_IRQ_DOMAIN and
IRQ_DOMAIN_HIERARCHY.
Link: https://lore.kernel.org/r/1-v3-3313bb5dd3a3+10f11-secure_msi_jgg@nvidia.com
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
With the introduction of threaded interrupt handlers, it is virtually
never safe to call disable_irq() from non-premptible context.
Thus: Update the documentation, add an explicit might_sleep() to catch any
offenders. This is more obvious and straight forward than the implicit
might_sleep() check deeper down in the disable_irq() call chain.
Fixes: 3aa551c9b4 ("genirq: add threaded interrupt handler support")
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20221216150441.200533-3-manfred@colorfullife.com
On architectures such as s390 that do not use irq domains for MSI,
returning 0 as the maximum MSI index is a bit counter-productive,
as it indicates that no MSI can be allocated. Bad idea.
Instead, return the maximum we're willing to support in the MSI
backing store (MSI_XA_DOMAIN_SIZE), and let the arch code do its
usual thing.
Thanks to Matthew Rosato for fixing the fix.
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[maz: commit message]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/87fsdgzpqs.ffs@tglx
For architectures such as s390 and powerpc that do not use
irq domains for MSIs, dev->msi.domain is always NULL, so
the per-device, per-bus MSI domain is also guaranteed to
be NULL.
So checking one without checking the other is bound to result
in a splat, followed by a memory leak as we don't free the MSI
descriptors.
Add the missing check.
Reported-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/e570e70d-19bc-101b-0481-ff9a3cab3504@linux.ibm.com
For supporting post MSI-X enable allocations and for the upcoming PCI/IMS
support a separate interface is required which allows not only the
allocation of a specific index, but also the allocation of any, i.e. the
next free index. The latter is especially required for IMS because IMS
completely does away with index to functionality mappings which are
often found in MSI/MSI-X implementation.
But even with MSI-X there are devices where only the first few indices have
a fixed functionality and the rest is freely assignable by software,
e.g. to queues.
msi_domain_alloc_irq_at() is also different from the range based interfaces
as it always enforces that the MSI descriptor is allocated by the core code
and not preallocated by the caller like the PCI/MSI[-X] enable code path
does.
msi_domain_alloc_irq_at() can be invoked with the index argument set to
MSI_ANY_INDEX which makes the core code pick the next free index. The irq
domain can provide a prepare_desc() operation callback in it's
msi_domain_ops to do domain specific post allocation initialization before
the actual Linux interrupt and the associated interrupt descriptor and
hierarchy alloccations are conducted.
The function also takes an optional @icookie argument which is of type
union msi_instance_cookie. This cookie is not used by the core code and is
stored in the allocated msi_desc::data::icookie. The meaning of the cookie
is completely implementation defined. In case of IMS this might be a PASID
or a pointer to a device queue, but for the MSI core it's opaque and not
used in any way.
The function returns a struct msi_map which on success contains the
allocated index number and the Linux interrupt number so the caller can
spare the index to Linux interrupt number lookup.
On failure map::index contains the error code and map::virq is 0.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124232326.501359457@linutronix.de
The existing MSI domain ops msi_prepare() and set_desc() turned out to be
unsuitable for implementing IMS support.
msi_prepare() does not operate on the MSI descriptors. set_desc() lacks
an irq_domain pointer and has a completely different purpose.
Introduce a prepare_desc() op which allows IMS implementations to amend an
MSI descriptor which was allocated by the core code, e.g. by adjusting the
iomem base or adding some data based on the allocated index. This is way
better than requiring that all IMS domain implementations preallocate the
MSI descriptor and then allocate the interrupt.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124232326.444560717@linutronix.de
Provide new bus tokens for the upcoming per device PCI/MSI and PCI/MSIX
interrupt domains.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124232325.917219885@linutronix.de
Per device domains provide the real domain size to the core code. This
allows range checking on insertion of MSI descriptors and also paves the
way for dynamic index allocations which are required e.g. for IMS. This
avoids external mechanisms like bitmaps on the device side and just
utilizes the core internal MSI descriptor storxe for it.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124232325.798556374@linutronix.de
Provide an interface to match a per device domain bus token. This allows to
query which type of domain is installed for a particular domain id. Will be
used for PCI to avoid frequent create/remove cycles for the MSI resp. MSI-X
domains.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124232325.738047902@linutronix.de
Now that all prerequsites are in place, provide the actual interfaces for
creating and removing per device interrupt domains.
MSI device interrupt domains are created from the provided
msi_domain_template which is duplicated so that it can be modified for the
particular device.
The name of the domain and the name of the interrupt chip are composed by
"$(PREFIX)$(CHIPNAME)-$(DEVNAME)"
$PREFIX: The optional prefix provided by the underlying MSI parent domain
via msi_parent_ops::prefix.
$CHIPNAME: The name of the irq_chip in the template
$DEVNAME: The name of the device
The domain is further initialized through a MSI parent domain callback which
fills in the required functionality for the parent domain or domains further
down the hierarchy. This initialization can fail, e.g. when the requested
feature or MSI domain type cannot be supported.
The domain pointer is stored in the pointer array inside of msi_device_data
which is attached to the domain.
The domain can be removed via the API or left for disposal via devres when
the device is torn down. The API removal is useful e.g. for PCI to have
seperate domains for MSI and MSI-X, which are mutually exclusive and always
occupy the default domain id slot.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124232325.678838546@linutronix.de
Split the functionality of msi_create_irq_domain() so it can
be reused for creating per device irq domains.
No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124232325.559086358@linutronix.de
To allow proper range checking especially for dynamic allocations add a
size field to struct msi_domain_info. If the field is 0 then the size is
unknown or unlimited (up to MSI_MAX_INDEX) to provide backwards
compability.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124232325.501144862@linutronix.de
MSI parent domains must have some control over the MSI domains which are
built on top. On domain creation they need to fill in e.g. architecture
specific chip callbacks or msi domain ops to make the outermost domain
parent agnostic which is obviously required for architecture independence
etc.
The structure contains:
1) A bitfield which exposes the supported functional features. This
allows to check for features and is also used in the initialization
callback to mask out unsupported features when the actual domain
implementation requests a broader range, e.g. on x86 PCI multi-MSI
is only supported by remapping domains but not by the underlying
vector domain. The PCI/MSI code can then always request multi-MSI
support, but the resulting feature set after creation might not
have it set.
2) An optional string prefix which is put in front of domain and chip
names during creation of the MSI domain. That allows to keep the
naming schemes e.g. on x86 where PCI-MSI domains have a IR- prefix
when interrupt remapping is enabled.
3) An initialization callback to sanity check the domain info of
the to be created MSI domain, to restrict features and to
apply changes in MSI ops and interrupt chip callbacks to
accomodate to the particular MSI parent implementation and/or
the underlying hierarchy.
Add a conveniance function to delegate the initialization from the
MSI parent domain to an underlying domain in the hierarchy.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124232325.382485843@linutronix.de
Now that all users are converted remove the old interfaces.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124230314.694291814@linutronix.de
Provide two sorts of interfaces to handle the different use cases:
- msi_domain_alloc_irqs_range():
Handles a caller defined precise range
- msi_domain_alloc_irqs_all():
Allocates all interrupts associated to a domain by scanning the
allocated MSI descriptors
The latter is useful for the existing PCI/MSI support which does not have
range information available.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124230314.396497163@linutronix.de
Provide two sorts of interfaces to handle the different use cases:
- msi_domain_free_irqs_range():
Handles a caller defined precise range
- msi_domain_free_irqs_all():
Frees all interrupts associated to a domain
The latter is useful for device teardown and to handle the legacy MSI support
which does not have any range information available.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124230314.337844751@linutronix.de
Allocating simple interrupt descriptors in the core code has to be multi
device irqdomain aware for the upcoming PCI/IMS support.
Change the interfaces to take a domain id into account. Use the internal
control struct for transport of arguments.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124230314.279112474@linutronix.de
Change the descriptor free functions to take a domain id to prepare for the
upcoming multi MSI domain per device support.
To avoid changing and extending the interfaces over and over use an core
internal control struct and hand the pointer through the various functions.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124230314.220788011@linutronix.de
Change the descriptor allocation and insertion functions to take a domain
id to prepare for the upcoming multi MSI domain per device support.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124230314.163043028@linutronix.de
This reflects the functionality better. No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124230314.103554618@linutronix.de
In preparation of the upcoming per device multi MSI domain support, change
the interface to support lookups based on domain id and zero based index
within the domain.
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124230314.044613697@linutronix.de
To support multiple MSI interrupt domains per device it is necessary to
segment the xarray MSI descriptor storage. Each domain gets up to
MSI_MAX_INDEX entries.
Change the iterators so they operate with domain ids and take the domain
offsets into account.
The publicly available iterators which are mostly used in legacy
implementations and the PCI/MSI core default to MSI_DEFAULT_DOMAIN (0)
which is the id for the existing "global" domains.
No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124230313.985498981@linutronix.de
With the upcoming per device MSI interrupt domain support it is necessary
to store the domain pointers per device.
Instead of delegating that storage to device drivers or subsystems add a
domain pointer to the msi_dev_domain array in struct msi_device_data.
This pointer is also used to take care of tearing down the irq domains when
msi_device_data is cleaned up via devres.
The interfaces into the MSI core will be changed from irqdomain pointer
based interfaces to domain id based interfaces to support multiple MSI
domains on a single device (e.g. PCI/MSI[-X] and PCI/IMS.
Once the per device domain support is complete the irq domain pointer in
struct device::msi.domain will not longer contain a pointer to the "global"
MSI domain. It will contain a pointer to the MSI parent domain instead.
It would be a horrible maze of conditionals to evaluate all over the place
which domain pointer should be used, i.e. the "global" one in
device::msi::domain or one from the internal pointer array.
To avoid this evaluate in msi_setup_device_data() whether the irq domain
which is associated to a device is a "global" or a parent MSI domain. If it
is global then copy the pointer into the first entry of the msi_dev_domain
array.
This allows to convert interfaces and implementation to domain ids while
keeping everything existing working.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124230313.923860399@linutronix.de
The upcoming support for multiple MSI domains per device requires storage
for the MSI descriptors and in a second step storage for the irqdomain
pointers.
Move the xarray into a separate data structure msi_dev_domain and create an
array with size 1 in msi_device_data, which can be expanded later when the
support for per device domains is implemented.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124230313.864887773@linutronix.de
In the upcoming per device MSI domain concept the MSI parent domains are
not allowed to be used as regular MSI domains where the MSI allocation/free
operations are applicable.
Add appropriate checks.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124230313.806128070@linutronix.de
irq_domain::dev is a misnomer as it's usually the rule that a device
pointer points to something which is directly related to the instance.
irq_domain::dev can point to some other device for power management to
ensure that this underlying device is not powered down when an interrupt is
allocated.
The upcoming per device MSI domains really require a pointer to the device
which instantiated the irq domain and not to some random other device which
is required for power management down the chain.
Rename irq_domain::dev to irq_domain::pm_dev and fixup the few sites which
use that pointer.
Conversion was done with the help of coccinelle.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124230313.574541683@linutronix.de
It's truly a MSI only flag and for the upcoming per device MSI domains this
must be in the MSI flags so it can be set during domain setup without
exposing this quirk outside of x86.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124230313.454246167@linutronix.de
Fault injection tests trigger warnings like this:
kernfs: can not remove 'chip_name', no directory
WARNING: CPU: 0 PID: 253 at fs/kernfs/dir.c:1616 kernfs_remove_by_name_ns+0xce/0xe0
RIP: 0010:kernfs_remove_by_name_ns+0xce/0xe0
Call Trace:
<TASK>
remove_files.isra.1+0x3f/0xb0
sysfs_remove_group+0x68/0xe0
sysfs_remove_groups+0x41/0x70
__kobject_del+0x45/0xc0
kobject_del+0x29/0x40
free_desc+0x42/0x70
irq_free_descs+0x5e/0x90
The reason is that the interrupt descriptor sysfs handling does not roll
back on a failing kobject_add() during allocation. If the descriptor is
freed later on, kobject_del() is invoked with a not added kobject resulting
in the above warnings.
A proper rollback in case of a kobject_add() failure would be the straight
forward solution. But this is not possible due to the way how interrupt
descriptor sysfs handling works.
Interrupt descriptors are allocated before sysfs becomes available. So the
sysfs files for the early allocated descriptors are added later in the boot
process. At this point there can be nothing useful done about a failing
kobject_add(). For consistency the interrupt descriptor allocation always
treats kobject_add() failures as non-critical and just emits a warning.
To solve this problem, keep track in the interrupt descriptor whether
kobject_add() was successful or not and make the invocation of
kobject_del() conditional on that.
[ tglx: Massage changelog, comments and use a state bit. ]
Fixes: ecb3f394c5 ("genirq: Expose interrupt information through sysfs")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20221128151612.1786122-1-yangyingliang@huawei.com
Adjust to reality and remove another layer of pointless Kconfig
indirection. CONFIG_GENERIC_MSI_IRQ is good enough to serve
all purposes.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20221111122014.524842979@linutronix.de
Add a bus token member to struct msi_domain_info and let
msi_create_irq_domain() set the bus token.
That allows to remove the bus token updates at the call sites.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20221111122014.294554462@linutronix.de
Now that the last user is gone, confine it to the core code.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20221111122014.179595843@linutronix.de
To prepare for removing the exposure of __msi_domain_free_irqs() provide a
post_free() callback in the MSI domain ops which can be used to solve
the problem of the only user of __msi_domain_free_irqs() in arch/powerpc.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20221111122014.063153448@linutronix.de
When a range of descriptors is freed then all of them are not associated to
a linux interrupt. Remove the filter and add a warning to the free function.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20221111122013.888850936@linutronix.de
There are no associated MSI descriptors in the requested range when the MSI
descriptor allocation fails. Use MSI_DESC_ALL as the filter which prepares
the next step to get rid of the filter for freeing.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20221111122013.831151822@linutronix.de
commit 509853f9e1 ("genirq: Provide generic_handle_irq_safe()")
addressed the problem of demultiplexing interrupt handlers which are force
threaded on PREEMPT_RT enabled kernels which means that the demultiplexed
handler is invoked with interrupts enabled which triggers a lockdep
warning due to a non-irq safe lock acquisition.
The same problem exists for the irq domain based interrupt handling via
generic_handle_domain_irq() which has been reported against the AMD
pin-ctrl driver.
Provide generic_handle_domain_irq_safe() which can used from any context.
[ tglx: Split the usage sites out and massaged changelog ]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/YnkfWFzvusFFktSt@linutronix.de
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215954
* Core code update:
- Non-SMP IRQ affinity fixes, allowing UP kernel to behave similarly
to SMP ones for the purpose of interrupt affinity
- Let irq_set_chip_handler_name_locked() take a const struct irq_chip *
- Tidy-up the NOMAP irqdomain API variant
- Teach action_show() to use for_each_action_of_desc()
- Make irq_chip_request_resources_parent() allow the parent callback
to be optional
- Remove dynamic allocations from populate_parent_alloc_arg()
* New drivers:
- Merge the long awaited IRQ support for the LoongArch architecture,
with the provisional ACPICA update (to be reverted once the official
support lands)
- New Renesas RZ/G2L IRQC driver, equipped with its companion GPIO
driver
* Driver updates
- Optimise the hot path operations for the SiFive PLIC, trading the
locking for per-CPU priority masking masking operations which are
apparently faster
- Work around broken PLIC implementations that deal pretty badly with
edge-triggered interrupts. Flag two implementations as affected.
- Simplify the irq-stm32-exti driver, particularly the table that
remaps the interrupts from exti to the GIC, reducing the memory usage
- Convert the ocelot irq_chip to being immutable
- Check ioremap() return value in the MIPS GIC driver
- Move MMP driver init function declarations into the common .h
- The obligatory typo fixes
-----BEGIN PGP SIGNATURE-----
iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmLhi0EPHG1hekBrZXJu
ZWwub3JnAAoJECPQ0LrRPXpDI+wP/2BPABqCwu7JAmue8hHtpGweVkEBNulaS1K+
1v/ElU8E1P8ppn1AVmu0lwgmAWiTtPuVWT21+AUbfOjQQ/MYKkegkH4KLmK93qSi
Dn3MEiYv8WYsEV4yrJ7Il7fuuzr1iHIhIfhg3tMxDAJx47lzZH0o8nVoNFqXD1Ro
WUFab/qTAOxJ/I53R9nrpx/yRf5dVRFUAZznrabYKpc/CiD/X+RLcHkbiybbRERu
n3xwEtGMA2faCUgifKhsXoNUaW9mZLaufoF/F3J3Dyt7jNB/TTmdnxqWo6e4/rtd
+Ut0MlH0W7bUdHGiVl1E90hDQ3yuBykUpKlTfMoYWOxeTqAx0bPYjGIuh6ajrAIy
+fXWcK89KimOGB+aLC0cR5YrzvShHnH1G2qlrQg3pAXporigAXfZvzhnouRBsVKO
RfnQHNsHSQHXTEu1u2HjMt7AXtmy/SoJENuPPUrtLfojg8b3aupwOvPLVx7w1Ok0
5TKZ2yhOHdskfr3lCPisVPKK0KZ+QZhDdBkd319JkxR5/iA/tzMeMTzKslruhd2U
Ug6hYhKNE2kKkBBBMcEVCHAuuq94DnU/q6l458UTSkkBmvq5cMMSz5Fs0kMwGFRc
q/DncpKQnrPKGrwiilj1uGgOWO2vec8KfMJUYtKzSM/QELbKUvF7yzZeIjUQxiDO
KqlWNczX
=E5fZ
-----END PGP SIGNATURE-----
Merge tag 'irqchip-5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core
Pull irqchip/genirq updates from Marc Zyngier:
* Core code update:
- Non-SMP IRQ affinity fixes, allowing UP kernel to behave similarly
to SMP ones for the purpose of interrupt affinity
- Let irq_set_chip_handler_name_locked() take a const struct irq_chip *
- Tidy-up the NOMAP irqdomain API variant
- Teach action_show() to use for_each_action_of_desc()
- Make irq_chip_request_resources_parent() allow the parent callback
to be optional
- Remove dynamic allocations from populate_parent_alloc_arg()
* New drivers:
- Merge the long awaited IRQ support for the LoongArch architecture,
with the provisional ACPICA update (to be reverted once the official
support lands)
- New Renesas RZ/G2L IRQC driver, equipped with its companion GPIO
driver
* Driver updates
- Optimise the hot path operations for the SiFive PLIC, trading the
locking for per-CPU priority masking masking operations which are
apparently faster
- Work around broken PLIC implementations that deal pretty badly with
edge-triggered interrupts. Flag two implementations as affected.
- Simplify the irq-stm32-exti driver, particularly the table that
remaps the interrupts from exti to the GIC, reducing the memory usage
- Convert the ocelot irq_chip to being immutable
- Check ioremap() return value in the MIPS GIC driver
- Move MMP driver init function declarations into the common .h
- The obligatory typo fixes
Link: https://lore.kernel.org/all/20220727192356.1860546-1-maz@kernel.org
* irq/misc-5.20:
: .
: Misc IRQ changes for 5.20:
:
: - Let irq_set_chip_handler_name_locked() take a const struct irq_chip *
:
: - Convert the ocelot irq_chip to being immutable (depends on the above)
:
: - Tidy-up the NOMAP irqdomain API variant
:
: - Teach action_show() to use for_each_action_of_desc()
:
: - Check ioremap() return value in the MIPS GIC driver
:
: - Move MMP driver init function declarations into the common .h
:
: - The obligatory typo fixes
: .
irqchip/mmp: Declare init functions in common header file
irqchip/mips-gic: Check the return value of ioremap() in gic_of_init()
genirq: Use for_each_action_of_desc in actions_show()
irqdomain: Use hwirq_max instead of revmap_size for NOMAP domains
irqdomain: Report irq number for NOMAP domains
irqchip/gic-v3: Fix comment typo
pinctrl: ocelot: Make irq_chip immutable
genirq: Allow irq_set_chip_handler_name_locked() to take a const irq_chip
Signed-off-by: Marc Zyngier <maz@kernel.org>
* irq/loongarch:
: .
: Merge the long awaited IRQ support for the LoongArch architecture.
:
: From the cover letter:
:
: "Currently, LoongArch based processors (e.g. Loongson-3A5000)
: can only work together with LS7A chipsets. The irq chips in
: LoongArch computers include CPUINTC (CPU Core Interrupt
: Controller), LIOINTC (Legacy I/O Interrupt Controller),
: EIOINTC (Extended I/O Interrupt Controller), PCH-PIC (Main
: Interrupt Controller in LS7A chipset), PCH-LPC (LPC Interrupt
: Controller in LS7A chipset) and PCH-MSI (MSI Interrupt Controller)."
:
: Note that this comes with non-official, arch private ACPICA
: definitions until the official ACPICA update is realeased.
: .
irqchip / ACPI: Introduce ACPI_IRQ_MODEL_LPIC for LoongArch
irqchip: Add LoongArch CPU interrupt controller support
irqchip: Add Loongson Extended I/O interrupt controller support
irqchip/loongson-liointc: Add ACPI init support
irqchip/loongson-pch-msi: Add ACPI init support
irqchip/loongson-pch-pic: Add ACPI init support
irqchip: Add Loongson PCH LPC controller support
LoongArch: Prepare to support multiple pch-pic and pch-msi irqdomain
LoongArch: Use ACPI_GENERIC_GSI for gsi handling
genirq/generic_chip: Export irq_unmap_generic_chip
ACPI: irq: Allow acpi_gsi_to_irq() to have an arch-specific fallback
APCI: irq: Add support for multiple GSI domains
LoongArch: Provisionally add ACPICA data structures
Signed-off-by: Marc Zyngier <maz@kernel.org>
Refactor action_show() to use for_each_action_of_desc instead
of a similar open-coded loop.
Signed-off-by: Paran Lee <p4ranlee@gmail.com>
[maz: reword commit message]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220710112614.19410-1-p4ranlee@gmail.com
Some irq controllers have to re-implement a private version for
irq_generic_chip_ops, because they have a different xlate to translate
hwirq. Export irq_unmap_generic_chip to allow reusing in drivers.
Signed-off-by: Jianmin Lv <lvjianmin@loongson.cn>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/1658314292-35346-5-git-send-email-lvjianmin@loongson.cn
NOMAP irq domains use the revmap_size field to indicate the maximum
hwirq number the domain accepts. This is a bit confusing as
revmap_size is usually used to indicate the size of the revmap array,
which a NOMAP domain doesn't have.
Instead, use the hwirq_max field which has the correct semantics, and
keep revmap_size to 0 for a NOMAP domain.
Signed-off-by: Xu Qiang <xuqiang36@huawei.com>
[maz: commit message]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220719063641.56541-3-xuqiang36@huawei.com
When using a NOMAP domain, __irq_resolve_mapping() doesn't store
the Linux IRQ number at the address optionally provided by the caller.
While this isn't a huge deal (the returned value is guaranteed
to the hwirq that was passed as a parameter), let's honour the letter
of the API by writing the expected value.
Fixes: d22558dd0a (“irqdomain: Introduce irq_resolve_mapping()”)
Signed-off-by: Xu Qiang <xuqiang36@huawei.com>
[maz: commit message]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220719063641.56541-2-xuqiang36@huawei.com
* irq/plic-masking:
: .
: SiFive PLIC optimisations from Samuel Holland:
:
: "This series removes the spinlocks and cpumask operations from the PLIC
: driver's hot path. As far as I know, using the priority to mask
: interrupts is an intended usage and will work on all existing
: implementations. [...]"
: .
irqchip/sifive-plic: Separate the enable and mask operations
irqchip/sifive-plic: Make better use of the effective affinity mask
PCI: hv: Take a const cpumask in hv_compose_msi_req_get_cpu()
genirq: Provide an IRQ affinity mask in non-SMP configs
genirq: Return a const cpumask from irq_data_get_affinity_mask
genirq: Add and use an irq_data_update_affinity helper
genirq: Refactor accessors to use irq_data_get_affinity_mask
genirq: Drop redundant irq_init_effective_affinity
genirq: GENERIC_IRQ_EFFECTIVE_AFF_MASK depends on SMP
genirq: GENERIC_IRQ_IPI depends on SMP
irqchip/mips-gic: Only register IPI domain when SMP is enabled
Signed-off-by: Marc Zyngier <maz@kernel.org>
Now that the irq_data_update_affinity helper exists, enforce its use
by returning a a const cpumask from irq_data_get_affinity_mask.
Since the previous commit already updated places that needed to call
irq_data_update_affinity, this commit updates the remaining code that
either did not modify the cpumask or immediately passed the modified
mask to irq_set_affinity.
Signed-off-by: Samuel Holland <samuel@sholland.org>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220701200056.46555-8-samuel@sholland.org
An IRQ's effective affinity can only be different from its configured
affinity if there are multiple CPUs. Make it clear that this option is
only meaningful when SMP is enabled. Most of the relevant code in
irqdesc.c is already hidden behind CONFIG_SMP anyway.
Signed-off-by: Samuel Holland <samuel@sholland.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220701200056.46555-4-samuel@sholland.org
The generic IPI code depends on the IRQ affinity mask being allocated
and initialized. This will not be the case if SMP is disabled. Fix up
the remaining driver that selected GENERIC_IRQ_IPI in a non-SMP config.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Samuel Holland <samuel@sholland.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220701200056.46555-3-samuel@sholland.org
Function irq_chip::irq_request_resources() is reported as optional
in the declaration of struct irq_chip.
If the parent irq_chip does not implement it, we should ignore it
and return.
Don't return error if the functions is missing.
Signed-off-by: Antonio Borneo <antonio.borneo@foss.st.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220512160544.13561-1-antonio.borneo@foss.st.com
Ever since {suspend,resume}_device_irqs() were introduced in 2009
by commit 0a0c5168df ("PM: Introduce functions for suspending and
resuming device interrupts"), they've been exported even though there
are no module users and never will be: The functions are solely called
by the PM core, which is always built-in. Unexport them.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/fad9b50609f9d9828ea14772dbd4d195713f1c4b.1654846687.git.lukas@wunner.de
When requesting an interrupt, we correctly call into the runtime
PM framework to guarantee that the underlying interrupt controller
is up and running.
However, we fail to do so for chained interrupt controllers, as
the mux interrupt is not requested along the same path.
Augment __irq_do_set_handler() to call into the runtime PM code
in this case, making sure the PM flow is the same for all interrupts.
Reported-by: Lucas Stach <l.stach@pengutronix.de>
Tested-by: Liu Ying <victor.liu@nxp.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/26973cddee5f527ea17184c0f3fccb70bc8969a0.camel@pengutronix.de
Core code:
- Make the managed interrupts more robust by shutting them down in the
core code when the assigned affinity mask does not contain online
CPUs.
- Make the irq simulator chip work on RT
- A small set of cpumask and power manageent cleanups
Drivers:
- A set of changes which mark GPIO interrupt chips immutable to prevent
the GPIO subsystem from modifying it under the hood. This provides
the necessary infrastructure and converts a set of GPIO and pinctrl
drivers over.
- A set of changes to make the pseudo-NMI handling for GICv3 more
robust: a missing barrier and consistent handling of the priority
mask.
- Another set of GICv3 improvements and fixes, but nothing outstanding
- The usual set of improvements and cleanups all over the place
- No new irqchip drivers and not even a new device tree binding!
100+ interrupt chips are truly enough.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmKLOEoTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoQ4ED/9B1kDwunvkNAPJDmSmr4hFU7EU3ZLb
SyS2099PWekgU3TaWdD6eILm9hIvsAmmhbU7CJ0EWol6G5VsqbNoYsfOsWliuGTi
CL3ygZL84hL4b24c3sipqWAF60WCEKLnYV7pb1DgiZM41C87+wxPB49FQbHVjroz
WDRTF8QYWMqoTRvxGMCflDfkAwydlCrqzQwgyUB5hJj3vbiYX9dVMAkJmHRyM3Uq
Prwhx1Ipbj/wBSReIbIXlNx4XI/iUDI0UWeh02XkVxLb5Jzg7vPCHiuyVMR1DW2J
oEjAR+/1sGwVOoRnfRlwdRUmRRItdlbopbL4CuhO/ENrM/r/o/rMvDDMwF4WoMW9
zXvzFBLllVpLvyFvVHO1LKI6Hx2mdyAmQ1M/TxMFOmHAyfOPtN150AJDPKdCrMk/
0F0B0y/KPgU9P/Q9yLh2UiXRAkoUBpLpk20xZbAUGHnjXXkys4Z2fE+THIob+Ibe
pUnXsgCXVVWyqJjdikPF2gqsSsCFUo7iblHRzI0hzOAPe3MTph0qh3hZoFAFNEYP
IIyAv9+IiT1EvBMgjHNmZ51U0uTbt3qWOSxebEoU3a598wwEVNRRVyutqvREXhl8
inkzpL2N3uBPX7sA25lYkH4QKRbzVoNkF/s0e/J9WZdYbj3SsxGouoGdYA2xgvtM
8tiCnFC9hfzepQ==
=xcXv
-----END PGP SIGNATURE-----
Merge tag 'irq-core-2022-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull interrupt handling updates from Thomas Gleixner:
"Core code:
- Make the managed interrupts more robust by shutting them down in
the core code when the assigned affinity mask does not contain
online CPUs.
- Make the irq simulator chip work on RT
- A small set of cpumask and power manageent cleanups
Drivers:
- A set of changes which mark GPIO interrupt chips immutable to
prevent the GPIO subsystem from modifying it under the hood. This
provides the necessary infrastructure and converts a set of GPIO
and pinctrl drivers over.
- A set of changes to make the pseudo-NMI handling for GICv3 more
robust: a missing barrier and consistent handling of the priority
mask.
- Another set of GICv3 improvements and fixes, but nothing
outstanding
- The usual set of improvements and cleanups all over the place
- No new irqchip drivers and not even a new device tree binding!
100+ interrupt chips are truly enough"
* tag 'irq-core-2022-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (39 commits)
irqchip: Add Kconfig symbols for sunxi drivers
irqchip/gic-v3: Fix priority mask handling
irqchip/gic-v3: Refactor ISB + EOIR at ack time
irqchip/gic-v3: Ensure pseudo-NMIs have an ISB between ack and handling
genirq/irq_sim: Make the irq_work always run in hard irq context
irqchip/armada-370-xp: Do not touch Performance Counter Overflow on A375, A38x, A39x
irqchip/gic: Improved warning about incorrect type
irqchip/csky: Return true/false (not 1/0) from bool functions
irqchip/imx-irqsteer: Add runtime PM support
irqchip/imx-irqsteer: Constify irq_chip struct
irqchip/armada-370-xp: Enable MSI affinity configuration
irqchip/aspeed-scu-ic: Fix irq_of_parse_and_map() return value
irqchip/aspeed-i2c-ic: Fix irq_of_parse_and_map() return value
irqchip/sun6i-r: Use NULL for chip_data
irqchip/xtensa-mx: Fix initial IRQ affinity in non-SMP setup
irqchip/exiu: Fix acknowledgment of edge triggered interrupts
irqchip/gic-v3: Claim iomem resources
dt-bindings: interrupt-controller: arm,gic-v3: Make the v2 compat requirements explicit
irqchip/gic-v3: Relax polling of GIC{R,D}_CTLR.RWP
irqchip/gic-v3: Detect LPI invalidation MMIO registers
...
- Add new infrastructure to stop gpiolib from rewriting irq_chip
structures behind our back. Convert a few of them, but this will
obviously be a long effort.
- A bunch of GICv3 improvements, such as using MMIO-based invalidations
when possible, and reducing the amount of polling we perform when
reconfiguring interrupts.
- Another set of GICv3 improvements for the Pseudo-NMI functionality,
with a nice cleanup making it easy to reason about the various
states we can be in when an NMI fires.
- The usual bunch of misc fixes and minor improvements.
-----BEGIN PGP SIGNATURE-----
iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmKGcX8PHG1hekBrZXJu
ZWwub3JnAAoJECPQ0LrRPXpD7kYP/1sbxyRoq7iWqtTDK7ENWvqXh5wu/YZe0pnw
jr0hPrJTdQKUbsBA+pusklEnTHvRgnLOmFpfR3X7apGg/If7mPRZGQcZz3fXKwDA
53u74IzZhYa+fx9H0L1qtBUHJtTP4/IexkzL/84R19u2/ewIhzDyhpvGxA/yAFj+
Gi6bgz93NGMOt/tdNtXZvj5zdr+5BayC6JBpnyzliyxS1xD3YeA0T05fHDYfjrcM
51gUeA/9tA3EWiRzsdZGq6uDaUfBW5aspWu0bZx/WBUWNBvAAjWzhIgNWDW/xKJP
N3t6UQ6+uNYJXvdaCJlBLc6TiXBzGXINgr4oMljg8nJRYLt+xVsadkTnFxlnqoY/
FNeEiOUQqjZ1qcvHJoIceGHgTq//o3VaZ+AnuAESqeNPGavz+LMOCNo7Su+k2+Tk
H3x09+p+SbrzJvRVyboLVk+v74NtzEz1fGrjEzQk2eHw+dc18yz1v+D1EX1REkhM
gjzjSIAgZoq1M3GZL8tyrov44vhG3mUm3jAO01u9fRTHqEee6WIKt0aijSe/sCRr
chTf+S9n8xPsr6AHUPQImV/fSismK4erCJeAiSp+P3hZjqyK8iPsHgiM5YLj50Cl
ry9dACxv6CYf7lMKmKPC/atV1IlJSEZpguc6FLQ2tv9IBWqNMQXve0012acFMr6B
ZpncbECV
=nQxd
-----END PGP SIGNATURE-----
Merge tag 'irqchip-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core
Pull irqchip updates from Marc Zyngier:
- Add new infrastructure to stop gpiolib from rewriting irq_chip
structures behind our back. Convert a few of them, but this will
obviously be a long effort.
- A bunch of GICv3 improvements, such as using MMIO-based invalidations
when possible, and reducing the amount of polling we perform when
reconfiguring interrupts.
- Another set of GICv3 improvements for the Pseudo-NMI functionality,
with a nice cleanup making it easy to reason about the various
states we can be in when an NMI fires.
- The usual bunch of misc fixes and minor improvements.
Link: https://lore.kernel.org/all/20220519165308.998315-1-maz@kernel.org
The IRQ simulator uses irq_work to trigger an interrupt. Without the
IRQ_WORK_HARD_IRQ flag the irq_work will be performed in thread context
on PREEMPT_RT. This causes locking errors later in handle_simple_irq()
which expects to be invoked with disabled interrupts.
Triggering individual interrupts in hardirq context should not lead to
unexpected high latencies since this is also what the hardware
controller does. Also it is used as a simulator so...
Use IRQ_WORK_INIT_HARD() to carry out the irq_work in hardirq context on
PREEMPT_RT.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/YnuZBoEVMGwKkLm+@linutronix.de
Since commit 0953fb2637 ("irq: remove handle_domain_{irq,nmi}()"),
generic_handle_domain_irq() warns if called outside hardirq context, even
though the function calls down to handle_irq_desc(), which warns about the
same, but conditionally on handle_enforce_irqctx().
The newly added warning is a false positive if the interrupt originates
from any other irqchip than x86 APIC or ARM GIC/GICv3. Those are the only
ones for which handle_enforce_irqctx() returns true. Per commit
c16816acd0 ("genirq: Add protection against unsafe usage of
generic_handle_irq()"):
"In general calling generic_handle_irq() with interrupts disabled from non
interrupt context is harmless. For some interrupt controllers like the
x86 trainwrecks this is outright dangerous as it might corrupt state if
an interrupt affinity change is pending."
Examples for interrupt chips where the warning is a false positive are
USB-attached GPIO controllers such as drivers/gpio/gpio-dln2.c:
USB gadgets are incapable of directly signaling an interrupt because they
cannot initiate a bus transaction by themselves. All communication on
the bus is initiated by the host controller, which polls a gadget's
Interrupt Endpoint in regular intervals. If an interrupt is pending,
that information is passed up the stack in softirq context, from which a
hardirq is synthesized via generic_handle_domain_irq().
Remove the warning to eliminate such false positives.
Fixes: 0953fb2637 ("irq: remove handle_domain_{irq,nmi}()")
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Jakub Kicinski <kuba@kernel.org>
CC: Linus Walleij <linus.walleij@linaro.org>
Cc: Bartosz Golaszewski <brgl@bgdev.pl>
Cc: Octavian Purdila <octavian.purdila@nxp.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20220505113207.487861b2@kernel.org
Link: https://lore.kernel.org/r/20220506203242.GA1855@wunner.de
Link: https://lore.kernel.org/r/c3caf60bfa78e5fdbdf483096b7174da65d1813a.1652168866.git.lukas@wunner.de
A kernel hang can be observed when running setserial in a loop on a kernel
with force threaded interrupts. The sequence of events is:
setserial
open("/dev/ttyXXX")
request_irq()
do_stuff()
-> serial interrupt
-> wake(irq_thread)
desc->threads_active++;
close()
free_irq()
kthread_stop(irq_thread)
synchronize_irq() <- hangs because desc->threads_active != 0
The thread is created in request_irq() and woken up, but does not get on a
CPU to reach the actual thread function, which would handle the pending
wake-up. kthread_stop() sets the should stop condition which makes the
thread immediately exit, which in turn leaves the stale threads_active
count around.
This problem was introduced with commit 519cc8652b, which addressed a
interrupt sharing issue in the PCIe code.
Before that commit free_irq() invoked synchronize_irq(), which waits for
the hard interrupt handler and also for associated threads to complete.
To address the PCIe issue synchronize_irq() was replaced with
__synchronize_hardirq(), which only waits for the hard interrupt handler to
complete, but not for threaded handlers.
This was done under the assumption, that the interrupt thread already
reached the thread function and waits for a wake-up, which is guaranteed to
be handled before acting on the stop condition. The problematic case, that
the thread would not reach the thread function, was obviously overlooked.
Make sure that the interrupt thread is really started and reaches
thread_fn() before returning from __setup_irq().
This utilizes the existing wait queue in the interrupt descriptor. The
wait queue is unused for non-shared interrupts. For shared interrupts the
usage might cause a spurious wake-up of a waiter in synchronize_irq() or the
completion of a threaded handler might cause a spurious wake-up of the
waiter for the ready flag. Both are harmless and have no functional impact.
[ tglx: Amended changelog ]
Fixes: 519cc8652b ("genirq: Synchronize only with single thread on free_irq()")
Signed-off-by: Thomas Pfaff <tpfaff@pcs.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/552fe7b4-9224-b183-bb87-a8f36d335690@pcs.com
pm_runtime_resume_and_get() achieves the same and simplifies the code.
[ tglx: Simplify it further by presetting retval ]
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20220418110716.2559453-1-chi.minghao@zte.com.cn
Variable end is being initialized with a value that is never read, it
is being re-assigned later with the same value. The initialization is
redundant and can be removed.
Cleans up clang scan build warning:
kernel/irq/matrix.c:289:25: warning: Value stored to 'end' during its
initialization is never read [deadcode.DeadStores]
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Tom Rix <trix@redhat.com>
Link: https://lore.kernel.org/r/20220422110418.1264778-1-colin.i.king@gmail.com
* irq/gpio-immutable:
: .
: First try at preventing the GPIO subsystem from abusing irq_chip
: data structures. The general idea is to have an irq_chip flag
: to tell the GPIO subsystem that these structures are immutable,
: and to convert drivers one by one.
: .
Documentation: Update the recommended pattern for GPIO irqchips
gpio: Update TODO to mention immutable irq_chip structures
pinctrl: amd: Make the irqchip immutable
pinctrl: msmgpio: Make the irqchip immutable
pinctrl: apple-gpio: Make the irqchip immutable
gpio: pl061: Make the irqchip immutable
gpio: tegra186: Make the irqchip immutable
gpio: Add helpers to ease the transition towards immutable irq_chip
gpio: Expose the gpiochip_irq_re[ql]res helpers
gpio: Don't fiddle with irqchips marked as immutable
Signed-off-by: Marc Zyngier <maz@kernel.org>
In order to move away from gpiolib messing with the internals of
unsuspecting irqchips, add a flag by which irqchips advertise
that they are not to be messed with, and do solemnly swear that
they correctly call into the gpiolib helpers when required.
Also nudge the users into converting their drivers to the
new model.
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Reviewed-by: Bartosz Golaszewski <brgl@bgdev.pl>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220419141846.598305-2-maz@kernel.org
Although setting the affinity of an interrupt to a set of CPUs that doesn't
have any online CPU is generally frowned apon, there are a few limited
cases where such affinity is set from a CPUHP notifier, setting the
affinity to a CPU that isn't online yet.
The saving grace is that this is always done using the 'force' attribute,
which gives a hint that the affinity setting can be outside of the online
CPU mask and the callsite set this flag with the knowledge that the
underlying interrupt controller knows to handle it.
This restores the expected behaviour on Marek's system.
Fixes: 33de0aa4ba ("genirq: Always limit the affinity to online CPUs")
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/4b7fc13c-887b-a664-26e8-45aed13f048a@samsung.com
Link: https://lore.kernel.org/r/20220414140011.541725-1-maz@kernel.org
If CPUs on a node are offline at boot time, the number of nodes is
different when building affinity masks for present cpus and when building
affinity masks for possible cpus. This causes the following problem:
In the case that the number of vectors is less than the number of nodes
there are cases where bits of masks for present cpus are overwritten when
building masks for possible cpus.
Fix this by excluding CPUs, which are not part of the current build mask
(present/possible).
[ tglx: Massaged changelog and added comment ]
Fixes: b825921990 ("genirq/affinity: Spread IRQs to all available NUMA nodes")
Signed-off-by: Rei Yamamoto <yamamoto.rei@jp.fujitsu.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20220331003309.10891-1-yamamoto.rei@jp.fujitsu.com
__irq_build_affinity_masks() calls cpumask_weight() to check if any bit of
a given cpumask is set.
This can be done more efficiently with cpumask_empty() because
cpumask_empty() stops traversing the cpumask as soon as it finds first set
bit, while cpumask_weight() counts all bits unconditionally.
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20220210224933.379149-22-yury.norov@gmail.com
When booting with maxcpus=<small number> (or even loading a driver
while most CPUs are offline), it is pretty easy to observe managed
affinities containing a mix of online and offline CPUs being passed
to the irqchip driver.
This means that the irqchip cannot trust the affinity passed down
from the core code, which is a bit annoying and requires (at least
in theory) all drivers to implement some sort of affinity narrowing.
In order to address this, always limit the cpumask to the set of
online CPUs.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20220405185040.206297-3-maz@kernel.org
When booting with maxcpus=<small number>, interrupt controllers
such as the GICv3 ITS may not be able to satisfy the affinity of
some managed interrupts, as some of the HW resources are simply
not available.
The same thing happens when loading a driver using managed interrupts
while CPUs are offline.
In order to deal with this, do not try to activate such interrupt
if there is no online CPU capable of handling it. Instead, place
it in shutdown state. Once a capable CPU shows up, it will be
activated.
Reported-by: John Garry <john.garry@huawei.com>
Reported-by: David Decotigny <ddecotig@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: John Garry <john.garry@huawei.com>
Link: https://lore.kernel.org/r/20220405185040.206297-2-maz@kernel.org
- Cleanups for SCHED_DEADLINE
- Tracing updates/fixes
- CPU Accounting fixes
- First wave of changes to optimize the overhead of the scheduler build,
from the fast-headers tree - including placeholder *_api.h headers for
later header split-ups.
- Preempt-dynamic using static_branch() for ARM64
- Isolation housekeeping mask rework; preperatory for further changes
- NUMA-balancing: deal with CPU-less nodes
- NUMA-balancing: tune systems that have multiple LLC cache domains per node (eg. AMD)
- Updates to RSEQ UAPI in preparation for glibc usage
- Lots of RSEQ/selftests, for same
- Add Suren as PSI co-maintainer
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmI5rg8RHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1hGrw/+M3QOk6fH7G48wjlNnBvcOife6ls+Ni4k
ixOAcF4JKoixO8HieU5vv0A7yf/83tAa6fpeXeMf1hkCGc0NSlmLtuIux+WOmoAL
LzCyDEYfiP8KnVh0A1Tui/lK0+AkGo21O6ADhQE2gh8o2LpslOHQMzvtyekSzeeb
mVxMYQN+QH0m518xdO2D8IQv9ctOYK0eGjmkqdNfntOlytypPZHeNel/tCzwklP/
dElJUjNiSKDlUgTBPtL3DfpoLOI/0mHF2p6NEXvNyULxSOqJTu8pv9Z2ADb2kKo1
0D56iXBDngMi9MHIJLgvzsA8gKzHLFSuPbpODDqkTZCa28vaMB9NYGhJ643NtEie
IXTJEvF1rmNkcLcZlZxo0yjL0fjvPkczjw4Vj27gbrUQeEBfb4mfuI4BRmij63Ep
qEkgQTJhduCqqrQP1rVyhwWZRk1JNcVug+F6N42qWW3fg1xhj0YSrLai2c9nPez6
3Zt98H8YGS1Z/JQomSw48iGXVqfTp/ETI7uU7jqHK8QcjzQ4lFK5H4GZpwuqGBZi
NJJ1l97XMEas+rPHiwMEN7Z1DVhzJLCp8omEj12QU+tGLofxxwAuuOVat3CQWLRk
f80Oya3TLEgd22hGIKDRmHa22vdWnNQyS0S15wJotawBzQf+n3auS9Q3/rh979+t
ES/qvlGxTIs=
=Z8uT
-----END PGP SIGNATURE-----
Merge tag 'sched-core-2022-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
- Cleanups for SCHED_DEADLINE
- Tracing updates/fixes
- CPU Accounting fixes
- First wave of changes to optimize the overhead of the scheduler
build, from the fast-headers tree - including placeholder *_api.h
headers for later header split-ups.
- Preempt-dynamic using static_branch() for ARM64
- Isolation housekeeping mask rework; preperatory for further changes
- NUMA-balancing: deal with CPU-less nodes
- NUMA-balancing: tune systems that have multiple LLC cache domains per
node (eg. AMD)
- Updates to RSEQ UAPI in preparation for glibc usage
- Lots of RSEQ/selftests, for same
- Add Suren as PSI co-maintainer
* tag 'sched-core-2022-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (81 commits)
sched/headers: ARM needs asm/paravirt_api_clock.h too
sched/numa: Fix boot crash on arm64 systems
headers/prep: Fix header to build standalone: <linux/psi.h>
sched/headers: Only include <linux/entry-common.h> when CONFIG_GENERIC_ENTRY=y
cgroup: Fix suspicious rcu_dereference_check() usage warning
sched/preempt: Tell about PREEMPT_DYNAMIC on kernel headers
sched/topology: Remove redundant variable and fix incorrect type in build_sched_domains
sched/deadline,rt: Remove unused parameter from pick_next_[rt|dl]_entity()
sched/deadline,rt: Remove unused functions for !CONFIG_SMP
sched/deadline: Use __node_2_[pdl|dle]() and rb_first_cached() consistently
sched/deadline: Merge dl_task_can_attach() and dl_cpu_busy()
sched/deadline: Move bandwidth mgmt and reclaim functions into sched class source file
sched/deadline: Remove unused def_dl_bandwidth
sched/tracing: Report TASK_RTLOCK_WAIT tasks as TASK_UNINTERRUPTIBLE
sched/tracing: Don't re-read p->state when emitting sched_switch event
sched/rt: Plug rt_mutex_setprio() vs push_rt_task() race
sched/cpuacct: Remove redundant RCU read lock
sched/cpuacct: Optimize away RCU read lock
sched/cpuacct: Fix charge percpu cpuusage
sched/headers: Reorganize, clean up and optimize kernel/sched/sched.h dependencies
...
- Add support for the STM32MP13 variant
- Move parent device away from struct irq_chip
- Remove all instances of non-const strings assigned to
struct irq_chip::name, enabling a nice cleanup for VIC and GIC)
- Simplify the Qualcomm PDC driver
- A bunch of SiFive PLIC cleanups
- Add support for a new variant of the Meson GPIO block
- Add support for the irqchip side of the Apple M1 PMU
- Add support for the Apple M1 Pro/Max AICv2 irqchip
- Add support for the Qualcomm MPM wakeup gadget
- Move the Xilinx driver over to the generic irqdomain handling
- Tiny speedup for IPIs on GICv3 systems
- The usual odd cleanups
-----BEGIN PGP SIGNATURE-----
iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmItxJMPHG1hekBrZXJu
ZWwub3JnAAoJECPQ0LrRPXpD7ccQAIkkNoC6yQ+9lhbdRrlo6KUtUT2apDheIF+5
Yfo7dTeKMUb4NpQs+b4v01A0B3KSLPuwTulWfGXhsLRXVcfEEnkBCQzy/IQnkYTQ
DDvxENRz40SS0WJF1G74a7KsqHt+epyHZkB6KJQV4BYrZKxt2h0tWNSiNf1IDN/e
9mZq2kLgEk0kfRCR9u6NYGMugbrgbdtiLgwBARKdRtAAkjBlGEtC2slp0a3WTsyg
QfnMWMOK22wa34eZzFG8VrJMVwGyeqMP/ZW30EoClBzPyLUM5aZWRr+LSvLYQC4n
ho6ua1+a2726TBT6vtWNi0KDNcXwhL6JheO4m2bCoWPvu4YengfKQ5QllAFvSR3W
e4oT/xwkBcf+n5ehXEfxqTRRxG398oWYI60kX586dIcr9qN9WBsw1S5aPkDeZ+nT
6THbQ5uZrIqkeWOoJmvg+iwKkE/NQY/xUENW0zeG2f4/YLIGeKK7e1/XCl1jqzlk
vIvf/bYr64TgOvvHhIeh1G5iXQnk1TWoCzW0DQ8BIXhjlbVRG39QuvwjXKok4AhK
QgKMi6N1ge4nKO1gcYbR174gDz+MylZP41ddDACVXT/5hzsfyxLF36ixdyMLKwtr
Lybb4PGB5Pf0Zgxu6cVWeVsEZEwtlMCmIi1XUW4YRv2saypTPD5V78Ug6jbyPMXE
G7J5dxwS
=cf1B
-----END PGP SIGNATURE-----
Merge tag 'irqchip-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core
Pull irqchip updates from Marc Zyngier:
- Add support for the STM32MP13 variant
- Move parent device away from struct irq_chip
- Remove all instances of non-const strings assigned to
struct irq_chip::name, enabling a nice cleanup for VIC and GIC)
- Simplify the Qualcomm PDC driver
- A bunch of SiFive PLIC cleanups
- Add support for a new variant of the Meson GPIO block
- Add support for the irqchip side of the Apple M1 PMU
- Add support for the Apple M1 Pro/Max AICv2 irqchip
- Add support for the Qualcomm MPM wakeup gadget
- Move the Xilinx driver over to the generic irqdomain handling
- Tiny speedup for IPIs on GICv3 systems
- The usual odd cleanups
Link: https://lore.kernel.org/all/20220313105142.704579-1-maz@kernel.org
Provide generic_handle_irq_safe() which can used from any context.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Reviewed-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Link: https://lore.kernel.org/r/20220211181500.1856198-2-bigeasy@linutronix.de
Refer to housekeeping APIs using single feature types instead of flags.
This prevents from passing multiple isolation features at once to
housekeeping interfaces, which soon won't be possible anymore as each
isolation features will have their own cpumask.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Juri Lelli <juri.lelli@redhat.com>
Reviewed-by: Phil Auld <pauld@redhat.com>
Link: https://lore.kernel.org/r/20220207155910.527133-5-frederic@kernel.org
In order to let a const irqchip be fed to the irqchip layer, adjust
the various prototypes. An extra cast in irq_set_chip()() is required
to avoid a warning.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/20220209162607.1118325-3-maz@kernel.org
In order to let a const irqchip be fed to the irqchip layer, adjust
the various prototypes. An extra cast in irq_domain_set_hwirq_and_chip()
is required to avoid a warning.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/20220209162607.1118325-2-maz@kernel.org