mirror of
https://git.proxmox.com/git/mirror_ubuntu-kernels.git
synced 2025-11-27 20:26:07 +00:00
27d033df35
44796 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
365346980e |
Merge tag 'sched_urgent_for_v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Borislav Petkov:
- Fix a performance regression when measuring the CPU time of a thread (clock_gettime(CLOCK_THREAD_CPUTIME_ID,...)) due to the addition of PSI IRQ time accounting in the hotpath
- Fix a task_struct leak caused by a missing refcount decrement when the task is enqueued before the timer that is supposed to drop the reference expires
- Revert an attempt to expedite detaching of movable tasks, as finding those could become very costly. Turns out the original issue wasn't even hit by anyone
* tag 'sched_urgent_for_v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched: Move psi_account_irqtime() out of update_rq_clock_task() hotpath
sched/deadline: Fix task_struct reference leak
Revert "sched/fair: Make sure to try to detach at least one movable task" |
||
|
|
a6fcd19d7e |
bpf: Defer work in bpf_timer_cancel_and_free
Currently, the same case as in the previous patch (two timer callbacks trying
to cancel each other) can be invoked through bpf_map_update_elem as
well, or more precisely, freeing map elements containing timers. Since
this relies on hrtimer_cancel as well, it is prone to the same deadlock
situation as the previous patch.
It would be sufficient to use hrtimer_try_to_cancel to fix this problem,
as the timer cannot be enqueued after async_cancel_and_free. Once
async_cancel_and_free has been done, the timer must be reinitialized
before it can be armed again. The callback running in parallel trying to
arm the timer will fail, and freeing bpf_hrtimer without waiting is
sufficient (given kfree_rcu), and bpf_timer_cb will return
HRTIMER_NORESTART, preventing the timer from being rearmed again.
However, there exists a UAF scenario where the callback arms the timer
before entering this function and cancellation then fails (because a
timer callback invokes this routine, or because the target timer callback
is running concurrently). In such a case, if the timer expiration is
significantly far in the future, the RCU grace period will expire
before it and free the bpf_hrtimer state and, along with it,
the struct hrtimer that is still enqueued.
Hence, it is clear cancellation needs to occur after
async_cancel_and_free, and yet it cannot be done inline due to deadlock
issues. We thus modify bpf_timer_cancel_and_free to defer work to the
global workqueue, adding a work_struct alongside rcu_head (both used at
_different_ points of time, so can share space).
Update existing code comments to reflect the new state of affairs.
Fixes:
|
||
|
|
d4523831f0 |
bpf: Fail bpf_timer_cancel when callback is being cancelled
Given a schedule:
timer1 cb                     timer2 cb
bpf_timer_cancel(timer2);     bpf_timer_cancel(timer1);
Both bpf_timer_cancel calls would wait for the other callback to finish
executing, introducing a lockup.
Add an atomic_t count named 'cancelling' in bpf_hrtimer. This keeps
track of all in-flight cancellation requests for a given BPF timer.
Whenever cancelling a BPF timer, we must check if we have outstanding
cancellation requests, and if so, we must fail the operation with an
error (-EDEADLK) since cancellation is synchronous and waits for the
callback to finish executing. This implies that we can enter a deadlock
situation involving two or more timer callbacks executing in parallel
and attempting to cancel one another.
Note that we avoid incrementing the cancelling counter for the target
timer (the one being cancelled) if bpf_timer_cancel is not invoked from
a callback, to avoid spurious errors. The whole point of detecting
cur->cancelling and returning -EDEADLK is to not enter a busy wait loop
(which may or may not lead to a lockup). This does not apply in case the
caller is in a non-callback context, the other side can continue to
cancel as it sees fit without running into errors.
Background on prior attempts:
Earlier versions of this patch used a bool 'cancelling' bit and used the
following pattern under timer->lock to publish cancellation status.
lock(t->lock);
t->cancelling = true;
mb();
if (cur->cancelling)
return -EDEADLK;
unlock(t->lock);
hrtimer_cancel(t->timer);
t->cancelling = false;
The store outside the critical section could overwrite a parallel
request's assignment of t->cancelling to true, an assignment made to ensure
that the callback executing in parallel observes its cancellation status.
It would be necessary to clear this cancelling bit once hrtimer_cancel
is done, but lack of serialization introduced races. Another option was
explored where bpf_timer_start would clear the bit when (re)starting the
timer under timer->lock. This would ensure serialized access to the
cancelling bit, but may allow it to be cleared before in-flight
hrtimer_cancel has finished executing, such that lockups can occur
again.
Thus, we choose an atomic counter to keep track of all outstanding
cancellation requests and use it to prevent lockups in case callbacks
attempt to cancel each other while executing in parallel.
Reported-by: Dohyun Kim <dohyunkim@google.com>
Reported-by: Neel Natu <neelnatu@google.com>
Fixes:
|
||
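The counter scheme described in this commit message can be sketched in user space. The following is a minimal sequential model, not the kernel implementation: the class `TimerModel` and the functions `begin_cancel` and `end_cancel` are hypothetical names, and no real concurrency, locking, or memory barriers are modelled. It only illustrates the order of the increment and the check.

```python
import errno


class TimerModel:
    """Hypothetical stand-in for a BPF timer; only the counter is modelled."""

    def __init__(self, name):
        self.name = name
        self.cancelling = 0  # in-flight cancellation requests targeting this timer


def begin_cancel(target, cur=None):
    """Start cancelling `target`.

    `cur` is the timer whose callback issues the call, or None when the
    caller is not running inside a timer callback.
    Returns 0 if the caller may go on to wait for `target`'s callback,
    or -EDEADLK if waiting could deadlock.
    """
    if cur is None:
        # Non-callback context: nobody can be waiting on the caller,
        # so the counter is left alone to avoid spurious errors.
        return 0
    target.cancelling += 1      # publish our request first
    if cur.cancelling > 0:      # someone is already cancelling the caller
        target.cancelling -= 1
        return -errno.EDEADLK
    return 0


def end_cancel(target, cur=None):
    """Finish a cancellation that begin_cancel() allowed to proceed."""
    if cur is not None:
        target.cancelling -= 1


timer1, timer2 = TimerModel("timer1"), TimerModel("timer2")

# timer1's callback starts cancelling timer2 and would now wait for it.
r1 = begin_cancel(timer2, cur=timer1)
# Meanwhile timer2's callback tries to cancel timer1.
r2 = begin_cancel(timer1, cur=timer2)
# timer2's callback received an error and returns, so timer1's wait can end.
end_cancel(timer2, cur=timer1)
```

In this interleaving the first request proceeds and the second fails with -EDEADLK, so only one side ever waits. A call made with `cur=None` never touches the counter, matching the non-callback case described above.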
|
|
af253aef18 |
bpf: fix order of args in call to bpf_map_kvcalloc
The original function call passed the size of smap->bucket before the number of buckets, which raises the error 'calloc-transposed-args' on compilation.
Vlastimil Babka added:
The order of parameters can be traced back all the way to |
||
|
|
cf3f9a593d |
mm: optimize the redundant loop of mm_update_owner_next()
When mm_update_owner_next() is racing with swapoff (try_to_unuse()) or /proc or ptrace or page migration (get_task_mm()), it is impossible to find an appropriate task_struct in the loop whose mm_struct is the same as the target mm_struct.
If the above race condition is combined with the stress-ng-zombie and stress-ng-dup tests, such a long loop can easily cause a Hard Lockup in write_lock_irq() for tasklist_lock.
Recognize this situation in advance and exit early.
Link: https://lkml.kernel.org/r/20240620122123.3877432-1-alexjlzheng@tencent.com
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Tycho Andersen <tandersen@netflix.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
|
|
ddae0ca2a8 |
sched: Move psi_account_irqtime() out of update_rq_clock_task() hotpath
It was reported that in moving to 6.1, a larger than 10%
regression was seen in the performance of
clock_gettime(CLOCK_THREAD_CPUTIME_ID,...).
Using a simple reproducer, I found:
5.10:
100000000 calls in 24345994193 ns => 243.460 ns per call
100000000 calls in 24288172050 ns => 242.882 ns per call
100000000 calls in 24289135225 ns => 242.891 ns per call
6.1:
100000000 calls in 28248646742 ns => 282.486 ns per call
100000000 calls in 28227055067 ns => 282.271 ns per call
100000000 calls in 28177471287 ns => 281.775 ns per call
The cause of this was finally narrowed down to the addition of
psi_account_irqtime() in update_rq_clock_task(), in commit
|
||
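The reproducer used for the numbers above is not shown in the commit message. As a rough illustration only, the sketch below checks the quoted per-call arithmetic and times the same clock from Python. The function names are hypothetical, and interpreter overhead dominates the measured figure, so it is not comparable to the numbers quoted above. `time.CLOCK_THREAD_CPUTIME_ID` is only available on Unix-like systems.

```python
import time


def ns_per_call(total_ns, calls):
    """Average cost of one call in nanoseconds."""
    return total_ns / calls


def regression_pct(before_ns, after_ns):
    """Relative slowdown of `after_ns` versus `before_ns`, in percent."""
    return (after_ns - before_ns) / before_ns * 100.0


def measure(calls=100_000):
    """Time `calls` reads of the calling thread's CPU clock."""
    start = time.perf_counter_ns()
    for _ in range(calls):
        time.clock_gettime_ns(time.CLOCK_THREAD_CPUTIME_ID)
    return ns_per_call(time.perf_counter_ns() - start, calls)


old = ns_per_call(24345994193, 100_000_000)  # first 5.10 run quoted above
new = ns_per_call(28248646742, 100_000_000)  # first 6.1 run quoted above
print(f"{old:.3f} ns -> {new:.3f} ns: {regression_pct(old, new):.1f}% slower")
print(f"this machine, interpreter overhead included: {measure():.1f} ns per call")
```

For the first run of each kernel, the quoted totals work out to a slowdown of roughly 16%, consistent with the "larger than 10%" report.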
|
|
b58652db66 |
sched/deadline: Fix task_struct reference leak
During the execution of the following stress test with linux-rt:
stress-ng --cyclic 30 --timeout 30 --minimize --quiet
kmemleak frequently reported a memory leak concerning the task_struct:
unreferenced object 0xffff8881305b8000 (size 16136):
comm "stress-ng", pid 614, jiffies 4294883961 (age 286.412s)
object hex dump (first 32 bytes):
02 40 00 00 00 00 00 00 00 00 00 00 00 00 00 00 .@..............
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
debug hex dump (first 16 bytes):
53 09 00 00 00 00 00 00 00 00 00 00 00 00 00 00 S...............
backtrace:
[<00000000046b6790>] dup_task_struct+0x30/0x540
[<00000000c5ca0f0b>] copy_process+0x3d9/0x50e0
[<00000000ced59777>] kernel_clone+0xb0/0x770
[<00000000a50befdc>] __do_sys_clone+0xb6/0xf0
[<000000001dbf2008>] do_syscall_64+0x5d/0xf0
[<00000000552900ff>] entry_SYSCALL_64_after_hwframe+0x6e/0x76
The issue occurs in start_dl_timer(), which increments the task_struct
reference count and sets a timer. The timer callback, dl_task_timer,
is supposed to decrement the reference count upon expiration. However,
if enqueue_task_dl() is called before the timer expires and cancels it,
the reference count is not decremented, leading to the leak.
This patch fixes the reference leak by ensuring the task_struct
reference count is properly decremented when the timer is canceled.
Fixes:
|
||
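The leak described above can be modelled with a plain counter. This is a simplified sketch, not the kernel code: `TaskModel`, `DlTimerModel`, and the two `enqueue_*` functions are hypothetical names, and the exact condition the real patch uses when cancelling the timer is not reproduced here.

```python
class TaskModel:
    def __init__(self):
        self.usage = 1  # reference held by the rest of the system


class DlTimerModel:
    def __init__(self):
        self.armed = False


def start_dl_timer(task, timer):
    task.usage += 1       # reference taken on behalf of the pending callback
    timer.armed = True


def dl_task_timer(task, timer):
    """Timer callback: runs at expiry and drops the reference."""
    timer.armed = False
    task.usage -= 1


def try_to_cancel(timer):
    """Return True if a pending timer was cancelled before it fired."""
    was_armed = timer.armed
    timer.armed = False
    return was_armed


def enqueue_before_fix(task, timer):
    try_to_cancel(timer)  # the callback never runs, so its reference is lost


def enqueue_after_fix(task, timer):
    if try_to_cancel(timer):
        task.usage -= 1   # drop the reference the callback would have dropped


def run(enqueue):
    task, timer = TaskModel(), DlTimerModel()
    start_dl_timer(task, timer)
    enqueue(task, timer)  # the enqueue happens before the timer expires
    return task.usage


print("before fix:", run(enqueue_before_fix))
print("after fix: ", run(enqueue_after_fix))
```

In the model, the count stays one too high when the cancelled timer's reference is never dropped, which is the condition kmemleak reports as an unreferenced task_struct once all other references are gone.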
|
|
2feab2492d |
Revert "sched/fair: Make sure to try to detach at least one movable task"
This reverts commit |
||
|
|
3e334486ec |
TTY/Serial/Console fixes for 6.10-rc6
Here are a bunch of fixes/reverts for 6.10-rc6. Included in here are:
- revert the bunch of tty/serial/console changes that landed in -rc1
that didn't quite work properly yet. Everyone agreed to just revert
them for now and will work on making them better for a future
release instead of trying to quick fix the existing changes this
late in the release cycle
- 8250 driver port count bugfix
- Other tiny serial port bugfixes for reported issues
All of these have been in linux-next this week with no reported issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Merge tag 'tty-6.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty / serial / console fixes from Greg KH:
"Here are a bunch of fixes/reverts for 6.10-rc6. Included in here are:
- revert the bunch of tty/serial/console changes that landed in -rc1
that didn't quite work properly yet.
Everyone agreed to just revert them for now and will work on making
them better for a future release instead of trying to quick fix the
existing changes this late in the release cycle
- 8250 driver port count bugfix
- Other tiny serial port bugfixes for reported issues
All of these have been in linux-next this week with no reported
issues"
* tag 'tty-6.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
Revert "printk: Save console options for add_preferred_console_match()"
Revert "printk: Don't try to parse DEVNAME:0.0 console options"
Revert "printk: Flag register_console() if console is set on command line"
Revert "serial: core: Add support for DEVNAME:0.0 style naming for kernel console"
Revert "serial: core: Handle serial console options"
Revert "serial: 8250: Add preferred console in serial8250_isa_init_ports()"
Revert "Documentation: kernel-parameters: Add DEVNAME:0.0 format for serial ports"
Revert "serial: 8250: Fix add preferred console for serial8250_isa_init_ports()"
Revert "serial: core: Fix ifdef for serial base console functions"
serial: bcm63xx-uart: fix tx after conversion to uart_port_tx_limited()
serial: core: introduce uart_port_tx_limited_flags()
Revert "serial: core: only stop transmit when HW fifo is empty"
serial: imx: set receiver level before starting uart
tty: mcf: MCF54418 has 10 UARTS
serial: 8250_omap: Implementation of Errata i2310
tty: serial: 8250: Fix port count mismatch with the device
|
||
|
|
3ffea9a7a6 |
Merge tag 'smp_urgent_for_v6.10_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull smp fixes from Borislav Petkov:
- Fix "nosmp" and "maxcpus=0" after the parallel CPU bringup work went in and broke them
- Make sure CPU hotplug dynamic prepare states are actually executed
* tag 'smp_urgent_for_v6.10_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
cpu: Fix broken cmdline "nosmp" and "maxcpus=0"
cpu/hotplug: Fix dynstate assignment in __cpuhp_setup_state_cpuslocked() |
||
|
|
03c8b0bd46 |
Merge tag 'timers_urgent_for_v6.10_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Borislav Petkov:
- Warn when an hrtimer doesn't get a callback supplied
* tag 'timers_urgent_for_v6.10_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
hrtimer: Prevent queuing of hrtimer without a function callback |
||
|
|
adfbe3640b |
asm-generic fixes for 6.10
Merge tag 'asm-generic-fixes-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull asm-generic fixes from Arnd Bergmann:
"These are some bugfixes for system call ABI issues I found while working on a cleanup series. None of these are urgent since these bugs have gone unnoticed for many years, but I think we probably want to backport them all to stable kernels, so it makes sense to have the fixes included as early as possible.
One more fix addresses a compile-time warning in kallsyms that was uncovered by a patch I did to enable additional warnings in 6.10. I had mistakenly thought that this fix was already merged through the module tree, but as Geert pointed out it was still missing"
* tag 'asm-generic-fixes-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
kallsyms: rework symbol lookup return codes
linux/syscalls.h: add missing __user annotations
syscalls: mmap(): use unsigned offset type consistently
s390: remove native mmap2() syscall
hexagon: fix fadvise64_64 calling conventions
csky, hexagon: fix broken sys_sync_file_range
sh: rework sync_file_range ABI
powerpc: restore some missing spu syscalls
parisc: use generic sys_fanotify_mark implementation
parisc: use correct compat recv/recvfrom syscalls
sparc: fix compat recv/recvfrom syscalls
sparc: fix old compat_sys_select()
syscalls: fix compat_sys_io_pgetevents_time64 usage
ftruncate: pass a signed offset |
||
|
|
fd19d4a492 |
Including fixes from can, bpf and netfilter.
Current release - regressions:
- core: add softirq safety to netdev_rename_lock
- tcp: fix tcp_rcv_fastopen_synack() to enter TCP_CA_Loss for failed TFO
- batman-adv: fix RCU race at module unload time
Current release - new code bugs:
Previous releases - regressions:
- openvswitch: get related ct labels from its master if it is not confirmed
- eth: bonding: fix incorrect software timestamping report
- eth: mlxsw: fix memory corruptions on spectrum-4 systems
- eth: ionic: use dev_consume_skb_any outside of napi
Previous releases - always broken:
- netfilter: fully validate NFT_DATA_VALUE on store to data registers
- unix: several fixes for OoB data
- tcp: fix race for duplicate reqsk on identical SYN
- bpf:
- fix may_goto with negative offset.
- fix the corner case with may_goto and jump to the 1st insn.
- fix overrunning reservations in ringbuf
- can:
- j1939: recover socket queue on CAN bus error during BAM transmission
- mcp251xfd: fix infinite loop when xmit fails
- dsa: microchip: monitor potential faults in half-duplex mode
- eth: vxlan: pull inner IP header in vxlan_xmit_one()
- eth: ionic: fix kernel panic due to multi-buffer handling
Misc:
- selftest: unix tests refactor and a lot of new cases added
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Merge tag 'net-6.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from can, bpf and netfilter.
There are a bunch of regressions addressed here, but hopefully nothing
spectacular. We are still waiting for the driver fix from Intel, mentioned
by Jakub in the previous networking pull.
Current release - regressions:
- core: add softirq safety to netdev_rename_lock
- tcp: fix tcp_rcv_fastopen_synack() to enter TCP_CA_Loss for failed
TFO
- batman-adv: fix RCU race at module unload time
Previous releases - regressions:
- openvswitch: get related ct labels from its master if it is not
confirmed
- eth: bonding: fix incorrect software timestamping report
- eth: mlxsw: fix memory corruptions on spectrum-4 systems
- eth: ionic: use dev_consume_skb_any outside of napi
Previous releases - always broken:
- netfilter: fully validate NFT_DATA_VALUE on store to data registers
- unix: several fixes for OoB data
- tcp: fix race for duplicate reqsk on identical SYN
- bpf:
- fix may_goto with negative offset
- fix the corner case with may_goto and jump to the 1st insn
- fix overrunning reservations in ringbuf
- can:
- j1939: recover socket queue on CAN bus error during BAM
transmission
- mcp251xfd: fix infinite loop when xmit fails
- dsa: microchip: monitor potential faults in half-duplex mode
- eth: vxlan: pull inner IP header in vxlan_xmit_one()
- eth: ionic: fix kernel panic due to multi-buffer handling
Misc:
- selftest: unix tests refactor and a lot of new cases added"
* tag 'net-6.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (61 commits)
net: mana: Fix possible double free in error handling path
selftest: af_unix: Check SIOCATMARK after every send()/recv() in msg_oob.c.
af_unix: Fix wrong ioctl(SIOCATMARK) when consumed OOB skb is at the head.
selftest: af_unix: Check EPOLLPRI after every send()/recv() in msg_oob.c
selftest: af_unix: Check SIGURG after every send() in msg_oob.c
selftest: af_unix: Add SO_OOBINLINE test cases in msg_oob.c
af_unix: Don't stop recv() at consumed ex-OOB skb.
selftest: af_unix: Add non-TCP-compliant test cases in msg_oob.c.
af_unix: Don't stop recv(MSG_DONTWAIT) if consumed OOB skb is at the head.
af_unix: Stop recv(MSG_PEEK) at consumed OOB skb.
selftest: af_unix: Add msg_oob.c.
selftest: af_unix: Remove test_unix_oob.c.
tracing/net_sched: NULL pointer dereference in perf_trace_qdisc_reset()
netfilter: nf_tables: fully validate NFT_DATA_VALUE on store to data registers
net: usb: qmi_wwan: add Telit FN912 compositions
tcp: fix tcp_rcv_fastopen_synack() to enter TCP_CA_Loss for failed TFO
ionic: use dev_consume_skb_any outside of napi
net: dsa: microchip: fix wrong register write when masking interrupt
Fix race for duplicate reqsk on identical SYN
ibmvnic: Add tx check to prevent skb leak
...
|
||
|
|
7e1f4eb9a6 |
kallsyms: rework symbol lookup return codes
Building with W=1 in some configurations produces a false positive
warning for kallsyms:
kernel/kallsyms.c: In function '__sprint_symbol.isra':
kernel/kallsyms.c:503:17: error: 'strcpy' source argument is the same as destination [-Werror=restrict]
503 | strcpy(buffer, name);
| ^~~~~~~~~~~~~~~~~~~~
This originally showed up while building with -O3, but later started
happening in other configurations as well, depending on inlining
decisions. The underlying issue is that the local 'name' variable is
always initialized to be the same as 'buffer' in the called functions
that fill the buffer, which gcc notices while inlining, though it could
see that the address check always skips the copy.
The calling conventions here are rather unusual, as all of the internal
lookup functions (bpf_address_lookup, ftrace_mod_address_lookup,
ftrace_func_address_lookup, module_address_lookup and
kallsyms_lookup_buildid) already use the provided buffer and either return
the address of that buffer to indicate success, or NULL for failure,
but the callers are written to also expect an arbitrary other buffer
to be returned.
Rework the calling conventions to return the length of the filled buffer
instead of its address, which is simpler and easier to follow as well
as avoiding the warning. Leave only the kallsyms_lookup() calling conventions
unchanged, since that is called from 16 different functions and
adapting this would be a much bigger change.
Link: https://lore.kernel.org/lkml/20200107214042.855757-1-arnd@arndb.de/
Link: https://lore.kernel.org/lkml/20240326130647.7bfb1d92@gandalf.local.home/
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
||
|
|
24ca36a562 |
workqueue: Fixes for v6.10-rc5
Merge tag 'wq-for-6.10-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue fixes from Tejun Heo:
"Two patches to fix kworker name formatting"
* tag 'wq-for-6.10-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: Increase worker desc's length to 32
workqueue: Refactor worker ID formatting and make wq_worker_comm() use full ID string |
||
|
|
5a830bbce3 |
hrtimer: Prevent queuing of hrtimer without a function callback
The hrtimer function callback must not be NULL. It has to be specified by the call side but it is not validated by the hrtimer code. When a hrtimer is queued without a function callback, the kernel crashes with a null pointer dereference when trying to execute the callback in __run_hrtimer().
Introduce a validation before queuing the hrtimer in hrtimer_start_range_ns().
[anna-maria: Rephrase commit message]
Signed-off-by: Phil Chang <phil.chang@mediatek.com>
Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Anna-Maria Behnsen <anna-maria@linutronix.de> |
||
|
|
d3882564a7 |
syscalls: fix compat_sys_io_pgetevents_time64 usage
Using sys_io_pgetevents() as the entry point for compat mode tasks works almost correctly, but misses the sign extension for the min_nr and nr arguments. This was addressed on parisc by switching to compat_sys_io_pgetevents_time64() in commit |
||
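The missing sign extension mentioned in this commit message can be illustrated with plain integer arithmetic. This is a sketch under a stated assumption: a 32-bit task's argument arrives zero-extended in a 64-bit register when the native entry point is used. Whether that holds depends on the architecture, and the helper names are hypothetical.

```python
def zero_extend_32(value):
    """What a 64-bit callee sees if the upper 32 bits are simply cleared."""
    return value & 0xFFFFFFFF


def sign_extend_32(value):
    """Interpret the low 32 bits as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


min_nr_from_compat_task = -1  # negative 32-bit argument
print(zero_extend_32(min_nr_from_compat_task))
print(sign_extend_32(zero_extend_32(min_nr_from_compat_task)))
```

Without sign extension, a negative 32-bit count reads as a large positive 64-bit value, so range checks on min_nr and nr behave differently from what the 32-bit caller intended.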
|
|
cc8d5a2f09 |
Revert "printk: Save console options for add_preferred_console_match()"
This reverts commit
|
||
|
|
64f9f010c6 |
Revert "printk: Don't try to parse DEVNAME:0.0 console options"
This reverts commit
|
||
|
|
deb091cb05 |
Revert "printk: Flag register_console() if console is set on command line"
This reverts commit
|
||
|
|
482000cf7f |
bpf-for-netdev
Merge tag 'for-netdev' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
pull-request: bpf 2024-06-24
We've added 12 non-merge commits during the last 10 day(s) which contain a total of 10 files changed, 412 insertions(+), 16 deletions(-).
The main changes are:
1) Fix a BPF verifier issue validating may_goto with a negative offset, from Alexei Starovoitov.
2) Fix a BPF verifier validation bug with may_goto combined with jump to the first instruction, also from Alexei Starovoitov.
3) Fix a bug with overrunning reservations in BPF ring buffer, from Daniel Borkmann.
4) Fix a bug in BPF verifier due to missing proper var_off setting related to movsx instruction, from Yonghong Song.
5) Silence unnecessary syzkaller-triggered warning in __xdp_reg_mem_model(), from Daniil Dulov.
* tag 'for-netdev' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
xdp: Remove WARN() from __xdp_reg_mem_model()
selftests/bpf: Add tests for may_goto with negative offset.
bpf: Fix may_goto with negative offset.
selftests/bpf: Add more ring buffer test coverage
bpf: Fix overrunning reservations in ringbuf
selftests/bpf: Tests with may_goto and jumps to the 1st insn
bpf: Fix the corner case with may_goto and jump to the 1st insn.
bpf: Update BPF LSM maintainer list
bpf: Fix remap of arena.
selftests/bpf: Add a few tests to cover
bpf: Add missed var_off setting in coerce_subreg_to_size_sx()
bpf: Add missed var_off setting in set_sext32_default_val()
====================
Link: https://patch.msgid.link/20240624124330.8401-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
|
|
2b2efe1937 |
bpf: Fix may_goto with negative offset.
Zac's syzbot crafted a bpf prog that exposed two bugs in may_goto.
The 1st bug is the way may_goto is patched. When offset is negative
it should be patched differently.
The 2nd bug is in the verifier:
when current state may_goto_depth is equal to visited state may_goto_depth
it means there is an actual infinite loop. It's not correct to prune
exploration of the program at this point.
Note that this check doesn't limit the program to only one may_goto insn,
since 2nd and any further may_goto will increment may_goto_depth only
in the queued state pushed for future exploration. The current state
will have may_goto_depth == 0 regardless of number of may_goto insns
and the verifier has to explore the program until bpf_exit.
Fixes:
|
||
|
|
6ef8eb5125 |
cpu: Fix broken cmdline "nosmp" and "maxcpus=0"
After the rework of "Parallel CPU bringup", the cmdline "nosmp" and "maxcpus=0" parameters are not working anymore. These parameters set setup_max_cpus to zero and that's handed to bringup_nonboot_cpus().
The code there does a decrement before checking for zero, which brings it into the negative space and brings up all CPUs.
Add a zero check at the beginning of the function to prevent this.
[ tglx: Massaged change log ]
Fixes: |
||
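The decrement-before-check problem described in this commit message can be sketched as follows. This is a simplified model with hypothetical function names, not the kernel's bringup code: the list of CPU numbers stands in for the CPU mask, and a plain Python integer stands in for the counter.

```python
def bringup_before_fix(cpus, ncpus):
    """Bring up CPUs until the budget `ncpus` is used up."""
    brought_up = []
    for cpu in cpus:
        brought_up.append(cpu)
        ncpus -= 1          # decrement first ...
        if ncpus == 0:      # ... then test: starting from 0 this never matches
            break
    return brought_up


def bringup_after_fix(cpus, ncpus):
    """Same loop, with the zero check at the top of the function."""
    if ncpus == 0:
        return []
    return bringup_before_fix(cpus, ncpus)


non_boot_cpus = [1, 2, 3]
print(bringup_before_fix(non_boot_cpus, 0))  # budget of 0 is ignored
print(bringup_after_fix(non_boot_cpus, 0))   # nothing is brought up
```

Whether the counter goes negative or, if unsigned, wraps around, it skips past zero on the first decrement, so the loop runs over every CPU. The early return restores the meaning of a zero budget and leaves non-zero budgets unchanged.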
|
|
cfa1a2329a |
bpf: Fix overrunning reservations in ringbuf
The BPF ring buffer internally is implemented as a power-of-2 sized circular
buffer, with two logical and ever-increasing counters: consumer_pos is the
consumer counter showing up to which logical position the consumer has consumed
the data, and producer_pos is the producer counter denoting the amount of
data reserved by all producers.
Each time a record is reserved, the producer that "owns" the record
advances the producer counter. In user space, each time a record is
read, the consumer of the data advances the consumer counter once it has finished
processing. Both counters are stored in separate pages so that from user
space, the producer counter is read-only and the consumer counter is read-write.
One aspect that simplifies and thus speeds up the implementation of both
producers and consumers is how the data area is mapped twice contiguously
back-to-back in the virtual memory, allowing to not take any special measures
for samples that have to wrap around at the end of the circular buffer data
area, because the next page after the last data page would be first data page
again, and thus the sample will still appear completely contiguous in virtual
memory.
Each record has a struct bpf_ringbuf_hdr { u32 len; u32 pg_off; } header for
book-keeping the length and offset, and is inaccessible to the BPF program.
Helpers like bpf_ringbuf_reserve() return `(void *)hdr + BPF_RINGBUF_HDR_SZ`
for the BPF program to use. Bing-Jhong and Muhammad reported that it is however
possible to make a second allocated memory chunk overlapping with the first
chunk and as a result, the BPF program is now able to edit first chunk's
header.
For example, consider the creation of a BPF_MAP_TYPE_RINGBUF map with size
of 0x4000. Next, the consumer_pos is modified to 0x3000 /before/ a call to
bpf_ringbuf_reserve() is made. This will allocate a chunk A, which is in
[0x0,0x3008], and the BPF program is able to edit [0x8,0x3008]. Now, let's
allocate a chunk B with size 0x3000. This will succeed because consumer_pos
was edited ahead of time to pass the `new_prod_pos - cons_pos > rb->mask`
check. Chunk B will be in range [0x3008,0x6010], and the BPF program is able
to edit [0x3010,0x6010]. Due to the ring buffer memory layout mentioned
earlier, the ranges [0x0,0x4000] and [0x4000,0x8000] point to the same data
pages. This means that chunk B at [0x4000,0x4008] is chunk A's header.
bpf_ringbuf_submit() / bpf_ringbuf_discard() use the header's pg_off to then
locate the bpf_ringbuf itself via bpf_ringbuf_restore_from_rec(). Once chunk
B modified chunk A's header, then bpf_ringbuf_commit() refers to the wrong
page and could cause a crash.
Fix it by calculating the oldest pending_pos and checking whether the range
from the oldest outstanding record to the newest would span beyond the ring
buffer size. If that is the case, then reject the request. We've tested with
the ring buffer benchmark in BPF selftests (./benchs/run_bench_ringbufs.sh)
before/after the fix and while it seems a bit slower on some benchmarks, the
slowdown is not significant enough to matter.
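The scenario above can be replayed with a small model. This is a minimal sketch, not kernel code: the class `RingbufModel`, its `pending` list and the `check_pending` switch are hypothetical names used only for illustration, and the model assumes that records commit in order, so the oldest outstanding record is the first entry of the list.

```python
HDR_SZ = 8                    # sizeof(struct bpf_ringbuf_hdr { u32 len; u32 pg_off; })
U64_MASK = (1 << 64) - 1      # emulate unsigned 64-bit subtraction


class RingbufModel:
    """Toy model of the reservation bookkeeping described above."""

    def __init__(self, size):
        assert size > 0 and size & (size - 1) == 0, "size must be a power of two"
        self.mask = size - 1
        self.producer_pos = 0
        self.consumer_pos = 0     # writable from user space
        self.pending = []         # start positions of reserved, uncommitted records

    def pending_pos(self):
        # Oldest outstanding record, or the producer position if none is pending.
        return self.pending[0] if self.pending else self.producer_pos

    def reserve(self, size, check_pending):
        length = (size + HDR_SZ + 7) & ~7           # header + payload, 8-byte aligned
        new_prod_pos = self.producer_pos + length
        # Check quoted in the text: it trusts the user-writable consumer_pos.
        if ((new_prod_pos - self.consumer_pos) & U64_MASK) > self.mask:
            return None
        # Additional check in the spirit of the fix (assumed form): the span from
        # the oldest pending record to the new producer position must fit.
        if check_pending and ((new_prod_pos - self.pending_pos()) & U64_MASK) > self.mask:
            return None
        start = self.producer_pos
        self.pending.append(start)
        self.producer_pos = new_prod_pos
        return start, new_prod_pos


def run_scenario(check_pending):
    rb = RingbufModel(0x4000)
    rb.consumer_pos = 0x3000                        # edited before the first reserve
    chunk_a = rb.reserve(0x3000, check_pending)
    chunk_b = rb.reserve(0x3000, check_pending)
    return rb, chunk_a, chunk_b
```

With `check_pending=False` both reservations succeed and chunk B covers logical offset 0x4000, which aliases offset 0x0 (chunk A's header) because positions are reduced with `pos & mask`. With `check_pending=True` the second reservation is rejected.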
Fixes:
|
||
|
|
5337ac4c9b |
bpf: Fix the corner case with may_goto and jump to the 1st insn.
When the following program is processed by the verifier:
L1: may_goto L2
goto L1
L2: w0 = 0
exit
the may_goto insn is first converted to:
L1: r11 = *(u64 *)(r10 -8)
if r11 == 0x0 goto L2
r11 -= 1
*(u64 *)(r10 -8) = r11
goto L1
L2: w0 = 0
exit
then later as the last step the verifier inserts:
*(u64 *)(r10 -8) = BPF_MAX_LOOPS
as the first insn of the program to initialize loop count.
When the first insn happens to be a branch target of some jmp the
bpf_patch_insn_data() logic will produce:
L1: *(u64 *)(r10 -8) = BPF_MAX_LOOPS
r11 = *(u64 *)(r10 -8)
if r11 == 0x0 goto L2
r11 -= 1
*(u64 *)(r10 -8) = r11
goto L1
L2: w0 = 0
exit
because instruction patching adjusts all jmps and calls, but for this
particular corner case it's incorrect and the L1 label should be one
instruction down, like:
*(u64 *)(r10 -8) = BPF_MAX_LOOPS
L1: r11 = *(u64 *)(r10 -8)
if r11 == 0x0 goto L2
r11 -= 1
*(u64 *)(r10 -8) = r11
goto L1
L2: w0 = 0
exit
and that's what this patch is fixing.
After bpf_patch_insn_data(), call adjust_jmp_off() to adjust all jmps
that point to the newly inserted BPF_ST insn so that they point to the insn after it.
Note that bpf_patch_insn_data() cannot easily be changed to accommodate
this logic, since jumps that point before or after a sequence of patched
instructions have to be adjusted with the full length of the patch.
Conceptually it's somewhat similar to "insert" of instructions between other
instructions with weird semantics. Like "insert" before 1st insn would require
adjustment of CALL insns to point to newly inserted 1st insn, but not an
adjustment of JMP insns that point to 1st, yet still adjusting JMP insns that
cross over 1st insn (point to insn before or insn after), hence use simple
adjust_jmp_off() logic to fix this corner case. Ideally bpf_patch_insn_data()
would have an auxiliary info to say where 'the start of newly inserted patch
is', but it would be too complex for backport.
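The two-step approach can be illustrated on the example program. This is a simplified sketch under stated assumptions: instructions are `(text, off)` tuples with BPF-style relative offsets (target = index + 1 + off), and `insert_before_first()` and `retarget_jumps()` are hypothetical helpers that only imitate the behavior described for bpf_patch_insn_data() and adjust_jmp_off(); they are not the kernel functions.

```python
def jump_target(prog, idx):
    """Absolute index a jump at idx lands on (BPF style: idx + 1 + off)."""
    _text, off = prog[idx]
    return idx + 1 + off


def insert_before_first(prog, new_insn):
    """Model of generic patching: insn 0 becomes [new_insn, old insn 0].

    Jumps to later instructions are shifted by one, but jumps that targeted
    old insn 0 end up pointing at the start of the patch (index 0).
    """
    patched = [new_insn]
    for idx, (text, off) in enumerate(prog):
        if off is not None:
            old_target = idx + 1 + off
            new_target = old_target + 1 if old_target > 0 else 0
            off = new_target - (idx + 1) - 1       # this insn now sits at idx + 1
        patched.append((text, off))
    return patched


def retarget_jumps(prog, tgt_idx, delta):
    """Move every jump that lands on tgt_idx forward by delta instructions."""
    fixed = []
    for idx, (text, off) in enumerate(prog):
        inside_patch = tgt_idx <= idx < tgt_idx + delta
        if off is not None and not inside_patch and idx + 1 + off == tgt_idx:
            off += delta
        fixed.append((text, off))
    return fixed


PROG = [
    ("r11 = *(u64 *)(r10 -8)", None),    # L1
    ("if r11 == 0x0 goto L2", 3),
    ("r11 -= 1", None),
    ("*(u64 *)(r10 -8) = r11", None),
    ("goto L1", -5),
    ("w0 = 0", None),                    # L2
    ("exit", None),
]
INIT = ("*(u64 *)(r10 -8) = BPF_MAX_LOOPS", None)

patched = insert_before_first(PROG, INIT)    # "goto L1" now hits the init store
fixed = retarget_jumps(patched, 0, 1)        # "goto L1" hits the load again
```

After the first step the back edge re-runs the loop-counter initialization on every iteration; after the second step it lands one instruction down, on the load, as intended.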
Fixes:
|
||
|
|
d5a7fc58da |
Including fixes from wireless, bpf and netfilter.
Current release - regressions:
- ipv6: bring NLM_DONE out to a separate recv() again
Current release - new code bugs:
- wifi: cfg80211: wext: set ssids=NULL for passive scans via old wext API
Previous releases - regressions:
- wifi: mac80211: fix monitor channel setting with chanctx emulation
(probably most awaited of the fixes in this PR, tracked by Thorsten)
- usb: ax88179_178a: bring back reset on init, if PHY is disconnected
- bpf: fix UML x86_64 compile failure with BPF
- bpf: avoid splat in pskb_pull_reason(), sanity check added can be hit
with malicious BPF
- eth: mvpp2: use slab_build_skb() for packets in slab, driver was
missed during API refactoring
- wifi: iwlwifi: add missing unlock of mvm mutex
Previous releases - always broken:
- ipv6: add a number of missing null-checks for in6_dev_get(), in case
IPv6 disabling races with the datapath
- bpf: fix reg_set_min_max corruption of fake_reg
- sched: act_ct: add netns as part of the key of tcf_ct_flow_table
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Merge tag 'net-6.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from wireless, bpf and netfilter.
Happy summer solstice! The line count is a bit inflated by a selftest
and update to a driver's FW interface header, in reality this is
slightly below average for us. We are expecting one driver fix from
Intel, but there are no big known issues.
Current release - regressions:
- ipv6: bring NLM_DONE out to a separate recv() again
Current release - new code bugs:
- wifi: cfg80211: wext: set ssids=NULL for passive scans via old wext API
Previous releases - regressions:
- wifi: mac80211: fix monitor channel setting with chanctx emulation
(probably most awaited of the fixes in this PR, tracked by Thorsten)
- usb: ax88179_178a: bring back reset on init, if PHY is disconnected
- bpf: fix UML x86_64 compile failure with BPF
- bpf: avoid splat in pskb_pull_reason(), sanity check added can be hit
with malicious BPF
- eth: mvpp2: use slab_build_skb() for packets in slab, driver was
missed during API refactoring
- wifi: iwlwifi: add missing unlock of mvm mutex
Previous releases - always broken:
- ipv6: add a number of missing null-checks for in6_dev_get(), in case
IPv6 disabling races with the datapath
- bpf: fix reg_set_min_max corruption of fake_reg
- sched: act_ct: add netns as part of the key of tcf_ct_flow_table"
* tag 'net-6.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (63 commits)
net: usb: rtl8150 fix unintiatilzed variables in rtl8150_get_link_ksettings
selftests: virtio_net: add forgotten config options
bnxt_en: Restore PTP tx_avail count in case of skb_pad() error
bnxt_en: Set TSO max segs on devices with limits
bnxt_en: Update firmware interface to 1.10.3.44
net: stmmac: Assign configured channel value to EXTTS event
net: do not leave a dangling sk pointer, when socket creation fails
net/tcp_ao: Don't leak ao_info on error-path
ice: Fix VSI list rule with ICE_SW_LKUP_LAST type
ipv6: bring NLM_DONE out to a separate recv() again
selftests: add selftest for the SRv6 End.DX6 behavior with netfilter
selftests: add selftest for the SRv6 End.DX4 behavior with netfilter
netfilter: move the sysctl nf_hooks_lwtunnel into the netfilter core
seg6: fix parameter passing when calling NF_HOOK() in End.DX4 and End.DX6 behaviors
netfilter: ipset: Fix suspicious rcu_dereference_protected()
selftests: openvswitch: Set value to nla flags.
octeontx2-pf: Fix linking objects into multiple modules
octeontx2-pf: Add error handling to VLAN unoffload handling
virtio_net: fixing XDP for fully checksummed packets handling
virtio_net: checksum offloading handling fix
...
|
||
|
|
e5b3efbe1a |
Probes fixes for v6.10-rc4:
- Restrict gen-API tests for synthetic and kprobe events to only be built
as modules, as they generate dynamic events that cannot be removed,
causing ftracetest and startup selftests to fail.
Merge tag 'probes-fixes-v6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull probes fix from Masami Hiramatsu:
- Restrict gen-API tests for synthetic and kprobe events to only be built
as modules, as they generate dynamic events that cannot be removed,
causing ftracetest and startup selftests to fail
* tag 'probes-fixes-v6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: Build event generation tests only as modules |
||
|
|
b90d77e5fd |
bpf: Fix remap of arena.
The bpf arena logic didn't account for mremap operation. Add a refcnt for
multiple mmap events to prevent use-after-free in arena_vm_close.
Fixes:
|
||
|
|
3d54351c64 |
lsm/stable-6.10 PR 20240617
Merge tag 'lsm-pr-20240617' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm
Pull lsm fix from Paul Moore:
"A single LSM/IMA patch to fix a problem caused by sleeping while in a
RCU critical section"
* tag 'lsm-pr-20240617' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
ima: Avoid blocking in RCU read-side critical section |
||
|
|
e6b324fbf2 |
19 hotfixes, 8 of which are cc:stable.
Mainly MM singleton fixes. And a couple of ocfs2 regression fixes.
Merge tag 'mm-hotfixes-stable-2024-06-17-11-43' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"Mainly MM singleton fixes. And a couple of ocfs2 regression fixes"
* tag 'mm-hotfixes-stable-2024-06-17-11-43' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
kcov: don't lose track of remote references during softirqs
mm: shmem: fix getting incorrect lruvec when replacing a shmem folio
mm/debug_vm_pgtable: drop RANDOM_ORVALUE trick
mm: fix possible OOB in numa_rebuild_large_mapping()
mm/migrate: fix kernel BUG at mm/compaction.c:2761!
selftests: mm: make map_fixed_noreplace test names stable
mm/memfd: add documentation for MFD_NOEXEC_SEAL MFD_EXEC
mm: mmap: allow for the maximum number of bits for randomizing mmap_base by default
gcov: add support for GCC 14
zap_pid_ns_processes: clear TIF_NOTIFY_SIGNAL along with TIF_SIGPENDING
mm: huge_memory: fix misused mapping_large_folio_support() for anon folios
lib/alloc_tag: fix RCU imbalance in pgalloc_tag_get()
lib/alloc_tag: do not register sysctl interface when CONFIG_SYSCTL=n
MAINTAINERS: remove Lorenzo as vmalloc reviewer
Revert "mm: init_mlocked_on_free_v3"
mm/page_table_check: fix crash on ZONE_DEVICE
gcc: disable '-Warray-bounds' for gcc-9
ocfs2: fix NULL pointer dereference in ocfs2_abort_trigger()
ocfs2: fix NULL pointer dereference in ocfs2_journal_dirty() |
||
|
|
44b7f7151d |
bpf: Add missed var_off setting in coerce_subreg_to_size_sx()
In coerce_subreg_to_size_sx(), for the case where upper
sign extension bits are the same for smax32 and smin32
values, we failed to set var_off properly. This is especially
problematic if both smax32 and smin32's sign extension
bits are 1.
The following is a simple example illustrating the inconsistent
verifier states due to missed var_off:
0: (85) call bpf_get_prandom_u32#7 ; R0_w=scalar()
1: (bf) r3 = r0 ; R0_w=scalar(id=1) R3_w=scalar(id=1)
2: (57) r3 &= 15 ; R3_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=15,var_off=(0x0; 0xf))
3: (47) r3 |= 128 ; R3_w=scalar(smin=umin=smin32=umin32=128,smax=umax=smax32=umax32=143,var_off=(0x80; 0xf))
4: (bc) w7 = (s8)w3
REG INVARIANTS VIOLATION (alu): range bounds violation u64=[0xffffff80, 0x8f] s64=[0xffffff80, 0x8f]
u32=[0xffffff80, 0x8f] s32=[0x80, 0xffffff8f] var_off=(0x80, 0xf)
The var_off=(0x80, 0xf) is not correct, and the correct one should
be var_off=(0xffffff80; 0xf) since from insn 3, we know that at
insn 4, the sign extension bits will be 1. This patch fixes this
issue by setting var_off properly.
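The expected var_off can be checked by brute force. The following is a minimal sketch, not verifier code: `sext8_to_u32()` is a hypothetical helper emulating `w7 = (s8)w3`, and `tnum_range()` is only loosely modelled on the kernel helper of the same name (it returns a `(value, mask)` pair whose mask covers the low bits in which the two bounds can differ).

```python
U32 = 0xFFFFFFFF


def sext8_to_u32(value):
    """Emulate w = (s8)w: sign-extend the low byte into a 32-bit register."""
    low = value & 0xFF
    return (low - 0x100 if low & 0x80 else low) & U32


def tnum_range(lo, hi):
    """(value, mask) pair covering every number in [lo, hi]."""
    chi = lo ^ hi
    delta = (1 << chi.bit_length()) - 1    # all bits at or below the highest differing bit
    return lo & ~delta, delta


def tnum_contains(tnum, x):
    value, mask = tnum
    return (x & ~mask) == value


# After insn 3, r3 is known to be in [128, 143].
results = [sext8_to_u32(v) for v in range(128, 144)]
var_off = tnum_range(min(results), max(results))
stale_var_off = (0x80, 0xF)                # what the verifier kept before the fix
```

Every possible result lies in [0xffffff80, 0xffffff8f], so the derived pair is (0xffffff80; 0xf), and none of the results is contained in the stale (0x80; 0xf).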
Fixes:
|
||
|
|
380d5f89a4 |
bpf: Add missed var_off setting in set_sext32_default_val()
Zac reported a verification failure and Alexei reproduced the issue
with a simple reproducer ([1]). The verification failure is due to a missing
var_off setting.
The following is the reproducer in [1]:
0: R1=ctx() R10=fp0
0: (71) r3 = *(u8 *)(r10 -387) ;
R3_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=255,var_off=(0x0; 0xff)) R10=fp0
1: (bc) w7 = (s8)w3 ;
R3_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=255,var_off=(0x0; 0xff))
R7_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=127,var_off=(0x0; 0x7f))
2: (36) if w7 >= 0x2533823b goto pc-3
mark_precise: frame0: last_idx 2 first_idx 0 subseq_idx -1
mark_precise: frame0: regs=r7 stack= before 1: (bc) w7 = (s8)w3
mark_precise: frame0: regs=r3 stack= before 0: (71) r3 = *(u8 *)(r10 -387)
2: R7_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=127,var_off=(0x0; 0x7f))
3: (b4) w0 = 0 ; R0_w=0
4: (95) exit
Note that after insn 1, the var_off for R7 is (0x0; 0x7f). This is not correct
since upper 24 bits of w7 could be 0 or 1. So correct var_off should be
(0x0; 0xffffffff). Missing var_off setting in set_sext32_default_val() caused later
incorrect analysis in zext_32_to_64(dst_reg) and reg_bounds_sync(dst_reg).
To fix the issue, set var_off correctly in set_sext32_default_val(). The correct
reg state after insn 1 becomes:
1: (bc) w7 = (s8)w3 ;
R3_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=255,var_off=(0x0; 0xff))
R7_w=scalar(smin=0,smax=umax=0xffffffff,smin32=-128,smax32=127,var_off=(0x0; 0xffffffff))
and at insn 2, the verifier correctly determines either branch is possible.
[1] https://lore.kernel.org/bpf/CAADnVQLPU0Shz7dWV4bn2BgtGdxN3uFHPeobGBA72tpg5Xoykw@mail.gmail.com/
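The corrected register state can be reproduced by enumerating all inputs. This is a minimal sketch under stated assumptions: `sext8_to_u32()`, `to_s32()` and `covering_tnum()` are hypothetical helpers, and `covering_tnum()` simply marks every bit that differs between any two results as unknown, which is not how the verifier computes var_off.

```python
U32 = 0xFFFFFFFF


def sext8_to_u32(value):
    """Emulate w7 = (s8)w3 on a 32-bit subregister."""
    low = value & 0xFF
    return (low - 0x100 if low & 0x80 else low) & U32


def to_s32(value):
    """Reinterpret a u32 as a signed 32-bit integer."""
    return value - (1 << 32) if value & 0x80000000 else value


def covering_tnum(values):
    """Tightest (value, mask) pair that contains every element of values."""
    values = sorted(values)
    mask = 0
    for v in values:
        mask |= v ^ values[0]              # any bit that ever differs is unknown
    return values[0] & ~mask & U32, mask


# r3 was loaded as a u8, so it is in [0, 255].
results = {sext8_to_u32(v) for v in range(256)}
var_off = covering_tnum(results)
smin32 = min(to_s32(r) for r in results)
smax32 = max(to_s32(r) for r in results)
```

The results include both 0x0 and 0xffffffff, so no bit of w7 is known and var_off must be (0x0; 0xffffffff), with smin32=-128 and smax32=127 as in the corrected state.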
Fixes:
|
||
|
|
932d847639 |
cpu/hotplug: Fix dynstate assignment in __cpuhp_setup_state_cpuslocked()
Commit |
||
|
|
01c8f9806b |
kcov: don't lose track of remote references during softirqs
In kcov_remote_start()/kcov_remote_stop(), we swap the previous KCOV
metadata of the current task into a per-CPU variable. However, the
kcov_mode_enabled(mode) check is not sufficient in the case of remote KCOV
coverage: current->kcov_mode always remains KCOV_MODE_DISABLED for remote
KCOV objects.
If the original task that has invoked the KCOV_REMOTE_ENABLE ioctl happens
to get interrupted and kcov_remote_start() is called, it ultimately leads
to kcov_remote_stop() NOT restoring the original KCOV reference. So when
the task exits, all registered remote KCOV handles remain active forever.
The most uncomfortable effect (at least for syzkaller) is that the bug
prevents the reuse of the same /sys/kernel/debug/kcov descriptor. If
we obtain it in the parent process and then e.g. drop some
capabilities and continuously fork to execute individual programs, at
some point current->kcov of the forked process is lost,
kcov_task_exit() takes no action, and all KCOV_REMOTE_ENABLE ioctl
calls from subsequent forks fail.
And, yes, the efficiency is also affected if we keep on losing remote
kcov objects.
a) kcov_remote_map keeps on growing forever.
b) (If I'm not mistaken), we're also not freeing the memory referenced
by kcov->area.
Fix it by introducing a special kcov_mode that is assigned to the task
that owns a KCOV remote object. It makes kcov_mode_enabled() return true
and yet does not trigger coverage collection in __sanitizer_cov_trace_pc()
and write_comp_data().
[nogikh@google.com: replace WRITE_ONCE() with an ordinary assignment]
Link: https://lkml.kernel.org/r/20240614171221.2837584-1-nogikh@google.com
Link: https://lkml.kernel.org/r/20240611133229.527822-1-nogikh@google.com
Fixes:
|
||
|
|
c1558bc57b |
gcov: add support for GCC 14
Using gcov on kernels compiled with GCC 14 results in truncated 16-byte
long .gcda files with no usable data. To fix this, update GCOV_COUNTERS
to match the value defined by GCC 14.
Tested with GCC versions 14.1.0 and 13.2.0.
Link: https://lkml.kernel.org/r/20240610092743.1609845-1-oberpar@linux.ibm.com
Signed-off-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Reported-by: Allison Henderson <allison.henderson@oracle.com>
Reported-by: Chuck Lever III <chuck.lever@oracle.com>
Tested-by: Chuck Lever <chuck.lever@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
|
|
7fea700e04 |
zap_pid_ns_processes: clear TIF_NOTIFY_SIGNAL along with TIF_SIGPENDING
kernel_wait4() doesn't sleep and returns -EINTR if there is no
eligible child and signal_pending() is true.
That is why zap_pid_ns_processes() clears TIF_SIGPENDING but this is not
enough, it should also clear TIF_NOTIFY_SIGNAL to make signal_pending()
return false and avoid a busy-wait loop.
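The busy-wait can be illustrated with a toy model. This is only a sketch of the behavior described above: the flag names are plain strings, and `wait_for_children()` and `zap_loop_iterations()` are hypothetical stand-ins, not the kernel's kernel_wait4() or zap_pid_ns_processes().

```python
TIF_SIGPENDING = "TIF_SIGPENDING"
TIF_NOTIFY_SIGNAL = "TIF_NOTIFY_SIGNAL"
EINTR = 4
SLEPT = 0                      # stand-in for "the wait call was able to sleep"


def signal_pending(flags):
    """Per the text, either flag makes signal_pending() true."""
    return TIF_SIGPENDING in flags or TIF_NOTIFY_SIGNAL in flags


def wait_for_children(flags):
    """No eligible child: return -EINTR instead of sleeping if a signal is pending."""
    return -EINTR if signal_pending(flags) else SLEPT


def zap_loop_iterations(flags, clear_notify, limit=1000):
    """Count loop passes until the wait call can sleep (capped at limit)."""
    flags = set(flags)
    for iteration in range(1, limit + 1):
        flags.discard(TIF_SIGPENDING)
        if clear_notify:
            flags.discard(TIF_NOTIFY_SIGNAL)
        if wait_for_children(flags) != -EINTR:
            return iteration
    return limit               # never slept: the loop only spins
```

With only TIF_SIGPENDING cleared, a task that has TIF_NOTIFY_SIGNAL set spins until the cap; clearing both flags lets the first wait sleep.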
Link: https://lkml.kernel.org/r/20240608120616.GB7947@redhat.com
Fixes:
|
||
|
|
c64da10adb |
bpf-for-netdev
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
pull-request: bpf 2024-06-14
We've added 8 non-merge commits during the last 2 day(s) which contain
a total of 9 files changed, 92 insertions(+), 11 deletions(-).
The main changes are:
1) Silence a syzkaller splat under CONFIG_DEBUG_NET=y in pskb_pull_reason()
triggered via __bpf_try_make_writable(), from Florian Westphal.
2) Fix removal of kfuncs during linking phase which then throws a kernel
build warning via resolve_btfids about unresolved symbols, from Tony Ambardar.
3) Fix a UML x86_64 compilation failure from BPF as pcpu_hot symbol
is not available on User Mode Linux, from Maciej Żenczykowski.
4) Fix a register corruption in reg_set_min_max triggering an invariant
violation in BPF verifier, from Daniel Borkmann.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf: Harden __bpf_kfunc tag against linker kfunc removal
compiler_types.h: Define __retain for __attribute__((__retain__))
bpf: Avoid splat in pskb_pull_reason
bpf: fix UML x86_64 compile failure
selftests/bpf: Add test coverage for reg_set_min_max handling
bpf: Reduce stack consumption in check_stack_write_fixed_off
bpf: Fix reg_set_min_max corruption of fake_reg
MAINTAINERS: mailmap: Update Stanislav's email address
====================
Link: https://lore.kernel.org/r/20240614203223.26500-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
|
|
9a95c5bfbf |
ima: Avoid blocking in RCU read-side critical section
A panic happens in ima_match_policy: BUG: unable to handle kernel NULL pointer dereference at 0000000000000010 PGD 42f873067 P4D 0 Oops: 0000 [#1] SMP NOPTI CPU: 5 PID: |
||
|
|
b99a95bc56 |
bpf: fix UML x86_64 compile failure
pcpu_hot (defined in arch/x86) is not available on user mode linux (ARCH=um)
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Fixes:
|
||
|
|
e73cd1cfc2 |
bpf: Reduce stack consumption in check_stack_write_fixed_off
The fake_reg moved into env->fake_reg given it consumes a lot of stack
space (120 bytes). Migrate the fake_reg in check_stack_write_fixed_off()
as well now that we have it.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20240613115310.25383-2-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
|
|
9242480126 |
bpf: Fix reg_set_min_max corruption of fake_reg
Juan reported that after doing some changes to buzzer [0] and implementing
a new fuzzing strategy guided by coverage, they noticed the following in
one of the probes:
[...]
13: (79) r6 = *(u64 *)(r0 +0) ; R0=map_value(ks=4,vs=8) R6_w=scalar()
14: (b7) r0 = 0 ; R0_w=0
15: (b4) w0 = -1 ; R0_w=0xffffffff
16: (74) w0 >>= 1 ; R0_w=0x7fffffff
17: (5c) w6 &= w0 ; R0_w=0x7fffffff R6_w=scalar(smin=smin32=0,smax=umax=umax32=0x7fffffff,var_off=(0x0; 0x7fffffff))
18: (44) w6 |= 2 ; R6_w=scalar(smin=umin=smin32=umin32=2,smax=umax=umax32=0x7fffffff,var_off=(0x2; 0x7ffffffd))
19: (56) if w6 != 0x7ffffffd goto pc+1
REG INVARIANTS VIOLATION (true_reg2): range bounds violation u64=[0x7fffffff, 0x7ffffffd] s64=[0x7fffffff, 0x7ffffffd] u32=[0x7fffffff, 0x7ffffffd] s32=[0x7fffffff, 0x7ffffffd] var_off=(0x7fffffff, 0x0)
REG INVARIANTS VIOLATION (false_reg1): range bounds violation u64=[0x7fffffff, 0x7ffffffd] s64=[0x7fffffff, 0x7ffffffd] u32=[0x7fffffff, 0x7ffffffd] s32=[0x7fffffff, 0x7ffffffd] var_off=(0x7fffffff, 0x0)
REG INVARIANTS VIOLATION (false_reg2): const tnum out of sync with range bounds u64=[0x0, 0xffffffffffffffff] s64=[0x8000000000000000, 0x7fffffffffffffff] u32=[0x0, 0xffffffff] s32=[0x80000000, 0x7fffffff] var_off=(0x7fffffff, 0x0)
19: R6_w=0x7fffffff
20: (95) exit
from 19 to 21: R0=0x7fffffff R6=scalar(smin=umin=smin32=umin32=2,smax=umax=smax32=umax32=0x7ffffffe,var_off=(0x2; 0x7ffffffd)) R7=map_ptr(ks=4,vs=8) R9=ctx() R10=fp0 fp-24=map_ptr(ks=4,vs=8) fp-40=mmmmmmmm
21: R0=0x7fffffff R6=scalar(smin=umin=smin32=umin32=2,smax=umax=smax32=umax32=0x7ffffffe,var_off=(0x2; 0x7ffffffd)) R7=map_ptr(ks=4,vs=8) R9=ctx() R10=fp0 fp-24=map_ptr(ks=4,vs=8) fp-40=mmmmmmmm
21: (14) w6 -= 2147483632 ; R6_w=scalar(smin=umin=umin32=2,smax=umax=0xffffffff,smin32=0x80000012,smax32=14,var_off=(0x2; 0xfffffffd))
22: (76) if w6 s>= 0xe goto pc+1 ; R6_w=scalar(smin=umin=umin32=2,smax=umax=0xffffffff,smin32=0x80000012,smax32=13,var_off=(0x2; 0xfffffffd))
23: (95) exit
from 22 to 24: R0=0x7fffffff R6_w=14 R7=map_ptr(ks=4,vs=8) R9=ctx() R10=fp0 fp-24=map_ptr(ks=4,vs=8) fp-40=mmmmmmmm
24: R0=0x7fffffff R6_w=14 R7=map_ptr(ks=4,vs=8) R9=ctx() R10=fp0 fp-24=map_ptr(ks=4,vs=8) fp-40=mmmmmmmm
24: (14) w6 -= 14 ; R6_w=0
[...]
What can be seen here is a register invariant violation on line 19. After
the binary-or in line 18, the verifier knows that bit 2 is set but knows
nothing about the rest of the content which was loaded from a map value,
meaning, range is [2,0x7fffffff] with var_off=(0x2; 0x7ffffffd). When in
line 19 the verifier analyzes the branch, it splits the register states
in reg_set_min_max() into the registers of the true branch (true_reg1,
true_reg2) and the registers of the false branch (false_reg1, false_reg2).
Since the test is w6 != 0x7ffffffd, the src_reg is a known constant.
Internally, the verifier creates a "fake" register initialized as scalar
to the value of 0x7ffffffd, and then passes it onto reg_set_min_max(). Now,
for line 19, it is mathematically impossible to take the false branch of
this program, yet the verifier analyzes it. It is impossible because the
second bit of r6 will be set due to the prior or operation and the
constant in the condition has that bit unset (hex(fd) == binary(1111 1101)).
When the verifier first analyzes the false / fall-through branch, it will
compute an intersection between the var_off of r6 and of the constant. This
is because the verifier creates a "fake" register initialized to the value
of the constant. The intersection result later refines both registers in
regs_refine_cond_op():
[...]
t = tnum_intersect(tnum_subreg(reg1->var_off), tnum_subreg(reg2->var_off));
reg1->var_off = tnum_with_subreg(reg1->var_off, t);
reg2->var_off = tnum_with_subreg(reg2->var_off, t);
[...]
Since the verifier is analyzing the false branch of the conditional jump,
reg1 is equal to false_reg1 and reg2 is equal to false_reg2, i.e. the reg2
is the "fake" register that was meant to hold a constant value. The resulting
var_off of the intersection says that both registers now hold a known value
of var_off=(0x7fffffff, 0x0) or in other words: this operation manages to
make the verifier think that the "constant" value that was passed in the
jump operation now holds a different value.
Normally this would not be an issue since it should not influence the true
branch, however, false_reg2 and true_reg2 are pointers to the same "fake"
register. Meaning, the false branch can influence the results of the true
branch. In line 24, the verifier assumes R6_w=0, but the actual runtime
value in this case is 1. The fix is simply not passing in the same "fake"
register location as inputs to reg_set_min_max(), but instead making a
copy. Moving the fake_reg into the env also reduces stack consumption by
120 bytes. With this, the verifier successfully rejects invalid accesses
from the test program.
[0] https://github.com/google/buzzer
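The corruption can be reproduced with a small model of the tnum intersection and the shared "fake" register. This is a minimal sketch, not verifier code: `Reg`, `refine_both()` and `analyze_branch()` are hypothetical names, only the var_off pair is modelled, and `tnum_intersect()` follows the usual definition (OR of the known values, AND of the unknown masks).

```python
import copy
from dataclasses import dataclass


@dataclass
class Reg:
    value: int     # known bits of var_off
    mask: int      # unknown bits of var_off


def tnum_intersect(a, b):
    v = a.value | b.value
    mu = a.mask & b.mask
    return Reg(v & ~mu, mu)


def refine_both(reg1, reg2):
    """Mirror of the quoted snippet: both registers receive the intersection."""
    t = tnum_intersect(reg1, reg2)
    reg1.value, reg1.mask = t.value, t.mask
    reg2.value, reg2.mask = t.value, t.mask


def analyze_branch(share_fake_reg):
    r6 = Reg(0x2, 0x7FFFFFFD)              # state after "w6 |= 2"
    fake = Reg(0x7FFFFFFD, 0x0)            # constant from "if w6 != 0x7ffffffd"
    true_reg2 = fake
    # Before the fix both branches point at the same object; afterwards the
    # false branch works on a copy.
    false_reg2 = fake if share_fake_reg else copy.copy(fake)
    false_reg1 = copy.copy(r6)
    refine_both(false_reg1, false_reg2)    # the false branch is analyzed first
    return true_reg2, false_reg1
```

With a shared object, refining the (impossible) false branch rewrites the constant seen by the true branch to 0x7fffffff, which matches the var_off=(0x7fffffff, 0x0) in the log; with a copy, the true branch still sees 0x7ffffffd.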
Fixes:
|
||
|
|
3572bd5689 |
tracing: Build event generation tests only as modules
The kprobes and synth event generation test modules add events and lock
(get a reference to) those event files in the module init function, and
unlock and delete them in the module exit function. This is because they
are designed to be used as modules.
If we make those modules built-in, those events are left locked in the
kernel and are never removed. This causes a kprobe event self-test failure
as below.
[ 97.349708] ------------[ cut here ]------------
[ 97.353453] WARNING: CPU: 3 PID: 1 at kernel/trace/trace_kprobe.c:2133 kprobe_trace_self_tests_init+0x3f1/0x480
[ 97.357106] Modules linked in:
[ 97.358488] CPU: 3 PID: 1 Comm: swapper/0 Not tainted 6.9.0-g699646734ab5-dirty #14
[ 97.361556] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[ 97.363880] RIP: 0010:kprobe_trace_self_tests_init+0x3f1/0x480
[ 97.365538] Code: a8 24 08 82 e9 ae fd ff ff 90 0f 0b 90 48 c7 c7 e5 aa 0b 82 e9 ee fc ff ff 90 0f 0b 90 48 c7 c7 2d 61 06 82 e9 8e fd ff ff 90 <0f> 0b 90 48 c7 c7 33 0b 0c 82 89 c6 e8 6e 03 1f ff 41 ff c7 e9 90
[ 97.370429] RSP: 0000:ffffc90000013b50 EFLAGS: 00010286
[ 97.371852] RAX: 00000000fffffff0 RBX: ffff888005919c00 RCX: 0000000000000000
[ 97.373829] RDX: ffff888003f40000 RSI: ffffffff8236a598 RDI: ffff888003f40a68
[ 97.375715] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
[ 97.377675] R10: ffffffff811c9ae5 R11: ffffffff8120c4e0 R12: 0000000000000000
[ 97.379591] R13: 0000000000000001 R14: 0000000000000015 R15: 0000000000000000
[ 97.381536] FS: 0000000000000000(0000) GS:ffff88807dcc0000(0000) knlGS:0000000000000000
[ 97.383813] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 97.385449] CR2: 0000000000000000 CR3: 0000000002244000 CR4: 00000000000006b0
[ 97.387347] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 97.389277] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 97.391196] Call Trace:
[ 97.391967] <TASK>
[ 97.392647] ? __warn+0xcc/0x180
[ 97.393640] ? kprobe_trace_self_tests_init+0x3f1/0x480
[ 97.395181] ? report_bug+0xbd/0x150
[ 97.396234] ? handle_bug+0x3e/0x60
[ 97.397311] ? exc_invalid_op+0x1a/0x50
[ 97.398434] ? asm_exc_invalid_op+0x1a/0x20
[ 97.399652] ? trace_kprobe_is_busy+0x20/0x20
[ 97.400904] ? tracing_reset_all_online_cpus+0x15/0x90
[ 97.402304] ? kprobe_trace_self_tests_init+0x3f1/0x480
[ 97.403773] ? init_kprobe_trace+0x50/0x50
[ 97.404972] do_one_initcall+0x112/0x240
[ 97.406113] do_initcall_level+0x95/0xb0
[ 97.407286] ? kernel_init+0x1a/0x1a0
[ 97.408401] do_initcalls+0x3f/0x70
[ 97.409452] kernel_init_freeable+0x16f/0x1e0
[ 97.410662] ? rest_init+0x1f0/0x1f0
[ 97.411738] kernel_init+0x1a/0x1a0
[ 97.412788] ret_from_fork+0x39/0x50
[ 97.413817] ? rest_init+0x1f0/0x1f0
[ 97.414844] ret_from_fork_asm+0x11/0x20
[ 97.416285] </TASK>
[ 97.417134] irq event stamp: |
||
|
|
07c54cc598 |
tick/nohz_full: Don't abuse smp_call_function_single() in tick_setup_device()
After the recent commit |
||
|
|
7cec2e16cb |
Fix race between perf_event_free_task() and perf_event_release_kernel()
that can result in missed wakeups and hung tasks.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Merge tag 'perf-urgent-2024-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf event fix from Ingo Molnar:
"Fix race between perf_event_free_task() and perf_event_release_kernel()
that can result in missed wakeups and hung tasks"
* tag 'perf-urgent-2024-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/core: Fix missing wakeup when waiting for context reference |
||
|
|
d30d0e49da |
Including fixes from BPF and big collection of fixes for WiFi core
and drivers.
Current release - regressions:
- vxlan: fix regression when dropping packets due to invalid src addresses
- bpf: fix a potential use-after-free in bpf_link_free()
- xdp: revert support for redirect to any xsk socket bound to the same
UMEM as it can result in a corruption
- virtio_net:
- add missing lock protection when reading return code from control_buf
- fix false-positive lockdep splat in DIM
- Revert "wifi: wilc1000: convert list management to RCU"
- wifi: ath11k: fix error path in ath11k_pcic_ext_irq_config
Previous releases - regressions:
- rtnetlink: make the "split" NLM_DONE handling generic, restore the old
behavior for two cases where we started coalescing those messages with
normal messages, breaking sloppily-coded userspace
- wifi:
  - cfg80211: validate HE operation element parsing
  - cfg80211: fix 6 GHz scan request building
  - mt76: mt7615: add missing chanctx ops
  - ath11k: move power type check to ASSOC stage, fix connecting
    to 6 GHz AP
  - ath11k: fix WCN6750 firmware crash caused by 17 num_vdevs
  - rtlwifi: ignore IEEE80211_CONF_CHANGE_RETRY_LIMITS
  - iwlwifi: mvm: fix a crash on 7265
Previous releases - always broken:
- ncsi: prevent multi-threaded channel probing, a spec violation
- vmxnet3: disable rx data ring on dma allocation failure
- ethtool: init tsinfo stats if requested, prevent unintentionally
reporting all-zero stats on devices which don't implement any
- dst_cache: fix possible races in less common IPv6 features
- tcp: auth: don't consider TCP_CLOSE to be in TCP_AO_ESTABLISHED
- ax25: fix two refcounting bugs
- eth: ionic: fix kernel panic in XDP_TX action
Misc:
- tcp: count CLOSE-WAIT sockets for TCP_MIB_CURRESTAB
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Merge tag 'net-6.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from BPF and a big collection of fixes for WiFi core and
drivers.
Current release - regressions:
- vxlan: fix regression when dropping packets due to invalid src
addresses
- bpf: fix a potential use-after-free in bpf_link_free()
- xdp: revert support for redirect to any xsk socket bound to the
same UMEM as it can result in a corruption
- virtio_net:
  - add missing lock protection when reading return code from
    control_buf
  - fix false-positive lockdep splat in DIM
- Revert "wifi: wilc1000: convert list management to RCU"
- wifi: ath11k: fix error path in ath11k_pcic_ext_irq_config
Previous releases - regressions:
- rtnetlink: make the "split" NLM_DONE handling generic, restore the
old behavior for two cases where we started coalescing those
messages with normal messages, breaking sloppily-coded userspace
- wifi:
  - cfg80211: validate HE operation element parsing
  - cfg80211: fix 6 GHz scan request building
  - mt76: mt7615: add missing chanctx ops
  - ath11k: move power type check to ASSOC stage, fix connecting to
    6 GHz AP
  - ath11k: fix WCN6750 firmware crash caused by 17 num_vdevs
  - rtlwifi: ignore IEEE80211_CONF_CHANGE_RETRY_LIMITS
  - iwlwifi: mvm: fix a crash on 7265
Previous releases - always broken:
- ncsi: prevent multi-threaded channel probing, a spec violation
- vmxnet3: disable rx data ring on dma allocation failure
- ethtool: init tsinfo stats if requested, prevent unintentionally
reporting all-zero stats on devices which don't implement any
- dst_cache: fix possible races in less common IPv6 features
- tcp: auth: don't consider TCP_CLOSE to be in TCP_AO_ESTABLISHED
- ax25: fix two refcounting bugs
- eth: ionic: fix kernel panic in XDP_TX action
Misc:
- tcp: count CLOSE-WAIT sockets for TCP_MIB_CURRESTAB"
* tag 'net-6.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (107 commits)
selftests: net: lib: set 'i' as local
selftests: net: lib: avoid error removing empty netns name
selftests: net: lib: support errexit with busywait
net: ethtool: fix the error condition in ethtool_get_phy_stats_ethtool()
ipv6: fix possible race in __fib6_drop_pcpu_from()
af_unix: Annotate data-race of sk->sk_shutdown in sk_diag_fill().
af_unix: Use skb_queue_len_lockless() in sk_diag_show_rqlen().
af_unix: Use skb_queue_empty_lockless() in unix_release_sock().
af_unix: Use unix_recvq_full_lockless() in unix_stream_connect().
af_unix: Annotate data-race of net->unx.sysctl_max_dgram_qlen.
af_unix: Annotate data-races around sk->sk_sndbuf.
af_unix: Annotate data-races around sk->sk_state in UNIX_DIAG.
af_unix: Annotate data-race of sk->sk_state in unix_stream_read_skb().
af_unix: Annotate data-races around sk->sk_state in sendmsg() and recvmsg().
af_unix: Annotate data-race of sk->sk_state in unix_accept().
af_unix: Annotate data-race of sk->sk_state in unix_stream_connect().
af_unix: Annotate data-races around sk->sk_state in unix_write_space() and poll().
af_unix: Annotate data-race of sk->sk_state in unix_inq_len().
af_unix: Annodate data-races around sk->sk_state for writers.
af_unix: Set sk->sk_state under unix_state_lock() for truly disconencted peer.
...
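The rtnetlink item above concerns userspace that expects the end-of-dump marker to arrive in its own message buffer. The following is a minimal sketch of that failure mode, not the kernel change itself: it builds `struct nlmsghdr`-style messages by hand and compares a parser that walks every message in a buffer with one that only looks at the first header. The helper names (`nlmsg`, `parse_buffer`, `robust_dump_done`, `sloppy_dump_done`) and the placeholder payloads are assumptions made for illustration.

```python
import struct

# struct nlmsghdr: u32 len, u16 type, u16 flags, u32 seq, u32 pid (host byte order)
NLMSG_HDR = struct.Struct("=IHHII")
NLMSG_DONE = 3
RTM_NEWLINK = 16


def nlmsg(msg_type, payload=b"", seq=1):
    """Build one netlink message, padded to 4-byte alignment (illustrative helper)."""
    length = NLMSG_HDR.size + len(payload)
    pad = (-length) % 4
    return NLMSG_HDR.pack(length, msg_type, 0, seq, 0) + payload + b"\0" * pad


def parse_buffer(buf):
    """Yield (type, payload) for every message contained in one receive buffer."""
    offset = 0
    while offset + NLMSG_HDR.size <= len(buf):
        length, msg_type, _flags, _seq, _pid = NLMSG_HDR.unpack_from(buf, offset)
        if length < NLMSG_HDR.size or offset + length > len(buf):
            break  # truncated or malformed message
        yield msg_type, buf[offset + NLMSG_HDR.size:offset + length]
        offset += (length + 3) & ~3  # advance to the next aligned message


def robust_dump_done(buffers):
    """The dump is finished once NLMSG_DONE appears anywhere in any buffer."""
    return any(t == NLMSG_DONE for buf in buffers for t, _ in parse_buffer(buf))


def sloppy_dump_done(buffers):
    """Only inspects the first header of each buffer (the fragile assumption)."""
    return any(NLMSG_HDR.unpack_from(buf, 0)[1] == NLMSG_DONE for buf in buffers)


# Placeholder payloads; a real RTM_NEWLINK message carries an ifinfomsg and attributes.
done = nlmsg(NLMSG_DONE, struct.pack("=i", 0))
split = [nlmsg(RTM_NEWLINK, b"eth0"), done]       # NLMSG_DONE in its own buffer
coalesced = [nlmsg(RTM_NEWLINK, b"eth0") + done]  # NLMSG_DONE shares a buffer
```

With the split layout both parsers see the end of the dump; with the coalesced layout only the parser that iterates over every message does, which is the kind of userspace breakage the item describes.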
|
||
|
|
74751ef5c1 |
perf/core: Fix missing wakeup when waiting for context reference
In our production environment, we found many hung tasks which are
blocked for more than 18 hours. Their call traces are like this:
[346278.191038] __schedule+0x2d8/0x890
[346278.191046] schedule+0x4e/0xb0
[346278.191049] perf_event_free_task+0x220/0x270
[346278.191056] ? init_wait_var_entry+0x50/0x50
[346278.191060] copy_process+0x663/0x18d0
[346278.191068] kernel_clone+0x9d/0x3d0
[346278.191072] __do_sys_clone+0x5d/0x80
[346278.191076] __x64_sys_clone+0x25/0x30
[346278.191079] do_syscall_64+0x5c/0xc0
[346278.191083] ? syscall_exit_to_user_mode+0x27/0x50
[346278.191086] ? do_syscall_64+0x69/0xc0
[346278.191088] ? irqentry_exit_to_user_mode+0x9/0x20
[346278.191092] ? irqentry_exit+0x19/0x30
[346278.191095] ? exc_page_fault+0x89/0x160
[346278.191097] ? asm_exc_page_fault+0x8/0x30
[346278.191102] entry_SYSCALL_64_after_hwframe+0x44/0xae
The task was waiting for the refcount to become 1, but the vmcore showed
that the refcount was already 1. It seems that the task was never woken
up by perf_event_release_kernel() and got stuck forever. The scenario
below may cause the problem.
Thread A                                Thread B
...                                     ...
perf_event_free_task                    perf_event_release_kernel
                                           ...
                                           acquire event->child_mutex
                                           ...
                                           get_ctx
   ...                                     release event->child_mutex
   acquire ctx->mutex
   ...
   perf_free_event (acquire/release event->child_mutex)
   ...
   release ctx->mutex
   wait_var_event
                                           acquire ctx->mutex
                                           acquire event->child_mutex
                                           # move existing events to free_list
                                           release event->child_mutex
                                           release ctx->mutex
                                           put_ctx
...                                     ...
In this case, all events of the ctx have been freed, so we couldn't
find the ctx in free_list and Thread A will miss the wakeup. It's thus
necessary to add a wakeup after dropping the reference.
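The missed wakeup described above can be reproduced in miniature. This is a userspace model under stated assumptions, not the kernel code: `Ctx`, `put_ctx`, `wait_for_last_ref` and `run` are hypothetical names, a `threading.Event` stands in for the wait queue, and a short timeout stands in for "sleeps forever". The waiter checks the condition, goes to sleep, and only re-checks when it is woken, so a reference drop without a wakeup leaves it asleep even though the refcount is already 1.

```python
import threading


class Ctx:
    """Toy stand-in for a context whose teardown waits for refcount == 1."""

    def __init__(self):
        self.refcount = 2
        self.lock = threading.Lock()
        self.wakeup = threading.Event()  # models the wait queue
        self.parked = threading.Event()  # test aid: waiter has checked and will sleep


def put_ctx(ctx, wake):
    """Drop one reference; only notify the waiter when wake is True."""
    with ctx.lock:
        ctx.refcount -= 1
    if wake:
        ctx.wakeup.set()


def wait_for_last_ref(ctx, timeout, result):
    """Check the condition, sleep until woken, then re-check."""
    while True:
        with ctx.lock:
            if ctx.refcount == 1:
                result.append("woken")
                return
        ctx.parked.set()
        if not ctx.wakeup.wait(timeout):  # the timeout models a task that never wakes
            result.append("hung")
            return
        ctx.wakeup.clear()


def run(wake):
    """Start the waiter, let it go to sleep, then drop the other reference."""
    ctx, result = Ctx(), []
    waiter = threading.Thread(target=wait_for_last_ref, args=(ctx, 0.2, result))
    waiter.start()
    ctx.parked.wait()  # the waiter has seen refcount == 2 and is about to sleep
    put_ctx(ctx, wake)
    waiter.join()
    return result[0], ctx.refcount
```

Calling `run(False)` models the buggy path: the waiter gives up while the refcount is already 1, matching the vmcore observation. Calling `run(True)` models a drop followed by a wakeup, after which the waiter re-checks and proceeds.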
Fixes:
|
||
|
|
2884dc7d08 |
bpf: Fix a potential use-after-free in bpf_link_free()
After commit |
||
|
|
2317dc2c22 |
bpf, devmap: Remove unnecessary if check in for loop
The iterator variable dst cannot be NULL and the if check can be removed.
Remove it and fix the following Coccinelle/coccicheck warning reported
by itnull.cocci:
ERROR: iterator variable bound on line 762 cannot be NULL
Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20240529101900.103913-2-thorsten.blum@toblux.com |
||
|
|
ec9eeb89e6 |
Kbuild fixes for v6.10
- Fix a Kconfig bug regarding comparisons to 'm' or 'n'
- Replace missed $(srctree)/$(src)
- Fix unneeded kallsyms step 3
- Remove incorrect "compatible" properties from image nodes in image.fit
- Improve gen_kheaders.sh
- Fix 'make dt_binding_check'
- Clean up unnecessary code
Merge tag 'kbuild-fixes-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- Fix a Kconfig bug regarding comparisons to 'm' or 'n'
- Replace missed $(srctree)/$(src)
- Fix unneeded kallsyms step 3
- Remove incorrect "compatible" properties from image nodes in image.fit
- Improve gen_kheaders.sh
- Fix 'make dt_binding_check'
- Clean up unnecessary code
* tag 'kbuild-fixes-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
dt-bindings: kbuild: Fix dt_binding_check on unconfigured build
kheaders: use `command -v` to test for existence of `cpio`
kheaders: explicitly define file modes for archived headers
scripts/make_fit: Drop fdt image entry compatible string
kbuild: remove a stale comment about cleaning in link-vmlinux.sh
kbuild: fix short log for AS in link-vmlinux.sh
kbuild: change scripts/mksysmap into sed script
kbuild: avoid unneeded kallsyms step 3
kbuild: scripts/gdb: Replace missed $(srctree)/$(src) w/ $(src)
kconfig: remove redundant check in expr_join_or()
kconfig: fix comparison to constant symbols, 'm', 'n'
kconfig: remove unused expr_is_no() |
||
|
|
aeb8fe0283 |
bpf: Fix bpf_session_cookie BTF_ID in special_kfunc_set list
The bpf_session_cookie kfunc is unavailable for !CONFIG_FPROBE, as reported
by Sebastian [1].
To fix that, we remove the CONFIG_FPROBE ifdef for session kfuncs, which
is fine because there's a filter for session programs.
Then, based on the bpf_trace.o dependency:
obj-$(CONFIG_BPF_EVENTS) += bpf_trace.o
we make the bpf_session_cookie BTF_ID in the special_kfunc_set list depend
on CONFIG_BPF_EVENTS.
[1] https://lore.kernel.org/bpf/20240531071557.MvfIqkn7@linutronix.de/T/#m71c6d5ec71db2967288cb79acedc15cc5dbfeec5
Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Fixes:
|