The UML function names to_virt() and to_phys() are exposed by UML
headers, and are very generic and may be defined by drivers. As it
turns out, commit 9409c9b670 ("pmem: refactor pmem_clear_poison()")
did exactly that.
This results in build errors such as the following when trying to build
um:allmodconfig:
drivers/nvdimm/pmem.c: In function ‘pmem_dax_zero_page_range’:
./arch/um/include/asm/page.h:105:20: error: too few arguments to function ‘to_phys’
105 | #define __pa(virt) to_phys((void *) (unsigned long) (virt))
| ^~~~~~~
Use less generic function names for the um specific to_phys() and
to_virt() functions to fix the problem and to avoid similar problems in
the future.
Fixes: 9409c9b670 ("pmem: refactor pmem_clear_poison()")
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
of Peter Zijlstra was encountering with ptrace in his freezer rewrite
I identified some cleanups to ptrace_stop that make sense on their own
and move make resolving the other problems much simpler.
The biggest issue is the habbit of the ptrace code to change task->__state
from the tracer to suppress TASK_WAKEKILL from waking up the tracee. No
other code in the kernel does that and it is straight forward to update
signal_wake_up and friends to make that unnecessary.
Peter's task freezer sets frozen tasks to a new state TASK_FROZEN and
then it stores them by calling "wake_up_state(t, TASK_FROZEN)" relying
on the fact that all stopped states except the special stop states can
tolerate spurious wake up and recover their state.
The state of stopped and traced tasked is changed to be stored in
task->jobctl as well as in task->__state. This makes it possible for
the freezer to recover tasks in these special states, as well as
serving as a general cleanup. With a little more work in that
direction I believe TASK_STOPPED can learn to tolerate spurious wake
ups and become an ordinary stop state.
The TASK_TRACED state has to remain a special state as the registers for
a process are only reliably available when the process is stopped in
the scheduler. Fundamentally ptrace needs acess to the saved
register values of a task.
There are bunch of semi-random ptrace related cleanups that were found
while looking at these issues.
One cleanup that deserves to be called out is from commit 57b6de08b5
("ptrace: Admit ptrace_stop can generate spuriuos SIGTRAPs"). This
makes a change that is technically user space visible, in the handling
of what happens to a tracee when a tracer dies unexpectedly.
According to our testing and our understanding of userspace nothing
cares that spurious SIGTRAPs can be generated in that case.
The entire discussion can be found at:
https://lkml.kernel.org/r/87a6bv6dl6.fsf_-_@email.froward.int.ebiederm.org
Eric W. Biederman (11):
signal: Rename send_signal send_signal_locked
signal: Replace __group_send_sig_info with send_signal_locked
ptrace/um: Replace PT_DTRACE with TIF_SINGLESTEP
ptrace/xtensa: Replace PT_SINGLESTEP with TIF_SINGLESTEP
ptrace: Remove arch_ptrace_attach
signal: Use lockdep_assert_held instead of assert_spin_locked
ptrace: Reimplement PTRACE_KILL by always sending SIGKILL
ptrace: Document that wait_task_inactive can't fail
ptrace: Admit ptrace_stop can generate spuriuos SIGTRAPs
ptrace: Don't change __state
ptrace: Always take siglock in ptrace_resume
Peter Zijlstra (1):
sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state
arch/ia64/include/asm/ptrace.h | 4 --
arch/ia64/kernel/ptrace.c | 57 ----------------
arch/um/include/asm/thread_info.h | 2 +
arch/um/kernel/exec.c | 2 +-
arch/um/kernel/process.c | 2 +-
arch/um/kernel/ptrace.c | 8 +--
arch/um/kernel/signal.c | 4 +-
arch/x86/kernel/step.c | 3 +-
arch/xtensa/kernel/ptrace.c | 4 +-
arch/xtensa/kernel/signal.c | 4 +-
drivers/tty/tty_jobctrl.c | 4 +-
include/linux/ptrace.h | 7 --
include/linux/sched.h | 10 ++-
include/linux/sched/jobctl.h | 8 +++
include/linux/sched/signal.h | 20 ++++--
include/linux/signal.h | 3 +-
kernel/ptrace.c | 87 ++++++++---------------
kernel/sched/core.c | 5 +-
kernel/signal.c | 140 +++++++++++++++++---------------------
kernel/time/posix-cpu-timers.c | 6 +-
20 files changed, 140 insertions(+), 240 deletions(-)
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEgjlraLDcwBA2B+6cC/v6Eiajj0AFAmKaXaYACgkQC/v6Eiaj
j0CgoA/+JncSQ6PY2D5Jh1apvHzmnRsFXzr3DRvtv/CVx4oIebOXRQFyVDeD5tRn
TmMgB29HpBlHRDLojlmlZRGAld1HR/aPEW9j8W1D3Sy/ZFO5L8lQitv9aDHO9Ntw
4lZvlhS1M0KhATudVVBqSPixiG6CnV5SsGmixqdOyg7xcXSY6G1l2nB7Zk9I3Tat
ZlmhuZ6R5Z5qsm4MEq0vUSrnsHiGxYrpk6uQOaVz8Wkv8ZFmbutt6XgxF0tsyZNn
mHSmWSiZzIgBjTlaibEmxi8urYJTPj3vGBeJQVYHblFwLFi6+Oy7bDxQbWjQvaZh
DsgWPScfBF4Jm0+8hhCiSYpvPp8XnZuklb4LNCeok/VFr+KfSmpJTIhn00kagQ1u
vxQDqLws8YLW4qsfGydfx9uUIFCbQE/V2VDYk5J3Re3gkUNDOOR1A56hPniKv6VB
2aqGO2Fl0RdBbUa3JF+XI5Pwq5y1WrqR93EUvj+5+u5W9rZL/8WLBHBMEz6gbmfD
DhwFE0y8TG2WRlWJVEDRId+5zo3di/YvasH0vJZ5HbrxhS2RE/yIGAd+kKGx/lZO
qWDJC7IHvFJ7Mw5KugacyF0SHeNdloyBM7KZW6HeXmgKn9IMJBpmwib92uUkRZJx
D8j/bHHqD/zsgQ39nO+c4M0MmhO/DsPLG/dnGKrRCu7v1tmEnkY=
=ZUuO
-----END PGP SIGNATURE-----
Merge tag 'ptrace_stop-cleanup-for-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull ptrace_stop cleanups from Eric Biederman:
"While looking at the ptrace problems with PREEMPT_RT and the problems
Peter Zijlstra was encountering with ptrace in his freezer rewrite I
identified some cleanups to ptrace_stop that make sense on their own
and move make resolving the other problems much simpler.
The biggest issue is the habit of the ptrace code to change
task->__state from the tracer to suppress TASK_WAKEKILL from waking up
the tracee. No other code in the kernel does that and it is straight
forward to update signal_wake_up and friends to make that unnecessary.
Peter's task freezer sets frozen tasks to a new state TASK_FROZEN and
then it stores them by calling "wake_up_state(t, TASK_FROZEN)" relying
on the fact that all stopped states except the special stop states can
tolerate spurious wake up and recover their state.
The state of stopped and traced tasked is changed to be stored in
task->jobctl as well as in task->__state. This makes it possible for
the freezer to recover tasks in these special states, as well as
serving as a general cleanup. With a little more work in that
direction I believe TASK_STOPPED can learn to tolerate spurious wake
ups and become an ordinary stop state.
The TASK_TRACED state has to remain a special state as the registers
for a process are only reliably available when the process is stopped
in the scheduler. Fundamentally ptrace needs acess to the saved
register values of a task.
There are bunch of semi-random ptrace related cleanups that were found
while looking at these issues.
One cleanup that deserves to be called out is from commit 57b6de08b5
("ptrace: Admit ptrace_stop can generate spuriuos SIGTRAPs"). This
makes a change that is technically user space visible, in the handling
of what happens to a tracee when a tracer dies unexpectedly. According
to our testing and our understanding of userspace nothing cares that
spurious SIGTRAPs can be generated in that case"
* tag 'ptrace_stop-cleanup-for-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state
ptrace: Always take siglock in ptrace_resume
ptrace: Don't change __state
ptrace: Admit ptrace_stop can generate spuriuos SIGTRAPs
ptrace: Document that wait_task_inactive can't fail
ptrace: Reimplement PTRACE_KILL by always sending SIGKILL
signal: Use lockdep_assert_held instead of assert_spin_locked
ptrace: Remove arch_ptrace_attach
ptrace/xtensa: Replace PT_SINGLESTEP with TIF_SINGLESTEP
ptrace/um: Replace PT_DTRACE with TIF_SINGLESTEP
signal: Replace __group_send_sig_info with send_signal_locked
signal: Rename send_signal send_signal_locked
- Various cleanups and fixes: xterm, serial line, time travel
- Set ARCH_HAS_GCOV_PROFILE_ALL
-----BEGIN PGP SIGNATURE-----
iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAmKZz4UWHHJpY2hhcmRA
c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7wRrZD/951tWxCjNiSZYL8B32YnaxqJLf
vwDwXWxYbXmtLgVhqTGCQ61qNFhKnGVeyfO2niacB5EB3VFzxtB2dXTswDpcu/93
o99/Dozisehcn9L3CbGO0sKCxKZbdgP4TQPOGQpr8lyzv9NlmNF/bgkiH8s8rIB/
ACaiKmLrAVIyQz/8VElFqJNyB/RkmcILks//jVidFlueZhmYMSzbfdVDFyTDDJMA
JEmIz+SG2hg15yCy620EpWgskHvSQSNhWMv4wxJVy4XRdZ7nztb9babV3lIaRmaI
8rBR+DsLGlDeep3SOv63giFzjMjHpXcAlJ0UoefkaRA4htSP4GmyDPHX6Fo6ilW4
fQ52lxsHmP+6fLmWNOnFjsvk93z9u3XU55ReEhP1PGzgAGDNUczy8BEYvb6l0Weq
M8BdYgU2nh/SA0ycmXdSVbyl7nFST5s0lHr6hEt22CMxJ+jz6WS650vrnH2JimMX
bnJMcEYX6PeyDq4lTYyrWOCdzPTorT6eEcn/BM3qKgWkosK0FlN+nnT0HOef5Bag
jezo4/dt/VPfftKQS28Waufud/nnJD2oFAvFBG0/YHTooTZMkSnH2LnRbYkFo7vS
xkDRnHJGSDOaxzdSfgNlwveqZ/qTgjNs1CUYHBrs2Tj1dJicqro+wWjZg8qzCHI+
8YgCQjaEOYLN6CN0CA==
=hD4o
-----END PGP SIGNATURE-----
Merge tag 'for-linus-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml
Pull UML updates from Richard Weinberger:
- Various cleanups and fixes: xterm, serial line, time travel
- Set ARCH_HAS_GCOV_PROFILE_ALL
* tag 'for-linus-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
um: Fix out-of-bounds read in LDT setup
um: chan_user: Fix winch_tramp() return value
um: virtio_uml: Fix broken device handling in time-travel
um: line: Use separate IRQs per line
um: Enable ARCH_HAS_GCOV_PROFILE_ALL
um: Use asm-generic/dma-mapping.h
um: daemon: Make default socket configurable
um: xterm: Make default terminal emulator configurable
Today, all possible serial lines (ssl*=) as well as all
possible consoles (con*=) each share a single interrupt
(with a fixed number) with others of the same type.
Now, if you have two lines, say ssl0 and ssl1, and one
of them is connected to an fd you cannot read (e.g. a
file), but the other gets a read interrupt, then both
of them get the interrupt since it's shared. Then, the
read() call will return EOF, since it's a file being
written and there's nothing to read (at least not at
the current offset, at the end).
Unfortunately, this is treated as a read error, and we
close this line, losing all the possible output.
It might be possible to work around this and make the
IRQ sharing work, however, now that we have dynamically
allocated IRQs that are easy to use, simply use that to
achieve separating between the events; then there's no
interrupt for that line and we never attempt the read
in the first place, thus not closing the line.
This manifested itself in the wifi hostap/hwsim tests
where the parallel script communicates via one serial
console and the kernel messages go to another (a file)
and sending data on the communication console caused
the kernel messages to stop flowing into the file.
Reported-by: Jouni Malinen <j@w1.fi>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-By: anton ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
If DMA (PCI over virtio) is enabled, then some drivers may
enable CONFIG_DMA_OPS as well, and then we pull in the x86
definition of get_arch_dma_ops(), which uses the dma_ops
symbol, which isn't defined.
Since we don't have real DMA ops nor any kind of IOMMU fix
this in the simplest possible way: pull in the asm-generic
file instead of inheriting the x86 one. It's not clear why
those drivers that do (e.g. VDPA) "select DMA_OPS", and if
they'd even work with this, but chances are nobody will be
wanting to do that anyway, so fixing the build failure is
good enough.
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Fixes: 68f5d3f3b6 ("um: add PCI over virtio emulation driver")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Cc: stable@vger.kernel.org
Signed-off-by: Richard Weinberger <richard@nod.at>
In the event that random_get_entropy() can't access a cycle counter or
similar, falling back to returning 0 is really not the best we can do.
Instead, at least calling random_get_entropy_fallback() would be
preferable, because that always needs to return _something_, even
falling back to jiffies eventually. It's not as though
random_get_entropy_fallback() is super high precision or guaranteed to
be entropic, but basically anything that's not zero all the time is
better than returning zero all the time.
This is accomplished by just including the asm-generic code like on
other architectures, which means we can get rid of the empty stub
function here.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Richard Weinberger <richard@nod.at>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
User mode linux is the last user of the PT_DTRACE flag. Using the flag to indicate
single stepping is a little confusing and worse changing tsk->ptrace without locking
could potentionally cause problems.
So use a thread info flag with a better name instead of flag in tsk->ptrace.
Remove the definition PT_DTRACE as uml is the last user.
Cc: stable@vger.kernel.org
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Tested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Link: https://lkml.kernel.org/r/20220505182645.497868-3-ebiederm@xmission.com
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Pull vfs updates from Al Viro:
"Assorted bits and pieces"
* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
aio: drop needless assignment in aio_read()
clean overflow checks in count_mounts() a bit
seq_file: fix NULL pointer arithmetic warning
uml/x86: use x86 load_unaligned_zeropad()
asm/user.h: killed unused macros
constify struct path argument of finish_automount()/do_add_mount()
fs: Remove FIXME comment in generic_write_checks()
- Devicetree support (for testing)
- Various cleanups and fixes: UBD, port_user, uml_mconsole
- Maintainer update
-----BEGIN PGP SIGNATURE-----
iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAmJFwUMWHHJpY2hhcmRA
c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7wQqBD/9gLyeiVp2eu1YFVir64IASgVjK
lNdlAfUwfebtEsw65JcfY8K64910ahw6TvkjTT2A+QGeJIYaVwmw69bLXJUvQq31
C7ZDsMHptuNiZrHDL9SoA0DfwqRdJx3tgGzDnSkhX+2T7Zs5n1nLRMBmn/NJV9Qy
CmxG9fLH1VsU0p6RI76WST3GPLOqWa3jCeHK1vMGZNXI+eo5prHc59lkOcT7lEy7
M4vJRaAV6pCDDYMQdDOYr1PDEeG7/h49EqdKylkOhonDyYB649rL6Lc9nRBvSts3
NXX/qYy1Sj1AlOSR5IOon6QCyk1hap9kr85QoCtz3VMabD/yLlBovZzLOLaF+0S6
dQWgKg806g8QYQGxN03Ph0Pb5cA6hAjr8nVmAuICJDWgmY6Oo74pEvhI8toofFzk
NJzwa6G99xNhfggeTcGdG0ddQDT8N3enKspDPkzpN127GzU5cgvI1Z8wnZXB7JDM
zLMCxzwehocCSrFlh9aQDFK1XJfEWuP66xEPl5cX46//IMKqsrXEOjNlCTRUmA5F
OhU4qqb01OW3K4HPaAkBcGPZ0HhFn6JREUFyNW07dg6s73IWzf0CaNKeYJS7abln
tdvfPg3OPNXCjHd3aCW22EzuB9R/K8BNMkva3QQZxtUa+tOjBdBd9JBJ+vHGA1MN
7/k60wl1dt8/N9yHFg==
=YsK8
-----END PGP SIGNATURE-----
Merge tag 'for-linus-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml
Pull UML updates from Richard Weinberger:
- Devicetree support (for testing)
- Various cleanups and fixes: UBD, port_user, uml_mconsole
- Maintainer update
* tag 'for-linus-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
um: run_helper: Write error message to kernel log on exec failure on host
um: port_user: Improve error handling when port-helper is not found
um: port_user: Allow setting path to port-helper using UML_PORT_HELPER envvar
um: port_user: Search for in.telnetd in PATH
um: clang: Strip out -mno-global-merge from USER_CFLAGS
docs: UML: Mention telnetd for port channel
um: Remove unused timeval_to_ns() function
um: Fix uml_mconsole stop/go
um: Cleanup syscall_handler_t definition/cast, fix warning
uml: net: vector: fix const issue
um: Fix WRITE_ZEROES in the UBD Driver
um: Migrate vector drivers to NAPI
um: Fix order of dtb unflatten/early init
um: fix and optimize xor select template for CONFIG64 and timetravel mode
um: Document dtb command line option
lib/logic_iomem: correct fallback config references
um: Remove duplicated include in syscalls_64.c
MAINTAINERS: Update UserModeLinux entry
Here is the big set of char/misc and other small driver subsystem
updates for 5.18-rc1.
Included in here are merges from driver subsystems which contain:
- iio driver updates and new drivers
- fsi driver updates
- fpga driver updates
- habanalabs driver updates and support for new hardware
- soundwire driver updates and new drivers
- phy driver updates and new drivers
- coresight driver updates
- icc driver updates
Individual changes include:
- mei driver updates
- interconnect driver updates
- new PECI driver subsystem added
- vmci driver updates
- lots of tiny misc/char driver updates
There will be two merge conflicts with your tree, one in MAINTAINERS
which is obvious to fix up, and one in drivers/phy/freescale/Kconfig
which also should be easy to resolve.
All of these have been in linux-next for a while with no reported
problems.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCYkG3fQ8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ykNEgCfaRG8CRxewDXOO4+GSeA3NGK+AIoAnR89donC
R4bgCjfg8BWIBcVVXg3/
=WWXC
-----END PGP SIGNATURE-----
Merge tag 'char-misc-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc and other driver updates from Greg KH:
"Here is the big set of char/misc and other small driver subsystem
updates for 5.18-rc1.
Included in here are merges from driver subsystems which contain:
- iio driver updates and new drivers
- fsi driver updates
- fpga driver updates
- habanalabs driver updates and support for new hardware
- soundwire driver updates and new drivers
- phy driver updates and new drivers
- coresight driver updates
- icc driver updates
Individual changes include:
- mei driver updates
- interconnect driver updates
- new PECI driver subsystem added
- vmci driver updates
- lots of tiny misc/char driver updates
All of these have been in linux-next for a while with no reported
problems"
* tag 'char-misc-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (556 commits)
firmware: google: Properly state IOMEM dependency
kgdbts: fix return value of __setup handler
firmware: sysfb: fix platform-device leak in error path
firmware: stratix10-svc: add missing callback parameter on RSU
arm64: dts: qcom: add non-secure domain property to fastrpc nodes
misc: fastrpc: Add dma handle implementation
misc: fastrpc: Add fdlist implementation
misc: fastrpc: Add helper function to get list and page
misc: fastrpc: Add support to secure memory map
dt-bindings: misc: add fastrpc domain vmid property
misc: fastrpc: check before loading process to the DSP
misc: fastrpc: add secure domain support
dt-bindings: misc: add property to support non-secure DSP
misc: fastrpc: Add support to get DSP capabilities
misc: fastrpc: add support for FASTRPC_IOCTL_MEM_MAP/UNMAP
misc: fastrpc: separate fastrpc device from channel context
dt-bindings: nvmem: brcm,nvram: add basic NVMEM cells
dt-bindings: nvmem: make "reg" property optional
nvmem: brcm_nvram: parse NVRAM content into NVMEM cells
nvmem: dt-bindings: Fix the error of dt-bindings check
...
Hi Linus,
Please, pull the following treewide patch that replaces zero-length arrays with
flexible-array members. This patch has been baking in linux-next for a
whole development cycle.
Thanks
--
Gustavo
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEkmRahXBSurMIg1YvRwW0y0cG2zEFAmI6GIUACgkQRwW0y0cG
2zFLWw/+OB1gZeQD3boKpUMntWnn6wjhUxdrO8CYkpzG+B+8TFECXNjy8HV1CSiw
GKKRndYELOyYaD5o/F2vtPe10iPHbrdIlMFRPBRoht0/cvSZgzHlfT8EjWQwerYY
dieztUFKjeSj0MXivdNDnKOTm8o9cz8KmCrWFP+My37Fasn/9+nBX8iNVIvAX4xy
T+IVmjtDifQUsTs298UGnBvDeuZOiGHhXXU5rq6lIX0Rl554OsWZW94d6jUPj/h7
t1v6jdojNuyaMKn45/xnPj9VvmDiSu3K67m3fjRdzLPDOhISjr2fw4KEUOKdsebh
yJ9t5u8IufyPbm9kyI+rZt+T8ZlV2/qt2+mt6QgtDMnWrs+4nU15JY0SHImMSBZQ
rBEZcQlrIcGJ+CsNB8Y7jIGYO0SSkhodAvfl0LRA0AbTqLGqq0OkAQS5D52r3H2r
uz6xdYb7kG43XaRyaAIPqhZsp/jk2NrXvEvin2tSaXZFR1cxp+oxcV2UajmnOU6i
EIBS4PzJnYx2RZRa+h8YbBa/+D4N6+fj/tjmwBawiUBPjjaLAsGFNwUHqvBoD05S
bk6oXi654NBwVjsknZ0grVz0TtSvdZ3uJL5FZApTOHITqH8vlxlNefmHri4vZRZO
NN7NIQ0yaUCnorzMg+vP8ZtflhQwrMJbjwIS9YD0RHd7MBhYX8k=
=xZD2
-----END PGP SIGNATURE-----
Merge tag 'flexible-array-transformations-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux
Pull flexible-array transformations from Gustavo Silva:
"Treewide patch that replaces zero-length arrays with flexible-array
members.
This has been baking in linux-next for a whole development cycle"
* tag 'flexible-array-transformations-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux:
treewide: Replace zero-length arrays with flexible-array members
There are three sets of updates for 5.18 in the asm-generic tree:
- The set_fs()/get_fs() infrastructure gets removed for good. This
was already gone from all major architectures, but now we can
finally remove it everywhere, which loses some particularly
tricky and error-prone code.
There is a small merge conflict against a parisc cleanup, the
solution is to use their new version.
- The nds32 architecture ends its tenure in the Linux kernel. The
hardware is still used and the code is in reasonable shape, but
the mainline port is not actively maintained any more, as all
remaining users are thought to run vendor kernels that would never
be updated to a future release.
There are some obvious conflicts against changes to the removed
files.
- A series from Masahiro Yamada cleans up some of the uapi header
files to pass the compile-time checks.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAmI69BsACgkQmmx57+YA
GNn/zA//f4d5VTT0ThhRxRWTu9BdThGHoB8TUcY7iOhbsWu0X/913NItRC3UeWNl
IdmisaXgVtirg1dcC2pWUmrcHdoWOCEGfK4+Zr2NhSWfuZDWvODHK9pGWk4WLnhe
cQgUNBvIuuAMryGtrOBwHPO4TpfCyy2ioeVP36ZfcsWXdDxTrqfaq/56mk3sxIP6
sUTk1UEjut9NG4C9xIIvcSU50R3l6LryQE/H9kyTLtaSvfvTOvprcVYCq0GPmSzo
DtQ1Wwa9zbJ+4EqoMiP5RrgQwWvOTg2iRByLU8ytwlX3e/SEF0uihvMv1FQbL8zG
G8RhGUOKQSEhaBfc3lIkm8GpOVPh0uHzB6zhn7daVmAWtazRD2Nu59BMjipa+ims
a8Z58iHH7jRAnKeEkVZqXKb1CEiUxaQx/IeVPzN4QlwMhDtwrI76LY7ZJ1zCqTGY
ENG0yRLav1XselYBslOYXGtOEWcY5EZPWqLyWbp4P9vz2g0Fe0gZxoIOvPmNQc89
QnfXpCt7vm/DGkyO255myu08GOLeMkisVqUIzLDB9avlym5mri7T7vk9abBa2YyO
CRpTL5gl1/qKPWuH1UI5mvhT+sbbBE2SUHSuy84btns39ZKKKynwCtdu+hSQkKLE
h9pV30Gf1cLTD4JAE0RWlUgOmbBLVp34loTOexQj4MrLM1noOnw=
=vtCN
-----END PGP SIGNATURE-----
Merge tag 'asm-generic-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull asm-generic updates from Arnd Bergmann:
"There are three sets of updates for 5.18 in the asm-generic tree:
- The set_fs()/get_fs() infrastructure gets removed for good.
This was already gone from all major architectures, but now we can
finally remove it everywhere, which loses some particularly tricky
and error-prone code. There is a small merge conflict against a
parisc cleanup, the solution is to use their new version.
- The nds32 architecture ends its tenure in the Linux kernel.
The hardware is still used and the code is in reasonable shape, but
the mainline port is not actively maintained any more, as all
remaining users are thought to run vendor kernels that would never
be updated to a future release.
- A series from Masahiro Yamada cleans up some of the uapi header
files to pass the compile-time checks"
* tag 'asm-generic-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: (27 commits)
nds32: Remove the architecture
uaccess: remove CONFIG_SET_FS
ia64: remove CONFIG_SET_FS support
sh: remove CONFIG_SET_FS support
sparc64: remove CONFIG_SET_FS support
lib/test_lockup: fix kernel pointer check for separate address spaces
uaccess: generalize access_ok()
uaccess: fix type mismatch warnings from access_ok()
arm64: simplify access_ok()
m68k: fix access_ok for coldfire
MIPS: use simpler access_ok()
MIPS: Handle address errors for accesses above CPU max virtual user address
uaccess: add generic __{get,put}_kernel_nofault
nios2: drop access_ok() check from __put_user()
x86: use more conventional access_ok() definition
x86: remove __range_not_ok()
sparc64: add __{get,put}_kernel_nofault()
nds32: fix access_ok() checks in get/put_user
uaccess: fix nios2 and microblaze get_user_8()
sparc64: fix building assembly files
...
We need to use this function in common code, so define it for
architectures and/or configrations that miss it. The result of
pmd_pfn() will only be used if TRANSPARENT_HUGEPAGE is enabled,
but a function or macro called pmd_pfn() must be defined, even
on machines with two level page tables.
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Nowadays PC-style parallel ports come in the form of PCI and PCIe option
cards and there are some combined parallel/serial option cards as well
that we handle in the parport subsystem. There is nothing in particular
that would prevent them from being used in any system equipped with PCI
or PCIe connectivity, except that we do not permit the PARPORT_PC config
option to be selected for platforms for which ARCH_MIGHT_HAVE_PC_PARPORT
has not been set for.
The only PCI platforms that actually can't make use of PC-style parallel
port hardware are those newer PCIe systems that have no support for I/O
cycles in the host bridge, required by such parallel ports. Notably,
this includes the s390 arch, which has port I/O accessors that cause
compilation warnings (promoted to errors with `-Werror'), and there are
other cases such as the POWER9 PHB4 device, though this one has variable
port I/O accessors that depend on the particular system. Also it is not
clear whether the serial port side of devices enabled by PARPORT_SERIAL
uses port I/O or MMIO. Finally Super I/O solutions are always either
ISA or platform devices.
Make the PARPORT_PC option selectable also for PCI systems then, except
for the s390 arch, however limit the availability of PARPORT_PC_SUPERIO
to platforms that enable ARCH_MIGHT_HAVE_PC_PARPORT. Update platforms
accordingly for the required <asm/parport.h> header.
Acked-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Link: https://lore.kernel.org/r/alpine.DEB.2.21.2202141955550.34636@angie.orcam.me.uk
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Call to fallocate with FALLOC_FL_PUNCH_HOLE on a device backed by a sparse
file can end up by missing data, zeroes data range, if the underlying file
is used with a tool like bmaptool which will referenced only used spaces.
Signed-off-by: Frédéric Danis <frederic.danis@collabora.com>
Acked-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Due to dropped inclusion of asm-generic/xor.h, xor_block_8regs symbol is
missing with CONFIG64 and break compilation, as the asm/xor_64.h also did
not include it. The patch recreate the logic from arch/x86, which check
whether AVX is available and add fallbacks for 32bit and 64bit config of
um.
A very minor additional "fix" is, the return of the macro parameter
instead of NULL, as this is the original intent of the macro, but
this does not change the actual behavior.
Fixes: c0ecca6604 ("um: enable the use of optimized xor routines in UML")
Signed-off-by: Benjamin Beichler <benjamin.beichler@uni-rostock.de>
Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
There are many different ways that access_ok() is defined across
architectures, but in the end, they all just compare against the
user_addr_max() value or they accept anything.
Provide one definition that works for most architectures, checking
against TASK_SIZE_MAX for user processes or skipping the check inside
of uaccess_kernel() sections.
For architectures without CONFIG_SET_FS(), this should be the fastest
check, as it comes down to a single comparison of a pointer against a
compile-time constant, while the architecture specific versions tend to
do something more complex for historic reasons or get something wrong.
Type checking for __user annotations is handled inconsistently across
architectures, but this is easily simplified as well by using an inline
function that takes a 'const void __user *' argument. A handful of
callers need an extra __user annotation for this.
Some architectures had trick to use 33-bit or 65-bit arithmetic on the
addresses to calculate the overflow, however this simpler version uses
fewer registers, which means it can produce better object code in the
end despite needing a second (statically predicted) branch.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Mark Rutland <mark.rutland@arm.com> [arm64, asm-generic]
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Stafford Horne <shorne@gmail.com>
Acked-by: Dinh Nguyen <dinguyen@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Nine architectures are still missing __{get,put}_kernel_nofault:
alpha, ia64, microblaze, nds32, nios2, openrisc, sh, sparc32, xtensa.
Add a generic version that lets everything use the normal
copy_{from,to}_kernel_nofault() code based on these, removing the last
use of get_fs()/set_fs() from architecture-independent code.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
There is a regular need in the kernel to provide a way to declare
having a dynamically sized set of trailing elements in a structure.
Kernel code should always use “flexible array members”[1] for these
cases. The older style of one-element or zero-length arrays should
no longer be used[2].
This code was transformed with the help of Coccinelle:
(next-20220214$ spatch --jobs $(getconf _NPROCESSORS_ONLN) --sp-file script.cocci --include-headers --dir . > output.patch)
@@
identifier S, member, array;
type T1, T2;
@@
struct S {
...
T1 member;
T2 array[
- 0
];
};
UAPI and wireless changes were intentionally excluded from this patch
and will be sent out separately.
[1] https://en.wikipedia.org/wiki/Flexible_array_member
[2] https://www.kernel.org/doc/html/v5.16/process/deprecated.html#zero-length-and-one-element-arrays
Link: https://github.com/KSPP/linux/issues/78
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Remove address space overrides using set_fs() for User Mode Linux.
Note that just like the existing kernel access case of the uaccess
routines the new nofault kernel handlers do not actually have any
exception handling. This is probably broken, but not change to the
status quo.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Richard Weinberger <richard@nod.at>
get_vm(), add_iomem(), phys_offset() dead since 2004;
init_mem_user() and setup_memory() - since before the initial merge.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
Only one extern in there is needed in processor-generic.h, and it's
not needed anywhere else. So move it over there and get rid of
the include in processor-generic.h, adding includes of registers.h
to the few files that need the declarations in it.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
The function names init_registers() and restore_registers() are used
in several net/ethernet/ and gpu/drm/ drivers for other purposes (not
calls to UML functions), so rename them.
This fixes multiple build errors.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: linux-um@lists.infradead.org
Signed-off-by: Richard Weinberger <richard@nod.at>
Rename set_signals() as there's at least one driver that
uses the same name and can now be built on UM due to PCI
support, and thus we can get symbol conflicts.
Also rename set_signals_trace() to be consistent.
Reported-by: kernel test robot <lkp@intel.com>
Fixes: 68f5d3f3b6 ("um: add PCI over virtio emulation driver")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Many places in the kernel use 'udelay' as an identifier, and
are broken with the current "#define udelay um_udelay". Fix
this by adding an argument to the macro, and do the same to
'ndelay' as well, just in case.
Fixes: 0bc8fb4dda ("um: Implement ndelay/udelay in time-travel mode")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
This is a single cleanup from Peter Collingbourne, removing
some dead code.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAmGKm+kACgkQmmx57+YA
GNkksg/7BUxJrWFrQmLA3fhzh4wG3KswdrKTGQMf0jRVI1n77vmdfig3hEJmMekH
0SZoIYmPztOSj34+6p4NxuqY/Sk62oYLr8Awo8ZLDhIBWNUJE8UWC07Qb/3DpGbp
JfD7/8mXu10htgM85aGlhaVLnHvvqUBR8PlkJUGNxuY5Gy2L+eCkpwCAYlZpSOqr
vs0BJWFY1LzO1POcjIq4/IM0PvcU2ncLB5XoxDjjIIWnWKyHzWY21ZvoeaNumvou
xqQ/Hj8Sc+ufS0yNlSgIC+bJP0bp1bSw/dALKr8oYxLt7X9LELVY3WXyRH+It4nS
b6HPYmga26NVq/u7RrylBA+2fRCDB8E6z73gHt4SeHrRDEvFjhNzIyV+aXPae/dY
XI/pjiwpG6k6FpSnF69YZJ/Y+GmUA90V/Jq8aLFZhGz8SgpjRl+2foEUSDhRVXCA
jGB1Y388m0e6jPlVJROB7ORzXMd8K5iciyUGqtAI87QCOtPozn10ruh5RdziguKm
kDW5IKy9E4l1ch8WRprVbgV5Ew+QWKS1JIbyjDaX3jN0lUPCqgwkwvRxxtgdFnVA
Lq5BiUdraSWFUr84rLU3gCRU0+VoEdyZYI+bQGGNlQ4ovmLYU1nLmCU/azMvSEgz
ZsoM5YdffShxUtfwMg+W67RpYOjSnV55ZzwN0+d1C2oqz9xECXc=
=8EFP
-----END PGP SIGNATURE-----
Merge tag 'asm-generic-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull asm-generic cleanup from Arnd Bergmann:
"This is a single cleanup from Peter Collingbourne, removing some dead
code"
* tag 'asm-generic-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
arch: remove unused function syscall_set_arguments()
Having a stable wchan means the process must be blocked and for it to
stay that way while performing stack unwinding.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> [arm]
Tested-by: Mark Rutland <mark.rutland@arm.com> [arm64]
Link: https://lkml.kernel.org/r/20211008111626.332092234@infradead.org
- Add -s option (strict mode) to merge_config.sh to make it fail when
any symbol is redefined.
- Show a warning if a different compiler is used for building external
modules.
- Infer --target from ARCH for CC=clang to let you cross-compile the
kernel without CROSS_COMPILE.
- Make the integrated assembler default (LLVM_IAS=1) for CC=clang.
- Add <linux/stdarg.h> to the kernel source instead of borrowing
<stdarg.h> from the compiler.
- Add Nick Desaulniers as a Kbuild reviewer.
- Drop stale cc-option tests.
- Fix the combination of CONFIG_TRIM_UNUSED_KSYMS and CONFIG_LTO_CLANG
to handle symbols in inline assembly.
- Show a warning if 'FORCE' is missing for if_changed rules.
- Various cleanups
-----BEGIN PGP SIGNATURE-----
iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmExXHoVHG1hc2FoaXJv
eUBrZXJuZWwub3JnAAoJED2LAQed4NsGAZwP/iHdEZzuQ4cz2uXUaV0fevj9jjPU
zJ8wrrNabAiT6f5x861DsARQSR4OSt3zN0tyBNgZwUdotbe7ED5GegrgIUBMWlML
QskhTEIZj7TexAX/20vx671gtzI3JzFg4c9BuriXCFRBvychSevdJPr65gMDOesL
vOJnXe+SGXG2+fPWi/PxrcOItNRcveqo2GiWHT3g0Cv/DJUulu81gEkz3hrufnMR
cjMeSkV0nJJcvI755OQBOUnEuigW64k4m2WxHPG24tU8cQOCqV6lqwOfNQBAn4+F
OoaCMyPQT9gvGYwGExQMCXGg0wbUt1qnxzOVoA2qFCwbo+MFhqjBvPXab6VJm7CE
mY3RrTtvxSqBdHI6EGcYeLjhycK9b+LLoJ1qc3S9FK8It6NoFFp4XV0R6ItPBls7
mWi9VSpyI6k0AwLq+bGXEHvaX/bnnf/vfqn8H+w6mRZdXjFV8EB2DiOSRX/OqjVG
RnvTtXzWWThLyXvWR3Jox4+7X6728oL7akLemoeZI6oTbJDm7dQgwpz5HbSyHXLh
d+gUF3Y/6lqxT5N9GSVDxpD1bEMh2I7nGQ4M7WGbGas/3yUemF8wbBqGQo4a+YeD
d9vGAUxDp2PQTtL2sjFo5Gd4PZEM9g7vwWzRvHe0o5NxKEXcBg25b8cD1hxrN9Y4
Y1AAnc0kLO+My3PC
=lw3M
-----END PGP SIGNATURE-----
Merge tag 'kbuild-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:
- Add -s option (strict mode) to merge_config.sh to make it fail when
any symbol is redefined.
- Show a warning if a different compiler is used for building external
modules.
- Infer --target from ARCH for CC=clang to let you cross-compile the
kernel without CROSS_COMPILE.
- Make the integrated assembler default (LLVM_IAS=1) for CC=clang.
- Add <linux/stdarg.h> to the kernel source instead of borrowing
<stdarg.h> from the compiler.
- Add Nick Desaulniers as a Kbuild reviewer.
- Drop stale cc-option tests.
- Fix the combination of CONFIG_TRIM_UNUSED_KSYMS and CONFIG_LTO_CLANG
to handle symbols in inline assembly.
- Show a warning if 'FORCE' is missing for if_changed rules.
- Various cleanups
* tag 'kbuild-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (39 commits)
kbuild: redo fake deps at include/ksym/*.h
kbuild: clean up objtool_args slightly
modpost: get the *.mod file path more simply
checkkconfigsymbols.py: Fix the '--ignore' option
kbuild: merge vmlinux_link() between ARCH=um and other architectures
kbuild: do not remove 'linux' link in scripts/link-vmlinux.sh
kbuild: merge vmlinux_link() between the ordinary link and Clang LTO
kbuild: remove stale *.symversions
kbuild: remove unused quiet_cmd_update_lto_symversions
gen_compile_commands: extract compiler command from a series of commands
x86: remove cc-option-yn test for -mtune=
arc: replace cc-option-yn uses with cc-option
s390: replace cc-option-yn uses with cc-option
ia64: move core-y in arch/ia64/Makefile to arch/ia64/Kbuild
sparc: move the install rule to arch/sparc/Makefile
security: remove unneeded subdir-$(CONFIG_...)
kbuild: sh: remove unused install script
kbuild: Fix 'no symbols' warning when CONFIG_TRIM_UNUSD_KSYMS=y
kbuild: Switch to 'f' variants of integrated assembler flag
kbuild: Shuffle blank line to improve comment meaning
...
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAmEt+hwACgkQUqAMR0iA
lPLppBAAiyrUNVmqqtdww+IJajEs1uD/4FqPsysHRwroHBFymJeQG1XCwUpDZ7jj
6gXT0chxyjQE18gT/W9nf+PSmA9XvIVA1WSR+WCECTNW3YoZXqtgwiHfgnitXYku
HlmoZLthYeuoXWw2wn+hVLfTRh6VcPHYEaC21jXrs6B1pOXHbvjJ5eTLHlX9oCfL
UKSK+jFTHAJcn/GskRzviBe0Hpe8fqnkRol2XX13ltxqtQ73MjaGNu7imEH6/Pa7
/MHXWtuWJtOvuYz17aztQP4Qwh1xy+kakMy3aHucdlxRBTP4PTzzTuQI3L/RYi6l
+ttD7OHdRwqFAauBLY3bq3uJjYb5v/64ofd8DNnT2CJvtznY8wrPbTdFoSdPcL2Q
69/opRWHcUwbU/Gt4WLtyQf3Mk0vepgMbbVg1B5SSy55atRZaXMrA2QJ/JeawZTB
KK6D/mE7ccze/YFzsySunCUVKCm0veoNxEAcakCCZKXSbsvd1MYcIRC0e+2cv6e5
2NEH7gL4dD+5tqu5nzvIuKDn3NrDQpbi28iUBoFbkxRgcVyvHJ9AGSa62wtb5h3D
OgkqQMdVKBbjYNeUodPlQPzmXZDasytavyd0/BC/KENOcBvU/8gW++2UZTfsh/1A
dLjgwFBdyJncQcCS9Abn20/EKntbIMEX8NLa97XWkA3fuzMKtak=
=yEVq
-----END PGP SIGNATURE-----
Merge tag 'printk-for-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux
Pull printk updates from Petr Mladek:
- Optionally, provide an index of possible printk messages via
<debugfs>/printk/index/. It can be used when monitoring important
kernel messages on a farm of various hosts. The monitor has to be
updated when some messages has changed or are not longer available by
a newly deployed kernel.
- Add printk.console_no_auto_verbose boot parameter. It allows to
generate crash dump even with slow consoles in a reasonable time
frame.
- Remove printk_safe buffers. The messages are always stored directly
to the main logbuffer, even in NMI or recursive context. Also it
allows to serialize syslog operations by a mutex instead of a spin
lock.
- Misc clean up and build fixes.
* tag 'printk-for-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
printk/index: Fix -Wunused-function warning
lib/nmi_backtrace: Serialize even messages about idle CPUs
printk: Add printk.console_no_auto_verbose boot parameter
printk: Remove console_silent()
lib/test_scanf: Handle n_bits == 0 in random tests
printk: syslog: close window between wait and read
printk: convert @syslog_lock to mutex
printk: remove NMI tracking
printk: remove safe buffers
printk: track/limit recursion
lib/nmi_backtrace: explicitly serialize banner and regs
printk: Move the printk() kerneldoc comment to its new home
printk/index: Fix warning about missing prototypes
MIPS/asm/printk: Fix build failure caused by printk
printk: index: Add indexing support to dev_printk
printk: Userspace format indexing support
printk: Rework parse_prefix into printk_parse_prefix
printk: Straighten out log_flags into printk_info_flags
string_helpers: Escape double quotes in escape_special
printk/console: Check consistent sequence number when handling race in console_unlock()
Delete/fixup few includes in anticipation of global -isystem compile
option removal.
Note: crypto/aegis128-neon-inner.c keeps <stddef.h> due to redefinition
of uintptr_t error (one definition comes from <stddef.h>, another from
<linux/types.h>).
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
As these are now in asm-generic, it's no longer necessary to
declare them in the architecture.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
This is a preparation for changing over architectures to the
generic implementation one at a time. As there are no callers
of either __strncpy_from_user() or __strnlen_user(), fold these
into the strncpy_from_user() and strnlen_user() functions to make
each implementation independent of the others.
Many of these implementations have known bugs, but the intention
here is to not change behavior at all and stay compatible with
those bugs for the moment.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
We have a number of systems industry-wide that have a subset of their
functionality that works as follows:
1. Receive a message from local kmsg, serial console, or netconsole;
2. Apply a set of rules to classify the message;
3. Do something based on this classification (like scheduling a
remediation for the machine), rinse, and repeat.
As a couple of examples of places we have this implemented just inside
Facebook, although this isn't a Facebook-specific problem, we have this
inside our netconsole processing (for alarm classification), and as part
of our machine health checking. We use these messages to determine
fairly important metrics around production health, and it's important
that we get them right.
While for some kinds of issues we have counters, tracepoints, or metrics
with a stable interface which can reliably indicate the issue, in order
to react to production issues quickly we need to work with the interface
which most kernel developers naturally use when developing: printk.
Most production issues come from unexpected phenomena, and as such
usually the code in question doesn't have easily usable tracepoints or
other counters available for the specific problem being mitigated. We
have a number of lines of monitoring defence against problems in
production (host metrics, process metrics, service metrics, etc), and
where it's not feasible to reliably monitor at another level, this kind
of pragmatic netconsole monitoring is essential.
As one would expect, monitoring using printk is rather brittle for a
number of reasons -- most notably that the message might disappear
entirely in a new version of the kernel, or that the message may change
in some way that the regex or other classification methods start to
silently fail.
One factor that makes this even harder is that, under normal operation,
many of these messages are never expected to be hit. For example, there
may be a rare hardware bug which one wants to detect if it was to ever
happen again, but its recurrence is not likely or anticipated. This
precludes using something like checking whether the printk in question
was printed somewhere fleetwide recently to determine whether the
message in question is still present or not, since we don't anticipate
that it should be printed anywhere, but still need to monitor for its
future presence in the long-term.
This class of issue has happened on a number of occasions, causing
unhealthy machines with hardware issues to remain in production for
longer than ideal. As a recent example, some monitoring around
blk_update_request fell out of date and caused semi-broken machines to
remain in production for longer than would be desirable.
Searching through the codebase to find the message is also extremely
fragile, because many of the messages are further constructed beyond
their callsite (eg. btrfs_printk and other module-specific wrappers,
each with their own functionality). Even if they aren't, guessing the
format and formulation of the underlying message based on the aesthetics
of the message emitted is not a recipe for success at scale, and our
previous issues with fleetwide machine health checking demonstrate as
much.
This provides a solution to the issue of silently changed or deleted
printks: we record pointers to all printk format strings known at
compile time into a new .printk_index section, both in vmlinux and
modules. At runtime, this can then be iterated by looking at
<debugfs>/printk/index/<module>, which emits the following format, both
readable by humans and able to be parsed by machines:
$ head -1 vmlinux; shuf -n 5 vmlinux
# <level[,flags]> filename:line function "format"
<5> block/blk-settings.c:661 disk_stack_limits "%s: Warning: Device %s is misaligned\n"
<4> kernel/trace/trace.c:8296 trace_create_file "Could not create tracefs '%s' entry\n"
<6> arch/x86/kernel/hpet.c:144 _hpet_print_config "hpet: %s(%d):\n"
<6> init/do_mounts.c:605 prepare_namespace "Waiting for root device %s...\n"
<6> drivers/acpi/osl.c:1410 acpi_no_auto_serialize_setup "ACPI: auto-serialization disabled\n"
This mitigates the majority of cases where we have a highly-specific
printk which we want to match on, as we can now enumerate and check
whether the format changed or the printk callsite disappeared entirely
in userspace. This allows us to catch changes to printks we monitor
earlier and decide what to do about it before it becomes problematic.
There is no additional runtime cost for printk callers or printk itself,
and the assembly generated is exactly the same.
Signed-off-by: Chris Down <chris@chrisdown.name>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kees Cook <keescook@chromium.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Tested-by: Petr Mladek <pmladek@suse.com>
Reported-by: kernel test robot <lkp@intel.com>
Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Acked-by: Jessica Yu <jeyu@kernel.org> # for module.{c,h}
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/e42070983637ac5e384f17fbdbe86d19c7b212a5.1623775748.git.chris@chrisdown.name
- Support for optimized routines based on the host CPU
- Support for PCI via virtio
- Various fixes
-----BEGIN PGP SIGNATURE-----
iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAmDnZwAWHHJpY2hhcmRA
c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7wW1BD/9SHWGYhxLY+xL27eO0Q8XOPePb
diqllGavzq3fcakmJ3+6iIpb/WYX0ztu1M4KMBRP3QxNjP6nFkS1ph3PC0LL3ec2
h23hRfOrhlQd4rdonPcq/Z7oXKhrkem9G6KneVfvB94HmXnaZIrNBjwQRy0uRMXE
/IVNH4o6YMR8Av/VrG+L6BS+O/oXVnYVSLOuXsIrxmxS24NybsOpRzHvl14ZUsHt
eiwzcRC3ugAaxJn8cOSrHdBwvdOgbFFWEtMITcesQpYru+EmQcsCZdmJ0DbwsV2e
9k+LrVoy0CZFoekBtaaFvZq+JVBjUZKoAUYBML4ejWnQKolJH0BZQRh4RT0rbTjc
UMiuE3kFUsdJjzJRyO4pcqpwaNhCiZ2XrwyKeev/FLIn95bD1xbLJWfRvoKhioiI
X+1vujN2+N5n8T+u8sCVohujJCkUkMjevfF6ew8rvYOj3FrGqTi4jgrXUFAIsjLa
mHdA92oHIjNOCjyVIqnoUFTDltVMW9CwnLtd5nPnGvJoMtsj7lthy6fdtdPH0WVu
iNR4toE/AjBJo4rtib/irYbZtqmw2AbBFqoRk4yj8Fw4ZdSPYELwAR1aah0Oce9R
t1T9OE66vlr28XIC0NF917JfSNkc2eXnx4B21Zh+a/68XSJ1FzXPTob3lvXVVhQR
Ou4aw6dH7mql/2bq1w==
=wAww
-----END PGP SIGNATURE-----
Merge tag 'for-linus-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml
Pull UML updates from Richard Weinberger:
- Support for optimized routines based on the host CPU
- Support for PCI via virtio
- Various fixes
* tag 'for-linus-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
um: remove unneeded semicolon in um_arch.c
um: Remove the repeated declaration
um: fix error return code in winch_tramp()
um: fix error return code in slip_open()
um: Fix stack pointer alignment
um: implement flush_cache_vmap/flush_cache_vunmap
um: add a UML specific futex implementation
um: enable the use of optimized xor routines in UML
um: Add support for host CPU flags and alignment
um: allow not setting extra rpaths in the linux binary
um: virtio/pci: enable suspend/resume
um: add PCI over virtio emulation driver
um: irqs: allow invoking time-travel handler multiple times
um: time-travel/signals: fix ndelay() in interrupt
um: expose time-travel mode to userspace side
um: export signals_enabled directly
um: remove unused smp_sigio_handler() declaration
lib: add iomem emulation (logic_iomem)
um: allow disabling NO_IOMEM
Currently most platforms define pmd_pgtable() as pmd_page() duplicating
the same code all over. Instead just define a default value i.e
pmd_page() for pmd_pgtable() and let platforms override when required via
<asm/pgtable.h>. All the existing platform that override pmd_pgtable()
have been moved into their respective <asm/pgtable.h> header in order to
precede before the new generic definition. This makes it much cleaner
with reduced code.
Link: https://lkml.kernel.org/r/1623646133-20306-1-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: Stafford Horne <shorne@gmail.com>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Chris Zankel <chris@zankel.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently most platforms define FIRST_USER_ADDRESS as 0UL duplication the
same code all over. Instead just define a generic default value (i.e 0UL)
for FIRST_USER_ADDRESS and let the platforms override when required. This
makes it much cleaner with reduced code.
The default FIRST_USER_ADDRESS here would be skipped in <linux/pgtable.h>
when the given platform overrides its value via <asm/pgtable.h>.
Link: https://lkml.kernel.org/r/1620615725-24623-1-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k]
Acked-by: Guo Ren <guoren@kernel.org> [csky]
Acked-by: Stafford Horne <shorne@gmail.com> [openrisc]
Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64]
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Palmer Dabbelt <palmerdabbelt@google.com> [RISC-V]
Cc: Richard Henderson <rth@twiddle.net>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: Stafford Horne <shorne@gmail.com>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Chris Zankel <chris@zankel.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Function 'os_flush_stdout' is declared twice, so remove the
repeated declaration.
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
vmalloc() heavy workloads in UML are extremely slow, due to
flushing the entire kernel VM space (flush_tlb_kernel_vm())
on the first segfault.
Implement flush_cache_vmap() to avoid that, and while at it
also add flush_cache_vunmap() since it's trivial.
This speeds up my vmalloc() heavy test of copying files out
from /sys/kernel/debug/gcov/ by 30x (from 30s to 1s.)
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
The generic asm futex implementation emulates atomic access to
memory by doing a get_user followed by put_user. These translate
to two mapping operations on UML with paging enabled in the
meantime. This, in turn may end up changing interrupts,
invoking the signal loop, etc.
This replaces the generic implementation by a mapping followed
by an operation on the mapped segment.
Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
This patch enables the use of optimized xor routines from the x86
tree as well as the necessary fpu api shims so they can work on
UML.
Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
1. Reflect host cpu flags into the UML instance so they can
be used to select the correct implementations for xor, crypto, etc.
2. Reflect host cache alignment into UML instance. This is
important when running 32 bit on a 64 bit host as 32 bit by
default aligns to 32 while the actual alignment should be 64.
Ditto for some Xeons which align at 128.
Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
The UM virtual PCI devices currently cannot be suspended properly
since the virtio driver already disables VQs well before the PCI
bus's suspend_noirq wants to complete the transition by writing to
PCI config space.
After trying around for a long time with moving the devices on the
DPM list, trying to create dependencies between them, etc. I gave
up and instead added UML specific cross-driver API that lets the
virt-pci code enable not suspending/resuming VQs for its devices.
This then allows the PCI bus suspend_noirq to still talk to the
device, and suspend/resume works properly.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
To support testing of PCI/PCIe drivers in UML, add a PCI bus
support driver. This driver uses virtio, which in UML is really
just vhost-user, to talk to devices, and adds the devices to
the virtual PCI bus in the system.
Since virtio already allows DMA/bus mastering this really isn't
all that hard, of course we need the logic_iomem infrastructure
that was added by a previous patch.
The protocol to talk to the device is has a few fairly simple
messages for reading to/writing from config and IO spaces, and
messages for the device to send the various interrupts (INT#,
MSI/MSI-X and while suspended PME#).
Note that currently no offical virtio device ID is assigned for
this protocol, as a consequence this patch requires defining it
in the Kconfig, with a default that makes the driver refuse to
work at all.
Finally, in order to add support for MSI/MSI-X interrupts, some
small changes are needed in the UML IRQ code, it needs to have
more interrupts, changing NR_IRQS from 64 to 128 if this driver
is enabled, but not actually use them for anything so that the
generic IRQ domain/MSI infrastructure can allocate IRQ numbers.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
We should be able to ndelay() from any context, even from an
interrupt context! However, this is broken (not functionally,
but locking-wise) in time-travel because we'll get into the
time-travel code and enable interrupts to handle messages on
other time-travel aware subsystems (only virtio for now).
Luckily, I've already reworked the time-travel aware signal
(interrupt) delivery for suspend/resume to have a time travel
handler, which runs directly in the context of the signal and
not from the Linux interrupt.
In order to fix this time-travel issue then, we need to do a
few things:
1) rework the signal handling code to call time-travel handlers
(only) if interrupts are disabled but signals aren't blocked,
instead of marking it only pending there. This is needed to
not deadlock other communication.
2) rework time-travel to not enable interrupts while it's
waiting for a message;
3) rework time-travel to not (just) disable interrupts but
rather block signals at a lower level while it needs them
disabled for communicating with the controller.
Finally, since now we can actually spend even virtual time
in interrupts-disabled sections, the delay warning when we
deliver a time-travel delayed interrupt is no longer valid,
things can (and should) now get delayed.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
This will be necessary in the userspace side to fix the
signal/interrupt handling in time-travel=ext mode, which
is the next patch.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Use signals_enabled instead of always jumping through
a function call to read it, there's not much point in
that.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
This function doesn't exist, remove its declaration.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Adjust the kconfig a little to allow disabling NO_IOMEM in UML. To
make an "allyesconfig" with CONFIG_NO_IOMEM=n build, adjust a few
Kconfig things elsewhere and add dummy asm/fb.h and asm/vga.h files.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Currently when using "W=1" with UML builds, there are over 700 warnings
like so:
CC arch/um/drivers/stderr_console.o
cc1: warning: ./arch/um/include/uapi: No such file or directory [-Wmissing-include-dirs]
but arch/um/ does not have include/uapi/ at all, so add that
subdir and put one Kbuild file into it (since git does not track
empty subdirs).
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Michal Marek <michal.lkml@markovi.net>
Cc: linux-kbuild@vger.kernel.org
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: linux-um@lists.infradead.org
Reviewed-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Richard Weinberger <richard@nod.at>
Use the common kernel style to eliminate a warning:
./arch/um/include/asm/pgtable.h:305:47: warning: suggest braces around empty body in ‘do’ statement [-Wempty-body]
#define update_mmu_cache(vma,address,ptep) do ; while (0)
^
mm/filemap.c:3212:3: note: in expansion of macro ‘update_mmu_cache’
update_mmu_cache(vma, addr, vmf->pte);
^~~~~~~~~~~~~~~~
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: linux-um@lists.infradead.org
Signed-off-by: Richard Weinberger <richard@nod.at>
The irq stack switching was moved out of the ASM entry code in course of
the entry code consolidation. It ended up being suboptimal in various
ways.
- Make the stack switching inline so the stackpointer manipulation is not
longer at an easy to find place.
- Get rid of the unnecessary indirect call.
- Avoid the double stack switching in interrupt return and reuse the
interrupt stack for softirq handling.
- A objtool fix for CONFIG_FRAME_POINTER=y builds where it got confused
about the stack pointer manipulation.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmA21OcTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoaX0D/9S0ud6oqbsIvI8LwhvYub63a2cjKP9
liHAJ7xwMYYVwzf0skwsPb/QE6+onCzdq0upJkgG/gEYm2KbiaMWZ4GgHdj0O7ER
qXKJONDd36AGxSEdaVzLY5kPuD/mkomGk5QdaZaTmjruthkNzg4y/N2wXUBIMZR0
FdpSpp5fGspSZCn/DXDx6FjClwpLI53VclvDs6DcZ2DIBA0K+F/cSLb1UQoDLE1U
hxGeuNa+GhKeeZ5C+q5giho1+ukbwtjMW9WnKHAVNiStjm0uzdqq7ERGi/REvkcB
LY62u5uOSW1zIBMmzUjDDQEqvypB0iFxFCpN8g9sieZjA0zkaUioRTQyR+YIQ8Cp
l8LLir0dVQivR1bHghHDKQJUpdw/4zvDj4mMH10XHqbcOtIxJDOJHC5D00ridsAz
OK0RlbAJBl9FTdLNfdVReBCoehYAO8oefeyMAG12nZeSh5XVUWl238rvzmzIYNhG
cEtkSx2wIUNEA+uSuI+xvfmwpxL7voTGvqmiRDCAFxyO7Bl/GBu9OEBFA1eOvHB+
+wTmPDMswRetQNh4QCRXzk1JzP1Wk5CobUL9iinCWFoTJmnsPPSOWlosN6ewaNXt
kYFpRLy5xt9EP7dlfgBSjiRlthDhTdMrFjD5bsy1vdm1w7HKUo82lHa4O8Hq3PHS
tinKICUqRsbjig==
=Sqr1
-----END PGP SIGNATURE-----
Merge tag 'x86-entry-2021-02-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 irq entry updates from Thomas Gleixner:
"The irq stack switching was moved out of the ASM entry code in course
of the entry code consolidation. It ended up being suboptimal in
various ways.
This reworks the X86 irq stack handling:
- Make the stack switching inline so the stackpointer manipulation is
not longer at an easy to find place.
- Get rid of the unnecessary indirect call.
- Avoid the double stack switching in interrupt return and reuse the
interrupt stack for softirq handling.
- A objtool fix for CONFIG_FRAME_POINTER=y builds where it got
confused about the stack pointer manipulation"
* tag 'x86-entry-2021-02-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
objtool: Fix stack-swizzle for FRAME_POINTER=y
um: Enforce the usage of asm-generic/softirq_stack.h
x86/softirq/64: Inline do_softirq_own_stack()
softirq: Move do_softirq_own_stack() to generic asm header
softirq: Move __ARCH_HAS_DO_SOFTIRQ to Kconfig
x86: Select CONFIG_HAVE_IRQ_EXIT_ON_IRQ_STACK
x86/softirq: Remove indirection in do_softirq_own_stack()
x86/entry: Use run_sysvec_on_irqstack_cond() for XEN upcall
x86/entry: Convert device interrupts to inline stack switching
x86/entry: Convert system vectors to irq stack macro
x86/irq: Provide macro for inlining irq stack switching
x86/apic: Split out spurious handling code
x86/irq/64: Adjust the per CPU irq stack pointer by 8
x86/irq: Sanitize irq stack tracking
x86/entry: Fix instrumentation annotation
The recent rework of the X86 irq stack switching mechanism broke UM as UM
pulls in the X86 specific variant of softirq_stack.h.
Enforce the usage of the asm-generic variant.
Fixes: 72f40a2823 ("x86/softirq/64: Inline do_softirq_own_stack()")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Richard Weinberger <richard@nod.at>
This will get the (no-op) definition of irq_canonicalize()
which some code might want. We could define that ourselves,
but it seems like we'd likely want generic extensions in
the future, if any.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
This may be needed for size_t if something doesn't get
it included elsewhere before including <asm/io.h>, so
add the include.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Add a pseudo RTC that simply is able to send an alarm signal
waking up the system at a given time in the future.
Since apparently timerfd_create() FDs don't support SIGIO, we
use the sigio-creating helper thread, which just learned to do
suspend/resume properly in the previous patch.
For time-travel mode, OTOH, just add an event at the specified
time in the future, and that's already sufficient to wake up
the system at that point in time since suspend will just be in
an "endless wait".
For s2idle support also call pm_system_wakeup().
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
This mostly reverts the old commit 3963333fe6 ("uml: cover stubs
with a VMA") which had added a VMA to the existing PTEs. However,
there's no real reason to have the PTEs in the first place and the
VMA cannot be 'fixed' in place, which leads to bugs that userspace
could try to unmap them and be forcefully killed, or such. Also,
there's a bit of an ugly hole in userspace's address space.
Simplify all this: just install the stub code/page at the top of
the (inner) address space, i.e. put it just above TASK_SIZE. The
pages are simply hard-coded to be mapped in the userspace process
we use to implement an mm context, and they're out of reach of the
inner mmap/munmap/mprotect etc. since they're above TASK_SIZE.
Getting rid of the VMA also makes vma_merge() no longer hit one of
the VM_WARN_ON()s there because we installed a VMA while the code
assumes the stack VMA is the first one.
It also removes a lockdep warning about mmap_sem usage since we no
longer have uml_setup_stubs() and thus no longer need to do any
manipulation that would require mmap_sem in activate_mm().
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
The userspace stacks mostly have a stack (and in the case of the
syscall stub we can just set their stack pointer) that points to
the location of the stub data page already.
Rework the stubs to use the stack pointer to derive the start of
the data page, rather than requiring it to be hard-coded.
In the clone stub, also integrate the int3 into the stack remap,
since we really must not use the stack while we remap it.
This prepares for putting the stub at a variable location that's
not part of the normal address space of the userspace processes
running inside the UML machine.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
If the two are mixed up, then it looks as though the parent
returned an error if the child failed (before) the mmap(),
and then the resulting process never gets killed. Fix this
by splitting the child and parent errors, reporting and
using them appropriately.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
In some cases we can get to fix_range_common() with mmap_sem held,
and in others we get there without it being held. For example, we
get there with it held from sys_mprotect(), and without it held
from fork_handler().
Avoid any issues in this and simply defer killing the task until
it runs the next time. Do it on the mm so that another task that
shares the same mm can't continue running afterwards.
Cc: stable@vger.kernel.org
Fixes: 468f65976a ("um: Fix hung task in fix_range_common()")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
powerpc was the last provider of arch_remap() and the last
user of mm-arch-hooks.h.
Since commit 526a9c4a72 ("powerpc/vdso: Provide vdso_remap()"),
arch_remap() hence mm-arch-hooks.h are not used anymore.
Remove them.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Richard Weinberger <richard@nod.at>
In external time-travel mode, where time is controlled via the
controller application socket, interrupt handling is a little
tricky. For example on virtio, the following happens:
* we receive a message (that requires an ACK) on the vhost-user socket
* we add a time-travel event to handle the interrupt
(this causes communication on the time socket)
* we ACK the original vhost-user message
* we then handle the interrupt once the event is triggered
This protocol ensures that the sender of the interrupt only continues
to run in the simulation when the time-travel event has been added.
So far, this was only done in the virtio driver, but it was actually
wrong, because only virtqueue interrupts were handled this way, and
config change interrupts were handled immediately. Additionally, the
messages were actually handled in the real Linux interrupt handler,
but Linux interrupt handlers are part of the simulation and shouldn't
run while there's no time event.
To really do this properly and only handle all kinds of interrupts in
the time-travel event when we are scheduled to run in the simulation,
rework this to plug in to the lower interrupt layers in UML directly:
Add a um_request_irq_tt() function that let's a time-travel aware
driver request an interrupt with an additional timetravel_handler()
that is called outside of the context of the simulation, to handle
the message only. It then adds an event to the time-travel calendar
if necessary, and no "real" Linux code runs outside of the time
simulation.
This also hooks in with suspend/resume properly now, since this new
timetravel_handler() can run while Linux is suspended and interrupts
are disabled, and decide to wake up (or not) the system based on the
message it received. Importantly in this case, it ACKs the message
before the system even resumes and interrupts are re-enabled, thus
allowing the simulation to progress properly.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
This reverts commit 963285b0b4 ("um: support some of
ARCH_HAS_SET_MEMORY"), as it turns out that it's not only not
working (due to um never using the protection bits in the
page tables) but also corrupts the page tables if used on a
non-vmalloc page, since um never allocates proper page tables
for the 'physmem' in the first place.
Fixing all this will take more effort, so for now revert it.
Reported-by: Benjamin Berg <benjamin@sipsolutions.net>
Fixes: 963285b0b4 ("um: support some of ARCH_HAS_SET_MEMORY")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
This reverts commit ef4459a6da ("um: allocate a guard page to
helper threads"), it's broken in multiple ways:
1) the free no longer matches the alloc; and
2) more importantly, the set_memory_ro() causes allocation of
page tables for the normal memory that doesn't have any,
and that later causes corruption and crashes (usually but
not always in vfree()).
We could fix the first bug and use vmalloc() to work around the
second, but set_memory_ro() actually doesn't do anything either
so I'll just revert that as well.
Reported-by: Benjamin Berg <benjamin@sipsolutions.net>
Fixes: ef4459a6da ("um: allocate a guard page to helper threads")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Back a few years ago, ioremap() was added to UML so that we'd
not break the build for everything all the time. However, for
some reason, v1 of the patch got applied, rather than the v2
that returned NULL, which was discussed here:
https://lore.kernel.org/lkml/1495726955-27497-1-git-send-email-logang@deltatee.com/
Fix that now.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Richard Weinberger <richard@nod.at>
- IRQ handling cleanups
- Support for suspend
- Various fixes for UML specific drivers: ubd, vector, xterm
-----BEGIN PGP SIGNATURE-----
iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAl/bzaIWHHJpY2hhcmRA
c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7wYe+EAC2Q2GfncmAAK20vOMbNKH7KsTl
ZncmGENzOayKMdaWTOQ7ycxUXL/IWsJ5ush8M52CxcaBZMWAsCxEJUf8QHtRbOkL
OVXwW+xNNRioM+9J5ZR0PzDJtBbrDY/kqDCaTsM7/07UGKCih9UbYjdsRICC3pdL
sPSLr7tLWXRNRRd2T+TeK0NQvka/VejasWqlPjMRXM8+eQAklVn7CnQxmemdVdU5
BEbwBF/XKNdmXGTTxOBM1+66lYqd+Yf/cd+VCMyQWWax6/oUczhx2/3n5NpxcI50
Da8FPkh49v0YWsMtCY87UhZHhF2c0fIrRQWlT9E12wLKSvdJvHTMqgj5tZevM9xg
uqWYFiS+qLqkVaXu0WHtJfombrK30zFXCJsbsG3H2qN6CFOqiDaFu8sLcG6AVRhu
anKGq9XbL1+DnFk2x5ExETY9wok8OMbF63yorFryp2aHTCfEXctyoqMidBvAxrpR
QVjcF66ej8Y6KaKxc+PNQ5sQhQC7zakQPm/GlFR5lDJHyuugOOoCXdYkrH1cspgm
DgrQbqo+B0G3G9P80vT5TB3M/bNEd8YLxGakBYFPpHfVAH8N5Adwg8r0ZD8n+rRx
fKiUupw6lfinhTq5au84unaSTtvAiiNcjWN1utCh84VhrlCfP2MvnMHNj7E28mpz
q1CBVpgeFxT3Lcp52g==
=t1mQ
-----END PGP SIGNATURE-----
Merge tag 'for-linus-5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml
Pull UML updates from Richard Weinberger:
- IRQ handling cleanups
- Support for suspend
- Various fixes for UML specific drivers: ubd, vector, xterm
* tag 'for-linus-5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml: (32 commits)
um: Fix build w/o CONFIG_PM_SLEEP
um: time-travel: Correct time event IRQ delivery
um: irq/sigio: Support suspend/resume handling of workaround IRQs
um: time-travel: Actually apply "free-until" optimisation
um: chan_xterm: Fix fd leak
um: tty: Fix handling of close in tty lines
um: Monitor error events in IRQ controller
um: allocate a guard page to helper threads
um: support some of ARCH_HAS_SET_MEMORY
um: time-travel: avoid multiple identical propagations
um: Fetch registers only for signals which need them
um: Support suspend to RAM
um: Allow PM with suspend-to-idle
um: time: Fix read_persistent_clock64() in time-travel
um: Simplify os_idle_sleep() and sleep longer
um: Simplify IRQ handling code
um: Remove IRQ_NONE type
um: irq: Reduce irq_reg allocation
um: irq: Clean up and rename struct irq_fd
um: Clean up alarm IRQ chip name
...
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl/YJxsQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpjpyEACBdW+YjenjTbkUPeEXzQgkBkTZUYw3g007
DPcUT1g8PQZXYXlQvBKCvGhhIr7/KVcjepKoowiNQfBNGcIPJTVopW58nzpqAfTQ
goI2WYGn5EKFFKBPvtH04cJD/Wo8muXdxynKtqyZbnGGgZjQxPrE259b8dpHjBSR
6L7HHkk0D1oU/5b6h6Ocpg9mc/0iIUCZylySAYY3eGO0JaVPJaXgZSJZYgHxCHll
Lb+/y/fXdtm/0PmQ3ko0ev54g3yEWqZIX0NsZW1asrButIy+KLzQ2Mz1xFLFDMag
prtIfwb8tzgc4dFPY090C/azjCh5CPpxqYS6FkRwS0p86n6OhkyXrqfily5Hs4/B
NC7CBPBSH/j+NKUK7CYZcpTzTpxPjUr9p0anUdlvMJz8FhTb/3YEEZ1UTeWOeHmk
Yo5SxnFghLeZZeZ1ok6rdymnVa7WEX12SCLGQX31BB2mld0tNbKb4b+FsBF6OUMk
IUaX6OjwDFVRaysC88BQ4hjcIP1HxsViG4/VZDX15gjAAH2Pvb+7tev+lcDcOhjz
TCD4GNFspTFzRhh9nT7oxQ679qCh9G9zHbzuIRewnrS6iqvo5SJQB3dR2yrWZRRH
ySkQFiHpYOlnLJYv0jg9COlGwo2FUdcvKhCvkjQKKBz48rzW/IC0LwKdRQWZDFk3
FKGzP/NBig==
=cadT
-----END PGP SIGNATURE-----
Merge tag 'tif-task_work.arch-2020-12-14' of git://git.kernel.dk/linux-block
Pull TIF_NOTIFY_SIGNAL updates from Jens Axboe:
"This sits on top of of the core entry/exit and x86 entry branch from
the tip tree, which contains the generic and x86 parts of this work.
Here we convert the rest of the archs to support TIF_NOTIFY_SIGNAL.
With that done, we can get rid of JOBCTL_TASK_WORK from task_work and
signal.c, and also remove a deadlock work-around in io_uring around
knowing that signal based task_work waking is invoked with the sighand
wait queue head lock.
The motivation for this work is to decouple signal notify based
task_work, of which io_uring is a heavy user of, from sighand. The
sighand lock becomes a huge contention point, particularly for
threaded workloads where it's shared between threads. Even outside of
threaded applications it's slower than it needs to be.
Roman Gershman <romger@amazon.com> reported that his networked
workload dropped from 1.6M QPS at 80% CPU to 1.0M QPS at 100% CPU
after io_uring was changed to use TIF_NOTIFY_SIGNAL. The time was all
spent hammering on the sighand lock, showing 57% of the CPU time there
[1].
There are further cleanups possible on top of this. One example is
TIF_PATCH_PENDING, where a patch already exists to use
TIF_NOTIFY_SIGNAL instead. Hopefully this will also lead to more
consolidation, but the work stands on its own as well"
[1] https://github.com/axboe/liburing/issues/215
* tag 'tif-task_work.arch-2020-12-14' of git://git.kernel.dk/linux-block: (28 commits)
io_uring: remove 'twa_signal_ok' deadlock work-around
kernel: remove checking for TIF_NOTIFY_SIGNAL
signal: kill JOBCTL_TASK_WORK
io_uring: JOBCTL_TASK_WORK is no longer used by task_work
task_work: remove legacy TWA_SIGNAL path
sparc: add support for TIF_NOTIFY_SIGNAL
riscv: add support for TIF_NOTIFY_SIGNAL
nds32: add support for TIF_NOTIFY_SIGNAL
ia64: add support for TIF_NOTIFY_SIGNAL
h8300: add support for TIF_NOTIFY_SIGNAL
c6x: add support for TIF_NOTIFY_SIGNAL
alpha: add support for TIF_NOTIFY_SIGNAL
xtensa: add support for TIF_NOTIFY_SIGNAL
arm: add support for TIF_NOTIFY_SIGNAL
microblaze: add support for TIF_NOTIFY_SIGNAL
hexagon: add support for TIF_NOTIFY_SIGNAL
csky: add support for TIF_NOTIFY_SIGNAL
openrisc: add support for TIF_NOTIFY_SIGNAL
sh: add support for TIF_NOTIFY_SIGNAL
um: add support for TIF_NOTIFY_SIGNAL
...
This is a cleanup series from Nicholas Piggin, preparing for
later changes. The asm/mmu_context.h header are generalized
and common code moved to asm-gneneric/mmu_context.h.
This saves a bit of code and makes it easier to change in
the future.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAl/Y1LsACgkQmmx57+YA
GNm6kBAAq4/n6nuNnh6b9LhjXaZRG75gEyW7JvHl8KE5wmZHwDHqbwiQgU1b3lUs
JJGbfKqi5ASKxNg6MpfYodmCOqeTUUYG0FUCb6lMhcxxMdfLTLYBvkNd6Y143M+T
boi5b/iz+OUQdNPzlVeSsUEVsD59FIXmP/GhscWZN9VAyf/aLV2MDBIOhrDSJlPo
ObexnP0Iw1E1NRQYDQ6L2dKTHa6XmHyUtw40ABPmd/6MSd1S+D+j3FGg+CYmvnzG
k9g8FbNby8xtUfc0pZV4W/322WN8cDFF9bc04eTDZiAv1bk9lmfvWJ2bWjs3s2qt
RO/suiZEOAta/WUX9vVLgYn2td00ef+AyjNUgffiUfvQfl++fiCDFTGl+MoCLjbh
xQUPcRuRdED7bMKNrC0CcDOSwWEBWVXvkU/szBLDeE1sPjXzGQ80q1Y72k9y961I
mqg7FrHqjZsxT9luXMAzClHNhXAtvehkJZBIdHlFok83EFoTQp48Da4jaDuOOhlq
p/lkPJWOHegIQMWtGwRyGmG1qzil7b/QBNAPLgu9pF4TA+ySRBEB2BOr2jRSkj6N
mNTHQbSYxBoktdt+VhtrSsxR+i8lwlegx+RNRFmKK3VH5da2nfiBaOY7zBQQHxCK
yxQvXvsljSVpfkFKLc/S2nLQL1zTkRfFKV1Xmd3+3owR+EoqM60=
=NpMX
-----END PGP SIGNATURE-----
Merge tag 'asm-generic-mmu-context-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull asm-generic mmu-context cleanup from Arnd Bergmann:
"This is a cleanup series from Nicholas Piggin, preparing for later
changes. The asm/mmu_context.h header are generalized and common code
moved to asm-gneneric/mmu_context.h.
This saves a bit of code and makes it easier to change in the future"
* tag 'asm-generic-mmu-context-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: (25 commits)
h8300: Fix generic mmu_context build
m68k: mmu_context: Fix Sun-3 build
xtensa: use asm-generic/mmu_context.h for no-op implementations
x86: use asm-generic/mmu_context.h for no-op implementations
um: use asm-generic/mmu_context.h for no-op implementations
sparc: use asm-generic/mmu_context.h for no-op implementations
sh: use asm-generic/mmu_context.h for no-op implementations
s390: use asm-generic/mmu_context.h for no-op implementations
riscv: use asm-generic/mmu_context.h for no-op implementations
powerpc: use asm-generic/mmu_context.h for no-op implementations
parisc: use asm-generic/mmu_context.h for no-op implementations
openrisc: use asm-generic/mmu_context.h for no-op implementations
nios2: use asm-generic/mmu_context.h for no-op implementations
nds32: use asm-generic/mmu_context.h for no-op implementations
mips: use asm-generic/mmu_context.h for no-op implementations
microblaze: use asm-generic/mmu_context.h for no-op implementations
m68k: use asm-generic/mmu_context.h for no-op implementations
ia64: use asm-generic/mmu_context.h for no-op implementations
hexagon: use asm-generic/mmu_context.h for no-op implementations
csky: use asm-generic/mmu_context.h for no-op implementations
...
Core:
- Consolidation and robustness changes for irq time accounting
- Cleanup and consolidation of irq stats
- Remove the fasteoi IPI flow which has been proved useless
- Provide an interface for converting legacy interrupt mechanism into
irqdomains
Drivers:
The rare event of not having completely new chip driver code, just new
DT bindings and extensions of existing drivers to accomodate new
variants!
- Preliminary support for managed interrupts on platform devices
- Correctly identify allocation of MSIs proxyied by another device
- Generalise the Ocelot support to new SoCs
- Improve GICv4.1 vcpu entry, matching the corresponding KVM optimisation
- Work around spurious interrupts on Qualcomm PDC
- Random fixes and cleanups
Thanks,
tglx
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl/YwZgTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoW4CD/90rTi1OQrMe3nb5okVjUZmktz/K3BN
Cl5+evFiXiNoH+yJSMIVP+8eMAtBH6RgoaD0EUtSYmgzb9h/JRRQYwtPxobXcMb2
2xcWyLPJkVJL431JKNM8BBRYjLA2VnQ6Ia+Kx3BxqpgKXn5+cEMh1dwIy27Ll2rj
+2NHAQe1sHL7o/KcCDhYqbVIDjw5K/d7YPwjEuPeEoNv1DOxrOCdCEfgFN0jBtRE
CoaRTBskeAaHIzHNp47Mxyz43g4tA/D8kB68X0OjpEykVkPUbgNK1FHSwaPbIsFT
FTSPU3zg8Q6DZ+RGyjNJykIFgUbirlJxARk2c6Ct8Kc3DN6K1jQt4EsU7CXRCc98
BTBjUNeFeNj3irZ4GHhyMKOQJCA1Z5nCRfBUGiW6gK8183us3BLfH5DM1zEsAYUh
DCp+UKsLuXhbB80EWq7kl82/2mNGZ8En8EerE6XJA7Z3JN8FplOHEuLezYYzwzbb
RIes971Vc50J2u2Wf/M2c3PDz3D/4FzfwUeA4LJfTnmOL09RYZ8CsqSckpx4ku/F
XiBnjwtGEpDXWJ8z13DC7yONrxFGByV19+sqHTBlub5DmIs0gXjhC0dKAPAruUIS
iCC+Vx6xLgOpTDu8shFsjibbi9Hb6vuZrF2Te+WR5Rf7d80C0J4b5K5PS4daUjr6
IuD2tz+3CtPjHw==
=iytv
-----END PGP SIGNATURE-----
Merge tag 'irq-core-2020-12-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
"Generic interrupt and irqchips subsystem updates. Unusually, there is
not a single completely new irq chip driver, just new DT bindings and
extensions of existing drivers to accomodate new variants!
Core:
- Consolidation and robustness changes for irq time accounting
- Cleanup and consolidation of irq stats
- Remove the fasteoi IPI flow which has been proved useless
- Provide an interface for converting legacy interrupt mechanism into
irqdomains
Drivers:
- Preliminary support for managed interrupts on platform devices
- Correctly identify allocation of MSIs proxyied by another device
- Generalise the Ocelot support to new SoCs
- Improve GICv4.1 vcpu entry, matching the corresponding KVM
optimisation
- Work around spurious interrupts on Qualcomm PDC
- Random fixes and cleanups"
* tag 'irq-core-2020-12-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits)
irqchip/qcom-pdc: Fix phantom irq when changing between rising/falling
driver core: platform: Add devm_platform_get_irqs_affinity()
ACPI: Drop acpi_dev_irqresource_disabled()
resource: Add irqresource_disabled()
genirq/affinity: Add irq_update_affinity_desc()
irqchip/gic-v3-its: Flag device allocation as proxied if behind a PCI bridge
irqchip/gic-v3-its: Tag ITS device as shared if allocating for a proxy device
platform-msi: Track shared domain allocation
irqchip/ti-sci-intr: Fix freeing of irqs
irqchip/ti-sci-inta: Fix printing of inta id on probe success
drivers/irqchip: Remove EZChip NPS interrupt controller
Revert "genirq: Add fasteoi IPI flow"
irqchip/hip04: Make IPIs use handle_percpu_devid_irq()
irqchip/bcm2836: Make IPIs use handle_percpu_devid_irq()
irqchip/armada-370-xp: Make IPIs use handle_percpu_devid_irq()
irqchip/gic, gic-v3: Make SGIs use handle_percpu_devid_irq()
irqchip/ocelot: Add support for Jaguar2 platforms
irqchip/ocelot: Add support for Serval platforms
irqchip/ocelot: Add support for Luton platforms
irqchip/ocelot: prepare to support more SoC
...
- Preliminary support for managed interrupts on platform devices
- Correctly identify allocation of MSIs proxyied by another device
- Remove the fasteoi IPI flow which has been proved useless
- Generalise the Ocelot support to new SoCs
- Improve GICv4.1 vcpu entry, matching the corresponding KVM optimisation
- Work around spurious interrupts on Qualcomm PDC
- Random fixes and cleanups
-----BEGIN PGP SIGNATURE-----
iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAl/Uxq8PHG1hekBrZXJu
ZWwub3JnAAoJECPQ0LrRPXpDoW0P/0ZMDvFPxrfnJD46exUgUOPuuFF8jZxAlxD8
7UExqar7u6yX7bbq394jPgtOOxldDagfCx/jCXgb9ja7DK5EHKRcrfjaDT8knHi2
Keg5RaRMRi9TVltvWQTxAkXwSv0Atl881qqsndPeZCez0GNZp+HB34s+rNkZwBOu
MBrWihMQOSv5QE6milsNc7HXLSHM1eLZ7Y2XgumNtKrIGEX9yZI7qwdMofwP8Za3
ayMOvc1WAWaTJI7Mg5ac1yTCVbqLmRHhCtws6c6DMgaRu6SI0itmbpQzkDuJJIe3
k9h4KQPaKAFcQsoo3GV0MKTMm63eq82XT3CAdv+htYRY1z95D2+nzNK+mJtsGptX
gJ2zeJkUb4u+yVtNguL9qjo5ssCXV/6IybJxv6baaEFnSwQMUwqa066NdxmtqfIe
1BOWnc153a7SRbQ34M9/llje+v8YJbueGMS2RFR2LQ6IjjpaHsXh+YCZokfA/kdk
zGbOUD5WWFtFD1T3UoaJ4gFt+pzHjNqym4CcEj4S1Vf5y+POUkNmC+GYK+xfm2Fp
WJMbdIUxJhHFRD9L1ShtfAVUSbp712VOOdILp9rYAkOdqfb51BVUiMUP++s2dGp1
ZIT78qt7kTKT1CxbDdFAjzsi7RoMqdSGYgKmG4sVprELeZnFwq47nBkBr8XEQ1TT
0ccEUOY8
=7Z24
-----END PGP SIGNATURE-----
Merge tag 'irqchip-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core
Pull irqchip updates for 5.11 from Marc Zyngier:
- Preliminary support for managed interrupts on platform devices
- Correctly identify allocation of MSIs proxyied by another device
- Remove the fasteoi IPI flow which has been proved useless
- Generalise the Ocelot support to new SoCs
- Improve GICv4.1 vcpu entry, matching the corresponding KVM optimisation
- Work around spurious interrupts on Qualcomm PDC
- Random fixes and cleanups
Link: https://lore.kernel.org/r/20201212135626.1479884-1-maz@kernel.org
- Consolidate all kmap_atomic() internals into a generic implementation
which builds the base for the kmap_local() API and make the
kmap_atomic() interface wrappers which handle the disabling/enabling of
preemption and pagefaults.
- Switch the storage from per-CPU to per task and provide scheduler
support for clearing mapping when scheduling out and restoring them
when scheduling back in.
- Merge the migrate_disable/enable() code, which is also part of the
scheduler pull request. This was required to make the kmap_local()
interface available which does not disable preemption when a mapping
is established. It has to disable migration instead to guarantee that
the virtual address of the mapped slot is the same accross preemption.
- Provide better debug facilities: guard pages and enforced utilization
of the mapping mechanics on 64bit systems when the architecture allows
it.
- Provide the new kmap_local() API which can now be used to cleanup the
kmap_atomic() usage sites all over the place. Most of the usage sites
do not require the implicit disabling of preemption and pagefaults so
the penalty on 64bit and 32bit non-highmem systems is removed and quite
some of the code can be simplified. A wholesale conversion is not
possible because some usage depends on the implicit side effects and
some need to be cleaned up because they work around these side effects.
The migrate disable side effect is only effective on highmem systems
and when enforced debugging is enabled. On 64bit and 32bit non-highmem
systems the overhead is completely avoided.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl/XyQwTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoUolD/9+R+BX96fGir+I8rG9dc3cbLw5meSi
0I/Nq3PToZMs2Iqv50DsoaPYHHz/M6fcAO9LRIgsE9jRbnY93GnsBM0wU9Y8yQaT
4wUzOG5WHaLDfqIkx/CN9coUl458oEiwOEbn79A2FmPXFzr7IpkufnV3ybGDwzwP
p73bjMJMPPFrsa9ig87YiYfV/5IAZHi82PN8Cq1v4yNzgXRP3Tg6QoAuCO84ZnWF
RYlrfKjcJ2xPdn+RuYyXolPtxr1hJQ0bOUpe4xu/UfeZjxZ7i1wtwLN9kWZe8CKH
+x4Lz8HZZ5QMTQ9sCHOLtKzu2MceMcpISzoQH4/aFQCNMgLn1zLbS790XkYiQCuR
ne9Cua+IqgYfGMG8cq8+bkU9HCNKaXqIBgPEKE/iHYVmqzCOqhW5Cogu4KFekf6V
Wi7pyyUdX2en8BAWpk5NHc8de9cGcc+HXMq2NIcgXjVWvPaqRP6DeITERTZLJOmz
XPxq5oPLGl7wdm7z+ICIaNApy8zuxpzb6sPLNcn7l5OeorViORlUu08AN8587wAj
FiVjp6ZYomg+gyMkiNkDqFOGDH5TMENpOFoB0hNNEyJwwS0xh6CgWuwZcv+N8aPO
HuS/P+tNANbD8ggT4UparXYce7YCtgOf3IG4GA3JJYvYmJ6pU+AZOWRoDScWq4o+
+jlfoJhMbtx5Gg==
=n71I
-----END PGP SIGNATURE-----
Merge tag 'core-mm-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull kmap updates from Thomas Gleixner:
"The new preemtible kmap_local() implementation:
- Consolidate all kmap_atomic() internals into a generic
implementation which builds the base for the kmap_local() API and
make the kmap_atomic() interface wrappers which handle the
disabling/enabling of preemption and pagefaults.
- Switch the storage from per-CPU to per task and provide scheduler
support for clearing mapping when scheduling out and restoring them
when scheduling back in.
- Merge the migrate_disable/enable() code, which is also part of the
scheduler pull request. This was required to make the kmap_local()
interface available which does not disable preemption when a
mapping is established. It has to disable migration instead to
guarantee that the virtual address of the mapped slot is the same
across preemption.
- Provide better debug facilities: guard pages and enforced
utilization of the mapping mechanics on 64bit systems when the
architecture allows it.
- Provide the new kmap_local() API which can now be used to cleanup
the kmap_atomic() usage sites all over the place. Most of the usage
sites do not require the implicit disabling of preemption and
pagefaults so the penalty on 64bit and 32bit non-highmem systems is
removed and quite some of the code can be simplified. A wholesale
conversion is not possible because some usage depends on the
implicit side effects and some need to be cleaned up because they
work around these side effects.
The migrate disable side effect is only effective on highmem
systems and when enforced debugging is enabled. On 64bit and 32bit
non-highmem systems the overhead is completely avoided"
* tag 'core-mm-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits)
ARM: highmem: Fix cache_is_vivt() reference
x86/crashdump/32: Simplify copy_oldmem_page()
io-mapping: Provide iomap_local variant
mm/highmem: Provide kmap_local*
sched: highmem: Store local kmaps in task struct
x86: Support kmap_local() forced debugging
mm/highmem: Provide CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP
mm/highmem: Provide and use CONFIG_DEBUG_KMAP_LOCAL
microblaze/mm/highmem: Add dropped #ifdef back
xtensa/mm/highmem: Make generic kmap_atomic() work correctly
mm/highmem: Take kmap_high_get() properly into account
highmem: High implementation details and document API
Documentation/io-mapping: Remove outdated blurb
io-mapping: Cleanup atomic iomap
mm/highmem: Remove the old kmap_atomic cruft
highmem: Get rid of kmap_types.h
xtensa/mm/highmem: Switch to generic kmap atomic
sparc/mm/highmem: Switch to generic kmap atomic
powerpc/mm/highmem: Switch to generic kmap atomic
nds32/mm/highmem: Switch to generic kmap atomic
...
Lockdep (on 5.10-rc) points out that we're delivering IRQs while IRQs
are not even enabled, which clearly shouldn't happen. Defer the time
event IRQ delivery until they actually are enabled.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
If the sigio workaround needed to be applied to a file descriptor,
set_irq_wake() wouldn't work for it since it would get polled by
the thread instead of causing SIGIO, and thus could never really
cause a wakeup, since the thread notification FD wasn't marked as
being able to wake up the system.
Fix this by marking the thread's notification FD explicitly as a
wake source FD, i.e. not suppressing SIGIO for it in suspend. In
order to not cause spurious wakeups, we then need to remove all
FDs that shouldn't wake up the system from the polling thread. In
order to do this, add unlocked versions of ignore_sigio_fd() and
add_sigio_fd() (nothing else is happening in suspend, so this is
fine), and also modify ignore_sigio_fd() to return -ENOENT if the
FD wasn't originally in there. This doesn't matter because nothing
else currently checks the return value, but the irq code needs to
know which ones to restore the workaround for.
All told, this lets us use a timerfd for the RTC clock in the next
patch, which doesn't send SIGIO.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
We've been running into stack overflows in helper threads
corrupting memory (e.g. because somebody put printf() or
os_info() there), so to avoid those causing hard-to-debug
issues later on, allocate a guard page for helper thread
stacks and mark it read-only.
Unfortunately, the crash dump at that point is useless as
the stack tracer will try to backtrace the *kernel* thread,
not the helper thread, but at least we don't survive to a
random issue caused by corruption.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
For now, only support set_memory_ro()/rw() which we need for
the stack protection in the next patch.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
With all the previous bits in place, we can now also support
suspend to RAM, in the sense that everything is suspended,
not just most, including userspace, processes like in s2idle.
Since um_idle_sleep() now waits forever, we can simply call
that to "suspend" the system.
As before, you can wake it up using SIGUSR1 since we're just
in a pause() call that only needs to return.
In order to implement selective resume from certain devices,
and not have any arbitrary device interrupt wake up, suspend
interrupts by removing SIGIO notification (O_ASYNC) from all
the FDs that are not supposed to wake up the system. However,
swap out the handler so we don't actually handle the SIGIO as
an interrupt.
Since we're in pause(), the mere act of receiving SIGIO wakes
us up, and then after things have been restored enough, re-set
O_ASYNC for all previously suspended FDs, reinstall the proper
SIGIO handler, and send SIGIO to self to process anything that
might now be pending.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
In order to be able to experiment with suspend in UML, add the
minimal work to be able to suspend (s2idle) an instance of UML,
and be able to wake it back up from that state with the USR1
signal sent to the main UML process.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
There really is no reason to pass the amount of time we should
sleep, especially since it's just hard-coded to one second.
Additionally, one second isn't really all that long, and as we
are expecting to be woken up by a signal, we can sleep longer
and avoid doing some work every second, so replace the current
clock_nanosleep() with just an empty select() that can _only_
be woken up by a signal.
We can also remove the deliver_alarm() since we don't need to
do that when we got e.g. SIGIO that woke us up, and if we got
SIGALRM the signal handler will actually (have) run, so it's
just unnecessary extra work.
Similarly, in time-travel mode, just program the wakeup event
from idle to be S64_MAX, which is basically the most you could
ever simulate to. Of course, you should already have an event
in the list that's earlier and will cause a wakeup, normally
that's the regular timer interrupt, though in suspend it may
(later) also be an RTC event. Since actually getting to this
point would be a bug and you can't ever get out again, panic()
on it in the time control code.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
We don't actually use this in um_request_irq(), so it can
never be assigned. It's also not clear what that would be
useful for, so just remove it.
This results in quite a number of cleanups, all the way to
removing the "SIGIO on close" startup check, since the data
it assigns (pty_close_sigio) is not used anymore.
While at it, also make this an enum so we get a minimum of
type checking, and remove the IRQ_NONE hack in virtio since
we now no longer have the name twice.
Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
We don't need an array of 4 entries to capture three and the
name 'MAX_IRQ_TYPE' really gets confusing as well. Remove it
and add a correct NUM_IRQ_TYPES, and use that correctly.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
This really shouldn't be called "irq_fd" since it doesn't
carry an fd. Well, it used to, apparently, but that struct
member is unused.
Rename it to "irq_reg" since it more accurately reflects a
registered interrupt, and remove the unused 'next' and 'fd'
members from the struct as well.
While at it, also move it to the implementation, it's not
used anywhere else, and the header file is shared with the
userspace components.
Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
This separates the devices, which is better for debug and for
later suspend/resume and wakeup support, since there we'll
have to separate which IRQs can wake up the system and which
cannot.
Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
It's cumbersome and error-prone to keep adding fixed IRQ numbers,
and for proper device wakeup support for the virtio/vhost-user
support we need to have different IRQs for each device. Even if
in theory two IRQs (with and without wake) might be sufficient,
it's much easier to reason about it when we have dynamic number
assignment. It also makes it easier to add new devices that may
dynamically exist or depending on the configuration, etc.
Add support for this, up to 64 IRQs (the same limit as epoll FDs
we have right now). Since it's not easy to port all the existing
places to dynamic allocation (some data is statically initialized)
keep the low numbers are reserved for the existing hard-coded IRQ
numbers.
Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Wire up TIF_NOTIFY_SIGNAL handling for um.
Cc: linux-um@lists.infradead.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Commit b2b29d6d01 ("mm: account PMD tables like PTE tables") uncovered
a bug in uml, we forgot to call the destructor.
While we are here, give x a sane name.
Reported-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Co-developed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Richard Weinberger <richard@nod.at>
Tested-by: Christopher Obbard <chris.obbard@collabora.com>
The header is not longer used and on alpha, ia64, openrisc, parisc and um
it was completely unused anyway as these architectures have no highmem
support.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20201103095858.422094352@linutronix.de
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: linux-um@lists.infradead.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Use a more generic form for __section that requires quotes to avoid
complications with clang and gcc differences.
Remove the quote operator # from compiler_attributes.h __section macro.
Convert all unquoted __section(foo) uses to quoted __section("foo").
Also convert __attribute__((section("foo"))) uses to __section("foo")
even if the __attribute__ has multiple list entry forms.
Conversion done using the script at:
https://lore.kernel.org/lkml/75393e5ddc272dc7403de74d645e6c6e0f4e70eb.camel@perches.com/2-convert_section.pl
Signed-off-by: Joe Perches <joe@perches.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@gooogle.com>
Reviewed-by: Miguel Ojeda <ojeda@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There was a request to preprocess the module linker script like we
do for the vmlinux one. (https://lkml.org/lkml/2020/8/21/512)
The difference between vmlinux.lds and module.lds is that the latter
is needed for external module builds, thus must be cleaned up by
'make mrproper' instead of 'make clean'. Also, it must be created
by 'make modules_prepare'.
You cannot put it in arch/$(SRCARCH)/kernel/, which is cleaned up by
'make clean'. I moved arch/$(SRCARCH)/kernel/module.lds to
arch/$(SRCARCH)/include/asm/module.lds.h, which is included from
scripts/module.lds.S.
scripts/module.lds is fine because 'make clean' keeps all the
build artifacts under scripts/.
You can add arch-specific sections in <asm/module.lds.h>.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Tested-by: Jessica Yu <jeyu@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Palmer Dabbelt <palmerdabbelt@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Jessica Yu <jeyu@kernel.org>
Most architectures define pgd_free() as a wrapper for free_page().
Provide a generic version in asm-generic/pgalloc.h and enable its use for
most architectures.
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k]
Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Matthew Wilcox <willy@infradead.org>
Link: http://lkml.kernel.org/r/20200627143453.31835-7-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
For most architectures that support >2 levels of page tables,
pmd_alloc_one() is a wrapper for __get_free_pages(), sometimes with
__GFP_ZERO and sometimes followed by memset(0) instead.
More elaborate versions on arm64 and x86 account memory for the user page
tables and call to pgtable_pmd_page_ctor() as the part of PMD page
initialization.
Move the arm64 version to include/asm-generic/pgalloc.h and use the
generic version on several architectures.
The pgtable_pmd_page_ctor() is a NOP when ARCH_ENABLE_SPLIT_PMD_PTLOCK is
not enabled, so there is no functional change for most architectures
except of the addition of __GFP_ACCOUNT for allocation of user page
tables.
The pmd_free() is a wrapper for free_page() in all the cases, so no
functional change here.
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Link: http://lkml.kernel.org/r/20200627143453.31835-5-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
All architectures define pte_index() as
(address >> PAGE_SHIFT) & (PTRS_PER_PTE - 1)
and all architectures define pte_offset_kernel() as an entry in the array
of PTEs indexed by the pte_index().
For the most architectures the pte_offset_kernel() implementation relies
on the availability of pmd_page_vaddr() that converts a PMD entry value to
the virtual address of the page containing PTEs array.
Let's move x86 definitions of the PTE accessors to the generic place in
<linux/pgtable.h> and then simply drop the respective definitions from the
other architectures.
The architectures that didn't provide pmd_page_vaddr() are updated to have
that defined.
The generic implementation of pte_offset_kernel() can be overridden by an
architecture and alpha makes use of this because it has special ordering
requirements for its version of pte_offset_kernel().
[rppt@linux.ibm.com: v2]
Link: http://lkml.kernel.org/r/20200514170327.31389-11-rppt@kernel.org
[rppt@linux.ibm.com: update]
Link: http://lkml.kernel.org/r/20200514170327.31389-12-rppt@kernel.org
[rppt@linux.ibm.com: update]
Link: http://lkml.kernel.org/r/20200514170327.31389-13-rppt@kernel.org
[akpm@linux-foundation.org: fix x86 warning]
[sfr@canb.auug.org.au: fix powerpc build]
Link: http://lkml.kernel.org/r/20200607153443.GB738695@linux.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vincent Chen <deanbo422@gmail.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/20200514170327.31389-10-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The include/linux/pgtable.h is going to be the home of generic page table
manipulation functions.
Start with moving asm-generic/pgtable.h to include/linux/pgtable.h and
make the latter include asm/pgtable.h.
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vincent Chen <deanbo422@gmail.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/20200514170327.31389-3-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This seems to lead to some crazy include loops when using
asm-generic/cacheflush.h on more architectures, so leave it to the arch
header for now.
[hch@lst.de: fix warning]
Link: http://lkml.kernel.org/r/20200520173520.GA11199@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Will Deacon <will@kernel.org>
Cc: Nick Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Link: http://lkml.kernel.org/r/20200515143646.3857579-7-hch@lst.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Two independent changes here ended up going into the tree
one after another, without a necessary rename, fix that.
Reported-by: Thomas Meyer <thomas@m3y3r.de>
Fixes: f185063bff ("um: Move timer-internal.h to non-shared")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Reviewed-by: Brendan Higgins <brendanhiggins@google.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Currently there are many platforms that dont enable ARCH_HAS_PTE_SPECIAL
but required to define quite similar fallback stubs for special page
table entry helpers such as pte_special() and pte_mkspecial(), as they
get build in generic MM without a config check. This creates two
generic fallback stub definitions for these helpers, eliminating much
code duplication.
mips platform has a special case where pte_special() and pte_mkspecial()
visibility is wider than what ARCH_HAS_PTE_SPECIAL enablement requires.
This restricts those symbol visibility in order to avoid redefinitions
which is now exposed through this new generic stubs and subsequent build
failure. arm platform set_pte_at() definition needs to be moved into a
C file just to prevent a build failure.
[anshuman.khandual@arm.com: use defined(CONFIG_ARCH_HAS_PTE_SPECIAL) in mips per Thomas]
Link: http://lkml.kernel.org/r/1583851924-21603-1-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Guo Ren <guoren@kernel.org> [csky]
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k]
Acked-by: Stafford Horne <shorne@gmail.com> [openrisc]
Acked-by: Helge Deller <deller@gmx.de> [parisc]
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Sam Creasey <sammy@sammy.net>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Burton <paulburton@kernel.org>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Vincent Chen <deanbo422@gmail.com>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Chris Zankel <chris@zankel.net>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Link: http://lkml.kernel.org/r/1583802551-15406-1-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In external or inf-cpu time-travel mode, ndelay/udelay currently
just waste CPU time since the simulation time doesn't advance.
Implement them properly in this case.
Note that the "if (time_travel_mode == ...)" parts compile out
if CONFIG_UML_TIME_TRAVEL_SUPPORT isn't set, time_travel_mode is
defined to TT_MODE_OFF in that case.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
This implements synchronized time-travel mode which - using a special
application on a unix socket - lets multiple machines take part in a
time-travelling simulation together.
The protocol for the unix domain socket is defined in the new file
include/uapi/linux/um_timetravel.h.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Instead of tracking all the various timer configurations,
modify the time-travel mode to have an event scheduler and
use a timer event on the scheduler to handle the different
timer configurations.
This doesn't change the function right now, but it prepares
the code for having different kinds of events in the future
(i.e. interrupts coming from other devices that are part of
co-simulation.)
While at it, also move time_travel_sleep() to time.c to
reduce the externally visible API surface.
Also, we really should mark time-travel as incompatible with
SMP, even if UML doesn't support SMP yet.
Finally, I noticed a bug while developing this - if we move
time forward due to consuming time while reading the clock,
we might move across the next event and that would cause us
to go backward in time when we then handle that event. Fix
that by invoking the whole event machine in this case, but
in order to simplify this, make reading the clock only cost
something when interrupts are not disabled. Otherwise, we'd
have to hook into the interrupt delivery machinery etc. and
that's somewhat intrusive.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
This file isn't really shared, it's only used on the kernel side,
not on the user side. Remove the include from the user-side and
move the file to a better place.
While at it, rename it to time-internal.h, it's not really just
timers but all kinds of things related to timekeeping.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Unfortunately, GCC 9.1 is expected to be be released without support for
MPX. This means that there was only a relatively small window where
folks could have ever used MPX. It failed to gain wide adoption in the
industry, and Linux was the only mainstream OS to ever support it widely.
Support for the feature may also disappear on future processors.
This set completes the process that we started during the 5.4 merge window.
-----BEGIN PGP SIGNATURE-----
iQIcBAABCAAGBQJeK1/pAAoJEGg1lTBwyZKwgC8QAIiVn1d7A9Uj/WpnpgfCChCZ
9XiV6Ak999qD9fbAcrgNfPjieaD4mtokocSRVJuRgJu5iLnIJCINlozLPe4yVl7P
7zebnxkLq0CIA8d56bEUoFlC0J+oWYlDVQePZzNQsSk5KHVGXVLpF6U4vDVzZeQy
cprgvdeY+ehB7G6IIo0MWTg5ylKYAsOAyVvK8NIGpKY2k6/YqCnsptnsVE7bvlHy
TrEOiUWLv+hh0bMkZdP1PwKQKEuMO/IZly0HtviFbMN7T4TB1spfg7ELoBucEq3T
s4EVbYRe+nIE4tuEAveaX3CgxJek8cY5MlticskdaKSEACBwabdOF55qsZy0u+WA
PYC4iUIXfbOH8OgieKWtGX4IuSkRYdQ2nP4BOpe4ZX4+zvU7zOCIyVSKRrwkX8cc
ADtWI5FAtB36KCgUuWnHGHNZpOxPTbTLBuBataFY4Q2uBNJEBJpscZ5H9ObtyGFU
ZjlzqFnM0nFNDKEI1EEtv9jLzgZTU1RQ46s7EFeSeEQ2/s9wJ3+s5sBlVbljsmus
o658bLOEaRWC/aF15dgmEXW9GAO6uifNdmbzGnRn7oEMYyFQPTWbZvi1zGz58QaG
Y6WTtigVtsSrHS4wpYd+p+n1W06VnB6J3BpBM4G1VQv1Vm0dNd1tUOfkqOzPjg7c
33Itmsz2LaW1mb67GlgZ
=g4cC
-----END PGP SIGNATURE-----
Merge tag 'mpx-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/daveh/x86-mpx
Pull x86 MPX removal from Dave Hansen:
"MPX requires recompiling applications, which requires compiler
support. Unfortunately, GCC 9.1 is expected to be be released without
support for MPX. This means that there was only a relatively small
window where folks could have ever used MPX. It failed to gain wide
adoption in the industry, and Linux was the only mainstream OS to ever
support it widely.
Support for the feature may also disappear on future processors.
This set completes the process that we started during the 5.4 merge
window when the MPX prctl()s were removed. XSAVE support is left in
place, which allows MPX-using KVM guests to continue to function"
* tag 'mpx-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/daveh/x86-mpx:
x86/mpx: remove MPX from arch/x86
mm: remove arch_bprm_mm_init() hook
x86/mpx: remove bounds exception code
x86/mpx: remove build infrastructure
x86/alternatives: add missing insn.h include
These are updates to device drivers and file systems that for some reason
or another were not included in the kernel in the previous y2038 series.
I've gone through all users of time_t again to make sure the kernel is
in a long-term maintainable state, replacing all remaining references
to time_t with safe alternatives.
Some related parts of the series were picked up into the nfsd, xfs,
alsa and v4l2 trees. A final set of patches in linux-mm removes the now
unused time_t/timeval/timespec types and helper functions after all five
branches are merged for linux-5.6, ensuring that no new users get merged.
As a result, linux-5.6, or my backport of the patches to 5.4 [1], should
be the first release that can serve as a base for a 32-bit system designed
to run beyond year 2038, with a few remaining caveats:
- All user space must be compiled with a 64-bit time_t, which will be
supported in the coming musl-1.2 and glibc-2.32 releases, along with
installed kernel headers from linux-5.6 or higher.
- Applications that use the system call interfaces directly need to be
ported to use the time64 syscalls added in linux-5.1 in place of the
existing system calls. This impacts most users of futex() and seccomp()
as well as programming languages that have their own runtime environment
not based on libc.
- Applications that use a private copy of kernel uapi header files or
their contents may need to update to the linux-5.6 version, in
particular for sound/asound.h, xfs/xfs_fs.h, linux/input.h,
linux/elfcore.h, linux/sockios.h, linux/timex.h and linux/can/bcm.h.
- A few remaining interfaces cannot be changed to pass a 64-bit time_t
in a compatible way, so they must be configured to use CLOCK_MONOTONIC
times or (with a y2106 problem) unsigned 32-bit timestamps. Most
importantly this impacts all users of 'struct input_event'.
- All y2038 problems that are present on 64-bit machines also apply to
32-bit machines. In particular this affects file systems with on-disk
timestamps using signed 32-bit seconds: ext4 with ext3-style small
inodes, ext2, xfs (to be fixed soon) and ufs.
Changes since v1 [2]:
- Add Acks I received
- Rebase to v5.5-rc1, dropping patches that got merged already
- Add NFS, XFS and the final three patches from another series
- Rewrite etnaviv patches
- Add one late revert to avoid an etnaviv regression
[1] https://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground.git/log/?h=y2038-endgame
[2] https://lore.kernel.org/lkml/20191108213257.3097633-1-arnd@arndb.de/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJeMYy3AAoJEGCrR//JCVInEGwP/0R+S+ok7vw9OdLVT0lFl07D
IcVabgOWf24imN7m7L7Mlt3nDfxIT4tMpiAXq7eMO3spcyViG18O2LXdSQ4/7QBp
+BlhoMjOP9w34Jyd7mnkFr4vqQALvfIqkS8rFObDtDub2Rfj9PC36MRMIu8BPXlv
RK8bigwJeH/DV38yc5/JeUcD+WuewYLsK9XPWN+4yB4vgGsNU3ZQQ6nnzbR3hMsN
DN8WZ68Y7IBs0Kyxkf+s2zmRXtCa2RiFg/2TUsk5olVAJVaenvte69hq5RSbg1vW
vLi6K8cBoPWL59nqCzcNE+TUhSUg3LOj/a/KWyl76yovz7AlJaNjssOf8ZjHw6sL
MhQqz3hXTxiJDS2Jvbf1yojiYGlzrq/gqcRFGe9jPcZdieMc4/yZCx60G/Exa5Pu
YdMcqMyDWPFyUAFQNWEF59HPheOdj6tb1KpJ6bwgCo3P7QqhLrU4z9w3Py4/ZfBO
4sWcWteSsD6MN/ADJ2WQ56nNxzM2AvkeVJKcF6FCkdngXX9T0GExmZz7SqB5Du99
9lNjIiD5E+LBa/Swo/7n49aYa8x06V1pmHYTZVh9Wkl+CZiO21umezQFrWsfaMTp
xt3c6pFdMG5xNMGpreTAXOmf2R+T6O8IO2qQq/TYjzqOLH7QC830P7avkmml+cK1
LjOBE2TfSeO8Ru1dXV4t
=wx0A
-----END PGP SIGNATURE-----
Merge tag 'y2038-drivers-for-v5.6-signed' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground
Pull y2038 updates from Arnd Bergmann:
"Core, driver and file system changes
These are updates to device drivers and file systems that for some
reason or another were not included in the kernel in the previous
y2038 series.
I've gone through all users of time_t again to make sure the kernel is
in a long-term maintainable state, replacing all remaining references
to time_t with safe alternatives.
Some related parts of the series were picked up into the nfsd, xfs,
alsa and v4l2 trees. A final set of patches in linux-mm removes the
now unused time_t/timeval/timespec types and helper functions after
all five branches are merged for linux-5.6, ensuring that no new users
get merged.
As a result, linux-5.6, or my backport of the patches to 5.4 [1],
should be the first release that can serve as a base for a 32-bit
system designed to run beyond year 2038, with a few remaining caveats:
- All user space must be compiled with a 64-bit time_t, which will be
supported in the coming musl-1.2 and glibc-2.32 releases, along
with installed kernel headers from linux-5.6 or higher.
- Applications that use the system call interfaces directly need to
be ported to use the time64 syscalls added in linux-5.1 in place of
the existing system calls. This impacts most users of futex() and
seccomp() as well as programming languages that have their own
runtime environment not based on libc.
- Applications that use a private copy of kernel uapi header files or
their contents may need to update to the linux-5.6 version, in
particular for sound/asound.h, xfs/xfs_fs.h, linux/input.h,
linux/elfcore.h, linux/sockios.h, linux/timex.h and
linux/can/bcm.h.
- A few remaining interfaces cannot be changed to pass a 64-bit
time_t in a compatible way, so they must be configured to use
CLOCK_MONOTONIC times or (with a y2106 problem) unsigned 32-bit
timestamps. Most importantly this impacts all users of 'struct
input_event'.
- All y2038 problems that are present on 64-bit machines also apply
to 32-bit machines. In particular this affects file systems with
on-disk timestamps using signed 32-bit seconds: ext4 with
ext3-style small inodes, ext2, xfs (to be fixed soon) and ufs"
[1] https://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground.git/log/?h=y2038-endgame
* tag 'y2038-drivers-for-v5.6-signed' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground: (21 commits)
Revert "drm/etnaviv: reject timeouts with tv_nsec >= NSEC_PER_SEC"
y2038: sh: remove timeval/timespec usage from headers
y2038: sparc: remove use of struct timex
y2038: rename itimerval to __kernel_old_itimerval
y2038: remove obsolete jiffies conversion functions
nfs: fscache: use timespec64 in inode auxdata
nfs: fix timstamp debug prints
nfs: use time64_t internally
sunrpc: convert to time64_t for expiry
drm/etnaviv: avoid deprecated timespec
drm/etnaviv: reject timeouts with tv_nsec >= NSEC_PER_SEC
drm/msm: avoid using 'timespec'
hfs/hfsplus: use 64-bit inode timestamps
hostfs: pass 64-bit timestamps to/from user space
packet: clarify timestamp overflow
tsacct: add 64-bit btime field
acct: stop using get_seconds()
um: ubd: use 64-bit time_t where possible
xtensa: ISS: avoid struct timeval
dlm: use SO_SNDTIMEO_NEW instead of SO_SNDTIMEO_OLD
...
- Fix for time travel mode
- Disable CONFIG_CONSTRUCTORS again
- A new command line option to have an non-raw serial line
- Preparations to remove obsolete UML network drivers
-----BEGIN PGP SIGNATURE-----
iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAl4k2EYWHHJpY2hhcmRA
c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7wTe2EACDEsoWZvvKnocFH/umFfZdxciU
Ys5noEPElnILVIwV+Gm9SHq/RQWzG8BqSOirfOn1iGhEqWjDTPzwqPuqFGxKtRVp
VoaYDA506oDH903i4vj1OuGDHxgModEmR/GFqU9uEtXUws2qbeZQcG0COkquJU8X
URMz4XB+KLqDI2TvOTnbWevjJnslwLIqRuDdZ2q0d685J1XhRhuq/srgZGMiUpGn
4H/E4k0UxlC082oh9QWRFYYyc6vhyvlguupphzBgICZQmP4P4ck3pe23OT+vOWBl
+e2ti9MlB9/Tv3dGhzmq2180U0D74RvtHIi7RjUdaTcEoOkgDwXqKsZ1CY4kCV78
mxrXHCE6YUMvsQcTBxobXYD/zUXeqXtlSHyGQ4MUATCvI6ag8vWKWjGXV/kDVWdf
FEeL0O6AHjruTrPxi1aSJ3TFG+JerXCGZpSt2DG67sCcWJ/RqYnrs45DF4U6ywf4
BQ/nA0bpdZouLrhtCS6yBRvPiA5TVXHmrQMpK/LsOpBD4sKCV+MXghbYoWAwcSoM
H+RSpf1em3zQrlRcuNPW8XGVkqOmUKn9pFzT9ybWv0h2hVhrDiutjJEPgbpJooIr
yB0G/MVTtk3Xrok2lq8TT+Hp13TWCTFynsmKYvgv4s37p5jA5fvKL0vhdhIlAxHE
FCyGsZIkAcMLfjvC3Q==
=yi/o
-----END PGP SIGNATURE-----
Merge tag 'for-linus-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml
Pull UML updates from Anton Ivanov:
"I am sending this on behalf of Richard who is traveling.
This contains the following changes for UML:
- Fix for time travel mode
- Disable CONFIG_CONSTRUCTORS again
- A new command line option to have an non-raw serial line
- Preparations to remove obsolete UML network drivers"
* tag 'for-linus-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
um: Fix time-travel=inf-cpu with xor/raid6
Revert "um: Enable CONFIG_CONSTRUCTORS"
um: Mark non-vector net transports as obsolete
um: Add an option to make serial driver non-raw
From: Dave Hansen <dave.hansen@linux.intel.com>
MPX is being removed from the kernel due to a lack of support
in the toolchain going forward (gcc).
arch_bprm_mm_init() is used at execve() time. The only non-stub
implementation is on x86 for MPX. Remove the hook entirely from
all architectures and generic code.
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: x86@kernel.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-arch@vger.kernel.org
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Today, I erroneously built a time-travel configuration with btrfs
enabled, and noticed it cannot boot in time-travel=inf-cpu mode,
both xor and raid6 speed measurement gets stuck.
For xor, work around it by picking the first algorithm if inf-cpu
mode is enabled.
For raid6, I didn't find such a workaround, so disallow enabling
time-travel mode if RAID6_PQ_BENCHMARK is enabled.
With this, and RAID6_PQ_BENCHMARK disabled, I can boot a kernel
that has btrfs enabled in time-travel=inf-cpu mode.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
This reverts commit 786b2384bf ("um: Enable CONFIG_CONSTRUCTORS").
There are two issues with this commit, uncovered by Anton in tests
on some (Debian) systems:
1) I completely forgot to call any constructors if CONFIG_CONSTRUCTORS
isn't set. Don't recall now if it just wasn't needed on my system, or
if I never tested this case.
2) With that fixed, it works - with CONFIG_CONSTRUCTORS *unset*. If I
set CONFIG_CONSTRUCTORS, it fails again, which isn't totally
unexpected since whatever wanted to run is likely to have to run
before the kernel init etc. that calls the constructors in this case.
Basically, some constructors that gcc emits (libc has?) need to run
very early during init; the failure mode otherwise was that the ptrace
fork test already failed:
----------------------
$ ./linux mem=512M
Core dump limits :
soft - 0
hard - NONE
Checking that ptrace can change system call numbers...check_ptrace : child exited with exitcode 6, while expecting 0; status 0x67f
Aborted
----------------------
Thinking more about this, it's clear that we simply cannot support
CONFIG_CONSTRUCTORS in UML. All the cases we need now (gcov, kasan)
involve not use of the __attribute__((constructor)), but instead
some constructor code/entry generated by gcc. Therefore, we cannot
distinguish between kernel constructors and system constructors.
Thus, revert this commit.
Cc: stable@vger.kernel.org [5.4+]
Fixes: 786b2384bf ("um: Enable CONFIG_CONSTRUCTORS")
Reported-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-by: Anton Ivanov <anton.ivanov@cambridgegreys.co.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
The ubd code suffers from a possible y2038 overflow on 32-bit
architectures, both for the cow header and the os_file_modtime()
function.
Replace time_t with time64_t to extend the ubd_kern side as much
as possible.
Whether this makes a difference for the user side depends on
the host libc implementation that may use either 32-bit or 64-bit
time_t.
For the cow file format, the header contains an unsigned 32-bit
timestamp, which is good until y2106, passing this through a
'long long' gives us a consistent interpretation between 32-bit
and 64-bit um kernels.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
In the x86 MM code we'd like to untangle various types of historic
header dependency spaghetti, but for this we'd need to pass to
the generic vmalloc code various vmalloc related defines that
customarily come via the <asm/page.h> low level arch header.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The UML port uses 4 and 5 level fixups to support higher level page
table directories in the generic VM code.
Implement primitives necessary for the 4th level folding, add walks of
p4d level where appropriate and drop usage of __ARCH_USE_5LEVEL_HACK.
Link: http://lkml.kernel.org/r/1572938135-31886-13-git-send-email-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Anatoly Pugachev <matorola@gmail.com>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Helge Deller <deller@gmx.de>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Mark Salter <msalter@redhat.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Peter Rosin <peda@axentia.se>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rolf Eike Beer <eike-kernel@sf-tec.de>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Russell King <rmk+kernel@armlinux.org.uk>
Cc: Sam Creasey <sammy@sammy.net>
Cc: Vincent Chen <deanbo422@gmail.com>
Cc: Vineet Gupta <Vineet.Gupta1@synopsys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The naming of pgtable_page_{ctor,dtor}() seems to have confused a few
people, and until recently arm64 used these erroneously/pointlessly for
other levels of page table.
To make it incredibly clear that these only apply to the PTE level, and to
align with the naming of pgtable_pmd_page_{ctor,dtor}(), let's rename them
to pgtable_pte_page_{ctor,dtor}().
These changes were generated with the following shell script:
----
git grep -lw 'pgtable_page_.tor' | while read FILE; do
sed -i '{s/pgtable_page_ctor/pgtable_pte_page_ctor/}' $FILE;
sed -i '{s/pgtable_page_dtor/pgtable_pte_page_dtor/}' $FILE;
done
----
... with the documentation re-flowed to remain under 80 columns, and
whitespace fixed up in macros to keep backslashes aligned.
There should be no functional change as a result of this patch.
Link: http://lkml.kernel.org/r/20190722141133.3116-1-mark.rutland@arm.com
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k]
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Both pgtable_cache_init() and pgd_cache_init() are used to initialize kmem
cache for page table allocations on several architectures that do not use
PAGE_SIZE tables for one or more levels of the page table hierarchy.
Most architectures do not implement these functions and use __weak default
NOP implementation of pgd_cache_init(). Since there is no such default
for pgtable_cache_init(), its empty stub is duplicated among most
architectures.
Rename the definitions of pgd_cache_init() to pgtable_cache_init() and
drop empty stubs of pgtable_cache_init().
Link: http://lkml.kernel.org/r/1566457046-22637-1-git-send-email-rppt@linux.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Will Deacon <will@kernel.org> [arm64]
Acked-by: Thomas Gleixner <tglx@linutronix.de> [x86]
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "mm: remove quicklist page table caches".
A while ago Nicholas proposed to remove quicklist page table caches [1].
I've rebased his patch on the curren upstream and switched ia64 and sh to
use generic versions of PTE allocation.
[1] https://lore.kernel.org/linux-mm/20190711030339.20892-1-npiggin@gmail.com
This patch (of 3):
Remove page table allocator "quicklists". These have been around for a
long time, but have not got much traction in the last decade and are only
used on ia64 and sh architectures.
The numbers in the initial commit look interesting but probably don't
apply anymore. If anybody wants to resurrect this it's in the git
history, but it's unhelpful to have this code and divergent allocator
behaviour for minor archs.
Also it might be better to instead make more general improvements to page
allocator if this is still so slow.
Link: http://lkml.kernel.org/r/1565250728-21721-2-git-send-email-rppt@linux.ibm.com
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Convert files to use SPDX header. All files are licensed under the GPLv2.
Signed-off-by: Alex Dewar <alex.dewar@gmx.co.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
This module allows virtio devices to be used over a vhost-user socket.
Signed-off-by: Erel Geron <erelx.geron@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
UML has its own platform-specific barrier.h under arch/x86/um/,
which should get used. Fix the build system to use it, and then
fix the barrier.h to actually compile.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Periodic timers are broken, because the also only fire once.
As it happens, Linux doesn't care because it only sets the
timer to periodic very briefly during boot, and then switches
it only between one-shot and off later.
Nevertheless, fix the logic (we shouldn't even be looking at
time_travel_timer_expiry unless the timer is enabled) and
change the code to fire the timer periodically in periodic
mode, in case it ever gets used in the future.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
We do need to call the constructors for *modules*, and
at least for KASAN in the future, we must call even the
kernel constructors only later when the kernel has been
initialized.
Instead of relying on libc to call them, emit an empty
section for libc and let the kernel's CONSTRUCTORS code
do the rest of the job.
Tested that it indeed doesn't work in modules, and does
work after the fixes in both, with a few functions with
__attribute__((constructor)) in both dynamic and static
builds.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
UML enables TRACE_IRQFLAGS_SUPPORT but doesn't actually implement
it. It seems to have been added for lockdep support, but that can't
actually really work well without IRQ flags tracing, as is also
very noisily reported when enabling CONFIG_DEBUG_LOCKDEP.
Implement it now.
Fixes: 711553efa5 ("[PATCH] uml: declare in Kconfig our partial LOCKDEP support")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Due to the typo in the name, this can never be used, but
it's also misleading because our value for enabled/disabled
is always just 0/1, not an actual signal mask.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-by: Anton Ivanov <anton.ivanov@cambridgegreys.co.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
Fix an off-by-one in IRQ enumeration
Fixes: 49da7e64f3 ("High Performance UML Vector Network Driver")
Reported by: Dana Johnson <djohns042@gmail.com>
Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Unfortunately, my build fix for when time travel mode isn't
enabled broke time travel mode, because I forgot that we need
to use the timer time after the timer has been marked disabled,
and thus need to leave the time stored instead of zeroing it.
Fix that by splitting the inline into two, so we can call only
the _mode() one in the relevant code path.
Fixes: b482e48d29 ("um: fix build without CONFIG_UML_TIME_TRAVEL_SUPPORT")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
- A new timer mode, time travel, for testing with UML
- Many bugixes/improvements for the serial line driver
- Various bugfixes
-----BEGIN PGP SIGNATURE-----
iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAl0reewWHHJpY2hhcmRA
c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7wd4QEAC2KZoi6omge+nD7+tglfAdbZ0E
X2HE3clA2tE5KqbgasT1IGZeZ/JE5wzjYJ38U1qdHd9RPeerX/snib4vyru4FSMd
a39wjNbaqP/csPMLBukYEGs7Y4sSl1KzkRUdS9XkskCymkduhYyNbCc2WPMvAwBG
xw6ffQzY/+zvC0e974jygjKbIEpU+uQ9LzwLnCKM/qKih4owwSA6Rj3tZwBSSQdG
0BKR3o2J06ZXBiJjW+5vyMRU7N5Id/t6hf9OBhLqRk1YbfbebjVRNR2ghLSNvCF+
3arPlE4T9tsjuZY+CCZh2LrrG8gzTx1M8pVlSFdgtqKCCp7MO40Q9cIhjmMYevym
Zct8iLUtSUuIHU4/q2k7LeSPOiF6eEjbuVj2aEFTc8LSg/zYG/lF7xXESPkm2pf+
eYQN2f8ML9fL183nEVkRxXhZwqCKSS7ktcKO0bRj3UsbdiJxRVvfe1POTWsvvuVi
uV5YHgFBAhqcVabM2F9dOwk/4JRnNqJTGAUAVOwiyvk64sXLp/44DM/GbHgPMkSH
uVqt70Yzt07RZ/2xDODW51xFx3WgbvmsKB6zN4Y7CAuc0CXBOSc61xgNFVhQdTrP
sfAph4yUGs9mMyhFrdTVZaleZXA3Eo3V5FRvrESNNj53UzD2dRFKO065t58YMFuS
UTqTJA2AHsJQ9j2TRA==
=xxGu
-----END PGP SIGNATURE-----
Merge tag 'for-linus-5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml
Pull UML updates from Richard Weinberger:
- A new timer mode, time travel, for testing with UML
- Many bugixes/improvements for the serial line driver
- Various bugfixes
* tag 'for-linus-5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
um: fix build without CONFIG_UML_TIME_TRAVEL_SUPPORT
um: Fix kcov crash during startup
um: configs: Remove useless UEVENT_HELPER_PATH
um: Support time travel mode
um: Pass nsecs to os timer functions
um: Remove drivers/ssl.h
um: Don't garbage collect in deactivate_all_fds()
um: Silence lockdep complaint about mmap_sem
um: Remove locking in deactivate_all_fds()
um: Timer code cleanup
um: fix os_timer_one_shot()
um: Fix IRQ controller regression on console read
um allocates PTE pages with __get_free_page() and uses
GFP_KERNEL | __GFP_ZERO for the allocations.
Switch it to the generic version that does exactly the same thing for the
kernel page tables and adds __GFP_ACCOUNT for the user PTEs.
The pte_free() and pte_free_kernel() versions are identical to the generic
ones and can be simply dropped.
Link: http://lkml.kernel.org/r/1557296232-15361-14-git-send-email-rppt@linux.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Acked-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Guo Ren <guoren@kernel.org>
Cc: Guo Ren <ren_guo@c-sky.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Sam Creasey <sammy@sammy.net>
Cc: Vincent Chen <deanbo422@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When CONFIG_UML_TIME_TRAVEL_SUPPORT isn't set, the build was broken.
Fix this.
Fixes: 065038706f ("um: Support time travel mode")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Sometimes it can be useful to run with "time travel" inside the
UML instance, for example for testing. For example, some tests
for the wireless subsystem and userspace are based on hwsim, a
virtual wireless adapter. Some tests can take a long time to
run because they e.g. wait for 120 seconds to elapse for some
regulatory checks. This obviously goes faster if it need not
actually wait that long, but time inside the test environment
just "bumps up" when there's nothing to do.
Add CONFIG_UML_TIME_TRAVEL_SUPPORT to enable code to support
such modes at runtime, selected on the command line:
* just "time-travel", in which time inside the UML instance
can move faster than real time, if there's nothing to do
* "time-travel=inf-cpu" in which time also moves slower and
any CPU processing takes no time at all, which allows to
implement consistent behaviour regardless of host CPU load
(or speed) or debug overhead.
An additional "time-travel-start=<seconds>" parameter is also
supported in this case to start the wall clock at this time
(in unix epoch).
With this enabled, the test mentioned above goes from a runtime
of about 140 seconds (with startup overhead and all) to being
CPU bound and finishing in 15 seconds (on my slow laptop).
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
This makes the code clearer and lets the time travel patch have
the actual time used for these functions in just one place.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
When we get into activate_mm(), lockdep complains that we're doing
something strange:
WARNING: possible circular locking dependency detected
5.1.0-10252-gb00152307319-dirty #121 Not tainted
------------------------------------------------------
inside.sh/366 is trying to acquire lock:
(____ptrval____) (&(&p->alloc_lock)->rlock){+.+.}, at: flush_old_exec+0x703/0x8d7
but task is already holding lock:
(____ptrval____) (&mm->mmap_sem){++++}, at: flush_old_exec+0x6c5/0x8d7
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&mm->mmap_sem){++++}:
[...]
__lock_acquire+0x12ab/0x139f
lock_acquire+0x155/0x18e
down_write+0x3f/0x98
flush_old_exec+0x748/0x8d7
load_elf_binary+0x2ca/0xddb
[...]
-> #0 (&(&p->alloc_lock)->rlock){+.+.}:
[...]
__lock_acquire+0x12ab/0x139f
lock_acquire+0x155/0x18e
_raw_spin_lock+0x30/0x83
flush_old_exec+0x703/0x8d7
load_elf_binary+0x2ca/0xddb
[...]
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&mm->mmap_sem);
lock(&(&p->alloc_lock)->rlock);
lock(&mm->mmap_sem);
lock(&(&p->alloc_lock)->rlock);
*** DEADLOCK ***
2 locks held by inside.sh/366:
#0: (____ptrval____) (&sig->cred_guard_mutex){+.+.}, at: __do_execve_file+0x12d/0x869
#1: (____ptrval____) (&mm->mmap_sem){++++}, at: flush_old_exec+0x6c5/0x8d7
stack backtrace:
CPU: 0 PID: 366 Comm: inside.sh Not tainted 5.1.0-10252-gb00152307319-dirty #121
Stack:
[...]
Call Trace:
[<600420de>] show_stack+0x13b/0x155
[<6048906b>] dump_stack+0x2a/0x2c
[<6009ae64>] print_circular_bug+0x332/0x343
[<6009c5c6>] check_prev_add+0x669/0xdad
[<600a06b4>] __lock_acquire+0x12ab/0x139f
[<6009f3d0>] lock_acquire+0x155/0x18e
[<604a07e0>] _raw_spin_lock+0x30/0x83
[<60151e6a>] flush_old_exec+0x703/0x8d7
[<601a8eb8>] load_elf_binary+0x2ca/0xddb
[...]
I think it's because in exec_mmap() we have
down_read(&old_mm->mmap_sem);
...
task_lock(tsk);
...
activate_mm(active_mm, mm);
(which does down_write(&mm->mmap_sem))
I'm not really sure why lockdep throws in the whole knowledge
about the task lock, but it seems that old_mm and mm shouldn't
ever be the same (and it doesn't deadlock) so tell lockdep that
they're different.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
There are some unused functions, and some others that have
unused arguments; clean up the timer code a bit.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
os_timer_one_shot() gets passed a value "unsigned long delta",
so must not have an "int ticks" as that actually ends up being
-1, and thus triggering a timer over and over again.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Based on 2 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license version 2 as
published by the free software foundation
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license version 2 as
published by the free software foundation #
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 4122 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Enrico Weigelt <info@metux.net>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add SPDX license identifiers to all Make/Kconfig files which:
- Have no license information of any form
These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:
GPL-2.0
Reported-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pull core fixes from Ingo Molnar:
"This fixes a particularly thorny munmap() bug with MPX, plus fixes a
host build environment assumption in objtool"
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
objtool: Allow AR to be overridden with HOSTAR
x86/mpx, mm/core: Fix recursive munmap() corruption
- Kconfig cleanups
- Fix cpu_all_mask() usage
- Various bug fixes
-----BEGIN PGP SIGNATURE-----
iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAlzYi30WHHJpY2hhcmRA
c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7wdDcD/wLx0xljjSb+j08VVSvVWGah1Vl
DMVyLp1Eik8KRnc6vR+IfC6qDE2+QmJvcLLx4IQ8wpgce+mvhLSy0+8SNsU9tz7t
7ZYVR++L3If3dx72J1aJquQt4PNLQn7QAdPWOA/FiYy4mqjxZUg4HVwf/Oge/2Un
jfom649xl1gdcYlXTCOadb4Xmqo1BSEW+Ms1zqrQlBpU6ePMvojPkjBMdaCbCjMg
bLt4XjtVbgBH3FnH0ZvuDzrMW229LiLot4KF0iUW36/gV/ZRATbinst5AQ5mUsMP
GgrqbeU+wDdzt73p/l1NG7u3DZHOhoAW1ZWTqwBMKiazQiJPa90V9TIOwbnSl7zc
hBEKKkU/u6p5E5TADcTty9ZJfCM+3Zatqt004WSbi+ug363G08XrTb3wWz6AruQ/
9shTUmzwYsK1Bzllf2T2WShBrN+vMdmpzf4+v66N1KhcPrb7Eh81N/VhQG+rvfSb
Ju/lDhu6OxlHr9OlGinI0SCLgjpk3qWcNd1noFdQsTewIopQsOL6H4R7711md3ow
PWl7HAspvCRD3ub12y0wS3bb/4AUyoBrMDT/VBfk2vH0BbCzlR/ckaKE+lk2Y2Mr
BpURt1zcqnpqi5LqRC//dhCFPyzpXd+yYVy1P6bN8q5lvfuIoaRdl2YeWjMfoo0v
r+loEdGNa57Qj67ncg==
=HB9o
-----END PGP SIGNATURE-----
Merge tag 'for-linus-5.2-rc1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/rw/uml
Pull UML updates from Richard Weinberger:
- Kconfig cleanups
- Fix cpu_all_mask() usage
- Various bug fixes
* tag 'for-linus-5.2-rc1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/rw/uml:
um: irq: don't set the chip for all irqs
um: define set_pte_at() as a static inline function, not a macro
um: remove uses of variable length arrays
um: remove unused variable
uml: fix a boot splat wrt use of cpu_all_mask
um: Do not unlock mutex that is not hold.
hostfs: fix mismatch between link_file definition and declaration
arch: um: drivers: Kconfig: pedantic formatting
arch: um: Kconfig: pedantic indention cleanups
um: Revert to using stack for pt_regs in signal handling
This is a bit of a mess, to put it mildly. But, it's a bug
that only seems to have showed up in 4.20 but wasn't noticed
until now, because nobody uses MPX.
MPX has the arch_unmap() hook inside of munmap() because MPX
uses bounds tables that protect other areas of memory. When
memory is unmapped, there is also a need to unmap the MPX
bounds tables. Barring this, unused bounds tables can eat 80%
of the address space.
But, the recursive do_munmap() that gets called vi arch_unmap()
wreaks havoc with __do_munmap()'s state. It can result in
freeing populated page tables, accessing bogus VMA state,
double-freed VMAs and more.
See the "long story" further below for the gory details.
To fix this, call arch_unmap() before __do_unmap() has a chance
to do anything meaningful. Also, remove the 'vma' argument
and force the MPX code to do its own, independent VMA lookup.
== UML / unicore32 impact ==
Remove unused 'vma' argument to arch_unmap(). No functional
change.
I compile tested this on UML but not unicore32.
== powerpc impact ==
powerpc uses arch_unmap() well to watch for munmap() on the
VDSO and zeroes out 'current->mm->context.vdso_base'. Moving
arch_unmap() makes this happen earlier in __do_munmap(). But,
'vdso_base' seems to only be used in perf and in the signal
delivery that happens near the return to userspace. I can not
find any likely impact to powerpc, other than the zeroing
happening a little earlier.
powerpc does not use the 'vma' argument and is unaffected by
its removal.
I compile-tested a 64-bit powerpc defconfig.
== x86 impact ==
For the common success case this is functionally identical to
what was there before. For the munmap() failure case, it's
possible that some MPX tables will be zapped for memory that
continues to be in use. But, this is an extraordinarily
unlikely scenario and the harm would be that MPX provides no
protection since the bounds table got reset (zeroed).
I can't imagine anyone doing this:
ptr = mmap();
// use ptr
ret = munmap(ptr);
if (ret)
// oh, there was an error, I'll
// keep using ptr.
Because if you're doing munmap(), you are *done* with the
memory. There's probably no good data in there _anyway_.
This passes the original reproducer from Richard Biener as
well as the existing mpx selftests/.
The long story:
munmap() has a couple of pieces:
1. Find the affected VMA(s)
2. Split the start/end one(s) if neceesary
3. Pull the VMAs out of the rbtree
4. Actually zap the memory via unmap_region(), including
freeing page tables (or queueing them to be freed).
5. Fix up some of the accounting (like fput()) and actually
free the VMA itself.
This specific ordering was actually introduced by:
dd2283f260 ("mm: mmap: zap pages with read mmap_sem in munmap")
during the 4.20 merge window. The previous __do_munmap() code
was actually safe because the only thing after arch_unmap() was
remove_vma_list(). arch_unmap() could not see 'vma' in the
rbtree because it was detached, so it is not even capable of
doing operations unsafe for remove_vma_list()'s use of 'vma'.
Richard Biener reported a test that shows this in dmesg:
[1216548.787498] BUG: Bad rss-counter state mm:0000000017ce560b idx:1 val:551
[1216548.787500] BUG: non-zero pgtables_bytes on freeing mm: 24576
What triggered this was the recursive do_munmap() called via
arch_unmap(). It was freeing page tables that has not been
properly zapped.
But, the problem was bigger than this. For one, arch_unmap()
can free VMAs. But, the calling __do_munmap() has variables
that *point* to VMAs and obviously can't handle them just
getting freed while the pointer is still in use.
I tried a couple of things here. First, I tried to fix the page
table freeing problem in isolation, but I then found the VMA
issue. I also tried having the MPX code return a flag if it
modified the rbtree which would force __do_munmap() to re-walk
to restart. That spiralled out of control in complexity pretty
fast.
Just moving arch_unmap() and accepting that the bonkers failure
case might eat some bounds tables seems like the simplest viable
fix.
This was also reported in the following kernel bugzilla entry:
https://bugzilla.kernel.org/show_bug.cgi?id=203123
There are some reports that this commit triggered this bug:
dd2283f260 ("mm: mmap: zap pages with read mmap_sem in munmap")
While that commit certainly made the issues easier to hit, I believe
the fundamental issue has been with us as long as MPX itself, thus
the Fixes: tag below is for one of the original MPX commits.
[ mingo: Minor edits to the changelog and the patch. ]
Reported-by: Richard Biener <rguenther@suse.de>
Reported-by: H.J. Lu <hjl.tools@gmail.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rik van Riel <riel@surriel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: linux-um@lists.infradead.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: stable@vger.kernel.org
Fixes: dd2283f260 ("mm: mmap: zap pages with read mmap_sem in munmap")
Link: http://lkml.kernel.org/r/20190419194747.5E1AD6DC@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When defined as macro, the mm argument is unused and subsequently the
variable passed as mm is considered unused by the compiler. This fixes
a build warning.
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Acked-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Remove mmiowb() from the kernel memory barrier API and instead, for
architectures that need it, hide the barrier inside spin_unlock() when
MMIO has been performed inside the critical section.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAlzMFaUACgkQt6xw3ITB
YzRICQgAiv7wF/yIbBhDOmCNCAKDO59chvFQWxXWdGk/aAB56kwKAMXJgLOvlMG/
VRuuLyParTFQETC3jaxKgnO/1hb+PZLDt2Q2KqixtjIzBypKUPWvK2sf6THhSRF1
GK0DBVUd1rCrWrR815+SPb8el4xXtdBzvAVB+Fx35PXVNpdRdqCkK+EQ6UnXGokm
rXXHbnfsnquBDtmb4CR4r2beH+aNElXbdt0Kj8VcE5J7f7jTdW3z6Q9WFRvdKmK7
yrsxXXB2w/EsWXOwFp0SLTV5+fgeGgTvv8uLjDw+SG6t0E0PebxjNAflT7dPrbYL
WecjKC9WqBxrGY+4ew6YJP70ijLBCw==
=aC8m
-----END PGP SIGNATURE-----
Merge tag 'arm64-mmiowb' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull mmiowb removal from Will Deacon:
"Remove Mysterious Macro Intended to Obscure Weird Behaviours (mmiowb())
Remove mmiowb() from the kernel memory barrier API and instead, for
architectures that need it, hide the barrier inside spin_unlock() when
MMIO has been performed inside the critical section.
The only relatively recent changes have been addressing review
comments on the documentation, which is in a much better shape thanks
to the efforts of Ben and Ingo.
I was initially planning to split this into two pull requests so that
you could run the coccinelle script yourself, however it's been plain
sailing in linux-next so I've just included the whole lot here to keep
things simple"
* tag 'arm64-mmiowb' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (23 commits)
docs/memory-barriers.txt: Update I/O section to be clearer about CPU vs thread
docs/memory-barriers.txt: Fix style, spacing and grammar in I/O section
arch: Remove dummy mmiowb() definitions from arch code
net/ethernet/silan/sc92031: Remove stale comment about mmiowb()
i40iw: Redefine i40iw_mmiowb() to do nothing
scsi/qla1280: Remove stale comment about mmiowb()
drivers: Remove explicit invocations of mmiowb()
drivers: Remove useless trailing comments from mmiowb() invocations
Documentation: Kill all references to mmiowb()
riscv/mmiowb: Hook up mmwiob() implementation to asm-generic code
powerpc/mmiowb: Hook up mmwiob() implementation to asm-generic code
ia64/mmiowb: Add unconditional mmiowb() to arch_spin_unlock()
mips/mmiowb: Add unconditional mmiowb() to arch_spin_unlock()
sh/mmiowb: Add unconditional mmiowb() to arch_spin_unlock()
m68k/io: Remove useless definition of mmiowb()
nds32/io: Remove useless definition of mmiowb()
x86/io: Remove useless definition of mmiowb()
arm64/io: Remove useless definition of mmiowb()
ARM/io: Remove useless definition of mmiowb()
mmiowb: Hook up mmiowb helpers to spinlocks and generic I/O accessors
...
Pull unified TLB flushing from Ingo Molnar:
"This contains the generic mmu_gather feature from Peter Zijlstra,
which is an all-arch unification of TLB flushing APIs, via the
following (broad) steps:
- enhance the <asm-generic/tlb.h> APIs to cover more arch details
- convert most TLB flushing arch implementations to the generic
<asm-generic/tlb.h> APIs.
- remove leftovers of per arch implementations
After this series every single architecture makes use of the unified
TLB flushing APIs"
* 'core-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
mm/resource: Use resource_overlaps() to simplify region_intersects()
ia64/tlb: Eradicate tlb_migrate_finish() callback
asm-generic/tlb: Remove tlb_table_flush()
asm-generic/tlb: Remove tlb_flush_mmu_free()
asm-generic/tlb: Remove CONFIG_HAVE_GENERIC_MMU_GATHER
asm-generic/tlb: Remove arch_tlb*_mmu()
s390/tlb: Convert to generic mmu_gather
asm-generic/tlb: Introduce CONFIG_HAVE_MMU_GATHER_NO_GATHER=y
arch/tlb: Clean up simple architectures
um/tlb: Convert to generic mmu_gather
sh/tlb: Convert SH to generic mmu_gather
ia64/tlb: Convert to generic mmu_gather
arm/tlb: Convert to generic mmu_gather
asm-generic/tlb, arch: Invert CONFIG_HAVE_RCU_TABLE_INVALIDATE
asm-generic/tlb, ia64: Conditionally provide tlb_migrate_finish()
asm-generic/tlb: Provide generic tlb_flush() based on flush_tlb_mm()
asm-generic/tlb, arch: Provide generic tlb_flush() based on flush_tlb_range()
asm-generic/tlb, arch: Provide generic VIPT cache flush
asm-generic/tlb, arch: Provide CONFIG_HAVE_MMU_GATHER_PAGE_SIZE
asm-generic/tlb: Provide a comment
Hook up asm-generic/mmiowb.h to Kbuild for all architectures so that we
can subsequently include asm/mmiowb.h from core code.
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Generic mmu_gather provides the simple flush_tlb_range() based range
tracking mmu_gather UM needs.
No change in behavior intended.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nick Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Move the mmu_gather::page_size things into the generic code instead of
PowerPC specific bits.
No change in behavior intended.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nick Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We're (finally) phasing out a.out support for good. As Borislav Petkov
points out, we've supported ELF binaries for about 25 years by now, and
coredumping in particular has bitrotted over the years.
None of the tool chains even support generating a.out binaries any more,
and the plan is to deprecate a.out support entirely for the kernel. But
I want to start with just removing the core dumping code, because I can
still imagine that somebody actually might want to support a.out as a
simpler biinary format.
Particularly if you generate some random binaries on the fly, ELF is a
much more complicated format (admittedly ELF also does have a lot of
toolchain support, mitigating that complexity a lot and you really
should have moved over in the last 25 years).
So it's at least somewhat possible that somebody out there has some
workflow that still involves generating and running a.out executables.
In contrast, it's very unlikely that anybody depends on debugging any
legacy a.out core files. But regardless, I want this phase-out to be
done in two steps, so that we can resurrect a.out support (if needed)
without having to resurrect the core file dumping that is almost
certainly not needed.
Jann Horn pointed to the <asm/a.out-core.h> file that my first trivial
cut at this had missed.
And Alan Cox points out that the a.out binary loader _could_ be done in
user space if somebody wants to, but we might keep just the loader in
the kernel if somebody really wants it, since the loader isn't that big
and has no really odd special cases like the core dumping does.
Acked-by: Borislav Petkov <bp@alien8.de>
Cc: Alan Cox <gnomes@lxorguk.ukuu.org.uk>
Cc: Jann Horn <jannh@google.com>
Cc: Richard Weinberger <richard@nod.at>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "Add support for fast mremap".
This series speeds up the mremap(2) syscall by copying page tables at
the PMD level even for non-THP systems. There is concern that the extra
'address' argument that mremap passes to pte_alloc may do something
subtle architecture related in the future that may make the scheme not
work. Also we find that there is no point in passing the 'address' to
pte_alloc since its unused. This patch therefore removes this argument
tree-wide resulting in a nice negative diff as well. Also ensuring
along the way that the enabled architectures do not do anything funky
with the 'address' argument that goes unnoticed by the optimization.
Build and boot tested on x86-64. Build tested on arm64. The config
enablement patch for arm64 will be posted in the future after more
testing.
The changes were obtained by applying the following Coccinelle script.
(thanks Julia for answering all Coccinelle questions!).
Following fix ups were done manually:
* Removal of address argument from pte_fragment_alloc
* Removal of pte_alloc_one_fast definitions from m68k and microblaze.
// Options: --include-headers --no-includes
// Note: I split the 'identifier fn' line, so if you are manually
// running it, please unsplit it so it runs for you.
virtual patch
@pte_alloc_func_def depends on patch exists@
identifier E2;
identifier fn =~
"^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
type T2;
@@
fn(...
- , T2 E2
)
{ ... }
@pte_alloc_func_proto_noarg depends on patch exists@
type T1, T2, T3, T4;
identifier fn =~ "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
@@
(
- T3 fn(T1, T2);
+ T3 fn(T1);
|
- T3 fn(T1, T2, T4);
+ T3 fn(T1, T2);
)
@pte_alloc_func_proto depends on patch exists@
identifier E1, E2, E4;
type T1, T2, T3, T4;
identifier fn =~
"^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
@@
(
- T3 fn(T1 E1, T2 E2);
+ T3 fn(T1 E1);
|
- T3 fn(T1 E1, T2 E2, T4 E4);
+ T3 fn(T1 E1, T2 E2);
)
@pte_alloc_func_call depends on patch exists@
expression E2;
identifier fn =~
"^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
@@
fn(...
-, E2
)
@pte_alloc_macro depends on patch exists@
identifier fn =~
"^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
identifier a, b, c;
expression e;
position p;
@@
(
- #define fn(a, b, c) e
+ #define fn(a, b) e
|
- #define fn(a, b) e
+ #define fn(a) e
)
Link: http://lkml.kernel.org/r/20181108181201.88826-2-joelaf@google.com
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Suggested-by: Kirill A. Shutemov <kirill@shutemov.name>
Acked-by: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Julia Lawall <Julia.Lawall@lip6.fr>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: William Kucharski <william.kucharski@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
reenable_fd has been a NOP since the introduction of the EPOLL
based interrupt controller.
reenable_channel() is no longer needed as the flow control is
now handled via the write IRQs on the channel.
Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
This commit removes redundant generic-y defines in
arch/um/include/asm/Kbuild.
It is redundant to define generic-y when arch-specific implementation
exists in arch/$(ARCH)/include/asm/*.h
Remove the following generic-y:
hardirq.h
io.h
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Changing protection is a very high cost operation in UML
because in addition to an extra syscall it also interrupts
mmap merge sequences generated by the tlb.
While the condition is not particularly common it is worth
avoiding.
Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Support for DISCARD and WRITE_ZEROES in the ubd driver using
fallocate.
DISCARD is enabled by default and can be disabled using a new
UBD command line flag.
If the underlying fs on which the UBD image is stored does not
support DISCARD the support for both DISCARD and WRITE_ZEROES
is turned off.
Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Pull UML updates from Richard Weinberger:
- removal of old and dead code
- a bug fix for our tty driver
- other minor cleanups across the code base
* 'for-linus-4.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
um: Make line/tty semantics use true write IRQ
um: trap: fix spelling mistake, EACCESS -> EACCES
um: Don't hardcode path as it is architecture dependent
um: NULL check before kfree is not needed
um: remove unused AIO code
um: Give start_idle_thread() a return code
um: Remove update_debugregs()
um: Drop own definition of PTRACE_SYSEMU/_SINGLESTEP
Since the struct lsm_info table is not an initcall, we can just move it
into INIT_DATA like all the other tables.
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: John Johansen <john.johansen@canonical.com>
Reviewed-by: James Morris <james.morris@microsoft.com>
Signed-off-by: James Morris <james.morris@microsoft.com>
Pull uml updates from Richard Weinberger:
"Minor updates for UML:
- fixes for our new vector network driver by Anton
- initcall cleanup by Alexander
- We have a new mailinglist, sourceforge.net sucks"
* 'for-linus-4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
um: Fix raw interface options
um: Fix initialization of vector queues
um: remove uml initcalls
um: Update mailing list address
__uml_initcall() is not used and .uml.initcall.init section is empty:
$ grep -r '__uml_initcall('
arch/um/include/shared/init.h:#define __uml_initcall(fn) \
$ readelf -s ../umobj/linux | grep __uml_initcall
23214: 00000000603b75d8 0 NOTYPE GLOBAL DEFAULT 32 __uml_initcall_start
25337: 00000000603b75d8 0 NOTYPE GLOBAL DEFAULT 32 __uml_initcall_end
So it is unnecessary.
Signed-off-by: Alexander Pateenok <pateenoc@gmail.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
We have a couple of files that try to include asm/compat.h on
architectures where this is available. Those should generally use the
higher-level linux/compat.h file, but that in turn fails to include
asm/compat.h when CONFIG_COMPAT is disabled, unless we can provide
that header on all architectures.
This adds the asm/compat.h for all remaining architectures to
simplify the dependencies.
Architectures that are getting removed in linux-4.17 are not changed
here, to avoid needless conflicts with the removal patches. Those
architectures are broken by this patch, but we have already shown
that they have no users.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
1. Provides infrastructure for vector IO using recvmmsg/sendmmsg.
1.1. Multi-message read.
1.2. Multi-message write.
1.3. Optimized queue support for multi-packet enqueue/dequeue.
1.4. BQL/DQL support.
2. Implements transports for several transports as well support
for direct wiring of PWEs to NIC. Allows direct connection of VMs
to host, other VMs and network devices with no switch in use.
2.1. Raw socket >4 times higher PPS and 10 times higher tcp RX
than existing pcap based transport (> 4Gbit)
2.2. New tap transport using socket RX and tap xmit. Similar
performance improvements (>4Gbit)
2.3. GRE transport - direct wiring to GRE PWE
2.4. L2TPv3 transport - direct wiring to L2TPv3 PWE
3. Tuning, performance and offload related setting support via ethtool.
4. Initial BPF support - used in tap/raw to avoid software looping
5. Scatter Gather support.
6. VNET and checksum offload support for raw socket transport.
7. TSO/GSO support where applicable or available
8. Migrates all error messages to netdevice_*() and rate limits
them where needed.
Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
1. Removes the need to walk the IRQ/Device list to determine
who triggered the IRQ.
2. Improves scalability (up to several times performance
improvement for cases with 10s of devices).
3. Improves UML baseline IO performance for one disk + one NIC
use case by up to 10%.
4. Introduces write poll triggered IRQs.
5. Prerequisite for introducing high performance mmesg family
of functions in network IO.
6. Fixes RNG shutdown which was leaking a file descriptor
Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
If CONFIG_MODVERSIONS=y:
WARNING: EXPORT symbol "__memcpy" [vmlinux] version generation failed, symbol will not be versioned.
WARNING: EXPORT symbol "memcpy" [vmlinux] version generation failed, symbol will not be versioned.
Add <asm/asm-prototypes.h>, including the generic version, so that
genksyms knows the types of these symbols and can generate CRCs for
them.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Richard Weinberger <richard@nod.at>
to the clk rate protection support added by Jerome Brunet. This feature
will allow consumers to lock in a certain rate on the output of a clk so
that things like audio playback don't hear pops when the clk frequency
changes due to shared parent clks changing rates. Currently the clk
API doesn't guarantee the rate of a clk stays at the rate you request
after clk_set_rate() is called, so this new API will allow drivers
to express that requirement. Beyond this, the core got some debugfs
pretty printing patches and a couple minor non-critical fixes.
Looking outside of the core framework diff we have some new driver
additions and the removal of a legacy TI clk driver. Both of these hit
high in the dirstat. Also, the removal of the asm-generic/clkdev.h file
causes small one-liners in all the architecture Kbuild files. Overall, the
driver diff seems to be the normal stuff that comes all the time to
fix little problems here and there and to support new hardware.
Core:
- Clk rate protection
- Symbolic clk flags in debugfs output
- Clk registration enabled clks while doing bookkeeping updates
New Drivers:
- Spreadtrum SC9860
- HiSilicon hi3660 stub
- Qualcomm A53 PLL, SPMI clkdiv, and MSM8916 APCS
- Amlogic Meson-AXG
- ASPEED BMC
Removed Drivers:
- TI OMAP 3xxx legacy clk (non-DT) support
- asm*/clkdev.h got removed (not really a driver)
Updates:
- Renesas FDP1-0 module clock on R-Car M3-W
- Renesas LVDS module clock on R-Car V3M
- Misc fixes to pr_err() prints
- Qualcomm MSM8916 audio fixes
- Qualcomm IPQ8074 rounded out support for more peripherals
- Qualcomm Alpha PLL variants
- Divider code was using container_of() on bad pointers
- Allwinner DE2 clks on H3
- Amlogic minor data fixes and dropping of CLK_IGNORE_UNUSED
- Mediatek clk driver compile test support
- AT91 PMC clk suspend/resume restoration support
- PLL issues fixed on si5351
- Broadcom IProc PLL calculation updates
- DVFS support for Armada mvebu CPU clks
- Allwinner fixed post-divider support
- TI clkctrl fixes and support for newer SoCs
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABCAAGBQJac5vRAAoJEK0CiJfG5JUlUaIP/Riq0tbApfc4k4GMvSvaieR/
AwZFIMCxOxO+KGdUsBWj7UUoDfBYmxyknHZkVUA/m+Lm7cRH/YHHMghEceZLaBYW
zPQmDfkTl/QkwysXZMCw9vg4vO0tt5gWbHljQnvVhxVVTCkIRpaE8Vkktj1RZzpY
WU/TkvPbVGY3SNm504TRXKWC9KpMTEXVvzqlg6zLDJ/jE7PGzBKtewqMoLDCBH2L
q6b50BSXDo2Hep0vm6e5xneXKjLNR4kgN4PkbM4Yoi4iWLLbgAu79NfyOvvr/imS
HxOHRms9tejtyaiR6bQSF0pbLOERZ3QSbMFEbxdxnCTuPEfy3Nw/2W7mNJlhJa8g
EGLMnLL4WdloL4Z83dAcMrj9OmxYf7Yobf5dMidLrQT5EYuafdj0ParbI8TQpWSB
eTqaffSUGPE/7xuKouYBcbvocpXXWCcokrP/mEn3OEHXkIeeut1Jd3RmEvsi3gtJ
pNraJTIpvt4c05rj6yLUOhWfyqlA+fH3p4Fx3rrH1tmKEiG+lrhKoxF26uALZe0V
OvarhG+LPIE10pCIYlQjZjQVnYLGCxsGAIoK1uz7VYvFPh2T0cxQlzzeqFgrlTyN
32hMj3LhkQw82FG9xZqjTX1935R35mySRlx63x7HStI1YFief2X9+RHjJR/lofG0
nC0JWTp5sC/pKf54QBXj
=bGPp
-----END PGP SIGNATURE-----
Merge tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk updates from Stephen Boyd:
"The core framework has a handful of patches this time around, mostly
due to the clk rate protection support added by Jerome Brunet.
This feature will allow consumers to lock in a certain rate on the
output of a clk so that things like audio playback don't hear pops
when the clk frequency changes due to shared parent clks changing
rates. Currently the clk API doesn't guarantee the rate of a clk stays
at the rate you request after clk_set_rate() is called, so this new
API will allow drivers to express that requirement.
Beyond this, the core got some debugfs pretty printing patches and a
couple minor non-critical fixes.
Looking outside of the core framework diff we have some new driver
additions and the removal of a legacy TI clk driver. Both of these hit
high in the dirstat. Also, the removal of the asm-generic/clkdev.h
file causes small one-liners in all the architecture Kbuild files.
Overall, the driver diff seems to be the normal stuff that comes all
the time to fix little problems here and there and to support new
hardware.
Summary:
Core:
- Clk rate protection
- Symbolic clk flags in debugfs output
- Clk registration enabled clks while doing bookkeeping updates
New Drivers:
- Spreadtrum SC9860
- HiSilicon hi3660 stub
- Qualcomm A53 PLL, SPMI clkdiv, and MSM8916 APCS
- Amlogic Meson-AXG
- ASPEED BMC
Removed Drivers:
- TI OMAP 3xxx legacy clk (non-DT) support
- asm*/clkdev.h got removed (not really a driver)
Updates:
- Renesas FDP1-0 module clock on R-Car M3-W
- Renesas LVDS module clock on R-Car V3M
- Misc fixes to pr_err() prints
- Qualcomm MSM8916 audio fixes
- Qualcomm IPQ8074 rounded out support for more peripherals
- Qualcomm Alpha PLL variants
- Divider code was using container_of() on bad pointers
- Allwinner DE2 clks on H3
- Amlogic minor data fixes and dropping of CLK_IGNORE_UNUSED
- Mediatek clk driver compile test support
- AT91 PMC clk suspend/resume restoration support
- PLL issues fixed on si5351
- Broadcom IProc PLL calculation updates
- DVFS support for Armada mvebu CPU clks
- Allwinner fixed post-divider support
- TI clkctrl fixes and support for newer SoCs"
* tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (125 commits)
clk: aspeed: Handle inverse polarity of USB port 1 clock gate
clk: aspeed: Fix return value check in aspeed_cc_init()
clk: aspeed: Add reset controller
clk: aspeed: Register gated clocks
clk: aspeed: Add platform driver and register PLLs
clk: aspeed: Register core clocks
clk: Add clock driver for ASPEED BMC SoCs
clk: mediatek: adjust dependency of reset.c to avoid unexpectedly being built
clk: fix reentrancy of clk_enable() on UP systems
clk: meson-axg: fix potential NULL dereference in axg_clkc_probe()
clk: Simplify debugfs registration
clk: Fix debugfs_create_*() usage
clk: Show symbolic clock flags in debugfs
clk: renesas: r8a7796: Add FDP clock
clk: Move __clk_{get,put}() into private clk.h API
clk: sunxi: Use CLK_IS_CRITICAL flag for critical clks
clk: Improve flags doc for of_clk_detect_critical()
arch: Remove clkdev.h asm-generic from Kbuild
clk: sunxi-ng: a83t: Add M divider to TCON1 clock
clk: Prepare to remove asm-generic/clkdev.h
...
Construct the init thread stack in the linker script rather than doing it
by means of a union so that ia64's init_task.c can be got rid of.
The following symbols are then made available from INIT_TASK_DATA() linker
script macro:
init_thread_union
init_stack
INIT_TASK_DATA() also expands the region to THREAD_SIZE to accommodate the
size of the init stack. init_thread_union is given its own section so that
it can be placed into the stack space in the right order. I'm assuming
that the ia64 ordering is correct and that the task_struct is first and the
thread_info second.
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Tested-by: Will Deacon <will.deacon@arm.com> (arm64)
Tested-by: Palmer Dabbelt <palmer@sifive.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Now that every architecture is using the generic clkdev.h file
and we no longer include asm/clkdev.h anywhere in the tree, we
can remove it.
Cc: Russell King <linux@armlinux.org.uk>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Cc: <linux-arch@vger.kernel.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k]
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Pull x86 PTI preparatory patches from Thomas Gleixner:
"Todays Advent calendar window contains twentyfour easy to digest
patches. The original plan was to have twenty three matching the date,
but a late fixup made that moot.
- Move the cpu_entry_area mapping out of the fixmap into a separate
address space. That's necessary because the fixmap becomes too big
with NRCPUS=8192 and this caused already subtle and hard to
diagnose failures.
The top most patch is fresh from today and cures a brain slip of
that tall grumpy german greybeard, who ignored the intricacies of
32bit wraparounds.
- Limit the number of CPUs on 32bit to 64. That's insane big already,
but at least it's small enough to prevent address space issues with
the cpu_entry_area map, which have been observed and debugged with
the fixmap code
- A few TLB flush fixes in various places plus documentation which of
the TLB functions should be used for what.
- Rename the SYSENTER stack to CPU_ENTRY_AREA stack as it is used for
more than sysenter now and keeping the name makes backtraces
confusing.
- Prevent LDT inheritance on exec() by moving it to arch_dup_mmap(),
which is only invoked on fork().
- Make vysycall more robust.
- A few fixes and cleanups of the debug_pagetables code. Check
PAGE_PRESENT instead of checking the PTE for 0 and a cleanup of the
C89 initialization of the address hint array which already was out
of sync with the index enums.
- Move the ESPFIX init to a different place to prepare for PTI.
- Several code moves with no functional change to make PTI
integration simpler and header files less convoluted.
- Documentation fixes and clarifications"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
x86/cpu_entry_area: Prevent wraparound in setup_cpu_entry_area_ptes() on 32bit
init: Invoke init_espfix_bsp() from mm_init()
x86/cpu_entry_area: Move it out of the fixmap
x86/cpu_entry_area: Move it to a separate unit
x86/mm: Create asm/invpcid.h
x86/mm: Put MMU to hardware ASID translation in one place
x86/mm: Remove hard-coded ASID limit checks
x86/mm: Move the CR3 construction functions to tlbflush.h
x86/mm: Add comments to clarify which TLB-flush functions are supposed to flush what
x86/mm: Remove superfluous barriers
x86/mm: Use __flush_tlb_one() for kernel memory
x86/microcode: Dont abuse the TLB-flush interface
x86/uv: Use the right TLB-flush API
x86/entry: Rename SYSENTER_stack to CPU_ENTRY_AREA_entry_stack
x86/doc: Remove obvious weirdnesses from the x86 MM layout documentation
x86/mm/64: Improve the memory map documentation
x86/ldt: Prevent LDT inheritance on exec
x86/ldt: Rework locking
arch, mm: Allow arch_dup_mmap() to fail
x86/vsyscall/64: Warn and fail vsyscall emulation in NATIVE mode
...
In order to sanitize the LDT initialization on x86 arch_dup_mmap() must be
allowed to fail. Fix up all instances.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirsky <luto@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: dan.j.williams@intel.com
Cc: hughd@google.com
Cc: keescook@google.com
Cc: kirill.shutemov@linux.intel.com
Cc: linux-mm@kvack.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[ Note, this is a Git cherry-pick of the following commit:
a23f06f06d ("bpf: fix build issues on um due to mising bpf_perf_event.h")
... for easier x86 PTI code testing and back-porting. ]
Since c895f6f703 ("bpf: correct broken uapi for
BPF_PROG_TYPE_PERF_EVENT program type") um (uml) won't build
on i386 or x86_64:
[...]
CC init/main.o
In file included from ../include/linux/perf_event.h:18:0,
from ../include/linux/trace_events.h:10,
from ../include/trace/syscall.h:7,
from ../include/linux/syscalls.h:82,
from ../init/main.c:20:
../include/uapi/linux/bpf_perf_event.h:11:32: fatal error:
asm/bpf_perf_event.h: No such file or directory #include
<asm/bpf_perf_event.h>
[...]
Lets add missing bpf_perf_event.h also to um arch. This seems
to be the only one still missing.
Fixes: c895f6f703 ("bpf: correct broken uapi for BPF_PROG_TYPE_PERF_EVENT program type")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Suggested-by: Richard Weinberger <richard@sigma-star.at>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Richard Weinberger <richard@sigma-star.at>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Since c895f6f703 ("bpf: correct broken uapi for
BPF_PROG_TYPE_PERF_EVENT program type") um (uml) won't build
on i386 or x86_64:
[...]
CC init/main.o
In file included from ../include/linux/perf_event.h:18:0,
from ../include/linux/trace_events.h:10,
from ../include/trace/syscall.h:7,
from ../include/linux/syscalls.h:82,
from ../init/main.c:20:
../include/uapi/linux/bpf_perf_event.h:11:32: fatal error:
asm/bpf_perf_event.h: No such file or directory #include
<asm/bpf_perf_event.h>
[...]
Lets add missing bpf_perf_event.h also to um arch. This seems
to be the only one still missing.
Fixes: c895f6f703 ("bpf: correct broken uapi for BPF_PROG_TYPE_PERF_EVENT program type")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Suggested-by: Richard Weinberger <richard@sigma-star.at>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Richard Weinberger <richard@sigma-star.at>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
linux/compiler.h is included indirectly by linux/types.h via
uapi/linux/types.h -> uapi/linux/posix_types.h -> linux/stddef.h
-> uapi/linux/stddef.h and is needed to provide a proper definition of
offsetof.
Unfortunately, compiler.h requires a definition of
smp_read_barrier_depends() for defining lockless_dereference() and soon
for defining READ_ONCE(), which means that all
users of READ_ONCE() will need to include asm/barrier.h to avoid splats
such as:
In file included from include/uapi/linux/stddef.h:1:0,
from include/linux/stddef.h:4,
from arch/h8300/kernel/asm-offsets.c:11:
include/linux/list.h: In function 'list_empty':
>> include/linux/compiler.h:343:2: error: implicit declaration of function 'smp_read_barrier_depends' [-Werror=implicit-function-declaration]
smp_read_barrier_depends(); /* Enforce dependency ordering from x */ \
^
A better alternative is to include asm/barrier.h in linux/compiler.h,
but this requires a type definition for "bool" on some architectures
(e.g. x86), which is defined later by linux/types.h. Type "bool" is also
used directly in linux/compiler.h, so the whole thing is pretty fragile.
This patch splits compiler.h in two: compiler_types.h contains type
annotations, definitions and the compiler-specific parts, whereas
compiler.h #includes compiler-types.h and additionally defines macros
such as {READ,WRITE.ACCESS}_ONCE().
uapi/linux/stddef.h and linux/linkage.h are then moved over to include
linux/compiler_types.h, which fixes the build for h8 and blackfin.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1508840570-22169-2-git-send-email-will.deacon@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Some architectures define the no-op macros/functions copy_segments,
release_segments and forget_segments. These are used nowhere in the
tree, so removed them.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Acked-by: Vineet Gupta <vgupta@synopsys.com> [for arch/arc]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull UML updates from Richard Weinberger:
- minor improvements
- fixes for Debian's new gcc defaults (pie enabled by default)
- fixes for XSTATE/XSAVE to make UML work again on modern systems
* 'for-linus-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
um: return negative in tuntap_open_tramp()
um: remove a stray tab
um: Use relative modversions with LD_SCRIPT_DYN
um: link vmlinux with -no-pie
um: Fix CONFIG_GCOV for modules.
Fix minor typos and grammar in UML start_up help
um: defconfig: Cleanup from old Kconfig options
um: Fix FP register size for XSTATE/XSAVE
Pull x86 asm updates from Ingo Molnar:
- Introduce the ORC unwinder, which can be enabled via
CONFIG_ORC_UNWINDER=y.
The ORC unwinder is a lightweight, Linux kernel specific debuginfo
implementation, which aims to be DWARF done right for unwinding.
Objtool is used to generate the ORC unwinder tables during build, so
the data format is flexible and kernel internal: there's no
dependency on debuginfo created by an external toolchain.
The ORC unwinder is almost two orders of magnitude faster than the
(out of tree) DWARF unwinder - which is important for perf call graph
profiling. It is also significantly simpler and is coded defensively:
there has not been a single ORC related kernel crash so far, even
with early versions. (knock on wood!)
But the main advantage is that enabling the ORC unwinder allows
CONFIG_FRAME_POINTERS to be turned off - which speeds up the kernel
measurably:
With frame pointers disabled, GCC does not have to add frame pointer
instrumentation code to every function in the kernel. The kernel's
.text size decreases by about 3.2%, resulting in better cache
utilization and fewer instructions executed, resulting in a broad
kernel-wide speedup. Average speedup of system calls should be
roughly in the 1-3% range - measurements by Mel Gorman [1] have shown
a speedup of 5-10% for some function execution intense workloads.
The main cost of the unwinder is that the unwinder data has to be
stored in RAM: the memory cost is 2-4MB of RAM, depending on kernel
config - which is a modest cost on modern x86 systems.
Given how young the ORC unwinder code is it's not enabled by default
- but given the performance advantages the plan is to eventually make
it the default unwinder on x86.
See Documentation/x86/orc-unwinder.txt for more details.
- Remove lguest support: its intended role was that of a temporary
proof of concept for virtualization, plus its removal will enable the
reduction (removal) of the paravirt API as well, so Rusty agreed to
its removal. (Juergen Gross)
- Clean up and fix FSGS related functionality (Andy Lutomirski)
- Clean up IO access APIs (Andy Shevchenko)
- Enhance the symbol namespace (Jiri Slaby)
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (47 commits)
objtool: Handle GCC stack pointer adjustment bug
x86/entry/64: Use ENTRY() instead of ALIGN+GLOBAL for stub32_clone()
x86/fpu/math-emu: Add ENDPROC to functions
x86/boot/64: Extract efi_pe_entry() from startup_64()
x86/boot/32: Extract efi_pe_entry() from startup_32()
x86/lguest: Remove lguest support
x86/paravirt/xen: Remove xen_patch()
objtool: Fix objtool fallthrough detection with function padding
x86/xen/64: Fix the reported SS and CS in SYSCALL
objtool: Track DRAP separately from callee-saved registers
objtool: Fix validate_branch() return codes
x86: Clarify/fix no-op barriers for text_poke_bp()
x86/switch_to/64: Rewrite FS/GS switching yet again to fix AMD CPUs
selftests/x86/fsgsbase: Test selectors 1, 2, and 3
x86/fsgsbase/64: Report FSBASE and GSBASE correctly in core dumps
x86/fsgsbase/64: Fully initialize FS and GS state in start_thread_common
x86/asm: Fix UNWIND_HINT_REGS macro for older binutils
x86/asm/32: Fix regs_get_register() on segment registers
x86/xen/64: Rearrange the SYSCALL entries
x86/asm/32: Remove a bunch of '& 0xffff' from pt_regs segment reads
...
Nadav reported parallel MADV_DONTNEED on same range has a stale TLB
problem and Mel fixed it[1] and found same problem on MADV_FREE[2].
Quote from Mel Gorman:
"The race in question is CPU 0 running madv_free and updating some PTEs
while CPU 1 is also running madv_free and looking at the same PTEs.
CPU 1 may have writable TLB entries for a page but fail the pte_dirty
check (because CPU 0 has updated it already) and potentially fail to
flush.
Hence, when madv_free on CPU 1 returns, there are still potentially
writable TLB entries and the underlying PTE is still present so that a
subsequent write does not necessarily propagate the dirty bit to the
underlying PTE any more. Reclaim at some unknown time at the future
may then see that the PTE is still clean and discard the page even
though a write has happened in the meantime. I think this is possible
but I could have missed some protection in madv_free that prevents it
happening."
This patch aims for solving both problems all at once and is ready for
other problem with KSM, MADV_FREE and soft-dirty story[3].
TLB batch API(tlb_[gather|finish]_mmu] uses [inc|dec]_tlb_flush_pending
and mmu_tlb_flush_pending so that when tlb_finish_mmu is called, we can
catch there are parallel threads going on. In that case, forcefully,
flush TLB to prevent for user to access memory via stale TLB entry
although it fail to gather page table entry.
I confirmed this patch works with [4] test program Nadav gave so this
patch supersedes "mm: Always flush VMA ranges affected by zap_page_range
v2" in current mmotm.
NOTE:
This patch modifies arch-specific TLB gathering interface(x86, ia64,
s390, sh, um). It seems most of architecture are straightforward but
s390 need to be careful because tlb_flush_mmu works only if
mm->context.flush_mm is set to non-zero which happens only a pte entry
really is cleared by ptep_get_and_clear and friends. However, this
problem never changes the pte entries but need to flush to prevent
memory access from stale tlb.
[1] http://lkml.kernel.org/r/20170725101230.5v7gvnjmcnkzzql3@techsingularity.net
[2] http://lkml.kernel.org/r/20170725100722.2dxnmgypmwnrfawp@suse.de
[3] http://lkml.kernel.org/r/BD3A0EBE-ECF4-41D4-87FA-C755EA9AB6BD@gmail.com
[4] https://patchwork.kernel.org/patch/9861621/
[minchan@kernel.org: decrease tlb flush pending count in tlb_finish_mmu]
Link: http://lkml.kernel.org/r/20170808080821.GA31730@bbox
Link: http://lkml.kernel.org/r/20170802000818.4760-7-namit@vmware.com
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Nadav Amit <namit@vmware.com>
Reported-by: Nadav Amit <namit@vmware.com>
Reported-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch is a preparatory patch for solving race problems caused by
TLB batch. For that, we will increase/decrease TLB flush pending count
of mm_struct whenever tlb_[gather|finish]_mmu is called.
Before making it simple, this patch separates architecture specific part
and rename it to arch_tlb_[gather|finish]_mmu and generic part just
calls it.
It shouldn't change any behavior.
Link: http://lkml.kernel.org/r/20170802000818.4760-5-namit@vmware.com
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Nadav Amit <namit@vmware.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add the new ORC unwinder which is enabled by CONFIG_ORC_UNWINDER=y.
It plugs into the existing x86 unwinder framework.
It relies on objtool to generate the needed .orc_unwind and
.orc_unwind_ip sections.
For more details on why ORC is used instead of DWARF, see
Documentation/x86/orc-unwinder.txt - but the short version is
that it's a simplified, fundamentally more robust debugninfo
data structure, which also allows up to two orders of magnitude
faster lookups than the DWARF unwinder - which matters to
profiling workloads like perf.
Thanks to Andy Lutomirski for the performance improvement ideas:
splitting the ORC unwind table into two parallel arrays and creating a
fast lookup table to search a subset of the unwind table.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: live-patching@vger.kernel.org
Link: http://lkml.kernel.org/r/0a6cbfb40f8da99b7a45a1a8302dc6aef16ec812.1500938583.git.jpoimboe@redhat.com
[ Extended the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull UML updates from Richard Weinberger:
"Mostly fixes for UML:
- First round of fixes for PTRACE_GETRESET/SETREGSET
- A printf vs printk cleanup
- Minor improvements"
* 'for-linus-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
um: Correctly check for PTRACE_GETRESET/SETREGSET
um: v2: Use generic NOTES macro
um: Add kerneldoc for userspace_tramp() and start_userspace()
um: Add kerneldoc for segv_handler
um: stub-data.h: remove superfluous include
um: userspace - be more verbose in ptrace set regs error
um: add dummy ioremap and iounmap functions
um: Allow building and running on older hosts
um: Avoid longjmp/setjmp symbol clashes with libpthread.a
um: console: Ignore console= option
um: Use os_warn to print out pre-boot warning/error messages
um: Add os_warn() for pre-boot warning/error messages
um: Use os_info for the messages on normal path
um: Add os_info() for pre-boot information messages
um: Use printk instead of printf in make_uml_dir
The user mode architecture does not provide ioremap or iounmap, and
because of this, the arch won't build when the functions are used in some
core libraries.
I have designs to use these functions in scatterlist.c where they'd
almost certainly never be called on the um architecture but it does need
to compile.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Stephen Bates <sbates@raithlin.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Add os_warn() for printing out pre-boot warning/error
messages in stderr. The messages via os_warn() are not
suppressed by quiet option.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Richard Weinberger <richard@nod.at>
Add os_info() for printing out pre-boot information
level messages in stderr. The messages via os_info()
are suppressed by "quiet" kernel command line.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Richard Weinberger <richard@nod.at>
The only user of thread_saved_pc() in non-arch-specific code was removed
in commit 8243d55977 ("sched/core: Remove pointless printout in
sched_show_task()"). Remove the implementations as well.
Some architectures use thread_saved_pc() in their arch-specific code.
Leave their thread_saved_pc() intact.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull x86 mm updates from Ingo Molnar:
"The main x86 MM changes in this cycle were:
- continued native kernel PCID support preparation patches to the TLB
flushing code (Andy Lutomirski)
- various fixes related to 32-bit compat syscall returning address
over 4Gb in applications, launched from 64-bit binaries - motivated
by C/R frameworks such as Virtuozzo. (Dmitry Safonov)
- continued Intel 5-level paging enablement: in particular the
conversion of x86 GUP to the generic GUP code. (Kirill A. Shutemov)
- x86/mpx ABI corner case fixes/enhancements (Joerg Roedel)
- ... plus misc updates, fixes and cleanups"
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (62 commits)
mm, zone_device: Replace {get, put}_zone_device_page() with a single reference to fix pmem crash
x86/mm: Fix flush_tlb_page() on Xen
x86/mm: Make flush_tlb_mm_range() more predictable
x86/mm: Remove flush_tlb() and flush_tlb_current_task()
x86/vm86/32: Switch to flush_tlb_mm_range() in mark_screen_rdonly()
x86/mm/64: Fix crash in remove_pagetable()
Revert "x86/mm/gup: Switch GUP to the generic get_user_page_fast() implementation"
x86/boot/e820: Remove a redundant self assignment
x86/mm: Fix dump pagetables for 4 levels of page tables
x86/mpx, selftests: Only check bounds-vs-shadow when we keep shadow
x86/mpx: Correctly report do_mpx_bt_fault() failures to user-space
Revert "x86/mm/numa: Remove numa_nodemask_from_meminfo()"
x86/espfix: Add support for 5-level paging
x86/kasan: Extend KASAN to support 5-level paging
x86/mm: Add basic defines/helpers for CONFIG_X86_5LEVEL=y
x86/paravirt: Add 5-level support to the paravirt code
x86/mm: Define virtual memory map for 5-level paging
x86/asm: Remove __VIRTUAL_MASK_SHIFT==47 assert
x86/boot: Detect 5-level paging support
x86/mm/numa: Remove numa_nodemask_from_meminfo()
...