Commit Graph

136 Commits

Author SHA1 Message Date
Tiwei Bie
f7e9077a16 um: Stop tracking stub's PID via userspace_pid[]
The PID of the stub process can be obtained from current_mm_id().
There is no need to track it via userspace_pid[]. Stop doing that
to simplify the code.

Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Link: https://patch.msgid.link/20250711065021.2535362-4-tiwei.bie@linux.dev
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-07-13 19:42:49 +02:00
Tiwei Bie
b3fb0eb5c2 um: Remove the pid parameter of handle_trap()
It's no longer used. Remove it.

Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Link: https://patch.msgid.link/20250711065021.2535362-3-tiwei.bie@linux.dev
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-07-13 19:42:49 +02:00
Tiwei Bie
cba737fa59 um: Use err consistently in userspace()
Avoid declaring a new variable 'ret' inside the 'if (using_seccomp)'
block, as the existing 'err' variable declared at the top of the
function already serves the same purpose.

Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Link: https://patch.msgid.link/20250711065021.2535362-2-tiwei.bie@linux.dev
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-07-13 19:42:49 +02:00
Tiwei Bie
159e76514b um: Make unscheduled_userspace_iterations static
It's only used within process.c. Make it static.

Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Link: https://patch.msgid.link/20250708090403.1067440-2-tiwei.bie@linux.dev
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-07-11 08:49:17 +02:00
Benjamin Berg
e92e255285 um: pass FD for memory operations when needed
Instead of always sharing the FDs with the userspace process, only hand
over the FDs needed for mmap when required. The idea is that userspace
might be able to force the stub into executing an mmap syscall, however,
it will not be able to manipulate the control flow sufficiently to have
access to an FD that would allow mapping arbitrary memory.

Security wise, we need to be sure that only the expected syscalls are
executed after the kernel sends FDs through the socket. This is
currently not the case, as userspace can trivially jump to the
rt_sigreturn syscall instruction to execute any syscall that the stub is
permitted to do. With this, it can trick the kernel to send the FD,
which in turn allows userspace to freely map any physical memory.

As such, this is currently *not* secure. However, in principle the
approach should be fine with a more strict SECCOMP filter and a careful
review of the stub control flow (as userspace can prepare a stack). With
some care, it is likely possible to extend the security model to SMP if
desired.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20250602130052.545733-8-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-06-02 16:20:10 +02:00
Benjamin Berg
406d17c6c3 um: Implement kernel side of SECCOMP based process handling
This adds the kernel side of the seccomp based process handling.

Co-authored-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Benjamin Berg <benjamin@sipsolutions.net>
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20250602130052.545733-6-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-06-02 15:17:19 +02:00
Benjamin Berg
8420e08fe3 um: Track userspace children dying in SECCOMP mode
When in seccomp mode, we would hang forever on the futex if a child has
died unexpectedly. In contrast, ptrace mode will notice it and kill the
corresponding thread when it fails to run it.

Fix this issue using a new IRQ that is fired after a SIGCHLD and keeping
an (internal) list of all MMs. In the IRQ handler, find the affected MM
and set its PID to -1 as well as the futex variable to FUTEX_IN_KERN.

This, together with futex returning -EINTR after the signal is
sufficient to implement a race-free detection of a child dying.

Note that this also enables IRQ handling while starting a userspace
process. This should be safe and SECCOMP requires the IRQ in case the
process does not come up properly.

Signed-off-by: Benjamin Berg <benjamin@sipsolutions.net>
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20250602130052.545733-5-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-06-02 15:17:19 +02:00
Benjamin Berg
247ed9e4a6 um: Move faultinfo extraction into userspace routine
The segv handler is called slightly differently depending on whether
PTRACE_FULL_FAULTINFO is set or not (32bit vs. 64bit). The only
difference is that we don't try to pass the registers and instruction
pointer to the segv handler.

It would be good to either document or remove the difference, but I do
not know why this difference exists. And, passing NULL can even result
in a crash.

Signed-off-by: Benjamin Berg <benjamin@sipsolutions.net>
Link: https://patch.msgid.link/20250602130052.545733-2-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-06-02 15:17:19 +02:00
Johannes Berg
d1d7f01f7c um: mark rodata read-only and implement _nofault accesses
Mark read-only data actually read-only (simple mprotect), and
to be able to test it also implement _nofault accesses. This
works by setting up a new "segv_continue" pointer in current,
and then when we hit a segfault we change the signal return
context so that we continue at that address. The code using
this sets it up so that it jumps to a label and then aborts
the access that way, returning -EFAULT.

It's possible to optimize the ___backtrack_faulted() thing by
using asm goto (compiler version dependent) and/or gcc's (not
sure if clang has it) &&label extension, but at least in one
attempt I made the && caused the compiler to not load -EFAULT
into the register in case of jumping to the &&label from the
fault handler. So leave it like this for now.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Co-developed-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20250210160926.420133-2-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-03-18 11:03:14 +01:00
Benjamin Berg
f82a9e7b9f um: fix execve stub execution on old host OSs
The stub execution uses the somewhat new close_range and execveat
syscalls. Of these two, the execveat call is essential, but the
close_range call is more about stub process hygiene rather than safety
(and its result is ignored).

Replace both calls with a raw syscall as older machines might not have a
recent enough kernel for close_range (with CLOSE_RANGE_CLOEXEC) or a
libc that does not yet expose both of the syscalls.

Fixes: 32e8eaf263 ("um: use execveat to create userspace MMs")
Reported-by: Glenn Washburn <development@efficientek.com>
Closes: https://lore.kernel.org/20250108022404.05e0de1e@crass-HP-ZBook-15-G2
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20250113094107.674738-1-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2025-02-12 23:40:45 +01:00
Benjamin Berg
2f278b5957 um: always include kconfig.h and compiler-version.h
Since commit a95b37e20d ("kbuild: get <linux/compiler_types.h> out of
<linux/kconfig.h>") we can safely include these files in userspace code.
Doing so simplifies matters as options do not need to be exported via
asm-offsets.h anymore.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20241103150506.1367695-2-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-11-07 17:36:30 +01:00
Benjamin Berg
0b8b2668f9 um: insert scheduler ticks when userspace does not yield
In time-travel mode userspace can do a lot of work without any time
passing. Unfortunately, this can result in OOM situations as the RCU
core code will never be run.

Work around this by keeping track of userspace processes that do not
yield for a lot of operations. When this happens, insert a jiffie into
the sched_clock clock to account time against the process and cause the
bookkeeping to run.

As sched_clock is used for tracing, it is useful to keep it in sync
between the different VMs. As such, try to remove added ticks again when
the actual clock ticks.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20241010142537.1134685-1-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-23 09:52:49 +02:00
Tiwei Bie
2717c6b649 um: Abandon the _PAGE_NEWPROT bit
When a PTE is updated in the page table, the _PAGE_NEWPAGE bit will
always be set. And the corresponding page will always be mapped or
unmapped depending on whether the PTE is present or not. The check
on the _PAGE_NEWPROT bit is not really reachable. Abandoning it will
allow us to simplify the code and remove the unreachable code.

Reviewed-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Link: https://patch.msgid.link/20241011102354.1682626-2-tiwei.btw@antgroup.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-23 09:52:49 +02:00
Tiwei Bie
45aa6026d1 um: Do not propagate noreboot parameter to kernel
This parameter is UML specific and is unknown to kernel. It should not
be propagated to kernel, otherwise it could be passed to user space as
a command line option by kernel with a warning like:

Unknown kernel command line parameters "noreboot", will be passed to user space.

Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Link: https://patch.msgid.link/20241011040441.1586345-6-tiwei.btw@antgroup.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-23 09:52:48 +02:00
Benjamin Berg
32e8eaf263 um: use execveat to create userspace MMs
Using clone will not undo features that have been enabled by libc. An
example of this already happening is rseq, which could cause the kernel
to read/write memory of the userspace process. In the future the
standard library might also use mseal by default to protect itself,
which would also thwart our attempts at unmapping everything.

Solve all this by taking a step back and doing an execve into a tiny
static binary that sets up the minimal environment required for the
stub without using any standard library. That way we have a clean
execution environment that is fully under the control of UML.

Note that this changes things a bit as the FDs are not anymore shared
with the kernel. Instead, we explicitly share the FDs for the physical
memory and all existing iomem regions. Doing this is fine, as iomem
regions cannot be added at runtime.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20240919124511.282088-3-benjamin@sipsolutions.net
[use pipe() instead of pipe2(), remove unneeded close() calls]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-10 13:37:16 +02:00
Benjamin Berg
c6ce72005d um: remove auxiliary FP registers
We do not need the extra save/restore of the FP registers when getting
the fault information. This was originally added in commit 2f56debd77
("uml: fix FP register corruption") but at that time the code was not
saving/restoring the FP registers when switching to userspace. This was
fixed in commit fbfe9c847e ("um: Save FPU registers between task
switches") and since then the auxiliary registers have not been useful.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20241004233821.2130874-1-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-10 12:10:30 +02:00
Tiwei Bie
59376fb2a7 um: Remove unused mm_fd field from mm_id
It's no longer used since the removal of the SKAS3/4 support.

Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2024-09-12 20:36:22 +02:00
Benjamin Berg
bcf3d957c6 um: refactor TLB update handling
Conceptually, we want the memory mappings to always be up to date and
represent whatever is in the TLB. To ensure that, we need to sync them
over in the userspace case and for the kernel we need to process the
mappings.

The kernel will call flush_tlb_* if page table entries that were valid
before become invalid. Unfortunately, this is not the case if entries
are added.

As such, change both flush_tlb_* and set_ptes to track the memory range
that has to be synchronized. For the kernel, we need to execute a
flush_tlb_kern_* immediately but we can wait for the first page fault in
case of set_ptes. For userspace in contrast we only store that a range
of memory needs to be synced and do so whenever we switch to that
process.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20240703134536.1161108-13-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-03 17:09:50 +02:00
Benjamin Berg
573a446fc8 um: simplify and consolidate TLB updates
The HVC update was mostly used to compress consecutive calls into one.
This is mostly relevant for userspace where it is already handled by the
syscall stub code.

Simplify the whole logic and consolidate it for both kernel and
userspace. This does remove the sequential syscall compression for the
kernel, however that shouldn't be the main factor in most runs.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20240703134536.1161108-12-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-03 17:09:50 +02:00
Benjamin Berg
3c83170d7c um: Delay flushing syscalls until the thread is restarted
As running the syscalls is expensive due to context switches, we should
do so as late as possible in case more syscalls need to be queued later
on. This will also benefit a later move to a SECCOMP enabled userspace
as in that case the need for extra context switches is removed entirely.

Signed-off-by: Benjamin Berg <benjamin@sipsolutions.net>
Link: https://patch.msgid.link/20240703134536.1161108-9-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-03 17:09:49 +02:00
Benjamin Berg
a5d2cfe749 um: remove copy_context_skas0
The kernel flushes the memory ranges anyway for CoW and does not assume
that the userspace process has anything set up already. So, start with a
fresh process for the new mm context.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20240703134536.1161108-8-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-03 17:09:49 +02:00
Benjamin Berg
6d8992e49e um: compress memory related stub syscalls while adding them
To keep the number of syscalls that the stub has to do lower, compress
two consecutive syscalls of the same type if the second is just a
continuation of the first.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20240703134536.1161108-6-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-03 17:09:49 +02:00
Benjamin Berg
76ed9158e1 um: Rework syscall handling
Rework syscall handling to be platform independent. Also create a clean
split between queueing of syscalls and flushing them out, removing the
need to keep state in the code that triggers the syscalls.

The code adds syscall_data_len to the global mm_id structure. This will
be used later to allow surrounding code to track whether syscalls still
need to run and if errors occurred.

Signed-off-by: Benjamin Berg <benjamin@sipsolutions.net>
Link: https://patch.msgid.link/20240703134536.1161108-5-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-03 17:09:49 +02:00
Benjamin Berg
dc26184a9d um: Create signal stack memory assignment in stub_data
When we switch to use seccomp, we need both the signal stack and other
data (i.e. syscall information) to co-exist in the stub data. To
facilitate this, start by defining separate memory areas for the stack
and syscall data.

This moves the signal stack onto a new page as the memory area is not
sufficient to hold both signal stack and syscall information.

Only change the signal stack setup for now, as the syscall code will be
reworked later.

Signed-off-by: Benjamin Berg <benjamin@sipsolutions.net>
Link: https://patch.msgid.link/20240703134536.1161108-3-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-03 17:09:48 +02:00
Mordechay Goodstein
6555acdefc um: time-travel: support time-travel protocol broadcast messages
Add a message type to the time-travel protocol to broadcast
a small (64-bit) value to all participants in a simulation.
The main use case is to have an identical message come to
all participants in a simulation, e.g. to separate out logs
for different tests running in a single simulation.

Down in the guts of time_travel_handle_message() we can't
use printk() and not even printk_deferred(), so just store
the message and print it at the start of the userspace()
function.

Unfortunately this means that other prints in the kernel
can actually bypass the message, but in most cases where
this is used, for example to separate test logs, userspace
will be involved. Also, even if we could use
printk_deferred(), we'd still need to flush it out in the
userspace() function since otherwise userspace messages
might cross it.

As a result, this is a reasonable compromise, there's no
need to have any core changes and it solves the main use
case we have for it.

Signed-off-by: Mordechay Goodstein <mordechay.goodstein@intel.com>
Link: https://patch.msgid.link/20240702192118.c4093bc5b15e.I2ca8d006b67feeb866ac2017af7b741c9e06445a@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-03 12:24:22 +02:00
Tiwei Bie
847d3abc6a um: Add an internal header shared among the user code
Move relevant declarations to this header. This will address
below -Wmissing-prototypes warnings:

arch/um/os-Linux/elf_aux.c:26:13: warning: no previous prototype for ‘scan_elf_aux’ [-Wmissing-prototypes]
arch/um/os-Linux/mem.c:213:13: warning: no previous prototype for ‘check_tmpexec’ [-Wmissing-prototypes]
arch/um/os-Linux/skas/process.c:107:6: warning: no previous prototype for ‘wait_stub_done’ [-Wmissing-prototypes]

Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2024-04-30 14:16:44 +02:00
Benjamin Berg
6d64095ea8 um: Do not use printk in userspace trampoline
The trampoline is running in a cloned process. It is not safe to use
printk for error printing there.

Signed-off-by: Benjamin Berg <benjamin@sipsolutions.net>
Signed-off-by: Richard Weinberger <richard@nod.at>
2024-01-04 23:51:47 +01:00
Benjamin Berg
5713533794 um: Drop NULL check from start_userspace
start_userspace is only called from exactly one location, and the passed
pointer for the userspace process stack cannot be NULL.

Remove the check, without changing the control flow.

Signed-off-by: Benjamin Berg <benjamin@sipsolutions.net>
Signed-off-by: Richard Weinberger <richard@nod.at>
2024-01-04 23:31:05 +01:00
Benjamin Berg
a55719847d um: Drop support for hosts without SYSEMU_SINGLESTEP support
These features have existed since Linux 2.6.14 and can be considered
widely available at this point. Also drop the backward compatibility
code for PTRACE_SETOPTIONS.

Signed-off-by: Benjamin Berg <benjamin@sipsolutions.net>

----

v2:
 * Continue to define PTRACE_SYSEMU_SINGLESTEP as glibc only added it in
   version 2.27.
Signed-off-by: Richard Weinberger <richard@nod.at>
2024-01-04 23:29:11 +01:00
Johannes Berg
6032aca0de um: make stub data pages size tweakable
There's a lot of code here that hard-codes that the
data is a single page, and right now that seems to
be sufficient, but to make it easier to change this
in the future, add a new STUB_DATA_PAGES constant
and use it throughout the code.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2023-04-20 23:08:43 +02:00
Linus Torvalds
64e851689e This pull request contains the following changes for UML:
- Add support for rust (yay!)
 - Add support for LTO
 - Add platform bus support to virtio-pci
 - Various virtio fixes
 - Coding style, spelling cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAmP/AVYWHHJpY2hhcmRA
 c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7wUQvEADRVll8yCg+a4C1BORSefv0FQ/I
 z+3jiUtzq/ABGBf9S0TYEfaGcOn1LXFzlEgtqf4kd3vmzyIVG0pHUt3BxaqLSSTU
 IjDZkbtZ2GC0i7EK8D3iAC+lvew+TjBMfgEjhrNyni3VeYNDBh8EkUseWIbrNX8Q
 pqoUQwYlVdjY6PedtcdGDko70Fiy4OMIK7lat7JDtuoL5pvMEadbR1D7ClfiYRIh
 NGg5mSnfBBTGc20ochDkHUhubzagtSCDHvNe2SiYDBrM5sjeSANecsICmymWS+Pm
 aJtYybwpC4tl9J25O4OTD3SyfKxZ93mKwK89Tw9ryqQV2rmXG+3qB2hbZ0zpi75I
 vpgDrfv+VC+4daC0Wp8zjoeSE6zUbCKVE1s307EC5fjYyIoHjSUAsPE9GrNaJi5K
 91WVe1x8Dnpfq9/ZO8o3sLqftBo3aVH21dGVuqi5qS6OjRqDMkFkaY31nUjVXELV
 tEBj6n9UoyqPFzcgvsQfTcCjjlMiVL+JU+sl2L7dQXTNev1/RReTJngdK/vv7Epo
 BjLpGfn+1I/8dlbsyjLt0FOIwhIbUf+8RbWpezENGVgKP81iqEQPOdax/yBEhCKm
 NWduDz2itQn94KNMcUoPq+G3xoG3dOW7lXlEzXP6ZbLkcuIyXpeNZeJNlvq7J/z/
 2PBy61Ngs/DDUK/doQ==
 =q2ks
 -----END PGP SIGNATURE-----

Merge tag 'uml-for-linus-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux

Pull UML updates from Richard Weinberger:

 - Add support for rust (yay!)

 - Add support for LTO

 - Add platform bus support to virtio-pci

 - Various virtio fixes

 - Coding style, spelling cleanups

* tag 'uml-for-linus-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux: (27 commits)
  Documentation: rust: Fix arch support table
  uml: vector: Remove unused definitions VECTOR_{WRITE,HEADERS}
  um: virt-pci: properly remove PCI device from bus
  um: virtio_uml: move device breaking into workqueue
  um: virtio_uml: mark device as unregistered when breaking it
  um: virtio_uml: free command if adding to virtqueue failed
  UML: define RUNTIME_DISCARD_EXIT
  virt-pci: add platform bus support
  um-virt-pci: Make max delay configurable
  um: virt-pci: implement pcibios_get_phb_of_node()
  um: Support LTO
  um: put power options in a menu
  um: Use CFLAGS_vmlinux
  um: Prevent building modules incompatible with MODVERSIONS
  um: Avoid pcap multiple definition errors
  um: Make the definition of cpu_data more compatible
  x86: um: vdso: Add '%rcx' and '%r11' to the syscall clobber list
  rust: arch/um: Add support for CONFIG_RUST under x86_64 UML
  rust: arch/um: Disable FP/SIMD instruction to match x86
  rust: arch/um: Use 'pie' relocation mode under UML
  ...
2023-03-01 09:13:00 -08:00
Masahiro Yamada
67d7c3023a kbuild: remove --include-dir MAKEFLAG from top Makefile
I added $(srctree)/ to some included Makefiles in the following commits:

 - 3204a7fb98 ("kbuild: prefix $(srctree)/ to some included Makefiles")
 - d828563955 ("kbuild: do not require sub-make for separate output tree builds")

They were a preparation for removing --include-dir flag.

I have never thought --include-dir useful. Rather, it _is_ harmful.

For example, run the following commands:

  $ make -s ARCH=x86 mrproper defconfig
  $ make ARCH=arm O=foo dtbs
  make[1]: Entering directory '/tmp/linux/foo'
    HOSTCC  scripts/basic/fixdep
  Error: kernelrelease not valid - run 'make prepare' to update it
    UPD     include/config/kernel.release
  make[1]: Leaving directory '/tmp/linux/foo'

The first command configures the source tree for x86. The next command
tries to build ARM device trees in the separate foo/ directory - this
must stop because the directory foo/ has not been configured yet.

However, due to --include-dir=$(abs_srctree), the top Makefile includes
the wrong include/config/auto.conf from the source tree and continues
building. Kbuild traverses the directory tree, but of course it does
not work correctly. The Error message is also pointless - 'make prepare'
does not help at all for fixing the issue.

This commit fixes more arch Makefile, and finally removes --include-dir
from the top Makefile.

There are more breakages under drivers/, but I do not volunteer to fix
them all. I just moved --include-dir to drivers/Makefile.

With this commit, the second command will stop with a sensible message.

  $ make -s ARCH=x86 mrproper defconfig
  $ make ARCH=arm O=foo dtbs
  make[1]: Entering directory '/tmp/linux/foo'
    SYNC    include/config/auto.conf.cmd
  ***
  *** The source tree is not clean, please run 'make ARCH=arm mrproper'
  *** in /tmp/linux
  ***
  make[2]: *** [../Makefile:646: outputmakefile] Error 1
  /tmp/linux/Makefile:770: include/config/auto.conf.cmd: No such file or directory
  make[1]: *** [/tmp/linux/Makefile:793: include/config/auto.conf.cmd] Error 2
  make[1]: Leaving directory '/tmp/linux/foo'
  make: *** [Makefile:226: __sub-make] Error 2

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2023-02-05 18:51:22 +09:00
Benjamin Berg
d119595f87 um: Switch printk calls to adhere to correct coding style
This means having the string literal in one line and using __func__
where appropriate.

Signed-off-by: Benjamin Berg <benjamin@sipsolutions.net>
Signed-off-by: Richard Weinberger <richard@nod.at>
2023-02-01 22:11:28 +01:00
Guenter Roeck
8970d5c9f4 um: Replace to_phys() and to_virt() with less generic function names
to_virt() and to_phys() are very generic and may be defined by drivers.
As it turns out, commit 9409c9b670 ("pmem: refactor pmem_clear_poison()")
did exactly that. This results in build errors such as the following
when trying to build um:allmodconfig.

drivers/nvdimm/pmem.c: In function ‘pmem_dax_zero_page_range’:
./arch/um/include/asm/page.h:105:20: error:
			too few arguments to function ‘to_phys’
  105 | #define __pa(virt) to_phys((void *) (unsigned long) (virt))
      |                    ^~~~~~~

Use less generic function names for the um specific to_phys() and to_virt()
functions to fix the problem and to avoid similar problems in the future.

Fixes: 9409c9b670 ("pmem: refactor pmem_clear_poison()")
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2022-07-17 23:44:40 +02:00
Jason A. Donenfeld
dda520d07b um: add "noreboot" command line option for PANIC_TIMEOUT=-1 setups
QEMU has a -no-reboot option, which halts instead of reboots when the
guest asks to reboot. This is invaluable when used with
CONFIG_PANIC_TIMEOUT=-1 (and panic_on_warn), because it allows panics
and warnings to be caught immediately in CI. Implement this in UML too,
by way of a basic setup param.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2022-07-17 23:41:02 +02:00
YiFei Zhu
558f9b2f94 um: Fix stack pointer alignment
GCC assumes that stack is aligned to 16-byte on call sites [1].
Since GCC 8, GCC began using 16-byte aligned SSE instructions to
implement assignments to structs on stack. When
CC_OPTIMIZE_FOR_PERFORMANCE is enabled, this affects
os-Linux/sigio.c, write_sigio_thread:

  struct pollfds *fds, tmp;
  tmp = current_poll;

Note that struct pollfds is exactly 16 bytes in size.
GCC 8+ generates assembly similar to:

  movdqa (%rdi),%xmm0
  movaps %xmm0,-0x50(%rbp)

This is an issue, because movaps will #GP if -0x50(%rbp) is not
aligned to 16 bytes [2], and how rbp gets assigned to is via glibc
clone thread_start, then function prologue, going though execution
trace similar to (showing only relevant instructions):

  sub    $0x10,%rsi
  mov    %rcx,0x8(%rsi)
  mov    %rdi,(%rsi)
  syscall
  pop    %rax
  pop    %rdi
  callq  *%rax
  push   %rbp
  mov    %rsp,%rbp

The stack pointer always points to the topmost element on stack,
rather then the space right above the topmost. On push, the
pointer decrements first before writing to the memory pointed to
by it. Therefore, there is no need to have the stack pointer
pointer always point to valid memory unless the stack is poped;
so the `- sizeof(void *)` in the code is unnecessary.

On the other hand, glibc reserves the 16 bytes it needs on stack
and pops itself, so by the call instruction the stack pointer
is exactly the caller-supplied sp. It then push the 16 bytes of
the return address and the saved stack pointer, so the base
pointer will be 16-byte aligned if and only if the caller
supplied sp is 16-byte aligned. Therefore, the caller must supply
a 16-byte aligned pointer, which `stack + UM_KERN_PAGE_SIZE`
already satisfies.

On a side note, musl is unaffected by this issue because it forces
16 byte alignment via `and $-16,%rsi` in its clone wrapper.
Similarly, glibc i386 is also unaffected because it has
`andl $0xfffffff0, %ecx`.

To reproduce this bug, enable CONFIG_UML_RTC and
CC_OPTIMIZE_FOR_PERFORMANCE. uml_rtc will call
add_sigio_fd which will then cause write_sigio_thread to either go
into segfault loop or panic with "Segfault with no mm".

Similarly, signal stacks will be aligned by the host kernel upon
signal delivery. `- sizeof(void *)` to sigaltstack is
unconventional and extraneous.

On a related note, initialization of longjmp buffers do require
`- sizeof(void *)`. This is to account for the return address
that would have been pushed to the stack at the call site.

The reason for uml to respect 16-byte alignment, rather than
telling GCC to assume 8-byte alignment like the host kernel since
commit d9b0cde91c ("x86-64, gcc: Use
-mpreferred-stack-boundary=3 if supported"), is because uml links
against libc. There is no reason to assume libc is also compiled
with that flag and assumes 8-byte alignment rather than 16-byte.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40838
[2] https://c9x.me/x86/html/file_module_x86_id_180.html

Signed-off-by: YiFei Zhu <zhuyifei1999@gmail.com>
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Richard Weinberger <richard@nod.at>
2021-06-17 22:08:31 +02:00
Johannes Berg
bfc58e2b98 um: remove process stub VMA
This mostly reverts the old commit 3963333fe6 ("uml: cover stubs
with a VMA") which had added a VMA to the existing PTEs. However,
there's no real reason to have the PTEs in the first place and the
VMA cannot be 'fixed' in place, which leads to bugs that userspace
could try to unmap them and be forcefully killed, or such. Also,
there's a bit of an ugly hole in userspace's address space.

Simplify all this: just install the stub code/page at the top of
the (inner) address space, i.e. put it just above TASK_SIZE. The
pages are simply hard-coded to be mapped in the userspace process
we use to implement an mm context, and they're out of reach of the
inner mmap/munmap/mprotect etc. since they're above TASK_SIZE.

Getting rid of the VMA also makes vma_merge() no longer hit one of
the VM_WARN_ON()s there because we installed a VMA while the code
assumes the stack VMA is the first one.

It also removes a lockdep warning about mmap_sem usage since we no
longer have uml_setup_stubs() and thus no longer need to do any
manipulation that would require mmap_sem in activate_mm().

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2021-02-12 21:37:38 +01:00
Johannes Berg
9f0b4807a4 um: rework userspace stubs to not hard-code stub location
The userspace stacks mostly have a stack (and in the case of the
syscall stub we can just set their stack pointer) that points to
the location of the stub data page already.

Rework the stubs to use the stack pointer to derive the start of
the data page, rather than requiring it to be hard-coded.

In the clone stub, also integrate the int3 into the stack remap,
since we really must not use the stack while we remap it.

This prepares for putting the stub at a variable location that's
not part of the normal address space of the userspace processes
running inside the UML machine.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2021-02-12 21:35:02 +01:00
Johannes Berg
84b2789d61 um: separate child and parent errors in clone stub
If the two are mixed up, then it looks as though the parent
returned an error if the child failed (before) the mmap(),
and then the resulting process never gets killed. Fix this
by splitting the child and parent errors, reporting and
using them appropriately.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2021-02-12 21:34:33 +01:00
Johannes Berg
a7d48886ca um: defer killing userspace on page table update failures
In some cases we can get to fix_range_common() with mmap_sem held,
and in others we get there without it being held. For example, we
get there with it held from sys_mprotect(), and without it held
from fork_handler().

Avoid any issues in this and simply defer killing the task until
it runs the next time. Do it on the mm so that another task that
shares the same mm can't continue running afterwards.

Cc: stable@vger.kernel.org
Fixes: 468f65976a ("um: Fix hung task in fix_range_common()")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2021-02-12 21:32:04 +01:00
Johannes Berg
e1e22d0d91 um: print register names in wait_for_stub
Since we're basically debugging the userspace (it runs in ptrace)
it's useful to dump out the registers - but they're not readable,
so if something goes wrong it's hard to say what. Print the names
of registers in the register dump so it's easier to look at.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2021-02-12 21:30:19 +01:00
Anton Ivanov
3c6ac61bc9 um: Fetch registers only for signals which need them
UML userspace fetches siginfo and passes it to signal handlers
in UML. This is needed only for some of the signals, because
key handlers like SIGIO make no use of this variable.

Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2020-12-13 22:38:06 +01:00
Alex Dewar
97870c34b4 um: Add SPDX headers for files in arch/um/os-Linux
Convert files to use SPDX header. All files are licensed under the GPLv2.

Signed-off-by: Alex Dewar <alex.dewar@gmx.co.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2019-09-15 21:37:17 +02:00
Johannes Berg
0dafcbe128 um: Implement TRACE_IRQFLAGS_SUPPORT
UML enables TRACE_IRQFLAGS_SUPPORT but doesn't actually implement
it. It seems to have been added for lockdep support, but that can't
actually really work well without IRQ flags tracing, as is also
very noisily reported when enabling CONFIG_DEBUG_LOCKDEP.

Implement it now.

Fixes: 711553efa5 ("[PATCH] uml: declare in Kconfig our partial LOCKDEP support")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2019-09-15 21:37:11 +02:00
Richard Weinberger
7ff1e34bbd um: Give start_idle_thread() a return code
Fixes:
arch/um/os-Linux/skas/process.c:613:1: warning: control reaches end of
non-void function [-Wreturn-type]

longjmp() never returns but gcc still warns that the end of the function
can be reached.
Add a return code and debug aid to detect this impossible case.

Signed-off-by: Richard Weinberger <richard@nod.at>
2018-10-29 22:23:11 +01:00
Thomas Meyer
6f602afda7 um: Fix FP register size for XSTATE/XSAVE
Hard code max size. Taken from
https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/common/x86-xstate.h

Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: Richard Weinberger <richard@nod.at>
2017-09-13 22:24:38 +02:00
Thomas Meyer
e909983026 um: Add kerneldoc for userspace_tramp() and start_userspace()
Also use correct function name spelling (stub_segv_handler) for better grepping

Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: Richard Weinberger <richard@nod.at>
2017-07-07 10:54:03 +02:00
Thomas Meyer
22e19c8d06 um: userspace - be more verbose in ptrace set regs error
When ptrace fails to set GP/FP regs for the target process,
log the error before crashing the UML kernel.

Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: Richard Weinberger <richard@nod.at>
2017-07-07 10:37:42 +02:00
Richard Weinberger
ce4586063f um: Add missing NR_CPUS include
We need linux/threads.h for that variable.

Fixes: 8bba077066 ("um: Set number of CPUs")
Signed-off-by: Richard Weinberger <richard@nod.at>
2017-05-04 08:15:10 +02:00
Nikola Kotur
8bba077066 um: Set number of CPUs
Define NR_CPUS required by the timer subsystem.

Fixes this make warning:

    scripts/kconfig/conf  --oldconfig arch/x86/um/Kconfig
    kernel/time/Kconfig:155:warning: range is invalid

Signed-off-by: Nikola Kotur <kotnick@gmail.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2017-05-03 22:33:52 +02:00