Jisheng Zhang <jszhang@kernel.org> says:
This series tries to improve link time handling of riscv:
patch1 adds the missing RUNTIME_DISCARD_EXIT as suggested by Masahiro.
Similar as other architectures such as x86, arm64 and so on, enable
ARCH_WANT_LD_ORPHAN_WARN to enable linker orphan warnings to prevent
from missing any new sections in future. So the following two patches
are preparation ones, and the last patch finally selects
ARCH_WANT_LD_ORPHAN_WARN
* b4-shazam-merge:
riscv: select ARCH_WANT_LD_ORPHAN_WARN for !XIP_KERNEL
riscv: vmlinux.lds.S: explicitly catch .init.bss sections from EFI stub
riscv: vmlinux.lds.S: explicitly catch .riscv.attributes sections
riscv: vmlinux.lds.S: explicitly catch .rela.dyn symbols
riscv: lds: define RUNTIME_DISCARD_EXIT
Link: https://lore.kernel.org/r/20230119155417.2600-1-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
While working on something else, I noticed that the kernel would start
accepting interrupts again after crashing in an interrupt handler. Since
the kernel is already in inconsistent state, enabling interrupts is
dangerous and opens up risk of kernel state deteriorating further.
Interrupts do get enabled via what looks like an unintended side effect of
spin_unlock_irq, so switch to the more cautious
spin_lock_irqsave/spin_unlock_irqrestore instead.
Fixes: 76d2a0493a ("RISC-V: Init and Halt Code")
Signed-off-by: Mattias Nissler <mnissler@rivosinc.com>
Reviewed-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/r/20230215144828.3370316-1-mnissler@rivosinc.com
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
As Andrew reported,
Zb* comes after Zi* according 27.11 "Subset Naming Convention"
so fix the ordering accordingly.
Reported-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230208225328.1636017-2-heiko@sntech.de
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
When enabling linker orphan section warning, I got warnings similar as
below:
ld.lld: warning:
./drivers/firmware/efi/libstub/lib.a(efi-stub-helper.stub.o):(.init.bss)
is being placed in '.init.bss'
Catch the sections so that we can enable linker orphan section warning.
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Link: https://lore.kernel.org/r/20230119155417.2600-5-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
When enabling linker orphan section warning, I got warnings similar as
below:
riscv64-linux-gnu-ld: warning: orphan section `.riscv.attributes' from
`init/main.o' being placed in section `.riscv.attributes'
While I don't see any usage of .riscv.attributes sections' in kernel
now, just catch the sections so that we can enable linker orphan
section warning.
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Link: https://lore.kernel.org/r/20230119155417.2600-4-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
When enabling linker orphan section warning, I got warnings similar as
below:
riscv64-linux-gnu-ld: warning: orphan section `.rela.text' from
`init/main.o' being placed in section `.rela.dyn'
Use the approach similar as ARM64 does and declare it in vmlinux.lds.S
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Link: https://lore.kernel.org/r/20230119155417.2600-3-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
riscv discards .exit.* sections at run-time but doesn't define
RUNTIME_DISCARD_EXIT. However, the .exit.* sections are still allocated
and kept even if the generic DISCARDS would discard the sections due
to missing RUNTIME_DISCARD_EXIT, because the DISCARD sits at the end of
the linker script. Add the missing RUNTIME_DISCARD_EXIT define so that
it still works if we move DISCARD up or even at the beginning of the
linker script.
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Suggested-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230119155417.2600-2-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Core:
- Yet another round of improvements to make the clocksource watchdog
more robust:
- Relax the clocksource-watchdog skew criteria to match the NTP
criteria.
- Temporarily skip the watchdog when high memory latencies are
detected which can lead to false-positives.
- Provide an option to enable TSC skew detection even on systems
where TSC is marked as reliable.
Sigh!
- Initialize the restart block in the nanosleep syscalls to be directed
to the no restart function instead of doing a partial setup on entry.
This prevents an erroneous restart_syscall() invocation from
corrupting user space data. While such a situation is clearly a user
space bug, preventing this is a correctness issue and caters to the
least suprise principle.
- Ignore the hrtimer slack for realtime tasks in schedule_hrtimeout()
to align it with the nanosleep semantics.
Drivers:
- The obligatory new driver bindings for Mediatek, Rockchip and RISC-V
variants.
- Add support for the C3STOP misfeature to the RISC-V timer to handle
the case where the timer stops in deeper idle state.
- Set up a static key in the RISC-V timer correctly before first use.
- The usual small improvements and fixes all over the place
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmPzV+cTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoYlDEACMrjN2F6qeiOW94t4nQ3qP1M9AMSgO
OihC04XuM14/3tEviu/cUOd60wYcUQ/kfI5C+IL35ezeP2w9lnuKqeFpG7aDOa33
5F3isDPamJdXZEZs44CW15brR6dqDlEi5acKee/TtFV9mN6xNhzxM64IaFqecPmW
P+BTwunB8xwquY8RzsHXor/GOGb6mqWQIPoHEPnywTDe/xQYWt0Exzi7ch6HQr5Z
ZzHG6X4h6UTNimjay6L4qsRQWILmPIg4Z5IlycWMQ8qDFM0lbnIJqkG4JwceolI6
aRQyLe3NQFcPYgq3ue+SNm4RckYn4NbAa1zFm0d5VDgKp4xW1sxvtkxOJuxjaOw2
/rLkHkmyuVvCeTMAySfxrwnszAoM505CHC6CEYc1xELbeCkROFUaymtVyNFnnTru
V/Jt/T2Gyx6tOrafX7u+djUjv9figddRpNbskVZvEi3Ztq4MQ069nK3oSUqtP5vO
INApNg4lq6s8aGqVE+Kp9+CKwGqZqI4MdxQMNMAmCRLPon6apActVawbj18qO/wS
qblQ0cbF8a16itlQ3V68qmhcPh6EZOuq8II4etNq6U0ulV9712WfMbat3z53LG94
QNkAmZ3/wui93I+Q2NPxhf5ybJFQZhR0SOtVO6xIdTgOntkODwzzGu9UapfD8mLb
k5BpWnH8CoUgiw==
=I67j
-----END PGP SIGNATURE-----
Merge tag 'timers-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
"Updates for timekeeping, timers and clockevent/source drivers:
Core:
- Yet another round of improvements to make the clocksource watchdog
more robust:
- Relax the clocksource-watchdog skew criteria to match the NTP
criteria.
- Temporarily skip the watchdog when high memory latencies are
detected which can lead to false-positives.
- Provide an option to enable TSC skew detection even on systems
where TSC is marked as reliable.
Sigh!
- Initialize the restart block in the nanosleep syscalls to be
directed to the no restart function instead of doing a partial
setup on entry.
This prevents an erroneous restart_syscall() invocation from
corrupting user space data. While such a situation is clearly a
user space bug, preventing this is a correctness issue and caters
to the least suprise principle.
- Ignore the hrtimer slack for realtime tasks in schedule_hrtimeout()
to align it with the nanosleep semantics.
Drivers:
- The obligatory new driver bindings for Mediatek, Rockchip and
RISC-V variants.
- Add support for the C3STOP misfeature to the RISC-V timer to handle
the case where the timer stops in deeper idle state.
- Set up a static key in the RISC-V timer correctly before first use.
- The usual small improvements and fixes all over the place"
* tag 'timers-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (30 commits)
clocksource/drivers/timer-sun4i: Add CLOCK_EVT_FEAT_DYNIRQ
clocksource/drivers/em_sti: Mark driver as non-removable
clocksource/drivers/sh_tmu: Mark driver as non-removable
clocksource/drivers/riscv: Patch riscv_clock_next_event() jump before first use
clocksource/drivers/timer-microchip-pit64b: Add delay timer
clocksource/drivers/timer-microchip-pit64b: Select driver only on ARM
dt-bindings: timer: sifive,clint: add comaptibles for T-Head's C9xx
dt-bindings: timer: mediatek,mtk-timer: add MT8365
clocksource/drivers/riscv: Get rid of clocksource_arch_init() callback
clocksource/drivers/sh_cmt: Mark driver as non-removable
clocksource/drivers/timer-microchip-pit64b: Drop obsolete dependency on COMPILE_TEST
clocksource/drivers/riscv: Increase the clock source rating
clocksource/drivers/timer-riscv: Set CLOCK_EVT_FEAT_C3STOP based on DT
dt-bindings: timer: Add bindings for the RISC-V timer device
RISC-V: time: initialize hrtimer based broadcast clock event device
dt-bindings: timer: rk-timer: Add rktimer for rv1126
time/debug: Fix memory leak with using debugfs_lookup()
clocksource: Enable TSC watchdog checking of HPET and PMTMR only when requested
posix-timers: Use atomic64_try_cmpxchg() in __update_gt_cputime()
clocksource: Verify HPET and PMTMR when TSC unverified
...
- Improve the scalability of the CFS bandwidth unthrottling logic
with large number of CPUs.
- Fix & rework various cpuidle routines, simplify interaction with
the generic scheduler code. Add __cpuidle methods as noinstr to
objtool's noinstr detection and fix boatloads of cpuidle bugs & quirks.
- Add new ABI: introduce MEMBARRIER_CMD_GET_REGISTRATIONS,
to query previously issued registrations.
- Limit scheduler slice duration to the sysctl_sched_latency period,
to improve scheduling granularity with a large number of SCHED_IDLE
tasks.
- Debuggability enhancement on sys_exit(): warn about disabled IRQs,
but also enable them to prevent a cascade of followup problems and
repeat warnings.
- Fix the rescheduling logic in prio_changed_dl().
- Micro-optimize cpufreq and sched-util methods.
- Micro-optimize ttwu_runnable()
- Micro-optimize the idle-scanning in update_numa_stats(),
select_idle_capacity() and steal_cookie_task().
- Update the RSEQ code & self-tests
- Constify various scheduler methods
- Remove unused methods
- Refine __init tags
- Documentation updates
- ... Misc other cleanups, fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmPzbJwRHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1iIvA//ZcEaB8Z6ChLRQjM+bsaudKJu3pdLQbPK
iYbP8Da+LsAfxbEfYuGV3m+jIp0LlBOtsI/EezxQrXV+V7FvNyAX9Y00eEu/zlj8
7Jn3LMy/DBYTwH7LwVdcU0MyIVI8ZPc6WNnkx0LOtGZn8n+qfHPSDzcP3CW+a5AV
UvllPYpYyEmsX0Eby7CF4Ue8mSmbViw/xR3rNr8ZSve0c25XzKabw8O9kE3jiHxP
d/zERJoAYeDyYUEuZqhfn5dTlB4an4IjNEkAfRE5SQ09RA8Gkxsa5Ar8gob9e9M1
eQsdd4/bdhnrkM8L5qDZczqmgCTZ2bukQrxkBXhRDhLgoFxwAn77b+2ZjmIW3Lae
AyGqRcDSg1q2oxaYm5ZiuO/t26aDOZu9vPHyHRDGt95EGbZlrp+GgeePyfCigJYz
UmPdZAAcHdSymnnnlcvdG37WVvaVkpgWZzd8LbtBi23QR+Zc4WQ2IlgnUS5WKNNf
VOBcAcP6E1IslDotZDQCc2dPFFQoQQEssVooyUc5oMytm7BsvxXLOeHG+Ncu/8uc
H+U8Qn8jnqTxJbC5hkWQIJlhVKCq2FJrHxxySYTKROfUNcDgCmxboFeAcXTCIU1K
T0S+sdoTS/CvtLklRkG0j6B8N4N98mOd9cFwUV3tX+/gMLMep3hCQs5L76JagvC5
skkQXoONNaM=
=l1nN
-----END PGP SIGNATURE-----
Merge tag 'sched-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
- Improve the scalability of the CFS bandwidth unthrottling logic with
large number of CPUs.
- Fix & rework various cpuidle routines, simplify interaction with the
generic scheduler code. Add __cpuidle methods as noinstr to objtool's
noinstr detection and fix boatloads of cpuidle bugs & quirks.
- Add new ABI: introduce MEMBARRIER_CMD_GET_REGISTRATIONS, to query
previously issued registrations.
- Limit scheduler slice duration to the sysctl_sched_latency period, to
improve scheduling granularity with a large number of SCHED_IDLE
tasks.
- Debuggability enhancement on sys_exit(): warn about disabled IRQs,
but also enable them to prevent a cascade of followup problems and
repeat warnings.
- Fix the rescheduling logic in prio_changed_dl().
- Micro-optimize cpufreq and sched-util methods.
- Micro-optimize ttwu_runnable()
- Micro-optimize the idle-scanning in update_numa_stats(),
select_idle_capacity() and steal_cookie_task().
- Update the RSEQ code & self-tests
- Constify various scheduler methods
- Remove unused methods
- Refine __init tags
- Documentation updates
- Misc other cleanups, fixes
* tag 'sched-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (110 commits)
sched/rt: pick_next_rt_entity(): check list_entry
sched/deadline: Add more reschedule cases to prio_changed_dl()
sched/fair: sanitize vruntime of entity being placed
sched/fair: Remove capacity inversion detection
sched/fair: unlink misfit task from cpu overutilized
objtool: mem*() are not uaccess safe
cpuidle: Fix poll_idle() noinstr annotation
sched/clock: Make local_clock() noinstr
sched/clock/x86: Mark sched_clock() noinstr
x86/pvclock: Improve atomic update of last_value in pvclock_clocksource_read()
x86/atomics: Always inline arch_atomic64*()
cpuidle: tracing, preempt: Squash _rcuidle tracing
cpuidle: tracing: Warn about !rcu_is_watching()
cpuidle: lib/bug: Disable rcu_is_watching() during WARN/BUG
cpuidle: drivers: firmware: psci: Dont instrument suspend code
KVM: selftests: Fix build of rseq test
exit: Detect and fix irq disabled state in oops
cpuidle, arm64: Fix the ARM64 cpuidle logic
cpuidle: mvebu: Fix duplicate flags assignment
sched/fair: Limit sched slice duration
...
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCY+/uBgAKCRDbK58LschI
g0ngAPwJHd1RicBuy2C4fLv0nGKZtmYZBAnTGlI2RisPxU6BRwEAwUDLHuc5K6nR
j261okOxOy/MRxdN1NhmR6Qe7nMyQAk=
=tYU+
-----END PGP SIGNATURE-----
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2023-02-17
We've added 64 non-merge commits during the last 7 day(s) which contain
a total of 158 files changed, 4190 insertions(+), 988 deletions(-).
The main changes are:
1) Add a rbtree data structure following the "next-gen data structure"
precedent set by recently-added linked-list, that is, by using
kfunc + kptr instead of adding a new BPF map type, from Dave Marchevsky.
2) Add a new benchmark for hashmap lookups to BPF selftests,
from Anton Protopopov.
3) Fix bpf_fib_lookup to only return valid neighbors and add an option
to skip the neigh table lookup, from Martin KaFai Lau.
4) Add cgroup.memory=nobpf kernel parameter option to disable BPF memory
accouting for container environments, from Yafang Shao.
5) Batch of ice multi-buffer and driver performance fixes,
from Alexander Lobakin.
6) Fix a bug in determining whether global subprog's argument is
PTR_TO_CTX, which is based on type names which breaks kprobe progs,
from Andrii Nakryiko.
7) Prep work for future -mcpu=v4 LLVM option which includes usage of
BPF_ST insn. Thus improve BPF_ST-related value tracking in verifier,
from Eduard Zingerman.
8) More prep work for later building selftests with Memory Sanitizer
in order to detect usages of undefined memory, from Ilya Leoshkevich.
9) Fix xsk sockets to check IFF_UP earlier to avoid a NULL pointer
dereference via sendmsg(), from Maciej Fijalkowski.
10) Implement BPF trampoline for RV64 JIT compiler, from Pu Lehui.
11) Fix BPF memory allocator in combination with BPF hashtab where it could
corrupt special fields e.g. used in bpf_spin_lock, from Hou Tao.
12) Fix LoongArch BPF JIT to always use 4 instructions for function
address so that instruction sequences don't change between passes,
from Hengqi Chen.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (64 commits)
selftests/bpf: Add bpf_fib_lookup test
bpf: Add BPF_FIB_LOOKUP_SKIP_NEIGH for bpf_fib_lookup
riscv, bpf: Add bpf trampoline support for RV64
riscv, bpf: Add bpf_arch_text_poke support for RV64
riscv, bpf: Factor out emit_call for kernel and bpf context
riscv: Extend patch_text for multiple instructions
Revert "bpf, test_run: fix &xdp_frame misplacement for LIVE_FRAMES"
selftests/bpf: Add global subprog context passing tests
selftests/bpf: Convert test_global_funcs test to test_loader framework
bpf: Fix global subprog context argument resolution logic
LoongArch, bpf: Use 4 instructions for function address in JIT
bpf: bpf_fib_lookup should not return neigh in NUD_FAILED state
bpf: Disable bh in bpf_test_run for xdp and tc prog
xsk: check IFF_UP earlier in Tx path
Fix typos in selftest/bpf files
selftests/bpf: Use bpf_{btf,link,map,prog}_get_info_by_fd()
samples/bpf: Use bpf_{btf,link,map,prog}_get_info_by_fd()
bpftool: Use bpf_{btf,link,map,prog}_get_info_by_fd()
libbpf: Use bpf_{btf,link,map,prog}_get_info_by_fd()
libbpf: Introduce bpf_{btf,link,map,prog}_get_info_by_fd()
...
====================
Link: https://lore.kernel.org/r/20230217221737.31122-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Extend patch_text for multiple instructions. This is the preparaiton for
multiple instructions text patching in riscv BPF trampoline, and may be
useful for other scenario.
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Acked-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/bpf/20230215135205.1411105-2-pulehui@huaweicloud.com
The __RISCV_INSN_FUNCS originally declared riscv_insn_is_* functions inside
the kprobes implementation. This got moved into a central header in
commit ec5f908775 ("RISC-V: Move riscv_insn_is_* macros into a common header").
Though it looks like I overlooked two of them, so fix that. FENCE itself is
an instruction defined directly by its own opcode, while the created
riscv_isn_is_system function covers all instructions defined under the SYSTEM
opcode.
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Link: https://lore.kernel.org/r/20230113211955.3534431-1-heiko@sntech.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
guoren@kernel.org <guoren@kernel.org> says:
From: Guo Ren <guoren@linux.alibaba.com>
The previous ftrace detour implementation fc76b8b8011 ("riscv: Using
PATCHABLE_FUNCTION_ENTRY instead of MCOUNT") contain three problems.
- The most horrible bug is preemption panic which found by Andy [1].
Let's disable preemption for ftrace first, and Andy could continue
the ftrace preemption work.
- The "-fpatchable-function-entry= CFLAG" wasted code size
!RISCV_ISA_C.
- The ftrace detour implementation wasted code size.
- When livepatching, the trampoline (ftrace_regs_caller) would not
return to <func_prolog+12> but would rather jump to the new function.
So, "REG_L ra, -SZREG(sp)" would not run and the original return
address would not be restored. The kernel is likely to hang or crash
as a result. (Found by Evgenii Shatokhin [4])
[Palmer: The first three patches in this series are pretty concrete
fixes, so I'm pulling them ahead of the rest of the series.]
* b4-shazam-merge:
riscv: ftrace: Reduce the detour code size to half
riscv: ftrace: Remove wasted nops for !RISCV_ISA_C
riscv: ftrace: Fixup panic by disabling preemption
Link: https://lore.kernel.org/r/20230112090603.1295340-1-guoren@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Use a temporary register to reduce the size of detour code from 16 bytes to
8 bytes. The previous implementation is from 'commit afc76b8b80 ("riscv:
Using PATCHABLE_FUNCTION_ENTRY instead of MCOUNT")'.
Before the patch:
<func_prolog>:
0: REG_S ra, -SZREG(sp)
4: auipc ra, ?
8: jalr ?(ra)
12: REG_L ra, -SZREG(sp)
(func_boddy)
After the patch:
<func_prolog>:
0: auipc t0, ?
4: jalr t0, ?(t0)
(func_boddy)
This patch not just reduces the size of detour code, but also fixes an
important issue:
An Ftrace callback registered with FTRACE_OPS_FL_IPMODIFY flag can
actually change the instruction pointer, e.g. to "replace" the given
kernel function with a new one, which is needed for livepatching, etc.
In this case, the trampoline (ftrace_regs_caller) would not return to
<func_prolog+12> but would rather jump to the new function. So, "REG_L
ra, -SZREG(sp)" would not run and the original return address would not
be restored. The kernel is likely to hang or crash as a result.
This can be easily demonstrated if one tries to "replace", say,
cmdline_proc_show() with a new function with the same signature using
instruction_pointer_set(&fregs->regs, new_func_addr) in the Ftrace
callback.
Link: https://lore.kernel.org/linux-riscv/20221122075440.1165172-1-suagrfillet@gmail.com/
Link: https://lore.kernel.org/linux-riscv/d7d5730b-ebef-68e5-5046-e763e1ee6164@yadro.com/
Co-developed-by: Song Shuai <suagrfillet@gmail.com>
Signed-off-by: Song Shuai <suagrfillet@gmail.com>
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Cc: Evgenii Shatokhin <e.shatokhin@yadro.com>
Reviewed-by: Evgenii Shatokhin <e.shatokhin@yadro.com>
Link: https://lore.kernel.org/r/20230112090603.1295340-4-guoren@kernel.org
Cc: stable@vger.kernel.org
Fixes: 10626c32e3 ("riscv/ftrace: Add basic support")
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Conor Dooley <conor@kernel.org> says:
From: Conor Dooley <conor.dooley@microchip.com>
Ever since RISC-V starting using generic arch topology code, the code
paths for cpu-capacity have been there but there's no binding defined to
actually convey the information. Defining the same property as used on
arm seems to be the only logical thing to do, so do it.
[Palmer: This is on top of the fix required to make it work, which
itself wasn't merged until late in the 6.2 cycle and thus pulls in
various other fixes.]
* b4-shazam-merge:
dt-bindings: riscv: add a capacity-dmips-mhz cpu property
dt-bindings: arm: move cpu-capacity to a shared loation
riscv: Move call to init_cpu_topology() to later initialization stage
riscv/kprobe: Fix instruction simulation of JALR
riscv: fix -Wundef warning for CONFIG_RISCV_BOOT_SPINWAIT
MAINTAINERS: add an IRC entry for RISC-V
RISC-V: fix compile error from deduplicated __ALTERNATIVE_CFG_2
dt-bindings: riscv: fix single letter canonical order
dt-bindings: riscv: fix underscore requirement for multi-letter extensions
riscv: uaccess: fix type of 0 variable on error in get_user()
riscv, kprobes: Stricter c.jr/c.jalr decoding
Link: https://lore.kernel.org/r/20230104180513.1379453-1-conor@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Having a clocksource_arch_init() callback always sets vdso_clock_mode to
VDSO_CLOCKMODE_ARCHTIMER if GENERIC_GETTIMEOFDAY is enabled, this is
required for the riscv-timer.
This works for platforms where just riscv-timer clocksource is present.
On platforms where other clock sources are available we want them to
register with vdso_clock_mode set to VDSO_CLOCKMODE_NONE.
On the Renesas RZ/Five SoC OSTM block can be used as clocksource [0], to
avoid multiple clock sources being registered as VDSO_CLOCKMODE_ARCHTIMER
move setting of vdso_clock_mode in the riscv-timer driver instead of doing
this in clocksource_arch_init() callback as done similarly for ARM/64
architecture.
[0] drivers/clocksource/renesas-ostm.c
Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Tested-by: Samuel Holland <samuel@sholland.org>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Samuel Holland <samuel@sholland.org>
Link: https://lore.kernel.org/r/20221229224601.103851-1-prabhakar.mahadev-lad.rj@bp.renesas.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Similarly to commit 022eb8ae8b ("ARM: 8938/1: kernel: initialize
broadcast hrtimer based clock event device"), RISC-V needs to initiate
hrtimer based broadcast clock event device before C3STOP can be used.
Otherwise, the introduction of C3STOP for the RISC-V arch timer in
commit 232ccac1bd ("clocksource/drivers/riscv: Events are stopped
during CPU suspend") leaves us without any broadcast timer registered.
This prevents the kernel from entering oneshot mode, which breaks timer
behaviour, for example clock_nanosleep().
A test app that sleeps each cpu for 6, 5, 4, 3 ms respectively, HZ=250
& C3STOP enabled, the sleep times are rounded up to the next jiffy:
== CPU: 1 == == CPU: 2 == == CPU: 3 == == CPU: 4 ==
Mean: 7.974992 Mean: 7.976534 Mean: 7.962591 Mean: 3.952179
Std Dev: 0.154374 Std Dev: 0.156082 Std Dev: 0.171018 Std Dev: 0.076193
Hi: 9.472000 Hi: 10.495000 Hi: 8.864000 Hi: 4.736000
Lo: 6.087000 Lo: 6.380000 Lo: 4.872000 Lo: 3.403000
Samples: 521 Samples: 521 Samples: 521 Samples: 521
Link: https://lore.kernel.org/linux-riscv/YzYTNQRxLr7Q9JR0@spud/
Fixes: 232ccac1bd ("clocksource/drivers/riscv: Events are stopped during CPU suspend")
Suggested-by: Samuel Holland <samuel@sholland.org>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Samuel Holland <samuel@sholland.org>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Link: https://lore.kernel.org/r/20230103141102.772228-2-apatel@ventanamicro.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
Add the generic plumbing to detect whether or not the runtime code
regions were constructed with BTI/IBT landing pads by the firmware,
permitting the OS to enable enforcement when mapping these regions into
the OS's address space.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
When running kfence_test, I found some testcases failed like this:
# test_out_of_bounds_read: EXPECTATION FAILED at mm/kfence/kfence_test.c:346
Expected report_matches(&expect) to be true, but is false
not ok 1 - test_out_of_bounds_read
The corresponding call-trace is:
BUG: KFENCE: out-of-bounds read in kunit_try_run_case+0x38/0x84
Out-of-bounds read at 0x(____ptrval____) (32B right of kfence-#10):
kunit_try_run_case+0x38/0x84
kunit_generic_run_threadfn_adapter+0x12/0x1e
kthread+0xc8/0xde
ret_from_exception+0x0/0xc
The kfence_test using the first frame of call trace to check whether the
testcase is succeed or not. Commit 6a00ef4493 ("riscv: eliminate
unreliable __builtin_frame_address(1)") skip first frame for all
case, which results the kfence_test failed. Indeed, we only need to skip
the first frame for case (task==NULL || task==current).
With this patch, the call-trace will be:
BUG: KFENCE: out-of-bounds read in test_out_of_bounds_read+0x88/0x19e
Out-of-bounds read at 0x(____ptrval____) (1B left of kfence-#7):
test_out_of_bounds_read+0x88/0x19e
kunit_try_run_case+0x38/0x84
kunit_generic_run_threadfn_adapter+0x12/0x1e
kthread+0xc8/0xde
ret_from_exception+0x0/0xc
Fixes: 6a00ef4493 ("riscv: eliminate unreliable __builtin_frame_address(1)")
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Tested-by: Samuel Holland <samuel@sholland.org>
Link: https://lore.kernel.org/r/20221207025038.1022045-1-liushixin2@huawei.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
The kernel would panic when probed for an illegal position. eg:
(CONFIG_RISCV_ISA_C=n)
echo 'p:hello kernel_clone+0x16 a0=%a0' >> kprobe_events
echo 1 > events/kprobes/hello/enable
cat trace
Kernel panic - not syncing: stack-protector: Kernel stack
is corrupted in: __do_sys_newfstatat+0xb8/0xb8
CPU: 0 PID: 111 Comm: sh Not tainted
6.2.0-rc1-00027-g2d398fe49a4d #490
Hardware name: riscv-virtio,qemu (DT)
Call Trace:
[<ffffffff80007268>] dump_backtrace+0x38/0x48
[<ffffffff80c5e83c>] show_stack+0x50/0x68
[<ffffffff80c6da28>] dump_stack_lvl+0x60/0x84
[<ffffffff80c6da6c>] dump_stack+0x20/0x30
[<ffffffff80c5ecf4>] panic+0x160/0x374
[<ffffffff80c6db94>] generic_handle_arch_irq+0x0/0xa8
[<ffffffff802deeb0>] sys_newstat+0x0/0x30
[<ffffffff800158c0>] sys_clone+0x20/0x30
[<ffffffff800039e8>] ret_from_syscall+0x0/0x4
---[ end Kernel panic - not syncing: stack-protector:
Kernel stack is corrupted in: __do_sys_newfstatat+0xb8/0xb8 ]---
That is because the kprobe's ebreak instruction broke the kernel's
original code. The user should guarantee the correction of the probe
position, but it couldn't make the kernel panic.
This patch adds arch_check_kprobe in arch_prepare_kprobe to prevent an
illegal position (Such as the middle of an instruction).
Fixes: c22b0bcb1d ("riscv: Add kprobes supported")
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Reviewed-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/r/20230201040604.3390509-1-guoren@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Jisheng Zhang <jszhang@kernel.org> says:
Generally, riscv ISA extensions are fixed for any specific hardware
platform, so a hart's features won't change after booting, this
chacteristic makes it straightforward to use a static branch to check
a specific ISA extension is supported or not to optimize performance.
However, some ISA extensions such as SVPBMT and ZICBOM are handled
via. the alternative sequences.
Basically, for ease of maintenance, we prefer to use static branches
in C code, but recently, Samuel found that the static branch usage in
cpu_relax() breaks building with CONFIG_CC_OPTIMIZE_FOR_SIZE[1]. As
Samuel pointed out, "Having a static branch in cpu_relax() is
problematic because that function is widely inlined, including in some
quite complex functions like in the VDSO. A quick measurement shows
this static branch is responsible by itself for around 40% of the jump
table."
Samuel's findings pointed out one of a few downsides of static branches
usage in C code to handle ISA extensions detected at boot time:
static branch's metadata in the __jump_table section, which is not
discarded after ISA extensions are finalized, wastes some space.
I want to try to solve the issue for all possible dynamic handling of
ISA extensions at boot time. Inspired by Mark[2], this patch introduces
riscv_has_extension_*() helpers, which work like static branches but
are patched using alternatives, thus the metadata can be freed after
patching.
[1]https://lore.kernel.org/linux-riscv/20220922060958.44203-1-samuel@sholland.org/
[2]https://lore.kernel.org/linux-arm-kernel/20220912162210.3626215-8-mark.rutland@arm.com/
[3]https://lore.kernel.org/linux-riscv/20221130225614.1594256-1-heiko@sntech.de/
* b4-shazam-merge:
riscv: remove riscv_isa_ext_keys[] array and related usage
riscv: KVM: Switch has_svinval() to riscv_has_extension_unlikely()
riscv: cpu_relax: switch to riscv_has_extension_likely()
riscv: alternative: patch alternatives in the vDSO
riscv: switch to relative alternative entries
riscv: module: Add ADD16 and SUB16 rela types
riscv: module: move find_section to module.h
riscv: fpu: switch has_fpu() to riscv_has_extension_likely()
riscv: introduce riscv_has_extension_[un]likely()
riscv: cpufeature: extend riscv_cpufeature_patch_func to all ISA extensions
riscv: hwcap: make ISA extension ids can be used in asm
riscv: cpufeature: detect RISCV_ALTERNATIVES_EARLY_BOOT earlier
riscv: move riscv_noncoherent_supported() out of ZICBOM probe
Link: https://lore.kernel.org/r/20230128172856.3814-1-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
All users have switched to riscv_has_extension_*, remove unused
definitions, vars and related setting code.
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20230128172856.3814-14-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Make it possible to use alternatives in the vDSO, so that better
implementations can be used if possible.
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Guo Ren <guoren@kernel.org>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20230128172856.3814-11-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Instead of using absolute addresses for both the old instrucions and
the alternative instructions, use offsets relative to the alt_entry
values. So this not only cuts the size of the alternative entry, but
also meets the prerequisite for patching alternatives in the vDSO,
since absolute alternative entries are subject to dynamic relocation,
which is incompatible with the vDSO building.
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230128172856.3814-10-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
To prepare for 16-bit relocation types to be emitted in alternatives
add support for ADD16 and SUB16.
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230128172856.3814-9-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Move find_section() to module.h so that the implementation can be shared
by the alternatives code. This will allow us to use alternatives in
the vdso.
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Link: https://lore.kernel.org/r/20230128172856.3814-8-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
riscv_cpufeature_patch_func() currently only scans a limited set of
cpufeatures, explicitly defined with macros. Extend it to probe for all
ISA extensions.
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230128172856.3814-5-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Currently riscv_cpufeature_patch_func() does nothing at the
RISCV_ALTERNATIVES_EARLY_BOOT stage. Add a check to detect whether we
are in this stage and exit early. This will allow us to use
riscv_cpufeature_patch_func() for scanning of all ISA extensions.
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230128172856.3814-3-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
It's a bit weird to call riscv_noncoherent_supported() each time when
insmoding a module. Move the calling out of feature patch func.
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230128172856.3814-2-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Heiko Stuebner <heiko@sntech.de> says:
From: Heiko Stuebner <heiko.stuebner@vrull.eu>
This series still tries to allow optimized string functions for specific
extensions. The last approach of using an inline base function to hold
the alternative calls did cause some issues in a number of places
So instead of that we're now just using an alternative j at the beginning
of the generic function to jump to a separate place inside the function
itself.
* b4-shazam-merge:
RISC-V: add zbb support to string functions
RISC-V: add infrastructure to allow different str* implementations
Link: https://lore.kernel.org/r/20230113212301.3534711-1-heiko@sntech.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Add handling for ZBB extension and add support for using it as a
variant for optimized string functions.
Support for the Zbb-str-variants is limited to the GNU-assembler
for now, as LLVM has not yet acquired the functionality to
selectively change the arch option in assembler code.
This is still under review at
https://reviews.llvm.org/D123515
Co-developed-by: Christoph Muellner <christoph.muellner@vrull.eu>
Signed-off-by: Christoph Muellner <christoph.muellner@vrull.eu>
Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230113212301.3534711-3-heiko@sntech.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Depending on supported extensions on specific RISC-V cores,
optimized str* functions might make sense.
This adds basic infrastructure to allow patching the function calls
via alternatives later on.
The Linux kernel provides standard implementations for string functions
but when architectures want to extend them, they need to provide their
own.
The added generic string functions are done in assembler (taken from
disassembling the main-kernel functions for now) to allow us to control
the used registers and extend them with optimized variants.
This doesn't override the compiler's use of builtin replacements. So still
first of all the compiler will select if a builtin will be better suitable
i.e. for known strings. For all regular cases we will want to later
select possible optimized variants and in the worst case fall back to the
generic implemention added with this change.
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230113212301.3534711-2-heiko@sntech.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-----BEGIN PGP SIGNATURE-----
iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmPW7E8eHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGf7MIAI0JnHN9WvtEukSZ
E6j6+cEGWxsvD6q0g3GPolaKOCw7hlv0pWcFJFcUAt0jebspMdxV2oUGJ8RYW7Lg
nCcHvEVswGKLAQtQSWw52qotW6fUfMPsNYYB5l31sm1sKH4Cgss0W7l2HxO/1LvG
TSeNHX53vNAZ8pVnFYEWCSXC9bzrmU/VALF2EV00cdICmfvjlgkELGXoLKJJWzUp
s63fBHYGGURSgwIWOKStoO6HNo0j/F/wcSMx8leY8qDUtVKHj4v24EvSgxUSDBER
ch3LiSQ6qf4sw/z7pqruKFthKOrlNmcc0phjiES0xwwGiNhLv0z3rAhc4OM2cgYh
SDc/Y/c=
=zpaD
-----END PGP SIGNATURE-----
Merge tag 'v6.2-rc6' into sched/core, to pick up fixes
Pick up fixes before merging another batch of cpuidle updates.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
If "capacity-dmips-mhz" is present in a CPU DT node,
topology_parse_cpu_capacity() will fail to allocate memory. arm64, with
which this code path is shared, does not call
topology_parse_cpu_capacity() until later in boot where memory
allocation is available. While "capacity-dmips-mhz" is not yet a valid
property on RISC-V, invalid properties should be ignored rather than
cause issues. Move init_cpu_topology(), which calls
topology_parse_cpu_capacity(), to a later initialization stage, to match
arm64.
As a side effect of this change, RISC-V is "protected" from changes to
core topology code that would work on arm64 where memory allocation is
safe but on RISC-V isn't.
Fixes: 03f11f03db ("RISC-V: Parse cpu topology during boot.")
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Ley Foon Tan <leyfoon.tan@starfivetech.com>
Link: https://lore.kernel.org/r/20230105033705.3946130-1-leyfoon.tan@starfivetech.com
[Palmer: use Conor's commit text]
Link: https://lore.kernel.org/linux-riscv/20230104183033.755668-1-pierre.gondois@arm.com/T/#me592d4c8b9508642954839f0077288a353b0b9b2
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Alternatives live in a different section, so offsets used by jal
instruction will point to wrong locations after the patch got applied.
Similar to arm64, adjust the location to consider that offset.
Co-developed-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Link: https://lore.kernel.org/r/20230113212205.3534622-1-heiko@sntech.de
Fixes: 27c653c065 ("RISC-V: fix auipc-jalr addresses in patched alternatives")
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
The main change is to build the cache topology information for all
the CPUs from the primary CPU. Currently the cacheinfo for secondary CPUs
is created during the early boot on the respective CPU itself. Preemption
and interrupts are disabled at this stage. On PREEMPT_RT kernels, allocating
memory and even parsing the PPTT table for ACPI based systems triggers a:
'BUG: sleeping function called from invalid context'
To prevent this bug, the cacheinfo is now allocated from the primary CPU
when preemption and interrupts are enabled and before booting secondary
CPUs. The cache levels/leaves are computed from DT/ACPI PPTT information
only, without relying on any architecture specific mechanism if done so
early.
The other minor change included here is to handle shared caches at
different levels when not all the CPUs on the system have the same
cache hierarchy.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEunHlEgbzHrJD3ZPhAEG6vDF+4pgFAmPKg60ACgkQAEG6vDF+
4pi91hAAoSluqjbUHzzCW+OIIjKAAvQrAw6bsKGvSpcUfYno1Lry+9y76L6TMYSy
OPtiKGxcJyzhdlCwIpJzgaX9nTz7uiu70euNZiAp11XA2KlphtLoI3TMUa60jD4i
ZGfn9UiAp719Vog5m3CmZXjHZ6drI0HloL8ZWTl4VDATUu5pfcx4uYPT2o63Xc62
k5QglaRJWFhFAJ+R6R9vQS2zfeMI9xvehl72445wb8pxxPW2f91dvBhJqJgKlziw
gHKx+D1DnpAUd+v+7HAEmzjXKlY6JnQybmBHmRayllVAa8kGUtvhTcBlRGNsNBzR
m7VBFKq+eSk7VgxOgka1qXVtHUrlaEWf5qWnG+w4XEiE1VgzNagjaFRaGQQneKI/
z3yNKG8Xjp+3BdSz0pUDJVEWFnnjueAUEh6/xODEXnWdX166abQZslLIHCvmcnM8
q7blasuj2mxyCZFC1tyK9WHI7/KCe0cmHbdau3qs0j9bvhzfdB3DwMLsdRjXQTOv
8FVX0Z5EKY9/bW2oqCg/mb3KOWbmFX2ZHho4cds3IV+9GGB8JkD/6b8vpGejmmx2
E2vInzhP3gLd9WiWQWDjg4+aklE29P/nDAA8BSPnW3TtEGAFJMZZQRgNlCZnBW56
Tx2/lE0VD5/UX+1MqSFGchi+KEX3mcZykcra9VNt+uZH26a8gBE=
=nH4b
-----END PGP SIGNATURE-----
Merge tag 'archtopo-cacheinfo-updates-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux into driver-core-next
Sudeep writes:
"cacheinfo and arch_topology updates for v6.3
The main change is to build the cache topology information for all
the CPUs from the primary CPU. Currently the cacheinfo for secondary CPUs
is created during the early boot on the respective CPU itself. Preemption
and interrupts are disabled at this stage. On PREEMPT_RT kernels, allocating
memory and even parsing the PPTT table for ACPI based systems triggers a:
'BUG: sleeping function called from invalid context'
To prevent this bug, the cacheinfo is now allocated from the primary CPU
when preemption and interrupts are enabled and before booting secondary
CPUs. The cache levels/leaves are computed from DT/ACPI PPTT information
only, without relying on any architecture specific mechanism if done so
early.
The other minor change included here is to handle shared caches at
different levels when not all the CPUs on the system have the same
cache hierarchy."
* tag 'archtopo-cacheinfo-updates-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux:
cacheinfo: Fix shared_cpu_map to handle shared caches at different levels
arch_topology: Build cacheinfo from primary CPU
ACPI: PPTT: Update acpi_find_last_cache_level() to acpi_get_cache_info()
ACPI: PPTT: Remove acpi_find_cache_levels()
cacheinfo: Check 'cache-unified' property to count cache leaves
cacheinfo: Return error code in init_of_cache_level()
cacheinfo: Use RISC-V's init_cache_level() as generic OF implementation
This cleans up the ISA string handling to more closely match a version
of the ISA spec. This is visible in /proc/cpuinfo and the ordering
changes may break something in userspace, but these orderings have
changed before without issues so with any luck that's still the case.
This also adds documentation so userspace has a better idea of what is
intended when it comes to compatibility for /proc/cpuinfo, which should
help everyone as this will likely keep changing.
* b4-shazam-merge:
Documentation: riscv: add a section about ISA string ordering in /proc/cpuinfo
RISC-V: resort all extensions in consistent orders
RISC-V: clarify ISA string ordering rules in cpu.c
Link: https://lore.kernel.org/r/20221205144525.2148448-1-conor.dooley@microchip.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Since commit 80b6093b55 ("kbuild: add -Wundef to KBUILD_CPPFLAGS
for W=1 builds"), building with W=1 detects misuse of #if.
$ make W=1 ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu- arch/riscv/kernel/
[snip]
AS arch/riscv/kernel/head.o
arch/riscv/kernel/head.S:329:5: warning: "CONFIG_RISCV_BOOT_SPINWAIT" is not defined, evaluates to 0 [-Wundef]
329 | #if CONFIG_RISCV_BOOT_SPINWAIT
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
CONFIG_RISCV_BOOT_SPINWAIT is a bool option. #ifdef should be used.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Fixes: 2ffc48fc70 ("RISC-V: Move spinwait booting method to its own config")
Link: https://lore.kernel.org/r/20230106161213.2374093-1-masahiroy@kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
zap_page_range was originally designed to unmap pages within an address
range that could span multiple vmas. While working on [1], it was
discovered that all callers of zap_page_range pass a range entirely within
a single vma. In addition, the mmu notification call within zap_page
range does not correctly handle ranges that span multiple vmas. When
crossing a vma boundary, a new mmu_notifier_range_init/end call pair with
the new vma should be made.
Instead of fixing zap_page_range, do the following:
- Create a new routine zap_vma_pages() that will remove all pages within
the passed vma. Most users of zap_page_range pass the entire vma and
can use this new routine.
- For callers of zap_page_range not passing the entire vma, instead call
zap_page_range_single().
- Remove zap_page_range.
[1] https://lore.kernel.org/linux-mm/20221114235507.294320-2-mike.kravetz@oracle.com/
Link: https://lkml.kernel.org/r/20230104002732.232573-1-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Suggested-by: Peter Xu <peterx@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Peter Xu <peterx@redhat.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com> [s390]
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
commit 3fcbf1c77d ("arch_topology: Fix cache attributes detection
in the CPU hotplug path")
adds a call to detect_cache_attributes() to populate the cacheinfo
before updating the siblings mask. detect_cache_attributes() allocates
memory and can take the PPTT mutex (on ACPI platforms). On PREEMPT_RT
kernels, on secondary CPUs, this triggers a:
'BUG: sleeping function called from invalid context' [1]
as the code is executed with preemption and interrupts disabled.
The primary CPU was previously storing the cache information using
the now removed (struct cpu_topology).llc_id:
commit 5b8dc787ce ("arch_topology: Drop LLC identifier stash from
the CPU topology")
allocate_cache_info() tries to build the cacheinfo from the primary
CPU prior secondary CPUs boot, if the DT/ACPI description
contains cache information.
If allocate_cache_info() fails, then fallback to the current state
for the cacheinfo allocation. [1] will be triggered in such case.
When unplugging a CPU, the cacheinfo memory cannot be freed. If it
was, then the memory would be allocated early by the re-plugged
CPU and would trigger [1].
Note that populate_cache_leaves() might be called multiple times
due to populate_leaves being moved up. This is required since
detect_cache_attributes() might be called with per_cpu_cacheinfo(cpu)
being allocated but not populated.
[1]:
| BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:46
| in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 0, name: swapper/111
| preempt_count: 1, expected: 0
| RCU nest depth: 1, expected: 1
| 3 locks held by swapper/111/0:
| #0: (&pcp->lock){+.+.}-{3:3}, at: get_page_from_freelist+0x218/0x12c8
| #1: (rcu_read_lock){....}-{1:3}, at: rt_spin_trylock+0x48/0xf0
| #2: (&zone->lock){+.+.}-{3:3}, at: rmqueue_bulk+0x64/0xa80
| irq event stamp: 0
| hardirqs last enabled at (0): 0x0
| hardirqs last disabled at (0): copy_process+0x5dc/0x1ab8
| softirqs last enabled at (0): copy_process+0x5dc/0x1ab8
| softirqs last disabled at (0): 0x0
| Preemption disabled at:
| migrate_enable+0x30/0x130
| CPU: 111 PID: 0 Comm: swapper/111 Tainted: G W 6.0.0-rc4-rt6-[...]
| Call trace:
| __kmalloc+0xbc/0x1e8
| detect_cache_attributes+0x2d4/0x5f0
| update_siblings_masks+0x30/0x368
| store_cpu_topology+0x78/0xb8
| secondary_start_kernel+0xd0/0x198
| __secondary_switched+0xb0/0xb4
Signed-off-by: Pierre Gondois <pierre.gondois@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Link: https://lore.kernel.org/r/20230104183033.755668-7-pierre.gondois@arm.com
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Ordering between each and every list of extensions is wildly
inconsistent. Per discussion on the lists pick the following policy:
- The array defining order in /proc/cpuinfo follows a narrow
interpretation of the ISA specifications, described in a comment
immediately presiding it.
- All other lists of extensions are sorted alphabetically.
This will hopefully allow for easier review & future additions, and
reduce conflicts between patchsets as the number of extensions grows.
Link: https://lore.kernel.org/all/20221129144742.2935581-2-conor.dooley@microchip.com/
Suggested-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20221205144525.2148448-3-conor.dooley@microchip.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
While the current list of rules may have been accurate when created
it now lacks some clarity in the face of isa-manual updates. Instead of
trying to continuously align this rule-set with the one in the
specifications, change the role of this comment.
This particular comment is important, as the array it "decorates"
defines the order in which the ISA string appears to userspace in
/proc/cpuinfo.
Re-jig and strengthen the wording to provide contributors with a set
order in which to add entries & note why this particular struct needs
more attention than others.
While in the area, add some whitespace and tweak some wording for
readability's sake.
Suggested-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20221205144525.2148448-2-conor.dooley@microchip.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
RISC-V's implementation of init_of_cache_level() is following
the Devicetree Specification v0.3 regarding caches, cf.:
- s3.7.3 'Internal (L1) Cache Properties'
- s3.8 'Multi-level and Shared Cache Nodes'
Allow reusing the implementation by moving it.
Also make 'levels', 'leaves' and 'level' unsigned int.
Signed-off-by: Pierre Gondois <pierre.gondois@arm.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Link: https://lore.kernel.org/r/20230104183033.755668-2-pierre.gondois@arm.com
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Current arch_cpu_idle() is called with IRQs disabled, but will return
with IRQs enabled.
However, the very first thing the generic code does after calling
arch_cpu_idle() is raw_local_irq_disable(). This means that
architectures that can idle with IRQs disabled end up doing a
pointless 'enable-disable' dance.
Therefore, push this IRQ disabling into the idle function, meaning
that those architectures can avoid the pointless IRQ state flipping.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Tony Lindgren <tony@atomide.com>
Tested-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Acked-by: Mark Rutland <mark.rutland@arm.com> [arm64]
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Guo Ren <guoren@kernel.org>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20230112195540.618076436@infradead.org
Idle code is very like entry code in that RCU isn't available. As
such, add a little validation.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Tony Lindgren <tony@atomide.com>
Tested-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20230112195540.373461409@infradead.org
In the compressed instruction extension, c.jr, c.jalr, c.mv, and c.add
is encoded the following way (each instruction is 16b):
---+-+-----------+-----------+--
100 0 rs1[4:0]!=0 00000 10 : c.jr
100 1 rs1[4:0]!=0 00000 10 : c.jalr
100 0 rd[4:0]!=0 rs2[4:0]!=0 10 : c.mv
100 1 rd[4:0]!=0 rs2[4:0]!=0 10 : c.add
The following logic is used to decode c.jr and c.jalr:
insn & 0xf007 == 0x8002 => instruction is an c.jr
insn & 0xf007 == 0x9002 => instruction is an c.jalr
When 0xf007 is used to mask the instruction, c.mv can be incorrectly
decoded as c.jr, and c.add as c.jalr.
Correct the decoding by changing the mask from 0xf007 to 0xf07f.
Fixes: c22b0bcb1d ("riscv: Add kprobes supported")
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20230102160748.1307289-1-bjorn@kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Alternatives live in a different section, so addresses used by call
functions will point to wrong locations after the patch got applied.
Similar to arm64, adjust the location to consider that offset.
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Link: https://lore.kernel.org/r/20221223221332.4127602-13-heiko@sntech.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Don't redefine values that are already available in the central header
asm/insn.h . Use the values from there instead.
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Link: https://lore.kernel.org/r/20221223221332.4127602-9-heiko@sntech.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
The current parse_asm header should become a more centralized place
for everything concerning parsing and constructing instructions.
We already have a header insn-def.h similar to aarch64, so rename
parse_asm.h to insn.h (again similar to aarch64) to show that it's
meant for more than simple instruction parsing.
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Link: https://lore.kernel.org/r/20221223221332.4127602-8-heiko@sntech.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Right now the riscv kernel has (at least) two independent sets
of functions to check if an encoded instruction is of a specific
type. One in kgdb and one kprobes simulate-insn code.
More parts of the kernel will probably need this in the future,
so instead of allowing this duplication to go on further,
move macros that do the function declaration in a common header,
similar to at least aarch64.
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Link: https://lore.kernel.org/r/20221223221332.4127602-7-heiko@sntech.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Some of the constants and macros already have suitable RV_, RVG_ or
RVC_ prefixes.
Extend this to the rest of the file as well, as we want to use these
things in a broader scope soon.
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Link: https://lore.kernel.org/r/20221223221332.4127602-3-heiko@sntech.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
* Allow unloading KVM module
* Allow KVM user-space to set mvendorid, marchid, and mimpid
* Several fixes and cleanups
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmOhy+QUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroOdUwf+K3i8RHW1H8TF/JSrn1I6nURNLYhb
2wXzl3esOsfswtn6dxEvLEXivcKmD2G9bLpa2UIa3vw1Plg9tdce9IJ5qDodtxVL
mlISMUSgMNy+lelKJiG+l5Ld4oJ4HUY0yw/p3Ml9WUpra98UCB0sJ+FsqXr4ndi9
LxkQJrNyZkQcRH2IXjQhKjdjkepFTmkhKs/uCxAZvW9zfUmGX0dcp9W22PTbsapQ
IcaBKdVaNN3TXNSIdDCM2Iv+oBN7gJn1CbgFxhkp4L8eE5PvRjFw0QooFMn2TjDw
VflP3gIs/41+5tnoPWXGAkKFe/Z5aJjGjx6Yx0WnEEgoAG47RUHYsKIUjw==
=8ejV
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull RISC-V kvm updates from Paolo Bonzini:
- Allow unloading KVM module
- Allow KVM user-space to set mvendorid, marchid, and mimpid
- Several fixes and cleanups
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
RISC-V: KVM: Add ONE_REG interface for mvendorid, marchid, and mimpid
RISC-V: KVM: Save mvendorid, marchid, and mimpid when creating VCPU
RISC-V: Export sbi_get_mvendorid() and friends
RISC-V: KVM: Move sbi related struct and functions to kvm_vcpu_sbi.h
RISC-V: KVM: Use switch-case in kvm_riscv_vcpu_set/get_reg()
RISC-V: KVM: Remove redundant includes of asm/csr.h
RISC-V: KVM: Remove redundant includes of asm/kvm_vcpu_timer.h
RISC-V: KVM: Fix reg_val check in kvm_riscv_vcpu_set_reg_config()
RISC-V: KVM: Simplify kvm_arch_prepare_memory_region()
RISC-V: KVM: Exit run-loop immediately if xfer_to_guest fails
RISC-V: KVM: use vma_lookup() instead of find_vma_intersection()
RISC-V: KVM: Add exit logic to main.c
* Support for the T-Head PMU via the perf subsystem.
* ftrace support for rv32.
* Support for non-volatile memory devices.
* Various fixes and cleanups.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmOZ6WsTHHBhbG1lckBk
YWJiZWx0LmNvbQAKCRAuExnzX7sYiWGcD/wLGiHq3ekQhl5D+CaA1WlJ5XzQFfY2
bv1ZCZGdjuiv66jiMlmEsbpfUCk3bSAIjCO3MHQNDmTuPJztCHVJXOHbZFWItzzO
soW4nXHKW1sGHa7hDLGQUPkltA48OdPoyqEDvlnpyEWFT+2xHwdFEURWE85FXGeq
ZzFSKUQqX/V52n9TS4M4QtmNnQatR3TgIs8ttzD4JqwWFBbp4/iBfIGt6n3W24XH
9lKWikO4YOYUPl0KVIakM4d8NmX7g+7vhCKWavLke1fF/IQOlyWwA0eM8ryj33OG
L1nFkqfF3mCw9i72WHftlc0rAgVqcYS8ntnQkPNpt2zPp3xFjDwEy+XiZrRE+sAp
m5Ma2Tkw7G3ueBtXwP1yo+EKa7PrVFbCRD/rEpLJAC6+9ktvc7cYs39E08O+wrwT
qkYThDolovqMOqfOq6afEGy5lfIa5U00vxK+3MXiE3eLEjHSJhwTXadUbwyMjJWE
zOwA6p5NfDFzklESSNTtIBY85Zlh/g2q6GWCy7yBQnlaSdbpDxcnAlSZipq66Iqm
9ytdZiHid4BIRQxr5qyXTB184BvFnWNRs9NGhCj38uLEnuxwSChzwoh/WPDxLNte
U9ouvwJO5U2qAZsMGJhY8W2s/9WvWpSqRhSMA/nnNV1Hh+URFz8rFXAln6kNn//v
j+cYGCyjLnO1hg==
=4Ak2
-----END PGP SIGNATURE-----
Merge tag 'riscv-for-linus-6.2-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V updates from Palmer Dabbelt:
- Support for the T-Head PMU via the perf subsystem
- ftrace support for rv32
- Support for non-volatile memory devices
- Various fixes and cleanups
* tag 'riscv-for-linus-6.2-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (52 commits)
Documentation: RISC-V: patch-acceptance: s/implementor/implementer
Documentation: RISC-V: Mention the UEFI Standards
Documentation: RISC-V: Allow patches for non-standard behavior
Documentation: RISC-V: Fix a typo in patch-acceptance
riscv: Fixup compile error with !MMU
riscv: Fix P4D_SHIFT definition for 3-level page table mode
riscv: Apply a static assert to riscv_isa_ext_id
RISC-V: Add some comments about the shadow and overflow stacks
RISC-V: Align the shadow stack
RISC-V: Ensure Zicbom has a valid block size
RISC-V: Introduce riscv_isa_extension_check
RISC-V: Improve use of isa2hwcap[]
riscv: Don't duplicate _ALTERNATIVE_CFG* macros
riscv: alternatives: Drop the underscores from the assembly macro names
riscv: alternatives: Don't name unused macro parameters
riscv: Don't duplicate __ALTERNATIVE_CFG in __ALTERNATIVE_CFG_2
riscv: mm: call best_map_size many times during linear-mapping
riscv: Move cast inside kernel_mapping_[pv]a_to_[vp]a
riscv: Fix crash during early errata patching
riscv: boot: add zstd support
...
- Refactor the zboot code so that it incorporates all the EFI stub
logic, rather than calling the decompressed kernel as a EFI app.
- Add support for initrd= command line option to x86 mixed mode.
- Allow initrd= to be used with arbitrary EFI accessible file systems
instead of just the one the kernel itself was loaded from.
- Move some x86-only handling and manipulation of the EFI memory map
into arch/x86, as it is not used anywhere else.
- More flexible handling of any random seeds provided by the boot
environment (i.e., systemd-boot) so that it becomes available much
earlier during the boot.
- Allow improved arch-agnostic EFI support in loaders, by setting a
uniform baseline of supported features, and adding a generic magic
number to the DOS/PE header. This should allow loaders such as GRUB or
systemd-boot to reduce the amount of arch-specific handling
substantially.
- (arm64) Run EFI runtime services from a dedicated stack, and use it to
recover from synchronous exceptions that might occur in the firmware
code.
- (arm64) Ensure that we don't allocate memory outside of the 48-bit
addressable physical range.
- Make EFI pstore record size configurable
- Add support for decoding CXL specific CPER records
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE+9lifEBpyUIVN1cpw08iOZLZjyQFAmOTQ1cACgkQw08iOZLZ
jyQRkAv+LqaZFWeVwhAQHiw/N3RnRM0nZHea6++D2p1y/ZbCpwv3pdLl2YHQ1KmW
wDG9Nr4C1ITLtfy1YZKeYpwloQtq9S1GZDWnFpVv/hdo7L924eRAwIlxowWn1OnP
ruxv2PaYXyb0plh1YD1f6E1BqrfUOtajET55Kxs9ZsxmnMtDpIX3NiYy4LKMBIZC
+Eywt41M3uBX+wgmSujFBMVVJjhOX60WhUYXqy0RXwDKOyrz/oW5td+eotSCreB6
FVbjvwQvUdtzn4s1FayOMlTrkxxLw4vLhsaUGAdDOHd3rg3sZT9Xh1HqFFD6nss6
ZAzAYQ6BzdiV/5WSB9meJe+BeG1hjTNKjJI6JPO2lctzYJqlnJJzI6JzBuH9vzQ0
dffLB8NITeEW2rphIh+q+PAKFFNbXWkJtV4BMRpqmzZ/w7HwupZbUXAzbWE8/5km
qlFpr0kmq8GlVcbXNOFjmnQVrJ8jPYn+O3AwmEiVAXKZJOsMH0sjlXHKsonme9oV
Sk71c6Em
=JEXz
-----END PGP SIGNATURE-----
Merge tag 'efi-next-for-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
Pull EFI updates from Ard Biesheuvel:
"Another fairly sizable pull request, by EFI subsystem standards.
Most of the work was done by me, some of it in collaboration with the
distro and bootloader folks (GRUB, systemd-boot), where the main focus
has been on removing pointless per-arch differences in the way EFI
boots a Linux kernel.
- Refactor the zboot code so that it incorporates all the EFI stub
logic, rather than calling the decompressed kernel as a EFI app.
- Add support for initrd= command line option to x86 mixed mode.
- Allow initrd= to be used with arbitrary EFI accessible file systems
instead of just the one the kernel itself was loaded from.
- Move some x86-only handling and manipulation of the EFI memory map
into arch/x86, as it is not used anywhere else.
- More flexible handling of any random seeds provided by the boot
environment (i.e., systemd-boot) so that it becomes available much
earlier during the boot.
- Allow improved arch-agnostic EFI support in loaders, by setting a
uniform baseline of supported features, and adding a generic magic
number to the DOS/PE header. This should allow loaders such as GRUB
or systemd-boot to reduce the amount of arch-specific handling
substantially.
- (arm64) Run EFI runtime services from a dedicated stack, and use it
to recover from synchronous exceptions that might occur in the
firmware code.
- (arm64) Ensure that we don't allocate memory outside of the 48-bit
addressable physical range.
- Make EFI pstore record size configurable
- Add support for decoding CXL specific CPER records"
* tag 'efi-next-for-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: (43 commits)
arm64: efi: Recover from synchronous exceptions occurring in firmware
arm64: efi: Execute runtime services from a dedicated stack
arm64: efi: Limit allocations to 48-bit addressable physical region
efi: Put Linux specific magic number in the DOS header
efi: libstub: Always enable initrd command line loader and bump version
efi: stub: use random seed from EFI variable
efi: vars: prohibit reading random seed variables
efi: random: combine bootloader provided RNG seed with RNG protocol output
efi/cper, cxl: Decode CXL Error Log
efi/cper, cxl: Decode CXL Protocol Error Section
efi: libstub: fix efi_load_initrd_dev_path() kernel-doc comment
efi: x86: Move EFI runtime map sysfs code to arch/x86
efi: runtime-maps: Clarify purpose and enable by default for kexec
efi: pstore: Add module parameter for setting the record size
efi: xen: Set EFI_PARAVIRT for Xen dom0 boot on all architectures
efi: memmap: Move manipulation routines into x86 arch tree
efi: memmap: Move EFI fake memmap support into x86 arch tree
efi: libstub: Undeprecate the command line initrd loader
efi: libstub: Add mixed mode support to command line initrd loader
efi: libstub: Permit mixed mode return types other than efi_status_t
...
- Core:
- The timer_shutdown[_sync]() infrastructure:
Tearing down timers can be tedious when there are circular
dependencies to other things which need to be torn down. A prime
example is timer and workqueue where the timer schedules work and the
work arms the timer.
What needs to prevented is that pending work which is drained via
destroy_workqueue() does not rearm the previously shutdown
timer. Nothing in that shutdown sequence relies on the timer being
functional.
The conclusion was that the semantics of timer_shutdown_sync() should
be:
- timer is not enqueued
- timer callback is not running
- timer cannot be rearmed
Preventing the rearming of shutdown timers is done by discarding rearm
attempts silently. A warning for the case that a rearm attempt of a
shutdown timer is detected would not be really helpful because it's
entirely unclear how it should be acted upon. The only way to address
such a case is to add 'if (in_shutdown)' conditionals all over the
place. This is error prone and in most cases of teardown not required
all.
- The real fix for the bluetooth HCI teardown based on
timer_shutdown_sync().
A larger scale conversion to timer_shutdown_sync() is work in
progress.
- Consolidation of VDSO time namespace helper functions
- Small fixes for timer and timerqueue
- Drivers:
- Prevent integer overflow on the XGene-1 TVAL register which causes
an never ending interrupt storm.
- The usual set of new device tree bindings
- Small fixes and improvements all over the place
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmOUuC0THHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYodpZD/9kCDi009n65QFF1J4kE5aZuABbRMtO
7sy66fJpDyB/MtcbPPH29uzQUEs1VMTQVB+ZM+7e1YGoxSWuSTzeoFH+yK1w4tEZ
VPbOcvUEjG0esKUehwYFeOjSnIjy6M1Y41aOUaDnq00/azhfTrzLxQA1BbbFbkpw
S7u2hllbyRJ8KdqQyV9cVpXmze6fcpdtNhdQeoA7qQCsSPnJ24MSpZ/PG9bAovq8
75IRROT7CQRd6AMKAVpA9Ov8ak9nbY3EgQmoKcp5ZXfXz8kD3nHky9Lste7djgYB
U085Vwcelt39V5iXevDFfzrBYRUqrMKOXIf2xnnoDNeF5Jlj5gChSNVZwTLO38wu
RFEVCjCjuC41GQJWSck9LRSYdriW/htVbEE8JLc6uzUJGSyjshgJRn/PK4HjpiLY
AvH2rd4rAap/rjDKvfWvBqClcfL7pyBvavgJeyJ8oXyQjHrHQwapPcsMFBm0Cky5
soF0Lr3hIlQ9u+hwUuFdNZkY9mOg09g9ImEjW1AZTKY0DfJMc5JAGjjSCfuopVUN
Uf/qqcUeQPSEaC+C9xiFs0T3svYFxBqpgPv4B6t8zAnozon9fyZs+lv5KdRg4X77
qX395qc6PaOSQlA7gcxVw3vjCPd0+hljXX84BORP7z+uzcsomvIH1MxJepIHmgaJ
JrYbSZ5qzY5TTA==
=JlDe
-----END PGP SIGNATURE-----
Merge tag 'timers-core-2022-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
"Updates for timers, timekeeping and drivers:
Core:
- The timer_shutdown[_sync]() infrastructure:
Tearing down timers can be tedious when there are circular
dependencies to other things which need to be torn down. A prime
example is timer and workqueue where the timer schedules work and
the work arms the timer.
What needs to prevented is that pending work which is drained via
destroy_workqueue() does not rearm the previously shutdown timer.
Nothing in that shutdown sequence relies on the timer being
functional.
The conclusion was that the semantics of timer_shutdown_sync()
should be:
- timer is not enqueued
- timer callback is not running
- timer cannot be rearmed
Preventing the rearming of shutdown timers is done by discarding
rearm attempts silently.
A warning for the case that a rearm attempt of a shutdown timer is
detected would not be really helpful because it's entirely unclear
how it should be acted upon. The only way to address such a case is
to add 'if (in_shutdown)' conditionals all over the place. This is
error prone and in most cases of teardown not required all.
- The real fix for the bluetooth HCI teardown based on
timer_shutdown_sync().
A larger scale conversion to timer_shutdown_sync() is work in
progress.
- Consolidation of VDSO time namespace helper functions
- Small fixes for timer and timerqueue
Drivers:
- Prevent integer overflow on the XGene-1 TVAL register which causes
an never ending interrupt storm.
- The usual set of new device tree bindings
- Small fixes and improvements all over the place"
* tag 'timers-core-2022-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (34 commits)
dt-bindings: timer: renesas,cmt: Add r8a779g0 CMT support
dt-bindings: timer: renesas,tmu: Add r8a779g0 support
clocksource/drivers/arm_arch_timer: Use kstrtobool() instead of strtobool()
clocksource/drivers/timer-ti-dm: Fix missing clk_disable_unprepare in dmtimer_systimer_init_clock()
clocksource/drivers/timer-ti-dm: Clear settings on probe and free
clocksource/drivers/timer-ti-dm: Make timer_get_irq static
clocksource/drivers/timer-ti-dm: Fix warning for omap_timer_match
clocksource/drivers/arm_arch_timer: Fix XGene-1 TVAL register math error
clocksource/drivers/timer-npcm7xx: Enable timer 1 clock before use
dt-bindings: timer: nuvoton,npcm7xx-timer: Allow specifying all clocks
dt-bindings: timer: rockchip: Add rockchip,rk3128-timer
clockevents: Repair kernel-doc for clockevent_delta2ns()
clocksource/drivers/ingenic-ost: Define pm functions properly in platform_driver struct
clocksource/drivers/sh_cmt: Access registers according to spec
vdso/timens: Refactor copy-pasted find_timens_vvar_page() helper into one copy
Bluetooth: hci_qca: Fix the teardown problem for real
timers: Update the documentation to reflect on the new timer_shutdown() API
timers: Provide timer_shutdown[_sync]()
timers: Add shutdown mechanism to the internal functions
timers: Split [try_to_]del_timer[_sync]() to prepare for shutdown mode
...
Palmer Dabbelt <palmer@rivosinc.com> says:
This contains a pair of cleanups that depend on a fix that has already
landed upstream.
* b4-shazam-merge:
RISC-V: Add some comments about the shadow and overflow stacks
RISC-V: Align the shadow stack
riscv: fix race when vmap stack overflow
Link: https://lore.kernel.org/r/20221130023515.20217-1-palmer@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
It took me a while to page all this back in when trying to review the
recent spin_shadow_stack, so I figured I'd just write up some comments.
Reviewed-by: Guo Ren <guoren@kernel.org>
Reviewed-by: Jisheng Zhang <jszhang@kernel.org>
Link: https://lore.kernel.org/r/20221130023515.20217-2-palmer@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
The standard RISC-V ABIs all require 16-byte stack alignment. We're
only calling that one function on the shadow stack so I doubt it'd
result in a real issue, but might as well keep this lined up.
Fixes: 31da94c25a ("riscv: add VMAP_STACK overflow detection")
Reviewed-by: Jisheng Zhang <jszhang@kernel.org>
Link: https://lore.kernel.org/r/20221130023515.20217-1-palmer@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Andrew Jones <ajones@ventanamicro.com> says:
When a DT puts zicbom in the isa string, but does not provide a block
size, ALT_CMO_OP() will attempt to do cache operations on address
zero since the start address will be ANDed with zero. We can't simply
BUG() in riscv_init_cbom_blocksize() when we fail to find a block
size because the failure will happen before logging works, leaving
users to scratch their heads as to why the boot hung. Instead, ensure
Zicbom is disabled and output an error which will hopefully alert
people that the DT needs to be fixed. While at it, add a check that
the block size is a power-of-2 too.
* b4-shazam-merge:
RISC-V: Ensure Zicbom has a valid block size
RISC-V: Introduce riscv_isa_extension_check
RISC-V: Improve use of isa2hwcap[]
Link: https://lore.kernel.org/r/20221129143447.49714-1-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
When a DT puts zicbom in the isa string, but does not provide a block
size, ALT_CMO_OP() will attempt to do cache operations on address
zero since the start address will be ANDed with zero. We can't simply
BUG() in riscv_init_cbom_blocksize() when we fail to find a block
size because the failure will happen before logging works, leaving
users to scratch their heads as to why the boot hung. Instead, ensure
Zicbom is disabled and output an error which will hopefully alert
people that the DT needs to be fixed. While at it, add a check that
the block size is a power-of-2 too.
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Link: https://lore.kernel.org/r/20221129143447.49714-4-ajones@ventanamicro.com
[Palmer: base on 5c20a3a9df ("RISC-V: Fix compilation without RISCV_ISA_ZICBOM"]
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Currently any isa extension found in the isa string is set in the
isa bitmap. An isa extension set in the bitmap indicates that the
extension is present and may be used (a.k.a is enabled). However,
when an extension cannot be used due to missing dependencies or
errata it should not be added to the bitmap. Introduce a function
where additional checks may be placed in order to determine if an
extension should be enabled or not.
Note, the checks may simply indicate an issue with the DT, but,
since extensions may be used in early boot, it's not always possible
to simply produce an error at the point the issue is determined.
It's best to keep the extension disabled and produce an error.
No functional change intended, as the function is only introduced
and always returns true. A later patch will provide checks for an
isa extension.
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Link: https://lore.kernel.org/r/20221129143447.49714-3-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Improve isa2hwcap[] by removing it from static storage, as
riscv_fill_hwcap() is only called once, and by reducing its size
from 256 bytes to 26. The latter improvement is possible because
isa2hwcap[] will never be indexed with capital letters and we can
precompute the offsets from 'a'.
No functional change intended.
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Link: https://lore.kernel.org/r/20221129143447.49714-2-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
I'm merging this in as a single patch to make it easier to handle the
backports.
* b4-shazam-merge:
RISC-V: Fix unannoted hardirqs-on in return to userspace slow-path
Link: https://lore.kernel.org/r/20221111223108.1976562-1-abrestic@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
The return to userspace path in entry.S may enable interrupts without the
corresponding lockdep annotation, producing a splat[0] when DEBUG_LOCKDEP
is enabled. Simply calling __trace_hardirqs_on() here gets a bit messy
due to the use of RA to point back to ret_from_exception, so just move
the whole slow-path loop into C. It's more readable and it lets us use
local_irq_{enable,disable}(), avoiding the need for manual annotations
altogether.
[0]:
------------[ cut here ]------------
DEBUG_LOCKS_WARN_ON(!lockdep_hardirqs_enabled())
WARNING: CPU: 2 PID: 1 at kernel/locking/lockdep.c:5512 check_flags+0x10a/0x1e0
Modules linked in:
CPU: 2 PID: 1 Comm: init Not tainted 6.1.0-rc4-00160-gb56b6e2b4f31 #53
Hardware name: riscv-virtio,qemu (DT)
epc : check_flags+0x10a/0x1e0
ra : check_flags+0x10a/0x1e0
<snip>
status: 0000000200000100 badaddr: 0000000000000000 cause: 0000000000000003
[<ffffffff808edb90>] lock_is_held_type+0x78/0x14e
[<ffffffff8003dae2>] __might_resched+0x26/0x22c
[<ffffffff8003dd24>] __might_sleep+0x3c/0x66
[<ffffffff80022c60>] get_signal+0x9e/0xa70
[<ffffffff800054a2>] do_notify_resume+0x6e/0x422
[<ffffffff80003c68>] ret_from_exception+0x0/0x10
irq event stamp: 44512
hardirqs last enabled at (44511): [<ffffffff808f901c>] _raw_spin_unlock_irqrestore+0x54/0x62
hardirqs last disabled at (44512): [<ffffffff80008200>] __trace_hardirqs_off+0xc/0x14
softirqs last enabled at (44472): [<ffffffff808f9fbe>] __do_softirq+0x3de/0x51e
softirqs last disabled at (44467): [<ffffffff80017760>] irq_exit+0xd6/0x104
---[ end trace 0000000000000000 ]---
possible reason: unannotated irqs-on.
Signed-off-by: Andrew Bresticker <abrestic@rivosinc.com>
Fixes: 3c46979829 ("riscv: Enable LOCKDEP_SUPPORT & fixup TRACE_IRQFLAGS_SUPPORT")
Link: https://lore.kernel.org/r/20221111223108.1976562-1-abrestic@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
The sbi_get_mvendorid(), sbi_get_marchid(), and sbi_get_mimpid()
can be used by KVM module so let us export these functions.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
The 'retp' is a pointer to the return address on the stack, so we
must pass the current return address pointer as the 'retp'
argument to ftrace_push_return_trace(). Not parent function's
return address on the stack.
Fixes: b785ec129b ("riscv/ftrace: Add HAVE_FUNCTION_GRAPH_RET_ADDR_PTR support")
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20221109064937.3643993-2-guoren@kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
This is reported by kmemleak detector:
unreferenced object 0xff2000000403d000 (size 4096):
comm "kexec", pid 146, jiffies 4294900633 (age 64.792s)
hex dump (first 32 bytes):
7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 .ELF............
04 00 f3 00 01 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<00000000566ca97c>] kmemleak_vmalloc+0x3c/0xbe
[<00000000979283d8>] __vmalloc_node_range+0x3ac/0x560
[<00000000b4b3712a>] __vmalloc_node+0x56/0x62
[<00000000854f75e2>] vzalloc+0x2c/0x34
[<00000000e9a00db9>] crash_prepare_elf64_headers+0x80/0x30c
[<0000000067e8bf48>] elf_kexec_load+0x3e8/0x4ec
[<0000000036548e09>] kexec_image_load_default+0x40/0x4c
[<0000000079fbe1b4>] sys_kexec_file_load+0x1c4/0x322
[<0000000040c62c03>] ret_from_syscall+0x0/0x2
In elf_kexec_load(), a buffer is allocated via vzalloc() to store elf
headers. While it's not freed back to system when kdump kernel is
reloaded or unloaded, or when image->elf_header is successfully set and
then fails to load kdump kernel for some reason. Fix it by freeing the
buffer in arch_kimage_file_post_load_cleanup().
Fixes: 8acea455fa ("RISC-V: Support for kexec_file on panic")
Signed-off-by: Li Huafei <lihuafei1@huawei.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20221104095658.141222-2-lihuafei1@huawei.com
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
This is reported by kmemleak detector:
unreferenced object 0xff60000082864000 (size 9588):
comm "kexec", pid 146, jiffies 4294900634 (age 64.788s)
hex dump (first 32 bytes):
d0 0d fe ed 00 00 12 ed 00 00 00 48 00 00 11 40 ...........H...@
00 00 00 28 00 00 00 11 00 00 00 02 00 00 00 00 ...(............
backtrace:
[<00000000f95b17c4>] kmemleak_alloc+0x34/0x3e
[<00000000b9ec8e3e>] kmalloc_order+0x9c/0xc4
[<00000000a95cf02e>] kmalloc_order_trace+0x34/0xb6
[<00000000f01e68b4>] __kmalloc+0x5c2/0x62a
[<000000002bd497b2>] kvmalloc_node+0x66/0xd6
[<00000000906542fa>] of_kexec_alloc_and_setup_fdt+0xa6/0x6ea
[<00000000e1166bde>] elf_kexec_load+0x206/0x4ec
[<0000000036548e09>] kexec_image_load_default+0x40/0x4c
[<0000000079fbe1b4>] sys_kexec_file_load+0x1c4/0x322
[<0000000040c62c03>] ret_from_syscall+0x0/0x2
In elf_kexec_load(), a buffer is allocated via kvmalloc() to store fdt.
While it's not freed back to system when kexec kernel is reloaded or
unloaded. Then memory leak is caused. Fix it by introducing riscv
specific function arch_kimage_file_post_load_cleanup(), and freeing the
buffer there.
Fixes: 6261586e0c ("RISC-V: Add kexec_file support")
Signed-off-by: Li Huafei <lihuafei1@huawei.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Liao Chang <liaochang1@huawei.com>
Link: https://lore.kernel.org/r/20221104095658.141222-1-lihuafei1@huawei.com
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Add arch_crash_save_vmcoreinfo(), which exports VM layout(MODULES,
VMALLOC, VMEMMAP ranges and KERNEL_LINK_ADDR), va bits and ram base for
vmcore.
* b4-shazam-merge:
Documentation: kdump: describe VMCOREINFO export for RISCV64
RISC-V: Add arch_crash_save_vmcoreinfo support
Link: https://lore.kernel.org/r/20221026144208.373504-1-xianting.tian@linux.alibaba.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Add arch_crash_save_vmcoreinfo(), which exports VM layout(MODULES,
VMALLOC, VMEMMAP ranges and KERNEL_LINK_ADDR), va bits and ram base for
vmcore.
Default pagetable levels and PAGE_OFFSET aren't same for different
kernel version as below. For pagetable levels, it sets sv57 by default
and falls back to setting sv48 at boot time if sv57 is not supported by
the hardware.
For ram base, the default value is 0x80200000 for qemu riscv64 env and,
for example, is 0x200000 on the XuanTie 910 CPU.
* Linux Kernel 5.18 ~
* PGTABLE_LEVELS = 5
* PAGE_OFFSET = 0xff60000000000000
* Linux Kernel 5.17 ~
* PGTABLE_LEVELS = 4
* PAGE_OFFSET = 0xffffaf8000000000
* Linux Kernel 4.19 ~
* PGTABLE_LEVELS = 3
* PAGE_OFFSET = 0xffffffe000000000
Since these configurations change from time to time and version to
version, it is preferable to export them via vmcoreinfo than to change
the crash's code frequently, it can simplify the development of crash
tool.
Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
Tested-by: Deepak Gupta <debug@rivosinc.com>
Tested-by: Guo Ren <guoren@kernel.org>
Acked-by: Baoquan He <bhe@redhat.com>
Link: https://lore.kernel.org/r/20221026144208.373504-2-xianting.tian@linux.alibaba.com
[Palmer: wrap commit text]
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Implement the kretprobes on riscv arch by using rethook machenism
which abstracts general kretprobe info into a struct rethook_node
to be embedded in the struct kretprobe_instance.
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Binglei Wang <l3b2w1@gmail.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20221025151831.1097417-1-conor@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Jamie Iles <jamie@jamieiles.com> says:
This series enables dynamic ftrace support for RV32I bringing it to
parity with RV64I. Most of the work is already there, this is largely
just assembly fixes to handle register sizes, correct handling of the
psABI calling convention and Kconfig change.
Validated with all ftrace boot time self test with qemu for RV32I and
RV64I in addition to real tracing on an RV32I FPGA design.
* b4-shazam-merge:
RISC-V: enable dynamic ftrace for RV32I
RISC-V: preserve a1 in mcount
RISC-V: reduce mcount save space on RV32
RISC-V: use REG_S/REG_L for mcount
Link: https://lore.kernel.org/r/20221115200832.706370-1-jamie@jamieiles.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
The RISC-V ELF psABI states that both a0 and a1 are used for return
values so we should preserve them both in return_to_handler. This is
especially important for RV32 for functions returning a 64-bit quantity
otherwise the return value can be corrupted and undefined behaviour
results.
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Jamie Iles <jamie@jamieiles.com>
Link: https://lore.kernel.org/r/20221115200832.706370-4-jamie@jamieiles.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
For RV32 we can reduce the size of the ABI save+restore state by using
SZREG so that register stores are packed rather than on an 8 byte
boundary.
Signed-off-by: Jamie Iles <jamie@jamieiles.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20221115200832.706370-3-jamie@jamieiles.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
In preparation for rv32i ftrace support, convert mcount routines to use
native sized loads/stores.
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Jamie Iles <jamie@jamieiles.com>
Link: https://lore.kernel.org/r/20221115200832.706370-2-jamie@jamieiles.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
This fixes a concrete bug but is also the basis for some cleanup work,
so I'm merging it based on the offending commit in order to minimize
future conflicts.
* commit '7e1864332fbc1b993659eab7974da9fe8bf8c128':
riscv: fix race when vmap stack overflow
find_timens_vvar_page() is not architecture-specific, as can be seen from
how all five per-architecture versions of it are the same.
(arm64, powerpc and riscv are exactly the same; x86 and s390 have two
characters difference inside a comment, less blank lines, and mark the
!CONFIG_TIME_NS version as inline.)
Refactor the five copies into a central copy in kernel/time/namespace.c.
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20221130115320.2918447-1-jannh@google.com
If a crash happens on cpu3 and all interrupts are binding on cpu0, the
bad irq routing will cause a crash kernel which can't receive any irq.
Because crash kernel won't clean up all harts' PLIC enable bits in
enable registers. This patch is similar to 9141a003a4 ("ARM: 7316/1:
kexec: EOI active and mask all interrupts in kexec crash path") and
78fd584cde ("arm64: kdump: implement machine_crash_shutdown()"), and
PowerPC also has the same mechanism.
Fixes: fba8a8674f ("RISC-V: Add kexec support")
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Reviewed-by: Xianting Tian <xianting.tian@linux.alibaba.com>
Cc: Nick Kossifidis <mick@ics.forth.gr>
Cc: Palmer Dabbelt <palmer@rivosinc.com>
Link: https://lore.kernel.org/r/20221020141603.2856206-2-guoren@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
64-bit RISC-V kernels have the kernel image mapped separately to alias
the linear map. The linear map and the kernel image map are documented
as "direct mapping" and "kernel" respectively in [1].
At image load time, the linear map corresponding to the kernel image
is set to PAGE_READ permission, and the kernel image map is set to
PAGE_READ|PAGE_EXEC.
When the initmem is freed, the pages in the linear map should be
restored to PAGE_READ|PAGE_WRITE, whereas the corresponding pages in
the kernel image map should be restored to PAGE_READ, by removing the
PAGE_EXEC permission.
This is not the case. For 64-bit kernels, only the linear map is
restored to its proper page permissions at initmem free, and not the
kernel image map.
In practise this results in that the kernel can potentially jump to
dead __init code, and start executing invalid instructions, without
getting an exception.
Restore the freed initmem properly, by setting both the kernel image
map to the correct permissions.
[1] Documentation/riscv/vm-layout.rst
Fixes: e5c35fa040 ("riscv: Map the kernel with correct permissions the first time")
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Reviewed-by: Alexandre Ghiti <alex@ghiti.fr>
Tested-by: Alexandre Ghiti <alex@ghiti.fr>
Link: https://lore.kernel.org/r/20221115090641.258476-1-bjorn@kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
lkp reported a build error, I tried the config and can reproduce
build error as below:
VDSOLD arch/riscv/kernel/vdso/vdso.so.dbg
ld.lld: error: section .note file range overlaps with .text
>>> .note range is [0x7C8, 0x803]
>>> .text range is [0x800, 0x1993]
ld.lld: error: section .text file range overlaps with .dynamic
>>> .text range is [0x800, 0x1993]
>>> .dynamic range is [0x808, 0x937]
ld.lld: error: section .note virtual address range overlaps with .text
>>> .note range is [0x7C8, 0x803]
>>> .text range is [0x800, 0x1993]
Fix it by setting DISABLE_BRANCH_PROFILING which will disable branch
tracing for vdso, thus avoid useless _ftrace_annotated_branch section
and _ftrace_branch section. Although we can also fix it by removing
the hardcoded .text begin address, but I think that's another story
and should be put into another patch.
Link: https://lore.kernel.org/lkml/202210122123.Cc4FPShJ-lkp@intel.com/#r
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Link: https://lore.kernel.org/r/20221102170254.1925-1-jszhang@kernel.org
Fixes: ad5d1122b8 ("riscv: use vDSO common flow to reduce the latency of the time-related functions")
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Currently, when detecting vmap stack overflow, riscv firstly switches
to the so called shadow stack, then use this shadow stack to call the
get_overflow_stack() to get the overflow stack. However, there's
a race here if two or more harts use the same shadow stack at the same
time.
To solve this race, we introduce spin_shadow_stack atomic var, which
will be swap between its own address and 0 in atomic way, when the
var is set, it means the shadow_stack is being used; when the var
is cleared, it means the shadow_stack isn't being used.
Fixes: 31da94c25a ("riscv: add VMAP_STACK overflow detection")
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Suggested-by: Guo Ren <guoren@kernel.org>
Reviewed-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20221030124517.2370-1-jszhang@kernel.org
[Palmer: Add AQ to the swap, and also some comments.]
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Recently, ld.lld moved from '--undefined-version' to
'--no-undefined-version' as the default, which breaks the compat vDSO
build:
ld.lld: error: version script assignment of 'LINUX_4.15' to symbol '__vdso_gettimeofday' failed: symbol not defined
ld.lld: error: version script assignment of 'LINUX_4.15' to symbol '__vdso_clock_gettime' failed: symbol not defined
ld.lld: error: version script assignment of 'LINUX_4.15' to symbol '__vdso_clock_getres' failed: symbol not defined
These symbols are not present in the compat vDSO or the regular vDSO for
32-bit but they are unconditionally included in the version section of
the linker script, which is prohibited with '--no-undefined-version'.
Fix this issue by only including the symbols that are actually exported
in the version section of the linker script.
Link: https://github.com/ClangBuiltLinux/linux/issues/1756
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20221108171324.3377226-1-nathan@kernel.org/
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Currently, RISC-V sets up reserved memory using the "early" copy of the
device tree. As a result, when trying to get a reserved memory region
using of_reserved_mem_lookup(), the pointer to reserved memory regions
is using the early, pre-virtual-memory address which causes a kernel
panic when trying to use the buffer's name:
Unable to handle kernel paging request at virtual address 00000000401c31ac
Oops [#1]
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 6.0.0-rc1-00001-g0d9d6953d834 #1
Hardware name: Microchip PolarFire-SoC Icicle Kit (DT)
epc : string+0x4a/0xea
ra : vsnprintf+0x1e4/0x336
epc : ffffffff80335ea0 ra : ffffffff80338936 sp : ffffffff81203be0
gp : ffffffff812e0a98 tp : ffffffff8120de40 t0 : 0000000000000000
t1 : ffffffff81203e28 t2 : 7265736572203a46 s0 : ffffffff81203c20
s1 : ffffffff81203e28 a0 : ffffffff81203d22 a1 : 0000000000000000
a2 : ffffffff81203d08 a3 : 0000000081203d21 a4 : ffffffffffffffff
a5 : 00000000401c31ac a6 : ffff0a00ffffff04 a7 : ffffffffffffffff
s2 : ffffffff81203d08 s3 : ffffffff81203d00 s4 : 0000000000000008
s5 : ffffffff000000ff s6 : 0000000000ffffff s7 : 00000000ffffff00
s8 : ffffffff80d9821a s9 : ffffffff81203d22 s10: 0000000000000002
s11: ffffffff80d9821c t3 : ffffffff812f3617 t4 : ffffffff812f3617
t5 : ffffffff812f3618 t6 : ffffffff81203d08
status: 0000000200000100 badaddr: 00000000401c31ac cause: 000000000000000d
[<ffffffff80338936>] vsnprintf+0x1e4/0x336
[<ffffffff80055ae2>] vprintk_store+0xf6/0x344
[<ffffffff80055d86>] vprintk_emit+0x56/0x192
[<ffffffff80055ed8>] vprintk_default+0x16/0x1e
[<ffffffff800563d2>] vprintk+0x72/0x80
[<ffffffff806813b2>] _printk+0x36/0x50
[<ffffffff8068af48>] print_reserved_mem+0x1c/0x24
[<ffffffff808057ec>] paging_init+0x528/0x5bc
[<ffffffff808031ae>] setup_arch+0xd0/0x592
[<ffffffff8080070e>] start_kernel+0x82/0x73c
early_init_fdt_scan_reserved_mem() takes no arguments as it operates on
initial_boot_params, which is populated by early_init_dt_verify(). On
RISC-V, early_init_dt_verify() is called twice. Once, directly, in
setup_arch() if CONFIG_BUILTIN_DTB is not enabled and once indirectly,
very early in the boot process, by parse_dtb() when it calls
early_init_dt_scan_nodes().
This first call uses dtb_early_va to set initial_boot_params, which is
not usable later in the boot process when
early_init_fdt_scan_reserved_mem() is called. On arm64 for example, the
corresponding call to early_init_dt_scan_nodes() uses fixmap addresses
and doesn't suffer the same fate.
Move early_init_fdt_scan_reserved_mem() further along the boot sequence,
after the direct call to early_init_dt_verify() in setup_arch() so that
the names use the correct virtual memory addresses. The above supposed
that CONFIG_BUILTIN_DTB was not set, but should work equally in the case
where it is - unflatted_and_copy_device_tree() also updates
initial_boot_params.
Reported-by: Valentina Fernandez <valentina.fernandezalanis@microchip.com>
Reported-by: Evgenii Shatokhin <e.shatokhin@yadro.com>
Link: https://lore.kernel.org/linux-riscv/f8e67f82-103d-156c-deb0-d6d6e2756f5e@microchip.com/
Fixes: 922b0375fc ("riscv: Fix memblock reservation for device tree blob")
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Tested-by: Evgenii Shatokhin <e.shatokhin@yadro.com>
Link: https://lore.kernel.org/r/20221107151524.3941467-1-conor.dooley@microchip.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Even after commit 89fd4a1df8 ("riscv: jump_label: mark arguments as
const to satisfy asm constraints"), building with CC_OPTIMIZE_FOR_SIZE
+ LLVM=1 can reproduce below build error:
CC arch/riscv/kernel/vdso/vgettimeofday.o
In file included from <built-in>:4:
In file included from lib/vdso/gettimeofday.c:5:
In file included from include/vdso/datapage.h:17:
In file included from include/vdso/processor.h:10:
In file included from arch/riscv/include/asm/vdso/processor.h:7:
In file included from include/linux/jump_label.h:112:
arch/riscv/include/asm/jump_label.h:42:3: error:
invalid operand for inline asm constraint 'i'
" .option push \n\t"
^
1 error generated.
I think the problem is when "-Os" is passed as CFLAGS, it's removed by
"CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os" which is
introduced in commit e05d57dcb8 ("riscv: Fixup __vdso_gettimeofday
broke dynamic ftrace"), thus no optimization at all for vgettimeofday.c
arm64 does remove "-Os" as well, but it forces "-O2" after removing
"-Os".
I compared the generated vgettimeofday.o with "-O2" and "-Os",
I think no big performance difference. So let's tell the kbuild not
to remove "-Os" rather than follow arm64 style.
vdso related performance can be improved a lot when building kernel with
CC_OPTIMIZE_FOR_SIZE after this commit, ("-Os" VS no optimization)
Fixes: e05d57dcb8 ("riscv: Fixup __vdso_gettimeofday broke dynamic ftrace")
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20221031182943.2453-1-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Clone the implementations of strrchr() and memchr() in lib/string.c so
we can use them in the standalone zboot decompressor app. These routines
are used by the FDT handling code.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Split the efi_printk() routine into its own source file, and provide
local implementations of strlen() and strnlen() so that the standalone
zboot app can efi_err and efi_info etc.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
We will no longer be able to call into the kernel image once we merge
the decompressor with the EFI stub, so we need our own implementation of
memcmp(). Let's add the one from lib/string.c and simplify it.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
In preparation for moving the EFI stub functionality into the zboot
decompressor, switch to the stub's implementation of strncmp()
unconditionally.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Commit 78e5a33994 ("cpumask: fix checking valid cpu range") has
started issuing warnings[*] when cpu indices equal to nr_cpu_ids - 1
are passed to cpumask_next* functions. seq_read_iter() and cpuinfo's
start and next seq operations implement a pattern like
n = cpumask_next(n - 1, mask);
show(n);
while (1) {
++n;
n = cpumask_next(n - 1, mask);
if (n >= nr_cpu_ids)
break;
show(n);
}
which will issue the warning when reading /proc/cpuinfo. Ensure no
warning is generated by validating the cpu index before calling
cpumask_next().
[*] Warnings will only appear with DEBUG_PER_CPU_MAPS enabled.
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Acked-by: Yury Norov <yury.norov@gmail.com>
Link: https://lore.kernel.org/r/20221014155845.1986223-2-ajones@ventanamicro.com/
Fixes: 78e5a33994 ("cpumask: fix checking valid cpu range")
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
sbi_get_mvendorid(), sbi_get_marchid() and sbi_get_mimpid() might get
called multiple times, though the values of these CSRs should not change
during the runtime of a specific machine.
Though the values can be different depending on which hart of the system
they get called. So hook into the newly introduced cpuinfo struct to allow
retrieving these cached values via new functions.
Also use arch_initcall for the cpuinfo setup instead, as that now clearly
is "architecture specific initialization" and also makes these information
available slightly earlier.
[caching vendor ids]
Suggested-by: Atish Patra <atishp@atishpatra.org>
[using cpuinfo struct as cache]
Suggested-by: Anup Patel <apatel@ventanamicro.com>
Link: https://lore.kernel.org/all/20221011231841.2951264-2-heiko@sntech.de/
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
* A handful of DT updates for the PolarFire SOC.
* A fix to correct the handling of write-only mappings.
* m{vetndor,arcd,imp}id is now in /proc/cpuinfo
* The SiFive L2 cache controller support has been refactored to also
support L3 caches.
There's also a handful of fixes, cleanups and improvements throughout
the tree.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmNJiegTHHBhbG1lckBk
YWJiZWx0LmNvbQAKCRAuExnzX7sYiaW+D/9nN7JMKD4KbED8MUgRVcs+TS4MQk5J
QkjmAXme7w2H1+T5mxNWHk0QvEC6qMu2JQWHeott0ROYPkbXoNtuOOkzcsZhVaeb
rjWH/WC5QhEeMPDc1qc0AmxVkOa937f2NkGtoEvhW6SMWvStHpefdOHo/ij106Re
7wxkcj0fjgn/zPmDltRbWSMyFZWJec5DvZ3AB4NYHnc2ycr8Z7HAG08rjZkdkIjQ
zYGsCkUtrk4qShikKK3cSelW6hH1/FdM+bAo0rzt1frmTw1FLaFOsZZjOaVYekzi
l0jxsb8FRBWyKBFqcagukjZwFy6D+7Q+masb0cXY03eZpEVrroA4bHiPkuZ22Ms/
7ol/I5FvTyrj2R4zd70ziYVF3usO78t5HC4AIQmFl25TNcQdYQd8X28yh12iG6QN
Pa0lh/EOr5idaT2+TErhzRepICnK1Nj9y0H5TZxYljLAhH9j0d/8+Iw88vzJnPga
vek7unZ3BzkooLVIpfpkT8vC94MA0hoP849MVFQDtZgZhPMbjIkN91HrDiw9ktV2
SK9cuPndfrs5nW1WKu8F0cDziusbMHv4F51TPVRu/dFcIkspv+aLojAThgvcm42K
55LgvDgLjo7P3PUghiDXUoPZXvL19t5Bnq2/E47rlSTvvGssIu/onZO6BkxybAm6
BkHuchr8TGBWMg==
=iCOr
-----END PGP SIGNATURE-----
Merge tag 'riscv-for-linus-6.1-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull more RISC-V updates from Palmer Dabbelt:
- DT updates for the PolarFire SOC
- a fix to correct the handling of write-only mappings
- m{vetndor,arcd,imp}id is now in /proc/cpuinfo
- the SiFive L2 cache controller support has been refactored to also
support L3 caches
- misc fixes, cleanups and improvements throughout the tree
* tag 'riscv-for-linus-6.1-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (42 commits)
MAINTAINERS: add RISC-V's patchwork
RISC-V: Make port I/O string accessors actually work
riscv: enable software resend of irqs
RISC-V: Re-enable counter access from userspace
riscv: vdso: fix NULL deference in vdso_join_timens() when vfork
riscv: Add cache information in AUX vector
soc: sifive: ccache: define the macro for the register shifts
soc: sifive: ccache: use pr_fmt() to remove CCACHE: prefixes
soc: sifive: ccache: reduce printing on init
soc: sifive: ccache: determine the cache level from dts
soc: sifive: ccache: Rename SiFive L2 cache to Composable cache.
dt-bindings: sifive-ccache: change Sifive L2 cache to Composable cache
riscv: check for kernel config option in t-head memory types errata
riscv: use BIT() marco for cpufeature probing
riscv: use BIT() macros in t-head errata init
riscv: drop some idefs from CMO initialization
riscv: cleanup svpbmt cpufeature probing
riscv: Pass -mno-relax only on lld < 15.0.0
RISC-V: Avoid dereferening NULL regs in die()
dt-bindings: riscv: add new riscv,isa strings for emulators
...
I'm merging this in as a single commit as it's a dependency for some
other work.
* commit '3baca1a4d490484fcd555413f1fec85b2e071912':
RISC-V: Add mvendorid, marchid, and mimpid to /proc/cpuinfo output
Commit 2139619bca ("riscv: mmap with PROT_WRITE but no PROT_READ is
invalid") made mmap() reject mappings with only PROT_WRITE set in an
attempt to fix an observed inconsistency in behavior when attempting
to read from a PROT_WRITE-only mapping. The root cause of this behavior
was actually that while RISC-V's protection_map maps VM_WRITE to
readable PTE permissions (since write-only PTEs are considered reserved
by the privileged spec), the page fault handler considered loads from
VM_WRITE-only VMAs illegal accesses. Fix the underlying cause by
handling faults in VM_WRITE-only VMAs (patch 1) and then re-enable
use of mmap(PROT_WRITE) (patch 2), making RISC-V's behavior consistent
with all other architectures that don't support write-only PTEs.
* remotes/palmer/riscv-wonly:
riscv: Allow PROT_WRITE-only mmap()
riscv: Make VM_WRITE imply VM_READ
Link: https://lore.kernel.org/r/20220915193702.2201018-1-abrestic@rivosinc.com/
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Heiko Stuebner <heiko@sntech.de> says:
As noted by some people, some parts of the recently added extensions
(svpbmt, zicbom) + t-head errata could use some styling upgrades.
So this series provides these.
changes in v2:
- add patch also converting cpufeature probe to BIT()
- update commit message in patch1 (Conor)
Heiko Stuebner (5):
riscv: cleanup svpbmt cpufeature probing
riscv: drop some idefs from CMO initialization
riscv: use BIT() macros in t-head errata init
riscv: use BIT() marco for cpufeature probing
riscv: check for kernel config option in t-head memory types errata
arch/riscv/errata/thead/errata.c | 14 ++++++-----
arch/riscv/include/asm/cacheflush.h | 2 ++
arch/riscv/kernel/cpufeature.c | 39 ++++++++++++-----------------
3 files changed, 26 insertions(+), 29 deletions(-)
Link: https://lore.kernel.org/r/20220905111027.2463297-1-heiko@sntech.de
* b4-shazam-merge:
riscv: check for kernel config option in t-head memory types errata
riscv: use BIT() marco for cpufeature probing
riscv: use BIT() macros in t-head errata init
riscv: drop some idefs from CMO initialization
riscv: cleanup svpbmt cpufeature probing
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Wrapping things in #ifdefs makes the code harder to read
while we also have IS_ENABLED() macros to do this in regular code
and the extension detection is not _that_ runtime critical.
So define a stub for riscv_noncoherent_supported() in the
non-CONFIG_RISCV_DMA_NONCOHERENT case and move the code to
us IS_ENABLED.
Suggested-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Reviewed-by: Guo Ren <guoren@kernel.org>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20220905111027.2463297-3-heiko@sntech.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
For better readability (and compile time coverage) use IS_ENABLED
instead of ifdef and drop the new unneeded switch statement.
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Reviewed-by: Guo Ren <guoren@kernel.org>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20220905111027.2463297-2-heiko@sntech.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
I don't think we can actually die() without a regs pointer, but the
compiler was warning about a NULL check after a dereference. It seems
prudent to just avoid the possibly-NULL dereference, given that when
die()ing the system is already toast so who knows how we got there.
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20220920200037.6727-1-palmer@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
* Fixes for single-stepping in the presence of an async
exception as well as the preservation of PSTATE.SS
* Better handling of AArch32 ID registers on AArch64-only
systems
* Fixes for the dirty-ring API, allowing it to work on
architectures with relaxed memory ordering
* Advertise the new kvmarm mailing list
* Various minor cleanups and spelling fixes
RISC-V:
* Improved instruction encoding infrastructure for
instructions not yet supported by binutils
* Svinval support for both KVM Host and KVM Guest
* Zihintpause support for KVM Guest
* Zicbom support for KVM Guest
* Record number of signal exits as a VCPU stat
* Use generic guest entry infrastructure
x86:
* Misc PMU fixes and cleanups.
* selftests: fixes for Hyper-V hypercall
* selftests: fix nx_huge_pages_test on TDP-disabled hosts
* selftests: cleanups for fix_hypercall_test
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmM7OcMUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroPAFgf/Rqc9hrXZVdbh2OZ+gScSsFsPK1zO
DISUksLcXaYVYYsvQAEg/N2BPz3XbmO4jA+z8bIUrYTA7fC98we2C4jfR+EaX/fO
+/Kzf0lAgu/nQZyFzUya+1jRsZqvVbC/HmDCI2kzN4u78e/LZ7NVcMijdV/ly6ib
cq0b0LLqJHe/fcpJ806JZP3p5sndQhDmlUkZ2AWZf6CUKSEFcufbbYkt+84ZK4PL
N9mEqXYQ3DXClLQmIBv+NZhtGlmADkWDE4BNouw8dVxhaXH7Hw/jfBHdb6SSHMRe
tQ6Src1j8AYOhf5J35SMudgkbGcMelm0yeZ7Sizk+5Ft0EmdbZsnkvsGdQ==
=4RA+
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull more kvm updates from Paolo Bonzini:
"The main batch of ARM + RISC-V changes, and a few fixes and cleanups
for x86 (PMU virtualization and selftests).
ARM:
- Fixes for single-stepping in the presence of an async exception as
well as the preservation of PSTATE.SS
- Better handling of AArch32 ID registers on AArch64-only systems
- Fixes for the dirty-ring API, allowing it to work on architectures
with relaxed memory ordering
- Advertise the new kvmarm mailing list
- Various minor cleanups and spelling fixes
RISC-V:
- Improved instruction encoding infrastructure for instructions not
yet supported by binutils
- Svinval support for both KVM Host and KVM Guest
- Zihintpause support for KVM Guest
- Zicbom support for KVM Guest
- Record number of signal exits as a VCPU stat
- Use generic guest entry infrastructure
x86:
- Misc PMU fixes and cleanups.
- selftests: fixes for Hyper-V hypercall
- selftests: fix nx_huge_pages_test on TDP-disabled hosts
- selftests: cleanups for fix_hypercall_test"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (57 commits)
riscv: select HAVE_POSIX_CPU_TIMERS_TASK_WORK
RISC-V: KVM: Use generic guest entry infrastructure
RISC-V: KVM: Record number of signal exits as a vCPU stat
RISC-V: KVM: add __init annotation to riscv_kvm_init()
RISC-V: KVM: Expose Zicbom to the guest
RISC-V: KVM: Provide UAPI for Zicbom block size
RISC-V: KVM: Make ISA ext mappings explicit
RISC-V: KVM: Allow Guest use Zihintpause extension
RISC-V: KVM: Allow Guest use Svinval extension
RISC-V: KVM: Use Svinval for local TLB maintenance when available
RISC-V: Probe Svinval extension form ISA string
RISC-V: KVM: Change the SBI specification version to v1.0
riscv: KVM: Apply insn-def to hlv encodings
riscv: KVM: Apply insn-def to hfence encodings
riscv: Introduce support for defining instructions
riscv: Add X register names to gpr-nums
KVM: arm64: Advertise new kvmarm mailing list
kvm: vmx: keep constant definition format consistent
kvm: mmu: fix typos in struct kvm_arch
KVM: selftests: Fix nx_huge_pages_test on TDP-disabled hosts
...
When CONFIG_CMDLINE_FORCE is enabled, cmdline provided by
CONFIG_CMDLINE are always used. This allows CONFIG_CMDLINE to be
used regardless of the result of device tree scanning.
This especially fixes the case where a device tree without the
chosen node is supplied to the kernel. In such cases,
early_init_dt_scan would return true. But inside
early_init_dt_scan_chosen, the cmdline won't be updated as there
is no chosen node in the device tree. As a result, CONFIG_CMDLINE
is not copied into boot_command_line even if CONFIG_CMDLINE_FORCE
is enabled. This commit allows properly update boot_command_line
in this situation.
Fixes: 8fd6e05c74 ("arch: riscv: support kernel command line forcing when no DTB passed")
Signed-off-by: Wenting Zhang <zephray@outlook.com>
Reviewed-by: Björn Töpel <bjorn@kernel.org>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/PSBPR04MB399135DFC54928AB958D0638B1829@PSBPR04MB3991.apcprd04.prod.outlook.com
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
linux-next for a couple of months without, to my knowledge, any negative
reports (or any positive ones, come to that).
- Also the Maple Tree from Liam R. Howlett. An overlapping range-based
tree for vmas. It it apparently slight more efficient in its own right,
but is mainly targeted at enabling work to reduce mmap_lock contention.
Liam has identified a number of other tree users in the kernel which
could be beneficially onverted to mapletrees.
Yu Zhao has identified a hard-to-hit but "easy to fix" lockdep splat
(https://lkml.kernel.org/r/CAOUHufZabH85CeUN-MEMgL8gJGzJEWUrkiM58JkTbBhh-jew0Q@mail.gmail.com).
This has yet to be addressed due to Liam's unfortunately timed
vacation. He is now back and we'll get this fixed up.
- Dmitry Vyukov introduces KMSAN: the Kernel Memory Sanitizer. It uses
clang-generated instrumentation to detect used-unintialized bugs down to
the single bit level.
KMSAN keeps finding bugs. New ones, as well as the legacy ones.
- Yang Shi adds a userspace mechanism (madvise) to induce a collapse of
memory into THPs.
- Zach O'Keefe has expanded Yang Shi's madvise(MADV_COLLAPSE) to support
file/shmem-backed pages.
- userfaultfd updates from Axel Rasmussen
- zsmalloc cleanups from Alexey Romanov
- cleanups from Miaohe Lin: vmscan, hugetlb_cgroup, hugetlb and memory-failure
- Huang Ying adds enhancements to NUMA balancing memory tiering mode's
page promotion, with a new way of detecting hot pages.
- memcg updates from Shakeel Butt: charging optimizations and reduced
memory consumption.
- memcg cleanups from Kairui Song.
- memcg fixes and cleanups from Johannes Weiner.
- Vishal Moola provides more folio conversions
- Zhang Yi removed ll_rw_block() :(
- migration enhancements from Peter Xu
- migration error-path bugfixes from Huang Ying
- Aneesh Kumar added ability for a device driver to alter the memory
tiering promotion paths. For optimizations by PMEM drivers, DRM
drivers, etc.
- vma merging improvements from Jakub Matěn.
- NUMA hinting cleanups from David Hildenbrand.
- xu xin added aditional userspace visibility into KSM merging activity.
- THP & KSM code consolidation from Qi Zheng.
- more folio work from Matthew Wilcox.
- KASAN updates from Andrey Konovalov.
- DAMON cleanups from Kaixu Xia.
- DAMON work from SeongJae Park: fixes, cleanups.
- hugetlb sysfs cleanups from Muchun Song.
- Mike Kravetz fixes locking issues in hugetlbfs and in hugetlb core.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY0HaPgAKCRDdBJ7gKXxA
joPjAQDZ5LlRCMWZ1oxLP2NOTp6nm63q9PWcGnmY50FjD/dNlwEAnx7OejCLWGWf
bbTuk6U2+TKgJa4X7+pbbejeoqnt5QU=
=xfWx
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2022-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- Yu Zhao's Multi-Gen LRU patches are here. They've been under test in
linux-next for a couple of months without, to my knowledge, any
negative reports (or any positive ones, come to that).
- Also the Maple Tree from Liam Howlett. An overlapping range-based
tree for vmas. It it apparently slightly more efficient in its own
right, but is mainly targeted at enabling work to reduce mmap_lock
contention.
Liam has identified a number of other tree users in the kernel which
could be beneficially onverted to mapletrees.
Yu Zhao has identified a hard-to-hit but "easy to fix" lockdep splat
at [1]. This has yet to be addressed due to Liam's unfortunately
timed vacation. He is now back and we'll get this fixed up.
- Dmitry Vyukov introduces KMSAN: the Kernel Memory Sanitizer. It uses
clang-generated instrumentation to detect used-unintialized bugs down
to the single bit level.
KMSAN keeps finding bugs. New ones, as well as the legacy ones.
- Yang Shi adds a userspace mechanism (madvise) to induce a collapse of
memory into THPs.
- Zach O'Keefe has expanded Yang Shi's madvise(MADV_COLLAPSE) to
support file/shmem-backed pages.
- userfaultfd updates from Axel Rasmussen
- zsmalloc cleanups from Alexey Romanov
- cleanups from Miaohe Lin: vmscan, hugetlb_cgroup, hugetlb and
memory-failure
- Huang Ying adds enhancements to NUMA balancing memory tiering mode's
page promotion, with a new way of detecting hot pages.
- memcg updates from Shakeel Butt: charging optimizations and reduced
memory consumption.
- memcg cleanups from Kairui Song.
- memcg fixes and cleanups from Johannes Weiner.
- Vishal Moola provides more folio conversions
- Zhang Yi removed ll_rw_block() :(
- migration enhancements from Peter Xu
- migration error-path bugfixes from Huang Ying
- Aneesh Kumar added ability for a device driver to alter the memory
tiering promotion paths. For optimizations by PMEM drivers, DRM
drivers, etc.
- vma merging improvements from Jakub Matěn.
- NUMA hinting cleanups from David Hildenbrand.
- xu xin added aditional userspace visibility into KSM merging
activity.
- THP & KSM code consolidation from Qi Zheng.
- more folio work from Matthew Wilcox.
- KASAN updates from Andrey Konovalov.
- DAMON cleanups from Kaixu Xia.
- DAMON work from SeongJae Park: fixes, cleanups.
- hugetlb sysfs cleanups from Muchun Song.
- Mike Kravetz fixes locking issues in hugetlbfs and in hugetlb core.
Link: https://lkml.kernel.org/r/CAOUHufZabH85CeUN-MEMgL8gJGzJEWUrkiM58JkTbBhh-jew0Q@mail.gmail.com [1]
* tag 'mm-stable-2022-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (555 commits)
hugetlb: allocate vma lock for all sharable vmas
hugetlb: take hugetlb vma_lock when clearing vma_lock->vma pointer
hugetlb: fix vma lock handling during split vma and range unmapping
mglru: mm/vmscan.c: fix imprecise comments
mm/mglru: don't sync disk for each aging cycle
mm: memcontrol: drop dead CONFIG_MEMCG_SWAP config symbol
mm: memcontrol: use do_memsw_account() in a few more places
mm: memcontrol: deprecate swapaccounting=0 mode
mm: memcontrol: don't allocate cgroup swap arrays when memcg is disabled
mm/secretmem: remove reduntant return value
mm/hugetlb: add available_huge_pages() func
mm: remove unused inline functions from include/linux/mm_inline.h
selftests/vm: add selftest for MADV_COLLAPSE of uffd-minor memory
selftests/vm: add file/shmem MADV_COLLAPSE selftest for cleared pmd
selftests/vm: add thp collapse shmem testing
selftests/vm: add thp collapse file and tmpfs testing
selftests/vm: modularize thp collapse memory operations
selftests/vm: dedup THP helpers
mm/khugepaged: add tracepoint to hpage_collapse_scan_file()
mm/madvise: add file and shmem support to MADV_COLLAPSE
...
- Remove potentially incomplete targets when Kbuid is interrupted by
SIGINT etc. in case GNU Make may miss to do that when stderr is piped
to another program.
- Rewrite the single target build so it works more correctly.
- Fix rpm-pkg builds with V=1.
- List top-level subdirectories in ./Kbuild.
- Ignore auto-generated __kstrtab_* and __kstrtabns_* symbols in kallsyms.
- Avoid two different modules in lib/zstd/ having shared code, which
potentially causes building the common code as build-in and modular
back-and-forth.
- Unify two modpost invocations to optimize the build process.
- Remove head-y syntax in favor of linker scripts for placing particular
sections in the head of vmlinux.
- Bump the minimal GNU Make version to 3.82.
- Clean up misc Makefiles and scripts.
-----BEGIN PGP SIGNATURE-----
iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmM+4vcVHG1hc2FoaXJv
eUBrZXJuZWwub3JnAAoJED2LAQed4NsGY2IQAInr0JUNnkkxwUSXtOcQuA3IK8RJ
FbU9HXJRoV9H+7+l3SMlN7mIbrs5eE5fTY3iwQ3CVe139d1+1q7nvTMRv8owywJx
GBgzswncuu1lk7iQQ//CxiqMwSCG8GJdYn1uDVy4I5jg3o+DtFZJtyq2Wb7pqsMm
ZhZ4PozRN+idYQJSF6Vx/zEVLHI7quMBwfe4CME8/0Kg2+hnYzbXV/aUf0ED2emq
zdCMDQgIOK5AhY+8qgMXKYnBUJMTqBp6LoR4p3ApfUkwRFY0sGa0/LK3U/B22OE7
uWyR4fCUExGyerlcHEVev+9eBfmsLLPyqlchNwpSDOPf5OSdnKmgqJEBR/Cvx0eh
URerPk7EHxyH3G8yi+cU2GtofNTGc5RHPRgJE2ADsQEi5TAUKGmbXMlsFRL/51Vn
lTANZObBNa1d4enljF6TfTL5nuccOa+DKvXnH9fQ49t0QdtSikv6J/lGwilwm1Sr
BctmCsySPuURZfkpI9OQnLuouloMXl9f7Q/+S39haS/tSgvPpyITyO71nxDnXn/s
BbFObZJUk9QkqOACjBP1hNErTLt83uBxQ9z+rDCw/SbLIe4nw0wyneuygfHI5rI8
3RZB2DbGauuJHX2Zs6YGS14SLSY33IsLqKR1/Vy3LrPvOHuEvNiOR8LITq5E0YCK
OffK2Y5cIlXR0QWf
=DHiN
-----END PGP SIGNATURE-----
Merge tag 'kbuild-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:
- Remove potentially incomplete targets when Kbuid is interrupted by
SIGINT etc in case GNU Make may miss to do that when stderr is piped
to another program.
- Rewrite the single target build so it works more correctly.
- Fix rpm-pkg builds with V=1.
- List top-level subdirectories in ./Kbuild.
- Ignore auto-generated __kstrtab_* and __kstrtabns_* symbols in
kallsyms.
- Avoid two different modules in lib/zstd/ having shared code, which
potentially causes building the common code as build-in and modular
back-and-forth.
- Unify two modpost invocations to optimize the build process.
- Remove head-y syntax in favor of linker scripts for placing
particular sections in the head of vmlinux.
- Bump the minimal GNU Make version to 3.82.
- Clean up misc Makefiles and scripts.
* tag 'kbuild-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (41 commits)
docs: bump minimal GNU Make version to 3.82
ia64: simplify esi object addition in Makefile
Revert "kbuild: Check if linker supports the -X option"
kbuild: rebuild .vmlinux.export.o when its prerequisite is updated
kbuild: move modules.builtin(.modinfo) rules to Makefile.vmlinux_o
zstd: Fixing mixed module-builtin objects
kallsyms: ignore __kstrtab_* and __kstrtabns_* symbols
kallsyms: take the input file instead of reading stdin
kallsyms: drop duplicated ignore patterns from kallsyms.c
kbuild: reuse mksysmap output for kallsyms
mksysmap: update comment about __crc_*
kbuild: remove head-y syntax
kbuild: use obj-y instead extra-y for objects placed at the head
kbuild: hide error checker logs for V=1 builds
kbuild: re-run modpost when it is updated
kbuild: unify two modpost invocations
kbuild: move vmlinux.o rule to the top Makefile
kbuild: move .vmlinux.objs rule to Makefile.modpost
kbuild: list sub-directories in ./Kbuild
Makefile.compiler: replace cc-ifversion with compiler-specific macros
...
* Improvements to the CPU topology subsystem, which fix some issues
where RISC-V would report bad topology information.
* The default NR_CPUS has increased to XLEN, and the maximum
configurable value is 512.
* The CD-ROM filesystems have been enabled in the defconfig.
* Support for THP_SWAP has been added for rv64 systems.
There are also a handful of cleanups and fixes throughout the tree.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmNAWgwTHHBhbG1lckBk
YWJiZWx0LmNvbQAKCRAuExnzX7sYicSiEACmuB9WuGZmAasKvmPgz7thyLqakg7/
cE4YK0MxgJxkhsXzYSAv1Fn+WUfX7DSzhK4OOM5wEngAYul7QoFdc84MF0DYKO+E
InjdOvVavzUsWYqETNCuMHPRK6xyzvfHCqqBDDxKHx5jUoicCQfFwJyHLw+cvouR
7WSJoFdvOEV01QyN5Qw9bQp7ASx61ZZX1yE6OAPc2/EJlDEA2QSnjBAi4M+n2ZCx
ZsQz+Dp9RfSU8/nIr13oGiL3Zm+kyXwdOS/8PaDqtrkyiGh6+vSeGqZZwRLVITP/
oUxqGEgnn2eFBD1y8vjsQNWMLWoi9Av4746Fxr8CEHX+jX1cp9CCkU2OkkLxaFcv
6XFtXPJIh/UjzVgPmjZxK+ArEX28QOM5IVyBFxsSl0dNtvyVqKpBXCV1RQ+fFHkO
ntHF3ZxibqOn8ZJmziCn0nzWSOqugNTdAhD4dJAbl58RB/IQtQT0OnHpmpXCG3xh
+/JBzy//xkr7u2HMqU69PzwPtWwZrENUV6jl5SHUDUoW8pySng2Pl4pbmTFqgWty
JTfc5EdyWOWyshhoSCtK2//bnVFryl2ntwGr3LIZrZxkiUiOeYjn+C/YedXZIRob
yy2CN+QanW/FXdIa4GMNeGc9sGDApd3/RtP+8L9mV1kWK6OE0EVskkI1UMCGXrIP
5JoE1jLMVhjcKQ==
=LJg6
-----END PGP SIGNATURE-----
Merge tag 'riscv-for-linus-6.1-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V updates from Palmer Dabbelt:
- Improvements to the CPU topology subsystem, which fix some issues
where RISC-V would report bad topology information.
- The default NR_CPUS has increased to XLEN, and the maximum
configurable value is 512.
- The CD-ROM filesystems have been enabled in the defconfig.
- Support for THP_SWAP has been added for rv64 systems.
There are also a handful of cleanups and fixes throughout the tree.
* tag 'riscv-for-linus-6.1-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: enable THP_SWAP for RV64
RISC-V: Print SSTC in canonical order
riscv: compat: s/failed/unsupported if compat mode isn't supported
RISC-V: Increase range and default value of NR_CPUS
cpuidle: riscv-sbi: Fix CPU_PM_CPU_IDLE_ENTER_xyz() macro usage
perf: RISC-V: throttle perf events
perf: RISC-V: exclude invalid pmu counters from SBI calls
riscv: enable CD-ROM file systems in defconfig
riscv: topology: fix default topology reporting
arm64: topology: move store_cpu_topology() to shared code
- implement EFI boot support for LoongArch
- implement generic EFI compressed boot support for arm64, RISC-V and
LoongArch, none of which implement a decompressor today
- measure the kernel command line into the TPM if measured boot is in
effect
- refactor the EFI stub code in order to isolate DT dependencies for
architectures other than x86
- avoid calling SetVirtualAddressMap() on arm64 if the configured size
of the VA space guarantees that doing so is unnecessary
- move some ARM specific code out of the generic EFI source files
- unmap kernel code from the x86 mixed mode 1:1 page tables
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE+9lifEBpyUIVN1cpw08iOZLZjyQFAmM5mfEACgkQw08iOZLZ
jySnJwv9G2nBheSlK9bbWKvCpnDvVIExtlL+mg1wB64oxPrGiWRgjxeyA9+92bT0
Y6jYfKbGOGKnxkEJQl19ik6C3JfEwtGm4SnOVp4+osFeDRB7lFemfcIYN5dqz111
wkZA/Y15rnz3tZeGaXnq2jMoFuccQDXPJtOlqbdVqFQ5Py6YT92uMyuI079pN0T+
GSu7VVOX+SBsv4nGaUKIpSVwAP0gXkS/7s7CTf47QiR2+j8WMTlQEYZVjOKZjMJZ
/7hXY2/mduxnuVuT7cfx0mpZKEryUREJoBL5nDzjTnlhLb5X8cHKiaE1lx0aJ//G
JYTR8lDklJZl/7RUw/IW/YodcKcofr3F36NMzWB5vzM+KHOOpv4qEZhoGnaXv94u
auqhzYA83heaRjz7OISlk6kgFxdlIRE1VdrkEBXSlQeCQUv1woS+ZNVGYcKqgR0B
48b31Ogm2A0pAuba89+U9lz/n33lhIDtYvJqLO6AAPLGiVacD9ZdapN5kMftVg/1
SfhFqNzy
=d8Ps
-----END PGP SIGNATURE-----
Merge tag 'efi-next-for-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
Pull EFI updates from Ard Biesheuvel:
"A bit more going on than usual in the EFI subsystem. The main driver
for this has been the introduction of the LoonArch architecture last
cycle, which inspired some cleanup and refactoring of the EFI code.
Another driver for EFI changes this cycle and in the future is
confidential compute.
The LoongArch architecture does not use either struct bootparams or DT
natively [yet], and so passing information between the EFI stub and
the core kernel using either of those is undesirable. And in general,
overloading DT has been a source of issues on arm64, so using DT for
this on new architectures is a to avoid for the time being (even if we
might converge on something DT based for non-x86 architectures in the
future). For this reason, in addition to the patch that enables EFI
boot for LoongArch, there are a number of refactoring patches applied
on top of which separate the DT bits from the generic EFI stub bits.
These changes are on a separate topich branch that has been shared
with the LoongArch maintainers, who will include it in their pull
request as well. This is not ideal, but the best way to manage the
conflicts without stalling LoongArch for another cycle.
Another development inspired by LoongArch is the newly added support
for EFI based decompressors. Instead of adding yet another
arch-specific incarnation of this pattern for LoongArch, we are
introducing an EFI app based on the existing EFI libstub
infrastructure that encapulates the decompression code we use on other
architectures, but in a way that is fully generic. This has been
developed and tested in collaboration with distro and systemd folks,
who are eager to start using this for systemd-boot and also for arm64
secure boot on Fedora. Note that the EFI zimage files this introduces
can also be decompressed by non-EFI bootloaders if needed, as the
image header describes the location of the payload inside the image,
and the type of compression that was used. (Note that Fedora's arm64
GRUB is buggy [0] so you'll need a recent version or switch to
systemd-boot in order to use this.)
Finally, we are adding TPM measurement of the kernel command line
provided by EFI. There is an oversight in the TCG spec which results
in a blind spot for command line arguments passed to loaded images,
which means that either the loader or the stub needs to take the
measurement. Given the combinatorial explosion I am anticipating when
it comes to firmware/bootloader stacks and firmware based attestation
protocols (SEV-SNP, TDX, DICE, DRTM), it is good to set a baseline now
when it comes to EFI measured boot, which is that the kernel measures
the initrd and command line. Intermediate loaders can measure
additional assets if needed, but with the baseline in place, we can
deploy measured boot in a meaningful way even if you boot into Linux
straight from the EFI firmware.
Summary:
- implement EFI boot support for LoongArch
- implement generic EFI compressed boot support for arm64, RISC-V and
LoongArch, none of which implement a decompressor today
- measure the kernel command line into the TPM if measured boot is in
effect
- refactor the EFI stub code in order to isolate DT dependencies for
architectures other than x86
- avoid calling SetVirtualAddressMap() on arm64 if the configured
size of the VA space guarantees that doing so is unnecessary
- move some ARM specific code out of the generic EFI source files
- unmap kernel code from the x86 mixed mode 1:1 page tables"
* tag 'efi-next-for-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: (24 commits)
efi/arm64: libstub: avoid SetVirtualAddressMap() when possible
efi: zboot: create MemoryMapped() device path for the parent if needed
efi: libstub: fix up the last remaining open coded boot service call
efi/arm: libstub: move ARM specific code out of generic routines
efi/libstub: measure EFI LoadOptions
efi/libstub: refactor the initrd measuring functions
efi/loongarch: libstub: remove dependency on flattened DT
efi: libstub: install boot-time memory map as config table
efi: libstub: remove DT dependency from generic stub
efi: libstub: unify initrd loading between architectures
efi: libstub: remove pointless goto kludge
efi: libstub: simplify efi_get_memory_map() and struct efi_boot_memmap
efi: libstub: avoid efi_get_memory_map() for allocating the virt map
efi: libstub: drop pointless get_memory_map() call
efi: libstub: fix type confusion for load_options_size
arm64: efi: enable generic EFI compressed boot
loongarch: efi: enable generic EFI compressed boot
riscv: efi: enable generic EFI compressed boot
efi/libstub: implement generic EFI zboot
efi/libstub: move efi_system_table global var into separate object
...
This got out of order during a merge conflict, fix it by putting the
entries in the correct order.
Fixes: 7ab52f75a9 ("RISC-V: Add Sstc extension support")
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20220920204518.10988-1-palmer@rivosinc.com/
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
When compat mode isn't supported(I believe this is the most case now),
kernel will emit somthing as:
[ 0.050407] riscv: ELF compat mode failed
This msg may make users think there's something wrong with the kernel
itself, replace "failed" with "unsupported" to make it clear. In fact
this is the real compat_mode_supported meaning. After the patch, the
msg would be:
[ 0.050407] riscv: ELF compat mode unsupported
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Acked-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20220821141819.3804-1-jszhang@kernel.org/
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Identifying the underlying RISC-V implementation can be important
for some of the user space applications. For example, the perf tool
uses arch specific CPU implementation id (i.e. CPUID) to select a
JSON file describing custom perf events on a CPU.
Currently, there is no way to identify RISC-V implementation so we
add mvendorid, marchid, and mimpid to /proc/cpuinfo output.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Heinrich Schuchardt <heinrich.schuchardt@canonical.com>
Tested-by: Nikita Shubin <n.shubin@yadro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20220727043829.151794-1-apatel@ventanamicro.com/
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
The objects placed at the head of vmlinux need special treatments:
- arch/$(SRCARCH)/Makefile adds them to head-y in order to place
them before other archives in the linker command line.
- arch/$(SRCARCH)/kernel/Makefile adds them to extra-y instead of
obj-y to avoid them going into built-in.a.
This commit gets rid of the latter.
Create vmlinux.a to collect all the objects that are unconditionally
linked to vmlinux. The objects listed in head-y are moved to the head
of vmlinux.a by using 'ar m'.
With this, arch/$(SRCARCH)/kernel/Makefile can consistently use obj-y
for builtin objects.
There is no *.o that is directly linked to vmlinux. Drop unneeded code
in scripts/clang-tools/gen_compile_commands.py.
$(AR) mPi needs 'T' to workaround the llvm-ar bug. The fix was suggested
by Nathan Chancellor [1].
[1]: https://lore.kernel.org/llvm/YyjjT5gQ2hGMH0ni@dev-arch.thelio-3990X/
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
Just like other ISA extensions, we allow callers/users to detect the
presence of Svinval extension from ISA string.
Signed-off-by: Mayuresh Chitale <mchitale@ventanamicro.com>
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
Remove the linked list use in favour of the vma iterator.
Link: https://lkml.kernel.org/r/20220906194824.2110408-67-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Tested-by: Yu Zhao <yuzhao@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Commit 2139619bca ("riscv: mmap with PROT_WRITE but no PROT_READ is
invalid") made mmap() return EINVAL if PROT_WRITE was set wihtout
PROT_READ with the justification that a write-only PTE is considered a
reserved PTE permission bit pattern in the privileged spec. This check
is unnecessary since we let VM_WRITE imply VM_READ on RISC-V, and it is
inconsistent with other architectures that don't support write-only PTEs,
creating a potential software portability issue. Just remove the check
altogether and let PROT_WRITE imply PROT_READ as is the case on other
architectures.
Note that this also allows PROT_WRITE|PROT_EXEC mappings which were
disallowed prior to the aforementioned commit; PROT_READ is implied in
such mappings as well.
Fixes: 2139619bca ("riscv: mmap with PROT_WRITE but no PROT_READ is invalid")
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Andrew Bresticker <abrestic@rivosinc.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20220915193702.2201018-3-abrestic@rivosinc.com/
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
The stub is used in different execution environments, but on arm64,
RISC-V and LoongArch, we still use the core kernel's implementation of
memcpy and memset, as they are just a branch instruction away, and can
generally be reused even from code such as the EFI stub that runs in a
completely different address space.
KAsan complicates this slightly, resulting in the need for some hacks to
expose the uninstrumented, __ prefixed versions as the normal ones, as
the latter are instrumented to include the KAsan checks, which only work
in the core kernel.
Unfortunately, #define'ing memcpy to __memcpy when building C code does
not guarantee that no explicit memcpy() calls will be emitted. And with
the upcoming zboot support, which consists of a separate binary which
therefore needs its own implementation of memcpy/memset anyway, it's
better to provide one explicitly instead of linking to the existing one.
Given that EFI exposes implementations of memmove() and memset() via the
boot services table, let's wire those up in the appropriate way, and
drop the references to the core kernel ones.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
riscv has an equivalent of arm bug fixed by 653d48b221 ("arm: fix
really nasty sigreturn bug"); if signal gets caught by an interrupt that
hits when we have the right value in a0 (-513), *and* another signal
gets delivered upon sigreturn() (e.g. included into the blocked mask for
the first signal and posted while the handler had been running), the
syscall restart logics will see regs->cause equal to EXC_SYSCALL (we are
in a syscall, after all) and a0 already restored to its original value
(-513, which happens to be -ERESTARTNOINTR) and assume that we need to
apply the usual syscall restart logics.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Fixes: e2c0cdfba7 ("RISC-V: User-facing API")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/YxJEiSq%2FCGaL6Gm9@ZenIV/
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
This fixes two issues: I truncated the warning's hart ID when porting to
the 64-bit hart ID code, and the original code's warning handling could
fire on an uninitialized hart ID.
The biggest change here is that riscv_cbom_block_size is no longer
initialized, as IMO the default isn't sane: there's nothing in the ISA
that mandates any specific cache block size, so falling back to one will
just silently produce the wrong answer on some systems. This also
changes the probing order so the cache block size is known before
enabling Zicbom support.
CC: stable@vger.kernel.org
CC: Andrew Jones <ajones@ventanamicro.com>
CC: Heiko Stuebner <heiko@sntech.de>
CC: Atish Patra <atishp@rivosinc.com>
Fixes: 3aefb2ee5b ("riscv: implement Zicbom-based CMO instructions + the t-head variant")
Fixes: 1631ba1259 ("riscv: Add support for non-coherent devices using zicbom extension")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
[Conor: fixed the redefinition errors]
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20220912224800.998121-1-mail@conchuod.ie
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
This contains a pair of fixes for build-time warnings.
* 'riscv-variable_fixes_without_kvm' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/linux.git:
riscv: traps: add missing prototype
riscv: signal: fix missing prototype warning
Sparse complains:
arch/riscv/kernel/traps.c:213:6: warning: symbol 'shadow_stack' was not declared. Should it be static?
The variable is used in entry.S, so declare shadow_stack there
alongside SHADOW_OVERFLOW_STACK_SIZE.
Fixes: 31da94c25a ("riscv: add VMAP_STACK overflow detection")
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20220814141237.493457-5-mail@conchuod.ie
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Fix the warning:
arch/riscv/kernel/signal.c:316:27: warning: no previous prototype for function 'do_notify_resume' [-Wmissing-prototypes]
asmlinkage __visible void do_notify_resume(struct pt_regs *regs,
All other functions in the file are static & none of the existing
headers stood out as an obvious location. Create signal.h to hold the
declaration.
Fixes: e2c0cdfba7 ("RISC-V: User-facing API")
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20220814141237.493457-4-mail@conchuod.ie
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
riscv_isa_ext_keys[] is an array of static keys used in the unified
ISA extension framework. The keys added to this array may be used
anywhere, including in modules. Ensure the keys remain writable by
placing them in the data section.
The need to change riscv_isa_ext_keys[]'s section was found when the
kvm module started failing to load. Commit 8eb060e101 ("arch/riscv:
add Zihintpause support") adds a static branch check for a newly
added isa-ext key to cpu_relax(), which kvm uses.
Fixes: c360cbec35 ("riscv: introduce unified static key mechanism for ISA extensions")
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Cc: stable@vger.kernel.org
Reported-by: Ron Economos <re@w6rz.net>
Reported-by: Anup Patel <apatel@ventanamicro.com>
Reported-by: Conor Dooley <conor.dooley@microchip.com>
Tested-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20220816163058.3004536-1-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
RISC-V has no sane defaults to fall back on where there is no cpu-map
in the devicetree.
Without sane defaults, the package, core and thread IDs are all set to
-1. This causes user-visible inaccuracies for tools like hwloc/lstopo
which rely on the sysfs cpu topology files to detect a system's
topology.
On a PolarFire SoC, which should have 4 harts with a thread each,
lstopo currently reports:
Machine (793MB total)
Package L#0
NUMANode L#0 (P#0 793MB)
Core L#0
L1d L#0 (32KB) + L1i L#0 (32KB) + PU L#0 (P#0)
L1d L#1 (32KB) + L1i L#1 (32KB) + PU L#1 (P#1)
L1d L#2 (32KB) + L1i L#2 (32KB) + PU L#2 (P#2)
L1d L#3 (32KB) + L1i L#3 (32KB) + PU L#3 (P#3)
Adding calls to store_cpu_topology() in {boot,smp} hart bringup code
results in the correct topolgy being reported:
Machine (793MB total)
Package L#0
NUMANode L#0 (P#0 793MB)
L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0 + PU L#0 (P#0)
L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1 + PU L#1 (P#1)
L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2 + PU L#2 (P#2)
L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3 + PU L#3 (P#3)
CC: stable@vger.kernel.org # 456797da79: arm64: topology: move store_cpu_topology() to shared code
Fixes: 03f11f03db ("RISC-V: Parse cpu topology during boot.")
Reported-by: Brice Goglin <Brice.Goglin@inria.fr>
Link: https://github.com/open-mpi/hwloc/issues/536
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
There's still a handful of new features in here, but there are a lot of
fixes/cleanups as well:
* Support for the Zicbom for explicit cache-block management, along with
the necessary bits to make the non-standard cache management ops on
the Allwinner D1 function.
* Support for the Zihintpause extension, which codifies a go-slow
instruction used for cpu_relax().
* Support for the Sstc extension for supervisor-mode timer/counter
management.
* Many device tree fixes and cleanups, including a large set for the
Canaan device trees.
* A handful of fixes and cleanups for the PMU driver.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmL21egTHHBhbG1lckBk
YWJiZWx0LmNvbQAKCRAuExnzX7sYiSsiD/9cCN/7ndt4v7N65PUya+mVYC9VPppB
d/UC74M0mMUHQbtdtHzlCZVHW0pxc6Pc8oDTWLviKxNSHa6LQkLQJ/RZZz4YlH91
V/vh6DCZv9TRfHJS2E6jMUKEAVAiGg+723gE5EqLc5uapIBrvmiluQwBIQcu8dj1
egfdxJH3IVrEZWwROrYtffDgw4sipENuch5v4yhk4vH0bMlatcIM+hMpPZgOfbgX
xip4K4B/HTAJRn5vunrlCQzYdg+g9l5iEy73A/A9HfzOFCMTMJFp1zHvIrzLUjKC
79MZza3GJLpwMG4C1j8u+qOL01wVrQcA5gNp+14UuUedl/jHaceZwBkCL4cmFyGP
LE94Ed7+6cIJJH/NTcNfOOSD9byOePjfan+qJIlRBxZGbHKt+Ip6Lu2FGeftpXah
MlhhN5S1nGTuFpn7XGRsYrB/VLBD/KWsLxvWBZZWsSYwHwnFA9ZTUbcuQfxTJ+Qq
mH9wZIZ/z8MaEjKcdooIPHjKl+CFjDpVYWge83/t12LLYC9ryTM4vIlltZ84bs6i
2CMSNjBRSuPa7FQPHW+a6CWAjz2Lv1u9jCSH0iI62ytZR5/zinI39tv6LvwbypbY
VfWBnrtLNLlZmNbk13ODV64ayhTpZoZXWGL2TvkJklBqEPqda+9/nopiqb8a8jFZ
yEmKFdEqrb1LAg==
=caji
-----END PGP SIGNATURE-----
Merge tag 'riscv-for-linus-5.20-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull more RISC-V updates from Palmer Dabbelt:
"There's still a handful of new features in here, but there are a lot
of fixes/cleanups as well:
- Support for the Zicbom extension for explicit cache-block
management, along with the necessary bits to make the non-standard
cache management ops on the Allwinner D1 function
- Support for the Zihintpause extension, which codifies a go-slow
instruction used for cpu_relax()
- Support for the Sstc extension for supervisor-mode timer/counter
management
- Many device tree fixes and cleanups, including a large set for the
Canaan device trees
- A handful of fixes and cleanups for the PMU driver"
* tag 'riscv-for-linus-5.20-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (43 commits)
dt-bindings: gpio: sifive: add gpio-line-names
wireguard: selftests: set CONFIG_NONPORTABLE on riscv32
RISC-V: KVM: Support sstc extension
RISC-V: Improve SBI definitions
RISC-V: Move counter info definition to sbi header file
RISC-V: Fix SBI PMU calls for RV32
RISC-V: Update user page mapping only once during start
RISC-V: Fix counter restart during overflow for RV32
RISC-V: Prefer sstc extension if available
RISC-V: Enable sstc extension parsing from DT
RISC-V: Add SSTC extension CSR details
riscv:uprobe fix SR_SPIE set/clear handling
dt-bindings: riscv: fix SiFive l2-cache's cache-sets
riscv: ensure cpu_ops_sbi is declared
RISC-V: cpu_ops_spinwait.c should include head.h
RISC-V: Declare cpu_ops_spinwait in <asm/cpu_ops.h>
riscv: dts: starfive: correct number of external interrupts
riscv: dts: sifive unmatched: Add PWM controlled LEDs
riscv/purgatory: Omit use of bin2c
riscv/purgatory: hard-code obj-y in Makefile
...
This series implements Sstc extension support which was ratified
recently. Before the Sstc extension, an SBI call is necessary to
generate timer interrupts as only M-mode have access to the timecompare
registers. Thus, there is significant latency to generate timer
interrupts at kernel. For virtualized enviornments, its even worse as
the KVM handles the SBI call and uses a software timer to emulate the
timecomapre register.
Sstc extension solves both these problems by defining a
stimecmp/vstimecmp at supervisor (host/guest) level. It allows kernel to
program a timer and recieve interrupt without supervisor execution
enviornment (M-mode/HS mode) intervention.
* palmer/riscv-sstc:
RISC-V: Prefer sstc extension if available
RISC-V: Enable sstc extension parsing from DT
RISC-V: Add SSTC extension CSR details
The ISA extension framework now allows parsing any multi-letter
ISA extension.
Enable that for sstc extension.
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20220722165047.519994-3-atishp@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
In riscv the process of uprobe going to clear spie before exec
the origin insn,and set spie after that.But When access the page
which origin insn has been placed a page fault may happen and
irq was disabled in arch_uprobe_pre_xol function,It cause a WARN
as follows.
There is no need to clear/set spie in arch_uprobe_pre/post/abort_xol.
We can just remove it.
[ 31.684157] BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1488
[ 31.684677] in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 76, name: work
[ 31.684929] preempt_count: 0, expected: 0
[ 31.685969] CPU: 2 PID: 76 Comm: work Tainted: G
[ 31.686542] Hardware name: riscv-virtio,qemu (DT)
[ 31.686797] Call Trace:
[ 31.687053] [<ffffffff80006442>] dump_backtrace+0x30/0x38
[ 31.687699] [<ffffffff80812118>] show_stack+0x40/0x4c
[ 31.688141] [<ffffffff8081817a>] dump_stack_lvl+0x44/0x5c
[ 31.688396] [<ffffffff808181aa>] dump_stack+0x18/0x20
[ 31.688653] [<ffffffff8003e454>] __might_resched+0x114/0x122
[ 31.688948] [<ffffffff8003e4b2>] __might_sleep+0x50/0x7a
[ 31.689435] [<ffffffff80822676>] down_read+0x30/0x130
[ 31.689728] [<ffffffff8000b650>] do_page_fault+0x166/x446
[ 31.689997] [<ffffffff80003c0c>] ret_from_exception+0x0/0xc
Fixes: 74784081aa ("riscv: Add uprobes supported")
Signed-off-by: Yipeng Zou <zouyipeng@huawei.com>
Reviewed-by: Guo Ren <guoren@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20220721065820.245755-1-zouyipeng@huawei.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Sparse complains that cpu_ops_sbi is used undeclared:
arch/riscv/kernel/cpu_ops_sbi.c:17:29: warning: symbol 'cpu_ops_sbi' was not declared. Should it be static?
Fix the warning by adding cpu_ops_sbi to cpu_ops_sbi.h & including that
where used.
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20220714080235.3853374-1-conor.dooley@microchip.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Running sparse shows cpu_ops_spinwait.c is missing two definitions
found in head.h, so include it to stop the following warnings:
arch/riscv/kernel/cpu_ops_spinwait.c:15:6: warning: symbol '__cpu_spinwait_stack_pointer' was not declared. Should it be static?
arch/riscv/kernel/cpu_ops_spinwait.c:16:6: warning: symbol '__cpu_spinwait_task_pointer' was not declared. Should it be static?
Signed-off-by: Ben Dooks <ben.dooks@sifive.com>
Link: https://lore.kernel.org/r/20220713215306.94675-1-ben.dooks@sifive.com
Fixes: c78f94f35c ("RISC-V: Use __cpu_up_stack/task_pointer only for spinwait method")
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
The cpu_ops_spinwait is used in a couple of places in arch/riscv
and is causing a sparse warning due to no declaration. Add this
to <asm/cpu_ops.h> with the others to fix the following:
arch/riscv/kernel/cpu_ops_spinwait.c:16:29: warning: symbol 'cpu_ops_spinwait' was not declared. Should it be static?
Signed-off-by: Ben Dooks <ben.dooks@sifive.com>
Link: https://lore.kernel.org/r/20220714071811.187491-1-ben.dooks@sifive.com
[Palmer: Drop the extern from cpu_ops.c]
Fixes: 2ffc48fc70 ("RISC-V: Move spinwait booting method to its own config")
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
A handful of fixes to our kexec/crash kernel support that allow crash
tool to function.
Link: https://lore.kernel.org/r/mhng-f5fdaa37-e99a-4214-a297-ec81f0fed0c1@palmer-mbp2014
* commit 'f9293ad46d8ba9909187a37b7215324420ad4596':
RISC-V: Add modules to virtual kernel memory layout dump
RISC-V: Fixup schedule out issue in machine_crash_shutdown()
RISC-V: Fixup get incorrect user mode PC for kernel mode regs
RISC-V: kexec: Fixup use of smp_processor_id() in preemptible context
This series is based on the alternatives changes done in my svpbmt
series and thus also depends on Atish's isa-extension parsing series.
It implements using the cache-management instructions from the Zicbom-
extension to handle cache flush, etc actions on platforms needing them.
SoCs using cpu cores from T-Head like the Allwinne D1 implement a
different set of cache instructions. But while they are different,
instructions they provide the same functionality, so a variant can easly
hook into the existing alternatives mechanism on those.
[Palmer: Some minor fixups, including a RISCV_ISA_ZICBOM dependency on
MMU that's probably not strictly necessary. The Zicbom support will
trip up sparse for users that have new toolchains, I just sent a patch.]
Link: https://lore.kernel.org/all/20220706231536.2041855-1-heiko@sntech.de/
Link: https://lore.kernel.org/linux-sparse/20220811033138.20676-1-palmer@rivosinc.com/T/#u
* palmer/riscv-zicbom:
riscv: implement cache-management errata for T-Head SoCs
riscv: Add support for non-coherent devices using zicbom extension
dt-bindings: riscv: document cbom-block-size
of: also handle dma-noncoherent in of_dma_is_coherent()
fatfs, autofs, squashfs, procfs, etc.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCYu9BeQAKCRDdBJ7gKXxA
jp1DAP4mjCSvAwYzXklrIt+Knv3CEY5oVVdS+pWOAOGiJpldTAD9E5/0NV+VmlD9
kwS/13j38guulSlXRzDLmitbg81zAAI=
=Zfum
-----END PGP SIGNATURE-----
Merge tag 'mm-nonmm-stable-2022-08-06-2' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc updates from Andrew Morton:
"Updates to various subsystems which I help look after. lib, ocfs2,
fatfs, autofs, squashfs, procfs, etc. A relatively small amount of
material this time"
* tag 'mm-nonmm-stable-2022-08-06-2' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (72 commits)
scripts/gdb: ensure the absolute path is generated on initial source
MAINTAINERS: kunit: add David Gow as a maintainer of KUnit
mailmap: add linux.dev alias for Brendan Higgins
mailmap: update Kirill's email
profile: setup_profiling_timer() is moslty not implemented
ocfs2: fix a typo in a comment
ocfs2: use the bitmap API to simplify code
ocfs2: remove some useless functions
lib/mpi: fix typo 'the the' in comment
proc: add some (hopefully) insightful comments
bdi: remove enum wb_congested_state
kernel/hung_task: fix address space of proc_dohung_task_timeout_secs
lib/lzo/lzo1x_compress.c: replace ternary operator with min() and min_t()
squashfs: support reading fragments in readahead call
squashfs: implement readahead
squashfs: always build "file direct" version of page actor
Revert "squashfs: provide backing_dev_info in order to disable read-ahead"
fs/ocfs2: Fix spelling typo in comment
ia64: old_rr4 added under CONFIG_HUGETLB_PAGE
proc: fix test for "vsyscall=xonly" boot option
...
* Enabling the FPU is now a static_key.
* Improvements to the Svpbmt support.
* CPU topology bindings for a handful of systems.
* Support for systems with 64-bit hart IDs.
* Many settings have been enabled in the defconfig, including both
support for the StarFive systems and many of the Docker requirements.
There are also a handful of cleanups and improvements, like usual.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEAM520YNJYN/OiG3470yhUCzLq0EFAmLtolMTHHBhbG1lckBk
YWJiZWx0LmNvbQAKCRDvTKFQLMurQcqLD/48RwM9IjNKanYgA0JWMSzrtQKLpv6R
+vD6/9RVse+8HL5pMacJdUKKMLUFkIxJIpzmTviGHMc5j4MnVql7CNHAPQQX0L60
+eV6ogVQI54CzC/lYgfiDolS7ZIVVSatM6YgBVf8045bHqgmEqx9H4W87fsM42XW
IDmRmW9bQu6a5I+gegsHvKwG4ex5SWeesrIOi0RaTwMR7lZuiy3shFTfpHsqalSp
N3oPyC0AvngsZcl/iqMB4AbEFgWct7bL6EipuDP3q4VvT4v9eMOkl+j6wIKNlfBJ
1igdmNNTLQQHCVSeTx93i1ZsJDh1yvAx3+sbv34WiMIVEDrusTFIxGzV4uX3kVO/
2pR31ItH7SFNjOMp6j/ouaGvtAqJVYSiya82jijUpMtxt0Pajua+cxN5xrX3RY2f
L13VRW5XIgBykmfJLhLz0rzPj/Sfk55RhcV8CIoOPhzugz39oF6z4XaTH2lvqykM
mZ1s0Ab4Ontf26MDnj73gUHzRadDzHbIptvdDGVPlxn8+Ki/Yed6tT7nQrAvxZ5b
93uHJAjGTL9DrorYCEhfp4SAnQPTtqXnmSvw84dAU5QrsLvDk/JIS+MNMPm9VCcS
GRUmI6iCE1D+KERT2mED7NOmRqbd4NCZ6T1UXO64MVv1PuqM9f5+bVohU31PJEU1
uAXelP/sl0+mYw==
=v9hn
-----END PGP SIGNATURE-----
Merge tag 'riscv-for-linus-5.20-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V updates from Palmer Dabbelt:
- Enabling the FPU is now a static_key
- Improvements to the Svpbmt support
- CPU topology bindings for a handful of systems
- Support for systems with 64-bit hart IDs
- Many settings have been enabled in the defconfig, including both
support for the StarFive systems and many of the Docker requirements
There are also a handful of cleanups and improvements, as usual.
* tag 'riscv-for-linus-5.20-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (28 commits)
riscv: enable Docker requirements in defconfig
riscv: convert the t-head pbmt errata to use the __nops macro
riscv: introduce nops and __nops macros for NOP sequences
RISC-V: Add fast call path of crash_kexec()
riscv: mmap with PROT_WRITE but no PROT_READ is invalid
riscv/efi_stub: Add 64bit boot-hartid support on RV64
riscv: cpu: Add 64bit hartid support on RV64
riscv: smp: Add 64bit hartid support on RV64
riscv: spinwait: Fix hartid variable type
riscv: cpu_ops_sbi: Add 64bit hartid support on RV64
riscv: dts: sifive: "fix" pmic watchdog node name
riscv: dts: canaan: Add k210 topology information
riscv: dts: sifive: Add fu740 topology information
riscv: dts: sifive: Add fu540 topology information
riscv: dts: starfive: Add JH7100 CPU topology
RISC-V: Add CONFIG_{NON,}PORTABLE
riscv: config: enable SOC_STARFIVE in defconfig
riscv: dts: microchip: Add mpfs' topology information
riscv: Kconfig.socs: Add comments
riscv: Kconfig.erratas: Add comments
...
This pull request contains the following branches:
doc.2022.06.21a: Documentation updates.
fixes.2022.07.19a: Miscellaneous fixes.
nocb.2022.07.19a: Callback-offload updates, perhaps most notably a new
RCU_NOCB_CPU_DEFAULT_ALL Kconfig option that causes all CPUs to
be offloaded at boot time, regardless of kernel boot parameters.
This is useful to battery-powered systems such as ChromeOS
and Android. In addition, a new RCU_NOCB_CPU_CB_BOOST kernel
boot parameter prevents offloaded callbacks from interfering
with real-time workloads and with energy-efficiency mechanisms.
poll.2022.07.21a: Polled grace-period updates, perhaps most notably
making these APIs account for both normal and expedited grace
periods.
rcu-tasks.2022.06.21a: Tasks RCU updates, perhaps most notably reducing
the CPU overhead of RCU tasks trace grace periods by more than
a factor of two on a system with 15,000 tasks. The reduction
is expected to increase with the number of tasks, so it seems
reasonable to hypothesize that a system with 150,000 tasks might
see a 20-fold reduction in CPU overhead.
torture.2022.06.21a: Torture-test updates.
ctxt.2022.07.05a: Updates that merge RCU's dyntick-idle tracking into
context tracking, thus reducing the overhead of transitioning to
kernel mode from either idle or nohz_full userspace execution
for kernels that track context independently of RCU. This is
expected to be helpful primarily for kernels built with
CONFIG_NO_HZ_FULL=y.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEbK7UrM+RBIrCoViJnr8S83LZ+4wFAmLgMcgTHHBhdWxtY2tA
a2VybmVsLm9yZwAKCRCevxLzctn7jArXD/0fjbCwqpRjHVTzjMY8jN4zDkqZZD6m
g8Fx27hZ4ToNFwRptyHwNezrNj14skjAJEXfdjaVw32W62ivXvf0HINvSzsTLCSq
k2kWyBdXLc9CwY5p5W4smnpn5VoAScjg5PoPL59INoZ/Zziji323C7Zepl/1DYJt
0T6bPCQjo1ZQoDUCyVpSjDmAqxnderWG0MeJVt74GkLqmnYLANg0GH8c7mH4+9LL
kVGlLp5nlPgNJ4FEoFdMwNU8T/ETmaVld/m2dkiawjkXjJzB2XKtBigU91DDmXz5
7DIdV4ABrxiy4kGNqtIe/jFgnKyVD7xiDpyfjd6KTeDr/rDS8u2ZH7+1iHsyz3g0
Np/tS3vcd0KR+gI/d0eXxPbgm5sKlCmKw/nU2eArpW/+4LmVXBUfHTG9Jg+LJmBc
JrUh6aEdIZJZHgv/nOQBNig7GJW43IG50rjuJxAuzcxiZNEG5lUSS23ysaA9CPCL
PxRWKSxIEfK3kdmvVO5IIbKTQmIBGWlcWMTcYictFSVfBgcCXpPAksGvqA5JiUkc
egW+xLFo/7K+E158vSKsVqlWZcEeUbsNJ88QOlpqnRgH++I2Yv/LhK41XfJfpH+Y
ALxVaDd+mAq6v+qSHNVq9wT3ozXIPy/zK1hDlMIqx40h2YvaEsH4je+521oSoN9r
vX60+QNxvUBLwA==
=vUNm
-----END PGP SIGNATURE-----
Merge tag 'rcu.2022.07.26a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull RCU updates from Paul McKenney:
- Documentation updates
- Miscellaneous fixes
- Callback-offload updates, perhaps most notably a new
RCU_NOCB_CPU_DEFAULT_ALL Kconfig option that causes all CPUs to be
offloaded at boot time, regardless of kernel boot parameters.
This is useful to battery-powered systems such as ChromeOS and
Android. In addition, a new RCU_NOCB_CPU_CB_BOOST kernel boot
parameter prevents offloaded callbacks from interfering with
real-time workloads and with energy-efficiency mechanisms
- Polled grace-period updates, perhaps most notably making these APIs
account for both normal and expedited grace periods
- Tasks RCU updates, perhaps most notably reducing the CPU overhead of
RCU tasks trace grace periods by more than a factor of two on a
system with 15,000 tasks.
The reduction is expected to increase with the number of tasks, so it
seems reasonable to hypothesize that a system with 150,000 tasks
might see a 20-fold reduction in CPU overhead
- Torture-test updates
- Updates that merge RCU's dyntick-idle tracking into context tracking,
thus reducing the overhead of transitioning to kernel mode from
either idle or nohz_full userspace execution for kernels that track
context independently of RCU.
This is expected to be helpful primarily for kernels built with
CONFIG_NO_HZ_FULL=y
* tag 'rcu.2022.07.26a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (98 commits)
rcu: Add irqs-disabled indicator to expedited RCU CPU stall warnings
rcu: Diagnose extended sync_rcu_do_polled_gp() loops
rcu: Put panic_on_rcu_stall() after expedited RCU CPU stall warnings
rcutorture: Test polled expedited grace-period primitives
rcu: Add polled expedited grace-period primitives
rcutorture: Verify that polled GP API sees synchronous grace periods
rcu: Make Tiny RCU grace periods visible to polled APIs
rcu: Make polled grace-period API account for expedited grace periods
rcu: Switch polled grace-period APIs to ->gp_seq_polled
rcu/nocb: Avoid polling when my_rdp->nocb_head_rdp list is empty
rcu/nocb: Add option to opt rcuo kthreads out of RT priority
rcu: Add nocb_cb_kthread check to rcu_is_callbacks_kthread()
rcu/nocb: Add an option to offload all CPUs on boot
rcu/nocb: Fix NOCB kthreads spawn failure with rcu_nocb_rdp_deoffload() direct call
rcu/nocb: Invert rcu_state.barrier_mutex VS hotplug lock locking order
rcu/nocb: Add/del rdp to iterate from rcuog itself
rcu/tree: Add comment to describe GP-done condition in fqs loop
rcu: Initialize first_gp_fqs at declaration in rcu_gp_fqs()
rcu/kvfree: Remove useless monitor_todo flag
rcu: Cleanup RCU urgency state for offline CPU
...
- lockdep: Fix a handful of the more complex lockdep_init_map_*() primitives
that can lose the lock_type & cause false reports. No such mishap was
observed in the wild.
- jump_label improvements: simplify the cross-arch support of
initial NOP patching by making it arch-specific code (used on MIPS only),
and remove the s390 initial NOP patching that was superfluous.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmLn3jERHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1hzeg/7BTC90XeMANhTiL23iiH7dOYZwqdFeB12
VBqdaPaGC8i+mJzVAdGyPFwCFDww6Ak6P33PcHkemuIO5+DhWis8hfw5krHEOO1k
AyVSMOZuWJ8/g6ZenjgNFozQ8C+3NqURrpdqN55d7jhMazPWbsNLLqUgvSSqo6DY
Ah2O+EKrDfGNCxT6/YaTAmUryctotxafSyFDQxv3RKPfCoIIVv9b3WApYqTOqFIu
VYTPr+aAcMsU20hPMWQI4kbQaoCxFqr3bZiZtAiS/IEunqi+PlLuWjrnCUpLwVTC
+jOCkNJHt682FPKTWelUnCnkOg9KhHRujRst5mi1+2tWAOEvKltxfe05UpsZYC3b
jhzddREMwBt3iYsRn65LxxsN4AMK/C/41zjejHjZpf+Q5kwDsc6Ag3L5VifRFURS
KRwAy9ejoVYwnL7CaVHM2zZtOk4YNxPeXmiwoMJmOufpdmD1LoYbNUbpSDf+goIZ
yPJpxFI5UN8gi8IRo3DMe4K2nqcFBC3wFn8tNSAu+44gqDwGJAJL6MsLpkLSZkk8
3QN9O11UCRTJDkURjoEWPgRRuIu9HZ4GKNhiblDy6gNM/jDE/m5OG4OYfiMhojgc
KlMhsPzypSpeApL55lvZ+AzxH8mtwuUGwm8lnIdZ2kIse1iMwapxdWXWq9wQr8eW
jLWHgyZ6rcg=
=4B89
-----END PGP SIGNATURE-----
Merge tag 'locking-core-2022-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
"This was a fairly quiet cycle for the locking subsystem:
- lockdep: Fix a handful of the more complex lockdep_init_map_*()
primitives that can lose the lock_type & cause false reports. No
such mishap was observed in the wild.
- jump_label improvements: simplify the cross-arch support of initial
NOP patching by making it arch-specific code (used on MIPS only),
and remove the s390 initial NOP patching that was superfluous"
* tag 'locking-core-2022-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/lockdep: Fix lockdep_init_map_*() confusion
jump_label: make initial NOP patching the special case
jump_label: mips: move module NOP patching into arch code
jump_label: s390: avoid pointless initial NOP patching
The setup_profiling_timer() is mostly un-implemented by many
architectures. In many places it isn't guarded by CONFIG_PROFILE which is
needed for it to be used. Make it a weak symbol in kernel/profile.c and
remove the 'return -EINVAL' implementations from the kenrel.
There are a couple of architectures which do return 0 from the
setup_profiling_timer() function but they don't seem to do anything else
with it. To keep the /proc compatibility for now, leave these for a
future update or removal.
On ARM, this fixes the following sparse warning:
arch/arm/kernel/smp.c:793:5: warning: symbol 'setup_profiling_timer' was not declared. Should it be static?
Link: https://lkml.kernel.org/r/20220721195509.418205-1-ben-linux@fluff.org
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The Zicbom ISA-extension was ratified in november 2021
and introduces instructions for dcache invalidate, clean
and flush operations.
Implement cache management operations for non-coherent devices
based on them.
Of course not all cores will support this, so implement an
alternative-based mechanism that replaces empty instructions
with ones done around Zicbom instructions.
As discussed in previous versions, assume the platform
being coherent by default so that non-coherent devices need
to get marked accordingly by firmware.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Reviewed-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20220706231536.2041855-4-heiko@sntech.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Currently, almost all archs (x86, arm64, mips...) support fast call
of crash_kexec() when "regs && kexec_should_crash()" is true. But
RISC-V not, it can only enter crash system via panic(). However panic()
doesn't pass the regs of the real accident scene to crash_kexec(),
it caused we can't get accurate backtrace via gdb,
$ riscv64-linux-gnu-gdb vmlinux vmcore
Reading symbols from vmlinux...
[New LWP 95]
#0 console_unlock () at kernel/printk/printk.c:2557
2557 if (do_cond_resched)
(gdb) bt
#0 console_unlock () at kernel/printk/printk.c:2557
#1 0x0000000000000000 in ?? ()
With the patch we can get the accurate backtrace,
$ riscv64-linux-gnu-gdb vmlinux vmcore
Reading symbols from vmlinux...
[New LWP 95]
#0 0xffffffe00063a4e0 in test_thread (data=<optimized out>) at drivers/test_crash.c:81
81 *(int *)p = 0xdead;
(gdb)
(gdb) bt
#0 0xffffffe00064d5c0 in test_thread (data=<optimized out>) at drivers/test_crash.c:81
#1 0x0000000000000000 in ?? ()
Test code to produce NULL address dereference in test_crash.c,
void *p = NULL;
*(int *)p = 0xdead;
Reviewed-by: Guo Ren <guoren@kernel.org>
Tested-by: Xianting Tian <xianting.tian@linux.alibaba.com>
Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
Link: https://lore.kernel.org/r/20220606082308.2883458-1-xianting.tian@linux.alibaba.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
As mentioned in Table 4.5 in RISC-V spec Volume 2 Section 4.3, write
but not read is "Reserved for future use.". For now, they are not valid.
In the current code, -wx is marked as invalid, but -w- is not marked
as invalid.
This patch refines that judgment.
Reported-by: xctan <xc-tan@outlook.com>
Co-developed-by: dram <dramforever@live.com>
Signed-off-by: dram <dramforever@live.com>
Co-developed-by: Ruizhe Pan <c141028@gmail.com>
Signed-off-by: Ruizhe Pan <c141028@gmail.com>
Signed-off-by: Celeste Liu <coelacanthus@outlook.com>
Link: https://lore.kernel.org/r/PH7PR14MB559464DBDD310E755F5B21E8CEDC9@PH7PR14MB5594.namprd14.prod.outlook.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
The hartid can be a 64bit value on RV64 platforms. This series updates
the code so that 64bit hartid can be supported on RV64 platforms.
* 'riscv-64bit_hartid' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/linux.git:
riscv/efi_stub: Add 64bit boot-hartid support on RV64
riscv: cpu: Add 64bit hartid support on RV64
riscv: smp: Add 64bit hartid support on RV64
riscv: spinwait: Fix hartid variable type
riscv: cpu_ops_sbi: Add 64bit hartid support on RV64
The hartid can be a 64bit value on RV64 platforms.
Add support for 64bit hartid in riscv_of_processor_hartid() and
update its callers.
Signed-off-by: Sunil V L <sunilvl@ventanamicro.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20220527051743.2829940-5-sunilvl@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
The hartid can be a 64bit value on RV64 platforms.
Modify the hartid parameter in riscv_hartid_to_cpuid() as
unsigned long so that it can hold 64bit value on RV64 platforms.
Signed-off-by: Sunil V L <sunilvl@ventanamicro.com>
Reviewed-by: Heinrich Schuchardt <heinrich.schuchardt@canonical.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20220527051743.2829940-4-sunilvl@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
The hartid variable is of type int but compared with
ULONG_MAX(INVALID_HARTID). This issue is fixed by changing
the hartid variable type to unsigned long.
Fixes: c78f94f35c ("RISC-V: Use __cpu_up_stack/task_pointer only for spinwait method")
Signed-off-by: Sunil V L <sunilvl@ventanamicro.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20220527051743.2829940-3-sunilvl@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
The hartid can be a 64bit value on RV64 platforms.
Modify the hartid variable type to unsigned long so that it can
hold 64bit value on RV64 platforms.
Signed-off-by: Sunil V L <sunilvl@ventanamicro.com>
Reviewed-by: Heinrich Schuchardt <heinrich.schuchardt@canonical.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20220527051743.2829940-2-sunilvl@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
When CONFIG_KEXEC_FILE=y but CONFIG_KEXEC is not set:
kernel/kexec_core.o: In function `kimage_free':
kexec_core.c:(.text+0xa0c): undefined reference to `machine_kexec_cleanup'
kernel/kexec_core.o: In function `.L0 ':
kexec_core.c:(.text+0xde8): undefined reference to `machine_crash_shutdown'
kexec_core.c:(.text+0xdf4): undefined reference to `machine_kexec'
kernel/kexec_core.o: In function `.L231':
kexec_core.c:(.text+0xe1c): undefined reference to `riscv_crash_save_regs'
kernel/kexec_core.o: In function `.L0 ':
kexec_core.c:(.text+0x119e): undefined reference to `machine_shutdown'
kernel/kexec_core.o: In function `.L312':
kexec_core.c:(.text+0x11b2): undefined reference to `machine_kexec'
kernel/kexec_file.o: In function `.L0 ':
kexec_file.c:(.text+0xb84): undefined reference to `machine_kexec_prepare'
kernel/kexec_file.o: In function `.L177':
kexec_file.c:(.text+0xc5a): undefined reference to `machine_kexec_prepare'
Makefile:1160: recipe for target 'vmlinux' failed
make: *** [vmlinux] Error 1
These symbols should depend on CONFIG_KEXEC_CORE rather than CONFIG_KEXEC
when kexec_file has been implemented on RISC-V, like the other archs have
done.
Signed-off-by: Li Zhengyu <lizhengyu3@huawei.com>
Reviewed-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/20220601070204.26882-1-lizhengyu3@huawei.com
Fixes: 6261586e0c ("RISC-V: Add kexec_file support")
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
When CONFIG_MODULES is not set/enabled:
../arch/riscv/kernel/elf_kexec.c:353:9: error: unknown type name 'Elf_Rela'; did you mean 'Elf64_Rela'?
353 | Elf_Rela *relas;
| ^~~~~~~~
| Elf64_Rela
Replace Elf_Rela by Elf64_Rela to avoid relying on CONFIG_MODULES.
Signed-off-by: Li Zhengyu <lizhengyu3@huawei.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/20220601063924.13037-1-lizhengyu3@huawei.com
Fixes: 838b3e2848 ("RISC-V: Load purgatory in kexec_file")
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Context tracking is going to be used not only to track user transitions
but also idle/IRQs/NMIs. The user tracking part will then become a
separate feature. Prepare Kconfig for that.
[ frederic: Apply Max Filippov feedback. ]
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Cc: Uladzislau Rezki <uladzislau.rezki@sony.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Nicolas Saenz Julienne <nsaenz@kernel.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Cc: Yu Liao <liaoyu15@huawei.com>
Cc: Phil Auld <pauld@redhat.com>
Cc: Paul Gortmaker<paul.gortmaker@windriver.com>
Cc: Alex Belits <abelits@marvell.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
Tested-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
context_tracking_user_enter() and context_tracking_user_exit() are
ASM callable versions of user_enter() and user_exit() for architectures
that didn't manage to check the context tracking static key from ASM.
Change those function names to better reflect their purpose.
[ frederic: Apply Max Filippov feedback. ]
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Cc: Uladzislau Rezki <uladzislau.rezki@sony.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Nicolas Saenz Julienne <nsaenz@kernel.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Cc: Yu Liao <liaoyu15@huawei.com>
Cc: Phil Auld <pauld@redhat.com>
Cc: Paul Gortmaker<paul.gortmaker@windriver.com>
Cc: Alex Belits <abelits@marvell.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
Tested-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
Instead of defaulting to patching NOP opcodes at init time, and leaving
it to the architectures to override this if this is not needed, switch
to a model where doing nothing is the default. This is the common case
by far, as only MIPS requires NOP patching at init time. On all other
architectures, the correct encodings are emitted by the compiler and so
no initial patching is needed.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220615154142.1574619-4-ardb@kernel.org
Some additionals comments and notes from autobuilders received after the
series got applied, warranted some changes.
* 'riscv-svpbmt' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/palmer/linux:
riscv: remove usage of function-pointers from cpufeatures and t-head errata
riscv: make patch-function pointer more generic in cpu_manufacturer_info struct
riscv: Improve description for RISCV_ISA_SVPBMT Kconfig symbol
riscv: drop cpufeature_apply_feature tracking variable
riscv: fix dependency for t-head errata
Having a list of alternatives to check with a per-entry function pointer
to a check function is nice style-wise. But in case of early-alternatives
it can clash with the non-relocated kernel and the function pointer in
the list pointing to a completely wrong location.
This isn't an issue with one or two list entries, as in that case the
compiler seems to unroll the loop and even usage of the list structure
and then only does relative jumps into the check functions based on this.
When adding a third entry to either list though, the issue that was
hiding there from the beginning is triggered resulting a jump to a
memory address that isn't part of the kernel at all.
The list of features/erratas only contained an unused name and the
pointer to the check function, so an easy solution for the problem
is to just unroll the loop in code, dismantle the whole list structure
and just call the relevant check functions one by one ourself.
For the T-Head errata this includes moving the stage-check inside
the check functions.
The issue is only relevant for things that might be called for early-
alternatives (T-Head and possible future main extensions), so the
SiFive erratas were not affected from the beginning, as they got
an early return for early-alternatives in the original patchset.
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Tested-by: Samuel Holland <samuel@sholland.org>
Link: https://lore.kernel.org/r/20220526205646.258337-6-heiko@sntech.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
During review the naming of the function-pointer was called
confusing as the vendor id is just one of three inputs for
the patching and indeed it serves no real purpose, as with
recent changes the function pointer is not a static
global element anymore, so drop the "vendor_" prefix.
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Link: https://lore.kernel.org/r/20220526205646.258337-4-heiko@sntech.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
The variable was tracking which feature patches got applied
but that information was never actually used - and thus resulted
in a warning as well.
Drop the variable.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Reviewed-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20220526205646.258337-2-heiko@sntech.de
Fixes: ff689fd21c ("riscv: add RISC-V Svpbmt extension support")
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
This is to use the unified static key mechanism instead of putting
static key related here and there.
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20220522153543.2656-3-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Currently, riscv has several extensions which may not be supported on
all riscv platforms, for example, FPU and so on. To support unified
kernel Image style, we need to check whether the feature is supported
or not. If the check sits at hot code path, then performance will be
impacted a lot. static key can be used to solve the issue. In the past,
FPU support has been converted to use static key mechanism. I believe
we will have similar cases in the future.
This patch tries to add an unified mechanism to use static keys for
some ISA extensions by implementing an array of default-false static keys
and enabling them when detected.
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20220522153543.2656-2-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
This series includes the following patchsets:
- bitmap: optimize bitmap_weight() usage(w/o bitmap_weight_cmp), from me;
- lib/bitmap.c make bitmap_print_bitmask_to_buf parseable, from Mauro
Carvalho Chehab;
- include/linux/find: Fix documentation, from Anna-Maria Behnsen;
- bitmap: fix conversion from/to fix-sized arrays, from me;
- bitmap: Fix return values to be unsigned, from Kees Cook.
It has been in linux-next for at least a week with no problems.
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEEi8GdvG6xMhdgpu/4sUSA/TofvsgFAmKaEzYACgkQsUSA/Tof
vsiGKwv8Dgr3G0mLbSfmHZqdFMIsmSmwhxlEH6eBNtX6vjQbGafe/Buhj/1oSY8N
NYC4+5Br6s7MmMRth3Kp6UECdl94TS3Ka06T+lVBKkG+C+B1w1/svqUMM2ZCQF3e
Z5R/HhR6av9X9Qb2mWSasWLkWp629NjdtRsJSDWiVt1emVVwh+iwxQnMH9VuE+ao
z3mvaQfSRhe4h+xCZOiohzFP+0jZb1EnPrQAIVzNUjigo7mglpNvVyO7p/8LU7gD
dIjfGmSbtsHU72J+/0lotRqjhjORl1F/EILf8pIzx5Ga7ExUGhOzGWAj7/3uZxfA
Cp1Z/QV271MGwv/sNdSPwCCJHf51eOmsbyOyUScjb3gFRwIStEa1jB4hKwLhS5wF
3kh4kqu3WGuIQAdxkUpDBsy3CQjAPDkvtRJorwyWGbjwa9xUETESAgH7XCCTsgWc
0sIuldWWaxC581+fAP1Dzmo8uuWBURTaOrVmRMILQHMTw54zoFyLY+VI9gEAT9aV
gnPr3M4F
=U7DN
-----END PGP SIGNATURE-----
Merge tag 'bitmap-for-5.19-rc1' of https://github.com/norov/linux
Pull bitmap updates from Yury Norov:
- bitmap: optimize bitmap_weight() usage, from me
- lib/bitmap.c make bitmap_print_bitmask_to_buf parseable, from Mauro
Carvalho Chehab
- include/linux/find: Fix documentation, from Anna-Maria Behnsen
- bitmap: fix conversion from/to fix-sized arrays, from me
- bitmap: Fix return values to be unsigned, from Kees Cook
It has been in linux-next for at least a week with no problems.
* tag 'bitmap-for-5.19-rc1' of https://github.com/norov/linux: (31 commits)
nodemask: Fix return values to be unsigned
bitmap: Fix return values to be unsigned
KVM: x86: hyper-v: replace bitmap_weight() with hweight64()
KVM: x86: hyper-v: fix type of valid_bank_mask
ia64: cleanup remove_siblinginfo()
drm/amd/pm: use bitmap_{from,to}_arr32 where appropriate
KVM: s390: replace bitmap_copy with bitmap_{from,to}_arr64 where appropriate
lib/bitmap: add test for bitmap_{from,to}_arr64
lib: add bitmap_{from,to}_arr64
lib/bitmap: extend comment for bitmap_(from,to)_arr32()
include/linux/find: Fix documentation
lib/bitmap.c make bitmap_print_bitmask_to_buf parseable
MAINTAINERS: add cpumask and nodemask files to BITMAP_API
arch/x86: replace nodes_weight with nodes_empty where appropriate
mm/vmstat: replace cpumask_weight with cpumask_empty where appropriate
clocksource: replace cpumask_weight with cpumask_empty in clocksource.c
genirq/affinity: replace cpumask_weight with cpumask_empty where appropriate
irq: mips: replace cpumask_weight with cpumask_empty where appropriate
drm/i915/pmu: replace cpumask_weight with cpumask_empty where appropriate
arch/x86: replace cpumask_weight with cpumask_empty where appropriate
...
ordinary user mode tasks.
In commit 40966e316f ("kthread: Ensure struct kthread is present for
all kthreads") caused init and the user mode helper threads that call
kernel_execve to have struct kthread allocated for them. This struct
kthread going away during execve in turned made a use after free of
struct kthread possible.
The commit 343f4c49f2 ("kthread: Don't allocate kthread_struct for
init and umh") is enough to fix the use after free and is simple enough
to be backportable.
The rest of the changes pass struct kernel_clone_args to clean things
up and cause the code to make sense.
In making init and the user mode helpers tasks purely user mode tasks
I ran into two complications. The function task_tick_numa was
detecting tasks without an mm by testing for the presence of
PF_KTHREAD. The initramfs code in populate_initrd_image was using
flush_delayed_fput to ensuere the closing of all it's file descriptors
was complete, and flush_delayed_fput does not work in a userspace thread.
I have looked and looked and more complications and in my code review
I have not found any, and neither has anyone else with the code sitting
in linux-next.
Link: https://lkml.kernel.org/r/87mtfu4up3.fsf@email.froward.int.ebiederm.org
Eric W. Biederman (8):
kthread: Don't allocate kthread_struct for init and umh
fork: Pass struct kernel_clone_args into copy_thread
fork: Explicity test for idle tasks in copy_thread
fork: Generalize PF_IO_WORKER handling
init: Deal with the init process being a user mode process
fork: Explicitly set PF_KTHREAD
fork: Stop allowing kthreads to call execve
sched: Update task_tick_numa to ignore tasks without an mm
arch/alpha/kernel/process.c | 13 ++++++------
arch/arc/kernel/process.c | 13 ++++++------
arch/arm/kernel/process.c | 12 ++++++-----
arch/arm64/kernel/process.c | 12 ++++++-----
arch/csky/kernel/process.c | 15 ++++++-------
arch/h8300/kernel/process.c | 10 ++++-----
arch/hexagon/kernel/process.c | 12 ++++++-----
arch/ia64/kernel/process.c | 15 +++++++------
arch/m68k/kernel/process.c | 12 ++++++-----
arch/microblaze/kernel/process.c | 12 ++++++-----
arch/mips/kernel/process.c | 13 ++++++------
arch/nios2/kernel/process.c | 12 ++++++-----
arch/openrisc/kernel/process.c | 12 ++++++-----
arch/parisc/kernel/process.c | 18 +++++++++-------
arch/powerpc/kernel/process.c | 15 +++++++------
arch/riscv/kernel/process.c | 12 ++++++-----
arch/s390/kernel/process.c | 12 ++++++-----
arch/sh/kernel/process_32.c | 12 ++++++-----
arch/sparc/kernel/process_32.c | 12 ++++++-----
arch/sparc/kernel/process_64.c | 12 ++++++-----
arch/um/kernel/process.c | 15 +++++++------
arch/x86/include/asm/fpu/sched.h | 2 +-
arch/x86/include/asm/switch_to.h | 8 +++----
arch/x86/kernel/fpu/core.c | 4 ++--
arch/x86/kernel/process.c | 18 +++++++++-------
arch/xtensa/kernel/process.c | 17 ++++++++-------
fs/exec.c | 8 ++++---
include/linux/sched/task.h | 8 +++++--
init/initramfs.c | 2 ++
init/main.c | 2 +-
kernel/fork.c | 46 +++++++++++++++++++++++++++++++++-------
kernel/sched/fair.c | 2 +-
kernel/umh.c | 6 +++---
33 files changed, 234 insertions(+), 160 deletions(-)
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEgjlraLDcwBA2B+6cC/v6Eiajj0AFAmKaR/MACgkQC/v6Eiaj
j0Aayg/7Bx66872d9c6igkJ+MPCTuh+v9QKCGwiYEmiU4Q5sVAFB0HPJO27qC14u
630X0RFNZTkPzNNEJNIW4kw6Dj8s8YRKf+FgQAVt4SzdRwT7eIPDjk1nGraopPJ3
O04pjvuTmUyidyViRyFcf2ptx/pnkrwP8jUSc+bGTgfASAKAgAokqKE5ecjewbBc
Y/EAkQ6QW7KxPjeSmpAHwI+t3BpBev9WEC4PbhRhsBCQFO2+PJiklvqdhVNBnIjv
qUezll/1xv9UYgniB15Q4Nb722SmnWSU3r8as1eFPugzTHizKhufrrpyP+KMK1A0
tdtEJNs5t2DZF7ZbGTFSPqJWmyTYLrghZdO+lOmnaSjHxK4Nda1d4NzbefJ0u+FE
tutewowvHtBX6AFIbx+H3O+DOJM2IgNMf+ReQDU/TyNyVf3wBrTbsr9cLxypIJIp
zze8npoLMlB7B4yxVo5ES5e63EXfi3iHl0L3/1EhoGwriRz1kWgVLUX/VZOUpscL
RkJHsW6bT8sqxPWAA5kyWjEN+wNR2PxbXi8OE4arT0uJrEBMUgDCzydzOv5tJB00
mSQdytxH9LVdsmxBKAOBp5X6WOLGA4yb1cZ6E/mEhlqXMpBDF1DaMfwbWqxSYi4q
sp5zU3SBAW0qceiZSsWZXInfbjrcQXNV/DkDRDO9OmzEZP4m1j0=
=x6fy
-----END PGP SIGNATURE-----
Merge tag 'kthread-cleanups-for-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull kthread updates from Eric Biederman:
"This updates init and user mode helper tasks to be ordinary user mode
tasks.
Commit 40966e316f ("kthread: Ensure struct kthread is present for
all kthreads") caused init and the user mode helper threads that call
kernel_execve to have struct kthread allocated for them. This struct
kthread going away during execve in turned made a use after free of
struct kthread possible.
Here, commit 343f4c49f2 ("kthread: Don't allocate kthread_struct for
init and umh") is enough to fix the use after free and is simple
enough to be backportable.
The rest of the changes pass struct kernel_clone_args to clean things
up and cause the code to make sense.
In making init and the user mode helpers tasks purely user mode tasks
I ran into two complications. The function task_tick_numa was
detecting tasks without an mm by testing for the presence of
PF_KTHREAD. The initramfs code in populate_initrd_image was using
flush_delayed_fput to ensuere the closing of all it's file descriptors
was complete, and flush_delayed_fput does not work in a userspace
thread.
I have looked and looked and more complications and in my code review
I have not found any, and neither has anyone else with the code
sitting in linux-next"
* tag 'kthread-cleanups-for-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
sched: Update task_tick_numa to ignore tasks without an mm
fork: Stop allowing kthreads to call execve
fork: Explicitly set PF_KTHREAD
init: Deal with the init process being a user mode process
fork: Generalize PF_IO_WORKER handling
fork: Explicity test for idle tasks in copy_thread
fork: Pass struct kernel_clone_args into copy_thread
kthread: Don't allocate kthread_struct for init and umh
This is mostly some DT updates, but also a handful of cleanups and some
fixes. The most user-visible of those are:
* A device tree for the Sundance Polarberry, along with a handful of
fixes and clenups to the PolarFire SOC device trees and bindings.
* The memfd_secret syscall number is now visible to userspace,
* Some improvements to the vm layout dump, which really should have
followed shortly after the sv48 patches but I missed.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmKaL78THHBhbG1lckBk
YWJiZWx0LmNvbQAKCRAuExnzX7sYibvYEACz22qhTcxtaef4FK89Av4q53pGLPIO
zKn5UzwVVqIVFuwDeFBSWCKRL31szssIhnrQWZ2z9u6TbXNtMhWLESZJpNK4XGRa
GqcVBxfQw8sTfxj/PdRrPxlq/XWKVRJ7UFjov9/rSkNTG5bhM7D63l+mCUix//mS
oc6aVIXAkt0O3K/Q0YsUTdBO+6VOABBtyMScisplvAYy8ZZ0BDAMsxebergfF1kX
Yvk6xEk7DtZqOjAHn1c//Nz2Dm/SVI7lIf6IUbD4Pi/+dDzXIe2cm4v0gV/reZF6
gBa/efvVYa5GIkzQZ6YsH629VcWxZW4LXJ++wz6Q02+36WGLGeky8GgudAATXLe/
A5un2bTr0vxpHBJ5G8ov7vMWQqAYYx/I+ExaTtoTd9W/ygNgwkXw93Nlzsow0wWC
hbhcPBAx5ChRLPwcD6hNpn+s69+tS1WS6fXBZoHT1UTim/9iE3//SjH/ISmlbPTD
EFb0Q4+EKwDPy12uqABvh4QMKoANAbravobDlXDLaAs8NeAdr4oaYzDy1O0hwl4a
IiIbWLd6VUnSA4gq3HttnxvofjVhfl81kkzLmC+8es1DeZhBYchtz2Btu7ibYGP2
AoTv24tuy9e7lz6V5jlmjURzzA+uunUE/LH3t4ExlTagrZddz7S3oRH98dNN9or9
Cw6kST16JC/BWA==
=qpf3
-----END PGP SIGNATURE-----
Merge tag 'riscv-for-linus-5.19-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull more RISC-V updates from Palmer Dabbelt:
"This is mostly some DT updates, but also a handful of cleanups and
some fixes. The most user-visible of those are:
- A device tree for the Sundance Polarberry, along with a handful of
fixes and clenups to the PolarFire SOC device trees and bindings.
- The memfd_secret syscall number is now visible to userspace,
- Some improvements to the vm layout dump, which really should have
followed shortly after the sv48 patches but I missed"
* tag 'riscv-for-linus-5.19-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: Move alternative length validation into subsection
riscv: mm: init: make pt_ops_set_[early|late|fixmap] static
riscv: move errata/ and kvm/ builds to arch/riscv/Kbuild
RISC-V: Mark IORESOURCE_EXCLUSIVE for reserved mem instead of IORESOURCE_BUSY
riscv: Wire up memfd_secret in UAPI header
riscv: Fix irq_work when SMP is disabled
riscv: Improve virtual kernel memory layout dump
riscv: Initialize thread pointer before calling C functions
Documentation: riscv: Add sv48 description to VM layout
RISC-V: Only default to spinwait on SBI-0.1 and M-mode
riscv: dts: icicle: sort nodes alphabetically
riscv: microchip: icicle: readability fixes
riscv: dts: microchip: add the sundance polarberry
dt-bindings: riscv: microchip: add polarberry compatible string
dt-bindings: vendor-prefixes: add Sundance DSP
riscv: dts: microchip: make the fabric dtsi board specific
dt-bindings: riscv: microchip: document icicle reference design
riscv: dts: microchip: remove soc vendor from filenames
riscv: dts: microchip: move sysctrlr out of soc bus
riscv: dts: microchip: remove icicle memory clocks
- fix new DXE service invocations for mixed mode
- use correct Kconfig symbol when setting PE header flag
- clean up the drivers/firmware/efi Kconfig dependencies so that
features that depend on CONFIG_EFI are hidden from the UI when the
symbol is not enabled.
Also included is a RISC-V bugfix from Heinrich to avoid read-write
mappings of read-only firmware regions in the EFI page tables.
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE+9lifEBpyUIVN1cpw08iOZLZjyQFAmKXk5YACgkQw08iOZLZ
jyQQ0wv/cB9Z9kJur3wJqj75HFEly7bwSk5oxJ+txRytApSaRYnqm7l4WeP3QQ8c
o9GzZZNwoRQSx1mCBJefaO4s8fA24QkIeD8Oy4MeucKaPX1UbNc6Z84srOynjpSj
mOyIYB+kurxsCBKmzQQBy8txIWld4EkrMhEoc1h2L4d2+OVRvIlsu1PMv03eCiww
4Sop0yO5CydEpjxJDCfwol0L/PBiXc2PfRs2FdHFwOSQaisQLxhNruCnovyS9Zwk
zLkhYC5dS+OZctknl6XMzOAi3x7sNYzVwNf4+yhFeU2cTuj3kJWnEAqs3CU/tiPO
DOobLg/r/j7H44Nsc/8aJGT4GPNrbUrb6aOcfrBAkxvsu1Sp/k/UfSMZLS9fU1gC
XUUl46NXG1yFpCntruQm5SMytVKdtlyUu7pPa+Ijmr+vc6UWl1XJq26J3UpiiFYT
mjrer5gvzrnhuvUjIb4ulKoNMdoOQQMtofLxUGuc1u/53jWHxbiKt7/QvyFepJVe
zi/ikD/v
=7wiT
-----END PGP SIGNATURE-----
Merge tag 'efi-next-for-v5.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
Pull more EFI updates from Ard Biesheuvel:
"Follow-up tweaks for EFI changes - they mostly address issues
introduced this merge window, except for Heinrich's patch:
- fix new DXE service invocations for mixed mode
- use correct Kconfig symbol when setting PE header flag
- clean up the drivers/firmware/efi Kconfig dependencies so that
features that depend on CONFIG_EFI are hidden from the UI when the
symbol is not enabled.
Also included is a RISC-V bugfix from Heinrich to avoid read-write
mappings of read-only firmware regions in the EFI page tables"
* tag 'efi-next-for-v5.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
efi: clean up Kconfig dependencies on CONFIG_EFI
efi/x86: libstub: Make DXE calls mixed mode safe
efi: x86: Fix config name for setting the NX-compatibility flag in the PE header
riscv: read-only pages should not be writable
bitmap_empty() is better than bitmap_weight() because it may return
earlier, and improves on readability.
CC: Albert Ou <aou@eecs.berkeley.edu>
CC: Anup Patel <anup@brainfault.org>
CC: Atish Patra <atishp@atishpatra.org>
CC: Jisheng Zhang <jszhang@kernel.org>
CC: Palmer Dabbelt <palmer@dabbelt.com>
CC: Paul Walmsley <paul.walmsley@sifive.com>
CC: Tsukasa OI <research_trasio@irq.a4lg.com>
CC: linux-riscv@lists.infradead.org
CC: linux-kernel@vger.kernel.org
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Because of the stack canary feature that reads from the current task
structure the stack canary value, the thread pointer register "tp" must
be set before calling any C function from head.S: by chance, setup_vm
and all the functions that it calls does not seem to be part of the
functions where the canary check is done, but in the following commits,
some functions will.
Fixes: f2c9699f65 ("riscv: Add STACKPROTECTOR supported")
Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
* Support for the Svpbmt extension, which allows memory attributes to be
encoded in pages.
* Support for the Allwinner D1's implementation of page-based memory
attributes.
* Support for running rv32 binaries on rv64 systems, via the compat
subsystem.
* Support for kexec_file().
* Support for the new generic ticket-based spinlocks, which allows us to
also move to qrwlock. These should have already gone in through the
asm-geneic tree as well.
* A handful of cleanups and fixes, include some larger ones around
atomics and XIP.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmKWOx8THHBhbG1lckBk
YWJiZWx0LmNvbQAKCRAuExnzX7sYieAiEADAUdP7ctoaSQwk5skd/fdA3b4KJuKn
1Zjl+Br32WP0DlbirYBYWRUQZnCCsvABbTiwSJMcG7NBpU5pyQ5XDtB3OA5kJswO
Fdp8Nd53//+GK1M5zdEM9OdgvT9fbfTZ3qTu8bKsROOQhGwnYL+Csc9KjFRqEmzN
oQii0jlb3n5PM4FL3GsbV4uMn9zzkP9mnVAPQktcock2EKFEK/Fy3uNYMQiO2KPi
n8O6bIDaeRdQ6SurzWOuOkt0cro0tEF85ilzT04mynQsOU0el5oGqCxnOhNH3VWg
ndqPT6Yafw12hZOtbKJeP+nF8IIR6aJLP3jOtRwEVgcfbXYAw4QwbAV8kQZISefN
ipn8JGY7GX9Y9TYU692OUGkcmAb3/dxb6c0WihBdvJ0M6YyLD5X+YKHNuG2onLgK
ss43C5Mxsu629rsjdu/PV91B1+pve3rG9siVmF+g4eo0x9rjMq6/JB0Kal/8SLI1
Je5T55d5ujV1a2XxhZLQOSD5owrK7J1M9owb0bloTnr9nVwFTWDrfEQEU82o3kP+
Xm+FfXktnz9ai55NjkMbbEur5D++dKJhBavwCTnBcTrJmMtEH0R45GTK9ZehP+WC
rNVrRXjIsS18wsTfJxnkZeFQA38as6VBKTzvwHvOgzTrrZU1/xk3lpkouYtAO6BG
gKacHshVilmUuA==
=Loi6
-----END PGP SIGNATURE-----
Merge tag 'riscv-for-linus-5.19-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V updates from Palmer Dabbelt:
- Support for the Svpbmt extension, which allows memory attributes to
be encoded in pages
- Support for the Allwinner D1's implementation of page-based memory
attributes
- Support for running rv32 binaries on rv64 systems, via the compat
subsystem
- Support for kexec_file()
- Support for the new generic ticket-based spinlocks, which allows us
to also move to qrwlock. These should have already gone in through
the asm-geneic tree as well
- A handful of cleanups and fixes, include some larger ones around
atomics and XIP
* tag 'riscv-for-linus-5.19-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (51 commits)
RISC-V: Prepare dropping week attribute from arch_kexec_apply_relocations[_add]
riscv: compat: Using seperated vdso_maps for compat_vdso_info
RISC-V: Fix the XIP build
RISC-V: Split out the XIP fixups into their own file
RISC-V: ignore xipImage
RISC-V: Avoid empty create_*_mapping definitions
riscv: Don't output a bogus mmu-type on a no MMU kernel
riscv: atomic: Add custom conditional atomic operation implementation
riscv: atomic: Optimize dec_if_positive functions
riscv: atomic: Cleanup unnecessary definition
RISC-V: Load purgatory in kexec_file
RISC-V: Add purgatory
RISC-V: Support for kexec_file on panic
RISC-V: Add kexec_file support
RISC-V: use memcpy for kexec_file mode
kexec_file: Fix kexec_file.c build error for riscv platform
riscv: compat: Add COMPAT Kbuild skeletal support
riscv: compat: ptrace: Add compat_arch_ptrace implement
riscv: compat: signal: Add rt_frame implementation
riscv: add memory-type errata for T-Head
...
- Add Tegra234 cpufreq support (Sumit Gupta).
- Clean up and enhance the Mediatek cpufreq driver (Wan Jiabing,
Rex-BC Chen, and Jia-Wei Chang).
- Fix up the CPPC cpufreq driver after recent changes (Zheng Bin,
Pierre Gondois).
- Minor update to dt-binding for Qcom's opp-v2-kryo-cpu (Yassine
Oudjana).
- Use list iterator only inside the list_for_each_entry loop (Xiaomeng
Tong, and Jakob Koschel).
- New APIs related to finding OPP based on interconnect bandwidth
(Krzysztof Kozlowski).
- Fix the missing of_node_put() in _bandwidth_supported() (Dan
Carpenter).
- Cleanups (Krzysztof Kozlowski, and Viresh Kumar).
- Add Out of Band mode description to the intel-speed-select utility
documentation (Srinivas Pandruvada).
- Add power sequences support to the system reboot and power off
code and make related platform-specific changes for multiple
platforms (Dmitry Osipenko, Geert Uytterhoeven).
-----BEGIN PGP SIGNATURE-----
iQJFBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmKU8lESHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRxVz0P91LNCbkDSt60jzNkXdEjsvUnI/YjJ+QJ
/+ta7iCwf90obb6s9soBkTyU8Ia7hJ/IWDJW/5xhdG0ySYF17hGNIGKK9xKGsJFK
tzzWtjFsvT3PeUZQERekqWp8OYskHYmQMj8o4jqqFF7DZD/AswTgkVLALUd7YhVL
UvLmcKsUA7eXy3ZrhtrGSzVSEbKOGXBLFyjy3IuWjfz6Uk/nGQRNKGf7byRWLM44
y7zb75/5+p4MPyyJP8M/uiXzEYDKuubRtfx9PdmLgBUSMbtho6eB1x47dZWooaxe
YKmcFjF80AmnwxHb+Te2rZHPeIYr+5hLBaEq7xaLQf/nAS3y5z1PIfI2wVQ5mXPz
D599jHHda/6oSAKCVTq2fKfnlR6fetm5j66xOQINpD+G5b5tNSpllXJDamFZxFgP
DiQAOFzdnRYnK7yTiLWVl1q76SVRxqsGz7/5Ak+NRj2OQK2wRkLzHuZfiV/8r0pk
ksi6Ew9TerXkstoTQsSToPQxB2VvosSajNU3Oy27pmM0oal1XxP0LIPz9sMor5/g
tfk5f6Yz/+FFIfXj3cZffZNdhsJgejmcqPdrSdCOV3sBrblnIMQNpHiYg4jGztoj
IjYKYPVpSaWiSZLQOaK2moTEvm9CfQz1TQCF+/Kz88LX6/7ZaDJFxHG2FDEob0sg
6KVbrZWweLI=
=PAh+
-----END PGP SIGNATURE-----
Merge tag 'pm-5.19-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull more power management updates from Rafael Wysocki:
"These update the ARM cpufreq drivers and fix up the CPPC cpufreq
driver after recent changes, update the OPP code and PM documentation
and add power sequences support to the system reboot and power off
code.
Specifics:
- Add Tegra234 cpufreq support (Sumit Gupta)
- Clean up and enhance the Mediatek cpufreq driver (Wan Jiabing,
Rex-BC Chen, and Jia-Wei Chang)
- Fix up the CPPC cpufreq driver after recent changes (Zheng Bin,
Pierre Gondois)
- Minor update to dt-binding for Qcom's opp-v2-kryo-cpu (Yassine
Oudjana)
- Use list iterator only inside the list_for_each_entry loop
(Xiaomeng Tong, and Jakob Koschel)
- New APIs related to finding OPP based on interconnect bandwidth
(Krzysztof Kozlowski)
- Fix the missing of_node_put() in _bandwidth_supported() (Dan
Carpenter)
- Cleanups (Krzysztof Kozlowski, and Viresh Kumar)
- Add Out of Band mode description to the intel-speed-select utility
documentation (Srinivas Pandruvada)
- Add power sequences support to the system reboot and power off code
and make related platform-specific changes for multiple platforms
(Dmitry Osipenko, Geert Uytterhoeven)"
* tag 'pm-5.19-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (60 commits)
cpufreq: CPPC: Fix unused-function warning
cpufreq: CPPC: Fix build error without CONFIG_ACPI_CPPC_CPUFREQ_FIE
Documentation: admin-guide: PM: Add Out of Band mode
kernel/reboot: Change registration order of legacy power-off handler
m68k: virt: Switch to new sys-off handler API
kernel/reboot: Add devm_register_restart_handler()
kernel/reboot: Add devm_register_power_off_handler()
soc/tegra: pmc: Use sys-off handler API to power off Nexus 7 properly
reboot: Remove pm_power_off_prepare()
regulator: pfuze100: Use devm_register_sys_off_handler()
ACPI: power: Switch to sys-off handler API
memory: emif: Use kernel_can_power_off()
mips: Use do_kernel_power_off()
ia64: Use do_kernel_power_off()
x86: Use do_kernel_power_off()
sh: Use do_kernel_power_off()
m68k: Switch to new sys-off handler API
powerpc: Use do_kernel_power_off()
xen/x86: Use do_kernel_power_off()
parisc: Use do_kernel_power_off()
...
- The majority of the changes are for fixes and clean ups.
Noticeable changes:
- Rework trace event triggers code to be easier to interact with.
- Support for embedding bootconfig with the kernel (as suppose to having it
embedded in initram). This is useful for embedded boards without initram
disks.
- Speed up boot by parallelizing the creation of tracefs files.
- Allow absolute ring buffer timestamps handle timestamps that use more than
59 bits.
- Added new tracing clock "TAI" (International Atomic Time)
- Have weak functions show up in available_filter_function list as:
__ftrace_invalid_address___<invalid-offset>
instead of using the name of the function before it.
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYpOgXRQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qjkKAQDbpemxvpFyJlZqT8KgEIXubu+ag2/q
p0XDHaPS0zF9OQEAjTxg6GMEbnFYl6fzxZtOoEbiaQ7ppfdhRI8t6sSMVA8=
=+nDD
-----END PGP SIGNATURE-----
Merge tag 'trace-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
"The majority of the changes are for fixes and clean ups.
Notable changes:
- Rework trace event triggers code to be easier to interact with.
- Support for embedding bootconfig with the kernel (as suppose to
having it embedded in initram). This is useful for embedded boards
without initram disks.
- Speed up boot by parallelizing the creation of tracefs files.
- Allow absolute ring buffer timestamps handle timestamps that use
more than 59 bits.
- Added new tracing clock "TAI" (International Atomic Time)
- Have weak functions show up in available_filter_function list as:
__ftrace_invalid_address___<invalid-offset> instead of using the
name of the function before it"
* tag 'trace-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (52 commits)
ftrace: Add FTRACE_MCOUNT_MAX_OFFSET to avoid adding weak function
tracing: Fix comments for event_trigger_separate_filter()
x86/traceponit: Fix comment about irq vector tracepoints
x86,tracing: Remove unused headers
ftrace: Clean up hash direct_functions on register failures
tracing: Fix comments of create_filter()
tracing: Disable kcov on trace_preemptirq.c
tracing: Initialize integer variable to prevent garbage return value
ftrace: Fix typo in comment
ftrace: Remove return value of ftrace_arch_modify_*()
tracing: Cleanup code by removing init "char *name"
tracing: Change "char *" string form to "char []"
tracing/timerlat: Do not wakeup the thread if the trace stops at the IRQ
tracing/timerlat: Print stacktrace in the IRQ handler if needed
tracing/timerlat: Notify IRQ new max latency only if stop tracing is set
kprobes: Fix build errors with CONFIG_KRETPROBES=n
tracing: Fix return value of trace_pid_write()
tracing: Fix potential double free in create_var_ref()
tracing: Use strim() to remove whitespace instead of doing it manually
ftrace: Deal with error return code of the ftrace_process_locs() function
...
If EFI pages are marked as read-only,
we should remove the _PAGE_WRITE flag.
The current code overwrites an unused value.
Fixes: b91540d52a ("RISC-V: Add EFI runtime services")
Signed-off-by: Heinrich Schuchardt <heinrich.schuchardt@canonical.com>
Link: https://lore.kernel.org/r/20220528014132.91052-1-heinrich.schuchardt@canonical.com
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
subsystems. Most notably some maintenance work in ocfs2 and initramfs.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCYo/6xQAKCRDdBJ7gKXxA
jkD9AQCPczLBbRWpe1edL+5VHvel9ePoHQmvbHQnufdTh9rB5QEAu0Uilxz4q9cx
xSZypNhj2n9f8FCYca/ZrZneBsTnAA8=
=AJEO
-----END PGP SIGNATURE-----
Merge tag 'mm-nonmm-stable-2022-05-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc updates from Andrew Morton:
"The non-MM patch queue for this merge window.
Not a lot of material this cycle. Many singleton patches against
various subsystems. Most notably some maintenance work in ocfs2
and initramfs"
* tag 'mm-nonmm-stable-2022-05-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (65 commits)
kcov: update pos before writing pc in trace function
ocfs2: dlmfs: fix error handling of user_dlm_destroy_lock
ocfs2: dlmfs: don't clear USER_LOCK_ATTACHED when destroying lock
fs/ntfs: remove redundant variable idx
fat: remove time truncations in vfat_create/vfat_mkdir
fat: report creation time in statx
fat: ignore ctime updates, and keep ctime identical to mtime in memory
fat: split fat_truncate_time() into separate functions
MAINTAINERS: add Muchun as a memcg reviewer
proc/sysctl: make protected_* world readable
ia64: mca: drop redundant spinlock initialization
tty: fix deadlock caused by calling printk() under tty_port->lock
relay: remove redundant assignment to pointer buf
fs/ntfs3: validate BOOT sectors_per_clusters
lib/string_helpers: fix not adding strarray to device's resource list
kernel/crash_core.c: remove redundant check of ck_cmdline
ELF, uapi: fixup ELF_ST_TYPE definition
ipc/mqueue: use get_tree_nodev() in mqueue_get_tree()
ipc: update semtimedop() to use hrtimer
ipc/sem: remove redundant assignments
...
All instances of the function ftrace_arch_modify_prepare() and
ftrace_arch_modify_post_process() return zero. There's no point in
checking their return value. Just have them be void functions.
Link: https://lkml.kernel.org/r/20220518023639.4065-1-kunyu@nfschina.com
Signed-off-by: Li kunyu <kunyu@nfschina.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
This fixes a handful of issues with the XIP support, which has bit
rotted some lately.
* palmer/riscv-xip:
RISC-V: Fix the XIP build
RISC-V: Split out the XIP fixups into their own file
RISC-V: ignore xipImage
RISC-V: Avoid empty create_*_mapping definitions
This was broken by the original refactoring (as the XIP definitions
depend on <asm/pgtable.h>) and then more broken by the merge (as I
accidentally took the old version). This fixes both breakages, while
also pulling this out of <asm/asm.h> to avoid polluting most assembly
files with the XIP fixups.
Fixes: bee7fbc385 ("RISC-V CPU Idle Support")
Fixes: 63b13e64a8 ("RISC-V: Add arch functions for non-retentive suspend entry/exit")
Link: https://lore.kernel.org/r/20220420184056.7886-4-palmer@rivosinc.com
Reviewed-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Currently on a 64-bit kernel built without CONFIG_MMU, /proc/cpuinfo will
show the current MMU mode as sv57.
While the device tree property "mmu-type" does have a value "riscv,none" to
describe a CPU without a MMU, since commit 73c7c8f68e ("riscv: Use
pgtable_l4_enabled to output mmu_type in cpuinfo"), we no longer rely on
device tree to output the MMU mode. (Not even for CONFIG_32BIT.)
Therefore, instead of readding code to look at the "mmu-type" device tree
property, let's continue with the existing convention to use fixed values
for configurations where we don't determine the MMU mode at runtime.
Add a new fixed value for !CONFIG_MMU in order to output the correct
MMU mode in cpuinfo.
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Link: https://lore.kernel.org/r/20220414173037.1381927-1-niklas.cassel@wdc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
This patch set implements kexec_file_load() for RISC-V, which is
currently only allowed on rv64 due to some minor build issues on 32-bit
platforms in the generic code. This allows users to kexec() using an FD
as opposed to a buffer.
Link: https://lore.kernel.org/all/20220408100914.150110-1-lizhengyu3@huawei.com/
* palmer/riscv-kexec_file:
RISC-V: Load purgatory in kexec_file
RISC-V: Add purgatory
RISC-V: Support for kexec_file on panic
RISC-V: Add kexec_file support
RISC-V: use memcpy for kexec_file mode
kexec_file: Fix kexec_file.c build error for riscv platform
This patch supports kexec_file to load and relocate purgatory.
It works well on riscv64 QEMU, being tested with devmem.
Signed-off-by: Li Zhengyu <lizhengyu3@huawei.com>
Link: https://lore.kernel.org/r/20220408100914.150110-7-lizhengyu3@huawei.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
This patch adds support for loading a kexec on panic (kdump) kernel.
It has been tested with vmcore-dmesg on riscv64 QEMU on both an smp
and a non-smp system.
Signed-off-by: Li Zhengyu <lizhengyu3@huawei.com>
Link: https://lore.kernel.org/r/20220408100914.150110-5-lizhengyu3@huawei.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
This patch adds support for kexec_file on RISC-V. I tested it on riscv64
QEMU with busybear-linux and single core along with the OpenSBI firmware
fw_jump.bin for generic platform.
On SMP system, it depends on CONFIG_{HOTPLUG_CPU, RISCV_SBI} to
resume/stop hart through OpenSBI firmware, it also needs a OpenSBI that
support the HSM extension.
Signed-off-by: Liao Chang <liaochang1@huawei.com>
Signed-off-by: Li Zhengyu <lizhengyu3@huawei.com>
Link: https://lore.kernel.org/r/20220408100914.150110-4-lizhengyu3@huawei.com
[Palmer: Make 64-bit only]
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
The pointer to buffer loading kernel binaries is in kernel space for
kexec_fil mode, When copy_from_user copies data from pointer to a block
of memory, it checkes that the pointer is in the user space range, on
RISCV-V that is:
static inline bool __access_ok(unsigned long addr, unsigned long size)
{
return size <= TASK_SIZE && addr <= TASK_SIZE - size;
}
and TASK_SIZE is 0x4000000000 for 64-bits, which now causes
copy_from_user to reject the access of the field 'buf' of struct
kexec_segment that is in range [CONFIG_PAGE_OFFSET - VMALLOC_SIZE,
CONFIG_PAGE_OFFSET), is invalid user space pointer.
This patch fixes this issue by skipping access_ok(), use mempcy() instead.
Signed-off-by: Liao Chang <liaochang1@huawei.com>
Link: https://lore.kernel.org/r/20220408100914.150110-3-lizhengyu3@huawei.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Kernel now supports chained power-off handlers. Use do_kernel_power_off()
that invokes chained power-off handlers. It also invokes legacy
pm_power_off() for now, which will be removed once all drivers will
be converted to the new sys-off API.
Acked-by: Palmer Dabbelt <palmer@dabbelt.com>
Reviewed-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The RISC-V port supports the rv32i and rv64i base ISAs, but provides no
mechanism to run 32-bit userspace on 64-bit systems. This adds that
support, via the COMPAT framework. As the RISC-V ISAs (and uABIs) were
developed concurrently, the resulting compat support is mostly generic.
This includes a handful of cleanups to the generic compat infrastructure
to more cleanly support RISC-V, followed by the RISC-V implementation.
* palmer/riscv-compat:
riscv: compat: Add COMPAT Kbuild skeletal support
riscv: compat: ptrace: Add compat_arch_ptrace implement
riscv: compat: signal: Add rt_frame implementation
riscv: compat: vdso: Add setup additional pages implementation
riscv: compat: vdso: Add COMPAT_VDSO base code implementation
riscv: compat: Add hw capability check for elf
riscv: compat: Add elf.h implementation
riscv: compat: process: Add UXL_32 support in start_thread
riscv: compat: syscall: Add entry.S implementation
riscv: compat: syscall: Add compat_sys_call_table implementation
riscv: compat: Support TASK_SIZE for compat mode
riscv: compat: Add basic compat data type implementation
riscv: Fixup difference with defconfig
syscalls: compat: Fix the missing part for __SYSCALL_COMPAT
asm-generic: compat: Cleanup duplicate definitions
fs: stat: compat: Add __ARCH_WANT_COMPAT_STAT
arch: Add SYSVIPC_COMPAT for all architectures
compat: consolidate the compat_flock{,64} definition
uapi: always define F_GETLK64/F_SETLK64/F_SETLKW64 in fcntl.h
uapi: simplify __ARCH_FLOCK{,64}_PAD a little
Implement compat_setup_rt_frame for sigcontext save & restore. The
main process is the same with signal, but the rv32 pt_regs' size
is different from rv64's, so we needs convert them.
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>
Tested-by: Heiko Stuebner <heiko@sntech.de>
Link: https://lore.kernel.org/r/20220405071314.3225832-19-guoren@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Some current cpus based on T-Head cores implement memory-types
way different than described in the svpbmt spec even going
so far as using PTE bits marked as reserved.
Add the T-Head vendor-id and necessary errata code to
replace the affected instructions.
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Tested-by: Samuel Holland <samuel@sholland.org>
Link: https://lore.kernel.org/r/20220511192921.2223629-13-heiko@sntech.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Right now the code uses a global struct to store vendor-ids
and another global variable to store the vendor-patch-function.
There exist specific cases where we'll need to patch the kernel
at an even earlier stage, where trying to write to a static
variable might actually result in hangs.
Also collecting the vendor-information consists of 3 sbi-ecalls
(or csr-reads) which is pretty negligible in the context of
booting a kernel.
So rework the code to not rely on static variables and instead
collect the vendor-information when a round of alternatives is
to be applied.
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Reviewed-by: Guo Ren <guoren@kernel.org>
Reviewed-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
Link: https://lore.kernel.org/r/20220511192921.2223629-12-heiko@sntech.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Svpbmt (the S should be capitalized) is the
"Supervisor-mode: page-based memory types" extension
that specifies attributes for cacheability, idempotency
and ordering.
The relevant settings are done in special bits in PTEs:
Here is the svpbmt PTE format:
| 63 | 62-61 | 60-8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0
N MT RSW D A G U X W R V
^
Of the Reserved bits [63:54] in a leaf PTE, the high bit is already
allocated (as the N bit), so bits [62:61] are used as the MT (aka
MemType) field. This field specifies one of three memory types that
are close equivalents (or equivalent in effect) to the three main x86
and ARMv8 memory types - as shown in the following table.
RISC-V
Encoding &
MemType RISC-V Description
---------- ------------------------------------------------
00 - PMA Normal Cacheable, No change to implied PMA memory type
01 - NC Non-cacheable, idempotent, weakly-ordered Main Memory
10 - IO Non-cacheable, non-idempotent, strongly-ordered I/O memory
11 - Rsvd Reserved for future standard use
As the extension will not be present on all implementations,
implement a method to handle cpufeatures via alternatives
to not incur runtime penalties on cpu variants not supporting
specific extensions and patch relevant code parts at runtime.
Co-developed-by: Wei Fu <wefu@redhat.com>
Signed-off-by: Wei Fu <wefu@redhat.com>
Co-developed-by: Liu Shaohua <liush@allwinnertech.com>
Signed-off-by: Liu Shaohua <liush@allwinnertech.com>
Co-developed-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Guo Ren <guoren@kernel.org>
[moved to use the alternatives mechanism]
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Reviewed-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
Link: https://lore.kernel.org/r/20220511192921.2223629-10-heiko@sntech.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Move the application of boot alternatives to after the hw-capabilities
are populated. This allows to check for available extensions when
determining which alternatives to apply and also makes it actually
work if CONFIG_SMP is disabled for whatever reason.
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
Reviewed-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20220511192921.2223629-8-heiko@sntech.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
This allows alternatives to also be applied when loading modules
and follows the implementation of other architectures (e.g. arm64).
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Reviewed-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
Link: https://lore.kernel.org/r/20220511192921.2223629-4-heiko@sntech.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Future features may need to be applied at a different
time during boot, so allow defining stages for alternatives
and handling them differently depending on the stage.
Also make the alternatives-location more flexible so that
future stages may provide their own location.
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Reviewed-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
Link: https://lore.kernel.org/r/20220511192921.2223629-3-heiko@sntech.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Right now the alternatives need to be explicitly enabled and
erratas are limited to SiFive ones.
We want to use alternatives not only for patching soc erratas,
but in the future also for handling different behaviour depending
on the existence of future extensions.
So move the core alternatives over to the kernel subdirectory
and move the CONFIG_RISCV_ALTERNATIVE to be a hidden symbol
which we expect relevant erratas and extensions to just select
if needed.
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Reviewed-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
Link: https://lore.kernel.org/r/20220511192921.2223629-2-heiko@sntech.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Add fn and fn_arg members into struct kernel_clone_args and test for
them in copy_thread (instead of testing for PF_KTHREAD | PF_IO_WORKER).
This allows any task that wants to be a user space task that only runs
in kernel mode to use this functionality.
The code on x86 is an exception and still retains a PF_KTHREAD test
because x86 unlikely everything else handles kthreads slightly
differently than user space tasks that start with a function.
The functions that created tasks that start with a function
have been updated to set ".fn" and ".fn_arg" instead of
".stack" and ".stack_size". These functions are fork_idle(),
create_io_thread(), kernel_thread(), and user_mode_thread().
Link: https://lkml.kernel.org/r/20220506141512.516114-4-ebiederm@xmission.com
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
With io_uring we have started supporting tasks that are for most
purposes user space tasks that exclusively run code in kernel mode.
The kernel task that exec's init and tasks that exec user mode
helpers are also user mode tasks that just run kernel code
until they call kernel execve.
Pass kernel_clone_args into copy_thread so these oddball
tasks can be supported more cleanly and easily.
v2: Fix spelling of kenrel_clone_args on h8300
Link: https://lkml.kernel.org/r/20220506141512.516114-2-ebiederm@xmission.com
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Patch series "Convert vmcore to use an iov_iter", v5.
For some reason several people have been sending bad patches to fix
compiler warnings in vmcore recently. Here's how it should be done.
Compile-tested only on x86. As noted in the first patch, s390 should take
this conversion a bit further, but I'm not inclined to do that work
myself.
This patch (of 3):
Instead of passing in a 'buf' and 'userbuf' argument, pass in an iov_iter.
s390 needs more work to pass the iov_iter down further, or refactor, but
I'd be more comfortable if someone who can test on s390 did that work.
It's more convenient to convert the whole of read_from_oldmem() to take an
iov_iter at the same time, so rename it to read_from_oldmem_iter() and add
a temporary read_from_oldmem() wrapper that creates an iov_iter.
Link: https://lkml.kernel.org/r/20220408090636.560886-1-bhe@redhat.com
Link: https://lkml.kernel.org/r/20220408090636.560886-2-bhe@redhat.com
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
There is no vgettimeofday supported in rv32 that makes simple to
generate rv32 vdso code which only needs riscv64 compiler. Other
architectures need change compiler or -m (machine parameter) to
support vdso32 compiling. If rv32 support vgettimeofday (which
cause C compile) in future, we would add CROSS_COMPILE to support
that makes more requirement on compiler enviornment.
linux-rv64/arch/riscv/kernel/compat_vdso/compat_vdso.so.dbg:
file format elf64-littleriscv
Disassembly of section .text:
0000000000000800 <__vdso_rt_sigreturn>:
800: 08b00893 li a7,139
804: 00000073 ecall
808: 0000 unimp
...
000000000000080c <__vdso_getcpu>:
80c: 0a800893 li a7,168
810: 00000073 ecall
814: 8082 ret
...
0000000000000818 <__vdso_flush_icache>:
818: 10300893 li a7,259
81c: 00000073 ecall
820: 8082 ret
linux-rv32/arch/riscv/kernel/vdso/vdso.so.dbg:
file format elf32-littleriscv
Disassembly of section .text:
00000800 <__vdso_rt_sigreturn>:
800: 08b00893 li a7,139
804: 00000073 ecall
808: 0000 unimp
...
0000080c <__vdso_getcpu>:
80c: 0a800893 li a7,168
810: 00000073 ecall
814: 8082 ret
...
00000818 <__vdso_flush_icache>:
818: 10300893 li a7,259
81c: 00000073 ecall
820: 8082 ret
Finally, reuse all *.S from vdso in compat_vdso that makes
implementation clear and readable.
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Tested-by: Heiko Stuebner <heiko@sntech.de>
Link: https://lore.kernel.org/r/20220405071314.3225832-17-guoren@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Detect hardware COMPAT (32bit U-mode) capability in rv64. If not
support COMPAT mode in hw, compat_elf_check_arch would return
false by compat_binfmt_elf.c
Add CLASS to enhance (compat_)elf_check_arch to distinguish
32BIT/64BIT elf.
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Tested-by: Heiko Stuebner <heiko@sntech.de>
Link: https://lore.kernel.org/r/20220405071314.3225832-16-guoren@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
If the current task is in COMPAT mode, set SR_UXL_32 in status for
returning userspace. We need CONFIG _COMPAT to prevent compiling
errors with rv32 defconfig.
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Tested-by: Heiko Stuebner <heiko@sntech.de>
Link: https://lore.kernel.org/r/20220405071314.3225832-14-guoren@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
These patch_text implementations are using stop_machine_cpuslocked
infrastructure with atomic cpu_count. The original idea: When the
master CPU patch_text, the others should wait for it. But current
implementation is using the first CPU as master, which couldn't
guarantee the remaining CPUs are waiting. This patch changes the
last CPU as the master to solve the potential risk.
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Fixes: 043cb41a85 ("riscv: introduce interfaces to patch kernel code")
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
This has a handful of new features
* Support for CURRENT_STACK_POINTER, which enables some extra stack
debugging for HARDENED_USERCOPY.
* Support for the new SBI CPU idle extension, via cpuidle and suspend
drivers.
* Profiling has been enabled in the defconfigs.
but is mostly fixes and cleanups.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEAM520YNJYN/OiG3470yhUCzLq0EFAmJHFvoTHHBhbG1lckBk
YWJiZWx0LmNvbQAKCRDvTKFQLMurQUTQD/0Qzp1SQ7hGNBvg5mQKtP91xV4AU9Aa
db7UmeYXdey7KsCmQsb/6TVS7LOb+NGghC3XDjTs0Ts4YHnZOjDb5DVPxYcog8bc
a2h1ZOtWr0IiZwjve66tIhBOvXh/lllXEriNOjqG3qIz/uySqoIJfzEFakZRvzN5
PPXthpm7yMVl4wz37BhQQDE1s5yXrI31lmDp+k9jF/lyNfW+rHGUwbrG2Ur/p/Yx
+kNNuYhEVs8utkzedVELSbyMPjMKodQB5iOIdDw9iNIQi6e8BFVjPLnpXlcNEvmL
PSEt3POkMXWORMiuOD6WWTr0+z5BZjL8x5KhDse3KNg5xt7ExzoAY+WVDBU2eO+D
WvpXfGPVOKnaH9an7Rjrxa4VqEXYxPSaYmUu9yztQ1nRV4QXlAMol8fw/0ToJ8TH
yCMo75byy5q5OZayuj3QksiCHngcm9Q8s3KFWvMjuN0WSoJH1bT7aC/wJ+dBdpuq
3f0i7xAfktDbSqLXoEp72Msv42Mggy5+VPbAzhYAdRBcNfi6mt9afodS0CaL9nCo
3X/lLWRlL712cKyInB2tJ/sMrZpRAuRWHs/Q1ZNopmyYxHN0Of6qKtIvwreSbbLB
sWOIy5w/chpQYffFoBxcJMGEa8ArrEABV/FNWHrKTLUpJhN5Jas1gGgMCtTCBR2R
pkSudiPXV9e0qA==
=vbXd
-----END PGP SIGNATURE-----
Merge tag 'riscv-for-linus-5.18-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull more RISC-V updates from Palmer Dabbelt:
"This has a handful of new features:
- Support for CURRENT_STACK_POINTER, which enables some extra stack
debugging for HARDENED_USERCOPY.
- Support for the new SBI CPU idle extension, via cpuidle and suspend
drivers.
- Profiling has been enabled in the defconfigs.
but is mostly fixes and cleanups"
* tag 'riscv-for-linus-5.18-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (21 commits)
RISC-V: K210 defconfigs: Drop redundant MEMBARRIER=n
RISC-V: defconfig: Drop redundant SBI HVC and earlycon
Documentation: riscv: remove non-existent directory from table of contents
riscv: cpu.c: don't use kernel-doc markers for comments
RISC-V: Enable profiling by default
RISC-V: module: fix apply_r_riscv_rcv_branch_rela typo
RISC-V: Declare per cpu boot data as static
RISC-V: Fix a comment typo in riscv_of_parent_hartid()
riscv: Increase stack size under KASAN
riscv: Fix fill_callchain return value
riscv: dts: canaan: Fix SPI3 bus width
riscv: Rename "sp_in_global" to "current_stack_pointer"
riscv module: remove (NOLOAD)
RISC-V: Enable RISC-V SBI CPU Idle driver for QEMU virt machine
dt-bindings: Add common bindings for ARM and RISC-V idle states
cpuidle: Add RISC-V SBI CPU idle driver
cpuidle: Factor-out power domain related code from PSCI domain driver
RISC-V: Add SBI HSM suspend related defines
RISC-V: Add arch functions for non-retentive suspend entry/exit
RISC-V: Rename relocate() and make it global
...
Repair kernel-doc build warnings caused by using "/**" kernel-doc
markers for comments that are not in kernel-doc format:
cpu.c:89: warning: cannot understand function prototype: 'struct riscv_isa_ext_data isa_ext_arr[] = '
cpu.c:114: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
This function name was spelled incorrectly, likely to do a typo.
Signed-off-by: Wu Caize <zepan@sipeed.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
The per cpu boot data is only used within the cpu_ops_sbi.c. It can
be delcared as static.
Fixes: 9a2451f186 ("RISC-V: Avoid using per cpu array for ordered booting")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
This fixes a typo in a comment that is both obvious and went unnoticed.
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Fixes: a9b202606c ("RISC-V: Improve /proc/cpuinfo output for ISA extensions")
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
This series adds RISC-V CPU Idle support using SBI HSM suspend function.
The RISC-V SBI CPU idle driver added by this series is highly inspired
from the ARM PSCI CPU idle driver.
Special thanks Sandeep Tripathy for providing early feeback on SBI HSM
support in all above projects (RISC-V SBI specification, OpenSBI, and
Linux RISC-V).
* palmer/riscv-idle:
RISC-V: Enable RISC-V SBI CPU Idle driver for QEMU virt machine
dt-bindings: Add common bindings for ARM and RISC-V idle states
cpuidle: Add RISC-V SBI CPU idle driver
cpuidle: Factor-out power domain related code from PSCI domain driver
RISC-V: Add SBI HSM suspend related defines
RISC-V: Add arch functions for non-retentive suspend entry/exit
RISC-V: Rename relocate() and make it global
RISC-V: Enable CPU_IDLE drivers
To follow the existing per-arch conventions, rename "sp_in_global" to
"current_stack_pointer". This will let it be used in non-arch places
(like HARDENED_USERCOPY).
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
This set of changes removes tracehook.h, moves modification of all of
the ptrace fields inside of siglock to remove races, adds a missing
permission check to ptrace.c
The removal of tracehook.h is quite significant as it has been a major
source of confusion in recent years. Much of that confusion was
around task_work and TIF_NOTIFY_SIGNAL (which I have now decoupled
making the semantics clearer).
For people who don't know tracehook.h is a vestiage of an attempt to
implement uprobes like functionality that was never fully merged, and
was later superseeded by uprobes when uprobes was merged. For many
years now we have been removing what tracehook functionaly a little
bit at a time. To the point where now anything left in tracehook.h is
some weird strange thing that is difficult to understand.
Eric W. Biederman (15):
ptrace: Move ptrace_report_syscall into ptrace.h
ptrace/arm: Rename tracehook_report_syscall report_syscall
ptrace: Create ptrace_report_syscall_{entry,exit} in ptrace.h
ptrace: Remove arch_syscall_{enter,exit}_tracehook
ptrace: Remove tracehook_signal_handler
task_work: Remove unnecessary include from posix_timers.h
task_work: Introduce task_work_pending
task_work: Call tracehook_notify_signal from get_signal on all architectures
task_work: Decouple TIF_NOTIFY_SIGNAL and task_work
signal: Move set_notify_signal and clear_notify_signal into sched/signal.h
resume_user_mode: Remove #ifdef TIF_NOTIFY_RESUME in set_notify_resume
resume_user_mode: Move to resume_user_mode.h
tracehook: Remove tracehook.h
ptrace: Move setting/clearing ptrace_message into ptrace_stop
ptrace: Return the signal to continue with from ptrace_stop
Jann Horn (1):
ptrace: Check PTRACE_O_SUSPEND_SECCOMP permission on PTRACE_SEIZE
Yang Li (1):
ptrace: Remove duplicated include in ptrace.c
MAINTAINERS | 1 -
arch/Kconfig | 5 +-
arch/alpha/kernel/ptrace.c | 5 +-
arch/alpha/kernel/signal.c | 4 +-
arch/arc/kernel/ptrace.c | 5 +-
arch/arc/kernel/signal.c | 4 +-
arch/arm/kernel/ptrace.c | 12 +-
arch/arm/kernel/signal.c | 4 +-
arch/arm64/kernel/ptrace.c | 14 +--
arch/arm64/kernel/signal.c | 4 +-
arch/csky/kernel/ptrace.c | 5 +-
arch/csky/kernel/signal.c | 4 +-
arch/h8300/kernel/ptrace.c | 5 +-
arch/h8300/kernel/signal.c | 4 +-
arch/hexagon/kernel/process.c | 4 +-
arch/hexagon/kernel/signal.c | 1 -
arch/hexagon/kernel/traps.c | 6 +-
arch/ia64/kernel/process.c | 4 +-
arch/ia64/kernel/ptrace.c | 6 +-
arch/ia64/kernel/signal.c | 1 -
arch/m68k/kernel/ptrace.c | 5 +-
arch/m68k/kernel/signal.c | 4 +-
arch/microblaze/kernel/ptrace.c | 5 +-
arch/microblaze/kernel/signal.c | 4 +-
arch/mips/kernel/ptrace.c | 5 +-
arch/mips/kernel/signal.c | 4 +-
arch/nds32/include/asm/syscall.h | 2 +-
arch/nds32/kernel/ptrace.c | 5 +-
arch/nds32/kernel/signal.c | 4 +-
arch/nios2/kernel/ptrace.c | 5 +-
arch/nios2/kernel/signal.c | 4 +-
arch/openrisc/kernel/ptrace.c | 5 +-
arch/openrisc/kernel/signal.c | 4 +-
arch/parisc/kernel/ptrace.c | 7 +-
arch/parisc/kernel/signal.c | 4 +-
arch/powerpc/kernel/ptrace/ptrace.c | 8 +-
arch/powerpc/kernel/signal.c | 4 +-
arch/riscv/kernel/ptrace.c | 5 +-
arch/riscv/kernel/signal.c | 4 +-
arch/s390/include/asm/entry-common.h | 1 -
arch/s390/kernel/ptrace.c | 1 -
arch/s390/kernel/signal.c | 5 +-
arch/sh/kernel/ptrace_32.c | 5 +-
arch/sh/kernel/signal_32.c | 4 +-
arch/sparc/kernel/ptrace_32.c | 5 +-
arch/sparc/kernel/ptrace_64.c | 5 +-
arch/sparc/kernel/signal32.c | 1 -
arch/sparc/kernel/signal_32.c | 4 +-
arch/sparc/kernel/signal_64.c | 4 +-
arch/um/kernel/process.c | 4 +-
arch/um/kernel/ptrace.c | 5 +-
arch/x86/kernel/ptrace.c | 1 -
arch/x86/kernel/signal.c | 5 +-
arch/x86/mm/tlb.c | 1 +
arch/xtensa/kernel/ptrace.c | 5 +-
arch/xtensa/kernel/signal.c | 4 +-
block/blk-cgroup.c | 2 +-
fs/coredump.c | 1 -
fs/exec.c | 1 -
fs/io-wq.c | 6 +-
fs/io_uring.c | 11 +-
fs/proc/array.c | 1 -
fs/proc/base.c | 1 -
include/asm-generic/syscall.h | 2 +-
include/linux/entry-common.h | 47 +-------
include/linux/entry-kvm.h | 2 +-
include/linux/posix-timers.h | 1 -
include/linux/ptrace.h | 81 ++++++++++++-
include/linux/resume_user_mode.h | 64 ++++++++++
include/linux/sched/signal.h | 17 +++
include/linux/task_work.h | 5 +
include/linux/tracehook.h | 226 -----------------------------------
include/uapi/linux/ptrace.h | 2 +-
kernel/entry/common.c | 19 +--
kernel/entry/kvm.c | 9 +-
kernel/exit.c | 3 +-
kernel/livepatch/transition.c | 1 -
kernel/ptrace.c | 47 +++++---
kernel/seccomp.c | 1 -
kernel/signal.c | 62 +++++-----
kernel/task_work.c | 4 +-
kernel/time/posix-cpu-timers.c | 1 +
mm/memcontrol.c | 2 +-
security/apparmor/domain.c | 1 -
security/selinux/hooks.c | 1 -
85 files changed, 372 insertions(+), 495 deletions(-)
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEgjlraLDcwBA2B+6cC/v6Eiajj0AFAmJCQkoACgkQC/v6Eiaj
j0DCWQ/5AZVFU+hX32obUNCLackHTwgcCtSOs3JNBmNA/zL/htPiYYG0ghkvtlDR
Dw5J5DnxC6P7PVAdAqrpvx2uX2FebHYU0bRlyLx8LYUEP5dhyNicxX9jA882Z+vw
Ud0Ue9EojwGWS76dC9YoKUj3slThMATbhA2r4GVEoof8fSNJaBxQIqath44t0FwU
DinWa+tIOvZANGBZr6CUUINNIgqBIZCH/R4h6ArBhMlJpuQ5Ufk2kAaiWFwZCkX4
0LuuAwbKsCKkF8eap5I2KrIg/7zZVgxAg9O3cHOzzm8OPbKzRnNnQClcDe8perqp
S6e/f3MgpE+eavd1EiLxevZ660cJChnmikXVVh8ZYYoefaMKGqBaBSsB38bNcLjY
3+f2dB+TNBFRnZs1aCujK3tWBT9QyjZDKtCBfzxDNWBpXGLhHH6j6lA5Lj+Cef5K
/HNHFb+FuqedlFZh5m1Y+piFQ70hTgCa2u8b+FSOubI2hW9Zd+WzINV0ANaZ2LvZ
4YGtcyDNk1q1+c87lxP9xMRl/xi6rNg+B9T2MCo4IUnHgpSVP6VEB3osgUmrrrN0
eQlUI154G/AaDlqXLgmn1xhRmlPGfmenkxpok1AuzxvNJsfLKnpEwQSc13g3oiZr
disZQxNY0kBO2Nv3G323Z6PLinhbiIIFez6cJzK5v0YJ2WtO3pY=
=uEro
-----END PGP SIGNATURE-----
Merge tag 'ptrace-cleanups-for-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull ptrace cleanups from Eric Biederman:
"This set of changes removes tracehook.h, moves modification of all of
the ptrace fields inside of siglock to remove races, adds a missing
permission check to ptrace.c
The removal of tracehook.h is quite significant as it has been a major
source of confusion in recent years. Much of that confusion was around
task_work and TIF_NOTIFY_SIGNAL (which I have now decoupled making the
semantics clearer).
For people who don't know tracehook.h is a vestiage of an attempt to
implement uprobes like functionality that was never fully merged, and
was later superseeded by uprobes when uprobes was merged. For many
years now we have been removing what tracehook functionaly a little
bit at a time. To the point where anything left in tracehook.h was
some weird strange thing that was difficult to understand"
* tag 'ptrace-cleanups-for-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
ptrace: Remove duplicated include in ptrace.c
ptrace: Check PTRACE_O_SUSPEND_SECCOMP permission on PTRACE_SEIZE
ptrace: Return the signal to continue with from ptrace_stop
ptrace: Move setting/clearing ptrace_message into ptrace_stop
tracehook: Remove tracehook.h
resume_user_mode: Move to resume_user_mode.h
resume_user_mode: Remove #ifdef TIF_NOTIFY_RESUME in set_notify_resume
signal: Move set_notify_signal and clear_notify_signal into sched/signal.h
task_work: Decouple TIF_NOTIFY_SIGNAL and task_work
task_work: Call tracehook_notify_signal from get_signal on all architectures
task_work: Introduce task_work_pending
task_work: Remove unnecessary include from posix_timers.h
ptrace: Remove tracehook_signal_handler
ptrace: Remove arch_syscall_{enter,exit}_tracehook
ptrace: Create ptrace_report_syscall_{entry,exit} in ptrace.h
ptrace/arm: Rename tracehook_report_syscall report_syscall
ptrace: Move ptrace_report_syscall into ptrace.h
* Support for Sv57-based virtual memory.
* Various improvements for the MicroChip PolarFire SOC and the
associated Icicle dev board, which should allow upstream kernels to
boot without any additional modifications.
* An improved memmove() implementation.
* Support for the new Ssconfpmf and SBI PMU extensions, which allows for
a much more useful perf implementation on RISC-V systems.
* Support for restartable sequences.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmI96FcTHHBhbG1lckBk
YWJiZWx0LmNvbQAKCRAuExnzX7sYiQBFD/425+6xmoOru6Wiki3Ja0fqQToNrQyW
IbmE/8AxUP7UxMvJSNzvQm8deXgklzvmegXCtnjwZZins971vMzzDSI83k/zn8I7
m5thVC9z01BjodV+pvIp/44hS6FesolOLzkVHksX0Zh6h0iidrc34Qf5HrqvvNfN
CZ/4K1+E9ig5r9qZp4WdvocCXj+FzwF/30GjKoW9vwA599CEG/dCo+TNN9GKD6XS
k+xiUGwlIRA+kCLSPFCi7ev9XPr1tCmQB7uB8Igcvr7Y3mWl8HKfajQVXBnXNRC3
ifbDxpx1elJiLPyf7Rza8jIDwDhLQdxBiwPgDgP9h9R4x0uF4efq8PzLzFlFmaE+
9Z9thfykBb5dXYDFDje9bAOXvKnGk7Iqxdsz0qWo/ChEQawX1+11bJb0TNN8QTT9
YvlQfUXgb1dmEcj5yG2uVE1Y8L7YNLRMsZU3W3FbmPJZoavSOuU4x0yCGeLyv597
76af3nuBJ5v80Db97gu6St+HIACeevKflsZUf/8GS/p7d1DlvmrWzQUMEycxPTG9
UZpZak58jh7AqQ9JbLnavhwmeacY50vpZOw6QHGAHSN+8daCPlOHDG7Ver7Z+kNj
+srJ7iKMvLnnaEjGNgavfxdqTOme1gv4LWs/JdHYMkpphqVN92xBDJnhXTPRVZiQ
0x39vK86qtB46A==
=Omc6
-----END PGP SIGNATURE-----
Merge tag 'riscv-for-linus-5.18-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V updates from Palmer Dabbelt:
- Support for Sv57-based virtual memory.
- Various improvements for the MicroChip PolarFire SOC and the
associated Icicle dev board, which should allow upstream kernels to
boot without any additional modifications.
- An improved memmove() implementation.
- Support for the new Ssconfpmf and SBI PMU extensions, which allows
for a much more useful perf implementation on RISC-V systems.
- Support for restartable sequences.
* tag 'riscv-for-linus-5.18-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (36 commits)
rseq/selftests: Add support for RISC-V
RISC-V: Add support for restartable sequence
MAINTAINERS: Add entry for RISC-V PMU drivers
Documentation: riscv: Remove the old documentation
RISC-V: Add sscofpmf extension support
RISC-V: Add perf platform driver based on SBI PMU extension
RISC-V: Add RISC-V SBI PMU extension definitions
RISC-V: Add a simple platform driver for RISC-V legacy perf
RISC-V: Add a perf core library for pmu drivers
RISC-V: Add CSR encodings for all HPMCOUNTERS
RISC-V: Remove the current perf implementation
RISC-V: Improve /proc/cpuinfo output for ISA extensions
RISC-V: Do no continue isa string parsing without correct XLEN
RISC-V: Implement multi-letter ISA extension probing framework
RISC-V: Extract multi-letter extension names from "riscv, isa"
RISC-V: Minimal parser for "riscv, isa" strings
RISC-V: Correctly print supported extensions
riscv: Fixed misaligned memory access. Fixed pointer comparison.
MAINTAINERS: update riscv/microchip entry
riscv: dts: microchip: add new peripherals to icicle kit device tree
...
- Proper emulation of the OSLock feature of the debug architecture
- Scalibility improvements for the MMU lock when dirty logging is on
- New VMID allocator, which will eventually help with SVA in VMs
- Better support for PMUs in heterogenous systems
- PSCI 1.1 support, enabling support for SYSTEM_RESET2
- Implement CONFIG_DEBUG_LIST at EL2
- Make CONFIG_ARM64_ERRATUM_2077057 default y
- Reduce the overhead of VM exit when no interrupt is pending
- Remove traces of 32bit ARM host support from the documentation
- Updated vgic selftests
- Various cleanups, doc updates and spelling fixes
RISC-V:
- Prevent KVM_COMPAT from being selected
- Optimize __kvm_riscv_switch_to() implementation
- RISC-V SBI v0.3 support
s390:
- memop selftest
- fix SCK locking
- adapter interruptions virtualization for secure guests
- add Claudio Imbrenda as maintainer
- first step to do proper storage key checking
x86:
- Continue switching kvm_x86_ops to static_call(); introduce
static_call_cond() and __static_call_ret0 when applicable.
- Cleanup unused arguments in several functions
- Synthesize AMD 0x80000021 leaf
- Fixes and optimization for Hyper-V sparse-bank hypercalls
- Implement Hyper-V's enlightened MSR bitmap for nested SVM
- Remove MMU auditing
- Eager splitting of page tables (new aka "TDP" MMU only) when dirty
page tracking is enabled
- Cleanup the implementation of the guest PGD cache
- Preparation for the implementation of Intel IPI virtualization
- Fix some segment descriptor checks in the emulator
- Allow AMD AVIC support on systems with physical APIC ID above 255
- Better API to disable virtualization quirks
- Fixes and optimizations for the zapping of page tables:
- Zap roots in two passes, avoiding RCU read-side critical sections
that last too long for very large guests backed by 4 KiB SPTEs.
- Zap invalid and defunct roots asynchronously via concurrency-managed
work queue.
- Allowing yielding when zapping TDP MMU roots in response to the root's
last reference being put.
- Batch more TLB flushes with an RCU trick. Whoever frees the paging
structure now holds RCU as a proxy for all vCPUs running in the guest,
i.e. to prolongs the grace period on their behalf. It then kicks the
the vCPUs out of guest mode before doing rcu_read_unlock().
Generic:
- Introduce __vcalloc and use it for very large allocations that
need memcg accounting
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmI4fdwUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroMq8gf/WoeVHtw2QlL5Mmz6McvRRmPAYPLV
wLUIFNrRqRvd8Tw4kivzZoh/xTpwmnojv0YdK5SjKAiMjgv094YI1LrNp1JSPvmL
pitocMkA10RSJNWHeEMg9cMSKH0rKiqeYl6S1e2XsdB+UZZ2BINOCVtvglmjTAvJ
dFBdKdBkqjAUZbdXAGIvz4JEEER3N/LkFDKGaUGX+0QIQOzGBPIyLTxynxIDG6mt
RViCCFyXdy5NkVp5hZFm96vQ2qAlWL9B9+iKruQN++82+oqWbeTdSqPhdwF7GyFz
BfOv3gobQ2c4ef/aMLO5LswZ9joI1t/4kQbbAn6dNybpOAz/NXfDnbNefg==
=keox
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
"ARM:
- Proper emulation of the OSLock feature of the debug architecture
- Scalibility improvements for the MMU lock when dirty logging is on
- New VMID allocator, which will eventually help with SVA in VMs
- Better support for PMUs in heterogenous systems
- PSCI 1.1 support, enabling support for SYSTEM_RESET2
- Implement CONFIG_DEBUG_LIST at EL2
- Make CONFIG_ARM64_ERRATUM_2077057 default y
- Reduce the overhead of VM exit when no interrupt is pending
- Remove traces of 32bit ARM host support from the documentation
- Updated vgic selftests
- Various cleanups, doc updates and spelling fixes
RISC-V:
- Prevent KVM_COMPAT from being selected
- Optimize __kvm_riscv_switch_to() implementation
- RISC-V SBI v0.3 support
s390:
- memop selftest
- fix SCK locking
- adapter interruptions virtualization for secure guests
- add Claudio Imbrenda as maintainer
- first step to do proper storage key checking
x86:
- Continue switching kvm_x86_ops to static_call(); introduce
static_call_cond() and __static_call_ret0 when applicable.
- Cleanup unused arguments in several functions
- Synthesize AMD 0x80000021 leaf
- Fixes and optimization for Hyper-V sparse-bank hypercalls
- Implement Hyper-V's enlightened MSR bitmap for nested SVM
- Remove MMU auditing
- Eager splitting of page tables (new aka "TDP" MMU only) when dirty
page tracking is enabled
- Cleanup the implementation of the guest PGD cache
- Preparation for the implementation of Intel IPI virtualization
- Fix some segment descriptor checks in the emulator
- Allow AMD AVIC support on systems with physical APIC ID above 255
- Better API to disable virtualization quirks
- Fixes and optimizations for the zapping of page tables:
- Zap roots in two passes, avoiding RCU read-side critical
sections that last too long for very large guests backed by 4
KiB SPTEs.
- Zap invalid and defunct roots asynchronously via
concurrency-managed work queue.
- Allowing yielding when zapping TDP MMU roots in response to the
root's last reference being put.
- Batch more TLB flushes with an RCU trick. Whoever frees the
paging structure now holds RCU as a proxy for all vCPUs running
in the guest, i.e. to prolongs the grace period on their behalf.
It then kicks the the vCPUs out of guest mode before doing
rcu_read_unlock().
Generic:
- Introduce __vcalloc and use it for very large allocations that need
memcg accounting"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (246 commits)
KVM: use kvcalloc for array allocations
KVM: x86: Introduce KVM_CAP_DISABLE_QUIRKS2
kvm: x86: Require const tsc for RT
KVM: x86: synthesize CPUID leaf 0x80000021h if useful
KVM: x86: add support for CPUID leaf 0x80000021
KVM: x86: do not use KVM_X86_OP_OPTIONAL_RET0 for get_mt_mask
Revert "KVM: x86/mmu: Zap only TDP MMU leafs in kvm_zap_gfn_range()"
kvm: x86/mmu: Flush TLB before zap_gfn_range releases RCU
KVM: arm64: fix typos in comments
KVM: arm64: Generalise VM features into a set of flags
KVM: s390: selftests: Add error memop tests
KVM: s390: selftests: Add more copy memop tests
KVM: s390: selftests: Add named stages for memop test
KVM: s390: selftests: Add macro as abstraction for MEM_OP
KVM: s390: selftests: Split memop tests
KVM: s390x: fix SCK locking
RISC-V: KVM: Implement SBI HSM suspend call
RISC-V: KVM: Add common kvm_riscv_vcpu_wfi() function
RISC-V: Add SBI HSM suspend related defines
RISC-V: KVM: Implement SBI v0.3 SRST extension
...
There are three sets of updates for 5.18 in the asm-generic tree:
- The set_fs()/get_fs() infrastructure gets removed for good. This
was already gone from all major architectures, but now we can
finally remove it everywhere, which loses some particularly
tricky and error-prone code.
There is a small merge conflict against a parisc cleanup, the
solution is to use their new version.
- The nds32 architecture ends its tenure in the Linux kernel. The
hardware is still used and the code is in reasonable shape, but
the mainline port is not actively maintained any more, as all
remaining users are thought to run vendor kernels that would never
be updated to a future release.
There are some obvious conflicts against changes to the removed
files.
- A series from Masahiro Yamada cleans up some of the uapi header
files to pass the compile-time checks.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAmI69BsACgkQmmx57+YA
GNn/zA//f4d5VTT0ThhRxRWTu9BdThGHoB8TUcY7iOhbsWu0X/913NItRC3UeWNl
IdmisaXgVtirg1dcC2pWUmrcHdoWOCEGfK4+Zr2NhSWfuZDWvODHK9pGWk4WLnhe
cQgUNBvIuuAMryGtrOBwHPO4TpfCyy2ioeVP36ZfcsWXdDxTrqfaq/56mk3sxIP6
sUTk1UEjut9NG4C9xIIvcSU50R3l6LryQE/H9kyTLtaSvfvTOvprcVYCq0GPmSzo
DtQ1Wwa9zbJ+4EqoMiP5RrgQwWvOTg2iRByLU8ytwlX3e/SEF0uihvMv1FQbL8zG
G8RhGUOKQSEhaBfc3lIkm8GpOVPh0uHzB6zhn7daVmAWtazRD2Nu59BMjipa+ims
a8Z58iHH7jRAnKeEkVZqXKb1CEiUxaQx/IeVPzN4QlwMhDtwrI76LY7ZJ1zCqTGY
ENG0yRLav1XselYBslOYXGtOEWcY5EZPWqLyWbp4P9vz2g0Fe0gZxoIOvPmNQc89
QnfXpCt7vm/DGkyO255myu08GOLeMkisVqUIzLDB9avlym5mri7T7vk9abBa2YyO
CRpTL5gl1/qKPWuH1UI5mvhT+sbbBE2SUHSuy84btns39ZKKKynwCtdu+hSQkKLE
h9pV30Gf1cLTD4JAE0RWlUgOmbBLVp34loTOexQj4MrLM1noOnw=
=vtCN
-----END PGP SIGNATURE-----
Merge tag 'asm-generic-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull asm-generic updates from Arnd Bergmann:
"There are three sets of updates for 5.18 in the asm-generic tree:
- The set_fs()/get_fs() infrastructure gets removed for good.
This was already gone from all major architectures, but now we can
finally remove it everywhere, which loses some particularly tricky
and error-prone code. There is a small merge conflict against a
parisc cleanup, the solution is to use their new version.
- The nds32 architecture ends its tenure in the Linux kernel.
The hardware is still used and the code is in reasonable shape, but
the mainline port is not actively maintained any more, as all
remaining users are thought to run vendor kernels that would never
be updated to a future release.
- A series from Masahiro Yamada cleans up some of the uapi header
files to pass the compile-time checks"
* tag 'asm-generic-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: (27 commits)
nds32: Remove the architecture
uaccess: remove CONFIG_SET_FS
ia64: remove CONFIG_SET_FS support
sh: remove CONFIG_SET_FS support
sparc64: remove CONFIG_SET_FS support
lib/test_lockup: fix kernel pointer check for separate address spaces
uaccess: generalize access_ok()
uaccess: fix type mismatch warnings from access_ok()
arm64: simplify access_ok()
m68k: fix access_ok for coldfire
MIPS: use simpler access_ok()
MIPS: Handle address errors for accesses above CPU max virtual user address
uaccess: add generic __{get,put}_kernel_nofault
nios2: drop access_ok() check from __put_user()
x86: use more conventional access_ok() definition
x86: remove __range_not_ok()
sparc64: add __{get,put}_kernel_nofault()
nds32: fix access_ok() checks in get/put_user
uaccess: fix nios2 and microblaze get_user_8()
sparc64: fix building assembly files
...
Add support for RSEQ, restartable sequences, for RISC-V. This also adds
support for the related selftests.
Note: the selftests require a linker with 3e7bd7f2414 ("RISC-V: Fix
linker problems with tls copy relocs."), which was first released in
2.33 (from 2019).
* palmer/riscv-rseq:
rseq/selftests: Add support for RISC-V
RISC-V: Add support for restartable sequence
... and call node_dev_init() after memory_dev_init() from driver_init(),
so before any of the existing arch/subsys calls. All online nodes should
be known at that point: early during boot, arch code determines node and
zone ranges and sets the relevant nodes online; usually this happens in
setup_arch().
This is in line with memory_dev_init(), which initializes the memory
device subsystem and creates all memory block devices.
Similar to memory_dev_init(), panic() if anything goes wrong, we don't
want to continue with such basic initialization errors.
The important part is that node_dev_init() gets called after
memory_dev_init() and after cpu_dev_init(), but before any of the relevant
archs call register_cpu() to register the new cpu device under the node
device. The latter should be the case for the current users of
topology_init().
Link: https://lkml.kernel.org/r/20220203105212.30385-1-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Tested-by: Anatoly Pugachev <matorola@gmail.com> (sparc64)
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add calls to rseq_signal_deliver() and rseq_syscall() to introduce RSEQ
support.
1. Call the rseq_signal_deliver() function to fixup on the pre-signal
frame when a signal is delivered on top of a restartable sequence
critical section.
2. Check that system calls are not invoked from within rseq critical
sections by invoking rseq_signal() from ret_from_syscall(). With
CONFIG_DEBUG_RSEQ, such behavior results in termination of the
process with SIGSEGV.
Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
This series improves perf support for RISC-V based system using SBI PMU
and Sscofpmf extensions, by adding a new generic RISC-V perf framework
along with a pair of drivers: one that usese the new
performance-monitoring extensions and one that keeps support for the
existing systems that only have the legacy counters.
Tested-by: Nikita Shubin <n.shubin@yadro.com>
* palmer/riscv-pmu:
MAINTAINERS: Add entry for RISC-V PMU drivers
Documentation: riscv: Remove the old documentation
RISC-V: Add sscofpmf extension support
RISC-V: Add perf platform driver based on SBI PMU extension
RISC-V: Add RISC-V SBI PMU extension definitions
RISC-V: Add a simple platform driver for RISC-V legacy perf
RISC-V: Add a perf core library for pmu drivers
RISC-V: Add CSR encodings for all HPMCOUNTERS
RISC-V: Remove the current perf implementation
The sscofpmf extension allows counter overflow and filtering for
programmable counters. Enable the perf driver to handle the overflow
interrupt. The overflow interrupt is a hart local interrupt.
Thus, per cpu overflow interrupts are setup as a child under the root
INTC irq domain.
Signed-off-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
The current perf implementation in RISC-V is not very useful as it can not
count any events other than cycle/instructions. Moreover, perf record
can not be used or the events can not be started or stopped.
Remove the implementation now for a better platform driver in future
that will implement most of the missing functionality.
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
This series implements a generic framework to parse multi-letter ISA
extensions.
* palmer/riscv-isa:
RISC-V: Improve /proc/cpuinfo output for ISA extensions
RISC-V: Do no continue isa string parsing without correct XLEN
RISC-V: Implement multi-letter ISA extension probing framework
RISC-V: Extract multi-letter extension names from "riscv, isa"
RISC-V: Minimal parser for "riscv, isa" strings
RISC-V: Correctly print supported extensions
Currently, the /proc/cpuinfo outputs the entire riscv,isa string which
is not ideal when we have multiple ISA extensions present in the ISA
string. Some of them may not be enabled in kernel as well.
Same goes for the single letter extensions as well which prints the
entire ISA string. Some of they may not be valid ISA extensions as
well (e.g 'su')
Parse only the valid & enabled ISA extension and print them.
Tested-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
The isa string should begin with either rv64 or rv32. Otherwise, it is
an incorrect isa string. Currently, the string parsing continues even if
it doesnot begin with current XLEN.
Fix this by checking if it found "rv64" or "rv32" in the beginning.
Tested-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Multi-letter extensions can be probed using exising
riscv_isa_extension_available API now. It doesn't support versioning
right now as there is no use case for it.
Individual extension specific implementation will be added during
each extension support.
Tested-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Currently, there is no usage for version numbers in extensions as
any ratified non base ISA extension will always at v1.0.
Extract the extension names in place for future parsing.
Tested-by: Heiko Stuebner <heiko@sntech.de>
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Tsukasa OI <research_trasio@irq.a4lg.com>
[Improved commit text and comments]
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Current hart ISA ("riscv,isa") parser don't correctly parse:
1. Multi-letter extensions
2. Version numbers
All ISA extensions ratified recently has multi-letter extensions
(except 'H'). The current "riscv,isa" parser that is easily confused
by multi-letter extensions and "p" in version numbers can be a huge
problem for adding new extensions through the device tree.
Leaving it would create incompatible hacks and would make "riscv,isa"
value unreliable.
This commit implements minimal parser for "riscv,isa" strings. With this,
we can safely ignore multi-letter extensions and version numbers.
[Improved commit text and fixed a bug around 's' in base extension]
Signed-off-by: Atish Patra <atishp@rivosinc.com>
[Fixed workaround for QEMU]
Signed-off-by: Tsukasa OI <research_trasio@irq.a4lg.com>
Tested-by: Heiko Stuebner <heiko@sntech.de>
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
This commit replaces BITS_PER_LONG with number of alphabet letters.
Current ISA pretty-printing code expects extension 'a' (bit 0) through
'z' (bit 25). Although bit 26 and higher is not currently used (thus never
cause an issue in practice), it will be an annoying problem if we start to
use those in the future.
This commit disables printing high bits for now.
Reviewed-by: Anup Patel <anup@brainfault.org>
Tested-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Tsukasa OI <research_trasio@irq.a4lg.com>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
We add defines related to SBI HSM suspend call and also update HSM states
naming as-per the latest SBI specification.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
RISC-V can do PC-relative jumps with a 32bit range using the following
two instructions:
auipc t0, imm20 ; t0 = PC + imm20 * 2^12
jalr ra, t0, imm12 ; ra = PC + 4, PC = t0 + imm12
Crucially both the 20bit immediate imm20 and the 12bit immediate imm12
are treated as two's-complement signed values. For this reason the
immediates are usually calculated like this:
imm20 = (offset + 0x800) >> 12
imm12 = offset & 0xfff
..where offset is the signed offset from the auipc instruction. When
the 11th bit of offset is 0 the addition of 0x800 doesn't change the top
20 bits and imm12 considered positive. When the 11th bit is 1 the carry
of the addition by 0x800 means imm20 is one higher, but since imm12 is
then considered negative the two's complement representation means it
all cancels out nicely.
However, this addition by 0x800 (2^11) means an offset greater than or
equal to 2^31 - 2^11 would overflow so imm20 is considered negative and
result in a backwards jump. Similarly the lower range of offset is also
moved down by 2^11 and hence the true 32bit range is
[-2^31 - 2^11, 2^31 - 2^11)
Signed-off-by: Emil Renner Berthing <kernel@esmil.dk>
Fixes: e2c0cdfba7 ("RISC-V: User-facing API")
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Move set_notify_resume and tracehook_notify_resume into resume_user_mode.h.
While doing that rename tracehook_notify_resume to resume_user_mode_work.
Update all of the places that included tracehook.h for these functions to
include resume_user_mode.h instead.
Update all of the callers of tracehook_notify_resume to call
resume_user_mode_work.
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lkml.kernel.org/r/20220309162454.123006-12-ebiederm@xmission.com
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Rename tracehook_report_syscall_{entry,exit} to
ptrace_report_syscall_{entry,exit} and place them in ptrace.h
There is no longer any generic tracehook infractructure so make
these ptrace specific functions ptrace specific.
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lkml.kernel.org/r/20220309162454.123006-3-ebiederm@xmission.com
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
The hart registers and CSRs are not preserved in non-retentative
suspend state so we provide arch specific helper functions which
will save/restore hart context upon entry/exit to non-retentive
suspend state. These helper functions can be used by cpuidle
drivers for non-retentive suspend entry/exit.
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
The low-level relocate() function enables mmu and relocates
execution to link-time addresses. We rename relocate() function
to relocate_enable_mmu() function which is more informative.
Also, the relocate_enable_mmu() function will be used in the
resume path when a CPU wakes-up from a non-retentive suspend
so we make it global symbol.
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Guo Ren <guoren@kernel.org>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
We force select CPU_PM and provide asm/cpuidle.h so that we can
use CPU IDLE drivers for Linux RISC-V kernel.
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Signed-off-by: Anup Patel <apatel@vetanamicro.com>
Reviewed-by: Guo Ren <guoren@kernel.org>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
On some architectures, access_ok() does not do any argument type
checking, so replacing the definition with a generic one causes
a few warnings for harmless issues that were never caught before.
Fix the ones that I found either through my own test builds or
that were reported by the 0-day bot.
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Dinh Nguyen <dinguyen@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
This patch sets sv57 on defaultly if CONFIG_64BIT. And do fallback to try
to set sv48 on boot time if sv57 is not supported in current hardware.
Signed-off-by: Qinglin Pan <panqinglin2020@iscas.ac.cn>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
If the boot CPU does not have the lowest hartid, "hartid - hbase" can
become negative, leading to an incorrect hmask, causing userspace to
crash with SEGV. This is observed on e.g. Starlight Beta, where cpuid 1
maps to hartid 0, and cpuid 0 maps to hartid 1.
Fix this by detecting this case, and shifting the accumulated mask and
updating hbase, if possible.
Fixes: 26fb751ca3 ("RISC-V: Do not use cpumask data structure for hartid bitmap")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Tested-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
The cpumask rework slightly changed the behavior of the code. Fix this
by treating an empty cpumask as meaning all online CPUs.
Extracted from a patch by Atish Patra <atishp@rivosinc.com>.
Reported-by: Jessica Clarke <jrtc27@jrtc27.com>
Fixes: 26fb751ca3 ("RISC-V: Do not use cpumask data structure for hartid bitmap")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Jessica reports that using "1 << hartid" causes undefined behavior for
hartid 31 and up.
Fix this by using the BIT() helper instead of an explicit shift.
Reported-by: Jessica Clarke <jrtc27@jrtc27.com>
Fixes: 26fb751ca3 ("RISC-V: Do not use cpumask data structure for hartid bitmap")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>