This cosmetic change that is needed for audit support, specifically to
be able to filter according to cross-execution boundaries.
Optimize current_check_access_socket() to only handle the access
request.
Remove explicit domain->num_layers check which is now part of the
landlock_get_applicable_subject() call.
Cc: Günther Noack <gnoack@google.com>
Link: https://lore.kernel.org/r/20250320190717.2287696-6-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
This cosmetic change is needed for audit support, specifically to be
able to filter according to cross-execution boundaries.
Add landlock_get_applicable_subject(), mainly a copy of
landlock_get_applicable_domain(), which will fully replace it in a
following commit.
Optimize current_check_access_path() to only handle the access request.
Partially replace get_current_fs_domain() with explicit calls to
landlock_get_applicable_subject(). The remaining ones will follow with
more changes.
Remove explicit domain->num_layers check which is now part of the
landlock_get_applicable_subject() call.
Cc: Günther Noack <gnoack@google.com>
Link: https://lore.kernel.org/r/20250320190717.2287696-5-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Create a new domain.h file containing the struct landlock_hierarchy
definition and helpers. This type will grow with audit support. This
also prepares for a new domain type.
Cc: Günther Noack <gnoack@google.com>
Link: https://lore.kernel.org/r/20250320190717.2287696-4-mic@digikod.net
Reviewed-by: Günther Noack <gnoack3000@gmail.com>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Landlock IDs can be generated to uniquely identify Landlock objects.
For now, only Landlock domains get an ID at creation time. These IDs
map to immutable domain hierarchies.
Landlock IDs have important properties:
- They are unique during the lifetime of the running system thanks to
the 64-bit values: at worse, 2^60 - 2*2^32 useful IDs.
- They are always greater than 2^32 and must then be stored in 64-bit
integer types.
- The initial ID (at boot time) is randomly picked between 2^32 and
2^33, which limits collisions in logs across different boots.
- IDs are sequential, which enables users to order them.
- IDs may not be consecutive but increase with a random 2^4 step, which
limits side channels.
Such IDs can be exposed to unprivileged processes, even if it is not the
case with this audit patch series. The domain IDs will be useful for
user space to identify sandboxes and get their properties.
These Landlock IDs are more secure that other absolute kernel IDs such
as pipe's inodes which rely on a shared global counter.
For checkpoint/restore features (i.e. CRIU), we could easily implement a
privileged interface (e.g. sysfs) to set the next ID counter.
IDR/IDA are not used because we only need a bijection from Landlock
objects to Landlock IDs, and we must not recycle IDs. This enables us
to identify all Landlock objects during the lifetime of the system (e.g.
in logs), but not to access an object from an ID nor know if an ID is
assigned. Using a counter is simpler, it scales (i.e. avoids growing
memory footprint), and it does not require locking. We'll use proper
file descriptors (with IDs used as inode numbers) to access Landlock
objects.
Cc: Günther Noack <gnoack@google.com>
Cc: Paul Moore <paul@paul-moore.com>
Link: https://lore.kernel.org/r/20250320190717.2287696-3-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Extract code from dump_common_audit_data() into the audit_log_lsm_data()
helper. This helps reuse common LSM audit data while not abusing
AUDIT_AVC records because of the common_lsm_audit() helper.
Depends-on: 7ccbe076d9 ("lsm: Only build lsm_audit.c if CONFIG_SECURITY and CONFIG_AUDIT are set")
Cc: Casey Schaufler <casey@schaufler-ca.com>
Cc: James Morris <jmorris@namei.org>
Cc: Serge E. Hallyn <serge@hallyn.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Link: https://lore.kernel.org/r/20250320190717.2287696-2-mic@digikod.net
Reviewed-by: Günther Noack <gnoack3000@gmail.com>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Because Linux credentials are managed per thread, user space relies on
some hack to synchronize credential update across threads from the same
process. This is required by the Native POSIX Threads Library and
implemented by set*id(2) wrappers and libcap(3) to use tgkill(2) to
synchronize threads. See nptl(7) and libpsx(3). Furthermore, some
runtimes like Go do not enable developers to have control over threads
[1].
To avoid potential issues, and because threads are not security
boundaries, let's relax the Landlock (optional) signal scoping to always
allow signals sent between threads of the same process. This exception
is similar to the __ptrace_may_access() one.
hook_file_set_fowner() now checks if the target task is part of the same
process as the caller. If this is the case, then the related signal
triggered by the socket will always be allowed.
Scoping of abstract UNIX sockets is not changed because kernel objects
(e.g. sockets) should be tied to their creator's domain at creation
time.
Note that creating one Landlock domain per thread puts each of these
threads (and their future children) in their own scope, which is
probably not what users expect, especially in Go where we do not control
threads. However, being able to drop permissions on all threads should
not be restricted by signal scoping. We are working on a way to make it
possible to atomically restrict all threads of a process with the same
domain [2].
Add erratum for signal scoping.
Closes: https://github.com/landlock-lsm/go-landlock/issues/36
Fixes: 54a6e6bbf3 ("landlock: Add signal scoping")
Fixes: c899496501 ("selftests/landlock: Test signal scoping for threads")
Depends-on: 26f204380a ("fs: Fix file_set_fowner LSM hook inconsistencies")
Link: https://pkg.go.dev/kernel.org/pub/linux/libs/security/libcap/psx [1]
Link: https://github.com/landlock-lsm/linux/issues/2 [2]
Cc: Günther Noack <gnoack@google.com>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Serge Hallyn <serge@hallyn.com>
Cc: Tahera Fahimi <fahimitahera@gmail.com>
Cc: stable@vger.kernel.org
Acked-by: Christian Brauner <brauner@kernel.org>
Link: https://lore.kernel.org/r/20250318161443.279194-6-mic@digikod.net
[mic: Add extra pointer check and RCU guard, and ease backport]
Signed-off-by: Mickaël Salaün <mic@digikod.net>
-----BEGIN PGP SIGNATURE-----
iQJLBAABCAA1FiEEC+9tH1YyUwIQzUIeOKUVfIxDyBEFAmfi1Z4XHGNhc2V5QHNj
aGF1Zmxlci1jYS5jb20ACgkQOKUVfIxDyBGPlRAAua8w+rpw06Njdi8LNDytiMil
Z8a1dbF2yupLclaydLuwxOvJbdjNcCzEVDDFNYEfh6iBiXsaoqr1FO13mmdcjon7
NXPscvhBWbb0dXvPEk74upWk2HRUIS0BI3tofY1YaGJLdWZRM2z6qvHcb4jNkM89
o+wzzxme7HNwez4SpB+vmm8WZ4AH3rX7q0ihf3XOpplvxeKwob3ZCu+nNQe0AXMI
FjMXPn148O96UKJgYJFMV1rLzWmYXzrrDBvzYS79walyti9ct1SYKRY4Zs80MiJK
0mQGDpg60PFUldcsnBD9Pp5nifcLKg5nNUWb15AhRw1sR4wnKl4DxCA4032NYB/x
wuVW2rsRjPDuzD6XGS16FZDv86qSyxPtJmvk3qKJmiIQVRc1xhJWAH57A+UzgnLx
UGvIcIoJjifTX4EUZXdCdH44RBBs/tjCbrBKROPnKBPbkLHNxAzW/Z65hLvxHgg8
f0vFuSNAL5dCNRdspWgE5BayM0RoW/HZbY15/O+oc7hDgs2ORgKPtayt4uJZr1km
PhxG3/9BWRjga44RRwIGCsxzqVdO0l5FYYqh+AYnzTOz8S+kncmcPHbnWOYVU0aH
9u5jE46TCmRxVnEaco+1dhGBUZaFgjXpPv+PxT+LZgTVHLbhBuONCEIot8ejFOKc
TomnIzxqMacRAlURNXM=
=SUpV
-----END PGP SIGNATURE-----
Merge tag 'Smack-for-6.15' of https://github.com/cschaufler/smack-next
Pull smack updates from Casey Schaufler:
"This is a larger set of patches than usual, consisting of a set of
build clean-ups, a rework of error handling in setting up CIPSO label
specification and a bug fix in network labeling"
* tag 'Smack-for-6.15' of https://github.com/cschaufler/smack-next:
smack: recognize ipv4 CIPSO w/o categories
smack: Revert "smackfs: Added check catlen"
smack: remove /smack/logging if audit is not configured
smack: ipv4/ipv6: tcp/dccp/sctp: fix incorrect child socket label
smack: dont compile ipv6 code unless ipv6 is configured
Smack: fix typos and spelling errors
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmfgWewUHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXNTXA/9F7Fo5ov6mP15jChSSZWuVPBdi1gD
y8Q8sCbu/KeCRO1Qb4QTv8ZCVGkP+EDK47IIvLXj27Aa19y1m3E4r1mddCSBQ3eu
jSqR/kOXf3j8AWPP2m4qYK/EJvNqNd/V67PkktFal+95crcmz3IDV68qWuNafdSc
r8VuprrEw+NSuKhPh4e2tM0hvOmAzePuvI6gGPb9z7Fj807/qfSOteAkvYpJ1y+d
vZzHLeu3FRExxu4wKZZymGpT2+5Xl/MrjRJUtKuJdxXW8FphPUr5cfHDIP0Ae97w
J70RGr0Oy02dQnCtAMkOGi7lpS1S1r0Qnhr+eloQQvG7J2eRRPZqGrmaU69qopAo
JY/Xc7/r29pGwGnXtiHKZ4ej65mTIN9bmPsHIjjr01hiB/gEUnX2vdVSwVYLxOsF
dzCnXb1VBc4mSIJ1Sjst0a6CRNPVA3U/bCfCbvfeyhn6A0XHmJI1PDRbxEXavnki
sQIAtLv5M0Pyzyjij+6qHfd8TsUgiH/rtR6st31SnL5iqIWkE9wPMFldg064vHgS
8dECnF7G9ZU/OErJjTQVshJE3fDEJvbQj8YIq7u1gQOZV02G7U3q4R3Aoj3GoSKJ
dMjoeG18+yuIevW/OHWtbjp4QMpp2R4xuXaJJlfsB2OaOX6jSS4S5KpYO3eKQ/Jd
kNQxuG8VD3tK8jc=
=QD7q
-----END PGP SIGNATURE-----
Merge tag 'selinux-pr-20250323' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull selinux updates from Paul Moore:
- Add additional SELinux access controls for kernel file reads/loads
The SELinux kernel file read/load access controls were never updated
beyond the initial kernel module support, this pull request adds
support for firmware, kexec, policies, and x.509 certificates.
- Add support for wildcards in network interface names
There are a number of userspace tools which auto-generate network
interface names using some pattern of <XXXX>-<NN> where <XXXX> is a
fixed string, e.g. "podman", and <NN> is a increasing counter.
Supporting wildcards in the SELinux policy for network interfaces
simplifies the policy associted with these interfaces.
- Fix a potential problem in the kernel read file SELinux code
SELinux should always check the file label in the
security_kernel_read_file() LSM hook, regardless of if the file is
being read in chunks. Unfortunately, the existing code only
considered the file label on the first chunk; this pull request fixes
this problem.
There is more detail in the individual commit, but thankfully the
existing code didn't expose a bug due to multi-stage reads only
taking place in one driver, and that driver loading a file type that
isn't targeted by the SELinux policy.
- Fix the subshell error handling in the example policy loader
Minor fix to SELinux example policy loader in scripts/selinux due to
an undesired interaction with subshells and errexit.
* tag 'selinux-pr-20250323' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: get netif_wildcard policycap from policy instead of cache
selinux: support wildcard network interface names
selinux: Chain up tool resolving errors in install_policy.sh
selinux: add permission checks for loading other kinds of kernel files
selinux: always check the file label in selinux_kernel_read_file()
selinux: fix spelling error
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmfgWgMUHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXNW5RAAvCDq5gBtY0aTNlULe637EVLSh+t8
PkSzHzu/NlzU6BfjtwSm2fuML8welTGxSwUPxUzMCI91gPdkGeFktefavT3xa+QI
BHWROn7fEJ/KmRZvngPeIkgLr5xhF5nBJmc/Jw71qem20zRzNgJnpzMX16d10Phx
dxd2xOO1qM3bv6Z9RcIssZRGaN+PHngpWWg+0B69XuaBUso87S6NDyKNn1XPmvoz
as96k+Wk/xAZGVEeCbs/+H5rBx6DLg+FfTRa06Oh4BFsqedpkDPxLrTgCJGJkA0H
dsK6O/993zvjx0Jn4ZPoJ9n35S82BmkCsz4bGq1xVl6FYUiMcm3/8yO41wllS+w4
j+RlTU/RIdB7n8EKyMMl1hj1stTvt3Bi9F5Cbf7ZEv0snfR00K4KVpi17jnFjUHv
kpOiEtXZb/NGQip7UAuUq0PisfqbiO4jJurYHRetDgv1WCy6+C8ufM5t6I+cnvmG
VG+dlxcW+rDIn6bLRVuGi9TJRsQ6eox9ipa+qEKNNiOXgftELcgT7m74nAS5m0uv
n5rDa221nPXecEB0X7d6YUFk711lly90dbelNeLrmv1w6jl8L1PpS1oBaW+UzGu9
46eGBd6pzu9otvK9WVyDEdotDOCrgH0sd7pTetqDhLJZ7KrGwyyqO2gD/JroUKcC
lnxBQwPnat86iI8=
=oxfV
-----END PGP SIGNATURE-----
Merge tag 'lsm-pr-20250323' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm
Pull lsm updates from Paul Moore:
- Various minor updates to the LSM Rust bindings
Changes include marking trivial Rust bindings as inlines and comment
tweaks to better reflect the LSM hooks.
- Add LSM/SELinux access controls to io_uring_allowed()
Similar to the io_uring_disabled sysctl, add a LSM hook to
io_uring_allowed() to enable LSMs a simple way to enforce security
policy on the use of io_uring. This pull request includes SELinux
support for this new control using the io_uring/allowed permission.
- Remove an unused parameter from the security_perf_event_open() hook
The perf_event_attr struct parameter was not used by any currently
supported LSMs, remove it from the hook.
- Add an explicit MAINTAINERS entry for the credentials code
We've seen problems in the past where patches to the credentials code
sent by non-maintainers would often languish on the lists for
multiple months as there was no one explicitly tasked with the
responsibility of reviewing and/or merging credentials related code.
Considering that most of the code under security/ has a vested
interest in ensuring that the credentials code is well maintained,
I'm volunteering to look after the credentials code and Serge Hallyn
has also volunteered to step up as an official reviewer. I posted the
MAINTAINERS update as a RFC to LKML in hopes that someone else would
jump up with an "I'll do it!", but beyond Serge it was all crickets.
- Update Stephen Smalley's old email address to prevent confusion
This includes a corresponding update to the mailmap file.
* tag 'lsm-pr-20250323' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
mailmap: map Stephen Smalley's old email addresses
lsm: remove old email address for Stephen Smalley
MAINTAINERS: add Serge Hallyn as a credentials reviewer
MAINTAINERS: add an explicit credentials entry
cred,rust: mark Credential methods inline
lsm,rust: reword "destroy" -> "release" in SecurityCtx
lsm,rust: mark SecurityCtx methods inline
perf: Remove unnecessary parameter of security check
lsm: fix a missing security_uring_allowed() prototype
io_uring,lsm,selinux: add LSM hooks for io_uring_setup()
io_uring: refactor io_uring_allowed()
- loadpin: remove unsupported MODULE_COMPRESS_NONE (Arulpandiyan Vadivel)
- samples/check-exec: Fix script name (Mickaël Salaün)
- yama: remove needless locking in yama_task_prctl() (Oleg Nesterov)
- lib/string_choices: Sort by function name (R Sundar)
- hardening: Allow default HARDENED_USERCOPY to be set at compile time
(Mel Gorman)
- uaccess: Split out compile-time checks into ucopysize.h
- kbuild: clang: Support building UM with SUBARCH=i386
- x86: Enable i386 FORTIFY_SOURCE on Clang 16+
- ubsan/overflow: Rework integer overflow sanitizer option
- Add missing __nonstring annotations for callers of memtostr*()/strtomem*()
- Add __must_be_noncstr() and have memtostr*()/strtomem*() check for it
- Introduce __nonstring_array for silencing future GCC 15 warnings
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRSPkdeREjth1dHnSE2KwveOeQkuwUCZ9hGrgAKCRA2KwveOeQk
u1WvAQC3ZxFu3b0Omfmht2pPqCltf2UOQNvUx3egjoeXpUaNSgD+Lxr/T4xksy7E
jHh7rCYDkruOWs3DHA5JjRQcf0BBLQo=
=FTQp
-----END PGP SIGNATURE-----
Merge tag 'hardening-v6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardening updates from Kees Cook:
"As usual, it's scattered changes all over. Patches touching things
outside of our traditional areas in the tree have been Acked by
maintainers or were trivial changes:
- loadpin: remove unsupported MODULE_COMPRESS_NONE (Arulpandiyan
Vadivel)
- samples/check-exec: Fix script name (Mickaël Salaün)
- yama: remove needless locking in yama_task_prctl() (Oleg Nesterov)
- lib/string_choices: Sort by function name (R Sundar)
- hardening: Allow default HARDENED_USERCOPY to be set at compile
time (Mel Gorman)
- uaccess: Split out compile-time checks into ucopysize.h
- kbuild: clang: Support building UM with SUBARCH=i386
- x86: Enable i386 FORTIFY_SOURCE on Clang 16+
- ubsan/overflow: Rework integer overflow sanitizer option
- Add missing __nonstring annotations for callers of
memtostr*()/strtomem*()
- Add __must_be_noncstr() and have memtostr*()/strtomem*() check for
it
- Introduce __nonstring_array for silencing future GCC 15 warnings"
* tag 'hardening-v6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (26 commits)
compiler_types: Introduce __nonstring_array
hardening: Enable i386 FORTIFY_SOURCE on Clang 16+
x86/build: Remove -ffreestanding on i386 with GCC
ubsan/overflow: Enable ignorelist parsing and add type filter
ubsan/overflow: Enable pattern exclusions
ubsan/overflow: Rework integer overflow sanitizer option to turn on everything
samples/check-exec: Fix script name
yama: don't abuse rcu_read_lock/get_task_struct in yama_task_prctl()
kbuild: clang: Support building UM with SUBARCH=i386
loadpin: remove MODULE_COMPRESS_NONE as it is no longer supported
lib/string_choices: Rearrange functions in sorted order
string.h: Validate memtostr*()/strtomem*() arguments more carefully
compiler.h: Introduce __must_be_noncstr()
nilfs2: Mark on-disk strings as nonstring
uapi: stddef.h: Introduce __kernel_nonstring
x86/tdx: Mark message.bytes as nonstring
string: kunit: Mark nonstring test strings as __nonstring
scsi: qla2xxx: Mark device strings as nonstring
scsi: mpt3sas: Mark device strings as nonstring
scsi: mpi3mr: Mark device strings as nonstring
...
Use the "struct" keyword in kernel-doc when describing struct
ipefs_file. Add kernel-doc for the struct members also.
Don't use kernel-doc notation for 'policy_subdir'. kernel-doc does
not support documentation comments for data definitions.
This eliminates multiple kernel-doc warnings:
security/ipe/policy_fs.c:21: warning: cannot understand function prototype: 'struct ipefs_file '
security/ipe/policy_fs.c:407: warning: cannot understand function prototype: 'const struct ipefs_file policy_subdir[] = '
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Fan Wu <wufan@kernel.org>
Cc: Paul Moore <paul@paul-moore.com>
Cc: James Morris <jmorris@namei.org>
Cc: Serge E. Hallyn <serge@hallyn.com>
Cc: linux-security-module@vger.kernel.org
Signed-off-by: Fan Wu <wufan@kernel.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZ90rNwAKCRCRxhvAZXjc
onBJAP9Z8Ywmlb5KQ1E3HvDmkwyY6yOSyZ9/CmbzrkCJ8ywYkQD/d9/xt0EP/O/q
N8YtzXArHWt7u0YbcVpy9WK3F72BdwU=
=VJgY
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.15-rc1.async.dir' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs async dir updates from Christian Brauner:
"This contains cleanups that fell out of the work from async directory
handling:
- Change kern_path_locked() and user_path_locked_at() to never return
a negative dentry. This simplifies the usability of these helpers
in various places
- Drop d_exact_alias() from the remaining place in NFS where it is
still used. This also allows us to drop the d_exact_alias() helper
completely
- Drop an unnecessary call to fh_update() from nfsd_create_locked()
- Change i_op->mkdir() to return a struct dentry
Change vfs_mkdir() to return a dentry provided by the filesystems
which is hashed and positive. This allows us to reduce the number
of cases where the resulting dentry is not positive to very few
cases. The code in these places becomes simpler and easier to
understand.
- Repack DENTRY_* and LOOKUP_* flags"
* tag 'vfs-6.15-rc1.async.dir' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
doc: fix inline emphasis warning
VFS: Change vfs_mkdir() to return the dentry.
nfs: change mkdir inode_operation to return alternate dentry if needed.
fuse: return correct dentry for ->mkdir
ceph: return the correct dentry on mkdir
hostfs: store inode in dentry after mkdir if possible.
Change inode_operations.mkdir to return struct dentry *
nfsd: drop fh_update() from S_IFDIR branch of nfsd_create_locked()
nfs/vfs: discard d_exact_alias()
VFS: add common error checks to lookup_one_qstr_excl()
VFS: change kern_path_locked() and user_path_locked_at() to never return negative dentry
VFS: repack LOOKUP_ bit flags.
VFS: repack DENTRY_ flags.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZ90qAwAKCRCRxhvAZXjc
on7lAP0akpIsJMWREg9tLwTNTySI1b82uKec0EAgM6T7n/PYhAD/T4zoY8UYU0Pr
qCxwTXHUVT6bkNhjREBkfqq9OkPP8w8=
=GxeN
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.15-rc1.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs mount updates from Christian Brauner:
- Mount notifications
The day has come where we finally provide a new api to listen for
mount topology changes outside of /proc/<pid>/mountinfo. A mount
namespace file descriptor can be supplied and registered with
fanotify to listen for mount topology changes.
Currently notifications for mount, umount and moving mounts are
generated. The generated notification record contains the unique
mount id of the mount.
The listmount() and statmount() api can be used to query detailed
information about the mount using the received unique mount id.
This allows userspace to figure out exactly how the mount topology
changed without having to generating diffs of /proc/<pid>/mountinfo
in userspace.
- Support O_PATH file descriptors with FSCONFIG_SET_FD in the new mount
api
- Support detached mounts in overlayfs
Since last cycle we support specifying overlayfs layers via file
descriptors. However, we don't allow detached mounts which means
userspace cannot user file descriptors received via
open_tree(OPEN_TREE_CLONE) and fsmount() directly. They have to
attach them to a mount namespace via move_mount() first.
This is cumbersome and means they have to undo mounts via umount().
Allow them to directly use detached mounts.
- Allow to retrieve idmappings with statmount
Currently it isn't possible to figure out what idmapping has been
attached to an idmapped mount. Add an extension to statmount() which
allows to read the idmapping from the mount.
- Allow creating idmapped mounts from mounts that are already idmapped
So far it isn't possible to allow the creation of idmapped mounts
from already idmapped mounts as this has significant lifetime
implications. Make the creation of idmapped mounts atomic by allow to
pass struct mount_attr together with the open_tree_attr() system call
allowing to solve these issues without complicating VFS lookup in any
way.
The system call has in general the benefit that creating a detached
mount and applying mount attributes to it becomes an atomic operation
for userspace.
- Add a way to query statmount() for supported options
Allow userspace to query which mount information can be retrieved
through statmount().
- Allow superblock owners to force unmount
* tag 'vfs-6.15-rc1.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (21 commits)
umount: Allow superblock owners to force umount
selftests: add tests for mount notification
selinux: add FILE__WATCH_MOUNTNS
samples/vfs: fix printf format string for size_t
fs: allow changing idmappings
fs: add kflags member to struct mount_kattr
fs: add open_tree_attr()
fs: add copy_mount_setattr() helper
fs: add vfs_open_tree() helper
statmount: add a new supported_mask field
samples/vfs: add STATMOUNT_MNT_{G,U}IDMAP
selftests: add tests for using detached mount with overlayfs
samples/vfs: check whether flag was raised
statmount: allow to retrieve idmappings
uidgid: add map_id_range_up()
fs: allow detached mounts in clone_private_mount()
selftests/overlayfs: test specifying layers as O_PATH file descriptors
fs: support O_PATH fds with FSCONFIG_SET_FD
vfs: add notifications for mount attach and detach
fanotify: notify on mount attach and detach
...
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZ90p4AAKCRCRxhvAZXjc
ojMIAP9atkG3u7+490+NGWLdulQlaHnD51Owa9MiW87UfKpsTQEArwi/NrJqXJNT
PFQ2xIa5TxG+9haChR89w3kjZ6b/hgs=
=iDkx
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.15-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull misc vfs updates from Christian Brauner:
"Features:
- Add CONFIG_DEBUG_VFS infrastucture:
- Catch invalid modes in open
- Use the new debug macros in inode_set_cached_link()
- Use debug-only asserts around fd allocation and install
- Place f_ref to 3rd cache line in struct file to resolve false
sharing
Cleanups:
- Start using anon_inode_getfile_fmode() helper in various places
- Don't take f_lock during SEEK_CUR if exclusion is guaranteed by
f_pos_lock
- Add unlikely() to kcmp()
- Remove legacy ->remount_fs method from ecryptfs after port to the
new mount api
- Remove invalidate_inodes() in favour of evict_inodes()
- Simplify ep_busy_loopER by removing unused argument
- Avoid mmap sem relocks when coredumping with many missing pages
- Inline getname()
- Inline new_inode_pseudo() and de-staticize alloc_inode()
- Dodge an atomic in putname if ref == 1
- Consistently deref the files table with rcu_dereference_raw()
- Dedup handling of struct filename init and refcounts bumps
- Use wq_has_sleeper() in end_dir_add()
- Drop the lock trip around I_NEW wake up in evict()
- Load the ->i_sb pointer once in inode_sb_list_{add,del}
- Predict not reaching the limit in alloc_empty_file()
- Tidy up do_sys_openat2() with likely/unlikely
- Call inode_sb_list_add() outside of inode hash lock
- Sort out fd allocation vs dup2 race commentary
- Turn page_offset() into a wrapper around folio_pos()
- Remove locking in exportfs around ->get_parent() call
- try_lookup_one_len() does not need any locks in autofs
- Fix return type of several functions from long to int in open
- Fix return type of several functions from long to int in ioctls
Fixes:
- Fix watch queue accounting mismatch"
* tag 'vfs-6.15-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (30 commits)
fs: sort out fd allocation vs dup2 race commentary, take 2
fs: call inode_sb_list_add() outside of inode hash lock
fs: tidy up do_sys_openat2() with likely/unlikely
fs: predict not reaching the limit in alloc_empty_file()
fs: load the ->i_sb pointer once in inode_sb_list_{add,del}
fs: drop the lock trip around I_NEW wake up in evict()
fs: use wq_has_sleeper() in end_dir_add()
VFS/autofs: try_lookup_one_len() does not need any locks
fs: dedup handling of struct filename init and refcounts bumps
fs: consistently deref the files table with rcu_dereference_raw()
exportfs: remove locking around ->get_parent() call.
fs: use debug-only asserts around fd allocation and install
fs: dodge an atomic in putname if ref == 1
vfs: Remove invalidate_inodes()
ecryptfs: remove NULL remount_fs from super_operations
watch_queue: fix pipe accounting mismatch
fs: place f_ref to 3rd cache line in struct file to resolve false sharing
epoll: simplify ep_busy_loop by removing always 0 argument
fs: Turn page_offset() into a wrapper around folio_pos()
kcmp: improve performance adding an unlikely hint to task comparisons
...
Once a key's reference count has been reduced to 0, the garbage collector
thread may destroy it at any time and so key_put() is not allowed to touch
the key after that point. The most key_put() is normally allowed to do is
to touch key_gc_work as that's a static global variable.
However, in an effort to speed up the reclamation of quota, this is now
done in key_put() once the key's usage is reduced to 0 - but now the code
is looking at the key after the deadline, which is forbidden.
Fix this by using a flag to indicate that a key can be gc'd now rather than
looking at the key's refcount in the garbage collector.
Fixes: 9578e327b2 ("keys: update key quotas in key_put()")
Reported-by: syzbot+6105ffc1ded71d194d6d@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/673b6aec.050a0220.87769.004a.GAE@google.com/
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: syzbot+6105ffc1ded71d194d6d@syzkaller.appspotmail.com
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Potentially include errata for Landlock ABI v5 (Linux 6.10) and v6
(Linux 6.12). That will be useful for the following signal scoping
erratum.
As explained in errata.h, this commit should be backportable without
conflict down to ABI v5. It must then not include the errata/abi-6.h
file.
Fixes: 54a6e6bbf3 ("landlock: Add signal scoping")
Cc: Günther Noack <gnoack@google.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20250318161443.279194-5-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Some fixes may require user space to check if they are applied on the
running kernel before using a specific feature. For instance, this
applies when a restriction was previously too restrictive and is now
getting relaxed (e.g. for compatibility reasons). However, non-visible
changes for legitimate use (e.g. security fixes) do not require an
erratum.
Because fixes are backported down to a specific Landlock ABI, we need a
way to avoid cherry-pick conflicts. The solution is to only update a
file related to the lower ABI impacted by this issue. All the ABI files
are then used to create a bitmask of fixes.
The new errata interface is similar to the one used to get the supported
Landlock ABI version, but it returns a bitmask instead because the order
of fixes may not match the order of versions, and not all fixes may
apply to all versions.
The actual errata will come with dedicated commits. The description is
not actually used in the code but serves as documentation.
Create the landlock_abi_version symbol and use its value to check errata
consistency.
Update test_base's create_ruleset_checks_ordering tests and add errata
tests.
This commit is backportable down to the first version of Landlock.
Fixes: 3532b0b435 ("landlock: Enable user space to infer supported features")
Cc: Günther Noack <gnoack@google.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20250318161443.279194-3-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
To ease backports in setup.c, let's group changes from
__lsm_ro_after_init to __ro_after_init with commit f22f9aaf6c
("selinux: remove the runtime disable functionality"), and the
landlock_lsmid addition with commit f3b8788cde ("LSM: Identify modules
by more than name").
That will help to backport the following errata.
Cc: Günther Noack <gnoack@google.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20250318161443.279194-2-mic@digikod.net
Fixes: f3b8788cde ("LSM: Identify modules by more than name")
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Any driver that needs these library functions should already be selecting
the corresponding Kconfig symbols, so there is no real point in making
these visible.
The original patch that made these user selectable described problems
with drivers failing to select the code they use, but for consistency
it's better to always use 'select' on a symbol than to mix it with
'depends on'.
Fixes: e56e189855 ("lib/crypto: add prompts back to crypto libraries")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Retrieve the netif_wildcard policy capability in security_netif_sid()
from the locked active policy instead of the cached value in
selinux_state.
Fixes: 8af43b61c1 ("selinux: support wildcard network interface names")
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com>
[PM: /netlabel/netif/ due to a typo in the description]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Certain bpf syscall subcommands are available for usage from both
userspace and the kernel. LSM modules or eBPF gatekeeper programs may
need to take a different course of action depending on whether or not
a BPF syscall originated from the kernel or userspace.
Additionally, some of the bpf_attr struct fields contain pointers to
arbitrary memory. Currently the functionality to determine whether or
not a pointer refers to kernel memory or userspace memory is exposed
to the bpf verifier, but that information is missing from various LSM
hooks.
Here we augment the LSM hooks to provide this data, by simply passing
a boolean flag indicating whether or not the call originated in the
kernel, in any hook that contains a bpf_attr struct that corresponds
to a subcommand that may be called from the kernel.
Signed-off-by: Blaise Boscaccy <bboscaccy@linux.microsoft.com>
Acked-by: Song Liu <song@kernel.org>
Acked-by: Paul Moore <paul@paul-moore.com>
Link: https://lore.kernel.org/r/20250310221737.821889-2-bboscaccy@linux.microsoft.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Remove my old, no longer functioning, email address from comments.
Could alternatively replace with my current email but seems
redundant with MAINTAINERS and prone to being out of date.
Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
[PM: subject tweak]
Signed-off-by: Paul Moore <paul@paul-moore.com>
-----BEGIN PGP SIGNATURE-----
iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmfOKBUeHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG1aQH/iC+Oyij4VxAjBek
BOXIT/p6CwlIXb8ObiWWcRjDPizlcxb3RaV8J2RO+IqaQ2wltxpFANq2G7Re2FPm
SNcEpIURAOVcxHGedcfFA91srO5F4FzNTO8LVp7MIbcgMYy3pdk+dbZmi6A691R+
t9pb74m+MAnF1o/MUx7pUlhAT/4ymuuR0F7WCSg4h0Xwe5m0nlJY89kJBC7PCjyd
n3mdhsz3rDSLmt/z/T7HGD89r8sYSvm9cOKtL3ELgGTrm7boQV8ii9Y9w04DI8PQ
JmIernugcCxmhH36mVUAHgJf2+/T388xFUh/D5+skeUOUZpaJZG866rnb32WpsHc
eWLFUeg=
=Wypt
-----END PGP SIGNATURE-----
Merge 6.14-rc6 into driver-core-next
We need the driver core fix in here as well.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The function can be replaced by evict_inodes. The only difference is
that evict_inodes() skips the inodes with positive refcount without
touching ->i_lock, but they are equivalent as evict_inodes() repeats the
refcount check after having grabbed ->i_lock.
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20250307144318.28120-2-jack@suse.cz
Signed-off-by: Christian Brauner <brauner@kernel.org>
The vanilla has_capability() function has been unused since 2018's
commit dcb569cf6a ("Smack: ptrace capability use fixes")
Remove it.
Fixup a comment in security/commoncap.c that referenced it.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Serge Hallyn <sergeh@kernel.org>
current->group_leader is stable, no need to take rcu_read_lock() and do
get/put_task_struct().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Link: https://lore.kernel.org/r/20250219161417.GA20851@redhat.com
Signed-off-by: Kees Cook <kees@kernel.org>
Add support for wildcard matching of network interface names. This is
useful for auto-generated interfaces, for example podman creates network
interfaces for containers with the naming scheme podman0, podman1,
podman2, ...
To maintain backward compatibility guard this feature with a new policy
capability 'netif_wildcard'.
Netifcon definitions are compared against in the order given by the
policy, so userspace tools should sort them in a reasonable order.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Updated the MODULE_COMPRESS_NONE with MODULE_COMPRESS as it was no longer
available from kernel modules. As MODULE_COMPRESS and MODULE_DECOMPRESS
depends on MODULES removing MODULES as well.
Fixes: c7ff693fa2 ("module: Split modules_install compression and in-kernel decompression")
Signed-off-by: Arulpandiyan Vadivel <arulpandiyan.vadivel@siemens.com>
Link: https://lore.kernel.org/r/20250302103831.285381-1-arulpandiyan.vadivel@siemens.com
Signed-off-by: Kees Cook <kees@kernel.org>
FORTIFY_SOURCE is a hardening option both at build and runtime. Move
it under 'Kernel hardening options'.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Paul Moore <paul@paul-moore.com>
Link: https://lore.kernel.org/r/20250123221115.19722-5-mgorman@techsingularity.net
Signed-off-by: Kees Cook <kees@kernel.org>
HARDENED_USERCOPY defaults to on if enabled at compile time. Allow
hardened_usercopy= default to be set at compile time similar to
init_on_alloc= and init_on_free=. The intent is that hardening
options that can be disabled at runtime can set their default at
build time.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Link: https://lore.kernel.org/r/20250123221115.19722-3-mgorman@techsingularity.net
Signed-off-by: Kees Cook <kees@kernel.org>
There is a submenu for 'Kernel hardening options' under "Security".
Move HARDENED_USERCOPY under the hardening options as it is clearly
related.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Paul Moore <paul@paul-moore.com>
Link: https://lore.kernel.org/r/20250123221115.19722-2-mgorman@techsingularity.net
Signed-off-by: Kees Cook <kees@kernel.org>
Some filesystems, such as NFS, cifs, ceph, and fuse, do not have
complete control of sequencing on the actual filesystem (e.g. on a
different server) and may find that the inode created for a mkdir
request already exists in the icache and dcache by the time the mkdir
request returns. For example, if the filesystem is mounted twice the
directory could be visible on the other mount before it is on the
original mount, and a pair of name_to_handle_at(), open_by_handle_at()
calls could instantiate the directory inode with an IS_ROOT() dentry
before the first mkdir returns.
This means that the dentry passed to ->mkdir() may not be the one that
is associated with the inode after the ->mkdir() completes. Some
callers need to interact with the inode after the ->mkdir completes and
they currently need to perform a lookup in the (rare) case that the
dentry is no longer hashed.
This lookup-after-mkdir requires that the directory remains locked to
avoid races. Planned future patches to lock the dentry rather than the
directory will mean that this lookup cannot be performed atomically with
the mkdir.
To remove this barrier, this patch changes ->mkdir to return the
resulting dentry if it is different from the one passed in.
Possible returns are:
NULL - the directory was created and no other dentry was used
ERR_PTR() - an error occurred
non-NULL - this other dentry was spliced in
This patch only changes file-systems to return "ERR_PTR(err)" instead of
"err" or equivalent transformations. Subsequent patches will make
further changes to some file-systems to return a correct dentry.
Not all filesystems reliably result in a positive hashed dentry:
- NFS, cifs, hostfs will sometimes need to perform a lookup of
the name to get inode information. Races could result in this
returning something different. Note that this lookup is
non-atomic which is what we are trying to avoid. Placing the
lookup in filesystem code means it only happens when the filesystem
has no other option.
- kernfs and tracefs leave the dentry negative and the ->revalidate
operation ensures that lookup will be called to correctly populate
the dentry. This could be fixed but I don't think it is important
to any of the users of vfs_mkdir() which look at the dentry.
The recommendation to use
d_drop();d_splice_alias()
is ugly but fits with current practice. A planned future patch will
change this.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: NeilBrown <neilb@suse.de>
Link: https://lore.kernel.org/r/20250227013949.536172-2-neilb@suse.de
Signed-off-by: Christian Brauner <brauner@kernel.org>
Although the LSM hooks for loading kernel modules were later generalized
to cover loading other kinds of files, SELinux didn't implement
corresponding permission checks, leaving only the module case covered.
Define and add new permission checks for these other cases.
Signed-off-by: Cameron K. Williams <ckwilliams.work@gmail.com>
Signed-off-by: Kipp N. Davis <kippndavis.work@gmx.com>
Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com>
[PM: merge fuzz, line length, and spacing fixes]
Signed-off-by: Paul Moore <paul@paul-moore.com>
-----BEGIN PGP SIGNATURE-----
iIYEABYKAC4WIQSVyBthFV4iTW/VU1/l49DojIL20gUCZ785FhAcbWljQGRpZ2lr
b2QubmV0AAoJEOXj0OiMgvbSILQBAMFwpFClzjVeWyLFNd/gaTlPWeedvnag+yZu
CK9q39jOAP9tj1unQBFpsI7jsTk6ZxPVxb4DymPIirZMb/5FuxUnDQ==
=QrSM
-----END PGP SIGNATURE-----
Merge tag 'landlock-6.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux
Pull landlock fixes from Mickaël Salaün:
"Fixes to TCP socket identification, documentation, and tests"
* tag 'landlock-6.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux:
selftests/landlock: Add binaries to .gitignore
selftests/landlock: Test that MPTCP actions are not restricted
selftests/landlock: Test TCP accesses with protocol=IPPROTO_TCP
landlock: Fix non-TCP sockets restriction
landlock: Minor typo and grammar fixes in IPC scoping documentation
landlock: Fix grammar error
selftests/landlock: Enable the new CONFIG_AF_UNIX_OOB
-----BEGIN PGP SIGNATURE-----
iIoEABYKADIWIQQdXVVFGN5XqKr1Hj7LwZzRsCrn5QUCZ78LSBQcem9oYXJAbGlu
dXguaWJtLmNvbQAKCRDLwZzRsCrn5X1kAQCsB9LO0NX+DhZILASeSJfzjAkY0cGY
jzQ3eq3keLkXrQD/RjeKD9e/qjBrXX0C4qbr9e8+JYuzY22o911bEiK67wg=
=lOQV
-----END PGP SIGNATURE-----
Merge tag 'integrity-v6.14-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity
Pull integrity fixes from Mimi Zohar:
"One bugfix and one spelling cleanup. The bug fix restores a
performance improvement"
* tag 'integrity-v6.14-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
ima: Reset IMA_NONACTION_RULE_FLAGS after post_setattr
integrity: fix typos and spelling errors
It seems that the attr parameter was never been used in security
checks since it was first introduced by:
commit da97e18458 ("perf_event: Add support for LSM and SELinux checks")
so remove it.
Signed-off-by: Luo Gengkun <luogengkun@huaweicloud.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul Moore <paul@paul-moore.com>
If SMACK label has CIPSO representation w/o categories, e.g.:
| # cat /smack/cipso2
| foo 10
| @ 250/2
| ...
then SMACK does not recognize such CIPSO in input ipv4 packets
and substitues '*' label instead. Audit records may look like
| lsm=SMACK fn=smack_socket_sock_rcv_skb action=denied
| subject="*" object="_" requested=w pid=0 comm="swapper/1" ...
This happens in two steps:
1) security/smack/smackfs.c`smk_set_cipso
does not clear NETLBL_SECATTR_MLS_CAT
from (struct smack_known *)skp->smk_netlabel.flags
on assigning CIPSO w/o categories:
| rcu_assign_pointer(skp->smk_netlabel.attr.mls.cat, ncats.attr.mls.cat);
| skp->smk_netlabel.attr.mls.lvl = ncats.attr.mls.lvl;
2) security/smack/smack_lsm.c`smack_from_secattr
can not match skp->smk_netlabel with input packet's
struct netlbl_lsm_secattr *sap
because sap->flags have not NETLBL_SECATTR_MLS_CAT (what is correct)
but skp->smk_netlabel.flags have (what is incorrect):
| if ((sap->flags & NETLBL_SECATTR_MLS_CAT) == 0) {
| if ((skp->smk_netlabel.flags &
| NETLBL_SECATTR_MLS_CAT) == 0)
| found = 1;
| break;
| }
This commit sets/clears NETLBL_SECATTR_MLS_CAT in
skp->smk_netlabel.flags according to the presense of CIPSO categories.
The update of smk_netlabel is not atomic, so input packets processing
still may be incorrect during short time while update proceeds.
Signed-off-by: Konstantin Andreev <andreev@swemel.ru>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
This reverts commit ccfd889acb
The indicated commit
* does not describe the problem that change tries to solve
* has programming issues
* introduces a bug: forever clears NETLBL_SECATTR_MLS_CAT
in (struct smack_known *)skp->smk_netlabel.flags
Reverting the commit to reapproach original problem
Signed-off-by: Konstantin Andreev <andreev@swemel.ru>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Using RCU lifetime rules to access kernfs_node::name can avoid the
trouble with kernfs_rename_lock in kernfs_name() and kernfs_path_from_node()
if the fs was created with KERNFS_ROOT_INVARIANT_PARENT. This is usefull
as it allows to implement kernfs_path_from_node() only with RCU
protection and avoiding kernfs_rename_lock. The lock is only required if
the __parent node can be changed and the function requires an unchanged
hierarchy while it iterates from the node to its parent.
The change is needed to allow the lookup of the node's path
(kernfs_path_from_node()) from context which runs always with disabled
preemption and or interrutps even on PREEMPT_RT. The problem is that
kernfs_rename_lock becomes a sleeping lock on PREEMPT_RT.
I went through all ::name users and added the required access for the lookup
with a few extensions:
- rdtgroup_pseudo_lock_create() drops all locks and then uses the name
later on. resctrl supports rename with different parents. Here I made
a temporal copy of the name while it is used outside of the lock.
- kernfs_rename_ns() accepts NULL as new_parent. This simplifies
sysfs_move_dir_ns() where it can set NULL in order to reuse the current
name.
- kernfs_rename_ns() is only using kernfs_rename_lock if the parents are
different. All users use either kernfs_rwsem (for stable path view) or
just RCU for the lookup. The ::name uses always RCU free.
Use RCU lifetime guarantees to access kernfs_node::name.
Suggested-by: Tejun Heo <tj@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Reported-by: syzbot+6ea37e2e6ffccf41a7e6@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/lkml/67251dc6.050a0220.529b6.015e.GAE@google.com/
Reported-by: Hillf Danton <hdanton@sina.com>
Closes: https://lore.kernel.org/20241102001224.2789-1-hdanton@sina.com
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lore.kernel.org/r/20250213145023.2820193-7-bigeasy@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Use sk_is_tcp() to check if socket is TCP in bind(2) and connect(2)
hooks.
SMC, MPTCP, SCTP protocols are currently restricted by TCP access
rights. The purpose of TCP access rights is to provide control over
ports that can be used by userland to establish a TCP connection.
Therefore, it is incorrect to deny bind(2) and connect(2) requests for a
socket of another protocol.
However, SMC, MPTCP and RDS implementations use TCP internal sockets to
establish communication or even to exchange packets over a TCP
connection [1]. Landlock rules that configure bind(2) and connect(2)
usage for TCP sockets should not cover requests for sockets of such
protocols. These protocols have different set of security issues and
security properties, therefore, it is necessary to provide the userland
with the ability to distinguish between them (eg. [2]).
Control over TCP connection used by other protocols can be achieved with
upcoming support of socket creation control [3].
[1] https://lore.kernel.org/all/62336067-18c2-3493-d0ec-6dd6a6d3a1b5@huawei-partners.com/
[2] https://lore.kernel.org/all/20241204.fahVio7eicim@digikod.net/
[3] https://lore.kernel.org/all/20240904104824.1844082-1-ivanov.mikhail1@huawei-partners.com/
Closes: https://github.com/landlock-lsm/linux/issues/40
Fixes: fff69fb03d ("landlock: Support network rules with TCP bind and connect")
Signed-off-by: Mikhail Ivanov <ivanov.mikhail1@huawei-partners.com>
Link: https://lore.kernel.org/r/20250205093651.1424339-2-ivanov.mikhail1@huawei-partners.com
[mic: Format commit message to 72 columns]
Signed-off-by: Mickaël Salaün <mic@digikod.net>
If CONFIG_AUDIT is not set then
SMACK does not generate audit messages,
however, keeps audit control file, /smack/logging,
while there is no entity to control.
This change removes audit control file /smack/logging
when audit is not configured in the kernel
Signed-off-by: Konstantin Andreev <andreev@swemel.ru>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Since inception [1], SMACK initializes ipv* child socket security
for connection-oriented communications (tcp/sctp/dccp)
during accept() syscall, in the security_sock_graft() hook:
| void smack_sock_graft(struct sock *sk, ...)
| {
| // only ipv4 and ipv6 are eligible here
| // ...
| ssp = sk->sk_security; // socket security
| ssp->smk_in = skp; // process label: smk_of_current()
| ssp->smk_out = skp; // process label: smk_of_current()
| }
This approach is incorrect for two reasons:
A) initialization occurs too late for child socket security:
The child socket is created by the kernel once the handshake
completes (e.g., for tcp: after receiving ack for syn+ack).
Data can legitimately start arriving to the child socket
immediately, long before the application calls accept()
on the socket.
Those data are (currently — were) processed by SMACK using
incorrect child socket security attributes.
B) Incoming connection requests are handled using the listening
socket's security, hence, the child socket must inherit the
listening socket's security attributes.
smack_sock_graft() initilizes the child socket's security with
a process label, as is done for a new socket()
But ... the process label is not necessarily the same as the
listening socket label. A privileged application may legitimately
set other in/out labels for a listening socket.
When this happens, SMACK processes incoming packets using
incorrect socket security attributes.
In [2] Michael Lontke noticed (A) and fixed it in [3] by adding
socket initialization into security_sk_clone_security() hook like
| void smack_sk_clone_security(struct sock *oldsk, struct sock *newsk)
| {
| *(struct socket_smack *)newsk->sk_security =
| *(struct socket_smack *)oldsk->sk_security;
| }
This initializes the child socket security with the parent (listening)
socket security at the appropriate time.
I was forced to revisit this old story because
smack_sock_graft() was left in place by [3] and continues overwriting
the child socket's labels with the process label,
and there might be a reason for this, so I undertook a study.
If the process label differs from the listening socket's labels,
the following occurs for ipv4:
assigning the smk_out is not accompanied by netlbl_sock_setattr,
so the outgoing packet's cipso label does not change.
So, the only effect of this assignment for interhost communications
is a divergence between the program-visible “out” socket label and
the cipso network label. For intrahost communications this label,
however, becomes visible via secmark netfilter marking, and is
checked for access rights by the client, receiving side.
Assigning the smk_in affects both interhost and intrahost
communications: the server begins to check access rights against
an wrong label.
Access check against wrong label (smk_in or smk_out),
unsurprisingly fails, breaking the connection.
The above affects protocols that calls security_sock_graft()
during accept(), namely: {tcp,dccp,sctp}/{ipv4,ipv6}
One extra security_sock_graft() caller, crypto/af_alg.c`af_alg_accept
is not affected, because smack_sock_graft() does nothing for PF_ALG.
To reproduce, assign non-default in/out labels to a listening socket,
setup rules between these labels and client label, attempt to connect
and send some data.
Ipv6 specific: ipv6 packets do not convey SMACK labels. To reproduce
the issue in interhost communications set opposite labels in
/smack/ipv6host on both hosts.
Ipv6 intrahost communications do not require tricking, because SMACK
labels are conveyed via secmark netfilter marking.
So, currently smack_sock_graft() is not useful, but harmful,
therefore, I have removed it.
This fixes the issue for {tcp,dccp}/{ipv4,ipv6},
but not sctp/{ipv4,ipv6}.
Although this change is necessary for sctp+smack to function
correctly, it is not sufficient because:
sctp/ipv4 does not call security_sk_clone() and
sctp/ipv6 ignores SMACK completely.
These are separate issues, belong to other subsystem,
and should be addressed separately.
[1] 2008-02-04,
Fixes: e114e47377 ("Smack: Simplified Mandatory Access Control Kernel")
[2] Michael Lontke, 2022-08-31, SMACK LSM checks wrong object label
during ingress network traffic
Link: https://lore.kernel.org/linux-security-module/6324997ce4fc092c5020a4add075257f9c5f6442.camel@elektrobit.com/
[3] 2022-08-31, michael.lontke,
commit 4ca165fc6c ("SMACK: Add sk_clone_security LSM hook")
Signed-off-by: Konstantin Andreev <andreev@swemel.ru>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
I want to be sure that ipv6-specific code
is not compiled in kernel binaries
if ipv6 is not configured.
[1] was getting rid of "unused variable" warning, but,
with that, it also mandated compilation of a handful ipv6-
specific functions in ipv4-only kernel configurations:
smk_ipv6_localhost, smack_ipv6host_label, smk_ipv6_check.
Their compiled bodies are likely to be removed by compiler
from the resulting binary, but, to be on the safe side,
I remove them from the compiler view.
[1]
Fixes: 00720f0e7f ("smack: avoid unused 'sip' variable warning")
Signed-off-by: Konstantin Andreev <andreev@swemel.ru>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Fix typos and spelling errors in security/smack module comments that
were identified using the codespell tool.
No functional changes - documentation only.
Signed-off-by: Tanya Agarwal <tanyaagarwal25699@gmail.com>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
When CONFIG_SECURITY_APPARMOR_DEBUG_ASSERTS is disabled, there is a
warning that sock is unused:
security/apparmor/file.c: In function '__file_sock_perm':
security/apparmor/file.c:544:24: warning: unused variable 'sock' [-Wunused-variable]
544 | struct socket *sock = (struct socket *) file->private_data;
| ^~~~
sock was moved into aa_sock_file_perm(), where the same check is
present, so remove sock and the assertion from __file_sock_perm() to fix
the warning.
Fixes: c05e705812 ("apparmor: add fine grained af_unix mediation")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202501190757.myuLxLyL-lkp@intel.com/
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: John Johansen <john.johansen@canonical.com>
This follows the established practice and fixes a build failure for me:
security/apparmor/file.c: In function ‘__file_sock_perm’:
security/apparmor/file.c:544:24: error: unused variable ‘sock’ [-Werror=unused-variable]
544 | struct socket *sock = (struct socket *) file->private_data;
| ^~~~
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
Fix typos and spelling errors in apparmor module comments that were
identified using the codespell tool.
No functional changes - documentation only.
Signed-off-by: Tanya Agarwal <tanyaagarwal25699@gmail.com>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
Ryan Lee <ryan.lee@canonical.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
clang warns:
security/apparmor/label.c:206:15: error: address of array 'new->vec' will always evaluate to 'true' [-Werror,-Wpointer-bool-conversion]
206 | AA_BUG(!new->vec);
| ~~~~~~^~~
The address of this array can never be NULL because it is not at the
beginning of a structure. Convert the assertion to check that the new
pointer is not NULL.
Fixes: de4754c801 ("apparmor: carry mediation check on label")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202501191802.bDp2voTJ-lkp@intel.com/
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: John Johansen <john.johansen@canonical.com>
It is desirable to allow LSM to configure accessibility to io_uring
because it is a coarse yet very simple way to restrict access to it. So,
add an LSM for io_uring_allowed() to guard access to io_uring.
Cc: Paul Moore <paul@paul-moore.com>
Signed-off-by: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
[PM: merge fuzz due to changes in preceding patches, subj tweak]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Commit 2039bda1fa ("LSM: Add "contents" flag to kernel_read_file hook")
added a new flag to the security_kernel_read_file() LSM hook, "contents",
which was set if a file was being read in its entirety or if it was the
first chunk read in a multi-step process. The SELinux LSM callback was
updated to only check against the file label if this "contents" flag was
set, meaning that in multi-step reads the file label was not considered
in the access control decision after the initial chunk.
Thankfully the only in-tree user that performs a multi-step read is the
"bcm-vk" driver and it is loading firmware, not a kernel module, so there
are no security regressions to worry about. However, we still want to
ensure that the SELinux code does the right thing, and *always* checks
the file label, especially as there is a chance the file could change
between chunk reads.
Fixes: 2039bda1fa ("LSM: Add "contents" flag to kernel_read_file hook")
Signed-off-by: Paul Moore <paul@paul-moore.com>
The dac_mmap_min_addr belongs to min_addr.c, move it to
min_addr.c from /kernel/sysctl.c. In the previous Linux kernel
boot process, sysctl_init_bases needs to be executed before
init_mmap_min_addr, So, register_sysctl_init should be executed
before update_mmap_min_addr in init_mmap_min_addr. And according
to the compilation condition in security/Makefile:
obj-$(CONFIG_MMU) += min_addr.o
if CONFIG_MMU is not defined, min_addr.c would not be included in the
compilation process. So, drop the CONFIG_MMU check.
Signed-off-by: Kaixiong Yu <yukaixiong@huawei.com>
Reviewed-by: Kees Cook <kees@kernel.org>
Acked-by: Paul Moore <paul@paul-moore.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Joel Granados <joel.granados@kernel.org>
Commit 0d73a55208 ("ima: re-introduce own integrity cache lock")
mistakenly reverted the performance improvement introduced in commit
42a4c60319 ("ima: fix ima_inode_post_setattr"). The unused bit mask was
subsequently removed by commit 11c60f23ed ("integrity: Remove unused
macro IMA_ACTION_RULE_FLAGS").
Restore the performance improvement by introducing the new mask
IMA_NONACTION_RULE_FLAGS, equal to IMA_NONACTION_FLAGS without
IMA_NEW_FILE, which is not a rule-specific flag.
Finally, reset IMA_NONACTION_RULE_FLAGS instead of IMA_NONACTION_FLAGS in
process_measurement(), if the IMA_CHANGE_ATTR atomic flag is set (after
file metadata modification).
With this patch, new files for which metadata were modified while they are
still open, can be reopened before the last file close (when security.ima
is written), since the IMA_NEW_FILE flag is not cleared anymore. Otherwise,
appraisal fails because security.ima is missing (files with IMA_NEW_FILE
set are an exception).
Cc: stable@vger.kernel.org # v4.16.x
Fixes: 0d73a55208 ("ima: re-introduce own integrity cache lock")
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
Fix typos and spelling errors in integrity module comments that were
identified using the codespell tool.
No functional changes - documentation only.
Signed-off-by: Tanya Agarwal <tanyaagarwal25699@gmail.com>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
Fix spelling error in selinux module comments that were identified
using the codespell tool.
No functional changes - documentation only.
Signed-off-by: Tanya Agarwal <tanyaagarwal25699@gmail.com>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Commit 08ae2487b2 ("tomoyo: automatically use patterns for several
situations in learning mode") replaced only $PID part of procfs pathname
with \$ pattern. But it turned out that we need to also replace $TID part
and $FD part to make this functionality useful for e.g. /bin/lsof .
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Add the const qualifier to all the ctl_tables in the tree except for
watchdog_hardlockup_sysctl, memory_allocation_profiling_sysctls,
loadpin_sysctl_table and the ones calling register_net_sysctl (./net,
drivers/inifiniband dirs). These are special cases as they use a
registration function with a non-const qualified ctl_table argument or
modify the arrays before passing them on to the registration function.
Constifying ctl_table structs will prevent the modification of
proc_handler function pointers as the arrays would reside in .rodata.
This is made possible after commit 78eb4ea25c ("sysctl: treewide:
constify the ctl_table argument of proc_handlers") constified all the
proc_handlers.
Created this by running an spatch followed by a sed command:
Spatch:
virtual patch
@
depends on !(file in "net")
disable optional_qualifier
@
identifier table_name != {
watchdog_hardlockup_sysctl,
iwcm_ctl_table,
ucma_ctl_table,
memory_allocation_profiling_sysctls,
loadpin_sysctl_table
};
@@
+ const
struct ctl_table table_name [] = { ... };
sed:
sed --in-place \
-e "s/struct ctl_table .table = &uts_kern/const struct ctl_table *table = \&uts_kern/" \
kernel/utsname_sysctl.c
Reviewed-by: Song Liu <song@kernel.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> # for kernel/trace/
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> # SCSI
Reviewed-by: Darrick J. Wong <djwong@kernel.org> # xfs
Acked-by: Jani Nikula <jani.nikula@intel.com>
Acked-by: Corey Minyard <cminyard@mvista.com>
Acked-by: Wei Liu <wei.liu@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Bill O'Donnell <bodonnel@redhat.com>
Acked-by: Baoquan He <bhe@redhat.com>
Acked-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Acked-by: Anna Schumaker <anna.schumaker@oracle.com>
Signed-off-by: Joel Granados <joel.granados@kernel.org>
this pull are:
- "lib min_heap: Improve min_heap safety, testing, and documentation"
from Kuan-Wei Chiu provides various tightenings to the min_heap library
code.
- "xarray: extract __xa_cmpxchg_raw" from Tamir Duberstein preforms some
cleanup and Rust preparation in the xarray library code.
- "Update reference to include/asm-<arch>" from Geert Uytterhoeven fixes
pathnames in some code comments.
- "Converge on using secs_to_jiffies()" from Easwar Hariharan uses the
new secs_to_jiffies() in various places where that is appropriate.
- "ocfs2, dlmfs: convert to the new mount API" from Eric Sandeen
switches two filesystems to the new mount API.
- "Convert ocfs2 to use folios" from Matthew Wilcox does that.
- "Remove get_task_comm() and print task comm directly" from Yafang Shao
removes now-unneeded calls to get_task_comm() in various places.
- "squashfs: reduce memory usage and update docs" from Phillip Lougher
implements some memory savings in squashfs and performs some
maintainability work.
- "lib: clarify comparison function requirements" from Kuan-Wei Chiu
tightens the sort code's behaviour and adds some maintenance work.
- "nilfs2: protect busy buffer heads from being force-cleared" from
Ryusuke Konishi fixes an issues in nlifs when the fs is presented with a
corrupted image.
- "nilfs2: fix kernel-doc comments for function return values" from
Ryusuke Konishi fixes some nilfs kerneldoc.
- "nilfs2: fix issues with rename operations" from Ryusuke Konishi
addresses some nilfs BUG_ONs which syzbot was able to trigger.
- "minmax.h: Cleanups and minor optimisations" from David Laight
does some maintenance work on the min/max library code.
- "Fixes and cleanups to xarray" from Kemeng Shi does maintenance work
on the xarray library code.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZ5SP5QAKCRDdBJ7gKXxA
jqN7AQChvwXGG43n4d5SDiA/rH7ddvowQcDqhC9cAMJ1ReR7qwEA8/LIWDE4PdMX
mJnaZ1/ibpEpearrChCViApQtcyEGQI=
=ti4E
-----END PGP SIGNATURE-----
Merge tag 'mm-nonmm-stable-2025-01-24-23-16' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
"Mainly individually changelogged singleton patches. The patch series
in this pull are:
- "lib min_heap: Improve min_heap safety, testing, and documentation"
from Kuan-Wei Chiu provides various tightenings to the min_heap
library code
- "xarray: extract __xa_cmpxchg_raw" from Tamir Duberstein preforms
some cleanup and Rust preparation in the xarray library code
- "Update reference to include/asm-<arch>" from Geert Uytterhoeven
fixes pathnames in some code comments
- "Converge on using secs_to_jiffies()" from Easwar Hariharan uses
the new secs_to_jiffies() in various places where that is
appropriate
- "ocfs2, dlmfs: convert to the new mount API" from Eric Sandeen
switches two filesystems to the new mount API
- "Convert ocfs2 to use folios" from Matthew Wilcox does that
- "Remove get_task_comm() and print task comm directly" from Yafang
Shao removes now-unneeded calls to get_task_comm() in various
places
- "squashfs: reduce memory usage and update docs" from Phillip
Lougher implements some memory savings in squashfs and performs
some maintainability work
- "lib: clarify comparison function requirements" from Kuan-Wei Chiu
tightens the sort code's behaviour and adds some maintenance work
- "nilfs2: protect busy buffer heads from being force-cleared" from
Ryusuke Konishi fixes an issues in nlifs when the fs is presented
with a corrupted image
- "nilfs2: fix kernel-doc comments for function return values" from
Ryusuke Konishi fixes some nilfs kerneldoc
- "nilfs2: fix issues with rename operations" from Ryusuke Konishi
addresses some nilfs BUG_ONs which syzbot was able to trigger
- "minmax.h: Cleanups and minor optimisations" from David Laight does
some maintenance work on the min/max library code
- "Fixes and cleanups to xarray" from Kemeng Shi does maintenance
work on the xarray library code"
* tag 'mm-nonmm-stable-2025-01-24-23-16' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (131 commits)
ocfs2: use str_yes_no() and str_no_yes() helper functions
include/linux/lz4.h: add some missing macros
Xarray: use xa_mark_t in xas_squash_marks() to keep code consistent
Xarray: remove repeat check in xas_squash_marks()
Xarray: distinguish large entries correctly in xas_split_alloc()
Xarray: move forward index correctly in xas_pause()
Xarray: do not return sibling entries from xas_find_marked()
ipc/util.c: complete the kernel-doc function descriptions
gcov: clang: use correct function param names
latencytop: use correct kernel-doc format for func params
minmax.h: remove some #defines that are only expanded once
minmax.h: simplify the variants of clamp()
minmax.h: move all the clamp() definitions after the min/max() ones
minmax.h: use BUILD_BUG_ON_MSG() for the lo < hi test in clamp()
minmax.h: reduce the #define expansion of min(), max() and clamp()
minmax.h: update some comments
minmax.h: add whitespace around operators and after commas
nilfs2: do not update mtime of renamed directory that is not moved
nilfs2: handle errors that nilfs_prepare_chunk() may return
CREDITS: fix spelling mistake
...
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmePs7oACgkQnJ2qBz9k
QNmHuAf9GkLnY5u1/81xP5V9ukZ4N2yeMW0dydLS5cjWj/St5ELeMAza3jeqtJtD
j36vbnmy2c5pPaGLAK8BJpMXT/R2TkmmKD004zcfqF2S3SgbGzdgO1zMZzq9KJpM
woRKZtLuglDajedsDEBBcKotBhlN2+C/sQlFuL1mX4zitk9ajr0qYUB1+JqOeg5f
qwPsDLT077ADpxd7lVIMcm+OqbduP5KWkBKYHpn7lJcLe1eqVMMzceJroW42zhVG
Dq8Iln26bbU9Wx6FSPFCUcHEzHRHUfXmu07HN9U0X++0QgWjrmBQQLooGFB/bR4a
edBrPpVas6xE4/brjgFX3gOKtv8xYg==
=ewDV
-----END PGP SIGNATURE-----
Merge tag 'fsnotify_hsm_for_v6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull fsnotify pre-content notification support from Jan Kara:
"This introduces a new fsnotify event (FS_PRE_ACCESS) that gets
generated before a file contents is accessed.
The event is synchronous so if there is listener for this event, the
kernel waits for reply. On success the execution continues as usual,
on failure we propagate the error to userspace. This allows userspace
to fill in file content on demand from slow storage. The context in
which the events are generated has been picked so that we don't hold
any locks and thus there's no risk of a deadlock for the userspace
handler.
The new pre-content event is available only for users with global
CAP_SYS_ADMIN capability (similarly to other parts of fanotify
functionality) and it is an administrator responsibility to make sure
the userspace event handler doesn't do stupid stuff that can DoS the
system.
Based on your feedback from the last submission, fsnotify code has
been improved and now file->f_mode encodes whether pre-content event
needs to be generated for the file so the fast path when nobody wants
pre-content event for the file just grows the additional file->f_mode
check. As a bonus this also removes the checks whether the old
FS_ACCESS event needs to be generated from the fast path. Also the
place where the event is generated during page fault has been moved so
now filemap_fault() generates the event if and only if there is no
uptodate folio in the page cache.
Also we have dropped FS_PRE_MODIFY event as current real-world users
of the pre-content functionality don't really use it so let's start
with the minimal useful feature set"
* tag 'fsnotify_hsm_for_v6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: (21 commits)
fanotify: Fix crash in fanotify_init(2)
fs: don't block write during exec on pre-content watched files
fs: enable pre-content events on supported file systems
ext4: add pre-content fsnotify hook for DAX faults
btrfs: disable defrag on pre-content watched files
xfs: add pre-content fsnotify hook for DAX faults
fsnotify: generate pre-content permission event on page fault
mm: don't allow huge faults for files with pre content watches
fanotify: disable readahead if we have pre-content watches
fanotify: allow to set errno in FAN_DENY permission response
fanotify: report file range info with pre-content events
fanotify: introduce FAN_PRE_ACCESS permission event
fsnotify: generate pre-content permission event on truncate
fsnotify: pass optional file access range in pre-content event
fsnotify: introduce pre-content permission events
fanotify: reserve event bit of deprecated FAN_DIR_MODIFY
fanotify: rename a misnamed constant
fanotify: don't skip extra event info if no info_mode is set
fsnotify: check if file is actually being watched for pre-content events on open
fsnotify: opt-in for permission events at file open time
...
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmeOu1YACgkQ6rmadz2v
bTrrHxAAn6eqEsluWnDlzhI0OGsPjvgS00sf+MOeqiXYeS2eJ8yJuKifp38+nIQZ
lIplsWU2ReUY20eizPqLPnQ7TXZGvLgp08E8yHUoZ0siWanqr9iDRfbZCCNrDMNm
lMqeR1SLapMws2R/UX9JbvPn2ajIJ6Lb4wxenTfdlW6q+0hAGM6Dt0k/jBod+quq
/oo+xwG3L0q4APBovJfiAFN2z6IYN03b+zLiOrpIJtMACGewEXnl3m4mkL8ZM/FV
nZGPIxIUPXCpKTGEkNqxfkrnHN2wZQ4ZSKEJ6lhEEp4jrgCVITaGZ/E7jlx6fZoj
bbd4YMonIPo9Nhim8p1dt8yYBhKKiE5IXIq0GqlMv5+MvAN8ylrlydpsouW1fu66
hZ1W1BxbxmrgyF0Bwo9JPOMhBHwMrmD6iH9LgiMpZf0ASeF+q9cJpoSOU5j5E9XB
LpLIRf5jYTd4wZjhDmrQREReLo+Bng9DlCBu+jjh2+YTz6l6Qed+ETpENcd7lL5i
IHZVbgD2RVPNJoUfdrd763HfYfDTk+50MF5FIMEyfKHz11if0E/LhBMzto22hm6b
2f8ruj/8yvg8s2dxEP3ySQgcnynlwEnGxLenUVv7uEOYKeWri1rq+fvTK5ne1OLK
oHnTlkViwQb74c0r8cFW+nkyfUYTfhhBAql14rl/fMjGDO2KZ10=
=f2CA
-----END PGP SIGNATURE-----
Merge tag 'bpf-next-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Pull bpf updates from Alexei Starovoitov:
"A smaller than usual release cycle.
The main changes are:
- Prepare selftest to run with GCC-BPF backend (Ihor Solodrai)
In addition to LLVM-BPF runs the BPF CI now runs GCC-BPF in compile
only mode. Half of the tests are failing, since support for
btf_decl_tag is still WIP, but this is a great milestone.
- Convert various samples/bpf to selftests/bpf/test_progs format
(Alexis Lothoré and Bastien Curutchet)
- Teach verifier to recognize that array lookup with constant
in-range index will always succeed (Daniel Xu)
- Cleanup migrate disable scope in BPF maps (Hou Tao)
- Fix bpf_timer destroy path in PREEMPT_RT (Hou Tao)
- Always use bpf_mem_alloc in bpf_local_storage in PREEMPT_RT (Martin
KaFai Lau)
- Refactor verifier lock support (Kumar Kartikeya Dwivedi)
This is a prerequisite for upcoming resilient spin lock.
- Remove excessive 'may_goto +0' instructions in the verifier that
LLVM leaves when unrolls the loops (Yonghong Song)
- Remove unhelpful bpf_probe_write_user() warning message (Marco
Elver)
- Add fd_array_cnt attribute for prog_load command (Anton Protopopov)
This is a prerequisite for upcoming support for static_branch"
* tag 'bpf-next-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (125 commits)
selftests/bpf: Add some tests related to 'may_goto 0' insns
bpf: Remove 'may_goto 0' instruction in opt_remove_nops()
bpf: Allow 'may_goto 0' instruction in verifier
selftests/bpf: Add test case for the freeing of bpf_timer
bpf: Cancel the running bpf_timer through kworker for PREEMPT_RT
bpf: Free element after unlock in __htab_map_lookup_and_delete_elem()
bpf: Bail out early in __htab_map_lookup_and_delete_elem()
bpf: Free special fields after unlock in htab_lru_map_delete_node()
tools: Sync if_xdp.h uapi tooling header
libbpf: Work around kernel inconsistently stripping '.llvm.' suffix
bpf: selftests: verifier: Add nullness elision tests
bpf: verifier: Support eliding map lookup nullness
bpf: verifier: Refactor helper access type tracking
bpf: tcp: Mark bpf_load_hdr_opt() arg2 as read-write
bpf: verifier: Add missing newline on verbose() call
selftests/bpf: Add distilled BTF test about marking BTF_IS_EMBEDDED
libbpf: Fix incorrect traversal end type ID when marking BTF_IS_EMBEDDED
libbpf: Fix return zero when elf_begin failed
selftests/bpf: Fix btf leak on new btf alloc failure in btf_distill test
veristat: Load struct_ops programs only once
...
This branch contains basically the same two patches as last time:
1. A patch by Paul Moore to remove the cap_mmap_file() hook, as it simply
returned the default return value and so doesn't need to exist.
2. A patch by Jordan Rome to add a trace event for cap_capable(), updated
to address your feedback during the last cycle.
Both patches have been sitting in linux-next since 6.13-rc1 with no
issues.
Signed-off-by: Serge E. Hallyn <serge@hallyn.com>
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEqb0/8XByttt4D8+UNXDaFycKziQFAmeOxO0ACgkQNXDaFycK
ziSbqwf9FmQbCG9zpgHhAaODz8GXPn1EYm0TfabbfuG+hRvTQLt/7eVuLB6Tt69l
lx7zM8HUjZLQW8qsDc1nmdnrvvLK6z8e97yGBBMG4uzFyzsCgNQowyDRz69IOG+l
eTCUMXOQXYtO4OYm7pECBeUos8yCOpW7vdZzyyKInw0A8JXy98K880HlYoiYc7wI
9xXtKWTmqry156llwIYU/opo/Pag480Y2hzP9x5EqvTNqJ/iMEUb2Dswhf+53dOY
HePwerTu1BYYupSC2gl3ujl/m6R2BroLBmOMApLiAhNtRZCm+J6rkhmMW9cFqyxZ
Nyw8nAuc08cAKoobAdggD+cgFy9e6g==
=WKYe
-----END PGP SIGNATURE-----
Merge tag 'caps-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sergeh/linux
Pull capabilities updates from Serge Hallyn:
- remove the cap_mmap_file() hook, as it simply returned the default
return value and so doesn't need to exist (Paul Moore)
- add a trace event for cap_capable() (Jordan Rome)
* tag 'caps-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sergeh/linux:
security: add trace event for cap_capable
capabilities: remove cap_mmap_file()
- Implement AT_EXECVE_CHECK flag to execveat(2) (Mickaël Salaün)
- Implement EXEC_RESTRICT_FILE and EXEC_DENY_INTERACTIVE securebits
(Mickaël Salaün)
- Add selftests and samples for AT_EXECVE_CHECK (Mickaël Salaün)
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRSPkdeREjth1dHnSE2KwveOeQkuwUCZ4hO7wAKCRA2KwveOeQk
u4l+AP9UHO1KwMn3aOt6uFPj7omaoY0vpcB1rx/x5s4efNFHOAD/QjY0f+ND+HzF
mKLYOIeacGEQi7TNhpnOkGjz6jzSiwg=
=sMhZ
-----END PGP SIGNATURE-----
Merge tag 'AT_EXECVE_CHECK-v6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull AT_EXECVE_CHECK from Kees Cook:
- Implement AT_EXECVE_CHECK flag to execveat(2) (Mickaël Salaün)
- Implement EXEC_RESTRICT_FILE and EXEC_DENY_INTERACTIVE securebits
(Mickaël Salaün)
- Add selftests and samples for AT_EXECVE_CHECK (Mickaël Salaün)
* tag 'AT_EXECVE_CHECK-v6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
ima: instantiate the bprm_creds_for_exec() hook
samples/check-exec: Add an enlighten "inc" interpreter and 28 tests
selftests: ktap_helpers: Fix uninitialized variable
samples/check-exec: Add set-exec
selftests/landlock: Add tests for execveat + AT_EXECVE_CHECK
selftests/exec: Add 32 tests for AT_EXECVE_CHECK and exec securebits
security: Add EXEC_RESTRICT_FILE and EXEC_DENY_INTERACTIVE securebits
exec: Add a new AT_EXECVE_CHECK flag to execveat(2)
Tetsuo Handa (3):
tomoyo: automatically use patterns for several situations in learning mode
tomoyo: use realpath if symlink's pathname refers to procfs
tomoyo: don't emit warning in tomoyo_write_control()
security/tomoyo/common.c | 32 +++++++++++++++++++++++++++++++-
security/tomoyo/domain.c | 11 +++++++++--
2 files changed, 40 insertions(+), 3 deletions(-)
-----BEGIN PGP SIGNATURE-----
iQJXBAABCABBFiEEQ8gzaWI9etOpbC/HQl8SjQxk9SoFAmeRdEUjHHBlbmd1aW4t
a2VybmVsQGktbG92ZS5zYWt1cmEubmUuanAACgkQQl8SjQxk9SqbCRAAg0aZ5pu/
p9uJhsZFzBZzTE8BcFRdMWKS+t5+RGE1KqHoNc/YsP73eb9ui/N1jckJjsYCFh4m
NsqWvadVMtma4qTcAw5Um48kZ898QVoq9kFBpxCklRnH0eGJsn2kNjNmgywYKK9q
TOSKTzJM2DfOEjrSdepvthp/FK76nbHdrKREBYsslPcSLFx5c5qBqx60l6p8hS0A
1+zKQM8m9H2nZ7KnuCr0e4Lhiq1W6C/BaIN/+NzHxbPVXl55nE1i4OXM3WSTC92H
i6Fe+eKv5B9B5zaQwWbAGhfONcL8pkmgwTvPFtZLj7y5t5HW8TxmPMFZ/jE5E1Kt
AFFLWvXIlDZ5aYU4pbNUAA4RCNm2hWw1vniPgLPiJte2ft2AdSxBHO9J6w/W4xs9
W44TvsFXDfJgKsoY4Hy5HR3VxleIFA5tvIyWp1ONp7rZ+4WKolTQYW0hEz26V30K
+urTZFusv3+229z3ZM8qsPUbL/WLML1jRw8E3AtnBMRJ92c8tpFaBA8b+o829zz8
QAFWeca3psgPqxHPZsnNeM8lOLOrtBik3+CNpmXJLq6KL2BrI9eRDUntBwHbBoRu
/hAJX0feI/4yoixzr5sOk5UMNlp+G5CDBQb9NkTynUnr3Q1KRcgKWC4f8cAQlTb2
6OfT8xw2AID0/XXxJVLOdDRTPnYEVU3VH9c=
=wawL
-----END PGP SIGNATURE-----
Merge tag 'tomoyo-pr-20250123' of git://git.code.sf.net/p/tomoyo/tomoyo
Pull tomoyo updates from Tetsuo Handa:
"Small changes to improve usability"
* tag 'tomoyo-pr-20250123' of git://git.code.sf.net/p/tomoyo/tomoyo:
tomoyo: automatically use patterns for several situations in learning mode
tomoyo: use realpath if symlink's pathname refers to procfs
tomoyo: don't emit warning in tomoyo_write_control()
-----BEGIN PGP SIGNATURE-----
iIYEABYKAC4WIQSVyBthFV4iTW/VU1/l49DojIL20gUCZ5EMhBAcbWljQGRpZ2lr
b2QubmV0AAoJEOXj0OiMgvbSMv0BAMOG2TFwq+UhbtxtL6pM7qzxfdWg6GR/t4t8
MFasAcCaAQDtTnW0HymHge8k7JFgWHHp0JBu7V7dhFrdJoS+718aDA==
=1Hfr
-----END PGP SIGNATURE-----
Merge tag 'landlock-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux
Pull landlock updates from Mickaël Salaün:
"This mostly factors out some Landlock code and prepares for upcoming
audit support.
Because files with invalid modes might be visible after filesystem
corruption, Landlock now handles those weird files too.
A few sample and test issues are also fixed"
* tag 'landlock-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux:
selftests/landlock: Add layout1.umount_sandboxer tests
selftests/landlock: Add wrappers.h
selftests/landlock: Fix error message
landlock: Optimize file path walks and prepare for audit support
selftests/landlock: Add test to check partial access in a mount tree
landlock: Align partial refer access checks with final ones
landlock: Simplify initially denied access rights
landlock: Move access types
landlock: Factor out check_access_path()
selftests/landlock: Fix build with non-default pthread linking
landlock: Use scoped guards for ruleset in landlock_add_rule()
landlock: Use scoped guards for ruleset
landlock: Constify get_mode_access()
landlock: Handle weird files
samples/landlock: Fix possible NULL dereference in parse_path()
selftests/landlock: Remove unused macros in ptrace_test.c
Here's the keys changes for 6.14-rc1.
BR, Jarkko
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRE6pSOnaBC00OEHEIaerohdGur0gUCZ49r8AAKCRAaerohdGur
0qiFAP9Av1djr8/HA+VidLPOe5neBkRYSuX54yNz3TGBOpjmvwD+L2YJfJYcBAd+
sku1MLkpLbMmBCLSTNhqjSJWxx4vngg=
=iF42
-----END PGP SIGNATURE-----
Merge tag 'keys-next-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd
Pull keys updates from Jarkko Sakkinen.
Avoid using stack addresses for sg lists. And a cleanup.
* tag 'keys-next-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
KEYS: trusted: dcp: fix improper sg use with CONFIG_VMAP_STACK=y
keys: drop shadowing dead prototype
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmeQE9YUHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXO66BAAmmhqw3Cm94u6QOjYTQCNrHpmZVpv
atPfW0EIlvYWuLGvyQZVh/2SKrYm0o7iNlTlOyMcBV9BfFUZd6vepX3L+ylhMacG
L2lg5jgl11zUZdo8m++38kCABbdMexhTzgtfdAm+w2RmLRoOzXjBOCDx18sWVtCy
aV3DQAvl6qdU/Y5U6PccmOCwgFVQEmWzQ4A1CMq696Fybr4EzTjI8mCCnotHWarz
cgfMHHf5RYR4M4VdmxWo3MR6y6Qiq19/Vsy43YP/G/A0Ad+mfLqhHmc27+Mx2bDk
IfdrvOOjaVxiEeIJe6mOePcW9p1D9q4OrPmBZHxN1+R3ck7k0MgVLIDvSzLDMbbj
3PSZx2UFk1xz+B0x3hvzhAXJ5YfbAjPj1Z65HlLIQBFBo5jvLWMrLxdpcH4eRdhT
ovTqFuB4wQwYOeeKlXlnCWFitsAynjo9qGxcqjxG63geJfnBlsnLoYIa0g+cN6Uf
3Ty0+zeHDfCajj40buvtOWv98CyAMF5vBopnr18Kfo4upp6pgVERVBoGy2Yw020I
yItiRhi1fpV31J8Gxrd7WA2/OmZZLISnAJtKMSsyd+hBihfOjVZ9LhKKYJk4vH+X
mWVOdplHpCDe6y2EJE4EmaNwQCOVJfg4/Xvh9fghELdBdc91wFGPDO36AitDBNr8
/o13aUvFarsEmtA=
=UJsr
-----END PGP SIGNATURE-----
Merge tag 'selinux-pr-20250121' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull selinux updates from Paul Moore:
- Extended permissions supported in conditional policy
The SELinux extended permissions, aka "xperms", allow security admins
to target individuals ioctls, and recently netlink messages, with
their SELinux policy. Adding support for conditional policies allows
admins to toggle the granular xperms using SELinux booleans, helping
pave the way for greater use of xperms in general purpose SELinux
policies. This change bumps the maximum SELinux policy version to 34.
- Fix a SCTP/SELinux error return code inconsistency
Depending on the loaded SELinux policy, specifically it's
EXTSOCKCLASS support, the bind(2) LSM/SELinux hook could return
different error codes due to the SELinux code checking the socket's
SELinux object class (which can vary depending on EXTSOCKCLASS) and
not the socket's sk_protocol field. We fix this by doing the obvious,
and looking at the sock->sk_protocol field instead of the object
class.
- Makefile fixes to properly cleanup av_permissions.h
Add av_permissions.h to "targets" so that it is properly cleaned up
using the kbuild infrastructure.
- A number of smaller improvements by Christian Göttsche
A variety of straightforward changes to reduce code duplication,
reduce pointer lookups, migrate void pointers to defined types,
simplify code, constify function parameters, and correct iterator
types.
* tag 'selinux-pr-20250121' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: make more use of str_read() when loading the policy
selinux: avoid unnecessary indirection in struct level_datum
selinux: use known type instead of void pointer
selinux: rename comparison functions for clarity
selinux: rework match_ipv6_addrmask()
selinux: constify and reconcile function parameter names
selinux: avoid using types indicating user space interaction
selinux: supply missing field initializers
selinux: add netlink nlmsg_type audit message
selinux: add support for xperms in conditional policies
selinux: Fix SCTP error inconsistency in selinux_socket_bind()
selinux: use native iterator types
selinux: add generated av_permissions.h to targets
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmeQFBoUHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXPvcA//XCdwMz0bGtWKv58nuyP8vkQx08n6
//olz/O8te3uWK5O3kRiarzFLwH8qsHQ6A7GYalwwix34hatR4ndJE0Y/guVRWa1
+aBmJxJ7Jm/q3fvpAEfqiSgreuE6kBoztlDOWEq+hUQGu4qfnQGm2EnvbvfFrAmN
VheOfIQSU2KCL/Scc3FGnF6uru4WrqN0JJ9RbvrEpfdQgmcyTGLnQsZLljutWSIq
kDWkteIr7cj3O9J45zpxZsTftvYSgVn/y1iKeXbHI4DBA1eheK12vsHB9AADKI1J
GwHxOrnLpZtv+ICUKqcfFTmWTl+NmfJJurAT5KXKdBjL3xM5MoJlBvK1A5qE9CMo
LaHVG/TZR2MmBaoM3EN+gvWhDgWlvT02Q/0cYaafTlVLMez3HtfctxN6OnCvTXTB
Y8dqYClhhlBm/mHQwYfMoeKw4MftUpzEqBd1Nj7Qe8dbP0f/62Ca3K2B3D6Rf8QV
pj3ryMlSWYV9mdTerruLNQexTGoN7l66jPwzdWpTbFeL3WmNtfCako8OZGbXgPIu
Iahm3P+jnSVx8ZQro2c9zwdKXI5xiI335pCBbDZ8aX+JAsfj0OofHsFx5Q5diber
M7tAEhxDqRisbpz7Ei+/LOAEGg2Z619XKg8ks4z6Y4P5PF7zEgeWTkZJk2iLbxXe
6LLOjmF7LLw+G4M=
=fgyr
-----END PGP SIGNATURE-----
Merge tag 'lsm-pr-20250121' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm
Pull lsm updates from Paul Moore:
- Improved handling of LSM "secctx" strings through lsm_context struct
The LSM secctx string interface is from an older time when only one
LSM was supported, migrate over to the lsm_context struct to better
support the different LSMs we now have and make it easier to support
new LSMs in the future.
These changes explain the Rust, VFS, and networking changes in the
diffstat.
- Only build lsm_audit.c if CONFIG_SECURITY and CONFIG_AUDIT are
enabled
Small tweak to be a bit smarter about when we build the LSM's common
audit helpers.
- Check for absurdly large policies from userspace in SafeSetID
SafeSetID policies rules are fairly small, basically just "UID:UID",
it easy to impose a limit of KMALLOC_MAX_SIZE on policy writes which
helps quiet a number of syzbot related issues. While work is being
done to address the syzbot issues through other mechanisms, this is a
trivial and relatively safe fix that we can do now.
- Various minor improvements and cleanups
A collection of improvements to the kernel selftests, constification
of some function parameters, removing redundant assignments, and
local variable renames to improve readability.
* tag 'lsm-pr-20250121' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
lockdown: initialize local array before use to quiet static analysis
safesetid: check size of policy writes
net: corrections for security_secid_to_secctx returns
lsm: rename variable to avoid shadowing
lsm: constify function parameters
security: remove redundant assignment to return variable
lsm: Only build lsm_audit.c if CONFIG_SECURITY and CONFIG_AUDIT are set
selftests: refactor the lsm `flags_overset_lsm_set_self_attr` test
binder: initialize lsm_context structure
rust: replace lsm context+len with lsm_context
lsm: secctx provider check on release
lsm: lsm_context in security_dentry_init_security
lsm: use lsm_context in security_inode_getsecctx
lsm: replace context+len with lsm_context
lsm: ensure the correct LSM context releaser
-----BEGIN PGP SIGNATURE-----
iIoEABYKADIWIQQdXVVFGN5XqKr1Hj7LwZzRsCrn5QUCZ4rmvxQcem9oYXJAbGlu
dXguaWJtLmNvbQAKCRDLwZzRsCrn5avvAP4tzjNdVp3tFeq9bA8gQZEJ74E6q/6a
Qb18xTn54hxGXAEAuotzwJiandLBk/3hkHIE0BbyzmULiVMpos4qVuUmTwI=
=ZvaD
-----END PGP SIGNATURE-----
Merge tag 'integrity-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity
Pull integrity updates from Mimi Zohar:
"There's just a couple of changes: two kernel messages addressed, a
measurement policy collision addressed, and one policy cleanup.
Please note that the contents of the IMA measurement list is
potentially affected. The builtin tmpfs IMA policy rule change might
introduce additional measurements, while detecting a reboot might
eliminate some measurements"
* tag 'integrity-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
ima: ignore suffixed policy rule comments
ima: limit the builtin 'tcb' dont_measure tmpfs policy rule
ima: kexec: silence RCU list traversal warning
ima: Suspend PCR extends and log appends when rebooting
With vmalloc stack addresses enabled (CONFIG_VMAP_STACK=y) DCP trusted
keys can crash during en- and decryption of the blob encryption key via
the DCP crypto driver. This is caused by improperly using sg_init_one()
with vmalloc'd stack buffers (plain_key_blob).
Fix this by always using kmalloc() for buffers we give to the DCP crypto
driver.
Cc: stable@vger.kernel.org # v6.10+
Fixes: 0e28bf61a5 ("KEYS: trusted: dcp: fix leak of blob encryption key")
Signed-off-by: David Gstir <david@sigma-star.at>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZ4pRjQAKCRCRxhvAZXjc
omUyAP9k31Qr7RY1zNtmpPfejqc+3Xx+xXD7NwHr+tONWtUQiQEA/F94qU2U3ivS
AzyDABWrEQ5ZNsm+Rq2Y3zyoH7of3ww=
=s3Bu
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.14-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull misc vfs updates from Christian Brauner:
"Features:
- Support caching symlink lengths in inodes
The size is stored in a new union utilizing the same space as
i_devices, thus avoiding growing the struct or taking up any more
space
When utilized it dodges strlen() in vfs_readlink(), giving about
1.5% speed up when issuing readlink on /initrd.img on ext4
- Add RWF_DONTCACHE iocb and FOP_DONTCACHE file_operations flag
If a file system supports uncached buffered IO, it may set
FOP_DONTCACHE and enable support for RWF_DONTCACHE.
If RWF_DONTCACHE is attempted without the file system supporting
it, it'll get errored with -EOPNOTSUPP
- Enable VBOXGUEST and VBOXSF_FS on ARM64
Now that VirtualBox is able to run as a host on arm64 (e.g. the
Apple M3 processors) we can enable VBOXSF_FS (and in turn
VBOXGUEST) for this architecture.
Tested with various runs of bonnie++ and dbench on an Apple MacBook
Pro with the latest Virtualbox 7.1.4 r165100 installed
Cleanups:
- Delay sysctl_nr_open check in expand_files()
- Use kernel-doc includes in fiemap docbook
- Use page->private instead of page->index in watch_queue
- Use a consume fence in mnt_idmap() as it's heavily used in
link_path_walk()
- Replace magic number 7 with ARRAY_SIZE() in fc_log
- Sort out a stale comment about races between fd alloc and dup2()
- Fix return type of do_mount() from long to int
- Various cosmetic cleanups for the lockref code
Fixes:
- Annotate spinning as unlikely() in __read_seqcount_begin
The annotation already used to be there, but got lost in commit
52ac39e5db ("seqlock: seqcount_t: Implement all read APIs as
statement expressions")
- Fix proc_handler for sysctl_nr_open
- Flush delayed work in delayed fput()
- Fix grammar and spelling in propagate_umount()
- Fix ESP not readable during coredump
In /proc/PID/stat, there is the kstkesp field which is the stack
pointer of a thread. While the thread is active, this field reads
zero. But during a coredump, it should have a valid value
However, at the moment, kstkesp is zero even during coredump
- Don't wake up the writer if the pipe is still full
- Fix unbalanced user_access_end() in select code"
* tag 'vfs-6.14-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (28 commits)
gfs2: use lockref_init for qd_lockref
erofs: use lockref_init for pcl->lockref
dcache: use lockref_init for d_lockref
lockref: add a lockref_init helper
lockref: drop superfluous externs
lockref: use bool for false/true returns
lockref: improve the lockref_get_not_zero description
lockref: remove lockref_put_not_zero
fs: Fix return type of do_mount() from long to int
select: Fix unbalanced user_access_end()
vbox: Enable VBOXGUEST and VBOXSF_FS on ARM64
pipe_read: don't wake up the writer if the pipe is still full
selftests: coredump: Add stackdump test
fs/proc: do_task_stat: Fix ESP not readable during coredump
fs: add RWF_DONTCACHE iocb and FOP_DONTCACHE file_operations flag
fs: sort out a stale comment about races between fd alloc and dup2
fs: Fix grammar and spelling in propagate_umount()
fs: fc_log replace magic number 7 with ARRAY_SIZE()
fs: use a consume fence in mnt_idmap()
file: flush delayed work in delayed fput()
...
dbus permission queries need to be synced with fine grained unix
mediation to avoid potential policy regressions. To ensure that
dbus queries don't result in a case where fine grained unix mediation
is not being applied but dbus mediation is check the loaded policy
support ABI and abort the query if policy doesn't support the
v9 ABI.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Fine grained unix mediation in Ubuntu used ABI v7, and policy using
this has propogated onto systems where fine grained unix mediation was
not supported. The userspace policy compiler supports downgrading
policy so the policy could be shared without changes.
Unfortunately this had the side effect that policy was not updated for
the none Ubuntu systems and enabling fine grained unix mediation on
those systems means that a new kernel can break a system with existing
policy that worked with the previous kernel. With fine grained af_unix
mediation this regression can easily break the system causing boot to
fail, as it affect unix socket files, non-file based unix sockets, and
dbus communication.
To aoid this regression move fine grained af_unix mediation behind
a new abi. This means that the system's userspace and policy must
be updated to support the new policy before it takes affect and
dropping a new kernel on existing system will not result in a
regression.
The abi bump is done in such a way as existing policy can be activated
on the system by changing the policy abi declaration and existing unix
policy rules will apply. Policy then only needs to be incrementally
updated, can even be backported to existing Ubuntu policy.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Extend af_unix mediation to support fine grained controls based on
the type (abstract, anonymous, fs), the address, and the labeling
on the socket.
This allows for using socket addresses to label and the socket and
control which subjects can communicate.
The unix rule format follows standard apparmor rules except that fs
based unix sockets can be mediated by existing file rules. None fs
unix sockets can be mediated by a unix socket rule. Where The address
of an abstract unix domain socket begins with the @ character, similar
to how they are reported (as paths) by netstat -x. The address then
follows and may contain pattern matching and any characters including
the null character. In apparmor null characters must be specified by
using an escape sequence \000 or \x00. The pattern matching is the
same as is used by file path matching so * will not match / even
though it has no special meaning with in an abstract socket name. Eg.
allow unix addr=@*,
Autobound unix domain sockets have a unix sun_path assigned to them by
the kernel, as such specifying a policy based address is not possible.
The autobinding of sockets can be controlled by specifying the special
auto keyword. Eg.
allow unix addr=auto,
To indicate that the rule only applies to auto binding of unix domain
sockets. It is important to note this only applies to the bind
permission as once the socket is bound to an address it is
indistinguishable from a socket that have an addr bound with a
specified name. When the auto keyword is used with other permissions
or as part of a peer addr it will be replaced with a pattern that can
match an autobound socket. Eg. For some kernels
allow unix rw addr=auto,
It is important to note, this pattern may match abstract sockets that
were not autobound but have an addr that fits what is generated by the
kernel when autobinding a socket.
Anonymous unix domain sockets have no sun_path associated with the
socket address, however it can be specified with the special none
keyword to indicate the rule only applies to anonymous unix domain
sockets. Eg.
allow unix addr=none,
If the address component of a rule is not specified then the rule
applies to autobind, abstract and anonymous sockets.
The label on the socket can be compared using the standard label=
rule conditional. Eg.
allow unix addr=@foo peer=(label=bar),
see man apparmor.d for full syntax description.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Rework match_prot into a common fn that can be shared by all the
networking rules. This will provide compatibility with current socket
mediation, via the early bailout permission encoding.
Signed-off-by: John Johansen <john.johansen@canonical.com>
There is no need for the kern check to be in the critical section,
it only complicates the code and slows down the case where the
socket is being created by the kernel.
Lifting it out will also allow socket_create to share common template
code, with other socket_permission checks.
Signed-off-by: John Johansen <john.johansen@canonical.com>
The af_select macro just adds a layer of unnecessary abstraction that
makes following what the code is doing harder.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Currently the caps encoding is very limited and can't be used with
conditionals. Allow capabilities to be mediated by the state
machine. This will allow us to add conditionals to capabilities that
aren't possible with the current encoding.
This patch only adds support for using the state machine and retains
the old encoding lookup as part of the runtime mediation code to
support older policy abis. A follow on patch will move backwards
compatibility to a mapping function done at policy load time.
Signed-off-by: John Johansen <john.johansen@canonical.com>
x_table_lookup currently does stacking during label_parse() if the
target specifies a stack but its only caller ensures that it will
never be used with stacking.
Refactor to slightly simplify the code in x_to_label(), this
also fixes a long standing problem where x_to_labels check on stacking
is only on the first element to the table option list, instead of
the element that is found and used.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Previously apparmor has only sent SIGKILL but there are cases where
it can be useful to send a different signal. Allow the profile
to optionally specify a different value.
Signed-off-by: John Johansen <john.johansen@canonical.com>
This is a step towards merging the file and policy state machines.
With the switch to extended permissions the state machine's ACCEPT2
table became unused freeing it up to store state specific flags. The
first flags to be stored are FLAG_OWNER and FLAG other which paves the
way towards merging the file and policydb perms into a single
permission table.
Currently Lookups based on the objects ownership conditional will
still need separate fns, this will be address in a following patch.
Signed-off-by: John Johansen <john.johansen@canonical.com>
In order to speed up the mediated check, precompute and store the
result as a bit per class type. This will not only allow us to
speed up the mediation check but is also a step to removing the
unconfined special cases as the unconfined check can be replaced
with the generic label_mediates() check.
Note: label check does not currently work for capabilities and resources
which need to have their mediation updated first.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Provide semantics, via fn names, for some checks being done in
file_perm(). This is a preparatory patch for improvements to both
permission caching and delegation, where the check will become more
involved.
Signed-off-by: John Johansen <john.johansen@canonical.com>
There does not need to be an explicit restriction that unconfined
can't use change_hat. Traditionally unconfined doesn't have hats
so change_hat could not be used. But newer unconfined profiles have
the potential of having hats, and even system unconfined will be
able to be replaced with a profile that allows for hats.
To remain backwards compitible with expected return codes, continue
to return -EPERM if the unconfined profile does not have any hats.
Signed-off-by: John Johansen <john.johansen@canonical.com>
labels containing more than one entry need to accumulate flag info
from profiles that the label is constructed from. This is done
correctly for labels created by a merge but is not being done for
labels created by an update or directly created via a parse.
This technically is a bug fix, however the effect in current code is
to cause early unconfined bail out to not happen (ie. without the fix
it is slower) on labels that were created via update or a parse.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Currently signal mediation is using a hard coded form of the
RULE_MEDIATES check. This hides the intended semantics, and means this
specific check won't pickup any changes or improvements made in the
RULE_MEDIATES check. Switch to using RULE_MEDIATES().
Signed-off-by: John Johansen <john.johansen@canonical.com>
profile_af_perm and profile_af_sk_perm are only ever called after
checking that the profile is not unconfined. So we can drop these
redundant checks.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Remove another case of code duplications. Switch to using the generic
routine instead of the current custom checks.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Make it so apparmor debug output can be controlled by class flags
as well as the debug flag on labels. This provides much finer
control at what is being output so apparmor doesn't flood the
logs with information that is not needed, making it hard to find
what is important.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Remove hard-coded strings by using the str_yes_no() helper function.
Fix a typo in a comment: s/unpritable/unprintable/
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: John Johansen <john.johansen@canonical.com>
Always synchronize access_masked_parent* with access_request_parent*
according to allowed_parent*. This is required for audit support to be
able to get back to the reason of denial.
In a rename/link action, instead of always checking a rule two times for
the same parent directory of the source and the destination files, only
check it when an action on a child was not already allowed. This also
enables us to keep consistent allowed_parent* status, which is required
to get back to the reason of denial.
For internal mount points, only upgrade allowed_parent* to true but do
not wrongfully set both of them to false otherwise. This is also
required to get back to the reason of denial.
This does not impact the current behavior but slightly optimize code and
prepare for audit support that needs to know the exact reason why an
access was denied.
Cc: Günther Noack <gnoack@google.com>
Link: https://lore.kernel.org/r/20250108154338.1129069-14-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Fix a logical issue that could have been visible if the source or the
destination of a rename/link action was allowed for either the source or
the destination but not both. However, this logical bug is unreachable
because either:
- the rename/link action is allowed by the access rights tied to the
same mount point (without relying on access rights in a parent mount
point) and the access request is allowed (i.e. allow_parent1 and
allow_parent2 are true in current_check_refer_path),
- or a common rule in a parent mount point updates the access check for
the source and the destination (cf. is_access_to_paths_allowed).
See the following layout1.refer_part_mount_tree_is_allowed test that
work with and without this fix.
This fix does not impact current code but it is required for the audit
support.
Cc: Günther Noack <gnoack@google.com>
Link: https://lore.kernel.org/r/20250108154338.1129069-12-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Upgrade domain's handled access masks when creating a domain from a
ruleset, instead of converting them at runtime. This is more consistent
and helps with audit support.
Cc: Günther Noack <gnoack@google.com>
Link: https://lore.kernel.org/r/20250108154338.1129069-7-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Move LANDLOCK_ACCESS_FS_INITIALLY_DENIED, access_mask_t, struct
access_mask, and struct access_masks_all to a dedicated access.h file.
Rename LANDLOCK_ACCESS_FS_INITIALLY_DENIED to
_LANDLOCK_ACCESS_FS_INITIALLY_DENIED to make it clear that it's not part
of UAPI. Add some newlines when appropriate.
This file will be extended with following commits, and it will help to
avoid dependency loops.
Cc: Günther Noack <gnoack@google.com>
Link: https://lore.kernel.org/r/20250108154338.1129069-6-mic@digikod.net
[mic: Fix rebase conflict because of the new cleanup headers]
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Simplify error handling by replacing goto statements with automatic
calls to landlock_put_ruleset() when going out of scope.
This change depends on the TCP support.
Cc: Konstantin Meskhidze <konstantin.meskhidze@huawei.com>
Cc: Mikhail Ivanov <ivanov.mikhail1@huawei-partners.com>
Reviewed-by: Günther Noack <gnoack@google.com>
Link: https://lore.kernel.org/r/20250113161112.452505-3-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Simplify error handling by replacing goto statements with automatic
calls to landlock_put_ruleset() when going out of scope.
This change will be easy to backport to v6.6 if needed, only the
kernel.h include line conflicts. As for any other similar changes, we
should be careful when backporting without goto statements.
Add missing include file.
Reviewed-by: Günther Noack <gnoack@google.com>
Link: https://lore.kernel.org/r/20250113161112.452505-2-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Since task->comm is guaranteed to be NUL-terminated, we can print it
directly without the need to copy it into a separate buffer. This
simplifies the code and avoids unnecessary operations.
Link: https://lkml.kernel.org/r/20241219023452.69907-5-laoar.shao@gmail.com
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Reviewed-by: Paul Moore <paul@paul-moore.com>
Acked-by: Kees Cook <kees@kernel.org>
Cc: James Morris <jmorris@namei.org>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: "André Almeida" <andrealmeid@igalia.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Danilo Krummrich <dakr@redhat.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Airlie <airlied@gmail.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Jiri Slaby <jirislaby@kernel.org>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Kalle Valo <kvalo@kernel.org>
Cc: Karol Herbst <kherbst@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Oded Gabbay <ogabbay@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Simona Vetter <simona@ffwll.ch>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tvrtko Ursulin <tursulin@ursulin.net>
Cc: Vineet Gupta <vgupta@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Simplify the call sites, and enable future string validation in a single
place.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
[PM: subject tweak]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Store the owned member of type struct mls_level directly in the parent
struct instead of an extra heap allocation.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Improve type safety and readability by using the known type.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
The functions context_cmp(), mls_context_cmp() and ebitmap_cmp() are not
traditional C style compare functions returning -1, 0, and 1 for less
than, equal, and greater than; they only return whether their arguments
are equal.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Constify parameters, add size hints, and simplify control flow.
According to godbolt the same assembly is generated.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Align the parameter names between declarations and definitions, and
constify read-only parameters.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
[PM: tweak the subject line]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Integer types starting with a double underscore, like __u32, are
intended for usage of variables interacting with user-space.
Just use the plain variant.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Please clang by supplying the missing field initializers in the
secclass_map variable and sel_fill_super() function.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
[PM: tweak subj and commit description]
Signed-off-by: Paul Moore <paul@paul-moore.com>
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmd9cIwUHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXOUCA//cioM7Z3f2NLzvV0O8zmQcnaEV1U3
8RumFdUi+n2g7jUTgkOSJoPEsY7loeE/n/7tqD0D7r0nD5sgS4CsFtVdLfz/yVH5
8tIslrJR18h3H70riL1fev7giNbdrftGJA1W0VSh2cwSqJUUpJ79skrZ9HgGOILW
vDHoxv3IlzyNrc1XwfjaXXhr7xQMFRghjUGSBpIXk56iVeZebI/Bd1CcHiI1JPBl
cea73AF8CiGvgjXh2IUPmxjVzJiM2Whjea0QT6RDg8dx2/NJ11jaYHNsKvjT14Z/
H9s3wx4rl0+qwkmmp0zO3bK4It1X0I1XWrjvhjIxkAq7nZ0befNIJCfK7JXBT2TI
vGP7kGDbIaN1183CcnXfZ3cpvQTzEanRn+CzLhXVuRgfFoIGIPMxYGRWYunNVjiN
RW5ptv2pFmNRlCEthTAzwbzxk5ISlTspnsgyo5O5RhB2HGPbCyhEQN8K4rfJxzUy
nSznOi8+TXdHpOwGFnU0Olhem2Yj0j0wo6c5nvB5+0NDVGFXULget2vHv00FTfOp
uglGECdKf4Uc0+g6ZRrGjdxJtkEyZr6QAL43ccmQ+2WMb95R2nLq9YjOTpBMg1pD
zUfcW9qensxAYucloCtHOdOAdd62udEVSERhna7ZPApUjVzaeD50l8SsasaWBxDL
m6sl3gRDKWW9m2k=
=ojzK
-----END PGP SIGNATURE-----
Merge tag 'selinux-pr-20250107' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull selinux fix from Paul Moore:
"A single SELinux patch to address a problem with a single domain using
multiple xperm classes"
* tag 'selinux-pr-20250107' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: match extended permissions to their base permissions
The "file_pattern" keyword was used for automatically recording patternized
pathnames when using the learning mode. This keyword was removed in TOMOYO
2.4 because it is impossible to predefine all possible pathname patterns.
However, since the numeric part of proc:/$PID/ , pipe:[$INO] and
socket:[$INO] has no meaning except $PID == 1, automatically replacing
the numeric part with \$ pattern helps reducing frequency of restarting
the learning mode due to hitting the quota.
Since replacing one digit with \$ pattern requires enlarging string buffer,
and several programs access only $PID == 1, replace only two or more digits
with \$ pattern.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
The static code analysis tool "Coverity Scan" pointed the following
details out for further development considerations:
CID 1486102: Uninitialized scalar variable (UNINIT)
uninit_use_in_call: Using uninitialized value *temp when calling
strlen.
Signed-off-by: Tanya Agarwal <tanyaagarwal25699@gmail.com>
[PM: edit/reformat the description, subject line]
Signed-off-by: Paul Moore <paul@paul-moore.com>
syzbot attempts to write a buffer with a large size to a sysfs entry
with writes handled by handle_policy_update(), triggering a warning
in kmalloc.
Check the size specified for write buffers before allocating.
Reported-by: syzbot+4eb7a741b3216020043a@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=4eb7a741b3216020043a
Signed-off-by: Leo Stone <leocstone@gmail.com>
[PM: subject tweak]
Signed-off-by: Paul Moore <paul@paul-moore.com>
The function dump_common_audit_data() contains two variables with the
name comm: one declared at the top and one nested one. Rename the
nested variable to improve readability and make future refactorings
of the function less error prone.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
[PM: description long line removal, line wrap cleanup, merge fuzz]
Signed-off-by: Paul Moore <paul@paul-moore.com>
The functions print_ipv4_addr() and print_ipv6_addr() are called with
string literals and do not modify these parameters internally.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
[PM: cleaned up the description to remove long lines]
Signed-off-by: Paul Moore <paul@paul-moore.com>
In the case where rc is equal to EOPNOTSUPP it is being reassigned a
new value of zero that is never read. The following continue statement
loops back to the next iteration of the lsm_for_each_hook loop and
rc is being re-assigned a new value from the call to getselfattr.
The assignment is redundant and can be removed.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Serge Hallyn <serge@hallyn.com>
[PM: subj tweak]
Signed-off-by: Paul Moore <paul@paul-moore.com>
In commit d1d991efaf ("selinux: Add netlink xperm support") a new
extended permission was added ("nlmsg"). This was the second extended
permission implemented in selinux ("ioctl" being the first one).
Extended permissions are associated with a base permission. It was found
that, in the access vector cache (avc), the extended permission did not
keep track of its base permission. This is an issue for a domain that is
using both extended permissions (i.e., a domain calling ioctl() on a
netlink socket). In this case, the extended permissions were
overlapping.
Keep track of the base permission in the cache. A new field "base_perm"
is added to struct extended_perms_decision to make sure that the
extended permission refers to the correct policy permission. A new field
"base_perms" is added to struct extended_perms to quickly decide if
extended permissions apply.
While it is in theory possible to retrieve the base permission from the
access vector, the same base permission may not be mapped to the same
bit for each class (e.g., "nlmsg" is mapped to a different bit for
"netlink_route_socket" and "netlink_audit_socket"). Instead, use a
constant (AVC_EXT_IOCTL or AVC_EXT_NLMSG) provided by the caller.
Fixes: d1d991efaf ("selinux: Add netlink xperm support")
Signed-off-by: Thiébaud Weksteen <tweek@google.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
When CONFIG_AUDIT is set, its CONFIG_NET dependency is also set, and the
dev_get_by_index and init_net symbols (used by dump_common_audit_data)
are found by the linker. dump_common_audit_data() should then failed to
build when CONFIG_NET is not set. However, because the compiler is
smart, it knows that audit_log_start() always return NULL when
!CONFIG_AUDIT, and it doesn't build the body of common_lsm_audit(). As
a side effect, dump_common_audit_data() is not built and the linker
doesn't error out because of missing symbols.
Let's only build lsm_audit.o when CONFIG_SECURITY and CONFIG_AUDIT are
both set, which is checked with the new CONFIG_HAS_SECURITY_AUDIT.
ipv4_skb_to_auditdata() and ipv6_skb_to_auditdata() are only used by
Smack if CONFIG_AUDIT is set, so they don't need fake implementations.
Because common_lsm_audit() is used in multiple places without
CONFIG_AUDIT checks, add a fake implementation.
Link: https://lore.kernel.org/r/20241122143353.59367-2-mic@digikod.net
Cc: Casey Schaufler <casey@schaufler-ca.com>
Cc: James Morris <jmorris@namei.org>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Serge E. Hallyn <serge@hallyn.com>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Lines beginning with '#' in the IMA policy are comments and are ignored.
Instead of placing the rule and comment on separate lines, allow the
comment to be suffixed to the IMA policy rule.
Reviewed-by: Petr Vorel <pvorel@suse.cz>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
With a custom policy similar to the builtin IMA 'tcb' policy [1], arch
specific policy, and a kexec boot command line measurement policy rule,
the kexec boot command line is not measured due to the dont_measure
tmpfs rule.
Limit the builtin 'tcb' dont_measure tmpfs policy rule to just the
"func=FILE_CHECK" hook. Depending on the end users security threat
model, a custom policy might not even include this dont_measure tmpfs
rule.
Note: as a result of this policy rule change, other measurements might
also be included in the IMA-measurement list that previously weren't
included.
[1] https://ima-doc.readthedocs.io/en/latest/ima-policy.html#ima-tcb
Reviewed-by: Petr Vorel <pvorel@suse.cz>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
The ima_measurements list is append-only and doesn't require
rcu_read_lock() protection. However, lockdep issues a warning when
traversing RCU lists without the read lock:
security/integrity/ima/ima_kexec.c:40 RCU-list traversed in non-reader section!!
Fix this by using the variant of list_for_each_entry_rcu() with the last
argument set to true. This tells the RCU subsystem that traversing this
append-only list without the read lock is intentional and safe.
This change silences the lockdep warning while maintaining the correct
semantics for the append-only list traversal.
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
When utilized it dodges strlen() in vfs_readlink(), giving about 1.5%
speed up when issuing readlink on /initrd.img on ext4.
Filesystems opt in by calling inode_set_cached_link() when creating an
inode.
The size is stored in a new union utilizing the same space as i_devices,
thus avoiding growing the struct or taking up any more space.
Churn-wise the current readlink_copy() helper is patched to accept the
size instead of calculating it.
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://lore.kernel.org/r/20241120112037.822078-2-mjguzik@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
Like direct file execution (e.g. ./script.sh), indirect file execution
(e.g. sh script.sh) needs to be measured and appraised. Instantiate
the new security_bprm_creds_for_exec() hook to measure and verify the
indirect file's integrity. Unlike direct file execution, indirect file
execution is optionally enforced by the interpreter.
Differentiate kernel and userspace enforced integrity audit messages.
Co-developed-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
Tested-by: Stefan Berger <stefanb@linux.ibm.com>
Reviewed-by: Mickaël Salaün <mic@digikod.net>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Link: https://lore.kernel.org/r/20241212174223.389435-9-mic@digikod.net
Signed-off-by: Kees Cook <kees@kernel.org>
The new SECBIT_EXEC_RESTRICT_FILE, SECBIT_EXEC_DENY_INTERACTIVE, and
their *_LOCKED counterparts are designed to be set by processes setting
up an execution environment, such as a user session, a container, or a
security sandbox. Unlike other securebits, these ones can be set by
unprivileged processes. Like seccomp filters or Landlock domains, the
securebits are inherited across processes.
When SECBIT_EXEC_RESTRICT_FILE is set, programs interpreting code should
control executable resources according to execveat(2) + AT_EXECVE_CHECK
(see previous commit).
When SECBIT_EXEC_DENY_INTERACTIVE is set, a process should deny
execution of user interactive commands (which excludes executable
regular files).
Being able to configure each of these securebits enables system
administrators or owner of image containers to gradually validate the
related changes and to identify potential issues (e.g. with interpreter
or audit logs).
It should be noted that unlike other security bits, the
SECBIT_EXEC_RESTRICT_FILE and SECBIT_EXEC_DENY_INTERACTIVE bits are
dedicated to user space willing to restrict itself. Because of that,
they only make sense in the context of a trusted environment (e.g.
sandbox, container, user session, full system) where the process
changing its behavior (according to these bits) and all its parent
processes are trusted. Otherwise, any parent process could just execute
its own malicious code (interpreting a script or not), or even enforce a
seccomp filter to mask these bits.
Such a secure environment can be achieved with an appropriate access
control (e.g. mount's noexec option, file access rights, LSM policy) and
an enlighten ld.so checking that libraries are allowed for execution
e.g., to protect against illegitimate use of LD_PRELOAD.
Ptrace restrictions according to these securebits would not make sense
because of the processes' trust assumption.
Scripts may need some changes to deal with untrusted data (e.g. stdin,
environment variables), but that is outside the scope of the kernel.
See chromeOS's documentation about script execution control and the
related threat model:
https://www.chromium.org/chromium-os/developer-library/guides/security/noexec-shell-scripts/
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Paul Moore <paul@paul-moore.com>
Reviewed-by: Serge Hallyn <serge@hallyn.com>
Reviewed-by: Jeff Xu <jeffxu@chromium.org>
Tested-by: Jeff Xu <jeffxu@chromium.org>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Link: https://lore.kernel.org/r/20241212174223.389435-3-mic@digikod.net
Signed-off-by: Kees Cook <kees@kernel.org>
Add a new AT_EXECVE_CHECK flag to execveat(2) to check if a file would
be allowed for execution. The main use case is for script interpreters
and dynamic linkers to check execution permission according to the
kernel's security policy. Another use case is to add context to access
logs e.g., which script (instead of interpreter) accessed a file. As
any executable code, scripts could also use this check [1].
This is different from faccessat(2) + X_OK which only checks a subset of
access rights (i.e. inode permission and mount options for regular
files), but not the full context (e.g. all LSM access checks). The main
use case for access(2) is for SUID processes to (partially) check access
on behalf of their caller. The main use case for execveat(2) +
AT_EXECVE_CHECK is to check if a script execution would be allowed,
according to all the different restrictions in place. Because the use
of AT_EXECVE_CHECK follows the exact kernel semantic as for a real
execution, user space gets the same error codes.
An interesting point of using execveat(2) instead of openat2(2) is that
it decouples the check from the enforcement. Indeed, the security check
can be logged (e.g. with audit) without blocking an execution
environment not yet ready to enforce a strict security policy.
LSMs can control or log execution requests with
security_bprm_creds_for_exec(). However, to enforce a consistent and
complete access control (e.g. on binary's dependencies) LSMs should
restrict file executability, or measure executed files, with
security_file_open() by checking file->f_flags & __FMODE_EXEC.
Because AT_EXECVE_CHECK is dedicated to user space interpreters, it
doesn't make sense for the kernel to parse the checked files, look for
interpreters known to the kernel (e.g. ELF, shebang), and return ENOEXEC
if the format is unknown. Because of that, security_bprm_check() is
never called when AT_EXECVE_CHECK is used.
It should be noted that script interpreters cannot directly use
execveat(2) (without this new AT_EXECVE_CHECK flag) because this could
lead to unexpected behaviors e.g., `python script.sh` could lead to Bash
being executed to interpret the script. Unlike the kernel, script
interpreters may just interpret the shebang as a simple comment, which
should not change for backward compatibility reasons.
Because scripts or libraries files might not currently have the
executable permission set, or because we might want specific users to be
allowed to run arbitrary scripts, the following patch provides a dynamic
configuration mechanism with the SECBIT_EXEC_RESTRICT_FILE and
SECBIT_EXEC_DENY_INTERACTIVE securebits.
This is a redesign of the CLIP OS 4's O_MAYEXEC:
f5cb330d6b/1901_open_mayexec.patch
This patch has been used for more than a decade with customized script
interpreters. Some examples can be found here:
https://github.com/clipos-archive/clipos4_portage-overlay/search?q=O_MAYEXEC
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Acked-by: Paul Moore <paul@paul-moore.com>
Reviewed-by: Serge Hallyn <serge@hallyn.com>
Reviewed-by: Jeff Xu <jeffxu@chromium.org>
Tested-by: Jeff Xu <jeffxu@chromium.org>
Link: https://docs.python.org/3/library/io.html#io.open_code [1]
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Link: https://lore.kernel.org/r/20241212174223.389435-2-mic@digikod.net
Signed-off-by: Kees Cook <kees@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmdiU+EUHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXP/Dg//X14XikP3UB0OcVRFkG3etPuUTf0L
gCTDvPcv+Ck4T1AVhYgyPnZCjkuzvIWeqPMPcSOpUmgeJb9x3pPAB1pJSJnrhAoE
3VmOmyalxnj/weboKwFLHRgEBN+gYe1J+fchFkQjGJQF+LzZ3I4jk/FARhYzE2UY
gy/WVKS68MWK/RwED4Hc4c+ZJ/fM27bc3QPLB3C62J9qlQI4p+4XIRNrcfqYYvah
X+Gd0oKMpRF6evHfx7LujWq+e9fZv5ZaGrRDRUwTTmdyWK2+iFKfQw1x24ijw3Iq
0xrj8XR1O8nVd+FWo78mSEax+YXa8UY/WbQlTC1IxlN1lETshVGlQPz7QYV0yOpu
FH47UhXDN2fPHGnMQRbSZf7d8GhOmEBEpms7xll5mDKQnx78Cqxp+xL7BzMCRMyK
ktO8HPyQcxlKMAIrNStvA9xYWcbXf6PhNfogKln9hAiUyJBeEAMEQWp/tz2r1IHw
yl78ZsbL3bNOjlk4K7G9w1qqiHjo7DDPgvzE7bTi2yolG/QX4iUIbAeEUAKqxKtl
qn7R+GGIy/oijSohbkxIPDlf93dzQfMG8QzWN+Z/WZ4NtbdDQglZD6F3ediPNPvP
RpmabcXBEK4TKnHzwWx1fsxd256OzrWI3QF5bJaEQ2u+R4RIJGmPjz27xiXZiXyb
oheacqtiYnAyJQU=
=LS+v
-----END PGP SIGNATURE-----
Merge tag 'selinux-pr-20241217' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull selinux fix from Paul Moore:
"One small SELinux patch to get rid improve our handling of unknown
extended permissions by safely ignoring them.
Not only does this make it easier to support newer SELinux policy
on older kernels in the future, it removes to BUG() calls from the
SELinux code."
* tag 'selinux-pr-20241217' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: ignore unknown extended permissions
Fedora 41 has reached Linux 6.12 kernel with TOMOYO enabled. I observed
that /usr/lib/systemd/systemd executes /usr/lib/systemd/systemd-executor
by passing dirfd == 9 or dirfd == 16 upon execveat().
Commit ada1986d07 ("tomoyo: fallback to realpath if symlink's pathname
does not exist") used realpath only if symlink's pathname does not exist.
But an out of tree patch suggested that it will be reasonable to always
use realpath if symlink's pathname refers to proc filesystem.
Therefore, this patch changes the pathname used for checking "file execute"
and the domainname used after a successful execve() request.
Before:
<kernel> /usr/lib/systemd/systemd
file execute proc:/self/fd/16 exec.realpath="/usr/lib/systemd/systemd-executor" exec.argv[0]="/usr/lib/systemd/systemd-executor"
file execute proc:/self/fd/9 exec.realpath="/usr/lib/systemd/systemd-executor" exec.argv[0]="/usr/lib/systemd/systemd-executor"
<kernel> /usr/lib/systemd/systemd proc:/self/fd/16
file execute /usr/sbin/auditd exec.realpath="/usr/sbin/auditd" exec.argv[0]="/usr/sbin/auditd"
<kernel> /usr/lib/systemd/systemd proc:/self/fd/16 /usr/sbin/auditd
<kernel> /usr/lib/systemd/systemd proc:/self/fd/9
file execute /usr/bin/systemctl exec.realpath="/usr/bin/systemctl" exec.argv[0]="/usr/bin/systemctl"
<kernel> /usr/lib/systemd/systemd proc:/self/fd/9 /usr/bin/systemctl
After:
<kernel> /usr/lib/systemd/systemd
file execute /usr/lib/systemd/systemd-executor exec.realpath="/usr/lib/systemd/systemd-executor" exec.argv[0]="/usr/lib/systemd/systemd-executor"
<kernel> /usr/lib/systemd/systemd /usr/lib/systemd/systemd-executor
file execute /usr/bin/systemctl exec.realpath="/usr/bin/systemctl" exec.argv[0]="/usr/bin/systemctl"
file execute /usr/sbin/auditd exec.realpath="/usr/sbin/auditd" exec.argv[0]="/usr/sbin/auditd"
<kernel> /usr/lib/systemd/systemd /usr/lib/systemd/systemd-executor /usr/bin/systemctl
<kernel> /usr/lib/systemd/systemd /usr/lib/systemd/systemd-executor /usr/sbin/auditd
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
free_task() already calls bpf_task_storage_free(). It is not necessary
to call it again on security_task_free(). Remove the hook.
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Matt Bobrowski <mattbobrowski@google.com>
Link: https://patch.msgid.link/20241212075956.2614894-1-song@kernel.org
syzbot is reporting too large allocation warning at tomoyo_write_control(),
for one can write a very very long line without new line character. To fix
this warning, I use __GFP_NOWARN rather than checking for KMALLOC_MAX_SIZE,
for practically a valid line should be always shorter than 32KB where the
"too small to fail" memory-allocation rule applies.
One might try to write a valid line that is longer than 32KB, but such
request will likely fail with -ENOMEM. Therefore, I feel that separately
returning -EINVAL when a line is longer than KMALLOC_MAX_SIZE is redundant.
There is no need to distinguish over-32KB and over-KMALLOC_MAX_SIZE.
Reported-by: syzbot+7536f77535e5210a5c76@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=7536f77535e5210a5c76
Reported-by: Leo Stone <leocstone@gmail.com>
Closes: https://lkml.kernel.org/r/20241216021459.178759-2-leocstone@gmail.com
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
When evaluating extended permissions, ignore unknown permissions instead
of calling BUG(). This commit ensures that future permissions can be
added without interfering with older kernels.
Cc: stable@vger.kernel.org
Fixes: fa1aa143ac ("selinux: extended permissions for ioctls")
Signed-off-by: Thiébaud Weksteen <tweek@google.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Add a new audit message type to capture nlmsg-related information. This
is similar to LSM_AUDIT_DATA_IOCTL_OP which was added for the other
SELinux extended permission (ioctl).
Adding a new type is preferred to adding to the existing
lsm_network_audit structure which contains irrelevant information for
the netlink sockets (i.e., dport, sport).
Signed-off-by: Thiébaud Weksteen <tweek@google.com>
[PM: change "nlnk-msgtype" to "nl-msgtype" as discussed]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Add support for extended permission rules in conditional policies.
Currently the kernel accepts such rules already, but evaluating a
security decision will hit a BUG() in
services_compute_xperms_decision(). Thus reject extended permission
rules in conditional policies for current policy versions.
Add a new policy version for this feature.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Tested-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Check sk->sk_protocol instead of security class to recognize SCTP socket.
SCTP socket is initialized with SECCLASS_SOCKET class if policy does not
support EXTSOCKCLASS capability. In this case bind(2) hook wrongfully
return EAFNOSUPPORT instead of EINVAL.
The inconsistency was detected with help of Landlock tests:
https://lore.kernel.org/all/b58680ca-81b2-7222-7287-0ac7f4227c3c@huawei-partners.com/
Fixes: 0f8db8cc73 ("selinux: add AF_UNSPEC and INADDR_ANY checks to selinux_socket_bind()")
Signed-off-by: Mikhail Ivanov <ivanov.mikhail1@huawei-partners.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Use types for iterators equal to the type of the to be compared values.
Reported by clang:
../ss/sidtab.c:126:2: warning: comparison of integers of different
signs: 'int' and 'unsigned long'
126 | hash_for_each_rcu(sidtab->context_to_sid, i, entry, list) {
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../hashtable.h:139:51: note: expanded from macro 'hash_for_each_rcu'
139 | for (... ; obj == NULL && (bkt) < HASH_SIZE(name);\
| ~~~ ^ ~~~~~~~~~~~~~~~
../selinuxfs.c:1520:23: warning: comparison of integers of different
signs: 'int' and 'unsigned int'
1520 | for (cpu = *idx; cpu < nr_cpu_ids; ++cpu) {
| ~~~ ^ ~~~~~~~~~~
../hooks.c:412:16: warning: comparison of integers of different signs:
'int' and 'unsigned long'
412 | for (i = 0; i < ARRAY_SIZE(tokens); i++) {
| ~ ^ ~~~~~~~~~~~~~~~~~~
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
[PM: munged the clang output due to line length concerns]
Signed-off-by: Paul Moore <paul@paul-moore.com>
av_permissions.h was not declared as a target and therefore not cleaned
up automatically by kbuild.
Suggested-by: Masahiro Yamada <masahiroy@kernel.org>
Link: https://lore.kernel.org/lkml/CAK7LNATUnCPt03BRFSKh1EH=+Sy0Q48wE4ER0BZdJqOb_44L8w@mail.gmail.com/
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Paul Moore <paul@paul-moore.com>
To avoid the following types of error messages due to a failure by the TPM
driver to use the TPM, suspend TPM PCR extensions and the appending of
entries to the IMA log once IMA's reboot notifier has been called. This
avoids trying to use the TPM after the TPM subsystem has been shut down.
[111707.685315][ T1] ima: Error Communicating to TPM chip, result: -19
[111707.685960][ T1] ima: Error Communicating to TPM chip, result: -19
Synchronization with the ima_extend_list_mutex to set
ima_measurements_suspended ensures that the TPM subsystem is not shut down
when IMA holds the mutex while appending to the log and extending the PCR.
The alternative of reading the system_state variable would not provide this
guarantee.
This error could be observed on a ppc64 machine running SuSE Linux where
processes are still accessing files after devices have been shut down.
Suspending the IMA log and PCR extensions shortly before reboot does not
seem to open a significant measurement gap since neither TPM quoting would
work for attestation nor that new log entries could be written to anywhere
after devices have been shut down. However, there's a time window between
the invocation of the reboot notifier and the shutdown of devices. This
includes all subsequently invoked reboot notifiers as well as
kernel_restart_prepare() where __usermodehelper_disable() waits for all
running_helpers to exit. During this time window IMA could now miss log
entries even though attestation would still work. The reboot of the system
shortly after may make this small gap insignificant.
Signed-off-by: Tushar Sugandhi <tusharsu@linux.microsoft.com>
Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
The new FS_PRE_ACCESS permission event is similar to FS_ACCESS_PERM,
but it meant for a different use case of filling file content before
access to a file range, so it has slightly different semantics.
Generate FS_PRE_ACCESS/FS_ACCESS_PERM as two seperate events, so content
scanners could inspect the content filled by pre-content event handler.
Unlike FS_ACCESS_PERM, FS_PRE_ACCESS is also called before a file is
modified by syscalls as write() and fallocate().
FS_ACCESS_PERM is reported also on blockdev and pipes, but the new
pre-content events are only reported for regular files and dirs.
The pre-content events are meant to be used by hierarchical storage
managers that want to fill the content of files on first access.
There are some specific requirements from filesystems that could
be used with pre-content events, so add a flag for fs to opt-in
for pre-content events explicitly before they can be used.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/b934c5e3af205abc4e0e4709f6486815937ddfdf.1731684329.git.josef@toxicpanda.com
Current release - regressions:
- rtnetlink: fix double call of rtnl_link_get_net_ifla()
- tcp: populate XPS related fields of timewait sockets
- ethtool: fix access to uninitialized fields in set RXNFC command
- selinux: use sk_to_full_sk() in selinux_ip_output()
Current release - new code bugs:
- net: make napi_hash_lock irq safe
- eth: bnxt_en: support header page pool in queue API
- eth: ice: fix NULL pointer dereference in switchdev
Previous releases - regressions:
- core: fix icmp host relookup triggering ip_rt_bug
- ipv6:
- avoid possible NULL deref in modify_prefix_route()
- release expired exception dst cached in socket
- smc: fix LGR and link use-after-free issue
- hsr: avoid potential out-of-bound access in fill_frame_info()
- can: hi311x: fix potential use-after-free
- eth: ice: fix VLAN pruning in switchdev mode
Previous releases - always broken:
- netfilter:
- ipset: hold module reference while requesting a module
- nft_inner: incorrect percpu area handling under softirq
- can: j1939: fix skb reference counting
- eth: mlxsw: use correct key block on Spectrum-4
- eth: mlx5: fix memory leak in mlx5hws_definer_calc_layout
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmdRve4SHHBhYmVuaUBy
ZWRoYXQuY29tAAoJECkkeY3MjxOk/TUQAItDVkTQDiAUUqd0TDH2SSnboxXozcMF
fKVrdWulFAOe7qcajUqjkzvTDjAOw9Xbfh8rEiFBLaXmUzSUT2eqbm2VahvWIR5/
k08v7fTTuCNzEwhnbQlsZ47Nd26LJVPwOvbtE/4V8d50pU/serjWuI/74tUCWAjn
DQOjyyqRjKgKKY+WWCQ6g4tVpD//DB4Sj15GiE3MhlW1f08AAnPJSe2oTaq0RZBG
nXo7abOGn8x3RYilrlp/ZwWYuNpVI4oF+qmp+t/46NV+7ER1JgrC97k0kFyyCYVD
g7vBvFjvA7vDmiuzfsOW2n7IRdcfBtkfi8UJYOIvVgJg7KDF0DXoxj3qD4FagI35
RWRMJw+PoNlFFkPprlp0we/19jPJWOO6rx+AOMEQD78jrH7NoFQ/+eeBf/nppfjy
wX0LD1aQsgPk2ju0I8GbcM/qaJ81EJUiYYVLJHieH0+vvmqts8cMDRLZf5t08EHa
myXcRGB9N8gjguGp5mdR5KmtY82zASlNC0PbDp3nlPYc3e/opCmMRYGjBohO+vqn
n7u250WThPwiBtOwYmSbcK7zpS034/VX0ufnTT2X3MWnFGGNDv6XVmho/OBuCHqJ
m/EiJo/D9qVGAIql/vAxN9a4lQKrcZgaMzCyEPYCf7rtLmzx7sfyHWbf/GZlUAU/
9dUUfWqCbZcL
=nkPk
-----END PGP SIGNATURE-----
Merge tag 'net-6.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from can and netfilter.
Current release - regressions:
- rtnetlink: fix double call of rtnl_link_get_net_ifla()
- tcp: populate XPS related fields of timewait sockets
- ethtool: fix access to uninitialized fields in set RXNFC command
- selinux: use sk_to_full_sk() in selinux_ip_output()
Current release - new code bugs:
- net: make napi_hash_lock irq safe
- eth:
- bnxt_en: support header page pool in queue API
- ice: fix NULL pointer dereference in switchdev
Previous releases - regressions:
- core: fix icmp host relookup triggering ip_rt_bug
- ipv6:
- avoid possible NULL deref in modify_prefix_route()
- release expired exception dst cached in socket
- smc: fix LGR and link use-after-free issue
- hsr: avoid potential out-of-bound access in fill_frame_info()
- can: hi311x: fix potential use-after-free
- eth: ice: fix VLAN pruning in switchdev mode
Previous releases - always broken:
- netfilter:
- ipset: hold module reference while requesting a module
- nft_inner: incorrect percpu area handling under softirq
- can: j1939: fix skb reference counting
- eth:
- mlxsw: use correct key block on Spectrum-4
- mlx5: fix memory leak in mlx5hws_definer_calc_layout"
* tag 'net-6.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (76 commits)
net :mana :Request a V2 response version for MANA_QUERY_GF_STAT
net: avoid potential UAF in default_operstate()
vsock/test: verify socket options after setting them
vsock/test: fix parameter types in SO_VM_SOCKETS_* calls
vsock/test: fix failures due to wrong SO_RCVLOWAT parameter
net/mlx5e: Remove workaround to avoid syndrome for internal port
net/mlx5e: SD, Use correct mdev to build channel param
net/mlx5: E-Switch, Fix switching to switchdev mode in MPV
net/mlx5: E-Switch, Fix switching to switchdev mode with IB device disabled
net/mlx5: HWS: Properly set bwc queue locks lock classes
net/mlx5: HWS: Fix memory leak in mlx5hws_definer_calc_layout
bnxt_en: handle tpa_info in queue API implementation
bnxt_en: refactor bnxt_alloc_rx_rings() to call bnxt_alloc_rx_agg_bmap()
bnxt_en: refactor tpa_info alloc/free into helpers
geneve: do not assume mac header is set in geneve_xmit_skb()
mlxsw: spectrum_acl_flex_keys: Use correct key block on Spectrum-4
ethtool: Fix wrong mod state in case of verbose and no_mask bitset
ipmr: tune the ipmr_can_free_table() checks.
netfilter: nft_set_hash: skip duplicated elements pending gc run
netfilter: ipset: Hold module reference while requesting a module
...
In cases where we want a stable way to observe/trace
cap_capable (e.g. protection from inlining and API updates)
add a tracepoint that passes:
- The credentials used
- The user namespace of the resource being accessed
- The user namespace in which the credential provides the
capability to access the targeted resource
- The capability to check for
- The return value of the check
Signed-off-by: Jordan Rome <linux@jordanrome.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Paul Moore <paul@paul-moore.com>
Reviewed-by: Serge Hallyn <serge@hallyn.com>
Link: https://lore.kernel.org/r/20241204155911.1817092-1-linux@jordanrome.com
Signed-off-by: Serge Hallyn <sergeh@kernel.org>
The cap_mmap_file() LSM callback returns the default value for the
security_mmap_file() LSM hook and can be safely removed.
Signed-off-by: Paul Moore <paul@paul-moore.com>
Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: Serge Hallyn <serge@hallyn.com>
Signed-off-by: Serge Hallyn <sergeh@kernel.org>
Verify that the LSM releasing the secctx is the LSM that
allocated it. This was not necessary when only one LSM could
create a secctx, but once there can be more than one it is.
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
[PM: subject tweak]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Replace the (secctx,seclen) pointer pair with a single lsm_context
pointer to allow return of the LSM identifier along with the context
and context length. This allows security_release_secctx() to know how
to release the context. Callers have been modified to use or save the
returned data from the new structure.
Cc: ceph-devel@vger.kernel.org
Cc: linux-nfs@vger.kernel.org
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
[PM: subject tweak]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Change the security_inode_getsecctx() interface to fill a lsm_context
structure instead of data and length pointers. This provides
the information about which LSM created the context so that
security_release_secctx() can use the correct hook.
Cc: linux-nfs@vger.kernel.org
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
[PM: subject tweak]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Replace the (secctx,seclen) pointer pair with a single
lsm_context pointer to allow return of the LSM identifier
along with the context and context length. This allows
security_release_secctx() to know how to release the
context. Callers have been modified to use or save the
returned data from the new structure.
security_secid_to_secctx() and security_lsmproc_to_secctx()
will now return the length value on success instead of 0.
Cc: netdev@vger.kernel.org
Cc: audit@vger.kernel.org
Cc: netfilter-devel@vger.kernel.org
Cc: Todd Kjos <tkjos@google.com>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
[PM: subject tweak, kdoc fix, signedness fix from Dan Carpenter]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Add a new lsm_context data structure to hold all the information about a
"security context", including the string, its size and which LSM allocated
the string. The allocation information is necessary because LSMs have
different policies regarding the lifecycle of these strings. SELinux
allocates and destroys them on each use, whereas Smack provides a pointer
to an entry in a list that never goes away.
Update security_release_secctx() to use the lsm_context instead of a
(char *, len) pair. Change its callers to do likewise. The LSMs
supporting this hook have had comments added to remind the developer
that there is more work to be done.
The BPF security module provides all LSM hooks. While there has yet to
be a known instance of a BPF configuration that uses security contexts,
the possibility is real. In the existing implementation there is
potential for multiple frees in that case.
Cc: linux-integrity@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: audit@vger.kernel.org
Cc: netfilter-devel@vger.kernel.org
To: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: linux-nfs@vger.kernel.org
Cc: Todd Kjos <tkjos@google.com>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
[PM: subject tweak]
Signed-off-by: Paul Moore <paul@paul-moore.com>
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmdKdi4UHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXP36A//UudCkttNTVCErQM0r8mjcPjtthlo
fRblzsuXbHB3FTsrDQA4bpPcLTlxMcNZXGgO9F+cwjNRNRbavBZo8+AbtAWOppNx
3XGr65Z0iIelUgvy/T3jYl8xQM4IePWbBn/wXMtqX3synE/UX4fqfxC/feeaUly8
qdyoefM0I4JOabjn9i9N6KyeyU/sjI4aTRhd5YwaCdy6wK2u9hQEqvER5XdFZjhN
HnuFKlBKEH5fo3Wwq5X+PbX8hP4szl3DE7QEplTlMjrh0y7xIc/ndugp/f3IN2Ri
0/Da6FpqjtBiis+wkyGWwwfmw/erRRaLUhwbuW3MjCQTmtbcjkIjA2xb4nxtcT+n
e/sCdWkgf1PX15igHpbWAynlv+jF3Ue+Kr/hrWi/H50fWiZh9skUmUxJ92SlI4fK
DT1N430xvoBpQwynASAM6hcWWgLJlFBo11aZA2fl++ORC3z/dbCZ5iLPdbIvaemF
KSmQTnrk4sVH8sPl9aIHqCFxj78uTHkSzLGQyq76AWU/s8hGzPszpU6iPwlcx1wk
WpkMLG5FQr0a4vKnNx1FVLCmwalPlB6/ZclRvKuSiEGZWTrnOo5KhPy1+Yn39AUo
IDBFnJkJNWxUVyb6zK9OJJ3Tsd6c1qeBEqb0VzUDGmw2umTuEAKqlqNJAohxVvyP
A7GFrlg9FvsDAwM=
=ZCsN
-----END PGP SIGNATURE-----
Merge tag 'lsm-pr-20241129' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm
Pull ima fix from Paul Moore:
"One small patch to fix a function parameter / local variable naming
snafu that went up to you in the current merge window"
* tag 'lsm-pr-20241129' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
ima: uncover hidden variable in ima_match_rules()
The variable name "prop" is inadvertently used twice in
ima_match_rules(), resulting in incorrect use of the local
variable when the function parameter should have been.
Rename the local variable and correct the use of the parameter.
Suggested-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Acked-by: Roberto Sassu <roberto.sassu@huawei.com>
[PM: subj tweak, Roberto's ACK]
Signed-off-by: Paul Moore <paul@paul-moore.com>
the kernel test robot reports a C23 extension
warning: label followed by a declaration is a C23 extension
[-Wc23-extensions]
696 | struct aa_profile *new_profile = NULL;
Instead of adding a null statement creating a C99 style inline var
declaration lift the label declaration out of the block so that it no
longer immediatedly follows the label.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202411101808.AI8YG6cs-lkp@intel.com/
Fixes: ee650b3820f3 ("apparmor: properly handle cx/px lookup failure for complain")
Signed-off-by: John Johansen <john.johansen@canonical.com>
The wording of 'scrubbing environment' implied that all environment
variables would be removed, when instead secure-execution mode only
removes a small number of environment variables. This patch updates the
wording to describe what actually occurs instead: setting AT_SECURE for
ld.so's secure-execution mode.
Link: https://gitlab.com/apparmor/apparmor/-/merge_requests/1315 is a
merge request that does similar updating for apparmor userspace.
Signed-off-by: Ryan Lee <ryan.lee@canonical.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
The macros for label combination XXX_comb are no longer used and there
are no plans to use them so remove the dead code.
Signed-off-by: John Johansen <john.johansen@canonical.com>
In the macro definition of next_comb(), a parameter L1 is accepted,
but it is not used. Hence, it should be removed.
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
The previous audit_cap cache deduping was based on the profile that was
being audited. This could cause confusion due to the deduplication then
occurring across multiple processes, which could happen if multiple
instances of binaries matched the same profile attachment (and thus ran
under the same profile) or a profile was attached to a container and its
processes.
Instead, perform audit_cap deduping over ad->subj_cred, which ensures the
deduping only occurs across a single process, instead of across all
processes that match the current one's profile.
Signed-off-by: Ryan Lee <ryan.lee@canonical.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
When auditing capabilities, AppArmor uses a per-CPU, per-profile cache
such that the same capability for the same profile doesn't get repeatedly
audited, with the original goal of reducing audit logspam. However, this
cache does not have an expiration time, resulting in confusion when a
profile is shared across binaries (for example) and an expected DENIED
audit entry doesn't appear, despite the cache entry having been populated
much longer ago. This confusion was exacerbated by the per-CPU nature of
the cache resulting in the expected entries sporadically appearing when
the later denial+audit occurred on a different CPU.
To resolve this, record the last time a capability was audited for a
profile and add a timestamp expiration check before doing the audit.
v1 -> v2:
- Hardcode a longer timeout and drop the patches making it a sysctl,
after discussion with John Johansen.
- Cache the expiration time instead of the last-audited time. This value
can never be zero, which lets us drop the kernel_cap_t caps field from
the cache struct.
Signed-off-by: Ryan Lee <ryan.lee@canonical.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
The profile_capabile function takes a struct apparmor_audit_data *ad,
which is documented as possibly being NULL. However, the single place that
calls this function never passes it a NULL ad. If we were ever to call
profile_capable with a NULL ad elsewhere, we would need to rework the
function, as its very first use of ad is to dereference ad->class without
checking if ad is NULL.
Thus, document profile_capable's ad parameter as not accepting NULL.
Signed-off-by: Ryan Lee <ryan.lee@canonical.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
Multiple profiles shared 'ent->caps', so some logs missed.
Fixes: 0ed3b28ab8 ("AppArmor: mediation of non file objects")
Signed-off-by: chao liu <liuzgyid@outlook.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
Add a comment to unpack_perm to document the first entry in the packed
perms struct is reserved, and make a non-functional change of unpacking
to a temporary stack variable named "reserved" to help suppor the
documentation of which value is reserved.
Suggested-by: Serge E. Hallyn <serge@hallyn.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
aa_label_audit, aa_label_find, aa_label_seq_print and aa_update_label_name
were added by commit
f1bd904175 ("apparmor: add the base fns() for domain labels")
but never used.
aa_profile_label_perm was added by commit
637f688dc3 ("apparmor: switch from profiles to using labels on contexts")
but never used.
aa_secid_update was added by commit
c092921219 ("apparmor: add support for mapping secids and using secctxes")
but never used.
aa_split_fqname has been unused since commit
3664268f19 ("apparmor: add namespace lookup fns()")
aa_lookup_profile has been unused since commit
93c98a484c ("apparmor: move exec domain mediation to using labels")
aa_audit_perms_cb was only used by aa_profile_label_perm (see above).
All of these commits are from around 2017.
Remove them.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Signed-off-by: John Johansen <john.johansen@canonical.com>
Since kvfree() already checks if its argument is NULL, an additional
check before calling kvfree() is unnecessary and can be removed.
Remove it and the following Coccinelle/coccicheck warning reported by
ifnullfree.cocci:
WARNING: NULL check before some freeing functions is not needed
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: John Johansen <john.johansen@canonical.com>
Use the IS_ERR_OR_NULL() helper instead of open-coding a
NULL and an error pointer checks to simplify the code and
improve readability.
Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
Currently the dfa state machine is limited by its default, next, and
check tables using u16. Allow loading of u32 tables, and if u16 tables
are loaded map them to u32.
The number of states allowed does not increase to 2^32 because the
base table uses the top 8 bits of its u32 for flags. Moving the flags
into a separate table allowing a full 2^32 bit range wil be done in
a separate patch.
Link: https://gitlab.com/apparmor/apparmor/-/issues/419
Signed-off-by: John Johansen <john.johansen@canonical.com>
mode profiles
When a cx/px lookup fails, apparmor would deny execution of the binary
even in complain mode (where it would audit as allowing execution while
actually denying it). Instead, in complain mode, create a new learning
profile, just as would have been done if the cx/px line wasn't there.
Signed-off-by: Ryan Lee <ryan.lee@canonical.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
attach->xmatch was not set when allocating a null profile, which is used in
complain mode to allocate a learning profile. This was causing downstream
failures in find_attach, which expected a valid xmatch but did not find
one under a certain sequence of profile transitions in complain mode.
This patch ensures the xmatch is set up properly for null profiles.
Signed-off-by: Ryan Lee <ryan.lee@canonical.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
performs some cleanups in the resource management code.
- The series "Improve the copy of task comm" from Yafang Shao addresses
possible race-induced overflows in the management of task_struct.comm[].
- The series "Remove unnecessary header includes from
{tools/}lib/list_sort.c" from Kuan-Wei Chiu adds some cleanups and a
small fix to the list_sort library code and to its selftest.
- The series "Enhance min heap API with non-inline functions and
optimizations" also from Kuan-Wei Chiu optimizes and cleans up the
min_heap library code.
- The series "nilfs2: Finish folio conversion" from Ryusuke Konishi
finishes off nilfs2's folioification.
- The series "add detect count for hung tasks" from Lance Yang adds more
userspace visibility into the hung-task detector's activity.
- Apart from that, singelton patches in many places - please see the
individual changelogs for details.
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZ0L6lQAKCRDdBJ7gKXxA
jmEIAPwMSglNPKRIOgzOvHh8MUJW1Dy8iKJ2kWCO3f6QTUIM2AEA+PazZbUd/g2m
Ii8igH0UBibIgva7MrCyJedDI1O23AA=
=8BIU
-----END PGP SIGNATURE-----
Merge tag 'mm-nonmm-stable-2024-11-24-02-05' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
- The series "resource: A couple of cleanups" from Andy Shevchenko
performs some cleanups in the resource management code
- The series "Improve the copy of task comm" from Yafang Shao addresses
possible race-induced overflows in the management of
task_struct.comm[]
- The series "Remove unnecessary header includes from
{tools/}lib/list_sort.c" from Kuan-Wei Chiu adds some cleanups and a
small fix to the list_sort library code and to its selftest
- The series "Enhance min heap API with non-inline functions and
optimizations" also from Kuan-Wei Chiu optimizes and cleans up the
min_heap library code
- The series "nilfs2: Finish folio conversion" from Ryusuke Konishi
finishes off nilfs2's folioification
- The series "add detect count for hung tasks" from Lance Yang adds
more userspace visibility into the hung-task detector's activity
- Apart from that, singelton patches in many places - please see the
individual changelogs for details
* tag 'mm-nonmm-stable-2024-11-24-02-05' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (71 commits)
gdb: lx-symbols: do not error out on monolithic build
kernel/reboot: replace sprintf() with sysfs_emit()
lib: util_macros_kunit: add kunit test for util_macros.h
util_macros.h: fix/rework find_closest() macros
Improve consistency of '#error' directive messages
ocfs2: fix uninitialized value in ocfs2_file_read_iter()
hung_task: add docs for hung_task_detect_count
hung_task: add detect count for hung tasks
dma-buf: use atomic64_inc_return() in dma_buf_getfile()
fs/proc/kcore.c: fix coccinelle reported ERROR instances
resource: avoid unnecessary resource tree walking in __region_intersects()
ocfs2: remove unused errmsg function and table
ocfs2: cluster: fix a typo
lib/scatterlist: use sg_phys() helper
checkpatch: always parse orig_commit in fixes tag
nilfs2: convert metadata aops from writepage to writepages
nilfs2: convert nilfs_recovery_copy_block() to take a folio
nilfs2: convert nilfs_page_count_clean_buffers() to take a folio
nilfs2: remove nilfs_writepage
nilfs2: convert checkpoint file to be folio-based
...
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmc/WikACgkQnJ2qBz9k
QNnZdwf9FfT95zhnNWk3ohNOh5BO0P/uTY2fNkQBDPLPY3Bi8nywPIjXYCDSOgX1
SBV0rakkWp+rVO1/qkg5J1mUvBoefzT7O17rG0LfRw3zjHPX+XeO+e3Xf/kPmJHJ
3fvN//VTZQ6uPcn8PWgLe8VVQqNXD3nlUrwz/JKaxyodsdm0ERej4QZjG6Cikotk
aKuDPAnOiS37/lIFZGdJRca/rwJPwMekNt1SxVrnmin0/QfB/Uubba2+NNdQ+z3W
SCA/26PK822T3ELB8BkfwpdINC17WUwDJlkC8qha/JRzDlxJC/ysr43fHn/7Adfb
CthG8V4JDGm51jcC0qe0Yk2HV75U4A==
=htHs
-----END PGP SIGNATURE-----
Merge tag 'fsnotify_for_v6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull fsnotify updates from Jan Kara:
"A couple of smaller random fsnotify fixes"
* tag 'fsnotify_for_v6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
fsnotify: Fix ordering of iput() and watched_objects decrement
fsnotify: fix sending inotify event with unexpected filename
fanotify: allow reporting errors on failure to open fd
fsnotify, lsm: Decouple fsnotify from lsm
API:
- Add sig driver API.
- Remove signing/verification from akcipher API.
- Move crypto_simd_disabled_for_test to lib/crypto.
- Add WARN_ON for return values from driver that indicates memory corruption.
Algorithms:
- Provide crc32-arch and crc32c-arch through Crypto API.
- Optimise crc32c code size on x86.
- Optimise crct10dif on arm/arm64.
- Optimise p10-aes-gcm on powerpc.
- Optimise aegis128 on x86.
- Output full sample from test interface in jitter RNG.
- Retry without padata when it fails in pcrypt.
Drivers:
- Add support for Airoha EN7581 TRNG.
- Add support for STM32MP25x platforms in stm32.
- Enable iproc-r200 RNG driver on BCMBCA.
- Add Broadcom BCM74110 RNG driver.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEn51F/lCuNhUwmDeSxycdCkmxi6cFAmc6sQsACgkQxycdCkmx
i6dfHxAAnkI65TE6agZq9DlkEU4ZqOsxxdk0MsGIhbCUTxW3KENzu9vtKjnvg9T/
Ou0d2J49ny87Y4zaA59Wf/Q1+gg5YSQR5kelonpfrPLkCkJjr72HZpyCHv8TTzEC
uHHoVj9cnPIF5/yfiqQsrWT1ACip9vn+slyVPaMJV1qR6gnvnSALtsg4e/vKHkn7
ZMaf2pZ2ROYXdB02nMK5KQcCrxD64MQle/yQepY44eYjnT+XclkqPdi6o1nUSpj/
RFAeY0jFSTu0pj3DqT48TnU/LiiNLlFOZrGjCdEySoac63vmTtKqfYDmrRaFz4hB
sucxbgJ3xnnYseRijtfXnxaD/IkDJln+ipGNQKAZLfOVMDCTxPdYGmOpobMTXMS+
0sY0eAHgqr23P9pOp+sOzcAEFIqg6llAYQVWx3Zl4vpXBUuxzg6AqmHnPicnck7y
Lw1cJhQxij2De3dG2ZL/0dgQxMjGN/YfCM8SSg6l+Xn3j4j47rqJNH2ZsmXtbJ2n
kTkmemmWdgRR1IvgQQGsvyKs9ThkcEDW+IzW26SUv3Clvru2NSkX4ZPHbezZQf+D
R0wMZsW3Fw7Zymerz1GIBSqdLnsyFWtIAjukDpOR6ordPgOBeDt76v6tw5vL2/II
KYoeN1pdEEecwuhAsEvCryT5ZG4noBeNirf/ElWAfEybgcXiTks=
=T8pa
-----END PGP SIGNATURE-----
Merge tag 'v6.13-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto updates from Herbert Xu:
"API:
- Add sig driver API
- Remove signing/verification from akcipher API
- Move crypto_simd_disabled_for_test to lib/crypto
- Add WARN_ON for return values from driver that indicates memory
corruption
Algorithms:
- Provide crc32-arch and crc32c-arch through Crypto API
- Optimise crc32c code size on x86
- Optimise crct10dif on arm/arm64
- Optimise p10-aes-gcm on powerpc
- Optimise aegis128 on x86
- Output full sample from test interface in jitter RNG
- Retry without padata when it fails in pcrypt
Drivers:
- Add support for Airoha EN7581 TRNG
- Add support for STM32MP25x platforms in stm32
- Enable iproc-r200 RNG driver on BCMBCA
- Add Broadcom BCM74110 RNG driver"
* tag 'v6.13-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (112 commits)
crypto: marvell/cesa - fix uninit value for struct mv_cesa_op_ctx
crypto: cavium - Fix an error handling path in cpt_ucode_load_fw()
crypto: aesni - Move back to module_init
crypto: lib/mpi - Export mpi_set_bit
crypto: aes-gcm-p10 - Use the correct bit to test for P10
hwrng: amd - remove reference to removed PPC_MAPLE config
crypto: arm/crct10dif - Implement plain NEON variant
crypto: arm/crct10dif - Macroify PMULL asm code
crypto: arm/crct10dif - Use existing mov_l macro instead of __adrl
crypto: arm64/crct10dif - Remove remaining 64x64 PMULL fallback code
crypto: arm64/crct10dif - Use faster 16x64 bit polynomial multiply
crypto: arm64/crct10dif - Remove obsolete chunking logic
crypto: bcm - add error check in the ahash_hmac_init function
crypto: caam - add error check to caam_rsa_set_priv_key_form
hwrng: bcm74110 - Add Broadcom BCM74110 RNG driver
dt-bindings: rng: add binding for BCM74110 RNG
padata: Clean up in padata_do_multithreaded()
crypto: inside-secure - Fix the return value of safexcel_xcbcmac_cra_init()
crypto: qat - Fix missing destroy_workqueue in adf_init_aer()
crypto: rsassa-pkcs1 - Reinstate support for legacy protocols
...
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmcztFcUHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXPvFQ/+KYwRe3g6gFSu7tRA34okHtUopvpF
KGAaic06c8oy85gSX4B2Xk4HINCgXVUuRi9Z+0yExRWvvBXRRdQRUj1Vdbj4KOEG
sRsIA1j1YhPU3wyhkAqwpJ97sQE1v9Xb3xizGwTfQKGQkd+cvtHg0QKM08/jPQYq
bbbcSxoVsUzh8+idAq1UMfdoTsMh2xeCW7Q1+dbBINJykNzKiqEEc21xgBxeomST
lSG9XFP3BJr1RBlb4Ux+J8YL+2G/rDBWZh1sR5+t31kgClSgs3CMBRFdTATvplKk
e9vrcUF8wR7xWWnDmmdobHa462qUt6BWifYarX9RTomGBugZfYDOR/C+jpb+xZwd
+tZfL6HSOVeBtQ/Zu1bs18eS5i2dj7GxFN7GPY2qXIPvsW5Acwcx1CCK6oNDmX05
1cOaNuZRYBDye4eAnT3yufnJ34VO80UQIfKTE6dqrX0XtCFYomTxb+Km0qM3utl5
ubr3Krp6GmVs65lIvtnIhDKSlcNIBbJfH64vdQNnOn/8FvkovGqp2eaX+0wBhROM
8KgbqntXU4/DgQuDiP01g13mTDeTGdcfyRWKcKMI/CzI/WASPZBpVuqX6xWXh3bs
NlZmJ/7+Y48Xp2FvaEchQ/A8ppyIrigMLloZ8yAHf2P1z9g6wBNRCrsScdSQVx63
ArxHLRY44pUOnPs=
=m/yY
-----END PGP SIGNATURE-----
Merge tag 'lsm-pr-20241112' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm
Pull lsm updates from Paul Moore:
"Thirteen patches, all focused on moving away from the current 'secid'
LSM identifier to a richer 'lsm_prop' structure.
This move will help reduce the translation that is necessary in many
LSMs, offering better performance, and make it easier to support
different LSMs in the future"
* tag 'lsm-pr-20241112' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
lsm: remove lsm_prop scaffolding
netlabel,smack: use lsm_prop for audit data
audit: change context data from secid to lsm_prop
lsm: create new security_cred_getlsmprop LSM hook
audit: use an lsm_prop in audit_names
lsm: use lsm_prop in security_inode_getsecid
lsm: use lsm_prop in security_current_getsecid
audit: update shutdown LSM data
lsm: use lsm_prop in security_ipc_getsecid
audit: maintain an lsm_prop in audit_context
lsm: add lsmprop_to_secctx hook
lsm: use lsm_prop in security_audit_rule_match
lsm: add the lsm_prop data structure
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmcztEsUHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXN8qg//eUYYuhO4LWY3Qq5P4pNbaJ40Mx8Y
MSjYoN345nwLnWN1T3JTFik6Se6RKQmqMTd5xtdphnaP1NP3dtxTiV3nWWdP+/Ak
JNwGAGrjX8JOep8KNEkUbH219iuFgUvvYIfIXaEswe6AAgtK1A7VDwAkSVdeoenD
Ll0xpwKiZppxnDrwHtyB7JwPFVxsx4ctUOz8u7HBEyGDXPbiDmAGvLNwWbcmPSb1
EndFPdxIOsNaipl8NcQEBz5x5t/r/qVhXkSbalx5o5eAouXHfr4ArurgGV69TRDM
3Xqr8RkS6nkA+/rvTUxe2JF4IZ7MTD61+iAFxgsj4cnVavI3oszTfCy1j45auAt9
QoRVAIQgJv/f7DI15A/0u2ZuGwCBAPFn6lG34jHauI/LQ1f9s1w/anSLYXOzxw74
NmC2eYedznqemDP1DUPUjpp06/Nm88eEvrfsl9lTCY3cN8wAaFWEDhAFCu2IbDQM
bpl8/rNoVKE1v2+p3WmXnug9DRs2JF6gvjlo/HPEHxv/hfO0rbTrb6cPcMd7BUXB
ZM1D45oj5lPaOR+by7AaFzSL0zZiyMa5f59Jib7cvIXTy9t2aXGiHps2kbnRdIgx
3JfJIWf7TSA8HPzkU766nGvBaEWUumWbKka+SVuSv/I2A9lP2RskprfSEEvCP/5P
ysmXzAwulqfDY30=
=Jo+V
-----END PGP SIGNATURE-----
Merge tag 'selinux-pr-20241112' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull selinux updates from Paul Moore:
- Add support for netlink xperms
Some time ago we added the concept of "xperms" to the SELinux policy
so that we could write policy for individual ioctls, this builds upon
this by using extending xperms to netlink so that we can write
SELinux policy for individual netlnk message types and not rely on
the fairly coarse read/write mapping tables we currently have.
There are limitations involving generic netlink due to the
multiplexing that is done, but it's no worse that what we currently
have. As usual, more information can be found in the commit message.
- Deprecate /sys/fs/selinux/user
We removed the only known userspace use of this back in 2020 and now
that several years have elapsed we're starting down the path of
deprecating it in the kernel.
- Cleanup the build under scripts/selinux
A couple of patches to move the genheaders tool under
security/selinux and correct our usage of kernel headers in the tools
located under scripts/selinux. While these changes originated out of
an effort to build Linux on different systems, they are arguably the
right thing to do regardless.
- Minor code cleanups and style fixes
Not much to say here, two minor cleanup patches that came out of the
netlink xperms work
* tag 'selinux-pr-20241112' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: Deprecate /sys/fs/selinux/user
selinux: apply clang format to security/selinux/nlmsgtab.c
selinux: streamline selinux_nlmsg_lookup()
selinux: Add netlink xperm support
selinux: move genheaders to security/selinux/
selinux: do not include <linux/*.h> headers from host programs
Making sure that struct fd instances are destroyed in the same
scope where they'd been created, getting rid of reassignments
and passing them by reference, converting to CLASS(fd{,_pos,_raw}).
We are getting very close to having the memory safety of that stuff
trivial to verify.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCZzdikAAKCRBZ7Krx/gZQ
69nJAQCmbQHK3TGUbQhOw6MJXOK9ezpyEDN3FZb4jsu38vTIdgEA6OxAYDO2m2g9
CN18glYmD3wRyU6Bwl4vGODouSJvDgA=
=gVH3
-----END PGP SIGNATURE-----
Merge tag 'pull-fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull 'struct fd' class updates from Al Viro:
"The bulk of struct fd memory safety stuff
Making sure that struct fd instances are destroyed in the same scope
where they'd been created, getting rid of reassignments and passing
them by reference, converting to CLASS(fd{,_pos,_raw}).
We are getting very close to having the memory safety of that stuff
trivial to verify"
* tag 'pull-fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (28 commits)
deal with the last remaing boolean uses of fd_file()
css_set_fork(): switch to CLASS(fd_raw, ...)
memcg_write_event_control(): switch to CLASS(fd)
assorted variants of irqfd setup: convert to CLASS(fd)
do_pollfd(): convert to CLASS(fd)
convert do_select()
convert vfs_dedupe_file_range().
convert cifs_ioctl_copychunk()
convert media_request_get_by_fd()
convert spu_run(2)
switch spufs_calls_{get,put}() to CLASS() use
convert cachestat(2)
convert do_preadv()/do_pwritev()
fdget(), more trivial conversions
fdget(), trivial conversions
privcmd_ioeventfd_assign(): don't open-code eventfd_ctx_fdget()
o2hb_region_dev_store(): avoid goto around fdget()/fdput()
introduce "fd_pos" class, convert fdget_pos() users to it.
fdget_raw() users: switch to CLASS(fd_raw)
convert vmsplice() to CLASS(fd)
...
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZzcW4gAKCRCRxhvAZXjc
okF+AP9xTMb2SlnRPBOBd9yFcmVXmQi86TSCUPAEVb+wIldGYwD/RIOdvXYJlp9v
RgJkU1DC3ddkXtONNDY6gFaP+siIWA0=
=gMc7
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.13.file' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs file updates from Christian Brauner:
"This contains changes the changes for files for this cycle:
- Introduce a new reference counting mechanism for files.
As atomic_inc_not_zero() is implemented with a try_cmpxchg() loop
it has O(N^2) behaviour under contention with N concurrent
operations and it is in a hot path in __fget_files_rcu().
The rcuref infrastructures remedies this problem by using an
unconditional increment relying on safe- and dead zones to make
this work and requiring rcu protection for the data structure in
question. This not just scales better it also introduces overflow
protection.
However, in contrast to generic rcuref, files require a memory
barrier and thus cannot rely on *_relaxed() atomic operations and
also require to be built on atomic_long_t as having massive amounts
of reference isn't unheard of even if it is just an attack.
This adds a file specific variant instead of making this a generic
library.
This has been tested by various people and it gives consistent
improvement up to 3-5% on workloads with loads of threads.
- Add a fastpath for find_next_zero_bit(). Skip 2-levels searching
via find_next_zero_bit() when there is a free slot in the word that
contains the next fd. This improves pts/blogbench-1.1.0 read by 8%
and write by 4% on Intel ICX 160.
- Conditionally clear full_fds_bits since it's very likely that a bit
in full_fds_bits has been cleared during __clear_open_fds(). This
improves pts/blogbench-1.1.0 read up to 13%, and write up to 5% on
Intel ICX 160.
- Get rid of all lookup_*_fdget_rcu() variants. They were used to
lookup files without taking a reference count. That became invalid
once files were switched to SLAB_TYPESAFE_BY_RCU and now we're
always taking a reference count. Switch to an already existing
helper and remove the legacy variants.
- Remove pointless includes of <linux/fdtable.h>.
- Avoid cmpxchg() in close_files() as nobody else has a reference to
the files_struct at that point.
- Move close_range() into fs/file.c and fold __close_range() into it.
- Cleanup calling conventions of alloc_fdtable() and expand_files().
- Merge __{set,clear}_close_on_exec() into one.
- Make __set_open_fd() set cloexec as well instead of doing it in two
separate steps"
* tag 'vfs-6.13.file' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
selftests: add file SLAB_TYPESAFE_BY_RCU recycling stressor
fs: port files to file_ref
fs: add file_ref
expand_files(): simplify calling conventions
make __set_open_fd() set cloexec state as well
fs: protect backing files with rcu
file.c: merge __{set,clear}_close_on_exec()
alloc_fdtable(): change calling conventions.
fs/file.c: add fast path in find_next_fd()
fs/file.c: conditionally clear full_fds
fs/file.c: remove sanity_check and add likely/unlikely in alloc_fd()
move close_range(2) into fs/file.c, fold __close_range() into it
close_files(): don't bother with xchg()
remove pointless includes of <linux/fdtable.h>
get rid of ...lookup...fdget_rcu() family
-----BEGIN PGP SIGNATURE-----
iIoEABYKADIWIQQdXVVFGN5XqKr1Hj7LwZzRsCrn5QUCZzNj9BQcem9oYXJAbGlu
dXguaWJtLmNvbQAKCRDLwZzRsCrn5QKDAQCkbTcWVTnMrdz/0hV9JVmoLCFs6GWZ
cTjaBApOQge1pgD/bTQGJ0fYP6sWEzMPSTMXr6uJaJtlmpsGdPNoOmKUTQU=
=+K7B
-----END PGP SIGNATURE-----
Merge tag 'integrity-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity
Pull integrity fixes from Mimi Zohar:
"One bug fix, one performance improvement, and the use of
static_assert:
- The bug fix addresses "only a cosmetic change" commit, which didn't
take into account the original 'ima' template definition.
- The performance improvement limits the atomic_read()"
* tag 'integrity-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
integrity: Use static_assert() to check struct sizes
evm: stop avoidably reading i_writecount in evm_file_release
ima: fix buffer overrun in ima_eventdigest_init_common
Do not walk through the domain hierarchy when the required scope is not
supported by this domain. This is the same approach as for filesystem
and network restrictions.
Cc: Mikhail Ivanov <ivanov.mikhail1@huawei-partners.com>
Cc: Tahera Fahimi <fahimitahera@gmail.com>
Reviewed-by: Günther Noack <gnoack@google.com>
Link: https://lore.kernel.org/r/20241109110856.222842-4-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Replace get_raw_handled_fs_accesses() with a generic
landlock_union_access_masks(), and replace get_fs_domain() with a
generic landlock_get_applicable_domain(). These helpers will also be
useful for other types of access.
Cc: Mikhail Ivanov <ivanov.mikhail1@huawei-partners.com>
Reviewed-by: Günther Noack <gnoack@google.com>
Link: https://lore.kernel.org/r/20241109110856.222842-2-mic@digikod.net
[mic: Slightly improve doc as suggested by Günther]
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Quoted from Linus [0]:
selinux never wanted a lock, and never wanted any kind of *consistent*
result, it just wanted a *stable* result.
Using get_task_comm() to read the task comm ensures that the name is
always NUL-terminated, regardless of the source string. This approach also
facilitates future extensions to the task comm.
Link: https://lkml.kernel.org/r/20241007144911.27693-4-laoar.shao@gmail.com
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Link: https://lore.kernel.org/all/CAHk-=wivfrF0_zvf+oj6==Sh=-npJooP8chLPEfaFV0oNYTTBA@mail.gmail.com/ [0]
Acked-by: Paul Moore <paul@paul-moore.com>
Cc: James Morris <jmorris@namei.org>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Stephen Smalley <stephen.smalley.work@gmail.com>
Cc: Ondrej Mosnacek <omosnace@redhat.com>
Cc: Alejandro Colomar <alx@kernel.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Airlie <airlied@gmail.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Eric Paris <eparis@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Justin Stitt <justinstitt@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Matus Jokay <matus.jokay@stuba.sk>
Cc: Maxime Ripard <mripard@kernel.org>
Cc: Quentin Monnet <qmo@kernel.org>
Cc: Simon Horman <horms@kernel.org>
Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
When sealing or unsealing a key blob we currently do not wait for
the AEAD cipher operation to finish and simply return after submitting
the request. If there is some load on the system we can exit before
the cipher operation is done and the buffer we read from/write to
is already removed from the stack. This will e.g. result in NULL
pointer dereference errors in the DCP driver during blob creation.
Fix this by waiting for the AEAD cipher operation to finish before
resuming the seal and unseal calls.
Cc: stable@vger.kernel.org # v6.10+
Fixes: 0e28bf61a5 ("KEYS: trusted: dcp: fix leak of blob encryption key")
Reported-by: Parthiban N <parthiban@linumiz.com>
Closes: https://lore.kernel.org/keyrings/254d3bb1-6dbc-48b4-9c08-77df04baee2f@linumiz.com/
Signed-off-by: David Gstir <david@sigma-star.at>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
KASAN reports an out of bounds read:
BUG: KASAN: slab-out-of-bounds in __kuid_val include/linux/uidgid.h:36
BUG: KASAN: slab-out-of-bounds in uid_eq include/linux/uidgid.h:63 [inline]
BUG: KASAN: slab-out-of-bounds in key_task_permission+0x394/0x410
security/keys/permission.c:54
Read of size 4 at addr ffff88813c3ab618 by task stress-ng/4362
CPU: 2 PID: 4362 Comm: stress-ng Not tainted 5.10.0-14930-gafbffd6c3ede #15
Call Trace:
__dump_stack lib/dump_stack.c:82 [inline]
dump_stack+0x107/0x167 lib/dump_stack.c:123
print_address_description.constprop.0+0x19/0x170 mm/kasan/report.c:400
__kasan_report.cold+0x6c/0x84 mm/kasan/report.c:560
kasan_report+0x3a/0x50 mm/kasan/report.c:585
__kuid_val include/linux/uidgid.h:36 [inline]
uid_eq include/linux/uidgid.h:63 [inline]
key_task_permission+0x394/0x410 security/keys/permission.c:54
search_nested_keyrings+0x90e/0xe90 security/keys/keyring.c:793
This issue was also reported by syzbot.
It can be reproduced by following these steps(more details [1]):
1. Obtain more than 32 inputs that have similar hashes, which ends with the
pattern '0xxxxxxxe6'.
2. Reboot and add the keys obtained in step 1.
The reproducer demonstrates how this issue happened:
1. In the search_nested_keyrings function, when it iterates through the
slots in a node(below tag ascend_to_node), if the slot pointer is meta
and node->back_pointer != NULL(it means a root), it will proceed to
descend_to_node. However, there is an exception. If node is the root,
and one of the slots points to a shortcut, it will be treated as a
keyring.
2. Whether the ptr is keyring decided by keyring_ptr_is_keyring function.
However, KEYRING_PTR_SUBTYPE is 0x2UL, the same as
ASSOC_ARRAY_PTR_SUBTYPE_MASK.
3. When 32 keys with the similar hashes are added to the tree, the ROOT
has keys with hashes that are not similar (e.g. slot 0) and it splits
NODE A without using a shortcut. When NODE A is filled with keys that
all hashes are xxe6, the keys are similar, NODE A will split with a
shortcut. Finally, it forms the tree as shown below, where slot 6 points
to a shortcut.
NODE A
+------>+---+
ROOT | | 0 | xxe6
+---+ | +---+
xxxx | 0 | shortcut : : xxe6
+---+ | +---+
xxe6 : : | | | xxe6
+---+ | +---+
| 6 |---+ : : xxe6
+---+ +---+
xxe6 : : | f | xxe6
+---+ +---+
xxe6 | f |
+---+
4. As mentioned above, If a slot(slot 6) of the root points to a shortcut,
it may be mistakenly transferred to a key*, leading to a read
out-of-bounds read.
To fix this issue, one should jump to descend_to_node if the ptr is a
shortcut, regardless of whether the node is root or not.
[1] https://lore.kernel.org/linux-kernel/1cfa878e-8c7b-4570-8606-21daf5e13ce7@huaweicloud.com/
[jarkko: tweaked the commit message a bit to have an appropriate closes
tag.]
Fixes: b2a4df200d ("KEYS: Expand the capacity of a keyring")
Reported-by: syzbot+5b415c07907a2990d1a3@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/000000000000cbb7860611f61147@google.com/T/
Signed-off-by: Chen Ridong <chenridong@huawei.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
all failure exits prior to fdget() leave the scope, all matching fdput()
are immediately followed by leaving the scope.
[xfs_ioc_commit_range() chunk moved here as well]
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
fdget() is the first thing done in scope, all matching fdput() are
immediately followed by leaving the scope.
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
If enabled, we fallback to the platform keyring if the trusted keyring
doesn't have the key used to sign the ipe policy. But if pkcs7_verify()
rejects the key for other reasons, such as usage restrictions, we do not
fallback. Do so, following the same change in dm-verity.
Signed-off-by: Luca Boccassi <bluca@debian.org>
Suggested-by: Serge Hallyn <serge@hallyn.com>
[FW: fixed some line length issues and a typo in the commit message]
Signed-off-by: Fan Wu <wufan@kernel.org>
The current policy management makes it impossible to use IPE
in a general purpose distribution. In such cases the users are not
building the kernel, the distribution is, and access to the private
key included in the trusted keyring is, for obvious reason, not
available.
This means that users have no way to enable IPE, since there will
be no built-in generic policy, and no access to the key to sign
updates validated by the trusted keyring.
Just as we do for dm-verity, kernel modules and more, allow the
secondary and platform keyrings to also validate policies. This
allows users enrolling their own keys in UEFI db or MOK to also
sign policies, and enroll them. This makes it sensible to enable
IPE in general purpose distributions, as it becomes usable by
any user wishing to do so. Keys in these keyrings can already
load kernels and kernel modules, so there is no security
downgrade.
Add a kconfig each, like dm-verity does, but default to enabled if
the dependencies are available.
Signed-off-by: Luca Boccassi <bluca@debian.org>
Reviewed-by: Serge Hallyn <serge@hallyn.com>
[FW: fixed some style issues]
Signed-off-by: Fan Wu <wufan@kernel.org>
Currently IPE accepts an update that has the same version as the policy
being updated, but it doesn't make it a no-op nor it checks that the
old and new policyes are the same. So it is possible to change the
content of a policy, without changing its version. This is very
confusing from userspace when managing policies.
Instead change the update logic to reject updates that have the same
version with ESTALE, as that is much clearer and intuitive behaviour.
Signed-off-by: Luca Boccassi <bluca@debian.org>
Reviewed-by: Serge Hallyn <serge@hallyn.com>
Signed-off-by: Fan Wu <wufan@kernel.org>
When loading policies in userspace we want a recognizable error when an
update attempts to use an old policy, as that is an error that needs
to be treated differently from an invalid policy. Use -ESTALE as it is
clear enough for an update mechanism.
Signed-off-by: Luca Boccassi <bluca@debian.org>
Reviewed-by: Serge Hallyn <serge@hallyn.com>
Signed-off-by: Fan Wu <wufan@kernel.org>
Currently, fsnotify_open_perm() is called from security_file_open().
This is a a bit unexpected and creates otherwise unnecessary dependency
of CONFIG_FANOTIFY_ACCESS_PERMISSIONS on CONFIG_SECURITY. Fix this by
calling fsnotify_open_perm() directly.
Signed-off-by: Song Liu <song@kernel.org>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20241013002248.3984442-1-song@kernel.org