__hashtab_insert() in hashtab.h has a cleaner interface that allows the
caller to specify the chain node location that the new node is being
inserted into so that it can update the node that currently occupies it.
Signed-off-by: Jacob Satterfield <jsatterfield.linux@gmail.com>
Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
The "sb_kern_mount" hook has implementation registered in SELinux.
Looking at the function implementation we observe that the "sb"
parameter is not changing.
Mark the "sb" parameter of LSM hook security_sb_kern_mount() as "const"
since it will not be changing in the LSM hook.
Signed-off-by: Khadija Kamran <kamrankhadijadj@gmail.com>
[PM: minor merge fuzzing due to other constification patches]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Three LSMs register the implementations for the 'bprm_committed_creds()'
hook: AppArmor, SELinux and tomoyo. Looking at the function
implementations we may observe that the 'bprm' parameter is not changing.
Mark the 'bprm' parameter of LSM hook security_bprm_committed_creds() as
'const' since it will not be changing in the LSM hook.
Signed-off-by: Khadija Kamran <kamrankhadijadj@gmail.com>
[PM: minor merge fuzzing due to other constification patches]
Signed-off-by: Paul Moore <paul@paul-moore.com>
The 'bprm_committing_creds' hook has implementations registered in
SELinux and Apparmor. Looking at the function implementations we observe
that the 'bprm' parameter is not changing.
Mark the 'bprm' parameter of LSM hook security_bprm_committing_creds()
as 'const' since it will not be changing in the LSM hook.
Signed-off-by: Khadija Kamran <kamrankhadijadj@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
SELinux registers the implementation for the "quotactl" hook. Looking at
the function implementation we observe that the parameter "sb" is not
changing.
Mark the "sb" parameter of LSM hook security_quotactl() as "const" since
it will not be changing in the LSM hook.
Signed-off-by: Khadija Kamran <kamrankhadijadj@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
avtab_read_item() is a hot function called when reading each rule in a
binary policydb. With the current Fedora policy and refpolicy, this
function is called nearly 100,000 times per policy load.
A single avtab node is only permitted to have a single specifier to
describe the data it holds. As such, a check is performed to make sure
only one specifier is set. Previously this was done via a for-loop.
However, there is already an optimal function for finding the number of
bits set (hamming weight) and on some architectures, dedicated
instructions (popcount) which can be executed much more efficiently.
Even when using -mcpu=generic on a x86-64 Fedora 38 VM, this commit
results in a modest 2-4% speedup for policy loading due to a substantial
reduction in the number of instructions executed.
Signed-off-by: Jacob Satterfield <jsatterfield.linux@gmail.com>
Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
The number of buckets is calculated by performing a binary AND against
the mask of the hash table, which is one less than its size (which is a
power of two). This leads to all top bits being discarded, e.g. with
the Reference Policy on Debian there exists 376 entries, leading to a
size of 512, discarding the top 23 bits.
Use jhash to improve the hash table utilization:
# current
roletr: 376 entries and 124/512 buckets used,
longest chain length 8, sum of chain length^2 1496
# patch
roletr: 376 entries and 266/512 buckets used,
longest chain length 4, sum of chain length^2 646
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com>
[PM: line wrap in the commit description]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Instead of dividing by 8 and then performing log2 by hand, use a more
readable calculation.
The behavior of rounddown_pow_of_two() for an input of 0 is undefined,
so handle that case and small values manually to achieve the same
results.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
If the SELinux debug configuration is enabled define the macro DEBUG
such that pr_debug() calls are always enabled, regardless of
CONFIG_DYNAMIC_DEBUG, since those message are the main reason for this
configuration in the first place.
Mention example usage in case CONFIG_DYNAMIC_DEBUG is enabled in the
help section of the configuration.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Print the sum of chain lengths squared as a metric for hash tables to
provide more insights, similar to avtabs.
While on it add a comma in the avtab message to improve readability of
the output.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
selinux_set_mnt_opts() relies on the fact that the mount options pointer
is always NULL when all options are unset (specifically in its
!selinux_initialized() branch. However, the new
selinux_fs_context_submount() hook breaks this rule by allocating a new
structure even if no options are set. That causes any submount created
before a SELinux policy is loaded to be rejected in
selinux_set_mnt_opts().
Fix this by making selinux_fs_context_submount() leave fc->security
set to NULL when there are no options to be copied from the reference
superblock.
Cc: <stable@vger.kernel.org>
Reported-by: Adam Williamson <awilliam@redhat.com>
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2236345
Fixes: d80a8f1b58 ("vfs, security: Fix automount superblock LSM init problem, preventing NFS sb sharing")
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).
As found with Coccinelle[1], add __counted_by for struct sidtab_str_cache.
[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci
Cc: Paul Moore <paul@paul-moore.com>
Cc: Stephen Smalley <stephen.smalley.work@gmail.com>
Cc: Eric Paris <eparis@parisplace.org>
Cc: Ondrej Mosnacek <omosnace@redhat.com>
Cc: selinux@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: "Gustavo A. R. Silva" <gustavoars@kernel.org>
Signed-off-by: Paul Moore <paul@paul-moore.com>
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmTuKLcUHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXM/Eg//cwaOu/ASS08Cz/tfXeKpzg9UpzbW
uHqGtgdE9ZEvS71z+3dorOJVPEwPr+/yviq3FXYjYHFqvVhLZCvYM9rw+eNo/k4T
I95UTchGUsMWwkw61YBDLythfXm2UL5nabjckO81i9UPtxUYOwF6xQMQXYyMcLL8
6fm1vnCvK5FBEXi2HSUWy3Eb3wdviGdHrL6h19Aeew+q8u33asWSxn9vmBSSFEzZ
492//Pgy0t3FA6paWXQRvoR+GvLgBXNOvHB68cAx9vS8Lq6mAwJJSCRrQtKGh2Gd
YInr49f+TXOosD5Tm6ueWO4sr8RzQZ7nPyM+BLue4Yn2ZzdYgjwfHdkHWS1KeH5X
qVqa9s6/QONvkSCzqHs/ne2qio1Q0/0uGgwOkx6N7oVWQWjE7iTYlADwM0CDJnd2
UD7AHTOgpc88x1T1eW599MZttSCznBTSFXv4waaS5/5NT9n8Db1TpTtCTedOc1x2
n+c+F5BHLy69vhSGCanvum/8i2gNoKVyYaHyaMsQxr5LRcLnvN6oOjWIv7jMKxe7
GavUAxU7M5rxPUH44vrrrI+XztKJOdpCz4S0xp+7pSSSGAK5KkmVVLXjzrlGO1WS
55ixxQWYTGK0KlWHp4Ofi6brE9a4ATKcd1XscPN+AtBYX2ufNHLskCZulu/lyrMx
lAy9RRDe1hHWTvg=
=dnm4
-----END PGP SIGNATURE-----
Merge tag 'lsm-pr-20230829' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm
Pull LSM updates from Paul Moore:
- Add proper multi-LSM support for xattrs in the
security_inode_init_security() hook
Historically the LSM layer has only allowed a single LSM to add an
xattr to an inode, with IMA/EVM measuring that and adding its own as
well. As we work towards promoting IMA/EVM to a "proper LSM" instead
of the special case that it is now, we need to better support the
case of multiple LSMs each adding xattrs to an inode and after
several attempts we now appear to have something that is working
well. It is worth noting that in the process of making this change we
uncovered a problem with Smack's SMACK64TRANSMUTE xattr which is also
fixed in this pull request.
- Additional LSM hook constification
Two patches to constify parameters to security_capget() and
security_binder_transfer_file(). While I generally don't make a
special note of who submitted these patches, these were the work of
an Outreachy intern, Khadija Kamran, and that makes me happy;
hopefully it does the same for all of you reading this.
- LSM hook comment header fixes
One patch to add a missing hook comment header, one to fix a minor
typo.
- Remove an old, unused credential function declaration
It wasn't clear to me who should pick this up, but it was trivial,
obviously correct, and arguably the LSM layer has a vested interest
in credentials so I merged it. Sadly I'm now noticing that despite my
subject line cleanup I didn't cleanup the "unsued" misspelling, sigh
* tag 'lsm-pr-20230829' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
lsm: constify the 'file' parameter in security_binder_transfer_file()
lsm: constify the 'target' parameter in security_capget()
lsm: add comment block for security_sk_classify_flow LSM hook
security: Fix ret values doc for security_inode_init_security()
cred: remove unsued extern declaration change_create_files_as()
evm: Support multiple LSMs providing an xattr
evm: Align evm_inode_init_security() definition with LSM infrastructure
smack: Set the SMACK64TRANSMUTE xattr in smack_inode_init_security()
security: Allow all LSMs to provide xattrs for inode_init_security hook
lsm: fix typo in security_file_lock() comment header
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmTuKKAUHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXNTVBAAjFt/+t74FkH7OeIuwa4QFodNHWLS
4AgtudUrH0oE/JnDDZsMNm3gjtFhM0cYcCCvItjNmea9sCMfR/huAaQXuYldLI/w
IF2/n31Q9SlaHF8xgdpPg55tudx7khGoecC+aHA/qsRoikjCvUhOiWJZT4xn92R7
3DtPMyoY7/1Bw8TWhq9Kj4ks2evx+o9Q798Nxu6XKUWZ6X+CasQfH0bAWWcmh8sL
Z/mh5pyICmg9/sHvTdi3/VhhhSiWY0gyaX3tBo7Y3z6J4xjBzLzpJTC2TMaqpqLx
nUO2R02Bk6PgMjQcVYl77WlUOfwPz3L2NkWcCmTBNzyBzirl9pUjVZ7ay9fek4Zf
GSnoZ3IPNLxsK/oU4Zh+LKzkBpq7astovebFd4SYG+KHIhXkp8eblzxM5/M+LfG/
SXnGrLaCwxWp4jWlYubufPtKhF8jFlpzRvTELKrhU9pZCgzpvwvIJZhUGVP6cDWh
7j9RRJy1wBgqoDASM5+tCrplkwFa3r4AV5CVOFYSXgyvLCHk28IxcMEVqCvoO3yQ
FmvAi0MV85XOa/NpD3hYpnvfAcLbv6ftZC6JekqXMjwVE9vfdrywS4ZW4Z3BcEo+
M+eDsCh8ek9LmO+6IW0jffNuHijLKDr3hywnndNWascrmkmgDmu60kW/HIQORkPa
tZkswaKUa6RszR0=
=XB/b
-----END PGP SIGNATURE-----
Merge tag 'selinux-pr-20230829' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull selinux updates from Paul Moore:
"Thirty three SELinux patches, which is a pretty big number for us, but
there isn't really anything scary in here; in fact we actually manage
to remove 10 lines of code with this :)
- Promote the SELinux DEBUG_HASHES macro to CONFIG_SECURITY_SELINUX_DEBUG
The DEBUG_HASHES macro was a buried SELinux specific preprocessor
debug macro that was a problem waiting to happen. Promoting the
debug macro to a proper Kconfig setting should help both improve
the visibility of the feature as well enable improved test
coverage. We've moved some additional debug functions under the
CONFIG_SECURITY_SELINUX_DEBUG flag and we may see more work in the
future.
- Emit a pr_notice() message if virtual memory is executable by default
As this impacts the SELinux access control policy enforcement, if
the system's configuration is such that virtual memory is
executable by default we print a single line notice to the console.
- Drop avtab_search() in favor of avtab_search_node()
Both functions are nearly identical so we removed avtab_search()
and converted the callers to avtab_search_node().
- Add some SELinux network auditing helpers
The helpers not only reduce a small amount of code duplication, but
they provide an opportunity to improve UDP flood performance
slightly by delaying initialization of the audit data in some
cases.
- Convert GFP_ATOMIC allocators to GFP_KERNEL when reading SELinux policy
There were two SELinux policy load helper functions that were
allocating memory using GFP_ATOMIC, they have been converted to
GFP_KERNEL.
- Quiet a KMSAN warning in selinux_inet_conn_request()
A one-line error path (re)set patch that resolves a KMSAN warning.
It is important to note that this doesn't represent a real bug in
the current code, but it quiets KMSAN and arguably hardens the code
against future changes.
- Cleanup the policy capability accessor functions
This is a follow-up to the patch which reverted SELinux to using a
global selinux_state pointer. This patch cleans up some artifacts
of that change and turns each accessor into a one-line READ_ONCE()
call into the policy capabilities array.
- A number of patches from Christian Göttsche
Christian submitted almost two-thirds of the patches in this pull
request as he worked to harden the SELinux code against type
differences, variable overflows, etc.
- Support for separating early userspace from the kernel in policy,
with a later revert
We did have a patch that added a new userspace initial SID which
would allow SELinux to distinguish between early user processes
created before the initial policy load and the kernel itself.
Unfortunately additional post-merge testing revealed a problematic
interaction with an old SELinux userspace on an old version of
Ubuntu so we've reverted the patch until we can resolve the
compatibility issue.
- Remove some outdated comments dealing with LSM hook registration
When we removed the runtime disable functionality we forgot to
remove some old comments discussing the importance of LSM hook
registration ordering.
- Minor administrative changes
Stephen Smalley updated his email address and "debranded" SELinux
from "NSA SELinux" to simply "SELinux". We've come a long way from
the original NSA submission and I would consider SELinux a true
community project at this point so removing the NSA branding just
makes sense"
* tag 'selinux-pr-20230829' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: (33 commits)
selinux: prevent KMSAN warning in selinux_inet_conn_request()
selinux: use unsigned iterator in nlmsgtab code
selinux: avoid implicit conversions in policydb code
selinux: avoid implicit conversions in selinuxfs code
selinux: make left shifts well defined
selinux: update type for number of class permissions in services code
selinux: avoid implicit conversions in avtab code
selinux: revert SECINITSID_INIT support
selinux: use GFP_KERNEL while reading binary policy
selinux: update comment on selinux_hooks[]
selinux: avoid implicit conversions in services code
selinux: avoid implicit conversions in mls code
selinux: use identical iterator type in hashtab_duplicate()
selinux: move debug functions into debug configuration
selinux: log about VM being executable by default
selinux: fix a 0/NULL mistmatch in ad_net_init_from_iif()
selinux: introduce SECURITY_SELINUX_DEBUG configuration
selinux: introduce and use lsm_ad_net_init*() helpers
selinux: update my email address
selinux: add missing newlines in pr_err() statements
...
- Peter Xu has a series (mm/gup: Unify hugetlb, speed up thp") which
reduces the special-case code for handling hugetlb pages in GUP. It
also speeds up GUP handling of transparent hugepages.
- Peng Zhang provides some maple tree speedups ("Optimize the fast path
of mas_store()").
- Sergey Senozhatsky has improved te performance of zsmalloc during
compaction (zsmalloc: small compaction improvements").
- Domenico Cerasuolo has developed additional selftest code for zswap
("selftests: cgroup: add zswap test program").
- xu xin has doe some work on KSM's handling of zero pages. These
changes are mainly to enable the user to better understand the
effectiveness of KSM's treatment of zero pages ("ksm: support tracking
KSM-placed zero-pages").
- Jeff Xu has fixes the behaviour of memfd's
MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED sysctl ("mm/memfd: fix sysctl
MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED").
- David Howells has fixed an fscache optimization ("mm, netfs, fscache:
Stop read optimisation when folio removed from pagecache").
- Axel Rasmussen has given userfaultfd the ability to simulate memory
poisoning ("add UFFDIO_POISON to simulate memory poisoning with UFFD").
- Miaohe Lin has contributed some routine maintenance work on the
memory-failure code ("mm: memory-failure: remove unneeded PageHuge()
check").
- Peng Zhang has contributed some maintenance work on the maple tree
code ("Improve the validation for maple tree and some cleanup").
- Hugh Dickins has optimized the collapsing of shmem or file pages into
THPs ("mm: free retracted page table by RCU").
- Jiaqi Yan has a patch series which permits us to use the healthy
subpages within a hardware poisoned huge page for general purposes
("Improve hugetlbfs read on HWPOISON hugepages").
- Kemeng Shi has done some maintenance work on the pagetable-check code
("Remove unused parameters in page_table_check").
- More folioification work from Matthew Wilcox ("More filesystem folio
conversions for 6.6"), ("Followup folio conversions for zswap"). And
from ZhangPeng ("Convert several functions in page_io.c to use a
folio").
- page_ext cleanups from Kemeng Shi ("minor cleanups for page_ext").
- Baoquan He has converted some architectures to use the GENERIC_IOREMAP
ioremap()/iounmap() code ("mm: ioremap: Convert architectures to take
GENERIC_IOREMAP way").
- Anshuman Khandual has optimized arm64 tlb shootdown ("arm64: support
batched/deferred tlb shootdown during page reclamation/migration").
- Better maple tree lockdep checking from Liam Howlett ("More strict
maple tree lockdep"). Liam also developed some efficiency improvements
("Reduce preallocations for maple tree").
- Cleanup and optimization to the secondary IOMMU TLB invalidation, from
Alistair Popple ("Invalidate secondary IOMMU TLB on permission
upgrade").
- Ryan Roberts fixes some arm64 MM selftest issues ("selftests/mm fixes
for arm64").
- Kemeng Shi provides some maintenance work on the compaction code ("Two
minor cleanups for compaction").
- Some reduction in mmap_lock pressure from Matthew Wilcox ("Handle most
file-backed faults under the VMA lock").
- Aneesh Kumar contributes code to use the vmemmap optimization for DAX
on ppc64, under some circumstances ("Add support for DAX vmemmap
optimization for ppc64").
- page-ext cleanups from Kemeng Shi ("add page_ext_data to get client
data in page_ext"), ("minor cleanups to page_ext header").
- Some zswap cleanups from Johannes Weiner ("mm: zswap: three
cleanups").
- kmsan cleanups from ZhangPeng ("minor cleanups for kmsan").
- VMA handling cleanups from Kefeng Wang ("mm: convert to
vma_is_initial_heap/stack()").
- DAMON feature work from SeongJae Park ("mm/damon/sysfs-schemes:
implement DAMOS tried total bytes file"), ("Extend DAMOS filters for
address ranges and DAMON monitoring targets").
- Compaction work from Kemeng Shi ("Fixes and cleanups to compaction").
- Liam Howlett has improved the maple tree node replacement code
("maple_tree: Change replacement strategy").
- ZhangPeng has a general code cleanup - use the K() macro more widely
("cleanup with helper macro K()").
- Aneesh Kumar brings memmap-on-memory to ppc64 ("Add support for memmap
on memory feature on ppc64").
- pagealloc cleanups from Kemeng Shi ("Two minor cleanups for pcp list
in page_alloc"), ("Two minor cleanups for get pageblock migratetype").
- Vishal Moola introduces a memory descriptor for page table tracking,
"struct ptdesc" ("Split ptdesc from struct page").
- memfd selftest maintenance work from Aleksa Sarai ("memfd: cleanups
for vm.memfd_noexec").
- MM include file rationalization from Hugh Dickins ("arch: include
asm/cacheflush.h in asm/hugetlb.h").
- THP debug output fixes from Hugh Dickins ("mm,thp: fix sloppy text
output").
- kmemleak improvements from Xiaolei Wang ("mm/kmemleak: use
object_cache instead of kmemleak_initialized").
- More folio-related cleanups from Matthew Wilcox ("Remove _folio_dtor
and _folio_order").
- A VMA locking scalability improvement from Suren Baghdasaryan
("Per-VMA lock support for swap and userfaults").
- pagetable handling cleanups from Matthew Wilcox ("New page table range
API").
- A batch of swap/thp cleanups from David Hildenbrand ("mm/swap: stop
using page->private on tail pages for THP_SWAP + cleanups").
- Cleanups and speedups to the hugetlb fault handling from Matthew
Wilcox ("Change calling convention for ->huge_fault").
- Matthew Wilcox has also done some maintenance work on the MM subsystem
documentation ("Improve mm documentation").
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZO1JUQAKCRDdBJ7gKXxA
jrMwAP47r/fS8vAVT3zp/7fXmxaJYTK27CTAM881Gw1SDhFM/wEAv8o84mDenCg6
Nfio7afS1ncD+hPYT8947UnLxTgn+ww=
=Afws
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2023-08-28-18-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- Some swap cleanups from Ma Wupeng ("fix WARN_ON in
add_to_avail_list")
- Peter Xu has a series (mm/gup: Unify hugetlb, speed up thp") which
reduces the special-case code for handling hugetlb pages in GUP. It
also speeds up GUP handling of transparent hugepages.
- Peng Zhang provides some maple tree speedups ("Optimize the fast path
of mas_store()").
- Sergey Senozhatsky has improved te performance of zsmalloc during
compaction (zsmalloc: small compaction improvements").
- Domenico Cerasuolo has developed additional selftest code for zswap
("selftests: cgroup: add zswap test program").
- xu xin has doe some work on KSM's handling of zero pages. These
changes are mainly to enable the user to better understand the
effectiveness of KSM's treatment of zero pages ("ksm: support
tracking KSM-placed zero-pages").
- Jeff Xu has fixes the behaviour of memfd's
MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED sysctl ("mm/memfd: fix sysctl
MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED").
- David Howells has fixed an fscache optimization ("mm, netfs, fscache:
Stop read optimisation when folio removed from pagecache").
- Axel Rasmussen has given userfaultfd the ability to simulate memory
poisoning ("add UFFDIO_POISON to simulate memory poisoning with
UFFD").
- Miaohe Lin has contributed some routine maintenance work on the
memory-failure code ("mm: memory-failure: remove unneeded PageHuge()
check").
- Peng Zhang has contributed some maintenance work on the maple tree
code ("Improve the validation for maple tree and some cleanup").
- Hugh Dickins has optimized the collapsing of shmem or file pages into
THPs ("mm: free retracted page table by RCU").
- Jiaqi Yan has a patch series which permits us to use the healthy
subpages within a hardware poisoned huge page for general purposes
("Improve hugetlbfs read on HWPOISON hugepages").
- Kemeng Shi has done some maintenance work on the pagetable-check code
("Remove unused parameters in page_table_check").
- More folioification work from Matthew Wilcox ("More filesystem folio
conversions for 6.6"), ("Followup folio conversions for zswap"). And
from ZhangPeng ("Convert several functions in page_io.c to use a
folio").
- page_ext cleanups from Kemeng Shi ("minor cleanups for page_ext").
- Baoquan He has converted some architectures to use the
GENERIC_IOREMAP ioremap()/iounmap() code ("mm: ioremap: Convert
architectures to take GENERIC_IOREMAP way").
- Anshuman Khandual has optimized arm64 tlb shootdown ("arm64: support
batched/deferred tlb shootdown during page reclamation/migration").
- Better maple tree lockdep checking from Liam Howlett ("More strict
maple tree lockdep"). Liam also developed some efficiency
improvements ("Reduce preallocations for maple tree").
- Cleanup and optimization to the secondary IOMMU TLB invalidation,
from Alistair Popple ("Invalidate secondary IOMMU TLB on permission
upgrade").
- Ryan Roberts fixes some arm64 MM selftest issues ("selftests/mm fixes
for arm64").
- Kemeng Shi provides some maintenance work on the compaction code
("Two minor cleanups for compaction").
- Some reduction in mmap_lock pressure from Matthew Wilcox ("Handle
most file-backed faults under the VMA lock").
- Aneesh Kumar contributes code to use the vmemmap optimization for DAX
on ppc64, under some circumstances ("Add support for DAX vmemmap
optimization for ppc64").
- page-ext cleanups from Kemeng Shi ("add page_ext_data to get client
data in page_ext"), ("minor cleanups to page_ext header").
- Some zswap cleanups from Johannes Weiner ("mm: zswap: three
cleanups").
- kmsan cleanups from ZhangPeng ("minor cleanups for kmsan").
- VMA handling cleanups from Kefeng Wang ("mm: convert to
vma_is_initial_heap/stack()").
- DAMON feature work from SeongJae Park ("mm/damon/sysfs-schemes:
implement DAMOS tried total bytes file"), ("Extend DAMOS filters for
address ranges and DAMON monitoring targets").
- Compaction work from Kemeng Shi ("Fixes and cleanups to compaction").
- Liam Howlett has improved the maple tree node replacement code
("maple_tree: Change replacement strategy").
- ZhangPeng has a general code cleanup - use the K() macro more widely
("cleanup with helper macro K()").
- Aneesh Kumar brings memmap-on-memory to ppc64 ("Add support for
memmap on memory feature on ppc64").
- pagealloc cleanups from Kemeng Shi ("Two minor cleanups for pcp list
in page_alloc"), ("Two minor cleanups for get pageblock
migratetype").
- Vishal Moola introduces a memory descriptor for page table tracking,
"struct ptdesc" ("Split ptdesc from struct page").
- memfd selftest maintenance work from Aleksa Sarai ("memfd: cleanups
for vm.memfd_noexec").
- MM include file rationalization from Hugh Dickins ("arch: include
asm/cacheflush.h in asm/hugetlb.h").
- THP debug output fixes from Hugh Dickins ("mm,thp: fix sloppy text
output").
- kmemleak improvements from Xiaolei Wang ("mm/kmemleak: use
object_cache instead of kmemleak_initialized").
- More folio-related cleanups from Matthew Wilcox ("Remove _folio_dtor
and _folio_order").
- A VMA locking scalability improvement from Suren Baghdasaryan
("Per-VMA lock support for swap and userfaults").
- pagetable handling cleanups from Matthew Wilcox ("New page table
range API").
- A batch of swap/thp cleanups from David Hildenbrand ("mm/swap: stop
using page->private on tail pages for THP_SWAP + cleanups").
- Cleanups and speedups to the hugetlb fault handling from Matthew
Wilcox ("Change calling convention for ->huge_fault").
- Matthew Wilcox has also done some maintenance work on the MM
subsystem documentation ("Improve mm documentation").
* tag 'mm-stable-2023-08-28-18-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (489 commits)
maple_tree: shrink struct maple_tree
maple_tree: clean up mas_wr_append()
secretmem: convert page_is_secretmem() to folio_is_secretmem()
nios2: fix flush_dcache_page() for usage from irq context
hugetlb: add documentation for vma_kernel_pagesize()
mm: add orphaned kernel-doc to the rst files.
mm: fix clean_record_shared_mapping_range kernel-doc
mm: fix get_mctgt_type() kernel-doc
mm: fix kernel-doc warning from tlb_flush_rmaps()
mm: remove enum page_entry_size
mm: allow ->huge_fault() to be called without the mmap_lock held
mm: move PMD_ORDER to pgtable.h
mm: remove checks for pte_index
memcg: remove duplication detection for mem_cgroup_uncharge_swap
mm/huge_memory: work on folio->swap instead of page->private when splitting folio
mm/swap: inline folio_set_swap_entry() and folio_swap_entry()
mm/swap: use dedicated entry for swap in folio
mm/swap: stop using page->private on tail pages for THP_SWAP
selftests/mm: fix WARNING comparing pointer to 0
selftests: cgroup: fix test_kmem_memcg_deletion kernel mem check
...
Core
----
- Increase size limits for to-be-sent skb frag allocations. This
allows tun, tap devices and packet sockets to better cope with large
writes operations.
- Store netdevs in an xarray, to simplify iterating over netdevs.
- Refactor nexthop selection for multipath routes.
- Improve sched class lifetime handling.
- Add backup nexthop ID support for bridge.
- Implement drop reasons support in openvswitch.
- Several data races annotations and fixes.
- Constify the sk parameter of routing functions.
- Prepend kernel version to netconsole message.
Protocols
---------
- Implement support for TCP probing the peer being under memory
pressure.
- Remove hard coded limitation on IPv6 specific info placement
inside the socket struct.
- Get rid of sysctl_tcp_adv_win_scale and use an auto-estimated
per socket scaling factor.
- Scaling-up the IPv6 expired route GC via a separated list of
expiring routes.
- In-kernel support for the TLS alert protocol.
- Better support for UDP reuseport with connected sockets.
- Add NEXT-C-SID support for SRv6 End.X behavior, reducing the SR
header size.
- Get rid of additional ancillary per MPTCP connection struct socket.
- Implement support for BPF-based MPTCP packet schedulers.
- Format MPTCP subtests selftests results in TAP.
- Several new SMC 2.1 features including unique experimental options,
max connections per lgr negotiation, max links per lgr negotiation.
BPF
---
- Multi-buffer support in AF_XDP.
- Add multi uprobe BPF links for attaching multiple uprobes
and usdt probes, which is significantly faster and saves extra fds.
- Implement an fd-based tc BPF attach API (TCX) and BPF link support on
top of it.
- Add SO_REUSEPORT support for TC bpf_sk_assign.
- Support new instructions from cpu v4 to simplify the generated code and
feature completeness, for x86, arm64, riscv64.
- Support defragmenting IPv(4|6) packets in BPF.
- Teach verifier actual bounds of bpf_get_smp_processor_id()
and fix perf+libbpf issue related to custom section handling.
- Introduce bpf map element count and enable it for all program types.
- Add a BPF hook in sys_socket() to change the protocol ID
from IPPROTO_TCP to IPPROTO_MPTCP to cover migration for legacy.
- Introduce bpf_me_mcache_free_rcu() and fix OOM under stress.
- Add uprobe support for the bpf_get_func_ip helper.
- Check skb ownership against full socket.
- Support for up to 12 arguments in BPF trampoline.
- Extend link_info for kprobe_multi and perf_event links.
Netfilter
---------
- Speed-up process exit by aborting ruleset validation if a
fatal signal is pending.
- Allow NLA_POLICY_MASK to be used with BE16/BE32 types.
Driver API
----------
- Page pool optimizations, to improve data locality and cache usage.
- Introduce ndo_hwtstamp_get() and ndo_hwtstamp_set() to avoid the need
for raw ioctl() handling in drivers.
- Simplify genetlink dump operations (doit/dumpit) providing them
the common information already populated in struct genl_info.
- Extend and use the yaml devlink specs to [re]generate the split ops.
- Introduce devlink selective dumps, to allow SF filtering SF based on
handle and other attributes.
- Add yaml netlink spec for netlink-raw families, allow route, link and
address related queries via the ynl tool.
- Remove phylink legacy mode support.
- Support offload LED blinking to phy.
- Add devlink port function attributes for IPsec.
New hardware / drivers
----------------------
- Ethernet:
- Broadcom ASP 2.0 (72165) ethernet controller
- MediaTek MT7988 SoC
- Texas Instruments AM654 SoC
- Texas Instruments IEP driver
- Atheros qca8081 phy
- Marvell 88Q2110 phy
- NXP TJA1120 phy
- WiFi:
- MediaTek mt7981 support
- Can:
- Kvaser SmartFusion2 PCI Express devices
- Allwinner T113 controllers
- Texas Instruments tcan4552/4553 chips
- Bluetooth:
- Intel Gale Peak
- Qualcomm WCN3988 and WCN7850
- NXP AW693 and IW624
- Mediatek MT2925
Drivers
-------
- Ethernet NICs:
- nVidia/Mellanox:
- mlx5:
- support UDP encapsulation in packet offload mode
- IPsec packet offload support in eswitch mode
- improve aRFS observability by adding new set of counters
- extends MACsec offload support to cover RoCE traffic
- dynamic completion EQs
- mlx4:
- convert to use auxiliary bus instead of custom interface logic
- Intel
- ice:
- implement switchdev bridge offload, even for LAG interfaces
- implement SRIOV support for LAG interfaces
- igc:
- add support for multiple in-flight TX timestamps
- Broadcom:
- bnxt:
- use the unified RX page pool buffers for XDP and non-XDP
- use the NAPI skb allocation cache
- OcteonTX2:
- support Round Robin scheduling HTB offload
- TC flower offload support for SPI field
- Freescale:
- add XDP_TX feature support
- AMD:
- ionic: add support for PCI FLR event
- sfc:
- basic conntrack offload
- introduce eth, ipv4 and ipv6 pedit offloads
- ST Microelectronics:
- stmmac: maximze PTP timestamping resolution
- Virtual NICs:
- Microsoft vNIC:
- batch ringing RX queue doorbell on receiving packets
- add page pool for RX buffers
- Virtio vNIC:
- add per queue interrupt coalescing support
- Google vNIC:
- add queue-page-list mode support
- Ethernet high-speed switches:
- nVidia/Mellanox (mlxsw):
- add port range matching tc-flower offload
- permit enslavement to netdevices with uppers
- Ethernet embedded switches:
- Marvell (mv88e6xxx):
- convert to phylink_pcs
- Renesas:
- r8A779fx: add speed change support
- rzn1: enables vlan support
- Ethernet PHYs:
- convert mv88e6xxx to phylink_pcs
- WiFi:
- Qualcomm Wi-Fi 7 (ath12k):
- extremely High Throughput (EHT) PHY support
- RealTek (rtl8xxxu):
- enable AP mode for: RTL8192FU, RTL8710BU (RTL8188GU),
RTL8192EU and RTL8723BU
- RealTek (rtw89):
- Introduce Time Averaged SAR (TAS) support
- Connector:
- support for event filtering
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmTt1ZoSHHBhYmVuaUBy
ZWRoYXQuY29tAAoJECkkeY3MjxOkgFUP/REFaYWdWUvAzmWeezyx9dqgZMfSOjWq
9QvySiA94OAOcjIYkb7wfzQ5BBAZqaBQ/f8XqWwS1EDDDEBs8sP1cxmABKwW7Hsr
qFRu2sOqLzKBk223d0jIgEocfQaFpGbF71gXoTlDivBjBi5UxWm9bF0XnbYWcKgO
/QEvzNosi9uNdi85Fzmv62J6YzAdidEpwGsM7X2CfejwNRmStxAEg/NwvRR0Hyiq
OJCo97omEgTRaUle8nc64PDx33u4h5kQ1BkaeHEv0rbE3hftFC2YPKn/InmqSFGz
6ew2xnrGPR37LCuAiCcIIv6yR7K0eu0iYJ7jXwZxBDqxGavEPuwWGBoCP6qFiitH
ZLWhIrAUrdmSbySkTOCONhJ475qFAuQoYHYpZnX/bJZUHlSsb/9lwDJYJQGpVfd1
/daqJVSb7lhaifmNO1iNd/ibCIXq9zapwtkRwA897M8GkZBTsnVvazFld1Em+Se3
Bx6DSDUVBqVQ9fpZG2IAGD6odDwOzC1lF2IoceFvK9Ff6oE0psI+A0qNLMkHxZbW
Qlo7LsNe53hpoCC+yHTfXX7e/X8eNt0EnCGOQJDusZ0Nr3K7H4LKFA0i8UBUK05n
4lKnnaSQW7GQgdofLWt103OMDR9GoDxpFsm7b1X9+AEk6Fz6tq50wWYeMZETUKYP
DCW8VGFOZjZM
=9CsR
-----END PGP SIGNATURE-----
Merge tag 'net-next-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Paolo Abeni:
"Core:
- Increase size limits for to-be-sent skb frag allocations. This
allows tun, tap devices and packet sockets to better cope with
large writes operations
- Store netdevs in an xarray, to simplify iterating over netdevs
- Refactor nexthop selection for multipath routes
- Improve sched class lifetime handling
- Add backup nexthop ID support for bridge
- Implement drop reasons support in openvswitch
- Several data races annotations and fixes
- Constify the sk parameter of routing functions
- Prepend kernel version to netconsole message
Protocols:
- Implement support for TCP probing the peer being under memory
pressure
- Remove hard coded limitation on IPv6 specific info placement inside
the socket struct
- Get rid of sysctl_tcp_adv_win_scale and use an auto-estimated per
socket scaling factor
- Scaling-up the IPv6 expired route GC via a separated list of
expiring routes
- In-kernel support for the TLS alert protocol
- Better support for UDP reuseport with connected sockets
- Add NEXT-C-SID support for SRv6 End.X behavior, reducing the SR
header size
- Get rid of additional ancillary per MPTCP connection struct socket
- Implement support for BPF-based MPTCP packet schedulers
- Format MPTCP subtests selftests results in TAP
- Several new SMC 2.1 features including unique experimental options,
max connections per lgr negotiation, max links per lgr negotiation
BPF:
- Multi-buffer support in AF_XDP
- Add multi uprobe BPF links for attaching multiple uprobes and usdt
probes, which is significantly faster and saves extra fds
- Implement an fd-based tc BPF attach API (TCX) and BPF link support
on top of it
- Add SO_REUSEPORT support for TC bpf_sk_assign
- Support new instructions from cpu v4 to simplify the generated code
and feature completeness, for x86, arm64, riscv64
- Support defragmenting IPv(4|6) packets in BPF
- Teach verifier actual bounds of bpf_get_smp_processor_id() and fix
perf+libbpf issue related to custom section handling
- Introduce bpf map element count and enable it for all program types
- Add a BPF hook in sys_socket() to change the protocol ID from
IPPROTO_TCP to IPPROTO_MPTCP to cover migration for legacy
- Introduce bpf_me_mcache_free_rcu() and fix OOM under stress
- Add uprobe support for the bpf_get_func_ip helper
- Check skb ownership against full socket
- Support for up to 12 arguments in BPF trampoline
- Extend link_info for kprobe_multi and perf_event links
Netfilter:
- Speed-up process exit by aborting ruleset validation if a fatal
signal is pending
- Allow NLA_POLICY_MASK to be used with BE16/BE32 types
Driver API:
- Page pool optimizations, to improve data locality and cache usage
- Introduce ndo_hwtstamp_get() and ndo_hwtstamp_set() to avoid the
need for raw ioctl() handling in drivers
- Simplify genetlink dump operations (doit/dumpit) providing them the
common information already populated in struct genl_info
- Extend and use the yaml devlink specs to [re]generate the split ops
- Introduce devlink selective dumps, to allow SF filtering SF based
on handle and other attributes
- Add yaml netlink spec for netlink-raw families, allow route, link
and address related queries via the ynl tool
- Remove phylink legacy mode support
- Support offload LED blinking to phy
- Add devlink port function attributes for IPsec
New hardware / drivers:
- Ethernet:
- Broadcom ASP 2.0 (72165) ethernet controller
- MediaTek MT7988 SoC
- Texas Instruments AM654 SoC
- Texas Instruments IEP driver
- Atheros qca8081 phy
- Marvell 88Q2110 phy
- NXP TJA1120 phy
- WiFi:
- MediaTek mt7981 support
- Can:
- Kvaser SmartFusion2 PCI Express devices
- Allwinner T113 controllers
- Texas Instruments tcan4552/4553 chips
- Bluetooth:
- Intel Gale Peak
- Qualcomm WCN3988 and WCN7850
- NXP AW693 and IW624
- Mediatek MT2925
Drivers:
- Ethernet NICs:
- nVidia/Mellanox:
- mlx5:
- support UDP encapsulation in packet offload mode
- IPsec packet offload support in eswitch mode
- improve aRFS observability by adding new set of counters
- extends MACsec offload support to cover RoCE traffic
- dynamic completion EQs
- mlx4:
- convert to use auxiliary bus instead of custom interface
logic
- Intel
- ice:
- implement switchdev bridge offload, even for LAG
interfaces
- implement SRIOV support for LAG interfaces
- igc:
- add support for multiple in-flight TX timestamps
- Broadcom:
- bnxt:
- use the unified RX page pool buffers for XDP and non-XDP
- use the NAPI skb allocation cache
- OcteonTX2:
- support Round Robin scheduling HTB offload
- TC flower offload support for SPI field
- Freescale:
- add XDP_TX feature support
- AMD:
- ionic: add support for PCI FLR event
- sfc:
- basic conntrack offload
- introduce eth, ipv4 and ipv6 pedit offloads
- ST Microelectronics:
- stmmac: maximze PTP timestamping resolution
- Virtual NICs:
- Microsoft vNIC:
- batch ringing RX queue doorbell on receiving packets
- add page pool for RX buffers
- Virtio vNIC:
- add per queue interrupt coalescing support
- Google vNIC:
- add queue-page-list mode support
- Ethernet high-speed switches:
- nVidia/Mellanox (mlxsw):
- add port range matching tc-flower offload
- permit enslavement to netdevices with uppers
- Ethernet embedded switches:
- Marvell (mv88e6xxx):
- convert to phylink_pcs
- Renesas:
- r8A779fx: add speed change support
- rzn1: enables vlan support
- Ethernet PHYs:
- convert mv88e6xxx to phylink_pcs
- WiFi:
- Qualcomm Wi-Fi 7 (ath12k):
- extremely High Throughput (EHT) PHY support
- RealTek (rtl8xxxu):
- enable AP mode for: RTL8192FU, RTL8710BU (RTL8188GU),
RTL8192EU and RTL8723BU
- RealTek (rtw89):
- Introduce Time Averaged SAR (TAS) support
- Connector:
- support for event filtering"
* tag 'net-next-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1806 commits)
net: ethernet: mtk_wed: minor change in wed_{tx,rx}info_show
net: ethernet: mtk_wed: add some more info in wed_txinfo_show handler
net: stmmac: clarify difference between "interface" and "phy_interface"
r8152: add vendor/device ID pair for D-Link DUB-E250
devlink: move devlink_notify_register/unregister() to dev.c
devlink: move small_ops definition into netlink.c
devlink: move tracepoint definitions into core.c
devlink: push linecard related code into separate file
devlink: push rate related code into separate file
devlink: push trap related code into separate file
devlink: use tracepoint_enabled() helper
devlink: push region related code into separate file
devlink: push param related code into separate file
devlink: push resource related code into separate file
devlink: push dpipe related code into separate file
devlink: move and rename devlink_dpipe_send_and_alloc_skb() helper
devlink: push shared buffer related code into separate file
devlink: push port related code into separate file
devlink: push object register/unregister notifications into separate helpers
inet: fix IP_TRANSPARENT error handling
...
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZOXTxQAKCRCRxhvAZXjc
okaVAP94WAlItvDRt/z2Wtzf0+RqPZeTXEdGTxua8+RxqCyYIQD+OO5nRfKQPHlV
AqqGJMKItQMSMIYgB5ftqVhNWZfnHgM=
=pSEW
-----END PGP SIGNATURE-----
Merge tag 'v6.6-vfs.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull misc vfs updates from Christian Brauner:
"This contains the usual miscellaneous features, cleanups, and fixes
for vfs and individual filesystems.
Features:
- Block mode changes on symlinks and rectify our broken semantics
- Report file modifications via fsnotify() for splice
- Allow specifying an explicit timeout for the "rootwait" kernel
command line option. This allows to timeout and reboot instead of
always waiting indefinitely for the root device to show up
- Use synchronous fput for the close system call
Cleanups:
- Get rid of open-coded lockdep workarounds for async io submitters
and replace it all with a single consolidated helper
- Simplify epoll allocation helper
- Convert simple_write_begin and simple_write_end to use a folio
- Convert page_cache_pipe_buf_confirm() to use a folio
- Simplify __range_close to avoid pointless locking
- Disable per-cpu buffer head cache for isolated cpus
- Port ecryptfs to kmap_local_page() api
- Remove redundant initialization of pointer buf in pipe code
- Unexport the d_genocide() function which is only used within core
vfs
- Replace printk(KERN_ERR) and WARN_ON() with WARN()
Fixes:
- Fix various kernel-doc issues
- Fix refcount underflow for eventfds when used as EFD_SEMAPHORE
- Fix a mainly theoretical issue in devpts
- Check the return value of __getblk() in reiserfs
- Fix a racy assert in i_readcount_dec
- Fix integer conversion issues in various functions
- Fix LSM security context handling during automounts that prevented
NFS superblock sharing"
* tag 'v6.6-vfs.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (39 commits)
cachefiles: use kiocb_{start,end}_write() helpers
ovl: use kiocb_{start,end}_write() helpers
aio: use kiocb_{start,end}_write() helpers
io_uring: use kiocb_{start,end}_write() helpers
fs: create kiocb_{start,end}_write() helpers
fs: add kerneldoc to file_{start,end}_write() helpers
io_uring: rename kiocb_end_write() local helper
splice: Convert page_cache_pipe_buf_confirm() to use a folio
libfs: Convert simple_write_begin and simple_write_end to use a folio
fs/dcache: Replace printk and WARN_ON by WARN
fs/pipe: remove redundant initialization of pointer buf
fs: Fix kernel-doc warnings
devpts: Fix kernel-doc warnings
doc: idmappings: fix an error and rephrase a paragraph
init: Add support for rootwait timeout parameter
vfs: fix up the assert in i_readcount_dec
fs: Fix one kernel-doc comment
docs: filesystems: idmappings: clarify from where idmappings are taken
fs/buffer.c: disable per-CPU buffer_head cache for isolated CPUs
vfs, security: Fix automount superblock LSM init problem, preventing NFS sb sharing
...
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZOXTKAAKCRCRxhvAZXjc
oifJAQCzi/p+AdQu8LA/0XvR7fTwaq64ZDCibU4BISuLGT2kEgEAuGbuoFZa0rs2
XYD/s4+gi64p9Z01MmXm2XO1pu3GPg0=
=eJz5
-----END PGP SIGNATURE-----
Merge tag 'v6.6-vfs.ctime' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs timestamp updates from Christian Brauner:
"This adds VFS support for multi-grain timestamps and converts tmpfs,
xfs, ext4, and btrfs to use them. This carries acks from all relevant
filesystems.
The VFS always uses coarse-grained timestamps when updating the ctime
and mtime after a change. This has the benefit of allowing filesystems
to optimize away a lot of metadata updates, down to around 1 per
jiffy, even when a file is under heavy writes.
Unfortunately, this has always been an issue when we're exporting via
NFSv3, which relies on timestamps to validate caches. A lot of changes
can happen in a jiffy, so timestamps aren't sufficient to help the
client decide to invalidate the cache.
Even with NFSv4, a lot of exported filesystems don't properly support
a change attribute and are subject to the same problems with timestamp
granularity. Other applications have similar issues with timestamps
(e.g., backup applications).
If we were to always use fine-grained timestamps, that would improve
the situation, but that becomes rather expensive, as the underlying
filesystem would have to log a lot more metadata updates.
This introduces fine-grained timestamps that are used when they are
actively queried.
This uses the 31st bit of the ctime tv_nsec field to indicate that
something has queried the inode for the mtime or ctime. When this flag
is set, on the next mtime or ctime update, the kernel will fetch a
fine-grained timestamp instead of the usual coarse-grained one.
As POSIX generally mandates that when the mtime changes, the ctime
must also change the kernel always stores normalized ctime values, so
only the first 30 bits of the tv_nsec field are ever used.
Filesytems can opt into this behavior by setting the FS_MGTIME flag in
the fstype. Filesystems that don't set this flag will continue to use
coarse-grained timestamps.
Various preparatory changes, fixes and cleanups are included:
- Fixup all relevant places where POSIX requires updating ctime
together with mtime. This is a wide-range of places and all
maintainers provided necessary Acks.
- Add new accessors for inode->i_ctime directly and change all
callers to rely on them. Plain accesses to inode->i_ctime are now
gone and it is accordingly rename to inode->__i_ctime and commented
as requiring accessors.
- Extend generic_fillattr() to pass in a request mask mirroring in a
sense the statx() uapi. This allows callers to pass in a request
mask to only get a subset of attributes filled in.
- Rework timestamp updates so it's possible to drop the @now
parameter the update_time() inode operation and associated helpers.
- Add inode_update_timestamps() and convert all filesystems to it
removing a bunch of open-coding"
* tag 'v6.6-vfs.ctime' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (107 commits)
btrfs: convert to multigrain timestamps
ext4: switch to multigrain timestamps
xfs: switch to multigrain timestamps
tmpfs: add support for multigrain timestamps
fs: add infrastructure for multigrain timestamps
fs: drop the timespec64 argument from update_time
xfs: have xfs_vn_update_time gets its own timestamp
fat: make fat_update_time get its own timestamp
fat: remove i_version handling from fat_update_time
ubifs: have ubifs_update_time use inode_update_timestamps
btrfs: have it use inode_update_timestamps
fs: drop the timespec64 arg from generic_update_time
fs: pass the request_mask to generic_fillattr
fs: remove silly warning from current_time
gfs2: fix timestamp handling on quota inodes
fs: rename i_ctime field to __i_ctime
selinux: convert to ctime accessor functions
security: convert to ctime accessor functions
apparmor: convert to ctime accessor functions
sunrpc: convert to ctime accessor functions
...
Use the helpers to simplify code.
Link: https://lkml.kernel.org/r/20230728050043.59880-4-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Stephen Smalley <stephen.smalley.work@gmail.com>
Cc: Eric Paris <eparis@parisplace.org>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Christian Göttsche <cgzones@googlemail.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: David Airlie <airlied@gmail.com>
Cc: Felix Kuehling <felix.kuehling@amd.com>
Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Set the next pointer in filename_trans_read_helper() before attaching
the new node under construction to the list, otherwise garbage would be
dereferenced on subsequent failure during cleanup in the out goto label.
Cc: <stable@vger.kernel.org>
Fixes: 4300590243 ("selinux: implement new format of filename transitions")
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
KMSAN reports the following issue:
[ 81.822503] =====================================================
[ 81.823222] BUG: KMSAN: uninit-value in selinux_inet_conn_request+0x2c8/0x4b0
[ 81.823891] selinux_inet_conn_request+0x2c8/0x4b0
[ 81.824385] security_inet_conn_request+0xc0/0x160
[ 81.824886] tcp_v4_route_req+0x30e/0x490
[ 81.825343] tcp_conn_request+0xdc8/0x3400
[ 81.825813] tcp_v4_conn_request+0x134/0x190
[ 81.826292] tcp_rcv_state_process+0x1f4/0x3b40
[ 81.826797] tcp_v4_do_rcv+0x9ca/0xc30
[ 81.827236] tcp_v4_rcv+0x3bf5/0x4180
[ 81.827670] ip_protocol_deliver_rcu+0x822/0x1230
[ 81.828174] ip_local_deliver_finish+0x259/0x370
[ 81.828667] ip_local_deliver+0x1c0/0x450
[ 81.829105] ip_sublist_rcv+0xdc1/0xf50
[ 81.829534] ip_list_rcv+0x72e/0x790
[ 81.829941] __netif_receive_skb_list_core+0x10d5/0x1180
[ 81.830499] netif_receive_skb_list_internal+0xc41/0x1190
[ 81.831064] napi_complete_done+0x2c4/0x8b0
[ 81.831532] e1000_clean+0x12bf/0x4d90
[ 81.831983] __napi_poll+0xa6/0x760
[ 81.832391] net_rx_action+0x84c/0x1550
[ 81.832831] __do_softirq+0x272/0xa6c
[ 81.833239] __irq_exit_rcu+0xb7/0x1a0
[ 81.833654] irq_exit_rcu+0x17/0x40
[ 81.834044] common_interrupt+0x8d/0xa0
[ 81.834494] asm_common_interrupt+0x2b/0x40
[ 81.834949] default_idle+0x17/0x20
[ 81.835356] arch_cpu_idle+0xd/0x20
[ 81.835766] default_idle_call+0x43/0x70
[ 81.836210] do_idle+0x258/0x800
[ 81.836581] cpu_startup_entry+0x26/0x30
[ 81.837002] __pfx_ap_starting+0x0/0x10
[ 81.837444] secondary_startup_64_no_verify+0x17a/0x17b
[ 81.837979]
[ 81.838166] Local variable nlbl_type.i created at:
[ 81.838596] selinux_inet_conn_request+0xe3/0x4b0
[ 81.839078] security_inet_conn_request+0xc0/0x160
KMSAN warning is reproducible with:
* netlabel_mgmt_protocount is 0 (e.g. netlbl_enabled() returns 0)
* CONFIG_SECURITY_NETWORK_XFRM may be set or not
* CONFIG_KMSAN=y
* `ssh USER@HOSTNAME /bin/date`
selinux_skb_peerlbl_sid() will call selinux_xfrm_skb_sid(), then fall
to selinux_netlbl_skbuff_getsid() which will not initialize nlbl_type,
but it will be passed to:
err = security_net_peersid_resolve(nlbl_sid,
nlbl_type, xfrm_sid, sid);
and checked by KMSAN, although it will not be used inside
security_net_peersid_resolve() (at least now), since this function
will check either (xfrm_sid == SECSID_NULL) or (nlbl_sid ==
SECSID_NULL) first and return before using uninitialized nlbl_type.
Signed-off-by: Andrew Kanner <andrew.kanner@gmail.com>
[PM: subject line tweak, removed 'fixes' tag as code is not broken]
Signed-off-by: Paul Moore <paul@paul-moore.com>
SELinux registers the implementation for the "binder_transfer_file"
hook. Looking at the function implementation we observe that the
parameter "file" is not changing.
Mark the "file" parameter of LSM hook security_binder_transfer_file() as
"const" since it will not be changing in the LSM hook.
Signed-off-by: Khadija Kamran <kamrankhadijadj@gmail.com>
[PM: subject line whitespace fix]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Use the identical type for local variables, e.g. loop counters.
Declare members of struct policydb_compat_info unsigned to consistently
use unsigned iterators. They hold read-only non-negative numbers in the
global variable policydb_compat.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Use umode_t as parameter type for sel_make_inode(), which assigns the
value to the member i_mode of struct inode.
Use identical and unsigned types for loop iterators.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
The loops upper bound represent the number of permissions used (for the
current class or in general). The limit for this is 32, thus we might
left shift of one less, 31. Shifting a base of 1 results in undefined
behavior; use (u32)1 as base.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Security classes have only up to 32 permissions, hence using an u16 is
sufficient (while improving padding in struct selinux_mapping).
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Return u32 from avtab_hash() instead of int, since the hashing is done
on u32 and the result is used as an index on the hash array.
Use the type of the limit in for loops.
Avoid signed to unsigned conversion of multiplication result in
avtab_hash_eval() and perform multiplication in destination type.
Use unsigned loop iterator for index operations, to avoid sign
extension.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
This commit reverts 5b0eea835d ("selinux: introduce an initial SID
for early boot processes") as it was found to cause problems on
distros with old SELinux userspace tools/libraries, specifically
Ubuntu 16.04.
Hopefully we will be able to re-add this functionality at a later
date, but let's revert this for now to help ensure a stable and
backwards compatible SELinux tree.
Link: https://lore.kernel.org/selinux/87edkseqf8.fsf@mail.lhotse
Acked-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Three LSMs register the implementations for the "capget" hook: AppArmor,
SELinux, and the normal capability code. Looking at the function
implementations we may observe that the first parameter "target" is not
changing.
Mark the first argument "target" of LSM hook security_capget() as
"const" since it will not be changing in the LSM hook.
cap_capget() LSM hook declaration exceeds the 80 characters per line
limit. Split the function declaration to multiple lines to decrease the
line length.
Signed-off-by: Khadija Kamran <kamrankhadijadj@gmail.com>
Acked-by: John Johansen <john.johansen@canonical.com>
[PM: align the cap_capget() declaration, spelling fixes]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Use GFP_KERNEL instead of GFP_ATOMIC while reading a binary policy in
sens_read() and cat_read(), similar to surrounding code.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
After commit f22f9aaf6c ("selinux: remove the runtime disable
functionality"), the comment on selinux_hooks[] is out-of-date,
remove the last paragraph about runtime disable functionality.
Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Use u32 as the output parameter type in security_get_classes() and
security_get_permissions(), based on the type of the symtab nprim
member.
Declare the read-only class string parameter of
security_get_permissions() const.
Avoid several implicit conversions by using the identical type for the
destination.
Use the type identical to the source for local variables.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
[PM: cleanup extra whitespace in subject]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Use u32 for ebitmap bits and sensitivity levels, char for the default
range of a class.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
[PM: description tweaks]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Use the identical type u32 for the loop iterator.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
[PM: remove extra whitespace in subject]
Signed-off-by: Paul Moore <paul@paul-moore.com>
avtab_hash_eval() and hashtab_stat() are only used in policydb.c when
the configuration SECURITY_SELINUX_DEBUG is enabled.
Move the function definitions under that configuration as well and
provide empty definitions in case SECURITY_SELINUX_DEBUG is disabled, to
avoid using #ifdef in the callers.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
In case virtual memory is being marked as executable by default, SELinux
checks regarding explicit potential dangerous use are disabled.
Inform the user about it.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
In later patches, we're going to change how the inode's ctime field is
used. Switch to using accessor functions instead of raw accesses of
inode->i_ctime.
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Message-Id: <20230705190309.579783-89-jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Use a NULL instead of a zero to resolve a int/pointer mismatch.
Cc: Paolo Abeni <pabeni@redhat.com>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202307210332.4AqFZfzI-lkp@intel.com/
Fixes: dd51fcd42f ("selinux: introduce and use lsm_ad_net_init*() helpers")
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
The policy database code contains several debug output statements
related to hashtable utilization. Those are guarded by the macro
DEBUG_HASHES, which is neither documented nor set anywhere.
Introduce a new Kconfig configuration guarding this and potential
other future debugging related code. Disable the setting by default.
Suggested-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
[PM: fixed line lengths in the help text]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Perf traces of network-related workload shows a measurable overhead
inside the network-related selinux hooks while zeroing the
lsm_network_audit struct.
In most cases we can delay the initialization of such structure to the
usage point, avoiding such overhead in a few cases.
Additionally, the audit code accesses the IP address information only
for AF_INET* families, and selinux_parse_skb() will fill-out the
relevant fields in such cases. When the family field is zeroed or the
initialization is followed by the mentioned parsing, the zeroing can be
limited to the sk, family and netif fields.
By factoring out the audit-data initialization to new helpers, this
patch removes some duplicate code and gives small but measurable
performance gain under UDP flood.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Update my email address; MAINTAINERS was updated some time ago.
Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
The kernel print statements do not append an implicit newline to format
strings.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
[PM: subject line tweak]
Signed-off-by: Paul Moore <paul@paul-moore.com>
avtab_search() shares the same logic with avtab_search_node(), except
that it returns, if found, a pointer to the struct avtab_node member
datum instead of the node itself. Since the member is an embedded
struct, and not a pointer, the returned value of avtab_search() and
avtab_search_node() will always in unison either be NULL or non-NULL.
Drop avtab_search() and replace its calls by avtab_search_node() to
deduplicate logic and adopt the only caller caring for the type of
the returned value accordingly.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Change "NSA SELinux" to just "SELinux" in Kconfig help text and
comments. While NSA was the original primary developer and continues to
help maintain SELinux, SELinux has long since transitioned to a wide
community of developers and maintainers. SELinux has been part of the
mainline Linux kernel for nearly 20 years now [1] and has received
contributions from many individuals and organizations.
[1] https://lore.kernel.org/lkml/Pine.LNX.4.44.0308082228470.1852-100000@home.osdl.org/
Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Use the type bool as parameter type in
selinux_status_update_setenforce(). The related function
enforcing_enabled() returns the type bool, while the struct
selinux_kernel_status member enforcing uses an u32.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
[PM: subject line tweaks]
Signed-off-by: Paul Moore <paul@paul-moore.com>
hashtab_init() takes an u32 as size parameter type.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
[PM: subject line tweaks]
Signed-off-by: Paul Moore <paul@paul-moore.com>
The specifier for avtab keys is always supplied with a type of u16,
either as a macro to security_compute_sid() or the member specified of
the struct avtab_key.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Use the identical types in assignments of local variables for the
destination.
Merge tail calls into return statements.
Avoid using leading underscores for function local variable.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
[PM: subject line tweaks]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Use a consistent type of u32 for sequence numbers.
Use a non-negative and input parameter matching type for the hash
result.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
[PM: subject line tweaks]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Use the identical type sel_netif_hashfn() returns.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
[PM: subject line tweaks]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Align the type with the one used in selinux_notify_policy_change() and
the sequence member of struct selinux_kernel_status.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
[PM: subject line tweaks]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Prevent inserting more than the supported U32_MAX number of entries.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
The function is always inlined and most of the time both relevant
arguments are compile time constants, allowing compilers to elide the
check. Also the function is part of outputting the policy, which is not
performance critical.
Also convert the type of the third parameter into a size_t, since it
should always be a non-negative number of elements.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
The sk_getsecid hook shouldn't need to modify its socket argument.
Make it const so that callers of security_sk_classify_flow() can use a
const struct sock *.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, SELinux doesn't allow distinguishing between kernel threads
and userspace processes that are started before the policy is first
loaded - both get the label corresponding to the kernel SID. The only
way a process that persists from early boot can get a meaningful label
is by doing a voluntary dyntransition or re-executing itself.
Reusing the kernel label for userspace processes is problematic for
several reasons:
1. The kernel is considered to be a privileged domain and generally
needs to have a wide range of permissions allowed to work correctly,
which prevents the policy writer from effectively hardening against
early boot processes that might remain running unintentionally after
the policy is loaded (they represent a potential extra attack surface
that should be mitigated).
2. Despite the kernel being treated as a privileged domain, the policy
writer may want to impose certain special limitations on kernel
threads that may conflict with the requirements of intentional early
boot processes. For example, it is a good hardening practice to limit
what executables the kernel can execute as usermode helpers and to
confine the resulting usermode helper processes. However, a
(legitimate) process surviving from early boot may need to execute a
different set of executables.
3. As currently implemented, overlayfs remembers the security context of
the process that created an overlayfs mount and uses it to bound
subsequent operations on files using this context. If an overlayfs
mount is created before the SELinux policy is loaded, these "mounter"
checks are made against the kernel context, which may clash with
restrictions on the kernel domain (see 2.).
To resolve this, introduce a new initial SID (reusing the slot of the
former "init" initial SID) that will be assigned to any userspace
process started before the policy is first loaded. This is easy to do,
as we can simply label any process that goes through the
bprm_creds_for_exec LSM hook with the new init-SID instead of
propagating the kernel SID from the parent.
To provide backwards compatibility for existing policies that are
unaware of this new semantic of the "init" initial SID, introduce a new
policy capability "userspace_initial_context" and set the "init" SID to
the same context as the "kernel" SID unless this capability is set by
the policy.
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
In the process of reverting back to directly accessing the global
selinux_state pointer we left behind some artifacts in the
selinux_policycap_XXX() helper functions. This patch cleans up
some of that left-behind cruft.
Signed-off-by: Paul Moore <paul@paul-moore.com>
Currently, the LSM infrastructure supports only one LSM providing an xattr
and EVM calculating the HMAC on that xattr, plus other inode metadata.
Allow all LSMs to provide one or multiple xattrs, by extending the security
blob reservation mechanism. Introduce the new lbs_xattr_count field of the
lsm_blob_sizes structure, so that each LSM can specify how many xattrs it
needs, and the LSM infrastructure knows how many xattr slots it should
allocate.
Modify the inode_init_security hook definition, by passing the full
xattr array allocated in security_inode_init_security(), and the current
number of xattr slots in that array filled by LSMs. The first parameter
would allow EVM to access and calculate the HMAC on xattrs supplied by
other LSMs, the second to not leave gaps in the xattr array, when an LSM
requested but did not provide xattrs (e.g. if it is not initialized).
Introduce lsm_get_xattr_slot(), which LSMs can call as many times as the
number specified in the lbs_xattr_count field of the lsm_blob_sizes
structure. During each call, lsm_get_xattr_slot() increments the number of
filled xattrs, so that at the next invocation it returns the next xattr
slot to fill.
Cleanup security_inode_init_security(). Unify the !initxattrs and
initxattrs case by simply not allocating the new_xattrs array in the
former. Update the documentation to reflect the changes, and fix the
description of the xattr name, as it is not allocated anymore.
Adapt both SELinux and Smack to use the new definition of the
inode_init_security hook, and to call lsm_get_xattr_slot() to obtain and
fill the reserved slots in the xattr array.
Move the xattr->name assignment after the xattr->value one, so that it is
done only in case of successful memory allocation.
Finally, change the default return value of the inode_init_security hook
from zero to -EOPNOTSUPP, so that BPF LSM correctly follows the hook
conventions.
Reported-by: Nicolas Bouchinet <nicolas.bouchinet@clip-os.org>
Link: https://lore.kernel.org/linux-integrity/Y1FTSIo+1x+4X0LS@archlinux/
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
[PM: minor comment and variable tweaks, approved by RS]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Avoid using the identifier `bool` to improve support with future C
standards. C23 is about to make `bool` a predefined macro (see N2654).
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
As noted in the comments of this commit, the current SELinux Makefile
requires features found in make v4.3 or later, which is problematic
as the Linux Kernel currently only requires make v3.82. This patch
fixes the SELinux Makefile so that it works properly on these older
versions of make, and adds a couple of comments to the Makefile about
how it can be improved once make v4.3 is required by the kernel.
Fixes: 6f933aa7df ("selinux: more Makefile tweaks")
Signed-off-by: Paul Moore <paul@paul-moore.com>
Currently, when an NFS filesystem that supports passing LSM/SELinux
labels is mounted during early boot (before the SELinux policy is
loaded), it ends up mounted without the labeling support (i.e. with
Fedora policy all files get the generic NFS label
system_u:object_r:nfs_t:s0).
This is because the information that the NFS mount supports passing
labels (communicated to the LSM layer via the kern_flags argument of
security_set_mnt_opts()) gets lost and when the policy is loaded the
mount is initialized as if the passing is not supported.
Fix this by noting the "native labeling" in newsbsec->flags (using a new
SE_SBNATIVE flag) on the pre-policy-loaded call of
selinux_set_mnt_opts() and then making sure it is respected on the
second call from delayed_superblock_init().
Additionally, make inode_doinit_with_dentry() initialize the inode's
label from its extended attributes whenever it doesn't find it already
intitialized by the filesystem. This is needed to properly initialize
pre-existing inodes when delayed_superblock_init() is called. It should
not trigger in any other cases (and if it does, it's still better to
initialize the correct label instead of leaving the inode unlabeled).
Fixes: eb9ae68650 ("SELinux: Add new labeling type native labels")
Tested-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
[PM: fixed 'Fixes' tag format]
Signed-off-by: Paul Moore <paul@paul-moore.com>
exit_sel_fs() has been removed since commit f22f9aaf6c ("selinux:
remove the runtime disable functionality").
Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
The object context type `fs`, not to be confused with the well used
object context type `fscon`, was introduced in the initial git commit
1da177e4c3 ("Linux-2.6.12-rc2") but never actually used since.
The paper "A Security Policy Configuration for the Security-Enhanced
Linux" [1] mentions it under `7.2 File System Contexts` but also states:
Currently, this configuration is unused.
The policy statement defining such object contexts is `fscon`, e.g.:
fscon 2 3 gen_context(system_u:object_r:conA_t,s0) \
gen_context(system_u:object_r:conB_t,s0)
It is not documented at selinuxproject.org or in the SELinux notebook
and not supported by the Reference Policy buildsystem - the statement is
not properly sorted - and thus not used in the Reference or Fedora
Policy.
Print a warning message at policy load for each such object context:
SELinux: void and deprecated fs ocon 02:03
This topic was initially highlighted by Nicolas Iooss [2].
[1]: https://media.defense.gov/2021/Jul/29/2002815735/-1/-1/0/SELINUX-SECURITY-POLICY-CONFIGURATION-REPORT.PDF
[2]: https://lore.kernel.org/selinux/CAJfZ7=mP2eJaq2BfO3y0VnwUJaY2cS2p=HZMN71z1pKjzaT0Eg@mail.gmail.com/
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
[PM: tweaked deprecation comment, description line wrapping]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Include all necessary headers in header files to enable third party
applications, like LSP servers, to resolve all used symbols.
ibpkey.h: include "flask.h" for SECINITSID_UNLABELED
initial_sid_to_string.h: include <linux/stddef.h> for NULL
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Commit 53f3517ae0 ("selinux: do not leave dangling pointer behind")
reset the `str` field of the `context` struct in an OOM error branch.
In this struct the fields `str` and `len` are coupled and should be kept
in sync. Set the length to zero according to the string be set to NULL.
Fixes: 53f3517ae0 ("selinux: do not leave dangling pointer behind")
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Newly added subflows should inherit the LSM label from the associated
MPTCP socket regardless of the current context.
This patch implements the above copying sid and class from the MPTCP
socket context, deleting the existing subflow label, if any, and then
re-creating the correct one.
The new helper reuses the selinux_netlbl_sk_security_free() function,
and the latter can end-up being called multiple times with the same
argument; we additionally need to make it idempotent.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
A few small tweaks to selinux_audit_rule_init():
- Adjust how we use the @rc variable so we are not doing any extra
work in the common/success case.
- Related to the above, rework the 'out' jump label so that the
success and error paths are different, simplifying both.
- Cleanup some of the vertical whitespace while we are making the
other changes.
Signed-off-by: Paul Moore <paul@paul-moore.com>
The array of mount tokens in only used in match_opt_prefix() and never
modified.
The array of symtab names is never modified and only used in the
DEBUG_HASHES configuration as output.
The array of files for the SElinux filesystem sub-directory `ss` is
similar to the other `struct tree_descr` usages only read from to
construct the containing entries.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
The second parameter `tag` of avtab_hash_eval() is only used for
printing. In policydb_index() it is called with a string literal:
avtab_hash_eval(&p->te_avtab, "rules");
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
[PM: slight formatting tweak in description]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Commit 539813e418 ("selinux: stop returning node from avc_insert()")
converted the return value of avc_insert() to void but left the now
unnecessary trailing return statement.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Since commit f22f9aaf6c ("selinux: remove the runtime disable
functionality") the function avc_disable() is no longer used.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
In case mls_context_cpy() fails due to OOM set the free'd pointer in
context_cpy() to NULL to avoid it potentially being dereferenced or
free'd again in future. Freeing a NULL pointer is well-defined and a
hard NULL dereference crash is at least not exploitable and should give
a workable stack trace.
Fixes: 12b29f3455 ("selinux: support deferred mapping of contexts")
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
A few small tweaks to improve the SELinux Makefile:
- Define a new variable, 'genhdrs', to represent both flask.h and
av_permissions.h; this should help ensure consistent processing for
both generated headers.
- Move the 'ccflags-y' variable closer to the top, just after the
main 'obj-$(CONFIG_SECURITY_SELINUX)' definition to make it more
visible and improve the grouping in the Makefile.
- Rework some of the vertical whitespace to improve some of the
grouping in the Makefile.
Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
The Makefile rule responsible for building flask.h and
av_permissions.h only lists flask.h as a target which means that
av_permissions.h is only generated when flask.h needs to be
generated. This patch fixes this by adding av_permissions.h as a
target to the rule.
Fixes: 8753f6bec3 ("selinux: generate flask headers during kernel build")
Signed-off-by: Paul Moore <paul@paul-moore.com>
Make the flask.h target depend on the genheaders binary instead of
classmap.h to ensure that it is rebuilt if any of the dependencies of
genheaders are changed.
Notably this fixes flask.h not being rebuilt when
initial_sid_to_string.h is modified.
Fixes: 8753f6bec3 ("selinux: generate flask headers during kernel build")
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
The callers haven't used the returned node since
commit 21193dcd1f ("SELinux: more careful use of avd in
avc_has_perm_noaudit") and the return value assignments were removed in
commit 0a9876f36b ("selinux: Remove redundant assignments"). Stop
returning the node altogether and make the functions return void.
Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
PM: minor subj tweak, repair whitespace damage
Signed-off-by: Paul Moore <paul@paul-moore.com>
After working with the larger SELinux-based distros for several
years, we're finally at a place where we can disable the SELinux
runtime disable functionality. The existing kernel deprecation
notice explains the functionality and why we want to remove it:
The selinuxfs "disable" node allows SELinux to be disabled at
runtime prior to a policy being loaded into the kernel. If
disabled via this mechanism, SELinux will remain disabled until
the system is rebooted.
The preferred method of disabling SELinux is via the "selinux=0"
boot parameter, but the selinuxfs "disable" node was created to
make it easier for systems with primitive bootloaders that did not
allow for easy modification of the kernel command line.
Unfortunately, allowing for SELinux to be disabled at runtime makes
it difficult to secure the kernel's LSM hooks using the
"__ro_after_init" feature.
It is that last sentence, mentioning the '__ro_after_init' hardening,
which is the real motivation for this change, and if you look at the
diffstat you'll see that the impact of this patch reaches across all
the different LSMs, helping prevent tampering at the LSM hook level.
From a SELinux perspective, it is important to note that if you
continue to disable SELinux via "/etc/selinux/config" it may appear
that SELinux is disabled, but it is simply in an uninitialized state.
If you load a policy with `load_policy -i`, you will see SELinux
come alive just as if you had loaded the policy during early-boot.
It is also worth noting that the "/sys/fs/selinux/disable" file is
always writable now, regardless of the Kconfig settings, but writing
to the file has no effect on the system, other than to display an
error on the console if a non-zero/true value is written.
Finally, in the several years where we have been working on
deprecating this functionality, there has only been one instance of
someone mentioning any user visible breakage. In this particular
case it was an individual's kernel test system, and the workaround
documented in the deprecation notice ("selinux=0" on the kernel
command line) resolved the issue without problem.
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
We originally promised that the SELinux 'checkreqprot' functionality
would be removed no sooner than June 2021, and now that it is March
2023 it seems like it is a good time to do the final removal. The
deprecation notice in the kernel provides plenty of detail on why
'checkreqprot' is not desirable, with the key point repeated below:
This was a compatibility mechanism for legacy userspace and
for the READ_IMPLIES_EXEC personality flag. However, if set to
1, it weakens security by allowing mappings to be made executable
without authorization by policy. The default value of checkreqprot
at boot was changed starting in Linux v4.4 to 0 (i.e. check the
actual protection), and Android and Linux distributions have been
explicitly writing a "0" to /sys/fs/selinux/checkreqprot during
initialization for some time.
Along with the official deprecation notice, we have been discussing
this on-list and directly with several of the larger SELinux-based
distros and everyone is happy to see this feature finally removed.
In an attempt to catch all of the smaller, and DIY, Linux systems
we have been writing a deprecation notice URL into the kernel log,
along with a growing ssleep() penalty, when admins enabled
checkreqprot at runtime or via the kernel command line. We have
yet to have anyone come to us and raise an objection to the
deprecation or planned removal.
It is worth noting that while this patch removes the checkreqprot
functionality, it leaves the user visible interfaces (kernel command
line and selinuxfs file) intact, just inert. This should help
prevent breakages with existing userspace tools that correctly, but
unnecessarily, disable checkreqprot at boot or runtime. Admins
that attempt to enable checkreqprot will be met with a removal
message in the kernel log.
Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Linus observed that the pervasive passing of selinux_state pointers
introduced by me in commit aa8e712cee ("selinux: wrap global selinux
state") adds overhead and complexity without providing any
benefit. The original idea was to pave the way for SELinux namespaces
but those have not yet been implemented and there isn't currently
a concrete plan to do so. Remove the passing of the selinux_state
pointers, reverting to direct use of the single global selinux_state,
and likewise remove passing of child pointers like the selinux_avc.
The selinux_policy pointer remains as it is needed for atomic switching
of policies.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/oe-kbuild-all/202303101057.mZ3Gv5fK-lkp@intel.com/
Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
This is based on earlier patch posted to the list by Linus, his
commit description read:
"avc_has_perm_noaudit()is one of those hot functions that end up
being used by almost all filesystem operations (through
"avc_has_perm()") and it's intended to be cheap enough to inline.
However, it turns out that the unlikely parts of it (where it
doesn't find an existing avc node) need a fair amount of stack
space for the automatic replacement node, so if it were to be
inlined (at least clang does not) it would just use stack space
unnecessarily.
So split the unlikely part out of it, and mark that part noinline.
That improves the actual likely part."
The basic idea behind the patch was reasonable, but there were minor
nits (double indenting, etc.) and the RCU read lock unlock/re-lock in
avc_compute_av() began to look even more ugly. This patch builds on
Linus' first effort by cleaning things up a bit and removing the RCU
unlock/lock dance in avc_compute_av().
Removing the RCU lock dance in avc_compute_av() is safe as there are
currently two callers of avc_compute_av(): avc_has_perm_noaudit() and
avc_has_extended_perms(). The first caller in avc_has_perm_noaudit()
does not require a RCU lock as there is no avc_node to protect so the
RCU lock can be dropped before calling avc_compute_av(). The second
caller, avc_has_extended_perms(), is similar in that there is no
avc_node that requires RCU protection, but the code is simplified by
holding the RCU look around the avc_compute_av() call, and given that
we enter a RCU critical section in security_compute_av() (called from
av_compute_av()) the impact will likely be unnoticeable. It is also
worth noting that avc_has_extended_perms() is only called from the
SELinux ioctl() access control hook at the moment.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Paul Moore <paul@paul-moore.com>
F_SEAL_EXEC") which permits the setting of the memfd execute bit at
memfd creation time, with the option of sealing the state of the X bit.
- Peter Xu adds a patch series ("mm/hugetlb: Make huge_pte_offset()
thread-safe for pmd unshare") which addresses a rare race condition
related to PMD unsharing.
- Several folioification patch serieses from Matthew Wilcox, Vishal
Moola, Sidhartha Kumar and Lorenzo Stoakes
- Johannes Weiner has a series ("mm: push down lock_page_memcg()") which
does perform some memcg maintenance and cleanup work.
- SeongJae Park has added DAMOS filtering to DAMON, with the series
"mm/damon/core: implement damos filter". These filters provide users
with finer-grained control over DAMOS's actions. SeongJae has also done
some DAMON cleanup work.
- Kairui Song adds a series ("Clean up and fixes for swap").
- Vernon Yang contributed the series "Clean up and refinement for maple
tree".
- Yu Zhao has contributed the "mm: multi-gen LRU: memcg LRU" series. It
adds to MGLRU an LRU of memcgs, to improve the scalability of global
reclaim.
- David Hildenbrand has added some userfaultfd cleanup work in the
series "mm: uffd-wp + change_protection() cleanups".
- Christoph Hellwig has removed the generic_writepages() library
function in the series "remove generic_writepages".
- Baolin Wang has performed some maintenance on the compaction code in
his series "Some small improvements for compaction".
- Sidhartha Kumar is doing some maintenance work on struct page in his
series "Get rid of tail page fields".
- David Hildenbrand contributed some cleanup, bugfixing and
generalization of pte management and of pte debugging in his series "mm:
support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with swap
PTEs".
- Mel Gorman and Neil Brown have removed the __GFP_ATOMIC allocation
flag in the series "Discard __GFP_ATOMIC".
- Sergey Senozhatsky has improved zsmalloc's memory utilization with his
series "zsmalloc: make zspage chain size configurable".
- Joey Gouly has added prctl() support for prohibiting the creation of
writeable+executable mappings. The previous BPF-based approach had
shortcomings. See "mm: In-kernel support for memory-deny-write-execute
(MDWE)".
- Waiman Long did some kmemleak cleanup and bugfixing in the series
"mm/kmemleak: Simplify kmemleak_cond_resched() & fix UAF".
- T.J. Alumbaugh has contributed some MGLRU cleanup work in his series
"mm: multi-gen LRU: improve".
- Jiaqi Yan has provided some enhancements to our memory error
statistics reporting, mainly by presenting the statistics on a per-node
basis. See the series "Introduce per NUMA node memory error
statistics".
- Mel Gorman has a second and hopefully final shot at fixing a CPU-hog
regression in compaction via his series "Fix excessive CPU usage during
compaction".
- Christoph Hellwig does some vmalloc maintenance work in the series
"cleanup vfree and vunmap".
- Christoph Hellwig has removed block_device_operations.rw_page() in ths
series "remove ->rw_page".
- We get some maple_tree improvements and cleanups in Liam Howlett's
series "VMA tree type safety and remove __vma_adjust()".
- Suren Baghdasaryan has done some work on the maintainability of our
vm_flags handling in the series "introduce vm_flags modifier functions".
- Some pagemap cleanup and generalization work in Mike Rapoport's series
"mm, arch: add generic implementation of pfn_valid() for FLATMEM" and
"fixups for generic implementation of pfn_valid()"
- Baoquan He has done some work to make /proc/vmallocinfo and
/proc/kcore better represent the real state of things in his series
"mm/vmalloc.c: allow vread() to read out vm_map_ram areas".
- Jason Gunthorpe rationalized the GUP system's interface to the rest of
the kernel in the series "Simplify the external interface for GUP".
- SeongJae Park wishes to migrate people from DAMON's debugfs interface
over to its sysfs interface. To support this, we'll temporarily be
printing warnings when people use the debugfs interface. See the series
"mm/damon: deprecate DAMON debugfs interface".
- Andrey Konovalov provided the accurately named "lib/stackdepot: fixes
and clean-ups" series.
- Huang Ying has provided a dramatic reduction in migration's TLB flush
IPI rates with the series "migrate_pages(): batch TLB flushing".
- Arnd Bergmann has some objtool fixups in "objtool warning fixes".
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY/PoPQAKCRDdBJ7gKXxA
jlvpAPsFECUBBl20qSue2zCYWnHC7Yk4q9ytTkPB/MMDrFEN9wD/SNKEm2UoK6/K
DmxHkn0LAitGgJRS/W9w81yrgig9tAQ=
=MlGs
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- Daniel Verkamp has contributed a memfd series ("mm/memfd: add
F_SEAL_EXEC") which permits the setting of the memfd execute bit at
memfd creation time, with the option of sealing the state of the X
bit.
- Peter Xu adds a patch series ("mm/hugetlb: Make huge_pte_offset()
thread-safe for pmd unshare") which addresses a rare race condition
related to PMD unsharing.
- Several folioification patch serieses from Matthew Wilcox, Vishal
Moola, Sidhartha Kumar and Lorenzo Stoakes
- Johannes Weiner has a series ("mm: push down lock_page_memcg()")
which does perform some memcg maintenance and cleanup work.
- SeongJae Park has added DAMOS filtering to DAMON, with the series
"mm/damon/core: implement damos filter".
These filters provide users with finer-grained control over DAMOS's
actions. SeongJae has also done some DAMON cleanup work.
- Kairui Song adds a series ("Clean up and fixes for swap").
- Vernon Yang contributed the series "Clean up and refinement for maple
tree".
- Yu Zhao has contributed the "mm: multi-gen LRU: memcg LRU" series. It
adds to MGLRU an LRU of memcgs, to improve the scalability of global
reclaim.
- David Hildenbrand has added some userfaultfd cleanup work in the
series "mm: uffd-wp + change_protection() cleanups".
- Christoph Hellwig has removed the generic_writepages() library
function in the series "remove generic_writepages".
- Baolin Wang has performed some maintenance on the compaction code in
his series "Some small improvements for compaction".
- Sidhartha Kumar is doing some maintenance work on struct page in his
series "Get rid of tail page fields".
- David Hildenbrand contributed some cleanup, bugfixing and
generalization of pte management and of pte debugging in his series
"mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with
swap PTEs".
- Mel Gorman and Neil Brown have removed the __GFP_ATOMIC allocation
flag in the series "Discard __GFP_ATOMIC".
- Sergey Senozhatsky has improved zsmalloc's memory utilization with
his series "zsmalloc: make zspage chain size configurable".
- Joey Gouly has added prctl() support for prohibiting the creation of
writeable+executable mappings.
The previous BPF-based approach had shortcomings. See "mm: In-kernel
support for memory-deny-write-execute (MDWE)".
- Waiman Long did some kmemleak cleanup and bugfixing in the series
"mm/kmemleak: Simplify kmemleak_cond_resched() & fix UAF".
- T.J. Alumbaugh has contributed some MGLRU cleanup work in his series
"mm: multi-gen LRU: improve".
- Jiaqi Yan has provided some enhancements to our memory error
statistics reporting, mainly by presenting the statistics on a
per-node basis. See the series "Introduce per NUMA node memory error
statistics".
- Mel Gorman has a second and hopefully final shot at fixing a CPU-hog
regression in compaction via his series "Fix excessive CPU usage
during compaction".
- Christoph Hellwig does some vmalloc maintenance work in the series
"cleanup vfree and vunmap".
- Christoph Hellwig has removed block_device_operations.rw_page() in
ths series "remove ->rw_page".
- We get some maple_tree improvements and cleanups in Liam Howlett's
series "VMA tree type safety and remove __vma_adjust()".
- Suren Baghdasaryan has done some work on the maintainability of our
vm_flags handling in the series "introduce vm_flags modifier
functions".
- Some pagemap cleanup and generalization work in Mike Rapoport's
series "mm, arch: add generic implementation of pfn_valid() for
FLATMEM" and "fixups for generic implementation of pfn_valid()"
- Baoquan He has done some work to make /proc/vmallocinfo and
/proc/kcore better represent the real state of things in his series
"mm/vmalloc.c: allow vread() to read out vm_map_ram areas".
- Jason Gunthorpe rationalized the GUP system's interface to the rest
of the kernel in the series "Simplify the external interface for
GUP".
- SeongJae Park wishes to migrate people from DAMON's debugfs interface
over to its sysfs interface. To support this, we'll temporarily be
printing warnings when people use the debugfs interface. See the
series "mm/damon: deprecate DAMON debugfs interface".
- Andrey Konovalov provided the accurately named "lib/stackdepot: fixes
and clean-ups" series.
- Huang Ying has provided a dramatic reduction in migration's TLB flush
IPI rates with the series "migrate_pages(): batch TLB flushing".
- Arnd Bergmann has some objtool fixups in "objtool warning fixes".
* tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (505 commits)
include/linux/migrate.h: remove unneeded externs
mm/memory_hotplug: cleanup return value handing in do_migrate_range()
mm/uffd: fix comment in handling pte markers
mm: change to return bool for isolate_movable_page()
mm: hugetlb: change to return bool for isolate_hugetlb()
mm: change to return bool for isolate_lru_page()
mm: change to return bool for folio_isolate_lru()
objtool: add UACCESS exceptions for __tsan_volatile_read/write
kmsan: disable ftrace in kmsan core code
kasan: mark addr_has_metadata __always_inline
mm: memcontrol: rename memcg_kmem_enabled()
sh: initialize max_mapnr
m68k/nommu: add missing definition of ARCH_PFN_OFFSET
mm: percpu: fix incorrect size in pcpu_obj_full_size()
maple_tree: reduce stack usage with gcc-9 and earlier
mm: page_alloc: call panic() when memoryless node allocation fails
mm: multi-gen LRU: avoid futile retries
migrate_pages: move THP/hugetlb migration support check to simplify code
migrate_pages: batch flushing TLB
migrate_pages: share more code between _unmap and _move
...
Replace direct modifications to vma->vm_flags with calls to modifier
functions to be able to track flag changes and to keep vma locking
correctness.
[akpm@linux-foundation.org: fix drivers/misc/open-dice.c, per Hyeonggon Yoo]
Link: https://lkml.kernel.org/r/20230126193752.297968-5-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjun Roy <arjunroy@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Rientjes <rientjes@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Minchan Kim <minchan@google.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Peter Oskolkov <posk@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Punit Agrawal <punit.agrawal@bytedance.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Convert to struct mnt_idmap.
Last cycle we merged the necessary infrastructure in
256c8aed2b ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.
Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.
Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Convert to struct mnt_idmap.
Last cycle we merged the necessary infrastructure in
256c8aed2b ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.
Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.
Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Convert to struct mnt_idmap.
Last cycle we merged the necessary infrastructure in
256c8aed2b ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.
Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.
Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Convert to struct mnt_idmap.
Last cycle we merged the necessary infrastructure in
256c8aed2b ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.
Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.
Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmOXmxkUHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXMPXg//cxfYC8lRtVpuGNCZWDietSiHzpzu
+qFntaTplvybJMQX0HfgNee5cTBZM+W5mp1BHRcZInvV5LRhyrVtgsxDBifutE4x
LyUJAw5SkiPdRC+XLDIRLKiZCobFBLVs2zO+qibIqsyR60pFjU6WXBLbJfidXBFR
yWudDbLU0YhQJCHdNHNqnHCgqrEculxn6q3QPvm/DX0xzBwkFHSSYBkGNvHW2ZTA
lKNreEOwEk5DTLIKjP4bJ72ixp0xbshw5CXuxtwB/12/4h8QbWbJVQLlIeZrTLmp
zQXQLJ3pCqKJ2OUCgMDK+wmkvLezd80BV3Due7KX0pT0YRDygoh5QEpZ5/8k8eG7
prxToh2gJWk2htfJF6kgMpAh9Jqewcke4BysbYVM/427OPZYwQqLDZDGOzbtT6pl
FYF+adN9wwkAErnHnPlzYipUEpBWurbjtsV8KFWNERoZ4YmzfSPEisRqGIHDGRws
bTyq/7qs5FXkb1zULELj8V+S2ULsmxPqsxJ63p9di54Uo9lHK0I+0IUtajGDdfze
psAasa9DD/oH2PAbSmpQ5Xo9XyfHRXsVuz1twEmEA14ML0m4wHbNWVHaK0aaXVdG
kJKSDSjMsiV+GiwNo7ISJ4pVdUpnMI/iZSghFfV28cJslNhJDeaREHaE/Wtn1/xF
/bCVmEfS16UoJsQ=
=klFk
-----END PGP SIGNATURE-----
Merge tag 'lsm-pr-20221212' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm
Pull lsm updates from Paul Moore:
- Improve the error handling in the device cgroup such that memory
allocation failures when updating the access policy do not
potentially alter the policy.
- Some minor fixes to reiserfs to ensure that it properly releases
LSM-related xattr values.
- Update the security_socket_getpeersec_stream() LSM hook to take
sockptr_t values.
Previously the net/BPF folks updated the getsockopt code in the
network stack to leverage the sockptr_t type to make it easier to
pass both kernel and __user pointers, but unfortunately when they did
so they didn't convert the LSM hook.
While there was/is no immediate risk by not converting the LSM hook,
it seems like this is a mistake waiting to happen so this patch
proactively does the LSM hook conversion.
- Convert vfs_getxattr_alloc() to return an int instead of a ssize_t
and cleanup the callers. Internally the function was never going to
return anything larger than an int and the callers were doing some
very odd things casting the return value; this patch fixes all that
and helps bring a bit of sanity to vfs_getxattr_alloc() and its
callers.
- More verbose, and helpful, LSM debug output when the system is booted
with "lsm.debug" on the command line. There are examples in the
commit description, but the quick summary is that this patch provides
better information about which LSMs are enabled and the ordering in
which they are processed.
- General comment and kernel-doc fixes and cleanups.
* tag 'lsm-pr-20221212' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
lsm: Fix description of fs_context_parse_param
lsm: Add/fix return values in lsm_hooks.h and fix formatting
lsm: Clarify documentation of vm_enough_memory hook
reiserfs: Add missing calls to reiserfs_security_free()
lsm,fs: fix vfs_getxattr_alloc() return type and caller error paths
device_cgroup: Roll back to original exceptions after copy failure
LSM: Better reporting of actual LSMs at boot
lsm: make security_socket_getpeersec_stream() sockptr_t safe
audit: Fix some kernel-doc warnings
lsm: remove obsoleted comments for security hooks
fs: edit a comment made in bad taste
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmOXmvkUHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXNP8BAA0jhzbzMXynz7es7dQTdE2J22umMe
CzGoNxyMAPEYRPlTZmqqwSUaDPhtt4Z0MDkAG1Fn46qn3W8b0L31Z5kXTpanl+1P
ZMP2WRCiuBS8V90XrMhQ9qvUjnIJwe/RRbwiyaSBxRUrN4MU6RA/q9suyYu/aKvo
sueRJJtJgcwb8fGpKbaoGU4NiSeCCzabT7E+ofPYt4joCAdbLzokszbWrqEYInh/
yb6V03Mad/wl7jz3BwSwY+cVdEuJV+mDcfIg1yB7O9pr/H8HpIcXvYIyEICrVdGw
nstkI76w22HcbHkWWbLWNAdPRUcMRA8Bf3GAXuhV+8gr2g8bt5ePEXsqkc1Oh75z
o59TaBwCGxsE6qffBcytdBueqaf+CFWXv0kTIRGS9SMMCe6r3y8UIYxzdebOEB3v
uJVWOUZTI3FqFdHl6v9I2d1R5FQurh2yX01JIe5vk2I5Oswy8hHVvDFxnJ5AEeUW
Mcl/zV2lGgdfLrxQ+qideiTx/d71Dw/BExlyaFP8b1/ccX0X6vnOtvt6z3vw4KsR
QDffPbFZhtApJuHBf05iYMXaUS41RU55sAaDtFh94eWRD5EZ9298qGpP6+weJvlz
ofBvKaZswQj6ZdymoZB+A+vbwUKItp2ApijyLbOMtaP1RNY1/47aO0kQkmPRuHe7
5+cKG8cjyrruZXw=
=4AGR
-----END PGP SIGNATURE-----
Merge tag 'selinux-pr-20221212' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull selinux updates from Paul Moore:
"Two SELinux patches: one increases the sleep time on deprecated
functionality, and one removes the indirect calls in the sidtab
context conversion code"
* tag 'selinux-pr-20221212' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: remove the sidtab context conversion indirect calls
selinux: increase the deprecation sleep for checkreqprot and runtime disable
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCY5bwTgAKCRCRxhvAZXjc
ovd2AQCK00NAtGjQCjQPQGyTa4GAPqvWgq1ef0lnhv+TL5US5gD9FncQ8UofeMXt
pBfjtAD6ettTPCTxUQfnTwWEU4rc7Qg=
=27Wm
-----END PGP SIGNATURE-----
Merge tag 'fs.acl.rework.v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping
Pull VFS acl updates from Christian Brauner:
"This contains the work that builds a dedicated vfs posix acl api.
The origins of this work trace back to v5.19 but it took quite a while
to understand the various filesystem specific implementations in
sufficient detail and also come up with an acceptable solution.
As we discussed and seen multiple times the current state of how posix
acls are handled isn't nice and comes with a lot of problems: The
current way of handling posix acls via the generic xattr api is error
prone, hard to maintain, and type unsafe for the vfs until we call
into the filesystem's dedicated get and set inode operations.
It is already the case that posix acls are special-cased to death all
the way through the vfs. There are an uncounted number of hacks that
operate on the uapi posix acl struct instead of the dedicated vfs
struct posix_acl. And the vfs must be involved in order to interpret
and fixup posix acls before storing them to the backing store, caching
them, reporting them to userspace, or for permission checking.
Currently a range of hacks and duct tape exist to make this work. As
with most things this is really no ones fault it's just something that
happened over time. But the code is hard to understand and difficult
to maintain and one is constantly at risk of introducing bugs and
regressions when having to touch it.
Instead of continuing to hack posix acls through the xattr handlers
this series builds a dedicated posix acl api solely around the get and
set inode operations.
Going forward, the vfs_get_acl(), vfs_remove_acl(), and vfs_set_acl()
helpers must be used in order to interact with posix acls. They
operate directly on the vfs internal struct posix_acl instead of
abusing the uapi posix acl struct as we currently do. In the end this
removes all of the hackiness, makes the codepaths easier to maintain,
and gets us type safety.
This series passes the LTP and xfstests suites without any
regressions. For xfstests the following combinations were tested:
- xfs
- ext4
- btrfs
- overlayfs
- overlayfs on top of idmapped mounts
- orangefs
- (limited) cifs
There's more simplifications for posix acls that we can make in the
future if the basic api has made it.
A few implementation details:
- The series makes sure to retain exactly the same security and
integrity module permission checks. Especially for the integrity
modules this api is a win because right now they convert the uapi
posix acl struct passed to them via a void pointer into the vfs
struct posix_acl format to perform permission checking on the mode.
There's a new dedicated security hook for setting posix acls which
passes the vfs struct posix_acl not a void pointer. Basing checking
on the posix acl stored in the uapi format is really unreliable.
The vfs currently hacks around directly in the uapi struct storing
values that frankly the security and integrity modules can't
correctly interpret as evidenced by bugs we reported and fixed in
this area. It's not necessarily even their fault it's just that the
format we provide to them is sub optimal.
- Some filesystems like 9p and cifs need access to the dentry in
order to get and set posix acls which is why they either only
partially or not even at all implement get and set inode
operations. For example, cifs allows setxattr() and getxattr()
operations but doesn't allow permission checking based on posix
acls because it can't implement a get acl inode operation.
Thus, this patch series updates the set acl inode operation to take
a dentry instead of an inode argument. However, for the get acl
inode operation we can't do this as the old get acl method is
called in e.g., generic_permission() and inode_permission(). These
helpers in turn are called in various filesystem's permission inode
operation. So passing a dentry argument to the old get acl inode
operation would amount to passing a dentry to the permission inode
operation which we shouldn't and probably can't do.
So instead of extending the existing inode operation Christoph
suggested to add a new one. He also requested to ensure that the
get and set acl inode operation taking a dentry are consistently
named. So for this version the old get acl operation is renamed to
->get_inode_acl() and a new ->get_acl() inode operation taking a
dentry is added. With this we can give both 9p and cifs get and set
acl inode operations and in turn remove their complex custom posix
xattr handlers.
In the future I hope to get rid of the inode method duplication but
it isn't like we have never had this situation. Readdir is just one
example. And frankly, the overall gain in type safety and the more
pleasant api wise are simply too big of a benefit to not accept
this duplication for a while.
- We've done a full audit of every codepaths using variant of the
current generic xattr api to get and set posix acls and
surprisingly it isn't that many places. There's of course always a
chance that we might have missed some and if so I'm sure we'll find
them soon enough.
The crucial codepaths to be converted are obviously stacking
filesystems such as ecryptfs and overlayfs.
For a list of all callers currently using generic xattr api helpers
see [2] including comments whether they support posix acls or not.
- The old vfs generic posix acl infrastructure doesn't obey the
create and replace semantics promised on the setxattr(2) manpage.
This patch series doesn't address this. It really is something we
should revisit later though.
The patches are roughly organized as follows:
(1) Change existing set acl inode operation to take a dentry
argument (Intended to be a non-functional change)
(2) Rename existing get acl method (Intended to be a non-functional
change)
(3) Implement get and set acl inode operations for filesystems that
couldn't implement one before because of the missing dentry.
That's mostly 9p and cifs (Intended to be a non-functional
change)
(4) Build posix acl api, i.e., add vfs_get_acl(), vfs_remove_acl(),
and vfs_set_acl() including security and integrity hooks
(Intended to be a non-functional change)
(5) Implement get and set acl inode operations for stacking
filesystems (Intended to be a non-functional change)
(6) Switch posix acl handling in stacking filesystems to new posix
acl api now that all filesystems it can stack upon support it.
(7) Switch vfs to new posix acl api (semantical change)
(8) Remove all now unused helpers
(9) Additional regression fixes reported after we merged this into
linux-next
Thanks to Seth for a lot of good discussion around this and
encouragement and input from Christoph"
* tag 'fs.acl.rework.v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping: (36 commits)
posix_acl: Fix the type of sentinel in get_acl
orangefs: fix mode handling
ovl: call posix_acl_release() after error checking
evm: remove dead code in evm_inode_set_acl()
cifs: check whether acl is valid early
acl: make vfs_posix_acl_to_xattr() static
acl: remove a slew of now unused helpers
9p: use stub posix acl handlers
cifs: use stub posix acl handlers
ovl: use stub posix acl handlers
ecryptfs: use stub posix acl handlers
evm: remove evm_xattr_acl_change()
xattr: use posix acl api
ovl: use posix acl api
ovl: implement set acl method
ovl: implement get acl method
ecryptfs: implement set acl method
ecryptfs: implement get acl method
ksmbd: use vfs_remove_acl()
acl: add vfs_remove_acl()
...
The sidtab conversion code has support for multiple context
conversion routines through the use of function pointers and
indirect calls. However, the reality is that all current users rely
on the same conversion routine: convert_context(). This patch does
away with this extra complexity and replaces the indirect calls
with direct function calls; allowing us to remove a layer of
obfuscation and create cleaner, more maintainable code.
Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Commit 4ff09db1b7 ("bpf: net: Change sk_getsockopt() to take the
sockptr_t argument") made it possible to call sk_getsockopt()
with both user and kernel address space buffers through the use of
the sockptr_t type. Unfortunately at the time of conversion the
security_socket_getpeersec_stream() LSM hook was written to only
accept userspace buffers, and in a desire to avoid having to change
the LSM hook the commit author simply passed the sockptr_t's
userspace buffer pointer. Since the only sk_getsockopt() callers
at the time of conversion which used kernel sockptr_t buffers did
not allow SO_PEERSEC, and hence the
security_socket_getpeersec_stream() hook, this was acceptable but
also very fragile as future changes presented the possibility of
silently passing kernel space pointers to the LSM hook.
There are several ways to protect against this, including careful
code review of future commits, but since relying on code review to
catch bugs is a recipe for disaster and the upstream eBPF maintainer
is "strongly against defensive programming", this patch updates the
LSM hook, and all of the implementations to support sockptr_t and
safely handle both user and kernel space buffers.
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Acked-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
The current way of setting and getting posix acls through the generic
xattr interface is error prone and type unsafe. The vfs needs to
interpret and fixup posix acls before storing or reporting it to
userspace. Various hacks exist to make this work. The code is hard to
understand and difficult to maintain in it's current form. Instead of
making this work by hacking posix acls through xattr handlers we are
building a dedicated posix acl api around the get and set inode
operations. This removes a lot of hackiness and makes the codepaths
easier to maintain. A lot of background can be found in [1].
So far posix acls were passed as a void blob to the security and
integrity modules. Some of them like evm then proceed to interpret the
void pointer and convert it into the kernel internal struct posix acl
representation to perform their integrity checking magic. This is
obviously pretty problematic as that requires knowledge that only the
vfs is guaranteed to have and has lead to various bugs. Add a proper
security hook for setting posix acls and pass down the posix acls in
their appropriate vfs format instead of hacking it through a void
pointer stored in the uapi format.
I spent considerate time in the security module infrastructure and
audited all codepaths. SELinux has no restrictions based on the posix
acl values passed through it. The capability hook doesn't need to be
called either because it only has restrictions on security.* xattrs. So
these are all fairly simply hooks for SELinux.
Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org [1]
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
The following warning was triggered on a hardware environment:
SELinux: Converting 162 SID table entries...
BUG: sleeping function called from invalid context at
__might_sleep+0x60/0x74 0x0
in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 5943, name: tar
CPU: 7 PID: 5943 Comm: tar Tainted: P O 5.10.0 #1
Call trace:
dump_backtrace+0x0/0x1c8
show_stack+0x18/0x28
dump_stack+0xe8/0x15c
___might_sleep+0x168/0x17c
__might_sleep+0x60/0x74
__kmalloc_track_caller+0xa0/0x7dc
kstrdup+0x54/0xac
convert_context+0x48/0x2e4
sidtab_context_to_sid+0x1c4/0x36c
security_context_to_sid_core+0x168/0x238
security_context_to_sid_default+0x14/0x24
inode_doinit_use_xattr+0x164/0x1e4
inode_doinit_with_dentry+0x1c0/0x488
selinux_d_instantiate+0x20/0x34
security_d_instantiate+0x70/0xbc
d_splice_alias+0x4c/0x3c0
ext4_lookup+0x1d8/0x200 [ext4]
__lookup_slow+0x12c/0x1e4
walk_component+0x100/0x200
path_lookupat+0x88/0x118
filename_lookup+0x98/0x130
user_path_at_empty+0x48/0x60
vfs_statx+0x84/0x140
vfs_fstatat+0x20/0x30
__se_sys_newfstatat+0x30/0x74
__arm64_sys_newfstatat+0x1c/0x2c
el0_svc_common.constprop.0+0x100/0x184
do_el0_svc+0x1c/0x2c
el0_svc+0x20/0x34
el0_sync_handler+0x80/0x17c
el0_sync+0x13c/0x140
SELinux: Context system_u:object_r:pssp_rsyslog_log_t:s0:c0 is
not valid (left unmapped).
It was found that within a critical section of spin_lock_irqsave in
sidtab_context_to_sid(), convert_context() (hooked by
sidtab_convert_params.func) might cause the process to sleep via
allocating memory with GFP_KERNEL, which is problematic.
As Ondrej pointed out [1], convert_context()/sidtab_convert_params.func
has another caller sidtab_convert_tree(), which is okay with GFP_KERNEL.
Therefore, fix this problem by adding a gfp_t argument for
convert_context()/sidtab_convert_params.func and pass GFP_KERNEL/_ATOMIC
properly in individual callers.
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/20221018120111.1474581-1-gongruiqi1@huawei.com/ [1]
Reported-by: Tan Ninghao <tanninghao1@huawei.com>
Fixes: ee1a84fdfe ("selinux: overhaul sidtab to fix bug and improve performance")
Signed-off-by: GONG, Ruiqi <gongruiqi1@huawei.com>
Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
[PM: wrap long BUG() output lines, tweak subject line]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Further the checkreqprot and runtime disable deprecation efforts by
increasing the sleep time from 5 to 15 seconds to help make this more
noticeable for any users who are still using these knobs.
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmM68YIUHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXOTbA//TR8i+Wy8iswUCmtfmYg91h1uebpl
/kjNsSmfgivAUTGamr3eN2WRlGhZfkFDPIHa25uybSA6Q+75p4lst83Rt3HDbjkv
Ga7grCXnHwSDwJoHOSeFh0pojV2u7Zvfmiib2U5hPZEmd3kBw3NCgAJVcSGN80B2
dct36fzZNXjvpWDbygmFtRRkmEseslSkft8bUVvNZBP+B0zvv3vcNY1QFuKuK+W2
8wWpvO/cCSmke5i2c2ktHSk2f8/Y6n26Ik/OTHcTVfoKZLRaFbXEzLyxzLrNWd6m
hujXgcxszTtHdmoXx+J6uBauju7TR8pi1x8mO2LSGrlpRc1cX0A5ED8WcH71+HVE
8L1fIOmZShccPZn8xRok7oYycAUm/gIfpmSLzmZA76JsZYAe+mp9Ze9FA6fZtSwp
7Q/rfw/Rlz25WcFBe4xypP078HkOmqutkCk2zy5liR+cWGrgy/WKX15vyC0TaPrX
tbsRKuCLkipgfXrTk0dX3kmhz+3bJYjqeZEt7sfPSZYpaOGkNXVmAW0wnCOTuLMU
+8pIVktvQxMmACEj2gBMz11iooR4DpWLxOcQQR/impgCpNdZ60nA0a6KPJoIXC+5
NfTa422FZkc99QRVblUZyWSgJBW78Z3ZAQcQlo1AGLlFydbfrSFTRLbmNJZo/Nkl
KwpGvWs5nB0rVw0=
=VZl5
-----END PGP SIGNATURE-----
Merge tag 'lsm-pr-20221003' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm
Pull LSM updates from Paul Moore:
"Seven patches for the LSM layer and we've got a mix of trivial and
significant patches. Highlights below, starting with the smaller bits
first so they don't get lost in the discussion of the larger items:
- Remove some redundant NULL pointer checks in the common LSM audit
code.
- Ratelimit the lockdown LSM's access denial messages.
With this change there is a chance that the last visible lockdown
message on the console is outdated/old, but it does help preserve
the initial series of lockdown denials that started the denial
message flood and my gut feeling is that these might be the more
valuable messages.
- Open userfaultfds as readonly instead of read/write.
While this code obviously lives outside the LSM, it does have a
noticeable impact on the LSMs with Ondrej explaining the situation
in the commit description. It is worth noting that this patch
languished on the VFS list for over a year without any comments
(objections or otherwise) so I took the liberty of pulling it into
the LSM tree after giving fair notice. It has been in linux-next
since the end of August without any noticeable problems.
- Add a LSM hook for user namespace creation, with implementations
for both the BPF LSM and SELinux.
Even though the changes are fairly small, this is the bulk of the
diffstat as we are also including BPF LSM selftests for the new
hook.
It's also the most contentious of the changes in this pull request
with Eric Biederman NACK'ing the LSM hook multiple times during its
development and discussion upstream. While I've never taken NACK's
lightly, I'm sending these patches to you because it is my belief
that they are of good quality, satisfy a long-standing need of
users and distros, and are in keeping with the existing nature of
the LSM layer and the Linux Kernel as a whole.
The patches in implement a LSM hook for user namespace creation
that allows for a granular approach, configurable at runtime, which
enables both monitoring and control of user namespaces. The general
consensus has been that this is far preferable to the other
solutions that have been adopted downstream including outright
removal from the kernel, disabling via system wide sysctls, or
various other out-of-tree mechanisms that users have been forced to
adopt since we haven't been able to provide them an upstream
solution for their requests. Eric has been steadfast in his
objections to this LSM hook, explaining that any restrictions on
the user namespace could have significant impact on userspace.
While there is the possibility of impacting userspace, it is
important to note that this solution only impacts userspace when it
is requested based on the runtime configuration supplied by the
distro/admin/user. Frederick (the pathset author), the LSM/security
community, and myself have tried to work with Eric during
development of this patchset to find a mutually acceptable
solution, but Eric's approach and unwillingness to engage in a
meaningful way have made this impossible. I have CC'd Eric directly
on this pull request so he has a chance to provide his side of the
story; there have been no objections outside of Eric's"
* tag 'lsm-pr-20221003' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
lockdown: ratelimit denial messages
userfaultfd: open userfaultfds with O_RDONLY
selinux: Implement userns_create hook
selftests/bpf: Add tests verifying bpf lsm userns_create hook
bpf-lsm: Make bpf_lsm_userns_create() sleepable
security, lsm: Introduce security_create_user_ns()
lsm: clean up redundant NULL pointer check
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmM68ZsUHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXOAtRAAw/lcyPoyN8ia6+PPihRtAKGUFIf5
+IdEPYfCqkGghqB7BRDl5bXOLFgpY/m/41g+xFvzJ0fhVPLa7UWB//N7yTu3OnW/
vXz1wn0EJAeDlLbPzWd6V/SpcxJ1WPzjHj2B3YXNWnukfMjCnPIA8XlZc18zAWS1
/OOEBoOo/a/8Giw2l1bEXxfmDI20NrXNL3vWKQ+Bbhg2PJaH/FTk4DNxopt84o28
vA+cbfQcOOjeRjBuncnTp9/b244ojeM+lRSJZozGTogFIeDUp3KW1D7NHqNwyX12
seDooqLEP25vP+kQh8zH7gvacpoeDLz40bSpd+MKKj02IxKGikykWuvtlFWY3xNB
o1mT4SJhh3JcewS7gh6P5aESSSgLg9zb3zMGtjHhtz+HHi/Sq7PK7xJgrnKOBNgu
CLIu3L+5vJpAgrsze2tIcwRUySIzDKnfgw8Oz7zaS2lOTJ58emz00QwEioHMQufK
8gZXTvZykJAtLF19PJw+mHKu38hbdD/4vt8AFuIgJzFkjWKzaZAxUBT+3p/uaLHG
2PegjKzpCqH9vZ/HCdYI42OB8TKiPU3eBtYZ2eP3h7cdDu++tp1rf0hwHQrwE2AD
PRuoCaBYOTUedbR8CV07fSSGFnZvlPnuk9yB7/eztV2thBQG28ALGxVhWadn4ap/
UIFgCs5QDRj11u8=
=BQ+i
-----END PGP SIGNATURE-----
Merge tag 'selinux-pr-20221003' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull SELinux updates from Paul Moore:
"Six SELinux patches, all are simple and easily understood, but a list
of the highlights is below:
- Use 'grep -E' instead of 'egrep' in the SELinux policy install
script.
Fun fact, this seems to be GregKH's *second* dedicated SELinux
patch since we transitioned to git (ignoring merges, the SPDX
stuff, and a trivial fs reference removal when lustre was yanked);
the first was back in 2011 when selinuxfs was placed in
/sys/fs/selinux. Oh, the memories ...
- Convert the SELinux policy boolean values to use signed integer
types throughout the SELinux kernel code.
Prior to this we were using a mix of signed and unsigned integers
which was probably okay in this particular case, but it is
definitely not a good idea in general.
- Remove a reference to the SELinux runtime disable functionality in
/etc/selinux/config as we are in the process of deprecating that.
See [1] for more background on this if you missed the previous
notes on the deprecation.
- Minor cleanups: remove unneeded variables and function parameter
constification"
Link: https://github.com/SELinuxProject/selinux-kernel/wiki/DEPRECATE-runtime-disable [1]
* tag 'selinux-pr-20221003' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: remove runtime disable message in the install_policy.sh script
selinux: use "grep -E" instead of "egrep"
selinux: remove the unneeded result variable
selinux: declare read-only parameters const
selinux: use int arrays for boolean values
selinux: remove an unneeded variable in sel_make_class_dir_entries()
Return the value avc_has_perm() directly instead of storing it in
another redundant variable.
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Xu Panda <xu.panda@zte.com.cn>
[PM: subject line tweak]
Signed-off-by: Paul Moore <paul@paul-moore.com>
cast of ->d_name.name to char * is completely wrong - nothing is
allowed to modify its contents.
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Acked-by: Paul Moore <paul@paul-moore.com>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Declare ebitmap, mls_level and mls_context parameters const where they
are only read from. This allows callers to supply pointers to const
as arguments and increases readability.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Do not cast pointers of signed integers to pointers of unsigned integers
and vice versa.
It should currently not be an issue since they hold SELinux boolean
values which should only contain either 0's or 1's, which should have
the same representation.
Reported by sparse:
.../selinuxfs.c:1485:30: warning: incorrect type in assignment
(different signedness)
.../selinuxfs.c:1485:30: expected unsigned int *
.../selinuxfs.c:1485:30: got int *[addressable] values
.../selinuxfs.c:1402:48: warning: incorrect type in argument 3
(different signedness)
.../selinuxfs.c:1402:48: expected int *values
.../selinuxfs.c:1402:48: got unsigned int *bool_pending_values
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
[PM: minor whitespace fixes, sparse output cleanup]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Return the value sel_make_perm_files() directly instead of storing it
in another redundant variable.
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: ye xingchen <ye.xingchen@zte.com.cn>
[PM: subject line tweak]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Add a SELinux access control for the iouring IORING_OP_URING_CMD
command. This includes the addition of a new permission in the
existing "io_uring" object class: "cmd". The subject of the new
permission check is the domain of the process requesting access, the
object is the open file which points to the device/file that is the
target of the IORING_OP_URING_CMD operation. A sample policy rule
is shown below:
allow <domain> <file>:io_uring { cmd };
Cc: stable@vger.kernel.org
Fixes: ee692a21e9 ("fs,io_uring: add infrastructure for uring-cmd")
Signed-off-by: Paul Moore <paul@paul-moore.com>
Unprivileged user namespace creation is an intended feature to enable
sandboxing, however this feature is often used to as an initial step to
perform a privilege escalation attack.
This patch implements a new user_namespace { create } access control
permission to restrict which domains allow or deny user namespace
creation. This is necessary for system administrators to quickly protect
their systems while waiting for vulnerability patches to be applied.
This permission can be used in the following way:
allow domA_t domA_t : user_namespace { create };
Signed-off-by: Frederick Lawler <fred@cloudflare.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmLoEeIUHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXNSOhAAwWwRcmcHnk+k2agT9QjKrLo26NCO
MQLE89o4y2ChEFHxC7F7SKoQRxtfYa323p1vmlGzKrlB+IZ6oqERVp4QNQQbXsfn
n9VvVpxjRNHAetcRhCM9ZOchWjUdw6AMaJ8e3fdRNRESadAUUFDxifw1wpjgG9+i
LmtDbfZ7vLs2grTf9OZy3JIl1VF3lVRUTI7ZBQggfJncMa+LXNWdVNmEe3yfyboA
1MwpSao7K2si0hBGAQo/UGQz4b19Tm4xMg8bSy7oTsP5Lae5ciPkeI3qazvs9usp
WScZYhQ8NugqLbDbjs7dm6QCpj4x3dUs6ei48LKe3GF2mcGesFfOPo9sNHao4kKv
C9t0f9qw+EhGvnNL7uQIDDf8OuTjuLWDvZSrMLID/IJKFF5NJ3y+XzaS9aPM3VEY
qyOsX+cEzheXGhD6xE1sCo+AyPUDYqNDMIKBj2wlIGCKlzDGa8RT6VsQuvgf3c3K
43CnRCQeWDWOHCq3MnRe/fmYtW+JB7tsXiKAq4OJADacwPP36bsP3bqU8AlWYwDt
tnuMa+LKusHnMEQpMPI8FW8qGdxwGSen+mymfLFIMgtwNGkV7WGRJ6Lbyn0SaR6v
HyXgZASIOQRnamK3yZCDpxo0K81IVxPWJIjHyg53znqT5TCpXccPyV4HwbJKI/KG
8PtHrXOdPOGCZ2g=
=WWq1
-----END PGP SIGNATURE-----
Merge tag 'selinux-pr-20220801' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull selinux updates from Paul Moore:
"A relatively small set of patches for SELinux this time, eight patches
in total with really only one significant change.
The highlights are:
- Add support for proper labeling of memfd_secret anonymous inodes.
This will allow LSMs that implement the anonymous inode hooks to
apply security policy to memfd_secret() fds.
- Various small improvements to memory management: fixed leaks, freed
memory when needed, boundary checks.
- Hardened the selinux_audit_data struct with __randomize_layout.
- A minor documentation tweak to fix a formatting/style issue"
* tag 'selinux-pr-20220801' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: selinux_add_opt() callers free memory
selinux: Add boundary check in put_entry()
selinux: fix memleak in security_read_state_kernel()
docs: selinux: add '=' signs to kernel boot options
mm: create security context for memfd_secret inodes
selinux: fix typos in comments
selinux: drop unnecessary NULL check
selinux: add __randomize_layout to selinux_audit_data
The selinux_add_opt() function may need to allocate memory for the
mount options if none has already been allocated, but there is no
need to free that memory on error as the callers handle that. Drop
the existing kfree() on error to help increase consistency in the
selinux_add_opt() error handling.
This patch also changes selinux_add_opt() to return -EINVAL when
the mount option value, @s, is NULL. It currently return -ENOMEM.
Link: https://lore.kernel.org/lkml/20220611090550.135674-1-xiujianfeng@huawei.com/T/
Suggested-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
[PM: fix subject, rework commit description language]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Just like next_entry(), boundary check is necessary to prevent memory
out-of-bound access.
Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
In this function, it directly returns the result of __security_read_policy
without freeing the allocated memory in *data, cause memory leak issue,
so free the memory if __security_read_policy failed.
Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
[PM: subject line tweak]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Commit e3489f8974 ("selinux: kill selinux_sb_get_mnt_opts()")
introduced a NULL check on the context after a successful call to
security_sid_to_context(). This is on the one hand redundant after
checking for success and on the other hand insufficient on an actual
NULL pointer, since the context is passed to seq_escape() leading to a
call of strlen() on it.
Reported by Clang analyzer:
In file included from security/selinux/hooks.c:28:
In file included from ./include/linux/tracehook.h:50:
In file included from ./include/linux/memcontrol.h:13:
In file included from ./include/linux/cgroup.h:18:
./include/linux/seq_file.h:136:25: warning: Null pointer passed as 1st argument to string length function [unix.cstring.NullArg]
seq_escape_mem(m, src, strlen(src), flags, esc);
^~~~~~~~~~~
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Randomize the layout of struct selinux_audit_data as suggested in [1],
since it contains a pointer to struct selinux_state, an already
randomized strucure.
[1]: https://github.com/KSPP/linux/issues/188
Signed-off-by: GONG, Ruiqi <gongruiqi1@huawei.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmKLj4oUHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXNIoA//c2Fbgr3tTs6yCWAJk+mQcVwD1eq5
F2f3ild8qpSH15aYZkQPapJ0Ep1W4EDuf/AbRbfVB4t+tknrxtR8IAtiUYOPDlfW
eK85ENj5b+Hc6mPPHE8On0kc6oNySYeHXHGZ84c4DxRwjXolnHQTOIHb7pMKTGyU
cq6oqsgkpou88rnzJg/eiFkf/Yk2h0oS8jDQcu2OVaeNoBaVg5oAau01HES1IMzB
gqiEi0WXQII9lQX2qRLCPiPuHwA//PoMmx342JiIFcrOrprBCYiQ5yNWYR+VKuGP
WH85etJOeWh9kqsvRVSMs/y3L+RPFoydwLXsud0lIappbad53KJDq53oDco7PTY/
lhrhgSEipwc18QFZzIj7+h2R53k5YQYWFk5dC1nKfkVLd/sAqAcLPfbyOmeSQ097
/DbzUouiP8zq7WHpPw6dikVeT5wBqBjEcwoCZSjctXi4vDSWNWt6OBunx7bwOhbr
IfKESEDJhyG2xtmyYgEpDFXTn4d2SuxspPRmdYDOlvgLLH037+cXm/8TmzoMNiQ3
Xs6/vpzFmh+r+0Astzt+MisQrWDGNF9XQqVz4UrXkSXTqtkXO28/4ZCh0NE2squu
6zXf2KX79HxMos8OELvBV73U6yIEoK18qsygYgHwT+iB+YOMZvwZMpyl35JZWnAK
fxVu54GrcQNjCQs=
=1ZFj
-----END PGP SIGNATURE-----
Merge tag 'selinux-pr-20220523' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull selinux updates from Paul Moore:
"We've got twelve patches queued for v5.19, with most being fairly
minor. The highlights are below:
- The checkreqprot and runtime disable knobs have been deprecated for
some time with no active users that we can find. In an effort to
move things along we are adding a pause when the knobs are used to
help make the deprecation more noticeable in case anyone is still
using these hacks in the shadows.
- We've added the anonymous inode class name to the AVC audit records
when anonymous inodes are involved. This should make writing policy
easier when anonymous inodes are involved.
- More constification work. This is fairly straightforward and the
source of most of the diffstat.
- The usual minor cleanups: remove unnecessary assignments, assorted
style/checkpatch fixes, kdoc fixes, macro while-loop
encapsulations, #include tweaks, etc"
* tag 'selinux-pr-20220523' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
security: declare member holding string literal const
selinux: log anon inode class name
selinux: declare data arrays const
selinux: fix indentation level of mls_ops block
selinux: include necessary headers in headers
selinux: avoid extra semicolon
selinux: update parameter documentation
selinux: resolve checkpatch errors
selinux: don't sleep when CONFIG_SECURITY_SELINUX_CHECKREQPROT_VALUE is true
selinux: checkreqprot is deprecated, add some ssleep() discomfort
selinux: runtime disable is deprecated, add some ssleep() discomfort
selinux: Remove redundant assignments
The code attempts to free the 'new' pointer using kmem_cache_free(),
which is wrong because this function isn't responsible of freeing it.
Instead, the function should free new->htable and clear the contents of
*new (to prevent double-free).
Cc: stable@vger.kernel.org
Fixes: c7c556f1e8 ("selinux: refactor changing booleans")
Reported-by: Wander Lairson Costa <wander@redhat.com>
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Log the anonymous inode class name in the security hook
inode_init_security_anon. This name is the key for name based type
transitions on the anon_inode security class on creation. Example:
type=AVC msg=audit(02/16/22 22:02:50.585:216) : avc: granted \
{ create } for pid=2136 comm=mariadbd anonclass=[io_uring] \
scontext=system_u:system_r:mysqld_t:s0 \
tcontext=system_u:system_r:mysqld_iouring_t:s0 tclass=anon_inode
Add a new LSM audit data type holding the inode and the class name.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
[PM: adjusted 'anonclass' to be a trusted string, cgzones approved]
Signed-off-by: Paul Moore <paul@paul-moore.com>
The arrays for the policy capability names, the initial sid identifiers
and the class and permission names are not changed at runtime. Declare
them const to avoid accidental modification.
Do not override the classmap and the initial sid list in the build time
script genheaders.
Check flose(3) is successful in genheaders.c, otherwise the written data
might be corrupted or incomplete.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
[PM: manual merge due to fuzz, minor style tweaks]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Add one level of indentation to the code block of the label mls_ops in
constraint_expr_eval(), to adjust the trailing break; to the parent
case: branch.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Include header files required for struct or typedef declarations in
header files. This is for example helpful when working with an IDE, which
needs to resolve those symbols.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Wrap macro into `do { } while (0)` to avoid Clang emitting warnings
about extra semicolons.
Similar to userspace commit
9d85aa60d1
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
[PM: whitespace/indenting tweaks]
Signed-off-by: Paul Moore <paul@paul-moore.com>
security/selinux/include/audit.h:54: warning: Function parameter or member 'krule' not described in 'selinux_audit_rule_known'
security/selinux/include/audit.h:54: warning: Excess function parameter 'rule' description in 'selinux_audit_rule_known'
security/selinux/include/avc.h:130: warning: Function parameter or member 'state' not described in 'avc_audit'
This also bring the parameter name of selinux_audit_rule_known() in sync
between declaration and definition.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Reported by checkpatch:
security/selinux/nlmsgtab.c
---------------------------
ERROR: that open brace { should be on the previous line
#29: FILE: security/selinux/nlmsgtab.c:29:
+static const struct nlmsg_perm nlmsg_route_perms[] =
+{
ERROR: that open brace { should be on the previous line
#97: FILE: security/selinux/nlmsgtab.c:97:
+static const struct nlmsg_perm nlmsg_tcpdiag_perms[] =
+{
ERROR: that open brace { should be on the previous line
#105: FILE: security/selinux/nlmsgtab.c:105:
+static const struct nlmsg_perm nlmsg_xfrm_perms[] =
+{
ERROR: that open brace { should be on the previous line
#134: FILE: security/selinux/nlmsgtab.c:134:
+static const struct nlmsg_perm nlmsg_audit_perms[] =
+{
security/selinux/ss/policydb.c
------------------------------
ERROR: that open brace { should be on the previous line
#318: FILE: security/selinux/ss/policydb.c:318:
+static int (*destroy_f[SYM_NUM]) (void *key, void *datum, void *datap) =
+{
ERROR: that open brace { should be on the previous line
#674: FILE: security/selinux/ss/policydb.c:674:
+static int (*index_f[SYM_NUM]) (void *key, void *datum, void *datap) =
+{
ERROR: that open brace { should be on the previous line
#1643: FILE: security/selinux/ss/policydb.c:1643:
+static int (*read_f[SYM_NUM]) (struct policydb *p, struct symtab *s, void *fp) =
+{
ERROR: that open brace { should be on the previous line
#3246: FILE: security/selinux/ss/policydb.c:3246:
+ void *datap) =
+{
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Unfortunately commit 81200b0265 ("selinux: checkreqprot is
deprecated, add some ssleep() discomfort") added a five second sleep
during early kernel boot, e.g. start_kernel(), which could cause a
"scheduling while atomic" panic. This patch fixes this problem by
moving the sleep out of checkreqprot_set() and into
sel_write_checkreqprot() so that we only sleep when the checkreqprot
setting is set during runtime, after the kernel has booted. The
error message remains the same in both cases.
Fixes: 81200b0265 ("selinux: checkreqprot is deprecated, add some ssleep() discomfort")
Reported-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Paul Moore <paul@paul-moore.com>
The checkreqprot functionality was disabled by default back in
Linux v4.4 (2015) with commit 2a35d196c1 ("selinux: change
CONFIG_SECURITY_SELINUX_CHECKREQPROT_VALUE default") and it was
officially marked as deprecated in Linux v5.7. It was always a
bit of a hack to workaround very old userspace and to the best of
our knowledge, the checkreqprot functionality has been disabled by
Linux distributions for quite some time.
This patch moves the deprecation messages from KERN_WARNING to
KERN_ERR and adds a five second sleep to anyone using it to help
draw their attention to the deprecation and provide a URL which
helps explain things in more detail.
Signed-off-by: Paul Moore <paul@paul-moore.com>
We deprecated the SELinux runtime disable functionality in Linux
v5.6, and it is time to get a bit more serious about removing it.
Add a five second sleep to anyone using it to help draw their
attention to the deprecation and provide a URL which helps explain
things in more detail, including how to add kernel command line
parameters to some of the more popular Linux distributions.
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Get rid of redundant assignments which end up in values not being
read either because they are overwritten or the function ends.
Reported by clang-tidy [deadcode.DeadStores]
Signed-off-by: Michal Orzel <michalorzel.eng@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
This set of changes removes tracehook.h, moves modification of all of
the ptrace fields inside of siglock to remove races, adds a missing
permission check to ptrace.c
The removal of tracehook.h is quite significant as it has been a major
source of confusion in recent years. Much of that confusion was
around task_work and TIF_NOTIFY_SIGNAL (which I have now decoupled
making the semantics clearer).
For people who don't know tracehook.h is a vestiage of an attempt to
implement uprobes like functionality that was never fully merged, and
was later superseeded by uprobes when uprobes was merged. For many
years now we have been removing what tracehook functionaly a little
bit at a time. To the point where now anything left in tracehook.h is
some weird strange thing that is difficult to understand.
Eric W. Biederman (15):
ptrace: Move ptrace_report_syscall into ptrace.h
ptrace/arm: Rename tracehook_report_syscall report_syscall
ptrace: Create ptrace_report_syscall_{entry,exit} in ptrace.h
ptrace: Remove arch_syscall_{enter,exit}_tracehook
ptrace: Remove tracehook_signal_handler
task_work: Remove unnecessary include from posix_timers.h
task_work: Introduce task_work_pending
task_work: Call tracehook_notify_signal from get_signal on all architectures
task_work: Decouple TIF_NOTIFY_SIGNAL and task_work
signal: Move set_notify_signal and clear_notify_signal into sched/signal.h
resume_user_mode: Remove #ifdef TIF_NOTIFY_RESUME in set_notify_resume
resume_user_mode: Move to resume_user_mode.h
tracehook: Remove tracehook.h
ptrace: Move setting/clearing ptrace_message into ptrace_stop
ptrace: Return the signal to continue with from ptrace_stop
Jann Horn (1):
ptrace: Check PTRACE_O_SUSPEND_SECCOMP permission on PTRACE_SEIZE
Yang Li (1):
ptrace: Remove duplicated include in ptrace.c
MAINTAINERS | 1 -
arch/Kconfig | 5 +-
arch/alpha/kernel/ptrace.c | 5 +-
arch/alpha/kernel/signal.c | 4 +-
arch/arc/kernel/ptrace.c | 5 +-
arch/arc/kernel/signal.c | 4 +-
arch/arm/kernel/ptrace.c | 12 +-
arch/arm/kernel/signal.c | 4 +-
arch/arm64/kernel/ptrace.c | 14 +--
arch/arm64/kernel/signal.c | 4 +-
arch/csky/kernel/ptrace.c | 5 +-
arch/csky/kernel/signal.c | 4 +-
arch/h8300/kernel/ptrace.c | 5 +-
arch/h8300/kernel/signal.c | 4 +-
arch/hexagon/kernel/process.c | 4 +-
arch/hexagon/kernel/signal.c | 1 -
arch/hexagon/kernel/traps.c | 6 +-
arch/ia64/kernel/process.c | 4 +-
arch/ia64/kernel/ptrace.c | 6 +-
arch/ia64/kernel/signal.c | 1 -
arch/m68k/kernel/ptrace.c | 5 +-
arch/m68k/kernel/signal.c | 4 +-
arch/microblaze/kernel/ptrace.c | 5 +-
arch/microblaze/kernel/signal.c | 4 +-
arch/mips/kernel/ptrace.c | 5 +-
arch/mips/kernel/signal.c | 4 +-
arch/nds32/include/asm/syscall.h | 2 +-
arch/nds32/kernel/ptrace.c | 5 +-
arch/nds32/kernel/signal.c | 4 +-
arch/nios2/kernel/ptrace.c | 5 +-
arch/nios2/kernel/signal.c | 4 +-
arch/openrisc/kernel/ptrace.c | 5 +-
arch/openrisc/kernel/signal.c | 4 +-
arch/parisc/kernel/ptrace.c | 7 +-
arch/parisc/kernel/signal.c | 4 +-
arch/powerpc/kernel/ptrace/ptrace.c | 8 +-
arch/powerpc/kernel/signal.c | 4 +-
arch/riscv/kernel/ptrace.c | 5 +-
arch/riscv/kernel/signal.c | 4 +-
arch/s390/include/asm/entry-common.h | 1 -
arch/s390/kernel/ptrace.c | 1 -
arch/s390/kernel/signal.c | 5 +-
arch/sh/kernel/ptrace_32.c | 5 +-
arch/sh/kernel/signal_32.c | 4 +-
arch/sparc/kernel/ptrace_32.c | 5 +-
arch/sparc/kernel/ptrace_64.c | 5 +-
arch/sparc/kernel/signal32.c | 1 -
arch/sparc/kernel/signal_32.c | 4 +-
arch/sparc/kernel/signal_64.c | 4 +-
arch/um/kernel/process.c | 4 +-
arch/um/kernel/ptrace.c | 5 +-
arch/x86/kernel/ptrace.c | 1 -
arch/x86/kernel/signal.c | 5 +-
arch/x86/mm/tlb.c | 1 +
arch/xtensa/kernel/ptrace.c | 5 +-
arch/xtensa/kernel/signal.c | 4 +-
block/blk-cgroup.c | 2 +-
fs/coredump.c | 1 -
fs/exec.c | 1 -
fs/io-wq.c | 6 +-
fs/io_uring.c | 11 +-
fs/proc/array.c | 1 -
fs/proc/base.c | 1 -
include/asm-generic/syscall.h | 2 +-
include/linux/entry-common.h | 47 +-------
include/linux/entry-kvm.h | 2 +-
include/linux/posix-timers.h | 1 -
include/linux/ptrace.h | 81 ++++++++++++-
include/linux/resume_user_mode.h | 64 ++++++++++
include/linux/sched/signal.h | 17 +++
include/linux/task_work.h | 5 +
include/linux/tracehook.h | 226 -----------------------------------
include/uapi/linux/ptrace.h | 2 +-
kernel/entry/common.c | 19 +--
kernel/entry/kvm.c | 9 +-
kernel/exit.c | 3 +-
kernel/livepatch/transition.c | 1 -
kernel/ptrace.c | 47 +++++---
kernel/seccomp.c | 1 -
kernel/signal.c | 62 +++++-----
kernel/task_work.c | 4 +-
kernel/time/posix-cpu-timers.c | 1 +
mm/memcontrol.c | 2 +-
security/apparmor/domain.c | 1 -
security/selinux/hooks.c | 1 -
85 files changed, 372 insertions(+), 495 deletions(-)
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEgjlraLDcwBA2B+6cC/v6Eiajj0AFAmJCQkoACgkQC/v6Eiaj
j0DCWQ/5AZVFU+hX32obUNCLackHTwgcCtSOs3JNBmNA/zL/htPiYYG0ghkvtlDR
Dw5J5DnxC6P7PVAdAqrpvx2uX2FebHYU0bRlyLx8LYUEP5dhyNicxX9jA882Z+vw
Ud0Ue9EojwGWS76dC9YoKUj3slThMATbhA2r4GVEoof8fSNJaBxQIqath44t0FwU
DinWa+tIOvZANGBZr6CUUINNIgqBIZCH/R4h6ArBhMlJpuQ5Ufk2kAaiWFwZCkX4
0LuuAwbKsCKkF8eap5I2KrIg/7zZVgxAg9O3cHOzzm8OPbKzRnNnQClcDe8perqp
S6e/f3MgpE+eavd1EiLxevZ660cJChnmikXVVh8ZYYoefaMKGqBaBSsB38bNcLjY
3+f2dB+TNBFRnZs1aCujK3tWBT9QyjZDKtCBfzxDNWBpXGLhHH6j6lA5Lj+Cef5K
/HNHFb+FuqedlFZh5m1Y+piFQ70hTgCa2u8b+FSOubI2hW9Zd+WzINV0ANaZ2LvZ
4YGtcyDNk1q1+c87lxP9xMRl/xi6rNg+B9T2MCo4IUnHgpSVP6VEB3osgUmrrrN0
eQlUI154G/AaDlqXLgmn1xhRmlPGfmenkxpok1AuzxvNJsfLKnpEwQSc13g3oiZr
disZQxNY0kBO2Nv3G323Z6PLinhbiIIFez6cJzK5v0YJ2WtO3pY=
=uEro
-----END PGP SIGNATURE-----
Merge tag 'ptrace-cleanups-for-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull ptrace cleanups from Eric Biederman:
"This set of changes removes tracehook.h, moves modification of all of
the ptrace fields inside of siglock to remove races, adds a missing
permission check to ptrace.c
The removal of tracehook.h is quite significant as it has been a major
source of confusion in recent years. Much of that confusion was around
task_work and TIF_NOTIFY_SIGNAL (which I have now decoupled making the
semantics clearer).
For people who don't know tracehook.h is a vestiage of an attempt to
implement uprobes like functionality that was never fully merged, and
was later superseeded by uprobes when uprobes was merged. For many
years now we have been removing what tracehook functionaly a little
bit at a time. To the point where anything left in tracehook.h was
some weird strange thing that was difficult to understand"
* tag 'ptrace-cleanups-for-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
ptrace: Remove duplicated include in ptrace.c
ptrace: Check PTRACE_O_SUSPEND_SECCOMP permission on PTRACE_SEIZE
ptrace: Return the signal to continue with from ptrace_stop
ptrace: Move setting/clearing ptrace_message into ptrace_stop
tracehook: Remove tracehook.h
resume_user_mode: Move to resume_user_mode.h
resume_user_mode: Remove #ifdef TIF_NOTIFY_RESUME in set_notify_resume
signal: Move set_notify_signal and clear_notify_signal into sched/signal.h
task_work: Decouple TIF_NOTIFY_SIGNAL and task_work
task_work: Call tracehook_notify_signal from get_signal on all architectures
task_work: Introduce task_work_pending
task_work: Remove unnecessary include from posix_timers.h
ptrace: Remove tracehook_signal_handler
ptrace: Remove arch_syscall_{enter,exit}_tracehook
ptrace: Create ptrace_report_syscall_{entry,exit} in ptrace.h
ptrace/arm: Rename tracehook_report_syscall report_syscall
ptrace: Move ptrace_report_syscall into ptrace.h
Core
----
- Introduce XDP multi-buffer support, allowing the use of XDP with
jumbo frame MTUs and combination with Rx coalescing offloads (LRO).
- Speed up netns dismantling (5x) and lower the memory cost a little.
Remove unnecessary per-netns sockets. Scope some lists to a netns.
Cut down RCU syncing. Use batch methods. Allow netdev registration
to complete out of order.
- Support distinguishing timestamp types (ingress vs egress) and
maintaining them across packet scrubbing points (e.g. redirect).
- Continue the work of annotating packet drop reasons throughout
the stack.
- Switch netdev error counters from an atomic to dynamically
allocated per-CPU counters.
- Rework a few preempt_disable(), local_irq_save() and busy waiting
sections problematic on PREEMPT_RT.
- Extend the ref_tracker to allow catching use-after-free bugs.
BPF
---
- Introduce "packing allocator" for BPF JIT images. JITed code is
marked read only, and used to be allocated at page granularity.
Custom allocator allows for more efficient memory use, lower
iTLB pressure and prevents identity mapping huge pages from
getting split.
- Make use of BTF type annotations (e.g. __user, __percpu) to enforce
the correct probe read access method, add appropriate helpers.
- Convert the BPF preload to use light skeleton and drop
the user-mode-driver dependency.
- Allow XDP BPF_PROG_RUN test infra to send real packets, enabling
its use as a packet generator.
- Allow local storage memory to be allocated with GFP_KERNEL if called
from a hook allowed to sleep.
- Introduce fprobe (multi kprobe) to speed up mass attachment (arch
bits to come later).
- Add unstable conntrack lookup helpers for BPF by using the BPF
kfunc infra.
- Allow cgroup BPF progs to return custom errors to user space.
- Add support for AF_UNIX iterator batching.
- Allow iterator programs to use sleepable helpers.
- Support JIT of add, and, or, xor and xchg atomic ops on arm64.
- Add BTFGen support to bpftool which allows to use CO-RE in kernels
without BTF info.
- Large number of libbpf API improvements, cleanups and deprecations.
Protocols
---------
- Micro-optimize UDPv6 Tx, gaining up to 5% in test on dummy netdev.
- Adjust TSO packet sizes based on min_rtt, allowing very low latency
links (data centers) to always send full-sized TSO super-frames.
- Make IPv6 flow label changes (AKA hash rethink) more configurable,
via sysctl and setsockopt. Distinguish between server and client
behavior.
- VxLAN support to "collect metadata" devices to terminate only
configured VNIs. This is similar to VLAN filtering in the bridge.
- Support inserting IPv6 IOAM information to a fraction of frames.
- Add protocol attribute to IP addresses to allow identifying where
given address comes from (kernel-generated, DHCP etc.)
- Support setting socket and IPv6 options via cmsg on ping6 sockets.
- Reject mis-use of ECN bits in IP headers as part of DSCP/TOS.
Define dscp_t and stop taking ECN bits into account in fib-rules.
- Add support for locked bridge ports (for 802.1X).
- tun: support NAPI for packets received from batched XDP buffs,
doubling the performance in some scenarios.
- IPv6 extension header handling in Open vSwitch.
- Support IPv6 control message load balancing in bonding, prevent
neighbor solicitation and advertisement from using the wrong port.
Support NS/NA monitor selection similar to existing ARP monitor.
- SMC
- improve performance with TCP_CORK and sendfile()
- support auto-corking
- support TCP_NODELAY
- MCTP (Management Component Transport Protocol)
- add user space tag control interface
- I2C binding driver (as specified by DMTF DSP0237)
- Multi-BSSID beacon handling in AP mode for WiFi.
- Bluetooth:
- handle MSFT Monitor Device Event
- add MGMT Adv Monitor Device Found/Lost events
- Multi-Path TCP:
- add support for the SO_SNDTIMEO socket option
- lots of selftest cleanups and improvements
- Increase the max PDU size in CAN ISOTP to 64 kB.
Driver API
----------
- Add HW counters for SW netdevs, a mechanism for devices which
offload packet forwarding to report packet statistics back to
software interfaces such as tunnels.
- Select the default NIC queue count as a fraction of number of
physical CPU cores, instead of hard-coding to 8.
- Expose devlink instance locks to drivers. Allow device layer of
drivers to use that lock directly instead of creating their own
which always runs into ordering issues in devlink callbacks.
- Add header/data split indication to guide user space enabling
of TCP zero-copy Rx.
- Allow configuring completion queue event size.
- Refactor page_pool to enable fragmenting after allocation.
- Add allocation and page reuse statistics to page_pool.
- Improve Multiple Spanning Trees support in the bridge to allow
reuse of topologies across VLANs, saving HW resources in switches.
- DSA (Distributed Switch Architecture):
- replay and offload of host VLAN entries
- offload of static and local FDB entries on LAG interfaces
- FDB isolation and unicast filtering
New hardware / drivers
----------------------
- Ethernet:
- LAN937x T1 PHYs
- Davicom DM9051 SPI NIC driver
- Realtek RTL8367S, RTL8367RB-VB switch and MDIO
- Microchip ksz8563 switches
- Netronome NFP3800 SmartNICs
- Fungible SmartNICs
- MediaTek MT8195 switches
- WiFi:
- mt76: MediaTek mt7916
- mt76: MediaTek mt7921u USB adapters
- brcmfmac: Broadcom BCM43454/6
- Mobile:
- iosm: Intel M.2 7360 WWAN card
Drivers
-------
- Convert many drivers to the new phylink API built for split PCS
designs but also simplifying other cases.
- Intel Ethernet NICs:
- add TTY for GNSS module for E810T device
- improve AF_XDP performance
- GTP-C and GTP-U filter offload
- QinQ VLAN support
- Mellanox Ethernet NICs (mlx5):
- support xdp->data_meta
- multi-buffer XDP
- offload tc push_eth and pop_eth actions
- Netronome Ethernet NICs (nfp):
- flow-independent tc action hardware offload (police / meter)
- AF_XDP
- Other Ethernet NICs:
- at803x: fiber and SFP support
- xgmac: mdio: preamble suppression and custom MDC frequencies
- r8169: enable ASPM L1.2 if system vendor flags it as safe
- macb/gem: ZynqMP SGMII
- hns3: add TX push mode
- dpaa2-eth: software TSO
- lan743x: multi-queue, mdio, SGMII, PTP
- axienet: NAPI and GRO support
- Mellanox Ethernet switches (mlxsw):
- source and dest IP address rewrites
- RJ45 ports
- Marvell Ethernet switches (prestera):
- basic routing offload
- multi-chain TC ACL offload
- NXP embedded Ethernet switches (ocelot & felix):
- PTP over UDP with the ocelot-8021q DSA tagging protocol
- basic QoS classification on Felix DSA switch using dcbnl
- port mirroring for ocelot switches
- Microchip high-speed industrial Ethernet (sparx5):
- offloading of bridge port flooding flags
- PTP Hardware Clock
- Other embedded switches:
- lan966x: PTP Hardward Clock
- qca8k: mdio read/write operations via crafted Ethernet packets
- Qualcomm 802.11ax WiFi (ath11k):
- add LDPC FEC type and 802.11ax High Efficiency data in radiotap
- enable RX PPDU stats in monitor co-exist mode
- Intel WiFi (iwlwifi):
- UHB TAS enablement via BIOS
- band disablement via BIOS
- channel switch offload
- 32 Rx AMPDU sessions in newer devices
- MediaTek WiFi (mt76):
- background radar detection
- thermal management improvements on mt7915
- SAR support for more mt76 platforms
- MBSSID and 6 GHz band on mt7915
- RealTek WiFi:
- rtw89: AP mode
- rtw89: 160 MHz channels and 6 GHz band
- rtw89: hardware scan
- Bluetooth:
- mt7921s: wake on Bluetooth, SCO over I2S, wide-band-speed (WBS)
- Microchip CAN (mcp251xfd):
- multiple RX-FIFOs and runtime configurable RX/TX rings
- internal PLL, runtime PM handling simplification
- improve chip detection and error handling after wakeup
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmI7YBcACgkQMUZtbf5S
IrveSBAAmSNJlUK6vPsnNzs7IhsZnfI/AUjm2TCLZnlhKttbpI4A/4Pohk33V7RS
FGX7f8kjEfhUwrIiLDgeCnztNHRECrCmk6aZc/jLEvecmTauJ+f6kjShkDY/wix+
AkPHmrZnQeLPAEVuljDdV+sL6ik08+zQL7PazIYHsaSKKC0MGQptRwcri8PLRAKE
KPBAhVhleq2rAZ/ntprSN52F4Af6rpFTrPIWuN8Bqdbc9dy5094LT0mpOOWYvgr3
/DLvvAPuLemwyIQkjWknVKBRUAQcmNPC+BY3J8K3LRaiNhekGqOFan46BfqP+k2J
6DWu0Qrp2yWt4BMOeEToZR5rA6v5suUAMIBu8PRZIDkINXQMlIxHfGjZyNm0rVfw
7edNri966yus9OdzwPa32MIG3oC6PnVAwYCJAjjBMNS8sSIkp7wgHLkgWN4UFe2H
K/e6z8TLF4UQ+zFM0aGI5WZ+9QqWkTWEDF3R3OhdFpGrznna0gxmkOeV2YvtsgxY
cbS0vV9Zj73o+bYzgBKJsw/dAjyLdXoHUGvus26VLQ78S/VGunVKtItwoxBAYmZo
krW964qcC89YofzSi8RSKLHuEWtNWZbVm8YXr75u6jpr5GhMBu0CYefLs+BuZcxy
dw8c69cGneVbGZmY2J3rBhDkchbuICl8vdUPatGrOJAoaFdYKuw=
=ELpe
-----END PGP SIGNATURE-----
Merge tag 'net-next-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"The sprinkling of SPI drivers is because we added a new one and Mark
sent us a SPI driver interface conversion pull request.
Core
----
- Introduce XDP multi-buffer support, allowing the use of XDP with
jumbo frame MTUs and combination with Rx coalescing offloads (LRO).
- Speed up netns dismantling (5x) and lower the memory cost a little.
Remove unnecessary per-netns sockets. Scope some lists to a netns.
Cut down RCU syncing. Use batch methods. Allow netdev registration
to complete out of order.
- Support distinguishing timestamp types (ingress vs egress) and
maintaining them across packet scrubbing points (e.g. redirect).
- Continue the work of annotating packet drop reasons throughout the
stack.
- Switch netdev error counters from an atomic to dynamically
allocated per-CPU counters.
- Rework a few preempt_disable(), local_irq_save() and busy waiting
sections problematic on PREEMPT_RT.
- Extend the ref_tracker to allow catching use-after-free bugs.
BPF
---
- Introduce "packing allocator" for BPF JIT images. JITed code is
marked read only, and used to be allocated at page granularity.
Custom allocator allows for more efficient memory use, lower iTLB
pressure and prevents identity mapping huge pages from getting
split.
- Make use of BTF type annotations (e.g. __user, __percpu) to enforce
the correct probe read access method, add appropriate helpers.
- Convert the BPF preload to use light skeleton and drop the
user-mode-driver dependency.
- Allow XDP BPF_PROG_RUN test infra to send real packets, enabling
its use as a packet generator.
- Allow local storage memory to be allocated with GFP_KERNEL if
called from a hook allowed to sleep.
- Introduce fprobe (multi kprobe) to speed up mass attachment (arch
bits to come later).
- Add unstable conntrack lookup helpers for BPF by using the BPF
kfunc infra.
- Allow cgroup BPF progs to return custom errors to user space.
- Add support for AF_UNIX iterator batching.
- Allow iterator programs to use sleepable helpers.
- Support JIT of add, and, or, xor and xchg atomic ops on arm64.
- Add BTFGen support to bpftool which allows to use CO-RE in kernels
without BTF info.
- Large number of libbpf API improvements, cleanups and deprecations.
Protocols
---------
- Micro-optimize UDPv6 Tx, gaining up to 5% in test on dummy netdev.
- Adjust TSO packet sizes based on min_rtt, allowing very low latency
links (data centers) to always send full-sized TSO super-frames.
- Make IPv6 flow label changes (AKA hash rethink) more configurable,
via sysctl and setsockopt. Distinguish between server and client
behavior.
- VxLAN support to "collect metadata" devices to terminate only
configured VNIs. This is similar to VLAN filtering in the bridge.
- Support inserting IPv6 IOAM information to a fraction of frames.
- Add protocol attribute to IP addresses to allow identifying where
given address comes from (kernel-generated, DHCP etc.)
- Support setting socket and IPv6 options via cmsg on ping6 sockets.
- Reject mis-use of ECN bits in IP headers as part of DSCP/TOS.
Define dscp_t and stop taking ECN bits into account in fib-rules.
- Add support for locked bridge ports (for 802.1X).
- tun: support NAPI for packets received from batched XDP buffs,
doubling the performance in some scenarios.
- IPv6 extension header handling in Open vSwitch.
- Support IPv6 control message load balancing in bonding, prevent
neighbor solicitation and advertisement from using the wrong port.
Support NS/NA monitor selection similar to existing ARP monitor.
- SMC
- improve performance with TCP_CORK and sendfile()
- support auto-corking
- support TCP_NODELAY
- MCTP (Management Component Transport Protocol)
- add user space tag control interface
- I2C binding driver (as specified by DMTF DSP0237)
- Multi-BSSID beacon handling in AP mode for WiFi.
- Bluetooth:
- handle MSFT Monitor Device Event
- add MGMT Adv Monitor Device Found/Lost events
- Multi-Path TCP:
- add support for the SO_SNDTIMEO socket option
- lots of selftest cleanups and improvements
- Increase the max PDU size in CAN ISOTP to 64 kB.
Driver API
----------
- Add HW counters for SW netdevs, a mechanism for devices which
offload packet forwarding to report packet statistics back to
software interfaces such as tunnels.
- Select the default NIC queue count as a fraction of number of
physical CPU cores, instead of hard-coding to 8.
- Expose devlink instance locks to drivers. Allow device layer of
drivers to use that lock directly instead of creating their own
which always runs into ordering issues in devlink callbacks.
- Add header/data split indication to guide user space enabling of
TCP zero-copy Rx.
- Allow configuring completion queue event size.
- Refactor page_pool to enable fragmenting after allocation.
- Add allocation and page reuse statistics to page_pool.
- Improve Multiple Spanning Trees support in the bridge to allow
reuse of topologies across VLANs, saving HW resources in switches.
- DSA (Distributed Switch Architecture):
- replay and offload of host VLAN entries
- offload of static and local FDB entries on LAG interfaces
- FDB isolation and unicast filtering
New hardware / drivers
----------------------
- Ethernet:
- LAN937x T1 PHYs
- Davicom DM9051 SPI NIC driver
- Realtek RTL8367S, RTL8367RB-VB switch and MDIO
- Microchip ksz8563 switches
- Netronome NFP3800 SmartNICs
- Fungible SmartNICs
- MediaTek MT8195 switches
- WiFi:
- mt76: MediaTek mt7916
- mt76: MediaTek mt7921u USB adapters
- brcmfmac: Broadcom BCM43454/6
- Mobile:
- iosm: Intel M.2 7360 WWAN card
Drivers
-------
- Convert many drivers to the new phylink API built for split PCS
designs but also simplifying other cases.
- Intel Ethernet NICs:
- add TTY for GNSS module for E810T device
- improve AF_XDP performance
- GTP-C and GTP-U filter offload
- QinQ VLAN support
- Mellanox Ethernet NICs (mlx5):
- support xdp->data_meta
- multi-buffer XDP
- offload tc push_eth and pop_eth actions
- Netronome Ethernet NICs (nfp):
- flow-independent tc action hardware offload (police / meter)
- AF_XDP
- Other Ethernet NICs:
- at803x: fiber and SFP support
- xgmac: mdio: preamble suppression and custom MDC frequencies
- r8169: enable ASPM L1.2 if system vendor flags it as safe
- macb/gem: ZynqMP SGMII
- hns3: add TX push mode
- dpaa2-eth: software TSO
- lan743x: multi-queue, mdio, SGMII, PTP
- axienet: NAPI and GRO support
- Mellanox Ethernet switches (mlxsw):
- source and dest IP address rewrites
- RJ45 ports
- Marvell Ethernet switches (prestera):
- basic routing offload
- multi-chain TC ACL offload
- NXP embedded Ethernet switches (ocelot & felix):
- PTP over UDP with the ocelot-8021q DSA tagging protocol
- basic QoS classification on Felix DSA switch using dcbnl
- port mirroring for ocelot switches
- Microchip high-speed industrial Ethernet (sparx5):
- offloading of bridge port flooding flags
- PTP Hardware Clock
- Other embedded switches:
- lan966x: PTP Hardward Clock
- qca8k: mdio read/write operations via crafted Ethernet packets
- Qualcomm 802.11ax WiFi (ath11k):
- add LDPC FEC type and 802.11ax High Efficiency data in radiotap
- enable RX PPDU stats in monitor co-exist mode
- Intel WiFi (iwlwifi):
- UHB TAS enablement via BIOS
- band disablement via BIOS
- channel switch offload
- 32 Rx AMPDU sessions in newer devices
- MediaTek WiFi (mt76):
- background radar detection
- thermal management improvements on mt7915
- SAR support for more mt76 platforms
- MBSSID and 6 GHz band on mt7915
- RealTek WiFi:
- rtw89: AP mode
- rtw89: 160 MHz channels and 6 GHz band
- rtw89: hardware scan
- Bluetooth:
- mt7921s: wake on Bluetooth, SCO over I2S, wide-band-speed (WBS)
- Microchip CAN (mcp251xfd):
- multiple RX-FIFOs and runtime configurable RX/TX rings
- internal PLL, runtime PM handling simplification
- improve chip detection and error handling after wakeup"
* tag 'net-next-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2521 commits)
llc: fix netdevice reference leaks in llc_ui_bind()
drivers: ethernet: cpsw: fix panic when interrupt coaleceing is set via ethtool
ice: don't allow to run ice_send_event_to_aux() in atomic ctx
ice: fix 'scheduling while atomic' on aux critical err interrupt
net/sched: fix incorrect vlan_push_eth dest field
net: bridge: mst: Restrict info size queries to bridge ports
net: marvell: prestera: add missing destroy_workqueue() in prestera_module_init()
drivers: net: xgene: Fix regression in CRC stripping
net: geneve: add missing netlink policy and size for IFLA_GENEVE_INNER_PROTO_INHERIT
net: dsa: fix missing host-filtered multicast addresses
net/mlx5e: Fix build warning, detected write beyond size of field
iwlwifi: mvm: Don't fail if PPAG isn't supported
selftests/bpf: Fix kprobe_multi test.
Revert "rethook: x86: Add rethook x86 implementation"
Revert "arm64: rethook: Add arm64 rethook implementation"
Revert "powerpc: Add rethook support"
Revert "ARM: rethook: Add rethook arm implementation"
netdevice: add missing dm_private kdoc
net: bridge: mst: prevent NULL deref in br_mst_info_size()
selftests: forwarding: Use same VRF for port and VLAN upper
...
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmI473AUHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXPaMBAAuxb+RBG0Wqlt0ktUYHF0ZxDVJOTK
YGGmaDp657YJ349+c0U3mrhm7Wj8Mn7Eoz3tAYUWRQ5xPziJQRX7PfxFzT/qpPUz
XYLRppwCWpLSB5NdpNzK3RdGNv+/9BzZ6gmjTj2wfsUCOA8cfpB1pYwyIWm6M9B+
FXMTZ7WOqiuJ3wJa5nD1PPM1z+99nPkYiE6/iKsDidbQgSl8NX6mJY/yUsVxcZ6A
c45n0Pf6Fj9w1XKdVDPfiRY4nekmPCwqbrn7QVtiuCYyC54JcZNmuCQnoN8dy5XY
s/j2M2DBxT6M9rjOqQznL5jGdNKFCWydCAso06JO/13pfakvPpSS6v95Iltqkbtw
1oHf3j5URIirAhyqcyPGoQz+g5c6krgx/Z2GOpvDs9r/AQ80GlpOBYhN3x61lVT5
MLYq0ylV1Vfosnv7a6+AQZ9lJAkmIqws1WtG28adn7/zMPyD/hWwQ7736k/50CMl
oC6zi3G6jCZueWdHZviqf96bjW20ZmNL2DQRy0n8ZSQQGgrsQnFgYMpXtB1Zv8+m
XaDOPo20Ne68rzmTsEp2gVgcnXFc5/KQBDvaUta9etrbTEWqQqqTWiP8mA2QiGme
JwKMgprV0uVDd6s9TC/O0as02xoKrWuGaL7czhlFxuL45k0nYDmk7ea/gz9MrcWV
Y5pzAxs4LVMwVzs=
=5E1v
-----END PGP SIGNATURE-----
Merge tag 'selinux-pr-20220321' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull selinux updates from Paul Moore:
"We've got a number of SELinux patches queued up, the highlights are:
- Fixup the security_fs_context_parse_param() LSM hook so it executes
all of the LSM hook implementations unless a serious error occurs.
We also correct the SELinux hook implementation so that it returns
zero on success.
- In addition to a few SELinux mount option parsing fixes, we
simplified the parsing by moving it earlier in the process.
The logic was that it was unlikely an admin/user would use the new
mount API and not have the policy loaded before passing the SELinux
options.
- Properly fixed the LSM/SELinux/SCTP hooks with the addition of the
security_sctp_assoc_established() hook.
This work was done in conjunction with the netdev folks and should
complete the move of the SCTP labeling from the endpoints to the
associations.
- Fixed a variety of sparse warnings caused by changes in the "__rcu"
markings of some core kernel structures.
- Ensure we access the superblock's LSM security blob using the
stacking-safe accessors.
- Added the ability for the kernel to always allow FIOCLEX and
FIONCLEX if the "ioctl_skip_cloexec" policy capability is
specified.
- Various constifications improvements, type casting improvements,
additional return value checks, and dead code/parameter removal.
- Documentation fixes"
* tag 'selinux-pr-20220321' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: (23 commits)
selinux: shorten the policy capability enum names
docs: fix 'make htmldocs' warning in SCTP.rst
selinux: allow FIOCLEX and FIONCLEX with policy capability
selinux: use correct type for context length
selinux: drop return statement at end of void functions
security: implement sctp_assoc_established hook in selinux
security: add sctp_assoc_established hook
selinux: parse contexts for mount options early
selinux: various sparse fixes
selinux: try to use preparsed sid before calling parse_sid()
selinux: Fix selinux_sb_mnt_opts_compat()
LSM: general protection fault in legacy_parse_param
selinux: fix a type cast problem in cred_init_security()
selinux: drop unused macro
selinux: simplify cred_init_security
selinux: do not discard const qualifier in cast
selinux: drop unused parameter of avtab_insert_node
selinux: drop cast to same type
selinux: enclose macro arguments in parenthesis
selinux: declare name parameter of hash_eval const
...
Now that all of the definitions have moved out of tracehook.h into
ptrace.h, sched/signal.h, resume_user_mode.h there is nothing left in
tracehook.h so remove it.
Update the few files that were depending upon tracehook.h to bring in
definitions to use the headers they need directly.
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lkml.kernel.org/r/20220309162454.123006-13-ebiederm@xmission.com
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
The offloaded HW stats are designed to allow per-netdevice enablement and
disablement. These stats are only accessible through RTM_GETSTATS, and
therefore should be toggled by a RTM_SETSTATS message. Add it, and the
necessary skeleton handler.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The SELinux policy capability enum names are rather long and follow
the "POLICYDB_CAPABILITY_XXX format". While the "POLICYDB_" prefix
is helpful in tying the enums to other SELinux policy constants,
macros, etc. there is no reason why we need to spell out
"CAPABILITY" completely. Shorten "CAPABILITY" to "CAP" in order to
make things a bit shorter and cleaner.
Moving forward, the SELinux policy capability enum names should
follow the "POLICYDB_CAP_XXX" format.
Signed-off-by: Paul Moore <paul@paul-moore.com>
This patch adds new rtm tunnel msg and api for tunnel id
filtering in dst_metadata devices. First dst_metadata
device to use the api is vxlan driver with AF_BRIDGE
family.
This and later changes add ability in vxlan driver to do
tunnel id filtering (or vni filtering) on dst_metadata
devices. This is similar to vlan api in the vlan filtering bridge.
this patch includes selinux nlmsg_route_perms support for RTM_*TUNNEL
api from Benjamin Poirier.
Signed-off-by: Roopa Prabhu <roopa@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These ioctls are equivalent to fcntl(fd, F_SETFD, flags), which SELinux
always allows too. Furthermore, a failed FIOCLEX could result in a file
descriptor being leaked to a process that should not have access to it.
As this patch removes access controls, a policy capability needs to be
enabled in policy to always allow these ioctls.
Based-on-patch-by: Demi Marie Obenour <demiobenour@gmail.com>
Signed-off-by: Richard Haines <richard_c_haines@btinternet.com>
[PM: subject line tweak]
Signed-off-by: Paul Moore <paul@paul-moore.com>
mutex_is_locked() tests whether the mutex is locked *by any task*, while
here we want to test if it is held *by the current task*. To avoid
false/missed WARNINGs, use lockdep_assert_is_held() and
lockdep_assert_is_not_held() instead, which do the right thing (though
they are a no-op if CONFIG_LOCKDEP=n).
Cc: stable@vger.kernel.org
Fixes: 2554a48f44 ("selinux: measure state and policy capabilities")
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
security_sid_to_context() expects a pointer to an u32 as the address
where to store the length of the computed context.
Reported by sparse:
security/selinux/xfrm.c:359:39: warning: incorrect type in arg 4
(different signedness)
security/selinux/xfrm.c:359:39: expected unsigned int
[usertype] *scontext_len
security/selinux/xfrm.c:359:39: got int *
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
[PM: wrapped commit description]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Those return statements at the end of a void function are redundant.
Reported by clang-tidy [readability-redundant-control-flow]
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Do this by extracting the peer labeling per-association logic from
selinux_sctp_assoc_request() into a new helper
selinux_sctp_process_new_assoc() and use this helper in both
selinux_sctp_assoc_request() and selinux_sctp_assoc_established(). This
ensures that the peer labeling behavior as documented in
Documentation/security/SCTP.rst is applied both on the client and server
side:
"""
An SCTP socket will only have one peer label assigned to it. This will be
assigned during the establishment of the first association. Any further
associations on this socket will have their packet peer label compared to
the sockets peer label, and only if they are different will the
``association`` permission be validated. This is validated by checking the
socket peer sid against the received packets peer sid to determine whether
the association should be allowed or denied.
"""
At the same time, it also ensures that the peer label of the association
is set to the correct value, such that if it is peeled off into a new
socket, the socket's peer label will then be set to the association's
peer label, same as it already works on the server side.
While selinux_inet_conn_established() (which we are replacing by
selinux_sctp_assoc_established() for SCTP) only deals with assigning a
peer label to the connection (socket), in case of SCTP we need to also
copy the (local) socket label to the association, so that
selinux_sctp_sk_clone() can then pick it up for the new socket in case
of SCTP peeloff.
Careful readers will notice that the selinux_sctp_process_new_assoc()
helper also includes the "IPv4 packet received over an IPv6 socket"
check, even though it hadn't been in selinux_sctp_assoc_request()
before. While such check is not necessary in
selinux_inet_conn_request() (because struct request_sock's family field
is already set according to the skb's family), here it is needed, as we
don't have request_sock and we take the initial family from the socket.
In selinux_sctp_assoc_established() it is similarly needed as well (and
also selinux_inet_conn_established() already has it).
Fixes: 72e89f5008 ("security: Add support for SCTP security hooks")
Reported-by: Prashanth Prahlad <pprahlad@redhat.com>
Based-on-patch-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Tested-by: Richard Haines <richard_c_haines@btinternet.com>
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Commit b8b87fd954 ("selinux: Fix selinux_sb_mnt_opts_compat()")
started to parse mount options into SIDs in selinux_add_opt() if policy
has already been loaded. Since it's extremely unlikely that anyone would
depend on the ability to set SELinux contexts on fs_context before
loading the policy and then mounting that context after simplify the
logic by always parsing the options early.
Note that the multi-step mounting is only possible with the new
fscontext mount API and wasn't possible before its introduction.
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
On error path from cond_read_list() and duplicate_policydb_cond_list()
the cond_list_destroy() gets called a second time in caller functions,
resulting in NULL pointer deref. Fix this by resetting the
cond_list_len to 0 in cond_list_destroy(), making subsequent calls a
noop.
Also consistently reset the cond_list pointer to NULL after freeing.
Cc: stable@vger.kernel.org
Signed-off-by: Vratislav Bendel <vbendel@redhat.com>
[PM: fix line lengths in the description]
Signed-off-by: Paul Moore <paul@paul-moore.com>
When running the SELinux code through sparse, there are a handful of
warnings. This patch resolves some of these warnings caused by
"__rcu" mismatches.
% make W=1 C=1 security/selinux/
Signed-off-by: Paul Moore <paul@paul-moore.com>
Avoid unnecessary parsing of sids that have already been parsed via
selinux_sb_eat_lsm_opts().
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
selinux_sb_mnt_opts_compat() is called under the sb_lock spinlock and
shouldn't be performing any memory allocations. Fix this by parsing the
sids at the same time we're chopping up the security mount options
string and then using the pre-parsed sids when doing the comparison.
Fixes: cc274ae776 ("selinux: fix sleeping function called from invalid context")
Fixes: 69c4a42d72 ("lsm,selinux: add new hook to compare new mount to an existing mount")
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
The usual LSM hook "bail on fail" scheme doesn't work for cases where
a security module may return an error code indicating that it does not
recognize an input. In this particular case Smack sees a mount option
that it recognizes, and returns 0. A call to a BPF hook follows, which
returns -ENOPARAM, which confuses the caller because Smack has processed
its data.
The SELinux hook incorrectly returns 1 on success. There was a time
when this was correct, however the current expectation is that it
return 0 on success. This is repaired.
Reported-by: syzbot+d1e3b1d92d25abf97943@syzkaller.appspotmail.com
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Acked-by: James Morris <jamorris@linux.microsoft.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
In the process of removing an explicit type cast to preserve a cred
const qualifier in cred_init_security() we ran into a problem where
the task_struct::real_cred field is defined with the "__rcu"
attribute but the selinux_cred() function parameter is not, leading
to a sparse warning:
security/selinux/hooks.c:216:36: sparse: sparse:
incorrect type in argument 1 (different address spaces)
@@ expected struct cred const *cred
@@ got struct cred const [noderef] __rcu *real_cred
As we don't want to add the "__rcu" attribute to the selinux_cred()
parameter, we're going to add an explicit cast back to
cred_init_security().
Fixes: b084e189b0 ("selinux: simplify cred_init_security")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
The macro _DEBUG_HASHES is nowhere used. The configuration DEBUG_HASHES
enables debugging of the SELinux hash tables, but the with an underscore
prefixed macro definition has no direct impact or any documentation.
Reported by clang [-Wunused-macros]
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
The parameter of selinux_cred() is declared const, so an explicit cast
dropping the const qualifier is not necessary. Without the cast the
local variable cred serves no purpose.
Reported by clang [-Wcast-qual]
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Do not discard the const qualifier on the cast from const void* to
__be32*; the addressed value is not modified.
Reported by clang [-Wcast-qual]
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
The parameter cur is not used in avtab_insert_node().
Reported by clang [-Wunused-parameter]
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Both the lvalue scontextp and rvalue scontext are of the type char*.
Drop the redundant explicit cast not needed since commit 9a59daa03d
("SELinux: fix sleeping allocation in security_context_to_sid"), where
the type of scontext changed from const char* to char*.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Enclose the macro arguments in parenthesis to avoid potential evaluation
order issues.
Note the xperm and ebitmap macros are still not side-effect safe due to
double evaluation.
Reported by clang-tidy [bugprone-macro-parentheses]
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
String literals are passed as second argument to hash_eval(). Also the
parameter is already declared const in the DEBUG_HASHES configuration.
Reported by clang [-Wwrite-strings]:
security/selinux/ss/policydb.c:1881:26: error: passing
'const char [8]' to parameter of type 'char *' discards
qualifiers
hash_eval(&p->range_tr, rangetr);
^~~~~~~~~
security/selinux/ss/policydb.c:707:55: note: passing argument to
parameter 'hash_name' here
static inline void hash_eval(struct hashtab *h, char *hash_name)
^
security/selinux/ss/policydb.c:2099:32: error: passing
'const char [11]' to parameter of type 'char *' discards
qualifiers
hash_eval(&p->filename_trans, filenametr);
^~~~~~~~~~~~
security/selinux/ss/policydb.c:707:55: note: passing argument to
parameter 'hash_name' here
static inline void hash_eval(struct hashtab *h, char *hash_name)
^
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
[PM: line wrapping in description]
Signed-off-by: Paul Moore <paul@paul-moore.com>
The path parameter is only read from in security_genfs_sid(),
selinux_policy_genfs_sid() and __security_genfs_sid(). Since a string
literal is passed as argument, declare the parameter const.
Also align the parameter names in the declaration and definition.
Reported by clang [-Wwrite-strings]:
security/selinux/hooks.c:553:60: error: passing 'const char [2]'
to parameter of type 'char *' discards qualifiers
[-Wincompatible-pointer-types-discards-qualifiers]
rc = security_genfs_sid(&selinux_state, ... , /,
^~~
./security/selinux/include/security.h:389:36: note: passing
argument to parameter 'name' here
const char *fstype, char *name, u16 sclass,
^
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
[PM: wrapped description]
Signed-off-by: Paul Moore <paul@paul-moore.com>
sel_make_avc_files() might fail and return a negative errno value on
memory allocation failures. Re-add the check of the return value,
dropped in 66f8e2f03c ("selinux: sidtab reverse lookup hash table").
Reported by clang-analyzer:
security/selinux/selinuxfs.c:2129:2: warning: Value stored to
'ret' is never read [deadcode.DeadStores]
ret = sel_make_avc_files(dentry);
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~
Fixes: 66f8e2f03c ("selinux: sidtab reverse lookup hash table")
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
[PM: description line wrapping, added proper commit ref]
Signed-off-by: Paul Moore <paul@paul-moore.com>
LSM blob has been involved for superblock's security struct. So fix the
remaining direct access to sb->s_security by using the LSM blob
mechanism.
Fixes: 08abe46b2c ("selinux: fall back to SECURITY_FS_USE_GENFS if no xattr support")
Fixes: 69c4a42d72 ("lsm,selinux: add new hook to compare new mount to an existing mount")
Signed-off-by: GONG, Ruiqi <gongruiqi1@huawei.com>
Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmHcekAUHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXNMAQ//TlHhQPEkPYZwD0EPXQIkl5IxGZfR
DSg24OMzrh+x/jKVYhMXuW6BlnF10wd2ocG0DNAt9weF22PLrbDsxbzAAnv+MLXM
wCzWLmgKpPpbaG57oa17LMcswDjVr8YbxXPOvyZJ/G3YsWDarH8Ezot8iNlVUkOh
oqjjXTy0dyLUB/FsW7sIGWa3O18SkbI4pDQxL2vcpdDbvPtAY94kNG3bKTONK/kn
OxLrjKscTtu6EuSdFhMqwcUxpfqvqUDrEfiSdNruMlT0/DixHgFlJA8WHXjn7Wpq
XTbqdFrClfpmIiTrPSSszLrZAceegTdefDAf5wgfTxmcfYKiPDF9kPu5UPX1rAgO
hSoXvGvPjez+RzyXmv9S292lkhV4tz/Wg0YlxZ0LUWif/CSZEyeGGoFZs8080Ukl
1C/oNrByriaGGRLVwKNGOg5x/UkP1ipnorNnAiiIhw+xkzfXm12RdisUyUgLh9+w
WsfsC9BJkXMpuvfkgya5PJ677wVNou4ZtJX+2MV8CfGEKTAJp/X/HKh3tWkQFYKx
35QLVO1LD6gJZlSZZTsjZDxUyPwSd8e55GSFn1qKIHuC2jKjSkwCE7JfErIrI/W0
js6HKO2ak7oMTWjxRizANyKI/yva/DeMl+mKCC4QV0xmwlm5JsOe7mYSCNGws7AV
A2qYX7S1xKbLTGI=
=8H+p
-----END PGP SIGNATURE-----
Merge tag 'selinux-pr-20220110' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull selinux updates from Paul Moore:
"Nothing too significant, but five SELinux patches for v5.17 that do
the following:
- Harden the code through additional use of the struct_size() macro
- Plug some memory leaks
- Clean up the code via removal of the security_add_mnt_opt() LSM
hook and minor tweaks to selinux_add_opt()
- Rename security_task_getsecid_subj() to better reflect its actual
behavior/use - now called security_current_getsecid_subj()"
* tag 'selinux-pr-20220110' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: minor tweaks to selinux_add_opt()
selinux: fix potential memleak in selinux_add_opt()
security,selinux: remove security_add_mnt_opt()
selinux: Use struct_size() helper in kmalloc()
lsm: security_task_getsecid_subj() -> security_current_getsecid_subj()
Clang static analysis reports this warning
hooks.c:5765:6: warning: 4th function call argument is an uninitialized
value
if (selinux_xfrm_postroute_last(sksec->sid, skb, &ad, proto))
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
selinux_parse_skb() can return ok without setting proto. The later call
to selinux_xfrm_postroute_last() does an early check of proto and can
return ok if the garbage proto value matches. So initialize proto.
Cc: stable@vger.kernel.org
Fixes: eef9b41622 ("selinux: cleanup selinux_xfrm_sock_rcv_skb() and selinux_xfrm_postroute_last()")
Signed-off-by: Tom Rix <trix@redhat.com>
[PM: typo/spelling and checkpatch.pl description fixes]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Two minor edits to selinux_add_opt(): use "sizeof(*ptr)" instead of
"sizeof(type)" in the kzalloc() call, and rename the "Einval" jump
target to "err" for the sake of consistency.
Signed-off-by: Paul Moore <paul@paul-moore.com>
This patch try to fix potential memleak in error branch.
Fixes: ba64186233 ("selinux: new helper - selinux_add_opt()")
Signed-off-by: Bernard Zhao <bernard@vivo.com>
[PM: tweak the subject line, add Fixes tag]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Its last user has been removed in commit f2aedb713c ("NFS: Add
fs_context support.").
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Make use of struct_size() helper instead of an open-coded calculation.
Link: https://github.com/KSPP/linux/issues/160
Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
The security_task_getsecid_subj() LSM hook invites misuse by allowing
callers to specify a task even though the hook is only safe when the
current task is referenced. Fix this by removing the task_struct
argument to the hook, requiring LSM implementations to use the
current task. While we are changing the hook declaration we also
rename the function to security_current_getsecid_subj() in an effort
to reinforce that the hook captures the subjective credentials of the
current task and not an arbitrary task on the system.
Reviewed-by: Serge Hallyn <serge@hallyn.com>
Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
When the hash table slot array allocation fails in hashtab_init(),
h->size is left initialized with a non-zero value, but the h->htable
pointer is NULL. This may then cause a NULL pointer dereference, since
the policydb code relies on the assumption that even after a failed
hashtab_init(), hashtab_map() and hashtab_destroy() can be safely called
on it. Yet, these detect an empty hashtab only by looking at the size.
Fix this by making sure that hashtab_init() always leaves behind a valid
empty hashtab when the allocation fails.
Cc: stable@vger.kernel.org
Fixes: 03414a49ad ("selinux: do not allocate hashtabs dynamically")
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
This patch reverts two prior patches, e7310c9402
("security: implement sctp_assoc_established hook in selinux") and
7c2ef0240e ("security: add sctp_assoc_established hook"), which
create the security_sctp_assoc_established() LSM hook and provide a
SELinux implementation. Unfortunately these two patches were merged
without proper review (the Reviewed-by and Tested-by tags from
Richard Haines were for previous revisions of these patches that
were significantly different) and there are outstanding objections
from the SELinux maintainers regarding these patches.
Work is currently ongoing to correct the problems identified in the
reverted patches, as well as others that have come up during review,
but it is unclear at this point in time when that work will be ready
for inclusion in the mainline kernel. In the interest of not keeping
objectionable code in the kernel for multiple weeks, and potentially
a kernel release, we are reverting the two problematic patches.
Signed-off-by: Paul Moore <paul@paul-moore.com>
Different from selinux_inet_conn_established(), it also gives the
secid to asoc->peer_secid in selinux_sctp_assoc_established(),
as one UDP-type socket may have more than one asocs.
Note that peer_secid in asoc will save the peer secid for this
asoc connection, and peer_sid in sksec will just keep the peer
secid for the latest connection. So the right use should be do
peeloff for UDP-type socket if there will be multiple asocs in
one socket, so that the peeloff socket has the right label for
its asoc.
v1->v2:
- call selinux_inet_conn_established() to reduce some code
duplication in selinux_sctp_assoc_established(), as Ondrej
suggested.
- when doing peeloff, it calls sock_create() where it actually
gets secid for socket from socket_sockcreate_sid(). So reuse
SECSID_WILD to ensure the peeloff socket keeps using that
secid after calling selinux_sctp_sk_clone() for client side.
Fixes: 72e89f5008 ("security: Add support for SCTP security hooks")
Reported-by: Prashanth Prahlad <pprahlad@redhat.com>
Reviewed-by: Richard Haines <richard_c_haines@btinternet.com>
Tested-by: Richard Haines <richard_c_haines@btinternet.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is to move secid and peer_secid from endpoint to association,
and pass asoc to sctp_assoc_request and sctp_sk_clone instead of ep. As
ep is the local endpoint and asoc represents a connection, and in SCTP
one sk/ep could have multiple asoc/connection, saving secid/peer_secid
for new asoc will overwrite the old asoc's.
Note that since asoc can be passed as NULL, security_sctp_assoc_request()
is moved to the place right after the new_asoc is created in
sctp_sf_do_5_1B_init() and sctp_sf_do_unexpected_init().
v1->v2:
- fix the description of selinux_netlbl_skbuff_setsid(), as Jakub noticed.
- fix the annotation in selinux_sctp_assoc_request(), as Richard Noticed.
Fixes: 72e89f5008 ("security: Add support for SCTP security hooks")
Reported-by: Prashanth Prahlad <pprahlad@redhat.com>
Reviewed-by: Richard Haines <richard_c_haines@btinternet.com>
Tested-by: Richard Haines <richard_c_haines@btinternet.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmGANbAUHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXNaMBAAg+9gZr0F7xiafu8JFZqZfx/AQdJ2
G2cn3le+/tXGZmF8m/+82lOaR6LeQLatgSDJNSkXWkKr0nRwseQJDbtRfvYJdn0t
Ax05/Fmz6OGxQ2wgRYgaFiSrKpE5p3NhDtiLFVdkCJaQNe/8DZOc7NhBl6EjZf3x
ubhl2hUiJ4AmiXGwcYhr4uKgP4nhW8OM1/OkskVi+bBMmLA8KTY9kslmIDP5E3BW
29W4qhqeLNQupY5dGMEMVcyxY9ZUWpO39q4uOaQVZrUGE7xABkj/jhnxT5gFTSlI
pu8VhsYXm9KuRVveIsv0L5SZfadwoM9YAl7ki1wD3W5rHqOAte3rBTm6VmNlQwfU
MqxP65Jiyxudxet5Be3/dCRH/+MDQuwBxivgmZXbeVxor2SeznVb0GDaEUC5FSHu
CJIgWtQzsPJMxgAEGXN4F3QGP0htTTJni56GUPOsrf4TIBW02TT+oLTLFRIokQQL
INNOfwVSRXElnCsvxsHR4oB+JZ9pJyBaAmeupcQ6jmcKiWlbLj4s+W0U0pM5h91v
hmMpz7KMxrX6gVL4gB2Jj4aN3r5YRbq26NBu6D+wdwwBTeTTocaHSpAqkv4buClf
uNk3cG8Hkp8TTg9cM8jYgpxMyzKH/AI/Uw3VhEa1xCiq2Ck3DgfnZvnvcRRaZevU
FPgmwgqePJXGi60=
=sb8J
-----END PGP SIGNATURE-----
Merge tag 'selinux-pr-20211101' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull selinux updates from Paul Moore:
- Add LSM/SELinux/Smack controls and auditing for io-uring.
As usual, the individual commit descriptions have more detail, but we
were basically missing two things which we're adding here:
+ establishment of a proper audit context so that auditing of
io-uring ops works similarly to how it does for syscalls (with
some io-uring additions because io-uring ops are *not* syscalls)
+ additional LSM hooks to enable access control points for some of
the more unusual io-uring features, e.g. credential overrides.
The additional audit callouts and LSM hooks were done in conjunction
with the io-uring folks, based on conversations and RFC patches
earlier in the year.
- Fixup the binder credential handling so that the proper credentials
are used in the LSM hooks; the commit description and the code
comment which is removed in these patches are helpful to understand
the background and why this is the proper fix.
- Enable SELinux genfscon policy support for securityfs, allowing
improved SELinux filesystem labeling for other subsystems which make
use of securityfs, e.g. IMA.
* tag 'selinux-pr-20211101' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
security: Return xattr name from security_dentry_init_security()
selinux: fix a sock regression in selinux_ip_postroute_compat()
binder: use cred instead of task for getsecid
binder: use cred instead of task for selinux checks
binder: use euid from cred instead of using task
LSM: Avoid warnings about potentially unused hook variables
selinux: fix all of the W=1 build warnings
selinux: make better use of the nf_hook_state passed to the NF hooks
selinux: fix race condition when computing ocontext SIDs
selinux: remove unneeded ipv6 hook wrappers
selinux: remove the SELinux lockdown implementation
selinux: enable genfscon labeling for securityfs
Smack: Brutalist io_uring support
selinux: add support for the io_uring access controls
lsm,io_uring: add LSM hooks to io_uring
io_uring: convert io_uring to the secure anon inode interface
fs: add anon_inode_getfile_secure() similar to anon_inode_getfd_secure()
audit: add filtering for io_uring records
audit,io_uring,io-wq: add some basic audit support to io_uring
audit: prepare audit_context for use in calling contexts beyond syscalls
Right now security_dentry_init_security() only supports single security
label and is used by SELinux only. There are two users of this hook,
namely ceph and nfs.
NFS does not care about xattr name. Ceph hardcodes the xattr name to
security.selinux (XATTR_NAME_SELINUX).
I am making changes to fuse/virtiofs to send security label to virtiofsd
and I need to send xattr name as well. I also hardcoded the name of
xattr to security.selinux.
Stephen Smalley suggested that it probably is a good idea to modify
security_dentry_init_security() to also return name of xattr so that
we can avoid this hardcoding in the callers.
This patch adds a new parameter "const char **xattr_name" to
security_dentry_init_security() and LSM puts the name of xattr
too if caller asked for it (xattr_name != NULL).
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Christian Brauner <christian.brauner@ubuntu.com>
Acked-by: James Morris <jamorris@linux.microsoft.com>
[PM: fixed typos in the commit description]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Unfortunately we can't rely on nf_hook_state->sk being the proper
originating socket so revert to using skb_to_full_sk(skb).
Fixes: 1d1e1ded13 ("selinux: make better use of the nf_hook_state passed to the NF hooks")
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Suggested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Since binder was integrated with selinux, it has passed
'struct task_struct' associated with the binder_proc
to represent the source and target of transactions.
The conversion of task to SID was then done in the hook
implementations. It turns out that there are race conditions
which can result in an incorrect security context being used.
Fix by using the 'struct cred' saved during binder_open and pass
it to the selinux subsystem.
Cc: stable@vger.kernel.org # 5.14 (need backport for earlier stables)
Fixes: 79af73079d ("Add security hooks to binder and implement the hooks for SELinux.")
Suggested-by: Jann Horn <jannh@google.com>
Signed-off-by: Todd Kjos <tkjos@google.com>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
There were a number of places in the code where the function
definition did not match the associated comment block as well
at least one file where the appropriate header files were not
included (missing function declaration/prototype); this patch
fixes all of these issue such that building the SELinux code
with "W=1" is now warning free.
% make W=1 security/selinux/
Signed-off-by: Paul Moore <paul@paul-moore.com>
This patch builds on a previous SELinux/netfilter patch by Florian
Westphal and makes better use of the nf_hook_state variable passed
into the SELinux/netfilter hooks as well as a number of other small
cleanups in the related code.
Signed-off-by: Paul Moore <paul@paul-moore.com>
Current code contains a lot of racy patterns when converting an
ocontext's context structure to an SID. This is being done in a "lazy"
fashion, such that the SID is looked up in the SID table only when it's
first needed and then cached in the "sid" field of the ocontext
structure. However, this is done without any locking or memory barriers
and is thus unsafe.
Between commits 24ed7fdae6 ("selinux: use separate table for initial
SID lookup") and 66f8e2f03c ("selinux: sidtab reverse lookup hash
table"), this race condition lead to an actual observable bug, because a
pointer to the shared sid field was passed directly to
sidtab_context_to_sid(), which was using this location to also store an
intermediate value, which could have been read by other threads and
interpreted as an SID. In practice this caused e.g. new mounts to get a
wrong (seemingly random) filesystem context, leading to strange denials.
This bug has been spotted in the wild at least twice, see [1] and [2].
Fix the race condition by making all the racy functions use a common
helper that ensures the ocontext::sid accesses are made safely using the
appropriate SMP constructs.
Note that security_netif_sid() was populating the sid field of both
contexts stored in the ocontext, but only the first one was actually
used. The SELinux wiki's documentation on the "netifcon" policy
statement [3] suggests that using only the first context is intentional.
I kept only the handling of the first context here, as there is really
no point in doing the SID lookup for the unused one.
I wasn't able to reproduce the bug mentioned above on any kernel that
includes commit 66f8e2f03c, even though it has been reported that the
issue occurs with that commit, too, just less frequently. Thus, I wasn't
able to verify that this patch fixes the issue, but it makes sense to
avoid the race condition regardless.
[1] https://github.com/containers/container-selinux/issues/89
[2] https://lists.fedoraproject.org/archives/list/selinux@lists.fedoraproject.org/thread/6DMTAMHIOAOEMUAVTULJD45JZU7IBAFM/
[3] https://selinuxproject.org/page/NetworkStatements#netifcon
Cc: stable@vger.kernel.org
Cc: Xinjie Zheng <xinjie@google.com>
Reported-by: Sujithra Periasamy <sujithra@google.com>
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Netfilter places the protocol number the hook function is getting called
from in state->pf, so we can use that instead of an extra wrapper.
While at it, remove one-line wrappers too and make
selinux_ip_{out,forward,postroute} useable as hook function.
Signed-off-by: Florian Westphal <fw@strlen.de>
Message-Id: <20211011202229.28289-1-fw@strlen.de>
Signed-off-by: Paul Moore <paul@paul-moore.com>
ipsec
Steffen Klassert says:
====================
pull request (net): ipsec 2021-10-07
1) Fix a sysbot reported shift-out-of-bounds in xfrm_get_default.
From Pavel Skripkin.
2) Fix XFRM_MSG_MAPPING ABI breakage. The new XFRM_MSG_MAPPING
messages were accidentally not paced at the end.
Fix by Eugene Syromiatnikov.
3) Fix the uapi for the default policy, use explicit field and macros
and make it accessible to userland.
From Nicolas Dichtel.
4) Fix a missing rcu lock in xfrm_notify_userpolicy().
From Nicolas Dichtel.
Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
NOTE: This patch intentionally omits any "Fixes:" metadata or stable
tagging since it removes a SELinux access control check; while
removing the control point is the right thing to do moving forward,
removing it in stable kernels could be seen as a regression.
The original SELinux lockdown implementation in 59438b4647
("security,lockdown,selinux: implement SELinux lockdown") used the
current task's credentials as both the subject and object in the
SELinux lockdown hook, selinux_lockdown(). Unfortunately that
proved to be incorrect in a number of cases as the core kernel was
calling the LSM lockdown hook in places where the credentials from
the "current" task_struct were not the correct credentials to use
in the SELinux access check.
Attempts were made to resolve this by adding a credential pointer
to the LSM lockdown hook as well as suggesting that the single hook
be split into two: one for user tasks, one for kernel tasks; however
neither approach was deemed acceptable by Linus. Faced with the
prospect of either changing the subj/obj in the access check to a
constant context (likely the kernel's label) or removing the SELinux
lockdown check entirely, the SELinux community decided that removing
the lockdown check was preferable.
The supporting changes to the general LSM layer are left intact, this
patch only removes the SELinux implementation.
Acked-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Add support for genfscon per-file labeling of securityfs files.
This allows for separate labels and thereby access control for
different files. For example a genfscon statement
genfscon securityfs /integrity/ima/policy \
system_u:object_r:ima_policy_t:s0
will set a private label to the IMA policy file and thus allow to
control the ability to set the IMA policy. Setting labels directly
with setxattr(2), e.g. by chcon(1) or setfiles(8), is still not
supported.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
[PM: line width fixes in the commit description]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Jann Horn reported a problem with commit eb1231f73c ("selinux:
clarify task subjective and objective credentials") where some LSM
hooks were attempting to access the subjective credentials of a task
other than the current task. Generally speaking, it is not safe to
access another task's subjective credentials and doing so can cause
a number of problems.
Further, while looking into the problem, I realized that Smack was
suffering from a similar problem brought about by a similar commit
1fb057dcde ("smack: differentiate between subjective and objective
task credentials").
This patch addresses this problem by restoring the use of the task's
objective credentials in those cases where the task is other than the
current executing task. Not only does this resolve the problem
reported by Jann, it is arguably the correct thing to do in these
cases.
Cc: stable@vger.kernel.org
Fixes: eb1231f73c ("selinux: clarify task subjective and objective credentials")
Fixes: 1fb057dcde ("smack: differentiate between subjective and objective task credentials")
Reported-by: Jann Horn <jannh@google.com>
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
This patch implements two new io_uring access controls, specifically
support for controlling the io_uring "personalities" and
IORING_SETUP_SQPOLL. Controlling the sharing of io_urings themselves
is handled via the normal file/inode labeling and sharing mechanisms.
The io_uring { override_creds } permission restricts which domains
the subject domain can use to override it's own credentials.
Granting a domain the io_uring { override_creds } permission allows
it to impersonate another domain in io_uring operations.
The io_uring { sqpoll } permission restricts which domains can create
asynchronous io_uring polling threads. This is important from a
security perspective as operations queued by this asynchronous thread
inherit the credentials of the thread creator by default; if an
io_uring is shared across process/domain boundaries this could result
in one domain impersonating another. Controlling the creation of
sqpoll threads, and the sharing of io_urings across processes, allow
policy authors to restrict the ability of one domain to impersonate
another via io_uring.
As a quick summary, this patch adds a new object class with two
permissions:
io_uring { override_creds sqpoll }
These permissions can be seen in the two simple policy statements
below:
allow domA_t domB_t : io_uring { override_creds };
allow domA_t self : io_uring { sqpoll };
Signed-off-by: Paul Moore <paul@paul-moore.com>
Commit 2d151d3907 ("xfrm: Add possibility to set the default to block
if we have no policy") broke ABI by changing the value of the XFRM_MSG_MAPPING
enum item, thus also evading the build-time check
in security/selinux/nlmsgtab.c:selinux_nlmsg_lookup for presence of proper
security permission checks in nlmsg_xfrm_perms. Fix it by placing
XFRM_MSG_SETDEFAULT/XFRM_MSG_GETDEFAULT to the end of the enum, right before
__XFRM_MSG_MAX, and updating the nlmsg_xfrm_perms accordingly.
Fixes: 2d151d3907 ("xfrm: Add possibility to set the default to block if we have no policy")
References: https://lore.kernel.org/netdev/20210901151402.GA2557@altlinux.org/
Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com>
Acked-by: Antony Antony <antony.antony@secunet.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQQdXVVFGN5XqKr1Hj7LwZzRsCrn5QUCYS4c6hQcem9oYXJAbGlu
dXguaWJtLmNvbQAKCRDLwZzRsCrn5b4OAP9l7cnpkOzVUtjoNIIYdIiKTDp+Kb8v
3o08lxtyzALfKgEAlrizzLfphqLa2yCdxbyaTjkx19J7tav27xVti8uVGgs=
=hIxY
-----END PGP SIGNATURE-----
Merge tag 'integrity-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity
Pull integrity subsystem updates from Mimi Zohar:
- Limit the allowed hash algorithms when writing security.ima xattrs or
verifying them, based on the IMA policy and the configured hash
algorithms.
- Return the calculated "critical data" measurement hash and size to
avoid code duplication. (Preparatory change for a proposed LSM.)
- and a single patch to address a compiler warning.
* tag 'integrity-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
IMA: reject unknown hash algorithms in ima_get_hash_algo
IMA: prevent SETXATTR_CHECK policy rules with unavailable algorithms
IMA: introduce a new policy option func=SETXATTR_CHECK
IMA: add a policy option to restrict xattr hash algorithms on appraisal
IMA: add support to restrict the hash algorithms used for file appraisal
IMA: block writes of the security.ima xattr with unsupported algorithms
IMA: remove the dependency on CRYPTO_MD5
ima: Add digest and digest_len params to the functions to measure a buffer
ima: Return int in the functions to measure a buffer
ima: Introduce ima_get_current_hash_algo()
IMA: remove -Wmissing-prototypes warning
- Enable memcg accounting for various networking objects.
BPF:
- Introduce bpf timers.
- Add perf link and opaque bpf_cookie which the program can read
out again, to be used in libbpf-based USDT library.
- Add bpf_task_pt_regs() helper to access user space pt_regs
in kprobes, to help user space stack unwinding.
- Add support for UNIX sockets for BPF sockmap.
- Extend BPF iterator support for UNIX domain sockets.
- Allow BPF TCP congestion control progs and bpf iterators to call
bpf_setsockopt(), e.g. to switch to another congestion control
algorithm.
Protocols:
- Support IOAM Pre-allocated Trace with IPv6.
- Support Management Component Transport Protocol.
- bridge: multicast: add vlan support.
- netfilter: add hooks for the SRv6 lightweight tunnel driver.
- tcp:
- enable mid-stream window clamping (by user space or BPF)
- allow data-less, empty-cookie SYN with TFO_SERVER_COOKIE_NOT_REQD
- more accurate DSACK processing for RACK-TLP
- mptcp:
- add full mesh path manager option
- add partial support for MP_FAIL
- improve use of backup subflows
- optimize option processing
- af_unix: add OOB notification support.
- ipv6: add IFLA_INET6_RA_MTU to expose MTU value advertised by
the router.
- mac80211: Target Wake Time support in AP mode.
- can: j1939: extend UAPI to notify about RX status.
Driver APIs:
- Add page frag support in page pool API.
- Many improvements to the DSA (distributed switch) APIs.
- ethtool: extend IRQ coalesce uAPI with timer reset modes.
- devlink: control which auxiliary devices are created.
- Support CAN PHYs via the generic PHY subsystem.
- Proper cross-chip support for tag_8021q.
- Allow TX forwarding for the software bridge data path to be
offloaded to capable devices.
Drivers:
- veth: more flexible channels number configuration.
- openvswitch: introduce per-cpu upcall dispatch.
- Add internet mix (IMIX) mode to pktgen.
- Transparently handle XDP operations in the bonding driver.
- Add LiteETH network driver.
- Renesas (ravb):
- support Gigabit Ethernet IP
- NXP Ethernet switch (sja1105)
- fast aging support
- support for "H" switch topologies
- traffic termination for ports under VLAN-aware bridge
- Intel 1G Ethernet
- support getcrosststamp() with PCIe PTM (Precision Time
Measurement) for better time sync
- support Credit-Based Shaper (CBS) offload, enabling HW traffic
prioritization and bandwidth reservation
- Broadcom Ethernet (bnxt)
- support pulse-per-second output
- support larger Rx rings
- Mellanox Ethernet (mlx5)
- support ethtool RSS contexts and MQPRIO channel mode
- support LAG offload with bridging
- support devlink rate limit API
- support packet sampling on tunnels
- Huawei Ethernet (hns3):
- basic devlink support
- add extended IRQ coalescing support
- report extended link state
- Netronome Ethernet (nfp):
- add conntrack offload support
- Broadcom WiFi (brcmfmac):
- add WPA3 Personal with FT to supported cipher suites
- support 43752 SDIO device
- Intel WiFi (iwlwifi):
- support scanning hidden 6GHz networks
- support for a new hardware family (Bz)
- Xen pv driver:
- harden netfront against malicious backends
- Qualcomm mobile
- ipa: refactor power management and enable automatic suspend
- mhi: move MBIM to WWAN subsystem interfaces
Refactor:
- Ambient BPF run context and cgroup storage cleanup.
- Compat rework for ndo_ioctl.
Old code removal:
- prism54 remove the obsoleted driver, deprecated by the p54 driver.
- wan: remove sbni/granch driver.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmEukBYACgkQMUZtbf5S
IrsyHA//TO8dw18NYts4n9LmlJT2naJ7yBUUSSXK/M+DtW0MQ9nnHhqzPm5uJdRl
IgQTNJrW3dYzRwgqaWZqEwO1t5/FI+f87ND1Nsekg7x9tF66a6ov5WxU26TwwSba
U+si/inQ/4chuQ+LxMQobqCDxaLE46I2dIoRl+YfndJ24DRzYSwAEYIPPbSdfyU+
+/l+3s4GaxO4k/hLciPAiOniyxLoUNiGUTNh+2yqRBXelSRJRKVnl+V22ANFrxRW
nTEiplfVKhlPU1e4iLuRtaxDDiePHhw9I3j/lMHhfeFU2P/gKJIvz4QpGV0CAZg2
1VvDU32WEx1GQLXJbKm0KwoNRUq1QSjOyyFti+BO7ugGaYAR4gKhShOqlSYLzUtB
tbtzQhSNLWOGqgmSJOztZb5kFDm2EdRSll5/lP2uyFlPkIsIp0QbscJVzNTnS74b
Xz15ZOw41Z4TfWPEMWgfrx6Zkm7pPWkly+7WfUkPcHa1gftNz6tzXXxSXcXIBPdi
yQ5JCzzxrM5573YHuk5YedwZpn6PiAt4A/muFGk9C6aXP60TQAOS/ppaUzZdnk4D
NfOk9mj06WEULjYjPcKEuT3GGWE6kmjb8Pu0QZWKOchv7vr6oZly1EkVZqYlXELP
AfhcrFeuufie8mqm0jdb4LnYaAnqyLzlb1J4Zxh9F+/IX7G3yoc=
=JDGD
-----END PGP SIGNATURE-----
Merge tag 'net-next-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core:
- Enable memcg accounting for various networking objects.
BPF:
- Introduce bpf timers.
- Add perf link and opaque bpf_cookie which the program can read out
again, to be used in libbpf-based USDT library.
- Add bpf_task_pt_regs() helper to access user space pt_regs in
kprobes, to help user space stack unwinding.
- Add support for UNIX sockets for BPF sockmap.
- Extend BPF iterator support for UNIX domain sockets.
- Allow BPF TCP congestion control progs and bpf iterators to call
bpf_setsockopt(), e.g. to switch to another congestion control
algorithm.
Protocols:
- Support IOAM Pre-allocated Trace with IPv6.
- Support Management Component Transport Protocol.
- bridge: multicast: add vlan support.
- netfilter: add hooks for the SRv6 lightweight tunnel driver.
- tcp:
- enable mid-stream window clamping (by user space or BPF)
- allow data-less, empty-cookie SYN with TFO_SERVER_COOKIE_NOT_REQD
- more accurate DSACK processing for RACK-TLP
- mptcp:
- add full mesh path manager option
- add partial support for MP_FAIL
- improve use of backup subflows
- optimize option processing
- af_unix: add OOB notification support.
- ipv6: add IFLA_INET6_RA_MTU to expose MTU value advertised by the
router.
- mac80211: Target Wake Time support in AP mode.
- can: j1939: extend UAPI to notify about RX status.
Driver APIs:
- Add page frag support in page pool API.
- Many improvements to the DSA (distributed switch) APIs.
- ethtool: extend IRQ coalesce uAPI with timer reset modes.
- devlink: control which auxiliary devices are created.
- Support CAN PHYs via the generic PHY subsystem.
- Proper cross-chip support for tag_8021q.
- Allow TX forwarding for the software bridge data path to be
offloaded to capable devices.
Drivers:
- veth: more flexible channels number configuration.
- openvswitch: introduce per-cpu upcall dispatch.
- Add internet mix (IMIX) mode to pktgen.
- Transparently handle XDP operations in the bonding driver.
- Add LiteETH network driver.
- Renesas (ravb):
- support Gigabit Ethernet IP
- NXP Ethernet switch (sja1105):
- fast aging support
- support for "H" switch topologies
- traffic termination for ports under VLAN-aware bridge
- Intel 1G Ethernet
- support getcrosststamp() with PCIe PTM (Precision Time
Measurement) for better time sync
- support Credit-Based Shaper (CBS) offload, enabling HW traffic
prioritization and bandwidth reservation
- Broadcom Ethernet (bnxt)
- support pulse-per-second output
- support larger Rx rings
- Mellanox Ethernet (mlx5)
- support ethtool RSS contexts and MQPRIO channel mode
- support LAG offload with bridging
- support devlink rate limit API
- support packet sampling on tunnels
- Huawei Ethernet (hns3):
- basic devlink support
- add extended IRQ coalescing support
- report extended link state
- Netronome Ethernet (nfp):
- add conntrack offload support
- Broadcom WiFi (brcmfmac):
- add WPA3 Personal with FT to supported cipher suites
- support 43752 SDIO device
- Intel WiFi (iwlwifi):
- support scanning hidden 6GHz networks
- support for a new hardware family (Bz)
- Xen pv driver:
- harden netfront against malicious backends
- Qualcomm mobile
- ipa: refactor power management and enable automatic suspend
- mhi: move MBIM to WWAN subsystem interfaces
Refactor:
- Ambient BPF run context and cgroup storage cleanup.
- Compat rework for ndo_ioctl.
Old code removal:
- prism54 remove the obsoleted driver, deprecated by the p54 driver.
- wan: remove sbni/granch driver"
* tag 'net-next-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1715 commits)
net: Add depends on OF_NET for LiteX's LiteETH
ipv6: seg6: remove duplicated include
net: hns3: remove unnecessary spaces
net: hns3: add some required spaces
net: hns3: clean up a type mismatch warning
net: hns3: refine function hns3_set_default_feature()
ipv6: remove duplicated 'net/lwtunnel.h' include
net: w5100: check return value after calling platform_get_resource()
net/mlxbf_gige: Make use of devm_platform_ioremap_resourcexxx()
net: mdio: mscc-miim: Make use of the helper function devm_platform_ioremap_resource()
net: mdio-ipq4019: Make use of devm_platform_ioremap_resource()
fou: remove sparse errors
ipv4: fix endianness issue in inet_rtm_getroute_build_skb()
octeontx2-af: Set proper errorcode for IPv4 checksum errors
octeontx2-af: Fix static code analyzer reported issues
octeontx2-af: Fix mailbox errors in nix_rss_flowkey_cfg
octeontx2-af: Fix loop in free and unmap counter
af_unix: fix potential NULL deref in unix_dgram_connect()
dpaa2-eth: Replace strlcpy with strscpy
octeontx2-af: Use NDC TX for transmit packet data
...
Pull selinux update from Paul Moore:
"We've got an unusually small SELinux pull request for v5.15 that
consists of only one (?!) patch that is really pretty minor when you
look at it.
Unsurprisingly it passes all of our tests and merges cleanly on top of
your tree right now, please merge this for v5.15"
* tag 'selinux-pr-20210830' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: return early for possible NULL audit buffers
Build failure in drivers/net/wwan/mhi_wwan_mbim.c:
add missing parameter (0, assuming we don't want buffer pre-alloc).
Conflict in drivers/net/dsa/sja1105/sja1105_main.c between:
589918df93 ("net: dsa: sja1105: be stateless with FDB entries on SJA1105P/Q/R/S/SJA1110 too")
0fac6aa098 ("net: dsa: sja1105: delete the best_effort_vlan_filtering mode")
Follow the instructions from the commit message of the former commit
- removed the if conditions. When looking at commit 589918df93 ("net:
dsa: sja1105: be stateless with FDB entries on SJA1105P/Q/R/S/SJA1110 too")
note that the mask_iotag fields get removed by the following patch.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
It should not return 0 when SID 0 is assigned to isids.
This patch fixes it.
Cc: stable@vger.kernel.org
Fixes: e3e0b582c3 ("selinux: remove unused initial SIDs and improve handling")
Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
[PM: remove changelog from description]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Add basic Kconfig, an initial (empty) af_mctp source object, and
{AF,PF}_MCTP definitions, and the required definitions for a new
protocol type.
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch performs the final modification necessary to pass the buffer
measurement to callers, so that they provide a functionality similar to
ima_file_hash(). It adds the 'digest' and 'digest_len' parameters to
ima_measure_critical_data() and process_buffer_measurement().
These functions calculate the digest even if there is no suitable rule in
the IMA policy and, in this case, they simply return 1 before generating a
new measurement entry.
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Reviewed-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
audit_log_start() may return NULL in below cases:
- when audit is not initialized.
- when audit backlog limit exceeds.
After the call to audit_log_start() is made and then possible NULL audit
buffer argument is passed to audit_log_*() functions,
audit_log_*() functions return immediately in case of a NULL audit buffer
argument.
But it is optimal to return early when audit_log_start() returns NULL,
because it is not necessary for audit_log_*() functions to be called with
NULL audit buffer argument.
So add exception handling for possible NULL audit buffers where
return value can be handled from callers.
Signed-off-by: Austin Kim <austin.kim@lge.com>
[PM: tweak subject line]
Signed-off-by: Paul Moore <paul@paul-moore.com>
... along with avc_has_perm_flags() itself, since now it's identical
to avc_has_perm() (as pointed out by Paul Moore)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[PM: add "selinux:" prefix to subj and tweak for length]
Signed-off-by: Paul Moore <paul@paul-moore.com>
dump_common_audit_data() is safe to use under rcu_read_lock() now;
no need for AVC_NONBLOCKING and games around it
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Fix function name and add comment for parameter state in ss/services.c
kernel-doc to remove some warnings found by running make W=1 LLVM=1.
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
While trying to address a Coverity warning that the dev_name string
might end up unterminated when strcpy'ing it in
selinux_ib_endport_manage_subnet(), I realized that it is possible (and
simpler) to just pass the dev_name pointer directly, rather than copying
the string to a buffer.
The ibendport variable goes out of scope at the end of the function
anyway, so the lifetime of the dev_name pointer will never be shorter
than that of ibendport, thus we can safely just pass the dev_name
pointer and be done with it.
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Acked-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Variable rc is set to '-EINVAL' but this value is never read as
it is overwritten or not used later on, hence it is a redundant
assignment and can be removed.
Cleans up the following clang-analyzer warning:
security/selinux/ss/services.c:2103:3: warning: Value stored to 'rc' is
never read [clang-analyzer-deadcode.DeadStores].
security/selinux/ss/services.c:2079:2: warning: Value stored to 'rc' is
never read [clang-analyzer-deadcode.DeadStores].
security/selinux/ss/services.c:2071:2: warning: Value stored to 'rc' is
never read [clang-analyzer-deadcode.DeadStores].
security/selinux/ss/services.c:2062:2: warning: Value stored to 'rc' is
never read [clang-analyzer-deadcode.DeadStores].
security/selinux/ss/policydb.c:2592:3: warning: Value stored to 'rc' is
never read [clang-analyzer-deadcode.DeadStores].
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Minor documentation update.
Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Paul Moore <paul@paul-moore.com>
seliunx_xfrm_policy_lookup() is hooks of security_xfrm_policy_lookup().
The dir argument is uselss in security_xfrm_policy_lookup(). So
remove the dir argument from selinux_xfrm_policy_lookup() and
security_xfrm_policy_lookup().
Signed-off-by: Zhongjun Tan <tanzhongjun@yulong.com>
[PM: reformat the subject line]
Signed-off-by: Paul Moore <paul@paul-moore.com>
We can do the allocation + copying of expr.nodes in one go using
kmemdup().
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEgycj0O+d1G2aycA8rZhLv9lQBTwFAmCInP4ACgkQrZhLv9lQ
BTza0g//dTeb9woC9H7qlEhK4l9yk62lTss60Q8X7m7ZSNfdL4tiEbi64SgK+iOW
OOegbrOEb8Kzh4KJJYmVlVZ5YUWyH4szgmee1wnylBdsWiWaPLPF3Cflz77apy6T
TiiBsJd7rRE29FKheaMt34B41BMh8QHESN+DzjzJWsFoi/uNxjgSs2W16XuSupKu
bpRmB1pYNXMlrkzz7taL05jndZYE5arVriqlxgAsuLOFOp/ER7zecrjImdCM/4kL
W6ej0R1fz2Geh6CsLBJVE+bKWSQ82q5a4xZEkSYuQHXgZV5eywE5UKu8ssQcRgQA
VmGUY5k73rfY9Ofupf2gCaf/JSJNXKO/8Xjg0zAdklKtmgFjtna5Tyg9I90j7zn+
5swSpKuRpilN8MQH+6GWAnfqQlNoviTOpFeq3LwBtNVVOh08cOg6lko/bmebBC+R
TeQPACKS0Q0gCDPm9RYoU1pMUuYgfOwVfVRZK1prgi2Co7ZBUMOvYbNoKYoPIydr
ENBYljlU1OYwbzgR2nE+24fvhU8xdNOVG1xXYPAEHShu+p7dLIWRLhl8UCtRQpSR
1ofeVaJjgjrp29O+1OIQjB2kwCaRdfv/Gq1mztE/VlMU/r++E62OEzcH0aS+mnrg
yzfyUdI8IFv1q6FGT9yNSifWUWxQPmOKuC8kXsKYfqfJsFwKmHM=
=uCN4
-----END PGP SIGNATURE-----
Merge tag 'landlock_v34' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull Landlock LSM from James Morris:
"Add Landlock, a new LSM from Mickaël Salaün.
Briefly, Landlock provides for unprivileged application sandboxing.
From Mickaël's cover letter:
"The goal of Landlock is to enable to restrict ambient rights (e.g.
global filesystem access) for a set of processes. Because Landlock
is a stackable LSM [1], it makes possible to create safe security
sandboxes as new security layers in addition to the existing
system-wide access-controls. This kind of sandbox is expected to
help mitigate the security impact of bugs or unexpected/malicious
behaviors in user-space applications. Landlock empowers any
process, including unprivileged ones, to securely restrict
themselves.
Landlock is inspired by seccomp-bpf but instead of filtering
syscalls and their raw arguments, a Landlock rule can restrict the
use of kernel objects like file hierarchies, according to the
kernel semantic. Landlock also takes inspiration from other OS
sandbox mechanisms: XNU Sandbox, FreeBSD Capsicum or OpenBSD
Pledge/Unveil.
In this current form, Landlock misses some access-control features.
This enables to minimize this patch series and ease review. This
series still addresses multiple use cases, especially with the
combined use of seccomp-bpf: applications with built-in sandboxing,
init systems, security sandbox tools and security-oriented APIs [2]"
The cover letter and v34 posting is here:
https://lore.kernel.org/linux-security-module/20210422154123.13086-1-mic@digikod.net/
See also:
https://landlock.io/
This code has had extensive design discussion and review over several
years"
Link: https://lore.kernel.org/lkml/50db058a-7dde-441b-a7f9-f6837fe8b69f@schaufler-ca.com/ [1]
Link: https://lore.kernel.org/lkml/f646e1c7-33cf-333f-070c-0a40ad0468cd@digikod.net/ [2]
* tag 'landlock_v34' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
landlock: Enable user space to infer supported features
landlock: Add user and kernel documentation
samples/landlock: Add a sandbox manager example
selftests/landlock: Add user space tests
landlock: Add syscall implementations
arch: Wire up Landlock syscalls
fs,security: Add sb_delete hook
landlock: Support filesystem access-control
LSM: Infrastructure management of the superblock
landlock: Add ptrace restrictions
landlock: Set up the security framework and manage credentials
landlock: Add ruleset and domain management
landlock: Add object management
Core:
- bpf:
- allow bpf programs calling kernel functions (initially to
reuse TCP congestion control implementations)
- enable task local storage for tracing programs - remove the
need to store per-task state in hash maps, and allow tracing
programs access to task local storage previously added for
BPF_LSM
- add bpf_for_each_map_elem() helper, allowing programs to
walk all map elements in a more robust and easier to verify
fashion
- sockmap: support UDP and cross-protocol BPF_SK_SKB_VERDICT
redirection
- lpm: add support for batched ops in LPM trie
- add BTF_KIND_FLOAT support - mostly to allow use of BTF
on s390 which has floats in its headers files
- improve BPF syscall documentation and extend the use of kdoc
parsing scripts we already employ for bpf-helpers
- libbpf, bpftool: support static linking of BPF ELF files
- improve support for encapsulation of L2 packets
- xdp: restructure redirect actions to avoid a runtime lookup,
improving performance by 4-8% in microbenchmarks
- xsk: build skb by page (aka generic zerocopy xmit) - improve
performance of software AF_XDP path by 33% for devices
which don't need headers in the linear skb part (e.g. virtio)
- nexthop: resilient next-hop groups - improve path stability
on next-hops group changes (incl. offload for mlxsw)
- ipv6: segment routing: add support for IPv4 decapsulation
- icmp: add support for RFC 8335 extended PROBE messages
- inet: use bigger hash table for IP ID generation
- tcp: deal better with delayed TX completions - make sure we don't
give up on fast TCP retransmissions only because driver is
slow in reporting that it completed transmitting the original
- tcp: reorder tcp_congestion_ops for better cache locality
- mptcp:
- add sockopt support for common TCP options
- add support for common TCP msg flags
- include multiple address ids in RM_ADDR
- add reset option support for resetting one subflow
- udp: GRO L4 improvements - improve 'forward' / 'frag_list'
co-existence with UDP tunnel GRO, allowing the first to take
place correctly even for encapsulated UDP traffic
- micro-optimize dev_gro_receive() and flow dissection, avoid
retpoline overhead on VLAN and TEB GRO
- use less memory for sysctls, add a new sysctl type, to allow using
u8 instead of "int" and "long" and shrink networking sysctls
- veth: allow GRO without XDP - this allows aggregating UDP
packets before handing them off to routing, bridge, OvS, etc.
- allow specifing ifindex when device is moved to another namespace
- netfilter:
- nft_socket: add support for cgroupsv2
- nftables: add catch-all set element - special element used
to define a default action in case normal lookup missed
- use net_generic infra in many modules to avoid allocating
per-ns memory unnecessarily
- xps: improve the xps handling to avoid potential out-of-bound
accesses and use-after-free when XPS change race with other
re-configuration under traffic
- add a config knob to turn off per-cpu netdev refcnt to catch
underflows in testing
Device APIs:
- add WWAN subsystem to organize the WWAN interfaces better and
hopefully start driving towards more unified and vendor-
-independent APIs
- ethtool:
- add interface for reading IEEE MIB stats (incl. mlx5 and
bnxt support)
- allow network drivers to dump arbitrary SFP EEPROM data,
current offset+length API was a poor fit for modern SFP
which define EEPROM in terms of pages (incl. mlx5 support)
- act_police, flow_offload: add support for packet-per-second
policing (incl. offload for nfp)
- psample: add additional metadata attributes like transit delay
for packets sampled from switch HW (and corresponding egress
and policy-based sampling in the mlxsw driver)
- dsa: improve support for sandwiched LAGs with bridge and DSA
- netfilter:
- flowtable: use direct xmit in topologies with IP
forwarding, bridging, vlans etc.
- nftables: counter hardware offload support
- Bluetooth:
- improvements for firmware download w/ Intel devices
- add support for reading AOSP vendor capabilities
- add support for virtio transport driver
- mac80211:
- allow concurrent monitor iface and ethernet rx decap
- set priority and queue mapping for injected frames
- phy: add support for Clause-45 PHY Loopback
- pci/iov: add sysfs MSI-X vector assignment interface
to distribute MSI-X resources to VFs (incl. mlx5 support)
New hardware/drivers:
- dsa: mv88e6xxx: add support for Marvell mv88e6393x -
11-port Ethernet switch with 8x 1-Gigabit Ethernet
and 3x 10-Gigabit interfaces.
- dsa: support for legacy Broadcom tags used on BCM5325, BCM5365
and BCM63xx switches
- Microchip KSZ8863 and KSZ8873; 3x 10/100Mbps Ethernet switches
- ath11k: support for QCN9074 a 802.11ax device
- Bluetooth: Broadcom BCM4330 and BMC4334
- phy: Marvell 88X2222 transceiver support
- mdio: add BCM6368 MDIO mux bus controller
- r8152: support RTL8153 and RTL8156 (USB Ethernet) chips
- mana: driver for Microsoft Azure Network Adapter (MANA)
- Actions Semi Owl Ethernet MAC
- can: driver for ETAS ES58X CAN/USB interfaces
Pure driver changes:
- add XDP support to: enetc, igc, stmmac
- add AF_XDP support to: stmmac
- virtio:
- page_to_skb() use build_skb when there's sufficient tailroom
(21% improvement for 1000B UDP frames)
- support XDP even without dedicated Tx queues - share the Tx
queues with the stack when necessary
- mlx5:
- flow rules: add support for mirroring with conntrack,
matching on ICMP, GTP, flex filters and more
- support packet sampling with flow offloads
- persist uplink representor netdev across eswitch mode
changes
- allow coexistence of CQE compression and HW time-stamping
- add ethtool extended link error state reporting
- ice, iavf: support flow filters, UDP Segmentation Offload
- dpaa2-switch:
- move the driver out of staging
- add spanning tree (STP) support
- add rx copybreak support
- add tc flower hardware offload on ingress traffic
- ionic:
- implement Rx page reuse
- support HW PTP time-stamping
- octeon: support TC hardware offloads - flower matching on ingress
and egress ratelimitting.
- stmmac:
- add RX frame steering based on VLAN priority in tc flower
- support frame preemption (FPE)
- intel: add cross time-stamping freq difference adjustment
- ocelot:
- support forwarding of MRP frames in HW
- support multiple bridges
- support PTP Sync one-step timestamping
- dsa: mv88e6xxx, dpaa2-switch: offload bridge port flags like
learning, flooding etc.
- ipa: add IPA v4.5, v4.9 and v4.11 support (Qualcomm SDX55, SM8350,
SC7280 SoCs)
- mt7601u: enable TDLS support
- mt76:
- add support for 802.3 rx frames (mt7915/mt7615)
- mt7915 flash pre-calibration support
- mt7921/mt7663 runtime power management fixes
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmCKFPIACgkQMUZtbf5S
Irtw0g/+NA8bWdHNgG4H5rya0pv2z3IieLRmSdDfKRQQXcJpklawc5MKVVaTee/Q
5/QqgPdCsu1LAU6JXBKsKmyDDaMlQKdWuKbOqDSiAQKoMesZStTEHf9d851ZzgxA
Cdb6O7BD3lBl/IN+oxNG+KcmD1LKquTPKGySq2mQtEdLO12ekAsranzmj4voKffd
q9tBShpXQ7Dq77DLYfiQXVCvsizNcbbJFuxX0o9Lpb9+61ZyYAbogZSa9ypiZZwR
I/9azRBtJg7UV1aD/cLuAfy66Qh7t63+rCxVazs5Os8jVO26P/jQdisnnOe/x+p9
wYEmKm3GSu0V4SAPxkWW+ooKusflCeqDoMIuooKt6kbP6BRj540veGw3Ww/m5YFr
7pLQkTSP/tSjuGQIdBE1LOP5LBO8DZeC8Kiop9V0fzAW9hFSZbEq25WW0bPj8QQO
zA4Z7yWlslvxcfY2BdJX3wD8klaINkl/8fDWZFFsBdfFX2VeLtm7Xfduw34BJpvU
rYT3oWr6PhtkPAKR32SUcemSfeWgIVU41eSshzRz3kez1NngBUuLlSGGSEaKbes5
pZVt6pYFFVByyf6MTHFEoQvafZfEw04JILZpo4R5V8iTHzom0kD3Py064sBiXEw2
B6t+OW4qgcxGblpFkK2lD4kR2s1TPUs0ckVO6sAy1x8q60KKKjY=
=vcbA
-----END PGP SIGNATURE-----
Merge tag 'net-next-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core:
- bpf:
- allow bpf programs calling kernel functions (initially to
reuse TCP congestion control implementations)
- enable task local storage for tracing programs - remove the
need to store per-task state in hash maps, and allow tracing
programs access to task local storage previously added for
BPF_LSM
- add bpf_for_each_map_elem() helper, allowing programs to walk
all map elements in a more robust and easier to verify fashion
- sockmap: support UDP and cross-protocol BPF_SK_SKB_VERDICT
redirection
- lpm: add support for batched ops in LPM trie
- add BTF_KIND_FLOAT support - mostly to allow use of BTF on
s390 which has floats in its headers files
- improve BPF syscall documentation and extend the use of kdoc
parsing scripts we already employ for bpf-helpers
- libbpf, bpftool: support static linking of BPF ELF files
- improve support for encapsulation of L2 packets
- xdp: restructure redirect actions to avoid a runtime lookup,
improving performance by 4-8% in microbenchmarks
- xsk: build skb by page (aka generic zerocopy xmit) - improve
performance of software AF_XDP path by 33% for devices which don't
need headers in the linear skb part (e.g. virtio)
- nexthop: resilient next-hop groups - improve path stability on
next-hops group changes (incl. offload for mlxsw)
- ipv6: segment routing: add support for IPv4 decapsulation
- icmp: add support for RFC 8335 extended PROBE messages
- inet: use bigger hash table for IP ID generation
- tcp: deal better with delayed TX completions - make sure we don't
give up on fast TCP retransmissions only because driver is slow in
reporting that it completed transmitting the original
- tcp: reorder tcp_congestion_ops for better cache locality
- mptcp:
- add sockopt support for common TCP options
- add support for common TCP msg flags
- include multiple address ids in RM_ADDR
- add reset option support for resetting one subflow
- udp: GRO L4 improvements - improve 'forward' / 'frag_list'
co-existence with UDP tunnel GRO, allowing the first to take place
correctly even for encapsulated UDP traffic
- micro-optimize dev_gro_receive() and flow dissection, avoid
retpoline overhead on VLAN and TEB GRO
- use less memory for sysctls, add a new sysctl type, to allow using
u8 instead of "int" and "long" and shrink networking sysctls
- veth: allow GRO without XDP - this allows aggregating UDP packets
before handing them off to routing, bridge, OvS, etc.
- allow specifing ifindex when device is moved to another namespace
- netfilter:
- nft_socket: add support for cgroupsv2
- nftables: add catch-all set element - special element used to
define a default action in case normal lookup missed
- use net_generic infra in many modules to avoid allocating
per-ns memory unnecessarily
- xps: improve the xps handling to avoid potential out-of-bound
accesses and use-after-free when XPS change race with other
re-configuration under traffic
- add a config knob to turn off per-cpu netdev refcnt to catch
underflows in testing
Device APIs:
- add WWAN subsystem to organize the WWAN interfaces better and
hopefully start driving towards more unified and vendor-
independent APIs
- ethtool:
- add interface for reading IEEE MIB stats (incl. mlx5 and bnxt
support)
- allow network drivers to dump arbitrary SFP EEPROM data,
current offset+length API was a poor fit for modern SFP which
define EEPROM in terms of pages (incl. mlx5 support)
- act_police, flow_offload: add support for packet-per-second
policing (incl. offload for nfp)
- psample: add additional metadata attributes like transit delay for
packets sampled from switch HW (and corresponding egress and
policy-based sampling in the mlxsw driver)
- dsa: improve support for sandwiched LAGs with bridge and DSA
- netfilter:
- flowtable: use direct xmit in topologies with IP forwarding,
bridging, vlans etc.
- nftables: counter hardware offload support
- Bluetooth:
- improvements for firmware download w/ Intel devices
- add support for reading AOSP vendor capabilities
- add support for virtio transport driver
- mac80211:
- allow concurrent monitor iface and ethernet rx decap
- set priority and queue mapping for injected frames
- phy: add support for Clause-45 PHY Loopback
- pci/iov: add sysfs MSI-X vector assignment interface to distribute
MSI-X resources to VFs (incl. mlx5 support)
New hardware/drivers:
- dsa: mv88e6xxx: add support for Marvell mv88e6393x - 11-port
Ethernet switch with 8x 1-Gigabit Ethernet and 3x 10-Gigabit
interfaces.
- dsa: support for legacy Broadcom tags used on BCM5325, BCM5365 and
BCM63xx switches
- Microchip KSZ8863 and KSZ8873; 3x 10/100Mbps Ethernet switches
- ath11k: support for QCN9074 a 802.11ax device
- Bluetooth: Broadcom BCM4330 and BMC4334
- phy: Marvell 88X2222 transceiver support
- mdio: add BCM6368 MDIO mux bus controller
- r8152: support RTL8153 and RTL8156 (USB Ethernet) chips
- mana: driver for Microsoft Azure Network Adapter (MANA)
- Actions Semi Owl Ethernet MAC
- can: driver for ETAS ES58X CAN/USB interfaces
Pure driver changes:
- add XDP support to: enetc, igc, stmmac
- add AF_XDP support to: stmmac
- virtio:
- page_to_skb() use build_skb when there's sufficient tailroom
(21% improvement for 1000B UDP frames)
- support XDP even without dedicated Tx queues - share the Tx
queues with the stack when necessary
- mlx5:
- flow rules: add support for mirroring with conntrack, matching
on ICMP, GTP, flex filters and more
- support packet sampling with flow offloads
- persist uplink representor netdev across eswitch mode changes
- allow coexistence of CQE compression and HW time-stamping
- add ethtool extended link error state reporting
- ice, iavf: support flow filters, UDP Segmentation Offload
- dpaa2-switch:
- move the driver out of staging
- add spanning tree (STP) support
- add rx copybreak support
- add tc flower hardware offload on ingress traffic
- ionic:
- implement Rx page reuse
- support HW PTP time-stamping
- octeon: support TC hardware offloads - flower matching on ingress
and egress ratelimitting.
- stmmac:
- add RX frame steering based on VLAN priority in tc flower
- support frame preemption (FPE)
- intel: add cross time-stamping freq difference adjustment
- ocelot:
- support forwarding of MRP frames in HW
- support multiple bridges
- support PTP Sync one-step timestamping
- dsa: mv88e6xxx, dpaa2-switch: offload bridge port flags like
learning, flooding etc.
- ipa: add IPA v4.5, v4.9 and v4.11 support (Qualcomm SDX55, SM8350,
SC7280 SoCs)
- mt7601u: enable TDLS support
- mt76:
- add support for 802.3 rx frames (mt7915/mt7615)
- mt7915 flash pre-calibration support
- mt7921/mt7663 runtime power management fixes"
* tag 'net-next-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2451 commits)
net: selftest: fix build issue if INET is disabled
net: netrom: nr_in: Remove redundant assignment to ns
net: tun: Remove redundant assignment to ret
net: phy: marvell: add downshift support for M88E1240
net: dsa: ksz: Make reg_mib_cnt a u8 as it never exceeds 255
net/sched: act_ct: Remove redundant ct get and check
icmp: standardize naming of RFC 8335 PROBE constants
bpf, selftests: Update array map tests for per-cpu batched ops
bpf: Add batched ops support for percpu array
bpf: Implement formatted output helpers with bstr_printf
seq_file: Add a seq_bprintf function
sfc: adjust efx->xdp_tx_queue_count with the real number of initialized queues
net:nfc:digital: Fix a double free in digital_tg_recv_dep_req
net: fix a concurrency bug in l2tp_tunnel_register()
net/smc: Remove redundant assignment to rc
mpls: Remove redundant assignment to err
llc2: Remove redundant assignment to rc
net/tls: Remove redundant initialization of record
rds: Remove redundant assignment to nr_sig
dt-bindings: net: mdio-gpio: add compatible for microchip,mdio-smi0
...
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmCHM2sUHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXNfCg/9GmoCyCh+ZRj5RGQ6M+yJas1+yyJQ
uEfTNde54yfATUTaaWYnZG59yqzM3I2uaV11U7tqg8ajiFPxJKqbs5R9jl3lnSjH
0Dg22nXPSCOTKcU0x/DeLoKRr+M9jO1K/nQ8NEZvYX4nC/OgtCvJqb/oEQZIKAk5
2a7OEmNNQyFGd274p9dELaDHxN9UIaJ2PzQFXtq7ROHgBXQO4ONb2ajOf6mDSFQb
vP/CDHwaH+pcE28w44oRy0/YBkO1SrdqoFQchg5yFagM5tQRLGkXK4OFSs5KHi5Q
YMtmaOzMPIv1e5eaC1HuuMJYA4pPb30T9hFHP7tmBVZfmZaFaDeUs+BhMm98WTiS
o0iTP7tfs36/poOR1Q0/sB06uvF9hUAAX1ZuE95YySifbXU9hsUc9b0uQSwCdg9P
/J9rcdHLTpWqjw9n02mezWmAvo5U8ZvbDs+0xPIwI+3RTUP5t6mp+Hd5Tc7bPTq1
0rpWXx+FQoSytFap5qiUSiwBp+HF6HQnNIXB0Muf6wctChoTjvo7TwoxH//z4kEm
+SddhOCNkB7VC/X7hOxhl0F/rdHuXvb1AFIWjpTLJH2CR1PvMtF+sGey+uPT6hKZ
/gvhmQGjFdph99eGlfVbCNvx1pM61O25IscaYD1T2wGImw+z7dX4WkG3WoOdDSkR
bRjrBkcHh0gLhWk=
=HTEy
-----END PGP SIGNATURE-----
Merge tag 'selinux-pr-20210426' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull selinux updates from Paul Moore:
- Add support for measuring the SELinux state and policy capabilities
using IMA.
- A handful of SELinux/NFS patches to compare the SELinux state of one
mount with a set of mount options. Olga goes into more detail in the
patch descriptions, but this is important as it allows more
flexibility when using NFS and SELinux context mounts.
- Properly differentiate between the subjective and objective LSM
credentials; including support for the SELinux and Smack. My clumsy
attempt at a proper fix for AppArmor didn't quite pass muster so John
is working on a proper AppArmor patch, in the meantime this set of
patches shouldn't change the behavior of AppArmor in any way. This
change explains the bulk of the diffstat beyond security/.
- Fix a problem where we were not properly terminating the permission
list for two SELinux object classes.
* tag 'selinux-pr-20210426' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: add proper NULL termination to the secclass_map permissions
smack: differentiate between subjective and objective task credentials
selinux: clarify task subjective and objective credentials
lsm: separate security_task_getsecid() into subjective and objective variants
nfs: account for selinux security context when deciding to share superblock
nfs: remove unneeded null check in nfs_fill_super()
lsm,selinux: add new hook to compare new mount to an existing mount
selinux: fix misspellings using codespell tool
selinux: fix misspellings using codespell tool
selinux: measure state and policy capabilities
selinux: Allow context mounts for unpriviliged overlayfs
Move management of the superblock->sb_security blob out of the
individual security modules and into the security infrastructure.
Instead of allocating the blobs from within the modules, the modules
tell the infrastructure how much space is required, and the space is
allocated there.
Cc: John Johansen <john.johansen@canonical.com>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Mickaël Salaün <mic@linux.microsoft.com>
Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Acked-by: Serge Hallyn <serge@hallyn.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210422154123.13086-6-mic@digikod.net
Signed-off-by: James Morris <jamorris@linux.microsoft.com>
This patch adds the missing NULL termination to the "bpf" and
"perf_event" object class permission lists.
This missing NULL termination should really only affect the tools
under scripts/selinux, with the most important being genheaders.c,
although in practice this has not been an issue on any of my dev/test
systems. If the problem were to manifest itself it would likely
result in bogus permissions added to the end of the object class;
thankfully with no access control checks using these bogus
permissions and no policies defining these permissions the impact
would likely be limited to some noise about undefined permissions
during policy load.
Cc: stable@vger.kernel.org
Fixes: ec27c3568a ("selinux: bpf: Add selinux check for eBPF syscall operations")
Fixes: da97e18458 ("perf_event: Add support for LSM and SELinux checks")
Signed-off-by: Paul Moore <paul@paul-moore.com>
Conflicts:
MAINTAINERS
- keep Chandrasekar
drivers/net/ethernet/mellanox/mlx5/core/en_main.c
- simple fix + trust the code re-added to param.c in -next is fine
include/linux/bpf.h
- trivial
include/linux/ethtool.h
- trivial, fix kdoc while at it
include/linux/skmsg.h
- move to relevant place in tcp.c, comment re-wrapped
net/core/skmsg.c
- add the sk = sk // sk = NULL around calls
net/tipc/crypto.c
- trivial
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmBwjZcUHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXOAcg//eZL4z0ksGo9s/Y+/9qOIYH2tMPU5
OOVZCekBENiq2LOuVbzAndeHOLZflf3iigBwtMvqHaAsdPAKH/3UedzD0/nxG39m
S2gowEuNEfxtuBwuIZMFaMGzzLyjlZJ3xxi6omIyj/2JqPNyBbbFxR/VC4agJZI5
oG6VfwhZJmFi1oJiNoGKjwihKHZQ90yd8UU5rMI+Np0TnP1Or3OvRaZjR47r+dWS
tAu3nTKrVEyGTcPeGzg9TS5tIko0jQ1FyrqPDBhfaJta48bX/9s70We6rwqJj8Vg
HiiSDPMK5EKkPLso+1vqvBI9q6xdhNeS+M2JP+/ewK/cqVKMkTVVys6l+T3a6HcY
rIXdgTWdMFiAQ6OW44z30fiwSxW3kI5M62um31nepoqvzX7acl6R1laILFztedWM
EOfCznZmE6ccYmcZnrqEmNsdF+Se1TUiM87bN90tAGmF9F4Yw2qGM0raiV3OJhDZ
P2zR/+DceSHI2pNfFtB5VVXZelHoKVhoRcRWvpzn7YW3UmnAl83HoJasBfa/j4rx
qvo+nj5ptCSX/kUYjvfvrRV1rY/BAaSVlFLpgYKY1r8/hdRN5DLpdE5cHh6Gky6B
fJen4a7yVecp8IKK+WR3maJ0hymo5ccUoB5AKzMOXeECKRqKkIDAKiEN59g9t96+
avKfojgsh1tNHA0=
=mGDl
-----END PGP SIGNATURE-----
Merge tag 'selinux-pr-20210409' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull selinux fixes from Paul Moore:
"Three SELinux fixes.
These fix known problems relating to (re)loading SELinux policy or
changing the policy booleans, and pass our test suite without problem"
* tag 'selinux-pr-20210409' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: fix race between old and new sidtab
selinux: fix cond_list corruption when changing booleans
selinux: make nslot handling in avtab more robust
Since commit 1b8b31a2e6 ("selinux: convert policy read-write lock to
RCU"), there is a small window during policy load where the new policy
pointer has already been installed, but some threads may still be
holding the old policy pointer in their read-side RCU critical sections.
This means that there may be conflicting attempts to add a new SID entry
to both tables via sidtab_context_to_sid().
See also (and the rest of the thread):
https://lore.kernel.org/selinux/CAFqZXNvfux46_f8gnvVvRYMKoes24nwm2n3sPbMjrB8vKTW00g@mail.gmail.com/
Fix this by installing the new policy pointer under the old sidtab's
spinlock along with marking the old sidtab as "frozen". Then, if an
attempt to add new entry to a "frozen" sidtab is detected, make
sidtab_context_to_sid() return -ESTALE to indicate that a new policy
has been installed and that the caller will have to abort the policy
transaction and try again after re-taking the policy pointer (which is
guaranteed to be a newer policy). This requires adding a retry-on-ESTALE
logic to all callers of sidtab_context_to_sid(), but fortunately these
are easy to determine and aren't that many.
This seems to be the simplest solution for this problem, even if it
looks somewhat ugly. Note that other places in the kernel (e.g.
do_mknodat() in fs/namei.c) use similar stale-retry patterns, so I think
it's reasonable.
Cc: stable@vger.kernel.org
Fixes: 1b8b31a2e6 ("selinux: convert policy read-write lock to RCU")
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Currently, duplicate_policydb_cond_list() first copies the whole
conditional avtab and then tries to link to the correct entries in
cond_dup_av_list() using avtab_search(). However, since the conditional
avtab may contain multiple entries with the same key, this approach
often fails to find the right entry, potentially leading to wrong rules
being activated/deactivated when booleans are changed.
To fix this, instead start with an empty conditional avtab and add the
individual entries one-by-one while building the new av_lists. This
approach leads to the correct result, since each entry is present in the
av_lists exactly once.
The issue can be reproduced with Fedora policy as follows:
# sesearch -s ftpd_t -t public_content_rw_t -c dir -p create -A
allow ftpd_t non_security_file_type:dir { add_name create getattr ioctl link lock open read remove_name rename reparent rmdir search setattr unlink watch watch_reads write }; [ ftpd_full_access ]:True
allow ftpd_t public_content_rw_t:dir { add_name create link remove_name rename reparent rmdir setattr unlink watch watch_reads write }; [ ftpd_anon_write ]:True
# setsebool ftpd_anon_write=off ftpd_connect_all_unreserved=off ftpd_connect_db=off ftpd_full_access=off
On fixed kernels, the sesearch output is the same after the setsebool
command:
# sesearch -s ftpd_t -t public_content_rw_t -c dir -p create -A
allow ftpd_t non_security_file_type:dir { add_name create getattr ioctl link lock open read remove_name rename reparent rmdir search setattr unlink watch watch_reads write }; [ ftpd_full_access ]:True
allow ftpd_t public_content_rw_t:dir { add_name create link remove_name rename reparent rmdir setattr unlink watch watch_reads write }; [ ftpd_anon_write ]:True
While on the broken kernels, it will be different:
# sesearch -s ftpd_t -t public_content_rw_t -c dir -p create -A
allow ftpd_t non_security_file_type:dir { add_name create getattr ioctl link lock open read remove_name rename reparent rmdir search setattr unlink watch watch_reads write }; [ ftpd_full_access ]:True
allow ftpd_t non_security_file_type:dir { add_name create getattr ioctl link lock open read remove_name rename reparent rmdir search setattr unlink watch watch_reads write }; [ ftpd_full_access ]:True
allow ftpd_t non_security_file_type:dir { add_name create getattr ioctl link lock open read remove_name rename reparent rmdir search setattr unlink watch watch_reads write }; [ ftpd_full_access ]:True
While there, also simplify the computation of nslots. This changes the
nslots values for nrules 2 or 3 to just two slots instead of 4, which
makes the sequence more consistent.
Cc: stable@vger.kernel.org
Fixes: c7c556f1e8 ("selinux: refactor changing booleans")
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
1. Make sure all fileds are initialized in avtab_init().
2. Slightly refactor avtab_alloc() to use the above fact.
3. Use h->nslot == 0 as a sentinel in the access functions to prevent
dereferencing h->htable when it's not allocated.
Cc: stable@vger.kernel.org
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
SELinux has a function, task_sid(), which returns the task's
objective credentials, but unfortunately is used in a few places
where the subjective task credentials should be used. Most notably
in the new security_task_getsecid_subj() LSM hook.
This patch fixes this and attempts to make things more obvious by
introducing a new function, task_sid_subj(), and renaming the
existing task_sid() function to task_sid_obj().
This patch also adds an interesting function in task_sid_binder().
The task_sid_binder() function has a comment which hopefully
describes it's reason for being, but it basically boils down to the
simple fact that we can't safely access another task's subjective
credentials so in the case of binder we need to stick with the
objective credentials regardless.
Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Of the three LSMs that implement the security_task_getsecid() LSM
hook, all three LSMs provide the task's objective security
credentials. This turns out to be unfortunate as most of the hook's
callers seem to expect the task's subjective credentials, although
a small handful of callers do correctly expect the objective
credentials.
This patch is the first step towards fixing the problem: it splits
the existing security_task_getsecid() hook into two variants, one
for the subjective creds, one for the objective creds.
void security_task_getsecid_subj(struct task_struct *p,
u32 *secid);
void security_task_getsecid_obj(struct task_struct *p,
u32 *secid);
While this patch does fix all of the callers to use the correct
variant, in order to keep this patch focused on the callers and to
ease review, the LSMs continue to use the same implementation for
both hooks. The net effect is that this patch should not change
the behavior of the kernel in any way, it will be up to the latter
LSM specific patches in this series to change the hook
implementations and return the correct credentials.
Acked-by: Mimi Zohar <zohar@linux.ibm.com> (IMA)
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Add a new hook that takes an existing super block and a new mount
with new options and determines if new options confict with an
existing mount or not.
A filesystem can use this new hook to determine if it can share
the an existing superblock with a new superblock for the new mount.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Acked-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
[PM: tweak the subject line, fix tab/space problems]
Signed-off-by: Paul Moore <paul@paul-moore.com>
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmBYx3gUHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXPSSw/+MnJxbBEfxMXll2LwCRXvyW0Q/F++
sSLPKZL9B5E7jANbTBlkUW+tMwsckTS7euPvRuJj2+mrSujRnSTl158JAAcn34gd
lpiGQpttFZD75Eh9sLNg0OZ7PflwQvAzHt52EweD8/OE5O8BLBg7o56SYMr3LkGu
Up9YcZPHNlj+NhfvWebv3jSB6dv392cG33iZoqmW81wSzmlXHGdzS5UTiIFnsp3X
kbhLKaZWDSBHuAVMuAxtx3x3sQO1ElfFHxKRYM1fzfl0BMy30Wv6YnXHW2nn08Hr
oT26968C0Rl9carTnA+G60Nj4WoTWW2dF20Mih+05vkpqFLjdMtFra7fFndbmfNi
f7Gj5DJNrbunX1dMFJkyPnO/1x74RFUhZbCKm5ffvmF8AcYVivbbsyUAy/xduPWo
m9hjXDVZLUbWxGBUFxyJD6qQw/wuz+qII8B7SBCKaDdCtM74TlXBVug8prrPcWHV
tO3ljjbxEjBJ6zsFIJ9IlV3rJTL0v4RbAELXXp5qcZOJpnUtuH8cxj0Ryzo3yCY5
g/m6IHhm5OfJ5TBSc5UIj2NJQi7sJ+Yv/++lms+RB2MVopx4lJ+UK7140gCA40iC
1EPOGXCnB/b1k5F38dqdpI5MD+/uAzOMusQvPfL4x0xoQidzsqDmqgaS+V8pIYl6
nisL4eEe2K7PWX4=
=mFaE
-----END PGP SIGNATURE-----
Merge tag 'selinux-pr-20210322' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull selinux fixes from Paul Moore:
"Three SELinux patches:
- Fix a problem where a local variable is used outside its associated
function. Thankfully this can only be triggered by reloading the
SELinux policy, which is a restricted operation for other obvious
reasons.
- Fix some incorrect, and inconsistent, audit and printk messages
when loading the SELinux policy.
All three patches are relatively minor and have been through our
testing with no failures"
* tag 'selinux-pr-20210322' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinuxfs: unify policy load error reporting
selinux: fix variable scope issue in live sidtab conversion
selinux: don't log MAC_POLICY_LOAD record on failed policy load
Let's drop the pr_err()s from sel_make_policy_nodes() and just add one
pr_warn_ratelimited() call to the sel_make_policy_nodes() error path in
sel_write_load().
Changing from error to warning makes sense, since after 02a52c5c8c
("selinux: move policy commit after updating selinuxfs"), this error
path no longer leads to a broken selinuxfs tree (it's just kept in the
original state and policy load is aborted).
I also added _ratelimited to be consistent with the other prtin in the
same function (it's probably not necessary, but can't really hurt...
there are likely more important error messages to be printed when
filesystem entry creation starts erroring out).
Suggested-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Commit 02a52c5c8c ("selinux: move policy commit after updating
selinuxfs") moved the selinux_policy_commit() call out of
security_load_policy() into sel_write_load(), which caused a subtle yet
rather serious bug.
The problem is that security_load_policy() passes a reference to the
convert_params local variable to sidtab_convert(), which stores it in
the sidtab, where it may be accessed until the policy is swapped over
and RCU synchronized. Before 02a52c5c8c, selinux_policy_commit() was
called directly from security_load_policy(), so the convert_params
pointer remained valid all the way until the old sidtab was destroyed,
but now that's no longer the case and calls to sidtab_context_to_sid()
on the old sidtab after security_load_policy() returns may cause invalid
memory accesses.
This can be easily triggered using the stress test from commit
ee1a84fdfe ("selinux: overhaul sidtab to fix bug and improve
performance"):
```
function rand_cat() {
echo $(( $RANDOM % 1024 ))
}
function do_work() {
while true; do
echo -n "system_u:system_r:kernel_t:s0:c$(rand_cat),c$(rand_cat)" \
>/sys/fs/selinux/context 2>/dev/null || true
done
}
do_work >/dev/null &
do_work >/dev/null &
do_work >/dev/null &
while load_policy; do echo -n .; sleep 0.1; done
kill %1
kill %2
kill %3
```
Fix this by allocating the temporary sidtab convert structures
dynamically and passing them among the
selinux_policy_{load,cancel,commit} functions.
Fixes: 02a52c5c8c ("selinux: move policy commit after updating selinuxfs")
Cc: stable@vger.kernel.org
Tested-by: Tyler Hicks <tyhicks@linux.microsoft.com>
Reviewed-by: Tyler Hicks <tyhicks@linux.microsoft.com>
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
[PM: merge fuzz in security.h and services.c]
Signed-off-by: Paul Moore <paul@paul-moore.com>
If sel_make_policy_nodes() fails, we should jump to 'out', not 'out1',
as the latter would incorrectly log an MAC_POLICY_LOAD audit record,
even though the policy hasn't actually been reloaded. The 'out1' jump
label now becomes unused and can be removed.
Fixes: 02a52c5c8c ("selinux: move policy commit after updating selinuxfs")
Cc: stable@vger.kernel.org
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
- RTM_NEWNEXTHOP et.al. that handle resilient groups will have a new nested
attribute, NHA_RES_GROUP, whose elements are attributes NHA_RES_GROUP_*.
- RTM_NEWNEXTHOPBUCKET et.al. is a suite of new messages that will
currently serve only for dumping of individual buckets of resilient next
hop groups. For nexthop group buckets, these messages will carry a nested
attribute NHA_RES_BUCKET, whose elements are attributes NHA_RES_BUCKET_*.
There are several reasons why a new suite of messages is created for
nexthop buckets instead of overloading the information on the existing
RTM_{NEW,DEL,GET}NEXTHOP messages.
First, a nexthop group can contain a large number of nexthop buckets (4k
is not unheard of). This imposes limits on the amount of information that
can be encoded for each nexthop bucket given a netlink message is limited
to 64k bytes.
Second, while RTM_NEWNEXTHOPBUCKET is only used for notifications at
this point, in the future it can be extended to provide user space with
control over nexthop buckets configuration.
- The new group type is NEXTHOP_GRP_TYPE_RES. Note that nexthop code is
adjusted to bounce groups with that type for now.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A typo is f out by codespell tool in 422th line of security.h:
$ codespell ./security/selinux/include/
./security.h:422: thie ==> the, this
Fix a typo found by codespell.
Signed-off-by: Xiong Zhenwu <xiong.zhenwu@zte.com.cn>
[PM: subject line tweaks]
Signed-off-by: Paul Moore <paul@paul-moore.com>
A typo is found out by codespell tool in 16th line of hashtab.c
$ codespell ./security/selinux/ss/
./hashtab.c:16: rouding ==> rounding
Fix a typo found by codespell.
Signed-off-by: Xiong Zhenwu <xiong.zhenwu@zte.com.cn>
[PM: subject line tweak]
Signed-off-by: Paul Moore <paul@paul-moore.com>
SELinux stores the configuration state and the policy capabilities
in kernel memory. Changes to this data at runtime would have an impact
on the security guarantees provided by SELinux. Measuring this data
through IMA subsystem provides a tamper-resistant way for
an attestation service to remotely validate it at runtime.
Measure the configuration state and policy capabilities by calling
the IMA hook ima_measure_critical_data().
To enable SELinux data measurement, the following steps are required:
1, Add "ima_policy=critical_data" to the kernel command line arguments
to enable measuring SELinux data at boot time.
For example,
BOOT_IMAGE=/boot/vmlinuz-5.11.0-rc3+ root=UUID=fd643309-a5d2-4ed3-b10d-3c579a5fab2f ro nomodeset security=selinux ima_policy=critical_data
2, Add the following rule to /etc/ima/ima-policy
measure func=CRITICAL_DATA label=selinux
Sample measurement of SELinux state and policy capabilities:
10 2122...65d8 ima-buf sha256:13c2...1292 selinux-state 696e...303b
Execute the following command to extract the measured data
from the IMA's runtime measurements list:
grep "selinux-state" /sys/kernel/security/integrity/ima/ascii_runtime_measurements | tail -1 | cut -d' ' -f 6 | xxd -r -p
The output should be a list of key-value pairs. For example,
initialized=1;enforcing=0;checkreqprot=1;network_peer_controls=1;open_perms=1;extended_socket_class=1;always_check_network=0;cgroup_seclabel=1;nnp_nosuid_transition=1;genfs_seclabel_symlinks=0;
To verify the measurement is consistent with the current SELinux state
reported on the system, compare the integer values in the following
files with those set in the IMA measurement (using the following commands):
- cat /sys/fs/selinux/enforce
- cat /sys/fs/selinux/checkreqprot
- cat /sys/fs/selinux/policy_capabilities/[capability_file]
Note that the actual verification would be against an expected state
and done on a separate system (likely an attestation server) requiring
"initialized=1;enforcing=1;checkreqprot=0;"
for a secure state and then whatever policy capabilities are actually
set in the expected policy (which can be extracted from the policy
itself via seinfo, for example).
Signed-off-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
Suggested-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Suggested-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Now overlayfs allow unpriviliged mounts. That is root inside a non-init
user namespace can mount overlayfs. This is being added in 5.11 kernel.
Giuseppe tried to mount overlayfs with option "context" and it failed
with error -EACCESS.
$ su test
$ unshare -rm
$ mkdir -p lower upper work merged
$ mount -t overlay -o lowerdir=lower,workdir=work,upperdir=upper,userxattr,context='system_u:object_r:container_file_t:s0' none merged
This fails with -EACCESS. It works if option "-o context" is not specified.
Little debugging showed that selinux_set_mnt_opts() returns -EACCESS.
So this patch adds "overlay" to the list, where it is fine to specific
context from non init_user_ns.
Reported-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
[PM: trimmed the changelog from the description]
Signed-off-by: Paul Moore <paul@paul-moore.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCYCegywAKCRCRxhvAZXjc
ouJ6AQDlf+7jCQlQdeKKoN9QDFfMzG1ooemat36EpRRTONaGuAD8D9A4sUsG4+5f
4IU5Lj9oY4DEmF8HenbWK2ZHsesL2Qg=
=yPaw
-----END PGP SIGNATURE-----
Merge tag 'idmapped-mounts-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull idmapped mounts from Christian Brauner:
"This introduces idmapped mounts which has been in the making for some
time. Simply put, different mounts can expose the same file or
directory with different ownership. This initial implementation comes
with ports for fat, ext4 and with Christoph's port for xfs with more
filesystems being actively worked on by independent people and
maintainers.
Idmapping mounts handle a wide range of long standing use-cases. Here
are just a few:
- Idmapped mounts make it possible to easily share files between
multiple users or multiple machines especially in complex
scenarios. For example, idmapped mounts will be used in the
implementation of portable home directories in
systemd-homed.service(8) where they allow users to move their home
directory to an external storage device and use it on multiple
computers where they are assigned different uids and gids. This
effectively makes it possible to assign random uids and gids at
login time.
- It is possible to share files from the host with unprivileged
containers without having to change ownership permanently through
chown(2).
- It is possible to idmap a container's rootfs and without having to
mangle every file. For example, Chromebooks use it to share the
user's Download folder with their unprivileged containers in their
Linux subsystem.
- It is possible to share files between containers with
non-overlapping idmappings.
- Filesystem that lack a proper concept of ownership such as fat can
use idmapped mounts to implement discretionary access (DAC)
permission checking.
- They allow users to efficiently changing ownership on a per-mount
basis without having to (recursively) chown(2) all files. In
contrast to chown (2) changing ownership of large sets of files is
instantenous with idmapped mounts. This is especially useful when
ownership of a whole root filesystem of a virtual machine or
container is changed. With idmapped mounts a single syscall
mount_setattr syscall will be sufficient to change the ownership of
all files.
- Idmapped mounts always take the current ownership into account as
idmappings specify what a given uid or gid is supposed to be mapped
to. This contrasts with the chown(2) syscall which cannot by itself
take the current ownership of the files it changes into account. It
simply changes the ownership to the specified uid and gid. This is
especially problematic when recursively chown(2)ing a large set of
files which is commong with the aforementioned portable home
directory and container and vm scenario.
- Idmapped mounts allow to change ownership locally, restricting it
to specific mounts, and temporarily as the ownership changes only
apply as long as the mount exists.
Several userspace projects have either already put up patches and
pull-requests for this feature or will do so should you decide to pull
this:
- systemd: In a wide variety of scenarios but especially right away
in their implementation of portable home directories.
https://systemd.io/HOME_DIRECTORY/
- container runtimes: containerd, runC, LXD:To share data between
host and unprivileged containers, unprivileged and privileged
containers, etc. The pull request for idmapped mounts support in
containerd, the default Kubernetes runtime is already up for quite
a while now: https://github.com/containerd/containerd/pull/4734
- The virtio-fs developers and several users have expressed interest
in using this feature with virtual machines once virtio-fs is
ported.
- ChromeOS: Sharing host-directories with unprivileged containers.
I've tightly synced with all those projects and all of those listed
here have also expressed their need/desire for this feature on the
mailing list. For more info on how people use this there's a bunch of
talks about this too. Here's just two recent ones:
https://www.cncf.io/wp-content/uploads/2020/12/Rootless-Containers-in-Gitpod.pdfhttps://fosdem.org/2021/schedule/event/containers_idmap/
This comes with an extensive xfstests suite covering both ext4 and
xfs:
https://git.kernel.org/brauner/xfstests-dev/h/idmapped_mounts
It covers truncation, creation, opening, xattrs, vfscaps, setid
execution, setgid inheritance and more both with idmapped and
non-idmapped mounts. It already helped to discover an unrelated xfs
setgid inheritance bug which has since been fixed in mainline. It will
be sent for inclusion with the xfstests project should you decide to
merge this.
In order to support per-mount idmappings vfsmounts are marked with
user namespaces. The idmapping of the user namespace will be used to
map the ids of vfs objects when they are accessed through that mount.
By default all vfsmounts are marked with the initial user namespace.
The initial user namespace is used to indicate that a mount is not
idmapped. All operations behave as before and this is verified in the
testsuite.
Based on prior discussions we want to attach the whole user namespace
and not just a dedicated idmapping struct. This allows us to reuse all
the helpers that already exist for dealing with idmappings instead of
introducing a whole new range of helpers. In addition, if we decide in
the future that we are confident enough to enable unprivileged users
to setup idmapped mounts the permission checking can take into account
whether the caller is privileged in the user namespace the mount is
currently marked with.
The user namespace the mount will be marked with can be specified by
passing a file descriptor refering to the user namespace as an
argument to the new mount_setattr() syscall together with the new
MOUNT_ATTR_IDMAP flag. The system call follows the openat2() pattern
of extensibility.
The following conditions must be met in order to create an idmapped
mount:
- The caller must currently have the CAP_SYS_ADMIN capability in the
user namespace the underlying filesystem has been mounted in.
- The underlying filesystem must support idmapped mounts.
- The mount must not already be idmapped. This also implies that the
idmapping of a mount cannot be altered once it has been idmapped.
- The mount must be a detached/anonymous mount, i.e. it must have
been created by calling open_tree() with the OPEN_TREE_CLONE flag
and it must not already have been visible in the filesystem.
The last two points guarantee easier semantics for userspace and the
kernel and make the implementation significantly simpler.
By default vfsmounts are marked with the initial user namespace and no
behavioral or performance changes are observed.
The manpage with a detailed description can be found here:
1d7b902e28
In order to support idmapped mounts, filesystems need to be changed
and mark themselves with the FS_ALLOW_IDMAP flag in fs_flags. The
patches to convert individual filesystem are not very large or
complicated overall as can be seen from the included fat, ext4, and
xfs ports. Patches for other filesystems are actively worked on and
will be sent out separately. The xfstestsuite can be used to verify
that port has been done correctly.
The mount_setattr() syscall is motivated independent of the idmapped
mounts patches and it's been around since July 2019. One of the most
valuable features of the new mount api is the ability to perform
mounts based on file descriptors only.
Together with the lookup restrictions available in the openat2()
RESOLVE_* flag namespace which we added in v5.6 this is the first time
we are close to hardened and race-free (e.g. symlinks) mounting and
path resolution.
While userspace has started porting to the new mount api to mount
proper filesystems and create new bind-mounts it is currently not
possible to change mount options of an already existing bind mount in
the new mount api since the mount_setattr() syscall is missing.
With the addition of the mount_setattr() syscall we remove this last
restriction and userspace can now fully port to the new mount api,
covering every use-case the old mount api could. We also add the
crucial ability to recursively change mount options for a whole mount
tree, both removing and adding mount options at the same time. This
syscall has been requested multiple times by various people and
projects.
There is a simple tool available at
https://github.com/brauner/mount-idmapped
that allows to create idmapped mounts so people can play with this
patch series. I'll add support for the regular mount binary should you
decide to pull this in the following weeks:
Here's an example to a simple idmapped mount of another user's home
directory:
u1001@f2-vm:/$ sudo ./mount --idmap both:1000:1001:1 /home/ubuntu/ /mnt
u1001@f2-vm:/$ ls -al /home/ubuntu/
total 28
drwxr-xr-x 2 ubuntu ubuntu 4096 Oct 28 22:07 .
drwxr-xr-x 4 root root 4096 Oct 28 04:00 ..
-rw------- 1 ubuntu ubuntu 3154 Oct 28 22:12 .bash_history
-rw-r--r-- 1 ubuntu ubuntu 220 Feb 25 2020 .bash_logout
-rw-r--r-- 1 ubuntu ubuntu 3771 Feb 25 2020 .bashrc
-rw-r--r-- 1 ubuntu ubuntu 807 Feb 25 2020 .profile
-rw-r--r-- 1 ubuntu ubuntu 0 Oct 16 16:11 .sudo_as_admin_successful
-rw------- 1 ubuntu ubuntu 1144 Oct 28 00:43 .viminfo
u1001@f2-vm:/$ ls -al /mnt/
total 28
drwxr-xr-x 2 u1001 u1001 4096 Oct 28 22:07 .
drwxr-xr-x 29 root root 4096 Oct 28 22:01 ..
-rw------- 1 u1001 u1001 3154 Oct 28 22:12 .bash_history
-rw-r--r-- 1 u1001 u1001 220 Feb 25 2020 .bash_logout
-rw-r--r-- 1 u1001 u1001 3771 Feb 25 2020 .bashrc
-rw-r--r-- 1 u1001 u1001 807 Feb 25 2020 .profile
-rw-r--r-- 1 u1001 u1001 0 Oct 16 16:11 .sudo_as_admin_successful
-rw------- 1 u1001 u1001 1144 Oct 28 00:43 .viminfo
u1001@f2-vm:/$ touch /mnt/my-file
u1001@f2-vm:/$ setfacl -m u:1001:rwx /mnt/my-file
u1001@f2-vm:/$ sudo setcap -n 1001 cap_net_raw+ep /mnt/my-file
u1001@f2-vm:/$ ls -al /mnt/my-file
-rw-rwxr--+ 1 u1001 u1001 0 Oct 28 22:14 /mnt/my-file
u1001@f2-vm:/$ ls -al /home/ubuntu/my-file
-rw-rwxr--+ 1 ubuntu ubuntu 0 Oct 28 22:14 /home/ubuntu/my-file
u1001@f2-vm:/$ getfacl /mnt/my-file
getfacl: Removing leading '/' from absolute path names
# file: mnt/my-file
# owner: u1001
# group: u1001
user::rw-
user:u1001:rwx
group::rw-
mask::rwx
other::r--
u1001@f2-vm:/$ getfacl /home/ubuntu/my-file
getfacl: Removing leading '/' from absolute path names
# file: home/ubuntu/my-file
# owner: ubuntu
# group: ubuntu
user::rw-
user:ubuntu:rwx
group::rw-
mask::rwx
other::r--"
* tag 'idmapped-mounts-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: (41 commits)
xfs: remove the possibly unused mp variable in xfs_file_compat_ioctl
xfs: support idmapped mounts
ext4: support idmapped mounts
fat: handle idmapped mounts
tests: add mount_setattr() selftests
fs: introduce MOUNT_ATTR_IDMAP
fs: add mount_setattr()
fs: add attr_flags_to_mnt_flags helper
fs: split out functions to hold writers
namespace: only take read lock in do_reconfigure_mnt()
mount: make {lock,unlock}_mount_hash() static
namespace: take lock_mount_hash() directly when changing flags
nfs: do not export idmapped mounts
overlayfs: do not mount on top of idmapped mounts
ecryptfs: do not mount on top of idmapped mounts
ima: handle idmapped mounts
apparmor: handle idmapped mounts
fs: make helpers idmap mount aware
exec: handle idmapped mounts
would_dump: handle idmapped mounts
...
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEEjSMCCC7+cjo3nszSa3kkZrA+cVoFAmArRwIUHHpvaGFyQGxp
bnV4LmlibS5jb20ACgkQa3kkZrA+cVo6JxAAkZHDhv6Zv7FfVsFjE7yDJwRFBu4o
jnAowPxa/xl6MlR2ICTFLHjOimEcpvzySO2IM85WCxjRYaNevITOxEZE+qfE/Byo
K1MuZOSXXBa2+AgO1Tku+ZNrQvzTsphgtvhlSD9ReN7P84C/rxG5YDomME+8/6rR
QH7Ly/izyc3VNKq7nprT8F2boJ0UxpcwNHZiH2McQD3UvUaZOecwpcpvth5pbgad
Ej2r72Q+IR0voqM/T1dc4TjW5Wcw/m27vhGQoOfQ5f+as5r9r1cPSWj0wRJTkATo
F/SiKuyWUwOGkRO8I9aaXXzTBgcJw/7MmZe8yNDg5QJrUzD8F5cdjlHZdsnz5BJq
tLo4kUsR4xMePEppJ4a10ZUDQa737j97C20xTwOHf6mKGIqmoooGAsjW9xUyYqHU
rYuLP4qB7ua4j8Uz9zVJazjgQWPQ+8Ad9MkjQLLhr00Azpz4mVweWVGjCJQC0pky
Jr2H4xj3JLAoygqMWfJxr9aVBpfy4Wmo0U29ryZuxZUr178qSXoL3QstGWXRa2MN
TwzpgHi1saItQ6iXAO0HB6Tsw0h8INyjrm7c3ANbmBwMsYMYeKcTG87+Z0LkK82w
C5SW2uQT9aLBXx9lZx8z0RpxygO1cW+KjlZxRYSfQa/ev/aF2kBz0ruGQgvqai4K
ceh/cwrYjrCbFVc=
=mojv
-----END PGP SIGNATURE-----
Merge tag 'integrity-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity
Pull IMA updates from Mimi Zohar:
"New is IMA support for measuring kernel critical data, as per usual
based on policy. The first example measures the in memory SELinux
policy. The second example measures the kernel version.
In addition are four bug fixes to address memory leaks and a missing
'static' function declaration"
* tag 'integrity-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
integrity: Make function integrity_add_key() static
ima: Free IMA measurement buffer after kexec syscall
ima: Free IMA measurement buffer on error
IMA: Measure kernel version in early boot
selinux: include a consumer of the new IMA critical data hook
IMA: define a builtin critical data measurement policy
IMA: extend critical data hook to limit the measurement based on a label
IMA: limit critical data measurement based on a label
IMA: add policy rule to measure critical data
IMA: define a hook to measure kernel integrity critical data
IMA: add support to measure buffer data hash
IMA: generalize keyring specific measurement constructs
evm: Fix memleak in init_desc
When interacting with user namespace and non-user namespace aware
filesystem capabilities the vfs will perform various security checks to
determine whether or not the filesystem capabilities can be used by the
caller, whether they need to be removed and so on. The main
infrastructure for this resides in the capability codepaths but they are
called through the LSM security infrastructure even though they are not
technically an LSM or optional. This extends the existing security hooks
security_inode_removexattr(), security_inode_killpriv(),
security_inode_getsecurity() to pass down the mount's user namespace and
makes them aware of idmapped mounts.
In order to actually get filesystem capabilities from disk the
capability infrastructure exposes the get_vfs_caps_from_disk() helper.
For user namespace aware filesystem capabilities a root uid is stored
alongside the capabilities.
In order to determine whether the caller can make use of the filesystem
capability or whether it needs to be ignored it is translated according
to the superblock's user namespace. If it can be translated to uid 0
according to that id mapping the caller can use the filesystem
capabilities stored on disk. If we are accessing the inode that holds
the filesystem capabilities through an idmapped mount we map the root
uid according to the mount's user namespace. Afterwards the checks are
identical to non-idmapped mounts: reading filesystem caps from disk
enforces that the root uid associated with the filesystem capability
must have a mapping in the superblock's user namespace and that the
caller is either in the same user namespace or is a descendant of the
superblock's user namespace. For filesystems that are mountable inside
user namespace the caller can just mount the filesystem and won't
usually need to idmap it. If they do want to idmap it they can create an
idmapped mount and mark it with a user namespace they created and which
is thus a descendant of s_user_ns. For filesystems that are not
mountable inside user namespaces the descendant rule is trivially true
because the s_user_ns will be the initial user namespace.
If the initial user namespace is passed nothing changes so non-idmapped
mounts will see identical behavior as before.
Link: https://lore.kernel.org/r/20210121131959.646623-11-christian.brauner@ubuntu.com
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: James Morris <jamorris@linux.microsoft.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
When interacting with extended attributes the vfs verifies that the
caller is privileged over the inode with which the extended attribute is
associated. For posix access and posix default extended attributes a uid
or gid can be stored on-disk. Let the functions handle posix extended
attributes on idmapped mounts. If the inode is accessed through an
idmapped mount we need to map it according to the mount's user
namespace. Afterwards the checks are identical to non-idmapped mounts.
This has no effect for e.g. security xattrs since they don't store uids
or gids and don't perform permission checks on them like posix acls do.
Link: https://lore.kernel.org/r/20210121131959.646623-10-christian.brauner@ubuntu.com
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: James Morris <jamorris@linux.microsoft.com>
Signed-off-by: Tycho Andersen <tycho@tycho.pizza>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
The inode_owner_or_capable() helper determines whether the caller is the
owner of the inode or is capable with respect to that inode. Allow it to
handle idmapped mounts. If the inode is accessed through an idmapped
mount it according to the mount's user namespace. Afterwards the checks
are identical to non-idmapped mounts. If the initial user namespace is
passed nothing changes so non-idmapped mounts will see identical
behavior as before.
Similarly, allow the inode_init_owner() helper to handle idmapped
mounts. It initializes a new inode on idmapped mounts by mapping the
fsuid and fsgid of the caller from the mount's user namespace. If the
initial user namespace is passed nothing changes so non-idmapped mounts
will see identical behavior as before.
Link: https://lore.kernel.org/r/20210121131959.646623-7-christian.brauner@ubuntu.com
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: James Morris <jamorris@linux.microsoft.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
SELinux stores the active policy in memory, so the changes to this data
at runtime would have an impact on the security guarantees provided
by SELinux. Measuring in-memory SELinux policy through IMA subsystem
provides a secure way for the attestation service to remotely validate
the policy contents at runtime.
Measure the hash of the loaded policy by calling the IMA hook
ima_measure_critical_data(). Since the size of the loaded policy
can be large (several MB), measure the hash of the policy instead of
the entire policy to avoid bloating the IMA log entry.
To enable SELinux data measurement, the following steps are required:
1, Add "ima_policy=critical_data" to the kernel command line arguments
to enable measuring SELinux data at boot time.
For example,
BOOT_IMAGE=/boot/vmlinuz-5.10.0-rc1+ root=UUID=fd643309-a5d2-4ed3-b10d-3c579a5fab2f ro nomodeset security=selinux ima_policy=critical_data
2, Add the following rule to /etc/ima/ima-policy
measure func=CRITICAL_DATA label=selinux
Sample measurement of the hash of SELinux policy:
To verify the measured data with the current SELinux policy run
the following commands and verify the output hash values match.
sha256sum /sys/fs/selinux/policy | cut -d' ' -f 1
grep "selinux-policy-hash" /sys/kernel/security/integrity/ima/ascii_runtime_measurements | tail -1 | cut -d' ' -f 6
Note that the actual verification of SELinux policy would require loading
the expected policy into an identical kernel on a pristine/known-safe
system and run the sha256sum /sys/kernel/selinux/policy there to get
the expected hash.
Signed-off-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
Suggested-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Reviewed-by: Tyler Hicks <tyhicks@linux.microsoft.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
This change uses the anon_inodes and LSM infrastructure introduced in
the previous patches to give SELinux the ability to control
anonymous-inode files that are created using the new
anon_inode_getfd_secure() function.
A SELinux policy author detects and controls these anonymous inodes by
adding a name-based type_transition rule that assigns a new security
type to anonymous-inode files created in some domain. The name used
for the name-based transition is the name associated with the
anonymous inode for file listings --- e.g., "[userfaultfd]" or
"[perf_event]".
Example:
type uffd_t;
type_transition sysadm_t sysadm_t : anon_inode uffd_t "[userfaultfd]";
allow sysadm_t uffd_t:anon_inode { create };
(The next patch in this series is necessary for making userfaultfd
support this new interface. The example above is just
for exposition.)
Signed-off-by: Daniel Colascione <dancol@google.com>
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
When a superblock is assigned the SECURITY_FS_USE_XATTR behavior by the
policy yet it lacks xattr support, try to fall back to genfs rather than
rejecting the mount. If a genfscon rule is found for the filesystem,
then change the behavior to SECURITY_FS_USE_GENFS, otherwise reject the
mount as before. A similar fallback is already done in security_fs_use()
if no behavior specification is found for the given filesystem.
This is needed e.g. for virtiofs, which may or may not support xattrs
depending on the backing host filesystem.
Example:
# seinfo --genfs | grep ' ramfs'
genfscon ramfs / system_u:object_r:ramfs_t:s0
# echo '(fsuse xattr ramfs (system_u object_r fs_t ((s0) (s0))))' >ramfs_xattr.cil
# semodule -i ramfs_xattr.cil
# mount -t ramfs none /mnt
Before:
mount: /mnt: mount(2) system call failed: Operation not supported.
After:
(mount succeeds)
# ls -Zd /mnt
system_u:object_r:ramfs_t:s0 /mnt
See also:
https://lore.kernel.org/selinux/20210105142148.GA3200@redhat.com/T/https://github.com/fedora-selinux/selinux-policy/pull/478
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
This is motivated by a perfomance regression of selinux_xfrm_enabled()
that happened on a RHEL kernel due to false sharing between
selinux_xfrm_refcount and (the late) selinux_ss.policy_rwlock (i.e. the
.bss section memory layout changed such that they happened to share the
same cacheline). Since the policy rwlock's memory region was modified
upon each read-side critical section, the readers of
selinux_xfrm_refcount had frequent cache misses, eventually leading to a
significant performance degradation under a TCP SYN flood on a system
with many cores (32 in this case, but it's detectable on less cores as
well).
While upstream has since switched to RCU locking, so the same can no
longer happen here, selinux_xfrm_refcount could still share a cacheline
with another frequently written region, thus marking it __read_mostly
still makes sense. __read_mostly helps, because it will put the symbol
in a separate section along with other read-mostly variables, so there
should never be a clash with frequently written data.
Since selinux_xfrm_refcount is modified only in case of an explicit
action, it should be safe to do this (i.e. it shouldn't disrupt other
read-mostly variables too much).
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
All of these are never modified outside initcalls, so they can be
__ro_after_init.
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
It is not referenced outside selinuxfs.c, so remove its extern header
declaration and make it static.
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Its value is actually not changed anywhere, so it can be substituted for
a direct call to audit_update_lsm_rules().
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
All of sel_ib_pkey_list, sel_netif_list, sel_netnode_list, and
sel_netport_list are declared but never used. Remove them.
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
When inode has no listxattr op of its own (e.g. squashfs) vfs_listxattr
calls the LSM inode_listsecurity hooks to list the xattrs that LSMs will
intercept in inode_getxattr hooks.
When selinux LSM is installed but not initialized, it will list the
security.selinux xattr in inode_listsecurity, but will not intercept it
in inode_getxattr. This results in -ENODATA for a getxattr call for an
xattr returned by listxattr.
This situation was manifested as overlayfs failure to copy up lower
files from squashfs when selinux is built-in but not initialized,
because ovl_copy_xattr() iterates the lower inode xattrs by
vfs_listxattr() and vfs_getxattr().
Match the logic of inode_listsecurity to that of inode_getxattr and
do not list the security.selinux xattr if selinux is not initialized.
Reported-by: Michael Labriola <michael.d.labriola@gmail.com>
Tested-by: Michael Labriola <michael.d.labriola@gmail.com>
Link: https://lore.kernel.org/linux-unionfs/2nv9d47zt7.fsf@aldarion.sourceruckus.org/
Fixes: c8e222616c ("selinux: allow reading labels before policy is loaded")
Cc: stable@vger.kernel.org#v5.9+
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
The MPTCP protocol uses a specific protocol value, even if
it's an extension to TCP. Additionally, MPTCP sockets
could 'fall-back' to TCP at run-time, depending on peer MPTCP
support and available resources.
As a consequence of the specific protocol number, selinux
applies the raw_socket class to MPTCP sockets.
Existing TCP application converted to MPTCP - or forced to
use MPTCP socket with user-space hacks - will need an
updated policy to run successfully.
This change lets selinux attach the TCP socket class to
MPTCP sockets, too, so that no policy changes are needed in
the above scenario.
Note that the MPTCP is setting, propagating and updating the
security context on all the subflows and related request
socket.
Link: https://lore.kernel.org/linux-security-module/CAHC9VhTaK3xx0hEGByD2zxfF7fadyPP1kb-WeWH_YCyq9X-sRg@mail.gmail.com/T/#t
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
[PM: tweaked subject's prefix]
Signed-off-by: Paul Moore <paul@paul-moore.com>
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAl/YBtEUHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXNnwA/9Ek8DG/1t8CEoJxpoRvwovQxNo+bi
0rCT9vqvx9PeCwoZi/0Vp6oKmpE1HADvbeB/+e00VrbLYnzE3oRY6VkpjoZRofKS
vc0/MzHSFxFUR1OTHwCefcXlPLK+bfitQbX5jEMeVyQCXNXXIrN7CnJf1LmCeLTR
kQBPlEN9lt7HyNVAi34FhOD/TQbWnFHgl2z5puffgri6cWnc+TALKMYytUZ+rYex
NYndDJW5b3g5kTat2eErn0FruxfzloGs0xMIiWb+z2i9kl41D+dkKPdAN7idqCSC
Jv0nJP/bDftzA0wOe9szmGaLQzu7YnCN5kiWcSspatZVnon42Cy/tp9tiuPGLRFU
XtelDfpyX6o3CLN0tX7LQEO+GYxPzvM6iaR2OrsChWPozUIIR3TLQg7jJN4bvNKl
TR6gCGZCoAeS5JLNGjzVKxT/oKQY+tCLLlYXQdQY6swNFi3EKmPr+K1D9lgm98fO
f3d1QmWiZZNmtxxoVogT0qoQYjkfgpnm3dVx813Vt+lwHlVpHGMEPpO27iD3/RYb
w2yWOJaGKwMD8iL0l+Cm6CPW0/nE5FFISQjWgC8b4Vgxlyan6+L9eViqGICkrUQ2
Edo0i1YFFZ4utHYkDf1VYBbJ+36KyCtdktgLAcbgnePiPB3E1XBsXTIIStSUIbVQ
iEbTkBlsCG4GIeU=
=6Cqb
-----END PGP SIGNATURE-----
Merge tag 'selinux-pr-20201214' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull selinux updates from Paul Moore:
"While we have a small number of SELinux patches for v5.11, there are a
few changes worth highlighting:
- Change the LSM network hooks to pass flowi_common structs instead
of the parent flowi struct as the LSMs do not currently need the
full flowi struct and they do not have enough information to use it
safely (missing information on the address family).
This patch was discussed both with Herbert Xu (representing team
netdev) and James Morris (representing team
LSMs-other-than-SELinux).
- Fix how we handle errors in inode_doinit_with_dentry() so that we
attempt to properly label the inode on following lookups instead of
continuing to treat it as unlabeled.
- Tweak the kernel logic around allowx, auditallowx, and dontauditx
SELinux policy statements such that the auditx/dontauditx are
effective even without the allowx statement.
Everything passes our test suite"
* tag 'selinux-pr-20201214' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
lsm,selinux: pass flowi_common instead of flowi to the LSM hooks
selinux: Fix fall-through warnings for Clang
selinux: drop super_block backpointer from superblock_security_struct
selinux: fix inode_doinit_with_dentry() LABEL_INVALID error handling
selinux: allow dontauditx and auditallowx rules to take effect without allowx
selinux: fix error initialization in inode_doinit_with_dentry()
A followup change to tcp_request_sock_op would have to drop the 'const'
qualifier from the 'route_req' function as the
'security_inet_conn_request' call is moved there - and that function
expects a 'struct sock *'.
However, it turns out its also possible to add a const qualifier to
security_inet_conn_request instead.
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: James Morris <jamorris@linux.microsoft.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
As pointed out by Herbert in a recent related patch, the LSM hooks do
not have the necessary address family information to use the flowi
struct safely. As none of the LSMs currently use any of the protocol
specific flowi information, replace the flowi pointers with pointers
to the address family independent flowi_common struct.
Reported-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: James Morris <jamorris@linux.microsoft.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
In preparation to enable -Wimplicit-fallthrough for Clang, fix a warning
by explicitly adding a break statement instead of letting the code fall
through to the next case.
Link: https://github.com/KSPP/linux/issues/115
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Paul Moore <paul@paul-moore.com>
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAl+vD0MUHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXOHZw//SbZ4c32LGxANDJrASdRWGRYzUsSm
cQZrLR373TMMmawZDenvRK0Wfa8vj+BK5V5knBXXTDFq5M+TVtlzhRhED4ZH40RP
CmaL+XC24ed1IPsYiZZ0lytXrDnb7EgmGaSjHP7RttBdLG/2ABamm9FNZFpiFJjV
y9txgNyJ9NvrdFCO+UoLEm9BFq4x/Ddzu+BAEqu1iSGaKXufiBlq9uQ+3qommJlx
Bv+i6HeqCruTj+y8p8tkR7aLma9jsbqCpBlQkG1ja4pgxMXwC2uRbNzz//dlxWZ5
pKtz8HJqi+myla/rwNZAA82uZlkFnEOp7Yy404Deu73WznMdrQ1oJqlGDrXdafKn
BJwpcUQnJ7NP2MK1JrMbxdVllUKsZF2ICP0t3CFTuuyJ2CakOuaSW+ERBvbO6NYa
G7h9Ll8AO4ADbG3W+/c6nybDt8AkjBLJTg7uXWO3LIjjZ/UJwJxO4YwzYbaMdg6Q
7zVNj7j36YLHhv9Hc60vhCjO53KXYo60Z6TCblkeyClifvM8Gx109JkOwPXWFGIY
Bh7KbKtGr2dZoFxU2yS5oQgtbzJhsfJQGYsF8uANHcj76hQ/rh+Ff1NKKofHaRSl
umQSgNbekaQjnmq08otVlIHtjdaG+or1PFW2Yep37OwsCXwKrFG4YZRY9FdpwQeG
Cjx3JAJTibvjMqg=
=qpR+
-----END PGP SIGNATURE-----
Merge tag 'selinux-pr-20201113' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull selinux fix from Paul Moore:
"One small SELinux patch to make sure we return an error code when an
allocation fails. It passes all of our tests, but given the nature of
the patch that isn't surprising"
* tag 'selinux-pr-20201113' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: Fix error return code in sel_ib_pkey_sid_slow()
Fix to return a negative error code from the error handling case
instead of 0 in function sel_ib_pkey_sid_slow(), as done elsewhere
in this function.
Cc: stable@vger.kernel.org
Fixes: 409dcf3153 ("selinux: Add a cache for quicker retreival of PKey SIDs")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
It appears to have been needed for selinux_complete_init() in the past,
but today it's useless.
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
A previous fix, commit 83370b31a9 ("selinux: fix error initialization
in inode_doinit_with_dentry()"), changed how failures were handled
before a SELinux policy was loaded. Unfortunately that patch was
potentially problematic for two reasons: it set the isec->initialized
state without holding a lock, and it didn't set the inode's SELinux
label to the "default" for the particular filesystem. The later can
be a problem if/when a later attempt to revalidate the inode fails
and SELinux reverts to the existing inode label.
This patch should restore the default inode labeling that existed
before the original fix, without affecting the LABEL_INVALID marking
such that revalidation will still be attempted in the future.
Fixes: 83370b31a9 ("selinux: fix error initialization in inode_doinit_with_dentry()")
Reported-by: Sven Schnelle <svens@linux.ibm.com>
Tested-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
This allows for dontauditing very specific ioctls e.g. TCGETS without
dontauditing every ioctl or granting additional permissions.
Now either an allowx, dontauditx or auditallowx rules enables checking
for extended permissions.
Signed-off-by: Jonathan Hettwer <j2468h@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Mark the inode security label as invalid if we cannot find
a dentry so that we will retry later rather than marking it
initialized with the unlabeled SID.
Fixes: 9287aed2ad ("selinux: Convert isec->lock into a spinlock")
Signed-off-by: Tianyue Ren <rentianyue@kylinos.cn>
[PM: minor comment tweaks]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Here is the big set of char, misc, and other assorted driver subsystem
patches for 5.10-rc1.
There's a lot of different things in here, all over the drivers/
directory. Some summaries:
- soundwire driver updates
- habanalabs driver updates
- extcon driver updates
- nitro_enclaves new driver
- fsl-mc driver and core updates
- mhi core and bus updates
- nvmem driver updates
- eeprom driver updates
- binder driver updates and fixes
- vbox minor bugfixes
- fsi driver updates
- w1 driver updates
- coresight driver updates
- interconnect driver updates
- misc driver updates
- other minor driver updates
All of these have been in linux-next for a while with no reported
issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCX4g8YQ8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+yngKgCeNpArCP/9vQJRK9upnDm8ZLunSCUAn1wUT/2A
/bTQ42c/WRQ+LU828GSM
=6sO2
-----END PGP SIGNATURE-----
Merge tag 'char-misc-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver updates from Greg KH:
"Here is the big set of char, misc, and other assorted driver subsystem
patches for 5.10-rc1.
There's a lot of different things in here, all over the drivers/
directory. Some summaries:
- soundwire driver updates
- habanalabs driver updates
- extcon driver updates
- nitro_enclaves new driver
- fsl-mc driver and core updates
- mhi core and bus updates
- nvmem driver updates
- eeprom driver updates
- binder driver updates and fixes
- vbox minor bugfixes
- fsi driver updates
- w1 driver updates
- coresight driver updates
- interconnect driver updates
- misc driver updates
- other minor driver updates
All of these have been in linux-next for a while with no reported
issues"
* tag 'char-misc-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (396 commits)
binder: fix UAF when releasing todo list
docs: w1: w1_therm: Fix broken xref, mistakes, clarify text
misc: Kconfig: fix a HISI_HIKEY_USB dependency
LSM: Fix type of id parameter in kernel_post_load_data prototype
misc: Kconfig: add a new dependency for HISI_HIKEY_USB
firmware_loader: fix a kernel-doc markup
w1: w1_therm: make w1_poll_completion static
binder: simplify the return expression of binder_mmap
test_firmware: Test partial read support
firmware: Add request_partial_firmware_into_buf()
firmware: Store opt_flags in fw_priv
fs/kernel_file_read: Add "offset" arg for partial reads
IMA: Add support for file reads without contents
LSM: Add "contents" flag to kernel_read_file hook
module: Call security_kernel_post_load_data()
firmware_loader: Use security_post_load_data()
LSM: Introduce kernel_post_load_data() hook
fs/kernel_read_file: Add file_size output argument
fs/kernel_read_file: Switch buffer size arg to size_t
fs/kernel_read_file: Remove redundant size argument
...