All PFN_* pfn_t flags have been removed. Therefore there is no longer a
need for the pfn_t type and all uses can be replaced with normal pfns.
Link: https://lkml.kernel.org/r/bbedfa576c9822f8032494efbe43544628698b1f.1750323463.git-series.apopple@nvidia.com
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Balbir Singh <balbirs@nvidia.com>
Cc: Björn Töpel <bjorn@kernel.org>
Cc: Björn Töpel <bjorn@rivosinc.com>
Cc: Chunyan Zhang <zhang.lyra@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Deepak Gupta <debug@rivosinc.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Inki Dae <m.szyprowski@samsung.com>
Cc: John Groves <john@groves.net>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
PFN_DEV was used by callers of dax_direct_access() to figure out if the
returned PFN is associated with a page using pfn_t_has_page() or not.
However all DAX PFNs now require an assoicated ZONE_DEVICE page so can
assume a page exists.
Other users of PFN_DEV were setting it before calling vmf_insert_mixed().
This is unnecessary as it is no longer checked, instead relying on
pfn_valid() to determine if there is an associated page or not.
Link: https://lkml.kernel.org/r/74b293aebc21b941090bc3e7aeafa91b57c821a5.1750323463.git-series.apopple@nvidia.com
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: Balbir Singh <balbirs@nvidia.com>
Cc: Björn Töpel <bjorn@kernel.org>
Cc: Björn Töpel <bjorn@rivosinc.com>
Cc: Chunyan Zhang <zhang.lyra@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Deepak Gupta <debug@rivosinc.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Inki Dae <m.szyprowski@samsung.com>
Cc: John Groves <john@groves.net>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Convert all strcpy() usages to strscpy() where the conversion means
just replacing strcpy() with strscpy(). strcpy() is deprecated since
it performs no bounds checking on the destination buffer.
Reviewed-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
The dcssblk driver has long needed special case supoprt to enable limited
dax operation, so called CONFIG_FS_DAX_LIMITED. This mode works around
the incomplete support for ZONE_DEVICE on s390 by forgoing the ability of
dax-mapped pages to support GUP.
Now, pending cleanups to fsdax that fix its reference counting [1] depend
on the ability of all dax drivers to supply ZONE_DEVICE pages.
To allow that work to move forward, dax support needs to be paused for
dcssblk until ZONE_DEVICE support arrives. That work has been known for a
few years [2], and the removal of "pte_devmap" requirements [3] makes the
conversion easier.
For now, place the support behind CONFIG_BROKEN, and remove PFN_SPECIAL
(dcssblk was the only user).
Link: http://lore.kernel.org/cover.9f0e45d52f5cff58807831b6b867084d0b14b61c.1725941415.git-series.apopple@nvidia.com [1]
Link: http://lore.kernel.org/20210820210318.187742e8@thinkpad/ [2]
Link: http://lore.kernel.org/4511465a4f8429f45e2ac70d2e65dc5e1df1eb47.1725941415.git-series.apopple@nvidia.com [3]
Link: https://lkml.kernel.org/r/33eef2379c0d240f40cc15453fad2df1a4ae34c8.1740713401.git-series.apopple@nvidia.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Tested-by: Alexander Gordeev <agordeev@linux.ibm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Tested-by: Alison Schofield <alison.schofield@intel.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Asahi Lina <lina@asahilina.net>
Cc: Balbir Singh <balbirs@nvidia.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chunyan Zhang <zhang.lyra@gmail.com>
Cc: "Darrick J. Wong" <djwong@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: linmiaohe <linmiaohe@huawei.com>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Michael "Camp Drill Sergeant" Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ted Ts'o <tytso@mit.edu>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Per Documentation/filesystems/sysfs.rst, sysfs_emit() is preferred for
presenting attributes to user space in sysfs. Convert the left-over uses
in the block/dcssblk code.
Signed-off-by: Mete Durlu <meted@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
- Remove restrictions on PAI NNPA and crypto counters, enabling
concurrent per-task and system-wide sampling and counting events
- Switch to GENERIC_CPU_DEVICES by setting up the CPU present mask in
the architecture code and letting the generic code handle CPU bring-up
- Add support for the diag204 busy indication facility to prevent
undesirable blocking during hypervisor logical CPU utilization
queries. Implement results caching
- Improve the handling of Store Data SCLP events by suppressing
unnecessary warning, preventing buffer release in I/O during failures,
and adding timeout handling for Store Data requests to address potential
firmware issues
- Provide optimized __arch_hweight*() implementations
- Remove the unnecessary CPU KOBJ_CHANGE uevents generated during topology
updates, as they are unused and also not present on other architectures
- Cleanup atomic_ops, optimize __atomic_set() for small values and
__atomic_cmpxchg_bool() for compilers supporting flag output constraint
- Couple of cleanups for KVM:
- Move and improve KVM struct definitions for DAT tables from gaccess.c
to a new header
- Pass the asce as parameter to sie64a()
- Make the crdte() and cspg() page table handling wrappers return a
boolean to indicate success, like the other existing "compare and swap"
wrappers
- Add documentation for HWCAP flags
- Switch to obtaining total RAM pages from memblock instead of
totalram_pages() during mm init, to ensure correct calculation of zero
page size, when defer_init is enabled
- Refactor lowcore access and switch to using the get_lowcore() function
instead of the S390_lowcore macro
- Cleanups for PG_arch_1 and folio handling in UV and hugetlb code
- Add missing MODULE_DESCRIPTION() macros
- Fix VM_FAULT_HWPOISON handling in do_exception()
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEE3QHqV+H2a8xAv27vjYWKoQLXFBgFAmaYGegACgkQjYWKoQLX
FBjCwwf/aRYoLIXCa9/nHGWFiUjZm6xBgVwZh55bXjfNG9TI2J9UZSsYlOFGUJKl
gvD2Ym+LqAejK8R4EUHkfD6ftaKMQuIxNDoedxhwuSpfOQ2mZ5teu0MxTh8QcUAx
4Y2w5XEeCuqE3SuoZ4SJa58K4rGl4cFpPsKNa8ofdzH1ZLFNe8Wqzis4kh0htqLb
FtPj6nsgfzQ5kg14rVkGxCa4CqoFxonXgsA6nH6xZLbxKUInyq8uV44UBQ+aJq5v
dsdzZ5XuAJHN2FpBuuOYQYZYw3XIy/kka7o4EjffORi5SGCRMWO4Zt0P6HXaNkh6
xV8EEO8myeo7rV8dnrk1V4yGjGJmfA==
=3IGY
-----END PGP SIGNATURE-----
Merge tag 's390-6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Vasily Gorbik:
- Remove restrictions on PAI NNPA and crypto counters, enabling
concurrent per-task and system-wide sampling and counting events
- Switch to GENERIC_CPU_DEVICES by setting up the CPU present mask in
the architecture code and letting the generic code handle CPU
bring-up
- Add support for the diag204 busy indication facility to prevent
undesirable blocking during hypervisor logical CPU utilization
queries. Implement results caching
- Improve the handling of Store Data SCLP events by suppressing
unnecessary warning, preventing buffer release in I/O during
failures, and adding timeout handling for Store Data requests to
address potential firmware issues
- Provide optimized __arch_hweight*() implementations
- Remove the unnecessary CPU KOBJ_CHANGE uevents generated during
topology updates, as they are unused and also not present on other
architectures
- Cleanup atomic_ops, optimize __atomic_set() for small values and
__atomic_cmpxchg_bool() for compilers supporting flag output
constraint
- Couple of cleanups for KVM:
- Move and improve KVM struct definitions for DAT tables from
gaccess.c to a new header
- Pass the asce as parameter to sie64a()
- Make the crdte() and cspg() page table handling wrappers return a
boolean to indicate success, like the other existing "compare and
swap" wrappers
- Add documentation for HWCAP flags
- Switch to obtaining total RAM pages from memblock instead of
totalram_pages() during mm init, to ensure correct calculation of
zero page size, when defer_init is enabled
- Refactor lowcore access and switch to using the get_lowcore()
function instead of the S390_lowcore macro
- Cleanups for PG_arch_1 and folio handling in UV and hugetlb code
- Add missing MODULE_DESCRIPTION() macros
- Fix VM_FAULT_HWPOISON handling in do_exception()
* tag 's390-6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (54 commits)
s390/mm: Fix VM_FAULT_HWPOISON handling in do_exception()
s390/kvm: Move bitfields for dat tables
s390/entry: Pass the asce as parameter to sie64a()
s390/sthyi: Use cached data when diag is busy
s390/sthyi: Move diag operations
s390/hypfs_diag: Diag204 busy loop
s390/diag: Add busy-indication-facility requirements
s390/diag: Diag204 add busy return errno
s390/diag: Return errno's from diag204
s390/sclp: Diag204 busy indication facility detection
s390/atomic_ops: Make use of flag output constraint
s390/atomic_ops: Improve __atomic_set() for small values
s390/atomic_ops: Use symbolic names
s390/smp: Switch to GENERIC_CPU_DEVICES
s390/hwcaps: Add documentation for HWCAP flags
s390/pgtable: Make crdte() and cspg() return a value
s390/topology: Remove CPU KOBJ_CHANGE uevents
s390/sclp: Add timeout to Store Data requests
s390/sclp: Prevent release of buffer in I/O
s390/sclp: Suppress unnecessary Store Data warning
...
With ARCH=s390, make allmodconfig && make W=1 C=1 reports:
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/s390/block/dcssblk.o
Add the missing invocation of the MODULE_DESCRIPTION() macro.
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Link: https://lore.kernel.org/r/20240615-md-s390-drivers-s390-block-dcssblk-v1-1-d9d19703abcb@quicinc.com
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Move the dax flag into the queue_limits feature field so that it can be
set atomically with the queue frozen.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20240617060532.127975-21-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
- Various virtual vs physical address usage fixes
- Add new bitwise types and helper functions and use them in s390 specific
drivers and code to make it easier to find virtual vs physical address
usage bugs. Right now virtual and physical addresses are identical for
s390, except for module, vmalloc, and similar areas. This will be
changed, hopefully with the next merge window, so that e.g. the kernel
image and modules will be located close to each other, allowing for
direct branches and also for some other simplifications.
As a prerequisite this requires to fix all misuses of virtual and
physical addresses. As it turned out people are so used to the concept
that virtual and physical addresses are the same, that new bugs got added
to code which was already fixed. In order to avoid that even more code
gets merged which adds such bugs add and use new bitwise types, so that
sparse can be used to find such usage bugs.
Most likely the new types can go away again after some time
- Provide a simple ARCH_HAS_DEBUG_VIRTUAL implementation
- Fix kprobe branch handling: if an out-of-line single stepped relative
branch instruction has a target address within a certain address area in
the entry code, the program check handler may incorrectly execute cleanup
code as if KVM code was executed, leading to crashes
- Fix reference counting of zcrypt card objects
- Various other small fixes and cleanups
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEECMNfWEw3SLnmiLkZIg7DeRspbsIFAmX5lIYACgkQIg7DeRsp
bsJYxA//cYGSaopFLuHxQndQi5UcMx0NMcaD9auMyDraPWJH0/F+vIWnLPxgJ1wB
zfav3Ssvdd/41rPIXHSGxUzXzv+EHCY5+91t4plfCxWtd1SbpJ8gG2vrXTH512ql
RhJ7crRFQqoigIxlsdwUkwGSqd+L6H73ikKzQyAaJFKC/e9BEpCH4zb8NuTqCeJE
a81/A8BGSIo4Fk4qj1T5bHZzkznmxtisMPGXzoKa28LKhzwqqbGZYeHohnkaT6co
UpY/HMdhGH74kkKqpF0cLDI0Bd8ptfjbcVcKyJ8OD7vCfinIM3bZdYg0KIzyRMhu
d49JDXctPiePXTCi8X+AJnhNYFGlHuDciEpTMzkw97kwhmCfAOW5UaAlBo8dJqat
zt76Cxbl3D+hYI7Wbs9lt2QsXso4/1fMMNQbu9pwyMKxHlCBVe+ZqNIl0gP8OAXZ
51pghcLlcwYNeYRkSzvfEhO9aeUsRg40O5UBCklVMRScrx7qie2wsllEFavb7vZd
Ejv89dvn0KtzYIHG+U5SQ9d0x1JL9qApVM06sCGZGsBM9r4hHFGa8x57Pr+51ZPs
j0qbr7iAWgCGXN8c3p/Nrev6eqBio81CpD9PWik7QpJS38EKkussqfPdQitgQpsi
7r0Sx51oBzyAVadmVAf8/flUrbJTvD3BdbkzN99W4BXyARJk7CI=
=Vh2x
-----END PGP SIGNATURE-----
Merge tag 's390-6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull more s390 updates from Heiko Carstens:
- Various virtual vs physical address usage fixes
- Add new bitwise types and helper functions and use them in s390
specific drivers and code to make it easier to find virtual vs
physical address usage bugs.
Right now virtual and physical addresses are identical for s390,
except for module, vmalloc, and similar areas. This will be changed,
hopefully with the next merge window, so that e.g. the kernel image
and modules will be located close to each other, allowing for direct
branches and also for some other simplifications.
As a prerequisite this requires to fix all misuses of virtual and
physical addresses. As it turned out people are so used to the
concept that virtual and physical addresses are the same, that new
bugs got added to code which was already fixed. In order to avoid
that even more code gets merged which adds such bugs add and use new
bitwise types, so that sparse can be used to find such usage bugs.
Most likely the new types can go away again after some time
- Provide a simple ARCH_HAS_DEBUG_VIRTUAL implementation
- Fix kprobe branch handling: if an out-of-line single stepped relative
branch instruction has a target address within a certain address area
in the entry code, the program check handler may incorrectly execute
cleanup code as if KVM code was executed, leading to crashes
- Fix reference counting of zcrypt card objects
- Various other small fixes and cleanups
* tag 's390-6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (41 commits)
s390/entry: compare gmap asce to determine guest/host fault
s390/entry: remove OUTSIDE macro
s390/entry: add CIF_SIE flag and remove sie64a() address check
s390/cio: use while (i--) pattern to clean up
s390/raw3270: make class3270 constant
s390/raw3270: improve raw3270_init() readability
s390/tape: make tape_class constant
s390/vmlogrdr: make vmlogrdr_class constant
s390/vmur: make vmur_class constant
s390/zcrypt: make zcrypt_class constant
s390/mm: provide simple ARCH_HAS_DEBUG_VIRTUAL support
s390/vfio_ccw_cp: use new address translation helpers
s390/iucv: use new address translation helpers
s390/ctcm: use new address translation helpers
s390/lcs: use new address translation helpers
s390/qeth: use new address translation helpers
s390/zfcp: use new address translation helpers
s390/tape: fix virtual vs physical address confusion
s390/3270: use new address translation helpers
s390/3215: use new address translation helpers
...
from hotplugged memory rather than only from main memory. Series
"implement "memmap on memory" feature on s390".
- More folio conversions from Matthew Wilcox in the series
"Convert memcontrol charge moving to use folios"
"mm: convert mm counter to take a folio"
- Chengming Zhou has optimized zswap's rbtree locking, providing
significant reductions in system time and modest but measurable
reductions in overall runtimes. The series is "mm/zswap: optimize the
scalability of zswap rb-tree".
- Chengming Zhou has also provided the series "mm/zswap: optimize zswap
lru list" which provides measurable runtime benefits in some
swap-intensive situations.
- And Chengming Zhou further optimizes zswap in the series "mm/zswap:
optimize for dynamic zswap_pools". Measured improvements are modest.
- zswap cleanups and simplifications from Yosry Ahmed in the series "mm:
zswap: simplify zswap_swapoff()".
- In the series "Add DAX ABI for memmap_on_memory", Vishal Verma has
contributed several DAX cleanups as well as adding a sysfs tunable to
control the memmap_on_memory setting when the dax device is hotplugged
as system memory.
- Johannes Weiner has added the large series "mm: zswap: cleanups",
which does that.
- More DAMON work from SeongJae Park in the series
"mm/damon: make DAMON debugfs interface deprecation unignorable"
"selftests/damon: add more tests for core functionalities and corner cases"
"Docs/mm/damon: misc readability improvements"
"mm/damon: let DAMOS feeds and tame/auto-tune itself"
- In the series "mm/mempolicy: weighted interleave mempolicy and sysfs
extension" Rakie Kim has developed a new mempolicy interleaving policy
wherein we allocate memory across nodes in a weighted fashion rather
than uniformly. This is beneficial in heterogeneous memory environments
appearing with CXL.
- Christophe Leroy has contributed some cleanup and consolidation work
against the ARM pagetable dumping code in the series "mm: ptdump:
Refactor CONFIG_DEBUG_WX and check_wx_pages debugfs attribute".
- Luis Chamberlain has added some additional xarray selftesting in the
series "test_xarray: advanced API multi-index tests".
- Muhammad Usama Anjum has reworked the selftest code to make its
human-readable output conform to the TAP ("Test Anything Protocol")
format. Amongst other things, this opens up the use of third-party
tools to parse and process out selftesting results.
- Ryan Roberts has added fork()-time PTE batching of THP ptes in the
series "mm/memory: optimize fork() with PTE-mapped THP". Mainly
targeted at arm64, this significantly speeds up fork() when the process
has a large number of pte-mapped folios.
- David Hildenbrand also gets in on the THP pte batching game in his
series "mm/memory: optimize unmap/zap with PTE-mapped THP". It
implements batching during munmap() and other pte teardown situations.
The microbenchmark improvements are nice.
- And in the series "Transparent Contiguous PTEs for User Mappings" Ryan
Roberts further utilizes arm's pte's contiguous bit ("contpte
mappings"). Kernel build times on arm64 improved nicely. Ryan's series
"Address some contpte nits" provides some followup work.
- In the series "mm/hugetlb: Restore the reservation" Breno Leitao has
fixed an obscure hugetlb race which was causing unnecessary page faults.
He has also added a reproducer under the selftest code.
- In the series "selftests/mm: Output cleanups for the compaction test",
Mark Brown did what the title claims.
- Kinsey Ho has added the series "mm/mglru: code cleanup and refactoring".
- Even more zswap material from Nhat Pham. The series "fix and extend
zswap kselftests" does as claimed.
- In the series "Introduce cpu_dcache_is_aliasing() to fix DAX
regression" Mathieu Desnoyers has cleaned up and fixed rather a mess in
our handling of DAX on archiecctures which have virtually aliasing data
caches. The arm architecture is the main beneficiary.
- Lokesh Gidra's series "per-vma locks in userfaultfd" provides dramatic
improvements in worst-case mmap_lock hold times during certain
userfaultfd operations.
- Some page_owner enhancements and maintenance work from Oscar Salvador
in his series
"page_owner: print stacks and their outstanding allocations"
"page_owner: Fixup and cleanup"
- Uladzislau Rezki has contributed some vmalloc scalability improvements
in his series "Mitigate a vmap lock contention". It realizes a 12x
improvement for a certain microbenchmark.
- Some kexec/crash cleanup work from Baoquan He in the series "Split
crash out from kexec and clean up related config items".
- Some zsmalloc maintenance work from Chengming Zhou in the series
"mm/zsmalloc: fix and optimize objects/page migration"
"mm/zsmalloc: some cleanup for get/set_zspage_mapping()"
- Zi Yan has taught the MM to perform compaction on folios larger than
order=0. This a step along the path to implementaton of the merging of
large anonymous folios. The series is named "Enable >0 order folio
memory compaction".
- Christoph Hellwig has done quite a lot of cleanup work in the
pagecache writeback code in his series "convert write_cache_pages() to
an iterator".
- Some modest hugetlb cleanups and speedups in Vishal Moola's series
"Handle hugetlb faults under the VMA lock".
- Zi Yan has changed the page splitting code so we can split huge pages
into sizes other than order-0 to better utilize large folios. The
series is named "Split a folio to any lower order folios".
- David Hildenbrand has contributed the series "mm: remove
total_mapcount()", a cleanup.
- Matthew Wilcox has sought to improve the performance of bulk memory
freeing in his series "Rearrange batched folio freeing".
- Gang Li's series "hugetlb: parallelize hugetlb page init on boot"
provides large improvements in bootup times on large machines which are
configured to use large numbers of hugetlb pages.
- Matthew Wilcox's series "PageFlags cleanups" does that.
- Qi Zheng's series "minor fixes and supplement for ptdesc" does that
also. S390 is affected.
- Cleanups to our pagemap utility functions from Peter Xu in his series
"mm/treewide: Replace pXd_large() with pXd_leaf()".
- Nico Pache has fixed a few things with our hugepage selftests in his
series "selftests/mm: Improve Hugepage Test Handling in MM Selftests".
- Also, of course, many singleton patches to many things. Please see
the individual changelogs for details.
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZfJpPQAKCRDdBJ7gKXxA
joxeAP9TrcMEuHnLmBlhIXkWbIR4+ki+pA3v+gNTlJiBhnfVSgD9G55t1aBaRplx
TMNhHfyiHYDTx/GAV9NXW84tasJSDgA=
=TG55
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2024-03-13-20-04' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- Sumanth Korikkar has taught s390 to allocate hotplug-time page frames
from hotplugged memory rather than only from main memory. Series
"implement "memmap on memory" feature on s390".
- More folio conversions from Matthew Wilcox in the series
"Convert memcontrol charge moving to use folios"
"mm: convert mm counter to take a folio"
- Chengming Zhou has optimized zswap's rbtree locking, providing
significant reductions in system time and modest but measurable
reductions in overall runtimes. The series is "mm/zswap: optimize the
scalability of zswap rb-tree".
- Chengming Zhou has also provided the series "mm/zswap: optimize zswap
lru list" which provides measurable runtime benefits in some
swap-intensive situations.
- And Chengming Zhou further optimizes zswap in the series "mm/zswap:
optimize for dynamic zswap_pools". Measured improvements are modest.
- zswap cleanups and simplifications from Yosry Ahmed in the series
"mm: zswap: simplify zswap_swapoff()".
- In the series "Add DAX ABI for memmap_on_memory", Vishal Verma has
contributed several DAX cleanups as well as adding a sysfs tunable to
control the memmap_on_memory setting when the dax device is
hotplugged as system memory.
- Johannes Weiner has added the large series "mm: zswap: cleanups",
which does that.
- More DAMON work from SeongJae Park in the series
"mm/damon: make DAMON debugfs interface deprecation unignorable"
"selftests/damon: add more tests for core functionalities and corner cases"
"Docs/mm/damon: misc readability improvements"
"mm/damon: let DAMOS feeds and tame/auto-tune itself"
- In the series "mm/mempolicy: weighted interleave mempolicy and sysfs
extension" Rakie Kim has developed a new mempolicy interleaving
policy wherein we allocate memory across nodes in a weighted fashion
rather than uniformly. This is beneficial in heterogeneous memory
environments appearing with CXL.
- Christophe Leroy has contributed some cleanup and consolidation work
against the ARM pagetable dumping code in the series "mm: ptdump:
Refactor CONFIG_DEBUG_WX and check_wx_pages debugfs attribute".
- Luis Chamberlain has added some additional xarray selftesting in the
series "test_xarray: advanced API multi-index tests".
- Muhammad Usama Anjum has reworked the selftest code to make its
human-readable output conform to the TAP ("Test Anything Protocol")
format. Amongst other things, this opens up the use of third-party
tools to parse and process out selftesting results.
- Ryan Roberts has added fork()-time PTE batching of THP ptes in the
series "mm/memory: optimize fork() with PTE-mapped THP". Mainly
targeted at arm64, this significantly speeds up fork() when the
process has a large number of pte-mapped folios.
- David Hildenbrand also gets in on the THP pte batching game in his
series "mm/memory: optimize unmap/zap with PTE-mapped THP". It
implements batching during munmap() and other pte teardown
situations. The microbenchmark improvements are nice.
- And in the series "Transparent Contiguous PTEs for User Mappings"
Ryan Roberts further utilizes arm's pte's contiguous bit ("contpte
mappings"). Kernel build times on arm64 improved nicely. Ryan's
series "Address some contpte nits" provides some followup work.
- In the series "mm/hugetlb: Restore the reservation" Breno Leitao has
fixed an obscure hugetlb race which was causing unnecessary page
faults. He has also added a reproducer under the selftest code.
- In the series "selftests/mm: Output cleanups for the compaction
test", Mark Brown did what the title claims.
- Kinsey Ho has added the series "mm/mglru: code cleanup and
refactoring".
- Even more zswap material from Nhat Pham. The series "fix and extend
zswap kselftests" does as claimed.
- In the series "Introduce cpu_dcache_is_aliasing() to fix DAX
regression" Mathieu Desnoyers has cleaned up and fixed rather a mess
in our handling of DAX on archiecctures which have virtually aliasing
data caches. The arm architecture is the main beneficiary.
- Lokesh Gidra's series "per-vma locks in userfaultfd" provides
dramatic improvements in worst-case mmap_lock hold times during
certain userfaultfd operations.
- Some page_owner enhancements and maintenance work from Oscar Salvador
in his series
"page_owner: print stacks and their outstanding allocations"
"page_owner: Fixup and cleanup"
- Uladzislau Rezki has contributed some vmalloc scalability
improvements in his series "Mitigate a vmap lock contention". It
realizes a 12x improvement for a certain microbenchmark.
- Some kexec/crash cleanup work from Baoquan He in the series "Split
crash out from kexec and clean up related config items".
- Some zsmalloc maintenance work from Chengming Zhou in the series
"mm/zsmalloc: fix and optimize objects/page migration"
"mm/zsmalloc: some cleanup for get/set_zspage_mapping()"
- Zi Yan has taught the MM to perform compaction on folios larger than
order=0. This a step along the path to implementaton of the merging
of large anonymous folios. The series is named "Enable >0 order folio
memory compaction".
- Christoph Hellwig has done quite a lot of cleanup work in the
pagecache writeback code in his series "convert write_cache_pages()
to an iterator".
- Some modest hugetlb cleanups and speedups in Vishal Moola's series
"Handle hugetlb faults under the VMA lock".
- Zi Yan has changed the page splitting code so we can split huge pages
into sizes other than order-0 to better utilize large folios. The
series is named "Split a folio to any lower order folios".
- David Hildenbrand has contributed the series "mm: remove
total_mapcount()", a cleanup.
- Matthew Wilcox has sought to improve the performance of bulk memory
freeing in his series "Rearrange batched folio freeing".
- Gang Li's series "hugetlb: parallelize hugetlb page init on boot"
provides large improvements in bootup times on large machines which
are configured to use large numbers of hugetlb pages.
- Matthew Wilcox's series "PageFlags cleanups" does that.
- Qi Zheng's series "minor fixes and supplement for ptdesc" does that
also. S390 is affected.
- Cleanups to our pagemap utility functions from Peter Xu in his series
"mm/treewide: Replace pXd_large() with pXd_leaf()".
- Nico Pache has fixed a few things with our hugepage selftests in his
series "selftests/mm: Improve Hugepage Test Handling in MM
Selftests".
- Also, of course, many singleton patches to many things. Please see
the individual changelogs for details.
* tag 'mm-stable-2024-03-13-20-04' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (435 commits)
mm/zswap: remove the memcpy if acomp is not sleepable
crypto: introduce: acomp_is_async to expose if comp drivers might sleep
memtest: use {READ,WRITE}_ONCE in memory scanning
mm: prohibit the last subpage from reusing the entire large folio
mm: recover pud_leaf() definitions in nopmd case
selftests/mm: skip the hugetlb-madvise tests on unmet hugepage requirements
selftests/mm: skip uffd hugetlb tests with insufficient hugepages
selftests/mm: dont fail testsuite due to a lack of hugepages
mm/huge_memory: skip invalid debugfs new_order input for folio split
mm/huge_memory: check new folio order when split a folio
mm, vmscan: retry kswapd's priority loop with cache_trim_mode off on failure
mm: add an explicit smp_wmb() to UFFDIO_CONTINUE
mm: fix list corruption in put_pages_list
mm: remove folio from deferred split list before uncharging it
filemap: avoid unnecessary major faults in filemap_fault()
mm,page_owner: drop unnecessary check
mm,page_owner: check for null stack_record before bumping its refcount
mm: swap: fix race between free_swap_and_cache() and swapoff()
mm/treewide: align up pXd_leaf() retval across archs
mm/treewide: drop pXd_large()
...
Fix virtual vs physical address confusion. This does not fix a bug
since virtual and physical address spaces are currently the same.
dax_direct_access() should receive a virtual kernel address in kaddr.
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
In preparation for checking whether the architecture has data cache
aliasing within alloc_dax(), modify the error handling of dcssblk
dcssblk_add_store() to handle alloc_dax() -EOPNOTSUPP failures.
Considering that s390 is not a data cache aliasing architecture,
and considering that DCSSBLK selects DAX, a return value of -EOPNOTSUPP
from alloc_dax() should make dcssblk_add_store() fail.
Link: https://lkml.kernel.org/r/20240215144633.96437-6-mathieu.desnoyers@efficios.com
Fixes: d92576f116 ("dax: does not work correctly with virtual aliasing caches")
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Mike Snitzer <snitzer@kernel.org>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: kernel test robot <lkp@intel.com>
Cc: Michael Sclafani <dm-devel@lists.linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Pass the queue limits directly to blk_alloc_disk instead of setting them
one at a time.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Link: https://lore.kernel.org/r/20240215071055.2201424-10-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pass a queue_limits to blk_alloc_disk and apply it if non-NULL. This
will allow allocating queues with valid queue limits instead of setting
the values one at a time later.
Also change blk_alloc_disk to return an ERR_PTR instead of just NULL
which can't distinguish errors.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Link: https://lore.kernel.org/r/20240215071055.2201424-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
dcssblk_remove_store() holds the dcssblk_devices_sem semaphore while
calling del_gendisk(dev_info->gd), which in turn tries to acquire
disk->open_mutex. Then there is dcssblk_release(), which is called
with disk->open_mutex held, and tries to acquire dcssblk_devices_sem.
Lockdep reports this as possible circular locking dependency (CPU0 =
dcssblk_remove_store, CPU1 = dcssblk_release):
[ 44.948865] Possible unsafe locking scenario:
[ 44.948866] CPU0 CPU1
[ 44.948867] ---- ----
[ 44.948868] lock(&dcssblk_devices_sem);
[ 44.948870] lock(&disk->open_mutex);
[ 44.948872] lock(&dcssblk_devices_sem);
[ 44.948874] lock(&disk->open_mutex);
[ 44.948876]
*** DEADLOCK ***
In practice, this deadlock should not happen, since dcssblk_remove_store()
checks for dev_info->use_count != 0 after acquiring dcssblk_devices_sem,
and breaks out before calling del_gendisk(). dev_info->use_count will be
decremented in dcssblk_release(), protected by dcssblk_devices_sem.
Still there is no need for dcssblk_remove_store() to hold the
dcssblk_devices_sem until after calling del_gendisk(), as this only
protects dcssblk internal data. So fix the lockdep warning by releasing
dcssblk_devices_sem earlier. Also move the segment_unload() loop up,
similar to dcssblk_shared_store() error path, no need to do that after
calling del_gendisk().
Also change dcssblk_shared_store() error path, where dcssblk_devices_sem
was also released only after calling del_gendisk(), and a similar lockdep
warning could be triggered (but also deadlock prevented by check for
dev_info->use_count).
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Commit fb08a1908c ("dax: simplify the dax_device <-> gendisk
association") introduced new logic for gendisk association, requiring
drivers to explicitly call dax_add_host() and dax_remove_host().
For dcssblk driver, some dax_remove_host() calls were missing, e.g. in
device remove path. The commit also broke error handling for out_dax case
in device add path, resulting in an extra put_device() w/o the previous
get_device() in that case.
This lead to stale xarray entries after device add / remove cycles. In the
case when a previously used struct gendisk pointer (xarray index) would be
used again, because blk_alloc_disk() happened to return such a pointer, the
xa_insert() in dax_add_host() would fail and go to out_dax, doing the extra
put_device() in the error path. In combination with an already flawed error
handling in dcssblk (device_register() cleanup), which needs to be
addressed in a separate patch, this resulted in a missing device_del() /
klist_del(), and eventually in the kernel crash with list_add corruption on
a subsequent device_add() / klist_add().
Fix this by adding the missing dax_remove_host() calls, and also move the
put_device() in the error path to restore the previous logic.
Fixes: fb08a1908c ("dax: simplify the dax_device <-> gendisk association")
Cc: <stable@vger.kernel.org> # 5.17+
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Fix virtual vs physical address confusion (which currently are the same).
Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Use IS_ALIGNED() instead of cumbersome bit manipulations.
Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
- Fix virtual vs physical address confusion in vmem_add_range()
and vmem_remove_range() functions.
- Include <linux/io.h> instead of <asm/io.h> and <asm-generic/io.h>
throughout s390 code.
- Make all PSW related defines also available for assembler files.
Remove PSW_DEFAULT_KEY define from uapi for that.
- When adding an undefined symbol the build still succeeds, but
userspace crashes trying to execute VDSO, because the symbol
is not resolved. Add undefined symbols check to prevent that.
- Use kvmalloc_array() instead of kzalloc() for allocaton of 256k
memory when executing s390 crypto adapter IOCTL.
- Add -fPIE flag to prevent decompressor misaligned symbol build
error with clang.
- Use .balign instead of .align everywhere. This is a no-op for s390,
but with this there no mix in using .align and .balign anymore.
- Filter out -mno-pic-data-is-text-relative flag when compiling
kernel to prevent VDSO build error.
- Rework entering of DAT-on mode on CPU restart to use PSW_KERNEL_BITS
mask directly.
- Do not retry administrative requests to some s390 crypto cards,
since the firmware assumes replay attacks.
- Remove most of the debug code, which is build in when kernel config
option CONFIG_ZCRYPT_DEBUG is enabled.
- Remove CONFIG_ZCRYPT_MULTIDEVNODES kernel config option and switch
off the multiple devices support for the s390 zcrypt device driver.
- With the conversion to generic entry machine checks are accounted
to the current context instead of irq time. As result, the STCKF
instruction at the beginning of the machine check handler and the
lowcore member are no longer required, therefore remove it.
- Fix various typos found with codespell.
- Minor cleanups to CPU-measurement Counter and Sampling Facilities code.
- Revert patch that removes VMEM_MAX_PHYS macro, since it causes
a regression.
-----BEGIN PGP SIGNATURE-----
iI0EABYIADUWIQQrtrZiYVkVzKQcYivNdxKlNrRb8AUCZKakSBccYWdvcmRlZXZA
bGludXguaWJtLmNvbQAKCRDNdxKlNrRb8HeIAQCg9RX3/olsZhCqRNLZ/O+6FXAF
29ohi2JmVqxJBKkmwgEA/QXCjoTOp41pQJ1FD39HnI8DeYpJFRnYYE5D3acibAw=
=2Ykk
-----END PGP SIGNATURE-----
Merge tag 's390-6.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull more s390 updates from Alexander Gordeev:
- Fix virtual vs physical address confusion in vmem_add_range() and
vmem_remove_range() functions
- Include <linux/io.h> instead of <asm/io.h> and <asm-generic/io.h>
throughout s390 code
- Make all PSW related defines also available for assembler files.
Remove PSW_DEFAULT_KEY define from uapi for that
- When adding an undefined symbol the build still succeeds, but
userspace crashes trying to execute VDSO, because the symbol is not
resolved. Add undefined symbols check to prevent that
- Use kvmalloc_array() instead of kzalloc() for allocaton of 256k
memory when executing s390 crypto adapter IOCTL
- Add -fPIE flag to prevent decompressor misaligned symbol build error
with clang
- Use .balign instead of .align everywhere. This is a no-op for s390,
but with this there no mix in using .align and .balign anymore
- Filter out -mno-pic-data-is-text-relative flag when compiling kernel
to prevent VDSO build error
- Rework entering of DAT-on mode on CPU restart to use PSW_KERNEL_BITS
mask directly
- Do not retry administrative requests to some s390 crypto cards, since
the firmware assumes replay attacks
- Remove most of the debug code, which is build in when kernel config
option CONFIG_ZCRYPT_DEBUG is enabled
- Remove CONFIG_ZCRYPT_MULTIDEVNODES kernel config option and switch
off the multiple devices support for the s390 zcrypt device driver
- With the conversion to generic entry machine checks are accounted to
the current context instead of irq time. As result, the STCKF
instruction at the beginning of the machine check handler and the
lowcore member are no longer required, therefore remove it
- Fix various typos found with codespell
- Minor cleanups to CPU-measurement Counter and Sampling Facilities
code
- Revert patch that removes VMEM_MAX_PHYS macro, since it causes a
regression
* tag 's390-6.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (25 commits)
Revert "s390/mm: get rid of VMEM_MAX_PHYS macro"
s390/cpum_sf: remove check on CPU being online
s390/cpum_sf: handle casts consistently
s390/cpum_sf: remove unnecessary debug statement
s390/cpum_sf: remove parameter in call to pr_err
s390/cpum_sf: simplify function setup_pmu_cpu
s390/cpum_cf: remove unneeded debug statements
s390/entry: remove mcck clock
s390: fix various typos
s390/zcrypt: remove ZCRYPT_MULTIDEVNODES kernel config option
s390/zcrypt: do not retry administrative requests
s390/zcrypt: cleanup some debug code
s390/entry: rework entering DAT-on mode on CPU restart
s390/mm: fence off VM macros from asm and linker
s390: include linux/io.h instead of asm/io.h
s390/ptrace: make all psw related defines also available for asm
s390/ptrace: remove PSW_DEFAULT_KEY from uapi
s390/vdso: filter out mno-pic-data-is-text-relative cflag
s390: consistently use .balign instead of .align
s390/decompressor: fix misaligned symbol build error
...
Include linux/io.h instead of asm/io.h everywhere. linux/io.h includes
asm/io.h, so this shouldn't cause any problems. Instead this might help for
some randconfig build errors which were reported due to some undefined io
related functions.
Also move the changed include so it stays grouped together with other
includes from the same directory.
For ctcm_mpc.c also remove not needed comments (actually questions).
Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
- DAX fixes and cleanups including a use after free, extra references,
and device unregistration, and a redundant variable.
- Allow the DAX fault handler to return VM_FAULT_HWPOISON
- A few libnvdimm cleanups such as making some functions and variables
static where sufficient.
- Add a few missing prototypes for wrapped functions in
tools/testing/nvdimm
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQT9vPEBxh63bwxRYEEPzq5USduLdgUCZJ6AdAAKCRAPzq5USduL
dtGnAP9uh+DxVKLnp/Q0977pLZKYVHYU32C/pG3hFnjS5tAp6QEAke/uF+wxcTGr
EZdnDJuTGt2sAMQsQ34NdDJUzwqQEgw=
=7l6z
-----END PGP SIGNATURE-----
Merge tag 'libnvdimm-for-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull nvdimm and DAX updates from Vishal Verma:
"This is mostly small cleanups and fixes, with the biggest change being
the change to the DAX fault handler allowing it to return
VM_FAULT_HWPOISON.
Summary:
- DAX fixes and cleanups including a use after free, extra
references, and device unregistration, and a redundant variable.
- Allow the DAX fault handler to return VM_FAULT_HWPOISON
- A few libnvdimm cleanups such as making some functions and
variables static where sufficient.
- Add a few missing prototypes for wrapped functions in
tools/testing/nvdimm"
* tag 'libnvdimm-for-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
dax: enable dax fault handler to report VM_FAULT_HWPOISON
nvdimm: make security_show static
nvdimm: make nd_class variable static
dax/kmem: Pass valid argument to memory_group_register_static
fsdax: remove redundant variable 'error'
dax: Cleanup extra dax_region references
dax: Introduce alloc_dev_dax_id()
dax: Use device_unregister() in unregister_dax_mapping()
dax: Fix dax_mapping_release() use after free
tools/testing/nvdimm: Drop empty platform remove function
libnvdimm: mark 'security_show' static again
testing: nvdimm: add missing prototypes for wrapped functions
dax: fix missing-prototype warnings
When multiple processes mmap() a dax file, then at some point,
a process issues a 'load' and consumes a hwpoison, the process
receives a SIGBUS with si_code = BUS_MCEERR_AR and with si_lsb
set for the poison scope. Soon after, any other process issues
a 'load' to the poisoned page (that is unmapped from the kernel
side by memory_failure), it receives a SIGBUS with
si_code = BUS_ADRERR and without valid si_lsb.
This is confusing to user, and is different from page fault due
to poison in RAM memory, also some helpful information is lost.
Channel dax backend driver's poison detection to the filesystem
such that instead of reporting VM_FAULT_SIGBUS, it could report
VM_FAULT_HWPOISON.
If user level block IO syscalls fail due to poison, the errno will
be converted to EIO to maintain block API consistency.
Signed-off-by: Jane Chu <jane.chu@oracle.com>
Link: https://lore.kernel.org/r/20230615181325.1327259-2-jane.chu@oracle.com
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
The only overlap between the block open flags mapped into the fmode_t and
other uses of fmode_t are FMODE_READ and FMODE_WRITE. Define a new
blk_mode_t instead for use in blkdev_get_by_{dev,path}, ->open and
->ioctl and stop abusing fmode_t.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jack Wang <jinpu.wang@ionos.com> [rnbd]
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Link: https://lore.kernel.org/r/20230608110258.189493-28-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The mode argument to the ->release block_device_operation is never used,
so remove it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Christian Brauner <brauner@kernel.org>
Acked-by: Jack Wang <jinpu.wang@ionos.com> [rnbd]
Link: https://lore.kernel.org/r/20230608110258.189493-10-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
->open is only called on the whole device. Make that explicit by
passing a gendisk instead of the block_device.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Christian Brauner <brauner@kernel.org>
Acked-by: Jack Wang <jinpu.wang@ionos.com> [rnbd]
Link: https://lore.kernel.org/r/20230608110258.189493-9-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
s390 iterates over the bio using bio_for_each_segment and doesn't need
any bio splitting.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Link: https://lore.kernel.org/r/20230123075356.60847-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This can't happen right now, but in preparation for allowing
bio_split_to_limits() returning NULL if it ended the bio, check for it
in all the callers.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
After the rework from commit 1ebe2e5f9d ("block: remove
GENHD_FL_EXT_DEVT"), when calling device_add_disk(), dcssblk will end up
in disk_scan_partitions(), and not break out early w/o GENHD_FL_NO_PART.
This will trigger implicit open/release via blkdev_get/put_whole()
later. dcssblk_release() will then deadlock on dcssblk_devices_sem
semaphore, which is already held from dcssblk_add_store() when calling
device_add_disk().
dcssblk does not support partitions (DCSSBLK_MINORS_PER_DISK == 1), and
never scanned partitions before. Therefore restore the previous
behavior, and explicitly disallow partition scanning by setting the
GENHD_FL_NO_PART flag. This will also prevent this deadlock scenario.
Fixes: 1ebe2e5f9d ("block: remove GENHD_FL_EXT_DEVT")
Cc: <stable@vger.kernel.org> # 5.17+
Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
The double indirect bio leads to somewhat suboptimal code generation.
Instead return the (original or split) bio, and make sure the
request_queue arguments to the lower level helpers is passed after the
bio to avoid constant reshuffling of the argument passing registers.
Also give it and the helpers used to implement it more descriptive names.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220727162300.3089193-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
blk_cleanup_disk is nothing but a trivial wrapper for put_disk now,
so remove it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20220619060552.1850436-7-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Up till now, dax_direct_access() is used implicitly for normal
access, but for the purpose of recovery write, dax range with
poison is requested. To make the interface clear, introduce
enum dax_access_mode {
DAX_ACCESS,
DAX_RECOVERY_WRITE,
}
where DAX_ACCESS is used for normal dax access, and
DAX_RECOVERY_WRITE is used for dax recovery write.
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Jane Chu <jane.chu@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Vivek Goyal <vgoyal@redhat.com>
Link: https://lore.kernel.org/r/165247982851.52965.11024212198889762949.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
These methods indirect the actual DAX read/write path. In the end pmem
uses magic flush and mc safe variants and fuse and dcssblk use plain ones
while device mapper picks redirects to the underlying device.
Add set_dax_nocache() and set_dax_nomc() APIs to control which copy
routines are used to remove indirect call from the read/write fast path
as well as a lot of boilerplate code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Vivek Goyal <vgoyal@redhat.com> [virtiofs]
Link: https://lore.kernel.org/r/20211215084508.435401-5-hch@lst.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Remove the DAXDEV_F_SYNC flag and thus the flags argument to alloc_dax and
just let the drivers call set_dax_synchronous directly.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Link: https://lore.kernel.org/r/20211215084508.435401-4-hch@lst.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Just open code the block size and dax_dev == NULL checks in the callers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> [erofs]
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/20211129102203.2243509-9-hch@lst.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Replace the dax_host_hash with an xarray indexed by the pointer value
of the gendisk, and require explicitly calls from the block drivers that
want to associate their gendisk with a dax_device.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/20211129102203.2243509-5-hch@lst.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
- Add support for ftrace with direct call and ftrace direct call samples.
- Add support for kernel command lines longer than current 896 bytes and
make its length configurable.
- Add support for BEAR enhancement facility to improve last breaking
event instruction tracking.
- Add kprobes sanity checks and testcases to prevent kprobe in the mid
of an instruction.
- Allow concurrent access to /dev/hwc for the CPUMF users.
- Various ftrace / jump label improvements.
- Convert unwinder tests to KUnit.
- Add s390_iommu_aperture kernel parameter to tweak the limits on
concurrently usable DMA mappings.
- Add ap.useirq AP module option which can be used to disable interrupt
use.
- Add add_disk() error handling support to block device drivers.
- Drop arch specific and use generic implementation of strlcpy and strrchr.
- Several __pa/__va usages fixes.
- Various cio, crypto, pci, kernel doc and other small fixes and
improvements all over the code.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEE3QHqV+H2a8xAv27vjYWKoQLXFBgFAmGFW6EACgkQjYWKoQLX
FBg20Qf/UbohgnKnE6vxbbH3sNTlI2dk3Cw4z3IobcsZgqXAu6AFLgLQGLk/X07F
DIyUdrgSgCzLIEKLqrLrFXIOMIK44zAGaurIltNt7IrnWWlA+/YVD+YeL2gHwccq
wT7KXRcrVMZQ1z18djJQ45DpPUC8ErBdL6+P+ftHck90YGFZsfMA5S7jf8X1h08U
IlqdPTmY8t4unKHWVpHbxx9b+xrUuV6KTEXADsllpMV2jQoTLdDECd3vmefYR6tR
3lssgop1m/RzH5OCqvia5Sy2D5fOQObNWDMakwOkVMxOD43lmGCTHstzS2Uo2OFE
QcY79lfZ5NrzKnenUdE5Fd0XJ9kSwQ==
=k0Ab
-----END PGP SIGNATURE-----
Merge tag 's390-5.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Vasily Gorbik:
- Add support for ftrace with direct call and ftrace direct call
samples.
- Add support for kernel command lines longer than current 896 bytes
and make its length configurable.
- Add support for BEAR enhancement facility to improve last breaking
event instruction tracking.
- Add kprobes sanity checks and testcases to prevent kprobe in the mid
of an instruction.
- Allow concurrent access to /dev/hwc for the CPUMF users.
- Various ftrace / jump label improvements.
- Convert unwinder tests to KUnit.
- Add s390_iommu_aperture kernel parameter to tweak the limits on
concurrently usable DMA mappings.
- Add ap.useirq AP module option which can be used to disable interrupt
use.
- Add add_disk() error handling support to block device drivers.
- Drop arch specific and use generic implementation of strlcpy and
strrchr.
- Several __pa/__va usages fixes.
- Various cio, crypto, pci, kernel doc and other small fixes and
improvements all over the code.
[ Merge fixup as per https://lore.kernel.org/all/YXAqZ%2FEszRisunQw@osiris/ ]
* tag 's390-5.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (63 commits)
s390: make command line configurable
s390: support command lines longer than 896 bytes
s390/kexec_file: move kernel image size check
s390/pci: add s390_iommu_aperture kernel parameter
s390/spinlock: remove incorrect kernel doc indicator
s390/string: use generic strlcpy
s390/string: use generic strrchr
s390/ap: function rework based on compiler warning
s390/cio: make ccw_device_dma_* more robust
s390/vfio-ap: s390/crypto: fix all kernel-doc warnings
s390/hmcdrv: fix kernel doc comments
s390/ap: new module option ap.useirq
s390/cpumf: Allow multiple processes to access /dev/hwc
s390/bitops: return true/false (not 1/0) from bool functions
s390: add support for BEAR enhancement facility
s390: introduce nospec_uses_trampoline()
s390: rename last_break to pgm_last_break
s390/ptrace: add last_break member to pt_regs
s390/sclp: sort out physical vs virtual pointers usage
s390/setup: convert start and end initrd pointers to virtual
...
Replace the blk_poll interface that requires the caller to keep a queue
and cookie from the submissions with polling based on the bio.
Polling for the bio itself leads to a few advantages:
- the cookie construction can made entirely private in blk-mq.c
- the caller does not need to remember the request_queue and cookie
separately and thus sidesteps their lifetime issues
- keeping the device and the cookie inside the bio allows to trivially
support polling BIOs remapping by stacking drivers
- a lot of code to propagate the cookie back up the submission path can
be removed entirely.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Mark Wunderlich <mark.wunderlich@intel.com>
Link: https://lore.kernel.org/r/20211012111226.760968-15-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Link: https://lore.kernel.org/r/20210927220232.1071926-6-mcgrof@kernel.org
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
- Rework inline asm to get rid of error prone "register asm" constructs,
which are problematic especially when code instrumentation is enabled. In
particular introduce and use register pair union to allocate even/odd
register pairs. Unfortunately this breaks compatibility with older
clang compilers and minimum clang version for s390 has been raised to 13.
https://lore.kernel.org/linux-next/CAK7LNARuSmPCEy-ak0erPrPTgZdGVypBROFhtw+=3spoGoYsyw@mail.gmail.com/
- Fix gcc 11 warnings, which triggered various minor reworks all over
the code.
- Add zstd kernel image compression support.
- Rework boot CPU lowcore handling.
- De-duplicate and move kernel memory layout setup logic earlier.
- Few fixes in preparation for FORTIFY_SOURCE performing compile-time
and run-time field bounds checking for mem functions.
- Remove broken and unused power management support leftovers in s390
drivers.
- Disable stack-protector for decompressor and purgatory to fix buildroot
build.
- Fix vt220 sclp console name to match the char device name.
- Enable HAVE_IOREMAP_PROT and add zpci_set_irq()/zpci_clear_irq() in
zPCI code.
- Remove some implausible WARN_ON_ONCEs and remove arch specific counter
transaction call backs in favour of default transaction handling in
perf code.
- Extend/add new uevents for online/config/mode state changes of
AP card / queue device in zcrypt.
- Minor entry and ccwgroup code improvements.
- Other small various fixes and improvements all over the code.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEE3QHqV+H2a8xAv27vjYWKoQLXFBgFAmDhuTEACgkQjYWKoQLX
FBjVlggAgDFBkDjlyfvrm4xzmHi7BJMmhrTJIONsSz+3tcA4/u5kE+Hrdrqxm0Uh
ZH4MXBxn4q4Fmoomhu5w5ZDe8o2ip0aN9fFNdsBoP8hurmQbL/IbdTnBETKMrKpV
XpogU2G7p+2nQ0+9+o6PS/vWlZhI88NVh8dWyRd2+5/XdMycgLv2Qm7NpQoACVw1
CbUvxP2PlpZ0wltLvNBKPg1xXMZa3GS0wbVUsS2jiWcr/3VzCqfTHenZJ/RadoE6
axG99QXCbLDMsJgVQcXtlI8K6Z461fAwbNtWZWC+Uq7o5pYuUFW1dovMg9WWF+7T
lFNqXyyNy5wwITRkvuzjlVTE8yzYYg==
=ADZ4
-----END PGP SIGNATURE-----
Merge tag 's390-5.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Vasily Gorbik:
- Rework inline asm to get rid of error prone "register asm"
constructs, which are problematic especially when code
instrumentation is enabled.
In particular introduce and use register pair union to allocate
even/odd register pairs. Unfortunately this breaks compatibility with
older clang compilers and minimum clang version for s390 has been
raised to 13.
https://lore.kernel.org/linux-next/CAK7LNARuSmPCEy-ak0erPrPTgZdGVypBROFhtw+=3spoGoYsyw@mail.gmail.com/
- Fix gcc 11 warnings, which triggered various minor reworks all over
the code.
- Add zstd kernel image compression support.
- Rework boot CPU lowcore handling.
- De-duplicate and move kernel memory layout setup logic earlier.
- Few fixes in preparation for FORTIFY_SOURCE performing compile-time
and run-time field bounds checking for mem functions.
- Remove broken and unused power management support leftovers in s390
drivers.
- Disable stack-protector for decompressor and purgatory to fix
buildroot build.
- Fix vt220 sclp console name to match the char device name.
- Enable HAVE_IOREMAP_PROT and add zpci_set_irq()/zpci_clear_irq() in
zPCI code.
- Remove some implausible WARN_ON_ONCEs and remove arch specific
counter transaction call backs in favour of default transaction
handling in perf code.
- Extend/add new uevents for online/config/mode state changes of AP
card / queue device in zcrypt.
- Minor entry and ccwgroup code improvements.
- Other small various fixes and improvements all over the code.
* tag 's390-5.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (91 commits)
s390/dasd: use register pair instead of register asm
s390/qdio: get rid of register asm
s390/ioasm: use symbolic names for asm operands
s390/ioasm: get rid of register asm
s390/cmf: get rid of register asm
s390/lib,string: get rid of register asm
s390/lib,uaccess: get rid of register asm
s390/string: get rid of register asm
s390/cmpxchg: use register pair instead of register asm
s390/mm,pages-states: get rid of register asm
s390/lib,xor: get rid of register asm
s390/timex: get rid of register asm
s390/hypfs: use register pair instead of register asm
s390/zcrypt: Switch to flexible array member
s390/speculation: Use statically initialized const for instructions
virtio/s390: get rid of open-coded kvm hypercall
s390/pci: add zpci_set_irq()/zpci_clear_irq()
scripts/min-tool-version.sh: Raise minimum clang version to 13.0.0 for s390
s390/ipl: use register pair instead of register asm
s390/mem_detect: fix tprot() program check new psw handling
...
Power management support was removed for s390 with
commit 394216275c ("s390: remove broken hibernate / power management
support").
Remove leftover dcssblk-related power management code.
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Convert the dcssblk driver to use the blk_alloc_disk and blk_cleanup_disk
helpers to simplify gendisk and request_queue allocation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://lore.kernel.org/r/20210521055116.1053587-24-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Replace the gendisk pointer in struct bio with a pointer to the newly
improved struct block device. From that the gendisk can be trivially
accessed with an extra indirection, but it also allows to directly
look up all information related to partition remapping.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The block layer already checks for this conditions in bio_check_eod
before calling the driver.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
bd_block_size contains a value that matches the logic block size when
opening, so the statement is redundant. Even if it wasn't the dumb
assignment would cause a a mismatch with bd_inode->i_blkbits.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The make_request_fn is a little weird in that it sits directly in
struct request_queue instead of an operation vector. Replace it with
a block_device_operations method called submit_bio (which describes much
better what it does). Also remove the request_queue argument to it, as
the queue can be derived pretty trivially from the bio.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The queue can be trivially derived from the bio, so pass one less
argument.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
- Add support for region alignment configuration and enforcement to
fix compatibility across architectures and PowerPC page size
configurations.
- Introduce 'zero_page_range' as a dax operation. This facilitates
filesystem-dax operation without a block-device.
- Introduce phys_to_target_node() to facilitate drivers that want to
know resulting numa node if a given reserved address range was
onlined.
- Advertise a persistence-domain for of_pmem and papr_scm. The
persistence domain indicates where cpu-store cycles need to reach in
the platform-memory subsystem before the platform will consider them
power-fail protected.
- Promote numa_map_to_online_node() to a cross-kernel generic facility.
- Save x86 numa information to allow for node-id lookups for reserved
memory ranges, deploy that capability for the e820-pmem driver.
- Pick up some miscellaneous minor fixes, that missed v5.6-final,
including a some smatch reports in the ioctl path and some unit test
compilation fixups.
- Fixup some flexible-array declarations.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEf41QbsdZzFdA8EfZHtKRamZ9iAIFAl6LtIAACgkQHtKRamZ9
iAIwRA/8CLVVuQpgHQ1tqK4h8CZPrISFXh7wy7uhocEU2xrDh6iGVnLztmoLRr2k
5f8T9lRzreSAwIVL5DbGqP1pFncqIt9VMnKsFlaPMBGCBNR+hURY0iBCNjIT+jiq
BOzLd52MR2rqJxeXGTMUbWrBrbmuj4mZPdmGVuFFe7GFRpoaVpCgOo+296eWa/ot
gIOFUTonZY7STYjNvDok0TXCmiCFuJb+P+y5ldfCPShHvZhTiaF53jircja8vAjO
G5dt8ixBKUK0rXRc4SEQsQhAZNcAFHb6Gy5lg4C2QzhTF374xTc9usJZNWbIE9iM
5mipBYvjVuoY+XaCNZDkaRcJIy/jqB15O6l3QIWbZLGaK9m95YPp9LmkPFwd3JpO
e3rO24ML471DxqB9iWIiJCNcBBocLOlnd6qAQTpppWDpGNbudwXvfsmKHmKIScSE
x+IDCdscLmmm+WG2dLmLraWOVPu42xZFccoQCi4M3TTqfeB9pZ9XckFQ37zX62zG
5t+7Ek+t1W4QVt/JQYVKH03XT15sqUpVknvx0Hl4Y5TtbDOkFLkO8RN0/HyExDef
7iegS35kqTsM4EfZQ+9juKbI2JBAjHANcbj0V4dogqaRj6vr3akumBzUtuYqAofv
qU3s9skmLsEemOJC+ns2PT8vl5dyIoeDfH0r2XvGWxYqolMqJpA=
=sY4N
-----END PGP SIGNATURE-----
Merge tag 'libnvdimm-for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm and dax updates from Dan Williams:
"There were multiple touches outside of drivers/nvdimm/ this round to
add cross arch compatibility to the devm_memremap_pages() interface,
enhance numa information for persistent memory ranges, and add a
zero_page_range() dax operation.
This cycle I switched from the patchwork api to Konstantin's b4 script
for collecting tags (from x86, PowerPC, filesystem, and device-mapper
folks), and everything looks to have gone ok there. This has all
appeared in -next with no reported issues.
Summary:
- Add support for region alignment configuration and enforcement to
fix compatibility across architectures and PowerPC page size
configurations.
- Introduce 'zero_page_range' as a dax operation. This facilitates
filesystem-dax operation without a block-device.
- Introduce phys_to_target_node() to facilitate drivers that want to
know resulting numa node if a given reserved address range was
onlined.
- Advertise a persistence-domain for of_pmem and papr_scm. The
persistence domain indicates where cpu-store cycles need to reach
in the platform-memory subsystem before the platform will consider
them power-fail protected.
- Promote numa_map_to_online_node() to a cross-kernel generic
facility.
- Save x86 numa information to allow for node-id lookups for reserved
memory ranges, deploy that capability for the e820-pmem driver.
- Pick up some miscellaneous minor fixes, that missed v5.6-final,
including a some smatch reports in the ioctl path and some unit
test compilation fixups.
- Fixup some flexible-array declarations"
* tag 'libnvdimm-for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (29 commits)
dax: Move mandatory ->zero_page_range() check in alloc_dax()
dax,iomap: Add helper dax_iomap_zero() to zero a range
dax: Use new dax zero page method for zeroing a page
dm,dax: Add dax zero_page_range operation
s390,dcssblk,dax: Add dax zero_page_range operation to dcssblk driver
dax, pmem: Add a dax operation zero_page_range
pmem: Add functions for reading/writing page to/from pmem
libnvdimm: Update persistence domain value for of_pmem and papr_scm device
tools/test/nvdimm: Fix out of tree build
libnvdimm/region: Fix build error
libnvdimm/region: Replace zero-length array with flexible-array member
libnvdimm/label: Replace zero-length array with flexible-array member
ACPI: NFIT: Replace zero-length array with flexible-array member
libnvdimm/region: Introduce an 'align' attribute
libnvdimm/region: Introduce NDD_LABELING
libnvdimm/namespace: Enforce memremap_compat_align()
libnvdimm/pfn: Prevent raw mode fallback if pfn-infoblock valid
libnvdimm: Out of bounds read in __nd_ioctl()
acpi/nfit: improve bounds checking for 'func'
mm/memremap_pages: Introduce memremap_compat_align()
...
zero_page_range() dax operation is mandatory for dax devices. Right now
that check happens in dax_zero_page_range() function. Dan thinks that's
too late and its better to do the check earlier in alloc_dax().
I also modified alloc_dax() to return pointer with error code in it in
case of failure. Right now it returns NULL and caller assumes failure
happened due to -ENOMEM. But with this ->zero_page_range() check, I
need to return -EINVAL instead.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Link: https://lore.kernel.org/r/20200401161125.GB9398@redhat.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>