Commit Graph

301 Commits

Author SHA1 Message Date
Matthew Wilcox (Oracle)
9c4ce08dd2 iomap: Convert iomap_write_end_inline to take a folio
This conversion is only safe because iomap only supports writes to inline
data which starts at the beginning of the file.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2021-12-18 00:06:07 -05:00
Matthew Wilcox (Oracle)
bc6123a84a iomap: Convert iomap_write_begin() and iomap_write_end() to folios
These functions still only work in PAGE_SIZE chunks, but there are
fewer conversions from tail to head pages as a result of this patch.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2021-12-18 00:06:07 -05:00
Matthew Wilcox (Oracle)
a25def1fe5 iomap: Convert __iomap_zero_iter to use a folio
The zero iterator can work in folio-sized chunks instead of page-sized
chunks.  This will save a lot of page cache lookups if the file is cached
in large folios.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2021-12-18 00:06:07 -05:00
Matthew Wilcox (Oracle)
d454ab82bc iomap: Allow iomap_write_begin() to be called with the full length
In the future, we want write_begin to know the entire length of the
write so that it can choose to allocate large folios.  Pass the full
length in from __iomap_zero_iter() and limit it where necessary.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2021-12-18 00:06:00 -05:00
Matthew Wilcox (Oracle)
ea0f843aa7 iomap: Convert iomap_page_mkwrite to use a folio
If we write to any page in a folio, we have to mark the entire
folio as dirty, and potentially COW the entire folio, because it'll
all get written back as one unit.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2021-12-16 15:49:52 -05:00
Matthew Wilcox (Oracle)
3aa9c659bf iomap: Convert readahead and readpage to use a folio
Handle folios of arbitrary size instead of working in PAGE_SIZE units.
readahead_folio() decreases the page refcount for you, so this is not
quite a mechanical change.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2021-12-16 15:49:52 -05:00
Matthew Wilcox (Oracle)
874628a2c5 iomap: Convert iomap_read_inline_data to take a folio
We still only support up to a single page of inline data (at least,
per call to iomap_read_inline_data()), but it can now be written into
the middle of a folio in case we decide to allocate a 16KiB page for
a file that's 8.1KiB in size.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2021-12-16 15:49:52 -05:00
Matthew Wilcox (Oracle)
431c0566bb iomap: Use folio offsets instead of page offsets
Pass a folio around instead of the page, and make sure the offset
is relative to the start of the folio instead of the start of a page.
Also use size_t for offset & length to make it clear that these are byte
counts, and to support >2GB folios in the future.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2021-12-16 15:49:52 -05:00
Matthew Wilcox (Oracle)
8ffd74e9a8 iomap: Convert bio completions to use folios
Use bio_for_each_folio() to iterate over each folio in the bio
instead of iterating over each page.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2021-12-16 15:49:52 -05:00
Matthew Wilcox (Oracle)
cd1e5afe55 iomap: Pass the iomap_page into iomap_set_range_uptodate
All but one caller already has the iomap_page, so we can avoid getting
it again.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2021-12-16 15:49:52 -05:00
Matthew Wilcox (Oracle)
8306a5f563 iomap: Add iomap_invalidate_folio
Keep iomap_invalidatepage around as a wrapper for use in address_space
operations.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2021-12-16 15:49:52 -05:00
Matthew Wilcox (Oracle)
39f16c8345 iomap: Convert iomap_releasepage to use a folio
This is an address_space operation, so its argument must remain as a
struct page, but we can use a folio internally.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2021-12-16 15:49:52 -05:00
Matthew Wilcox (Oracle)
c46e8324ca iomap: Convert iomap_page_release to take a folio
iomap_page_release() was also assuming that it was being passed a
head page.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2021-12-16 15:49:52 -05:00
Matthew Wilcox (Oracle)
435d44b3fd iomap: Convert iomap_page_create to take a folio
This function already assumed it was being passed a head page, so
just formalise that.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2021-12-16 15:49:51 -05:00
Matthew Wilcox (Oracle)
95c4cd053a iomap: Convert to_iomap_page to take a folio
The big comment about only using a head page can go away now that
it takes a folio argument.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2021-12-16 15:49:51 -05:00
Matthew Wilcox (Oracle)
d1bd0b4ebf fs/buffer: Convert __block_write_begin_int() to take a folio
There are no plans to convert buffer_head infrastructure to use large
folios, but __block_write_begin_int() is called from iomap, and it's
more convenient and less error-prone if we pass in a folio from iomap.
It also has a nice saving of almost 200 bytes of code from removing
repeated calls to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2021-12-16 15:49:51 -05:00
Christoph Hellwig
de291b5902 iomap: turn the byte variable in iomap_zero_iter into a ssize_t
@bytes also holds the return value from iomap_write_end, which can
contain a negative error value.  As @bytes is always less than the page
size even the signed type can hold the entire possible range.

Fixes: c6f4046865 ("fsdax: decouple zeroing from the iomap buffered I/O code")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211208091203.2927754-1-hch@lst.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-12-08 07:04:21 -08:00
Christoph Hellwig
c6f4046865 fsdax: decouple zeroing from the iomap buffered I/O code
Unshare the DAX and iomap buffered I/O page zeroing code.  This code
previously did a IS_DAX check deep inside the iomap code, which in
fact was the only DAX check in the code.  Instead move these checks
into the callers.  Most callers already have DAX special casing anyway
and XFS will need it for reflink support as well.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/20211129102203.2243509-19-hch@lst.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-12-04 08:58:53 -08:00
Andreas Gruenbacher
5ad448ce29 iomap: iomap_read_inline_data cleanup
Change iomap_read_inline_data to return 0 or an error code; this
simplifies the callers.  Add a description.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
[djwong: document the return value of iomap_read_inline_data explicitly]
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-11-24 10:15:47 -08:00
Andreas Gruenbacher
d8af404ffc iomap: Fix inline extent handling in iomap_readpage
Before commit 740499c784 ("iomap: fix the iomap_readpage_actor return
value for inline data"), when hitting an IOMAP_INLINE extent,
iomap_readpage_actor would report having read the entire page.  Since
then, it only reports having read the inline data (iomap->length).

This will force iomap_readpage into another iteration, and the
filesystem will report an unaligned hole after the IOMAP_INLINE extent.
But iomap_readpage_actor (now iomap_readpage_iter) isn't prepared to
deal with unaligned extents, it will get things wrong on filesystems
with a block size smaller than the page size, and we'll eventually run
into the following warning in iomap_iter_advance:

  WARN_ON_ONCE(iter->processed > iomap_length(iter));

Fix that by changing iomap_readpage_iter to return 0 when hitting an
inline extent; this will cause iomap_iter to stop immediately.

To fix readahead as well, change iomap_readahead_iter to pass on
iomap_readpage_iter return values less than or equal to zero.

Fixes: 740499c784 ("iomap: fix the iomap_readpage_actor return value for inline data")
Cc: stable@vger.kernel.org # v5.15+
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-11-21 16:28:07 -08:00
Andreas Gruenbacher
a6294593e8 iov_iter: Turn iov_iter_fault_in_readable into fault_in_iov_iter_readable
Turn iov_iter_fault_in_readable into a function that returns the number
of bytes not faulted in, similar to copy_to_user, instead of returning a
non-zero value when any of the requested pages couldn't be faulted in.
This supports the existing users that require all pages to be faulted in
as well as new users that are happy if any pages can be faulted in.

Rename iov_iter_fault_in_readable to fault_in_iov_iter_readable to make
sure this change doesn't silently break things.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-10-18 16:35:06 +02:00
Christoph Hellwig
fad0a1ab34 iomap: constify iomap_iter_srcmap
The srcmap returned from iomap_iter_srcmap is never modified, so mark
the iomap returned from it const and constify a lot of code that never
modifies the iomap.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-08-16 21:26:33 -07:00
Christoph Hellwig
b74b1293e6 iomap: rework unshare flag
Instead of another internal flags namespace inside of buffered-io.c,
just pass a UNSHARE hint in the main iomap flags field.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-08-16 21:26:33 -07:00
Christoph Hellwig
1b5c1e36dc iomap: pass an iomap_iter to various buffered I/O helpers
Pass the iomap_iter structure instead of individual parameters to
various internal helpers for buffered I/O.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-08-16 21:26:33 -07:00
Christoph Hellwig
253564baff iomap: switch iomap_page_mkwrite to use iomap_iter
Switch iomap_page_mkwrite to use iomap_iter.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-08-16 21:26:33 -07:00
Christoph Hellwig
2aa3048e03 iomap: switch iomap_zero_range to use iomap_iter
Switch iomap_zero_range to use iomap_iter.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-08-16 21:26:33 -07:00
Christoph Hellwig
8fc274d1f4 iomap: switch iomap_file_unshare to use iomap_iter
Switch iomap_file_unshare to use iomap_iter.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-08-16 21:26:33 -07:00
Christoph Hellwig
ce83a0251c iomap: switch iomap_file_buffered_write to use iomap_iter
Switch iomap_file_buffered_write to use iomap_iter.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-08-16 21:26:33 -07:00
Christoph Hellwig
f6d480006c iomap: switch readahead and readpage to use iomap_iter
Switch the page cache read functions to use iomap_iter instead of
iomap_apply.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-08-16 21:26:33 -07:00
Christoph Hellwig
740499c784 iomap: fix the iomap_readpage_actor return value for inline data
The actor should never return a larger value than the length that was
passed in.  The current code handles this gracefully, but the opcoming
iter model will be more picky.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-08-16 21:26:33 -07:00
Christoph Hellwig
1acd9e9c01 iomap: mark the iomap argument to iomap_read_page_sync const
iomap_read_page_sync never modifies the passed in iomap, so mark
it const.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-08-16 21:26:33 -07:00
Christoph Hellwig
78c64b00f8 iomap: mark the iomap argument to iomap_read_inline_data const
iomap_read_inline_data never modifies the passed in iomap, so mark
it const.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-08-16 21:26:33 -07:00
Christoph Hellwig
1d25d0aecf iomap: remove the iomap arguments to ->page_{prepare,done}
These aren't actually used by the only instance implementing the methods.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-08-16 21:26:33 -07:00
Darrick J. Wong
b69eea82d3 iomap: pass writeback errors to the mapping
Modern-day mapping_set_error has the ability to squash the usual
negative error code into something appropriate for long-term storage in
a struct address_space -- ENOSPC becomes AS_ENOSPC, and everything else
becomes EIO.  iomap squashes /everything/ to EIO, just as XFS did before
that, but this doesn't make sense.

Fix this by making it so that we can pass ENOSPC to userspace when
writeback fails due to space problems.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2021-08-16 12:12:52 -07:00
Matthew Wilcox (Oracle)
ae44f9c286 iomap: Add another assertion to inline data handling
Check that the file tail does not cross a page boundary.  Requested by
Andreas.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-08-05 10:30:33 -07:00
Matthew Wilcox (Oracle)
ab069d5fdc iomap: Use kmap_local_page instead of kmap_atomic
kmap_atomic() has the side-effect of disabling pagefaults and
preemption.  kmap_local_page() does not do this and is preferred.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-08-05 10:30:33 -07:00
Andreas Gruenbacher
f1f264b4c1 iomap: Fix some typos and bad grammar
Fix some typos and bad grammar in buffered-io.c to make the comments
easier to read.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-08-03 09:43:14 -07:00
Matthew Wilcox (Oracle)
b405435b41 iomap: Support inline data with block size < page size
Remove the restriction that inline data must start on a page boundary
in a file.  This allows, for example, the first 2KiB to be stored out
of line and the trailing 30 bytes to be stored inline.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-08-03 09:43:13 -07:00
Gao Xiang
69f4a26c1e iomap: support reading inline data from non-zero pos
The existing inline data support only works for cases where the entire
file is stored as inline data.  For larger files, EROFS stores the
initial blocks separately and the remainder of the file ("file tail")
adjacent to the inode.  Generalise inline data to allow reading the
inline file tail.  Tails may not cross a page boundary in memory.

We currently have no filesystems that support tails and writing,
so that case is currently disabled (see iomap_write_begin_inline).

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-08-03 09:43:13 -07:00
Christoph Hellwig
c1b79f11f4 iomap: simplify iomap_add_to_ioend
Now that the outstanding writes are counted in bytes, there is no need
to use the low-level __bio_try_merge_page API, we can switch back to
always using bio_add_page and simply iomap_add_to_ioend again.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-08-03 09:43:13 -07:00
Christoph Hellwig
d0364f9490 iomap: simplify iomap_readpage_actor
Now that the outstanding reads are counted in bytes, there is no need
to use the low-level __bio_try_merge_page API, we can switch back to
always using bio_add_page and simplify iomap_readpage_actor again.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-08-03 09:43:13 -07:00
Andreas Gruenbacher
229adf3c64 iomap: Don't create iomap_page objects in iomap_page_mkwrite_actor
Now that we create those objects in iomap_writepage_map when needed,
there's no need to pre-create them in iomap_page_mkwrite_actor anymore.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-07-15 09:58:06 -07:00
Andreas Gruenbacher
637d337595 iomap: Don't create iomap_page objects for inline files
In iomap_readpage_actor, don't create iop objects for inline inodes.
Otherwise, iomap_read_inline_data will set PageUptodate without setting
iop->uptodate, and iomap_page_release will eventually complain.

To prevent this kind of bug from occurring in the future, make sure the
page doesn't have private data attached in iomap_read_inline_data.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-07-15 09:58:05 -07:00
Andreas Gruenbacher
8e1bcef8e1 iomap: Permit pages without an iop to enter writeback
Create an iop in the writeback path if one doesn't exist.  This allows us
to avoid creating the iop in some cases.  We'll initially do that for pages
with inline data, but it can be extended to pages which are entirely within
an extent.  It also allows for an iop to be removed from pages in the
future (eg page split).

Co-developed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-07-15 09:58:05 -07:00
Linus Torvalds
d3acb15a3a Merge branch 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull iov_iter updates from Al Viro:
 "iov_iter cleanups and fixes.

  There are followups, but this is what had sat in -next this cycle. IMO
  the macro forest in there became much thinner and easier to follow..."

* 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (37 commits)
  csum_and_copy_to_pipe_iter(): leave handling of csum_state to caller
  clean up copy_mc_pipe_to_iter()
  pipe_zero(): we don't need no stinkin' kmap_atomic()...
  iov_iter: clean csum_and_copy_...() primitives up a bit
  copy_page_from_iter(): don't need kmap_atomic() for kvec/bvec cases
  copy_page_to_iter(): don't bother with kmap_atomic() for bvec/kvec cases
  iterate_xarray(): only of the first iteration we might get offset != 0
  pull handling of ->iov_offset into iterate_{iovec,bvec,xarray}
  iov_iter: make iterator callbacks use base and len instead of iovec
  iov_iter: make the amount already copied available to iterator callbacks
  iov_iter: get rid of separate bvec and xarray callbacks
  iov_iter: teach iterate_{bvec,xarray}() about possible short copies
  iterate_bvec(): expand bvec.h macro forest, massage a bit
  iov_iter: unify iterate_iovec and iterate_kvec
  iov_iter: massage iterate_iovec and iterate_kvec to logics similar to iterate_bvec
  iterate_and_advance(): get rid of magic in case when n is 0
  csum_and_copy_to_iter(): massage into form closer to csum_and_copy_from_iter()
  iov_iter: replace iov_iter_copy_from_user_atomic() with iterator-advancing variant
  [xarray] iov_iter_npages(): just use DIV_ROUND_UP()
  iov_iter_npages(): don't bother with iterate_all_kinds()
  ...
2021-07-03 11:30:04 -07:00
Matthew Wilcox (Oracle)
fd7353f88b iomap: use __set_page_dirty_nobuffers
The only difference between iomap_set_page_dirty() and
__set_page_dirty_nobuffers() is that the latter includes a debugging check
that a !Uptodate page has private data.

Link: https://lkml.kernel.org/r/20210615162342.1669332-4-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-29 10:53:48 -07:00
Al Viro
f0b65f39ac iov_iter: replace iov_iter_copy_from_user_atomic() with iterator-advancing variant
Replacement is called copy_page_from_iter_atomic(); unlike the old primitive the
callers do *not* need to do iov_iter_advance() after it.  In case when they end
up consuming less than they'd been given they need to do iov_iter_revert() on
everything they had not consumed.  That, however, needs to be done only on slow
paths.

All in-tree callers converted.  And that kills the last user of iterate_all_kinds()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-06-10 11:45:14 -04:00
Al Viro
bc1bb416bb generic_perform_write()/iomap_write_actor(): saner logics for short copy
if we run into a short copy and ->write_end() refuses to advance at all,
use the amount we'd managed to copy for the next iteration to handle.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-06-02 17:50:44 -04:00
Matthew Wilcox (Oracle)
076171a677 mm/filemap: fix readahead return types
A readahead request will not allocate more memory than can be represented
by a size_t, even on systems that have HIGHMEM available.  Change the
length functions from returning an loff_t to a size_t.

Link: https://lkml.kernel.org/r/20210510201201.1558972-1-willy@infradead.org
Fixes: 32c0a6bcaa ("btrfs: add and use readahead_batch_length")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-14 19:41:32 -07:00
Brian Foster
6e552494fb iomap: remove unused private field from ioend
The only remaining user of ->io_private is the generic ioend merging
infrastructure. The only user of that is XFS, which no longer sets
->io_private or passes an associated merge callback. Remove the
unused parameter and the ->io_private field.

CC: linux-fsdevel@vger.kernel.org
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-05-04 08:54:29 -07:00
Sami Tolvanen
4f0f586bf0 treewide: Change list_sort to use const pointers
list_sort() internally casts the comparison function passed to it
to a different type with constant struct list_head pointers, and
uses this pointer to call the functions, which trips indirect call
Control-Flow Integrity (CFI) checking.

Instead of removing the consts, this change defines the
list_cmp_func_t type and changes the comparison function types of
all list_sort() callers to use const pointers, thus avoiding type
mismatches.

Suggested-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210408182843.1754385-10-samitolvanen@google.com
2021-04-08 16:04:22 -07:00
Christoph Hellwig
a8affc03a9 block: rename BIO_MAX_PAGES to BIO_MAX_VECS
Ever since the addition of multipage bio_vecs BIO_MAX_PAGES has been
horribly confusingly misnamed.  Rename it to BIO_MAX_VECS to stop
confusing users of the bio API.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20210311110137.1132391-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-11 07:47:48 -07:00
Linus Torvalds
03dc748bf1 More new code for 5.12:
- Restore a disused sysctl control knob that was inadvertently dropped
   during the merge window to avoid fstests regressions.
 - Don't speculatively release freed blocks from the busy list until
   we're actually allocating them, which fixes a rare log recovery
   regression.
 - Don't nest transactions when scanning for free space.
 - Add an idiot^Wmaintainer light to detect nested transactions. ;)
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAmA3zW8ACgkQ+H93GTRK
 tOtT4xAAmZ5BdQ6V3yUeT/N++L6Ax62T2VzEryZvVK/ZFyVBRYKi9LOL1exq1cja
 HXINPuYWAD8TbGVU9/lZR1yUX/y1VvJR0EPly8EN6WpGFeErSxLs++YzP1Q8iv5i
 ZtniscpGE6JvCcDeRH5kBfklGpyzTf3t6Xe8x+6+/aawf34ChNlM/gQcAyKvYYU5
 Jb9j7BqbRAnhvPEfa554yxIIoZhmTDYY7Wx7VMKCMcOP1lfriC+I1iuiZIMONIQJ
 mMgz9XnHVo256+YvkvwRKp294r+MEkuJL5EBXrs01r3PwVdaigo13qTk8l1ZC3zS
 VYkC/sRoiyMwnJvKEUNtnM3/8Zu/DvPp9iqXiWc60UBGqpBkm8Jgv+W6H7u1FinP
 0M0Wt2wHC7e51uW5G/8QwUXZv+n8IZHyZkkYbjyXRkhfyFlexYwTVchZz9q/RB/A
 HEZ9jcIke8Rwkav4f0kJ00Y/7FQSPn6ItapXf92rl00z3Z5S2sqBaT5kIotsW0Ke
 634yPknkLuBDQg4j8l3A88ik2SNFRQQfBXsjt27He/s2wV0Dj8RjDnLWfoV7P5to
 Sc2lx3HhL4OCojAXXAFP3MDKz0nqcuUTPoPCeS6QKQGcjTzVvoI7ZutXODcxi67k
 Q7AK+gIqHRWA8F+4wciYDwAHMES1rRAa7/iuYmtCtT1sBdXp9NU=
 =g9K3
 -----END PGP SIGNATURE-----

Merge tag 'xfs-5.12-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull more xfs updates from Darrick Wong:
 "The most notable fix here prevents premature reuse of freed metadata
  blocks, and adding the ability to detect accidental nested
  transactions, which are not allowed here.

   - Restore a disused sysctl control knob that was inadvertently
     dropped during the merge window to avoid fstests regressions.

   - Don't speculatively release freed blocks from the busy list until
     we're actually allocating them, which fixes a rare log recovery
     regression.

   - Don't nest transactions when scanning for free space.

   - Add an idiot^Wmaintainer light to detect nested transactions. ;)"

* tag 'xfs-5.12-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: use current->journal_info for detecting transaction recursion
  xfs: don't nest transactions when scanning for eofblocks
  xfs: don't reuse busy extents on extent trim
  xfs: restore speculative_cow_prealloc_lifetime sysctl
2021-02-28 11:45:25 -08:00
Matthew Wilcox (Oracle)
5f7136db82 block: Add bio_max_segs
It's often inconvenient to use BIO_MAX_PAGES due to min() requiring the
sign to be the same.  Introduce bio_max_segs() and change BIO_MAX_PAGES to
be unsigned to make it easier for the users.

Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-26 15:49:51 -07:00
Dave Chinner
756b1c3433 xfs: use current->journal_info for detecting transaction recursion
Because the iomap code using PF_MEMALLOC_NOFS to detect transaction
recursion in XFS is just wrong. Remove it from the iomap code and
replace it with XFS specific internal checks using
current->journal_info instead.

[djwong: This change also realigns the lifetime of NOFS flag changes to
match the incore transaction, instead of the inconsistent scheme we have
now.]

Fixes: 9070733b4e ("xfs: abstract PF_FSTRANS to PF_MEMALLOC_NOFS")
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2021-02-25 08:07:04 -08:00
Roman Gushchin
bcfe06bf26 mm: memcontrol: Use helpers to read page's memcg data
Patch series "mm: allow mapping accounted kernel pages to userspace", v6.

Currently a non-slab kernel page which has been charged to a memory cgroup
can't be mapped to userspace.  The underlying reason is simple: PageKmemcg
flag is defined as a page type (like buddy, offline, etc), so it takes a
bit from a page->mapped counter.  Pages with a type set can't be mapped to
userspace.

But in general the kmemcg flag has nothing to do with mapping to
userspace.  It only means that the page has been accounted by the page
allocator, so it has to be properly uncharged on release.

Some bpf maps are mapping the vmalloc-based memory to userspace, and their
memory can't be accounted because of this implementation detail.

This patchset removes this limitation by moving the PageKmemcg flag into
one of the free bits of the page->mem_cgroup pointer.  Also it formalizes
accesses to the page->mem_cgroup and page->obj_cgroups using new helpers,
adds several checks and removes a couple of obsolete functions.  As the
result the code became more robust with fewer open-coded bit tricks.

This patch (of 4):

Currently there are many open-coded reads of the page->mem_cgroup pointer,
as well as a couple of read helpers, which are barely used.

It creates an obstacle on a way to reuse some bits of the pointer for
storing additional bits of information.  In fact, we already do this for
slab pages, where the last bit indicates that a pointer has an attached
vector of objcg pointers instead of a regular memcg pointer.

This commits uses 2 existing helpers and introduces a new helper to
converts all read sides to calls of these helpers:
  struct mem_cgroup *page_memcg(struct page *page);
  struct mem_cgroup *page_memcg_rcu(struct page *page);
  struct mem_cgroup *page_memcg_check(struct page *page);

page_memcg_check() is intended to be used in cases when the page can be a
slab page and have a memcg pointer pointing at objcg vector.  It does
check the lowest bit, and if set, returns NULL.  page_memcg() contains a
VM_BUG_ON_PAGE() check for the page not being a slab page.

To make sure nobody uses a direct access, struct page's
mem_cgroup/obj_cgroups is converted to unsigned long memcg_data.

Signed-off-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Link: https://lkml.kernel.org/r/20201027001657.3398190-1-guro@fb.com
Link: https://lkml.kernel.org/r/20201027001657.3398190-2-guro@fb.com
Link: https://lore.kernel.org/bpf/20201201215900.3569844-2-guro@fb.com
2020-12-02 18:28:05 -08:00
Brian Foster
50e7d6c7a5 iomap: clean up writeback state logic on writepage error
The iomap writepage error handling logic is a mash of old and
slightly broken XFS writepage logic. When keepwrite writeback state
tracking was introduced in XFS in commit 0d085a529b ("xfs: ensure
WB_SYNC_ALL writeback handles partial pages correctly"), XFS had an
additional cluster writeback context that scanned ahead of
->writepage() to process dirty pages over the current ->writepage()
extent mapping. This context expected a dirty page and required
retention of the TOWRITE tag on partial page processing so the
higher level writeback context would revisit the page (in contrast
to ->writepage(), which passes a page with the dirty bit already
cleared).

The cluster writeback mechanism was eventually removed and some of
the error handling logic folded into the primary writeback path in
commit 150d5be09c ("xfs: remove xfs_cancel_ioend"). This patch
accidentally conflated the two contexts by using the keepwrite logic
in ->writepage() without accounting for the fact that the page is
not dirty. Further, the keepwrite logic has no practical effect on
the core ->writepage() caller (write_cache_pages()) because it never
revisits a page in the current function invocation.

Technically, the page should be redirtied for the keepwrite logic to
have any effect. Otherwise, write_cache_pages() may find the tagged
page but will skip it since it is clean. Even if the page was
redirtied, however, there is still no practical effect to keepwrite
since write_cache_pages() does not wrap around within a single
invocation of the function. Therefore, the dirty page would simply
end up retagged on the next writeback sequence over the associated
range.

All that being said, none of this really matters because redirtying
a partially processed page introduces a potential infinite redirty
-> writeback failure loop that deviates from the current design
principle of clearing the dirty state on writepage failure to avoid
building up too much dirty, unreclaimable memory on the system.
Therefore, drop the spurious keepwrite usage and dirty state
clearing logic from iomap_writepage_map(), treat the partially
processed page the same as a fully processed page, and let the
imminent ioend failure clean up the writeback state.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-11-04 08:52:46 -08:00
Brian Foster
763e4cdc0f iomap: support partial page discard on writeback block mapping failure
iomap writeback mapping failure only calls into ->discard_page() if
the current page has not been added to the ioend. Accordingly, the
XFS callback assumes a full page discard and invalidation. This is
problematic for sub-page block size filesystems where some portion
of a page might have been mapped successfully before a failure to
map a delalloc block occurs. ->discard_page() is not called in that
error scenario and the bio is explicitly failed by iomap via the
error return from ->prepare_ioend(). As a result, the filesystem
leaks delalloc blocks and corrupts the filesystem block counters.

Since XFS is the only user of ->discard_page(), tweak the semantics
to invoke the callback unconditionally on mapping errors and provide
the file offset that failed to map. Update xfs_discard_page() to
discard the corresponding portion of the file and pass the range
along to iomap_invalidatepage(). The latter already properly handles
both full and sub-page scenarios by not changing any iomap or page
state on sub-page invalidations.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-11-04 08:52:46 -08:00
Matthew Wilcox (Oracle)
4595a298d5 iomap: Set all uptodate bits for an Uptodate page
For filesystems with block size < page size, we need to set all the
per-block uptodate bits if the page was already uptodate at the time
we create the per-block metadata.  This can happen if the page is
invalidated (eg by a write to drop_caches) but ultimately not removed
from the page cache.

This is a data corruption issue as page writeback skips blocks which
are marked !uptodate.

Fixes: 9dc55f1389 ("iomap: add support for sub-pagesize buffered I/O without buffer heads")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reported-by: Qian Cai <cai@redhat.com>
Cc: Brian Foster <bfoster@redhat.com>
Reviewed-by: Gao Xiang <hsiangkao@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-09-28 08:47:01 -07:00
Matthew Wilcox (Oracle)
81ee8e52a7 iomap: Change calling convention for zeroing
Pass the full length to iomap_zero() and dax_iomap_zero(), and have
them return how many bytes they actually handled.  This is preparatory
work for handling THP, although it looks like DAX could actually take
advantage of it if there's a larger contiguous area.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-09-21 08:59:27 -07:00
Matthew Wilcox (Oracle)
e25ba8cbfd iomap: Convert iomap_write_end types
iomap_write_end cannot return an error, so switch it to return
size_t instead of int and remove the error checking from the callers.
Also convert the arguments to size_t from unsigned int, in case anyone
ever wants to support a page size larger than 2GB.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-09-21 08:59:26 -07:00
Matthew Wilcox (Oracle)
0fb2d7209d iomap: Convert write_count to write_bytes_pending
Instead of counting bio segments, count the number of bytes submitted.
This insulates us from the block layer's definition of what a 'same page'
is, which is not necessarily clear once THPs are involved.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-09-21 08:59:26 -07:00
Matthew Wilcox (Oracle)
7d636676d2 iomap: Convert read_count to read_bytes_pending
Instead of counting bio segments, count the number of bytes submitted.
This insulates us from the block layer's definition of what a 'same page'
is, which is not necessarily clear once THPs are involved.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-09-21 08:59:26 -07:00
Matthew Wilcox (Oracle)
0a195b91e8 iomap: Support arbitrarily many blocks per page
Size the uptodate array dynamically to support larger pages in the
page cache.  With a 64kB page, we're only saving 8 bytes per page today,
but with a 2MB maximum page size, we'd have to allocate more than 4kB
per page.  Add a few debugging assertions.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-09-21 08:59:26 -07:00
Matthew Wilcox (Oracle)
b21866f514 iomap: Use bitmap ops to set uptodate bits
Now that the bitmap is protected by a spinlock, we can use the
more efficient bitmap ops instead of individual test/set bit ops.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-09-21 08:59:26 -07:00
Matthew Wilcox (Oracle)
a6901d4d14 iomap: Use kzalloc to allocate iomap_page
We can skip most of the initialisation, although spinlocks still
need explicit initialisation as architectures may use a non-zero
value to indicate unlocked.  The comment is no longer useful as
attach_page_private() handles the refcount now.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-09-21 08:59:26 -07:00
Matthew Wilcox (Oracle)
24addd848a fs: Introduce i_blocks_per_page
This helper is useful for both THPs and for supporting block size larger
than page size.  Convert all users that I could find (we have a few
different ways of writing this idiom, and I may have missed some).

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2020-09-21 08:59:26 -07:00
Matthew Wilcox (Oracle)
7ed3cd1a69 iomap: Fix misplaced page flushing
If iomap_unshare_actor() unshares to an inline iomap, the page was
not being flushed.  block_write_end() and __iomap_write_end() already
contain flushes, so adding it to iomap_write_end_inline() seems like
the best place.  That means we can remove it from iomap_write_actor().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-09-21 08:59:26 -07:00
Nikolay Borisov
6cc19c5fad iomap: Use round_down/round_up macros in __iomap_write_begin
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-09-21 08:59:25 -07:00
Matthew Wilcox (Oracle)
14284fedf5 iomap: Mark read blocks uptodate in write_begin
When bringing (portions of) a page uptodate, we were marking blocks that
were zeroed as being uptodate, but not blocks that were read from storage.

Like the previous commit, this problem was found with generic/127 and
a kernel which failed readahead I/Os.  This bug causes writes to be
silently lost when working with flaky storage.

Fixes: 9dc55f1389 ("iomap: add support for sub-pagesize buffered I/O without buffer heads")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-09-10 08:26:18 -07:00
Matthew Wilcox (Oracle)
e6e7ca9262 iomap: Clear page error before beginning a write
If we find a page in write_begin which is !Uptodate, we need
to clear any error on the page before starting to read data
into it.  This matches how filemap_fault(), do_read_cache_page()
and generic_file_buffered_read() handle PageError on !Uptodate pages.
When calling iomap_set_range_uptodate() in __iomap_write_begin(), blocks
were not being marked as uptodate.

This was found with generic/127 and a specially modified kernel which
would fail (some) readahead I/Os.  The test read some bytes in a prior
page which caused readahead to extend into page 0x34.  There was
a subsequent write to page 0x34, followed by a read to page 0x34.
Because the blocks were still marked as !Uptodate, the read caused all
blocks to be re-read, overwriting the write.  With this change, and the
next one, the bytes which were written are marked as being Uptodate, so
even though the page is still marked as !Uptodate, the blocks containing
the written data are not re-read from storage.

Fixes: 9dc55f1389 ("iomap: add support for sub-pagesize buffered I/O without buffer heads")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-09-10 08:26:17 -07:00
Linus Torvalds
593bd5e5d3 New code for 5.8:
- Fix an integer overflow problem in the unshare actor.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAl7fCTEACgkQ+H93GTRK
 tOuQxQ//Ya/xLx9UPoZepTzjHQKl2MlYVYRfKCL60NrH6kNpvq9jyGiPg6xOXc3g
 KGTe23YDiuP80L3hpIZ9yj/SbJAItI8LsqHHrvVDbAdVSQdK56ajZqq3xwyvOC9u
 RqCkGkVzRE+nmToJQbYCSmPA446aqMWuCpmlsTbuGmjvkRKAMgBBG/66nbcplQnC
 eeflcVW7IdbbQ45K8QpyP4AeNMobc26B7zmWqXYeZuMxHcFsrnvld3pgke39i8Hk
 k0SzMenGddYfb6/FknnxHASMnqnhE7lA1YyWe7F3uDM8OwmpNIseBysqm+6tETkn
 DBlcpVeENNJB7ygPhqOJXmmDGnap5Y7vwhAc8jX84yuXRkd0gx5aTRIyH8cNp9lQ
 TRwoVY9DTUkUlMkSLpgeCFIOR5SyOW3H4xZV4PC0sJxAWtM0J3B8A5zvAjQ5kVRP
 79gVRpl2OUj648nbrPRwhDBwnNZAhflRVvBh9kasteA7SAtuGJFJKZZ162Smltz2
 1E9i/2CvUUartNOjKkT3qPzAF6B1Je3AGTMwuDPhcYX9bdW+9pCD09yi1CiGOn7S
 QuuwyHTAcLRtZiShNCG6zQhqq++zQCZ58J1IBHYajE73YM1+8r/5wCfTIhB+CPuf
 J0rjqS+d151d2qMBnK6oag0t2u5Hj+xlcJw9QnQGqPKs6yIktA0=
 =s+Pr
 -----END PGP SIGNATURE-----

Merge tag 'iomap-5.8-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull iomap fix from Darrick Wong:
 "A single iomap bug fix for a variable type mistake on 32-bit
  architectures, fixing an integer overflow problem in the unshare
  actor"

* tag 'iomap-5.8-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  iomap: Fix unsharing of an extent >2GB on a 32-bit machine
2020-06-13 12:44:30 -07:00
Matthew Wilcox (Oracle)
d4ff3b2ef9 iomap: Fix unsharing of an extent >2GB on a 32-bit machine
Widen the type used for counting the number of bytes unshared.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-06-08 20:58:29 -07:00
Guoqing Jiang
58aeb73196 iomap: use attach/detach_page_private
Since the new pair function is introduced, we can call them to clean the
code in iomap.

Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dave Chinner <david@fromorbit.com>
Link: http://lkml.kernel.org/r/20200517214718.468-7-guoqing.jiang@cloud.ionos.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02 10:59:07 -07:00
Matthew Wilcox (Oracle)
9d24a13a93 iomap: convert from readpages to readahead
Use the new readahead operation in iomap.  Convert XFS and ZoneFS to use
it.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Cc: Chao Yu <yuchao0@huawei.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Eric Biggers <ebiggers@google.com>
Cc: Gao Xiang <gaoxiang25@huawei.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: Miklos Szeredi <mszeredi@redhat.com>
Link: http://lkml.kernel.org/r/20200414150233.24495-26-willy@infradead.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02 10:59:07 -07:00
Matthew Wilcox (Oracle)
d4388340ae fs: convert mpage_readpages to mpage_readahead
Implement the new readahead aop and convert all callers (block_dev,
exfat, ext2, fat, gfs2, hpfs, isofs, jfs, nilfs2, ocfs2, omfs, qnx6,
reiserfs & udf).

The callers are all trivial except for GFS2 & OCFS2.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com> # ocfs2
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> # ocfs2
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Cc: Chao Yu <yuchao0@huawei.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Eric Biggers <ebiggers@google.com>
Cc: Gao Xiang <gaoxiang25@huawei.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: Miklos Szeredi <mszeredi@redhat.com>
Link: http://lkml.kernel.org/r/20200414150233.24495-17-willy@infradead.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02 10:59:07 -07:00
Linus Torvalds
9744b923d5 Bug fixes for 5.7:
- Fix a problem in readahead where we can crash if we can't allocate a
 full bio due to GFP_NORETRY.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAl6GDloACgkQ+H93GTRK
 tOuH4g/+J95SMTPKVoChOwJ/vMS7VsPCrbj7GTgDe0LKtaRq5/dQ1fTd3QuSqAh9
 0KqCCvAb34y7oP58SccBlr11lRBsJPuWKVJAbiH02Hs+r+mWC+pbWtQz6PCez9wl
 hLN44jfsuLHPwivf3oHrmw8B5SHdFjkm659ocLseOJCI2tyQO0kZ8AUSEt22gx8r
 w/Z9pX2NCy7Bhzxpd5jvzh7xY27TTcI2M+i/FW9R4hgznM22Z2pZ8MoKTvDj5gly
 20UljmP0sTRgMOMavXs7ME2hHwXDitTIkmKD2JZjNkoeKxaRl4iz+9T0E4xZvdoZ
 lH+bbUikDocWlOn9zy9+VNBGqtLLKfBp8kOO/I1q1fkW3G/abEPqGvke2ehSmqFx
 65MWReBOS0L+g7A0KakHdJVKRuuK3lvXVls+a/2lzTjvjojXCBfy1YCTvhOP2dhT
 qO3d2ZPBEr+l0VKwLK4QQHz6/1zfJdUy2rFZZnagRUigKc9xW7oXXwkXRHd0sMRv
 KPOLwxeQu5TSf8hVSH9Z6M53TyEQXX2N0Kifr2GUFXkwoxY91I+hFnuGMSafoKTz
 CqjOJGNiBFM8ypT/FYRKSakZXEju6zHfyD0tQ0hnEAb2A2HCOJdGqejqbJRZywW2
 OM+loTn80spDyuoNpat3/bJBxQLAWX7Dq7m6zL1rkcQLtKxoxik=
 =5vQD
 -----END PGP SIGNATURE-----

Merge tag 'iomap-5.7-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull iomap fix from Darrick Wong:
 "Fix a problem in readahead where we can crash if we can't allocate a
  full bio due to GFP_NORETRY"

* tag 'iomap-5.7-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  iomap: Handle memory allocation failure in readahead
2020-04-08 21:37:18 -07:00
Linus Torvalds
9b06860d7c libnvdimm for 5.7
- Add support for region alignment configuration and enforcement to
   fix compatibility across architectures and PowerPC page size
   configurations.
 
 - Introduce 'zero_page_range' as a dax operation. This facilitates
   filesystem-dax operation without a block-device.
 
 - Introduce phys_to_target_node() to facilitate drivers that want to
   know resulting numa node if a given reserved address range was
   onlined.
 
 - Advertise a persistence-domain for of_pmem and papr_scm. The
   persistence domain indicates where cpu-store cycles need to reach in
   the platform-memory subsystem before the platform will consider them
   power-fail protected.
 
 - Promote numa_map_to_online_node() to a cross-kernel generic facility.
 
 - Save x86 numa information to allow for node-id lookups for reserved
   memory ranges, deploy that capability for the e820-pmem driver.
 
 - Pick up some miscellaneous minor fixes, that missed v5.6-final,
   including a some smatch reports in the ioctl path and some unit test
   compilation fixups.
 
 - Fixup some flexible-array declarations.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEf41QbsdZzFdA8EfZHtKRamZ9iAIFAl6LtIAACgkQHtKRamZ9
 iAIwRA/8CLVVuQpgHQ1tqK4h8CZPrISFXh7wy7uhocEU2xrDh6iGVnLztmoLRr2k
 5f8T9lRzreSAwIVL5DbGqP1pFncqIt9VMnKsFlaPMBGCBNR+hURY0iBCNjIT+jiq
 BOzLd52MR2rqJxeXGTMUbWrBrbmuj4mZPdmGVuFFe7GFRpoaVpCgOo+296eWa/ot
 gIOFUTonZY7STYjNvDok0TXCmiCFuJb+P+y5ldfCPShHvZhTiaF53jircja8vAjO
 G5dt8ixBKUK0rXRc4SEQsQhAZNcAFHb6Gy5lg4C2QzhTF374xTc9usJZNWbIE9iM
 5mipBYvjVuoY+XaCNZDkaRcJIy/jqB15O6l3QIWbZLGaK9m95YPp9LmkPFwd3JpO
 e3rO24ML471DxqB9iWIiJCNcBBocLOlnd6qAQTpppWDpGNbudwXvfsmKHmKIScSE
 x+IDCdscLmmm+WG2dLmLraWOVPu42xZFccoQCi4M3TTqfeB9pZ9XckFQ37zX62zG
 5t+7Ek+t1W4QVt/JQYVKH03XT15sqUpVknvx0Hl4Y5TtbDOkFLkO8RN0/HyExDef
 7iegS35kqTsM4EfZQ+9juKbI2JBAjHANcbj0V4dogqaRj6vr3akumBzUtuYqAofv
 qU3s9skmLsEemOJC+ns2PT8vl5dyIoeDfH0r2XvGWxYqolMqJpA=
 =sY4N
 -----END PGP SIGNATURE-----

Merge tag 'libnvdimm-for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm and dax updates from Dan Williams:
 "There were multiple touches outside of drivers/nvdimm/ this round to
  add cross arch compatibility to the devm_memremap_pages() interface,
  enhance numa information for persistent memory ranges, and add a
  zero_page_range() dax operation.

  This cycle I switched from the patchwork api to Konstantin's b4 script
  for collecting tags (from x86, PowerPC, filesystem, and device-mapper
  folks), and everything looks to have gone ok there. This has all
  appeared in -next with no reported issues.

  Summary:

   - Add support for region alignment configuration and enforcement to
     fix compatibility across architectures and PowerPC page size
     configurations.

   - Introduce 'zero_page_range' as a dax operation. This facilitates
     filesystem-dax operation without a block-device.

   - Introduce phys_to_target_node() to facilitate drivers that want to
     know resulting numa node if a given reserved address range was
     onlined.

   - Advertise a persistence-domain for of_pmem and papr_scm. The
     persistence domain indicates where cpu-store cycles need to reach
     in the platform-memory subsystem before the platform will consider
     them power-fail protected.

   - Promote numa_map_to_online_node() to a cross-kernel generic
     facility.

   - Save x86 numa information to allow for node-id lookups for reserved
     memory ranges, deploy that capability for the e820-pmem driver.

   - Pick up some miscellaneous minor fixes, that missed v5.6-final,
     including a some smatch reports in the ioctl path and some unit
     test compilation fixups.

   - Fixup some flexible-array declarations"

* tag 'libnvdimm-for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (29 commits)
  dax: Move mandatory ->zero_page_range() check in alloc_dax()
  dax,iomap: Add helper dax_iomap_zero() to zero a range
  dax: Use new dax zero page method for zeroing a page
  dm,dax: Add dax zero_page_range operation
  s390,dcssblk,dax: Add dax zero_page_range operation to dcssblk driver
  dax, pmem: Add a dax operation zero_page_range
  pmem: Add functions for reading/writing page to/from pmem
  libnvdimm: Update persistence domain value for of_pmem and papr_scm device
  tools/test/nvdimm: Fix out of tree build
  libnvdimm/region: Fix build error
  libnvdimm/region: Replace zero-length array with flexible-array member
  libnvdimm/label: Replace zero-length array with flexible-array member
  ACPI: NFIT: Replace zero-length array with flexible-array member
  libnvdimm/region: Introduce an 'align' attribute
  libnvdimm/region: Introduce NDD_LABELING
  libnvdimm/namespace: Enforce memremap_compat_align()
  libnvdimm/pfn: Prevent raw mode fallback if pfn-infoblock valid
  libnvdimm: Out of bounds read in __nd_ioctl()
  acpi/nfit: improve bounds checking for 'func'
  mm/memremap_pages: Introduce memremap_compat_align()
  ...
2020-04-08 21:03:40 -07:00
Vivek Goyal
4f3b4f161d dax,iomap: Add helper dax_iomap_zero() to zero a range
Add a helper dax_ioamp_zero() to zero a range. This patch basically
merges __dax_zero_page_range() and iomap_dax_zero().

Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20200228163456.1587-7-vgoyal@redhat.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2020-04-02 19:15:03 -07:00
Matthew Wilcox (Oracle)
457df33e03 iomap: Handle memory allocation failure in readahead
bio_alloc() can fail when we use GFP_NORETRY.  If it does, allocate
a bio large enough for a single page like mpage_readpages() does.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-04-02 09:08:53 -07:00
Matthew Wilcox (Oracle)
1ac994525b iomap: Remove pgoff from tracepoints
The 'pgoff' displayed by the tracepoints wasn't a pgoff at all; it
was a byte offset from the start of the file.  We already emit that in
the form of the 'offset', so we can just remove pgoff.  That means we
can remove 'page' as an argument to the tracepoint, and rename this
type of tracepoint from being a page class to being a range class.

Fixes: 0b1b213fcf ("xfs: event tracing support")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-03-05 07:30:54 -08:00
Andreas Gruenbacher
243145bc43 fs: Fix page_mkwrite off-by-one errors
The check in block_page_mkwrite that is meant to determine whether an
offset is within the inode size is off by one.  This bug has been copied
into iomap_page_mkwrite and several filesystems (ubifs, ext4, f2fs,
ceph).

Fix that by introducing a new page_mkwrite_check_truncate helper that
checks for truncate and computes the bytes in the page up to EOF.  Use
the helper in iomap.

NOTE from Darrick: The original patch fixed a number of filesystems, but
then there were merge conflicts with the f2fs for-next tree; a
subsequent re-submission of the patch had different btrfs changes with
no explanation; and Christoph complained that each per-fs fix should be
a separate patch.  In my view that's too much risk to take on, so I
decided to drop all the hunks except for iomap, since I've actually QA'd
XFS.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
[darrick: drop everything but the iomap parts]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-01-06 08:58:23 -08:00
Zorro Lang
c275779ff2 iomap: stop using ioend after it's been freed in iomap_finish_ioend()
This patch fixes the following KASAN report. The @ioend has been
freed by dio_put(), but the iomap_finish_ioend() still trys to access
its data.

[20563.631624] BUG: KASAN: use-after-free in iomap_finish_ioend+0x58c/0x5c0
[20563.638319] Read of size 8 at addr fffffc0c54a36928 by task kworker/123:2/22184

[20563.647107] CPU: 123 PID: 22184 Comm: kworker/123:2 Not tainted 5.4.0+ #1
[20563.653887] Hardware name: HPE Apollo 70             /C01_APACHE_MB         , BIOS L50_5.13_1.11 06/18/2019
[20563.664499] Workqueue: xfs-conv/sda5 xfs_end_io [xfs]
[20563.669547] Call trace:
[20563.671993]  dump_backtrace+0x0/0x370
[20563.675648]  show_stack+0x1c/0x28
[20563.678958]  dump_stack+0x138/0x1b0
[20563.682455]  print_address_description.isra.9+0x60/0x378
[20563.687759]  __kasan_report+0x1a4/0x2a8
[20563.691587]  kasan_report+0xc/0x18
[20563.694985]  __asan_report_load8_noabort+0x18/0x20
[20563.699769]  iomap_finish_ioend+0x58c/0x5c0
[20563.703944]  iomap_finish_ioends+0x110/0x270
[20563.708396]  xfs_end_ioend+0x168/0x598 [xfs]
[20563.712823]  xfs_end_io+0x1e0/0x2d0 [xfs]
[20563.716834]  process_one_work+0x7f0/0x1ac8
[20563.720922]  worker_thread+0x334/0xae0
[20563.724664]  kthread+0x2c4/0x348
[20563.727889]  ret_from_fork+0x10/0x18

[20563.732941] Allocated by task 83403:
[20563.736512]  save_stack+0x24/0xb0
[20563.739820]  __kasan_kmalloc.isra.9+0xc4/0xe0
[20563.744169]  kasan_slab_alloc+0x14/0x20
[20563.747998]  slab_post_alloc_hook+0x50/0xa8
[20563.752173]  kmem_cache_alloc+0x154/0x330
[20563.756185]  mempool_alloc_slab+0x20/0x28
[20563.760186]  mempool_alloc+0xf4/0x2a8
[20563.763845]  bio_alloc_bioset+0x2d0/0x448
[20563.767849]  iomap_writepage_map+0x4b8/0x1740
[20563.772198]  iomap_do_writepage+0x200/0x8d0
[20563.776380]  write_cache_pages+0x8a4/0xed8
[20563.780469]  iomap_writepages+0x4c/0xb0
[20563.784463]  xfs_vm_writepages+0xf8/0x148 [xfs]
[20563.788989]  do_writepages+0xc8/0x218
[20563.792658]  __writeback_single_inode+0x168/0x18f8
[20563.797441]  writeback_sb_inodes+0x370/0xd30
[20563.801703]  wb_writeback+0x2d4/0x1270
[20563.805446]  wb_workfn+0x344/0x1178
[20563.808928]  process_one_work+0x7f0/0x1ac8
[20563.813016]  worker_thread+0x334/0xae0
[20563.816757]  kthread+0x2c4/0x348
[20563.819979]  ret_from_fork+0x10/0x18

[20563.825028] Freed by task 22184:
[20563.828251]  save_stack+0x24/0xb0
[20563.831559]  __kasan_slab_free+0x10c/0x180
[20563.835648]  kasan_slab_free+0x10/0x18
[20563.839389]  slab_free_freelist_hook+0xb4/0x1c0
[20563.843912]  kmem_cache_free+0x8c/0x3e8
[20563.847745]  mempool_free_slab+0x20/0x28
[20563.851660]  mempool_free+0xd4/0x2f8
[20563.855231]  bio_free+0x33c/0x518
[20563.858537]  bio_put+0xb8/0x100
[20563.861672]  iomap_finish_ioend+0x168/0x5c0
[20563.865847]  iomap_finish_ioends+0x110/0x270
[20563.870328]  xfs_end_ioend+0x168/0x598 [xfs]
[20563.874751]  xfs_end_io+0x1e0/0x2d0 [xfs]
[20563.878755]  process_one_work+0x7f0/0x1ac8
[20563.882844]  worker_thread+0x334/0xae0
[20563.886584]  kthread+0x2c4/0x348
[20563.889804]  ret_from_fork+0x10/0x18

[20563.894855] The buggy address belongs to the object at fffffc0c54a36900
                which belongs to the cache bio-1 of size 248
[20563.906844] The buggy address is located 40 bytes inside of
                248-byte region [fffffc0c54a36900, fffffc0c54a369f8)
[20563.918485] The buggy address belongs to the page:
[20563.923269] page:ffffffff82f528c0 refcount:1 mapcount:0 mapping:fffffc8e4ba31900 index:0xfffffc0c54a33300
[20563.932832] raw: 17ffff8000000200 ffffffffa3060100 0000000700000007 fffffc8e4ba31900
[20563.940567] raw: fffffc0c54a33300 0000000080aa0042 00000001ffffffff 0000000000000000
[20563.948300] page dumped because: kasan: bad access detected

[20563.955345] Memory state around the buggy address:
[20563.960129]  fffffc0c54a36800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fc
[20563.967342]  fffffc0c54a36880: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[20563.974554] >fffffc0c54a36900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[20563.981766]                                   ^
[20563.986288]  fffffc0c54a36980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fc
[20563.993501]  fffffc0c54a36a00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[20564.000713] ==================================================================

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=205703
Signed-off-by: Zorro Lang <zlang@redhat.com>
Fixes: 9cd0ed63ca ("iomap: enhance writeback error message")
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2019-12-05 07:41:16 -08:00
Christoph Hellwig
1cea335d1d iomap: fix sub-page uptodate handling
bio completions can race when a page spans more than one file system
block.  Add a spinlock to synchronize marking the page uptodate.

Fixes: 9dc55f1389 ("iomap: add support for sub-pagesize buffered I/O without buffer heads")
Reported-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-12-04 09:33:52 -08:00
Andreas Gruenbacher
add66fcbd3 iomap: Fix overflow in iomap_page_mkwrite
On architectures where loff_t is wider than pgoff_t, the expression
((page->index + 1) << PAGE_SHIFT) can overflow.  Rewrite to use the page
offset, which we already compute here anyway.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-07 07:28:18 -08:00
Goldwyn Rodrigues
c039b99792 iomap: use a srcmap for a read-modify-write I/O
The srcmap is used to identify where the read is to be performed from.
It is passed to ->iomap_begin, which can fill it in if we need to read
data for partially written blocks from a different location than the
write target.  The srcmap is only supported for buffered writes so far.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
[hch: merged two patches, removed the IOMAP_F_COW flag, use iomap as
      srcmap if not set, adjust length down to srcmap end as well]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Acked-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
2019-10-21 08:51:59 -07:00
Christoph Hellwig
32a38a4991 iomap: use write_begin to read pages to unshare
Use the existing iomap write_begin code to read the pages unshared
by iomap_file_unshare.  That avoids the extra ->readpage call and
extent tree lookup currently done by read_mapping_page.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21 08:51:59 -07:00
Christoph Hellwig
d3b4043969 iomap: move the zeroing case out of iomap_read_page_sync
That keeps the function a little easier to understand, and easier to
modify for pending enhancements.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21 08:51:59 -07:00
Christoph Hellwig
3590c4d897 iomap: ignore non-shared or non-data blocks in xfs_file_dirty
xfs_file_dirty is used to unshare reflink blocks.  Rename the function
to xfs_file_unshare to better document that purpose, and skip iomaps
that are not shared and don't need zeroing.  This will allow to simplify
the caller.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21 08:51:59 -07:00
Christoph Hellwig
dcd6158d15 iomap: always use AOP_FLAG_NOFS in iomap_write_begin
All callers pass AOP_FLAG_NOFS, so lift that flag to iomap_write_begin
to allow reusing the flags arguments for an internal flags namespace
soon.  Also remove the local index variable that is only used once.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21 08:51:59 -07:00
Christoph Hellwig
c12d6fa88d iomap: remove the unused iomap argument to __iomap_write_end
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21 08:51:59 -07:00
Darrick J. Wong
9cd0ed63ca iomap: enhance writeback error message
If we encounter an IO error during writeback, log the inode, offset, and
sector number of the failure, instead of forcing the user to do some
sort of reverse mapping to figure out which file is affected.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2019-10-21 08:51:59 -07:00
Christoph Hellwig
48d64cd18b iomap: pass a struct page to iomap_finish_page_writeback
No need to pass the full bio_vec.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21 08:51:59 -07:00
Christoph Hellwig
b3d423ec89 iomap: cleanup iomap_ioend_compare
Move the initialization of ia and ib to the declaration line and remove
a superflous else.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21 08:51:59 -07:00
Christoph Hellwig
ab08b01ec0 iomap: move struct iomap_page out of iomap.h
Now that all the writepage code is in the iomap code there is no
need to keep this structure public.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21 08:51:59 -07:00
Christoph Hellwig
3e19e6f3ee iomap: warn on inline maps in iomap_writepage_map
And inline mapping should never mark the page dirty and thus never end up
in writepages.  Add a check for that condition and warn if it happens.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21 08:51:59 -07:00
Christoph Hellwig
598ecfbaa7 iomap: lift the xfs writeback code to iomap
Take the xfs writeback code and move it to fs/iomap.  A new structure
with three methods is added as the abstraction from the generic writeback
code to the file system.  These methods are used to map blocks, submit an
ioend, and cancel a page that encountered an error before it was added to
an ioend.

Signed-off-by: Christoph Hellwig <hch@lst.de>
[darrick: rename ->submit_ioend to ->prepare_ioend to clarify what it
does]
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2019-10-21 08:51:59 -07:00
Christoph Hellwig
9e91c5728c iomap: lift common tracing code from xfs to iomap
Lift the xfs code for tracing address space operations to the iomap
layer.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21 08:51:59 -07:00
Christoph Hellwig
009d8d849d iomap: zero newly allocated mapped blocks
File systems like gfs2 don't support delayed allocations or unwritten
extents and thus allocate normal mapped blocks to fill holes.  To
cover the case of such file systems allocating new blocks to fill holes
also zero out mapped blocks with the new flag.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21 08:51:59 -07:00
Linus Torvalds
26473f8370 Also new for 5.3:
- Regroup the fs/iomap.c code by major functional area so that we can
   start development for 5.4 from a more stable base.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAl0vMvMACgkQ+H93GTRK
 tOtgsw//Xrqy6pYnohvltKkmE2Ioo17Ylctg15MZpicxSREyozSntdUbPJ8Hv3qF
 uM80Z9PJh/XzlTbDbQ+bvEj6kAQxClGmcoKn8vBScW0LBqRz5rMwhJE2C8hyRx08
 hf310FPnZnyJK7jWGjZFhg1EsIqzQD8TZVNt4+sT/Kz/dWglkeT5sXJtoGTT8WI2
 Rgx8U8AYdpjaKfUf7X7ab68krYBNOrUS6vRp+4sfts6s7y4zILOom2QdDblwWT54
 pruq6iS4+2gyf4Pl7HXYT2A17R/coTb0AOrWNC3Sg0W4I6gdfoTXeten7jUVgXvl
 eXKOPHYYXqJadvdjPx7+DFW7sy6RSP8xe/KUp9uiEOW4dmKqxTrEoxYgFNBXgjwC
 FBUwgc2vhAw8o3P+/NcfbqYWwF/2fDvDBTQZ3kdwpmrFQqzhDyRxr5hPrhObuo5r
 wAJgP8F4M5KKdos0lg9jR4cirrInEzUOeHaLhFC+d9cFMNcxRo8ddx5KriMHVvuA
 JWgeXWvRKL3nPtbnyLRVxeEGmjhjwMkntKaCPqgD4FOD1+CGUuBtzykcPMbGfSS0
 sZd/qEJ6lZqYKRxee/R1d5RkJx+86TG3ZdWvuc49zSYavMLuqG/l2ohmfQ1P03nA
 Ux+8Bg6BbMGzlkVPXgiogHBN6ro2ZrjsHzu8E6+IuEXeL3NIC8A=
 =3uGR
 -----END PGP SIGNATURE-----

Merge tag 'iomap-5.3-merge-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull iomap split/cleanup from Darrick Wong:
 "As promised, here's the second part of the iomap merge for 5.3, in
  which we break up iomap.c into smaller files grouped by functional
  area so that it'll be easier in the long run to maintain cohesiveness
  of code units and to review incoming patches. There are no functional
  changes and fs/iomap.c split cleanly.

  Summary:

   - Regroup the fs/iomap.c code by major functional area so that we can
     start development for 5.4 from a more stable base"

* tag 'iomap-5.3-merge-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  iomap: move internal declarations into fs/iomap/
  iomap: move the main iteration code into a separate file
  iomap: move the buffered IO code into a separate file
  iomap: move the direct IO code into a separate file
  iomap: move the SEEK_HOLE code into a separate file
  iomap: move the file mapping reporting code into a separate file
  iomap: move the swapfile code into a separate file
  iomap: start moving code to fs/iomap/
2019-07-19 11:38:12 -07:00
Darrick J. Wong
afc51aaa22 iomap: move the buffered IO code into a separate file
Move the buffered IO code into a separate file so that we can group
related functions in a single file instead of having a single enormous
source file.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2019-07-17 07:16:00 -07:00