Commit Graph

4006 Commits

Author SHA1 Message Date
David Howells
fc9de52de3 rxrpc: Fix missing locking causing hanging calls
If a call gets aborted (e.g. because kafs saw a signal) between it being
queued for connection and the I/O thread picking up the call, the abort
will be prioritised over the connection and it will be removed from
local->new_client_calls by rxrpc_disconnect_client_call() without a lock
being held.  This may cause other calls on the list to disappear if a race
occurs.

Fix this by taking the client_call_lock when removing a call from whatever
list its ->wait_link happens to be on.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-afs@lists.infradead.org
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Fixes: 9d35d880e0 ("rxrpc: Move client call connection to the I/O thread")
Link: https://patch.msgid.link/726660.1730898202@warthog.procyon.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-07 11:30:34 -08:00
Jaewon Kim
1f2d03cc53 vmscan: add a vmscan event for reclaim_pages
reclaim_folio_list uses a dummy reclaim_stat and is not being used.  To
know the memory stat, add a new trace event.  This is useful how how many
pages are not reclaimed or why.

This is an example:

mm_vmscan_reclaim_pages: nid=0 nr_scanned=112 nr_reclaimed=112 nr_dirty=0 nr_writeback=0 nr_congested=0 nr_immediate=0 nr_activate_anon=0 nr_activate_file=0 nr_ref_keep=0 nr_unmap_fail=0

Currently reclaim_folio_list is only called by reclaim_pages, and
reclaim_pages is used by damon and madvise.  In the latest Android,
reclaim_pages is also used by shmem to reclaim all pages in a
address_space.

[jaewon31.kim@samsung.com: use sc.nr_scanned rather than new counting]
  Link: https://lkml.kernel.org/r/20241016143227.961162-1-jaewon31.kim@samsung.com
Link: https://lkml.kernel.org/r/20241011124928.1224813-1-jaewon31.kim@samsung.com
Signed-off-by: Jaewon Kim <jaewon31.kim@samsung.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Jaewon Kim <jaewon31.kim@samsung.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-06 20:11:13 -08:00
Shakeel Butt
0aa3ef3637 memcg: add tracing for memcg stat updates
The memcg stats are maintained in rstat infrastructure which provides very
fast updates side and reasonable read side.  However memcg added plethora
of stats and made the read side, which is cgroup rstat flush, very slow. 
To solve that, threshold was added in the memcg stats read side i.e.  no
need to flush the stats if updates are within the threshold.

This threshold based improvement worked for sometime but more stats were
added to memcg and also the read codepath was getting triggered in the
performance sensitive paths which made threshold based ratelimiting
ineffective.  We need more visibility into the hot and cold stats i.e. 
stats with a lot of updates.  Let's add trace to get that visibility.

[shakeel.butt@linux.dev: use unsigned long type for memcg_rstat_events, per Yosry]
  Link: https://lkml.kernel.org/r/20241015213721.3804209-1-shakeel.butt@linux.dev
Link: https://lkml.kernel.org/r/20241010003550.3695245-1-shakeel.butt@linux.dev
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Reviewed-by: Yosry Ahmed <yosryahmed@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: T.J. Mercier <tjmercier@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: JP Kobryn <inwardvessel@gmail.com>
Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-06 20:11:12 -08:00
Alice Ryhl
91d39024e1 rust: samples: add tracepoint to Rust sample
This updates the Rust printing sample to invoke a tracepoint. This
ensures that we have a user in-tree from the get-go even though the
patch is being merged before its real user.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Alex Gaynor <alex.gaynor@gmail.com>
Cc: Wedson Almeida Filho <wedsonaf@gmail.com>
Cc: Gary Guo <gary@garyguo.net>
Cc: " =?utf-8?q?Bj=C3=B6rn_Roy_Baron?= " <bjorn3_gh@protonmail.com>
Cc: Benno Lossin <benno.lossin@proton.me>
Cc: Andreas Hindborg <a.hindborg@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Uros Bizjak <ubizjak@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Fuad Tabba <tabba@google.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Anup Patel <apatel@ventanamicro.com>
Cc: Andrew Jones <ajones@ventanamicro.com>
Cc: Alexandre Ghiti <alexghiti@rivosinc.com>
Cc: Conor Dooley <conor.dooley@microchip.com>
Cc: Samuel Holland <samuel.holland@sifive.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Bibo Mao <maobibo@loongson.cn>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Tianrui Zhao <zhaotianrui@loongson.cn>
Link: https://lore.kernel.org/20241030-tracepoint-v12-3-eec7f0f8ad22@google.com
Reviewed-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-11-04 16:21:44 -05:00
Alice Ryhl
ad37bcd965 rust: add tracepoint support
Make it possible to have Rust code call into tracepoints defined by C
code. It is still required that the tracepoint is declared in a C
header, and that this header is included in the input to bindgen.

Instead of calling __DO_TRACE directly, the exported rust_do_trace_
function calls an inline helper function. This is because the `cond`
argument does not exist at the callsite of DEFINE_RUST_DO_TRACE.

__DECLARE_TRACE always emits an inline static and an extern declaration
that is only used when CREATE_RUST_TRACE_POINTS is set. These should not
end up in the final binary so it is not a problem that they sometimes
are emitted without a user.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Alex Gaynor <alex.gaynor@gmail.com>
Cc: Wedson Almeida Filho <wedsonaf@gmail.com>
Cc: " =?utf-8?q?Bj=C3=B6rn_Roy_Baron?= " <bjorn3_gh@protonmail.com>
Cc: Benno Lossin <benno.lossin@proton.me>
Cc: Andreas Hindborg <a.hindborg@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Uros Bizjak <ubizjak@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Fuad Tabba <tabba@google.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Anup Patel <apatel@ventanamicro.com>
Cc: Andrew Jones <ajones@ventanamicro.com>
Cc: Alexandre Ghiti <alexghiti@rivosinc.com>
Cc: Conor Dooley <conor.dooley@microchip.com>
Cc: Samuel Holland <samuel.holland@sifive.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Bibo Mao <maobibo@loongson.cn>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Tianrui Zhao <zhaotianrui@loongson.cn>
Link: https://lore.kernel.org/20241030-tracepoint-v12-2-eec7f0f8ad22@google.com
Reviewed-by: Carlos Llamas <cmllamas@google.com>
Reviewed-by: Gary Guo <gary@garyguo.net>
Reviewed-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-11-04 16:21:44 -05:00
Mathieu Desnoyers
654ced4a13 tracing: Introduce tracepoint_is_faultable()
Introduce a "faultable" flag within the extended structure to know
whether a tracepoint needs rcu tasks trace grace period before reclaim.
This can be queried using tracepoint_is_faultable().

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Jordan Rife <jrife@google.com>
Cc: Michael Jeanson <mjeanson@efficios.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Yonghong Song <yhs@fb.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: bpf@vger.kernel.org
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Jordan Rife <jrife@google.com>
Cc: linux-trace-kernel@vger.kernel.org
Link: https://lore.kernel.org/20241031152056.744137-3-mathieu.desnoyers@efficios.com
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-11-01 14:37:31 -04:00
Linus Torvalds
d56239a82e vfs-6.12-rc6.fixes
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZyTGAQAKCRCRxhvAZXjc
 opd6AQCal4omyfS8FYe4VRRZ/0XHouagq99I0U0TAmKkvoKAsgD/XrdE+pSTEkPX
 Pv4T9phh1cZRxcyKVu77UoYkuHJEDAg=
 =Lu9R
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.12-rc6.fixes' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs

Pull filesystem fixes from Christian Brauner:
 "VFS:

   - Fix copy_page_from_iter_atomic() if KMAP_LOCAL_FORCE_MAP=y is set

   - Add a get_tree_bdev_flags() helper that allows to modify e.g.,
     whether errors are logged into the filesystem context during
     superblock creation. This is used by erofs to fix a userspace
     regression where an error is currently logged when its used on a
     regular file which is an new allowed mode in erofs.

  netfs:

   - Fix the sysfs debug path in the documentation.

   - Fix iov_iter_get_pages*() for folio queues by skipping the page
     extracation if we're at the end of a folio.

  afs:

   - Fix moving subdirectories to different parent directory.

  autofs:

   - Fix handling of AUTOFS_DEV_IOCTL_TIMEOUT_CMD ioctl in
     validate_dev_ioctl(). The actual ioctl number, not the ioctl
     command needs to be checked for autofs"

* tag 'vfs-6.12-rc6.fixes' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs:
  iov_iter: fix copy_page_from_iter_atomic() if KMAP_LOCAL_FORCE_MAP
  autofs: fix thinko in validate_dev_ioctl()
  iov_iter: Fix iov_iter_get_pages*() for folio_queue
  afs: Fix missing subdir edit when renamed between parent dirs
  doc: correcting the debug path for cachefiles
  erofs: use get_tree_bdev_flags() to avoid misleading messages
  fs/super.c: introduce get_tree_bdev_flags()
2024-11-01 07:37:10 -10:00
Avadhut Naik
d4fca1358e x86/MCE/AMD: Add support for new MCA_SYND{1,2} registers
Starting with Zen4, AMD's Scalable MCA systems incorporate two new registers:
MCA_SYND1 and MCA_SYND2.

These registers will include supplemental error information in addition to the
existing MCA_SYND register. The data within these registers is considered
valid if MCA_STATUS[SyndV] is set.

Userspace error decoding tools like rasdaemon gather related hardware error
information through the tracepoints.

Therefore, export these two registers through the mce_record tracepoint so
that tools like rasdaemon can parse them and output the supplemental error
information like FRU text contained in them.

  [ bp: Massage. ]

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Link: https://lore.kernel.org/r/20241022194158.110073-4-avadhut.naik@amd.com
2024-10-31 10:36:07 +01:00
Steven Rostedt
e52750fb14 tracing: Add __print_dynamic_array() helper
When printing a dynamic array in a trace event, the method is rather ugly.
It has the format of:

  __print_array(__get_dynamic_array(array),
            __get_dynmaic_array_len(array) / el_size, el_size)

Since dynamic arrays are known to the tracing infrastructure, create a
helper macro that does the above for you.

  __print_dynamic_array(array, el_size)

Which would expand to the same output.

Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Link: https://lore.kernel.org/r/20241022194158.110073-3-avadhut.naik@amd.com
2024-10-30 17:24:32 +01:00
Avadhut Naik
750fd23926 x86/mce: Add wrapper for struct mce to export vendor specific info
Currently, exporting new additional machine check error information
involves adding new fields for the same at the end of the struct mce.
This additional information can then be consumed through mcelog or
tracepoint.

However, as new MSRs are being added (and will be added in the future)
by CPU vendors on their newer CPUs with additional machine check error
information to be exported, the size of struct mce will balloon on some
CPUs, unnecessarily, since those fields are vendor-specific. Moreover,
different CPU vendors may export the additional information in varying
sizes.

The problem particularly intensifies since struct mce is exposed to
userspace as part of UAPI. It's bloating through vendor-specific data
should be avoided to limit the information being sent out to userspace.

Add a new structure mce_hw_err to wrap the existing struct mce. The same
will prevent its ballooning since vendor-specifc data, if any, can now be
exported through a union within the wrapper structure and through
__dynamic_array in mce_record tracepoint.

Furthermore, new internal kernel fields can be added to the wrapper
struct without impacting the user space API.

  [ bp: Restore reverse x-mas tree order of function vars declarations. ]

Suggested-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Link: https://lore.kernel.org/r/20241022194158.110073-2-avadhut.naik@amd.com
2024-10-30 17:18:59 +01:00
Pavel Begunkov
2946f08ae9 io_uring: clean up cqe trace points
We have too many helpers posting CQEs, instead of tracing completion
events before filling in a CQE and thus having to pass all the data,
set the CQE first, pass it to the tracing helper and let it extract
everything it needs.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b83c1ca9ee5aed2df0f3bb743bf5ed699cce4c86.1729267437.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-10-29 13:43:27 -06:00
Sean Anderson
68b6dbf1f4 dma-mapping: trace more error paths
It can be surprising to the user if DMA functions are only traced on
success. On failure, it can be unclear what the source of the problem
is. Fix this by tracing all functions even when they fail. Cases where
we BUG/WARN are skipped, since those should be sufficiently noisy
already.

Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2024-10-29 08:54:06 +01:00
Sean Anderson
c4484ab86e dma-mapping: use trace_dma_alloc for dma_alloc* instead of using trace_dma_map
In some cases, we use trace_dma_map to trace dma_alloc* functions. This
generally follows dma_debug. However, this does not record all of the
relevant information for allocations, such as GFP flags. Create new
dma_alloc tracepoints for these functions. Note that while
dma_alloc_noncontiguous may allocate discontiguous pages (from the CPU's
point of view), the device will only see one contiguous mapping.
Therefore, we just need to trace dma_addr and size.

Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2024-10-29 08:54:03 +01:00
Sean Anderson
3afff779a7 dma-mapping: trace dma_alloc/free direction
In preparation for using these tracepoints in a few more places, trace
the DMA direction as well. For coherent allocations this is always
bidirectional.

Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2024-10-29 08:54:00 +01:00
Sean Anderson
5af5fc895f dma-mapping: use macros to define events in a class
Use a macro to avoid repeating the parameters and arguments for each event
in a class.

Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2024-10-29 08:53:57 +01:00
Uwe Kleine-König
acf2b31489 pwm: Support for duty_offset
Support a new abstraction for pwm configuration that allows to specify
 the time between start of period and the raising edge of the signal
 ("duty offset").
 
 This is used in a patch series by Trevor Gamblin for triggering an ADC
 conversion and afterwards read out the result. See
 https://lore.kernel.org/linux-iio/20240909-ad7625_r1-v5-0-60a397768b25@baylibre.com/
 for more details.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEP4GsaTp6HlmJrf7Tj4D7WH0S/k4FAmcDeZsACgkQj4D7WH0S
 /k7gqwf+LcXVzZ9APuhh7hYVMBKvM0f0VhihGBlTS/1hXdA/807Ooe0+VIlxAdBi
 im6nIQzS6JWwkowYro0MRB5JWQgRUwDMdwhIKP8lFU+jRZs+TOjeCGs6bolgw+26
 rLd7hpWTo3m9PD0Hp+y8xQq999ALaBIcAtrJM/Mop7YKa2FJvTwLtirH9rOImDVc
 Vkdx36N870gzOAUSNSghWlSFATJp2fWc7T51XhBLBzVShZQ6cCy9oRJ8mZdPjcf0
 hq8HwhVkKHMZidYo9KZpa10qz5S4diLUt6yr01LcmSNGgoqsWHWPFhcmAn2j64ok
 pmC8NxY0HSgwgkxQxjDmBZe3eW2Wiw==
 =C4Cy
 -----END PGP SIGNATURE-----

Merge tag 'pwm/duty_offset-for-6.13-rc1' of https://git.kernel.org/pub/scm/linux/kernel/git/ukleinek/linux

pwm: Support for duty_offset

Support a new abstraction for pwm configuration that allows to specify
the time between start of period and the raising edge of the signal
("duty offset").

This is used in a patch series by Trevor Gamblin for triggering an ADC
conversion and afterwards read out the result. See
https://lore.kernel.org/linux-iio/20240909-ad7625_r1-v5-0-60a397768b25@baylibre.com/
for more details.
2024-10-25 11:41:46 +02:00
David Howells
247d65fb12
afs: Fix missing subdir edit when renamed between parent dirs
When rename moves an AFS subdirectory between parent directories, the
subdir also needs a bit of editing: the ".." entry needs updating to point
to the new parent (though I don't make use of the info) and the DV needs
incrementing by 1 to reflect the change of content.  The server also sends
a callback break notification on the subdirectory if we have one, but we
can take care of recovering the promise next time we access the subdir.

This can be triggered by something like:

    mount -t afs %example.com:xfstest.test20 /xfstest.test/
    mkdir /xfstest.test/{aaa,bbb,aaa/ccc}
    touch /xfstest.test/bbb/ccc/d
    mv /xfstest.test/{aaa/ccc,bbb/ccc}
    touch /xfstest.test/bbb/ccc/e

When the pathwalk for the second touch hits "ccc", kafs spots that the DV
is incorrect and downloads it again (so the fix is not critical).

Fix this, if the rename target is a directory and the old and new
parents are different, by:

 (1) Incrementing the DV number of the target locally.

 (2) Editing the ".." entry in the target to refer to its new parent's
     vnode ID and uniquifier.

Link: https://lore.kernel.org/r/3340431.1729680010@warthog.procyon.org.uk
Fixes: 63a4681ff3 ("afs: Locally edit directory data for mkdir/create/unlink/...")
cc: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-10-24 13:50:27 +02:00
Linus Torvalds
7166c32651 vfs-6.12-rc5.fixes
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZxY6XAAKCRCRxhvAZXjc
 opmUAQCu4KhzBBdZmFw3AfZFNJvYb1onT4FiU0pnyGgfvzEdEwD6AlnlgQ7DL3ZN
 WBqBzUl+DpGYJfzhkqoEGH89Fagx7QM=
 =mm68
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.12-rc5.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs fixes from Christian Brauner:
 "afs:
   - Fix a lock recursion in afs_wake_up_async_call() on ->notify_lock

 netfs:
   - Drop the references to a folio immediately after the folio has been
     extracted to prevent races with future I/O collection

   - Fix a documenation build error

   - Downgrade the i_rwsem for buffered writes to fix a cifs reported
     performance regression when switching to netfslib

  vfs:
   - Explicitly return -E2BIG from openat2() if the specified size is
     unexpectedly large. This aligns openat2() with other extensible
     struct based system calls

   - When copying a mount namespace ensure that we only try to remove
     the new copy from the mount namespace rbtree if it has already been
     added to it

  nilfs:
   - Clear the buffer delay flag when clearing the buffer state clags
     when a buffer head is discarded to prevent a kernel OOPs

  ocfs2:
   - Fix an unitialized value warning in ocfs2_setattr()

  proc:
   - Fix a kernel doc warning"

* tag 'vfs-6.12-rc5.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  proc: Fix W=1 build kernel-doc warning
  afs: Fix lock recursion
  fs: Fix uninitialized value issue in from_kuid and from_kgid
  fs: don't try and remove empty rbtree node
  netfs: Downgrade i_rwsem for a buffered write
  nilfs2: fix kernel bug due to missing clearing of buffer delay flag
  openat2: explicitly return -E2BIG for (usize > PAGE_SIZE)
  netfs: fix documentation build error
  netfs: In readahead, put the folio refs as soon extracted
2024-10-21 10:48:24 -07:00
Linus Torvalds
10e93e1900 dma-mapping fix for Linux 6.12
Just another small tracing fix from Sean.
 -----BEGIN PGP SIGNATURE-----
 
 iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAmcUhhoLHGhjaEBsc3Qu
 ZGUACgkQD55TZVIEUYPzww//XpnJtQ2glWj9MZAbrPAHsBooazaIae1wfxPXCb9u
 EzaookDsTZhVtS0buSH9+EcNC5fWsr7Q7wyazx4cmGl6wOdHZ6mq+YbpxGQmhgWe
 MHDXu/X+rCRsa4cU5X8LUCjWVpKsu0kQE2B3E7M6cYfCWH9r1r9jaW0uXxdTQRWf
 Yi6W43cL/G946aT76wTspFrCfBqLgxypuTPGgehHpbF99sC/eJ6YHzGkOxc+mZ/5
 AatMB/npW8Y0G38yScp6gJZ4XaetJ6hflXoFN1pR7ehDggmZAjM/WfPwFlqZgjbk
 sVL0GjLuE4kbLnXIWX7GzzY/sXlUbebIKkAiYw3uqeo3KchU8/pA2Cqb9qWzdQmf
 FkMJQO7rgj7BvlJnxccDVAZYedkoywdj4Jw/B8hnm5jF355g4tZDmXm+4A88KtHZ
 qnz7pBNdfFumyMEJFwUzOAMWyN2ZDdirb3lrDsCXlIV56h4NH60I6D+cAsX9a+94
 Qao0xLr72jlk4NNDQShYJgHybCVTMMep3Wjkejg/EEZCxdkkyMpSOZXJeBLlxn80
 O2fdRynM5EhG+e28pjFYvU+/zLT0poSRaE+jBfWJLtG9xCFMybWRKtASH7VcaRLQ
 /kDPR51ZttfNYQscVWi7S+R37VWksPLEbFQHSFDvOcwGKgcnFpllDuwv+o62TuUc
 eKk=
 =MnDt
 -----END PGP SIGNATURE-----

Merge tag 'dma-mapping-6.12-2024-10-20' of git://git.infradead.org/users/hch/dma-mapping

Pull dma-mapping fix from Christoph Hellwig:
 "Just another small tracing fix from Sean"

* tag 'dma-mapping-6.12-2024-10-20' of git://git.infradead.org/users/hch/dma-mapping:
  dma-mapping: fix tracing dma_alloc/free with vmalloc'd memory
2024-10-20 10:56:42 -07:00
Sean Anderson
78b2770c93 dma-mapping: fix tracing dma_alloc/free with vmalloc'd memory
Not all virtual addresses have physical addresses, such as if they were
vmalloc'd. Just trace the virtual address instead of trying to trace a
physical address. This aligns with the API, and is good enough to
associate dma_alloc with dma_free.

Fixes: 038eb433dc ("dma-mapping: add tracing for dma-mapping API calls")
Reported-by: syzbot+b4bfacdec173efaa8567@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/670ebde5.050a0220.d9b66.0154.GAE@google.com/
Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2024-10-17 17:41:29 +02:00
Yang Shi
37f0b47c51 mm: khugepaged: fix the arguments order in khugepaged_collapse_file trace point
The "addr" and "is_shmem" arguments have different order in TP_PROTO and
TP_ARGS.  This resulted in the incorrect trace result:

text-hugepage-644429 [276] 392092.878683: mm_khugepaged_collapse_file:
mm=0xffff20025d52c440, hpage_pfn=0x200678c00, index=512, addr=1, is_shmem=0,
filename=text-hugepage, nr=512, result=failed

The value of "addr" is wrong because it was treated as bool value, the
type of is_shmem.

Fix the order in TP_PROTO to keep "addr" is before "is_shmem" since the
original patch review suggested this order to achieve best packing.

And use "lx" for "addr" instead of "ld" in TP_printk because address is
typically shown in hex.

After the fix, the trace result looks correct:

text-hugepage-7291  [004]   128.627251: mm_khugepaged_collapse_file:
mm=0xffff0001328f9500, hpage_pfn=0x20016ea00, index=512, addr=0x400000,
is_shmem=0, filename=text-hugepage, nr=512, result=failed

Link: https://lkml.kernel.org/r/20241012011702.1084846-1-yang@os.amperecomputing.com
Fixes: 4c9473e87e ("mm/khugepaged: add tracepoint to collapse_file()")
Signed-off-by: Yang Shi <yang@os.amperecomputing.com>
Cc: Gautam Menghani <gautammenghani201@gmail.com>
Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
Cc: <stable@vger.kernel.org>    [6.2+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-10-17 00:28:09 -07:00
Christian Brauner
b40508ca5d
Merge patch series "timekeeping/fs: multigrain timestamp redux"
Jeff Layton <jlayton@kernel.org> says:

The VFS has always used coarse-grained timestamps when updating the
ctime and mtime after a change. This has the benefit of allowing
filesystems to optimize away a lot metadata updates, down to around 1
per jiffy, even when a file is under heavy writes.

Unfortunately, this has always been an issue when we're exporting via
NFSv3, which relies on timestamps to validate caches. A lot of changes
can happen in a jiffy, so timestamps aren't sufficient to help the
client decide when to invalidate the cache. Even with NFSv4, a lot of
exported filesystems don't properly support a change attribute and are
subject to the same problems with timestamp granularity. Other
applications have similar issues with timestamps (e.g backup
applications).

If we were to always use fine-grained timestamps, that would improve the
situation, but that becomes rather expensive, as the underlying
filesystem would have to log a lot more metadata updates.

What we need is a way to only use fine-grained timestamps when they are
being actively queried. Use the (unused) top bit in inode->i_ctime_nsec
as a flag that indicates whether the current timestamps have been
queried via stat() or the like. When it's set, we allow the kernel to
use a fine-grained timestamp iff it's necessary to make the ctime show
a different value.

This solves the problem of being able to distinguish the timestamp
between updates, but introduces a new problem: it's now possible for a
file being changed to get a fine-grained timestamp. A file that is
altered just a bit later can then get a coarse-grained one that appears
older than the earlier fine-grained time. This violates timestamp
ordering guarantees.

To remedy this, keep a global monotonic atomic64_t value that acts as a
timestamp floor.  When we go to stamp a file, we first get the latter of
the current floor value and the current coarse-grained time. If the
inode ctime hasn't been queried then we just attempt to stamp it with
that value.

If it has been queried, then first see whether the current coarse time
is later than the existing ctime. If it is, then we accept that value.
If it isn't, then we get a fine-grained time and try to swap that into
the global floor. Whether that succeeds or fails, we take the resulting
floor time, convert it to realtime and try to swap that into the ctime.

We take the result of the ctime swap whether it succeeds or fails, since
either is just as valid.

Filesystems can opt into this by setting the FS_MGTIME fstype flag.
Others should be unaffected (other than being subject to the same floor
value as multigrain filesystems).

* patches from https://lore.kernel.org/r/20241002-mgtime-v10-0-d1c4717f5284@kernel.org:
  tmpfs: add support for multigrain timestamps
  btrfs: convert to multigrain timestamps
  ext4: switch to multigrain timestamps
  xfs: switch to multigrain timestamps
  Documentation: add a new file documenting multigrain timestamps
  fs: add percpu counters for significant multigrain timestamp events
  fs: tracepoints around multigrain timestamp events
  fs: handle delegated timestamps in setattr_copy_mgtime
  fs: have setattr_copy handle multigrain timestamps appropriately
  fs: add infrastructure for multigrain timestamps

Link: https://lore.kernel.org/r/20241002-mgtime-v10-0-d1c4717f5284@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-10-10 10:20:57 +02:00
Jeff Layton
c86e3c4718
fs: tracepoints around multigrain timestamp events
Add some tracepoints around various multigrain timestamp events.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org> # documentation bits
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20241002-mgtime-v10-6-d1c4717f5284@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-10-10 10:20:52 +02:00
Mathieu Desnoyers
0850e1bc88 tracing/bpf: Add might_fault check to syscall probes
Add a might_fault() check to validate that the bpf sys_enter/sys_exit
probe callbacks are indeed called from a context where page faults can
be handled.

Cc: Michael Jeanson <mjeanson@efficios.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Yonghong Song <yhs@fb.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: bpf@vger.kernel.org
Cc: Joel Fernandes <joel@joelfernandes.org>
Link: https://lore.kernel.org/20241009010718.2050182-9-mathieu.desnoyers@efficios.com
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Andrii Nakryiko <andrii@kernel.org> # BPF parts
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-10-09 17:09:46 -04:00
Mathieu Desnoyers
cdb537ac41 tracing/perf: Add might_fault check to syscall probes
Add a might_fault() check to validate that the perf sys_enter/sys_exit
probe callbacks are indeed called from a context where page faults can
be handled.

Cc: Michael Jeanson <mjeanson@efficios.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Yonghong Song <yhs@fb.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: bpf@vger.kernel.org
Cc: Joel Fernandes <joel@joelfernandes.org>
Link: https://lore.kernel.org/20241009010718.2050182-8-mathieu.desnoyers@efficios.com
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-10-09 17:09:46 -04:00
Mathieu Desnoyers
a3204c740a tracing/ftrace: Add might_fault check to syscall probes
Add a might_fault() check to validate that the ftrace sys_enter/sys_exit
probe callbacks are indeed called from a context where page faults can
be handled.

Cc: Michael Jeanson <mjeanson@efficios.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Yonghong Song <yhs@fb.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: bpf@vger.kernel.org
Cc: Joel Fernandes <joel@joelfernandes.org>
Link: https://lore.kernel.org/20241009010718.2050182-7-mathieu.desnoyers@efficios.com
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-10-09 17:09:36 -04:00
Mathieu Desnoyers
4aadde89d8 tracing/bpf: disable preemption in syscall probe
In preparation for allowing system call enter/exit instrumentation to
handle page faults, make sure that bpf can handle this change by
explicitly disabling preemption within the bpf system call tracepoint
probes to respect the current expectations within bpf tracing code.

This change does not yet allow bpf to take page faults per se within its
probe, but allows its existing probes to adapt to the upcoming change.

Cc: Michael Jeanson <mjeanson@efficios.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Yonghong Song <yhs@fb.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: bpf@vger.kernel.org
Cc: Joel Fernandes <joel@joelfernandes.org>
Link: https://lore.kernel.org/20241009010718.2050182-5-mathieu.desnoyers@efficios.com
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Andrii Nakryiko <andrii@kernel.org> # BPF parts
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-10-09 17:07:43 -04:00
Mathieu Desnoyers
65e7462a16 tracing/perf: disable preemption in syscall probe
In preparation for allowing system call enter/exit instrumentation to
handle page faults, make sure that perf can handle this change by
explicitly disabling preemption within the perf system call tracepoint
probes to respect the current expectations within perf ring buffer code.

This change does not yet allow perf to take page faults per se within
its probe, but allows its existing probes to adapt to the upcoming
change.

Cc: Michael Jeanson <mjeanson@efficios.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Yonghong Song <yhs@fb.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: bpf@vger.kernel.org
Cc: Joel Fernandes <joel@joelfernandes.org>
Link: https://lore.kernel.org/20241009010718.2050182-4-mathieu.desnoyers@efficios.com
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-10-09 17:07:25 -04:00
Mathieu Desnoyers
13d750c2c0 tracing/ftrace: disable preemption in syscall probe
In preparation for allowing system call enter/exit instrumentation to
handle page faults, make sure that ftrace can handle this change by
explicitly disabling preemption within the ftrace system call tracepoint
probes to respect the current expectations within ftrace ring buffer
code.

This change does not yet allow ftrace to take page faults per se within
its probe, but allows its existing probes to adapt to the upcoming
change.

Cc: Michael Jeanson <mjeanson@efficios.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Yonghong Song <yhs@fb.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: bpf@vger.kernel.org
Cc: Joel Fernandes <joel@joelfernandes.org>
Link: https://lore.kernel.org/20241009010718.2050182-3-mathieu.desnoyers@efficios.com
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-10-09 17:07:00 -04:00
Mathieu Desnoyers
0e6caab8db tracing: Declare system call tracepoints with TRACE_EVENT_SYSCALL
In preparation for allowing system call tracepoints to handle page
faults, introduce TRACE_EVENT_SYSCALL to declare the sys_enter/sys_exit
tracepoints.

Move the common code between __DECLARE_TRACE and __DECLARE_TRACE_SYSCALL
into __DECLARE_TRACE_COMMON.

This change is not meant to alter the generated code, and only prepares
the following modifications.

Cc: Michael Jeanson <mjeanson@efficios.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Yonghong Song <yhs@fb.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: bpf@vger.kernel.org
Cc: Joel Fernandes <joel@joelfernandes.org>
Link: https://lore.kernel.org/20241009010718.2050182-2-mathieu.desnoyers@efficios.com
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-10-09 17:05:54 -04:00
Steven Rostedt
48bcda6848 tracing: Remove definition of trace_*_rcuidle()
The trace_*_rcuidle() variant of a tracepoint was to handle places where a
tracepoint was located but RCU was not "watching". All those locations
have been removed, and RCU should be watching where all tracepoints are
located. We can now remove the trace_*_rcuidle() variant.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Link: https://lore.kernel.org/20241003181629.36209057@gandalf.local.home
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-10-08 21:17:39 -04:00
David Howells
796a404964
netfs: In readahead, put the folio refs as soon extracted
netfslib currently defers dropping the ref on the folios it obtains during
readahead to after it has started I/O on the basis that we can do it whilst
we wait for the I/O to complete, but this runs the risk of the I/O
collection racing with this in future.

Furthermore, Matthew Wilcox strongly suggests that the refs should be
dropped immediately, as readahead_folio() does (netfslib is using
__readahead_batch() which doesn't drop the refs).

Fixes: ee4cdf7ba8 ("netfs: Speed up buffered reading")
Suggested-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/3771538.1728052438@warthog.procyon.org.uk
cc: Jeff Layton <jlayton@kernel.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-10-07 13:48:22 +02:00
Christian Brauner
9b8e8091c8
Merge patch series "Random netfs folio fixes"
Matthew Wilcox (Oracle) <willy@infradead.org> says:

A few minor fixes; nothing earth-shattering.

Matthew Wilcox (Oracle) (3):
  netfs: Remove call to folio_index()
  netfs: Fix a few minor bugs in netfs_page_mkwrite()
  netfs: Remove unnecessary references to pages

Link: https://lore.kernel.org/r/20241005182307.3190401-1-willy@infradead.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-10-07 13:45:29 +02:00
Matthew Wilcox (Oracle)
fcd4904e2f
netfs: Remove call to folio_index()
Calling folio_index() is pointless overhead; directly dereferencing
folio->index is fine.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://lore.kernel.org/r/20241005182307.3190401-2-willy@infradead.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-10-07 13:45:15 +02:00
Linus Torvalds
79eb2c07af for-6.12-rc1-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmb/9agACgkQxWXV+ddt
 WDtKFA/5AW88osk/1k/NVhOvOa0xPr5XyDLq1n8Gaxfy8uHlHAc8wdsvJzCDMS0M
 qUOD/tOPhRI0HGXPiKD767erwbyXiZAcCTkSd8x5jlXy1hVjUHQSKO//JxD0vtAZ
 jOscoUA1wJJutopCXcppnoUUFE2753edEg0w2EtUXMpfqivqOmMCR+1ZtKkfaNJo
 oRuZCq3Oi8hu7Wsvmh4Etq/9MvGM+xovXAMAji6Op8nsP1jJlzWztpEUogLOQH2S
 IhDFFxP9shBV9JjV+HSyXcAYr8VArH6HtYjapR9oajCH0pvSjLYQQPq/qQ0//8Hb
 SHr4YP6RBbpEIvvaQeA7vwsckBHNBOrSAbEgRQ2+zdmiwha6SRIEVyXYh5LUwcYT
 WnVozbk7ZX9rD1jOVhvgouFG6vUz8A6/qt3BD028bVcyMvBXW4gsEduCMVlFGmlN
 D6+hNY6J08j4HUEGnPk7fAYi/lk5OEK1p5yarUgsOQ3GqWWS0ywkrfbmbWwyC+Ff
 AxggFTl9YodU5RMs7EU2GeHkLU6LnXgevk6FFm0JzsLtT/BbEP7pj6tOot4Msl1e
 2ovqFiSbuPNg5Wr70ZBRO9LDIAYtTZy1UrVR/YCSLzm+wZsXMelDIHMQfE7+Yodp
 O5Cud23AanRTwjErvNl3X4rhut8rrI1FsR89gDyKK1EPG+mwofo=
 =yslr
 -----END PGP SIGNATURE-----

Merge tag 'for-6.12-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:

 - in incremental send, fix invalid clone operation for file that got
   its size decreased

 - fix __counted_by() annotation of send path cache entries, we do not
   store the terminating NUL

 - fix a longstanding bug in relocation (and quite hard to hit by
   chance), drop back reference cache that can get out of sync after
   transaction commit

 - wait for fixup worker kthread before finishing umount

 - add missing raid-stripe-tree extent for NOCOW files, zoned mode
   cannot have NOCOW files but RST is meant to be a standalone feature

 - handle transaction start error during relocation, avoid potential
   NULL pointer dereference of relocation control structure (reported by
   syzbot)

 - disable module-wide rate limiting of debug level messages

 - minor fix to tracepoint definition (reported by checkpatch.pl)

* tag 'for-6.12-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: disable rate limiting when debug enabled
  btrfs: wait for fixup workers before stopping cleaner kthread during umount
  btrfs: fix a NULL pointer dereference when failed to start a new trasacntion
  btrfs: send: fix invalid clone operation for file that got its size decreased
  btrfs: tracepoints: end assignment with semicolon at btrfs_qgroup_extent event class
  btrfs: drop the backref cache during relocation if we commit
  btrfs: also add stripe entries for NOCOW writes
  btrfs: send: fix buffer overflow detection when copying path to cache entry
2024-10-04 10:05:13 -07:00
Christian Brauner
2b2b1a20db
Merge patch series "Introduce tracepoint for hugetlbfs"
Hongbo Li <lihongbo22@huawei.com> says:

Add some basic tracepoints for debugging hugetlbfs: {alloc, free,
evict}_inode, setattr and fallocate.

* patches from https://lore.kernel.org/r/20240829064110.67884-1-lihongbo22@huawei.com:
  hugetlbfs: use tracepoints in hugetlbfs functions.
  hugetlbfs: support tracepoint

Link: https://lore.kernel.org/r/20240829064110.67884-1-lihongbo22@huawei.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-10-02 07:52:11 +02:00
Filipe Manana
50c6f6e680 btrfs: tracepoints: end assignment with semicolon at btrfs_qgroup_extent event class
While running checkpatch.pl against a patch that modifies the
btrfs_qgroup_extent event class, it complained about using a comma instead
of a semicolon:

  $ ./scripts/checkpatch.pl qgroups/0003-btrfs-qgroups-remove-bytenr-field-from-struct-btrfs_.patch
  WARNING: Possible comma where semicolon could be used
  #215: FILE: include/trace/events/btrfs.h:1720:
  +		__entry->bytenr		= bytenr,
		__entry->num_bytes	= rec->num_bytes;

  total: 0 errors, 1 warnings, 184 lines checked

So replace the comma with a semicolon to silence checkpatch and possibly
other tools. It also makes the code consistent with the rest.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-10-01 19:14:10 +02:00
Linus Torvalds
a5f24c7955 vfs-6.12-rc2.fixes
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZvqoUgAKCRCRxhvAZXjc
 oveLAQC952BB+giATJvX5lPz400g+MoDVlWiPdqAVbF0PCGRRwEA+/7TedOfOkQx
 Df3iAzDXVbX/WvWYdZ/DyLp/r44WHAw=
 =tGkE
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.12-rc2.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs fixes from Christian Brauner:
 "afs:

   - Fix setting of the server responding flag

   - Remove unused struct afs_address_list and afs_put_address_list()
     function

   - Fix infinite loop because of unresponsive servers

   - Ensure that afs_retry_request() function is correctly added to the
     afs_req_ops netfs operations table

  netfs:

   - Fix netfs_folio tracepoint handling to handle NULL mappings

   - Add a missing folio_queue API documentation

   - Ensure that netfs_write_folio() correctly advances the iterator via
     iov_iter_advance()

   - Fix a dentry leak during concurrent cull and cookie lookup
     operations in cachefiles

  pidfs:

   - Correctly handle accessing another task's pid namespace"

* tag 'vfs-6.12-rc2.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  netfs: Fix the netfs_folio tracepoint to handle NULL mapping
  netfs: Add folio_queue API documentation
  netfs: Advance iterator correctly rather than jumping it
  afs: Fix the setting of the server responding flag
  afs: Remove unused struct and function prototype
  afs: Fix possible infinite loop with unresponsive servers
  pidfs: check for valid pid namespace
  afs: Fix missing wire-up of afs_retry_request()
  cachefiles: fix dentry leak in cachefiles_open_file()
2024-09-30 10:59:44 -07:00
David Howells
f801850bc2
netfs: Fix the netfs_folio tracepoint to handle NULL mapping
Fix the netfs_folio tracepoint to handle folios that have a NULL mapping
pointer.  In such a case, just substitute a zero inode number.

Fixes: c38f4e96e6 ("netfs: Provide func to copy data to pagecache for buffered write")
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/2917423.1727697556@warthog.procyon.org.uk
cc: Jeff Layton <jlayton@kernel.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-30 14:11:05 +02:00
Linus Torvalds
b81b78dacc dma-mapping fixes for Linux 6.12
- handle chained SGLs in the new tracing code (Christoph Hellwig)
 -----BEGIN PGP SIGNATURE-----
 
 iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAmb47voLHGhjaEBsc3Qu
 ZGUACgkQD55TZVIEUYMqAw//TCBIiBNBwjWipVBXmvizu0MaB+PU8ZOjXZoeEdlN
 WeGSPvuW2lWJxjBblPcFA3MFO9pztO9E94dayu5/jV5QP4G+EzyCJDoZOQu8iHYX
 YzaIgkt0W+vKEFmvAqfrHuPHHO0mZ00qaBmj/r+cayWqBBvWPLbj7kU7+sFVG4it
 lP2QGGYGV6Ryvkwcft+Got4oIKmyuTsS4i4Cq17WNuhAxDMZftnVPIuUJ9Q00xza
 zDSwKgAOoPqaf3r026MqahpEECAQP0S7uqH10I+MJe7AbREbO7GSHcx4YWDhmyLd
 KFJ+Wv67H8voCsvdH76zANVLE6S5YMyAwMAJSRQxuVSjqrIM0vyyQ4jeRZCHibva
 NvYnv/xRRF+BKyNthNAz1m69K3y8UV5gvP/otezHInGd5/b527UoLXdFl+wQvdJm
 aJwCYSFgQbAsoAh4kWCUmYXGwc1h6aQQ3pcHD118yf9Glzjl4vwHOyF12GigfCdQ
 cbNo+ceacO3rg/H85zMl8OCMtnPwAlNAU+16MY0JwefKKs6rmRyQeTreQKjjdhTm
 /0FVKq1PXJeOdMBqSnrN/tJBIEJxtQlUqE7mx7B2OnczvNpaO4THGlo181nnzPPd
 HTb9fXzBJeSSbDGWF8W6MZIleDj+s4V5Z6Qqi0cW9F7Yn4dCw9ghBBkDZtcm+hkb
 +q0=
 =zjTS
 -----END PGP SIGNATURE-----

Merge tag 'dma-mapping-6.12-2024-09-29' of git://git.infradead.org/users/hch/dma-mapping

Pull dma-mapping fix from Christoph Hellwig:

 - handle chained SGLs in the new tracing code (Christoph Hellwig)

* tag 'dma-mapping-6.12-2024-09-29' of git://git.infradead.org/users/hch/dma-mapping:
  dma-mapping: fix DMA API tracing for chained scatterlists
2024-09-29 09:35:10 -07:00
Uwe Kleine-König
1afd01db1a pwm: Add tracing for waveform callbacks
This adds trace events for the recently introduced waveform callbacks.
With the introduction of some helper macros consistency among the
different events is ensured.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
Link: https://lore.kernel.org/r/1d71879b0de3bf01459c7a9d0f040d43eb5ace56.1726819463.git.u.kleine-koenig@baylibre.com
Signed-off-by: Uwe Kleine-König <ukleinek@kernel.org>
2024-09-28 15:13:56 +02:00
Christoph Hellwig
bfc4a245a7 dma-mapping: fix DMA API tracing for chained scatterlists
scatterlist allocations can be chained, and thus all iterations need to
use the chain-aware iterators.  Switch the newly added tracing to use the
proper iterators so that they work with chained scatterlists.

Fixes: 038eb433dc ("dma-mapping: add tracing for dma-mapping API calls")
Reported-by: syzbot+95e4ef83a3024384ec7a@syzkaller.appspotmail.com
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sean Anderson <sean.anderson@linux.dev>
Tested-by: syzbot+95e4ef83a3024384ec7a@syzkaller.appspotmail.com
2024-09-26 20:08:12 +02:00
Linus Torvalds
79952bdcbc f2fs-6.12-rc1
In this series, the main changes include 1) converting major IO paths to use
 folio, and 2) adding various knobs to control GC more flexibly for Zoned
 devices. In addition, there are several patches to address corner cases of
 atomic file operations and better support for file pinning on zoned device.
 
 Enhancement:
  - add knobs to tune foreground/background GCs for Zoned devices
  - convert IO paths to use folio
  - reduce expensive checkpoint trigger frequency
  - allow F2FS_IPU_NOCACHE for pinned file
  - forcibly migrate to secure space for zoned device file pinning
  - get rid of buffer_head use
  - add write priority option based on zone UFS
  - get rid of online repair on corrupted directory
 
 Bug fix:
  - fix to don't panic system for no free segment fault injection
  - fix to don't set SB_RDONLY in f2fs_handle_critical_error()
  - avoid unused block when dio write in LFS mode
  - compress: don't redirty sparse cluster during {,de}compress
  - check discard support for conventional zones
  - atomic: prevent atomic file from being dirtied before commit
  - atomic: fix to check atomic_file in f2fs ioctl interfaces
  - atomic: fix to forbid dio in atomic_file
  - atomic: fix to truncate pagecache before on-disk metadata truncation
  - atomic: create COW inode from parent dentry
  - atomic: fix to avoid racing w/ GC
  - atomic: require FMODE_WRITE for atomic write ioctls
  - fix to wait page writeback before setting gcing flag
  - fix to avoid racing in between read and OPU dio write, dio completion
  - fix several potential integer overflows in file offsets and dir_block_index
  - fix to avoid use-after-free in f2fs_stop_gc_thread()
 
 As usual, there are several code clean-ups and refactorings.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAmbyJn8ACgkQQBSofoJI
 UNJz9Q/+LDDJjD6xh0Fs6H2NeltFNbuNmS79kN5oG0xfjIAiKXE1lsw2n2gwrDKv
 EHKUPa2D4Rztckp8EFF6/st2SXVXH5U7YY2z5jkIUFccbeod+CrK9AGHjJe54iXL
 D0ulbgE2jR8uuwAkNEooNJK1a5ZhZLVy+fXknNIgKoqx31YYE+mKOJaaJFbCxvNT
 grZdH9ApweJB8L4A4ebwIWyBy8Bh4lhr2d6ngsx6HA5TFA2Ay0V9kaoZrLPZvJhv
 3qJ+xu3oeGJbP4e5h5g9omafBskI1pfEE6/sY94o1Zy5Ahx3iCR6U/qehtyyU3TF
 5QLoMXTvIz0MkRuBaW1XxVDpFevVzUfYmbLycuxjArBtjHnvsdh12DKT1Pk5BDZ4
 GgkUyt4pK4PYyEZFtayCleLZljSRzKzi+Y9XEs82z01s41mvx71kz44bR8SPcb1Q
 D4VOJld4O4qMmNrZhhwW8sj4UiDVgliURwmpiZwz9zT9fXU/ZPD1gThcfSWJZ/53
 rrx87e1Bnyk/cMuN/gxEdVV20nggxng4hl2oDcUzBBV1G1R9I3RZJWQt/YFXpB0O
 Whv5pJkV8BZXFWoRmm9cpWe0MslRRhsKBPzcKmlowy/lYdgjpQTmh7TSJ1Teh+2Y
 r77XI31Y/ACaKDJsRmUVbtqdM3N/88N97Fa52wOByK0PjMbgM0E=
 =EKzY
 -----END PGP SIGNATURE-----

Merge tag 'f2fs-for-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
 "The main changes include converting major IO paths to use folio, and
  adding various knobs to control GC more flexibly for Zoned devices.

  In addition, there are several patches to address corner cases of
  atomic file operations and better support for file pinning on zoned
  device.

  Enhancement:
   - add knobs to tune foreground/background GCs for Zoned devices
   - convert IO paths to use folio
   - reduce expensive checkpoint trigger frequency
   - allow F2FS_IPU_NOCACHE for pinned file
   - forcibly migrate to secure space for zoned device file pinning
   - get rid of buffer_head use
   - add write priority option based on zone UFS
   - get rid of online repair on corrupted directory

  Bug fixes:
   - fix to don't panic system for no free segment fault injection
   - fix to don't set SB_RDONLY in f2fs_handle_critical_error()
   - avoid unused block when dio write in LFS mode
   - compress: don't redirty sparse cluster during {,de}compress
   - check discard support for conventional zones
   - atomic: prevent atomic file from being dirtied before commit
   - atomic: fix to check atomic_file in f2fs ioctl interfaces
   - atomic: fix to forbid dio in atomic_file
   - atomic: fix to truncate pagecache before on-disk metadata truncation
   - atomic: create COW inode from parent dentry
   - atomic: fix to avoid racing w/ GC
   - atomic: require FMODE_WRITE for atomic write ioctls
   - fix to wait page writeback before setting gcing flag
   - fix to avoid racing in between read and OPU dio write, dio completion
   - fix several potential integer overflows in file offsets and dir_block_index
   - fix to avoid use-after-free in f2fs_stop_gc_thread()

  As usual, there are several code clean-ups and refactorings"

* tag 'f2fs-for-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (60 commits)
  f2fs: allow F2FS_IPU_NOCACHE for pinned file
  f2fs: forcibly migrate to secure space for zoned device file pinning
  f2fs: remove unused parameters
  f2fs: fix to don't panic system for no free segment fault injection
  f2fs: fix to don't set SB_RDONLY in f2fs_handle_critical_error()
  f2fs: add valid block ratio not to do excessive GC for one time GC
  f2fs: create gc_no_zoned_gc_percent and gc_boost_zoned_gc_percent
  f2fs: do FG_GC when GC boosting is required for zoned devices
  f2fs: increase BG GC migration window granularity when boosted for zoned devices
  f2fs: add reserved_segments sysfs node
  f2fs: introduce migration_window_granularity
  f2fs: make BG GC more aggressive for zoned devices
  f2fs: avoid unused block when dio write in LFS mode
  f2fs: fix to check atomic_file in f2fs ioctl interfaces
  f2fs: get rid of online repaire on corrupted directory
  f2fs: prevent atomic file from being dirtied before commit
  f2fs: get rid of page->index
  f2fs: convert read_node_page() to use folio
  f2fs: convert __write_node_page() to use folio
  f2fs: convert f2fs_write_data_page() to use folio
  ...
2024-09-24 15:12:38 -07:00
Linus Torvalds
d7dfb07d4d firewire updates for v6.12
The batch of changes includes the followwing:
 
 - Replacing tasklet with usual workqueue for isochronous context
 - Replacing IDR with XArray
 - Utilizing guard macro where possible
 - Printing deprecation warning when enabling debug parameter of
   firewire-ohci module
 
 Additionally, it includes a single patch for sound subsystem which the
 subsystem maintainer acked:
 
 - Switching to nonatomic PCM operation
 
 In FireWire subsystem, tasklet has been used as the bottom half of 1394
 OHCi hardIRQ so long. In the recent kernel updates, BH workqueue has
 been available, and some developers have proposed replacing tasklet with
 BH workqueue. While it is fortunate that developers are still considering
 the legacy subsystem, a simple replacement is not necessarily suitable.
 
 As a first step towards dropping tasklet, I've investigated the
 feasibility for 1394 OHCI isochronous context, and concluded that usual
 workqueue is available. In the context, the batch of packets is processed
 in the specific queue, thus the timing jitter caused by task scheduling is
 not so critical. Additionally, DMA transmission can be scheduled
 per-packet basis, therefore the context can be sleep between the operation
 of transmissions. Furthermore, in-kernel protocol implementation involves
 some CPU-bound tasks, which can sometimes consumes CPU time so long. These
 characteristics suggest that usual workqueue is suitable, through BH
 workqueues are not.
 
 The replacement with usual workqueue allows unit drivers to process the
 content of packets in non-atomic context. It brings some reliefs to some
 drivers in sound subsystem that spin-lock is not mandatory anymore during
 isochronous packet processing.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQE66IEYNDXNBPeGKSsLtaWM8LwEwUCZu41yQAKCRCsLtaWM8Lw
 E4Y1AP43vZatH202NNMnbkLSW9axmHe6VHWEwDSsJT80vTbBNAD/WYV62EoQzlk1
 1lzdts11SSqYPhj6tJDuRgqULlNAows=
 =7VMx
 -----END PGP SIGNATURE-----

Merge tag 'firewire-updates-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394

Pull firewire updates from Takashi Sakamoto:
 "In the FireWire subsystem, tasklets have been used as the bottom half
  of 1394 OHCi hardIRQ. In recent kernel updates, BH workqueues have
  become available, and some developers have proposed replacing the
  tasklet with a BH workqueue.

  As a first step towards dropping tasklet use, the 1394 OHCI
  isochronous context can use regular workqueues. In this context, the
  batch of packets is processed in the specific queue, thus the timing
  jitter caused by task scheduling is not so critical.

  Additionally, DMA transmission can be scheduled per-packet basis,
  therefore the context can be sleep between the operation of
  transmissions. Furthermore, in-kernel protocol implementation involves
  some CPU-bound tasks, which can sometimes consumes CPU time so long.
  These characteristics suggest that normal workqueues are suitable,
  through BH workqueues are not.

  The replacement with a workqueue allows unit drivers to process the
  content of packets in non-atomic context. It brings some reliefs to
  some drivers in sound subsystem that spin-lock is not mandatory
  anymore during isochronous packet processing.

  Summary:

   - Replace tasklet with workqueue for isochronous context

   - Replace IDR with XArray

   - Utilize guard macro where possible

   - Print deprecation warning when enabling debug parameter of
     firewire-ohci module

   - Switch to nonatomic PCM operation"

* tag 'firewire-updates-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394: (55 commits)
  firewire: core: rename cause flag of tracepoints event
  firewire: core: update documentation of kernel APIs for flushing completions
  firewire: core: add helper function to retire descriptors
  Revert "firewire: core: move workqueue handler from 1394 OHCI driver to core function"
  Revert "firewire: core: use mutex to coordinate concurrent calls to flush completions"
  firewire: core: use mutex to coordinate concurrent calls to flush completions
  firewire: core: move workqueue handler from 1394 OHCI driver to core function
  firewire: core: fulfill documentation of fw_iso_context_flush_completions()
  firewire: core: expose kernel API to schedule work item to process isochronous context
  firewire: core: use WARN_ON_ONCE() to avoid superfluous dumps
  ALSA: firewire: use nonatomic PCM operation
  firewire: core: non-atomic memory allocation for isochronous event to user client
  firewire: ohci: operate IT/IR events in sleepable work process instead of tasklet softIRQ
  firewire: core: add local API to queue work item to workqueue specific to isochronous contexts
  firewire: core: allocate workqueue to handle isochronous contexts in card
  firewire: ohci: obsolete direct usage of printk_ratelimit()
  firewire: ohci: deprecate debug parameter
  firewire: core: update fw_device outside of device_find_child()
  firewire: ohci: fix error path to detect initiated reset in TI TSB41BA3D phy
  firewire: core/ohci: minor refactoring for computation of configuration ROM size
  ...
2024-09-23 12:55:27 -07:00
Linus Torvalds
18ba603446 NFSD 6.12 Release Notes
Notable features of this release include:
 
 - Pre-requisites for automatically determining the RPC server thread
   count
 - Clean-up and preparation for supporting LOCALIO, which will be
   merged via the NFS client tree
 - Enhancements and fixes to NFSv4.2 COPY offload
 - A new Python-based tool for generating kernel SunRPC XDR encoding
   and decoding functions, added as an aid for prototyping features
   in protocols based on the Linux kernel's SunRPC implementation.
 
 As always I am grateful to the NFSD contributors, reviewers,
 testers, and bug reporters who participated during this cycle.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmbxg60ACgkQM2qzM29m
 f5d+9A/+LiXAjR3x1vlbGFiMAW3Alixg5wE6AM7M1I/OH/dBCkWU1gzneWYaUXAk
 cIGp5sH2Uco2mFVswOZyQ3tX8T/2PeY+Kx5qrlK5h0bTUoz95AIyLe3LA/4o4CIL
 qMGlLQyVq9UolggPoRdigsDhKVwLcu3hWaG7ykkTquyrOPLBKgzRNSwVKLpFc/0/
 mQToOf6HLjgFkEUR3pmXAMsVq88/BpjHIXeNhx2Z1ekWslSKjrAu2gC0rc6/s9Wi
 JsTtzSdnqefc2jsNVZ8FT+V7mDF1sxrN4SnHruSLhJsd5tL/3HDkiZEvdG2Sh0nH
 zQlDpMpNbZyCvaWs6jgaZeMRiNSSl7q31zXUgX2bkWpL/EnagujZHtLZroUgLQfA
 BO8HhRqdt1wJohiv2aMlFvnlp+GhSH5FdcXv1cT/CmyTNGqbXENqoCUA1OT9kE55
 RvXVCLD4YbmCb5EpjLavhu/NuFOc9l9GitKlhiJlcX86QAu/C1Bu1DOyqgq5G0VW
 Xl/q7xIvNZz0mh7x8kKVV4bQHsm9pnoNz57CZFPahoHg/+BR4u6p8LepowpaHjHj
 Ef62BzYwQtuw0jCyufDea+uCt5CGwUM3Y5iBiQogtnvFK6ie8WwD0QTI2SYpcWZ/
 T6RwDOX5jlMWWmuibSK2STgwkblG3vAmMot0RtEbZILvB/ld9qw=
 =Ybsc
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd updates from Chuck Lever:
 "Notable features of this release include:

   - Pre-requisites for automatically determining the RPC server thread
     count

   - Clean-up and preparation for supporting LOCALIO, which will be
     merged via the NFS client tree

   - Enhancements and fixes to NFSv4.2 COPY offload

   - A new Python-based tool for generating kernel SunRPC XDR encoding
     and decoding functions, added as an aid for prototyping features in
     protocols based on the Linux kernel's SunRPC implementation

  As always I am grateful to the NFSD contributors, reviewers, testers,
  and bug reporters who participated during this cycle"

* tag 'nfsd-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (57 commits)
  xdrgen: Prevent reordering of encoder and decoder functions
  xdrgen: typedefs should use the built-in string and opaque functions
  xdrgen: Fix return code checking in built-in XDR decoders
  tools: Add xdrgen
  nfsd: fix delegation_blocked() to block correctly for at least 30 seconds
  nfsd: fix initial getattr on write delegation
  nfsd: untangle code in nfsd4_deleg_getattr_conflict()
  nfsd: enforce upper limit for namelen in __cld_pipe_inprogress_downcall()
  nfsd: return -EINVAL when namelen is 0
  NFSD: Wrap async copy operations with trace points
  NFSD: Clean up extra whitespace in trace_nfsd_copy_done
  NFSD: Record the callback stateid in copy tracepoints
  NFSD: Display copy stateids with conventional print formatting
  NFSD: Limit the number of concurrent async COPY operations
  NFSD: Async COPY result needs to return a write verifier
  nfsd: avoid races with wake_up_var()
  nfsd: use clear_and_wake_up_bit()
  sunrpc: xprtrdma: Use ERR_CAST() to return
  NFSD: Annotate struct pnfs_block_deviceaddr with __counted_by()
  nfsd: call cache_put if xdr_reserve_space returns NULL
  ...
2024-09-23 12:01:45 -07:00
Linus Torvalds
88264981f2 sched_ext: Initial pull request for v6.12
This is the initial pull request of sched_ext. The v7 patchset
 (https://lkml.kernel.org/r/20240618212056.2833381-1-tj@kernel.org) is
 applied on top of tip/sched/core + bpf/master as of Jun 18th.
 
   tip/sched/core 793a62823d1c ("sched/core: Drop spinlocks on contention iff kernel is preempti
 ble")
   bpf/master f6afdaf72a ("Merge branch 'bpf-support-resilient-split-btf'")
 
 Since then, the following pulls were made:
 
 - v6.11-rc1 is pulled to keep up with the mainline.
 
 - tip/sched/core was pulled several times:
 
   - 7b9f6c864a, 0df340ceae, 5ac998574f, 0b1777f0fa: To resolve
     conflicts. See each commit for details on conflicts and their
     resolutions.
 
   - d7b01aef9d: To receive fd03c5b858 ("sched: Rework pick_next_task()")
     and related commits. @prev in added to sched_class->put_prev_task() and
     put_prev_task() is reordered after ->pick_task(), which makes
     sched_class->switch_class() unnecessary. The follow-up commits update
     sched_ext accordingly and drop sched_class->switch_class().
 
 - bpf/master was pulled to receive baebe9aaba ("bpf: allow passing struct
   bpf_iter_<type> as kfunc arguments") and related changes in preparation
   for the DSQ iterator patchset
 
 To obtain the net sched_ext changes, diff against:
 
   git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext.git for-6.12-base
 
 which is the merge of:
 
   tip/sched/core bc9057da1a ("sched/cpufreq: Use NSEC_PER_MSEC for deadline task")
   bpf/master 2ad6d23f46 ("selftests/bpf: Do not update vmlinux.h unnecessarily")
 
 Since the v7 patchset, the following changes were made:
 
 - cpuperf support which was a part of the v6 patchset was posted separately
   and then applied after reviews.
 
 - cgroup support which was a part of the v6 patchset was posted seprately,
   iterated and then applied.
 
 - Improve integration with sched core.
 
 - Double locking usage in migration paths dropped. Depend on
   TASK_ON_RQ_MIGRATING synchronization instead.
 
 - The BPF scheduler couldn't directly dispatch to the local DSQ of another
   CPU using a SCX_DSQ_LOCAL_ON verdict. This caused difficulties around
   handling non-wakeup enqueues. Updated so that SCX_DSQ_LOCAL_ON can be used
   in the enqueue path too.
 
 - DSQ iterator which was a part of the v6 patchset was posted separately.
   The iterator itself was applied after a couple revisions. The associated
   selective consumption kfunc can use further improvements and is still
   being worked on.
 
 - scx_bpf_dispatch[_vtime]_from_dsq() added to increase flexibility. A task
   can now be transferred between two DSQs from almost any context. This
   involved significant refactoring of migration code.
 
 - Various fixes and improvements.
 
 As the branch is based on top of tip/sched/core + bpf/master, please merge
 after both are applied.
 -----BEGIN PGP SIGNATURE-----
 
 iIQEABYKACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCZuOSuA4cdGpAa2VybmVs
 Lm9yZwAKCRCxYfJx3gVYGVZyAQDBU3WPkYKB8gl6a6YQ+/PzBXorOK7mioS9A2iJ
 vBR3FgEAg1vtcss1S+2juWmVq7ItiFNWCqtXzUr/bVmL9CqqDwA=
 =bOOC
 -----END PGP SIGNATURE-----

Merge tag 'sched_ext-for-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext

Pull sched_ext support from Tejun Heo:
 "This implements a new scheduler class called ‘ext_sched_class’, or
  sched_ext, which allows scheduling policies to be implemented as BPF
  programs.

  The goals of this are:

   - Ease of experimentation and exploration: Enabling rapid iteration
     of new scheduling policies.

   - Customization: Building application-specific schedulers which
     implement policies that are not applicable to general-purpose
     schedulers.

   - Rapid scheduler deployments: Non-disruptive swap outs of scheduling
     policies in production environments"

See individual commits for more documentation, but also the cover letter
for the latest series:

Link: https://lore.kernel.org/all/20240618212056.2833381-1-tj@kernel.org/

* tag 'sched_ext-for-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext: (110 commits)
  sched: Move update_other_load_avgs() to kernel/sched/pelt.c
  sched_ext: Don't trigger ops.quiescent/runnable() on migrations
  sched_ext: Synchronize bypass state changes with rq lock
  scx_qmap: Implement highpri boosting
  sched_ext: Implement scx_bpf_dispatch[_vtime]_from_dsq()
  sched_ext: Compact struct bpf_iter_scx_dsq_kern
  sched_ext: Replace consume_local_task() with move_local_task_to_local_dsq()
  sched_ext: Move consume_local_task() upward
  sched_ext: Move sanity check and dsq_mod_nr() into task_unlink_from_dsq()
  sched_ext: Reorder args for consume_local/remote_task()
  sched_ext: Restructure dispatch_to_local_dsq()
  sched_ext: Fix processs_ddsp_deferred_locals() by unifying DTL_INVALID handling
  sched_ext: Make find_dsq_for_dispatch() handle SCX_DSQ_LOCAL_ON
  sched_ext: Refactor consume_remote_task()
  sched_ext: Rename scx_kfunc_set_sleepable to unlocked and relocate
  sched_ext: Add missing static to scx_dump_data
  sched_ext: Add missing static to scx_has_op[]
  sched_ext: Temporarily work around pick_task_scx() being called without balance_scx()
  sched_ext: Add a cgroup scheduler which uses flattened hierarchy
  sched_ext: Add cgroup support
  ...
2024-09-21 09:44:57 -07:00
Linus Torvalds
617a814f14 ALong with the usual shower of singleton patches, notable patch series in
this pull request are:
 
 "Align kvrealloc() with krealloc()" from Danilo Krummrich.  Adds
 consistency to the APIs and behaviour of these two core allocation
 functions.  This also simplifies/enables Rustification.
 
 "Some cleanups for shmem" from Baolin Wang.  No functional changes - mode
 code reuse, better function naming, logic simplifications.
 
 "mm: some small page fault cleanups" from Josef Bacik.  No functional
 changes - code cleanups only.
 
 "Various memory tiering fixes" from Zi Yan.  A small fix and a little
 cleanup.
 
 "mm/swap: remove boilerplate" from Yu Zhao.  Code cleanups and
 simplifications and .text shrinkage.
 
 "Kernel stack usage histogram" from Pasha Tatashin and Shakeel Butt.  This
 is a feature, it adds new feilds to /proc/vmstat such as
 
     $ grep kstack /proc/vmstat
     kstack_1k 3
     kstack_2k 188
     kstack_4k 11391
     kstack_8k 243
     kstack_16k 0
 
 which tells us that 11391 processes used 4k of stack while none at all
 used 16k.  Useful for some system tuning things, but partivularly useful
 for "the dynamic kernel stack project".
 
 "kmemleak: support for percpu memory leak detect" from Pavel Tikhomirov.
 Teaches kmemleak to detect leaksage of percpu memory.
 
 "mm: memcg: page counters optimizations" from Roman Gushchin.  "3
 independent small optimizations of page counters".
 
 "mm: split PTE/PMD PT table Kconfig cleanups+clarifications" from David
 Hildenbrand.  Improves PTE/PMD splitlock detection, makes powerpc/8xx work
 correctly by design rather than by accident.
 
 "mm: remove arch_make_page_accessible()" from David Hildenbrand.  Some
 folio conversions which make arch_make_page_accessible() unneeded.
 
 "mm, memcg: cg2 memory{.swap,}.peak write handlers" fro David Finkel.
 Cleans up and fixes our handling of the resetting of the cgroup/process
 peak-memory-use detector.
 
 "Make core VMA operations internal and testable" from Lorenzo Stoakes.
 Rationalizaion and encapsulation of the VMA manipulation APIs.  With a
 view to better enable testing of the VMA functions, even from a
 userspace-only harness.
 
 "mm: zswap: fixes for global shrinker" from Takero Funaki.  Fix issues in
 the zswap global shrinker, resulting in improved performance.
 
 "mm: print the promo watermark in zoneinfo" from Kaiyang Zhao.  Fill in
 some missing info in /proc/zoneinfo.
 
 "mm: replace follow_page() by folio_walk" from David Hildenbrand.  Code
 cleanups and rationalizations (conversion to folio_walk()) resulting in
 the removal of follow_page().
 
 "improving dynamic zswap shrinker protection scheme" from Nhat Pham.  Some
 tuning to improve zswap's dynamic shrinker.  Significant reductions in
 swapin and improvements in performance are shown.
 
 "mm: Fix several issues with unaccepted memory" from Kirill Shutemov.
 Improvements to the new unaccepted memory feature,
 
 "mm/mprotect: Fix dax puds" from Peter Xu.  Implements mprotect on DAX
 PUDs.  This was missing, although nobody seems to have notied yet.
 
 "Introduce a store type enum for the Maple tree" from Sidhartha Kumar.
 Cleanups and modest performance improvements for the maple tree library
 code.
 
 "memcg: further decouple v1 code from v2" from Shakeel Butt.  Move more
 cgroup v1 remnants away from the v2 memcg code.
 
 "memcg: initiate deprecation of v1 features" from Shakeel Butt.  Adds
 various warnings telling users that memcg v1 features are deprecated.
 
 "mm: swap: mTHP swap allocator base on swap cluster order" from Chris Li.
 Greatly improves the success rate of the mTHP swap allocation.
 
 "mm: introduce numa_memblks" from Mike Rapoport.  Moves various disparate
 per-arch implementations of numa_memblk code into generic code.
 
 "mm: batch free swaps for zap_pte_range()" from Barry Song.  Greatly
 improves the performance of munmap() of swap-filled ptes.
 
 "support large folio swap-out and swap-in for shmem" from Baolin Wang.
 With this series we no longer split shmem large folios into simgle-page
 folios when swapping out shmem.
 
 "mm/hugetlb: alloc/free gigantic folios" from Yu Zhao.  Nice performance
 improvements and code reductions for gigantic folios.
 
 "support shmem mTHP collapse" from Baolin Wang.  Adds support for
 khugepaged's collapsing of shmem mTHP folios.
 
 "mm: Optimize mseal checks" from Pedro Falcato.  Fixes an mprotect()
 performance regression due to the addition of mseal().
 
 "Increase the number of bits available in page_type" from Matthew Wilcox.
 Increases the number of bits available in page_type!
 
 "Simplify the page flags a little" from Matthew Wilcox.  Many legacy page
 flags are now folio flags, so the page-based flags and their
 accessors/mutators can be removed.
 
 "mm: store zero pages to be swapped out in a bitmap" from Usama Arif.  An
 optimization which permits us to avoid writing/reading zero-filled zswap
 pages to backing store.
 
 "Avoid MAP_FIXED gap exposure" from Liam Howlett.  Fixes a race window
 which occurs when a MAP_FIXED operqtion is occurring during an unrelated
 vma tree walk.
 
 "mm: remove vma_merge()" from Lorenzo Stoakes.  Major rotorooting of the
 vma_merge() functionality, making ot cleaner, more testable and better
 tested.
 
 "misc fixups for DAMON {self,kunit} tests" from SeongJae Park.  Minor
 fixups of DAMON selftests and kunit tests.
 
 "mm: memory_hotplug: improve do_migrate_range()" from Kefeng Wang.  Code
 cleanups and folio conversions.
 
 "Shmem mTHP controls and stats improvements" from Ryan Roberts.  Cleanups
 for shmem controls and stats.
 
 "mm: count the number of anonymous THPs per size" from Barry Song.  Expose
 additional anon THP stats to userspace for improved tuning.
 
 "mm: finish isolate/putback_lru_page()" from Kefeng Wang: more folio
 conversions and removal of now-unused page-based APIs.
 
 "replace per-quota region priorities histogram buffer with per-context
 one" from SeongJae Park.  DAMON histogram rationalization.
 
 "Docs/damon: update GitHub repo URLs and maintainer-profile" from SeongJae
 Park.  DAMON documentation updates.
 
 "mm/vdpa: correct misuse of non-direct-reclaim __GFP_NOFAIL and improve
 related doc and warn" from Jason Wang: fixes usage of page allocator
 __GFP_NOFAIL and GFP_ATOMIC flags.
 
 "mm: split underused THPs" from Yu Zhao.  Improve THP=always policy - this
 was overprovisioning THPs in sparsely accessed memory areas.
 
 "zram: introduce custom comp backends API" frm Sergey Senozhatsky.  Add
 support for zram run-time compression algorithm tuning.
 
 "mm: Care about shadow stack guard gap when getting an unmapped area" from
 Mark Brown.  Fix up the various arch_get_unmapped_area() implementations
 to better respect guard areas.
 
 "Improve mem_cgroup_iter()" from Kinsey Ho.  Improve the reliability of
 mem_cgroup_iter() and various code cleanups.
 
 "mm: Support huge pfnmaps" from Peter Xu.  Extends the usage of huge
 pfnmap support.
 
 "resource: Fix region_intersects() vs add_memory_driver_managed()" from
 Huang Ying.  Fix a bug in region_intersects() for systems with CXL memory.
 
 "mm: hwpoison: two more poison recovery" from Kefeng Wang.  Teaches a
 couple more code paths to correctly recover from the encountering of
 poisoned memry.
 
 "mm: enable large folios swap-in support" from Barry Song.  Support the
 swapin of mTHP memory into appropriately-sized folios, rather than into
 single-page folios.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZu1BBwAKCRDdBJ7gKXxA
 jlWNAQDYlqQLun7bgsAN4sSvi27VUuWv1q70jlMXTfmjJAvQqwD/fBFVR6IOOiw7
 AkDbKWP2k0hWPiNJBGwoqxdHHx09Xgo=
 =s0T+
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2024-09-20-02-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:
 "Along with the usual shower of singleton patches, notable patch series
  in this pull request are:

   - "Align kvrealloc() with krealloc()" from Danilo Krummrich. Adds
     consistency to the APIs and behaviour of these two core allocation
     functions. This also simplifies/enables Rustification.

   - "Some cleanups for shmem" from Baolin Wang. No functional changes -
     mode code reuse, better function naming, logic simplifications.

   - "mm: some small page fault cleanups" from Josef Bacik. No
     functional changes - code cleanups only.

   - "Various memory tiering fixes" from Zi Yan. A small fix and a
     little cleanup.

   - "mm/swap: remove boilerplate" from Yu Zhao. Code cleanups and
     simplifications and .text shrinkage.

   - "Kernel stack usage histogram" from Pasha Tatashin and Shakeel
     Butt. This is a feature, it adds new feilds to /proc/vmstat such as

       $ grep kstack /proc/vmstat
       kstack_1k 3
       kstack_2k 188
       kstack_4k 11391
       kstack_8k 243
       kstack_16k 0

     which tells us that 11391 processes used 4k of stack while none at
     all used 16k. Useful for some system tuning things, but
     partivularly useful for "the dynamic kernel stack project".

   - "kmemleak: support for percpu memory leak detect" from Pavel
     Tikhomirov. Teaches kmemleak to detect leaksage of percpu memory.

   - "mm: memcg: page counters optimizations" from Roman Gushchin. "3
     independent small optimizations of page counters".

   - "mm: split PTE/PMD PT table Kconfig cleanups+clarifications" from
     David Hildenbrand. Improves PTE/PMD splitlock detection, makes
     powerpc/8xx work correctly by design rather than by accident.

   - "mm: remove arch_make_page_accessible()" from David Hildenbrand.
     Some folio conversions which make arch_make_page_accessible()
     unneeded.

   - "mm, memcg: cg2 memory{.swap,}.peak write handlers" fro David
     Finkel. Cleans up and fixes our handling of the resetting of the
     cgroup/process peak-memory-use detector.

   - "Make core VMA operations internal and testable" from Lorenzo
     Stoakes. Rationalizaion and encapsulation of the VMA manipulation
     APIs. With a view to better enable testing of the VMA functions,
     even from a userspace-only harness.

   - "mm: zswap: fixes for global shrinker" from Takero Funaki. Fix
     issues in the zswap global shrinker, resulting in improved
     performance.

   - "mm: print the promo watermark in zoneinfo" from Kaiyang Zhao. Fill
     in some missing info in /proc/zoneinfo.

   - "mm: replace follow_page() by folio_walk" from David Hildenbrand.
     Code cleanups and rationalizations (conversion to folio_walk())
     resulting in the removal of follow_page().

   - "improving dynamic zswap shrinker protection scheme" from Nhat
     Pham. Some tuning to improve zswap's dynamic shrinker. Significant
     reductions in swapin and improvements in performance are shown.

   - "mm: Fix several issues with unaccepted memory" from Kirill
     Shutemov. Improvements to the new unaccepted memory feature,

   - "mm/mprotect: Fix dax puds" from Peter Xu. Implements mprotect on
     DAX PUDs. This was missing, although nobody seems to have notied
     yet.

   - "Introduce a store type enum for the Maple tree" from Sidhartha
     Kumar. Cleanups and modest performance improvements for the maple
     tree library code.

   - "memcg: further decouple v1 code from v2" from Shakeel Butt. Move
     more cgroup v1 remnants away from the v2 memcg code.

   - "memcg: initiate deprecation of v1 features" from Shakeel Butt.
     Adds various warnings telling users that memcg v1 features are
     deprecated.

   - "mm: swap: mTHP swap allocator base on swap cluster order" from
     Chris Li. Greatly improves the success rate of the mTHP swap
     allocation.

   - "mm: introduce numa_memblks" from Mike Rapoport. Moves various
     disparate per-arch implementations of numa_memblk code into generic
     code.

   - "mm: batch free swaps for zap_pte_range()" from Barry Song. Greatly
     improves the performance of munmap() of swap-filled ptes.

   - "support large folio swap-out and swap-in for shmem" from Baolin
     Wang. With this series we no longer split shmem large folios into
     simgle-page folios when swapping out shmem.

   - "mm/hugetlb: alloc/free gigantic folios" from Yu Zhao. Nice
     performance improvements and code reductions for gigantic folios.

   - "support shmem mTHP collapse" from Baolin Wang. Adds support for
     khugepaged's collapsing of shmem mTHP folios.

   - "mm: Optimize mseal checks" from Pedro Falcato. Fixes an mprotect()
     performance regression due to the addition of mseal().

   - "Increase the number of bits available in page_type" from Matthew
     Wilcox. Increases the number of bits available in page_type!

   - "Simplify the page flags a little" from Matthew Wilcox. Many legacy
     page flags are now folio flags, so the page-based flags and their
     accessors/mutators can be removed.

   - "mm: store zero pages to be swapped out in a bitmap" from Usama
     Arif. An optimization which permits us to avoid writing/reading
     zero-filled zswap pages to backing store.

   - "Avoid MAP_FIXED gap exposure" from Liam Howlett. Fixes a race
     window which occurs when a MAP_FIXED operqtion is occurring during
     an unrelated vma tree walk.

   - "mm: remove vma_merge()" from Lorenzo Stoakes. Major rotorooting of
     the vma_merge() functionality, making ot cleaner, more testable and
     better tested.

   - "misc fixups for DAMON {self,kunit} tests" from SeongJae Park.
     Minor fixups of DAMON selftests and kunit tests.

   - "mm: memory_hotplug: improve do_migrate_range()" from Kefeng Wang.
     Code cleanups and folio conversions.

   - "Shmem mTHP controls and stats improvements" from Ryan Roberts.
     Cleanups for shmem controls and stats.

   - "mm: count the number of anonymous THPs per size" from Barry Song.
     Expose additional anon THP stats to userspace for improved tuning.

   - "mm: finish isolate/putback_lru_page()" from Kefeng Wang: more
     folio conversions and removal of now-unused page-based APIs.

   - "replace per-quota region priorities histogram buffer with
     per-context one" from SeongJae Park. DAMON histogram
     rationalization.

   - "Docs/damon: update GitHub repo URLs and maintainer-profile" from
     SeongJae Park. DAMON documentation updates.

   - "mm/vdpa: correct misuse of non-direct-reclaim __GFP_NOFAIL and
     improve related doc and warn" from Jason Wang: fixes usage of page
     allocator __GFP_NOFAIL and GFP_ATOMIC flags.

   - "mm: split underused THPs" from Yu Zhao. Improve THP=always policy.
     This was overprovisioning THPs in sparsely accessed memory areas.

   - "zram: introduce custom comp backends API" frm Sergey Senozhatsky.
     Add support for zram run-time compression algorithm tuning.

   - "mm: Care about shadow stack guard gap when getting an unmapped
     area" from Mark Brown. Fix up the various arch_get_unmapped_area()
     implementations to better respect guard areas.

   - "Improve mem_cgroup_iter()" from Kinsey Ho. Improve the reliability
     of mem_cgroup_iter() and various code cleanups.

   - "mm: Support huge pfnmaps" from Peter Xu. Extends the usage of huge
     pfnmap support.

   - "resource: Fix region_intersects() vs add_memory_driver_managed()"
     from Huang Ying. Fix a bug in region_intersects() for systems with
     CXL memory.

   - "mm: hwpoison: two more poison recovery" from Kefeng Wang. Teaches
     a couple more code paths to correctly recover from the encountering
     of poisoned memry.

   - "mm: enable large folios swap-in support" from Barry Song. Support
     the swapin of mTHP memory into appropriately-sized folios, rather
     than into single-page folios"

* tag 'mm-stable-2024-09-20-02-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (416 commits)
  zram: free secondary algorithms names
  uprobes: turn xol_area->pages[2] into xol_area->page
  uprobes: introduce the global struct vm_special_mapping xol_mapping
  Revert "uprobes: use vm_special_mapping close() functionality"
  mm: support large folios swap-in for sync io devices
  mm: add nr argument in mem_cgroup_swapin_uncharge_swap() helper to support large folios
  mm: fix swap_read_folio_zeromap() for large folios with partial zeromap
  mm/debug_vm_pgtable: Use pxdp_get() for accessing page table entries
  set_memory: add __must_check to generic stubs
  mm/vma: return the exact errno in vms_gather_munmap_vmas()
  memcg: cleanup with !CONFIG_MEMCG_V1
  mm/show_mem.c: report alloc tags in human readable units
  mm: support poison recovery from copy_present_page()
  mm: support poison recovery from do_cow_fault()
  resource, kunit: add test case for region_intersects()
  resource: make alloc_free_mem_region() works for iomem_resource
  mm: z3fold: deprecate CONFIG_Z3FOLD
  vfio/pci: implement huge_fault support
  mm/arm64: support large pfn mappings
  mm/x86: support large pfn mappings
  ...
2024-09-21 07:29:05 -07:00
Chuck Lever
c4de97f7c4 svcrdma: Handle device removal outside of the CM event handler
Synchronously wait for all disconnects to complete to ensure the
transports have divested all hardware resources before the
underlying RDMA device can safely be removed.

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20 19:31:03 -04:00
Linus Torvalds
a1d1eb2f57 SCSI misc on 20240919
Updates to the usual drivers (ufs, smartpqi, NCR5380, mac_scsi, lpfc,
 mpi3mr).  There are no user visible core changes and a whole series of
 minor updates and fixes.  The largest core change is probably the
 simplification of the workqueue allocation path.
 
 Signed-off-by: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
 -----BEGIN PGP SIGNATURE-----
 
 iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCZuvd5yYcamFtZXMuYm90
 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishV7dAQC+TSlv
 BeNm8W4yAFCXLCwnJh8rT6ZzuBsjsIHH1DPP3wD+IXuIOFf5gVRJGpCNJc/dI082
 /ehSrIdeJxwaNoOOt+Y=
 =SXZD
 -----END PGP SIGNATURE-----

Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI updates from James Bottomley:
 "Updates to the usual drivers (ufs, smartpqi, NCR5380, mac_scsi, lpfc,
  mpi3mr).

  There are no user visible core changes and a whole series of minor
  updates and fixes. The largest core change is probably the
  simplification of the workqueue allocation path"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (86 commits)
  scsi: smartpqi: update driver version to 2.1.30-031
  scsi: smartpqi: fix volume size updates
  scsi: smartpqi: fix rare system hang during LUN reset
  scsi: smartpqi: add new controller PCI IDs
  scsi: smartpqi: add counter for parity write stream requests
  scsi: smartpqi: correct stream detection
  scsi: smartpqi: Add fw log to kdump
  scsi: bnx2fc: Remove some unused fields in struct bnx2fc_rport
  scsi: qla2xxx: Remove the unused 'del_list_entry' field in struct fc_port
  scsi: ufs: core: Remove ufshcd_urgent_bkops()
  scsi: core: Remove obsoleted declaration for scsi_driverbyte_string()
  scsi: bnx2i: Remove unused declarations
  scsi: core: Simplify an alloc_workqueue() invocation
  scsi: ufs: Simplify alloc*_workqueue() invocation
  scsi: stex: Simplify an alloc_ordered_workqueue() invocation
  scsi: scsi_transport_fc: Simplify alloc_workqueue() invocations
  scsi: snic: Simplify alloc_workqueue() invocations
  scsi: qedi: Simplify an alloc_workqueue() invocation
  scsi: qedf: Simplify alloc_workqueue() invocations
  scsi: myrs: Simplify an alloc_ordered_workqueue() invocation
  ...
2024-09-19 11:28:51 +02:00
Linus Torvalds
726e2d0cf2 dma-mapping updates for linux 6.12
- support DMA zones for arm64 systems where memory starts at > 4GB
    (Baruch Siach, Catalin Marinas)
  - support direct calls into dma-iommu and thus obsolete dma_map_ops for
    many common configurations (Leon Romanovsky)
  - add DMA-API tracing (Sean Anderson)
  - remove the not very useful return value from various dma_set_* APIs
    (Christoph Hellwig)
  - misc cleanups and minor optimizations (Chen Y, Yosry Ahmed,
    Christoph Hellwig)
 -----BEGIN PGP SIGNATURE-----
 
 iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAmbr2BALHGhjaEBsc3Qu
 ZGUACgkQD55TZVIEUYNheA/6A453SQy2kFvspFRvEp8ztEqtvxwxGLAUMIyvmU+a
 9b37KlMwUnpbMsXK5+KtYdTLRoIvtl89uIkdZq7pYYKj0uoPZvF9QVnKtrJWAvqK
 fFuauokZznuD3ZSd6v6uY4ijb29ImGfx5kZopQf1zWoYLENxM7mWqRU+eqxDozev
 FbyfYhJzMBhpHveen9+Q7PEfi/90ZdEqtJhSK2AOzuV9ZvbYiSFCrcnT/4wM30DS
 2OxjGa8tKcGYZ9ah0rF2V5hboaRuYedTFgXoKfUSJINJkzmBlTXdxVx5Xr3kQtyC
 7S/xv2y79CXkDKck2+IY7xkhwwBsXPrTAyTzWAIJqOEmaMJ4KqEW54JOsK+VHfmO
 29UKBnASOK0xvfCzakm2631iOzEZF743RgpQiOGeMcnph789Mwu8EUCcqeEW/fJy
 Xh7B0z3/XgJz8BtTG/64IhmqO63Cwa/o7DSQdLr9dh5F/mPBzqrnRov97KL7mH1q
 VSO0Z7+8J0x9ALcYutpth/IzG/lXtXn/pfR1sj6dBHvjf5SwjuT8MKUHgh0l6N+C
 BWZn8swwrZaJ2Li2Gv3CpnCzVQZCkL6ns9VqAWiWq7VfGhDLndMqfi/jHCyGH83i
 E3dMtqf81XaQ7JRDPCs7Jx/4Zkn/iNkkZe8IQsByMc1BY4oeD7/Z2s8mkK8MbNla
 /CA=
 =DZVc
 -----END PGP SIGNATURE-----

Merge tag 'dma-mapping-6.12-2024-09-19' of git://git.infradead.org/users/hch/dma-mapping

Pull dma-mapping updates from Christoph Hellwig:

 - support DMA zones for arm64 systems where memory starts at > 4GB
   (Baruch Siach, Catalin Marinas)

 - support direct calls into dma-iommu and thus obsolete dma_map_ops for
   many common configurations (Leon Romanovsky)

 - add DMA-API tracing (Sean Anderson)

 - remove the not very useful return value from various dma_set_* APIs
   (Christoph Hellwig)

 - misc cleanups and minor optimizations (Chen Y, Yosry Ahmed, Christoph
   Hellwig)

* tag 'dma-mapping-6.12-2024-09-19' of git://git.infradead.org/users/hch/dma-mapping:
  dma-mapping: reflow dma_supported
  dma-mapping: reliably inform about DMA support for IOMMU
  dma-mapping: add tracing for dma-mapping API calls
  dma-mapping: use IOMMU DMA calls for common alloc/free page calls
  dma-direct: optimize page freeing when it is not addressable
  dma-mapping: clearly mark DMA ops as an architecture feature
  vdpa_sim: don't select DMA_OPS
  arm64: mm: keep low RAM dma zone
  dma-mapping: don't return errors from dma_set_max_seg_size
  dma-mapping: don't return errors from dma_set_seg_boundary
  dma-mapping: don't return errors from dma_set_min_align_mask
  scsi: check that busses support the DMA API before setting dma parameters
  arm64: mm: fix DMA zone when dma-ranges is missing
  dma-mapping: direct calls for dma-iommu
  dma-mapping: call ->unmap_page and ->unmap_sg unconditionally
  arm64: support DMA zone above 4GB
  dma-mapping: replace zone_dma_bits by zone_dma_limit
  dma-mapping: use bit masking to check VM_DMA_COHERENT
2024-09-19 11:12:49 +02:00
Linus Torvalds
84bbfe6b64 platform-drivers-x86 for v6.12-1
Highlights:
  -  asus-wmi: Add support for vivobook fan profiles
  -  dell-laptop: Add knobs to change battery charge settings
  -  lg-laptop: Add operation region support
  -  intel-uncore-freq: Add support for efficiency latency control
  -  intel/ifs: Add SBAF test support
  -  intel/pmc: Ignore all LTRs during suspend
  -  platform/surface: Support for arm64 based Surface devices
  -  wmi: Pass event data directly to legacy notify handlers
  -  x86/platform/geode: switch GPIO buttons and LEDs to software properties
  -  bunch of small cleanups, fixes, hw-id additions, etc.
 
 The following is an automated git shortlog grouped by driver:
 
 Documentation:
  -  admin-guide: pm: Add efficiency vs. latency tradeoff to uncore documentation
 
 ISST:
  -  Simplify isst_misc_reg() and isst_misc_unreg()
 
 MAINTAINERS:
  -  adjust file entry in INTEL MID PLATFORM
  -  Add Intel MID section
 
 Merge tag 'hwmon-for-v6.11-rc7' into review-hans:
  - Merge tag 'hwmon-for-v6.11-rc7' into review-hans
 
 Merge tag 'platform-drivers-x86-v6.11-3' into review-hans:
  - Merge tag 'platform-drivers-x86-v6.11-3' into review-hans
 
 acer-wmi:
  -  Use backlight power constants
 
 asus-laptop:
  -  Use backlight power constants
 
 asus-nb-wmi:
  -  Use backlight power constants
 
 asus-wmi:
  -  don't fail if platform_profile already registered
  -  add debug print in more key places
  -  Use backlight power constants
  -  add support for vivobook fan profiles
 
 dell-laptop:
  -  remove duplicate code w/ battery function
  -  Add knobs to change battery charge settings
 
 dt-bindings:
  -  platform: Add Surface System Aggregator Module
  -  serial: Allow embedded-controller as child node
 
 eeepc-laptop:
  -  Use backlight power constants
 
 eeepc-wmi:
  -  Use backlight power constants
 
 fujitsu-laptop:
  -  Use backlight power constants
 
 hid-asus:
  -  use hid for brightness control on keyboard
 
 ideapad-laptop:
  -  Make the scope_guard() clear of its scope
  -  move ACPI helpers from header to source file
  -  Use backlight power constants
 
 int3472:
  -  Use str_high_low()
  -  Use GPIO_LOOKUP() macro
  -  make common part a separate module
 
 intel-hid:
  -  Use string_choices API instead of ternary operator
 
 intel/pmc:
  -  Ignore all LTRs during suspend
  -  Remove unused param idx from pmc_for_each_mode()
 
 intel_scu_ipc:
  -  Move intel_scu_ipc.h out of arch/x86/include/asm
 
 intel_scu_wdt:
  -  Move intel_scu_wdt.h to x86 subfolder
 
 lenovo-ymc:
  -  Ignore the 0x0 state
 
 lg-laptop:
  -  Add operation region support
 
 oaktrail:
  -  Use backlight power constants
 
 panasonic-laptop:
  -  Add support for programmable buttons
 
 platform/mellanox:
  -  mlxbf-pmc: fix lockdep warning
 
 platform/olpc:
  -  Remove redundant null pointer checks in olpc_ec_setup_debugfs()
 
 platform/surface:
  -  Add OF support
 
 platform/x86/amd:
  -  pmf: Add quirk for TUF Gaming A14
 
 platform/x86/amd/pmf:
  -  Update SMU metrics table for 1AH family series
  -  Relocate CPU ID macros to the PMF header
  -  Add support for notifying Smart PC Solution updates
 
 platform/x86/intel-uncore-freq:
  -  Add efficiency latency control to sysfs interface
  -  Add support for efficiency latency control
  -  Do not present separate package-die domain
 
 platform/x86/intel/ifs:
  -  Fix SBAF title underline length
  -  Add SBAF test support
  -  Add SBAF test image loading support
  -  Refactor MSR usage in IFS test code
 
 platform/x86/intel/pmc:
  -  Show live substate requirements
 
 platform/x86/intel/pmt:
  -  Use PMT callbacks
 
 platform/x86/intel/vsec:
  -  Add PMT read callbacks
 
 platform/x86/intel/vsec.h:
  -  Move to include/linux
 
 samsung-laptop:
  -  Use backlight power constants
 
 serial-multi-instantiate:
  -  Don't require both I2C and SPI
 
 thinkpad_acpi:
  -  Fix uninitialized symbol 's' warning
  -  Add Thinkpad Edge E531 fan support
 
 touchscreen_dmi:
  -  add nanote-next quirk
 
 trace:
  -  platform/x86/intel/ifs: Add SBAF trace support
 
 wmi:
  -  Call both legacy and WMI driver notify handlers
  -  Merge get_event_data() with wmi_get_notify_data()
  -  Remove wmi_get_event_data()
  -  Pass event data directly to legacy notify handlers
 
 x86-android-tablets:
  -  Adjust Xiaomi Pad 2 bottom bezel touch buttons LED
  -  Fix spelling in the comments
 
 x86/platform/geode:
  -  switch GPIO buttons and LEDs to software properties
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEEuvA7XScYQRpenhd+kuxHeUQDJ9wFAmbq2tYUHGhkZWdvZWRl
 QHJlZGhhdC5jb20ACgkQkuxHeUQDJ9xKYAgAoXZt1MjBDA1mP813i4bj8CYQHWO+
 YnugVhEccucxgC6sBGzQeRLBNuG/VaBN6tyJ1pKYMpWV5gSthq1Iop+DZbno2ciM
 QAnSSzioHB/dhYBXuKmZatkMsKLjLjtfcexUed9DfwKapqFl3XQMb6cEYasM37hH
 197K4yAFF3oqQImlACwQDxN1q3eCG6bdIbEAByZW7yH644IC5zH8/CiFjTCwUx/F
 aFIHQlLLzt1kjhD8AbRHhRcsGbzG2ejHsC3yrQddEJSOkInDO8baR0aDyhBTUFPE
 lztuekFfaJ1Xcyoc/Zf4pi3ab1Djt+Htck3CHLO/xcl0YYMlM5vcs1QlhQ==
 =sAk7
 -----END PGP SIGNATURE-----

Merge tag 'platform-drivers-x86-v6.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86

Pull x86 platform drivers updates from Hans de Goede:

 - asus-wmi: Add support for vivobook fan profiles

 - dell-laptop: Add knobs to change battery charge settings

 - lg-laptop: Add operation region support

 - intel-uncore-freq: Add support for efficiency latency control

 - intel/ifs: Add SBAF test support

 - intel/pmc: Ignore all LTRs during suspend

 - platform/surface: Support for arm64 based Surface devices

 - wmi: Pass event data directly to legacy notify handlers

 - x86/platform/geode: switch GPIO buttons and LEDs to software
   properties

 - bunch of small cleanups, fixes, hw-id additions, etc.

* tag 'platform-drivers-x86-v6.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (65 commits)
  MAINTAINERS: adjust file entry in INTEL MID PLATFORM
  platform/x86: x86-android-tablets: Adjust Xiaomi Pad 2 bottom bezel touch buttons LED
  platform/mellanox: mlxbf-pmc: fix lockdep warning
  platform/x86/amd: pmf: Add quirk for TUF Gaming A14
  platform/x86: touchscreen_dmi: add nanote-next quirk
  platform/x86: asus-wmi: don't fail if platform_profile already registered
  platform/x86: asus-wmi: add debug print in more key places
  platform/x86: intel_scu_wdt: Move intel_scu_wdt.h to x86 subfolder
  platform/x86: intel_scu_ipc: Move intel_scu_ipc.h out of arch/x86/include/asm
  MAINTAINERS: Add Intel MID section
  platform/x86: panasonic-laptop: Add support for programmable buttons
  platform/olpc: Remove redundant null pointer checks in olpc_ec_setup_debugfs()
  platform/x86: intel/pmc: Ignore all LTRs during suspend
  platform/x86: wmi: Call both legacy and WMI driver notify handlers
  platform/x86: wmi: Merge get_event_data() with wmi_get_notify_data()
  platform/x86: wmi: Remove wmi_get_event_data()
  platform/x86: wmi: Pass event data directly to legacy notify handlers
  platform/x86: thinkpad_acpi: Fix uninitialized symbol 's' warning
  platform/x86: x86-android-tablets: Fix spelling in the comments
  platform/x86: ideapad-laptop: Make the scope_guard() clear of its scope
  ...
2024-09-19 09:16:04 +02:00
Linus Torvalds
4a39ac5b7d Random number generator updates for Linux 6.12-rc1.
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEq5lC5tSkz8NBJiCnSfxwEqXeA64FAmboHyUACgkQSfxwEqXe
 A66wGQ/8DRIjBllwf1YuTWi4T6OcfoYxK6C9bXO6QPP5gzdTyFE9pvDuuPyad6+F
 FR086ydTHeodemz1dFiQCL9etcUaxo4+6FRKyXKF9/1ezGbTA5nJd0/fKJGlqbI2
 EoA4LNYHOsvCZk1BTpxRNWKeKphU9zQgQdSigy6Rx8p269UkGmIZjD1PtUc+vqfR
 Ox0dK/Cswyo236fRi5HzaoMntWI4vXgLfxty0e1R7tfbstkCxSKWAON1lo3uHgkA
 0HpJXWgWXAPt9gp++Fs/jGNpOqbt6IaKeV5f7CjYfvWhlFjNMhQxF+PbxknaZn/k
 K0gQsItOIoFTfbQdLDIdfnj9awMdLW8FB2A1WXHpNr9pVC4ickPb1bMTF/XRd0tm
 wBNu4BL0gklx6017KZg5uINMIduzMLGkBLRFiBW0en/sZMLTJTMg58BJn0CL1Pmh
 1ll/Q3ToSMHalvxU2OnJagTwh4fzzCEpK/hW9WiDO4jSCsMXyX0clinrCjNo1JfA
 tqgTWEy3uGtg+dg0Du9VD5JASbNQSJ0ZRnas5+qz10IRWWfTolrsk61dliXLQ4Sv
 tSryDtsE2znwJF1Krh4aHNSSVhD5/l/8QaXkf9aZc/kkaHxwsx83FuWnqw6nMz8c
 l4B2MbH0jUgsEqEyx+0iwk+FXE9kZKWumTVLjFZ6bRnq3q+uq0U=
 =mWCw
 -----END PGP SIGNATURE-----

Merge tag 'random-6.12-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random

Pull random number generator updates from Jason Donenfeld:
 "Originally I'd planned on sending each of the vDSO getrandom()
  architecture ports to their respective arch trees. But as we started
  to work on this, we found lots of interesting issues in the shared
  code and infrastructure, the fixes for which the various archs needed
  to base their work.

  So in the end, this turned into a nice collaborative effort fixing up
  issues and porting to 5 new architectures -- arm64, powerpc64,
  powerpc32, s390x, and loongarch64 -- with everybody pitching in and
  commenting on each other's code. It was a fun development cycle.

  This contains:

   - Numerous fixups to the vDSO selftest infrastructure, getting it
     running successfully on more platforms, and fixing bugs in it.

   - Additions to the vDSO getrandom & chacha selftests. Basically every
     time manual review unearthed a bug in a revision of an arch patch,
     or an ambiguity, the tests were augmented.

     By the time the last arch was submitted for review, s390x, v1 of
     the series was essentially fine right out of the gate.

   - Fixes to the the generic C implementation of vDSO getrandom, to
     build and run successfully on all archs, decoupling it from
     assumptions we had (unintentionally) made on x86_64 that didn't
     carry through to the other architectures.

   - Port of vDSO getrandom to LoongArch64, from Xi Ruoyao and acked by
     Huacai Chen.

   - Port of vDSO getrandom to ARM64, from Adhemerval Zanella and acked
     by Will Deacon.

   - Port of vDSO getrandom to PowerPC, in both 32-bit and 64-bit
     varieties, from Christophe Leroy and acked by Michael Ellerman.

   - Port of vDSO getrandom to S390X from Heiko Carstens, the arch
     maintainer.

  While it'd be natural for there to be things to fix up over the course
  of the development cycle, these patches got a decent amount of review
  from a fairly diverse crew of folks on the mailing lists, and, for the
  most part, they've been cooking in linux-next, which has been helpful
  for ironing out build issues.

  In terms of architectures, I think that mostly takes care of the
  important 64-bit archs with hardware still being produced and running
  production loads in settings where vDSO getrandom is likely to help.

  Arguably there's still RISC-V left, and we'll see for 6.13 whether
  they find it useful and submit a port"

* tag 'random-6.12-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random: (47 commits)
  selftests: vDSO: check cpu caps before running chacha test
  s390/vdso: Wire up getrandom() vdso implementation
  s390/vdso: Move vdso symbol handling to separate header file
  s390/vdso: Allow alternatives in vdso code
  s390/module: Provide find_section() helper
  s390/facility: Let test_facility() generate static branch if possible
  s390/alternatives: Remove ALT_FACILITY_EARLY
  s390/facility: Disable compile time optimization for decompressor code
  selftests: vDSO: fix vdso_config for s390
  selftests: vDSO: fix ELF hash table entry size for s390x
  powerpc/vdso: Wire up getrandom() vDSO implementation on VDSO64
  powerpc/vdso: Wire up getrandom() vDSO implementation on VDSO32
  powerpc/vdso: Refactor CFLAGS for CVDSO build
  powerpc/vdso32: Add crtsavres
  mm: Define VM_DROPPABLE for powerpc/32
  powerpc/vdso: Fix VDSO data access when running in a non-root time namespace
  selftests: vDSO: don't include generated headers for chacha test
  arm64: vDSO: Wire up getrandom() vDSO implementation
  arm64: alternative: make alternative_has_cap_likely() VDSO compatible
  selftests: vDSO: also test counter in vdso_test_chacha
  ...
2024-09-18 15:26:31 +02:00
Linus Torvalds
cc52dc2fe3 pwm: Changes for v6.12-rc1
This pull request contains some cleanups to the core and some mostly
 minor updates to a bunch of drivers and device tree bindings. One thing
 worth pointing out is that it contains an immutable branch containing
 support for a new mfd chip (Analog Devices ADP5585) with several sub
 drivers. So expect to get the four affected commits also from my fellow
 MFD and GPIO maintainers.
 
 Thanks go to Andrew Kreimer, Clark Wang, Conor Dooley, David Lechner,
 Dmitry Rokosov, Frank Li, Geert Uytterhoeven, George Stark, Jiapeng
 Chong, Krzysztof Kozlowski, Laurent Pinchart, Liao Chen, Liu Ying, Rob
 Herring and Wolfram Sang for code contributions and reviews and to Lee
 Jones for preparing the above mentioned immutable branch.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEP4GsaTp6HlmJrf7Tj4D7WH0S/k4FAmboM3AACgkQj4D7WH0S
 /k6IzQgAj+3B4F4UKPPI8jcQqRQGOWfjA365nIQmr1oeFYSGDILv4btU1TNV1MfH
 WLXMRXLQb4dng21J8IwIJ/qyndL+GjRj3KWxLHJa3+/gxf8YuGwWJlNjlxtrGXM/
 3JQ/aWqfgCf4KTRG3MoCTKc5fxtbHHWZ71kGdi6cchk1HggyBUH/7g85h/VkhCuc
 JpOC7CvDVmzTkTIltCbiVJQ4xO3zmsV2WgnsWUzN+41PUjqJmMLmhKjI6UdAYWlI
 B3qgCMXik153oYgaIw/BMtxFWa9e2ZxZ6hV+gx4tVQWbOtBPUxEqHpX2dt1fp5+h
 7PQoKVWJycykdxmlOSGnjOl3RHVX5A==
 =VjPD
 -----END PGP SIGNATURE-----

Merge tag 'pwm/for-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ukleinek/linux

Pull pwm updates from Uwe Kleine-König:
 "This contains some cleanups to the core and some mostly minor updates
  to a bunch of drivers and device tree bindings. One thing worth
  pointing out is that it contains an immutable branch containing
  support for a new mfd chip (Analog Devices ADP5585) with several sub
  drivers.

  Thanks go to Andrew Kreimer, Clark Wang, Conor Dooley, David Lechner,
  Dmitry Rokosov, Frank Li, Geert Uytterhoeven, George Stark, Jiapeng
  Chong, Krzysztof Kozlowski, Laurent Pinchart, Liao Chen, Liu Ying, Rob
  Herring and Wolfram Sang for code contributions and reviews and to Lee
  Jones for preparing the above mentioned immutable branch"

* tag 'pwm/for-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ukleinek/linux: (21 commits)
  pwm: stm32: Fix a typo
  dt-bindings: pwm: amlogic: Add new bindings for meson A1 PWM
  dt-bindings: pwm: amlogic: Add optional power-domains
  pwm: Switch back to struct platform_driver::remove()
  dt-bindings: pwm: allwinner,sun4i-a10-pwm: add top-level constraints
  pwm: axi-pwmgen: use shared macro for version reg
  pwm: atmel-hlcdc: Drop trailing comma
  pwm: atmel-hlcdc: Enable module autoloading
  pwm: omap-dmtimer: Use of_property_read_bool()
  pwm: adp5585: Set OSC_EN bit to 1 when PWM state is enabled
  pwm: lp3943: Fix an incorrect type in lp3943_pwm_parse_dt()
  pwm: Simplify pwm_capture()
  pwm: lp3943: Use of_property_count_u32_elems() to get property length
  pwm: Don't export pwm_capture()
  pwm: Make info in traces about affected pwm more useful
  dt-bindings: pwm: renesas,tpu: Add r8a779h0 support
  dt-bindings: pwm: renesas,pwm-rcar: Add r8a779h0 support
  pwm: adp5585: Add Analog Devices ADP5585 support
  gpio: adp5585: Add Analog Devices ADP5585 support
  mfd: adp5585: Add Analog Devices ADP5585 core support
  ...
2024-09-18 10:39:35 +02:00
Linus Torvalds
067610ebaa RCU pull request for v6.12
This pull request contains the following branches:
 
 context_tracking.15.08.24a: Rename context tracking state related
         symbols and remove references to "dynticks" in various context
         tracking state variables and related helpers; force
         context_tracking_enabled_this_cpu() to be inlined to avoid
         leaving a noinstr section.
 
 csd.lock.15.08.24a: Enhance CSD-lock diagnostic reports; add an API
         to provide an indication of ongoing CSD-lock stall.
 
 nocb.09.09.24a: Update and simplify RCU nocb code to handle
         (de-)offloading of callbacks only for offline CPUs; fix RT
         throttling hrtimer being armed from offline CPU.
 
 rcutorture.14.08.24a: Remove redundant rcu_torture_ops get_gp_completed
         fields; add SRCU ->same_gp_state and ->get_comp_state
         functions; add generic test for NUM_ACTIVE_*RCU_POLL* for
         testing RCU and SRCU polled grace periods; add CFcommon.arch
         for arch-specific Kconfig options; print number of update types
         in rcu_torture_write_types();
         add rcutree.nohz_full_patience_delay testing to the TREE07
         scenario; add a stall_cpu_repeat module parameter to test
         repeated CPU stalls; add argument to limit number of CPUs a
         guest OS can use in torture.sh;
 
 rcustall.09.09.24a: Abbreviate RCU CPU stall warnings during CSD-lock
         stalls; Allow dump_cpu_task() to be called without disabling
         preemption; defer printing stall-warning backtrace when holding
         rcu_node lock.
 
 srcu.12.08.24a: Make SRCU gp seq wrap-around faster; add KCSAN checks
         for concurrent updates to ->srcu_n_exp_nodelay and
         ->reschedule_count which are used in heuristics governing
         auto-expediting of normal SRCU grace periods and
         grace-period-state-machine delays; mark idle SRCU-barrier
         callbacks to help identify stuck SRCU-barrier callback.
 
 rcu.tasks.14.08.24a: Remove RCU Tasks Rude asynchronous APIs as they
         are no longer used; stop testing RCU Tasks Rude asynchronous
         APIs; fix access to non-existent percpu regions; check
         processor-ID assumptions during chosen CPU calculation for
         callback enqueuing; update description of rtp->tasks_gp_seq
         grace-period sequence number; add rcu_barrier_cb_is_done()
         to identify whether a given rcu_barrier callback is stuck;
         mark idle Tasks-RCU-barrier callbacks; add
         *torture_stats_print() functions to print detailed
         diagnostics for Tasks-RCU variants; capture start time of
         rcu_barrier_tasks*() operation to help distinguish a hung
         barrier operation from a long series of barrier operations.
 
 rcu_scaling_tests.15.08.24a:
         refscale: Add a TINY scenario to support tests of Tiny RCU
         and Tiny SRCU; Optimize process_durations() operation;
 
         rcuscale: Dump stacks of stalled rcu_scale_writer() instances;
         dump grace-period statistics when rcu_scale_writer() stalls;
         mark idle RCU-barrier callbacks to identify stuck RCU-barrier
         callbacks; print detailed grace-period and barrier diagnostics
         on rcu_scale_writer() hangs for Tasks-RCU variants; warn if
         async module parameter is specified for RCU implementations
         that do not have async primitives such as RCU Tasks Rude;
         make all writer tasks report upon hang; tolerate repeated
         GFP_KERNEL failure in rcu_scale_writer(); use special allocator
         for rcu_scale_writer(); NULL out top-level pointers to heap
         memory to avoid double-free bugs on modprobe failures; maintain
         per-task instead of per-CPU callbacks count to avoid any issues
         with migration of either tasks or callbacks; constify struct
         ref_scale_ops.
 
 fixes.12.08.24a: Use system_unbound_wq for kfree_rcu work to avoid
         disturbing isolated CPUs.
 
 misc.11.08.24a: Warn on unexpected rcu_state.srs_done_tail state;
         Better define "atomic" for list_replace_rcu() and
         hlist_replace_rcu() routines; annotate struct
         kvfree_rcu_bulk_data with __counted_by().
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQSi2tPIQIc2VEtjarIAHS7/6Z0wpQUCZt8+8wAKCRAAHS7/6Z0w
 pTqoAPwPN//tlEoJx2PRs6t0q+nD1YNvnZawPaRmdzgdM8zJogD+PiSN+XhqRr80
 jzyvMDU4Aa0wjUNP3XsCoaCxo7L/lQk=
 =bZ9z
 -----END PGP SIGNATURE-----

Merge tag 'rcu.release.v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux

Pull RCU updates from Neeraj Upadhyay:
 "Context tracking:
   - rename context tracking state related symbols and remove references
     to "dynticks" in various context tracking state variables and
     related helpers
   - force context_tracking_enabled_this_cpu() to be inlined to avoid
     leaving a noinstr section

  CSD lock:
   - enhance CSD-lock diagnostic reports
   - add an API to provide an indication of ongoing CSD-lock stall

  nocb:
   - update and simplify RCU nocb code to handle (de-)offloading of
     callbacks only for offline CPUs
   - fix RT throttling hrtimer being armed from offline CPU

  rcutorture:
   - remove redundant rcu_torture_ops get_gp_completed fields
   - add SRCU ->same_gp_state and ->get_comp_state functions
   - add generic test for NUM_ACTIVE_*RCU_POLL* for testing RCU and SRCU
     polled grace periods
   - add CFcommon.arch for arch-specific Kconfig options
   - print number of update types in rcu_torture_write_types()
   - add rcutree.nohz_full_patience_delay testing to the TREE07 scenario
   - add a stall_cpu_repeat module parameter to test repeated CPU stalls
   - add argument to limit number of CPUs a guest OS can use in
     torture.sh

  rcustall:
   - abbreviate RCU CPU stall warnings during CSD-lock stalls
   - Allow dump_cpu_task() to be called without disabling preemption
   - defer printing stall-warning backtrace when holding rcu_node lock

  srcu:
   - make SRCU gp seq wrap-around faster
   - add KCSAN checks for concurrent updates to ->srcu_n_exp_nodelay and
     ->reschedule_count which are used in heuristics governing
     auto-expediting of normal SRCU grace periods and
     grace-period-state-machine delays
   - mark idle SRCU-barrier callbacks to help identify stuck
     SRCU-barrier callback

  rcu tasks:
   - remove RCU Tasks Rude asynchronous APIs as they are no longer used
   - stop testing RCU Tasks Rude asynchronous APIs
   - fix access to non-existent percpu regions
   - check processor-ID assumptions during chosen CPU calculation for
     callback enqueuing
   - update description of rtp->tasks_gp_seq grace-period sequence
     number
   - add rcu_barrier_cb_is_done() to identify whether a given
     rcu_barrier callback is stuck
   - mark idle Tasks-RCU-barrier callbacks
   - add *torture_stats_print() functions to print detailed diagnostics
     for Tasks-RCU variants
   - capture start time of rcu_barrier_tasks*() operation to help
     distinguish a hung barrier operation from a long series of barrier
     operations

  refscale:
   - add a TINY scenario to support tests of Tiny RCU and Tiny
     SRCU
   - optimize process_durations() operation

  rcuscale:
   - dump stacks of stalled rcu_scale_writer() instances and
     grace-period statistics when rcu_scale_writer() stalls
   - mark idle RCU-barrier callbacks to identify stuck RCU-barrier
     callbacks
   - print detailed grace-period and barrier diagnostics on
     rcu_scale_writer() hangs for Tasks-RCU variants
   - warn if async module parameter is specified for RCU implementations
     that do not have async primitives such as RCU Tasks Rude
   - make all writer tasks report upon hang
   - tolerate repeated GFP_KERNEL failure in rcu_scale_writer()
   - use special allocator for rcu_scale_writer()
   - NULL out top-level pointers to heap memory to avoid double-free
     bugs on modprobe failures
   - maintain per-task instead of per-CPU callbacks count to avoid any
     issues with migration of either tasks or callbacks
   - constify struct ref_scale_ops

  Fixes:
   - use system_unbound_wq for kfree_rcu work to avoid disturbing
     isolated CPUs

  Misc:
   - warn on unexpected rcu_state.srs_done_tail state
   - better define "atomic" for list_replace_rcu() and
     hlist_replace_rcu() routines
   - annotate struct kvfree_rcu_bulk_data with __counted_by()"

* tag 'rcu.release.v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux: (90 commits)
  rcu: Defer printing stall-warning backtrace when holding rcu_node lock
  rcu/nocb: Remove superfluous memory barrier after bypass enqueue
  rcu/nocb: Conditionally wake up rcuo if not already waiting on GP
  rcu/nocb: Fix RT throttling hrtimer armed from offline CPU
  rcu/nocb: Simplify (de-)offloading state machine
  context_tracking: Tag context_tracking_enabled_this_cpu() __always_inline
  context_tracking, rcu: Rename rcu_dyntick trace event into rcu_watching
  rcu: Update stray documentation references to rcu_dynticks_eqs_{enter, exit}()
  rcu: Rename rcu_momentary_dyntick_idle() into rcu_momentary_eqs()
  rcu: Rename rcu_implicit_dynticks_qs() into rcu_watching_snap_recheck()
  rcu: Rename dyntick_save_progress_counter() into rcu_watching_snap_save()
  rcu: Rename struct rcu_data .exp_dynticks_snap into .exp_watching_snap
  rcu: Rename struct rcu_data .dynticks_snap into .watching_snap
  rcu: Rename rcu_dynticks_zero_in_eqs() into rcu_watching_zero_in_eqs()
  rcu: Rename rcu_dynticks_in_eqs_since() into rcu_watching_snap_stopped_since()
  rcu: Rename rcu_dynticks_in_eqs() into rcu_watching_snap_in_eqs()
  rcu: Rename rcu_dynticks_eqs_online() into rcu_watching_online()
  context_tracking, rcu: Rename rcu_dynticks_curr_cpu_in_eqs() into rcu_is_watching_curr_cpu()
  context_tracking, rcu: Rename rcu_dynticks_task*() into rcu_task*()
  refscale: Constify struct ref_scale_ops
  ...
2024-09-18 07:52:24 +02:00
Linus Torvalds
2f27fce671 sound updates for 6.12-rc1
A fairly big update at this time, both in core and driver sides.
 
 The core received rewrites in PCM buffer allocation handling and
 locking optimizations, PCM rate updates followed by lots of cleanups.
 
 In ASoC side, the legacy Intel drivers have been deprecated by AVS
 drivers which leaded to the significant amount of code reduction.
 SoundWire driver updates and other cleanups contributed more code
 reduction, too.
 
 USB-audio driver received a large cleanup of its big quirk table, and
 the old snd_print*() API usages in many legacy drivers are replaced
 with the standard print API.
 
 Here are some highlights:
 
 Core:
 - More optimized locking in ALSA control code
 - Rewrites of memalloc helpers for better DMA API usage
 - Drop of obsoleted vmalloc PCM buffer helper API
 - Continued MIDI2 UMP updates
 - Support of a new user-space driven timer instance
 - Update for more PCM support rates and cleanups
 - Xrun counter report in the proc files
 
 ASoC:
 - Continued simplification and cleanup works for ASoC
 - Extensive cleanups and refactoring of the Soundwire drivers
 - Removal of Intel machine support obsoleted by the AVS driver
 - Lots of DT schema conversions
 - Machine support for many AMD and Intel x86 platforms
 - Support for AMD ACP 7.1, Mediatek MT6367 and MT8365, Realtek RTL1320
   SoundWire and rev C, and Texas Instruments TAS2563
 
 USB-audio:
 - Add support of multiple control interfaces
 - A large rewrite of quirk table with macros
 - Support for RME Digiface USB
 
 HD-audio:
 - Cleanup of quirk code for Samsung Galaxy laptops
 - Clean up of detection of Cirrus codecs
 - C-Media CM9825 HD-audio codec support
 
 Others:
 - Rewrites to standard print API in a lot of legacy drivers
 -----BEGIN PGP SIGNATURE-----
 
 iQJCBAABCAAsFiEEIXTw5fNLNI7mMiVaLtJE4w1nLE8FAmblvDMOHHRpd2FpQHN1
 c2UuZGUACgkQLtJE4w1nLE823BAAktHgwGbgu+s/U4osgk5M+x1IAzbbRFDEEhuG
 Pck6K1NikgUGXg/x/m6O0/M4CmLcGv7NeebD4ihJJPxdK7fpsEOcIeCiPoWfpumN
 whtrzf6DP6gMxrE/ov4qUydItuCGVNWcEF/bWv7inEcoJ+qtqiRAWLGvpwQurrvn
 NwO+9V/L8NSTWiZVX5ve1+hVVxpLoEQEhRpvMfrVyPXgX0zXgSexka9pwSdb+3xD
 vkIKQ1ju1JD8HG6JLfsIOBQYndrz3KLYWhozzrPKh+hGz3vOkhUPrfhYz5hyoWO9
 Ep95ZHF4ynAIV0pHlsQTH79BmkxmAJKVQImYHOnOWDvL4T6OVpoY6bzIMXzE9IHJ
 p/5JkG422qguoqIEBhM1mkggdXXIjwARFEtqQs+NvUErAd2Pnckl38TSrBtswa1c
 FcEjVq8MfIMFroDIPbEt6UY5K5GLWjwFG8rYFYbbEI4qIMLYSi4pbGtedpGxVZ4P
 eZGbAlAL6cpzXhTh90maA+NXSyeZUl9Tg8aHF48WjkU8LsEi9fHW/YU8JYyMfyQ3
 nYWAZocvXOlIpul8MOPVOg1vXpFKhSVXITKXolQQK1e/C3PirfWsrDxbdF8HduTi
 tfVGPiHprwPw2PE0E7ZqjBO1nRLMGcCqv2Iz69lFisPprDJr75C4voPDK+rjo7We
 YIhyUMU=
 =HLUp
 -----END PGP SIGNATURE-----

Merge tag 'sound-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound updates from Takashi Iwai:
 "A fairly big update at this time, both in core and driver sides.

  The core received rewrites in PCM buffer allocation handling and
  locking optimizations, PCM rate updates followed by lots of cleanups.

  In ASoC side, the legacy Intel drivers have been deprecated by AVS
  drivers which leaded to the significant amount of code reduction.
  SoundWire driver updates and other cleanups contributed more code
  reduction, too.

  USB-audio driver received a large cleanup of its big quirk table, and
  the old snd_print*() API usages in many legacy drivers are replaced
  with the standard print API.

  Here are some highlights:

  Core:
   - More optimized locking in ALSA control code
   - Rewrites of memalloc helpers for better DMA API usage
   - Drop of obsoleted vmalloc PCM buffer helper API
   - Continued MIDI2 UMP updates
   - Support of a new user-space driven timer instance
   - Update for more PCM support rates and cleanups
   - Xrun counter report in the proc files

  ASoC:
   - Continued simplification and cleanup works for ASoC
   - Extensive cleanups and refactoring of the Soundwire drivers
   - Removal of Intel machine support obsoleted by the AVS driver
   - Lots of DT schema conversions
   - Machine support for many AMD and Intel x86 platforms
   - Support for AMD ACP 7.1, Mediatek MT6367 and MT8365, Realtek
     RTL1320 SoundWire and rev C, and Texas Instruments TAS2563

  USB-audio:
   - Add support of multiple control interfaces
   - A large rewrite of quirk table with macros
   - Support for RME Digiface USB

  HD-audio:
   - Cleanup of quirk code for Samsung Galaxy laptops
   - Clean up of detection of Cirrus codecs
   - C-Media CM9825 HD-audio codec support

  Others:
   - Rewrites to standard print API in a lot of legacy drivers"

* tag 'sound-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (410 commits)
  ASoC: topology: Fix redundant logical jump
  ASoC: tas2781: Add Calibration Kcontrols for Chromebook
  ASoC: amd: acp: refactor SoundWire machine driver code
  ASoC: sdw_utils/intel: move soundwire endpoint parsing helper functions
  ASoC: sdw_util/intel: move soundwire endpoint and dai link structures
  ASoC: intel: sof_sdw: rename soundwire parsing helper functions
  ASoC: intel: sof_sdw: rename soundwire endpoint and dailink structures
  ASoC: atmel: mchp-pdmc: Retain Non-Runtime Controls
  ALSA: hda/realtek: Add support for Galaxy Book2 Pro (NP950XEE)
  ASoC: mediatek: mt7986-afe-pcm: Remove redundant error message
  ALSA: memalloc: Use proper DMA mapping API for x86 S/G buffer allocations
  ALSA: memalloc: Use proper DMA mapping API for x86 WC buffer allocations
  ALSA: usb-audio: Add logitech Audio profile quirk
  ASoc: mediatek: mt8365: Remove unneeded assignment
  ASoC: Intel: ARL: Add entry for HDMI-In capture support to non-I2S codec boards.
  ASoC: Intel: sof_rt5682: Add HDMI-In capture with rt5682 support for ARL.
  ASoC: SOF: Intel: hda: remove common_hdmi_codec_drv
  ASoC: Intel: sof_pcm512x: do not check common_hdmi_codec_drv
  ASoC: Intel: ehl_rt5660: do not check common_hdmi_codec_drv
  ASoC: Intel: skl_hda_dsp_generic: use common module for DAI links
  ...
2024-09-17 17:03:43 +02:00
Hongbo Li
318580ad7f
hugetlbfs: support tracepoint
Add basic tracepoints for {alloc, evict, free}_inode, setattr and
fallocate. These can help users to debug hugetlbfs more conveniently.

Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Link: https://lore.kernel.org/r/20240829064110.67884-2-lihongbo22@huawei.com
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-17 11:07:34 +02:00
Linus Torvalds
7a40974fd0 for-6.12-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmbjB00ACgkQxWXV+ddt
 WDsZ/xAAmKAPYvn9P8UZcsG+J0Uxs+OyDJ6SDxkZWw/m3gH1SzRH1+baWNIFvsTO
 O5GcEsjXkjfC+vBOU0Hoje1YsvLXboZLM85NvgcA8FZPAnUkLkUjVD1jYTGN7YPM
 gfkVFUJWdYs6uYTAX8bCIxgAqmABLJ6Jlk9HFpL82WqkUQLd/0vCnUpmRXPdG+2D
 RiL9b8HG9v9vahMLyVhUUdeoubD88/lNqIhbiVCoqqC+6IuFYJlpj2nZEO1N/k5F
 iX2VBMayRXbX0JziryU6fuE0VIr0yEttRUDdjgxL4SwCcrXBQzFaIgV43ygWbW+Y
 QI6nuD+hhOwDq3Dn0zz27LV5d0A5YNAGV/nO042FC28GL+qZeensX9SVKxR3ig99
 tLkHhyQ8q8AdanVk/LskbBjm2LaARJIKDVmgpllVR9o+ur4P23koVsPTz/VTgBKt
 DcMCHNpRrryz3/S4dCcNvN7NCxZkKiPOgVx0Q6hPj/URG9LB72Gu+wrQ2wZAdxkV
 yIbY2EOEiks0fnuRVrG0I0WRJjyWZUSsTpjnC8VXSK+6hdfi0M2Vr+ErFMbU95Ys
 FyGEHfGGP5WBEEE/uxPOMRc9u6KOCfHoAjWrMPyKeJDqWS2gQ4ovOw+JjzpmaqXJ
 qghLDjqjkPs8c1JlnAFp9QEeiYJRX1C3qoiBU5Axt/7J36sYrEM=
 =kDNG
 -----END PGP SIGNATURE-----

Merge tag 'for-6.12-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs updates from David Sterba:
 "This brings mostly refactoring, cleanups, minor performance
  optimizations and usual fixes. The folio API conversions are most
  noticeable.

  There's one less visible change that could have a high impact. The
  extent lock scope for read is reduced, not held for the entire
  operation. In the buffered read case it's left to page or inode lock,
  some direct io read synchronization is still needed.

  This used to prevent deadlocks induced by page faults during direct
  io, so there was a 4K limitation on the requests, e.g. for io_uring.
  In the future this will allow smoother integration with iomap where
  the extent read lock was a major obstacle.

  User visible changes:

   - the FSTRIM ioctl updates the processed range even after an error or
     interruption

   - cleaner thread is woken up in SYNC ioctl instead of waking the
     transaction thread that can take some delay before waking up the
     cleaner, this can speed up cleaning of deleted subvolumes

   - print an error message when opening a device fail, e.g. when it's
     unexpectedly read-only

  Core changes:

   - improved extent map handling in various ways (locking, iteration, ...)

   - new assertions and locking annotations

   - raid-stripe-tree locking fixes

   - use xarray for tracking dirty qgroup extents, switched from rb-tree

   - turn the subpage test to compile-time condition if possible (e.g.
     on x86_64 with 4K pages), this allows to skip a lot of ifs and
     remove dead code

   - more preparatory work for compression in subpage mode

  Cleanups and refactoring

   - folio API conversions, many simple cases where page is passed so
     switch it to folios

   - more subpage code refactoring, update page state bitmap processing

   - introduce auto free for btrfs_path structure, use for the simple
     cases"

* tag 'for-6.12-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (110 commits)
  btrfs: only unlock the to-be-submitted ranges inside a folio
  btrfs: merge btrfs_folio_unlock_writer() into btrfs_folio_end_writer_lock()
  btrfs: BTRFS_PATH_AUTO_FREE in orphan.c
  btrfs: use btrfs_path auto free in zoned.c
  btrfs: DEFINE_FREE for struct btrfs_path
  btrfs: remove btrfs_folio_end_all_writers()
  btrfs: constify more pointer parameters
  btrfs: rework BTRFS_I as macro to preserve parameter const
  btrfs: add and use helper to verify the calling task has locked the inode
  btrfs: always update fstrim_range on failure in FITRIM ioctl
  btrfs: convert copy_inline_to_page() to use folio
  btrfs: convert btrfs_decompress() to take a folio
  btrfs: convert zstd_decompress() to take a folio
  btrfs: convert lzo_decompress() to take a folio
  btrfs: convert zlib_decompress() to take a folio
  btrfs: convert try_release_extent_mapping() to take a folio
  btrfs: convert try_release_extent_state() to take a folio
  btrfs: convert submit_eb_page() to take a folio
  btrfs: convert submit_eb_subpage() to take a folio
  btrfs: convert read_key_bytes() to take a folio
  ...
2024-09-16 13:10:46 +02:00
Linus Torvalds
35219bc5c7 vfs-6.12.netfs
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZuQEvgAKCRCRxhvAZXjc
 onQWAQD6IxAKPU0zom2FoWNilvSzPs7WglTtvddX9pu/lT1RNAD/YC/wOLW8mvAv
 9oTAmigQDQQhEWdJA9RgLZBiw7k+DAw=
 =zWFb
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.12.netfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull netfs updates from Christian Brauner:
 "This contains the work to improve read/write performance for the new
  netfs library.

  The main performance enhancing changes are:

   - Define a structure, struct folio_queue, and a new iterator type,
     ITER_FOLIOQ, to hold a buffer as a replacement for ITER_XARRAY. See
     that patch for questions about naming and form.

     ITER_FOLIOQ is provided as a replacement for ITER_XARRAY. The
     problem with an xarray is that accessing it requires the use of a
     lock (typically the RCU read lock) - and this means that we can't
     supply iterate_and_advance() with a step function that might sleep
     (crypto for example) without having to drop the lock between pages.
     ITER_FOLIOQ is the iterator for a chain of folio_queue structs,
     where each folio_queue holds a small list of folios. A folio_queue
     struct is a simpler structure than xarray and is not subject to
     concurrent manipulation by the VM. folio_queue is used rather than
     a bvec[] as it can form lists of indefinite size, adding to one end
     and removing from the other on the fly.

   - Provide a copy_folio_from_iter() wrapper.

   - Make cifs RDMA support ITER_FOLIOQ.

   - Use folio queues in the write-side helpers instead of xarrays.

   - Add a function to reset the iterator in a subrequest.

   - Simplify the write-side helpers to use sheaves to skip gaps rather
     than trying to work out where gaps are.

   - In afs, make the read subrequests asynchronous, putting them into
     work items to allow the next patch to do progressive
     unlocking/reading.

   - Overhaul the read-side helpers to improve performance.

   - Fix the caching of a partial block at the end of a file.

   - Allow a store to be cancelled.

  Then some changes for cifs to make it use folio queues instead of
  xarrays for crypto bufferage:

   - Use raw iteration functions rather than manually coding iteration
     when hashing data.

   - Switch to using folio_queue for crypto buffers.

   - Remove the xarray bits.

  Make some adjustments to the /proc/fs/netfs/stats file such that:

   - All the netfs stats lines begin 'Netfs:' but change this to
     something a bit more useful.

   - Add a couple of stats counters to track the numbers of skips and
     waits on the per-inode writeback serialisation lock to make it
     easier to check for this as a source of performance loss.

  Miscellaneous work:

   - Ensure that the sb_writers lock is taken around
     vfs_{set,remove}xattr() in the cachefiles code.

   - Reduce the number of conditional branches in netfs_perform_write().

   - Move the CIFS_INO_MODIFIED_ATTR flag to the netfs_inode struct and
     remove cifs_post_modify().

   - Move the max_len/max_nr_segs members from netfs_io_subrequest to
     netfs_io_request as they're only needed for one subreq at a time.

   - Add an 'unknown' source value for tracing purposes.

   - Remove NETFS_COPY_TO_CACHE as it's no longer used.

   - Set the request work function up front at allocation time.

   - Use bh-disabling spinlocks for rreq->lock as cachefiles completion
     may be run from block-filesystem DIO completion in softirq context.

   - Remove fs/netfs/io.c"

* tag 'vfs-6.12.netfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (25 commits)
  docs: filesystems: corrected grammar of netfs page
  cifs: Don't support ITER_XARRAY
  cifs: Switch crypto buffer to use a folio_queue rather than an xarray
  cifs: Use iterate_and_advance*() routines directly for hashing
  netfs: Cancel dirty folios that have no storage destination
  cachefiles, netfs: Fix write to partial block at EOF
  netfs: Remove fs/netfs/io.c
  netfs: Speed up buffered reading
  afs: Make read subreqs async
  netfs: Simplify the writeback code
  netfs: Provide an iterator-reset function
  netfs: Use new folio_queue data type and iterator instead of xarray iter
  cifs: Provide the capability to extract from ITER_FOLIOQ to RDMA SGEs
  iov_iter: Provide copy_folio_from_iter()
  mm: Define struct folio_queue and ITER_FOLIOQ to handle a sequence of folios
  netfs: Use bh-disabling spinlocks for rreq->lock
  netfs: Set the request work function upon allocation
  netfs: Remove NETFS_COPY_TO_CACHE
  netfs: Reserve netfs_sreq_source 0 as unset/unknown
  netfs: Move max_len/max_nr_segs from netfs_io_subrequest to netfs_io_stream
  ...
2024-09-16 12:13:31 +02:00
Linus Torvalds
ee25861f26 vfs-6.12.fallocate
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZuQEwAAKCRCRxhvAZXjc
 omD7AQCZuWPXkEGYFD37MJZuRXNEoq7Tuj6yd0O2b5khUpzvyAD+MPuthGiCMPsu
 voPpUP83x7T0D3JsEsCAXtNeVRcIBQI=
 =xTs6
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.12.fallocate' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs fallocate updates from Christian Brauner:
 "This contains work to try and cleanup some the fallocate mode
  handling. Currently, it confusingly mixes operation modes and an
  optional flag.

  The work here tries to better define operation modes and optional
  flags allowing the core and filesystem code to use switch statements
  to switch on the operation mode"

* tag 'vfs-6.12.fallocate' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  xfs: refactor xfs_file_fallocate
  xfs: move the xfs_is_always_cow_inode check into xfs_alloc_file_space
  xfs: call xfs_flush_unmap_range from xfs_free_file_space
  fs: sort out the fallocate mode vs flag mess
  ext4: remove tracing for FALLOC_FL_NO_HIDE_STALE
  block: remove checks for FALLOC_FL_NO_HIDE_STALE
2024-09-16 09:34:08 +02:00
Linus Torvalds
8f72c31f45 vfs-6.12.misc
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZuQEGwAKCRCRxhvAZXjc
 ojIuAQC433+hBkvjvmQ7H0r5rgZSjUuCTG3bSmdU7RJmPHUHhwEA85v/NGq53f+W
 IhandK6t+Cf0JYpFZ3N0bT88hDYVhQQ=
 =9zGL
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.12.misc' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs

Pull misc vfs updates from Christian Brauner:
 "This contains the usual pile of misc updates:

  Features:

   - Add F_CREATED_QUERY fcntl() that allows userspace to query whether
     a file was actually created. Often userspace wants to know whether
     an O_CREATE request did actually create a file without using
     O_EXCL. The current logic is that to first attempts to open the
     file without O_CREAT | O_EXCL and if ENOENT is returned userspace
     tries again with both flags. If that succeeds all is well. If it
     now reports EEXIST it retries.

     That works fairly well but some corner cases make this more
     involved. If this operates on a dangling symlink the first openat()
     without O_CREAT | O_EXCL will return ENOENT but the second openat()
     with O_CREAT | O_EXCL will fail with EEXIST.

     The reason is that openat() without O_CREAT | O_EXCL follows the
     symlink while O_CREAT | O_EXCL doesn't for security reasons. So
     it's not something we can really change unless we add an explicit
     opt-in via O_FOLLOW which seems really ugly.

     All available workarounds are really nasty (fanotify, bpf lsm etc)
     so add a simple fcntl().

   - Try an opportunistic lookup for O_CREAT. Today, when opening a file
     we'll typically do a fast lookup, but if O_CREAT is set, the kernel
     always takes the exclusive inode lock. This was likely done with
     the expectation that O_CREAT means that we always expect to do the
     create, but that's often not the case. Many programs set O_CREAT
     even in scenarios where the file already exists (see related
     F_CREATED_QUERY patch motivation above).

     The series contained in the pr rearranges the pathwalk-for-open
     code to also attempt a fast_lookup in certain O_CREAT cases. If a
     positive dentry is found, the inode_lock can be avoided altogether
     and it can stay in rcuwalk mode for the last step_into.

   - Expose the 64 bit mount id via name_to_handle_at()

     Now that we provide a unique 64-bit mount ID interface in statx(2),
     we can now provide a race-free way for name_to_handle_at(2) to
     provide a file handle and corresponding mount without needing to
     worry about racing with /proc/mountinfo parsing or having to open a
     file just to do statx(2).

     While this is not necessary if you are using AT_EMPTY_PATH and
     don't care about an extra statx(2) call, users that pass full paths
     into name_to_handle_at(2) need to know which mount the file handle
     comes from (to make sure they don't try to open_by_handle_at a file
     handle from a different filesystem) and switching to AT_EMPTY_PATH
     would require allocating a file for every name_to_handle_at(2) call

   - Add a per dentry expire timeout to autofs

     There are two fairly well known automounter map formats, the autofs
     format and the amd format (more or less System V and Berkley).

     Some time ago Linux autofs added an amd map format parser that
     implemented a fair amount of the amd functionality. This was done
     within the autofs infrastructure and some functionality wasn't
     implemented because it either didn't make sense or required extra
     kernel changes. The idea was to restrict changes to be within the
     existing autofs functionality as much as possible and leave changes
     with a wider scope to be considered later.

     One of these changes is implementing the amd options:
      1) "unmount", expire this mount according to a timeout (same as
         the current autofs default).
      2) "nounmount", don't expire this mount (same as setting the
         autofs timeout to 0 except only for this specific mount) .
      3) "utimeout=<seconds>", expire this mount using the specified
         timeout (again same as setting the autofs timeout but only for
         this mount)

     To implement these options per-dentry expire timeouts need to be
     implemented for autofs indirect mounts. This is because all map
     keys (mounts) for autofs indirect mounts use an expire timeout
     stored in the autofs mount super block info. structure and all
     indirect mounts use the same expire timeout.

  Fixes:

   - Fix missing fput for FSCONFIG_SET_FD in autofs

   - Use param->file for FSCONFIG_SET_FD in coda

   - Delete the 'fs/netfs' proc subtreee when netfs module exits

   - Make sure that struct uid_gid_map fits into a single cacheline

   - Don't flush in-flight wb switches for superblocks without cgroup
     writeback

   - Correcting the idmapping mount example in the idmapping
     documentation

   - Fix a race between evice_inodes() and find_inode() and iput()

   - Refine the show_inode_state() macro definition in writeback code

   - Prevent dump_mapping() from accessing invalid dentry.d_name.name

   - Show actual source for debugfs in /proc/mounts

   - Annotate data-race of busy_poll_usecs in eventpoll

   - Don't WARN for racy path_noexec check in exec code

   - Handle OOM on mnt_warn_timestamp_expiry()

   - Fix some spelling in the iomap design documentation

   - Fix typo in procfs comment

   - Fix typo in fs/namespace.c comment

  Cleanups:

   - Add the VFS git tree to the MAINTAINERS file

   - Move FMODE_UNSIGNED_OFFSET to fop_flags freeing up another f_mode
     bit in struct file bringing us to 5 free f_mode bits

   - Remove the __I_DIO_WAKEUP bit from i_state flags as we can simplify
     the wait mechanism

   - Remove the unused path_put_init() helper

   - Replace a __u32 with u32 for s_fsnotify_mask as __u32 is uapi
     specific

   - Replace the unsigned long i_state member with a u32 i_state member
     in struct inode freeing up 4 bytes in struct inode. Instead of
     using the bit based wait apis we're now using the var event apis
     and using the individual bytes of the i_state member to wait on
     state changes

   - Explain how per-syscall AT_* flags should be allocated

   - Use in_group_or_capable() helper to simplify the posix acl mode
     update code

   - Switch to LIST_HEAD() in fsync_buffers_list() to simplify the code

   - Removed comment about d_rcu_to_refcount() as that function doesn't
     exist anymore

   - Add kernel documentation for lookup_fast()

   - Don't re-zero evenpoll fields

   - Remove outdated comment after close_fd()

   - Fix imprecise wording in comment about the pipe filesystem

   - Drop GFP_NOFAIL mode from alloc_page_buffers

   - Missing blank line warnings and struct declaration improved in
     file_table

   - Annotate struct poll_list with __counted_by()

   - Remove the unused read parameter in percpu-rwsem

   - Remove linux/prefetch.h include from direct-io code

   - Use kmemdup_array instead of kmemdup for multiple allocation in
     mnt_idmapping code

   - Remove unused mnt_cursor_del() declaration

  Performance tweaks:

   - Dodge smp_mb in break_lease and break_deleg in the common case

   - Only read fops once in fops_{get,put}()

   - Use RCU in ilookup()

   - Elide smp_mb in iversion handling in the common case

   - Drop one lock trip in evict()"

* tag 'vfs-6.12.misc' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: (58 commits)
  uidgid: make sure we fit into one cacheline
  proc: Fix typo in the comment
  fs/pipe: Correct imprecise wording in comment
  fhandle: expose u64 mount id to name_to_handle_at(2)
  uapi: explain how per-syscall AT_* flags should be allocated
  fs: drop GFP_NOFAIL mode from alloc_page_buffers
  writeback: Refine the show_inode_state() macro definition
  fs/inode: Prevent dump_mapping() accessing invalid dentry.d_name.name
  mnt_idmapping: Use kmemdup_array instead of kmemdup for multiple allocation
  netfs: Delete subtree of 'fs/netfs' when netfs module exits
  fs: use LIST_HEAD() to simplify code
  inode: make i_state a u32
  inode: port __I_LRU_ISOLATING to var event
  vfs: fix race between evice_inodes() and find_inode()&iput()
  inode: port __I_NEW to var event
  inode: port __I_SYNC to var event
  fs: reorder i_state bits
  fs: add i_state helpers
  MAINTAINERS: add the VFS git tree
  fs: s/__u32/u32/ for s_fsnotify_mask
  ...
2024-09-16 08:35:09 +02:00
Christophe Leroy
d175ee98fe mm: Define VM_DROPPABLE for powerpc/32
Commit 9651fcedf7 ("mm: add MAP_DROPPABLE for designating always
lazily freeable mappings") only adds VM_DROPPABLE for 64 bits
architectures.

In order to also use the getrandom vDSO implementation on powerpc/32,
use VM_ARCH_1 for VM_DROPPABLE on powerpc/32. This is possible because
VM_ARCH_1 is used for VM_SAO on powerpc and VM_SAO is only for
powerpc/64. It is used in combination with PROT_SAO in some parts of
code that are restricted to CONFIG_PPC64 through #ifdefs, it is
therefore possible to define VM_SAO for CONFIG_PPC64 only.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2024-09-13 17:28:36 +02:00
Jakub Kicinski
3b7dc7000e bpf-next-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZuH9UQAKCRDbK58LschI
 g0/zAP99WOcCBp1M/jSTUOba230+eiol7l5RirDEA6wu7TqY2QEAuvMG0KfCCpTI
 I0WqStrK1QMbhwKPodJC1k+17jArKgw=
 =jfMU
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2024-09-11

We've added 12 non-merge commits during the last 16 day(s) which contain
a total of 20 files changed, 228 insertions(+), 30 deletions(-).

There's a minor merge conflict in drivers/net/netkit.c:
  00d066a4d4 ("netdev_features: convert NETIF_F_LLTX to dev->lltx")
  d966087948 ("netkit: Disable netpoll support")

The main changes are:

1) Enable bpf_dynptr_from_skb for tp_btf such that this can be used
   to easily parse skbs in BPF programs attached to tracepoints,
   from Philo Lu.

2) Add a cond_resched() point in BPF's sock_hash_free() as there have
   been several syzbot soft lockup reports recently, from Eric Dumazet.

3) Fix xsk_buff_can_alloc() to account for queue_empty_descs which
   got noticed when zero copy ice driver started to use it,
   from Maciej Fijalkowski.

4) Move the xdp:xdp_cpumap_kthread tracepoint before cpumap pushes skbs
   up via netif_receive_skb_list() to better measure latencies,
   from Daniel Xu.

5) Follow-up to disable netpoll support from netkit, from Daniel Borkmann.

6) Improve xsk selftests to not assume a fixed MAX_SKB_FRAGS of 17 but
   instead gather the actual value via /proc/sys/net/core/max_skb_frags,
   also from Maciej Fijalkowski.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
  sock_map: Add a cond_resched() in sock_hash_free()
  selftests/bpf: Expand skb dynptr selftests for tp_btf
  bpf: Allow bpf_dynptr_from_skb() for tp_btf
  tcp: Use skb__nullable in trace_tcp_send_reset
  selftests/bpf: Add test for __nullable suffix in tp_btf
  bpf: Support __nullable argument suffix for tp_btf
  bpf, cpumap: Move xdp:xdp_cpumap_kthread tracepoint before rcv
  selftests/xsk: Read current MAX_SKB_FRAGS from sysctl knob
  xsk: Bump xsk_queue::queue_empty_descs in xp_can_alloc()
  tcp_bpf: Remove an unused parameter for bpf_tcp_ingress()
  bpf, sockmap: Correct spelling skmsg.c
  netkit: Disable netpoll support

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
====================

Link: https://patch.msgid.link/20240911211525.13834-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-12 20:22:44 -07:00
Takashi Sakamoto
f1cba5212e firewire: core: rename cause flag of tracepoints event
The flag of FW_ISO_CONTEXT_COMPLETIONS_CAUSE_IRQ directly causes hardIRQ
request by 1394 OHCI hardware when the corresponding isochronous packet is
transferred, however it is not so directly associated to hardIRQ
processing itself.

This commit renames the flag so that it relates to interrupt parameter of
internal packet data.

Link: https://lore.kernel.org/r/20240912133038.238786-6-o-takashi@sakamocchi.jp
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-09-12 22:30:38 +09:00
David Howells
8f246b7c0a
netfs: Cancel dirty folios that have no storage destination
Kafs wants to be able to cache the contents of directories (and symlinks),
but whilst these are downloaded from the server with the FS.FetchData RPC
op and similar, the same as for regular files, they can't be updated by
FS.StoreData, but rather have special operations (FS.MakeDir, etc.).

Now, rather than redownloading a directory's content after each change made
to that directory, kafs modifies the local blob.  This blob can be saved
out to the cache, and since it's using netfslib, kafs just marks the folios
dirty and lets ->writepages() on the directory take care of it, as for an
regular file.

This is fine as long as there's a cache as although the upload stream is
disabled, there's a cache stream to drive the procedure.  But if the cache
goes away in the meantime, suddenly there's no way do any writes and the
code gets confused, complains "R=%x: No submit" to dmesg and leaves the
dirty folio hanging.

Fix this by just cancelling the store of the folio if neither stream is
active.  (If there's no cache at the time of dirtying, we should just not
mark the folio dirty).

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20240814203850.2240469-23-dhowells@redhat.com/ # v2
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-12 12:20:41 +02:00
David Howells
ee4cdf7ba8
netfs: Speed up buffered reading
Improve the efficiency of buffered reads in a number of ways:

 (1) Overhaul the algorithm in general so that it's a lot more compact and
     split the read submission code between buffered and unbuffered
     versions.  The unbuffered version can be vastly simplified.

 (2) Read-result collection is handed off to a work queue rather than being
     done in the I/O thread.  Multiple subrequests can be processes
     simultaneously.

 (3) When a subrequest is collected, any folios it fully spans are
     collected and "spare" data on either side is donated to either the
     previous or the next subrequest in the sequence.

Notes:

 (*) Readahead expansion is massively slows down fio, presumably because it
     causes a load of extra allocations, both folio and xarray, up front
     before RPC requests can be transmitted.

 (*) RDMA with cifs does appear to work, both with SIW and RXE.

 (*) PG_private_2-based reading and copy-to-cache is split out into its own
     file and altered to use folio_queue.  Note that the copy to the cache
     now creates a new write transaction against the cache and adds the
     folios to be copied into it.  This allows it to use part of the
     writeback I/O code.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20240814203850.2240469-20-dhowells@redhat.com/ # v2
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-12 12:20:41 +02:00
David Howells
983cdcf8fe
netfs: Simplify the writeback code
Use the new folio_queue structures to simplify the writeback code.  The
problem with referring to the i_pages xarray directly is that we may have
gaps in the sequence of folios we're writing from that we need to skip when
we're removing the writeback mark from the folios we're writing back from.

At the moment the code tries to deal with this by carefully tracking the
gaps in each writeback stream (eg. write to server and write to cache) and
divining when there's a gap that spans folios (something that's not helped
by folios not being a consistent size).

Instead, the folio_queue buffer contains pointers only the folios we're
dealing with, has them in ascending order and indicates a gap by placing
non-consequitive folios next to each other.  This makes it possible to
track where we need to clean up to by just keeping track of where we've
processed to on each stream and taking the minimum.

Note that the I/O iterator is always rounded up to the end of the folio,
even if that is beyond the EOF position, so that the cache can do DIO from
the page.  The excess space is cleared, though mmapped writes clobber it.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20240814203850.2240469-18-dhowells@redhat.com/ # v2
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-12 12:20:40 +02:00
David Howells
cd0277ed0c
netfs: Use new folio_queue data type and iterator instead of xarray iter
Make the netfs write-side routines use the new folio_queue struct to hold a
rolling buffer of folios, with the issuer adding folios at the tail and the
collector removing them from the head as they're processed instead of using
an xarray.

This will allow a subsequent patch to simplify the write collector.

The primary mark (as tested by folioq_is_marked()) is used to note if the
corresponding folio needs putting.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20240814203850.2240469-16-dhowells@redhat.com/ # v2
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-12 12:20:40 +02:00
Mina Almasry
8ab79ed50c page_pool: devmem support
Convert netmem to be a union of struct page and struct netmem. Overload
the LSB of struct netmem* to indicate that it's a net_iov, otherwise
it's a page.

Currently these entries in struct page are rented by the page_pool and
used exclusively by the net stack:

struct {
	unsigned long pp_magic;
	struct page_pool *pp;
	unsigned long _pp_mapping_pad;
	unsigned long dma_addr;
	atomic_long_t pp_ref_count;
};

Mirror these (and only these) entries into struct net_iov and implement
netmem helpers that can access these common fields regardless of
whether the underlying type is page or net_iov.

Implement checks for net_iov in netmem helpers which delegate to mm
APIs, to ensure net_iov are never passed to the mm stack.

Signed-off-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20240910171458.219195-6-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-11 20:44:31 -07:00
Philo Lu
edd3f6f758 tcp: Use skb__nullable in trace_tcp_send_reset
Replace skb with skb__nullable as the argument name. The suffix tells
bpf verifier through btf that the arg could be NULL and should be
checked in tp_btf prog.

For now, this is the only nullable argument in tcp tracepoints.

Signed-off-by: Philo Lu <lulie@linux.alibaba.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20240911033719.91468-4-lulie@linux.alibaba.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-09-11 08:56:42 -07:00
David Sterba
ca283ea992 btrfs: constify more pointer parameters
Continue adding const to parameters.  This is for clarity and minor
addition to safety. There are some minor effects, in the assembly code
and .ko measured on release config.

Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10 16:51:22 +02:00
David Sterba
06de42c5a9 btrfs: rename __extent_writepage() and drop double underscores
The function does not follow the pattern where the underscores would be
justified, so rename it.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10 16:51:19 +02:00
Josef Bacik
9e97e8b277 btrfs: update the writepage tracepoint to take a folio
Willy is wanting to get rid of page->index, convert the writepage
tracepoint to take a folio so we can do folio->index instead of
page->index.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10 16:51:13 +02:00
Sean Anderson
038eb433dc dma-mapping: add tracing for dma-mapping API calls
When debugging drivers, it can often be useful to trace when memory gets
(un)mapped for DMA (and can be accessed by the device). Add some
tracepoints for this purpose.

Use u64 instead of phys_addr_t and dma_addr_t (and similarly %llx instead
of %pa) because libtraceevent can't handle typedefs in all cases.

Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2024-09-10 07:47:19 +03:00
Hans de Goede
56d8b784c5 hwmon fixes for v6.11-rc7
hp-wmi-sensors: Check if WMI event data exists before accessing it
 
 ltc2991: fix register bits defines
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEiHPvMQj9QTOCiqgVyx8mb86fmYEFAmbYrGEACgkQyx8mb86f
 mYERyw//V+MQSPf+wmRcM7gU0+u+Ylbl98wbyaO6+YyTiq2XQJgbrk2AnNJFxqu9
 tkv2Xb3xFu1EB1V6Xz5kADzaP6Wp9W+4TK+PrlN0GCa6FZZwbGZBuhIkLS1492ex
 Zf8oRUNNiRnIAHb+5+E0aonDbsHSWe7Ff5WVTRfKBn8rrQW8pchPx4h/Y+o54Llo
 BEOQTA0rN51eg0RL7e9ECFUmBLRNuZA6E1w25F15VA50O4Xx2i6lL7PwPDb7KKpa
 g6fy2JTtr7+hKOiJMR4KdodsOZHvW4RgZ0MvWQkOZsccHw8CLp8c4egIxPYQ41Zu
 Ef4imKHPDVsiSXHW6rjQro0IQWBvRu1/6R1ibbDlXTWN5dkxGaByb0tlrRo3QBQn
 CvoHqkQkhioaX2k1I7bBDbyz6AregHLBJ7kpLq4ClcfW4QmxJ9zPPSDekyIyQiTZ
 vU/OljM/Je6CIbzspcaKS8uhNdykw3FX/LWLVwUEFJawJHV8b7XD+S3aJq7zQqEl
 R0hIeDyWiakP2ca7gozey1RjxpFDNQhCeZUqEaTlLhHlDykvZAceEXNuoczrpKNq
 UaBincY64PVBmzLPpjpDVg7C7C023xoqrWKSSd9kFsl9KzbKTH0XzkVPtD9cjQC5
 d34++QqhffAtJuekIbrWmtfhZsvkgQrfRv1MowNsbce0Lz7dRGE=
 =sSGh
 -----END PGP SIGNATURE-----

Merge tag 'hwmon-for-v6.11-rc7' into review-hans

Merge "hwmon fixes for v6.11-rc7" into review-hans to bring in
commit a54da9df75 ("hwmon: (hp-wmi-sensors) Check if WMI event
data exists").

This is a dependency for a set of WMI event data refactoring changes.
2024-09-05 16:57:36 +02:00
Uwe Kleine-König
1c3e34bf88 pwm: Make info in traces about affected pwm more useful
The hashed pointer isn't useful to identify the pwm device. Instead
store and emit chipid and hwpwm.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
Link: https://lore.kernel.org/r/20240705211452.1157967-2-u.kleine-koenig@baylibre.com
Signed-off-by: Uwe Kleine-König <ukleinek@kernel.org>
2024-09-05 11:14:14 +02:00
David Howells
c57de2a925
netfs: Remove NETFS_COPY_TO_CACHE
Remove NETFS_COPY_TO_CACHE as it isn't used anymore.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20240814203850.2240469-10-dhowells@redhat.com/ # v2
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-05 11:00:42 +02:00
David Howells
51e9a86a4f
netfs: Reserve netfs_sreq_source 0 as unset/unknown
Reserve the 0-valued netfs_sreq_source to mean unset or unknown so that it
can be seen in the trace as such rather than appearing as
download-from-server when it's going to get switched to something else.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20240814203850.2240469-9-dhowells@redhat.com/ # v2
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-05 11:00:41 +02:00
David Howells
8f52de0077
netfs: Reduce number of conditional branches in netfs_perform_write()
Reduce the number of conditional branches in netfs_perform_write() by
merging in netfs_how_to_modify() and then creating a separate if-statement
for each way we might modify a folio.  Note that this means replicating the
data copy in each path.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20240814203850.2240469-6-dhowells@redhat.com/ # v2
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-05 11:00:41 +02:00
Tejun Heo
649e980dad Merge branch 'bpf/master' into for-6.12
Pull bpf/master to receive baebe9aaba ("bpf: allow passing struct
bpf_iter_<type> as kfunc arguments") and related changes in preparation for
the DSQ iterator patchset.

Signed-off-by: Tejun Heo <tj@kernel.org>
2024-09-04 11:41:32 -10:00
Matthew Wilcox (Oracle)
7a87225ae2 x86: remove PG_uncached
Convert x86 to use PG_arch_2 instead of PG_uncached and remove
PG_uncached.

Link: https://lkml.kernel.org/r/20240821193445.2294269-11-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-03 21:15:46 -07:00
Matthew Wilcox (Oracle)
02e1960aaf mm: rename PG_mappedtodisk to PG_owner_2
This flag has similar constraints to PG_owner_priv_1 -- it is ignored by
core code, and is entirely for the use of the code which allocated the
folio.  Since the pagecache does not use it, individual filesystems can
use it.  The bufferhead code does use it, so filesystems which use the
buffer cache must not use it for another purpose.

Link: https://lkml.kernel.org/r/20240821193445.2294269-10-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-03 21:15:45 -07:00
Matthew Wilcox (Oracle)
e27ad6560e printf: remove %pGt support
Patch series "Increase the number of bits available in page_type".

Kent wants more than 16 bits in page_type, so I resurrected this old patch
and expanded it a bit.  It's a bit more efficient than our current scheme
(1 4-byte insn vs 3 insns of 13 bytes total) to test a single page type.


This patch (of 4):

An upcoming patch will convert page type from being a bitfield to a
single byte, so we will not be able to use %pG to print the page type
any more.  The printing of the symbolic name will be restored in that
patch.

Link: https://lkml.kernel.org/r/20240821173914.2270383-1-willy@infradead.org
Link: https://lkml.kernel.org/r/20240821173914.2270383-2-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-03 21:15:42 -07:00
Takaya Saeki
b6273b55d8 filemap: add trace events for get_pages, map_pages, and fault
To allow precise tracking of page caches accessed, add new tracepoints
that trigger when a process actually accesses them.

The ureadahead program used by ChromeOS traces the disk access of programs
as they start up at boot up.  It uses mincore(2) or the
'mm_filemap_add_to_page_cache' trace event to accomplish this.  It stores
this information in a "pack" file and on subsequent boots, it will read
the pack file and call readahead(2) on the information so that disk
storage can be loaded into RAM before the applications actually need it.

A problem we see is that due to the kernel's readahead algorithm that can
aggressively pull in more data than needed (to try and accomplish the same
goal) and this data is also recorded.  The end result is that the pack
file contains a lot of pages on disk that are never actually used. 
Calling readahead(2) on these unused pages can slow down the system boot
up times.

To solve this, add 3 new trace events, get_pages, map_pages, and fault. 
These will be used to trace the pages are not only pulled in from disk,
but are actually used by the application.  Only those pages will be stored
in the pack file, and this helps out the performance of boot up.

With the combination of these 3 new trace events and
mm_filemap_add_to_page_cache, we observed a reduction in the pack file by
7.3% - 20% on ChromeOS varying by device.

Link: https://lkml.kernel.org/r/20240813100312.3930505-1-takayas@chromium.org
Signed-off-by: Takaya Saeki <takayas@chromium.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Cc: Junichi Uekawa <uekawa@chromium.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01 20:26:10 -07:00
Matthew Wilcox (Oracle)
09022bc196 mm: remove PG_error
The PG_error bit is now unused; delete it and free up a bit in
page->flags.

Link: https://lkml.kernel.org/r/20240807193528.1865100-2-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01 20:26:05 -07:00
Wei Yang
29943248af mm: improve code consistency with zonelist_* helper functions
Replace direct access to zoneref->zone, zoneref->zone_idx, or
zone_to_nid(zoneref->zone) with the corresponding zonelist_* helper
functions for consistency.

No functional change.

Link: https://lkml.kernel.org/r/20240729091717.464-1-shivankg@amd.com
Co-developed-by: Shivank Garg <shivankg@amd.com>
Signed-off-by: Shivank Garg <shivankg@amd.com>
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01 20:25:55 -07:00
Chen Hanxiao
cef48236df NFS: trace: show TIMEDOUT instead of 0x6e
__nfs_revalidate_inode may return ETIMEDOUT.

print symbol of ETIMEDOUT in nfs trace:

before:
cat-5191 [005] 119.331127: nfs_revalidate_inode_exit: error=-110 (0x6e)

after:
cat-1738 [004] 44.365509: nfs_revalidate_inode_exit: error=-110 (TIMEDOUT)

Signed-off-by: Chen Hanxiao <chenhx.fnst@fujitsu.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-01 10:04:55 -04:00
Julian Sun
459ca85ae1 writeback: Refine the show_inode_state() macro definition
Currently, the show_inode_state() macro only prints
part of the state of inode->i_state. Let’s improve it
to display more of its state.

Signed-off-by: Julian Sun <sunjunchao2870@gmail.com>
Link: https://lore.kernel.org/r/20240828081359.62429-1-sunjunchao2870@gmail.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-30 08:22:41 +02:00
Christoph Hellwig
f951171044 ext4: remove tracing for FALLOC_FL_NO_HIDE_STALE
FALLOC_FL_NO_HIDE_STALE can't make it past vfs_fallocate (and if the
flag does what the name implies that's a good thing as it would be
highly dangerous).  Remove the dead tracing code for it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20240827065123.1762168-3-hch@lst.de
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-28 16:53:57 +02:00
Avri Altman
89835a58f5 scsi: ufs: Move UFS trace events to private header
UFS trace events are called exclusively from the UFS core drivers.  Make
those events private to the core driver.

The MAINTAINERS file does not need updating as the maintainership remains
the same and the relevant directory is already covered.

Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Avri Altman <avri.altman@wdc.com>
Link: https://lore.kernel.org/r/20240821055411.3128159-1-avri.altman@wdc.com
Acked-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-08-22 20:54:46 -04:00
Chuck Lever
dc0112e6d8 rpcrdma: Trace connection registration and unregistration
These new trace points record xarray indices and the time of
endpoint registration and unregistration, to co-ordinate with
device removal events.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2024-08-19 11:50:41 -04:00
Takashi Iwai
41776e4008 Merge branch 'topic/seq-filter-cleanup' into for-next
Pull ALSA sequencer cleanup.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
2024-08-19 10:48:46 +02:00
Valentin Schneider
4f336dc07e context_tracking, rcu: Rename rcu_dyntick trace event into rcu_watching
The "rcu_dyntick" naming convention has been turned into "rcu_watching" for
all helpers now, align the trace event to that.

To add to the confusion, the strings passed to the trace event are now
reversed: when RCU "starts" the dyntick / EQS state, it "stops" watching.

Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
2024-08-15 21:30:43 +05:30
Chao Yu
aaf8c0b9ae f2fs: reduce expensive checkpoint trigger frequency
We may trigger high frequent checkpoint for below case:
1. mkdir /mnt/dir1; set dir1 encrypted
2. touch /mnt/file1; fsync /mnt/file1
3. mkdir /mnt/dir2; set dir2 encrypted
4. touch /mnt/file2; fsync /mnt/file2
...

Although, newly created dir and file are not related, due to
commit bbf156f7af ("f2fs: fix lost xattrs of directories"), we will
trigger checkpoint whenever fsync() comes after a new encrypted dir
created.

In order to avoid such performance regression issue, let's record an
entry including directory's ino in global cache whenever we update
directory's xattr data, and then triggerring checkpoint() only if
xattr metadata of target file's parent was updated.

This patch updates to cover below no encryption case as well:
1) parent is checkpointed
2) set_xattr(dir) w/ new xnid
3) create(file)
4) fsync(file)

Fixes: bbf156f7af ("f2fs: fix lost xattrs of directories")
Reported-by: wangzijie <wangzijie1@honor.com>
Reported-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Tested-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Reported-by: Yunlei He <heyunlei@hihonor.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-08-15 15:26:39 +00:00
Linus Torvalds
4ac0f08f44 vfs-6.11-rc4.fixes
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZrym4AAKCRCRxhvAZXjc
 oqT3AP9ydoUNavaZcRayH8r3ybvz9+aJGJ6Q7NznFVCk71vn0gD/buLzmq96Muns
 M5DWHbft2AFwK0Rz2nx8j5OXUeHwrQg=
 =HZBL
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.11-rc4.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs fixes from Christian Brauner:
 "VFS:

   - Fix the name of file lease slab cache. When file leases were split
     out of file locks the name of the file lock slab cache was used for
     the file leases slab cache as well.

   - Fix a type in take_fd() helper.

   - Fix infinite directory iteration for stable offsets in tmpfs.

   - When the icache is pruned all reclaimable inodes are marked with
     I_FREEING and other processes that try to lookup such inodes will
     block.

     But some filesystems like ext4 can trigger lookups in their inode
     evict callback causing deadlocks. Ext4 does such lookups if the
     ea_inode feature is used whereby a separate inode may be used to
     store xattrs.

     Introduce I_LRU_ISOLATING which pins the inode while its pages are
     reclaimed. This avoids inode deletion during inode_lru_isolate()
     avoiding the deadlock and evict is made to wait until
     I_LRU_ISOLATING is done.

  netfs:

   - Fault in smaller chunks for non-large folio mappings for
     filesystems that haven't been converted to large folios yet.

   - Fix the CONFIG_NETFS_DEBUG config option. The config option was
     renamed a short while ago and that introduced two minor issues.
     First, it depended on CONFIG_NETFS whereas it wants to depend on
     CONFIG_NETFS_SUPPORT. The former doesn't exist, while the latter
     does. Second, the documentation for the config option wasn't fixed
     up.

   - Revert the removal of the PG_private_2 writeback flag as ceph is
     using it and fix how that flag is handled in netfs.

   - Fix DIO reads on 9p. A program watching a file on a 9p mount
     wouldn't see any changes in the size of the file being exported by
     the server if the file was changed directly in the source
     filesystem. Fix this by attempting to read the full size specified
     when a DIO read is requested.

   - Fix a NULL pointer dereference bug due to a data race where a
     cachefiles cookies was retired even though it was still in use.
     Check the cookie's n_accesses counter before discarding it.

  nsfs:

   - Fix ioctl declaration for NS_GET_MNTNS_ID from _IO() to _IOR() as
     the kernel is writing to userspace.

  pidfs:

   - Prevent the creation of pidfds for kthreads until we have a
     use-case for it and we know the semantics we want. It also confuses
     userspace why they can get pidfds for kthreads.

  squashfs:

   - Fix an unitialized value bug reported by KMSAN caused by a
     corrupted symbolic link size read from disk. Check that the
     symbolic link size is not larger than expected"

* tag 'vfs-6.11-rc4.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  Squashfs: sanity check symbolic link size
  9p: Fix DIO read through netfs
  vfs: Don't evict inode under the inode lru traversing context
  netfs: Fix handling of USE_PGPRIV2 and WRITE_TO_CACHE flags
  netfs, ceph: Revert "netfs: Remove deprecated use of PG_private_2 as a second writeback flag"
  file: fix typo in take_fd() comment
  pidfd: prevent creation of pidfds for kthreads
  netfs: clean up after renaming FSCACHE_DEBUG config
  libfs: fix infinite directory reads for offset dir
  nsfs: fix ioctl declaration
  fs/netfs/fscache_cookie: add missing "n_accesses" check
  filelock: fix name of file_lease slab cache
  netfs: Fault in smaller chunks for non-large folio mappings
2024-08-14 09:06:28 -07:00
David Howells
7b589a9b45
netfs: Fix handling of USE_PGPRIV2 and WRITE_TO_CACHE flags
The NETFS_RREQ_USE_PGPRIV2 and NETFS_RREQ_WRITE_TO_CACHE flags aren't used
correctly.  The problem is that we try to set them up in the request
initialisation, but we the cache may be in the process of setting up still,
and so the state may not be correct.  Further, we secondarily sample the
cache state and make contradictory decisions later.

The issue arises because we set up the cache resources, which allows the
cache's ->prepare_read() to switch on NETFS_SREQ_COPY_TO_CACHE - which
triggers cache writing even if we didn't set the flags when allocating.

Fix this in the following way:

 (1) Drop NETFS_ICTX_USE_PGPRIV2 and instead set NETFS_RREQ_USE_PGPRIV2 in
     ->init_request() rather than trying to juggle that in
     netfs_alloc_request().

 (2) Repurpose NETFS_RREQ_USE_PGPRIV2 to merely indicate that if caching is
     to be done, then PG_private_2 is to be used rather than only setting
     it if we decide to cache and then having netfs_rreq_unlock_folios()
     set the non-PG_private_2 writeback-to-cache if it wasn't set.

 (3) Split netfs_rreq_unlock_folios() into two functions, one of which
     contains the deprecated code for using PG_private_2 to avoid
     accidentally doing the writeback path - and always use it if
     USE_PGPRIV2 is set.

 (4) As NETFS_ICTX_USE_PGPRIV2 is removed, make netfs_write_begin() always
     wait for PG_private_2.  This function is deprecated and only used by
     ceph anyway, and so label it so.

 (5) Drop the NETFS_RREQ_WRITE_TO_CACHE flag and use
     fscache_operation_valid() on the cache_resources instead.  This has
     the advantage of picking up the result of netfs_begin_cache_read() and
     fscache_begin_write_operation() - which are called after the object is
     initialised and will wait for the cache to come to a usable state.

Just reverting ae678317b95e[1] isn't a sufficient fix, so this need to be
applied on top of that.  Without this as well, things like:

 rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: {

and:

 WARNING: CPU: 13 PID: 3621 at fs/ceph/caps.c:3386

may happen, along with some UAFs due to PG_private_2 not getting used to
wait on writeback completion.

Fixes: 2ff1e97587 ("netfs: Replace PG_fscache by setting folio->private and marking dirty")
Reported-by: Max Kellermann <max.kellermann@ionos.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Ilya Dryomov <idryomov@gmail.com>
cc: Xiubo Li <xiubli@redhat.com>
cc: Hristo Venev <hristo@venev.name>
cc: Jeff Layton <jlayton@kernel.org>
cc: Matthew Wilcox <willy@infradead.org>
cc: ceph-devel@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
Link: https://lore.kernel.org/r/3575457.1722355300@warthog.procyon.org.uk/ [1]
Link: https://lore.kernel.org/r/1173209.1723152682@warthog.procyon.org.uk
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-12 22:03:27 +02:00
David Howells
8e5ced7804
netfs, ceph: Revert "netfs: Remove deprecated use of PG_private_2 as a second writeback flag"
This reverts commit ae678317b9.

Revert the patch that removes the deprecated use of PG_private_2 in
netfslib for the moment as Ceph is actually still using this to track
data copied to the cache.

Fixes: ae678317b9 ("netfs: Remove deprecated use of PG_private_2 as a second writeback flag")
Reported-by: Max Kellermann <max.kellermann@ionos.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Ilya Dryomov <idryomov@gmail.com>
cc: Xiubo Li <xiubli@redhat.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: Matthew Wilcox <willy@infradead.org>
cc: ceph-devel@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
https: //lore.kernel.org/r/3575457.1722355300@warthog.procyon.org.uk
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-12 22:03:27 +02:00
Jithu Joseph
61b7496453 trace: platform/x86/intel/ifs: Add SBAF trace support
Add tracing support for the SBAF IFS tests, which may be useful for
debugging systems that fail these tests. Log details like test content
batch number, SBAF bundle ID, program index and the exact errors or
warnings encountered by each HT thread during the test.

Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Jithu Joseph <jithu.joseph@intel.com>
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Link: https://lore.kernel.org/r/20240801051814.1935149-5-sathyanarayanan.kuppuswamy@linux.intel.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2024-08-12 16:36:11 +02:00
Valentin Schneider
bf66471987 context_tracking, rcu: Rename struct context_tracking .dynticks_nesting into .nesting
The context_tracking.state RCU_DYNTICKS subvariable has been renamed to
RCU_WATCHING, reflect that change in the related helpers.

[ neeraj.upadhyay: Fix htmldocs build error reported by Stephen Rothwell ]

Suggested-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
2024-08-11 11:05:09 +05:30
Takashi Iwai
4004f3029e Merge branch 'topic/control-lookup-rwlock' into for-next
Pull control lookup optimization changes.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
2024-08-09 14:25:24 +02:00
Linus Torvalds
183d46ff42 Including fixes from wireless, bleutooth, BPF and netfilter.
Current release - regressions:
 
  - core: drop bad gso csum_start and offset in virtio_net_hdr
 
  - wifi: mt76: fix null pointer access in mt792x_mac_link_bss_remove
 
  - eth: tun: add missing bpf_net_ctx_clear() in do_xdp_generic()
 
  - phy: aquantia: only poll GLOBAL_CFG regs on aqr113, aqr113c and aqr115c
 
 Current release - new code bugs:
 
  - smc: prevent UAF in inet_create()
 
  - bluetooth: btmtk: fix kernel crash when entering btmtk_usb_suspend
 
  - eth: bnxt: reject unsupported hash functions
 
 Previous releases - regressions:
 
  - sched: act_ct: take care of padding in struct zones_ht_key
 
  - netfilter: fix null-ptr-deref in iptable_nat_table_init().
 
  - tcp: adjust clamping window for applications specifying SO_RCVBUF
 
 Previous releases - always broken:
 
  - ethtool: rss: small fixes to spec and GET
 
  - mptcp:
    - fix signal endpoint re-add
    - pm: fix backup support in signal endpoints
 
  - wifi: ath12k: fix soft lockup on suspend
 
  - eth: bnxt_en: fix RSS logic in __bnxt_reserve_rings()
 
  - eth: ice: fix AF_XDP ZC timeout and concurrency issues
 
  - eth: mlx5:
    - fix missing lock on sync reset reload
    - fix error handling in irq_pool_request_irq
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmarelYSHHBhYmVuaUBy
 ZWRoYXQuY29tAAoJECkkeY3MjxOkdPwP/2lxh5Cc/SK/mJjBvyBdO2+cuNR0M4Kf
 UV2PA4oOLREYXEPgmOtJQ/VcsmOLa1pEPAdJarZwB5ztalgRKkIogHzzjfY43Fmx
 rAgZqGnIJrWRtepDM8jAaEJC0bEKywH5Wo6eh+oi0GCS07B48lpYATI/1gQdwBjV
 CgcZTQd/04PVx69Bi8LiQyfbwppAsIQQa9YaGmqGuQa74Hp9gz+4VyeRFg54h3CP
 6fWwRHNVO8GsGNA1UgWbeXXajhUU+AG/gDThqIcgxs3KmrREzU9EvcQ70XCzphOA
 JoUy9yykWRGen7aFGrggfY4NzjQmL6g+/rCvbIMfidRsJKBaQYBeMUkbQRnAh34V
 Pe3aSBEnv1aBKaQA7yntdqYGRJ2Sz56a1kjCvI86eDjExt4UshbZi+TfuQSj6zAY
 /ejOawhEYPFZw2FlvkBetyck7iroG1404DoBPghoRu9dG2e3p0eJOZfXiEzfS2qB
 PsJtMPiexSdEcY3sxVKOMh4hx0Zjkqest7laitb1Lrbg5pLhEiHvDkyhoUPGI2oa
 a3N4rsBc6sgSTQfJsx4nXFfKfNQsNu2Nr308BDk16XOHZ4J7Hgt6xR6STDo9ACz1
 Gy5munCN2AhGSdhR5niFI3ocNpDM5oWkztBfjz7YmIQv18NcU5nO8ByYytXMbglq
 sSsnR+VbYeCu
 =z53B
 -----END PGP SIGNATURE-----

Merge tag 'net-6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from wireless, bleutooth, BPF and netfilter.

  Current release - regressions:

   - core: drop bad gso csum_start and offset in virtio_net_hdr

   - wifi: mt76: fix null pointer access in mt792x_mac_link_bss_remove

   - eth: tun: add missing bpf_net_ctx_clear() in do_xdp_generic()

   - phy: aquantia: only poll GLOBAL_CFG regs on aqr113, aqr113c and
     aqr115c

  Current release - new code bugs:

   - smc: prevent UAF in inet_create()

   - bluetooth: btmtk: fix kernel crash when entering btmtk_usb_suspend

   - eth: bnxt: reject unsupported hash functions

  Previous releases - regressions:

   - sched: act_ct: take care of padding in struct zones_ht_key

   - netfilter: fix null-ptr-deref in iptable_nat_table_init().

   - tcp: adjust clamping window for applications specifying SO_RCVBUF

  Previous releases - always broken:

   - ethtool: rss: small fixes to spec and GET

   - mptcp:
      - fix signal endpoint re-add
      - pm: fix backup support in signal endpoints

   - wifi: ath12k: fix soft lockup on suspend

   - eth: bnxt_en: fix RSS logic in __bnxt_reserve_rings()

   - eth: ice: fix AF_XDP ZC timeout and concurrency issues

   - eth: mlx5:
      - fix missing lock on sync reset reload
      - fix error handling in irq_pool_request_irq"

* tag 'net-6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (76 commits)
  mptcp: fix duplicate data handling
  mptcp: fix bad RCVPRUNED mib accounting
  ipv6: fix ndisc_is_useropt() handling for PIO
  igc: Fix double reset adapter triggered from a single taprio cmd
  net: MAINTAINERS: Demote Qualcomm IPA to "maintained"
  net: wan: fsl_qmc_hdlc: Discard received CRC
  net: wan: fsl_qmc_hdlc: Convert carrier_lock spinlock to a mutex
  net/mlx5e: Add a check for the return value from mlx5_port_set_eth_ptys
  net/mlx5e: Fix CT entry update leaks of modify header context
  net/mlx5e: Require mlx5 tc classifier action support for IPsec prio capability
  net/mlx5: Fix missing lock on sync reset reload
  net/mlx5: Lag, don't use the hardcoded value of the first port
  net/mlx5: DR, Fix 'stack guard page was hit' error in dr_rule
  net/mlx5: Fix error handling in irq_pool_request_irq
  net/mlx5: Always drain health in shutdown callback
  net: Add skbuff.h to MAINTAINERS
  r8169: don't increment tx_dropped in case of NETDEV_TX_BUSY
  netfilter: iptables: Fix potential null-ptr-deref in ip6table_nat_table_init().
  netfilter: iptables: Fix null-ptr-deref in iptable_nat_table_init().
  net: drop bad gso csum_start and offset in virtio_net_hdr
  ...
2024-08-01 09:42:09 -07:00