Commit Graph

312 Commits

Author SHA1 Message Date
Linus Torvalds
d6f38c1239 tracing changes for 6.17
- Deprecate auto-mounting tracefs to /sys/kernel/debug/tracing
 
   When tracefs was first introduced back in 2014, the directory
   /sys/kernel/tracing was added and is the designated location to mount
   tracefs. To keep backward compatibility, tracefs was auto-mounted in
   /sys/kernel/debug/tracing as well.
 
   All distros now mount tracefs on /sys/kernel/tracing. Having it seen in two
   different locations has lead to various issues and inconsistencies.
 
   The VFS folks have to also maintain debugfs_create_automount() for this
   single user.
 
   It's been over 10 years. Tooling and scripts should start replacing the
   debugfs location with the tracefs one. The reason tracefs was created in the
   first place was to allow access to the tracing facilities without the need
   to configure debugfs into the kernel. Using tracefs should now be more
   robust.
 
   A new config is created: CONFIG_TRACEFS_AUTOMOUNT_DEPRECATED
   which is default y, so that the kernel is still built with the automount.
   This config allows those that want to remove the automount from debugfs to
   do so.
 
   When tracefs is accessed from /sys/kernel/debug/tracing, the following
   printk is triggerd:
 
    pr_warn("NOTICE: Automounting of tracing to debugfs is deprecated and will be removed in 2030\n");
 
   This gives users another 5 years to fix their scripts.
 
 - Use queue_rcu_work() instead of call_rcu() for freeing event filters
 
   The number of filters to be free can be many depending on the number of
   events within an event system. Freeing them from softirq context can
   potentially cause undesired latency. Use the RCU workqueue to free them
   instead.
 
 - Remove pointless memory barriers in latency code
 
   Memory barriers were added to some of the latency code a long time ago with
   the idea of "making them visible", but that's not what memory barriers are
   for. They are to synchronize access between different variables. There was
   no synchronization here making them pointless.
 
 - Remove "__attribute__()" from the type field of event format
 
   When LLVM is used to compile the kernel with CONFIG_DEBUG_INFO_BTF=y and
   PAHOLE_HAS_BTF_TAG=y, some of the format fields get expanded with the
   following:
 
     field:const char * filename;      offset:24;      size:8; signed:0;
 
   Turns into:
 
     field:const char __attribute__((btf_type_tag("user"))) * filename;      offset:24;      size:8; signed:0;
 
   This confuses parsers. Add code to strip these tags from the strings.
 
 - Add eprobe config option CONFIG_EPROBE_EVENTS
 
   Eprobes were added back in 5.15 but were only enabled when another probe was
   enabled (kprobe, fprobe, uprobe, etc). The eprobes had no config option
   of their own. Add one as they should be a separate entity.
 
   It's default y to keep with the old kernels but still has dependencies on
   TRACING and HAVE_REGS_AND_STACK_ACCESS_API.
 
 - Add eprobe documentation
 
   When eprobes were added back in 5.15 no documentation was added to describe
   them. This needs to be rectified.
 
 - Replace open coded cpumask_next_wrap() in move_to_next_cpu()
 
 - Have preemptirq_delay_run() use off-stack CPU mask
 
 - Remove obsolete comment about pelt_cfs event
 
   DECLARE_TRACE() appends "_tp" to trace events now, but the comment above
   pelt_cfs still mentioned appending it manually.
 
 - Remove EVENT_FILE_FL_SOFT_MODE flag
 
   The SOFT_MODE flag was required when the soft enabling and disabling of
   trace events was first introduced. But there was a bug with this approach
   as it only worked for a single instance. When multiple users required soft
   disabling and disabling the code was changed to have a ref count. The
   SOFT_MODE flag is now set iff the ref count is non zero. This is redundant
   and just reading the ref count is good enough.
 
 - Fix typo in comment
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYKADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCaIt5ZRQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qvriAPsEbOEgMrPF1Tdj1mHLVajYTxI8ft5J
 aX5bfM2cDDRVcgEA57JHOXp4d05dj555/hgAUuCWuFp/E0Anp45EnFTedgQ=
 =wKZW
 -----END PGP SIGNATURE-----

Merge tag 'trace-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing updates from Steven Rostedt:

 - Deprecate auto-mounting tracefs to /sys/kernel/debug/tracing

   When tracefs was first introduced back in 2014, the directory
   /sys/kernel/tracing was added and is the designated location to mount
   tracefs. To keep backward compatibility, tracefs was auto-mounted in
   /sys/kernel/debug/tracing as well.

   All distros now mount tracefs on /sys/kernel/tracing. Having it seen
   in two different locations has lead to various issues and
   inconsistencies.

   The VFS folks have to also maintain debugfs_create_automount() for
   this single user.

   It's been over 10 years. Tooling and scripts should start replacing
   the debugfs location with the tracefs one. The reason tracefs was
   created in the first place was to allow access to the tracing
   facilities without the need to configure debugfs into the kernel.
   Using tracefs should now be more robust.

   A new config is created: CONFIG_TRACEFS_AUTOMOUNT_DEPRECATED which is
   default y, so that the kernel is still built with the automount. This
   config allows those that want to remove the automount from debugfs to
   do so.

   When tracefs is accessed from /sys/kernel/debug/tracing, the
   following printk is triggerd:

     pr_warn("NOTICE: Automounting of tracing to debugfs is deprecated and will be removed in 2030\n");

   This gives users another 5 years to fix their scripts.

 - Use queue_rcu_work() instead of call_rcu() for freeing event filters

   The number of filters to be free can be many depending on the number
   of events within an event system. Freeing them from softirq context
   can potentially cause undesired latency. Use the RCU workqueue to
   free them instead.

 - Remove pointless memory barriers in latency code

   Memory barriers were added to some of the latency code a long time
   ago with the idea of "making them visible", but that's not what
   memory barriers are for. They are to synchronize access between
   different variables. There was no synchronization here making them
   pointless.

 - Remove "__attribute__()" from the type field of event format

   When LLVM is used to compile the kernel with CONFIG_DEBUG_INFO_BTF=y
   and PAHOLE_HAS_BTF_TAG=y, some of the format fields get expanded with
   the following:

     field:const char * filename;      offset:24;      size:8; signed:0;

   Turns into:

     field:const char __attribute__((btf_type_tag("user"))) * filename;      offset:24;      size:8; signed:0;

   This confuses parsers. Add code to strip these tags from the strings.

 - Add eprobe config option CONFIG_EPROBE_EVENTS

   Eprobes were added back in 5.15 but were only enabled when another
   probe was enabled (kprobe, fprobe, uprobe, etc). The eprobes had no
   config option of their own. Add one as they should be a separate
   entity.

   It's default y to keep with the old kernels but still has
   dependencies on TRACING and HAVE_REGS_AND_STACK_ACCESS_API.

 - Add eprobe documentation

   When eprobes were added back in 5.15 no documentation was added to
   describe them. This needs to be rectified.

 - Replace open coded cpumask_next_wrap() in move_to_next_cpu()

 - Have preemptirq_delay_run() use off-stack CPU mask

 - Remove obsolete comment about pelt_cfs event

   DECLARE_TRACE() appends "_tp" to trace events now, but the comment
   above pelt_cfs still mentioned appending it manually.

 - Remove EVENT_FILE_FL_SOFT_MODE flag

   The SOFT_MODE flag was required when the soft enabling and disabling
   of trace events was first introduced. But there was a bug with this
   approach as it only worked for a single instance. When multiple users
   required soft disabling and disabling the code was changed to have a
   ref count. The SOFT_MODE flag is now set iff the ref count is non
   zero. This is redundant and just reading the ref count is good
   enough.

 - Fix typo in comment

* tag 'trace-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  Documentation: tracing: Add documentation about eprobes
  tracing: Have eprobes have their own config option
  tracing: Remove "__attribute__()" from the type field of event format
  tracing: Deprecate auto-mounting tracefs in debugfs
  tracing: Fix comment in trace_module_remove_events()
  tracing: Remove EVENT_FILE_FL_SOFT_MODE flag
  tracing: Remove pointless memory barriers
  tracing/sched: Remove obsolete comment on suffixes
  kernel: trace: preemptirq_delay_test: use offstack cpu mask
  tracing: Use queue_rcu_work() to free filters
  tracing: Replace opencoded cpumask_next_wrap() in move_to_next_cpu()
2025-08-01 10:29:36 -07:00
Linus Torvalds
2be6a7503d Remove or hide unused tracepoints
Tracepoints take up memory (around 5K per tracepoint) even when they are
 unused. Changes are being made to detect when a tracepoint is defined but
 unused and a warning is shown at build. But those changes are not yet
 ready for inclusion.
 
 - Fix some of the unused tracepoints that it detected
 
   Some tracepoints were removed and others were hidden by config settings
   to match the config settings of where they are instantiated. Some
   tracepoints were moved into architecture specific code as only one
   architecture used them.
 
 - Call the ftrace_test_filter tracepoint in an unreachable if statement
 
   The ftrace_test_filter tracepoint which is defined when ftrace selftests
   are configured and is used to test the filter logic, but the tracepoint is
   not actually called. It is put into an if statement to not have it get
   compiled out, but also not warn for not being used.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYKADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCaIlYqxQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qisrAQD+pu2en9LAXLcgbFxQOwhbACpxOpmT
 3LiE2+MvDR3ckQD/Vyi31XebdRmj3leJ7ENf28oa155y1pyK/onrPgDHyQ4=
 =nFfn
 -----END PGP SIGNATURE-----

Merge tag 'trace-unused-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracepoint cleanup from Steven Rostedt:
 "Remove or hide unused tracepoints

  Tracepoints take up memory (around 5K per tracepoint) even when they
  are unused. Changes are being made to detect when a tracepoint is
  defined but unused and a warning is shown at build. But those changes
  are not yet ready for inclusion.

   - Fix some of the unused tracepoints that it detected

     Some tracepoints were removed and others were hidden by config
     settings to match the config settings of where they are
     instantiated. Some tracepoints were moved into architecture
     specific code as only one architecture used them.

   - Call the ftrace_test_filter tracepoint in an unreachable if
     statement

     The ftrace_test_filter tracepoint which is defined when ftrace
     selftests are configured and is used to test the filter logic, but
     the tracepoint is not actually called. It is put into an if
     statement to not have it get compiled out, but also not warn for
     not being used"

* tag 'trace-unused-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: sched: Hide numa events under CONFIG_NUMA_BALANCING
  powerpc/thp: tracing: Hide hugepage events under CONFIG_PPC_BOOK3S_64
  tracing: Call trace_ftrace_test_filter() for the event
  tracing: arm: arm64: Hide trace events ipi_raise, ipi_entry and ipi_exit
  binder: Remove unused binder lock events
  PM: tracing: Hide power_domain_target event under ARCH_OMAP2PLUS
  PM: tracing: Hide device_pm_callback events under PM_SLEEP
  PM: tracing: Hide psci_domain_idle events under ARM_PSCI_CPUIDLE
  PM: cpufreq: powernv/tracing: Move powernv_throttle trace event
  alarmtimer: Hide alarmtimer_suspend event when RTC_CLASS is not configured
  tracing, AER: Hide PCIe AER event when PCIEAER is not configured
2025-07-30 16:41:58 -07:00
Steven Rostedt
0dd1274a05 tracing: Have eprobes have their own config option
Eprobes were added in 5.15 and were selected whenever any of the other
probe events were selected. If kprobe events were enabled (which it is by
default if kprobes are enabled) it would enable eprobe events as well. The
same for uprobes and fprobes.

Have eprobes have its own config and it gets enabled by default if tracing
is enabled.

Link: https://lore.kernel.org/all/20250729102636.b7cce553e7cc263722b12365@kernel.org/

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/20250730140945.360286733@kernel.org
Suggested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-30 10:38:43 -04:00
Steven Rostedt
9f0cb91767 tracing: arm: arm64: Hide trace events ipi_raise, ipi_entry and ipi_exit
The ipi tracepoints are mostly generic, but the tracepoints ipi_raise,
ipi_entry and ipi_exit are only used by arm and arm64. This means these
trace events are wasting memory in all the other architectures that do not
use them.

Add CONFIG_HAVE_EXTRA_IPI_TRACEPOINTS and have arm and arm64 select it to
enable these trace events. The config makes it easy if other architectures
decide to trace these as well.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Will Deacon <will@kernel.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Nicolas Pitre <nico@fluxnic.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/20250722103714.64eba013@gandalf.local.home
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
2025-07-23 14:58:55 -04:00
Steven Rostedt
9ba817fb7c tracing: Deprecate auto-mounting tracefs in debugfs
In January 2015, tracefs was created to allow access to the tracing
infrastructure without needing to compile in debugfs. When tracefs is
configured, the directory /sys/kernel/tracing will exist and tooling is
expected to use that path to access the tracing infrastructure.

To allow backward compatibility, when debugfs is mounted, it would
automount tracefs in its "tracing" directory so that tooling that had hard
coded /sys/kernel/debug/tracing would still work.

It has been over 10 years since the new interface was introduced, and all
tooling should now be using it. Start the process of deprecating the old
path so that it doesn't need to be maintained anymore.

A new config is added to allow distributions to disable automounting of
tracefs on debugfs.

If /sys/kernel/debug/tracing is accessed, a pr_warn() will trigger stating:

"NOTICE: Automounting of tracing to debugfs is deprecated and will be removed in 2030"

Expect to remove this feature in 5 years (2030).

Cc: <linux-trace-users@vger.kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/20250722170806.40c068c6@gandalf.local.home
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-23 10:47:26 -04:00
Steven Rostedt
4d6d0a6263 tracing: Remove redundant config HAVE_FTRACE_MCOUNT_RECORD
Ftrace is tightly coupled with architecture specific code because it
requires the use of trampolines written in assembly. This means that when
a new feature or optimization is made, it must be done for all
architectures. To simplify the approach, CONFIG_HAVE_FTRACE_* configs are
added to denote which architecture has the new enhancement so that other
architectures can still function until they too have been updated.

The CONFIG_HAVE_FTRACE_MCOUNT was added to help simplify the
DYNAMIC_FTRACE work, but now every architecture that implements
DYNAMIC_FTRACE also has HAVE_FTRACE_MCOUNT set too, making it redundant
with the HAVE_DYNAMIC_FTRACE.

Remove the HAVE_FTRACE_MCOUNT config and use DYNAMIC_FTRACE directly where
applicable.

Link: https://lore.kernel.org/all/20250703154916.48e3ada7@gandalf.local.home/

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/20250704104838.27a18690@gandalf.local.home
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-22 20:15:56 -04:00
Steven Rostedt
9b4d5d330f ftrace: Make DYNAMIC_FTRACE always enabled for architectures that support it
ftrace has two flavors:

 1) static: Where every function always calls the ftrace trampoline

 2) dynamic: Where each function has nops that can be changed on demand to
             jump to the ftrace trampoline when needed.

The static flavor has very high performance overhead and was only created
to make it easier for architectures to implement the dynamic flavor. An
architecture developer can first implement the static ftrace to make sure
the trampolines work before working on the more complicated dynamic aspect
of ftrace. Once the architecture can support dynamic ftrace, there's no
reason to continue to support the static flavor. In fact, the static
flavor tends to bitrot and bugs start to appear in them.

Remove the prompt to pick DYNAMIC_FTRACE and simply enable it if the
architecture supports it.

Link: https://lore.kernel.org/all/f7e12c6d-892e-4ca3-9ef0-fbb524d04a48@ghiti.fr/

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: ChenMiao <chenmiao.ku@gmail.com>
Link: https://lore.kernel.org/20250703115222.2d7c8cd5@batman.local.home
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-22 20:13:16 -04:00
Steven Rostedt
b81ff11c21 ftrace: Have tracing function args depend on PROBE_EVENTS_BTF_ARGS
The option PROBE_EVENTS_BTF_ARGS enables the functions
btf_find_func_proto() and btf_get_func_param() which are used by the
function argument tracing code. The option FUNCTION_TRACE_ARGS was
dependent on the same configs that PROBE_EVENTS_BTF_ARGS was dependent on,
but it was also dependent on PROBE_EVENTS_BTF_ARGS. In fact, if
PROBE_EVENTS_BTF_ARGS is supported then FUNCTION_TRACE_ARGS is supported.

Just make FUNCTION_TRACE_ARGS depend on PROBE_EVENTS_BTF_ARGS.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/20250401113601.17fa1129@gandalf.local.home
Fixes: 533c20b062 ("ftrace: Add print_function_args()")
Closes: https://lore.kernel.org/all/DB9PR08MB75820599801BAD118D123D7D93AD2@DB9PR08MB7582.eurprd08.prod.outlook.com/
Reported-by: Christian Loehle <Christian.Loehle@arm.com>
Tested-by: Christian Loehle <Christian.Loehle@arm.com>
Tested-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-04-02 09:50:56 -04:00
Sven Schnelle
ff5c9c576e ftrace: Add support for function argument to graph tracer
Wire up the code to print function arguments in the function graph
tracer. This functionality can be enabled/disabled during runtime with
options/funcgraph-args.

Example usage:

6)              | dummy_xmit [dummy](skb = 0x8887c100, dev = 0x872ca000) {
6)              |   consume_skb(skb = 0x8887c100) {
6)              |     skb_release_head_state(skb = 0x8887c100) {
6)  0.178 us    |       sock_wfree(skb = 0x8887c100)
6)  0.627 us    |     }

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Guo Ren <guoren@kernel.org>
Cc: Donglin Peng <dolinux.peng@gmail.com>
Cc: Zheng Yejian <zhengyejian@huaweicloud.com>
Link: https://lore.kernel.org/20250227185822.810321199@goodmis.org
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Co-developed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-03-04 11:27:23 -05:00
Sven Schnelle
533c20b062 ftrace: Add print_function_args()
Add a function to decode argument types with the help of BTF. Will
be used to display arguments in the function and function graph
tracer.

It can only handle simply arguments and up to FTRACE_REGS_MAX_ARGS number
of arguments. When it hits a max, it will print ", ...":

   page_to_skb(vi=0xffff8d53842dc980, rq=0xffff8d53843a0800, page=0xfffffc2e04337c00, offset=6160, len=64, truesize=1536, ...)

And if it hits an argument that is not recognized, it will print the raw
value and the type of argument it is:

   make_vfsuid(idmap=0xffffffff87f99db8, fs_userns=0xffffffff87e543c0, kuid=0x0 (STRUCT))
   __pti_set_user_pgtbl(pgdp=0xffff8d5384ab47f8, pgd=0x110e74067 (STRUCT))

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Guo Ren <guoren@kernel.org>
Cc: Donglin Peng <dolinux.peng@gmail.com>
Cc: Zheng Yejian <zhengyejian@huaweicloud.com>
Link: https://lore.kernel.org/20250227185822.639418500@goodmis.org
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Co-developed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-03-04 11:27:23 -05:00
Masami Hiramatsu (Google)
4346ba1604 fprobe: Rewrite fprobe on function-graph tracer
Rewrite fprobe implementation on function-graph tracer.
Major API changes are:
 -  'nr_maxactive' field is deprecated.
 -  This depends on CONFIG_DYNAMIC_FTRACE_WITH_ARGS or
    !CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS, and
    CONFIG_HAVE_FUNCTION_GRAPH_FREGS. So currently works only
    on x86_64.
 -  Currently the entry size is limited in 15 * sizeof(long).
 -  If there is too many fprobe exit handler set on the same
    function, it will fail to probe.

Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Heiko Carstens <hca@linux.ibm.com> # s390
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Florent Revest <revest@chromium.org>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: bpf <bpf@vger.kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Naveen N Rao <naveen@kernel.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: x86@kernel.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/173519003970.391279.14406792285453830996.stgit@devnote2
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-12-26 10:50:05 -05:00
Masami Hiramatsu (Google)
a762e9267d ftrace: Add CONFIG_HAVE_FTRACE_GRAPH_FUNC
Add CONFIG_HAVE_FTRACE_GRAPH_FUNC kconfig in addition to ftrace_graph_func
macro check. This is for the other feature (e.g. FPROBE) which requires to
access ftrace_regs from fgraph_ops::entryfunc() can avoid compiling if
the fgraph can not pass the valid ftrace_regs.

Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Florent Revest <revest@chromium.org>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: bpf <bpf@vger.kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Naveen N Rao <naveen@kernel.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: x86@kernel.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/173519001472.391279.1174901685282588467.stgit@devnote2
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-12-26 10:50:04 -05:00
Masami Hiramatsu (Google)
0566cefe73 tracing/fprobe: Enable fprobe events with CONFIG_DYNAMIC_FTRACE_WITH_ARGS
Allow fprobe events to be enabled with CONFIG_DYNAMIC_FTRACE_WITH_ARGS.
With this change, fprobe events mostly use ftrace_regs instead of pt_regs.
Note that if the arch doesn't enable HAVE_FTRACE_REGS_HAVING_PT_REGS,
fprobe events will not be able to be used from perf.

Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Florent Revest <revest@chromium.org>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: bpf <bpf@vger.kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/173518999352.391279.13332699755290175168.stgit@devnote2
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-12-26 10:50:04 -05:00
Masami Hiramatsu (Google)
762abbc0d0 fprobe: Use ftrace_regs in fprobe exit handler
Change the fprobe exit handler to use ftrace_regs structure instead of
pt_regs. This also introduce HAVE_FTRACE_REGS_HAVING_PT_REGS which
means the ftrace_regs is including the pt_regs so that ftrace_regs
can provide pt_regs without memory allocation.
Fprobe introduces a new dependency with that.

Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Heiko Carstens <hca@linux.ibm.com> # s390
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Florent Revest <revest@chromium.org>
Cc: bpf <bpf@vger.kernel.org>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: x86@kernel.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Song Liu <song@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Matt Bobrowski <mattbobrowski@google.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Yonghong Song <yonghong.song@linux.dev>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Stanislav Fomichev <sdf@fomichev.me>
Cc: Hao Luo <haoluo@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/173518995092.391279.6765116450352977627.stgit@devnote2
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-12-26 10:50:03 -05:00
Masami Hiramatsu (Google)
46bc082388 fprobe: Use ftrace_regs in fprobe entry handler
This allows fprobes to be available with CONFIG_DYNAMIC_FTRACE_WITH_ARGS
instead of CONFIG_DYNAMIC_FTRACE_WITH_REGS, then we can enable fprobe
on arm64.

Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: bpf <bpf@vger.kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/173518994037.391279.2786805566359674586.stgit@devnote2
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Florent Revest <revest@chromium.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-12-26 10:50:03 -05:00
Masami Hiramatsu (Google)
a3ed4157b7 fgraph: Replace fgraph_ret_regs with ftrace_regs
Use ftrace_regs instead of fgraph_ret_regs for tracing return value
on function_graph tracer because of simplifying the callback interface.

The CONFIG_HAVE_FUNCTION_GRAPH_RETVAL is also replaced by
CONFIG_HAVE_FUNCTION_GRAPH_FREGS.

Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Florent Revest <revest@chromium.org>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: bpf <bpf@vger.kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: x86@kernel.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/173518991508.391279.16635322774382197642.stgit@devnote2
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-12-26 10:50:02 -05:00
Donglin Peng
21e92806d3 function_graph: Support recording and printing the function return address
When using function_graph tracer to analyze the flow of kernel function
execution, it is often necessary to quickly locate the exact line of code
where the call occurs. While this may be easy at times, it can be more
time-consuming when some functions are inlined or the flow is too long.

This feature aims to simplify the process by recording the return address
of traced funcions and printing it when outputing trace logs.

To enhance human readability, the prefix 'ret=' is used for the kernel return
value, while '<-' serves as the prefix for the return address in trace logs to
make it look more like the function tracer.

A new trace option named 'funcgraph-retaddr' has been introduced, and the
existing option 'sym-addr' can be used to control the format of the return
address.

See below logs with both funcgraph-retval and funcgraph-retaddr enabled.

0)             | load_elf_binary() { /* <-bprm_execve+0x249/0x600 */
0)             |   load_elf_phdrs() { /* <-load_elf_binary+0x84/0x1730 */
0)             |     __kmalloc_noprof() { /* <-load_elf_phdrs+0x4a/0xb0 */
0)   3.657 us  |       __cond_resched(); /* <-__kmalloc_noprof+0x28c/0x390 ret=0x0 */
0) + 24.335 us |     } /* __kmalloc_noprof ret=0xffff8882007f3000 */
0)             |     kernel_read() { /* <-load_elf_phdrs+0x6c/0xb0 */
0)             |       rw_verify_area() { /* <-kernel_read+0x2b/0x50 */
0)             |         security_file_permission() { /* <-kernel_read+0x2b/0x50 */
0)             |           selinux_file_permission() { /* <-security_file_permission+0x26/0x40 */
0)             |             __inode_security_revalidate() { /* <-selinux_file_permission+0x6d/0x140 */
0)   2.034 us  |               __cond_resched(); /* <-__inode_security_revalidate+0x5f/0x80 ret=0x0 */
0)   6.602 us  |             } /* __inode_security_revalidate ret=0x0 */
0)   2.214 us  |             avc_policy_seqno(); /* <-selinux_file_permission+0x107/0x140 ret=0x0 */
0) + 16.670 us |           } /* selinux_file_permission ret=0x0 */
0) + 20.809 us |         } /* security_file_permission ret=0x0 */
0) + 25.217 us |       } /* rw_verify_area ret=0x0 */
0)             |       __kernel_read() { /* <-load_elf_phdrs+0x6c/0xb0 */
0)             |         ext4_file_read_iter() { /* <-__kernel_read+0x160/0x2e0 */

Then, we can use the faddr2line to locate the source code, for example:

$ ./scripts/faddr2line ./vmlinux load_elf_phdrs+0x6c/0xb0
load_elf_phdrs+0x6c/0xb0:
elf_read at fs/binfmt_elf.c:471
(inlined by) load_elf_phdrs at fs/binfmt_elf.c:531

Link: https://lore.kernel.org/20240915032912.1118397-1-dolinux.peng@gmail.com
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202409150605.HgUmU8ea-lkp@intel.com/
Signed-off-by: Donglin Peng <dolinux.peng@gmail.com>
[ Rebased to handle text_delta offsets ]
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-10-05 10:14:04 -04:00
Masami Hiramatsu (Google)
3572bd5689 tracing: Build event generation tests only as modules
The kprobes and synth event generation test modules add events and lock
(get a reference) those event file reference in module init function,
and unlock and delete it in module exit function. This is because those
are designed for playing as modules.

If we make those modules as built-in, those events are left locked in the
kernel, and never be removed. This causes kprobe event self-test failure
as below.

[   97.349708] ------------[ cut here ]------------
[   97.353453] WARNING: CPU: 3 PID: 1 at kernel/trace/trace_kprobe.c:2133 kprobe_trace_self_tests_init+0x3f1/0x480
[   97.357106] Modules linked in:
[   97.358488] CPU: 3 PID: 1 Comm: swapper/0 Not tainted 6.9.0-g699646734ab5-dirty #14
[   97.361556] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[   97.363880] RIP: 0010:kprobe_trace_self_tests_init+0x3f1/0x480
[   97.365538] Code: a8 24 08 82 e9 ae fd ff ff 90 0f 0b 90 48 c7 c7 e5 aa 0b 82 e9 ee fc ff ff 90 0f 0b 90 48 c7 c7 2d 61 06 82 e9 8e fd ff ff 90 <0f> 0b 90 48 c7 c7 33 0b 0c 82 89 c6 e8 6e 03 1f ff 41 ff c7 e9 90
[   97.370429] RSP: 0000:ffffc90000013b50 EFLAGS: 00010286
[   97.371852] RAX: 00000000fffffff0 RBX: ffff888005919c00 RCX: 0000000000000000
[   97.373829] RDX: ffff888003f40000 RSI: ffffffff8236a598 RDI: ffff888003f40a68
[   97.375715] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
[   97.377675] R10: ffffffff811c9ae5 R11: ffffffff8120c4e0 R12: 0000000000000000
[   97.379591] R13: 0000000000000001 R14: 0000000000000015 R15: 0000000000000000
[   97.381536] FS:  0000000000000000(0000) GS:ffff88807dcc0000(0000) knlGS:0000000000000000
[   97.383813] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   97.385449] CR2: 0000000000000000 CR3: 0000000002244000 CR4: 00000000000006b0
[   97.387347] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   97.389277] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   97.391196] Call Trace:
[   97.391967]  <TASK>
[   97.392647]  ? __warn+0xcc/0x180
[   97.393640]  ? kprobe_trace_self_tests_init+0x3f1/0x480
[   97.395181]  ? report_bug+0xbd/0x150
[   97.396234]  ? handle_bug+0x3e/0x60
[   97.397311]  ? exc_invalid_op+0x1a/0x50
[   97.398434]  ? asm_exc_invalid_op+0x1a/0x20
[   97.399652]  ? trace_kprobe_is_busy+0x20/0x20
[   97.400904]  ? tracing_reset_all_online_cpus+0x15/0x90
[   97.402304]  ? kprobe_trace_self_tests_init+0x3f1/0x480
[   97.403773]  ? init_kprobe_trace+0x50/0x50
[   97.404972]  do_one_initcall+0x112/0x240
[   97.406113]  do_initcall_level+0x95/0xb0
[   97.407286]  ? kernel_init+0x1a/0x1a0
[   97.408401]  do_initcalls+0x3f/0x70
[   97.409452]  kernel_init_freeable+0x16f/0x1e0
[   97.410662]  ? rest_init+0x1f0/0x1f0
[   97.411738]  kernel_init+0x1a/0x1a0
[   97.412788]  ret_from_fork+0x39/0x50
[   97.413817]  ? rest_init+0x1f0/0x1f0
[   97.414844]  ret_from_fork_asm+0x11/0x20
[   97.416285]  </TASK>
[   97.417134] irq event stamp: 13437323
[   97.418376] hardirqs last  enabled at (13437337): [<ffffffff8110bc0c>] console_unlock+0x11c/0x150
[   97.421285] hardirqs last disabled at (13437370): [<ffffffff8110bbf1>] console_unlock+0x101/0x150
[   97.423838] softirqs last  enabled at (13437366): [<ffffffff8108e17f>] handle_softirqs+0x23f/0x2a0
[   97.426450] softirqs last disabled at (13437393): [<ffffffff8108e346>] __irq_exit_rcu+0x66/0xd0
[   97.428850] ---[ end trace 0000000000000000 ]---

And also, since we can not cleanup dynamic_event file, ftracetest are
failed too.

To avoid these issues, build these tests only as modules.

Link: https://lore.kernel.org/all/171811263754.85078.5877446624311852525.stgit@devnote2/

Fixes: 9fe41efaca ("tracing: Add synth event generation test module")
Fixes: 64836248dd ("tracing: Add kprobe event command generation test module")
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-06-12 08:43:12 +09:00
Linus Torvalds
70a663205d Probes updates for v6.10:
- tracing/probes: Adding new pseudo-types %pd and %pD support for dumping
   dentry name from 'struct dentry *' and file name from 'struct file *'.
 
 - uprobes: Some performance optimizations have been done.
  . Speed up the BPF uprobe event by delaying the fetching of the uprobe
    event arguments that are not used in BPF.
  . Avoid locking by speculatively checking whether uprobe event is valid.
  . Reduce lock contention by using read/write_lock instead of spinlock for
    uprobe list operation. This improved BPF uprobe benchmark result 43% on
    average.
 
 - rethook: Removes non-fatal warning messages when tracing stack from BPF
   and skip rcu_is_watching() validation in rethook if possible.
 
 - objpool: Optimizing objpool (which is used by kretprobes and fprobe as
   rethook backend storage) by inlining functions and avoid caching nr_cpu_ids
   because it is a const value.
 
 - fprobe: Add entry/exit callbacks types (code cleanup)
 - kprobes: Check ftrace was killed in kprobes if it uses ftrace.
 -----BEGIN PGP SIGNATURE-----
 
 iQFPBAABCgA5FiEEh7BulGwFlgAOi5DV2/sHvwUrPxsFAmZFUxsbHG1hc2FtaS5o
 aXJhbWF0c3VAZ21haWwuY29tAAoJENv7B78FKz8b+fIH/A96/SeC5WRLhXmHfTCM
 IvKUea2n0b0oV/2pVfHqfkCBTICuUZ97Opd9VH9jLtjBOTh0fUOGZ2DNVGdSYfWm
 IIkS5dhuZxHXrSHEVYykwLHI3AOL7Q6Ny9EmOg1CNMidUkPMNtBvppsBYPlFU/B/
 qQJAvOdkVOnNITCaas0+MNgepoVVKdJzdNQ1I4WrGyG8isCZBaCYKo2QcGyheCNN
 y8NXvnVHgmgHQ8nTaeE5AawclFzFnhwHfPQPe1kiyGrx15b8K+VYmaZxPKv33A1a
 KT3TKJ1Ep7s7iWFh2iPVJzIwOXCmSnvNTKfNx/MDuKtO7UVfFwytoMEaekbmv3bG
 VqM=
 =n/mW
 -----END PGP SIGNATURE-----

Merge tag 'probes-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull probes updates from Masami Hiramatsu:

 - tracing/probes: Add new pseudo-types %pd and %pD support for dumping
   dentry name from 'struct dentry *' and file name from 'struct file *'

 - uprobes performance optimizations:
    - Speed up the BPF uprobe event by delaying the fetching of the
      uprobe event arguments that are not used in BPF
    - Avoid locking by speculatively checking whether uprobe event is
      valid
    - Reduce lock contention by using read/write_lock instead of
      spinlock for uprobe list operation. This improved BPF uprobe
      benchmark result 43% on average

 - rethook: Remove non-fatal warning messages when tracing stack from
   BPF and skip rcu_is_watching() validation in rethook if possible

 - objpool: Optimize objpool (which is used by kretprobes and fprobe as
   rethook backend storage) by inlining functions and avoid caching
   nr_cpu_ids because it is a const value

 - fprobe: Add entry/exit callbacks types (code cleanup)

 - kprobes: Check ftrace was killed in kprobes if it uses ftrace

* tag 'probes-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  kprobe/ftrace: bail out if ftrace was killed
  selftests/ftrace: Fix required features for VFS type test case
  objpool: cache nr_possible_cpus() and avoid caching nr_cpu_ids
  objpool: enable inlining objpool_push() and objpool_pop() operations
  rethook: honor CONFIG_FTRACE_VALIDATE_RCU_IS_WATCHING in rethook_try_get()
  ftrace: make extra rcu_is_watching() validation check optional
  uprobes: reduce contention on uprobes_tree access
  rethook: Remove warning messages printed for finding return address of a frame.
  fprobe: Add entry/exit callbacks types
  selftests/ftrace: add fprobe test cases for VFS type "%pd" and "%pD"
  selftests/ftrace: add kprobe test cases for VFS type "%pd" and "%pD"
  Documentation: tracing: add new type '%pd' and '%pD' for kprobe
  tracing/probes: support '%pD' type for print struct file's name
  tracing/probes: support '%pd' type for print struct dentry's name
  uprobes: add speculative lockless system-wide uprobe filter check
  uprobes: prepare uprobe args buffer lazily
  uprobes: encapsulate preparation of uprobe args buffer
2024-05-17 18:29:30 -07:00
Linus Torvalds
c0b9620bc3 RCU pull request for v6.10
This pull request contains the following branches:
 
 fixes.2024.04.15a: Fix a lockdep complain for lazy-preemptible kernel,
 remove redundant BH disable for TINY_RCU, remove redundant READ_ONCE()
 in tree.c, fix false positives KCSAN splat and fix buffer overflow in
 the print_cpu_stall_info().
 
 misc.2024.04.12a: Misc updates related to bpf, tracing and update the
 MAINTAINERS file.
 
 rcu-sync-normal-improve.2024.04.15a: An improvement of a normal
 synchronize_rcu() call in terms of latency. It maintains a separate
 track for sync. users only. This approach bypasses per-cpu nocb-lists
 thus sync-users do not depend on nocb-list length and how fast regular
 callbacks are processed.
 
 rcu-tasks.2024.04.15a: RCU tasks, switch tasks RCU grace periods to
 sleep at TASK_IDLE priority, fix some comments, add some diagnostic
 warning to the exit_tasks_rcu_start() and fix a buffer overflow in
 the show_rcu_tasks_trace_gp_kthread().
 
 rcutorture.2024.04.15a: Increase memory to guest OS, fix a Tasks
 Rude RCU testing, some updates for TREE09, dump mode information
 to debug GP kthread state, remove redundant READ_ONCE(), fix some
 comments about RCU_TORTURE_PIPE_LEN and pipe_count, remove some
 redundant pointer initialization, fix a hung splat task by when
 the rcutorture tests start to exit, fix invalid context warning,
 add '--do-kvfree' parameter to torture test and use slow register
 unregister callbacks only for rcutype test.
 -----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEEu6QRe/mAUYNn5U0PBYqkjnKWLM8FAmYzsmUACgkQBYqkjnKW
 LM/FAwv+LcIJ9lO/wzUpnH3d3djBOPmyu7Us8ERNY5lcVZ+neS2m3vxq0kOk/cnV
 RGgZc7qjWqMQ9hAx/MmIodmiw036ceRDe5CP/Ec/TYx68m+NPG3VnP08s/xLXLlx
 n8aSJJu37y0ElMQMwvuQaoNJ2xqlZ8AHCR6iaqJtzmPBR6zHLyeCPVpdPJQfcSO7
 +9ABzqo8isGxeuaAE7y0WUp0ZsSpdYvdext5SStjtvZ+hKERdVluhBF+OxZIZByp
 RSBoZJrbTKKpzTUBSE0ci+mlfqBPmSVjjqvygscuwOoKhm+601E51DYb1QXkGujq
 vuc1f/c7VjTAXyvs9k4An2x3XcN5SFhA6Bhc+L6aU/UJBzAWrJJkVOwS79gHNSn1
 qshyhpDLE8MiBEi0QxaEmBZLkz3BX1aYbQA0+5wvgoz0u8QglrpRrPRIWUWC0wvq
 SOLIibZkJuPUOZuD5AP4tg80swTuSCvyWuiKUVRnJK9FsYKdcyNUCnOLIwUzQlrg
 1/hatlvS
 =cq8V
 -----END PGP SIGNATURE-----

Merge tag 'rcu.next.v6.10' of https://github.com/urezki/linux

Pull RCU updates from Uladzislau Rezki:

 - Fix a lockdep complain for lazy-preemptible kernel, remove redundant
   BH disable for TINY_RCU, remove redundant READ_ONCE() in tree.c, fix
   false positives KCSAN splat and fix buffer overflow in the
   print_cpu_stall_info().

 - Misc updates related to bpf, tracing and update the MAINTAINERS file.

 - An improvement of a normal synchronize_rcu() call in terms of
   latency. It maintains a separate track for sync. users only. This
   approach bypasses per-cpu nocb-lists thus sync-users do not depend on
   nocb-list length and how fast regular callbacks are processed.

 - RCU tasks: switch tasks RCU grace periods to sleep at TASK_IDLE
   priority, fix some comments, add some diagnostic warning to the
   exit_tasks_rcu_start() and fix a buffer overflow in the
   show_rcu_tasks_trace_gp_kthread().

 - RCU torture: Increase memory to guest OS, fix a Tasks Rude RCU
   testing, some updates for TREE09, dump mode information to debug GP
   kthread state, remove redundant READ_ONCE(), fix some comments about
   RCU_TORTURE_PIPE_LEN and pipe_count, remove some redundant pointer
   initialization, fix a hung splat task by when the rcutorture tests
   start to exit, fix invalid context warning, add '--do-kvfree'
   parameter to torture test and use slow register unregister callbacks
   only for rcutype test.

* tag 'rcu.next.v6.10' of https://github.com/urezki/linux: (48 commits)
  rcutorture: Use rcu_gp_slow_register/unregister() only for rcutype test
  torture: Scale --do-kvfree test time
  rcutorture: Fix invalid context warning when enable srcu barrier testing
  rcutorture: Make stall-tasks directly exit when rcutorture tests end
  rcutorture: Removing redundant function pointer initialization
  rcutorture: Make rcutorture support print rcu-tasks gp state
  rcutorture: Use the gp_kthread_dbg operation specified by cur_ops
  rcutorture: Re-use value stored to ->rtort_pipe_count instead of re-reading
  rcutorture: Fix rcu_torture_one_read() pipe_count overflow comment
  rcutorture: Remove extraneous rcu_torture_pipe_update_one() READ_ONCE()
  rcu: Allocate WQ with WQ_MEM_RECLAIM bit set
  rcu: Support direct wake-up of synchronize_rcu() users
  rcu: Add a trace event for synchronize_rcu_normal()
  rcu: Reduce synchronize_rcu() latency
  rcu: Fix buffer overflow in print_cpu_stall_info()
  rcu: Mollify sparse with RCU guard
  rcu-tasks: Fix show_rcu_tasks_trace_gp_kthread buffer overflow
  rcu-tasks: Fix the comments for tasks_rcu_exit_srcu_stall_timer
  rcu-tasks: Replace exit_tasks_rcu_start() initialization with WARN_ON_ONCE()
  rcu: Remove redundant CONFIG_PROVE_RCU #if condition
  ...
2024-05-13 09:49:06 -07:00
Andrii Nakryiko
b0e28a4b5b ftrace: make extra rcu_is_watching() validation check optional
Introduce CONFIG_FTRACE_VALIDATE_RCU_IS_WATCHING config option to
control whether ftrace low-level code performs additional
rcu_is_watching()-based validation logic in an attempt to catch noinstr
violations.

This check is expected to never be true and is mostly useful for
low-level validation of ftrace subsystem invariants. For most users it
should probably be kept disabled to eliminate unnecessary runtime
overhead.

This improves BPF multi-kretprobe (relying on ftrace and rethook
infrastructure) runtime throughput by 2%, according to BPF benchmarks ([0]).

  [0] https://lore.kernel.org/bpf/CAEf4BzauQ2WKMjZdc9s0rBWa01BYbgwHN6aNDXQSHYia47pQ-w@mail.gmail.com/

Link: https://lore.kernel.org/all/20240418190909.704286-1-andrii@kernel.org/

Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2024-05-01 23:18:48 +09:00
Prasad Pandit
d96c36004e tracing: Fix FTRACE_RECORD_RECURSION_SIZE Kconfig entry
Fix FTRACE_RECORD_RECURSION_SIZE entry, replace tab with
a space character. It helps Kconfig parsers to read file
without error.

Link: https://lore.kernel.org/linux-trace-kernel/20240322121801.1803948-1-ppandit@redhat.com

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Fixes: 773c167050 ("ftrace: Add recording of functions that caused recursion")
Signed-off-by: Prasad Pandit <pjp@fedoraproject.org>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-04-11 17:45:18 -04:00
Paul E. McKenney
02b3c5fcdf tracing: Select new NEED_TASKS_RCU Kconfig option
Currently, if a Kconfig option depends on TASKS_RCU, it conditionally does
"select TASKS_RCU if PREEMPTION".  This works, but requires any change in
this enablement logic to be replicated across all such "select" clauses.
A new NEED_TASKS_RCU Kconfig option has been created to allow this
enablement logic to be in one place in kernel/rcu/Kconfig.

Therefore, select the new NEED_TASKS_RCU Kconfig option instead of the
old TASKS_RCU option.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: <linux-trace-kernel@vger.kernel.org>
Cc: Ankur Arora <ankur.a.arora@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
2024-04-11 18:18:47 +02:00
Linus Torvalds
d2a6fd45c5 Probes updates for v6.5:
- fprobe: Pass return address to the fprobe entry/exit callbacks so that
   the callbacks don't need to analyze pt_regs/stack to find the function
   return address.
 
 - kprobe events: cleanup usage of TPARG_FL_FENTRY and TPARG_FL_RETURN
   flags so that those are not set at once.
 
 - fprobe events:
  . Add a new fprobe events for tracing arbitrary function entry and
    exit as a trace event.
  . Add a new tracepoint events for tracing raw tracepoint as a trace
    event. This allows user to trace non user-exposed tracepoints.
  . Move eprobe's event parser code into probe event common file.
  . Introduce BTF (BPF type format) support to kernel probe (kprobe,
    fprobe and tracepoint probe) events so that user can specify traced
    function arguments by name. This also applies the type of argument
    when fetching the argument.
  . Introduce '$arg*' wildcard support if BTF is available. This expands
    the '$arg*' meta argument to all function argument automatically.
  . Check the return value types by BTF. If the function returns 'void',
    '$retval' is rejected.
  . Add some selftest script for fprobe events, tracepoint events and
    BTF support.
  . Update documentation about the fprobe events.
  . Some fixes for above features, document and selftests.
 
 - selftests for ftrace (except for new fprobe events):
  . Add a test case for multiple consecutive probes in a function which
    checks if ftrace based kprobe, optimized kprobe and normal kprobe
    can be defined in the same target function.
  . Add a test case for optimized probe, which checks whether kprobe
    can be optimized or not.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEh7BulGwFlgAOi5DV2/sHvwUrPxsFAmSa+9MACgkQ2/sHvwUr
 PxsmOAgAmUOIWtvH5py7AZpIRhCj8B18F6KnT7w2hByCsRxf7SaCqMhpBCk9VnYv
 9fJFBHpvYRJEmpHoH3o2ET5AGfKVNac9z96AGI2qJ4ECWITd6I5+WfTdZ5ueVn2d
 f6DQ10mHXDHSMFbuqfYWSHtkeivJpWpUNHhwzPb4doNOe06bZNfVuSgnksFg1at5
 kq16HbvGnhPzdO4YHmvqwjmRHr5/nCI1KDE9xIBcqNtWFbiRigC11zaZEUkLX+vT
 F63ShyfCK718AiwDfnjXpGkXAiVOZuAIR8RELaSqQ92YHCFKq5k9K4++WllPR5f9
 AxjVultFDiCd4oSPgYpQkjuZdFq9NA==
 =IhmY
 -----END PGP SIGNATURE-----

Merge tag 'probes-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull probes updates from Masami Hiramatsu:

 - fprobe: Pass return address to the fprobe entry/exit callbacks so
   that the callbacks don't need to analyze pt_regs/stack to find the
   function return address.

 - kprobe events: cleanup usage of TPARG_FL_FENTRY and TPARG_FL_RETURN
   flags so that those are not set at once.

 - fprobe events:
      - Add a new fprobe events for tracing arbitrary function entry and
        exit as a trace event.
      - Add a new tracepoint events for tracing raw tracepoint as a
        trace event. This allows user to trace non user-exposed
        tracepoints.
      - Move eprobe's event parser code into probe event common file.
      - Introduce BTF (BPF type format) support to kernel probe (kprobe,
        fprobe and tracepoint probe) events so that user can specify
        traced function arguments by name. This also applies the type of
        argument when fetching the argument.
      - Introduce '$arg*' wildcard support if BTF is available. This
        expands the '$arg*' meta argument to all function argument
        automatically.
      - Check the return value types by BTF. If the function returns
        'void', '$retval' is rejected.
      - Add some selftest script for fprobe events, tracepoint events
        and BTF support.
      - Update documentation about the fprobe events.
      - Some fixes for above features, document and selftests.

 - selftests for ftrace (in addition to the new fprobe events):
      - Add a test case for multiple consecutive probes in a function
        which checks if ftrace based kprobe, optimized kprobe and normal
        kprobe can be defined in the same target function.
      - Add a test case for optimized probe, which checks whether kprobe
        can be optimized or not.

* tag 'probes-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing/probes: Fix tracepoint event with $arg* to fetch correct argument
  Documentation: Fix typo of reference file name
  tracing/probes: Fix to return NULL and keep using current argc
  selftests/ftrace: Add new test case which checks for optimized probes
  selftests/ftrace: Add new test case which adds multiple consecutive probes in a function
  Documentation: tracing/probes: Add fprobe event tracing document
  selftests/ftrace: Add BTF arguments test cases
  selftests/ftrace: Add tracepoint probe test case
  tracing/probes: Add BTF retval type support
  tracing/probes: Add $arg* meta argument for all function args
  tracing/probes: Support function parameters if BTF is available
  tracing/probes: Move event parameter fetching code to common parser
  tracing/probes: Add tracepoint support on fprobe_events
  selftests/ftrace: Add fprobe related testcases
  tracing/probes: Add fprobe events for tracing function entry and exit.
  tracing/probes: Avoid setting TPARG_FL_FENTRY and TPARG_FL_RETURN
  fprobe: Pass return address to the handlers
2023-06-30 10:44:53 -07:00
Donglin Peng
a1be9ccc57 function_graph: Support recording and printing the return value of function
Analyzing system call failures with the function_graph tracer can be a
time-consuming process, particularly when locating the kernel function
that first returns an error in the trace logs. This change aims to
simplify the process by recording the function return value to the
'retval' member of 'ftrace_graph_ret' and printing it when outputting
the trace log.

We have introduced new trace options: funcgraph-retval and
funcgraph-retval-hex. The former controls whether to display the return
value, while the latter controls the display format.

Please note that even if a function's return type is void, a return
value will still be printed. You can simply ignore it.

This patch only establishes the fundamental infrastructure. Subsequent
patches will make this feature available on some commonly used processor
architectures.

Here is an example:

I attempted to attach the demo process to a cpu cgroup, but it failed:

echo `pidof demo` > /sys/fs/cgroup/cpu/test/tasks
-bash: echo: write error: Invalid argument

The strace logs indicate that the write system call returned -EINVAL(-22):
...
write(1, "273\n", 4)                    = -1 EINVAL (Invalid argument)
...

To capture trace logs during a write system call, use the following
commands:

cd /sys/kernel/debug/tracing/
echo 0 > tracing_on
echo > trace
echo *sys_write > set_graph_function
echo *spin* > set_graph_notrace
echo *rcu* >> set_graph_notrace
echo *alloc* >> set_graph_notrace
echo preempt* >> set_graph_notrace
echo kfree* >> set_graph_notrace
echo $$ > set_ftrace_pid
echo function_graph > current_tracer
echo 1 > options/funcgraph-retval
echo 0 > options/funcgraph-retval-hex
echo 1 > tracing_on
echo `pidof demo` > /sys/fs/cgroup/cpu/test/tasks
echo 0 > tracing_on
cat trace > ~/trace.log

To locate the root cause, search for error code -22 directly in the file
trace.log and identify the first function that returned -22. Once you
have identified this function, examine its code to determine the root
cause.

For example, in the trace log below, cpu_cgroup_can_attach
returned -22 first, so we can focus our analysis on this function to
identify the root cause.

...

 1)          | cgroup_migrate() {
 1) 0.651 us |   cgroup_migrate_add_task(); /* = 0xffff93fcfd346c00 */
 1)          |   cgroup_migrate_execute() {
 1)          |     cpu_cgroup_can_attach() {
 1)          |       cgroup_taskset_first() {
 1) 0.732 us |         cgroup_taskset_next(); /* = 0xffff93fc8fb20000 */
 1) 1.232 us |       } /* cgroup_taskset_first = 0xffff93fc8fb20000 */
 1) 0.380 us |       sched_rt_can_attach(); /* = 0x0 */
 1) 2.335 us |     } /* cpu_cgroup_can_attach = -22 */
 1) 4.369 us |   } /* cgroup_migrate_execute = -22 */
 1) 7.143 us | } /* cgroup_migrate = -22 */

...

Link: https://lkml.kernel.org/r/1fc502712c981e0e6742185ba242992170ac9da8.1680954589.git.pengdonglin@sangfor.com.cn

Tested-by: Florian Kauer <florian.kauer@linutronix.de>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Donglin Peng <pengdonglin@sangfor.com.cn>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-06-20 18:38:37 -04:00
Masami Hiramatsu (Google)
b576e09701 tracing/probes: Support function parameters if BTF is available
Support function or tracepoint parameters by name if BTF support is enabled
and the event is for function entry (this feature can be used with kprobe-
events, fprobe-events and tracepoint probe events.)

Note that the BTF variable syntax does not require a prefix. If it starts
with an alphabetic character or an underscore ('_') without a prefix like
'$' and '%', it is considered as a BTF variable.
If you specify only the BTF variable name, the argument name will also
be the same name instead of 'arg*'.

 # echo 'p vfs_read count pos' >> dynamic_events
 # echo 'f vfs_write count pos' >> dynamic_events
 # echo 't sched_overutilized_tp rd overutilized' >> dynamic_events
 # cat dynamic_events
p:kprobes/p_vfs_read_0 vfs_read count=count pos=pos
f:fprobes/vfs_write__entry vfs_write count=count pos=pos
t:tracepoints/sched_overutilized_tp sched_overutilized_tp rd=rd overutilized=overutilized

Link: https://lore.kernel.org/all/168507474014.913472.16963996883278039183.stgit@mhiramat.roam.corp.google.com/

Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Tested-by: Alan Maguire <alan.maguire@oracle.com>
2023-06-06 21:39:56 +09:00
Masami Hiramatsu (Google)
334e5519c3 tracing/probes: Add fprobe events for tracing function entry and exit.
Add fprobe events for tracing function entry and exit instead of kprobe
events. With this change, we can continue to trace function entry/exit
even if the CONFIG_KPROBES_ON_FTRACE is not available. Since
CONFIG_KPROBES_ON_FTRACE requires the CONFIG_DYNAMIC_FTRACE_WITH_REGS,
it is not available if the architecture only supports
CONFIG_DYNAMIC_FTRACE_WITH_ARGS. And that means kprobe events can not
probe function entry/exit effectively on such architecture.
But this can be solved if the dynamic events supports fprobe events.

The fprobe event is a new dynamic events which is only for the function
(symbol) entry and exit. This event accepts non register fetch arguments
so that user can trace the function arguments and return values.

The fprobe events syntax is here;

 f[:[GRP/][EVENT]] FUNCTION [FETCHARGS]
 f[MAXACTIVE][:[GRP/][EVENT]] FUNCTION%return [FETCHARGS]

E.g.

 # echo 'f vfs_read $arg1'  >> dynamic_events
 # echo 'f vfs_read%return $retval'  >> dynamic_events
 # cat dynamic_events
 f:fprobes/vfs_read__entry vfs_read arg1=$arg1
 f:fprobes/vfs_read__exit vfs_read%return arg1=$retval
 # echo 1 > events/fprobes/enable
 # head -n 20 trace | tail
 #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
 #              | |         |   |||||     |         |
              sh-142     [005] ...1.   448.386420: vfs_read__entry: (vfs_read+0x4/0x340) arg1=0xffff888007f7c540
              sh-142     [005] .....   448.386436: vfs_read__exit: (ksys_read+0x75/0x100 <- vfs_read) arg1=0x1
              sh-142     [005] ...1.   448.386451: vfs_read__entry: (vfs_read+0x4/0x340) arg1=0xffff888007f7c540
              sh-142     [005] .....   448.386458: vfs_read__exit: (ksys_read+0x75/0x100 <- vfs_read) arg1=0x1
              sh-142     [005] ...1.   448.386469: vfs_read__entry: (vfs_read+0x4/0x340) arg1=0xffff888007f7c540
              sh-142     [005] .....   448.386476: vfs_read__exit: (ksys_read+0x75/0x100 <- vfs_read) arg1=0x1
              sh-142     [005] ...1.   448.602073: vfs_read__entry: (vfs_read+0x4/0x340) arg1=0xffff888007f7c540
              sh-142     [005] .....   448.602089: vfs_read__exit: (ksys_read+0x75/0x100 <- vfs_read) arg1=0x1

Link: https://lore.kernel.org/all/168507469754.913472.6112857614708350210.stgit@mhiramat.roam.corp.google.com/

Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/all/202302011530.7vm4O8Ro-lkp@intel.com/
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2023-06-06 21:39:55 +09:00
Steven Rostedt (Google)
88fe1ec75f tracing: Unbreak user events
The user events was added a bit prematurely, and there were a few kernel
developers that had issues with it. The API also needed a bit of work to
make sure it would be stable. It was decided to make user events "broken"
until this was settled. Now it has a new API that appears to be as stable
as it will be without the use of a crystal ball. It's being used within
Microsoft as is, which means the API has had some testing in real world
use cases. It went through many discussions in the bi-weekly tracing
meetings, and there's been no more comments about updates.

I feel this is good to go.

Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-03-29 06:52:09 -04:00
Beau Belgrave
7235759084 tracing/user_events: Use remote writes for event enablement
As part of the discussions for user_events aligned with user space
tracers, it was determined that user programs should register a aligned
value to set or clear a bit when an event becomes enabled. Currently a
shared page is being used that requires mmap(). Remove the shared page
implementation and move to a user registered address implementation.

In this new model during the event registration from user programs 3 new
values are specified. The first is the address to update when the event
is either enabled or disabled. The second is the bit to set/clear to
reflect the event being enabled. The third is the size of the value at
the specified address.

This allows for a local 32/64-bit value in user programs to support
both kernel and user tracers. As an example, setting bit 31 for kernel
tracers when the event becomes enabled allows for user tracers to use
the other bits for ref counts or other flags. The kernel side updates
the bit atomically, user programs need to also update these values
atomically.

User provided addresses must be aligned on a natural boundary, this
allows for single page checking and prevents odd behaviors such as a
enable value straddling 2 pages instead of a single page. Currently
page faults are only logged, future patches will handle these.

Link: https://lkml.kernel.org/r/20230328235219.203-4-beaub@linux.microsoft.com

Suggested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-03-29 06:52:08 -04:00
Florent Revest
60c8971899 ftrace: Make DIRECT_CALLS work WITH_ARGS and !WITH_REGS
Direct called trampolines can be called in two ways:
- either from the ftrace callsite. In this case, they do not access any
  struct ftrace_regs nor pt_regs
- Or, if a ftrace ops is also attached, from the end of a ftrace
  trampoline. In this case, the call_direct_funcs ops is in charge of
  setting the direct call trampoline's address in a struct ftrace_regs

Since:

commit 9705bc7096 ("ftrace: pass fregs to arch_ftrace_set_direct_caller()")

The later case no longer requires a full pt_regs. It only needs a struct
ftrace_regs so DIRECT_CALLS can work with both WITH_ARGS or WITH_REGS.
With architectures like arm64 already abandoning WITH_REGS in favor of
WITH_ARGS, it's important to have DIRECT_CALLS work WITH_ARGS only.

Link: https://lkml.kernel.org/r/20230321140424.345218-7-revest@chromium.org

Signed-off-by: Florent Revest <revest@chromium.org>
Co-developed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-03-21 13:43:32 -04:00
Linus Torvalds
b72b5fecc1 tracing updates for 6.3:
- Add function names as a way to filter function addresses
 
 - Add sample module to test ftrace ops and dynamic trampolines
 
 - Allow stack traces to be passed from beginning event to end event for
   synthetic events. This will allow seeing the stack trace of when a task is
   scheduled out and recorded when it gets scheduled back in.
 
 - Add trace event helper __get_buf() to use as a temporary buffer when printing
   out trace event output.
 
 - Add kernel command line to create trace instances on boot up.
 
 - Add enabling of events to instances created at boot up.
 
 - Add trace_array_puts() to write into instances.
 
 - Allow boot instances to take a snapshot at the end of boot up.
 
 - Allow live patch modules to include trace events
 
 - Minor fixes and clean ups
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCY/PaaBQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qh5iAPoD0LKZzD33rhO5Ec4hoexE0DkqycP3
 dvmOMbCBL8GkxwEA+d2gLz/EquSFm166hc4D79Sn3geCqvkwmy8vQWVjIQc=
 =M82D
 -----END PGP SIGNATURE-----

Merge tag 'trace-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing updates from Steven Rostedt:

 - Add function names as a way to filter function addresses

 - Add sample module to test ftrace ops and dynamic trampolines

 - Allow stack traces to be passed from beginning event to end event for
   synthetic events. This will allow seeing the stack trace of when a
   task is scheduled out and recorded when it gets scheduled back in.

 - Add trace event helper __get_buf() to use as a temporary buffer when
   printing out trace event output.

 - Add kernel command line to create trace instances on boot up.

 - Add enabling of events to instances created at boot up.

 - Add trace_array_puts() to write into instances.

 - Allow boot instances to take a snapshot at the end of boot up.

 - Allow live patch modules to include trace events

 - Minor fixes and clean ups

* tag 'trace-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (31 commits)
  tracing: Remove unnecessary NULL assignment
  tracepoint: Allow livepatch module add trace event
  tracing: Always use canonical ftrace path
  tracing/histogram: Fix stacktrace histogram Documententation
  tracing/histogram: Fix stacktrace key
  tracing/histogram: Fix a few problems with stacktrace variable printing
  tracing: Add BUILD_BUG() to make sure stacktrace fits in strings
  tracing/histogram: Don't use strlen to find length of stacktrace variables
  tracing: Allow boot instances to have snapshot buffers
  tracing: Add trace_array_puts() to write into instance
  tracing: Add enabling of events to boot instances
  tracing: Add creation of instances at boot command line
  tracing: Fix trace_event_raw_event_synth() if else statement
  samples: ftrace: Make some global variables static
  ftrace: sample: avoid open-coded 64-bit division
  samples: ftrace: Include the nospec-branch.h only for x86
  tracing: Acquire buffer from temparary trace sequence
  tracing/histogram: Wrap remaining shell snippets in code blocks
  tracing/osnoise: No need for schedule_hrtimeout range
  bpf/tracing: Use stage6 of tracing to not duplicate macros
  ...
2023-02-23 10:20:49 -08:00
Linus Torvalds
8bf1a529cd arm64 updates for 6.3:
- Support for arm64 SME 2 and 2.1. SME2 introduces a new 512-bit
   architectural register (ZT0, for the look-up table feature) that Linux
   needs to save/restore.
 
 - Include TPIDR2 in the signal context and add the corresponding
   kselftests.
 
 - Perf updates: Arm SPEv1.2 support, HiSilicon uncore PMU updates, ACPI
   support to the Marvell DDR and TAD PMU drivers, reset DTM_PMU_CONFIG
   (ARM CMN) at probe time.
 
 - Support for DYNAMIC_FTRACE_WITH_CALL_OPS on arm64.
 
 - Permit EFI boot with MMU and caches on. Instead of cleaning the entire
   loaded kernel image to the PoC and disabling the MMU and caches before
   branching to the kernel bare metal entry point, leave the MMU and
   caches enabled and rely on EFI's cacheable 1:1 mapping of all of
   system RAM to populate the initial page tables.
 
 - Expose the AArch32 (compat) ELF_HWCAP features to user in an arm64
   kernel (the arm32 kernel only defines the values).
 
 - Harden the arm64 shadow call stack pointer handling: stash the shadow
   stack pointer in the task struct on interrupt, load it directly from
   this structure.
 
 - Signal handling cleanups to remove redundant validation of size
   information and avoid reading the same data from userspace twice.
 
 - Refactor the hwcap macros to make use of the automatically generated
   ID registers. It should make new hwcaps writing less error prone.
 
 - Further arm64 sysreg conversion and some fixes.
 
 - arm64 kselftest fixes and improvements.
 
 - Pointer authentication cleanups: don't sign leaf functions, unify
   asm-arch manipulation.
 
 - Pseudo-NMI code generation optimisations.
 
 - Minor fixes for SME and TPIDR2 handling.
 
 - Miscellaneous updates: ARCH_FORCE_MAX_ORDER is now selectable, replace
   strtobool() to kstrtobool() in the cpufeature.c code, apply dynamic
   shadow call stack in two passes, intercept pfn changes in set_pte_at()
   without the required break-before-make sequence, attempt to dump all
   instructions on unhandled kernel faults.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmP0/QsACgkQa9axLQDI
 XvG+gA/+JDVEH9wRzAIZvbp9hSuohPc48xgAmIMP1eiVB0/5qeRjYAJwS33H0rXS
 BPC2kj9IBy/eQeM9ICg0nFd0zYznSVacITqe6NrqeJ1F+ftS4rrHdfxd+J7kIoCs
 V2L8e+BJvmHdhmNV2qMAgJdGlfxfQBA7fv2cy52HKYcouoOh1AUVR/x+yXVXAsCd
 qJP3+dlUKccgm/oc5unEC1eZ49u8O+EoasqOyfG6K5udMgzhEX3K6imT9J3hw0WT
 UjstYkx5uGS/prUrRCQAX96VCHoZmzEDKtQuHkHvQXEYXsYPF3ldbR2CziNJnHe7
 QfSkjJlt8HAtExA+BkwEe9i0MQO/2VF5qsa2e4fA6l7uqGu3LOtS/jJd23C9n9fR
 Id8aBMeN6S8+MjqRA9L2uf4t6e4ISEHoG9ZRdc4WOwloxEEiJoIeun+7bHdOSZLj
 AFdHFCz4NXiiwC0UP0xPDI2YeCLqt5np7HmnrUqwzRpVO8UUagiJD8TIpcBSjBN9
 J68eidenHUW7/SlIeaMKE2lmo8AUEAJs9AorDSugF19/ThJcQdx7vT2UAZjeVB3j
 1dbbwajnlDOk/w8PQC4thFp5/MDlfst0htS3WRwa+vgkweE2EAdTU4hUZ8qEP7FQ
 smhYtlT1xUSTYDTqoaG/U2OWR6/UU79wP0jgcOsHXTuyYrtPI/Q=
 =VmXL
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Catalin Marinas:

 - Support for arm64 SME 2 and 2.1. SME2 introduces a new 512-bit
   architectural register (ZT0, for the look-up table feature) that
   Linux needs to save/restore

 - Include TPIDR2 in the signal context and add the corresponding
   kselftests

 - Perf updates: Arm SPEv1.2 support, HiSilicon uncore PMU updates, ACPI
   support to the Marvell DDR and TAD PMU drivers, reset DTM_PMU_CONFIG
   (ARM CMN) at probe time

 - Support for DYNAMIC_FTRACE_WITH_CALL_OPS on arm64

 - Permit EFI boot with MMU and caches on. Instead of cleaning the
   entire loaded kernel image to the PoC and disabling the MMU and
   caches before branching to the kernel bare metal entry point, leave
   the MMU and caches enabled and rely on EFI's cacheable 1:1 mapping of
   all of system RAM to populate the initial page tables

 - Expose the AArch32 (compat) ELF_HWCAP features to user in an arm64
   kernel (the arm32 kernel only defines the values)

 - Harden the arm64 shadow call stack pointer handling: stash the shadow
   stack pointer in the task struct on interrupt, load it directly from
   this structure

 - Signal handling cleanups to remove redundant validation of size
   information and avoid reading the same data from userspace twice

 - Refactor the hwcap macros to make use of the automatically generated
   ID registers. It should make new hwcaps writing less error prone

 - Further arm64 sysreg conversion and some fixes

 - arm64 kselftest fixes and improvements

 - Pointer authentication cleanups: don't sign leaf functions, unify
   asm-arch manipulation

 - Pseudo-NMI code generation optimisations

 - Minor fixes for SME and TPIDR2 handling

 - Miscellaneous updates: ARCH_FORCE_MAX_ORDER is now selectable,
   replace strtobool() to kstrtobool() in the cpufeature.c code, apply
   dynamic shadow call stack in two passes, intercept pfn changes in
   set_pte_at() without the required break-before-make sequence, attempt
   to dump all instructions on unhandled kernel faults

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (130 commits)
  arm64: fix .idmap.text assertion for large kernels
  kselftest/arm64: Don't require FA64 for streaming SVE+ZA tests
  kselftest/arm64: Copy whole EXTRA context
  arm64: kprobes: Drop ID map text from kprobes blacklist
  perf: arm_spe: Print the version of SPE detected
  perf: arm_spe: Add support for SPEv1.2 inverted event filtering
  perf: Add perf_event_attr::config3
  arm64/sme: Fix __finalise_el2 SMEver check
  drivers/perf: fsl_imx8_ddr_perf: Remove set-but-not-used variable
  arm64/signal: Only read new data when parsing the ZT context
  arm64/signal: Only read new data when parsing the ZA context
  arm64/signal: Only read new data when parsing the SVE context
  arm64/signal: Avoid rereading context frame sizes
  arm64/signal: Make interface for restore_fpsimd_context() consistent
  arm64/signal: Remove redundant size validation from parse_user_sigframe()
  arm64/signal: Don't redundantly verify FPSIMD magic
  arm64/cpufeature: Use helper macros to specify hwcaps
  arm64/cpufeature: Always use symbolic name for feature value in hwcaps
  arm64/sysreg: Initial unsigned annotations for ID registers
  arm64/sysreg: Initial annotation of signed ID registers
  ...
2023-02-21 15:27:48 -08:00
Ross Zwisler
2455f0e124 tracing: Always use canonical ftrace path
The canonical location for the tracefs filesystem is at /sys/kernel/tracing.

But, from Documentation/trace/ftrace.rst:

  Before 4.1, all ftrace tracing control files were within the debugfs
  file system, which is typically located at /sys/kernel/debug/tracing.
  For backward compatibility, when mounting the debugfs file system,
  the tracefs file system will be automatically mounted at:

  /sys/kernel/debug/tracing

Many comments and Kconfig help messages in the tracing code still refer
to this older debugfs path, so let's update them to avoid confusion.

Link: https://lore.kernel.org/linux-trace-kernel/20230215223350.2658616-2-zwisler@google.com

Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Mukesh Ojha <quic_mojha@quicinc.com>
Signed-off-by: Ross Zwisler <zwisler@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-02-18 14:34:09 -05:00
Randy Dunlap
ac28d0a0f4 tracing: Kconfig: Fix spelling/grammar/punctuation
Fix some editorial nits in trace Kconfig.

Link: https://lkml.kernel.org/r/20230124181647.15902-1-rdunlap@infradead.org

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-01-24 13:22:38 -05:00
Mark Rutland
cbad0fb2d8 ftrace: Add DYNAMIC_FTRACE_WITH_CALL_OPS
Architectures without dynamic ftrace trampolines incur an overhead when
multiple ftrace_ops are enabled with distinct filters. in these cases,
each call site calls a common trampoline which uses
ftrace_ops_list_func() to iterate over all enabled ftrace functions, and
so incurs an overhead relative to the size of this list (including RCU
protection overhead).

Architectures with dynamic ftrace trampolines avoid this overhead for
call sites which have a single associated ftrace_ops. In these cases,
the dynamic trampoline is customized to branch directly to the relevant
ftrace function, avoiding the list overhead.

On some architectures it's impractical and/or undesirable to implement
dynamic ftrace trampolines. For example, arm64 has limited branch ranges
and cannot always directly branch from a call site to an arbitrary
address (e.g. from a kernel text address to an arbitrary module
address). Calls from modules to core kernel text can be indirected via
PLTs (allocated at module load time) to address this, but the same is
not possible from calls from core kernel text.

Using an indirect branch from a call site to an arbitrary trampoline is
possible, but requires several more instructions in the function
prologue (or immediately before it), and/or comes with far more complex
requirements for patching.

Instead, this patch adds a new option, where an architecture can
associate each call site with a pointer to an ftrace_ops, placed at a
fixed offset from the call site. A shared trampoline can recover this
pointer and call ftrace_ops::func() without needing to go via
ftrace_ops_list_func(), avoiding the associated overhead.

This avoids issues with branch range limitations, and avoids the need to
allocate and manipulate dynamic trampolines, making it far simpler to
implement and maintain, while having similar performance
characteristics.

Note that this allows for dynamic ftrace_ops to be invoked directly from
an architecture's ftrace_caller trampoline, whereas existing code forces
the use of ftrace_ops_get_list_func(), which is in part necessary to
permit the ftrace_ops to be freed once unregistered *and* to avoid
branch/address-generation range limitation on some architectures (e.g.
where ops->func is a module address, and may be outside of the direct
branch range for callsites within the main kernel image).

The CALL_OPS approach avoids this problems and is safe as:

* The existing synchronization in ftrace_shutdown() using
  ftrace_shutdown() using synchronize_rcu_tasks_rude() (and
  synchronize_rcu_tasks()) ensures that no tasks hold a stale reference
  to an ftrace_ops (e.g. in the middle of the ftrace_caller trampoline,
  or while invoking ftrace_ops::func), when that ftrace_ops is
  unregistered.

  Arguably this could also be relied upon for the existing scheme,
  permitting dynamic ftrace_ops to be invoked directly when ops->func is
  in range, but this will require additional logic to handle branch
  range limitations, and is not handled by this patch.

* Each callsite's ftrace_ops pointer literal can hold any valid kernel
  address, and is updated atomically. As an architecture's ftrace_caller
  trampoline will atomically load the ops pointer then dereference
  ops->func, there is no risk of invoking ops->func with a mismatches
  ops pointer, and updates to the ops pointer do not require special
  care.

A subsequent patch will implement architectures support for arm64. There
should be no functional change as a result of this patch alone.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Cc: Florent Revest <revest@chromium.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230123134603.1064407-2-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-01-24 11:49:42 +00:00
Linus Torvalds
5f6e430f93 powerpc updates for 6.2
- Add powerpc qspinlock implementation optimised for large system scalability and
    paravirt. See the merge message for more details.
 
  - Enable objtool to be built on powerpc to generate mcount locations.
 
  - Use a temporary mm for code patching with the Radix MMU, so the writable mapping is
    restricted to the patching CPU.
 
  - Add an option to build the 64-bit big-endian kernel with the ELFv2 ABI.
 
  - Sanitise user registers on interrupt entry on 64-bit Book3S.
 
  - Many other small features and fixes.
 
 Thanks to: Aboorva Devarajan, Angel Iglesias, Benjamin Gray, Bjorn Helgaas, Bo Liu, Chen
 Lifu, Christoph Hellwig, Christophe JAILLET, Christophe Leroy, Christopher M. Riedl, Colin
 Ian King, Deming Wang, Disha Goel, Dmitry Torokhov, Finn Thain, Geert Uytterhoeven,
 Gustavo A. R. Silva, Haowen Bai, Joel Stanley, Jordan Niethe, Julia Lawall, Kajol Jain,
 Laurent Dufour, Li zeming, Miaoqian Lin, Michael Jeanson, Nathan Lynch, Naveen N. Rao,
 Nayna Jain, Nicholas Miehlbradt, Nicholas Piggin, Pali Rohár, Randy Dunlap, Rohan McLure,
 Russell Currey, Sathvika Vasireddy, Shaomin Deng, Stephen Kitt, Stephen Rothwell, Thomas
 Weißschuh, Tiezhu Yang, Uwe Kleine-König, Xie Shaowen, Xiu Jianfeng, XueBing Chen, Yang
 Yingliang, Zhang Jiaming, ruanjinjie, Jessica Yu, Wolfram Sang.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmOfrj8THG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgIWtD/9mGF/ze2k+qFTo+30fb7bO8WJIDgsR
 dIASnZjXV7q/45elvymhUdkQv4R7xL3pzC40P1+ZKtWzGTNe+zWUQLoALNwRK85j
 8CsxZbqefGNKE5Z6ZHo9s37wsu3+jJu9yEQpGFo1LINyzeclCn5St5oqfRam+Hd/
 cPF+VfvREwZ0+YOKGBhJ2EgC+Gc9xsFY7DLQsoYlu71iZZr6Z6rgZW/EY5h3RMGS
 YKBoVwDsWaU0FpFWrr/rYTI6DqSr3AHr1+ftDg7ncCZMD6vQva6aMCCt94aLB1aE
 vC+DNdhZlA558bXGa5yA7Wr//7aUBUIwyC60DogOeZ6vw3kD9tdEd1fbH5hmqNKY
 K5bfqm28XU2959CTE8RDgsYYZvwDcfrjBIML14WZGdCQOTcGKpgOGp22o6yNb1Pq
 JKpHHnVpvu2PZ/p2XdKSm9+etr2yI6lXZAEVTS7ehdtMukButjSHEVbSCEZ8tlWz
 KokQt2J23BMHuSrXK6+67wWQBtdsLEk+LBOQmweiwarMocqvL/Zjz/5J7DR2DtH8
 wlY3wOtB1+E5j7xZ+RgK3c3jNg5dH39ZwvFsSATWTI3P+iq6OK/bbk4q4LmZt2l9
 ZIfH/CXPf9BvGCHzHa3AAd3UBbJLFwj17btMEv1wFVPS0T4LPUzkgTNTNUYeP6zL
 h1e5QfgUxvKPuQ==
 =7k3p
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:

 - Add powerpc qspinlock implementation optimised for large system
   scalability and paravirt. See the merge message for more details

 - Enable objtool to be built on powerpc to generate mcount locations

 - Use a temporary mm for code patching with the Radix MMU, so the
   writable mapping is restricted to the patching CPU

 - Add an option to build the 64-bit big-endian kernel with the ELFv2
   ABI

 - Sanitise user registers on interrupt entry on 64-bit Book3S

 - Many other small features and fixes

Thanks to Aboorva Devarajan, Angel Iglesias, Benjamin Gray, Bjorn
Helgaas, Bo Liu, Chen Lifu, Christoph Hellwig, Christophe JAILLET,
Christophe Leroy, Christopher M. Riedl, Colin Ian King, Deming Wang,
Disha Goel, Dmitry Torokhov, Finn Thain, Geert Uytterhoeven, Gustavo A.
R. Silva, Haowen Bai, Joel Stanley, Jordan Niethe, Julia Lawall, Kajol
Jain, Laurent Dufour, Li zeming, Miaoqian Lin, Michael Jeanson, Nathan
Lynch, Naveen N. Rao, Nayna Jain, Nicholas Miehlbradt, Nicholas Piggin,
Pali Rohár, Randy Dunlap, Rohan McLure, Russell Currey, Sathvika
Vasireddy, Shaomin Deng, Stephen Kitt, Stephen Rothwell, Thomas
Weißschuh, Tiezhu Yang, Uwe Kleine-König, Xie Shaowen, Xiu Jianfeng,
XueBing Chen, Yang Yingliang, Zhang Jiaming, ruanjinjie, Jessica Yu,
and Wolfram Sang.

* tag 'powerpc-6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (181 commits)
  powerpc/code-patching: Fix oops with DEBUG_VM enabled
  powerpc/qspinlock: Fix 32-bit build
  powerpc/prom: Fix 32-bit build
  powerpc/rtas: mandate RTAS syscall filtering
  powerpc/rtas: define pr_fmt and convert printk call sites
  powerpc/rtas: clean up includes
  powerpc/rtas: clean up rtas_error_log_max initialization
  powerpc/pseries/eeh: use correct API for error log size
  powerpc/rtas: avoid scheduling in rtas_os_term()
  powerpc/rtas: avoid device tree lookups in rtas_os_term()
  powerpc/rtasd: use correct OF API for event scan rate
  powerpc/rtas: document rtas_call()
  powerpc/pseries: unregister VPA when hot unplugging a CPU
  powerpc/pseries: reset the RCU watchdogs after a LPM
  powerpc: Take in account addition CPU node when building kexec FDT
  powerpc: export the CPU node count
  powerpc/cpuidle: Set CPUIDLE_FLAG_POLLING for snooze state
  powerpc/dts/fsl: Fix pca954x i2c-mux node names
  cxl: Remove unnecessary cxl_pci_window_alignment()
  selftests/powerpc: Fix resource leaks
  ...
2022-12-19 07:13:33 -06:00
Linus Torvalds
fe36bb8736 Tracing updates for 6.2:
- Add options to the osnoise tracer
   o panic_on_stop option that panics the kernel if osnoise is greater than some
     user defined threshold.
   o preempt option, to test noise while preemption is disabled
   o irq option, to test noise when interrupts are disabled
 
 - Add .percent and .graph suffix to histograms to give different outputs
 
 - Add nohitcount to disable showing hitcount in histogram output
 
 - Add new __cpumask() to trace event fields to annotate that a unsigned long
   array is a cpumask to user space and should be treated as one.
 
 - Add trace_trigger kernel command line parameter to enable trace event
   triggers at boot up. Useful to trace stack traces, disable tracing and take
   snapshots.
 
 - Fix x86/kmmio mmio tracer to work with the updates to lockdep
 
 - Unify the panic and die notifiers
 
 - Add back ftrace_expect reference that is used to extract more information in
   the ftrace_bug() code.
 
 - Have trigger filter parsing errors show up in the tracing error log.
 
 - Updated MAINTAINERS file to add kernel tracing  mailing list and patchwork
   info
 
 - Use IDA to keep track of event type numbers.
 
 - And minor fixes and clean ups
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCY5vIcxQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qlO7AQCmtZbriadAR6N7Llj092YXmYfzrxyi
 1WS35vhpZsBJ8gEA8j68l+LrgNt51N2gXlTXEHgXzdBgL/TKAPSX4D99GQY=
 =z1pe
 -----END PGP SIGNATURE-----

Merge tag 'trace-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing updates from Steven Rostedt:

 - Add options to the osnoise tracer:
      - 'panic_on_stop' option that panics the kernel if osnoise is
        greater than some user defined threshold.
      - 'preempt' option, to test noise while preemption is disabled
      - 'irq' option, to test noise when interrupts are disabled

 - Add .percent and .graph suffix to histograms to give different
   outputs

 - Add nohitcount to disable showing hitcount in histogram output

 - Add new __cpumask() to trace event fields to annotate that a unsigned
   long array is a cpumask to user space and should be treated as one.

 - Add trace_trigger kernel command line parameter to enable trace event
   triggers at boot up. Useful to trace stack traces, disable tracing
   and take snapshots.

 - Fix x86/kmmio mmio tracer to work with the updates to lockdep

 - Unify the panic and die notifiers

 - Add back ftrace_expect reference that is used to extract more
   information in the ftrace_bug() code.

 - Have trigger filter parsing errors show up in the tracing error log.

 - Updated MAINTAINERS file to add kernel tracing mailing list and
   patchwork info

 - Use IDA to keep track of event type numbers.

 - And minor fixes and clean ups

* tag 'trace-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (44 commits)
  tracing: Fix cpumask() example typo
  tracing: Improve panic/die notifiers
  ftrace: Prevent RCU stall on PREEMPT_VOLUNTARY kernels
  tracing: Do not synchronize freeing of trigger filter on boot up
  tracing: Remove pointer (asterisk) and brackets from cpumask_t field
  tracing: Have trigger filter parsing errors show up in error_log
  x86/mm/kmmio: Remove redundant preempt_disable()
  tracing: Fix infinite loop in tracing_read_pipe on overflowed print_trace_line
  Documentation/osnoise: Add osnoise/options documentation
  tracing/osnoise: Add preempt and/or irq disabled options
  tracing/osnoise: Add PANIC_ON_STOP option
  Documentation/osnoise: Escape underscore of NO_ prefix
  tracing: Fix some checker warnings
  tracing/osnoise: Make osnoise_options static
  tracing: remove unnecessary trace_trigger ifdef
  ring-buffer: Handle resize in early boot up
  tracing/hist: Fix issue of losting command info in error_log
  tracing: Fix issue of missing one synthetic field
  tracing/hist: Fix out-of-bound write on 'action_data.var_ref_idx'
  tracing/hist: Fix wrong return value in parse_action_params()
  ...
2022-12-15 18:01:16 -08:00
Masami Hiramatsu (Google)
e25e43a4e5 tracing: Fix complicated dependency of CONFIG_TRACER_MAX_TRACE
Both CONFIG_OSNOISE_TRACER and CONFIG_HWLAT_TRACER partially enables the
CONFIG_TRACER_MAX_TRACE code, but that is complicated and has
introduced a bug; It declares tracing_max_lat_fops data structure outside
of #ifdefs, but since it is defined only when CONFIG_TRACER_MAX_TRACE=y
or CONFIG_HWLAT_TRACER=y, if only CONFIG_OSNOISE_TRACER=y, that
declaration comes to a definition(!).

To fix this issue, and do not repeat the similar problem, makes
CONFIG_OSNOISE_TRACER and CONFIG_HWLAT_TRACER enables the
CONFIG_TRACER_MAX_TRACE always. It has there benefits;
- Fix the tracing_max_lat_fops bug
- Simplify the #ifdefs
- CONFIG_TRACER_MAX_TRACE code is fully enabled, or not.

Link: https://lore.kernel.org/linux-trace-kernel/167033628155.4111793.12185405690820208159.stgit@devnote3

Fixes: 424b650f35 ("tracing: Fix missing osnoise tracer on max_latency")
Cc: Daniel Bristot de Oliveira <bristot@kernel.org>
Cc: stable@vger.kernel.org
Reported-by: David Howells <dhowells@redhat.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Link: https://lore.kernel.org/all/166992525941.1716618.13740663757583361463.stgit@warthog.procyon.org.uk/ (original thread and v1)
Link: https://lore.kernel.org/all/202212052253.VuhZ2ulJ-lkp@intel.com/T/#u (v1 error report)
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-12-09 23:48:05 -05:00
Mark Rutland
94d095ffa0 ftrace: abstract DYNAMIC_FTRACE_WITH_ARGS accesses
In subsequent patches we'll arrange for architectures to have an
ftrace_regs which is entirely distinct from pt_regs. In preparation for
this, we need to minimize the use of pt_regs to where strictly necessary
in the core ftrace code.

This patch adds new ftrace_regs_{get,set}_*() helpers which can be used
to manipulate ftrace_regs. When CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS=y,
these can always be used on any ftrace_regs, and when
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS=n these can be used when regs are
available. A new ftrace_regs_has_args(fregs) helper is added which code
can use to check when these are usable.

Co-developed-by: Florent Revest <revest@chromium.org>
Signed-off-by: Florent Revest <revest@chromium.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20221103170520.931305-4-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-11-18 13:56:41 +00:00
Sathvika Vasireddy
280981d699 objtool: Add --mnop as an option to --mcount
Some architectures (powerpc) may not support ftrace locations being nop'ed
out at build time. Introduce CONFIG_HAVE_OBJTOOL_NOP_MCOUNT for objtool, as
a means for architectures to enable nop'ing of ftrace locations. Add --mnop
as an option to objtool --mcount, to indicate support for the same.

Also, make sure that --mnop can be passed as an option to objtool only when
--mcount is passed.

Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Sathvika Vasireddy <sv@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221114175754.1131267-12-sv@linux.ibm.com
2022-11-18 19:00:16 +11:00
Peter Zijlstra (Intel)
9440155ccb ftrace: Add HAVE_DYNAMIC_FTRACE_NO_PATCHABLE
x86 will shortly start using -fpatchable-function-entry for purposes
other than ftrace, make sure the __patchable_function_entry section
isn't merged in the mcount_loc section.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220903131154.420467-2-jolsa@kernel.org
2022-09-16 22:16:48 +02:00
Daniel Bristot de Oliveira
102227b970 rv: Add Runtime Verification (RV) interface
RV is a lightweight (yet rigorous) method that complements classical
exhaustive verification techniques (such as model checking and
theorem proving) with a more practical approach to complex systems.

RV works by analyzing the trace of the system's actual execution,
comparing it against a formal specification of the system behavior.
RV can give precise information on the runtime behavior of the
monitored system while enabling the reaction for unexpected
events, avoiding, for example, the propagation of a failure on
safety-critical systems.

The development of this interface roots in the development of the
paper:

De Oliveira, Daniel Bristot; Cucinotta, Tommaso; De Oliveira, Romulo
Silva. Efficient formal verification for the Linux kernel. In:
International Conference on Software Engineering and Formal Methods.
Springer, Cham, 2019. p. 315-332.

And:

De Oliveira, Daniel Bristot. Automata-based formal analysis
and verification of the real-time Linux kernel. PhD Thesis, 2020.

The RV interface resembles the tracing/ interface on purpose. The current
path for the RV interface is /sys/kernel/tracing/rv/.

It presents these files:

 "available_monitors"
   - List the available monitors, one per line.

   For example:
     # cat available_monitors
     wip
     wwnr

 "enabled_monitors"
   - Lists the enabled monitors, one per line;
   - Writing to it enables a given monitor;
   - Writing a monitor name with a '!' prefix disables it;
   - Truncating the file disables all enabled monitors.

   For example:
     # cat enabled_monitors
     # echo wip > enabled_monitors
     # echo wwnr >> enabled_monitors
     # cat enabled_monitors
     wip
     wwnr
     # echo '!wip' >> enabled_monitors
     # cat enabled_monitors
     wwnr
     # echo > enabled_monitors
     # cat enabled_monitors
     #

   Note that more than one monitor can be enabled concurrently.

 "monitoring_on"
   - It is an on/off general switcher for monitoring. Note
   that it does not disable enabled monitors or detach events,
   but stop the per-entity monitors of monitoring the events
   received from the system. It resembles the "tracing_on" switcher.

 "monitors/"
   Each monitor will have its one directory inside "monitors/". There
   the monitor specific files will be presented.
   The "monitors/" directory resembles the "events" directory on
   tracefs.

   For example:
     # cd monitors/wip/
     # ls
     desc  enable
     # cat desc
     wakeup in preemptive per-cpu testing monitor.
     # cat enable
     0

For further information, see the comments in the header of
kernel/trace/rv/rv.c from this patch.

Link: https://lkml.kernel.org/r/a4bfe038f50cb047bfb343ad0e12b0e646ab308b.1659052063.git.bristot@kernel.org

Cc: Wim Van Sebroeck <wim@linux-watchdog.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marco Elver <elver@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Gabriele Paoloni <gpaoloni@redhat.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: Tao Zhou <tao.zhou@linux.dev>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-trace-devel@vger.kernel.org
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-07-30 14:01:28 -04:00
Steven Rostedt (Google)
0a6d7d4541 ftrace: Be more specific about arch impact when function tracer is enabled
It was brought up that on ARMv7, that because the FUNCTION_TRACER does not
use nops to keep function tracing disabled because of the use of a link
register, it does have some performance impact.

The start of functions when -pg is used to compile the kernel is:

	push    {lr}
	bl      8010e7c0 <__gnu_mcount_nc>

When function tracing is tuned off, it becomes:

	push    {lr}
	add   sp, sp, #4

Which just puts the stack back to its normal location. But these two
instructions at the start of every function does incur some overhead.

Be more honest in the Kconfig FUNCTION_TRACER description and specify that
the overhead being in the noise was x86 specific, but other architectures
may vary.

Link: https://lore.kernel.org/all/20220705105416.GE5208@pengutronix.de/
Link: https://lkml.kernel.org/r/20220706161231.085a83da@gandalf.local.home

Reported-by: Sascha Hauer <sha@pengutronix.de>
Acked-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-07-12 16:36:34 -04:00
Linus Torvalds
22922deae1 Objtool changes for this cycle were:
- Comprehensive interface overhaul:
    =================================
 
    Objtool's interface has some issues:
 
      - Several features are done unconditionally, without any way to turn
        them off.  Some of them might be surprising.  This makes objtool
        tricky to use, and prevents porting individual features to other
        arches.
 
      - The config dependencies are too coarse-grained.  Objtool enablement is
        tied to CONFIG_STACK_VALIDATION, but it has several other features
        independent of that.
 
      - The objtool subcmds ("check" and "orc") are clumsy: "check" is really
        a subset of "orc", so it has all the same options.  The subcmd model
        has never really worked for objtool, as it only has a single purpose:
        "do some combination of things on an object file".
 
      - The '--lto' and '--vmlinux' options are nonsensical and have
        surprising behavior.
 
    Overhaul the interface:
 
       - get rid of subcmds
 
       - make all features individually selectable
 
       - remove and/or clarify confusing/obsolete options
 
       - update the documentation
 
       - fix some bugs found along the way
 
  - Fix x32 regression
 
  - Fix Kbuild cleanup bugs
 
  - Add scripts/objdump-func helper script to disassemble a single function from an object file.
 
  - Rewrite scripts/faddr2line to be section-aware, by basing it on 'readelf',
    moving it away from 'nm', which doesn't handle multiple sections well,
    which can result in decoding failure.
 
  - Rewrite & fix symbol handling - which had a number of bugs wrt. object files
    that don't have global symbols - which is rare but possible. Also fix a
    bunch of symbol handling bugs found along the way.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmKLtcURHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1jVQg//QM8nCNadJAVS9exVGX1DZI9pnf3OJaA9
 gOFML7Lv3MC+Lwdxt6Iv020rFVaeAnOcjPsis3dppFz62FZzzMWoemn5irg2BFiJ
 dp++UtJWTfKxgU2BHydU9uXD0kcJkD4AjBCIaFsgmTjAz8QvMGa9j0smuUm3cDSL
 0Bdid+LhkQqW3P2FiLWsSAzh4vqZmdwpXgERZRql8qD3NYk5hV4QDKs3gMguktat
 9gos4kGt0uwKfiEvmeNEXkoAwUsTvE/vqaOy9cVxxCqcWrrC+yQeBpwSoqhHK526
 dyHlwlYvBaPFqZnmquVUv21iv1MU6dUBJPhNIChke0NDTwVzSXdI75207FARyk5J
 3igSFEfJcU9zMvhAAsAjzD/uQP2ATowg5qa/V2xyWwtyaRgBleRffYiDsbhgDoNc
 R4/vI+vn/fQXouMhmmjPNYzu9uHQ+k89wQCJIY8Bswf7oNu6nKL3jJb/a/a7xhsH
 ZNqv+M0KEENTZcjBU2UHGyImApmkTlsp2mxUiiHs7QoV1hTfz+TcTXKPM1mIuJB8
 /HrVpv64CZ3S7p4JyGBUTNpci4mBjgBmwwAf16+dtaxyxxfoqReVWh3+bzsZbH+B
 kRjezWHh7/yCsoyDm7/LPgyPKEbozLLzMsTsjVJeWgeTgZ+xuqku3PTVctyzAI21
 DVL5oZe3iK4=
 =ARdm
 -----END PGP SIGNATURE-----

Merge tag 'objtool-core-2022-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull objtool updates from Ingo Molnar:

 - Comprehensive interface overhaul:
   =================================

   Objtool's interface has some issues:

     - Several features are done unconditionally, without any way to
       turn them off. Some of them might be surprising. This makes
       objtool tricky to use, and prevents porting individual features
       to other arches.

     - The config dependencies are too coarse-grained. Objtool
       enablement is tied to CONFIG_STACK_VALIDATION, but it has several
       other features independent of that.

     - The objtool subcmds ("check" and "orc") are clumsy: "check" is
       really a subset of "orc", so it has all the same options.

       The subcmd model has never really worked for objtool, as it only
       has a single purpose: "do some combination of things on an object
       file".

     - The '--lto' and '--vmlinux' options are nonsensical and have
       surprising behavior.

   Overhaul the interface:

      - get rid of subcmds

      - make all features individually selectable

      - remove and/or clarify confusing/obsolete options

      - update the documentation

      - fix some bugs found along the way

 - Fix x32 regression

 - Fix Kbuild cleanup bugs

 - Add scripts/objdump-func helper script to disassemble a single
   function from an object file.

 - Rewrite scripts/faddr2line to be section-aware, by basing it on
   'readelf', moving it away from 'nm', which doesn't handle multiple
   sections well, which can result in decoding failure.

 - Rewrite & fix symbol handling - which had a number of bugs wrt.
   object files that don't have global symbols - which is rare but
   possible. Also fix a bunch of symbol handling bugs found along the
   way.

* tag 'objtool-core-2022-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
  objtool: Fix objtool regression on x32 systems
  objtool: Fix symbol creation
  scripts/faddr2line: Fix overlapping text section failures
  scripts: Create objdump-func helper script
  objtool: Remove libsubcmd.a when make clean
  objtool: Remove inat-tables.c when make clean
  objtool: Update documentation
  objtool: Remove --lto and --vmlinux in favor of --link
  objtool: Add HAVE_NOINSTR_VALIDATION
  objtool: Rename "VMLINUX_VALIDATION" -> "NOINSTR_VALIDATION"
  objtool: Make noinstr hacks optional
  objtool: Make jump label hack optional
  objtool: Make static call annotation optional
  objtool: Make stack validation frame-pointer-specific
  objtool: Add CONFIG_OBJTOOL
  objtool: Extricate sls from stack validation
  objtool: Rework ibt and extricate from stack validation
  objtool: Make stack validation optional
  objtool: Add option to print section addresses
  objtool: Don't print parentheses in function addresses
  ...
2022-05-24 10:36:38 -07:00
Josh Poimboeuf
03f16cd020 objtool: Add CONFIG_OBJTOOL
Now that stack validation is an optional feature of objtool, add
CONFIG_OBJTOOL and replace most usages of CONFIG_STACK_VALIDATION with
it.

CONFIG_STACK_VALIDATION can now be considered to be frame-pointer
specific.  CONFIG_UNWINDER_ORC is already inherently valid for live
patching, so no need to "validate" it.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Link: https://lkml.kernel.org/r/939bf3d85604b2a126412bf11af6e3bd3b872bcb.1650300597.git.jpoimboe@redhat.com
2022-04-22 12:32:03 +02:00
Paul E. McKenney
835f14ed53 rcu: Make the TASKS_RCU Kconfig option be selected
Currently, any kernel built with CONFIG_PREEMPTION=y also gets
CONFIG_TASKS_RCU=y, which is not helpful to people trying to build
preemptible kernels of minimal size.

Because CONFIG_TASKS_RCU=y is needed only in kernels doing tracing of
one form or another, this commit moves from TASKS_RCU deciding when it
should be enabled to the tracing Kconfig options explicitly selecting it.
This allows building preemptible kernels without TASKS_RCU, if desired.

This commit also updates the SRCU-N and TREE09 rcutorture scenarios
in order to avoid Kconfig errors that would otherwise result from
CONFIG_TASKS_RCU being selected without its CONFIG_RCU_EXPERT dependency
being met.

[ paulmck: Apply BPF_SYSCALL feedback from Andrii Nakryiko. ]

Reported-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Tested-by: Zhouyi Zhou <zhouzhouyi@gmail.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-04-20 16:52:58 -07:00
Steven Rostedt (Google)
1cd927ad6f tracing: mark user_events as BROKEN
After being merged, user_events become more visible to a wider audience
that have concerns with the current API.

It is too late to fix this for this release, but instead of a full
revert, just mark it as BROKEN (which prevents it from being selected in
make config).  Then we can work finding a better API.  If that fails,
then it will need to be completely reverted.

To not have the code silently bitrot, still allow building it with
COMPILE_TEST.

And to prevent the uapi header from being installed, then later changed,
and then have an old distro user space see the old version, move the
header file out of the uapi directory.

Surround the include with CONFIG_COMPILE_TEST to the current location,
but when the BROKEN tag is taken off, it will use the uapi directory,
and fail to compile.  This is a good way to remind us to move the header
back.

Link: https://lore.kernel.org/all/20220330155835.5e1f6669@gandalf.local.home
Link: https://lkml.kernel.org/r/20220330201755.29319-1-mathieu.desnoyers@efficios.com
Suggested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-04-02 10:32:14 -07:00
Linus Torvalds
169e77764a Networking changes for 5.18.
Core
 ----
 
  - Introduce XDP multi-buffer support, allowing the use of XDP with
    jumbo frame MTUs and combination with Rx coalescing offloads (LRO).
 
  - Speed up netns dismantling (5x) and lower the memory cost a little.
    Remove unnecessary per-netns sockets. Scope some lists to a netns.
    Cut down RCU syncing. Use batch methods. Allow netdev registration
    to complete out of order.
 
  - Support distinguishing timestamp types (ingress vs egress) and
    maintaining them across packet scrubbing points (e.g. redirect).
 
  - Continue the work of annotating packet drop reasons throughout
    the stack.
 
  - Switch netdev error counters from an atomic to dynamically
    allocated per-CPU counters.
 
  - Rework a few preempt_disable(), local_irq_save() and busy waiting
    sections problematic on PREEMPT_RT.
 
  - Extend the ref_tracker to allow catching use-after-free bugs.
 
 BPF
 ---
 
  - Introduce "packing allocator" for BPF JIT images. JITed code is
    marked read only, and used to be allocated at page granularity.
    Custom allocator allows for more efficient memory use, lower
    iTLB pressure and prevents identity mapping huge pages from
    getting split.
 
  - Make use of BTF type annotations (e.g. __user, __percpu) to enforce
    the correct probe read access method, add appropriate helpers.
 
  - Convert the BPF preload to use light skeleton and drop
    the user-mode-driver dependency.
 
  - Allow XDP BPF_PROG_RUN test infra to send real packets, enabling
    its use as a packet generator.
 
  - Allow local storage memory to be allocated with GFP_KERNEL if called
    from a hook allowed to sleep.
 
  - Introduce fprobe (multi kprobe) to speed up mass attachment (arch
    bits to come later).
 
  - Add unstable conntrack lookup helpers for BPF by using the BPF
    kfunc infra.
 
  - Allow cgroup BPF progs to return custom errors to user space.
 
  - Add support for AF_UNIX iterator batching.
 
  - Allow iterator programs to use sleepable helpers.
 
  - Support JIT of add, and, or, xor and xchg atomic ops on arm64.
 
  - Add BTFGen support to bpftool which allows to use CO-RE in kernels
    without BTF info.
 
  - Large number of libbpf API improvements, cleanups and deprecations.
 
 Protocols
 ---------
 
  - Micro-optimize UDPv6 Tx, gaining up to 5% in test on dummy netdev.
 
  - Adjust TSO packet sizes based on min_rtt, allowing very low latency
    links (data centers) to always send full-sized TSO super-frames.
 
  - Make IPv6 flow label changes (AKA hash rethink) more configurable,
    via sysctl and setsockopt. Distinguish between server and client
    behavior.
 
  - VxLAN support to "collect metadata" devices to terminate only
    configured VNIs. This is similar to VLAN filtering in the bridge.
 
  - Support inserting IPv6 IOAM information to a fraction of frames.
 
  - Add protocol attribute to IP addresses to allow identifying where
    given address comes from (kernel-generated, DHCP etc.)
 
  - Support setting socket and IPv6 options via cmsg on ping6 sockets.
 
  - Reject mis-use of ECN bits in IP headers as part of DSCP/TOS.
    Define dscp_t and stop taking ECN bits into account in fib-rules.
 
  - Add support for locked bridge ports (for 802.1X).
 
  - tun: support NAPI for packets received from batched XDP buffs,
    doubling the performance in some scenarios.
 
  - IPv6 extension header handling in Open vSwitch.
 
  - Support IPv6 control message load balancing in bonding, prevent
    neighbor solicitation and advertisement from using the wrong port.
    Support NS/NA monitor selection similar to existing ARP monitor.
 
  - SMC
    - improve performance with TCP_CORK and sendfile()
    - support auto-corking
    - support TCP_NODELAY
 
  - MCTP (Management Component Transport Protocol)
    - add user space tag control interface
    - I2C binding driver (as specified by DMTF DSP0237)
 
  - Multi-BSSID beacon handling in AP mode for WiFi.
 
  - Bluetooth:
    - handle MSFT Monitor Device Event
    - add MGMT Adv Monitor Device Found/Lost events
 
  - Multi-Path TCP:
    - add support for the SO_SNDTIMEO socket option
    - lots of selftest cleanups and improvements
 
  - Increase the max PDU size in CAN ISOTP to 64 kB.
 
 Driver API
 ----------
 
  - Add HW counters for SW netdevs, a mechanism for devices which
    offload packet forwarding to report packet statistics back to
    software interfaces such as tunnels.
 
  - Select the default NIC queue count as a fraction of number of
    physical CPU cores, instead of hard-coding to 8.
 
  - Expose devlink instance locks to drivers. Allow device layer of
    drivers to use that lock directly instead of creating their own
    which always runs into ordering issues in devlink callbacks.
 
  - Add header/data split indication to guide user space enabling
    of TCP zero-copy Rx.
 
  - Allow configuring completion queue event size.
 
  - Refactor page_pool to enable fragmenting after allocation.
 
  - Add allocation and page reuse statistics to page_pool.
 
  - Improve Multiple Spanning Trees support in the bridge to allow
    reuse of topologies across VLANs, saving HW resources in switches.
 
  - DSA (Distributed Switch Architecture):
    - replay and offload of host VLAN entries
    - offload of static and local FDB entries on LAG interfaces
    - FDB isolation and unicast filtering
 
 New hardware / drivers
 ----------------------
 
  - Ethernet:
    - LAN937x T1 PHYs
    - Davicom DM9051 SPI NIC driver
    - Realtek RTL8367S, RTL8367RB-VB switch and MDIO
    - Microchip ksz8563 switches
    - Netronome NFP3800 SmartNICs
    - Fungible SmartNICs
    - MediaTek MT8195 switches
 
  - WiFi:
    - mt76: MediaTek mt7916
    - mt76: MediaTek mt7921u USB adapters
    - brcmfmac: Broadcom BCM43454/6
 
  - Mobile:
    - iosm: Intel M.2 7360 WWAN card
 
 Drivers
 -------
 
  - Convert many drivers to the new phylink API built for split PCS
    designs but also simplifying other cases.
 
  - Intel Ethernet NICs:
    - add TTY for GNSS module for E810T device
    - improve AF_XDP performance
    - GTP-C and GTP-U filter offload
    - QinQ VLAN support
 
  - Mellanox Ethernet NICs (mlx5):
    - support xdp->data_meta
    - multi-buffer XDP
    - offload tc push_eth and pop_eth actions
 
  - Netronome Ethernet NICs (nfp):
    - flow-independent tc action hardware offload (police / meter)
    - AF_XDP
 
  - Other Ethernet NICs:
    - at803x: fiber and SFP support
    - xgmac: mdio: preamble suppression and custom MDC frequencies
    - r8169: enable ASPM L1.2 if system vendor flags it as safe
    - macb/gem: ZynqMP SGMII
    - hns3: add TX push mode
    - dpaa2-eth: software TSO
    - lan743x: multi-queue, mdio, SGMII, PTP
    - axienet: NAPI and GRO support
 
  - Mellanox Ethernet switches (mlxsw):
    - source and dest IP address rewrites
    - RJ45 ports
 
  - Marvell Ethernet switches (prestera):
    - basic routing offload
    - multi-chain TC ACL offload
 
  - NXP embedded Ethernet switches (ocelot & felix):
    - PTP over UDP with the ocelot-8021q DSA tagging protocol
    - basic QoS classification on Felix DSA switch using dcbnl
    - port mirroring for ocelot switches
 
  - Microchip high-speed industrial Ethernet (sparx5):
    - offloading of bridge port flooding flags
    - PTP Hardware Clock
 
  - Other embedded switches:
    - lan966x: PTP Hardward Clock
    - qca8k: mdio read/write operations via crafted Ethernet packets
 
  - Qualcomm 802.11ax WiFi (ath11k):
    - add LDPC FEC type and 802.11ax High Efficiency data in radiotap
    - enable RX PPDU stats in monitor co-exist mode
 
  - Intel WiFi (iwlwifi):
    - UHB TAS enablement via BIOS
    - band disablement via BIOS
    - channel switch offload
    - 32 Rx AMPDU sessions in newer devices
 
  - MediaTek WiFi (mt76):
    - background radar detection
    - thermal management improvements on mt7915
    - SAR support for more mt76 platforms
    - MBSSID and 6 GHz band on mt7915
 
  - RealTek WiFi:
    - rtw89: AP mode
    - rtw89: 160 MHz channels and 6 GHz band
    - rtw89: hardware scan
 
  - Bluetooth:
    - mt7921s: wake on Bluetooth, SCO over I2S, wide-band-speed (WBS)
 
  - Microchip CAN (mcp251xfd):
    - multiple RX-FIFOs and runtime configurable RX/TX rings
    - internal PLL, runtime PM handling simplification
    - improve chip detection and error handling after wakeup
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmI7YBcACgkQMUZtbf5S
 IrveSBAAmSNJlUK6vPsnNzs7IhsZnfI/AUjm2TCLZnlhKttbpI4A/4Pohk33V7RS
 FGX7f8kjEfhUwrIiLDgeCnztNHRECrCmk6aZc/jLEvecmTauJ+f6kjShkDY/wix+
 AkPHmrZnQeLPAEVuljDdV+sL6ik08+zQL7PazIYHsaSKKC0MGQptRwcri8PLRAKE
 KPBAhVhleq2rAZ/ntprSN52F4Af6rpFTrPIWuN8Bqdbc9dy5094LT0mpOOWYvgr3
 /DLvvAPuLemwyIQkjWknVKBRUAQcmNPC+BY3J8K3LRaiNhekGqOFan46BfqP+k2J
 6DWu0Qrp2yWt4BMOeEToZR5rA6v5suUAMIBu8PRZIDkINXQMlIxHfGjZyNm0rVfw
 7edNri966yus9OdzwPa32MIG3oC6PnVAwYCJAjjBMNS8sSIkp7wgHLkgWN4UFe2H
 K/e6z8TLF4UQ+zFM0aGI5WZ+9QqWkTWEDF3R3OhdFpGrznna0gxmkOeV2YvtsgxY
 cbS0vV9Zj73o+bYzgBKJsw/dAjyLdXoHUGvus26VLQ78S/VGunVKtItwoxBAYmZo
 krW964qcC89YofzSi8RSKLHuEWtNWZbVm8YXr75u6jpr5GhMBu0CYefLs+BuZcxy
 dw8c69cGneVbGZmY2J3rBhDkchbuICl8vdUPatGrOJAoaFdYKuw=
 =ELpe
 -----END PGP SIGNATURE-----

Merge tag 'net-next-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Jakub Kicinski:
 "The sprinkling of SPI drivers is because we added a new one and Mark
  sent us a SPI driver interface conversion pull request.

  Core
  ----

   - Introduce XDP multi-buffer support, allowing the use of XDP with
     jumbo frame MTUs and combination with Rx coalescing offloads (LRO).

   - Speed up netns dismantling (5x) and lower the memory cost a little.
     Remove unnecessary per-netns sockets. Scope some lists to a netns.
     Cut down RCU syncing. Use batch methods. Allow netdev registration
     to complete out of order.

   - Support distinguishing timestamp types (ingress vs egress) and
     maintaining them across packet scrubbing points (e.g. redirect).

   - Continue the work of annotating packet drop reasons throughout the
     stack.

   - Switch netdev error counters from an atomic to dynamically
     allocated per-CPU counters.

   - Rework a few preempt_disable(), local_irq_save() and busy waiting
     sections problematic on PREEMPT_RT.

   - Extend the ref_tracker to allow catching use-after-free bugs.

  BPF
  ---

   - Introduce "packing allocator" for BPF JIT images. JITed code is
     marked read only, and used to be allocated at page granularity.
     Custom allocator allows for more efficient memory use, lower iTLB
     pressure and prevents identity mapping huge pages from getting
     split.

   - Make use of BTF type annotations (e.g. __user, __percpu) to enforce
     the correct probe read access method, add appropriate helpers.

   - Convert the BPF preload to use light skeleton and drop the
     user-mode-driver dependency.

   - Allow XDP BPF_PROG_RUN test infra to send real packets, enabling
     its use as a packet generator.

   - Allow local storage memory to be allocated with GFP_KERNEL if
     called from a hook allowed to sleep.

   - Introduce fprobe (multi kprobe) to speed up mass attachment (arch
     bits to come later).

   - Add unstable conntrack lookup helpers for BPF by using the BPF
     kfunc infra.

   - Allow cgroup BPF progs to return custom errors to user space.

   - Add support for AF_UNIX iterator batching.

   - Allow iterator programs to use sleepable helpers.

   - Support JIT of add, and, or, xor and xchg atomic ops on arm64.

   - Add BTFGen support to bpftool which allows to use CO-RE in kernels
     without BTF info.

   - Large number of libbpf API improvements, cleanups and deprecations.

  Protocols
  ---------

   - Micro-optimize UDPv6 Tx, gaining up to 5% in test on dummy netdev.

   - Adjust TSO packet sizes based on min_rtt, allowing very low latency
     links (data centers) to always send full-sized TSO super-frames.

   - Make IPv6 flow label changes (AKA hash rethink) more configurable,
     via sysctl and setsockopt. Distinguish between server and client
     behavior.

   - VxLAN support to "collect metadata" devices to terminate only
     configured VNIs. This is similar to VLAN filtering in the bridge.

   - Support inserting IPv6 IOAM information to a fraction of frames.

   - Add protocol attribute to IP addresses to allow identifying where
     given address comes from (kernel-generated, DHCP etc.)

   - Support setting socket and IPv6 options via cmsg on ping6 sockets.

   - Reject mis-use of ECN bits in IP headers as part of DSCP/TOS.
     Define dscp_t and stop taking ECN bits into account in fib-rules.

   - Add support for locked bridge ports (for 802.1X).

   - tun: support NAPI for packets received from batched XDP buffs,
     doubling the performance in some scenarios.

   - IPv6 extension header handling in Open vSwitch.

   - Support IPv6 control message load balancing in bonding, prevent
     neighbor solicitation and advertisement from using the wrong port.
     Support NS/NA monitor selection similar to existing ARP monitor.

   - SMC
      - improve performance with TCP_CORK and sendfile()
      - support auto-corking
      - support TCP_NODELAY

   - MCTP (Management Component Transport Protocol)
      - add user space tag control interface
      - I2C binding driver (as specified by DMTF DSP0237)

   - Multi-BSSID beacon handling in AP mode for WiFi.

   - Bluetooth:
      - handle MSFT Monitor Device Event
      - add MGMT Adv Monitor Device Found/Lost events

   - Multi-Path TCP:
      - add support for the SO_SNDTIMEO socket option
      - lots of selftest cleanups and improvements

   - Increase the max PDU size in CAN ISOTP to 64 kB.

  Driver API
  ----------

   - Add HW counters for SW netdevs, a mechanism for devices which
     offload packet forwarding to report packet statistics back to
     software interfaces such as tunnels.

   - Select the default NIC queue count as a fraction of number of
     physical CPU cores, instead of hard-coding to 8.

   - Expose devlink instance locks to drivers. Allow device layer of
     drivers to use that lock directly instead of creating their own
     which always runs into ordering issues in devlink callbacks.

   - Add header/data split indication to guide user space enabling of
     TCP zero-copy Rx.

   - Allow configuring completion queue event size.

   - Refactor page_pool to enable fragmenting after allocation.

   - Add allocation and page reuse statistics to page_pool.

   - Improve Multiple Spanning Trees support in the bridge to allow
     reuse of topologies across VLANs, saving HW resources in switches.

   - DSA (Distributed Switch Architecture):
      - replay and offload of host VLAN entries
      - offload of static and local FDB entries on LAG interfaces
      - FDB isolation and unicast filtering

  New hardware / drivers
  ----------------------

   - Ethernet:
      - LAN937x T1 PHYs
      - Davicom DM9051 SPI NIC driver
      - Realtek RTL8367S, RTL8367RB-VB switch and MDIO
      - Microchip ksz8563 switches
      - Netronome NFP3800 SmartNICs
      - Fungible SmartNICs
      - MediaTek MT8195 switches

   - WiFi:
      - mt76: MediaTek mt7916
      - mt76: MediaTek mt7921u USB adapters
      - brcmfmac: Broadcom BCM43454/6

   - Mobile:
      - iosm: Intel M.2 7360 WWAN card

  Drivers
  -------

   - Convert many drivers to the new phylink API built for split PCS
     designs but also simplifying other cases.

   - Intel Ethernet NICs:
      - add TTY for GNSS module for E810T device
      - improve AF_XDP performance
      - GTP-C and GTP-U filter offload
      - QinQ VLAN support

   - Mellanox Ethernet NICs (mlx5):
      - support xdp->data_meta
      - multi-buffer XDP
      - offload tc push_eth and pop_eth actions

   - Netronome Ethernet NICs (nfp):
      - flow-independent tc action hardware offload (police / meter)
      - AF_XDP

   - Other Ethernet NICs:
      - at803x: fiber and SFP support
      - xgmac: mdio: preamble suppression and custom MDC frequencies
      - r8169: enable ASPM L1.2 if system vendor flags it as safe
      - macb/gem: ZynqMP SGMII
      - hns3: add TX push mode
      - dpaa2-eth: software TSO
      - lan743x: multi-queue, mdio, SGMII, PTP
      - axienet: NAPI and GRO support

   - Mellanox Ethernet switches (mlxsw):
      - source and dest IP address rewrites
      - RJ45 ports

   - Marvell Ethernet switches (prestera):
      - basic routing offload
      - multi-chain TC ACL offload

   - NXP embedded Ethernet switches (ocelot & felix):
      - PTP over UDP with the ocelot-8021q DSA tagging protocol
      - basic QoS classification on Felix DSA switch using dcbnl
      - port mirroring for ocelot switches

   - Microchip high-speed industrial Ethernet (sparx5):
      - offloading of bridge port flooding flags
      - PTP Hardware Clock

   - Other embedded switches:
      - lan966x: PTP Hardward Clock
      - qca8k: mdio read/write operations via crafted Ethernet packets

   - Qualcomm 802.11ax WiFi (ath11k):
      - add LDPC FEC type and 802.11ax High Efficiency data in radiotap
      - enable RX PPDU stats in monitor co-exist mode

   - Intel WiFi (iwlwifi):
      - UHB TAS enablement via BIOS
      - band disablement via BIOS
      - channel switch offload
      - 32 Rx AMPDU sessions in newer devices

   - MediaTek WiFi (mt76):
      - background radar detection
      - thermal management improvements on mt7915
      - SAR support for more mt76 platforms
      - MBSSID and 6 GHz band on mt7915

   - RealTek WiFi:
      - rtw89: AP mode
      - rtw89: 160 MHz channels and 6 GHz band
      - rtw89: hardware scan

   - Bluetooth:
      - mt7921s: wake on Bluetooth, SCO over I2S, wide-band-speed (WBS)

   - Microchip CAN (mcp251xfd):
      - multiple RX-FIFOs and runtime configurable RX/TX rings
      - internal PLL, runtime PM handling simplification
      - improve chip detection and error handling after wakeup"

* tag 'net-next-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2521 commits)
  llc: fix netdevice reference leaks in llc_ui_bind()
  drivers: ethernet: cpsw: fix panic when interrupt coaleceing is set via ethtool
  ice: don't allow to run ice_send_event_to_aux() in atomic ctx
  ice: fix 'scheduling while atomic' on aux critical err interrupt
  net/sched: fix incorrect vlan_push_eth dest field
  net: bridge: mst: Restrict info size queries to bridge ports
  net: marvell: prestera: add missing destroy_workqueue() in prestera_module_init()
  drivers: net: xgene: Fix regression in CRC stripping
  net: geneve: add missing netlink policy and size for IFLA_GENEVE_INNER_PROTO_INHERIT
  net: dsa: fix missing host-filtered multicast addresses
  net/mlx5e: Fix build warning, detected write beyond size of field
  iwlwifi: mvm: Don't fail if PPAG isn't supported
  selftests/bpf: Fix kprobe_multi test.
  Revert "rethook: x86: Add rethook x86 implementation"
  Revert "arm64: rethook: Add arm64 rethook implementation"
  Revert "powerpc: Add rethook support"
  Revert "ARM: rethook: Add rethook arm implementation"
  netdevice: add missing dm_private kdoc
  net: bridge: mst: prevent NULL deref in br_mst_info_size()
  selftests: forwarding: Use same VRF for port and VLAN upper
  ...
2022-03-24 13:13:26 -07:00
Masami Hiramatsu
5b0ab78998 fprobe: Add exit_handler support
Add exit_handler to fprobe. fprobe + rethook allows us to hook the kernel
function return. The rethook will be enabled only if the
fprobe::exit_handler is set.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Tested-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/164735290790.1084943.10601965782208052202.stgit@devnote2
2022-03-17 20:16:52 -07:00
Masami Hiramatsu
54ecbe6f1e rethook: Add a generic return hook
Add a return hook framework which hooks the function return. Most of the
logic came from the kretprobe, but this is independent from kretprobe.

Note that this is expected to be used with other function entry hooking
feature, like ftrace, fprobe, adn kprobes. Eventually this will replace
the kretprobe (e.g. kprobe + rethook = kretprobe), but at this moment,
this is just an additional hook.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Tested-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/164735285066.1084943.9259661137330166643.stgit@devnote2
2022-03-17 20:16:29 -07:00