Commit Graph

14654 Commits

Author SHA1 Message Date
Nícolas F. R. A. Prado
d63fde98b8 selftests: ktap_helpers: Add a helper to abort the test
Similar to the C counterpart, add a helper function to abort the
remainder of the test.

Signed-off-by: Nícolas F. R. A. Prado <nfraprado@collabora.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-20 15:53:15 -07:00
Nícolas F. R. A. Prado
d90b7c705c selftests: ktap_helpers: Add helper to pass/fail test based on exit code
Similar to the C counterpart, add a helper function that runs a command
and passes or fails the test based on the result.

Signed-off-by: Nícolas F. R. A. Prado <nfraprado@collabora.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-20 15:53:08 -07:00
Nícolas F. R. A. Prado
6934eea269 selftests: ktap_helpers: Add helper to print diagnostic messages
Similar to the C counterpart, add a helper to print a diagnostic
message.

Signed-off-by: Nícolas F. R. A. Prado <nfraprado@collabora.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-20 15:53:02 -07:00
Laura Nao
7c079e909b selftests: Move KTAP bash helpers to selftests common folder
Move bash helpers for outputting in KTAP format to the common selftests
folder. This allows kselftests other than the dt one to source the file
and make use of the helper functions.
Define pass, fail and skip codes in the same file too.

Signed-off-by: Laura Nao <laura.nao@collabora.com>
Reviewed-by: Nícolas F. R. A. Prado <nfraprado@collabora.com>
Tested-by: Nícolas F. R. A. Prado <nfraprado@collabora.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-20 15:52:55 -07:00
Ali Zahraee
8cbf22b3dc selftests: ftrace: fix typo in test description
The typo in the description shows up in test logs and output.
This patch submission is part of my application to the Linux Foundation
mentorship program: Linux kernel Bug Fixing Spring Unpaid 2024.

Signed-off-by: Ali Zahraee <ahzahraee@gmail.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-20 14:48:32 -07:00
Kousik Sanagavarapu
1901ae3cc9 selftest/ftrace: fix typo in ftracetest script
Fix a typo in ftracetest script which is run when running the kselftests
for ftrace.

s/faii/fail

Signed-off-by: Kousik Sanagavarapu <five231003@gmail.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-20 14:47:37 -07:00
Mark Brown
f17d8a87ec selftests: fuxex: Report a unique test name per run of futex_requeue_pi
The futex_requeue_pi test program is run a number of times with different
options to provide multiple test cases. Currently every time it runs it
reports the result with a consistent string, meaning that automated systems
parsing the TAP output from a test run have difficulty in distinguishing
which test is which.

The parameters used for the test are already logged as part of the test
output, let's use the same format to roll them into the test name that we
use with KTAP so that automated systems can follow the results of the
individual cases that get run.

Signed-off-by: Mark Brown <broonie@kernel.org>
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-19 15:00:51 -07:00
Ilpo Järvinen
345e8abe4c selftests/resctrl: Get domain id from cache id
Domain id is acquired differently depending on CPU. AMD tests use id
from L3 cache, whereas CPUs from other vendors base the id on topology
package id. In order to support L2 CAT test, this has to be
generalized.

The driver side code seems to get the domain ids from cache ids so the
approach used by the AMD branch seems to match the kernel-side code. It
will also work with L2 domain IDs as long as the cache level is
generalized.

Using the topology id was always fragile due to mismatch with the
kernel-side way to acquire the domain id. It got incorrect domain id,
e.g., when Cluster-on-Die (CoD) is enabled for CPU (but CoD is not well
suited for resctrl in the first place so it has not been a big issue if
tests don't work correctly with it).

Taking all the above into account, generalize acquiring the domain id
by taking it from the cache id and do not hard-code the cache level.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-13 13:56:45 -07:00
Ilpo Järvinen
6874f6ed92 selftests/resctrl: Rename resource ID to domain ID
Kernel-side calls the instances of a resource domains.

Change the resource_id naming in the selftest code to domain_id to
match the kernel side better.

Suggested-by: Maciej Wieczór-Retman <maciej.wieczor-retman@intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-13 13:56:45 -07:00
Ilpo Järvinen
e73dda7ffd selftests/resctrl: Add helper to convert L2/3 to integer
"L2"/"L3" conversion to integer is embedded into get_cache_size()
which prevents reuse.

Create a helper for the cache string to integer conversion to make
it reusable.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-13 13:56:45 -07:00
Ilpo Järvinen
ca1608875a selftests/resctrl: Pass write_schemata() resource instead of test name
write_schemata() takes the test name as an argument and determines the
relevant resource based on the test name. Such mapping from name to
resource does not really belong to resctrlfs.c that should provide
only generic, test-independent functions.

Pass the resource stored in the test information structure to
write_schemata() instead of the test name. The new API is also more
flexible as it enables to use write_schemata() for more than one
resource within a test.

While touching the sprintf(), move the unnecessary %c that is always
'=' directly into the format string.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-13 13:56:45 -07:00
Ilpo Järvinen
c603ff5bb8 selftests/resctrl: Introduce generalized test framework
Each test currently has a "run test" function in per test file and
another resctrl_tests.c. The functions in resctrl_tests.c are almost
identical.

Generalize the one in resctrl_tests.c such that it can be shared
between all of the tests. It makes adding new tests easier and removes
the per test if () forests.

Also add comment to CPU vendor IDs that they must be defined as bits
for a bitmask.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-13 13:56:45 -07:00
Ilpo Järvinen
15f2988212 selftests/resctrl: Create struct for input parameters
resctrl_tests reads a set of parameters and passes them individually
for each tests which causes variations in the call signature between
the tests.

Add struct input_params to hold all input parameters. It can be easily
passed to every test without varying the call signature.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-13 13:56:45 -07:00
Ilpo Järvinen
6c8cb747d0 selftests/resctrl: Restore the CPU affinity after CAT test
CAT test does not reset the CPU affinity after the benchmark.
This is relatively harmless as is because CAT test is the last
benchmark to run, however, more tests may be added later.

Store the CPU affinity the first time taskset_benchmark() is run and
add taskset_restore() which the test can call to reset the CPU mask to
its original value.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-13 13:56:45 -07:00
Ilpo Järvinen
205de6ddd7 selftests/resctrl: Rewrite Cache Allocation Technology (CAT) test
CAT test spawns two processes into two different control groups with
exclusive schemata. Both the processes alloc a buffer from memory
matching their allocated LLC block size and flush the entire buffer out
of caches. Since the processes are reading through the buffer only once
during the measurement and initially all the buffer was flushed, the
test isn't testing CAT.

Rewrite the CAT test to allocate a buffer sized to half of LLC. Then
perform a sequence of tests with different LLC alloc sizes starting
from half of the CBM bits down to 1-bit CBM. Flush the buffer before
each test and read the buffer twice. Observe the LLC misses on the
second read through the buffer. As the allocated LLC block gets smaller
and smaller, the LLC misses will become larger and larger giving a
strong signal on CAT working properly.

The new CAT test is using only a single process because it relies on
measured effect against another run of itself rather than another
process adding noise. The rest of the system is set to use the CBM bits
not used by the CAT test to keep the test isolated.

Replace count_bits() with count_contiguous_bits() to get the first bit
position in order to be able to calculate masks based on it.

This change has been tested with a number of systems from different
generations.

Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-13 13:56:45 -07:00
Ilpo Järvinen
bcdb2e9d9f selftests/resctrl: Read in less obvious order to defeat prefetch optimizations
When reading memory in order, HW prefetching optimizations will
interfere with measuring how caches and memory are being accessed. This
adds noise into the results.

Change the fill_buf reading loop to not use an obvious in-order access
using multiply by a prime and modulo.

Using a prime multiplier with modulo ensures the entire buffer is
eventually read. 23 is small enough that the reads are spread out but
wrapping does not occur very frequently (wrapping too often can trigger
L2 hits more frequently which causes noise to the test because getting
the data from LLC is not required).

It was discovered that not all primes work equally well and some can
cause wildly unstable results (e.g., in an earlier version of this
patch, the reads were done in reversed order and 59 was used as the
prime resulting in unacceptably high and unstable results in MBA and
MBM test on some architectures).

Link: https://lore.kernel.org/linux-kselftest/TYAPR01MB6330025B5E6537F94DA49ACB8B499@TYAPR01MB6330.jpnprd01.prod.outlook.com/
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-13 13:56:44 -07:00
Ilpo Järvinen
90a009db09 selftests/resctrl: Replace file write with volatile variable
The fill_buf code prevents compiler optimizating the entire read loop
away by writing the final value of the variable into a file. While it
achieves the goal, writing into a file requires significant amount of
work within the innermost test loop and also error handling.

A simpler approach is to take advantage of volatile. Writing through
a pointer to a volatile variable is enough to prevent compiler from
optimizing the write away, and therefore compiler cannot remove the
read loop either.

Add a volatile 'value_sink' into resctrl_tests.c and make fill_buf to
write into it. As a result, the error handling in fill_buf.c can be
simplified.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-13 13:56:44 -07:00
Ilpo Järvinen
2892731ec2 selftests/resctrl: Open perf fd before start & add error handling
Perf fd (pe_fd) is opened, reset, and enabled during every test the CAT
selftest runs. Also, ioctl(pe_fd, ...) calls are not error checked even
if ioctl() could return an error.

Open perf fd only once before the tests and only reset and enable the
counter within the test loop. Add error checking to pe_fd ioctl()
calls.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-13 13:56:44 -07:00
Ilpo Järvinen
433f437b3a selftests/resctrl: Move cat_val() to cat_test.c and rename to cat_test()
The main CAT test function is called cat_val() and resides in cache.c
which is illogical.

Rename the function to cat_test() and move it into cat_test.c.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-13 13:56:44 -07:00
Ilpo Järvinen
3cdad0a5a6 selftests/resctrl: Convert perf related globals to locals
Perf related variables pea_llc_miss, pe_read, and pe_fd are globals in
cache.c.

Convert them to locals for better scoping and make pea_llc_miss simpler
by renaming it to pea. Make close(pe_fd) handling easier to understand
by doing it inside cat_val().

Make also sizeof()s use safer way to determine the right struct.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-13 13:56:44 -07:00
Ilpo Järvinen
038ce802e2 selftests/resctrl: Improve perf init
struct perf_event_attr initialization is spread into
perf_event_initialize() and perf_event_attr_initialize() and setting
->config is hardcoded by the deepest level.

perf_event_attr init belongs to perf_event_attr_initialize() so move it
entirely there. Rename the other function
perf_event_initialized_read_format().

Call each init function directly from the test as they will take
different parameters (especially true after the perf related global
variables are moved to local variables).

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-13 13:56:44 -07:00
Ilpo Järvinen
b6e6a582f2 selftests/resctrl: Consolidate naming of perf event related things
Naming for perf event related functions, types, and variables is
inconsistent.

Make struct read_format and all functions related to perf events start
with "perf_". Adjust variable names towards the same direction but use
shorter names for variables where appropriate (pe prefix).

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-13 13:56:44 -07:00
Ilpo Järvinen
3c6bfc9cc7 selftests/resctrl: Remove nested calls in perf event handling
Perf event handling has functions that are the sole caller of another
perf event handling related function:
  - reset_enable_llc_perf() calls perf_event_open_llc_miss()
  - perf_event_measure() calls get_llc_perf()

Remove the extra layer of calls to make the code easier to follow by
moving the code into the calling function.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-13 13:56:44 -07:00
Ilpo Järvinen
33403bc7fe selftests/resctrl: Remove unnecessary __u64 -> unsigned long conversion
Perf counters are __u64 but the code converts them to unsigned long
before printing them out.

Remove unnecessary type conversion and retain the perf originating
value as __u64.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-13 13:56:44 -07:00
Ilpo Järvinen
5caf1b6400 selftests/resctrl: Split show_cache_info() to test specific and generic parts
show_cache_info() calculates results and provides generic cache
information. This makes it hard to alter pass/fail conditions.

Separate the test specific checks into CAT and CMT test files and
leave only the generic information part into show_cache_info().

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-13 13:56:44 -07:00
Ilpo Järvinen
a575c9734f selftests/resctrl: Split measure_cache_vals()
measure_cache_vals() does a different thing depending on the test case
that called it:
  - For CAT, it measures LLC misses through perf.
  - For CMT, it measures LLC occupancy through resctrl.

Split these two functionalities into own functions the CAT and CMT
tests can call directly. Replace passing the struct resctrl_val_param
parameter with the filename because it's more generic and all those
functions need out of resctrl_val.

Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-13 13:56:44 -07:00
Ilpo Järvinen
b6dfac9486 selftests/resctrl: Exclude shareable bits from schemata in CAT test
CAT test doesn't take shareable bits into account, i.e., the test might
be sharing cache with some devices (e.g., graphics).

Introduce get_mask_no_shareable() and use it to provision an
environment for CAT test where the allocated LLC is isolated better.
Excluding shareable_bits may create hole(s) into the cbm_mask, thus add
a new helper count_contiguous_bits() to find the longest contiguous set
of CBM bits.

create_bit_mask() is needed by an upcoming CAT test rewrite so make it
available in resctrl.h right away.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-13 13:56:44 -07:00
Ilpo Järvinen
19e94a2333 selftests/resctrl: Create cache_portion_size() helper
CAT and CMT tests calculate size of the cache portion for the n-bits
cache allocation on their own.

Add cache_portion_size() helper that calculates size of the cache
portion for the given number of bits and use it to replace the existing
span calculations. This also prepares for the new CAT test that will
need to determine the size of the cache portion also during results
processing.

Rename also 'cache_size' local variables to 'cache_total_size' to
prevent misinterpretations.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-13 13:56:44 -07:00
Ilpo Järvinen
60c2a6926c selftests/resctrl: Mark get_cache_size() cache_type const
get_cache_size() does not modify cache_type so it could be const.

Mark cache_type const so that const char * can be passed to it. This
prevents warnings once many of the test parameters are marked const.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-13 13:56:44 -07:00
Ilpo Järvinen
4b357e2a6d selftests/resctrl: Refactor get_cbm_mask() and rename to get_full_cbm()
Callers of get_cbm_mask() are required to pass a string into which the
capacity bitmask (CBM) is read. Neither CAT nor CMT tests need the
bitmask as string but just convert it into an unsigned long value.

Another limitation is that the bit mask reader can only read
.../cbm_mask files.

Generalize the bit mask reading function into get_bit_mask() such that
it can be used to handle other files besides the .../cbm_mask and
handles the unsigned long conversion within get_bit_mask() using
fscanf(). Change get_cbm_mask() to use get_bit_mask() and rename it to
get_full_cbm() to better indicate what the function does.

Return error from get_full_cbm() if the bitmask is zero for some reason
because it makes the code more robust as the selftests naturally assume
the bitmask has some bits.

Also mark cache_type const while at it and remove useless comments that
are related to processing of CBM bits.

Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-13 13:56:43 -07:00
Ilpo Järvinen
24be05591f selftests/resctrl: Refactor fill_buf functions
There are unnecessary nested calls in fill_buf.c:
  - run_fill_buf() calls fill_cache()
  - alloc_buffer() calls malloc_and_init_memory()

Simplify the code flow and remove those unnecessary call levels by
moving the called code inside the calling function and remove the
duplicated error print.

Resolve the difference in run_fill_buf() and fill_cache() parameter
name into 'buf_size' which is more descriptive than 'span'. Also, while
moving the allocation related code, rename 'p' into 'buf' to be
consistent in naming the variables.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-13 13:56:43 -07:00
Ilpo Järvinen
f8f6696999 selftests/resctrl: Split fill_buf to allow tests finer-grained control
MBM, MBA and CMT test cases call run_fill_buf() that in turn calls
fill_cache() to alloc and loop indefinitely around the buffer. This
binds buffer allocation and running the benchmark into a single bundle
so that a selftest cannot allocate a buffer once and reuse it. CAT test
doesn't want to loop around the buffer continuously and after rewrite
it needs the ability to allocate the buffer separately.

Split buffer allocation out of fill_cache() into alloc_buffer(). This
change is part of preparation for the new CAT test that allocates a
buffer and does multiple passes over the same buffer (but not in an
infinite loop).

Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-13 13:56:43 -07:00
Ilpo Järvinen
348139384b selftests/resctrl: Change function comments to say < 0 on error
A number function comments state the function return non-zero on
failure but in reality they can only return 0 on success and < 0 on
error.

Update the comments to say < 0 on error to match the behavior.

While at it, improve cat_val() comment to state that 0 means the test
was run (either pass or fail).

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-13 13:56:43 -07:00
Ilpo Järvinen
bcd8a929a5 selftests/resctrl: Don't use ctrlc_handler() outside signal handling
perf_event_open_llc_miss() calls ctrlc_handler() to cleanup if
perf_event_open() returns an error. Those cleanups, however, are not
the responsibility of perf_event_open_llc_miss() and it thus interferes
unnecessarily with the usual cleanup pattern. Worse yet,
ctrlc_handler() calls exit() in the end preventing the ordinary cleanup
done in the calling function from executing.

ctrlc_handler() should only be used as a signal handler, not during
normal error handling.

Remove call to ctrlc_handler() from perf_event_open_llc_miss(). As
unmounting resctrlfs and test cleanup are already handled properly
by error rollbacks in the calling functions, no other changes are
necessary.

Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-13 13:56:43 -07:00
Ilpo Järvinen
c90fba60f2 selftests/resctrl: Return -1 instead of errno on error
A number of functions in the resctrl selftests return errno. It is
problematic because errno is positive which is often counterintuitive.
Also, every site returning errno prints the error message already with
ksft_perror() so there is not much added value in returning the precise
error code.

Simply convert all places returning errno to return -1 that is typical
userspace error code in case of failures.

While at it, improve resctrl_val() comment to state that 0 means the
test was run (either pass or fail).

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-13 13:56:43 -07:00
Ilpo Järvinen
cc8ff7f5c8 selftests/resctrl: Convert perror() to ksft_perror() or ksft_print_msg()
The resctrl selftest code contains a number of perror() calls. Some of
them come with hash character and some don't. The kselftest framework
provides ksft_perror() that is compatible with test output formatting
so it should be used instead of adding custom hash signs.

Some perror() calls are too far away from anything that sets error.
For those call sites, ksft_print_msg() must be used instead.

Convert perror() to ksft_perror() or ksft_print_msg().

Other related changes:
- Remove hash signs
- Remove trailing stops & newlines from ksft_perror()
- Add terminating newlines for converted ksft_print_msg()
- Use consistent capitalization
- Small fixes/tweaks to typos & grammar of the messages
- Extract error printing out of PARENT_EXIT() to be able to
  differentiate

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-13 13:56:43 -07:00
Marcos Paulo de Souza
6a71770442 selftests: livepatch: Test livepatching a heavily called syscall
The test proves that a syscall can be livepatched. It is interesting
because syscalls are called a tricky way. Also the process gets
livepatched either when sleeping in the userspace or when entering
or leaving the kernel space.

The livepatch is a bit tricky:
  1. The syscall function name is architecture specific. Also
     ARCH_HAS_SYSCALL_WRAPPER must be taken in account.

  2. The syscall must stay working the same way for other processes
     on the system. It is solved by decrementing a counter only
     for PIDs of the test processes. It means that the test processes
     has to call the livepatched syscall at least once.

The test creates one userspace process per online cpu. The processes
are calling getpid in a busy loop. The intention is to create random
locations when the livepatch gets enabled. Nothing is guarantted.
The magic is in the randomness.

Reviewed-by: Joe Lawrence <joe.lawrence@redhat.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-01-22 10:29:53 -07:00
Marcos Paulo de Souza
c4bbe83d27 livepatch: Move tests from lib/livepatch to selftests/livepatch
The modules are being moved from lib/livepatch to
tools/testing/selftests/livepatch/test_modules.

This code moving will allow writing more complex tests, like for example an
userspace C code that will call a livepatched kernel function.

The modules are now built as out-of-tree
modules, but being part of the kernel source means they will be maintained.

Another advantage of the code moving is to be able to easily change,
debug and rebuild the tests by running make on the selftests/livepatch
directory, which is not currently possible since the modules on
lib/livepatch are build and installed using the "modules" target.

The current approach also keeps the ability to execute the tests manually
by executing the scripts inside selftests/livepatch directory, as it's
currently supported. If the modules are modified, they needed to be
rebuilt before running the scripts though.

The modules are built before running the selftests when using the
kselftest invocations:

	make kselftest TARGETS=livepatch
or
	make -C tools/testing/selftests/livepatch run_tests

Having the modules being built as out-of-modules requires changing the
currently used 'modprobe' by 'insmod' and adapt the test scripts that
check for the kernel message buffer.

Now it is possible to only compile the modules by running:

	make -C tools/testing/selftests/livepatch/

This way the test modules and other test program can be built in order
to be packaged if so desired.

As there aren't any modules being built on lib/livepatch, remove the
TEST_LIVEPATCH Kconfig and it's references.

Note: "make gen_tar" packages the pre-built binaries into the tarball.
       It means that it will store the test modules pre-built for
       the kernel running on the build host.

       Note that these modules need not binary compatible with
       the kernel built from the same sources. But the same
       is true for other packaged selftest binaries.

       The entire kernel sources are needed for rebuilding
       the selftests on another system.

Reviewed-by: Joe Lawrence <joe.lawrence@redhat.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-01-22 10:29:47 -07:00
Marcos Paulo de Souza
6727980b67 kselftests: lib.mk: Add TEST_GEN_MODS_DIR variable
Add TEST_GEN_MODS_DIR variable for kselftests. It can point to
a directory containing kernel modules that will be used by
selftest scripts.

The modules are built as external modules for the running kernel.
As a result they are always binary compatible and the same tests
can be used for older or newer kernels.

The build requires "kernel-devel" package to be installed.
For example, in the upstream sources, the rpm devel package
is produced by "make rpm-pkg"

The modules can be built independently by

  make -C tools/testing/selftests/livepatch/

or they will be automatically built before running the tests via

  make -C tools/testing/selftests/livepatch/ run_tests

Note that they are _not_ built when running the standalone
tests by calling, for example, ./test-state.sh.

Along with TEST_GEN_MODS_DIR, it was necessary to create a new install
rule. INSTALL_MODS_RULE is needed because INSTALL_SINGLE_RULE would
copy the entire TEST_GEN_MODS_DIR directory to the destination, even
the files created by Kbuild to compile the modules. The new install
rule copies only the .ko files, as we would expect the gen_tar to work.

Reviewed-by: Joe Lawrence <joe.lawrence@redhat.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-01-22 10:29:41 -07:00
Linus Torvalds
e5075d8ec5 RISC-V Patches for the 6.8 Merge Window, Part 4
This includes everything from part 2:
 
 * Support for tuning for systems with fast misaligned accesses.
 * Support for SBI-based suspend.
 * Support for the new SBI debug console extension.
 * The T-Head CMOs now use PA-based flushes.
 * Support for enabling the V extension in kernel code.
 * Optimized IP checksum routines.
 * Various ftrace improvements.
 * Support for archrandom, which depends on the Zkr extension.
 
 and then also a fix for those:
 
 * The build is no longer broken under NET=n, KUNIT=y for ports that
   don't define their own ipv6 checksum.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmWsCOMTHHBhbG1lckBk
 YWJiZWx0LmNvbQAKCRAuExnzX7sYieQND/0f+1gizTM0OzuqZG9+DOdWTtqmILyr
 sZaYXWBw6SPzbUSlwjoW4Qp/S3Ur7IhrbfttM2aMoS4GHZvSESAXOMXC4c7AnCaQ
 HOXBC2OuXvq6jA0ZjK5XPviR70A/7uD2iu5SNO1hyfJK08LSEu+AulxtkW50+wMc
 bHXSpZxEf8AtwOJK1cRtwhH4qy+Qcs3Nla3jG7OnDsPbhJVcydHx95eCtfwn2cQA
 KwJPN1fjRtm4ALZb91QcMDO8VAoanfPEkSR3DoNVE/UfdTItYk35VHmf4RWh7IWA
 qDnV5Mp/XMX2RmJqwi1ZmSHHX0rfVLL5UqgBhGHC8PuMpLJn5p9U6DZ0qD7YWxcB
 NDlrHsaXt112RHEEM/7CcLkqEexua/ezcC45E5tSQ4sRDZE3fvgbALao67xSQ22D
 lCpVAY0Z3o5oWaM/jISiQHjSNn5RrAwEYSvvv2pkW4QAMShA2eggmQaCF+Jl4EMp
 u6yqJpXxDI99C088uvM6Bi2gcX8fnBSmOzCB/sSU4a1I72UpWrGngqUpTYKHG8Jz
 cTZhbIKmQirBP0vC/UgMOS0sNuw/NykItfRXZ2g0qGKvw1TjJ6djdeZBKcAj3h0E
 fJpMxuhmeOFYE7DavnhSt3CResFTXZzXLChjxGbT+g10YzVEf9g7vBVnjxAwad9f
 tryMVpL/ipGpQQ==
 =Sjhj
 -----END PGP SIGNATURE-----

Merge tag 'riscv-for-linus-6.8-mw4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull more RISC-V updates from Palmer Dabbelt:

 - Support for tuning for systems with fast misaligned accesses.

 - Support for SBI-based suspend.

 - Support for the new SBI debug console extension.

 - The T-Head CMOs now use PA-based flushes.

 - Support for enabling the V extension in kernel code.

 - Optimized IP checksum routines.

 - Various ftrace improvements.

 - Support for archrandom, which depends on the Zkr extension.

 - The build is no longer broken under NET=n, KUNIT=y for ports that
   don't define their own ipv6 checksum.

* tag 'riscv-for-linus-6.8-mw4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (56 commits)
  lib: checksum: Fix build with CONFIG_NET=n
  riscv: lib: Check if output in asm goto supported
  riscv: Fix build error on rv32 + XIP
  riscv: optimize ELF relocation function in riscv
  RISC-V: Implement archrandom when Zkr is available
  riscv: Optimize hweight API with Zbb extension
  riscv: add dependency among Image(.gz), loader(.bin), and vmlinuz.efi
  samples: ftrace: Add RISC-V support for SAMPLE_FTRACE_DIRECT[_MULTI]
  riscv: ftrace: Add DYNAMIC_FTRACE_WITH_DIRECT_CALLS support
  riscv: ftrace: Make function graph use ftrace directly
  riscv: select FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY
  lib/Kconfig.debug: Update AS_HAS_NON_CONST_LEB128 comment and name
  riscv: Restrict DWARF5 when building with LLVM to known working versions
  riscv: Hoist linker relaxation disabling logic into Kconfig
  kunit: Add tests for csum_ipv6_magic and ip_fast_csum
  riscv: Add checksum library
  riscv: Add checksum header
  riscv: Add static key for misaligned accesses
  asm-generic: Improve csum_fold
  RISC-V: selftests: cbo: Ensure asm operands match constraints
  ...
2024-01-20 11:06:04 -08:00
Linus Torvalds
736b5545d3 Including fixes from bpf and netfilter.
Previous releases - regressions:
 
  - Revert "net: rtnetlink: Enslave device before bringing it up",
    breaks the case inverse to the one it was trying to fix
 
  - net: dsa: fix oob access in DSA's netdevice event handler
    dereference netdev_priv() before check its a DSA port
 
  - sched: track device in tcf_block_get/put_ext() only for clsact
    binder types
 
  - net: tls, fix WARNING in __sk_msg_free when record becomes full
    during splice and MORE hint set
 
  - sfp-bus: fix SFP mode detect from bitrate
 
  - drv: stmmac: prevent DSA tags from breaking COE
 
 Previous releases - always broken:
 
  - bpf: fix no forward progress in in bpf_iter_udp if output
    buffer is too small
 
  - bpf: reject variable offset alu on registers with a type
    of PTR_TO_FLOW_KEYS to prevent oob access
 
  - netfilter: tighten input validation
 
  - net: add more sanity check in virtio_net_hdr_to_skb()
 
  - rxrpc: fix use of Don't Fragment flag on RESPONSE packets,
    avoid infinite loop
 
  - amt: do not use the portion of skb->cb area which may get clobbered
 
  - mptcp: improve validation of the MPTCPOPT_MP_JOIN MCTCP option
 
 Misc:
 
  - spring cleanup of inactive maintainers
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmWpnvoACgkQMUZtbf5S
 Irvskg/+Or5tETxOmpQXxnj6ECZyrSp0Jcyd7+TIcos/7JfPdn3Kebl004SG4h/s
 bwKDOIIP1iSjQ+0NFsPjyYIVd6wFuCElSB7npV5uQAT6ptXx7A4Ym68/rVxodI8T
 6hiYV/mlPuZF8JjRhtp/VJL8sY1qnG7RIUB4oH3y9HQNfwZX0lIWChuUilHuWfbq
 zQ2Iu97tMkoIBjXrkIT3Qaj0aFxYbjCOrg9zy+FZ69a7Rmrswr//7amlCH6saNTx
 Ku7Wl8FXhe7O23OiM6GSl7AechSM1aJ5kOS3orseej0+aSp9eH3ekYGmbsQr6sjz
 ix/eZ7V7SUkJK3bEH5haeymk4TDV3lHE8SziMbosK4wVbHOyPwEmqCxppADYJLZs
 WycHZKcTBluFBOxknAofH7m5Hh0ToXkeTfpptSSGtRB4WncAOMsMapr3yS4WXg/q
 AnOo/tzCBgMrnSJtD/kjqgUiCk8vYoLc8lBR9K74l0zqI1sf13OfuTHvEgqIS6z1
 Ir/ewlAV6fCH8gQbyzjKUVlyjZS4+vFv19xg/2GgLf+LdyzcCOxUZkND3/DE6+OA
 Dgf9gtABYU4hGXMUfTfml3KCBTF65QmY8dIh17zraNylYUHEJ2lI4D+sdiqWUrXb
 mXPBJh4nOPwIV5t2gT80skNwF3aWPr6l4ieY2codSbP04rO74S8=
 =YhQQ
 -----END PGP SIGNATURE-----

Merge tag 'net-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from bpf and netfilter.

  Previous releases - regressions:

   - Revert "net: rtnetlink: Enslave device before bringing it up",
     breaks the case inverse to the one it was trying to fix

   - net: dsa: fix oob access in DSA's netdevice event handler
     dereference netdev_priv() before check its a DSA port

   - sched: track device in tcf_block_get/put_ext() only for clsact
     binder types

   - net: tls, fix WARNING in __sk_msg_free when record becomes full
     during splice and MORE hint set

   - sfp-bus: fix SFP mode detect from bitrate

   - drv: stmmac: prevent DSA tags from breaking COE

  Previous releases - always broken:

   - bpf: fix no forward progress in in bpf_iter_udp if output buffer is
     too small

   - bpf: reject variable offset alu on registers with a type of
     PTR_TO_FLOW_KEYS to prevent oob access

   - netfilter: tighten input validation

   - net: add more sanity check in virtio_net_hdr_to_skb()

   - rxrpc: fix use of Don't Fragment flag on RESPONSE packets, avoid
     infinite loop

   - amt: do not use the portion of skb->cb area which may get clobbered

   - mptcp: improve validation of the MPTCPOPT_MP_JOIN MCTCP option

  Misc:

   - spring cleanup of inactive maintainers"

* tag 'net-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (88 commits)
  i40e: Include types.h to some headers
  ipv6: mcast: fix data-race in ipv6_mc_down / mld_ifc_work
  selftests: mlxsw: qos_pfc: Adjust the test to support 8 lanes
  selftests: mlxsw: qos_pfc: Remove wrong description
  mlxsw: spectrum_router: Register netdevice notifier before nexthop
  mlxsw: spectrum_acl_tcam: Fix stack corruption
  mlxsw: spectrum_acl_tcam: Fix NULL pointer dereference in error path
  mlxsw: spectrum_acl_erp: Fix error flow of pool allocation failure
  ethtool: netlink: Add missing ethnl_ops_begin/complete
  selftests: bonding: Add more missing config options
  selftests: netdevsim: add a config file
  libbpf: warn on unexpected __arg_ctx type when rewriting BTF
  selftests/bpf: add tests confirming type logic in kernel for __arg_ctx
  bpf: enforce types for __arg_ctx-tagged arguments in global subprogs
  bpf: extract bpf_ctx_convert_map logic and make it more reusable
  libbpf: feature-detect arg:ctx tag support in kernel
  ipvs: avoid stat macros calls from preemptible context
  netfilter: nf_tables: reject NFT_SET_CONCAT with not field length description
  netfilter: nf_tables: skip dead set elements in netlink dump
  netfilter: nf_tables: do not allow mismatch field size and set key length
  ...
2024-01-18 17:33:50 -08:00
Linus Torvalds
db5ccb9eb2 cxl for v6.8
- Add support for parsing the Coherent Device Attribute Table (CDAT)
 
 - Add support for calculating a platform CXL QoS class from CDAT data
 
 - Unify the tracing of EFI CXL Events with native CXL Events.
 
 - Add Get Timestamp support
 
 - Miscellaneous cleanups and fixups
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQSbo+XnGs+rwLz9XGXfioYZHlFsZwUCZaHVvAAKCRDfioYZHlFs
 Z3sCAQDPHSsHmj845k4lvKbWjys3eh78MKKEFyTXLQgYhOlsGAEAigQY2ZiSum52
 nwdIgpOOADNt0Iq6yXuLsmn9xvY9bAU=
 =HjCl
 -----END PGP SIGNATURE-----

Merge tag 'cxl-for-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl

Pull CXL (Compute Express Link) updates from Dan Williams:
 "The bulk of this update is support for enumerating the performance
  capabilities of CXL memory targets and connecting that to a platform
  CXL memory QoS class. Some follow-on work remains to hook up this data
  into core-mm policy, but that is saved for v6.9.

  The next significant update is unifying how CXL event records (things
  like background scrub errors) are processed between so called
  "firmware first" and native error record retrieval. The CXL driver
  handler that processes the record retrieved from the device mailbox is
  now the handler for that same record format coming from an EFI/ACPI
  notification source.

  This also contains miscellaneous feature updates, like Get Timestamp,
  and other fixups.

  Summary:

   - Add support for parsing the Coherent Device Attribute Table (CDAT)

   - Add support for calculating a platform CXL QoS class from CDAT data

   - Unify the tracing of EFI CXL Events with native CXL Events.

   - Add Get Timestamp support

   - Miscellaneous cleanups and fixups"

* tag 'cxl-for-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (41 commits)
  cxl/core: use sysfs_emit() for attr's _show()
  cxl/pci: Register for and process CPER events
  PCI: Introduce cleanup helpers for device reference counts and locks
  acpi/ghes: Process CXL Component Events
  cxl/events: Create a CXL event union
  cxl/events: Separate UUID from event structures
  cxl/events: Remove passing a UUID to known event traces
  cxl/events: Create common event UUID defines
  cxl/events: Promote CXL event structures to a core header
  cxl: Refactor to use __free() for cxl_root allocation in cxl_endpoint_port_probe()
  cxl: Refactor to use __free() for cxl_root allocation in cxl_find_nvdimm_bridge()
  cxl: Fix device reference leak in cxl_port_perf_data_calculate()
  cxl: Convert find_cxl_root() to return a 'struct cxl_root *'
  cxl: Introduce put_cxl_root() helper
  cxl/port: Fix missing target list lock
  cxl/port: Fix decoder initialization when nr_targets > interleave_ways
  cxl/region: fix x9 interleave typo
  cxl/trace: Pass UUID explicitly to event traces
  cxl/region: use %pap format to print resource_size_t
  cxl/region: Add dev_dbg() detail on failure to allocate HPA space
  ...
2024-01-18 16:22:43 -08:00
Linus Torvalds
86c4d58a99 iommufd for 6.8
This brings the first of three planned user IO page table invalidation
 operations:
 
  - IOMMU_HWPT_INVALIDATE allows invalidating the IOTLB integrated into the
    iommu itself. The Intel implementation will also generate an ATC
    invalidation to flush the device IOTLB as it unambiguously knows the
    device, but other HW will not.
 
 It goes along with the prior PR to implement userspace IO page tables (aka
 nested translation for VMs) to allow Intel to have full functionality for
 simple cases. An Intel implementation of the operation is provided.
 
 Fix a small bug in the selftest mock iommu driver probe.
 -----BEGIN PGP SIGNATURE-----
 
 iHQEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCZaFiRQAKCRCFwuHvBreF
 YbmgAP9Z0+cAUPKxUKaMRls8YR+gmaOCniSkqBlyrxcib+F/WAD2NPLcBPBRk2o7
 GfXPIrovx96Btf8M40AFdiTEp7LABw==
 =9POe
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd

Pull iommufd updates from Jason Gunthorpe:
 "This brings the first of three planned user IO page table invalidation
  operations:

   - IOMMU_HWPT_INVALIDATE allows invalidating the IOTLB integrated into
     the iommu itself. The Intel implementation will also generate an
     ATC invalidation to flush the device IOTLB as it unambiguously
     knows the device, but other HW will not.

  It goes along with the prior PR to implement userspace IO page tables
  (aka nested translation for VMs) to allow Intel to have full
  functionality for simple cases. An Intel implementation of the
  operation is provided.

  Also fix a small bug in the selftest mock iommu driver probe"

* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd:
  iommufd/selftest: Check the bus type during probe
  iommu/vt-d: Add iotlb flush for nested domain
  iommufd: Add data structure for Intel VT-d stage-1 cache invalidation
  iommufd/selftest: Add coverage for IOMMU_HWPT_INVALIDATE ioctl
  iommufd/selftest: Add IOMMU_TEST_OP_MD_CHECK_IOTLB test op
  iommufd/selftest: Add mock_domain_cache_invalidate_user support
  iommu: Add iommu_copy_struct_from_user_array helper
  iommufd: Add IOMMU_HWPT_INVALIDATE
  iommu: Add cache_invalidate_user op
2024-01-18 15:28:15 -08:00
Linus Torvalds
a2ded784cd tracing updates for 6.8:
- Allow kernel trace instance creation to specify what events are created
   Inside the kernel, a subsystem may create a tracing instance that it can
   use to send events to user space. This sub-system may not care about the
   thousands of events that exist in eventfs. Allow the sub-system to specify
   what sub-systems of events it cares about, and only those events are exposed
   to this instance.
 
 - Allow the ring buffer to be broken up into bigger sub-buffers than just the
   architecture page size. A new tracefs file called "buffer_subbuf_size_kb"
   is created. The user can now specify a minimum size the sub-buffer may be
   in kilobytes. Note, that the implementation currently make the sub-buffer
   size a power of 2 pages (1, 2, 4, 8, 16, ...) but the user only writes in
   kilobyte size, and the sub-buffer will be updated to the next size that
   it will can accommodate it. If the user writes in 10, it will change the
   size to be 4 pages on x86 (16K), as that is the next available size that
   can hold 10K pages.
 
 - Update the debug output when a corrupt time is detected in the ring buffer.
   If the ring buffer detects inconsistent timestamps, there's a debug config
   options that will dump the contents of the meta data of the sub-buffer that
   is used for debugging. Add some more information to this dump that helps
   with debugging.
 
 - Add more timestamp debugging checks (only triggers when the config is enabled)
 
 - Increase the trace_seq iterator to 2 page sizes.
 
 - Allow strings written into tracefs_marker to be larger. Up to just under
   2 page sizes (based on what trace_seq can hold).
 
 - Increase the trace_maker_raw write to be as big as a sub-buffer can hold.
 
 - Remove 32 bit time stamp logic, now that the rb_time_cmpxchg() has been
   removed.
 
 - More selftests were added.
 
 - Some code clean ups as well.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZZ8p3BQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6ql2GAQDZg/zlFEiJHyTfWbCIE8pA3T5xbzKo
 26TNxIZAxJJZpQEAvGFU5Smy14pG6soEoVMp8B6ZOANbqU8VVamhOL+r+Qw=
 =0OYG
 -----END PGP SIGNATURE-----

Merge tag 'trace-v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing updates from Steven Rostedt:

 - Allow kernel trace instance creation to specify what events are
   created

   Inside the kernel, a subsystem may create a tracing instance that it
   can use to send events to user space. This sub-system may not care
   about the thousands of events that exist in eventfs. Allow the
   sub-system to specify what sub-systems of events it cares about, and
   only those events are exposed to this instance.

 - Allow the ring buffer to be broken up into bigger sub-buffers than
   just the architecture page size.

   A new tracefs file called "buffer_subbuf_size_kb" is created. The
   user can now specify a minimum size the sub-buffer may be in
   kilobytes. Note, that the implementation currently make the
   sub-buffer size a power of 2 pages (1, 2, 4, 8, 16, ...) but the user
   only writes in kilobyte size, and the sub-buffer will be updated to
   the next size that it will can accommodate it. If the user writes in
   10, it will change the size to be 4 pages on x86 (16K), as that is
   the next available size that can hold 10K pages.

 - Update the debug output when a corrupt time is detected in the ring
   buffer. If the ring buffer detects inconsistent timestamps, there's a
   debug config options that will dump the contents of the meta data of
   the sub-buffer that is used for debugging. Add some more information
   to this dump that helps with debugging.

 - Add more timestamp debugging checks (only triggers when the config is
   enabled)

 - Increase the trace_seq iterator to 2 page sizes.

 - Allow strings written into tracefs_marker to be larger. Up to just
   under 2 page sizes (based on what trace_seq can hold).

 - Increase the trace_maker_raw write to be as big as a sub-buffer can
   hold.

 - Remove 32 bit time stamp logic, now that the rb_time_cmpxchg() has
   been removed.

 - More selftests were added.

 - Some code clean ups as well.

* tag 'trace-v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (29 commits)
  ring-buffer: Remove stale comment from ring_buffer_size()
  tracing histograms: Simplify parse_actions() function
  tracing/selftests: Remove exec permissions from trace_marker.tc test
  ring-buffer: Use subbuf_order for buffer page masking
  tracing: Update subbuffer with kilobytes not page order
  ringbuffer/selftest: Add basic selftest to test changing subbuf order
  ring-buffer: Add documentation on the buffer_subbuf_order file
  ring-buffer: Just update the subbuffers when changing their allocation order
  ring-buffer: Keep the same size when updating the order
  tracing: Stop the tracing while changing the ring buffer subbuf size
  tracing: Update snapshot order along with main buffer order
  ring-buffer: Make sure the spare sub buffer used for reads has same size
  ring-buffer: Do no swap cpu buffers if order is different
  ring-buffer: Clear pages on error in ring_buffer_subbuf_order_set() failure
  ring-buffer: Read and write to ring buffers with custom sub buffer size
  ring-buffer: Set new size of the ring buffer sub page
  ring-buffer: Add interface for configuring trace sub buffer size
  ring-buffer: Page size per ring buffer
  ring-buffer: Have ring_buffer_print_page_header() be able to access ring_buffer_iter
  ring-buffer: Check if absolute timestamp goes backwards
  ...
2024-01-18 14:35:29 -08:00
Linus Torvalds
ba7dd8570d - Clean up selftest compilation issues, mostly from non-gcc compilers
- Avoid building selftests when not on x86
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEV76QKkVc4xCGURexaDWVMHDJkrAFAmWfAycACgkQaDWVMHDJ
 krD5XQ/+KQ7kITy7jr5fskRQ3uGjk9KUnc3e3qO/SxDLJU9xuXbZbh5pqxZQrkud
 mj0G1LRCk8wsIPU44wP9SKPQRG9AqcsCNSiBkBaTBusHyCXCCvoJ013Mlqyj9ecz
 bvaYuHDuji29eV/0+xuOcv8ELJHFp/UCTQk6azeQIfUs/97/Ho2qMb1oHC7zNjWX
 okJBUj73tLO3EUCW5p9cLw2TgrmOtNa6KlNqj//xoDx03HofjoGyrx2fd8RcmOvY
 Z2v8XEfx/fnpD8vA8SwnCKhWDLHDdwdnLMREy3gykt3PBdmuIKTT5fIggMSMZh6c
 wbxYALGMyE+T0klIfme4k4SJuoitI+Ec/naW/aP3buAgdVFXVw7+KjAwEcOi18Sx
 kSpzvYCwE+sHIZdErk+1Wx/VIWgCBfkAr4hPLgxl5s6nHB2l7lXwGLvaxiBbXSQO
 aMDVD61JwCPI5WuLG8r8iCsCdbRwZVoe4Jm+CkwE69BccZfTXmjOuP0uNTY+cOoH
 Wroe74XGQp4QOvaBhunkzT/ntLaDcQvXGOhaTrYmCvElu1gB25c/FdEIPMTcQgPv
 dFMm49Gzo7v4RZjm/LSavJz6DU/40PRTYMntbKGSiirxAmxwpG8uNz9nUm6Q/4+D
 7uL0be3ey2DzGa+8FoYe9T3i0LbGiRBjlNIEXMjWh1pnD/auUGA=
 =PaVo
 -----END PGP SIGNATURE-----

Merge tag 'x86_sgx_for_6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 SGX updates from Dave Hansen:
 "This time, these are entirely confined to SGX selftests fixes.

  The mini SGX enclave built by the selftests has garnered some
  attention because it stands alone and does not need the sizable
  infrastructure of the official SGX SDK. I think that's why folks are
  suddently interested in cleaning it up.

   - Clean up selftest compilation issues, mostly from non-gcc compilers

   - Avoid building selftests when not on x86"

* tag 'x86_sgx_for_6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  selftests/sgx: Skip non X86_64 platform
  selftests/sgx: Remove incomplete ABI sanitization code in test enclave
  selftests/sgx: Discard unsupported ELF sections
  selftests/sgx: Ensure expected location of test enclave buffer
  selftests/sgx: Ensure test enclave buffer is entirely preserved
  selftests/sgx: Fix linker script asserts
  selftests/sgx: Handle relocations in test enclave
  selftests/sgx: Produce static-pie executable for test enclave
  selftests/sgx: Remove redundant enclave base address save/restore
  selftests/sgx: Specify freestanding environment for enclave compilation
  selftests/sgx: Separate linker options
  selftests/sgx: Include memory clobber for inline asm in test enclave
  selftests/sgx: Fix uninitialized pointer dereferences in encl_get_entry
  selftests/sgx: Fix uninitialized pointer dereference in error path
2024-01-18 13:23:53 -08:00
Jakub Kicinski
4349efc52b bpf-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZalBVQAKCRDbK58LschI
 gyfQAP4+KhkJiJiOXsECo0f3JcuzDgCqEMnylNx0Wujzgs2s9wD+LEjYr8zztqUd
 E9rkjGKUoSYYfarEJ0KKfy6Lv61BlgY=
 =xI6t
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Daniel Borkmann says:

====================
pull-request: bpf 2024-01-18

We've added 10 non-merge commits during the last 5 day(s) which contain
a total of 12 files changed, 806 insertions(+), 51 deletions(-).

The main changes are:

1) Fix an issue in bpf_iter_udp under backward progress which prevents
   user space process from finishing iteration, from Martin KaFai Lau.

2) Fix BPF verifier to reject variable offset alu on registers with a type
   of PTR_TO_FLOW_KEYS to prevent oob access, from Hao Sun.

3) Follow up fixes for kernel- and libbpf-side logic around handling
   arg:ctx tagged arguments of BPF global subprogs, from Andrii Nakryiko.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  libbpf: warn on unexpected __arg_ctx type when rewriting BTF
  selftests/bpf: add tests confirming type logic in kernel for __arg_ctx
  bpf: enforce types for __arg_ctx-tagged arguments in global subprogs
  bpf: extract bpf_ctx_convert_map logic and make it more reusable
  libbpf: feature-detect arg:ctx tag support in kernel
  selftests/bpf: Add test for alu on PTR_TO_FLOW_KEYS
  bpf: Reject variable offset alu on PTR_TO_FLOW_KEYS
  selftests/bpf: Test udp and tcp iter batching
  bpf: Avoid iter->offset making backward progress in bpf_iter_udp
  bpf: iter_udp: Retry with a larger batch size without going back to the previous bucket
====================

Link: https://lore.kernel.org/r/20240118153936.11769-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-18 09:54:24 -08:00
Amit Cohen
b34f4de6d3 selftests: mlxsw: qos_pfc: Adjust the test to support 8 lanes
'qos_pfc' test checks PFC behavior. The idea is to limit the traffic
using a shaper somewhere in the flow of the packets. In this area, the
buffer is smaller than the buffer at the beginning of the flow, so it fills
up until there is no more space left. The test configures there PFC
which is supposed to notice that the headroom is filling up and send PFC
Xoff to indicate the transmitter to stop sending traffic for the priorities
sharing this PG.

The Xon/Xoff threshold is auto-configured and always equal to
2*(MTU rounded up to cell size). Even after sending the PFC Xoff packet,
traffic will keep arriving until the transmitter receives and processes
the PFC packet. This amount of traffic is known as the PFC delay allowance.

Currently the buffer for the delay traffic is configured as 100KB. The
MTU in the test is 10KB, therefore the threshold for Xoff is about 20KB.
This allows 80KB extra to be stored in this buffer.

8-lane ports use two buffers among which the configured buffer is split,
the Xoff threshold then applies to each buffer in parallel.

The test does not take into account the behavior of 8-lane ports, when the
ports are configured to 400Gbps with 8 lanes or 800Gbps with 8 lanes,
packets are dropped and the test fails.

Check if the relevant ports use 8 lanes, in such case double the size of
the buffer, as the headroom is split half-half.

Cc: Shuah Khan <shuah@kernel.org>
Fixes: bfa804784e ("selftests: mlxsw: Add a PFC test")
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/23ff11b7dff031eb04a41c0f5254a2b636cd8ebb.1705502064.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-18 09:48:09 -08:00
Amit Cohen
40cc674baf selftests: mlxsw: qos_pfc: Remove wrong description
In the diagram of the topology, $swp3 and $swp4 are described as 1Gbps
ports. This is wrong information, the test does not configure such speed.

Cc: Shuah Khan <shuah@kernel.org>
Fixes: bfa804784e ("selftests: mlxsw: Add a PFC test")
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/0087e2d416aff7e444d15f7c2958fc1d438dc27e.1705502064.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-18 09:48:09 -08:00
Ido Schimmel
483ae90d8f mlxsw: spectrum_acl_tcam: Fix stack corruption
When tc filters are first added to a net device, the corresponding local
port gets bound to an ACL group in the device. The group contains a list
of ACLs. In turn, each ACL points to a different TCAM region where the
filters are stored. During forwarding, the ACLs are sequentially
evaluated until a match is found.

One reason to place filters in different regions is when they are added
with decreasing priorities and in an alternating order so that two
consecutive filters can never fit in the same region because of their
key usage.

In Spectrum-2 and newer ASICs the firmware started to report that the
maximum number of ACLs in a group is more than 16, but the layout of the
register that configures ACL groups (PAGT) was not updated to account
for that. It is therefore possible to hit stack corruption [1] in the
rare case where more than 16 ACLs in a group are required.

Fix by limiting the maximum ACL group size to the minimum between what
the firmware reports and the maximum ACLs that fit in the PAGT register.

Add a test case to make sure the machine does not crash when this
condition is hit.

[1]
Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: mlxsw_sp_acl_tcam_group_update+0x116/0x120
[...]
 dump_stack_lvl+0x36/0x50
 panic+0x305/0x330
 __stack_chk_fail+0x15/0x20
 mlxsw_sp_acl_tcam_group_update+0x116/0x120
 mlxsw_sp_acl_tcam_group_region_attach+0x69/0x110
 mlxsw_sp_acl_tcam_vchunk_get+0x492/0xa20
 mlxsw_sp_acl_tcam_ventry_add+0x25/0xe0
 mlxsw_sp_acl_rule_add+0x47/0x240
 mlxsw_sp_flower_replace+0x1a9/0x1d0
 tc_setup_cb_add+0xdc/0x1c0
 fl_hw_replace_filter+0x146/0x1f0
 fl_change+0xc17/0x1360
 tc_new_tfilter+0x472/0xb90
 rtnetlink_rcv_msg+0x313/0x3b0
 netlink_rcv_skb+0x58/0x100
 netlink_unicast+0x244/0x390
 netlink_sendmsg+0x1e4/0x440
 ____sys_sendmsg+0x164/0x260
 ___sys_sendmsg+0x9a/0xe0
 __sys_sendmsg+0x7a/0xc0
 do_syscall_64+0x40/0xe0
 entry_SYSCALL_64_after_hwframe+0x63/0x6b

Fixes: c3ab435466 ("mlxsw: spectrum: Extend to support Spectrum-2 ASIC")
Reported-by: Orel Hagag <orelh@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/2d91c89afba59c22587b444994ae419dbea8d876.1705502064.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-18 09:48:08 -08:00
Amit Cohen
6d6eeabcfa mlxsw: spectrum_acl_erp: Fix error flow of pool allocation failure
Lately, a bug was found when many TC filters are added - at some point,
several bugs are printed to dmesg [1] and the switch is crashed with
segmentation fault.

The issue starts when gen_pool_free() fails because of unexpected
behavior - a try to free memory which is already freed, this leads to BUG()
call which crashes the switch and makes many other bugs.

Trying to track down the unexpected behavior led to a bug in eRP code. The
function mlxsw_sp_acl_erp_table_alloc() gets a pointer to the allocated
index, sets the value and returns an error code. When gen_pool_alloc()
fails it returns address 0, we track it and return -ENOBUFS outside, BUT
the call for gen_pool_alloc() already override the index in erp_table
structure. This is a problem when such allocation is done as part of
table expansion. This is not a new table, which will not be used in case
of allocation failure. We try to expand eRP table and override the
current index (non-zero) with zero. Then, it leads to an unexpected
behavior when address 0 is freed twice. Note that address 0 is valid in
erp_table->base_index and indeed other tables use it.

gen_pool_alloc() fails in case that there is no space left in the
pre-allocated pool, in our case, the pool is limited to
ACL_MAX_ERPT_BANK_SIZE, which is read from hardware. When more than max
erp entries are required, we exceed the limit and return an error, this
error leads to "Failed to migrate vregion" print.

Fix this by changing erp_table->base_index only in case of a successful
allocation.

Add a test case for such a scenario. Without this fix it causes
segmentation fault:

$ TESTS="max_erp_entries_test" ./tc_flower.sh
./tc_flower.sh: line 988:  1560 Segmentation fault      tc filter del dev $h2 ingress chain $i protocol ip pref $i handle $j flower &>/dev/null

[1]:
kernel BUG at lib/genalloc.c:508!
invalid opcode: 0000 [#1] PREEMPT SMP
CPU: 6 PID: 3531 Comm: tc Not tainted 6.7.0-rc5-custom-ga6893f479f5e #1
Hardware name: Mellanox Technologies Ltd. MSN4700/VMOD0010, BIOS 5.11 07/12/2021
RIP: 0010:gen_pool_free_owner+0xc9/0xe0
...
Call Trace:
 <TASK>
 __mlxsw_sp_acl_erp_table_other_dec+0x70/0xa0 [mlxsw_spectrum]
 mlxsw_sp_acl_erp_mask_destroy+0xf5/0x110 [mlxsw_spectrum]
 objagg_obj_root_destroy+0x18/0x80 [objagg]
 objagg_obj_destroy+0x12c/0x130 [objagg]
 mlxsw_sp_acl_erp_mask_put+0x37/0x50 [mlxsw_spectrum]
 mlxsw_sp_acl_ctcam_region_entry_remove+0x74/0xa0 [mlxsw_spectrum]
 mlxsw_sp_acl_ctcam_entry_del+0x1e/0x40 [mlxsw_spectrum]
 mlxsw_sp_acl_tcam_ventry_del+0x78/0xd0 [mlxsw_spectrum]
 mlxsw_sp_flower_destroy+0x4d/0x70 [mlxsw_spectrum]
 mlxsw_sp_flow_block_cb+0x73/0xb0 [mlxsw_spectrum]
 tc_setup_cb_destroy+0xc1/0x180
 fl_hw_destroy_filter+0x94/0xc0 [cls_flower]
 __fl_delete+0x1ac/0x1c0 [cls_flower]
 fl_destroy+0xc2/0x150 [cls_flower]
 tcf_proto_destroy+0x1a/0xa0
...
mlxsw_spectrum3 0000:07:00.0: Failed to migrate vregion
mlxsw_spectrum3 0000:07:00.0: Failed to migrate vregion

Fixes: f465261aa1 ("mlxsw: spectrum_acl: Implement common eRP core")
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/4cfca254dfc0e5d283974801a24371c7b6db5989.1705502064.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-18 09:48:08 -08:00