Commit Graph

2755 Commits

Author SHA1 Message Date
Paul E. McKenney
c7dcf8106f rcu-tasks: Fix synchronize_rcu_tasks_trace() header comment
The synchronize_rcu_tasks_trace() header comment incorrectly claims that
any number of things delimit RCU Tasks Trace read-side critical sections,
when in fact only rcu_read_lock_trace() and rcu_read_unlock_trace() do so.
This commit therefore fixes this comment, and, while in the area, fixes
a typo in the rcu_read_lock_trace() header comment.

Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-06-29 12:00:46 -07:00
Paul E. McKenney
e13ef442fe refperf: Add test for RCU Tasks readers
This commit adds testing for RCU Tasks readers to the refperf module.
This also applies to RCU Rude readers, as both flavors have empty
(as in non-existent) read-side markers.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-06-29 12:00:46 -07:00
Paul E. McKenney
72bb749e70 refperf: Add test for RCU Tasks Trace readers.
This commit adds testing for RCU Tasks Trace readers to the refperf module.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-06-29 12:00:45 -07:00
Paul E. McKenney
918b351d96 refperf: Change readdelay module parameter to nanoseconds
The current units of microseconds are too coarse, so this commit
changes the units to nanoseconds.  However, ndelay() is used only for the
sub-microsecond remainder, with udelay() being used for the whole
microseconds.  For example,
setting refperf.readdelay=1500 results in a udelay(1) followed by an
ndelay(500).
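A plausible sketch of such a split helper (the name and exact form are
illustrative, not necessarily the refperf code):

	/* Split a nanosecond-granularity delay; needs <linux/delay.h>. */
	static void un_delay(const int us, const int ns)
	{
		if (us)
			udelay(us);	/* whole microseconds */
		ndelay(ns);		/* sub-microsecond remainder */
	}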

Suggested-by: Akira Yokosawa <akiyks@gmail.com>
[ paulmck: Abstracted delay per Akira feedback and move from 80 to 100 lines. ]
[ paulmck: Fix names as suggested by kbuild test robot. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-06-29 12:00:45 -07:00
Arnd Bergmann
7c944d7c67 refperf: Work around 64-bit division
A 64-bit division was introduced in refperf, breaking compilation
on all 32-bit architectures:

kernel/rcu/refperf.o: in function `main_func':
refperf.c:(.text+0x57c): undefined reference to `__aeabi_uldivmod'

Fix this by using div_u64 to mark the expensive operation.
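For example, a division of this kind is written with the div_u64() helper
from <linux/math64.h> (variable names are illustrative):

	u64 avg_ns = div_u64(total_ns, (u32)loops);	/* instead of total_ns / loops */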

[ paulmck: Update primitive and format per Nathan Chancellor. ]
Fixes: bd5b16d6c88d ("refperf: Allow decimal nanoseconds")
Reported-by: kbuild test robot <lkp@intel.com>
Reported-by: Valdis Klētnieks <valdis.kletnieks@vt.edu>
Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-06-29 12:00:45 -07:00
Paul E. McKenney
4dd72a338a refperf: Adjust refperf.loop default value
With the various measurement optimizations, 10,000 loops normally
suffices.  This commit therefore reduces the refperf.loops default value
from 10,000,000 to 10,000.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-06-29 12:00:45 -07:00
Paul E. McKenney
b4d1e34f65 refperf: Add read-side delay module parameter
This commit adds a refperf.readdelay module parameter that controls the
duration of each critical section.  This parameter allows gathering data
showing how the performance differences between the various primitives
vary with critical-section length.

Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-06-29 12:00:45 -07:00
Paul E. McKenney
96af866959 refperf: Simplify initialization-time wakeup protocol
This commit moves the reader-launch wait loop from ref_perf_init()
to main_func(), removing one layer of wakeup and allowing slightly
faster system boot.

Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-06-29 12:00:45 -07:00
Paul E. McKenney
6efb063408 refperf: Label experiment-number column "Runs"
The experiment-number column is currently labeled "Threads", which is
misleading at best.  This commit therefore relabels it as "Runs", and
adjusts the scripts accordingly.

Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-06-29 12:00:45 -07:00
Paul E. McKenney
2db0bda384 refperf: Add warmup and cooldown processing phases
This commit causes all the readers to start running unmeasured load
until all readers have done at least one such run (thus having warmed
up), then run the measured load, and then run unmeasured load until all
readers have completed their measured load.  This approach avoids any
thread running measured load while other readers are idle.

Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-06-29 12:00:45 -07:00
Paul E. McKenney
86e0da2bb8 refperf: More closely synchronize reader start times
Currently, readers are awakened individually.  On most systems, this
results in significant wakeup delay from one reader to the next, which
can result in the first and last reader having sole access to the
synchronization primitive in question.  If that synchronization primitive
involves shared memory, those readers will rack up a huge number of
operations in a very short time, causing large perturbations in the
results.

This commit therefore has the readers busy-wait after being awakened,
and uses a new n_started variable to synchronize their start times.
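A minimal sketch of the busy-wait idea (names taken from the description
above, details may differ):

	/* Each reader announces itself, then spins until all have arrived. */
	atomic_dec(&n_started);
	while (atomic_read(&n_started))
		cpu_relax();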

Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-06-29 12:00:45 -07:00
Paul E. McKenney
af2789db13 refperf: Convert reader_task structure's "start" field to int
This commit converts the reader_task structure's "start" field to int
in order to demote a full barrier to an smp_load_acquire() and also to
simplify the code a bit.  While in the area, and to enlist the compiler's
help in ensuring that nothing was missed, the field's name was changed
to start_reader.

Also while in the area, change the main_func() store to use
smp_store_release() to further fortify against wait/wake races.
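A sketch of the resulting wait/wake pairing (field and variable names as
described above, otherwise illustrative):

	/* Reader: wait for the start flag with acquire semantics. */
	while (!smp_load_acquire(&rt->start_reader))
		cpu_relax();

	/* main_func(): publish the flag with release semantics. */
	smp_store_release(&reader_tasks[r].start_reader, 1);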

Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-06-29 12:00:45 -07:00
Paul E. McKenney
b864f89ff6 refperf: Tune reader measurement interval
This commit moves a printk() out of the measurement interval, converts
an atomic_dec()/atomic_read() pair to atomic_dec_and_test(), and adds
an smp_mb__before_atomic() to avoid potential wake/wait hangs.  These
changes have the added benefit of reducing the number of loops required
for amortizing loop overhead for CONFIG_PREEMPT=n RCU measurements from
1,000,000 to 10,000.  This reduction in turn shortens the test, reducing
the probability of interference.
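A sketch of the wake-side ordering being described (names illustrative):

	smp_mb__before_atomic();	/* order prior accesses before the decrement */
	if (atomic_dec_and_test(&nreaders_exp))
		wake_up(&main_wq);	/* last reader wakes the waiter */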

Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-06-29 12:00:45 -07:00
Paul E. McKenney
2990750bce refperf: Make functions static
Because the reset_readers() and process_durations() functions are used
only within kernel/rcu/refperf.c, this commit makes them static.

Reported-by: kbuild test robot <lkp@intel.com>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-06-29 12:00:45 -07:00
Paul E. McKenney
2e90de76f2 refperf: Dynamically allocate thread-summary output buffer
Currently, the buffer used to accumulate the thread-summary output is
fixed size, which will cause problems if someone decides to run on a large
number of CPUs.  This commit therefore dynamically allocates this buffer.

[ paulmck: Fix memory allocation as suggested by KASAN. ]
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-06-29 12:00:44 -07:00
Paul E. McKenney
f518f154ec refperf: Dynamically allocate experiment-summary output buffer
Currently, the buffer used to accumulate the experiment-summary output
is fixed size, which will cause problems if someone decides to run
one hundred experiments.  This commit therefore dynamically allocates
this buffer.

Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-06-29 12:00:44 -07:00
Paul E. McKenney
dbf28efdae refperf: Provide module parameter to specify number of experiments
The current code uses the number of threads both to limit the number
of threads and to specify the number of experiments, but also varies
the number of threads as the experiments progress.  This commit takes
a different approach by adding a refperf.nruns module parameter that
specifies the number of experiments, and furthermore uses the same
number of threads for each experiment.

Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-06-29 12:00:44 -07:00
Paul E. McKenney
8fc28783a0 refperf: Convert nreaders to a module parameter
This commit converts nreaders to a module parameter, with the default
of -1 specifying the old behavior of using 75% of the CPUs.

Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-06-29 12:00:44 -07:00
Paul E. McKenney
83b88c86da refperf: Allow decimal nanoseconds
The CONFIG_PREEMPT=n rcu_read_lock()/rcu_read_unlock() pair's overhead,
even including loop overhead, is far less than one nanosecond.
Since logscale plots are not all that happy with zero values, provide
picoseconds as decimals.

Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-06-29 12:00:44 -07:00
Paul E. McKenney
75dd8efef5 refperf: Hoist function-pointer calls out of the loop
Current runs show PREEMPT=n rcu_read_lock()/rcu_read_unlock() pairs
consuming between 20 and 30 nanoseconds, when in fact the actual value is
zero, give or take the barrier() asm's effect on compiler optimizations.
The additional overhead is caused by function calls through pointers
(especially in these days of Spectre mitigations) and perhaps also
needless argument passing, a non-const loop limit, and an upcounting loop.

This commit therefore combines the ->readlock() and ->readunlock()
function pointers into a single ->readsection() function pointer that
takes the loop count as a const parameter and keeps any data passed
from the read-lock to the read-unlock internal to this new function.

These changes reduce the measured overhead of the aforementioned
PREEMPT=n rcu_read_lock()/rcu_read_unlock() pairs from between 20 and
30 nanoseconds to somewhere south of 500 picoseconds.
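A sketch of the combined read-side operation for the RCU case (function
name illustrative):

	static void ref_rcu_read_section(const int nloops)
	{
		int i;

		for (i = nloops; i >= 0; i--) {
			rcu_read_lock();
			rcu_read_unlock();
		}
	}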

Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-06-29 12:00:44 -07:00
Paul E. McKenney
777a54c908 refperf: Add holdoff parameter to allow CPUs to come online
This commit adds a refperf module parameter named "holdoff" that
defaults to 10 seconds if refperf is built in and to zero otherwise.
The assumption is that all the CPUs are online by the time that the
modprobe and insmod commands are going to do anything, and that normal
systems will have all the CPUs online within ten seconds.

Larger systems may take many tens of seconds or even minutes to get
to this point, hence this being a module parameter instead of being a
hard-coded constant.

Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-06-29 12:00:44 -07:00
Paul E. McKenney
708cda3165 rcuperf: Add comments explaining the high reader overhead
This commit adds comments explaining why the readers have otherwise insane
levels of measurement overhead, namely that they are intended as a test
load for update-side performance measurements, not as a straight-up
read-side performance test.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-06-29 12:00:44 -07:00
Joel Fernandes (Google)
653ed64b01 refperf: Add a test to measure performance of read-side synchronization
Add a test for comparing the performance of RCU with various read-side
synchronization mechanisms. The test has proved useful for collecting
data and performing these comparisons.

Currently, RCU, SRCU, reader-writer lock, reader-writer semaphore, and
reference counting can be measured using the refperf.perf_type module
parameter.  Each invocation of the test measures the performance of one
specific mechanism.

The maximum number of CPUs to concurrently run readers on is chosen by
the test itself and is 75% of the total number of CPUs. So if you had 24
CPUs, the test runs with a maximum of 18 parallel readers.

A number of experiments are conducted, and in each experiment the
number of readers is increased by 1, up to the 75%-of-CPUs mark. During
each experiment, all readers execute an empty loop with refperf.loops
iterations and time the total loop duration, which is then averaged.

Example output with the parameters "refperf.perf_type=srcu refperf.loops=2000000":

[    3.347133] srcu-ref-perf:
[    3.347133] Threads  Time(ns)
[    3.347133] 1        36
[    3.347133] 2        34
[    3.347133] 3        34
[    3.347133] 4        34
[    3.347133] 5        33
[    3.347133] 6        33
[    3.347133] 7        33
[    3.347133] 8        33
[    3.347133] 9        33
[    3.347133] 10       33
[    3.347133] 11       33
[    3.347133] 12       33
[    3.347133] 13       33
[    3.347133] 14       33
[    3.347133] 15       32
[    3.347133] 16       33
[    3.347133] 17       33
[    3.347133] 18       34

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-06-29 12:00:44 -07:00
Joel Fernandes (Google)
7e866460cc rcuperf: Remove useless while loops around wait_event
wait_event() already retries if its condition is not satisfied after a
wakeup, so the surrounding while loops are redundant.  This commit
removes them from the rcuperf test.
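Illustration of the redundancy (condition and queue names illustrative):

	/* Before: the outer loop re-checks what wait_event() already re-checks. */
	while (!done)
		wait_event(wq, done);

	/* After: */
	wait_event(wq, done);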

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-06-29 12:00:44 -07:00
Paul E. McKenney
30d8aa5128 rcu-tasks: Fix code-style issues
This commit declares trc_n_readers_need_end and trc_wait static and
replaces a "&" with "&&".  The "&" happened to work because the values
are bool, but accidents waiting to happen and all that...

Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-06-29 12:00:22 -07:00
Paul E. McKenney
8344496e8b rcu-tasks: Conditionally compile show_rcu_tasks_gp_kthreads()
The show_rcu_tasks_gp_kthreads() function is not invoked by Tiny RCU,
but is nevertheless defined in Tiny RCU builds that enable Tasks Trace
RCU.  This commit therefore conditionally compiles this function so
that it is defined only in builds that actually use it.

Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-06-29 12:00:22 -07:00
Paul E. McKenney
5b3cc99bed rcu-tasks: Add #include of rcupdate_trace.h to update.c
Although this is in some strict sense unnecessary, it is good to allow
the compiler to compare the function declaration with its definition.
This commit therefore adds a #include of linux/rcupdate_trace.h to
kernel/rcu/update.c.

Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-06-29 12:00:22 -07:00
Paul E. McKenney
04a3c5aa7a rcu-tasks: Make rcu_tasks_postscan() be static
The rcu_tasks_postscan() function is not used outside of RCU's tasks.h
file, so this commit makes it static.

Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-06-29 12:00:22 -07:00
Paul E. McKenney
ea6eed9f7d rcu-tasks: Convert sleeps to idle priority
This commit converts the long-standing schedule_timeout_interruptible()
and schedule_timeout_uninterruptible() calls used by the various Tasks
RCU's grace-period kthreads to schedule_timeout_idle().  This conversion
avoids polluting the load-average with Tasks-RCU-related sleeping.
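For example (timeout value illustrative):

	schedule_timeout_idle(HZ / 10);	/* instead of schedule_timeout_interruptible(HZ / 10) */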

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-06-29 12:00:22 -07:00
Uladzislau Rezki (Sony)
3042f83f19 rcu: Support reclaim for head-less object
Update the kvfree_call_rcu() function with head-less support.
This allows RCU to reclaim objects without an embedded rcu_head.

tree-RCU:
We introduce two chains of arrays: one stores SLAB-backed pointers and
the other stores vmalloc pointers.  Storage in either of these arrays
does not require embedding an rcu_head within the object.
embedding an rcu_head within the object.

Maintaining the arrays may become impossible due to high memory
pressure. For such cases there is an emergency path. Objects with
rcu_head inside are just queued on a backup rcu_head list. Later on
that list is drained. As for the head-less variant, as the current
context can sleep, the following emergency measures are applied:
   a) Synchronously wait until a grace period has elapsed.
   b) Call kvfree().

tiny-RCU:
For double-argument calls, there are no changes in behavior. For
single-argument calls, kvfree() is invoked directly on the current
stack after a synchronize_rcu() call. Note that for tiny-RCU, any
call to synchronize_rcu() is actually a quiescent state, therefore
it does nothing.
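A sketch of the head-less emergency path described above, for a context
that can sleep (pointer name illustrative):

	/* a) Wait for a grace period, then b) free the object directly. */
	synchronize_rcu();
	kvfree(ptr);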

Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Co-developed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-06-29 11:59:26 -07:00
Uladzislau Rezki (Sony)
c408b215f5 rcu: Rename *_kfree_callback/*_kfree_rcu_offset/kfree_call_*
The following changes are introduced:

1. Rename rcu_invoke_kfree_callback() to rcu_invoke_kvfree_callback(),
as well as the associated trace events, so rcu_kfree_callback()
becomes rcu_kvfree_callback(). The reason is to align with the
kvfree() notation.

2. Rename __is_kfree_rcu_offset to __is_kvfree_rcu_offset. All RCU
paths now use kvfree() instead of kfree(), thus rename it.

3. Rename kfree_call_rcu() to kvfree_call_rcu(), because it is now
capable of freeing vmalloc() memory. Do the same with the
__kfree_rcu() macro, which becomes __kvfree_rcu(), for the same
reason.

Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Co-developed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-06-29 11:59:25 -07:00
Uladzislau Rezki (Sony)
64d1d06ccb rcu/tiny: support vmalloc in tiny-RCU
Replace kfree() with kvfree() in rcu_reclaim_tiny().
This makes it possible to release either SLAB or vmalloc
objects after a GP.

Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-06-29 11:59:25 -07:00
Uladzislau Rezki (Sony)
5f3c8d6204 rcu/tree: Maintain separate array for vmalloc ptrs
To do so, we use an array of kvfree_rcu_bulk_data structures.
It consists of two elements:
 - index number 0 corresponds to slab pointers.
 - index number 1 corresponds to vmalloc pointers.

Keeping vmalloc pointers separated from slab pointers makes
it possible to invoke the right freeing API for the right
kind of pointer.

It also prepares us for future headless support for vmalloc
and SLAB objects. Such objects cannot be queued on a linked
list and are instead placed directly into an array.
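A sketch of the per-index freeing this enables (struct and field names
are assumptions, not verified against the source):

	if (idx == 0)				/* slab pointers */
		kfree_bulk(bnode->nr_records, bnode->records);
	else					/* vmalloc pointers */
		for (i = 0; i < bnode->nr_records; i++)
			vfree(bnode->records[i]);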

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Co-developed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-06-29 11:59:25 -07:00
Uladzislau Rezki (Sony)
53c72b590b rcu/tree: cache specified number of objects
In order to reduce the dynamic need for pages in kfree_rcu(),
pre-allocate a configurable number of pages per CPU and link
them in a list. When kfree_rcu() reclaims objects, the object's
container page is cached into a list instead of being released
to the low-level page allocator.

Such an approach provides O(1) access to free pages while also
reducing the number of requests to the page allocator. It also
ensures that the kfree_rcu() code has free pages available during
a low memory condition.

A read-only sysfs parameter (rcu_min_cached_objs) reflects the
minimum number of allowed cached pages per CPU.

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-06-29 11:59:25 -07:00
Sebastian Andrzej Siewior
69f08d3999 rcu/tree: Use static initializer for krc.lock
The per-CPU variable is initialized at runtime in
kfree_rcu_batch_init(). This function is invoked before
'rcu_scheduler_active' is set to 'RCU_SCHEDULER_RUNNING'.
After the initialisation, '->initialized' is set to true.

The raw_spin_lock is only acquired if '->initialized' is
set to true. The workqueue item is only used if 'rcu_scheduler_active'
is set to RCU_SCHEDULER_RUNNING, which happens after initialisation.

Use a static initializer for krc.lock and remove the runtime
initialisation of the lock. Since the lock can now always be
acquired, remove the '->initialized' check.
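A sketch of what such a static initializer can look like (struct and
variable names assumed from the log):

	static DEFINE_PER_CPU(struct kfree_rcu_cpu, krc) = {
		.lock = __RAW_SPIN_LOCK_UNLOCKED(krc.lock),
	};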

Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-06-29 11:59:25 -07:00
Uladzislau Rezki (Sony)
952371d6fc rcu/tree: Move kfree_rcu_cpu locking/unlocking to separate functions
Introduce helpers to lock and unlock per-cpu "kfree_rcu_cpu"
structures. That will make kfree_call_rcu() more readable
and prevent programming errors.
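A hedged sketch of the lock helper's shape (the exact body may differ):

	static struct kfree_rcu_cpu *krc_this_cpu_lock(unsigned long *flags)
	{
		struct kfree_rcu_cpu *krcp;

		local_irq_save(*flags);		/* needed before this_cpu_ptr() */
		krcp = this_cpu_ptr(&krc);
		raw_spin_lock(&krcp->lock);
		return krcp;
	}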

Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-06-29 11:59:25 -07:00
Uladzislau Rezki (Sony)
3af8486281 rcu/tree: Simplify KFREE_BULK_MAX_ENTR macro
We can simplify the KFREE_BULK_MAX_ENTR macro and get rid of the
magic numbers which were used to make the structure exactly one
page in size.
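A sketch of the page-sized formulation (the struct name as used at the
time is an assumption):

	#define KFREE_BULK_MAX_ENTR \
		((PAGE_SIZE - sizeof(struct kfree_rcu_bulk_data)) / sizeof(void *))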

Suggested-by: Boqun Feng <boqun.feng@gmail.com>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-06-29 11:59:25 -07:00
Joel Fernandes (Google)
446044eb9c rcu/tree: Make debug_objects logic independent of rcu_head
kfree_rcu()'s debug_objects logic uses the address of the object's
embedded rcu_head to queue/unqueue. Instead of this, make use of the
object's address itself as preparation for future headless kfree_rcu()
support.

Reviewed-by: Uladzislau Rezki <urezki@gmail.com>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-06-29 11:59:25 -07:00
Uladzislau Rezki (Sony)
594aa5975b rcu/tree: Repeat the monitor if any free channel is busy
It is possible that one of the channels cannot be detached
because its free channel is busy and previously queued data
has not been processed yet. On the other hand, another
channel can be successfully detached causing the monitor
work to stop.

Prevent that by rescheduling the monitor work if there are
any channels in the pending state after a detach attempt.

Fixes: 34c8817455 ("rcu: Support kfree_bulk() interface in kfree_rcu()")
Acked-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-06-29 11:59:25 -07:00
Joel Fernandes (Google)
4d29194118 rcu/tree: Skip entry into the page allocator for PREEMPT_RT
To keep the kfree_rcu() code working in purely atomic sections on RT,
such as non-threaded IRQ handlers and raw spinlock sections, avoid
calling into the page allocator which uses sleeping locks on RT.

In fact, even if the caller is preemptible, the kfree_rcu() code is
not, as the krcp->lock is a raw spinlock.

Calling into the page allocator is optional and avoiding it should be
OK, especially with the page pre-allocation support in future patches.
Such pre-allocation would further avoid the need for a dynamically
allocated page in the first place.
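A sketch of the gate being described (surrounding code illustrative):

	/* Never enter the page allocator while holding the raw krcp->lock on RT. */
	if (IS_ENABLED(CONFIG_PREEMPT_RT))
		return false;	/* fall back to the rcu_head path */

	bnode = (struct kfree_rcu_bulk_data *)
		__get_free_page(GFP_NOWAIT | __GFP_NOWARN);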

Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Uladzislau Rezki <urezki@gmail.com>
Co-developed-by: Uladzislau Rezki <urezki@gmail.com>
Signed-off-by: Uladzislau Rezki <urezki@gmail.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-06-29 11:59:25 -07:00
Joel Fernandes (Google)
8ac88f7177 rcu/tree: Keep kfree_rcu() awake during lock contention
On PREEMPT_RT kernels, the krcp spinlock gets converted to an rt-mutex
and causes kfree_rcu() callers to sleep. This makes it unusable for
callers in purely atomic sections such as non-threaded IRQ handlers and
raw spinlock sections. Fix it by converting the spinlock to a raw
spinlock.

Vetting all code paths, there is no reason to believe that the raw
spinlock will hurt RT latencies as it is not held for a long time.

Cc: bigeasy@linutronix.de
Cc: Uladzislau Rezki <urezki@gmail.com>
Reviewed-by: Uladzislau Rezki <urezki@gmail.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-06-29 11:59:25 -07:00
Mauro Carvalho Chehab
8e11690d2f rcu: Fix a kernel-doc warnings for "count"
There are some kernel-doc warnings:

	./kernel/rcu/tree.c:2915: warning: Function parameter or member 'count' not described in 'kfree_rcu_cpu'

This commit therefore moves the comment for "count" to the kernel-doc
markup.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-06-29 11:59:24 -07:00
Randy Dunlap
c3cb47a6cc kernel/rcu/tree.c: Fix kernel-doc warnings
Fix kernel-doc warning:

../kernel/rcu/tree.c:959: warning: Excess function parameter 'irq' description in 'rcu_nmi_enter'

Fixes: cf7614e13c ("rcu: Refactor rcu_{nmi,irq}_{enter,exit}()")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-06-29 11:58:51 -07:00
Wei Yang
7a0c2b0940 rcu: grpnum just records group number
The ->grpnum field in the rcu_node structure contains the bit position
in this structure's parent's bitmasks, which is not the CPU number.
This commit therefore adjusts this field's comment accordingly.

Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-06-29 11:58:51 -07:00
Wei Yang
a2dae43088 rcu: grplo/grphi just records CPU number
The ->grplo and ->grphi fields store the lowest and highest CPU numbers
covered by an rcu_node structure, which are not group numbers.
This commit therefore adjusts these fields' comments to match reality.

Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-06-29 11:58:51 -07:00
Wei Yang
00943a609d rcu: gp_max is protected by root rcu_node's lock
Because gp_max is protected by root rcu_node's lock, this commit moves
the gp_max definition to the region of the rcu_node structure containing
fields protected by this lock.

Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-06-29 11:58:51 -07:00
Peter Enderborg
c6dfd72b7a rcu: Stop shrinker loop
The count and scan can be separated in time, and there is a fair chance
that all work is already done when the scan starts, which might in turn
result in a needless retry.  This commit therefore avoids this retry by
returning SHRINK_STOP.

Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Peter Enderborg <peter.enderborg@sony.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-06-29 11:58:51 -07:00
Jules Irenge
e40bb92111 rcu: Replace 1 with true
Coccinelle reports a warning

WARNING: Assignment of 0/1 to bool variable

The root cause is that the variable lastphase is a bool, but is
initialised with integer 1.  This commit therefore replaces the 1 with
a true.

Signed-off-by: Jules Irenge <jbi.octave@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-06-29 11:58:51 -07:00
Paul E. McKenney
04b25a495b rcu: Mark rcu_nmi_enter() call to rcu_cleanup_after_idle() noinstr
The objtool complains about the call to rcu_cleanup_after_idle() from
rcu_nmi_enter(), so this commit adds instrumentation_begin() before that
call and instrumentation_end() after it.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-06-29 11:58:50 -07:00
Paul E. McKenney
55fbe86ef3 rcu: Remove initialized but unused rnp from check_slow_task()
This commit removes the variable rnp from check_slow_task(), which
is defined, assigned to, but not otherwise used.

Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-06-29 11:58:50 -07:00
Lihao Liang
360fbbb489 rcu: Update comment from rsp->rcu_gp_seq to rsp->gp_seq
Signed-off-by: Lihao Liang <lihaoliang@google.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-06-29 11:58:50 -07:00
Paul E. McKenney
68c2f27e01 rcu: Expedited grace-period sleeps to idle priority
This commit converts the schedule_timeout_uninterruptible() call used
by RCU's expedited grace-period processing to schedule_timeout_idle().
This conversion avoids polluting the load-average with RCU-related
sleeping.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-06-29 11:58:50 -07:00
Paul E. McKenney
f5ca34643b rcu: No-CBs-related sleeps to idle priority
This commit converts the schedule_timeout_interruptible() call used by
RCU's no-CBs grace-period kthreads to schedule_timeout_idle().  This
conversion avoids polluting the load-average with RCU-related sleeping.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-06-29 11:58:50 -07:00
Paul E. McKenney
a9352f72d6 rcu: Priority-boost-related sleeps to idle priority
This commit converts the long-standing schedule_timeout_interruptible()
call used by RCU's priority-boosting kthreads to schedule_timeout_idle().
This conversion avoids polluting the load-average with RCU-related
sleeping.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-06-29 11:58:50 -07:00
Paul E. McKenney
77865dea25 rcu: Grace-period-kthread related sleeps to idle priority
This commit converts the long-standing schedule_timeout_interruptible()
and schedule_timeout_uninterruptible() calls used by RCU's grace-period
kthread to schedule_timeout_idle().  This conversion avoids polluting
the load-average with RCU-related sleeping.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-06-29 11:58:49 -07:00
Paul E. McKenney
f8466f9468 rcu: Add comment documenting rcu_callback_map's purpose
The rcu_callback_map lockdep_map structure was added back in 2013, but
its purpose has become obscure.  This commit therefore documents that the
purpose of rcu_callback_map is, in the words of commit 24ef659a85 ("rcu:
Provide better diagnostics for blocking in RCU callback functions"),
to help lockdep to tie an "inappropriate voluntary context switch back
to the fact that the function is being invoked from within a callback."

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-06-29 11:58:49 -07:00
Paul E. McKenney
e816d56fad rcu: Add callbacks-invoked counters
This commit adds a count of the callbacks invoked to the per-CPU rcu_data
structure.  This count is printed by show_rcu_gp_kthreads(), which
is invoked by rcutorture and the RCU CPU stall-warning code.  It is also
intended for use by drgn.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-06-29 11:58:49 -07:00
Wei Yang
abfce04148 rcu: Simplify the calculation of rcu_state.ncpus
There is only 1 bit set in mask, which means that the only difference
between oldmask and the new one will be at the position where the bit is
set in mask.  This commit therefore updates rcu_state.ncpus by checking
whether the bit in mask is already set in rnp->expmaskinitnext.

Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-06-29 11:58:49 -07:00
Wei Yang
7ee880b7bf rcu: Initialize and destroy rcu_synchronize only when necessary
The __wait_rcu_gp() function unconditionally initializes and cleans up
each element of rs_array[], whether used or not.  This is slightly
wasteful and rather confusing, so this commit skips both initialization
and cleanup for duplicate callback functions.

Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-06-29 11:58:49 -07:00
Mauro Carvalho Chehab
f2286ab995 docs: RCU: Convert stallwarn.txt to ReST
- Add a SPDX header;
- Adjust document and section titles;
- Fix list markups;
- Some whitespace fixes and new line breaks;
- Mark literal blocks as such;
- Add it to RCU/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-06-29 11:58:11 -07:00
Mauro Carvalho Chehab
43cb5451df docs: RCU: Convert torture.txt to ReST
- Add a SPDX header;
- Adjust document and section titles;
- Some whitespace fixes and new line breaks;
- Mark literal blocks as such;
- Add it to RCU/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-06-29 11:58:11 -07:00
Peter Zijlstra
b58e733fd7 rcu: Fixup noinstr warnings
A KCSAN build revealed that we have explicit annotations through
atomic_*() usage; switch to arch_atomic_*() for the respective functions.

vmlinux.o: warning: objtool: rcu_nmi_exit()+0x4d: call to __kcsan_check_access() leaves .noinstr.text section
vmlinux.o: warning: objtool: rcu_dynticks_eqs_enter()+0x25: call to __kcsan_check_access() leaves .noinstr.text section
vmlinux.o: warning: objtool: rcu_nmi_enter()+0x4f: call to __kcsan_check_access() leaves .noinstr.text section
vmlinux.o: warning: objtool: rcu_dynticks_eqs_exit()+0x2a: call to __kcsan_check_access() leaves .noinstr.text section
vmlinux.o: warning: objtool: __rcu_is_watching()+0x25: call to __kcsan_check_access() leaves .noinstr.text section

Additionally, without the NOP in instrumentation_begin(), objtool would
not detect the lack of the 'else instrumentation_begin();' branch in
rcu_nmi_enter().

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-06-25 08:24:32 -07:00
Peter Zijlstra
8b700983de sched: Remove sched_set_*() return value
Ingo suggested that since the new sched_set_*() functions are
implemented using the 'nocheck' variants, they really shouldn't ever
fail, so remove the return value.

Cc: axboe@kernel.dk
Cc: daniel.lezcano@linaro.org
Cc: sudeep.holla@arm.com
Cc: airlied@redhat.com
Cc: broonie@kernel.org
Cc: paulmck@kernel.org
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
2020-06-15 14:10:26 +02:00
Peter Zijlstra
ce1c8afd3f sched,rcutorture: Convert to sched_set_fifo_low()
Because SCHED_FIFO is a broken scheduler model (see previous patches),
take away the priority field; the kernel can't possibly make an
informed decision.

Effectively no change.

Cc: paulmck@kernel.org
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
2020-06-15 14:10:25 +02:00
Peter Zijlstra
b1433395c4 sched,rcuperf: Convert to sched_set_fifo_low()
Because SCHED_FIFO is a broken scheduler model (see previous patches),
take away the priority field; the kernel can't possibly make an
informed decision.

Effectively no change.

Cc: paulmck@kernel.org
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
2020-06-15 14:10:24 +02:00
Linus Torvalds
081096d98b TTY/Serial driver updates for 5.8-rc1
Here is the tty and serial driver updates for 5.8-rc1
 
 Nothing huge at all, just a lot of little serial driver fixes, updates
 for new devices and features, and other small things.  Full details are
 in the shortlog.
 
 Note, you will get a conflict merging with your tree in the
 Documentation/devicetree/bindings/serial/rs485.yaml file, but it should
 be pretty obvious what to do.  If not, I'm sure Rob will clean it all up
 afterwards :)
 
 All of these have been in linux-next with no issues for a while.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Merge tag 'tty-5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty/serial driver updates from Greg KH:
 "Here is the tty and serial driver updates for 5.8-rc1

  Nothing huge at all, just a lot of little serial driver fixes, updates
  for new devices and features, and other small things. Full details are
  in the shortlog.

  All of these have been in linux-next with no issues for a while"

* tag 'tty-5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (67 commits)
  tty: serial: qcom_geni_serial: Add 51.2MHz frequency support
  tty: serial: imx: clear Ageing Timer Interrupt in handler
  serial: 8250_fintek: Add F81966 Support
  sc16is7xx: Add flag to activate IrDA mode
  dt-bindings: sc16is7xx: Add flag to activate IrDA mode
  serial: 8250: Support rs485 bus termination GPIO
  serial: 8520_port: Fix function param documentation
  dt-bindings: serial: Add binding for rs485 bus termination GPIO
  vt: keyboard: avoid signed integer overflow in k_ascii
  serial: 8250: Enable 16550A variants by default on non-x86
  tty: hvc_console, fix crashes on parallel open/close
  serial: imx: Initialize lock for non-registered console
  sc16is7xx: Read the LSR register for basic device presence check
  sc16is7xx: Allow sharing the IRQ line
  sc16is7xx: Use threaded IRQ
  sc16is7xx: Always use falling edge IRQ
  tty: n_gsm: Fix bogus i++ in gsm_data_kick
  tty: n_gsm: Remove unnecessary test in gsm_print_packet()
  serial: stm32: add no_console_suspend support
  tty: serial: fsl_lpuart: Use __maybe_unused instead of #if CONFIG_PM_SLEEP
  ...
2020-06-07 09:52:36 -07:00
Ingo Molnar
5fdeefa053 Merge branch 'urgent-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/urgent
Pull RCU fix from Paul E. McKenney.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-06-03 11:14:37 +02:00
Kefeng Wang
b3e2d20973 rcuperf: Fix printk format warning
Using "%zu" to fix following warning,
kernel/rcu/rcuperf.c: In function ‘kfree_perf_init’:
include/linux/kern_levels.h:5:18: warning: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 2 has type ‘unsigned int’ [-Wformat=]

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-06-02 08:41:37 -07:00
Ingo Molnar
cb3cb6733f Merge branch 'WIP.core/rcu' into core/rcu, to pick up two x86/entry dependencies
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-06-01 11:47:29 +02:00
Peter Zijlstra
806f04e9fd rcu: Allow for smp_call_function() running callbacks from idle
Currently, RCU relies hard on smp_call_function() callbacks running from
interrupt context. A pending optimization is going to break that: it
will allow idle CPUs to run the callbacks from the idle loop. This
avoids raising the IPI on the requesting CPU and avoids handling an
exception on the receiving CPU.

Change rcu_is_cpu_rrupt_from_idle() to also accept task context,
provided it is the idle task.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Link: https://lore.kernel.org/r/20200527171236.GC706495@hirez.programming.kicks-ass.net
2020-05-28 10:50:12 +02:00
Thomas Gleixner
07325d4a90 rcu: Provide rcu_irq_exit_check_preempt()
Provide a debug check which can be invoked from exception return to kernel
mode before an attempt is made to schedule. Warn if RCU is not ready for
this.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Link: https://lore.kernel.org/r/20200521202117.089709607@linutronix.de
2020-05-26 19:05:11 +02:00
Paul E. McKenney
aaf2bc50df rcu: Abstract out rcu_irq_enter_check_tick() from rcu_nmi_enter()
There will likely be exception handlers that can sleep, which rules
out the usual approach of invoking rcu_nmi_enter() on entry and also
rcu_nmi_exit() on all exit paths.  However, the alternative approach of
just not calling anything can prevent RCU from coaxing quiescent states
from nohz_full CPUs that are looping in the kernel:  RCU must instead
IPI them explicitly.  It would be better to enable the scheduler tick
on such CPUs to interact with RCU in a lighter-weight manner, and this
enabling is one of the things that rcu_nmi_enter() currently does.

What is needed is something that helps RCU coax quiescent states while
not preventing subsequent sleeps.  This commit therefore splits out the
nohz_full scheduler-tick enabling from the rest of the rcu_nmi_enter()
logic into a new function named rcu_irq_enter_check_tick().

[ tglx: Renamed the function and made it a nop when context tracking is off ]
[ mingo: Fixed a CONFIG_NO_HZ_FULL assumption, harmonized and fixed all the
         comment blocks and cleaned up rcu_nmi_enter()/exit() definitions. ]

Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200521202116.996113173@linutronix.de
2020-05-26 19:04:18 +02:00
Thomas Gleixner
b1fcf9b83c rcu: Provide __rcu_is_watching()
Same as rcu_is_watching() but without the preempt_disable/enable() pair
inside the function. It is marked noinstr so it ends up in the
non-instrumentable text section.

This is useful for non-preemptible code especially in the low level entry
section. Using rcu_is_watching() there results in a call to the
preempt_schedule_notrace() thunk which triggers noinstr section warnings in
objtool.
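A sketch of the assumed shape of the new helper (assuming it reuses the
same underlying check as rcu_is_watching()):

	noinstr bool __rcu_is_watching(void)
	{
		return !rcu_dynticks_curr_cpu_in_eqs();
	}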

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200512213810.518709291@linutronix.de
2020-05-19 15:51:21 +02:00
Thomas Gleixner
8ae0ae6737 rcu: Provide rcu_irq_exit_preempt()
Interrupts and exceptions invoke rcu_irq_enter() on entry and need to
invoke rcu_irq_exit() before they either return to the interrupted code or
invoke the scheduler due to preemption.

The general assumption is that RCU idle code has to have preemption
disabled so that a return from interrupt cannot schedule. So the return
from interrupt code invokes rcu_irq_exit() and preempt_schedule_irq().

If there is any imbalance in the rcu_irq/nmi* invocations or RCU idle code
had preemption enabled then this goes unnoticed until the CPU goes idle or
some other RCU check is executed.

Provide rcu_irq_exit_preempt() which can be invoked from the
interrupt/exception return code in case that preemption is enabled. It
invokes rcu_irq_exit() and contains a few sanity checks in case that
CONFIG_PROVE_RCU is enabled to catch such issues directly.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200505134904.364456424@linutronix.de
2020-05-19 15:51:21 +02:00
Paul E. McKenney
9ea366f669 rcu: Make RCU IRQ enter/exit functions rely on in_nmi()
The rcu_nmi_enter_common() and rcu_nmi_exit_common() functions take an
"irq" parameter that indicates whether these functions have been invoked from
an irq handler (irq==true) or an NMI handler (irq==false).

However, recent changes have applied notrace to a few critical functions
such that rcu_nmi_enter_common() and rcu_nmi_exit_common() may now rely on
in_nmi().  Note that in_nmi() works no differently than before, but rather
that tracing is now prohibited in code regions where in_nmi() would
incorrectly report NMI state.

Therefore remove the "irq" parameter and inline rcu_nmi_enter_common() and
rcu_nmi_exit_common() into rcu_nmi_enter() and rcu_nmi_exit(),
respectively.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Link: https://lkml.kernel.org/r/20200505134101.617130349@linutronix.de
2020-05-19 15:51:21 +02:00
Thomas Gleixner
ff5c4f5cad rcu/tree: Mark the idle relevant functions noinstr
These functions are invoked from context tracking and other places in the
low level entry code. Move them into the .noinstr.text section to exclude
them from instrumentation.

Mark the places which are safe to invoke traceable functions with
instrumentation_begin/end() so objtool won't complain.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://lkml.kernel.org/r/20200505134100.575356107@linutronix.de
2020-05-19 15:51:20 +02:00
Emil Velikov
0ca650c430 rcu: constify sysrq_key_op
With earlier commits, the API no longer discards the const-ness of the
sysrq_key_op. As such we can add the notation.

Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jiri Slaby <jslaby@suse.com>
Cc: linux-kernel@vger.kernel.org
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: rcu@vger.kernel.org
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Link: https://lore.kernel.org/r/20200513214351.2138580-11-emil.l.velikov@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-15 14:53:20 +02:00
Paul E. McKenney
f736e0f1a5 Merge branches 'fixes.2020.04.27a', 'kfree_rcu.2020.04.27a', 'rcu-tasks.2020.04.27a', 'stall.2020.04.27a' and 'torture.2020.05.07a' into HEAD
fixes.2020.04.27a:  Miscellaneous fixes.
kfree_rcu.2020.04.27a:  Changes related to kfree_rcu().
rcu-tasks.2020.04.27a:  Addition of new RCU-tasks flavors.
stall.2020.04.27a:  RCU CPU stall-warning updates.
torture.2020.05.07a:  Torture-test updates.
2020-05-07 10:18:32 -07:00
Paul E. McKenney
3c80b40245 rcutorture: Convert ULONG_CMP_LT() to time_before()
This commit converts three ULONG_CMP_LT() invocations in rcutorture to
time_before() to reflect the fact that they are comparing timestamps to
the jiffies counter.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-05-07 10:15:29 -07:00
Jason Yan
afbc1574f1 rcutorture: Make rcu_fwds and rcu_fwd_emergency_stop static
This commit fixes the following sparse warning:

kernel/rcu/rcutorture.c:1695:16: warning: symbol 'rcu_fwds' was not declared. Should it be static?
kernel/rcu/rcutorture.c:1696:6: warning: symbol 'rcu_fwd_emergency_stop' was not declared. Should it be static?

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-05-07 10:15:29 -07:00
Paul E. McKenney
55b2dcf587 rcu: Allow rcutorture to starve grace-period kthread
This commit provides an rcutorture.stall_gp_kthread module parameter
to allow rcutorture to starve the grace-period kthread.  This allows
testing the code that detects such starvation.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-05-07 10:15:28 -07:00
Paul E. McKenney
19a8ff956c rcutorture: Add flag to produce non-busy-wait task stalls
This commit aids testing of RCU task stall warning messages by adding
an rcutorture.stall_cpu_block module parameter that results in the
induced stall sleeping within the RCU read-side critical section.
Spinning with interrupts disabled is still available via the
rcutorture.stall_cpu_irqsoff module parameter, and specifying neither
of these two module parameters will spin with preemption disabled.

Note that sleeping (as opposed to preemption) results in additional
complaints from RCU at context-switch time, so yet more testing.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-05-07 10:15:28 -07:00
Paul E. McKenney
c9527bebb0 rcutorture: Mark data-race potential for rcu_barrier() test statistics
The n_barrier_successes, n_barrier_attempts, and
n_rcu_torture_barrier_error variables are updated (without access
markings) by the main rcu_barrier() test kthread, and accessed (also
without access markings) by the rcu_torture_stats() kthread.  This of
course can result in KCSAN complaints.

Because the accesses are in diagnostic prints, this commit uses
data_race() to excuse the diagnostic prints from the data race.  If this
were to ever cause bogus statistics prints (for example, due to store
tearing), any misleading information would be disambiguated by the
presence or absence of an rcutorture splat.
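For example, a diagnostic print of this kind would be wrapped as follows
(format string illustrative):

	pr_alert("barrier attempts: %ld  successes: %ld\n",
		 data_race(n_barrier_attempts),
		 data_race(n_barrier_successes));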

This data race was reported by KCSAN.  Not appropriate for backporting
due to failure being unlikely and due to the mild consequences of the
failure, namely a confusing rcutorture console message.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
2020-04-27 11:05:13 -07:00
Paul E. McKenney
3b2a473985 rcutorture: Add KCSAN stubs
This commit adds stubs for KCSAN's data_race(), ASSERT_EXCLUSIVE_WRITER(),
and ASSERT_EXCLUSIVE_ACCESS() macros to allow code using these macros to
move ahead.
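A sketch of such fall-back stubs (the real definitions, when available,
come from the KCSAN headers):

	#ifndef data_race
	#define data_race(expr) ({ expr; })
	#endif
	#ifndef ASSERT_EXCLUSIVE_WRITER
	#define ASSERT_EXCLUSIVE_WRITER(var) do { } while (0)
	#endif
	#ifndef ASSERT_EXCLUSIVE_ACCESS
	#define ASSERT_EXCLUSIVE_ACCESS(var) do { } while (0)
	#endif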

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:05:13 -07:00
Paul E. McKenney
33b2b93bd8 rcu: Remove self-stack-trace when all quiescent states seen
When all quiescent states have been seen, it is normally the grace-period
kthread that is in trouble.  Although the existing stack trace from
the current CPU might possibly provide useful information, experience
indicates that there is too much noise for this to be worthwhile.

This commit therefore removes this stack trace from the output.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:04:37 -07:00
Paul E. McKenney
8837582517 rcu: When GP kthread is starved, tag idle threads as false positives
If the grace-period kthread is starved, idle threads' extended quiescent
states are not reported.  These idle threads thus wrongly appear to
be blocking the current grace period.  This commit therefore tags such
idle threads as probable false positives when the grace-period kthread
is being starved.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:04:37 -07:00
Paul E. McKenney
654db05cee rcu: Use data_race() for RCU expedited CPU stall-warning prints
Although the accesses used to determine whether or not an expedited
stall should be printed are an integral part of the concurrency algorithm
governing use of the corresponding variables, the values that are simply
printed are ancillary.  As such, it is best to use data_race() for these
accesses in order to provide the greatest latitude in the use of KCSAN
for the other accesses that are an integral part of the algorithm.  This
commit therefore changes the relevant uses of READ_ONCE() to data_race().

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:04:37 -07:00
Paul E. McKenney
25246fc831 rcu-tasks: Allow standalone use of TASKS_{TRACE_,}RCU
This commit allows TASKS_TRACE_RCU to be used independently of TASKS_RCU
and vice versa.

[ paulmck: Fix conditional compilation per kbuild test robot feedback. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:03:53 -07:00
Paul E. McKenney
7e0669c3e9 rcu-tasks: Add IPI failure count to statistics
This commit adds a failure-return count for smp_call_function_single(),
and adds this to the console messages for rcutorture writer stalls and at
the end of rcutorture testing.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:03:53 -07:00
Paul E. McKenney
edf3775f0a rcu-tasks: Add count for idle tasks on offline CPUs
This commit adds a counter for the number of times the quiescent state
was an idle task associated with an offline CPU, and prints this count
at the end of rcutorture runs and at stall time.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:03:53 -07:00
Paul E. McKenney
40471509be rcu-tasks: Add rcu_dynticks_zero_in_eqs() effectiveness statistics
This commit adds counts of the number of calls and number of successful
calls to rcu_dynticks_zero_in_eqs(), which are printed at the end
of rcutorture runs and at stall time.  This allows evaluation of the
effectiveness of rcu_dynticks_zero_in_eqs().

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:03:52 -07:00
Paul E. McKenney
9796e1ae73 rcu-tasks: Make RCU tasks trace also wait for idle tasks
This commit scans the CPUs, adding each CPU's idle task to the list of
tasks that need quiescent states.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:03:52 -07:00
Paul E. McKenney
7e3b70e070 rcu-tasks: Handle the running-offline idle-task special case
The idle task corresponding to an offline CPU can appear to be running
while that CPU is offline.  This commit therefore adds checks for this
situation, treating it as a quiescent state.  Because the tasklist scan
and the holdout-list scan now exclude CPU-hotplug operations, readers
on the CPU-hotplug paths are still waited for.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:03:52 -07:00
Paul E. McKenney
81b4a7bc3b rcu-tasks: Disable CPU hotplug across RCU tasks trace scans
This commit disables CPU hotplug across RCU tasks trace scans, which
is a first step towards correctly recognizing idle tasks "running" on
offline CPUs.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:03:52 -07:00
Paul E. McKenney
b38f57c1fe rcu-tasks: Allow rcu_read_unlock_trace() under scheduler locks
The rcu_read_unlock_trace() can invoke rcu_read_unlock_trace_special(),
which in turn can call wake_up().  Therefore, if any scheduler lock is
held across a call to rcu_read_unlock_trace(), self-deadlock can occur.
This commit therefore uses the irq_work facility to defer the wake_up()
to a clean environment where no scheduler locks will be held.
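A sketch of the deferral pattern (names assumed, details may differ):

	static void rcu_read_unlock_iw(struct irq_work *iwp)
	{
		wake_up(&trc_wait);	/* safe: no scheduler locks held here */
	}
	static DEFINE_IRQ_WORK(rcu_tasks_trace_iw, rcu_read_unlock_iw);

	/* In rcu_read_unlock_trace_special(), instead of calling wake_up(): */
	irq_work_queue(&rcu_tasks_trace_iw);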

Reported-by: Steven Rostedt <rostedt@goodmis.org>
[ paulmck: Update #includes for m68k per kbuild test robot. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:03:52 -07:00
Paul E. McKenney
7d0c9c50c5 rcu-tasks: Avoid IPIing userspace/idle tasks if kernel is so built
Systems running CPU-bound real-time task do not want IPIs sent to CPUs
executing nohz_full userspace tasks.  Battery-powered systems don't
want IPIs sent to idle CPUs in low-power mode.  Unfortunately, RCU tasks
trace can and will send such IPIs in some cases.

Both of these situations occur only when the target CPU is in RCU
dyntick-idle mode, in other words, when RCU is not watching the
target CPU.  This suggests that CPUs in dyntick-idle mode should use
memory barriers in outermost invocations of rcu_read_lock_trace()
and rcu_read_unlock_trace(), which would allow the RCU tasks trace
grace period to directly read out the target CPU's read-side state.
One challenge is that RCU tasks trace is not targeting a specific
CPU, but rather a task.  And that task could switch from one CPU to
another at any time.

This commit therefore uses try_invoke_on_locked_down_task()
and checks for task_curr() in trc_inspect_reader_notrunning().
When this condition holds, the target task is running and cannot move.
If CONFIG_TASKS_TRACE_RCU_READ_MB=y, the new rcu_dynticks_zero_in_eqs()
function can be used to check if the specified integer (in this case,
t->trc_reader_nesting) is zero while the target CPU remains in that same
dyntick-idle sojourn.  If so, the target task is in a quiescent state.
If not, trc_read_check_handler() must indicate failure so that the
grace-period kthread can take appropriate action or retry after an
appropriate delay, as the case may be.

With this change, given CONFIG_TASKS_TRACE_RCU_READ_MB=y, if a given
CPU remains idle or a given task continues executing in nohz_full mode,
the RCU tasks trace grace-period kthread will detect this without the
need to send an IPI.

Suggested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:03:52 -07:00
Paul E. McKenney
9ae58d7bd1 rcu-tasks: Add Kconfig option to mediate smp_mb() vs. IPI
This commit provides a new TASKS_TRACE_RCU_READ_MB Kconfig option that
enables use of read-side memory barriers by both rcu_read_lock_trace()
and rcu_read_unlock_trace() when they are executed with the
current->trc_reader_special.b.need_mb flag set.  This flag is currently
never set.  Doing that is the subject of a later commit.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:03:52 -07:00
Paul E. McKenney
238dbce39e rcu-tasks: Add grace-period and IPI counts to statistics
This commit adds a grace-period count and a count of IPIs sent since
boot, which is printed in response to rcutorture writer stalls and at
the end of rcutorture testing.  These counts will be used to evaluate
various schemes to reduce the number of IPIs sent.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:03:52 -07:00
Paul E. McKenney
276c410448 rcu-tasks: Split ->trc_reader_need_end
This commit splits ->trc_reader_need_end by using the rcu_special union.
This change permits readers to check to see if a memory barrier is
required without any added overhead in the common case where no such
barrier is required.  This commit also adds the read-side checking.
Later commits will add the machinery to properly set the new
->trc_reader_special.b.need_mb field.

This commit also makes rcu_read_unlock_trace_special() tolerate nested
read-side critical sections within interrupt and NMI handlers.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:03:52 -07:00
Paul E. McKenney
b0afa0f056 rcu-tasks: Provide boot parameter to delay IPIs until late in grace period
This commit provides a rcupdate.rcu_task_ipi_delay kernel boot parameter
that specifies how old the RCU tasks trace grace period must be before
the grace-period kthread starts sending IPIs.  This delay allows more
tasks to pass through rcu_tasks_qs() quiescent states, thus reducing
(or even eliminating) the number of IPIs that must be sent.
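
For example, assuming the value is interpreted in jiffies (as the HZ/2
setting below suggests), booting with the following would hold off IPIs
until a given grace period is at least half a second old on a HZ=1000
system:

  rcupdate.rcu_task_ipi_delay=500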

On a short rcutorture test, setting this kernel boot parameter to HZ/2
resulted in zero IPIs for all 877 RCU-tasks trace grace periods that
elapsed during that test.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:03:52 -07:00
Paul E. McKenney
88092d0c99 rcu-tasks: Add a grace-period start time for throttling and debug
This commit adds a place to record the grace-period start in jiffies.
This will be used by later commits for debugging purposes and to throttle
IPIs early in the grace period.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:03:52 -07:00
Paul E. McKenney
43766c3ead rcu-tasks: Make RCU Tasks Trace make use of RCU scheduler hooks
This commit makes the calls to rcu_tasks_qs() detect and report
quiescent states for RCU tasks trace.  If the task is in a quiescent
state and if ->trc_reader_checked is not yet set, the task sets its own
->trc_reader_checked.  This will cause the grace-period kthread to
remove it from the holdout list if it still remains there.

[ paulmck: Fix conditional compilation per kbuild test robot feedback. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:03:52 -07:00
Paul E. McKenney
af051ca4e4 rcu-tasks: Make rcutorture writer stall output include GP state
This commit adds grace-period state and time to the rcutorture writer
stall output.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:03:52 -07:00
Paul E. McKenney
e21408ceec rcu-tasks: Add RCU tasks to rcutorture writer stall output
This commit adds state for each RCU-tasks flavor to the rcutorture
writer stall output.  The initial state is minimal, but you have to
start somewhere.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
[ paulmck: Fixes based on feedback from kbuild test robot. ]
2020-04-27 11:03:51 -07:00
Paul E. McKenney
8fd8ca388c rcu-tasks: Move #ifdef into tasks.h
This commit pushes the #ifdef CONFIG_TASKS_RCU_GENERIC from
kernel/rcu/update.c to kernel/rcu/tasks.h in order to improve
readability as more APIs are added.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:03:51 -07:00
Paul E. McKenney
4593e772b5 rcu-tasks: Add stall warnings for RCU Tasks Trace
This commit adds RCU CPU stall warnings for RCU Tasks Trace.  These
dump out any tasks blocking the current grace period, as well as any
CPUs that have not responded to an IPI request.  This happens in two
phases, when initially extracting state from the tasks and later when
waiting for any holdout tasks to check in.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:03:51 -07:00
Paul E. McKenney
c1a76c0b6a rcutorture: Add torture tests for RCU Tasks Trace
This commit adds the definitions required to torture the tracing flavor
of RCU tasks.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:03:51 -07:00
Paul E. McKenney
d5f177d35c rcu-tasks: Add an RCU Tasks Trace to simplify protection of tracing hooks
Because RCU does not watch exception early-entry/late-exit, idle-loop,
or CPU-hotplug execution, protection of tracing and BPF operations is
needlessly complicated.  This commit therefore adds a variant of
Tasks RCU that:

o	Has explicit read-side markers to allow finite grace periods in
	the face of in-kernel loops for PREEMPT=n builds.  These markers
	are rcu_read_lock_trace() and rcu_read_unlock_trace().

o	Protects code in the idle loop, exception entry/exit, and
	CPU-hotplug code paths.  In this respect, RCU-tasks trace is
	similar to SRCU, but with lighter-weight readers.

o	Avoids expensive read-side instruction, having overhead similar
	to that of Preemptible RCU.

There are of course downsides:

o	The grace-period code can send IPIs to CPUs, even when those
	CPUs are in the idle loop or in nohz_full userspace.  This is
	mitigated by later commits.

o	It is necessary to scan the full tasklist, much as for Tasks RCU.

o	There is a single callback queue guarded by a single lock,
	again, much as for Tasks RCU.  However, those early use cases
	that request multiple grace periods in quick succession are
	expected to do so from a single task, which makes the single
	lock almost irrelevant.  If needed, multiple callback queues
	can be provided using any number of schemes.

Perhaps most important, this variant of RCU does not affect the vanilla
flavors, rcu_preempt and rcu_sched.  The fact that RCU Tasks Trace
readers can operate from idle, offline, and exception entry/exit in no
way enables rcu_preempt and rcu_sched readers to do so.
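
For orientation, a hedged usage sketch of the new read-side markers (the
hook structure and functions are hypothetical, not part of this commit):

  #include <linux/rcupdate.h>
  #include <linux/rcupdate_trace.h>

  struct demo_hook {
          void (*func)(void *data);
          void *data;
  };

  static struct demo_hook __rcu *demo_hook_ptr;

  static void demo_call_hook(void)
  {
          struct demo_hook *hook;

          rcu_read_lock_trace();
          hook = rcu_dereference_check(demo_hook_ptr,
                                       rcu_read_lock_trace_held());
          if (hook)
                  hook->func(hook->data);
          rcu_read_unlock_trace();
  }

An updater would publish a new hook with rcu_assign_pointer() and wait
for all pre-existing readers with synchronize_rcu_tasks_trace() before
freeing the old one.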

The memory ordering was outlined here:
https://lore.kernel.org/lkml/20200319034030.GX3199@paulmck-ThinkPad-P72/

This effort benefited greatly from off-list discussions of BPF
requirements with Alexei Starovoitov and Andrii Nakryiko.  At least
some of the on-list discussions are captured in the Link: tags below.
In addition, KCSAN was quite helpful in finding some early bugs.

Link: https://lore.kernel.org/lkml/20200219150744.428764577@infradead.org/
Link: https://lore.kernel.org/lkml/87mu8p797b.fsf@nanos.tec.linutronix.de/
Link: https://lore.kernel.org/lkml/20200225221305.605144982@linutronix.de/
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Andrii Nakryiko <andriin@fb.com>
[ paulmck: Apply feedback from Steve Rostedt and Joel Fernandes. ]
[ paulmck: Decrement trc_n_readers_need_end upon IPI failure. ]
[ paulmck: Fix locking issue reported by rcutorture. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:03:51 -07:00
Paul E. McKenney
d01aa2633b rcu-tasks: Code movement to allow more Tasks RCU variants
This commit does nothing but move rcu_tasks_wait_gp() up to a new section
for common code.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:03:51 -07:00
Paul E. McKenney
e4fe5dd6f2 rcu-tasks: Further refactor RCU-tasks to allow adding more variants
This commit refactors RCU tasks to allow variants to be added.  These
variants will share the current Tasks-RCU tasklist scan and the holdout
list processing.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:03:51 -07:00
Paul E. McKenney
c97d12a63c rcu-tasks: Use unique names for RCU-Tasks kthreads and messages
This commit causes the flavors of RCU Tasks to use different names
for their kthreads and in their console messages.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:03:51 -07:00
Paul E. McKenney
3d6e43c75d rcutorture: Add torture tests for RCU Tasks Rude
This commit adds the definitions required to torture the rude flavor of
RCU tasks.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:03:51 -07:00
Paul E. McKenney
c84aad7654 rcu-tasks: Add an RCU-tasks rude variant
This commit adds a "rude" variant of RCU-tasks that has as quiescent
states schedule(), cond_resched_tasks_rcu_qs(), userspace execution,
and (in theory, anyway) cond_resched().  In other words, RCU-tasks rude
readers are regions of code with preemption disabled, but excluding code
early in the CPU-online sequence and late in the CPU-offline sequence.
Updates make use of IPIs and force an IPI and a context switch on each
online CPU.  This variant is useful in some situations in tracing.
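
One way to obtain such quiescent states, sketched here for illustration
only (and not necessarily the exact code added by this commit), is to run
an empty workqueue item on every online CPU:

  #include <linux/workqueue.h>

  /* Running this work item on a CPU is itself a context switch there. */
  static void demo_rude_qs_work(struct work_struct *work)
  {
  }

  static void demo_rude_wait_gp(void)
  {
          /* Queue the work on each online CPU and wait for completion,
           * forcing an IPI and a context switch per CPU. */
          schedule_on_each_cpu(demo_rude_qs_work);
  }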

Suggested-by: Steven Rostedt <rostedt@goodmis.org>
[ paulmck: Apply EXPORT_SYMBOL_GPL() feedback from Qiujun Huang. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
[ paulmck: Apply review feedback from Steve Rostedt. ]
2020-04-27 11:03:51 -07:00
Paul E. McKenney
5873b8a94e rcu-tasks: Refactor RCU-tasks to allow variants to be added
This commit splits out generic processing from RCU-tasks-specific
processing in order to allow additional flavors to be added.  It also
adds a def_bool TASKS_RCU_GENERIC to enable the common RCU-tasks
infrastructure code.

This is primarily, but not entirely, a code-movement commit.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:03:51 -07:00
Paul E. McKenney
9cf8fc6fab rcutorture: Add a test for synchronize_rcu_mult()
This commit adds a crude test for synchronize_rcu_mult().  This is
currently a smoke test rather than a high-quality stress test.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:03:51 -07:00
Paul E. McKenney
07e105158d rcu-tasks: Create struct to hold state information
This commit creates an rcu_tasks struct to hold state information for
RCU Tasks.  This is a preparation commit for adding additional flavors
of Tasks RCU, each of which would have its own rcu_tasks struct.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:03:50 -07:00
Paul E. McKenney
eacd6f04a1 rcu-tasks: Move Tasks RCU to its own file
This code-movement-only commit is in preparation for adding an additional
flavor of Tasks RCU, which relies on workqueues to detect grace periods.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:03:50 -07:00
Paul E. McKenney
5bef8da66a rcu: Add per-task state to RCU CPU stall warnings
Currently, an RCU-preempt CPU stall warning simply lists the PIDs of
those tasks holding up the current grace period.  This can be helpful,
but more can be even more helpful.

To this end, this commit adds the nesting level, whether the task
thinks it was preempted in its current RCU read-side critical section,
whether RCU core has asked this task for a quiescent state, whether the
expedited-grace-period hint is set, and whether the task believes that
it is on the blocked-tasks list (it must be, or it would not be printed,
but if things are broken, best not to take too much for granted).

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:03:50 -07:00
Paul E. McKenney
66777e5821 rcu-tasks: Use context-switch hook for PREEMPT=y kernels
Currently, the PREEMPT=y version of rcu_note_context_switch() does not
invoke rcu_tasks_qs(), and we need it to in order to keep RCU Tasks
Trace's IPIs down to a dull roar.  This commit therefore enables this
hook.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:03:50 -07:00
Paul E. McKenney
ac3caf8274 rcu: Add comments marking transitions between RCU watching and not
It is not as clear as it might be just where in RCU's idle entry/exit
code RCU stops and starts watching the current CPU.  This commit therefore
adds comments calling out the transitions.

Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:03:50 -07:00
Paul E. McKenney
52b1fc3f79 rcutorture: Add test of holding scheduler locks across rcu_read_unlock()
Now that it should be safe to hold scheduler locks across
rcu_read_unlock(), even in cases where the corresponding RCU read-side
critical section might have been preempted and boosted, this commit adds
a test of this capability to rcutorture.  This has been tested on current
mainline (which can deadlock in this situation), and lockdep duly reported
the expected deadlock.  On -rcu, lockdep is silent, thus far, anyway.

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:03:50 -07:00
Lai Jiangshan
5f5fa7ea89 rcu: Don't use negative nesting depth in __rcu_read_unlock()
Now that RCU flavors have been consolidated, an RCU-preempt
rcu_read_unlock() in an interrupt or softirq handler cannot possibly
end the RCU read-side critical section.  Consider the old vulnerability
involving rcu_read_unlock() being invoked within such a handler that
interrupted an __rcu_read_unlock_special(), in which a wakeup might be
invoked with a scheduler lock held.  Because rcu_read_unlock_special()
no longer does wakeups in such situations, it is no longer necessary
for __rcu_read_unlock() to set the nesting level negative.

This commit therefore removes this recursion-protection code from
__rcu_read_unlock().

[ paulmck: Let rcu_exp_handler() continue to call rcu_report_exp_rdp(). ]
[ paulmck: Adjust other checks given no more negative nesting. ]
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:03:50 -07:00
Lai Jiangshan
f0bdf6d473 rcu: Remove unused ->rcu_read_unlock_special.b.deferred_qs field
The ->rcu_read_unlock_special.b.deferred_qs field is set to true in
rcu_read_unlock_special() but never set to false.  This is not
particularly useful, so this commit removes this field.

The only possible justification for this field is to ease debugging
of RCU deferred quiescent states, but the combination of the other
->rcu_read_unlock_special fields plus ->rcu_blocked_node and of course
->rcu_read_lock_nesting should cover debugging needs.  And if this last
proves incorrect, this patch can always be reverted, along with the
required setting of ->rcu_read_unlock_special.b.deferred_qs to false
in rcu_preempt_deferred_qs_irqrestore().

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:03:50 -07:00
Lai Jiangshan
07b4a930fc rcu: Don't set nesting depth negative in rcu_preempt_deferred_qs()
Now that RCU flavors have been consolidated, an RCU-preempt
rcu_read_unlock() in an interrupt or softirq handler cannot possibly
end the RCU read-side critical section.  Consider the old vulnerability
involving rcu_preempt_deferred_qs() being invoked within such a handler
that interrupted an extended RCU read-side critical section, in which
a wakeup might be invoked with a scheduler lock held.  Because
rcu_read_unlock_special() no longer does wakeups in such situations,
it is no longer necessary for rcu_preempt_deferred_qs() to set the
nesting level negative.

This commit therefore removes this recursion-protection code from
rcu_preempt_deferred_qs().

[ paulmck: Fix typo in commit log per Steve Rostedt. ]
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:03:50 -07:00
Paul E. McKenney
e4453d8a1c rcu: Make rcu_read_unlock_special() safe for rq/pi locks
The scheduler is currently required to hold rq/pi locks across the entire
RCU read-side critical section or not at all.  This is inconvenient and
leaves traps for the unwary, including the author of this commit.

But now that excessively long grace periods enable scheduling-clock
interrupts for holdout nohz_full CPUs, the nohz_full rescue logic in
rcu_read_unlock_special() can be dispensed with.  In other words, the
rcu_read_unlock_special() function can refrain from doing wakeups unless
such wakeups are guaranteed safe.

This commit therefore avoids unsafe wakeups, freeing the scheduler to
hold rq/pi locks across rcu_read_unlock() even if the corresponding RCU
read-side critical section might have been preempted.  This commit also
updates RCU's requirements documentation.

This commit is inspired by a patch from Lai Jiangshan:
https://lore.kernel.org/lkml/20191102124559.1135-2-laijs@linux.alibaba.com
This commit is further intended to be a step towards his goal of permitting
the inlining of RCU-preempt's rcu_read_lock() and rcu_read_unlock().

Cc: Lai Jiangshan <laijs@linux.alibaba.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:03:50 -07:00
Paul E. McKenney
c76e7e0bce rcu: Add KCSAN stubs to update.c
This commit adds stubs for KCSAN's data_race(), ASSERT_EXCLUSIVE_WRITER(),
and ASSERT_EXCLUSIVE_ACCESS() macros to allow code using these macros
to move ahead.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:03:50 -07:00
Paul E. McKenney
6be7436d22 rcu: Add rcu_gp_might_be_stalled()
This commit adds rcu_gp_might_be_stalled(), which returns true if there
is some reason to believe that the RCU grace period is stalled.  The use
case is where an RCU free-memory path needs to allocate memory in order
to free it, a situation that should be avoided where possible.

But where it is necessary, there is always the alternative of using
synchronize_rcu() to wait for a grace period in order to avoid the
allocation.  And if the grace period is stalled, allocating memory to
asynchronously wait for it is a bad idea of epic proportions: Far better
to let others use the memory, because these others might actually be
able to free that memory before the grace period ends.

Thus, rcu_gp_might_be_stalled() can be used to help decide whether
allocating memory on an RCU free path is a semi-reasonable course
of action.
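
A hedged usage sketch (hypothetical caller, not from this commit):

  #include <linux/rcupdate.h>
  #include <linux/slab.h>

  /* Allocate a tracking structure for an asynchronous free, but decline
   * if the grace period appears stalled, so that the caller falls back
   * to synchronize_rcu() followed by an immediate free instead. */
  static void *demo_alloc_for_async_free(size_t size)
  {
          if (rcu_gp_might_be_stalled())
                  return NULL;
          return kmalloc(size, GFP_NOWAIT | __GFP_NOWARN);
  }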

Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Uladzislau Rezki <urezki@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:02:50 -07:00
Joel Fernandes (Google)
a6a82ce18b rcu/tree: Count number of batched kfree_rcu() locklessly
We can relax the accuracy of the count of queued objects in favor of
not hurting performance by locklessly sampling the per-CPU counters.
This should be OK because, under high memory pressure, it should not
matter if we are off by a few objects while counting.  The shrinker will
still do the reclaim.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
[ paulmck: Remove unused "flags" variable. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:02:50 -07:00
Joel Fernandes (Google)
9154244c1a rcu/tree: Add a shrinker to prevent OOM due to kfree_rcu() batching
To reduce the number of grace periods and improve kfree() performance,
we recently introduced batching, dramatically bringing down the number
of grace periods while also giving us the ability to use kfree_bulk()
for efficient freeing.

However, this has increased the likelihood of an OOM condition under a
heavy kfree_rcu() flood on small-memory systems.  This patch introduces
a shrinker that starts grace periods right away if the system is under
memory pressure due to the existence of objects that have not yet
started a grace period.
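
A minimal sketch of the mechanism (names and helpers hypothetical, not
the actual kfree_rcu() shrinker, which ties into the per-CPU batching
state):

  #include <linux/shrinker.h>

  static unsigned long demo_queued_objects;  /* objects not yet under a GP */

  /* Hypothetical helper that kicks off the batched frees immediately. */
  static void demo_start_grace_period_now(void)
  {
  }

  static unsigned long demo_count_objects(struct shrinker *shrink,
                                          struct shrink_control *sc)
  {
          return demo_queued_objects;
  }

  static unsigned long demo_scan_objects(struct shrinker *shrink,
                                         struct shrink_control *sc)
  {
          demo_start_grace_period_now();
          return SHRINK_STOP;
  }

  static struct shrinker demo_shrinker = {
          .count_objects = demo_count_objects,
          .scan_objects  = demo_scan_objects,
          .seeks         = DEFAULT_SEEKS,
  };
  /* register_shrinker(&demo_shrinker) is invoked once at init time. */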

With this patch, I do not observe an OOM anymore on a system with 512MB
RAM and 8 CPUs, with the following rcuperf options:

rcuperf.kfree_loops=20000 rcuperf.kfree_alloc_num=8000
rcuperf.kfree_rcu_test=1 rcuperf.kfree_mult=2

Otherwise it easily OOMs with the above parameters.

NOTE:
1. On systems with no memory pressure, the patch has no effect as intended.
2. In the future, we can use this same mechanism to further reduce the
   number of grace periods by relying on shrinkers carefully.

Cc: urezki@gmail.com
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:02:50 -07:00
Joel Fernandes (Google)
f87dc80800 rcuperf: Add ability to increase object allocation size
This allows us to increase memory pressure dynamically using a new
rcuperf boot command line parameter called 'kfree_mult'.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:02:50 -07:00
Paul E. McKenney
e2f3ccfa62 rcu: Convert rcu_nohz_full_cpu() ULONG_CMP_LT() to time_before()
This commit converts the ULONG_CMP_LT() in rcu_nohz_full_cpu() to
time_before() to reflect the fact that it is comparing a timestamp to
the jiffies counter.
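
As a generic illustration of this sort of conversion (not the actual
diff), both forms are wraparound-safe, but the time_*() helpers make it
obvious that a jiffies timestamp is being compared with the jiffies
counter:

  #include <linux/jiffies.h>

  static bool demo_within_one_second(unsigned long snap)
  {
          /* snap was set to jiffies earlier. */
          return time_before(jiffies, snap + HZ);
  }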

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:01:17 -07:00
Paul E. McKenney
7b2413111a rcu: Convert rcu_initiate_boost() ULONG_CMP_GE() to time_after()
This commit converts the ULONG_CMP_GE() in rcu_initiate_boost() to
time_after() to reflect the fact that it is comparing a timestamp to
the jiffies counter.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:01:16 -07:00
Paul E. McKenney
29ffebc5fc rcu: Convert ULONG_CMP_GE() to time_after() for jiffy comparison
This commit converts the ULONG_CMP_GE() in rcu_gp_fqs_loop() to
time_after() to reflect the fact that it is comparing a timestamp to
the jiffies counter.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:01:16 -07:00
Jules Irenge
da44cd6c8e rcu: Replace 1 by true
Coccinelle reports a warning at use_softirq declaration

WARNING: Assignment of 0/1 to bool variable

The root cause is that use_softirq, a variable of bool type, is
initialised with the integer 1.  Replacing 1 with the value true solves
the issue.

Signed-off-by: Jules Irenge <jbi.octave@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:01:16 -07:00
Jules Irenge
a66dbda789 rcu: Replace assigned pointer ret value by corresponding boolean value
Coccinelle reports warnings at rcu_read_lock_held_common()

WARNING: Assignment of 0/1 to bool variable

To fix this, given that ret is a pointer to bool, the values assigned
through it are replaced by the corresponding boolean values.

Signed-off-by: Jules Irenge <jbi.octave@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:01:16 -07:00
Paul E. McKenney
62ae19511f rcu: Mark rcu_state.gp_seq to detect more concurrent writes
The rcu_state structure's gp_seq field is only to be modified by the RCU
grace-period kthread, which is single-threaded.  This commit therefore
enlists KCSAN's help in enforcing this restriction.  This commit applies
KCSAN-specific primitives, so cannot go upstream until KCSAN does.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:01:16 -07:00
Mauro Carvalho Chehab
c28d5c09d0 rcu: Get rid of some doc warnings in update.c
This commit escapes *ret, because otherwise the documentation system
thinks that this is an incomplete emphasis block:

	./kernel/rcu/update.c:65: WARNING: Inline emphasis start-string without end-string.
	./kernel/rcu/update.c:65: WARNING: Inline emphasis start-string without end-string.
	./kernel/rcu/update.c:70: WARNING: Inline emphasis start-string without end-string.
	./kernel/rcu/update.c:82: WARNING: Inline emphasis start-string without end-string.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:01:16 -07:00
Zhaolong Zhang
fcbcc0e700 rcu: Fix the (t=0 jiffies) false positive
It is possible that an over-long grace period will end while the RCU
CPU stall warning message is printing.  In this case, the estimate of
the offending grace period's duration can be erroneous due to refetching
of rcu_state.gp_start, which will now be the time of the newly started
grace period.  Computation of this duration clearly needs to use the
start time for the old over-long grace period, not the fresh new one.
This commit avoids such errors by causing both print_other_cpu_stall() and
print_cpu_stall() to reuse the value previously fetched by their caller.

Signed-off-by: Zhaolong Zhang <zhangzl2013@126.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:01:16 -07:00
Paul E. McKenney
1fca4d12f4 rcu: Expedite first two FQS scans under callback-overload conditions
Even if some CPUs have excessive numbers of callbacks, RCU's grace-period
kthread will still wait normally between successive force-quiescent-state
scans.  The first two are the most important, as they are the ones that
enlist aid from the scheduler when overloaded.  This commit therefore
omits the wait before the first and the second force-quiescent-state
scan under callback-overload conditions.

This approach was inspired by a discussion with Jeff Roberson.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:01:16 -07:00
Paul E. McKenney
47fbb07453 rcu: Use data_race() for RCU CPU stall-warning prints
Although the accesses used to determine whether or not a stall should
be printed are an integral part of the concurrency algorithm governing
use of the corresponding variables, the values that are simply printed
are ancillary.  As such, it is best to use data_race() for these accesses
in order to provide the greatest latitude in the use of KCSAN for the
other accesses that are an integral part of the algorithm.  This commit
therefore changes the relevant uses of READ_ONCE() to data_race().

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:01:16 -07:00
Paul E. McKenney
5822b8126f rcu: Add WRITE_ONCE() to rcu_node ->boost_tasks
The rcu_node structure's ->boost_tasks field is read locklessly, so this
commit adds the WRITE_ONCE() to an update in order to provide proper
documentation and READ_ONCE()/WRITE_ONCE() pairing.

This data race was reported by KCSAN.  Not appropriate for backporting
due to failure being unlikely.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:01:16 -07:00
Paul E. McKenney
b68c614651 srcu: Add data_race() to ->srcu_lock_count and ->srcu_unlock_count arrays
The srcu_data structure's ->srcu_lock_count and ->srcu_unlock_count arrays
are read and written locklessly, so this commit adds the data_race()
to the diagnostic-print loads from these arrays in order to mark them as
known and approved data-racy accesses.

This data race was reported by KCSAN. Not appropriate for backporting due
to failure being unlikely and due to this being used only by rcutorture.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:01:16 -07:00
Paul E. McKenney
065a6db12a rcu: Add READ_ONCE and data_race() to rcu_node ->boost_tasks
The rcu_node structure's ->boost_tasks field is read locklessly, so this
commit adds the READ_ONCE() to one load in order to avoid destructive
compiler optimizations.  The other load is from a diagnostic print,
so data_race() suffices.

This data race was reported by KCSAN.  Not appropriate for backporting
due to failure being unlikely.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:01:16 -07:00
Paul E. McKenney
314eeb43e5 rcu: Add *_ONCE() and data_race() to rcu_node ->exp_tasks plus locking
There are lockless loads from the rcu_node structure's ->exp_tasks field,
so this commit causes all stores to use WRITE_ONCE() and all lockless
loads to use READ_ONCE() or data_race(), with the latter for debug
prints.  This code also did an unprotected traversal of the linked list
pointed into by ->exp_tasks, so this commit also acquires the rcu_node
structure's ->lock to properly protect this traversal.  This list was
traversed unprotected only when printing an RCU CPU stall warning for
an expedited grace period, so the odds of seeing this in production are
not all that high.

This data race was reported by KCSAN.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:01:15 -07:00
Paul E. McKenney
2f08469563 rcu: Mark rcu_state.ncpus to detect concurrent writes
The rcu_state structure's ncpus field is only to be modified by the
CPU-hotplug CPU-online code path, which is single-threaded.  This
commit therefore enlists KCSAN's help in enforcing this restriction.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:01:15 -07:00
Paul E. McKenney
4f58820fd7 srcu: Add KCSAN stubs
This commit adds stubs for KCSAN's data_race(), ASSERT_EXCLUSIVE_WRITER(),
and ASSERT_EXCLUSIVE_ACCESS() macros to allow code using these macros to
move ahead.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:01:15 -07:00
Paul E. McKenney
353159365e rcu: Add KCSAN stubs
This commit adds stubs for KCSAN's data_race(), ASSERT_EXCLUSIVE_WRITER(),
and ASSERT_EXCLUSIVE_ACCESS() macros to allow code using these macros to
move ahead.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:00:06 -07:00
Ingo Molnar
40e7d7bdc1 Merge branch 'urgent-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/urgent
Pull RCU fix from Paul E. McKenney.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-04-14 08:36:41 +02:00
Paul E. McKenney
bf37da98c5 rcu: Don't acquire lock in NMI handler in rcu_nmi_enter_common()
The rcu_nmi_enter_common() function can be invoked both in interrupt
and NMI handlers.  If it is invoked from process context (as opposed
to userspace or idle context) on a nohz_full CPU, it might acquire the
CPU's leaf rcu_node structure's ->lock.  Because this lock is held only
with interrupts disabled, this is safe from an interrupt handler, but
doing so from an NMI handler can result in self-deadlock.

This commit therefore adds "irq" to the "if" condition so as to only
acquire the ->lock from irq handlers or process context, never from
an NMI handler.

Fixes: 5b14557b07 ("rcu: Avoid tick_dep_set_cpu() misordering")
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: <stable@vger.kernel.org> # 5.5.x
2020-04-05 14:22:15 -07:00
Linus Torvalds
4b9fd8a829 Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
 "The main changes in this cycle were:

   - Continued user-access cleanups in the futex code.

   - percpu-rwsem rewrite that uses its own waitqueue and atomic_t
     instead of an embedded rwsem. This addresses a couple of
     weaknesses, but the primary motivation was complications on the -rt
     kernel.

   - Introduce raw lock nesting detection on lockdep
     (CONFIG_PROVE_RAW_LOCK_NESTING=y), document the raw_lock vs. normal
     lock differences. This too originates from -rt.

   - Reuse lockdep zapped chain_hlocks entries, to conserve RAM
     footprint on distro-ish kernels running into the "BUG:
     MAX_LOCKDEP_CHAIN_HLOCKS too low!" depletion of the lockdep
     chain-entries pool.

   - Misc cleanups, smaller fixes and enhancements - see the changelog
     for details"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (55 commits)
  fs/buffer: Make BH_Uptodate_Lock bit_spin_lock a regular spinlock_t
  thermal/x86_pkg_temp: Make pkg_temp_lock a raw_spinlock_t
  Documentation/locking/locktypes: Minor copy editor fixes
  Documentation/locking/locktypes: Further clarifications and wordsmithing
  m68knommu: Remove mm.h include from uaccess_no.h
  x86: get rid of user_atomic_cmpxchg_inatomic()
  generic arch_futex_atomic_op_inuser() doesn't need access_ok()
  x86: don't reload after cmpxchg in unsafe_atomic_op2() loop
  x86: convert arch_futex_atomic_op_inuser() to user_access_begin/user_access_end()
  objtool: whitelist __sanitizer_cov_trace_switch()
  [parisc, s390, sparc64] no need for access_ok() in futex handling
  sh: no need of access_ok() in arch_futex_atomic_op_inuser()
  futex: arch_futex_atomic_op_inuser() calling conventions change
  completion: Use lockdep_assert_RT_in_threaded_ctx() in complete_all()
  lockdep: Add posixtimer context tracing bits
  lockdep: Annotate irq_work
  lockdep: Add hrtimer context tracing bits
  lockdep: Introduce wait-type checks
  completion: Use simple wait queues
  sched/swait: Prepare usage in completions
  ...
2020-03-30 16:17:15 -07:00
Paul E. McKenney
aa93ec620b Merge branches 'doc.2020.02.27a', 'fixes.2020.03.21a', 'kfree_rcu.2020.02.20a', 'locktorture.2020.02.20a', 'ovld.2020.02.20a', 'rcu-tasks.2020.02.20a', 'srcu.2020.02.20a' and 'torture.2020.02.20a' into HEAD
doc.2020.02.27a: Documentation updates.
fixes.2020.03.21a: Miscellaneous fixes.
kfree_rcu.2020.02.20a: Updates to kfree_rcu().
locktorture.2020.02.20a: Lock torture-test updates.
ovld.2020.02.20a: Updates to callback-overload handling.
rcu-tasks.2020.02.20a: RCU-tasks updates.
srcu.2020.02.20a: SRCU updates.
torture.2020.02.20a: Torture-test updates.
2020-03-21 17:15:11 -07:00
Paul E. McKenney
127e29815b rcu: Make rcu_barrier() account for offline no-CBs CPUs
Currently, rcu_barrier() ignores offline CPUs.  However, it is possible
for an offline no-CBs CPU to have callbacks queued, and rcu_barrier()
must wait for those callbacks.  This commit therefore makes rcu_barrier()
directly invoke the rcu_barrier_func() with interrupts disabled for such
CPUs.  This requires passing the CPU number into this function so that
it can entrain the rcu_barrier() callback onto the correct CPU's callback
list, given that the code must instead execute on the current CPU.

While in the area, this commit fixes a bug where the first CPU's callback
might have been invoked before rcu_segcblist_entrain() returned, which
would also result in an early wakeup.

Fixes: 5d6742b377 ("rcu/nocb: Use rcu_segcblist for no-CBs CPUs")
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
[ paulmck: Apply optimization feedback from Boqun Feng. ]
Cc: <stable@vger.kernel.org> # 5.5.x
2020-03-21 16:14:25 -07:00
Paul E. McKenney
0f11ad323d rcu: Mark rcu_state.gp_seq to detect concurrent writes
The rcu_state structure's gp_seq field is only to be modified by the RCU
grace-period kthread, which is single-threaded.  This commit therefore
enlists KCSAN's help in enforcing this restriction.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-03-21 16:13:39 -07:00
Sebastian Andrzej Siewior
49915ac35c lockdep: Annotate irq_work
Mark irq_work items with IRQ_WORK_HARD_IRQ which should be invoked in
hardirq context even on PREEMPT_RT. IRQ_WORK without this flag will be
invoked in softirq context on PREEMPT_RT.

Set ->irq_config to 1 for the IRQ_WORK items which are invoked in softirq
context so lockdep knows that these can safely acquire a spinlock_t.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200321113242.643576700@linutronix.de
2020-03-21 16:00:24 +01:00
Peter Zijlstra
de8f5e4f2d lockdep: Introduce wait-type checks
Extend lockdep to validate lock wait-type context.

The current wait-types are:

	LD_WAIT_FREE,		/* wait free, rcu etc.. */
	LD_WAIT_SPIN,		/* spin loops, raw_spinlock_t etc.. */
	LD_WAIT_CONFIG,		/* CONFIG_PREEMPT_LOCK, spinlock_t etc.. */
	LD_WAIT_SLEEP,		/* sleeping locks, mutex_t etc.. */

Where lockdep validates that the current lock (the one being acquired)
fits in the current wait-context (as generated by the held stack).

This ensures that there is no attempt to acquire mutexes while holding
spinlocks, to acquire spinlocks while holding raw_spinlocks and so on. In
other words, it's a more fancy might_sleep().

Obviously RCU made the entire ordeal more complex than a simple single
value test because RCU can be acquired in (pretty much) any context and
while it presents a context to nested locks it is not the same as it
got acquired in.

Therefore it's necessary to split the wait_type into two values, one
representing the acquire (outer) and one representing the nested context
(inner). For most 'normal' locks these two are the same.

[ To make static initialization easier we have the rule that:
  .outer == INV means .outer == .inner; because INV == 0. ]

It further means that it's required to find the minimal .inner of the held
stack to compare against the outer of the new lock; because while 'normal'
RCU presents a CONFIG type to nested locks, if it is taken while already
holding a SPIN type it obviously doesn't relax the rules.

Below is an example output generated by the trivial test code:

  raw_spin_lock(&foo);
  spin_lock(&bar);
  spin_unlock(&bar);
  raw_spin_unlock(&foo);

 [ BUG: Invalid wait context ]
 -----------------------------
 swapper/0/1 is trying to lock:
 ffffc90000013f20 (&bar){....}-{3:3}, at: kernel_init+0xdb/0x187
 other info that might help us debug this:
 1 lock held by swapper/0/1:
  #0: ffffc90000013ee0 (&foo){+.+.}-{2:2}, at: kernel_init+0xd1/0x187

The way to read it is to look at the new -{n,m} part in the lock
description; -{3:3} for the attempted lock, and try and match that up to
the held locks, which in this case is the one: -{2:2}.

This tells that the acquiring lock requires a more relaxed environment than
presented by the lock stack.

Currently only the normal locks and RCU are converted, the rest of the
lockdep users default to .inner = INV, which is ignored.  More conversions
can be done when desired.

The check for spinlock_t nesting is not enabled by default. It's a separate
config option for now as there are known problems which are currently
addressed. The config option allows to identify these problems and to
verify that the solutions found are indeed solving them.

The config switch will be removed and the checks will be permanently
enabled once the vast majority of issues have been addressed.

[ bigeasy: Move LD_WAIT_FREE,… out of CONFIG_LOCKDEP to avoid compile
	   failure with CONFIG_DEBUG_SPINLOCK + !CONFIG_LOCKDEP]
[ tglx: Add the config option ]

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200321113242.427089655@linutronix.de
2020-03-21 16:00:24 +01:00
Paul E. McKenney
9470a18fab rcutorture: Manually clean up after rcu_barrier() failure
Currently, if rcu_barrier() returns too soon, the test waits 100ms and
then does another instance of the test.  However, if rcu_barrier() were
to have waited for more than 100ms too short a time, this could cause
the test's rcu_head structures to be reused while they were still on
RCU's callback lists.  This can result in knock-on errors that obscure
the original rcu_barrier() test failure.

This commit therefore adds code that attempts to wait until all of
the test's callbacks have been invoked.  Of course, if RCU completely
lost track of the corresponding rcu_head structures, this wait could be
forever.  This commit therefore also complains if this attempted recovery
takes more than one second, and it also gives up when the test ends.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-02-20 16:03:31 -08:00
Paul E. McKenney
50d4b62970 rcutorture: Make rcu_torture_barrier_cbs() post from corresponding CPU
Currently, rcu_torture_barrier_cbs() posts callbacks from whatever CPU
it is running on, which means that all these kthreads might well be
posting from the same CPU, which would drastically reduce the effectiveness
of this test.  This commit therefore uses IPIs to make the callbacks be
posted from the corresponding CPU (given by local variable myid).

If the IPI fails (which can happen if the target CPU is offline or does
not exist at all), the callback is posted on whatever CPU is currently
running.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-02-20 16:03:31 -08:00
Joel Fernandes (Google)
12af660321 rcuperf: Measure memory footprint during kfree_rcu() test
During changes to kfree_rcu() code, we often check the amount of free
memory.  As an alternative to checking this manually, this commit adds a
measurement in the test itself.  It measures the available memory four
times during the test, digitally filters these measurements to produce a
running average with a weight of 0.5, and compares this digitally filtered
value with the amount of available memory at the beginning of the test.
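
A hedged sketch of the filtering step (hypothetical names, not the
actual rcuperf code):

  #include <linux/mm.h>

  static unsigned long demo_avg_free_pages;

  /* Called at each of the sample points during the test. */
  static void demo_sample_free_mem(void)
  {
          unsigned long cur = si_mem_available();

          if (!demo_avg_free_pages)
                  demo_avg_free_pages = cur;
          /* Running average giving a weight of 0.5 to the new sample. */
          demo_avg_free_pages = (demo_avg_free_pages + cur) / 2;
  }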

Something like the following is printed at the end of the run:

Total time taken by all kfree'ers: 6369738407 ns, loops: 10000, batches: 764, memory footprint: 216MB

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-02-20 16:03:31 -08:00
Paul E. McKenney
5396d31d3a rcutorture: Annotation lockless accesses to rcu_torture_current
The rcutorture global variable rcu_torture_current is accessed locklessly,
so it must use the RCU pointer load/store primitives.  This commit
therefore adds several that were missed.

This data race was reported by KCSAN.  Not appropriate for backporting due
to failure being unlikely and due to this being used only by rcutorture.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-02-20 16:03:31 -08:00
Paul E. McKenney
f042a436c8 rcutorture: Add READ_ONCE() to rcu_torture_count and rcu_torture_batch
The rcutorture rcu_torture_count and rcu_torture_batch per-CPU variables
are read locklessly, so this commit adds the READ_ONCE() to a load in
order to avoid various types of compiler vandalism^Woptimization.

This data race was reported by KCSAN. Not appropriate for backporting
due to failure being unlikely and due to this being rcutorture.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-02-20 16:03:31 -08:00
Paul E. McKenney
102c14d2f8 rcutorture: Fix stray access to rcu_fwd_cb_nodelay
The rcu_fwd_cb_nodelay variable suppresses excessively long read-side
delays while carrying out an rcutorture forward-progress test.  As such,
it is accessed both by readers and updaters, and most of the accesses
therefore use *_ONCE().  Except for one in rcu_read_delay(), which this
commit fixes.

This data race was reported by KCSAN.  Not appropriate for backporting
due to this being rcutorture.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-02-20 16:03:31 -08:00
Paul E. McKenney
202489101f rcutorture: Fix rcu_torture_one_read()/rcu_torture_writer() data race
The ->rtort_pipe_count field in the rcu_torture structure checks for
too-short grace periods, and is therefore read by rcutorture's readers
while being updated by rcutorture's writers.  This commit therefore
adds the needed READ_ONCE() and WRITE_ONCE() invocations.

This data race was reported by KCSAN.  Not appropriate for backporting
due to failure being unlikely and due to this being rcutorture.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-02-20 16:03:31 -08:00
Paul E. McKenney
4ab00bdd99 rcutorture: Suppress boottime bad-sequence warnings
In normal production, an excessively long wait on a grace period
(synchronize_rcu(), for example) at boottime is often just as bad
as at any other time.  In fact, given the desire for fast boot, any
sort of long wait at boot is a bad idea.  However, heavy rcutorture
testing on large hyperthreaded systems can generate such long waits
during boot as a matter of course.  This commit therefore causes
the rcupdate.rcu_cpu_stall_suppress_at_boot kernel boot parameter to
suppress reporting of boottime bad-sequence warnings due to excessively
long grace-period waits.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-02-20 16:03:30 -08:00
Paul E. McKenney
58c53360b3 rcutorture: Allow boottime stall warnings to be suppressed
In normal production, an RCU CPU stall warning at boottime is often
just as bad as at any other time.  In fact, given the desire for fast
boot, any sort of long-term stall at boot is a bad idea.  However,
heavy rcutorture testing on large hyperthreaded systems can generate
boottime RCU CPU stalls as a matter of course.  This commit therefore
provides a kernel boot parameter that suppresses reporting of boottime
RCU CPU stall warnings and similarly of rcutorture writer stalls.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-02-20 16:03:30 -08:00
Paul E. McKenney
435508095a rcutorture: Refrain from callback flooding during boot
Additional rcutorture aggression can result in, believe it or not,
boot times in excess of three minutes on large hyperthreaded systems.
This is long enough for rcutorture to decide to do some callback flooding,
which seems a bit excessive given that userspace cannot have started
until long after boot, and it is userspace that does the real-world
callback flooding.  Worse yet, because Tiny RCU lacks forward-progress
functionality, the looping-in-the-kernel tests can also be problematic
during early boot.

This commit therefore causes rcutorture to hold off on callback
flooding until about the time that init is spawned, and the same
for looping-in-the-kernel tests for Tiny RCU.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-02-20 16:03:30 -08:00
Paul E. McKenney
59ee0326cc rcutorture: Suppress forward-progress complaints during early boot
Some larger systems can take in excess of 50 seconds to complete their
early boot initcalls prior to spawning init.  This does not in any way
help the forward-progress judgments of built-in rcutorture (when
rcutorture is built as a module, the insmod or modprobe command normally
cannot happen until some time after boot completes).  This commit
therefore suppresses such complaints until about the time that init
is spawned.

This also includes a fix to a stupid error located by kbuild test robot.

[ paulmck: Apply kbuild test robot feedback. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
[ paulmck: Fix to nohz_full slow-expediting recovery logic, per bpetkov. ]
[ paulmck: Restrict splat to CONFIG_PREEMPT_RT=y kernels and simplify. ]
Tested-by: Borislav Petkov <bp@alien8.de>
2020-02-20 16:03:30 -08:00
Paul E. McKenney
710426068d srcu: Hold srcu_struct ->lock when updating ->srcu_gp_seq
A read of the srcu_struct structure's ->srcu_gp_seq field should not
need READ_ONCE() when that structure's ->lock is held.  Except that this
lock is not always held when updating this field.  This commit therefore
acquires the lock around updates and removes a now-unneeded READ_ONCE().

This data race was reported by KCSAN.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
[ paulmck: Switch from READ_ONCE() to lock per Peter Zijlstra question. ]
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2020-02-20 16:01:11 -08:00
Paul E. McKenney
39f91504a0 srcu: Fix process_srcu()/srcu_batches_completed() datarace
The srcu_struct structure's ->srcu_idx field is accessed locklessly,
so reads must use READ_ONCE().  This commit therefore adds the needed
READ_ONCE() invocation where it was missed.

This data race was reported by KCSAN.  Not appropriate for backporting
due to failure being unlikely.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-02-20 16:01:11 -08:00
Paul E. McKenney
8c9e0cb323 srcu: Fix __call_srcu()/srcu_get_delay() datarace
The srcu_struct structure's ->srcu_gp_seq_needed_exp field is accessed
locklessly, so updates must use WRITE_ONCE().  This commit therefore
adds the needed WRITE_ONCE() invocations.

This data race was reported by KCSAN.  Not appropriate for backporting
due to failure being unlikely.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-02-20 16:01:11 -08:00
Paul E. McKenney
7ff8b4502b srcu: Fix __call_srcu()/process_srcu() datarace
The srcu_node structure's ->srcu_gp_seq_needed_exp field is accessed
locklessly, so updates must use WRITE_ONCE().  This commit therefore
adds the needed WRITE_ONCE() invocations.

This data race was reported by KCSAN.  Not appropriate for backporting
due to failure being unlikely.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-02-20 16:01:11 -08:00
Jules Irenge
90ba11ba99 rcu: Add missing annotation for exit_tasks_rcu_finish()
Sparse reports a warning at exit_tasks_rcu_finish(void)

|warning: context imbalance in exit_tasks_rcu_finish()
|- wrong count at exit

To fix this, this commit adds a __releases(&tasks_rcu_exit_srcu).
Given that exit_tasks_rcu_finish() does actually call __srcu_read_unlock(),
this not only fixes the warning but also improves on the readability of
the code.
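
For readers unfamiliar with these annotations, a generic illustration
(hypothetical lock, unrelated to this commit) of how __acquires() and
__releases() keep sparse's context counting balanced:

  #include <linux/spinlock.h>

  static DEFINE_SPINLOCK(demo_lock);

  static void demo_enter(void) __acquires(&demo_lock)
  {
          spin_lock(&demo_lock);
  }

  static void demo_exit(void) __releases(&demo_lock)
  {
          spin_unlock(&demo_lock);
  }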

Signed-off-by: Jules Irenge <jbi.octave@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
2020-02-20 16:00:45 -08:00
Jules Irenge
e1e9bdc00a rcu: Add missing annotation for exit_tasks_rcu_start()
Sparse reports a warning at exit_tasks_rcu_start(void)

|warning: context imbalance in exit_tasks_rcu_start() - wrong count at exit

To fix this, this commit adds an __acquires(&tasks_rcu_exit_srcu).
Given that exit_tasks_rcu_start() does actually call __srcu_read_lock(),
this not only fixes the warning but also improves on the readability of
the code.

Signed-off-by: Jules Irenge <jbi.octave@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
2020-02-20 16:00:45 -08:00
Paul E. McKenney
fcb7381265 rcu-tasks: *_ONCE() for rcu_tasks_cbs_head
The RCU tasks list of callbacks, rcu_tasks_cbs_head, is sampled locklessly
by rcu_tasks_kthread() when waiting for work to do.  This commit therefore
applies READ_ONCE() to that lockless sampling and WRITE_ONCE() to the
single potential store outside of rcu_tasks_kthread.

This data race was reported by KCSAN.  Not appropriate for backporting
due to failure being unlikely.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-02-20 16:00:45 -08:00
Paul E. McKenney
b692dc4adf rcu: Update __call_rcu() comments
The __call_rcu() function's header comment refers to a cpu argument
that no longer exists, and the comment of the return path from
rcu_nocb_try_bypass() ignores the non-no-CBs CPU case.  This commit
therefore updates both comments.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-02-20 16:00:20 -08:00
Colin Ian King
aa96a93ba2 rcu: Fix spelling mistake "leval" -> "level"
This commit fixes a spelling mistake in a pr_info() message.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-02-20 16:00:20 -08:00
Paul E. McKenney
8c14263d35 rcu: React to callback overload by boosting RCU readers
RCU priority boosting currently is not applied until the grace period
is at least 250 milliseconds old (or the number of milliseconds specified
by the CONFIG_RCU_BOOST_DELAY Kconfig option).  Although this has worked
well, it can result in OOM under conditions of RCU callback flooding.
One can argue that the real-time systems using RCU priority boosting
should carefully avoid RCU callback flooding, but one can just as well
argue that an OOM is a rather obnoxious error message.

This commit therefore disables the RCU priority boosting delay when
there are excessive numbers of callbacks queued.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-02-20 16:00:20 -08:00
Paul E. McKenney
b2b00ddf19 rcu: React to callback overload by aggressively seeking quiescent states
In default configurations, RCU currently waits at least 100 milliseconds
before asking cond_resched() and/or resched_rcu() for help seeking
quiescent states to end a grace period.  But 100 milliseconds can be
one good long time during an RCU callback flood, for example, as can
happen when user processes repeatedly open and close files in a tight
loop.  These 100-millisecond gaps in successive grace periods during a
callback flood can result in excessive numbers of callbacks piling up,
unnecessarily increasing memory footprint.

This commit therefore asks cond_resched() and/or resched_rcu() for help
as early as the first FQS scan when at least one of the CPUs has more
than 20,000 callbacks queued, a number that can be changed using the new
rcutree.qovld kernel boot parameter.  An auxiliary qovld_calc variable
is used to avoid acquisition of locks that have not yet been initialized.
Early tests indicate that this reduces the RCU-callback memory footprint
during rcutorture floods by from 50% to 4x, depending on configuration.
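
For example (illustration only), booting with the following would lower
that threshold to 10,000 queued callbacks:

  rcutree.qovld=10000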

Reported-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Reported-by: Tejun Heo <tj@kernel.org>
[ paulmck: Fix bug located by Qian Cai. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: Dexuan Cui <decui@microsoft.com>
Tested-by: Qian Cai <cai@lca.pw>
2020-02-20 16:00:20 -08:00
Paul E. McKenney
b5ea03709d rcu: Clear ->core_needs_qs at GP end or self-reported QS
The rcu_data structure's ->core_needs_qs field does not necessarily get
cleared in a timely fashion after the corresponding CPUs' quiescent state
has been reported.  From a functional viewpoint, no harm done, but this
can result in excessive invocation of RCU core processing, as witnessed
by the kernel test robot, which saw greatly increased softirq overhead.

This commit therefore restores the rcu_report_qs_rdp() function's
clearing of this field, but only when running on the corresponding CPU.
Cases where some other CPU reports the quiescent state (for example, on
behalf of an idle CPU) are handled by setting this field appropriately
within the __note_gp_changes() function's end-of-grace-period checks.
This handling is carried out regardless of whether the end of a grace
period actually happened, thus handling the case where a CPU goes non-idle
after a quiescent state is reported on its behalf, but before the grace
period ends.  This fix also avoids cross-CPU updates to ->core_needs_qs.

While in the area, this commit changes the __note_gp_changes() need_gp
variable's name to need_qs because it is a quiescent state that is needed
from the CPU in question.

Fixes: ed93dfc6bc ("rcu: Confine ->core_needs_qs accesses to the corresponding CPU")
Reported-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-02-20 16:00:20 -08:00
Uladzislau Rezki (Sony)
613707929b rcu: Add a trace event for kfree_rcu() use of kfree_bulk()
The event is given three parameters: the name of the RCU flavour, the
number of elements in the array to be freed, and the address of the
array holding the pointers to be freed by the kfree_bulk() function.

To enable the trace event, the kernel has to be built with
CONFIG_RCU_TRACE=y.  After that, it is possible to track the events
using the ftrace subsystem.

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-02-20 15:58:51 -08:00
Uladzislau Rezki (Sony)
34c8817455 rcu: Support kfree_bulk() interface in kfree_rcu()
The kfree_rcu() logic can be improved further by using the kfree_bulk()
interface along with the "basic batching support" introduced earlier.

There are at least two advantages of using the "bulk" interface:
- in the case of a large number of kfree_rcu() requests, kfree_bulk()
  reduces the per-object overhead caused by calling kfree() on each
  object individually.

- reduces the number of cache misses due to "pointer chasing"
  between objects, which can be spread far apart from each other.

This approach defines a new kfree_rcu_bulk_data structure that
stores pointers in an array of a specific size.  The number of entries
in that array depends on PAGE_SIZE, so that the kfree_rcu_bulk_data
structure occupies exactly one page.
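
A hedged sketch of the idea (field and macro names hypothetical, layout
simplified) of a pointer block sized to fill exactly one page:

  #include <linux/mm.h>

  #define DEMO_BULK_ENTRIES                                        \
          ((PAGE_SIZE - sizeof(struct demo_bulk_data *) -          \
            sizeof(unsigned long)) / sizeof(void *))

  struct demo_bulk_data {
          void *records[DEMO_BULK_ENTRIES];       /* pointers to kfree() */
          struct demo_bulk_data *next;            /* block chain */
          unsigned long nr_records;
  };

  /* When a block fills up, its array is handed to kfree_bulk():
   *
   *      kfree_bulk(bdata->nr_records, bdata->records);
   */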

Since this uses a "block-chain" technique, a dynamic allocation is
needed whenever a new block is required.  Memory is allocated with the
GFP_NOWAIT | __GFP_NOWARN flags, which skip direct reclaim under
low-memory conditions to prevent stalling and fail silently under high
memory pressure.

The "emergency path" gets maintained when a system is run out of
memory. In that case objects are linked into regular list.

The "rcuperf" was run to analyze this change in terms of memory
consumption and kfree_bulk() throughput.

1) Testing on the Intel(R) Xeon(R) W-2135 CPU @ 3.70GHz, 12xCPUs
with following parameters:

kfree_loops=200000 kfree_alloc_num=1000 kfree_rcu_test=1 kfree_vary_obj_size=1
dev.2020.01.10a branch

Default / CONFIG_SLAB
53607352517 ns, loops: 200000, batches: 1885, memory footprint: 1248MB
53529637912 ns, loops: 200000, batches: 1921, memory footprint: 1193MB
53570175705 ns, loops: 200000, batches: 1929, memory footprint: 1250MB

Patch / CONFIG_SLAB
23981587315 ns, loops: 200000, batches: 810, memory footprint: 1219MB
23879375281 ns, loops: 200000, batches: 822, memory footprint: 1190MB
24086841707 ns, loops: 200000, batches: 794, memory footprint: 1380MB

Default / CONFIG_SLUB
51291025022 ns, loops: 200000, batches: 1713, memory footprint: 741MB
51278911477 ns, loops: 200000, batches: 1671, memory footprint: 719MB
51256183045 ns, loops: 200000, batches: 1719, memory footprint: 647MB

Patch / CONFIG_SLUB
50709919132 ns, loops: 200000, batches: 1618, memory footprint: 456MB
50736297452 ns, loops: 200000, batches: 1633, memory footprint: 507MB
50660403893 ns, loops: 200000, batches: 1628, memory footprint: 429MB

In the CONFIG_SLAB case there is roughly a twofold increase in
performance and slightly higher memory usage.  As for CONFIG_SLUB, the
performance figures are slightly better together with lower memory usage.

2) Testing on the HiKey-960, arm64, 8 CPUs, with the following parameters:

CONFIG_SLAB=y
kfree_loops=200000 kfree_alloc_num=1000 kfree_rcu_test=1

102898760401 ns, loops: 200000, batches: 5822, memory footprint: 158MB
89947009882  ns, loops: 200000, batches: 6715, memory footprint: 115MB

rcuperf shows approximately 12% better throughput when using the
"bulk" interface.  The "drain logic", that is, its RCU callback, does
the work faster, which leads to better throughput.

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Tested-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-02-20 15:58:51 -08:00
Paul E. McKenney
3d05031ae6 rcu: Make nocb_gp_wait() double-check unexpected-callback warning
Currently, nocb_gp_wait() unconditionally complains if there is a
callback not already associated with a grace period.  This assumes that
either there was no such callback initially on the one hand, or that
the rcu_advance_cbs() function assigned all such callbacks to a grace
period on the other.  However, in theory there are some situations that
would prevent rcu_advance_cbs() from assigning all of the callbacks.

This commit therefore checks for unassociated callbacks immediately after
rcu_advance_cbs() returns, while the corresponding rcu_node structure's
->lock is still held.  If there are unassociated callbacks at that point,
the subsequent WARN_ON_ONCE() is disabled.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-02-20 15:58:23 -08:00
Paul E. McKenney
13817dd589 rcu: Tighten rcu_lockdep_assert_cblist_protected() check
The ->nocb_lock lockdep assertion is currently guarded by cpu_online(),
which is incorrect for no-CBs CPUs, whose callback lists must be
protected by ->nocb_lock regardless of whether or not the corresponding
CPU is online.  This situation could result in failure to detect bugs
resulting from failing to hold ->nocb_lock for offline CPUs.

This commit therefore removes the cpu_online() guard.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-02-20 15:58:23 -08:00
Paul E. McKenney
faa059c397 rcu: Optimize and protect atomic_cmpxchg() loop
This commit reworks the atomic_cmpxchg() loop in rcu_eqs_special_set()
to do only the initial read from the current CPU's rcu_data structure's
->dynticks field explicitly.  On subsequent passes, this value is instead
retained from the failing atomic_cmpxchg() operation.
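
The resulting pattern looks roughly like the following sketch
(illustrative only, not the exact kernel code):

	/* Read ->dynticks once, then reuse the value returned by each
	 * failing atomic_cmpxchg() instead of re-reading it per pass. */
	int old = atomic_read(&rdp->dynticks);

	for (;;) {
		int seen = atomic_cmpxchg(&rdp->dynticks, old, old + 1);

		if (seen == old)
			break;		/* Update succeeded. */
		old = seen;		/* Retain the failing cmpxchg's value. */
	}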

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-02-20 15:58:23 -08:00
Jules Irenge
92c0b889f2 rcu/nocb: Add missing annotation for rcu_nocb_bypass_unlock()
Sparse reports a warning at rcu_nocb_bypass_unlock()

warning: context imbalance in rcu_nocb_bypass_unlock() - unexpected unlock

The root cause is a missing annotation of rcu_nocb_bypass_unlock()
which causes the warning.

This commit therefore adds the missing __releases(&rdp->nocb_bypass_lock)
annotation.

Signed-off-by: Jules Irenge <jbi.octave@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Acked-by: Boqun Feng <boqun.feng@gmail.com>
2020-02-20 15:58:23 -08:00
Jules Irenge
9ced454807 rcu: Add missing annotation for rcu_nocb_bypass_lock()
Sparse reports a warning at rcu_nocb_bypass_lock()

|warning: context imbalance in rcu_nocb_bypass_lock() - wrong count at exit

To fix this, this commit adds an __acquires(&rdp->nocb_bypass_lock).
Given that rcu_nocb_bypass_lock() does actually call raw_spin_lock()
when raw_spin_trylock() fails, this not only fixes the warning but also
improves the readability of the code.
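
A minimal sketch of the annotated locking function (illustrative; the
kernel version also tracks contention) might look like:

	static void rcu_nocb_bypass_lock(struct rcu_data *rdp)
		__acquires(&rdp->nocb_bypass_lock)
	{
		/* Fast path: try to take the lock without spinning. */
		if (raw_spin_trylock(&rdp->nocb_bypass_lock))
			return;
		/* Slow path: spin until the lock is acquired. */
		raw_spin_lock(&rdp->nocb_bypass_lock);
	}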

Signed-off-by: Jules Irenge <jbi.octave@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-02-20 15:58:23 -08:00
Paul E. McKenney
5648d65912 rcu: Don't flag non-starting GPs before GP kthread is running
Currently rcu_check_gp_start_stall() complains if a grace period takes
too long to start, where "too long" is roughly one RCU CPU stall-warning
interval.  This has worked well, but there are some debugging Kconfig
options (such as CONFIG_EFI_PGT_DUMP=y) that can make booting take a
very long time, so much so that the stall-warning interval has expired
before RCU's grace-period kthread has even been spawned.

This commit therefore resets the rcu_state.gp_req_activity and
rcu_state.gp_activity timestamps just before the grace-period kthread
is spawned, and modifies the checks and adds ordering to ensure that
if rcu_check_gp_start_stall() sees that the grace-period kthread
has been spawned, that it will also see the resets applied to the
rcu_state.gp_req_activity and rcu_state.gp_activity timestamps.

Reported-by: Qian Cai <cai@lca.pw>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
[ paulmck: Fix whitespace issues reported by Qian Cai. ]
Tested-by: Qian Cai <cai@lca.pw>
[ paulmck: Simplify grace-period wakeup check per Steve Rostedt feedback. ]
2020-02-20 15:58:23 -08:00
Paul E. McKenney
aa24f93753 rcu: Fix rcu_barrier_callback() race condition
The rcu_barrier_callback() function does an atomic_dec_and_test(), and
if it is the last CPU to check in, does the required wakeup.  Either way,
it does an event trace.  Unfortunately, this is susceptible to the
following sequence of events:

o	CPU 0 invokes rcu_barrier_callback(), but atomic_dec_and_test()
	says that it is not last.  But at this point, CPU 0 is delayed,
	perhaps due to an NMI, SMI, or vCPU preemption.

o	CPU 1 invokes rcu_barrier_callback(), and atomic_dec_and_test()
	says that it is last.  So CPU 1 traces completion and does
	the needed wakeup.

o	The awakened rcu_barrier() function does cleanup and releases
	rcu_state.barrier_mutex.

o	Another CPU now acquires rcu_state.barrier_mutex and starts
	another round of rcu_barrier() processing, including updating
	rcu_state.barrier_sequence.

o	CPU 0 gets its act back together and does its tracing.  Except
	that rcu_state.barrier_sequence has already been updated, so
	its tracing is incorrect and probably quite confusing.
	(Wait!  Why did this CPU check in twice for one rcu_barrier()
	invocation???)

This commit therefore causes rcu_barrier_callback() to take a
snapshot of the value of rcu_state.barrier_sequence before invoking
atomic_dec_and_test(), thus guaranteeing that the event-trace output
is sensible, even if the timing of the event-trace output might still
be confusing.  (Wait!  Why did the old rcu_barrier() complete before
all of its CPUs checked in???)  But being that this is RCU, only so much
confusion can reasonably be eliminated.
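
A rough sketch of the resulting flow (illustrative, with details
omitted):

	static void rcu_barrier_callback(struct rcu_head *rhp)
	{
		unsigned long s = READ_ONCE(rcu_state.barrier_sequence);

		/* Snapshot taken before the decrement, so the traces below
		 * cannot observe a later rcu_barrier() instance's value. */
		if (atomic_dec_and_test(&rcu_state.barrier_cpu_count)) {
			rcu_barrier_trace(TPS("LastCB"), -1, s);
			complete(&rcu_state.barrier_completion);
		} else {
			rcu_barrier_trace(TPS("CB"), -1, s);
		}
	}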

This data race was reported by KCSAN.  Not appropriate for backporting
due to failure being unlikely and due to the mild consequences of the
failure, namely a confusing event trace.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-02-20 15:58:23 -08:00
Paul E. McKenney
59881bcd85 rcu: Add WRITE_ONCE() to rcu_state ->gp_start
The rcu_state structure's ->gp_start field is read locklessly, so this
commit adds the WRITE_ONCE() to an update in order to provide proper
documentation and READ_ONCE()/WRITE_ONCE() pairing.
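
The resulting pattern, sketched for illustration:

	/* Updater: marked store pairs with lockless readers. */
	WRITE_ONCE(rcu_state.gp_start, jiffies);

	/* Lockless reader elsewhere. */
	unsigned long gp_start = READ_ONCE(rcu_state.gp_start);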

This data race was reported by KCSAN.  Not appropriate for backporting
due to failure being unlikely.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-02-20 15:58:23 -08:00
Paul E. McKenney
57721fd15a rcu: Remove dead code from rcu_segcblist_insert_pend_cbs()
The rcu_segcblist_insert_pend_cbs() function currently (partially)
initializes the rcu_cblist that it pulls callbacks from.  However, all
the resulting stores are dead because all callers pass in the address of
an on-stack cblist that is not used afterwards.  This commit therefore
removes this pointless initialization.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-02-20 15:58:23 -08:00
Paul E. McKenney
3ca3b0e2cb rcu: Add *_ONCE() to rcu_node ->boost_kthread_status
The rcu_node structure's ->boost_kthread_status field is accessed
locklessly, so this commit causes all updates to use WRITE_ONCE() and
all reads to use READ_ONCE().

This data race was reported by KCSAN.  Not appropriate for backporting
due to failure being unlikely.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-02-20 15:58:22 -08:00
Paul E. McKenney
2a2ae872ef rcu: Add *_ONCE() to rcu_data ->rcu_forced_tick
The rcu_data structure's ->rcu_forced_tick field is read locklessly, so
this commit adds WRITE_ONCE() to all updates and READ_ONCE() to all
lockless reads.

This data race was reported by KCSAN.  Not appropriate for backporting
due to failure being unlikely.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-02-20 15:58:22 -08:00
Paul E. McKenney
a5b8950180 rcu: Add READ_ONCE() to rcu_data ->gpwrap
The rcu_data structure's ->gpwrap field is read locklessly, and so
this commit adds the required READ_ONCE() to a pair of loads in order
to avoid destructive compiler optimizations.

This data race was reported by KCSAN.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-02-20 15:58:22 -08:00
SeongJae Park
65bb0dc437 rcu: Fix typos in file-header comments
Convert to plural and add a note that this is for Tree RCU.

Signed-off-by: SeongJae Park <sjpark@amazon.de>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-02-20 15:58:22 -08:00
Paul E. McKenney
8ff37290d6 rcu: Add *_ONCE() for grace-period progress indicators
The various RCU structures' ->gp_seq, ->gp_seq_needed, ->gp_req_activity,
and ->gp_activity fields are read locklessly, so they must be updated with
WRITE_ONCE() and, when read locklessly, with READ_ONCE().  This commit makes
these changes.

This data race was reported by KCSAN.  Not appropriate for backporting
due to failure being unlikely.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-02-20 15:58:22 -08:00
Paul E. McKenney
bfeebe2421 rcu: Add READ_ONCE() to rcu_segcblist ->tails[]
The rcu_segcblist structure's ->tails[] array entries are read
locklessly, so this commit adds the READ_ONCE() to a load in order to
avoid destructive compiler optimizations.

This data race was reported by KCSAN.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-02-20 15:58:22 -08:00
Paul E. McKenney
105abf82b0 rcu: Add WRITE_ONCE() to rcu_node ->qsmaskinitnext
The rcu_node structure's ->qsmaskinitnext field is read locklessly,
so this commit adds the WRITE_ONCE() to an update in order to provide
proper documentation and READ_ONCE()/WRITE_ONCE() pairing.

This data race was reported by KCSAN.  Not appropriate for backporting
due to failure being unlikely for systems not doing incessant CPU-hotplug
operations.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-02-20 15:58:22 -08:00
Paul E. McKenney
2906d2154c rcu: Add WRITE_ONCE() to rcu_state ->gp_req_activity
The rcu_state structure's ->gp_req_activity field is read locklessly,
so this commit adds the WRITE_ONCE() to an update in order to provide
proper documentation and READ_ONCE()/WRITE_ONCE() pairing.

This data race was reported by KCSAN.  Not appropriate for backporting
due to failure being unlikely.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-02-20 15:58:22 -08:00
Paul E. McKenney
0937d04573 rcu: Add READ_ONCE() to rcu_node ->gp_seq
The rcu_node structure's ->gp_seq field is read locklessly, so this
commit adds the READ_ONCE() to several loads in order to avoid
destructive compiler optimizations.

This data race was reported by KCSAN.  Not appropriate for backporting
because this affects only tracing and warnings.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-02-20 15:58:22 -08:00
Paul E. McKenney
b0c18c8773 rcu: Add WRITE_ONCE to rcu_node ->exp_seq_rq store
The rcu_node structure's ->exp_seq_rq field is read locklessly, so
this commit adds the WRITE_ONCE() to an update in order to provide proper
documentation and READ_ONCE()/WRITE_ONCE() pairing.

This data race was reported by KCSAN.  Not appropriate for backporting
due to failure being unlikely.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-02-20 15:58:22 -08:00
Paul E. McKenney
7672d647dd rcu: Add WRITE_ONCE() to rcu_node ->qsmask update
The rcu_node structure's ->qsmask field is read locklessly, so this
commit adds the WRITE_ONCE() to an update in order to provide proper
documentation and READ_ONCE()/WRITE_ONCE() pairing.

This data race was reported by KCSAN.  Not appropriate for backporting
due to failure being unlikely.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-02-20 15:58:22 -08:00
Paul E. McKenney
8a7e8f5171 rcu: Provide debug symbols and line numbers in KCSAN runs
This commit adds "-g -fno-omit-frame-pointer" to ease interpretation
of KCSAN output, but only for CONFIG_KCSAN=y kernels.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-02-20 15:58:21 -08:00
Paul E. McKenney
24bb9eccf7 rcu: Fix exp_funnel_lock()/rcu_exp_wait_wake() datarace
The rcu_node structure's ->exp_seq_rq field is accessed locklessly, so
updates must use WRITE_ONCE().  This commit therefore adds the needed
WRITE_ONCE() invocation where it was missed.

This data race was reported by KCSAN.  Not appropriate for backporting
due to failure being unlikely.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-02-20 15:58:21 -08:00
Paul E. McKenney
82dd8419e2 rcu: Warn on for_each_leaf_node_cpu_mask() from non-leaf
The for_each_leaf_node_cpu_mask() and for_each_leaf_node_possible_cpu()
macros must be invoked only on leaf rcu_node structures.  Failing to
abide by this restriction can result in infinite loops on systems with
more than 64 CPUs (or for more than 32 CPUs on 32-bit systems).  This
commit therefore adds WARN_ON_ONCE() calls to make misuse of these two
macros easier to debug.

Reported-by: Qian Cai <cai@lca.pw>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-02-20 15:58:21 -08:00
Paul E. McKenney
59d8cc6b2e rcu: Forgive slow expedited grace periods at boot time
Boot-time processing often loops in the kernel longer than one might
prefer, which can prevent expedited grace periods from completing in
a timely manner.  This in turn triggers a splat on nohz_full CPUs.  One
could argue that long-looping code should be fixed, but on the other hand,
boot time is a bit special.

This commit therefore removes the splat.  Later commits will add the
splat back in, but in a way that removes false positives.

Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-01-25 12:00:40 -08:00
Paul E. McKenney
0e247386d9 Merge branches 'doc.2019.12.10a', 'exp.2019.12.09a', 'fixes.2020.01.24a', 'kfree_rcu.2020.01.24a', 'list.2020.01.10a', 'preempt.2020.01.24a' and 'torture.2019.12.09a' into HEAD
doc.2019.12.10a: Documentations updates
exp.2019.12.09a: Expedited grace-period updates
fixes.2020.01.24a: Miscellaneous fixes
kfree_rcu.2020.01.24a: Batch kfree_rcu() work
list.2020.01.10a: RCU-protected-list updates
preempt.2020.01.24a: Preemptible RCU updates
torture.2019.12.09a: Torture-test updates
2020-01-24 10:37:27 -08:00
Paul E. McKenney
f6105fc2a9 rcu: Remove unused stop-machine #include
Long ago, RCU used the stop-machine mechanism to implement expedited
grace periods, but no longer does so.  This commit therefore removes
the no-longer-needed #includes of linux/stop_machine.h.

Link: https://lwn.net/Articles/805317/
Reported-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-01-24 10:33:52 -08:00
Paul E. McKenney
844a378de3 srcu: Apply *_ONCE() to ->srcu_last_gp_end
The ->srcu_last_gp_end field is accessed from any CPU at any time
by synchronize_srcu(), so non-initialization references need to use
READ_ONCE() and WRITE_ONCE().  This commit therefore makes that change.

Reported-by: syzbot+08f3e9d26e5541e1ecf2@syzkaller.appspotmail.com
Acked-by: Marco Elver <elver@google.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-01-24 10:33:51 -08:00
Paul E. McKenney
7441e7661d rcu: Switch force_qs_rnp() to for_each_leaf_node_cpu_mask()
Currently, force_qs_rnp() uses a for_each_leaf_node_possible_cpu()
loop containing a check of the current CPU's bit in ->qsmask.
This works, but this commit saves three lines by instead using
for_each_leaf_node_cpu_mask(), which combines the functionality of
for_each_leaf_node_possible_cpu() and leaf_node_cpu_bit().  This commit
also replaces the use of the local variable "bit" with rdp->grpmask.
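
Roughly, the conversion looks like this sketch (illustrative, not the
exact diff):

	/* Before: iterate all possible CPUs and test ->qsmask by hand. */
	for_each_leaf_node_possible_cpu(rnp, cpu) {
		unsigned long bit = leaf_node_cpu_bit(rnp, cpu);

		if (!(rnp->qsmask & bit))
			continue;
		/* ...process this CPU's rcu_data structure... */
	}

	/* After: let the iterator apply the mask, using rdp->grpmask
	 * wherever "bit" was previously used. */
	for_each_leaf_node_cpu_mask(rnp, cpu, rnp->qsmask) {
		struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
		/* ...process rdp... */
	}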

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-01-24 10:33:51 -08:00
Ben Dooks
e1350e8e0e rcu: Move rcu_{expedited,normal} definitions into rcupdate.h
This commit moves the rcu_{expedited,normal} definitions from
kernel/rcu/update.c to include/linux/rcupdate.h to make sure they are
in sync, and also to avoid the following warning from sparse:

kernel/ksysfs.c:150:5: warning: symbol 'rcu_expedited' was not declared. Should it be static?
kernel/ksysfs.c:167:5: warning: symbol 'rcu_normal' was not declared. Should it be static?

Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-01-24 10:33:50 -08:00
Lai Jiangshan
e2167b38c8 rcu: Move gp_state_names[] and gp_state_getname() to tree_stall.h
Only tree_stall.h needs to get name from GP state, so this commit
moves the gp_state_names[] array and the gp_state_getname()
from kernel/rcu/tree.h and kernel/rcu/tree.c, respectively, to
kernel/rcu/tree_stall.h.  While moving gp_state_names[], this commit
uses the GCC syntax to ensure that the right string is associated with
the right CPP macro.
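
The designated-initializer form keeps each string tied to its macro
regardless of ordering, roughly as in this sketch (only a few states
shown):

	static const char * const gp_state_names[] = {
		[RCU_GP_IDLE] = "RCU_GP_IDLE",
		[RCU_GP_WAIT_GPS] = "RCU_GP_WAIT_GPS",
		[RCU_GP_DONE_GPS] = "RCU_GP_DONE_GPS",
		/* ...remaining grace-period states omitted for brevity... */
	};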

Signed-off-by: Lai Jiangshan <jiangshanlai@gmail.com>
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-01-24 10:33:45 -08:00
Lai Jiangshan
4778339df0 rcu: Remove the declaration of call_rcu() in tree.h
The call_rcu() function is an external RCU API that is declared in
include/linux/rcupdate.h.  There is thus no point in redeclaring it
in kernel/rcu/tree.h, so this commit removes that redundant declaration.

Signed-off-by: Lai Jiangshan <jiangshanlai@gmail.com>
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-01-24 10:33:38 -08:00
Lai Jiangshan
2488a5e695 rcu: Fix tracepoint tracking RCU CPU kthread utilization
In the call to trace_rcu_utilization() at the start of the loop in
rcu_cpu_kthread(), "rcu_wait" is incorrect, plus this trace event needs
to be hoisted above the loop to balance with either the "rcu_wait" or
"rcu_yield", depending on how the loop exits.  This commit therefore
makes these changes.

Signed-off-by: Lai Jiangshan <jiangshanlai@gmail.com>
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-01-24 10:33:31 -08:00
Lai Jiangshan
822175e729 rcu: Fix harmless omission of "CONFIG_" from #if condition
The C preprocessor macros SRCU and TINY_RCU should instead be CONFIG_SRCU
and CONFIG_TINY_RCU, respectively, in the #if in kernel/rcu/rcu.h.  But
there is no harm when "TINY_RCU" is wrongly used: it is never defined,
which makes "!defined(TINY_RCU)" always true, which means the code block
is always included, and the included code block does not cause any
compilation error so far in CONFIG_TINY_RCU builds.  This is also the
reason this change should not be taken in -stable.

This commit adds the needed "CONFIG_" prefix to both macros.

Not for -stable.

Signed-off-by: Lai Jiangshan <jiangshanlai@gmail.com>
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-01-24 10:33:13 -08:00
Paul E. McKenney
5b14557b07 rcu: Avoid tick_dep_set_cpu() misordering
In the current code, rcu_nmi_enter_common() might decide to turn on
the tick using tick_dep_set_cpu(), but be delayed just before doing so.
Then the grace-period kthread might notice that the CPU in question had
in fact gone through a quiescent state, thus turning off the tick using
tick_dep_clear_cpu().  The later invocation of tick_dep_set_cpu() would
then incorrectly leave the tick on.

This commit therefore enlists the aid of the leaf rcu_node structure's
->lock to ensure that decisions to enable or disable the tick are
carried out before they can be reversed.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-01-24 10:27:33 -08:00
Lai Jiangshan
77339e61aa rcu: Provide wrappers for uses of ->rcu_read_lock_nesting
This commit provides wrapper functions for uses of ->rcu_read_lock_nesting
to improve readability and to ease future changes to support inlining
of __rcu_read_lock() and __rcu_read_unlock().
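
As a sketch of the idea (the function names here are illustrative, not
necessarily those used in the kernel), such wrappers centralize the
field accesses:

	static void rcu_preempt_read_enter(void)
	{
		current->rcu_read_lock_nesting++;
	}

	static void rcu_preempt_read_exit(void)
	{
		current->rcu_read_lock_nesting--;
	}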

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-01-24 10:27:33 -08:00
Paul E. McKenney
c51f83c315 rcu: Use READ_ONCE() for ->expmask in rcu_read_unlock_special()
The rcu_node structure's ->expmask field is updated only when holding the
->lock, but is also accessed locklessly.  This means that all ->expmask
updates must use WRITE_ONCE() and all reads carried out without holding
->lock must use READ_ONCE().  This commit therefore changes the lockless
->expmask read in rcu_read_unlock_special() to use READ_ONCE().

Reported-by: syzbot+99f4ddade3c22ab0cf23@syzkaller.appspotmail.com
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Acked-by: Marco Elver <elver@google.com>
2020-01-24 10:27:33 -08:00
Lai Jiangshan
3717e1e9f2 rcu: Clear ->rcu_read_unlock_special only once
In rcu_preempt_deferred_qs_irqrestore(), ->rcu_read_unlock_special is
cleared one piece at a time.  Given that the "if" statements in this
function use the copy in "special", this commit removes the clearing
of the individual pieces in favor of clearing ->rcu_read_unlock_special
in one go just after it has been determined to be non-zero.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-01-24 10:27:33 -08:00
Lai Jiangshan
2eeba5838f rcu: Clear .exp_hint only when deferred quiescent state has been reported
Currently, the .exp_hint flag is cleared in rcu_read_unlock_special(),
which works, but which can also prevent subsequent rcu_read_unlock() calls
from helping expedite the quiescent state needed by an ongoing expedited
RCU grace period.  This commit therefore defers clearing of .exp_hint
from rcu_read_unlock_special() to rcu_preempt_deferred_qs_irqrestore(),
thus ensuring that intervening calls to rcu_read_unlock() have a chance
to help end the expedited grace period.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-01-24 10:27:33 -08:00
Lai Jiangshan
c130d2dc93 rcu: Rename some instance of CONFIG_PREEMPTION to CONFIG_PREEMPT_RCU
CONFIG_PREEMPTION and CONFIG_PREEMPT_RCU are always identical,
but some code depends on CONFIG_PREEMPTION to access
rcu_preempt functionality. This patch changes CONFIG_PREEMPTION
to CONFIG_PREEMPT_RCU in these cases.

Signed-off-by: Lai Jiangshan <jiangshanlai@gmail.com>
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-01-24 10:26:28 -08:00
Joel Fernandes (Google)
189a6883dc rcu: Remove kfree_call_rcu_nobatch()
Now that the kfree_rcu() special-casing has been removed from tree RCU,
this commit removes kfree_call_rcu_nobatch() since it is no longer needed.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-01-24 10:24:31 -08:00
Joel Fernandes (Google)
77a40f9703 rcu: Remove kfree_rcu() special casing and lazy-callback handling
This commit removes kfree_rcu() special-casing and the lazy-callback
handling from Tree RCU.  It moves some of this special casing to Tiny RCU,
the removal of which will be the subject of later commits.

This results in a nice negative delta.

Suggested-by: Paul E. McKenney <paulmck@linux.ibm.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
[ paulmck: Add slab.h #include, thanks to kbuild test robot <lkp@intel.com>. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-01-24 10:24:31 -08:00
Joel Fernandes (Google)
e99637becb rcu: Add support for debug_objects debugging for kfree_rcu()
This commit applies RCU's debug_objects debugging to the new batched
kfree_rcu() implementations.  The object is queued at the kfree_rcu()
call and dequeued during reclaim.

Tested that enabling CONFIG_DEBUG_OBJECTS_RCU_HEAD successfully detects
double kfree_rcu() calls.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
[ paulmck: Fix IRQ per kbuild test robot <lkp@intel.com> feedback. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-01-24 10:24:31 -08:00
Joel Fernandes (Google)
0392bebebf rcu: Add multiple in-flight batches of kfree_rcu() work
During testing, it was observed that the amount of memory consumed due
to kfree_rcu() batching is 300-400MB. Previously we had only a single
head_free pointer pointing to the list of rcu_head(s) that are to be
freed after a grace period. Until this list is drained, we cannot queue
any more objects on it since such objects may not be ready to be
reclaimed when the worker thread eventually gets to draining the
head_free list.

We can do better by maintaining multiple lists as done by this patch.
Testing shows that memory consumption came down by around 100-150MB with
just adding another list. Adding more than 1 additional list did not
show any improvement.

Suggested-by: Paul E. McKenney <paulmck@linux.ibm.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
[ paulmck: Code style and initialization handling. ]
[ paulmck: Fix field name, reported by kbuild test robot <lkp@intel.com>. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-01-24 10:24:31 -08:00
Joel Fernandes
569d767087 rcu: Make kfree_rcu() use a non-atomic ->monitor_todo
Because the ->monitor_todo field is always protected by krcp->lock,
this commit downgrades from xchg() to non-atomic unmarked assignment
statements.

Signed-off-by: Joel Fernandes <joel@joelfernandes.org>
[ paulmck: Update to include early-boot kick code. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-01-24 10:24:31 -08:00
Joel Fernandes (Google)
e6e78b004f rcuperf: Add kfree_rcu() performance Tests
This test runs kfree_rcu() in a loop to measure performance of the new
kfree_rcu() batching functionality.

The following table shows results when booting with arguments:
rcuperf.kfree_loops=20000 rcuperf.kfree_alloc_num=8000
rcuperf.kfree_rcu_test=1 rcuperf.kfree_no_batch=X

rcuperf.kfree_no_batch=X    # Grace Periods	Test Duration (s)
  X=1 (old behavior)              9133                 11.5
  X=0 (new behavior)              1732                 12.5

On a 16 CPU system with the above boot parameters, we see that the total
number of grace periods that elapse during the test drops from 9133 when
not batching to 1732 when batching (a 5X improvement). The kfree_rcu()
flood itself slows down a bit when batching, though, as shown.

Note that the active memory consumption during the kfree_rcu() flood
does increase to around 200-250MB due to the batching (from around 50MB
without batching). However, this memory consumption is relatively
constant. In other words, the system is able to keep up with the
kfree_rcu() load. The memory consumption comes down considerably if
KFREE_DRAIN_JIFFIES is increased from HZ/50 to HZ/80. A later patch will
reduce memory consumption further by using multiple lists.

Also, when running the test, please disable CONFIG_DEBUG_PREEMPT and
CONFIG_PROVE_RCU for realistic comparisons with/without batching.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-01-24 10:24:31 -08:00
Byungchul Park
a35d16905e rcu: Add basic support for kfree_rcu() batching
Recently a discussion about the stability and performance of a system
involving a high rate of kfree_rcu() calls surfaced on the list [1],
which led to another discussion about how to prepare for this situation.

This patch adds basic batching support for kfree_rcu(). It is "basic"
because we do none of the slab management, dynamic allocation, code
moving or any of the other things, some of which previous attempts did
[2]. These fancier improvements can be follow-up patches and there are
different ideas being discussed in those regards. This is an effort to
start simple, and build up from there. In the future, an extension to
use kfree_bulk and possibly per-slab batching could be done to further
improve performance due to cache-locality and slab-specific bulk free
optimizations. By using an array of pointers, the worker thread
processing the work would need to read less data since it no longer
needs to deal with large rcu_head structures.

Torture tests follow in the next patch and show approximately a 5x
reduction in the number of grace periods on a 16 CPU system. More
details and test data are in that patch.

This patch has an implication for rcu_barrier(). Because the
kfree_rcu() calls can be batched and might not yet have been handed to
the RCU machinery (in fact, the monitor may not even have run yet to do
the queue_rcu_work()), there seems to be no easy way of implementing
rcu_barrier() so that it waits for kfree_rcu()s that have already been
made. So this means a kfree_rcu() followed by an rcu_barrier() does not
imply that memory will be freed once rcu_barrier() returns.

Another implication is higher active memory usage (although not
runaway usage) until the kfree_rcu() flooding ends, in comparison to
the situation without batching. More details about this are in the
second patch, which adds an rcuperf test.

Finally, in the near future we will get rid of kfree_rcu() special casing
within RCU, such as in rcu_do_batch, and switch everything to just
batching. Currently we don't do that because the timer subsystem is not
yet up during early boot, and we cannot schedule the kfree_rcu() monitor
while the timer subsystem's locks are not yet initialized. That would
also mean getting rid of kfree_call_rcu_nobatch() entirely.

[1] http://lore.kernel.org/lkml/20190723035725-mutt-send-email-mst@kernel.org
[2] https://lkml.org/lkml/2017/12/19/824

Cc: kernel-team@android.com
Cc: kernel-team@lge.com
Co-developed-by: Byungchul Park <byungchul.park@lge.com>
Signed-off-by: Byungchul Park <byungchul.park@lge.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
[ paulmck: Applied 0day and Paul Walmsley feedback on ->monitor_todo. ]
[ paulmck: Make it work during early boot. ]
[ paulmck: Add a crude early boot self-test. ]
[ paulmck: Style adjustments and experimental docbook structure header. ]
Link: https://lore.kernel.org/lkml/alpine.DEB.2.21.9999.1908161931110.32497@viisi.sifive.com/T/#me9956f66cb611b95d26ae92700e1d901f46e8c59
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-01-24 10:17:03 -08:00
Paul E. McKenney
c30fe54189 rcu: Mark non-global functions and variables as static
Each of rcu_state, rcu_rnp_online_cpus(), rcu_dynticks_curr_cpu_in_eqs(),
and rcu_dynticks_snap() are used only in the kernel/rcu/tree.o translation
unit, and may thus be marked static.  This commit therefore makes this
change.

Reported-by: Ben Dooks <ben.dooks@codethink.co.uk>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
2019-12-12 10:24:52 -08:00
Paul E. McKenney
5155be9994 rcutorture: Dynamically allocate rcu_fwds structure
This commit switches from static structure to dynamic allocation
for rcu_fwds as another step towards providing multiple call_rcu()
forward-progress kthreads.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2019-12-09 13:00:29 -08:00
Paul E. McKenney
6764100bd2 rcutorture: Complete threading rcu_fwd pointers through functions
This commit threads pointers to rcu_fwd structures through the remaining
functions using rcu_fwds directly, namely rcu_torture_fwd_prog_cbfree(),
rcutorture_oom_notify() and rcu_torture_fwd_prog_init().

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2019-12-09 13:00:28 -08:00
Paul E. McKenney
7beba0c06b rcutorture: Move to dynamic initialization of rcu_fwds
In order to add multiple call_rcu() forward-progress kthreads, it will
be necessary to dynamically allocate and initialize the rcu_fwds
structures.  This commit therefore moves the initialization from
compile time to instead
immediately precede thread-creation time.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2019-12-09 13:00:28 -08:00
Paul E. McKenney
6b1b832546 rcutorture: Thread rcu_fwd pointer through forward-progress functions
In order to add multiple kthreads, it will be necessary to allow
the various functions to operate on a pointer to their kthread's
rcu_fwd structure.  This commit therefore starts the process of
adding the needed "struct rcu_fwd" parameters and arguments to the
various callback forward-progress functions.

Note that rcutorture_oom_notify() and rcu_torture_fwd_cb_hist() will
eventually need to iterate over all kthreads' rcu_fwd structures.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2019-12-09 13:00:28 -08:00
Paul E. McKenney
a289e608b3 rcutorture: Pull callback forward-progress data into rcu_fwd struct
Now that RCU behaves reasonably well with the current single-kthread
call_rcu() forward-progress testing, it is time to add more kthreads.
This commit takes a first step towards that goal by wrapping what
will be the per-kthread data into a new rcu_fwd structure.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2019-12-09 13:00:27 -08:00
Sebastian Andrzej Siewior
90326f0521 rcu: Use CONFIG_PREEMPTION where appropriate
The config option `CONFIG_PREEMPT' is used for the preemption model
"Low-Latency Desktop". The config option `CONFIG_PREEMPTION' is enabled
when kernel preemption is enabled, which is true for the preemption
models `CONFIG_PREEMPT' and `CONFIG_PREEMPT_RT'.

Use `CONFIG_PREEMPTION' if it applies to both preemption models and not
just to `CONFIG_PREEMPT'.

Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: rcu@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2019-12-09 12:37:51 -08:00
Lai Jiangshan
b3e627d3d5 rcu: Make PREEMPT_RCU be a modifier to TREE_RCU
Currently PREEMPT_RCU and TREE_RCU are mutually exclusive Kconfig
options.  But PREEMPT_RCU actually specifies a kind of TREE_RCU,
namely a preemptible TREE_RCU. This commit therefore makes PREEMPT_RCU
be a modifier to the TREE_RCU Kconfig option.  This has the benefit of
simplifying several of the #if expressions that formerly needed to
check both, but now need only check one or the other.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Signed-off-by: Lai Jiangshan <jiangshanlai@gmail.com>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2019-12-09 12:37:51 -08:00
Paul E. McKenney
03bd2983d7 rcu: Use lockdep rather than comment to enforce lock held
The rcu_preempt_check_blocked_tasks() function has a comment
that states that the rcu_node structure's ->lock must be held,
which might be informative, but which carries little weight if
not read.  This commit therefore removes this comment in favor of
raw_lockdep_assert_held_rcu_node(), which will complain quite
visibly if the required lock is not held.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2019-12-09 12:37:50 -08:00
Eric Dumazet
6935c3983b rcu: Avoid data-race in rcu_gp_fqs_check_wake()
The rcu_gp_fqs_check_wake() function uses rcu_preempt_blocked_readers_cgp()
to read ->gp_tasks while other cpus might overwrite this field.

We need READ_ONCE()/WRITE_ONCE() pairs to avoid compiler
tricks and KCSAN splats like the following :

BUG: KCSAN: data-race in rcu_gp_fqs_check_wake / rcu_preempt_deferred_qs_irqrestore

write to 0xffffffff85a7f190 of 8 bytes by task 7317 on cpu 0:
 rcu_preempt_deferred_qs_irqrestore+0x43d/0x580 kernel/rcu/tree_plugin.h:507
 rcu_read_unlock_special+0xec/0x370 kernel/rcu/tree_plugin.h:659
 __rcu_read_unlock+0xcf/0xe0 kernel/rcu/tree_plugin.h:394
 rcu_read_unlock include/linux/rcupdate.h:645 [inline]
 __ip_queue_xmit+0x3b0/0xa40 net/ipv4/ip_output.c:533
 ip_queue_xmit+0x45/0x60 include/net/ip.h:236
 __tcp_transmit_skb+0xdeb/0x1cd0 net/ipv4/tcp_output.c:1158
 __tcp_send_ack+0x246/0x300 net/ipv4/tcp_output.c:3685
 tcp_send_ack+0x34/0x40 net/ipv4/tcp_output.c:3691
 tcp_cleanup_rbuf+0x130/0x360 net/ipv4/tcp.c:1575
 tcp_recvmsg+0x633/0x1a30 net/ipv4/tcp.c:2179
 inet_recvmsg+0xbb/0x250 net/ipv4/af_inet.c:838
 sock_recvmsg_nosec net/socket.c:871 [inline]
 sock_recvmsg net/socket.c:889 [inline]
 sock_recvmsg+0x92/0xb0 net/socket.c:885
 sock_read_iter+0x15f/0x1e0 net/socket.c:967
 call_read_iter include/linux/fs.h:1864 [inline]
 new_sync_read+0x389/0x4f0 fs/read_write.c:414

read to 0xffffffff85a7f190 of 8 bytes by task 10 on cpu 1:
 rcu_gp_fqs_check_wake kernel/rcu/tree.c:1556 [inline]
 rcu_gp_fqs_check_wake+0x93/0xd0 kernel/rcu/tree.c:1546
 rcu_gp_fqs_loop+0x36c/0x580 kernel/rcu/tree.c:1611
 rcu_gp_kthread+0x143/0x220 kernel/rcu/tree.c:1768
 kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 10 Comm: rcu_preempt Not tainted 5.3.0+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
[ paulmck:  Added another READ_ONCE() for RCU CPU stall warnings. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2019-12-09 12:37:50 -08:00
Stefan Reiter
610dea36d3 rcu/nocb: Fix dump_tree hierarchy print always active
Commit 18cd8c93e6 ("rcu/nocb: Print gp/cb kthread hierarchy if
dump_tree") added print statements to rcu_organize_nocb_kthreads for
debugging, but incorrectly guarded them, causing the function to always
spew out its message.

This patch fixes it by guarding both pr_alert statements with dump_tree,
while also changing the second pr_alert to a pr_cont, to print the
hierarchy in a single line (assuming that's how it was supposed to
work).

Fixes: 18cd8c93e6 ("rcu/nocb: Print gp/cb kthread hierarchy if dump_tree")
Signed-off-by: Stefan Reiter <stefan@pimaker.at>
[ paulmck: Make single-nocbs-CPU GP kthreads look less erroneous. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2019-12-09 12:37:50 -08:00
Paul E. McKenney
df1e849ae4 rcu: Enable tick for nohz_full CPUs slow to provide expedited QS
An expedited grace period can be stalled by a nohz_full CPU looping
in kernel context.  This possibility is currently handled by some
carefully crafted checks in rcu_read_unlock_special() that enlist help
from ksoftirqd when permitted by the scheduler.  However, it is exactly
these checks that require the scheduler avoid holding any of its rq or
pi locks across rcu_read_unlock() without also having held them across
the entire RCU read-side critical section.

It would therefore be very nice if expedited grace periods could
handle nohz_full CPUs looping in kernel context without such checks.
This commit therefore adds code to the expedited grace period's wait
and cleanup code that forces the scheduler-clock interrupt on for CPUs
that fail to quickly supply a quiescent state.  "Quickly" is currently
a hard-coded single-jiffy delay.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2019-12-09 12:32:59 -08:00
Paul E. McKenney
28f0361fdf rcu: Replace synchronize_sched_expedited_wait() "_sched" with "_rcu"
After RCU flavor consolidation, synchronize_sched_expedited_wait() does
both RCU-preempt and RCU-sched, whichever happens to have been built into
the running kernel.  This commit therefore changes this function's name
to synchronize_rcu_expedited_wait() to reflect its new generic nature.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2019-12-09 12:24:59 -08:00
Paul E. McKenney
de8cd0a533 rcu: Update tree_exp.h function-header comments
The function-header comments in kernel/rcu/tree_exp.h have gotten a bit
out of date, so this commit updates a number of them.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2019-12-09 12:24:58 -08:00
Paul E. McKenney
6c7d7dbf5b rcu: Rename sync_rcu_preempt_exp_done() to sync_rcu_exp_done()
Now that the RCU flavors have been consolidated, there is one common
function for checking to see if an expedited RCU grace period has
completed, namely sync_rcu_preempt_exp_done().  Because this function is
no longer specific to RCU-preempt, this commit removes the "_preempt" from
its name.  This commit also changes sync_rcu_preempt_exp_done_unlocked()
to sync_rcu_exp_done_unlocked() for the same reason.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2019-12-09 12:24:58 -08:00
Neeraj Upadhyay
4bc6b745e5 rcu: Allow only one expedited GP to run concurrently with wakeups
The current expedited RCU grace-period code expects that a task
requesting an expedited grace period cannot awaken until that grace
period has reached the wakeup phase.  However, it is possible for a long
preemption to result in the waiting task never sleeping.  For example,
consider the following sequence of events:

1.	Task A starts an expedited grace period by invoking
	synchronize_rcu_expedited().  It proceeds normally up to the
	wait_event() near the end of that function, and is then preempted
	(or interrupted or whatever).

2.	The expedited grace period completes, and a kworker task starts
	the awaken phase, having incremented the counter and acquired
	the rcu_state structure's .exp_wake_mutex.  This kworker task
	is then preempted or interrupted or whatever.

3.	Task A resumes and enters wait_event(), which notes that the
	expedited grace period has completed, and thus doesn't sleep.

4.	Task B starts an expedited grace period exactly as did Task A,
	complete with the preemption (or whatever delay) just before
	the call to wait_event().

5.	The expedited grace period completes, and another kworker
	task starts the awaken phase, having incremented the counter.
	However, it blocks when attempting to acquire the rcu_state
	structure's .exp_wake_mutex because step 2's kworker task has
	not yet released it.

6.	Steps 4 and 5 repeat, resulting in overflow of the rcu_node
	structure's ->exp_wq[] array.

In theory, this is harmless.  Tasks waiting on the various ->exp_wq[]
array entries will just be spuriously awakened, but they will sleep again
on noting that the rcu_state structure's ->expedited_sequence value has
not advanced far enough.

In practice, this wastes CPU time and is an accident waiting to happen.
This commit therefore moves the rcu_exp_gp_seq_end() call that officially
ends the expedited grace period (along with the associated tracing) until
after the ->exp_wake_mutex has been acquired.  This prevents Task A from
awakening prematurely, thus preventing more than one expedited grace
period from being in flight during a previous expedited grace period's
wakeup phase.

Fixes: 3b5f668e71 ("rcu: Overlap wakeups with next expedited grace period")
Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org>
[ paulmck: Added updated comment. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2019-12-09 12:24:57 -08:00
Neeraj Upadhyay
fd6bc19d76 rcu: Fix missed wakeup of exp_wq waiters
Tasks waiting within exp_funnel_lock() for an expedited grace period to
elapse can be starved due to the following sequence of events:

1.	Tasks A and B both attempt to start an expedited grace
	period at about the same time.	This grace period will have
	completed when the lower four bits of the rcu_state structure's
	->expedited_sequence field are 0b'0100', for example, when the
	initial value of this counter is zero.	Task A wins, and thus
	does the actual work of starting the grace period, including
	acquiring the rcu_state structure's .exp_mutex and sets the
	counter to 0b'0001'.

2.	Because task B lost the race to start the grace period, it
	waits on ->expedited_sequence to reach 0b'0100' inside of
	exp_funnel_lock(). This task therefore blocks on the rcu_node
	structure's ->exp_wq[1] field, keeping in mind that the
	end-of-grace-period value of ->expedited_sequence (0b'0100')
	is shifted down two bits before indexing the ->exp_wq[] field.

3.	Task C attempts to start another expedited grace period,
	but blocks on ->exp_mutex, which is still held by Task A.

4.	The aforementioned expedited grace period completes, so that
	->expedited_sequence now has the value 0b'0100'.  A kworker task
	therefore acquires the rcu_state structure's ->exp_wake_mutex
	and starts awakening any tasks waiting for this grace period.

5.	One of the first tasks awakened happens to be Task A.  Task A
	therefore releases the rcu_state structure's ->exp_mutex,
	which allows Task C to start the next expedited grace period,
	which causes the lower four bits of the rcu_state structure's
	->expedited_sequence field to become 0b'0101'.

6.	Task C's expedited grace period completes, so that the lower four
	bits of the rcu_state structure's ->expedited_sequence field now
	become 0b'1000'.

7.	The kworker task from step 4 above continues its wakeups.
	Unfortunately, the wake_up_all() refetches the rcu_state
	structure's .expedited_sequence field:

	wake_up_all(&rnp->exp_wq[rcu_seq_ctr(rcu_state.expedited_sequence) & 0x3]);

	This results in the wakeup being applied to the rcu_node
	structure's ->exp_wq[2] field, which is unfortunate given that
	Task B is instead waiting on ->exp_wq[1].

On a busy system, no harm is done (or at least no permanent harm is done).
Some later expedited grace period will redo the wakeup.  But on a quiet
system, such as many embedded systems, it might be a good long time before
there was another expedited grace period.  On such embedded systems,
this situation could therefore result in a system hang.

This issue manifested as DPM device timeout during suspend (which
usually qualifies as a quiet time) due to a SCSI device being stuck in
_synchronize_rcu_expedited(), with the following stack trace:

	schedule()
	synchronize_rcu_expedited()
	synchronize_rcu()
	scsi_device_quiesce()
	scsi_bus_suspend()
	dpm_run_callback()
	__device_suspend()

This commit therefore prevents such delays, timeouts, and hangs by
making rcu_exp_wait_wake() use its "s" argument consistently instead of
refetching from rcu_state.expedited_sequence.
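
Sketched, the fix amounts to using the snapshot rather than refetching
(illustrative):

	/* Use the "s" snapshot taken for this grace period, not a
	 * fresh read of rcu_state.expedited_sequence. */
	wake_up_all(&rnp->exp_wq[rcu_seq_ctr(s) & 0x3]);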

Fixes: 3b5f668e71 ("rcu: Overlap wakeups with next expedited grace period")
Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2019-12-09 12:24:57 -08:00
Paul E. McKenney
aca2991a25 rcu: Substitute lookup for bit-twiddling in sync_rcu_exp_select_node_cpus()
The code in sync_rcu_exp_select_node_cpus() calculates the current
CPU's mask within its rcu_node structure's bitmasks, but this has
already been computed in the ->grpmask field of that CPU's rcu_data
structure.  This commit therefore just uses this ->grpmask field.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2019-12-09 12:24:57 -08:00
Marco Elver
6cf539a87a rcu: Fix data-race due to atomic_t copy-by-value
This fixes a data-race where `atomic_t dynticks` is copied by value. The
copy is performed non-atomically, resulting in a data-race if `dynticks`
is updated concurrently.

This data-race was found with KCSAN:
==================================================================
BUG: KCSAN: data-race in dyntick_save_progress_counter / rcu_irq_enter

write to 0xffff989dbdbe98e0 of 4 bytes by task 10 on cpu 3:
 atomic_add_return include/asm-generic/atomic-instrumented.h:78 [inline]
 rcu_dynticks_snap kernel/rcu/tree.c:310 [inline]
 dyntick_save_progress_counter+0x43/0x1b0 kernel/rcu/tree.c:984
 force_qs_rnp+0x183/0x200 kernel/rcu/tree.c:2286
 rcu_gp_fqs kernel/rcu/tree.c:1601 [inline]
 rcu_gp_fqs_loop+0x71/0x880 kernel/rcu/tree.c:1653
 rcu_gp_kthread+0x22c/0x3b0 kernel/rcu/tree.c:1799
 kthread+0x1b5/0x200 kernel/kthread.c:255
 <snip>

read to 0xffff989dbdbe98e0 of 4 bytes by task 154 on cpu 7:
 rcu_nmi_enter_common kernel/rcu/tree.c:828 [inline]
 rcu_irq_enter+0xda/0x240 kernel/rcu/tree.c:870
 irq_enter+0x5/0x50 kernel/softirq.c:347
 <snip>

Reported by Kernel Concurrency Sanitizer on:
CPU: 7 PID: 154 Comm: kworker/7:1H Not tainted 5.3.0+ #5
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
Workqueue: kblockd blk_mq_run_work_fn
==================================================================

Signed-off-by: Marco Elver <elver@google.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: rcu@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2019-12-09 12:24:56 -08:00
Boqun Feng
9f08cf0886 rcu: Avoid modifying mask_ofl_ipi in sync_rcu_exp_select_node_cpus()
The "mask_ofl_ipi" is used to track which CPUs get IPIed, however
in the IPI sending loop, "mask_ofl_ipi" along with another variable
"mask_ofl_test" might also get modified to record which CPUs' quiesent
states must be reported by the sync_rcu_exp_select_node_cpus() at
the end of sync_rcu_exp_select_node_cpus().  This overlap of roles
can be confusing, so this patch cleans things a little by using
"mask_ofl_ipi" solely for determining which CPUs must be IPIed  and
"mask_ofl_test" for solely determining on behalf of  which CPUs
sync_rcu_exp_select_node_cpus() must report a quiscent state.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Acked-by: Marco Elver <elver@google.com>
2019-12-09 12:24:56 -08:00
Paul E. McKenney
15c7c972cd rcu: Use *_ONCE() to protect lockless ->expmask accesses
The rcu_node structure's ->expmask field is accessed locklessly when
starting a new expedited grace period and when reporting an expedited
RCU CPU stall warning.  This commit therefore handles the former by
taking a snapshot of ->expmask while the lock is held and the latter
by applying READ_ONCE() to lockless reads and WRITE_ONCE() to the
corresponding updates.

Link: https://lore.kernel.org/lkml/CANpmjNNmSOagbTpffHr4=Yedckx9Rm2NuGqC9UqE+AOz5f1-ZQ@mail.gmail.com
Reported-by: syzbot+134336b86f728d6e55a0@syzkaller.appspotmail.com
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Acked-by: Marco Elver <elver@google.com>
2019-12-09 12:24:56 -08:00
Paul E. McKenney
8dcdfb7096 Merge branches 'doc.2019.10.29a', 'fixes.2019.10.30a', 'nohz.2019.10.28a', 'replace.2019.10.30a', 'torture.2019.10.05a' and 'lkmm.2019.10.05a' into HEAD
doc.2019.10.29a: RCU documentation updates.
fixes.2019.10.30a: RCU miscellaneous fixes.
nohz.2019.10.28a: RCU NO_HZ and NO_HZ_FULL updates.
replace.2019.10.30a: Replace rcu_swap_protected() with rcu_replace().
torture.2019.10.05a: RCU torture-test updates.

lkmm.2019.10.05a: Linux kernel memory model updates.
2019-10-30 08:47:13 -07:00
Paul E. McKenney
36b5dae645 rcu: Suppress levelspread uninitialized messages
New tools bring new warnings, and with v5.3 comes:

kernel/rcu/srcutree.c: warning: 'levelspread[<U aa0>]' may be used uninitialized in this function [-Wuninitialized]:  => 121:34

This commit suppresses this warning by initializing the full array
to INT_MIN, which will result in failures should any out-of-bounds
references appear.

Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
2019-10-30 08:34:53 -07:00
Dan Carpenter
b8889c9c89 rcu: Fix uninitialized variable in nocb_gp_wait()
We never set this to false.  This probably doesn't affect most people's
runtime because GCC will automatically initialize it to false at certain
common optimization levels.  But that behavior is related to a bug in
GCC and obviously should not be relied on.

Fixes: 5d6742b377 ("rcu/nocb: Use rcu_segcblist for no-CBs CPUs")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2019-10-30 08:34:53 -07:00
Joel Fernandes (Google)
05ef9e9eb3 rcu: Ensure that ->rcu_urgent_qs is set before resched IPI
The RCU-specific resched_cpu() function sends a resched IPI to the
specified CPU, which can be used to force the tick on for a given
nohz_full CPU.  This is needed when this nohz_full CPU is looping in the
kernel while blocking the current grace period.  However, for the tick
to actually be forced on in all cases, that CPU's rcu_data structure's
->rcu_urgent_qs flag must be set beforehand.  This commit therefore
causes rcu_implicit_dynticks_qs() to set this flag prior to invoking
resched_cpu() on a holdout nohz_full CPU.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2019-10-30 08:34:35 -07:00
kbuild test robot
1d24dd4e01 rcu: Several rcu_segcblist functions can be static
None of rcu_segcblist_set_len(), rcu_segcblist_add_len(), or
rcu_segcblist_xchg_len() are used outside of kernel/rcu/rcu_segcblist.c.
This commit therefore makes them static.

Fixes: eda669a6a2 ("rcu/nocb: Atomic ->len field in rcu_segcblist structure")
Signed-off-by: kbuild test robot <lkp@intel.com>
[ paulmck: "Fixes:" updated per Stephen Rothwell feedback. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2019-10-30 08:33:22 -07:00
Paul E. McKenney
dd7dafd1ad rcu: Make kernel-mode nohz_full CPUs invoke the RCU core processing
If a nohz_full CPU is idle or executing in userspace, it makes good sense
to keep it out of RCU core processing.  After all, the RCU grace-period
kthread can see its quiescent states and all of its callbacks are
offloaded, so there is nothing for RCU core processing to do.

However, if a nohz_full CPU is executing in kernel space, the RCU
grace-period kthread cannot do anything for it, so such a CPU must report
its own quiescent states.  This commit therefore makes nohz_full CPUs
skip RCU core processing only if the scheduler-clock interrupt caught
them in idle or in userspace.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2019-10-28 07:02:21 -07:00
Paul E. McKenney
ed93dfc6bc rcu: Confine ->core_needs_qs accesses to the corresponding CPU
Commit 671a63517c ("rcu: Avoid unnecessary softirq when system
is idle") fixed a bug that could result in an indefinite number of
unnecessary invocations of the RCU_SOFTIRQ handler at the trailing edge
of a scheduler-clock interrupt.  However, the fix introduced off-CPU
stores to ->core_needs_qs.  These writes did not conflict with the
on-CPU stores because the CPU's leaf rcu_node structure's ->lock was
held across all such stores.  However, the loads from ->core_needs_qs
were not promoted to READ_ONCE() and, worse yet, the code loading from
->core_needs_qs was written assuming that it was only ever updated by
the corresponding CPU.  So operation has been robust, but only by luck.
This situation is therefore an accident waiting to happen.

This commit therefore takes a different approach.  Instead of clearing
->core_needs_qs from the grace-period kthread's force-quiescent-state
processing, it modifies the rcu_pending() function to suppress the
rcu_sched_clock_irq() function's call to invoke_rcu_core() if there is no
grace period in progress.  This avoids the infinite needless RCU_SOFTIRQ
handlers while still keeping all accesses to ->core_needs_qs local to
the corresponding CPU.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2019-10-28 07:02:21 -07:00
Joel Fernandes (Google)
516e5ae0c9 rcu: Reset CPU hints when reporting a quiescent state
In some cases, tracing shows that need_heavy_qs is still set even though
urgent_qs was cleared upon reporting of a quiescent state.  One such
case is when the softirq reports that a CPU has passed quiescent state.

Commit 671a63517c ("rcu: Avoid unnecessary softirq when system is
idle") fixed a bug where core_needs_qs was not being cleared.  In order
to avoid running into similar situations with the urgent-grace-period
flags, this commit causes rcu_disable_urgency_upon_qs(), previously
rcu_disable_tick_upon_qs(), to clear the urgency hints, ->rcu_urgent_qs
and ->rcu_need_heavy_qs.  Note that it is possible for CPUs to go
offline with these urgency hints still set.  This is handled because
rcu_disable_urgency_upon_qs() is also invoked during the online process.

Because these hints can be cleared both by the corresponding CPU and by
the grace-period kthread, this commit also adds a number of READ_ONCE()
and WRITE_ONCE() calls.
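
Roughly, the renamed helper now looks like this (a sketch; the exact
upstream code may differ in detail):

	static void rcu_disable_urgency_upon_qs(struct rcu_data *rdp)
	{
		raw_lockdep_assert_held_rcu_node(rdp->mynode);
		/* Quiescent state reported: drop both urgency hints. */
		WRITE_ONCE(rdp->rcu_urgent_qs, false);
		WRITE_ONCE(rdp->rcu_need_heavy_qs, false);
		/* Stop forcing the tick if it was forced for RCU's benefit. */
		if (tick_nohz_full_cpu(rdp->cpu) && rdp->rcu_forced_tick) {
			tick_dep_clear_cpu(rdp->cpu, TICK_DEP_BIT_RCU);
			WRITE_ONCE(rdp->rcu_forced_tick, false);
		}
	}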

Tested overnight with rcutorture running for 60 minutes on all
configurations of RCU.

Signed-off-by: "Joel Fernandes (Google)" <joel@joelfernandes.org>
[ paulmck: Clear urgency flags in rcu_disable_urgency_upon_qs(). ]
[ paulmck: Remove ->core_needs_qs from the set cleared at quiescent state. ]
[ paulmck: Make rcu_disable_urgency_upon_qs static per kbuild test robot. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2019-10-28 07:02:21 -07:00
Paul E. McKenney
b200a04895 rcu: Force nohz_full tick on upon irq enter instead of exit
There is interrupt-exit code that forces on the tick for nohz_full CPUs
failing to respond to the current grace period in a timely fashion.
However, this code must compare ->dynticks_nmi_nesting to the value 2
in the interrupt-exit fastpath.  This commit therefore moves this code
to the interrupt-entry fastpath, where a lighter-weight comparison to
zero may be used.

Reported-by: Joel Fernandes <joel@joelfernandes.org>
[ paulmck: Apply Joel Fernandes TICK_DEP_MASK_RCU->TICK_DEP_BIT_RCU fix. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2019-10-28 07:02:21 -07:00
Paul E. McKenney
66e4c33b51 rcu: Force tick on for nohz_full CPUs not reaching quiescent states
CPUs running for long time periods in the kernel in nohz_full mode
might leave the scheduling-clock interrupt disabled for the full
duration of their in-kernel execution.  This can (among other things)
delay grace periods.  This commit therefore forces the tick back on
for any nohz_full CPU that is failing to pass through a quiescent state
upon return from interrupt, which the resched_cpu() will induce.
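
The forcing itself is a per-CPU tick dependency, roughly as follows
(sketch; the real code also rate-limits and undoes the forcing once a
quiescent state is reported):

	if (tick_nohz_full_cpu(rdp->cpu) && !rdp->rcu_forced_tick) {
		WRITE_ONCE(rdp->rcu_forced_tick, true);
		tick_dep_set_cpu(rdp->cpu, TICK_DEP_BIT_RCU); /* keep the tick on */
	}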

Reported-by: Joel Fernandes <joel@joelfernandes.org>
[ paulmck: Clear ->rcu_forced_tick as reported by Joel Fernandes testing. ]
[ paulmck: Apply Joel Fernandes TICK_DEP_MASK_RCU->TICK_DEP_BIT_RCU fix. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2019-10-28 07:02:21 -07:00
Paul E. McKenney
fbbd5e358c rcutorture: Make in-kernel-loop testing more brutal
The rcu_torture_fwd_prog_nr() function tests the ability of RCU to
tolerate in-kernel busy loops.  It invokes rcu_torture_fwd_prog_cond_resched()
within its delay loop, which, in PREEMPT && NO_HZ_FULL kernels, results
in the occasional direct call to schedule().  Now, this direct call to
schedule() is appropriate for call_rcu() flood testing, in which either
the kernel should restrain itself or userspace transitions will supply
the needed restraint.  But in pure in-kernel loops, the occasional
cond_resched() should do the job.

This commit therefore makes rcu_torture_fwd_prog_nr() use cond_resched()
instead of rcu_torture_fwd_prog_cond_resched() in order to increase the
brutality of this aspect of rcutorture testing.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2019-10-05 11:50:18 -07:00
Paul E. McKenney
8b5ddf8b99 rcutorture: Separate warnings for each failure type
Currently, each of six different types of failure triggers a
single WARN_ON_ONCE(), and it is then necessary to stare at the
rcu_torture_stats(), Reader Pipe, and Reader Batch lines looking for
inappropriately non-zero values.  This can be annoying and error-prone,
so this commit provides a separate WARN_ON_ONCE() for each of the
six error conditions and adds short comments to each to ease error
identification.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2019-10-05 11:50:03 -07:00
Ethan Hansen
b3ffb206dd rcu: Remove unused variable rcu_perf_writer_state
The variable rcu_perf_writer_state is declared and initialized,
but is never actually referenced. Remove it to clean code.

Signed-off-by: Ethan Hansen <1ethanhansen@gmail.com>
[ paulmck: Also removed unused macros assigned to that variable. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2019-10-05 11:49:36 -07:00
Ethan Hansen
ac5f636130 rcu: Remove unused function rcutorture_record_progress()
The function rcutorture_record_progress() is declared in rcu.h, but is
never used.  This commit therefore removes rcutorture_record_progress()
to clean code.

Signed-off-by: Ethan Hansen <1ethanhansen@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2019-10-05 11:48:13 -07:00
Paul E. McKenney
79ba7ff5a9 rcutorture: Emulate dyntick aspect of userspace nohz_full sojourn
During an actual call_rcu() flood, there would be frequent trips to
userspace (in-kernel call_rcu() floods must be otherwise housebroken).
Userspace execution on nohz_full CPUs implies an RCU dyntick idle/not-idle
transition pair, so this commit adds emulation of that pair.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2019-10-05 10:46:05 -07:00
Paul E. McKenney
96926686de rcu: Make CPU-hotplug removal operations enable tick
CPU-hotplug removal operations run the multi_cpu_stop() function, which
relies on the scheduler to gain control from whatever is running on the
various online CPUs, including any nohz_full CPUs running long loops in
kernel-mode code.  Lack of the scheduler-clock interrupt on such CPUs
can delay multi_cpu_stop() for several minutes and can also result in
RCU CPU stall warnings.  This commit therefore causes CPU-hotplug removal
operations to enable the scheduler-clock interrupt on all online CPUs.

[ paulmck: Apply Joel Fernandes TICK_DEP_MASK_RCU->TICK_DEP_BIT_RCU fix. ]
[ paulmck: Apply simplifications suggested by Frederic Weisbecker. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2019-10-05 10:46:05 -07:00
Paul E. McKenney
366237e7b0 stop_machine: Provide RCU quiescent state in multi_cpu_stop()
When multi_cpu_stop() loops waiting for other tasks, it can trigger an RCU
CPU stall warning.  This can be misleading because what is instead needed
is information on whatever task is blocking multi_cpu_stop().  This commit
therefore inserts an RCU quiescent state into the multi_cpu_stop()
function's waitloop.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2019-10-05 10:46:05 -07:00
Paul E. McKenney
d38e6dc6ed rcutorture: Force on tick for readers and callback flooders
Readers and callback flooders in the rcutorture stress-test suite run for
extended time periods by design.  They do take pains to relinquish the
CPU from time to time, but in some cases this relies on the scheduler
being active, which in turn relies on the scheduler-clock interrupt
firing from time to time.

This commit therefore forces scheduling-clock interrupts within
these loops.  While in the area, this commit also prevents
rcu_torture_reader()'s occasional timed sleeps from delaying shutdown.

[ paulmck: Apply Joel Fernandes TICK_DEP_MASK_RCU->TICK_DEP_BIT_RCU fix. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2019-10-05 10:46:04 -07:00
Paul E. McKenney
6a949b7af8 rcu: Force on tick when invoking lots of callbacks
Callback invocation can run for a significant time period, and within
CONFIG_NO_HZ_FULL=y kernels, this period will be devoid of scheduler-clock
interrupts.  In-kernel execution without such interrupts can cause all
manner of malfunction, with RCU CPU stall warnings being but one result.

This commit therefore forces scheduling-clock interrupts on whenever more
than a few RCU callbacks are invoked.  Because offloaded callback invocation
can be preempted, this forcing is withdrawn on each context switch.  This
in turn requires that the loop invoking RCU callbacks reiterate the forcing
periodically.
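
A conceptual sketch of the per-task forcing inside the callback-invocation
loop (thresholds, guard conditions, and exact placement are approximate):

	/* Many callbacks remain: keep the scheduler-clock tick on while this
	 * (preemptible, offloaded) invocation keeps grinding through them. */
	if (rcu_segcblist_n_cbs(&rdp->cblist) > 10 && !in_serving_softirq())
		tick_dep_set_task(current, TICK_DEP_BIT_RCU);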

[ paulmck: Apply Joel Fernandes TICK_DEP_MASK_RCU->TICK_DEP_BIT_RCU fix. ]
[ paulmck: Remove NO_HZ_FULL check per Frederic Weisbecker feedback. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2019-10-05 10:46:04 -07:00
Linus Torvalds
7e67a85999 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:

 - MAINTAINERS: Add Mark Rutland as perf submaintainer, Juri Lelli and
   Vincent Guittot as scheduler submaintainers. Add Dietmar Eggemann,
   Steven Rostedt, Ben Segall and Mel Gorman as scheduler reviewers.

   As perf and the scheduler are getting bigger and more complex,
   document the status quo of current responsibilities and interests,
   and spread the review pain^H^H^H^H fun via an increase in the Cc:
   linecount generated by scripts/get_maintainer.pl. :-)

 - Add another series of patches that brings the -rt (PREEMPT_RT) tree
   closer to mainline: split the monolithic CONFIG_PREEMPT dependencies
   into a new CONFIG_PREEMPTION category that will allow the eventual
   introduction of CONFIG_PREEMPT_RT. Still a few more hundred patches
   to go though.

 - Extend the CPU cgroup controller with uclamp.min and uclamp.max to
   allow the finer shaping of CPU bandwidth usage.

 - Micro-optimize energy-aware wake-ups from O(CPUS^2) to O(CPUS).

 - Improve the behavior of high CPU count, high thread count
   applications running under cpu.cfs_quota_us constraints.

 - Improve balancing with SCHED_IDLE (SCHED_BATCH) tasks present.

 - Improve CPU isolation housekeeping CPU allocation NUMA locality.

 - Fix deadline scheduler bandwidth calculations and logic when cpusets
   rebuilds the topology, or when it gets deadline-throttled while it's
   being offlined.

 - Convert the cpuset_mutex to percpu_rwsem, to allow it to be used from
   setscheduler() system calls without creating global serialization.
   Add new synchronization between cpuset topology-changing events and
   the deadline acceptance tests in setscheduler(), which were broken
   before.

 - Rework the active_mm state machine to be less confusing and more
   optimal.

 - Rework (simplify) the pick_next_task() slowpath.

 - Improve load-balancing on AMD EPYC systems.

 - ... and misc cleanups, smaller fixes and improvements - please see
   the Git log for more details.

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (53 commits)
  sched/psi: Correct overly pessimistic size calculation
  sched/fair: Speed-up energy-aware wake-ups
  sched/uclamp: Always use 'enum uclamp_id' for clamp_id values
  sched/uclamp: Update CPU's refcount on TG's clamp changes
  sched/uclamp: Use TG's clamps to restrict TASK's clamps
  sched/uclamp: Propagate system defaults to the root group
  sched/uclamp: Propagate parent clamps
  sched/uclamp: Extend CPU's cgroup controller
  sched/topology: Improve load balancing on AMD EPYC systems
  arch, ia64: Make NUMA select SMP
  sched, perf: MAINTAINERS update, add submaintainers and reviewers
  sched/fair: Use rq_lock/unlock in online_fair_sched_group
  cpufreq: schedutil: fix equation in comment
  sched: Rework pick_next_task() slow-path
  sched: Allow put_prev_task() to drop rq->lock
  sched/fair: Expose newidle_balance()
  sched: Add task_struct pointer to sched_class::set_curr_task
  sched: Rework CPU hotplug task selection
  sched/{rt,deadline}: Fix set_next_task vs pick_next_task
  sched: Fix kerneldoc comment for ia64_set_curr_task
  ...
2019-09-16 17:25:49 -07:00
Ingo Molnar
563c4f85f9 Merge branch 'sched/rt' into sched/core, to pick up -rt changes
Pick up the first couple of patches working towards PREEMPT_RT.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-09-16 14:05:04 +02:00
Eric Dumazet
cfcdef5e30 rcu: Allow rcu_do_batch() to dynamically adjust batch sizes
Bimodal behavior of rcu_do_batch() is not really suited to Google
applications like gfe servers.

When a process with millions of sockets exits, closing all files
queues two rcu callbacks per socket.

This eventually reaches the point where RCU enters an emergency
mode, in which rcu_do_batch() does not return until the whole queue is flushed.

Each rcu callback lasts at least 70 nsec, so with millions of
elements, we easily spend more than 100 msec without rescheduling.

The goal of this patch is to avoid the infamous message like the following:
"need_resched set for > 51999388 ns (52 ticks) without schedule"

We dynamically adjust the number of elements we process: instead of
the 10 / INFINITE choices, we use a floor of ~1% of the current entries.

If the number is above 1000, we switch to a time-based limit of 3 msec
per batch, adjustable with /sys/module/rcutree/parameters/rcu_resched_ns
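
A rough sketch of the adaptive limit (not the exact upstream code; the
tunable handling is simplified):

	long pending = rcu_segcblist_n_cbs(&rdp->cblist);
	long bl = max(rdp->blimit, pending >> 7);	/* floor of roughly 1% of queued CBs */
	unsigned long tlimit = 0;

	if (bl > 1000)					/* huge backlog: bound by time instead */
		tlimit = local_clock() + rcu_resched_ns;	/* defaults to 3 ms */

	/* ...and, inside the invocation loop: */
	if (tlimit && local_clock() >= tlimit)
		break;					/* yield the CPU back to the scheduler */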

Signed-off-by: Eric Dumazet <edumazet@google.com>
[ paulmck: Forward-port and remove debug statements. ]
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-13 14:38:24 -07:00
Paul E. McKenney
f48fe4c586 rcu/nocb: Don't wake no-CBs GP kthread if timer posted under overload
When under overload conditions, __call_rcu_nocb_wake() will wake the
no-CBs GP kthread any time the no-CBs CB kthread is asleep or there
are no ready-to-invoke callbacks, but only after a timer delay.  If the
no-CBs GP kthread has a ->nocb_bypass_timer pending, the deferred wakeup
from __call_rcu_nocb_wake() is redundant.  This commit therefore makes
__call_rcu_nocb_wake() avoid posting the redundant deferred wakeup if
->nocb_bypass_timer is pending.  This requires adding a bit of ordering
of timer actions.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-13 14:38:24 -07:00
Paul E. McKenney
296181d78d rcu/nocb: Reduce __call_rcu_nocb_wake() leaf rcu_node ->lock contention
Currently, __call_rcu_nocb_wake() advances callbacks each time that it
detects excessive numbers of callbacks, though only if it succeeds in
conditionally acquiring its leaf rcu_node structure's ->lock.  Despite
the conditional acquisition of ->lock, this does increase contention.
This commit therefore avoids advancing callbacks unless there are
callbacks in ->cblist whose grace period has completed and advancing
has not yet been done during this jiffy.

Note that this decision does not take the presence of new callbacks
into account.  That is because on this code path, there will always be
at least one new callback, namely the one we just enqueued.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-13 14:38:24 -07:00
Paul E. McKenney
1d5a81c18d rcu/nocb: Reduce nocb_cb_wait() leaf rcu_node ->lock contention
Currently, nocb_cb_wait() advances callbacks on each pass through its
loop, though only if it succeeds in conditionally acquiring its leaf
rcu_node structure's ->lock.  Despite the conditional acquisition of
->lock, this does increase contention.  This commit therefore avoids
advancing callbacks unless there are callbacks in ->cblist whose grace
period has completed.

Note that nocb_cb_wait() doesn't worry about callbacks that have not
yet been assigned a grace period.  The idea is that the only reason for
nocb_cb_wait() to advance callbacks is to allow it to continue invoking
callbacks.  Time will tell whether this is the correct choice.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-13 14:38:24 -07:00
Paul E. McKenney
23651d9b96 rcu/nocb: Advance CBs after merge in rcutree_migrate_callbacks()
The rcutree_migrate_callbacks() invokes rcu_advance_cbs() on both the
offlined CPU's ->cblist and that of the surviving CPU, then merges
them.  However, after the merge, any of the offlined CPU's callbacks
that were not ready to be invoked will no longer be associated with a
grace-period number.  This commit therefore invokes rcu_advance_cbs()
one more time on the merged ->cblist in order to assign a grace-period
number to these callbacks.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-13 14:38:24 -07:00
Paul E. McKenney
273f034065 rcu/nocb: Avoid synchronous wakeup in __call_rcu_nocb_wake()
When callbacks are in full flow, the common case is waiting for a
grace period, and this grace period will normally take a few jiffies to
complete.  It therefore isn't all that helpful for __call_rcu_nocb_wake()
to do a synchronous wakeup in this case.  This commit therefore turns this
into a timer-based deferred wakeup of the no-CBs grace-period kthread.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-13 14:38:24 -07:00
Paul E. McKenney
f7a81b12d6 rcu/nocb: Print no-CBs diagnostics when rcutorture writer unduly delayed
This commit causes locking, sleeping, and callback state to be printed
for no-CBs CPUs when the rcutorture writer is delayed sufficiently for
rcutorture to complain.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-13 14:38:24 -07:00
Paul E. McKenney
6aacd88d17 rcu/nocb: EXP Check use and usefulness of ->nocb_lock_contended
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-13 14:38:24 -07:00
Paul E. McKenney
d1b222c6be rcu/nocb: Add bypass callback queueing
Use of the rcu_data structure's segmented ->cblist for no-CBs CPUs
takes advantage of unrelated grace periods, thus reducing the memory
footprint in the face of floods of call_rcu() invocations.  However,
the ->cblist field is a more-complex rcu_segcblist structure which must
be protected via locking.  Even though there are only three entities
which can acquire this lock (the CPU invoking call_rcu(), the no-CBs
grace-period kthread, and the no-CBs callbacks kthread), the contention
on this lock is excessive under heavy stress.

This commit therefore greatly reduces contention by provisioning
an rcu_cblist structure field named ->nocb_bypass within the
rcu_data structure.  Each no-CBs CPU is permitted only a limited
number of enqueues onto the ->cblist per jiffy, controlled by a new
nocb_nobypass_lim_per_jiffy kernel boot parameter that defaults to
about 16 enqueues per millisecond (16 * 1000 / HZ).  When that limit is
exceeded, the CPU instead enqueues onto the new ->nocb_bypass.

The ->nocb_bypass is flushed into the ->cblist every jiffy or when
the number of callbacks on ->nocb_bypass exceeds qhimark, whichever
happens first.  During call_rcu() floods, this flushing is carried out
by the CPU during the course of its call_rcu() invocations.  However,
a CPU could simply stop invoking call_rcu() at any time.  The no-CBs
grace-period kthread therefore carries out less-aggressive flushing
(every few jiffies or when the number of callbacks on ->nocb_bypass
exceeds (2 * qhimark), whichever comes first).  This means that the
no-CBs grace-period kthread cannot be permitted to do unbounded waits
while there are callbacks on ->nocb_bypass.  A ->nocb_bypass_timer is
used to provide the needed wakeups.
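
A conceptual sketch of the per-jiffy rate check (field and parameter names
approximate):

	if (j != READ_ONCE(rdp->nocb_nobypass_last)) {
		WRITE_ONCE(rdp->nocb_nobypass_last, j);	/* new jiffy: restart the count */
		rdp->nocb_nobypass_count = 0;
	}
	if (++rdp->nocb_nobypass_count <= nocb_nobypass_lim_per_jiffy)
		return false;				/* under the limit: use ->cblist directly */
	rcu_cblist_enqueue(&rdp->nocb_bypass, rhp);	/* over the limit: divert to the bypass list */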

[ paulmck: Apply Coverity feedback reported by Colin Ian King. ]
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-13 14:37:32 -07:00
Paul E. McKenney
eda669a6a2 rcu/nocb: Atomic ->len field in rcu_segcblist structure
Upcoming ->nocb_lock contention-reduction work requires that the
rcu_segcblist structure's ->len field be concurrently manipulated,
but only if there are no-CBs CPUs in the kernel.  This commit
therefore makes this ->len field be an atomic_long_t, but only
in CONFIG_RCU_NOCB_CPU=y kernels.
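
The resulting field definition is, in abridged form:

	struct rcu_segcblist {
		/* ... head, tails[], gp_seq[] ... */
	#ifdef CONFIG_RCU_NOCB_CPU
		atomic_long_t len;	/* May be updated concurrently. */
	#else
		long len;		/* Updated only by the owning CPU. */
	#endif
		/* ... */
	};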

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-13 14:35:49 -07:00
Paul E. McKenney
faca5c2509 rcu/nocb: Unconditionally advance and wake for excessive CBs
When there are excessive numbers of callbacks, and when either the
corresponding no-CBs callback kthread is asleep or there are no more
ready-to-invoke callbacks, and when at least one callback is pending,
__call_rcu_nocb_wake() will advance the callbacks, but refrain from
awakening the corresponding no-CBs grace-period kthread.  However,
because rcu_advance_cbs_nowake() is used, it is possible (if a bit
unlikely) that the needed advancement could not happen due to a grace
period not being in progress.  Plus there will always be at least one
pending callback due to one having just now been enqueued.

This commit therefore attempts to advance callbacks and awakens the
no-CBs grace-period kthread when there are excessive numbers of callbacks
posted and when the no-CBs callback kthread is not in a position to do
anything helpful.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-13 14:35:49 -07:00
Paul E. McKenney
4fd8c5f153 rcu/nocb: Reduce ->nocb_lock contention with separate ->nocb_gp_lock
The sleep/wakeup of the no-CBs grace-period kthreads is synchronized
using the ->nocb_lock of the first CPU corresponding to that kthread.
This commit provides a separate ->nocb_gp_lock for this purpose, thus
reducing contention on ->nocb_lock.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-13 14:35:49 -07:00
Paul E. McKenney
523bddd553 rcu/nocb: Reduce contention at no-CBs invocation-done time
Currently, nocb_cb_wait() unconditionally acquires the leaf rcu_node
->lock to advance callbacks when done invoking the previous batch.
It does this while holding ->nocb_lock, which means that contention on
the leaf rcu_node ->lock visits itself on the ->nocb_lock.  This commit
therefore makes this lock acquisition conditional, forgoing callback
advancement when the leaf rcu_node ->lock is not immediately available.
(In this case, the no-CBs grace-period kthread will eventually do any
needed callback advancement.)

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-13 14:35:49 -07:00
Paul E. McKenney
6608c3a027 rcu/nocb: Reduce contention at no-CBs registry-time CB advancement
Currently, __call_rcu_nocb_wake() conditionally acquires the leaf rcu_node
structure's ->lock, and only afterwards does rcu_advance_cbs_nowake()
check to see if it is possible to advance callbacks without potentially
needing to awaken the grace-period kthread.  Given that the no-awaken
check can be done locklessly, this commit reverses the order, so that
rcu_advance_cbs_nowake() is invoked without holding the leaf rcu_node
structure's ->lock and rcu_advance_cbs_nowake() checks the grace-period
state before conditionally acquiring that lock, thus reducing the number
of needless acquisitions of the leaf rcu_node structure's ->lock.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-13 14:35:49 -07:00
Paul E. McKenney
9fcb09bddd rcu/nocb: Round down for number of no-CBs grace-period kthreads
Currently, when the square root of the number of CPUs is rounded down
by int_sqrt(), this round-down is applied to the number of callback
kthreads per grace-period kthread.  This makes almost no difference
for large systems, but results in oddities such as three no-CBs
grace-period kthreads for a five-CPU system, which is a bit excessive.
This commit therefore causes the round-down to apply to the number of
no-CBs grace-period kthreads, so that systems with from four to eight
CPUs have only two no-CBs grace period kthreads.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-13 14:35:49 -07:00
Paul E. McKenney
81c0b3d724 rcu/nocb: Avoid ->nocb_lock capture by corresponding CPU
A given rcu_data structure's ->nocb_lock can be acquired very frequently
by the corresponding CPU and occasionally by the corresponding no-CBs
grace-period and callbacks kthreads.  In particular, these two kthreads
will have frequent gaps between ->nocb_lock acquisitions that are roughly
a grace period in duration.  This means that any excessive ->nocb_lock
contention will be due to the CPU's acquisitions, and this in turn
enables a very naive contention-avoidance strategy to be quite effective.

This commit therefore modifies rcu_nocb_lock() to first
attempt a raw_spin_trylock(), and to atomically increment a
separate ->nocb_lock_contended across a raw_spin_lock().  This new
->nocb_lock_contended field is checked in __call_rcu_nocb_wake() when
interrupts are enabled, with a spin-wait for contending acquisitions
to complete, thus allowing the kthreads a chance to acquire the lock.
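
The acquisition path then becomes, roughly (a sketch; helper name and
counter type are approximate):

	static void rcu_nocb_lock_contended(struct rcu_data *rdp)
	{
		if (raw_spin_trylock(&rdp->nocb_lock))
			return;				/* Uncontended fast path. */
		atomic_inc(&rdp->nocb_lock_contended);	/* Tell the CPU's hot path to back off. */
		raw_spin_lock(&rdp->nocb_lock);
		atomic_dec(&rdp->nocb_lock_contended);
	}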

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-13 14:35:49 -07:00
Paul E. McKenney
7f36ef82e5 rcu/nocb: Avoid needless wakeups of no-CBs grace-period kthread
Currently, the code provides an extra wakeup for the no-CBs grace-period
kthread if one of its CPUs is generating excessive numbers of callbacks.
But satisfying though it is to wake something up when things are going
south, unless the thing being awakened can actually help solve the
problem, that extra wakeup does nothing but consume additional CPU time,
which is exactly what you don't want during a call_rcu() flood.

This commit therefore avoids doing anything if the corresponding
no-CBs callback kthread is going full tilt.  Otherwise, if advancing
callbacks immediately might help and if the leaf rcu_node structure's
lock is immediately available, this commit invokes a new variant of
rcu_advance_cbs() that advances callbacks only if doing so won't require
awakening the grace-period kthread (not to be confused with any of the
no-CBs grace-period kthreads).

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-13 14:35:49 -07:00
Paul E. McKenney
ce0a825e40 rcu/nocb: Make __call_rcu_nocb_wake() safe for many callbacks
It might be hard to imagine having more than two billion callbacks
queued on a single CPU's ->cblist, but someone will do it sometime.
This commit therefore makes __call_rcu_nocb_wake() handle this situation
by upgrading local variable "len" from "int" to "long".

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-13 14:35:49 -07:00
Paul E. McKenney
383e133283 rcu/nocb: Never downgrade ->nocb_defer_wakeup in wake_nocb_gp_defer()
Currently, wake_nocb_gp_defer() simply stores whatever waketype was
passed in, which can result in a RCU_NOCB_WAKE_FORCE being downgraded
to RCU_NOCB_WAKE, which could in turn delay callback processing.
This commit therefore adds a check so that wake_nocb_gp_defer() only
updates ->nocb_defer_wakeup when the update increases the forcefulness,
thus avoiding downgrades.
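
The added check is essentially one comparison (sketch):

	/* Only increase the deferred-wakeup forcefulness, never decrease it. */
	if (rdp->nocb_defer_wakeup < waketype)
		WRITE_ONCE(rdp->nocb_defer_wakeup, waketype);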

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-13 14:35:49 -07:00
Paul E. McKenney
aeeacd9d84 rcu/nocb: Enable re-awakening under high callback load
The __call_rcu_nocb_wake() function and its predecessors set
->qlen_last_fqs_check to zero for the first callback and to LONG_MAX / 2
for forced reawakenings.  The former can result in a too-quick reawakening
when there are many callbacks ready to invoke and the latter prevents a
second reawakening.  This commit therefore sets ->qlen_last_fqs_check
to the current number of callbacks in both cases.  While in the area,
this commit also moves both assignments under ->nocb_lock.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-13 14:35:49 -07:00
Paul E. McKenney
0bd55c6936 rcu/nohz: Turn off tick for offloaded CPUs
Historically, no-CBs CPUs allowed the scheduler-clock tick to be
unconditionally disabled on any transition to idle or nohz_full userspace
execution (see the rcu_needs_cpu() implementations).  Unfortunately,
the checks used by rcu_needs_cpu() are defeated now that no-CBs CPUs
use ->cblist, which might make users of battery-powered devices rather
unhappy.  This commit therefore adds explicit rcu_segcblist_is_offloaded()
checks to return to the historical energy-efficient semantics.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-13 14:35:49 -07:00
Paul E. McKenney
969974e5c5 rcu/nocb: Suppress uninitialized false-positive in nocb_gp_wait()
Some compilers complain that wait_gp_seq might be used uninitialized
in nocb_gp_wait().  This cannot actually happen because when wait_gp_seq
is uninitialized, needwait_gp must be false, which prevents wait_gp_seq
from being used.  But this analysis is apparently beyond some compilers,
so this commit adds a bogus initialization of wait_gp_seq for the sole
purpose of suppressing the false-positive warning.
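
The workaround is just an otherwise-unnecessary initializer (sketch):

	unsigned long wait_gp_seq = 0;	/* Suppress "may be used uninitialized" false positive. */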

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-13 14:35:49 -07:00
Paul E. McKenney
921bb5fad1 rcu/nocb: Use build-time no-CBs check in rcu_pending()
Currently, rcu_pending() invokes rcu_segcblist_is_offloaded() even
in CONFIG_RCU_NOCB_CPU=n kernels, which cannot possibly be offloaded.
Given that rcu_pending() is on a fastpath, it makes sense to check for
CONFIG_RCU_NOCB_CPU=y before invoking rcu_segcblist_is_offloaded().
This commit therefore makes this change.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-13 14:35:49 -07:00
Paul E. McKenney
c1ab99d66e rcu/nocb: Use build-time no-CBs check in rcu_core()
Currently, rcu_core() invokes rcu_segcblist_is_offloaded() each time it
needs to know whether the current CPU is a no-CBs CPU.  Given that it is
not possible to change the no-CBs status of a CPU after boot, and given
that it is not possible to even have no-CBs CPUs in CONFIG_RCU_NOCB_CPU=n
kernels, this repeated runtime invocation wastes CPU.  This commit
therefore creates a const on-stack variable to allow this check to be
done only once per rcu_core() invocation.
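
The resulting pattern is a one-time check that the compiler can discard
entirely when no-CBs CPUs are impossible (sketch):

	const bool offloaded = IS_ENABLED(CONFIG_RCU_NOCB_CPU) &&
			       rcu_segcblist_is_offloaded(&rdp->cblist);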

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-13 14:35:49 -07:00
Paul E. McKenney
ec5ef87bac rcu/nocb: Use build-time no-CBs check in rcu_do_batch()
Currently, rcu_do_batch() invokes rcu_segcblist_is_offloaded() each time
it needs to know whether the current CPU is a no-CBs CPU.  Given that it
is not possible to change the no-CBs status of a CPU after boot, and given
that it is not possible to even have no-CBs CPUs in CONFIG_RCU_NOCB_CPU=n
kernels, this per-callback invocation wastes CPU.  This commit therefore
creates a const on-stack variable to allow this check to be done only
once per rcu_do_batch() invocation.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-13 14:35:49 -07:00
Paul E. McKenney
4f9c1bc727 rcu/nocb: Remove obsolete nocb_gp_head and nocb_gp_tail fields
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-13 14:35:49 -07:00
Paul E. McKenney
2a777de757 rcu/nocb: Remove obsolete nocb_cb_tail and nocb_cb_head fields
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-13 14:35:49 -07:00
Paul E. McKenney
c035280f17 rcu/nocb: Remove obsolete nocb_q_count and nocb_q_count_lazy fields
This commit removes the obsolete nocb_q_count and nocb_q_count_lazy
fields, also removing rcu_get_n_cbs_nocb_cpu(), adjusting
rcu_get_n_cbs_cpu(), and making rcutree_migrate_callbacks() once again
disable the ->cblist fields of offline CPUs.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-13 14:35:49 -07:00
Paul E. McKenney
e7f4c5b399 rcu/nocb: Remove obsolete nocb_head and nocb_tail fields
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-13 14:35:49 -07:00
Paul E. McKenney
5d6742b377 rcu/nocb: Use rcu_segcblist for no-CBs CPUs
Currently the RCU callbacks for no-CBs CPUs are queued on a series of
ad-hoc linked lists, which means that these callbacks cannot benefit
from "drive-by" grace periods, thus suffering needless delays prior
to invocation.  In addition, the no-CBs grace-period kthreads first
wait for callbacks to appear and later wait for a new grace period,
which means that callbacks appearing during a grace-period wait can
be delayed.  These delays increase memory footprint, and could even
result in an out-of-memory condition.

This commit therefore enqueues RCU callbacks from no-CBs CPUs on the
rcu_segcblist structure that is already used by non-no-CBs CPUs.  It also
restructures the no-CBs grace-period kthread to be checking for incoming
callbacks while waiting for grace periods.  Also, instead of waiting
for a new grace period, it waits for the closest grace period that will
cause some of the callbacks to be safe to invoke.  All of these changes
reduce callback latency and thus the number of outstanding callbacks,
in turn reducing the probability of an out-of-memory condition.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-13 14:35:49 -07:00
Paul E. McKenney
e83e73f5b0 rcu/nocb: Leave ->cblist enabled for no-CBs CPUs
As a first step towards making no-CBs CPUs use the ->cblist, this commit
leaves the ->cblist enabled for these CPUs.  The main reason to make
no-CBs CPUs use ->cblist is to take advantage of callback numbering,
which will reduce the effects of missed grace periods which in turn will
reduce forward-progress problems for no-CBs CPUs.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-13 14:35:49 -07:00
Paul E. McKenney
e6060b41c9 rcu/nocb: Allow lockless use of rcu_segcblist_empty()
Currently, rcu_segcblist_empty() assumes that the callback list is not
being changed by other CPUs, but upcoming changes will require it to
operate locklessly.  This commit therefore adds the needed READ_ONCE()
call, along with the WRITE_ONCE() calls when updating the callback list's
->head field.
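
The lockless-safe accessor is then simply (sketch):

	/* Is the specified rcu_segcblist structure empty of callbacks? */
	static inline bool rcu_segcblist_empty(struct rcu_segcblist *rsclp)
	{
		return !READ_ONCE(rsclp->head);
	}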

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-13 14:35:49 -07:00
Paul E. McKenney
76c6927c3e rcu/nocb: Allow lockless use of rcu_segcblist_restempty()
Currently, rcu_segcblist_restempty() assumes that the callback list
is not being changed by other CPUs, but upcoming changes will require
it to operate locklessly.  This commit therefore adds the needed
READ_ONCE() calls, along with the WRITE_ONCE() calls when updating
the callback list.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-13 14:35:49 -07:00
Paul E. McKenney
ca5c825808 rcu/nocb: Remove deferred wakeup checks for extended quiescent states
The idea behind the checks for extended quiescent states at the end of
__call_rcu_nocb() is to handle cases where call_rcu() is invoked directly
from within an extended quiescent state, for example, from the idle loop.
However, this will result in a timer-mediated deferred wakeup, which
will cause the needed wakeup to happen within a jiffy or thereabouts.
There should be no forward-progress concerns, and if there are, the proper
response is to exit the extended quiescent state while executing the
endless blast of call_rcu() invocations, for example, using RCU_NONIDLE().
Given the more realistic case of an isolated call_rcu() invocation, there
should be no problem.

This commit therefore removes the checks for invoking call_rcu() within
an extended quiescent state on no-CBs CPUs.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-13 14:35:49 -07:00
Paul E. McKenney
85f69b3212 rcu/nocb: Check for deferred nocb wakeups before nohz_full early exit
In theory, a timer is used to defer wakeups of no-CBs grace-period
kthreads when the wakeup cannot be done safely directly from the
call_rcu().  In practice, the one-jiffy delay is not always consistent
with timely callback invocation under heavy call_rcu() loads.  Therefore,
there are a number of checks for a pending deferred wakeup, including
from the scheduling-clock interrupt.  Unfortunately, this check follows
the rcu_nohz_full_cpu() early exit, which renders it useless on such CPUs.

This commit therefore moves the check for the pending deferred no-CB
wakeup to precede the rcu_nohz_full_cpu() early exit.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-13 14:35:49 -07:00
Paul E. McKenney
c00045be32 rcu/nocb: Make rcutree_migrate_callbacks() start at leaf rcu_node structure
Because rcutree_migrate_callbacks() is invoked infrequently and because
an exact snapshot of the grace-period state might save some callbacks a
second trip through a grace period, this function has used the root
rcu_node structure.  However, this safe-second-trip optimization
happens only if rcutree_migrate_callbacks() races with grace-period
initialization, so it is not worth the added mental load.  This commit
therefore makes rcutree_migrate_callbacks() start with the leaf rcu_node
structures, as is done elsewhere.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-13 14:35:49 -07:00
Paul E. McKenney
750d7f6a43 rcu/nocb: Add checks for offloaded callback processing
This commit is a preparatory patch for offloaded callbacks using the
same ->cblist structure used by non-offloaded callbacks.  It therefore
adds rcu_segcblist_is_offloaded() calls where they will be needed when
!rcu_segcblist_is_enabled() no longer flags the offloaded case.  It also
adds checks in rcu_do_batch() to ensure that there are no missed checks:
Currently, it should not be possible for offloaded execution to reach
rcu_do_batch(), though this will change later in this series.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-13 14:35:49 -07:00
Paul E. McKenney
ce5215c134 rcu/nocb: Use separate flag to indicate offloaded ->cblist
RCU callback processing currently uses rcu_is_nocb_cpu() to determine
whether or not the current CPU's callbacks are to be offloaded.
This works, but it is not so good for cache locality.  Plus use of
->cblist for offloaded callbacks will greatly increase the frequency
of these checks.  This commit therefore adds a ->offloaded flag to the
rcu_segcblist structure to provide a more flexible and cache-friendly
means of checking for callback offloading.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-13 14:35:49 -07:00
Paul E. McKenney
1bb5f9b95a rcu/nocb: Use separate flag to indicate disabled ->cblist
NULLing the RCU_NEXT_TAIL pointer was a clever way to save a byte, but
forward-progress considerations would require that this pointer be both
NULL and non-NULL, which, absent a quantum-computer port of the Linux
kernel, simply won't happen.  This commit therefore creates a separate
->enabled flag to replace the current NULL checks.

[ paulmck: Add include files per 0day test robot and -next. ]
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-13 14:34:50 -07:00
Paul E. McKenney
18cd8c93e6 rcu/nocb: Print gp/cb kthread hierarchy if dump_tree
This commit causes the no-CBs grace-period/callback hierarchy to be
printed to the console when the dump_tree kernel boot parameter is set.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-13 14:32:39 -07:00
Paul E. McKenney
f7c612b000 rcu/nocb: Rename rcu_nocb_leader_stride kernel boot parameter
This commit changes the name of the rcu_nocb_leader_stride kernel
boot parameter to rcu_nocb_gp_stride in order to account for the new
distinction between callback and grace-period no-CBs kthreads.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-13 14:32:39 -07:00
Paul E. McKenney
f7c9a9b664 rcu/nocb: Rename and document no-CB CB kthread sleep trace event
The nocb_cb_wait() function traces a "FollowerSleep" trace_rcu_nocb_wake()
event, which never was documented and is now misleading.  This commit
therefore changes "FollowerSleep" to "CBSleep", documents this, and
updates the documentation for "Sleep" as well.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-13 14:32:39 -07:00
Paul E. McKenney
0bdc33daef rcu/nocb: Rename rcu_organize_nocb_kthreads() local variable
This commit renames rdp_leader to rdp_gp in order to account for the
new distinction between callback and grace-period no-CBs kthreads.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-13 14:32:39 -07:00
Paul E. McKenney
0d52a6652f rcu/nocb: Rename wake_nocb_leader_defer() to wake_nocb_gp_defer()
This commit adjusts naming to account for the new distinction between
callback and grace-period no-CBs kthreads.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-13 14:32:39 -07:00
Paul E. McKenney
5f675ba6eb rcu/nocb: Rename __wake_nocb_leader() to __wake_nocb_gp()
This commit adjusts naming to account for the new distinction between
callback and grace-period no-CBs kthreads.  While in the area, it also
updates local variables.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-13 14:32:39 -07:00
Paul E. McKenney
5d62c08c5f rcu/nocb: Rename wake_nocb_leader() to wake_nocb_gp()
This commit adjusts naming to account for the new distinction between
callback and grace-period no-CBs kthreads.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-13 14:32:39 -07:00
Paul E. McKenney
9fa471a881 rcu/nocb: Rename nocb_follower_wait() to nocb_cb_wait()
This commit adjusts naming to account for the new distinction between
callback and grace-period no-CBs kthreads.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-13 14:32:39 -07:00
Paul E. McKenney
12f54c3a84 rcu/nocb: Provide separate no-CBs grace-period kthreads
Currently, there is one no-CBs rcuo kthread per CPU, and these kthreads
are divided into groups.  The first rcuo kthread to come online in a
given group is that group's leader, and the leader both waits for grace
periods and invokes its CPU's callbacks.  The non-leader rcuo kthreads
only invoke callbacks.

This works well in the real-time/embedded environments for which it was
intended because such environments tend not to generate all that many
callbacks.  However, given huge floods of callbacks, it is possible for
the leader kthread to be stuck invoking callbacks while its followers
wait helplessly while their callbacks pile up.  This is a good recipe
for an OOM, and rcutorture's new callback-flood capability does generate
such OOMs.

One strategy would be to wait until such OOMs start happening in
production, but similar OOMs have in fact happened starting in 2018.
It would therefore be wise to take a more proactive approach.

This commit therefore features per-CPU rcuo kthreads that do nothing
but invoke callbacks.  Instead of having one of these kthreads act as
leader, each group has a separate rcog kthread that handles grace periods
for its group.  Because these rcuog kthreads do not invoke callbacks,
callback floods on one CPU no longer block callbacks from reaching the
rcuc callback-invocation kthreads on other CPUs.

This change does introduce additional kthreads, however:

1.	The number of additional kthreads is about the square root of
	the number of CPUs, so that a 4096-CPU system would have only
	about 64 additional kthreads.  Note that recent changes
	decreased the number of rcuo kthreads by a factor of two
	(CONFIG_PREEMPT=n) or even three (CONFIG_PREEMPT=y), so
	this still represents a significant improvement on most systems.

2.	The leading "rcuo" of the rcuog kthreads should allow existing
	scripting to affinity these additional kthreads as needed, the
	same as for the rcuop and rcuos kthreads.  (There are no longer
	any rcuob kthreads.)

3.	A state-machine approach was considered and rejected.  Although
	this would allow the rcuo kthreads to continue their dual
	leader/follower roles, it complicates callback invocation
	and makes it more difficult to consolidate rcuo callback
	invocation with existing softirq callback invocation.

The introduction of rcuog kthreads should thus be acceptable.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-13 14:32:39 -07:00
Paul E. McKenney
6484fe54b5 rcu/nocb: Update comments to prepare for forward-progress work
This commit simply rewords comments to prepare for leader nocb kthreads
doing only grace-period work and callback shuffling.  This will mean
the addition of replacement kthreads to invoke callbacks.  The "leader"
and "follower" thus become less meaningful, so the commit changes no-CB
comments with these strings to "GP" and "CB", respectively.  (Give or
take the usual grammatical transformations.)

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-13 14:32:39 -07:00
Paul E. McKenney
58bf6f77c6 rcu/nocb: Rename rcu_data fields to prepare for forward-progress work
This commit simply renames rcu_data fields to prepare for leader
nocb kthreads doing only grace-period work and callback shuffling.
This will mean the addition of replacement kthreads to invoke callbacks.
The "leader" and "follower" thus become less meaningful, so the commit
changes no-CB fields with these strings to "gp" and "cb", respectively.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-13 14:32:39 -07:00
Paul E. McKenney
31da067023 Merge branches 'consolidate.2019.08.01b', 'fixes.2019.08.12a', 'lists.2019.08.13a' and 'torture.2019.08.01b' into HEAD
consolidate.2019.08.01b: Further consolidation cleanups
fixes.2019.08.12a: Miscellaneous fixes
lists.2019.08.13a: Optional lockdep arguments for RCU list macros
torture.2019.08.01b: Torture-test updates
2019-08-13 14:30:30 -07:00
Mukesh Ojha
511b44f759 rcu: Fix spelling mistake "greate"->"great"
This commit fixes a spelling mistake in file tree_exp.h.

Signed-off-by: Mukesh Ojha <mojha@codeaurora.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-12 11:25:06 -07:00
Paul E. McKenney
b823cafa75 rcu: Remove redundant "if" condition from rcu_gp_is_expedited()
Because rcu_expedited_nesting is initialized to 1 and not decremented
until just before init is spawned, rcu_expedited_nesting is guaranteed
to be non-zero whenever rcu_scheduler_active == RCU_SCHEDULER_INIT.
This commit therefore removes this redundant "if" equality test.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
2019-08-12 11:25:06 -07:00
Joel Fernandes (Google)
28875945ba rcu: Add support for consolidated-RCU reader checking
This commit adds RCU-reader checks to list_for_each_entry_rcu() and
hlist_for_each_entry_rcu().  These checks are optional, and are indicated
by a lockdep expression passed to a new optional argument to these two
macros.  If this optional lockdep expression is omitted, these two macros
act as before, checking for an RCU read-side critical section.
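
Usage sketch (the list, member, and lock names here are placeholders, not
kernel identifiers): the new optional argument is a lockdep expression
describing the update-side lock, so the traversal is legal either within
an RCU read-side critical section or with that lock held:

	list_for_each_entry_rcu(p, &mylist, node, lockdep_is_held(&mylist_lock))
		do_something_with(p);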

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
[ paulmck: Update to eliminate return within macro and update comment. ]
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-09 11:00:35 -07:00
Peter Zijlstra
130d9c331b rcu/tree: Fix SCHED_FIFO params
A rather embarrassing mistake had us call sched_setscheduler() before
initializing the parameters passed to it.

Fixes: 1a763fd7c6 ("rcu/tree: Call setschedule() gp ktread to SCHED_FIFO outside of atomic region")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Paul E. McKenney <paulmck@linux.ibm.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
2019-08-08 09:09:30 +02:00
Paul E. McKenney
60013d5d2b rcutorture: Aggressive forward-progress tests shouldn't block shutdown
The more aggressive forward-progress tests can interfere with rcutorture
shutdown, resulting in false-positive diagnostics.  This commit therefore
ends any such tests 30 seconds prior to shutdown.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-01 14:30:22 -07:00
Joel Fernandes (Google)
77e9752ce6 rcuperf: Make rcuperf kernel test more robust for !expedited mode
It is possible that the rcuperf kernel test runs concurrently with init
starting up.  During this time, the system is running all grace periods
as expedited.  However, rcuperf can also be run for normal GP tests.
Right now, it depends on a holdoff time before starting the test to
ensure grace periods start later. This works fine with the default
holdoff time, but it is not robust in situations where init takes
longer than the holdoff time to finish running.  Or, as in my case:

I modified the rcuperf test locally to also run a thread that did
preempt disable/enable in a loop. This had the effect of slowing down
init. The end result was that the "batches:" counter in rcuperf was 0
causing a division by 0 error in the results. This counter was 0 because
only expedited GPs seem to happen, not normal ones which led to the
rcu_state.gp_seq counter remaining constant across grace periods which
unexpectedly happen to be expedited. The system was running expedited
RCU all the time because rcu_unexpedited_gp() would not have run yet
from init.  In other words, the test would run concurrently with init
booting, in expedited GP mode.

To fix this properly, this commit waits until system_state is set to
SYSTEM_RUNNING before starting the test.  This change is made just
before kernel_init() invokes rcu_end_inkernel_boot(), and this latter
is what turns off boot-time expediting of RCU grace periods.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-01 14:30:22 -07:00
Paul E. McKenney
bd1bfc51a3 rcutorture: Emulate userspace sojourn during call_rcu() floods
During an actual call_rcu() flood, there would be frequent trips to
userspace (in-kernel call_rcu() floods must be otherwise housebroken).
Userspace execution allows a great many things to interrupt execution,
and rcutorture needs to also allow such interruptions.  This commit
therefore causes call_rcu() floods to occasionally invoke schedule(),
thus preventing spurious rcutorture failures due to other parts of the
kernel becoming irate at the call_rcu() flood events.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-01 14:30:22 -07:00
Xiao Yang
b3f3886c59 rcuperf: Fix perf_type module-parameter description
The rcu_bh rcuperf type was removed by commit 620d246065cd ("rcuperf:
Remove the "rcu_bh" and "sched" torture types"), but it lives on in the
MODULE_PARM_DESC() of perf_type.  This commit therefore changes that
module-parameter description to substitute srcu for rcu_bh.

Signed-off-by: Xiao Yang <ice_yangxiao@163.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-01 14:30:22 -07:00
Joel Fernandes (Google)
9147089bee rcu: Remove redundant debug_locks check in rcu_read_lock_sched_held()
The debug_locks flag can never be true at the end of
rcu_read_lock_sched_held() because it is already checked by the earlier
call to debug_lockdep_rcu_enabled().  This commit therefore removes this
redundant check.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-01 14:17:01 -07:00
Byungchul Park
3545832fc2 rcu: Change return type of rcu_spawn_one_boost_kthread()
The return value of rcu_spawn_one_boost_kthread() is not used any longer.
This commit therefore changes its return type from int to void, and
removes the cast to void from its callers.

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-01 14:05:51 -07:00
Paul E. McKenney
7e210a653e srcu: Avoid srcutorture security-based pointer obfuscation
Because pointer output is now obfuscated, and because what you really
want to know is whether or not the callback lists are empty, this commit
replaces the srcu_data structure's head callback pointer printout with
a single character that is "." if the callback list is empty or "C"
otherwise.

This is the only remaining user of rcu_segcblist_head(), so this
commit also removes this function's definition.  It also turns out that
rcu_segcblist_tail() no longer has any callers, so this commit removes
that function's definition while in the area.  They were both marked
"Interim", and their end has come.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-01 14:05:51 -07:00
Paul E. McKenney
fbad01af8c rcu: Add destroy_work_on_stack() to match INIT_WORK_ONSTACK()
The synchronize_rcu_expedited() function has an INIT_WORK_ONSTACK(),
but lacks the corresponding destroy_work_on_stack().  This commit
therefore adds destroy_work_on_stack().
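
For reference, the general pairing looks like this (illustrative only; the
handler name is a placeholder, not the synchronize_rcu_expedited() code):

	struct work_struct w;

	INIT_WORK_ONSTACK(&w, my_handler);
	queue_work(system_wq, &w);
	flush_work(&w);			/* Must complete before the stack frame dies. */
	destroy_work_on_stack(&w);	/* Tear down debug-objects state. */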

Reported-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Acked-by: Andrea Arcangeli <aarcange@redhat.com>
2019-08-01 14:05:51 -07:00
Paul E. McKenney
cdc694b235 rcu: Add kernel parameter to dump trace after RCU CPU stall warning
This commit adds a rcu_cpu_stall_ftrace_dump kernel boot parameter, that,
when set, causes the trace buffer to be dumped after an RCU CPU stall
warning is printed.  This kernel boot parameter is disabled by default,
maintaining compatibility with previous behavior.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-01 14:05:51 -07:00
Paul E. McKenney
1f3ebc8253 rcu: Restore barrier() to rcu_read_lock() and rcu_read_unlock()
Commit bb73c52bad ("rcu: Don't disable preemption for Tiny and Tree
RCU readers") removed the barrier() calls from rcu_read_lock() and
rcu_write_lock() in CONFIG_PREEMPT=n&&CONFIG_PREEMPT_COUNT=n kernels.
Within RCU, this commit was OK, but it failed to account for things like
get_user() that can pagefault and that can be reordered by the compiler.
Lack of the barrier() calls in rcu_read_lock() and rcu_read_unlock()
can cause these page faults to migrate into RCU read-side critical
sections, which in CONFIG_PREEMPT=n kernels could result in too-short
grace periods and arbitrary misbehavior.  Please see commit 386afc9114
("spinlocks and preemption points need to be at least compiler barriers")
and Linus's commit 66be4e66a7 ("rcu: locking and unlocking need to
always be at least barriers"), this last of which restores the barrier()
call to both rcu_read_lock() and rcu_read_unlock().

This commit removes barrier() calls that are no longer needed given
their addition in Linus's commit noted above.  The combination of
this commit and Linus's commit effectively reverts commit bb73c52bad
("rcu: Don't disable preemption for Tiny and Tree RCU readers").

Reported-by: Herbert Xu <herbert@gondor.apana.org.au>
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
[ paulmck: Fix embarrassing typo located by Alan Stern. ]
2019-08-01 14:05:51 -07:00
Joel Fernandes (Google)
cb4dbbfaa1 rcu: Simplify rcu_note_context_switch exit from critical section
Because __rcu_read_unlock() can be preempted just before the call to
rcu_read_unlock_special(), it is possible for a task to be preempted just
before it would have fully exited its RCU read-side critical section.
This would result in a needless extension of that critical section until
that task was resumed, which might in turn result in a needlessly
long grace period, needless RCU priority boosting, and needless
force-quiescent-state actions.  Therefore, rcu_note_context_switch()
invokes __rcu_read_unlock() followed by rcu_preempt_deferred_qs() when
it detects this situation.  This action by rcu_note_context_switch()
ends the RCU read-side critical section immediately.

Of course, once the task resumes, it will invoke rcu_read_unlock_special()
redundantly.  This is harmless because the fact that a preemption
happened means that interrupts, preemption, and softirqs cannot
have been disabled, so there would be no deferred quiescent state.
While ->rcu_read_lock_nesting remains less than zero, none of the
->rcu_read_unlock_special.b bits can be set, and they were all zeroed by
the call to rcu_note_context_switch() at task-preemption time.  Therefore,
setting ->rcu_read_unlock_special.b.exp_hint to false has no effect.

Therefore, the extra call to rcu_preempt_deferred_qs_irqrestore()
would return immediately, with one possible exception: an expedited
grace period might have started just as the task was being
resumed, which could leave ->exp_deferred_qs set.  This will cause
rcu_preempt_deferred_qs_irqrestore() to invoke rcu_report_exp_rdp(),
reporting the quiescent state, just as it should.  (Such an expedited
grace period won't affect the preemption code path due to interrupts
having already been disabled.)

But when rcu_note_context_switch() invokes __rcu_read_unlock(), it
is doing so with preemption disabled, hence __rcu_read_unlock() will
unconditionally defer the quiescent state, only to immediately invoke
rcu_preempt_deferred_qs(), thus immediately reporting the deferred
quiescent state.  It turns out to be safe (and faster) to instead
just invoke rcu_preempt_deferred_qs() without the __rcu_read_unlock()
middleman.

Because this is the invocation during the preemption (as opposed to
the invocation just after the resume), at least one of the bits in
->rcu_read_unlock_special.b must be set and ->rcu_read_lock_nesting
must be negative.  This means that rcu_preempt_need_deferred_qs() must
return true, avoiding the early exit from rcu_preempt_deferred_qs().
Thus, rcu_preempt_deferred_qs_irqrestore() will be invoked immediately,
as required.

This commit therefore simplifies the CONFIG_PREEMPT=y version of
rcu_note_context_switch() by removing the "else if" branch of its
"if" statement.  This change means that all callers that would have
invoked rcu_read_unlock_special() followed by rcu_preempt_deferred_qs()
will now simply invoke rcu_preempt_deferred_qs(), thus avoiding the
rcu_read_unlock_special() middleman when __rcu_read_unlock() is preempted.

Cc: rcu@vger.kernel.org
Cc: kernel-team@android.com
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-01 14:04:20 -07:00
Paul E. McKenney
87446b4874 rcu: Make rcu_read_unlock_special() checks match raise_softirq_irqoff()
Threaded interrupts provide additional interesting interactions between
RCU and raise_softirq() that can result in self-deadlocks in v5.0-2 of
the Linux kernel.  These self-deadlocks can be provoked in susceptible
kernels within a few minutes using the following rcutorture command on
an 8-CPU system:

tools/testing/selftests/rcutorture/bin/kvm.sh --duration 5 --configs "TREE03" --bootargs "threadirqs"

Although post-v5.2 RCU commits have at least greatly reduced the
probability of these self-deadlocks, this was entirely by accident.
Although this sort of accident should be rowdily celebrated on those
rare occasions when it does occur, such celebrations should be quickly
followed by a principled patch, which is what this patch purports to be.

The key point behind this patch is that when in_interrupt() returns
true, __raise_softirq_irqoff() will never attempt a wakeup.  Therefore,
if in_interrupt(), calls to raise_softirq*() are both safe and
extremely cheap.

This commit therefore replaces the in_irq() calls in the "if" statement
in rcu_read_unlock_special() with in_interrupt() and simplifies the
"if" condition to the following:

	if (irqs_were_disabled && use_softirq &&
	    (in_interrupt() ||
	     (exp && !t->rcu_read_unlock_special.b.deferred_qs))) {
		raise_softirq_irqoff(RCU_SOFTIRQ);
	} else {
		/* Appeal to the scheduler. */
	}

The rationale behind the "if" condition is as follows:

1.	irqs_were_disabled:  If interrupts are enabled, we should
	instead appeal to the scheduler so as to let the upcoming
	irq_enable()/local_bh_enable() do the rescheduling for us.
2.	use_softirq: If this kernel isn't using softirq, then
	raise_softirq_irqoff() will be unhelpful.
3.	a.	in_interrupt(): If this returns true, the subsequent
		call to raise_softirq_irqoff() is guaranteed not to
		do a wakeup, so that call will be both very cheap and
		quite safe.
	b.	Otherwise, if !in_interrupt() the raise_softirq_irqoff()
		might do a wakeup, which is expensive and, in some
		contexts, unsafe.
		i.	The "exp" (an expedited RCU grace period is being
			blocked) says that the wakeup is worthwhile, and:
		ii.	The !.deferred_qs says that scheduler locks
			cannot be held, so the wakeup will be safe.

Backporting this requires considerable care, so no auto-backport, please!

Fixes: 05f415715c ("rcu: Speed up expedited GPs when interrupting RCU reader")
Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-01 14:04:20 -07:00
Paul E. McKenney
d143b3d1cd rcu: Simplify rcu_read_unlock_special() deferred wakeups
In !use_softirq runs, we clearly cannot rely on raise_softirq() and
its lightweight bit setting, so we must instead do some form of wakeup.
In the absence of a self-IPI when interrupts are disabled, these wakeups
can be delayed until the next interrupt occurs.  This means that calling
invoke_rcu_core() doesn't actually do any expediting.

In this case, it is better to take the "else" clause, which sets the
current CPU's resched bits and, if there is an expedited grace period
in flight, uses IRQ-work to force the needed self-IPI.  This commit
therefore removes the "else if" clause that calls invoke_rcu_core().

Reported-by: Scott Wood <swood@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-01 14:04:20 -07:00
Thomas Gleixner
01b1d88b09 rcu: Use CONFIG_PREEMPTION
CONFIG_PREEMPTION is selected by CONFIG_PREEMPT and by
CONFIG_PREEMPT_RT. Both PREEMPT and PREEMPT_RT require the same
functionality which today depends on CONFIG_PREEMPT.

Switch the conditionals in RCU to use CONFIG_PREEMPTION.

That's the first step towards RCU on RT. The further tweaks are work in
progress. This neither touches the selftest bits which need a closer look
by Paul.
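
For illustration, the conversion consists of mechanical substitutions
of the following form (hypothetical hunk, not taken from the actual
patch):

	-#ifdef CONFIG_PREEMPT
	+#ifdef CONFIG_PREEMPTION

	-	if (IS_ENABLED(CONFIG_PREEMPT))
	+	if (IS_ENABLED(CONFIG_PREEMPTION))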

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20190726212124.210156346@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-07-31 19:03:35 +02:00
Juri Lelli
1a763fd7c6 rcu/tree: Set gp kthread to SCHED_FIFO outside of atomic region
sched_setscheduler() needs to acquire cpuset_rwsem, but it is currently
called from an invalid (atomic) context by rcu_spawn_gp_kthread().

Fix that by simply moving sched_setscheduler_nocheck() call outside of
the atomic region, as it doesn't actually require to be guarded by
rcu_node lock.
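
The shape of the fix is roughly as follows (sketch only, not the actual
patch):

	t = kthread_create(rcu_gp_kthread, NULL, "%s", rcu_state.name);
	if (kthread_prio) {
		sp.sched_priority = kthread_prio;
		/* Now invoked before acquiring rnp->lock. */
		sched_setscheduler_nocheck(t, SCHED_FIFO, &sp);
	}
	rnp = rcu_get_root();
	raw_spin_lock_irqsave_rcu_node(rnp, flags);
	rcu_state.gp_kthread = t;
	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
	wake_up_process(t);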

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bristot@redhat.com
Cc: claudio@evidence.eu.com
Cc: lizefan@huawei.com
Cc: longman@redhat.com
Cc: luca.abeni@santannapisa.it
Cc: mathieu.poirier@linaro.org
Cc: rostedt@goodmis.org
Cc: tj@kernel.org
Cc: tommaso.cucinotta@santannapisa.it
Link: https://lkml.kernel.org/r/20190719140000.31694-8-juri.lelli@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-07-25 15:55:03 +02:00
Ingo Molnar
83086d654d Merge branch 'for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
Pull rcu/next + tools/memory-model changes from Paul E. McKenney:

 - RCU flavor consolidation cleanups and optimizations
 - Documentation updates
 - Miscellaneous fixes
 - SRCU updates
 - RCU-sync flavor consolidation
 - Torture-test updates
 - Linux-kernel memory-consistency-model updates, most notably the addition of plain C-language accesses

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-28 19:46:47 +02:00
Paul E. McKenney
11ca7a9d54 Merge branches 'consolidate.2019.05.28a', 'doc.2019.05.28a', 'fixes.2019.06.13a', 'srcu.2019.05.28a', 'sync.2019.05.28a' and 'torture.2019.05.28a' into HEAD
consolidate.2019.05.28a: RCU flavor consolidation cleanups and optimizations.
doc.2019.05.28a: Documentation updates.
fixes.2019.06.13a: Miscellaneous fixes.
srcu.2019.05.28a: SRCU updates.
sync.2019.05.28a: RCU-sync flavor consolidation.
torture.2019.05.28a: Torture-test updates.
2019-06-19 09:21:46 -07:00
Paul E. McKenney
96050c68be rcu: Upgrade sync_exp_work_done() to smp_mb()
The sync_exp_work_done() function uses smp_mb__before_atomic(), but
there is no obvious atomic in the ensuing code.  The ordering is
absolutely required for grace periods to work correctly, so this
commit upgrades the smp_mb__before_atomic() to smp_mb().
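
Illustratively (not the actual hunk), the change is of the form:

	-	smp_mb__before_atomic(); /* Ensure ordering against later grace periods. */
	+	smp_mb(); /* Ensure ordering against later grace periods. */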

Fixes: 6fba2b3767 ("rcu: Remove deprecated RCU debugfs tracing code")
Reported-by: Andrea Parri <andrea.parri@amarulasolutions.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-06-13 15:33:19 -07:00
Paul E. McKenney
354ea05d02 rcutorture: Upper case solves the case of the vanishing NULL pointer
Various security techniques can obfuscate pointer printouts on the
console.  Unfortunately, rcutorture relies on either "null" or all zeroes
to identify the last few statistics printouts at the end of the test.
These need to be identified because failing to do so will result in
false-positive complaints about grace-period hangs.

This commit therefore prints the "ver:" in capitals ("VER:") when
the RCU-protected pointer has been set to NULL, which causes rcutorture's
parse-console.sh script to correctly ignore these lines.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-05-28 09:06:09 -07:00
Paul E. McKenney
34aa34b818 rcutorture: Dump trace buffer for callback pipe drain failures
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-05-28 09:06:09 -07:00
Paul E. McKenney
c682db558e rcutorture: Add trivial RCU implementation
I have been showing off a trivial RCU implementation for non-preemptive
environments for some time now:

	#define rcu_read_lock()
	#define rcu_read_unlock()
	#define rcu_dereference(p) READ_ONCE(p)
	#define rcu_assign_pointer(p, v) smp_store_release(&(p), (v))
	void synchronize_rcu(void)
	{
		int cpu;

		for_each_online_cpu(cpu)
			sched_setaffinity(current->pid, cpumask_of(cpu));
	}

Trivial or not, as the old saying goes, "if it ain't tested, it don't
work!".  This commit therefore adds a "trivial" flavor to rcutorture
and a corresponding TRIVIAL test scenario.  This variant does not handle
CPU hotplug, which is unconditionally enabled on x86 for post-v5.1-rc3
kernels, which is why the TRIVIAL.boot says "rcutorture.onoff_interval=0".
This commit actually does handle CONFIG_PREEMPT=y kernels, but only
because it turns back the Linux-kernel clock in order to provide these
alternative definitions (or the moral equivalent thereof):

	#define rcu_read_lock() preempt_disable()
	#define rcu_read_unlock() preempt_enable()

In CONFIG_PREEMPT=n kernels without debugging, these are equivalent to
empty macros give or take a compiler barrier.  However, they have been
successfully tested with actual empty macros as well.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
[ paulmck: Fix symbol issue reported by kbuild test robot <lkp@intel.com>. ]
[ paulmck: Work around sched_setaffinity() issue noted by Andrea Parri. ]
[ paulmck: Add rcutorture.shuffle_interval=0 to TRIVIAL.boot to fix
  interaction with shuffler task noted by Peter Zijlstra. ]
Tested-by: Andrea Parri <andrea.parri@amarulasolutions.com>
2019-05-28 09:06:09 -07:00
Paul E. McKenney
3432d765c5 rcutorture: Halt forward-progress checks at end of run
Once removed, an rcu_torture element can be deferred-freed by a chain
of call_rcu() invocations, with each callback invoking another round of
call_rcu() until either a fixed number of call_rcu() invocations have
been chained or until the test ends.  This means that if the test ends,
some of the rcu_torture elements will be "stranded" partway through the
deferred-free process, which results in false-positive warnings from
rcu_torture_writer() due to lack of forward progress should the test
end just at the end of a stutter interval.

This commit therefore suppresses rcu_torture_writer()'s forward-progress
checks when the test ends in order to avoid these false-positive reports.

Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-05-28 09:06:09 -07:00
Paul E. McKenney
ab21f6081f rcutorture: Give the scheduler a chance on PREEMPT && NO_HZ_FULL kernels
In PREEMPT kernels, cond_resched() is a no-op.  In NO_HZ_FULL kernels,
in-kernel execution (such as that of rcutorture's kthreads) might extend
indefinitely without the scheduler gaining the aid of a scheduling-clock
interrupt.  This combination can cause the interaction of an rcutorture
forward-progress test and a CPU-hotplug stop_machine operation to make
less forward progress than one might like.  Additionally, Sebastian Siewior
notes that NO_HZ_FULL kernels have a scheduler check upon return to
userspace execution, which suggests that in-kernel emulation of tight
userspace loops containing system calls doing call_rcu() might also need
explicit checks in the PREEMPT && NO_HZ_FULL case.

This commit therefore introduces a rcu_torture_fwd_prog_cond_resched()
function that explicitly invokes schedule() in such kernels whenever
need_resched() returns true, while retaining use of cond_resched()
for kernels that are either !PREEMPT or !NO_HZ_FULL.
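
A sketch of the new helper (approximate; not the actual patch):

	static void rcu_torture_fwd_prog_cond_resched(void)
	{
		if (IS_ENABLED(CONFIG_PREEMPT) && IS_ENABLED(CONFIG_NO_HZ_FULL)) {
			/* No scheduling-clock tick to help, so help explicitly. */
			if (need_resched())
				schedule();
		} else {
			cond_resched();
		}
	}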

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-05-28 09:06:09 -07:00
Paul E. McKenney
5eabea594b rcutorture: Exempt tasks RCU from timely draining of grace periods
After the end of each stutter pause interval, the rcu_torture_writer()
kthread checks to be sure that all prior callbacks have completed so
that all the test structures have been freed.  This works fine except
for tasks RCU, in which grace periods can take one good long time.
This commit therefore exempts tasks RCU from this check.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-05-28 09:06:09 -07:00
Paul E. McKenney
ff3bf92d90 torture: Allow inter-stutter interval to be specified
Currently, the inter-stutter interval is the same as the stutter duration,
that is, whatever number of jiffies is passed into torture_stutter_init().
This has worked well for quite some time, but the addition of
forward-progress testing to rcutorture can delay processes for several
seconds, which can triple the time that they are stuttered.

This commit therefore adds a second argument to torture_stutter_init()
that specifies the inter-stutter interval.  While locktorture preserves
the current behavior, rcutorture uses the RCU CPU stall warning interval
to provide a wider inter-stutter interval.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-05-28 09:06:09 -07:00
Paul E. McKenney
e8516c64fe rcutorture: Fix stutter_wait() return value and freelist checks
The stutter_wait() function is supposed to return true if it actually
waits and false otherwise, but it instead unconditionally returns false,
which hides a bug in rcu_torture_writer() that fails to account for
the fact that one of the rcu_tortures[] array elements will normally be
referenced by rcu_torture_current, and thus not be on the freelist.

This commit therefore corrects the stutter_wait() return value and adds a
check for rcu_torture_current to rcu_torture_writer()'s check that things
get freed after everything goes quiescent.  In addition, this commit
causes torture_stutter() to give a bit more than one second (instead of
only one jiffy) warning of the end of the stutter interval.  Finally,
this commit disables long-delay readers and aggressive update-side
forward-progress checks while forward-progress testing is in flight.

Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-05-28 09:06:09 -07:00
Paul E. McKenney
140e53f20b rcutorture: Add cond_resched() to forward-progress free-up loop
The rcu_torture_fwd_prog_cbfree() function frees callbacks used during
rcutorture's call_rcu() forward-progress test, but does so in a tight
loop.  This could cause problems given a very long list of callbacks to be
freed, and actual testing produces lists with as many as 25M callbacks.
This commit therefore adds a cond_resched() to this loop.  While in
the area, this commit also rearranges the lock releases to look a bit
more sane.
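
The resulting loop is roughly as follows (sketch; names approximate):

	for (;;) {
		spin_lock_irqsave(&rcu_fwd_lock, flags);
		rfcp = rcu_fwd_cb_head;
		if (!rfcp) {
			spin_unlock_irqrestore(&rcu_fwd_lock, flags);
			break;
		}
		rcu_fwd_cb_head = rfcp->rfc_next;
		spin_unlock_irqrestore(&rcu_fwd_lock, flags);
		kfree(rfcp);
		freed++;
		cond_resched();	/* 25M kfree()s deserve a breather. */
	}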

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-05-28 09:06:09 -07:00
Oleg Nesterov
89da3b94bb rcu/sync: Simplify the state machine
With this patch rcu_sync has a single state variable and the transition rules
become really simple:

	GP_IDLE   - owned by the first rcu_sync_enter() which moves it to

	GP_ENTER  - owned by rcu-callback which moves it to

	GP_PASSED - owned by the last rcu_sync_exit() which moves it to

	GP_EXIT   - and this is the only "nontrivial" state.

		rcu-callback moves it back to GP_IDLE unless another enter()
		comes before a GP pass.

		If rcu-callback is invoked before the next rcu_sync_exit() it
		must see gp_count incremented by that enter() and set GP_PASSED.

		Otherwise, if the next rcu_sync_exit() wins the race, it will
		move it to

	GP_REPLAY - owned by rcu-callback which moves it to GP_EXIT
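
In code, the state machine reduces to a single integer taking one of
these values (sketch):

	enum { GP_IDLE = 0, GP_ENTER, GP_PASSED, GP_EXIT, GP_REPLAY };

	struct rcu_sync {
		int			gp_state;
		int			gp_count;
		wait_queue_head_t	gp_wait;
		struct rcu_head		cb_head;
	};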

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
[ paulmck: While here, apply READ_ONCE() and WRITE_ONCE() to ->gp_state. ]
[ paulmck: Tweaks to make htmldocs happy. (Reported by kbuild test robot.) ]
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-05-28 09:05:23 -07:00
Oleg Nesterov
95bf33b55f rcu/sync: Kill rcu_sync_type/gp_type
Now that the RCU flavors have been consolidated, rcu_sync_type makes no
sense because none of internal update functions aside from .held() depend
on gp_type.  This commit therefore removes this field and consolidates
the relevant code.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
[ paulmck: Added RCU and RCU-bh checks to rcu_sync_is_idle(). ]
[ paulmck: And applied subsequent feedback from Oleg Nesterov. ]
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-05-28 09:05:23 -07:00
Jiang Biao
11b000457f rcu: Make __call_srcu static
Because __call_srcu() is not used outside kernel/rcu/srcutree.c,
this commit makes it static.

Signed-off-by: Jiang Biao <benbjiang@tencent.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-05-28 09:03:35 -07:00
Paul E. McKenney
fe15b50cde srcu: Allocate per-CPU data for DEFINE_SRCU() in modules
Adding DEFINE_SRCU() or DEFINE_STATIC_SRCU() to a loadable module requires
that the size of the reserved region be increased, which is not something
we want to be doing all that often.  One approach would be to require
that loadable modules define an srcu_struct and invoke init_srcu_struct()
from their module_init function and cleanup_srcu_struct() from their
module_exit function.  However, this is more than a bit user unfriendly.

This commit therefore creates an ___srcu_struct_ptrs linker section,
and pointers to srcu_struct structures created by DEFINE_SRCU() and
DEFINE_STATIC_SRCU() within a module are placed into that module's
___srcu_struct_ptrs section.  The required init_srcu_struct() and
cleanup_srcu_struct() functions are then automatically invoked as needed
when that module is loaded and unloaded, thus allowing modules to continue
to use DEFINE_SRCU() and DEFINE_STATIC_SRCU() while avoiding the need
to increase the size of the reserved region.

Many of the algorithms and some of the code was cheerfully cherry-picked
from other code making use of linker sections, perhaps most notably from
tracepoints.  All bugs are nevertheless the sole property of the author.
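
A condensed sketch of the mechanism (simplified; not the actual macros
or notifier):

	#define DEFINE_SRCU(name)						\
		struct srcu_struct name;					\
		static struct srcu_struct * const __srcu_struct_##name		\
			__section("___srcu_struct_ptrs") = &name

	/* At module load, a notifier walks the module's pointer array. */
	static int srcu_module_coming(struct module *mod)
	{
		struct srcu_struct **sspp = mod->srcu_struct_ptrs;
		int i;

		for (i = 0; i < mod->num_srcu_structs; i++)
			if (init_srcu_struct(*(sspp++)))
				return -ENOMEM;
		return 0;
	}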

Suggested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
[ paulmck: Use __section() and use "default" in srcu_module_notify()'s
  "switch" statement as suggested by Joel Fernandes. ]
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Tested-by: Joel Fernandes (Google) <joel@joelfernandes.org>
2019-05-28 09:03:35 -07:00
Paul E. McKenney
d5a9a8c3bc rcu: Set a maximum limit for back-to-back callback invocation
Currently, if a CPU has more than 10,000 callbacks pending, it will
increase rdp->blimit to LONG_MAX.  If you are lucky, LONG_MAX is only
about two billion, but this is still a bit too many callbacks to invoke
back-to-back while otherwise ignoring the world.

This commit therefore sets a maximum limit of DEFAULT_MAX_RCU_BLIMIT,
which is set to 10,000, for rdp->blimit.
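
The change boils down to the following (sketch; surrounding condition
approximate):

	#define DEFAULT_MAX_RCU_BLIMIT 10000	/* Maximum for ->blimit. */

	if (unlikely(rcu_segcblist_n_cbs(&rdp->cblist) > qhimark))
		rdp->blimit = DEFAULT_MAX_RCU_BLIMIT;	/* Formerly LONG_MAX. */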

Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-05-28 09:02:57 -07:00
Neeraj Upadhyay
3ae976a7e3 rcu: Correctly unlock root node in rcu_check_gp_start_stall()
On systems whose rcu_node tree has only one node, the
rcu_check_gp_start_stall() function's values of rnp and rnp_root will
be identical.  In this case, it clearly does not make sense to release
both rnp->lock and rnp_root->lock, but that is exactly what this function
does in the last early exit.  This commit therefore unlocks only rnp->lock
when rnp and rnp_root are equal.
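
The fixed early exit is roughly of the following form (sketch):

	if (rnp_root != rnp)
		raw_spin_unlock_rcu_node(rnp_root);	/* irqs remain disabled. */
	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);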

Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org>
Reviewed-by: Mukesh Ojha <mojha@codeaurora.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-05-28 09:02:57 -07:00
Neeraj Upadhyay
cd6d17b4a4 rcu: Dump specified number of blocked tasks
The dump_blkd_tasks() function dumps at most 10 blocked tasks, ignoring
the value of the ncheck parameter.  This commit therefore substitutes
the value of ncheck for the hard-coded value of 10.  Because all callers
currently pass 10 as the number, this patch does not change behavior,
but it is clearly an accident waiting to happen.

Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org>
Reviewed-by: Mukesh Ojha <mojha@codeaurora.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-05-28 09:02:57 -07:00
Jiang Biao
f0b6356273 rcu: Remove unused rdp local from synchronize_rcu_expedited()
Because rdp is initialized but never used in synchronize_rcu_expedited(),
this commit removes it.

Signed-off-by: Jiang Biao <benbjiang@tencent.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-05-28 08:48:19 -07:00
Paul E. McKenney
1bb336443c rcu: Rename rcu_data's ->deferred_qs to ->exp_deferred_qs
The rcu_data structure's ->deferred_qs field is used to indicate that the
current CPU is blocking an expedited grace period (perhaps a future one).
Given that it is used only for expedited grace periods, its current name
is misleading, so this commit renames it to ->exp_deferred_qs.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-05-28 08:48:19 -07:00
Joel Fernandes (Google)
eddded8012 rcu: Add checks for dynticks counters in rcu_is_cpu_rrupt_from_idle()
It would be good to combine the dynticks and dynticks_nesting counters
in order to simplify the code.  Unfortunately, there are concerns
about usermode upcalls appearing to RCU as half of an interrupt, as
Byungchul learned [1].  The "half" in "half interrupt" is due to an
unpaired rcu_irq_enter(): Normally, each rcu_irq_enter() has a later
call to rcu_irq_exit().

Out of an abundance of caution, Paul added warnings [2] in the RCU
code which if not fired by 2021 will be interpreted as meaning that
this half-interrupt scenario cannot happen any more, thus permitting
simplification of this code.

In the meantime, this commit makes the following changes:

(1) Combining these two counters requires that rcu_rrupt_from_idle()
    is invoked only from hard-interrupt contexts as discussed here [3].
    This commit therefore adds the required lockdep_assert_in_irq()
    to check this constraint.

(2) Furthermore, rcu_rrupt_from_idle() is not explicit about how it
    is using the counters which can lead to weird future bugs. This
    commit therefore adds comments indicating the meaning and use of
    each counter.

(3) Lastly, this commit checks for counter underflows as another check
    that half interrupts don't occur.  (Previously, the function would
    simply return true upon underflow.)

All these checks are no-ops if PROVE_LOCKING (and thus PROVE_RCU)
are disabled.
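
Roughly, the checked-up function ends up looking like this (sketch; not
the actual patch):

	static int rcu_is_cpu_rrupt_from_idle(void)
	{
		/* Usable only from hard-interrupt context. */
		lockdep_assert_in_irq();

		/* Underflow would indicate a "half interrupt". */
		RCU_LOCKDEP_WARN(__this_cpu_read(rcu_data.dynticks_nesting) < 0,
				 "RCU dynticks_nesting counter underflow!");
		RCU_LOCKDEP_WARN(__this_cpu_read(rcu_data.dynticks_nmi_nesting) <= 0,
				 "RCU dynticks_nmi_nesting counter underflow/zero!");

		/* Idle from RCU's viewpoint only in the outermost interrupt. */
		return __this_cpu_read(rcu_data.dynticks_nmi_nesting) == 1 &&
		       __this_cpu_read(rcu_data.dynticks_nesting) == 0;
	}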

[1] https://lore.kernel.org/patchwork/patch/952349/
[2] Commit e11ec65cc8 ("rcu: Add warning to detect half-interrupts")
[3] https://lore.kernel.org/lkml/20190312150514.GB249405@google.com/

Cc: byungchul.park@lge.com
Cc: kernel-team@android.com
Cc: rcu@vger.kernel.org
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-05-28 08:48:19 -07:00
Paul E. McKenney
e015a34112 rcu: Avoid self-IPI in sync_sched_exp_online_cleanup()
The sync_sched_exp_online_cleanup() is invoked at online time to handle
the case where the start of an expedited grace period ran concurrently
with a CPU being taken offline and then immediately being placed online.
It checks to see if RCU needs an expedited quiescent state from the
incoming CPU, sending it an IPI if so.  However, it is quite possible
that sync_sched_exp_online_cleanup() is running on that CPU, in which
case it is considerably less overhead to simply request the quiescent
state locally instead of simulating a self-IPI.

This commit therefore places the last few lines of rcu_exp_handler()
into a new rcu_exp_need_qs() function, which is invoked both by
rcu_exp_handler() and by sync_sched_exp_online_cleanup() in the self-IPI
case.

This also reduces the rcu_exp_handler() function's state space by
removing the direct call that this smp_call_function_single() uses to
emulate the requested self-IPI.  This in turn will allow tighter error
checking in rcu_is_cpu_rrupt_from_idle().

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
2019-05-25 14:50:50 -07:00
Paul E. McKenney
b9ad4d6ed1 rcu: Avoid self-IPI in sync_rcu_exp_select_node_cpus()
Although sync_rcu_exp_select_node_cpus() treats the current CPU as being
in a quiescent state, it might well migrate to some other CPU before
reaching the smp_call_function_single(), which could then result in an
unnecessary simulated self-IPI.  This commit therefore instead simply
refuses to invoke smp_call_function_single() on the current CPU, which
causes the later rcu_report_exp_cpu_mult() to report this CPU's quiescent
state with less overhead.

This also reduces the rcu_exp_handler() function's state space by removing
the direct call that this smp_call_function_single() uses to emulate the
requested self-IPI.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
[ paulmck: Use get_cpu() instead of preempt_disable() per Joel Fernandes. ]
2019-05-25 14:50:50 -07:00
Paul E. McKenney
43e903ad3e rcu: Inline invoke_rcu_callbacks() into its sole remaining caller
This commit saves a few lines of code by inlining invoke_rcu_callbacks()
into its sole remaining caller.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-05-25 14:50:49 -07:00
Paul E. McKenney
0864f057b0 rcu: Use irq_work to get scheduler's attention in clean context
When rcu_read_unlock_special() is invoked with interrupts disabled, is
either not in an interrupt handler or is not using RCU_SOFTIRQ, is not
the first RCU read-side critical section in the chain, and either there
is an expedited grace period in flight or this is a NO_HZ_FULL kernel,
the end of the grace period can be unduly delayed.  The reason for this
is that it is not safe to do wakeups in this situation.

This commit fixes this problem by using the irq_work subsystem to
force a later interrupt handler in a clean environment.  Because
set_tsk_need_resched(current) and set_preempt_need_resched() are
invoked prior to this, the scheduler will force a context switch
upon return from this interrupt (though perhaps at the end of any
interrupted preempt-disable or BH-disable region of code), which will
invoke rcu_note_context_switch() (again in a clean environment), which
will in turn give RCU the chance to report the deferred quiescent state.

Of course, by then this task might be within another RCU read-side
critical section.  But that will be detected at that time and reporting
will be further deferred to the outermost rcu_read_unlock().  See
rcu_preempt_need_deferred_qs() and rcu_preempt_deferred_qs() for more
details on the checking.
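
In rcu_read_unlock_special(), the new path looks roughly like this
(sketch; field names approximate):

	set_tsk_need_resched(current);
	set_preempt_need_resched();
	if (IS_ENABLED(CONFIG_IRQ_WORK) && irqs_were_disabled &&
	    !rdp->defer_qs_iw_pending && exp) {
		/* Get the scheduler to re-evaluate and call RCU's hooks. */
		init_irq_work(&rdp->defer_qs_iw,
			      rcu_preempt_deferred_qs_handler);
		rdp->defer_qs_iw_pending = true;
		irq_work_queue_on(&rdp->defer_qs_iw, rdp->cpu);
	}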

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-05-25 14:50:49 -07:00
Paul E. McKenney
385b599e8c rcu: Allow rcu_read_unlock_special() to raise_softirq() if in_irq()
When running in an interrupt handler, raise_softirq() and
raise_softirq_irqoff() have extremely low overhead: They simply set a
bit in a per-CPU mask, which is checked upon exit from that interrupt
handler.  Therefore, if rcu_read_unlock_special() is invoked within an
interrupt handler and RCU_SOFTIRQ is in use, this commit makes use of
raise_softirq_irqoff() even if there is no expedited grace period in
flight and even if this is not a nohz_full CPU.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-05-25 14:50:48 -07:00
Paul E. McKenney
25102de65f rcu: Only do rcu_read_unlock_special() wakeups if expedited
Currently, rcu_read_unlock_special() will do wakeups whenever it is safe
to do so.  However, wakeups are expensive, and they are only really
needed when the just-ended RCU read-side critical section is blocking
an expedited grace period (in which case speed is of the essence)
or on a nohz_full CPU (where it might be a good long time before an
interrupt arrives).  This commit therefore checks for these conditions,
and does the expensive wakeups only if doing so would be useful.

Note it can be rather expensive to determine whether or not the current
task (as opposed to the current CPU) is blocking the current expedited
grace period.  Doing so requires traversing the ->blkd_tasks list, which
can be quite long.  This commit therefore cheats:  If the current task
is on a given ->blkd_tasks list, and some task on that list is blocking
the current expedited grace period, the code assumes that the current
task is blocking that expedited grace period.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-05-25 14:50:48 -07:00
Paul E. McKenney
23634ebc1d rcu: Check for wakeup-safe conditions in rcu_read_unlock_special()
When RCU core processing is offloaded from RCU_SOFTIRQ to the rcuc
kthreads, a full and unconditional wakeup is required to initiate RCU
core processing.  In contrast, when RCU core processing is carried
out by RCU_SOFTIRQ, a raise_softirq() suffices.  Of course, there are
situations where raise_softirq() does a full wakeup, but these do not
occur with normal usage of rcu_read_unlock().

The reason that full wakeups can be problematic is that the scheduler
sometimes invokes rcu_read_unlock() with its pi or rq locks held,
which can of course result in deadlock in CONFIG_PREEMPT=y kernels when
rcu_read_unlock() invokes the scheduler.  Scheduler invocations can happen
in the following situations: (1) The just-ended reader has been subjected
to RCU priority boosting, in which case rcu_read_unlock() must deboost,
(2) Interrupts were disabled across the call to rcu_read_unlock(), so
the quiescent state must be deferred, requiring a wakeup of the rcuc
kthread corresponding to the current CPU.

Now, the scheduler may hold one of its locks across rcu_read_unlock()
only if preemption has been disabled across the entire RCU read-side
critical section, which in the days prior to RCU flavor consolidation
meant that rcu_read_unlock() never needed to do wakeups.  However, this
is no longer the case for any but the first rcu_read_unlock() following a
condition (e.g., preempted RCU reader) requiring special rcu_read_unlock()
attention.  For example, an RCU read-side critical section might be
preempted, but preemption might be disabled across the rcu_read_unlock().
The rcu_read_unlock() must defer the quiescent state, and therefore
leaves the task queued on its leaf rcu_node structure.  If a scheduler
interrupt occurs, the scheduler might well invoke rcu_read_unlock() with
one of its locks held.  However, the preempted task is still queued, so
rcu_read_unlock() will attempt to defer the quiescent state once more.
When RCU core processing is carried out by RCU_SOFTIRQ, this works just
fine: The raise_softirq() function simply sets a bit in a per-CPU mask
and the RCU core processing will be undertaken upon return from interrupt.

Not so when RCU core processing is carried out by the rcuc kthread: In this
case, the required wakeup can result in deadlock.

The initial solution to this problem was to use set_tsk_need_resched() and
set_preempt_need_resched() to force a future context switch, which allows
rcu_preempt_note_context_switch() to report the deferred quiescent state
to RCU's core processing.  Unfortunately for expedited grace periods,
there can be a significant delay between the call for a context switch
and the actual context switch.

This commit therefore introduces a ->deferred_qs flag to the task_struct
structure's rcu_special structure.  This flag is initially false, and
is set to true by the first call to rcu_read_unlock() requiring special
attention, then finally reset back to false when the quiescent state is
finally reported.  Then rcu_read_unlock() attempts full wakeups only when
->deferred_qs is false, that is, on the first rcu_read_unlock() requiring
special attention.  Note that a chain of RCU readers linked by some other
sort of reader may find that a later rcu_read_unlock() is once again able
to do a full wakeup, courtesy of an intervening preemption:

	rcu_read_lock();
	/* preempted */
	local_irq_disable();
	rcu_read_unlock(); /* Can do full wakeup, sets ->deferred_qs. */
	rcu_read_lock();
	local_irq_enable();
	preempt_disable();
	rcu_read_unlock(); /* Cannot do full wakeup, ->deferred_qs set. */
	rcu_read_lock();
	preempt_enable();
	/* preempted, ->deferred_qs reset. */
	local_irq_disable();
	rcu_read_unlock(); /* Can again do full wakeup, sets ->deferred_qs. */

Such linked RCU readers do not yet seem to appear in the Linux kernel, and
it is probably best if they don't.  However, RCU needs to handle them, and
some variations on this theme could make even raise_softirq() unsafe due to
the possibility of its doing a full wakeup.  This commit therefore also
avoids invoking raise_softirq() when the ->deferred_qs flag is set.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2019-05-25 14:50:47 -07:00
Sebastian Andrzej Siewior
48d07c04b4 rcu: Enable elimination of Tree-RCU softirq processing
Some workloads need to change kthread priority for RCU core processing
without affecting other softirq work.  This commit therefore introduces
the rcutree.use_softirq kernel boot parameter, which moves the RCU core
work from softirq to a per-CPU SCHED_OTHER kthread named rcuc.  Use of
the SCHED_OTHER approach avoids the scalability problems that appeared
with the earlier attempt to move RCU core processing from softirq to
kthreads.  That said, kernels built with RCU_BOOST=y will run the
rcuc kthreads at the RCU-boosting priority.

Note that rcutree.use_softirq=0 must be specified to move RCU core
processing to the rcuc kthreads: rcutree.use_softirq=1 is the default.

Reported-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
[ paulmck: Adjust for invoke_rcu_callbacks() only ever being invoked
  from RCU core processing, in contrast to softirq->rcuc transition
  in old mainline RCU priority boosting. ]
[ paulmck: Avoid wakeups when scheduler might have invoked rcu_read_unlock()
  while holding rq or pi locks, also possibly fixing a pre-existing latent
  bug involving raise_softirq()-induced wakeups. ]
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-05-25 14:50:46 -07:00
Thomas Gleixner
ec8f24b7fa treewide: Add SPDX license identifier - Makefile/Kconfig
Add SPDX license identifiers to all Make/Kconfig files which:

 - Have no license information of any form

These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:

  GPL-2.0-only
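
In a Makefile or Kconfig file, the identifier takes the form of a
leading comment line:

	# SPDX-License-Identifier: GPL-2.0-only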

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-21 10:50:46 +02:00
Linus Torvalds
d2d8b14604 The major changes in this tracing update includes:
- Removing of non-DYNAMIC_FTRACE from 32bit x86
 
  - Removing of mcount support from x86
 
  - Emulating a call from int3 on x86_64, fixes live kernel patching
 
  - Consolidated Tracing Error logs file
 
 Minor updates:
 
  - Removal of klp_check_compiler_support()
 
  - kdb ftrace dumping output changes
 
  - Accessing and creating ftrace instances from inside the kernel
 
  - Clean up of #define if macro
 
  - Introduction of TRACE_EVENT_NOP() to disable trace events based on config
    options
 
 And other minor fixes and clean ups

Merge tag 'trace-v5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:
 "The major changes in this tracing update includes:

   - Removal of non-DYNAMIC_FTRACE from 32bit x86

   - Removal of mcount support from x86

   - Emulating a call from int3 on x86_64, fixes live kernel patching

   - Consolidated Tracing Error logs file

  Minor updates:

   - Removal of klp_check_compiler_support()

   - kdb ftrace dumping output changes

   - Accessing and creating ftrace instances from inside the kernel

   - Clean up of #define if macro

   - Introduction of TRACE_EVENT_NOP() to disable trace events based on
     config options

  And other minor fixes and clean ups"

* tag 'trace-v5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (44 commits)
  x86: Hide the int3_emulate_call/jmp functions from UML
  livepatch: Remove klp_check_compiler_support()
  ftrace/x86: Remove mcount support
  ftrace/x86_32: Remove support for non DYNAMIC_FTRACE
  tracing: Simplify "if" macro code
  tracing: Fix documentation about disabling options using trace_options
  tracing: Replace kzalloc with kcalloc
  tracing: Fix partial reading of trace event's id file
  tracing: Allow RCU to run between postponed startup tests
  tracing: Fix white space issues in parse_pred() function
  tracing: Eliminate const char[] auto variables
  ring-buffer: Fix mispelling of Calculate
  tracing: probeevent: Fix to make the type of $comm string
  tracing: probeevent: Do not accumulate on ret variable
  tracing: uprobes: Re-enable $comm support for uprobe events
  ftrace/x86_64: Emulate call function while updating in breakpoint handler
  x86_64: Allow breakpoints to emulate call instructions
  x86_64: Add gap to int3 to allow for call emulation
  tracing: kdb: Allow ftdump to skip all but the last few entries
  tracing: Add trace_total_entries() / trace_total_entries_cpu()
  ...
2019-05-15 16:05:47 -07:00
Linus Torvalds
0968621917 Printk changes for 5.2

Merge tag 'printk-for-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk

Pull printk updates from Petr Mladek:

 - Allow state reset of printk_once() calls.

 - Prevent crashes when dereferencing invalid pointers in vsprintf().
   Only the first byte is checked for simplicity.

 - Make vsprintf warnings consistent and inlined.

 - Treewide conversion of obsolete %pf, %pF to %ps, %pF printf
   modifiers.

 - Some clean up of vsprintf and test_printf code.

* tag 'printk-for-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk:
  lib/vsprintf: Make function pointer_string static
  vsprintf: Limit the length of inlined error messages
  vsprintf: Avoid confusion between invalid address and value
  vsprintf: Prevent crash when dereferencing invalid pointers
  vsprintf: Consolidate handling of unknown pointer specifiers
  vsprintf: Factor out %pO handler as kobject_string()
  vsprintf: Factor out %pV handler as va_format()
  vsprintf: Factor out %p[iI] handler as ip_addr_string()
  vsprintf: Do not check address of well-known strings
  vsprintf: Consistent %pK handling for kptr_restrict == 0
  vsprintf: Shuffle restricted_pointer()
  printk: Tie printk_once / printk_deferred_once into .data.once for reset
  treewide: Switch printk users from %pf and %pF to %ps and %pS, respectively
  lib/test_printf: Switch to bitmap_zalloc()
2019-05-07 09:18:12 -07:00
Paul E. McKenney
6cdbc07a5a Merge branches 'consolidate.2019.04.09a', 'doc.2019.03.26b', 'fixes.2019.03.26b', 'srcu.2019.03.26b', 'stall.2019.03.26b' and 'torture.2019.03.26b' into HEAD
consolidate.2019.04.09a: Lingering RCU flavor consolidation cleanups.
doc.2019.03.26b: Documentation updates.
fixes.2019.03.26b: Miscellaneous fixes.
srcu.2019.03.26b: SRCU updates.
stall.2019.03.26b: RCU CPU stall warning updates.
torture.2019.03.26b: Torture-test updates.
2019-04-09 08:08:13 -07:00
Sakari Ailus
d75f773c86 treewide: Switch printk users from %pf and %pF to %ps and %pS, respectively
%pF and %pf are functionally equivalent to %pS and %ps conversion
specifiers. The former are deprecated, therefore switch the current users
to use the preferred variant.

The changes have been produced by the following command:

	git grep -l '%p[fF]' | grep -v '^\(tools\|Documentation\)/' | \
	while read i; do perl -i -pe 's/%pf/%ps/g; s/%pF/%pS/g;' $i; done

And verifying the result.

Link: http://lkml.kernel.org/r/20190325193229.23390-1-sakari.ailus@linux.intel.com
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: sparclinux@vger.kernel.org
Cc: linux-um@lists.infradead.org
Cc: xen-devel@lists.xenproject.org
Cc: linux-acpi@vger.kernel.org
Cc: linux-pm@vger.kernel.org
Cc: drbd-dev@lists.linbit.com
Cc: linux-block@vger.kernel.org
Cc: linux-mmc@vger.kernel.org
Cc: linux-nvdimm@lists.01.org
Cc: linux-pci@vger.kernel.org
Cc: linux-scsi@vger.kernel.org
Cc: linux-btrfs@vger.kernel.org
Cc: linux-f2fs-devel@lists.sourceforge.net
Cc: linux-mm@kvack.org
Cc: ceph-devel@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Acked-by: David Sterba <dsterba@suse.com> (for btrfs)
Acked-by: Mike Rapoport <rppt@linux.ibm.com> (for mm/memblock.c)
Acked-by: Bjorn Helgaas <bhelgaas@google.com> (for drivers/pci)
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
2019-04-09 14:19:06 +02:00
Yafang Shao
4f5fbd78a7 rcu: validate arguments for rcu tracepoints
When CONFIG_RCU_TRACE is not set, all these tracepoints are defined as
do-nothing macros.  We'd better make them inline functions that take
proper arguments.

Because RCU_TRACE() is likewise defined as a do-nothing macro when
CONFIG_RCU_TRACE is not set, we can clean it up as well.

Link: http://lkml.kernel.org/r/1553602391-11926-4-git-send-email-laoar.shao@gmail.com

Reviewed-by: Paul E. McKenney <paulmck@linux.ibm.com>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-04-08 09:22:51 -04:00
Paul E. McKenney
ad092c0277 rcuperf: Fix cleanup path for invalid perf_type strings
If the specified rcuperf.perf_type is not in the rcu_perf_init()
function's perf_ops[] array, rcuperf prints some console messages and
then invokes rcu_perf_cleanup() to set state so that a future torture
test can run.  However, rcu_perf_cleanup() also attempts to end the
test that didn't actually start, and in doing so relies on the value
of cur_ops, a value that is not particularly relevant in this case.
This can result in confusing output or even follow-on failures due to
attempts to use facilities that have not been properly initialized.

This commit therefore sets the value of cur_ops to NULL in this case and
inserts a check near the beginning of rcu_perf_cleanup(), thus avoiding
relying on an irrelevant cur_ops value.
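
The added check is of the following form (sketch):

	static void rcu_perf_cleanup(void)
	{
		...
		if (!cur_ops) {
			torture_cleanup_end();
			return;
		}
		...
	}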

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-03-26 14:42:53 -07:00
Paul E. McKenney
b813afae7a rcutorture: Fix cleanup path for invalid torture_type strings
If the specified rcutorture.torture_type is not in the rcu_torture_init()
function's torture_ops[] array, rcutorture prints some console messages
and then invokes rcu_torture_cleanup() to set state so that a future
torture test can run.  However, rcu_torture_cleanup() also attempts to
end the test that didn't actually start, and in doing so relies on the
value of cur_ops, a value that is not particularly relevant in this case.
This can result in confusing output or even follow-on failures due to
attempts to use facilities that have not been properly initialized.

This commit therefore sets the value of cur_ops to NULL in this case
and inserts a check near the beginning of rcu_torture_cleanup(),
thus avoiding relying on an irrelevant cur_ops value.

Reported-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-03-26 14:42:53 -07:00
Neeraj Upadhyay
d44ac1bebc rcutorture: Fix expected forward progress duration in OOM notifier
The rcutorture_oom_notify() function has a misplaced close parenthesis
that results in increasingly long delays in rcu_fwd_progress_check()'s
checking for various RCU forward-progress problems.  This commit therefore
puts the parenthesis in the right place.

Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-03-26 14:42:53 -07:00
Paul E. McKenney
f47cb1bb0d rcutorture: Remove ->ext_irq_conflict field
Back when there was a separate RCU-bh flavor, the ->ext_irq_conflict
field was used to prevent executing local_bh_enable() while interrupts
were disabled.  However, there is no longer an RCU-bh flavor, so this
commit removes the no-longer-needed ->ext_irq_conflict field.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-03-26 14:42:53 -07:00
Paul E. McKenney
a3b0e1e59e rcutorture: Make rcutorture_extend_mask() comment match the code
The code actually rarely uses more than one type of RCU read-side
protection, as is actually desired given that we need some reasonable
probability of preempting RCU read-side critical sections, which cannot
happen with multiple types of protection.  This commit therefore adjusts
the comment.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-03-26 14:42:53 -07:00
Neeraj Upadhyay
6c70e9cd5f rcu: Fix nohz status in stall warning
The Documentation/RCU/stallwarn.txt file says that stall warnings
print "D" if dyntick-idle processing is enabled, but the code in
print_cpu_stall_fast_no_hz() prints "." instead.  This commit therefore
reverses the sense of the test to make the code match the documentation.

Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-03-26 14:42:00 -07:00
Paul E. McKenney
b51bcbbf16 rcu: Move forward-progress checkers into tree_stall.h
This commit further consolidates stall-warning functionality by moving
forward-progress checkers into kernel/rcu/tree_stall.h, updating a
comment or two while in the area.  More specifically, this commit moves
show_rcu_gp_kthreads(), rcu_check_gp_start_stall(), rcu_fwd_progress_check(),
sysrq_rcu, sysrq_show_rcu(), sysrq_rcudump_op, and rcu_sysrq_init() from
kernel/rcu/tree.c to kernel/rcu/tree_stall.h.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-03-26 14:40:13 -07:00
Paul E. McKenney
7ac1907c9e rcu: Move irq-disabled stall-warning checking to tree_stall.h
The rcu_iw_handler() function's sole purpose in life is to indicate
whether a stalled CPU had interrupts disabled, so it belongs in
kernel/rcu/tree_stall.h.  This commit therefore makes that move,
clarifying its header comment while in the area.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-03-26 14:40:13 -07:00
Paul E. McKenney
e23344c2ca rcu: Organize functions in tree_stall.h
This commit does only code movement, removal of now-unneeded forward
declarations, and addition of comments.  It organizes the functions
that implement RCU CPU stall warnings for normal grace periods into
three categories:

1.	Control of RCU CPU stall warnings, including computing timeouts.

2.	Interaction of stall warnings with grace periods.

3.	Actual printing of the RCU CPU stall-warning messages.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-03-26 14:40:13 -07:00
Paul E. McKenney
59b73a2768 rcu: Move FAST_NO_HZ stall-warning code to tree_stall.h
This commit further consolidates the stall-warning code by moving
print_cpu_stall_info() and its helper functions along with
zero_cpu_stall_ticks() to kernel/rcu/tree_stall.h.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-03-26 14:40:13 -07:00
Paul E. McKenney
40e69ac7d0 rcu: Inline RCU stall-warning info helper functions
The print_cpu_stall_info_begin() and print_cpu_stall_info_end() functions
print a single character each onto the console, and are a holdover from a time
when RCU CPU stall warning messages could be abbreviated using a long-gone
Kconfig option.  This commit therefore adds these single characters to
already-printed strings in the calling functions, and then eliminates
both print_cpu_stall_info_begin() and print_cpu_stall_info_end().

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-03-26 14:40:13 -07:00
Paul E. McKenney
d87cda5094 rcu: Move rcu_print_task_exp_stall() to tree_exp.h
Because expedited CPU stall warnings are contained within the
kernel/rcu/tree_exp.h file, rcu_print_task_exp_stall() should live
there too.  This commit carries out the required code motion.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-03-26 14:40:13 -07:00
Paul E. McKenney
21d0d79ab0 rcu: Inline RCU task stall-warning helper functions
The rcu_print_detail_task_stall(), rcu_print_task_stall_begin(), and
rcu_print_task_stall_end() functions were defined to allow long-gone
Kconfig options to provide an abbreviated RCU CPU stall warning printout.
This commit saves a few lines of code by inlining them into their sole
callers.

While in the area, a useless call of rcu_print_detail_task_stall_rnp()
on the root rcu_node structure was eliminated.  If there is only one
rcu_node structure, its tasks get printed twice, but if there are more,
the root rcu_node structure is guaranteed to have an empty list of blocked
tasks, hence the uselessness.  (Long ago, root rcu_node structures with
non-empty ->blkd_tasks lists could happen, but no longer.)

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-03-26 14:40:13 -07:00
Paul E. McKenney
32255d51b6 rcu: Move RCU CPU stall-warning code out of tree.c
This commit completes the process of consolidating the code for RCU CPU
stall warnings for normal grace periods by moving the remaining such
code from kernel/rcu/tree.c to kernel/rcu/tree_stall.h.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-03-26 14:40:13 -07:00
Paul E. McKenney
3fc3d1709f rcu: Move RCU CPU stall-warning code out of tree_plugin.h
The RCU CPU stall-warning code for normal grace periods is currently
scattered across two files, due to earlier Tiny RCU support for RCU
CPU stall warnings and for old Kconfig options that have long since
been retired.  Given that it is hard for the lead RCU maintainer to
find relevant stall-warning code, it would be good to consolidate it.
This commit continues this process by moving stall-warning code from
kernel/rcu/tree_plugin.h to a new kernel/rcu/tree_stall.h file.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-03-26 14:40:13 -07:00
Paul E. McKenney
10462d6f58 rcu: Move RCU CPU stall-warning code out of update.c
The RCU CPU stall-warning code for normal grace periods is currently
scattered across three files, due to earlier Tiny RCU support for RCU
CPU stall warnings and for old Kconfig options that have long since
been retired.  Given that it is hard for the lead RCU maintainer to
find relevant stall-warning code, it would be good to consolidate it.
This commit starts this process by moving stall-warning code from
kernel/rcu/update.c to a new kernel/rcu/tree_stall.h file.

Note that the definitions of rcu_cpu_stall_suppress and
rcu_cpu_stall_timeout must remain in kernel/rcu/update.c to provide
compatibility for kernel boot parameter lists.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-03-26 14:40:13 -07:00
Paul E. McKenney
f5ad399149 srcu: Remove cleanup_srcu_struct_quiesced()
The cleanup_srcu_struct_quiesced() function was added because NVME
used WQ_MEM_RECLAIM workqueues and SRCU did not, which meant that
NVME workqueues waiting on SRCU workqueues could result in deadlocks
during low-memory conditions.  However, SRCU now also has WQ_MEM_RECLAIM
workqueues, so there is no longer a potential for deadlock.  Furthermore,
it turns out to be extremely hard to use cleanup_srcu_struct_quiesced()
correctly due to the fact that SRCU callback invocation accesses the
srcu_struct structure's per-CPU data area just after callbacks are
invoked.  Therefore, the usual practice of using srcu_barrier() to wait
for callbacks to be invoked before invoking cleanup_srcu_struct_quiesced()
fails because SRCU's callback-invocation workqueue handler might be
delayed, which can result in cleanup_srcu_struct_quiesced() being invoked
(and thus freeing the per-CPU data) before the SRCU's callback-invocation
workqueue handler is finished using that per-CPU data.  Nor is this a
theoretical problem: KASAN emitted use-after-free warnings because of
this problem on actual runs.

In short, NVME can now safely invoke cleanup_srcu_struct(), which
avoids the use-after-free scenario.  And cleanup_srcu_struct_quiesced()
is quite difficult to use safely.  This commit therefore removes
cleanup_srcu_struct_quiesced(), switching its sole user back to
cleanup_srcu_struct().  This effectively reverts the following pair
of commits:

f7194ac32c ("srcu: Add cleanup_srcu_struct_quiesced()")
4317228ad9 ("nvme: Avoid flush dependency in delete controller flow")

Reported-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Bart Van Assche <bvanassche@acm.org>
2019-03-26 14:39:24 -07:00
Paul E. McKenney
5cdfd174ea srcu: Check for in-flight callbacks in _cleanup_srcu_struct()
If someone fails to drain the corresponding SRCU callbacks (for
example, by failing to invoke srcu_barrier()) before invoking either
cleanup_srcu_struct() or cleanup_srcu_struct_quiesced(), the resulting
diagnostic is an ambiguous use-after-free diagnostic, and even then
only if you are running something like KASAN.  This commit therefore
improves SRCU diagnostics by adding checks for in-flight callbacks at
_cleanup_srcu_struct() time.

Note that these diagnostics can still be defeated, for example, by
invoking call_srcu() concurrently with cleanup_srcu_struct().  Which is
a really bad idea, but sometimes all too easy to do.  But even then,
these diagnostics have at least some probability of catching the problem.

Reported-by: Sagi Grimberg <sagi@grimberg.me>
Reported-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Tested-by: Bart Van Assche <bvanassche@acm.org>
2019-03-26 14:39:24 -07:00
Paul E. McKenney
add0d37b4f rcu: Correct READ_ONCE()/WRITE_ONCE() for ->rcu_read_unlock_special
The task_struct structure's ->rcu_read_unlock_special field is only ever
read or written by the owning task, but it is accessed both at process
and interrupt levels.  It may therefore be accessed using plain reads
and writes while interrupts are disabled, but must be accessed using
READ_ONCE() and WRITE_ONCE() or better otherwise.  This commit makes a
few adjustments to align with this discipline.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-03-26 14:38:38 -07:00
Paul E. McKenney
f1a98045ab rcu: Fix typo in tree_exp.h comment
This commit changes a rcu_exp_handler() comment from rcu_preempt_defer_qs()
to rcu_preempt_deferred_qs() in order to better match reality.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-03-26 14:38:38 -07:00
Paul E. McKenney
a2badefa85 rcu: Eliminate redundant NULL-pointer check
Because rcu_wake_cond() checks for a null task_struct pointer, there is
no need for its callers to do so.  This commit eliminates the redundant
check.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-03-26 14:38:38 -07:00
Zhouyi Zhou
5d8a752e31 rcu: Fix force_qs_rnp() header comment
Previously, threads blocked on offlining CPUS were migrated to the
root rcu_node structure, thus requiring RCU priority boosting on this
structure.  However, since commit d19fb8d1f3 ("rcu: Don't migrate
blocked tasks even if all corresponding CPUs offline"), RCU does not
migrate blocked tasks.  Consequently, RCU no longer does RCU priority
boosting on the root rcu_node structure as of commit 1be0085b51 ("rcu:
Don't initiate RCU priority boosting on root rcu_node").

This commit therefore brings the force_qs_rnp() function's header
comment in line with this new no-root-boosting reality.

Signed-off-by: Zhouyi Zhou <zhouzhouyi@gmail.com>
[ paulmck: Also remove obsolete comment on suppressing new grace periods. ]
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-03-26 14:38:38 -07:00
Paul E. McKenney
85f2b60c43 rcu: Update jiffies_to_sched_qs and adjust_jiffies_till_sched_qs() comments
This commit better documents the jiffies_to_sched_qs default-value
strategy used by adjust_jiffies_till_sched_qs().

Reported-by: Joel Fernandes <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-03-26 14:38:38 -07:00
Neeraj Upadhyay
6973032a60 rcu: Default jiffies_to_sched_qs to jiffies_till_sched_qs
The current code only calls adjust_jiffies_till_sched_qs() if
jiffies_till_sched_qs is left at its default value, so when the
jiffies_till_sched_qs kernel-boot parameter actually is specified,
jiffies_to_sched_qs will be left with the value zero, which
will result in useless slowdowns of cond_resched().  This commit
therefore changes rcu_init_geometry() to unconditionally invoke
adjust_jiffies_till_sched_qs(), which ensures that jiffies_to_sched_qs
will be initialized in all cases, thus maintaining good cond_resched()
performance.

Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-03-26 14:38:38 -07:00
Neeraj Upadhyay
0f58d2ac2c rcu: Fix self-wakeups for grace-period kthread
The current rcu_gp_kthread_wake() function uses in_interrupt()
and thus does a self-wakeup from all interrupt contexts, including
the pointless case where the GP kthread happens to be running with
bottom halves disabled, along with the impossible case where the GP
kthread is running within an NMI handler (you are not supposed to invoke
rcu_gp_kthread_wake() from within an NMI handler).  This commit therefore
replaces the in_interrupt() with in_irq(), so that the self-wakeups
happen only from handlers for hardware interrupts and softirqs.
This also makes the code match the comment.

Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-03-26 14:37:49 -07:00
Paul E. McKenney
497e42600b rcu: Report error for bad rcu_nocbs= parameter values
This commit prints a console message when cpulist_parse() reports a
bad list of CPUs, and sets all CPUs' bits in that case.  The reason for
setting all CPUs' bits is that this is the safe(r) choice for real-time
workloads, which would normally be the ones using the rcu_nocbs= kernel
boot parameter.  Either way, later RCU console log messages list the
actual set of CPUs whose RCU callbacks will be offloaded.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-03-26 14:37:49 -07:00
Paul E. McKenney
da8739f23f rcu: Allow rcu_nocbs= to specify all CPUs
Currently, the rcu_nocbs= kernel boot parameter requires that a specific
list of CPUs be specified, and has no way to say "all of them".
As noted by user RavFX in a comment to Phoronix topic 1002538, this
is an inconvenient side effect of the removal of the RCU_NOCB_CPU_ALL
Kconfig option.  This commit therefore enables the rcu_nocbs= kernel boot
parameter to be given the string "all", as in "rcu_nocbs=all" to specify
that all CPUs on the system are to have their RCU callbacks offloaded.

Another approach would be to make cpulist_parse() check for "all", but
there are uses of cpulist_parse() that do other checking, which could
conflict with an "all".  This commit therefore focuses on the specific
use of cpulist_parse() in rcu_nocb_setup().
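
A hedged sketch of the resulting boot-parameter handling, with names
simplified from the real rcu_nocb_setup():

    static cpumask_var_t my_nocb_mask;

    static int __init my_nocb_setup(char *str)
    {
        alloc_bootmem_cpumask_var(&my_nocb_mask);
        if (!strcasecmp(str, "all"))             /* rcu_nocbs=all */
            cpumask_setall(my_nocb_mask);
        else if (cpulist_parse(str, my_nocb_mask)) {
            pr_warn("rcu_nocbs= bad CPU range, all CPUs set\n");
            cpumask_setall(my_nocb_mask);        /* safe(r) fallback */
        }
        return 1;
    }
    __setup("rcu_nocbs=", my_nocb_setup);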

Just a note to other people who would like changes to Linux-kernel RCU:
If you send your requests to me directly, they might get fixed somewhat
faster.  RavFX's comment was posted on January 22, 2018 and I first saw
it on March 5, 2019.  And the only reason that I found it -at- -all- was
that I was looking for projects using RCU, and my search engine showed
me that Phoronix comment quite by accident.  Your choice, though!  ;-)

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-03-26 14:37:49 -07:00
Akira Yokosawa
b2eb85b49a rcu: Move common code out of if-else block
As a result of the recent addition of "rdp->core_needs_qs = false;" to
the "if" block, both branches of the if-else now contain the same
assignment.

Factor it out to reduce the line count.
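
A hedged before/after sketch of the cleanup (the condition name is
illustrative):

    /* Before: the same store appears in both branches. */
    if (my_cond) {
        rdp->core_needs_qs = false;
        /* ...report the quiescent state up the tree... */
    } else {
        rdp->core_needs_qs = false;
    }

    /* After: hoist the common store above the if. */
    rdp->core_needs_qs = false;
    if (my_cond) {
        /* ...report the quiescent state up the tree... */
    }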

Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Acked-by: Joel Fernandes (Google) <joel@joelfernandes.org>
2019-03-26 14:37:49 -07:00
Liu Song
3ffe3d1adc rcu: Set rcutree.kthread_prio sysfs access to read-only
The rcutree.kthread_prio kernel-boot parameter is used to set the
priority for boost (rcub), per-CPU (rcuc), and grace-period (rcu_preempt
or rcu_sched) kthreads.  It is also used by rcutorture to check whether
it is possible to meaningfully test RCU priority boosting.  However,
all of these cases will either ignore or be confused by any post-boot
changes to rcutree.kthread_prio.

Note that the user really can change the priorities of all of these
kthreads using chrt, given sufficient privileges.  Therefore, the
read-write nature of sysfs access to rcutree.kthread_prio is at best
an attractive nuisance.

This commit therefore changes sysfs access to rcutree.kthread_prio to
be read-only.
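
A hedged sketch of the change, which amounts to flipping the
module_param() permission bits for this parameter:

    static int kthread_prio;
    module_param(kthread_prio, int, 0444);  /* was 0644; now read-only in sysfs */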

Signed-off-by: Liu Song <liu.song11@zte.com.cn>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-03-26 14:37:49 -07:00
Paul E. McKenney
884157cef0 rcu: Make exit_rcu() handle non-preempted RCU readers
The purpose of exit_rcu() is to handle cases where buggy code causes a
task to exit within an RCU read-side critical section.  It currently
does that in the case where said RCU read-side critical section was
preempted at least once, but fails to handle cases where preemption did
not occur.  This case needs to be handled because otherwise the final
context switch away from the exiting task will incorrectly behave as if
task exit were instead a preemption of an RCU read-side critical section,
and will therefore queue the exiting task.  The exiting task will have
exited, and thus won't ever execute rcu_read_unlock(), which means that
it will remain queued forever, blocking all subsequent grace periods,
and eventually resulting in OOM.

Although this is arguably better than letting grace periods proceed
and having a later rcu_read_unlock() access the now-freed task
structure that once belonged to the exiting task, it would obviously
be better to correctly handle this case.  This commit therefore sets
->rcu_read_lock_nesting to 1 in that case, so that the subsequent call
to __rcu_read_unlock() causes the exiting task to exit its dangling RCU
read-side critical section.

Note that deferred quiescent states need not be considered.  The reason
is that removing the task from the ->blkd_tasks[] list in the call to
rcu_preempt_deferred_qs() handles the per-task component of any deferred
quiescent state, and all other components of any deferred quiescent state
are associated with the CPU, which isn't going anywhere until some later
CPU-hotplug operation, which will report any remaining deferred quiescent
states from within the rcu_report_dead() function.

Note also that negative values of ->rcu_read_lock_nesting need not be
considered.  First, these won't show up in exit_rcu() unless there is
a serious bug in RCU, and second, setting ->rcu_read_lock_nesting sets
the state so that the RCU read-side critical section will be exited
normally.

Again, this code has no effect unless there has been some prior bug
that prevents a task from leaving an RCU read-side critical section
before exiting.  Furthermore, there have been no reports of the bug
fixed by this commit appearing in production.  This commit is therefore
absolutely -not- recommended for backporting to -stable.
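
A hedged sketch of the resulting exit_rcu() logic; ->rcu_read_lock_nesting
and __rcu_read_unlock() come from the text above, while the list check and
the ->rcu_read_unlock_special manipulation are illustrative assumptions:

    void exit_rcu(void)
    {
        struct task_struct *t = current;

        if (unlikely(!list_empty(&t->rcu_node_entry))) {
            /* Preempted reader: fake a single nesting level. */
            t->rcu_read_lock_nesting = 1;
            barrier();
            t->rcu_read_unlock_special.b.blocked = true;
        } else if (unlikely(t->rcu_read_lock_nesting)) {
            /* Non-preempted dangling reader: the newly handled case. */
            t->rcu_read_lock_nesting = 1;
        } else {
            return;
        }
        __rcu_read_unlock();         /* exits the dangling critical section */
        rcu_preempt_deferred_qs(t);
    }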

Reported-by: ABHISHEK DUBEY <dabhishek@iisc.ac.in>
Reported-by: BHARATH Y MOURYA <bharathm@iisc.ac.in>
Reported-by: Aravinda Prasad <aravinda@iisc.ac.in>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Tested-by: ABHISHEK DUBEY <dabhishek@iisc.ac.in>
2019-03-26 14:37:49 -07:00
Cyrill Gorcunov
18d7e40679 rcu: rcu_qs -- Use raise_softirq_irqoff to not save irqs twice
The rcu_qs() function already runs with interrupts disabled, so there is
no need for raise_softirq() to save and restore interrupt state again.
We can instead save a few cycles by calling raise_softirq_irqoff() directly.
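
A hedged sketch (the helper and flag names are hypothetical) of why the
_irqoff variant suffices here:

    static void my_rcu_qs(void)  /* called with interrupts already disabled */
    {
        /* ...record this CPU's quiescent state... */
        if (my_need_softirq)
            raise_softirq_irqoff(RCU_SOFTIRQ);  /* no redundant IRQ save/restore */
    }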

CC: Paul E. McKenney <paulmck@linux.ibm.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-03-26 14:37:49 -07:00
Joel Fernandes (Google)
671a63517c rcu: Avoid unnecessary softirq when system is idle
When there are no callbacks pending on an idle system, I noticed that
RCU softirq is continuously firing. During this the cpu_no_qs is set to
false, and core_needs_qs is set to true indefinitely. This causes
rcu_process_callbacks to be repeatedly called, even though the node
corresponding to the CPU has that CPU's mask bit cleared and the system
is idle. I believe the race occurs when such mask clearing is done during
the idle-CPU scan of the quiescent-state forcing stage in the kthread
instead of in the softirq. Because the rnp mask is cleared but the flags
on the CPU's rdp are not, the CPU thinks it still needs to report to
core RCU.

Cure this by clearing the core_needs_qs flag when the CPU detects that
its node is already updated, which avoids the unwanted softirq raises,
to the benefit of real-time systems.
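
A hedged sketch of the cure, reduced to the relevant check (its exact
placement in the grace-period bookkeeping code is not shown):

    /* If the rcu_node structure no longer expects a quiescent-state
     * report from this CPU, clear the local flag so that RCU_SOFTIRQ
     * stops being raised needlessly. */
    if (rdp->core_needs_qs && !(rnp->qsmask & rdp->grpmask))
        rdp->core_needs_qs = false;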

Test: Ran rcutorture for various tree RCU configs.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-03-26 14:37:49 -07:00
Paul E. McKenney
e85e6a21b2 rcu: Unconditionally expedite during suspend/hibernate
The rcu_pm_notify() function refuses to switch to/from expedited grace
periods on systems with more than 256 CPUs due to the serialized
initialization of expedited grace periods.  However, expedited grace
periods are now initialized in parallel, removing this concern.
This commit therefore removes the checks from rcu_pm_notify(), so that
expedited grace periods are used unconditionally during suspend/resume
and hibernate/wake operations.

As always, real-time workloads wishing to completely avoid expedited
grace periods can use the rcupdate.rcu_normal= kernel parameter.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-03-26 14:37:49 -07:00
Linus Torvalds
203b6609e0 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
 "Lots of tooling updates - too many to list, here's a few highlights:

   - Various subcommand updates to 'perf trace', 'perf report', 'perf
     record', 'perf annotate', 'perf script', 'perf test', etc.

   - CPU and NUMA topology and affinity handling improvements,

   - HW tracing and HW support updates:
      - Intel PT updates
      - ARM CoreSight updates
      - vendor HW event updates

   - BPF updates

   - Tons of infrastructure updates, both on the build system and the
     library support side

   - Documentation updates.

   - ... and lots of other changes, see the changelog for details.

  Kernel side updates:

   - Tighten up kprobes blacklist handling, reduce the number of places
     where developers can install a kprobe and hang/crash the system.

   - Fix/enhance vma address filter handling.

   - Various PMU driver updates, small fixes and additions.

   - refcount_t conversions

   - BPF updates

   - error code propagation enhancements

   - misc other changes"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (238 commits)
  perf script python: Add Python3 support to syscall-counts-by-pid.py
  perf script python: Add Python3 support to syscall-counts.py
  perf script python: Add Python3 support to stat-cpi.py
  perf script python: Add Python3 support to stackcollapse.py
  perf script python: Add Python3 support to sctop.py
  perf script python: Add Python3 support to powerpc-hcalls.py
  perf script python: Add Python3 support to net_dropmonitor.py
  perf script python: Add Python3 support to mem-phys-addr.py
  perf script python: Add Python3 support to failed-syscalls-by-pid.py
  perf script python: Add Python3 support to netdev-times.py
  perf tools: Add perf_exe() helper to find perf binary
  perf script: Handle missing fields with -F +..
  perf data: Add perf_data__open_dir_data function
  perf data: Add perf_data__(create_dir|close_dir) functions
  perf data: Fail check_backup in case of error
  perf data: Make check_backup work over directories
  perf tools: Add rm_rf_perf_data function
  perf tools: Add pattern name checking to rm_rf
  perf tools: Add depth checking to rm_rf
  perf data: Add global path holder
  ...
2019-03-06 07:59:36 -08:00
Linus Torvalds
3717f613f4 Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU updates from Ingo Molnar:
 "The main RCU related changes in this cycle were:

   - Additional cleanups after RCU flavor consolidation

   - Grace-period forward-progress cleanups and improvements

   - Documentation updates

   - Miscellaneous fixes

   - spin_is_locked() conversions to lockdep

   - SPDX changes to RCU source and header files

   - SRCU updates

   - Torture-test updates, including nolibc updates and moving nolibc to
     tools/include"

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (71 commits)
  locking/locktorture: Convert to SPDX license identifier
  linux/torture: Convert to SPDX license identifier
  torture: Convert to SPDX license identifier
  linux/srcu: Convert to SPDX license identifier
  linux/rcutree: Convert to SPDX license identifier
  linux/rcutiny: Convert to SPDX license identifier
  linux/rcu_sync: Convert to SPDX license identifier
  linux/rcu_segcblist: Convert to SPDX license identifier
  linux/rcupdate: Convert to SPDX license identifier
  linux/rcu_node_tree: Convert to SPDX license identifier
  rcu/update: Convert to SPDX license identifier
  rcu/tree: Convert to SPDX license identifier
  rcu/tiny: Convert to SPDX license identifier
  rcu/sync: Convert to SPDX license identifier
  rcu/srcu: Convert to SPDX license identifier
  rcu/rcutorture: Convert to SPDX license identifier
  rcu/rcu_segcblist: Convert to SPDX license identifier
  rcu/rcuperf: Convert to SPDX license identifier
  rcu/rcu.h: Convert to SPDX license identifier
  RCU/torture.txt: Remove section MODULE PARAMETERS
  ...
2019-03-05 14:49:11 -08:00
Masami Hiramatsu
a39f15b964 kprobes: Prohibit probing on RCU debug routine
Since kprobe itself depends on RCU, probing on RCU debug
routine can cause recursive breakpoint bugs.

Prohibit probing on RCU debug routines.

int3
 ->do_int3()
   ->ist_enter()
     ->RCU_LOCKDEP_WARN()
       ->debug_lockdep_rcu_enabled() -> int3

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andrea Righi <righi.andrea@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/154998807741.31052.11229157537816341591.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-02-13 08:16:40 +01:00
Masami Hiramatsu
c13324a505 x86/kprobes: Prohibit probing on functions before kprobe_int3_handler()
Prohibit probing on the functions called before kprobe_int3_handler()
in do_int3(). More specifically, ftrace_int3_handler(),
poke_int3_handler(), and ist_enter(). And since rcu_nmi_enter() is
called by ist_enter(), it also should be marked as NOKPROBE_SYMBOL.

Since those are handled before kprobe_int3_handler(), probing those
functions can cause a breakpoint recursion and crash the kernel.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andrea Righi <righi.andrea@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/154998793571.31052.11301258949601150994.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-02-13 08:16:39 +01:00
Paul E. McKenney
e7ffb4eb9a Merge branches 'doc.2019.01.26a', 'fixes.2019.01.26a', 'sil.2019.01.26a', 'spdx.2019.02.09a', 'srcu.2019.01.26a' and 'torture.2019.01.26a' into HEAD
doc.2019.01.26a:  Documentation updates.
fixes.2019.01.26a:  Miscellaneous fixes.
sil.2019.01.26a:  Removal of a few more spin_is_locked() instances.
spdx.2019.02.09a:  Add SPDX identifiers to RCU files
srcu.2019.01.26a:  SRCU updates.
torture.2019.01.26a: Torture-test updates.
2019-02-09 08:47:52 -08:00
Paul E. McKenney
38b4df649e rcu/update: Convert to SPDX license identifier
Replace the license boiler plate with a SPDX license identifier.
While in the area, update an email address.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2019-02-09 08:44:27 -08:00
Paul E. McKenney
22e4092531 rcu/tree: Convert to SPDX license identifier
Replace the license boiler plate with a SPDX license identifier.
While in the area, update an email address.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
[ paulmck: Update .h file SPDX comment format per Joe Perches. ]
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2019-02-09 08:44:10 -08:00
Paul E. McKenney
00de9d7415 rcu/tiny: Convert to SPDX license identifier
Replace the license boiler plate with a SPDX license identifier.
While in the area, update an email address.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2019-02-09 08:44:05 -08:00
Paul E. McKenney
96b903f5da rcu/sync: Convert to SPDX license identifier
Replace the license boiler plate with a SPDX license identifier.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2019-02-09 08:43:59 -08:00
Paul E. McKenney
e7ee1501cd rcu/srcu: Convert to SPDX license identifier
Replace the license boiler plate with a SPDX license identifier.
While in the area, update an email address.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2019-02-09 08:43:54 -08:00
Paul E. McKenney
2e24ce8852 rcu/rcutorture: Convert to SPDX license identifier
Replace the license boiler plate with a SPDX license identifier.
While in the area, update an email address.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2019-02-09 08:43:46 -08:00
Paul E. McKenney
eb7935e479 rcu/rcu_segcblist: Convert to SPDX license identifier
Replace the license boiler plate with a SPDX license identifier.
While in the area, update an email address.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2019-02-09 08:43:40 -08:00
Paul E. McKenney
8bf05ed3ad rcu/rcuperf: Convert to SPDX license identifier
Replace the license boiler plate with a SPDX license identifier.
While in the area, update an email address.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2019-02-09 08:43:35 -08:00
Paul E. McKenney
b5b11890de rcu/rcu.h: Convert to SPDX license identifier
Replace the license boiler plate with a SPDX license identifier.
While in the area, update an email address.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2019-02-09 08:43:05 -08:00
Paul E. McKenney
e838a7d66e rcuperf: Stop abusing IS_ENABLED()
The ever-evolving IS_ENABLED() macro is intended for CONFIG_* Kconfig
options, but rcuperf currently uses it for the decidedly non-CONFIG_*
MODULE macro.  In the spirit of not inviting trouble, this commit
substitutes tried-and-true #ifdef.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
2019-01-25 15:37:11 -08:00
Paul E. McKenney
3a6cb58f15 rcutorture: Add grace period after CPU offline
Beyond a certain point in the CPU-hotplug offline process, timers get
stranded on the outgoing CPU, and won't fire until that CPU comes back
online, which might well be never.  This commit therefore adds a hook
in torture_onoff_init() that is invoked from torture_offline(), which
rcutorture uses to occasionally wait for a grace period.  This should
result in failures for RCU implementations that rely on stranded timers
eventually firing in the absence of the CPU coming back online.

Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-01-25 15:37:10 -08:00
Paul E. McKenney
cd618d102b rcutorture: Record grace periods in forward-progress histogram
This commit records grace periods in rcutorture's n_launders_hist[]
histogram, thus allowing rcu_torture_fwd_cb_hist() to print out the
elapsed number of grace periods between buckets.  This information
helps to determine whether a lack of forward progress is due to stalled
grace periods on the one hand or due to sluggish callback invocation on
the other.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-01-25 15:37:09 -08:00
Sebastian Andrzej Siewior
e81baf4cb1 srcu: Remove srcu_queue_delayed_work_on()
srcu_queue_delayed_work_on() disables preemption (and therefore CPU
hotplug in RCU's case) and then checks based on its own accounting if a
CPU is online. If the CPU is online, it uses queue_delayed_work_on();
otherwise it falls back to queue_delayed_work().
The problem here is that queue_work() on -RT does not work with disabled
preemption.

queue_work_on() also works on an offlined CPU. queue_delayed_work_on()
has the problem that it is possible to program a timer on an offlined
CPU. This timer will fire once the CPU is online again. But until then,
the timer remains programmed and nothing will happen.

Add a local timer that fires (after the requested delay) on the local
CPU and then enqueues the work on the specific CPU.

RCUtorture testing with SRCU-P for 24h showed no problems.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-01-25 15:36:42 -08:00
Paul E. McKenney
c2d8089de7 rcu: Fix obsolete DYNTICK_IRQ_NONIDLE comment
This commit updates the DYNTICK_IRQ_NONIDLE header comment to remove
the obsolete commentary about unmatched rcu_irq_{enter,exit}().

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-01-25 15:35:24 -08:00
Paul E. McKenney
39abefe743 rcu: Repair rcu_nmi_exit() docbook header
This commit removes the "@irq" argument from the rcu_nmi_exit() docbook
header, given that this function now has no arguments.

Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-01-25 15:35:23 -08:00
Paul E. McKenney
5a0874c1d1 rcu: Remove preemption disabling from expedited CPU selection
It turns out that it is queue_delayed_work_on() rather than
queue_work_on() that has difficulties when used concurrently with
CPU-hotplug removal operations.  It is therefore unnecessary to protect
CPU identification and queue_work_on() with preempt_disable().

This commit therefore removes the preempt_disable() and preempt_enable()
from sync_rcu_exp_select_cpus(), which has the further benefit of reducing
the number of changes that must be maintained in the -rt patchset.

Reported-by: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Sebastian Siewior <bigeasy@linutronix.de>
Suggested-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-01-25 15:35:23 -08:00
Paul E. McKenney
fb60e533be rcu: Rename rcu_process_callbacks() to rcu_core() for Tree RCU
Although the name rcu_process_callbacks() still makes sense for Tiny
RCU, where most of what it does is invoke callbacks, it no longer makes
much sense for Tree RCU, especially given that the actually callback
invocation is relegated to rcu_do_batch(), or, for no-CBs CPUs, to the
rcuo kthreads.  Especially in the latter case, rcu_process_callbacks()
has very little to do with actual callbacks.  A better description of
this function is that it performs RCU's core processing.

This commit therefore changes the name of Tree RCU's rcu_process_callbacks()
function to rcu_core(), which also has the virtue of being consistent with
the existing invoke_rcu_core() function.

While in the area, the header comment is reworked.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-01-25 15:35:22 -08:00
Paul E. McKenney
c98cac603f rcu: Rename rcu_check_callbacks() to rcu_sched_clock_irq()
The name rcu_check_callbacks() arguably made sense back in the early
2000s when RCU was quite a bit simpler than it is today, but it has
become quite misleading, especially with the advent of dyntick-idle
and NO_HZ_FULL.  The rcu_check_callbacks() function is RCU's hook into
the scheduling-clock interrupt, and is now but one of many ways that
callbacks get promoted to invocable state.

This commit therefore changes the name to rcu_sched_clock_irq(),
which is the same number of characters and clearly indicates this
function's relation to the rest of the Linux kernel.  In addition, for
the sake of consistency, rcu_flavor_check_callbacks() is also renamed
to rcu_flavor_sched_clock_irq().

While in the area, the header comments for both functions are reworked.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-01-25 15:35:21 -08:00
Paul E. McKenney
7a968bb26a Merge branches 'consolidate.2019.01.26a' and 'fwd.2019.01.26a' into HEAD
consolidate.2019.01.26a: RCU flavor consolidation cleanups.
fwd.2019.01.26a: RCU grace-period forward-progress fixes.
2019-01-25 15:32:01 -08:00
Zhang, Jun
13dc7d0c7a rcu: Prevent needless ->gp_seq_needed update in __note_gp_changes()
Currently, __note_gp_changes() checks to see if the rcu_node structure's
->gp_seq_needed is greater than or equal to that of the rcu_data
structure, and if so, updates the rcu_data structure's ->gp_seq_needed
field.  This results in a useless store in the case where the two fields
are equal.

This commit therefore carries out this store only in the case where the
rcu_node structure's ->gp_seq_needed is strictly greater than that of
the rcu_data structure.
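
A hedged sketch of the change (the kernel actually uses wrap-safe
sequence-number comparisons, elided here for clarity):

    if (rnp->gp_seq_needed > rdp->gp_seq_needed)     /* was: >= */
        rdp->gp_seq_needed = rnp->gp_seq_needed;     /* no store when equal */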

Signed-off-by: "Zhang, Jun" <jun.zhang@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Link: https://lkml.kernel.org/r/88DC34334CA3444C85D647DBFA962C2735AD5F77@SHSMSX104.ccr.corp.intel.com
2019-01-25 15:30:00 -08:00
Zhang, Jun
1d1f898df6 rcu: Do RCU GP kthread self-wakeup from softirq and interrupt
The rcu_gp_kthread_wake() function is invoked when it might be necessary
to wake the RCU grace-period kthread.  Because self-wakeups are normally
a useless waste of CPU cycles, if rcu_gp_kthread_wake() is invoked from
this kthread, it naturally refuses to do the wakeup.

Unfortunately, natural though it might be, this heuristic fails when
rcu_gp_kthread_wake() is invoked from an interrupt or softirq handler
that interrupted the grace-period kthread just after the final check of
the wait-event condition but just before the schedule() call.  In this
case, a wakeup is required, even though the call to rcu_gp_kthread_wake()
is within the RCU grace-period kthread's context.  Failing to provide
this wakeup can result in grace periods failing to start, which in turn
results in out-of-memory conditions.

This race window is quite narrow, but it actually did happen during real
testing.  It would of course need to be fixed even if it was strictly
theoretical in nature.

This patch does not Cc stable because it does not apply cleanly to
earlier kernel versions.
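
A hedged sketch of the resulting wakeup check, with simplified names;
the exact condition is quoted in the bracketed note below:

    static void my_gp_kthread_wake(struct task_struct *kt)
    {
        bool self = (current == kt);

        /* Skip only true self-wakeups from the kthread's own context.
         * An interrupt or softirq handler running between the final
         * wait-condition check and schedule() must still wake it. */
        if (!kt || (self && !in_interrupt() && !in_serving_softirq()))
            return;
        wake_up_process(kt);
    }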

Fixes: 48a7639ce8 ("rcu: Make callers awaken grace-period kthread")
Reported-by: "He, Bo" <bo.he@intel.com>
Co-developed-by: "Zhang, Jun" <jun.zhang@intel.com>
Co-developed-by: "He, Bo" <bo.he@intel.com>
Co-developed-by: "xiao, jin" <jin.xiao@intel.com>
Co-developed-by: Bai, Jie A <jie.a.bai@intel.com>
Signed-off: "Zhang, Jun" <jun.zhang@intel.com>
Signed-off: "He, Bo" <bo.he@intel.com>
Signed-off: "xiao, jin" <jin.xiao@intel.com>
Signed-off: Bai, Jie A <jie.a.bai@intel.com>
Signed-off-by: "Zhang, Jun" <jun.zhang@intel.com>
[ paulmck: Switch from !in_softirq() to "!in_interrupt() &&
  !in_serving_softirq()" to avoid redundant wakeups and to also handle the
  interrupt-handler scenario as well as the softirq-handler scenario that
  actually occurred in testing. ]
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Link: https://lkml.kernel.org/r/CD6925E8781EFD4D8E11882D20FC406D52A11F61@SHSMSX104.ccr.corp.intel.com
2019-01-25 15:29:59 -08:00
Paul E. McKenney
2ccaff10f7 rcu: Add sysrq rcu_node-dump capability
Life is hard if RCU manages to get stuck without triggering RCU CPU
stall warnings or triggering the rcu_check_gp_start_stall() checks
for failing to start a grace period.  This commit therefore adds a
boot-time-selectable sysrq key (commandeering "y") that allows manually
dumping Tree RCU state.  The new rcutree.sysrq_rcu kernel boot parameter
must be set for this sysrq to be available.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-01-25 15:29:59 -08:00
Paul E. McKenney
3b6505fd8e rcu: Protect rcu_check_gp_kthread_starvation() access to ->gp_flags
The rcu_check_gp_kthread_starvation() function can be invoked without
holding locks, so the access to the rcu_state structure's ->gp_flags
field must be protected with READ_ONCE().  This commit therefore adds
this protection.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-01-25 15:29:58 -08:00
Paul E. McKenney
fd897573fa rcu: Improve diagnostics for failed RCU grace-period start
If a grace period fails to start (for example, because you commented
out the last two lines of rcu_accelerate_cbs_unlocked()), rcu_core()
will invoke rcu_check_gp_start_stall(), which will notice and complain.
However, this complaint is lacking crucial debugging information such
as when the last wakeup executed and what the value of ->gp_seq was at
that time.  This commit therefore removes the current pr_alert() from
rcu_check_gp_start_stall(), instead invoking show_rcu_gp_kthreads(),
which has been updated to print the needed information, which is collected
by rcu_gp_kthread_wake().

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-01-25 15:29:57 -08:00
Paul E. McKenney
a9fefdb257 rcu: Update NOCB comments
This commit updates a few obsolete comments in the RCU callback-offload
code.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-01-25 15:29:57 -08:00
Paul E. McKenney
b2c1955b88 rcu: Remove unused rcu_cpu_kthread_cpu per-CPU variable
The rcu_cpu_kthread_cpu used to provide debugfs information, but is no
longer used.  This commit therefore removes it.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-01-25 15:29:56 -08:00
Paul E. McKenney
f7e972ee12 rcu: Move rcu_cpu_has_work to rcu_data structure
Given that RCU has a perfectly good per-CPU rcu_data structure, most
per-CPU quantities should be stored there.

This commit therefore moves the rcu_cpu_has_work per-CPU variable to
the rcu_data structure.  This also makes this variable unconditionally
present, which should be acceptable given the memory reduction due to the
RCU flavor consolidation and also due to simplifications this will enable.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-01-25 15:29:56 -08:00
Paul E. McKenney
8b4d0f4858 rcu: Remove unused rcu_cpu_kthread_loops per-CPU variable
The rcu_cpu_kthread_loops variable used to provide debugfs information,
but is no longer used.  This commit therefore removes it.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-01-25 15:29:55 -08:00
Paul E. McKenney
6ffdde28b7 rcu: Move rcu_cpu_kthread_status to rcu_data structure
Given that RCU has a perfectly good per-CPU rcu_data structure, most
per-CPU quantities should be stored there.

This commit therefore moves the rcu_cpu_kthread_status per-CPU variable
to the rcu_data structure.  This also makes this variable unconditionally
present, which should be acceptable given the memory reduction due to the
RCU flavor consolidation and also due to simplifications this will enable.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-01-25 15:29:54 -08:00
Paul E. McKenney
37f62d7cf0 rcu: Move rcu_cpu_kthread_task to rcu_data structure
Given that RCU has a perfectly good per-CPU rcu_data structure, most
per-CPU quantities should be stored there.

This commit therefore moves the rcu_cpu_kthread_task per-CPU variable to
the rcu_data structure.  This also makes this variable unconditionally
present, which should be acceptable given the memory reduction due to the
RCU flavor consolidation and also due to simplifications this will enable.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-01-25 15:29:53 -08:00
Paul E. McKenney
9cf422a8e7 rcu: Accommodate zero jiffies_till_first_fqs and kthread kicking
It is perfectly fine to set the rcutree.jiffies_till_first_fqs boot
parameter to zero, in fact, this can be useful on specialty systems that
usually have at least one idle CPU and that need fast grace periods.
This is because this setting causes the RCU grace-period kthread to
scan for idle threads immediately after grace-period initialization,
as opposed to waiting several jiffies to do so.

It is also perfectly fine to set the rcutree.rcu_kick_kthreads kernel
parameter, which gives the RCU grace-period kthread an extra wakeup
if it doesn't make progress for a period of three times the setting of
the rcutree.jiffies_till_first_fqs boot parameter.  This is of course
problematic when the value of this parameter is zero, as it can result
in unnecessary wakeup IPIs along with unnecessary WARN_ONCE() invocations.

This commit therefore defers kthread kicking for at least two jiffies,
regardless of the setting of rcutree.jiffies_till_first_fqs.
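
A hedged sketch of the resulting deadline arithmetic (the deadline
variable name is illustrative):

    unsigned long j = READ_ONCE(jiffies_till_first_fqs);

    /* Kick no sooner than 3*j jiffies out, but never sooner than 2. */
    my_kick_deadline = jiffies + (j ? 3 * j : 2);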

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-01-25 15:29:53 -08:00
Paul E. McKenney
260e1e4fd8 rcu: Discard separate per-CPU callback counts
Back when there were multiple flavors of RCU, it was necessary to
separately count lazy and non-lazy callbacks for each CPU.  These counts
were used in CONFIG_RCU_FAST_NO_HZ kernels to determine how long a newly
idle CPU should be allowed to sleep before handling its RCU callbacks.
But now that there is only one flavor, the callback counts for a given
CPU's sole rcu_data structure are the counts for that CPU.

This commit therefore removes the rcu_data structure's ->nonlazy_posted
and ->nonlazy_posted_snap fields, the rcu_idle_count_callbacks_posted()
and rcu_cpu_has_callbacks() functions, repurposes the rcu_data structure's
->all_lazy field to record the laziness state at the beginning of the
latest idle sojourn, and modifies CONFIG_RCU_FAST_NO_HZ RCU CPU stall
warnings accordingly.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-01-25 15:28:30 -08:00
Paul E. McKenney
8923072664 rcu: Inline _synchronize_rcu_expedited() into synchronize_rcu_expedited()
Now that _synchronize_rcu_expedited() has only one caller, and given that
this is a tail call, this commit inlines _synchronize_rcu_expedited()
into synchronize_rcu_expedited().

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-01-25 15:28:29 -08:00
Paul E. McKenney
e5bc3af773 rcu: Consolidate PREEMPT and !PREEMPT synchronize_rcu()
Now that rcu_blocking_is_gp() makes the correct immediate-return
decision for both PREEMPT and !PREEMPT, a single implementation of
synchronize_rcu() will work correctly under both configurations.
This commit therefore eliminates a few lines of code by consolidating
the two implementations of synchronize_rcu().

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-01-25 15:28:28 -08:00
Paul E. McKenney
3cd4ca47aa rcu: Consolidate PREEMPT and !PREEMPT synchronize_rcu_expedited()
The CONFIG_PREEMPT=n and CONFIG_PREEMPT=y implementations of
synchronize_rcu_expedited() are quite similar, and with small
modifications to rcu_blocking_is_gp() can be made identical.  This commit
therefore makes this change in order to save a few lines of code and to
reduce the amount of duplicate code.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-01-25 15:28:27 -08:00
Paul E. McKenney
142d106d5e rcu: Determine expedited-GP IPI handler at build time
Back when there could be multiple RCU flavors running in the same kernel
at the same time, it was necessary to specify the expedited grace-period
IPI handler at runtime.  Now that there is only one RCU flavor, the
IPI handler can be determined at build time.  There is therefore no
longer any reason for the RCU-preempt and RCU-sched IPI handlers to
have different names, nor is there any reason to pass these handlers in
function arguments and in the data structures enclosing workqueues.

This commit therefore makes all these changes, pushing the specification
of the expedited grace-period IPI handler down to the point of use.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-01-25 15:28:27 -08:00
Paul E. McKenney
c46f497a61 rcu: Inline rcu_kthread_do_work() into its sole remaining caller
The rcu_kthread_do_work() function has a single-line body and only one
remaining caller.  This commit therefore saves a few lines of code by
inlining rcu_kthread_do_work() into its sole remaining caller.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-01-25 15:28:25 -08:00
Paul E. McKenney
c97058d033 rcu: Eliminate RCU_BH_FLAVOR and RCU_SCHED_FLAVOR
Now that the RCU flavors have been consolidated, RCU_BH_FLAVOR and
RCU_SCHED_FLAVOR are no longer used.  This commit therefore saves a
few lines by removing them.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-01-25 15:28:25 -08:00
Paul E. McKenney
cd920e5a34 rcu: Inline force_quiescent_state() into rcu_force_quiescent_state()
Given that rcu_force_quiescent_state() is a simple wrapper around
force_quiescent_state(), this commit saves a few lines of code by
inlining force_quiescent_state() into rcu_force_quiescent_state(),
and changing all references to force_quiescent_state() to instead
invoke rcu_force_quiescent_state().

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-01-25 15:28:24 -08:00
Paul E. McKenney
1de462ed85 rcu: Make expedited IPI handler return after handling critical section
During expedited RCU grace-period initialization, IPIs are sent to
all non-idle online CPUs.  The IPI handler checks to see if the CPU is
in quiescent state, reporting one if so.  This handler looks at three
different cases: (1) The CPU is not in an rcu_read_lock()-based critical
section, (2) The CPU is in the process of exiting an rcu_read_lock()-based
critical section, and (3) The CPU is in an rcu_read_lock()-based critical
section.  In case (2), execution falls through into case (3).

This is harmless from a functionality viewpoint, but can result in
needless overhead during an improbable corner case.  This commit therefore
adds the "return" statement needed to prevent fall-through.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-01-25 15:28:24 -08:00
Paul E. McKenney
ad368d15b0 rcu: Rename and comment changes due to only one rcuo kthread per CPU
Given RCU flavor consolidation, the name rcu_spawn_all_nocb_kthreads()
is quite misleading.  It no longer ever creates more than one kthread,
and it does so only for the specified CPU.  This commit therefore changes
this name to the more descriptive rcu_spawn_cpu_nocb_kthread(), and also
fixes up a similar issue in its header comment while in the area.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-01-25 15:28:23 -08:00
Paul E. McKenney
a4cffdad73 time: Move CONTEXT_TRACKING to kernel/time/Kconfig
Both CONTEXT_TRACKING and CONTEXT_TRACKING_FORCE are currently defined
in kernel/rcu/Kconfig, which might have made sense at some point, but
no longer does given that RCU refers to neither of these Kconfig options.

Therefore move them to kernel/time/Kconfig, where the rest of the
NO_HZ_FULL Kconfig options live.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: https://lkml.kernel.org/r/20181220170525.GA12579@linux.ibm.com
2019-01-15 11:16:41 +01:00
Paul E. McKenney
5ac7cdc298 rcutorture: Don't do busted forward-progress testing
The "busted" rcutorture type is an intentionally broken implementation
of RCU.  Doing forward-progress testing on this implementation is not
particularly meaningful on the one hand and can result in fatal abuse
of the memory allocator on the other.  This commit therefore disables
forward-progress testing of the "busted" rcutorture type.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2018-12-01 12:45:42 -08:00
Paul E. McKenney
2e57bf97a6 rcutorture: Use 100ms buckets for forward-progress callback histograms
This commit narrows the scope of each bucket of the forward-progress
callback-invocation histograms from one second to 100 milliseconds, which
aids debugging of forward-progress problems by making shorter-duration
callback-invocation stalls visible.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2018-12-01 12:45:41 -08:00
Paul E. McKenney
2667ccce93 rcutorture: Recover from OOM during forward-progress tests
This commit causes the OOM handler to do rcu_barrier() calls and to
free up forward-progress callbacks in order to recover from OOM events.
The current test is terminated, but subsequent forward-progress tests can
proceed.  This allows a long test to result in multiple forward-progress
failures, greatly reducing the required testing time.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2018-12-01 12:45:41 -08:00
Paul E. McKenney
73d665b141 rcutorture: Print forward-progress test age upon failure
This commit prints the age of the forward-progress test in jiffies,
in order to allow better interpretation of the callback-invocation
histograms.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2018-12-01 12:45:40 -08:00
Paul E. McKenney
c51d7b5e6c rcutorture: Print time since GP end upon forward-progress failure
If rcutorture's forward-progress tests fail while a grace period is not
in progress, it is useful to print the time since the last grace period
ended as a way to detect failure to launch a new grace period.  This
commit therefore makes this change.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2018-12-01 12:45:40 -08:00
Paul E. McKenney
1a682754c7 rcutorture: Print histogram of CB invocation at OOM time
One reason why a forward-progress test might fail would be if something
prevented or delayed callback invocation.  This commit therefore adds a
callback-invocation histogram printout when OOM is reported to rcutorture.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2018-12-01 12:45:39 -08:00
Paul E. McKenney
8dd3b54689 rcutorture: Print GP age upon forward-progress failure
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2018-12-01 12:45:38 -08:00
Paul E. McKenney
bfcfcffc5f rcu: Print per-CPU callback counts for forward-progress failures
This commit prints out the non-zero per-CPU callback counts when a
forward-progress error (OOM event) occurs.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
[ paulmck: Fix a pair of uninitialized locals spotted by kbuild test robot. ]
2018-12-01 12:45:38 -08:00
Paul E. McKenney
903ee83d91 rcu: Account for nocb-CPU callback counts in RCU CPU stall warnings
The RCU CPU stall warnings print an estimate of the total number of
RCU callbacks queued in the system, but this estimate leaves out
the callbacks queued for nocbs CPUs.  This commit therefore introduces
rcu_get_n_cbs_cpu(), which gives an accurate callback estimate for
both nocbs and normal CPUs, and uses this new function as needed.

This commit also introduces a rcu_get_n_cbs_nocb_cpu() helper function
that returns the number of callbacks for nocbs CPUs or zero otherwise,
and also uses this function in place of direct access to ->nocb_q_count
while in the area (fewer characters, you see).

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2018-12-01 12:45:37 -08:00
Paul E. McKenney
e0aff97355 rcutorture: Dump grace-period diagnostics upon forward-progress OOM
This commit adds an OOM notifier during rcutorture forward-progress
testing.  If this notifier is invoked, it dumps out some grace-period
state to help debug the forward-progress problem.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2018-12-01 12:45:36 -08:00
Paul E. McKenney
61670adcb4 rcutorture: Prepare for asynchronous access to rcu_fwd_startat
Because rcutorture's forward-progress checking will trigger from an
OOM notifier, this notifier will introduce asynchronous concurrent
access to the rcu_fwd_startat variable.  This commit therefore prepares
for this by converting updates to WRITE_ONCE().

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2018-12-01 12:45:35 -08:00
Paul E. McKenney
5ab7ab8362 rcutorture: Affinity forward-progress test to avoid housekeeping CPUs
This commit affinities the forward-progress tests to avoid hogging a
housekeeping CPU on the theory that the offloaded callbacks will be
running on those housekeeping CPUs.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
[ paulmck: Fix NULL-pointer issue located by kbuild test robot. ]
Tested-by: Rong Chen <rong.a.chen@intel.com>
2018-12-01 12:45:34 -08:00
Paul E. McKenney
6b3de7a172 rcutorture: Break up too-long rcu_torture_fwd_prog() function
This commit splits rcu_torture_fwd_prog_nr() and rcu_torture_fwd_prog_cr()
functions out of rcu_torture_fwd_prog() in order to reduce indentation
pain and because rcu_torture_fwd_prog() was getting a bit too long.
In addition, this will enable easier conditional execution of the
rcu_torture_fwd_prog_cr() function, which can give false-positive
failures in some NO_HZ_FULL configurations due to overloading the
housekeeping CPUs.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-12-01 12:45:34 -08:00
Paul E. McKenney
fc6f9c5778 rcutorture: Remove cbflood facility
Now that the forward-progress code does a full-bore continuous callback
flood lasting multiple seconds, there is little point in also posting a
mere 60,000 callbacks every second or so.  This commit therefore removes
the old cbflood testing.  Over time, it may be desirable to concurrently
do full-bore continuous callback floods on all CPUs simultaneously, but
one dragon at a time.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-12-01 12:45:33 -08:00
Paul E. McKenney
4871848531 rcutorture: Add call_rcu() flooding forward-progress tests
This commit adds a call_rcu() flooding loop to the forward-progress test.
This emulates tight userspace loops that force call_rcu() invocations,
for example, the infamous loop containing close(open()) that instigated
the addition of blimit.  If RCU does not make sufficient forward progress
in invoking the resulting flood of callbacks, rcutorture emits a warning.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-12-01 12:45:32 -08:00
Paul E. McKenney
eaaf055f27 Merge branches 'bug.2018.11.12a', 'consolidate.2018.12.01a', 'doc.2018.11.12a', 'fixes.2018.11.12a', 'initrd.2018.11.08b', 'sil.2018.11.12a' and 'srcu.2018.11.27a' into HEAD
bug.2018.11.12a:  Get rid of BUG_ON() and friends
consolidate.2018.12.01a:  Continued RCU flavor-consolidation cleanup
doc.2018.11.12a:  Documentation updates
fixes.2018.11.12a:  Miscellaneous fixes
initrd.2018.11.08b:  Automate creation of rcutorture initrd
sil.2018.11.12a:  Remove more spin_unlock_wait() calls
2018-12-01 12:43:16 -08:00
Paul E. McKenney
aacb5d91ab srcu: Use "ssp" instead of "sp" for srcu_struct pointer
In RCU, the distinction between "rsp", "rnp", and "rdp" has served well
for a great many years, but in SRCU, "sp" vs. "sdp" has proven confusing.
This commit therefore renames SRCU's "sp" pointers to "ssp", so that there
is "ssp" for srcu_struct pointer, "snp" for srcu_node pointer, and "sdp"
for srcu_data pointer.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2018-11-27 09:24:17 -08:00
Dennis Krein
eb4c238227 srcu: Lock srcu_data structure in srcu_gp_start()
The srcu_gp_start() function is called with the srcu_struct structure's
->lock held, but not with the srcu_data structure's ->lock.  This is
problematic because this function accesses and updates the srcu_data
structure's ->srcu_cblist, which is protected by that lock.  Failing to
hold this lock can result in corruption of the SRCU callback lists,
which in turn can result in arbitrarily bad results.

This commit therefore makes srcu_gp_start() acquire the srcu_data
structure's ->lock across the calls to rcu_segcblist_advance() and
rcu_segcblist_accelerate(), thus preventing this corruption.
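
A hedged sketch of the fix, mirroring the description above with the
locking helper written generically:

    /* Hold the srcu_data structure's ->lock across the list updates. */
    spin_lock_rcu_node(sdp);
    rcu_segcblist_advance(&sdp->srcu_cblist,
                          rcu_seq_current(&ssp->srcu_gp_seq));
    (void)rcu_segcblist_accelerate(&sdp->srcu_cblist,
                                   rcu_seq_snap(&ssp->srcu_gp_seq));
    spin_unlock_rcu_node(sdp);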

Reported-by: Bart Van Assche <bvanassche@acm.org>
Reported-by: Christoph Hellwig <hch@infradead.org>
Reported-by: Sebastian Kuzminsky <seb.kuzminsky@gmail.com>
Signed-off-by: Dennis Krein <Dennis.Krein@netapp.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Tested-by: Dennis Krein <Dennis.Krein@netapp.com>
Cc: <stable@vger.kernel.org> # 4.16.x
2018-11-27 09:23:57 -08:00
Paul E. McKenney
5f1a6ef374 rcu: Avoid signed integer overflow in rcu_preempt_deferred_qs()
Subtracting INT_MIN can be interpreted as unconditional signed integer
overflow, which according to the C standard is undefined behavior.
Therefore, kernel build arguments notwithstanding, it would be good to
future-proof the code.  This commit therefore substitutes INT_MAX for
INT_MIN in order to avoid undefined behavior.

While in the neighborhood, this commit also creates some meaningful names
for INT_MAX and friends in order to improve readability, as suggested
by Joel Fernandes.
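
A hedged arithmetic note on why the substitution matters:

    /* With a 32-bit int, INT_MIN is -2147483648 but +2147483648 is not
     * representable, so "nesting - INT_MIN" overflows and is undefined
     * behavior.  Biasing by INT_MAX keeps the arithmetic in range. */
    #define MY_NEST_BIAS INT_MAX  /* hypothetical stand-in for the new name */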

Reported-by: Ran Rozenstein <ranro@mellanox.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2018-11-12 09:03:59 -08:00
Paul E. McKenney
117f683c6e rcu: Replace this_cpu_ptr() with __this_cpu_read()
Because __this_cpu_read() can be lighter weight than equivalent uses of
this_cpu_ptr(), this commit replaces the latter with the former.
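
A hedged sketch of the substitution pattern; the per-CPU variable and
field names are illustrative:

    /* Before: form the per-CPU pointer, then dereference it. */
    val = this_cpu_ptr(&my_pcpu_data)->my_field;

    /* After: a single, potentially cheaper per-CPU load. */
    val = __this_cpu_read(my_pcpu_data.my_field);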

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2018-11-12 09:03:59 -08:00
Paul E. McKenney
05f415715c rcu: Speed up expedited GPs when interrupting RCU reader
In PREEMPT kernels, an expedited grace period might send an IPI to a
CPU that is executing an RCU read-side critical section.  In that case,
it would be nice if the rcu_read_unlock() directly interacted with the
RCU core code to immediately report the quiescent state.  And this does
happen in the case where the reader has been preempted.  But it would
also be a nice performance optimization if immediate reporting also
happened in the preemption-free case.

This commit therefore adds an ->exp_hint field to the task_struct structure's
->rcu_read_unlock_special field.  The IPI handler sets this hint when
it has interrupted an RCU read-side critical section, and this causes
the outermost rcu_read_unlock() call to invoke rcu_read_unlock_special(),
which, if preemption is enabled, reports the quiescent state immediately.
If preemption is disabled, then the report is required to be deferred
until preemption (or bottom halves or interrupts or whatever) is re-enabled.

Because this is a hint, it does nothing for more complicated cases.  For
example, if the IPI interrupts an RCU reader, but interrupts are disabled
across the rcu_read_unlock(), but another rcu_read_lock() is executed
before interrupts are re-enabled, the hint will already have been cleared.
If you do crazy things like this, reporting will be deferred until some
later RCU_SOFTIRQ handler, context switch, cond_resched(), or similar.

Reported-by: Joel Fernandes <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Acked-by: Joel Fernandes (Google) <joel@joelfernandes.org>
2018-11-12 09:03:59 -08:00
Paul E. McKenney
0a89e5a402 rcu: Trace end of grace period before end of grace period
Currently, rcu_gp_cleanup() traces the end of the old grace period after
the old grace period has officially ended.  This might make intuitive
sense, but it also makes for confusing event-trace output because the
"end" trace displays not the old but instead the new grace-period number.
This commit therefore traces the end of an old grace period just before
that grace period officially ends.

Reported-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2018-11-12 09:03:59 -08:00
Zhouyi Zhou
2320bda26d rcu: Adjust the comment of function rcu_is_watching
Because RCU avoids interrupting idle CPUs, rcu_is_watching() is used to
test whether or not it is currently legal to run RCU read-side critical
sections on this CPU.  However, the first and last sentences of the
current comment for rcu_is_watching() have the opposite of the expected
meaning.  This commit therefore fixes this header comment.

Signed-off-by: Zhouyi Zhou <zhouzhouyi@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2018-11-12 09:03:59 -08:00
Paul E. McKenney
c669c014d1 rcu: Add jiffies-since-GP-activity to show_rcu_gp_kthreads()
This commit adds a printout of the number of jiffies since the last time
that the RCU grace-period kthread did any processing.  This can be useful
when tracking down forward-progress issues.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2018-11-12 09:03:59 -08:00
Paul E. McKenney
691960197e rcu: Add state name to show_rcu_gp_kthreads() output
This commit adds the name of the RCU grace-period state to
the show_rcu_gp_kthreads() output in order to ease debugging.
This commit also moves gp_state_getname() up in the code so that
show_rcu_gp_kthreads() can use it.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2018-11-12 09:03:59 -08:00
Paul E. McKenney
791416c471 rcu: Parameterize rcu_check_gp_start_stall()
In order to debug forward-progress stalls, it is necessary to check
for excessively delayed grace-period starts.  This is currently done
for RCU CPU stall warnings by rcu_check_gp_start_stall(), which checks
to see if the start of a requested grace period has been delayed by an
RCU CPU stall warning period.  Because rcutorture will need to check
for the time consumed by an RCU forward-progress delay, this commit
promotes gpssdelay from a local variable to a formal parameter.  It is
not necessary to export rcu_check_gp_start_stall() because rcutorture
will access it via a wrapper function.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2018-11-12 09:03:59 -08:00
Paul E. McKenney
b3c1d9ec7c rcu: Avoid double multiply by HZ
The rcu_check_gp_start_stall() function multiplies the return value
from rcu_jiffies_till_stall_check() by HZ, but the units are already
in jiffies.  This commit therefore avoids the need for introduction of
a jiffies-squared unit by removing the extraneous multiplication.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2018-11-12 09:03:59 -08:00
Paul E. McKenney
f0ad56e876 rcu: Eliminate BUG_ON() for kernel/rcu/update.c
The update.c file has a number of calls to BUG_ON(), which panics the
kernel, which is not a good strategy for devices (like embedded) that
don't have a way to capture console output.  This commit therefore
converts these BUG_ON() calls to WARN_ON_ONCE() and WARN_ONCE().
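
A hedged sketch of the conversion pattern (the condition shown is
illustrative, not taken from update.c):

    /* Before: any violation panics the kernel, losing the console output. */
    BUG_ON(!rcu_is_watching());

    /* After: complain once but keep running, so the report can be captured. */
    WARN_ON_ONCE(!rcu_is_watching());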

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2018-11-12 08:15:59 -08:00
Paul E. McKenney
9213784b48 rcu: Eliminate BUG_ON() for kernel/rcu/tree_plugin.h
The tree_plugin.h file has a number of calls to BUG_ON(), which panics
the kernel, which is not a good strategy for devices (like embedded)
that don't have a way to capture console output.  This commit therefore
converts these BUG_ON() calls to WARN_ON_ONCE() and WARN_ONCE().

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
[ paulmck: Fix typo: s/rcuo/rcub/. ]
2018-11-12 08:15:16 -08:00
Paul E. McKenney
9cac83a57e rcu: Stop expedited grace periods from relying on stop-machine
The CPU-selection code in sync_rcu_exp_select_cpus() disables preemption
to prevent the cpu_online_mask from changing.  However, this relies on
the stop-machine mechanism in the CPU-hotplug offline code, which is not
desirable (it would be good to someday remove the stop-machine mechanism).

This commit therefore instead uses the relevant leaf rcu_node structure's
->ffmask, which has a bit set for all CPUs that are fully functional.
A given CPU's bit is cleared very early during offline processing by
rcutree_offline_cpu() and set very late during online processing by
rcutree_online_cpu().  Therefore, if a CPU's bit is set in this mask and
preemption is disabled, we must be running before the synchronize_sched() in
the CPU-hotplug offline code, which means that the CPU is guaranteed to be
workqueue-ready throughout the duration of the enclosing preempt_disable()
region of code.

This also has the side-effect of using WORK_CPU_UNBOUND if all the CPUs for
this leaf rcu_node structure are offline, which is an acceptable difference
in behavior.

Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-11-11 11:23:01 -08:00
Paul E. McKenney
0607ba8403 srcu: Prevent __call_srcu() counter wrap with read-side critical section
Ever since cdf7abc461 ("srcu: Allow use of Tiny/Tree SRCU from
both process and interrupt context"), it has been permissible
to use SRCU read-side critical sections in interrupt context.
This allows __call_srcu() to use SRCU read-side critical sections to
prevent a new SRCU grace period from ending before the call to either
srcu_funnel_gp_start() or srcu_funnel_exp_start() completes, thus preventing
SRCU grace-period counter overflow during that time.

Note that this does not permit removal of the counter-wrap checks in
srcu_gp_end().  These checks are necessary to handle the case where
a given CPU does not interact at all with SRCU for an extended time
period.

This commit therefore adds an SRCU read-side critical section to
__call_srcu() in order to prevent grace period counter wrap during
the funnel-locking process.
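
In sketch form (schematic, not the verbatim __call_srcu() code):

	int idx;

	idx = srcu_read_lock(ssp);	/* Pin the current SRCU grace period. */
	/* ... funnel-locking work, that is, srcu_funnel_gp_start() or
	 * srcu_funnel_exp_start(), runs here ... */
	srcu_read_unlock(ssp, idx);	/* The grace period may now complete. */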

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-11-08 21:54:14 -08:00
Paul E. McKenney
d3ff3891b2 rcu: Consolidate the RCU update functions invoked by sync.c
This commit retains all the various gp_ops[] entries, but makes their
update functions all be synchronize_rcu(), call_rcu() and rcu_barrier().
The read-side checks remain consistent with the various RCU flavors,
which still exist on the read side.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
2018-11-08 21:43:20 -08:00
Paul E. McKenney
309ba859b9 rcu: Eliminate synchronize_rcu_mult()
Now that synchronize_rcu() waits for both RCU read-side critical
sections and preempt-disabled regions of code, the sole caller of
synchronize_rcu_mult() can be replaced by synchronize_rcu().
This patch makes this change and removes synchronize_rcu_mult().
Note that _wait_rcu_gp() still supports synchronize_rcu_mult(),
and thus might be simplified in the future to take only
a single call_rcu() function rather than the current list of them.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-11-08 21:43:20 -08:00
Joel Fernandes (Google)
adbccddb4a rcu: Fix rcu_{node,data} comments about gp_seq_needed
Recent changes have removed the old ->gp_seq_needed field from the
rcu_state structure, which in turn obsoleted a couple of comments in
the rcu_node and rcu_data structures.  This commit therefore updates
these comments accordingly.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: <kernel-team@android.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2018-11-08 21:43:20 -08:00
Joel Fernandes (Google)
75a8f72245 rcu: Remove unused rcu_state externs
The rcu_bh_state and rcu_sched_state variables were removed during the
RCU flavor consolidations, but external declarations remain in tree.h.
This commit therefore removes these obsolete declarations.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: <kernel-team@android.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2018-11-08 21:43:20 -08:00
Paul E. McKenney
08543bda42 rcu: Eliminate BUG_ON() for kernel/rcu/tree.c
The tree.c file has a number of calls to BUG_ON(), which panics the
kernel, which is not a good strategy for devices (like embedded) that
don't have a way to capture console output.  This commit therefore
converts these BUG_ON() calls to WARN_ON_ONCE() and WARN_ONCE().

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2018-11-08 21:41:57 -08:00
Paul E. McKenney
042d4c70a2 rcu: Eliminate BUG_ON() for sync.c
The sync.c file has a number of calls to BUG_ON(), which panics the
kernel, which is not a good strategy for devices (like embedded) that
don't have a way to capture console output.  This commit therefore
changes these BUG_ON() calls to WARN_ON_ONCE(), but does so quite naively.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
2018-11-08 21:41:57 -08:00
Paul E. McKenney
b56ada1209 Merge branches 'doc.2018.08.30a', 'dynticks.2018.08.30b', 'srcu.2018.08.30b' and 'torture.2018.08.29a' into HEAD
doc.2018.08.30a: Documentation updates
dynticks.2018.08.30b: RCU flavor consolidation updates and cleanups
srcu.2018.08.30b: SRCU updates
torture.2018.08.29a: Torture-test updates
2018-08-30 16:12:53 -07:00
Paul E. McKenney
4e6ea4ef56 srcu: Make early-boot call_srcu() reuse workqueue lists
Allocating a list_head structure that is almost never used, and, when
used, is used only during early boot (rcu_init() and earlier), is a bit
wasteful.  This commit therefore eliminates that list_head in favor of
the one in the work_struct structure.  This is safe because the work_struct
structure cannot be used until after rcu_init() returns.

Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Tested-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-08-30 16:10:49 -07:00
Paul E. McKenney
e0fcba9ac0 srcu: Make call_srcu() available during very early boot
Event tracing is moving to SRCU in order to take advantage of the fact
that SRCU may be safely used from idle and even offline CPUs.  However,
event tracing can invoke call_srcu() very early in the boot process,
even before workqueue_init_early() is invoked (let alone rcu_init()).
Therefore, call_srcu()'s attempts to queue work fail miserably.

This commit therefore detects this situation, and refrains from attempting
to queue work before rcu_init() time, but does everything else that it
would have done, and in addition, adds the srcu_struct to a global list.
The rcu_init() function now invokes a new srcu_init() function, which
is empty if CONFIG_SRCU=n.  Otherwise, srcu_init() queues work for
each srcu_struct on the list.  This all happens early enough in boot
that there is but a single CPU with interrupts disabled, which allows
synchronization to be dispensed with.

Of course, the queued work won't actually be invoked until after
workqueue_init() is invoked, which happens shortly after the scheduler
is up and running.  This means that although call_srcu() may be invoked
any time after per-CPU variables have been set up, there is still a very
narrow window when synchronize_srcu() won't work, and this window
extends from the time that the scheduler starts until the time that
workqueue_init() returns.  This can be fixed in a manner similar to
the fix for synchronize_rcu_expedited() and friends, but until someone
actually needs to use synchronize_srcu() during this window, this fix
is added churn for no benefit.

Finally, note that Tree SRCU's new srcu_init() function invokes
queue_work() rather than the queue_delayed_work() function that is
invoked post-boot.  The reason is that queue_delayed_work() will (as you
would expect) post a timer, and timers have not yet been initialized.
So use of queue_work() avoids the complaints about use of uninitialized
spinlocks that would otherwise result.  Besides, some delay is already
provided by the aforementioned fact that the queued work won't actually
be invoked until after the scheduler is up and running.
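
In sketch form (the field, list, and workqueue names below are schematic
rather than the verbatim srcutree.c code):

	/* Invoked from rcu_init(), while only one CPU is up and interrupts
	 * are disabled, so no additional synchronization is needed.
	 */
	void __init srcu_init(void)
	{
		struct srcu_struct *ssp;

		while (!list_empty(&srcu_boot_list)) {
			ssp = list_first_entry(&srcu_boot_list,
					       struct srcu_struct, work.work.entry);
			list_del_init(&ssp->work.work.entry);
			/* queue_work(), not queue_delayed_work(): timers
			 * are not yet initialized this early in boot.
			 */
			queue_work(rcu_gp_wq, &ssp->work.work);
		}
	}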

Requested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-08-30 16:10:19 -07:00
Mike Galbraith
894d45bbf7 rcu: Convert rcu_state.ofl_lock to raw_spinlock_t
1e64b15a4b ("rcu: Fix grace-period hangs due to race with CPU offline")
added spinlock_t ofl_lock to the rcu_state structure, which is then taken
with preemption disabled during CPU offline, giving the -rt patchset's
sleeping spinlock heartburn.

This commit therefore converts ->ofl_lock to raw_spinlock_t.
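
Schematically, the call sites change as follows (the declaration change from
spinlock_t to raw_spinlock_t is not shown):

	/* Before: spinlock_t becomes a sleeping lock under PREEMPT_RT, so
	 * taking it with preemption disabled is a problem there.
	 */
	spin_lock(&rcu_state.ofl_lock);
	/* ... CPU-offline bookkeeping ... */
	spin_unlock(&rcu_state.ofl_lock);

	/* After: raw_spinlock_t remains a true spinning lock under -rt. */
	raw_spin_lock(&rcu_state.ofl_lock);
	/* ... CPU-offline bookkeeping ... */
	raw_spin_unlock(&rcu_state.ofl_lock);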

Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2018-08-30 16:03:54 -07:00
Paul E. McKenney
8d8a9d0e7e rcu: Remove obsolete ->dynticks_fqs and ->cond_resched_completed
The rcu_data structure's ->dynticks_fqs is incremented but never
accessed.  Its ->cond_resched_completed field isn't used at all.
This commit therefore removes both fields.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:53 -07:00
Paul E. McKenney
dc5a4f2932 rcu: Switch ->dynticks to rcu_data structure, remove rcu_dynticks
This commit moves ->dynticks from the rcu_dynticks structure to the
rcu_data structure, replacing the field of the same name.  It also updates
the code to access ->dynticks from the rcu_data structure and to use the
rcu_data structure rather than following the now-gone ->dynticks field
to the now-gone rcu_dynticks structure.  While in the area, this commit
also fixes up comments.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:52 -07:00
Paul E. McKenney
4c5273bf2b rcu: Switch dyntick nesting counters to rcu_data structure
This commit removes ->dynticks_nesting and ->dynticks_nmi_nesting from
the rcu_dynticks structure and updates the code to access them from the
rcu_data structure.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:51 -07:00
Paul E. McKenney
2dba13f0b6 rcu: Switch urgent quiescent-state requests to rcu_data structure
This commit removes ->rcu_need_heavy_qs and ->rcu_urgent_qs from the
rcu_dynticks structure and updates the code to access them from the
rcu_data structure.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:50 -07:00
Paul E. McKenney
c458a89e96 rcu: Switch lazy counts to rcu_data structure
This commit removes ->all_lazy, ->nonlazy_posted and ->nonlazy_posted_snap
from the rcu_dynticks structure and updates the code to access them from
the rcu_data structure.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:49 -07:00
Paul E. McKenney
5998a75adb rcu: Switch last accelerate/advance to rcu_data structure
This commit removes ->last_accelerate and ->last_advance_all from the
rcu_dynticks structure and updates the code to access them from the
rcu_data structure.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:48 -07:00
Paul E. McKenney
0fd79e7521 rcu: Switch ->tick_nohz_enabled_snap to rcu_data structure
This commit removes ->tick_nohz_enabled_snap from the rcu_dynticks
structure and updates the code to access it from the rcu_data
structure.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:47 -07:00
Paul E. McKenney
cc72046cc3 rcu: Merge rcu_dynticks structure into rcu_data structure
Now that there is only ever one rcu_data structure per CPU, there is no
need for a separate rcu_dynticks structure.  This commit therefore adds
the rcu_dynticks fields into the rcu_data structure in preparation for
removing the rcu_dynticks structure entirely.  Note that the ->dynticks
field will be handled specially because there is a field by that name
in both structures.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:47 -07:00
Paul E. McKenney
df63fa5bc1 rcu: Convert "1UL << x" to "BIT(x)"
This commit saves a few characters by converting "1UL << x" to "BIT(x)".
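
For example (hypothetical variable names):

	/* BIT() is defined via <linux/bits.h>; bitno is a placeholder. */
	mask = 1UL << bitno;	/* before */
	mask = BIT(bitno);	/* after: same value, clearer intent */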

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:46 -07:00
Paul E. McKenney
fced9c8cfe rcu: Avoid resched_cpu() when rescheduling the current CPU
The resched_cpu() interface is quite handy, but it does acquire the
specified CPU's runqueue lock, which does not come for free.  This
commit therefore substitutes the following when directing resched_cpu()
at the current CPU:

	set_tsk_need_resched(current);
	set_preempt_need_resched();

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
2018-08-30 16:03:45 -07:00
Paul E. McKenney
d3052109c0 rcu: More aggressively enlist scheduler aid for nohz_full CPUs
Because nohz_full CPUs can leave the scheduler-clock interrupt disabled
even when in kernel mode, RCU cannot rely on rcu_check_callbacks() to
enlist the scheduler's aid in extracting a quiescent state from such CPUs.
This commit therefore more aggressively uses resched_cpu() on nohz_full
CPUs that fail to pass through a quiescent state in a timely manner.
By default, the resched_cpu() beating starts 300 milliseconds into the
quiescent state.

While in the neighborhood, add a ->last_fqs_resched field to the rcu_data
structure in order to rate-limit resched_cpu() calls from the RCU
grace-period kthread.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:44 -07:00
Paul E. McKenney
c06aed0e31 rcu: Compute jiffies_till_sched_qs from other kernel parameters
The jiffies_till_sched_qs value is used to determine how old a grace period
must be before RCU enlists the help of the scheduler to force a quiescent
state on the holdout CPU.  Currently, this defaults to HZ/10 regardless of
system size and may be set only at boot time.  This can be a problem for
very large systems, because if the values of the jiffies_till_first_fqs
and jiffies_till_next_fqs kernel parameters are left at their defaults,
they are calculated to increase as the number of CPUs actually configured
on the system increases.  Thus, on a sufficiently large system, RCU would
enlist the help of the scheduler before the grace-period kthread had a
chance to scan for idle CPUs, which wastes CPU time.

This commit therefore allows jiffies_till_sched_qs to be set, if desired,
but, if left at its default, computes it as jiffies_till_first_fqs plus twice
jiffies_till_next_fqs, thus allowing three force-quiescent-state scans
for idle CPUs.  This scales with the number of CPUs, providing sensible
default values.
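
A sketch of the default computation (variable names follow this description
and may differ from the source; the real code also revisits the value when
the underlying parameters change):

	/* If the boot parameter was not specified, derive the threshold from
	 * the two FQS intervals, allowing roughly three force-quiescent-state
	 * scans for idle CPUs before the scheduler is enlisted.
	 */
	if (jiffies_till_sched_qs == ULONG_MAX)
		jiffies_to_sched_qs = READ_ONCE(jiffies_till_first_fqs) +
				      2 * READ_ONCE(jiffies_till_next_fqs);
	else
		jiffies_to_sched_qs = jiffies_till_sched_qs;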

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:43 -07:00
Paul E. McKenney
74de6960c9 rcu: Provide functions for determining if call_rcu() has been invoked
This commit adds rcu_head_init() and rcu_head_after_call_rcu() functions
to help RCU users detect when another CPU has passed the specified
rcu_head structure and function to call_rcu().  The rcu_head_init()
should be invoked before making the structure visible to RCU readers,
and then the rcu_head_after_call_rcu() may be invoked from within
an RCU read-side critical section on an rcu_head structure that
was obtained during a traversal of the data structure in question.
The rcu_head_after_call_rcu() function will return true if the rcu_head
structure has already been passed (with the specified function) to
call_rcu(), otherwise it will return false.

If rcu_head_init() has not been invoked on the rcu_head structure
or if the rcu_head (AKA callback) has already been invoked, then
rcu_head_after_call_rcu() will do WARN_ON_ONCE().
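
A hypothetical usage sketch (struct foo, foo_reclaim(), gp, and fp are
illustrative, not kernel code):

	struct foo {
		struct rcu_head rh;
		/* ... payload ... */
	};

	static void foo_reclaim(struct rcu_head *rhp)
	{
		kfree(container_of(rhp, struct foo, rh));
	}

	/* Updater: initialize before publishing the structure to readers. */
	rcu_head_init(&fp->rh);
	rcu_assign_pointer(gp, fp);

	/* Reader: check whether this object is already on its way out. */
	rcu_read_lock();
	fp = rcu_dereference(gp);
	if (fp && rcu_head_after_call_rcu(&fp->rh, foo_reclaim))
		pr_info("object already passed to call_rcu()\n");
	rcu_read_unlock();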

Reported-by: NeilBrown <neilb@suse.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Apply neilb naming feedback. ]
2018-08-30 16:03:42 -07:00
Paul E. McKenney
7e28c5af4e rcu: Eliminate ->rcu_qs_ctr from the rcu_dynticks structure
The ->rcu_qs_ctr counter was intended to allow providing a lightweight
report of a quiescent state to all RCU flavors.  But now that there is
only one flavor of RCU in any one running kernel, there is no point in
having this feature.  This commit therefore removes the ->rcu_qs_ctr
field from the rcu_dynticks structure and the ->rcu_qs_ctr_snap field
from the rcu_data structure.  This results in the "rqc" option to the
rcu_fqs trace event no longer being used, so this commit also removes the
"rqc" description from the header comment.

While in the neighborhood, this commit also causes the forward-progress
request .rcu_need_heavy_qs to be set one jiffies_till_sched_qs interval
later in the grace period than the first setting of .rcu_urgent_qs.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:42 -07:00
Paul E. McKenney
c5bacd9417 rcu: Motivate Tiny RCU forward progress
If a long-running CPU-bound in-kernel task invokes call_rcu(), the
callback won't be invoked until the next context switch.  If there are
no other runnable tasks (which is not an uncommon situation on deep
embedded systems), the callback might never be invoked.

This commit therefore causes rcu_check_callbacks() to ask the scheduler
for a context switch if there are callbacks posted that are still waiting
for a grace period.
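
Schematically (the rcu_ctrlblk field names are shown for illustration and
may differ in detail):

	/* Callbacks queued past the done-list still await a grace period,
	 * so nudge the scheduler toward a context switch.
	 */
	if (rcu_ctrlblk.donetail != rcu_ctrlblk.curtail) {
		set_tsk_need_resched(current);
		set_preempt_need_resched();
	}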

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:41 -07:00
Paul E. McKenney
c116dba68d rcutorture: Dump reader protection sequence if failures or close calls
Now that RCU can have readers with multiple segments, it is quite
possible that a specific sequence of reader segments might result in
an rcutorture failure (reader spans a full grace period as detected
by one of the grace-period primitives) or an rcutorture close call
(reader potentially spans a full grace period based on reading out
the RCU implementation's grace-period counter, but with no ordering).
In such cases, it would clearly ease debugging if the offending specific
sequence was known.  For the first reader encountering a failure or a
close call, this commit therefore dumps out the segments, delay durations,
and whether or not the reader was preempted.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Mark variables static, as suggested by kbuild test robot. ]
2018-08-30 16:03:40 -07:00
Paul E. McKenney
a0ef9ec241 rcu: Provide improved interrupt-from-idle check in rcu_check_callbacks()
The patch making need_resched() respond to urgent RCU-QS needs used
is_idle_task(current) to detect an interrupt from idle, which works
reasonably well, but is (in theory at least) vulnerable to loops containing
need_resched() invoked from within RCU_NONIDLE() or its tracepoint
equivalent.  This commit therefore moves rcu_is_cpu_rrupt_from_idle()
to a place from which rcu_check_callbacks() can invoke it and replaces
the is_idle_task(current) with rcu_is_cpu_rrupt_from_idle().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:39 -07:00
Paul E. McKenney
92aa39e9dc rcu: Make need_resched() respond to urgent RCU-QS needs
The per-CPU rcu_dynticks.rcu_urgent_qs variable communicates an urgent
need for an RCU quiescent state from the force-quiescent-state processing
within the grace-period kthread to context switches and to cond_resched().
Unfortunately, such urgent needs are not communicated to need_resched(),
which is sometimes used to decide when to invoke cond_resched(), for
but one example, within the KVM vcpu_run() function.  As of v4.15, this
can result in synchronize_sched() being delayed by up to ten seconds,
which can be problematic, to say nothing of annoying.

This commit therefore checks rcu_dynticks.rcu_urgent_qs from within
rcu_check_callbacks(), which is invoked from the scheduling-clock
interrupt handler.  If the current task is not an idle task and is
not executing in usermode, a context switch is forced, and either way,
the rcu_dynticks.rcu_urgent_qs variable is set to false.  If the current
task is an idle task, then RCU's dyntick-idle code will detect the
quiescent state, so no further action is required.  Similarly, if the
task is executing in usermode, other code in rcu_check_callbacks() and
its called functions will report the corresponding quiescent state.
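
In rough outline (schematic, not the verbatim rcu_check_callbacks() code):

	/* "user" indicates a scheduling-clock interrupt from usermode. */
	if (smp_load_acquire(this_cpu_ptr(&rcu_dynticks.rcu_urgent_qs))) {
		if (!is_idle_task(current) && !user) {
			/* Force a context switch to obtain the needed
			 * quiescent state.
			 */
			set_tsk_need_resched(current);
			set_preempt_need_resched();
		}
		__this_cpu_write(rcu_dynticks.rcu_urgent_qs, false);
	}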

Reported-by: Marius Hillenbrand <mhillenb@amazon.de>
Reported-by: David Woodhouse <dwmw2@infradead.org>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:39 -07:00
Paul E. McKenney
dd46a7882c rcu: Inline _rcu_barrier() into its sole remaining caller
Because rcu_barrier() is a one-line wrapper function for _rcu_barrier()
and because nothing else calls _rcu_barrier(), this commit inlines
_rcu_barrier() into rcu_barrier().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:39 -07:00
Paul E. McKenney
395a2f097e rcu: Define rcu_all_qs() only in !PREEMPT builds
Now that rcu_all_qs() is used only in !PREEMPT builds, move it to
tree_plugin.h so that it is defined only in those builds.  This in
turn means that rcu_momentary_dyntick_idle() is only used in !PREEMPT
builds, but it is simply marked __maybe_unused in order to keep it
near the rest of the dyntick-idle code.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:37 -07:00
Paul E. McKenney
06462efc80 rcu: Clean up flavor-related definitions and comments in update.c
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:36 -07:00
Paul E. McKenney
0ae86a2726 rcu: Clean up flavor-related definitions and comments in tree_plugin.h
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:35 -07:00
Paul E. McKenney
8fa946d428 rcu: Clean up flavor-related definitions and comments in tree_exp.h
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:35 -07:00
Paul E. McKenney
49918a54e6 rcu: Clean up flavor-related definitions and comments in tree.c
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:34 -07:00
Paul E. McKenney
679d3f3092 rcu: Clean up flavor-related definitions and comments in tiny.c
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:34 -07:00
Paul E. McKenney
6eb95cc450 rcu: Clean up flavor-related definitions and comments in srcutree.h
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:34 -07:00
Paul E. McKenney
62a1a94536 rcu: Clean up flavor-related definitions and comments in rcutorture.c
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:33 -07:00
Paul E. McKenney
7f87c036fe rcu: Clean up flavor-related definitions and comments in rcu.h
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:33 -07:00
Paul E. McKenney
8c1cf2da6f rcu: Clean up flavor-related definitions and comments in Kconfig
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:32 -07:00
Paul E. McKenney
de3875d302 rcu: Remove now-unused rcutorture APIs
This commit removes rcu_sched_get_gp_seq(), rcu_bh_get_gp_seq(),
rcu_exp_batches_completed_sched(), rcu_sched_force_quiescent_state(),
and rcu_bh_force_quiescent_state(), which are no longer used because
rcutorture no longer does "rcu_bh" and "rcu_sched" torture types.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:30 -07:00
Paul E. McKenney
620d246065 rcuperf: Remove the "rcu_bh" and "sched" torture types
Now that the RCU-bh and RCU-sched update-side functions are simple
wrappers around their RCU counterparts, there isn't a whole lot of point
in testing them.  This commit therefore removes the "rcu_bh" and "sched"
torture types from rcuperf.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:30 -07:00
Paul E. McKenney
c770c82a23 rcutorture: Remove the "rcu_bh" and "sched" torture types
Now that the RCU-bh and RCU-sched update-side functions are simple
wrappers around their RCU counterparts, there isn't a whole lot of point
in testing them.  This commit therefore removes the "rcu_bh" and "sched"
torture types from rcutorture.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:29 -07:00
Paul E. McKenney
72ce30dd1f rcu: Stop testing RCU-bh and RCU-sched
Now that the RCU-bh and RCU-sched update-side functions are simple
wrappers around their RCU counterparts, there isn't a whole lot of
point in testing them.  This commit therefore removes the self-test
capability and removes the corresponding kernel-boot parameters.
It also updates the various rcutorture .boot files to remove the
kernel boot parameters that call for testing RCU-bh and RCU-sched.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:29 -07:00
Paul E. McKenney
2ceebc0350 rcutorture: Add RCU-bh and RCU-sched support for extended readers
Since there is now a single consolidated RCU flavor, rcutorture
needs to test extending of RCU readers via rcu_read_lock_bh() and
rcu_read_lock_sched().  This commit adds this support, with added checks
(just like for local_bh_enable()) to ensure that rcu_read_unlock_bh()
will not be invoked while interrupts are disabled.
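
For example, an extended reader might schematically look like this, with the
added check guarding the BH-reenabling exit:

	/* One possible extended-reader segment. */
	rcu_read_lock();
	rcu_read_lock_bh();
	/* ... reader body ... */
	WARN_ON_ONCE(irqs_disabled());	/* Mirrors the local_bh_enable() rule. */
	rcu_read_unlock_bh();
	rcu_read_unlock();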

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:27 -07:00
Paul E. McKenney
a8bb74acd8 rcu: Consolidate RCU-sched update-side function definitions
This commit saves a few lines by consolidating the RCU-sched function
definitions at the end of include/linux/rcupdate.h.  This consolidation
also makes it easier to remove them all when the time comes.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:26 -07:00
Paul E. McKenney
4c7e9c1434 rcu: Consolidate RCU-bh update-side function definitions
This commit saves a few lines by consolidating the RCU-bh function
definitions at the end of include/linux/rcupdate.h.  This consolidation
also makes it easier to remove them all when the time comes.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:25 -07:00
Paul E. McKenney
c3854a055b rcu: Pull rcu_gp_kthread() FQS loop into separate function
The rcu_gp_kthread() function is long and deeply indented, so this
commit pulls the loop that repeatedly invokes rcu_gp_fqs() into a new
rcu_gp_fqs_loop() function.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:24 -07:00
Paul E. McKenney
4e95020cdd rcu: Inline increment_cpu_stall_ticks() into its sole caller
Consolidation of the RCU flavors into one makes increment_cpu_stall_ticks()
a trivial one-line function with only one caller.  This commit therefore
inlines it.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:23 -07:00
Paul E. McKenney
8ff0b90780 rcu: Fix typo in force_qs_rnp()'s parameter's parameter
Pointers to rcu_data structures should be named rdp, not rsp.  This
commit therefore makes this change.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:23 -07:00
Paul E. McKenney
eb7a665388 rcu: Eliminate initialization-time use of rsp
Now that there is only one rcu_state structure, there is less point in
maintaining a pointer to it.  This commit therefore replaces rsp with
&rcu_state in rcu_cpu_starting() and rcu_init_one().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:22 -07:00
Paul E. McKenney
ec9f5835f7 rcu: Eliminate RCU-barrier use of rsp
Now that there is only one rcu_state structure, there is less point
in maintaining a pointer to it.  This commit therefore replaces rsp
with &rcu_state in rcu_barrier_callback(), rcu_barrier_func(), and
_rcu_barrier().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:22 -07:00
Paul E. McKenney
67a0edbf3c rcu: Eliminate quiescent-state and grace-period-nonstart use of rsp
Now that there is only one rcu_state structure, there is less point in
maintaining a pointer to it.  This commit therefore replaces rsp with
&rcu_state in rcu_report_qs_rnp(), force_quiescent_state(), and
rcu_check_gp_start_stall().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:21 -07:00
Paul E. McKenney
3c779dfef2 rcu: Eliminate callback-invocation/invocation use of rsp
Now that there is only one rcu_state structure, there is less point in
maintaining a pointer to it.  This commit therefore replaces rsp with
&rcu_state in rcu_do_batch(), invoke_rcu_callbacks(), and __call_rcu().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:21 -07:00
Paul E. McKenney
9cbc5b9702 rcu: Eliminate grace-period management code use of rsp
Now that there is only one rcu_state structure, there is less point
in maintaining a pointer to it.  This commit therefore replaces
rsp with &rcu_state in rcu_start_this_gp(), rcu_accelerate_cbs(),
__note_gp_changes(), rcu_gp_init(), rcu_gp_fqs(), rcu_gp_cleanup(),
rcu_gp_kthread(), and rcu_report_qs_rsp().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:20 -07:00
Paul E. McKenney
4c6ed43708 rcu: Eliminate stall-warning use of rsp
Now that there is only one rcu_state structure, there is less point
in maintaining a pointer to it.  This commit therefore replaces rsp
with &rcu_state in print_other_cpu_stall(), print_cpu_stall(), and
check_cpu_stall().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:20 -07:00
Paul E. McKenney
7cba4775ba rcu: Restructure rcu_check_gp_kthread_starvation()
This commit removes the rsp and gpa local variables, repurposes the j
local variable and adds a gpk (GP kthread) local to improve readability.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:19 -07:00
Paul E. McKenney
f7dd7d44fd rcu: Simplify rcutorture_get_gp_data()
This commit restructures rcutorture_get_gp_data() to take advantage of
the fact that there is only one flavor of RCU.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:19 -07:00
Paul E. McKenney
b97d23c51c rcu: Remove for_each_rcu_flavor() flavor-traversal macro
Now that there is only ever a single flavor of RCU in a given kernel
build, there isn't a whole lot of point in having a flavor-traversal
macro.  This commit therefore removes it and converts calls to it to
straightline code, inlining trivial functions as appropriate.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:18 -07:00
Paul E. McKenney
564a9ae604 rcu: Remove last non-flavor-traversal rsp local variable from tree_plugin.h
This commit removes the last non-flavor-traversal rsp local variable from
kernel/rcu/tree_plugin.h in favor of &rcu_state.  The flavor-traversal
locals will be removed with the removal of flavor traversal.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:17 -07:00
Paul E. McKenney
88d1bead85 rcu: Remove rcu_data structure's ->rsp field
Now that there is only one rcu_state structure, there is no need for the
rcu_data structure to indicate which it corresponds to.  This commit
therefore removes the rcu_data structure's ->rsp field, replacing all
remaining uses of it with &rcu_state.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:17 -07:00
Paul E. McKenney
aedf4ba984 rcu: Remove rsp parameter from rcu_node tree accessor macros
There now is only one rcu_state structure in a given build of the Linux
kernel, so there is no need to pass it as a parameter to RCU's rcu_node
tree's accessor macros.  This commit therefore removes the rsp parameter
from those macros in kernel/rcu/rcu.h, and removes some now-unused rsp
local variables while in the area.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:16 -07:00
Paul E. McKenney
63d4c8c979 rcu: Remove rsp parameter from expedited grace-period functions
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to
RCU's functions.  This commit therefore removes the rsp parameter
from the code in kernel/rcu/tree_exp.h, and removes all of the
rsp local variables while in the area.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:14 -07:00
Paul E. McKenney
4580b0541b rcu: Remove rsp parameter from no-CBs CPU functions
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to
RCU's functions.  This commit therefore removes the rsp parameter
from rcu_nocb_cpu_needs_barrier(), rcu_spawn_one_nocb_kthread(),
rcu_organize_nocb_kthreads(), and rcu_nohz_full_cpu().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:13 -07:00
Paul E. McKenney
b21ebed951 rcu: Remove rsp parameter from print_cpu_stall_info()
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions.  This commit therefore removes the rsp parameter from
print_cpu_stall_info().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:12 -07:00
Paul E. McKenney
6dbfdc1409 rcu: Remove rsp parameter from rcu_spawn_one_boost_kthread()
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions.  This commit therefore removes the rsp parameter from
rcu_spawn_one_boost_kthread().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:11 -07:00
Paul E. McKenney
81ab59a3ad rcu: Remove rsp parameter from dump_blkd_tasks() and friend
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions.  This commit therefore removes the rsp parameter from
dump_blkd_tasks() and rcu_preempt_blocked_readers_cgp().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:10 -07:00
Paul E. McKenney
a2887cd85f rcu: Remove rsp parameter from rcu_print_detail_task_stall()
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions.  This commit therefore removes the rsp parameter from
rcu_print_detail_task_stall().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:09 -07:00
Paul E. McKenney
b8bb1f63cf rcu: Remove rsp parameter from rcu_init_one() and friends
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions.  This commit therefore removes the rsp parameter from
rcu_init_one() and rcu_dump_rcu_node_tree().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:09 -07:00
Paul E. McKenney
53b46303da rcu: Remove rsp parameter from rcu_boot_init_percpu_data() and friends
There now is only one rcu_state structure in a given build of
the Linux kernel, so there is no need to pass it as a parameter
to RCU's functions.  This commit therefore removes the rsp
parameter from rcu_boot_init_percpu_data(), rcu_init_percpu_data(),
rcu_cleanup_dying_idle_cpu(), and rcu_migrate_callbacks().  While in
the neighborhood, inline the last three into rcutree_prepare_cpu(),
rcu_report_dead() and rcutree_migrate_callbacks(), respectively.
This also gets rid of the for_each_rcu_flavor() calls that were in
those tree functions.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:08 -07:00
Paul E. McKenney
8344b871b1 rcu: Remove rsp parameter from _rcu_barrier() and friends
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions.  This commit therefore removes the rsp parameter from
_rcu_barrier_trace() and _rcu_barrier().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:08 -07:00
Paul E. McKenney
98ece508b5 rcu: Remove rsp parameter from __rcu_pending()
There now is only one rcu_state structure in a given build of the Linux
kernel, so there is no need to pass it as a parameter to RCU's functions.
This commit therefore removes the rsp parameter from __rcu_pending(),
and also inlines it into rcu_pending(), removing the for_each_rcu_flavor()
while in the neighborhood.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:07 -07:00
Paul E. McKenney
5c7d89676b rcu: Remove rsp parameter from __call_rcu() and friend
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions.  This commit therefore removes the rsp parameter from
__call_rcu_core() and __call_rcu().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:07 -07:00
Paul E. McKenney
b049fdf8e3 rcu: Remove rsp parameter from __rcu_process_callbacks()
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions.  This commit therefore removes the rsp parameter from
__rcu_process_callbacks(), and also inlines it into rcu_process_callbacks(),
removing the for_each_rcu_flavor() while in the neighborhood.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:06 -07:00
Paul E. McKenney
b96f9dc4fb rcu: Remove rsp parameter from rcu_check_gp_start_stall()
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions.  This commit therefore removes the rsp parameter from
rcu_check_gp_start_stall().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:06 -07:00
Paul E. McKenney
e9ecb780fe rcu: Remove rsp parameter from force-quiescent-state functions
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions.  This commit therefore removes the rsp parameter from
force_qs_rnp() and force_quiescent_state().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:05 -07:00
Paul E. McKenney
5bb5d09cc4 rcu: Remove rsp parameter from rcu_do_batch()
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions.  This commit therefore removes the rsp parameter from
rcu_do_batch().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:05 -07:00
Paul E. McKenney
780cd59083 rcu: Remove rsp parameter from CPU hotplug functions
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions.  This commit therefore removes the rsp parameter from
rcu_cleanup_dying_cpu() and rcu_cleanup_dead_cpu().  And, as long as
we are in the neighborhood, inlines them into rcutree_dying_cpu() and
rcutree_dead_cpu(), respectively.  This also eliminates a pair of
for_each_rcu_flavor() loops.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:04 -07:00
Paul E. McKenney
8087d3e3c4 rcu: Remove rsp parameter from rcu_check_quiescent_state()
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions.  This commit therefore removes the rsp parameter from
rcu_check_quiescent_state().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:04 -07:00
Paul E. McKenney
0854a05c9f rcu: Remove rsp parameter from rcu_gp_kthread() and friends
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions.  This commit therefore removes the rsp parameter from
rcu_gp_init(), rcu_gp_fqs_check_wake(), rcu_gp_fqs(), rcu_gp_cleanup(),
and rcu_gp_kthread().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:03 -07:00
Paul E. McKenney
22212332c1 rcu: Remove rsp parameter from rcu_gp_slow()
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions.  This commit therefore removes the rsp parameter from
rcu_gp_slow().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:03 -07:00
Paul E. McKenney
15cabdffbb rcu: Remove rsp parameter from note_gp_changes()
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions.  This commit therefore removes the rsp parameter from
note_gp_changes().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:02 -07:00
Paul E. McKenney
c7e48f7ba3 rcu: Remove rsp parameter from __note_gp_changes()
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions.  This commit therefore removes the rsp parameter from
__note_gp_changes().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:01 -07:00
Paul E. McKenney
834f56bf54 rcu: Remove rsp parameter from rcu_advance_cbs()
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions.  This commit therefore removes the rsp parameter from
rcu_advance_cbs().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:01 -07:00
Paul E. McKenney
c6e09b97b9 rcu: Remove rsp parameter from rcu_accelerate_cbs_unlocked()
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions.  This commit therefore removes the rsp parameter from
rcu_accelerate_cbs_unlocked().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:00 -07:00
Paul E. McKenney
02f501423d rcu: Remove rsp parameter from rcu_accelerate_cbs()
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions.  This commit therefore removes the rsp parameter from
rcu_accelerate_cbs().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:00 -07:00
Paul E. McKenney
532c00c97f rcu: Remove rsp parameter from rcu_gp_kthread_wake()
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions.  This commit therefore removes the rsp parameter from
rcu_gp_kthread_wake().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:02:59 -07:00
Paul E. McKenney
3481f2eab0 rcu: Remove rsp parameter from rcu_future_gp_cleanup()
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions.  This commit therefore removes the rsp parameter from
rcu_future_gp_cleanup().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:02:59 -07:00
Paul E. McKenney
ea12ff2b7d rcu: Remove rsp parameter from check_cpu_stall()
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions.  This commit therefore removes the rsp parameter from
check_cpu_stall().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:02:58 -07:00
Paul E. McKenney
4e8b8e08f9 rcu: Remove rsp parameter from print_cpu_stall()
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions.  This commit therefore removes the rsp parameter from
print_cpu_stall().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:02:58 -07:00
Paul E. McKenney
a91e7e58b1 rcu: Remove rsp parameter from print_other_cpu_stall()
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions.  This commit therefore removes the rsp parameter from
print_other_cpu_stall().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:02:57 -07:00
Paul E. McKenney
e1741c69d4 rcu: Remove rsp parameter from rcu_stall_kick_kthreads()
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions.  This commit therefore removes the rsp parameter from
rcu_stall_kick_kthreads().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:02:57 -07:00
Paul E. McKenney
33dbdbf025 rcu: Remove rsp parameter from rcu_dump_cpu_stacks()
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions.  This commit therefore removes the rsp parameter from
rcu_dump_cpu_stacks().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:02:56 -07:00
Paul E. McKenney
8fd119b652 rcu: Remove rsp parameter from rcu_check_gp_kthread_starvation()
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions.  This commit therefore removes the rsp parameter from
rcu_check_gp_kthread_starvation().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:02:56 -07:00
Paul E. McKenney
ad3832e974 rcu: Remove rsp parameter from record_gp_stall_check_time()
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions.  This commit therefore removes the rsp parameter from
record_gp_stall_check_time().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:02:55 -07:00
Paul E. McKenney
336a4f6c45 rcu: Remove rsp parameter from rcu_get_root()
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions.  This commit therefore removes the rsp parameter from
rcu_get_root().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:02:55 -07:00
Paul E. McKenney
de8e87305a rcu: Remove rsp parameter from rcu_gp_in_progress()
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions.  This commit therefore removes the rsp parameter from
rcu_gp_in_progress().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:02:54 -07:00
Paul E. McKenney
33085c469a rcu: Remove rsp parameter from rcu_report_qs_rdp()
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions.  This commit therefore removes the rsp parameter from
rcu_report_qs_rdp().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:02:53 -07:00
Paul E. McKenney
139ad4da5a rcu: Remove rsp parameter from rcu_report_unblock_qs_rnp()
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions.  This commit therefore removes the rsp parameter from
rcu_report_unblock_qs_rnp(), which is particularly appropriate in
this case given that this parameter is no longer used.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:02:53 -07:00
Paul E. McKenney
aff4e9ede5 rcu: Remove rsp parameter from rcu_report_qs_rsp()
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions.  This commit therefore removes the rsp parameter from
rcu_report_qs_rsp().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:02:52 -07:00
Paul E. McKenney
b50912d0b5 rcu: Remove rsp parameter from rcu_report_qs_rnp()
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions.  This commit therefore removes the rsp parameter from
rcu_report_qs_rnp().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:02:51 -07:00
Paul E. McKenney
2280ee5a7d rcu: Remove rcu_data_p pointer to default rcu_data structure
The rcu_data_p pointer references the default set of per-CPU rcu_data
structures, that is, those that call_rcu() uses, as opposed to
call_rcu_bh() and sometimes call_rcu_sched().  But there is now only one
set of per-CPU rcu_data structures, so that one set is by definition
the default, which means that the rcu_data_p pointer no longer serves
any useful purpose.  This commit therefore removes it.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:02:51 -07:00
Paul E. McKenney
16fc9c600b rcu: Remove rcu_state_p pointer to default rcu_state structure
The rcu_state_p pointer references the default rcu_state structure,
that is, the one that call_rcu() uses, as opposed to call_rcu_bh()
and sometimes call_rcu_sched().  But there is now only one rcu_state
structure, so that one structure is by definition the default, which
means that the rcu_state_p pointer no longer serves any useful purpose.
This commit therefore removes it.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:02:50 -07:00
Paul E. McKenney
da1df50d16 rcu: Remove rcu_state structure's ->rda field
The rcu_state structure's ->rda field was used to find the per-CPU
rcu_data structures corresponding to that rcu_state structure.  But now
there is only one rcu_state structure (creatively named "rcu_state")
and one set of per-CPU rcu_data structures (creatively named "rcu_data").
Therefore, uses of the ->rda field can always be replaced by "rcu_data",
and this commit makes that change and removes the ->rda field.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:02:49 -07:00
Paul E. McKenney
ec5dd444b6 rcu: Eliminate rcu_state structure's ->call field
The rcu_state structure's ->call field references the corresponding RCU
flavor's call_rcu() function.  However, now that there is only ever one
rcu_state structure in a given build of the Linux kernel, and that flavor
uses plain old call_rcu(), there is not a lot of point in continuing to
have the ->call field.  This commit therefore removes it.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:02:48 -07:00
Paul E. McKenney
358be2d368 rcu: Remove RCU_STATE_INITIALIZER()
Now that a given build of the Linux kernel has only one set of rcu_state,
rcu_node, and rcu_data structures, there is no point in creating a macro
to declare and compile-time initialize them.  This commit therefore
just does normal declaration and compile-time initialization of these
structures.  While in the area, this commit also removes #ifndefs of
the no-longer-ever-defined preprocessor macro RCU_TREE_NONCORE.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:02:47 -07:00
Paul E. McKenney
709fdce754 rcu: Express Tiny RCU updates in terms of RCU rather than RCU-sched
This commit renames Tiny RCU functions so that the lowest level of
functionality is RCU (e.g., synchronize_rcu()) rather than RCU-sched
(e.g., synchronize_sched()).  This provides greater naming compatibility
with Tree RCU, which will in turn permit more LoC removal once
the RCU-sched and RCU-bh update-side API is removed.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Fix Tiny call_rcu()'s EXPORT_SYMBOL() in response to a bug
  report from kbuild test robot. ]
2018-08-30 16:02:46 -07:00
Paul E. McKenney
45975c7d21 rcu: Define RCU-sched API in terms of RCU for Tree RCU PREEMPT builds
Now that RCU-preempt knows about preemption disabling, its implementation
of synchronize_rcu() works for synchronize_sched(), and likewise for the
other RCU-sched update-side API members.  This commit therefore confines
the RCU-sched update-side code to CONFIG_PREEMPT=n builds, and defines
RCU-sched's update-side API members in terms of those of RCU-preempt.

This means that any given build of the Linux kernel has only one
update-side flavor of RCU, namely RCU-preempt for CONFIG_PREEMPT=y builds
and RCU-sched for CONFIG_PREEMPT=n builds.  This in turn means that kernels
built with CONFIG_RCU_NOCB_CPU=y have only one rcuo kthread per CPU.
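
In CONFIG_PREEMPT=y kernels, the RCU-sched update-side API then reduces to
thin wrappers, roughly of this form:

	static inline void synchronize_sched(void)
	{
		synchronize_rcu();
	}

	static inline void call_rcu_sched(struct rcu_head *head,
					  rcu_callback_t func)
	{
		call_rcu(head, func);
	}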

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
2018-08-30 16:02:45 -07:00
Paul E. McKenney
4cf439a200 rcu: Fix typo in rcu_get_gp_kthreads_prio() header comment
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:02:43 -07:00
Paul E. McKenney
2bbfc25b09 rcu: Drop "wake" parameter from rcu_report_exp_rdp()
The rcu_report_exp_rdp() function is always invoked with its "wake"
argument set to "true", so this commit drops this parameter.  The only
potential call site that would use "false" is in the code driving the
expedited grace period, and that code uses rcu_report_exp_cpu_mult()
instead, which therefore retains its "wake" parameter.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:02:43 -07:00
Paul E. McKenney
82fcecfa81 rcu: Update comments and help text for no more RCU-bh updaters
This commit updates comments and help text to account for the fact that
RCU-bh update-side functions are now simple wrappers for their RCU or
RCU-sched counterparts.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:02:42 -07:00
Paul E. McKenney
65cfe3583b rcu: Define RCU-bh update API in terms of RCU
Now that the main RCU API knows about softirq disabling and softirq's
quiescent states, the RCU-bh update code can be dispensed with.
This commit therefore removes the RCU-bh update-side implementation and
defines RCU-bh's update-side API in terms of that of either RCU-preempt or
RCU-sched, depending on the setting of the CONFIG_PREEMPT Kconfig option.

In kernels built with CONFIG_RCU_NOCB_CPU=y this has the knock-on effect
of reducing by one the number of rcuo kthreads per CPU.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:02:40 -07:00
Paul E. McKenney
ba1c64c272 rcu: Report expedited grace periods at context-switch time
This commit reduces the latency of expedited RCU grace periods by
reporting a quiescent state for the CPU at context-switch time.
In CONFIG_PREEMPT=y kernels, if the outgoing task is still within an
RCU read-side critical section (and thus still blocking some grace
period, perhaps including this expedited grace period), then that task
will already have been placed on one of the leaf rcu_node structures'
->blkd_tasks list.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:02:38 -07:00
Paul E. McKenney
d28139c4e9 rcu: Apply RCU-bh QSes to RCU-sched and RCU-preempt when safe
One necessary step towards consolidating the three flavors of RCU is to
make sure that the resulting consolidated "one flavor to rule them all"
correctly handles networking denial-of-service attacks.  One thing that
allows RCU-bh to do so is that __do_softirq() invokes rcu_bh_qs() every
so often, and so something similar has to happen for consolidated RCU.

This must be done carefully.  For example, if a preemption-disabled
region of code takes an interrupt which does softirq processing before
returning, consolidated RCU must ignore the resulting rcu_bh_qs()
invocations -- preemption is still disabled, and that means an RCU
reader for the consolidated flavor.

This commit therefore creates a new rcu_softirq_qs() that is called only
from the ksoftirqd task, thus avoiding the interrupted-a-preempted-region
problem.  This new rcu_softirq_qs() function invokes rcu_sched_qs(),
rcu_preempt_qs(), and rcu_preempt_deferred_qs().  The latter call handles
any deferred quiescent states.

Note that __do_softirq() still invokes rcu_bh_qs().  It will continue to
do so until a later stage of cleanup when the RCU-bh flavor is removed.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Fix !SMP issue located by kbuild test robot. ]
2018-08-30 16:02:38 -07:00
Paul E. McKenney
e11ec65cc8 rcu: Add warning to detect half-interrupts
RCU's dyntick-idle code is written to tolerate half-interrupts, that is,
either an interrupt that invokes rcu_irq_enter() but never invokes the
corresponding rcu_irq_exit() on the one hand, or an interrupt that never
invokes rcu_irq_enter() but does invoke the "corresponding" rcu_irq_exit()
on the other.  These things really did happen at one time, as evidenced
by this ca-2011 LKML post:

http://lkml.kernel.org/r/20111014170019.GE2428@linux.vnet.ibm.com

The reason why RCU tolerates half-interrupts is that usermode helpers
used exceptions to invoke a system call from within the kernel such that
the system call did a normal return (not a return from exception) to
the calling context.  This caused rcu_irq_enter() to be invoked without
a matching rcu_irq_exit().  However, usermode helpers have since been
rewritten to make much more housebroken use of workqueues, kernel threads,
and do_execve(), and therefore should no longer produce half-interrupts.
No one knows of any other source of half-interrupts, but then again,
no one seems insane enough to go audit the entire kernel to verify that
half-interrupts really are a relic of the past.

This commit therefore adds a pair of WARN_ON_ONCE() calls that will
trigger in the presence of half-interrupts, which the code will continue
to handle correctly.  If neither of these WARN_ON_ONCE() calls triggers by
mid-2021, then perhaps RCU can stop handling half-interrupts, which
would be a considerable simplification.
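
As a rough illustration only (assuming the checks sit on the idle
entry/exit paths and that DYNTICK_IRQ_NONIDLE is the expected nesting
value there; both are assumptions here, not a quote of the patch), each
check could take a form like:

	/* Hedged sketch: warn if interrupt nesting is unbalanced at idle entry/exit. */
	WARN_ON_ONCE(rdtp->dynticks_nmi_nesting != DYNTICK_IRQ_NONIDLE);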

Reported-by: Steven Rostedt <rostedt@goodmis.org>
Reported-by: Joel Fernandes <joel@joelfernandes.org>
Reported-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
2018-08-30 16:02:36 -07:00
Paul E. McKenney
fcc878e4df rcu: Remove now-unused ->b.exp_need_qs field from the rcu_special union
The ->b.exp_need_qs field is now set only to false, so this commit
removes it.  The job this field used to do is now done by the rcu_data
structure's ->deferred_qs field, which is a consequence of a better
split between task-based (the rcu_node structure's ->exp_tasks field) and
CPU-based (the aforementioned rcu_data structure's ->deferred_qs field)
tracking of quiescent states for RCU-preempt expedited grace periods.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:02:36 -07:00
Paul E. McKenney
27c744e32a rcu: Allow processing deferred QSes for exiting RCU-preempt readers
If an RCU-preempt read-side critical section is exiting, that is,
->rcu_read_lock_nesting is negative, then it is a good time to look
at the possibility of reporting deferred quiescent states.  This
commit therefore updates the checks in rcu_preempt_need_deferred_qs()
to allow exiting critical sections to report deferred quiescent states.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:02:35 -07:00
Paul E. McKenney
c0335743c5 rcutorture: Test extended "rcu" read-side critical sections
This commit makes the "rcu" torture type test extended read-side
critical sections in order to test the deferral of RCU-preempt
quiescent-state testing.

In CONFIG_PREEMPT=n kernels, this simply duplicates the setup already
in place for the "sched" torture type.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:02:35 -07:00
Paul E. McKenney
3e31009898 rcu: Defer reporting RCU-preempt quiescent states when disabled
This commit defers reporting of RCU-preempt quiescent states at
rcu_read_unlock_special() time when any of interrupts, softirq, or
preemption are disabled.  These deferred quiescent states are reported
at a later RCU_SOFTIRQ, context switch, idle entry, or CPU-hotplug
offline operation.  Of course, if another RCU read-side critical
section has started in the meantime, the reporting of the quiescent
state will be further deferred.

This also means that disabling preemption, interrupts, and/or
softirqs will act as an RCU-preempt read-side critical section.
This is enforced by checking preempt_count() as needed.

Some special cases must be handled on an ad-hoc basis, for example,
context switch is a quiescent state even though both the scheduler and
do_exit() disable preemption.  In these cases, additional calls to
rcu_preempt_deferred_qs() override the preemption disabling.  Similar
logic overrides disabled interrupts in rcu_preempt_check_callbacks()
because in this case the quiescent state happened just before the
corresponding scheduling-clock interrupt.

In theory, this change lifts a long-standing restriction that required
that, if interrupts were disabled across a call to rcu_read_unlock(),
the matching rcu_read_lock() also be contained within that
interrupts-disabled region of code.  Because the reporting of the
corresponding RCU-preempt quiescent state is now deferred until
after interrupts have been enabled, it is no longer possible for this
situation to result in deadlocks involving the scheduler's runqueue and
priority-inheritance locks.  This may allow some code simplification that
might reduce interrupt latency a bit.  Unfortunately, in practice this
would also defer deboosting a low-priority task that had been subjected
to RCU priority boosting, so real-time-response considerations might
well force this restriction to remain in place.

Because RCU-preempt grace periods are now blocked not only by RCU
read-side critical sections, but also by disabling of interrupts,
preemption, and softirqs, it will be possible to eliminate RCU-bh and
RCU-sched in favor of RCU-preempt in CONFIG_PREEMPT=y kernels.  This may
require some additional plumbing to provide the network denial-of-service
guarantees that have been traditionally provided by RCU-bh.  Once these
are in place, CONFIG_PREEMPT=n kernels will be able to fold RCU-bh
into RCU-sched.  This would mean that all kernels would have but
one flavor of RCU, which would open the door to significant code
cleanup.

Moving to a single flavor of RCU would also have the beneficial effect
of reducing the NOCB kthreads by at least a factor of two.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Apply rcu_read_unlock_special() preempt_count() feedback
  from Joel Fernandes. ]
[ paulmck: Adjust rcu_eqs_enter() call to rcu_preempt_deferred_qs() in
  response to bug reports from kbuild test robot. ]
[ paulmck: Fix bug located by kbuild test robot involving recursion
  via rcu_preempt_deferred_qs(). ]
2018-08-30 16:02:34 -07:00
Byungchul Park
cf7614e13c rcu: Refactor rcu_{nmi,irq}_{enter,exit}()
When entering or exiting irq or NMI handlers, the current code uses
->dynticks_nmi_nesting to detect if it is in the outermost handler,
that is, the one interrupting or returning to an RCU-idle context (the
idle loop or nohz_full usermode execution).  When entering the outermost
handler via an interrupt (as opposed to NMI), it is necessary to invoke
rcu_dynticks_task_exit() just before the CPU is marked non-idle from an
RCU perspective and to invoke rcu_cleanup_after_idle() just after the
CPU is marked non-idle.  Similarly, when exiting the outermost handler
via an interrupt, it is necessary to invoke rcu_prepare_for_idle() just
before marking the CPU idle and to invoke rcu_dynticks_task_enter()
just after marking the CPU idle.

The decision to execute these four functions is currently taken in
rcu_irq_enter() and rcu_irq_exit() as follows:

   rcu_irq_enter()
      /* A conditional branch with ->dynticks_nmi_nesting */
      rcu_nmi_enter()
         /* A conditional branch with ->dynticks */
      /* A conditional branch with ->dynticks_nmi_nesting */

   rcu_irq_exit()
      /* A conditional branch with ->dynticks_nmi_nesting */
      rcu_nmi_exit()
         /* A conditional branch with ->dynticks_nmi_nesting */
      /* A conditional branch with ->dynticks_nmi_nesting */

   rcu_nmi_enter()
      /* A conditional branch with ->dynticks */

   rcu_nmi_exit()
      /* A conditional branch with ->dynticks_nmi_nesting */

This works, but the conditional branches in rcu_irq_enter() and
rcu_irq_exit() are redundant with those in rcu_nmi_enter() and
rcu_nmi_exit(), respectively.  Redundant branches are not something
we want in the to/from-idle fastpaths, so this commit refactors
rcu_{nmi,irq}_{enter,exit}() so they use a common inlined function passed
a constant argument as follows:

   rcu_irq_enter() inlining rcu_nmi_enter_common(irq=true)
      /* A conditional branch with ->dynticks */

   rcu_irq_exit() inlining rcu_nmi_exit_common(irq=true)
      /* A conditional branch with ->dynticks_nmi_nesting */

   rcu_nmi_enter() inlining rcu_nmi_enter_common(irq=false)
      /* A conditional branch with ->dynticks */

   rcu_nmi_exit() inlining rcu_nmi_exit_common(irq=false)
      /* A conditional branch with ->dynticks_nmi_nesting */

The combination of the constant function argument and the inlining allows
the compiler to discard the conditionals that previously controlled
execution of rcu_dynticks_task_exit(), rcu_cleanup_after_idle(),
rcu_prepare_for_idle(), and rcu_dynticks_task_enter().  This reduces both
the to-idle and from-idle path lengths by two conditional branches each,
and improves readability as well.
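
A condensed sketch of the pattern (simplified: only the idle-transition
calls are shown, and the ->dynticks/->dynticks_nmi_nesting bookkeeping
is elided):

	static __always_inline void rcu_nmi_enter_common(bool irq)
	{
		/* ... detect outermost entry via ->dynticks_nmi_nesting ... */
		if (irq)
			rcu_dynticks_task_exit();
		rcu_dynticks_eqs_exit();
		if (irq)
			rcu_cleanup_after_idle();
		/* ... update ->dynticks_nmi_nesting ... */
	}

	void rcu_irq_enter(void)
	{
		rcu_nmi_enter_common(true);  /* Constant arg: compiler drops the branches. */
	}

	void rcu_nmi_enter(void)
	{
		rcu_nmi_enter_common(false);
	}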

This commit also changes order of execution from this:

	rcu_dynticks_task_exit();
	rcu_dynticks_eqs_exit();
	trace_rcu_dyntick();
	rcu_cleanup_after_idle();

To this:

	rcu_dynticks_task_exit();
	rcu_dynticks_eqs_exit();
	rcu_cleanup_after_idle();
	trace_rcu_dyntick();

In other words, the calls to rcu_cleanup_after_idle() and
trace_rcu_dyntick() are reversed.  This has no functional effect because
the real concern is whether a given call is before or after the call to
rcu_dynticks_eqs_exit(), and this patch does not change that.  Before the
call to rcu_dynticks_eqs_exit(), RCU is not yet watching the current
CPU and after that call RCU is watching.

A similar switch in calling order happens on the idle-entry path, with
similar lack of effect for the same reasons.

Suggested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Byungchul Park <byungchul.park@lge.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Applied Steven Rostedt feedback. ]
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-08-30 16:00:46 -07:00
Paul E. McKenney
7c590fcca6 rcutorture: Maintain self-propagating CB only during forward-progress test
The current forward-progress testing maintains a self-propagating
callback during the full test.  This could result in false negatives
for stutter-end checking, where it might appear that RCU was clearing
out old callbacks only because it was being continually motivated by
the self-propagating callback.  This commit therefore shuts down the
self-propagating callback at the end of each forward-progress test
interval.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-29 09:20:48 -07:00
Paul E. McKenney
474e59b476 rcutorture: Check GP completion at stutter end
The rcu_torture_writer() function invokes stutter_wait() at the end of
each writer pass, which occasionally blocks for an extended time period
in order to ensure that RCU can handle intermittent loads.  But part of
handling a busy period is invoking all the callbacks before the end of
the idle period induced by stutter_wait().

This commit therefore adds a return value to stutter_wait() indicating
whether stutter_wait() actually waited.  In addition, this commit causes
rcu_torture_writer() to test this value and if set, checks that all the
elements of the rcu_tortures[] array have been freed up.
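
Sketched roughly (the currently-posted element would also need to be
excluded, which is omitted here for brevity):

	/* Hedged sketch: check for freed-up elements only after a real stutter pause. */
	stutter_waited = stutter_wait("rcu_torture_writer");
	if (stutter_waited)
		for (i = 0; i < ARRAY_SIZE(rcu_tortures); i++)
			WARN_ON_ONCE(list_empty(&rcu_tortures[i].rtort_free));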

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-29 09:20:48 -07:00
Paul E. McKenney
f4de46ed5b rcutorture: Print forward-progress test interval on error
This commit prints the duration of the forward-progress test interval in
the case that no forward progress was observed as an aid to debugging.
When forward progress does happen, it prints out the number of
rcu_torture_writer() versions and grace periods that elapsed during the
forward-progress test.  At the end of the run, it also prints the number
of attempted and actual forward-progress tests.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-29 09:20:48 -07:00
Paul E. McKenney
c04dd09bd3 rcutorture: Adjust number of reader kthreads per CPU-hotplug operations
Currently, rcutorture provisions rcu_torture_reader() kthreads based
on the initial number of CPUs.  This can be problematic when CPU hotplug
is enabled, as a system with a very large number of CPUs will provision
a very large number of rcu_torture_reader() kthreads.  All of these
kthreads will continue running even if the CPU-hotplug operations result
in only one remaining online CPU.  This can result in all sorts of strange
artifacts due simply to massive overload.

This commit therefore causes the rcu_torture_reader() kthreads to start
blocking as the number of online CPUs decreases.  This is accomplished
by numbering these kthreads, and having each check to make sure that the
number of online CPUs is at least as large as its assigned number.
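
For illustration (the "myid" name is hypothetical here), the per-reader
gate might look like:

	/* Hedged sketch: reader "myid" blocks while fewer than myid+1 CPUs are online. */
	while (num_online_cpus() < myid + 1 && !torture_must_stop())
		schedule_timeout_interruptible(HZ / 5);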

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-29 09:20:48 -07:00
Paul E. McKenney
fecad5091f rcutorture: Reduce priority of forward-progress testing
On !SMP tests, the forward-progress kthread might prevent RCU's
grace-period kthread from running, which would defeat RCU's
forward-progress measures.  On PREEMPT tests without RCU priority
boosting, the forward-progress kthread might preempt a reader for an
extended time period, which would also defeat RCU's forward-progress
measures.  This commit therefore reduces rcutorture's forward-progress
kthread's priority in those cases.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-29 09:20:48 -07:00
Paul E. McKenney
1e69676592 rcutorture: Limit reader duration if irq or bh disabled
There are debug checks in some environments that will complain if the
duration of a bh-disabled region of code exceeds about 50 milliseconds.
Because rcu_read_delay() can produce a 50-millisecond delay and because
there could be up to eight reader segments with such delays, this commit
limits the maximum delay to 10 milliseconds if either interrupts or
softirqs are disabled.

Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-29 09:20:48 -07:00
Paul E. McKenney
3cff54a830 rcutorture: Increase rcu_read_delay() longdelay_ms
RCU now takes certain actions 100 and 200 milliseconds into a grace period
by default, but rcutorture only runs RCU read-side critical sections
with durations up to 50 milliseconds.  This commit therefore increases
test coverage by increasing the maximum critical-section duration to
300 milliseconds.  Note that the existing code automatically dials down
the probability of long delays based on the maximum duration, which means
that this change should not significantly change the rate of execution
of RCU read-side critical sections.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-29 09:20:48 -07:00
Paul E. McKenney
9fdcb9afe0 rcutorture: Add self-propagating callback to forward-progress testing
If rcutorture is run on a quiet system with the rcutorture.stutter module
parameter set high, then there can legitimately be an extended period
during which no RCU forward progress takes place.  This can result
in false-positive no-forward-progress splats.  This commit therefore
makes rcu_torture_fwd_prog() create a self-propagating RCU callback
to ensure that grace periods are in progress for the duration of the
forward-progress test.

Note that the RCU flavor under test must define ->call(), ->sync(),
and ->cb_barrier() for this self-propagating callback to be created.
If one or more of those rcu_torture_ops fields are NULL, then the
rcu_torture_fwd_prog() function will silently proceed without creating
the self-propagating callback.
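
A minimal sketch of such a self-propagating callback, assuming a stop
flag checked each time the callback runs (the structure and field names
are illustrative):

	struct fwd_cb_state {
		struct rcu_head rh;
		int stop;
	};

	static void rcu_torture_fwd_cb(struct rcu_head *rhp)
	{
		struct fwd_cb_state *fcsp = container_of(rhp, struct fwd_cb_state, rh);

		if (READ_ONCE(fcsp->stop))
			return;		/* Test interval over: stop propagating. */
		cur_ops->call(&fcsp->rh, rcu_torture_fwd_cb);	/* Re-post ourselves. */
	}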

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-29 09:20:48 -07:00
Paul E. McKenney
08a7a2ec68 rcutorture: Vary forward-progress test interval
Some of the Linux kernel's RCU implementations provide several mechanisms
to promote forward progress that operate over different timeframes.
This commit therefore causes rcu_torture_fwd_prog() to vary the duration
of its forward-progress testing in order to test each such mechanism.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-29 09:20:48 -07:00
Paul E. McKenney
152f4afbfd rcutorture: Avoid no-test complaint if too few forward-progress tries
In a too-short test, random delays can cause each attempt to do
forward-progress testing to fail to complete, thus resulting in
spurious splats.  This commit therefore requires at least five tries
before complaining about rcutorture runs that failed to produce at
least one valid forward-progress testing attempt.  Note that actual
forward-progress failures will splat regardless of the number of tries.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-29 09:20:48 -07:00
Paul E. McKenney
119248bec9 rcutorture: Also use GP sequence to judge forward progress
Currently, rcutorture relies solely on the progress of
rcu_torture_writer() to judge grace-period forward progress.  In theory,
this is the gold standard of forward progress, but in practice rcutorture
separately detects and reports rcu_torture_writer() stalls.  This commit
therefore adds the grace-period sequence number (when provided) to the
judgment of grace-period forward progress, which makes it easier to
distinguish between failure of actual grace periods to progress on the
one hand and downstream forward-progress failures on the other.

For example, given this change, if rcu_torture_writer() stalls,
but rcu_torture_fwd_prog() does not complain, then the grace-period
computation is working, which is a hint that the failure lies in callback
processing, wakeup of the rcu_torture_writer() kthread, or similar.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-29 09:20:48 -07:00
Paul E. McKenney
1b27291b1e rcutorture: Add forward-progress tests for RCU grace periods
This commit adds a kthread that loops going into and out of RCU
read-side critical sections, but also including a cond_resched(),
optionally guarded by a check of need_resched(), in that same loop.
This commit relies solely on rcu_torture_writer() progress to judge
the forward progress of grace periods.

Note that Tasks RCU and SRCU are exempted from forward-progress testing
due their (intentionally) less-robust forward-progress guarantees.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-29 09:20:48 -07:00
Paul E. McKenney
f028806442 rcuperf: Warn on bad perf type for built-in tests
When running a built-in rcuperf test, specifying an invalid perf type
results in what looks like a hard hang, with the error messages hidden
by other boot-time output.  This commit therefore executes a WARN_ON()
in this case so that the splat appears just following the error messages.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-29 09:20:48 -07:00
Paul E. McKenney
e746b55857 rcutorture: Warn on bad torture type for built-in tests
When running a built-in rcutorture test, specifying an invalid torture
type results in what looks like a hard hang, with the error messages
hidden by other boot-time output.  This commit therefore executes a
WARN_ON() in this case so that the splat appears just following the
error messages.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-29 09:20:48 -07:00
Paul E. McKenney
444da518fd rcutorture: Force occasional reader waits
Deferred quiescent states can interact with the scheduler, but
rcu_torture_reader() does not force such interaction all that frequently.
This commit therefore blocks for one jiffy after ten jiffies of read-side
runtime.  This has the beneficial effect of being most likely to block
just after long-running readers, and it is exactly these readers that
are most likely to have been preempted (in CONFIG_PREEMPT=y kernels).
This in turn helps increase the probability that a deferred quiescent
state will be seen by RCU's context-switch hooks.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-29 09:20:48 -07:00
Linus Torvalds
f7951c33f0 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Thomas Gleixner:

 - Cleanup and improvement of NUMA balancing

 - Refactoring and improvements to the PELT (Per Entity Load Tracking)
   code

 - Watchdog simplification and related cleanups

 - The usual pile of small incremental fixes and improvements

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (41 commits)
  watchdog: Reduce message verbosity
  stop_machine: Reflow cpu_stop_queue_two_works()
  sched/numa: Move task_numa_placement() closer to numa_migrate_preferred()
  sched/numa: Use group_weights to identify if migration degrades locality
  sched/numa: Update the scan period without holding the numa_group lock
  sched/numa: Remove numa_has_capacity()
  sched/numa: Modify migrate_swap() to accept additional parameters
  sched/numa: Remove unused task_capacity from 'struct numa_stats'
  sched/numa: Skip nodes that are at 'hoplimit'
  sched/debug: Reverse the order of printing faults
  sched/numa: Use task faults only if numa_group is not yet set up
  sched/numa: Set preferred_node based on best_cpu
  sched/numa: Simplify load_too_imbalanced()
  sched/numa: Evaluate move once per node
  sched/numa: Remove redundant field
  sched/debug: Show the sum wait time of a task group
  sched/fair: Remove #ifdefs from scale_rt_capacity()
  sched/core: Remove get_cpu() from sched_fork()
  sched/cpufreq: Clarify sugov_get_util()
  sched/sysctl: Remove unused sched_time_avg_ms sysctl
  ...
2018-08-13 11:25:07 -07:00
Paul E. McKenney
18952651da Merge branches 'fixes1.2018.07.12b' and 'torture1.2018.07.12b' into HEAD
fixes1.2018.07.12b: Post-gp_seq miscellaneous fixes
torture1.2018.07.12b: Post-gp_seq torture-test updates
2018-07-12 15:42:41 -07:00
Joel Fernandes (Google)
bf5b64355a rcutorture: Fix rcu_barrier successes counter
The rcutorture test module currently increments both successes and error
for the barrier test upon error, which results in misleading statistics
being printed.  This commit therefore changes the code to increment the
success counter only when the test actually passes.

This change was tested by returning from the barrier callback without
incrementing the callback counter, thus introducing what appeared to
rcutorture to be rcu_barrier() failures.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:42:08 -07:00
Joel Fernandes (Google)
4babd855fd rcutorture: Add support to detect if boost kthread prio is too low
When rcutorture is built in to the kernel, an earlier patch detects
that and raises the priority of RCU's kthreads to allow rcutorture's
RCU priority boosting tests to succeed.

However, if rcutorture is built as a module, those priorities must be
raised manually via the rcutree.kthread_prio kernel boot parameter.
If this manual step is not taken, rcutorture's RCU priority boosting
tests will fail due to kthread starvation.  One approach would be to
raise the default priority, but that risks breaking existing users.
Another approach would be to allow runtime adjustment of RCU's kthread
priorities, but that introduces numerous "interesting" race conditions.
This patch therefore instead detects too-low priorities, and prints a
message and disables the RCU priority boosting tests in that case.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:42:08 -07:00
Arnd Bergmann
622be33fcb rcutorture: Use monotonic timestamp for stall detection
The get_seconds() call is deprecated because it overflows on 32-bit
architectures. The algorithm in rcu_torture_stall() can deal with
the overflow, but another problem here is that using a CLOCK_REALTIME
stamp can lead to a false-positive stall warning when a settimeofday()
happens concurrently.

Using ktime_get_seconds() instead avoids those issues and will never
overflow. The added cast to 'unsigned long' however is necessary to
make ULONG_CMP_LT() work correctly.
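
For example, the end-of-stall computation could use the monotonic clock
as in this sketch (variable names follow the module parameters, but this
is an illustration, not the patch itself):

	/* Hedged sketch: CLOCK_MONOTONIC-based end-of-stall computation. */
	stop_at = (unsigned long)ktime_get_seconds() + stall_cpu;
	while (ULONG_CMP_LT((unsigned long)ktime_get_seconds(), stop_at))
		continue;	/* Induce the stall. */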

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:42:07 -07:00
Joel Fernandes (Google)
3b745c8969 rcutorture: Make boost test more robust
Currently, with RCU_BOOST disabled, I get no failures when forcing
rcutorture to test RCU boost priority inversion. The reason seems to be
that we don't check for failures if the callback never ran at all for
the duration of the boost-test loop.

Further, the 'rtb' and 'rtbf' counters seem to be used inconsistently.
'rtb' is incremented at the start of each test and 'rtbf' is incremented
per-CPU on each failure of call_rcu().  So it's possible that 'rtbf' > 'rtb'.

To test the boost with rcutorture, I did the following on a 4-CPU x86 machine:

modprobe rcutorture  test_boost=2
sleep 20
rmmod rcutorture

With patch:
rtbf: 8 rtb: 12

Without patch:
rtbf: 0 rtb: 2

In summary this patch:
 - Increments failed and total test counters once per boost-test.
 - Checks for failure cases correctly.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:42:06 -07:00
Joel Fernandes (Google)
450efca718 rcutorture: Disable RT throttling for boost tests
Currently rcutorture is not able to torture RCU boosting properly.  This
is because rcutorture's boost threads, which do the torturing, may be
throttled due to RT throttling.

This patch makes rcutorture use the right torture technique (unthrottled
rcutorture boost tasks) for torturing RCU so that the test fails
correctly when no boost is available.

Currently this requires accessing sysctl_sched_rt_runtime directly, but
that should be OK since rcutorture is test code.  Such direct access is
also only possible if rcutorture is built in, so make it conditional
on that.
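
Roughly, and under the assumption that writing -1 disables RT throttling
(as with sched_rt_runtime_us), the idea can be sketched as:

	#if IS_BUILTIN(CONFIG_RCU_TORTURE_TEST)
		/* Hedged sketch: save the old limit, then disable RT throttling. */
		old_rt_runtime = sysctl_sched_rt_runtime;
		sysctl_sched_rt_runtime = -1;
	#endif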

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:42:06 -07:00
Paul E. McKenney
bf1bef50be rcutorture: Emphasize testing of single reader protection type
For RCU implementations supporting multiple types of reader protection,
rcutorture currently randomly selects the combinations of types of
protection for each phase of each reader.  The problem is that, given,
for example, the four kinds of protection for RCU-sched
(local_irq_disable(), local_bh_disable(), preempt_disable(), and
rcu_read_lock_sched()), the reader will be protected by a single
mechanism only 25% of the time.  We really need heavier testing of single
read-side mechanisms.

This commit therefore uses only a single mechanism about 60% of the time,
half of the time explicitly and one-eighth of the time by chance.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:42:05 -07:00
Paul E. McKenney
2397d072f7 rcutorture: Handle extended read-side critical sections
This commit enables rcutorture to test whether RCU properly aggregates
different types of read-side critical sections into a larger section
covering the set.  It does this by extending an initial read-side
critical section randomly for a random number of extensions.  There is
a new rcu_torture_ops field ->extendable that specifies what extensions
are permitted for a given flavor of RCU (for example, SRCU does not
permit any extensions, while RCU-sched permits all types).  Note that
if a given operation (for example, local_bh_disable()) extends an RCU
read-side critical section, then rcutorture feels free to also start
and end the critical section with that operation's type of disabling.

Disabling operations include local_bh_disable(), local_irq_disable(),
and preempt_disable().  This commit also adds a new "busted_srcud"
torture type, which verifies rcutorture's ability to detect extensions
of RCU read-side critical sections that are not handled.  Gotta test
the test, after all!

Note that it is not legal to invoke local_bh_disable() with interrupts
disabled, and this transition is avoided by overriding the random-number
generator when it wants to call local_bh_disable() while interrupts
are disabled.  The code instead leaves both interrupts and bh/softirq
disabled in this case.
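
One way to represent the permitted extensions is a per-flavor bitmask,
sketched here with illustrative flag names and values:

	/* Hedged sketch: each bit names one way a reader may be extended. */
	#define RCUTORTURE_RDR_BH	0x1	/* local_bh_disable()/enable(). */
	#define RCUTORTURE_RDR_IRQ	0x2	/* local_irq_disable()/enable(). */
	#define RCUTORTURE_RDR_PREEMPT	0x4	/* preempt_disable()/enable(). */

	/* For example, SRCU would set ->extendable to 0, RCU-sched to all bits. */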

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:42:05 -07:00
Paul E. McKenney
241b42522a rcutorture: Make rcu_torture_timer() use rcu_torture_one_read()
This commit saves a few lines of code by making rcu_torture_timer()
invoke rcu_torture_one_read(), thus completing the consolidation of
code between rcu_torture_timer() and rcu_torture_reader().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:42:04 -07:00
Paul E. McKenney
3025520ec4 rcutorture: Use per-CPU random state for rcu_torture_timer()
Currently, the rcu_torture_timer() function uses a single global
torture_random_state structure protected by a single global lock.
This conflicts to some extent with performance and scalability,
but even more with the goal of consolidating read-side testing
with rcu_torture_reader().  This commit therefore creates a per-CPU
torture_random_state structure for use by rcu_torture_timer() and
eliminates the lock.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Make rcu_torture_timer_rand static, per 0day Test Robot report. ]
2018-07-12 15:42:04 -07:00
Paul E. McKenney
8da9a59523 rcutorture: Use atomic increment for n_rcu_torture_timers
Currently, rcu_torture_timer() relies on a lock to guard updates to
n_rcu_torture_timers.  Unfortunately, consolidating code with
rcu_torture_reader() will dispense with this lock.  This commit
therefore makes n_rcu_torture_timers be an atomic_long_t and uses
atomic_long_inc() to carry out the update.
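
In sketch form:

	/* Hedged sketch: lockless counter update and read-out. */
	static atomic_long_t n_rcu_torture_timers;

	atomic_long_inc(&n_rcu_torture_timers);		/* In rcu_torture_timer(). */
	atomic_long_read(&n_rcu_torture_timers);	/* When printing statistics. */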

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:42:03 -07:00
Paul E. McKenney
6b06aa723e rcutorture: Extract common code from rcu_torture_reader()
This commit extracts the code executed on each pass through the loop
in rcu_torture_reader() into a new rcu_torture_one_read() function.
This new function will also be used by rcu_torture_timer().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:42:02 -07:00
Paul E. McKenney
2d3625841d rcuperf: Remove unused torturing_tasks() function
The torturing_tasks() function in rcuperf.c is not used, so this commit
removes it.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:42:02 -07:00
Paul E. McKenney
6bea2cc5a9 rcu: Remove rcutorture test version and sequence number
Back when RCU had a debugfs interface, there was a test version and
sequence number that allowed associating debugfs data with a particular
test run, where the test run started with modprobe and ended with rmmod,
which was how tests were run back on the old ABAT system within IBM.
But rcutorture testing no longer runs on ABAT, and there is no longer an
RCU debugfs interface, so there is no longer any need for test versions
and sequence numbers.

This commit therefore removes the rcutorture_record_test_transition()
and rcutorture_record_progress() functions, and along with them the
rcutorture_testseq and rcutorture_vernum variables that they update.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:42:01 -07:00
Paul E. McKenney
028be12b29 rcutorture: Change units of onoff_interval to jiffies
Some RCU bugs have been sensitive to the frequency of CPU-hotplug
operations, which have been gradually increased over time.  But this
frequency is now at the one-second lower limit that can be specified using
the rcutorture.onoff_interval kernel parameter.  This commit therefore
changes the units of rcutorture.onoff_interval from seconds to jiffies,
and also sets the value specified for this kernel parameter in the TREE03
rcutorture scenario to 200, which is 200 milliseconds for HZ=1000.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:42:01 -07:00
Joel Fernandes (Google)
c7cd161ecb rcu: Assign higher prio to RCU threads if rcutorture is built-in
The rcutorture RCU priority boosting tests fail even with CONFIG_RCU_BOOST
set because rcutorture's threads run at the same priority as the default
RCU kthreads (RT class with priority of 1).

This patch checks if RCU torture is built into the kernel and if so,
assigns RT priority 1 to the RCU threads, allowing the rcutorture boost
tests to pass.
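
A sketch of the idea (the exact expression used in tree.c may differ):

	/* Hedged sketch: built-in rcutorture implies priority 1 for RCU's kthreads. */
	static int kthread_prio = IS_BUILTIN(CONFIG_RCU_TORTURE_TEST)
				  ? 1
				  : IS_ENABLED(CONFIG_RCU_BOOST) ? 1 : 0;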

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:39:26 -07:00
Paul E. McKenney
52e17ba1d0 srcu: Add grace-period number to rcutorture statistics printout
This commit adds the SRCU grace-period number to the rcutorture statistics
printout, which allows it to be compared to the rcutorture "Writer stall
state" message.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:39:25 -07:00
Paul E. McKenney
89b4cd4b9e rcu: Print stall-warning NMI dyntick state in hexadecimal
The ->dynticks_nmi_nesting field records the nesting depth of both
interrupt and NMI handlers.  Because the kernel can enter interrupts
and never leave them (and vice versa) and because NMIs can interrupt
manipulation of the ->dynticks_nmi_nesting field, the values in this
field must be both chosen and manipulated very carefully.  As a result,
although the value is zero when the corresponding CPU is executing
neither an interrupt nor an NMI handler, it is 4,611,686,018,427,387,906
on 64-bit systems when there is a single level of interrupt/NMI handling
in progress.

This number is difficult to remember and interpret, so this commit
switches the output to hexadecimal, resulting in the much nicer
0x4000000000000002.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:39:24 -07:00
Paul E. McKenney
2ee5aca546 rcu: Make rcu_seq_diff() more exact
The current implementation of rcu_seq_diff() follows tradition in
providing a rough-and-ready approximation of the number of elapsed grace
periods between the two rcu_seq values.  However, this difference is
used to flag RCU-failure "near misses", which can be a valuable debugging
aid, so more exactitude would be an improvement.  This commit therefore
improves the accuracy of rcu_seq_diff().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:39:23 -07:00
Byungchul Park
67abb96cbf rcu: Check the range of jiffies_till_{first,next}_fqs when setting them
Currently, the ranges of jiffies_till_{first,next}_fqs are checked and
adjusted again and again in the rcu_gp_kthread() loop at runtime.

However, it suffices to check them only when they are set, not on every
pass through the loop.  So handle the range checking at setting time,
that is, in the sysfs/module-parameter update path.
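
One way to do this is a custom kernel_param_ops setter that clamps the
value once at write time, sketched here (the clamping bound and helper
name are illustrative):

	/* Hedged sketch: validate/clamp when the parameter is written, not in the GP loop. */
	static int param_set_first_fqs_jiffies(const char *val, const struct kernel_param *kp)
	{
		ulong j;
		int ret = kstrtoul(val, 0, &j);

		if (!ret)
			WRITE_ONCE(*(ulong *)kp->arg, (j > HZ) ? HZ : j);
		return ret;
	}

	static const struct kernel_param_ops first_fqs_jiffies_ops = {
		.set = param_set_first_fqs_jiffies,
		.get = param_get_ulong,
	};
	module_param_cb(jiffies_till_first_fqs, &first_fqs_jiffies_ops,
			&jiffies_till_first_fqs, 0644);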

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:39:22 -07:00
Paul E. McKenney
47199a0812 rcu: Add diagnostics for rcutorture writer stall warning
This commit adds any in-the-future ->gp_seq_needed fields to the
diagnostics for an rcutorture writer stall warning message.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:39:22 -07:00
Steven Rostedt (VMware)
cd23ac8ddb rcu: Add comment to the last sleep in the rcu tasks loop
At the end of rcu_tasks_kthread() there's a lonely
schedule_timeout_uninterruptible() call with no apparent rationale for
its existence. But there is. It is to keep the thread from going into
a tight loop if there's some anomaly. That really needs a comment.

Link: http://lkml.kernel.org/r/20180524223839.GU3803@linux.vnet.ibm.com
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:39:21 -07:00
Steven Rostedt (VMware)
c03be752d3 rcu: Speed up calling of RCU tasks callbacks
Joel Fernandes found that synchronize_rcu_tasks() was taking a
significant amount of time. He demonstrated it with the following test:

 # cd /sys/kernel/tracing
 # while [ 1 ]; do x=1; done &
 # echo '__schedule_bug:traceon' > set_ftrace_filter
 # time echo '!__schedule_bug:traceon' > set_ftrace_filter;

real	0m1.064s
user	0m0.000s
sys	0m0.004s

Where it takes a little over a second to perform the synchronize,
because there's a loop that waits 1 second at a time for tasks to get
through their quiescent points when there's a task that must be waited
for.

After discussion, we came up with a simple way to wait for holdouts:
increase the wait time on each iteration of the loop, but never to more
than a full second.

With the new patch we have:

 # time echo '!__schedule_bug:traceon' > set_ftrace_filter;

real	0m0.131s
user	0m0.000s
sys	0m0.004s

Which drops it down to 13% of what the original wait time was.
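
The resulting wait loop can be sketched as follows (the divisor "fract"
follows the description above; this is an illustration, not the patch
verbatim):

	/* Hedged sketch: wait HZ/10 at first, growing toward a full HZ per iteration. */
	fract = 10;
	while (!list_empty(&rcu_tasks_holdouts)) {
		schedule_timeout_interruptible(HZ / fract);
		if (fract > 1)
			fract--;	/* Next wait is longer, capped at one second. */
		/* ... recheck and prune the holdout list ... */
	}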

Link: http://lkml.kernel.org/r/20180523063815.198302-2-joel@joelfernandes.org
Reported-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Suggested-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:39:21 -07:00
Joel Fernandes (Google)
0d805a70a6 rcu: Add comment documenting how rcu_seq_snap works
rcu_seq_snap() may be tricky to decipher.  Let's document how it works
with an example to make it easier.
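
For reference, the function being documented amounts to the following
(sketched from the surrounding kernel sources; treat it as a reading
aid rather than a quotation):

	/* Return a snapshot that, once rcu_seq_done(), guarantees a full GP has elapsed. */
	static inline unsigned long rcu_seq_snap(unsigned long *sp)
	{
		unsigned long s;

		s = (READ_ONCE(*sp) + 2 * RCU_SEQ_STATE_MASK + 1) & ~RCU_SEQ_STATE_MASK;
		smp_mb(); /* Above access must not bleed into critical section. */
		return s;
	}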

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Shrink comment as suggested by Peter Zijlstra. ]
2018-07-12 15:39:20 -07:00
Paul E. McKenney
b06ae25a1e rcu: Use RCU CPU stall timeout for rcu_check_gp_start_stall()
Currently, rcu_check_gp_start_stall() waits for one second after the first
request before complaining that a grace period has not yet started.  This
was desirable while testing the conversion from ->future_gp_needed[] to
->gp_seq_needed, but it is a bit on the hair-trigger side for production
use under heavy load.  This commit therefore makes this wait time be
exactly that of the RCU CPU stall warning, allowing easy adjustment of
both timeouts to suit the distribution or installation at hand.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:39:20 -07:00
Paul E. McKenney
51fbb910f5 rcu: Remove __maybe_unused from rcu_cpu_has_callbacks()
The rcu_cpu_has_callbacks() function is now used in all configurations,
so this commit removes the __maybe_unused.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:39:19 -07:00
Paul E. McKenney
9622179519 rcu: Remove "inline" from rcu_perf_print_module_parms()
This function is in rcuperf.c, which is not an include file, so there
is no problem dropping the "inline", especially given that this function
is invoked only twice per rcuperf run.  This commit therefore delegates
the inlining decision to the compiler by dropping the "inline".

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:39:19 -07:00
Paul E. McKenney
eac45e586c rcu: Remove "inline" from rcu_torture_print_module_parms()
This function is in rcutorture.c, which is not an include file, so there
is no problem dropping the "inline", especially given that this function
is invoked only twice per rcutorture run.  This commit therefore delegates
the inlining decision to the compiler by dropping the "inline".

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:39:18 -07:00
Paul E. McKenney
95394e69c4 rcu: Remove "inline" from panic_on_rcu_stall() and rcu_blocking_is_gp()
These functions are in kernel/rcu/tree.c, which is not an include file,
so there is no problem dropping the "inline", especially given that these
functions are nowhere near a fastpath.  This commit therefore delegates
the inlining decision to the compiler by dropping the "inline".

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:39:18 -07:00
Paul E. McKenney
ab6b82147f rcu: Remove unused local variable "cpu"
One danger of using __maybe_unused is that the compiler doesn't yell
at you when you remove the last reference, witness rcu_bind_gp_kthread()
and its local variable "cpu".  This commit removes this local variable.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:39:17 -07:00
Paul E. McKenney
164ba3fc48 rcu: Remove unused rcu_kick_nohz_cpu() function
The rcu_kick_nohz_cpu() function is no longer used, and the functionality
it used to provide is now provided by a call to resched_cpu() in the
force-quiescent-state function rcu_implicit_dynticks_qs().  This commit
therefore removes rcu_kick_nohz_cpu().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:39:17 -07:00
Paul E. McKenney
c7037ff524 rcu: Clarify and correct the rcu_preempt_qs() header comment
The rcu_preempt_qs() function only applies to the CPU, not the task.
A task really is allowed to invoke this function while in an RCU-preempt
read-side critical section, but only if it has first added itself to
some leaf rcu_node structure's ->blkd_tasks list.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:39:16 -07:00
Paul E. McKenney
3b57a3994f rcu: Inline rcu_dynticks_momentary_idle() into its sole caller
The rcu_dynticks_momentary_idle() function is invoked only from
rcu_momentary_dyntick_idle(), and neither function is particularly
large.  This commit therefore saves a few lines by inlining
rcu_dynticks_momentary_idle() into rcu_momentary_dyntick_idle().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:39:16 -07:00
Paul E. McKenney
15651201fa rcu: Mark task as .need_qs less aggressively
If any scheduling-clock interrupt interrupts an RCU-preempt read-side
critical section, the interrupted task's ->rcu_read_unlock_special.b.need_qs
field is set.  This causes the outermost rcu_read_unlock() to incur the
extra overhead of calling into rcu_read_unlock_special().  This commit
reduces that overhead by setting ->rcu_read_unlock_special.b.need_qs only
if the grace period has been in effect for more than one second.

Why one second?  Because this is comfortably smaller than the minimum
RCU CPU stall-warning timeout of three seconds, but long enough that the
.need_qs marking should happen quite rarely.  And if your RCU read-side
critical section has run on-CPU for a full second, it is not unreasonable
to invest some CPU time in ending the grace period quickly.
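
The gating can be sketched as follows (field names per the description
above; the remaining conditions on the scheduling-clock path are elided):

	/* Hedged sketch: mark .need_qs only for grace periods older than one second. */
	if (t->rcu_read_lock_nesting > 0 &&
	    time_after(jiffies, rcu_state_p->gp_start + HZ))
		t->rcu_read_unlock_special.b.need_qs = true;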

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:39:15 -07:00
Paul E. McKenney
6f56f714db rcu: Improve RCU-tasks naming and comments
The naming and comments associated with some RCU-tasks code make
the faulty assumption that context switches due to cond_resched()
are voluntary.  As several people pointed out, this is not the case.
This commit therefore updates function names and comments to better
reflect current reality.

Reported-by: Byungchul Park <byungchul.park@lge.com>
Reported-by: Joel Fernandes <joel@joelfernandes.org>
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:39:15 -07:00
Joe Perches
a7538352da rcu: Use pr_fmt to prefix "rcu: " to logging output
This commit also adjusts some whitespace while in the area.
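
The mechanism is the standard pr_fmt() prefix, defined before the printk
users, for example (the message shown is illustrative):

	#define pr_fmt(fmt) "rcu: " fmt

	pr_info("Hierarchical RCU implementation.\n");
	/* Now prints: "rcu: Hierarchical RCU implementation." */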

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Revert string-breaking %s as requested by Andy Shevchenko. ]
2018-07-12 15:39:13 -07:00
Byungchul Park
07f27570dc rcu: Improve rcu_note_voluntary_context_switch() reporting
We expect a TASKS_RCU quiescent state whenever cond_resched_tasks_rcu_qs()
is called, no matter whether the task actually gets scheduled or not.
However, it currently doesn't report the quiescent state when the task
enters __schedule(), as that path is invoked with preempt = true.  So make
cond_resched_tasks_rcu_qs() report the quiescent state unconditionally.

And in TINY_RCU, even though the quiescent state of rcu_bh should also
be reported when the tick interrupt comes from userspace, it is not.  So
make it be reported.

Lastly in TREE_RCU, rcu_note_voluntary_context_switch() should be
reported when the tick interrupt comes from not only user but also idle,
as an extended quiescent state.

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Simplify rcutiny portion given no RCU-tasks for !PREEMPT. ]
2018-07-12 15:39:12 -07:00
Paul E. McKenney
3949fa9bac rcu: Make rcu_read_unlock_special() static
Because rcu_read_unlock_special() is no longer used outside of
kernel/rcu/tree_plugin.h, this commit makes it static.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:39:11 -07:00
Paul E. McKenney
f2e2df5978 rcu: Add diagnostics for offline CPUs failing to report QS
CPUs are expected to report quiescent states when coming online and
when going offline, and grace-period initialization is supposed to
handle any race conditions where a CPU's ->qsmask bit is set just after
it goes offline.  This commit adds diagnostics for the case where an
offline CPU nevertheless has a grace period waiting on it.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:39:10 -07:00
Paul E. McKenney
fea3f222d3 rcu: Record ->gp_state for both phases of grace-period initialization
Grace-period initialization first processes any recent CPU-hotplug
operations, and then initializes state for the new grace period.  These
two phases of initialization are currently not distinguished in debug
prints, but the distinction is valuable in a number of debug situations.
This commit therefore introduces two new values for ->gp_state,
RCU_GP_ONOFF and RCU_GP_INIT, in order to make this distinction.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:39:09 -07:00
Paul E. McKenney
5773894231 rcu: Add CPU online/offline state to dump_blkd_tasks()
Interactions between CPU-hotplug operations and grace-period
initialization can result in dump_blkd_tasks() being invoked.  One of the first
debugging actions in this case is to search back in dmesg to work
out which of the affected rcu_node structure's CPUs are online and to
determine the last CPU-hotplug operation affecting any of those CPUs.
This can be laborious and error-prone, especially when console output
is lost.

This commit therefore causes dump_blkd_tasks() to dump the state of
the affected rcu_node structure's CPUs and the last grace period during
which the last offline and online operation affected each of these CPUs.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:39:09 -07:00
Paul E. McKenney
ff3cee3908 rcu: Add up-tree information to dump_blkd_tasks() diagnostics
This commit updates dump_blkd_tasks() to print out quiescent-state
bitmasks for the rcu_node structures further up the tree.  This
information helps debugging of interactions between CPU-hotplug
operations and RCU grace-period initialization.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:39:08 -07:00
Paul E. McKenney
e05121ba5b rcu: Remove CPU-hotplug failsafe from force-quiescent-state code path
Now that quiescent states for newly offlined CPUs are reported either
when that CPU goes offline or at the end of grace-period initialization,
the CPU-hotplug failsafe in the force-quiescent-state code path is no
longer needed.

This commit therefore removes this failsafe.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:39:07 -07:00
Paul E. McKenney
17a8212b8d rcu: Remove failsafe check for lost quiescent state
Now that quiescent-state reporting is fully event-driven, this commit
removes the check for a lost quiescent state from force_qs_rnp().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:39:06 -07:00
Paul E. McKenney
f34f2f5852 rcu: Move grace-period pre-init delay after pre-init
The main race with the early part of grace-period initialization appears
to be with CPU hotplug.  To more fully open this race window, this commit
moves the rcu_gp_slow() from the beginning of the early initialization
loop to follow that loop, thus widening the race window, especially for
the rcu_node structures that are initialized last.  This commit also
expands rcutree.gp_preinit_delay from 3 to 12, giving the same overall
delay in the grace period, but concentrated in the spot where it will
do the most good.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:39:06 -07:00
Paul E. McKenney
1f3e5f51b9 rcu: Add RCU-preempt check for waiting on newly onlined CPU
RCU should only be waiting on CPUs that were online at the time that the
current grace period started.  Failure to abide by this rule can result
in confusing splats during grace-period cleanup and initialization.
This commit therefore adds a check to RCU-preempt's preempted-task
queuing that checks for waiting on newly onlined CPUs.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:39:05 -07:00
Paul E. McKenney
1e64b15a4b rcu: Fix grace-period hangs due to race with CPU offline
Without special fail-safe quiescent-state-propagation checks, grace-period
hangs can result from the following scenario:

1.	CPU 1 goes offline.

2.	Because CPU 1 is the only CPU in the system blocking the current
	grace period, the grace period ends as soon as
	rcu_cleanup_dying_idle_cpu()'s call to rcu_report_qs_rnp()
	returns.

3.	At this point, the leaf rcu_node structure's ->lock is no longer
	held: rcu_report_qs_rnp() has released it, as it must in order
	to awaken the RCU grace-period kthread.

4.	At this point, that same leaf rcu_node structure's ->qsmaskinitnext
	field still records CPU 1 as being online.  This is absolutely
	necessary because the scheduler uses RCU (in this case on the
	wake-up path while awakening RCU's grace-period kthread), and
	->qsmaskinitnext contains RCU's idea as to which CPUs are online.
	Therefore, invoking rcu_report_qs_rnp() after clearing CPU 1's
	bit from ->qsmaskinitnext would result in a lockdep-RCU splat
	due to RCU being used from an offline CPU.

5.	RCU's grace-period kthread awakens, sees that the old grace period
	has completed and that a new one is needed.  It therefore starts
	a new grace period, but because CPU 1's leaf rcu_node structure's
	->qsmaskinitnext field still shows CPU 1 as being online, this new
	grace period is initialized to wait for a quiescent state from the
	now-offline CPU 1.

6.	Without the fail-safe force-quiescent-state checks, there would
	be no quiescent state from the now-offline CPU 1, which would
	eventually result in RCU CPU stall warnings and memory exhaustion.

It would be good to get rid of the special fail-safe quiescent-state
propagation checks, and thus it would be good to fix things so that
the above scenario cannot happen.  This commit therefore adds a new
->ofl_lock to the rcu_state structure.  This lock is held by rcu_gp_init()
across the applying of buffered online and offline operations to the
rcu_node tree, and it is also held by rcu_cleanup_dying_idle_cpu()
when buffering a new offline operation.  This prevents rcu_gp_init()
from acquiring the leaf rcu_node structure's lock during the interval
between when rcu_cleanup_dying_idle_cpu() invokes rcu_report_qs_rnp(),
which releases ->lock and the re-acquisition of that same lock.
This in turn prevents the failure scenario outlined above, and will
hopefully eventually allow removal of the offline-CPU checks from the
force-quiescent-state code path.
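
In outline, and as a sketch only (the exact lock type and helpers in
tree.c may differ):

	/* rcu_gp_init(): hold ->ofl_lock while applying buffered hotplug state. */
	spin_lock(&rsp->ofl_lock);
	raw_spin_lock_irq_rcu_node(rnp);
	rnp->qsmaskinit = rnp->qsmaskinitnext;	/* Apply online/offline operations. */
	raw_spin_unlock_irq_rcu_node(rnp);
	spin_unlock(&rsp->ofl_lock);

	/* rcu_cleanup_dying_idle_cpu(): hold ->ofl_lock while buffering an offline op. */
	spin_lock(&rsp->ofl_lock);
	/* ... clear this CPU's bit in ->qsmaskinitnext ... */
	spin_unlock(&rsp->ofl_lock);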

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:39:04 -07:00
Paul E. McKenney
ec2c29765a rcu: Fix grace-period hangs from mid-init task resume
Without special fail-safe quiescent-state-propagation checks, grace-period
hangs can result from the following scenario:

1.	A task running on a given CPU is preempted in its RCU read-side
	critical section.

2.	That CPU goes offline, and there are now no online CPUs
	corresponding to that CPU's leaf rcu_node structure.

3.	The rcu_gp_init() function does the first phase of grace-period
	initialization, and sets the aforementioned leaf rcu_node
	structure's ->qsmaskinit field to all zeroes.  Because there
	is a blocked task, it does not propagate the zeroing of either
	->qsmaskinit or ->qsmaskinitnext up the rcu_node tree.

4.	The task resumes on some other CPU and exits its critical section.
	There is no grace period in progress, so the resulting quiescent
	state is not reported up the tree.

5.	The rcu_gp_init() function does the second phase of grace-period
	initialization, which results in the leaf rcu_node structure
	being initialized to expect no further quiescent states, but
	with that structure's parent expecting a quiescent-state report.

	The parent will never receive a quiescent state from this leaf
	rcu_node structure, so the grace period will hang, resulting in
	RCU CPU stall warnings.

It would be good to get rid of the special fail-safe quiescent-state
propagation checks.  This commit therefore checks the leaf rcu_node
structure's ->wait_blkd_tasks field during grace-period initialization.
If this flag is set, the rcu_report_qs_rnp() is invoked to immediately
report the possible quiescent state.  While in the neighborhood, this
commit also reports quiescent states for any CPUs that went offline between
the two phases of grace-period initialization, thus reducing grace-period
delays and hopefully eventually allowing removal of offline-CPU checks
from the force-quiescent-state code path.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:39:04 -07:00
Paul E. McKenney
0b107d24d9 rcu: Suppress false-positive splats from mid-init task resume
Consider the following sequence of events in a PREEMPT=y kernel:

1.	All CPUs corresponding to a given leaf rcu_node structure are
	offline.

2.	The first phase of the rcu_gp_init() function's grace-period
	initialization runs, and sets that rcu_node structure's
	->qsmaskinit to zero, as it should.

3.	One of the CPUs corresponding to that rcu_node structure comes
	back online.  Note that because this CPU came online after the
	grace period started, this grace period can safely ignore this
	newly onlined CPU.

4.	A task running on the newly onlined CPU enters an RCU-preempt
	read-side critical section, and is then preempted.  Because
	the corresponding rcu_node structure's ->qsmask is zero,
	rcu_preempt_ctxt_queue() leaves the rcu_node structure's
	->gp_tasks field NULL, as it should.

5.	The rcu_gp_init() function continues running the second phase of
	grace-period initialization.  The ->qsmask field of the parent of
	the aforementioned leaf rcu_node structure is set to not expect
	a quiescent state from the leaf, as is only right and proper.

	However, when rcu_gp_init() reaches the leaf, it invokes
	rcu_preempt_check_blocked_tasks(), which sees that the leaf's
	->blkd_tasks list is non-empty, and therefore sets the leaf's
	->gp_tasks field to reference the first task on that list.

6.	The grace period ends before the preempted task resumes, which
	is perfectly fine, given that this grace period was under no
	obligation to wait for that task to exit its late-starting
	RCU-preempt read-side critical section.  Unfortunately, the
	leaf's ->gp_tasks field is non-NULL, so rcu_gp_cleanup() splats.
	After all, it appears to rcu_gp_cleanup() that the grace period
	failed to wait for a task that was supposed to be blocking that
	grace period.

This commit avoids this false-positive splat by adding a check of both
->qsmaskinit and ->wait_blkd_tasks to rcu_preempt_check_blocked_tasks().
If both ->qsmaskinit and ->wait_blkd_tasks are zero, then the task must
have entered its RCU-preempt read-side critical section late (after all,
the CPU that it is running on was not online at that time), which means
that the upper-level rcu_node structure won't be waiting for anything
on the leaf anyway.

If ->wait_blkd_tasks is non-zero, then there is at least one task on
this rcu_node structure's ->blkd_tasks list whose RCU read-side
critical section predates the current grace period.  If ->qsmaskinit
is non-zero, there is at least one CPU that was online at the start
of the current grace period.  Thus, if both are zero, there is nothing
to wait for.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:39:03 -07:00
Paul E. McKenney
99990da1b3 rcu: Suppress more involved false-positive preempted-task splats
Consider the following sequence of events in a PREEMPT=y kernel:

1.	All but one of the CPUs corresponding to a given leaf rcu_node
	structure go offline.  Each of these CPUs clears its bit in that
	structure's ->qsmaskinitnext field.

2.	A new grace period starts, and rcu_gp_init() scans the leaf
	rcu_node structures, applying CPU-hotplug changes since the
	start of the previous grace period, including those changes in
	#1 above.  This copies each leaf structure's ->qsmaskinitnext
	to its ->qsmask field, which represents the CPUs that this new
	grace period will wait on.  Each copy operation is done holding
	the corresponding leaf rcu_node structure's ->lock, and at the
	end of this scan, rcu_gp_init() holds no locks.

3.	The last CPU corresponding to #1's leaf rcu_node structure goes
	offline, clearing its bit in that structure's ->qsmaskinitnext
	field, but not touching the ->qsmaskinit field.  Note that
	rcu_gp_init() is not currently holding any locks!  This CPU does
	-not- report a quiescent state because the grace period has not
	yet initialized itself sufficiently to have set any bits in any
	of the leaf rcu_node structures' ->qsmask fields.

4.	The rcu_gp_init() function continues initializing the new grace
	period, copying each leaf rcu_node structure's ->qsmaskinit
	field to its ->qsmask field while holding the corresponding ->lock.
	This sets the ->qsmask bit corresponding to #3's CPU.

5.	Before the grace period ends, #3's CPU comes back online.
	Because the grace period has not yet done any force-quiescent-state
	scans (which would report a quiescent state on behalf of any
	offline CPUs), this CPU's ->qsmask bit is still set.

6.	A task running on the newly onlined CPU is preempted while in
	an RCU read-side critical section.  Because this CPU's ->qsmask
	bit is set, not only does this task queue itself on the leaf
	rcu_node structure's ->blkd_tasks list, it also sets that
	structure's ->gp_tasks pointer to reference it.

7.	The grace period started in #2 above comes to an end.  This
	results in rcu_gp_cleanup() being invoked, which, among other
	things, checks to make sure that there are no tasks blocking the
	just-ended grace period, that is, that all ->gp_tasks pointers
	are NULL.  The ->gp_tasks pointer corresponding to the task
	preempted in #6 above is non-NULL, which results in a splat.

This splat is a false positive.  The task's RCU read-side critical
section cannot have begun before the just-ended grace period because
this would mean either: (1) The CPU came online before the grace period
started, which cannot have happened because the grace period started
before that CPU went offline, or (2) The task started its RCU read-side
critical section on some other CPU, but then it would have had to have
been preempted before migrating to this CPU, which would mean that it
would have instead queued itself on that other CPU's rcu_node structure.
RCU's grace periods are thus working correctly.  Or, more accurately,
any remaining bugs in RCU's grace periods are elsewhere.

This commit eliminates this false positive by adding code to the end
of rcu_cpu_starting() that reports a quiescent state to RCU, which has
the side-effect of clearing that CPU's ->qsmask bit, preventing the
above scenario.  This approach has the added benefit of more promptly
reporting quiescent states corresponding to offline CPUs.  Nevertheless,
this commit does -not- remove the need for the force-quiescent-state
scans to check for offline CPUs, given that a CPU might remain offline
indefinitely.  And without the checks in the force-quiescent-state scans,
the grace period would also persist indefinitely, which could result in
hangs or memory exhaustion.

Note well that the call to rcu_report_qs_rnp() reporting the quiescent
state must come -after- the setting of this CPU's bit in the leaf rcu_node
structure's ->qsmaskinitnext field.  Otherwise, lockdep-RCU will complain
bitterly about quiescent states coming from an offline CPU.
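
The ordering constraint can be pictured with this hedged sketch of the
end of rcu_cpu_starting() (simplified, with approximate signatures; note
that rcu_report_qs_rnp() releases the rcu_node ->lock):

	raw_spin_lock_irqsave_rcu_node(rnp, flags);
	rnp->qsmaskinitnext |= mask;	/* First: CPU is online for RCU. */
	if (rnp->qsmask & mask)		/* Then: GP waiting on this CPU? */
		rcu_report_qs_rnp(mask, rsp, rnp, rnp->gp_seq, flags);
	else
		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);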

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:39:03 -07:00
Paul E. McKenney
fece27760f rcu: Suppress false-positive preempted-task splats
Consider the following sequence of events in a PREEMPT=y kernel:

1.	All CPUs corresponding to a given rcu_node structure go offline.
	A new grace period starts just after the CPU-hotplug code path
	does its synchronize_rcu() for the last CPU, so at least this
	CPU is present in that structure's ->qsmask.

2.	Before the grace period ends, a CPU comes back online, and not
	just any CPU, but the one corresponding to a non-zero bit in
	the leaf rcu_node structure's ->qsmask.

3.	A task running on the newly onlined CPU is preempted while in
	an RCU read-side critical section.  Because this CPU's ->qsmask
	bit is set, not only does this task queue itself on the leaf
	rcu_node structure's ->blkd_tasks list, it also sets that
	structure's ->gp_tasks pointer to reference it.

4.	The grace period started in #1 above comes to an end.  This
	results in rcu_gp_cleanup() being invoked, which, among other
	things, checks to make sure that there are no tasks blocking the
	just-ended grace period, that is, that all ->gp_tasks pointers
	are NULL.  The ->gp_tasks pointer corresponding to the task
	preempted in #3 above is non-NULL, which results in a splat.

This splat is a false positive.  The task's RCU read-side critical
section cannot have begun before the just-ended grace period because
this would mean either: (1) The CPU came online before the grace period
started, which cannot have happened because the grace period started
before that CPU was all the way offline, or (2) The task started its
RCU read-side critical section on some other CPU, but then it would
have had to have been preempted before migrating to this CPU, which
would mean that it would have instead queued itself on that other CPU's
rcu_node structure.

This commit eliminates this false positive by adding code to the end
of rcu_cleanup_dying_idle_cpu() that reports a quiescent state to RCU,
which has the side-effect of clearing that CPU's ->qsmask bit, preventing
the above scenario.  This approach has the added benefit of more promptly
reporting quiescent states corresponding to offline CPUs.

Note well that the call to rcu_report_qs_rnp() reporting the quiescent
state must come -before- the clearing of this CPU's bit in the leaf
rcu_node structure's ->qsmaskinitnext field.  Otherwise, lockdep-RCU
will complain bitterly about quiescent states coming from an offline CPU.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:39:02 -07:00
Paul E. McKenney
5554788e1d rcu: Suppress false-positive offline-CPU lockdep-RCU splat
The rcu_lockdep_current_cpu_online() function currently checks only the
RCU-sched data structures to determine whether or not RCU believes that a
given CPU is offline.  Unfortunately, there are multiple flavors of RCU,
which means that there is a short window of time during which the various
flavors disagree as to whether or not a given CPU is offline.  This can
result in false-positive lockdep-RCU splats in which some other flavor
of RCU tries to do something based on its view that the CPU is online,
only to get hit with a lockdep-RCU splat because RCU-sched instead
believes that the CPU is offline.

This commit therefore changes rcu_lockdep_current_cpu_online() to scan
all RCU flavors and to consider a given CPU to be online if any of the
RCU flavors believe it to be online, thus preventing these false-positive
splats.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:39:02 -07:00
Paul E. McKenney
928164351e rcu: Prevent useless FQS scan after all CPUs have checked in
The force_qs_rnp() function checks for ->qsmask being all zero, that is,
all CPUs for the current rcu_node structure having already passed through
quiescent states.  But with RCU-preempt, this is not sufficient to report
quiescent states further up the tree, so there are further checks that
can initiate RCU priority boosting and also for races with CPU-hotplug
operations.  However, if neither of these further checks apply, the code
proceeds to carry out a useless scan of an all-zero ->qsmask.

This commit therefore adds code to release the current rcu_node
structure's lock and continue on to the next rcu_node structure, thereby
avoiding this useless scan.
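
In outline, the force_qs_rnp() scan gains an early release-and-continue
(a sketch; "boost_needed" and "hotplug_race" are placeholders for the
further checks mentioned above):

	rcu_for_each_leaf_node(rsp, rnp) {
		raw_spin_lock_irqsave_rcu_node(rnp, flags);
		if (rnp->qsmask == 0 && !boost_needed && !hotplug_race) {
			/* Everyone checked in; nothing to force here. */
			raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
			continue;
		}
		/* ... otherwise force quiescent states as before ... */
	}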

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:39:01 -07:00
Paul E. McKenney
91f63ced7d rcu: Replace smp_wmb() with smp_store_release() for stall check
This commit gets rid of the smp_wmb() in record_gp_stall_check_time()
in favor of an smp_store_release().
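
The change has roughly this shape, where j is the current jiffies value
and j1 the stall-check timeout (field names per that era's rcu_state,
shown for illustration):

	/* Before: explicit barrier orders the two stores. */
	rsp->gp_start = j;
	smp_wmb();	/* Record GP start before stall-check time. */
	WRITE_ONCE(rsp->jiffies_stall, j + j1);

	/* After: the release store provides the needed ordering by itself. */
	rsp->gp_start = j;
	smp_store_release(&rsp->jiffies_stall, j + j1);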

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:39:01 -07:00
Paul E. McKenney
77cfc7bf24 rcu: Fix typo and add additional debug
This commit fixes a typo and adds some additional debugging to the
message emitted when a task blocking the current grace period is listed
as blocking it when either that grace period ends or the next grace
period begins.  This commit also reformats the console message for
readability.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:39:00 -07:00
Paul E. McKenney
c74859d1eb rcu: Make rcu_report_unblock_qs_rnp() warn on violated preconditions
If rcu_report_unblock_qs_rnp() is invoked on something other than
preemptible RCU or if there are still preempted tasks blocking the
current grace period, something went badly wrong in the caller.
This commit therefore adds WARN_ON_ONCE() checks for these conditions, while
leaving the legitimate reason for early exit (rnp->qsmask != 0)
unwarned.
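
Schematically (a hedged sketch; rcu_state_p denotes that era's pointer
to the preemptible-RCU state):

	if (WARN_ON_ONCE(rsp != rcu_state_p) ||
	    WARN_ON_ONCE(rcu_preempt_blocked_readers_cgp(rnp)) ||
	    rnp->qsmask != 0) {
		/* Only the last condition is a legitimate early exit. */
		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
		return;
	}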

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:38:59 -07:00
Paul E. McKenney
8d672fa6bf rcu: Make rcu_init_new_rnp() stop upon already-set bit
Currently, rcu_init_new_rnp() walks up the rcu_node combining tree,
setting bits in the ->qsmaskinit fields on the way up.  It walks up
unconditionally, regardless of the initial state of these bits.  This is
OK because only the corresponding RCU grace-period kthread ever tests
or sets these bits during runtime.  However, it is also pointless, and
it increases both memory and lock contention (albeit only slightly), so
this commit stops the walk as soon as an already-set bit is encountered.
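
The walk then looks roughly like this simplified sketch of
rcu_init_new_rnp():

	for (;;) {
		mask = rnp->grpmask;
		rnp = rnp->parent;
		if (!rnp)
			return;			/* Walked past the root. */
		raw_spin_lock_rcu_node(rnp);	/* irqs already disabled. */
		oldmask = rnp->qsmaskinit;
		rnp->qsmaskinit |= mask;
		raw_spin_unlock_rcu_node(rnp);
		if (oldmask)	/* Ancestors already marked, so stop here. */
			return;
	}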

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:38:59 -07:00
Paul E. McKenney
c50cbe535c rcu: Fix an obsolete ->qsmaskinit comment
Back in the old days, when grace-period initialization blocked CPU
hotplug, the ->qsmaskinit mask was indeed updated at the time that
a given CPU went offline.  However, with the deferral of these updates
until the beginning of the next grace period in commit 0aa04b055e
("rcu: Process offlining and onlining only at grace-period start"),
it is instead ->qsmaskinitnext that gets updated at that time.

This commit therefore updates the obsolete comment.  It also fixes
punctuation while on the topic of comments mentioning ->qsmaskinit.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:38:58 -07:00
Paul E. McKenney
962aff03c3 rcu: Clean up handling of tasks blocked across full-rcu_node offline
Commit 0aa04b055e ("rcu: Process offlining and onlining only at
grace-period start") deferred handling of CPU-hotplug events until the
start of the next grace period, but consider the following sequence
of events:

1.	A task is preempted within an RCU-preempt read-side critical
	section.

2.	The CPU that this task was running on goes offline, along with all
	other CPUs sharing the corresponding leaf rcu_node structure.

3.	The task resumes execution.

4.	One of those CPUs comes back online before a new grace period starts.

In step 2, the code in the next rcu_gp_init() invocation will (correctly)
defer removing the leaf rcu_node structure from the upper-level bitmasks,
and will (correctly) set that structure's ->wait_blkd_tasks field.  During
the ensuing interval, RCU will (correctly) track the tasks preempted on
that structure because they must block any subsequent grace period.

In step 3, the code in rcu_read_unlock_special() will (correctly) remove
the task from the leaf rcu_node structure.  From this point forward, RCU
need not pay attention to this structure, at least not until one of the
corresponding CPUs comes back online.

In step 4, the code in the next rcu_gp_init() invocation will
(incorrectly) invoke rcu_init_new_rnp().  This is incorrect because
the corresponding rcu_cleanup_dead_rnp() was never invoked.  This is
nevertheless harmless because the upper-level bits are still set.
So, no harm, no foul, right?

At least, all is well until a little further into rcu_gp_init()
invocation, which will notice that there are no longer any tasks blocked
on the leaf rcu_node structure, conclude that there is no longer anything
left over from step 2's offline operation, and will therefore invoke
rcu_cleanup_dead_rnp().  But this invocation of rcu_cleanup_dead_rnp()
is for the beginning of the earlier offline interval, and the previous
invocation of rcu_init_new_rnp() is for the end of that same interval.
That is right, they are invoked out of order.

That cannot be good, can it?

It turns out that this is not a (correctness!) problem because
rcu_cleanup_dead_rnp() checks to see if any of the corresponding CPUs
are online, and refuses to do anything if so.  In other words, in the
case where rcu_init_new_rnp() and rcu_cleanup_dead_rnp() execute out of
order, they both have no effect.

But this is at best an accident waiting to happen.

This commit therefore adds logic to rcu_gp_init() so that
rcu_init_new_rnp() and rcu_cleanup_dead_rnp() are always invoked in
order, and so that neither are invoked at all in cases where RCU had to
pay attention to the leaf rcu_node structure during the entire time that
all corresponding CPUs were offline.

And, while in the area, this commit reduces confusion by using formal
parameters rather than local variables that just happen to have the same
value at that particular point in the code.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:38:58 -07:00
Joel Fernandes (Google)
226ca5e766 rcu: Identify grace period is in progress as we advance up the tree
There is no need to keep checking the same starting node for whether a
grace period is in progress as we advance up the funnel-locking loop.
It is sufficient to check it once at the start and then to check the
internal nodes as we advance up the combining tree.  This also makes
sense because grace-period updates propagate from the root to the
leaves, so there is a chance that we will find a grace period has
started as we advance up; check for that as well.

Reported-by: Paul McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:38:57 -07:00
Joel Fernandes (Google)
df2bf8f7f7 rcu: Use better variable names in funnel locking loop
The funnel-locking loop in rcu_start_this_gp() uses rcu_root as a
temporary variable while walking the combining tree, which forces the
reader into a tiresome exercise of remembering that rcu_root might not
actually be the root.  Let's just call it rnp, and rename the other
variables to be more appropriate as well.

Original patch: https://patchwork.kernel.org/patch/10396577/

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Fix name in comment as well. ]
2018-07-12 15:38:57 -07:00
Joel Fernandes
b73de91d6a rcu: Rename the grace-period-request variables and parameters
The name 'c' is used for variables and parameters holding the requested
grace-period sequence number.  However, it is no longer very meaningful
given the conversions from ->gpnum and (especially) ->completed to
->gp_seq. This commit therefore renames 'c' to 'gp_seq_req'.

Previous patch discussion is at:
https://patchwork.kernel.org/patch/10396579/

Signed-off-by: Joel Fernandes <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:38:56 -07:00
Paul E. McKenney
3d18469a2b rcu: Regularize resetting of rcu_data wrap indicator
The rcu_data structure's ->gpwrap indicator is currently reset only
when the CPU in question detects a new grace period.  This is in theory
sufficient because any CPU that has been out of action for long enough
that its ->gpwrap indicator is set is guaranteed to see both the end
of an old grace period and the start of a new one.

However, the current code leaves a short window during which the ->gpwrap
indicator has been reset but the corresponding ->gp_seq counter has not
yet been brought up to date.  This is harmless because interrupts are
disabled, but it is likely to (at the very least) cause confusion.

This commit therefore moves the resetting of ->gpwrap to follow the
updating of ->gp_seq.  While in the area, it also resets ->gp_seq_needed.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:38:56 -07:00
Paul E. McKenney
d72193123c rcutorture: Correctly handle grace-period sequence wrap
The new ->gp_seq grace-period sequence numbers must be shifted down,
which produces artifacts when these numbers wrap.  This commit therefore
enables rcutorture and rcuperf to handle grace-period sequence numbers
even if they do wrap.  It does this by allowing a special subtraction
function to be specified, and this function subtracts before shifting.
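
The key is to subtract the raw sequence numbers first and shift away the
state bits afterwards, so that unsigned wraparound cancels out.  A hedged
sketch (the helper name here is illustrative):

	/* Approximate number of grace periods between two ->gp_seq values. */
	static unsigned long gp_seq_diff(unsigned long new, unsigned long old)
	{
		return (new - old) >> RCU_SEQ_CTR_SHIFT;
	}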

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:38:55 -07:00
Paul E. McKenney
2e3e5e5501 rcu: Make rcu_start_this_gp() check for grace period already started
In the old days of ->gpnum and ->completed, the code requesting a new
grace period checked to see if that grace period had already started,
bailing early if so.  The new-age ->gp_seq approach instead checks
whether the grace period has already finished.  A compensating change
pushed the requested grace period down to the bottom of the tree, thus
reducing lock contention and even eliminating it in some cases.  But why
not further reduce contention, especially on large systems, by doing both,
especially given that the cost of doing both is extremely small?

This commit therefore adds a new rcu_seq_started() function that checks
whether a specified grace period has already started.  It then uses
this new function in place of rcu_seq_done() in the rcu_start_this_gp()
function's funnel locking code.
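
One plausible form of the new helper, consistent with the description
above (the real definition lives in kernel/rcu/rcu.h and may differ in
detail):

	/* Has a grace period started before gp_seq snapshot "s"? */
	static inline bool rcu_seq_started(unsigned long *sp, unsigned long s)
	{
		return ULONG_CMP_LT((s - 1) & ~RCU_SEQ_STATE_MASK, READ_ONCE(*sp));
	}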

Reported-by: Joel Fernandes <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:38:54 -07:00
Joel Fernandes (Google)
5ca0905f67 rcu: Fix cpustart tracepoint gp_seq number
The "cpustart" trace event shows a stale gp_seq. This is because it uses
rdp->gp_seq, which is updated only at the end of the __note_gp_changes()
function. This commit therefore instead uses rnp->gp_seq.

An alternative fix would be to update rdp->gp_seq earlier, but this would
break RCU's detection of the beginning of a new-to-this-CPU grace period.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:38:53 -07:00
Joel Fernandes (Google)
5b55072f22 rcu: Produce last "CleanupMore" trace only if late-breaking request
Currently Tree RCU's clean-up code emits a "CleanupMore" trace event in
response to late-arriving grace-period requests even if the grace period
was already requested. This makes "CleanupMore" show up an extra time (in
addition to once for each rcu_node structure that was previously marked
with the request), and for no good reason.  This commit therefore avoids
emitting this trace message unless the only request for this next
grace period arrived during or after the cleanup scan of the rcu_node
structures.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:38:53 -07:00
Paul E. McKenney
a2165e4168 rcu: Don't funnel-lock above leaf node if GP in progress
The old grace-period start code would acquire only the leaf's rcu_node
structure's ->lock if that structure believed that a grace period was
in progress.  The new code advances to the leaf's parent in this case,
needlessly acquiring the leaf's parent's ->lock.  This commit therefore
checks the grace-period state after marking the leaf with the need for
the specified grace period, and if the leaf believes that a grace period
is in progress, takes an early exit.

Reported-by: Joel Fernandes <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Add "Startedleaf" tracing as suggested by Joel Fernandes. ]
2018-07-12 15:38:52 -07:00
Paul E. McKenney
e44e73ca47 rcu: Make simple callback acceleration refer to rdp->gp_seq_needed
Now that the rcu_data structure contains ->gp_seq_needed, create an
rcu_accelerate_cbs_unlocked() helper function that locklessly checks to
see if new callbacks' required grace period has already been requested.
If so, update the callback list locally and again locklessly.  (Though
interrupts must be and are disabled to avoid racing with conflicting
updates in interrupt handlers.)

Otherwise, call rcu_accelerate_cbs() as before.
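
A hedged sketch of the helper (simplified; the in-kernel version differs
in detail):

	static void rcu_accelerate_cbs_unlocked(struct rcu_state *rsp,
						struct rcu_node *rnp,
						struct rcu_data *rdp)
	{
		unsigned long c;
		bool needwake;

		lockdep_assert_irqs_disabled();
		c = rcu_seq_snap(&rsp->gp_seq);
		if (!rdp->gpwrap && ULONG_CMP_GE(rdp->gp_seq_needed, c)) {
			/* Old request still covers us; classify locally. */
			(void)rcu_segcblist_accelerate(&rdp->cblist, c);
			return;
		}
		raw_spin_lock_rcu_node(rnp);	/* irqs already disabled. */
		needwake = rcu_accelerate_cbs(rsp, rnp, rdp);
		raw_spin_unlock_rcu_node(rnp);
		if (needwake)
			rcu_gp_kthread_wake(rsp);
	}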

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:38:49 -07:00
Paul E. McKenney
ff3bb6f4d0 rcu: Remove ->gpnum and ->completed
Now that everything has been converted to use ->gp_seq instead of
->gpnum and ->completed, this commit removes ->gpnum and ->completed.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:38:48 -07:00
Paul E. McKenney
fee5997c17 rcu: Convert rcu_fqs tracepoint to ->gp_seq
This commit makes the rcu_fqs tracepoint use ->gp_seq instead of ->gpnum.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:38:47 -07:00
Paul E. McKenney
db023296f0 rcu: Convert rcu_quiescent_state_report tracepoint to ->gp_seq
This commit makes the rcu_quiescent_state_report tracepoint use ->gp_seq
instead of ->gpnum.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:38:47 -07:00
Paul E. McKenney
865aa1e08d rcu: Convert rcu_unlock_preempted_task tracepoint to ->gp_seq
This commit makes the rcu_unlock_preempted_task tracepoint use ->gp_seq
instead of ->gpnum.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:38:46 -07:00
Paul E. McKenney
598ce09480 rcu: Convert rcu_preempt_task tracepoint to ->gp_seq
This commit makes the rcu_preempt_task tracepoint use ->gp_seq instead
of ->gpnum.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:38:45 -07:00
Paul E. McKenney
abd13fdd95 rcu: Convert rcu_future_grace_period tracepoint to gp_seq
This commit makes the rcu_future_grace_period tracepoint use gp_seq
instead of ->gpnum and ->completed.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:38:44 -07:00
Paul E. McKenney
477351f782 rcu: Convert rcu_grace_period tracepoint to gp_seq
This commit makes the rcu_grace_period tracepoint use gp_seq instead
of ->gpnum or ->completed.  It also introduces a "cpuofl-bgp" string to
less obscurely indicate when a CPU has gone offline while a grace period
is waiting on it.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:38:43 -07:00
Paul E. McKenney
ab5e869c1f rcu: Make rcu_nocb_wait_gp() check if GP already requested
This commit makes rcu_nocb_wait_gp() check rdp->gp_seq_needed to see
if the current CPU already knows about the needed grace period having
already been requested.  If so, it avoids acquiring the corresponding
leaf rcu_node structure's ->lock, thus decreasing contention.  This
optimization is intended for cases where either multiple leader rcuo
kthreads are running on the same CPU or these kthreads are running on
a non-offloaded (e.g., housekeeping) CPU.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Move lock release past "if" as suggested by Joel Fernandes. ]
[ paulmck: Fix caching of furthest-future requested grace period. ]
2018-07-12 15:38:42 -07:00
Paul E. McKenney
7a1d0f23ad rcu: Move from ->need_future_gp[] to ->gp_seq_needed
One problem with the ->need_future_gp[] array is that the grace-period
assignment of each element changes as the grace periods complete.
This means that it is necessary to hold a lock when checking this
array to learn if a given grace period has already been requested.
This increases lock contention, which is the opposite of helpful.
This commit therefore replaces the ->need_future_gp[] with a single
->gp_seq_needed value and keeps it updated in the rcu_data structure.

This will enable reliable lockless checking of whether or not a given
grace period has already been requested.
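
This enables lockless checks of roughly the following shape (illustrative):

	/* Has grace period gp_seq_req already been requested for this CPU? */
	if (ULONG_CMP_GE(READ_ONCE(rdp->gp_seq_needed), gp_seq_req))
		return;	/* Yes, so no need to touch the rcu_node ->lock. */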

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:37:48 -07:00
Paul E. McKenney
aebc82644b rcutorture: Convert rcutorture_get_gp_data() to ->gp_seq
SRCU has long used ->srcu_gp_seq, and now RCU uses ->gp_seq.  This
commit therefore moves the rcutorture_get_gp_data() function from
a ->gpnum / ->completed pair to ->gp_seq.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 14:27:57 -07:00
Paul E. McKenney
471f87c3d9 rcu: Make RCU CPU stall warnings use ->gp_seq
This commit makes the RCU CPU stall-warning code in print_other_cpu_stall(),
print_cpu_stall(), and check_cpu_stall() use ->gp_seq instead of ->gpnum
and ->completed.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 14:27:56 -07:00
Paul E. McKenney
29365e563b rcu: Convert grace-period requests to ->gp_seq
This commit converts the grace-period request code paths from ->completed
and ->gpnum to ->gp_seq.  The need_future_gp_element() macro encapsulates
the shift operation required to use ->gp_seq as an index to the
->need_future_gp[] array.  The rcu_cbs_completed() function is removed
in favor of the rcu_seq_snap() function.  The rcu_start_this_gp()
gets some temporary consistency checks and uses rcu_seq_done(),
rcu_seq_current(), rcu_seq_state(), and rcu_gp_in_progress() in place
of the earlier open-coded comparisons of ->gpnum and ->completed.
The rcu_future_gp_cleanup() function replaces use of ->completed
with ->gp_seq.  The rcu_accelerate_cbs() function replaces a call to
rcu_cbs_completed() with one to rcu_seq_snap().  The rcu_advance_cbs()
function replaces an access to ->completed with one to ->gp_seq and adds
some temporary warnings.  The rcu_nocb_wait_gp() function replaces a
call to rcu_cbs_completed() with one to rcu_seq_snap() and an open-coded
comparison with rcu_seq_done().

The temporary warnings will be removed when the various ->gpnum and
->completed fields are removed.  Their purpose is to locate code that
might still be using ->gpnum and ->completed.  (Much easier that way
than trying to trace down the causes of too-short grace periods and
grace-period hangs!)

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 14:27:55 -07:00
Paul E. McKenney
d43a5d32e1 rcu: Convert ->completedqs to ->gp_seq
This commit switches the quiescent-state no-backtracking checks from
->gpnum and ->completed to ->gp_seq.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 14:27:54 -07:00
Paul E. McKenney
8aa670cdac rcu: Convert ->rcu_iw_gpnum to ->gp_seq
This commit switches the interrupt-disabled detection mechanism to
->gp_seq.  This mechanism is used as part of RCU CPU stall warnings,
and detects cases where the stall is due to a CPU having interrupts
disabled.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 14:27:53 -07:00
Paul E. McKenney
ba04107fc9 rcu: Move rcu_gp_in_progress() to ->gp_seq
This commit makes rcu_gp_in_progress() use ->gp_seq instead of
->completed and ->gpnum.  The READ_ONCE() invocations are buried
in rcu_seq_current().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 14:27:52 -07:00
Paul E. McKenney
e0da2374c3 rcu: Move rcu_nocb_gp_get() to ->gp_seq
This commit makes rcu_try_advance_all_cbs() use ->gp_seq.  It uses
rcu_seq_ctr() in order to shift away the state bits, so that the
low-order bits of the result may safely be used to index ->nocb_gp_wq[].

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 14:27:52 -07:00
Paul E. McKenney
03c8cb765a rcu: Move rcu_try_advance_all_cbs() to ->gp_seq
This commit makes rcu_try_advance_all_cbs() use ->gp_seq, with the
exception of tracing, which will be converted later.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 14:27:51 -07:00
Paul E. McKenney
e05720b097 rcu: Move rcu_implicit_dynticks_qs() to ->gp_seq
This commit makes rcu_implicit_dynticks_qs() use ->gp_seq, with the
exception of tracing, which will be converted later.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 14:27:51 -07:00
Paul E. McKenney
a66ae8ae35 rcu: Convert rcu_gpnum_ovf() to ->gp_seq
This commit converts rcu_gpnum_ovf() to use ->gp_seq instead of ->gpnum.
Same size unsigned long, so same approach.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 14:27:50 -07:00
Paul E. McKenney
67e14c1e39 rcu: Move RCU's grace-period-change code to ->gp_seq
This commit moves __note_gp_changes(), note_gp_changes(), and
__rcu_pending() to ->gp_seq, creating new rcu_seq_completed_gp() and
rcu_seq_new_gp() functions for this purpose.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Reinstate "cpuend" trace as suggested by Joel Fernandes. ]
2018-07-12 14:27:50 -07:00
Paul E. McKenney
e4be81a2ed rcu: Convert conditional grace-period primitives to ->gp_seq
This commit converts get_state_synchronize_rcu(), cond_synchronize_rcu(),
get_state_synchronize_sched(), and cond_synchronize_sched() from ->gpnum
and ->completed to ->gp_seq.  Note that this also introduces a full
memory barrier in the already-done paths of cond_synchronize_rcu() and
cond_synchronize_sched(), as work with LKMM indicates that the earlier
smp_load_acquire() calls were insufficiently strong in some situations where
these two functions were called just as the grace period ended.  In such
cases, these two functions would not gain the benefit of memory ordering
at the end of the grace period.
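
The resulting shape is roughly as follows, using the RCU-preempt flavor
as an example (hedged sketch; rcu_state_p is that era's pointer to the
preemptible-RCU state, and the sched variant is analogous):

	void cond_synchronize_rcu(unsigned long oldstate)
	{
		if (!rcu_seq_done(&rcu_state_p->gp_seq, oldstate))
			synchronize_rcu();	/* Not yet done: wait for a GP. */
		else
			smp_mb();	/* Already done: supply the ordering. */
	}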

Please note that the performance impact is negligible, as you shouldn't
be using either function anywhere near a fastpath in any case.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 14:27:49 -07:00
Paul E. McKenney
c9a24e2d0c rcu: Make quiescent-state reporting use ->gp_seq
This commit switches the functions reporting quiescent states from
use of ->gpnum to ->gp_seq.  In either case, the point is to handle
races where a given grace period ends before a quiescent state can
be reported.  Failing to catch these races would result in too-short
grace periods, hence the checking.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 14:27:48 -07:00
Paul E. McKenney
78c5a67f17 rcu: Convert rcu_check_gp_kthread_starvation() to GP sequence number
This commit switches rcu_check_gp_kthread_starvation() from printing
->gpnum and ->completed to printing ->gp_seq upon detecting a starving
RCU grace-period kthread during an RCU CPU stall warning.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 14:27:48 -07:00
Paul E. McKenney
17ef2fe97c rcu: Make rcutorture's batches-completed API use ->gp_seq
The rcutorture test invokes rcu_batches_started(),
rcu_batches_completed(), rcu_batches_started_bh(),
rcu_batches_completed_bh(), rcu_batches_started_sched(), and
rcu_batches_completed_sched() to do grace-period consistency checks,
and rcuperf uses the _completed variants for statistics.
These functions use ->gpnum and ->completed.  This commit therefore
replaces them with rcu_get_gp_seq(), rcu_bh_get_gp_seq(), and
rcu_sched_get_gp_seq(), adjusting rcutorture and rcuperf to make
use of them.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 14:27:47 -07:00
Paul E. McKenney
dee4f42298 rcu: Move rcu_gp_slow() to ->gp_seq
This commit moves rcu_gp_slow() to ->gp_seq.  This function only uses
the grace-period number to modulate delay, so rcu_seq_ctr(rsp->gp_seq)
gets the same effect, at least in cases where the delay is to happen
more than four times per wrap of an unsigned long.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 14:27:46 -07:00
Paul E. McKenney
de30ad512a rcu: Introduce grace-period sequence numbers
This commit adds grace-period sequence numbers (->gp_seq) to the
rcu_state, rcu_node, and rcu_data structures, and updates them.
It also checks for consistency between rsp->gpnum and rsp->gp_seq.
These ->gp_seq counters will eventually replace the existing ->gpnum
and ->completed counters, allowing a single memory access to determine
whether or not a grace period is in progress and if so, which one.
This in turn will enable changes that will reduce ->lock contention on
the leaf rcu_node structures.
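
The encoding keeps the grace-period state in the low-order bits of
->gp_seq and the count in the remaining bits, so helpers roughly as in
kernel/rcu/rcu.h can answer both questions from a single load:

	#define RCU_SEQ_CTR_SHIFT	2
	#define RCU_SEQ_STATE_MASK	((1 << RCU_SEQ_CTR_SHIFT) - 1)

	static inline unsigned long rcu_seq_ctr(unsigned long s)
	{
		return s >> RCU_SEQ_CTR_SHIFT;	/* Which grace period? */
	}

	static inline int rcu_seq_state(unsigned long s)
	{
		return s & RCU_SEQ_STATE_MASK;	/* Nonzero: GP in progress. */
	}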

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 14:27:46 -07:00
Paul E. McKenney
609af1cdf0 Merge branches 'expedited.2018.07.12a', 'fixes.2018.07.12a', 'srcu.2018.06.25b' and 'torture.2018.06.25b' into HEAD
expedited.2018.07.12a: Expedited grace-period updates.
fixes.2018.07.12a: Pre-gp_seq miscellaneous fixes.
srcu.2018.06.25b: SRCU updates.
torture.2018.06.25b: Pre-gp_seq torture-test updates.
2018-07-12 14:26:14 -07:00
Paul E. McKenney
18390aeae7 rcu: Make rcu_gp_cleanup() write only once to ->gp_flags
At the end of rcu_gp_cleanup(), if another grace period is needed, but
not via rcu_accelerate_cbs(), the ->gp_flags field is written twice,
once when making the new grace-period request, and once when clearing
all other types of requests.  This commit therefore adds an else-clause
to avoid this double write.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 14:25:17 -07:00
Paul E. McKenney
26d950a945 rcu: Diagnostics for grace-period startup hangs
This commit causes a splat if RCU is idle and a request for a new grace
period is ignored for more than one second.  This splat normally indicates
that some code path asked for a new grace period, but failed to wake up
the RCU grace-period kthread.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Fix bug located by Dan Carpenter and his static checker. ]
[ paulmck: Fix self-deadlock bug located by 0day test robot. ]
[ paulmck: Disable unless CONFIG_PROVE_RCU=y. ]
2018-07-12 14:24:42 -07:00
Boqun Feng
fcc6354365 rcu: Make expedited GPs handle CPU 0 being offline
Currently, the parallelized initialization of expedited grace periods uses
the workqueue associated with each rcu_node structure's ->grplo field.
This works fine unless that CPU is offline.  This commit therefore uses
the CPU corresponding to the lowest-numbered online CPU, or just queues
the work on WORK_CPU_UNBOUND if there are no online CPUs corresponding
to this rcu_node structure.

Note that this patch uses cpu_is_offline() instead of the usual approach
of checking bits in the rcu_node structure's ->qsmaskinitnext field.  This
is safe because preemption is disabled across both the cpu_is_offline()
check and the call to queue_work_on().

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
[ paulmck: Disable preemption to close offline race window. ]
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Apply Peter Zijlstra feedback on CPU selection. ]
Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
2018-07-12 12:36:06 -07:00
Paul E. McKenney
8c42b1f39f rcu: Exclude near-simultaneous RCU CPU stall warnings
There is a two-jiffy delay between the time that a CPU will self-report
an RCU CPU stall warning and the time that some other CPU will report a
warning on behalf of the first CPU.  This has worked well in the past,
but on busy systems, it is possible for the two warnings to overlap,
which makes interpreting them extremely difficult.

This commit therefore uses a cmpxchg-based timing decision that
allows only one report in a given one-minute period (assuming default
stall-warning Kconfig parameters).  This approach will of course fail
if you are seeing minute-long vCPU preemption, but in that case the
overlapping RCU CPU stall warnings are the least of your worries.
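
The rate limiting works roughly like the following hedged sketch: only
the CPU that wins the cmpxchg on the shared stall timestamp reports, and
everyone else sees a timestamp already pushed about a minute into the
future (the reporting helper's name and arguments are approximate):

	js = READ_ONCE(rsp->jiffies_stall);
	jn = jiffies + 3 * rcu_jiffies_till_stall_check() + 3;
	if (rcu_gp_in_progress(rsp) && time_after_eq(jiffies, js) &&
	    cmpxchg(&rsp->jiffies_stall, js, jn) == js) {
		/* This CPU won the race, so it alone reports the stall. */
		print_other_cpu_stall(rsp);
	}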

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-06-26 12:25:56 -07:00
Boqun Feng
ce11fae8d4 rcu: Use the proper lockdep annotation in dump_blkd_tasks()
Sparse reported this:

| kernel/rcu/tree_plugin.h:814:9: warning: incorrect type in argument 1 (different modifiers)
| kernel/rcu/tree_plugin.h:814:9:    expected struct lockdep_map const *lock
| kernel/rcu/tree_plugin.h:814:9:    got struct lockdep_map [noderef] *<noident>

This is caused by using vanilla lockdep annotations on rcu_node::lock,
and that requires accessing ->lock of rcu_node directly. However we need
to keep rcu_node::lock __private to avoid breaking its extra ordering
guarantee. And we have a dedicated lockdep annotation for
rcu_node::lock, so use it.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-06-26 12:25:55 -07:00
Paul E. McKenney
4bc8d55574 rcu: Add debugging info to assertion
The WARN_ON_ONCE(rcu_preempt_blocked_readers_cgp()) in
rcu_gp_cleanup() triggers (inexplicably, of course) every so often.
This commit therefore extracts more information.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-06-26 12:25:55 -07:00
Paul E. McKenney
6050003763 torture: Keep old-school dmesg format
This commit adds "#define pr_fmt(fmt) fmt" to the torture-test files
in order to keep the current dmesg format.  Once Joe's commits have
hit mainline, these definitions will be changed in order to automatically
generate the dmesg line prefix that the scripts expect.  This will have
the beneficial side-effect of allowing printk() formats to be used more
widely and of shortening some pr_*() lines.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Joe Perches <joe@perches.com>
2018-06-25 11:30:10 -07:00
Paul E. McKenney
90127d605f torture: Make online/offline messages appear only for verbose=2
Some bugs reproduce quickly only at high CPU-hotplug rates, so the
rcutorture TREE03 scenario now has only 200 milliseconds spacing between
CPU-hotplug operations.  At this rate, the torture-test pair of console
messages per operation becomes a bit voluminous.  This commit therefore
converts the torture-test set of "verbose" kernel-boot arguments from
bool to int, and prints the extra console messages only when verbose=2.
The default is still verbose=1.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-06-25 11:30:10 -07:00
Paul E. McKenney
5ab07a8df4 srcu: Add address of first callback to rcutorture output
This commit adds the address of the first callback to the per-CPU rcutorture
output in order to allow lost wakeups to be more efficiently tracked down.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-06-25 11:26:24 -07:00
Paul E. McKenney
17294ce6a4 srcu: Document that srcu_funnel_gp_start() implies srcu_funnel_exp_start()
This commit updates the header comment of srcu_funnel_gp_start() to
document the fact that srcu_funnel_gp_start() does the work of
srcu_funnel_exp_start(), in some cases by invoking it directly.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-06-25 11:26:24 -07:00
Paul E. McKenney
5ef98a6328 srcu: Fix typos in __call_srcu() header comment
This commit simply changes some copy-pasta call_rcu() instances to
the correct call_srcu().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-06-25 11:26:24 -07:00
Paul E. McKenney
5257514d88 rcu: Make expedited grace period use direct call on last leaf
During expedited grace-period initialization, a work item is scheduled
for each leaf rcu_node structure.  However, that initialization code
is itself (normally) executing from a workqueue, so one of the leaf
rcu_node structures could just as well be handled by that pre-existing
workqueue, and with less overhead.  This commit therefore uses a
shiny new rcu_is_leaf_node() macro to execute the last leaf rcu_node
structure's initialization directly from the pre-existing workqueue.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-06-25 11:25:41 -07:00
Peter Zijlstra
b3dae109fa sched/swait: Rename to exclusive
Since swait basically implemented exclusive waits only, make sure
the API reflects that.

  $ git grep -l -e "\<swake_up\>"
		-e "\<swait_event[^ (]*"
		-e "\<prepare_to_swait\>" | while read file;
    do
	sed -i -e 's/\<swake_up\>/&_one/g'
	       -e 's/\<swait_event[^ (]*/&_exclusive/g'
	       -e 's/\<prepare_to_swait\>/&_exclusive/g' $file;
    done

With a few manual touch-ups.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: bigeasy@linutronix.de
Cc: oleg@redhat.com
Cc: paulmck@linux.vnet.ibm.com
Cc: pbonzini@redhat.com
Link: https://lkml.kernel.org/r/20180612083909.261946548@infradead.org
2018-06-20 11:35:56 +02:00
Kees Cook
42bc47b353 treewide: Use array_size() in vmalloc()
The vmalloc() function has no 2-factor argument form, so multiplication
factors need to be wrapped in array_size(). This patch replaces cases of:

        vmalloc(a * b)

with:
        vmalloc(array_size(a, b))

as well as handling cases of:

        vmalloc(a * b * c)

with:

        vmalloc(array3_size(a, b, c))

This does, however, attempt to ignore constant size factors like:

        vmalloc(4 * 1024)

though any constants defined via macros get caught up in the conversion.

Any factors with a sizeof() of "unsigned char", "char", and "u8" were
dropped, since they're redundant.

The Coccinelle script used for this was:

// Fix redundant parens around sizeof().
@@
type TYPE;
expression THING, E;
@@

(
  vmalloc(
-	(sizeof(TYPE)) * E
+	sizeof(TYPE) * E
  , ...)
|
  vmalloc(
-	(sizeof(THING)) * E
+	sizeof(THING) * E
  , ...)
)

// Drop single-byte sizes and redundant parens.
@@
expression COUNT;
typedef u8;
typedef __u8;
@@

(
  vmalloc(
-	sizeof(u8) * (COUNT)
+	COUNT
  , ...)
|
  vmalloc(
-	sizeof(__u8) * (COUNT)
+	COUNT
  , ...)
|
  vmalloc(
-	sizeof(char) * (COUNT)
+	COUNT
  , ...)
|
  vmalloc(
-	sizeof(unsigned char) * (COUNT)
+	COUNT
  , ...)
|
  vmalloc(
-	sizeof(u8) * COUNT
+	COUNT
  , ...)
|
  vmalloc(
-	sizeof(__u8) * COUNT
+	COUNT
  , ...)
|
  vmalloc(
-	sizeof(char) * COUNT
+	COUNT
  , ...)
|
  vmalloc(
-	sizeof(unsigned char) * COUNT
+	COUNT
  , ...)
)

// 2-factor product with sizeof(type/expression) and identifier or constant.
@@
type TYPE;
expression THING;
identifier COUNT_ID;
constant COUNT_CONST;
@@

(
  vmalloc(
-	sizeof(TYPE) * (COUNT_ID)
+	array_size(COUNT_ID, sizeof(TYPE))
  , ...)
|
  vmalloc(
-	sizeof(TYPE) * COUNT_ID
+	array_size(COUNT_ID, sizeof(TYPE))
  , ...)
|
  vmalloc(
-	sizeof(TYPE) * (COUNT_CONST)
+	array_size(COUNT_CONST, sizeof(TYPE))
  , ...)
|
  vmalloc(
-	sizeof(TYPE) * COUNT_CONST
+	array_size(COUNT_CONST, sizeof(TYPE))
  , ...)
|
  vmalloc(
-	sizeof(THING) * (COUNT_ID)
+	array_size(COUNT_ID, sizeof(THING))
  , ...)
|
  vmalloc(
-	sizeof(THING) * COUNT_ID
+	array_size(COUNT_ID, sizeof(THING))
  , ...)
|
  vmalloc(
-	sizeof(THING) * (COUNT_CONST)
+	array_size(COUNT_CONST, sizeof(THING))
  , ...)
|
  vmalloc(
-	sizeof(THING) * COUNT_CONST
+	array_size(COUNT_CONST, sizeof(THING))
  , ...)
)

// 2-factor product, only identifiers.
@@
identifier SIZE, COUNT;
@@

  vmalloc(
-	SIZE * COUNT
+	array_size(COUNT, SIZE)
  , ...)

// 3-factor product with 1 sizeof(type) or sizeof(expression), with
// redundant parens removed.
@@
expression THING;
identifier STRIDE, COUNT;
type TYPE;
@@

(
  vmalloc(
-	sizeof(TYPE) * (COUNT) * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  vmalloc(
-	sizeof(TYPE) * (COUNT) * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  vmalloc(
-	sizeof(TYPE) * COUNT * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  vmalloc(
-	sizeof(TYPE) * COUNT * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  vmalloc(
-	sizeof(THING) * (COUNT) * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  vmalloc(
-	sizeof(THING) * (COUNT) * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  vmalloc(
-	sizeof(THING) * COUNT * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  vmalloc(
-	sizeof(THING) * COUNT * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
)

// 3-factor product with 2 sizeof(variable), with redundant parens removed.
@@
expression THING1, THING2;
identifier COUNT;
type TYPE1, TYPE2;
@@

(
  vmalloc(
-	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
+	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
  , ...)
|
  vmalloc(
-	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
  , ...)
|
  vmalloc(
-	sizeof(THING1) * sizeof(THING2) * COUNT
+	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
  , ...)
|
  vmalloc(
-	sizeof(THING1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
  , ...)
|
  vmalloc(
-	sizeof(TYPE1) * sizeof(THING2) * COUNT
+	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
  , ...)
|
  vmalloc(
-	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
  , ...)
)

// 3-factor product, only identifiers, with redundant parens removed.
@@
identifier STRIDE, SIZE, COUNT;
@@

(
  vmalloc(
-	(COUNT) * STRIDE * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  vmalloc(
-	COUNT * (STRIDE) * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  vmalloc(
-	COUNT * STRIDE * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  vmalloc(
-	(COUNT) * (STRIDE) * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  vmalloc(
-	COUNT * (STRIDE) * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  vmalloc(
-	(COUNT) * STRIDE * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  vmalloc(
-	(COUNT) * (STRIDE) * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  vmalloc(
-	COUNT * STRIDE * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
)

// Any remaining multi-factor products, first at least 3-factor products
// when they're not all constants...
@@
expression E1, E2, E3;
constant C1, C2, C3;
@@

(
  vmalloc(C1 * C2 * C3, ...)
|
  vmalloc(
-	E1 * E2 * E3
+	array3_size(E1, E2, E3)
  , ...)
)

// And then all remaining 2 factors products when they're not all constants.
@@
expression E1, E2;
constant C1, C2;
@@

(
  vmalloc(C1 * C2, ...)
|
  vmalloc(
-	E1 * E2
+	array_size(E1, E2)
  , ...)
)

Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-12 16:19:22 -07:00
Peter Zijlstra
f64c6013a2 rcu/x86: Provide early rcu_cpu_starting() callback
The x86/mtrr code does horrific things because hardware. It uses
stop_machine_from_inactive_cpu(), which does a wakeup (of the stopper
thread on another CPU), which uses RCU, all before the CPU is onlined.

RCU complains about this, because wakeups use RCU and RCU does
(rightfully) not consider offline CPUs for grace-periods.

Fix this by initializing RCU way early in the MTRR case.

Tested-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Add !SMP support, per 0day Test Robot report. ]
2018-05-22 16:12:26 -07:00
Paul E. McKenney
22df7316ac Merge branches 'exp.2018.05.15a', 'fixes.2018.05.15a', 'lock.2018.05.15a' and 'torture.2018.05.15a' into HEAD
exp.2018.05.15a: Parallelize expedited grace-period initialization.
fixes.2018.05.15a: Miscellaneous fixes.
lock.2018.05.15a: Decrease lock contention on root rcu_node structure,
	which is a step towards merging RCU flavors.
torture.2018.05.15a: Torture-test updates.
2018-05-15 10:33:05 -07:00
Paul E. McKenney
034777d7f5 rcutorture: Print end-of-test state
This commit adds end-of-test state printout to help check whether RCU
shut down nicely.  Note that this printout only helps for flavors of
RCU that are not used much by the kernel.  In particular, for normal
RCU having a grace period in progress is expected behavior.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 10:32:08 -07:00
Paul E. McKenney
a458360af6 rcu: Drop early GP request check from rcu_gp_kthread()
Now that grace-period requests use funnel locking and now that they
set ->gp_flags to RCU_GP_FLAG_INIT even when the RCU grace-period
kthread has not yet started, rcu_gp_kthread() no longer needs to check
need_any_future_gp() at startup time.  This commit therefore removes
this check.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 10:31:04 -07:00
Paul E. McKenney
c1935209df rcu: Simplify and inline cpu_needs_another_gp()
Now that RCU no longer relies on failsafe checks, cpu_needs_another_gp()
can be greatly simplified.  This simplification eliminates the last
call to rcu_future_needs_gp() and to rcu_segcblist_future_gp_needed(),
both of which can then be eliminated.  And then, because
cpu_needs_another_gp() is called only from __rcu_pending(), it can be
inlined and eliminated.

This commit carries out the simplification, inlining, and elimination
called out above.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 10:30:59 -07:00
Paul E. McKenney
384f77f4cb rcu: The rcu_gp_cleanup() function does not need cpu_needs_another_gp()
All of the cpu_needs_another_gp() function's checks (except for
newly arrived callbacks) have been subsumed into the rcu_gp_cleanup()
function's scan of the rcu_node tree.  This commit therefore drops the
call to cpu_needs_another_gp().  The check for newly arrived callbacks
is supplied by rcu_accelerate_cbs().  Any needed advancing (as in the
earlier rcu_advance_cbs() call) will be supplied when the corresponding
CPU becomes aware of the end of the now-completed grace period.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 10:30:54 -07:00
Paul E. McKenney
665f08f1ce rcu: Make rcu_start_this_gp() check for out-of-range requests
If rcu_start_this_gp() is invoked with a requested grace period more
than three in the future, then either the ->need_future_gp[] array
needs to be bigger or the caller needs to be repaired.  This commit
therefore adds a WARN_ON_ONCE() checking for this condition.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 10:30:48 -07:00
Paul E. McKenney
360e0da67e rcu: Add funnel locking to rcu_start_this_gp()
The rcu_start_this_gp() function had a simple form of funnel locking that
used only the leaves and root of the rcu_node tree, which is fine for
systems with only a few hundred CPUs, but sub-optimal for systems having
thousands of CPUs.  This commit therefore adds full-tree funnel locking.

This variant of funnel locking is unusual in the following ways:

1.	The leaf-level rcu_node structure's ->lock is held throughout.
	Other funnel-locking implementations drop the leaf-level lock
	before progressing to the next level of the tree.

2.	Funnel locking can be started at the root, which is convenient
	for code that already holds the root rcu_node structure's ->lock.
	Other funnel-locking implementations start at the leaves.

3.	If an rcu_node structure other than the initial one believes
	that a grace period is in progress, it is not necessary to
	go further up the tree.  This is because grace-period cleanup
	scans the full tree, so that marking the need for a subsequent
	grace period anywhere in the tree suffices -- but only if
	a grace period is currently in progress.

4.	It is possible that the RCU grace-period kthread has not yet
	started, and this case must be handled appropriately.

However, the general approach of using a tree to control lock contention
is still in place.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 10:30:37 -07:00
Paul E. McKenney
41e80595ab rcu: Make rcu_start_future_gp() caller select grace period
The rcu_accelerate_cbs() function selects a grace-period target, which
it uses to have rcu_segcblist_accelerate() assign numbers to recently
queued callbacks.  Then it invokes rcu_start_future_gp(), which selects
a grace-period target again, which is a bit pointless.  This commit
therefore changes rcu_start_future_gp() to take the grace-period target as
a parameter, thus avoiding double selection.  This commit also changes
the name of rcu_start_future_gp() to rcu_start_this_gp() to reflect
this change in functionality, and also makes a similar change to the
name of trace_rcu_future_gp().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 10:30:32 -07:00
Paul E. McKenney
d5cd96851d rcu: Inline rcu_start_gp_advanced() into rcu_start_future_gp()
The rcu_start_gp_advanced() is invoked only from rcu_start_future_gp() and
much of its code is redundant when invoked from that context.  This commit
therefore inlines rcu_start_gp_advanced() into rcu_start_future_gp(),
then removes rcu_start_gp_advanced().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 10:30:27 -07:00
Paul E. McKenney
a824a287f6 rcu: Clear request other than RCU_GP_FLAG_INIT at GP end
Once the grace period has ended, any RCU_GP_FLAG_FQS requests are
irrelevant:  The grace period has ended, so there is no longer any
point in forcing quiescent states in order to try to make it end sooner.
This commit therefore causes rcu_gp_cleanup() to clear any bits other
than RCU_GP_FLAG_INIT from ->gp_flags at the end of the grace period.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 10:30:20 -07:00
Paul E. McKenney
a508aa597e rcu: Cleanup, don't put ->completed into an int
It is true that currently only the low-order two bits are used, so
there should be no problem given modern machines and compilers, but
good hygiene and maintainability dictate use of an unsigned long
instead of an int.  This commit therefore makes this change.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 10:30:15 -07:00
Paul E. McKenney
bd7af8463b rcu: Switch __rcu_process_callbacks() to rcu_accelerate_cbs()
The __rcu_process_callbacks() function currently checks to see if
the current CPU needs a grace period and also if there is any other
reason to kick off a new grace period.  This is one of the fail-safe
checks that has been rendered unnecessary by the changes that increase
the accuracy of rcu_gp_cleanup()'s estimate as to whether another grace
period is required.  Because this particular fail-safe involved acquiring
the root rcu_node structure's ->lock, which has seen excessive contention
in real life, this fail-safe needs to go.

However, one check must remain, namely the check for newly arrived
RCU callbacks that have not yet been associated with a grace period.
One might hope that the checks in __note_gp_changes(), which is invoked
indirectly from rcu_check_quiescent_state(), would suffice, but this
function won't be invoked at all if RCU is idle.  It is therefore necessary
to replace the fail-safe checks with a simpler check for newly arrived
callbacks during an RCU idle period, which is exactly what this commit
does.  This change removes the final call to rcu_start_gp(), so this
function is removed as well.

Note that lockless use of cpu_needs_another_gp() is racy, but that
these races are harmless in this case.  If RCU really is idle, the
values will not change, so the return value from cpu_needs_another_gp()
will be correct.  If RCU is not idle, the resulting redundant call to
rcu_accelerate_cbs() will be harmless, and might even have the benefit
of reducing grace-period latency a bit.

This commit also moves interrupt disabling into the "if" statement to
improve real-time response a bit.

Reported-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 10:30:03 -07:00
Paul E. McKenney
a6058d85a2 rcu: Avoid __call_rcu_core() root rcu_node ->lock acquisition
When __call_rcu_core() notices excessive numbers of callbacks pending
on the current CPU, we know that at least one of them is not yet
classified, namely the one that was just now queued.  Therefore, it
is not necessary to invoke rcu_start_gp() and thus not necessary to
acquire the root rcu_node structure's ->lock.  This commit therefore
replaces the rcu_start_gp() with rcu_accelerate_cbs(), thus replacing
an acquisition of the root rcu_node structure's ->lock with that of
this CPU's leaf rcu_node structure.

This decreases contention on the root rcu_node structure's ->lock.

Reported-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 10:29:57 -07:00
Paul E. McKenney
ec4eaccef4 rcu: Make rcu_migrate_callbacks wake GP kthread when needed
The rcu_migrate_callbacks() function invokes rcu_advance_cbs()
twice, ignoring the return value.  This is OK at present because of
failsafe code that does the wakeup when needed.  However, this failsafe
code acquires the root rcu_node structure's lock frequently, while
rcu_migrate_callbacks() does so only once per CPU-offline operation.

This commit therefore makes rcu_migrate_callbacks()
wake up the RCU GP kthread when either call to rcu_advance_cbs()
returns true, thus removing the need for the failsafe code.
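
A sketch of the resulting logic (not the exact hunk; rsp, rnp_root, rdp,
and my_rdp as used by rcu_migrate_callbacks()):

	needwake  = rcu_advance_cbs(rsp, rnp_root, rdp);
	needwake |= rcu_advance_cbs(rsp, rnp_root, my_rdp);
	/* ... merge and migrate the callbacks ... */
	if (needwake)
		rcu_gp_kthread_wake(rsp);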

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 10:29:51 -07:00
Paul E. McKenney
6f576e2816 rcu: Convert ->need_future_gp[] array to boolean
There is no longer any need for ->need_future_gp[] to count the number of
requests for future grace periods, so this commit converts the additions
to assignments to "true" and reduces the size of each element to one byte.
While we are in the area, fix an obsolete comment.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 10:29:46 -07:00
Paul E. McKenney
0ae94e00ce rcu: Make rcu_future_needs_gp() check all ->need_future_gps[] elements
Currently, the rcu_future_needs_gp() function checks only the current
element of the ->need_future_gps[] array, which might miss elements that
were offset from the expected element, for example, due to races with
the start or the end of a grace period.  This commit therefore makes
rcu_future_needs_gp() use the need_any_future_gp() macro to check all
of the elements of this array.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 10:29:41 -07:00
Paul E. McKenney
51af970d19 rcu: Avoid losing ->need_future_gp[] values due to GP start/end races
The rcu_cbs_completed() function provides the value of ->completed
at which new callbacks can safely be invoked.  This is recorded in
two-element ->need_future_gp[] arrays in the rcu_node structure, and
the elements of these arrays corresponding to the just-completed grace
period are zeroed at the end of that grace period.  However, the
rcu_cbs_completed() function can return the current ->completed value
plus either one or two, so it is possible for the corresponding
->need_future_gp[] entry to be cleared just after it was set, thus
losing a request for a future grace period.

This commit avoids this race by expanding ->need_future_gp[] to four
elements.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 10:29:33 -07:00
Paul E. McKenney
fb31340f8a rcu: Make rcu_gp_cleanup() more accurately predict need for new GP
Currently, rcu_gp_cleanup() scans the rcu_node tree in order to reset
state to reflect the end of the grace period.  It also checks to see
whether a new grace period is needed, but in a number of cases, rather
than directly cause the new grace period to be immediately started, it
instead leaves the grace-period-needed state where various fail-safes
can find it.  This works fine, but results in higher contention on the
root rcu_node structure's ->lock, which is undesirable, and contention
on that lock has recently become noticeable.

This commit therefore makes rcu_gp_cleanup() immediately start a new
grace period if there is any need for one.

It is quite possible that it will later be necessary to throttle the
grace-period rate, but that can be dealt with when and if.

Reported-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 10:29:28 -07:00
Paul E. McKenney
5fe0a56298 rcu: Make rcu_gp_kthread() check for early-boot activity
The rcu_gp_kthread() function immediately sleeps waiting to be notified
of the need for a new grace period, which currently works because there
are a number of code sequences that will provide the needed wakeup later.
However, some of these code sequences need to acquire the root rcu_node
structure's ->lock, and contention on that lock has started manifesting.
This commit therefore makes rcu_gp_kthread() check for early-boot activity
when it starts up, omitting the initial sleep in that case.

Reported-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 10:29:24 -07:00
Paul E. McKenney
c91a8675b9 rcu: Add accessor macros for the ->need_future_gp[] array
Accessors for the ->need_future_gp[] array are currently open-coded,
which makes them difficult to change.  To improve maintainability, this
commit adds need_future_gp_mask() to compute the indexing mask from the
array size, need_future_gp_element() to access the element corresponding
to the specified grace-period number, and need_any_future_gp() to
determine if any future grace period is needed.  This commit also applies
need_future_gp_element() to existing open-coded single-element accesses.
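
For illustration only (a sketch based on the description above, not the
patch verbatim), such accessors might look roughly like:

	#define need_future_gp_mask() \
		(ARRAY_SIZE(((struct rcu_node *)NULL)->need_future_gp) - 1)
	#define need_future_gp_element(rnp, c) \
		((rnp)->need_future_gp[(c) & need_future_gp_mask()])
	#define need_any_future_gp(rnp) \
	({ \
		int __i; \
		bool __nonzero = false; \
	\
		for (__i = 0; __i < ARRAY_SIZE((rnp)->need_future_gp); __i++) \
			__nonzero = __nonzero || (rnp)->need_future_gp[__i]; \
		__nonzero; \
	})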

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 10:29:18 -07:00
Paul E. McKenney
825a9911f6 rcu: Make rcu_start_future_gp()'s grace-period check more precise
The rcu_start_future_gp() function uses a sloppy check for a grace
period being in progress, which works today because there are a number
of code sequences that resolve the resulting races.  However, some of
these race-resolution code sequences must acquire the root rcu_node
structure's ->lock, and contention on that lock has started manifesting.
This commit therefore makes rcu_start_future_gp() check more precise,
eliminating the sloppy lockless check of the rcu_state structure's ->gpnum
and ->completed fields.  The effect is that rcu_start_future_gp() will
sometimes unnecessarily attempt to start a new grace period, but this
overhead will be reduced later using funnel locking.

Reported-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 10:29:13 -07:00
Paul E. McKenney
9036c2ffd5 rcu: Improve non-root rcu_cbs_completed() accuracy
When rcu_cbs_completed() is invoked on a non-root rcu_node structure,
it unconditionally assumes that two grace periods must complete before
the callbacks at hand can be invoked.  This is overly conservative because
if that non-root rcu_node structure believes that no grace period is in
progress, and if the corresponding rcu_state structure's ->gpnum field
has not yet been incremented, then these callbacks may safely be invoked
after only one grace period has completed.

This change is required to permit grace-period start requests to use
funnel locking, which is in turn permitted to reduce root rcu_node ->lock
contention, which has been observed by Nick Piggin.  Furthermore, such
contention will likely be increased by the merging of RCU-bh, RCU-preempt,
and RCU-sched, so it makes sense to take steps to decrease it.

This commit therefore improves the accuracy of rcu_cbs_completed() when
invoked on a non-root rcu_node structure as described above.

Reported-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 10:29:07 -07:00
Paul E. McKenney
5b4c11d54b rcu: Add leaf-node macros
This commit adds rcu_first_leaf_node() that returns a pointer to
the first leaf rcu_node structure in the specified RCU flavor and an
rcu_is_leaf_node() that returns true iff the specified rcu_node structure
is a leaf.  This commit also uses these macros where appropriate.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 10:28:07 -07:00
Paul E. McKenney
f7194ac32c srcu: Add cleanup_srcu_struct_quiesced()
The current cleanup_srcu_struct() flushes work, which prevents it
from being invoked from some workqueue contexts, as well as from
atomic (non-blocking) contexts.  This patch therefore introduces a
cleanup_srcu_struct_quiesced(), which can be invoked only after all
activity on the specified srcu_struct has completed.  This restriction
allows cleanup_srcu_struct_quiesced() to be invoked from workqueue
contexts as well as from atomic contexts.
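
A hedged usage sketch (my_srcu is a made-up srcu_struct, not taken from
the patch): quiesce from a context that can block, after which the
cleanup itself may run even from atomic context:

	/* From a sleepable context: wait for all pending call_srcu() callbacks. */
	srcu_barrier(&my_srcu);

	/* Later, once no further SRCU activity is possible, even from atomic context: */
	cleanup_srcu_struct_quiesced(&my_srcu);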

Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nitzan Carmi <nitzanc@mellanox.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 10:27:56 -07:00
Yury Norov
17672480fb rcu: Declare rcu_eqs_special_set() in public header
Because rcu_eqs_special_set() is declared only in internal header
kernel/rcu/tree.h and stubbed in include/linux/rcutiny.h, it is
inaccessible outside of the RCU implementation.  This patch therefore
moves the rcu_eqs_special_set() declaration to include/linux/rcutree.h,
which allows it to be used in non-rcu kernel code.

Signed-off-by: Yury Norov <ynorov@caviumnetworks.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 10:27:51 -07:00
Paul E. McKenney
265f5f28f0 rcu: Update rcu_bind_gp_kthread() header comment
The header comment for rcu_bind_gp_kthread() refers to sysidle, which
is no longer with us.  However, it is still important to bind RCU's
grace-period kthreads to the housekeeping CPU(s), so rather than remove
rcu_bind_gp_kthread(), this commit updates the comment.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 10:27:46 -07:00
Paul E. McKenney
0e5da22e3f rcu: Move __rcu_read_lock() and __rcu_read_unlock() to tree_plugin.h
The __rcu_read_lock() and __rcu_read_unlock() functions were moved
to kernel/rcu/update.c in order to implement tiny preemptible RCU.
However, tiny preemptible RCU was removed from the kernel a long time
ago, so this commit belatedly moves them back into the only remaining
preemptible-RCU code.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 10:27:41 -07:00
Paul E. McKenney
cee4393989 rcu: Rename cond_resched_rcu_qs() to cond_resched_tasks_rcu_qs()
Commit e31d28b6ab ("trace: Eliminate cond_resched_rcu_qs() in favor
of cond_resched()") substituted cond_resched() for the earlier call
to cond_resched_rcu_qs().  However, the new-age cond_resched() does
not do anything to help RCU-tasks grace periods because (1) RCU-tasks
is only enabled when CONFIG_PREEMPT=y and (2) cond_resched() is a
complete no-op when preemption is enabled.  This situation results
in hangs when running the trace benchmarks.

A number of potential fixes were discussed on LKML
(https://lkml.kernel.org/r/20180224151240.0d63a059@vmware.local.home),
including making cond_resched() not be a no-op; making cond_resched()
not be a no-op, but only when running tracing benchmarks; reverting
the aforementioned commit (which works because cond_resched_rcu_qs()
does provide an RCU-tasks quiescent state); and adding a call to the
scheduler/RCU rcu_note_voluntary_context_switch() function.  All were
deemed unsatisfactory, either due to added cond_resched() overhead or
due to magic functions inviting cargo culting.

This commit renames cond_resched_rcu_qs() to cond_resched_tasks_rcu_qs(),
which provides a clear hint as to what this function is doing and
why and where it should be used, and then replaces the call to
cond_resched() with cond_resched_tasks_rcu_qs() in the trace benchmark's
benchmark_event_kthread() function.
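
As a rough sketch (function and helper names are illustrative, not the
exact benchmark code), the resulting kthread loop looks something like:

	static int benchmark_event_kthread(void *arg)
	{
		while (!kthread_should_stop()) {
			do_one_benchmark_iteration();  /* hypothetical helper */
			/* Reschedule if needed and report an RCU-tasks quiescent state. */
			cond_resched_tasks_rcu_qs();
		}
		return 0;
	}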

Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 10:27:29 -07:00
Byungchul Park
6fba2b3767 rcu: Remove deprecated RCU debugfs tracing code
Commit ae91aa0adb ("rcu: Remove debugfs tracing") removed the
RCU debugfs tracing code, but did not remove the no-longer used
->exp_workdone{0,1,2,3} fields in the srcu_data structure.  This commit
therefore removes these fields along with the code that uselessly
updates them.

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 10:27:23 -07:00
Byungchul Park
efcd2d5436 rcu: Call wake_nocb_leader_defer() with 'FORCE' when nocb_q_count is high
If an excessive number of callbacks have been queued, but the NOCB
leader kthread's wakeup must be deferred, then we should wake up the
leader unconditionally once it is safe to do so.

This was handled correctly in commit fbce7497ee ("rcu: Parallelize and
economize NOCB kthread wakeups"), but then commit 8be6e1b15c ("rcu:
Use timer as backstop for NOCB deferred wakeups") passed RCU_NOCB_WAKE
instead of the correct RCU_NOCB_WAKE_FORCE to wake_nocb_leader_defer().
As an interesting aside, RCU_NOCB_WAKE_FORCE is never passed to anything,
which should have been taken as a hint.  ;-)

This commit therefore passes RCU_NOCB_WAKE_FORCE instead of RCU_NOCB_WAKE
to wake_nocb_leader_defer() when a callback is queued onto a NOCB CPU
that already has an excessive number of callbacks pending.

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 10:27:18 -07:00
Paul E. McKenney
ef12620626 rcu: Don't allocate rcu_nocb_mask if no one needs it
Commit 44c65ff2e3 ("rcu: Eliminate NOCBs CPU-state Kconfig options")
made allocation of rcu_nocb_mask depend only on the rcu_nocbs=,
nohz_full=, or isolcpus= kernel boot parameters.  However, it failed
to change the initial value of rcu_init_nohz()'s local variable
need_rcu_nocb_mask to false, which can result in useless allocation
of an all-zero rcu_nocb_mask.  This commit therefore fixes this bug by
changing the initial value of need_rcu_nocb_mask to false.

While we are in the area, also correct the error message that is printed
when someone specifies that can-never-exist CPUs should be NOCBs CPUs.

Reported-by: Byungchul Park <byungchul.park@lge.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Byungchul Park <byungchul.park@lge.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 10:27:11 -07:00
Byungchul Park
be01b4cab1 rcu: Inline rcu_preempt_do_callback() into its sole caller
The rcu_preempt_do_callbacks() function was introduced in commit
09223371dea ("rcu: Use softirq to address performance regression"), where it
was necessary to handle kernel builds both containing and not containing
RCU-preempt.  Since then, various changes (most notably f8b7fc6b51
("rcu: use softirq instead of kthreads except when RCU_BOOST=y")) have
resulted in this function being invoked only from rcu_kthread_do_work(),
which is present only in kernels containing RCU-preempt, which in turn
means that the rcu_preempt_do_callbacks() function is no longer needed.

This commit therefore inlines rcu_preempt_do_callbacks() into its
sole remaining caller and also removes the rcu_state_p and rcu_data_p
indirection for added clarity.

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
[ paulmck: Remove the rcu_state_p and rcu_data_p indirection. ]
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 10:27:00 -07:00
Boqun Feng
55ebfce060 rcu: exp: Protect all sync_rcu_preempt_exp_done() with rcu_node lock
Currently, some callers of sync_rcu_preempt_exp_done() do not hold the
corresponding rcu_node's ->lock, which could introduce bugs, as Paul
explains:

o	CPU 0 in sync_rcu_preempt_exp_done() reads ->exp_tasks and
	sees that it is NULL.

o	CPU 1 blocks within an RCU read-side critical section, so
	it enqueues the task and points ->exp_tasks at it and
	clears CPU 1's bit in ->expmask.

o	All other CPUs clear their bits in ->expmask.

o	CPU 0 reads ->expmask, sees that it is zero, so incorrectly
	concludes that all quiescent states have completed, despite
	the fact that ->exp_tasks is non-NULL.

To fix this, this commit introduces sync_rcu_preempt_exp_unlocked() to
replace the lockless calls to sync_rcu_preempt_exp_done().

Further, a lockdep annotation is added to sync_rcu_preempt_exp_done()
to prevent misuse in the future.
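
A minimal sketch of such a locked wrapper (assuming the usual rcu_node
locking helpers; not the patch verbatim):

	static bool sync_rcu_preempt_exp_unlocked(struct rcu_node *rnp)
	{
		unsigned long flags;
		bool ret;

		raw_spin_lock_irqsave_rcu_node(rnp, flags);
		ret = sync_rcu_preempt_exp_done(rnp);
		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);

		return ret;
	}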

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 10:26:07 -07:00
Boqun Feng
7be8c56f8f rcu: exp: Fix "must hold exp_mutex" comments for QS reporting functions
Since commit d9a3da0699 ("rcu: Add expedited grace-period support
for preemptible RCU"), there are comments for some funtions in
rcu_report_exp_rnp()'s call-chain saying that exp_mutex or its
predecessors needs to be held.

However, exp_mutex and its predecessors were used only to synchronize
between GPs, and it is clear that all variables visited by those functions
are under the protection of rcu_node's ->lock. Moreover, those functions
are currently called without held exp_mutex, and seems that doesn't
introduce any trouble.

So this patch fixes this problem by updating the comments to match the
current code.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Fixes: d9a3da0699 ("rcu: Add expedited grace-period support for preemptible RCU")
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 10:26:01 -07:00
Paul E. McKenney
25f3d7effa rcu: Parallelize expedited grace-period initialization
The latency of RCU expedited grace periods grows with increasing numbers
of CPUs, eventually failing to be all that expedited.  Much of the growth
in latency is in the initialization phase, so this commit uses workqueues
to carry out this initialization concurrently on a rcu_node-by-rcu_node
basis.

This change makes use of a new rcu_par_gp_wq because flushing a work
item from another work item running from the same workqueue can result
in deadlock.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 10:25:44 -07:00
Paul E. McKenney
338c46403f Merge branches 'fixes.2018.02.23a', 'srcu.2018.02.20a' and 'torture.2018.02.20a' into HEAD
fixes.2018.02.23a: Miscellaneous fixes
srcu.2018.02.20a: SRCU updates
torture.2018.02.20a: Torture-test updates
2018-02-23 15:15:41 -08:00
Paul E. McKenney
ad7c946b35 rcu: Create RCU-specific workqueues with rescuers
RCU's expedited grace periods can participate in out-of-memory deadlocks
due to all available system_wq kthreads being blocked and there not being
memory available to create more.  This commit prevents such deadlocks
by allocating an RCU-specific workqueue_struct at early boot time, and
providing it with a rescuer to ensure forward progress.  This uses the
shiny new init_rescuer() function provided by Tejun (but indirectly).

This commit also causes SRCU to use this new RCU-specific
workqueue_struct.  Note that SRCU's use of workqueues never blocks them
waiting for readers, so this should be safe from a forward-progress
viewpoint.  Note that this moves SRCU from system_power_efficient_wq
to a normal workqueue.  In the unlikely event that this results in
measurable degradation, a separate power-efficient workqueue will be
created for SRCU.

Reported-by: Prateek Sood <prsood@codeaurora.org>
Reported-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Tejun Heo <tj@kernel.org>
2018-02-23 15:14:40 -08:00
Paul E. McKenney
85ba6bfe8b torture: Provide more sensible nreader/nwriter defaults for rcuperf
The default values for nreader and nwriter are apparently not all that
user-friendly, resulting in people doing scalability tests that ran all
runs at large scale.  This commit therefore makes both the nreaders and
nwriters module parameters default to the number of CPUs, and adds a comment to
rcuperf.c stating that the number of CPUs should be specified using the
nr_cpus kernel boot parameter.  This commit also eliminates the redundant
rcuperf scripting specification of default values for these parameters.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-02-20 16:22:01 -08:00
Paul E. McKenney
db0c1a8aba rcutorture: Record which grace-period primitives are tested
The rcu_torture_writer() function adapts to requested testing from module
parameters as well as the function pointers in the structure referenced
by cur_ops.  However, as long as the module parameters do not conflict
with the function pointers, this adaptation is silent.  This silence can
result in confusion as to exactly what was tested, which could in turn
result in untested RCU code making its way into mainline.

This commit therefore makes rcu_torture_writer() announce exactly which
portions of RCU's API it ends up testing.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-02-20 16:21:58 -08:00
Paul E. McKenney
f7c0e6ad4b rcutorture: Re-enable testing of dynamic expediting
During boot, normal grace periods are processed as expedited.  When
rcutorture is built into the kernel, it starts during boot and thus
detects that normal grace periods are unconditionally expedited.
Therefore, rcutorture concludes that there is no point in trying
to dynamically enable expediting, so it disables this aspect of testing,
which is a bit of an overreaction to the temporary boot-time expediting.

This commit therefore rechecks forced expediting throughout the test,
enabling dynamic expediting if normal grace periods are processed
normally at any point.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-02-20 16:21:57 -08:00
Paul E. McKenney
eb0339934f rcutorture: Avoid fake-writer use of undefined primitives
Currently the rcu_torture_fakewriter() function invokes cur_ops->sync()
and cur_ops->exp_sync() without first checking to see if they are in
fact non-NULL.  This results in kernel NULL pointer dereferences when
testing RCU implementations that choose not to provide the full set of
primitives.  Given that it is perfectly reasonable to have specialized
RCU implementations that provide only a subset of the RCU API, this is
a bug in rcutorture.

This commit therefore makes rcu_torture_fakewriter() check function
pointers before invoking them, thus allowing it to test subsetted
RCU implementations.
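
The conversion pattern is essentially the following guard (sketch):

	if (cur_ops->sync)
		cur_ops->sync();
	if (cur_ops->exp_sync)
		cur_ops->exp_sync();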

Reported-by: Lihao Liang <lianglihao@huawei.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-02-20 16:21:56 -08:00
Paul E. McKenney
e0d31a34c6 rcutorture: Abstract function and module names
This commit moves to __func__ for function names and to KBUILD_MODNAME
for module names, all in the name of better resilience to change.
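
For example (an illustrative message, not a specific line from the patch):

	pr_alert("rcutorture: rcu_torture_writer task started\n");     /* before */
	pr_alert("%s: %s task started\n", KBUILD_MODNAME, __func__);   /* after */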

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-02-20 16:21:56 -08:00
Paul E. McKenney
68a675d433 rcutorture: Replace multi-instance kzalloc() with kcalloc()
This commit replaces array-allocation calls to kzalloc() with
equivalent calls to kcalloc().
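
The conversion pattern looks roughly like this (illustrative variable names):

	/* Before: open-coded multiplication. */
	reader_tasks = kzalloc(nrealreaders * sizeof(reader_tasks[0]), GFP_KERNEL);

	/* After: kcalloc() checks for multiplication overflow and zeroes the array. */
	reader_tasks = kcalloc(nrealreaders, sizeof(reader_tasks[0]), GFP_KERNEL);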

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-02-20 16:21:55 -08:00
Paul E. McKenney
6308f34775 rcu: Remove SRCU throttling
The code in srcu_gp_end() inserts a delay every 0x3ff grace periods in
order to prevent SRCU grace-period work from consuming an entire CPU
when there is a long sequence of expedited SRCU grace-period requests.
However, all of SRCU's grace-period work is carried out in workqueues,
which are in turn within kthreads, which are automatically throttled as
needed by the scheduler.  In particular, if there is plenty of idle time,
there is no point in throttling.

This commit therefore removes the expedited SRCU grace-period throttling.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-02-20 16:21:13 -08:00
Byungchul Park
a72da917f1 srcu: Remove dead code in srcu_gp_end()
Of course, compilers will optimize out dead code.  Remove it anyway
for better readability.

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-02-20 16:21:12 -08:00
Ildar Ismagilov
8ddbd8832d srcu: Reduce scans of srcu_data in counter wrap check
Currently, given a multi-level srcu_node tree, SRCU can scan the full
set of srcu_data structures at each level when cleaning up after a grace
period.  This, though harmless otherwise, represents pointless overhead.
This commit therefore eliminates this overhead by scanning the srcu_data
structures only when traversing the leaf srcu_node structures.

Signed-off-by: Ildar Ismagilov <devix84@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-02-20 16:21:12 -08:00
Ildar Ismagilov
a35d13ec36 srcu: Prevent sdp->srcu_gp_seq_needed_exp counter wrap
SRCU checks each srcu_data structure's grace-period number for counter
wrap four times per cycle by default.  This frequency guarantees that
normal comparisons will detect potential wrap.  However, the expedited
grace-period number is not checked.  The consequences are not too horrible
(a failure to expedite a grace period when requested), but it would be
good to avoid such things.  This commit therefore adds this check to
the expedited grace-period number.

Signed-off-by: Ildar Ismagilov <devix84@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-02-20 16:21:11 -08:00
Paul E. McKenney
cb4081cd4e srcu: Abstract function name
This commit moves to __func__ for function names in the name of better
resilience to change.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-02-20 16:21:11 -08:00
Paul E. McKenney
65963d2461 rcu: Make expedited RCU CPU selection avoid unnecessary stores
This commit reworks the first loop in sync_rcu_exp_select_cpus()
to avoid doing unnecessary stores to other CPUs' rcu_data
structures.  This speeds up that first loop by roughly a factor of
two on an old x86 system.  In the case where the system is mostly
idle, this loop incurs a large fraction of the overhead of the
synchronize_rcu_expedited().  There is less benefit on busy systems
because the overhead of the smp_call_function_single() in the second
loop dominates in that case.

However, it is not unusual to make configuration changes involving
RCU grace periods (both expedited and normal) while the system is
mostly idle, so this optimization is worth doing.

While we are in the area, this commit also adds parentheses to arguments
used by the for_each_leaf_node_possible_cpu() macro.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-02-20 16:12:29 -08:00
Paul E. McKenney
7f5d42d051 rcu: Trace expedited GP delays due to transitioning CPUs
If a CPU is transitioning to or from offline state, an expedited
grace period may undergo a timed wait.  This timed wait can unduly
delay grace periods, so this commit adds a trace statement to make
it visible.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-02-20 16:12:28 -08:00
Paul E. McKenney
9a414201ae rcu: Add more tracing of expedited grace periods
This commit adds more tracing of expedited grace periods to enable
improved debugging of slowdowns.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-02-20 16:12:27 -08:00
Ildar Ismagilov
274afd6bfa rcu: Fix misprint in srcu_funnel_exp_start
The srcu_funnel_exp_start() function checks to see if the srcu_struct
structure's expedited grace period counter needs updating to reflect a
newly arrived request for an expedited SRCU grace period.  Unfortunately,
the check is backwards, so this commit reverses the sense of the test.

Signed-off-by: Ildar Ismagilov <devix84@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-02-20 16:12:27 -08:00
Matthew Wilcox
a32e01ee68 rcu: Use wrapper for lockdep asserts
Commits c0b334c5bf and ea9b0c8a26 introduced new sparse warnings
by accessing rcu_node->lock directly and ignoring the __private
marker.  Introduce a new wrapper and use it.  Also fix a similar problem
in srcutree.c introduced by a3883df393.
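
Such a wrapper might look roughly like this (a sketch assuming the
__private lock field and the ACCESS_PRIVATE() helper):

	#define raw_lockdep_assert_held_rcu_node(p) \
		lockdep_assert_held(&ACCESS_PRIVATE(p, lock))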

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-02-20 16:12:26 -08:00
Liu, Changcheng
65518db86b rcu: Remove redundant nxttail index macro define
RCU's nxttail has been optimized to be a rcu_segcblist, which is
a multi-tailed linked list with macros defined for the indexes for
each tail.  The indexes have been defined in linux/rcu_segcblist.h,
so this commit removes the redundant definitions in kernel/rcu/tree.h.

Signed-off-by: Liu Changcheng <changcheng.liu@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-02-20 16:10:31 -08:00
Paul E. McKenney
bfbd767d4d rcu: Consolidate rcu.h #ifdefs
The kernel/rcu/rcu.h file has a pair of consecutive #ifdefs on
CONFIG_TINY_RCU, so this commit consolidates them, thus saving a few
lines of code.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-02-20 16:10:30 -08:00
Paul E. McKenney
d07aee2c03 rcu: More clearly identify grace-period kthread stack dump
It is not always obvious whether a given stack dump is from a starved
grace-period kthread or from a CPU stalling the current grace period.
This commit therefore adds a pr_err() flagging these dumps.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-02-20 16:10:29 -08:00
Paul E. McKenney
d62df57370 rcu: Remove obsolete force-quiescent-state statistics for debugfs
The debugfs interface displayed statistics on RCU-pending checks but
this interface has since been removed.  This commit therefore removes the
no-longer-used rcu_state structure's ->n_force_qs_lh and ->n_force_qs_ngp
fields along with their updates.  (Though the ->n_force_qs_ngp field
was actually not used at all, embarrassingly enough.)

If this information proves necessary in the future, the corresponding
event traces will be added.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-02-20 16:10:29 -08:00
Paul E. McKenney
01c495f72a rcu: Remove obsolete __rcu_pending() statistics for debugfs
The debugfs interface displayed statistics on RCU-pending checks
but this interface has since been removed.  This commit therefore
removes the no-longer-used rcu_data structure's ->n_rcu_pending,
->n_rp_core_needs_qs, ->n_rp_report_qs, ->n_rp_cb_ready,
->n_rp_cpu_needs_gp, ->n_rp_gp_completed, ->n_rp_gp_started,
->n_rp_nocb_defer_wakeup, and ->n_rp_need_nothing fields along with
their updates.

If this information proves necessary in the future, the corresponding
event traces will be added.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-02-20 16:10:28 -08:00
Paul E. McKenney
62df63e048 rcu: Remove obsolete callback-invocation statistics for debugfs
The debugfs interface displayed statistics on RCU callback invocation but
this interface has since been removed.  This commit therefore removes the
no-longer-used rcu_data structure's ->n_cbs_invoked and ->n_nocbs_invoked
fields along with their updates.

If this information proves necessary in the future, the corresponding
event traces will be added.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-02-20 16:10:27 -08:00
Paul E. McKenney
bec06785fe rcu: Remove obsolete boost statistics for debugfs
The debugfs interface displayed statistics on RCU priority boosting,
but this interface has since been removed.  This commit therefore
removes the no-longer-used rcu_data structure's ->n_tasks_boosted,
->n_exp_boosts, and ->n_normal_boosts fields and their updates.

If this information proves necessary in the future, the corresponding
event traces will be added.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-02-20 16:10:27 -08:00
Tejun Heo
3caa973b7a rcu: Call touch_nmi_watchdog() while printing stall warnings
When RCU stall warning triggers, it can print out a lot of messages
while holding spinlocks.  If the console device is slow (e.g. an
actual or IPMI serial console), it may end up triggering NMI hard
lockup watchdog like the following.

*** CPU printking while holding RCU spinlock

  PID: 4149739  TASK: ffff881a46baa880  CPU: 13  COMMAND: "CPUThreadPool8"
   #0 [ffff881fff945e48] crash_nmi_callback at ffffffff8103f7d0
   #1 [ffff881fff945e58] nmi_handle at ffffffff81020653
   #2 [ffff881fff945eb0] default_do_nmi at ffffffff81020c36
   #3 [ffff881fff945ed0] do_nmi at ffffffff81020d32
   #4 [ffff881fff945ef0] end_repeat_nmi at ffffffff81956a7e
      [exception RIP: io_serial_in+21]
      RIP: ffffffff81630e55  RSP: ffff881fff943b88  RFLAGS: 00000002
      RAX: 000000000000ca00  RBX: ffffffff8230e188  RCX: 0000000000000000
      RDX: 00000000000002fd  RSI: 0000000000000005  RDI: ffffffff8230e188
      RBP: ffff881fff943bb0   R8: 0000000000000000   R9: ffffffff820cb3c4
      R10: 0000000000000019  R11: 0000000000002000  R12: 00000000000026e1
      R13: 0000000000000020  R14: ffffffff820cd398  R15: 0000000000000035
      ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
  --- <NMI exception stack> ---
   #5 [ffff881fff943b88] io_serial_in at ffffffff81630e55
   #6 [ffff881fff943b90] wait_for_xmitr at ffffffff8163175c
   #7 [ffff881fff943bb8] serial8250_console_putchar at ffffffff816317dc
   #8 [ffff881fff943bd8] uart_console_write at ffffffff8162ac00
   #9 [ffff881fff943c08] serial8250_console_write at ffffffff81634691
  #10 [ffff881fff943c80] univ8250_console_write at ffffffff8162f7c2
  #11 [ffff881fff943c90] console_unlock at ffffffff810dfc55
  #12 [ffff881fff943cf0] vprintk_emit at ffffffff810dffb5
  #13 [ffff881fff943d50] vprintk_default at ffffffff810e01bf
  #14 [ffff881fff943d60] vprintk_func at ffffffff810e1127
  #15 [ffff881fff943d70] printk at ffffffff8119a8a4
  #16 [ffff881fff943dd0] print_cpu_stall_info at ffffffff810eb78c
  #17 [ffff881fff943e88] rcu_check_callbacks at ffffffff810ef133
  #18 [ffff881fff943ee8] update_process_times at ffffffff810f3497
  #19 [ffff881fff943f10] tick_sched_timer at ffffffff81103037
  #20 [ffff881fff943f38] __hrtimer_run_queues at ffffffff810f3f38
  #21 [ffff881fff943f88] hrtimer_interrupt at ffffffff810f442b

*** CPU triggering the hardlockup watchdog

  PID: 4149709  TASK: ffff88010f88c380  CPU: 26  COMMAND: "CPUThreadPool35"
   #0 [ffff883fff1059d0] machine_kexec at ffffffff8104a874
   #1 [ffff883fff105a30] __crash_kexec at ffffffff811116cc
   #2 [ffff883fff105af0] __crash_kexec at ffffffff81111795
   #3 [ffff883fff105b08] panic at ffffffff8119a6ae
   #4 [ffff883fff105b98] watchdog_overflow_callback at ffffffff81135dbd
   #5 [ffff883fff105bb0] __perf_event_overflow at ffffffff81186866
   #6 [ffff883fff105be8] perf_event_overflow at ffffffff81192bc4
   #7 [ffff883fff105bf8] intel_pmu_handle_irq at ffffffff8100b265
   #8 [ffff883fff105df8] perf_event_nmi_handler at ffffffff8100489f
   #9 [ffff883fff105e58] nmi_handle at ffffffff81020653
  #10 [ffff883fff105eb0] default_do_nmi at ffffffff81020b94
  #11 [ffff883fff105ed0] do_nmi at ffffffff81020d32
  #12 [ffff883fff105ef0] end_repeat_nmi at ffffffff81956a7e
      [exception RIP: queued_spin_lock_slowpath+248]
      RIP: ffffffff810da958  RSP: ffff883fff103e68  RFLAGS: 00000046
      RAX: 0000000000000000  RBX: 0000000000000046  RCX: 00000000006d0000
      RDX: ffff883fff49a950  RSI: 0000000000d10101  RDI: ffffffff81e54300
      RBP: ffff883fff103e80   R8: ffff883fff11a950   R9: 0000000000000000
      R10: 000000000e5873ba  R11: 000000000000010f  R12: ffffffff81e54300
      R13: 0000000000000000  R14: ffff88010f88c380  R15: ffffffff81e54300
      ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
  --- <NMI exception stack> ---
  #13 [ffff883fff103e68] queued_spin_lock_slowpath at ffffffff810da958
  #14 [ffff883fff103e70] _raw_spin_lock_irqsave at ffffffff8195550b
  #15 [ffff883fff103e88] rcu_check_callbacks at ffffffff810eed18
  #16 [ffff883fff103ee8] update_process_times at ffffffff810f3497
  #17 [ffff883fff103f10] tick_sched_timer at ffffffff81103037
  #18 [ffff883fff103f38] __hrtimer_run_queues at ffffffff810f3f38
  #19 [ffff883fff103f88] hrtimer_interrupt at ffffffff810f442b
  --- <IRQ stack> ---

Avoid spuriously triggering NMI hardlockup watchdog by touching it
from the print functions.  show_state_filter() shares the same problem
and solution.
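
The fix is essentially of this shape (a sketch, not the exact hunk;
rnp, rsp, and cpu as in the stall-warning code):

	for_each_leaf_node_possible_cpu(rnp, cpu) {
		touch_nmi_watchdog();          /* console output here can be very slow */
		print_cpu_stall_info(rsp, cpu);
	}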

v2: Relocate the comment to where it belongs.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-02-20 16:10:26 -08:00
Paul E. McKenney
3016611eed rcu: Fix CPU offload boot message when no CPUs are offloaded
In CONFIG_RCU_NOCB_CPU=y kernels, if the boot parameters indicate that
none of the CPUs should in fact be offloaded, the following somewhat
obtuse message appears:

	Offload RCU callbacks from CPUs: .

This commit therefore makes the message at least grammatically correct
in this case:

	Offload RCU callbacks from CPUs: (none)

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-02-20 16:10:19 -08:00
Lihao Liang
398953e62c rcu: Remove unnecessary spinlock in rcu_boot_init_percpu_data()
Since rcu_boot_init_percpu_data() is only called at boot time,
there is no data race and spinlock is not needed.

Signed-off-by: Lihao Liang <lianglihao@huawei.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-02-15 15:40:36 -08:00
Linus Torvalds
28bc6fb959 SCSI misc on 20180131
This is mostly updates of the usual driver suspects: arcmsr,
 scsi_debug, mpt3sas, lpfc, cxlflash, qla2xxx, aacraid, megaraid_sas,
 hisi_sas.  We also have a rework of the libsas hotplug handling to
 make it more robust, a slew of 32 bit time conversions and fixes, and
 a host of the usual minor updates and style changes.  The biggest
 potential for regressions is the libsas hotplug changes, but so far
 they seem stable under testing.
 
 Signed-off-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
 -----BEGIN PGP SIGNATURE-----
 
 iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCWnH+5SYcamFtZXMuYm90
 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishWxuAP0UvuJp
 MNR/yU/wv/emSzOc48Ldwd7I0xD2XxSnloGUgwD+IGZZT5yNUQA1THCbm+en4hkB
 WvyBieQs9qRit+2czd4=
 =gJMf
 -----END PGP SIGNATURE-----

Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI updates from James Bottomley:
 "This is mostly updates of the usual driver suspects: arcmsr,
  scsi_debug, mpt3sas, lpfc, cxlflash, qla2xxx, aacraid, megaraid_sas,
  hisi_sas.

  We also have a rework of the libsas hotplug handling to make it more
  robust, a slew of 32 bit time conversions and fixes, and a host of the
  usual minor updates and style changes. The biggest potential for
  regressions is the libsas hotplug changes, but so far they seem stable
  under testing"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (313 commits)
  scsi: qla2xxx: Fix logo flag for qlt_free_session_done()
  scsi: arcmsr: avoid do_gettimeofday
  scsi: core: Add VENDOR_SPECIFIC sense code definitions
  scsi: qedi: Drop cqe response during connection recovery
  scsi: fas216: fix sense buffer initialization
  scsi: ibmvfc: Remove unneeded semicolons
  scsi: hisi_sas: fix a bug in hisi_sas_dev_gone()
  scsi: hisi_sas: directly attached disk LED feature for v2 hw
  scsi: hisi_sas: devicetree: bindings: add LED feature for v2 hw
  scsi: megaraid_sas: NVMe passthrough command support
  scsi: megaraid: use ktime_get_real for firmware time
  scsi: fnic: use 64-bit timestamps
  scsi: qedf: Fix error return code in __qedf_probe()
  scsi: devinfo: fix format of the device list
  scsi: qla2xxx: Update driver version to 10.00.00.05-k
  scsi: qla2xxx: Add XCB counters to debugfs
  scsi: qla2xxx: Fix queue ID for async abort with Multiqueue
  scsi: qla2xxx: Fix warning for code intentation in __qla24xx_handle_gpdb_event()
  scsi: qla2xxx: Fix warning during port_name debug print
  scsi: qla2xxx: Fix warning in qla2x00_async_iocb_timeout()
  ...
2018-01-31 11:23:28 -08:00
Paul E. McKenney
1dfa55e019 Merge branches 'cond_resched.2017.12.04a', 'dyntick.2017.11.28a', 'fixes.2017.12.11a', 'srbd.2017.12.05a' and 'torture.2017.12.11a' into HEAD
cond_resched.2017.12.04a: Convert cond_resched_rcu_qs() to cond_resched()
dyntick.2017.11.28a: Make RCU dynticks handle interrupts from NMI
fixes.2017.12.11a: Miscellaneous fixes
srbd.2017.12.05a: Remove now-redundant smp_read_barrier_depends()
torture.2017.12.11a: Torture-testing update
2017-12-11 09:21:58 -08:00
Paul E. McKenney
a2f2577d96 torture: Eliminate torture_runnable and perf_runnable
The purpose of torture_runnable is to allow rcutorture and locktorture
to be started and stopped via sysfs when they are built into the kernel
(as in not compiled as loadable modules).  However, the 0444 permissions
for both instances of torture_runnable prevent this use case from ever
being put into practice.  Given that there have been no complaints
about this deficiency, it is reasonable to conclude that no one actually
makes use of this sysfs capability.  The perf_runnable module parameter
for rcuperf is in the same situation.

This commit therefore removes both torture_runnable instances as well
as perf_runnable.

Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-12-11 09:18:29 -08:00
Paul E. McKenney
e8302739aa rcutorture: Preempt RCU-preempt readers more vigorously
This commit attempts to make a very rare rcutorture failure happen
more often by increasing the fraction of RCU-preempt read-side critical
sections that are preempted.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-12-11 09:18:22 -08:00
Paul E. McKenney
cc1321c96f torture: Reduce #ifdefs for preempt_schedule()
This commit adds a torture_preempt_schedule() that is nothingness
in !PREEMPT builds and is preempt_schedule() otherwise.  Then
torture_preempt_schedule() is used to eliminate several ugly #ifdefs,
both in rcutorture and in locktorture.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-12-11 09:18:22 -08:00
Rakib Mullick
84b12b752f rcu: Remove have_rcu_nocb_mask from tree_plugin.h
Currently have_rcu_nocb_mask is used to avoid double allocation of
rcu_nocb_mask during boot up.  Due to the differing representations of
cpumask_var_t under different kernel configurations (CONFIG_CPUMASK_OFFSTACK=y
or not), this extra flag was needed.  But now we have the helper
cpumask_available(), which can be used to check whether rcu_nocb_mask has
been allocated without any extra variable.
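
The check then becomes something like the following (sketch):

	if (!cpumask_available(rcu_nocb_mask)) {
		if (!zalloc_cpumask_var(&rcu_nocb_mask, GFP_KERNEL)) {
			pr_info("rcu_nocb_mask allocation failed, callback offloading disabled.\n");
			return;
		}
	}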

Removing the variable also reduces vmlinux size.

Unpatched version:
text	   data	    bss	    dec	    hex	filename
13050393	7852470	14543408	35446271	21cddff	vmlinux

Patched version:
 text	   data	    bss	    dec	    hex	filename
13050390	7852438	14543408	35446236	21cdddc	vmlinux

Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-12-11 09:17:40 -08:00
Paul E. McKenney
efd88b02bb rcu: Add comment giving debug strategy for double call_rcu()
The following statement has for some reason proven non-intuitive:

	WARN_ON_ONCE(rcu_segcblist_empty(&rdp->cblist) != (count == 0));

This commit therefore adds a comment that states that this warning
usually triggers in response to a double call_rcu(), which is sort
of like a double free.  The comment also suggests building with
CONFIG_DEBUG_OBJECTS_RCU_HEAD=y to track down the double call_rcu().

Reported-by: David Howells <dhowells@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-12-11 09:17:39 -08:00
Paul E. McKenney
156baec397 rcu: Export init_rcu_head() and destroy_rcu_head() to GPL modules
Use of init_rcu_head() and destroy_rcu_head() from modules results in
the following build-time error with CONFIG_DEBUG_OBJECTS_RCU_HEAD=y:

	ERROR: "init_rcu_head" [drivers/scsi/scsi_mod.ko] undefined!
	ERROR: "destroy_rcu_head" [drivers/scsi/scsi_mod.ko] undefined!

This commit therefore adds EXPORT_SYMBOL_GPL() for each to allow them to
be used by GPL-licensed kernel modules.
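
The fix amounts to exporting the two helpers (sketch):

	EXPORT_SYMBOL_GPL(init_rcu_head);
	EXPORT_SYMBOL_GPL(destroy_rcu_head);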

Reported-by: Bart Van Assche <Bart.VanAssche@wdc.com>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-12-07 19:51:49 -05:00
Paul E. McKenney
d633198088 srcu: Prohibit call_srcu() use under raw spinlocks
Invoking queue_delayed_work() while holding a raw spinlock is forbidden
in -rt kernels, which is exactly what __call_srcu() does, indirectly via
srcu_funnel_gp_start().  This commit therefore downgrades Tree SRCU's
locking from raw to non-raw spinlocks, which works because call_srcu()
is never called while holding a raw spinlock.

Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-11-28 15:52:33 -08:00
Paul E. McKenney
e68bbb266d rcu: Simplify rcu_eqs_{enter,exit}() non-idle task debug code
The code that checks for non-idle non-nohz_idle-usermode tasks invoking
rcu_eqs_enter() and rcu_eqs_exit() prints a considerable quantity of
helpful information.  However, these checks fire rarely, so the extra
complexity is no longer worth it.  This commit therefore replaces this
debug code with simple WARN_ON_ONCE() statements.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-11-28 15:51:21 -08:00
Paul E. McKenney
9dd238e286 rcu: Fold rcu_eqs_exit_common() into rcu_eqs_exit()
There is now only one call to rcu_eqs_exit_common() and there is no other
reason to keep it separate.  This commit therefore inlines it into its
sole call site, saving a few lines of code in the process.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-11-28 15:51:21 -08:00
Paul E. McKenney
215bba9f59 rcu: Fold rcu_eqs_enter_common() into rcu_eqs_enter()
There is now only one call to rcu_eqs_enter_common() and there is no other
reason to keep it separate.  This commit therefore inlines it into its
sole call site, saving a few lines of code in the process.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-11-28 15:51:20 -08:00
Paul E. McKenney
2342172fd6 rcu: Avoid ->dynticks_nesting store tearing
Although ->dynticks_nesting is updated only by process level, it is
accessed from hardirq to check for interrupt-from-idle quiescent states.
Store tearing is thus possible, so this commit applies WRITE_ONCE()
to ->dynticks_nesting stores.
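
The conversion pattern is (illustrative, with rdtp being the usual
rcu_dynticks pointer):

	rdtp->dynticks_nesting = 1;             /* before: plain store, may tear */
	WRITE_ONCE(rdtp->dynticks_nesting, 1);  /* after: store tearing prevented */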

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-11-28 15:51:20 -08:00
Paul E. McKenney
914955e18c rcu: Stop duplicating lockdep checks in RCU's idle-entry code
The three RCU_LOCKDEP_WARN() calls in rcu_eqs_enter_common() are
redundant with other lockdep checks, so this commit removes them.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-11-28 15:51:19 -08:00
Paul E. McKenney
dec98900ea rcu: Add ->dynticks field to rcu_dyntick trace event
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-11-28 15:51:19 -08:00
Paul E. McKenney
84585aa8b6 rcu: Shrink ->dynticks_{nmi_,}nesting from long long to long
Because the ->dynticks_nesting field now only contains the process-based
nesting level instead of a value encoding both the process nesting level
and the irq "nesting" level, we no longer need a long long, even on
32-bit systems.  This commit therefore changes both the ->dynticks_nesting
and ->dynticks_nmi_nesting fields to long.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-11-28 15:51:18 -08:00
Paul E. McKenney
bd2b879a1c rcu: Add tracing to irq/NMI dyntick-idle transitions
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-11-28 15:51:18 -08:00
Paul E. McKenney
844ccdd7dc rcu: Eliminate rcu_irq_enter_disabled()
Now that the irq path uses the rcu_nmi_{enter,exit}() algorithm,
rcu_irq_enter() and rcu_irq_exit() may be used from any context.  There is
thus no need for rcu_irq_enter_disabled() and for the checks using it.
This commit therefore eliminates rcu_irq_enter_disabled().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-11-27 08:42:03 -08:00
Paul E. McKenney
51a1fd30f1 rcu: Make ->dynticks_nesting be a simple counter
Now that ->dynticks_nesting counts only process-level dyntick-idle
entry and exit, there is no need for the elaborate segmented counter
with its guard fields and overflow checking.  This commit therefore
makes ->dynticks_nesting be a simple counter.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-11-27 08:42:03 -08:00
Paul E. McKenney
58721f5da4 rcu: Define rcu_irq_{enter,exit}() in terms of rcu_nmi_{enter,exit}()
RCU currently uses two different mechanisms for tracking irqs and NMIs.
This is unnecessary complexity: Given that NMIs can nest and given that
RCU's tracking handles such nesting, the NMI tracking mechanism can also
be used to track irqs.  This commit therefore defines rcu_irq_enter()
in terms of rcu_nmi_enter() and rcu_irq_exit() in terms of rcu_nmi_exit().

Unfortunately, callers must still distinguish between the irq and NMI
functions because additional actions are taken when an irq interrupts
idle or nohz_full usermode execution, and these actions cannot always
be taken from NMI handlers.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-11-27 08:42:03 -08:00
Paul E. McKenney
6136d6e48a rcu: Clamp ->dynticks_nmi_nesting at eqs entry/exit
In preparation for merging dyntick-idle irq handling into the NMI
algorithm, clamp ->dynticks_nmi_nesting value to allow for interrupts
that enter but never leave and vice versa.

It is important that the clamping happen outside of the extended quiescent
state.  Otherwise, there will be short windows where irqs and NMIs fail
to convince RCU to start watching.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-11-27 08:40:10 -08:00
Paul E. McKenney
fd581a91ac rcu: Move rcu_nmi_{enter,exit}() to prepare for consolidation
This is a code-motion-only commit that prepares to define rcu_irq_enter()
in terms of rcu_nmi_enter() and rcu_irq_exit() in terms of rcu_nmi_exit().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-11-27 08:40:10 -08:00
Paul E. McKenney
a0eb22bf64 rcu: Reduce dyntick-idle state space
Both extended-quiescent-state entry and exit first update the nesting
counter and then adjust the dyntick-idle state.  This means that there
are four states: (1) Both nesting and dyntick idle indicate idle,
(2) Nesting indicates idle but dyntick idle does not, (3) Nesting indicates
non-idle and dyntick idle does not, and (4) Both nesting and dyntick
idle indicate non-idle.  This commit simplifies the state space by
eliminating #3, reversing the order of updates on exit from extended
quiescent state.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-11-27 08:40:10 -08:00
Paul E. McKenney
2e672ab2d4 rcu: Avoid ->dynticks_nmi_nesting store tearing
NMIs can nest, and store tearing could in theory happen on carries
from one byte to the next.  This commit therefore adds the WRITE_ONCE()
macros preventing this.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-11-27 08:40:09 -08:00
Linus Torvalds
2bcc673101 Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
 "Yet another big pile of changes:

   - More year 2038 work from Arnd slowly reaching the point where we
     need to think about the syscalls themself.

   - A new timer function which allows to conditionally (re)arm a timer
     only when it's either not running or the new expiry time is sooner
     than the armed expiry time. This allows to use a single timer for
     multiple timeout requirements w/o caring about the first expiry
     time at the call site.

   - A new NMI safe accessor to clock real time for the printk timestamp
     work. Can be used by tracing, perf as well if required.

   - A large number of timer setup conversions from Kees which got
     collected here because either maintainers requested so or they
     simply got ignored. As Kees pointed out already there are a few
     trivial merge conflicts and some redundant commits which was
     unavoidable due to the size of this conversion effort.

   - Avoid a redundant iteration in the timer wheel softirq processing.

   - Provide a mechanism to treat RTC implementations depending on their
     hardware properties, i.e. don't inflict the write at the 0.5
     seconds boundary which originates from the PC CMOS RTC to all RTCs.
     No functional change as drivers need to be updated separately.

   - The usual small updates to core code clocksource drivers. Nothing
     really exciting"

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (111 commits)
  timers: Add a function to start/reduce a timer
  pstore: Use ktime_get_real_fast_ns() instead of __getnstimeofday()
  timer: Prepare to change all DEFINE_TIMER() callbacks
  netfilter: ipvs: Convert timers to use timer_setup()
  scsi: qla2xxx: Convert timers to use timer_setup()
  block/aoe: discover_timer: Convert timers to use timer_setup()
  ide: Convert timers to use timer_setup()
  drbd: Convert timers to use timer_setup()
  mailbox: Convert timers to use timer_setup()
  crypto: Convert timers to use timer_setup()
  drivers/pcmcia: omap1: Fix error in automated timer conversion
  ARM: footbridge: Fix typo in timer conversion
  drivers/sgi-xp: Convert timers to use timer_setup()
  drivers/pcmcia: Convert timers to use timer_setup()
  drivers/memstick: Convert timers to use timer_setup()
  drivers/macintosh: Convert timers to use timer_setup()
  hwrng/xgene-rng: Convert timers to use timer_setup()
  auxdisplay: Convert timers to use timer_setup()
  sparc/led: Convert timers to use timer_setup()
  mips: ip22/32: Convert timers to use timer_setup()
  ...
2017-11-13 17:56:58 -08:00
Linus Torvalds
3e2014637c Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
 "The main updates in this cycle were:

   - Group balancing enhancements and cleanups (Brendan Jackman)

   - Move CPU isolation related functionality into its separate
     kernel/sched/isolation.c file, with related 'housekeeping_*()'
     namespace and nomenclature et al. (Frederic Weisbecker)

   - Improve the interactive/cpu-intense fairness calculation (Josef
     Bacik)

   - Improve the PELT code and related cleanups (Peter Zijlstra)

   - Improve the logic of pick_next_task_fair() (Uladzislau Rezki)

   - Improve the RT IPI based balancing logic (Steven Rostedt)

   - Various micro-optimizations:

   - better !CONFIG_SCHED_DEBUG optimizations (Patrick Bellasi)

   - better idle loop (Cheng Jian)

   - ... plus misc fixes, cleanups and updates"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits)
  sched/core: Optimize sched_feat() for !CONFIG_SCHED_DEBUG builds
  sched/sysctl: Fix attributes of some extern declarations
  sched/isolation: Document isolcpus= boot parameter flags, mark it deprecated
  sched/isolation: Add basic isolcpus flags
  sched/isolation: Move isolcpus= handling to the housekeeping code
  sched/isolation: Handle the nohz_full= parameter
  sched/isolation: Introduce housekeeping flags
  sched/isolation: Split out new CONFIG_CPU_ISOLATION=y config from CONFIG_NO_HZ_FULL
  sched/isolation: Rename is_housekeeping_cpu() to housekeeping_cpu()
  sched/isolation: Use its own static key
  sched/isolation: Make the housekeeping cpumask private
  sched/isolation: Provide a dynamic off-case to housekeeping_any_cpu()
  sched/isolation, watchdog: Use housekeeping_cpumask() instead of ad-hoc version
  sched/isolation: Move housekeeping related code to its own file
  sched/idle: Micro-optimize the idle loop
  sched/isolcpus: Fix "isolcpus=" boot parameter handling when !CONFIG_CPUMASK_OFFSTACK
  x86/tsc: Append the 'tsc=' description for the 'tsc=unstable' boot parameter
  sched/rt: Simplify the IPI based RT balancing logic
  block/ioprio: Use a helper to check for RT prio
  sched/rt: Add a helper to test for a RT task
  ...
2017-11-13 13:37:52 -08:00
Linus Torvalds
8e9a2dba86 Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core locking updates from Ingo Molnar:
 "The main changes in this cycle are:

   - Another attempt at enabling cross-release lockdep dependency
     tracking (automatically part of CONFIG_PROVE_LOCKING=y), this time
     with better performance and fewer false positives. (Byungchul Park)

   - Introduce lockdep_assert_irqs_enabled()/disabled() and convert
     open-coded equivalents to lockdep variants. (Frederic Weisbecker)

   - Add down_read_killable() and use it in the VFS's iterate_dir()
     method. (Kirill Tkhai)

   - Convert remaining uses of ACCESS_ONCE() to
     READ_ONCE()/WRITE_ONCE(). Most of the conversion was Coccinelle
     driven. (Mark Rutland, Paul E. McKenney)

   - Get rid of lockless_dereference(), by strengthening Alpha atomics,
     strengthening READ_ONCE() with smp_read_barrier_depends() and thus
     being able to convert users of lockless_dereference() to
     READ_ONCE(). (Will Deacon)

   - Various micro-optimizations:

        - better PV qspinlocks (Waiman Long),
        - better x86 barriers (Michael S. Tsirkin)
        - better x86 refcounts (Kees Cook)

   - ... plus other fixes and enhancements. (Borislav Petkov, Juergen
     Gross, Miguel Bernal Marin)"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (70 commits)
  locking/x86: Use LOCK ADD for smp_mb() instead of MFENCE
  rcu: Use lockdep to assert IRQs are disabled/enabled
  netpoll: Use lockdep to assert IRQs are disabled/enabled
  timers/posix-cpu-timers: Use lockdep to assert IRQs are disabled/enabled
  sched/clock, sched/cputime: Use lockdep to assert IRQs are disabled/enabled
  irq_work: Use lockdep to assert IRQs are disabled/enabled
  irq/timings: Use lockdep to assert IRQs are disabled/enabled
  perf/core: Use lockdep to assert IRQs are disabled/enabled
  x86: Use lockdep to assert IRQs are disabled/enabled
  smp/core: Use lockdep to assert IRQs are disabled/enabled
  timers/hrtimer: Use lockdep to assert IRQs are disabled/enabled
  timers/nohz: Use lockdep to assert IRQs are disabled/enabled
  workqueue: Use lockdep to assert IRQs are disabled/enabled
  irq/softirqs: Use lockdep to assert IRQs are disabled/enabled
  locking/lockdep: Add IRQs disabled/enabled assertion APIs: lockdep_assert_irqs_enabled()/disabled()
  locking/pvqspinlock: Implement hybrid PV queued/unfair locks
  locking/rwlocks: Fix comments
  x86/paravirt: Set up the virt_spin_lock_key after static keys get initialized
  block, locking/lockdep: Assign a lock_class per gendisk used for wait_for_completion()
  workqueue: Remove now redundant lock acquisitions wrt. workqueue flushes
  ...
2017-11-13 12:38:26 -08:00
Linus Torvalds
6098850e7e Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU updates from Ingo Molnar:
 "The main changes in this cycle are:

   - Documentation updates

   - RCU CPU stall-warning updates

   - Torture-test updates

   - Miscellaneous fixes

  Size-wise, the biggest updates are to documentation. Excluding
  documentation, most of the code increase comes from a single commit
  which expands debugging"

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
  srcu: Add parameters to SRCU docbook comments
  doc: Rewrite confusing statement about memory barriers
  memory-barriers.txt: Fix typo in pairing example
  rcu/segcblist: Include rcupdate.h
  rcu: Add extended-quiescent-state testing advice
  rcu: Suppress lockdep false-positive ->boost_mtx complaints
  rcu: Do not include rtmutex_common.h unconditionally
  torture: Provide TMPDIR environment variable to specify tmpdir
  rcutorture: Dump writer stack if stalled
  rcutorture: Add interrupt-disable capability to stall-warning tests
  rcu: Suppress RCU CPU stall warnings while dumping trace
  rcu: Turn off tracing before dumping trace
  rcu: Make RCU CPU stall warnings check for irq-disabled CPUs
  sched,rcu: Make cond_resched() provide RCU quiescent state
  sched: Make resched_cpu() unconditional
  irq_work: Map irq_work_on_queue() to irq_work_on() in !SMP
  rcu: Create call_rcu_tasks() kthread at boot time
  rcu: Fix up pending cbs check in rcu_prepare_for_idle
  memory-barriers: Rework multicopy-atomicity section
  memory-barriers: Replace uses of "transitive"
  ...
2017-11-13 12:18:10 -08:00
Frederic Weisbecker
b04db8e19f rcu: Use lockdep to assert IRQs are disabled/enabled
Lockdep now has an integrated IRQs disabled/enabled sanity check. Just
use it instead of the ad-hoc RCU version.
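
Roughly, the conversion has the following shape (the ad-hoc form shown
here is only illustrative of the open-coded checks being replaced):

        /* Ad-hoc sanity check: */
        WARN_ON_ONCE(!irqs_disabled());

        /* Lockdep-integrated equivalent introduced by this series: */
        lockdep_assert_irqs_disabled();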

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: David S . Miller <davem@davemloft.net>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/1509980490-4285-15-git-send-email-frederic@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-08 11:13:55 +01:00
Ingo Molnar
8a103df440 Merge branch 'linus' into sched/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-08 10:17:15 +01:00
Kees Cook
fd30b717b8 rcu: Convert timers to use timer_setup()
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.
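
As an illustration of the conversion pattern (struct foo, its timer
field, and foo_timeout() are hypothetical, not taken from this commit):

        /* Old style: the callback received an opaque unsigned long. */
        setup_timer(&foo->timer, foo_timeout_old, (unsigned long)foo);

        /* New style: the callback receives the struct timer_list pointer
         * and recovers its container with from_timer(). */
        static void foo_timeout(struct timer_list *t)
        {
                struct foo *foo = from_timer(foo, t, timer);

                /* ... handle the timeout using foo ... */
        }

        timer_setup(&foo->timer, foo_timeout, 0);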

Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
2017-11-02 15:44:09 -07:00
Greg Kroah-Hartman
b24413180f License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier.  The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
 - file had no licensing information in it,
 - file was a */uapi/* one with no licensing information in it,
 - file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side-by-side results from the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne.  Philippe prepared the
base worksheet, and did an initial spot review of a few thousand files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed.  Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
 - Files considered eligible had to be source code files.
 - Make and config files were included as candidates if they contained >5
   lines of source
 - File already had some variant of a license header in it (even if <5
   lines).

All documentation files were explicitly excluded.

The following heuristics were used to determine which SPDX license
identifiers to apply.

 - when both scanners couldn't find any license traces, file was
   considered to have no license information in it, and the top level
   COPYING file license applied.

   For non */uapi/* files that summary was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0                                              11139

   and resulted in the first patch in this series.

   If that file was a */uapi/* path one, it was "GPL-2.0 WITH
   Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0 WITH Linux-syscall-note                        930

   and resulted in the second patch in this series.

 - if a file had some form of licensing information in it, and was one
   of the */uapi/* ones, it was denoted with the Linux-syscall-note if
   any GPL family license was found in the file or had no licensing in
   it (per prior point).  Results summary:

   SPDX license identifier                            # files
   ---------------------------------------------------|------
   GPL-2.0 WITH Linux-syscall-note                       270
   GPL-2.0+ WITH Linux-syscall-note                      169
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
   LGPL-2.1+ WITH Linux-syscall-note                      15
   GPL-1.0+ WITH Linux-syscall-note                       14
   ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
   LGPL-2.0+ WITH Linux-syscall-note                       4
   LGPL-2.1 WITH Linux-syscall-note                        3
   ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
   ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1

   and that resulted in the third patch in this series.

 - when the two scanners agreed on the detected license(s), that became
   the concluded license(s).

 - when there was disagreement between the two scanners (one detected a
   license but the other didn't, or they both detected different
   licenses) a manual inspection of the file occurred.

 - In most cases a manual inspection of the information in the file
   resulted in a clear resolution of the license that should apply (and
   which scanner probably needed to revisit its heuristics).

 - When it was not immediately clear, the license identifier was
   confirmed with lawyers working with the Linux Foundation.

 - If there was any question as to the appropriate license identifier,
   the file was flagged for further research and to be revisited later
   in time.

In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.

Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there were new insights.  The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.

Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with the SPDX license identifiers in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.

In the initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.

Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
 - a full scancode scan run, collecting the matched texts, detected
   license ids and scores
 - reviewing anything where there was a license detected (about 500+
   files) to ensure that the applied SPDX license was correct
 - reviewing anything where there was no detection but the patch license
   was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
   SPDX license was correct

This produced a worksheet with 20 files needing minor correction.  This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.

These .csv files were then reviewed by Greg.  Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected.  This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types).  Finally, Greg ran the script using the .csv files to
generate the patches.

Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-02 11:10:55 +01:00
Frederic Weisbecker
de201559df sched/isolation: Introduce housekeeping flags
Before we implement isolcpus under housekeeping, we need the isolation
features to be more finegrained. For example some people want NOHZ_FULL
without the full scheduler isolation, others want full scheduler
isolation without NOHZ_FULL.

So let's cut all these isolation features piecewise, at the risk of
overcutting it right now. We can still merge some flags later if they
always make sense together.
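
A hedged sketch of the resulting piecewise interface (flag names and
values illustrative, not a complete list):

        enum hk_flags {
                HK_FLAG_TIMER   = 1,
                HK_FLAG_RCU     = (1 << 1),
                HK_FLAG_MISC    = (1 << 2),
                HK_FLAG_SCHED   = (1 << 3),
                HK_FLAG_TICK    = (1 << 4),
        };

        /* A subsystem asks only for the isolation dimension it cares about: */
        if (housekeeping_enabled(HK_FLAG_RCU))
                cpu = cpumask_any(housekeeping_cpumask(HK_FLAG_RCU));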

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Wanpeng Li <kernellwp@gmail.com>
Link: http://lkml.kernel.org/r/1509072159-31808-9-git-send-email-frederic@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-27 09:55:29 +02:00
Frederic Weisbecker
7863406143 sched/isolation: Move housekeeping related code to its own file
The housekeeping code is currently tied to the NOHZ code. As we are
planning to make housekeeping independent from it, start with moving
the relevant code to its own file.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Wanpeng Li <kernellwp@gmail.com>
Link: http://lkml.kernel.org/r/1509072159-31808-2-git-send-email-frederic@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-27 09:55:24 +02:00
Ingo Molnar
72bc286b81 Merge branch 'for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
Pull RCU updates from Paul E. McKenney:

 - Documentation updates
 - Miscellaneous fixes
 - RCU CPU stall-warning updates
 - Torture-test updates

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-24 10:49:44 +02:00
Paul E. McKenney
ad4e25a3a1 Merge branches 'doc.2017.10.20a', 'fixes.2017.10.19a', 'stall.2017.10.09a' and 'torture.2017.10.09a' into HEAD
doc.2017.10.20a: Documentation updates.
fixes.2017.10.19a: Miscellaneous fixes.
stall.2017.10.09a: RCU CPU stall-warning updates.
torture.2017.10.09a: Torture-test updates.
2017-10-20 11:11:15 -07:00
Paul E. McKenney
e4d0b679a8 srcu: Add parameters to SRCU docbook comments
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-10-20 11:09:33 -07:00
Paul E. McKenney
27fdb35fe9 doc: Fix various RCU docbook comment-header problems
Because many of RCU's files have not been included into docbook, a
number of errors have accumulated.  This commit fixes them.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-10-19 22:26:11 -04:00
Sebastian Andrzej Siewior
56628a7fc8 rcu/segcblist: Include rcupdate.h
The RT build on ARM complains about non-existing ULONG_CMP_LT.
This commit therefore includes rcupdate.h into rcu_segcblist.c.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-10-19 12:13:36 -07:00
Paul E. McKenney
c0da313e09 rcu: Add extended-quiescent-state testing advice
If you add or remove calls to rcu_idle_enter(), rcu_user_enter(),
rcu_irq_exit(), rcu_irq_exit_irqson(), rcu_idle_exit(), rcu_user_exit(),
rcu_irq_enter(), rcu_irq_enter_irqson(), rcu_nmi_enter(), or
rcu_nmi_exit(), you should run a full set of tests on a kernel built
with CONFIG_RCU_EQS_DEBUG=y.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-10-19 12:13:36 -07:00
Paul E. McKenney
02a7c234e5 rcu: Suppress lockdep false-positive ->boost_mtx complaints
RCU priority boosting uses rt_mutex_init_proxy_locked() to initialize an
rt_mutex structure in locked state held by some other task.  When that
other task releases it, lockdep complains (quite accurately, but a bit
uselessly) that the other task never acquired it.  This complaint can
suppress other, more helpful, lockdep complaints, and in any case it is
a false positive.

This commit therefore switches from rt_mutex_unlock() to
rt_mutex_futex_unlock(), thereby avoiding the lockdep annotations.
Of course, if lockdep ever learns about rt_mutex_init_proxy_locked(),
additional adjustments will be required.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-10-19 12:13:36 -07:00
Sebastian Andrzej Siewior
b88697810d rcu: Do not include rtmutex_common.h unconditionally
This commit adjusts include files and provides definitions in preparation
for suppressing lockdep false-positive ->boost_mtx complaints.  Without
this preparation, architectures not supporting rt_mutex will get build
failures.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-10-19 12:12:06 -07:00
Paul E. McKenney
0032f4e889 rcutorture: Dump writer stack if stalled
Right now, rcutorture warns if an rcu_torture_writer() kthread stalls,
but this warning is not always all that helpful.  This commit therefore
makes the first such warning include a stack dump.

This in turn requires that sched_show_task() be exported to GPL modules,
so this commit makes that change as well.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-10-09 14:26:09 -07:00
Paul E. McKenney
2b1516e55f rcutorture: Add interrupt-disable capability to stall-warning tests
When rcutorture sees the rcutorture.stall_cpu kernel boot parameter,
it loops with preemption disabled, which does in fact normally
generate an RCU CPU stall warning message.  However, there are test
scenarios that need the stalling CPU to have interrupts disabled.
This commit therefore adds an rcutorture.stall_cpu_irqsoff kernel
boot parameter that causes the stalling CPU to disable interrupts.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-10-09 14:26:08 -07:00
Paul E. McKenney
f22ce09157 rcu: Suppress RCU CPU stall warnings while dumping trace
Currently, RCU emits RCU CPU stall warnings even during its
automatically initiated ftrace_dump() calls after detecting an error
condition, which can result in excessive console output and lost
trace events.  This commit therefore suppresses RCU CPU stall
warnings across any of these ftrace_dump() calls.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-10-09 14:25:17 -07:00
Paul E. McKenney
83b6ca1fed rcu: Turn off tracing before dumping trace
Currently, RCU allows tracing to continue when it automatically does
ftrace_dump() after detecting an error condition, which can result in
excessively large traces and lost trace events.  This commit therefore
does a tracing_off() before any of these ftrace_dump() calls.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-10-09 14:25:17 -07:00
Paul E. McKenney
9b9500da81 rcu: Make RCU CPU stall warnings check for irq-disabled CPUs
One common question upon seeing an RCU CPU stall warning is "did
the stalled CPUs have interrupts disabled?"  However, the current
stall warnings are silent on this point.  This commit therefore
uses irq_work to check whether stalled CPUs still respond to IPIs,
and flags this state in the RCU CPU stall warning console messages.
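
A minimal sketch of the irq_work-based probe (the state and names here
are hypothetical simplifications of the per-CPU machinery actually used):

        static bool rcu_iw_pending;     /* hypothetical: per-CPU in reality */
        static struct irq_work rcu_iw;

        static void rcu_iw_handler(struct irq_work *iwp)
        {
                /* Runs in hard-interrupt context on the target CPU, so
                 * getting here proves that CPU still takes interrupts. */
                WRITE_ONCE(rcu_iw_pending, false);
        }

        /* From the CPU detecting the stall: */
        init_irq_work(&rcu_iw, rcu_iw_handler);
        WRITE_ONCE(rcu_iw_pending, true);
        irq_work_queue_on(&rcu_iw, cpu);
        /* If rcu_iw_pending is still true when the stall warning is
         * printed, report the CPU as apparently irq-disabled. */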

Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-10-09 14:25:17 -07:00
Paul E. McKenney
f79c3ad618 sched,rcu: Make cond_resched() provide RCU quiescent state
There is some confusion as to which of cond_resched() or
cond_resched_rcu_qs() should be added to long in-kernel loops.
This commit therefore eliminates the decision by adding RCU quiescent
states to cond_resched().  This commit also simplifies the code that
used to interact with cond_resched_rcu_qs(), and that now interacts with
cond_resched(), to reduce its overhead.  This reduction is necessary to
allow the heavier-weight cond_resched_rcu_qs() mechanism to be invoked
everywhere that cond_resched() is invoked.

Part of that reduction in overhead converts the jiffies_till_sched_qs
kernel parameter to read-only at runtime, thus eliminating the need for
bounds checking.

Reported-by: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
[ paulmck: Keep PREEMPT=n cond_resched a no-op, per Peter Zijlstra. ]
2017-10-09 14:25:17 -07:00
Paul E. McKenney
c63eb17ff0 rcu: Create call_rcu_tasks() kthread at boot time
Currently the call_rcu_tasks() kthread is created upon first
invocation of call_rcu_tasks().  This has the advantage of avoiding
creation if there are never any invocations of call_rcu_tasks() and of
synchronize_rcu_tasks(), but it requires an unreliable heuristic to
determine when it is safe to create the kthread.  For example, it is
not safe to create the kthread when call_rcu_tasks() is invoked with
a spinlock held, but there is no good way to detect this in !PREEMPT
kernels.

This commit therefore creates this kthread unconditionally at
core_initcall() time.  If you don't want this kthread created, then
build with CONFIG_TASKS_RCU=n.
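
A hedged sketch of the boot-time creation (initcall function name
illustrative):

        static int __init rcu_spawn_tasks_kthread_init(void)
        {
                struct task_struct *t;

                t = kthread_run(rcu_tasks_kthread, NULL, "rcu_tasks_kthread");
                BUG_ON(IS_ERR(t));
                return 0;
        }
        core_initcall(rcu_spawn_tasks_kthread_init);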

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-10-09 14:24:14 -07:00
Neeraj Upadhyay
135bd1a230 rcu: Fix up pending cbs check in rcu_prepare_for_idle
The pending-callbacks check in rcu_prepare_for_idle() is backwards.
It should accelerate if there are pending callbacks, but the check
rather uselessly accelerates only if there are no callbacks.  This commit
therefore inverts this check.
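
Roughly, the inversion amounts to the following (sketch only, with the
surrounding per-flavor loop omitted):

        /* Before (backwards): skip acceleration when callbacks are pending. */
        if (rcu_segcblist_pend_cbs(&rdp->cblist))
                continue;

        /* After: skip only when there is nothing pending to accelerate. */
        if (!rcu_segcblist_pend_cbs(&rdp->cblist))
                continue;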

Fixes: 15fecf89e4 ("srcu: Abstract multi-tail callback list handling")
Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: <stable@vger.kernel.org> # 4.12.x
2017-10-09 14:24:14 -07:00
Linus Torvalds
013a8ee628 Two updates.
- A memory fix with left-over code from splitting out ftrace_ops
    and function graph tracer, where the function graph tracer could
    reset the trampoline pointer, leaving the old trampoline not to
    be freed (memory leak).
 
  - The update to Paul's patch that added the unnecessary READ_ONCE().
    This removes the unnecessary READ_ONCE() instead of having to rebase
    the branch to update the patch that added it.

Merge tag 'trace-v4.14-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixlets from Steven Rostedt:
 "Two updates:

   - A memory fix with left-over code from splitting out ftrace_ops and
     function graph tracer, where the function graph tracer could reset
     the trampoline pointer, leaving the old trampoline not to be freed
     (memory leak).

   - The update to Paul's patch that added the unnecessary READ_ONCE().
     This removes the unnecessary READ_ONCE() instead of having to
     rebase the branch to update the patch that added it"

* tag 'trace-v4.14-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  rcu: Remove extraneous READ_ONCE()s from rcu_irq_{enter,exit}()
  ftrace: Fix kmemleak in unregister_ftrace_graph
2017-10-04 08:34:01 -07:00
Paul E. McKenney
f39b536ce9 rcu: Remove extraneous READ_ONCE()s from rcu_irq_{enter,exit}()
The read of ->dynticks_nmi_nesting in rcu_irq_enter() and rcu_irq_exit()
is currently protected with READ_ONCE().  However, this protection is
unnecessary because (1) ->dynticks_nmi_nesting is updated only by the
current CPU, (2) Although NMI handlers can update this field, they reset
it back to its old value before return, and (3) Interrupts are disabled,
so nothing else can modify it.  The value of ->dynticks_nmi_nesting is
thus effectively constant, and so no protection is required.

This commit therefore removes the READ_ONCE() protection from these
two accesses.
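
The change has the following shape (rdtp being the usual per-CPU
rcu_dynticks pointer; surrounding context abbreviated):

        /* Before: */
        if (READ_ONCE(rdtp->dynticks_nmi_nesting))
                return;

        /* After: a plain load suffices, for the reasons given above. */
        if (rdtp->dynticks_nmi_nesting)
                return;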

Link: http://lkml.kernel.org/r/20170926031902.GA2074@linux.vnet.ibm.com

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-10-03 10:27:32 -04:00
Linus Torvalds
ac0a36461f Stack tracing and RCU has been having issues with each other and lockdep
has been pointing out constant problems. The changes have been going into
 the stack tracer, but it has been discovered that the problem isn't
 with the stack tracer itself, but it is with calling save_stack_trace()
 from within the internals of RCU. The stack tracer is the one that
 can trigger the issue the easiest, but examining the problem further,
 it could also happen from a WARN() in the wrong place, or even if
 an NMI happened in this area and it did an rcu_read_lock().
 
 The critical area is where RCU is not watching. Which can happen while
 going to and from idle, or bringing up or taking down a CPU.
 
 The final fix was to put the protection in kernel_text_address() as it
 is the one that requires RCU to be watching while doing the stack trace.
 
 To make this work properly, Paul had to allow rcu_irq_enter() happen after
 rcu_nmi_enter(). This should have been done anyway, since an NMI can
 page fault (reading vmalloc area), and a page fault triggers rcu_irq_enter().
 
 One patch is just a consolidation of code so that the fix only needed
 to be done in one location.

Merge tag 'trace-v4.14-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:
 "Stack tracing and RCU has been having issues with each other and
  lockdep has been pointing out constant problems.

  The changes have been going into the stack tracer, but it has been
  discovered that the problem isn't with the stack tracer itself, but it
  is with calling save_stack_trace() from within the internals of RCU.

  The stack tracer is the one that can trigger the issue the easiest,
  but examining the problem further, it could also happen from a WARN()
  in the wrong place, or even if an NMI happened in this area and it did
  an rcu_read_lock().

  The critical area is where RCU is not watching. Which can happen while
  going to and from idle, or bringing up or taking down a CPU.

  The final fix was to put the protection in kernel_text_address() as it
  is the one that requires RCU to be watching while doing the stack
  trace.

  To make this work properly, Paul had to allow rcu_irq_enter() happen
  after rcu_nmi_enter(). This should have been done anyway, since an NMI
  can page fault (reading vmalloc area), and a page fault triggers
  rcu_irq_enter().

  One patch is just a consolidation of code so that the fix only needed
  to be done in one location"

* tag 'trace-v4.14-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Remove RCU work arounds from stack tracer
  extable: Enable RCU if it is not watching in kernel_text_address()
  extable: Consolidate *kernel_text_address() functions
  rcu: Allow for page faults in NMI handlers
2017-09-25 15:22:31 -07:00
Paul E. McKenney
28585a8326 rcu: Allow for page faults in NMI handlers
A number of architectures invoke rcu_irq_enter() on exception entry in
order to allow RCU read-side critical sections in the exception handler
when the exception is from an idle or nohz_full CPU.  This works, at
least unless the exception happens in an NMI handler.  In that case,
rcu_nmi_enter() would already have exited the extended quiescent state,
which would mean that rcu_irq_enter() would (incorrectly) cause RCU
to think that it is again in an extended quiescent state.  This will
in turn result in lockdep splats in response to later RCU read-side
critical sections.

This commit therefore causes rcu_irq_enter() and rcu_irq_exit() to
take no action if there is an rcu_nmi_enter() in effect, thus avoiding
the unscheduled return to RCU quiescent state.  This in turn should
make the kernel safe for on-demand RCU voyeurism.

Link: http://lkml.kernel.org/r/20170922211022.GA18084@linux.vnet.ibm.com

Cc: stable@vger.kernel.org
Fixes: 0be964be0 ("module: Sanitize RCU usage and locking")
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-09-23 16:49:42 -04:00
Alexey Dobriyan
9b130ad5bb treewide: make "nr_cpu_ids" unsigned
First, number of CPUs can't be negative number.

Second, different signedness leads to suboptimal code in the following
cases:

1)
	kmalloc(nr_cpu_ids * sizeof(X));

"int" has to be sign extended to size_t.

2)
	while (loff_t *pos < nr_cpu_ids)

MOVSXD is 1 byte longer than the same MOV.

Other cases exist as well. Basically, the compiler is told that nr_cpu_ids
can't be negative, which can't be deduced if it is "int".

Code savings on allyesconfig kernel: -3KB

	add/remove: 0/0 grow/shrink: 25/264 up/down: 261/-3631 (-3370)
	function                                     old     new   delta
	coretemp_cpu_online                          450     512     +62
	rcu_init_one                                1234    1272     +38
	pci_device_probe                             374     399     +25

				...

	pgdat_reclaimable_pages                      628     556     -72
	select_fallback_rq                           446     369     -77
	task_numa_find_cpu                          1923    1807    -116

Link: http://lkml.kernel.org/r/20170819114959.GA30580@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-08 18:26:48 -07:00
Paul E. McKenney
656e7c0c0a Merge branches 'doc.2017.08.17a', 'fixes.2017.08.17a', 'hotplug.2017.07.25b', 'misc.2017.08.17a', 'spin_unlock_wait_no.2017.08.17a', 'srcu.2017.07.27c' and 'torture.2017.07.24c' into HEAD
doc.2017.08.17a: Documentation updates.
fixes.2017.08.17a: RCU fixes.
hotplug.2017.07.25b: CPU-hotplug updates.
misc.2017.08.17a: Miscellaneous fixes outside of RCU (give or take conflicts).
spin_unlock_wait_no.2017.08.17a: Remove spin_unlock_wait().
srcu.2017.07.27c: SRCU updates.
torture.2017.07.24c: Torture-test updates.
2017-08-17 08:10:04 -07:00
Paul E. McKenney
16c0b10607 rcu: Remove exports from rcu_idle_exit() and rcu_idle_enter()
The rcu_idle_exit() and rcu_idle_enter() functions are exported because
they were originally used by RCU_NONIDLE(), which was intended to
be usable from modules.  However, RCU_NONIDLE() now instead uses
rcu_irq_enter_irqson() and rcu_irq_exit_irqson(), which are not
exported, and there have been no complaints.

This commit therefore removes the exports from rcu_idle_exit() and
rcu_idle_enter().

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-08-17 07:26:25 -07:00
Paul E. McKenney
d4db30af51 rcu: Add warning to rcu_idle_enter() for irqs enabled
All current callers of rcu_idle_enter() have irqs disabled, and
rcu_idle_enter() relies on this, but doesn't check.  This commit
therefore adds a RCU_LOCKDEP_WARN() to add some verification to the trust.
While we are there, pass "true" rather than "1" to rcu_eqs_enter().
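
A hedged sketch of the added check (warning text illustrative):

        RCU_LOCKDEP_WARN(!irqs_disabled(),
                         "rcu_idle_enter() invoked with irqs enabled!");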

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-08-17 07:26:25 -07:00
Peter Zijlstra (Intel)
3a60799269 rcu: Make rcu_idle_enter() rely on callers disabling irqs
All callers to rcu_idle_enter() have irqs disabled, so there is no
point in rcu_idle_enter disabling them again.  This commit therefore
replaces the irq disabling with a RCU_LOCKDEP_WARN().

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-08-17 07:26:24 -07:00
Paul E. McKenney
2dee9404fa rcu: Add assertions verifying blocked-tasks list
This commit adds assertions verifying the consistency of the rcu_node
structure's ->blkd_tasks list and its ->gp_tasks, ->exp_tasks, and
->boost_tasks pointers.  In particular, the ->blkd_tasks lists must be
empty except for leaf rcu_node structures.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-08-17 07:26:23 -07:00
Masami Hiramatsu
35fe723bda rcu/tracing: Set disable_rcu_irq_enter on rcu_eqs_exit()
Set disable_rcu_irq_enter on not only rcu_eqs_enter_common() but also
rcu_eqs_exit(), since rcu_eqs_exit() suffers from the same issue as was
fixed for rcu_eqs_enter_common() by commit 03ecd3f48e ("rcu/tracing:
Add rcu_disabled to denote when rcu_irq_enter() will not work").

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-08-17 07:26:23 -07:00
Paul E. McKenney
d8db2e86d8 rcu: Add TPS() protection for _rcu_barrier_trace strings
The _rcu_barrier_trace() function is a wrapper for trace_rcu_barrier(),
which needs TPS() protection for strings passed through the second
argument.  However, it has escaped prior TPS()-ification efforts because
_rcu_barrier_trace() does not start with "trace_".  This commit
therefore adds the needed TPS() protection.
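
Roughly, with the call-site arguments shown for illustration only:

        /* Before: bare string literal, unresolvable from a crash dump. */
        _rcu_barrier_trace(rsp, "Begin", -1, rsp->barrier_sequence);

        /* After: TPS() (tracepoint_string()) keeps the string resolvable. */
        _rcu_barrier_trace(rsp, TPS("Begin"), -1, rsp->barrier_sequence);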

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-08-17 07:26:22 -07:00
Luis R. Rodriguez
d5374226c3 rcu: Use idle versions of swait to make idle-hack clear
These RCU waits were set to use interruptible waits to avoid the kthreads
contributing to system load average, even though they are not interruptible
as they are spawned from a kthread. Use the new TASK_IDLE swaits, which make
our goal clear and remove confusion about these paths possibly being
interruptible -- they are not.

When the system is idle the RCU grace-period kthread will spend all its time
blocked inside the swait_event_interruptible(). If the interruptible() was
not used, then this kthread would contribute to the load average. This means
that an idle system would have a load average of 2 (or 3 if PREEMPT=y),
rather than the load average of 0 that almost fifty years of UNIX has
conditioned sysadmins to expect.

The same argument applies to swait_event_interruptible_timeout() use. The
RCU grace-period kthread spends its time blocked inside this call while
waiting for grace periods to complete. In particular, if there was only one
busy CPU, but that CPU was frequently invoking call_rcu(), then the RCU
grace-period kthread would spend almost all its time blocked inside the
swait_event_interruptible_timeout(). This would mean that the load average
would be 2 rather than the expected 1 for the single busy CPU.
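
The substitution has the following shape (the wait condition shown is
illustrative):

        /* Before: interruptible wait, used only to stay out of the load average. */
        swait_event_interruptible(rsp->gp_wq,
                                  READ_ONCE(rsp->gp_flags) & RCU_GP_FLAG_INIT);

        /* After: a TASK_IDLE wait states the intent directly. */
        swait_event_idle(rsp->gp_wq,
                         READ_ONCE(rsp->gp_flags) & RCU_GP_FLAG_INIT);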

Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-08-17 07:26:15 -07:00
Paul E. McKenney
c5ebe66ce7 rcu: Add event tracing to ->gp_tasks update at GP start
There is currently event tracing to track when a task is preempted
within a preemptible RCU read-side critical section, and also when that
task subsequently reaches its outermost rcu_read_unlock(), but none
indicating when a new grace period starts when that grace period must
wait on pre-existing readers that have been preempted at least once
since the beginning of their current RCU read-side critical sections.

This commit therefore adds an event trace at grace-period start in
the case where there are such readers.  Note that only the first
reader in the list is traced.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-08-17 07:26:06 -07:00
Paul E. McKenney
7414fac050 rcu: Move rcu.h to new trivial-function style
This commit saves a few lines in kernel/rcu/rcu.h by moving to single-line
definitions for trivial functions, instead of the old style where the
two curly braces each get their own line.
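
For example (hypothetical trivial function):

        /* Old style: */
        static inline void rcu_do_nothing(void)
        {
        }

        /* New single-line style: */
        static inline void rcu_do_nothing(void) { }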

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-08-17 07:26:06 -07:00
Paul E. McKenney
bedbb648ef rcu: Add TPS() to event-traced strings
Strings used in event tracing need to be specially handled, for example,
using the TPS() macro.  Without the TPS() macro, although output looks
fine from within a running kernel, extracting traces from a crash dump
produces garbage instead of strings.  This commit therefore adds the TPS()
macro to some unadorned strings that were passed to event-tracing macros.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-08-17 07:26:05 -07:00
Paul E. McKenney
ccdd29ffff rcu: Create reasonable API for do_exit() TASKS_RCU processing
Currently, the exit-time support for TASKS_RCU is open-coded in do_exit().
This commit creates exit_tasks_rcu_start() and exit_tasks_rcu_finish()
APIs for do_exit() use.  This has the benefit of confining the use of the
tasks_rcu_exit_srcu variable to one file, allowing it to become static.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-08-17 07:26:05 -07:00
Paul E. McKenney
7e42776d5e rcu: Drive TASKS_RCU directly off of PREEMPT
The actual use of TASKS_RCU is only when PREEMPT, otherwise RCU-sched
is used instead.  This commit therefore makes synchronize_rcu_tasks()
and call_rcu_tasks() available always, but mapped to synchronize_sched()
and call_rcu_sched(), respectively, when !PREEMPT.  This approach also
allows some #ifdefs to be removed from rcutorture.
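
A hedged sketch of the resulting mapping in rcupdate.h (declarations
abbreviated):

        #ifdef CONFIG_TASKS_RCU
        void call_rcu_tasks(struct rcu_head *head, rcu_callback_t func);
        void synchronize_rcu_tasks(void);
        #else /* #ifdef CONFIG_TASKS_RCU */
        #define call_rcu_tasks call_rcu_sched
        #define synchronize_rcu_tasks synchronize_sched
        #endif /* #else #ifdef CONFIG_TASKS_RCU */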

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
2017-08-17 07:26:04 -07:00
Paul E. McKenney
35732cf9dd srcu: Provide ordering for CPU not involved in grace period
Tree RCU guarantees that every online CPU has a memory barrier between
any given grace period and any of that CPU's RCU read-side sections that
must be ordered against that grace period.  Since RCU doesn't always
know where read-side critical sections are, the actual implementation
guarantees order against prior and subsequent non-idle non-offline code,
whether in an RCU read-side critical section or not.  As a result, there
does not need to be a memory barrier at the end of synchronize_rcu()
and friends because the ordering internal to the grace period has
ordered every CPU's post-grace-period execution against each CPU's
pre-grace-period execution, again for all non-idle online CPUs.

In contrast, SRCU can have non-idle online CPUs that are completely
uninvolved in a given SRCU grace period, for example, a CPU that
never runs any SRCU read-side critical sections and took no part in
the grace-period processing.  It is in theory possible for a given
synchronize_srcu()'s wakeup to be delivered to a CPU that was completely
uninvolved in the prior SRCU grace period, which could mean that the
code following that synchronize_srcu() would end up being unordered with
respect to both the grace period and any pre-existing SRCU read-side
critical sections.

This commit therefore adds an smp_mb() to the end of __synchronize_srcu(),
which prevents this scenario from occurring.

Reported-by: Lance Roy <ldr709@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Lance Roy <ldr709@gmail.com>
Cc: <stable@vger.kernel.org> # 4.12.x
2017-07-27 15:53:04 -07:00
Paul E. McKenney
09efeeee17 rcu: Move callback-list warning to irq-disable region
After adopting callbacks from a newly offlined CPU, the adopting CPU
checks to make sure that its callback list's count is zero only if the
list has no callbacks and vice versa.  Unfortunately, it does so after
enabling interrupts, which means that false positives are possible due to
interrupt handlers invoking call_rcu().  Although these false positives
are improbable, rcutorture did make it happen once.

This commit therefore moves this check to an irq-disabled region of code,
thus suppressing the false positive.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-25 13:04:50 -07:00
Paul E. McKenney
aed4e04686 rcu: Remove unused RCU list functions
Given changes to callback migration, rcu_cblist_head(),
rcu_cblist_tail(), rcu_cblist_count_cbs(), rcu_segcblist_segempty(),
rcu_segcblist_dequeued_lazy(), and rcu_segcblist_new_cbs() are
no longer used.  This commit therefore removes them.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-25 13:04:49 -07:00
Paul E. McKenney
f2dbe4a562 rcu: Localize rcu_state ->orphan_pend and ->orphan_done
Given that the rcu_state structure's ->orphan_pend and ->orphan_done
fields are used only during migration of callbacks from the recently
offlined CPU to a surviving CPU, if rcu_send_cbs_to_orphanage() and
rcu_adopt_orphan_cbs() are combined, these fields can become local
variables in the combined function.  This commit therefore combines
rcu_send_cbs_to_orphanage() and rcu_adopt_orphan_cbs() into a new
rcu_segcblist_merge() function and removes the ->orphan_pend and
->orphan_done fields.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-25 13:04:49 -07:00
Paul E. McKenney
21cc248384 rcu: Advance callbacks after migration
When migrating callbacks from a newly offlined CPU, we are already
holding the root rcu_node structure's lock, so it costs almost nothing
to advance and accelerate the newly migrated callbacks.  This patch
therefore makes this advancing and acceleration happen.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-25 13:04:48 -07:00
Paul E. McKenney
537b85c870 rcu: Eliminate rcu_state ->orphan_lock
The ->orphan_lock is acquired and released only within the
rcu_migrate_callbacks() function, which now acquires the root rcu_node
structure's ->lock.  This commit therefore eliminates the ->orphan_lock
in favor of the root rcu_node structure's ->lock.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-25 13:04:48 -07:00
Paul E. McKenney
9fa46fb8c9 rcu: Advance outgoing CPU's callbacks before migrating them
It is possible that the outgoing CPU is unaware of recent grace periods,
and so it is also possible that some of its pending callbacks are actually
ready to be invoked.  The current callback-migration code would needlessly
force these callbacks to pass through another grace period.  This commit
therefore invokes rcu_advance_cbs() on the outgoing CPU's callbacks in
order to give them full credit for having passed through any recent
grace periods.

This also fixes an odd theoretical bug where there are no callbacks in
the system except for those on the outgoing CPU, none of those callbacks
have yet been associated with a grace-period number, there is never again
another callback registered, and the surviving CPU never again takes a
scheduling-clock interrupt, never goes idle, and never enters nohz_full
userspace execution.  Yes, this is (just barely) possible.  It requires
that the surviving CPU be a nohz_full CPU, that its scheduler-clock
interrupt be shut off, and that it loop forever in the kernel.  You get
bonus points if you can make this one happen!  ;-)

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-25 13:04:47 -07:00
Paul E. McKenney
b1a2d79fe7 rcu: Make NOCB CPUs migrate CBs directly from outgoing CPU
RCU's CPU-hotplug callback-migration code first moves the outgoing
CPU's callbacks to ->orphan_done and ->orphan_pend, and only then
moves them to the NOCB callback list.  This commit avoids the
extra step (and simplifies the code) by moving the callbacks directly
from the outgoing CPU's callback list to the NOCB callback list.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-25 13:04:47 -07:00
Paul E. McKenney
95335c0355 rcu: Check for NOCB CPUs and empty lists earlier in CB migration
The current CPU-hotplug RCU-callback-migration code checks
for the source (newly offlined) CPU being a NOCBs CPU down in
rcu_send_cbs_to_orphanage().  This commit simplifies callback migration a
bit by moving this check up to rcu_migrate_callbacks().  This commit also
adds a check for the source CPU having no callbacks, which eases analysis
of the rcu_send_cbs_to_orphanage() and rcu_adopt_orphan_cbs() functions.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-25 13:04:46 -07:00
Paul E. McKenney
c47e067a3c rcu: Remove orphan/adopt event-tracing fields
The rcu_node structure's ->n_cbs_orphaned and ->n_cbs_adopted fields
are updated, but never read.  This commit therefore removes them.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-25 13:04:46 -07:00
Paul E. McKenney
313517fc44 rcu: Make expedited GPs correctly handle hardware CPU insertion
The update of the ->expmaskinitnext and of ->ncpus are unsynchronized,
with the value of ->ncpus being incremented long before the corresponding
->expmaskinitnext mask is updated.  If an RCU expedited grace period
sees ->ncpus change, it will update the ->expmaskinit masks from the new
->expmaskinitnext masks.  But it is possible that ->ncpus has already
been updated, but the ->expmaskinitnext masks still have their old values.
For the current expedited grace period, no harm done.  The CPU could not
have been online before the grace period started, so there is no need to
wait for its non-existent pre-existing readers.

But the next RCU expedited grace period is in a world of hurt.  The value
of ->ncpus has already been updated, so this grace period will assume
that the ->expmaskinitnext masks have not changed.  But they have, and
they won't be taken into account until the next never-been-online CPU
comes online.  This means that RCU will be ignoring some CPUs that it
should be paying attention to.

The solution is to update ->ncpus and ->expmaskinitnext while holding
the ->lock for the rcu_node structure containing the ->expmaskinitnext
mask.  Because smp_store_release() is now used to update ->ncpus and
smp_load_acquire() is now used to locklessly read it, if the expedited
grace period sees ->ncpus change, then the updating CPU has to
already be holding the corresponding ->lock.  Therefore, when the
expedited grace period later acquires that ->lock, it is guaranteed
to see the new value of ->expmaskinitnext.

On the other hand, if the expedited grace period loads ->ncpus just
before an update, earlier full memory barriers guarantee that
the incoming CPU isn't far enough along to be running any RCU readers.

This commit therefore makes the required change.
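
A hedged sketch of the publish/consume pairing (field and variable
names illustrative):

        /* CPU-hotplug path, holding the relevant rcu_node ->lock: */
        rnp->expmaskinitnext |= mask;
        smp_store_release(&rsp->ncpus, rsp->ncpus + 1);

        /* Expedited grace period, lockless check: */
        if (smp_load_acquire(&rsp->ncpus) != ncpus_snapshot) {
                /* ->ncpus changed: the updater held rnp->lock, so acquiring
                 * that lock below guarantees the new ->expmaskinitnext
                 * masks are then visible. */
        }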

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-25 13:04:45 -07:00
Paul E. McKenney
a58163d8ca rcu: Migrate callbacks earlier in the CPU-offline timeline
RCU callbacks must be migrated away from an outgoing CPU, and this is
done near the end of the CPU-hotplug operation, after the outgoing CPU is
long gone.  Unfortunately, this means that other CPU-hotplug callbacks
can execute while the outgoing CPU's callbacks are still immobilized
on the long-gone CPU's callback lists.  If any of these CPU-hotplug
callbacks must wait, either directly or indirectly, for the invocation
of any of the immobilized RCU callbacks, the system will hang.

This commit avoids such hangs by migrating the callbacks away from the
outgoing CPU immediately upon its departure, shortly after the return
from __cpu_die() in takedown_cpu().  Thus, RCU is able to advance these
callbacks and invoke them, which allows all the after-the-fact CPU-hotplug
callbacks to wait on these RCU callbacks without risk of a hang.

While in the neighborhood, this commit also moves rcu_send_cbs_to_orphanage()
and rcu_adopt_orphan_cbs() under a pre-existing #ifdef to avoid including
dead code on the one hand and to avoid define-without-use warnings on the
other hand.

Reported-by: Jeffrey Hugo <jhugo@codeaurora.org>
Link: http://lkml.kernel.org/r/db9c91f6-1b17-6136-84f0-03c3c2581ab4@codeaurora.org
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Richard Weinberger <richard@nod.at>
2017-07-25 13:03:43 -07:00
Paul E. McKenney
8be6e1b15c rcu: Use timer as backstop for NOCB deferred wakeups
The handling of RCU's no-CBs CPUs has a maintenance headache, namely
that if call_rcu() is invoked with interrupts disabled, the rcuo kthread
wakeup must be deferred to a point where we can be sure that scheduler
locks are not held.  Of course, there are a lot of code paths leading
from an interrupts-disabled invocation of call_rcu(), and missing any
one of these can result in excessive callback-invocation latency, and
potentially even system hangs.

This commit therefore uses a timer to guarantee that the wakeup will
eventually occur.  If one of the deferred-wakeup points kicks in, then
the timer is simply cancelled.

This commit also fixes up an incomplete removal of commits that were
intended to plug remaining exit paths, which should have the added
benefit of reducing the overhead of RCU's context-switch hooks.  In
addition, it simplifies leader-to-follower callback-list handoff by
introducing locking.  The call_rcu()-to-leader handoff continues to
use atomic operations in order to maintain good real-time latency for
common-case use of call_rcu().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Dan Carpenter fix for mod_timer() usage bug found by smatch. ]
2017-07-25 09:53:09 -07:00
Paul E. McKenney
f34c8585ed rcutorture: Invoke call_rcu() from timer handler
The Linux kernel invokes call_rcu() from various interrupt/softirq
handlers, but rcutorture does not.  This commit therefore adds this
behavior to rcutorture's repertoire.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-24 16:04:19 -07:00
Paul E. McKenney
96036c4306 rcu: Add last-CPU to GP-kthread starvation messages
This commit augments the grace-period-kthread starvation debugging
messages by adding the last CPU that ran the kthread.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-24 16:04:18 -07:00
Paul E. McKenney
a3b7b6c273 rcutorture: Eliminate unused ts_rem local from rcu_trace_clock_local()
This commit removes an unused local variable named ts_rem that is
marked __maybe_unused.  Yes, the variable was assigned to, but it
was never used beyond that point, hence not needed.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-24 16:04:17 -07:00
Paul E. McKenney
808de39cf4 rcutorture: Add task's CPU for rcutorture writer stalls
It appears that at least some of the rcutorture writer stall messages
coincide with unusually long CPU-online operations, for example, no
fewer than 205 seconds in a recent test.  It is of course possible that
the writer stall is not unrelated to this unusually long CPU-hotplug
operation, and so this commit adds the rcutorture writer task's CPU to
the stall message to gain more information about this possible connection.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-24 16:04:17 -07:00
Paul E. McKenney
b3c983142d rcutorture: Place event-traced strings into trace buffer
Strings used in event tracing need to be specially handled, for example,
being copied to the trace buffer instead of being pointed to by the trace
buffer.  Although the TPS() macro can be used to "launder" pointed-to
strings, this might not be all that effective within a loadable module.
This commit therefore copies rcutorture's strings to the trace buffer.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
2017-07-24 16:04:12 -07:00
Paul E. McKenney
5e741fa9e9 rcutorture: Enable SRCU readers from timer handler
Now that it is legal to invoke srcu_read_lock() and srcu_read_unlock()
for a given srcu_struct from both process context and {soft,}irq
handlers, it is time to test it.  This commit therefore enables
testing of SRCU readers from rcutorture's timer handler, using in_task()
to determine whether or not it is safe to sleep in the SRCU read-side
critical sections.
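
A minimal sketch of the reader-side logic (variable names illustrative):

        idx = srcu_read_lock(srcu_ctlp);
        if (in_task())
                schedule_timeout_interruptible(longdelay);  /* may sleep */
        else
                udelay(shortdelay_us);                      /* irq context: spin */
        srcu_read_unlock(srcu_ctlp, idx);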

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-24 16:04:11 -07:00
Paul E. McKenney
f1dbc54b92 rcu: Remove CONFIG_TASKS_RCU ifdef from rcuperf.c
The synchronize_rcu_tasks() and call_rcu_tasks() APIs are now available
regardless of kernel configuration, so this commit removes the
CONFIG_TASKS_RCU ifdef from rcuperf.c.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-24 16:04:09 -07:00
Paul E. McKenney
ac3748c604 rcutorture: Print SRCU lock/unlock totals
This commit adds printing of SRCU lock/unlock totals, which are just
the sums of the per-CPU counts.  Saves a bit of mental arithmetic.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-24 16:04:08 -07:00
Paul E. McKenney
115a1a5285 rcutorture: Move SRCU status printing to SRCU implementations
This commit gets rid of some ugly #ifdefs in rcutorture.c by moving
the SRCU status printing to the SRCU implementations.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-24 16:04:08 -07:00
Paul E. McKenney
0d8a1e831e srcu: Make process_srcu() be static
The function process_srcu() is not invoked outside of srcutree.c, so
this commit makes it static and drops the EXPORT_SYMBOL_GPL().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-24 16:03:23 -07:00
Paul E. McKenney
825c5bd2fd srcu: Move rcu_scheduler_starting() from Tiny RCU to Tiny SRCU
Other than lockdep support, Tiny RCU has no need for the
scheduler status.  However, Tiny SRCU will need this to control
boot-time behavior independent of lockdep.  Therefore, this commit
moves rcu_scheduler_starting() from kernel/rcu/tiny_plugin.h to
kernel/rcu/srcutiny.c.  This in turn allows the complete removal of
kernel/rcu/tiny_plugin.h.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-24 16:03:22 -07:00
Paul E. McKenney
6d48152eaf rcu: Remove RCU CPU stall warnings from Tiny RCU
Tiny RCU's job is to be tiny, so this commit removes its RCU CPU
stall warning code.  After this, there is no longer any need for
rcu_sched_ctrlblk and rcu_bh_ctrlblk to be in tiny_plugin.h, so this
commit also moves them to tiny.c.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 18:52:45 -07:00
Paul E. McKenney
c23484f0e7 rcu: Remove event tracing from Tiny RCU
This commit saves a few lines by getting rid of Tiny RCU's event tracing.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 18:52:45 -07:00
Paul E. McKenney
43a0a2a7d7 rcu: Move RCU debug Kconfig options to kernel/rcu
RCU's debugging Kconfig options are in the unintuitive location
lib/Kconfig.debug, and there are enough of them that it would be good for
them to be more centralized.  This commit therefore extracts RCU's Kconfig
options from lib/Kconfig.debug into a new kernel/rcu/Kconfig.debug file.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 18:52:44 -07:00
Paul E. McKenney
0af92d4609 rcu: Move RCU non-debug Kconfig options to kernel/rcu
RCU's Kconfig options are scattered, and there are enough of them
that it would be good for them to be more centralized.  This commit
therefore extracts RCU's Kconfig options from init/Kconfig into a new
kernel/rcu/Kconfig file.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 18:52:44 -07:00
Paul E. McKenney
44c65ff2e3 rcu: Eliminate NOCBs CPU-state Kconfig options
The CONFIG_RCU_NOCB_CPU_ALL, CONFIG_RCU_NOCB_CPU_NONE, and
CONFIG_RCU_NOCB_CPU_ZERO Kconfig options are used only in testing and
are redundant with the rcu_nocbs= boot parameter.  This commit therefore
removes these three Kconfig options and adjusts the rcutorture scripts
to use the boot parameter instead.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 18:52:43 -07:00
Paul E. McKenney
ae91aa0adb rcu: Remove debugfs tracing
RCU's debugfs tracing used to be the only reasonable low-level debug
information available, but ftrace and event tracing have since surpassed
the RCU debugfs level of usefulness.  This commit therefore removes
RCU's debugfs tracing.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 18:52:43 -07:00
Paul E. McKenney
bd8cc5a062 srcu: Remove Classic SRCU
Classic SRCU was only ever intended to be a fallback in case of issues
with Tree/Tiny SRCU, and the latter two are doing quite well in testing.
This commit therefore removes Classic SRCU.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 18:52:42 -07:00
Paul E. McKenney
7f0cd63330 srcu: Fix rcutorture-statistics typo
The function srcutorture_get_gp_data() duplicated the check for
sp->batch_check0.head instead of also checking sp->batch_check1.head.
The only effect of this typo would be for rcutorture statistics to
understate the fraction of time that an SRCU grace period was in flight,
and only for Classic SRCU.  This commit fixes this typo.

Reported-by: David Binderman <dcb314@hotmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 18:52:42 -07:00
Paul E. McKenney
c4a09ff752 rcu: Remove the now-obsolete PROVE_RCU_REPEATEDLY Kconfig option
The PROVE_RCU_REPEATEDLY Kconfig option was initially added due to
the volume of messages from PROVE_RCU: Doing just one per boot would
have required excessive numbers of boots to locate them all.  However,
PROVE_RCU messages are now relatively rare, so there is no longer any
reason to need more than one such message per boot.  This commit therefore
removes the PROVE_RCU_REPEATEDLY Kconfig option.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
2017-06-08 18:52:41 -07:00
Paul E. McKenney
4e4bea7427 rcu: Remove typecheck() from RCU locking wrapper functions
Because raw_spin_lock_irqsave() and raw_spin_unlock_irqrestore()
both do typecheck() on their flags argument, there is no point in
duplicating this check in raw_spin_lock_irqsave_rcu_node() and
raw_spin_unlock_irqrestore_rcu_node().  This commit therefore saves
a few lines by removing this duplicated check.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 18:52:40 -07:00
Paul E. McKenney
fe5ac724d8 rcu: Remove nohz_full full-system-idle state machine
The NO_HZ_FULL_SYSIDLE full-system-idle capability was added in 2013
by commit 0edd1b1784 ("nohz_full: Add full-system-idle state machine"),
but has not been used.  This commit therefore removes it.

If it turns out to be needed later, this commit can always be reverted.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-08 18:52:39 -07:00
Paul E. McKenney
f7a10a9750 rcu: Remove the RCU_KTHREAD_PRIO Kconfig option
Anything that can be done with the RCU_KTHREAD_PRIO Kconfig option can
also be done with the rcutree.kthread_prio kernel boot parameter.
This commit therefore removes this Kconfig option.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
2017-06-08 18:52:39 -07:00
Paul E. McKenney
90040c9e30 rcu: Remove *_SLOW_* Kconfig options
The RCU_TORTURE_TEST_SLOW_PREINIT, RCU_TORTURE_TEST_SLOW_PREINIT_DELAY,
RCU_TORTURE_TEST_SLOW_INIT, RCU_TORTURE_TEST_SLOW_INIT_DELAY,
RCU_TORTURE_TEST_SLOW_CLEANUP, and RCU_TORTURE_TEST_SLOW_CLEANUP_DELAY
Kconfig options are only
useful for torture testing, and there are the rcutree.gp_cleanup_delay,
rcutree.gp_init_delay, and rcutree.gp_preinit_delay kernel boot parameters
that rcutorture can use instead.  The effect of these parameters is to
artificially slow down grace period initialization and cleanup in order
to make some types of race conditions happen more often.

This commit therefore simplifies Tree RCU a bit by removing the Kconfig
options and adding the corresponding kernel parameters to rcutorture's
.boot files instead.  However, this commit also leaves out the kernel
parameters for TREE02, TREE04, and TREE07 in order to have about the
same number of tests slowed as not slowed.  TREE01, TREE03, TREE05,
and TREE06 are slowed, and the rest are not slowed.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 18:52:38 -07:00
Paul E. McKenney
a3883df393 srcu: Use rnp->lock wrappers to replace explicit memory barriers
This commit uses TREE RCU's rnp->lock wrappers to replace a few explicit
memory barriers.  This change also has the advantage of making SRCU's
memory-ordering properties be implemented in roughly the same way as they
are in Tree RCU.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 18:52:38 -07:00
Paul E. McKenney
83d40bd3bc rcu: Move rnp->lock wrappers for SRCU use
This commit moves the now-generic rnp->lock wrapper macros from
kernel/rcu/tree.h to kernel/rcu/rcu.h, thus allowing SRCU to use them.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 18:52:38 -07:00
Paul E. McKenney
bf32c76540 rcu: Convert rnp->lock wrappers to macros for SRCU use
Use of smp_mb__after_unlock_lock() would allow SRCU to omit a full
memory barrier during callback execution, so this commit converts
the raw_spin_lock_rcu_node() family of wrappers from inline functions to type-generic macros
to allow them to handle locks in srcu_node structures as well as
rcu_node structures.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 18:52:37 -07:00
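
A rough sketch of the resulting wrapper style is shown below; the actual
macros live in kernel/rcu/rcu.h and may differ in detail, and the ->lock
field is assumed to be marked __private as it is in struct rcu_node.

/*
 * Sketch: because this is a macro rather than an inline function taking a
 * struct rcu_node pointer, it works equally well on srcu_node structures,
 * and smp_mb__after_unlock_lock() supplies the ordering that SRCU
 * previously obtained from explicit memory barriers.
 */
#define raw_spin_lock_rcu_node(p)                                       \
do {                                                                    \
        raw_spin_lock(&ACCESS_PRIVATE(p, lock));                        \
        smp_mb__after_unlock_lock();                                    \
} while (0)
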
Paul E. McKenney
2464dd940e srcu: Apply trivial callback lists to shrink Tiny SRCU
The rcu_segcblist structure provides quite a bit of functionality, and
Tiny SRCU needs almost none of it.  So this commit replaces Tiny SRCU's
uses of rcu_segcblist with a simple singly linked list with tail pointer.
This change significantly reduces Tiny SRCU's memory footprint, more
than making up for the growth caused by the creation of rcu_segcblist.c

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 18:52:35 -07:00
Paul E. McKenney
5a0465e17a srcu: Shrink srcu.h by moving docbook and private function
The call_srcu() docbook entry is currently in include/linux/srcu.h,
which causes needless processing for each include point.  This commit
therefore moves this entry to kernel/rcu/srcutree.c, which the compiler
reads only once.  In addition, the srcu_batches_completed() function is
used only within RCU and its torture-test suites.  This commit therefore
also moves this function's declaration from include/linux/srcutiny.h,
include/linux/srcutree.h, and include/linux/srcuclassic.h to
kernel/rcu/rcu.h.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 18:52:35 -07:00
Paul E. McKenney
c350c00829 srcu: Prevent sdp->srcu_gp_seq_needed counter wrap
If a given CPU never happens to ever start an SRCU grace period, the
grace-period sequence counter might wrap.  If this CPU were to decide to
finally start a grace period, the state of its sdp->srcu_gp_seq_needed
might make it appear that it has already requested this grace period,
which would prevent starting the grace period.  If no other CPU ever started
a grace period again, this would look like a grace-period hang.  Even
if some other CPU took pity and started the needed grace period, the
leaf srcu_node structure's ->srcu_data_have_cbs field won't have a record
of the fact that this CPU has a callback pending, which would look like
a very localized grace-period hang.

This might seem very unlikely, but SRCU grace periods can take less than
a microsecond on small systems, which means that overflow can happen
in much less than an hour on a 32-bit embedded system.  And embedded
systems are especially likely to have long-term idle CPUs.  Therefore,
it makes sense to prevent this scenario from happening.

This commit therefore scans each srcu_data structure occasionally,
with frequency controlled by the srcutree.counter_wrap_check kernel
boot parameter.  This parameter can be set to something like 255
in order to exercise the counter-wrap-prevention code.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 18:52:34 -07:00
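
The idea can be sketched as follows; the helper name, the placement of the
check, and the locking (elided here) are assumptions, with only the boot
parameter and the srcu_data field names taken from the commit and from
Tree SRCU.

#include <linux/kernel.h>
#include <linux/moduleparam.h>
#include <linux/percpu.h>
#include <linux/rcupdate.h>
#include <linux/srcutree.h>

static ulong counter_wrap_check = (ULONG_MAX >> 2);
module_param(counter_wrap_check, ulong, 0444);

/*
 * Sketch: at the end of a grace period, occasionally pull any long-idle
 * CPU's ->srcu_gp_seq_needed forward so that it can never appear to lie
 * in the future once the grace-period sequence counter wraps.
 */
static void example_srcu_check_counter_wrap(struct srcu_struct *sp,
                                            unsigned long gpseq)
{
        int cpu;

        if (gpseq & counter_wrap_check)
                return;         /* Check only occasionally. */
        for_each_possible_cpu(cpu) {
                struct srcu_data *sdp = per_cpu_ptr(sp->sda, cpu);

                /* Locking of sdp is elided in this sketch. */
                if (ULONG_CMP_GE(gpseq, sdp->srcu_gp_seq_needed + 100))
                        sdp->srcu_gp_seq_needed = gpseq;
        }
}
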
Paul E. McKenney
fe21a27e8c rcu: Move rcu_request_urgent_qs_task() out of rcutiny.h and rcutree.h
The rcu_request_urgent_qs_task() function is used only within RCU,
so there is no point in exporting it to the rest of the kernel from
include/linux/rcutiny.h and include/linux/rcutree.h.  This commit therefore
moves this function to kernel/rcu/rcu.h.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 18:52:33 -07:00
Paul E. McKenney
e3c8d51e1a rcu: Move torture-related functions out of rcutiny.h and rcutree.h
The various functions similar to rcu_batches_started(), the
function show_rcu_gp_kthreads(), the various functions similar to
rcu_force_quiescent_state(), and the variables rcutorture_testseq and
rcutorture_vernum are used only within RCU.  There is therefore no point
in exporting them to the kernel at large from include/linux/rcutiny.h
and include/linux/rcutree.h.  This commit therefore moves all of these
to kernel/rcu/rcu.h.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 18:52:33 -07:00
Paul E. McKenney
b8989b7605 rcu: Move rcu_ftrace_dump() from rcupdate.h to rcu.h
The rcu_ftrace_dump() function is used only internally to RCU.  This
commit therefore moves its declaration from include/linux/rcupdate.h
to kernel/rcu/rcu.h.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 18:52:32 -07:00
Paul E. McKenney
3d54f7983f rcu: Move rcu_is_nocb_cpu() from rcupdate.h to rcu.h
The rcu_is_nocb_cpu() function is used only internally to RCU.  This
commit therefore moves its declaration from include/linux/rcupdate.h
to kernel/rcu/rcu.h.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 18:52:31 -07:00
Paul E. McKenney
fa3c664769 rcu: Improve __call_rcu() debug-objects error message
The "__call_rcu(): Leaked duplicate callback" error message from
__call_rcu() has proven to be unhelpful.  This commit therefore changes
it to "__call_rcu(): Double-freed CB" and adds the value of the pointer
passed in.  The value of the pointer improves debuggability by allowing
correlation with tracing output, for example, the rcu:rcu_callback trace
event.

Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 18:52:31 -07:00
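
A hedged sketch of the check at the top of __call_rcu() follows; the exact
message text and the surrounding code are assumptions, and
debug_rcu_head_queue() is the existing debug-objects hook that returns
nonzero for an already-queued callback.

        if (debug_rcu_head_queue(head)) {
                /*
                 * Probable double call_rcu(): print the rcu_head pointer so
                 * that it can be correlated with rcu:rcu_callback trace events.
                 */
                WARN_ONCE(1, "__call_rcu(): Double-freed CB %p!!!\n", head);
                return;
        }
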
Paul E. McKenney
82118249d0 rcu: Move the RCU_SCHEDULER_ definitions from rcupdate.h
The RCU_SCHEDULER_INACTIVE, RCU_SCHEDULER_INIT, and RCU_SCHEDULER_RUNNING
definitions are used only within RCU, so this commit moves them from
include/linux/rcupdate.h to kernel/rcu/rcu.h.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 18:52:30 -07:00
Paul E. McKenney
791875d16e rcu: Eliminate the unused __rcu_is_watching() function
The __rcu_is_watching() function is currently not used, aside from
to implement the rcu_is_watching() function.  This commit therefore
eliminates __rcu_is_watching(), which has the beneficial side-effect
of shrinking include/linux/rcupdate.h a bit.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 18:52:30 -07:00
Paul E. McKenney
cad7b38972 rcu: Move torture-related definitions from rcupdate.h to rcu.h
The include/linux/rcupdate.h file contains a number of definitions that
are used only to communicate between rcutorture, rcuperf, and the RCU code
itself.  There is no point in having these definitions exposed globally
throughout the kernel, so this commit moves them to kernel/rcu/rcu.h.
This change has the added benefit of shrinking rcupdate.h.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 18:52:28 -07:00
Paul E. McKenney
25c36329a3 rcu: Move expediting-related access/control out of rcupdate.h
The rcu_gp_is_normal(), rcu_gp_is_expedited(), rcu_expedite_gp(), and
rcu_unexpedite_gp() functions are intended only for use within the
RCU implementation itself -- the sysfs access is what should be used
outside of RCU.  This commit therefore moves the declarations for
these functions to kernel/rcu/rcu.h, and also includes this file into
kernel/rcu/rcutorture.c and kernel/rcu/rcuperf.c.  This also has the
beneficial effect of shrinking rcupdate.h a bit.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 18:52:28 -07:00
Paul E. McKenney
3caec62fbb rcu: Move rcu_expedited and rcu_normal externs from rcupdate.h
The rcu_expedited and rcu_normal variables are used only by sysctl
and kernel/rcu/update.c, so it does not make sense to their extern
declarations in rcupdate.h.  This commit therefore moves these
extern declarations to update.c.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 18:52:27 -07:00
Paul E. McKenney
a68a2bb28b rcu: Move docbook comments out of rcupdate.h
The include/linux/rcupdate.h file is included by more than 200
files, so shrinking it should provide some build-time benefits.
This commit therefore moves several docbook comments from rcupdate.h to
kernel/rcu/update.c, kernel/rcu/tree.c, and kernel/rcu/tree_plugin.h, thus
reducing the number of times that the compiler has to scan these comments.
This likely provides only a small benefit, but every little bit helps.

This commit also fixes a malformed bulleted list noted by the 0day
Test Robot.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 18:52:27 -07:00
Paul E. McKenney
6b5fc3a133 rcu: Add memory barriers for NOCB leader wakeup
Wait/wakeup operations do not guarantee ordering on their own.  Instead,
either locking or memory barriers are required.  This commit therefore
adds memory barriers to wake_nocb_leader() and nocb_leader_wait().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Krister Johansen <kjlx@templeofstupid.com>
Cc: <stable@vger.kernel.org> # 4.6.x
2017-06-08 18:51:59 -07:00
Paul E. McKenney
511324e462 rcu: Use RCU_NOCB_WAKE rather than RCU_NOGP_WAKE
The RCU_NOGP_WAKE_NOT, RCU_NOGP_WAKE, and RCU_NOGP_WAKE_FORCE flags
are used to mediate wakeups for the no-CBs CPU kthreads.  The "NOGP"
really doesn't make any sense, so this commit does s/NOGP/NOCB/.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 08:25:40 -07:00
Paul E. McKenney
68ab0b4263 rcu: Make synchronize_rcu_mult() check for duplicates
Currently, doing synchronize_rcu_mult(call_rcu, call_rcu) might
(or might not) wait for two RCU grace periods.  One approach is
of course "don't do that!", but in CONFIG_PREEMPT=n kernels,
synchronize_rcu_mult(call_rcu, call_rcu_sched) does exactly that.
This results in an ugly #ifdef in sched_cpu_deactivate().

This commit therefore makes __wait_rcu_gp() check for duplicates,
which in turn allows duplicates to be passed to synchronize_rcu_mult()
without risk of waiting twice on the same type of grace period.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 08:25:39 -07:00
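
The duplicate check can be sketched as below, excerpted from the posting loop
of __wait_rcu_gp() with declarations elided; variable names follow that
function's parameters but are reproduced from memory.

        for (i = 0; i < n; i++) {
                int j;

                /* Skip any flavor already seen earlier in the array. */
                for (j = 0; j < i; j++)
                        if (crcu_array[j] == crcu_array[i])
                                break;
                if (j != i)
                        continue;
                init_rcu_head_on_stack(&rs_array[i].head);
                init_completion(&rs_array[i].completion);
                (crcu_array[i])(&rs_array[i].head, wakeme_after_rcu);
        }
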
Paul E. McKenney
a602538e46 srcu: Add DEBUG_OBJECTS_RCU_HEAD functionality
This commit adds DEBUG_OBJECTS_RCU_HEAD checking to detect the call_srcu()
counterparts of double-free bugs.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 08:25:39 -07:00
Paul E. McKenney
d4efe6c5ad srcu: Shrink Tiny SRCU a bit
In Tiny SRCU, __srcu_read_lock() is a trivial function, outweighed by
its EXPORT_SYMBOL_GPL() and, on many architectures, by its call sequence.
This commit therefore moves it to srcutiny.h so that it can be inlined.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 08:25:38 -07:00
Paul E. McKenney
ea9b0c8a26 rcu: Add lockdep_assert_held() teeth to tree_plugin.h
Comments can be helpful, but assertions carry more force.  This commit
therefore adds lockdep_assert_held() and RCU_LOCKDEP_WARN() calls to
enforce lock-held and interrupt-disabled preconditions.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 08:25:37 -07:00
Paul E. McKenney
c0b334c5bf rcu: Add lockdep_assert_held() teeth to tree.c
Comments can be helpful, but assertions carry more force.  This
commit therefore adds lockdep_assert_held() and RCU_LOCKDEP_WARN()
calls to enforce lock-held and interrupt-disabled preconditions.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 08:25:37 -07:00
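
The style of assertion added by the two commits above can be sketched with a
purely hypothetical structure and function:

#include <linux/irqflags.h>
#include <linux/lockdep.h>
#include <linux/rcupdate.h>
#include <linux/spinlock.h>

struct example_state {          /* Hypothetical, for illustration only. */
        spinlock_t lock;
        int value;
};

/* Preconditions are documented with checks that lockdep can enforce. */
static void example_update(struct example_state *esp)
{
        lockdep_assert_held(&esp->lock);
        RCU_LOCKDEP_WARN(!irqs_disabled(),
                         "example_update() invoked with interrupts enabled");
        esp->value++;
}
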
Paul E. McKenney
0c8e0e3c37 srcu: Print non-default exp_holdoff values at boot time
This commit makes srcu_bootup_announce() check for non-default values
of the auto-expedite holdoff time exp_holdoff and print a message if so.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 08:25:36 -07:00
Paul E. McKenney
b5815e6cd3 srcu: Make exp_holdoff module parameter be static
Because exp_holdoff is not used outside of srcutree.c, it can be static.
This commit therefore makes this change.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 08:25:36 -07:00
Paul E. McKenney
17c7798bea rcu: Update rcu_bootup_announce_oddness()
This commit updates rcu_bootup_announce_oddness() to check additional
Kconfig options and module/boot parameters.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 08:25:35 -07:00
Paul E. McKenney
59d80fd835 rcu: Print out rcupdate.c non-default boot-time settings
This commit adds a rcupdate_announce_bootup_oddness() function to
print out non-default values of significant kernel boot parameter
settings to aid in debugging.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 08:25:35 -07:00
Paul E. McKenney
f4687d2637 rcu: Add preemptibility checks in rcu_sched_qs() and rcu_bh_qs()
This commit adds WARN_ON_ONCE() calls that trigger if either
rcu_sched_qs() or rcu_bh_qs() is invoked with preemption enabled.
In the immortal words of Peter Zijlstra: "these are much harder to ignore
than comments".

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 08:25:34 -07:00
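
A minimal sketch of the added check follows, assuming the WARN_ON_ONCE() form
described above; the rest of the function body is elided.

#include <linux/bug.h>
#include <linux/preempt.h>

void rcu_sched_qs(void)
{
        /* preemptible() is true only with preemption enabled and irqs on. */
        WARN_ON_ONCE(preemptible());
        /* ... record the rcu_sched quiescent state as before ... */
}
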
Paul E. McKenney
820687a7b9 rcuperf: Add writer_holdoff boot parameter
This commit adds a writer_holdoff boot parameter to rcuperf, which is
intended to be used to test Tree SRCU's auto-expediting.  This
boot parameter is in microseconds, and defaults to zero (that is,
disabled).  Set it to a value a bit larger than srcutree.exp_holdoff,
keeping the nanosecond-to-microsecond conversion in mind, to force Tree SRCU
to auto-expedite more aggressively.

This commit also adds documentation for this parameter, and fixes some
alphabetization while in the neighborhood.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 08:25:32 -07:00
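
A sketch of the parameter and its use follows; rcuperf itself may declare the
parameter through its torture_param() helper rather than the raw
module_param() shown here, and the loop excerpt elides the surrounding
timing code.

#include <linux/delay.h>
#include <linux/moduleparam.h>

static int writer_holdoff;
module_param(writer_holdoff, int, 0444);
MODULE_PARM_DESC(writer_holdoff, "Holdoff (us) between grace periods, zero to disable");

/* In the writer kthread's loop, before timing the next grace period: */
        if (writer_holdoff)
                udelay(writer_holdoff);
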
Paul E. McKenney
492b95e597 rcuperf: Set more user-friendly defaults
Common-case use of rcuperf must set rcuperf.nreaders=0 and, if not built
as a module, rcuperf.shutdown.  This commit therefore sets the default
for rcuperf.nreaders to zero and sets the default for rcuperf.shutdown
to zero if rcuperf is built as a module and to one otherwise.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 08:25:31 -07:00
Paul E. McKenney
3ddf20c953 srcu: Shrink Tiny SRCU a bit more
This commit rearranges Tiny SRCU's srcu_struct structure, substitutes
u8 for bool, and shrinks counters down to short.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 08:25:31 -07:00
Paul E. McKenney
1f4f6da1c8 srcu: Make Classic and Tree SRCU announce themselves at bootup
Currently, the only way to tell whether a given kernel is running
Classic, Tiny, or Tree SRCU is to look at the .config file, which
can easily be lost or associated with the wrong kernel.  This commit
therefore has Classic and Tree SRCU identify themselves at boot time.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 08:25:30 -07:00
Paul E. McKenney
f60cb4d4c8 rcuperf: Add test for dynamically initialized srcu_struct
This commit adds a perf_type of "srcud", which specifies that rcuperf
test SRCU on a dynamically initialized srcu_struct.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 08:25:28 -07:00
Paul E. McKenney
dcfc315b7b rcu: Make sync_rcu_preempt_exp_done() return bool
The sync_rcu_preempt_exp_done() function returns a logical expression,
but its return type is nevertheless int.  This commit therefore changes
the return type to bool.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 08:25:27 -07:00
Paul E. McKenney
881ed593a3 rcuperf: Add ability to performance-test call_rcu() and friends
This commit upgrades rcuperf so that it can do performance testing on
asynchronous grace-period primitives such as call_srcu().  There is
a new rcuperf.gp_async module parameter that specifies this new behavior,
with the pre-existing rcuperf.gp_exp testing expedited grace periods such as
synchronize_rcu_expedited, and with the default being to test synchronous
non-expedited grace periods such as synchronize_rcu().

There is also a new rcuperf.gp_async_max module parameter that specifies
the maximum number of outstanding callbacks per writer kthread, defaulting
to 1,000.  When this limit is exceeded, the writer thread invokes the
appropriate flavor of rcu_barrier() to wait for callbacks to drain.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Removed the redundant initialization noted by Arnd Bergmann. ]
2017-06-08 08:25:26 -07:00
Paul E. McKenney
e28371c891 rcu: Remove obsolete reference to synchronize_kernel()
The synchronize_kernel() primitive was removed in favor of
synchronize_sched() more than a decade ago, and it seems likely that
rather few kernel hackers are familiar with it.  Its continued presence
is therefore providing more confusion than enlightenment.  This commit
therefore removes the reference from the synchronize_sched() header
comment, and adds the corresponding information to the synchronize_rcu()
header comment.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 08:25:25 -07:00
Paul E. McKenney
9683937df9 rcuperf: Defer expedited/normal check to end of test
Current rcuperf startup checks to see if the user asked to measure
only expedited grace periods, yet constrained all grace periods to be
normal, or if the user asked to measure only normal grace periods, yet
constrained all grace periods to be expedited.  Useless tests of this
sort are aborted.

Unfortunately, making RCU work through the mid-boot dead zone [1] puts
RCU into expedited-only mode during that zone.  Which happens to also
be the exact time that rcuperf carries out the aforementioned check.
So if the user asks rcuperf to measure only normal grace periods (the
default), rcuperf will now always complain and terminate the test.

This commit therefore moves the checks to rcu_perf_cleanup().  This has
the disadvantage of failing to abort useless tests, but avoids the need to
create yet another kthread and the need to do fiddly checks involving the
holdoff time.  (Yes, another approach is to do the checks in a late-stage
init function, but that would require some way to communicate badness
to rcuperf's kthreads, and seems not worth the bother.)

[1] https://lwn.net/Articles/716148/

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 08:25:24 -07:00
Paul E. McKenney
5b72f9643b rcu: Complain if blocking in preemptible RCU read-side critical section
Although preemptible RCU allows its read-side critical sections to be
preempted, general blocking is forbidden.  The reason for this is that
excessive preemption times can be handled by CONFIG_RCU_BOOST=y, but a
voluntarily blocked task doesn't care how high you boost its priority.
Because preemptible RCU is a global mechanism, one ill-behaved reader
hurts everyone.  Hence the prohibition against general blocking in
RCU-preempt read-side critical sections.  Preemption yes, blocking no.

This commit enforces this prohibition.

There is a special exception for the -rt patchset (which they kindly
volunteered to implement):  It is OK to block (as opposed to merely being
preempted) within an RCU-preempt read-side critical section, but only if
the blocking is subject to priority inheritance.  This exception permits
CONFIG_RCU_BOOST=y to get -rt RCU readers out of trouble.

Why doesn't this exception also apply to mainline's rt_mutex?  Because
of the possibility that someone does general blocking while holding
an rt_mutex.  Yes, the priority boosting will affect the rt_mutex,
but it won't help with the task doing general blocking while holding
that rt_mutex.

Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 08:25:24 -07:00
Paul E. McKenney
881ec9d209 srcu: Eliminate possibility of destructive counter overflow
Earlier versions of Tree SRCU were subject to a counter overflow bug that
could theoretically result in too-short grace periods.  This commit
eliminates this problem by adding an update-side memory barrier.
The short explanation is that if the updater sums the unlock counts
too late to see a given __srcu_read_unlock() increment, that CPU's
next __srcu_read_lock() must see the new value of ->srcu_idx, thus
incrementing the other bank of counters.  This eliminates the possibility
of destructive counter overflow as long as the srcu_read_lock() nesting
level does not exceed floor(ULONG_MAX/NR_CPUS/2), which should be an
eminently reasonable nesting limit, especially on 64-bit systems.

Reported-by: Lance Roy <ldr709@gmail.com>
Suggested-by: Lance Roy <ldr709@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 08:25:23 -07:00
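
The update-side check with the new barrier can be sketched as follows; the
helper names follow Tree SRCU's conventions but are reproduced from memory
rather than taken from the commit.

static bool example_srcu_readers_active_idx_check(struct srcu_struct *sp, int idx)
{
        unsigned long unlocks = srcu_readers_unlock_idx(sp, idx);

        /*
         * Order the reads of the unlock counters before the reads of the
         * lock counters: any reader whose unlock increment is missed above
         * must see the new ->srcu_idx in its next srcu_read_lock() and thus
         * increment the other bank of counters, bounding the error.
         */
        smp_mb(); /* A */
        return srcu_readers_lock_idx(sp, idx) == unlocks;
}
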
Paul E. McKenney
f92c734f02 rcu: Prevent rcu_barrier() from starting needless grace periods
Currently rcu_barrier() uses call_rcu() to enqueue new callbacks
on each CPU with a non-empty callback list.  This works, but means
that rcu_barrier() forces grace periods that are not otherwise needed.
The key point is that rcu_barrier() never needs to wait for a grace
period, but instead only for all pre-existing callbacks to be invoked.
This means that rcu_barrier()'s new callbacks should be placed in
the callback-list segment containing the last pre-existing callback.

This commit makes this change using the new rcu_segcblist_entrain()
function.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 08:25:22 -07:00
Paolo Bonzini
1123a60416 srcu: Allow use of Classic SRCU from both process and interrupt context
Linu Cherian reported a WARN in cleanup_srcu_struct() when shutting
down a guest running iperf on a VFIO assigned device.  This happens
because irqfd_wakeup() calls srcu_read_lock(&kvm->irq_srcu) in interrupt
context, while a worker thread does the same inside kvm_set_irq().  If the
interrupt happens while the worker thread is executing __srcu_read_lock(),
updates to the Classic SRCU ->lock_count[] field or the Tree SRCU
->srcu_lock_count[] field can be lost.

The docs say you are not supposed to call srcu_read_lock() and
srcu_read_unlock() from irq context, but KVM interrupt injection happens
from (host) interrupt context and it would be nice if SRCU supported the
use case.  KVM is using SRCU here not really for the "sleepable" part,
but rather due to its IPI-free fast detection of grace periods.  It is
therefore not desirable to switch back to RCU, which would effectively
revert commit 719d93cd5f ("kvm/irqchip: Speed up KVM_SET_GSI_ROUTING",
2014-01-16).

However, the docs are overly conservative.  You can have an SRCU instance
that only has users in irq context, and you can mix process and irq context
as long as process-context users disable interrupts.  In addition,
__srcu_read_unlock() actually uses this_cpu_dec() on both Tree SRCU and
Classic SRCU.  For those two implementations, only srcu_read_lock()
is unsafe.

When Classic SRCU's __srcu_read_unlock() was changed to use this_cpu_dec(),
in commit 5a41344a3d ("srcu: Simplify __srcu_read_unlock() via
this_cpu_dec()", 2012-11-29), __srcu_read_lock() did two increments.
Therefore it kept __this_cpu_inc(), with preempt_disable/enable in
the caller.  Tree SRCU however only does one increment, so on most
architectures it is more efficient for __srcu_read_lock() to use
this_cpu_inc(), and any performance differences appear to be down in
the noise.

Cc: stable@vger.kernel.org
Fixes: 719d93cd5f ("kvm/irqchip: Speed up KVM_SET_GSI_ROUTING")
Reported-by: Linu Cherian <linuc.decode@gmail.com>
Suggested-by: Linu Cherian <linuc.decode@gmail.com>
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 08:25:19 -07:00
Paolo Bonzini
cdf7abc461 srcu: Allow use of Tiny/Tree SRCU from both process and interrupt context
Linu Cherian reported a WARN in cleanup_srcu_struct() when shutting
down a guest running iperf on a VFIO assigned device.  This happens
because irqfd_wakeup() calls srcu_read_lock(&kvm->irq_srcu) in interrupt
context, while a worker thread does the same inside kvm_set_irq().  If the
interrupt happens while the worker thread is executing __srcu_read_lock(),
updates to the Classic SRCU ->lock_count[] field or the Tree SRCU
->srcu_lock_count[] field can be lost.

The docs say you are not supposed to call srcu_read_lock() and
srcu_read_unlock() from irq context, but KVM interrupt injection happens
from (host) interrupt context and it would be nice if SRCU supported the
use case.  KVM is using SRCU here not really for the "sleepable" part,
but rather due to its IPI-free fast detection of grace periods.  It is
therefore not desirable to switch back to RCU, which would effectively
revert commit 719d93cd5f ("kvm/irqchip: Speed up KVM_SET_GSI_ROUTING",
2014-01-16).

However, the docs are overly conservative.  You can have an SRCU instance
that only has users in irq context, and you can mix process and irq context
as long as process-context users disable interrupts.  In addition,
__srcu_read_unlock() actually uses this_cpu_dec() on both Tree SRCU and
Classic SRCU.  For those two implementations, only srcu_read_lock()
is unsafe.

When Classic SRCU's __srcu_read_unlock() was changed to use this_cpu_dec(),
in commit 5a41344a3d ("srcu: Simplify __srcu_read_unlock() via
this_cpu_dec()", 2012-11-29), __srcu_read_lock() did two increments.
Therefore it kept __this_cpu_inc(), with preempt_disable/enable in
the caller.  Tree SRCU however only does one increment, so on most
architectures it is more efficient for __srcu_read_lock() to use
this_cpu_inc(), and any performance differences appear to be down in
the noise.

Unlike Classic and Tree SRCU, Tiny SRCU does increments and decrements on
a single variable.  Therefore, as Peter Zijlstra pointed out, Tiny SRCU's
implementation already supports mixed-context use of srcu_read_lock()
and srcu_read_unlock(), at least as long as uses of srcu_read_lock()
and srcu_read_unlock() in each handler are nested and paired properly.
In other words, it is still illegal to (say) invoke srcu_read_lock()
in an interrupt handler and to invoke the matching srcu_read_unlock()
in a softirq handler.  Therefore, the only change required for Tiny SRCU
is to its comments.

Fixes: 719d93cd5f ("kvm/irqchip: Speed up KVM_SET_GSI_ROUTING")
Reported-by: Linu Cherian <linuc.decode@gmail.com>
Suggested-by: Linu Cherian <linuc.decode@gmail.com>
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Paolo Bonzini <pbonzini@redhat.com>
2017-06-08 08:24:26 -07:00
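
For reference, Tree SRCU's reader entry after this change looks roughly as
follows; the field names follow Tree SRCU but are reproduced from memory.
The key point is that the single this_cpu_inc() is interrupt-safe, which is
what permits use from both process and irq context.

int __srcu_read_lock(struct srcu_struct *sp)
{
        int idx;

        idx = READ_ONCE(sp->srcu_idx) & 0x1;
        this_cpu_inc(sp->sda->srcu_lock_count[idx]);
        smp_mb(); /* B */  /* Order critical-section accesses after the increment. */
        return idx;
}
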
Linus Torvalds
de4d195308 Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU updates from Ingo Molnar:
 "The main changes are:

   - Debloat RCU headers

   - Parallelize SRCU callback handling (plus overlapping patches)

   - Improve the performance of Tree SRCU on a CPU-hotplug stress test

   - Documentation updates

   - Miscellaneous fixes"

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (74 commits)
  rcu: Open-code the rcu_cblist_n_lazy_cbs() function
  rcu: Open-code the rcu_cblist_n_cbs() function
  rcu: Open-code the rcu_cblist_empty() function
  rcu: Separately compile large rcu_segcblist functions
  srcu: Debloat the <linux/rcu_segcblist.h> header
  srcu: Adjust default auto-expediting holdoff
  srcu: Specify auto-expedite holdoff time
  srcu: Expedite first synchronize_srcu() when idle
  srcu: Expedited grace periods with reduced memory contention
  srcu: Make rcutorture writer stalls print SRCU GP state
  srcu: Exact tracking of srcu_data structures containing callbacks
  srcu: Make SRCU be built by default
  srcu: Fix Kconfig botch when SRCU not selected
  rcu: Make non-preemptive schedule be Tasks RCU quiescent state
  srcu: Expedite srcu_schedule_cbs_snp() callback invocation
  srcu: Parallelize callback handling
  kvm: Move srcu_struct fields to end of struct kvm
  rcu: Fix typo in PER_RCU_NODE_PERIOD header comment
  rcu: Use true/false in assignment to bool
  rcu: Use bool value directly
  ...
2017-05-10 10:30:46 -07:00
Paul E. McKenney
933dfbd7c4 rcu: Open-code the rcu_cblist_n_lazy_cbs() function
Because the rcu_cblist_n_lazy_cbs() function just samples the ->len_lazy counter,
and because the rcu_cblist structure is quite straightforward, it makes
sense to open-code rcu_cblist_n_lazy_cbs(p) as p->len_lazy, cutting out
a level of indirection.  This commit makes this change.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-02 09:22:48 -07:00
Paul E. McKenney
4b27f20b40 rcu: Open-code the rcu_cblist_n_cbs() function
Because the rcu_cblist_n_cbs() function just samples the ->len counter, and
because the rcu_cblist structure is quite straightforward, it makes
sense to open-code rcu_cblist_n_cbs(p) as p->len, cutting out a level
of indirection.  This commit makes this change.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-02 09:21:59 -07:00
Paul E. McKenney
8ef0f37efb rcu: Open-code the rcu_cblist_empty() function
Because the rcu_cblist_empty() function just samples the ->head pointer, and
because the rcu_cblist structure is quite straightforward, it makes
sense to open-code rcu_cblist_empty(p) as !p->head, cutting out a
level of indirection.  This commit makes this change.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-02 08:18:40 -07:00
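
The three open-coding commits above amount to the following kind of
transformation, shown here with a hypothetical callback-list pointer rclp;
the ->head, ->len, and ->len_lazy field names are those given in the commit
messages.

        /* Before: trivial accessor functions added a level of indirection. */
        if (!rcu_cblist_empty(rclp))
                pr_info("%ld callbacks (%ld lazy)\n",
                        rcu_cblist_n_cbs(rclp), rcu_cblist_n_lazy_cbs(rclp));

        /* After: open-coded field accesses. */
        if (rclp->head)
                pr_info("%ld callbacks (%ld lazy)\n", rclp->len, rclp->len_lazy);
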
Paul E. McKenney
98059b9861 rcu: Separately compile large rcu_segcblist functions
This commit creates a new kernel/rcu/rcu_segcblist.c file that
contains non-trivial segcblist functions.  Trivial functions
remain as static inline functions in kernel/rcu/rcu_segcblist.h.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
2017-05-02 07:21:02 -07:00
Ingo Molnar
45753c5f31 srcu: Debloat the <linux/rcu_segcblist.h> header
Linus noticed that the <linux/rcu_segcblist.h> header has huge inline
functions that should not be inline at all.

As a first step in cleaning this up, move them all to kernel/rcu/ and
only keep an absolute minimum of data type defines in the header:

  before:   -rw-r--r-- 1 mingo mingo 22284 May  2 10:25 include/linux/rcu_segcblist.h
   after:   -rw-r--r-- 1 mingo mingo  3180 May  2 10:22 include/linux/rcu_segcblist.h

More can be done, such as uninlining the large functions, whose inlining
is unjustified even if it is an RCU-internal matter.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-05-02 06:29:22 -07:00
Paul E. McKenney
b5fe223a4b srcu: Adjust default auto-expediting holdoff
The default value for the kernel boot parameter srcutree.exp_holdoff
is 50 microseconds, which is too long for good Tree SRCU performance
(compared to Classic SRCU) on the workloads tested by Mike Galbraith.
This commit therefore sets the default value to 25 microseconds, which
shows excellent results in Mike's testing.

Reported-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Mike Galbraith <efault@gmx.de>
2017-04-27 08:35:24 -07:00
Paul E. McKenney
22607d66bb srcu: Specify auto-expedite holdoff time
On small systems, in the absence of readers, expedited SRCU grace
periods can complete in less than a microsecond.  This means that an
eight-CPU system can have all CPUs doing synchronize_srcu() in a tight
loop and almost always expedite.  This might actually be desirable in
some situations, but in general it is a good way to needlessly burn
CPU cycles.  And in those situations where it is desirable, your friend
is the function synchronize_srcu_expedited().

For other situations, this commit adds a kernel parameter that specifies
a holdoff between completing the last SRCU grace period and auto-expediting
the next.  If the next grace period starts before the holdoff expires,
auto-expediting is disabled.  The holdoff is 50 microseconds by default,
and can be tuned to the desired number of nanoseconds.  A value of zero
disables auto-expediting.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Mike Galbraith <efault@gmx.de>
2017-04-26 16:32:17 -07:00
Paul E. McKenney
2da4b2a7fd srcu: Expedite first synchronize_srcu() when idle
Classic SRCU in effect expedites the first synchronize_srcu() when SRCU
is idle, and Mike Galbraith demonstrated that some use cases do in fact
rely on this behavior.  In particular, Mike showed that Steven Rostedt's
hotplug stress script takes 55 seconds with Classic SRCU and more than
16 -minutes- when running Tree SRCU.  Assuming that each Tree SRCU's call
to synchronize_srcu() takes four milliseconds, this implies that Steven's
test invokes synchronize_srcu() in isolation, but more than once per
200 microseconds.  Mike used ftrace to demonstrate that the time between
successive calls to synchronize_srcu() ranged from 118 to 342 microseconds,
with one outlier at 80 milliseconds.  This data clearly indicates that
Tree SRCU needs to expedite the first invocation of synchronize_srcu()
during an SRCU idle period.

This commit therefore introduces a srcu_might_be_idle() function that
probabilistically checks whether or not SRCU is idle.  This function is
used by synchronize_srcu() as an additional criterion in deciding whether
or not to expedite.

(Hat trick to Peter Zijlstra for his earlier suggestion that this might
in fact be a problem.  Which for all I know might have motivated Mike to
look into it.)

Reported-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Mike Galbraith <efault@gmx.de>
2017-04-26 16:32:16 -07:00
Paul E. McKenney
1e9a038b7f srcu: Expedited grace periods with reduced memory contention
Commit f60d231a87 ("srcu: Crude control of expedited grace periods")
introduced a per-srcu_struct atomic counter to track outstanding
requests for grace periods.  This works, but represents a memory-contention
bottleneck.  This commit therefore uses the srcu_node combining tree
to remove this bottleneck.

This commit adds new ->srcu_gp_seq_needed_exp fields to the
srcu_data, srcu_node, and srcu_struct structures, which track the
farthest-in-the-future grace period that must be expedited, which in
turn requires that all nearer-term grace periods also be expedited.
Requests for expediting start with the srcu_data structure, run up
through the srcu_node tree, and end at the srcu_struct structure.
Note that it may be necessary to expedite a grace period that just
now started, and this is handled by a new srcu_funnel_exp_start()
function, which is invoked when the grace period itself is already
under way but was not marked as expedited.

A new srcu_get_delay() function returns zero if there is at least one
expedited SRCU grace period in flight, or SRCU_INTERVAL otherwise.
This function is used to calculate delays:  Normal grace periods
are allowed to extend in order to cover more requests with a given
grace-period computation, which decreases per-request overhead.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Mike Galbraith <efault@gmx.de>
2017-04-26 16:32:16 -07:00
Paul E. McKenney
7f6733c3c6 srcu: Make rcutorture writer stalls print SRCU GP state
In the past, SRCU was simple enough that there was little point in
making the rcutorture writer stall messages print the SRCU grace-period
number state.  With the advent of Tree SRCU, this has changed.  This
commit therefore makes Classic, Tiny, and Tree SRCU report this state
to rcutorture as needed.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Mike Galbraith <efault@gmx.de>
2017-04-26 11:23:28 -07:00
Paul E. McKenney
c7e88067c1 srcu: Exact tracking of srcu_data structures containing callbacks
The current Tree SRCU implementation schedules a workqueue for every
srcu_data covered by a given leaf srcu_node structure having callbacks,
even if only one of those srcu_data structures actually contains
callbacks.  This is clearly inefficient for workloads that don't feature
callbacks everywhere all the time.  This commit therefore adds an array
of masks that are used by the leaf srcu_node structures to track exactly
which srcu_data structures contain callbacks.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Mike Galbraith <efault@gmx.de>
2017-04-26 11:23:12 -07:00
Paul E. McKenney
f2094107ac Merge branches 'doc.2017.04.12a', 'fixes.2017.04.19a' and 'srcu.2017.04.21a' into HEAD
doc.2017.04.12a: Documentation updates
fixes.2017.04.19a: Miscellaneous fixes
srcu.2017.04.21a: Parallelize SRCU callback handling
2017-04-21 06:00:13 -07:00
Paul E. McKenney
bcbfdd01dc rcu: Make non-preemptive schedule be Tasks RCU quiescent state
Currently, a call to schedule() acts as a Tasks RCU quiescent state
only if a context switch actually takes place.  However, just the
call to schedule() guarantees that the calling task has moved off of
whatever tracing trampoline that it might have been one previously.
This commit therefore plumbs schedule()'s "preempt" parameter into
rcu_note_context_switch(), which then records the Tasks RCU quiescent
state, but only if this call to schedule() was -not- due to a preemption.

To avoid adding overhead to the common-case context-switch path,
this commit hides the rcu_note_context_switch() check under an existing
non-common-case check.

Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-04-21 05:59:27 -07:00
Paul E. McKenney
0497b489b8 srcu: Expedite srcu_schedule_cbs_snp() callback invocation
Although Tree SRCU does reduce delays when there is at least one
synchronize_srcu_expedited() invocation pending, srcu_schedule_cbs_snp()
still waits for SRCU_INTERVAL before invoking callbacks.  Since
synchronize_srcu_expedited() now posts a callback and waits for
that callback to do a wakeup, this destroys the expedited nature of
synchronize_srcu_expedited().  This destruction became apparent to
Marc Zyngier in the guise of a guest-OS bootup slowdown from five
seconds to no fewer than forty seconds.

This commit therefore invokes callbacks immediately at the end of the
grace period when there is at least one synchronize_srcu_expedited()
invocation pending.  This brought Marc's guest-OS bootup times back
into the realm of reason.

Reported-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Marc Zyngier <marc.zyngier@arm.com>
2017-04-21 05:59:27 -07:00
Paul E. McKenney
da915ad5cf srcu: Parallelize callback handling
Peter Zijlstra proposed using SRCU to reduce mmap_sem contention [1,2],
however, there are workloads that could result in a high volume of
concurrent invocations of call_srcu(), which with current SRCU would
result in excessive lock contention on the srcu_struct structure's
->queue_lock, which protects SRCU's callback lists.  This commit therefore
moves SRCU to per-CPU callback lists, thus greatly reducing contention.

Because a given SRCU instance no longer has a single centralized callback
list, starting grace periods and invoking callbacks are both more complex
than in the single-list Classic SRCU implementation.  Starting grace
periods and handling callbacks are now handled using an srcu_node tree
that is in some ways similar to the rcu_node trees used by RCU-bh,
RCU-preempt, and RCU-sched (for example, the srcu_node tree shape is
controlled by exactly the same Kconfig options and boot parameters that
control the shape of the rcu_node tree).

In addition, the old per-CPU srcu_array structure is now named srcu_data
and contains an rcu_segcblist structure named ->srcu_cblist for its
callbacks (and a spinlock to protect this).  The srcu_struct gets
an srcu_gp_seq that is used to associate callback segments with the
corresponding completion-time grace-period number.  These completion-time
grace-period numbers are propagated up the srcu_node tree so that the
grace-period workqueue handler can determine whether additional grace
periods are needed on the one hand and where to look for callbacks that
are ready to be invoked.

The srcu_barrier() function must now wait on all instances of the per-CPU
->srcu_cblist.  Because each ->srcu_cblist is protected by ->lock,
srcu_barrier() can remotely add the needed callbacks.  In theory,
it could also remotely start grace periods, but in practice doing so
is complex and racy.  And interestingly enough, it is never necessary
for srcu_barrier() to start a grace period because srcu_barrier() only
enqueues a callback when a callback is already present--and it turns out
that a grace period has to have already been started for this pre-existing
callback.  Furthermore, it is only the callback that srcu_barrier()
needs to wait on, not any particular grace period.  Therefore, a new
rcu_segcblist_entrain() function enqueues the srcu_barrier() function's
callback into the same segment occupied by the last pre-existing callback
in the list.  The special case where all the pre-existing callbacks are
on a different list (because they are in the process of being invoked)
is handled by enqueuing srcu_barrier()'s callback into the RCU_DONE_TAIL
segment, relying on the done-callbacks check that takes place after all
callbacks are invoked.

Note that the readers use the same algorithm as before.  Note that there
is a separate srcu_idx that tells the readers what counter to increment.
This unfortunately cannot be combined with srcu_gp_seq because they
need to be incremented at different times.

This commit introduces some ugly #ifdefs in rcutorture.  These will go
away when I feel good enough about Tree SRCU to ditch Classic SRCU.

Some crude performance comparisons, courtesy of a quickly hacked rcuperf
asynchronous-grace-period capability:

			Callback Queuing Overhead
			-------------------------
	# CPUS		Classic SRCU	Tree SRCU
	------          ------------    ---------
	     2              0.349 us     0.342 us
	    16             31.66  us     0.4   us
	    41             ---------     0.417 us

The times are the 90th percentiles, a statistic that was chosen to reject
the overheads of the occasional srcu_barrier() call needed to avoid OOMing
the test machine.  The rcuperf test hangs when running Classic SRCU at 41
CPUs, hence the line of dashes.  Despite the hacks to both the rcuperf code
and the statistics, this is a convincing demonstration of Tree SRCU's
performance and scalability advantages.

[1] https://lwn.net/Articles/309030/
[2] https://patchwork.kernel.org/patch/5108281/

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Fix initialization if synchronize_srcu_expedited() called first. ]
2017-04-21 05:59:26 -07:00
Paul E. McKenney
bfd090be14 rcu: Fix typo in PER_RCU_NODE_PERIOD header comment
This commit just changes a "the the" to "the" to reduce repetition.

Reported-by: Michalis Kokologiannakis <mixaskok@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-04-19 09:29:20 -07:00
Nicholas Mc Guire
5455a7f6a8 rcu: Use true/false in assignment to bool
This commit makes the parse_rcu_nocb_poll() function assign true
(rather than the constant 1) to the bool variable rcu_nocb_poll.

Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-04-19 09:29:20 -07:00
Nicholas Mc Guire
50dc7def4a rcu: Use bool value directly
The beenonline variable is declared bool so there is no need for an
explicit comparison, especially not against the constant zero.

Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-04-19 09:29:19 -07:00
Paul E. McKenney
deb34f3643 rcu: Improve comments for hotplug/suspend/hibernate functions
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-04-19 09:29:18 -07:00
Paul E. McKenney
d1e4f01d09 rcu: Remove obsolete comment from rcu_future_gp_cleanup() header
The rcu_nocb_gp_cleanup() function is now invoked elsewhere, so this
commit drags this comment into the year 2017.

Reported-by: Michalis Kokologiannakis <mixaskok@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-04-19 09:29:17 -07:00
Paul E. McKenney
dad81a2026 srcu: Introduce CLASSIC_SRCU Kconfig option
The TREE_SRCU rewrite is large and a bit on the non-simple side, so
this commit helps reduce risk by allowing the old v4.11 SRCU algorithm
to be selected using a new CLASSIC_SRCU Kconfig option that depends
on RCU_EXPERT.  The default is to use the new TREE_SRCU and TINY_SRCU
algorithms, in order to help get these the testing that they need.
However, if your users do not require the update-side scalability that
is to be provided by TREE_SRCU, select RCU_EXPERT and then CLASSIC_SRCU
to revert back to the old classic SRCU algorithm.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-04-18 11:38:23 -07:00
Paul E. McKenney
32071141b2 srcutorture: Print Tiny SRCU reader statistics
The srcu_torture_stats() function is adapted to the specific srcu_struct
layout traditionally used by SRCU.  This commit therefore adds support
for Tiny SRCU.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-04-18 11:38:22 -07:00
Paul E. McKenney
d8be81735a srcu: Create a tiny SRCU
In response to automated complaints about modifications to SRCU
increasing its size, this commit creates a tiny SRCU that is
used in SMP=n && PREEMPT=n builds.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-04-18 11:38:22 -07:00
Paul E. McKenney
f60d231a87 srcu: Crude control of expedited grace periods
SRCU's implementation of expedited grace periods has always assumed
that the SRCU instance is idle when the expedited request arrives.
This commit improves this a bit by maintaining a count of the number
of outstanding expedited requests, thus allowing prior non-expedited
grace periods accommodate these requests by shifting to expedited mode.
However, any non-expedited wait already in progress will still wait for
the full duration.

Improved control of expedited grace periods is planned, but one step
at a time.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-04-18 11:38:22 -07:00
Paul E. McKenney
80a7956fe3 srcu: Merge ->srcu_state into ->srcu_gp_seq
Updating ->srcu_state and ->srcu_gp_seq will lead to extremely complex
race conditions given multiple callback queues, so this commit takes
advantage of the two-bit state now available in rcu_seq counters to
store the state in the bottom two bits of ->srcu_gp_seq.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-04-18 11:38:22 -07:00
Paul E. McKenney
f1ec57a462 srcu: Allow a second bit in rcu_seq for SRCU state
This commit increases the number of reserved bits at the bottom of an
rcu_seq grace-period counter from one to two, as will be needed to
accommodate SRCU's three-state grace periods.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-04-18 11:38:21 -07:00
Paul E. McKenney
031aeee000 srcu: Improve rcu_seq grace-period-counter abstraction
The expedited grace-period code contains several open-coded shifts that
know the format of an rcu_seq grace-period counter, which is not
particularly good style.  This commit therefore creates a new
rcu_seq_ctr() function that extracts the counter portion, and an
rcu_seq_state() function that extracts the low-order
state bit.  This commit prepares for SRCU callback parallelization,
which will require two state bits.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-04-18 11:38:21 -07:00
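
The resulting helpers can be sketched as below, shown with the two low-order
state bits reserved by the companion commit above; the names match the intent
of kernel/rcu/rcu.h but are reproduced from memory.

#define RCU_SEQ_CTR_SHIFT       2
#define RCU_SEQ_STATE_MASK      ((1 << RCU_SEQ_CTR_SHIFT) - 1)

/* Counter portion of an rcu_seq grace-period counter. */
static inline unsigned long rcu_seq_ctr(unsigned long s)
{
        return s >> RCU_SEQ_CTR_SHIFT;
}

/* Low-order state portion of an rcu_seq grace-period counter. */
static inline int rcu_seq_state(unsigned long s)
{
        return s & RCU_SEQ_STATE_MASK;
}
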
Paul E. McKenney
91e27c356c srcu: Fix bogus try_check_zero() comment
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-04-18 11:38:21 -07:00
Paul E. McKenney
e95d68d212 srcu: Make num_rcu_lvl[] array be external
This commit makes the num_rcu_lvl[] array external so that SRCU can
make use of it for initializing its upcoming srcu_node tree.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-04-18 11:38:21 -07:00
Paul E. McKenney
efbe451d46 srcu: Move rcu_node traversal macros to rcu.h
This commit moves rcu_for_each_node_breadth_first(),
rcu_for_each_nonleaf_node_breadth_first(), and
rcu_for_each_leaf_node() from kernel/rcu/tree.h to
kernel/rcu/rcu.h so that SRCU can access them.
This commit is code-movement only.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-04-18 11:38:21 -07:00
Paul E. McKenney
41f5c63178 rcu: Remove redundant levelcnt[] array from rcu_init_one()
The levelcnt[] array is identical to num_rcu_lvl[], so this commit
removes levelcnt[].

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-04-18 11:38:21 -07:00
Paul E. McKenney
2b34c43cc1 srcu: Move rcu_init_levelspread() to rcu_tree_node.h
This commit moves the rcu_init_levelspread() function from
kernel/rcu/tree.c to kernel/rcu/rcu.h so that SRCU can access it.  This is
another step towards enabling SRCU to create its own combining tree.
This commit is code-movement only, give or take knock-on adjustments.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-04-18 11:38:20 -07:00
Paul E. McKenney
f2425b4efb srcu: Move combining-tree definitions for SRCU's benefit
This commit moves the C preprocessor code that defines the default shape
of the rcu_node combining tree to a new include/linux/rcu_node_tree.h
file as a first step towards enabling SRCU to create its own combining
tree, which in turn enables SRCU to implement per-CPU callback handling,
thus avoiding contention on the lock currently guarding the single list
of callbacks.  Note that users of SRCU still need to know the size of
the srcu_struct structure, hence include/linux rather than kernel/rcu.

This commit is code-movement only.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-04-18 11:38:20 -07:00
Paul E. McKenney
8660b7d8a5 srcu: Use rcu_segcblist to track SRCU callbacks
This commit switches SRCU from custom-built callback queues to the new
rcu_segcblist structure.  This change associates grace-period sequence
numbers with groups of callbacks, which will be needed for efficient
processing of per-CPU callbacks.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-04-18 11:38:20 -07:00
Paul E. McKenney
ac367c1c62 srcu: Add grace-period sequence numbers
This commit adds grace-period sequence numbers, which will be used to
handle mid-boot grace periods and per-CPU callback lists.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-04-18 11:38:20 -07:00
Paul E. McKenney
c2a8ec0778 srcu: Move to state-based grace-period sequencing
The current SRCU grace-period processing might never reach the last
portion of srcu_advance_batches().  This is OK given the current
implementation, as the first portion, up to the try_check_zero()
following the srcu_flip(), is sufficient to drive grace periods forward.
However, it has the unfortunate side effect of making it impossible to
determine when a given grace period has ended, and it will be necessary
to track the ends of grace periods in order to efficiently handle
per-CPU SRCU callback lists.

This commit therefore adds states to the SRCU grace-period processing,
so that the end of a given SRCU grace period is marked by the transition
to the SRCU_STATE_DONE state.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-04-18 11:38:20 -07:00
Paul E. McKenney
c6e56f593a srcu: Push srcu_advance_batches() fastpath into common case
This commit simplifies the SRCU state machine by pushing the
srcu_advance_batches() idle-SRCU fastpath into the common case.  This is
done by giving srcu_reschedule() a delay parameter, which is zero in
the call from srcu_advance_batches().

This commit is a step towards numbering callbacks in order to
efficiently handle per-CPU callback lists.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-04-18 11:38:19 -07:00
Dmitry Vyukov
f010ed82c7 rcu: Fix warning in rcu_seq_end()
The rcu_seq_end() function increments seq to signify completion
of a grace period, and then checks that seq is even and wakes
_synchronize_rcu_expedited().  The _synchronize_rcu_expedited() function
uses wait_event() to wait for an even seq.  The problem is that wait_event()
can return as soon as seq becomes even, without waiting for the wakeup.
In that case the warning in rcu_seq_end() can falsely fire if the next
expedited grace period starts before the check.

Check that seq has good value before incrementing it.
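
For illustration, a rough sketch of the resulting check (not necessarily
the exact kernel code): the counter must still be odd, meaning a grace
period is in progress, immediately before the increment.

static void rcu_seq_end(unsigned long *sp)
{
	smp_mb(); /* Ensure update-side operations happen before increment. */
	WARN_ON_ONCE(!(*sp & 0x1));	/* Warn on the precondition... */
	WRITE_ONCE(*sp, *sp + 1);	/* ...then publish the even value. */
}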

Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Cc: syzkaller@googlegroups.com
Cc: linux-kernel@vger.kernel.org
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: josh@joshtriplett.org
Cc: jiangshanlai@gmail.com
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

---

syzkaller-triggered warning:

WARNING: CPU: 0 PID: 4832 at kernel/rcu/tree.c:3533
rcu_seq_end+0x110/0x140 kernel/rcu/tree.c:3533
CPU: 0 PID: 4832 Comm: kworker/0:3 Not tainted 4.10.0+ #276
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Workqueue: events wait_rcu_exp_gp
Call Trace:
 __dump_stack lib/dump_stack.c:15 [inline]
 dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
 panic+0x1fb/0x412 kernel/panic.c:179
 __warn+0x1c4/0x1e0 kernel/panic.c:540
 warn_slowpath_null+0x2c/0x40 kernel/panic.c:583
 rcu_seq_end+0x110/0x140 kernel/rcu/tree.c:3533
 rcu_exp_gp_seq_end kernel/rcu/tree_exp.h:36 [inline]
 rcu_exp_wait_wake+0x8a9/0x1330 kernel/rcu/tree_exp.h:517
 rcu_exp_sel_wait_wake kernel/rcu/tree_exp.h:559 [inline]
 wait_rcu_exp_gp+0x83/0xc0 kernel/rcu/tree_exp.h:570
 process_one_work+0xc06/0x1c20 kernel/workqueue.c:2096
 worker_thread+0x223/0x19c0 kernel/workqueue.c:2230
 kthread+0x326/0x3f0 kernel/kthread.c:227
 ret_from_fork+0x31/0x40 arch/x86/entry/entry_64.S:430
---
2017-04-18 11:38:19 -07:00
Paul E. McKenney
3c345825c8 rcu: Expedited wakeups need to be fully ordered
Expedited grace periods use workqueue handlers that wake up the requesters,
but there is no lock mediating this wakeup.  Therefore, memory barriers
are required to ensure that the handler's memory references are seen by
all to occur before synchronize_*_expedited() returns to its caller.
Possibly detected by syzkaller.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-04-18 11:38:19 -07:00
Paul E. McKenney
2e8c28c2dd srcu: Move rcu_seq_start() and friends to rcu.h
This commit moves rcu_seq_start(), rcu_seq_end(), rcu_seq_snap(),
and rcu_seq_done() from kernel/rcu/tree.c to kernel/rcu/rcu.h.
This will allow SRCU to use these functions, which in turn will
allow SRCU to move from a single global callback queue to a
per-CPU callback queue.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-04-18 11:38:19 -07:00
Paul E. McKenney
bdcabf4c7d rcu: Add single-element dequeue functions to rcu_segcblist
This commit adds single-element dequeue functions to rcu_segcblist.
These are less efficient than using the extract and insert functions,
but allow more precise debugging code.  These functions are thus
expected to be used only in debug builds, for example, CONFIG_PROVE_RCU.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-04-18 11:38:19 -07:00
Paul E. McKenney
b5eaeaa509 srcu: Allow early boot use of synchronize_srcu()
This commit checks for pre-scheduler state, and if the kernel is still
that early in the boot process, synchronize_srcu() and friends are no-ops.
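
A minimal sketch of the idea, assuming the rcu_scheduler_active state
that the commit below makes visible to SRCU (illustrative, not the
exact kernel code):

void synchronize_srcu(struct srcu_struct *sp)
{
	if (rcu_scheduler_active == RCU_SCHEDULER_INACTIVE)
		return;	/* Single task, no preemption: nothing to wait for. */
	/* ... normal SRCU grace-period wait ... */
}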

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-04-18 11:38:18 -07:00
Paul E. McKenney
900b1028ec srcu: Allow SRCU to access rcu_scheduler_active
This is primarily a code-movement commit in preparation for allowing
SRCU to handle early-boot SRCU grace periods.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-04-18 11:38:18 -07:00
Paul E. McKenney
15fecf89e4 srcu: Abstract multi-tail callback list handling
RCU has only one multi-tail callback list, which is implemented via
the nxtlist, nxttail, nxtcompleted, qlen_lazy, and qlen fields in the
rcu_data structure, and whose operations are open-coded throughout the
Tree RCU implementation.  This has been more or less OK in the past,
but upcoming callback-list optimizations in SRCU could really use
a multi-tail callback list there as well.

This commit therefore abstracts the multi-tail callback list handling
into a new kernel/rcu/rcu_segcblist.h file, and uses this new API.
The simple head-and-tail pointer callback list is also abstracted and
applied everywhere except for the NOCB callback-offload lists.  (Yes,
the plan is to apply them there as well, but this commit is already
bigger than would be good.)
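
For reference, a rough sketch of the general shape of such a segmented
("multi-tail") callback list; the field names here are illustrative
rather than a promise about the exact structure:

struct rcu_segcblist {
	struct rcu_head *head;				/* Oldest queued callback. */
	struct rcu_head **tails[RCU_CBLIST_NSEGS];	/* One tail per GP segment. */
	unsigned long gp_seq[RCU_CBLIST_NSEGS];		/* GP number gating each segment. */
	long len;					/* Total callbacks queued. */
};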

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-04-18 11:38:18 -07:00
Paul E. McKenney
b8c78d3afc rcu: Default RCU_FANOUT_LEAF to 16 unless explicitly changed
If the RCU_EXPERT Kconfig option is not set (the default), then the
RCU_FANOUT_LEAF Kconfig option will not be defined, which will cause
the leaf-level rcu_node tree fanout to default to 32 on 32-bit systems
and 64 on 64-bit systems.  This can result in excessive lock contention.
This commit therefore changes the computation of the leaf-level rcu_node
tree fanout so that the result will be 16 unless an explicit Kconfig or
kernel-boot setting says otherwise.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-04-18 11:38:18 -07:00
Paul E. McKenney
9226b10d78 rcu: Place guard on rcu_all_qs() and rcu_note_context_switch() actions
The rcu_all_qs() and rcu_note_context_switch() do a series of checks,
taking various actions to supply RCU with quiescent states, depending
on the outcomes of the various checks.  This is a bit much for scheduling
fastpaths, so this commit creates a separate ->rcu_urgent_qs field in
the rcu_dynticks structure that acts as a global guard for these checks.
Thus, in the common case, rcu_all_qs() and rcu_note_context_switch()
check the ->rcu_urgent_qs field, find it false, and simply return.
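
A minimal sketch of the resulting fastpath guard, assuming the per-CPU
->rcu_urgent_qs field described above (illustrative, not the exact
kernel code):

	/* Early in rcu_all_qs() and rcu_note_context_switch(): */
	if (!raw_cpu_read(rcu_dynticks.rcu_urgent_qs))
		return;	/* Common case: skip the expensive quiescent-state checks. */
	/* ... otherwise fall through to the existing checks ... */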

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
2017-04-18 11:38:18 -07:00
Paul E. McKenney
0f9be8cabb rcu: Eliminate flavor scan in rcu_momentary_dyntick_idle()
The rcu_momentary_dyntick_idle() function scans the RCU flavors, checking
that one of them still needs a quiescent state before doing an expensive
atomic operation on the ->dynticks counter.  However, this check reduces
overhead only after a rare race condition, and increases complexity.  This
commit therefore removes the scan and the mechanism enabling the scan.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-04-18 11:38:18 -07:00
Paul E. McKenney
9577df9a31 rcu: Pull rcu_qs_ctr into rcu_dynticks structure
The rcu_qs_ctr variable is yet another isolated per-CPU variable,
so this commit pulls it into the pre-existing rcu_dynticks per-CPU
structure.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-04-18 11:38:17 -07:00
Paul E. McKenney
abb06b9948 rcu: Pull rcu_sched_qs_mask into rcu_dynticks structure
The rcu_sched_qs_mask variable is yet another isolated per-CPU variable,
so this commit pulls it into the pre-existing rcu_dynticks per-CPU
structure.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-04-18 11:38:17 -07:00
Paul E. McKenney
88a4976d0e rcu: Semicolon inside RCU_TRACE() for tree.c
The current use of "RCU_TRACE(statement);" can cause odd bugs, especially
where "statement" is a local-variable declaration, as it can leave a
misplaced ";" in the source code.  This commit therefore converts these
to "RCU_TRACE(statement;)", which avoids the misplaced ";".

Reported-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-04-18 11:38:17 -07:00
Paul E. McKenney
6c8c148542 rcu: Semicolon inside RCU_TRACE() for Tiny RCU
The current use of "RCU_TRACE(statement);" can cause odd bugs, especially
where "statement" is a local-variable declaration, as it can leave a
misplaced ";" in the source code.  This commit therefore converts these
to "RCU_TRACE(statement;)", which avoids the misplaced ";".

Reported-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-04-18 11:38:17 -07:00
Paul E. McKenney
dffd06a756 rcu: Semicolon inside RCU_TRACE() for rcu.h
The current use of "RCU_TRACE(statement);" can cause odd bugs, especially
where "statement" is a local-variable declaration, as it can leave a
misplaced ";" in the source code.  This commit therefore converts these
to "RCU_TRACE(statement;)", which avoids the misplaced ";".

Reported-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-04-18 11:38:17 -07:00
Paul E. McKenney
15c68f7f63 srcu: Check for tardy grace-period activity in cleanup_srcu_struct()
Users of SRCU are obliged to complete all grace-period activity before
invoking cleanup_srcu_struct().  This means that all calls to either
synchronize_srcu() or synchronize_srcu_expedited() must have returned,
and all calls to call_srcu() must have returned, and the last call to
call_srcu() must have been followed by a call to srcu_barrier().
Furthermore, the caller must have done something to prevent any
further calls to synchronize_srcu(), synchronize_srcu_expedited(),
and call_srcu().

Therefore, if there has ever been an invocation of call_srcu() on
the srcu_struct in question, the sequence of events must be as
follows:

1.  Prevent any further calls to call_srcu().
2.  Wait for any pre-existing call_srcu() invocations to return.
3.  Invoke srcu_barrier().
4.  It is now safe to invoke cleanup_srcu_struct().

On the other hand, if there has ever been a call to synchronize_srcu()
or synchronize_srcu_expedited(), the sequence of events must be as
follows:

1.  Prevent any further calls to synchronize_srcu() or
    synchronize_srcu_expedited().
2.  Wait for any pre-existing synchronize_srcu() or
    synchronize_srcu_expedited() invocations to return.
3.  It is now safe to invoke cleanup_srcu_struct().

If there have been calls to both types of functions (call_srcu()
and either of synchronize_srcu() and synchronize_srcu_expedited()), then
the caller must do the first three steps of the call_srcu() procedure
above and the first two steps of the synchronize_s*() procedure above,
and only then invoke cleanup_srcu_struct().
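
A hedged sketch of the call_srcu() teardown sequence; the first two
"stop" and "wait" steps are whatever mechanism the caller already uses,
shown here as hypothetical helpers:

	stop_issuing_call_srcu(&my_ctx);	/* Step 1 (caller-specific). */
	wait_for_pending_call_srcu(&my_ctx);	/* Step 2 (caller-specific). */
	srcu_barrier(&my_srcu);			/* Step 3: wait for all callbacks. */
	cleanup_srcu_struct(&my_srcu);		/* Step 4: now safe to clean up. */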

Note that cleanup_srcu_struct() does some probabilistic checks
for the caller failing to follow these procedures, in which case
cleanup_srcu_struct() does WARN_ON() and avoids freeing the per-CPU
structures associated with the specified srcu_struct structure.

Reported-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2017-04-18 11:30:58 -07:00
Paul E. McKenney
cc985822a0 srcu: Consolidate batch checking into rcu_all_batches_empty()
The srcu_reschedule() function invokes rcu_batch_empty() on each of
the four rcu_batch structures in the srcu_struct in question twice.
Given that this check will also be needed in cleanup_srcu_struct(), this
commit consolidates these four checks into a new rcu_all_batches_empty()
function.
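
A rough sketch of the consolidated helper, assuming the four rcu_batch
fields of the pre-rewrite srcu_struct (names illustrative):

static bool rcu_all_batches_empty(struct srcu_struct *sp)
{
	return rcu_batch_empty(&sp->batch_done) &&
	       rcu_batch_empty(&sp->batch_check1) &&
	       rcu_batch_empty(&sp->batch_check0) &&
	       rcu_batch_empty(&sp->batch_queue);
}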

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2017-04-18 11:22:16 -07:00
Paul E. McKenney
b8c17e6664 rcu: Maintain special bits at bottom of ->dynticks counter
Currently, IPIs are used to force other CPUs to invalidate their TLBs
in response to a kernel virtual-memory mapping change.  This works, but
degrades both battery lifetime (for idle CPUs) and real-time response
(for nohz_full CPUs), and in addition results in unnecessary IPIs due to
the fact that CPUs executing in usermode are unaffected by stale kernel
mappings.  It would be better to cause a CPU executing in usermode to
wait until it is entering kernel mode to do the flush, first to avoid
interrupting usermode tasks and second to handle multiple flush requests
with a single flush in the case of a long-running user task.

This commit therefore reserves a bit at the bottom of the ->dynticks
counter, which is checked upon exit from extended quiescent states.
If it is set, it is cleared and then a new rcu_eqs_special_exit() macro is
invoked, which, if not supplied, is an empty single-pass do-while loop.
If this bottom bit is set on -entry- to an extended quiescent state,
then a WARN_ON_ONCE() triggers.

This bottom bit may be set using a new rcu_eqs_special_set() function,
which returns true if the bit was set, or false if the CPU turned
out to not be in an extended quiescent state.  Please note that this
function refuses to set the bit for a non-nohz_full CPU when that CPU
is executing in usermode because usermode execution is tracked by RCU
as a dyntick-idle extended quiescent state only for nohz_full CPUs.
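
A hedged sketch of the set-side cmpxchg loop (constants and names are
illustrative, and the nohz_full/usermode refinement described above is
omitted):

#define RCU_DYNTICK_CTRL_MASK	0x1	/* Bottom bit: special-work request. */
#define RCU_DYNTICK_CTRL_CTR	0x2	/* Low bit of the counter proper. */

bool rcu_eqs_special_set(int cpu)
{
	int old, new;
	struct rcu_dynticks *rdtp = &per_cpu(rcu_dynticks, cpu);

	do {
		old = atomic_read(&rdtp->dynticks);
		if (old & RCU_DYNTICK_CTRL_CTR)
			return false;	/* Not in an extended quiescent state. */
		new = old | RCU_DYNTICK_CTRL_MASK;
	} while (atomic_cmpxchg(&rdtp->dynticks, old, new) != old);
	return true;		/* Bit set; will be noticed on EQS exit. */
}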

Reported-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2017-04-18 11:19:22 -07:00
Steven Rostedt (VMware)
03ecd3f48e rcu/tracing: Add rcu_disabled to denote when rcu_irq_enter() will not work
Tracing uses rcu_irq_enter() as a way to make sure that RCU is watching when
it needs to use rcu_read_lock() and friends. This is because tracing can
happen as RCU is about to enter user space, or about to go idle, and RCU
does not watch for RCU read side critical sections as it makes the
transition.

There is a small location within the RCU infrastructure in which
rcu_irq_enter() itself will not work. If tracing were to occur in that
section, it would break when it tried to use rcu_irq_enter().

Originally, this happened with the stack_tracer, because it calls
save_stack_trace() when it encounters stack usage greater than any it
had encountered previously. There was a case where that happened in the
RCU section where rcu_irq_enter() did not work, and lockdep complained
loudly about it. To fix it, stack tracing added a call that allowed it
to be disabled, and RCU disabled stack tracing during the critical
section in which rcu_irq_enter() was inoperable. This solution worked,
but there are other cases that use rcu_irq_enter(), and it would be
better for RCU to provide a way to let others, such as trace events,
know that rcu_irq_enter() will not work.

Another helpful aspect of this change is that it also moves the per-CPU
variable used in the RCU critical section into cache locality with the
other RCU per-CPU variables used in that same location.

I'm keeping the stack_trace_disable() code, as it could still be used in
the future by places that really need to disable stack tracing. And since
it's only a static inline, it won't take up any kernel text if it is not used.

Link: http://lkml.kernel.org/r/20170405093207.404f8deb@gandalf.local.home

Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-10 15:22:03 -04:00
Paul E. McKenney
a278d47189 rcu: Fix dyntick-idle tracing
The tracing subsystem started using rcu_irq_entry() and rcu_irq_exit()
(with my blessing) to allow the current _rcuidle alternative tracepoint
name to be dispensed with while still maintaining good performance.
Unfortunately, this causes RCU's dyntick-idle entry code's tracing to
appear to RCU like an interrupt that occurs where RCU is not designed
to handle interrupts.

This commit fixes this problem by moving the zeroing of ->dynticks_nesting
after the offending trace_rcu_dyntick() statement, which narrows the
window of vulnerability to a pair of adjacent statements that are now
marked with comments to that effect.

Link: http://lkml.kernel.org/r/20170405093207.404f8deb@gandalf.local.home
Link: http://lkml.kernel.org/r/20170405193928.GM1600@linux.vnet.ibm.com

Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-10 15:21:57 -04:00
Ingo Molnar
b17b01533b sched/headers: Prepare for new header dependencies before moving code to <linux/sched/debug.h>
We are going to split <linux/sched/debug.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder <linux/sched/debug.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:34 +01:00
Ingo Molnar
174cd4b1e5 sched/headers: Prepare to move signal wakeup & sigpending methods from <linux/sched.h> into <linux/sched/signal.h>
Fix up affected files that include this signal functionality via sched.h.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:32 +01:00
Ingo Molnar
037741a6d4 sched/headers: Prepare for the removal of <linux/rtmutex.h> from <linux/sched.h>
Fix up missing #includes in other places that rely on sched.h doing that for them.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:32 +01:00
Ingo Molnar
3f07c01441 sched/headers: Prepare for new header dependencies before moving code to <linux/sched/signal.h>
We are going to split <linux/sched/signal.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder <linux/sched/signal.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:29 +01:00
Ingo Molnar
ae7e81c077 sched/headers: Prepare for new header dependencies before moving code to <uapi/linux/sched/types.h>
We are going to move scheduler ABI details to <uapi/linux/sched/types.h>,
which will be used from a number of .c files.

Create empty placeholder header that maps to <linux/types.h>.

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:27 +01:00
Ingo Molnar
f9411ebe3d rcu: Separate the RCU synchronization types and APIs into <linux/rcupdate_wait.h>
So rcupdate.h is a pretty complex header, in particular it includes
<linux/completion.h> which includes <linux/wait.h> - creating a
dependency that includes <linux/wait.h> in <linux/sched.h>,
which prevents the isolation of <linux/sched.h> from the derived
<linux/wait.h> header.

Solve part of the problem by decoupling rcupdate.h from completions:
this can be done by separating out the rcu_synchronize types and APIs,
and updating their usage sites.

Since these are mostly RCU-internal types, this will not just simplify
<linux/sched.h>'s dependencies, but will make all the hundreds of
.c files that include rcupdate.h but not completions or wait.h build
faster.

( For rcutiny this means that two dependent APIs have to be uninlined,
  but that shouldn't be much of a problem as they are rare variants. )

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:24 +01:00
Paul E. McKenney
31945aa9f1 Merge branches 'doc.2017.01.15b', 'dyntick.2017.01.23a', 'fixes.2017.01.23a', 'srcu.2017.01.25a' and 'torture.2017.01.15b' into HEAD
doc.2017.01.15b: Documentation updates
dyntick.2017.01.23a: Dyntick tracking consolidation
fixes.2017.01.23a: Miscellaneous fixes
srcu.2017.01.25a: SRCU rewrite, fixes, and verification
torture.2017.01.15b: Torture-test updates
2017-01-25 12:56:05 -08:00
Paul E. McKenney
7f554a3d05 srcu: Reduce probability of SRCU ->unlock_count[] counter overflow
Because there are no memory barriers between the srcu_flip() ->completed
increment and the summation of the read-side ->unlock_count[] counters,
both the compiler and the CPU can reorder the summation with the
->completed increment.  If the updater is preempted long enough during
this process, the read-side counters could overflow, resulting in a
too-short grace period.

This commit therefore adds a memory barrier just after the ->completed
increment, ensuring that if the summation misses an increment of
->unlock_count[] from __srcu_read_unlock(), the next __srcu_read_lock()
will see the new value of ->completed, thus bounding the number of
->unlock_count[] increments that can be missed to NR_CPUS.  The actual
overflow computation is more complex due to the possibility of nesting
of __srcu_read_lock().
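
A rough sketch of where the new barrier sits (not the exact kernel code):

static void srcu_flip(struct srcu_struct *sp)
{
	WRITE_ONCE(sp->completed, sp->completed + 1);

	/*
	 * Ensure that if the summation misses an __srcu_read_unlock()
	 * increment, the next __srcu_read_lock() will see the new
	 * ->completed value, bounding the number of missed increments.
	 */
	smp_mb();
}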

Reported-by: Lance Roy <ldr709@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-01-25 12:54:22 -08:00
Paul E. McKenney
d85b62f18d srcu: Force full grace-period ordering
If a process invokes synchronize_srcu(), is delayed just the right amount
of time, and thus does not sleep when waiting for the grace period to
complete, there is no ordering between the end of the grace period and
the code following the synchronize_srcu().  Similarly, there can be a
lack of ordering between the end of the SRCU grace period and callback
invocation.

This commit adds the necessary ordering.

Reported-by: Lance Roy <ldr709@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Further smp_mb() adjustment per email with Lance Roy. ]
2017-01-25 12:54:22 -08:00
Lance Roy
f2c4689640 srcu: Implement more-efficient reader counts
SRCU uses two per-cpu counters: a nesting counter to count the number of
active critical sections, and a sequence counter to ensure that the nesting
counters don't change while they are being added together in
srcu_readers_active_idx_check().

This patch instead uses per-cpu lock and unlock counters. Because both
counters only increase and srcu_readers_active_idx_check() reads the unlock
counter before the lock counter, this achieves the same end without having
to increment two different counters in srcu_read_lock(). This also saves a
smp_mb() in srcu_readers_active_idx_check().
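
A rough sketch of the resulting read-lock fastpath (field names are
illustrative, not the exact kernel code):

int __srcu_read_lock(struct srcu_struct *sp)
{
	int idx;

	idx = READ_ONCE(sp->completed) & 0x1;		/* Current reader index. */
	__this_cpu_inc(sp->per_cpu_ref->lock_count[idx]);
	smp_mb(); /* Keep the critical section after the increment. */
	return idx;
}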

Possible bug: There is no guarantee that the lock counter won't overflow
during srcu_readers_active_idx_check(), as there are no memory barriers
around srcu_flip() (see comment in srcu_readers_active_idx_check() for
details). However, this problem was already present before this patch.

Suggested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Lance Roy <ldr709@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-01-25 12:53:20 -08:00
Paul E. McKenney
38d30b336c rcu: Adjust FQS offline checks for exact online-CPU detection
Commit 7ec99de36f ("rcu: Provide exact CPU-online tracking for RCU"),
as its title suggests, got rid of RCU's remaining CPU-hotplug timing
guesswork.  This commit therefore removes the one-jiffy kludge that was
used to paper over this guesswork.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2017-01-23 11:44:18 -08:00
Paul E. McKenney
3a19b46a5c rcu: Check cond_resched_rcu_qs() state less often to reduce GP overhead
Commit 4a81e8328d ("rcu: Reduce overhead of cond_resched() checks
for RCU") moved quiescent-state generation out of cond_resched()
and commit bde6c3aa99 ("rcu: Provide cond_resched_rcu_qs() to force
quiescent states in long loops") introduced cond_resched_rcu_qs(), and
commit 5cd37193ce ("rcu: Make cond_resched_rcu_qs() apply to normal RCU
flavors") introduced the per-CPU rcu_qs_ctr variable, which is frequently
polled by the RCU core state machine.

This frequent polling can increase grace-period rate, which in turn
increases grace-period overhead, which is visible in some benchmarks
(for example, the "open1" benchmark in Anton Blanchard's "will it scale"
suite).  This commit therefore reduces the rate at which rcu_qs_ctr
is polled by moving that polling into the force-quiescent-state (FQS)
machinery, and by further polling it only after the grace period has
been in effect for at least jiffies_till_sched_qs jiffies.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2017-01-23 11:44:18 -08:00
Paul E. McKenney
02a5c550b2 rcu: Abstract extended quiescent state determination
This commit is the fourth step towards full abstraction of all accesses
to the ->dynticks counter, implementing previously open-coded checks and
comparisons in new rcu_dynticks_in_eqs() and rcu_dynticks_in_eqs_since()
functions.  This abstraction will ease changes to the ->dynticks counter
operation.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2017-01-23 11:44:18 -08:00
Paul E. McKenney
2625d469ba rcu: Abstract dynticks extended quiescent state enter/exit operations
This commit is the third step towards full abstraction of all accesses
to the ->dynticks counter, implementing the previously open-coded atomic
add of 1 and entry checks in a new rcu_dynticks_eqs_enter() function, and
the same but with exit checks in a new rcu_dynticks_eqs_exit() function.
This abstraction will ease changes to the ->dynticks counter operation.

Note that this commit gets rid of the smp_mb__before_atomic() and the
smp_mb__after_atomic() calls that were previously present.  The reason
that this is OK from a memory-ordering perspective is that the atomic
operation is now atomic_add_return(), which, as a value-returning atomic,
guarantees full ordering.
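
A rough sketch of the enter-side helper (illustrative, not the exact
kernel code):

static void rcu_dynticks_eqs_enter(void)
{
	struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);
	int special;

	/* Value-returning atomic supplies full ordering on its own. */
	special = atomic_add_return(1, &rdtp->dynticks);
	WARN_ON_ONCE(special & 0x1);	/* Counter must be even in an EQS. */
}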

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Fixed RCU_TRACE() statements added by this commit. ]
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2017-01-23 11:42:43 -08:00
Paul E. McKenney
8dc79888a7 rcu: Add lockdep checks to synchronous expedited primitives
The non-expedited synchronize_*rcu() primitives have lockdep checks, but
their expedited counterparts lack these checks.  This commit therefore
adds these checks to the expedited synchronize_*rcu() primitives.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2017-01-23 11:37:14 -08:00
Paul E. McKenney
bb4e2c08bb rcu: Eliminate unused expedited_normal counter
Expedited grace periods no longer fall back to normal grace periods
in response to lock contention, given that expedited grace periods
now use the rcu_node tree so as to avoid contention.  This commit
therefore removes the expedited_normal counter.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2017-01-23 11:37:14 -08:00
Paul E. McKenney
9831ce3bb4 rcu: Fix comment in rcu_organize_nocb_kthreads()
It used to be that the rcuo callback-offload kthreads were spawned
in rcu_organize_nocb_kthreads(), and the comment before the "for"
loop says as much.  However, this spawning has long since moved to
the CPU-hotplug code, so this commit fixes this comment.

Reported-by: Michalis Kokologiannakis <mixaskok@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2017-01-23 11:37:13 -08:00
Paul E. McKenney
fdbb9b315c rcu: Make rcu_cpu_starting() use its "cpu" argument
The rcu_cpu_starting() function uses this_cpu_ptr() to locate the
incoming CPU's rcu_data structure.  This works for the boot CPU and for
all CPUs onlined after rcu_init() executes (during very early boot).
Currently, this is the full set of CPUs, so all is well.  But if
anyone ever parallelizes boot before rcu_init() time, it will fail.
This commit therefore replaces the rcu_cpu_starting() function's use of
this_cpu_ptr() with per_cpu_ptr(), future-proofing the code and
(arguably) improving readability.

This commit inadvertently fixes a latent bug: If there ever had been
more than just the boot CPU online at rcu_init() time, the old code
would not initialize the non-boot CPUs, but rather would repeatedly
initialize the boot CPU.

Reported-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2017-01-23 11:37:13 -08:00
Paul E. McKenney
09e2db37ec rcu: Add comment headers to expedited-grace-period counter functions
These functions (rcu_exp_gp_seq_start(), rcu_exp_gp_seq_end(),
rcu_exp_gp_seq_snap(), and rcu_exp_gp_seq_done()) seemed too obvious
to comment when written, but not so much when being documented.
This commit therefore adds header comments to each of them.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2017-01-23 11:37:13 -08:00
Paul E. McKenney
630c7ed9ca rcu: Don't wake rcuc/X kthreads on NOCB CPUs
Chris Friesen noticed that rcuc/X kthreads were consuming CPU even on
NOCB CPUs.  This makes no sense because the only purpose of these
kthreads is to invoke normal (non-offloaded) callbacks, of which there
will never be any on NOCB CPUs.  This problem was due to a bug in
cpu_has_callbacks_ready_to_invoke(), which should have been checking
->nxttail[RCU_NEXT_TAIL] for NULL, but which was instead (incorrectly)
checking ->nxttail[RCU_DONE_TAIL].  Because ->nxttail[RCU_DONE_TAIL] is
never NULL, the only effect is to cause the rcuc/X kthread to execute
when it should not do so.

This commit therefore checks ->nxttail[RCU_NEXT_TAIL], which is NULL
for NOCB CPUs.

Reported-by: Chris Friesen <chris.friesen@windriver.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2017-01-23 11:37:13 -08:00
Paul E. McKenney
7aa92230c9 rcu: Once again use NMI-based stack traces in stall warnings
This commit is for all intents and purposes a revert of bc1dce514e
("rcu: Don't use NMIs to dump other CPUs' stacks").  The reason to suppose
that this can now safely be reverted is the presence of 42a0bb3f71
("printk/nmi: generic solution for safe printk in NMI"), which is said
to have made NMI-based stack dumps safe.

However, this reversion keeps one nice property of bc1dce514e
("rcu: Don't use NMIs to dump other CPUs' stacks"), namely that
only those CPUs blocking the grace period are dumped.  The new
trigger_single_cpu_backtrace() is used to make this happen, as
suggested by Josh Poimboeuf.

Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2017-01-23 11:37:12 -08:00
Paul E. McKenney
b201fa6737 rcu: Remove short-term CPU kicking
Commit 4914950aaa ("rcu: Stop treating in-kernel CPU-bound workloads
as errors") added a (relatively) short-timeout call to resched_cpu().
This was inspired by an issue that was fixed by b7e7ade34e ("sched/core:
Fix remote wakeups").  But given that this issue was fixed, it is time
for the current commit to remove this call to resched_cpu().

Reported-by: Byungchul Park <byungchul.park@lge.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2017-01-23 11:37:12 -08:00
Paul E. McKenney
28053bc72c rcu: Add long-term CPU kicking
This commit prepares for the removal of short-term CPU kicking (in a
subsequent commit).  It does so by starting to invoke resched_cpu()
for each holdout at each force-quiescent-state interval that is more
than halfway through the stall-warning interval.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2017-01-23 11:33:02 -08:00
Tobias Klauser
94060d2235 rcu: Remove unused but set variable
Since commit 7ec99de36f ("rcu: Provide exact CPU-online tracking for
RCU"), the variable mask in rcu_init_percpu_data is set but no longer
used. Remove it to fix the following warning when building with 'W=1':

  kernel/rcu/tree.c: In function ‘rcu_init_percpu_data’:
  kernel/rcu/tree.c:3765:16: warning: variable ‘mask’ set but not used [-Wunused-but-set-variable]

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2017-01-23 11:32:35 -08:00
Paul E. McKenney
2535db485c rcu: Remove unneeded rcu_process_callbacks() declarations
The declarations of __rcu_process_callbacks() and rcu_process_callbacks()
are not needed, as the definition of both of these functions appear before
any uses.  This commit therefore removes both declarations.

Reported-by: "Ahmed, Iftekhar" <ahmedi@oregonstate.edu>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2017-01-23 11:32:29 -08:00
Byungchul Park
c4402b27f1 rcu: Only dump stalled-tasks stacks if there was a real stall
The print_other_cpu_stall() function currently unconditionally invokes
rcu_print_detail_task_stall().  This is OK because if there was a stall
sufficient to cause print_other_cpu_stall() to be invoked, that stall
is very likely to persist through the entire print_other_cpu_stall()
execution.  However, if the stall did not persist, the variable ndetected
will be zero, and that variable is already tested in an "if" statement.
Therefore, this commit moves the call to rcu_print_detail_task_stall()
under that pre-existing "if" to improve readability, with a very rare
reduction in overhead.

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
[ paulmck: Reworked commit log. ]
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2017-01-23 11:32:22 -08:00
Sebastian Andrzej Siewior
7c6094db59 rcu: update: Make RCU_EXPEDITE_BOOT be the default
RCU_EXPEDITE_BOOT should speed up the boot process by enforcing
synchronize_rcu_expedited() instead of synchronize_rcu() during the boot
process. There should be no reason why one would not want this, and there
is no need to worry about real-time latency at this point.
Therefore make it the default.

Note that users wishing to avoid expediting entirely, for example when
bringing up new hardware possibly having flaky IPIs, can use the
rcu_normal boot parameter to override boot-time expediting.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
[ paulmck: Reworded commit log. ]
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2017-01-16 16:56:39 -08:00
Paul E. McKenney
8b2f63ab05 rcu: Abstract the dynticks snapshot operation
This commit is the second step towards full abstraction of all accesses to
the ->dynticks counter, implementing the previously open-coded atomic
add of zero in a new rcu_dynticks_snap() function.  This abstraction will
ease changes to the ->dynticks counter operation.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2017-01-16 15:47:53 -08:00
Paul E. McKenney
6563de9d6f rcu: Abstract the dynticks momentary-idle operation
This commit is the first step towards full abstraction of all accesses to
the ->dynticks counter, implementing the previously open-coded atomic add
of two in a new rcu_dynticks_momentary_idle() function.  This abstraction
will ease changes to the ->dynticks counter operation.

Note that this commit gets rid of the smp_mb__before_atomic() and the
smp_mb__after_atomic() calls that were previously present.  The reason
that this is OK from a memory-ordering perspective is that the atomic
operation is now atomic_add_return(), which, as a value-returning atomic,
guarantees full ordering.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2017-01-16 14:59:54 -08:00
Paul E. McKenney
52d7e48b86 rcu: Narrow early boot window of illegal synchronous grace periods
The current preemptible RCU implementation goes through three phases
during bootup.  In the first phase, there is only one CPU that is running
with preemption disabled, so that a no-op is a synchronous grace period.
In the second mid-boot phase, the scheduler is running, but RCU has
not yet gotten its kthreads spawned (and, for expedited grace periods,
workqueues are not yet running).  During this time, any attempt to do
a synchronous grace period will hang the system (or complain bitterly,
depending).  In the third and final phase, RCU is fully operational and
everything works normally.

This has been OK for some time, but there have recently been some
synchronous grace periods showing up during the second mid-boot phase.
This code worked "by accident" for a while, but started failing as soon
as expedited RCU grace periods switched over to workqueues in commit
8b355e3bc1 ("rcu: Drive expedited grace periods from workqueue").
Note that the code was buggy even before this commit, as it was subject
to failure on real-time systems that forced all expedited grace periods
to run as normal grace periods (for example, using the rcu_normal ksysfs
parameter).  The callchain from the failure case is as follows:

early_amd_iommu_init()
|-> acpi_put_table(ivrs_base);
|-> acpi_tb_put_table(table_desc);
|-> acpi_tb_invalidate_table(table_desc);
|-> acpi_tb_release_table(...)
|-> acpi_os_unmap_memory
|-> acpi_os_unmap_iomem
|-> acpi_os_map_cleanup
|-> synchronize_rcu_expedited

The kernel showing this callchain was built with CONFIG_PREEMPT_RCU=y,
which caused the code to try using workqueues before they were
initialized, which did not go well.

This commit therefore reworks RCU to permit synchronous grace periods
to proceed during this mid-boot phase.  This commit is therefore a
fix to a regression introduced in v4.9, and is therefore being put
forward post-merge-window in v4.10.

This commit sets a flag from the existing rcu_scheduler_starting()
function which causes all synchronous grace periods to take the expedited
path.  The expedited path now checks this flag, using the requesting task
to drive the expedited grace period forward during the mid-boot phase.
Finally, this flag is updated by a core_initcall() function named
rcu_exp_runtime_mode(), which causes the runtime codepaths to be used.

Note that this arrangement assumes that tasks are not sent POSIX signals
(or anything similar) from the time that the first task is spawned
through core_initcall() time.

Fixes: 8b355e3bc1 ("rcu: Drive expedited grace periods from workqueue")
Reported-by: "Zheng, Lv" <lv.zheng@intel.com>
Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Stan Kain <stan.kain@gmail.com>
Tested-by: Ivan <waffolz@hotmail.com>
Tested-by: Emanuel Castelo <emanuel.castelo@gmail.com>
Tested-by: Bruno Pesavento <bpesavento@infinito.it>
Tested-by: Borislav Petkov <bp@suse.de>
Tested-by: Frederic Bezies <fredbezies@gmail.com>
Cc: <stable@vger.kernel.org> # 4.9.0-
2017-01-14 21:23:48 -08:00
Paul E. McKenney
f466ae66fa rcu: Remove cond_resched() from Tiny synchronize_sched()
It is now legal to invoke synchronize_sched() at early boot, which causes
Tiny RCU's synchronize_sched() to emit spurious splats.  This commit
therefore removes the cond_resched() from Tiny RCU's synchronize_sched().

Fixes: 8b355e3bc1 ("rcu: Drive expedited grace periods from workqueue")
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: <stable@vger.kernel.org> # 4.9.0-
2017-01-14 21:22:20 -08:00
Paul E. McKenney
aa3e0bf1aa rcu: Don't kick unless grace period or request
The current code can result in spurious kicks when there are no grace
periods in progress and no grace-period-related requests.  This is
sort of OK for a diagnostic aid, but the resulting ftrace-dump messages
in dmesg are annoying.  This commit therefore avoids spurious kicks
in the common case.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2016-11-14 10:46:31 -08:00
Paul E. McKenney
0742ac3e2f rcu: Make expedited grace periods recheck dyntick idle state
Expedited grace periods check dyntick-idle state, and avoid sending
IPIs to idle CPUs, including those running guest OSes, and, on NOHZ_FULL
kernels, nohz_full CPUs.  However, the kernel has been observed checking
a CPU while it was non-idle, but sending the IPI after it has gone
idle.  This commit therefore rechecks idle state immediately before
sending the IPI, refraining from IPIing CPUs that have since gone idle.

Reported-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-11-14 10:46:31 -08:00
Paul E. McKenney
d0af39e89e torture: Trace long read-side delays
Although rcutorture will occasionally do a 50-millisecond grace-period
delay, these delays are quite rare.  And rightly so, because otherwise
the read rate would be quite low.  This means that it can be important
to identify whether or not a given run contained a long-delay read.
This commit therefore inserts a trace_rcu_torture_read() event to flag
runs containing long delays.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-11-14 10:46:30 -08:00
Paul E. McKenney
1f21b50b77 rcu: Remove obsolete comment from __call_rcu()
The __call_rcu() comment about opportunistically noting grace period
beginnings and endings is obsolete.  RCU still does such opportunistic
noting, but in __call_rcu_core() rather than __call_rcu(), and there
already is an appropriate comment in __call_rcu_core().  This commit
therefore removes the obsolete comment.

Reported-by: Michalis Kokologiannakis <mixaskok@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2016-11-14 10:46:19 -08:00
Paul E. McKenney
5403d367a7 rcu: Remove obsolete rcu_check_callbacks() header comment
In the deep past, rcu_check_callbacks() was only invoked if rcu_pending()
returned true.  Which was fine, but these days rcu_check_callbacks()
is invoked unconditionally.  This commit therefore removes the obsolete
sentence from the header comment.

Reported-by: Michalis Kokologiannakis <mixaskok@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2016-11-14 10:46:14 -08:00
Paul E. McKenney
b8f2ed5384 rcu: Tighten up __call_rcu() rcu_head alignment check
Commit 720abae3d6 ("rcu: force alignment on struct
callback_head/rcu_head") forced the rcu_head (AKA callback_head)
structure's alignment to pointer size, that is, to 4-byte boundaries on
32-bit systems and to 8-byte boundaries on 64-bit systems.  This
commit therefore checks for this same alignment in __call_rcu(),
which used to contain a looser check for two-byte alignment.
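
A minimal sketch of the change (illustrative):

	/* Old, looser check: two-byte alignment only. */
	WARN_ON_ONCE((unsigned long)head & 0x1);

	/* New check: rcu_head must be aligned to pointer size. */
	WARN_ON_ONCE((unsigned long)head & (sizeof(void *) - 1));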

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2016-11-14 10:46:08 -08:00
Linus Torvalds
9ffc66941d This adds a new gcc plugin named "latent_entropy". It is designed to
extract as much possible uncertainty from a running system at boot time as
 possible, hoping to capitalize on any possible variation in CPU operation
 (due to runtime data differences, hardware differences, SMP ordering,
 thermal timing variation, cache behavior, etc).
 
 At the very least, this plugin is a much more comprehensive example for
 how to manipulate kernel code using the gcc plugin internals.

Merge tag 'gcc-plugins-v4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull gcc plugins update from Kees Cook:
 "This adds a new gcc plugin named "latent_entropy". It is designed to
  extract as much possible uncertainty from a running system at boot
  time as possible, hoping to capitalize on any possible variation in
  CPU operation (due to runtime data differences, hardware differences,
  SMP ordering, thermal timing variation, cache behavior, etc).

  At the very least, this plugin is a much more comprehensive example
  for how to manipulate kernel code using the gcc plugin internals"

* tag 'gcc-plugins-v4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  latent_entropy: Mark functions with __latent_entropy
  gcc-plugins: Add latent_entropy plugin
2016-10-15 10:03:15 -07:00
Emese Revfy
0766f788eb latent_entropy: Mark functions with __latent_entropy
The __latent_entropy gcc attribute can be used only on functions and
variables.  If it is on a function then the plugin will instrument it for
gathering control-flow entropy. If the attribute is on a variable then
the plugin will initialize it with random contents.  The variable must
be an integer, an integer array type or a structure with integer fields.

These specific functions have been selected because they are init
functions (to help gather boot-time entropy), are called at unpredictable
times, or they have variable loops, each of which provide some level of
latent entropy.
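
A hedged example of how the attribute might be applied; the function and
variable shown are illustrative, not taken from this commit's diff:

/* Variable: the plugin initializes it with random contents at build time. */
static u64 boot_seed __latent_entropy;

/* Function: the plugin instruments it to mix control-flow entropy
 * into the latent-entropy pool as it runs.                          */
static void __init __latent_entropy example_init(void)
{
	/* ... ordinary init-time work ... */
}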

Signed-off-by: Emese Revfy <re.emese@gmail.com>
[kees: expanded commit message]
Signed-off-by: Kees Cook <keescook@chromium.org>
2016-10-10 14:51:45 -07:00
Linus Torvalds
00bcf5cdd6 Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
 "The main changes in this cycle were:

   - rwsem micro-optimizations (Davidlohr Bueso)

   - Improve the implementation and optimize the performance of
     percpu-rwsems. (Peter Zijlstra.)

   - Convert all lglock users to better facilities such as percpu-rwsems
     or percpu-spinlocks and remove lglocks. (Peter Zijlstra)

   - Remove the ticket (spin)lock implementation. (Peter Zijlstra)

   - Korean translation of memory-barriers.txt and related fixes to the
     English document. (SeongJae Park)

   - misc fixes and cleanups"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
  x86/cmpxchg, locking/atomics: Remove superfluous definitions
  x86, locking/spinlocks: Remove ticket (spin)lock implementation
  locking/lglock: Remove lglock implementation
  stop_machine: Remove stop_cpus_lock and lg_double_lock/unlock()
  fs/locks: Use percpu_down_read_preempt_disable()
  locking/percpu-rwsem: Add down_read_preempt_disable()
  fs/locks: Replace lg_local with a per-cpu spinlock
  fs/locks: Replace lg_global with a percpu-rwsem
  locking/percpu-rwsem: Add DEFINE_STATIC_PERCPU_RWSEM and percpu_rwsem_assert_held()
  locking/pv-qspinlock: Use cmpxchg_release() in __pv_queued_spin_unlock()
  locking/rwsem, x86: Drop a bogus cc clobber
  futex: Add some more function commentry
  locking/hung_task: Show all locks
  locking/rwsem: Scan the wait_list for readers only once
  locking/rwsem: Remove a few useless comments
  locking/rwsem: Return void in __rwsem_mark_wake()
  locking, rcu, cgroup: Avoid synchronize_sched() in __cgroup_procs_write()
  locking/Documentation: Add Korean translation
  locking/Documentation: Fix a typo of example result
  locking/Documentation: Fix wrong section reference
  ...
2016-10-03 12:15:00 -07:00
Paul E. McKenney
d74b62bc32 Merge branches 'doc.2016.08.22c', 'exp.2016.08.22c', 'fixes.2016.09.14a', 'hotplug.2016.08.22c' and 'torture.2016.08.22c' into HEAD
doc.2016.08.22c: Documentation updates
exp.2016.08.22c: Expedited grace-period updates
fixes.2016.09.14a: Miscellaneous fixes
hotplug.2016.08.22c: CPU-hotplug changes
torture.2016.08.22c: Torture-test changes
2016-09-14 12:58:49 -07:00
SeongJae Park
a56fefa260 rcuperf: Consistently insert space between flag and message
A few rcuperf dmesg output messages have no space between the flag and
the start of the message. In contrast, every other message consistently
supplies a single space.  This difference makes rcuperf dmesg output
hard to read and to mechanically parse.  This commit therefore fixes
this problem by modifying a pr_alert() call and PERFOUT_STRING() macro
function to provide that single space.

Signed-off-by: SeongJae Park <sj38.park@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-08-22 10:06:16 -07:00
SeongJae Park
472213a675 rcutorture: Print out barrier error as document says
Tests for rcu_barrier() were introduced by commit fae4b54f28 ("rcu:
Introduce rcutorture testing for rcu_barrier()").  That commit updated
the documentation to say that the "rtbe" field in rcutorture's dmesg
output indicates test failure.  However, the code was not updated, only
the documentation.  This commit therefore updates the code to match the
updated documentation.

Signed-off-by: SeongJae Park <sj38.park@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-08-22 10:03:37 -07:00
Paul E. McKenney
4ffa669924 torture: Add task state to writer-task stall printk()s
This commit adds a dump of the scheduler state for stalled rcutorture
writer tasks.  This addition provides yet more debug for the intermittent
"failures to proceed", where grace periods move ahead but the rcutorture
writer tasks fail to do so.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-08-22 10:02:59 -07:00
Sebastian Andrzej Siewior
0ffd374b22 rcutorture: Convert to hotplug state machine
Install the callbacks via the state machine and let the core invoke
the callbacks on the already online CPUs.

Cc: Josh Triplett <josh@joshtriplett.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-08-22 09:52:12 -07:00
Paul E. McKenney
7ec99de36f rcu: Provide exact CPU-online tracking for RCU
Up to now, RCU has assumed that the CPU-online process makes it from
CPU_UP_PREPARE to set_cpu_online() within one jiffy.  Given the recent
rise of virtualized environments, this assumption is very clearly
obsolete.  Failing to meet this deadline can result in RCU paying
attention to an incoming CPU for one jiffy, then ignoring it until the
grace period following the one in which that CPU sets itself online.
This situation might prove to be fatally disappointing to any RCU
read-side critical sections that had the misfortune to execute during
the time in which RCU was ignoring the slow-to-come-online CPU.

This commit therefore updates RCU's internal CPU state-tracking
information at notify_cpu_starting() time, thus providing RCU with
an exact transition of the CPU's state from offline to online.

Note that this means that incoming CPUs must not use RCU read-side
critical sections (other than those of SRCU) until notify_cpu_starting()
time.  Note also that the CPU_STARTING notifiers -are- allowed to use
RCU read-side critical sections.  (Of course, CPU-hotplug notifiers are
rapidly becoming obsolete, so you need to act fast!)

If a given architecture or CPU family needs to use RCU read-side
critical sections earlier, the call to rcu_cpu_starting() from
notify_cpu_starting() will need to be architecture-specific, with
architectures that need early use being required to hand-place
the call to rcu_cpu_starting() at some point preceding the call to
notify_cpu_starting().
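
A sketch of the resulting call placement; the real notify_cpu_starting() in
kernel/cpu.c does more than shown here, so treat the body as illustrative:

  void notify_cpu_starting(unsigned int cpu)
  {
          rcu_cpu_starting(cpu);  /* RCU now sees this CPU as online. */

          /* ... then invoke the CPU_STARTING notifiers / hotplug states ... */
  }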

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-08-22 09:36:57 -07:00
Paul E. McKenney
3563a438f1 rcu: Avoid redundant quiescent-state chasing
Currently, __note_gp_changes() checks to see if the CPU has slept through
multiple grace periods.  If it has, it resynchronizes that CPU's view
of the grace-period state, which includes whether or not the current
grace period needs a quiescent state from this CPU.  This need (or lack
thereof) must be recorded in two places, rdp->cpu_no_qs.b.norm and
rdp->core_needs_qs.  The former tells RCU's context-switch code to
go get a quiescent state and the latter says that it needs to be reported.
The current code unconditionally sets the former to true, but correctly
sets the latter.

This does not result in failures, but it does unnecessarily increase
the amount of work done on average at context-switch time.  This commit
therefore correctly sets both fields.
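
A hedged sketch of the intent, using the two fields named above; the
qsmask/grpmask test is an assumption about how the need is computed, and the
real __note_gp_changes() does considerably more:

  static void note_need_for_qs(struct rcu_data *rdp, struct rcu_node *rnp)
  {
          bool need_qs = !!(rnp->qsmask & rdp->grpmask);

          rdp->cpu_no_qs.b.norm = need_qs;  /* context-switch code: go get a QS. */
          rdp->core_needs_qs = need_qs;     /* RCU core: that QS must be reported. */
  }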

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-08-22 09:35:57 -07:00
Paul Gortmaker
e77b704125 rcu: Don't use modular infrastructure in non-modular code
The Kconfig currently controlling compilation of tree.c is:

init/Kconfig:config TREE_RCU
init/Kconfig:   bool

...and update.c and sync.c are "obj-y" meaning that none are ever
built as a module by anyone.

Since MODULE_ALIAS is a no-op for non-modular code, we can remove
them from these files.

We leave moduleparam.h behind since the files still instantiate some
boot-time configuration parameters with module_param().

Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-08-22 09:35:27 -07:00
Jisheng Zhang
94d4477673 rcu: Use rcu_gp_kthread_wake() to wake up grace period kthreads
Commit abedf8e241 ("rcu: Use simple wait queues where possible in
rcutree") converts Tree RCU's wait queues to simple wait queues,
but it incorrectly reverts the commit 2aa792e6fa ("rcu: Use
rcu_gp_kthread_wake() to wake up grace period kthreads").  This can
result in redundant self-wakeups.

This commit therefore replaces the simple wait-queue wakeups with
rcu_gp_kthread_wake(), thus avoiding the redundant wakeups.

Signed-off-by: Jisheng Zhang <jszhang@marvell.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-08-22 09:33:46 -07:00
Paul E. McKenney
385c859f67 rcu: Use RCU's online-CPU state for expedited IPI retry
This commit improves the accuracy of the interaction between CPU hotplug
operations and RCU's expedited grace periods by using RCU's online-CPU
state to determine when failed IPIs should be retried.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-08-22 09:30:42 -07:00
Paul E. McKenney
98834b8378 rcu: Exclude RCU-offline CPUs from expedited grace periods
The expedited RCU grace periods currently rely on a failure indication
from smp_call_function_single() to determine that a given CPU is offline.
This works after a fashion, but is more contorted and less precise than
relying on RCU's internal state.  This commit therefore takes a first
step towards relying on internal state.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-08-22 09:30:42 -07:00
Paul E. McKenney
24a6cff286 rcu: Make expedited RCU CPU stall warnings respond to controls
The expedited RCU CPU stall warnings currently respond to neither the
panic_on_rcu_stall sysctl setting nor the rcupdate.rcu_cpu_stall_suppress
kernel boot parameter.  This commit therefore updates the expedited code
to respond to these two controls.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-08-22 09:30:26 -07:00
Paul E. McKenney
908d2c1fd1 rcu: Stop disabling expedited RCU CPU stall warnings
Now that RCU expedited grace periods are always driven by a workqueue,
there is no need to account for signal reception, and thus no need
to disable expedited RCU CPU stall warnings due to signal reception.
This commit therefore removes the signal-reception checks, leaving a
WARN_ON() to catch possible future bugs.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-08-22 09:30:26 -07:00
Paul E. McKenney
8b355e3bc1 rcu: Drive expedited grace periods from workqueue
The current implementation of expedited grace periods has the user
task drive the grace period.  This works, but has downsides: (1) The
user task must awaken tasks piggybacking on this grace period, which
can result in latencies rivaling those of the grace period itself, and
(2) User tasks can receive signals, which interfere with RCU CPU stall
warnings.

This commit therefore uses workqueues to drive the grace periods, so
that the user task need not do the awakening.  A subsequent commit
will remove the now-unnecessary code allowing for signals.
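
A self-contained sketch of the general pattern; names are illustrative, and
the actual RCU code uses its own work item and grace-period sequence counters
rather than a completion:

  #include <linux/workqueue.h>
  #include <linux/completion.h>

  struct exp_gp_work {
          struct work_struct work;
          struct completion done;
  };

  static void exp_gp_workfn(struct work_struct *wp)
  {
          struct exp_gp_work *egw = container_of(wp, struct exp_gp_work, work);

          /* ... drive the expedited grace period to completion here ... */
          complete(&egw->done);             /* the kworker does the awakening */
  }

  static void request_exp_gp(void)
  {
          struct exp_gp_work egw;

          INIT_WORK_ONSTACK(&egw.work, exp_gp_workfn);
          init_completion(&egw.done);
          schedule_work(&egw.work);         /* hand the grace period to a kworker */
          wait_for_completion(&egw.done);   /* the requesting task only sleeps */
          destroy_work_on_stack(&egw.work);
  }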

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-08-22 09:30:25 -07:00
Paul E. McKenney
f7b8eb847e rcu: Consolidate expedited grace period machinery
The functions synchronize_rcu_expedited() and synchronize_sched_expedited()
have nearly identical code.  This commit therefore consolidates this code
into a new _synchronize_rcu_expedited() function.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-08-22 09:30:11 -07:00
Ding Tianhong
bedc196915 rcu: Fix soft lockup for rcu_nocb_kthread
Carrying out the following steps results in a softlockup in the
RCU callback-offload (rcuo) kthreads:

1. Connect to ixgbevf, and set the speed to 10Gb/s.
2. Use ifconfig to bring the nic up and down repeatedly.

[  317.005148] IPv6: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
[  368.106005] BUG: soft lockup - CPU#1 stuck for 22s! [rcuos/1:15]
[  368.106005] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[  368.106005] task: ffff88057dd8a220 ti: ffff88057dd9c000 task.ti: ffff88057dd9c000
[  368.106005] RIP: 0010:[<ffffffff81579e04>]  [<ffffffff81579e04>] fib_table_lookup+0x14/0x390
[  368.106005] RSP: 0018:ffff88061fc83ce8  EFLAGS: 00000286
[  368.106005] RAX: 0000000000000001 RBX: 00000000020155c0 RCX: 0000000000000001
[  368.106005] RDX: ffff88061fc83d50 RSI: ffff88061fc83d70 RDI: ffff880036d11a00
[  368.106005] RBP: ffff88061fc83d08 R08: 0000000000000001 R09: 0000000000000000
[  368.106005] R10: ffff880036d11a00 R11: ffffffff819e0900 R12: ffff88061fc83c58
[  368.106005] R13: ffffffff816154dd R14: ffff88061fc83d08 R15: 00000000020155c0
[  368.106005] FS:  0000000000000000(0000) GS:ffff88061fc80000(0000) knlGS:0000000000000000
[  368.106005] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  368.106005] CR2: 00007f8c2aee9c40 CR3: 000000057b222000 CR4: 00000000000407e0
[  368.106005] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  368.106005] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  368.106005] Stack:
[  368.106005]  00000000010000c0 ffff88057b766000 ffff8802e380b000 ffff88057af03e00
[  368.106005]  ffff88061fc83dc0 ffffffff815349a6 ffff88061fc83d40 ffffffff814ee146
[  368.106005]  ffff8802e380af00 00000000e380af00 ffffffff819e0900 020155c0010000c0
[  368.106005] Call Trace:
[  368.106005]  <IRQ>
[  368.106005]
[  368.106005]  [<ffffffff815349a6>] ip_route_input_noref+0x516/0xbd0
[  368.106005]  [<ffffffff814ee146>] ? skb_release_data+0xd6/0x110
[  368.106005]  [<ffffffff814ee20a>] ? kfree_skb+0x3a/0xa0
[  368.106005]  [<ffffffff8153698f>] ip_rcv_finish+0x29f/0x350
[  368.106005]  [<ffffffff81537034>] ip_rcv+0x234/0x380
[  368.106005]  [<ffffffff814fd656>] __netif_receive_skb_core+0x676/0x870
[  368.106005]  [<ffffffff814fd868>] __netif_receive_skb+0x18/0x60
[  368.106005]  [<ffffffff814fe4de>] process_backlog+0xae/0x180
[  368.106005]  [<ffffffff814fdcb2>] net_rx_action+0x152/0x240
[  368.106005]  [<ffffffff81077b3f>] __do_softirq+0xef/0x280
[  368.106005]  [<ffffffff8161619c>] call_softirq+0x1c/0x30
[  368.106005]  <EOI>
[  368.106005]
[  368.106005]  [<ffffffff81015d95>] do_softirq+0x65/0xa0
[  368.106005]  [<ffffffff81077174>] local_bh_enable+0x94/0xa0
[  368.106005]  [<ffffffff81114922>] rcu_nocb_kthread+0x232/0x370
[  368.106005]  [<ffffffff81098250>] ? wake_up_bit+0x30/0x30
[  368.106005]  [<ffffffff811146f0>] ? rcu_start_gp+0x40/0x40
[  368.106005]  [<ffffffff8109728f>] kthread+0xcf/0xe0
[  368.106005]  [<ffffffff810971c0>] ? kthread_create_on_node+0x140/0x140
[  368.106005]  [<ffffffff816147d8>] ret_from_fork+0x58/0x90
[  368.106005]  [<ffffffff810971c0>] ? kthread_create_on_node+0x140/0x140

==================================cut here==============================

It turns out that the rcuos callback-offload kthread is busy processing
a very large quantity of RCU callbacks, and it is not relinquishing the
CPU while doing so.  This commit therefore adds a cond_resched_rcu_qs()
within the loop to allow other tasks to run.
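
A hedged sketch of the loop shape only; next_offloaded_callback() is a made-up
placeholder for dequeuing the next ready callback, not the actual
rcu_nocb_kthread() body:

  struct rcu_head *rhp;

  while ((rhp = next_offloaded_callback()) != NULL) {
          local_bh_disable();
          rhp->func(rhp);            /* invoke one offloaded RCU callback */
          local_bh_enable();
          cond_resched_rcu_qs();     /* yield the CPU and report a quiescent state */
  }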

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
[ paulmck: Substituted cond_resched_rcu_qs for cond_resched. ]
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-08-22 07:53:20 -07:00
Peter Zijlstra
3942a9bd7b locking, rcu, cgroup: Avoid synchronize_sched() in __cgroup_procs_write()
The current percpu-rwsem read side is entirely free of serializing insns
at the cost of having a synchronize_sched() in the write path.

The latency of the synchronize_sched() is too high for cgroups. The
commit 1ed1328792 describes the write path as a fairly cold path,
but this is not the case for Android, which moves tasks to the foreground
cgroup and back around binder IPC calls from foreground processes to
background processes, so the path is significantly hotter than
human-initiated operations.

Switch cgroup_threadgroup_rwsem into the slow mode for now to avoid the
problem; hopefully it will not be that slow after another commit:

  80127a3968 ("locking/percpu-rwsem: Optimize readers and reduce global impact").

We could just add rcu_sync_enter() into cgroup_init(), but we do not want
another synchronize_sched() at boot time, so this patch adds a new helper
that does not block but currently can only be called before the first use.
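
A hedged sketch of the intended call site, assuming the new non-blocking
helper is named rcu_sync_enter_start() and that the percpu-rwsem's rcu_sync
member is reachable as shown (both are assumptions, not quotes from the patch):

  /* In cgroup_init(), before userspace can take the write path: */
  rcu_sync_enter_start(&cgroup_threadgroup_rwsem.rss);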

Reported-by: John Stultz <john.stultz@linaro.org>
Reported-by: Dmitry Shmidt <dimitrysh@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Colin Cross <ccross@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rom Lemarchand <romlem@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Todd Kjos <tkjos@google.com>
Link: http://lkml.kernel.org/r/20160811165413.GA22807@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-18 15:36:59 +02:00
Peter Zijlstra
80127a3968 locking/percpu-rwsem: Optimize readers and reduce global impact
Currently the percpu-rwsem switches to (global) atomic ops while a
writer is waiting, which could be quite a while and slows down
releasing the readers.

This patch cures this problem by ordering the reader-state vs
reader-count (see the comments in __percpu_down_read() and
percpu_down_write()). This changes a global atomic op into a full
memory barrier, which doesn't have the global cacheline contention.

This also enables using the percpu-rwsem with rcu_sync disabled in order
to bias the implementation differently, reducing the writer latency by
adding some cost to readers.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
[ Fixed modular build. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-10 14:34:01 +02:00
Linus Torvalds
a6408f6cb6 Merge branch 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull smp hotplug updates from Thomas Gleixner:
 "This is the next part of the hotplug rework.

   - Convert all notifiers with a priority assigned

   - Convert all CPU_STARTING/DYING notifiers

     The final removal of the STARTING/DYING infrastructure will happen
     when the merge window closes.

  Another 700 lines of impenetrable maze gone :)"

* 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (70 commits)
  timers/core: Correct callback order during CPU hot plug
  leds/trigger/cpu: Move from CPU_STARTING to ONLINE level
  powerpc/numa: Convert to hotplug state machine
  arm/perf: Fix hotplug state machine conversion
  irqchip/armada: Avoid unused function warnings
  ARC/time: Convert to hotplug state machine
  clocksource/atlas7: Convert to hotplug state machine
  clocksource/armada-370-xp: Convert to hotplug state machine
  clocksource/exynos_mct: Convert to hotplug state machine
  clocksource/arm_global_timer: Convert to hotplug state machine
  rcu: Convert rcutree to hotplug state machine
  KVM/arm/arm64/vgic-new: Convert to hotplug state machine
  smp/cfd: Convert core to hotplug state machine
  x86/x2apic: Convert to CPU hotplug state machine
  profile: Convert to hotplug state machine
  timers/core: Convert to hotplug state machine
  hrtimer: Convert to hotplug state machine
  x86/tboot: Convert to hotplug state machine
  arm64/armv8 deprecated: Convert to hotplug state machine
  hwtracing/coresight-etm4x: Convert to hotplug state machine
  ...
2016-07-29 13:55:30 -07:00
Linus Torvalds
c86ad14d30 Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
 "The locking tree was busier in this cycle than the usual pattern - a
  couple of major projects happened to coincide.

  The main changes are:

   - implement the atomic_fetch_{add,sub,and,or,xor}() API natively
     across all SMP architectures (Peter Zijlstra)

   - add atomic_fetch_{inc/dec}() as well, using the generic primitives
     (Davidlohr Bueso)

   - optimize various aspects of rwsems (Jason Low, Davidlohr Bueso,
     Waiman Long)

   - optimize smp_cond_load_acquire() on arm64 and implement LSE based
     atomic{,64}_fetch_{add,sub,and,andnot,or,xor}{,_relaxed,_acquire,_release}()
     on arm64 (Will Deacon)

   - introduce smp_acquire__after_ctrl_dep() and fix various barrier
     mis-uses and bugs (Peter Zijlstra)

   - after discovering ancient spin_unlock_wait() barrier bugs in its
     implementation and usage, strengthen its semantics and update/fix
     usage sites (Peter Zijlstra)

   - optimize mutex_trylock() fastpath (Peter Zijlstra)

   - ... misc fixes and cleanups"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (67 commits)
  locking/atomic: Introduce inc/dec variants for the atomic_fetch_$op() API
  locking/barriers, arch/arm64: Implement LDXR+WFE based smp_cond_load_acquire()
  locking/static_keys: Fix non static symbol Sparse warning
  locking/qspinlock: Use __this_cpu_dec() instead of full-blown this_cpu_dec()
  locking/atomic, arch/tile: Fix tilepro build
  locking/atomic, arch/m68k: Remove comment
  locking/atomic, arch/arc: Fix build
  locking/Documentation: Clarify limited control-dependency scope
  locking/atomic, arch/rwsem: Employ atomic_long_fetch_add()
  locking/atomic, arch/qrwlock: Employ atomic_fetch_add_acquire()
  locking/atomic, arch/mips: Convert to _relaxed atomics
  locking/atomic, arch/alpha: Convert to _relaxed atomics
  locking/atomic: Remove the deprecated atomic_{set,clear}_mask() functions
  locking/atomic: Remove linux/atomic.h:atomic_fetch_or()
  locking/atomic: Implement atomic{,64,_long}_fetch_{add,sub,and,andnot,or,xor}{,_relaxed,_acquire,_release}()
  locking/atomic: Fix atomic64_relaxed() bits
  locking/atomic, arch/xtensa: Implement atomic_fetch_{add,sub,and,or,xor}()
  locking/atomic, arch/x86: Implement atomic{,64}_fetch_{add,sub,and,or,xor}()
  locking/atomic, arch/tile: Implement atomic{,64}_fetch_{add,sub,and,or,xor}()
  locking/atomic, arch/sparc: Implement atomic{,64}_fetch_{add,sub,and,or,xor}()
  ...
2016-07-25 12:41:29 -07:00
Thomas Gleixner
4df8374254 rcu: Convert rcutree to hotplug state machine
Straightforward conversion to the state machine, though the question arises
whether it really needs all these state transitions to work.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160713153337.982013161@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-15 10:41:44 +02:00
Paul E. McKenney
4d03754f04 Merge branches 'doc.2016.06.15a', 'fixes.2016.06.15b' and 'torture.2016.06.14a' into HEAD
doc.2016.06.15a: Documentation updates
fixes.2016.06.15b: Documentation updates
torture.2016.06.14a: Documentation updates
2016-06-15 16:58:03 -07:00
Mark Rutland
bc75e99983 rcu: Correctly handle sparse possible cpus
In many cases in the RCU tree code, we iterate over the set of cpus for
a leaf node described by rcu_node::grplo and rcu_node::grphi, checking
per-cpu data for each cpu in this range. However, if the set of possible
cpus is sparse, some cpus described in this range are not possible, and
thus no per-cpu region will have been allocated (or initialised) for
them by the generic percpu code.

Erroneous accesses to a per-cpu area for these !possible cpus may fault
or may hit other data depending on the address generated when the
erroneous per-cpu offset is applied. In practice, both cases have been
observed on arm64 hardware (the former being silent, but detectable with
additional patches).

To avoid issues resulting from this, we must iterate over the set of
*possible* cpus for a given leaf node. This patch adds a new helper,
for_each_leaf_node_possible_cpu, to enable this. As iteration is often
intertwined with rcu_node local bitmask manipulation, a new
leaf_node_cpu_bit helper is added to make this simpler and more
consistent. The RCU tree code is made to use both of these where
appropriate.

Without this patch, running reboot at a shell can result in an oops
like:

[ 3369.075979] Unable to handle kernel paging request at virtual address ffffff8008b21b4c
[ 3369.083881] pgd = ffffffc3ecdda000
[ 3369.087270] [ffffff8008b21b4c] *pgd=00000083eca48003, *pud=00000083eca48003, *pmd=0000000000000000
[ 3369.096222] Internal error: Oops: 96000007 [#1] PREEMPT SMP
[ 3369.101781] Modules linked in:
[ 3369.104825] CPU: 2 PID: 1817 Comm: NetworkManager Tainted: G        W       4.6.0+ #3
[ 3369.121239] task: ffffffc0fa13e000 ti: ffffffc3eb940000 task.ti: ffffffc3eb940000
[ 3369.128708] PC is at sync_rcu_exp_select_cpus+0x188/0x510
[ 3369.134094] LR is at sync_rcu_exp_select_cpus+0x104/0x510
[ 3369.139479] pc : [<ffffff80081109a8>] lr : [<ffffff8008110924>] pstate: 200001c5
[ 3369.146860] sp : ffffffc3eb9435a0
[ 3369.150162] x29: ffffffc3eb9435a0 x28: ffffff8008be4f88
[ 3369.155465] x27: ffffff8008b66c80 x26: ffffffc3eceb2600
[ 3369.160767] x25: 0000000000000001 x24: ffffff8008be4f88
[ 3369.166070] x23: ffffff8008b51c3c x22: ffffff8008b66c80
[ 3369.171371] x21: 0000000000000001 x20: ffffff8008b21b40
[ 3369.176673] x19: ffffff8008b66c80 x18: 0000000000000000
[ 3369.181975] x17: 0000007fa951a010 x16: ffffff80086a30f0
[ 3369.187278] x15: 0000007fa9505590 x14: 0000000000000000
[ 3369.192580] x13: ffffff8008b51000 x12: ffffffc3eb940000
[ 3369.197882] x11: 0000000000000006 x10: ffffff8008b51b78
[ 3369.203184] x9 : 0000000000000001 x8 : ffffff8008be4000
[ 3369.208486] x7 : ffffff8008b21b40 x6 : 0000000000001003
[ 3369.213788] x5 : 0000000000000000 x4 : ffffff8008b27280
[ 3369.219090] x3 : ffffff8008b21b4c x2 : 0000000000000001
[ 3369.224406] x1 : 0000000000000001 x0 : 0000000000000140
...
[ 3369.972257] [<ffffff80081109a8>] sync_rcu_exp_select_cpus+0x188/0x510
[ 3369.978685] [<ffffff80081128b4>] synchronize_rcu_expedited+0x64/0xa8
[ 3369.985026] [<ffffff80086b987c>] synchronize_net+0x24/0x30
[ 3369.990499] [<ffffff80086ddb54>] dev_deactivate_many+0x28c/0x298
[ 3369.996493] [<ffffff80086b6bb8>] __dev_close_many+0x60/0xd0
[ 3370.002052] [<ffffff80086b6d48>] __dev_close+0x28/0x40
[ 3370.007178] [<ffffff80086bf62c>] __dev_change_flags+0x8c/0x158
[ 3370.012999] [<ffffff80086bf718>] dev_change_flags+0x20/0x60
[ 3370.018558] [<ffffff80086cf7f0>] do_setlink+0x288/0x918
[ 3370.023771] [<ffffff80086d0798>] rtnl_newlink+0x398/0x6a8
[ 3370.029158] [<ffffff80086cee84>] rtnetlink_rcv_msg+0xe4/0x220
[ 3370.034891] [<ffffff80086e274c>] netlink_rcv_skb+0xc4/0xf8
[ 3370.040364] [<ffffff80086ced8c>] rtnetlink_rcv+0x2c/0x40
[ 3370.045663] [<ffffff80086e1fe8>] netlink_unicast+0x160/0x238
[ 3370.051309] [<ffffff80086e24b8>] netlink_sendmsg+0x2f0/0x358
[ 3370.056956] [<ffffff80086a0070>] sock_sendmsg+0x18/0x30
[ 3370.062168] [<ffffff80086a21cc>] ___sys_sendmsg+0x26c/0x280
[ 3370.067728] [<ffffff80086a30ac>] __sys_sendmsg+0x44/0x88
[ 3370.073027] [<ffffff80086a3100>] SyS_sendmsg+0x10/0x20
[ 3370.078153] [<ffffff8008085e70>] el0_svc_naked+0x24/0x28

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: Dennis Chen <dennis.chen@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Steve Capper <steve.capper@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-06-15 16:00:05 -07:00
Daniel Bristot de Oliveira
088e9d253d rcu: sysctl: Panic on RCU Stall
It is not always easy to determine the cause of an RCU stall just by
analysing the RCU stall messages, mainly when the problem is caused
by the indirect starvation of rcu threads. For example, when preempt_rcu
is not awakened due to the starvation of a timer softirq.

We have been hard coding panic() in the RCU stall functions for
some time while testing the kernel-rt. But this is not possible in
some scenarios, like when supporting customers.

This patch implements the sysctl kernel.panic_on_rcu_stall. If
set to 1, the system will panic() when an RCU stall takes place,
enabling the capture of a vmcore. The vmcore provides a way to analyze
all kernel/tasks states, helping out to point to the culprit and the
solution for the stall.

The kernel.panic_on_rcu_stall sysctl is disabled by default.

Changes from v1:
- Fixed a typo in the git log
- The if(sysctl_panic_on_rcu_stall) panic() is in a static function
- Fixed the CONFIG_TINY_RCU compilation issue
- The var sysctl_panic_on_rcu_stall is now __read_mostly
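
A hedged sketch of the mechanism described above; the call site mentioned in
the comment is illustrative:

  int sysctl_panic_on_rcu_stall __read_mostly;

  /* Invoked from the RCU CPU stall-warning path once a stall is reported. */
  static void panic_on_rcu_stall(void)
  {
          if (sysctl_panic_on_rcu_stall)
                  panic("RCU Stall\n");   /* capture a vmcore for later analysis */
  }

It can then be enabled at run time with, for example,
"sysctl -w kernel.panic_on_rcu_stall=1".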

Cc: Jonathan Corbet <corbet@lwn.net>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Reviewed-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Tested-by: "Luis Claudio R. Goncalves" <lgoncalv@redhat.com>
Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-06-15 16:00:05 -07:00
Paul E. McKenney
aab057382c rcu: Fix a typo in a comment
In the area in hot pursuit of a bug, so might as well clean it up.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-06-15 15:59:10 -07:00
Paul E. McKenney
4929c913bd rcu: Make call_rcu_tasks() tolerate first call with irqs disabled
Currently, if the very first call to call_rcu_tasks() has irqs disabled,
it will create the rcu_tasks_kthread with irqs disabled, which will
result in a splat in the memory allocator, which kthread_run() invokes
with the expectation that irqs are enabled.

This commit fixes this problem by deferring kthread creation if called
with irqs disabled.  The first call to call_rcu_tasks() that has irqs
enabled will create the kthread.

This bug was detected by rcutorture changes that were motivated by
Iftekhar Ahmed's mutation-testing efforts.
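
A hedged sketch of the deferral; identifiers are illustrative and the upstream
code folds this into its existing spawn path:

  static struct task_struct *rcu_tasks_kthread_ptr;

  static void rcu_spawn_tasks_kthread(void)
  {
          static DEFINE_MUTEX(spawn_mutex);
          struct task_struct *t;

          if (irqs_disabled())
                  return;   /* defer: a later irqs-enabled caller will spawn it */
          mutex_lock(&spawn_mutex);
          if (!rcu_tasks_kthread_ptr) {
                  t = kthread_run(rcu_tasks_kthread, NULL, "rcu_tasks_kthread");
                  if (!IS_ERR(t))
                          rcu_tasks_kthread_ptr = t;
          }
          mutex_unlock(&spawn_mutex);
  }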

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-06-15 15:45:00 -07:00
Wei Yongjun
05dbbfe753 rcutorture: Fix error return code in rcu_perf_init()
Fix rcu_perf_init() to return the negative error code -ENOMEM in the
kcalloc() error-handling case instead of 0, as is done elsewhere in this
function.
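
A sketch of the corrected error path; variable names follow rcuperf's style
but are best treated as illustrative:

  reader_tasks = kcalloc(nrealreaders, sizeof(reader_tasks[0]), GFP_KERNEL);
  if (reader_tasks == NULL) {
          VERBOSE_TOROUT_ERRSTRING("out of memory");
          firsterr = -ENOMEM;   /* was previously left at 0, masking the failure */
          goto unwind;
  }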

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-06-14 16:03:32 -07:00
Boqun Feng
af06d4f74a rcuperf: Don't treat gp_exp mis-setting as a WARN
0day found a boot warning triggered in rcu_perf_writer() on !SMP kernel:

	WARN_ON(rcu_gp_is_normal() && gp_exp);

The root cause is trying to measure expedited grace periods (by setting
gp_exp to true by default) when all the grace periods are normal (Tiny RCU
only has normal grace periods).

However, such a mis-setting would only result in failing to measure the
performance of a specific kind of grace period, so using a WARN_ON() to
check for it is overkill. We can handle this inside the rcuperf module via
error messages that tell users about the mis-setting.

Therefore this patch removes the WARN_ON() in rcu_perf_writer() and
handles those checks in rcu_perf_init() with plain if() code.

Moreover, this patch changes the default value of gp_exp in order to
(1) align with the rcutorture tests and (2) make the default setting work
for all RCU implementations.
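
A hedged sketch of the replacement check at init time; the message text is
illustrative, but the condition mirrors the WARN_ON() quoted above:

  if (rcu_gp_is_normal() && gp_exp)
          pr_alert("%s: gp_exp requested, but expedited grace periods are forced to normal, so skipping that measurement.\n",
                   perf_type);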

Suggested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Fixes: http://lkml.kernel.org/r/57411b10.mFvG0+AgcrMXGtcj%fengguang.wu@intel.com
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-06-14 16:03:31 -07:00
Paul E. McKenney
4e9a073f60 torture: Remove CONFIG_RCU_TORTURE_TEST_RUNNABLE, simplify code
This commit removes CONFIG_RCU_TORTURE_TEST_RUNNABLE in favor of the
already-existing rcutorture.torture_runnable kernel boot parameter.
It also converts an #ifdef into IS_ENABLED(), saving a few lines of code.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-06-14 16:02:15 -07:00
Paul E. McKenney
f8cbdee99b torture: Simplify code, eliminate RCU_PERF_TEST_RUNNABLE
This commit applies the infamous IS_ENABLED() macro to eliminate an #ifdef.
It also eliminates the RCU_PERF_TEST_RUNNABLE Kconfig option in favor
of the already-existing rcuperf.perf_runnable kernel boot parameter.
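
A generic illustration of the IS_ENABLED() idiom these two commits rely on;
CONFIG_FOO is a placeholder, not the actual option name:

  /* Before: */
  #ifdef CONFIG_FOO
  static bool foo_enabled = true;
  #else
  static bool foo_enabled;
  #endif

  /* After: IS_ENABLED() collapses the #ifdef into an ordinary C expression. */
  static bool foo_enabled = IS_ENABLED(CONFIG_FOO);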

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-06-14 16:02:15 -07:00
Paul E. McKenney
40e0a6cfd5 rcu: Move expedited code from tree_plugin.h to tree_exp.h
People have been having some difficulty finding their way around the
RCU code.  This commit therefore pulls some of the expedited grace-period
code from tree_plugin.h to a new tree_exp.h file.  This commit is strictly
code movement.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-06-14 16:01:42 -07:00
Paul E. McKenney
3549c2bc2c rcu: Move expedited code from tree.c to tree_exp.h
People have been having some difficulty finding their way around the
RCU code.  This commit therefore pulls some of the expedited grace-period
code from tree.c to a new tree_exp.h file.  This commit is strictly code
movement, with the exception of a forward declaration that was added
for the sync_sched_exp_online_cleanup() function.

A subsequent commit will move the remaining expedited grace-period code
from tree_plugin.h to tree_exp.h.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-06-14 16:01:41 -07:00
Peter Zijlstra
d3acab65f2 rcu: Remove some superfluous lines
I think you'll find this condition is superfluous, as the whole function
is under an #ifdef of that same condition.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-06-14 16:01:41 -07:00
Paul E. McKenney
590d1757b9 rcu: Fix outdated hotplug-exclusion comment in rcu_gp_init()
In the past, RCU grace-period initialization excluded CPU-hotplug
operations, but this is no longer the case.  This commit therefore
removed an outdated comment in rcu_gp_init() claiming that these
are excluded.

Reported-by: Lihao Liang <lihao.liang@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-06-14 16:01:40 -07:00
Paul E. McKenney
0d95092ccb rcu: Fix outdated rcu_scheduler_active comment
The comment header for rcu_scheduler_active states that it is used
to optimize synchronize_sched() at early boot.  This is incorrect.
The synchronize_sched() function instead checks the number of online
CPUs.  This commit therefore replaces the comment's synchronize_sched()
with synchronize_rcu(), which really does use rcu_scheduler_active for
this purpose.

Reported-by: Lihao Liang <lihao.liang@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-06-14 16:01:39 -07:00
Peter Zijlstra
6428671bae locking/mutex: Optimize mutex_trylock() fast-path
A while back Viro posted a number of 'interesting' mutex_is_locked()
users on IRC, one of those was RCU.

RCU seems to use mutex_is_locked() to avoid doing mutex_trylock(), the
regular load-before-modify pattern.

While the use isn't wrong per se, it's curious in that it's needed at all;
mutex_trylock() should be good enough on its own to avoid the pointless
cacheline bounces.

So fix those and remove the mutex_is_locked() (ab)use from RCU.
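
A sketch of the anti-pattern and its removal; the lock and helper names are
illustrative:

  static DEFINE_MUTEX(some_lock);

  /* Before: the extra read only adds cacheline bounces and buys nothing. */
  if (!mutex_is_locked(&some_lock) && mutex_trylock(&some_lock))
          do_work_then_unlock();

  /* After: mutex_trylock() alone is sufficient. */
  if (mutex_trylock(&some_lock))
          do_work_then_unlock();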

Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Paul McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <Waiman.Long@hpe.com>
Link: http://lkml.kernel.org/r/20160601185815.GW3190@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-08 15:17:01 +02:00
Du, Changbin
b9fdac7f66 debugobjects: insulate non-fixup logic related to static obj from fixup callbacks
When activating a static object we need to make sure that the object is
tracked in the object tracker.  If it is a non-static object, then the
activation is illegal.

In the previous implementation, each subsystem needed to take care of this
in its fixup callbacks.  Actually we can put it into the debugobjects core,
saving duplicated code and leaving *pure* fixup callbacks.

To achieve this, a new callback "is_static_object" is introduced to let the
type-specific code decide whether an object is static or not.  If it is, we
take it into the object tracker; otherwise we give a warning and invoke the
fixup callback.

This change has passed the debugobjects selftest, and I also did some
testing with all debugobjects support enabled.

Finally, I have a concern about the fixups: can a fixup change an object
that is in an incorrect state?  Because 'addr' may not point to any valid
object if a non-static object is not tracked, changing such an object can
overwrite someone else's memory and cause unexpected behaviour.  For
example, timer_fixup_activate() binds the timer to the function
stub_timer().

Link: http://lkml.kernel.org/r/1462576157-14539-1-git-send-email-changbin.du@intel.com
[changbin.du@intel.com: improve code comments where invoke the new is_static_object callback]
  Link: http://lkml.kernel.org/r/1462777431-8171-1-git-send-email-changbin.du@intel.com
Signed-off-by: Du, Changbin <changbin.du@intel.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Triplett <josh@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-19 19:12:14 -07:00
Du, Changbin
3263d28eb5 rcu: update debugobjects fixup callbacks return type
Update the return type to use bool instead of int, corresponding to the
change (debugobjects: make fixup functions return bool instead of int).

Signed-off-by: Du, Changbin <changbin.du@intel.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Triplett <josh@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-19 19:12:14 -07:00
Paul E. McKenney
dcd36d01fb Merge branches 'doc.2016.04.19a', 'exp.2016.03.31d', 'fixes.2016.03.31d' and 'torture.2016.04.21a' into HEAD
doc.2016.04.19a: Documentation updates
exp.2016.03.31d: Expedited grace-period updates
fixes.2016.03.31d: Miscellaneous fixes
torture.2016.04.21a: Torture-test updates
2016-04-21 13:48:20 -07:00
Paul E. McKenney
0aa67e75b3 rcutorture: Add irqs-disabled test for call_rcu()
Mutation testing carried out by Iftekhar Ahmed of Oregon State
University showed that rcutorture is failing to test invocations
of call_rcu() having interrupts disabled.  This commit therefore
adds interrupt disabling around one of the existing invocations
of call_rcu() (and friends).
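
A hedged sketch of the added coverage; the callback and rcu_head are
illustrative, and rcutorture threads this through its existing callback
machinery rather than a standalone helper:

  static void test_cb(struct rcu_head *rhp)
  {
          /* The callback body is irrelevant to this test. */
  }

  static struct rcu_head test_rh;

  static void exercise_call_rcu_irqs_off(void)
  {
          unsigned long flags;

          local_irq_save(flags);
          call_rcu(&test_rh, test_cb);    /* invocation with interrupts disabled */
          local_irq_restore(flags);
  }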

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-04-21 13:47:04 -07:00
Anna-Maria Gleixner
de26ca19a5 rcutorture: Consider FROZEN hotplug notifier transitions
The hotplug notifier rcutorture_cpu_notify() doesn't consider the
corresponding CPU_XXX_FROZEN transitions. They occur on
suspend/resume and are usually handled the same way as the
corresponding non-frozen transitions.

Mask the switch case action argument with '~CPU_TASKS_FROZEN' to map
CPU_XXX_FROZEN hotplug transitions on corresponding non-frozen
transitions.
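
A sketch of the masking; the callback and helper names approximate
rcutorture's but should be treated as illustrative:

  static int rcutorture_cpu_notify(struct notifier_block *self,
                                   unsigned long action, void *hcpu)
  {
          long cpu = (long)hcpu;

          switch (action & ~CPU_TASKS_FROZEN) {  /* fold _FROZEN onto the base action */
          case CPU_ONLINE:
          case CPU_DOWN_FAILED:
                  (void)rcutorture_booster_init(cpu);
                  break;
          case CPU_DOWN_PREPARE:
                  rcutorture_booster_cleanup(cpu);
                  break;
          default:
                  break;
          }
          return NOTIFY_OK;
  }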

Cc: Josh Triplett <josh@joshtriplett.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-03-31 13:39:52 -07:00
Paul E. McKenney
67522beecf rcutorture: Remove redundant initialization to zero
The current code initializes the global per-CPU variables
rcu_torture_count and rcu_torture_batch to zero.  However, C does this
initialization by default, and explicit initialization of per-CPU
variables now needs a different syntax if "make tags" is to work.
This commit therefore removes the initialization.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-03-31 13:39:51 -07:00
Artem Savkov
e6fb1fc108 rcuperf: Do not wake up shutdown wait queue if "shutdown" is false.
After finishing its tests rcuperf tries to wake up shutdown_wq even if
"shutdown" param is set to false, resulting in a wake_up() call on an
uninitialized wait_queue_head_t, which leads to "BUG: spinlock bad magic" and
"BUG: unable to handle kernel NULL pointer dereference".

Fix by checking "shutdown" param before waking up the queue.
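
The fix amounts to guarding the wakeup (sketch):

  if (shutdown)
          wake_up(&shutdown_wq);   /* shutdown_wq is only initialized when shutdown=1 */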

Signed-off-by: Artem Savkov <artem.savkov@gmail.com>
2016-03-31 13:39:51 -07:00
Paul E. McKenney
620316e52a rcutorture: Avoid RCU CPU stall warning and RT throttling
Running rcuperf can result in RCU CPU stall warnings and RT throttling.
These occur because one of the real-time writer processes does
ftrace_dump() while still running at real-time priority.  This commit
therefore prevents these problems by setting the writer thread back to
SCHED_NORMAL (AKA SCHED_OTHER) before doing ftrace_dump().

In addition, this commit adds a small fixed delay before dumping ftrace
buffer in order to decrease the probability that this dumping will
interfere with other writers' grace periods.
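
A hedged sketch of the end-of-run sequence; the delay length and the exact
placement are illustrative:

  struct sched_param sp = { .sched_priority = 0 };

  sched_setscheduler_nocheck(current, SCHED_NORMAL, &sp);  /* drop RT priority */
  schedule_timeout_uninterruptible(HZ / 10);               /* small fixed delay */
  ftrace_dump(DUMP_ALL);                                   /* dump without RT throttling */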

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-03-31 13:39:47 -07:00
Paul E. McKenney
df37e66bfd rcutorture: Add rcuperf holdoff boot parameter to reduce interference
Boot-time activity can legitimately grab CPUs for extended time periods,
so this commit adds a boot parameter to delay the start of the performance
test until boot has completed.  The default is 10 seconds.
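
A hedged sketch of the parameter and its use at the start of the measurement
kthread; torture_param() is the module-parameter helper already used by these
test modules:

  torture_param(int, holdoff, 10, "Holdoff time before test start (s)");

  static int rcu_perf_writer(void *arg)
  {
          if (holdoff)
                  schedule_timeout_uninterruptible(holdoff * HZ);  /* let boot settle */

          /* ... run the grace-period performance measurement ... */
          return 0;
  }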

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-03-31 13:38:58 -07:00
Paul E. McKenney
ac2bb275e8 rcutorture: Make rcuperf collect expedited event-trace data
This commit enables ftrace in the rcuperf TREE kernel build and adds
an ftrace_dump() at the end of rcuperf processing.  This data will be
used to measure the actual durations of the expedited grace periods
without the added delays inherent in the kernel-module measurements.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-03-31 13:38:53 -07:00
Paul E. McKenney
2094c99558 rcutorture: Set rcuperf writer kthreads to real-time priority
This commit forces more deterministic update-side behavior by setting
rcuperf's rcu_perf_writer() kthreads to real-time priority.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-03-31 13:37:39 -07:00
Paul E. McKenney
6b558c4c7a rcutorture: Bind rcuperf reader/writer kthreads to CPUs
This commit forces more deterministic behavior by binding rcuperf's
rcu_perf_reader() and rcu_perf_writer() kthreads to their respective
CPUs.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-03-31 13:37:39 -07:00
Paul E. McKenney
8704baab9b rcutorture: Add RCU grace-period performance tests
This commit adds a new rcuperf module that carries out simple performance
tests of RCU grace periods.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-03-31 13:37:38 -07:00
Paul E. McKenney
291783b8ad rcutorture: Expedited-GP batch progress access to torturing
This commit provides rcu_exp_batches_completed() and
rcu_exp_batches_completed_sched() functions to allow torture-test modules
to check how many expedited grace period batches have completed.
These are analogous to the existing rcu_batches_completed(),
rcu_batches_completed_bh(), and rcu_batches_completed_sched() functions.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-03-31 13:37:37 -07:00
Paul E. McKenney
9efafb8849 rcutorture: Allow for rcupdate.rcu_normal
Currently, rcu_torture_writer() checks only for rcu_gp_is_expedited()
when deciding whether or not to do dynamic control of RCU expediting.
This means that if rcupdate.rcu_normal is specified, rcu_torture_writer()
will attempt to dynamically control RCU expediting, but will nonetheless
only test normal RCU grace periods.  This commit therefore adds a check
for !rcu_gp_is_normal(), and prints a message and desists from testing
dynamic control of RCU expediting when doing so is futile.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-03-31 13:37:37 -07:00
Paul E. McKenney
5dffed1e57 rcu: Dump ftrace buffer when kicking grace-period kthread
If it is necessary to kick the grace-period kthread, that is a good
time to dump the trace buffer in order to learn why kicking was needed.
This commit therefore does the dump.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-03-31 13:36:37 -07:00
Boqun Feng
293e2421fe rcu: Remove superfluous versions of rcu_read_lock_sched_held()
Currently, we have four versions of rcu_read_lock_sched_held(), depending
on the combined choices on PREEMPT_COUNT and DEBUG_LOCK_ALLOC.  However,
there is an existing function preemptible() that already distinguishes
between the PREEMPT_COUNT=y and PREEMPT_COUNT=n cases, and allows these
four implementations to be consolidated down to two.

This commit therefore uses preemptible() to achieve this consolidation.

Note that there could be a small performance regression in the case
of CONFIG_DEBUG_LOCK_ALLOC=y && PREEMPT_COUNT=n.  However, given the
overhead associated with CONFIG_DEBUG_LOCK_ALLOC=y, this should be
down in the noise.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-03-31 13:34:50 -07:00
Paul E. McKenney
8c7c4829a8 rcu: Awaken grace-period kthread if too long since FQS
Recent kernels can fail to awaken the grace-period kthread for
quiescent-state forcing.  This commit is a crude hack that does
a wakeup if a scheduling-clock interrupt sees that it has been
too long since force-quiescent-state (FQS) processing.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-03-31 13:34:50 -07:00
Paul E. McKenney
fcfd0a237b rcu: Make FQS schedule advance only if FQS happened
Currently, the force-quiescent-state (FQS) code in rcu_gp_kthread() can
advance the next FQS even if one was not executed last time.  This can
happen due to timeout-duration uncertainty.  This commit therefore avoids
advancing the FQS schedule unless an FQS was just executed.  In the
corner case where an FQS was not executed, but is due now, the code does
a one-jiffy wait.

This change prepares for kthread kicking.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-03-31 13:34:49 -07:00
Paul E. McKenney
86057b80ae rcu: Awaken grace-period kthread when stalled
Recent kernels can fail to awaken the grace-period kthread for
quiescent-state forcing.  This commit is a crude hack that does
a wakeup any time a stall is detected.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-03-31 13:34:49 -07:00
Paul E. McKenney
3b5f668e71 rcu: Overlap wakeups with next expedited grace period
The current expedited grace-period implementation makes subsequent grace
periods wait on wakeups for the prior grace period.  This does not fit
the dictionary definition of "expedited", so this commit allows these two
phases to overlap.  Doing this requires four waitqueues rather than two
because tasks can now be waiting on the previous, current, and next grace
periods.  The fourth waitqueue makes the bit masking work out nicely.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-03-31 13:34:11 -07:00
Paul E. McKenney
aff12cdf86 rcu: Consolidate expedited GP code into exp_funnel_lock()
This commit pulls the grace-period-start counter adjustment and tracing
from synchronize_rcu_expedited() and synchronize_sched_expedited()
into exp_funnel_lock(), thus eliminating some code duplication.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-03-31 13:34:11 -07:00
Paul E. McKenney
179e5dcd1e rcu: Consolidate expedited GP tracing into rcu_exp_gp_seq_snap()
This commit moves some duplicate code from synchronize_rcu_expedited()
and synchronize_sched_expedited() into rcu_exp_gp_seq_snap().  This
doesn't save lines of code, but does eliminate a "tell me twice" issue.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-03-31 13:34:10 -07:00
Paul E. McKenney
4ea3e85b11 rcu: Consolidate expedited GP code into rcu_exp_wait_wake()
Currently, synchronize_rcu_expedited() and rcu_sched_expedited() have
significant duplicate code.  This commit therefore consolidates some of
this code into rcu_exp_wake(), which is now renamed to rcu_exp_wait_wake()
in recognition of its added responsibilities.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-03-31 13:34:10 -07:00
Paul E. McKenney
356051e1de rcu: Add exp_funnel_lock() fastpath
This commit speeds up the low-contention case, especially for systems
with large rcu_node trees, by attempting to directly acquire the
->exp_mutex.  This fastpath checks the leaves and root first in
order to avoid excessive memory contention on the mutex itself.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-03-31 13:34:09 -07:00
Paul E. McKenney
f6a12f34a4 rcu: Enforce expedited-GP fairness via funnel wait queue
The current mutex-based funnel-locking approach used by expedited grace
periods is subject to severe unfairness.  The problem arises when a
few tasks, making a path from leaves to root, all wake up before other
tasks do.  A new task can then follow this path all the way to the root,
which needlessly delays tasks whose grace period is done, but who do
not happen to acquire the lock quickly enough.

This commit avoids this problem by maintaining per-rcu_node wait queues,
along with a per-rcu_node counter that tracks the latest grace period
sought by an earlier task to visit this node.  If that grace period
would satisfy the current task, instead of proceeding up the tree,
it waits on the current rcu_node structure using a pair of wait queues
provided for that purpose.  This decouples awakening of old tasks from
the arrival of new tasks.

If the wakeups prove to be a bottleneck, additional kthreads can be
brought to bear for that purpose.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-03-31 13:34:08 -07:00
Paul E. McKenney
d40a4f09a4 rcu: Shorten expedited_workdone* to exp_workdone*
Just a name change to save a few lines and a bit of typing.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-03-31 13:34:08 -07:00
Paul E. McKenney
ec3833ed02 rcu: Force boolean subscript for expedited stall warnings
The cpu_online() function can return values other than 0 and 1, which
can result in subscript overflow when applied to a two-element array.
This commit allows for this behavior by using "!!" on the return value
from cpu_online() when used as a subscript.
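
A sketch of the idiom; the two-element array is illustrative:

  static unsigned long stall_count[2];    /* [0]: CPU offline, [1]: CPU online */

  stall_count[!!cpu_online(cpu)]++;       /* "!!" forces the subscript to 0 or 1 */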

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-03-31 13:34:07 -07:00
Paul E. McKenney
e2fd9d3584 rcu: Remove expedited GP funnel-lock bypass
Commit #cdacbe1f91264 ("rcu: Add fastpath bypassing funnel locking")
turns out to be a pessimization at high load because it forces a tree
full of tasks to wait for an expedited grace period that they probably
do not need.  This commit therefore removes this optimization.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-03-31 13:34:07 -07:00
Paul E. McKenney
4f41530245 rcu: Add expedited-grace-period event tracing
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-03-31 13:34:06 -07:00
Paul E. McKenney
bea2de44ae rcu: Add funnel-locking tracing for expedited grace periods
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-03-31 13:34:06 -07:00
Paul E. McKenney
26ece8ef6e rcu: Fix synchronize_rcu_expedited() header comment
This commit brings the synchronize_rcu_expedited() function's header
comment into line with the new implementation.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-03-31 13:34:04 -07:00
Paul E. McKenney
a1e1224849 rcu: Make cond_resched_rcu_qs() supply RCU-sched expedited QS
Although cond_resched_rcu_qs() supplies quiescent states to all flavors
of normal RCU grace periods, it does nothing for expedited RCU-sched
grace periods.  This commit therefore adds a check for a need for a
quiescent state from the current CPU by an expedited RCU-sched grace
period, and invokes rcu_sched_qs() to supply that quiescent state if so.

Note that the check is racy in that we might be migrated to some other
CPU just after checking the per-CPU variable.  This is OK because the
act of migration will do a context switch, which will supply the needed
quiescent state.  The only downside is that we might do an unnecessary
call to rcu_sched_qs(), but the probability is low and the overhead
is small.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-03-31 13:34:04 -07:00
Paul E. McKenney
251c617c75 rcu: Make expedited RCU-preempt stall warnings count accurately
Currently, synchronize_sched_expedited_wait() simply sets the ndetected
variable to the rcu_print_task_exp_stall() return value.  This means
that if the last rcu_node structure has no stalled tasks, record of
any stalled tasks in previous rcu_node structures is lost, which can
in turn result in failure to dump out the blocking rcu_node structures.
Or could, had the test been correct.

This commit therefore adds the return value of rcu_print_task_exp_stall()
to ndetected and corrects the later test for ndetected.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-03-31 13:34:03 -07:00
Paul E. McKenney
28728dd310 rcu: Make expedited RCU-sched grace period immediately detect idle
Currently, sync_sched_exp_handler() will force a reschedule unless
this CPU has already checked in or unless a reschedule has already
been called for.  This is clearly wasteful if sync_sched_exp_handler()
interrupted an idle CPU, so this commit immediately reports the
quiescent state in that case.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-03-31 13:34:03 -07:00
Paul E. McKenney
274529ba9b rcu: Consolidate dumping of ftrace buffer
This commit consolidates a couple definitions and several calls for
single-shot ftrace-buffer dumping.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-03-31 13:29:08 -07:00
Dmitry Vyukov
5c9a8750a6 kernel: add kcov code coverage
kcov provides code coverage collection for coverage-guided fuzzing
(randomized testing).  Coverage-guided fuzzing is a testing technique
that uses coverage feedback to determine new interesting inputs to a
system.  A notable user-space example is AFL
(http://lcamtuf.coredump.cx/afl/).  However, this technique is not
widely used for kernel testing due to missing compiler and kernel
support.

kcov does not aim to collect as much coverage as possible.  It aims to
collect more or less stable coverage that is a function of syscall inputs.
To achieve this goal it does not collect coverage in soft/hard
interrupts, and instrumentation of some inherently non-deterministic or
non-interesting parts of the kernel is disabled (e.g. scheduler, locking).

Currently there is a single coverage collection mode (tracing), but the
API anticipates additional collection modes.  Initially I also
implemented a second mode which exposes coverage in a fixed-size hash
table of counters (what Quentin used in his original patch).  I've
dropped the second mode for simplicity.

This patch adds the necessary support on the kernel side.  The complementary
compiler support was added in gcc revision 231296.

We've used this support to build syzkaller system call fuzzer, which has
found 90 kernel bugs in just 2 months:

  https://github.com/google/syzkaller/wiki/Found-Bugs

We've also found 30+ bugs in our internal systems with syzkaller.
Another (yet unexplored) direction where kcov coverage would greatly
help is more traditional "blob mutation".  For example, mounting a
random blob as a filesystem, or receiving a random blob over wire.

Why not gcov?  A typical fuzzing loop looks as follows: (1) reset
coverage, (2) execute a bit of code, (3) collect coverage, repeat.  A
typical coverage result can be just a dozen basic blocks (e.g. an invalid
input).  In such a context gcov becomes prohibitively expensive, as the
reset/collect coverage steps depend on the total number of basic
blocks/edges in the program (in the case of the kernel it is about 2M).
The cost of kcov depends only on the number of executed basic blocks/edges.
On top of that, the kernel requires per-thread coverage because there are
always background threads and unrelated processes that also produce coverage.
With inlined gcov instrumentation per-thread coverage is not possible.

kcov exposes kernel PCs and control flow to user-space which is
insecure.  But debugfs should not be mapped as user accessible.

Based on a patch by Quentin Casasnovas.
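
A condensed user-space sketch of how a fuzzer drives kcov through debugfs; the
ioctl numbers and buffer size follow the interface this patch introduces, but
the exact values should be treated as assumptions, and error handling is
omitted:

  #include <stdio.h>
  #include <fcntl.h>
  #include <sys/ioctl.h>
  #include <sys/mman.h>
  #include <unistd.h>

  #define KCOV_INIT_TRACE  _IOR('c', 1, unsigned long)
  #define KCOV_ENABLE      _IO('c', 100)
  #define KCOV_DISABLE     _IO('c', 101)
  #define COVER_SIZE       (64 << 10)      /* number of recorded PCs */

  int main(void)
  {
          int fd = open("/sys/kernel/debug/kcov", O_RDWR);
          unsigned long *cover, n, i;

          ioctl(fd, KCOV_INIT_TRACE, COVER_SIZE);
          cover = mmap(NULL, COVER_SIZE * sizeof(unsigned long),
                       PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
          ioctl(fd, KCOV_ENABLE, 0);
          __atomic_store_n(&cover[0], 0, __ATOMIC_RELAXED);   /* reset coverage */

          read(-1, NULL, 0);               /* the syscall under test */

          n = __atomic_load_n(&cover[0], __ATOMIC_RELAXED);
          for (i = 0; i < n; i++)
                  printf("0x%lx\n", cover[i + 1]);            /* covered kernel PCs */
          ioctl(fd, KCOV_DISABLE, 0);
          return 0;
  }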

[akpm@linux-foundation.org: make task_struct.kcov_mode have type `enum kcov_mode']
[akpm@linux-foundation.org: unbreak allmodconfig]
[akpm@linux-foundation.org: follow x86 Makefile layout standards]
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: syzkaller <syzkaller@googlegroups.com>
Cc: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Tavis Ormandy <taviso@google.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Kees Cook <keescook@google.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: David Drysdale <drysdale@google.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-22 15:36:02 -07:00
Linus Torvalds
271ecc5253 Merge branch 'akpm' (patches from Andrew)
Merge first patch-bomb from Andrew Morton:

 - some misc things

 - ocfs2 updates

 - about half of MM

 - checkpatch updates

 - autofs4 update

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (120 commits)
  autofs4: fix string.h include in auto_dev-ioctl.h
  autofs4: use pr_xxx() macros directly for logging
  autofs4: change log print macros to not insert newline
  autofs4: make autofs log prints consistent
  autofs4: fix some white space errors
  autofs4: fix invalid ioctl return in autofs4_root_ioctl_unlocked()
  autofs4: fix coding style line length in autofs4_wait()
  autofs4: fix coding style problem in autofs4_get_set_timeout()
  autofs4: coding style fixes
  autofs: show pipe inode in mount options
  kallsyms: add support for relative offsets in kallsyms address table
  kallsyms: don't overload absolute symbol type for percpu symbols
  x86: kallsyms: disable absolute percpu symbols on !SMP
  checkpatch: fix another left brace warning
  checkpatch: improve UNSPECIFIED_INT test for bare signed/unsigned uses
  checkpatch: warn on bare unsigned or signed declarations without int
  checkpatch: exclude asm volatile from complex macro check
  mm: memcontrol: drop unnecessary lru locking from mem_cgroup_migrate()
  mm: migrate: consolidate mem_cgroup_migrate() calls
  mm/compaction: speed up pageblock_pfn_to_page() when zone is contiguous
  ...
2016-03-16 11:51:08 -07:00
Peter Zijlstra
25528213fe tags: Fix DEFINE_PER_CPU expansions
$ make tags
  GEN     tags
ctags: Warning: drivers/acpi/processor_idle.c:64: null expansion of name pattern "\1"
ctags: Warning: drivers/xen/events/events_2l.c:41: null expansion of name pattern "\1"
ctags: Warning: kernel/locking/lockdep.c:151: null expansion of name pattern "\1"
ctags: Warning: kernel/rcu/rcutorture.c:133: null expansion of name pattern "\1"
ctags: Warning: kernel/rcu/rcutorture.c:135: null expansion of name pattern "\1"
ctags: Warning: kernel/workqueue.c:323: null expansion of name pattern "\1"
ctags: Warning: net/ipv4/syncookies.c:53: null expansion of name pattern "\1"
ctags: Warning: net/ipv6/syncookies.c:44: null expansion of name pattern "\1"
ctags: Warning: net/rds/page.c:45: null expansion of name pattern "\1"

Which are all the result of the DEFINE_PER_CPU pattern:

  scripts/tags.sh:200:	'/\<DEFINE_PER_CPU([^,]*, *\([[:alnum:]_]*\)/\1/v/'
  scripts/tags.sh:201:	'/\<DEFINE_PER_CPU_SHARED_ALIGNED([^,]*, *\([[:alnum:]_]*\)/\1/v/'

The below cures them. All except the workqueue one are within reasonable
distance of the 80 char limit. TJ, do you have any preference on how to
fix the wq one, or shall we just not care that it's too long?

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15 16:55:16 -07:00
Linus Torvalds
710d60cbf1 Merge branch 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull cpu hotplug updates from Thomas Gleixner:
 "This is the first part of the ongoing cpu hotplug rework:

   - Initial implementation of the state machine

   - Runs all online and prepare down callbacks on the plugged cpu and
     not on some random processor

   - Replaces busy loop waiting with completions

   - Adds tracepoints so the states can be followed"

More detailed commentary on this work from an earlier email:
 "What's wrong with the current cpu hotplug infrastructure?

   - Asymmetry

     The hotplug notifier mechanism is asymmetric versus the bringup and
     teardown.  This is mostly caused by the notifier mechanism.

   - Largely undocumented dependencies

     While some notifiers use explicitly defined notifier priorities,
     we have quite a few notifiers which use numerical priorities to
     express dependencies without any documentation as to why.

   - Control processor driven

     Most of the bringup/teardown of a cpu is driven by a control
     processor.  While it is understandable that preparatory steps,
     like idle thread creation, memory allocation for and initialization
     of essential facilities, need to be done before a cpu can boot,
     there is no reason why everything else must run on a control
     processor.  Before this patch series, bringup looks like this:

       Control CPU                     Booting CPU

       do preparatory steps
       kick cpu into life

                                       do low level init

       sync with booting cpu           sync with control cpu

       bring the rest up

   - All or nothing approach

     There is no way to do partial bringups.  That's something which is
     really desired because we waste, e.g. at boot, a substantial amount
     of time just busy waiting for the cpu to come to life.  That's stupid,
     as we could very well do the preparatory steps and the initial IPI for
     other cpus and then go back and do the necessary low level
     synchronization with the freshly booted cpu.

   - Minimal debuggability

     Due to the notifier based design, it's impossible to switch between
     two stages of the bringup/teardown back and forth in order to test
     the correctness.  So in many hotplug notifiers the cancel
     mechanisms are either nonexistent or completely untested.

   - Notifier [un]registering is tedious

     To [un]register notifiers we need to protect against hotplug at
     every callsite.  There is no mechanism to ensure that bringup/teardown
     callbacks are invoked on the already-online cpus, so every caller needs
     to do that itself.  That also includes error rollback.

  What's the new design?

     The base of the new design is a symmetric state machine, where both
     the control processor and the booting/dying cpu execute a well
     defined set of states.  Each state is symmetric in the end, except
     for some well defined exceptions, and the bringup/teardown can be
     stopped and reversed at almost all states.

     So the bringup of a cpu will look like this in the future:

       Control CPU                     Booting CPU

       do preparatory steps
       kick cpu into life

                                       do low level init

       sync with booting cpu           sync with control cpu

                                       bring itself up

     The synchronization step does not require the control cpu to wait.
     That mechanism can be done asynchronously via a worker or some
     other mechanism.

     The teardown can be made very similar, so that the dying cpu cleans
     up and brings itself down.  Cleanups which need to be done after
     the cpu is gone, can be scheduled asynchronously as well.

  There is a long way to go to get there, as we need to refactor the notion
  of when a cpu is available.  Today we set the cpu online right after it comes
  out of the low level bringup, which is not really correct.

  The proper mechanism is to set it to available, i.e. cpu local
  threads, like softirqd, hotplug thread etc. can be scheduled on that
  cpu, and once it has finished all booting steps, it's set to online, so
  general workloads can be scheduled on it.  The reverse happens on
  teardown.  The first thing to do is to forbid scheduling of general
  workloads, then tear down all the per-cpu resources and finally shut it
  off completely.

  This patch series implements the basic infrastructure for this at the
  core level.  This includes the following:

   - Basic state machine implementation with well defined states, so
     ordering and prioritization can be expressed.

   - Interfaces to [un]register state callbacks

     This invokes the bringup/teardown callback on all online cpus with
     the proper protection in place and [un]installs the callbacks in
     the state machine array.

     For callbacks which have no particular ordering requirement we have
     a dynamic state space, so that drivers don't have to register an
     explicit hotplug state.

     If a callback fails, the code automatically does a rollback to the
     previous state.

   - Sysfs interface to drive the state machine to a particular step.

     This is only partially functional today.  Full functionality and
     therefore testability will be achieved once we have converted all
     existing hotplug notifiers over to the new scheme.

   - Run all CPU_ONLINE/DOWN_PREPARE notifiers on the booting/dying
     processor:

       Control CPU                     Booting CPU

       do preparatory steps
       kick cpu into life

                                       do low level init

       sync with booting cpu           sync with control cpu
       wait for boot
                                       bring itself up

                                       Signal completion to control cpu

     In a previous step of this work we've done a full tree mechanical
     conversion of all hotplug notifiers to the new scheme.  The balance
     is a net removal of about 4000 lines of code.

     This is not included in this series, as we decided to take a
     different approach.  Instead of mechanically converting everything
     over, we will do a proper overhaul of the usage sites one by one so
     they nicely fit into the symmetric callback scheme.

     I decided to do that after I looked at the ugliness of some of the
     converted sites and figured out that their hotplug mechanism is
     completely buggered anyway.  So there is no point in doing a
     mechanical conversion first, as we need to go through the usage
     sites one by one again in order to achieve fully symmetric and
     testable behaviour"
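
As a rough sketch of the callback [un]registration interface described above
(per the "cpu/hotplug: Implement setup/removal interface" patch in this
series), a driver using the dynamic state space might look something like the
following; the driver name, the callbacks, and the exact dynamic state
constant are illustrative only:

        static int foo_cpu_online(unsigned int cpu)
        {
                /* Per-CPU bringup work for the hypothetical "foo" driver. */
                return 0;
        }

        static int foo_cpu_offline(unsigned int cpu)
        {
                /* Per-CPU teardown work, also used for automatic rollback. */
                return 0;
        }

        static int __init foo_init(void)
        {
                int ret;

                /* Dynamic state space: no explicit hotplug state needed. */
                ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "foo:online",
                                        foo_cpu_online, foo_cpu_offline);
                return ret < 0 ? ret : 0;
        }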

* 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
  cpu/hotplug: Document states better
  cpu/hotplug: Fix smpboot thread ordering
  cpu/hotplug: Remove redundant state check
  cpu/hotplug: Plug death reporting race
  rcu: Make CPU_DYING_IDLE an explicit call
  cpu/hotplug: Make wait for dead cpu completion based
  cpu/hotplug: Let upcoming cpu bring itself fully up
  arch/hotplug: Call into idle with a proper state
  cpu/hotplug: Move online calls to hotplugged cpu
  cpu/hotplug: Create hotplug threads
  cpu/hotplug: Split out the state walk into functions
  cpu/hotplug: Unpark smpboot threads from the state machine
  cpu/hotplug: Move scheduler cpu_online notifier to hotplug core
  cpu/hotplug: Implement setup/removal interface
  cpu/hotplug: Make target state writeable
  cpu/hotplug: Add sysfs state interface
  cpu/hotplug: Hand in target state to _cpu_up/down
  cpu/hotplug: Convert the hotplugged cpu work to a state machine
  cpu/hotplug: Convert to a state machine for the control processor
  cpu/hotplug: Add tracepoints
  ...
2016-03-15 13:50:29 -07:00
Ingo Molnar
670191572e Merge commit 'torture.2015.02.23a' into core/rcu
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-15 09:01:17 +01:00
Ingo Molnar
8bc6782fe2 Merge commit 'fixes.2015.02.23a' into core/rcu
Conflicts:
	kernel/rcu/tree.c

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-15 09:01:06 +01:00
Thomas Gleixner
27d50c7eeb rcu: Make CPU_DYING_IDLE an explicit call
Make the RCU CPU_DYING_IDLE callback an explicit function call, so it gets
invoked at the proper place.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: Rik van Riel <riel@redhat.com>
Cc: Rafael Wysocki <rafael.j.wysocki@intel.com>
Cc: "Srivatsa S. Bhat" <srivatsa@mit.edu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Turner <pjt@google.com>
Link: http://lkml.kernel.org/r/20160226182341.870167933@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-03-01 20:36:58 +01:00
Paul Gortmaker
abedf8e241 rcu: Use simple wait queues where possible in rcutree
As of commit dae6e64d2b ("rcu: Introduce proper blocking to no-CBs kthreads
GP waits") the RCU subsystem started making use of wait queues.

Here we convert all additions of RCU wait queues to use simple wait queues,
since they don't need the extra overhead of the full wait queue features.

Originally this was done for RT kernels[1], since we would get things like...

  BUG: sleeping function called from invalid context at kernel/rtmutex.c:659
  in_atomic(): 1, irqs_disabled(): 1, pid: 8, name: rcu_preempt
  Pid: 8, comm: rcu_preempt Not tainted
  Call Trace:
   [<ffffffff8106c8d0>] __might_sleep+0xd0/0xf0
   [<ffffffff817d77b4>] rt_spin_lock+0x24/0x50
   [<ffffffff8106fcf6>] __wake_up+0x36/0x70
   [<ffffffff810c4542>] rcu_gp_kthread+0x4d2/0x680
   [<ffffffff8105f910>] ? __init_waitqueue_head+0x50/0x50
   [<ffffffff810c4070>] ? rcu_gp_fqs+0x80/0x80
   [<ffffffff8105eabb>] kthread+0xdb/0xe0
   [<ffffffff8106b912>] ? finish_task_switch+0x52/0x100
   [<ffffffff817e0754>] kernel_thread_helper+0x4/0x10
   [<ffffffff8105e9e0>] ? __init_kthread_worker+0x60/0x60
   [<ffffffff817e0750>] ? gs_change+0xb/0xb

...and hence simple wait queues were deployed on RT out of necessity
(as simple wait uses a raw lock), but mainline might as well take
advantage of the more streamlined support as well.

[1] This is a carry forward of work from v3.10-rt; the original conversion
was by Thomas on an earlier -rt version, and Sebastian extended it to
additional post-3.10 added RCU waiters; here I've added a commit log and
unified the RCU changes into one, and uprev'd it to match mainline RCU.
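
For reference, the conversion is essentially a switch to the simple-wait-queue
API that this series builds on; a rough before/after sketch (queue and
condition names are illustrative, and the swait function names are those of
this era):

        /* Before: full-featured wait queue. */
        wait_queue_head_t gp_wq;

        init_waitqueue_head(&gp_wq);
        wait_event_interruptible(gp_wq, gp_condition());
        wake_up_all(&gp_wq);

        /* After: simple wait queue, which uses a raw spinlock internally. */
        struct swait_queue_head gp_swq;

        init_swait_queue_head(&gp_swq);
        swait_event_interruptible(gp_swq, gp_condition());
        swake_up_all(&gp_swq);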

Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: linux-rt-users@vger.kernel.org
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/1455871601-27484-6-git-send-email-wagi@monom.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-02-25 11:27:16 +01:00
Daniel Wagner
065bb78c5b rcu: Do not call rcu_nocb_gp_cleanup() while holding rnp->lock
rcu_nocb_gp_cleanup() is called while holding rnp->lock. Currently,
this is okay because the wake_up_all() in rcu_nocb_gp_cleanup() will
not enable the IRQs. lockdep is happy.

By switching over to swait this is not true anymore. swake_up_all()
enables the IRQs while processing the waiters. __do_softirq() can now
run and will eventually call rcu_process_callbacks(), which wants to
grab rnp->lock.

Let's move the rcu_nocb_gp_cleanup() call outside the lock before we
switch over to swait.
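
Roughly, the reordering looks like the sketch below: the wait queue to be
awakened is picked up under the lock, and the wakeup itself happens only
after the lock has been dropped (helper names illustrative):

        raw_spin_lock_irq(&rnp->lock);
        /* ... end-of-grace-period processing ... */
        sq = rcu_nocb_gp_get(rnp);      /* Remember which waiters to wake. */
        raw_spin_unlock_irq(&rnp->lock);
        rcu_nocb_gp_cleanup(sq);        /* Wakeups now run with rnp->lock dropped. */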

If we were to hold rnp->lock and use swait, lockdep would report the
following:

 =================================
 [ INFO: inconsistent lock state ]
 4.2.0-rc5-00025-g9a73ba0 #136 Not tainted
 ---------------------------------
 inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
 rcu_preempt/8 [HC0[0]:SC0[0]:HE1:SE1] takes:
  (rcu_node_1){+.?...}, at: [<ffffffff811387c7>] rcu_gp_kthread+0xb97/0xeb0
 {IN-SOFTIRQ-W} state was registered at:
   [<ffffffff81109b9f>] __lock_acquire+0xd5f/0x21e0
   [<ffffffff8110be0f>] lock_acquire+0xdf/0x2b0
   [<ffffffff81841cc9>] _raw_spin_lock_irqsave+0x59/0xa0
   [<ffffffff81136991>] rcu_process_callbacks+0x141/0x3c0
   [<ffffffff810b1a9d>] __do_softirq+0x14d/0x670
   [<ffffffff810b2214>] irq_exit+0x104/0x110
   [<ffffffff81844e96>] smp_apic_timer_interrupt+0x46/0x60
   [<ffffffff81842e70>] apic_timer_interrupt+0x70/0x80
   [<ffffffff810dba66>] rq_attach_root+0xa6/0x100
   [<ffffffff810dbc2d>] cpu_attach_domain+0x16d/0x650
   [<ffffffff810e4b42>] build_sched_domains+0x942/0xb00
   [<ffffffff821777c2>] sched_init_smp+0x509/0x5c1
   [<ffffffff821551e3>] kernel_init_freeable+0x172/0x28f
   [<ffffffff8182cdce>] kernel_init+0xe/0xe0
   [<ffffffff8184231f>] ret_from_fork+0x3f/0x70
 irq event stamp: 76
 hardirqs last  enabled at (75): [<ffffffff81841330>] _raw_spin_unlock_irq+0x30/0x60
 hardirqs last disabled at (76): [<ffffffff8184116f>] _raw_spin_lock_irq+0x1f/0x90
 softirqs last  enabled at (0): [<ffffffff810a8df2>] copy_process.part.26+0x602/0x1cf0
 softirqs last disabled at (0): [<          (null)>]           (null)
 other info that might help us debug this:
  Possible unsafe locking scenario:
        CPU0
        ----
   lock(rcu_node_1);
   <Interrupt>
     lock(rcu_node_1);
  *** DEADLOCK ***
 1 lock held by rcu_preempt/8:
  #0:  (rcu_node_1){+.?...}, at: [<ffffffff811387c7>] rcu_gp_kthread+0xb97/0xeb0
 stack backtrace:
 CPU: 0 PID: 8 Comm: rcu_preempt Not tainted 4.2.0-rc5-00025-g9a73ba0 #136
 Hardware name: Dell Inc. PowerEdge R820/066N7P, BIOS 2.0.20 01/16/2014
  0000000000000000 000000006d7e67d8 ffff881fb081fbd8 ffffffff818379e0
  0000000000000000 ffff881fb0812a00 ffff881fb081fc38 ffffffff8110813b
  0000000000000000 0000000000000001 ffff881f00000001 ffffffff8102fa4f
 Call Trace:
  [<ffffffff818379e0>] dump_stack+0x4f/0x7b
  [<ffffffff8110813b>] print_usage_bug+0x1db/0x1e0
  [<ffffffff8102fa4f>] ? save_stack_trace+0x2f/0x50
  [<ffffffff811087ad>] mark_lock+0x66d/0x6e0
  [<ffffffff81107790>] ? check_usage_forwards+0x150/0x150
  [<ffffffff81108898>] mark_held_locks+0x78/0xa0
  [<ffffffff81841330>] ? _raw_spin_unlock_irq+0x30/0x60
  [<ffffffff81108a28>] trace_hardirqs_on_caller+0x168/0x220
  [<ffffffff81108aed>] trace_hardirqs_on+0xd/0x10
  [<ffffffff81841330>] _raw_spin_unlock_irq+0x30/0x60
  [<ffffffff810fd1c7>] swake_up_all+0xb7/0xe0
  [<ffffffff811386e1>] rcu_gp_kthread+0xab1/0xeb0
  [<ffffffff811089bf>] ? trace_hardirqs_on_caller+0xff/0x220
  [<ffffffff81841341>] ? _raw_spin_unlock_irq+0x41/0x60
  [<ffffffff81137c30>] ? rcu_barrier+0x20/0x20
  [<ffffffff810d2014>] kthread+0x104/0x120
  [<ffffffff81841330>] ? _raw_spin_unlock_irq+0x30/0x60
  [<ffffffff810d1f10>] ? kthread_create_on_node+0x260/0x260
  [<ffffffff8184231f>] ret_from_fork+0x3f/0x70
  [<ffffffff810d1f10>] ? kthread_create_on_node+0x260/0x260

Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: linux-rt-users@vger.kernel.org
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/1455871601-27484-5-git-send-email-wagi@monom.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-02-25 11:27:16 +01:00
Paul E. McKenney
4f2a848c56 rcu: Export rcu_gp_is_normal()
This commit exports rcu_gp_is_normal() in order to allow it to be used
by rcutorture and rcuperf.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-02-23 20:04:51 -08:00
Paul E. McKenney
4b455dc3e1 rcu: Catch up rcu_report_qs_rdp() comment with reality
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-02-23 19:59:56 -08:00
Paul Gortmaker
9fc9204ef9 rcu: Make rcu/tiny_plugin.h explicitly non-modular
The Kconfig currently controlling compilation of this code is:

init/Kconfig:config TINY_RCU
init/Kconfig:   bool

...meaning that it currently is not being built as a module by anyone.

Let's remove the modular code that is essentially orphaned, so that
when reading the code there is no doubt it is builtin-only.

Since module_init translates to device_initcall in the non-modular
case, the init ordering remains unchanged with this commit.  We could
consider moving this to an earlier initcall (subsys?) if desired.

We also delete the MODULE_LICENSE tag etc. since all that information
is already contained at the top of the file in the comments.

Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-02-23 19:59:55 -08:00
Boqun Feng
67c583a7de RCU: Privatize rcu_node::lock
In patch:

"rcu: Add transitivity to remaining rcu_node ->lock acquisitions"

All locking operations on rcu_node::lock are replaced with the wrappers
because of the need for transitivity, which indicates that we should never
write code using LOCK primitives alone (i.e., without a proper barrier
following) on rcu_node::lock outside those wrappers.  We can detect
this kind of misuse of rcu_node::lock in the future by adding the
__private modifier to rcu_node::lock.

To privatize rcu_node::lock, unlock wrappers are also needed. Replacing
spinlock unlocks with these wrappers not only privatizes rcu_node::lock
but also makes it easier to figure out critical sections of rcu_node.

This patch adds the __private modifier to rcu_node::lock and makes every
access to it wrapped by ACCESS_PRIVATE().  In addition, unlock wrappers are
added and raw_spin_unlock(&rnp->lock) and its friends are replaced with
those wrappers.
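
A minimal sketch of the resulting pattern, assuming the __private sparse
annotation and the ACCESS_PRIVATE() helper (struct contents abbreviated):

        struct rcu_node {
                raw_spinlock_t __private lock;  /* Direct access now flagged by sparse. */
                /* ... */
        };

        /* Unlock wrapper: one of the few places allowed to name the lock directly. */
        #define raw_spin_unlock_rcu_node(rnp) \
                raw_spin_unlock(&ACCESS_PRIVATE(rnp, lock))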

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-02-23 19:59:54 -08:00
Chen Gang
1914aab543 rcu: Remove useless rcu_data_p when !PREEMPT_RCU
The related warning from gcc 6.0:

  In file included from kernel/rcu/tree.c:4630:0:
  kernel/rcu/tree_plugin.h:810:40: warning: ‘rcu_data_p’ defined but not used [-Wunused-const-variable]
   static struct rcu_data __percpu *const rcu_data_p = &rcu_sched_data;
                                          ^~~~~~~~~~

Also remove always redundant rcu_data_p in tree.c.

Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-02-23 19:59:53 -08:00
Paul E. McKenney
aa5a898876 rcutorture: Correct no-expedite console messages
The "Disabled dynamic grace-period expediting" console message is
currently printed unconditionally.  This commit causes it to be
output only when it is impossible to switch between normal and
expedited grace periods, which was the original intent.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-02-23 19:59:52 -08:00
Paul E. McKenney
23a9bacd35 rcu: Set rdp->gpwrap when CPU is idle
Commit #e3663b1024d1 ("rcu: Handle gpnum/completed wrap while dyntick
idle") sets rdp->gpwrap on the wrong side of the "if" statement in
dyntick_save_progress_counter(), that is, it sets it when the CPU is
not idle instead of when it is idle.  Of course, if the CPU is not idle,
its rdp->gpnum won't be lagging behind the global rsp->gpnum, which means
that rdp->gpwrap will never be set.

This commit therefore moves this code to the proper leg of that "if"
statement.  This change means that the "else" clause is just "return 0"
and the "then" clause ends with "return 1", so also move the "return 0"
to follow the "if", dropping the "else" clause.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-02-23 19:59:52 -08:00
Paul E. McKenney
4914950aaa rcu: Stop treating in-kernel CPU-bound workloads as errors
Commit 4a81e8328d ("Reduce overhead of cond_resched() checks for RCU")
handles the error case where a nohz_full CPU loops indefinitely in the kernel
with the scheduling-clock interrupt disabled.  However, this handling
includes IPIing the CPU running the offending loop, which is not what
we want for real-time workloads.  And there are starting to be real-time
CPU-bound in-kernel workloads, and these must be handled without IPIing
the CPU, at least not in the common case.  Therefore, this situation can
no longer be dismissed as an error case.

This commit therefore splits the handling out, so that the setting of
bits in the per-CPU rcu_sched_qs_mask variable is done relatively early,
but if the problem persists, resched_cpu() is eventually used to IPI the
CPU containing the offending loop.  Assuming that in-kernel CPU-bound
loops used by real-time tasks contain frequent calls to cond_resched_rcu_qs()
(as in more than once per few tens of milliseconds), the real-time tasks
will never be IPIed.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
2016-02-23 19:59:51 -08:00
Paul E. McKenney
8994515cf0 rcu: Update rcu_report_qs_rsp() comment
The header comment for rcu_report_qs_rsp() was obsolete, dating well
before the advent of RCU grace-period kthreads.  This commit therefore
brings this comment back into alignment with current reality.

Reported-by: Lihao Liang <lihao.liang@cs.ox.ac.uk>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-02-23 19:59:51 -08:00
Paul E. McKenney
bb53e416e0 rcu: Assign false instead of 0 for ->core_needs_qs
A zero seems to have escaped earlier true/false substitution efforts,
so this commit changes 0 to false for the ->core_needs_qs boolean field.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-02-23 19:59:51 -08:00
Paul E. McKenney
648c630c64 Merge branches 'doc.2015.12.05a', 'exp.2015.12.07a', 'fixes.2015.12.07a', 'list.2015.12.04b' and 'torture.2015.12.05a' into HEAD
doc.2015.12.05a:  Documentation updates
exp.2015.12.07a:  Expedited grace-period updates
fixes.2015.12.07a:  Miscellaneous fixes
list.2015.12.04b:  Linked-list updates
torture.2015.12.05a:  Torture-test updates
2015-12-07 17:02:54 -08:00
Paul E. McKenney
45fed3e7cf rcu: Make rcu_gp_init() be bool rather than int
The return value from rcu_gp_init() is always used as a bool, so
this commit makes it be a bool.

Reported-by: Iftekhar Ahmed <ahmedi@oregonstate.edu>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-12-07 17:01:33 -08:00
Peter Zijlstra
e11f13355b rcu: Move wakeup out from under rnp->lock
This patch removes a potential deadlock hazard by moving the
wake_up_process() in rcu_spawn_gp_kthread() out from under rnp->lock.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-12-07 17:01:32 -08:00
Paul E. McKenney
7c9906ca5e rcu: Don't redundantly disable irqs in rcu_irq_{enter,exit}()
This commit replaces a local_irq_save()/local_irq_restore() pair with
a lockdep assertion that interrupts are already disabled.  This should
remove the corresponding overhead from the interrupt entry/exit fastpaths.

This change was inspired by the fact that Iftekhar Ahmed's mutation
testing showed that removing rcu_irq_enter()'s call to local_irq_restore()
had no effect, which might indicate that interrupts were always disabled
anyway.
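
A rough before/after sketch of the change (bookkeeping elided; the exact
assertion wording is illustrative):

        /* Before: disable and re-enable interrupts around the update. */
        local_irq_save(flags);
        /* ... dyntick-idle bookkeeping ... */
        local_irq_restore(flags);

        /* After: assert that the caller has already disabled interrupts. */
        RCU_LOCKDEP_WARN(!irqs_disabled(),
                         "rcu_irq_enter() invoked with irqs enabled!!!");
        /* ... dyntick-idle bookkeeping ... */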

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-12-07 17:01:31 -08:00
Paul E. McKenney
d117c8aa1d rcu: Make cpu_needs_another_gp() be bool
The cpu_needs_another_gp() function is currently of type int, but only
returns zero or one.  Bow to reality and make it be of type bool.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-12-07 17:01:31 -08:00
Paul E. McKenney
a87f203e27 rcu: Eliminate unused rcu_init_one() argument
Now that the rcu_state structure's ->rda field is compile-time initialized,
there is no need to pass the per-CPU rcu_data structure into rcu_init_one().
This commit therefore eliminates this now-unused parameter.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-12-07 17:01:19 -08:00
Paul E. McKenney
79cfea0273 rcu: Remove TINY_RCU bloat from pointless boot parameters
The rcu_expedited, rcu_normal, and rcu_normal_after_boot kernel boot
parameters are pointless in the case of TINY_RCU because in that case
synchronous grace periods, both expedited and normal, are no-ops.
However, these three symbols contribute several hundred bytes of bloat.
This commit therefore uses CPP directives to avoid compiling this code
in TINY_RCU kernels.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-12-07 16:59:37 -08:00
Paul E. McKenney
6b50e119c4 rcutorture: Print symbolic name for ->gp_state
Currently, ->gp_state is printed as an integer, which slows debugging.
This commit therefore prints a symbolic name in addition to the integer.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Updated to fix relational operator called out by Dan Carpenter. ]
[ paulmck: More "const", as suggested by Josh Triplett. ]
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-12-05 17:58:26 -08:00
Paul E. McKenney
18aff33e73 rcutorture: Print symbolic name for rcu_torture_writer_state
Currently, rcu_torture_writer_state is printed as an integer, which slows
debugging.  This commit therefore prints a symbolic name in addition to
the integer.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: More "const", as suggested by Josh Triplett. ]
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-12-05 17:58:22 -08:00
Paul E. McKenney
b1adb3e273 rcutorture: Dump stack when GP kthread stalls
This commit increases debug information in the case where the grace-period
kthread is being prevented from running by dumping that kthread's stack.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Split into prior commit and this commit, as suggested by
  Josh Triplett. ]
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-12-05 17:58:05 -08:00
Paul E. McKenney
a0e3a3aa28 rcutorture: Flag nonexistent RCU GP kthread
Currently, if the RCU grace-period kthread has not yet been created,
the starvation-check code will print zero for the state, which maps to
TASK_RUNNING.  This could clearly be quite confusing, so this commit
instead prints ~0, which does not map to any legal ->state value.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-12-05 17:58:00 -08:00
Paul E. McKenney
46a5d164db rcu: Stop disabling interrupts in scheduler fastpaths
We need the scheduler's fastpaths to be, well, fast, and unnecessarily
disabling and re-enabling interrupts is not necessarily consistent with
this goal, especially given that there are regions of the scheduler that
already have interrupts disabled.

This commit therefore moves the call to rcu_note_context_switch()
to one of the interrupts-disabled regions of the scheduler, and
removes the now-redundant disabling and re-enabling of interrupts from
rcu_note_context_switch() and the functions it calls.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Shift rcu_note_context_switch() to avoid deadlock, as suggested
  by Peter Zijlstra. ]
2015-12-04 12:27:31 -08:00
Paul E. McKenney
f0f2e7d307 rcu: Avoid tick_nohz_active checks on NOCBs CPUs
Currently, rcu_prepare_for_idle() checks for tick_nohz_active, even on
individual NOCBs CPUs, unless all CPUs are marked as NOCBs CPUs at build
time.  This check is pointless on NOCBs CPUs because they never have any
callbacks posted, given that all of their callbacks are handed off to the
corresponding rcuo kthread.  There is a check for individually designated
NOCBs CPUs, but it pointlessly follows the check for tick_nohz_active.

This commit therefore moves the check for individually designated NOCBs
CPUs up with the check for CONFIG_RCU_NOCB_CPU_ALL.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-12-04 12:27:31 -08:00
Paul E. McKenney
699d403520 rcu: Fix obsolete rcu_bootup_announce_oddness() comment
This function no longer has #ifdefs, so this commit removes the
header comment calling them out.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-12-04 12:27:30 -08:00
Paul E. McKenney
8ba9153b2c rcu: Remove lock-acquisition loop from rcu_read_unlock_special()
Several releases have come and gone without the warning triggering,
so remove the lock-acquisition loop.  Retain the WARN_ON_ONCE()
out of sheer paranoia.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-12-04 12:27:30 -08:00
Paul E. McKenney
fecbf6f01f rcu: Simplify rcu_sched_qs() control flow
This commit applies an early-exit approach to rcu_sched_qs(), reducing
the nesting level and saving a line of code.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-12-04 12:27:29 -08:00
Paul Gortmaker
47dbc90663 kernel: Make rcu/tree_trace.c explicitly non-modular
The Kconfig currently controlling compilation of this code is:

init/Kconfig:config TREE_RCU_TRACE
init/Kconfig:   def_bool RCU_TRACE && ( TREE_RCU || PREEMPT_RCU )

...meaning that it currently is not being built as a module by anyone.

Let's remove the modular code that is essentially orphaned, so that
when reading the file there is no doubt it is builtin-only.

Since module_init translates to device_initcall in the non-modular
case, the init ordering remains unchanged with this commit.  We could
consider moving this to an earlier initcall if desired.

We don't replace module.h with init.h since the file already has that.
We also delete the moduleparam.h include that is left over from
commit 64db4cfff9 (""Tree RCU": scalable
classic RCU implementation") since it is not needed here either.

We morph some tags like MODULE_AUTHOR into the comments at the top of
the file for documentation purposes.

Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-12-04 12:27:29 -08:00
Paul E. McKenney
3dc5dbe9a1 rcu: Move lock_class_key to local scope
Currently, the rcu_node_class[], rcu_fqs_class[], and rcu_exp_class[]
arrays needlessly pollute the global namespace within tree.c.  This
commit therefore converts them to static local variables within
rcu_init_one().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-12-04 12:27:29 -08:00
Paul E. McKenney
3e42ec1aa7 rcu: Allow expedited grace periods to be disabled at init
Expedited grace periods can speed up boot, but are undesirable in
aggressive real-time systems.  This commit therefore introduces a
kernel parameter rcupdate.rcu_normal_after_boot that disables
expedited grace periods just before init is spawned.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-12-04 12:26:54 -08:00
Paul E. McKenney
5a9be7c628 rcu: Add rcu_normal kernel parameter to suppress expediting
Although expedited grace periods can be quite useful, and although their
OS jitter has been greatly reduced, they can still pose problems for
extreme real-time workloads.  This commit therefore adds a rcu_normal
kernel boot parameter (which can also be manipulated via sysfs)
to suppress expedited grace periods, that is, to treat requests for
expedited grace periods as if they were requests for normal grace periods.
If both rcu_expedited and rcu_normal are specified, rcu_normal wins.
This means that if you are relying on expedited grace periods to speed up
boot, you will want to specify rcu_expedited on the kernel command line,
and then specify rcu_normal via sysfs once boot completes.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-12-04 12:26:53 -08:00
Paul E. McKenney
72611ab9f5 rcu: Add more diagnostics to expedited stall warning messages.
This commit adds print statements that check the rcu_node structure to
find which ->expmask bits and which ->exp_tasks structures are blocking
the current expedited grace period.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-12-04 12:26:53 -08:00
Paul E. McKenney
73f36f9de8 rcu: Make expedited grace periods resolve stall-warning ties
Currently, if a grace period ends just as the stall-warning timeout
fires, an empty stall warning will be printed.  This is not helpful,
so this commit avoids these useless warnings by rechecking completion
after awakening in synchronize_sched_expedited_wait().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-12-04 12:26:52 -08:00
Paul E. McKenney
df5bd5144a rcu: Reduce expedited GP memory contention via per-CPU variables
Currently, the piggybacked-work checks carried out by sync_exp_work_done()
atomically increment a small set of variables (the ->expedited_workdone0,
->expedited_workdone1, ->expedited_workdone2, ->expedited_workdone3
fields in the rcu_state structure), which will form a memory-contention
bottleneck given a sufficiently large number of CPUs concurrently invoking
either synchronize_rcu_expedited() or synchronize_sched_expedited().

This commit therefore moves these four fields to the per-CPU rcu_data
structure, eliminating the memory contention.  The show_rcuexp() function
also changes to sum up each field in the rcu_data structures.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-12-04 12:26:52 -08:00
Paul E. McKenney
1307f21487 rcu: Invert sync_rcu_exp_select_cpus() "if" statement
This commit saves a couple lines of code and reduces indentation
by inverting the sense of an "if" statement in the function
sync_rcu_exp_select_cpus().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-12-04 12:26:51 -08:00
Paul E. McKenney
886ef5a18a rcu: Move smp_mb() from rcu_seq_snap() to rcu_exp_gp_seq_snap()
The memory barrier in rcu_seq_snap() is needed only for grace periods,
so this commit moves it to the grace-period-oriented wrapper
rcu_exp_gp_seq_snap().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-12-04 12:26:51 -08:00
Paul E. McKenney
1de6e56ddc rcu: Clarify role of ->expmaskinitnext
Analogy with the ->qsmaskinitnext field might lead one to believe that
->expmaskinitnext tracks online CPUs.  This belief is incorrect: Any CPU
that has ever been online will have its bit set in the ->expmaskinitnext
field.  This commit therefore adds a comment to make this clear, at
least to people who read comments.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-12-04 12:26:50 -08:00
Paul E. McKenney
06f60de19d rcu: Short-circuit synchronize_sched_expedited() if only one CPU
If there is only one CPU, then invoking synchronize_sched_expedited()
is by definition a grace period.  This commit checks for this condition
and does a short-circuit return in that case.
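
The short-circuit is of the following form (sketch; a helper along the lines
of rcu_blocking_is_gp(), which reports whether only one CPU is online, is
assumed):

        void synchronize_sched_expedited(void)
        {
                /* With only one CPU, a context switch is a grace period. */
                if (rcu_blocking_is_gp())
                        return;

                /* ... otherwise run the full expedited machinery ... */
        }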

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-12-04 12:26:50 -08:00
Paul E. McKenney
6cf1008122 rcu: Add transitivity to remaining rcu_node ->lock acquisitions
The rule is that all acquisitions of the rcu_node structure's ->lock
must provide transitivity:  The lock is not acquired that frequently,
and sorting out exactly which required it and which did not would be
a maintenance nightmare.  This commit therefore supplies the needed
transitivity to the remaining ->lock acquisitions.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-11-23 10:37:35 -08:00
Peter Zijlstra
2a67e741bb rcu: Create transitive rnp->lock acquisition functions
Providing RCU's memory-ordering guarantees requires that the rcu_node
tree's locking provide transitive memory ordering, which the Linux kernel's
spinlocks currently do not provide unless smp_mb__after_unlock_lock()
is used.  Having a separate smp_mb__after_unlock_lock() after each and
every lock acquisition is error-prone, hard to read, and a bit annoying,
so this commit provides wrapper functions that pull in the
smp_mb__after_unlock_lock() invocations.
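
A minimal sketch of the wrapper pattern described above:

        static inline void raw_spin_lock_rcu_node(struct rcu_node *rnp)
        {
                raw_spin_lock(&rnp->lock);
                smp_mb__after_unlock_lock();    /* Order against the prior critical section. */
        }

        static inline void raw_spin_lock_irq_rcu_node(struct rcu_node *rnp)
        {
                raw_spin_lock_irq(&rnp->lock);
                smp_mb__after_unlock_lock();
        }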

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-11-23 10:37:35 -08:00
Paul E. McKenney
39cd2dd39a Merge branches 'doc.2015.10.06a', 'percpu-rwsem.2015.10.06a' and 'torture.2015.10.06a' into HEAD
doc.2015.10.06a:  Documentation updates.
percpu-rwsem.2015.10.06a:  Optimization of per-CPU reader-writer semaphores.
torture.2015.10.06a:  Torture-test updates.
2015-10-07 16:06:25 -07:00
Paul E. McKenney
d2856b046d Merge branches 'fixes.2015.10.06a' and 'exp.2015.10.07a' into HEAD
exp.2015.10.07a:  Reduce OS jitter of RCU-sched expedited grace periods.
fixes.2015.10.06a:  Miscellaneous fixes.
2015-10-07 16:05:21 -07:00
Paul E. McKenney
338b0f760e rcu: Better hotplug handling for synchronize_sched_expedited()
Earlier versions of synchronize_sched_expedited() can prematurely end
grace periods due to the fact that a CPU marked as cpu_is_offline()
can still be using RCU read-side critical sections during the time that
CPU makes its last pass through the scheduler and into the idle loop
and during the time that a given CPU is in the process of coming online.
This commit therefore eliminates this window by adding additional
interaction with the CPU-hotplug operations.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-10-07 16:02:50 -07:00
Paul E. McKenney
b08517c76d rcu: Enable stall warnings for synchronize_rcu_expedited()
This commit redirects synchronize_rcu_expedited()'s wait to
synchronize_sched_expedited_wait(), thus enabling RCU CPU
stall warnings.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-10-07 16:02:50 -07:00
Paul E. McKenney
c58656382e rcu: Add tasks to expedited stall-warning messages
This commit adds task-print ability to the expedited RCU CPU stall
warning messages in preparation for adding stall warnings to
synchronize_rcu_expedited().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-10-07 16:02:50 -07:00
Paul E. McKenney
74611ecb0f rcu: Add online/offline info to expedited stall warning message
This commit makes the expedited RCU CPU stall warning message print
online/offline indications immediately after the CPU number.  An "O"
indicates that the CPU is globally offline ("." that it is globally
online), an "o" indicates that RCU believes the CPU to be offline for
the current grace period ("." otherwise), and an "N" indicates that RCU
believes the CPU will be offline for the next grace period ("."
otherwise).  So for CPU 10, you would normally see "10-...:", indicating
that everything believes that the CPU is online.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-10-07 16:02:50 -07:00
Paul E. McKenney
dcdb8807ba rcu: Consolidate expedited CPU selection
Now that sync_sched_exp_select_cpus() and sync_rcu_exp_select_cpus()
are identical aside from the argument to smp_call_function_single(),
this commit consolidates them with a functional argument.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-10-07 16:02:50 -07:00
Paul E. McKenney
66fe6cbee4 rcu: Prepare for consolidating expedited CPU selection
This commit brings sync_sched_exp_select_cpus() into alignment with
sync_rcu_exp_select_cpus(), as a first step towards consolidating them
into one function.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-10-07 16:02:50 -07:00
Paul E. McKenney
807226e2fb rcu: Stop excluding CPU hotplug in synchronize_sched_expedited()
Now that synchronize_sched_expedited() uses IPIs, a hook in
rcu_sched_qs(), and the ->expmask field in the rcu_node combining
tree, it is no longer necessary to exclude CPU hotplug.  Any
races with CPU hotplug will be detected when attempting to send
the IPI.  This commit therefore removes the code excluding
CPU hotplug operations.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-10-07 16:02:49 -07:00
Paul E. McKenney
83c2c735e7 rcu: Stop silencing lockdep false positive for expedited grace periods
This reverts commit af859beaab (rcu: Silence lockdep false positive
for expedited grace periods).  Because synchronize_rcu_expedited()
no longer invokes synchronize_sched_expedited(), ->exp_funnel_mutex
acquisition is no longer nested, so the false positive no longer happens.
This commit therefore removes the extra lockdep data structures, as they
are no longer needed.
2015-10-07 16:02:49 -07:00
Paul E. McKenney
6587a23b6b rcu: Switch synchronize_sched_expedited() to IPI
This commit switches synchronize_sched_expedited() from stop_one_cpu_nowait()
to smp_call_function_single(), thus moving from an IPI and a pair of
context switches to an IPI and a single pass through the scheduler.
Of course, if the scheduler actually does decide to switch to a different
task, there will still be a pair of context switches, but there would
likely have been a pair of context switches anyway, just a bit later.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-10-07 16:01:12 -07:00
Paul E. McKenney
4f441a258f rcutorture: Fix unused-function warning for torturing_tasks()
The torturing_tasks() function is used only in kernels built with
CONFIG_PROVE_RCU=y, so the second definition can result in unused-function
compiler warnings.  This commit adds __maybe_unused to suppress these
warnings.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-10-06 11:28:09 -07:00
Paul E. McKenney
889d487a26 rcutorture: Fix module unwind when bad torture_type specified
The rcutorture module has a list of torture types, and specifying a
type not on this list is supposed to cleanly fail the module load.
Unfortunately, the "fail" happens without the "cleanly".  This commit
therefore adds the needed clean-up after an incorrect torture_type.

Reported-by: David Miller <davem@davemloft.net>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: David Miller <davem@davemloft.net>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-10-06 11:28:01 -07:00
Oleg Nesterov
4bace7344d rcu_sync: Cleanup the CONFIG_PROVE_RCU checks
1. Rename __rcu_sync_is_idle() to rcu_sync_lockdep_assert() and
   change it to use rcu_lockdep_assert().

2. Change rcu_sync_is_idle() to return rsp->gp_state == GP_IDLE
   unconditionally; this way we can remove the same check from
   rcu_sync_lockdep_assert() and clearly isolate the debugging
   code.

Note: rcu_sync_enter()->wait_event(gp_state == GP_PASSED) needs
another CONFIG_PROVE_RCU check, the same as is done in ->sync(); but
this needs some simple preparations in the core RCU code to avoid the
code duplication.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-10-06 11:25:45 -07:00
Oleg Nesterov
07899a6e5f rcu_sync: Introduce rcu_sync_dtor()
This commit allows rcu_sync structures to be safely deallocated.
The trick is to add a new ->wait field to the gp_ops array.
This field is a pointer to the rcu_barrier() function corresponding
to the flavor of RCU in question.  This allows a new rcu_sync_dtor()
to wait for any outstanding callbacks before freeing the rcu_sync
structure.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-10-06 11:25:21 -07:00
Oleg Nesterov
3a518b76af rcu_sync: Add CONFIG_PROVE_RCU checks
This commit validates that the caller of rcu_sync_is_idle() holds the
corresponding type of RCU read-side lock, but only in kernels built
with CONFIG_PROVE_RCU=y.  This validation is carried out via a new
rcu_sync_ops->held() method that is checked within rcu_sync_is_idle().

Note that although this does add code to the fast path, it only does so
in kernels built with CONFIG_PROVE_RCU=y.

Suggested-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-10-06 11:25:16 -07:00
Oleg Nesterov
82e8c565be rcu_sync: Simplify rcu_sync using new rcu_sync_ops structure
This commit adds the new struct rcu_sync_ops which holds sync/call
methods, and turns the function pointers in rcu_sync_struct into an array
of struct rcu_sync_ops.  This simplifies the "init" helpers by collapsing
a switch statement and explicit multiple definitions into a simple
assignment and a helper macro, respectively.
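
Roughly, the new structure and the resulting table look like the following
sketch (enum values and field names assumed from the description):

        struct rcu_sync_ops {
                void (*sync)(void);
                void (*call)(struct rcu_head *head, void (*func)(struct rcu_head *head));
        };

        static const struct rcu_sync_ops gp_ops[] = {
                [RCU_SYNC] = {
                        .sync = synchronize_rcu,
                        .call = call_rcu,
                },
                /* ... one entry per RCU flavor ... */
        };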

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-10-06 11:25:10 -07:00
Oleg Nesterov
cc44ca848f rcu: Create rcu_sync infrastructure
The rcu_sync infrastructure can be thought of as infrastructure to be
used to implement reader-writer primitives having extremely lightweight
readers during times when there are no writers.  The first use is in
the percpu_rwsem used by the VFS subsystem.

This infrastructure is functionally equivalent to

        struct rcu_sync_struct {
                atomic_t counter;
        };

	/* Check possibility of fast-path read-side operations. */
        static inline bool rcu_sync_is_idle(struct rcu_sync_struct *rss)
        {
                return atomic_read(&rss->counter) == 0;
        }

	/* Tell readers to use slowpaths. */
        static inline void rcu_sync_enter(struct rcu_sync_struct *rss)
        {
                atomic_inc(&rss->counter);
                synchronize_sched();
        }

	/* Allow readers to once again use fastpaths. */
        static inline void rcu_sync_exit(struct rcu_sync_struct *rss)
        {
                synchronize_sched();
                atomic_dec(&rss->counter);
        }

The main difference is that it records the state and only calls
synchronize_sched() if required.  At least some of the calls to
synchronize_sched() will be optimized away when rcu_sync_enter() and
rcu_sync_exit() are invoked repeatedly in quick succession.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-10-06 11:25:04 -07:00
Paul E. McKenney
3836f5337f torture: Consolidate cond_resched_rcu_qs() into stutter_wait()
This commit moves cond_resched_rcu_qs() into stutter_wait(), saving
a line and also avoiding RCU CPU stall warnings from all torture
loops containing a stutter_wait().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-10-06 11:25:01 -07:00
Paul E. McKenney
c34d2f4184 rcu: Correct comment for values of ->gp_state field
This commit corrects the comment for the values of the ->gp_state field,
which previously incorrectly said that these were for the ->gp_flags
field.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-10-06 11:16:11 -07:00
Petr Mladek
77f81fe08e rcu: Finish folding ->fqs_state into ->gp_state
Commit 4cdfc175c2 ("rcu: Move quiescent-state forcing
into kthread") started the process of folding the old ->fqs_state into
->gp_state, but did not complete it.  This situation does not cause
any malfunction, but can result in extremely confusing trace output.
This commit completes this task of eliminating ->fqs_state in favor
of ->gp_state.

The old ->fqs_state was also used to decide when to collect dyntick-idle
snapshots.  For this purpose, we add a boolean local variable to the
grace-period kthread, which is set on the first call to rcu_gp_fqs() for
a given grace period and cleared otherwise.

Signed-off-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-10-06 11:15:59 -07:00
Paul E. McKenney
49f5903b47 rcu: Move preemption disabling out of __srcu_read_lock()
Currently, __srcu_read_lock() cannot be invoked from restricted
environments because it contains calls to preempt_disable() and
preempt_enable(), both of which can invoke lockdep, which is a bad
idea in some restricted execution modes.  This commit therefore moves
the preempt_disable() and preempt_enable() from __srcu_read_lock()
to srcu_read_lock().  It also inserts the preempt_disable() and
preempt_enable() around the call to __srcu_read_lock() in do_exit().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-10-06 11:15:43 -07:00
Paul E. McKenney
7f21aeef72 rcu: Add online/offline info to stall warning message
This commit makes the RCU CPU stall warning message print online/offline
indications immediately after a hyphen following the CPU number.  A "O"
indicates that the global CPU-hotplug system believes that the CPU is
online, a "o" that RCU perceived the CPU to be online at the beginning
of the current expedited grace period, and an "N" that RCU currently
believes that it will perceive the CPU as being online at the beginning
of the next expedited grace period, with "." otherwise for all three
indications.  So for CPU 10, you would normally see "10-OoN:" indicating
that everything believes that the CPU is online.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-10-06 11:10:18 -07:00
Paul E. McKenney
ee968ac61d rcu: Eliminate panic when silly boot-time fanout specified
This commit loosens rcutree.rcu_fanout_leaf range checks
and replaces a panic() with a fallback to compile-time values.
This fallback is accompanied by a WARN_ON(), and both occur when the
rcutree.rcu_fanout_leaf value is too small to accommodate the number of
CPUs.  For example, given the current four-level limit for the rcu_node
tree, a system with more than 16 CPUs built with CONFIG_RCU_FANOUT=2 must
have rcutree.rcu_fanout_leaf larger than 2.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-10-06 11:09:41 -07:00
Boqun Feng
bb73c52bad rcu: Don't disable preemption for Tiny and Tree RCU readers
Because preempt_disable() maps to barrier() for non-debug builds,
it forces the compiler to spill and reload registers.  Because Tree
RCU and Tiny RCU now only appear in CONFIG_PREEMPT=n builds, these
barrier() instances generate needless extra code for each instance of
rcu_read_lock() and rcu_read_unlock().  This extra code slows down Tree
RCU and bloats Tiny RCU.

This commit therefore removes the preempt_disable() and preempt_enable()
from the non-preemptible implementations of __rcu_read_lock() and
__rcu_read_unlock(), respectively.  However, for debug purposes,
preempt_disable() and preempt_enable() are still invoked if
CONFIG_PREEMPT_COUNT=y, because this allows detection of sleeping inside
atomic sections in non-preemptible kernels.

However, Tiny and Tree RCU operate by coalescing all RCU read-side
critical sections on a given CPU that lie between successive quiescent
states.  It is therefore necessary to compensate for removing barriers
from __rcu_read_lock() and __rcu_read_unlock() by adding them to a
couple of the RCU functions invoked during quiescent states, namely to
rcu_all_qs() and rcu_note_context_switch().  However, note that the latter
is more paranoia than necessity, at least until link-time optimizations
become more aggressive.

This is based on an earlier patch by Paul E. McKenney, fixing
a bug encountered in kernels built with CONFIG_PREEMPT=n and
CONFIG_PREEMPT_COUNT=y.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-10-06 11:08:23 -07:00
Boqun Feng
db3e8db45e rcu: Use call_rcu_func_t to replace explicit type equivalents
We have had the call_rcu_func_t typedef for quite a while, but we still
use explicit function pointer types in some places.  These types can
confuse cscope and can be hard to read.  This patch therefore replaces
these types with the call_rcu_func_t typedef.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-10-06 11:08:19 -07:00
Boqun Feng
b6a4ae766e rcu: Use rcu_callback_t in call_rcu*() and friends
As we now have the rcu_callback_t typedef as the type of RCU callbacks, we
should use it in call_rcu*() and friends as the type of parameters.  This
could save us a few lines of code and make it clear which functions
require an RCU callback rather than some other callback as their argument.

Besides, this can also help cscope to generate a better database for
code reading.
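
For reference, the typedef in question and one affected prototype (sketch):

        typedef void (*rcu_callback_t)(struct rcu_head *head);

        void call_rcu(struct rcu_head *head, rcu_callback_t func);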

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-10-06 11:08:05 -07:00
Paul E. McKenney
5b74c45890 rcu: Make ->cpu_no_qs be a union for aggregate OR
This commit converts the rcu_data structure's ->cpu_no_qs field
to a union.  The bytewise side of this union allows individual access
to indications as to whether this CPU needs to find a quiescent state
for a normal (.norm) and/or expedited (.exp) grace period.  The setwise
side of the union allows testing whether or not a quiescent state is
needed at all, for either type of grace period.

For now, only .norm is used.  A later commit will introduce the expedited
usage.
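
A sketch of the union as described above (union and field names assumed;
comments mine):

        union rcu_noqs {
                struct {
                        u8 norm;        /* Quiescent state needed for a normal GP? */
                        u8 exp;         /* Quiescent state needed for an expedited GP? */
                } b;                    /* Bytewise access. */
                u16 s;                  /* Setwise access: any quiescent state needed? */
        };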

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-09-20 21:16:21 -07:00
Paul E. McKenney
0d43eb34f9 rcu: Invert passed_quiesce and rename to cpu_no_qs
This commit inverts the sense of the rcu_data structure's ->passed_quiesce
field and renames it to ->cpu_no_qs.  This will allow a later commit to
use an "aggregate OR" operation to test expedited as well as normal grace
periods without added overhead.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-09-20 21:16:21 -07:00
Paul E. McKenney
97c668b8e9 rcu: Rename qs_pending to core_needs_qs
An upcoming commit needs to invert the sense of the ->passed_quiesce
rcu_data structure field, so this commit is taking this opportunity
to clarify things a bit by renaming ->qs_pending to ->core_needs_qs.

So if !rdp->core_needs_qs, then this CPU need not concern itself with
quiescent states, in particular, it need not acquire its leaf rcu_node
structure's ->lock to check.  Otherwise, it needs to report the next
quiescent state.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-09-20 21:16:20 -07:00
Paul E. McKenney
bce5fa12aa rcu: Move synchronize_sched_expedited() to combining tree
Currently, synchronize_sched_expedited() uses a single global counter
to track the number of remaining context switches that the current
expedited grace period must wait on.  This is problematic on large
systems, where the resulting memory contention can be pathological.
This commit therefore makes synchronize_sched_expedited() instead use
the combining tree in the same manner as synchronize_rcu_expedited(),
keeping memory contention down to a dull roar.

This commit creates a temporary function sync_sched_exp_select_cpus()
that is very similar to sync_rcu_exp_select_cpus().  A later commit
will consolidate these two functions, which becomes possible when
synchronize_sched_expedited() switches from stop_one_cpu_nowait() to
smp_call_function_single().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-09-20 21:16:20 -07:00
Paul E. McKenney
8203d6d0ee rcu: Use single-stage IPI algorithm for RCU expedited grace period
The current preemptible-RCU expedited grace-period algorithm invokes
synchronize_sched_expedited() to enqueue all tasks currently running
in a preemptible-RCU read-side critical section, then waits for all the
->blkd_tasks lists to drain.  This works, but results in both an IPI and
a double context switch even on CPUs that do not happen to be running
in a preemptible RCU read-side critical section.

This commit implements a new algorithm that causes less OS jitter.
This new algorithm IPIs all online CPUs that are not idle (from an
RCU perspective), but refrains from self-IPIs.  If a CPU receiving
this IPI is not in a preemptible RCU read-side critical section (or
is just now exiting one), it pushes quiescence up the rcu_node tree,
otherwise, it sets a flag that will be handled by the upcoming outermost
rcu_read_unlock(), which will then push quiescence up the tree.

The expedited grace period must of course wait on any pre-existing blocked
readers, and newly blocked readers must be queued carefully based on
the state of both the normal and the expedited grace periods.  This
new queueing approach also avoids the need to update boost state,
courtesy of the fact that blocked tasks are no longer ever migrated to
the root rcu_node structure.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-09-20 21:16:19 -07:00
Paul E. McKenney
b9585e940a rcu: Consolidate tree setup for synchronize_rcu_expedited()
This commit replaces sync_rcu_preempt_exp_init1() and
sync_rcu_preempt_exp_init2() with sync_exp_reset_tree_hotplug()
and sync_exp_reset_tree(), which will also be used by
synchronize_sched_expedited(), and sync_rcu_exp_select_nodes(), which
contains code specific to synchronize_rcu_expedited().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-09-20 21:16:18 -07:00
Paul E. McKenney
7922cd0e56 rcu: Move rcu_report_exp_rnp() to allow consolidation
This is a nearly pure code-movement commit, moving rcu_report_exp_rnp(),
sync_rcu_preempt_exp_done(), and rcu_preempted_readers_exp() so
that later commits can make synchronize_sched_expedited() use them.
The non-code-movement portion of this commit tags rcu_report_exp_rnp()
as __maybe_unused to avoid build errors when CONFIG_PREEMPT=n.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-09-20 21:16:18 -07:00
Paul E. McKenney
f4ecea309d rcu: Use rsp->expedited_wq instead of sync_rcu_preempt_exp_wq
Now that there is an ->expedited_wq waitqueue in each rcu_state structure,
there is no need for the sync_rcu_preempt_exp_wq global variable.  This
commit therefore substitutes ->expedited_wq for sync_rcu_preempt_exp_wq.
It also initializes ->expedited_wq only once at boot instead of at the
start of each expedited grace period.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-09-20 21:16:17 -07:00
Paul E. McKenney
19a5ecde08 rcu: Suppress lockdep false positive for rcp->exp_funnel_mutex
In kernels built with CONFIG_PREEMPT=y, synchronize_rcu_expedited()
invokes synchronize_sched_expedited() while holding RCU-preempt's
root rcu_node structure's ->exp_funnel_mutex, which is acquired after
the rcu_data structure's ->exp_funnel_mutex.  The first thing that
synchronize_sched_expedited() will do is acquire RCU-sched's rcu_data
structure's ->exp_funnel_mutex.  There is no danger of an actual deadlock
because the locking order is always from RCU-preempt's expedited mutexes
to those of RCU-sched.  Unfortunately, lockdep considers both rcu_data
structures' ->exp_funnel_mutex to be in the same lock class and therefore
reports a deadlock cycle.

This commit silences this false positive by placing RCU-sched's rcu_data
structures' ->exp_funnel_mutex locks into their own lock class.
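
One common way to do this is a separate lock_class_key, roughly as in
this sketch (identifiers here are illustrative only):

	static struct lock_class_key rcu_exp_sched_rdp_class;

	mutex_init(&rdp->exp_funnel_mutex);
	lockdep_set_class_and_name(&rdp->exp_funnel_mutex,
				   &rcu_exp_sched_rdp_class,
				   "rcu_data_exp_sched");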

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-09-20 21:01:22 -07:00
Paul E. McKenney
12d560f4ea rcu,locking: Privatize smp_mb__after_unlock_lock()
RCU is the only thing that uses smp_mb__after_unlock_lock(), and is
likely the only thing that ever will use it, so this commit makes this
macro private to RCU.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>
2015-08-04 08:49:21 -07:00
Paul E. McKenney
3dbe43f6fb Merge branches 'doc.2015.07.15a' and 'torture.2015.07.15a' into HEAD
doc.2015.07.15a: Documentation updates.
torture.2015.07.15a: Torture-test updates.
2015-08-04 08:42:02 -07:00
Paul E. McKenney
8ff4fbfd69 Merge branches 'fixes.2015.07.22a' and 'initexp.2015.08.04a' into HEAD
fixes.2015.07.22a: Miscellaneous fixes.
initexp.2015.08.04a: Initialization and expedited updates.
	(Single branch due to conflicts.)
2015-08-04 08:40:58 -07:00
Paul E. McKenney
af859beaab rcu: Silence lockdep false positive for expedited grace periods
In a CONFIG_PREEMPT=y kernel, synchronize_rcu_expedited()
acquires the ->exp_funnel_mutex in rcu_preempt_state, then invokes
synchronize_sched_expedited, which acquires the ->exp_funnel_mutex in
rcu_sched_state.  There can be no deadlock because rcu_preempt_state
->exp_funnel_mutex acquisition always precedes that of rcu_sched_state.
But lockdep does not know that, so it gives false-positive splats.

This commit therefore associates a separate lock_class_key structure
with the rcu_sched_state structure's ->exp_funnel_mutex, allowing
lockdep to see the lock ordering, avoiding the false positives.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-08-04 08:39:21 -07:00
Paul E. McKenney
9a54f98e34 rcu: Don't disable CPU hotplug during OOM notifiers
RCU's rcu_oom_notify() disables CPU hotplug in order to stabilize the
list of online CPUs, which it traverses.  However, this is completely
pointless because smp_call_function_single() will quietly fail if invoked
on an offline CPU.  Because the count of requests is incremented in the
rcu_oom_notify_cpu() function that is remotely invoked, everything works
nicely even in the face of concurrent CPU-hotplug operations.

Furthermore, in recent kernels, invoking get_online_cpus() from an OOM
notifier can result in deadlock.  This commit therefore removes the
call to get_online_cpus() and put_online_cpus() from rcu_oom_notify().

Reported-by: Marcin Ślusarz <marcin.slusarz@gmail.com>
Reported-by: David Rientjes <rientjes@google.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: David Rientjes <rientjes@google.com>
Tested-by: Marcin Ślusarz <marcin.slusarz@gmail.com>
2015-07-22 15:27:43 -07:00
Paul E. McKenney
a76a9a485d rcu: Fix backwards RCU_LOCKDEP_WARN() in synchronize_rcu_tasks()
The RCU_LOCKDEP_WARN() in synchronize_rcu_tasks() triggers if the
scheduler is active, which is backwards.  This commit therefore
negates the test.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-22 15:27:33 -07:00
Paul E. McKenney
f78f5b90c4 rcu: Rename rcu_lockdep_assert() to RCU_LOCKDEP_WARN()
This commit renames rcu_lockdep_assert() to RCU_LOCKDEP_WARN() for
consistency with the WARN() series of macros.  This also requires
inverting the sense of the conditional, which this commit also does.
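
For example, a call site that previously asserted a condition now warns
on its negation, as in this sketch:

	/* Before: */
	rcu_lockdep_assert(rcu_read_lock_held(), "need rcu_read_lock()");

	/* After: */
	RCU_LOCKDEP_WARN(!rcu_read_lock_held(), "need rcu_read_lock()");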

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
2015-07-22 15:27:32 -07:00
Alexei Starovoitov
46f00d18fc rcu: Make rcu_is_watching() really notrace
Although rcu_is_watching() is marked notrace, it invokes preempt_disable()
and preempt_enable(), both of which can be traced.  This defeats the
purpose of the notrace on rcu_is_watching(), so this commit substitutes
preempt_disable_notrace() and preempt_enable_notrace().
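
The result is roughly as follows (a sketch; the name of the inner helper
may differ):

	notrace bool rcu_is_watching(void)
	{
		bool ret;

		preempt_disable_notrace();
		ret = __rcu_is_watching();
		preempt_enable_notrace();
		return ret;
	}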

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
2015-07-22 15:27:31 -07:00
Paul E. McKenney
ec90a194ae rcu: Create a synchronize_rcu_mult()
There have been several requests for a primitive that waits for
grace periods for several RCU flavors concurrently, so this
commit creates it.  This is a variadic macro, and you pass in
the call_rcu() functions of the flavors of RCU that you wish to
wait for.

Note that you cannot pass in call_srcu() for two reasons: (1) this
would result in a type mismatch, and (2) you need to specify which
srcu_struct you want to use.  Handle this by creating a wrapper
function for your SRCU domain, for example:

	void call_srcu_mine(struct rcu_head *head, rcu_callback_t func)
	{
		call_srcu(&ss_mine, head, func);
	}

You can then do something like this:

	synchronize_rcu_mult(call_srcu_mine, call_rcu, call_rcu_sched);

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-22 15:27:29 -07:00
Paul E. McKenney
bc17ea1092 rcu: Fix obsolete priority-boosting comment
Tasks are no longer migrated to the root rcu_node, so there is no
longer any need for a boost kthread for the root rcu_node, and there no
longer is such a kthread.  This commit therefore fixes the comment in
rcu_boost_kthread()'s header to reflect this new reality.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-22 15:27:28 -07:00
Paul E. McKenney
24560056de rcu: Add RCU-sched flavors of get-state and cond-sync
The get_state_synchronize_rcu() and cond_synchronize_rcu() functions
allow polling for grace-period completion, with an actual wait for a
grace period occurring only when cond_synchronize_rcu() is called too
soon after the corresponding get_state_synchronize_rcu().  However,
these functions work only for vanilla RCU.  This commit adds the
get_state_synchronize_sched() and cond_synchronize_sched(), which provide
the same capability for RCU-sched.
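
Usage mirrors the vanilla-RCU pair, for example (sketch):

	unsigned long gp_snap;

	gp_snap = get_state_synchronize_sched();
	/* ... unrelated work, possibly spanning a grace period ... */
	cond_synchronize_sched(gp_snap);  /* Waits only if needed. */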

Reported-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-22 15:26:58 -07:00
Paul E. McKenney
cdacbe1f91 rcu: Add fastpath bypassing funnel locking
In the common case, there will be only one expedited grace period in
the system at a given time, in which case it is not helpful to use
funnel locking.  This commit therefore adds a fastpath that bypasses
funnel locking when the root ->exp_funnel_mutex is not held.
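
Roughly, the fastpath looks like this sketch: if the root rcu_node
structure's mutex is uncontended, take it directly and skip the funnel:

	rnp = rcu_get_root(rsp);
	if (!mutex_is_locked(&rnp->exp_funnel_mutex) &&
	    mutex_trylock(&rnp->exp_funnel_mutex))
		return rnp;	/* Fastpath: funnel locking not needed. */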

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-17 14:59:06 -07:00
Paul E. McKenney
32bb1c7999 rcu: Rename RCU_GP_DONE_FQS to RCU_GP_DOING_FQS
The grace-period kthread sleeps waiting to do a force-quiescent-state
scan, and when awakened sets rsp->gp_state to RCU_GP_DONE_FQS.
However, this is confusing because the kthread has not done the
force-quiescent-state, but is instead just starting to do it.  This commit
therefore renames RCU_GP_DONE_FQS to RCU_GP_DOING_FQS in order to make
things a bit easier on reviewers.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-17 14:59:05 -07:00
Paul E. McKenney
b9a425cfcb rcu: Pull out wait_event*() condition into helper function
The condition for the wait_event_interruptible_timeout() that waits
to do the next force-quiescent-state scan is a bit ornate:

	((gf = READ_ONCE(rsp->gp_flags)) &
	 RCU_GP_FLAG_FQS) ||
	(!READ_ONCE(rnp->qsmask) &&
	 !rcu_preempt_blocked_readers_cgp(rnp))

This commit therefore pulls this condition out into a helper function
and comments its component conditions.
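
A sketch of such a helper (the name is illustrative):

	static bool rcu_gp_fqs_check_wake(struct rcu_state *rsp, int *gfp)
	{
		struct rcu_node *rnp = rcu_get_root(rsp);

		/* Someone requested a force-quiescent-state scan. */
		*gfp = READ_ONCE(rsp->gp_flags);
		if (*gfp & RCU_GP_FLAG_FQS)
			return true;

		/* The current grace period has completed. */
		if (!READ_ONCE(rnp->qsmask) &&
		    !rcu_preempt_blocked_readers_cgp(rnp))
			return true;

		return false;
	}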

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-17 14:59:04 -07:00
Paul E. McKenney
cf3620a6c7 rcu: Add stall warnings to synchronize_sched_expedited()
Although synchronize_sched_expedited() historically has no RCU CPU stall
warnings, the availability of the rcupdate.rcu_expedited boot parameter
invalidates the old assumption that synchronize_sched()'s stall warnings
would suffice.  This commit therefore adds RCU CPU stall warnings to
synchronize_sched_expedited().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-17 14:59:01 -07:00
Paul E. McKenney
2cd6ffafec rcu: Extend expedited funnel locking to rcu_data structure
The strictly rcu_node based funnel-locking scheme works well in many
cases, but systems with CONFIG_RCU_FANOUT_LEAF=64 won't necessarily get
all that much concurrency.  This commit therefore extends the funnel
locking into the per-CPU rcu_data structure, providing concurrency equal
to the number of CPUs.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-17 14:59:00 -07:00
Paul E. McKenney
704dd435ac rcu: Consolidate last open-coded expedited memory barrier
One of the requirements on RCU grace periods is that if there is a
causal chain of operations that starts after one grace period and
ends before another grace period, then the two grace periods must
be serialized.  There has been (and might still be) code that relies
on this, for example, certain types of reference-counting code that
does a call_rcu() within an RCU callback function.

This requirement is why there is an smp_mb() at the end of both
synchronize_sched_expedited() and synchronize_rcu_expedited().
However, this is the only smp_mb() in these functions, so it would
be nicer to consolidate it into rcu_exp_gp_seq_end().  This commit
does just that.
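
In other words, the barrier moves into the sequence-ending primitive,
roughly as in this sketch (field name illustrative):

	static void rcu_exp_gp_seq_end(struct rcu_state *rsp)
	{
		rcu_seq_end(&rsp->expedited_sequence);
		smp_mb(); /* Ensure that consecutive grace periods serialize. */
	}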

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-17 14:58:59 -07:00
Paul E. McKenney
4f525a528b rcu: Apply rcu_seq operations to _rcu_barrier()
The rcu_seq operations were open-coded in _rcu_barrier(), so this commit
replaces the open-coding with the shiny new rcu_seq operations.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-17 14:58:57 -07:00
Paul E. McKenney
29fd930940 rcu: Use funnel locking for synchronize_rcu_expedited()'s polling loop
This commit gets rid of synchronize_rcu_expedited()'s mutex_trylock()
polling loop in favor of the funnel-locking scheme that was abstracted
from synchronize_sched_expedited().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-17 14:58:56 -07:00
Paul E. McKenney
7fd0ddc5bf rcu: Fix synchronize_sched_expedited() type error for "s"
The type of "s" has been "long" rather than the correct "unsigned long"
for quite some time.  This commit fixes this type error.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-17 14:58:55 -07:00
Paul E. McKenney
b09e5f8601 rcu: Abstract funnel locking from synchronize_sched_expedited()
This commit abstracts funnel locking from synchronize_sched_expedited()
so that it may be used by synchronize_rcu_expedited().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-17 14:58:53 -07:00
Paul E. McKenney
543c6158f6 rcu: Make synchronize_rcu_expedited() use sequence-counter scheme
Although synchronize_rcu_expedited() uses a sequence-counter scheme, it
is based on a single increment per grace period, which means that tasks
piggybacking off of concurrent grace periods may be forced to wait longer
than necessary.  This commit therefore applies the new sequence-count
functions developed for synchronize_sched_expedited() to speed things
up a bit and to consolidate the sequence-counter implementation.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-17 14:58:52 -07:00
Paul E. McKenney
28f00767e3 rcu: Abstract sequence counting from synchronize_sched_expedited()
This commit creates rcu_exp_gp_seq_start() and rcu_exp_gp_seq_end() to
bracket an expedited grace period, rcu_exp_gp_seq_snap() to snapshot the
sequence counter, and rcu_exp_gp_seq_done() to check to see if a full
expedited grace period has elapsed since the snapshot.  These will be
applied to synchronize_rcu_expedited().  These are defined in terms of
underlying rcu_seq_start(), rcu_seq_end(), rcu_seq_snap(), rcu_seq_done(),
which will be applied to _rcu_barrier().

One reason that this commit doesn't use the seqcount primitives themselves
is that the smp_wmb() in those primitives is insufficient because
expedited grace periods do reads as well as writes.  In addition,
the read-side seqcount primitives detect a potentially partial change,
where the expedited primitives instead need a guaranteed full change.
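
The intended usage pattern is roughly as follows (sketch):

	s = rcu_exp_gp_seq_snap(rsp);	/* Snapshot the counter. */
	if (rcu_exp_gp_seq_done(rsp, s))
		return;			/* A full expedited GP already elapsed. */
	rcu_exp_gp_seq_start(rsp);
	/* ... force quiescent states on the CPUs that need them ... */
	rcu_exp_gp_seq_end(rsp);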

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-17 14:58:51 -07:00
Peter Zijlstra
3a6d7c64d7 rcu: Make expedited GP CPU stoppage asynchronous
Sequentially stopping the CPUs slows down expedited grace periods by
at least a factor of two, based on rcutorture's grace-period-per-second
rate.  This is a conservative measure because rcutorture uses unusually
long RCU read-side critical sections and because rcutorture periodically
quiesces the system in order to test RCU's ability to ramp down to and
up from the idle state.  This commit therefore replaces the stop_one_cpu()
with stop_one_cpu_nowait(), using an atomic-counter scheme to determine
when all CPUs have passed through the stopped state.
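
A sketch of the counting scheme (identifiers illustrative): the initiator
sets an atomic counter to the number of targeted CPUs, dispatches the
stoppers with stop_one_cpu_nowait(), and then performs a single wait for
the counter to drain to zero:

	/* Runs in stopper context on each targeted CPU. */
	static int synchronize_sched_expedited_cpu_stop(void *data)
	{
		struct rcu_state *rsp = data;

		if (atomic_dec_and_test(&rsp->expedited_need_qs))
			wake_up(&rsp->expedited_wq);
		return 0;
	}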

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-17 14:58:50 -07:00
Paul E. McKenney
385b73c06f rcu: Get rid of synchronize_sched_expedited()'s polling loop
This commit gets rid of synchronize_sched_expedited()'s mutex_trylock()
polling loop in favor of a funnel-locking scheme based on the rcu_node
tree.  The work-done check is done at each level of the tree, allowing
high-contention situations to be resolved quickly with reasonable levels
of mutex contention.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-17 14:58:48 -07:00
Paul E. McKenney
d6ada2cf2f rcu: Rework synchronize_sched_expedited() counter handling
Now that synchronize_sched_expedited() has a mutex, it can use a simpler
work-already-done detection scheme.  This commit simplifies this scheme
by using something similar to the sequence-locking counter scheme.
A counter is incremented before and after each grace period, so that
the counter is odd in the midst of the grace period and even otherwise.
So if the counter has advanced to the second even number that is
greater than or equal to the snapshot, the required grace period has
already happened.
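
In sketch form, the snapshot and the completion check look like this
(names illustrative; the kernel's ULONG_CMP_GE() copes with counter wrap):

	/* Snapshot: the second even value at or beyond the current counter. */
	snap = (READ_ONCE(rsp->expedited_sequence) + 3) & ~0x1UL;

	/* Done once the counter has advanced at least that far. */
	if (ULONG_CMP_GE(READ_ONCE(rsp->expedited_sequence), snap))
		return;	/* Someone else's grace period did our work. */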

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-17 14:58:47 -07:00
Peter Zijlstra
c190c3b16c rcu: Switch synchronize_sched_expedited() to stop_one_cpu()
The synchronize_sched_expedited() currently invokes try_stop_cpus(),
which schedules the stopper kthreads on each online non-idle CPU,
and waits until all those kthreads are running before letting any
of them stop.  This is disastrous for real-time workloads, which
get hit with a preemption that is as long as the longest scheduling
latency on any CPU, including any non-realtime housekeeping CPUs.
This commit therefore switches to using stop_one_cpu() on each CPU
in turn.  This avoids inflicting the worst-case scheduling latency
on the worst-case CPU onto all other CPUs, and also simplifies the
code a little bit.

Follow-up commits will simplify the counter-snapshotting algorithm
and convert a number of the counters that are now protected by the
new ->expedited_mutex to non-atomic.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
[ paulmck: Kept stop_one_cpu(), dropped disabling of "guardrails". ]
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-17 14:58:45 -07:00
Paul E. McKenney
75c27f119b rcu: Remove CONFIG_RCU_CPU_STALL_INFO
The CONFIG_RCU_CPU_STALL_INFO has been default-y for a couple of
releases with no complaints, so it is time to eliminate this Kconfig
option entirely, so that the long-form RCU CPU stall warnings cannot
be disabled.  This commit does just that.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-17 14:58:44 -07:00
Paul E. McKenney
9b68387450 rcu: Stop disabling CPU hotplug in synchronize_rcu_expedited()
The fact that tasks could be migrated from leaf to root rcu_node
structures meant that synchronize_rcu_expedited() had to disable
CPU hotplug.  However, tasks now stay put, so this commit removes the
CPU-hotplug disabling from synchronize_rcu_expedited().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-17 14:58:42 -07:00
Paul E. McKenney
13bd64947f rcu: Reset rcu_fanout_leaf if out of bounds
Currently if the rcu_fanout_leaf boot parameter is out of bounds (that
is, less than RCU_FANOUT_LEAF or greater than the number of bits in an
unsigned long), a warning is issued and execution continues with the
out-of-bounds value.  This can result in all manner of failures, so this
patch resets rcu_fanout_leaf to RCU_FANOUT_LEAF when an out-of-bounds
condition is detected.
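
Concretely, the check becomes roughly (sketch):

	if (rcu_fanout_leaf < RCU_FANOUT_LEAF ||
	    rcu_fanout_leaf > sizeof(unsigned long) * 8) {
		rcu_fanout_leaf = RCU_FANOUT_LEAF;
		WARN_ON(1);
		return;
	}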

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-17 14:58:41 -07:00
Alexander Gordeev
032dfc8722 rcu: Shut up bogus gcc array bounds warning
Because gcc does not realize that the loop would never be entered
(that is, in the case of rcu_num_lvls == 1):

  for (i = 1; i < rcu_num_lvls; i++)
	  rsp->level[i] = rsp->level[i - 1] + levelcnt[i - 1];

some compiler versions (pre-5.x?) give a bogus warning:

  kernel/rcu/tree.c: In function ‘rcu_init_one.isra.55’:
  kernel/rcu/tree.c:4108:13: warning: array subscript is above array bounds [-Warray-bounds]
     rsp->level[i] = rsp->level[i - 1] + rsp->levelcnt[i - 1];
               ^
Fix that warning by adding an extra item to the rcu_state::level[]
array.  Once the bogus warning is fixed in gcc and the kernel drops
support for the older versions, the dummy item may be removed from
the array.

Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Suggested-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-17 14:58:40 -07:00
Paul E. McKenney
5be5d1a117 rcutorture: Add RCU-tasks qualifier to dereference
Although RCU-tasks isn't really designed to support rcu_dereference()
and list manipulation, that is how rcutorture tests it.  This means
that lockdep-RCU complains about the rcu_dereference_check() invocations
because RCU-tasks doesn't have read-side markers.  This commit therefore
creates a torturing_tasks() to silence the lockdep-RCU complaints from
rcu_dereference_check() when RCU-tasks is being tortured.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-15 14:47:19 -07:00
Paul E. McKenney
3a0af33341 rcutorture: Fix rcu_torture_cbflood() for callback-free RCU
The rcu_torture_cbflood() function correctly checks for flavors of
RCU that lack analogs to call_rcu() and rcu_barrier(), but in that
case it fails to terminate correctly.  In fact, it terminates so
incorrectly that segfaults can result.  This commit therefore causes
rcu_torture_cbflood() to do the proper wait-for-stop procedure.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-15 14:47:17 -07:00
Paul E. McKenney
e8e255f719 rcutorture: Bounds-check rcutorture.shuffle_interval
Specifying a negative rcutorture.shuffle_interval value will cause a
negative value to be used as a sleep time.  This commit therefore
refuses to start shuffling unless the rcutorture.shuffle_interval
value is greater than zero.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-15 14:47:16 -07:00
Paul E. McKenney
4444d852a9 rcutorture: Check nfakewriters parameter
Currently, a negative value for rcutorture.nfakewriters= can cause
rcutorture to pass a negative size to the memory allocator, which
is not really a particularly good thing to do.  This commit therefore
adds bounds checking to this parameter, so that values that are less
than or equal to zero disable fake writing.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-15 14:47:15 -07:00
Paul E. McKenney
d9eba76883 rcutorture: Better bounds checking for n_barrier_cbs
A negative value for rcutorture.n_barrier_cbs can pass a negative value
to the memory allocator, so this commit instead causes rcu_barrier()
testing to be disabled in this case.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-15 14:47:14 -07:00
Alexander Gordeev
426216970e rcu: Simplify arithmetic to calculate number of RCU nodes
This update makes the arithmetic used to calculate the number of
RCU nodes more straightforward and easier to read.

Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-15 14:45:21 -07:00
Alexander Gordeev
cb00710239 rcu: Limit count of static data to the number of RCU levels
Although the number of RCU levels may be less than the current
maximum of four, some static data associated with each level
are allocated for all four levels.  As a result, the extra data
never get accessed and just waste memory.  This update limits the
count of allocated items to the number of RCU levels actually used.

Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-15 14:45:20 -07:00
Alexander Gordeev
199977bff9 rcu: Remove unnecessary fields from rcu_state structure
Members rcu_state::levelcnt[] and rcu_state::levelspread[]
are only used at init. There is no reason to keep them
afterwards.

Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-15 14:45:19 -07:00
Alexander Gordeev
05b84aec46 rcu: Limit rcu_capacity[] size to RCU_NUM_LVLS items
The number of items in the rcu_capacity[] array is defined by the
macro MAX_RCU_LVLS.  However, that array is never accessed beyond
the RCU_NUM_LVLS index.  Therefore, we can limit the array to
RCU_NUM_LVLS items and eliminate MAX_RCU_LVLS.  As a result,
memory is conserved in most cases.

Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-15 14:45:18 -07:00
Alexander Gordeev
a6d77081e2 rcu: Limit rcu_state::levelcnt[] to RCU_NUM_LVLS items
The variable rcu_num_lvls is limited by the RCU_NUM_LVLS macro.
In turn, the rcu_state::levelcnt[] array is never accessed
beyond rcu_num_lvls.  Thus, it is safe to limit
rcu_state::levelcnt[] to RCU_NUM_LVLS items.

Since rcu_num_lvls can change during boot (as a result of an
update to the rcutree.rcu_fanout_leaf kernel parameter), one might
assume that the new value could exceed RCU_NUM_LVLS.
However, that is not the case, because the leaf-level fanout is
only permitted to increase, which can cause rcu_num_lvls only to
decrease.

Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-15 14:45:16 -07:00
Alexander Gordeev
9618138b09 rcu: Simplify rcu_init_geometry() capacity arithmetics
The current code suggests that introducing the extra level to
the rcu_capacity[] array makes some of the arithmetic easier.
In fact, it appears rather confusing and unnecessary.

Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-15 14:45:15 -07:00
Alexander Gordeev
679f9858b1 rcu: Cleanup rcu_init_geometry() code and arithmetics
This update simplifies the rcu_init_geometry() code flow
and makes the calculation of the total number of rcu_node
structures easier to read.

The update relies on the fact that num_rcu_lvl[] is never
accessed beyond the rcu_num_lvls index by the rest of the
code.  Therefore, there is no need to initialize the whole
of num_rcu_lvl[].

Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-15 14:45:14 -07:00
Alexander Gordeev
372b0ec24f rcu: Remove superfluous local variable in rcu_init_geometry()
The local variable 'n' mimics 'nr_cpu_ids', and both are
used within a single function.  There is no reason for 'n' to
exist at all.

Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-15 14:45:13 -07:00
Alexander Gordeev
75cf15a4c0 rcu: Panic if RCU tree can not accommodate all CPUs
Currently, a situation in which the RCU tree is unable to
accommodate the configured number of CPUs is not permitted and
causes a fallback to compile-time values.  However, the code has
no means of exceeding the RCU tree capacity, either at compile
time or at run time.  Therefore, if this condition is met at run
time, it indicates a serious problem elsewhere and should be
handled with a panic.

Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-15 14:45:12 -07:00
Paul E. McKenney
319362c90f rcu: Provide more diagnostics for stalled GP kthread
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-15 14:45:10 -07:00
Nicholas Mc Guire
f765d11307 rcu: Change return type to bool
Type-checking coccinelle spatches are being used to locate type mismatches
between function signatures and return values; in this case, this produced:
./kernel/rcu/srcu.c:271 WARNING: return of wrong type
        int != unsigned long,

srcu_readers_active() returns an int that is the sum of per-CPU unsigned
longs, but its only user is cleanup_srcu_struct(), which uses it as a
boolean (condition) to see whether there are any readers rather than
actually using the approximate number of readers.  The theoretically possible
unsigned long overflow case does not need to be handled explicitly - if
we had 4G++ readers then something else went wrong a long time ago.

proposal: change the return type to boolean. The function name is left
          unchanged as it fits the naming expectation for a boolean.

patch was compile tested for x86_64_defconfig (implies CONFIG_SRCU=y)

patch is against 4.1-rc5 (localversion-next is -next-20150525)

Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-15 14:43:53 -07:00
Denys Vlasenko
d5671f6bf2 rcu: Deinline rcu_read_lock_sched_held() if DEBUG_LOCK_ALLOC
DEBUG_LOCK_ALLOC=y is not a production setting, but it is
not very unusual either. Many developers routinely
use kernels built with it enabled.

Apart from being selected by hand, it is also auto-selected by
PROVE_LOCKING "Lock debugging: prove locking correctness" and
LOCK_STAT "Lock usage statistics" config options.
LOCK_STAT is necessary for "perf lock" to work.

I wouldn't spend too much time optimizing it, but this particular
function has a very large cost in code size: when it is deinlined,
code size decreases by 830,000 bytes:

    text     data      bss       dec     hex filename
85674192 22294776 20627456 128596424 7aa39c8 vmlinux.before
84837612 22294424 20627456 127759492 79d7484 vmlinux

(with this config: http://busybox.net/~vda/kernel_config)

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: Josh Triplett <josh@joshtriplett.org>
CC: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Lai Jiangshan <laijs@cn.fujitsu.com>
CC: Tejun Heo <tj@kernel.org>
CC: Oleg Nesterov <oleg@redhat.com>
CC: linux-kernel@vger.kernel.org
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-15 14:43:52 -07:00
Paul E. McKenney
d1ec4c34c7 rcu: Drop RCU_USER_QS in favor of NO_HZ_FULL
The RCU_USER_QS Kconfig parameter is now just a synonym for NO_HZ_FULL,
so this commit eliminates RCU_USER_QS, replacing all uses with NO_HZ_FULL.

Reported-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
2015-07-06 13:52:18 -07:00
Linus Torvalds
e382608254 Merge tag 'trace-v4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:
 "This patch series contains several clean ups and even a new trace
  clock "monitonic raw".  Also some enhancements to make the ring buffer
  even faster.  But the biggest and most noticeable change is the
  renaming of the ftrace* files, structures and variables that have to
  deal with trace events.

  Over the years I've had several developers tell me about their
  confusion with what ftrace is compared to events.  Technically,
  "ftrace" is the infrastructure to do the function hooks, which include
  tracing and also helps with live kernel patching.  But the trace
  events are a separate entity altogether, and the files that affect the
  trace events should not be named "ftrace".  These include:

    include/trace/ftrace.h         ->    include/trace/trace_events.h
    include/linux/ftrace_event.h   ->    include/linux/trace_events.h

  Also, functions that are specific for trace events have also been renamed:

    ftrace_print_*()               ->    trace_print_*()
    (un)register_ftrace_event()    ->    (un)register_trace_event()
    ftrace_event_name()            ->    trace_event_name()
    ftrace_trigger_soft_disabled() ->    trace_trigger_soft_disabled()
    ftrace_define_fields_##call()  ->    trace_define_fields_##call()
    ftrace_get_offsets_##call()    ->    trace_get_offsets_##call()

  Structures have been renamed:

    ftrace_event_file              ->    trace_event_file
    ftrace_event_{call,class}      ->    trace_event_{call,class}
    ftrace_event_buffer            ->    trace_event_buffer
    ftrace_subsystem_dir           ->    trace_subsystem_dir
    ftrace_event_raw_##call        ->    trace_event_raw_##call
    ftrace_event_data_offset_##call->    trace_event_data_offset_##call
    ftrace_event_type_funcs_##call ->    trace_event_type_funcs_##call

  And a few various variables and flags have also been updated.

  This has been sitting in linux-next for some time, and I have not
  heard a single complaint about this rename breaking anything.  Mostly
  because these functions, variables and structures are mostly internal
  to the tracing system and are seldom (if ever) used by anything
  external to that"

* tag 'trace-v4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (33 commits)
  ring_buffer: Allow to exit the ring buffer benchmark immediately
  ring-buffer-benchmark: Fix the wrong type
  ring-buffer-benchmark: Fix the wrong param in module_param
  ring-buffer: Add enum names for the context levels
  ring-buffer: Remove useless unused tracing_off_permanent()
  ring-buffer: Give NMIs a chance to lock the reader_lock
  ring-buffer: Add trace_recursive checks to ring_buffer_write()
  ring-buffer: Allways do the trace_recursive checks
  ring-buffer: Move recursive check to per_cpu descriptor
  ring-buffer: Add unlikelys to make fast path the default
  tracing: Rename ftrace_get_offsets_##call() to trace_event_get_offsets_##call()
  tracing: Rename ftrace_define_fields_##call() to trace_event_define_fields_##call()
  tracing: Rename ftrace_event_type_funcs_##call to trace_event_type_funcs_##call
  tracing: Rename ftrace_data_offset_##call to trace_event_data_offset_##call
  tracing: Rename ftrace_raw_##call event structures to trace_event_raw_##call
  tracing: Rename ftrace_trigger_soft_disabled() to trace_trigger_soft_disabled()
  tracing: Rename FTRACE_EVENT_FL_* flags to EVENT_FILE_FL_*
  tracing: Rename struct ftrace_subsystem_dir to trace_subsystem_dir
  tracing: Rename ftrace_event_name() to trace_event_name()
  tracing: Rename FTRACE_MAX_EVENT to TRACE_EVENT_TYPE_MAX
  ...
2015-06-26 14:02:43 -07:00
Linus Torvalds
43224b96af Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
 "A rather largish update for everything time and timer related:

   - Cache footprint optimizations for both hrtimers and timer wheel

   - Lower the NOHZ impact on systems which have NOHZ or timer migration
     disabled at runtime.

   - Optimize run time overhead of hrtimer interrupt by making the clock
     offset updates smarter

   - hrtimer cleanups and removal of restrictions to tackle some
     problems in sched/perf

   - Some more leap second tweaks

   - Another round of changes addressing the 2038 problem

   - First step to change the internals of clock event devices by
     introducing the necessary infrastructure

   - Allow constant folding for usecs/msecs_to_jiffies()

   - The usual pile of clockevent/clocksource driver updates

  The hrtimer changes contain updates to sched, perf and x86 as they
  depend on them plus changes all over the tree to cleanup API changes
  and redundant code, which got copied all over the place.  The y2038
  changes touch s390 to remove the last non-2038-safe code related to the
  boot/persistent clock"

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (114 commits)
  clocksource: Increase dependencies of timer-stm32 to limit build wreckage
  timer: Minimize nohz off overhead
  timer: Reduce timer migration overhead if disabled
  timer: Stats: Simplify the flags handling
  timer: Replace timer base by a cpu index
  timer: Use hlist for the timer wheel hash buckets
  timer: Remove FIFO "guarantee"
  timers: Sanitize catchup_timer_jiffies() usage
  hrtimer: Allow hrtimer::function() to free the timer
  seqcount: Introduce raw_write_seqcount_barrier()
  seqcount: Rename write_seqcount_barrier()
  hrtimer: Fix hrtimer_is_queued() hole
  hrtimer: Remove HRTIMER_STATE_MIGRATE
  selftest: Timers: Avoid signal deadlock in leap-a-day
  timekeeping: Copy the shadow-timekeeper over the real timekeeper last
  clockevents: Check state instead of mode in suspend/resume path
  selftests: timers: Add leap-second timer edge testing to leap-a-day.c
  ntp: Do leapsecond adjustment in adjtimex read path
  time: Prevent early expiry of hrtimers[CLOCK_REALTIME] at the leap second edge
  ntp: Introduce and use SECS_PER_DAY macro instead of 86400
  ...
2015-06-22 18:57:44 -07:00
Thomas Gleixner
bc7a34b8b9 timer: Reduce timer migration overhead if disabled
Eric reported that the timer_migration sysctl is not really nice
performance wise as it needs to check at every timer insertion whether
the feature is enabled or not. Further the check does not live in the
timer code, so we have an extra function call which checks an extra
cache line to figure out that it is disabled.

We can do better and store that information in the per cpu (hr)timer
bases. I pondered to use a static key, but that's a nightmare to
update from the nohz code and the timer base cache line is hot anyway
when we select a timer base.

The old logic enabled the timer migration unconditionally if
CONFIG_NO_HZ was set even if nohz was disabled on the kernel command
line.

With this modification, we start off with migration disabled. The user
visible sysctl is still set to enabled. If the kernel switches to NOHZ
migration is enabled, if the user did not disable it via the sysctl
prior to the switch. If nohz=off is on the kernel command line,
migration stays disabled no matter what.

Before:
  47.76%  hog       [.] main
  14.84%  [kernel]  [k] _raw_spin_lock_irqsave
   9.55%  [kernel]  [k] _raw_spin_unlock_irqrestore
   6.71%  [kernel]  [k] mod_timer
   6.24%  [kernel]  [k] lock_timer_base.isra.38
   3.76%  [kernel]  [k] detach_if_pending
   3.71%  [kernel]  [k] del_timer
   2.50%  [kernel]  [k] internal_add_timer
   1.51%  [kernel]  [k] get_nohz_timer_target
   1.28%  [kernel]  [k] __internal_add_timer
   0.78%  [kernel]  [k] timerfn
   0.48%  [kernel]  [k] wake_up_nohz_cpu

After:
  48.10%  hog       [.] main
  15.25%  [kernel]  [k] _raw_spin_lock_irqsave
   9.76%  [kernel]  [k] _raw_spin_unlock_irqrestore
   6.50%  [kernel]  [k] mod_timer
   6.44%  [kernel]  [k] lock_timer_base.isra.38
   3.87%  [kernel]  [k] detach_if_pending
   3.80%  [kernel]  [k] del_timer
   2.67%  [kernel]  [k] internal_add_timer
   1.33%  [kernel]  [k] __internal_add_timer
   0.73%  [kernel]  [k] timerfn
   0.54%  [kernel]  [k] wake_up_nohz_cpu


Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Joonwoo Park <joonwoop@codeaurora.org>
Cc: Wenbo Wang <wenbo.wang@memblaze.com>
Link: http://lkml.kernel.org/r/20150526224512.127050787@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-06-19 15:18:28 +02:00
Paul E. McKenney
0868aa2216 Merge branches 'array.2015.05.27a', 'doc.2015.05.27a', 'fixes.2015.05.27a', 'hotplug.2015.05.27a', 'init.2015.05.27a', 'tiny.2015.05.27a' and 'torture.2015.05.27a' into HEAD
array.2015.05.27a:  Remove all uses of RCU-protected array indexes.
doc.2015.05.27a:  Documentation updates.
fixes.2015.05.27a:  Miscellaneous fixes.
hotplug.2015.05.27a:  CPU-hotplug updates.
init.2015.05.27a:  Initialization/Kconfig updates.
tiny.2015.05.27a:  Updates to Tiny RCU.
torture.2015.05.27a:  Torture-testing updates.
2015-05-27 13:00:49 -07:00
Paul E. McKenney
ca1d51ed98 rcutorture: Test SRCU cleanup code path
The current rcutorture testing does not do any cleanup operations.
This works because the srcu_struct is statically allocated, but it
does represent a memory leak of the associated dynamically allocated
->per_cpu_ref per-CPU variables.  However, rcutorture currently uses
a statically allocated srcu_struct, which cannot legally be passed to
cleanup_srcu_struct().  Therefore, this commit adds a second form
of srcu (called srcud) that dynamically allocates and frees the
associated per-CPU variables.  This commit also adds a ->cleanup()
member to rcu_torture_ops that is invoked at the end of the test,
after ->cb_barriers().  This ->cleanup() pointer is NULL for all
existing tests, and thus used only for srcud.  Finally, the SRCU-P
torture-test configuration selects srcud instead of srcu, with SRCU-N
continuing to use srcu, thereby testing both static and dynamic
srcu_struct structures.
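
A sketch of the dynamic variant's init and cleanup hooks (identifiers
illustrative):

	static struct srcu_struct srcu_ctld;

	static void srcud_init(void)
	{
		WARN_ON(init_srcu_struct(&srcu_ctld)); /* Allocates per-CPU data. */
	}

	static void srcud_cleanup(void)
	{
		cleanup_srcu_struct(&srcu_ctld);	/* Frees it at end of test. */
	}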

Reported-by: "Ahmed, Iftekhar" <ahmedi@onid.oregonstate.edu>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-05-27 12:59:58 -07:00