mirror of
https://git.proxmox.com/git/mirror_ubuntu-kernels.git
synced 2025-11-07 22:54:07 +00:00
Arkadiusz reported that enabling memcg's group oom killing causes
strange memcg statistics where there is no task in a memcg despite the
number of tasks in that memcg is not 0. It turned out that there is a
bug in wake_oom_reaper() which allows enqueuing same task twice which
makes impossible to decrease the number of tasks in that memcg due to a
refcount leak.
This bug existed since the OOM reaper became invokable from
task_will_free_mem(current) path in out_of_memory() in Linux 4.7,
T1@P1 |T2@P1 |T3@P1 |OOM reaper
----------+----------+----------+------------
# Processing an OOM victim in a different memcg domain.
try_charge()
mem_cgroup_out_of_memory()
mutex_lock(&oom_lock)
try_charge()
mem_cgroup_out_of_memory()
mutex_lock(&oom_lock)
try_charge()
mem_cgroup_out_of_memory()
mutex_lock(&oom_lock)
out_of_memory()
oom_kill_process(P1)
do_send_sig_info(SIGKILL, @P1)
mark_oom_victim(T1@P1)
wake_oom_reaper(T1@P1) # T1@P1 is enqueued.
mutex_unlock(&oom_lock)
out_of_memory()
mark_oom_victim(T2@P1)
wake_oom_reaper(T2@P1) # T2@P1 is enqueued.
mutex_unlock(&oom_lock)
out_of_memory()
mark_oom_victim(T1@P1)
wake_oom_reaper(T1@P1) # T1@P1 is enqueued again due to oom_reaper_list == T2@P1 && T1@P1->oom_reaper_list == NULL.
mutex_unlock(&oom_lock)
# Completed processing an OOM victim in a different memcg domain.
spin_lock(&oom_reaper_lock)
# T1P1 is dequeued.
spin_unlock(&oom_reaper_lock)
but memcg's group oom killing made it easier to trigger this bug by
calling wake_oom_reaper() on the same task from one out_of_memory()
request.
Fix this bug using an approach used by commit
|
||
|---|---|---|
| .. | ||
| autogroup.h | ||
| clock.h | ||
| coredump.h | ||
| cpufreq.h | ||
| cputime.h | ||
| deadline.h | ||
| debug.h | ||
| hotplug.h | ||
| idle.h | ||
| init.h | ||
| isolation.h | ||
| jobctl.h | ||
| loadavg.h | ||
| mm.h | ||
| nohz.h | ||
| numa_balancing.h | ||
| prio.h | ||
| rt.h | ||
| signal.h | ||
| smt.h | ||
| stat.h | ||
| sysctl.h | ||
| task_stack.h | ||
| task.h | ||
| topology.h | ||
| user.h | ||
| wake_q.h | ||
| xacct.h | ||