mirror of
https://git.proxmox.com/git/mirror_ubuntu-kernels.git
synced 2026-01-06 02:26:17 +00:00
Currently preferred node is set to dst_nid which is the last node in the iteration whose group weight or task weight is greater than the current node. However it doesn't guarantee that dst_nid has the numa capacity to move. It also doesn't guarantee that dst_nid has the best_cpu which is the CPU/node ideal for node migration. Lets consider faults on a 4 node system with group weight numbers in different nodes being in 0 < 1 < 2 < 3 proportion. Consider the task is running on 3 and 0 is its preferred node but its capacity is full. Consider nodes 1, 2 and 3 have capacity. Then the task should be migrated to node 1. Currently the task gets moved to node 2. env.dst_nid points to the last node whose faults were greater than current node. Modify to set the preferred node based of best_cpu. Earlier setting preferred node was skipped if nr_active_nodes is 1. This could result in the task being moved out of the preferred node to a random node during regular load balancing. Also while modifying task_numa_migrate(), use sched_setnuma to set preferred node. This ensures out numa accounting is correct. Running SPECjbb2005 on a 4 node machine and comparing bops/JVM JVMS LAST_PATCH WITH_PATCH %CHANGE 16 25122.9 25549.6 1.698 1 73850 73190 -0.89 Running SPECjbb2005 on a 16 node machine and comparing bops/JVM JVMS LAST_PATCH WITH_PATCH %CHANGE 8 105930 113437 7.08676 1 178624 196130 9.80047 (numbers from v1 based on v4.17-rc5) Testcase Time: Min Max Avg StdDev numa01.sh Real: 435.78 653.81 534.58 83.20 numa01.sh Sys: 121.93 187.18 145.90 23.47 numa01.sh User: 37082.81 51402.80 43647.60 5409.75 numa02.sh Real: 60.64 61.63 61.19 0.40 numa02.sh Sys: 14.72 25.68 19.06 4.03 numa02.sh User: 5210.95 5266.69 5233.30 20.82 numa03.sh Real: 746.51 808.24 780.36 23.88 numa03.sh Sys: 97.26 108.48 105.07 4.28 numa03.sh User: 58956.30 61397.05 60162.95 1050.82 numa04.sh Real: 465.97 519.27 484.81 19.62 numa04.sh Sys: 304.43 359.08 334.68 20.64 numa04.sh User: 37544.16 41186.15 39262.44 1314.91 numa05.sh Real: 411.57 457.20 433.29 16.58 numa05.sh Sys: 230.05 435.48 339.95 67.58 numa05.sh User: 33325.54 36896.31 35637.84 1222.64 Testcase Time: Min Max Avg StdDev %Change numa01.sh Real: 506.35 794.46 599.06 104.26 -10.76% numa01.sh Sys: 150.37 223.56 195.99 24.94 -25.55% numa01.sh User: 43450.69 61752.04 49281.50 6635.33 -11.43% numa02.sh Real: 60.33 62.40 61.31 0.90 -0.195% numa02.sh Sys: 18.12 31.66 24.28 5.89 -21.49% numa02.sh User: 5203.91 5325.32 5260.29 49.98 -0.513% numa03.sh Real: 696.47 853.62 745.80 57.28 4.6339% numa03.sh Sys: 85.68 123.71 97.89 13.48 7.3347% numa03.sh User: 55978.45 66418.63 59254.94 3737.97 1.5323% numa04.sh Real: 444.05 514.83 497.06 26.85 -2.464% numa04.sh Sys: 230.39 375.79 316.23 48.58 5.8343% numa04.sh User: 35403.12 41004.10 39720.80 2163.08 -1.153% numa05.sh Real: 423.09 460.41 439.57 13.92 -1.428% numa05.sh Sys: 287.38 480.15 369.37 68.52 -7.964% numa05.sh User: 34732.12 38016.80 36255.85 1070.51 -1.704% Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1529514181-9842-5-git-send-email-srikar@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org> |
||
|---|---|---|
| .. | ||
| bpf | ||
| cgroup | ||
| configs | ||
| debug | ||
| dma | ||
| events | ||
| gcov | ||
| irq | ||
| livepatch | ||
| locking | ||
| power | ||
| printk | ||
| rcu | ||
| sched | ||
| time | ||
| trace | ||
| .gitignore | ||
| acct.c | ||
| async.c | ||
| audit_fsnotify.c | ||
| audit_tree.c | ||
| audit_watch.c | ||
| audit.c | ||
| audit.h | ||
| auditfilter.c | ||
| auditsc.c | ||
| backtracetest.c | ||
| bounds.c | ||
| capability.c | ||
| compat.c | ||
| configs.c | ||
| context_tracking.c | ||
| cpu_pm.c | ||
| cpu.c | ||
| crash_core.c | ||
| crash_dump.c | ||
| cred.c | ||
| delayacct.c | ||
| dma.c | ||
| elfcore.c | ||
| exec_domain.c | ||
| exit.c | ||
| extable.c | ||
| fail_function.c | ||
| fork.c | ||
| freezer.c | ||
| futex_compat.c | ||
| futex.c | ||
| groups.c | ||
| hung_task.c | ||
| iomem.c | ||
| irq_work.c | ||
| jump_label.c | ||
| kallsyms.c | ||
| kcmp.c | ||
| Kconfig.freezer | ||
| Kconfig.hz | ||
| Kconfig.locks | ||
| Kconfig.preempt | ||
| kcov.c | ||
| kexec_core.c | ||
| kexec_file.c | ||
| kexec_internal.h | ||
| kexec.c | ||
| kmod.c | ||
| kprobes.c | ||
| ksysfs.c | ||
| kthread.c | ||
| latencytop.c | ||
| Makefile | ||
| memremap.c | ||
| module_signing.c | ||
| module-internal.h | ||
| module.c | ||
| notifier.c | ||
| nsproxy.c | ||
| padata.c | ||
| panic.c | ||
| params.c | ||
| pid_namespace.c | ||
| pid.c | ||
| profile.c | ||
| ptrace.c | ||
| range.c | ||
| reboot.c | ||
| relay.c | ||
| resource.c | ||
| rseq.c | ||
| seccomp.c | ||
| signal.c | ||
| smp.c | ||
| smpboot.c | ||
| smpboot.h | ||
| softirq.c | ||
| stacktrace.c | ||
| stop_machine.c | ||
| sys_ni.c | ||
| sys.c | ||
| sysctl_binary.c | ||
| sysctl.c | ||
| task_work.c | ||
| taskstats.c | ||
| test_kprobes.c | ||
| torture.c | ||
| tracepoint.c | ||
| tsacct.c | ||
| ucount.c | ||
| uid16.c | ||
| uid16.h | ||
| umh.c | ||
| up.c | ||
| user_namespace.c | ||
| user-return-notifier.c | ||
| user.c | ||
| utsname_sysctl.c | ||
| utsname.c | ||
| watchdog_hld.c | ||
| watchdog.c | ||
| workqueue_internal.h | ||
| workqueue.c | ||