mirror of
https://git.proxmox.com/git/mirror_ubuntu-kernels.git
synced 2025-11-08 15:30:18 +00:00
Automatic NUMA balancing depends on being able to protect PTEs to trap a
fault and gather reference locality information. Very broadly speaking
it would mark PTEs as not present and use another bit to distinguish
between NUMA hinting faults and other types of faults. It was
universally loved by everybody and caused no problems whatsoever. That
last sentence might be a lie.
This series is very heavily based on patches from Linus and Aneesh to
replace the existing PTE/PMD NUMA helper functions with normal change
protections. I did alter and add parts of it but I consider them
relatively minor contributions. At their suggestion, acked-bys are in
there but I've no problem converting them to Signed-off-by if requested.
AFAIK, this has received no testing on ppc64 and I'm depending on Aneesh
for that. Trinity passed under kvm-tool, and I ran a few
other basic tests. At the time of writing, only the short-lived tests
have completed but testing of V2 indicated that long-term testing had no
surprises. In most cases I'm leaving out detail as it's not that
interesting.
specjbb single JVM: There was negligible performance difference in the
benchmark itself for short runs. However, system activity is
higher and interrupts are much higher over time -- possibly TLB
flushes. Migrations are also higher. Overall, this is more overhead
but considering the problems faced with the old approach I think
we just have to suck it up and find another way of reducing the
overhead.
specjbb multi JVM: Negligible performance difference to the actual benchmark
but like the single JVM case, the system overhead is noticeably
higher. Again, interrupts are a major factor.
autonumabench: This was all over the place and about all that can be
reasonably concluded is that it's different but not necessarily
better or worse.
autonumabench
                                    3.18.0-rc5              3.18.0-rc5
                                mmotm-20141119           protnone-v3r3
User    NUMA01             32380.24 (   0.00%)     21642.92 (  33.16%)
User    NUMA01_THEADLOCAL  22481.02 (   0.00%)     22283.22 (   0.88%)
User    NUMA02              3137.00 (   0.00%)      3116.54 (   0.65%)
User    NUMA02_SMT          1614.03 (   0.00%)      1543.53 (   4.37%)
System  NUMA01               322.97 (   0.00%)      1465.89 (-353.88%)
System  NUMA01_THEADLOCAL     91.87 (   0.00%)        49.32 (  46.32%)
System  NUMA02                37.83 (   0.00%)        14.61 (  61.38%)
System  NUMA02_SMT             7.36 (   0.00%)         7.45 (  -1.22%)
Elapsed NUMA01               716.63 (   0.00%)       599.29 (  16.37%)
Elapsed NUMA01_THEADLOCAL    553.98 (   0.00%)       539.94 (   2.53%)
Elapsed NUMA02                83.85 (   0.00%)        83.04 (   0.97%)
Elapsed NUMA02_SMT            86.57 (   0.00%)        79.15 (   8.57%)
CPU     NUMA01              4563.00 (   0.00%)      3855.00 (  15.52%)
CPU     NUMA01_THEADLOCAL   4074.00 (   0.00%)      4136.00 (  -1.52%)
CPU     NUMA02              3785.00 (   0.00%)      3770.00 (   0.40%)
CPU     NUMA02_SMT          1872.00 (   0.00%)      1959.00 (  -4.65%)
System CPU usage of NUMA01 is worse but it's an adverse workload on this
machine so I'm reluctant to conclude that it's a problem that matters. On
the other workloads that are sensible on this machine, system CPU usage is
great. Overall time to complete the benchmark is comparable.
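The percentages in parentheses in the autonumabench table appear to be the relative change against the mmotm-20141119 baseline, with a positive value meaning less time than the baseline. A minimal sketch of that arithmetic, assuming this reading is right (the helper name `gain_percent` is made up for illustration and is not part of any benchmark tooling):

```c
/*
 * Relative gain of a patched result over a baseline, in percent.
 * Positive: the patched kernel used less time than the baseline.
 * Negative: it used more.
 * The name and signature are hypothetical, chosen for this sketch.
 */
static double gain_percent(double baseline, double patched)
{
	return (baseline - patched) / baseline * 100.0;
}
```

Feeding it the User NUMA01 pair (32380.24, 21642.92) gives roughly 33.16, and the System NUMA01 pair (322.97, 1465.89) gives roughly -353.88, which agrees with the table.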
                             3.18.0-rc5     3.18.0-rc5
                         mmotm-20141119  protnone-v3r3
User                           59612.50       48586.44
System                           460.22        1537.45
Elapsed                         1442.20        1304.29
NUMA alloc hit                  5075182        5743353
NUMA alloc miss                       0              0
NUMA interleave hit                   0              0
NUMA alloc local                5075174        5743339
NUMA base PTE updates         637061448      443106883
NUMA huge PMD updates           1243434         864747
NUMA page range updates      1273699656      885857347
NUMA hint faults                1658116        1214277
NUMA hint local faults           959487         754113
NUMA hint local percent              57             62
NUMA pages migrated             5467056       61676398
The NUMA pages migrated look terrible but when I looked at a graph of the
activity over time I see that the massive spike in migration activity was
during NUMA01. This correlates with high system CPU usage and could be
simply down to bad luck but any modifications that affect that workload
would be related to scan rates and migrations, not the protection
mechanism. For all other workloads, migration activity was comparable.
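The "NUMA hint local percent" row can be reproduced from the two rows above it. A small sketch, assuming the figure is the local share of all hinting faults with the fraction truncated (the function name is hypothetical; how the reporting tool actually rounds is not stated here):

```c
/*
 * Share of NUMA hinting faults that were node-local, as a whole percent.
 * Integer division truncates the fraction, which is consistent with the
 * 57 and 62 reported in the table. Hypothetical helper for illustration.
 */
static unsigned long long local_percent(unsigned long long local_faults,
					unsigned long long hint_faults)
{
	if (hint_faults == 0)
		return 0;	/* avoid dividing by zero when nothing faulted */
	return local_faults * 100ULL / hint_faults;
}
```

With the baseline counters (959487 local out of 1658116) this yields 57, and with the protnone-v3r3 counters (754113 out of 1214277) it yields 62.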
Overall, headline performance figures are comparable but the overhead is
higher, mostly in interrupts. To some extent, higher overhead from this
approach was anticipated but not to this degree. It's going to be
necessary to reduce this again with a separate series in the future. It's
still worth going ahead with this series though as it's likely to avoid
constant headaches with Xen and is probably easier to maintain.
This patch (of 10):
A transhuge NUMA hinting fault may find the page is migrating and should
wait until migration completes. The check is race-prone because the pmd
is dereferenced outside of the page lock and while the race is tiny, it'll
be larger if the PMD is cleared while marking PMDs for hinting fault.
This patch closes the race.
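The general shape of this kind of fix can be shown in userspace. The sketch below is an analogy only, not the kernel change itself: `struct slot`, `slot_snapshot()` and `slot_clear()` are hypothetical names, and a pthread mutex stands in for the lock that protects the entry. The point is that the shared value is read while the lock is held, and only the private copy is used after the lock is dropped.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a lock-protected page table slot. */
struct slot {
	pthread_mutex_t lock;
	void *entry;	/* may be cleared by another thread at any time */
};

/*
 * Racy shape (not shown): drop the lock first, then read s->entry.
 * The entry may have been cleared in between.
 *
 * Safer shape: copy the entry while holding the lock and work on the
 * private copy afterwards.
 */
static void *slot_snapshot(struct slot *s)
{
	void *copy;

	pthread_mutex_lock(&s->lock);
	copy = s->entry;		/* dereference under the lock */
	pthread_mutex_unlock(&s->lock);

	return copy;			/* unaffected by later changes to s->entry */
}

/* Clear the slot, as a concurrent updater might. */
static void slot_clear(struct slot *s)
{
	pthread_mutex_lock(&s->lock);
	s->entry = NULL;
	pthread_mutex_unlock(&s->lock);
}
```

A caller that takes a snapshot and then sees the slot cleared still holds the value it read under the lock, so any waiting or follow-up work operates on a consistent view.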
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Dave Jones <davej@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
107 lines
3.1 KiB
C
#ifndef _LINUX_MIGRATE_H
#define _LINUX_MIGRATE_H

#include <linux/mm.h>
#include <linux/mempolicy.h>
#include <linux/migrate_mode.h>

typedef struct page *new_page_t(struct page *page, unsigned long private,
				int **reason);
typedef void free_page_t(struct page *page, unsigned long private);

/*
 * Return values from address_space_operations.migratepage():
 * - negative errno on page migration failure;
 * - zero on page migration success;
 */
#define MIGRATEPAGE_SUCCESS		0

enum migrate_reason {
	MR_COMPACTION,
	MR_MEMORY_FAILURE,
	MR_MEMORY_HOTPLUG,
	MR_SYSCALL,		/* also applies to cpusets */
	MR_MEMPOLICY_MBIND,
	MR_NUMA_MISPLACED,
	MR_CMA
};

#ifdef CONFIG_MIGRATION

extern void putback_movable_pages(struct list_head *l);
extern int migrate_page(struct address_space *,
			struct page *, struct page *, enum migrate_mode);
extern int migrate_pages(struct list_head *l, new_page_t new, free_page_t free,
		unsigned long private, enum migrate_mode mode, int reason);

extern int migrate_prep(void);
extern int migrate_prep_local(void);
extern void migrate_page_copy(struct page *newpage, struct page *page);
extern int migrate_huge_page_move_mapping(struct address_space *mapping,
				  struct page *newpage, struct page *page);
extern int migrate_page_move_mapping(struct address_space *mapping,
		struct page *newpage, struct page *page,
		struct buffer_head *head, enum migrate_mode mode,
		int extra_count);
#else

static inline void putback_movable_pages(struct list_head *l) {}
static inline int migrate_pages(struct list_head *l, new_page_t new,
		free_page_t free, unsigned long private, enum migrate_mode mode,
		int reason)
	{ return -ENOSYS; }

static inline int migrate_prep(void) { return -ENOSYS; }
static inline int migrate_prep_local(void) { return -ENOSYS; }

static inline void migrate_page_copy(struct page *newpage,
				     struct page *page) {}

static inline int migrate_huge_page_move_mapping(struct address_space *mapping,
				  struct page *newpage, struct page *page)
{
	return -ENOSYS;
}

#endif /* CONFIG_MIGRATION */

#ifdef CONFIG_NUMA_BALANCING
extern bool pmd_trans_migrating(pmd_t pmd);
extern int migrate_misplaced_page(struct page *page,
				  struct vm_area_struct *vma, int node);
extern bool migrate_ratelimited(int node);
#else
static inline bool pmd_trans_migrating(pmd_t pmd)
{
	return false;
}
static inline int migrate_misplaced_page(struct page *page,
					 struct vm_area_struct *vma, int node)
{
	return -EAGAIN; /* can't migrate now */
}
static inline bool migrate_ratelimited(int node)
{
	return false;
}
#endif /* CONFIG_NUMA_BALANCING */

#if defined(CONFIG_NUMA_BALANCING) && defined(CONFIG_TRANSPARENT_HUGEPAGE)
extern int migrate_misplaced_transhuge_page(struct mm_struct *mm,
			struct vm_area_struct *vma,
			pmd_t *pmd, pmd_t entry,
			unsigned long address,
			struct page *page, int node);
#else
static inline int migrate_misplaced_transhuge_page(struct mm_struct *mm,
			struct vm_area_struct *vma,
			pmd_t *pmd, pmd_t entry,
			unsigned long address,
			struct page *page, int node)
{
	return -EAGAIN;
}
#endif /* CONFIG_NUMA_BALANCING && CONFIG_TRANSPARENT_HUGEPAGE */

#endif /* _LINUX_MIGRATE_H */
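
The header above pairs each real declaration with a `static inline` stub that returns an error when the feature is compiled out, so callers need no `#ifdef` of their own. A standalone userspace sketch of that pattern follows. `HAVE_FEATURE`, `do_feature()` and `try_feature()` are hypothetical names invented for this illustration; they stand in for a `CONFIG_` option and its users and are not kernel interfaces.

```c
#include <errno.h>

/*
 * HAVE_FEATURE is a made-up build switch standing in for a CONFIG_ option.
 * When it is defined, a real do_feature() must be provided elsewhere;
 * when it is not, the stub below is used instead.
 */
#ifdef HAVE_FEATURE
int do_feature(int arg);
#else
static inline int do_feature(int arg)
{
	(void)arg;		/* unused in the stub */
	return -ENOSYS;		/* same convention as the stubs in the header */
}
#endif

/* Caller code is identical in both configurations. */
static int try_feature(int arg)
{
	int ret = do_feature(arg);

	if (ret == -ENOSYS)
		return 0;	/* feature compiled out: treat as a no-op */
	return ret;
}
```

Built without `HAVE_FEATURE`, `do_feature()` returns `-ENOSYS` and `try_feature()` quietly falls back to doing nothing, which is the behaviour the stubs are designed to allow.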