Commit Graph

7403 Commits

Author SHA1 Message Date
Stanislav Kinsburskii
b28d1ccf92 powerpc/io: Expect immutable pointer in virt_to_phys() prototype
virt_to_phys() doesn't need the address pointer to be mutable.

At the same time allowing it to be mutable leads to the following build
warning for constant pointers:

  warning: passing argument 1 of ‘virt_to_phys’ discards ‘const’ qualifier from pointer target type

Signed-off-by: Stanislav Kinsburskii <stanislav.kinsburskii@gmail.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/168155747391.13678.10634415747614468991.stgit@skinsburskii.localdomain
2023-10-19 17:16:20 +11:00
Benjamin Gray
c6519c6df0 powerpc/uaccess: Cast away __user annotation after verification
Sparse reports dereference of a __user pointer. copy_mc_to_user() takes
a __user pointer, verifies it, then calls the generic copy routine
copy_mc_generic().

As we have verified the pointer, cast out the __user annotation when
passing to copy_mc_generic().

Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231011053711.93427-10-bgray@linux.ibm.com
2023-10-19 17:16:20 +11:00
Benjamin Gray
8577dd00a6 powerpc/opal: Annotate out param endianness
Sparse reports an endian mismatch with args to opal_int_get_xirr().
Checking the skiboot source[1] shows the function takes a __be32*
(as expected), so update the function declaration to reflect this.

[1]: https://github.com/open-power/skiboot/blob/80e2b1dc73/hw/xive.c#L3479

Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231011053711.93427-9-bgray@linux.ibm.com
2023-10-19 17:16:20 +11:00
Benjamin Gray
b7bce57043 powerpc/kvm: Force cast endianness of KVM shared regs
Sparse reports endianness mismatches in the KVM shared regs getter and
setter helpers.

This code has dynamic endianness behind a safe interface, so a force is
warranted here to tell sparse this is OK.

Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

Link: https://msgid.link/20231011053711.93427-8-bgray@linux.ibm.com
2023-10-19 17:15:39 +11:00
Benjamin Gray
2b4a6cc9a1 powerpc: Annotate endianness of various variables and functions
Sparse reports several endianness warnings on variables and functions
that are consistently treated as big endian. There are no
multi-endianness shenanigans going on here so fix these low hanging
fruit up in one patch.

All changes are just type annotations; no endianness switching
operations are introduced by this patch.

Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231011053711.93427-7-bgray@linux.ibm.com
2023-10-19 17:12:47 +11:00
Christophe Leroy
b1fba034a6 powerpc: Support execute-only on all powerpc
Introduce PAGE_EXECONLY_X macro which provides exec-only rights.
The _X may be seen as redundant with the EXECONLY but it helps
keep consistency, all macros having the EXEC right have _X.

And put it next to PAGE_NONE as PAGE_EXECONLY_X is
somehow PAGE_NONE + EXEC just like all other SOMETHING_X are
just SOMETHING + EXEC.

On book3s/64 PAGE_EXECONLY becomes PAGE_READONLY_X.

On book3s/64, as PAGE_EXECONLY is only valid for Radix add
VM_READ flag in vm_get_page_prot() for non-Radix.

And update access_error() so that a non exec fault on a VM_EXEC only
mapping is always invalid, even when the underlying layer don't
always generate a fault for that.

For 8xx, set PAGE_EXECONLY_X as _PAGE_NA | _PAGE_EXEC.
For others, only set it as just _PAGE_EXEC

With that change, 8xx, e500 and 44x fully honor execute-only
protection.

On 40x that is a partial implementation of execute-only. The
implementation won't be complete because once a TLB has been loaded
via the Instruction TLB miss handler, it will be possible to read
the page. But at least it can't be read unless it is executed first.

On 603 MMU, TLB missed are handled by SW and there are separate
DTLB and ITLB. Execute-only is therefore now supported by not loading
DTLB when read access is not permitted.

On hash (604) MMU it is more tricky because hash table is common to
load/store and execute. Nevertheless it is still possible to check
whether _PAGE_READ is set before loading hash table for a load/store
access. At least it can't be read unless it is executed first.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/4283ea9cbef9ff2fbee468904800e1962bc8fc18.1695659959.git.christophe.leroy@csgroup.eu
2023-10-19 17:12:47 +11:00
Christophe Leroy
163a72fa89 powerpc: Finally remove _PAGE_USER
_PAGE_USER is now gone on all targets. Remove it completely.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/76ebe74fdaed4297a1d8203a61174650c1d8d278.1695659959.git.christophe.leroy@csgroup.eu
2023-10-19 17:12:47 +11:00
Christophe Leroy
bac4cffc7c powerpc/32s: Introduce _PAGE_READ and remove _PAGE_USER
On 603 MMU, TLB missed are handled by SW and there are separated
DTLB and ITLB. It is therefore possible to implement execute-only
protection by not loading DTLB when read access is not permitted.

To do that, _PAGE_READ flag is needed but there is no bit available
for it in PTE. On the other hand the only real use of _PAGE_USER is
to implement PAGE_NONE by clearing _PAGE_USER.

As _PAGE_NONE can also be implemented by clearing _PAGE_READ, remove
_PAGE_USER and add _PAGE_READ. Then use the virtual address to know
whether user rights or kernel rights are to be used.

With that change, 603 MMU now honors execute-only protection.

For hash (604) MMU it is more tricky because hash table is common to
load/store and execute. Nevertheless it is still possible to check
whether _PAGE_READ is set before loading hash table for a load/store
access. At least it can't be read unless it is executed first.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/b7702dd5a041ec59055ed2880f4952e94c087a2e.1695659959.git.christophe.leroy@csgroup.eu
2023-10-19 17:12:47 +11:00
Christophe Leroy
46ebef51fd powerpc/32s: Add _PAGE_WRITE to supplement _PAGE_RW
Several places, _PAGE_RW maps to write permission and don't
always imply read. To make it more clear, do as book3s/64 in
commit c7d54842de ("powerpc/mm: Use _PAGE_READ to indicate
Read access") and use _PAGE_WRITE when more relevant.

For the time being _PAGE_WRITE is equivalent to _PAGE_RW but that
will change when _PAGE_READ gets added in following patches.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/5798782869fe4d2698f104948dabd17657b89395.1695659959.git.christophe.leroy@csgroup.eu
2023-10-19 17:12:47 +11:00
Christophe Leroy
ed815bd3fe powerpc/40x: Introduce _PAGE_READ and remove _PAGE_USER
_PAGE_USER is used to select the zone. Today zone 0 is kernel
and zone 1 is user.

To implement _PAGE_NONE, _PAGE_USER is cleared, leading to no access
for user but kernel still has access to the page so it's possible for
a user application to write in that page by using a kernel function
as trampoline.

What is really wanted is to have user rights on pages below TASK_SIZE
and no user rights on pages above TASK_SIZE. Use zones for that.
There are 16 zones so lets use the 4 upper address bits to set the
zone and declare zone rights based on TASK_SIZE.

Then drop _PAGE_USER and reuse it as _PAGE_READ that will be checked
in Data TLB miss handler. That will properly handle PAGE_NONE for
both kernel and user.

In addition, it partially implements execute-only right. The
implementation won't be complete because once a TLB has been loaded
via the Instruction TLB miss handler, it will be possible to read
the page. But at least it can't be read unless it is executed first.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/2a13e3ba8a5dec43143cc1f9a91ec71ea1529f3c.1695659959.git.christophe.leroy@csgroup.eu
2023-10-19 17:12:47 +11:00
Christophe Leroy
93820bfeef powerpc/44x: Introduce _PAGE_READ and remove _PAGE_USER
44x MMU has 6 page protection bits:
- R, W, X for supervisor
- R, W, X for user

It means that it can support X without R.

To do that, _PAGE_READ flag is needed but there is no bit available
for it in PTE. On the other hand the only real use of _PAGE_USER is
to implement PAGE_NONE by clearing _PAGE_USER.

As _PAGE_NONE can also be implemented by clearing _PAGE_READ,
remove _PAGE_USER and add _PAGE_READ. In order to insert bits in
one go during TLB miss, move _PAGE_ACCESSED and put _PAGE_READ
just after _PAGE_DIRTY so that _PAGE_DIRTY is copied into SW and
_PAGE_READ into SR at once.

With that change, 44x now also honors execute-only protection.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/043e17987b260b99b45094138c6cb2e89e63d499.1695659959.git.christophe.leroy@csgroup.eu
2023-10-19 17:12:46 +11:00
Christophe Leroy
48cf93bb16 powerpc/e500: Introduce _PAGE_READ and remove _PAGE_USER
e500 MMU has 6 page protection bits:
- R, W, X for supervisor
- R, W, X for user

It means that it can support X without R.

To do that, _PAGE_READ flag is needed.

With 32 bits PTE there is no bit available for it in PTE. On the
other hand the only real use of _PAGE_USER is to implement PAGE_NONE
by clearing _PAGE_USER. As _PAGE_NONE can also be implemented by
clearing _PAGE_READ, remove _PAGE_USER and add _PAGE_READ. Move
_PAGE_PRESENT into bit 30 so that _PAGE_READ can match SR bit.

With 64 bits PTE _PAGE_USER is already the combination of SR and UR
so all we need to do is to rename it _PAGE_READ.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/0849ab6bf7ae2af23f94b0457fa40d0ea3983fe4.1695659959.git.christophe.leroy@csgroup.eu
2023-10-19 17:12:46 +11:00
Christophe Leroy
8e9bd41e4c powerpc/nohash: Replace pte_user() by pte_read()
pte_user() is now only used in pte_access_permitted() to check
access on vmas. User flag is cleared to make a page unreadable.

So rename it pte_read() and remove pte_user() which isn't used
anymore.

For the time being it checks _PAGE_USER but in the near futur
all plateforms will be converted to _PAGE_READ so lets support
both for now.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/72cbb5be595e9ef884140def73815ed0b0b37010.1695659959.git.christophe.leroy@csgroup.eu
2023-10-19 17:12:46 +11:00
Christophe Leroy
d20506d472 powerpc/nohash: Add _PAGE_WRITE to supplement _PAGE_RW
Several places, _PAGE_RW maps to write permission and don't
always imply read. To make it more clear, do as book3s/64 in
commit c7d54842de ("powerpc/mm: Use _PAGE_READ to indicate
Read access") and use _PAGE_WRITE when more relevant.

For the time being _PAGE_WRITE is equivalent to _PAGE_RW but that
will change when _PAGE_READ gets added in following patches.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/1f79b88db54d030ada776dc9845e0e88345bfc28.1695659959.git.christophe.leroy@csgroup.eu
2023-10-19 17:12:46 +11:00
Christophe Leroy
58f534623c powerpc/64s: Use generic permission masks
book3s64 need specific masks because it needs _PAGE_PRIVILEGED
for PAGE_NONE.

book3s64 already has _PAGE_RW and _PAGE_RWX.
So add _PAGE_NA, _PAGE_RO and _PAGE_ROX and remove specific
permission masks.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/d37a418de52e2c950cad6797e81663b56991366f.1695659959.git.christophe.leroy@csgroup.eu
2023-10-19 17:12:46 +11:00
Christophe Leroy
f9f09b93e8 powerpc/8xx: Use generic permission masks
8xx already has _PAGE_NA and _PAGE_RO. So add _PAGE_ROX, _PAGE_RW and
_PAGE_RWX and remove specific permission masks.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/5d0b5ce43485f697313eee4326ddff97157fb219.1695659959.git.christophe.leroy@csgroup.eu
2023-10-19 17:12:46 +11:00
Christophe Leroy
a5a08dc90f powerpc: Refactor permission masks used for __P/__S table and kernel memory flags
Prepare a common version of the permission masks that will be based
on _PAGE_NA, _PAGE_RO, _PAGE_ROX, _PAGE_RW, _PAGE_RWX that will be
defined in platform specific headers in later patches.

Put them in a new header pgtable-masks.h which will be included by
platforms.

And prepare a common version of flags used for mapping kernel memory
that will be based on _PAGE_RO, _PAGE_ROX, _PAGE_RW, _PAGE_RWX that
will be defined in platform specific headers.

Put them in unless _PAGE_KERNEL_RO is already defined so that platform
specific definitions can be dismantled one by one.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/d31b59efcdb38e675479563307af891aeadf6ec9.1695659959.git.christophe.leroy@csgroup.eu
2023-10-19 17:12:46 +11:00
Christophe Leroy
a785874736 powerpc: Rely on address instead of pte_user()
pte_user() may return 'false' when a user page is PAGE_NONE.

In that case it is still a user page and needs to be handled
as such. So use is_kernel_addr() instead.

And remove "user" text from ptdump as ptdump only dumps
kernel tables.

Note: no change done for book3s/64 which still has it
'priviledge' bit.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/c778dad89fad07727c31717a9c62f45357c29ebc.1695659959.git.christophe.leroy@csgroup.eu
2023-10-19 17:12:46 +11:00
Christophe Leroy
69339071bb powerpc: Remove pte_mkuser() and pte_mkpriviledged()
pte_mkuser() is never used. Remove it.

pte_mkpriviledged() is not used anymore. Remove it too.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/1a1dc18816456c637dc8a9c38d532f7598b60ac4.1695659959.git.christophe.leroy@csgroup.eu
2023-10-19 17:12:46 +11:00
Christophe Leroy
d3c0dfcfc9 powerpc: Implement and use pgprot_nx()
ioremap_page_range() calls pgprot_nx() vmap() and vmap_pfn()
clear execute permission by calling pgprot_nx().

When pgprot_nx() is not defined it falls back to a nop.

Implement it for powerpc then use it in early_ioremap_range().

Then the call to pte_exprotect() can be removed from ioremap_prot().

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/5993a7a097e989af1c97fc4a6c011fefc67dbe6e.1695659959.git.christophe.leroy@csgroup.eu
2023-10-19 17:12:45 +11:00
Christophe Leroy
4c8dd6c987 powerpc/e500: Simplify pte_mkexec()
Commit b6cb20fdc2 ("powerpc/book3e: Fix set_memory_x() and
set_memory_nx()") implemented a more elaborated version of
pte_mkwrite() suitable for both kernel and user pages. That was
needed because set_memory_x() was using pte_mkwrite(). But since
commit a4c182ecf3 ("powerpc/set_memory: Avoid spinlock recursion
in change_page_attr()") pte_mkwrite() is not used anymore by
set_memory_x() so pte_mkwrite() can be simplified as it is only
used for user pages.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/cdc822322fe2ff4b0f5ecfde71d09d950b1c7557.1695659959.git.christophe.leroy@csgroup.eu
2023-10-19 17:12:45 +11:00
Christophe Leroy
799d8836a7 powerpc/nohash: Refactor __ptep_set_access_flags()
nohash/32 version of __ptep_set_access_flags() does the same
as nohash/64 version, the only difference is that nohash/32
version is more complete and uses pte_update().

Make it common and remove the nohash/64 version.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/e296885df46289d3e5f4cb51efeefe593f76ef24.1695659959.git.christophe.leroy@csgroup.eu
2023-10-19 17:12:45 +11:00
Christophe Leroy
2ef9f4bb9c powerpc/nohash: Refactor pte_clear()
pte_clear() are doing the same on nohash/32 and nohash/64,

Keep the static inline version of nohash/64, make it common and
remove the macro version of nohash/32.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/818f7df83d7e9e18f55e274cd3c44f2871ade4dd.1695659959.git.christophe.leroy@csgroup.eu
2023-10-19 17:12:45 +11:00
Christophe Leroy
cc68d77feb powerpc/nohash: Deduplicate ptep_set_wrprotect() and ptep_get_and_clear()
ptep_set_wrprotect() and ptep_get_and_clear are identical for
nohash/32 and nohash/64.

Make them common.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/ffe46edecdabce915e2d1a4b79a3b2ab770f2248.1695659959.git.christophe.leroy@csgroup.eu
2023-10-19 17:12:45 +11:00
Christophe Leroy
8c3d9eb323 powerpc/nohash: Refactor ptep_test_and_clear_young()
Remove ptep_test_and_clear_young() macro, make
__ptep_test_and_clear_young() common to nohash/32 and nohash/64
and change it to become ptep_test_and_clear_young()

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/de90679ae169104fa3c98d0a8828d7ede228fc52.1695659959.git.christophe.leroy@csgroup.eu
2023-10-19 17:12:45 +11:00
Christophe Leroy
3a4288164d powerpc/nohash: Deduplicate pte helpers
Deduplicate following helpers that are identical on
nohash/32 and nohash/64:
  pte_mkwrite_novma()
  pte_mkdirty()
  pte_mkyoung()
  pte_wrprotect()
  pte_mkexec()
  pte_young()

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/485f3cbafaa948143f92692e17ca652dfed68da2.1695659959.git.christophe.leroy@csgroup.eu
2023-10-19 17:12:45 +11:00
Christophe Leroy
27672be775 powerpc/nohash: Deduplicate _PAGE_CHG_MASK
_PAGE_CHG_MASK is identical between nohash/32 and nohash/64,
deduplicate it.

While at it, clean the #ifdef for PTE_RPN_MASK in nohash/32 as
it is already CONFIG_PPC32.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/c1a1893dbba4afb825bfa78b262f0b0e0fc3b490.1695659959.git.christophe.leroy@csgroup.eu
2023-10-19 17:12:45 +11:00
Christophe Leroy
7c929ad0b3 powerpc/nohash: Refactor checking of no-change in pte_update()
On nohash/64, a few callers of pte_update() check if there is
really a change in order to avoid an unnecessary write.

Refactor that inside pte_update().

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/076563e611c2b51036686a8d378bfd5ef1726341.1695659959.git.christophe.leroy@csgroup.eu
2023-10-19 17:12:45 +11:00
Christophe Leroy
42a2722319 powerpc/nohash: Refactor pte_update()
pte_update() is similar.

Take the nohash/32 version which works on nohash/64 and add the debug
call to assert_pte_locked() which is only on nohash/64.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/e01cb630cad42f645915ce7702d23985241b71fc.1695659959.git.christophe.leroy@csgroup.eu
2023-10-19 17:12:45 +11:00
Christophe Leroy
0f4027eab5 powerpc/nohash: Replace #ifdef CONFIG_44x by IS_ENABLED(CONFIG_44x) in pgtable.h
No need of a #ifdef, use IS_ENABLED(CONFIG_44x)

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/7c7d97322a6a05a9842b1e8c4b41265916f542ca.1695659959.git.christophe.leroy@csgroup.eu
2023-10-19 17:12:44 +11:00
Christophe Leroy
4c1a89d983 powerpc/nohash: Move 8xx version of pte_update() into pte-8xx.h
No point in having 8xx special pte_update() in common header,
move it into pte-8xx.h

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/17e209b1a1a43ed219e9e1f2947ec594ed4f9394.1695659959.git.christophe.leroy@csgroup.eu
2023-10-19 17:12:44 +11:00
Christophe Leroy
7835006979 powerpc/nohash: Refactor declaration of {map/unmap}_kernel_page()
map_kernel_page() and unmap_kernel_page() have the same prototypes
on nohash/32 and nohash/64, keep only one declaration.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/7fec5f3288cf0d0eac61b1b3f48c3ea54eb80cad.1695659959.git.christophe.leroy@csgroup.eu
2023-10-19 17:12:44 +11:00
Christophe Leroy
81fbb99970 powerpc/nohash: Remove {pte/pmd}_protnone()
Only book3s/64 selects ARCH_SUPPORTS_NUMA_BALANCING so
CONFIG_NUMA_BALANCING can't be selected on nohash targets.

Remove pte_protnone() and pmd_protnone().

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/a50c1a19828a8eced82cbcc5c61754b667037d21.1695659959.git.christophe.leroy@csgroup.eu
2023-10-19 17:12:44 +11:00
Christophe Leroy
d3e0179672 powerpc: Untangle fixmap.h and pgtable.h and mmu.h
fixmap.h need pgtable.h for [un]map_kernel_page()

pgtable.h need fixmap.h for FIXADDR_TOP.

Untangle the two files by moving FIXADDR_TOP into pgtable.h

Also move VIRT_IMMR_BASE to fixmap.h to avoid fixmap.h in mmu.h

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/5eba12392a018be28ad0a02ed844767b132589e7.1695659959.git.christophe.leroy@csgroup.eu
2023-10-19 17:12:44 +11:00
Christophe Leroy
da9554e0fe powerpc: Refactor update_mmu_cache_range()
On nohash, this function voids except for E500 with hugepages.

On book3s, this function is for hash MMUs only.

Combine those tests and rename E500 update_mmu_cache_range()
as __update_mmu_cache() which gets called by
update_mmu_cache_range().

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/b029842cb6783cbeb43d202e69a90341d65295a4.1695659959.git.christophe.leroy@csgroup.eu
2023-10-19 17:12:44 +11:00
Christophe Leroy
93f81f6eea powerpc: Deduplicate prototypes of ptep_set_access_flags() and phys_mem_access_prot()
Prototypes of ptep_set_access_flags() and phys_mem_access_prot() are identical
for book3s and nohash.

Deduplicate them.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/b846832cce842a2852615b7356937fb9507e436d.1695659959.git.christophe.leroy@csgroup.eu
2023-10-19 17:12:44 +11:00
Christophe Leroy
3b8547ec4d powerpc: Remove pte_ERROR()
pte_ERROR() is used neither in powerpc code nor in common mm code.

Remove it.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/bec9eb973ecc1cba091e5c9201d877a7797f3242.1695659959.git.christophe.leroy@csgroup.eu
2023-10-19 17:12:44 +11:00
Christophe Leroy
cc8ee288f4 powerpc/40x: Remove stale PTE_ATOMIC_UPDATES macro
40x TLB handlers were reworked by commit 2c74e2586b ("powerpc/40x:
Rework 40x PTE access and TLB miss") to not require PTE_ATOMIC_UPDATES
anymore.

Then commit 4e1df545e2 ("powerpc/pgtable: Drop PTE_ATOMIC_UPDATES")
removed all code related to PTE_ATOMIC_UPDATES.

Remove left over PTE_ATOMIC_UPDATES macro.

Fixes: 2c74e2586b ("powerpc/40x: Rework 40x PTE access and TLB miss")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/f061db5857fcd748f84a6707aad01754686ce97e.1695659959.git.christophe.leroy@csgroup.eu
2023-10-19 17:12:44 +11:00
Matthew Wilcox (Oracle)
51a752c28b powerpc: implement arch_xor_unlock_is_negative_byte on 32-bit
Simply remove the ifdef.  The assembly is identical to that in the
non-optimised case of test_and_clear_bits() on PPC32, and it's not clear
to me how the PPC32 optimisation works, nor whether it would work for
arch_xor_unlock_is_negative_byte().  If that optimisation would work,
someone can implement it later, but this is more efficient than the
implementation in filemap.c.

Link: https://lkml.kernel.org/r/20231004165317.1061855-12-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-18 14:34:17 -07:00
Matthew Wilcox (Oracle)
247dbcdbf7 bitops: add xor_unlock_is_negative_byte()
Replace clear_bit_and_unlock_is_negative_byte() with
xor_unlock_is_negative_byte().  We have a few places that like to lock a
folio, set a flag and unlock it again.  Allow for the possibility of
combining the latter two operations for efficiency.  We are guaranteed
that the caller holds the lock, so it is safe to unlock it with the xor. 
The caller must guarantee that nobody else will set the flag without
holding the lock; it is not safe to do this with the PG_dirty flag, for
example.

Link: https://lkml.kernel.org/r/20231004165317.1061855-8-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-18 14:34:16 -07:00
Michael Ellerman
1c7b4bc375 Merge branch fixes into next
Merge our fixes branch to bring in commits that are prerequisities for further
development or would cause conflicts.
2023-10-18 22:26:42 +11:00
Linus Torvalds
f8bf101b3b powerpc fixes for 6.6 #4
- Fix softlockup/crash when using hcall tracing.
 
  - Fix pte_access_permitted() for PAGE_NONE on 8xx.
 
  - Fix inverted pte_young() test in __ptep_test_and_clear_young() on 64-bit BookE.
 
  - Fix unhandled math emulation exception on 85xx.
 
  - Fix kernel crash on syscall return on 476.
 
 Thanks to: Athira Rajeev, Christophe Leroy, Eddie James, Naveen N Rao.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmUrtuwTHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgK76D/9+/c8NDWu8ccY/rrdCiiOCu7oDLwVU
 LhhZ3IRWR7MeJItUiHRH1A+rBfpfWLWGWXRfusijl1qeUFqvum7zhEyEwvpXViJk
 QrDahmftcrKNQ6kqpmIr22ig8I9XS7O2iWtO2eVHeOddvgIp/JbPM+IuN5gFsVqj
 NyNwd4Wl/5dkJ2AamRZDFNqJsCyt5uHTb+fmDWa9kdVKmyJxqIwawNTgrwkAc+/Q
 +o9ZTxMSFYJmwNR2IhIF4mUISducwj7A21NHtTtD/vM8Naabnn+uDZUjBQvD/o6H
 wWP0KyqRQK7i3dnKaIzsPRwR7AF4ZE0E9C/W0DOwqj3UhNXNjAnPE/VGr4YX142a
 qiyGWX5u8oxt6ZEAHBde/f8M9knxCH7gqEDtIDO5rQ9ia9BD/P1o5e0K5fovtsWc
 ZL5QyUs476UJFfHpfxWXxwLfhUdPg7LG81Mkbi8S4/LUX20LcUqh10Vlv0LzTdLE
 GwPYL5YHOePUwNy1SYtmLbaC+zSOhvA+a13Vy22p5uPf86XCwEe3TUUl9Xx+fvd+
 hSNFKpNIUX9fXImtYIq+NEsv5afGSu+QeiEPAhOeCSUZZX92ktzIPpn4oivymEzb
 UY2pJ6lGkyoJoX7SA53wnC928QOggdopnDWSwNqUa79RMtPKqpI4PJA16phCc1/Y
 4Fwx5QOfMvBxcA==
 =uMis
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-6.6-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

 - Fix softlockup/crash when using hcall tracing

 - Fix pte_access_permitted() for PAGE_NONE on 8xx

 - Fix inverted pte_young() test in __ptep_test_and_clear_young()
   on 64-bit BookE

 - Fix unhandled math emulation exception on 85xx

 - Fix kernel crash on syscall return on 476

Thanks to Athira Rajeev, Christophe Leroy, Eddie James, and Naveen N
Rao.

* tag 'powerpc-6.6-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/47x: Fix 47x syscall return crash
  powerpc/85xx: Fix math emulation exception
  powerpc/64e: Fix wrong test in __ptep_test_and_clear_young()
  powerpc/8xx: Fix pte_access_permitted() for PAGE_NONE
  powerpc/pseries: Remove unused r0 in the hcall tracing code
  powerpc/pseries: Fix STK_PARAM access in the hcall tracing code
2023-10-15 08:48:53 -07:00
Thomas Zimmermann
052ddf7b86 fbdev: Replace fb_pgprotect() with pgprot_framebuffer()
Rename the fbdev mmap helper fb_pgprotect() to pgprot_framebuffer().
The helper sets VMA page-access flags for framebuffers in device I/O
memory.

Also clean up the helper's parameters and return value. Instead of
the VMA instance, pass the individial parameters separately: existing
page-access flags, the VMAs start and end addresses and the offset
in the underlying device memory rsp file. Return the new page-access
flags. These changes align pgprot_framebuffer() with other pgprot_()
functions.

v4:
	* fix commit message (Christophe)
v3:
	* rename fb_pgprotect() to pgprot_framebuffer() (Arnd)

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> # m68k
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230922080636.26762-3-tzimmermann@suse.de
2023-10-12 09:20:46 +02:00
Thomas Zimmermann
c205a67d28 fbdev: Avoid file argument in fb_pgprotect()
Only PowerPC's fb_pgprotect() needs the file argument, although
the implementation in either phys_mem_access_prot() or
pci_phys_mem_access_prot() does not use it. Pass NULL to the internal
helper in preparation of further updates. A later patch will remove
the file parameter from fb_pgprotect().

While at it, replace the shift operation with PHYS_PFN().

v5:
	* state function names in commit description (Javier)

Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230922080636.26762-2-tzimmermann@suse.de
2023-10-12 09:20:39 +02:00
Ingo Molnar
fdb8b7a1af Linux 6.6-rc5
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmUjFeceHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGNCAH/RDI8G44DCV9Ps5U
 rl/FMf6iLUxU6fCS3Wwe8vtppLjPP7Y16AH5HKMumoDIqTfh9ZAUVKhZfT+PTgz3
 /oFXcGzZQLTcdbtH7XK2/zk7N/RI25/rDiCDd1uIJVCNii+hsBKS6Ihc4wXadxaR
 0z3lwoEKp2egeaeqmJWMzJLdjRrYhLs33+SEciVYqTiIvlWsM5QBm/sMvES7V57s
 TXrs5/y7yXtDBZ2PgYNCBRLyBazjqB28x07aQoePOAs6nFXl5N/wWPW/4wirWFHT
 s9LYZlmVo+O+RHWj10ASm/2l+ihgn959ZfRj1VekK2AWU1x/VzSPcuCXKvsrUoa+
 xEjL+vM=
 =efE3
 -----END PGP SIGNATURE-----

Merge tag 'v6.6-rc5' into locking/core, to pick up fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2023-10-09 18:09:23 +02:00
Christophe Leroy
5ea0bbaa32 powerpc/64e: Fix wrong test in __ptep_test_and_clear_young()
Commit 45201c8794 ("powerpc/nohash: Remove hash related code from
nohash headers.") replaced:

  if ((pte_val(*ptep) & (_PAGE_ACCESSED | _PAGE_HASHPTE)) == 0)
	return 0;

By:

  if (pte_young(*ptep))
	return 0;

But it should be:

  if (!pte_young(*ptep))
	return 0;

Fix it.

Fixes: 45201c8794 ("powerpc/nohash: Remove hash related code from nohash headers.")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/8bb7f06494e21adada724ede47a4c3d97e879d40.1695659959.git.christophe.leroy@csgroup.eu
2023-10-09 22:37:31 +11:00
Christophe Leroy
5d9cea8a55 powerpc/8xx: Fix pte_access_permitted() for PAGE_NONE
On 8xx, PAGE_NONE is handled by setting _PAGE_NA instead of clearing
_PAGE_USER.

But then pte_user() returns 1 also for PAGE_NONE.

As _PAGE_NA prevent reads, add a specific version of pte_read()
that returns 0 when _PAGE_NA is set instead of always returning 1.

Fixes: 351750331f ("powerpc/mm: Introduce _PAGE_NA")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/57bcfbe578e43123f9ed73e040229b80f1ad56ec.1695659959.git.christophe.leroy@csgroup.eu
2023-10-09 22:37:31 +11:00
Uros Bizjak
5e0eb67974 locking/local, arch: Rewrite local_add_unless() as a static inline function
Rewrite local_add_unless() as a static inline function with boolean
return value, similar to the arch_atomic_add_unless() arch fallbacks.

The function is currently unused.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230731084458.28096-1-ubizjak@gmail.com
2023-10-04 11:38:11 +02:00
Ryan Roberts
935d4f0c6d mm: hugetlb: add huge page size param to set_huge_pte_at()
Patch series "Fix set_huge_pte_at() panic on arm64", v2.

This series fixes a bug in arm64's implementation of set_huge_pte_at(),
which can result in an unprivileged user causing a kernel panic.  The
problem was triggered when running the new uffd poison mm selftest for
HUGETLB memory.  This test (and the uffd poison feature) was merged for
v6.5-rc7.

Ideally, I'd like to get this fix in for v6.6 and I've cc'ed stable
(correctly this time) to get it backported to v6.5, where the issue first
showed up.


Description of Bug
==================

arm64's huge pte implementation supports multiple huge page sizes, some of
which are implemented in the page table with multiple contiguous entries. 
So set_huge_pte_at() needs to work out how big the logical pte is, so that
it can also work out how many physical ptes (or pmds) need to be written. 
It previously did this by grabbing the folio out of the pte and querying
its size.

However, there are cases when the pte being set is actually a swap entry. 
But this also used to work fine, because for huge ptes, we only ever saw
migration entries and hwpoison entries.  And both of these types of swap
entries have a PFN embedded, so the code would grab that and everything
still worked out.

But over time, more calls to set_huge_pte_at() have been added that set
swap entry types that do not embed a PFN.  And this causes the code to go
bang.  The triggering case is for the uffd poison test, commit
99aa77215a ("selftests/mm: add uffd unit test for UFFDIO_POISON"), which
causes a PTE_MARKER_POISONED swap entry to be set, coutesey of commit
8a13897fb0 ("mm: userfaultfd: support UFFDIO_POISON for hugetlbfs") -
added in v6.5-rc7.  Although review shows that there are other call sites
that set PTE_MARKER_UFFD_WP (which also has no PFN), these don't trigger
on arm64 because arm64 doesn't support UFFD WP.

If CONFIG_DEBUG_VM is enabled, we do at least get a BUG(), but otherwise,
it will dereference a bad pointer in page_folio():

    static inline struct folio *hugetlb_swap_entry_to_folio(swp_entry_t entry)
    {
        VM_BUG_ON(!is_migration_entry(entry) && !is_hwpoison_entry(entry));

        return page_folio(pfn_to_page(swp_offset_pfn(entry)));
    }


Fix
===

The simplest fix would have been to revert the dodgy cleanup commit
18f3962953 ("mm: hugetlb: kill set_huge_swap_pte_at()"), but since
things have moved on, this would have required an audit of all the new
set_huge_pte_at() call sites to see if they should be converted to
set_huge_swap_pte_at().  As per the original intent of the change, it
would also leave us open to future bugs when people invariably get it
wrong and call the wrong helper.

So instead, I've added a huge page size parameter to set_huge_pte_at(). 
This means that the arm64 code has the size in all cases.  It's a bigger
change, due to needing to touch the arches that implement the function,
but it is entirely mechanical, so in my view, low risk.

I've compile-tested all touched arches; arm64, parisc, powerpc, riscv,
s390, sparc (and additionally x86_64).  I've additionally booted and run
mm selftests against arm64, where I observe the uffd poison test is fixed,
and there are no other regressions.


This patch (of 2):

In order to fix a bug, arm64 needs to be told the size of the huge page
for which the pte is being set in set_huge_pte_at().  Provide for this by
adding an `unsigned long sz` parameter to the function.  This follows the
same pattern as huge_pte_clear().

This commit makes the required interface modifications to the core mm as
well as all arches that implement this function (arm64, parisc, powerpc,
riscv, s390, sparc).  The actual arm64 bug will be fixed in a separate
commit.

No behavioral changes intended.

Link: https://lkml.kernel.org/r/20230922115804.2043771-1-ryan.roberts@arm.com
Link: https://lkml.kernel.org/r/20230922115804.2043771-2-ryan.roberts@arm.com
Fixes: 8a13897fb0 ("mm: userfaultfd: support UFFDIO_POISON for hugetlbfs")
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>	[powerpc 8xx]
Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com>	[vmalloc change]
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Cc: <stable@vger.kernel.org>	[6.5+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-29 17:20:47 -07:00
Hari Bathini
b098f1c323 powerpc/fadump: make is_kdump_kernel() return false when fadump is active
Currently, is_kdump_kernel() returns true in crash dump capture kernel
for both kdump and fadump crash dump capturing methods, as both these
methods set elfcorehdr_addr. Some restrictions enforced for crash dump
capture kernel, based on is_kdump_kernel(), are specifically meant for
kdump case and not desirable for fadump - eg. IO queues restriction in
device drivers. So, define is_kdump_kernel() to return false when f/w
assisted dump is active.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Acked-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230912082950.856977-2-hbathini@linux.ibm.com
2023-09-18 12:23:27 +10:00
Jordan Niethe
19d31c5f11 KVM: PPC: Add support for nestedv2 guests
A series of hcalls have been added to the PAPR which allow a regular
guest partition to create and manage guest partitions of its own. KVM
already had an interface that allowed this on powernv platforms. This
existing interface will now be called "nestedv1". The newly added PAPR
interface will be called "nestedv2".  PHYP will support the nestedv2
interface. At this time the host side of the nestedv2 interface has not
been implemented on powernv but there is no technical reason why it
could not be added.

The nestedv1 interface is still supported.

Add support to KVM to utilize these hcalls to enable running nested
guests as a pseries guest on PHYP.

Overview of the new hcall usage:

- L1 and L0 negotiate capabilities with
  H_GUEST_{G,S}ET_CAPABILITIES()

- L1 requests the L0 create a L2 with
  H_GUEST_CREATE() and receives a handle to use in future hcalls

- L1 requests the L0 create a L2 vCPU with
  H_GUEST_CREATE_VCPU()

- L1 sets up the L2 using H_GUEST_SET and the
  H_GUEST_VCPU_RUN input buffer

- L1 requests the L0 runs the L2 vCPU using H_GUEST_VCPU_RUN()

- L2 returns to L1 with an exit reason and L1 reads the
  H_GUEST_VCPU_RUN output buffer populated by the L0

- L1 handles the exit using H_GET_STATE if necessary

- L1 reruns L2 vCPU with H_GUEST_VCPU_RUN

- L1 frees the L2 in the L0 with H_GUEST_DELETE()

Support for the new API is determined by trying
H_GUEST_GET_CAPABILITIES. On a successful return, use the nestedv2
interface.

Use the vcpu register state setters for tracking modified guest state
elements and copy the thread wide values into the H_GUEST_VCPU_RUN input
buffer immediately before running a L2. The guest wide
elements can not be added to the input buffer so send them with a
separate H_GUEST_SET call if necessary.

Make the vcpu register getter load the corresponding value from the real
host with H_GUEST_GET. To avoid unnecessarily calling H_GUEST_GET, track
which values have already been loaded between H_GUEST_VCPU_RUN calls. If
an element is present in the H_GUEST_VCPU_RUN output buffer it also does
not need to be loaded again.

Tested-by: Sachin Sant <sachinp@linux.ibm.com>
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Signed-off-by: Gautam Menghani <gautam@linux.ibm.com>
Signed-off-by: Kautuk Consul <kconsul@linux.vnet.ibm.com>
Signed-off-by: Amit Machhiwal <amachhiw@linux.vnet.ibm.com>
Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230914030600.16993-11-jniethe5@gmail.com
2023-09-14 22:04:24 +10:00
Jordan Niethe
dfcaacc8f9 KVM: PPC: Book3s HV: Hold LPIDs in an unsigned long
The LPID register is 32 bits long. The host keeps the lpids for each
guest in an unsigned word struct kvm_arch. Currently, LPIDs are already
limited by mmu_lpid_bits and KVM_MAX_NESTED_GUESTS_SHIFT.

The nestedv2 API returns a 64 bit "Guest ID" to be used be the L1 host
for each L2 guest. This value is used as an lpid, e.g. it is the
parameter used by H_RPT_INVALIDATE. To minimize needless special casing
it makes sense to keep this "Guest ID" in struct kvm_arch::lpid.

This means that struct kvm_arch::lpid is too small so prepare for this
and make it an unsigned long. This is not a problem for the KVM-HV and
nestedv1 cases as their lpid values are already limited to valid ranges
so in those contexts the lpid can be used as an unsigned word safely as
needed.

In the PAPR, the H_RPT_INVALIDATE pid/lpid parameter is already
specified as an unsigned long so change pseries_rpt_invalidate() to
match that.  Update the callers of pseries_rpt_invalidate() to also take
an unsigned long if they take an lpid value.

Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230914030600.16993-10-jniethe5@gmail.com
2023-09-14 22:04:24 +10:00
Jordan Niethe
6ccbbc33f0 KVM: PPC: Add helper library for Guest State Buffers
The PAPR "Nestedv2" guest API introduces the concept of a Guest State
Buffer for communication about L2 guests between L1 and L0 hosts.

In the new API, the L0 manages the L2 on behalf of the L1. This means
that if the L1 needs to change L2 state (e.g. GPRs, SPRs, partition
table...), it must request the L0 perform the modification. If the
nested host needs to read L2 state likewise this request must
go through the L0.

The Guest State Buffer is a Type-Length-Value style data format defined
in the PAPR which assigns all relevant partition state a unique
identity. Unlike a typical TLV format the length is redundant as the
length of each identity is fixed but is included for checking
correctness.

A guest state buffer consists of an element count followed by a stream
of elements, where elements are composed of an ID number, data length,
then the data:

  Header:

   <---4 bytes--->
  +----------------+-----
  | Element Count  | Elements...
  +----------------+-----

  Element:

   <----2 bytes---> <-2 bytes-> <-Length bytes->
  +----------------+-----------+----------------+
  | Guest State ID |  Length   |      Data      |
  +----------------+-----------+----------------+

Guest State IDs have other attributes defined in the PAPR such as
whether they are per thread or per guest, or read-only.

Introduce a library for using guest state buffers. This includes support
for actions such as creating buffers, adding elements to buffers,
reading the value of elements and parsing buffers. This will be used
later by the nestedv2 guest support.

Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230914030600.16993-9-jniethe5@gmail.com
2023-09-14 22:04:24 +10:00
Jordan Niethe
c8ae9b3c6e KVM: PPC: Use accessors for VCORE registers
Introduce accessor generator macros for VCORE registers. Use the accessor
functions to replace direct accesses to this registers.

This will be important later for Nested APIv2 support which requires
additional functionality for accessing and modifying VCPU state.

Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230914030600.16993-6-jniethe5@gmail.com
2023-09-14 22:04:24 +10:00
Jordan Niethe
7028ac8d17 KVM: PPC: Use accessors for VCPU registers
Introduce accessor generator macros for VCPU registers. Use the accessor
functions to replace direct accesses to this registers.

This will be important later for Nested APIv2 support which requires
additional functionality for accessing and modifying VCPU state.

Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230914030600.16993-5-jniethe5@gmail.com
2023-09-14 22:04:24 +10:00
Jordan Niethe
2a64bc6731 KVM: PPC: Rename accessor generator macros
More "wrapper" style accessor generating macros will be introduced for
the nestedv2 guest support. Rename the existing macros with more
descriptive names now so there is a consistent naming convention.

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230914030600.16993-4-jniethe5@gmail.com
2023-09-14 22:04:23 +10:00
Jordan Niethe
52425a3b3c KVM: PPC: Introduce FPR/VR accessor functions
Introduce accessor functions for floating point and vector registers
like the ones that exist for GPRs. Use these to replace the existing FPR
and VR accessor macros.

This will be important later for Nested APIv2 support which requires
additional functionality for accessing and modifying VCPU state.

Signed-off-by: Gautam Menghani <gautam@linux.ibm.com>
Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230914030600.16993-3-jniethe5@gmail.com
2023-09-14 22:04:23 +10:00
Linus Torvalds
4b3d6e0c6c ata changes for 6.6
- Fix OF include file for ata platform drivers (Rob).
 
  - Simplify various ahci, sata and pata platform drivers using the
    function devm_platform_ioremap_resource() (Yangtao).
 
  - Cleanup libata time related argument types (e.g. timeouts values)
    (Sergey).
 
  - Cleanup libata code around error handling as all ata drivers now
    define a error_handler operation (Hannes and Niklas).
 
  - Remove functions intended for libsas that are in fact unused
    (Niklas).
 
  - Change the remove device callback of platform drivers to a null
    function (Uwe).
 
  - Simplify the pata_imx driver using devm_clk_get_enabled() (Li).
 
  - Remove old and uinused remnants of the ide code in arm, parisc,
    powerpc, sparc and m68k architectures and associated drivers
    (pata_buddha, pata_falcon and pata_gayle) (Geert).
 
  - Add missing MODULE_DESCRIPTION() in the sata_gemini and pata_ftide010
    drivers (me).
 
  - Several fixes for the pata_ep93xx and pata_falcon drivers (Nikita,
    Michael).
 
  - Add Elkhart Lake AHCI controller support to the ahci driver (Werner).
 
  - Disable NCQ trim on Micron 1100 drives (Pawel).
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQSRPv8tYSvhwAzJdzjdoc3SxdoYdgUCZPbUdQAKCRDdoc3SxdoY
 du6NAP9G10EhjgXDNIM8f1AN8gHrZk2RGSxSuqIKKROxl2i+ugD+Mt8/UjRpf0JO
 nVdKTfgeXaLfCXHN/fmkIYNmk+I7Twg=
 =9Ety
 -----END PGP SIGNATURE-----

Merge tag 'ata-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata

Pull ata updates from Damien Le Moal:

 - Fix OF include file for ata platform drivers (Rob)

 - Simplify various ahci, sata and pata platform drivers using the
   function devm_platform_ioremap_resource() (Yangtao)

 - Cleanup libata time related argument types (e.g. timeouts values)
   (Sergey)

 - Cleanup libata code around error handling as all ata drivers now
   define a error_handler operation (Hannes and Niklas)

 - Remove functions intended for libsas that are in fact unused (Niklas)

 - Change the remove device callback of platform drivers to a null
   function (Uwe)

 - Simplify the pata_imx driver using devm_clk_get_enabled() (Li)

 - Remove old and uinused remnants of the ide code in arm, parisc,
   powerpc, sparc and m68k architectures and associated drivers
   (pata_buddha, pata_falcon and pata_gayle) (Geert)

 - Add missing MODULE_DESCRIPTION() in the sata_gemini and pata_ftide010
   drivers (me)

 - Several fixes for the pata_ep93xx and pata_falcon drivers (Nikita,
   Michael)

 - Add Elkhart Lake AHCI controller support to the ahci driver (Werner)

 - Disable NCQ trim on Micron 1100 drives (Pawel)

* tag 'ata-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata: (60 commits)
  ata: libata-core: Disable NCQ_TRIM on Micron 1100 drives
  ata: ahci: Add Elkhart Lake AHCI controller
  ata: pata_falcon: add data_swab option to byte-swap disk data
  ata: pata_falcon: fix IO base selection for Q40
  ata: pata_ep93xx: use soc_device_match for UDMA modes
  ata: pata_ep93xx: fix error return code in probe
  ata: sata_gemini: Add missing MODULE_DESCRIPTION
  ata: pata_ftide010: Add missing MODULE_DESCRIPTION
  m68k: Remove <asm/ide.h>
  ata: pata_gayle: Remove #include <asm/ide.h>
  ata: pata_falcon: Remove #include <asm/ide.h>
  ata: pata_buddha: Remove #include <asm/ide.h>
  asm-generic: Remove ide_iops.h
  sparc: Remove <asm/ide.h>
  powerpc: Remove <asm/ide.h>
  parisc: Remove <asm/ide.h>
  ARM: Remove <asm/ide.h>
  ata: pata_imx: Use helper function devm_clk_get_enabled()
  ata: sata_rcar: Convert to platform remove callback returning void
  ata: sata_mv: Convert to platform remove callback returning void
  ...
2023-09-05 12:37:28 -07:00
Linus Torvalds
8e1e49550d TTY/Serial driver changes for 6.6-rc1
Here is the big set of tty and serial driver changes for 6.6-rc1.
 
 Lots of cleanups in here this cycle, and some driver updates.  Short
 summary is:
   - Jiri's continued work to make the tty code and apis be a bit more
     sane with regards to modern kernel coding style and types
   - cpm_uart driver updates
   - n_gsm updates and fixes
   - meson driver updates
   - sc16is7xx driver updates
   - 8250 driver updates for different hardware types
   - qcom-geni driver fixes
   - tegra serial driver change
   - stm32 driver updates
   - synclink_gt driver cleanups
   - tty structure size reduction
 
 All of these have been in linux-next this week with no reported issues.
 The last bit of cleanups from Jiri and the tty structure size reduction
 came in last week, a bit late but as they were just style changes and
 size reductions, I figured they should get into this merge cycle so that
 others can work on top of them with no merge conflicts.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZPH+jA8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ykKyACgldt6QeenTN+6dXIHS/eQHtTKZwMAn3arSeXI
 QrUUnLFjOWyoX87tbMBQ
 =LVw0
 -----END PGP SIGNATURE-----

Merge tag 'tty-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty/serial driver updates from Greg KH:
 "Here is the big set of tty and serial driver changes for 6.6-rc1.

  Lots of cleanups in here this cycle, and some driver updates. Short
  summary is:

   - Jiri's continued work to make the tty code and apis be a bit more
     sane with regards to modern kernel coding style and types

   - cpm_uart driver updates

   - n_gsm updates and fixes

   - meson driver updates

   - sc16is7xx driver updates

   - 8250 driver updates for different hardware types

   - qcom-geni driver fixes

   - tegra serial driver change

   - stm32 driver updates

   - synclink_gt driver cleanups

   - tty structure size reduction

  All of these have been in linux-next this week with no reported
  issues. The last bit of cleanups from Jiri and the tty structure size
  reduction came in last week, a bit late but as they were just style
  changes and size reductions, I figured they should get into this merge
  cycle so that others can work on top of them with no merge conflicts"

* tag 'tty-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (199 commits)
  tty: shrink the size of struct tty_struct by 40 bytes
  tty: n_tty: deduplicate copy code in n_tty_receive_buf_real_raw()
  tty: n_tty: extract ECHO_OP processing to a separate function
  tty: n_tty: unify counts to size_t
  tty: n_tty: use u8 for chars and flags
  tty: n_tty: simplify chars_in_buffer()
  tty: n_tty: remove unsigned char casts from character constants
  tty: n_tty: move newline handling to a separate function
  tty: n_tty: move canon handling to a separate function
  tty: n_tty: use MASK() for masking out size bits
  tty: n_tty: make n_tty_data::num_overrun unsigned
  tty: n_tty: use time_is_before_jiffies() in n_tty_receive_overrun()
  tty: n_tty: use 'num' for writes' counts
  tty: n_tty: use output character directly
  tty: n_tty: make flow of n_tty_receive_buf_common() a bool
  Revert "tty: serial: meson: Add a earlycon for the T7 SoC"
  Documentation: devices.txt: Fix minors for ttyCPM*
  Documentation: devices.txt: Remove ttySIOC*
  Documentation: devices.txt: Remove ttyIOC*
  serial: 8250_bcm7271: improve bcm7271 8250 port
  ...
2023-09-01 09:38:00 -07:00
Ingo Molnar
f441ff73f1 powerpc: Fix pud_mkwrite() definition after pte_mkwrite() API changes
Fix up missed semantic mis-merge between commits

  161e393c0f ("mm: Make pte_mkwrite() take a VMA")
  27af67f356 ("powerpc/book3s64/mm: enable transparent pud hugepage")

where the newly introduced powerpc use of 'pte_mkwrite()' needs to use
the 'novma()' versions as per commit 2f0584f3f4 ("mm: Rename arch
pte_mkwrite()'s to pte_mkwrite_novma()").

Fixes: df57721f9a ("Merge tag 'x86_shstk_for_6.6-rc1' of [...]")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-08-31 14:49:38 -07:00
Linus Torvalds
4ad0a4c234 powerpc updates for 6.6
- Add HOTPLUG_SMT support (/sys/devices/system/cpu/smt) and honour the
    configured SMT state when hotplugging CPUs into the system.
 
  - Combine final TLB flush and lazy TLB mm shootdown IPIs when using the Radix
    MMU to avoid a broadcast TLBIE flush on exit.
 
  - Drop the exclusion between ptrace/perf watchpoints, and drop the now unused
    associated arch hooks.
 
  - Add support for the "nohlt" command line option to disable CPU idle.
 
  - Add support for -fpatchable-function-entry for ftrace, with GCC >= 13.1.
 
  - Rework memory block size determination, and support 256MB size on systems
    with GPUs that have hotpluggable memory.
 
  - Various other small features and fixes.
 
 Thanks to: Andrew Donnellan, Aneesh Kumar K.V, Arnd Bergmann, Athira Rajeev,
 Benjamin Gray, Christophe Leroy, Frederic Barrat, Gautam Menghani, Geoff Levand,
 Hari Bathini, Immad Mir, Jialin Zhang, Joel Stanley, Jordan Niethe, Justin
 Stitt, Kajol Jain, Kees Cook, Krzysztof Kozlowski, Laurent Dufour, Liang He,
 Linus Walleij, Mahesh Salgaonkar, Masahiro Yamada, Michal Suchanek, Nageswara
 R Sastry, Nathan Chancellor, Nathan Lynch, Naveen N Rao, Nicholas Piggin, Nick
 Desaulniers, Omar Sandoval, Randy Dunlap, Reza Arbab, Rob Herring, Russell
 Currey, Sourabh Jain, Thomas Gleixner, Trevor Woerner, Uwe Kleine-König, Vaibhav
 Jain, Xiongfeng Wang, Yuan Tan, Zhang Rui, Zheng Zengkai.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmTwgbwTHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgFmpD/432vipeoqvkAYsyK0xi/Y3GcY0wcyd
 WJApLXXadEbtKQrgXQ6sowWqalg5thYnQCRarg/tXKK/po3KfgwkPjGDpOL+cIdr
 12QVN2XJm9VmJ1wYJxzk+yXx4F43AdmMdr94qWAGufbTHezwb4UpzVR1NxtFrOE/
 X5TNsC2+2mdZY/ZaNHS5vsTIFv3EhQfqgjZPlIAdLn6CGc8xWT514Q/uHA8+ytM/
 HL7Hqs33DoPSvgTa5TT/2E0d0k5nO3P5KObzAjpYlireTPaBi51mpKGewcrtm0o2
 v3cBlbfx3C7pe9ZhKBK9BH8cjynfiqsVZ9/lCw/7eBNdm9tHuzG0jeS7Db9tCZXS
 fM7G2R7SoIusPTqxlBmkU5DpYslwrHiVgCyy3ijxkoA/fakVwh/GgTcMsRt73IY6
 n6DsUvWwuYHCIeIiHmHQJqCqCRtV+aMzU3AbbBHOjtdIanhlW16M686dEsgCirh7
 akRVRD5VqKaqXs34PpkRL89Xv3wZRjl6XZ3hZFfCjSYXfpXDXhgSToIskpHYhKL8
 gpY7WtG9YQP05Xz5HRCx6EluaZVeKe0lZi6fezX7Mi9AygJQO8FfXqP1mHBlEq40
 ThWtvL9D89RV6lADqqFN20XepgvKNOyAXcE4szvsnIZYUSPmZQZSPxx+DHtROaLP
 jX3ifxtxJp92pQ==
 =5g7K
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-6.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:

 - Add HOTPLUG_SMT support (/sys/devices/system/cpu/smt) and honour the
   configured SMT state when hotplugging CPUs into the system

 - Combine final TLB flush and lazy TLB mm shootdown IPIs when using the
   Radix MMU to avoid a broadcast TLBIE flush on exit

 - Drop the exclusion between ptrace/perf watchpoints, and drop the now
   unused associated arch hooks

 - Add support for the "nohlt" command line option to disable CPU idle

 - Add support for -fpatchable-function-entry for ftrace, with GCC >=
   13.1

 - Rework memory block size determination, and support 256MB size on
   systems with GPUs that have hotpluggable memory

 - Various other small features and fixes

Thanks to Andrew Donnellan, Aneesh Kumar K.V, Arnd Bergmann, Athira
Rajeev, Benjamin Gray, Christophe Leroy, Frederic Barrat, Gautam
Menghani, Geoff Levand, Hari Bathini, Immad Mir, Jialin Zhang, Joel
Stanley, Jordan Niethe, Justin Stitt, Kajol Jain, Kees Cook, Krzysztof
Kozlowski, Laurent Dufour, Liang He, Linus Walleij, Mahesh Salgaonkar,
Masahiro Yamada, Michal Suchanek, Nageswara R Sastry, Nathan Chancellor,
Nathan Lynch, Naveen N Rao, Nicholas Piggin, Nick Desaulniers, Omar
Sandoval, Randy Dunlap, Reza Arbab, Rob Herring, Russell Currey, Sourabh
Jain, Thomas Gleixner, Trevor Woerner, Uwe Kleine-König, Vaibhav Jain,
Xiongfeng Wang, Yuan Tan, Zhang Rui, and Zheng Zengkai.

* tag 'powerpc-6.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (135 commits)
  macintosh/ams: linux/platform_device.h is needed
  powerpc/xmon: Reapply "Relax frame size for clang"
  powerpc/mm/book3s64: Use 256M as the upper limit with coherent device memory attached
  powerpc/mm/book3s64: Fix build error with SPARSEMEM disabled
  powerpc/iommu: Fix notifiers being shared by PCI and VIO buses
  powerpc/mpc5xxx: Add missing fwnode_handle_put()
  powerpc/config: Disable SLAB_DEBUG_ON in skiroot
  powerpc/pseries: Remove unused hcall tracing instruction
  powerpc/pseries: Fix hcall tracepoints with JUMP_LABEL=n
  powerpc: dts: add missing space before {
  powerpc/eeh: Use pci_dev_id() to simplify the code
  powerpc/64s: Move CPU -mtune options into Kconfig
  powerpc/powermac: Fix unused function warning
  powerpc/pseries: Rework lppaca_shared_proc() to avoid DEBUG_PREEMPT
  powerpc: Don't include lppaca.h in paca.h
  powerpc/pseries: Move hcall_vphn() prototype into vphn.h
  powerpc/pseries: Move VPHN constants into vphn.h
  cxl: Drop unused detach_spa()
  powerpc: Drop zalloc_maybe_bootmem()
  powerpc/powernv: Use struct opal_prd_msg in more places
  ...
2023-08-31 12:43:10 -07:00
Linus Torvalds
df57721f9a Add x86 shadow stack support
Convert IBT selftest to asm to fix objtool warning
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEV76QKkVc4xCGURexaDWVMHDJkrAFAmTv1QQACgkQaDWVMHDJ
 krAUwhAAn6TOwHJK8BSkHeiQhON1nrlP3c5cv0AyZ2NP8RYDrZrSZvhpYBJ6wgKC
 Cx5CGq5nn9twYsYS3KsktLKDfR3lRdsQ7K9qtyFtYiaeaVKo+7gEKl/K+klwai8/
 gninQWHk0zmSCja8Vi77q52WOMkQKapT8+vaON9EVDO8dVEi+CvhAIfPwMafuiwO
 Rk4X86SzoZu9FP79LcCg9XyGC/XbM2OG9eNUTSCKT40qTTKm5y4gix687NvAlaHR
 ko5MTsdl0Wfp6Qk0ohT74LnoA2c1g/FluvZIM33ci/2rFpkf9Hw7ip3lUXqn6CPx
 rKiZ+pVRc0xikVWkraMfIGMJfUd2rhelp8OyoozD7DB7UZw40Q4RW4N5tgq9Fhe9
 MQs3p1v9N8xHdRKl365UcOczUxNAmv4u0nV5gY/4FMC6VjldCl2V9fmqYXyzFS4/
 Ogg4FSd7c2JyGFKPs+5uXyi+RY2qOX4+nzHOoKD7SY616IYqtgKoz5usxETLwZ6s
 VtJOmJL0h//z0A7tBliB0zd+SQ5UQQBDC2XouQH2fNX2isJMn0UDmWJGjaHgK6Hh
 8jVp6LNqf+CEQS387UxckOyj7fu438hDky1Ggaw4YqowEOhQeqLVO4++x+HITrbp
 AupXfbJw9h9cMN63Yc0gVxXQ9IMZ+M7UxLtZ3Cd8/PVztNy/clA=
 =3UUm
 -----END PGP SIGNATURE-----

Merge tag 'x86_shstk_for_6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 shadow stack support from Dave Hansen:
 "This is the long awaited x86 shadow stack support, part of Intel's
  Control-flow Enforcement Technology (CET).

  CET consists of two related security features: shadow stacks and
  indirect branch tracking. This series implements just the shadow stack
  part of this feature, and just for userspace.

  The main use case for shadow stack is providing protection against
  return oriented programming attacks. It works by maintaining a
  secondary (shadow) stack using a special memory type that has
  protections against modification. When executing a CALL instruction,
  the processor pushes the return address to both the normal stack and
  to the special permission shadow stack. Upon RET, the processor pops
  the shadow stack copy and compares it to the normal stack copy.

  For more information, refer to the links below for the earlier
  versions of this patch set"

Link: https://lore.kernel.org/lkml/20220130211838.8382-1-rick.p.edgecombe@intel.com/
Link: https://lore.kernel.org/lkml/20230613001108.3040476-1-rick.p.edgecombe@intel.com/

* tag 'x86_shstk_for_6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (47 commits)
  x86/shstk: Change order of __user in type
  x86/ibt: Convert IBT selftest to asm
  x86/shstk: Don't retry vm_munmap() on -EINTR
  x86/kbuild: Fix Documentation/ reference
  x86/shstk: Move arch detail comment out of core mm
  x86/shstk: Add ARCH_SHSTK_STATUS
  x86/shstk: Add ARCH_SHSTK_UNLOCK
  x86: Add PTRACE interface for shadow stack
  selftests/x86: Add shadow stack test
  x86/cpufeatures: Enable CET CR4 bit for shadow stack
  x86/shstk: Wire in shadow stack interface
  x86: Expose thread features in /proc/$PID/status
  x86/shstk: Support WRSS for userspace
  x86/shstk: Introduce map_shadow_stack syscall
  x86/shstk: Check that signal frame is shadow stack mem
  x86/shstk: Check that SSP is aligned on sigreturn
  x86/shstk: Handle signals for shadow stack
  x86/shstk: Introduce routines modifying shstk
  x86/shstk: Handle thread shadow stack
  x86/shstk: Add user-mode shadow stack support
  ...
2023-08-31 12:20:12 -07:00
Linus Torvalds
d68b4b6f30 - An extensive rework of kexec and crash Kconfig from Eric DeVolder
("refactor Kconfig to consolidate KEXEC and CRASH options").
 
 - kernel.h slimming work from Andy Shevchenko ("kernel.h: Split out a
   couple of macros to args.h").
 
 - gdb feature work from Kuan-Ying Lee ("Add GDB memory helper
   commands").
 
 - vsprintf inclusion rationalization from Andy Shevchenko
   ("lib/vsprintf: Rework header inclusions").
 
 - Switch the handling of kdump from a udev scheme to in-kernel handling,
   by Eric DeVolder ("crash: Kernel handling of CPU and memory hot
   un/plug").
 
 - Many singleton patches to various parts of the tree
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZO2GpAAKCRDdBJ7gKXxA
 juW3AQD1moHzlSN6x9I3tjm5TWWNYFoFL8af7wXDJspp/DWH/AD/TO0XlWWhhbYy
 QHy7lL0Syha38kKLMXTM+bN6YQHi9AU=
 =WJQa
 -----END PGP SIGNATURE-----

Merge tag 'mm-nonmm-stable-2023-08-28-22-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull non-MM updates from Andrew Morton:

 - An extensive rework of kexec and crash Kconfig from Eric DeVolder
   ("refactor Kconfig to consolidate KEXEC and CRASH options")

 - kernel.h slimming work from Andy Shevchenko ("kernel.h: Split out a
   couple of macros to args.h")

 - gdb feature work from Kuan-Ying Lee ("Add GDB memory helper
   commands")

 - vsprintf inclusion rationalization from Andy Shevchenko
   ("lib/vsprintf: Rework header inclusions")

 - Switch the handling of kdump from a udev scheme to in-kernel
   handling, by Eric DeVolder ("crash: Kernel handling of CPU and memory
   hot un/plug")

 - Many singleton patches to various parts of the tree

* tag 'mm-nonmm-stable-2023-08-28-22-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (81 commits)
  document while_each_thread(), change first_tid() to use for_each_thread()
  drivers/char/mem.c: shrink character device's devlist[] array
  x86/crash: optimize CPU changes
  crash: change crash_prepare_elf64_headers() to for_each_possible_cpu()
  crash: hotplug support for kexec_load()
  x86/crash: add x86 crash hotplug support
  crash: memory and CPU hotplug sysfs attributes
  kexec: exclude elfcorehdr from the segment digest
  crash: add generic infrastructure for crash hotplug support
  crash: move a few code bits to setup support of crash hotplug
  kstrtox: consistently use _tolower()
  kill do_each_thread()
  nilfs2: fix WARNING in mark_buffer_dirty due to discarded buffer reuse
  scripts/bloat-o-meter: count weak symbol sizes
  treewide: drop CONFIG_EMBEDDED
  lockdep: fix static memory detection even more
  lib/vsprintf: declare no_hash_pointers in sprintf.h
  lib/vsprintf: split out sprintf() and friends
  kernel/fork: stop playing lockless games for exe_file replacement
  adfs: delete unused "union adfs_dirtail" definition
  ...
2023-08-29 14:53:51 -07:00
Linus Torvalds
b96a3e9142 - Some swap cleanups from Ma Wupeng ("fix WARN_ON in add_to_avail_list")
- Peter Xu has a series (mm/gup: Unify hugetlb, speed up thp") which
   reduces the special-case code for handling hugetlb pages in GUP.  It
   also speeds up GUP handling of transparent hugepages.
 
 - Peng Zhang provides some maple tree speedups ("Optimize the fast path
   of mas_store()").
 
 - Sergey Senozhatsky has improved te performance of zsmalloc during
   compaction (zsmalloc: small compaction improvements").
 
 - Domenico Cerasuolo has developed additional selftest code for zswap
   ("selftests: cgroup: add zswap test program").
 
 - xu xin has doe some work on KSM's handling of zero pages.  These
   changes are mainly to enable the user to better understand the
   effectiveness of KSM's treatment of zero pages ("ksm: support tracking
   KSM-placed zero-pages").
 
 - Jeff Xu has fixes the behaviour of memfd's
   MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED sysctl ("mm/memfd: fix sysctl
   MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED").
 
 - David Howells has fixed an fscache optimization ("mm, netfs, fscache:
   Stop read optimisation when folio removed from pagecache").
 
 - Axel Rasmussen has given userfaultfd the ability to simulate memory
   poisoning ("add UFFDIO_POISON to simulate memory poisoning with UFFD").
 
 - Miaohe Lin has contributed some routine maintenance work on the
   memory-failure code ("mm: memory-failure: remove unneeded PageHuge()
   check").
 
 - Peng Zhang has contributed some maintenance work on the maple tree
   code ("Improve the validation for maple tree and some cleanup").
 
 - Hugh Dickins has optimized the collapsing of shmem or file pages into
   THPs ("mm: free retracted page table by RCU").
 
 - Jiaqi Yan has a patch series which permits us to use the healthy
   subpages within a hardware poisoned huge page for general purposes
   ("Improve hugetlbfs read on HWPOISON hugepages").
 
 - Kemeng Shi has done some maintenance work on the pagetable-check code
   ("Remove unused parameters in page_table_check").
 
 - More folioification work from Matthew Wilcox ("More filesystem folio
   conversions for 6.6"), ("Followup folio conversions for zswap").  And
   from ZhangPeng ("Convert several functions in page_io.c to use a
   folio").
 
 - page_ext cleanups from Kemeng Shi ("minor cleanups for page_ext").
 
 - Baoquan He has converted some architectures to use the GENERIC_IOREMAP
   ioremap()/iounmap() code ("mm: ioremap: Convert architectures to take
   GENERIC_IOREMAP way").
 
 - Anshuman Khandual has optimized arm64 tlb shootdown ("arm64: support
   batched/deferred tlb shootdown during page reclamation/migration").
 
 - Better maple tree lockdep checking from Liam Howlett ("More strict
   maple tree lockdep").  Liam also developed some efficiency improvements
   ("Reduce preallocations for maple tree").
 
 - Cleanup and optimization to the secondary IOMMU TLB invalidation, from
   Alistair Popple ("Invalidate secondary IOMMU TLB on permission
   upgrade").
 
 - Ryan Roberts fixes some arm64 MM selftest issues ("selftests/mm fixes
   for arm64").
 
 - Kemeng Shi provides some maintenance work on the compaction code ("Two
   minor cleanups for compaction").
 
 - Some reduction in mmap_lock pressure from Matthew Wilcox ("Handle most
   file-backed faults under the VMA lock").
 
 - Aneesh Kumar contributes code to use the vmemmap optimization for DAX
   on ppc64, under some circumstances ("Add support for DAX vmemmap
   optimization for ppc64").
 
 - page-ext cleanups from Kemeng Shi ("add page_ext_data to get client
   data in page_ext"), ("minor cleanups to page_ext header").
 
 - Some zswap cleanups from Johannes Weiner ("mm: zswap: three
   cleanups").
 
 - kmsan cleanups from ZhangPeng ("minor cleanups for kmsan").
 
 - VMA handling cleanups from Kefeng Wang ("mm: convert to
   vma_is_initial_heap/stack()").
 
 - DAMON feature work from SeongJae Park ("mm/damon/sysfs-schemes:
   implement DAMOS tried total bytes file"), ("Extend DAMOS filters for
   address ranges and DAMON monitoring targets").
 
 - Compaction work from Kemeng Shi ("Fixes and cleanups to compaction").
 
 - Liam Howlett has improved the maple tree node replacement code
   ("maple_tree: Change replacement strategy").
 
 - ZhangPeng has a general code cleanup - use the K() macro more widely
   ("cleanup with helper macro K()").
 
 - Aneesh Kumar brings memmap-on-memory to ppc64 ("Add support for memmap
   on memory feature on ppc64").
 
 - pagealloc cleanups from Kemeng Shi ("Two minor cleanups for pcp list
   in page_alloc"), ("Two minor cleanups for get pageblock migratetype").
 
 - Vishal Moola introduces a memory descriptor for page table tracking,
   "struct ptdesc" ("Split ptdesc from struct page").
 
 - memfd selftest maintenance work from Aleksa Sarai ("memfd: cleanups
   for vm.memfd_noexec").
 
 - MM include file rationalization from Hugh Dickins ("arch: include
   asm/cacheflush.h in asm/hugetlb.h").
 
 - THP debug output fixes from Hugh Dickins ("mm,thp: fix sloppy text
   output").
 
 - kmemleak improvements from Xiaolei Wang ("mm/kmemleak: use
   object_cache instead of kmemleak_initialized").
 
 - More folio-related cleanups from Matthew Wilcox ("Remove _folio_dtor
   and _folio_order").
 
 - A VMA locking scalability improvement from Suren Baghdasaryan
   ("Per-VMA lock support for swap and userfaults").
 
 - pagetable handling cleanups from Matthew Wilcox ("New page table range
   API").
 
 - A batch of swap/thp cleanups from David Hildenbrand ("mm/swap: stop
   using page->private on tail pages for THP_SWAP + cleanups").
 
 - Cleanups and speedups to the hugetlb fault handling from Matthew
   Wilcox ("Change calling convention for ->huge_fault").
 
 - Matthew Wilcox has also done some maintenance work on the MM subsystem
   documentation ("Improve mm documentation").
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZO1JUQAKCRDdBJ7gKXxA
 jrMwAP47r/fS8vAVT3zp/7fXmxaJYTK27CTAM881Gw1SDhFM/wEAv8o84mDenCg6
 Nfio7afS1ncD+hPYT8947UnLxTgn+ww=
 =Afws
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2023-08-28-18-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - Some swap cleanups from Ma Wupeng ("fix WARN_ON in
   add_to_avail_list")

 - Peter Xu has a series (mm/gup: Unify hugetlb, speed up thp") which
   reduces the special-case code for handling hugetlb pages in GUP. It
   also speeds up GUP handling of transparent hugepages.

 - Peng Zhang provides some maple tree speedups ("Optimize the fast path
   of mas_store()").

 - Sergey Senozhatsky has improved te performance of zsmalloc during
   compaction (zsmalloc: small compaction improvements").

 - Domenico Cerasuolo has developed additional selftest code for zswap
   ("selftests: cgroup: add zswap test program").

 - xu xin has doe some work on KSM's handling of zero pages. These
   changes are mainly to enable the user to better understand the
   effectiveness of KSM's treatment of zero pages ("ksm: support
   tracking KSM-placed zero-pages").

 - Jeff Xu has fixes the behaviour of memfd's
   MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED sysctl ("mm/memfd: fix sysctl
   MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED").

 - David Howells has fixed an fscache optimization ("mm, netfs, fscache:
   Stop read optimisation when folio removed from pagecache").

 - Axel Rasmussen has given userfaultfd the ability to simulate memory
   poisoning ("add UFFDIO_POISON to simulate memory poisoning with
   UFFD").

 - Miaohe Lin has contributed some routine maintenance work on the
   memory-failure code ("mm: memory-failure: remove unneeded PageHuge()
   check").

 - Peng Zhang has contributed some maintenance work on the maple tree
   code ("Improve the validation for maple tree and some cleanup").

 - Hugh Dickins has optimized the collapsing of shmem or file pages into
   THPs ("mm: free retracted page table by RCU").

 - Jiaqi Yan has a patch series which permits us to use the healthy
   subpages within a hardware poisoned huge page for general purposes
   ("Improve hugetlbfs read on HWPOISON hugepages").

 - Kemeng Shi has done some maintenance work on the pagetable-check code
   ("Remove unused parameters in page_table_check").

 - More folioification work from Matthew Wilcox ("More filesystem folio
   conversions for 6.6"), ("Followup folio conversions for zswap"). And
   from ZhangPeng ("Convert several functions in page_io.c to use a
   folio").

 - page_ext cleanups from Kemeng Shi ("minor cleanups for page_ext").

 - Baoquan He has converted some architectures to use the
   GENERIC_IOREMAP ioremap()/iounmap() code ("mm: ioremap: Convert
   architectures to take GENERIC_IOREMAP way").

 - Anshuman Khandual has optimized arm64 tlb shootdown ("arm64: support
   batched/deferred tlb shootdown during page reclamation/migration").

 - Better maple tree lockdep checking from Liam Howlett ("More strict
   maple tree lockdep"). Liam also developed some efficiency
   improvements ("Reduce preallocations for maple tree").

 - Cleanup and optimization to the secondary IOMMU TLB invalidation,
   from Alistair Popple ("Invalidate secondary IOMMU TLB on permission
   upgrade").

 - Ryan Roberts fixes some arm64 MM selftest issues ("selftests/mm fixes
   for arm64").

 - Kemeng Shi provides some maintenance work on the compaction code
   ("Two minor cleanups for compaction").

 - Some reduction in mmap_lock pressure from Matthew Wilcox ("Handle
   most file-backed faults under the VMA lock").

 - Aneesh Kumar contributes code to use the vmemmap optimization for DAX
   on ppc64, under some circumstances ("Add support for DAX vmemmap
   optimization for ppc64").

 - page-ext cleanups from Kemeng Shi ("add page_ext_data to get client
   data in page_ext"), ("minor cleanups to page_ext header").

 - Some zswap cleanups from Johannes Weiner ("mm: zswap: three
   cleanups").

 - kmsan cleanups from ZhangPeng ("minor cleanups for kmsan").

 - VMA handling cleanups from Kefeng Wang ("mm: convert to
   vma_is_initial_heap/stack()").

 - DAMON feature work from SeongJae Park ("mm/damon/sysfs-schemes:
   implement DAMOS tried total bytes file"), ("Extend DAMOS filters for
   address ranges and DAMON monitoring targets").

 - Compaction work from Kemeng Shi ("Fixes and cleanups to compaction").

 - Liam Howlett has improved the maple tree node replacement code
   ("maple_tree: Change replacement strategy").

 - ZhangPeng has a general code cleanup - use the K() macro more widely
   ("cleanup with helper macro K()").

 - Aneesh Kumar brings memmap-on-memory to ppc64 ("Add support for
   memmap on memory feature on ppc64").

 - pagealloc cleanups from Kemeng Shi ("Two minor cleanups for pcp list
   in page_alloc"), ("Two minor cleanups for get pageblock
   migratetype").

 - Vishal Moola introduces a memory descriptor for page table tracking,
   "struct ptdesc" ("Split ptdesc from struct page").

 - memfd selftest maintenance work from Aleksa Sarai ("memfd: cleanups
   for vm.memfd_noexec").

 - MM include file rationalization from Hugh Dickins ("arch: include
   asm/cacheflush.h in asm/hugetlb.h").

 - THP debug output fixes from Hugh Dickins ("mm,thp: fix sloppy text
   output").

 - kmemleak improvements from Xiaolei Wang ("mm/kmemleak: use
   object_cache instead of kmemleak_initialized").

 - More folio-related cleanups from Matthew Wilcox ("Remove _folio_dtor
   and _folio_order").

 - A VMA locking scalability improvement from Suren Baghdasaryan
   ("Per-VMA lock support for swap and userfaults").

 - pagetable handling cleanups from Matthew Wilcox ("New page table
   range API").

 - A batch of swap/thp cleanups from David Hildenbrand ("mm/swap: stop
   using page->private on tail pages for THP_SWAP + cleanups").

 - Cleanups and speedups to the hugetlb fault handling from Matthew
   Wilcox ("Change calling convention for ->huge_fault").

 - Matthew Wilcox has also done some maintenance work on the MM
   subsystem documentation ("Improve mm documentation").

* tag 'mm-stable-2023-08-28-18-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (489 commits)
  maple_tree: shrink struct maple_tree
  maple_tree: clean up mas_wr_append()
  secretmem: convert page_is_secretmem() to folio_is_secretmem()
  nios2: fix flush_dcache_page() for usage from irq context
  hugetlb: add documentation for vma_kernel_pagesize()
  mm: add orphaned kernel-doc to the rst files.
  mm: fix clean_record_shared_mapping_range kernel-doc
  mm: fix get_mctgt_type() kernel-doc
  mm: fix kernel-doc warning from tlb_flush_rmaps()
  mm: remove enum page_entry_size
  mm: allow ->huge_fault() to be called without the mmap_lock held
  mm: move PMD_ORDER to pgtable.h
  mm: remove checks for pte_index
  memcg: remove duplication detection for mem_cgroup_uncharge_swap
  mm/huge_memory: work on folio->swap instead of page->private when splitting folio
  mm/swap: inline folio_set_swap_entry() and folio_swap_entry()
  mm/swap: use dedicated entry for swap in folio
  mm/swap: stop using page->private on tail pages for THP_SWAP
  selftests/mm: fix WARNING comparing pointer to 0
  selftests: cgroup: fix test_kmem_memcg_deletion kernel mem check
  ...
2023-08-29 14:25:26 -07:00
Matthew Wilcox (Oracle)
9fee28baa6 powerpc: implement the new page table range API
Add set_ptes(), update_mmu_cache_range() and flush_dcache_folio().  Change
the PG_arch_1 (aka PG_dcache_dirty) flag from being per-page to per-folio.

[willy@infradead.org: re-export flush_dcache_icache_folio()]
  Link: https://lkml.kernel.org/r/ZMx1daYwvD9EM7Cv@casper.infradead.org
Link: https://lkml.kernel.org/r/20230802151406.3735276-22-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24 16:20:23 -07:00
Russell Currey
eac030b22e powerpc/pseries: Rework lppaca_shared_proc() to avoid DEBUG_PREEMPT
lppaca_shared_proc() takes a pointer to the lppaca which is typically
accessed through get_lppaca().  With DEBUG_PREEMPT enabled, this leads
to checking if preemption is enabled, for example:

  BUG: using smp_processor_id() in preemptible [00000000] code: grep/10693
  caller is lparcfg_data+0x408/0x19a0
  CPU: 4 PID: 10693 Comm: grep Not tainted 6.5.0-rc3 #2
  Call Trace:
    dump_stack_lvl+0x154/0x200 (unreliable)
    check_preemption_disabled+0x214/0x220
    lparcfg_data+0x408/0x19a0
    ...

This isn't actually a problem however, as it does not matter which
lppaca is accessed, the shared proc state will be the same.
vcpudispatch_stats_procfs_init() already works around this by disabling
preemption, but the lparcfg code does not, erroring any time
/proc/powerpc/lparcfg is accessed with DEBUG_PREEMPT enabled.

Instead of disabling preemption on the caller side, rework
lppaca_shared_proc() to not take a pointer and instead directly access
the lppaca, bypassing any potential preemption checks.

Fixes: f13c13a005 ("powerpc: Stop using non-architected shared_proc field in lppaca")
Signed-off-by: Russell Currey <ruscur@russell.cc>
[mpe: Rework to avoid needing a definition in paca.h and lppaca.h]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230823055317.751786-4-mpe@ellerman.id.au
2023-08-24 22:33:17 +10:00
Michael Ellerman
1aa0006676 powerpc: Don't include lppaca.h in paca.h
By adding a forward declaration for struct lppaca we can untangle paca.h
and lppaca.h. Also move get_lppaca() into lppaca.h for consistency.

Add includes of lppaca.h to some files that need it.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230823055317.751786-3-mpe@ellerman.id.au
2023-08-24 22:33:16 +10:00
Michael Ellerman
9a6c05fe9a powerpc/pseries: Move hcall_vphn() prototype into vphn.h
Consolidate the two prototypes for hcall_vphn() into vphn.h.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230823055317.751786-2-mpe@ellerman.id.au
2023-08-24 22:33:16 +10:00
Michael Ellerman
c040c7488b powerpc/pseries: Move VPHN constants into vphn.h
These don't have any particularly good reason to belong in lppaca.h,
move them into their own header.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230823055317.751786-1-mpe@ellerman.id.au
2023-08-24 22:33:16 +10:00
Michael Ellerman
fabdb27da7 powerpc: Drop zalloc_maybe_bootmem()
The only callers of zalloc_maybe_bootmem() are PCI setup routines. These
used to be called early during boot before slab setup, and also during
runtime due to hotplug.

But commit 5537fcb319 ("powerpc/pci: Add ppc_md.discover_phbs()")
moved the boot-time calls later, after slab setup, meaning there's no
longer any need for zalloc_maybe_bootmem(), kzalloc() can be used in all
cases.

Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230823055430.752550-1-mpe@ellerman.id.au
2023-08-24 22:33:16 +10:00
Geert Uytterhoeven
d143de46ac powerpc: Remove <asm/ide.h>
As of commit b7fb14d3ac ("ide: remove the legacy ide driver") in
v5.14, there are no more generic users of <asm/ide.h>.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
2023-08-24 07:26:07 +09:00
Christophe Leroy
0e2a34c467 powerpc/64e: Fix circular dependency with CONFIG_SMP disabled
asm/percpu.h includes asm/paca.h which needs struct tlb_core_data
which is defined in mmu-e500.h

asm/percpu.h is included from asm/mmu.h in a #ifdef CONFIG_E500
before the inclusion of mmu-e500.h

To fix that, move the inclusion of asm/percpu.h into mmu-e500.h
after the definition of struct tlb_core_data

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202308220708.nRf5AUAe-lkp@intel.com/
Closes: https://lore.kernel.org/oe-kbuild-all/202308220857.uFq2oAxM-lkp@intel.com/
Closes: https://lore.kernel.org/oe-kbuild-all/202308221055.lw3UzJIL-lkp@intel.com/
Fixes: 3a24ea0df8 ("powerpc/kuap: Use ASM feature fixups instead of static branches")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/5e0f97d5cbcd05238b56b4424ab096468296824d.1692684461.git.christophe.leroy@csgroup.eu
2023-08-23 15:55:21 +10:00
Aneesh Kumar K.V
603fd64dfa powerpc/book3s64/memhotplug: enable memmap on memory for radix
Radix vmemmap mapping can map things correctly at the PMD level or PTE
level based on different device boundary checks.  Hence we skip the
restrictions w.r.t vmemmap size to be multiple of PMD_SIZE.  This also
makes the feature widely useful because to use PMD_SIZE vmemmap area we
require a memory block size of 2GiB

We can also use MHP_RESERVE_PAGES_MEMMAP_ON_MEMORY to that the feature can
work with a memory block size of 256MB.  Using altmap.reserve feature to
align things correctly at pageblock granularity.  We can end up losing
some pages in memory with this.  For ex: with a 256MiB memory block size,
we require 4 pages to map vmemmap pages, In order to align things
correctly we end up adding a reserve of 28 pages.  ie, for every 4096
pages 28 pages get reserved.

Link: https://lkml.kernel.org/r/20230808091501.287660-6-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-21 13:37:49 -07:00
Naveen N Rao
0f71dcfb4a powerpc/ftrace: Add support for -fpatchable-function-entry
GCC v13.1 updated support for -fpatchable-function-entry on ppc64le to
emit nops after the local entry point, rather than before it. This
allows us to use this in the kernel for ftrace purposes. A new script is
added under arch/powerpc/tools/ to help detect if nops are emitted after
the function local entry point, or before the global entry point.

With -fpatchable-function-entry, we no longer have the profiling
instructions generated at function entry, so we only need to validate
the presence of two nops at the ftrace location in ftrace_init_nop(). We
patch the preceding instruction with 'mflr r0' to match the
-mprofile-kernel ABI for subsequent ftrace use.

This changes the profiling instructions used on ppc32. The default -pg
option emits an additional 'stw' instruction after 'mflr r0' and before
the branch to _mcount 'bl _mcount'. This is very similar to the original
-mprofile-kernel implementation on ppc64le, where an additional 'std'
instruction was used to save LR to its save location in the caller's
stackframe. Subsequently, this additional store was removed in later
compiler versions for performance reasons. The same reasons apply for
ppc32 so we only patch in a 'mflr r0'.

Signed-off-by: Naveen N Rao <naveen@kernel.org>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/68586d22981a2c3bb45f27a2b621173d10a7d092.1687166935.git.naveen@kernel.org
2023-08-22 00:09:06 +10:00
Naveen N Rao
cc93b92332 powerpc/ftrace: Add separate ftrace_init_nop() with additional validation
Currently, we validate instructions around the ftrace location every
time we have to enable/disable ftrace. Introduce ftrace_init_nop() to
instead perform all the validation during ftrace initialization. This
allows us to simply patch the necessary instructions during
enabling/disabling ftrace.

Signed-off-by: Naveen N Rao <naveen@kernel.org>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/f373684081e8e98be09b7f44d2d93069768324dc.1687166935.git.naveen@kernel.org
2023-08-22 00:09:06 +10:00
Naveen N Rao
f3993a0330 powerpc/ftrace: Extend ftrace support for large kernels to ppc32
Commit 67361cf807 ("powerpc/ftrace: Handle large kernel configs")
added ftrace support for ppc64 kernel images with a text section larger
than 32MB. The approach itself isn't specific to ppc64, so extend the
same to also work on ppc32.

While at it, reduce the space reserved for the stub from 64 bytes to 32
bytes since the different stub variants are all less than 8
instructions.

To reduce use of #ifdef, a stub implementation is provided for
kernel_toc_address() and -SZ_2G is cast to 'long long' to prevent
errors on ppc32.

Signed-off-by: Naveen N Rao <naveen@kernel.org>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/9fa3258cbb9105cf8a0a8135214d44ffbc75fe84.1687166935.git.naveen@kernel.org
2023-08-22 00:09:05 +10:00
Naveen N Rao
0240605931 powerpc/module: Remove unused .ftrace.tramp section
.ftrace.tramp section is not used for any purpose. This code was added
all the way back in the original commit introducing support for dynamic
ftrace on ppc64 modules. Remove it.

Signed-off-by: Naveen N Rao <naveen@kernel.org>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/9cf6d7f37ba82f7cb6dafecf660f44925c526d8d.1687166935.git.naveen@kernel.org
2023-08-22 00:09:05 +10:00
Naveen N Rao
f6834c8c59 powerpc/ftrace: Fix dropping weak symbols with older toolchains
The minimum level of gcc supported for building the kernel is v5.1.
v5.x releases of gcc emitted a three instruction sequence for
-mprofile-kernel:
	mflr	r0
	std	r0, 16(r1)
	bl	_mcount

It is only with the v6.x releases that gcc started emitting the two
instruction sequence for -mprofile-kernel, omitting the second store
instruction.

With the older three instruction sequence, the actual ftrace location
can be the 5th instruction into a function. Update the allowed offset
for ftrace location from 12 to 16 to accommodate the same.

Cc: stable@vger.kernel.org
Fixes: 7af82ff90a ("powerpc/ftrace: Ignore weak functions")
Signed-off-by: Naveen N Rao <naveen@kernel.org>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/7b265908a9461e38fc756ef9b569703860a80621.1687166935.git.naveen@kernel.org
2023-08-22 00:09:05 +10:00
Greg Kroah-Hartman
642073c306 Merge commit b320441c04 ("Merge tag 'tty-6.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty") into tty-next
We need the serial-core fixes in here as well.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-20 14:29:37 +02:00
Douglas Anderson
8d539b84f1 nmi_backtrace: allow excluding an arbitrary CPU
The APIs that allow backtracing across CPUs have always had a way to
exclude the current CPU.  This convenience means callers didn't need to
find a place to allocate a CPU mask just to handle the common case.

Let's extend the API to take a CPU ID to exclude instead of just a
boolean.  This isn't any more complex for the API to handle and allows the
hardlockup detector to exclude a different CPU (the one it already did a
trace for) without needing to find space for a CPU mask.

Arguably, this new API also encourages safer behavior.  Specifically if
the caller wants to avoid tracing the current CPU (maybe because they
already traced the current CPU) this makes it more obvious to the caller
that they need to make sure that the current CPU ID can't change.

[akpm@linux-foundation.org: fix trigger_allbutcpu_cpu_backtrace() stub]
Link: https://lkml.kernel.org/r/20230804065935.v4.1.Ia35521b91fc781368945161d7b28538f9996c182@changeid
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: kernel test robot <lkp@intel.com>
Cc: Lecopzer Chen <lecopzer.chen@mediatek.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Pingfan Liu <kernelfans@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18 10:19:00 -07:00
Aneesh Kumar K.V
f2b79c0d79 powerpc/book3s64/radix: add support for vmemmap optimization for radix
With 2M PMD-level mapping, we require 32 struct pages and a single vmemmap
page can contain 1024 struct pages (PAGE_SIZE/sizeof(struct page)).  Hence
with 64K page size, we don't use vmemmap deduplication for PMD-level
mapping.

[aneesh.kumar@linux.ibm.com: ppc64: don't include radix headers if CONFIG_PPC_RADIX_MMU=n]
  Link: https://lkml.kernel.org/r/87zg3jw8km.fsf@linux.ibm.com
Link: https://lkml.kernel.org/r/20230724190759.483013-12-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18 10:12:55 -07:00
Aneesh Kumar K.V
368a0590d9 powerpc/book3s64/vmemmap: switch radix to use a different vmemmap handling function
This is in preparation to update radix to implement vmemmap optimization
for devdax.  Below are the rules w.r.t radix vmemmap mapping

1. First try to map things using PMD (2M)
2. With altmap if altmap cross-boundary check returns true, fall back to
   PAGE_SIZE
3. If we can't allocate PMD_SIZE backing memory for vmemmap, fallback to
   PAGE_SIZE

On removing vmemmap mapping, check if every subsection that is using the
vmemmap area is invalid.  If found to be invalid, that implies we can
safely free the vmemmap area.  We don't use the PAGE_UNUSED pattern used
by x86 because with 64K page size, we need to do the above check even at
the PAGE_SIZE granularity.

[aneesh.kumar@linux.ibm.com: fix section mismatch warning]
  Link: https://lkml.kernel.org/r/87h6pqvu5g.fsf@linux.ibm.com
[aneesh.kumar@linux.ibm.com: fix kernel build error]
  Link: https://lkml.kernel.org/r/877cqkwd20.fsf@linux.ibm.com
Link: https://lkml.kernel.org/r/20230724190759.483013-11-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18 10:12:55 -07:00
Aneesh Kumar K.V
27af67f356 powerpc/book3s64/mm: enable transparent pud hugepage
This is enabled only with radix translation and 1G hugepage size.  This
will be used with devdax device memory with a namespace alignment of 1G.

Anon transparent hugepage is not supported even though we do have helpers
checking pud_trans_huge().  We should never find that return true.  The
only expected pte bit combination is _PAGE_PTE | _PAGE_DEVMAP.

Some of the helpers are never expected to get called on hash translation
and hence is marked to call BUG() in such a case.

Link: https://lkml.kernel.org/r/20230724190759.483013-10-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18 10:12:55 -07:00
Alistair Popple
6bbd42e2df mmu_notifiers: call invalidate_range() when invalidating TLBs
The invalidate_range() is going to become an architecture specific mmu
notifier used to keep the TLB of secondary MMUs such as an IOMMU in sync
with the CPU page tables.  Currently it is called from separate code paths
to the main CPU TLB invalidations.  This can lead to a secondary TLB not
getting invalidated when required and makes it hard to reason about when
exactly the secondary TLB is invalidated.

To fix this move the notifier call to the architecture specific TLB
maintenance functions for architectures that have secondary MMUs requiring
explicit software invalidations.

This fixes a SMMU bug on ARM64.  On ARM64 PTE permission upgrades require
a TLB invalidation.  This invalidation is done by the architecture
specific ptep_set_access_flags() which calls flush_tlb_page() if required.
However this doesn't call the notifier resulting in infinite faults being
generated by devices using the SMMU if it has previously cached a
read-only PTE in it's TLB.

Moving the invalidations into the TLB invalidation functions ensures all
invalidations happen at the same time as the CPU invalidation.  The
architecture specific flush_tlb_all() routines do not call the notifier as
none of the IOMMUs require this.

Link: https://lkml.kernel.org/r/0287ae32d91393a582897d6c4db6f7456b1001f2.1690292440.git-series.apopple@nvidia.com
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Suggested-by: Jason Gunthorpe <jgg@ziepe.ca>
Tested-by: SeongJae Park <sj@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Andrew Donnellan <ajd@linux.ibm.com>
Cc: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com>
Cc: Frederic Barrat <fbarrat@linux.ibm.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Nicolin Chen <nicolinc@nvidia.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Will Deacon <will@kernel.org>
Cc: Zhi Wang <zhi.wang.linux@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18 10:12:41 -07:00
Christophe Leroy
8d05554dca powerpc: mm: convert to GENERIC_IOREMAP
By taking GENERIC_IOREMAP method, the generic generic_ioremap_prot(),
generic_iounmap(), and their generic wrapper ioremap_prot(), ioremap()
and iounmap() are all visible and available to arch. Arch needs to
provide wrapper functions to override the generic versions if there's
arch specific handling in its ioremap_prot(), ioremap() or iounmap().
This change will simplify implementation by removing duplicated code
with generic_ioremap_prot() and generic_iounmap(), and has the equivalent
functioality as before.

Here, add wrapper functions ioremap_prot() and iounmap() for powerpc's
special operation when ioremap() and iounmap().

Link: https://lkml.kernel.org/r/20230706154520.11257-18-bhe@redhat.com
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Brian Cain <bcain@quicinc.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: David Laight <David.Laight@ACULAB.COM>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Niklas Schnelle <schnelle@linux.ibm.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18 10:12:35 -07:00
Baoquan He
016fec9101 mm: move is_ioremap_addr() into new header file
Now is_ioremap_addr() is only used in kernel/iomem.c and gonna be used in
mm/ioremap.c.  Move it into its own new header file linux/ioremap.h.

Link: https://lkml.kernel.org/r/20230706154520.11257-17-bhe@redhat.com
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Brian Cain <bcain@quicinc.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Chris Zankel <chris@zankel.net>
Cc: David Laight <David.Laight@ACULAB.COM>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Niklas Schnelle <schnelle@linux.ibm.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18 10:12:35 -07:00
Baoquan He
0b1f77e74b asm-generic/iomap.h: remove ARCH_HAS_IOREMAP_xx macros
Patch series "mm: ioremap: Convert architectures to take GENERIC_IOREMAP
way", v8.

Motivation and implementation:
==============================
Currently, many architecutres have't taken the standard GENERIC_IOREMAP
way to implement ioremap_prot(), iounmap(), and ioremap_xx(), but make
these functions specifically under each arch's folder.  Those cause many
duplicated code of ioremap() and iounmap().

In this patchset, firstly introduce generic_ioremap_prot() and
generic_iounmap() to extract the generic code for GENERIC_IOREMAP.  By
taking GENERIC_IOREMAP method, the generic generic_ioremap_prot(),
generic_iounmap(), and their generic wrapper ioremap_prot(), ioremap() and
iounmap() are all visible and available to arch.  Arch needs to provide
wrapper functions to override the generic version if there's arch specific
handling in its corresponding ioremap_prot(), ioremap() or iounmap(). 
With these changes, duplicated ioremap/iounmap() code uder ARCH-es are
removed, and the equivalent functioality is kept as before.

Background info:
================

1) The converting more architectures to take GENERIC_IOREMAP way is
   suggested by Christoph in below discussion:
   https://lore.kernel.org/all/Yp7h0Jv6vpgt6xdZ@infradead.org/T/#u

2) In the previous v1 to v3, it's basically further action after arm64
   has converted to GENERIC_IOREMAP way in below patchset.  It's done by
   adding hook ioremap_allowed() and iounmap_allowed() in ARCH to add ARCH
   specific handling the middle of ioremap_prot() and iounmap().

[PATCH v5 0/6] arm64: Cleanup ioremap() and support ioremap_prot()
https://lore.kernel.org/all/20220607125027.44946-1-wangkefeng.wang@huawei.com/T/#u

Later, during v3 reviewing, Christophe Leroy suggested to introduce
generic_ioremap_prot() and generic_iounmap() to generic codes, and ARCH
can provide wrapper function ioremap_prot(), ioremap() or iounmap() if
needed.  Christophe made a RFC patchset as below to specially demonstrate
his idea.  This is what v4 and now v5 is doing.

[RFC PATCH 0/8] mm: ioremap: Convert architectures to take GENERIC_IOREMAP way
https://lore.kernel.org/all/cover.1665568707.git.christophe.leroy@csgroup.eu/T/#u

Testing:
========
In v8, I only applied this patchset onto the latest linus's tree to build
and run on arm64 and s390.


This patch (of 19):

Let's use '#define ioremap_xx' and "#ifdef ioremap_xx" instead.

To remove defined ARCH_HAS_IOREMAP_xx macros in <asm/io.h> of each ARCH,
the ARCH's own ioremap_wc|wt|np definition need be above "#include
<asm-generic/iomap.h>.  Otherwise the redefinition error would be seen
during compiling.  So the relevant adjustments are made to avoid compiling
error:

  loongarch:
  - doesn't include <asm-generic/iomap.h>, defining ARCH_HAS_IOREMAP_WC
    is redundant, so simply remove it.

  m68k:
  - selected GENERIC_IOMAP, <asm-generic/iomap.h> has been added in
    <asm-generic/io.h>, and <asm/kmap.h> is included above
    <asm-generic/iomap.h>, so simply remove ARCH_HAS_IOREMAP_WT defining.

  mips:
  - move "#include <asm-generic/iomap.h>" below ioremap_wc definition
    in <asm/io.h>

  powerpc:
  - remove "#include <asm-generic/iomap.h>" in <asm/io.h> because it's
    duplicated with the one in <asm-generic/io.h>, let's rely on the
    latter.

  x86:
  - selected GENERIC_IOMAP, remove #include <asm-generic/iomap.h> in
    the middle of <asm/io.h>. Let's rely on <asm-generic/io.h>.

Link: https://lkml.kernel.org/r/20230706154520.11257-2-bhe@redhat.com
Signed-off-by: Baoquan He <bhe@redhat.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: David Laight <David.Laight@ACULAB.COM>
Cc: Helge Deller <deller@gmx.de>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Niklas Schnelle <schnelle@linux.ibm.com>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Brian Cain <bcain@quicinc.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18 10:12:32 -07:00
Hugh Dickins
32cc0b7c9d powerpc: add pte_free_defer() for pgtables sharing page
Add powerpc-specific pte_free_defer(), to free table page via call_rcu(). 
pte_free_defer() will be called inside khugepaged's retract_page_tables()
loop, where allocating extra memory cannot be relied upon.  This precedes
the generic version to avoid build breakage from incompatible pgtable_t.

This is awkward because the struct page contains only one rcu_head, but
that page may be shared between PTE_FRAG_NR pagetables, each wanting to
use the rcu_head at the same time.  But powerpc never reuses a fragment
once it has been freed: so mark the page Active in pte_free_defer(),
before calling pte_fragment_free() directly; and there call_rcu() to
pte_free_now() when last fragment is freed and the page is PageActive.

Link: https://lkml.kernel.org/r/6e3ca5f1-334d-4b14-b92d-fc8e99914fcb@google.com
Suggested-by: Jason Gunthorpe <jgg@ziepe.ca>
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Huang, Ying <ying.huang@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jann Horn <jannh@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: SeongJae Park <sj@kernel.org>
Cc: Song Liu <song@kernel.org>
Cc: Steven Price <steven.price@arm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Zack Rusin <zackr@vmware.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18 10:12:23 -07:00
Mahesh Salgaonkar
e160bf64e2 powerpc/rtas: export rtas_error_rc() for reuse.
Also, #define descriptive names for common rtas return codes and use it
instead of numeric values.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/169235811556.193557.1023625262204809514.stgit@jupiter
2023-08-18 23:28:57 +10:00
Michal Suchanek
89c9ce1c99 powerpc: Move DMA64_PROPNAME define to a header
Avoid redefining the same value in multiple source.

Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230817162411.429-1-msuchanek@suse.de
2023-08-18 17:03:15 +10:00
Aneesh Kumar K.V
4d15721177 powerpc/mm: Cleanup memory block size probing
Parse the device tree in early init to find the memory block size to be
used by the kernel. Consolidate the memory block size device tree parsing
to one helper and use that on both powernv and pseries. We still want to
use machine-specific callback because on all machine types other than
powernv and pseries we continue to return MIN_MEMORY_BLOCK_SIZE.

pseries_memory_block_size used to look for the second memory
block (memory@x) to determine the memory_block_size value. This patch
changed that to look at all memory blocks and make sure we can map them all
correctly using the computed memory block size value.

Add workaround to force 256MB memory block size if device driver managed
memory such as GPU memory is present. This helps to add GPU memory
that is not aligned to 1G.

Co-developed-by: Reza Arbab <arbab@linux.ibm.com>
Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
Signed-off-by: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230801044447.11275-1-aneesh.kumar@linux.ibm.com
2023-08-18 17:03:15 +10:00
Christophe Leroy
7cb0094be4 powerpc/32s: Cleanup the mess in __set_pte_at()
__set_pte_at() handles 3 main cases with #ifdefs plus the 'percpu'
subcase which leads to code duplication.

Rewrite the function using IS_ENABLED() to minimise the total number
of cases and remove duplicated code.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/2322dd08217bccab25456fe8b189edf0e6a8b6dd.1692121353.git.christophe.leroy@csgroup.eu
2023-08-18 17:03:14 +10:00
Christophe Leroy
eb5aa21372 powerpc/82xx: Remove CONFIG_8260 and CONFIG_8272
CONFIG_8272 is never used, remove it.

CONFIG_8260 is redundant with CONFIG_PPC_82xx, remove it.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/80930252a5167f3cdaa7eb694074d75521a0bdf9.1692259495.git.christophe.leroy@csgroup.eu
2023-08-18 17:03:14 +10:00
Benjamin Gray
1e60f3564b powerpc/watchpoints: Track perf single step directly on the breakpoint
There is a bug in the current watchpoint tracking logic, where the
teardown in arch_unregister_hw_breakpoint() uses bp->ctx->task, which it
does not have a reference of and parallel threads may be in the process
of destroying. This was partially addressed in commit fb822e6076
("powerpc/hw_breakpoint: Fix oops when destroying hw_breakpoint event"),
but the underlying issue of accessing a struct member in an unknown
state still remained. Syzkaller managed to trigger a null pointer
derefernce due to the race between the task destructor and checking the
pointer and dereferencing it in the loop.

While this null pointer dereference could be fixed by using READ_ONCE
to access the task up front, that just changes the error to manipulating
possbily freed memory.

Instead, the breakpoint logic needs to be reworked to remove any
dependency on a context or task struct during breakpoint removal.

The reason we have this currently is to clear thread.last_hit_ubp. This
member is used to differentiate the perf DAWR single-step sequence from
other causes of single-step, such as userspace just calling
ptrace(PTRACE_SINGLESTEP, ...). We need to differentiate them because,
when the single step interrupt is received, we need to know whether to
re-insert the DAWR breakpoint (perf) or not (ptrace / other).

arch_unregister_hw_breakpoint() needs to clear this information to
prevent dangling pointers to possibly freed memory. These pointers are
dereferenced in single_step_dabr_instruction() without a way to check
their validity.

This patch moves the tracking of this information to the breakpoint
itself. This means we no longer have to do anything special to clean up.

Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230801011744.153973-4-bgray@linux.ibm.com
2023-08-16 23:54:50 +10:00
Linus Walleij
58b6fed89a powerpc: Make virt_to_pfn() a static inline
Making virt_to_pfn() a static inline taking a strongly typed
(const void *) makes the contract of a passing a pointer of that
type to the function explicit and exposes any misuse of the
macro virt_to_pfn() acting polymorphic and accepting many types
such as (void *), (unitptr_t) or (unsigned long) as arguments
without warnings.

Move the virt_to_pfn() and related functions below the
declaration of __pa() so it compiles.

For symmetry do the same with pfn_to_kaddr().

As the file is included right into the linker file, we need
to surround the functions with ifndef __ASSEMBLY__ so we
don't cause compilation errors.

The conversion moreover exposes the fact that pmd_page_vaddr()
was returning an unsigned long rather than a const void * as
could be expected, so all the sites defining pmd_page_vaddr()
had to be augmented as well.

Finally the KVM code in book3s_64_mmu_hv.c was passing an
unsigned int to virt_to_phys() so fix that up with a cast so the
result compiles.

Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
[mpe: Fixup kfence.h, simplify pfn_to_kaddr() & pmd_page_vaddr()]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230809-virt-to-phys-powerpc-v1-1-12e912a7d439@linaro.org
2023-08-16 23:54:49 +10:00
Masahiro Yamada
efa1f85019 powerpc: remove <asm/export.h>
All *.S files under arch/powerpc/ have been converted to include
<linux/export.h> instead of <asm/export.h>.

Remove <asm/export.h>.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230806150954.394189-3-masahiroy@kernel.org
2023-08-16 23:54:48 +10:00
Nick Desaulniers
ae7936d232 powerpc/inst: add PPC_TLBILX_LPID
Clang didn't recognize the instruction tlbilxlpid. This was fixed in
clang-18 [0] then backported to clang-17 [1].  To support clang-16 and
older, rather than using that instruction bare in inline asm, add it to
ppc-opcode.h and use that macro as is done elsewhere for other
instructions.

Link: https://github.com/ClangBuiltLinux/linux/issues/1891
Link: https://github.com/llvm/llvm-project/issues/64080
Link: 53648ac1d0 [0]
Link: 0af7e5e54a [1]
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/llvm/202307211945.TSPcyOhh-lkp@intel.com/
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230803-ppc_tlbilxlpid-v3-1-ca84739bfd73@google.com
2023-08-16 23:54:48 +10:00
Christophe Leroy
6039fcd3fb powerpc/reg: Remove #ifdef around mtspr macro
That ifdef was introduced by commit 1458dd951f ("powerpc/8xx:
Handle CPU6 ERRATA directly in mtspr() macro") and left over by
commit 2a45addd21 ("powerpc/8xx: Remove CPU6 ERRATA Workaround")

Remove it.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/cf652e47ea9e453e89813611b6f76d0939a12063.1687344017.git.christophe.leroy@csgroup.eu
2023-08-16 23:54:48 +10:00
Christophe Leroy
7768716d2f powerpc/cpm2: Remove cpm2_map() and cpm2_unmap()
Since commit 449012daa9 ("[POWERPC] cpm2: Infrastructure code
cleanup.") cpm2_map() is just returning cpm2_immr pointer and
cpm2_unmap() does nothing.

We already have parts of code that use cpm2_immr directly so get rid
of cpm2_map() and cpm2_unmap() by using cpm2_immr directly. And avoid
going through local pointers that hide the pointed structure.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/9fe6ff7284e9f968b12abe7de7c08d7ea40e29d6.1691474658.git.christophe.leroy@csgroup.eu
2023-08-16 23:54:47 +10:00
Christophe Leroy
fbbf4280da powerpc/8xx: Remove immr_map() and immr_unmap()
Since commit fb533d0c5a ("[POWERPC] 8xx: Infrastructure code cleanup.")
immr_map() is just returning mpc8xxx_immr pointer and immr_unmap()
do nothing.

We already have parts of code that use mpc8xxx_immr directly so get rid
of immr_map() and immr_unmap() by using mpc8xxx_immr directly. And avoid
going through local pointers that hide the pointed structure.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/633ed46f6015ff44d5599258647ea517f75d6a1d.1691474658.git.christophe.leroy@csgroup.eu
2023-08-16 23:54:47 +10:00
Christophe Leroy
fecc436a97 powerpc/include: Remove mpc8260.h and m82xx_pci.h
SIU_INT_IRQ1 is not used anywhere and __IO_BASE is defined in
asm/io.h

Remove m82xx_pci.h

Then the only thing remaining in mpc8260.h is MPC82XX_BCR_PLDP

Move MPC82XX_BCR_PLDP into asm/cpm2.h then remove mpc8260.h

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/afe23bf3624c389ff17e9789884c78c124b7b202.1691474658.git.christophe.leroy@csgroup.eu
2023-08-16 23:54:47 +10:00
Christophe Leroy
e6e077cb2a powerpc/include: Declare mpc8xx_immr in 8xx_immap.h
Do the same as for cmp2_immr : declare it at the same place
as its type immap_t, that is in 8xx_immap.h instead of fs_pd.h

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/62d490b65899c2f2667ca7045c5f0fad9cbda458.1691474658.git.christophe.leroy@csgroup.eu
2023-08-16 23:54:47 +10:00
Michael Ellerman
15f63e306d Merge branch 'topic/cpu-smt' into next
Merge SMT changes we are sharing with the tip tree.
2023-08-14 21:46:03 +10:00
Mateusz Guzik
c8afaa1b0f locking: remove spin_lock_prefetch
The only remaining consumer is new_inode, where it showed up in 2001 as
commit c37fa164f793 ("v2.4.9.9 -> v2.4.9.10") in a historical repo [1]
with a changelog which does not mention it.

Since then the line got only touched up to keep compiling.

While it may have been of benefit back in the day, it is guaranteed to
at best not get in the way in the multicore setting -- as the code
performs *a lot* of work between the prefetch and actual lock acquire,
any contention means the cacheline is already invalid by the time the
routine calls spin_lock().  It adds spurious traffic, for short.

On top of it prefetch is notoriously tricky to use for single-threaded
purposes, making it questionable from the get go.

As such, remove it.

I admit upfront I did not see value in benchmarking this change, but I
can do it if that is deemed appropriate.

Removal from new_inode and of the entire thing are in the same patch as
requested by Linus, so whatever weird looks can be directed at that guy.

Link: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/fs/inode.c?id=c37fa164f793735b32aa3f53154ff1a7659e6442 [1]
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-08-12 09:18:47 -07:00
Christophe Leroy
c2d6c1b4f0 serial: cpm_uart: Use get_baudrate() instead of uart_baudrate()
uart_baudrate() is just a trivial wrapper to get_baudrate().

Use get_baudrate() directly and remove assignment in if condition.

And also remove uart_clock() which is not used since
commit 0b2a2e5b77 ("cpm_uart: Remove !CONFIG_PPC_CPM_NEW_BINDING
code")

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Link: https://lore.kernel.org/r/4d497386f576a3df768e44a04f9bb512e424c311.1691068700.git.christophe.leroy@csgroup.eu
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-04 15:08:29 +02:00
ndesaulniers@google.com
79e8328e5a word-at-a-time: use the same return type for has_zero regardless of endianness
Compiling big-endian targets with Clang produces the diagnostic:

  fs/namei.c:2173:13: warning: use of bitwise '|' with boolean operands [-Wbitwise-instead-of-logical]
	} while (!(has_zero(a, &adata, &constants) | has_zero(b, &bdata, &constants)));
	          ~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                               ||
  fs/namei.c:2173:13: note: cast one or both operands to int to silence this warning

It appears that when has_zero was introduced, two definitions were
produced with different signatures (in particular different return
types).

Looking at the usage in hash_name() in fs/namei.c, I suspect that
has_zero() is meant to be invoked twice per while loop iteration; using
logical-or would not update `bdata` when `a` did not have zeros.  So I
think it's preferred to always return an unsigned long rather than a
bool than update the while loop in hash_name() to use a logical-or
rather than bitwise-or.

[ Also changed powerpc version to do the same  - Linus ]

Link: https://github.com/ClangBuiltLinux/linux/issues/1832
Link: https://lore.kernel.org/lkml/20230801-bitwise-v1-1-799bec468dc4@google.com/
Fixes: 36126f8f2e ("word-at-a-time: make the interfaces truly generic")
Debugged-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-08-02 10:23:36 -07:00
Michael Ellerman
73c58e7e14 powerpc: Add HOTPLUG_SMT support
Add support for HOTPLUG_SMT, which enables the generic sysfs SMT support
files in /sys/devices/system/cpu/smt, as well as the "nosmt" boot
parameter.

Implement the recently added hooks to allow partial SMT states, allow
any number of threads per core.

Tie the config symbol to HOTPLUG_CPU, which enables it on the major
platforms that support SMT. If there are other platforms that want the
SMT support that can be tweaked in future.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
[ldufour: remove topology_smt_supported]
[ldufour: remove topology_smt_threads_supported]
[ldufour: select CONFIG_SMT_NUM_THREADS_DYNAMIC]
[ldufour: update kernel-parameters.txt]
Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
Link: https://msgid.link/20230705145143.40545-10-ldufour@linux.ibm.com
2023-08-02 22:49:43 +10:00
Arnd Bergmann
54f30b83fe powerpc: address missing-prototypes warnings
There are a few warnings in powerpc64 defconfig builds after -Wmissing-prototypes
gets promoted from W=1 to the default warning set:

arch/powerpc/mm/book3s64/pgtable.c:422:6: error: no previous prototype for 'arch_report_meminfo' [-Werror=missing-prototypes]
arch/powerpc/platforms/cell/ras.c:275:5: error: no previous prototype for 'cbe_sysreset_hack' [-Werror=missing-prototypes]
arch/powerpc/platforms/cell/spu_manage.c:29:21: error: no previous prototype for 'spu_devnode' [-Werror=missing-prototypes]
arch/powerpc/platforms/pasemi/time.c:12:17: error: no previous prototype for 'pas_get_boot_time' [-Werror=missing-prototypes]
arch/powerpc/platforms/powermac/feature.c:1532:13: error: no previous prototype for 'g5_phy_disable_cpu1' [-Werror=missing-prototypes]
arch/powerpc/platforms/86xx/pic.c:28:13: error: no previous prototype for 'mpc86xx_init_irq' [-Werror=missing-prototypes]
drivers/pci/pci-sysfs.c:936:13: error: no previous prototype for 'pci_adjust_legacy_attr' [-Werror=missing-prototypes]

Address these by including the right header files or marking the
functions static. The audit.c one is a bit tricky since compat_audit.h
cannot include regular kernel headers tht have conflicting types on
32-bit powerpc.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
[mpe: Drop change to __vmemmap_free() which only exists in mm]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230727122720.2558065-1-arnd@kernel.org
2023-08-02 22:22:19 +10:00
Rob Herring
81d7cac4d1 powerpc: Explicitly include correct DT includes
The DT of_device.h and of_platform.h date back to the separate
of_platform_bus_type before it as merged into the regular platform bus.
As part of that merge prepping Arm DT support 13 years ago, they
"temporarily" include each other. They also include platform_device.h
and of.h. As a result, there's a pretty much random mix of those include
files used throughout the tree. In order to detangle these headers and
replace the implicit includes with struct declarations, users need to
explicitly include the correct includes.

Signed-off-by: Rob Herring <robh@kernel.org>
[mpe: Fixup maple/setup.c which needs platform_device]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230724210247.778034-1-robh@kernel.org
2023-08-02 22:22:19 +10:00
Nicholas Piggin
f74b2a6c01 powerpc/64s: Use dec_mm_active_cpus helper
Avoid open-coded atomic_dec on mm->context.active_cpus and use the
function made for it. Add CONFIG_DEBUG_VM underflow checking on the
counter.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230524060821.148015-3-npiggin@gmail.com
2023-08-02 22:22:18 +10:00
Christophe Leroy
3a24ea0df8 powerpc/kuap: Use ASM feature fixups instead of static branches
To avoid a useless nop on top of every uaccess enable/disable and
make life easier for objtool, replace static branches by ASM feature
fixups that will nop KUAP enabling instructions out in the unlikely
case KUAP is disabled at boottime.

Leave it as is on book3s/64 for now, it will be handled later when
objtool is activated on PPC64.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/671948788024fd890ec4ed175bc332dab8664ea5.1689091022.git.christophe.leroy@csgroup.eu
2023-08-02 22:22:18 +10:00
Christophe Leroy
eb52f66f0a powerpc/kuap: KUAP enabling/disabling functions must be __always_inline
Objtool reports following warnings:

  arch/powerpc/kernel/signal_32.o: warning: objtool:
    __prevent_user_access.constprop.0+0x4 (.text+0x4):
    redundant UACCESS disable

  arch/powerpc/kernel/signal_32.o: warning: objtool: user_access_begin+0x2c
    (.text+0x4c): return with UACCESS enabled

  arch/powerpc/kernel/signal_32.o: warning: objtool: handle_rt_signal32+0x188
    (.text+0x360): call to __prevent_user_access.constprop.0() with UACCESS enabled

  arch/powerpc/kernel/signal_32.o: warning: objtool: handle_signal32+0x150
    (.text+0x4d4): call to __prevent_user_access.constprop.0() with UACCESS enabled

This is due to some KUAP enabling/disabling functions being outline
allthough they are marked inline. Use __always_inline instead.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/ca5e50ddbec3867db5146ebddbc9a1dc0e443bc8.1689091022.git.christophe.leroy@csgroup.eu
2023-08-02 22:22:18 +10:00
Christophe Leroy
5222a1d514 powerpc/kuap: Simplify KUAP lock/unlock on BOOK3S/32
On book3s/32 KUAP is performed at segment level. At the moment,
when enabling userspace access, only current segment is modified.
Then if a write is performed on another user segment, a fault is
taken and all other user segments get enabled for userspace
access. This then require special attention when disabling
userspace access.

Having a userspace write access crossing a segment boundary is
unlikely. Having a userspace write access crossing a segment boundary
back and forth is even more unlikely. So, instead of enabling
userspace access on all segments when a write fault occurs, just
change which segment has userspace access enabled in order to
eliminate the case when more than one segment has userspace access
enabled. That simplifies userspace access deactivation.

There is however a corner case which is even more unlikely but has
to be handled anyway: an unaligned access which is crossing a
segment boundary. That would definitely require at least having
userspace access enabled on the two segments. To avoid complicating
the likely case for a so unlikely happening, handle such situation
like an alignment exception and emulate the store.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/8de8580513c1a6e880bad1ba9a69d3efad3d4fa5.1689091022.git.christophe.leroy@csgroup.eu
2023-08-02 22:22:18 +10:00
Christophe Leroy
26e0412082 powerpc/kuap: Use MMU_FTR_KUAP on all and refactor disabling kuap
All but book3s/64 use a static branch key for disabling kuap.
book3s/64 uses an mmu feature.

Refactor all targets to use MMU_FTR_KUAP like book3s/64.

For PPC32 that implies updating mmu features fixups once KUAP
has been initialised.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/6b3d7c977bad73378ea368bc6818e9c94ea95ab0.1689091022.git.christophe.leroy@csgroup.eu
2023-08-02 22:22:18 +10:00
Christophe Leroy
4589a2b789 powerpc/kuap: MMU_FTR_BOOK3S_KUAP becomes MMU_FTR_KUAP
In order to reuse MMU_FTR_BOOK3S_KUAP for other targets than BOOK3S,
rename it MMU_FTR_KUAP.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/c8b6f7b8cd0eeaace96879ed0e0a157faa619451.1689091022.git.christophe.leroy@csgroup.eu
2023-08-02 22:22:18 +10:00
Christophe Leroy
6b289911c8 powerpc/features: Add capability to update mmu features later
On powerpc32, features fixup is performed very early and that's too
early to read the cmdline and take into account 'nosmap' parameter.

On the other hand, no userspace access is performed that early and
KUAP feature fixup can be performed later.

Add a function to update mmu features. The function is passed a
mask with the features that can be updated.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/31b27ee2c9d338f4f82cd8cd69d6bff979495290.1689091022.git.christophe.leroy@csgroup.eu
2023-08-02 22:22:17 +10:00
Christophe Leroy
38bb171b95 powerpc/kuap: Fold kuep_is_disabled() into its only user
kuep_is_disabled() was introduced by commit 91bb30822a ("powerpc/32s:
Refactor update of user segment registers") but then all users but one
were removed by commit 526d4a4c77 ("powerpc/32s: Do kuep_lock() and
kuep_unlock() in assembly").

Fold kuep_is_disabled() into init_new_context() which is its only user.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/b2247147c0a8c830ac82966451647850df4a64da.1689091022.git.christophe.leroy@csgroup.eu
2023-08-02 22:22:17 +10:00
Christophe Leroy
1bec4adcd5 powerpc/kuap: Avoid useless jump_label on empty function
Disassembly of interrupt_enter_prepare() shows a pointless nop
before the mftb

  c000abf0 <interrupt_enter_prepare>:
  c000abf0:       81 23 00 84     lwz     r9,132(r3)
  c000abf4:       71 29 40 00     andi.   r9,r9,16384
  c000abf8:       41 82 00 28     beq-    c000ac20 <interrupt_enter_prepare+0x30>
  c000abfc: ===>  60 00 00 00     nop	<====
  c000ac00:       7d 0c 42 e6     mftb    r8
  c000ac04:       80 e2 00 08     lwz     r7,8(r2)
  c000ac08:       81 22 00 28     lwz     r9,40(r2)
  c000ac0c:       91 02 00 24     stw     r8,36(r2)
  c000ac10:       7d 29 38 50     subf    r9,r9,r7
  c000ac14:       7d 29 42 14     add     r9,r9,r8
  c000ac18:       91 22 00 08     stw     r9,8(r2)
  c000ac1c:       4e 80 00 20     blr
  c000ac20:       60 00 00 00     nop
  c000ac24:       7d 5a c2 a6     mfmd_ap r10
  c000ac28:       3d 20 de 00     lis     r9,-8704
  c000ac2c:       91 43 00 b0     stw     r10,176(r3)
  c000ac30:       7d 3a c3 a6     mtspr   794,r9
  c000ac34:       4e 80 00 20     blr

That comes from the call to kuap_loc(), allthough __kuap_lock() is an
empty function on the 8xx.

To avoid that, only perform kuap_is_disabled() check when there is
something to do with __kuap_lock().

Do the same with __kuap_save_and_lock() and
__kuap_get_and_assert_locked().

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/a854d25bea375d4ba6ca9c2617f9edbba397100a.1689091022.git.christophe.leroy@csgroup.eu
2023-08-02 22:22:17 +10:00
Christophe Leroy
880df2d46a powerpc/kuap: Avoid unnecessary reads of MD_AP
A disassembly of interrupt_exit_kernel_prepare() shows a useless read
of MD_AP register. This is shown by r9 being re-used immediately without
doing anything with the value read.

  c000e0e0:       60 00 00 00     nop
  c000e0e4: ===>  7d 3a c2 a6     mfmd_ap r9	<====
  c000e0e8:       7d 20 00 a6     mfmsr   r9
  c000e0ec:       7c 51 13 a6     mtspr   81,r2
  c000e0f0:       81 3f 00 84     lwz     r9,132(r31)
  c000e0f4:       71 29 80 00     andi.   r9,r9,32768

kuap_get_and_assert_locked() is paired with kuap_kernel_restore()
and are only used in interrupt_exit_kernel_prepare(). The value
returned by kuap_get_and_assert_locked() is only used by
kuap_kernel_restore().

On 8xx, kuap_kernel_restore() doesn't use the value read by
kuap_get_and_assert_locked() so modify kuap_get_and_assert_locked()
to not perform the read of MD_AP and return 0 instead.

The same applies on BOOKE.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/bcbc84c2dd90bb1021da792b1968cdc22112dad8.1689091022.git.christophe.leroy@csgroup.eu
2023-08-02 22:22:17 +10:00
Andrew Donnellan
106ea7ffd5 Revert "powerpc/64s: Remove support for ELFv1 little endian userspace"
This reverts commit 606787fed7.

ELFv1 with LE has never been a thing, and people who try to make ELFv1 LE
binaries are maniacs who need to be stopped, but unfortunately there are
ELFv1 LE binaries out there in the wild.

One such binary is the ppc64el (as Debian calls it) helper for
arch-test[0], a tool for detecting architectures that can be executed on a
given machine by means of attempting to execute helper binaries compiled
for each architecture and seeing which binaries succeed and fail. The
helpers are small snippets of assembly, and the ppc64el assembly doesn't
include the right directives to generate an ELFv2 binary.

This results in arch-test incorrectly determining that a ppc64el kernel
can't execute a ppc64el userspace, which in turn means that a number of
developer tools such as debootstrap will break (assuming arch-test is
installed).

[0] https://github.com/kilobyte/arch-test

Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230719071821.320594-1-ajd@linux.ibm.com
2023-07-19 21:28:35 +10:00
Christophe Leroy
b49e578b93 Revert "powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto"
This partly reverts commit 1e688dd2a3.

That commit aimed at optimising the code around generation of
WARN_ON/BUG_ON but this leads to a lot of dead code erroneously
generated by GCC.

That dead code becomes a problem when we start using objtool validation
because objtool will abort validation with a warning as soon as it
detects unreachable code. This is because unreachable code might
be the indication that objtool doesn't properly decode object text.

     text	   data	    bss	    dec	    hex	filename
  9551585	3627834	 224376	13403795	 cc8693	vmlinux.before
  9535281	3628358	 224376	13388015	 cc48ef	vmlinux.after

Once this change is reverted, in a standard configuration (pmac32 +
function tracer) the text is reduced by 16k which is around 1.7%

We already had problem with it when starting to use objtool on powerpc
as a replacement for recordmcount, see commit 93e3f45a26 ("powerpc:
Fix __WARN_FLAGS() for use with Objtool")

There is also a problem with at least GCC 12, on ppc64_defconfig +
CONFIG_CC_OPTIMIZE_FOR_SIZE=y + CONFIG_DEBUG_SECTION_MISMATCH=y :

    LD      .tmp_vmlinux.kallsyms1
  powerpc64-linux-ld: net/ipv4/tcp_input.o:(__ex_table+0xc4): undefined reference to `.L2136'
  make[2]: *** [scripts/Makefile.vmlinux:36: vmlinux] Error 1
  make[1]: *** [/home/chleroy/linux-powerpc/Makefile:1238: vmlinux] Error 2

Taking into account that other problems are encountered with that
'asm goto' in WARN_ON(), including build failures, keeping that
change is not worth it allthough it is primarily a compiler bug.

Revert it for now.

mpe: Retain EMIT_WARN_ENTRY as a synonym for EMIT_BUG_ENTRY to reduce
churn, as there are now nearly as many uses of EMIT_WARN_ENTRY as
EMIT_BUG_ENTRY.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Naveen N Rao <naveen@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230712134552.534955-1-mpe@ellerman.id.au
2023-07-17 10:35:28 +10:00
Rick Edgecombe
2f0584f3f4 mm: Rename arch pte_mkwrite()'s to pte_mkwrite_novma()
The x86 Shadow stack feature includes a new type of memory called shadow
stack. This shadow stack memory has some unusual properties, which requires
some core mm changes to function properly.

One of these unusual properties is that shadow stack memory is writable,
but only in limited ways. These limits are applied via a specific PTE
bit combination. Nevertheless, the memory is writable, and core mm code
will need to apply the writable permissions in the typical paths that
call pte_mkwrite(). The goal is to make pte_mkwrite() take a VMA, so
that the x86 implementation of it can know whether to create regular
writable or shadow stack mappings.

But there are a couple of challenges to this. Modifying the signatures of
each arch pte_mkwrite() implementation would be error prone because some
are generated with macros and would need to be re-implemented. Also, some
pte_mkwrite() callers operate on kernel memory without a VMA.

So this can be done in a three step process. First pte_mkwrite() can be
renamed to pte_mkwrite_novma() in each arch, with a generic pte_mkwrite()
added that just calls pte_mkwrite_novma(). Next callers without a VMA can
be moved to pte_mkwrite_novma(). And lastly, pte_mkwrite() and all callers
can be changed to take/pass a VMA.

Start the process by renaming pte_mkwrite() to pte_mkwrite_novma() and
adding the pte_mkwrite() wrapper in linux/pgtable.h. Apply the same
pattern for pmd_mkwrite(). Since not all archs have a pmd_mkwrite_novma(),
create a new arch config HAS_HUGE_PAGE that can be used to tell if
pmd_mkwrite() should be defined. Otherwise in the !HAS_HUGE_PAGE cases the
compiler would not be able to find pmd_mkwrite_novma().

No functional change.

Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/lkml/CAHk-=wiZjSu7c9sFYZb3q04108stgHff2wfbokGCCgW7riz+8Q@mail.gmail.com/
Link: https://lore.kernel.org/all/20230613001108.3040476-2-rick.p.edgecombe%40intel.com
2023-07-11 14:10:56 -07:00
Aneesh Kumar K.V
cf53564b11 powerpc/mm/book3s64/hash/4k: Add pmd_same callback for 4K page size
With commit 0d940a9b27 ("mm/pgtable: allow pte_offset_map[_lock]() to
fail") the kernel is now using pmd_same to compare pmd values that are
pointing to a level 4 page table page. Move the functions out of #ifdef
CONFIG_TRANSPARENT_HUGEPAGE and add a variant that can work with both 4K
and 64K page size.

kernel BUG at arch/powerpc/include/asm/book3s/64/hash-4k.h:141!
Oops: Exception in kernel mode, sig: 5 [#1]
LE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
.....
NIP [c00000000048aee0] __pte_offset_map_lock+0xf0/0x164
LR [c00000000048ae78] __pte_offset_map_lock+0x88/0x164
Call Trace:
 0xc0003f000009a340 (unreliable)
 __handle_mm_fault+0x1340/0x1980
 handle_mm_fault+0xbc/0x380
 __get_user_pages+0x320/0x550
 get_user_pages_remote+0x13c/0x520
 get_arg_page+0x80/0x1d0
 copy_string_kernel+0xc8/0x250
 kernel_execve+0x11c/0x270
 run_init_process+0xe4/0x10c
 kernel_init+0xbc/0x1a0
 ret_from_kernel_user_thread+0x14/0x1c

Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230706022405.798157-1-aneesh.kumar@linux.ibm.com
2023-07-10 09:47:47 +10:00
Linus Torvalds
d8b0bd57c2 powerpc updates for 6.5
- Extend KCSAN support to 32-bit and BookE. Add some KCSAN annotations.
 
  - Make ELFv2 ABI the default for 64-bit big-endian kernel builds, and use
    the -mprofile-kernel option (kernel specific ftrace ABI) for big endian
    ELFv2 kernels.
 
  - Add initial Dynamic Execution Control Register (DEXCR) support, and allow
    the ROP protection instructions to be used on Power 10.
 
  - Various other small features and fixes.
 
 Thanks to: Aditya Gupta, Aneesh Kumar K.V, Benjamin Gray, Brian King,
 Christophe Leroy, Colin Ian King, Dmitry Torokhov, Gaurav Batra, Jean Delvare,
 Joel Stanley, Marco Elver, Masahiro Yamada, Nageswara R Sastry, Nathan
 Chancellor, Naveen N Rao, Nayna Jain, Nicholas Piggin, Paul Gortmaker, Randy
 Dunlap, Rob Herring, Rohan McLure, Russell Currey, Sachin Sant, Timothy
 Pearson, Tom Rix, Uwe Kleine-König.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmSeqQMTHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgKukD/sGUceX6gIc7UcjWhL1ZCVco0bsgLjT
 JrY1NenisGKjwwRd/o+2+h3ziJDoO5AsQfT72EiNLhaYJhnlb1d0vXzsvN0THc+2
 W5RrxAZUNhBy+c7gSSEJjy8+vBIwSQAliQLChHGOSejGCj94SxF5+zjUFvSX458I
 z0+ZQK+Fiw5NcpzEnBT0XPnLzap74a7TL0JcG1MLbj2QtHXhbfjIlkkPDX3kK0Gw
 xbelFy38X7KKbQsXXYSTCGqwRdJ3yqu21nEsjRuo2yT5H5rQbjCNggkMOL1DecDd
 ULGxit/z13Pt1Ad3oe+6FF17ggOiCG0F75DONASjFDthFYx6NQffkJS1n1VZauQj
 jU6LtWeD3HkgIYm6Udjq+LaSmkAmn5a+9tsElE/K+V1WG4rKyMeVmE3z/tCJG0l2
 yhLKyFs+glXN/LiWHyX0mrQIIVZdRK237X1qXJuIvAuB7Drm5duXFAHR8pdJD0dg
 H23OhoO2FvLxb9GvnzGxqjdazzattctz31wU/1RgnPxumYkJ9PlBcXn9h1uXa8/K
 rDZFJADsQhEfRCjmLG3GIaFWqZdc4Cn+ZUk4iHkjPDFDL05Fq7JYHIuwteiN6/wP
 NHRvtKdJisu583NI9RN9300JykrEqjSRbMOWlc3vuFwbRLGioXvWhWlIZ3/t58jG
 R8s+f0nKSPr+fg==
 =ssit
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-6.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:

 - Extend KCSAN support to 32-bit and BookE. Add some KCSAN annotations

 - Make ELFv2 ABI the default for 64-bit big-endian kernel builds, and
   use the -mprofile-kernel option (kernel specific ftrace ABI) for big
   endian ELFv2 kernels

 - Add initial Dynamic Execution Control Register (DEXCR) support, and
   allow the ROP protection instructions to be used on Power 10

 - Various other small features and fixes

Thanks to Aditya Gupta, Aneesh Kumar K.V, Benjamin Gray, Brian King,
Christophe Leroy, Colin Ian King, Dmitry Torokhov, Gaurav Batra, Jean
Delvare, Joel Stanley, Marco Elver, Masahiro Yamada, Nageswara R Sastry,
Nathan Chancellor, Naveen N Rao, Nayna Jain, Nicholas Piggin, Paul
Gortmaker, Randy Dunlap, Rob Herring, Rohan McLure, Russell Currey,
Sachin Sant, Timothy Pearson, Tom Rix, and Uwe Kleine-König.

* tag 'powerpc-6.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (76 commits)
  powerpc: remove checks for binutils older than 2.25
  powerpc: Fail build if using recordmcount with binutils v2.37
  powerpc/iommu: TCEs are incorrectly manipulated with DLPAR add/remove of memory
  powerpc/iommu: Only build sPAPR access functions on pSeries
  powerpc: powernv: Annotate data races in opal events
  powerpc: Mark writes registering ipi to host cpu through kvm and polling
  powerpc: Annotate accesses to ipi message flags
  powerpc: powernv: Fix KCSAN datarace warnings on idle_state contention
  powerpc: Mark [h]ssr_valid accesses in check_return_regs_valid
  powerpc: qspinlock: Enforce qnode writes prior to publishing to queue
  powerpc: qspinlock: Mark accesses to qnode lock checks
  powerpc/powernv/pci: Remove last IODA1 defines
  powerpc/powernv/pci: Remove MVE code
  powerpc/powernv/pci: Remove ioda1 support
  powerpc: 52xx: Make immr_id DT match tables static
  powerpc: mpc512x: Remove open coded "ranges" parsing
  powerpc: fsl_soc: Use of_range_to_resource() for "ranges" parsing
  powerpc: fsl: Use of_property_read_reg() to parse "reg"
  powerpc: fsl_rio: Use of_range_to_resource() for "ranges" parsing
  macintosh: Use of_property_read_reg() to parse "reg"
  ...
2023-06-30 09:20:08 -07:00
Linus Torvalds
1b722407a1 drm changes for 6.5-rc1:
core:
 - replace strlcpy with strscpy
 - EDID changes to support further conversion to struct drm_edid
 - Move i915 DSC parameter code to common DRM helpers
 - Add Colorspace functionality
 
 aperture:
 - ignore framebuffers with non-primary devices
 
 fbdev:
 - use fbdev i/o helpers
 - add Kconfig options for fb_ops helpers
 - use new fb io helpers directly in drivers
 
 sysfs:
 - export DRM connector ID
 
 scheduler:
 - Avoid an infinite loop
 
 ttm:
 - store function table in .rodata
 - Add query for TTM mem limit
 - Add NUMA awareness to pools
 - Export ttm_pool_fini()
 
 bridge:
 - fsl-ldb: support i.MX6SX
 - lt9211, lt9611: remove blanking packets
 - tc358768: implement input bus formats, devm cleanups
 - ti-snd65dsi86: implement wait_hpd_asserted
 - analogix: fix endless probe loop
 - samsung-dsim: support swapped clock, fix enabling, support var clock
 - display-connector: Add support for external power supply
 - imx: Fix module linking
 - tc358762: Support reset GPIO
 
 panel:
 - nt36523: Support Lenovo J606F
 - st7703: Support Anbernic RG353V-V2
 - InnoLux G070ACE-L01 support
 - boe-tv101wum-nl6: Improve initialization
 - sharp-ls043t1le001: Mode fixes
 - simple: BOE EV121WXM-N10-1850, S6D7AA0
 - Ampire AM-800480L1TMQW-T00H
 - Rocktech RK043FN48H
 - Starry himax83102-j02
 - Starry ili9882t
 
 amdgpu:
 - add new ctx query flag to handle reset better
 - add new query/set shadow buffer for rdna3
 - DCN 3.2/3.1.x/3.0.x updates
 - Enable DC_FP on loongarch
 - PCIe fix for RDNA2
 - improve DC FAMS/SubVP support for better power management
 - partition support for lots of engines
 - Take NUMA into account when allocating memory
 - Add new DRM_AMDGPU_WERROR config parameter to help with CI
 - Initial SMU13 overdrive support
 - Add support for new colorspace KMS API
 - W=1 fixes
 
 amdkfd:
 - Query TTM mem limit rather than hardcoding it
 - GC 9.4.3 partition support
 - Handle NUMA for partitions
 - Add debugger interface for enabling gdb
 - Add KFD event age tracking
 
 radeon:
 - Fix possible UAF
 
 i915:
 - new getparam for PXP support
 - GSC/MEI proxy driver
 - Meteorlake display enablement
 - avoid clearing preallocated framebuffers with TTM
 - implement framebuffer mmap support
 - Disable sampler indirect state in bindless heap
 - Enable fdinfo for GuC backends
 - GuC loading and firmware table handling fixes
 - Various refactors for multi-tile enablement
 - Define MOCS and PAT tables for MTL
 - GSC/MEI support for Meteorlake
 - PMU multi-tile support
 - Large driver kernel doc cleanup
 - Allow VRR toggling and arbitrary refresh rates
 - Support async flips on linear buffers on display ver 12+
 - Expose CRTC CTM property on ILK/SNB/VLV
 - New debugfs for display clock frequencies
 - Hotplug refactoring
 - Display refactoring
 - I915_GEM_CREATE_EXT_SET_PAT for Mesa on Meteorlake
 - Use large rings for compute contexts
 - HuC loading for MTL
 - Allow user to set cache at BO creation
 - MTL powermanagement enhancements
 - Switch to dedicated workqueues to stop using flush_scheduled_work()
 - Move display runtime init under display/
 - Remove 10bit gamma on desktop gen3 parts, they don't support it
 
 habanalabs:
 - uapi: return 0 for user queries if there was a h/w or f/w error
 - Add pci health check when we lose connection with the firmware. This can be used to
   distinguish between pci link down and firmware getting stuck.
 - Add more info to the error print when TPC interrupt occur.
 - Firmware fixes
 
 msm:
 - Adreno A660 bindings
 - SM8350 MDSS bindings fix
 - Added support for DPU on sm6350 and sm6375 platforms
 - Implemented tearcheck support to support vsync on SM150 and newer platforms
 - Enabled missing features (DSPP, DSC, split display) on sc8180x, sc8280xp, sm8450
 - Added support for DSI and 28nm DSI PHY on MSM8226 platform
 - Added support for DSI on sm6350 and sm6375 platforms
 - Added support for display controller on MSM8226 platform
 - A690 GPU support
 - Move cmdstream dumping out of fence signaling path
 - a610 support
 - Support for a6xx devices without GMU
 
 nouveau:
 - NULL ptr before deref fixes
 
 armada:
 - implement fbdev emulation as client
 
 sun4i:
 - fix mipi-dsi dotclock
 - release clocks
 
 vc4:
 - rgb range toggle property
 - BT601 / BT2020 HDMI support
 
 vkms:
 - convert to drmm helpers
 - add reflection and rotation support
 - fix rgb565 conversion
 
 gma500:
 - fix iomem access
 
 shmobile:
 - support renesas soc platform
 - enable fbdev
 
 mxsfb:
 - Add support for i.MX93 LCDIF
 
 stm:
 - dsi: Use devm_ helper
 - ltdc: Fix potential invalid pointer deref
 
 renesas:
 - Group drivers in renesas subdirectory to prepare for new platform
 - Drop deprecated R-Car H3 ES1.x support
 
 meson:
 - Add support for MIPI DSI displays
 
 virtio:
 - add sync object support
 
 mediatek:
 - Add display binding document for MT6795
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmSc3UwACgkQDHTzWXnE
 hr69fQ/+PF9L7FSB/qfjaoqJnk6wJyCehv7pDX2/UK7FUrW0e4EwNVx4KKIRqO/P
 pKSU9wRlC72ViGgqOYnw0pwzuh45630vWo1stbgxipU2cvM6Ywlq8FiQFdymFe+P
 tLYWe5MR55Y+E9Y+bCrKn2yvQ7v+f6EZ6ITIX7mrXL77Bpxhv58VzmZawkxmw5MV
 vwhSqJaaeeWNoyfSIDdN8Oj9fE6ScTyiA0YisOP6jnK/TiQofXQxFrMIdKctCcoA
 HjolfEEPVCDOSBipkV3hLiyN8lXmt47BmuHp9opSL/g1aASteVeD1/GrccTaA4xV
 ah+Jx1hBLcH5sm8CZzbCcHhNu3ILnPCFZFCx8gwflQqmDIOZvoMdL75j7lgqJZG8
 TePEiifG3kYO/ZiDc5TUBdeMfbgeehPOsxbvOlA3LxJrgyxe/5o9oejX2Uvvzhoq
 9fno1PLqeCILqYaMiCocJwyTw/2VKYCCH7Wiypd4o3h0nmAbbqPT3KeZgNOjoa2X
 GXpiIU9rTQ8LZgSmOXdCt2rc9Jb6q+eCiDgrZzAukbP8veQyOvO16Nx1+XzLhOYc
 BfjEOoA7nBJD+UPLWkwj42gKtoEWN7IOMTHgcK11d8jdpGISGupl/1nntGhYk0jO
 +3RRZXMB/Gjwe9ge4K9bFC81pbfuAE7ELQtPsgV9LapMmWHKccY=
 =FmUA
 -----END PGP SIGNATURE-----

Merge tag 'drm-next-2023-06-29' of git://anongit.freedesktop.org/drm/drm

Pull drm updates from Dave Airlie:
 "There is one set of patches to misc for a i915 gsc/mei proxy driver.

  Otherwise it's mostly amdgpu/i915/msm, lots of hw enablement and lots
  of refactoring.

  core:
   - replace strlcpy with strscpy
   - EDID changes to support further conversion to struct drm_edid
   - Move i915 DSC parameter code to common DRM helpers
   - Add Colorspace functionality

  aperture:
   - ignore framebuffers with non-primary devices

  fbdev:
   - use fbdev i/o helpers
   - add Kconfig options for fb_ops helpers
   - use new fb io helpers directly in drivers

  sysfs:
   - export DRM connector ID

  scheduler:
   - Avoid an infinite loop

  ttm:
   - store function table in .rodata
   - Add query for TTM mem limit
   - Add NUMA awareness to pools
   - Export ttm_pool_fini()

  bridge:
   - fsl-ldb: support i.MX6SX
   - lt9211, lt9611: remove blanking packets
   - tc358768: implement input bus formats, devm cleanups
   - ti-snd65dsi86: implement wait_hpd_asserted
   - analogix: fix endless probe loop
   - samsung-dsim: support swapped clock, fix enabling, support var
     clock
   - display-connector: Add support for external power supply
   - imx: Fix module linking
   - tc358762: Support reset GPIO

  panel:
   - nt36523: Support Lenovo J606F
   - st7703: Support Anbernic RG353V-V2
   - InnoLux G070ACE-L01 support
   - boe-tv101wum-nl6: Improve initialization
   - sharp-ls043t1le001: Mode fixes
   - simple: BOE EV121WXM-N10-1850, S6D7AA0
   - Ampire AM-800480L1TMQW-T00H
   - Rocktech RK043FN48H
   - Starry himax83102-j02
   - Starry ili9882t

  amdgpu:
   - add new ctx query flag to handle reset better
   - add new query/set shadow buffer for rdna3
   - DCN 3.2/3.1.x/3.0.x updates
   - Enable DC_FP on loongarch
   - PCIe fix for RDNA2
   - improve DC FAMS/SubVP support for better power management
   - partition support for lots of engines
   - Take NUMA into account when allocating memory
   - Add new DRM_AMDGPU_WERROR config parameter to help with CI
   - Initial SMU13 overdrive support
   - Add support for new colorspace KMS API
   - W=1 fixes

  amdkfd:
   - Query TTM mem limit rather than hardcoding it
   - GC 9.4.3 partition support
   - Handle NUMA for partitions
   - Add debugger interface for enabling gdb
   - Add KFD event age tracking

  radeon:
   - Fix possible UAF

  i915:
   - new getparam for PXP support
   - GSC/MEI proxy driver
   - Meteorlake display enablement
   - avoid clearing preallocated framebuffers with TTM
   - implement framebuffer mmap support
   - Disable sampler indirect state in bindless heap
   - Enable fdinfo for GuC backends
   - GuC loading and firmware table handling fixes
   - Various refactors for multi-tile enablement
   - Define MOCS and PAT tables for MTL
   - GSC/MEI support for Meteorlake
   - PMU multi-tile support
   - Large driver kernel doc cleanup
   - Allow VRR toggling and arbitrary refresh rates
   - Support async flips on linear buffers on display ver 12+
   - Expose CRTC CTM property on ILK/SNB/VLV
   - New debugfs for display clock frequencies
   - Hotplug refactoring
   - Display refactoring
   - I915_GEM_CREATE_EXT_SET_PAT for Mesa on Meteorlake
   - Use large rings for compute contexts
   - HuC loading for MTL
   - Allow user to set cache at BO creation
   - MTL powermanagement enhancements
   - Switch to dedicated workqueues to stop using flush_scheduled_work()
   - Move display runtime init under display/
   - Remove 10bit gamma on desktop gen3 parts, they don't support it

  habanalabs:
   - uapi: return 0 for user queries if there was a h/w or f/w error
   - Add pci health check when we lose connection with the firmware.
     This can be used to distinguish between pci link down and firmware
     getting stuck.
   - Add more info to the error print when TPC interrupt occur.
   - Firmware fixes

  msm:
   - Adreno A660 bindings
   - SM8350 MDSS bindings fix
   - Added support for DPU on sm6350 and sm6375 platforms
   - Implemented tearcheck support to support vsync on SM150 and newer
     platforms
   - Enabled missing features (DSPP, DSC, split display) on sc8180x,
     sc8280xp, sm8450
   - Added support for DSI and 28nm DSI PHY on MSM8226 platform
   - Added support for DSI on sm6350 and sm6375 platforms
   - Added support for display controller on MSM8226 platform
   - A690 GPU support
   - Move cmdstream dumping out of fence signaling path
   - a610 support
   - Support for a6xx devices without GMU

  nouveau:
   - NULL ptr before deref fixes

  armada:
   - implement fbdev emulation as client

  sun4i:
   - fix mipi-dsi dotclock
   - release clocks

  vc4:
   - rgb range toggle property
   - BT601 / BT2020 HDMI support

  vkms:
   - convert to drmm helpers
   - add reflection and rotation support
   - fix rgb565 conversion

  gma500:
   - fix iomem access

  shmobile:
   - support renesas soc platform
   - enable fbdev

  mxsfb:
   - Add support for i.MX93 LCDIF

  stm:
   - dsi: Use devm_ helper
   - ltdc: Fix potential invalid pointer deref

  renesas:
   - Group drivers in renesas subdirectory to prepare for new platform
   - Drop deprecated R-Car H3 ES1.x support

  meson:
   - Add support for MIPI DSI displays

  virtio:
   - add sync object support

  mediatek:
   - Add display binding document for MT6795"

* tag 'drm-next-2023-06-29' of git://anongit.freedesktop.org/drm/drm: (1791 commits)
  drm/i915: Fix a NULL vs IS_ERR() bug
  drm/i915: make i915_drm_client_fdinfo() reference conditional again
  drm/i915/huc: Fix missing error code in intel_huc_init()
  drm/i915/gsc: take a wakeref for the proxy-init-completion check
  drm/msm/a6xx: Add A610 speedbin support
  drm/msm/a6xx: Add A619_holi speedbin support
  drm/msm/a6xx: Use adreno_is_aXYZ macros in speedbin matching
  drm/msm/a6xx: Use "else if" in GPU speedbin rev matching
  drm/msm/a6xx: Fix some A619 tunables
  drm/msm/a6xx: Add A610 support
  drm/msm/a6xx: Add support for A619_holi
  drm/msm/adreno: Disable has_cached_coherent in GMU wrapper configurations
  drm/msm/a6xx: Introduce GMU wrapper support
  drm/msm/a6xx: Move CX GMU power counter enablement to hw_init
  drm/msm/a6xx: Extend and explain UBWC config
  drm/msm/a6xx: Remove both GBIF and RBBM GBIF halt on hw init
  drm/msm/a6xx: Add a helper for software-resetting the GPU
  drm/msm/a6xx: Improve a6xx_bus_clear_pending_transactions()
  drm/msm/a6xx: Move a6xx_bus_clear_pending_transactions to a6xx_gpu
  drm/msm/a6xx: Move force keepalive vote removal to a6xx_gmu_force_off()
  ...
2023-06-29 11:00:17 -07:00
Linus Torvalds
77b1a7f7a0 - Arnd Bergmann has fixed a bunch of -Wmissing-prototypes in
top-level directories.
 
 - Douglas Anderson has added a new "buddy" mode to the hardlockup
   detector.  It permits the detector to work on architectures which
   cannot provide the required interrupts, by having CPUs periodically
   perform checks on other CPUs.
 
 - Zhen Lei has enhanced kexec's ability to support two crash regions.
 
 - Petr Mladek has done a lot of cleanup on the hard lockup detector's
   Kconfig entries.
 
 - And the usual bunch of singleton patches in various places.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZJelTAAKCRDdBJ7gKXxA
 juDkAP0VXWynzkXoojdS/8e/hhi+htedmQ3v2dLZD+vBrctLhAEA7rcH58zAVoWa
 2ejqO6wDrRGUC7JQcO9VEjT0nv73UwU=
 =F293
 -----END PGP SIGNATURE-----

Merge tag 'mm-nonmm-stable-2023-06-24-19-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull non-mm updates from Andrew Morton:

 - Arnd Bergmann has fixed a bunch of -Wmissing-prototypes in top-level
   directories

 - Douglas Anderson has added a new "buddy" mode to the hardlockup
   detector. It permits the detector to work on architectures which
   cannot provide the required interrupts, by having CPUs periodically
   perform checks on other CPUs

 - Zhen Lei has enhanced kexec's ability to support two crash regions

 - Petr Mladek has done a lot of cleanup on the hard lockup detector's
   Kconfig entries

 - And the usual bunch of singleton patches in various places

* tag 'mm-nonmm-stable-2023-06-24-19-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (72 commits)
  kernel/time/posix-stubs.c: remove duplicated include
  ocfs2: remove redundant assignment to variable bit_off
  watchdog/hardlockup: fix typo in config HARDLOCKUP_DETECTOR_PREFER_BUDDY
  powerpc: move arch_trigger_cpumask_backtrace from nmi.h to irq.h
  devres: show which resource was invalid in __devm_ioremap_resource()
  watchdog/hardlockup: define HARDLOCKUP_DETECTOR_ARCH
  watchdog/sparc64: define HARDLOCKUP_DETECTOR_SPARC64
  watchdog/hardlockup: make HAVE_NMI_WATCHDOG sparc64-specific
  watchdog/hardlockup: declare arch_touch_nmi_watchdog() only in linux/nmi.h
  watchdog/hardlockup: make the config checks more straightforward
  watchdog/hardlockup: sort hardlockup detector related config values a logical way
  watchdog/hardlockup: move SMP barriers from common code to buddy code
  watchdog/buddy: simplify the dependency for HARDLOCKUP_DETECTOR_PREFER_BUDDY
  watchdog/buddy: don't copy the cpumask in watchdog_next_cpu()
  watchdog/buddy: cleanup how watchdog_buddy_check_hardlockup() is called
  watchdog/hardlockup: remove softlockup comment in touch_nmi_watchdog()
  watchdog/hardlockup: in watchdog_hardlockup_check() use cpumask_copy()
  watchdog/hardlockup: don't use raw_cpu_ptr() in watchdog_hardlockup_kick()
  watchdog/hardlockup: HAVE_NMI_WATCHDOG must implement watchdog_hardlockup_probe()
  watchdog/hardlockup: keep kernel.nmi_watchdog sysctl as 0444 if probe fails
  ...
2023-06-28 10:59:38 -07:00
Linus Torvalds
6e17c6de3d - Yosry Ahmed brought back some cgroup v1 stats in OOM logs.
- Yosry has also eliminated cgroup's atomic rstat flushing.
 
 - Nhat Pham adds the new cachestat() syscall.  It provides userspace
   with the ability to query pagecache status - a similar concept to
   mincore() but more powerful and with improved usability.
 
 - Mel Gorman provides more optimizations for compaction, reducing the
   prevalence of page rescanning.
 
 - Lorenzo Stoakes has done some maintanance work on the get_user_pages()
   interface.
 
 - Liam Howlett continues with cleanups and maintenance work to the maple
   tree code.  Peng Zhang also does some work on maple tree.
 
 - Johannes Weiner has done some cleanup work on the compaction code.
 
 - David Hildenbrand has contributed additional selftests for
   get_user_pages().
 
 - Thomas Gleixner has contributed some maintenance and optimization work
   for the vmalloc code.
 
 - Baolin Wang has provided some compaction cleanups,
 
 - SeongJae Park continues maintenance work on the DAMON code.
 
 - Huang Ying has done some maintenance on the swap code's usage of
   device refcounting.
 
 - Christoph Hellwig has some cleanups for the filemap/directio code.
 
 - Ryan Roberts provides two patch series which yield some
   rationalization of the kernel's access to pte entries - use the provided
   APIs rather than open-coding accesses.
 
 - Lorenzo Stoakes has some fixes to the interaction between pagecache
   and directio access to file mappings.
 
 - John Hubbard has a series of fixes to the MM selftesting code.
 
 - ZhangPeng continues the folio conversion campaign.
 
 - Hugh Dickins has been working on the pagetable handling code, mainly
   with a view to reducing the load on the mmap_lock.
 
 - Catalin Marinas has reduced the arm64 kmalloc() minimum alignment from
   128 to 8.
 
 - Domenico Cerasuolo has improved the zswap reclaim mechanism by
   reorganizing the LRU management.
 
 - Matthew Wilcox provides some fixups to make gfs2 work better with the
   buffer_head code.
 
 - Vishal Moola also has done some folio conversion work.
 
 - Matthew Wilcox has removed the remnants of the pagevec code - their
   functionality is migrated over to struct folio_batch.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZJejewAKCRDdBJ7gKXxA
 joggAPwKMfT9lvDBEUnJagY7dbDPky1cSYZdJKxxM2cApGa42gEA6Cl8HRAWqSOh
 J0qXCzqaaN8+BuEyLGDVPaXur9KirwY=
 =B7yQ
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2023-06-24-19-15' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull mm updates from Andrew Morton:

 - Yosry Ahmed brought back some cgroup v1 stats in OOM logs

 - Yosry has also eliminated cgroup's atomic rstat flushing

 - Nhat Pham adds the new cachestat() syscall. It provides userspace
   with the ability to query pagecache status - a similar concept to
   mincore() but more powerful and with improved usability

 - Mel Gorman provides more optimizations for compaction, reducing the
   prevalence of page rescanning

 - Lorenzo Stoakes has done some maintanance work on the
   get_user_pages() interface

 - Liam Howlett continues with cleanups and maintenance work to the
   maple tree code. Peng Zhang also does some work on maple tree

 - Johannes Weiner has done some cleanup work on the compaction code

 - David Hildenbrand has contributed additional selftests for
   get_user_pages()

 - Thomas Gleixner has contributed some maintenance and optimization
   work for the vmalloc code

 - Baolin Wang has provided some compaction cleanups,

 - SeongJae Park continues maintenance work on the DAMON code

 - Huang Ying has done some maintenance on the swap code's usage of
   device refcounting

 - Christoph Hellwig has some cleanups for the filemap/directio code

 - Ryan Roberts provides two patch series which yield some
   rationalization of the kernel's access to pte entries - use the
   provided APIs rather than open-coding accesses

 - Lorenzo Stoakes has some fixes to the interaction between pagecache
   and directio access to file mappings

 - John Hubbard has a series of fixes to the MM selftesting code

 - ZhangPeng continues the folio conversion campaign

 - Hugh Dickins has been working on the pagetable handling code, mainly
   with a view to reducing the load on the mmap_lock

 - Catalin Marinas has reduced the arm64 kmalloc() minimum alignment
   from 128 to 8

 - Domenico Cerasuolo has improved the zswap reclaim mechanism by
   reorganizing the LRU management

 - Matthew Wilcox provides some fixups to make gfs2 work better with the
   buffer_head code

 - Vishal Moola also has done some folio conversion work

 - Matthew Wilcox has removed the remnants of the pagevec code - their
   functionality is migrated over to struct folio_batch

* tag 'mm-stable-2023-06-24-19-15' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (380 commits)
  mm/hugetlb: remove hugetlb_set_page_subpool()
  mm: nommu: correct the range of mmap_sem_read_lock in task_mem()
  hugetlb: revert use of page_cache_next_miss()
  Revert "page cache: fix page_cache_next/prev_miss off by one"
  mm/vmscan: fix root proactive reclaim unthrottling unbalanced node
  mm: memcg: rename and document global_reclaim()
  mm: kill [add|del]_page_to_lru_list()
  mm: compaction: convert to use a folio in isolate_migratepages_block()
  mm: zswap: fix double invalidate with exclusive loads
  mm: remove unnecessary pagevec includes
  mm: remove references to pagevec
  mm: rename invalidate_mapping_pagevec to mapping_try_invalidate
  mm: remove struct pagevec
  net: convert sunrpc from pagevec to folio_batch
  i915: convert i915_gpu_error to use a folio_batch
  pagevec: rename fbatch_count()
  mm: remove check_move_unevictable_pages()
  drm: convert drm_gem_put_pages() to use a folio_batch
  i915: convert shmem_sg_free_table() to use a folio_batch
  scatterlist: add sg_set_folio()
  ...
2023-06-28 10:28:11 -07:00
Linus Torvalds
bc6cb4d5bc Locking changes for v6.5:
- Introduce cmpxchg128() -- aka. the demise of cmpxchg_double().
 
   The cmpxchg128() family of functions is basically & functionally
   the same as cmpxchg_double(), but with a saner interface: instead
   of a 6-parameter horror that forced u128 - u64/u64-halves layout
   details on the interface and exposed users to complexity,
   fragility & bugs, use a natural 3-parameter interface with u128 types.
 
 - Restructure the generated atomic headers, and add
   kerneldoc comments for all of the generic atomic{,64,_long}_t
   operations. Generated definitions are much cleaner now,
   and come with documentation.
 
 - Implement lock_set_cmp_fn() on lockdep, for defining an ordering
   when taking multiple locks of the same type. This gets rid of
   one use of lockdep_set_novalidate_class() in the bcache code.
 
 - Fix raw_cpu_generic_try_cmpxchg() bug due to an unintended
   variable shadowing generating garbage code on Clang on certain
   ARM builds.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmSav3wRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1gDyxAAjCHQjpolrre7fRpyiTDwqzIKT27H04vQ
 zrQVlVc42WBnn9pe8LthGy43/RvYvqlZvLoLONA4fMkuYriM6nSMsoZjeUmE+6Rs
 QAElQC74P5YvEBOa67VNY3/M7sj22ftDe7ODtVV8OrnPjMk1sQNRvaK025Cs3yig
 8MAI//hHGNmyVAp1dPYZMJNqxGCvluReLZ4SaUJFCMrg7YgUXgCBj/5Gi07TlKxn
 sT8BFCssoEW/B9FXkh59B1t6FBCZoSy4XSZfsZe0uVAUJ4XDEOO+zBgaWFCedNQT
 wP323ryBgMrkzUKA8j2/o5d3QnMA1GcBfHNNlvAl/fOfrxWXzDZnOEY26YcaLMa0
 YIuRF/JNbPZlt6DCUVBUEvMPpfNYi18dFN0rat1a6xL2L4w+tm55y3mFtSsg76Ka
 r7L2nWlRrAGXnuA+VEPqkqbSWRUSWOv5hT2Mcyb5BqqZRsxBETn6G8GVAzIO6j6v
 giyfUdA8Z9wmMZ7NtB6usxe3p1lXtnZ/shCE7ZHXm6xstyZrSXaHgOSgAnB9DcuJ
 7KpGIhhSODQSwC/h/J0KEpb9Pr/5jCWmXAQ2DWnZK6ndt1jUfFi8pfK58wm0AuAM
 o9t8Mx3o8wZjbMdt6up9OIM1HyFiMx2BSaZK+8f/bWemHQ0xwez5g4k5O5AwVOaC
 x9Nt+Tp0Ze4=
 =DsYj
 -----END PGP SIGNATURE-----

Merge tag 'locking-core-2023-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking updates from Ingo Molnar:

 - Introduce cmpxchg128() -- aka. the demise of cmpxchg_double()

   The cmpxchg128() family of functions is basically & functionally the
   same as cmpxchg_double(), but with a saner interface.

   Instead of a 6-parameter horror that forced u128 - u64/u64-halves
   layout details on the interface and exposed users to complexity,
   fragility & bugs, use a natural 3-parameter interface with u128
   types.

 - Restructure the generated atomic headers, and add kerneldoc comments
   for all of the generic atomic{,64,_long}_t operations.

   The generated definitions are much cleaner now, and come with
   documentation.

 - Implement lock_set_cmp_fn() on lockdep, for defining an ordering when
   taking multiple locks of the same type.

   This gets rid of one use of lockdep_set_novalidate_class() in the
   bcache code.

 - Fix raw_cpu_generic_try_cmpxchg() bug due to an unintended variable
   shadowing generating garbage code on Clang on certain ARM builds.

* tag 'locking-core-2023-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (43 commits)
  locking/atomic: scripts: fix ${atomic}_dec_if_positive() kerneldoc
  percpu: Fix self-assignment of __old in raw_cpu_generic_try_cmpxchg()
  locking/atomic: treewide: delete arch_atomic_*() kerneldoc
  locking/atomic: docs: Add atomic operations to the driver basic API documentation
  locking/atomic: scripts: generate kerneldoc comments
  docs: scripts: kernel-doc: accept bitwise negation like ~@var
  locking/atomic: scripts: simplify raw_atomic*() definitions
  locking/atomic: scripts: simplify raw_atomic_long*() definitions
  locking/atomic: scripts: split pfx/name/sfx/order
  locking/atomic: scripts: restructure fallback ifdeffery
  locking/atomic: scripts: build raw_atomic_long*() directly
  locking/atomic: treewide: use raw_atomic*_<op>()
  locking/atomic: scripts: add trivial raw_atomic*_<op>()
  locking/atomic: scripts: factor out order template generation
  locking/atomic: scripts: remove leftover "${mult}"
  locking/atomic: scripts: remove bogus order parameter
  locking/atomic: xtensa: add preprocessor symbols
  locking/atomic: x86: add preprocessor symbols
  locking/atomic: sparc: add preprocessor symbols
  locking/atomic: sh: add preprocessor symbols
  ...
2023-06-27 14:14:30 -07:00
Linus Torvalds
7cffdbe360 Updates for the x86 boot process:
- Initialize FPU late.
 
    Right now FPU is initialized very early during boot. There is no real
    requirement to do so. The only requirement is to have it done before
    alternatives are patched.
 
    That's done in check_bugs() which does way more than what the function
    name suggests.
 
    So first rename check_bugs() to arch_cpu_finalize_init() which makes it
    clear what this is about.
 
    Move the invocation of arch_cpu_finalize_init() earlier in
    start_kernel() as it has to be done before fork_init() which needs to
    know the FPU register buffer size.
 
    With those prerequisites the FPU initialization can be moved into
    arch_cpu_finalize_init(), which removes it from the early and fragile
    part of the x86 bringup.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmSZdNYTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoaNBEACWtVd1uhqQldIFgSvZYujsrWXlmkU+
 pok6gDzKQNwZADiXW/tn5fP8SBLWT0pgLM9d+oZ5mEaLaOW7HcZLEHcVrn74e3TT
 53xN8e1zCzyjCJ/x22vrKH4sn/bU+bQyzSNVu9Disqn9Fl+ts37FqAHDv/ExbneD
 DaYXXCLgQsyGbPLD8B7yGOpJTGBUTJxNQS1ZFElBaRsAaw0mYZOEoPvuTFK4o7Uz
 GUB2vGefmeNfX+EgLYKG9QoS0F3SMS9X2IYswy1H76ZnV/eXmTsA1S3u3X9yX7kC
 XBnPtCC+iX+7o3xFkTpa0oQUdzEyGOItExZZgce6jEQu4Fl7NoIJxhlMg9/Y+vcF
 ntipEKSWFLAi1GkZzeKRwSSsoWqRaFxOKLy8qhn9kud09k+UtMBkNrF1CSp9laAz
 QParu3B1oHPEzx/jS0bSOCMN+AQZH8rX7LxRp4kpBOeBSZNCnfaBUzfIvmccPls+
 EJTO/0JUpRm5LsPSDiJhypPRoOOIP26IloR6OoZTcI3p76NrnYblRvisvuFAgDU6
 bk7Belf+GDx0kBZugqQgok7nDaHIBR7vEmca1NV8507UrffVyxLAiI4CiWPcFdOq
 ovhO8K+gP4xvzZx4cXZBwYwusjvl/oxKy8yQiGgoftDiWU4sdUCSrwX3x27+hUYL
 2P1OLDOXSGwESQ==
 =yxMj
 -----END PGP SIGNATURE-----

Merge tag 'x86-boot-2023-06-26' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 boot updates from Thomas Gleixner:
 "Initialize FPU late.

  Right now FPU is initialized very early during boot. There is no real
  requirement to do so. The only requirement is to have it done before
  alternatives are patched.

  That's done in check_bugs() which does way more than what the function
  name suggests.

  So first rename check_bugs() to arch_cpu_finalize_init() which makes
  it clear what this is about.

  Move the invocation of arch_cpu_finalize_init() earlier in
  start_kernel() as it has to be done before fork_init() which needs to
  know the FPU register buffer size.

  With those prerequisites the FPU initialization can be moved into
  arch_cpu_finalize_init(), which removes it from the early and fragile
  part of the x86 bringup"

* tag 'x86-boot-2023-06-26' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mem_encrypt: Unbreak the AMD_MEM_ENCRYPT=n build
  x86/fpu: Move FPU initialization into arch_cpu_finalize_init()
  x86/fpu: Mark init functions __init
  x86/fpu: Remove cpuinfo argument from init functions
  x86/init: Initialize signal frame size late
  init, x86: Move mem_encrypt_init() into arch_cpu_finalize_init()
  init: Invoke arch_cpu_finalize_init() earlier
  init: Remove check_bugs() leftovers
  um/cpu: Switch to arch_cpu_finalize_init()
  sparc/cpu: Switch to arch_cpu_finalize_init()
  sh/cpu: Switch to arch_cpu_finalize_init()
  mips/cpu: Switch to arch_cpu_finalize_init()
  m68k/cpu: Switch to arch_cpu_finalize_init()
  loongarch/cpu: Switch to arch_cpu_finalize_init()
  ia64/cpu: Switch to arch_cpu_finalize_init()
  ARM: cpu: Switch to arch_cpu_finalize_init()
  x86/cpu: Switch to arch_cpu_finalize_init()
  init: Provide arch_cpu_finalize_init()
2023-06-26 13:39:10 -07:00
Linus Torvalds
64bf6ae93e v6.5/vfs.misc
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZJU4SwAKCRCRxhvAZXjc
 ojOTAP9gT/z1gasIf8OwDHb4inZGnVpHh2ApKLvgMXH6ICtwRgD+OBtOcf438Lx1
 cpFSTVJlh21QXMOOXWHe/LRUV2kZ5wI=
 =zdfx
 -----END PGP SIGNATURE-----

Merge tag 'v6.5/vfs.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull misc vfs updates from Christian Brauner:
 "Miscellaneous features, cleanups, and fixes for vfs and individual fs

  Features:

   - Use mode 0600 for file created by cachefilesd so it can be run by
     unprivileged users. This aligns them with directories which are
     already created with mode 0700 by cachefilesd

   - Reorder a few members in struct file to prevent some false sharing
     scenarios

   - Indicate that an eventfd is used a semaphore in the eventfd's
     fdinfo procfs file

   - Add a missing uapi header for eventfd exposing relevant uapi
     defines

   - Let the VFS protect transitions of a superblock from read-only to
     read-write in addition to the protection it already provides for
     transitions from read-write to read-only. Protecting read-only to
     read-write transitions allows filesystems such as ext4 to perform
     internal writes, keeping writers away until the transition is
     completed

  Cleanups:

   - Arnd removed the architecture specific arch_report_meminfo()
     prototypes and added a generic one into procfs.h. Note, we got a
     report about a warning in amdpgpu codepaths that suggested this was
     bisectable to this change but we concluded it was a false positive

   - Remove unused parameters from split_fs_names()

   - Rename put_and_unmap_page() to unmap_and_put_page() to let the name
     reflect the order of the cleanup operation that has to unmap before
     the actual put

   - Unexport buffer_check_dirty_writeback() as it is not used outside
     of block device aops

   - Stop allocating aio rings from highmem

   - Protecting read-{only,write} transitions in the VFS used open-coded
     barriers in various places. Replace them with proper little helpers
     and document both the helpers and all barrier interactions involved
     when transitioning between read-{only,write} states

   - Use flexible array members in old readdir codepaths

  Fixes:

   - Use the correct type __poll_t for epoll and eventfd

   - Replace all deprecated strlcpy() invocations, whose return value
     isn't checked with an equivalent strscpy() call

   - Fix some kernel-doc warnings in fs/open.c

   - Reduce the stack usage in jffs2's xattr codepaths finally getting
     rid of this: fs/jffs2/xattr.c:887:1: error: the frame size of 1088
     bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
     royally annoying compilation warning

   - Use __FMODE_NONOTIFY instead of FMODE_NONOTIFY where an int and not
     fmode_t is required to avoid fmode_t to integer degradation
     warnings

   - Create coredumps with O_WRONLY instead of O_RDWR. There's a long
     explanation in that commit how O_RDWR is actually a bug which we
     found out with the help of Linus and git archeology

   - Fix "no previous prototype" warnings in the pipe codepaths

   - Add overflow calculations for remap_verify_area() as a signed
     addition overflow could be triggered in xfstests

   - Fix a null pointer dereference in sysv

   - Use an unsigned variable for length calculations in jfs avoiding
     compilation warnings with gcc 13

   - Fix a dangling pipe pointer in the watch queue codepath

   - The legacy mount option parser provided as a fallback by the VFS
     for filesystems not yet converted to the new mount api did prefix
     the generated mount option string with a leading ',' causing issues
     for some filesystems

   - Fix a repeated word in a comment in fs.h

   - autofs: Update the ctime when mtime is updated as mandated by
     POSIX"

* tag 'v6.5/vfs.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (27 commits)
  readdir: Replace one-element arrays with flexible-array members
  fs: Provide helpers for manipulating sb->s_readonly_remount
  fs: Protect reconfiguration of sb read-write from racing writes
  eventfd: add a uapi header for eventfd userspace APIs
  autofs: set ctime as well when mtime changes on a dir
  eventfd: show the EFD_SEMAPHORE flag in fdinfo
  fs/aio: Stop allocating aio rings from HIGHMEM
  fs: Fix comment typo
  fs: unexport buffer_check_dirty_writeback
  fs: avoid empty option when generating legacy mount string
  watch_queue: prevent dangling pipe pointer
  fs.h: Optimize file struct to prevent false sharing
  highmem: Rename put_and_unmap_page() to unmap_and_put_page()
  cachefiles: Allow the cache to be non-root
  init: remove unused names parameter in split_fs_names()
  jfs: Use unsigned variable for length calculations
  fs/sysv: Null check to prevent null-ptr-deref bug
  fs: use UB-safe check for signed addition overflow in remap_verify_area
  procfs: consolidate arch_report_meminfo declaration
  fs: pipe: reveal missing function protoypes
  ...
2023-06-26 09:50:21 -07:00
Douglas Anderson
df8b78e163 powerpc: move arch_trigger_cpumask_backtrace from nmi.h to irq.h
The powerpc architecture was the only one that defined
arch_trigger_cpumask_backtrace() in asm/nmi.h instead of
asm/irq.h. Move it to be consistent.

This fixes compile time errors introduced by commit 7ca8fe94aa
("watchdog/hardlockup: define HARDLOCKUP_DETECTOR_ARCH"). That commit
caused <asm/nmi.h> to stop being included if the hardlockup detector
wasn't enabled. The specific errors were:
  error: implicit declaration of function `nmi_cpu_backtrace'
  error: implicit declaration of function `nmi_trigger_cpumask_backtrace'

NOTE: when moving this into irq.h, we also change the guards from just
checking if "CONFIG_NMI_IPI" is defined to also checking if
"CONFIG_PPC_BOOK3S_64" is defined. This matches the code in
arch/powerpc/kernel/stacktrace.c. Previously this worked because
<asm.nmi.h> was included if "CONFIG_HAVE_HARDLOCKUP_DETECTOR_ARCH" was
defined. For powerpc that's only selected if "CONFIG_PPC_BOOK3S_64" is
defined.

[dianders@chromium.org: change the guards to include CONFIG_PPC_BOOK3S_64]
  Link: https://lkml.kernel.org/r/20230622202816.v2.1.Ice67126857506712559078e7de26d32d26e64631@changeid
Link: https://lkml.kernel.org/r/20230621164809.1.Ice67126857506712559078e7de26d32d26e64631@changeid
Fixes: 7ca8fe94aa ("watchdog/hardlockup: define HARDLOCKUP_DETECTOR_ARCH")
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Closes: https://lore.kernel.org/r/871qi5otdh.fsf@mail.lhotse
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Douglas Anderson <dianders@chromium.org>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Tom Rix <trix@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-23 17:04:04 -07:00
Rohan McLure
86dacd967b powerpc: Mark writes registering ipi to host cpu through kvm and polling
Mark writes to hypervisor ipi state so that KCSAN recognises these
asynchronous issue of kvmppc_{set,clear}_host_ipi to be intended, with
atomic writes. Mark asynchronous polls to this variable in
kvm_ppc_read_one_intr().

Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230510033117.1395895-9-rmclure@linux.ibm.com
2023-06-21 15:13:57 +10:00
Rohan McLure
b0c5b4f1ee powerpc: powernv: Fix KCSAN datarace warnings on idle_state contention
The idle_state entry in the PACA on PowerNV features a bit which is
atomically tested and set through ldarx/stdcx. to be used as a spinlock.
This lock then guards access to other bit fields of idle_state. KCSAN
cannot differentiate between any of these bitfield accesses as they all
are implemented by 8-byte store/load instructions, thus cores contending
on the bit-lock appear to data race with modifications to idle_state.

Separate the bit-lock entry from the data guarded by the lock to avoid
the possibility of data races being detected by KCSAN.

Suggested-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230510033117.1395895-7-rmclure@linux.ibm.com
2023-06-21 15:13:57 +10:00
Rohan McLure
be286b8637 powerpc: Mark [h]ssr_valid accesses in check_return_regs_valid
Checks to see if the [H]SRR registers have been clobbered by (soft)
NMI interrupts imply the possibility for a data race on the
[h]srr_valid entries in the PACA. Annotate accesses to these fields with
READ_ONCE, removing the need for the barrier.

The diagnostic can use plain-access reads and writes, but annotate with
data_race.

Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230510033117.1395895-5-rmclure@linux.ibm.com
2023-06-21 15:13:57 +10:00
Joel Stanley
326b3f8c6e powerpc/powernv/pci: Remove MVE code
With IODA1 support gone the OPAL calls to set MVE are dead code. Remove
them.

Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230613045202.294451-3-joel@jms.id.au
2023-06-21 15:13:57 +10:00
Petr Mladek
0c68bda696 watchdog/hardlockup: declare arch_touch_nmi_watchdog() only in linux/nmi.h
arch_touch_nmi_watchdog() needs a different implementation for various
hardlockup detector implementations. And it does nothing when
any hardlockup detector is not built at all.

arch_touch_nmi_watchdog() is declared via linux/nmi.h. And it must be
defined as an empty function when there is no hardlockup detector.
It is done directly in this header file for the perf and buddy detectors.
And it is done in the included asm/linux.h for arch specific detectors.

The reason probably is that the arch specific variants build the code
using another conditions. For example, powerpc64/sparc64 builds the code
when CONFIG_PPC_WATCHDOG is enabled.

Another reason might be that these architectures define more functions
in asm/nmi.h anyway.

However the generic code actually knows when the function will be
implemented. It happens when some full featured or the sparc64-specific
hardlockup detector is built.

In particular, CONFIG_HARDLOCKUP_DETECTOR can be enabled only when
a generic or arch-specific full featured hardlockup detector is available.
The only exception is sparc64 which can be built even when the global
HARDLOCKUP_DETECTOR switch is disabled.

The information about sparc64 is a bit complicated. The hardlockup
detector is built there when CONFIG_HAVE_NMI_WATCHDOG is set and
CONFIG_HAVE_HARDLOCKUP_DETECTOR_ARCH is not set.

People might wonder whether this change really makes things easier.
The motivation is:

  + The current logic in linux/nmi.h is far from obvious.
    For example, arch_touch_nmi_watchdog() is defined as {} when
    neither CONFIG_HARDLOCKUP_DETECTOR_COUNTS_HRTIMER nor
    CONFIG_HAVE_NMI_WATCHDOG is defined.

  + The change synchronizes the checks in lib/Kconfig.debug and
    in the generic code.

  + It is a step that will help cleaning HAVE_NMI_WATCHDOG related
    checks.

The change should not change the existing behavior.

Link: https://lkml.kernel.org/r/20230616150618.6073-4-pmladek@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19 16:25:29 -07:00
Catalin Marinas
78615c4ddb powerpc: move the ARCH_DMA_MINALIGN definition to asm/cache.h
Patch series "Move the ARCH_DMA_MINALIGN definition to asm/cache.h".

The ARCH_KMALLOC_MINALIGN reduction series defines a generic
ARCH_DMA_MINALIGN in linux/cache.h:

https://lore.kernel.org/r/20230612153201.554742-2-catalin.marinas@arm.com/

Unfortunately, this causes a duplicate definition warning for
microblaze, powerpc (32-bit only) and sh as these architectures define
ARCH_DMA_MINALIGN in a different file than asm/cache.h. Move the macro
to asm/cache.h to avoid this issue and also bring them in line with the
other architectures.


This patch (of 3):

The powerpc architecture defines ARCH_DMA_MINALIGN in asm/page_32.h and
only if CONFIG_NOT_COHERENT_CACHE is enabled (32-bit platforms only). 
Move this macro to asm/cache.h to allow a generic ARCH_DMA_MINALIGN
definition in linux/cache.h without redefine errors/warnings.

Link: https://lkml.kernel.org/r/20230613155245.1228274-1-catalin.marinas@arm.com
Link: https://lkml.kernel.org/r/20230613155245.1228274-2-catalin.marinas@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202306131053.1ybvRRhO-lkp@intel.com/
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Rich Felker <dalias@libc.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19 16:19:24 -07:00
Benjamin Gray
be98fcf7c1 powerpc/dexcr: Support userspace ROP protection
The ISA 3.1B hashst and hashchk instructions use a per-cpu SPR HASHKEYR
to hold a key used in the hash calculation. This key should be different
for each process to make it harder for a malicious process to recreate
valid hash values for a victim process.

Add support for storing a per-thread hash key, and setting/clearing
HASHKEYR appropriately.

Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Reviewed-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230616034846.311705-6-bgray@linux.ibm.com
2023-06-19 17:36:26 +10:00
Benjamin Gray
5bcba4e6c1 powerpc/dexcr: Handle hashchk exception
Recognise and pass the appropriate signal to the user program when a
hashchk instruction triggers. This is independent of allowing
configuration of DEXCR[NPHIE], as a hypervisor can enforce this aspect
regardless of the kernel.

The signal mirrors how ARM reports their similar check failure. For
example, their FPAC handler in arch/arm64/kernel/traps.c do_el0_fpac()
does this. When we fail to read the instruction that caused the fault
we send a segfault, similar to how emulate_math() does it.

Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230616034846.311705-5-bgray@linux.ibm.com
2023-06-19 17:36:26 +10:00
Benjamin Gray
0ffd60b782 powerpc/dexcr: Add initial Dynamic Execution Control Register (DEXCR) support
ISA 3.1B introduces the Dynamic Execution Control Register (DEXCR). It
is a per-cpu register that allows control over various CPU behaviours
including branch hint usage, indirect branch speculation, and
hashst/hashchk support.

Add some definitions and basic support for the DEXCR in the kernel.
Right now it just

  * Initialises the DEXCR and HASHKEYR to a fixed value when a CPU
    onlines.
  * Clears them in reset_sprs().
  * Detects when the NPHIE aspect is supported (the others don't get
    looked at in this series, so there's no need to waste a CPU_FTR
    on them).

We initialise the HASHKEYR to ensure that all cores have the same key,
so an HV enforced NPHIE + swapping cores doesn't randomly crash a
process using hash instructions. The stores to HASHKEYR are
unconditional because the ISA makes no mention of the SPR being missing
if support for doing the hashes isn't present. So all that would happen
is the HASHKEYR value gets ignored. This helps slightly if NPHIE
detection fails; e.g., we currently only detect it on pseries.

Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
[mpe: Use simple values for DEXCR constants]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230616034846.311705-4-bgray@linux.ibm.com
2023-06-19 17:36:25 +10:00
Benjamin Gray
7eec97b32e powerpc/book3s: Add missing <linux/sched.h> include
The functions here use struct task_struct fields, so need to import
the full definition from <linux/sched.h>. The <asm/current.h> header
that defines current only forward declares struct task_struct.

Failing to include this <linux/sched.h> header leads to a compilation
error when a translation unit does not also include <linux/sched.h>
indirectly.

Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230616034846.311705-2-bgray@linux.ibm.com
2023-06-19 17:36:25 +10:00
Dave Airlie
cce3b573a5 Linux 6.4-rc7
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmSPcdMeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGWrQH/3KmuvZsWMC4PpJY
 VcF9VfF9i+Zv7DoG8sjD5VpNh47e87RsR6WNOFnKol5SUrM6vsBAb5i2rfQahNIv
 NSj0fPCE4/Nj9LMecKVC9WD8CitxYdbR+CF9Is21AQj1VihUl9eHXGcAWxuaMyhk
 TjPUwmbOOsRVMXXdGJzjX78cvLsxqpSv8A/5OTh16IBimbh7p+YjKJFkbfj/PMWf
 aF1quFkIEXgzJcHCpP6KDZHr2KbpY+jIN9hUENnGKJxHYNso5u+KrIW1kAm8meP1
 x26ETSquM0T70OAzovOWg+BeVkLDac/3Rh30ztLAI4AtajrlSzycvFsU9UNEJCc2
 BnM2IZI=
 =ANT5
 -----END PGP SIGNATURE-----

Backmerge tag 'v6.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux into drm-next

Linux 6.4-rc7

Need this to pull in the msm work.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2023-06-19 16:01:25 +10:00
Thomas Gleixner
61235b24b9 init: Remove check_bugs() leftovers
Everything is converted over to arch_cpu_finalize_init(). Remove the
check_bugs() leftovers including the empty stubs in asm-generic, alpha,
parisc, powerpc and xtensa.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Link: https://lore.kernel.org/r/20230613224545.553215951@linutronix.de
2023-06-16 10:16:00 +02:00
Nicholas Piggin
afc6386815 powerpc: merge 32-bit and 64-bit _switch implementation
The _switch stack frame setup are substantially the same, so are the
comments. The difference in how the stack and current are switched,
and other hardware and software housekeeping is done is moved into
macros.

Generated code should be unchanged.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Tweak include orer to fix compile errors on some configs]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230606132447.315714-6-npiggin@gmail.com
2023-06-15 14:03:55 +10:00
Nicholas Piggin
606787fed7 powerpc/64s: Remove support for ELFv1 little endian userspace
ELFv2 was introduced together with little-endian. ELFv1 with LE has
never been a thing. The GNU toolchain can create such a beast, but
anyone doing that is a maniac who needs to be stopped so I consider
this patch a feature.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230606093832.199712-5-npiggin@gmail.com
2023-06-14 12:46:42 +10:00
Douglas Anderson
df95d3085c watchdog/hardlockup: rename some "NMI watchdog" constants/function
Do a search and replace of:
- NMI_WATCHDOG_ENABLED => WATCHDOG_HARDLOCKUP_ENABLED
- SOFT_WATCHDOG_ENABLED => WATCHDOG_SOFTOCKUP_ENABLED
- watchdog_nmi_ => watchdog_hardlockup_
- nmi_watchdog_available => watchdog_hardlockup_available
- nmi_watchdog_user_enabled => watchdog_hardlockup_user_enabled
- soft_watchdog_user_enabled => watchdog_softlockup_user_enabled
- NMI_WATCHDOG_DEFAULT => WATCHDOG_HARDLOCKUP_DEFAULT

Then update a few comments near where names were changed.

This is specifically to make it less confusing when we want to introduce
the buddy hardlockup detector, which isn't using NMIs.  As part of this,
we sanitized a few names for consistency.

[trix@redhat.com: make variables static]
  Link: https://lkml.kernel.org/r/20230525162822.1.I0fb41d138d158c9230573eaa37dc56afa2fb14ee@changeid
Link: https://lkml.kernel.org/r/20230519101840.v5.12.I91f7277bab4bf8c0cb238732ed92e7ce7bbd71a6@changeid
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Tom Rix <trix@redhat.com>
Reviewed-by: Tom Rix <trix@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen-Yu Tsai <wens@csie.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Colin Cross <ccross@android.com>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Lecopzer Chen <lecopzer.chen@mediatek.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masayoshi Mizuma <msys.mizuma@gmail.com>
Cc: Matthias Kaehlcke <mka@chromium.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Pingfan Liu <kernelfans@gmail.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: Ricardo Neri <ricardo.neri@intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Stephen Boyd <swboyd@chromium.org>
Cc: Sumit Garg <sumit.garg@linaro.org>
Cc: Tzung-Bi Shih <tzungbi@chromium.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-09 17:44:20 -07:00
Arnd Bergmann
ad1a48301f init: consolidate prototypes in linux/init.h
The init/main.c file contains some extern declarations for functions
defined in architecture code, and it defines some other functions that are
called from architecture code with a custom prototype.  Both of those
result in warnings with 'make W=1':

init/calibrate.c:261:37: error: no previous prototype for 'calibrate_delay_is_known' [-Werror=missing-prototypes]
init/main.c:790:20: error: no previous prototype for 'mem_encrypt_init' [-Werror=missing-prototypes]
init/main.c:792:20: error: no previous prototype for 'poking_init' [-Werror=missing-prototypes]
arch/arm64/kernel/irq.c:122:13: error: no previous prototype for 'init_IRQ' [-Werror=missing-prototypes]
arch/arm64/kernel/time.c:55:13: error: no previous prototype for 'time_init' [-Werror=missing-prototypes]
arch/x86/kernel/process.c:935:13: error: no previous prototype for 'arch_post_acpi_subsys_init' [-Werror=missing-prototypes]
init/calibrate.c:261:37: error: no previous prototype for 'calibrate_delay_is_known' [-Werror=missing-prototypes]
kernel/fork.c:991:20: error: no previous prototype for 'arch_task_cache_init' [-Werror=missing-prototypes]

Add prototypes for all of these in include/linux/init.h or another
appropriate header, and remove the duplicate declarations from
architecture specific code.

[sfr@canb.auug.org.au: declare time_init_early()]
  Link: https://lkml.kernel.org/r/20230519124311.5167221c@canb.auug.org.au
Link: https://lkml.kernel.org/r/20230517131102.934196-12-arnd@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Eric Paris <eparis@redhat.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rafael@kernel.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-09 17:44:16 -07:00
Christophe Leroy
396f2b0106 powerpc/kcsan: Properly instrument arch_spin_unlock()
The following boottime error is encountered with SMP kernel:

  kcsan: improperly instrumented type=(0): arch_spin_unlock(&arch_spinlock)
  kcsan: improperly instrumented type=(0): spin_unlock(&test_spinlock)
  kcsan: improperly instrumented type=(KCSAN_ACCESS_WRITE): arch_spin_unlock(&arch_spinlock)
  kcsan: improperly instrumented type=(KCSAN_ACCESS_WRITE): spin_unlock(&test_spinlock)
  kcsan: improperly instrumented type=(KCSAN_ACCESS_WRITE | KCSAN_ACCESS_COMPOUND): arch_spin_unlock(&arch_spinlock)
  kcsan: improperly instrumented type=(KCSAN_ACCESS_WRITE | KCSAN_ACCESS_COMPOUND): spin_unlock(&test_spinlock)
  kcsan: selftest: test_barrier failed
  kcsan: selftest: 2/3 tests passed
  Kernel panic - not syncing: selftests failed

Properly instrument arch_spin_unlock() with kcsan_mb().

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Marco Elver <elver@google.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/57834a703dfa5d6c27c9de0a01329059636e5ab7.1685080579.git.christophe.leroy@csgroup.eu
2023-06-09 23:29:51 +10:00
Michael Ellerman
4b668b3202 Merge branch 'fixes' into next
Merge our fixes branch, in particular to bring in the change to
arch/powerpc/boot/Makefile, which is depended upon by a subsequent
series.
2023-06-09 23:28:26 +10:00
Mark Rutland
d12157efc8 locking/atomic: make atomic*_{cmp,}xchg optional
Most architectures define the atomic/atomic64 xchg and cmpxchg
operations in terms of arch_xchg and arch_cmpxchg respectfully.

Add fallbacks for these cases and remove the trivial cases from arch
code. On some architectures the existing definitions are kept as these
are used to build other arch_atomic*() operations.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20230605070124.3741859-5-mark.rutland@arm.com
2023-06-05 09:57:14 +02:00
Arnd Bergmann
ef104443bf procfs: consolidate arch_report_meminfo declaration
The arch_report_meminfo() function is provided by four architectures,
with a __weak fallback in procfs itself. On architectures that don't
have a custom version, the __weak version causes a warning because
of the missing prototype.

Remove the architecture specific prototypes and instead add one
in linux/proc_fs.h.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com> # for arch/x86
Acked-by: Helge Deller <deller@gmx.de> # parisc
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Message-Id: <20230516195834.551901-1-arnd@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-05-17 09:24:49 +02:00
Jason Gunthorpe
ad593827db powerpc/iommu: Remove iommu_del_device()
Now that power calls iommu_device_register() and populates its groups
using iommu_ops->device_group it should not be calling
iommu_group_remove_device().

The core code owns the groups and all the other related iommu data, it
will clean it up automatically.

Remove the bus notifiers and explicit calls to
iommu_group_remove_device().

Fixes: a940904443 ("powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains")
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/0-v1-1421774b874b+167-ppc_device_group_jgg@nvidia.com
2023-05-17 00:51:06 +10:00
Uwe Kleine-König
7d3ee229ea powerpc: Drop MPC5200 LocalPlus bus FIFO driver
While mpc5200b.dtsi contains a device that this driver can bind to, the
only purpose of a bound device is to be used by the four exported functions
mpc52xx_lpbfifo_submit(), mpc52xx_lpbfifo_abort(), mpc52xx_lpbfifo_poll()
and mpc52xx_lpbfifo_start_xfer(). However there are no callers to this
function and so the driver is effectively superfluous and can be deleted.
Also drop some defines and a struct from <asm/mpc52xx.h> that are unused
now together with the declarations of the four mentioned functions.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230413061642.kqkor4wkt7lp2mhp@pengutronix.de
2023-05-15 23:25:56 +10:00
Maxime Ripard
ff32fcca64
Merge drm/drm-next into drm-misc-next
Start the 6.5 release cycle.

Signed-off-by: Maxime Ripard <maxime@cerno.tech>
2023-05-09 15:03:40 +02:00
Linus Torvalds
b115d85a95 Locking changes in v6.4:
- Introduce local{,64}_try_cmpxchg() - a slightly more optimal
    primitive, which will be used in perf events ring-buffer code.
 
  - Simplify/modify rwsems on PREEMPT_RT, to address writer starvation.
 
  - Misc cleanups/fixes.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmRUvUoRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1hlIhAArP33rTKi+HAndQ3UHW3XtmHRxEEQTfiE
 wvIoN89h58QW4DGMeAV4ltafbIPQAkI233Aogwz903L0qbDV0Ro4OU3XJembRuWl
 LeOADKwYyypXdOa8XICuY9aIP7e1/h0DF3ySs7inLcwK9JCyAIxnsVHYej+hsRXA
 kZoXN98T3TR1C0V9UQy4SU3HI1lC3tsG3R9Ti9TnYUg3ygVXhRE9lOQ4kv9lFPVz
 BNuj2Blj7KNiVaY9kehrhO54THI7NmsCVZO44Rcl48I0KAcFulAmFcNlE7GnR8Nj
 thj38pU6XAFVHXG8MYjgE+Al+PnK48NtJxexCtHyGvGG4D2aLzRMnkolxAUCcVuK
 G+UBsQm3ybjYgHgt1zuN6ehcpT+5tULkDH8JA7vrgZYaVgxHzsUaHgYfCCWKnmUY
 mPR6aImEmYZwZVNLskhe0HT4mq244bp+VnWlnJ6LZK7t/itenvDhqnj7KTi4Bfej
 lTHplOTitV/8uCEW8V4pX+YTEenVsIQmTc/G3iIabXP/6HzLffA3q4vyW6vKIErE
 pqrpuFA0Z4GB+pU0mJXt7+I7zscDVthwI055jDyQBjA7IcdVGm2MjQ6xcNRW5FYN
 UynvaEMocue4ZO4WdFsd1ZBUd9VfoNzGQspBw46DhCL1MEQBYv36SKQNjej/9aRr
 ilVwqnOWI2s=
 =mM0A
 -----END PGP SIGNATURE-----

Merge tag 'locking-core-2023-05-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking updates from Ingo Molnar:

 - Introduce local{,64}_try_cmpxchg() - a slightly more optimal
   primitive, which will be used in perf events ring-buffer code

 - Simplify/modify rwsems on PREEMPT_RT, to address writer starvation

 - Misc cleanups/fixes

* tag 'locking-core-2023-05-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/atomic: Correct (cmp)xchg() instrumentation
  locking/x86: Define arch_try_cmpxchg_local()
  locking/arch: Wire up local_try_cmpxchg()
  locking/generic: Wire up local{,64}_try_cmpxchg()
  locking/atomic: Add generic try_cmpxchg{,64}_local() support
  locking/rwbase: Mitigate indefinite writer starvation
  locking/arch: Rename all internal __xchg() names to __arch_xchg()
2023-05-05 12:56:55 -07:00
Linus Torvalds
c8c655c34e s390:
* More phys_to_virt conversions
 
 * Improvement of AP management for VSIE (nested virtualization)
 
 ARM64:
 
 * Numerous fixes for the pathological lock inversion issue that
   plagued KVM/arm64 since... forever.
 
 * New framework allowing SMCCC-compliant hypercalls to be forwarded
   to userspace, hopefully paving the way for some more features
   being moved to VMMs rather than be implemented in the kernel.
 
 * Large rework of the timer code to allow a VM-wide offset to be
   applied to both virtual and physical counters as well as a
   per-timer, per-vcpu offset that complements the global one.
   This last part allows the NV timer code to be implemented on
   top.
 
 * A small set of fixes to make sure that we don't change anything
   affecting the EL1&0 translation regime just after having having
   taken an exception to EL2 until we have executed a DSB. This
   ensures that speculative walks started in EL1&0 have completed.
 
 * The usual selftest fixes and improvements.
 
 KVM x86 changes for 6.4:
 
 * Optimize CR0.WP toggling by avoiding an MMU reload when TDP is enabled,
   and by giving the guest control of CR0.WP when EPT is enabled on VMX
   (VMX-only because SVM doesn't support per-bit controls)
 
 * Add CR0/CR4 helpers to query single bits, and clean up related code
   where KVM was interpreting kvm_read_cr4_bits()'s "unsigned long" return
   as a bool
 
 * Move AMD_PSFD to cpufeatures.h and purge KVM's definition
 
 * Avoid unnecessary writes+flushes when the guest is only adding new PTEs
 
 * Overhaul .sync_page() and .invlpg() to utilize .sync_page()'s optimizations
   when emulating invalidations
 
 * Clean up the range-based flushing APIs
 
 * Revamp the TDP MMU's reaping of Accessed/Dirty bits to clear a single
   A/D bit using a LOCK AND instead of XCHG, and skip all of the "handle
   changed SPTE" overhead associated with writing the entire entry
 
 * Track the number of "tail" entries in a pte_list_desc to avoid having
   to walk (potentially) all descriptors during insertion and deletion,
   which gets quite expensive if the guest is spamming fork()
 
 * Disallow virtualizing legacy LBRs if architectural LBRs are available,
   the two are mutually exclusive in hardware
 
 * Disallow writes to immutable feature MSRs (notably PERF_CAPABILITIES)
   after KVM_RUN, similar to CPUID features
 
 * Overhaul the vmx_pmu_caps selftest to better validate PERF_CAPABILITIES
 
 * Apply PMU filters to emulated events and add test coverage to the
   pmu_event_filter selftest
 
 x86 AMD:
 
 * Add support for virtual NMIs
 
 * Fixes for edge cases related to virtual interrupts
 
 x86 Intel:
 
 * Don't advertise XTILE_CFG in KVM_GET_SUPPORTED_CPUID if XTILE_DATA is
   not being reported due to userspace not opting in via prctl()
 
 * Fix a bug in emulation of ENCLS in compatibility mode
 
 * Allow emulation of NOP and PAUSE for L2
 
 * AMX selftests improvements
 
 * Misc cleanups
 
 MIPS:
 
 * Constify MIPS's internal callbacks (a leftover from the hardware enabling
   rework that landed in 6.3)
 
 Generic:
 
 * Drop unnecessary casts from "void *" throughout kvm_main.c
 
 * Tweak the layout of "struct kvm_mmu_memory_cache" to shrink the struct
   size by 8 bytes on 64-bit kernels by utilizing a padding hole
 
 Documentation:
 
 * Fix goof introduced by the conversion to rST
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmRNExkUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroNyjwf+MkzDael9y9AsOZoqhEZ5OsfQYJ32
 Im5ZVYsPRU2K5TuoWql6meIihgclCj1iIU32qYHa2F1WYt2rZ72rJp+HoY8b+TaI
 WvF0pvNtqQyg3iEKUBKPA4xQ6mj7RpQBw86qqiCHmlfNt0zxluEGEPxH8xrWcfhC
 huDQ+NUOdU7fmJ3rqGitCvkUbCuZNkw3aNPR8dhU8RAWrwRzP2hBOmdxIeo81WWY
 XMEpJSijbGpXL9CvM0Jz9nOuMJwZwCCBGxg1vSQq0xTfLySNMxzvWZC2GFaBjucb
 j0UOQ7yE0drIZDVhd3sdNslubXXU6FcSEzacGQb9aigMUon3Tem9SHi7Kw==
 =S2Hq
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "s390:

   - More phys_to_virt conversions

   - Improvement of AP management for VSIE (nested virtualization)

  ARM64:

   - Numerous fixes for the pathological lock inversion issue that
     plagued KVM/arm64 since... forever.

   - New framework allowing SMCCC-compliant hypercalls to be forwarded
     to userspace, hopefully paving the way for some more features being
     moved to VMMs rather than be implemented in the kernel.

   - Large rework of the timer code to allow a VM-wide offset to be
     applied to both virtual and physical counters as well as a
     per-timer, per-vcpu offset that complements the global one. This
     last part allows the NV timer code to be implemented on top.

   - A small set of fixes to make sure that we don't change anything
     affecting the EL1&0 translation regime just after having having
     taken an exception to EL2 until we have executed a DSB. This
     ensures that speculative walks started in EL1&0 have completed.

   - The usual selftest fixes and improvements.

  x86:

   - Optimize CR0.WP toggling by avoiding an MMU reload when TDP is
     enabled, and by giving the guest control of CR0.WP when EPT is
     enabled on VMX (VMX-only because SVM doesn't support per-bit
     controls)

   - Add CR0/CR4 helpers to query single bits, and clean up related code
     where KVM was interpreting kvm_read_cr4_bits()'s "unsigned long"
     return as a bool

   - Move AMD_PSFD to cpufeatures.h and purge KVM's definition

   - Avoid unnecessary writes+flushes when the guest is only adding new
     PTEs

   - Overhaul .sync_page() and .invlpg() to utilize .sync_page()'s
     optimizations when emulating invalidations

   - Clean up the range-based flushing APIs

   - Revamp the TDP MMU's reaping of Accessed/Dirty bits to clear a
     single A/D bit using a LOCK AND instead of XCHG, and skip all of
     the "handle changed SPTE" overhead associated with writing the
     entire entry

   - Track the number of "tail" entries in a pte_list_desc to avoid
     having to walk (potentially) all descriptors during insertion and
     deletion, which gets quite expensive if the guest is spamming
     fork()

   - Disallow virtualizing legacy LBRs if architectural LBRs are
     available, the two are mutually exclusive in hardware

   - Disallow writes to immutable feature MSRs (notably
     PERF_CAPABILITIES) after KVM_RUN, similar to CPUID features

   - Overhaul the vmx_pmu_caps selftest to better validate
     PERF_CAPABILITIES

   - Apply PMU filters to emulated events and add test coverage to the
     pmu_event_filter selftest

   - AMD SVM:
       - Add support for virtual NMIs
       - Fixes for edge cases related to virtual interrupts

   - Intel AMX:
       - Don't advertise XTILE_CFG in KVM_GET_SUPPORTED_CPUID if
         XTILE_DATA is not being reported due to userspace not opting in
         via prctl()
       - Fix a bug in emulation of ENCLS in compatibility mode
       - Allow emulation of NOP and PAUSE for L2
       - AMX selftests improvements
       - Misc cleanups

  MIPS:

   - Constify MIPS's internal callbacks (a leftover from the hardware
     enabling rework that landed in 6.3)

  Generic:

   - Drop unnecessary casts from "void *" throughout kvm_main.c

   - Tweak the layout of "struct kvm_mmu_memory_cache" to shrink the
     struct size by 8 bytes on 64-bit kernels by utilizing a padding
     hole

  Documentation:

   - Fix goof introduced by the conversion to rST"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (211 commits)
  KVM: s390: pci: fix virtual-physical confusion on module unload/load
  KVM: s390: vsie: clarifications on setting the APCB
  KVM: s390: interrupt: fix virtual-physical confusion for next alert GISA
  KVM: arm64: Have kvm_psci_vcpu_on() use WRITE_ONCE() to update mp_state
  KVM: arm64: Acquire mp_state_lock in kvm_arch_vcpu_ioctl_vcpu_init()
  KVM: selftests: Test the PMU event "Instructions retired"
  KVM: selftests: Copy full counter values from guest in PMU event filter test
  KVM: selftests: Use error codes to signal errors in PMU event filter test
  KVM: selftests: Print detailed info in PMU event filter asserts
  KVM: selftests: Add helpers for PMC asserts in PMU event filter test
  KVM: selftests: Add a common helper for the PMU event filter guest code
  KVM: selftests: Fix spelling mistake "perrmited" -> "permitted"
  KVM: arm64: vhe: Drop extra isb() on guest exit
  KVM: arm64: vhe: Synchronise with page table walker on MMU update
  KVM: arm64: pkvm: Document the side effects of kvm_flush_dcache_to_poc()
  KVM: arm64: nvhe: Synchronise with page table walker on TLBI
  KVM: arm64: Handle 32bit CNTPCTSS traps
  KVM: arm64: nvhe: Synchronise with page table walker on vcpu run
  KVM: arm64: vgic: Don't acquire its_lock before config_lock
  KVM: selftests: Add test to verify KVM's supported XCR0
  ...
2023-05-01 12:06:20 -07:00
Uros Bizjak
d994f2c8e2 locking/arch: Wire up local_try_cmpxchg()
Implement target specific support for local_try_cmpxchg()
and local_cmpxchg() using typed C wrappers that call their
_local counterpart and provide additional checking of
their input arguments.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230405141710.3551-4-ubizjak@gmail.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
2023-04-29 09:09:16 +02:00
Andrzej Hajda
068550631f locking/arch: Rename all internal __xchg() names to __arch_xchg()
Decrease the probability of this internal facility to be used by
driver code.

Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k]
Acked-by: Palmer Dabbelt <palmer@rivosinc.com> [riscv]
Link: https://lore.kernel.org/r/20230118154450.73842-1-andrzej.hajda@intel.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
2023-04-29 09:08:44 +02:00
Linus Torvalds
70cc1b5307 powerpc updates for 6.4
- Add support for building the kernel using PC-relative addressing on Power10.
 
  - Allow HV KVM guests on Power10 to use prefixed instructions.
 
  - Unify support for the P2020 CPU (85xx) into a single machine description.
 
  - Always build the 64-bit kernel with 128-bit long double.
 
  - Drop support for several obsolete 2000's era development boards as
    identified by Paul Gortmaker.
 
  - A series fixing VFIO on Power since some generic changes.
 
  - Various other small features and fixes.
 
 Thanks to: Alexey Kardashevskiy, Andrew Donnellan, Benjamin Gray, Bo Liu,
 Christophe Leroy, Dan Carpenter, David Binderman, Ira Weiny, Joel Stanley,
 Kajol Jain, Kautuk Consul, Liang He, Luis Chamberlain, Masahiro Yamada, Michael
 Neuling, Nathan Chancellor, Nathan Lynch, Nicholas Miehlbradt, Nicholas Piggin,
 Nick Desaulniers, Nysal Jan K.A, Pali Rohár, Paul Gortmaker, Paul Mackerras,
 Petr Vaněk, Randy Dunlap, Rob Herring, Sachin Sant, Sean Christopherson, Segher
 Boessenkool, Timothy Pearson.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmRLdD8THG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgPrED/93Hwi1/8udXYV6OAFBnLLLDX9RZWNx
 sj6W7PC/yY7MGUTIGcayrAURt5xFfQZMBdq9nhhH46Wyd8pSUe1IlpXIEgqeH64Y
 5uaSe6u4OZ5dDmiYz8bM+Y4Ixkfq1xMO0Rj27FIRJyU4Pp6gyMQ6/8W+iPU2vYIb
 jl4frVO8PpmCbW8euOyT/b9YB2h79+5nLgZT4RvmYJblKQwgzRZWaiCAU0wYf+xT
 xYsMQcqJlPstizTnXIr+GC08VrMPQR51kJCurnNUMXoAQY6toEXebveWRNZ3sH39
 K0BRQ036NNGPS4GlHVbjgLIdoWr4pUEROZ48Jy9WeiDh6OwO2vb2zrm7ZLtKlGXI
 LCQ08T9diPbAAJyVJaKjpsNXhTffuLPRhIOr1o2vqGvY+Fqbw96bPQAyFbxpJtrw
 rGTTc5e93fEdZV+eaGR8kfdNM75WM6lgiteaIbUnpirYTh/0j6YEO1Ijc4d9e5ux
 aoHN2eXQDjMm9foZf1D6vaCTRyN8LwLa9hcydKCn6+qULUQzoZ2r4cRt6eSD7+fx
 Wni1Zg7vD6xx294nVmhg5dWuD4qVIAZj3BZPFHHR2SPqy+UbEJLk/NKgznmrhhgQ
 HBcZAEFqIBZkA4e0w6LJKh/1j1rxpZUokgjMotOJq4VRivq82W1P7SioFDY3/6w5
 UvGtIgGq4Qsa2Q==
 =89WH
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-6.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:

 - Add support for building the kernel using PC-relative addressing on
   Power10.

 - Allow HV KVM guests on Power10 to use prefixed instructions.

 - Unify support for the P2020 CPU (85xx) into a single machine
   description.

 - Always build the 64-bit kernel with 128-bit long double.

 - Drop support for several obsolete 2000's era development boards as
   identified by Paul Gortmaker.

 - A series fixing VFIO on Power since some generic changes.

 - Various other small features and fixes.

Thanks to Alexey Kardashevskiy, Andrew Donnellan, Benjamin Gray, Bo Liu,
Christophe Leroy, Dan Carpenter, David Binderman, Ira Weiny, Joel
Stanley, Kajol Jain, Kautuk Consul, Liang He, Luis Chamberlain, Masahiro
Yamada, Michael Neuling, Nathan Chancellor, Nathan Lynch, Nicholas
Miehlbradt, Nicholas Piggin, Nick Desaulniers, Nysal Jan K.A, Pali
Rohár, Paul Gortmaker, Paul Mackerras, Petr Vaněk, Randy Dunlap, Rob
Herring, Sachin Sant, Sean Christopherson, Segher Boessenkool, and
Timothy Pearson.

* tag 'powerpc-6.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (156 commits)
  powerpc/64s: Disable pcrel code model on Clang
  powerpc: Fix merge conflict between pcrel and copy_thread changes
  powerpc/configs/powernv: Add IGB=y
  powerpc/configs/64s: Drop JFS Filesystem
  powerpc/configs/64s: Use EXT4 to mount EXT2 filesystems
  powerpc/configs: Make pseries_defconfig an alias for ppc64le_guest
  powerpc/configs: Make pseries_le an alias for ppc64le_guest
  powerpc/configs: Incorporate generic kvm_guest.config into guest configs
  powerpc/configs: Add IBMVETH=y and IBMVNIC=y to guest configs
  powerpc/configs/64s: Enable Device Mapper options
  powerpc/configs/64s: Enable PSTORE
  powerpc/configs/64s: Enable VLAN support
  powerpc/configs/64s: Enable BLK_DEV_NVME
  powerpc/configs/64s: Drop REISERFS
  powerpc/configs/64s: Use SHA512 for module signatures
  powerpc/configs/64s: Enable IO_STRICT_DEVMEM
  powerpc/configs/64s: Enable SCHEDSTATS
  powerpc/configs/64s: Enable DEBUG_VM & other options
  powerpc/configs/64s: Enable EMULATED_STATS
  powerpc/configs/64s: Enable KUNIT and most tests
  ...
2023-04-28 16:24:32 -07:00
Linus Torvalds
2aff7c706c Objtool changes for v6.4:
- Mark arch_cpu_idle_dead() __noreturn, make all architectures & drivers that did
    this inconsistently follow this new, common convention, and fix all the fallout
    that objtool can now detect statically.
 
  - Fix/improve the ORC unwinder becoming unreliable due to UNWIND_HINT_EMPTY ambiguity,
    split it into UNWIND_HINT_END_OF_STACK and UNWIND_HINT_UNDEFINED to resolve it.
 
  - Fix noinstr violations in the KCSAN code and the lkdtm/stackleak code.
 
  - Generate ORC data for __pfx code
 
  - Add more __noreturn annotations to various kernel startup/shutdown/panic functions.
 
  - Misc improvements & fixes.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmRK1x0RHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1ghxQ/+IkCynMYtdF5OG9YwbcGJqsPSfOPMEcEM
 pUSFYg+gGPBDT/fJfcVSqvUtdnWbLC2kXt9yiswXz3X3J2nmNkBk5YKQftsNDcul
 TmKeqIIAK51XTncpegKH0EGnOX63oZ9Vxa8CTPdDlb+YF23Km2FoudGRI9F5qbUd
 LoraXqGYeiaeySkGyWmZVl6Uc8dIxnMkTN3H/oI9aB6TOrsi059hAtFcSaFfyemP
 c4LqXXCH7k2baiQt+qaLZ8cuZVG/+K5r2N2cmjO5kmJc6ynIaFnfMe4XxZLjp5LT
 /PulYI15bXkvSARKx5CRh/CDHMOx5Blw+ASO0RhWbdy0WH4ZhhcaVF5AeIpPW86a
 1LBcz97rMp72WmvKgrJeVO1r9+ll4SI6/YKGJRsxsCMdP3hgFpqntXyVjTFNdTM1
 0gH6H5v55x06vJHvhtTk8SR3PfMTEM2fRU5jXEOrGowoGifx+wNUwORiwj6LE3KQ
 SKUdT19RNzoW3VkFxhgk65ThK1S7YsJUKRoac3YdhttpqqqtFV//erenrZoR4k/p
 vzvKy68EQ7RCNyD5wNWNFe0YjeJl5G8gQ8bUm4Xmab7djjgz+pn4WpQB8yYKJLAo
 x9dqQ+6eUbw3Hcgk6qQ9E+r/svbulnAL0AeALAWK/91DwnZ2mCzKroFkLN7napKi
 fRho4CqzrtM=
 =NwEV
 -----END PGP SIGNATURE-----

Merge tag 'objtool-core-2023-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull objtool updates from Ingo Molnar:

 - Mark arch_cpu_idle_dead() __noreturn, make all architectures &
   drivers that did this inconsistently follow this new, common
   convention, and fix all the fallout that objtool can now detect
   statically

 - Fix/improve the ORC unwinder becoming unreliable due to
   UNWIND_HINT_EMPTY ambiguity, split it into UNWIND_HINT_END_OF_STACK
   and UNWIND_HINT_UNDEFINED to resolve it

 - Fix noinstr violations in the KCSAN code and the lkdtm/stackleak code

 - Generate ORC data for __pfx code

 - Add more __noreturn annotations to various kernel startup/shutdown
   and panic functions

 - Misc improvements & fixes

* tag 'objtool-core-2023-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits)
  x86/hyperv: Mark hv_ghcb_terminate() as noreturn
  scsi: message: fusion: Mark mpt_halt_firmware() __noreturn
  x86/cpu: Mark {hlt,resume}_play_dead() __noreturn
  btrfs: Mark btrfs_assertfail() __noreturn
  objtool: Include weak functions in global_noreturns check
  cpu: Mark nmi_panic_self_stop() __noreturn
  cpu: Mark panic_smp_self_stop() __noreturn
  arm64/cpu: Mark cpu_park_loop() and friends __noreturn
  x86/head: Mark *_start_kernel() __noreturn
  init: Mark start_kernel() __noreturn
  init: Mark [arch_call_]rest_init() __noreturn
  objtool: Generate ORC data for __pfx code
  x86/linkage: Fix padding for typed functions
  objtool: Separate prefix code from stack validation code
  objtool: Remove superfluous dead_end_function() check
  objtool: Add symbol iteration helpers
  objtool: Add WARN_INSN()
  scripts/objdump-func: Support multiple functions
  context_tracking: Fix KCSAN noinstr violation
  objtool: Add stackleak instrumentation to uaccess safe list
  ...
2023-04-28 14:02:54 -07:00
Linus Torvalds
7fa8a8ee94 - Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of
switching from a user process to a kernel thread.
 
 - More folio conversions from Kefeng Wang, Zhang Peng and Pankaj Raghav.
 
 - zsmalloc performance improvements from Sergey Senozhatsky.
 
 - Yue Zhao has found and fixed some data race issues around the
   alteration of memcg userspace tunables.
 
 - VFS rationalizations from Christoph Hellwig:
 
   - removal of most of the callers of write_one_page().
 
   - make __filemap_get_folio()'s return value more useful
 
 - Luis Chamberlain has changed tmpfs so it no longer requires swap
   backing.  Use `mount -o noswap'.
 
 - Qi Zheng has made the slab shrinkers operate locklessly, providing
   some scalability benefits.
 
 - Keith Busch has improved dmapool's performance, making part of its
   operations O(1) rather than O(n).
 
 - Peter Xu adds the UFFD_FEATURE_WP_UNPOPULATED feature to userfaultd,
   permitting userspace to wr-protect anon memory unpopulated ptes.
 
 - Kirill Shutemov has changed MAX_ORDER's meaning to be inclusive rather
   than exclusive, and has fixed a bunch of errors which were caused by its
   unintuitive meaning.
 
 - Axel Rasmussen give userfaultfd the UFFDIO_CONTINUE_MODE_WP feature,
   which causes minor faults to install a write-protected pte.
 
 - Vlastimil Babka has done some maintenance work on vma_merge():
   cleanups to the kernel code and improvements to our userspace test
   harness.
 
 - Cleanups to do_fault_around() by Lorenzo Stoakes.
 
 - Mike Rapoport has moved a lot of initialization code out of various
   mm/ files and into mm/mm_init.c.
 
 - Lorenzo Stoakes removd vmf_insert_mixed_prot(), which was added for
   DRM, but DRM doesn't use it any more.
 
 - Lorenzo has also coverted read_kcore() and vread() to use iterators
   and has thereby removed the use of bounce buffers in some cases.
 
 - Lorenzo has also contributed further cleanups of vma_merge().
 
 - Chaitanya Prakash provides some fixes to the mmap selftesting code.
 
 - Matthew Wilcox changes xfs and afs so they no longer take sleeping
   locks in ->map_page(), a step towards RCUification of pagefaults.
 
 - Suren Baghdasaryan has improved mmap_lock scalability by switching to
   per-VMA locking.
 
 - Frederic Weisbecker has reworked the percpu cache draining so that it
   no longer causes latency glitches on cpu isolated workloads.
 
 - Mike Rapoport cleans up and corrects the ARCH_FORCE_MAX_ORDER Kconfig
   logic.
 
 - Liu Shixin has changed zswap's initialization so we no longer waste a
   chunk of memory if zswap is not being used.
 
 - Yosry Ahmed has improved the performance of memcg statistics flushing.
 
 - David Stevens has fixed several issues involving khugepaged,
   userfaultfd and shmem.
 
 - Christoph Hellwig has provided some cleanup work to zram's IO-related
   code paths.
 
 - David Hildenbrand has fixed up some issues in the selftest code's
   testing of our pte state changing.
 
 - Pankaj Raghav has made page_endio() unneeded and has removed it.
 
 - Peter Xu contributed some rationalizations of the userfaultfd
   selftests.
 
 - Yosry Ahmed has fixed an issue around memcg's page recalim accounting.
 
 - Chaitanya Prakash has fixed some arm-related issues in the
   selftests/mm code.
 
 - Longlong Xia has improved the way in which KSM handles hwpoisoned
   pages.
 
 - Peter Xu fixes a few issues with uffd-wp at fork() time.
 
 - Stefan Roesch has changed KSM so that it may now be used on a
   per-process and per-cgroup basis.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZEr3zQAKCRDdBJ7gKXxA
 jlLoAP0fpQBipwFxED0Us4SKQfupV6z4caXNJGPeay7Aj11/kQD/aMRC2uPfgr96
 eMG3kwn2pqkB9ST2QpkaRbxA//eMbQY=
 =J+Dj
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2023-04-27-15-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of
   switching from a user process to a kernel thread.

 - More folio conversions from Kefeng Wang, Zhang Peng and Pankaj
   Raghav.

 - zsmalloc performance improvements from Sergey Senozhatsky.

 - Yue Zhao has found and fixed some data race issues around the
   alteration of memcg userspace tunables.

 - VFS rationalizations from Christoph Hellwig:
     - removal of most of the callers of write_one_page()
     - make __filemap_get_folio()'s return value more useful

 - Luis Chamberlain has changed tmpfs so it no longer requires swap
   backing. Use `mount -o noswap'.

 - Qi Zheng has made the slab shrinkers operate locklessly, providing
   some scalability benefits.

 - Keith Busch has improved dmapool's performance, making part of its
   operations O(1) rather than O(n).

 - Peter Xu adds the UFFD_FEATURE_WP_UNPOPULATED feature to userfaultd,
   permitting userspace to wr-protect anon memory unpopulated ptes.

 - Kirill Shutemov has changed MAX_ORDER's meaning to be inclusive
   rather than exclusive, and has fixed a bunch of errors which were
   caused by its unintuitive meaning.

 - Axel Rasmussen give userfaultfd the UFFDIO_CONTINUE_MODE_WP feature,
   which causes minor faults to install a write-protected pte.

 - Vlastimil Babka has done some maintenance work on vma_merge():
   cleanups to the kernel code and improvements to our userspace test
   harness.

 - Cleanups to do_fault_around() by Lorenzo Stoakes.

 - Mike Rapoport has moved a lot of initialization code out of various
   mm/ files and into mm/mm_init.c.

 - Lorenzo Stoakes removd vmf_insert_mixed_prot(), which was added for
   DRM, but DRM doesn't use it any more.

 - Lorenzo has also coverted read_kcore() and vread() to use iterators
   and has thereby removed the use of bounce buffers in some cases.

 - Lorenzo has also contributed further cleanups of vma_merge().

 - Chaitanya Prakash provides some fixes to the mmap selftesting code.

 - Matthew Wilcox changes xfs and afs so they no longer take sleeping
   locks in ->map_page(), a step towards RCUification of pagefaults.

 - Suren Baghdasaryan has improved mmap_lock scalability by switching to
   per-VMA locking.

 - Frederic Weisbecker has reworked the percpu cache draining so that it
   no longer causes latency glitches on cpu isolated workloads.

 - Mike Rapoport cleans up and corrects the ARCH_FORCE_MAX_ORDER Kconfig
   logic.

 - Liu Shixin has changed zswap's initialization so we no longer waste a
   chunk of memory if zswap is not being used.

 - Yosry Ahmed has improved the performance of memcg statistics
   flushing.

 - David Stevens has fixed several issues involving khugepaged,
   userfaultfd and shmem.

 - Christoph Hellwig has provided some cleanup work to zram's IO-related
   code paths.

 - David Hildenbrand has fixed up some issues in the selftest code's
   testing of our pte state changing.

 - Pankaj Raghav has made page_endio() unneeded and has removed it.

 - Peter Xu contributed some rationalizations of the userfaultfd
   selftests.

 - Yosry Ahmed has fixed an issue around memcg's page recalim
   accounting.

 - Chaitanya Prakash has fixed some arm-related issues in the
   selftests/mm code.

 - Longlong Xia has improved the way in which KSM handles hwpoisoned
   pages.

 - Peter Xu fixes a few issues with uffd-wp at fork() time.

 - Stefan Roesch has changed KSM so that it may now be used on a
   per-process and per-cgroup basis.

* tag 'mm-stable-2023-04-27-15-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (369 commits)
  mm,unmap: avoid flushing TLB in batch if PTE is inaccessible
  shmem: restrict noswap option to initial user namespace
  mm/khugepaged: fix conflicting mods to collapse_file()
  sparse: remove unnecessary 0 values from rc
  mm: move 'mmap_min_addr' logic from callers into vm_unmapped_area()
  hugetlb: pte_alloc_huge() to replace huge pte_alloc_map()
  maple_tree: fix allocation in mas_sparse_area()
  mm: do not increment pgfault stats when page fault handler retries
  zsmalloc: allow only one active pool compaction context
  selftests/mm: add new selftests for KSM
  mm: add new KSM process and sysfs knobs
  mm: add new api to enable ksm per process
  mm: shrinkers: fix debugfs file permissions
  mm: don't check VMA write permissions if the PTE/PMD indicates write permissions
  migrate_pages_batch: fix statistics for longterm pin retry
  userfaultfd: use helper function range_in_vma()
  lib/show_mem.c: use for_each_populated_zone() simplify code
  mm: correct arg in reclaim_pages()/reclaim_clean_pages_from_list()
  fs/buffer: convert create_page_buffers to folio_create_buffers
  fs/buffer: add folio_create_empty_buffers helper
  ...
2023-04-27 19:42:02 -07:00
Paolo Bonzini
4f382a79a6 KVM/arm64 updates for 6.4
- Numerous fixes for the pathological lock inversion issue that
   plagued KVM/arm64 since... forever.
 
 - New framework allowing SMCCC-compliant hypercalls to be forwarded
   to userspace, hopefully paving the way for some more features
   being moved to VMMs rather than be implemented in the kernel.
 
 - Large rework of the timer code to allow a VM-wide offset to be
   applied to both virtual and physical counters as well as a
   per-timer, per-vcpu offset that complements the global one.
   This last part allows the NV timer code to be implemented on
   top.
 
 - A small set of fixes to make sure that we don't change anything
   affecting the EL1&0 translation regime just after having having
   taken an exception to EL2 until we have executed a DSB. This
   ensures that speculative walks started in EL1&0 have completed.
 
 - The usual selftest fixes and improvements.
 -----BEGIN PGP SIGNATURE-----
 
 iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmRCZIwPHG1hekBrZXJu
 ZWwub3JnAAoJECPQ0LrRPXpDoZ8P/ioXAdDbAE4hTuyD2YdKJ3IGWN3pg52Z7xc2
 rBXXFrbK9+n9FEc3AVdHoGsRPDP0Ynl+apj+aB0Klr/Fl0KKqac+W0ARX9rn1mI1
 HjeygFPaGnXjMUp0BjeSLS+g3b0gebELJ6R1QEe1/MIPb8Se7M1y3ZpMWdhe0PPL
 vyzw3LZq2OAlLgWKZhAfhh03qdr2kqJxypYs6nMrcexfn8dXT78dsYKW1nXmqKcE
 61Gg23MDPUoexYpUhm+ym5t8hltoI1di8faPmxEpaFzpSDyAg8V5vo6LiW9jn3cf
 RX0Sikk1laiRAhVbbIFCKC148vFyKxum3scpKyb91Qc+sK1kmIcxvEqlc6SfG9je
 +5ndZwAfXtW6SMSOyX8y5fXbee7M0sx3n3le9BNgwXfmLWg/GHXJ544dJgVIlf/e
 0Z+8QnP1IUDfARR/b2FlW7A7XLzNHQzO379ekcAdUptbGwlS9CrW6SJ83QR7K6fB
 bh0aSSELKsD7pX8wnNyNACvmz2zL12ITlDKdZWUr8MSxyTjgVy7s0BDsQT3sbrA1
 1sH++RvUWfC2k7tVT3vjZFzUDlPw3bnZmo5YMWRTMbXEdr1V5rDw5F5IXit13KeT
 8bk0hnJgnLmyoX2A17v5dkFMIKD7p13tqDRdfFcn0ru63HIKxgkS3ITkDmsAQELK
 DHT7RBE0
 =Bhta
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 updates for 6.4

- Numerous fixes for the pathological lock inversion issue that
  plagued KVM/arm64 since... forever.

- New framework allowing SMCCC-compliant hypercalls to be forwarded
  to userspace, hopefully paving the way for some more features
  being moved to VMMs rather than be implemented in the kernel.

- Large rework of the timer code to allow a VM-wide offset to be
  applied to both virtual and physical counters as well as a
  per-timer, per-vcpu offset that complements the global one.
  This last part allows the NV timer code to be implemented on
  top.

- A small set of fixes to make sure that we don't change anything
  affecting the EL1&0 translation regime just after having having
  taken an exception to EL2 until we have executed a DSB. This
  ensures that speculative walks started in EL1&0 have completed.

- The usual selftest fixes and improvements.
2023-04-26 15:46:52 -04:00
Linus Torvalds
733f7e9c18 This update includes the following changes:
API:
 
 - Total usage stats now include all that returned error (instead of some).
 - Remove maximum hash statesize limit.
 - Add cloning support for hmac and unkeyed hashes.
 - Demote BUG_ON in crypto_unregister_alg to a WARN_ON.
 
 Algorithms:
 
 - Use RIP-relative addressing on x86 to prepare for PIE build.
 - Add accelerated AES/GCM stitched implementation on powerpc P10.
 - Add some test vectors for cmac(camellia).
 - Remove failure case where jent is unavailable outside of FIPS mode in drbg.
 - Add permanent and intermittent health error checks in jitter RNG.
 
 Drivers:
 
 - Add support for 402xx devices in qat.
 - Add support for HiSTB TRNG.
 - Fix hash concurrency issues in stm32.
 - Add OP-TEE firmware support in caam.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEn51F/lCuNhUwmDeSxycdCkmxi6cFAmRGCjcACgkQxycdCkmx
 i6d6JA//ZmwgEqAKA8qWpHnNKZylTLqFhLxnKZwr4Hhp1KzManh/T9pepXiD2zAY
 D92wU60v0hfGAazeUWQRmrIZxcjyd3b3Tr7WiFuNoZbkPsuXWZAoz8iHgMq69dqb
 DXZhKJnlmVlcr+qTSk9MP8HODL5kU6Ug2pk+r8hL/WsBI+JGfZEXKcJhhMqYLYls
 nl+NN4fkE5tgcTh2lp/9dQsQRylhESZuqb8L2wItQmripSbhPGwYf24I7B7xcGrn
 o7X4XG//cQO6zQErgnOJOosIgJEEynW27CN4ZiHB8WhRAk0YLXydQBs6EjZgNA8H
 EvZC/bIx2YOt8ngG99q4kRg4OgKp4c7UnV6l1pxuJWbIyXrFh4djxHdq9pTYr3UB
 P3pVEX38Wu7U5Tfgy3y1QqZzsvrPjmnI3NQ8QBrcFzNRDan5K6nH4kQyk9Cv7LQm
 GlE1JOThU5U2G33ZWKCluJUjVUCRceMWQYla1X5R4uWMCwSqRMpmx8Ib9QvbYlWe
 iUI+RatLnlIobx+lgaC8mtij9dQddFjk6YwFYhQcD3Bl30DhTeIlbnOUY9YOTXps
 H6V9X2inVUjyZr1uJ4a7rPdCUuzQxR6HWPyp6fXMlbLrEhL8e6c4/QbEoTubRQeS
 WTtoIFt4ezd2SG6hI6dTCscgFc5EAyEMDD5GtQmJeyozu0Gqtpo=
 =ITkW
 -----END PGP SIGNATURE-----

Merge tag 'v6.4-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto updates from Herbert Xu:
 "API:
   - Total usage stats now include all that returned errors (instead of
     just some)
   - Remove maximum hash statesize limit
   - Add cloning support for hmac and unkeyed hashes
   - Demote BUG_ON in crypto_unregister_alg to a WARN_ON

  Algorithms:
   - Use RIP-relative addressing on x86 to prepare for PIE build
   - Add accelerated AES/GCM stitched implementation on powerpc P10
   - Add some test vectors for cmac(camellia)
   - Remove failure case where jent is unavailable outside of FIPS mode
     in drbg
   - Add permanent and intermittent health error checks in jitter RNG

  Drivers:
   - Add support for 402xx devices in qat
   - Add support for HiSTB TRNG
   - Fix hash concurrency issues in stm32
   - Add OP-TEE firmware support in caam"

* tag 'v6.4-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (139 commits)
  i2c: designware: Add doorbell support for Mendocino
  i2c: designware: Use PCI PSP driver for communication
  powerpc: Move Power10 feature PPC_MODULE_FEATURE_P10
  crypto: p10-aes-gcm - Remove POWER10_CPU dependency
  crypto: testmgr - Add some test vectors for cmac(camellia)
  crypto: cryptd - Add support for cloning hashes
  crypto: cryptd - Convert hash to use modern init_tfm/exit_tfm
  crypto: hmac - Add support for cloning
  crypto: hash - Add crypto_clone_ahash/shash
  crypto: api - Add crypto_clone_tfm
  crypto: api - Add crypto_tfm_get
  crypto: x86/sha - Use local .L symbols for code
  crypto: x86/crc32 - Use local .L symbols for code
  crypto: x86/aesni - Use local .L symbols for code
  crypto: x86/sha256 - Use RIP-relative addressing
  crypto: x86/ghash - Use RIP-relative addressing
  crypto: x86/des3 - Use RIP-relative addressing
  crypto: x86/crc32c - Use RIP-relative addressing
  crypto: x86/cast6 - Use RIP-relative addressing
  crypto: x86/cast5 - Use RIP-relative addressing
  ...
2023-04-26 08:32:52 -07:00
Danny Tsen
1560541631 powerpc: Move Power10 feature PPC_MODULE_FEATURE_P10
Move Power10 feature, PPC_MODULE_FEATURE_P10, definition to be in
arch/powerpc/include/asm/cpufeature.h.

Signed-off-by: Danny Tsen <dtsen@linux.ibm.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-04-20 18:20:05 +08:00
Thomas Zimmermann
b50f6d465a arch/powerpc: Implement <asm/fb.h> with generic helpers
Replace the architecture's fb_is_primary_device() with the generic
one from <asm-generic/fb.h>. No functional changes.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Helge Deller <deller@gmx.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20230417125651.25126-16-tzimmermann@suse.de
2023-04-20 10:05:05 +02:00
Michael Ellerman
18b5e7170a powerpc/pseries: Always inline functions called from cpuidle
Code in the idle path is not allowed to be instrumented because RCU is
disabled, see commit 0e985e9d22 ("cpuidle: Add comments about
noinstr/__cpuidle usage").

Force inlining of the inline functions called from cpuidle, to ensure
they are not emitted out-of-line and then available for tracing.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230406144535.3786008-4-mpe@ellerman.id.au
2023-04-20 13:21:49 +10:00
Paul Gortmaker
ad46ad2d85 powerpc: drop MPC8272-ADS and PowerQUICC II FADS shared code.
With the two platforms depending on this shared code, and no others,
we can remove the orphaned code and Kconfigs

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230224204959.17425-4-paul.gortmaker@windriver.com
2023-04-20 13:21:48 +10:00
Andrew Donnellan
df9cad0949 powerpc/pseries: Add FW_FEATURE_PLPKS feature flag
Add a firmware feature flag, FW_FEATURE_PLPKS, to indicate availability of
Platform KeyStore related hcalls.

Check this flag in plpks_is_available() and pseries_plpks_init() before
trying to make an hcall.

Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230224041012.772648-1-ajd@linux.ibm.com
2023-04-20 13:21:46 +10:00
Joel Stanley
92cb1eff88 powerpc: Remove duplicate SPRN_HSRR definitions
There are two copies of these defines. Keep the older ones as they have
associated bit definitions.

Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230405045316.95003-1-joel@jms.id.au
2023-04-20 13:21:44 +10:00
Nicholas Piggin
77e69ee7ce powerpc/64: modules support building with PCREL addresing
Build modules using PCREL addressing when CONFIG_PPC_KERNEL_PCREL=y.

- The module loader must handle several new relocation types:

  * R_PPC64_REL24_NOTOC is a function call handled like R_PPC_REL24, but
    does not restore r2 upon return. The external function call stub is
    changed to use pcrel addressing to load the function pointer rather
    than based on the module TOC.

  * R_PPC64_GOT_PCREL34 is a reference to external data. A GOT table
    must be built by hand, because the linker adds this during the final
    link (which is not done for kernel modules). The GOT table is built
    similarly to the way the external function call stub table is. This
    section is called .mygot because .got has a special meaning for the
    linker and can become upset.

  * R_PPC64_PCREL34 is used for local data addressing, but there is a
    special case where the percpu section is moved at load-time to the
    percpu area which is out of range of this relocation. This requires
    the PCREL34 relocations are converted to use GOT_PCREL34 addressing.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Some coding style & formatting fixups]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230408021752.862660-7-npiggin@gmail.com
2023-04-20 13:21:42 +10:00
Nicholas Piggin
7e3a68be42 powerpc/64: vmlinux support building with PCREL addresing
PC-Relative or PCREL addressing is an extension to the ELF ABI which
uses Power ISA v3.1 PC-relative instructions to calculate addresses,
rather than the traditional TOC scheme.

Add an option to build vmlinux using pcrel addressing. Modules continue
to use TOC addressing.

- TOC address helpers and r2 are poisoned with -1 when running vmlinux.
  r2 could be used for something useful once things are ironed out.

- Assembly must call C functions with @notoc annotation, or the linker
  complains aobut a missing nop after the call. This is done with the
  CFUNC macro introduced earlier.

- Boot: with the exception of prom_init, the execution branches to the
  kernel virtual address early in boot, before any addresses are
  generated, which ensures 34-bit pcrel addressing does not miss the
  high PAGE_OFFSET bits. TOC relative addressing has a similar
  requirement. prom_init does not go to the virtual address and its
  addresses should not carry over to the post-prom kernel.

- Ftrace trampolines are converted from TOC addressing to pcrel
  addressing, including module ftrace trampolines that currently use the
  kernel TOC to find ftrace target functions.

- BPF function prologue and function calling generation are converted
  from TOC to pcrel.

- copypage_64.S has an interesting problem, prefixed instructions have
  alignment restrictions so the linker can add padding, which makes the
  assembler treat the difference between two local labels as
  non-constant even if alignment is arranged so padding is not required.
  This may need toolchain help to solve nicely, for now move the prefix
  instruction out of the alternate patch section to work around it.

This reduces kernel text size by about 6%.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230408021752.862660-6-npiggin@gmail.com
2023-04-20 12:59:21 +10:00
Nicholas Piggin
4e991e3c16 powerpc: add CFUNC assembly label annotation
This macro is to be used in assembly where C functions are called.
pcrel addressing mode requires branches to functions with a
localentry value of 1 to have either a trailing nop or @notoc.
This macro permits the latter without changing callers.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Add dummy definitions to fix selftests build]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230408021752.862660-5-npiggin@gmail.com
2023-04-20 12:54:24 +10:00
Nicholas Piggin
dc5dac748a powerpc/64: Add support to build with prefixed instructions
Add an option to build kernel and module with prefixed instructions if
the CPU and toolchain support it.

This is not related to kernel support for userspace execution of
prefixed instructions.

Building with prefixed instructions breaks some extended inline asm
memory addressing, for example it will provide immediates that exceed
the range of simple load/store displacement. Whether this is a
toolchain or a kernel asm problem remains to be seen. For now, these
are replaced with simpler and less efficient direct register addressing
when compiling with prefixed.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230408021752.862660-4-npiggin@gmail.com
2023-04-20 12:54:22 +10:00
Pali Rohár
3ce271435b powerpc/fsl_uli1575: Mark uli_exclude_device() as static
Function uli_exclude_device() is not used outside of the fsl_uli1575.c
source file anymore. So mark it as static and remove public prototype.

Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230409000812.18904-9-pali@kernel.org
2023-04-20 10:20:50 +10:00
Pali Rohár
c4f6d8665c powerpc/85xx: mpc85xx_ds: Move uli_init() code into its own driver file
Move uli_init() function into existing driver fsl_uli1575.c file in order
to share its code between more platforms and board files.

Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230409000812.18904-5-pali@kernel.org
2023-04-20 10:20:50 +10:00
Christophe Leroy
821b3a471f powerpc/fsl_uli1575: Misc cleanup
Use a single line for uli_exclude_device().

Add uli_exclude_device() prototype in ppc-pci.h and guard it.

Remove that prototype from mpc85xx_ds.c and mpc86xx_hpcn.c files.

Make uli_pirq_to_irq[] static as it is used only in that file.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230409000812.18904-2-pali@kernel.org
2023-04-20 10:20:49 +10:00
Nicholas Piggin
eed7c420aa powerpc: copy_thread differentiate kthreads and user mode threads
When copy_thread is given a kernel function to run in arg->fn, this
does not necessarily mean it is a kernel thread. User threads can be
created this way (e.g., kernel_init, see also x86's copy_thread()).
These threads run a kernel function which may call kernel_execve()
and return, which returns like a userspace exec(2) syscall.

Kernel threads are to  be differentiated with PF_KTHREAD, will always
have arg->fn set, and should never return from that function, instead
calling kthread_exit() to exit.

Create separate paths for the kthread and user kernel thread creation
logic. The kthread path will never exit and does not require a user
interrupt frame, so it gets a minimal stack frame.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230325122904.2375060-6-npiggin@gmail.com
2023-04-11 23:13:33 +10:00
Nysal Jan K.A
b0bbe5a291 powerpc/atomics: Remove unused function
Remove arch_atomic_try_cmpxchg_lock function as it is no longer used
since commit 9f61521c7a ("powerpc/qspinlock: powerpc qspinlock
implementation")

Signed-off-by: Nysal Jan K.A <nysal@linux.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230224103940.1328725-1-nysal@linux.ibm.com
2023-04-04 22:11:03 +10:00
Nicholas Miehlbradt
ac9c8901cb powerpc: Implement arch_within_stack_frames
Walks the stack when copy_{to,from}_user address is in the stack to
ensure that the object being copied is entirely a single stack frame and
does not contain stack metadata.

Substantially similar to the x86 implementation. The back chain is used
to traverse the stack and identify stack frame boundaries.

Signed-off-by: Nicholas Miehlbradt <nicholas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230228054355.300628-1-nicholas@linux.ibm.com
2023-04-04 22:11:03 +10:00
Michael Ellerman
2837dbcef1 Merge branch 'topic/ppc-kvm' into next
Merge our KVM topic branch to bring some KVM commits into next for wider
testing.
2023-04-04 22:10:32 +10:00
Paul Mackerras
a3800ef9c4 KVM: PPC: Enable prefixed instructions for HV KVM and disable for PR KVM
Now that we can read prefixed instructions from a HV KVM guest and
emulate prefixed load/store instructions to emulated MMIO locations,
we can add HFSCR_PREFIXED into the set of bits that are set in the
HFSCR for a HV KVM guest on POWER10, allowing the guest to use
prefixed instructions.

PR KVM has not yet been extended to handle prefixed instructions in
all situations where we might need to emulate them, so prevent the
guest from enabling prefixed instructions in the FSCR for now.

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Tested-by: Sachin Sant <sachinp@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/ZAgs25dCmLrVkBdU@cleo
2023-04-03 15:45:59 +10:00
Paul Mackerras
953e37397f KVM: PPC: Fetch prefixed instructions from the guest
In order to handle emulation of prefixed instructions in the guest,
this first makes vcpu->arch.last_inst be an unsigned long, i.e. 64
bits on 64-bit platforms.  For prefixed instructions, the upper 32
bits are used for the prefix and the lower 32 bits for the suffix, and
both halves are byte-swapped if the guest endianness differs from the
host.

Next, vcpu->arch.emul_inst is now 64 bits wide, to match the HEIR
register on POWER10.  Like HEIR, for a prefixed instruction it is
defined to have the prefix is in the top 32 bits and the suffix in the
bottom 32 bits, with both halves in the correct byte order.

kvmppc_get_last_inst is extended on 64-bit machines to put the prefix
and suffix in the right places in the ppc_inst_t being returned.

kvmppc_load_last_inst now returns the instruction in an unsigned long
in the same format as vcpu->arch.last_inst.  It makes the decision
about whether to fetch a suffix based on the SRR1_PREFIXED bit in the
MSR image stored in the vcpu struct, which generally comes from SRR1
or HSRR1 on an interrupt.  This bit is defined in Power ISA v3.1B to
be set if the interrupt occurred due to a prefixed instruction and
cleared otherwise for all interrupts except for instruction storage
interrupt, which does not come to the hypervisor.  It is set to zero
for asynchronous interrupts such as external interrupts.  In previous
ISA versions it was always set to 0 for all interrupts except
instruction storage interrupt.

The code in book3s_hv_rmhandlers.S that loads the faulting instruction
on a HDSI is only used on POWER8 and therefore doesn't ever need to
load a suffix.

[npiggin@gmail.com - check that the is-prefixed bit in SRR1 matches the
type of instruction that was fetched.]

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/ZAgsq9h1CCzouQuV@cleo
2023-04-03 15:45:50 +10:00
Paul Mackerras
acf17878da KVM: PPC: Make kvmppc_get_last_inst() produce a ppc_inst_t
This changes kvmppc_get_last_inst() so that the instruction it fetches
is returned in a ppc_inst_t variable rather than a u32.  This will
allow us to return a 64-bit prefixed instruction on those 64-bit
machines that implement Power ISA v3.1 or later, such as POWER10.
On 32-bit platforms, ppc_inst_t is 32 bits wide, and is turned back
into a u32 by ppc_inst_val, which is an identity operation on those
platforms.

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/ZAgsiPlL9O7KnlZZ@cleo
2023-04-03 15:45:41 +10:00
Nicholas Piggin
460ba21d83 KVM: PPC: Permit SRR1 flags in more injected interrupt types
The prefix architecture in ISA v3.1 introduces a prefixed bit in SRR1
for many types of synchronous interrupts which is set when the interrupt
is caused by a prefixed instruction.

This requires KVM to be able to set this bit when injecting interrupts
into a guest. Plumb through the SRR1 "flags" argument to the core_queue
APIs where it's missing for this. For now they are set to 0, which is
no change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Fixup kvmppc_core_queue_alignment() in booke.c]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230330103224.3589928-2-npiggin@gmail.com
2023-04-03 14:54:44 +10:00
Linus Torvalds
95d0b9d89d powerpc fixes for 6.3 #4
- Fix a false positive warning in __pte_needs_flush() (with DEBUG_VM=y).
  - Fix oops when a PF_IO_WORKER thread tries to core dump.
  - Don't try to reconfigure VAS when it's disabled.
 
 Thanks to: Benjamin Gray, Haren Myneni, Jens Axboe, Nathan Lynch, Russell Currey.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmQpT20THG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgBHsD/kB7wxwe+h/Xddx2l9le15HfSTlVOiU
 7CO3jDYs//jt7E4ZEeHE0ppPRWcW5ICEttCrqWyT8WDfwMv7f+iBU80enZ5Rg/a7
 tX5jOel1TmsStj2bfh/myI7CX/+mEwRZ1o3J6/UgaZDlI2OLzLP/gKet3rfSh3zJ
 ilqIZxoj+ARsaYIrxnopFHmSmD5oCAsvVM6COPd3epRnIWmfd7NxtKbZU3V4WdLy
 qL9Q9Dk+RmGDSzJe1j4cRR1fA6jRIFU1WYDFz38HIVX1grFKdsm2Eeta6OXwXWGz
 TZDp82IBka6SvFkPx2RE2bGqaWbjZbA8b1mK59s5hB9trU3qFgZjoRBJoR1hp/zc
 MjfKLTxW7qC/TQAwxBKzFgTm1njRY5H5aMDnV9+6fBBC6esTgBy/la8ZygPmqWJj
 fFulUobDul9xjMg7l0CS+cjyO31Bb68fmFJOH9fmXR3fYlkP9VgJh89demD4xbMQ
 UdtAdpI8xhV98e6pX+vSlfbaeUNqzJQ7UWtr6J08nY17dhcoPxJ9RaAZdsJkZIk3
 IM/jwdhvpG++VM5CbiX+4yQHUJkekIXpfnxZIXqOl+rqv5Z0A8gRQkY469FjqZy7
 EJoYIOBGiZEgiU4oqJRkp0ifMSPMt0I0gMmRq5NxS/64mxFHlwCsJvaiHQ/fd2m8
 3DTDKQd6KOT31A==
 =1Sli
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-6.3-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

 - Fix a false positive warning in __pte_needs_flush() (with DEBUG_VM=y)

 - Fix oops when a PF_IO_WORKER thread tries to core dump

 - Don't try to reconfigure VAS when it's disabled

Thanks to Benjamin Gray, Haren Myneni, Jens Axboe, Nathan Lynch, and
Russell Currey.

* tag 'powerpc-6.3-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/pseries/vas: Ignore VAS update for DLPAR if copy/paste is not enabled
  powerpc: Don't try to copy PPR for task with NULL pt_regs
  powerpc/64s: Fix __pte_needs_flush() false positive warning
2023-04-02 10:01:56 -07:00
Nathan Lynch
f40b0f6c5c powerpc/rtas: ensure 8-byte alignment for struct rtas_args
CHRP and PAPR agree: "In order to make an RTAS call, the operating
system must construct an argument call buffer aligned on an eight byte
boundary in physically contiguous real memory [...]." (7.2.7 Calling
Mechanism and Conventions).

struct rtas_args is the type used for this argument call buffer. The
unarchitected 'rets' member happens to produce 8-byte alignment for
the struct on 64-bit targets in practice. But without an alignment
directive the structure will have only 4-byte alignment on 32-bit
targets:

  $ nm b/{before,after}/chrp32/vmlinux | grep rtas_args
  c096881c b rtas_args
  c0968820 b rtas_args

Add an alignment directive to the struct rtas_args declaration so all
instances have the alignment required by the specs. rtas-types.h no
longer refers to any spinlock types, so drop the spinlock_types.h
inclusion while we're here.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230220-rtas-queue-for-6-4-v1-1-010e4416f13f@linux.ibm.com
2023-03-30 23:36:34 +11:00
Christophe Leroy
e7299f961f powerpc/perf: Properly detect mpc7450 family
Unlike PVR_POWER8, etc ...., PVR_7450 represents a full PVR
value and not a family value.

To avoid confusion, do like E500 family and define the relevant
PVR_VER_xxxx values for the 7450 family:
  0x8000 ==> 7450
  0x8001 ==> 7455
  0x8002 ==> 7447
  0x8003 ==> 7447A
  0x8004 ==> 7448

And use them to detect 7450 family for perf events.

Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <error27@gmail.com>
Link: https://lore.kernel.org/r/202302260657.7dM9Uwev-lkp@intel.com/
Fixes: ec3eb9d941 ("powerpc/perf: Use PVR rather than oprofile field to determine CPU version")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/99ca1da2e5a6cf82a8abf4bc034918e500e31781.1677513277.git.christophe.leroy@csgroup.eu
2023-03-30 23:35:43 +11:00
Ira Weiny
0398abca61 powerpc: Remove memcpy_page_flushcache()
Commit 21b56c8477 ("iov_iter: get rid of separate bvec and xarray
callbacks") removed the calls to memcpy_page_flushcache().

Remove the unnecessary memcpy_page_flushcache() call.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20221230-kmap-x86-v1-2-15f1ecccab50@intel.com
2023-03-29 23:53:02 +11:00
Gerald Schaefer
99c2913363 mm: add PTE pointer parameter to flush_tlb_fix_spurious_fault()
s390 can do more fine-grained handling of spurious TLB protection faults,
when there also is the PTE pointer available.

Therefore, pass on the PTE pointer to flush_tlb_fix_spurious_fault() as an
additional parameter.

This will add no functional change to other architectures, but those with
private flush_tlb_fix_spurious_fault() implementations need to be made
aware of the new parameter.

Link: https://lkml.kernel.org/r/20230306161548.661740-1-gerald.schaefer@linux.ibm.com
Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>	[arm64]
Acked-by: Michael Ellerman <mpe@ellerman.id.au>		[powerpc]
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-03-28 16:20:12 -07:00
Benjamin Gray
1abce0580b powerpc/64s: Fix __pte_needs_flush() false positive warning
Userspace PROT_NONE ptes set _PAGE_PRIVILEGED, triggering a false
positive debug assertion that __pte_flags_need_flush() is not called
on a kernel mapping.

Detect when it is a userspace PROT_NONE page by checking the required
bits of PAGE_NONE are set, and none of the RWX bits are set.
pte_protnone() is insufficient here because it always returns 0 when
CONFIG_NUMA_BALANCING=n.

Fixes: b11931e9ad ("powerpc/64s: add pte_needs_flush and huge_pmd_needs_flush")
Cc: stable@vger.kernel.org # v6.1+
Reported-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230302225947.81083-1-bgray@linux.ibm.com
2023-03-28 21:37:23 +11:00
Marco Elver
90db9dbedd kasan, powerpc: don't rename memintrinsics if compiler adds prefixes
With appropriate compiler support [1], KASAN builds use __asan prefixed
meminstrinsics, and KASAN no longer overrides memcpy/memset/memmove.

If compiler support is detected (CC_HAS_KASAN_MEMINTRINSIC_PREFIX), define
memintrinsics normally (do not prefix '__').

On powerpc, KASAN is the only user of __mem functions, which are used to
define instrumented memintrinsics.  Alias the normal versions for KASAN to
use in its implementation.

Link: https://lore.kernel.org/all/20230224085942.1791837-1-elver@google.com/ [1]
Link: https://lore.kernel.org/oe-kbuild-all/202302271348.U5lvmo0S-lkp@intel.com/
Link: https://lkml.kernel.org/r/20230227094726.3833247-1-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Reported-by: kernel test robot <lkp@intel.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>	[powerpc]
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Daniel Axtens <dja@axtens.net>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-03-23 17:18:33 -07:00
Thomas Huth
67c48662e2 KVM: PPC: Standardize on "int" return types in the powerpc KVM code
Most functions that are related to kvm_arch_vm_ioctl() already use
"int" as return type to pass error values back to the caller. Some
outlier functions use "long" instead for no good reason (they do not
really require long values here). Let's standardize on "int" here to
avoid casting the values back and forth between the two types.

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20230208140105.655814-2-thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-16 10:18:05 -04:00
Christophe Leroy
2fc39acfca powerpc/machdep: Define 'compatible' property in ppc_md and use it
Most probe functions do nothing else than checking whether
the machine is compatible to a given string.

Define that string in ppc_md structure and check it directly from
probe_machine() instead of using ppc_md.probe() for that.

Keep checking in ppc_md.probe() only for more complex probing.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/6cb9865d916231c38401ba34ad1a98c249fae135.1676711562.git.christophe.leroy@csgroup.eu
2023-03-15 00:52:10 +11:00
Christophe Leroy
35e175bdd5 powerpc/machdep: Make machine name const
Machine name in struct machdep_calls should never be modified.

Mark it const.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/6cb9865d916231c38401ba34ad1a98c249fae135.1676711562.git.christophe.leroy@csgroup.eu
2023-03-15 00:52:10 +11:00
Alexey Kardashevskiy
a940904443 powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
Up until now PPC64 managed to avoid using iommu_ops. The VFIO driver
uses a SPAPR TCE sub-driver and all iommu_ops uses were kept in the
Type1 VFIO driver. Recent development added 2 uses of iommu_ops to the
generic VFIO which broke POWER:
  - a coherency capability check;
  - blocking IOMMU domain - iommu_group_dma_owner_claimed()/...

This adds a simple iommu_ops which reports support for cache coherency
and provides a basic support for blocking domains. No other domain types
are implemented so the default domain is NULL.

Since now iommu_ops controls the group ownership, this takes it out of
VFIO.

This adds an IOMMU device into a pci_controller (=PHB) and registers it
in the IOMMU subsystem, iommu_ops is registered at this point. This
setup is done in postcore_initcall_sync.

This replaces iommu_group_add_device() with iommu_probe_device() as the
former misses necessary steps in connecting PCI devices to IOMMU
devices. This adds a comment about why explicit iommu_probe_device() is
still needed.

The previous discussion is here:
  https://lore.kernel.org/r/20220707135552.3688927-1-aik@ozlabs.ru/
  https://lore.kernel.org/r/20220701061751.1955857-1-aik@ozlabs.ru/

Fixes: e8ae0e140c ("vfio: Require that devices support DMA cache coherence")
Fixes: 70693f4708 ("vfio: Set DMA ownership for VFIO devices")
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Timothy Pearson <tpearson@raptorengineering.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
[mpe: Fix CONFIG_IOMMU_API=n build]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/2000135730.16998523.1678123860135.JavaMail.zimbra@raptorengineeringinc.com
2023-03-15 00:51:46 +11:00
Alexey Kardashevskiy
9d67c94335 powerpc/iommu: Add "borrowing" iommu_table_group_ops
PPC64 IOMMU API defines iommu_table_group_ops which handles DMA windows
for PEs: control the ownership, create/set/unset a table the hardware
for dynamic DMA windows (DDW). VFIO uses the API to implement support on
POWER.

So far only PowerNV IODA2 (POWER8 and newer machines) implemented this
and other cases (POWER7 or nested KVM) did not and instead reused
existing iommu_table structs. This means 1) no DDW 2) ownership transfer
is done directly in the VFIO SPAPR TCE driver.

Soon POWER is going to get its own iommu_ops and ownership control is
going to move there. This implements spapr_tce_table_group_ops which
borrows iommu_table tables. The upside is that VFIO needs to know less
about POWER.

The new ops returns the existing table from create_table() and only
checks if the same window is already set. This is only going to work if
the default DMA window starts table_group.tce32_start and as big as
pe->table_group.tce32_size (not the case for IODA2+ PowerNV).

This changes iommu_table_group_ops::take_ownership() to return an error
if borrowing a table failed.

This should not cause any visible change in behavior for PowerNV.
pSeries was not that well tested/supported anyway.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Timothy Pearson <tpearson@raptorengineering.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
[mpe: Fix CONFIG_IOMMU_API=n build (skiroot_defconfig), & formatting]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/525438831.16998517.1678123820075.JavaMail.zimbra@raptorengineeringinc.com
2023-03-14 23:36:27 +11:00
Josh Poimboeuf
5e00d69cde powerpc/cpu: Mark start_secondary_resume() __noreturn
start_secondary_resume() doesn't return.  Annotate it as such.  By
extension this also makes arch_cpu_idle_dead() noreturn.

Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Link: https://lore.kernel.org/r/b6b2141f832d8cd8ade65f190d04b011cda5f9bb.1676358308.git.jpoimboe@kernel.org
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
2023-03-08 08:44:24 -08:00
Linus Torvalds
49d5759268 ARM:
- Provide a virtual cache topology to the guest to avoid
   inconsistencies with migration on heterogenous systems. Non secure
   software has no practical need to traverse the caches by set/way in
   the first place.
 
 - Add support for taking stage-2 access faults in parallel. This was an
   accidental omission in the original parallel faults implementation,
   but should provide a marginal improvement to machines w/o FEAT_HAFDBS
   (such as hardware from the fruit company).
 
 - A preamble to adding support for nested virtualization to KVM,
   including vEL2 register state, rudimentary nested exception handling
   and masking unsupported features for nested guests.
 
 - Fixes to the PSCI relay that avoid an unexpected host SVE trap when
   resuming a CPU when running pKVM.
 
 - VGIC maintenance interrupt support for the AIC
 
 - Improvements to the arch timer emulation, primarily aimed at reducing
   the trap overhead of running nested.
 
 - Add CONFIG_USERFAULTFD to the KVM selftests config fragment in the
   interest of CI systems.
 
 - Avoid VM-wide stop-the-world operations when a vCPU accesses its own
   redistributor.
 
 - Serialize when toggling CPACR_EL1.SMEN to avoid unexpected exceptions
   in the host.
 
 - Aesthetic and comment/kerneldoc fixes
 
 - Drop the vestiges of the old Columbia mailing list and add [Oliver]
   as co-maintainer
 
 This also drags in arm64's 'for-next/sme2' branch, because both it and
 the PSCI relay changes touch the EL2 initialization code.
 
 RISC-V:
 
 - Fix wrong usage of PGDIR_SIZE instead of PUD_SIZE
 
 - Correctly place the guest in S-mode after redirecting a trap to the guest
 
 - Redirect illegal instruction traps to guest
 
 - SBI PMU support for guest
 
 s390:
 
 - Two patches sorting out confusion between virtual and physical
   addresses, which currently are the same on s390.
 
 - A new ioctl that performs cmpxchg on guest memory
 
 - A few fixes
 
 x86:
 
 - Change tdp_mmu to a read-only parameter
 
 - Separate TDP and shadow MMU page fault paths
 
 - Enable Hyper-V invariant TSC control
 
 - Fix a variety of APICv and AVIC bugs, some of them real-world,
   some of them affecting architecurally legal but unlikely to
   happen in practice
 
 - Mark APIC timer as expired if its in one-shot mode and the count
   underflows while the vCPU task was being migrated
 
 - Advertise support for Intel's new fast REP string features
 
 - Fix a double-shootdown issue in the emergency reboot code
 
 - Ensure GIF=1 and disable SVM during an emergency reboot, i.e. give SVM
   similar treatment to VMX
 
 - Update Xen's TSC info CPUID sub-leaves as appropriate
 
 - Add support for Hyper-V's extended hypercalls, where "support" at this
   point is just forwarding the hypercalls to userspace
 
 - Clean up the kvm->lock vs. kvm->srcu sequences when updating the PMU and
   MSR filters
 
 - One-off fixes and cleanups
 
 - Fix and cleanup the range-based TLB flushing code, used when KVM is
   running on Hyper-V
 
 - Add support for filtering PMU events using a mask.  If userspace
   wants to restrict heavily what events the guest can use, it can now
   do so without needing an absurd number of filter entries
 
 - Clean up KVM's handling of "PMU MSRs to save", especially when vPMU
   support is disabled
 
 - Add PEBS support for Intel Sapphire Rapids
 
 - Fix a mostly benign overflow bug in SEV's send|receive_update_data()
 
 - Move several SVM-specific flags into vcpu_svm
 
 x86 Intel:
 
 - Handle NMI VM-Exits before leaving the noinstr region
 
 - A few trivial cleanups in the VM-Enter flows
 
 - Stop enabling VMFUNC for L1 purely to document that KVM doesn't support
   EPTP switching (or any other VM function) for L1
 
 - Fix a crash when using eVMCS's enlighted MSR bitmaps
 
 Generic:
 
 - Clean up the hardware enable and initialization flow, which was
   scattered around multiple arch-specific hooks.  Instead, just
   let the arch code call into generic code.  Both x86 and ARM should
   benefit from not having to fight common KVM code's notion of how
   to do initialization.
 
 - Account allocations in generic kvm_arch_alloc_vm()
 
 - Fix a memory leak if coalesced MMIO unregistration fails
 
 selftests:
 
 - On x86, cache the CPU vendor (AMD vs. Intel) and use the info to emit
   the correct hypercall instruction instead of relying on KVM to patch
   in VMMCALL
 
 - Use TAP interface for kvm_binary_stats_test and tsc_msrs_test
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmP2YA0UHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroPg/Qf+J6nT+TkIa+8Ei+fN1oMTDp4YuIOx
 mXvJ9mRK9sQ+tAUVwvDz3qN/fK5mjsYbRHIDlVc5p2Q3bCrVGDDqXPFfCcLx1u+O
 9U9xjkO4JxD2LS9pc70FYOyzVNeJ8VMGOBbC2b0lkdYZ4KnUc6e/WWFKJs96bK+H
 duo+RIVyaMthnvbTwSv1K3qQb61n6lSJXplywS8KWFK6NZAmBiEFDAWGRYQE9lLs
 VcVcG0iDJNL/BQJ5InKCcvXVGskcCm9erDszPo7w4Bypa4S9AMS42DHUaRZrBJwV
 /WqdH7ckIz7+OSV0W1j+bKTHAFVTCjXYOM7wQykgjawjICzMSnnG9Gpskw==
 =goe1
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "ARM:

   - Provide a virtual cache topology to the guest to avoid
     inconsistencies with migration on heterogenous systems. Non secure
     software has no practical need to traverse the caches by set/way in
     the first place

   - Add support for taking stage-2 access faults in parallel. This was
     an accidental omission in the original parallel faults
     implementation, but should provide a marginal improvement to
     machines w/o FEAT_HAFDBS (such as hardware from the fruit company)

   - A preamble to adding support for nested virtualization to KVM,
     including vEL2 register state, rudimentary nested exception
     handling and masking unsupported features for nested guests

   - Fixes to the PSCI relay that avoid an unexpected host SVE trap when
     resuming a CPU when running pKVM

   - VGIC maintenance interrupt support for the AIC

   - Improvements to the arch timer emulation, primarily aimed at
     reducing the trap overhead of running nested

   - Add CONFIG_USERFAULTFD to the KVM selftests config fragment in the
     interest of CI systems

   - Avoid VM-wide stop-the-world operations when a vCPU accesses its
     own redistributor

   - Serialize when toggling CPACR_EL1.SMEN to avoid unexpected
     exceptions in the host

   - Aesthetic and comment/kerneldoc fixes

   - Drop the vestiges of the old Columbia mailing list and add [Oliver]
     as co-maintainer

  RISC-V:

   - Fix wrong usage of PGDIR_SIZE instead of PUD_SIZE

   - Correctly place the guest in S-mode after redirecting a trap to the
     guest

   - Redirect illegal instruction traps to guest

   - SBI PMU support for guest

  s390:

   - Sort out confusion between virtual and physical addresses, which
     currently are the same on s390

   - A new ioctl that performs cmpxchg on guest memory

   - A few fixes

  x86:

   - Change tdp_mmu to a read-only parameter

   - Separate TDP and shadow MMU page fault paths

   - Enable Hyper-V invariant TSC control

   - Fix a variety of APICv and AVIC bugs, some of them real-world, some
     of them affecting architecurally legal but unlikely to happen in
     practice

   - Mark APIC timer as expired if its in one-shot mode and the count
     underflows while the vCPU task was being migrated

   - Advertise support for Intel's new fast REP string features

   - Fix a double-shootdown issue in the emergency reboot code

   - Ensure GIF=1 and disable SVM during an emergency reboot, i.e. give
     SVM similar treatment to VMX

   - Update Xen's TSC info CPUID sub-leaves as appropriate

   - Add support for Hyper-V's extended hypercalls, where "support" at
     this point is just forwarding the hypercalls to userspace

   - Clean up the kvm->lock vs. kvm->srcu sequences when updating the
     PMU and MSR filters

   - One-off fixes and cleanups

   - Fix and cleanup the range-based TLB flushing code, used when KVM is
     running on Hyper-V

   - Add support for filtering PMU events using a mask. If userspace
     wants to restrict heavily what events the guest can use, it can now
     do so without needing an absurd number of filter entries

   - Clean up KVM's handling of "PMU MSRs to save", especially when vPMU
     support is disabled

   - Add PEBS support for Intel Sapphire Rapids

   - Fix a mostly benign overflow bug in SEV's
     send|receive_update_data()

   - Move several SVM-specific flags into vcpu_svm

  x86 Intel:

   - Handle NMI VM-Exits before leaving the noinstr region

   - A few trivial cleanups in the VM-Enter flows

   - Stop enabling VMFUNC for L1 purely to document that KVM doesn't
     support EPTP switching (or any other VM function) for L1

   - Fix a crash when using eVMCS's enlighted MSR bitmaps

  Generic:

   - Clean up the hardware enable and initialization flow, which was
     scattered around multiple arch-specific hooks. Instead, just let
     the arch code call into generic code. Both x86 and ARM should
     benefit from not having to fight common KVM code's notion of how to
     do initialization

   - Account allocations in generic kvm_arch_alloc_vm()

   - Fix a memory leak if coalesced MMIO unregistration fails

  selftests:

   - On x86, cache the CPU vendor (AMD vs. Intel) and use the info to
     emit the correct hypercall instruction instead of relying on KVM to
     patch in VMMCALL

   - Use TAP interface for kvm_binary_stats_test and tsc_msrs_test"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (325 commits)
  KVM: SVM: hyper-v: placate modpost section mismatch error
  KVM: x86/mmu: Make tdp_mmu_allowed static
  KVM: arm64: nv: Use reg_to_encoding() to get sysreg ID
  KVM: arm64: nv: Only toggle cache for virtual EL2 when SCTLR_EL2 changes
  KVM: arm64: nv: Filter out unsupported features from ID regs
  KVM: arm64: nv: Emulate EL12 register accesses from the virtual EL2
  KVM: arm64: nv: Allow a sysreg to be hidden from userspace only
  KVM: arm64: nv: Emulate PSTATE.M for a guest hypervisor
  KVM: arm64: nv: Add accessors for SPSR_EL1, ELR_EL1 and VBAR_EL1 from virtual EL2
  KVM: arm64: nv: Handle SMCs taken from virtual EL2
  KVM: arm64: nv: Handle trapped ERET from virtual EL2
  KVM: arm64: nv: Inject HVC exceptions to the virtual EL2
  KVM: arm64: nv: Support virtual EL2 exceptions
  KVM: arm64: nv: Handle HCR_EL2.NV system register traps
  KVM: arm64: nv: Add nested virt VCPU primitives for vEL2 VCPU state
  KVM: arm64: nv: Add EL2 system registers to vcpu context
  KVM: arm64: nv: Allow userspace to set PSR_MODE_EL2x
  KVM: arm64: nv: Reset VCPU to EL2 registers if VCPU nested virt is set
  KVM: arm64: nv: Introduce nested virtualization VCPU feature
  KVM: arm64: Use the S2 MMU context to iterate over S2 table
  ...
2023-02-25 11:30:21 -08:00
Linus Torvalds
d0a32f5520 powerpc updates for 6.3
- Support for configuring secure boot with user-defined keys on PowerVM LPARs.
 
  - Simplify the replay of soft-masked IRQs by making it non-recursive.
 
  - Add support for KCSAN on 64-bit Book3S.
 
  - Improvements to the API & code which interacts with RTAS (pseries firmware).
 
  - Change 32-bit powermac to assign PCI bus numbers per domain by default.
 
  - Some improvements to the 32-bit BPF JIT.
 
  - Various other small features and fixes.
 
 Thanks to: Anders Roxell, Andrew Donnellan, Andrew Jeffery, Benjamin Gray, Christophe
 Leroy, Frederic Barrat, Ganesh Goudar, Geoff Levand, Greg Kroah-Hartman, Jan-Benedict
 Glaw, Josh Poimboeuf, Kajol Jain, Laurent Dufour, Mahesh Salgaonkar, Mathieu Desnoyers,
 Mimi Zohar, Murphy Zhou, Nathan Chancellor, Nathan Lynch, Nayna Jain, Nicholas Piggin,
 Pali Rohár, Petr Mladek, Rohan McLure, Russell Currey, Sachin Sant, Sathvika Vasireddy,
 Sourabh Jain, Stefan Berger, Stephen Rothwell, Sudhakar Kuppusamy.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmP4GnkTHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgEnlEAC9UoE9JM853o9ZzpOJDrbYknHsRQad
 ztQJ9xu5qjkFHHryTmWKYdiAtNDFbcfn7+1aoc5FXrIb6BOfvBo/uRFw6P501Qwv
 Fg0MQyWUnT5WrI7+rBE2q+1+FaHBNKLycLNRSh5JpXtuKe2ubQfiFD80tarBnEnU
 6I4bqXd+xjDtnqtpfiYnil/kdZTu/MzntdkmCne6fMkflgEQFU9EVQEnnE+imqFa
 6BuCwITvZ+NyaaU+cYMeGZT7aoz9PAwkksgTxXW2gQbTIApX9WX4kYU/vbW4aHts
 0bpzMmIbSbAklYIu2PQQhSU0bLfKJ+xly8E8tozHgRX6hrFlqvtmD/T5LHTBD11f
 FFzKb0NUCD8qTIy6Hn0M1tj5egLpxxzATPe/kVTkxxqTlZrzdSEaqzft6syyJHJd
 ueo0QN53AUyBaVMtxLbnB/U/8Vnz6rLqY+8dLKzXhjYjoPJqOZh/Qlc1Tk3syPwf
 E2j4H6wFqGMTOGi453Pijkpj3qpNkNT79FG5DmClcQLJxD/EXDyffLZITrkzQa0S
 FEkcMzz/Hn9Hkf7ZuNo4DN6ss6IF0vlxoi7GNr+MRR53/aVQJUDc8z24c4ICl/3w
 20ETk57XMVJzP++Hb+yn16JyAawfQOOlckBRZ2O8W5YYVoes45hxDQxVoh8EII69
 hb3KOGYEqF5wyA==
 =ECNb
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-6.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:

 - Support for configuring secure boot with user-defined keys on PowerVM
   LPARs

 - Simplify the replay of soft-masked IRQs by making it non-recursive

 - Add support for KCSAN on 64-bit Book3S

 - Improvements to the API & code which interacts with RTAS (pseries
   firmware)

 - Change 32-bit powermac to assign PCI bus numbers per domain by
   default

 - Some improvements to the 32-bit BPF JIT

 - Various other small features and fixes

Thanks to Anders Roxell, Andrew Donnellan, Andrew Jeffery, Benjamin
Gray, Christophe Leroy, Frederic Barrat, Ganesh Goudar, Geoff Levand,
Greg Kroah-Hartman, Jan-Benedict Glaw, Josh Poimboeuf, Kajol Jain,
Laurent Dufour, Mahesh Salgaonkar, Mathieu Desnoyers, Mimi Zohar, Murphy
Zhou, Nathan Chancellor, Nathan Lynch, Nayna Jain, Nicholas Piggin, Pali
Rohár, Petr Mladek, Rohan McLure, Russell Currey, Sachin Sant, Sathvika
Vasireddy, Sourabh Jain, Stefan Berger, Stephen Rothwell, and Sudhakar
Kuppusamy.

* tag 'powerpc-6.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (114 commits)
  powerpc/pseries: Avoid hcall in plpks_is_available() on non-pseries
  powerpc: dts: turris1x.dts: Set lower priority for CPLD syscon-reboot
  powerpc/e500: Add missing prototype for 'relocate_init'
  powerpc/64: Fix unannotated intra-function call warning
  powerpc/epapr: Don't use wrteei on non booke
  powerpc: Pass correct CPU reference to assembler
  powerpc/mm: Rearrange if-else block to avoid clang warning
  powerpc/nohash: Fix build with llvm-as
  powerpc/nohash: Fix build error with binutils >= 2.38
  powerpc/pseries: Fix endianness issue when parsing PLPKS secvar flags
  macintosh: windfarm: Use unsigned type for 1-bit bitfields
  powerpc/kexec_file: print error string on usable memory property update failure
  powerpc/machdep: warn when machine_is() used too early
  powerpc/64: Replace -mcpu=e500mc64 by -mcpu=e5500
  powerpc/eeh: Set channel state after notifying the drivers
  selftests/powerpc: Fix incorrect kernel headers search path
  powerpc/rtas: arch-wide function token lookup conversions
  powerpc/rtas: introduce rtas_function_token() API
  powerpc/pseries/lpar: convert to papr_sysparm API
  powerpc/pseries/hv-24x7: convert to papr_sysparm API
  ...
2023-02-25 11:00:06 -08:00
Linus Torvalds
a93e884edf Driver core changes for 6.3-rc1
Here is the large set of driver core changes for 6.3-rc1.
 
 There's a lot of changes this development cycle, most of the work falls
 into two different categories:
   - fw_devlink fixes and updates.  This has gone through numerous review
     cycles and lots of review and testing by lots of different devices.
     Hopefully all should be good now, and Saravana will be keeping a
     watch for any potential regression on odd embedded systems.
   - driver core changes to work to make struct bus_type able to be moved
     into read-only memory (i.e. const)  The recent work with Rust has
     pointed out a number of areas in the driver core where we are
     passing around and working with structures that really do not have
     to be dynamic at all, and they should be able to be read-only making
     things safer overall.  This is the contuation of that work (started
     last release with kobject changes) in moving struct bus_type to be
     constant.  We didn't quite make it for this release, but the
     remaining patches will be finished up for the release after this
     one, but the groundwork has been laid for this effort.
 
 Other than that we have in here:
   - debugfs memory leak fixes in some subsystems
   - error path cleanups and fixes for some never-able-to-be-hit
     codepaths.
   - cacheinfo rework and fixes
   - Other tiny fixes, full details are in the shortlog
 
 All of these have been in linux-next for a while with no reported
 problems.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCY/ipdg8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ynL3gCgwzbcWu0So3piZyLiJKxsVo9C2EsAn3sZ9gN6
 6oeFOjD3JDju3cQsfGgd
 =Su6W
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updates from Greg KH:
 "Here is the large set of driver core changes for 6.3-rc1.

  There's a lot of changes this development cycle, most of the work
  falls into two different categories:

   - fw_devlink fixes and updates. This has gone through numerous review
     cycles and lots of review and testing by lots of different devices.
     Hopefully all should be good now, and Saravana will be keeping a
     watch for any potential regression on odd embedded systems.

   - driver core changes to work to make struct bus_type able to be
     moved into read-only memory (i.e. const) The recent work with Rust
     has pointed out a number of areas in the driver core where we are
     passing around and working with structures that really do not have
     to be dynamic at all, and they should be able to be read-only
     making things safer overall. This is the contuation of that work
     (started last release with kobject changes) in moving struct
     bus_type to be constant. We didn't quite make it for this release,
     but the remaining patches will be finished up for the release after
     this one, but the groundwork has been laid for this effort.

  Other than that we have in here:

   - debugfs memory leak fixes in some subsystems

   - error path cleanups and fixes for some never-able-to-be-hit
     codepaths.

   - cacheinfo rework and fixes

   - Other tiny fixes, full details are in the shortlog

  All of these have been in linux-next for a while with no reported
  problems"

[ Geert Uytterhoeven points out that that last sentence isn't true, and
  that there's a pending report that has a fix that is queued up - Linus ]

* tag 'driver-core-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (124 commits)
  debugfs: drop inline constant formatting for ERR_PTR(-ERROR)
  OPP: fix error checking in opp_migrate_dentry()
  debugfs: update comment of debugfs_rename()
  i3c: fix device.h kernel-doc warnings
  dma-mapping: no need to pass a bus_type into get_arch_dma_ops()
  driver core: class: move EXPORT_SYMBOL_GPL() lines to the correct place
  Revert "driver core: add error handling for devtmpfs_create_node()"
  Revert "devtmpfs: add debug info to handle()"
  Revert "devtmpfs: remove return value of devtmpfs_delete_node()"
  driver core: cpu: don't hand-override the uevent bus_type callback.
  devtmpfs: remove return value of devtmpfs_delete_node()
  devtmpfs: add debug info to handle()
  driver core: add error handling for devtmpfs_create_node()
  driver core: bus: update my copyright notice
  driver core: bus: add bus_get_dev_root() function
  driver core: bus: constify bus_unregister()
  driver core: bus: constify some internal functions
  driver core: bus: constify bus_get_kset()
  driver core: bus: constify bus_register/unregister_notifier()
  driver core: remove private pointer from struct bus_type
  ...
2023-02-24 12:58:55 -08:00
Linus Torvalds
3822a7c409 - Daniel Verkamp has contributed a memfd series ("mm/memfd: add
F_SEAL_EXEC") which permits the setting of the memfd execute bit at
   memfd creation time, with the option of sealing the state of the X bit.
 
 - Peter Xu adds a patch series ("mm/hugetlb: Make huge_pte_offset()
   thread-safe for pmd unshare") which addresses a rare race condition
   related to PMD unsharing.
 
 - Several folioification patch serieses from Matthew Wilcox, Vishal
   Moola, Sidhartha Kumar and Lorenzo Stoakes
 
 - Johannes Weiner has a series ("mm: push down lock_page_memcg()") which
   does perform some memcg maintenance and cleanup work.
 
 - SeongJae Park has added DAMOS filtering to DAMON, with the series
   "mm/damon/core: implement damos filter".  These filters provide users
   with finer-grained control over DAMOS's actions.  SeongJae has also done
   some DAMON cleanup work.
 
 - Kairui Song adds a series ("Clean up and fixes for swap").
 
 - Vernon Yang contributed the series "Clean up and refinement for maple
   tree".
 
 - Yu Zhao has contributed the "mm: multi-gen LRU: memcg LRU" series.  It
   adds to MGLRU an LRU of memcgs, to improve the scalability of global
   reclaim.
 
 - David Hildenbrand has added some userfaultfd cleanup work in the
   series "mm: uffd-wp + change_protection() cleanups".
 
 - Christoph Hellwig has removed the generic_writepages() library
   function in the series "remove generic_writepages".
 
 - Baolin Wang has performed some maintenance on the compaction code in
   his series "Some small improvements for compaction".
 
 - Sidhartha Kumar is doing some maintenance work on struct page in his
   series "Get rid of tail page fields".
 
 - David Hildenbrand contributed some cleanup, bugfixing and
   generalization of pte management and of pte debugging in his series "mm:
   support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with swap
   PTEs".
 
 - Mel Gorman and Neil Brown have removed the __GFP_ATOMIC allocation
   flag in the series "Discard __GFP_ATOMIC".
 
 - Sergey Senozhatsky has improved zsmalloc's memory utilization with his
   series "zsmalloc: make zspage chain size configurable".
 
 - Joey Gouly has added prctl() support for prohibiting the creation of
   writeable+executable mappings.  The previous BPF-based approach had
   shortcomings.  See "mm: In-kernel support for memory-deny-write-execute
   (MDWE)".
 
 - Waiman Long did some kmemleak cleanup and bugfixing in the series
   "mm/kmemleak: Simplify kmemleak_cond_resched() & fix UAF".
 
 - T.J.  Alumbaugh has contributed some MGLRU cleanup work in his series
   "mm: multi-gen LRU: improve".
 
 - Jiaqi Yan has provided some enhancements to our memory error
   statistics reporting, mainly by presenting the statistics on a per-node
   basis.  See the series "Introduce per NUMA node memory error
   statistics".
 
 - Mel Gorman has a second and hopefully final shot at fixing a CPU-hog
   regression in compaction via his series "Fix excessive CPU usage during
   compaction".
 
 - Christoph Hellwig does some vmalloc maintenance work in the series
   "cleanup vfree and vunmap".
 
 - Christoph Hellwig has removed block_device_operations.rw_page() in ths
   series "remove ->rw_page".
 
 - We get some maple_tree improvements and cleanups in Liam Howlett's
   series "VMA tree type safety and remove __vma_adjust()".
 
 - Suren Baghdasaryan has done some work on the maintainability of our
   vm_flags handling in the series "introduce vm_flags modifier functions".
 
 - Some pagemap cleanup and generalization work in Mike Rapoport's series
   "mm, arch: add generic implementation of pfn_valid() for FLATMEM" and
   "fixups for generic implementation of pfn_valid()"
 
 - Baoquan He has done some work to make /proc/vmallocinfo and
   /proc/kcore better represent the real state of things in his series
   "mm/vmalloc.c: allow vread() to read out vm_map_ram areas".
 
 - Jason Gunthorpe rationalized the GUP system's interface to the rest of
   the kernel in the series "Simplify the external interface for GUP".
 
 - SeongJae Park wishes to migrate people from DAMON's debugfs interface
   over to its sysfs interface.  To support this, we'll temporarily be
   printing warnings when people use the debugfs interface.  See the series
   "mm/damon: deprecate DAMON debugfs interface".
 
 - Andrey Konovalov provided the accurately named "lib/stackdepot: fixes
   and clean-ups" series.
 
 - Huang Ying has provided a dramatic reduction in migration's TLB flush
   IPI rates with the series "migrate_pages(): batch TLB flushing".
 
 - Arnd Bergmann has some objtool fixups in "objtool warning fixes".
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY/PoPQAKCRDdBJ7gKXxA
 jlvpAPsFECUBBl20qSue2zCYWnHC7Yk4q9ytTkPB/MMDrFEN9wD/SNKEm2UoK6/K
 DmxHkn0LAitGgJRS/W9w81yrgig9tAQ=
 =MlGs
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - Daniel Verkamp has contributed a memfd series ("mm/memfd: add
   F_SEAL_EXEC") which permits the setting of the memfd execute bit at
   memfd creation time, with the option of sealing the state of the X
   bit.

 - Peter Xu adds a patch series ("mm/hugetlb: Make huge_pte_offset()
   thread-safe for pmd unshare") which addresses a rare race condition
   related to PMD unsharing.

 - Several folioification patch serieses from Matthew Wilcox, Vishal
   Moola, Sidhartha Kumar and Lorenzo Stoakes

 - Johannes Weiner has a series ("mm: push down lock_page_memcg()")
   which does perform some memcg maintenance and cleanup work.

 - SeongJae Park has added DAMOS filtering to DAMON, with the series
   "mm/damon/core: implement damos filter".

   These filters provide users with finer-grained control over DAMOS's
   actions. SeongJae has also done some DAMON cleanup work.

 - Kairui Song adds a series ("Clean up and fixes for swap").

 - Vernon Yang contributed the series "Clean up and refinement for maple
   tree".

 - Yu Zhao has contributed the "mm: multi-gen LRU: memcg LRU" series. It
   adds to MGLRU an LRU of memcgs, to improve the scalability of global
   reclaim.

 - David Hildenbrand has added some userfaultfd cleanup work in the
   series "mm: uffd-wp + change_protection() cleanups".

 - Christoph Hellwig has removed the generic_writepages() library
   function in the series "remove generic_writepages".

 - Baolin Wang has performed some maintenance on the compaction code in
   his series "Some small improvements for compaction".

 - Sidhartha Kumar is doing some maintenance work on struct page in his
   series "Get rid of tail page fields".

 - David Hildenbrand contributed some cleanup, bugfixing and
   generalization of pte management and of pte debugging in his series
   "mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with
   swap PTEs".

 - Mel Gorman and Neil Brown have removed the __GFP_ATOMIC allocation
   flag in the series "Discard __GFP_ATOMIC".

 - Sergey Senozhatsky has improved zsmalloc's memory utilization with
   his series "zsmalloc: make zspage chain size configurable".

 - Joey Gouly has added prctl() support for prohibiting the creation of
   writeable+executable mappings.

   The previous BPF-based approach had shortcomings. See "mm: In-kernel
   support for memory-deny-write-execute (MDWE)".

 - Waiman Long did some kmemleak cleanup and bugfixing in the series
   "mm/kmemleak: Simplify kmemleak_cond_resched() & fix UAF".

 - T.J. Alumbaugh has contributed some MGLRU cleanup work in his series
   "mm: multi-gen LRU: improve".

 - Jiaqi Yan has provided some enhancements to our memory error
   statistics reporting, mainly by presenting the statistics on a
   per-node basis. See the series "Introduce per NUMA node memory error
   statistics".

 - Mel Gorman has a second and hopefully final shot at fixing a CPU-hog
   regression in compaction via his series "Fix excessive CPU usage
   during compaction".

 - Christoph Hellwig does some vmalloc maintenance work in the series
   "cleanup vfree and vunmap".

 - Christoph Hellwig has removed block_device_operations.rw_page() in
   ths series "remove ->rw_page".

 - We get some maple_tree improvements and cleanups in Liam Howlett's
   series "VMA tree type safety and remove __vma_adjust()".

 - Suren Baghdasaryan has done some work on the maintainability of our
   vm_flags handling in the series "introduce vm_flags modifier
   functions".

 - Some pagemap cleanup and generalization work in Mike Rapoport's
   series "mm, arch: add generic implementation of pfn_valid() for
   FLATMEM" and "fixups for generic implementation of pfn_valid()"

 - Baoquan He has done some work to make /proc/vmallocinfo and
   /proc/kcore better represent the real state of things in his series
   "mm/vmalloc.c: allow vread() to read out vm_map_ram areas".

 - Jason Gunthorpe rationalized the GUP system's interface to the rest
   of the kernel in the series "Simplify the external interface for
   GUP".

 - SeongJae Park wishes to migrate people from DAMON's debugfs interface
   over to its sysfs interface. To support this, we'll temporarily be
   printing warnings when people use the debugfs interface. See the
   series "mm/damon: deprecate DAMON debugfs interface".

 - Andrey Konovalov provided the accurately named "lib/stackdepot: fixes
   and clean-ups" series.

 - Huang Ying has provided a dramatic reduction in migration's TLB flush
   IPI rates with the series "migrate_pages(): batch TLB flushing".

 - Arnd Bergmann has some objtool fixups in "objtool warning fixes".

* tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (505 commits)
  include/linux/migrate.h: remove unneeded externs
  mm/memory_hotplug: cleanup return value handing in do_migrate_range()
  mm/uffd: fix comment in handling pte markers
  mm: change to return bool for isolate_movable_page()
  mm: hugetlb: change to return bool for isolate_hugetlb()
  mm: change to return bool for isolate_lru_page()
  mm: change to return bool for folio_isolate_lru()
  objtool: add UACCESS exceptions for __tsan_volatile_read/write
  kmsan: disable ftrace in kmsan core code
  kasan: mark addr_has_metadata __always_inline
  mm: memcontrol: rename memcg_kmem_enabled()
  sh: initialize max_mapnr
  m68k/nommu: add missing definition of ARCH_PFN_OFFSET
  mm: percpu: fix incorrect size in pcpu_obj_full_size()
  maple_tree: reduce stack usage with gcc-9 and earlier
  mm: page_alloc: call panic() when memoryless node allocation fails
  mm: multi-gen LRU: avoid futile retries
  migrate_pages: move THP/hugetlb migration support check to simplify code
  migrate_pages: batch flushing TLB
  migrate_pages: share more code between _unmap and _move
  ...
2023-02-23 17:09:35 -08:00
Linus Torvalds
db77b8502a asm-generic: cleanups for 6.3
Only three minor changes: a cross-platform series from Mike Rapoport to
 consolidate asm/agp.h between architectures, and a correctness change
 for __generic_cmpxchg_local() from Matt Evans.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAmPvuk8ACgkQmmx57+YA
 GNmbHw/9GbG2s+rUXOKZCx/ChA11aawJ2K7FUB8zNAb2TSInxgV8/RFdQyTgcmi7
 lh7TFOSiWWw0TvYGPz8gyP70vqGM6SEfepB9Kx5Wnb8VrXAEDX80Y1PrFojp5emF
 CYUXIzvT5XrbCLJFOpsxjEK4BB3DBfujosZZHBxx2UzUdme4lwL2vjzDmfbMpfRy
 N/TiqW96I3E9TPvqac567jmq4ghrhnFAuD3fqAndCpv0ANtZT/iNaROAgTiEOCUL
 azUoe6e6W+oIkV9tnzwaAtIBs5pkdt0DmPymxCvschpmLuh952YfJxuu6dwwl6Ue
 DLVntWRYmRBgfxi8e4DbRURa5rFnj7xE+ZgsszvJJyditCHWuw4DWU4eI3SzDSV1
 YVTjhDGoIBQYpMPeNMEaDfmMC6h7b+fP1zDwBA1mQlpS/YQJGntQ5jU6p+46ceFG
 ZfoniYOfEjwJlJA6G5yTGcro4Z1U7ghg7rvp/iTvAVM+5T3hEoLbDcI1jZSTXQB7
 JTi6LzdQVsqdQQReNAtpcB3V9l5OT8ZeFeMqd5b4i7pEs5SteUjTa23Mj+O7fmM1
 LLoeLb3X5N9DiMRaOEjJAfsS/aEsC+whf6qIl81s22XnZEQ4h3BtBrNQY6eP8mAD
 rtojRnWCJI2vYVyQTIWXN91f2cqRww4J22GZHn9a1DdMHUDCSoM=
 =3646
 -----END PGP SIGNATURE-----

Merge tag 'asm-generic-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic

Pull asm-generic cleanups from Arnd Bergmann:
 "Only three minor changes: a cross-platform series from Mike Rapoport
  to consolidate asm/agp.h between architectures, and a correctness
  change for __generic_cmpxchg_local() from Matt Evans"

* tag 'asm-generic-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
  char/agp: introduce asm-generic/agp.h
  char/agp: consolidate {alloc,free}_gatt_pages()
  locking/atomic: cmpxchg: Make __generic_cmpxchg_local compare against zero-extended 'old' value
2023-02-20 15:55:47 -08:00
Benjamin Gray
4302abc628 powerpc/64s: Prevent fallthrough to hash TLB flush when using radix
In the fix reconnecting hash__tlb_flush() to tlb_flush() the
void return on radix__tlb_flush() was not restored and subsequently
falls through to the restored hash__tlb_flush().

Guard hash__tlb_flush() under an else to prevent this.

Fixes: 1665c027af ("powerpc/64s: Reconnect tlb_flush() to hash__tlb_flush()")
Reported-by: "Erhard F." <erhard_f@mailbox.org>
Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230217011434.115554-1-bgray@linux.ibm.com
2023-02-17 12:30:56 +11:00
Paolo Bonzini
33436335e9 KVM/riscv changes for 6.3
- Fix wrong usage of PGDIR_SIZE to check page sizes
 - Fix privilege mode setting in kvm_riscv_vcpu_trap_redirect()
 - Redirect illegal instruction traps to guest
 - SBI PMU support for guest
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEZdn75s5e6LHDQ+f/rUjsVaLHLAcFAmPifFIACgkQrUjsVaLH
 LAcEyxAAinMBaBhiPmwWZQvcCzh/UFmJo8BQCwAPuwoc/a4ZGAR7ylzd0oJilP8M
 wSgX6Ad8XF+CEW2VpxW9nwyi41N25ep1Lrf8vOaWy9L9QNUo0t15WrCIbXT2p399
 HrK9fz7HHKKIMsJy+rYb9EepdmMf55xtr1Y/EjyvhoDQbrEMlKsAODYz/SUoriQG
 Tn3cCYBzLdvzDzu0xXM9v+nsetWXdajK/v4je+mE3NQceXhePAO4oVWP4IpnoROd
 ZQm3evvVdf0WtKG9curxwMB7jjBqDBFrcLYl0qHGa7pi2o5PzVM7esgaV47KwetH
 IgA/Mrf1IfzpgM7VYDDax5wUHlKj63KisqU0J8rU3PUloQXaWqv7+ho51t9GzZ/i
 9x4uyO/evVntgyTw6HCbqmQJDgEtJiG1ydrR/ydBMYHLnh7LPY2UpKgcqmirtbkK
 1/DYDp84vikQ5VW1hc8IACdoBShh9Moh4xsEStzkTrIeHcZCjtORXUh8UIPZ0Mu2
 7Mnkktu9I55SLwA3rwH/EYT1ISrOV1G+q3wfqgeLpn8YUWwCIiqWQ5Ur0/WSMJse
 uJ3HedZDzj9T4n4khX+mKEYh6joAafQZag+4TID2lRSwd0S/mpeC22hYrViMdDmq
 yhE+JNin/sz4AVaHNzGwfqk2NC2RFl9aRn2X0xTwyBubif9pKMQ=
 =spUL
 -----END PGP SIGNATURE-----

Merge tag 'kvm-riscv-6.3-1' of https://github.com/kvm-riscv/linux into HEAD

KVM/riscv changes for 6.3

- Fix wrong usage of PGDIR_SIZE to check page sizes
- Fix privilege mode setting in kvm_riscv_vcpu_trap_redirect()
- Redirect illegal instruction traps to guest
- SBI PMU support for guest
2023-02-15 12:33:28 -05:00
Nathan Lynch
388defd5e4 powerpc/machdep: warn when machine_is() used too early
machine_is() can't provide correct results before probe_machine() has
run. Warn when it's used too early in boot, placing the WARN_ON() in a
helper function so the reported file:line indicates exactly what went
wrong.

checkpatch complains about __attribute__((weak)) in the patch, so
change that to __weak, and align the line continuations as well.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230210-warn-on-machine-is-before-probe-machine-v2-1-b57f8243c51c@linux.ibm.com
2023-02-15 22:41:11 +11:00
Mike Rapoport
a13408c205 char/agp: introduce asm-generic/agp.h
There are several architectures that duplicate definitions of
map_page_into_agp(), unmap_page_from_agp() and flush_agp_cache().

Define those in asm-generic/agp.h and use it instead of duplicated
per-architecture headers.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2023-02-13 22:13:29 +01:00
Mike Rapoport
0e4f2c4567 char/agp: consolidate {alloc,free}_gatt_pages()
There is a copy of alloc_gatt_pages() and free_gatt_pages in several
architectures in arch/$ARCH/include/asm/agp.h. All the copies do exactly
the same: alias alloc_gatt_pages() to __get_free_pages(GFP_KERNEL) and
alias free_gatt_pages() to free_pages().

Define alloc_gatt_pages() and free_gatt_pages() in drivers/char/agp/agp.h
and drop per-architecture definitions.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2023-02-13 22:13:12 +01:00
Nathan Lynch
716bfc97bd powerpc/rtas: introduce rtas_function_token() API
Users of rtas_token() supply a string argument that can't be validated
at build time. A typo or misspelling has to be caught by inspection or
by observing wrong behavior at runtime.

Since the core RTAS code now has consolidated the names of all
possible RTAS functions and mapped them to their tokens, token lookup
can be implemented using symbolic constants to index a static array.

So introduce rtas_function_token(), a replacement API which does that,
along with a rtas_service_present()-equivalent helper,
rtas_function_implemented(). Callers supply an opaque predefined
function handle which is used internally to index the function
table. Typos or other inappropriate arguments yield build errors, and
the function handle is a type that can't be easily confused with RTAS
tokens or other integer types.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230125-b4-powerpc-rtas-queue-v3-19-26929c8cce78@linux.ibm.com
2023-02-13 22:35:03 +11:00
Nathan Lynch
419e27f32b powerpc/pseries: PAPR system parameter API
Introduce a set of APIs for retrieving and updating PAPR system
parameters. This encapsulates the toil of temporary RTAS work area
management, RTAS function call retries, and translation of RTAS call
statuses to conventional error values.

There are several places in the kernel that already retrieve system
parameters by calling the RTAS ibm,get-system-parameter function
directly. These will be converted to papr_sysparm_get() in changes to
follow.

As for updating system parameters, current practice is to use
sys_rtas() from user space; there are no in-kernel users of the RTAS
ibm,set-system-parameter function. However this will become deprecated
in time because it is not compatible with lockdown.

The papr_sysparm_* APIs will form the common basis for in-kernel
and user space access to system parameters. The code to expose the
set/get capabilities to user space will follow.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230125-b4-powerpc-rtas-queue-v3-14-26929c8cce78@linux.ibm.com
2023-02-13 22:35:03 +11:00
Nathan Lynch
43033bc62d powerpc/pseries: add RTAS work area allocator
Various pseries-specific RTAS functions take a temporary "work area"
parameter - a buffer in memory accessible to RTAS. Typically such
functions are passed the statically allocated rtas_data_buf buffer as
the argument. This buffer is protected by a global spinlock. So users
of rtas_data_buf cannot perform sleeping operations while accessing
the buffer.

Most RTAS functions that have a work area parameter can return a
status (-2/990x) that indicates that the caller should retry. Before
retrying, the caller may need to reschedule or sleep (see
rtas_busy_delay() for details). This combination of factors
leads to uncomfortable constructions like this:

	do {
		spin_lock(&rtas_data_buf_lock);
		rc = rtas_call(token, __pa(rtas_data_buf, ...);
		if (rc == 0) {
			/* parse or copy out rtas_data_buf contents */
		}
		spin_unlock(&rtas_data_buf_lock);
	} while (rtas_busy_delay(rc));

Another unfortunately common way of handling this is for callers to
blithely ignore the possibility of a -2/990x status and hope for the
best.

If users were allowed to perform blocking operations while owning a
work area, the programming model would become less tedious and
error-prone. Users could schedule away, sleep, or perform other
blocking operations without having to release and re-acquire
resources.

We could continue to use a single work area buffer, and convert
rtas_data_buf_lock to a mutex. But that would impose an unnecessarily
coarse serialization on all users. As awkward as the current design
is, it prevents longer running operations that need to repeatedly use
rtas_data_buf from blocking the progress of others.

There are more considerations. One is that while 4KB is fine for all
current in-kernel uses, some RTAS calls can take much smaller buffers,
and some (VPD, platform dumps) would likely benefit from larger
ones. Another is that at least one RTAS function (ibm,get-vpd)
has *two* work area parameters. And finally, we should expect the
number of work area users in the kernel to increase over time as we
introduce lockdown-compatible ABIs to replace less safe use cases
based on sys_rtas/librtas.

So a special-purpose allocator for RTAS work area buffers seems worth
trying.

Properties:

* The backing memory for the allocator is reserved early in boot in
  order to satisfy RTAS addressing requirements, and then managed with
  genalloc.
* Allocations can block, but they never fail (mempool-like).
* Prioritizes first-come, first-serve fairness over throughput.
* Early boot allocations before the allocator has been initialized are
  served via an internal static buffer.

Intended to replace rtas_data_buf. New code that needs RTAS work area
buffers should prefer this API.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230125-b4-powerpc-rtas-queue-v3-12-26929c8cce78@linux.ibm.com
2023-02-13 22:35:02 +11:00
Nathan Lynch
2c81ca7fba powerpc/tracing: tracepoints for RTAS entry and exit
Add two sets of tracepoints to be used around RTAS entry:

* rtas_input/rtas_output, which emit the function name, its inputs,
  the returned status, and any other outputs. These produce an API-level
  record of OS<->RTAS activity.

* rtas_ll_entry/rtas_ll_exit, which are lower-level and emit the
  entire contents of the parameter block (aka rtas_args) on entry and
  exit. Likely useful only for debugging.

With uses of these tracepoints in do_enter_rtas() to be added in the
following patch, examples of get-time-of-day and event-scan functions
as rendered by trace-cmd (with some multi-line formatting manually
imposed on the rtas_ll_* entries to avoid extremely long lines in the
commit message):

cat-36800 [059]  4978.518303: rtas_input:           get-time-of-day arguments:
cat-36800 [059]  4978.518306: rtas_ll_entry:        token=3 nargs=0 nret=8
                                                    params: [0]=0x00000000 [1]=0x00000000 [2]=0x00000000 [3]=0x00000000
                                                            [4]=0x00000000 [5]=0x00000000 [6]=0x00000000 [7]=0x00000000
							    [8]=0x00000000 [9]=0x00000000 [10]=0x00000000 [11]=0x00000000
							    [12]=0x00000000 [13]=0x00000000 [14]=0x00000000 [15]=0x00000000
cat-36800 [059]  4978.518366: rtas_ll_exit:         token=3 nargs=0 nret=8
                                                    params: [0]=0x00000000 [1]=0x000007e6 [2]=0x0000000b [3]=0x00000001
						            [4]=0x00000000 [5]=0x0000000e [6]=0x00000008 [7]=0x2e0dac40
							    [8]=0x00000000 [9]=0x00000000 [10]=0x00000000 [11]=0x00000000
							    [12]=0x00000000 [13]=0x00000000 [14]=0x00000000 [15]=0x00000000
cat-36800 [059]  4978.518366: rtas_output:          get-time-of-day status: 0, other outputs: 2022 11 1 0 14 8 772648000

kworker/39:1-336   [039]  4982.731623: rtas_input:           event-scan arguments: 4294967295 0 80484920 2048
kworker/39:1-336   [039]  4982.731626: rtas_ll_entry:        token=6 nargs=4 nret=1
                                                             params: [0]=0xffffffff [1]=0x00000000 [2]=0x04cc1a38 [3]=0x00000800
							             [4]=0x00000000 [5]=0x0000000e [6]=0x00000008 [7]=0x2e0dac40
								     [8]=0x00000000 [9]=0x00000000 [10]=0x00000000 [11]=0x00000000
								     [12]=0x00000000 [13]=0x00000000 [14]=0x00000000 [15]=0x00000000
kworker/39:1-336   [039]  4982.731676: rtas_ll_exit:         token=6 nargs=4 nret=1
                                                             params: [0]=0xffffffff [1]=0x00000000 [2]=0x04cc1a38 [3]=0x00000800
							             [4]=0x00000001 [5]=0x0000000e [6]=0x00000008 [7]=0x2e0dac40
								     [8]=0x00000000 [9]=0x00000000 [10]=0x00000000 [11]=0x00000000
								     [12]=0x00000000 [13]=0x00000000 [14]=0x00000000 [15]=0x00000000
kworker/39:1-336   [039]  4982.731677: rtas_output:          event-scan status: 1, other outputs:

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230125-b4-powerpc-rtas-queue-v3-10-26929c8cce78@linux.ibm.com
2023-02-13 22:35:02 +11:00
Nathan Lynch
8252b88294 powerpc/rtas: improve function information lookups
The core RTAS support code and its clients perform two types of lookup
for RTAS firmware function information.

First, mapping a known function name to a token. The typical use case
invokes rtas_token() to retrieve the token value to pass to
rtas_call(). rtas_token() relies on of_get_property(), which performs
a linear search of the /rtas node's property list under a lock with
IRQs disabled.

Second, and less common: given a token value, looking up some
information about the function. The primary example is the sys_rtas
filter path, which linearly scans a small table to match the token to
a rtas_filter struct. Another use case to come is RTAS entry/exit
tracepoints, which will require efficient lookup of function names
from token values. Currently there is no general API for this.

We need something much like the existing rtas_filters table, but more
general and organized to facilitate efficient lookups.

Introduce:

* A new rtas_function type, aggregating function name, token,
  and filter. Other function characteristics could be added in the
  future.

* An array of rtas_function, where each element corresponds to a known
  RTAS function. All information in the table is static save the token
  values, which are derived from the device tree at boot. The array is
  sorted by function name to allow binary search.

* A named constant for each known RTAS function, used to index the
  function array. These also will be used in a client-facing API to be
  added later.

* An xarray that maps valid tokens to rtas_function objects.

Fold the existing rtas_filter table into the new rtas_function array,
with the appropriate adjustments to block_rtas_call(). Remove
now-redundant fields from struct rtas_filter. Preserve the function of
the CONFIG_CPU_BIG_ENDIAN guard in the current filter table by
introducing a per-function flag that is set for the function entries
related to pseries LPAR migration. These have never had working users
via sys_rtas on ppc64le; see commit de0f7349a0 ("powerpc/rtas:
prevent suspend-related sys_rtas use on LE").

Convert rtas_token() to use a lockless binary search on the function
table. Fall back to the old behavior for lookups against names that
are not known to be RTAS functions, but issue a warning. rtas_token()
is for function names; it is not a general facility for accessing
arbitrary properties of the /rtas node. All known misuses of
rtas_token() have been converted to more appropriate of_ APIs in
preceding changes.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230125-b4-powerpc-rtas-queue-v3-8-26929c8cce78@linux.ibm.com
2023-02-13 22:35:02 +11:00
Russell Currey
91361b5175 powerpc/pseries: Pass PLPKS password on kexec
Before interacting with the PLPKS, we ask the hypervisor to generate a
password for the current boot, which is then required for most further
PLPKS operations.

If we kexec into a new kernel, the new kernel will try and fail to
generate a new password, as the password has already been set.

Pass the password through to the new kernel via the device tree, in
/chosen/ibm,plpks-pw. Check for the presence of this property before
trying to generate a new password - if it exists, use the existing
password and remove it from the device tree.

This only works with the kexec_file_load() syscall, not the older
kexec_load() syscall, however if you're using Secure Boot then you want
to be using kexec_file_load() anyway.

Signed-off-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230210080401.345462-24-ajd@linux.ibm.com
2023-02-12 22:12:39 +11:00
Russell Currey
9ee76bd5c7 powerpc/pseries: Add helper to get PLPKS password length
Add helper function to get the PLPKS password length. This will be used
in a later patch to support passing the password between kernels over
kexec.

Signed-off-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230210080401.345462-23-ajd@linux.ibm.com
2023-02-12 22:12:39 +11:00
Andrew Donnellan
0cf2cc1fe4 powerpc/pseries: Make caller pass buffer to plpks_read_var()
Currently, plpks_read_var() allocates a buffer to pass to the
H_PKS_READ_OBJECT hcall, then allocates another buffer into which the data
is copied, and returns that buffer to the caller.

This is a bit over the top - while we probably still want to allocate a
separate buffer to pass to the hypervisor in the hcall, we can let the
caller allocate the final buffer and specify the size.

Don't allocate var->data in plpks_read_var(), instead expect the caller to
allocate it. If the caller needs to discover the size, it can set
var->data to NULL and var->datalen will be populated. Update header file
to document this.

Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Russell Currey <ruscur@russell.cc>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230210080401.345462-20-ajd@linux.ibm.com
2023-02-12 22:12:38 +11:00
Nayna Jain
899d9b8fee powerpc/pseries: Implement signed update for PLPKS objects
The Platform Keystore provides a signed update interface which can be used
to create, replace or append to certain variables in the PKS in a secure
fashion, with the hypervisor requiring that the update be signed using the
Platform Key.

Implement an interface to the H_PKS_SIGNED_UPDATE hcall in the plpks
driver to allow signed updates to PKS objects.

(The plpks driver doesn't need to do any cryptography or otherwise handle
the actual signed variable contents - that will be handled by userspace
tooling.)

Signed-off-by: Nayna Jain <nayna@linux.ibm.com>
[ajd: split patch, add timeout handling and misc cleanups]
Co-developed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Russell Currey <ruscur@russell.cc>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230210080401.345462-18-ajd@linux.ibm.com
2023-02-12 22:12:38 +11:00
Nayna Jain
119da30d03 powerpc/pseries: Expose PLPKS config values, support additional fields
The plpks driver uses the H_PKS_GET_CONFIG hcall to retrieve configuration
and status information about the PKS from the hypervisor.

Update _plpks_get_config() to handle some additional fields. Add getter
functions to allow the PKS configuration information to be accessed from
other files. Validate that the values we're getting comply with the spec.

While we're here, move the config struct in _plpks_get_config() off the
stack - it's getting large and we also need to make sure it doesn't cross
a page boundary.

Signed-off-by: Nayna Jain <nayna@linux.ibm.com>
[ajd: split patch, extend to support additional v3 API fields, minor fixes]
Co-developed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Russell Currey <ruscur@russell.cc>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230210080401.345462-17-ajd@linux.ibm.com
2023-02-12 22:12:38 +11:00
Russell Currey
3def7a3e7c powerpc/pseries: Move PLPKS constants to header file
Move the constants defined in plpks.c to plpks.h, and standardise their
naming, so that PLPKS consumers can make use of them later on.

Signed-off-by: Russell Currey <ruscur@russell.cc>
Co-developed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230210080401.345462-16-ajd@linux.ibm.com
2023-02-12 22:12:38 +11:00
Russell Currey
90b74e305d powerpc/pseries: Move plpks.h to include directory
Move plpks.h from platforms/pseries/ to include/asm/. This is necessary
for later patches to make use of the PLPKS from code in other subsystems.

Signed-off-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230210080401.345462-15-ajd@linux.ibm.com
2023-02-12 22:12:38 +11:00
Andrew Donnellan
50a466bf3e powerpc/secvar: Allow backend to populate static list of variable names
Currently, the list of variables is populated by calling
secvar_ops->get_next() repeatedly, which is explicitly modelled on the
OPAL API (including the keylen parameter).

For the upcoming PLPKS backend, we have a static list of variable names.
It is messy to fit that into get_next(), so instead, let the backend put
a NULL-terminated array of variable names into secvar_ops->var_names,
which will be used if get_next() is undefined.

Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Russell Currey <ruscur@russell.cc>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230210080401.345462-12-ajd@linux.ibm.com
2023-02-12 22:12:37 +11:00
Russell Currey
86b6c0ae2c powerpc/secvar: Extend sysfs to include config vars
The forthcoming pseries consumer of the secvar API wants to expose a
number of config variables.  Allowing secvar implementations to provide
their own sysfs attributes makes it easy for consumers to expose what
they need to.

This is not being used by the OPAL secvar implementation at present, and
the config directory will not be created if no attributes are set.

Signed-off-by: Russell Currey <ruscur@russell.cc>
Co-developed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230210080401.345462-11-ajd@linux.ibm.com
2023-02-12 22:12:37 +11:00
Russell Currey
e024079440 powerpc/secvar: Handle max object size in the consumer
Currently the max object size is handled in the core secvar code with an
entirely OPAL-specific implementation, so create a new max_size() op and
move the existing implementation into the powernv platform.  Should be
no functional change.

Signed-off-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230210080401.345462-9-ajd@linux.ibm.com
2023-02-12 22:12:37 +11:00
Russell Currey
ec2f40bd00 powerpc/secvar: Handle format string in the consumer
The code that handles the format string in secvar-sysfs.c is entirely
OPAL specific, so create a new "format" op in secvar_operations to make
the secvar code more generic.  No functional change.

Signed-off-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230210080401.345462-8-ajd@linux.ibm.com
2023-02-12 22:12:37 +11:00
Russell Currey
26149b0202 powerpc/secvar: Warn and error if multiple secvar ops are set
The secvar code only supports one consumer at a time.

Multiple consumers aren't possible at this point in time, but we'd want
it to be obvious if it ever could happen.

Signed-off-by: Russell Currey <ruscur@russell.cc>
Co-developed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230210080401.345462-6-ajd@linux.ibm.com
2023-02-12 22:12:36 +11:00
Michael Ellerman
53cea34b0a powerpc/secvar: Use u64 in secvar_operations
There's no reason for secvar_operations to use uint64_t vs the more
common kernel type u64.

The types are compatible, but they require different printk format
strings which can lead to confusion.

Change all the secvar related routines to use u64.

Reviewed-by: Russell Currey <ruscur@russell.cc>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230210080401.345462-5-ajd@linux.ibm.com
2023-02-12 22:12:36 +11:00
Michael Ellerman
fc8a898cfd Merge branch 'fixes' into next
Merge our fixes branch to bring in some changes that conflict with
upcoming next content.
2023-02-12 22:11:56 +11:00
Rohan McLure
b6e259297a powerpc/kcsan: Memory barriers semantics
Annotate memory barriers *mb() with calls to kcsan_mb(), signaling to
compilers supporting KCSAN that the respective memory barrier has been
issued. Rename memory barrier *mb() to __*mb() to opt in for
asm-generic/barrier.h to generate the respective *mb() macro.

Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230206021801.105268-4-rmclure@linux.ibm.com
2023-02-10 22:19:56 +11:00
Nicholas Piggin
dc222fa773 powerpc/64: Move paca allocation to early_setup()
The early paca and boot cpuid dance is complicated and currently does
not quite work as expected for boot cpuid != 0 cases.

early_init_devtree() currently allocates the paca_ptrs and boot cpuid
paca, but until that returns and early_setup() calls setup_paca(), this
thread is currently still executing with smp_processor_id() == 0.

One problem this causes is the paca_ptrs[smp_processor_id()] pointer is
poisoned, so valid_emergency_stack() (any backtrace) and any similar
users will crash.

Another is that the hardware id which is set here will not be returned
by get_hard_smp_processor_id(smp_processor_id()), but it would work
correctly for boot_cpuid == 0, which could lead to difficult to
reproduce or find bugs. The hard id does not seem to be used by the rest
of early_init_devtree(), it just looks like all this code might have
been put here to allocate somewhere to store boot CPU hardware id while
scanning the devtree.

Rearrange things so the hwid is put in a global variable like
boot_cpuid, and do all the paca allocation and boot paca setup in the
64-bit early_setup() after we have everything ready to go.

The paca_ptrs[0] re-poisoning code in early_setup does not seem to have
ever worked, because paca_ptrs[0] was never not-poisoned when boot_cpuid
is not 0.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Fix build error on 32-bit]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221216115930.2667772-4-npiggin@gmail.com
2023-02-10 22:19:56 +11:00
Nicholas Piggin
fb3b72a3f4 powerpc: Consolidate 32-bit and 64-bit interrupt_enter_prepare
There are two separeate implementations for 32-bit and 64-bit which
mostly do the same thing. Consolidating on one implementation ends
up being smaller and simpler, there is just irq soft-mask reconcile
that is specific to 64-bit.

There should be no real functional change with this patch, but it
does make the context tracking calls necessary for 32-bit to support
context tracking.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230121095805.2823731-2-npiggin@gmail.com
2023-02-10 22:17:35 +11:00
Mike Rapoport (IBM)
e5080a9677 mm, arch: add generic implementation of pfn_valid() for FLATMEM
Every architecture that supports FLATMEM memory model defines its own
version of pfn_valid() that essentially compares a pfn to max_mapnr.

Use mips/powerpc version implemented as static inline as a generic
implementation of pfn_valid() and drop its per-architecture definitions.

[rppt@kernel.org: fix the generic pfn_valid()]
  Link: https://lkml.kernel.org/r/Y9lg7R1Yd931C+y5@kernel.org
Link: https://lkml.kernel.org/r/20230129124235.209895-5-rppt@kernel.org
Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Guo Ren <guoren@kernel.org>		[csky]
Acked-by: Huacai Chen <chenhuacai@loongson.cn>	[LoongArch]
Acked-by: Stafford Horne <shorne@gmail.com>	[OpenRISC]
Acked-by: Michael Ellerman <mpe@ellerman.id.au>	[powerpc]
Reviewed-by: David Hildenbrand <david@redhat.com>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Cc: Brian Cain <bcain@quicinc.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-09 16:51:41 -08:00
Pali Rohár
5d2eb73aa0 powerpc/pci: Add option for using pci_to_OF_bus_map
The "pci-OF-bus-map" property was declared deprecated in 2006 [1] and to
the best of everyone's knowledge is not used by anything anymore [2].

The creation of the property was disabled on powermac (arch/powerpc) in
2005 by commit 35499c0195 ("powerpc: Merge in 64-bit powermac
support."). But it is still created by default on CHRP.

On powermac the actual map (pci_to_OF_bus_map) is still used by default,
even though the device tree property is not created.

Add an option to enable/disable use of the pci_to_OF_bus_map, and
creation of the property (on CHRP).

Disabling the option allows enabling CONFIG_PPC_PCI_BUS_NUM_DOMAIN_DEPENDENT
which allows "normal" bus numbering and more than 256 buses, like 64-bit
and other architectures.

Mark the new option as default n, the intention is that the option and
the code will be removed in a future release.

[1]: https://lore.kernel.org/linuxppc-dev/1148016268.13249.14.camel@localhost.localdomain/
[2]: https://lore.kernel.org/all/575f239205e8635add81c9f902b7d9db7beb83ea.camel@kernel.crashing.org/

Signed-off-by: Pali Rohár <pali@kernel.org>
[mpe: Reword commit & help text, shrink option name, rework to fix build errors]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230206113902.1857123-1-mpe@ellerman.id.au
2023-02-07 20:15:23 +11:00
David Hildenbrand
950fe885a8 mm: remove __HAVE_ARCH_PTE_SWP_EXCLUSIVE
__HAVE_ARCH_PTE_SWP_EXCLUSIVE is now supported by all architectures that
support swp PTEs, so let's drop it.

Link: https://lkml.kernel.org/r/20230113171026.582290-27-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02 22:33:11 -08:00
David Hildenbrand
2bba2ffbe0 powerpc/nohash/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on 32bit and 64bit.

On 64bit, let's use MSB 56 (LSB 7), located right next to the page type. 
On 32bit, let's use LSB 2 to avoid stealing one bit from the swap offset.

There seems to be no real reason why these bits cannot be used for swap
PTEs.  The important part is that _PAGE_PRESENT and _PAGE_HASHPTE remain
0.

While at it, mask the type in __swp_entry() and remove _PAGE_BIT_SWAP_TYPE
from pte-e500.h: while it was used in 64bit code it was ignored in 32bit
code.

Link: https://lkml.kernel.org/r/20230113171026.582290-19-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02 22:33:09 -08:00
David Hildenbrand
8897ebff37 powerpc/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on 32bit book3s
We already implemented support for 64bit book3s in commit bff9beaa2e
("powerpc/pgtable: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE for book3s")

Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE also in 32bit by reusing yet
unused LSB 2 / MSB 29.  There seems to be no real reason why that bit
cannot be used, and reusing it avoids having to steal one bit from the
swap offset.

While at it, mask the type in __swp_entry().

Link: https://lkml.kernel.org/r/20230113171026.582290-18-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02 22:33:09 -08:00
Michael Ellerman
1665c027af powerpc/64s: Reconnect tlb_flush() to hash__tlb_flush()
Commit baf1ed24b2 ("powerpc/mm: Remove empty hash__ functions")
removed some empty hash MMU flushing routines, but got a bit overeager
and also removed the call to hash__tlb_flush() from tlb_flush().

In regular use this doesn't lead to any noticable breakage, which is a
little concerning. Presumably there are flushes happening via other
paths such as arch_leave_lazy_mmu_mode(), and/or a bit of luck.

Fix it by reinstating the call to hash__tlb_flush().

Fixes: baf1ed24b2 ("powerpc/mm: Remove empty hash__ functions")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230131111407.806770-1-mpe@ellerman.id.au
2023-02-02 13:25:47 +11:00
Stephen Rothwell
b7810ea80f driver core: fixup for "driver core: make struct bus_type.uevent() take a const *"
After merging the driver-core tree, today's linux-next build (powerpc
ppc64_defconfig) failed like this:

arch/powerpc/platforms/ps3/system-bus.c:472:19: error: initialization of 'int (*)(const struct device *, struct kobj_uevent_env *)' from incompatible pointer type 'int (*)(struct device *, struct kobj_uevent_env *)' [-Werror=incompatible-pointer-types]
  472 |         .uevent = ps3_system_bus_uevent,
      |                   ^~~~~~~~~~~~~~~~~~~~~
arch/powerpc/platforms/ps3/system-bus.c:472:19: note: (near initialization for 'ps3_system_bus_type.uevent')
arch/powerpc/platforms/pseries/ibmebus.c:436:22: error: initialization of 'int (*)(const struct device *, struct kobj_uevent_env *)' from incompatible pointer type 'int (*)(struct device *, struct kobj_uevent_env *)' [-Werror=incompatible-pointer-types]
  436 |         .uevent    = ibmebus_bus_modalias,
      |                      ^~~~~~~~~~~~~~~~~~~~
arch/powerpc/platforms/pseries/ibmebus.c:436:22: note: (near initialization for 'ibmebus_bus_type.uevent')

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Fixes: 2a81ada32f ("driver core: make struct bus_type.uevent() take a const *")
Link: https://lore.kernel.org/r/20230130152818.03c00ea3@canb.auug.org.au
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-01-30 17:02:03 +01:00
Nicholas Piggin
c28548012e powerpc/64: Fix perf profiling asynchronous interrupt handlers
Interrupt entry sets the soft mask to IRQS_ALL_DISABLED to match the
hard irq disabled state. So when should_hard_irq_enable() returns true
because we want PMI interrupts in irq handlers, MSR[EE] is enabled but
PMIs just get soft-masked. Fix this by clearing IRQS_PMI_DISABLED before
enabling MSR[EE].

This also tidies some of the warnings, no need to duplicate them in
both should_hard_irq_enable() and do_hard_irq_enable().

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230121100156.2824054-1-npiggin@gmail.com
2023-01-30 20:07:42 +11:00
Nicholas Piggin
bc88ef6632 powerpc/64s: Fix local irq disable when PMIs are disabled
When PMI interrupts are soft-masked, local_irq_save() will clear the PMI
mask bit, allowing PMIs in and causing a race condition. This causes a
deadlock in native_hpte_insert via hash_preload, which depends on PMIs
being disabled since commit 8b91cee5ea ("powerpc/64s/hash: Make hash
faults work in NMI context"). native_hpte_insert calls local_irq_save().
It's possible the lpar hash code is also affected when tracing is
enabled because __trace_hcall_entry() calls local_irq_save().

Fix this by making arch_local_irq_save() _or_ the IRQS_DISABLED bit into
the mask.

This was found with the stress_hpt option with a kbuild workload running
together with `perf record -g`.

Fixes: f442d00480 ("powerpc/64s: Add support to mask perf interrupts and replay them")
Fixes: 8b91cee5ea ("powerpc/64s/hash: Make hash faults work in NMI context")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Just take the fix without the new warning]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230121095352.2823517-1-npiggin@gmail.com
2023-01-30 20:07:18 +11:00
Nicholas Piggin
5746ca131e powerpc/64: Don't recurse irq replay
Interrupt handlers called by soft-pending irq replay code can run
softirqs, softirq replay enables and disables local irqs, which allows
interrupts to come in including soft-masked interrupts, and it can
cause pending irqs to be replayed again. That makes the soft irq replay
state machine and possible races more complicated and fragile than it
needs to be.

Use irq_enter/irq_exit around irq replay to prevent softirqs running
while interrupts are being replayed. Softirqs will now be run at the
irq_exit() call after all the irq replaying is done. This prevents irqs
being replayed while irqs are being replayed, and should hopefully make
things simpler and easier to think about and debug.

A new PACA_IRQ_REPLAYING is added to prevent asynchronous interrupt
handlers hard-enabling EE while pending irqs are being replayed, because
that causes new pending irqs to arrive which is also a complexity. This
means pending irqs won't be profiled quite so well because perf irqs
can't be taken.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230121102618.2824429-1-npiggin@gmail.com
2023-01-30 18:21:41 +11:00
Christophe Leroy
bab537805a powerpc: Check !irq instead of irq == NO_IRQ and remove NO_IRQ
NO_IRQ is a relic from the old days. It is not used anymore in core
functions. By the way, function irq_of_parse_and_map() returns value 0
on error.

In some drivers, NO_IRQ is erroneously used to check the return of
irq_of_parse_and_map().

It is not a real bug today because the only architectures using the
drivers being fixed by this patch define NO_IRQ as 0, but there are
architectures which define NO_IRQ as -1. If one day those
architectures start using the non fixed drivers, there will be a
problem.

Long time ago Linus advocated for not using NO_IRQ, see
https://lore.kernel.org/all/Pine.LNX.4.64.0511211150040.13959@g5.osdl.org

He re-iterated the same view recently in
https://lore.kernel.org/all/CAHk-=wg2Pkb9kbfbstbB91AJA2SF6cySbsgHG-iQMq56j3VTcA@mail.gmail.com

So test !irq instead of tesing irq == NO_IRQ.

All other usage of NO_IRQ for powerpc were removed in previous cycles so
the time has come to remove NO_IRQ completely for powerpc.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/4b8d4f96140af01dec3a3330924dda8b2451c316.1674476798.git.christophe.leroy@csgroup.eu
2023-01-30 17:53:05 +11:00
Nathan Lynch
599af49155 powerpc/rtas: remove lock and args fields from global rtas struct
Only code internal to the RTAS subsystem needs access to the central
lock and parameter block. Remove these from the globally visible
'rtas' struct and make them file-static in rtas.c.

Some changed lines in rtas_call() lack appropriate spacing around
operators and cause checkpatch errors; fix these as well.

Suggested-by: Laurent Dufour <ldufour@linux.ibm.com>
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Laurent Dufour <laurent.dufour@fr.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230124140448.45938-4-nathanl@linux.ibm.com
2023-01-30 17:53:05 +11:00
Greg Kroah-Hartman
b84d6d3b45 vio: move to_vio_dev() to use container_of_const()
The driver core is changing to pass some pointers as const, so move
to_vio_dev() to use container_of_const() to handle this change.
to_vio_dev() now properly keeps the const-ness of the pointer passed
into it, while as before it could be lost.

Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20230111113018.459199-9-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-01-27 13:45:41 +01:00
Kajol Jain
76d588dddc powerpc/imc-pmu: Fix use of mutex in IRQs disabled section
Current imc-pmu code triggers a WARNING with CONFIG_DEBUG_ATOMIC_SLEEP
and CONFIG_PROVE_LOCKING enabled, while running a thread_imc event.

Command to trigger the warning:
  # perf stat -e thread_imc/CPM_CS_FROM_L4_MEM_X_DPTEG/ sleep 5

   Performance counter stats for 'sleep 5':

                   0      thread_imc/CPM_CS_FROM_L4_MEM_X_DPTEG/

         5.002117947 seconds time elapsed

         0.000131000 seconds user
         0.001063000 seconds sys

Below is snippet of the warning in dmesg:

  BUG: sleeping function called from invalid context at kernel/locking/mutex.c:580
  in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 2869, name: perf-exec
  preempt_count: 2, expected: 0
  4 locks held by perf-exec/2869:
   #0: c00000004325c540 (&sig->cred_guard_mutex){+.+.}-{3:3}, at: bprm_execve+0x64/0xa90
   #1: c00000004325c5d8 (&sig->exec_update_lock){++++}-{3:3}, at: begin_new_exec+0x460/0xef0
   #2: c0000003fa99d4e0 (&cpuctx_lock){-...}-{2:2}, at: perf_event_exec+0x290/0x510
   #3: c000000017ab8418 (&ctx->lock){....}-{2:2}, at: perf_event_exec+0x29c/0x510
  irq event stamp: 4806
  hardirqs last  enabled at (4805): [<c000000000f65b94>] _raw_spin_unlock_irqrestore+0x94/0xd0
  hardirqs last disabled at (4806): [<c0000000003fae44>] perf_event_exec+0x394/0x510
  softirqs last  enabled at (0): [<c00000000013c404>] copy_process+0xc34/0x1ff0
  softirqs last disabled at (0): [<0000000000000000>] 0x0
  CPU: 36 PID: 2869 Comm: perf-exec Not tainted 6.2.0-rc2-00011-g1247637727f2 #61
  Hardware name: 8375-42A POWER9 0x4e1202 opal:v7.0-16-g9b85f7d961 PowerNV
  Call Trace:
    dump_stack_lvl+0x98/0xe0 (unreliable)
    __might_resched+0x2f8/0x310
    __mutex_lock+0x6c/0x13f0
    thread_imc_event_add+0xf4/0x1b0
    event_sched_in+0xe0/0x210
    merge_sched_in+0x1f0/0x600
    visit_groups_merge.isra.92.constprop.166+0x2bc/0x6c0
    ctx_flexible_sched_in+0xcc/0x140
    ctx_sched_in+0x20c/0x2a0
    ctx_resched+0x104/0x1c0
    perf_event_exec+0x340/0x510
    begin_new_exec+0x730/0xef0
    load_elf_binary+0x3f8/0x1e10
  ...
  do not call blocking ops when !TASK_RUNNING; state=2001 set at [<00000000fd63e7cf>] do_nanosleep+0x60/0x1a0
  WARNING: CPU: 36 PID: 2869 at kernel/sched/core.c:9912 __might_sleep+0x9c/0xb0
  CPU: 36 PID: 2869 Comm: sleep Tainted: G        W          6.2.0-rc2-00011-g1247637727f2 #61
  Hardware name: 8375-42A POWER9 0x4e1202 opal:v7.0-16-g9b85f7d961 PowerNV
  NIP:  c000000000194a1c LR: c000000000194a18 CTR: c000000000a78670
  REGS: c00000004d2134e0 TRAP: 0700   Tainted: G        W           (6.2.0-rc2-00011-g1247637727f2)
  MSR:  9000000000021033 <SF,HV,ME,IR,DR,RI,LE>  CR: 48002824  XER: 00000000
  CFAR: c00000000013fb64 IRQMASK: 1

The above warning triggered because the current imc-pmu code uses mutex
lock in interrupt disabled sections. The function mutex_lock()
internally calls __might_resched(), which will check if IRQs are
disabled and in case IRQs are disabled, it will trigger the warning.

Fix the issue by changing the mutex lock to spinlock.

Fixes: 8f95faaac5 ("powerpc/powernv: Detect and create IMC device")
Reported-by: Michael Petlan <mpetlan@redhat.com>
Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
[mpe: Fix comments, trim oops in change log, add reported-by tags]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230106065157.182648-1-kjain@linux.ibm.com
2023-01-11 18:29:09 +11:00
Sean Christopherson
441f7bfa99 KVM: Opt out of generic hardware enabling on s390 and PPC
Allow architectures to opt out of the generic hardware enabling logic,
and opt out on both s390 and PPC, which don't need to manually enable
virtualization as it's always on (when available).

In addition to letting s390 and PPC drop a bit of dead code, this will
hopefully also allow ARM to clean up its related code, e.g. ARM has its
own per-CPU flag to track which CPUs have enable hardware due to the
need to keep hardware enabled indefinitely when pKVM is enabled.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Acked-by: Anup Patel <anup@brainfault.org>
Message-Id: <20221130230934.1014142-50-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-29 15:48:37 -05:00
Sean Christopherson
a578a0a9e3 KVM: Drop kvm_arch_{init,exit}() hooks
Drop kvm_arch_init() and kvm_arch_exit() now that all implementations
are nops.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>	# s390
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Acked-by: Anup Patel <anup@brainfault.org>
Message-Id: <20221130230934.1014142-30-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-29 15:41:23 -05:00
Sean Christopherson
ae19b15d91 KVM: PPC: Move processor compatibility check to module init
Move KVM PPC's compatibility checks to their respective module_init()
hooks, there's no need to wait until KVM's common compat check, nor is
there a need to perform the check on every CPU (provided by common KVM's
hook), as the compatibility checks operate on global data.

  arch/powerpc/include/asm/cputable.h: extern struct cpu_spec *cur_cpu_spec;
  arch/powerpc/kvm/book3s.c: return 0
  arch/powerpc/kvm/e500.c: strcmp(cur_cpu_spec->cpu_name, "e500v2")
  arch/powerpc/kvm/e500mc.c: strcmp(cur_cpu_spec->cpu_name, "e500mc")
                             strcmp(cur_cpu_spec->cpu_name, "e5500")
                             strcmp(cur_cpu_spec->cpu_name, "e6500")

Cc: Fabiano Rosas <farosas@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Message-Id: <20221130230934.1014142-27-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-29 15:41:19 -05:00
Sean Christopherson
63a1bd8ad1 KVM: Drop arch hardware (un)setup hooks
Drop kvm_arch_hardware_setup() and kvm_arch_hardware_unsetup() now that
all implementations are nops.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>	# s390
Acked-by: Anup Patel <anup@brainfault.org>
Message-Id: <20221130230934.1014142-10-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-29 15:40:54 -05:00
Linus Torvalds
5f6e430f93 powerpc updates for 6.2
- Add powerpc qspinlock implementation optimised for large system scalability and
    paravirt. See the merge message for more details.
 
  - Enable objtool to be built on powerpc to generate mcount locations.
 
  - Use a temporary mm for code patching with the Radix MMU, so the writable mapping is
    restricted to the patching CPU.
 
  - Add an option to build the 64-bit big-endian kernel with the ELFv2 ABI.
 
  - Sanitise user registers on interrupt entry on 64-bit Book3S.
 
  - Many other small features and fixes.
 
 Thanks to: Aboorva Devarajan, Angel Iglesias, Benjamin Gray, Bjorn Helgaas, Bo Liu, Chen
 Lifu, Christoph Hellwig, Christophe JAILLET, Christophe Leroy, Christopher M. Riedl, Colin
 Ian King, Deming Wang, Disha Goel, Dmitry Torokhov, Finn Thain, Geert Uytterhoeven,
 Gustavo A. R. Silva, Haowen Bai, Joel Stanley, Jordan Niethe, Julia Lawall, Kajol Jain,
 Laurent Dufour, Li zeming, Miaoqian Lin, Michael Jeanson, Nathan Lynch, Naveen N. Rao,
 Nayna Jain, Nicholas Miehlbradt, Nicholas Piggin, Pali Rohár, Randy Dunlap, Rohan McLure,
 Russell Currey, Sathvika Vasireddy, Shaomin Deng, Stephen Kitt, Stephen Rothwell, Thomas
 Weißschuh, Tiezhu Yang, Uwe Kleine-König, Xie Shaowen, Xiu Jianfeng, XueBing Chen, Yang
 Yingliang, Zhang Jiaming, ruanjinjie, Jessica Yu, Wolfram Sang.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmOfrj8THG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgIWtD/9mGF/ze2k+qFTo+30fb7bO8WJIDgsR
 dIASnZjXV7q/45elvymhUdkQv4R7xL3pzC40P1+ZKtWzGTNe+zWUQLoALNwRK85j
 8CsxZbqefGNKE5Z6ZHo9s37wsu3+jJu9yEQpGFo1LINyzeclCn5St5oqfRam+Hd/
 cPF+VfvREwZ0+YOKGBhJ2EgC+Gc9xsFY7DLQsoYlu71iZZr6Z6rgZW/EY5h3RMGS
 YKBoVwDsWaU0FpFWrr/rYTI6DqSr3AHr1+ftDg7ncCZMD6vQva6aMCCt94aLB1aE
 vC+DNdhZlA558bXGa5yA7Wr//7aUBUIwyC60DogOeZ6vw3kD9tdEd1fbH5hmqNKY
 K5bfqm28XU2959CTE8RDgsYYZvwDcfrjBIML14WZGdCQOTcGKpgOGp22o6yNb1Pq
 JKpHHnVpvu2PZ/p2XdKSm9+etr2yI6lXZAEVTS7ehdtMukButjSHEVbSCEZ8tlWz
 KokQt2J23BMHuSrXK6+67wWQBtdsLEk+LBOQmweiwarMocqvL/Zjz/5J7DR2DtH8
 wlY3wOtB1+E5j7xZ+RgK3c3jNg5dH39ZwvFsSATWTI3P+iq6OK/bbk4q4LmZt2l9
 ZIfH/CXPf9BvGCHzHa3AAd3UBbJLFwj17btMEv1wFVPS0T4LPUzkgTNTNUYeP6zL
 h1e5QfgUxvKPuQ==
 =7k3p
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:

 - Add powerpc qspinlock implementation optimised for large system
   scalability and paravirt. See the merge message for more details

 - Enable objtool to be built on powerpc to generate mcount locations

 - Use a temporary mm for code patching with the Radix MMU, so the
   writable mapping is restricted to the patching CPU

 - Add an option to build the 64-bit big-endian kernel with the ELFv2
   ABI

 - Sanitise user registers on interrupt entry on 64-bit Book3S

 - Many other small features and fixes

Thanks to Aboorva Devarajan, Angel Iglesias, Benjamin Gray, Bjorn
Helgaas, Bo Liu, Chen Lifu, Christoph Hellwig, Christophe JAILLET,
Christophe Leroy, Christopher M. Riedl, Colin Ian King, Deming Wang,
Disha Goel, Dmitry Torokhov, Finn Thain, Geert Uytterhoeven, Gustavo A.
R. Silva, Haowen Bai, Joel Stanley, Jordan Niethe, Julia Lawall, Kajol
Jain, Laurent Dufour, Li zeming, Miaoqian Lin, Michael Jeanson, Nathan
Lynch, Naveen N. Rao, Nayna Jain, Nicholas Miehlbradt, Nicholas Piggin,
Pali Rohár, Randy Dunlap, Rohan McLure, Russell Currey, Sathvika
Vasireddy, Shaomin Deng, Stephen Kitt, Stephen Rothwell, Thomas
Weißschuh, Tiezhu Yang, Uwe Kleine-König, Xie Shaowen, Xiu Jianfeng,
XueBing Chen, Yang Yingliang, Zhang Jiaming, ruanjinjie, Jessica Yu,
and Wolfram Sang.

* tag 'powerpc-6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (181 commits)
  powerpc/code-patching: Fix oops with DEBUG_VM enabled
  powerpc/qspinlock: Fix 32-bit build
  powerpc/prom: Fix 32-bit build
  powerpc/rtas: mandate RTAS syscall filtering
  powerpc/rtas: define pr_fmt and convert printk call sites
  powerpc/rtas: clean up includes
  powerpc/rtas: clean up rtas_error_log_max initialization
  powerpc/pseries/eeh: use correct API for error log size
  powerpc/rtas: avoid scheduling in rtas_os_term()
  powerpc/rtas: avoid device tree lookups in rtas_os_term()
  powerpc/rtasd: use correct OF API for event scan rate
  powerpc/rtas: document rtas_call()
  powerpc/pseries: unregister VPA when hot unplugging a CPU
  powerpc/pseries: reset the RCU watchdogs after a LPM
  powerpc: Take in account addition CPU node when building kexec FDT
  powerpc: export the CPU node count
  powerpc/cpuidle: Set CPUIDLE_FLAG_POLLING for snooze state
  powerpc/dts/fsl: Fix pca954x i2c-mux node names
  cxl: Remove unnecessary cxl_pci_window_alignment()
  selftests/powerpc: Fix resource leaks
  ...
2022-12-19 07:13:33 -06:00
Linus Torvalds
4f292c4de4 New Feature:
* Randomize the per-cpu entry areas
 Cleanups:
 * Have CR3_ADDR_MASK use PHYSICAL_PAGE_MASK instead of open
   coding it
 * Move to "native" set_memory_rox() helper
 * Clean up pmd_get_atomic() and i386-PAE
 * Remove some unused page table size macros
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEV76QKkVc4xCGURexaDWVMHDJkrAFAmOc53UACgkQaDWVMHDJ
 krCUHw//SGZ+La0hLZLAiAiZTXLZZHpYkOmg1Oj1+11qSU11uZzTFqDpauhaKpRS
 cJCSh+D+RXe5e2ipgt0+Zl0hESLt7pJf8258OE4ra0DL/IlyO9uqruAs9Kn3eRS/
 Fk76nG8gdEU+JKJqpG02GqOLslYQuIy96n9hpuj1x25b614+uezPfC7S4XEat0NT
 MbJQ+jnVDf16aJIJkzT+iSwhubDVeh+bSHeO0SSCzX23WLUqDeg5NvlyxoCHGbBh
 UpUTWggV/0pYAkBKRHToeJs8qTWREwuuH/8JGewpe9A0tjdB5wyZfNL2PuracweN
 9MauXC3T5f0+Ca4yIIaPq1fF7Ny/PR2dBFihk27rOD0N7tjaZxNwal2pB1sZcmvZ
 +PAokjyTPVH5ZXjkMYGGAUe1jyjwr2+TgFSZxhTnDuGtyVQiY4pihGKOifLCX6tv
 x6khvYeTBw7wfaDRtKEAf+2kLHYn+71HszHP/8bNKX9T03h+Zf0i1wdZu5xbM5Gc
 VK2wR7bCC+UftJJYG0pldcHg2qaF19RBHK2tLwp7zngUv7lTbkKfkgKjre73KV2a
 D4b76lrqdUMo6UYwYdw7WtDyarZS4OVLq2DcNhwwMddBCaX8kyN5a4AqwQlZYJ0u
 dM+kuMofE8U3yMxmMhJimkZUsj09yLHIqfynY0jbAcU3nhKZZNY=
 =wwVF
 -----END PGP SIGNATURE-----

Merge tag 'x86_mm_for_6.2_v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 mm updates from Dave Hansen:
 "New Feature:

   - Randomize the per-cpu entry areas

  Cleanups:

   - Have CR3_ADDR_MASK use PHYSICAL_PAGE_MASK instead of open coding it

   - Move to "native" set_memory_rox() helper

   - Clean up pmd_get_atomic() and i386-PAE

   - Remove some unused page table size macros"

* tag 'x86_mm_for_6.2_v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (35 commits)
  x86/mm: Ensure forced page table splitting
  x86/kasan: Populate shadow for shared chunk of the CPU entry area
  x86/kasan: Add helpers to align shadow addresses up and down
  x86/kasan: Rename local CPU_ENTRY_AREA variables to shorten names
  x86/mm: Populate KASAN shadow for entire per-CPU range of CPU entry area
  x86/mm: Recompute physical address for every page of per-CPU CEA mapping
  x86/mm: Rename __change_page_attr_set_clr(.checkalias)
  x86/mm: Inhibit _PAGE_NX changes from cpa_process_alias()
  x86/mm: Untangle __change_page_attr_set_clr(.checkalias)
  x86/mm: Add a few comments
  x86/mm: Fix CR3_ADDR_MASK
  x86/mm: Remove P*D_PAGE_MASK and P*D_PAGE_SIZE macros
  mm: Convert __HAVE_ARCH_P..P_GET to the new style
  mm: Remove pointless barrier() after pmdp_get_lockless()
  x86/mm/pae: Get rid of set_64bit()
  x86_64: Remove pointless set_64bit() usage
  x86/mm/pae: Be consistent with pXXp_get_and_clear()
  x86/mm/pae: Use WRITE_ONCE()
  x86/mm/pae: Don't (ab)use atomic64
  mm/gup: Fix the lockless PMD access
  ...
2022-12-17 14:06:53 -06:00
Peter Zijlstra
2dff2c359e mm: Convert __HAVE_ARCH_P..P_GET to the new style
Since __HAVE_ARCH_* style guards have been depricated in favour of
defining the function name onto itself, convert pxxp_get().

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/Y2EUEBlQXNgaJgoI@hirez.programming.kicks-ass.net
2022-12-15 10:37:27 -08:00
Linus Torvalds
e2ca6ba6ba MM patches for 6.2-rc1.
- More userfaultfs work from Peter Xu.
 
 - Several convert-to-folios series from Sidhartha Kumar and Huang Ying.
 
 - Some filemap cleanups from Vishal Moola.
 
 - David Hildenbrand added the ability to selftest anon memory COW handling.
 
 - Some cpuset simplifications from Liu Shixin.
 
 - Addition of vmalloc tracing support by Uladzislau Rezki.
 
 - Some pagecache folioifications and simplifications from Matthew Wilcox.
 
 - A pagemap cleanup from Kefeng Wang: we have VM_ACCESS_FLAGS, so use it.
 
 - Miguel Ojeda contributed some cleanups for our use of the
   __no_sanitize_thread__ gcc keyword.  This series shold have been in the
   non-MM tree, my bad.
 
 - Naoya Horiguchi improved the interaction between memory poisoning and
   memory section removal for huge pages.
 
 - DAMON cleanups and tuneups from SeongJae Park
 
 - Tony Luck fixed the handling of COW faults against poisoned pages.
 
 - Peter Xu utilized the PTE marker code for handling swapin errors.
 
 - Hugh Dickins reworked compound page mapcount handling, simplifying it
   and making it more efficient.
 
 - Removal of the autonuma savedwrite infrastructure from Nadav Amit and
   David Hildenbrand.
 
 - zram support for multiple compression streams from Sergey Senozhatsky.
 
 - David Hildenbrand reworked the GUP code's R/O long-term pinning so
   that drivers no longer need to use the FOLL_FORCE workaround which
   didn't work very well anyway.
 
 - Mel Gorman altered the page allocator so that local IRQs can remnain
   enabled during per-cpu page allocations.
 
 - Vishal Moola removed the try_to_release_page() wrapper.
 
 - Stefan Roesch added some per-BDI sysfs tunables which are used to
   prevent network block devices from dirtying excessive amounts of
   pagecache.
 
 - David Hildenbrand did some cleanup and repair work on KSM COW
   breaking.
 
 - Nhat Pham and Johannes Weiner have implemented writeback in zswap's
   zsmalloc backend.
 
 - Brian Foster has fixed a longstanding corner-case oddity in
   file[map]_write_and_wait_range().
 
 - sparse-vmemmap changes for MIPS, LoongArch and NIOS2 from Feiyang
   Chen.
 
 - Shiyang Ruan has done some work on fsdax, to make its reflink mode
   work better under xfstests.  Better, but still not perfect.
 
 - Christoph Hellwig has removed the .writepage() method from several
   filesystems.  They only need .writepages().
 
 - Yosry Ahmed wrote a series which fixes the memcg reclaim target
   beancounting.
 
 - David Hildenbrand has fixed some of our MM selftests for 32-bit
   machines.
 
 - Many singleton patches, as usual.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY5j6ZwAKCRDdBJ7gKXxA
 jkDYAP9qNeVqp9iuHjZNTqzMXkfmJPsw2kmy2P+VdzYVuQRcJgEAgoV9d7oMq4ml
 CodAgiA51qwzId3GRytIo/tfWZSezgA=
 =d19R
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2022-12-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - More userfaultfs work from Peter Xu

 - Several convert-to-folios series from Sidhartha Kumar and Huang Ying

 - Some filemap cleanups from Vishal Moola

 - David Hildenbrand added the ability to selftest anon memory COW
   handling

 - Some cpuset simplifications from Liu Shixin

 - Addition of vmalloc tracing support by Uladzislau Rezki

 - Some pagecache folioifications and simplifications from Matthew
   Wilcox

 - A pagemap cleanup from Kefeng Wang: we have VM_ACCESS_FLAGS, so use
   it

 - Miguel Ojeda contributed some cleanups for our use of the
   __no_sanitize_thread__ gcc keyword.

   This series should have been in the non-MM tree, my bad

 - Naoya Horiguchi improved the interaction between memory poisoning and
   memory section removal for huge pages

 - DAMON cleanups and tuneups from SeongJae Park

 - Tony Luck fixed the handling of COW faults against poisoned pages

 - Peter Xu utilized the PTE marker code for handling swapin errors

 - Hugh Dickins reworked compound page mapcount handling, simplifying it
   and making it more efficient

 - Removal of the autonuma savedwrite infrastructure from Nadav Amit and
   David Hildenbrand

 - zram support for multiple compression streams from Sergey Senozhatsky

 - David Hildenbrand reworked the GUP code's R/O long-term pinning so
   that drivers no longer need to use the FOLL_FORCE workaround which
   didn't work very well anyway

 - Mel Gorman altered the page allocator so that local IRQs can remnain
   enabled during per-cpu page allocations

 - Vishal Moola removed the try_to_release_page() wrapper

 - Stefan Roesch added some per-BDI sysfs tunables which are used to
   prevent network block devices from dirtying excessive amounts of
   pagecache

 - David Hildenbrand did some cleanup and repair work on KSM COW
   breaking

 - Nhat Pham and Johannes Weiner have implemented writeback in zswap's
   zsmalloc backend

 - Brian Foster has fixed a longstanding corner-case oddity in
   file[map]_write_and_wait_range()

 - sparse-vmemmap changes for MIPS, LoongArch and NIOS2 from Feiyang
   Chen

 - Shiyang Ruan has done some work on fsdax, to make its reflink mode
   work better under xfstests. Better, but still not perfect

 - Christoph Hellwig has removed the .writepage() method from several
   filesystems. They only need .writepages()

 - Yosry Ahmed wrote a series which fixes the memcg reclaim target
   beancounting

 - David Hildenbrand has fixed some of our MM selftests for 32-bit
   machines

 - Many singleton patches, as usual

* tag 'mm-stable-2022-12-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (313 commits)
  mm/hugetlb: set head flag before setting compound_order in __prep_compound_gigantic_folio
  mm: mmu_gather: allow more than one batch of delayed rmaps
  mm: fix typo in struct pglist_data code comment
  kmsan: fix memcpy tests
  mm: add cond_resched() in swapin_walk_pmd_entry()
  mm: do not show fs mm pc for VM_LOCKONFAULT pages
  selftests/vm: ksm_functional_tests: fixes for 32bit
  selftests/vm: cow: fix compile warning on 32bit
  selftests/vm: madv_populate: fix missing MADV_POPULATE_(READ|WRITE) definitions
  mm/gup_test: fix PIN_LONGTERM_TEST_READ with highmem
  mm,thp,rmap: fix races between updates of subpages_mapcount
  mm: memcg: fix swapcached stat accounting
  mm: add nodes= arg to memory.reclaim
  mm: disable top-tier fallback to reclaim on proactive reclaim
  selftests: cgroup: make sure reclaim target memcg is unprotected
  selftests: cgroup: refactor proactive reclaim code to reclaim_until()
  mm: memcg: fix stale protection of reclaim target memcg
  mm/mmap: properly unaccount memory on mas_preallocate() failure
  omfs: remove ->writepage
  jfs: remove ->writepage
  ...
2022-12-13 19:29:45 -08:00
Linus Torvalds
268325bda5 Random number generator updates for Linux 6.2-rc1.
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEq5lC5tSkz8NBJiCnSfxwEqXeA64FAmOU+U8ACgkQSfxwEqXe
 A67NnQ//Y5DltmvibyPd7r1TFT2gUYv+Rx3sUV9ZE1NYptd/SWhhcL8c5FZ70Fuw
 bSKCa1uiWjOxosjXT1kGrWq3de7q7oUpAPSOGxgxzoaNURIt58N/ajItCX/4Au8I
 RlGAScHy5e5t41/26a498kB6qJ441fBEqCYKQpPLINMBAhe8TQ+NVp0rlpUwNHFX
 WrUGg4oKWxdBIW3HkDirQjJWDkkAiklRTifQh/Al4b6QDbOnRUGGCeckNOhixsvS
 waHWTld+Td8jRrA4b82tUb2uVZ2/b8dEvj/A8CuTv4yC0lywoyMgBWmJAGOC+UmT
 ZVNdGW02Jc2T+Iap8ZdsEmeLHNqbli4+IcbY5xNlov+tHJ2oz41H9TZoYKbudlr6
 /ReAUPSn7i50PhbQlEruj3eg+M2gjOeh8OF8UKwwRK8PghvyWQ1ScW0l3kUhPIhI
 PdIG6j4+D2mJc1FIj2rTVB+Bg933x6S+qx4zDxGlNp62AARUFYf6EgyD6aXFQVuX
 RxcKb6cjRuFkzFiKc8zkqg5edZH+IJcPNuIBmABqTGBOxbZWURXzIQvK/iULqZa4
 CdGAFIs6FuOh8pFHLI3R4YoHBopbHup/xKDEeAO9KZGyeVIuOSERDxxo5f/ITzcq
 APvT77DFOEuyvanr8RMqqh0yUjzcddXqw9+ieufsAyDwjD9DTuE=
 =QRhK
 -----END PGP SIGNATURE-----

Merge tag 'random-6.2-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random

Pull random number generator updates from Jason Donenfeld:

 - Replace prandom_u32_max() and various open-coded variants of it,
   there is now a new family of functions that uses fast rejection
   sampling to choose properly uniformly random numbers within an
   interval:

       get_random_u32_below(ceil) - [0, ceil)
       get_random_u32_above(floor) - (floor, U32_MAX]
       get_random_u32_inclusive(floor, ceil) - [floor, ceil]

   Coccinelle was used to convert all current users of
   prandom_u32_max(), as well as many open-coded patterns, resulting in
   improvements throughout the tree.

   I'll have a "late" 6.1-rc1 pull for you that removes the now unused
   prandom_u32_max() function, just in case any other trees add a new
   use case of it that needs to converted. According to linux-next,
   there may be two trivial cases of prandom_u32_max() reintroductions
   that are fixable with a 's/.../.../'. So I'll have for you a final
   conversion patch doing that alongside the removal patch during the
   second week.

   This is a treewide change that touches many files throughout.

 - More consistent use of get_random_canary().

 - Updates to comments, documentation, tests, headers, and
   simplification in configuration.

 - The arch_get_random*_early() abstraction was only used by arm64 and
   wasn't entirely useful, so this has been replaced by code that works
   in all relevant contexts.

 - The kernel will use and manage random seeds in non-volatile EFI
   variables, refreshing a variable with a fresh seed when the RNG is
   initialized. The RNG GUID namespace is then hidden from efivarfs to
   prevent accidental leakage.

   These changes are split into random.c infrastructure code used in the
   EFI subsystem, in this pull request, and related support inside of
   EFISTUB, in Ard's EFI tree. These are co-dependent for full
   functionality, but the order of merging doesn't matter.

 - Part of the infrastructure added for the EFI support is also used for
   an improvement to the way vsprintf initializes its siphash key,
   replacing an sleep loop wart.

 - The hardware RNG framework now always calls its correct random.c
   input function, add_hwgenerator_randomness(), rather than sometimes
   going through helpers better suited for other cases.

 - The add_latent_entropy() function has long been called from the fork
   handler, but is a no-op when the latent entropy gcc plugin isn't
   used, which is fine for the purposes of latent entropy.

   But it was missing out on the cycle counter that was also being mixed
   in beside the latent entropy variable. So now, if the latent entropy
   gcc plugin isn't enabled, add_latent_entropy() will expand to a call
   to add_device_randomness(NULL, 0), which adds a cycle counter,
   without the absent latent entropy variable.

 - The RNG is now reseeded from a delayed worker, rather than on demand
   when used. Always running from a worker allows it to make use of the
   CPU RNG on platforms like S390x, whose instructions are too slow to
   do so from interrupts. It also has the effect of adding in new inputs
   more frequently with more regularity, amounting to a long term
   transcript of random values. Plus, it helps a bit with the upcoming
   vDSO implementation (which isn't yet ready for 6.2).

 - The jitter entropy algorithm now tries to execute on many different
   CPUs, round-robining, in hopes of hitting even more memory latencies
   and other unpredictable effects. It also will mix in a cycle counter
   when the entropy timer fires, in addition to being mixed in from the
   main loop, to account more explicitly for fluctuations in that timer
   firing. And the state it touches is now kept within the same cache
   line, so that it's assured that the different execution contexts will
   cause latencies.

* tag 'random-6.2-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random: (23 commits)
  random: include <linux/once.h> in the right header
  random: align entropy_timer_state to cache line
  random: mix in cycle counter when jitter timer fires
  random: spread out jitter callback to different CPUs
  random: remove extraneous period and add a missing one in comments
  efi: random: refresh non-volatile random seed when RNG is initialized
  vsprintf: initialize siphash key using notifier
  random: add back async readiness notifier
  random: reseed in delayed work rather than on-demand
  random: always mix cycle counter in add_latent_entropy()
  hw_random: use add_hwgenerator_randomness() for early entropy
  random: modernize documentation comment on get_random_bytes()
  random: adjust comment to account for removed function
  random: remove early archrandom abstraction
  random: use random.trust_{bootloader,cpu} command line option only
  stackprotector: actually use get_random_canary()
  stackprotector: move get_random_canary() into stackprotector.h
  treewide: use get_random_u32_inclusive() when possible
  treewide: use get_random_u32_{above,below}() instead of manual loop
  treewide: use get_random_u32_below() instead of deprecated function
  ...
2022-12-12 16:22:22 -08:00
Linus Torvalds
06cff4a58e arm64 updates for 6.2
ACPI:
 	* Enable FPDT support for boot-time profiling
 	* Fix CPU PMU probing to work better with PREEMPT_RT
 	* Update SMMUv3 MSI DeviceID parsing to latest IORT spec
 	* APMT support for probing Arm CoreSight PMU devices
 
 CPU features:
 	* Advertise new SVE instructions (v2.1)
 	* Advertise range prefetch instruction
 	* Advertise CSSC ("Common Short Sequence Compression") scalar
 	  instructions, adding things like min, max, abs, popcount
 	* Enable DIT (Data Independent Timing) when running in the kernel
 	* More conversion of system register fields over to the generated
 	  header
 
 CPU misfeatures:
 	* Workaround for Cortex-A715 erratum #2645198
 
 Dynamic SCS:
 	* Support for dynamic shadow call stacks to allow switching at
 	  runtime between Clang's SCS implementation and the CPU's
 	  pointer authentication feature when it is supported (complete
 	  with scary DWARF parser!)
 
 Tracing and debug:
 	* Remove static ftrace in favour of, err, dynamic ftrace!
 	* Seperate 'struct ftrace_regs' from 'struct pt_regs' in core
 	  ftrace and existing arch code
 	* Introduce and implement FTRACE_WITH_ARGS on arm64 to replace
 	  the old FTRACE_WITH_REGS
 	* Extend 'crashkernel=' parameter with default value and fallback
 	  to placement above 4G physical if initial (low) allocation
 	  fails
 
 SVE:
 	* Optimisation to avoid disabling SVE unconditionally on syscall
 	  entry and just zeroing the non-shared state on return instead
 
 Exceptions:
 	* Rework of undefined instruction handling to avoid serialisation
 	  on global lock (this includes emulation of user accesses to the
 	  ID registers)
 
 Perf and PMU:
 	* Support for TLP filters in Hisilicon's PCIe PMU device
 	* Support for the DDR PMU present in Amlogic Meson G12 SoCs
 	* Support for the terribly-named "CoreSight PMU" architecture
 	  from Arm (and Nvidia's implementation of said architecture)
 
 Misc:
 	* Tighten up our boot protocol for systems with memory above
           52 bits physical
 	* Const-ify static keys to satisty jump label asm constraints
 	* Trivial FFA driver cleanups in preparation for v1.1 support
 	* Export the kernel_neon_* APIs as GPL symbols
 	* Harden our instruction generation routines against
 	  instrumentation
 	* A bunch of robustness improvements to our arch-specific selftests
 	* Minor cleanups and fixes all over (kbuild, kprobes, kfence, PMU, ...)
 -----BEGIN PGP SIGNATURE-----
 
 iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmOPLFAQHHdpbGxAa2Vy
 bmVsLm9yZwAKCRC3rHDchMFjNPRcCACLyDTvkimiqfoPxzzgdkx/6QOvw9s3/mXg
 UcTORSZBR1VnYkiMYEKVz/tTfG99dnWtD8/0k/rz48NbhBfsF2sN4ukyBBXVf0zR
 fjnaVyVC11LUgBgZKPo6maV+jf/JWf9hJtpPl06KTiPb2Hw2JX4DXg+PeF8t2hGx
 NLH4ekQOrlDM8mlsN5mc0YsHbiuO7Xe/NRuet8TsgU4bEvLAwO6bzOLVUMqDQZNq
 bQe2ENcGVAzAf7iRJb38lj9qB/5hrQTHRXqLXMSnJyyVjQEwYca0PeJMa7x30bXF
 ZZ+xQ8Wq0mxiffZraf6SE34yD4gaYS4Fziw7rqvydC15vYhzJBH1
 =hV+2
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Will Deacon:
 "The highlights this time are support for dynamically enabling and
  disabling Clang's Shadow Call Stack at boot and a long-awaited
  optimisation to the way in which we handle the SVE register state on
  system call entry to avoid taking unnecessary traps from userspace.

  Summary:

  ACPI:
   - Enable FPDT support for boot-time profiling
   - Fix CPU PMU probing to work better with PREEMPT_RT
   - Update SMMUv3 MSI DeviceID parsing to latest IORT spec
   - APMT support for probing Arm CoreSight PMU devices

  CPU features:
   - Advertise new SVE instructions (v2.1)
   - Advertise range prefetch instruction
   - Advertise CSSC ("Common Short Sequence Compression") scalar
     instructions, adding things like min, max, abs, popcount
   - Enable DIT (Data Independent Timing) when running in the kernel
   - More conversion of system register fields over to the generated
     header

  CPU misfeatures:
   - Workaround for Cortex-A715 erratum #2645198

  Dynamic SCS:
   - Support for dynamic shadow call stacks to allow switching at
     runtime between Clang's SCS implementation and the CPU's pointer
     authentication feature when it is supported (complete with scary
     DWARF parser!)

  Tracing and debug:
   - Remove static ftrace in favour of, err, dynamic ftrace!
   - Seperate 'struct ftrace_regs' from 'struct pt_regs' in core ftrace
     and existing arch code
   - Introduce and implement FTRACE_WITH_ARGS on arm64 to replace the
     old FTRACE_WITH_REGS
   - Extend 'crashkernel=' parameter with default value and fallback to
     placement above 4G physical if initial (low) allocation fails

  SVE:
   - Optimisation to avoid disabling SVE unconditionally on syscall
     entry and just zeroing the non-shared state on return instead

  Exceptions:
   - Rework of undefined instruction handling to avoid serialisation on
     global lock (this includes emulation of user accesses to the ID
     registers)

  Perf and PMU:
   - Support for TLP filters in Hisilicon's PCIe PMU device
   - Support for the DDR PMU present in Amlogic Meson G12 SoCs
   - Support for the terribly-named "CoreSight PMU" architecture from
     Arm (and Nvidia's implementation of said architecture)

  Misc:
   - Tighten up our boot protocol for systems with memory above 52 bits
     physical
   - Const-ify static keys to satisty jump label asm constraints
   - Trivial FFA driver cleanups in preparation for v1.1 support
   - Export the kernel_neon_* APIs as GPL symbols
   - Harden our instruction generation routines against instrumentation
   - A bunch of robustness improvements to our arch-specific selftests
   - Minor cleanups and fixes all over (kbuild, kprobes, kfence, PMU, ...)"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (151 commits)
  arm64: kprobes: Return DBG_HOOK_ERROR if kprobes can not handle a BRK
  arm64: kprobes: Let arch do_page_fault() fix up page fault in user handler
  arm64: Prohibit instrumentation on arch_stack_walk()
  arm64:uprobe fix the uprobe SWBP_INSN in big-endian
  arm64: alternatives: add __init/__initconst to some functions/variables
  arm_pmu: Drop redundant armpmu->map_event() in armpmu_event_init()
  kselftest/arm64: Allow epoll_wait() to return more than one result
  kselftest/arm64: Don't drain output while spawning children
  kselftest/arm64: Hold fp-stress children until they're all spawned
  arm64/sysreg: Remove duplicate definitions from asm/sysreg.h
  arm64/sysreg: Convert ID_DFR1_EL1 to automatic generation
  arm64/sysreg: Convert ID_DFR0_EL1 to automatic generation
  arm64/sysreg: Convert ID_AFR0_EL1 to automatic generation
  arm64/sysreg: Convert ID_MMFR5_EL1 to automatic generation
  arm64/sysreg: Convert MVFR2_EL1 to automatic generation
  arm64/sysreg: Convert MVFR1_EL1 to automatic generation
  arm64/sysreg: Convert MVFR0_EL1 to automatic generation
  arm64/sysreg: Convert ID_PFR2_EL1 to automatic generation
  arm64/sysreg: Convert ID_PFR1_EL1 to automatic generation
  arm64/sysreg: Convert ID_PFR0_EL1 to automatic generation
  ...
2022-12-12 09:50:05 -08:00
Michael Ellerman
f24f21c412 Merge branch 'topic/objtool' into next
Merge the powerpc objtool support, which we were keeping in a topic
branch in case of any merge conflicts.
2022-12-08 23:57:47 +11:00
Nathan Lynch
336e2554ec powerpc/rtas: document rtas_call()
rtas_call() has a complex calling convention, non-standard return
values, and many users. Add kernel-doc for it and remove the less
structured commentary from rtas.h.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221118150751.469393-2-nathanl@linux.ibm.com
2022-12-07 22:20:33 +11:00
Laurent Dufour
e13d23a404 powerpc: export the CPU node count
At boot time, the FDT is parsed to compute the number of CPUs.
In addition count the number of CPU nodes and export it.

This is useful when building the FDT for a kexeced kernel since we need to
take in account the CPU node added since the boot time during CPU hotplug
operations.

Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221110180619.15796-2-ldufour@linux.ibm.com
2022-12-07 20:14:49 +11:00
Christophe Leroy
6f3a81b600 powerpc/code-patching: Remove protection against patching init addresses after init
Once init section is freed, attempting to patch init code
ends up in the weed.

Commit 51c3c62b58 ("powerpc: Avoid code patching freed init sections")
protected patch_instruction() against that, but it is the responsibility
of the caller to ensure that the patched memory is valid.

All callers have now been verified and fixed so the check
can be removed.

This improves ftrace activation by about 2% on 8xx.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/504310828f473d424e2ed229eff57bf075f52796.1669969781.git.christophe.leroy@csgroup.eu
2022-12-02 21:59:57 +11:00
Michael Jeanson
ad050d2390 powerpc/ftrace: fix syscall tracing on PPC64_ELF_ABI_V1
In v5.7 the powerpc syscall entry/exit logic was rewritten in C, on
PPC64_ELF_ABI_V1 this resulted in the symbols in the syscall table
changing from their dot prefixed variant to the non-prefixed ones.

Since ftrace prefixes a dot to the syscall names when matching them to
build its syscall event list, this resulted in no syscall events being
available.

Remove the PPC64_ELF_ABI_V1 specific version of
arch_syscall_match_sym_name to have the same behavior across all powerpc
variants.

Fixes: 68b34588e2 ("powerpc/64/sycall: Implement syscall entry/exit logic in C")
Cc: stable@vger.kernel.org # v5.7+
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221201161442.2127231-1-mjeanson@efficios.com
2022-12-02 20:57:09 +11:00
Rohan McLure
cbf892ba56 powerpc/64: Add interrupt register sanitisation macros
Include in asm/ppc_asm.h macros to be used in multiple successive
patches to implement zeroising architected registers in interrupt
handlers. Registers will be sanitised in this fashion in future patches
to reduce the speculation influence of user-controlled register values.
These mitigations will be configurable through the
CONFIG_INTERRUPT_SANITIZE_REGISTERS Kconfig option.

Included are macros for conditionally zeroising registers and restoring
as required with the mitigation enabled. With the mitigation disabled,
non-volatiles must be restored on demand at separate locations to
those required by the mitigation.

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221201071019.1953023-2-rmclure@linux.ibm.com
2022-12-02 20:45:57 +11:00
Michael Ellerman
22db71bcba Merge branch 'topic/qspinlock' into next
Merge Nick's powerpc qspinlock implementation. From his cover letter:

This replaces the generic queued spinlock code (like s390 does) with our
own implementation.

Generic PV qspinlock code is causing latency / starvation regressions on
large systems that are resulting in hard lockups reported (mostly in
pathoogical cases). The generic qspinlock code has a number of issues
important for powerpc hardware and hypervisors that aren't easily solved
without changing code that would impact other architectures. Follow
s390's lead and implement our own for now.

Issues for powerpc using generic qspinlocks:
  - The previous lock value should not be loaded with simple loads, and
    need not be passed around from previous loads or cmpxchg results,
    because powerpc uses ll/sc-style atomics which can perform more
    complex operations that do not require this. powerpc implementations
    tend to prefer loads use larx for improved coherency performance.
  - The queueing process should absolutely minimise the number of stores
    to the lock word to reduce exclusive coherency probes, important for
    large system scalability. The pending logic is counter productive
    here.
  - Non-atomic unlock for paravirt locks is important (atomic
    instructions tend to still be more expensive than x86 CPUs).
  - Yielding to the lock owner is important in the oversubscribed
    paravirt case, which requires storing the owner CPU in the lock
    word.
  - More control of lock stealing for the paravirt case is important to
    keep latency down on large systems.
  - The lock acquisition operation should always be made with a special
    variant of atomic instructions with the lock hint bit set,
    including (especially) in the queueing paths. This is more a matter
    of adding more arch lock helpers so not an insurmountable problem
    for generic code.
2022-12-02 18:04:56 +11:00
Nicholas Piggin
dfecd06bc5 powerpc: remove STACK_FRAME_OVERHEAD
This is equal to STACK_FRAME_MIN_SIZE on 32-bit and 64-bit ELFv1, and no
longer used in 64-bit ELFv2, so replace STACK_FRAME_OVERHEAD occurrences
with STACK_FRAME_MIN_SIZE.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221127124942.1665522-18-npiggin@gmail.com
2022-12-02 17:54:09 +11:00
Nicholas Piggin
cd52414d5a powerpc/64: ELFv2 use minimal stack frames in int and switch frame sizes
Adjust the ELFv2 interrupt and switch frames to the minimum C ABI size,
plus pt_regs, plus 16 bytes for the aligned regs marker for the int
frame (and the switch frame needs to match that because it uses the same
regs offset as the int frame).

This saves 80 bytes of kernel stack per interrupt. It's the principle of
getting our accounting right that's more important than the practical
saving.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221127124942.1665522-17-npiggin@gmail.com
2022-12-02 17:54:09 +11:00
Nicholas Piggin
4cefb0f6c5 powerpc: split validate_sp into two functions
Most callers just want to validate an arbitrary kernel stack pointer,
some need a particular size. Make the size case the exceptional one
with an extra function.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221127124942.1665522-15-npiggin@gmail.com
2022-12-02 17:54:09 +11:00
Nicholas Piggin
6f291a0381 powerpc: add a define for the switch frame size and regs offset
This is open-coded in process.c, ppc32 uses a different define with the
same value, and the C definition is name differently which makes it an
extra indirection to grep for.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221127124942.1665522-12-npiggin@gmail.com
2022-12-02 17:54:08 +11:00
Nicholas Piggin
1223e5a20f powerpc: add a define for the user interrupt frame size
The user interrupt frame is a different size from the kernel frame, so
give it its own name.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221127124942.1665522-11-npiggin@gmail.com
2022-12-02 17:54:08 +11:00
Nicholas Piggin
e856e33692 powerpc: Rename STACK_FRAME_MARKER and derive it from frame offset
This is a count of longs from the stack pointer to the regs marker.
Rename it to make it more distinct from the other byte offsets. It
can be derived from the byte offset definitions just added.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221127124942.1665522-10-npiggin@gmail.com
2022-12-02 17:54:08 +11:00
Nicholas Piggin
d2e8ff9f14 powerpc: add a definition for the marker offset within the interrupt frame
Define a constant rather than open-code the offset for the
"regs" marker.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221127124942.1665522-9-npiggin@gmail.com
2022-12-02 17:54:08 +11:00
Nicholas Piggin
c03be0a3f3 powerpc: add definition for pt_regs offset within an interrupt frame
This is a common offset that currently uses the overloaded
STACK_FRAME_OVERHEAD constant. It's easier to read and more
flexible to use a specific regs offset for this.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221127124942.1665522-8-npiggin@gmail.com
2022-12-02 17:54:08 +11:00
Nicholas Piggin
d6aee468e4 powerpc/64: Remove asm interrupt tracing call helpers
These are now unused. Remove.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221127124942.1665522-3-npiggin@gmail.com
2022-12-02 17:54:07 +11:00
Nicholas Piggin
0b2199841a powerpc/qspinlock: add compile-time tuning adjustments
This adds compile-time options that allow the EH lock hint bit to be
enabled or disabled, and adds some new options that may or may not
help matters. To help with experimentation and tuning.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221126095932.1234527-18-npiggin@gmail.com
2022-12-02 17:48:50 +11:00
Nicholas Piggin
12b459a5eb powerpc/qspinlock: provide accounting and options for sleepy locks
Finding the owner or a queued waiter on a lock with a preempted vcpu is
indicative of an oversubscribed guest causing the lock to get into
trouble. Provide some options to detect this situation and have new CPUs
avoid queueing for a longer time (more steal iterations) to minimise the
problems caused by vcpu preemption on the queue.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221126095932.1234527-17-npiggin@gmail.com
2022-12-02 17:48:50 +11:00
Nicholas Piggin
f61ab43cc1 powerpc/qspinlock: allow lock stealing in trylock and lock fastpath
This change allows trylock to steal the lock. It also allows the initial
lock attempt to steal the lock rather than bailing out and going to the
slow path.

This gives trylock more strength: without this a continually-contended
lock will never permit a trylock to succeed. With this change, the
trylock has a small but non-zero chance.

It also gives the lock fastpath most of the benefit of passing the
reservation back through to the steal loop in the slow path without the
complexity.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221126095932.1234527-13-npiggin@gmail.com
2022-12-02 17:48:50 +11:00
Nicholas Piggin
e1a31e7fd7 powerpc/qspinlock: store owner CPU in lock word
Store the owner CPU number in the lock word so it may be yielded to,
as powerpc's paravirtualised simple spinlocks do.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221126095932.1234527-7-npiggin@gmail.com
2022-12-02 17:48:49 +11:00
Nicholas Piggin
0944534ef4 powerpc/qspinlock: theft prevention to control latency
Give the queue head the ability to stop stealers. After a number of
spins without successfully acquiring the lock, the queue head sets
this, which halts stealing and will assure it is the next owner.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221126095932.1234527-6-npiggin@gmail.com
2022-12-02 17:48:49 +11:00
Nicholas Piggin
6aa42f883c powerpc/qspinlock: allow new waiters to steal the lock before queueing
Allow new waiters to "steal" the lock before queueing. That is, to
acquire it while other CPUs have queued.

This particularly helps paravirt performance when physical CPUs are
oversubscribed, by keeping the lock from becoming a strict FIFO and
vCPU preemption causing queue train wrecks.

The new __queued_spin_trylock_steal() function is put in qspinlock.h
to save having to move it, because it will be used there by a later
change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221126095932.1234527-5-npiggin@gmail.com
2022-12-02 17:48:49 +11:00
Nicholas Piggin
b3a73b7db2 powerpc/qspinlock: convert atomic operations to assembly
This uses more optimal ll/sc style access patterns (rather than
cmpxchg), and also sets the EH=1 lock hint on those operations
which acquire ownership of the lock.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221126095932.1234527-4-npiggin@gmail.com
2022-12-02 17:48:49 +11:00
Nicholas Piggin
4c93c2e4b9 powerpc/qspinlock: use a half-word store to unlock to avoid larx/stcx.
The first 16 bits of the lock are only modified by the owner, and other
modifications always use atomic operations on the entire 32 bits, so
unlocks can use plain stores on the 16 bits. This is the same kind of
optimisation done by core qspinlock code.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221126095932.1234527-3-npiggin@gmail.com
2022-12-02 17:48:49 +11:00
Nicholas Piggin
84990b1695 powerpc/qspinlock: add mcs queueing for contended waiters
This forms the basis of the qspinlock slow path.

Like generic qspinlocks and unlike the vanilla MCS algorithm, the lock
owner does not participate in the queue, only waiters. The first waiter
spins on the lock word, then when the lock is released it takes
ownership and unqueues the next waiter. This is how qspinlocks can be
implemented with the spinlock API -- lock owners don't need a node, only
waiters do.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221126095932.1234527-2-npiggin@gmail.com
2022-12-02 17:48:49 +11:00
Nicholas Piggin
9f61521c7a powerpc/qspinlock: powerpc qspinlock implementation
Add a powerpc specific implementation of queued spinlocks. This is the
build framework with a very simple (non-queued) spinlock implementation
to begin with. Later changes add queueing, and other features and
optimisations one-at-a-time. It is done this way to more easily see how
the queued spinlocks are built, and to make performance and correctness
bisects more useful.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Drop paravirt.h & processor.h changes to fix 32-bit build]
[mpe: Fix 32-bit build of qspinlock.o & disallow GENERIC_LOCKBREAK per Nick]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/CONLLQB6DCJU.2ZPOS7T6S5GRR@bobo
2022-12-02 17:48:02 +11:00
David Hildenbrand
d6379159f4 mm: remove unused savedwrite infrastructure
NUMA hinting no longer uses savedwrite, let's rip it out.

... and while at it, drop __pte_write() and __pmd_write() on ppc64.

Link: https://lkml.kernel.org/r/20221108174652.198904-7-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nadav Amit <namit@vmware.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-30 15:58:49 -08:00
Benjamin Gray
274d842fa1 powerpc/tlb: Add local flush for page given mm_struct and psize
Adds a local TLB flush operation that works given an mm_struct, VA to
flush, and page size representation. Most implementations mirror the
surrounding code. The book3s/32/tlbflush.h implementation is left as
a BUILD_BUG because it is more complicated and not required for
anything as yet.

This removes the need to create a vm_area_struct, which the temporary
patching mm work does not need.

Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221109045112.187069-8-bgray@linux.ibm.com
2022-11-30 21:46:49 +11:00
Benjamin Gray
d34471c9bd powerpc/mm: Remove flush_all_mm, local_flush_all_mm
These functions were introduced for "cxl: Enable global TLBIs for cxl
contexts" [1], which ended up using them for Radix only. They were never
implemented on Hash (and creating an implementation appears to be
difficult), so nothing can actually rely on them.

They behave differently to the existing surrounding functions too, in
that they actually need to do something on Hash. The other functions
are primarily for use in generic code that expects their definitions,
but Hash updates the TLB during PTE updates.

After replacing the only usage with the Radix specific version, there
are no more users of these functions, and given they are not implemented
anyway it is safe to delete them.

[1]: https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20170903181513.29635-1-fbarrat@linux.vnet.ibm.com/

Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221109045112.187069-7-bgray@linux.ibm.com
2022-11-30 21:46:49 +11:00
Benjamin Gray
0f0a0a6091 cxl: Use radix__flush_all_mm instead of generic flush_all_mm
The generic implementation of this function isn't really generic (Hash
is not implemented). Unfortunately, the runtime warnings cannot be
replaced with BUILD_BUG's, so it seems safer not to provide a stub in
the first place.

Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221109045112.187069-6-bgray@linux.ibm.com
2022-11-30 21:46:48 +11:00
Benjamin Gray
baf1ed24b2 powerpc/mm: Remove empty hash__ functions
The empty hash__* functions are unnecessary. The empty definitions were
introduced when 64-bit Hash support was added, as the functions were
still used in generic code. These empty definitions were prefixed with
hash__ when Radix support was added, and new wrappers with the original
names were added that selected the Radix or Hash version based on
radix_enabled().

But the hash__ prefixed functions were not part of a public interface,
so there is no need to include them for compatibility with anything.
Generic code will use the non-prefixed wrappers, and Hash specific code
will know that there is no point in calling them (or even worse, call
them and expect them to do something).

Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221109045112.187069-5-bgray@linux.ibm.com
2022-11-30 21:46:48 +11:00
Jordan Niethe
3671f4ebe3 powerpc: Allow clearing and restoring registers independent of saved breakpoint state
For the coming temporary mm used for instruction patching, the
breakpoint registers need to be cleared to prevent them from
accidentally being triggered. As soon as the patching is done, the
breakpoints will be restored.

The breakpoint state is stored in the per-cpu variable current_brk[].
Add a suspend_breakpoints() function which will clear the breakpoint
registers without touching the state in current_brk[]. Add a pair
function restore_breakpoints() which will move the state in
current_brk[] back to the registers.

Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221109045112.187069-2-bgray@linux.ibm.com
2022-11-30 21:46:48 +11:00
Christoph Hellwig
dea681c91d powerpc/ps3: mark ps3_system_bus_type static
ps3_system_bus_type is only used inside of system-bus.c, so remove
the external declaration and the very outdated comment next to it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Geoff Levand <geoff@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221122072225.423432-1-hch@lst.de
2022-11-30 21:46:47 +11:00
Michael Ellerman
611c020239 Merge branch 'fixes' into next
Merge our fixes branch to bring in some changes that are prerequisites
for work in next.
2022-11-30 21:46:06 +11:00
Michael Ellerman
9a04b0febb Merge branch 'topic/ppc-kvm' into next
Merge our KVM topic branch.
2022-11-30 20:42:22 +11:00
Nicholas Piggin
6c645b01e5 KVM: PPC: Book3E: Fix CONFIG_TRACE_IRQFLAGS support
32-bit does not trace_irqs_off() to match the trace_irqs_on() call in
kvmppc_fix_ee_before_entry(). This can lead to irqs being enabled twice
in the trace, and the irqs-off region between guest exit and the host
enabling local irqs again is not properly traced.

64-bit code does call this, but from asm code where volatiles are live
and so incorrectly get clobbered.

Move the irq reconcile into C to fix both problems.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221127124942.1665522-2-npiggin@gmail.com
2022-11-30 20:40:17 +11:00
Michael Ellerman
2e7ec190a0 powerpc/64s: Add missing declaration for machine_check_early_boot()
There's no declaration for machine_check_early_boot(), which leads to a
build failure with W=1. Add one.

Fixes: 2f5182cffa ("powerpc/64s: early boot machine check handler")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221125132521.2167039-1-mpe@ellerman.id.au
2022-11-26 00:25:32 +11:00
Nayna Jain
af223e1728 powerpc/pseries: Fix the H_CALL error code in PLPKS driver
PAPR Spec defines H_P1 actually as H_PARAMETER and maps H_ABORTED to
a different numerical value.

Fix the error codes as per PAPR Specification.

Fixes: 2454a7af0f ("powerpc/pseries: define driver for Platform KeyStore")
Signed-off-by: Nayna Jain <nayna@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221106205839.600442-3-nayna@linux.ibm.com
2022-11-24 23:31:50 +11:00
Nicholas Piggin
f985adaf2f powerpc: remove the last remnants of cputime_t
cputime_t was a core kernel type, removed by commits
ed5c8c854f2b..b672592f0221. As explained in commit b672592f02
("sched/cputime: Remove generic asm headers"), the final cleanup
is for the arch to provide cputime_to_nsec[s](). Commit ade7667a98
("powerpc: Add cputime_to_nsecs()") did that, but justdidn't remove
the then-unused cputime_to_usecs(), cputime_t type, and associated
remnants.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221006105653.115829-1-npiggin@gmail.com
2022-11-24 23:31:48 +11:00
Christophe Leroy
0b4721815c powerpc/8xx: Reverse order entries are written by __set_pte_at()
At the time being, with 16k pages __set_pte_at() writes table entries
in reverse order:

 294:	91 49 00 0c 	stw     r10,12(r9)
 298:	91 49 00 08 	stw     r10,8(r9)
 29c:	91 49 00 04 	stw     r10,4(r9)
 2a0:	91 49 00 00 	stw     r10,0(r9)

Allthough there should be no impact at all as it stays in a single
cacheline, reverse the writing in a more natural order.

 288:	91 49 00 0c 	stw     r10,0(r9)
 28c:	91 49 00 08 	stw     r10,4(r9)
 290:	91 49 00 04 	stw     r10,8(r9)
 294:	91 49 00 00 	stw     r10,12(r9)

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/67c3b5d44edfec054234ea9b4d05fc4b4f7f8a0e.1664346554.git.christophe.leroy@csgroup.eu
2022-11-24 23:31:48 +11:00
Christophe Leroy
f2c45962cc powerpc/8xx: Simplify pte_update() with 16k pages
While looking at code generated for code patching, I saw that
pte_clear generated:

 2d8:	38 a0 00 00 	li      r5,0
 2dc:	38 e0 10 00 	li      r7,4096
 2e0:	39 00 20 00 	li      r8,8192
 2e4:	39 40 30 00 	li      r10,12288
 2e8:	90 a9 00 00 	stw     r5,0(r9)
 2ec:	90 e9 00 04 	stw     r7,4(r9)
 2f0:	91 09 00 08 	stw     r8,8(r9)
 2f4:	91 49 00 0c 	stw     r10,12(r9)

With 16k pages, only the first entry is used by the kernel, so no need
to adapt the address of other entries. Only duplicate the first entry
for hardware.

Now it is:

 2cc:	39 40 00 00 	li      r10,0
 2d0:	91 49 00 00 	stw     r10,0(r9)
 2d4:	91 49 00 04 	stw     r10,4(r9)
 2d8:	91 49 00 08 	stw     r10,8(r9)
 2dc:	91 49 00 0c 	stw     r10,12(r9)

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/65f76300de07091a59a042a3db2d0ce9b939a05c.1664346532.git.christophe.leroy@csgroup.eu
2022-11-24 23:31:48 +11:00
Kajol Jain
4ac9d3187c powerpc/kvm: Remove unused references for MMCR3/SIER2/SIER3 registers
Commit 57dc0eed73 ("KVM: PPC: Book3S HV P9: Implement PMU save/restore
in C") removed the PMU save/restore functions from assembly code and
implemented these functions in C, for power9 and later platforms.

After the code refactoring, Performance Monitoring Unit (PMU) registers
became part of "p9_host_os_sprs" structure and now this structure is
used to save/restore pmu host registers, for power9 and later platfroms.
But we still have old unused registers references. Patch removes unused
host_mmcr references for Monitor Mode Control Register 3 (MMCR3)/
Sampled Instruction Event Register 2 (SIER2)/ SIER3 registers from
"struct kvmppc_host_state".

Fixes: 57dc0eed73 ("KVM: PPC: Book3S HV P9: Implement PMU save/restore in C")
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220916105736.268153-3-disgoel@linux.vnet.ibm.com
2022-11-24 23:31:47 +11:00
Nicholas Piggin
b86cf14f24 powerpc: add compile-time support for lbarx, lharx
ISA v2.06 (POWER7 and up) as well as e6500 support lbarx and lharx.
Add a compile option that allows code to use it, and add support in
cmpxchg and xchg 8 and 16 bit values without shifting and masking.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220909052312.63916-1-npiggin@gmail.com
2022-11-24 23:31:47 +11:00
Naveen N. Rao
7af82ff90a powerpc/ftrace: Ignore weak functions
Extend commit b39181f7c6 ("ftrace: Add FTRACE_MCOUNT_MAX_OFFSET to
avoid adding weak function") to ppc32 and ppc64 -mprofile-kernel by
defining FTRACE_MCOUNT_MAX_OFFSET.

For ppc64 -mprofile-kernel ABI, we can have two instructions at function
entry for TOC setup followed by 'mflr r0' and 'bl _mcount'. So, the
mcount location is at most the 4th instruction in a function. For ppc32,
mcount location is always the 3rd instruction in a function, preceded by
'mflr r0' and 'stw r0,4(r1)'.

With this patch, and with ppc64le_guest_defconfig and some ftrace/bpf
config items enabled:
  # grep __ftrace_invalid_address available_filter_functions | wc -l
  79

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220809105425.424045-1-naveen.n.rao@linux.vnet.ibm.com
2022-11-24 23:31:46 +11:00
Christophe Leroy
e75d07bd83 powerpc: Remove find_current_mm_pte()
Last usage of find_current_mm_pte() was removed by
commit 15759cb054 ("powerpc/perf/callchain: Use
__get_user_pages_fast in read_user_stack_slow")

Remove it.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/ec79f462a3bfa8365b7df505e574d5d85246bc68.1646818177.git.christophe.leroy@csgroup.eu
2022-11-24 23:12:18 +11:00
Mark Rutland
94d095ffa0 ftrace: abstract DYNAMIC_FTRACE_WITH_ARGS accesses
In subsequent patches we'll arrange for architectures to have an
ftrace_regs which is entirely distinct from pt_regs. In preparation for
this, we need to minimize the use of pt_regs to where strictly necessary
in the core ftrace code.

This patch adds new ftrace_regs_{get,set}_*() helpers which can be used
to manipulate ftrace_regs. When CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS=y,
these can always be used on any ftrace_regs, and when
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS=n these can be used when regs are
available. A new ftrace_regs_has_args(fregs) helper is added which code
can use to check when these are usable.

Co-developed-by: Florent Revest <revest@chromium.org>
Signed-off-by: Florent Revest <revest@chromium.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20221103170520.931305-4-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-11-18 13:56:41 +00:00
Mark Rutland
0ef86097f1 ftrace: rename ftrace_instruction_pointer_set() -> ftrace_regs_set_instruction_pointer()
In subsequent patches we'll add a sew of ftrace_regs_{get,set}_*()
helpers. In preparation, this patch renames
ftrace_instruction_pointer_set() to
ftrace_regs_set_instruction_pointer().

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Florent Revest <revest@chromium.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20221103170520.931305-3-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-11-18 13:56:41 +00:00
Sathvika Vasireddy
e52ec98c5a objtool/powerpc: Enable objtool to be built on ppc
This patch adds [stub] implementations for required functions, inorder
to enable objtool build on powerpc.

[Christophe Leroy: powerpc: Add missing asm/asm.h for objtool,
Use local variables for type and imm in arch_decode_instruction(),
Adapt len for prefixed instructions.]

Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Sathvika Vasireddy <sv@linux.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221114175754.1131267-16-sv@linux.ibm.com
2022-11-18 19:00:16 +11:00
Jason A. Donenfeld
622754e84b stackprotector: actually use get_random_canary()
The RNG always mixes in the Linux version extremely early in boot. It
also always includes a cycle counter, not only during early boot, but
each and every time it is invoked prior to being fully initialized.
Together, this means that the use of additional xors inside of the
various stackprotector.h files is superfluous and over-complicated.
Instead, we can get exactly the same thing, but better, by just calling
`get_random_canary()`.

Acked-by: Guo Ren <guoren@kernel.org> # for csky
Acked-by: Catalin Marinas <catalin.marinas@arm.com> # for arm64
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-11-18 02:18:10 +01:00
Sathvika Vasireddy
01f2cf0b99 powerpc: Override __ALIGN and __ALIGN_STR macros
In a subsequent patch, we would want to annotate powerpc assembly functions
with SYM_FUNC_START_LOCAL macro. This macro depends on __ALIGN macro.

The default expansion of __ALIGN macro is:
        #define __ALIGN      .align 4,0x90

So, override __ALIGN and __ALIGN_STR macros to use the same alignment as
that of the existing _GLOBAL macro. Also, do not pad with 0x90, because
repeated 0x90s are not a nop or trap on powerpc.

Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Sathvika Vasireddy <sv@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221114175754.1131267-3-sv@linux.ibm.com
2022-11-15 20:11:47 +11:00
Sathvika Vasireddy
93e3f45a26 powerpc: Fix __WARN_FLAGS() for use with Objtool
Commit 1e688dd2a3 ("powerpc/bug: Provide better flexibility to
WARN_ON/__WARN_FLAGS() with asm goto") updated __WARN_FLAGS() to use asm
goto, and added a call to 'unreachable()' after the asm goto for optimal
code generation. With CONFIG_OBJTOOL enabled, 'annotate_unreachable()'
statement in 'unreachable()' tries to note down the location of the
subsequent instruction in a separate elf section to aid code flow
analysis. However, on powerpc, this results in gcc emitting a call to a
symbol of size 0. This results in objtool complaining of "unannotated
intra-function call" since the target symbol is not a valid function
call destination.

Objtool wants this annotation for code flow analysis, which we are not
yet enabling on powerpc. As such, expand the call to 'unreachable()' in
__WARN_FLAGS() without annotate_unreachable():
        barrier_before_unreachable();
        __builtin_unreachable();

This still results in optimal code generation for __WARN_FLAGS(), while
getting rid of the objtool warning.

We still need barrier_before_unreachable() to work around gcc bugs 82365
and 106751:
- https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82365
- https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106751

Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Sathvika Vasireddy <sv@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221114175754.1131267-2-sv@linux.ibm.com
2022-11-15 20:11:47 +11:00
Kefeng Wang
e025ab842e mm: remove kern_addr_valid() completely
Most architectures (except arm64/x86/sparc) simply return 1 for
kern_addr_valid(), which is only used in read_kcore(), and it calls
copy_from_kernel_nofault() which could check whether the address is a
valid kernel address.  So as there is no need for kern_addr_valid(), let's
remove it.

Link: https://lkml.kernel.org/r/20221018074014.185687-1-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>	[m68k]
Acked-by: Heiko Carstens <hca@linux.ibm.com>		[s390]
Acked-by: Christoph Hellwig <hch@lst.de>
Acked-by: Helge Deller <deller@gmx.de>			[parisc]
Acked-by: Michael Ellerman <mpe@ellerman.id.au>		[powerpc]
Acked-by: Guo Ren <guoren@kernel.org>			[csky]
Acked-by: Catalin Marinas <catalin.marinas@arm.com>	[arm64]
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: <aou@eecs.berkeley.edu>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Chris Zankel <chris@zankel.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Palmer Dabbelt <palmer@rivosinc.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Xuerui Wang <kernel@xen0n.name>
Cc: Yoshinori Sato <ysato@users.osdn.me>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-08 17:37:18 -08:00
Andreas Schwab
ce883a2ba3 powerpc/32: fix syscall wrappers with 64-bit arguments
With the introduction of syscall wrappers all wrappers for syscalls with
64-bit arguments must be handled specially, not only those that have
unaligned 64-bit arguments. This left out the fallocate() and
sync_file_range2() syscalls.

Fixes: 7e92e01b72 ("powerpc: Provide syscall wrapper")
Fixes: e237506238 ("powerpc/32: fix syscall wrappers with 64-bit arguments of unaligned register-pairs")
Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/87mt9cxd6g.fsf_-_@igel.home
2022-11-01 10:24:09 +11:00
Nicholas Piggin
b9ef323ea1 powerpc/64s: Disable preemption in hash lazy mmu mode
apply_to_page_range on kernel pages does not disable preemption, which
is a requirement for hash's lazy mmu mode, which keeps track of the
TLBs to flush with a per-cpu array.

Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221013151647.1857994-1-npiggin@gmail.com
2022-10-18 22:46:18 +11:00
Linus Torvalds
70609c1495 powerpc fixes for 6.1 #2
- Fix 32-bit syscall wrappers with 64-bit arguments of unaligned register-pairs.
    Notably this broke ftruncate64 & pread/write64, which can lead to file corruption.
 
  - Fix lost interrupts when returning to soft-masked context on 64-bit.
 
  - Fix build failure when CONFIG_DTL=n.
 
 Thanks to: Nicholas Piggin, Jason A. Donenfeld, Guenter Roeck, Arnd Bergmann, Sachin Sant.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmNJQp8THG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgPd3D/9+uPapAs6/ZGkrbojseiF/fZNyIfTF
 HcLZ3iWUP9+APuaRxd1m3HbBoWdgFQ9yC5abYnX1z+E/yEhbTaOof9Gu1vXfxbIF
 Orlc+X12uZMrdU1L2qlgk/89QLeKOTlp/b2eUXJ332Vl4DfJDoknMsqTqbMz9NrL
 EDj6vmG3JzJgKHmplwxd7d4CExQNh9khZr3lZ6Ae/hC1cZ1PVXD67akwxsc0i8E2
 dBGzwsP1SYXPrBMXONT1GNs/whXrKBHNyLWFueab3p0FsSkbc4sY/HaAPzjq7YzK
 aPV0FQi2YpIZ5IIunZA+vW+tQ8EvyOpOUPMGb7iiN/31YcFbf/k6Y92AlCFlHRq3
 9vvsNGzDeFBfvH2wWcR1XaAMQHmBpC4L+dYbCKIyuK9DjgZtpZUV3acagRMFXqMN
 TeN9h2GXAYxCdNc27I5v3wAvbNcx69E4Bs5F7voKGD4pb7V/4d/li2oix6v0FKIy
 qAt1PGhD4IgPE3oJOEOYsJRVQNyDfG4eKlFh5ijyyc5hQeGNeKiAH+G+a61R3ueo
 GvVOdkR/1QxK1XkbgrMyclb0nHN1kEjRHeIcMFW1+dMKA9UvOlJ6u8nkOFRvA3js
 r00O1uE7DT5WzdGXgEHCCDd7PybBKxpM32m4/vzgDM7xG+DHxp3j7LTQsDOfkFsH
 bkc31LqrAw3yDQ==
 =B+7s
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-6.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

 - Fix 32-bit syscall wrappers with 64-bit arguments of unaligned
   register-pairs. Notably this broke ftruncate64 & pread/write64, which
   can lead to file corruption.

 - Fix lost interrupts when returning to soft-masked context on 64-bit.

 - Fix build failure when CONFIG_DTL=n.

Thanks to Nicholas Piggin, Jason A. Donenfeld, Guenter Roeck, Arnd
Bergmann, and Sachin Sant.

* tag 'powerpc-6.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/pseries: Fix CONFIG_DTL=n build
  powerpc/64s/interrupt: Fix lost interrupts when returning to soft-masked context
  powerpc/32: fix syscall wrappers with 64-bit arguments of unaligned register-pairs
2022-10-14 11:16:18 -07:00
Linus Torvalds
676cb49573 - hfs and hfsplus kmap API modernization from Fabio Francesco
- Valentin Schneider makes crash-kexec work properly when invoked from
   an NMI-time panic.
 
 - ntfs bugfixes from Hawkins Jiawei
 
 - Jiebin Sun improves IPC msg scalability by replacing atomic_t's with
   percpu counters.
 
 - nilfs2 cleanups from Minghao Chi
 
 - lots of other single patches all over the tree!
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY0Yf0gAKCRDdBJ7gKXxA
 joapAQDT1d1zu7T8yf9cQXkYnZVuBKCjxKE/IsYvqaq1a42MjQD/SeWZg0wV05B8
 DhJPj9nkEp6R3Rj3Mssip+3vNuceAQM=
 =lUQY
 -----END PGP SIGNATURE-----

Merge tag 'mm-nonmm-stable-2022-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull non-MM updates from Andrew Morton:

 - hfs and hfsplus kmap API modernization (Fabio Francesco)

 - make crash-kexec work properly when invoked from an NMI-time panic
   (Valentin Schneider)

 - ntfs bugfixes (Hawkins Jiawei)

 - improve IPC msg scalability by replacing atomic_t's with percpu
   counters (Jiebin Sun)

 - nilfs2 cleanups (Minghao Chi)

 - lots of other single patches all over the tree!

* tag 'mm-nonmm-stable-2022-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (71 commits)
  include/linux/entry-common.h: remove has_signal comment of arch_do_signal_or_restart() prototype
  proc: test how it holds up with mapping'less process
  mailmap: update Frank Rowand email address
  ia64: mca: use strscpy() is more robust and safer
  init/Kconfig: fix unmet direct dependencies
  ia64: update config files
  nilfs2: replace WARN_ONs by nilfs_error for checkpoint acquisition failure
  fork: remove duplicate included header files
  init/main.c: remove unnecessary (void*) conversions
  proc: mark more files as permanent
  nilfs2: remove the unneeded result variable
  nilfs2: delete unnecessary checks before brelse()
  checkpatch: warn for non-standard fixes tag style
  usr/gen_init_cpio.c: remove unnecessary -1 values from int file
  ipc/msg: mitigate the lock contention with percpu counter
  percpu: add percpu_counter_add_local and percpu_counter_sub_local
  fs/ocfs2: fix repeated words in comments
  relay: use kvcalloc to alloc page array in relay_alloc_page_array
  proc: make config PROC_CHILDREN depend on PROC_FS
  fs: uninline inode_maybe_inc_iversion()
  ...
2022-10-12 11:00:22 -07:00
Nicholas Piggin
e237506238 powerpc/32: fix syscall wrappers with 64-bit arguments of unaligned register-pairs
powerpc 32-bit system call (and function) calling convention for 64-bit
arguments requires the next available odd-pair (two sequential registers
with the first being odd-numbered) from the standard register argument
allocation.

The first argument register is r3, so a 64-bit argument that appears at
an even position in the argument list must skip a register (unless there
were preceding 64-bit arguments, which might throw things off). This
requires non-standard compat definitions to deal with the holes in the
argument register allocation.

With pt_regs syscall wrappers which use a standard mapper to map pt_regs
GPRs to function arguments, 32-bit kernels hit the same basic problem,
the standard definitions don't cope with the unused argument registers.

Fix this by having 32-bit kernels share those syscall definitions with
compat.

Thanks to Jason for spending a lot of time finding and bisecting this
and developing a trivial reproducer. The perfect bug report.

Reported-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Fixes: 7e92e01b72 ("powerpc: Provide syscall wrapper")
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221012035335.866440-1-npiggin@gmail.com
2022-10-13 00:49:58 +11:00
Linus Torvalds
4899a36f91 powerpc updates for 6.1
- Remove our now never-true definitions for pgd_huge() and p4d_leaf().
 
  - Add pte_needs_flush() and huge_pmd_needs_flush() for 64-bit.
 
  - Add support for syscall wrappers.
 
  - Add support for KFENCE on 64-bit.
 
  - Update 64-bit HV KVM to use the new guest state entry/exit accounting API.
 
  - Support execute-only memory when using the Radix MMU (P9 or later).
 
  - Implement CONFIG_PARAVIRT_TIME_ACCOUNTING for pseries guests.
 
  - Updates to our linker script to move more data into read-only sections.
 
  - Allow the VDSO to be randomised on 32-bit.
 
  - Many other small features and fixes.
 
 Thanks to: Andrew Donnellan, Aneesh Kumar K.V, Arnd Bergmann, Athira Rajeev, Christophe
 Leroy, David Hildenbrand, Disha Goel, Fabiano Rosas, Gaosheng Cui, Gustavo A. R. Silva,
 Haren Myneni, Hari Bathini, Jilin Yuan, Joel Stanley, Kajol Jain, Kees Cook, Krzysztof
 Kozlowski, Laurent Dufour, Liang He, Li Huafei, Lukas Bulwahn, Madhavan Srinivasan, Nathan
 Chancellor, Nathan Lynch, Nicholas Miehlbradt, Nicholas Piggin, Pali Rohár, Rohan McLure,
 Russell Currey, Sachin Sant, Segher Boessenkool, Shrikanth Hegde, Tyrel Datwyler, Wolfram
 Sang, ye xingchen, Zheng Yongjun.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmNCpBMTHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgDx3EACCf86iumFF3RyvENtDwoTRgH3H0z2E
 /ZC4LKrtxgaPFJzKUT4F0kLK85Hw5GzMEKK42NIhAB0o5vFwmEzxOtnlHOyEufAm
 EDIZDIfxV2J9Qx/cW2DSojPj/o9O6noXwhw9SBqMwiDWd8gXmNgOUEklAO7aR7Vq
 Ne2N2FLMNthZydCoHR6dAEjfe2ceFXP5cALwzQO+ILDdZQ0UcF2Yq4yw/gEDoCrB
 FH7mmE7UaQQHvYzo85VTZu7XfUys1P7kUcnhVurOg7/07ITnvnQR+itKZXC+bSft
 1K7ULtjd2QiCgxZA/apFc3lO46kqHVFsB3onRQw12/Ku5vfGFfY0L0iK97OgM4s0
 0u4r+J7A+MM5YBJVVjwZ6woYO5CWMHYKBZepxOpcvftPxj1LNkiHsryqKILGISEC
 aIY/lI0hpeNU4QshDMXzSTgeb/VF9O5cGPncTPkOFbXxD4RpVyz8tSngsG1+D8lj
 S6B2h3k4A14rnblLOxP22jcedBlTYQcRQS4vwr0a7+63QTjfSJ12xT3ucIAKU9f7
 65rVSS/igbrfxqHDmrd60WWZBMXeK0Zy7YIG6iYPTxpP31eFpSp9wtDlV7V2+EH2
 F2p+TJY8aTA8UW+2L5gigN3RsBeeEB8zxJkB14ivICM7+XzVu11PxPDqjDZYkfzC
 ueKKvCcHhHAYqQ==
 =TFBA
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-6.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:

 - Remove our now never-true definitions for pgd_huge() and p4d_leaf().

 - Add pte_needs_flush() and huge_pmd_needs_flush() for 64-bit.

 - Add support for syscall wrappers.

 - Add support for KFENCE on 64-bit.

 - Update 64-bit HV KVM to use the new guest state entry/exit accounting
   API.

 - Support execute-only memory when using the Radix MMU (P9 or later).

 - Implement CONFIG_PARAVIRT_TIME_ACCOUNTING for pseries guests.

 - Updates to our linker script to move more data into read-only
   sections.

 - Allow the VDSO to be randomised on 32-bit.

 - Many other small features and fixes.

Thanks to Andrew Donnellan, Aneesh Kumar K.V, Arnd Bergmann, Athira
Rajeev, Christophe Leroy, David Hildenbrand, Disha Goel, Fabiano Rosas,
Gaosheng Cui, Gustavo A. R. Silva, Haren Myneni, Hari Bathini, Jilin
Yuan, Joel Stanley, Kajol Jain, Kees Cook, Krzysztof Kozlowski, Laurent
Dufour, Liang He, Li Huafei, Lukas Bulwahn, Madhavan Srinivasan, Nathan
Chancellor, Nathan Lynch, Nicholas Miehlbradt, Nicholas Piggin, Pali
Rohár, Rohan McLure, Russell Currey, Sachin Sant, Segher Boessenkool,
Shrikanth Hegde, Tyrel Datwyler, Wolfram Sang, ye xingchen, and Zheng
Yongjun.

* tag 'powerpc-6.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (214 commits)
  KVM: PPC: Book3S HV: Fix stack frame regs marker
  powerpc: Don't add __powerpc_ prefix to syscall entry points
  powerpc/64s/interrupt: Fix stack frame regs marker
  powerpc/64: Fix msr_check_and_set/clear MSR[EE] race
  powerpc/64s/interrupt: Change must-hard-mask interrupt check from BUG to WARN
  powerpc/pseries: Add firmware details to the hardware description
  powerpc/powernv: Add opal details to the hardware description
  powerpc: Add device-tree model to the hardware description
  powerpc/64: Add logical PVR to the hardware description
  powerpc: Add PVR & CPU name to hardware description
  powerpc: Add hardware description string
  powerpc/configs: Enable PPC_UV in powernv_defconfig
  powerpc/configs: Update config files for removed/renamed symbols
  powerpc/mm: Fix UBSAN warning reported on hugetlb
  powerpc/mm: Always update max/min_low_pfn in mem_topology_setup()
  powerpc/mm/book3s/hash: Rename flush_tlb_pmd_range
  powerpc: Drops STABS_DEBUG from linker scripts
  powerpc/64s: Remove lost/old comment
  powerpc/64s: Remove old STAB comment
  powerpc: remove orphan systbl_chk.sh
  ...
2022-10-09 14:05:15 -07:00
Michael Ellerman
9474689020 powerpc: Don't add __powerpc_ prefix to syscall entry points
When using syscall wrappers the __SYSCALL_DEFINEx() and related macros
add a "__powerpc_" prefix to all syscall entry points.

So for example sys_mmap becomes __powerpc_sys_mmap.

This risks breaking workflows and tools that expect the old naming
scheme. At a minimum setting a breakpoint on eg. sys_mmap with gdb no
longer works.

There seems to be no compelling reason to add the "__powerpc_" prefix,
other than that it follows what some other arches do (x86, arm64, s390).

But unlike other arches powerpc doesn't always enable syscall wrappers,
so the syscall entry points can change name depending on CONFIG options.

For those reasons drop the "__powerpc_" prefix, reverting to the
existing naming.

Doing so reveals two prototypes in signal.h that have the incorrect type
when syscall wrappers are enabled. There are already prototypes for both
functions in syscalls.h, so drop the ones from signal.h.

Fixes: 7e92e01b72 ("powerpc: Provide syscall wrapper")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221006135940.1223988-1-mpe@ellerman.id.au
2022-10-07 00:59:54 +11:00
Nicholas Piggin
0fa6831811 powerpc/64: Fix msr_check_and_set/clear MSR[EE] race
irq soft-masking means that when Linux irqs are disabled, the MSR[EE]
value can change from 1 to 0 asynchronously: if a masked interrupt of
the PACA_IRQ_MUST_HARD_MASK variety fires while irqs are disabled,
the masked handler will return with MSR[EE]=0.

This means a sequence like mtmsr(mfmsr() | MSR_FP) is racy if it can
be called with local irqs disabled, unless a hard_irq_disable has been
done.

Reported-by: Sachin Sant <sachinp@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221004051157.308999-2-npiggin@gmail.com
2022-10-04 23:16:20 +11:00
Michael Ellerman
41dc056391 powerpc: Add hardware description string
Create a hardware description string, which we will use to record
various details of the hardware platform we are running on.

Print the accumulated description at boot, and use it to set the generic
description which is printed in oopses.

To begin with add ppc_md.name, aka the "machine description".

Example output at boot with the full series applied:

  Linux version 6.0.0-rc2-gcc-11.1.0-00199-g893f9007a5ce-dirty (michael@alpine1-p1) (powerpc64-linux-gcc (GCC) 11.1.0, GNU ld (GNU Binutils) 2.36.1) #844 SMP Thu Sep 29 22:29:53 AEST 2022
  Hardware name: IBM pSeries (emulated by qemu) POWER9 (raw) 0x4e1200 0xf000005 of:SLOF,git-5b4c5a pSeries
  printk: bootconsole [udbg0] enabled

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220930082709.55830-1-mpe@ellerman.id.au
2022-09-30 18:35:52 +10:00
Aneesh Kumar K.V
d368e0c478 powerpc/mm/book3s/hash: Rename flush_tlb_pmd_range
This function does the hash page table update. Hence rename it to
indicate this better to avoid confusion with flush_pmd_tlb_range()

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
[mpe: Drop unnecessary extern]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220907081941.209501-1-aneesh.kumar@linux.ibm.com
2022-09-30 18:35:52 +10:00
Haren Myneni
335e1a9104 powerpc: Ignore DSI error caused by the copy/paste instruction
The data storage interrupt (DSI) error will be generated when the
paste operation is issued on the suspended Nest Accelerator (NX)
window due to NX state changes. The hypervisor expects the
partition to ignore this error during page fault handling.
To differentiate DSI caused by an actual HW configuration or by
the NX window, a new “ibm,pi-features” type value is defined.
Byte 0, bit 3 of pi-attribute-specifier-type is now defined to
indicate this DSI error. If this error is not ignored, the user
space can get SIGBUS when the NX request is issued.

This patch adds changes to read ibm,pi-features property and ignore
DSI error during page fault handling if MMU_FTR_NX_DSI is defined.

Signed-off-by: Haren Myneni <haren@linux.ibm.com>
[mpe: Mention PAPR version in comment]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/b9cd844b85eb8f70459109ce1b14e44c4cc85fa7.camel@linux.ibm.com
2022-09-28 22:52:32 +10:00
Michael Ellerman
19c95df127 powerpc: Reverse stack frame marker on little endian
On little endian the stack frame marker appears reversed when dumping
memory sequentially, as is typical in xmon or gdb, eg:

  c000000004733e40 0000000000000000 0000000000000000  |................|
  c000000004733e50 0000000000000000 0000000000000000  |................|
  c000000004733e60 0000000000000000 0000000000000000  |................|
  c000000004733e70 5347455200000000 0000000000000000  |SGER............|
  c000000004733e80 a700000000000000 708897f7ff7f0000  |........p.......|
  c000000004733e90 0073428fff7f0000 208997f7ff7f0000  |.sB..... .......|
  c000000004733ea0 0100000000000000 ffffffffffffffff  |................|
  c000000004733eb0 0000000000000000 0000000000000000  |................|

To make it easier to recognise, reverse the value on little endian, so
it always appears as "REGS", eg:

  c000000004733e70 5245475300000000 0000000000000000  |REGS............|

Acked-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220927150419.1503001-2-mpe@ellerman.id.au
2022-09-28 22:21:17 +10:00
Michael Ellerman
bbd7170908 powerpc: Make stack frame marker upper case
Now that the stack frame regs marker is only 32-bits it is not as
obvious in memory dumps and easier to miss, eg:

  c000000004733e40 0000000000000000 0000000000000000  |................|
  c000000004733e50 0000000000000000 0000000000000000  |................|
  c000000004733e60 0000000000000000 0000000000000000  |................|
  c000000004733e70 7367657200000000 0000000000000000  |sger............|
  c000000004733e80 a700000000000000 708897f7ff7f0000  |........p.......|
  c000000004733e90 0073428fff7f0000 208997f7ff7f0000  |.sB..... .......|
  c000000004733ea0 0100000000000000 ffffffffffffffff  |................|
  c000000004733eb0 0000000000000000 0000000000000000  |................|

So make it upper case to make it stand out a bit more:

  c000000004733e70 5347455200000000 0000000000000000  |SGER............|

Acked-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220927150419.1503001-1-mpe@ellerman.id.au
2022-09-28 22:21:11 +10:00
Pali Rohár
6bd7ff497b powerpc/udbg: Remove extern function prototypes
'extern' keyword is pointless and deprecated for function prototypes.

Signed-off-by: Pali Rohár <pali@kernel.org>
Suggested-by: Gabriel Paubert <paubert@iram.es>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220822231751.16973-1-pali@kernel.org
2022-09-28 19:22:14 +10:00
Nicholas Piggin
2f5182cffa powerpc/64s: early boot machine check handler
Use the early boot interrupt fixup in the machine check handler to allow
the machine check handler to run before interrupt endian is set up.
Branch to an early boot handler that just does a basic crash, which
allows it to run before ppc_md is set up. MSR[ME] is enabled on the boot
CPU earlier, and the machine check stack is temporarily set to the
middle of the init task stack.

This allows machine checks (e.g., due to invalid data access in real
mode) to print something useful earlier in boot (as soon as udbg is set
up, if CONFIG_PPC_EARLY_DEBUG=y).

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220926055620.2676869-3-npiggin@gmail.com
2022-09-28 19:22:13 +10:00
Nicholas Piggin
3569d84bb2 powerpc/64e: provide an addressing macro for use with TOC in alternate register
The interrupt entry code carefully saves a minimal number of registers,
so in some places the TOC is required, it is loaded into a different
register, so provide a macro that can supply an alternate TOC register.

This continues to use got addressing because TOC-relative results in
"got/toc optimization is not supported" messages by the linker. Having
r2 be one of the saved registers and using that for TOC addressing may
be the best way to avoid that and switch this to TOC addressing.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220926034057.2360083-6-npiggin@gmail.com
2022-09-28 19:22:12 +10:00
Nicholas Piggin
8e93fb33c8 powerpc/64: provide a helper macro to load r2 with the kernel TOC
A later change stops the kernel using r2 and loads it with a poison
value.  Provide a PACATOC loading abstraction which can hide this
detail.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220926034057.2360083-5-npiggin@gmail.com
2022-09-28 19:22:12 +10:00
Nicholas Piggin
754f611774 powerpc/64: switch asm helpers from GOT to TOC relative addressing
There is no need to use GOT addressing within the kernel.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220926034057.2360083-4-npiggin@gmail.com
2022-09-28 19:22:12 +10:00
Nicholas Piggin
17773afdcd powerpc/64: use 32-bit immediate for STACK_FRAME_REGS_MARKER
Using a 32-bit constant for this marker allows it to be loaded with
two ALU instructions, like 32-bit. This avoids a TOC entry and a
TOC load that depends on the r2 value that has just been loaded from
the PACA.

This changes the value for 32-bit as well, so both have the same
value in the low 4 bytes and 64-bit has 0 in the top bytes.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220926034057.2360083-2-npiggin@gmail.com
2022-09-28 19:22:12 +10:00
Nicholas Piggin
f7bff6e775 powerpc/64/interrupt: avoid BUG/WARN recursion in interrupt entry
BUG/WARN are handled with a program interrupt which can turn into an
infinite recursion when there are bugs in interrupt handler entry
(which can be irritated by bugs in other parts of the code).

There is one feeble attempt to avoid this recursion, but it misses
several cases. Make a tidier macro for this and switch most bugs in
the interrupt entry wrapper over to use it.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220926054305.2671436-7-npiggin@gmail.com
2022-09-28 19:22:11 +10:00
Nicholas Piggin
9524f2278f powerpc/64s: Fix irq state management in runlatch functions
When irqs are soft-disabled, MSR[EE] is volatile and can change from
1 to 0 asynchronously (if a PACA_IRQ_MUST_HARD_MASK interrupt hits).
So it can not be used to check hard IRQ enabled status, except to
confirm it is disabled.

ppc64_runlatch_on/off functions use MSR this way to decide whether to
re-enable MSR[EE] after disabling it, which leads to MSR[EE] being
enabled when it shouldn't be (when a PACA_IRQ_MUST_HARD_MASK had
disabled it between reading the MSR and clearing EE).

This has been tolerated in the kernel previously, and it doesn't seem
to cause a problem, but it is unexpected and may trip warnings or cause
other problems as we tighten up this state management. Fix this by only
re-enabling if PACA_IRQ_HARD_DIS is clear.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220926054305.2671436-5-npiggin@gmail.com
2022-09-28 19:22:11 +10:00
Nicholas Piggin
56adbb7a8b powerpc/64/interrupt: Fix false warning in context tracking due to idle state
Commit 171476775d ("context_tracking: Convert state to atomic_t")
added a CONTEXT_IDLE state which can be encountered by interrupts from
kernel mode in the idle thread, causing a false positive warning.

Fixes: 171476775d ("context_tracking: Convert state to atomic_t")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220926054305.2671436-2-npiggin@gmail.com
2022-09-28 19:22:11 +10:00
Nicholas Miehlbradt
a5edf9815d powerpc/64s: Enable KFENCE on book3s64
KFENCE support was added for ppc32 in commit 90cbac0e99
("powerpc: Enable KFENCE for PPC32").
Enable KFENCE on ppc64 architecture with hash and radix MMUs.
It uses the same mechanism as debug pagealloc to
protect/unprotect pages. All KFENCE kunit tests pass on both
MMUs.

KFENCE memory is initially allocated using memblock but is
later marked as SLAB allocated. This necessitates the change
to __pud_free to ensure that the KFENCE pages are freed
appropriately.

Based on previous work by Christophe Leroy and Jordan Niethe.

Signed-off-by: Nicholas Miehlbradt <nicholas@linux.ibm.com>
Reviewed-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220926075726.2846-4-nicholas@linux.ibm.com
2022-09-28 19:22:10 +10:00
Nicholas Piggin
9c7bfc2dc2 powerpc/64s: Make POWER10 and later use pause_short in cpu_relax loops
We want to move away from using SMT priority updates for cpu_relax, and
use a 'wait' instruction which is similar to x86. As well as being a
much better fit for what everybody else uses and tests with, priority
nops are stateful which is nasty (interrupts have to consider they might
be taken at a different priority), and they're expensive to execute,
similar to a mtSPR which can effect other threads in the pipe.

This has shown to give results that are less affected by code alignment
on benchmarks that cause a lot of spin waiting (e.g., rwsem contention
on unixbench filesystem benchmarks) on POWER10.

QEMU TCG only supports this instruction correctly since v7.1, versions
without the fix may cause hangs whne running POWER10 CPUs.

Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Fix checkpatch warnings RE the macros]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220920122259.363092-2-npiggin@gmail.com
2022-09-28 19:22:10 +10:00
Nicholas Piggin
dabeb572ad powerpc: add ISA v3.0 / v3.1 wait opcode macro
The wait instruction encoding changed between ISA v2.07 and ISA v3.0.
In v3.1 the instruction gained a new field.

Update the PPC_WAIT macro to the current encoding. Rename the older
incompatible one with a _v203 suffix as it was introduced in v2.03
(the WC field was introduced in v2.07 but the kernel only uses WC=0).

Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220920122259.363092-1-npiggin@gmail.com
2022-09-28 19:22:10 +10:00
Pali Rohár
b19448fe84 powerpc: Add support for early debugging via Serial 16550 console
Currently powerpc early debugging contains lot of platform specific
options, but does not support standard UART / serial 16550 console.

Later legacy_serial.c code supports registering UART as early debug console
from device tree but it is not early during booting, but rather later after
machine description code finishes.

So for real early debugging via UART is current code unsuitable.

Add support for new early debugging option CONFIG_PPC_EARLY_DEBUG_16550
which enable Serial 16550 console on address defined by new option
CONFIG_PPC_EARLY_DEBUG_16550_PHYSADDR and by stride by option
CONFIG_PPC_EARLY_DEBUG_16550_STRIDE.

With this change it is possible to debug powerpc machine descriptor code.
For example this early debugging code can print on serial console also
"No suitable machine description found" error which is done before
legacy_serial.c code.

Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220822231501.16827-1-pali@kernel.org
2022-09-28 19:22:09 +10:00
Rohan McLure
7e92e01b72 powerpc: Provide syscall wrapper
Implement syscall wrapper as per s390, x86, arm64. When enabled
cause handlers to accept parameters from a stack frame rather than
from user scratch register state. This allows for user registers to be
safely cleared in order to reduce caller influence on speculation
within syscall routine. The wrapper is a macro that emits syscall
handler symbols that call into the target handler, obtaining its
parameters from a struct pt_regs on the stack.

As registers are already saved to the stack prior to calling
system_call_exception, it appears that this function is executed more
efficiently with the new stack-pointer convention than with parameters
passed by registers, avoiding the allocation of a stack frame for this
method. On a 32-bit system, we see >20% performance increases on the
null_syscall microbenchmark, and on a Power 8 the performance gains
amortise the cost of clearing and restoring registers which is
implemented at the end of this series, seeing final result of ~5.6%
performance improvement on null_syscall.

Syscalls are wrapped in this fashion on all platforms except for the
Cell processor as this commit does not provide SPU support. This can be
quickly fixed in a successive patch, but requires spu_sys_callback to
allocate a pt_regs structure to satisfy the wrapped calling convention.

Co-developed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmai.com>
[mpe: Make incompatible with COMPAT to retain clearing of high bits of args]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220921065605.1051927-22-rmclure@linux.ibm.com
2022-09-28 19:22:09 +10:00
Rohan McLure
f8971c627b powerpc: Change system_call_exception calling convention
Change system_call_exception arguments to pass a pointer to a stack
frame container caller state, as well as the original r0, which
determines the number of the syscall. This has been observed to yield
improved performance to passing them by registers, circumventing the
need to allocate a stack frame.

Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Retain clearing of high bits of args for compat tasks]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220921065605.1051927-21-rmclure@linux.ibm.com
2022-09-28 19:22:09 +10:00
Rohan McLure
8640de0dee powerpc: Use common syscall handler type
Cause syscall handlers to be typed as follows when called indirectly
throughout the kernel. This is to allow for better type checking.

typedef long (*syscall_fn)(unsigned long, unsigned long, unsigned long,
                           unsigned long, unsigned long, unsigned long);

Since both 32 and 64-bit abis allow for at least the first six
machine-word length parameters to a function to be passed by registers,
even handlers which admit fewer than six parameters may be viewed as
having the above type.

Coercing syscalls to syscall_fn requires a cast to void* to avoid
-Wcast-function-type.

Fixup comparisons in VDSO to avoid pointer-integer comparison. Introduce
explicit cast on systems with SPUs.

Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220921065605.1051927-19-rmclure@linux.ibm.com
2022-09-28 19:22:09 +10:00
Rohan McLure
8cd1def4b8 powerpc: Include all arch-specific syscall prototypes
Forward declare all syscall handler prototypes where a generic prototype
is not provided in either linux/syscalls.h or linux/compat.h in
asm/syscalls.h. This is required for compile-time type-checking for
syscall handlers, which is implemented later in this series.

32-bit compatibility syscall handlers are expressed in terms of types in
ppc32.h. Expose this header globally.

Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Acked-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Use standard include guard naming for syscalls_32.h]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220921065605.1051927-17-rmclure@linux.ibm.com
2022-09-28 19:22:08 +10:00
Rohan McLure
dec20c50df powerpc: Adopt SYSCALL_DEFINE for arch-specific syscall handlers
Arch-specific implementations of syscall handlers are currently used
over generic implementations for the following reasons:

1. Semantics unique to powerpc
2. Compatibility syscalls require 'argument padding' to comply with
   64-bit argument convention in ELF32 abi.
3. Parameter types or order is different in other architectures.

These syscall handlers have been defined prior to this patch series
without invoking the SYSCALL_DEFINE or COMPAT_SYSCALL_DEFINE macros with
custom input and output types. We remove every such direct definition in
favour of the aforementioned macros.

Also update syscalls.tbl in order to refer to the symbol names generated
by each of these macros. Since ppc64_personality can be called by both
64 bit and 32 bit binaries through compatibility, we must generate both
both compat_sys_ and sys_ symbols for this handler.

As an aside:
A number of architectures including arm and powerpc agree on an
alternative argument order and numbering for most of these arch-specific
handlers. A future patch series may allow for asm/unistd.h to signal
through its defines that a generic implementation of these syscall
handlers with the correct calling convention be emitted, through the
__ARCH_WANT_COMPAT_SYS_... convention.

Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220921065605.1051927-16-rmclure@linux.ibm.com
2022-09-28 19:22:08 +10:00
Rohan McLure
b7fa9ce86d powerpc: Remove direct call to mmap2 syscall handlers
Syscall handlers should not be invoked internally by their symbol names,
as these symbols defined by the architecture-defined SYSCALL_DEFINE
macro. Move the compatibility syscall definition for mmap2 to
syscalls.c, so that all mmap implementations can share a helper function.

Remove 'inline' on static mmap helper.

Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Fix compat_sys_mmap2() prototype and offset handling]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220921065605.1051927-14-rmclure@linux.ibm.com
2022-09-28 19:21:26 +10:00
Rohan McLure
b6b1334c95 powerpc/32: Remove powerpc select specialisation
Syscall #82 has been implemented for 32-bit platforms in a unique way on
powerpc systems. This hack will in effect guess whether the caller is
expecting new select semantics or old select semantics. It does so via a
guess, based off the first parameter. In new select, this parameter
represents the length of a user-memory array of file descriptors, and in
old select this is a pointer to an arguments structure.

The heuristic simply interprets sufficiently large values of its first
parameter as being a call to old select. The following is a discussion
on how this syscall should be handled.


As discussed in this thread, the existence of such a hack suggests that for
whatever powerpc binaries may predate glibc, it is most likely that they
would have taken use of the old select semantics. x86 and arm64 both
implement this syscall with oldselect semantics.

Remove the powerpc implementation, and update syscall.tbl to refer to emit
a reference to sys_old_select and compat_sys_old_select
for 32-bit binaries, in keeping with how other architectures support
syscall #82.

Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/lkml/13737de5-0eb7-e881-9af0-163b0d29a1a0@csgroup.eu/
Link: https://lore.kernel.org/r/20220921065605.1051927-12-rmclure@linux.ibm.com
2022-09-26 23:00:15 +10:00
Rohan McLure
c2e7a19827 powerpc: Use generic fallocate compatibility syscall
The powerpc fallocate compat syscall handler is identical to the
generic implementation provided by commit 59c10c52f5 ("riscv:
compat: syscall: Add compat_sys_call_table implementation"), and as
such can be removed in favour of the generic implementation.

A future patch series will replace more architecture-defined syscall
handlers with generic implementations, dependent on introducing generic
implementations that are compatible with powerpc and arm's parameter
reorderings.

Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220921065605.1051927-11-rmclure@linux.ibm.com
2022-09-26 23:00:15 +10:00
Rohan McLure
016ff72bd2 powerpc: Fix fallocate and fadvise64_64 compat parameter combination
As reported[1] by Arnd, the arch-specific fadvise64_64 and fallocate
compatibility handlers assume parameters are passed with 32-bit
big-endian ABI. This affects the assignment of odd-even parameter pairs
to the high or low words of a 64-bit syscall parameter.

Fix fadvise64_64 fallocate compat handlers to correctly swap upper/lower
32 bits conditioned on endianness.

A future patch will replace the arch-specific compat fallocate with an
asm-generic implementation. This patch is intended for ease of
back-port.

[1]: https://lore.kernel.org/all/be29926f-226e-48dc-871a-e29a54e80583@www.fastmail.com/

Fixes: 57f48b4b74 ("powerpc/compat_sys: swap hi/lo parts of 64-bit syscall args in LE mode")
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220921065605.1051927-9-rmclure@linux.ibm.com
2022-09-26 23:00:15 +10:00
Rohan McLure
9d54a5ce3a powerpc: Add ZEROIZE_GPRS macros for register clears
Provide register zeroing macros, following the same convention as
existing register stack save/restore macros, to be used in later
change to concisely zero a sequence of consecutive gprs.

The resulting macros are called ZEROIZE_GPRS and ZEROIZE_NVGPRS, keeping
with the naming of the accompanying restore and save macros, and usage
of zeroize to describe this operation elsewhere in the kernel.

Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220921065605.1051927-4-rmclure@linux.ibm.com
2022-09-26 23:00:14 +10:00
Rohan McLure
5ba6c9a912 powerpc: Remove asmlinkage from syscall handler definitions
The asmlinkage macro has no special meaning in powerpc, and prior to
this patch is used sporadically on some syscall handler definitions. On
architectures that do not define asmlinkage, it resolves to extern "C"
for C++ compilers and a nop otherwise. The current invocations of
asmlinkage provide far from complete support for C++ toolchains, and so
the macro serves no purpose in powerpc.

Remove all invocations of asmlinkage in arch/powerpc. These incidentally
only occur in syscall definitions and prototypes.

Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220921065605.1051927-2-rmclure@linux.ibm.com
2022-09-26 23:00:14 +10:00
Christophe Leroy
4af8354553 powerpc/irq: Refactor irq_soft_mask_{set,or}_return()
This partialy reapply commit ef5b570d37 ("powerpc/irq: Don't
open code irq_soft_mask helpers") which was reverted by
commit 684c68d92e ("Revert "powerpc/irq: Don't open code
irq_soft_mask helpers"")

irq_soft_mask_set_return() and irq_soft_mask_or_return()
are overset of irq_soft_mask_set().

Have them use irq_soft_mask_set() instead of duplicating it.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/18473da42362ee8f07bce36b9caef8ba77d7633f.1663656054.git.christophe.leroy@csgroup.eu
2022-09-26 23:00:14 +10:00
Christophe Leroy
6556fd1a1e powerpc: Cleanup idle for e500
e500 idle setup is a bit messy.

e500_idle() is used for PPC32 while book3e_idle() is used for PPC64.
As they are mutually exclusive, call them all e500_idle().

Use CONFIG_MPC_85xx instead of PPC32 + E500 in Makefile and rename
idle_e500.c to idle_85xx.c .

Rename idle_book3e.c to idle_64e.c and remove #ifdef PPC64 in
as it's only built on PPC64.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/8039301334e948974c85ec5ef2db37751075185b.1663606876.git.christophe.leroy@csgroup.eu
2022-09-26 23:00:14 +10:00
Christophe Leroy
aa5f59df20 powerpc: Remove CONFIG_PPC_BOOK3E_MMU
CONFIG_PPC_BOOK3E_MMU is redundant with CONFIG_PPC_E500.

Remove it.

Also rename mmu-book3e.h to mmu-e500.h

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/c5549cd59a131204ff94ab909cad2e2dad4ddf2f.1663606876.git.christophe.leroy@csgroup.eu
2022-09-26 23:00:14 +10:00
Christophe Leroy
3e7318584d powerpc: Remove CONFIG_PPC_FSL_BOOK3E
CONFIG_PPC_FSL_BOOK3E is redundant with CONFIG_PPC_E500.

Remove it.

And rename five files accordingly.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Rename include guards to match new file names]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/795cb93b88c9a0279289712e674f39e3b108a1b4.1663606876.git.christophe.leroy@csgroup.eu
2022-09-26 23:00:13 +10:00
Christophe Leroy
688de017ef powerpc: Change CONFIG_E500 to CONFIG_PPC_E500
It will be used outside arch/powerpc, make it clear its a
powerpc configuration item.

And we already have CONFIG_PPC_E500MC, so that will make
it more consistent.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/e63b22083c11c4300f4a82d3123a46e5fdd54fa6.1663606876.git.christophe.leroy@csgroup.eu
2022-09-26 23:00:13 +10:00
Christophe Leroy
e0d68273d7 powerpc: Remove CONFIG_PPC_BOOK3E
CONFIG_PPC_BOOK3E is redundant with CONFIG_PPC_BOOK3E_64.

The later is more explicit about the fact that it's a 64 bits target.

Remove CONFIG_PPC_BOOK3E.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/5d0891490813c19cdcfc04678f512ea68cba3e64.1663606876.git.christophe.leroy@csgroup.eu
2022-09-26 23:00:13 +10:00
Christophe Leroy
dfc3095cec powerpc: Remove CONFIG_FSL_BOOKE
PPC_85xx is PPC32 only.
PPC_85xx always selects E500 and is the only PPC32 that
selects E500.
FSL_BOOKE is selected when E500 and PPC32 are selected.

So FSL_BOOKE is redundant with PPC_85xx.

Remove FSL_BOOKE.

And rename four files accordingly.

cpu_setup_fsl_booke.S is not renamed because it is linked to
PPC_FSL_BOOK3E and not to FSL_BOOKE as suggested by its name.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/08e3e15594e66d63b9e89c5b4f9c35153913c28f.1663606875.git.christophe.leroy@csgroup.eu
2022-09-26 22:47:37 +10:00
Christophe Leroy
76b719881a powerpc/cputable: Move __cpu_setup() prototypes out of cputable.h
Move all prototypes out of cputable.h

For that rename cpu_setup_power.h to cpu_setup.h and move all
prototypes in it.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Standardise cpu_spec *spec formatting]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/f45118489ee450db654db8bbcdfd8f5907337c22.1663606875.git.christophe.leroy@csgroup.eu
2022-09-26 22:26:49 +10:00
Christophe Leroy
6cc07821ad powerpc/mm: Make PAGE_KERNEL_xxx macros grep-friendly
Avoid multi-lines to help getting a complete view when using
grep. They still remain under the 100 chars limit.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/3bc3f5a51949ee7f52dba36677db23d4337c7995.1662544980.git.christophe.leroy@csgroup.eu
2022-09-26 20:58:18 +10:00
Christophe Leroy
b997b2f57c powerpc/mm: Reduce redundancy in pgtable.h
PAGE_KERNEL_TEXT, PAGE_KERNEL_EXEC and PAGE_AGP are the same
for all powerpcs.

Remove duplicated definitions.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/92254499430d13d99e4a4d7e9ad8e8284cb35380.1662544974.git.christophe.leroy@csgroup.eu
2022-09-26 20:58:18 +10:00
Christophe Leroy
73ea68ad0d powerpc/book3s: Inline first level of update_mmu_cache()
update_mmu_cache() voids when hash page tables are not used.
On PPC32 that means when MMU_FTR_HPTE_TABLE is not defined.
On PPC64 that means when RADIX is enabled.

Rename core part of update_mmu_cache() as __update_mmu_cache()
and include the initial verification in an inlined caller.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/bea5ad0de7f83eff256116816d46c84fa0a444de.1662370698.git.christophe.leroy@csgroup.eu
2022-09-26 20:58:18 +10:00
Christophe Leroy
691cdf016d powerpc: Rely on generic definition of hugepd_t and is_hugepd when unused
CONFIG_ARCH_HAS_HUGEPD is used to tell core mm when huge page
directories are used.

When they are not used, no need to provide hugepd_t or is_hugepd(),
just rely on the core mm fallback definition.

For that, change core mm behaviour so that CONFIG_ARCH_HAS_HUGEPD
is used instead of indirect is_hugepd macro existence.

powerpc being the only user of huge page directories, there is no
impact on other architectures.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/da81462d93069bb90fe5e762dd3283a644318937.1662543243.git.christophe.leroy@csgroup.eu
2022-09-26 20:58:18 +10:00
Christophe Leroy
a26494cf4a powerpc/nohash: Remove pgd_huge() stub
linux/hugetlb.h has a fallback pgd_huge() macro for when
pgd_huge is not defined.

Remove the powerpc redundant definitions.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/ae6aa7fce84f7abcbf67f534271a4a6dd7949b0d.1662543243.git.christophe.leroy@csgroup.eu
2022-09-26 20:58:18 +10:00
Michael Ellerman
79c5640ab4 powerpc/mm/64s: Drop p4d_leaf()
Because 64-bit Book3S uses pgtable-nop4d.h, the P4D is folded into the
PGD. So P4D entries are actually PGD entries, or vice versa.

The other way to think of it is that the P4D is a single entry page
table below the PGD. Zero bits of the address are needed to index into
the P4D, therefore a P4D entry maps the same size address space as a PGD
entry.

As explained in the previous commit, there are no huge page sizes
supported directly at the PGD level on 64-bit Book3S, so there are also
no huge page sizes supported at the P4D level.

Therefore p4d_is_leaf() can never be true, so drop the definition and
fallback to the default implementation that always returns false.

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220903123640.719846-2-mpe@ellerman.id.au
2022-09-26 20:58:18 +10:00
Michael Ellerman
51da853e37 powerpc/mm/64s: Drop pgd_huge()
On powerpc there are two ways for huge pages to be represented in the
top level page table, aka PGD (Page Global Directory).

If the address space mapped by an individual PGD entry does not
correspond to a given huge page size, then the PGD entry points to a
non-standard page table, known as a "hugepd" (Huge Page Directory).
The hugepd contains some number of huge page PTEs sufficient to map the
address space with the given huge page size.

On the other hand, if the address space mapped by an individual PGD
entry does correspond exactly to a given huge page size, that PGD entry
is used to directly encode the huge page PTE in place. In this case the
pgd_huge() wrapper indicates to generic code that the PGD entry is
actually a huge page PTE.

This commit deals with the pgd_huge() case only, it does nothing with
respect to the hugepd case.

Over time the size of the virtual address space supported on powerpc has
increased several times, which means the location at which huge pages
can sit in the tree has also changed. There have also been new huge page
sizes added, with the introduction of the Radix MMU.

On Power9 and later with the Radix MMU, the largest huge page size in
any implementation is 1GB.

Since the introduction of Radix, 1GB entries have been supported at the
PUD level, with both 4K and 64K base page size. Radix has never had a
supported huge page size at the PGD level.

On Power8 or earlier, which uses the Hash MMU, or Power9 or later with
the Hash MMU enabled, the largest huge page size is 16GB.

Using the Hash MMU and a base page size of 4K, 16GB has never been a
supported huge page size at the PGD level, due to the geometry being
incompatible. The two supported huge page sizes (16M & 16GB) both use
the hugepd format.

Using the Hash MMU and a base page size of 64K, 16GB pages were
supported in the past at the PGD level.

However in commit ba95b5d035 ("powerpc/mm/book3s/64: Rework page table
geometry for lower memory usage") the page table layout was reworked to
shrink the size of the PGD.

As a result the 16GB page size now fits at the PUD level when using 64K
base page size.

Therefore there are no longer any supported configurations where
pgd_huge() can be true, so drop the definitions for pgd_huge(), and
fallback to the generic definition which is always false.

Fixes: ba95b5d035 ("powerpc/mm/book3s/64: Rework page table geometry for lower memory usage")
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220903123640.719846-1-mpe@ellerman.id.au
2022-09-26 20:58:18 +10:00
Michael Ellerman
b150a4d12b powerpc/vmlinux.lds: Add an explicit symbol for the SRWX boundary
Currently __init_begin is used as the boundary for strict RWX between
executable/read-only text and data, and non-executable (after boot) code
and data.

But that's a little subtle, so add an explicit symbol to document that
the SRWX boundary lies there, and add a comment making it clear that
__init_begin must also begin there.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220916131422.318752-2-mpe@ellerman.id.au
2022-09-26 20:58:16 +10:00
Michael Ellerman
0c32903197 powerpc/64: Remove unused prom_init_toc symbols
Commit 24d33ac5b8 ("powerpc/64s: Make prom_init require RELOCATABLE")
made prom_init depend on CONFIG_RELOCATABLE.

But it missed cleaning up a case in the linker script for RELOCATABLE=n,
and associated symbols. Remove them now.

Fixes: 24d33ac5b8 ("powerpc/64s: Make prom_init require RELOCATABLE")
Reported-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220920131157.1032707-1-mpe@ellerman.id.au
2022-09-26 20:58:16 +10:00
Michael Ellerman
ec5c3a359c Merge branch 'fixes' into next
Merge our fixes branch to bring in a few things that new feature patches
rely on or conflict with.
2022-09-24 00:40:14 +10:00
Gaosheng Cui
3d7a198cfd KVM: PPC: remove orphan declarations from kvm_ppc.h
Remove the following orphan declarations from kvm_ppc.h:

1. kvmppc_mmu_priv_switch() has been removed since
commit dd9ebf1f94 ("KVM: PPC: e500: Add shadow PID
support").
2. kvmppc_core_destroy_mmu() has been removed since
commit ecc0981ff0 ("KVM: ppc: cosmetic changes to mmu
hook names").
3. kvmppc_prepare_vrma() has been removed since
commit aa04b4cc5b ("KVM: PPC: Allocate RMAs (Real Mode
Areas) at boot for use by guests").

So remove the declarations for them from header file.

Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220913075029.682327-10-cuigaosheng1@huawei.com
2022-09-15 22:55:46 +10:00
Gaosheng Cui
b47f0024f9 powerpc/ps3: remove orphan declarations from ps3av.h
Remove the following orphan declarations from ps3av.h:
1. ps3av_dev_open()
2. ps3av_dev_close()

They have been removed since commit 13a5e30cf7 ("[POWERPC] PS3:
Rework AV settings driver"), so remove them.

Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Acked-by: Geoff Levand <geoff@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220913075029.682327-9-cuigaosheng1@huawei.com
2022-09-15 22:55:46 +10:00
Gaosheng Cui
3abed8acfe powerpc/sysdev: remove unused xics_ipi_dispatch() declaration
xics_ipi_dispatch() has been removed since
commit 23d72bfd8f ("powerpc: Consolidate ipi message mux
and demux"), so remove it.

Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220913075029.682327-8-cuigaosheng1@huawei.com
2022-09-15 22:55:46 +10:00
Gaosheng Cui
77d3053581 powerpc/powernv: remove orphan declarations from opal.h
Remove the following orphan declarations from opal.h:
1. opal_notifier_register()
2. opal_notifier_unregister()
3. opal_notifier_update_evt()
4. opal_notifier_enable()
5. opal_notifier_disable()

They have been removed since commit 81f2f7ce4c ("opal:
Remove events notifier"), so remove them.

Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220913075029.682327-7-cuigaosheng1@huawei.com
2022-09-15 22:55:46 +10:00
Gaosheng Cui
d24b8f01fe powerpc/mm: remove orphan declarations from mmu_context.h
Remove the following orphan declarations from mmu_context.h:
1. switch_cop() and drop_cop() have been removed since
commit 6ff4d3e966 ("powerpc: Remove old unused icswx
based coprocessor support").

2. mm_iommu_cleanup() has been removed since
commit 4b6fad7097 ("powerpc/mm/iommu, vfio/spapr: Put
pages on VFIO container shutdown").

So remove the declarations for them from header file.

Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220913075029.682327-6-cuigaosheng1@huawei.com
2022-09-15 22:55:46 +10:00
Gaosheng Cui
b5a472ad81 powerpc: remove unused udbg_init_debug_beat() declaration
udbg_init_debug_beat() has been removed since
commit bf4981a006 ("powerpc: Remove the celleb support"),
so remove it.

Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220913075029.682327-5-cuigaosheng1@huawei.com
2022-09-15 22:55:46 +10:00
Nathan Lynch
f88aabad33 Revert "powerpc/rtas: Implement reentrant rtas call"
At the time this was submitted by Leonardo, I confirmed -- or thought
I had confirmed -- with PowerVM partition firmware development that
the following RTAS functions:

- ibm,get-xive
- ibm,int-off
- ibm,int-on
- ibm,set-xive

were safe to call on multiple CPUs simultaneously, not only with
respect to themselves as indicated by PAPR, but with arbitrary other
RTAS calls:

https://lore.kernel.org/linuxppc-dev/875zcy2v8o.fsf@linux.ibm.com/

Recent discussion with firmware development makes it clear that this
is not true, and that the code in commit b664db8e3f ("powerpc/rtas:
Implement reentrant rtas call") is unsafe, likely explaining several
strange bugs we've seen in internal testing involving DLPAR and
LPM. These scenarios use ibm,configure-connector, whose internal state
can be corrupted by the concurrent use of the "reentrant" functions,
leading to symptoms like endless busy statuses from RTAS.

Fixes: b664db8e3f ("powerpc/rtas: Implement reentrant rtas call")
Cc: stable@vger.kernel.org # v5.8+
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Laurent Dufour <laurent.dufour@fr.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220907220111.223267-1-nathanl@linux.ibm.com
2022-09-14 22:43:13 +10:00
Kefeng Wang
2be9880dc8 kernel: exit: cleanup release_thread()
Only x86 has own release_thread(), introduce a new weak release_thread()
function to clean empty definitions in other ARCHs.

Link: https://lkml.kernel.org/r/20220819014406.32266-1-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: Guo Ren <guoren@kernel.org>				[csky]
Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Brian Cain <bcain@quicinc.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>			[powerpc]
Acked-by: Stafford Horne <shorne@gmail.com>			[openrisc]
Acked-by: Catalin Marinas <catalin.marinas@arm.com>		[arm64]
Acked-by: Huacai Chen <chenhuacai@kernel.org>			[LoongArch]
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Chris Zankel <chris@zankel.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Guo Ren <guoren@kernel.org> [csky]
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Xuerui Wang <kernel@xen0n.name>
Cc: Yoshinori Sato <ysato@users.osdn.me>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-11 21:55:07 -07:00
Al Viro
89bbeb7e31 termios: get rid of non-UAPI asm/termios.h
All non-UAPI asm/termios.h consist of include of UAPI counterpart
and, possibly, include of linux/uaccess.h

	The latter can't be simply removed, even though nothing in
linux/termios.h doesn't depend upon it anymore - there are several
places that rely upon that indirect chain of includes to pull
linux/uaccess.h.  So the include needs to be lifted out of there -
we lift into tty_driver.h, serdev.h and places that pull asm/termios.h,
but none of
	* linux/uaccess.h (obvious)
	* net/sock.h (pulls uaccess.h)
	* linux/{tty,tty_driver,serdev}.h (tty.h pulls tty_driver.h)

That leaves us just with the include of UAPI asm/termios.h, which is
what <asm/termios.h> will resolve to if we simply remove non-UAPI header.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Link: https://lore.kernel.org/r/YxDnKvYCHn/ogBUv@ZenIV
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-09 10:44:35 +02:00
Al Viro
d04f9915fa make generic INIT_C_CC a bit more generic
turn it into an array initializer; then alpha, mips and powerpc
variants fold into it.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Link: https://lore.kernel.org/r/YxDm7M6M91gC2RPL@ZenIV
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-09 10:44:35 +02:00
Al Viro
38fc315a73 termios: consolidate values for VDISCARD in INIT_C_CC
On old systems it used to be ^O.  Linux had never actually used
the value, but INIT_C_CC (on i386) did initialize it to ^O;
unfortunately, it had a typo in the comment claiming that to be
^U.  Most of the architectures copied the (correct) definition
along with mistaken comment.  alpha, powerpc and sparc tried
to make the definition match comment.

However, util-linux still resets it to ^O on any architecture,
^O is the historical value, kernel ignores it anyway and finally,
Linus said "Just change everybody to do the same, nobody cares
about VDISCARD".

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Link: https://lore.kernel.org/r/YxDmy//MKzs3ye7l@ZenIV
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-09 10:44:34 +02:00
Al Viro
c9874d3ffe termios: start unifying non-UAPI parts of asm/termios.h
* new header (linut/termios_internal.h), pulled by the users of those
suckers
* defaults for INIT_C_CC and externs for conversion helpers moved over
there
* remove termios-base.h (empty now)

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Link: https://lore.kernel.org/r/YxDmptU7dNGZ+/Hn@ZenIV
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-09 10:44:34 +02:00
Nicholas Piggin
b11931e9ad powerpc/64s: add pte_needs_flush and huge_pmd_needs_flush
Allow PTE changes to avoid flushing the TLB when access permissions are
being relaxed, the dirty bit is being set, and the accessed bit is being
changed.

Relaxing access permissions and setting dirty and accessed bits do not
require a flush because the MMU will re-load the PTE and notice the
updates (it may also cause a spurious fault).

Clearing the accessed bit does not require a flush because of the
imprecise PTE accessed bit accounting that is already performed, as
documented in ptep_clear_flush_young().

This reduces TLB flushing for some mprotect(2) calls.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Link: https://lore.kernel.org/r/20220901110334.1618913-1-npiggin@gmail.com
2022-09-08 11:11:18 +10:00
Christophe Leroy
78c73c80fd powerpc/math-emu: Inhibit W=1 warnings
When building with W=1 you get:

	arch/powerpc/math-emu/fre.c:6:5: error: no previous prototype for 'fre' [-Werror=missing-prototypes]
	arch/powerpc/math-emu/fsqrt.c:11:1: error: no previous prototype for 'fsqrt' [-Werror=missing-prototypes]
	arch/powerpc/math-emu/fsqrts.c:12:1: error: no previous prototype for 'fsqrts' [-Werror=missing-prototypes]
	arch/powerpc/math-emu/frsqrtes.c:6:5: error: no previous prototype for 'frsqrtes' [-Werror=missing-prototypes]
	arch/powerpc/math-emu/mtfsf.c:10:1: error: no previous prototype for 'mtfsf' [-Werror=missing-prototypes]
	arch/powerpc/math-emu/mtfsfi.c:10:1: error: no previous prototype for 'mtfsfi' [-Werror=missing-prototypes]
	arch/powerpc/math-emu/fabs.c:7:1: error: no previous prototype for 'fabs' [-Werror=missing-prototypes]
	arch/powerpc/math-emu/fadd.c:11:1: error: no previous prototype for 'fadd' [-Werror=missing-prototypes]
	arch/powerpc/math-emu/fadds.c:12:1: error: no previous prototype for 'fadds' [-Werror=missing-prototypes]
	arch/powerpc/math-emu/fcmpo.c:11:1: error: no previous prototype for 'fcmpo' [-Werror=missing-prototypes]
	arch/powerpc/math-emu/fcmpu.c:11:1: error: no previous prototype for 'fcmpu' [-Werror=missing-prototypes]
	arch/powerpc/math-emu/fcmpu.c:14:19: error: variable 'B_c' set but not used [-Werror=unused-but-set-variable]
	arch/powerpc/math-emu/fcmpu.c:13:19: error: variable 'A_c' set but not used [-Werror=unused-but-set-variable]
	arch/powerpc/math-emu/fctiw.c:11:1: error: no previous prototype for 'fctiw' [-Werror=missing-prototypes]
	arch/powerpc/math-emu/fctiwz.c:11:1: error: no previous prototype for 'fctiwz' [-Werror=missing-prototypes]
	arch/powerpc/math-emu/fdiv.c:11:1: error: no previous prototype for 'fdiv' [-Werror=missing-prototypes]
	arch/powerpc/math-emu/fdivs.c:12:1: error: no previous prototype for 'fdivs' [-Werror=missing-prototypes]
	arch/powerpc/math-emu/fmadd.c:11:1: error: no previous prototype for 'fmadd' [-Werror=missing-prototypes]
	arch/powerpc/math-emu/fmadds.c:12:1: error: no previous prototype for 'fmadds' [-Werror=missing-prototypes]
	arch/powerpc/math-emu/fmsub.c:11:1: error: no previous prototype for 'fmsub' [-Werror=missing-prototypes]
	arch/powerpc/math-emu/fmsubs.c:12:1: error: no previous prototype for 'fmsubs' [-Werror=missing-prototypes]
	arch/powerpc/math-emu/fmul.c:11:1: error: no previous prototype for 'fmul' [-Werror=missing-prototypes]
	arch/powerpc/math-emu/fmuls.c:12:1: error: no previous prototype for 'fmuls' [-Werror=missing-prototypes]
	arch/powerpc/math-emu/fnabs.c:7:1: error: no previous prototype for 'fnabs' [-Werror=missing-prototypes]
	arch/powerpc/math-emu/fneg.c:7:1: error: no previous prototype for 'fneg' [-Werror=missing-prototypes]
	arch/powerpc/math-emu/fnmadd.c:11:1: error: no previous prototype for 'fnmadd' [-Werror=missing-prototypes]
	arch/powerpc/math-emu/fnmadds.c:12:1: error: no previous prototype for 'fnmadds' [-Werror=missing-prototypes]
	arch/powerpc/math-emu/fnmsub.c:11:1: error: no previous prototype for 'fnmsub' [-Werror=missing-prototypes]
	arch/powerpc/math-emu/fnmsubs.c:12:1: error: no previous prototype for 'fnmsubs' [-Werror=missing-prototypes]
	arch/powerpc/math-emu/fres.c:7:1: error: no previous prototype for 'fres' [-Werror=missing-prototypes]
	arch/powerpc/math-emu/frsp.c:12:1: error: no previous prototype for 'frsp' [-Werror=missing-prototypes]
	arch/powerpc/math-emu/fsel.c:11:1: error: no previous prototype for 'fsel' [-Werror=missing-prototypes]
	arch/powerpc/math-emu/lfs.c:12:1: error: no previous prototype for 'lfs' [-Werror=missing-prototypes]
	arch/powerpc/math-emu/frsqrte.c:7:1: error: no previous prototype for 'frsqrte' [-Werror=missing-prototypes]
	arch/powerpc/math-emu/fsub.c:11:1: error: no previous prototype for 'fsub' [-Werror=missing-prototypes]
	arch/powerpc/math-emu/fsubs.c:12:1: error: no previous prototype for 'fsubs' [-Werror=missing-prototypes]
	arch/powerpc/math-emu/mcrfs.c:10:1: error: no previous prototype for 'mcrfs' [-Werror=missing-prototypes]
	arch/powerpc/math-emu/mffs.c:10:1: error: no previous prototype for 'mffs' [-Werror=missing-prototypes]
	arch/powerpc/math-emu/mtfsb0.c:10:1: error: no previous prototype for 'mtfsb0' [-Werror=missing-prototypes]
	arch/powerpc/math-emu/mtfsb1.c:10:1: error: no previous prototype for 'mtfsb1' [-Werror=missing-prototypes]
	arch/powerpc/math-emu/stfiwx.c:7:1: error: no previous prototype for 'stfiwx' [-Werror=missing-prototypes]
	arch/powerpc/math-emu/stfs.c:12:1: error: no previous prototype for 'stfs' [-Werror=missing-prototypes]
	arch/powerpc/math-emu/fmr.c:7:1: error: no previous prototype for 'fmr' [-Werror=missing-prototypes]
	arch/powerpc/math-emu/lfd.c:10:1: error: no previous prototype for 'lfd' [-Werror=missing-prototypes]
	arch/powerpc/math-emu/stfd.c:7:1: error: no previous prototype for 'stfd' [-Werror=missing-prototypes]
	arch/powerpc/math-emu/math_efp.c:177:5: error: no previous prototype for 'do_spe_mathemu' [-Werror=missing-prototypes]
	arch/powerpc/math-emu/math_efp.c:726:5: error: no previous prototype for 'speround_handler' [-Werror=missing-prototypes]
	arch/powerpc/math-emu/math_efp.c:893:12: error: no previous prototype for 'spe_mathemu_init' [-Werror=missing-prototypes]

Fix the warnings in math_efp.c by adding prototypes of do_spe_mathemu()
and speround_handler() to asm/processor.h and declare spe_mathemu_init()
static.

The other warnings are benign and not worth the churn of fixing them,
expecially the 'unused-but-set-variable' which would impact the core
part of 'math-emu'.

So silence them by adding -Wno-missing-prototypes -Wno-unused-but-set-variable.

But then you get:

	arch/powerpc/math-emu/fre.c:6:5: error: no previous declaration for 'fre' [-Werror=missing-declarations]
	arch/powerpc/math-emu/fsqrt.c:11:1: error: no previous declaration for 'fsqrt' [-Werror=missing-declarations]
	arch/powerpc/math-emu/fsqrts.c:12:1: error: no previous declaration for 'fsqrts' [-Werror=missing-declarations]
	arch/powerpc/math-emu/frsqrtes.c:6:5: error: no previous declaration for 'frsqrtes' [-Werror=missing-declarations]
	arch/powerpc/math-emu/mtfsf.c:10:1: error: no previous declaration for 'mtfsf' [-Werror=missing-declarations]
	arch/powerpc/math-emu/mtfsfi.c:10:1: error: no previous declaration for 'mtfsfi' [-Werror=missing-declarations]
	arch/powerpc/math-emu/fabs.c:7:1: error: no previous declaration for 'fabs' [-Werror=missing-declarations]
	arch/powerpc/math-emu/fadd.c:11:1: error: no previous declaration for 'fadd' [-Werror=missing-declarations]
	arch/powerpc/math-emu/fadds.c:12:1: error: no previous declaration for 'fadds' [-Werror=missing-declarations]
	arch/powerpc/math-emu/fcmpo.c:11:1: error: no previous declaration for 'fcmpo' [-Werror=missing-declarations]
	arch/powerpc/math-emu/fcmpu.c:11:1: error: no previous declaration for 'fcmpu' [-Werror=missing-declarations]
	arch/powerpc/math-emu/fctiw.c:11:1: error: no previous declaration for 'fctiw' [-Werror=missing-declarations]
	arch/powerpc/math-emu/fctiwz.c:11:1: error: no previous declaration for 'fctiwz' [-Werror=missing-declarations]
	arch/powerpc/math-emu/fdiv.c:11:1: error: no previous declaration for 'fdiv' [-Werror=missing-declarations]
	arch/powerpc/math-emu/fdivs.c:12:1: error: no previous declaration for 'fdivs' [-Werror=missing-declarations]
	arch/powerpc/math-emu/fmadd.c:11:1: error: no previous declaration for 'fmadd' [-Werror=missing-declarations]
	arch/powerpc/math-emu/fmadds.c:12:1: error: no previous declaration for 'fmadds' [-Werror=missing-declarations]
	arch/powerpc/math-emu/fmsub.c:11:1: error: no previous declaration for 'fmsub' [-Werror=missing-declarations]
	arch/powerpc/math-emu/fmsubs.c:12:1: error: no previous declaration for 'fmsubs' [-Werror=missing-declarations]
	arch/powerpc/math-emu/fmul.c:11:1: error: no previous declaration for 'fmul' [-Werror=missing-declarations]
	arch/powerpc/math-emu/fmuls.c:12:1: error: no previous declaration for 'fmuls' [-Werror=missing-declarations]
	arch/powerpc/math-emu/fnabs.c:7:1: error: no previous declaration for 'fnabs' [-Werror=missing-declarations]
	arch/powerpc/math-emu/fneg.c:7:1: error: no previous declaration for 'fneg' [-Werror=missing-declarations]
	arch/powerpc/math-emu/fnmadd.c:11:1: error: no previous declaration for 'fnmadd' [-Werror=missing-declarations]
	arch/powerpc/math-emu/fnmadds.c:12:1: error: no previous declaration for 'fnmadds' [-Werror=missing-declarations]
	arch/powerpc/math-emu/fnmsub.c:11:1: error: no previous declaration for 'fnmsub' [-Werror=missing-declarations]
	arch/powerpc/math-emu/fnmsubs.c:12:1: error: no previous declaration for 'fnmsubs' [-Werror=missing-declarations]
	arch/powerpc/math-emu/fres.c:7:1: error: no previous declaration for 'fres' [-Werror=missing-declarations]
	arch/powerpc/math-emu/frsp.c:12:1: error: no previous declaration for 'frsp' [-Werror=missing-declarations]
	arch/powerpc/math-emu/fsel.c:11:1: error: no previous declaration for 'fsel' [-Werror=missing-declarations]
	arch/powerpc/math-emu/lfs.c:12:1: error: no previous declaration for 'lfs' [-Werror=missing-declarations]
	arch/powerpc/math-emu/frsqrte.c:7:1: error: no previous declaration for 'frsqrte' [-Werror=missing-declarations]
	arch/powerpc/math-emu/fsub.c:11:1: error: no previous declaration for 'fsub' [-Werror=missing-declarations]
	arch/powerpc/math-emu/fsubs.c:12:1: error: no previous declaration for 'fsubs' [-Werror=missing-declarations]
	arch/powerpc/math-emu/mcrfs.c:10:1: error: no previous declaration for 'mcrfs' [-Werror=missing-declarations]
	arch/powerpc/math-emu/mffs.c:10:1: error: no previous declaration for 'mffs' [-Werror=missing-declarations]
	arch/powerpc/math-emu/mtfsb0.c:10:1: error: no previous declaration for 'mtfsb0' [-Werror=missing-declarations]
	arch/powerpc/math-emu/mtfsb1.c:10:1: error: no previous declaration for 'mtfsb1' [-Werror=missing-declarations]
	arch/powerpc/math-emu/stfiwx.c:7:1: error: no previous declaration for 'stfiwx' [-Werror=missing-declarations]
	arch/powerpc/math-emu/stfs.c:12:1: error: no previous declaration for 'stfs' [-Werror=missing-declarations]
	arch/powerpc/math-emu/fmr.c:7:1: error: no previous declaration for 'fmr' [-Werror=missing-declarations]
	arch/powerpc/math-emu/lfd.c:10:1: error: no previous declaration for 'lfd' [-Werror=missing-declarations]
	arch/powerpc/math-emu/stfd.c:7:1: error: no previous declaration for 'stfd' [-Werror=missing-declarations]

So also add -Wno-missing-declarations.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/688084b40b5ac88f2905cb207d5dad947d8d34dc.1662531153.git.christophe.leroy@csgroup.eu
2022-09-08 11:11:18 +10:00
Nicholas Piggin
6ba5aa541a powerpc/pseries: Move dtl scanning and steal time accounting to pseries platform
dtl is the PAPR Dispatch Trace Log, which is entirely a pseries feature.
The pseries platform alrady has a file dealing with the dtl, so move
scanning for stolen time accounting there from kernel/time.c.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220902085316.2071519-5-npiggin@gmail.com
2022-09-05 14:14:27 +10:00
Nicholas Piggin
0e8a631328 powerpc/pseries: Implement CONFIG_PARAVIRT_TIME_ACCOUNTING
CONFIG_VIRT_CPU_ACCOUNTING_GEN under pseries does not provide stolen
time accounting unless CONFIG_PARAVIRT_TIME_ACCOUNTING is enabled.
Implement this using the VPA accumulated wait counters.

Note this will not work on current KVM hosts because KVM does not
implement the VPA dispatch counters (yet). It could be implemented
with the dispatch trace log as it is for VIRT_CPU_ACCOUNTING_NATIVE,
but that is not necessary for the more limited accounting provided
by PARAVIRT_TIME_ACCOUNTING, and it is more expensive, complex, and
has downsides like potential log wrap.

From Shrikanth:

  [...] it was tested on Power10 [PowerVM] Shared LPAR. system has two
  LPAR. we will call first one LPAR1 and second one as LPAR2. Test was
  carried out in SMT=1. Similar observation was seen in SMT=8 as well.

  LPAR config header from each LPAR is below. LPAR1 is twice as big as
  LPAR2. Since Both are sharing the same underlying hardware, work
  stealing will happen when both the LPAR's are contending for the same
  resource.

  LPAR1:
  type=Shared mode=Uncapped smt=Off lcpu=40 cpus=40 ent=20.00
  LPAR2:
  type=Shared mode=Uncapped smt=Off lcpu=20 cpus=40 ent=10.00

  mpstat was used to check for the utilization. stress-ng has been used
  as the workload. Few cases are tested. when the both LPAR are idle
  there is no steal time. when LPAR1 starts running at 100% which
  consumes all of the physical resource, steal time starts to get
  accounted.  With LPAR1 running at 100% and LPAR2 starts running, steal
  time starts increasing. This is as expected. When the LPAR2 Load is
  increased further, steal time increases further.

  Case 1: 0% LPAR1; 0% LPAR2
   %usr  %nice   %sys %iowait  %irq  %soft %steal %guest %gnice  %idle
   0.00   0.00   0.05   0.00   0.00   0.00   0.00   0.00   0.00  99.95

  Case 2: 100% LPAR1; 0% LPAR2
   %usr  %nice   %sys %iowait  %irq  %soft %steal %guest %gnice  %idle
  97.68   0.00   0.00   0.00   0.00   0.00   2.32   0.00   0.00   0.00

  Case 3: 100% LPAR1; 50% LPAR2
   %usr  %nice   %sys %iowait  %irq  %soft %steal %guest %gnice  %idle
  86.34   0.00   0.10   0.00   0.00   0.03  13.54   0.00   0.00   0.00

  Case 4: 100% LPAR1; 100% LPAR2
   %usr  %nice   %sys %iowait  %irq  %soft %steal %guest %gnice  %idle
  78.54   0.00   0.07   0.00   0.00   0.02  21.36   0.00   0.00   0.00

  Case 5: 50% LPAR1; 100% LPAR2
   %usr  %nice   %sys %iowait  %irq  %soft %steal %guest %gnice  %idle
  49.37   0.00   0.00   0.00   0.00   0.00   1.17   0.00   0.00  49.47

  Patch is accounting for the steal time and basic tests are holding
  good.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com>
[mpe: Add SPDX tag to new paravirt_api_clock.h]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220902085316.2071519-3-npiggin@gmail.com
2022-09-05 14:14:02 +10:00
Nicholas Piggin
a8933c8d55 powerpc/pseries: Add wait interval counter definitions to struct lppaca
The hypervisor exposes accumulated partition scheduling interval times
in the VPA (lppaca). These can be used to implement a simple stolen time
in the guest without complex and costly dtl scanning.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Link: https://lore.kernel.org/r/20220902085316.2071519-2-npiggin@gmail.com
2022-09-04 22:39:59 +10:00
Michael Ellerman
684c68d92e Revert "powerpc/irq: Don't open code irq_soft_mask helpers"
This reverts commit ef5b570d37.

Zhouyi reported that commit is causing crashes when running rcutorture
with KASAN enabled:

  BUG: using smp_processor_id() in preemptible [00000000] code: rcu_torture_rea/100
  caller is rcu_preempt_deferred_qs_irqrestore+0x74/0xed0
  CPU: 4 PID: 100 Comm: rcu_torture_rea Tainted: G        W          5.19.0-rc5-next-20220708-dirty #253
  Call Trace:
    dump_stack_lvl+0xbc/0x108 (unreliable)
    check_preemption_disabled+0x154/0x160
    rcu_preempt_deferred_qs_irqrestore+0x74/0xed0
    __rcu_read_unlock+0x290/0x3b0
    rcu_torture_read_unlock+0x30/0xb0
    rcutorture_one_extend+0x198/0x810
    rcu_torture_one_read+0x58c/0xc90
    rcu_torture_reader+0x12c/0x360
    kthread+0x1e8/0x220
    ret_from_kernel_thread+0x5c/0x64

KASAN will generate instrumentation instructions around the
WRITE_ONCE(local_paca->irq_soft_mask, mask):

   0xc000000000295cb0 <+0>:	addis   r2,r12,774
   0xc000000000295cb4 <+4>:	addi    r2,r2,16464
   0xc000000000295cb8 <+8>:	mflr    r0
   0xc000000000295cbc <+12>:	bl      0xc00000000008bb4c <mcount>
   0xc000000000295cc0 <+16>:	mflr    r0
   0xc000000000295cc4 <+20>:	std     r31,-8(r1)
   0xc000000000295cc8 <+24>:	addi    r3,r13,2354
   0xc000000000295ccc <+28>:	mr      r31,r13
   0xc000000000295cd0 <+32>:	std     r0,16(r1)
   0xc000000000295cd4 <+36>:	stdu    r1,-48(r1)
   0xc000000000295cd8 <+40>:	bl      0xc000000000609b98 <__asan_store1+8>
   0xc000000000295cdc <+44>:	nop
   0xc000000000295ce0 <+48>:	li      r9,1
   0xc000000000295ce4 <+52>:	stb     r9,2354(r31)
   0xc000000000295ce8 <+56>:	addi    r1,r1,48
   0xc000000000295cec <+60>:	ld      r0,16(r1)
   0xc000000000295cf0 <+64>:	ld      r31,-8(r1)
   0xc000000000295cf4 <+68>:	mtlr    r0

If there is a context switch before "stb     r9,2354(r31)", r31 may
not equal to r13, in such case, irq soft mask will not work.

The usual solution of marking the code ineligible for instrumentation
forces the code out-of-line, which we would prefer to avoid. Christophe
proposed a partial revert, but Nick raised some concerns with that. So
for now do a full revert.

Reported-by: Zhouyi Zhou <zhouzhouyi@gmail.com>
[mpe: Construct change log based on Zhouyi's original report]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220831131052.42250-1-mpe@ellerman.id.au
2022-09-02 18:54:53 +10:00
Christophe Leroy
814816d71e powerpc: Fix hard_irq_disable() with sanitizer
As reported by Zhouyi Zhou, WRITE_ONCE() is not atomic
as expected when KASAN or KCSAN are compiled in.

Fix it by re-implementing it using inline assembly.

Fixes: 077fc62b2b ("powerpc/irq: remove inline assembly in hard_irq_disable macro")
Reported-by: Zhouyi Zhou <zhouzhouyi@gmail.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/a8298991b3df049a54ee8e558838e34265812014.1661272586.git.christophe.leroy@csgroup.eu
2022-08-31 23:05:27 +10:00
Russell Currey
395cac7752 powerpc/mm: Support execute-only memory on the Radix MMU
Add support for execute-only memory (XOM) for the Radix MMU by using an
execute-only mapping, as opposed to the RX mapping used by powerpc's
other MMUs.

The Hash MMU already supports XOM through the execute-only pkey,
which is a separate mechanism shared with x86.  A PROT_EXEC-only mapping
will map to RX, and then the pkey will be applied on top of it.

mmap() and mprotect() consumers in userspace should observe the same
behaviour on Hash and Radix despite the differences in implementation.

Replacing the vma_is_accessible() check in access_error() with a read
check should be functionally equivalent for non-Radix MMUs, since it
follows write and execute checks.  For Radix, the change enables
detecting faults on execute-only mappings where vma_is_accessible() would
return true.

Signed-off-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220817050640.406017-1-ruscur@russell.cc
2022-08-26 11:02:21 +10:00
Christophe Leroy
e38cd72c17 powerpc: Remove stale declarations in mmu_decl.h
rtas_size and rtas_data are not used anymore since at least
commit 7c8c6b9776 ("powerpc: Merge lmb.c and make MM initialization
use it.")

Remove them.

Since commit 4b74a35fc7 ("powerpc/32s: Make Hash var static")
the forward declaration of struct hash_pte is unneeded.

Remove it.

__initial_memory_limit_addr was removed by commit e63075a3c9 ("memblock:
Introduce default allocation limit and use it to replace explicit ones")

Remove the declaration.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/a821e8397dd56b8177ecc04966d3b3a7c4bda6d4.1660919016.git.christophe.leroy@csgroup.eu
2022-08-26 11:02:20 +10:00
Michael Ellerman
eb316ae798 powerpc: Move patch sites out of asm-prototypes.h
The definitions for the patch sites etc. don't belong in
asm-prototypes.h, they are not EXPORT'ed asm symbols.

Move them into sections.h which is traditionally used for asm symbols.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220818050659.187181-1-mpe@ellerman.id.au
2022-08-26 11:02:20 +10:00
Michael Ellerman
310d1344e3 Revert "powerpc: Remove unused FW_FEATURE_NATIVE references"
This reverts commit 79b74a6848.

It broke booting on IBM Cell machines when the kernel is also built with
CONFIG_PPC_PS3=y.

That's because FW_FEATURE_NATIVE_ALWAYS = 0 does have an important
effect, which is to clear the PS3 ALWAYS features from
FW_FEATURE_ALWAYS.

Note that CONFIG_PPC_NATIVE has since been renamed
CONFIG_PPC_HASH_MMU_NATIVE.

Fixes: 79b74a6848 ("powerpc: Remove unused FW_FEATURE_NATIVE references")
Cc: stable@vger.kernel.org # v5.17+
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220823115952.1203106-1-mpe@ellerman.id.au
2022-08-26 08:41:54 +10:00
Christophe Leroy
8a8f786666 powerpc/vdso: Don't map VDSO at a fixed address on PPC32
PPC64 removed default mapping address from VDSO in
commit 30d0b36828 ("powerpc: Move 64bit VDSO to improve context
switch performance").

Do like PPC64 and let get_unmapped_area() place the VDSO mapping
at the address it wants, don't force a default address.

This allows randomisation of VDSO address.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/cba76f5a5b01fcc49415e632d92c11c1c5998cab.1660843877.git.christophe.leroy@csgroup.eu
2022-08-22 13:36:59 +10:00
Chao Peng
20ec3ebd70 KVM: Rename mmu_notifier_* to mmu_invalidate_*
The motivation of this renaming is to make these variables and related
helper functions less mmu_notifier bound and can also be used for non
mmu_notifier based page invalidation. mmu_invalidate_* was chosen to
better describe the purpose of 'invalidating' a page that those
variables are used for.

  - mmu_notifier_seq/range_start/range_end are renamed to
    mmu_invalidate_seq/range_start/range_end.

  - mmu_notifier_retry{_hva} helper functions are renamed to
    mmu_invalidate_retry{_hva}.

  - mmu_notifier_count is renamed to mmu_invalidate_in_progress to
    avoid confusion with mn_active_invalidate_count.

  - While here, also update kvm_inc/dec_notifier_count() to
    kvm_mmu_invalidate_begin/end() to match the change for
    mmu_notifier_count.

No functional change intended.

Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Message-Id: <20220816125322.1110439-3-chao.p.peng@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-08-19 04:05:41 -04:00
Linus Torvalds
d785610f05 powerpc fixes for 6.0 #2
- Ensure we never emit lwarx with EH=1 on 32-bit, because some 32-bit CPUs trap on it
    rather than ignoring it as they should.
 
  - Fix ftrace when building with clang, which was broken by some refactoring.
 
  - A couple of other minor fixes.
 
 Thanks to: Christophe Leroy, Naveen N. Rao, Nick Desaulniers, Ondrej Mosnacek, Pali Rohár,
 Russell Currey Segher Boessenkool.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmL4KksTHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgEdED/4/YGF3az2N9a0cHt+qnuO2gMGuRdZ2
 COV1d2+2+37nCk2sqrPX+ASBYK0F6vXTivOCIYIItkWD1vYktHC4LrG+slUltb1S
 ZHttp+IYLC5S24v89ViKSQuQ1MNYWU86ii9Wq/mKilsDCrGWpAHpnAv/wip2Epzv
 Xk/9XWE8MCEBeVwJIXBKgbvOC0+pmftyrNtuHDW+AmST+km54ch1OpomAYHZOnsr
 ix1FdJwr/HeZn8EwFYyUePcFnUjNdTa5/Ty4BYn1VPKzEmmAae9UEPM61fmxlOhr
 JiYG0S8/QgZDcjkSsFZGNWmsOPACrYMMCnHQi3VAHRndnXjX6gTv3mPyLBblTbCf
 C3o5xYDqfY0rgXZSCM4NT0IutZ+5yv7q2p/jYn55NR6OpB/D/94pOJdZr6a1H4Cz
 e5Pr0bN2rLlaHTofqscDoWMCjAcyyymQclB/dqrBCvDViiJj3efCS4wrFCoSNnlG
 a0xEYecxprtL+wFD+E5LH8A10Ll1y5zNfh3rXzIMWtX5iWT0Rjx040SAFDG2a/cG
 A3IBKqT85dPGgxQEEV2uP6GGIjl+7Fd6nX+FVgl5AkFo1irdndng7G4SlJmBI3or
 nM5KqJ1nizjkDuCrYboOoZEdQKULHMv6lWS3hbgXS+rJkf4K75wuIC37V1SRcNmy
 VjNvssP3rVX63A==
 =SpYx
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-6.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

 - Ensure we never emit lwarx with EH=1 on 32-bit, because some 32-bit
   CPUs trap on it rather than ignoring it as they should.

 - Fix ftrace when building with clang, which was broken by some
   refactoring.

 - A couple of other minor fixes.

Thanks to Christophe Leroy, Naveen N.  Rao, Nick Desaulniers, Ondrej
Mosnacek, Pali Rohár, Russell Currey, and Segher Boessenkool.

* tag 'powerpc-6.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/kexec: Fix build failure from uninitialised variable
  powerpc/ppc-opcode: Fix PPC_RAW_TW()
  powerpc64/ftrace: Fix ftrace for clang builds
  powerpc: Make eh value more explicit when using lwarx
  powerpc: Don't hide eh field of lwarx behind a macro
  powerpc: Fix eh field when calling lwarx on PPC32
2022-08-14 08:48:13 -07:00
Christophe Leroy
59bab33a4f powerpc/ppc-opcode: Fix PPC_RAW_TW()
PPC_RAW_TW() is erroneously defined with base code 0x7f000008
instead of 0x7c000008.

That's invisible because its only user is PPC_RAW_TRAP() which is
0x7fe00008, but fix it anyway to avoid any risk of future bug.

Fixes: d00d762daf ("powerpc/ppc-opcode: Define and use PPC_RAW_TRAP() and PPC_RAW_TW()")
Reported-by: Naveen N. Rao <naveen.n.rao@linux.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/eca9251f1e1f82c4c46ec6380ddb28356ab3fdfe.1659527244.git.christophe.leroy@csgroup.eu
2022-08-10 15:39:37 +10:00
Christophe Leroy
5cccf7a521 powerpc: Make eh value more explicit when using lwarx
Just like the first patch of this series, define a local 'eh' in order
to make the code clearer.

And IS_ENABLED() returns either 1 or 0 so no need to do
IS_ENABLED(CONFIG_PPC64) ? 1 : 0.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Use symbolic names, use 'n' constraint per Segher]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/629befaa2d05e2922346e58a383886510d6af55a.1659430931.git.christophe.leroy@csgroup.eu
2022-08-10 15:32:20 +10:00
Christophe Leroy
eb5a33ea31 powerpc: Don't hide eh field of lwarx behind a macro
The eh field must remain 0 for PPC32 and is only used
by PPC64.

Don't hide that behind a macro, just leave the responsibility
to the user.

At the time being, the only users of PPC_RAW_L{WDQ}ARX are
setting the eh field to 0, so the special handling of __PPC_EH
is useless. Just take the value given by the caller.

Same for DEFINE_TESTOP(), don't do special handling in that
macro, ensure the caller hands over the proper eh value.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Use 'n' constraint per Segher]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/8b9c8a1a14f9143552a85fcbf96698224a8c2469.1659430931.git.christophe.leroy@csgroup.eu
2022-08-10 15:32:02 +10:00
Christophe Leroy
18db466a9a powerpc: Fix eh field when calling lwarx on PPC32
Commit 9401f4e46c ("powerpc: Use lwarx/ldarx directly instead of
PPC_LWARX/LDARX macros") properly handled the eh field of lwarx
in asm/bitops.h but failed to clear it for PPC32 in
asm/simple_spinlock.h

So, do as in arch_atomic_try_cmpxchg_lock(), set it to 1 if PPC64
but set it to 0 if PPC32. For that use IS_ENABLED(CONFIG_PPC64) which
returns 1 when CONFIG_PPC64 is set and 0 otherwise.

Fixes: 9401f4e46c ("powerpc: Use lwarx/ldarx directly instead of PPC_LWARX/LDARX macros")
Cc: stable@vger.kernel.org # v5.15+
Reported-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Tested-by: Pali Rohár <pali@kernel.org>
Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org>
[mpe: Use symbolic names, use 'n' constraint per Segher]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/a1176e19e627dd6a1b8d24c6c457a8ab874b7d12.1659430931.git.christophe.leroy@csgroup.eu
2022-08-10 15:27:44 +10:00
Linus Torvalds
4e23eeebb2 Bitmap patches for v6.0-rc1
This branch consists of:
 
 Qu Wenruo:
 lib: bitmap: fix the duplicated comments on bitmap_to_arr64()
 https://lore.kernel.org/lkml/0d85e1dbad52ad7fb5787c4432bdb36cbd24f632.1656063005.git.wqu@suse.com/
 
 Alexander Lobakin:
 bitops: let optimize out non-atomic bitops on compile-time constants
 https://lore.kernel.org/lkml/20220624121313.2382500-1-alexandr.lobakin@intel.com/T/
 
 Yury Norov:
 lib: cleanup bitmap-related headers
 https://lore.kernel.org/linux-arm-kernel/YtCVeOGLiQ4gNPSf@yury-laptop/T/#m305522194c4d38edfdaffa71fcaaf2e2ca00a961
 
 Alexander Lobakin:
 x86/olpc: fix 'logical not is only applied to the left hand side'
 https://www.spinics.net/lists/kernel/msg4440064.html
 
 Yury Norov:
 lib/nodemask: inline wrappers around bitmap
 https://lore.kernel.org/all/20220723214537.2054208-1-yury.norov@gmail.com/
 -----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEEi8GdvG6xMhdgpu/4sUSA/TofvsgFAmLpVvwACgkQsUSA/Tof
 vsiAHgwAwS9pl8GJ+fKYnue2CYo9349d2oT6BBUs/Rv8uqYEa4QkpYsR7NS733TG
 pos0hhoRvSOzrUP4qppXUjfJ+NkzLgpnKFOeWfFoNAKlHuaaMRvF3Y0Q/P8g0/Kg
 HPWcCQLHyCH9Wjs3e2TTgRjxTrHuruD2VJ401/PX/lw0DicUhmev5mUFa10uwFkP
 ZJRprjoFn9HJ0Hk16pFZDi36d3YumhACOcWRiJdoBDrEPV3S6lm9EeOy/yHBNp5k
 9bKj+RboeT2t70KaZcKv+M5j1nu0cAhl7kRkjcxcmGyimI0l82Vgq9yFxhGqvWg8
 RnCrJ5EaO08FGCAKG9GEwzdiNa24Gdq5XZSpQA7JZHmhmchpnnlNenJicyv0gOQi
 abChZeWSEsyA+78l2+kk9nezfVKUOnKDEZQxBVTOyWsmZYxHZV94oam340VjQDaY
 4/fETdOy/qqPIxnpxAeFGWxZjcVaYiYPLj7KLPMsB0aAAF7pZrem465vSfgbrE81
 +gCdqrWd
 =4dTW
 -----END PGP SIGNATURE-----

Merge tag 'bitmap-6.0-rc1' of https://github.com/norov/linux

Pull bitmap updates from Yury Norov:

 - fix the duplicated comments on bitmap_to_arr64() (Qu Wenruo)

 - optimize out non-atomic bitops on compile-time constants (Alexander
   Lobakin)

 - cleanup bitmap-related headers (Yury Norov)

 - x86/olpc: fix 'logical not is only applied to the left hand side'
   (Alexander Lobakin)

 - lib/nodemask: inline wrappers around bitmap (Yury Norov)

* tag 'bitmap-6.0-rc1' of https://github.com/norov/linux: (26 commits)
  lib/nodemask: inline next_node_in() and node_random()
  powerpc: drop dependency on <asm/machdep.h> in archrandom.h
  x86/olpc: fix 'logical not is only applied to the left hand side'
  lib/cpumask: move some one-line wrappers to header file
  headers/deps: mm: align MANITAINERS and Docs with new gfp.h structure
  headers/deps: mm: Split <linux/gfp_types.h> out of <linux/gfp.h>
  headers/deps: mm: Optimize <linux/gfp.h> header dependencies
  lib/cpumask: move trivial wrappers around find_bit to the header
  lib/cpumask: change return types to unsigned where appropriate
  cpumask: change return types to bool where appropriate
  lib/bitmap: change type of bitmap_weight to unsigned long
  lib/bitmap: change return types to bool where appropriate
  arm: align find_bit declarations with generic kernel
  iommu/vt-d: avoid invalid memory access via node_online(NUMA_NO_NODE)
  lib/test_bitmap: test the tail after bitmap_to_arr64()
  lib/bitmap: fix off-by-one in bitmap_to_arr64()
  lib: test_bitmap: add compile-time optimization/evaluations assertions
  bitmap: don't assume compiler evaluates small mem*() builtins calls
  net/ice: fix initializing the bitmap in the switch code
  bitops: let optimize out non-atomic bitops on compile-time constants
  ...
2022-08-07 17:52:35 -07:00
Linus Torvalds
cae4199f93 powerpc updates for 6.0
- Add support for syscall stack randomization.
 
  - Add support for atomic operations to the 32 & 64-bit BPF JIT.
 
  - Full support for KASAN on 64-bit Book3E.
 
  - Add a watchdog driver for the new PowerVM hypervisor watchdog.
 
  - Add a number of new selftests for the Power10 PMU support.
 
  - Add a driver for the PowerVM Platform KeyStore.
 
  - Increase the NMI watchdog timeout during live partition migration, to avoid timeouts
    due to increased memory access latency.
 
  - Add support for using the 'linux,pci-domain' device tree property for PCI domain
    assignment.
 
  - Many other small features and fixes.
 
 Thanks to: Alexey Kardashevskiy, Andy Shevchenko, Arnd Bergmann, Athira Rajeev, Bagas
 Sanjaya, Christophe Leroy, Erhard Furtner, Fabiano Rosas, Greg Kroah-Hartman, Greg Kurz,
 Haowen Bai, Hari Bathini, Jason A. Donenfeld, Jason Wang, Jiang Jian, Joel Stanley, Juerg
 Haefliger, Kajol Jain, Kees Cook, Laurent Dufour, Madhavan Srinivasan, Masahiro Yamada,
 Maxime Bizon, Miaoqian Lin, Murilo Opsfelder Araújo, Nathan Lynch, Naveen N. Rao, Nayna
 Jain, Nicholas Piggin, Ning Qiang, Pali Rohár, Petr Mladek, Rashmica Gupta, Sachin Sant,
 Scott Cheloha, Segher Boessenkool, Stephen Rothwell, Uwe Kleine-König, Wolfram Sang, Xiu
 Jianfeng, Zhouyi Zhou.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmLuAPgTHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgBPpD/9kY/T0qlOXABxlZCgtqeAjPX+2xpnY
 BF+TlsN1TS1auFcEZL2BapmVacsvOeGEFDVuZHZvZJc69Hx+gSjnjFCnZjp6n+Yz
 wt6y9w9Pu0t/sjD5vNQ46O15/dXqm6RoVI7um12j/WLMN8Ko5+x3gKAyQONjQd2/
 1kPcxVH6FUosAdnCuvIcqCX4e4IIHl2ZkitHOTXoQUvUy9oAK/mOBnwqZ6zLGUKC
 E5M+Zyt4RFGxhPs48FkX6Nq6crDGU/P0VJpDKkR/t7GHnE67Bm70gZougAPrzrgP
 nx8zoTWgDKpqDeuqK7pFcyKgNS3dKbxsN3sAfKHOWu/YnV4wMyy+7fmwagMauki7
 lXccKN6F/r+8JcMNx80Jp/dAw3ZdLceP38M3Ryf8IL6lTfkNySumUvrKJn6r1Cu1
 wvzhgyEuDawss9KHdEmXcA2i3+XVZvitaipO7JWUC8pblrP1SJMoPfIIe9zh3y3M
 pyZj0TcGJ8XaK+badvI+PW/K/KeRgXEY8HpC3wDHSoIkli3OE4jDwXn6TiZgvm3n
 k0sKL8YSmQZ8hP8QAkR+r8NQKYqLlfyPxdslK5omDPxfub5Uzk9ZV2Ep7svkaiQn
 Wqjq27Dpz8+w0XPjsQ0Tkv+ByTkOhrawOH7x9SpFLHpv9g5otcYmS79NkO/htx8C
 6LyPNx1VYn5IRA==
 =tRkm
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-6.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:

 - Add support for syscall stack randomization

 - Add support for atomic operations to the 32 & 64-bit BPF JIT

 - Full support for KASAN on 64-bit Book3E

 - Add a watchdog driver for the new PowerVM hypervisor watchdog

 - Add a number of new selftests for the Power10 PMU support

 - Add a driver for the PowerVM Platform KeyStore

 - Increase the NMI watchdog timeout during live partition migration, to
   avoid timeouts due to increased memory access latency

 - Add support for using the 'linux,pci-domain' device tree property for
   PCI domain assignment

 - Many other small features and fixes

Thanks to Alexey Kardashevskiy, Andy Shevchenko, Arnd Bergmann, Athira
Rajeev, Bagas Sanjaya, Christophe Leroy, Erhard Furtner, Fabiano Rosas,
Greg Kroah-Hartman, Greg Kurz, Haowen Bai, Hari Bathini, Jason A.
Donenfeld, Jason Wang, Jiang Jian, Joel Stanley, Juerg Haefliger, Kajol
Jain, Kees Cook, Laurent Dufour, Madhavan Srinivasan, Masahiro Yamada,
Maxime Bizon, Miaoqian Lin, Murilo Opsfelder Araújo, Nathan Lynch,
Naveen N.  Rao, Nayna Jain, Nicholas Piggin, Ning Qiang, Pali Rohár,
Petr Mladek, Rashmica Gupta, Sachin Sant, Scott Cheloha, Segher
Boessenkool, Stephen Rothwell, Uwe Kleine-König, Wolfram Sang, Xiu
Jianfeng, and Zhouyi Zhou.

* tag 'powerpc-6.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (191 commits)
  powerpc/64e: Fix kexec build error
  EDAC/ppc_4xx: Include required of_irq header directly
  powerpc/pci: Fix PHB numbering when using opal-phbid
  powerpc/64: Init jump labels before parse_early_param()
  selftests/powerpc: Avoid GCC 12 uninitialised variable warning
  powerpc/cell/axon_msi: Fix refcount leak in setup_msi_msg_address
  powerpc/xive: Fix refcount leak in xive_get_max_prio
  powerpc/spufs: Fix refcount leak in spufs_init_isolated_loader
  powerpc/perf: Include caps feature for power10 DD1 version
  powerpc: add support for syscall stack randomization
  powerpc: Move system_call_exception() to syscall.c
  powerpc/powernv: rename remaining rng powernv_ functions to pnv_
  powerpc/powernv/kvm: Use darn for H_RANDOM on Power9
  powerpc/powernv: Avoid crashing if rng is NULL
  selftests/powerpc: Fix matrix multiply assist test
  powerpc/signal: Update comment for clarity
  powerpc: make facility_unavailable_exception 64s
  powerpc/platforms/83xx/suspend: Remove write-only global variable
  powerpc/platforms/83xx/suspend: Prevent unloading the driver
  powerpc/platforms/83xx/suspend: Reorder to get rid of a forward declaration
  ...
2022-08-06 16:38:17 -07:00
Linus Torvalds
6614a3c316 - The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe
Lin, Yang Shi, Anshuman Khandual and Mike Rapoport
 
 - Some kmemleak fixes from Patrick Wang and Waiman Long
 
 - DAMON updates from SeongJae Park
 
 - memcg debug/visibility work from Roman Gushchin
 
 - vmalloc speedup from Uladzislau Rezki
 
 - more folio conversion work from Matthew Wilcox
 
 - enhancements for coherent device memory mapping from Alex Sierra
 
 - addition of shared pages tracking and CoW support for fsdax, from
   Shiyang Ruan
 
 - hugetlb optimizations from Mike Kravetz
 
 - Mel Gorman has contributed some pagealloc changes to improve latency
   and realtime behaviour.
 
 - mprotect soft-dirty checking has been improved by Peter Xu
 
 - Many other singleton patches all over the place
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCYuravgAKCRDdBJ7gKXxA
 jpqSAQDrXSdII+ht9kSHlaCVYjqRFQz/rRvURQrWQV74f6aeiAD+NHHeDPwZn11/
 SPktqEUrF1pxnGQxqLh1kUFUhsVZQgE=
 =w/UH
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2022-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:
 "Most of the MM queue. A few things are still pending.

  Liam's maple tree rework didn't make it. This has resulted in a few
  other minor patch series being held over for next time.

  Multi-gen LRU still isn't merged as we were waiting for mapletree to
  stabilize. The current plan is to merge MGLRU into -mm soon and to
  later reintroduce mapletree, with a view to hopefully getting both
  into 6.1-rc1.

  Summary:

   - The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe
     Lin, Yang Shi, Anshuman Khandual and Mike Rapoport

   - Some kmemleak fixes from Patrick Wang and Waiman Long

   - DAMON updates from SeongJae Park

   - memcg debug/visibility work from Roman Gushchin

   - vmalloc speedup from Uladzislau Rezki

   - more folio conversion work from Matthew Wilcox

   - enhancements for coherent device memory mapping from Alex Sierra

   - addition of shared pages tracking and CoW support for fsdax, from
     Shiyang Ruan

   - hugetlb optimizations from Mike Kravetz

   - Mel Gorman has contributed some pagealloc changes to improve
     latency and realtime behaviour.

   - mprotect soft-dirty checking has been improved by Peter Xu

   - Many other singleton patches all over the place"

 [ XFS merge from hell as per Darrick Wong in

   https://lore.kernel.org/all/YshKnxb4VwXycPO8@magnolia/ ]

* tag 'mm-stable-2022-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (282 commits)
  tools/testing/selftests/vm/hmm-tests.c: fix build
  mm: Kconfig: fix typo
  mm: memory-failure: convert to pr_fmt()
  mm: use is_zone_movable_page() helper
  hugetlbfs: fix inaccurate comment in hugetlbfs_statfs()
  hugetlbfs: cleanup some comments in inode.c
  hugetlbfs: remove unneeded header file
  hugetlbfs: remove unneeded hugetlbfs_ops forward declaration
  hugetlbfs: use helper macro SZ_1{K,M}
  mm: cleanup is_highmem()
  mm/hmm: add a test for cross device private faults
  selftests: add soft-dirty into run_vmtests.sh
  selftests: soft-dirty: add test for mprotect
  mm/mprotect: fix soft-dirty check in can_change_pte_writable()
  mm: memcontrol: fix potential oom_lock recursion deadlock
  mm/gup.c: fix formatting in check_and_migrate_movable_page()
  xfs: fail dax mount if reflink is enabled on a partition
  mm/memcontrol.c: remove the redundant updating of stats_flush_threshold
  userfaultfd: don't fail on unrecognized features
  hugetlb_cgroup: fix wrong hugetlb cgroup numa stat
  ...
2022-08-05 16:32:45 -07:00
Linus Torvalds
3bd6e5854b asm-generic: updates for 6.0
There are three independent sets of changes:
 
  - Sai Prakash Ranjan adds tracing support to the asm-generic
    version of the MMIO accessors, which is intended to help
    understand problems with device drivers and has been part
    of Qualcomm's vendor kernels for many years.
 
  - A patch from Sebastian Siewior to rework the handling of
    IRQ stacks in softirqs across architectures, which is
    needed for enabling PREEMPT_RT.
 
  - The last patch to remove the CONFIG_VIRT_TO_BUS option and
    some of the code behind that, after the last users of this
    old interface made it in through the netdev, scsi, media and
    staging trees.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAmLqPPEACgkQmmx57+YA
 GNlUbQ/+NpIsiA0JUrCGtySt8KrLHdA2dH9lJOR5/iuxfphscPFfWtpcPvcXQWmt
 a8u7wyI8SHW1ku4U0Y5sO0dBSldDnoIqJ5t4X5d7YNU9yVtEtucqQhZf+GkrPlVD
 1HkRu05B7y0k2BMn7BLhSvkpafs3f1lNGXjs8oFBdOF1/zwp/GjcrfCK7KFzqjwU
 dYrX0SOFlKFd4BZC75VfK+XcKg4LtwIOmJraRRl7alz2Q5Oop2hgjgZxXDPf//vn
 SPOhXJN/97i1FUpY2TkfHVH1NxbPfjCV4pUnjmLG0Y4NSy9UQ/ZcXHcywIdeuhfa
 0LySOIsAqBeccpYYYdg2ubiMDZOXkBfANu/sB9o/EhoHfB4svrbPRDhBIQZMFXJr
 MJYu+IYce2rvydA/nydo4q++pxR8v1ES1ZIo8bDux+q1CI/zbpQV+f98kPVRA0M7
 ajc+5GTIqNIsvHzzadq7eYxcj5Bi8Li2JA9sVkAQ+6iq1TVyeYayMc9eYwONlmqw
 MD+PFYc651pKtXZCfkLXPIKSwS0uPqBndAibuVhpZ0hxWaCBBdKvY9mrWcPxt0kA
 tMR8lrosbbrV2K48BFdWTOHvCs2FhHQxPGVPZ/iWuxTA0hHZ9tUlaEkSX+VM57IU
 KCYQLdWzT8J9vrgqSbgYKlb6pSPz6FIjTfut6NZMmshIbavHV/Q=
 =aTR0
 -----END PGP SIGNATURE-----

Merge tag 'asm-generic-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic

Pull asm-generic updates from Arnd Bergmann:
 "There are three independent sets of changes:

   - Sai Prakash Ranjan adds tracing support to the asm-generic version
     of the MMIO accessors, which is intended to help understand
     problems with device drivers and has been part of Qualcomm's vendor
     kernels for many years

   - A patch from Sebastian Siewior to rework the handling of IRQ stacks
     in softirqs across architectures, which is needed for enabling
     PREEMPT_RT

   - The last patch to remove the CONFIG_VIRT_TO_BUS option and some of
     the code behind that, after the last users of this old interface
     made it in through the netdev, scsi, media and staging trees"

* tag 'asm-generic-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
  uapi: asm-generic: fcntl: Fix typo 'the the' in comment
  arch/*/: remove CONFIG_VIRT_TO_BUS
  soc: qcom: geni: Disable MMIO tracing for GENI SE
  serial: qcom_geni_serial: Disable MMIO tracing for geni serial
  asm-generic/io: Add logging support for MMIO accessors
  KVM: arm64: Add a flag to disable MMIO trace for nVHE KVM
  lib: Add register read/write tracing support
  drm/meson: Fix overflow implicit truncation warnings
  irqchip/tegra: Fix overflow implicit truncation warnings
  coresight: etm4x: Use asm-generic IO memory barriers
  arm64: io: Use asm-generic high level MMIO accessors
  arch/*: Disable softirq stacks on PREEMPT_RT.
2022-08-05 10:07:23 -07:00
Linus Torvalds
eff0cb3d91 pci-v5.20-changes
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmLr+2wUHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vxfZg//eChkC2EUdT6K3zuQDbJJhsGcuOQF
 lnZuUyDn4xw7BkEoZf8V6YdAnp7VvgKhLOq1/q3Geu/LBbCaczoEogOCaR/WcVOs
 C+MsN0RWZQtgfuZKncQoqp25NeLPK9PFToeiIX/xViAYZF7NVjDY7XQiZHQ6JkEA
 /7cUqv/4nS3KCMsKjfmiOxGnqohMWtICiw9qjFvJ40PEDnNB1b53rkiVTxBFePpI
 ePfsRfi/C7klE3xNfoiEgrPp+Jfw+oShsCwXUsId7bEL2oLBc7ClqP05ZYZD3bTK
 QQYyZ12Cq8TysciYpUGBjBnywUHS5DIO5YaV3wxyVAR2Z+6GY2/QVjOa2kKvoK0o
 Hba6TJf8bL58AhSI8Q62pBM0sS7dqJSff+9c2BGpZvII5spP/rQQLlJO56TJjwkw
 Dlf0d3thhZOc9vSKjKw+0v0FdAyc4L11EOwUsw95jZeT5WWgqJYGFnWPZwqBI1KM
 DI1E5wVO5tA2H3NEn+BTTHbLWL+UppqyXPXBHiW52b2q5Bt8fJWMsFvnEEjclxmG
 pYCI7VgF8jqbYKxjobxPFY2x6PH9hfaGMxwzZSdOX6e/Eh+1esgyyaC5APpCO+Pp
 e4OkJaOzCmggrD0jYeLWu+yDm5KRrYo5cdfKHrKgAof0Am41lAa1OhJ2iH4ckNqP
 1qmHereDOe0zNVw=
 =9TAR
 -----END PGP SIGNATURE-----

Merge tag 'pci-v5.20-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull pci updates from Bjorn Helgaas:
 "Enumeration:

   - Consolidate duplicated 'next function' scanning and extend to allow
     'isolated functions' on s390, similar to existing hypervisors
     (Niklas Schnelle)

  Resource management:
   - Implement pci_iobar_pfn() for sparc, which allows us to remove the
     sparc-specific pci_mmap_page_range() and pci_mmap_resource_range().

     This removes the ability to map the entire PCI I/O space using
     /proc/bus/pci, but we believe that's already been broken since
     v2.6.28 (Arnd Bergmann)

   - Move common PCI definitions to asm-generic/pci.h and rework others
     to be be more specific and more encapsulated in arches that need
     them (Stafford Horne)

  Power management:

   - Convert drivers to new *_PM_OPS macros to avoid need for '#ifdef
     CONFIG_PM_SLEEP' or '__maybe_unused' (Bjorn Helgaas)

  Virtualization:

   - Add ACS quirk for Broadcom BCM5750x multifunction NICs that isolate
     the functions but don't advertise an ACS capability (Pavan Chebbi)

  Error handling:

   - Clear PCI Status register during enumeration in case firmware left
     errors logged (Kai-Heng Feng)

   - When we have native control of AER, enable error reporting for all
     devices that support AER. Previously only a few drivers enabled
     this (Stefan Roese)

   - Keep AER error reporting enabled for switches. Previously we
     enabled this during enumeration but immediately disabled it (Stefan
     Roese)

   - Iterate over error counters instead of error strings to avoid
     printing junk in AER sysfs counters (Mohamed Khalfella)

  ASPM:

   - Remove pcie_aspm_pm_state_change() so ASPM config changes, e.g.,
     via sysfs, are not lost across power state changes (Kai-Heng Feng)

  Endpoint framework:

   - Don't stop an EPC when unbinding an EPF from it (Shunsuke Mie)

  Endpoint embedded DMA controller driver:

   - Simplify and clean up support for the DesignWare embedded DMA
     (eDMA) controller (Frank Li, Serge Semin)

  Broadcom STB PCIe controller driver:

   - Avoid config space accesses when link is down because we can't
     recover from the CPU aborts these cause (Jim Quinlan)

   - Look for power regulators described under Root Ports in DT and
     enable them before scanning the secondary bus (Jim Quinlan)

   - Disable/enable regulators in suspend/resume (Jim Quinlan)

  Freescale i.MX6 PCIe controller driver:

   - Simplify and clean up clock and PHY management (Richard Zhu)

   - Disable/enable regulators in suspend/resume (Richard Zhu)

   - Set PCIE_DBI_RO_WR_EN before writing DBI registers (Richard Zhu)

   - Allow speeds faster than Gen2 (Richard Zhu)

   - Make link being down a non-fatal error so controller probe doesn't
     fail if there are no Endpoints connected (Richard Zhu)

  Loongson PCIe controller driver:

   - Add ACPI and MCFG support for Loongson LS7A (Huacai Chen)

   - Avoid config reads to non-existent LS2K/LS7A devices because a
     hardware defect causes machine hangs (Huacai Chen)

   - Work around LS7A integrated devices that report incorrect Interrupt
     Pin values (Jianmin Lv)

  Marvell Aardvark PCIe controller driver:

   - Add support for AER and Slot capability on emulated bridge (Pali
     Rohár)

  MediaTek PCIe controller driver:

   - Add Airoha EN7532 to DT binding (John Crispin)

   - Allow building of driver for ARCH_AIROHA (Felix Fietkau)

  MediaTek PCIe Gen3 controller driver:

   - Print decoded LTSSM state when the link doesn't come up (Jianjun
     Wang)

  NVIDIA Tegra194 PCIe controller driver:

   - Convert DT binding to json-schema (Vidya Sagar)

   - Add DT bindings and driver support for Tegra234 Root Port and
     Endpoint mode (Vidya Sagar)

   - Fix some Root Port interrupt handling issues (Vidya Sagar)

   - Set default Max Payload Size to 256 bytes (Vidya Sagar)

   - Fix Data Link Feature capability programming (Vidya Sagar)

   - Extend Endpoint mode support to devices beyond Controller-5 (Vidya
     Sagar)

  Qualcomm PCIe controller driver:

   - Rework clock, reset, PHY power-on ordering to avoid hangs and
     improve consistency (Robert Marko, Christian Marangi)

   - Move pipe_clk handling to PHY drivers (Dmitry Baryshkov)

   - Add IPQ60xx support (Selvam Sathappan Periakaruppan)

   - Allow ASPM L1 and substates for 2.7.0 (Krishna chaitanya chundru)

   - Add support for more than 32 MSI interrupts (Dmitry Baryshkov)

  Renesas R-Car PCIe controller driver:

   - Convert DT binding to json-schema (Herve Codina)

   - Add Renesas RZ/N1D (R9A06G032) to rcar-gen2 DT binding and driver
     (Herve Codina)

  Samsung Exynos PCIe controller driver:

   - Fix phy-exynos-pcie driver so it follows the 'phy_init() before
     phy_power_on()' PHY programming model (Marek Szyprowski)

  Synopsys DesignWare PCIe controller driver:

   - Simplify and clean up the DWC core extensively (Serge Semin)

   - Fix an issue with programming the ATU for regions that cross a 4GB
     boundary (Serge Semin)

   - Enable the CDM check if 'snps,enable-cdm-check' exists; previously
     we skipped it if 'num-lanes' was absent (Serge Semin)

   - Allocate a 32-bit DMA-able page to be MSI target instead of using a
     driver data structure that may not be addressable with 32-bit
     address (Will McVicker)

   - Add DWC core support for more than 32 MSI interrupts (Dmitry
     Baryshkov)

  Xilinx Versal CPM PCIe controller driver:

   - Add DT binding and driver support for Versal CPM5 Gen5 Root Port
     (Bharat Kumar Gogada)"

* tag 'pci-v5.20-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (150 commits)
  PCI: imx6: Support more than Gen2 speed link mode
  PCI: imx6: Set PCIE_DBI_RO_WR_EN before writing DBI registers
  PCI: imx6: Reformat suspend callback to keep symmetric with resume
  PCI: imx6: Move the imx6_pcie_ltssm_disable() earlier
  PCI: imx6: Disable clocks in reverse order of enable
  PCI: imx6: Do not hide PHY driver callbacks and refine the error handling
  PCI: imx6: Reduce resume time by only starting link if it was up before suspend
  PCI: imx6: Mark the link down as non-fatal error
  PCI: imx6: Move regulator enable out of imx6_pcie_deassert_core_reset()
  PCI: imx6: Turn off regulator when system is in suspend mode
  PCI: imx6: Call host init function directly in resume
  PCI: imx6: Disable i.MX6QDL clock when disabling ref clocks
  PCI: imx6: Propagate .host_init() errors to caller
  PCI: imx6: Collect clock enables in imx6_pcie_clk_enable()
  PCI: imx6: Factor out ref clock disable to match enable
  PCI: imx6: Move imx6_pcie_clk_disable() earlier
  PCI: imx6: Move imx6_pcie_enable_ref_clk() earlier
  PCI: imx6: Move PHY management functions together
  PCI: imx6: Move imx6_pcie_grp_offset(), imx6_pcie_configure_type() earlier
  PCI: imx6: Convert to NOIRQ_SYSTEM_SLEEP_PM_OPS()
  ...
2022-08-04 19:30:35 -07:00
Linus Torvalds
5264406cdb iov_iter work, part 1 - isolated cleanups and optimizations.
One of the goals is to reduce the overhead of using ->read_iter()
 and ->write_iter() instead of ->read()/->write(); new_sync_{read,write}()
 has a surprising amount of overhead, in particular inside iocb_flags().
 That's why the beginning of the series is in this pile; it's not directly
 iov_iter-related, but it's a part of the same work...
 
 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCYurGOQAKCRBZ7Krx/gZQ
 6ysyAP91lvBfMRepcxpd9kvtuzWkU8A3rfSziZZteEHANB9Q7QEAiPn2a2OjWkcZ
 uAyUWfCkHCNx+dSMkEvUgR5okQ0exAM=
 =9UCV
 -----END PGP SIGNATURE-----

Merge tag 'pull-work.iov_iter-base' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull vfs iov_iter updates from Al Viro:
 "Part 1 - isolated cleanups and optimizations.

  One of the goals is to reduce the overhead of using ->read_iter() and
  ->write_iter() instead of ->read()/->write().

  new_sync_{read,write}() has a surprising amount of overhead, in
  particular inside iocb_flags(). That's the explanation for the
  beginning of the series is in this pile; it's not directly
  iov_iter-related, but it's a part of the same work..."

* tag 'pull-work.iov_iter-base' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  first_iovec_segment(): just return address
  iov_iter: massage calling conventions for first_{iovec,bvec}_segment()
  iov_iter: first_{iovec,bvec}_segment() - simplify a bit
  iov_iter: lift dealing with maxpages out of first_{iovec,bvec}_segment()
  iov_iter_get_pages{,_alloc}(): cap the maxsize with MAX_RW_COUNT
  iov_iter_bvec_advance(): don't bother with bvec_iter
  copy_page_{to,from}_iter(): switch iovec variants to generic
  keep iocb_flags() result cached in struct file
  iocb: delay evaluation of IS_SYNC(...) until we want to check IOCB_DSYNC
  struct file: use anonymous union member for rcuhead and llist
  btrfs: use IOMAP_DIO_NOSYNC
  teach iomap_dio_rw() to suppress dsync
  No need of likely/unlikely on calls of check_copy_size()
2022-08-03 13:50:22 -07:00
Linus Torvalds
7d9d077c78 RCU pull request for v5.20 (or whatever)
This pull request contains the following branches:
 
 doc.2022.06.21a: Documentation updates.
 
 fixes.2022.07.19a: Miscellaneous fixes.
 
 nocb.2022.07.19a: Callback-offload updates, perhaps most notably a new
 	RCU_NOCB_CPU_DEFAULT_ALL Kconfig option that causes all CPUs to
 	be offloaded at boot time, regardless of kernel boot parameters.
 	This is useful to battery-powered systems such as ChromeOS
 	and Android.  In addition, a new RCU_NOCB_CPU_CB_BOOST kernel
 	boot parameter prevents offloaded callbacks from interfering
 	with real-time workloads and with energy-efficiency mechanisms.
 
 poll.2022.07.21a: Polled grace-period updates, perhaps most notably
 	making these APIs account for both normal and expedited grace
 	periods.
 
 rcu-tasks.2022.06.21a: Tasks RCU updates, perhaps most notably reducing
 	the CPU overhead of RCU tasks trace grace periods by more than
 	a factor of two on a system with 15,000 tasks.	The reduction
 	is expected to increase with the number of tasks, so it seems
 	reasonable to hypothesize that a system with 150,000 tasks might
 	see a 20-fold reduction in CPU overhead.
 
 torture.2022.06.21a: Torture-test updates.
 
 ctxt.2022.07.05a: Updates that merge RCU's dyntick-idle tracking into
 	context tracking, thus reducing the overhead of transitioning to
 	kernel mode from either idle or nohz_full userspace execution
 	for kernels that track context independently of RCU.  This is
 	expected to be helpful primarily for kernels built with
 	CONFIG_NO_HZ_FULL=y.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEbK7UrM+RBIrCoViJnr8S83LZ+4wFAmLgMcgTHHBhdWxtY2tA
 a2VybmVsLm9yZwAKCRCevxLzctn7jArXD/0fjbCwqpRjHVTzjMY8jN4zDkqZZD6m
 g8Fx27hZ4ToNFwRptyHwNezrNj14skjAJEXfdjaVw32W62ivXvf0HINvSzsTLCSq
 k2kWyBdXLc9CwY5p5W4smnpn5VoAScjg5PoPL59INoZ/Zziji323C7Zepl/1DYJt
 0T6bPCQjo1ZQoDUCyVpSjDmAqxnderWG0MeJVt74GkLqmnYLANg0GH8c7mH4+9LL
 kVGlLp5nlPgNJ4FEoFdMwNU8T/ETmaVld/m2dkiawjkXjJzB2XKtBigU91DDmXz5
 7DIdV4ABrxiy4kGNqtIe/jFgnKyVD7xiDpyfjd6KTeDr/rDS8u2ZH7+1iHsyz3g0
 Np/tS3vcd0KR+gI/d0eXxPbgm5sKlCmKw/nU2eArpW/+4LmVXBUfHTG9Jg+LJmBc
 JrUh6aEdIZJZHgv/nOQBNig7GJW43IG50rjuJxAuzcxiZNEG5lUSS23ysaA9CPCL
 PxRWKSxIEfK3kdmvVO5IIbKTQmIBGWlcWMTcYictFSVfBgcCXpPAksGvqA5JiUkc
 egW+xLFo/7K+E158vSKsVqlWZcEeUbsNJ88QOlpqnRgH++I2Yv/LhK41XfJfpH+Y
 ALxVaDd+mAq6v+qSHNVq9wT3ozXIPy/zK1hDlMIqx40h2YvaEsH4je+521oSoN9r
 vX60+QNxvUBLwA==
 =vUNm
 -----END PGP SIGNATURE-----

Merge tag 'rcu.2022.07.26a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu

Pull RCU updates from Paul McKenney:

 - Documentation updates

 - Miscellaneous fixes

 - Callback-offload updates, perhaps most notably a new
   RCU_NOCB_CPU_DEFAULT_ALL Kconfig option that causes all CPUs to be
   offloaded at boot time, regardless of kernel boot parameters.

   This is useful to battery-powered systems such as ChromeOS and
   Android. In addition, a new RCU_NOCB_CPU_CB_BOOST kernel boot
   parameter prevents offloaded callbacks from interfering with
   real-time workloads and with energy-efficiency mechanisms

 - Polled grace-period updates, perhaps most notably making these APIs
   account for both normal and expedited grace periods

 - Tasks RCU updates, perhaps most notably reducing the CPU overhead of
   RCU tasks trace grace periods by more than a factor of two on a
   system with 15,000 tasks.

   The reduction is expected to increase with the number of tasks, so it
   seems reasonable to hypothesize that a system with 150,000 tasks
   might see a 20-fold reduction in CPU overhead

 - Torture-test updates

 - Updates that merge RCU's dyntick-idle tracking into context tracking,
   thus reducing the overhead of transitioning to kernel mode from
   either idle or nohz_full userspace execution for kernels that track
   context independently of RCU.

   This is expected to be helpful primarily for kernels built with
   CONFIG_NO_HZ_FULL=y

* tag 'rcu.2022.07.26a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (98 commits)
  rcu: Add irqs-disabled indicator to expedited RCU CPU stall warnings
  rcu: Diagnose extended sync_rcu_do_polled_gp() loops
  rcu: Put panic_on_rcu_stall() after expedited RCU CPU stall warnings
  rcutorture: Test polled expedited grace-period primitives
  rcu: Add polled expedited grace-period primitives
  rcutorture: Verify that polled GP API sees synchronous grace periods
  rcu: Make Tiny RCU grace periods visible to polled APIs
  rcu: Make polled grace-period API account for expedited grace periods
  rcu: Switch polled grace-period APIs to ->gp_seq_polled
  rcu/nocb: Avoid polling when my_rdp->nocb_head_rdp list is empty
  rcu/nocb: Add option to opt rcuo kthreads out of RT priority
  rcu: Add nocb_cb_kthread check to rcu_is_callbacks_kthread()
  rcu/nocb: Add an option to offload all CPUs on boot
  rcu/nocb: Fix NOCB kthreads spawn failure with rcu_nocb_rdp_deoffload() direct call
  rcu/nocb: Invert rcu_state.barrier_mutex VS hotplug lock locking order
  rcu/nocb: Add/del rdp to iterate from rcuog itself
  rcu/tree: Add comment to describe GP-done condition in fqs loop
  rcu: Initialize first_gp_fqs at declaration in rcu_gp_fqs()
  rcu/kvfree: Remove useless monitor_todo flag
  rcu: Cleanup RCU urgency state for offline CPU
  ...
2022-08-02 19:12:45 -07:00
Linus Torvalds
a0b09f2d6f Random number generator updates for Linux 6.0-rc1.
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEq5lC5tSkz8NBJiCnSfxwEqXeA64FAmLnDOwACgkQSfxwEqXe
 A65Fiw//Z0YaPejSslQIGitQ1b0XzdWBhyJArYDieaaiQRXMqlaSKlIUqHz38xb7
 +FykUY51/SJLjHV2riPxq1OK3/MPmk6VlTd0HHihcHVmg77oZcFcv2tPnDpZoqND
 TsBOujLbXKwxP8tNFedRY/4+K7w+ue9BTfDjuH7aCtz7uWd+4cNJmPg3x9FCfkMA
 +hbcRluwE9W3Pg4OCKwv+qxL0JF3qQtNKEOp1wpnjGAZZW/I9gFNgFBEkykvcAsj
 TkIRDc3agPFj6QgDeRIgLdnf9KCsLubKAg5oJneeCvQztJJUCSkn8nQXxpx+4sLo
 GsRgvCdfL/GyJqfSAzQJVYDHKtKMkJiCiWCC/oOALR8dzHJfSlULDAjbY1m/DAr9
 at+vi4678Or7TNx2ZSaUlCXXKZ+UT7yWMlQWax9JuxGk1hGYP5/eT1AH5SGjqUwF
 w1q8oyzxt1vUcnOzEddFXPFirnqqhAk4dQFtu83+xKM4ZssMVyeB4NZdEhAdW0ng
 MX+RjrVj4l5gWWuoS0Cx3LUxDCgV6WT0dN+Vl9axAZkoJJbcXLEmXwQ6NbzTLPWg
 1/MT7qFTxNcTCeAArMdZvvFbeh7pOBXO42pafrK/7vDRnTMUIw9tqXNLQUfvdFQp
 F5flPgiVRHDU2vSzKIFtnPTyXU0RBBGvNb4n0ss2ehH2DSsCxYE=
 =Zy3d
 -----END PGP SIGNATURE-----

Merge tag 'random-6.0-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random

Pull random number generator updates from Jason Donenfeld:
 "Though there's been a decent amount of RNG-related development during
  this last cycle, not all of it is coming through this tree, as this
  cycle saw a shift toward tackling early boot time seeding issues,
  which took place in other trees as well.

  Here's a summary of the various patches:

   - The CONFIG_ARCH_RANDOM .config option and the "nordrand" boot
     option have been removed, as they overlapped with the more widely
     supported and more sensible options, CONFIG_RANDOM_TRUST_CPU and
     "random.trust_cpu". This change allowed simplifying a bit of arch
     code.

   - x86's RDRAND boot time test has been made a bit more robust, with
     RDRAND disabled if it's clearly producing bogus results. This would
     be a tip.git commit, technically, but I took it through random.git
     to avoid a large merge conflict.

   - The RNG has long since mixed in a timestamp very early in boot, on
     the premise that a computer that does the same things, but does so
     starting at different points in wall time, could be made to still
     produce a different RNG state. Unfortunately, the clock isn't set
     early in boot on all systems, so now we mix in that timestamp when
     the time is actually set.

   - User Mode Linux now uses the host OS's getrandom() syscall to
     generate a bootloader RNG seed and later on treats getrandom() as
     the platform's RDRAND-like faculty.

   - The arch_get_random_{seed_,}_long() family of functions is now
     arch_get_random_{seed_,}_longs(), which enables certain platforms,
     such as s390, to exploit considerable performance advantages from
     requesting multiple CPU random numbers at once, while at the same
     time compiling down to the same code as before on platforms like
     x86.

   - A small cleanup changing a cmpxchg() into a try_cmpxchg(), from
     Uros.

   - A comment spelling fix"

More info about other random number changes that come in through various
architecture trees in the full commentary in the pull request:

  https://lore.kernel.org/all/20220731232428.2219258-1-Jason@zx2c4.com/

* tag 'random-6.0-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
  random: correct spelling of "overwrites"
  random: handle archrandom with multiple longs
  um: seed rng using host OS rng
  random: use try_cmpxchg in _credit_init_bits
  timekeeping: contribute wall clock to rng on time change
  x86/rdrand: Remove "nordrand" flag in favor of "random.trust_cpu"
  random: remove CONFIG_ARCH_RANDOM
2022-08-02 17:31:35 -07:00
Linus Torvalds
043402495d integrity-v6.0
-----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQQdXVVFGN5XqKr1Hj7LwZzRsCrn5QUCYulqTBQcem9oYXJAbGlu
 dXguaWJtLmNvbQAKCRDLwZzRsCrn5SBBAP9nbAW1SPa/hDqbrclHdDrS59VkSVwv
 6ZO2yAmxJAptHwD+JzyJpJiZsqVN/Tu85V1PqeAt9c8az8f3CfDBp2+w7AA=
 =Ad+c
 -----END PGP SIGNATURE-----

Merge tag 'integrity-v6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity

Pull integrity updates from Mimi Zohar:
 "Aside from the one EVM cleanup patch, all the other changes are kexec
  related.

  On different architectures different keyrings are used to verify the
  kexec'ed kernel image signature. Here are a number of preparatory
  cleanup patches and the patches themselves for making the keyrings -
  builtin_trusted_keyring, .machine, .secondary_trusted_keyring, and
  .platform - consistent across the different architectures"

* tag 'integrity-v6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
  kexec, KEYS, s390: Make use of built-in and secondary keyring for signature verification
  arm64: kexec_file: use more system keyrings to verify kernel image signature
  kexec, KEYS: make the code in bzImage64_verify_sig generic
  kexec: clean up arch_kexec_kernel_verify_sig
  kexec: drop weak attribute from functions
  kexec_file: drop weak attribute from functions
  evm: Use IS_ENABLED to initialize .enabled
2022-08-02 15:21:18 -07:00
Yury Norov
3e73120315 powerpc: drop dependency on <asm/machdep.h> in archrandom.h
archrandom.h includes <asm/machdep.h> to refer ppc_md. This causes
circular header dependency, if generic nodemask.h  includes random.h:

In file included from include/linux/cred.h:16,
                 from include/linux/seq_file.h:13,
                 from arch/powerpc/include/asm/machdep.h:6,
                 from arch/powerpc/include/asm/archrandom.h:5,
                 from include/linux/random.h:109,
                 from include/linux/nodemask.h:97,
                 from include/linux/list_lru.h:12,
                 from include/linux/fs.h:13,
                 from include/linux/compat.h:17,
                 from arch/powerpc/kernel/asm-offsets.c:12:
include/linux/sched.h:1203:9: error: unknown type name 'nodemask_t'
 1203 |         nodemask_t                      mems_allowed;
      |         ^~~~~~~~~~

Fix it by removing <asm/machdep.h> dependency from archrandom.h

Now as arch_get_random_seed_long() moved to c-file, and not exported,
it's not available for modules. As Michael Ellerman says:

  I think we actually don't need it exported to modules, I think it's
  a private detail of the RNG <-> architecture interface, not something
  that modules should be calling.

CC: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Michael Ellerman <mpe@ellerman.id.au>
CC: Paul Mackerras <paulus@samba.org>
CC: Rasmus Villemoes <linux@rasmusvillemoes.dk>
CC: Stephen Rothwell <sfr@canb.auug.org.au>
CC: linuxppc-dev@lists.ozlabs.org
Suggested-by: Michael Ellerman <mpe@ellerman.id.au> (for non-exporting)
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
2022-08-01 08:13:21 -07:00