Commit Graph

213 Commits

Author SHA1 Message Date
Robin Murphy
344323e042 arm64: Rewrite __arch_clear_user()
Now that we're always using STTR variants rather than abstracting two
different addressing modes, the user_ldst macro here is frankly more
obfuscating than helpful. Rewrite __arch_clear_user() with regular
USER() annotations so that it's clearer what's going on, and take the
opportunity to minimise the branchiness in the most common paths, while
also allowing the exception fixup to return an accurate result.

Apparently some folks examine large reads from /dev/zero closely enough
to notice the loop being hot, so align it per the other critical loops
(presumably around a typical instruction fetch granularity).

Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/1cbd78b12c076a8ad4656a345811cfb9425df0b3.1622128527.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2021-06-01 18:34:38 +01:00
Robin Murphy
9e51cafd78 arm64: Better optimised memchr()
Although we implement our own assembly version of memchr(), it turns
out to be barely any better than what GCC can generate for the generic
C version (and would go wrong if the size_t argument were ever large
enough to be interpreted as negative). Unfortunately we can't import the
tuned implementation from the Arm optimized-routines library, since that
has some Advanced SIMD parts which are not really viable for general
kernel library code. What we can do, however, is pep things up with some
relatively straightforward word-at-a-time logic for larger calls.

Adding some timing to optimized-routines' memchr() test for a simple
benchmark, overall this version comes in around half as fast as the SIMD
code, but still nearly 4x faster than our existing implementation.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/58471b42f9287e039dafa9e5e7035077152438fd.1622128527.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2021-06-01 18:34:38 +01:00
Robin Murphy
285133040e arm64: Import latest memcpy()/memmove() implementation
Import the latest implementation of memcpy(), based on the
upstream code of string/aarch64/memcpy.S at commit afd6244 from
https://github.com/ARM-software/optimized-routines, and subsuming
memmove() in the process.

Note that for simplicity Arm have chosen to contribute this code
to Linux under GPLv2 rather than the original MIT license.

Note also that the needs of the usercopy routines vs. regular memcpy()
have now diverged so far that we abandon the shared template idea
and the damage which that incurred to the tuning of LDP/STP loops.
We'll be back to tackle those routines separately in future.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/3c953af43506581b2422f61952261e76949ba711.1622128527.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2021-06-01 18:34:38 +01:00
Sam Tebbs
020b199bc7 arm64: Import latest version of Cortex Strings' strncmp
Import the latest version of the former Cortex Strings - now
Arm Optimized Routines - strncmp function based on the upstream
code of string/aarch64/strncmp.S at commit e823e3a from
https://github.com/ARM-software/optimized-routines

Note that for simplicity Arm have chosen to contribute this code
to Linux under GPLv2 rather than the original MIT license.

Signed-off-by: Sam Tebbs <sam.tebbs@arm.com>
[ rm: update attribution and commit message ]
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/26110bee02ad360596c9a7536af7eaaf6890d0e8.1622128527.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2021-06-01 18:34:38 +01:00
Sam Tebbs
325a1de812 arm64: Import updated version of Cortex Strings' strlen
Import an updated version of the former Cortex Strings - now Arm
Optimized Routines - strcmp function. The latest version introduces
Advanced SIMD usage which rules it out for our purposes, but we can
still pick an intermediate improvement from the previous version,
namely string/aarch64/strlen.S at commit 98e4d6a from
https://github.com/ARM-software/optimized-routines

Note that for simplicity Arm have chosen to contribute this code
to Linux under GPLv2 rather than the original MIT license.

Signed-off-by: Sam Tebbs <sam.tebbs@arm.com>
[ rm: update attribution and commit message ]
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/32e3489398a24b23ae6e996935ac4818f8fd9dfd.1622128527.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2021-06-01 18:34:37 +01:00
Sam Tebbs
758602c044 arm64: Import latest version of Cortex Strings' strcmp
Import the latest version of the former Cortex Strings - now
Arm Optimized Routines - strcmp function based on the upstream
code of string/aarch64/strcmp.S at commit afd6244 from
https://github.com/ARM-software/optimized-routines

Note that for simplicity Arm have chosen to contribute this code
to Linux under GPLv2 rather than the original MIT license.

Signed-off-by: Sam Tebbs <sam.tebbs@arm.com>
[ rm: update attribution and commit message ]
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/0fe90c90b96b569fbdfd46e47bd1298abb02079e.1622128527.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2021-06-01 18:34:37 +01:00
Sam Tebbs
43de30d367 arm64: Import latest version of Cortex Strings' memcmp
Import the latest version of the former Cortex Strings - now
Arm Optimized Routines - memcmp function based on the upstream
code of string/aarch64/memcmp.S at commit e823e3a from
https://github.com/ARM-software/optimized-routines

Note that for simplicity Arm have chosen to contribute this code
to Linux under GPLv2 rather than the original MIT license.

Signed-off-by: Sam Tebbs <sam.tebbs@arm.com>
[ rm: update attribution and commit message ]
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/2889de2d41054f3f508fb3addad784a3606ef383.1622128527.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2021-06-01 18:34:37 +01:00
Julien Thierry
427bfc59e2 arm64: insn: Add SVE instruction class
SVE has been public for some time now. Let the decoder acknowledge
its existence.

Signed-off-by: Julien Thierry <jthierry@redhat.com>
Link: https://lore.kernel.org/r/20210303170536.1838032-6-jthierry@redhat.com
Signed-off-by: Will Deacon <will@kernel.org>
2021-05-27 17:38:30 +01:00
Julien Thierry
72fd723694 arm64: Move instruction encoder/decoder under lib/
Aarch64 instruction set encoding and decoding logic can prove useful
for some features/tools both part of the kernel and outside the kernel.

Isolate the function dealing only with encoding/decoding instructions,
with minimal dependency on kernel utilities in order to be able to reuse
that code.

Code was only moved, no code should have been added, removed nor
modifier.

Signed-off-by: Julien Thierry <jthierry@redhat.com>
Link: https://lore.kernel.org/r/20210303170536.1838032-5-jthierry@redhat.com
Signed-off-by: Will Deacon <will@kernel.org>
2021-05-27 17:38:30 +01:00
Peter Collingbourne
1cbdf60bd1 kasan: arm64: support specialized outlined tag mismatch checks
By using outlined checks we can achieve a significant code size
improvement by moving the tag-based ASAN checks into separate
functions. Unlike the existing CONFIG_KASAN_OUTLINE mode these
functions have a custom calling convention that preserves most
registers and is specialized to the register containing the address
and the type of access, and as a result we can eliminate the code
size and performance overhead of a standard calling convention such
as AAPCS for these functions.

This change depends on a separate series of changes to Clang [1] to
support outlined checks in the kernel, although the change works fine
without them (we just don't get outlined checks). This is because the
flag -mllvm -hwasan-inline-all-checks=0 has no effect until the Clang
changes land. The flag was introduced in the Clang 9.0 timeframe as
part of the support for outlined checks in userspace and because our
minimum Clang version is 10.0 we can pass it unconditionally.

Outlined checks require a new runtime function with a custom calling
convention. Add this function to arch/arm64/lib.

I measured the code size of defconfig + tag-based KASAN, as well
as boot time (i.e. time to init launch) on a DragonBoard 845c with
an Android arm64 GKI kernel. The results are below:

                               code size    boot time
CONFIG_KASAN_INLINE=y before    92824064      6.18s
CONFIG_KASAN_INLINE=y after     38822400      6.65s
CONFIG_KASAN_OUTLINE=y          39215616     11.48s

We can see straight away that specialized outlined checks beat the
existing CONFIG_KASAN_OUTLINE=y on both code size and boot time
for tag-based ASAN.

As for the comparison between CONFIG_KASAN_INLINE=y before and after
we saw similar performance numbers in userspace [2] and decided
that since the performance overhead is minimal compared to the
overhead of tag-based ASAN itself as well as compared to the code
size improvements we would just replace the inlined checks with the
specialized outlined checks without the option to select between them,
and that is what I have implemented in this patch.

Signed-off-by: Peter Collingbourne <pcc@google.com>
Acked-by: Andrey Konovalov <andreyknvl@gmail.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Link: https://linux-review.googlesource.com/id/I1a30036c70ab3c3ee78d75ed9b87ef7cdc3fdb76
Link: [1] https://reviews.llvm.org/D90426
Link: [2] https://reviews.llvm.org/D56954
Link: https://lore.kernel.org/r/20210526174927.2477847-3-pcc@google.com
Signed-off-by: Will Deacon <will@kernel.org>
2021-05-26 23:31:26 +01:00
Fuad Tabba
fade9c2c6e arm64: Rename arm64-internal cache maintenance functions
Although naming across the codebase isn't that consistent, it
tends to follow certain patterns. Moreover, the term "flush"
isn't defined in the Arm Architecture reference manual, and might
be interpreted to mean clean, invalidate, or both for a cache.

Rename arm64-internal functions to make the naming internally
consistent, as well as making it consistent with the Arm ARM, by
specifying whether it applies to the instruction, data, or both
caches, whether the operation is a clean, invalidate, or both.
Also specify which point the operation applies to, i.e., to the
point of unification (PoU), coherency (PoC), or persistence
(PoP).

This commit applies the following sed transformation to all files
under arch/arm64:

"s/\b__flush_cache_range\b/caches_clean_inval_pou_macro/g;"\
"s/\b__flush_icache_range\b/caches_clean_inval_pou/g;"\
"s/\binvalidate_icache_range\b/icache_inval_pou/g;"\
"s/\b__flush_dcache_area\b/dcache_clean_inval_poc/g;"\
"s/\b__inval_dcache_area\b/dcache_inval_poc/g;"\
"s/__clean_dcache_area_poc\b/dcache_clean_poc/g;"\
"s/\b__clean_dcache_area_pop\b/dcache_clean_pop/g;"\
"s/\b__clean_dcache_area_pou\b/dcache_clean_pou/g;"\
"s/\b__flush_cache_user_range\b/caches_clean_inval_user_pou/g;"\
"s/\b__flush_icache_all\b/icache_inval_all_pou/g;"

Note that __clean_dcache_area_poc is deliberately missing a word
boundary check at the beginning in order to match the efistub
symbols in image-vars.h.

Also note that, despite its name, __flush_icache_range operates
on both instruction and data caches. The name change here
reflects that.

No functional change intended.

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20210524083001.2586635-19-tabba@google.com
Signed-off-by: Will Deacon <will@kernel.org>
2021-05-25 19:27:49 +01:00
Fuad Tabba
f749448edb arm64: __clean_dcache_area_pop to take end parameter instead of size
To be consistent with other functions with similar names and
functionality in cacheflush.h, cache.S, and cachetlb.rst, change
to specify the range in terms of start and end, as opposed to
start and size.

No functional change intended.

Reported-by: Will Deacon <will@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20210524083001.2586635-15-tabba@google.com
Signed-off-by: Will Deacon <will@kernel.org>
2021-05-25 19:27:49 +01:00
Will Deacon
8d9902055c arm64: lib: Annotate {clear, copy}_page() as position-independent
clear_page() and copy_page() are suitable for use outside of the kernel
address space, so annotate them as position-independent code.

Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210319100146.1149909-2-qperret@google.com
2021-03-19 12:01:19 +00:00
Andrey Konovalov
2cb3427642 arm64: kasan: simplify and inline MTE functions
This change provides a simpler implementation of mte_get_mem_tag(),
mte_get_random_tag(), and mte_set_mem_tag_range().

Simplifications include removing system_supports_mte() checks as these
functions are onlye called from KASAN runtime that had already checked
system_supports_mte().  Besides that, size and address alignment checks
are removed from mte_set_mem_tag_range(), as KASAN now does those.

This change also moves these functions into the asm/mte-kasan.h header and
implements mte_set_mem_tag_range() via inline assembly to avoid
unnecessary functions calls.

[vincenzo.frascino@arm.com: fix warning in mte_get_random_tag()]
  Link: https://lkml.kernel.org/r/20210211152208.23811-1-vincenzo.frascino@arm.com

Link: https://lkml.kernel.org/r/a26121b294fdf76e369cb7a74351d1c03a908930.1612546384.git.andreyknvl@google.com
Co-developed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Marco Elver <elver@google.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26 09:41:03 -08:00
Vincenzo Frascino
85f49cae4d arm64: mte: add in-kernel MTE helpers
Provide helper functions to manipulate allocation and pointer tags for
kernel addresses.

Low-level helper functions (mte_assign_*, written in assembly) operate tag
values from the [0x0, 0xF] range.  High-level helper functions
(mte_get/set_*) use the [0xF0, 0xFF] range to preserve compatibility with
normal kernel pointers that have 0xFF in their top byte.

MTE_GRANULE_SIZE and related definitions are moved to mte-def.h header
that doesn't have any dependencies and is safe to include into any
low-level header.

Link: https://lkml.kernel.org/r/c31bf759b4411b2d98cdd801eb928e241584fd1f.1606161801.git.andreyknvl@google.com
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Co-developed-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Marco Elver <elver@google.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-22 12:55:07 -08:00
Mark Rutland
7b90dc40e3 arm64: uaccess cleanup macro naming
Now the uaccess primitives use LDTR/STTR unconditionally, the
uao_{ldp,stp,user_alternative} asm macros are misnamed, and have a
redundant argument. Let's remove the redundant argument and rename these
to user_{ldp,stp,ldst} respectively to clean this up.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Robin Murohy <robin.murphy@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201202131558.39270-9-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-12-02 19:49:11 +00:00
Mark Rutland
9e94fdade4 arm64: uaccess: simplify __copy_user_flushcache()
Currently __copy_user_flushcache() open-codes raw_copy_from_user(), and
doesn't use uaccess_mask_ptr() on the user address. Let's have it call
raw_copy_from_user(), which is both a simplification and ensures that
user pointers are masked under speculation.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201202131558.39270-6-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-12-02 19:49:10 +00:00
Mark Rutland
e2a2190a80 arm64: uaccess: move uao_* alternatives to asm-uaccess.h
The uao_* alternative asm macros are only used by the uaccess assembly
routines in arch/arm64/lib/, where they are included indirectly via
asm-uaccess.h. Since they're specific to the uaccess assembly (and will
lose the alternatives in subsequent patches), let's move them into
asm-uaccess.h.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
[will: update #include in mte.S to pull in uao asm macros]
Signed-off-by: Will Deacon <will@kernel.org>
2020-11-09 21:49:34 +00:00
Fangrui Song
ec9d78070d arm64: Change .weak to SYM_FUNC_START_WEAK_PI for arch/arm64/lib/mem*.S
Commit 39d114ddc6 ("arm64: add KASAN support") added .weak directives to
arch/arm64/lib/mem*.S instead of changing the existing SYM_FUNC_START_PI
macros. This can lead to the assembly snippet `.weak memcpy ... .globl
memcpy` which will produce a STB_WEAK memcpy with GNU as but STB_GLOBAL
memcpy with LLVM's integrated assembler before LLVM 12. LLVM 12 (since
https://reviews.llvm.org/D90108) will error on such an overridden symbol
binding.

Use the appropriate SYM_FUNC_START_WEAK_PI instead.

Fixes: 39d114ddc6 ("arm64: add KASAN support")
Reported-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Fangrui Song <maskray@google.com>
Tested-by: Sami Tolvanen <samitolvanen@google.com>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20201029181951.1866093-1-maskray@google.com
Signed-off-by: Will Deacon <will@kernel.org>
2020-10-30 08:32:31 +00:00
Steven Price
36943aba91 arm64: mte: Enable swap of tagged pages
When swapping pages out to disk it is necessary to save any tags that
have been set, and restore when swapping back in. Make use of the new
page flag (PG_ARCH_2, locally named PG_mte_tagged) to identify pages
with tags. When swapping out these pages the tags are stored in memory
and later restored when the pages are brought back in. Because shmem can
swap pages back in without restoring the userspace PTE it is also
necessary to add a hook for shmem.

Signed-off-by: Steven Price <steven.price@arm.com>
[catalin.marinas@arm.com: move function prototypes to mte.h]
[catalin.marinas@arm.com: drop '_tags' from arch_swap_restore_tags()]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Will Deacon <will@kernel.org>
2020-09-04 12:46:07 +01:00
Catalin Marinas
18ddbaa02b arm64: mte: ptrace: Add PTRACE_{PEEK,POKE}MTETAGS support
Add support for bulk setting/getting of the MTE tags in a tracee's
address space at 'addr' in the ptrace() syscall prototype. 'data' points
to a struct iovec in the tracer's address space with iov_base
representing the address of a tracer's buffer of length iov_len. The
tags to be copied to/from the tracer's buffer are stored as one tag per
byte.

On successfully copying at least one tag, ptrace() returns 0 and updates
the tracer's iov_len with the number of tags copied. In case of error,
either -EIO or -EFAULT is returned, trying to follow the ptrace() man
page.

Note that the tag copying functions are not performance critical,
therefore they lack optimisations found in typical memory copy routines.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Alan Hayward <Alan.Hayward@arm.com>
Cc: Luis Machado <luis.machado@linaro.org>
Cc: Omair Javaid <omair.javaid@linaro.org>
2020-09-04 12:46:07 +01:00
Vincenzo Frascino
2563776b41 arm64: mte: Tags-aware copy_{user_,}highpage() implementations
When the Memory Tagging Extension is enabled, the tags need to be
preserved across page copy (e.g. for copy-on-write, page migration).

Introduce MTE-aware copy_{user_,}highpage() functions to copy tags to
the destination if the source page has the PG_mte_tagged flag set.
copy_user_page() does not need to handle tag copying since, with this
patch, it is only called by the DAX code where there is no source page
structure (and no source tags).

Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Co-developed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
2020-09-04 12:46:06 +01:00
Catalin Marinas
34bfeea4a9 arm64: mte: Clear the tags when a page is mapped in user-space with PROT_MTE
Pages allocated by the kernel are not guaranteed to have the tags
zeroed, especially as the kernel does not (yet) use MTE itself. To
ensure the user can still access such pages when mapped into its address
space, clear the tags via set_pte_at(). A new page flag - PG_mte_tagged
(PG_arch_2) - is used to track pages with valid allocation tags.

Since the zero page is mapped as pte_special(), it won't be covered by
the above set_pte_at() mechanism. Clear its tags during early MTE
initialisation.

Co-developed-by: Steven Price <steven.price@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
2020-09-04 12:46:06 +01:00
Linus Torvalds
4152d146ee Merge branch 'rwonce/rework' of git://git.kernel.org/pub/scm/linux/kernel/git/will/linux
Pull READ/WRITE_ONCE rework from Will Deacon:
 "This the READ_ONCE rework I've been working on for a while, which
  bumps the minimum GCC version and improves code-gen on arm64 when
  stack protector is enabled"

[ Side note: I'm _really_ tempted to raise the minimum gcc version to
  4.9, so that we can just say that we require _Generic() support.

  That would allow us to more cleanly handle a lot of the cases where we
  depend on very complex macros with 'sizeof' or __builtin_choose_expr()
  with __builtin_types_compatible_p() etc.

  This branch has a workaround for sparse not handling _Generic(),
  either, but that was already fixed in the sparse development branch,
  so it's really just gcc-4.9 that we'd require.   - Linus ]

* 'rwonce/rework' of git://git.kernel.org/pub/scm/linux/kernel/git/will/linux:
  compiler_types.h: Use unoptimized __unqual_scalar_typeof for sparse
  compiler_types.h: Optimize __unqual_scalar_typeof compilation time
  compiler.h: Enforce that READ_ONCE_NOCHECK() access size is sizeof(long)
  compiler-types.h: Include naked type in __pick_integer_type() match
  READ_ONCE: Fix comment describing 2x32-bit atomicity
  gcov: Remove old GCC 3.4 support
  arm64: barrier: Use '__unqual_scalar_typeof' for acquire/release macros
  locking/barriers: Use '__unqual_scalar_typeof' for load-acquire macros
  READ_ONCE: Drop pointer qualifiers when reading from scalar types
  READ_ONCE: Enforce atomicity for {READ,WRITE}_ONCE() memory accesses
  READ_ONCE: Simplify implementations of {READ,WRITE}_ONCE()
  arm64: csum: Disable KASAN for do_csum()
  fault_inject: Don't rely on "return value" from WRITE_ONCE()
  net: tls: Avoid assigning 'const' pointer to non-const pointer
  netfilter: Avoid assigning 'const' pointer to non-const pointer
  compiler/gcc: Raise minimum GCC version for kernel builds to 4.8
2020-06-10 14:46:54 -07:00
Catalin Marinas
ada66f1837 arm64: Reorder the macro arguments in the copy routines
The current argument order is obviously buggy (memcpy.S):

	macro strb1 ptr, regB, val
	strb \ptr, [\regB], \val
	endm

However, it cancels out as the calling sites in copy_template.S pass the
address as the regB argument.

Mechanically reorder the arguments to match the instruction mnemonics.
There is no difference in objdump before and after this patch.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20200429183702.28445-1-catalin.marinas@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2020-04-29 21:50:01 +01:00
Mark Brown
30218da597 arm64: lib: Consistently enable crc32 extension
Currently most of the assembly files that use architecture extensions
enable them using the .arch directive but crc32.S uses .cpu instead. Move
that over to .arch for consistency.

Signed-off-by: Mark Brown <broonie@kernel.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20200414182843.31664-1-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2020-04-28 14:36:32 +01:00
Will Deacon
c6a771d932 arm64: csum: Disable KASAN for do_csum()
do_csum() over-reads the source buffer and therefore abuses
READ_ONCE_NOCHECK() to avoid tripping up KASAN. In preparation for
READ_ONCE_NOCHECK() becoming a macro, and therefore losing its
'__no_sanitize_address' annotation, just annotate do_csum() explicitly
and fall back to normal loads.

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
2020-04-15 21:36:41 +01:00
韩科才
0c837c4f73 arm64: fix spelling mistake "ca not" -> "cannot"
There is a spelling mistake in the comment, Fix it.

Signed-off-by: hankecai <hankecai@bbktel.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-03-17 18:22:40 +00:00
Robin Murphy
e9c7ddbf8b arm64: csum: Optimise IPv6 header checksum
Throwing our __uint128_t idioms at csum_ipv6_magic() makes it
about 1.3x-2x faster across a range of microarchitecture/compiler
combinations. Not much in absolute terms, but every little helps.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-03-09 18:08:25 +00:00
Will Deacon
aa246c056c Merge branch 'for-next/asm-annotations' into for-next/core
* for-next/asm-annotations: (6 commits)
  arm64: kernel: Correct annotation of end of el0_sync
  ...
2020-01-22 11:34:21 +00:00
Will Deacon
4f6cdf296c Merge branches 'for-next/acpi', 'for-next/cpufeatures', 'for-next/csum', 'for-next/e0pd', 'for-next/entry', 'for-next/kbuild', 'for-next/kexec/cleanup', 'for-next/kexec/file-kdump', 'for-next/misc', 'for-next/nofpsimd', 'for-next/perf' and 'for-next/scs' into for-next/core
* for-next/acpi:
  ACPI/IORT: Fix 'Number of IDs' handling in iort_id_map()

* for-next/cpufeatures: (2 commits)
  arm64: Introduce ID_ISAR6 CPU register
  ...

* for-next/csum: (2 commits)
  arm64: csum: Fix pathological zero-length calls
  ...

* for-next/e0pd: (7 commits)
  arm64: kconfig: Fix alignment of E0PD help text
  ...

* for-next/entry: (5 commits)
  arm64: entry: cleanup sp_el0 manipulation
  ...

* for-next/kbuild: (4 commits)
  arm64: kbuild: remove compressed images on 'make ARCH=arm64 (dist)clean'
  ...

* for-next/kexec/cleanup: (11 commits)
  Revert "arm64: kexec: make dtb_mem always enabled"
  ...

* for-next/kexec/file-kdump: (2 commits)
  arm64: kexec_file: add crash dump support
  ...

* for-next/misc: (12 commits)
  arm64: entry: Avoid empty alternatives entries
  ...

* for-next/nofpsimd: (7 commits)
  arm64: nofpsmid: Handle TIF_FOREIGN_FPSTATE flag cleanly
  ...

* for-next/perf: (2 commits)
  perf/imx_ddr: Fix cpu hotplug state cleanup
  ...

* for-next/scs: (6 commits)
  arm64: kernel: avoid x18 in __cpu_soft_restart
  ...
2020-01-22 11:32:31 +00:00
Robin Murphy
c2c24edb1d arm64: csum: Fix pathological zero-length calls
In validating the checksumming results of the new routine, I sadly
neglected to test its not-checksumming results. Thus it slipped through
that the one case where @buff is already dword-aligned and @len = 0
manages to defeat the tail-masking logic and behave as if @len = 8.
For a zero length it doesn't make much sense to deference @buff anyway,
so just add an early return (which has essentially zero impact on
performance).

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
2020-01-17 16:05:50 +00:00
Ard Biesheuvel
7f153ccb9b arm64/lib: copy_page: avoid x18 register in assembler code
Register x18 will no longer be used as a caller save register in the
future, so stop using it in the copy_page() code.

Link: https://patchwork.kernel.org/patch/9836869/
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
[Sami: changed the offset and bias to be explicit]
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
2020-01-16 17:32:56 +00:00
Robin Murphy
5777eaed56 arm64: Implement optimised checksum routine
Apparently there exist certain workloads which rely heavily on software
checksumming, for which the generic do_csum() implementation becomes a
significant bottleneck. Therefore let's give arm64 its own optimised
version - for ease of maintenance this foregoes assembly or intrisics,
and is thus not actually arm64-specific, but does rely heavily on C
idioms that translate well to the A64 ISA and the typical load/store
capabilities of most ARMv8 CPU cores.

The resulting increase in checksum throughput scales nicely with buffer
size, tending towards 4x for a small in-order core (Cortex-A53), and up
to 6x or more for an aggressive big core (Ampere eMAG).

Reported-by: Lingyan Huang <huanglingyan2@huawei.com>
Tested-by: Lingyan Huang <huanglingyan2@huawei.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
2020-01-16 15:23:29 +00:00
Mark Brown
3ac0f4526d arm64: lib: Use modern annotations for assembly functions
In an effort to clarify and simplify the annotation of assembly functions
in the kernel new macros have been introduced. These replace ENTRY and
ENDPROC and also add a new annotation for static functions which previously
had no ENTRY equivalent. Update the annotations in the library code to the
new macros.

Signed-off-by: Mark Brown <broonie@kernel.org>
[will: Use SYM_FUNC_START_WEAK_PI]
Signed-off-by: Will Deacon <will@kernel.org>
2020-01-08 12:23:02 +00:00
Pavel Tatashin
e50be648aa arm64: uaccess: Remove uaccess_*_not_uao asm macros
It is safer and simpler to drop the uaccess assembly macros in favour of
inline C functions. Although this bloats the Image size slightly, it
aligns our user copy routines with '{get,put}_user()' and generally
makes the code a lot easier to reason about.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
[will: tweaked commit message and changed temporary variable names]
Signed-off-by: Will Deacon <will@kernel.org>
2019-11-20 18:51:54 +00:00
Pavel Tatashin
94bb804e1e arm64: uaccess: Ensure PAN is re-enabled after unhandled uaccess fault
A number of our uaccess routines ('__arch_clear_user()' and
'__arch_copy_{in,from,to}_user()') fail to re-enable PAN if they
encounter an unhandled fault whilst accessing userspace.

For CPUs implementing both hardware PAN and UAO, this bug has no effect
when both extensions are in use by the kernel.

For CPUs implementing hardware PAN but not UAO, this means that a kernel
using hardware PAN may execute portions of code with PAN inadvertently
disabled, opening us up to potential security vulnerabilities that rely
on userspace access from within the kernel which would usually be
prevented by this mechanism. In other words, parts of the kernel run the
same way as they would on a CPU without PAN implemented/emulated at all.

For CPUs not implementing hardware PAN and instead relying on software
emulation via 'CONFIG_ARM64_SW_TTBR0_PAN=y', the impact is unfortunately
much worse. Calling 'schedule()' with software PAN disabled means that
the next task will execute in the kernel using the page-table and ASID
of the previous process even after 'switch_mm()', since the actual
hardware switch is deferred until return to userspace. At this point, or
if there is a intermediate call to 'uaccess_enable()', the page-table
and ASID of the new process are installed. Sadly, due to the changes
introduced by KPTI, this is not an atomic operation and there is a very
small window (two instructions) where the CPU is configured with the
page-table of the old task and the ASID of the new task; a speculative
access in this state is disastrous because it would corrupt the TLB
entries for the new task with mappings from the previous address space.

As Pavel explains:

  | I was able to reproduce memory corruption problem on Broadcom's SoC
  | ARMv8-A like this:
  |
  | Enable software perf-events with PERF_SAMPLE_CALLCHAIN so userland's
  | stack is accessed and copied.
  |
  | The test program performed the following on every CPU and forking
  | many processes:
  |
  |	unsigned long *map = mmap(NULL, PAGE_SIZE, PROT_READ|PROT_WRITE,
  |				  MAP_SHARED | MAP_ANONYMOUS, -1, 0);
  |	map[0] = getpid();
  |	sched_yield();
  |	if (map[0] != getpid()) {
  |		fprintf(stderr, "Corruption detected!");
  |	}
  |	munmap(map, PAGE_SIZE);
  |
  | From time to time I was getting map[0] to contain pid for a
  | different process.

Ensure that PAN is re-enabled when returning after an unhandled user
fault from our uaccess routines.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Cc: <stable@vger.kernel.org>
Fixes: 338d4f49d6 ("arm64: kernel: Add support for Privileged Access Never")
Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
[will: rewrote commit message]
Signed-off-by: Will Deacon <will@kernel.org>
2019-11-20 18:51:47 +00:00
Will Deacon
61b7cddfe8 Merge branch 'for-next/atomics' into for-next/core
* for-next/atomics: (10 commits)
  Rework LSE instruction selection to use static keys instead of alternatives
2019-08-30 12:55:39 +01:00
Andrew Murray
eb3aabbfbf arm64: atomics: Remove atomic_ll_sc compilation unit
We no longer fall back to out-of-line atomics on systems with
CONFIG_ARM64_LSE_ATOMICS where ARM64_HAS_LSE_ATOMICS is not set.

Remove the unused compilation unit which provided these symbols.

Signed-off-by: Andrew Murray <andrew.murray@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
2019-08-29 15:53:49 +01:00
Leo Yan
42d038c4fb arm64: Add support for function error injection
Inspired by the commit 7cd01b08d3 ("powerpc: Add support for function
error injection"), this patch supports function error injection for
Arm64.

This patch mainly support two functions: one is regs_set_return_value()
which is used to overwrite the return value; the another function is
override_function_with_return() which is to override the probed
function returning and jump to its caller.

Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Will Deacon <will@kernel.org>
2019-08-07 13:53:09 +01:00
Thomas Gleixner
d2912cb15b treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500
Based on 2 normalized pattern(s):

  this program is free software you can redistribute it and or modify
  it under the terms of the gnu general public license version 2 as
  published by the free software foundation

  this program is free software you can redistribute it and or modify
  it under the terms of the gnu general public license version 2 as
  published by the free software foundation #

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-only

has been chosen to replace the boilerplate/reference in 4122 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Enrico Weigelt <info@metux.net>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-19 17:09:55 +02:00
Thomas Gleixner
caab277b1d treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 234
Based on 1 normalized pattern(s):

  this program is free software you can redistribute it and or modify
  it under the terms of the gnu general public license version 2 as
  published by the free software foundation this program is
  distributed in the hope that it will be useful but without any
  warranty without even the implied warranty of merchantability or
  fitness for a particular purpose see the gnu general public license
  for more details you should have received a copy of the gnu general
  public license along with this program if not see http www gnu org
  licenses

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-only

has been chosen to replace the boilerplate/reference in 503 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Enrico Weigelt <info@metux.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190602204653.811534538@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-19 17:09:07 +02:00
Torsten Duwe
edf072d36d arm64: Makefile: Replace -pg with CC_FLAGS_FTRACE
In preparation for arm64 supporting ftrace built on other compiler
options, let's have the arm64 Makefiles remove the $(CC_FLAGS_FTRACE)
flags, whatever these may be, rather than assuming '-pg'.

There should be no functional change as a result of this patch.

Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Torsten Duwe <duwe@suse.de>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-04-09 10:34:59 +01:00
Mark Rutland
ac0e8c72b0 arm64: string: use asm EXPORT_SYMBOL()
For a while now it's been possible to use EXPORT_SYMBOL() in assembly
files, which allows us to place exports immediately after assembly
functions, as we do for C functions.

As a step towards removing arm64ksyms.c, let's move the string routine
exports to the assembly files the functions are defined in. Routines
which should only be exported for !KASAN builds are exported using the
EXPORT_SYMBOL_NOKASAN() helper.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-10 11:50:12 +00:00
Mark Rutland
56c08ec516 arm64: uaccess: use asm EXPORT_SYMBOL()
For a while now it's been possible to use EXPORT_SYMBOL() in assembly
files, which allows us to place exports immediately after assembly
functions, as we do for C functions.

As a step towards removing arm64ksyms.c, let's move the uaccess exports
to the assembly files the functions are defined in.  As we have to
include <asm/assembler.h>, the existing includes are fixed to follow the
usual ordering conventions.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-10 11:50:11 +00:00
Mark Rutland
50fdecb292 arm64: page: use asm EXPORT_SYMBOL()
For a while now it's been possible to use EXPORT_SYMBOL() in assembly
files, which allows us to place exports immediately after assembly
functions, as we do for C functions.

As a step towards removing arm64ksyms.c, let's move the copy_page and
clear_page exports to the assembly files the functions are defined in.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-10 11:50:11 +00:00
Mark Rutland
abb77f3d96 arm64: tishift: use asm EXPORT_SYMBOL()
For a while now it's been possible to use EXPORT_SYMBOL() in assembly
files, which allows us to place exports immediately after assembly
functions, as we do for C functions.

As a step towards removing arm64ksyms.c, let's move the tishift exports
to the assembly file the functions are defined in.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-10 11:50:11 +00:00
Jackie Liu
cc9f8349cb arm64: crypto: add NEON accelerated XOR implementation
This is a NEON acceleration method that can improve
performance by approximately 20%. I got the following
data from the centos 7.5 on Huawei's HISI1616 chip:

[ 93.837726] xor: measuring software checksum speed
[ 93.874039]   8regs  : 7123.200 MB/sec
[ 93.914038]   32regs : 7180.300 MB/sec
[ 93.954043]   arm64_neon: 9856.000 MB/sec
[ 93.954047] xor: using function: arm64_neon (9856.000 MB/sec)

I believe this code can bring some optimization for
all arm64 platform. thanks for Ard Biesheuvel's suggestions.

Signed-off-by: Jackie Liu <liuyun01@kylinos.cn>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-06 16:47:06 +00:00
Ard Biesheuvel
efdb25efc7 arm64/lib: improve CRC32 performance for deep pipelines
Improve the performance of the crc32() asm routines by getting rid of
most of the branches and small sized loads on the common path.

Instead, use a branchless code path involving overlapping 16 byte
loads to process the first (length % 32) bytes, and process the
remainder using a loop that processes 32 bytes at a time.

Tested using the following test program:

  #include <stdlib.h>

  extern void crc32_le(unsigned short, char const*, int);

  int main(void)
  {
    static const char buf[4096];

    srand(20181126);

    for (int i = 0; i < 100 * 1000 * 1000; i++)
      crc32_le(0, buf, rand() % 1024);

    return 0;
  }

On Cortex-A53 and Cortex-A57, the performance regresses but only very
slightly. On Cortex-A72 however, the performance improves from

  $ time ./crc32

  real  0m10.149s
  user  0m10.149s
  sys   0m0.000s

to

  $ time ./crc32

  real  0m7.915s
  user  0m7.915s
  sys   0m0.000s

Cc: Rui Sun <sunrui26@huawei.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-11-30 13:58:04 +00:00
Andrey Ryabinin
19a2ca0fb5 arm64: lib: use C string functions with KASAN enabled
ARM64 has asm implementation of memchr(), memcmp(), str[r]chr(),
str[n]cmp(), str[n]len().  KASAN don't see memory accesses in asm code,
thus it can potentially miss many bugs.

Ifdef out __HAVE_ARCH_* defines of these functions when KASAN is enabled,
so the generic implementations from lib/string.c will be used.

We can't just remove the asm functions because efistub uses them.  And we
can't have two non-weak functions either, so declare the asm functions as
weak.

Link: http://lkml.kernel.org/r/20180920135631.23833-2-aryabinin@virtuozzo.com
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reported-by: Kyeongdon Kim <kyeongdon.kim@lge.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-26 16:25:18 -07:00
Tri Vo
2a6c7c367d arm64: lse: remove -fcall-used-x0 flag
x0 is not callee-saved in the PCS. So there is no need to specify
-fcall-used-x0.

Clang doesn't currently support -fcall-used flags. This patch will help
building the kernel with clang.

Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Tri Vo <trong@android.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-09-24 10:56:24 +01:00
Ard Biesheuvel
7481cddf29 arm64/lib: add accelerated crc32 routines
Unlike crc32c(), which is wired up to the crypto API internally so the
optimal driver is selected based on the platform's capabilities,
crc32_le() is implemented as a library function using a slice-by-8 table
based C implementation. Even though few of the call sites may be
bottlenecks, calling a time variant implementation with a non-negligible
D-cache footprint is a bit of a waste, given that ARMv8.1 and up mandates
support for the CRC32 instructions that were optional in ARMv8.0, but are
already widely available, even on the Cortex-A53 based Raspberry Pi.

So implement routines that use these instructions if available, and fall
back to the existing generic routines otherwise. The selection is based
on alternatives patching.

Note that this unconditionally selects CONFIG_CRC32 as a builtin. Since
CRC32 is relied upon by core functionality such as CONFIG_OF_FLATTREE,
this just codifies the status quo.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-09-10 16:10:53 +01:00
Will Deacon
7c8fc35dfc locking/atomics/arm64: Replace our atomic/lock bitop implementations with asm-generic
The <asm-generic/bitops/{atomic,lock}.h> implementations are built around
the atomic-fetch ops, which we implement efficiently for both LSE and
LL/SC systems. Use that instead of our hand-rolled, out-of-line bitops.S.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arm-kernel@lists.infradead.org
Cc: yamada.masahiro@socionext.com
Link: https://lore.kernel.org/lkml/1529412794-17720-9-git-send-email-will.deacon@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-21 12:52:12 +02:00
Jason A. Donenfeld
255845fc43 arm64: export tishift functions to modules
Otherwise modules that use these arithmetic operations will fail to
link. We accomplish this with the usual EXPORT_SYMBOL, which on most
architectures goes in the .S file but the ARM64 maintainers prefer that
insead it goes into arm64ksyms.

While we're at it, we also fix this up to use SPDX, and I personally
choose to relicense this as GPL2||BSD so that these symbols don't need
to be export_symbol_gpl, so all modules can use the routines, since
these are important general purpose compiler-generated function calls.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reported-by: PaX Team <pageexec@freemail.hu>
Cc: stable@vger.kernel.org
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-05-21 19:00:48 +01:00
Mark Rutland
3789c122d0 arm64: avoid instrumenting atomic_ll_sc.o
Our out-of-line atomics are built with a special calling convention,
preventing pointless stack spilling, and allowing us to patch call sites
with ARMv8.1 atomic instructions.

Instrumentation inserted by the compiler may result in calls to
functions not following this special calling convention, resulting in
registers being unexpectedly clobbered, and various problems resulting
from this.

For example, if a kernel is built with KCOV and ARM64_LSE_ATOMICS, the
compiler inserts calls to __sanitizer_cov_trace_pc in the prologues of
the atomic functions. This has been observed to result in spurious
cmpxchg failures, leading to a hang early on in the boot process.

This patch avoids such issues by preventing instrumentation of our
out-of-line atomics.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-04-27 12:14:44 +01:00
Linus Torvalds
9c2dd8405c DeviceTree updates for 4.17:
- Sync dtc to upstream version v1.4.6-9-gaadd0b65c987. This adds a bunch
   more warnings (hidden behind W=1).
 
 - Build dtc lexer and parser files instead of using shipped versions.
 
 - Rework overlay apply API to take an FDT as input and apply overlays in
   a single step.
 
 - Add a phandle lookup cache. This improves boot time by hundreds of
   msec on systems with large DT.
 
 - Add trivial mcp4017/18/19 potentiometers bindings.
 
 - Remove VLA stack usage in DT code.
 -----BEGIN PGP SIGNATURE-----
 
 iQItBAABCAAXBQJaxiUdEBxyb2JoQGtlcm5lbC5vcmcACgkQ+vtdtY28YcM0+w/+
 L7nkug1Hz2476eRrsn5bm6oOO0vCrhQcDTJ/AlvU1YO8XBVgGEetLDs8drmvD0/O
 FQDcpumX6G0eFoHTnTNWD7keM+0nY5jZBIAqKQNa9a0HKkjYc4HO5Ot9E02XG8W8
 759vvCcGeJpysoCls9u8OplzqiDyNVQJd1a0fLivtafdKypuE/Ywh15wrzckPO+F
 bxqWQd+uwm98ZVz8/o3vfYtAOJmA06A+hsyVLXYu7iKQcXYVxi+ZNbRV44MQ50NI
 1w5m8GgtWe4A2lpXjmeXk1VmLPO3eEgQKnBoH7gcJmCHaVg/SVfMgBscuGSQZRQa
 rQvaYRUNGJ0Mtji8EZpZb5Vip4ZCDtZCQBB3snN24CvGXI6WuIIg/8ncXt0AfLqn
 pxFmC32ZcwvJR2NCpPVfTgILm6foT9IzJWKl6SQLVtqqVp9nPFua7T3l8AQak7FB
 2MMaaqh7L0l0za0ZgArZZo/IWUHRb0MwZdXAkqBZlQ6f3IBqGQeKCnkclAeH8qYr
 OorCOmC2OlKXLPHoz8XHeBzPRdnv1dQ//gEkKXBJ2igLU03hRWv9dxnGju/45sun
 Ifo79uBAUc9s3F4Kjd/zs2iLztuPrYCSICHtJh9LPeOxoV1ZUNt+6Cm23yQ014Uo
 /GsFW+lzh7c9wB1eETjPHd1WuYXiSrmE4zvbdykyLCk=
 =ZWpa
 -----END PGP SIGNATURE-----

Merge tag 'devicetree-for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux

Pull DeviceTree updates from Rob Herring:

 - Sync dtc to upstream version v1.4.6-9-gaadd0b65c987. This adds a
   bunch more warnings (hidden behind W=1).

 - Build dtc lexer and parser files instead of using shipped versions.

 - Rework overlay apply API to take an FDT as input and apply overlays
   in a single step.

 - Add a phandle lookup cache. This improves boot time by hundreds of
   msec on systems with large DT.

 - Add trivial mcp4017/18/19 potentiometers bindings.

 - Remove VLA stack usage in DT code.

* tag 'devicetree-for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (26 commits)
  of: unittest: fix an error code in of_unittest_apply_overlay()
  of: unittest: move misplaced function declaration
  of: unittest: Remove VLA stack usage
  of: overlay: Fix forgotten reference to of_overlay_apply()
  of: Documentation: Fix forgotten reference to of_overlay_apply()
  of: unittest: local return value variable related cleanups
  of: unittest: remove unneeded local return value variables
  dt-bindings: trivial: add various mcp4017/18/19 potentiometers
  of: unittest: fix an error test in of_unittest_overlay_8()
  of: cache phandle nodes to reduce cost of of_find_node_by_phandle()
  dt-bindings: rockchip-dw-mshc: use consistent clock names
  MAINTAINERS: Add linux/of_*.h headers to appropriate subsystems
  scripts: turn off some new dtc warnings by default
  scripts/dtc: Update to upstream version v1.4.6-9-gaadd0b65c987
  scripts/dtc: generate lexer and parser during build instead of shipping
  powerpc: boot: add strrchr function
  of: overlay: do not include path in full_name of added nodes
  of: unittest: clean up changeset test
  arm64/efi: Make strrchr() available to the EFI namespace
  ARM: boot: add strrchr function
  ...
2018-04-05 21:03:42 -07:00
Will Deacon
6b24442d68 arm64: lse: Pass -fomit-frame-pointer to out-of-line ll/sc atomics
In cases where x30 is used as a temporary in the out-of-line ll/sc atomics
(e.g. atomic_fetch_add), the compiler tends to put out a full stackframe,
which included pointing the x29 at the new frame.

Since these things aren't traceable anyway, we can pass -fomit-frame-pointer
to reduce the work when spilling. Since this is incompatible with -pg, we
also remove that from the CFLAGS for this file.

Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-03-06 18:52:32 +00:00
Rob Herring
fdfb69a725 arm64/efi: Make strrchr() available to the EFI namespace
libfdt gained a new dependency on strrchr, so make it available to the
EFI namespace before we update libfdt.

Thanks to Ard for providing this fix.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Rob Herring <robh@kernel.org>
2018-03-05 13:45:38 -06:00
Will Deacon
f71c2ffcb2 arm64: uaccess: Mask __user pointers for __arch_{clear, copy_*}_user
Like we've done for get_user and put_user, ensure that user pointers
are masked before invoking the underlying __arch_{clear,copy_*}_user
operations.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-02-06 22:53:40 +00:00
Catalin Marinas
6b88a32c7a arm64: kpti: Fix the interaction between ASID switching and software PAN
With ARM64_SW_TTBR0_PAN enabled, the exception entry code checks the
active ASID to decide whether user access was enabled (non-zero ASID)
when the exception was taken. On return from exception, if user access
was previously disabled, it re-instates TTBR0_EL1 from the per-thread
saved value (updated in switch_mm() or efi_set_pgd()).

Commit 7655abb953 ("arm64: mm: Move ASID from TTBR0 to TTBR1") makes a
TTBR0_EL1 + ASID switching non-atomic. Subsequently, commit 27a921e757
("arm64: mm: Fix and re-enable ARM64_SW_TTBR0_PAN") changes the
__uaccess_ttbr0_disable() function and asm macro to first write the
reserved TTBR0_EL1 followed by the ASID=0 update in TTBR1_EL1. If an
exception occurs between these two, the exception return code will
re-instate a valid TTBR0_EL1. Similar scenario can happen in
cpu_switch_mm() between setting the reserved TTBR0_EL1 and the ASID
update in cpu_do_switch_mm().

This patch reverts the entry.S check for ASID == 0 to TTBR0_EL1 and
disables the interrupts around the TTBR0_EL1 and ASID switching code in
__uaccess_ttbr0_disable(). It also ensures that, when returning from the
EFI runtime services, efi_set_pgd() doesn't leave a non-zero ASID in
TTBR1_EL1 by using uaccess_ttbr0_{enable,disable}.

The accesses to current_thread_info()->ttbr0 are updated to use
READ_ONCE/WRITE_ONCE.

As a safety measure, __uaccess_ttbr0_enable() always masks out any
existing non-zero ASID TTBR1_EL1 before writing in the new ASID.

Fixes: 27a921e757 ("arm64: mm: Fix and re-enable ARM64_SW_TTBR0_PAN")
Acked-by: Will Deacon <will.deacon@arm.com>
Reported-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: James Morse <james.morse@arm.com>
Tested-by: James Morse <james.morse@arm.com>
Co-developed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-01-16 17:37:48 +00:00
Jason A. Donenfeld
f5ed22e264 arm64: make label allocation style consistent in tishift
This is entirely cosmetic, but somehow it was missed when sending
differing versions of this patch. This just makes the file a bit more
uniform.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-01-02 14:22:18 +00:00
Will Deacon
27a921e757 arm64: mm: Fix and re-enable ARM64_SW_TTBR0_PAN
With the ASID now installed in TTBR1, we can re-enable ARM64_SW_TTBR0_PAN
by ensuring that we switch to a reserved ASID of zero when disabling
user access and restore the active user ASID on the uaccess enable path.

Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Laura Abbott <labbott@redhat.com>
Tested-by: Shanker Donthineni <shankerd@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-12-11 13:40:35 +00:00
Linus Torvalds
c9b012e5f4 arm64 updates for 4.15
Plenty of acronym soup here:
 
 - Initial support for the Scalable Vector Extension (SVE)
 - Improved handling for SError interrupts (required to handle RAS events)
 - Enable GCC support for 128-bit integer types
 - Remove kernel text addresses from backtraces and register dumps
 - Use of WFE to implement long delay()s
 - ACPI IORT updates from Lorenzo Pieralisi
 - Perf PMU driver for the Statistical Profiling Extension (SPE)
 - Perf PMU driver for Hisilicon's system PMUs
 - Misc cleanups and non-critical fixes
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABCgAGBQJaCcLqAAoJELescNyEwWM0JREH/2FbmD/khGzEtP8LW+o9D8iV
 TBM02uWQxS1bbO1pV2vb+512YQO+iWfeQwJH9Jv2FZcrMvFv7uGRnYgAnJuXNGrl
 W+LL6OhN22A24LSawC437RU3Xe7GqrtONIY/yLeJBPablfcDGzPK1eHRA0pUzcyX
 VlyDruSHWX44VGBPV6JRd3x0vxpV8syeKOjbRvopRfn3Nwkbd76V3YSfEgwoTG5W
 ET1sOnXLmHHdeifn/l1Am5FX1FYstpcd7usUTJ4Oto8y7e09tw3bGJCD0aMJ3vow
 v1pCUWohEw7fHqoPc9rTrc1QEnkdML4vjJvMPUzwyTfPrN+7uEuMIEeJierW+qE=
 =0qrg
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Will Deacon:
 "The big highlight is support for the Scalable Vector Extension (SVE)
  which required extensive ABI work to ensure we don't break existing
  applications by blowing away their signal stack with the rather large
  new vector context (<= 2 kbit per vector register). There's further
  work to be done optimising things like exception return, but the ABI
  is solid now.

  Much of the line count comes from some new PMU drivers we have, but
  they're pretty self-contained and I suspect we'll have more of them in
  future.

  Plenty of acronym soup here:

   - initial support for the Scalable Vector Extension (SVE)

   - improved handling for SError interrupts (required to handle RAS
     events)

   - enable GCC support for 128-bit integer types

   - remove kernel text addresses from backtraces and register dumps

   - use of WFE to implement long delay()s

   - ACPI IORT updates from Lorenzo Pieralisi

   - perf PMU driver for the Statistical Profiling Extension (SPE)

   - perf PMU driver for Hisilicon's system PMUs

   - misc cleanups and non-critical fixes"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (97 commits)
  arm64: Make ARMV8_DEPRECATED depend on SYSCTL
  arm64: Implement __lshrti3 library function
  arm64: support __int128 on gcc 5+
  arm64/sve: Add documentation
  arm64/sve: Detect SVE and activate runtime support
  arm64/sve: KVM: Hide SVE from CPU features exposed to guests
  arm64/sve: KVM: Treat guest SVE use as undefined instruction execution
  arm64/sve: KVM: Prevent guests from using SVE
  arm64/sve: Add sysctl to set the default vector length for new processes
  arm64/sve: Add prctl controls for userspace vector length management
  arm64/sve: ptrace and ELF coredump support
  arm64/sve: Preserve SVE registers around EFI runtime service calls
  arm64/sve: Preserve SVE registers around kernel-mode NEON use
  arm64/sve: Probe SVE capabilities and usable vector lengths
  arm64: cpufeature: Move sys_caps_initialised declarations
  arm64/sve: Backend logic for setting the vector length
  arm64/sve: Signal handling support
  arm64/sve: Support vector length resetting for new processes
  arm64/sve: Core task context handling
  arm64/sve: Low-level CPU setup
  ...
2017-11-15 10:56:56 -08:00
Jason A. Donenfeld
9bfe7553fa arm64: Implement __lshrti3 library function
Commit fb8722735f ("arm64: support __int128 on gcc 5+") added support
for the __int128 data type, but this breaks the build in some configurations
where GCC ends up emitting calls to the __lshrti3 helper in libgcc, which
results in a link error:

  kernel/sched/fair.o: In function `__calc_delta':
  fair.c:(.text+0xca0): undefined reference to `__lshrti3'
  kernel/time/timekeeping.o: In function `timekeeping_resume':
  timekeeping.c:(.text+0x3f60): undefined reference to `__lshrti3'
  make: *** [vmlinux] Error 1

Fix the build by providing an implementation of __lshrti3, like we do
already for __ashlti3 and __ashrti3.

Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-11-13 16:05:50 +00:00
Jason A. Donenfeld
fb8722735f arm64: support __int128 on gcc 5+
Versions of gcc prior to gcc 5 emitted a __multi3 function call when
dealing with TI types, resulting in failures when trying to link to
libgcc, and more generally, bad performance. However, since gcc 5,
the compiler supports actually emitting fast instructions, which means
we can at long last enable this option and receive the speedups.

The gcc commit that added proper Aarch64 support is:
https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=d1ae7bb994f49316f6f63e6173f2931e837a351d
This commit appears to be part of the gcc 5 release.

There are still a few instructions, __ashlti3 and __ashrti3, which
require libgcc, which is fine. Rather than linking to libgcc, we
simply provide them ourselves, since they're not that complicated.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-11-03 15:24:21 +00:00
Greg Kroah-Hartman
b24413180f License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier.  The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
 - file had no licensing information it it.
 - file was a */uapi/* one with no licensing information in it,
 - file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne.  Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed.  Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
 - Files considered eligible had to be source code files.
 - Make and config files were included as candidates if they contained >5
   lines of source
 - File already had some variant of a license header in it (even if <5
   lines).

All documentation files were explicitly excluded.

The following heuristics were used to determine which SPDX license
identifiers to apply.

 - when both scanners couldn't find any license traces, file was
   considered to have no license information in it, and the top level
   COPYING file license applied.

   For non */uapi/* files that summary was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0                                              11139

   and resulted in the first patch in this series.

   If that file was a */uapi/* path one, it was "GPL-2.0 WITH
   Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0 WITH Linux-syscall-note                        930

   and resulted in the second patch in this series.

 - if a file had some form of licensing information in it, and was one
   of the */uapi/* ones, it was denoted with the Linux-syscall-note if
   any GPL family license was found in the file or had no licensing in
   it (per prior point).  Results summary:

   SPDX license identifier                            # files
   ---------------------------------------------------|------
   GPL-2.0 WITH Linux-syscall-note                       270
   GPL-2.0+ WITH Linux-syscall-note                      169
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
   LGPL-2.1+ WITH Linux-syscall-note                      15
   GPL-1.0+ WITH Linux-syscall-note                       14
   ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
   LGPL-2.0+ WITH Linux-syscall-note                       4
   LGPL-2.1 WITH Linux-syscall-note                        3
   ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
   ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1

   and that resulted in the third patch in this series.

 - when the two scanners agreed on the detected license(s), that became
   the concluded license(s).

 - when there was disagreement between the two scanners (one detected a
   license but the other didn't, or they both detected different
   licenses) a manual inspection of the file occurred.

 - In most cases a manual inspection of the information in the file
   resulted in a clear resolution of the license that should apply (and
   which scanner probably needed to revisit its heuristics).

 - When it was not immediately clear, the license identifier was
   confirmed with lawyers working with the Linux Foundation.

 - If there was any question as to the appropriate license identifier,
   the file was flagged for further research and to be revisited later
   in time.

In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.

Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights.  The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.

Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.

In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.

Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
 - a full scancode scan run, collecting the matched texts, detected
   license ids and scores
 - reviewing anything where there was a license detected (about 500+
   files) to ensure that the applied SPDX license was correct
 - reviewing anything where there was no detection but the patch license
   was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
   SPDX license was correct

This produced a worksheet with 20 files needing minor correction.  This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.

These .csv files were then reviewed by Greg.  Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected.  This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.)  Finally Greg ran the script using the .csv files to
generate the patches.

Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-02 11:10:55 +01:00
Julien Thierry
7b77452ec5 arm64: use WFE for long delays
The current delay implementation uses the yield instruction, which is a
hint that it is beneficial to schedule another thread. As this is a hint,
it may be implemented as a NOP, causing all delays to be busy loops. This
is the case for many existing CPUs.

Taking advantage of the generic timer sending periodic events to all
cores, we can use WFE during delays to reduce power consumption. This is
beneficial only for delays longer than the period of the timer event
stream.

If timer event stream is not enabled, delays will behave as yield/busy
loops.

Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-10-13 18:56:15 +01:00
Robin Murphy
21cfa0e96d arm64: uaccess: Add the uaccess_flushcache.c file
The uaccess_flushcache.c file was inadvertently dropped by the
maintainer in a previous commit. Add it back.

Fixes: 5d7bdeb1ee ("arm64: uaccess: Implement *_flushcache variants")
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-08-10 10:49:21 +01:00
Robin Murphy
5d7bdeb1ee arm64: uaccess: Implement *_flushcache variants
Implement the set of copy functions with guarantees of a clean cache
upon completion necessary to support the pmem driver.

Reviewed-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-08-09 12:16:26 +01:00
Ard Biesheuvel
288be97cc7 arm64/lib: copy_page: use consistent prefetch stride
The optional prefetch instructions in the copy_page() routine are
inconsistent: at the start of the function, two cachelines are
prefetched beyond the one being loaded in the first iteration, but
in the loop, the prefetch is one more line ahead. This appears to
be unintentional, so let's fix it.

While at it, fix the comment style and white space.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-07-25 10:04:42 +01:00
Al Viro
92430dab36 arm64: switch to RAW_COPY_USER
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-03-28 18:23:24 -04:00
Masahiro Yamada
9a284e5c9e scripts/spelling.txt: add "overwritting" pattern and fix typo instances
Fix typos and add the following to the scripts/spelling.txt:

  overwritting||overwriting

Link: http://lkml.kernel.org/r/1481573103-11329-29-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-27 18:43:47 -08:00
Al Viro
b4b8664d29 arm64: don't pull uaccess.h into *.S
Split asm-only parts of arm64 uaccess.h into a new header and use that
from *.S.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-12-26 13:05:17 -05:00
Linus Torvalds
7c0f6ba682 Replace <asm/uaccess.h> with <linux/uaccess.h> globally
This was entirely automated, using the script by Al:

  PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>'
  sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \
        $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)

to do the replacement at the end of the merge window.

Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-24 11:46:01 -08:00
Catalin Marinas
bd38967d40 arm64: Factor out PAN enabling/disabling into separate uaccess_* macros
This patch moves the directly coded alternatives for turning PAN on/off
into separate uaccess_{enable,disable} macros or functions. The asm
macros take a few arguments which will be used in subsequent patches.

Note that any (unlikely) access that the compiler might generate between
uaccess_enable() and uaccess_disable(), other than those explicitly
specified by the user access code, will not be protected by PAN.

Cc: Will Deacon <will.deacon@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Kees Cook <keescook@chromium.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-11-21 17:33:47 +00:00
Al Viro
2692a71bbd Merge branch 'work.uaccess' into for-linus 2016-10-14 20:42:44 -04:00
Al Viro
4855bd255f arm64: don't zero in __copy_from_user{,_inatomic}
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-09-15 19:51:56 -04:00
Mark Rutland
6ba3b554f5 arm64: use alternative auto-nop
Make use of the new alternative_if and alternative_else_nop_endif and
get rid of our homebew NOP sleds, making the code simpler to read.

Note that for cpu_do_switch_mm the ret has been moved out of the
alternative sequence, and in the default case there will be three
additional NOPs executed.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-09-12 10:46:07 +01:00
Yang Shi
bffe1baff5 arm64: kasan: instrument user memory access API
The upstream commit 1771c6e1a5
("x86/kasan: instrument user memory access API") added KASAN instrument to
x86 user memory access API, so added such instrument to ARM64 too.

Define __copy_to/from_user in C in order to add kasan_check_read/write call,
rename assembly implementation to __arch_copy_to/from_user.

Tested by test_kasan module.

Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Yang Shi <yang.shi@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-06-21 15:37:18 +01:00
Linus Torvalds
588ab3f9af arm64 updates for 4.6:
- Initial page table creation reworked to avoid breaking large block
   mappings (huge pages) into smaller ones. The ARM architecture requires
   break-before-make in such cases to avoid TLB conflicts but that's not
   always possible on live page tables
 
 - Kernel virtual memory layout: the kernel image is no longer linked to
   the bottom of the linear mapping (PAGE_OFFSET) but at the bottom of
   the vmalloc space, allowing the kernel to be loaded (nearly) anywhere
   in physical RAM
 
 - Kernel ASLR: position independent kernel Image and modules being
   randomly mapped in the vmalloc space with the randomness is provided
   by UEFI (efi_get_random_bytes() patches merged via the arm64 tree,
   acked by Matt Fleming)
 
 - Implement relative exception tables for arm64, required by KASLR
   (initial code for ARCH_HAS_RELATIVE_EXTABLE added to lib/extable.c but
   actual x86 conversion to deferred to 4.7 because of the merge
   dependencies)
 
 - Support for the User Access Override feature of ARMv8.2: this allows
   uaccess functions (get_user etc.) to be implemented using LDTR/STTR
   instructions. Such instructions, when run by the kernel, perform
   unprivileged accesses adding an extra level of protection. The
   set_fs() macro is used to "upgrade" such instruction to privileged
   accesses via the UAO bit
 
 - Half-precision floating point support (part of ARMv8.2)
 
 - Optimisations for CPUs with or without a hardware prefetcher (using
   run-time code patching)
 
 - copy_page performance improvement to deal with 128 bytes at a time
 
 - Sanity checks on the CPU capabilities (via CPUID) to prevent
   incompatible secondary CPUs from being brought up (e.g. weird
   big.LITTLE configurations)
 
 - valid_user_regs() reworked for better sanity check of the sigcontext
   information (restored pstate information)
 
 - ACPI parking protocol implementation
 
 - CONFIG_DEBUG_RODATA enabled by default
 
 - VDSO code marked as read-only
 
 - DEBUG_PAGEALLOC support
 
 - ARCH_HAS_UBSAN_SANITIZE_ALL enabled
 
 - Erratum workaround Cavium ThunderX SoC
 
 - set_pte_at() fix for PROT_NONE mappings
 
 - Code clean-ups
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJW6u95AAoJEGvWsS0AyF7xMyoP/3x2O6bgreSQ84BdO4JChN4+
 RQ9OVdX8u2ItO9sgaCY2AA6KoiBuEjGmPl/XRuK0I7DpODTtRjEXQHuNNhz8AelC
 hn4AEVqamY6Z5BzHFIjs8G9ydEbq+OXcKWEdwSsBhP/cMvI7ss3dps1f5iNPT5Vv
 50E/kUz+aWYy7pKlB18VDV7TUOA3SuYuGknWV8+bOY5uPb8hNT3Y3fHOg/EuNNN3
 DIuYH1V7XQkXtF+oNVIGxzzJCXULBE7egMcWAm1ydSOHK0JwkZAiL7OhI7ceVD0x
 YlDxBnqmi4cgzfBzTxITAhn3OParwN6udQprdF1WGtFF6fuY2eRDSH/L/iZoE4DY
 OulL951OsBtF8YC3+RKLk908/0bA2Uw8ftjCOFJTYbSnZBj1gWK41VkCYMEXiHQk
 EaN8+2Iw206iYIoyvdjGCLw7Y0oakDoVD9vmv12SOaHeQljTkjoN8oIlfjjKTeP7
 3AXj5v9BDMDVh40nkVayysRNvqe48Kwt9Wn0rhVTLxwdJEiFG/OIU6HLuTkretdN
 dcCNFSQrRieSFHpBK9G0vKIpIss1ZwLm8gjocVXH7VK4Mo/TNQe4p2/wAF29mq4r
 xu1UiXmtU3uWxiqZnt72LOYFCarQ0sFA5+pMEvF5W+NrVB0wGpXhcwm+pGsIi4IM
 LepccTgykiUBqW5TRzPz
 =/oS+
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Catalin Marinas:
 "Here are the main arm64 updates for 4.6.  There are some relatively
  intrusive changes to support KASLR, the reworking of the kernel
  virtual memory layout and initial page table creation.

  Summary:

   - Initial page table creation reworked to avoid breaking large block
     mappings (huge pages) into smaller ones.  The ARM architecture
     requires break-before-make in such cases to avoid TLB conflicts but
     that's not always possible on live page tables

   - Kernel virtual memory layout: the kernel image is no longer linked
     to the bottom of the linear mapping (PAGE_OFFSET) but at the bottom
     of the vmalloc space, allowing the kernel to be loaded (nearly)
     anywhere in physical RAM

   - Kernel ASLR: position independent kernel Image and modules being
     randomly mapped in the vmalloc space with the randomness is
     provided by UEFI (efi_get_random_bytes() patches merged via the
     arm64 tree, acked by Matt Fleming)

   - Implement relative exception tables for arm64, required by KASLR
     (initial code for ARCH_HAS_RELATIVE_EXTABLE added to lib/extable.c
     but actual x86 conversion to deferred to 4.7 because of the merge
     dependencies)

   - Support for the User Access Override feature of ARMv8.2: this
     allows uaccess functions (get_user etc.) to be implemented using
     LDTR/STTR instructions.  Such instructions, when run by the kernel,
     perform unprivileged accesses adding an extra level of protection.
     The set_fs() macro is used to "upgrade" such instruction to
     privileged accesses via the UAO bit

   - Half-precision floating point support (part of ARMv8.2)

   - Optimisations for CPUs with or without a hardware prefetcher (using
     run-time code patching)

   - copy_page performance improvement to deal with 128 bytes at a time

   - Sanity checks on the CPU capabilities (via CPUID) to prevent
     incompatible secondary CPUs from being brought up (e.g.  weird
     big.LITTLE configurations)

   - valid_user_regs() reworked for better sanity check of the
     sigcontext information (restored pstate information)

   - ACPI parking protocol implementation

   - CONFIG_DEBUG_RODATA enabled by default

   - VDSO code marked as read-only

   - DEBUG_PAGEALLOC support

   - ARCH_HAS_UBSAN_SANITIZE_ALL enabled

   - Erratum workaround Cavium ThunderX SoC

   - set_pte_at() fix for PROT_NONE mappings

   - Code clean-ups"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (99 commits)
  arm64: kasan: Fix zero shadow mapping overriding kernel image shadow
  arm64: kasan: Use actual memory node when populating the kernel image shadow
  arm64: Update PTE_RDONLY in set_pte_at() for PROT_NONE permission
  arm64: Fix misspellings in comments.
  arm64: efi: add missing frame pointer assignment
  arm64: make mrs_s prefixing implicit in read_cpuid
  arm64: enable CONFIG_DEBUG_RODATA by default
  arm64: Rework valid_user_regs
  arm64: mm: check at build time that PAGE_OFFSET divides the VA space evenly
  arm64: KVM: Move kvm_call_hyp back to its original localtion
  arm64: mm: treat memstart_addr as a signed quantity
  arm64: mm: list kernel sections in order
  arm64: lse: deal with clobbered IP registers after branch via PLT
  arm64: mm: dump: Use VA_START directly instead of private LOWEST_ADDR
  arm64: kconfig: add submenu for 8.2 architectural features
  arm64: kernel: acpi: fix ioremap in ACPI parking protocol cpu_postboot
  arm64: Add support for Half precision floating point
  arm64: Remove fixmap include fragility
  arm64: Add workaround for Cavium erratum 27456
  arm64: mm: Mark .rodata as RO
  ...
2016-03-17 20:03:47 -07:00
Adam Buchbinder
ef769e3208 arm64: Fix misspellings in comments.
Signed-off-by: Adam Buchbinder <adam.buchbinder@gmail.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-03-04 18:19:17 +00:00
Ard Biesheuvel
5be8b70af1 arm64: lse: deal with clobbered IP registers after branch via PLT
The LSE atomics implementation uses runtime patching to patch in calls
to out of line non-LSE atomics implementations on cores that lack hardware
support for LSE. To avoid paying the overhead cost of a function call even
if no call ends up being made, the bl instruction is kept invisible to the
compiler, and the out of line implementations preserve all registers, not
just the ones that they are required to preserve as per the AAPCS64.

However, commit fd045f6cd9 ("arm64: add support for module PLTs") added
support for routing branch instructions via veneers if the branch target
offset exceeds the range of the ordinary relative branch instructions.
Since this deals with jump and call instructions that are exposed to ELF
relocations, the PLT code uses x16 to hold the address of the branch target
when it performs an indirect branch-to-register, something which is
explicitly allowed by the AAPCS64 (and ordinary compiler generated code
does not expect register x16 or x17 to retain their values across a bl
instruction).

Since the lse runtime patched bl instructions don't adhere to the AAPCS64,
they don't deal with this clobbering of registers x16 and x17. So add them
to the clobber list of the asm() statements that perform the call
instructions, and drop x16 and x17 from the list of registers that are
callee saved in the out of line non-LSE implementations.

In addition, since we have given these functions two scratch registers,
they no longer need to stack/unstack temp registers.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
[will: factored clobber list into #define, updated Makefile comment]
Signed-off-by: Will Deacon <will.deacon@arm.com>

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-02-26 18:35:02 +00:00
James Morse
7054419600 arm64: kernel: Don't toggle PAN on systems with UAO
If a CPU supports both Privileged Access Never (PAN) and User Access
Override (UAO), we don't need to disable/re-enable PAN round all
copy_to_user() like calls.

UAO alternatives cause these calls to use the 'unprivileged' load/store
instructions, which are overridden to be the privileged kind when
fs==KERNEL_DS.

This patch changes the copy_to_user() calls to have their PAN toggling
depend on a new composite 'feature' ARM64_ALT_PAN_NOT_UAO.

If both features are detected, PAN will be enabled, but the copy_to_user()
alternatives will not be applied. This means PAN will be enabled all the
time for these functions. If only PAN is detected, the toggling will be
enabled as normal.

This will save the time taken to disable/re-enable PAN, and allow us to
catch copy_to_user() accesses that occur with fs==KERNEL_DS.

Futex and swp-emulation code continue to hang their PAN toggling code on
ARM64_HAS_PAN.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-02-18 17:27:05 +00:00
James Morse
57f4959bad arm64: kernel: Add support for User Access Override
'User Access Override' is a new ARMv8.2 feature which allows the
unprivileged load and store instructions to be overridden to behave in
the normal way.

This patch converts {get,put}_user() and friends to use ldtr*/sttr*
instructions - so that they can only access EL0 memory, then enables
UAO when fs==KERNEL_DS so that these functions can access kernel memory.

This allows user space's read/write permissions to be checked against the
page tables, instead of testing addr<USER_DS, then using the kernel's
read/write permissions.

Signed-off-by: James Morse <james.morse@arm.com>
[catalin.marinas@arm.com: move uao_thread_switch() above dsb()]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-02-18 17:27:04 +00:00
Andrew Pinski
60e0a09db2 arm64: lib: patch in prfm for copy_page if requested
On ThunderX T88 pass 1 and pass 2, there is no hardware prefetching so
we need to patch in explicit software prefetching instructions

Prefetching improves this code by 60% over the original code and 2x
over the code without prefetching for the affected hardware using the
benchmark code at https://github.com/apinski-cavium/copy_page_benchmark

Signed-off-by: Andrew Pinski <apinski@cavium.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Tested-by: Andrew Pinski <apinski@cavium.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-02-16 15:12:33 +00:00
Will Deacon
223e23e8aa arm64: lib: improve copy_page to deal with 128 bytes at a time
We want to avoid lots of different copy_page implementations, settling
for something that is "good enough" everywhere and hopefully easy to
understand and maintain whilst we're at it.

This patch reworks our copy_page implementation based on discussions
with Cavium on the list and benchmarking on Cortex-A processors so that:

  - The loop is unrolled to copy 128 bytes per iteration

  - The reads are offset so that we read from the next 128-byte block
    in the same iteration that we store the previous block

  - Explicit prefetch instructions are removed for now, since they hurt
    performance on CPUs with hardware prefetching

  - The loop exit condition is calculated at the start of the loop

Signed-off-by: Will Deacon <will.deacon@arm.com>
Tested-by: Andrew Pinski <apinski@cavium.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-02-16 15:12:33 +00:00
Thierry Reding
7f4e346263 arm64/efi: Make strnlen() available to the EFI namespace
Changes introduced in the upstream version of libfdt pulled in by commit
91feabc2e2 ("scripts/dtc: Update to upstream commit b06e55c88b9b") use
the strnlen() function, which isn't currently available to the EFI name-
space. Add it to the EFI namespace to avoid a linker error.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Rob Herring <robh@kernel.org>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-02-16 10:32:10 +00:00
Andrey Ryabinin
39d114ddc6 arm64: add KASAN support
This patch adds arch specific code for kernel address sanitizer
(see Documentation/kasan.txt).

1/8 of kernel addresses reserved for shadow memory. There was no
big enough hole for this, so virtual addresses for shadow were
stolen from vmalloc area.

At early boot stage the whole shadow region populated with just
one physical page (kasan_zero_page). Later, this page reused
as readonly zero shadow for some memory that KASan currently
don't track (vmalloc).
After mapping the physical memory, pages for shadow memory are
allocated and mapped.

Functions like memset/memmove/memcpy do a lot of memory accesses.
If bad pointer passed to one of these function it is important
to catch this. Compiler's instrumentation cannot do this since
these functions are written in assembly.
KASan replaces memory functions with manually instrumented variants.
Original functions declared as weak symbols so strong definitions
in mm/kasan/kasan.c could replace them. Original functions have aliases
with '__' prefix in name, so we could call non-instrumented variant
if needed.
Some files built without kasan instrumentation (e.g. mm/slub.c).
Original mem* function replaced (via #define) with prefixed variants
to disable memory access checks for such files.

Signed-off-by: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Tested-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2015-10-12 17:46:36 +01:00
Ard Biesheuvel
207918461e arm64: use ENDPIPROC() to annotate position independent assembler routines
For more control over which functions are called with the MMU off or
with the UEFI 1:1 mapping active, annotate some assembler routines as
position independent. This is done by introducing ENDPIPROC(), which
replaces the ENDPROC() declaration of those routines.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2015-10-12 16:19:45 +01:00
Feng Kan
404268828c arm64: copy_to-from-in_user optimization using copy template
This patch optimize copy_to-from-in_user for arm 64bit architecture. The
copy template is used as template file for all the copy*.S files. Minor
change was made to it to accommodate the copy to/from/in user files.

Signed-off-by: Feng Kan <fkan@apm.com>
Signed-off-by: Balamurugan Shanmugam <bshanmugam@apm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2015-10-07 11:34:44 +01:00
Feng Kan
e5c88e3f2f arm64: Change memcpy in kernel to use the copy template file
This converts the memcpy.S to use the copy template file. The copy
template file was based originally on the memcpy.S

Signed-off-by: Feng Kan <fkan@apm.com>
Signed-off-by: Balamurugan Shanmugam <bshanmugam@apm.com>
[catalin.marinas@arm.com: removed tmp3(w) .req statements as they are not used]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2015-10-07 11:34:43 +01:00
Will Deacon
0ea366f5e1 arm64: atomics: prefetch the destination word for write prior to stxr
The cost of changing a cacheline from shared to exclusive state can be
significant, especially when this is triggered by an exclusive store,
since it may result in having to retry the transaction.

This patch makes use of prfm to prefetch cachelines for write prior to
ldxr/stxr loops when using the ll/sc atomic routines.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-07-27 15:28:53 +01:00
Will Deacon
084f903727 arm64: bitops: patch in lse instructions when supported by the CPU
On CPUs which support the LSE atomic instructions introduced in ARMv8.1,
it makes sense to use them in preference to ll/sc sequences.

This patch introduces runtime patching of our bitops functions so that
LSE atomic instructions are used instead.

Reviewed-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-07-27 15:28:51 +01:00
Will Deacon
c0385b24af arm64: introduce CONFIG_ARM64_LSE_ATOMICS as fallback to ll/sc atomics
In order to patch in the new atomic instructions at runtime, we need to
generate wrappers around the out-of-line exclusive load/store atomics.

This patch adds a new Kconfig option, CONFIG_ARM64_LSE_ATOMICS. which
causes our atomic functions to branch to the out-of-line ll/sc
implementations. To avoid the register spill overhead of the PCS, the
out-of-line functions are compiled with specific compiler flags to
force out-of-line save/restore of any registers that are usually
caller-saved.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-07-27 15:28:50 +01:00
James Morse
338d4f49d6 arm64: kernel: Add support for Privileged Access Never
'Privileged Access Never' is a new arm8.1 feature which prevents
privileged code from accessing any virtual address where read or write
access is also permitted at EL0.

This patch enables the PAN feature on all CPUs, and modifies {get,put}_user
helpers temporarily to permit access.

This will catch kernel bugs where user memory is accessed directly.
'Unprivileged loads and stores' using ldtrb et al are unaffected by PAN.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
[will: use ALTERNATIVE in asm and tidy up pan_enable check]
Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-07-27 11:08:41 +01:00
Will Deacon
23e9499446 arm64: lib: use pair accessors for copy_*_user routines
The AArch64 instruction set contains load/store pair memory accessors,
so use these in our copy_*_user routines to transfer 16 bytes per
iteration.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-07-27 11:08:39 +01:00
Kyle McMartin
97fc15436b arm64: __clear_user: handle exceptions on strb
ARM64 currently doesn't fix up faults on the single-byte (strb) case of
__clear_user... which means that we can cause a nasty kernel panic as an
ordinary user with any multiple PAGE_SIZE+1 read from /dev/zero.
i.e.: dd if=/dev/zero of=foo ibs=1 count=1 (or ibs=65537, etc.)

This is a pretty obscure bug in the general case since we'll only
__do_kernel_fault (since there's no extable entry for pc) if the
mmap_sem is contended. However, with CONFIG_DEBUG_VM enabled, we'll
always fault.

if (!down_read_trylock(&mm->mmap_sem)) {
	if (!user_mode(regs) && !search_exception_tables(regs->pc))
		goto no_context;
retry:
	down_read(&mm->mmap_sem);
} else {
	/*
	 * The above down_read_trylock() might have succeeded in
	 * which
	 * case, we'll have missed the might_sleep() from
	 * down_read().
	 */
	might_sleep();
	if (!user_mode(regs) && !search_exception_tables(regs->pc))
		goto no_context;
}

Fix that by adding an extable entry for the strb instruction, since it
touches user memory, similar to the other stores in __clear_user.

Signed-off-by: Kyle McMartin <kyle@redhat.com>
Reported-by: Miloš Prchlík <mprchlik@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-11-13 15:21:26 +00:00
zhichang.yuan
0a42cb0a6f arm64: lib: Implement optimized string length routines
This patch, based on Linaro's Cortex Strings library, adds
an assembly optimized strlen() and strnlen() functions.

Signed-off-by: Zhichang Yuan <zhichang.yuan@linaro.org>
Signed-off-by: Deepak Saxena <dsaxena@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-05-23 15:17:12 +01:00
zhichang.yuan
192c4d902f arm64: lib: Implement optimized string compare routines
This patch, based on Linaro's Cortex Strings library, adds
an assembly optimized strcmp() and strncmp() functions.

Signed-off-by: Zhichang Yuan <zhichang.yuan@linaro.org>
Signed-off-by: Deepak Saxena <dsaxena@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-05-23 15:16:59 +01:00
zhichang.yuan
d875c9b372 arm64: lib: Implement optimized memcmp routine
This patch, based on Linaro's Cortex Strings library, adds
an assembly optimized memcmp() function.

Signed-off-by: Zhichang Yuan <zhichang.yuan@linaro.org>
Signed-off-by: Deepak Saxena <dsaxena@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-05-23 15:07:57 +01:00
zhichang.yuan
b29a51fe0e arm64: lib: Implement optimized memset routine
This patch, based on Linaro's Cortex Strings library, improves
the performance of the assembly optimized memset() function.

Signed-off-by: Zhichang Yuan <zhichang.yuan@linaro.org>
Signed-off-by: Deepak Saxena <dsaxena@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-05-23 15:07:48 +01:00
zhichang.yuan
280adc1951 arm64: lib: Implement optimized memmove routine
This patch, based on Linaro's Cortex Strings library, improves
the performance of the assembly optimized memmove() function.

Signed-off-by: Zhichang Yuan <zhichang.yuan@linaro.org>
Signed-off-by: Deepak Saxena <dsaxena@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-05-23 15:07:35 +01:00
zhichang.yuan
808dbac6b5 arm64: lib: Implement optimized memcpy routine
This patch, based on Linaro's Cortex Strings library, improves
the performance of the assembly optimized memcpy() function.

Signed-off-by: Zhichang Yuan <zhichang.yuan@linaro.org>
Signed-off-by: Deepak Saxena <dsaxena@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-05-23 15:06:53 +01:00
Will Deacon
8e86f0b409 arm64: atomics: fix use of acquire + release for full barrier semantics
Linux requires a number of atomic operations to provide full barrier
semantics, that is no memory accesses after the operation can be
observed before any accesses up to and including the operation in
program order.

On arm64, these operations have been incorrectly implemented as follows:

	// A, B, C are independent memory locations

	<Access [A]>

	// atomic_op (B)
1:	ldaxr	x0, [B]		// Exclusive load with acquire
	<op(B)>
	stlxr	w1, x0, [B]	// Exclusive store with release
	cbnz	w1, 1b

	<Access [C]>

The assumption here being that two half barriers are equivalent to a
full barrier, so the only permitted ordering would be A -> B -> C
(where B is the atomic operation involving both a load and a store).

Unfortunately, this is not the case by the letter of the architecture
and, in fact, the accesses to A and C are permitted to pass their
nearest half barrier resulting in orderings such as Bl -> A -> C -> Bs
or Bl -> C -> A -> Bs (where Bl is the load-acquire on B and Bs is the
store-release on B). This is a clear violation of the full barrier
requirement.

The simple way to fix this is to implement the same algorithm as ARMv7
using explicit barriers:

	<Access [A]>

	// atomic_op (B)
	dmb	ish		// Full barrier
1:	ldxr	x0, [B]		// Exclusive load
	<op(B)>
	stxr	w1, x0, [B]	// Exclusive store
	cbnz	w1, 1b
	dmb	ish		// Full barrier

	<Access [C]>

but this has the undesirable effect of introducing *two* full barrier
instructions. A better approach is actually the following, non-intuitive
sequence:

	<Access [A]>

	// atomic_op (B)
1:	ldxr	x0, [B]		// Exclusive load
	<op(B)>
	stlxr	w1, x0, [B]	// Exclusive store with release
	cbnz	w1, 1b
	dmb	ish		// Full barrier

	<Access [C]>

The simple observations here are:

  - The dmb ensures that no subsequent accesses (e.g. the access to C)
    can enter or pass the atomic sequence.

  - The dmb also ensures that no prior accesses (e.g. the access to A)
    can pass the atomic sequence.

  - Therefore, no prior access can pass a subsequent access, or
    vice-versa (i.e. A is strictly ordered before C).

  - The stlxr ensures that no prior access can pass the store component
    of the atomic operation.

The only tricky part remaining is the ordering between the ldxr and the
access to A, since the absence of the first dmb means that we're now
permitting re-ordering between the ldxr and any prior accesses.

From an (arbitrary) observer's point of view, there are two scenarios:

  1. We have observed the ldxr. This means that if we perform a store to
     [B], the ldxr will still return older data. If we can observe the
     ldxr, then we can potentially observe the permitted re-ordering
     with the access to A, which is clearly an issue when compared to
     the dmb variant of the code. Thankfully, the exclusive monitor will
     save us here since it will be cleared as a result of the store and
     the ldxr will retry. Notice that any use of a later memory
     observation to imply observation of the ldxr will also imply
     observation of the access to A, since the stlxr/dmb ensure strict
     ordering.

  2. We have not observed the ldxr. This means we can perform a store
     and influence the later ldxr. However, that doesn't actually tell
     us anything about the access to [A], so we've not lost anything
     here either when compared to the dmb variant.

This patch implements this solution for our barriered atomic operations,
ensuring that we satisfy the full barrier requirements where they are
needed.

Cc: <stable@vger.kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-02-07 16:45:43 +00:00
Will Deacon
12a0ef7b0a arm64: use generic strnlen_user and strncpy_from_user functions
This patch implements the word-at-a-time interface for arm64 using the
same algorithm as ARM. We use the fls64 macro, which expands to a clz
instruction via a compiler builtin. Big-endian configurations make use
of the implementation from asm-generic.

With this implemented, we can replace our byte-at-a-time strnlen_user
and strncpy_from_user functions with the optimised generic versions.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2013-12-19 17:43:06 +00:00
Catalin Marinas
420c158dcf arm64: Treat the bitops index argument as an 'int'
The bitops prototype use an 'int' as the bit index type but the asm
implementation assume it to be a 'long'. Since the compiler does not
guarantee zeroing the upper 32-bits in a register when used as 'int',
change the bitops implementation accordingly.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2013-05-08 10:33:17 +01:00
Catalin Marinas
16c85a1fd7 arm64: Use acquire/release semantics instead of explicit DMB
This patch changes the test_and_*_bit functions to use the
load-acquire/store-release instructions instead of explicit DMB.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2013-04-30 15:58:37 +01:00
Mark Rutland
c47d6a04e6 arm64: klib: bitops: fix unpredictable stxr usage
We're currently relying on unpredictable behaviour in our testops
(test_and_*_bit), as stxr is unpredictable when the status register and
the source register are the same

This patch changes reallocates the status register so as to bring us back into
the realm of predictable behaviour. Boot tested on an AEMv8 model.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2013-04-30 15:53:01 +01:00
Catalin Marinas
6247958653 arm64: klib: Optimised atomic bitops
This patch implements the AArch64-specific atomic bitops functions using
exclusive memory accesses to avoid locking.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2013-03-21 17:39:31 +00:00
Catalin Marinas
2b8cac814c arm64: klib: Optimised string functions
This patch introduces AArch64-specific string functions (strchr,
strrchr).

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2013-03-21 17:39:30 +00:00
Catalin Marinas
4a8992271c arm64: klib: Optimised memory functions
This patch introduces AArch64-specific memory functions (memcpy,
memmove, memchr, memset). These functions are not optimised for any CPU
implementation but can be used as a starting point once hardware is
available.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2013-03-21 17:39:29 +00:00
Marc Zyngier
f27bb139c3 arm64: Miscellaneous library functions
This patch adds udelay, memory and bit operations together with the
ksyms exports.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Tony Lindgren <tony@atomide.com>
Acked-by: Nicolas Pitre <nico@linaro.org>
Acked-by: Olof Johansson <olof@lixom.net>
Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
2012-09-17 13:42:18 +01:00
Catalin Marinas
0aea86a217 arm64: User access library functions
This patch add support for various user access functions. These
functions use the standard LDR/STR instructions and not the LDRT/STRT
variants in order to allow kernel addresses (after set_fs(KERNEL_DS)).

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Tony Lindgren <tony@atomide.com>
Acked-by: Nicolas Pitre <nico@linaro.org>
Acked-by: Olof Johansson <olof@lixom.net>
Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
2012-09-17 13:42:11 +01:00