The remaining struct_mutex usage is just to serialize various gpu
related things (submit/retire/recover/fault/etc), so replace
struct_mutex with gpu->lock.
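A minimal sketch of the shape of the change (field and function names
here are illustrative, not necessarily the driver's actual ones):

  struct msm_gpu {
          struct drm_device *dev;
          struct mutex lock;      /* serializes submit/retire/recover/fault */
  };

  static void retire_submits(struct msm_gpu *gpu)
  {
          mutex_lock(&gpu->lock);
          /* walk and retire completed submits... */
          mutex_unlock(&gpu->lock);
  }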
Signed-off-by: Rob Clark <robdclark@chromium.org>
Link: https://lore.kernel.org/r/20211109181117.591148-4-robdclark@gmail.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
cur_ctx_seqno already does the same thing, but handles the edge cases
where a refcnt'd context can live after lastclose. So let's not have
two ways to do the same thing.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Akhil P Oommen <akhilpo@codeaurora.org>
Link: https://lore.kernel.org/r/20211109181117.591148-3-robdclark@gmail.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
When converting to use an idr to map userspace fence seqno values back
to a dma_fence, we lost the error return when userspace passes a seqno
that is larger than the last submitted fence. Restore this check.
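Roughly, the restored check looks like this (last_fence and fence_idr
are assumed names for the submitqueue's bookkeeping):

  static struct dma_fence *get_fence(struct msm_gpu_submitqueue *queue,
                                     u32 fence_id)
  {
          struct dma_fence *fence;

          /* fence ids are handed out in submit order, so anything larger
           * than the last submitted fence cannot be valid:
           */
          if (fence_id > queue->last_fence)
                  return ERR_PTR(-EINVAL);

          fence = idr_find(&queue->fence_idr, fence_id);
          return fence ? dma_fence_get(fence) : NULL;
  }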
Reported-by: Akhil P Oommen <akhilpo@codeaurora.org>
Fixes: a61acbbe9c ("drm/msm: Track "seqno" fences by idr")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Akhil P Oommen <akhilpo@codeaurora.org>
Link: https://lore.kernel.org/r/20211111192457.747899-3-robdclark@gmail.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
* eDP support in DP sub-driver (for newer SoCs with native eDP output)
* dpu irq handling cleanup
* CRC support for making igt happy
* Support for NO_CONNECTOR bridges
* dsi: 14nm phy support for msm8953
* mdp5: support for msm8x53, sdm450, sdm632
* various smaller fixes and cleanups
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rob Clark <robdclark@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/CAF6AEGsH9EwcpqGNNRJeL99NvFFjHX3SUg+nTYu0dHG5U9+QuA@mail.gmail.com
Until we better understand the stability issues caused by frequent
frequency changes, let's limit them to a618.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Tested-by: John Stultz <john.stultz@linaro.org>
Tested-by: Caleb Connolly <caleb.connolly@linaro.org>
Link: https://lore.kernel.org/r/20211018153627.2787882-1-robdclark@gmail.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
Add a short delay before clamping to idle frequency on active->idle
transition. It takes ~0.5ms to increase the freq again on the next
idle->active transition, so this helps avoid extra freq transitions
on workloads that bounce between CPU and GPU.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Link: https://lore.kernel.org/r/20210927230455.1066297-2-robdclark@gmail.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
Some userspace apps assume that rendering against multiple contexts
within the same process (from the same thread, with appropriate
MakeCurrent() calls) provides sufficient synchronization without any
external synchronization (i.e. glFenceSync()/glWaitSync()). Since a
submitqueue maps to a gl/vk context, having multiple sched entities of
the same priority only works with implicit sync enabled.
To fix this, limit things to a single sched entity per priority level
per process.
An alternative would be sharing submitqueues between contexts in
userspace, but tracking of per-context faults (i.e. GL_EXT_robustness)
is already done at the submitqueue level, so this is not an option.
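A rough sketch of the idea, with one lazily created entity per
(ring, priority) pair shared by all submitqueues of a process (struct
layout and NR_SCHED_PRIORITIES are illustrative):

  static struct drm_sched_entity *
  get_sched_entity(struct msm_file_private *ctx, struct msm_ringbuffer *ring,
                   unsigned ring_nr, enum drm_sched_priority sched_prio)
  {
          unsigned idx = (ring_nr * NR_SCHED_PRIORITIES) + sched_prio;

          if (!ctx->entities[idx]) {
                  struct drm_sched_entity *entity;
                  struct drm_gpu_scheduler *sched = &ring->sched;

                  entity = kzalloc(sizeof(*entity), GFP_KERNEL);
                  if (!entity)
                          return ERR_PTR(-ENOMEM);

                  drm_sched_entity_init(entity, sched_prio, &sched, 1, NULL);
                  ctx->entities[idx] = entity;
          }

          return ctx->entities[idx];
  }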
Signed-off-by: Rob Clark <robdclark@chromium.org>
msm_file_private is more GPU-related, and in the next commit it will
need access to other GPU-specific #defines. While we're at it, add
some comments.
Signed-off-by: Rob Clark <robdclark@chromium.org>
The drm/scheduler provides additional prioritization on top of that
provided by however many ringbuffers (each with its own priority
level) a given generation supports. Expose the additional levels of
priority to userspace and map the userspace priority back to a ring
(first level of priority) and a scheduler priority (additional
priority levels within the ring).
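The mapping from the single userspace priority value back to a
(ring, scheduler priority) pair is roughly the following, with
NR_SCHED_PRIORITIES being the number of scheduler levels per ring:

  static int convert_priority(struct msm_gpu *gpu, int prio,
                              unsigned *ring_nr,
                              enum drm_sched_priority *sched_prio)
  {
          unsigned rn = prio / NR_SCHED_PRIORITIES;
          unsigned sp = prio % NR_SCHED_PRIORITIES;

          if (rn >= gpu->nr_rings)
                  return -EINVAL;

          /* the scheduler enum and the userspace convention count in
           * opposite directions, so invert the in-ring level:
           */
          *ring_nr = rn;
          *sched_prio = NR_SCHED_PRIORITIES - 1 - sp;

          return 0;
  }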
Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Christian König <christian.koenig@amd.com>
Link: https://lore.kernel.org/r/20210728010632.2633470-13-robdclark@gmail.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
For existing adrenos, there are one or more ringbuffers, depending on
whether preemption is supported. When preemption is supported, each
ringbuffer has its own priority. A submitqueue (which maps to a gl
context or vk queue in userspace) is mapped to a specific ringbuffer
at creation time, based on the submitqueue's priority.
Each ringbuffer has its own drm_gpu_scheduler. Each submitqueue maps
to a drm_sched_entity. And each submit maps to a drm_sched_job.
Closes: https://gitlab.freedesktop.org/drm/msm/-/issues/4
Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Christian König <christian.koenig@amd.com>
Link: https://lore.kernel.org/r/20210728010632.2633470-10-robdclark@gmail.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
Previously the (non-fd) fence returned from the submit ioctl was a raw
seqno, which is scoped to the ring. But from a UABI standpoint, the
ioctls related to seqno fences all specify a submitqueue. We can take
advantage of that to replace the seqno fences with a cyclic idr handle.
This is in preparation for moving to the drm scheduler, at which point
the submit ioctl will return after queuing the submit job to the
scheduler, but before the submit is written into the ring (and
therefore before a ring seqno has been assigned), which means we need
to replace the dma_fence that userspace may need to wait on with a
scheduler fence.
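A sketch of the replacement, with the handle coming from a
per-submitqueue cyclic idr rather than the ring seqno (field names are
illustrative):

  static int alloc_fence_id(struct msm_gpu_submitqueue *queue,
                            struct msm_gem_submit *submit)
  {
          /* hand userspace a handle from a cyclic idr; later
           * fence-related ioctls look the dma_fence back up on the
           * same submitqueue:
           */
          submit->fence_id = idr_alloc_cyclic(&queue->fence_idr,
                                              submit->user_fence,
                                              1, INT_MAX, GFP_KERNEL);
          return (submit->fence_id < 0) ? submit->fence_id : 0;
  }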
Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Christian König <christian.koenig@amd.com>
Link: https://lore.kernel.org/r/20210728010632.2633470-8-robdclark@gmail.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
Fix a couple of incorrect or misspelled comments, and add a
submitqueue doc comment.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Christian König <christian.koenig@amd.com>
Link: https://lore.kernel.org/r/20210728010632.2633470-2-robdclark@gmail.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
This adds a few things to try and make frequency scaling better match
the workload:
1) Longer polling interval to avoid whip-lashing between too-high and
too-low frequencies in certain workloads, like mobile games which
throttle themselves to 30fps.
Previously our polling interval was short enough to let things
ramp down to minimum freq in the "off" frame, but long enough to
not react quickly enough when rendering started on the next frame,
leading to uneven frame times. (I.e. rather than a consistent 33ms
it would alternate between 16/33/48ms.)
2) Awareness of when the GPU is active vs idle. Since we know when
the GPU is active vs idle, we can clamp the frequency down to the
minimum while it is idle. (If it is idle for long enough, then
the autosuspend delay will eventually kick in and power down the
GPU.)
Since devfreq has no knowledge of powered-but-idle, this takes a
small bit of trickery to maintain a "fake" frequency while idle.
This, combined with the longer polling period allows devfreq to
arrive at a reasonable "active" frequency, while still clamping
to minimum freq when idle to reduce power draw.
3) Boost. Because simple_ondemand needs to see a certain threshold
of busyness to ramp up, we could end up needing multiple polling
cycles before it reacts appropriately on interactive workloads
(e.g. scrolling a web page after reading for some time), on top of
the already lengthened polling interval. To counter this, when we
see an idle to active transition after a period of idle time, we
boost the frequency that we return to (see the sketch below).
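A very rough sketch of the idle/active hooks behind 2) and 3) (all
names, the boost factor, and the get_freq()/set_freq() helpers are
illustrative stand-ins, not the driver's actual code):

  void msm_devfreq_idle(struct msm_gpu *gpu)
  {
          struct msm_gpu_devfreq *df = &gpu->devfreq;

          /* remember the "real" freq so the busy-time callback can keep
           * reporting it to devfreq while we are clamped to minimum:
           */
          df->idle_freq = get_freq(gpu);
          df->idle_time = ktime_get();

          set_freq(gpu, df->min_freq);
  }

  void msm_devfreq_active(struct msm_gpu *gpu)
  {
          struct msm_gpu_devfreq *df = &gpu->devfreq;
          unsigned long freq = df->idle_freq;

          /* if we were idle long enough, boost the frequency we return
           * to, so interactive workloads react in a single step:
           */
          if (ktime_ms_delta(ktime_get(), df->idle_time) > df->boost_ms)
                  freq *= 2;

          df->idle_freq = 0;
          set_freq(gpu, freq);
  }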
Signed-off-by: Rob Clark <robdclark@chromium.org>
Link: https://lore.kernel.org/r/20210726144653.2180096-4-robdclark@gmail.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
Before we start adding more cleverness, split it into its own file.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Link: https://lore.kernel.org/r/20210726144653.2180096-2-robdclark@gmail.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
Wire up support to stall the SMMU on iova fault, and collect a
devcoredump snapshot for easier debugging of faults.
Currently this is a6xx-only, but mostly only because so far it is the
only one using adreno-smmu-priv.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Jordan Crouse <jordan@cosmicpenguin.net>
Link: https://lore.kernel.org/r/20210610214431.539029-6-robdclark@gmail.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
While keeping the previous default value for the hangcheck period,
we now allow configuring it via debugfs.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Link: https://lore.kernel.org/r/20210607104441.184700-1-siglesias@igalia.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
These aren't used by anything anymore.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Akhil P Oommen <akhilpo@codeaurora.org>
Link: https://lore.kernel.org/r/20210608172808.11803-2-jonathan@marek.ca
Signed-off-by: Rob Clark <robdclark@chromium.org>
Performance counters, and the ALWAYS_ON counters used for capturing
GPU timestamps, lose their state across suspend/resume cycles.
Userspace tooling for performance monitoring needs to be aware of
this. For example, after a suspend userspace needs to recalibrate its
offset between CPU and GPU time.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Jordan Crouse <jordan@cosmicpenguin.net>
Link: https://lore.kernel.org/r/20210325012358.1759770-3-robdclark@gmail.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
The register read-modify-write construct is generic enough that it
can be used by other subsystems as needed, so create a more generic
rmw() function and have gpu_rmw() use this new function.
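A sketch of what that looks like (msm_rmw() as the name of the
generic helper is an assumption):

  void msm_rmw(void __iomem *addr, u32 mask, u32 or)
  {
          u32 val = readl(addr);

          val &= ~mask;
          writel(val | or, addr);
  }

  static inline void gpu_rmw(struct msm_gpu *gpu, u32 reg, u32 mask, u32 or)
  {
          /* gpu registers are addressed by dword offset: */
          msm_rmw(gpu->mmio + (reg << 2), mask, or);
  }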
Signed-off-by: Sharat Masetty <smasetty@codeaurora.org>
Reviewed-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@chromium.org>
Register GPU as a devfreq cooling device so that it can be passively
cooled by the thermal framework.
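Roughly, during GPU init (the cooling/devfreq fields are
illustrative):

  gpu->cooling = of_devfreq_cooling_register(pdev->dev.of_node,
                                             gpu->devfreq.devfreq);
  if (IS_ERR(gpu->cooling)) {
          DRM_DEV_ERROR(&pdev->dev, "could not register cooling device\n");
          gpu->cooling = NULL;
  }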
Signed-off-by: Akhil P Oommen <akhilpo@codeaurora.org>
Tested-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Rob Clark <robdclark@chromium.org>
Rather than relying on the big dev->struct_mutex hammer, introduce a
more specific lock for protecting the bo lists.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Jordan Crouse <jcrouse@codeaurora.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Signed-off-by: Rob Clark <robdclark@chromium.org>
Newer microcode versions have support for the CP_WHERE_AM_I opcode,
which allows the RPTR shadow memory to be marked as privileged to
protect it from corruption. Move the RPTR shadow into its own buffer
and protect it if the current microcode version supports the new
feature.
We can also re-enable preemption for those targets that support
CP_WHERE_AM_I. Start out by preemptively assuming that we can enable
preemption and disable it in a5xx_hw_init if the microcode version comes
back as too old.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@chromium.org>
In $debugfs/gem we already show any vma(s) associated with an object.
Also show process names if the vma's address space is a per-process
address space.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Jordan Crouse <jcrouse@codeaurora.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Add support for allocating private address space instances. Targets that
support per-context pagetables should implement their own function to
allocate private address spaces.
The default will return a pointer to the global address space.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Now that we can get the ctx from the submitqueue, the extra arg is
redundant.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
[split out of previous patch to reduce churny noise]
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Each submitqueue is attached to a context. Add a pointer to the
context to the submitqueue at create time and refcount it so
that it stays around through the life of the queue.
Co-developed-by: Rob Clark <robdclark@chromium.org>
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
This will be populated by adreno-smmu, to provide a way for coordinating
enabling/disabling TTBR0 translation.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Jordan Crouse <jcrouse@codeaurora.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
In a later patch, the drvdata will not directly be 'struct msm_gpu *',
so add a helper to reduce the churn.
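For example, a trivial accessor along these lines keeps the drvdata
layout in one place (a sketch; the eventual helper may do more):

  static inline struct msm_gpu *dev_to_gpu(struct device *dev)
  {
          return dev_get_drvdata(dev);
  }

Call sites then use dev_to_gpu(dev) instead of dev_get_drvdata(dev),
so only the helper needs to change when the drvdata becomes a wrapper
struct.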
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Jordan Crouse <jcrouse@codeaurora.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
a650 has expanded apriv support that allows us to map critical
buffers (ringbuffer and memstore) as privileged to protect them from
corruption.
Cc: stable@vger.kernel.org
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@chromium.org>
This patch changes the plumbing to send the devfreq recommended opp rather
than the frequency. Also consolidate and rearrange the code in a6xx to set
the GPU frequency and the icc vote in preparation for the upcoming
changes for GPU->DDR scaling votes.
Signed-off-by: Sharat Masetty <smasetty@codeaurora.org>
Signed-off-by: Akhil P Oommen <akhilpo@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@chromium.org>
* new gpu support: a405, a640, a650
* dpu: color processing support
* mdp5: support for msm8x36 (the thing with a405)
* some prep work for per-context pagetables (ie the part that
does not depend on in-flight iommu patches)
* last but not least, UABI update for submit ioctl to support
syncobj (from Bas)
Merge tag 'drm-next-msm-5.8-2020-06-08' of git://anongit.freedesktop.org/drm/drm
Pull drm msm updates from Dave Airlie:
"This tree has been in next for a couple of weeks, but Rob missed an
arm32 build issue, so I was awaiting the tree with a patch reverted.
- new gpu support: a405, a640, a650
- dpu: color processing support
- mdp5: support for msm8x36 (the thing with a405)
- some prep work for per-context pagetables (ie the part that does
not depend on in-flight iommu patches)
- last but not least, UABI update for submit ioctl to support syncobj
(from Bas)"
* tag 'drm-next-msm-5.8-2020-06-08' of git://anongit.freedesktop.org/drm/drm: (30 commits)
Revert "drm/msm/dpu: add support for clk and bw scaling for display"
drm/msm/a6xx: skip HFI set freq if GMU is powered down
drm/msm: Update the MMU helper function APIs
drm/msm: Refactor address space initialization
drm/msm: Attach the IOMMU device during initialization
drm/msm/dpu: dpu_setup_dspp_pcc() can be static
drm/msm/a6xx: a6xx_hfi_send_start() can be static
drm/msm/a4xx: add a405_registers for a405 device
drm/msm/a4xx: add adreno a405 support
drm/msm/a6xx: update a6xx_hw_init for A640 and A650
drm/msm/a6xx: enable GMU log
drm/msm/a6xx: update pdc/rscc GMU registers for A640/A650
drm/msm/a6xx: A640/A650 GMU firmware path
drm/msm/a6xx: HFI v2 for A640 and A650
drm/msm/a6xx: add A640/A650 to gpulist
drm/msm/a6xx: use msm_gem for GMU memory objects
drm/msm: add internal MSM_BO_MAP_PRIV flag
drm/msm: add msm_gem_get_and_pin_iova_range
drm/msm: Check for powered down HW in the devfreq callbacks
drm/msm/dpu: update bandwidth threshold check
...
Refactor how address space initialization works. Instead of having the
address space function create the MMU object (and thus require separate
but equal functions for gpummu and iommu), use a single function and
pass the MMU struct in. Make the generic code cleaner by using
target-specific functions to create the address space so a2xx can do
its own thing in its own space. For all the other targets, use a
generic helper to initialize the IOMMU, but leave the door open for
newer targets to customize if they need it.
Reviewed-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Tested-by: Shawn Guo <shawn.guo@linaro.org>
[squash in rebase fixups]
Signed-off-by: Rob Clark <robdclark@chromium.org>
As a result of commit 987d65d013 (drm: debugfs: make
drm_debugfs_create_files() never fail) and changes to various debugfs
functions in drm/core and across various drivers, there is no need for
the drm_driver.debugfs_init() hook to have a return value. Therefore,
declare it as void.
This also includes refactoring all users of the .debugfs_init() hook to
return void across the subsystem.
v2: include changes to the hook and drivers that use it in one patch to
prevent driver breakage and enable individual successful compilation of
this change.
References: https://lists.freedesktop.org/archives/dri-devel/2020-February/257183.html
Signed-off-by: Wambui Karuga <wambui.karugax@gmail.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20200310133121.27913-18-wambui.karugax@gmail.com
Some A3xx and all A4xx Adreno GPUs do not have GMEM inside the GPU core
and must use the On Chip MEMory (OCMEM) in order to be functional.
There's a separate interconnect path that needs to be set up to OCMEM.
Add support for this second path to the GPU core.
In the downstream MSM 3.4 sources, the two interconnect paths for the
GPU are between:
- MSM_BUS_MASTER_GRAPHICS_3D and MSM_BUS_SLAVE_EBI_CH0
- MSM_BUS_MASTER_V_OCMEM_GFX3D and MSM_BUS_SLAVE_OCMEM
Signed-off-by: Brian Masney <masneyb@onstation.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Rob Clark <robdclark@chromium.org>
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license version 2 as
published by the free software foundation this program is
distributed in the hope that it will be useful but without any
warranty without even the implied warranty of merchantability or
fitness for a particular purpose see the gnu general public license
for more details you should have received a copy of the gnu general
public license along with this program if not see http www gnu org
licenses
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 503 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Enrico Weigelt <info@metux.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190602204653.811534538@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
For KHR_robustness, userspace wants to know two things: the count of
GPU faults globally, and the count of faults attributed to a given
context. This patch provides the former, and the next patch provides
the latter.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Jordan Crouse <jcrouse@codeaurora.org>
Try to get the interconnect path for the GPU and vote for the maximum
bandwidth to support all frequencies. This is needed for performance.
Later we will want to scale the bandwidth based on the frequency to
also optimize for power but that will require some device tree
infrastructure that does not yet exist.
v6: use icc_set_bw() instead of icc_set()
v5: Remove hardcoded interconnect name and just use the default
v4: Don't use a port string at all to skip the need for names in the DT
v3: Use macros and change port string per Georgi Djakov
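A sketch of the vote (the bandwidth number is illustrative; the real
value corresponds to the target's maximum):

  gpu->icc_path = of_icc_get(&pdev->dev, NULL);   /* default path */
  if (!IS_ERR_OR_NULL(gpu->icc_path))
          icc_set_bw(gpu->icc_path, 0, MBps_to_icc(7216));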
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Acked-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Evan Green <evgreen@chromium.org>
Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Every GPU core only has one interrupt so there isn't any
value in looking up the interrupt by name. Remove the name (which
is legacy anyway) and use platform_get_irq() instead.
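I.e. something along these lines, with irq_handler standing in for
the driver's existing interrupt handler:

  gpu->irq = platform_get_irq(pdev, 0);
  if (gpu->irq < 0)
          return gpu->irq;

  ret = devm_request_irq(&pdev->dev, gpu->irq, irq_handler,
                         IRQF_TRIGGER_HIGH, gpu->name, gpu);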
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
When debugfs is disabled, but coredump is turned on, the adreno driver fails to build:
drivers/gpu/drm/msm/adreno/a3xx_gpu.c:460:4: error: 'struct msm_gpu_funcs' has no member named 'show'
.show = adreno_show,
^~~~
drivers/gpu/drm/msm/adreno/a3xx_gpu.c:460:11: note: (near initialization for 'funcs.base')
drivers/gpu/drm/msm/adreno/a3xx_gpu.c:460:11: error: initialization of 'void (*)(struct msm_gpu *, struct msm_gem_submit *, struct msm_file_private *)' from incompatible pointer type 'void (*)(struct msm_gpu *, struct msm_gpu_state *, struct drm_printer *)' [-Werror=incompatible-pointer-types]
drivers/gpu/drm/msm/adreno/a3xx_gpu.c:460:11: note: (near initialization for 'funcs.base.submit')
drivers/gpu/drm/msm/adreno/a4xx_gpu.c:546:4: error: 'struct msm_gpu_funcs' has no member named 'show'
drivers/gpu/drm/msm/adreno/a5xx_gpu.c:1460:4: error: 'struct msm_gpu_funcs' has no member named 'show'
drivers/gpu/drm/msm/adreno/a6xx_gpu.c:769:4: error: 'struct msm_gpu_funcs' has no member named 'show'
drivers/gpu/drm/msm/msm_gpu.c: In function 'msm_gpu_devcoredump_read':
drivers/gpu/drm/msm/msm_gpu.c:289:12: error: 'const struct msm_gpu_funcs' has no member named 'show'
Adjust the #ifdef to make it build again.
Fixes: c0fec7f562 ("drm/msm/gpu: Capture the GPU state on a GPU hang")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Rob Clark <robdclark@gmail.com>
When userspace tries to read the crashstate dump, the read-side
implementation in the driver currently ascii85-encodes all the binary
buffers, and it does this each time the read system call is made.
A userspace tool like cat typically does a page-by-page read, and the
number of read calls depends on the size of the data captured by the
driver. This is certainly not desirable and does not scale well with
large captures.
This patch encodes the buffers only once in the read path. With this
there is an immediate >10X speed improvement in crashstate save time.
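A sketch of the one-time encode (buffer management simplified; the
real patch caches the encoded text alongside the captured state):

  static char *encode_buf(const u32 *src, size_t nr_dwords, size_t *out_len)
  {
          char *buf, *p;
          size_t i;

          /* worst case is 5 chars per dword, plus terminating NUL: */
          p = buf = kvmalloc(nr_dwords * 5 + 1, GFP_KERNEL);
          if (!buf)
                  return NULL;

          for (i = 0; i < nr_dwords; i++) {
                  char tmp[6];
                  const char *s = ascii85_encode(src[i], tmp);
                  size_t n = strlen(s);

                  memcpy(p, s, n);
                  p += n;
          }

          *p = '\0';
          *out_len = p - buf;
          return buf;
  }

The read() implementation then just copies out of this cached buffer
at the requested offset instead of re-encoding on every call.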
Signed-off-by: Sharat Masetty <smasetty@codeaurora.org>
Reviewed-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
The devfreq framework requires drivers to provide busy time
estimations. The GPU driver relies on the hardware performance
counters for the busy time estimations, but different hardware
revisions have counters which can be sourced from different clocks,
so the busy time estimation will be target dependent. Additionally,
on targets where the clocks are completely controlled by the on-chip
microcontroller, fetching and setting the current GPU frequency will
be different. This patch aims to embrace these differences by
refactoring the devfreq code a bit.
Signed-off-by: Sharat Masetty <smasetty@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Add a helper function to parse the clock names and set up
the bulk data so we can take advantage of the bulk clock
functions instead of rolling our own. This is added
as a helper function so the upcoming a6xx GMU code can
also take advantage of it.
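A sketch of the idea using the bulk clock API (names assumed; the
real helper also remembers which entry is the core clock):

  static int get_clocks(struct platform_device *pdev, struct msm_gpu *gpu)
  {
          int ret = devm_clk_bulk_get_all(&pdev->dev, &gpu->grp_clks);

          if (ret < 1) {
                  gpu->nr_clocks = 0;
                  return ret;
          }

          gpu->nr_clocks = ret;
          return 0;
  }

  static int enable_clk(struct msm_gpu *gpu)
  {
          return clk_bulk_prepare_enable(gpu->nr_clocks, gpu->grp_clks);
  }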
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
All users of do_gettimeofday() have been removed, but this one recently
crept in, along with an incorrect printing of the microseconds portion.
This converts it to using ktime_get_real_timespec64() as a direct
replacement, and adds the leading zeroes. I considered using monotonic
times (ktime_get()) instead, but as this timestamp appears to only
be used for humans rather than compared with other timestamps, the
real time domain is probably good enough.
Fixes: e43b045e2c82 ("drm/msm/gpu: Capture the state of the GPU")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Rob Clark <robdclark@gmail.com>
For hangs, copy out the contents of the buffer objects attached to
the guilty submission and print them in the crash dump report.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Add the contents of each ringbuffer to the GPU state and dump the
data in the crash file encoded with ascii85. To save space only
the used portions of the ringbuffer are dumped.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Capture the GPU state on a GPU hang and store it for later playback
via the devcoredump facility. Only one crash state is stored at a
time on the assumption that the first hang is usually the most
interesting. The existing crash state can be cleared after capturing
it and then a new one will be captured on the next hang.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Convert the existing GPU show function to use the GPU state to
dump the information rather than reading it directly from the hardware.
This will require an additional step to capture the state before
dumping it for the existing nodes but it will greatly facilitate reusing
the same code for dumping a previously captured state from a GPU hang.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Add the infrastructure to capture the current state of the GPU and
store it in memory so that it can be dumped later.
For now grab the same basic ringbuffer information and registers
that are provided by the debugfs 'gpu' node but obviously this should
be extended to capture a much larger set of GPU information.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Add some debugfs to dump out PFP and ME microcontroller state, as well
as some of the queues (MEQ and ROQ). Also add a debugfs file to trigger
a GPU reset (and reloading the firmware on next submit).
Signed-off-by: Rob Clark <robdclark@gmail.com>
Add support for devfreq to dynamically control the GPU frequency.
By default try to use the 'simple_ondemand' governor which can
adjust the frequency based on GPU load.
v2: Fix __aeabi_uldivmod issue from the 0 day bot and use
devfreq_recommended_opp() as suggested by Rob.
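A sketch of the registration (the callback names are placeholders for
the driver's devfreq callbacks):

  static struct devfreq_dev_profile msm_devfreq_profile = {
          .polling_ms     = 10,
          .target         = msm_devfreq_target,
          .get_dev_status = msm_devfreq_get_dev_status,
          .get_cur_freq   = msm_devfreq_get_cur_freq,
  };

  static void msm_devfreq_init(struct msm_gpu *gpu)
  {
          gpu->devfreq.devfreq = devm_devfreq_add_device(&gpu->pdev->dev,
                          &msm_devfreq_profile,
                          DEVFREQ_GOV_SIMPLE_ONDEMAND, NULL);
          if (IS_ERR(gpu->devfreq.devfreq))
                  gpu->devfreq.devfreq = NULL;
  }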
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Remove the downstream bus scaling code. It is only needed for
compatibility with a downstream or vendor kernel. Get it out of the
way to clear space for devfreq support.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
We use a global ringbuffer size and block size for all targets, and
at least for 5XX preemption we need to know the value of RB_CNTL in
several locations, so it makes sense to calculate it once and use it
everywhere.
The only monkey wrench is that we need to disable the RPTR shadow for
A430 targets, but that only needs to be done once and doesn't affect
A5XX, so we can OR in the value at init time.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Add the infrastructure to support the idea of multiple ringbuffers.
Assign each ringbuffer an id and use that as an index for the various
ring specific operations.
The biggest delta is to support legacy fences. Each fence gets its own
sequence number but the legacy functions expect to use a unique integer.
To handle this we return a unique identifier for each submission but
map it to a specific ring/sequence under the covers. Newer users use
a dma_fence pointer anyway so they don't care about the actual sequence
ID or ring.
The actual mechanics for multiple ringbuffers are very target specific
so this code just allows for the possibility but still only defines
one ringbuffer for each target family.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
When we move to multiple ringbuffers we're going to store the data in
the memptrs on a per-ring basis. In order to prepare for that, move
the current memptrs from the adreno namespace into msm_gpu. This is
way cleaner and immediately lets us kill off some sub-functions, so
there is much less cost later when we do move to per-ring structs.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Currently the behavior of a command stream is provided by the user
application during submission and the application is expected to internally
maintain the settings for each 'context' or 'rendering queue' and specify
the correct ones.
This works okay for simple cases but as applications become more
complex we will want to set context specific flags and do various
permission checks to allow certain contexts to enable additional
privileges.
Add kernel-side submit queues to be analogous to 'contexts' or
'rendering queues' on the application side. Each file descriptor
instance will maintain its own list of queues. Queues cannot be
shared between file descriptors.
For backwards compatibility, context id '0' is defined as a default
context specifying no priority and no special flags. This is intended
to be the usual configuration for 99% of applications, so that a
garden-variety application can function correctly without creating a
queue. Only those applications requiring the specific benefit of
different queues need to create one.
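A sketch of the shape of the new UABI (the authoritative definition
lives in the uapi header):

  struct drm_msm_submitqueue {
          __u32 flags;    /* in */
          __u32 prio;     /* in: requested priority level */
          __u32 id;       /* out: queue id to pass to the submit ioctl */
  };

Userspace that never creates a queue simply passes queue id '0' to the
submit ioctl and gets the default context behavior described above.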
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Now that the msm_gem supports an arbitrary number of vma's, we no longer
need to assign an id (index) to each address space. So rip out the
associated code.
Signed-off-by: Rob Clark <robdclark@gmail.com>
The amount of information that we need to pass into msm_gpu_init()
is steadily increasing, so add a new struct to stabilize the function
call and make it easier to add new configuration down the line.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
There isn't any generic code that uses ->idle so remove it.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Instead of using a fixed list of clock names use the clock-names
list in the device tree to discover and get the list of clocks
that we need.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Some A3XX and A4XX GPU targets required that the GPU clock be
programmed to a non-zero value when it was disabled, so 27MHz was
chosen as the "invalid" frequency.
Even though newer targets do not have the same clock restrictions,
we still write 27MHz on clock disable and expect the clock subsystem
to round down to zero.
For unknown reasons, even though the slow clock speed is always 27MHz
and it isn't actually a functional level, the legacy device tree
frequency tables always defined it and then did gymnastics to work
around it.
Instead of playing the same silly games just hard code the "slow" clock
speed in the code as 27MHz and save ourselves a bit of infrastructure.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
The adreno code inherited a silly workaround from downstream
from the bad old days before decent clock control. grp_clk[0]
(named 'src_clk') doesn't actually exist - it was used as a proxy
for whatever the core clock actually was (usually 'core_clk').
All targets should be able to correctly request 'core_clk' and
get the right thing back so zap the anachronism and directly
use grp_clk[0] to control the clock rate.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Add some new functions to manipulate GPU registers. gpu_read64 and
gpu_write64 can read/write a 64 bit value to two 32 bit registers.
For 4XX and older these are normally perfcounter registers, but
future targets will use 64 bit addressing so there will be many
more spots where a 64 bit read and write are needed.
gpu_rmw() does a read/modify/write on a 32 bit register given a mask
and bits to OR in.
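Sketches of the helpers (gpu_read()/gpu_write() are the existing
32-bit accessors):

  static inline u64 gpu_read64(struct msm_gpu *gpu, u32 lo, u32 hi)
  {
          return ((u64)gpu_read(gpu, lo)) | ((u64)gpu_read(gpu, hi) << 32);
  }

  static inline void gpu_write64(struct msm_gpu *gpu, u32 lo, u32 hi, u64 val)
  {
          gpu_write(gpu, lo, lower_32_bits(val));
          gpu_write(gpu, hi, upper_32_bits(val));
  }

  static inline void gpu_rmw(struct msm_gpu *gpu, u32 reg, u32 mask, u32 or)
  {
          u32 val = gpu_read(gpu, reg);

          val &= ~mask;
          gpu_write(gpu, reg, val | or);
  }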
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
When the GPU hardware init function fails (like, say, ME_INIT timed
out) return an error instead of blindly continuing on. This gives us
a small chance of saving the system before it goes boom.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
For a5xx the gpu is 64b so we need to change iova to 64b everywhere. On
the display side, iova is still 32b so it can ignore the upper bits.
(Although all the armv8 devices have an iommu that can map 64b pa to 32b
iova.)
Signed-off-by: Rob Clark <robdclark@gmail.com>
We can have various combinations of 64b and 32b address space, ie. 64b
CPU but 32b display and gpu, or 64b CPU and GPU but 32b display. So
best to decouple the device iova's from mmap offset.
Signed-off-by: Rob Clark <robdclark@gmail.com>
At this point, there is nothing left to fail. And the submit already
has a fence assigned and is added to the submit_list. Any problems
from here on out are asynchronous (i.e. hangcheck/recovery).
Signed-off-by: Rob Clark <robdclark@gmail.com>
Better encapsulate the per-timeline stuff into fence-context. For now
there is just a single fence-context, but eventually we'll also have one
per-CRTC to enable fully explicit fencing.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Track the list of in-flight submits. If the gpu hangs, retire up to
and including the offending submit, and then re-submit the remainder.
This way, for concurrently running piglit tests (for example), one
failing test doesn't cause unrelated tests to fail simply because its
submit was queued up after one that triggered a hang.
Signed-off-by: Rob Clark <robdclark@gmail.com>
As found in apq8016 (used in DragonBoard 410c) and msm8916.
Note that numerically a306 is actually 307 (since a305c already claimed
306). Nice and confusing.
Signed-off-by: Rob Clark <robdclark@gmail.com>
A few spots in the driver have support for downstream android
CONFIG_MSM_BUS_SCALING. This is mainly to simplify backporting the
driver for various devices which do not have sufficient upstream
kernel support. But the intentionally dead code seems to cause
some confusion. Rename the #define to make this more clear.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Shut down the clks when the gpu has nothing to do. A short inactivity
timer is used to provide a low pass filter for power transitions.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Add a VRAM carveout that is used for systems which do not have an
IOMMU. The VRAM carveout uses CMA. The arch code must set up a CMA
pool for the device (preferably in highmem.. a 256m-512m VRAM pool in
lowmem is not cool). The user can configure the VRAM pool size using
the msm.vram module param.
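The module parameter is just a size string parsed at init time; a
sketch (the default value is illustrative):

  static char *vram = "16m";
  MODULE_PARM_DESC(vram, "Configure VRAM size (for devices without IOMMU/GPUMMU)");
  module_param(vram, charp, 0);

  /* at init: size = memparse(vram, NULL); and the carveout is then
   * allocated from the device's CMA pool with that size.
   */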
Technically, the abstraction of IOMMU behind msm_mmu is not strictly
needed, but it simplifies the GEM code a bit, and will be useful later
when I add support for a2xx devices with GPUMMU, so I decided to keep
this part.
It appears to be possible to configure the GPU to restrict access to
addresses within the VRAM pool, but this is not done yet. So for now
the GPU will refuse to load if there is no sort of MMU. Once
address-based limits are supported and tested to confirm that we
aren't giving the GPU access to arbitrary memory, this restriction
can be lifted.
Signed-off-by: Rob Clark <robdclark@gmail.com>
This got a bit broken with the original patches when re-arranging
things to move dependencies on mach-msm inside #ifndef OF.
Signed-off-by: Rob Clark <robdclark@gmail.com>
A basic, no-frills recovery mechanism in case the gpu gets wedged. We
could try to be a bit more fancy and restart the next submit after the
one that got wedged, but for now keep it simple. This is enough to
recover things if, for example, the gpu hangs midway through a piglit
run.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Add initial support for a3xx 3d core.
So far, with hardware that I've seen to date, we can have:
+ zero, one, or two z180 2d cores
+ a3xx or a2xx 3d core, which share a common CP (the firmware
for the CP seems to implement some different PM4 packet types
but the basics of cmdstream submission are the same)
Which means that the eventual complete "class" hierarchy, once
support for all past and present hw is in place, becomes:
+ msm_gpu
+ adreno_gpu
+ a3xx_gpu
+ a2xx_gpu
+ z180_gpu
This commit splits out the parts that will eventually be common
between a2xx/a3xx into adreno_gpu, and the parts that are even
common to z180 into msm_gpu.
Note that there is no cmdstream validation required. All memory access
from the GPU is via IOMMU/MMU. So as long as you don't map silly things
to the GPU, there isn't much damage that the GPU can do.
Signed-off-by: Rob Clark <robdclark@gmail.com>