Presently PAPR doesn't support injecting smart errors on an
NVDIMM. This makes testing the NVDIMM health reporting functionality
difficult, as simulating NVDIMM health-related events needs a hacked-up
qemu version.
To solve this problem, this patch proposes simulating a certain set of
NVDIMM health-related events in papr_scm, specifically the 'fatal' health
state and the 'dirty' shutdown state. These errors can be injected via the
user-space 'ndctl-inject-smart(1)' command. With the proposed patch and the
corresponding ndctl patches, the following command flow is expected:
$ sudo ndctl list -DH -d nmem0
...
"health_state":"ok",
"shutdown_state":"clean",
...
# inject unsafe shutdown and fatal health error
$ sudo ndctl inject-smart nmem0 -Uf
...
"health_state":"fatal",
"shutdown_state":"dirty",
...
# uninject all errors
$ sudo ndctl inject-smart nmem0 -N
...
"health_state":"ok",
"shutdown_state":"clean",
...
The patch adds a new member 'health_bitmap_inject_mask' inside struct
papr_scm_priv, which is then bitwise ORed into the health bitmap fetched from
the hypervisor so that the injected bits are forcibly set. The value of
'health_bitmap_inject_mask' is accessible from sysfs at
nmemX/papr/health_bitmap_inject.
A new PDSM named 'SMART_INJECT' is proposed that accepts the newly
introduced 'struct nd_papr_pdsm_smart_inject' as a payload that is
exchanged between libndctl and papr_scm to indicate the requested
smart-error states.
When processing the PDSM 'SMART_INJECT', papr_pdsm_smart_inject()
constructs a pair of 'inject_mask' and 'clear_mask' bitmaps from the payload
and bit-blts them onto 'health_bitmap_inject_mask'. This ensures that, after
being fetched from the hypervisor, the health_bitmap reflects the requested
smart-error states.
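The mask handling described above can be sketched in user-space C as follows.
This is only an illustration, not the driver code: the 'ex_' names, the bit
positions and the simplified payload struct are hypothetical stand-ins, and the
sketch assumes that injected bits are forced on in the bitmap returned by the
hypervisor.

```c
#include <stdint.h>

/* Hypothetical health-bitmap bits; the real PAPR bit positions differ. */
#define EX_HEALTH_FATAL		(UINT64_C(1) << 0)
#define EX_SHUTDOWN_DIRTY	(UINT64_C(1) << 1)

/* Hypothetical flags telling which payload fields are valid. */
#define EX_INJECT_HEALTH_FATAL	(1u << 0)
#define EX_INJECT_BAD_SHUTDOWN	(1u << 1)

/* Simplified stand-in for the SMART_INJECT payload. */
struct ex_smart_inject {
	uint32_t flags;
	uint8_t fatal_enable;
	uint8_t unsafe_shutdown_enable;
};

/* Build inject/clear masks from the payload and fold them into 'cur'. */
uint64_t ex_update_inject_mask(uint64_t cur, const struct ex_smart_inject *req)
{
	uint64_t inject_mask = 0, clear_mask = 0;

	if (req->flags & EX_INJECT_HEALTH_FATAL) {
		if (req->fatal_enable)
			inject_mask |= EX_HEALTH_FATAL;
		else
			clear_mask |= EX_HEALTH_FATAL;
	}

	if (req->flags & EX_INJECT_BAD_SHUTDOWN) {
		if (req->unsafe_shutdown_enable)
			inject_mask |= EX_SHUTDOWN_DIRTY;
		else
			clear_mask |= EX_SHUTDOWN_DIRTY;
	}

	/* Drop the bits being un-injected, then set the requested ones. */
	return (cur & ~clear_mask) | inject_mask;
}

/* Force the injected bits on in the bitmap fetched from the hypervisor. */
uint64_t ex_apply_inject_mask(uint64_t hv_bitmap, uint64_t inject_mask)
{
	return hv_bitmap | inject_mask;
}
```

In this sketch a request that enables both states on an empty mask yields both
bits set, and a later request with the same flags but both enables cleared
returns the mask to zero, which mirrors the inject/uninject flow shown earlier.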
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220124202204.1488346-1-vaibhav@linux.ibm.com
76 lines
2.8 KiB
Plaintext
What:		/sys/bus/nd/devices/nmemX/papr/flags
Date:		Apr, 2020
KernelVersion:	v5.8
Contact:	linuxppc-dev <linuxppc-dev@lists.ozlabs.org>, nvdimm@lists.linux.dev
Description:
		(RO) Report flags indicating various states of a
		papr-pmem NVDIMM device. Each flag maps to one or
		more bits set in the dimm-health-bitmap retrieved in
		response to the H_SCM_HEALTH hcall. The details of the bit
		flags returned in response to this hcall are available
		at 'Documentation/powerpc/papr_hcalls.rst'. Below are
		the flags reported in this sysfs file:

		* "not_armed"
		  Indicates that NVDIMM contents will not
		  survive a power cycle.
		* "flush_fail"
		  Indicates that NVDIMM contents
		  couldn't be flushed during the last
		  shut-down event.
		* "restore_fail"
		  Indicates that NVDIMM contents
		  couldn't be restored during NVDIMM
		  initialization.
		* "encrypted"
		  Indicates that NVDIMM contents are encrypted.
		* "smart_notify"
		  Indicates that there is a health event for the NVDIMM.
		* "scrubbed"
		  Indicates that contents of the
		  NVDIMM have been scrubbed.
		* "locked"
		  Indicates that NVDIMM contents can't
		  be modified until the next power cycle.

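A user-space consumer of the flags attribute might check for a given flag as
in the minimal sketch below. It assumes the attribute content is a single line
of whitespace-separated flag names, which this document does not spell out;
flags_contain() is a hypothetical helper, not part of ndctl or the kernel.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Return true if 'name' appears as a whole token in 'flags', where
 * 'flags' is the text read from the sysfs file (assumed to be
 * whitespace-separated flag names).
 */
bool flags_contain(const char *flags, const char *name)
{
	const char *delims = " \t\n";
	size_t name_len = strlen(name);
	const char *p = flags;

	if (name_len == 0)
		return false;

	while (*p) {
		size_t tok_len;

		p += strspn(p, delims);		/* skip separators */
		tok_len = strcspn(p, delims);	/* length of next token */
		if (tok_len == name_len && strncmp(p, name, name_len) == 0)
			return true;
		p += tok_len;
	}

	return false;
}
```

Matching whole tokens rather than using strstr() avoids treating a prefix such
as "flush" as a match for "flush_fail".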
What:		/sys/bus/nd/devices/nmemX/papr/perf_stats
Date:		May, 2020
KernelVersion:	v5.9
Contact:	linuxppc-dev <linuxppc-dev@lists.ozlabs.org>, nvdimm@lists.linux.dev
Description:
		(RO) Report various performance stats related to a papr-scm NVDIMM
		device. This attribute is only available for NVDIMM devices
		that support reporting NVDIMM performance stats. Each stat is
		reported on a new line, with each line composed of a
		stat-identifier followed by its value. Below are the currently
		known dimm performance stats which are reported:

		* "CtlResCt" : Controller Reset Count
		* "CtlResTm" : Controller Reset Elapsed Time
		* "PonSecs " : Power-on Seconds
		* "MemLife " : Life Remaining
		* "CritRscU" : Critical Resource Utilization
		* "HostLCnt" : Host Load Count
		* "HostSCnt" : Host Store Count
		* "HostSDur" : Host Store Duration
		* "HostLDur" : Host Load Duration
		* "MedRCnt " : Media Read Count
		* "MedWCnt " : Media Write Count
		* "MedRDur " : Media Read Duration
		* "MedWDur " : Media Write Duration
		* "CchRHCnt" : Cache Read Hit Count
		* "CchWHCnt" : Cache Write Hit Count
		* "FastWCnt" : Fast Write Count

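A single perf_stats line could be parsed along the lines of the sketch below.
The exact line layout is not stated in this document; the sketch assumes each
line looks like 'CtlResCt = 0x0000000000000005', that is, an identifier of up
to eight characters, an '=' sign and a numeric value. struct perf_stat and
parse_perf_stat_line() are hypothetical names used only for illustration.

```c
#include <stdlib.h>
#include <string.h>

struct perf_stat {
	char id[9];			/* up to 8 characters plus NUL */
	unsigned long long value;
};

/*
 * Parse one line of the assumed form "<id> = <value>".
 * Returns 0 on success, -1 if the line does not match that form.
 */
int parse_perf_stat_line(const char *line, struct perf_stat *out)
{
	const char *eq = strchr(line, '=');
	char *end;
	size_t len;

	if (!eq)
		return -1;

	/* Identifier is everything before '=', minus trailing blanks. */
	len = (size_t)(eq - line);
	while (len > 0 && (line[len - 1] == ' ' || line[len - 1] == '\t'))
		len--;
	if (len == 0 || len > 8)
		return -1;

	memcpy(out->id, line, len);
	out->id[len] = '\0';

	/* Base 0 lets strtoull() accept a "0x" prefix or plain decimal. */
	out->value = strtoull(eq + 1, &end, 0);
	if (end == eq + 1)
		return -1;

	return 0;
}
```

Note that this sketch trims the padding from identifiers such as "PonSecs ",
so callers would compare against "PonSecs" without the trailing space.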
What:		/sys/bus/nd/devices/nmemX/papr/health_bitmap_inject
Date:		Jan, 2022
KernelVersion:	v5.17
Contact:	linuxppc-dev <linuxppc-dev@lists.ozlabs.org>, nvdimm@lists.linux.dev
Description:
		(RO) Reports the health inject bitmap that is applied to the
		bitmap received from PowerVM via the H_SCM_HEALTH hcall. It is
		used to forcibly set specific bits returned by the hcall. These
		bits are then used to simulate various health or shutdown states
		for an NVDIMM and are set by user-space tools like ndctl by
		issuing a PAPR DSM.