Add SPDX license identifiers to all files which:
- Have no license information of any form
- Have EXPORT_.*_SYMBOL_GPL inside which was used in the
initial scan/conversion to ignore the file
These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:
GPL-2.0-only
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* Correct edac_mc_find()'s return value on error (Robert Richter)
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAlzdOjIACgkQEsHwGGHe
VUrkxQ/+J+mSL23ush4HNq43OqW7Wgpkdoljo+5hr1XWH790Kc/HNI+B/Qdsgl6I
UoFkJPwMTfxqV/JZFuNC22ADnrXw6nRDZSE7+K+fHmpBgj2rK4hQ6tOYON9dTg/X
9nPF1ylBQcdzwNt4eTTnS5U1eKBSWqmQm5NtYmq2s83DQ1v3r2Pckqx00SC5Stdy
9vQLCPUR5SjsGpeQZnrdesxO89t3pWWzfZrAsCxPiY36VbUVmsHQVbHC9yfpLSIE
6pzc0esQx6zqyb4e0Rx+aKJPJ6fS43NsxifkI81h7S5F/xPEuZy39rhAmnEOYCNE
0ceQwl+ktzzGzmurgXf9Y5LLy8vOcCUJX4HBCkaweHSDRiP1xZtIl7Ao3aAaP4no
AUQqX/O0BydkO+awarhCduM7vIGdr0B9xn0xCtUODitoTIzaoVZ8F2iF7xSOi/3F
QkP4dSsf72ZK/EdA/8k1bxyBnxQ/TKgwLIQEJ1n9Bz7biPxZmqCyHQMDg7HaA1Uc
crGkJJr0cGHb9TEAzy0ygtRNrKE+hpt0FGFGE0zlE4uX7YP1Y/Bw3R38zbcl2x9h
nIqEmHwswVfBdm28RfxRH3LnS9SHwsXTtUk0y89Je1pK0qZ2Kv6nRiEFzWIAqPyY
I/7eorSkNXU+vVnZko8cvuHEuJh1VUj2bGJeS6NUo9X5+IOE0LE=
=yjXI
-----END PGP SIGNATURE-----
Merge tag 'edac_fixes_for_5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp
Pull EDAC fixes from Borislav Petkov:
- Do not build mpc85_edac as a module (Michael Ellerman)
- Correct edac_mc_find()'s return value on error (Robert Richter)
* tag 'edac_fixes_for_5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
EDAC/mc: Fix edac_mc_find() in case no device is found
EDAC/mpc85xx: Prevent building as a module
The function should return NULL in case no device is found, but it
always returns the last checked mc device from the list even if the
index did not match. Fix that.
I did some analysis why this did not raise any issues for about 3 years
and the reason is that edac_mc_find() is mostly used to search for
existing devices. Thus, the bug is not triggered.
[ bp: Drop the if (mci->mc_idx > idx) test in favor of readability. ]
Fixes: c73e8833be ("EDAC, mc: Fix locking around mc_devices list")
Signed-off-by: Robert Richter <rrichter@marvell.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Link: https://lkml.kernel.org/r/20190514104838.15065-1-rrichter@marvell.com
The mpc85xx EDAC driver can be configured as a module but then fails to
build because it uses two unexported symbols:
ERROR: ".pci_find_hose_for_OF_device" [drivers/edac/mpc85xx_edac_mod.ko] undefined!
ERROR: ".early_find_capability" [drivers/edac/mpc85xx_edac_mod.ko] undefined!
We don't want to export those symbols just for this driver, so make the
driver only configurable as a built-in.
This seems to have been broken since at least
c92132f598 ("edac/85xx: Add PCIe error interrupt edac support")
(Nov 2013).
[ bp: make it depend on EDAC=y so that the EDAC core doesn't get built
as a module. ]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Johannes Thumshirn <jth@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: linuxppc-dev@ozlabs.org
Cc: morbidrsa@gmail.com
Link: https://lkml.kernel.org/r/20190502141941.12927-1-mpe@ellerman.id.au
Pull RAS updates from Borislav Petkov:
- Support for varying MCA bank numbers per CPU: this is in preparation
for future CPU enablement (Yazen Ghannam)
- MCA banks read race fix (Tony Luck)
- Facility to filter MCEs which should not be logged (Yazen Ghannam)
- The usual round of cleanups and fixes
* 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/MCE/AMD: Don't report L1 BTB MCA errors on some family 17h models
x86/MCE: Add an MCE-record filtering function
RAS/CEC: Increment cec_entered under the mutex lock
x86/mce: Fix debugfs_simple_attr.cocci warnings
x86/mce: Remove mce_report_event()
x86/mce: Handle varying MCA bank counts
x86/mce: Fix machine_check_poll() tests for error types
MAINTAINERS: Fix file pattern for X86 MCE INFRASTRUCTURE
x86/MCE: Group AMD function prototypes in <asm/mce.h>
AMD family 17h Models 10h-2Fh may report a high number of L1 BTB MCA
errors under certain conditions. The errors are benign and can safely be
ignored. However, the high error rate may cause the MCA threshold
counter to overflow causing a high rate of thresholding interrupts.
In addition, users may see the errors reported through the AMD MCE
decoder module, even with the interrupt disabled, due to MCA polling.
Clear the "Counter Present" bit in the Instruction Fetch bank's
MCA_MISC0 register. This will prevent enabling MCA thresholding on this
bank which will prevent the high interrupt rate due to this error.
Define an AMD-specific function to filter these errors from the MCE
event pool so that they don't get reported during early boot.
Rename filter function in EDAC/mce_amd to avoid a naming conflict, while
at it.
[ bp: Move function prototype to the internal header and
massage/cleanup, fix typos. ]
Reported-by: Rafał Miłecki <rafal@milecki.pl>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "clemej@gmail.com" <clemej@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Morse <james.morse@arm.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Pu Wen <puwen@hygon.cn>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: Shirish S <Shirish.S@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Cc: <stable@vger.kernel.org> # 5.0.x: c95b323dcd: x86/MCE/AMD: Turn off MC4_MISC thresholding on all family 0x15 models
Cc: <stable@vger.kernel.org> # 5.0.x: 30aa3d26ed: x86/MCE/AMD: Carve out the MC4_MISC thresholding quirk
Cc: <stable@vger.kernel.org> # 5.0.x: 9308fd4074: x86/MCE: Group AMD function prototypes in <asm/mce.h>
Cc: <stable@vger.kernel.org> # 5.0.x
Link: https://lkml.kernel.org/r/20190325163410.171021-2-Yazen.Ghannam@amd.com
Reserve ECC Double Bit Error SMC call to alert U-Boot that a DBE has
occurred. Move the call from local EDAC header file to a common header.
[ bp: Merge the two patches. ]
Signed-off-by: Thor Thayer <thor.thayer@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Richard Gong <richard.gong@intel.com>
Reviewed-by: Alan Tull <atull@kernel.org> # firmware
Cc: Greg KH <greg@kroah.com>
Cc: James Morse <james.morse@arm.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: mchehab@kernel.org
Link: https://lkml.kernel.org/r/1553870639-23895-1-git-send-email-thor.thayer@linux.intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
The FIFO memory and ECC initialization doesn't need to be
done as a separate operation early in the startup.
Improve the Arria10 and Stratix10 peripheral FIFO init
by initializing memory and enabling ECC as part of the
device driver initialization.
Signed-off-by: Thor Thayer <thor.thayer@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/1553635771-32693-2-git-send-email-thor.thayer@linux.intel.com
Improve the Arria10 and Stratix10 error injection routine
by reading the data and changing just 1 bit before writing
back out. Previous routine would overwrite the first bytes
to 0 then change 1 bit but this method is less intrusive.
Signed-off-by: Thor Thayer <thor.thayer@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/1553635771-32693-1-git-send-email-thor.thayer@linux.intel.com
AMD systems may support chip select interleaving. However, on family
17h+ this was not taken into account when printing the chip select
sizes.
Add support to detect if chip selects are interleaved on family 17h+,
and adjust the sizes accordingly.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Kim Phillips <kim.phillips@amd.com>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/20190228153558.127292-6-Yazen.Ghannam@amd.com
The struct chip_select array that's used for saving chip select bases
and masks is fixed at length of two. There should be one struct
chip_select for each controller, so this array should be increased to
support systems that may have more than two controllers.
Increase the size of the struct chip_select array to eight, which is the
largest number of controllers per die currently supported on AMD
systems.
Also, carve out the Family 17h+ reading of the bases/masks into a
separate function. This effectively reverts the original bases/masks
reading code to before Family 17h support was added.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Kim Phillips <kim.phillips@amd.com>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/20190228153558.127292-5-Yazen.Ghannam@amd.com
Future AMD systems may support x16 symbol sizes.
Recognize if a system is using x16 symbol size. Also, simplify the print
statement.
Note that a x16 syndrome vector table is not necessary like with x4 or
x8 syndromes. This is because systems that support x16 symbol sizes are
SMCA systems and in that case, the syndrome can be directly extracted
from the MCA_SYND[Syndrome] field.
[ bp: massage. ]
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Kim Phillips <kim.phillips@amd.com>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/20190228153558.127292-4-Yazen.Ghannam@amd.com
The AMD64 EDAC module currently hardcodes the EDAC channel layer size
count to two. Future AMD systems may have more channels than this.
Set the EDAC channel layer size equal to the maximum number of channels
possible for the system. On Family 17h and later, this is set in the
num_umcs variable. Older systems will continue to use two as the
default.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/20190325203319.7603-1-Yazen.Ghannam@amd.com
The first few models of Family 17h all had 2 Unified Memory Controllers
per Die, so this was treated as a fixed value. However, future systems
may have more Unified Memory Controllers per Die.
Related to this, the channel number and base address of a Unified Memory
Controller were found by matching on fixed, known values. However,
current and future systems follow this pattern for the channel number
and base address of a Unified Memory Controller: 0xYXXXXX, where Y is
the channel number. So matching on hardcoded values is not necessary.
Set the number of Unified Memory Controllers at driver init time based
on the family/model. Also, update the functions that find the channel
number and base address of a Unified Memory Controller to support more
than two.
[ bp: Move num_umcs into the .c file and simplify comment. ]
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Kim Phillips <kim.phillips@amd.com>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/20190228153558.127292-3-Yazen.Ghannam@amd.com
Define and use a macro for looping over the number of Unified Memory
Controllers.
No functional change.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Kim Phillips <kim.phillips@amd.com>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/20190228153558.127292-2-Yazen.Ghannam@amd.com
Add the new Family 17h Model 30h PCI IDs to the AMD64 EDAC module.
This also fixes a probe failure that appeared when some other PCI IDs
for Family 17h Model 30h were added to the AMD NB code.
Fixes: be3518a16e (x86/amd_nb: Add PCI device IDs for family 17h, model 30h)
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Kim Phillips <kim.phillips@amd.com>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/20190228153558.127292-1-Yazen.Ghannam@amd.com
Stratix10 Double Bit Error Address was always read from SDRAM Address
register instead of each device's Address register.
To determine which device had the DBE, cycle through the EDAC devices
comparing the DBE value to the db_irq value. Once found, report the DBE
Address from the device registers as well as the device name.
Finally, notify the system via an SMC call and indicate the panic should
result in a system reboot. Change a run-time check to a Stratix10
compile-time check for a clean SMC notification.
Fixes: d5fc912556 ("EDAC, altera: Combine Stratix10 and Arria10 probe functions")
Signed-off-by: Thor Thayer <thor.thayer@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/1552490842-25440-1-git-send-email-thor.thayer@linux.intel.com
The following Kconfig constellations fail randconfig builds:
CONFIG_ACPI_NFIT=y
CONFIG_EDAC_DEBUG=y
CONFIG_EDAC_SKX=m
CONFIG_EDAC_I10NM=y
or
CONFIG_ACPI_NFIT=y
CONFIG_EDAC_DEBUG=y
CONFIG_EDAC_SKX=y
CONFIG_EDAC_I10NM=m
with:
...
CC [M] drivers/edac/skx_common.o
...
.../skx_common.o:.../skx_common.c:672: undefined reference to `__this_module'
That is because if one of the two drivers - skx_edac or i10nm_edac - is
built-in and the other one is a module, the shared file skx_common.c
gets linked into a module object by kbuild. Therefore, when linking that
same file into vmlinux, the '__this_module' symbol used in debugfs isn't
defined, leading to the above error.
Fix it by moving all debugfs code from skx_common.c to both skx_base.c
and i10nm_base.c respectively. Thus, skx_common.c doesn't refer to the
'__this_module' symbol anymore.
Clarify skx_common.c's purpose at the top of the file for future
reference, while at it.
[ bp: Make text more readable. ]
Fixes: d4dc89d069 ("EDAC, i10nm: Add a driver for Intel 10nm server processors")
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/20190321221339.GA32323@agluck-desk
Pull RAS updates from Borislav Petkov:
"This time around we have in store:
- Disable MC4_MISC thresholding banks on all AMD family 0x15 models
(Shirish S)
- AMD MCE error descriptions update and error decode improvements
(Yazen Ghannam)
- The usual smaller conversions and fixes"
* 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mce: Improve error message when kernel cannot recover, p2
EDAC/mce_amd: Decode MCA_STATUS in bit definition order
EDAC/mce_amd: Decode MCA_STATUS[Scrub] bit
EDAC, mce_amd: Print ExtErrorCode and description on a single line
EDAC, mce_amd: Match error descriptions to latest documentation
x86/MCE/AMD, EDAC/mce_amd: Add new error descriptions for some SMCA bank types
x86/MCE/AMD, EDAC/mce_amd: Add new McaTypes for CS, PSP, and SMU units
x86/MCE/AMD, EDAC/mce_amd: Add new MP5, NBIO, and PCIE SMCA bank types
RAS: Add a MAINTAINERS entry
RAS: Use consistent types for UUIDs
x86/MCE/AMD: Carve out the MC4_MISC thresholding quirk
x86/MCE/AMD: Turn off MC4_MISC thresholding on all family 0x15 models
x86/MCE: Switch to use the new generic UUID API
* New i10nm EDAC driver for Intel 10nm CPUs (Qiuxu Zhuo and Tony Luck)
* Altera SDRAM functionality carveout for separate enablement of RAS and
SDRAM capabilities on some Altera chips. (Thor Thayer)
* The usual round of cleanups and fixes
Last but not least:
* Recruit James Morse as a reviewer for the ARM side
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAlx+SqYACgkQEsHwGGHe
VUpwxg//fDdlIcnNjPUKWcBQxfy7meFd5xlDwbbIbkdE1mfHLBP6n2gRVM9NguSm
shYPXcdqIrFTn4D7nOxVLS2Gqa7cF/j9M+YaqTNfe9/OVI0oSeM84D2+kEUi2tHQ
LkCbBL9W+SAk4wjcFUPrEuwPaABfPdt0g9wuEf3Yg+PQsZ4FojwF7p91plBiKo/X
GewLIM4+QT/mIkyn5u+2UJWayUvtdc1nchBGg3klYaDTRsUqH9pn284bInj7/Woj
r34288yXuksIhDnUd2h4F9RCdZegBLIZf/k7Rqdg+Acot64c3PprE+/SI9nFcYfn
fcF/48Sv6vMfP5kDKeJhsDjWu85VdpP+Cp4bxebXx4NURWn30kyYGDdpvbpgWxzc
XDOXiEDxfh43/dNEyqCRr86dcZS8ro1pQNlnQvxOJyMljdEGjbB4JizG2ZvVluBP
hSu3ifgpTiBGJMRQQijha41SMuWE7Z1ZgZt/XnyPAKwEEFtQVrm7IfnDohag3VYw
6kWMVeyenmx/yF1JmA0fTxAdeeZPMnbUx0JxHRo1wJXF+1b19b0P+1nYUjgKlXQN
Wq78DGPkQ9InfISFegS/A2AMWk+ZgLZ5d4pVwRVWdyeOMQVUoXO4R3KQur1tV7gu
vm5BpWRZUszhcVvuhly8fOTyOsudYsNe7EeMd2V0Q2FZBy81MH8=
=A1kA
-----END PGP SIGNATURE-----
Merge tag 'edac_for_5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp
Pull EDAC updates from Borislav Petkov:
- A new EDAC AST 2500 SoC driver (Stefan M Schaeckeler)
- New i10nm EDAC driver for Intel 10nm CPUs (Qiuxu Zhuo and Tony Luck)
- Altera SDRAM functionality carveout for separate enablement of RAS
and SDRAM capabilities on some Altera chips. (Thor Thayer)
- The usual round of cleanups and fixes
And last but not least: recruit James Morse as a reviewer for the ARM
side.
* tag 'edac_for_5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
EDAC/altera: Add separate SDRAM EDAC config
EDAC, altera: Add missing of_node_put()
EDAC, skx_common: Add code to recognise new compound error code
EDAC, i10nm: Fix randconfig builds
EDAC, i10nm: Add a driver for Intel 10nm server processors
EDAC, skx_edac: Delete duplicated code
EDAC, skx_common: Separate common code out from skx_edac
EDAC: Do not check return value of debugfs_create() functions
EDAC: Add James Morse as a reviewer
dt-bindings, EDAC: Add Aspeed AST2500
EDAC, aspeed: Add an Aspeed AST2500 EDAC driver
The CONFIG_ALTERA_EDAC Kconfig symbol always enables the SDRAM EDAC
functionality. On the newer architectures, however, there are cases
where the peripheral EDAC functionality is enabled but SDRAM needs to be
disabled.
Move SDRAM functions so they can be contained inside the conditional
CONFIG. Create new CONFIG option just for SDRAM.
[ bp: Massage commit message. ]
Signed-off-by: Thor Thayer <thor.thayer@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: dinguyen@kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: linux@armlinux.org.uk
Link: https://lkml.kernel.org/r/1551121006-4657-2-git-send-email-thor.thayer@linux.intel.com
Previous AMD systems have had a bit in MCA_STATUS to indicate that an
error was detected on a scrub operation. However, this bit was defined
differently within different banks and families/models.
Starting with Family 17h, MCA_STATUS[40] is either Reserved/Read-as-Zero
or defined as "Scrub", for all MCA banks and CPU models. Therefore, this
bit can be defined as the "Scrub" bit.
Define MCA_STATUS[40] as "Scrub" and decode it in the AMD MCE decoding
module for Family 17h and newer systems.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Morse <james.morse@arm.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Pu Wen <puwen@hygon.cn>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190212212417.107049-1-Yazen.Ghannam@amd.com
The call to of_parse_phandle() returns a node pointer with refcount
incremented thus it must be explicitly decremented here after the last
usage.
Signed-off-by: Huang Zijiang <huang.zijiang@zte.com.cn>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Thor Thayer <thor.thayer@linux.intel.com>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: wang.yi59@zte.com.cn
Link: https://lkml.kernel.org/r/1550126347-27984-1-git-send-email-huang.zijiang@zte.com.cn
A new error code for systems that use DRAM as an extra level of cache
looks like:
000F 0010 1MMM CCCC
where the MMM and CCCC bits are used for the same purpose as the
original code. For this new class of errors the ADXL translation will
provide details of both the DIMM used as cache for the error location
and the component that is being cached.
Note: This new error code is first supported in Skylake. Older EDAC
drivers do not need to be updated.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Aristeu Rozanski <aris@redhat.com>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/20190205182109.27828-1-tony.luck@intel.com
Save a log line by printing the extended error code and the description
on a single line. This is similar to how errors are printed in other
subsystems, e.g. "#, description". If we don't have a valid description
then only the number/code is printed.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: x86@kernel.org
Link: https://lkml.kernel.org/r/20190201225534.8177-6-Yazen.Ghannam@amd.com
Update the error descriptions to match the latest documentation for
easier searching. In some cases the changes are small and in other cases
the changes may be total rewording of the description.
No functional changes.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: x86@kernel.org
Link: https://lkml.kernel.org/r/20190201225534.8177-5-Yazen.Ghannam@amd.com
Some SMCA bank types on future systems will report new error types even
though the bank type is not treated as a new version. These new error
types will reported by bits that are reserved in past systems.
Add the new error descriptions to the lists in edac_mce_amd.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Shirish S <Shirish.S@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190201225534.8177-4-Yazen.Ghannam@amd.com
The existing CS, PSP, and SMU SMCA bank types will see new versions (as
indicated by their McaTypes) in future SMCA systems.
Add the new (HWID, MCATYPE) tuples for these new versions. Reuse the
same names as the older versions, since they are logically the same to
the user. SMCA systems won't mix and match IP blocks with different
McaType versions in the same system, so there isn't a need to
distinguish them. The MCA_IPID register is saved when logging an MCA
error, and that can be used to triage the error.
Also, add the new error descriptions to edac_mce_amd. Some error types
(positions in the list) are overloaded compared to the previous
McaTypes. Therefore, just create new lists of the error descriptions to
keep things simple even if some of the error descriptions are the same
between versions.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Pu Wen <puwen@hygon.cn>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: Shirish S <Shirish.S@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190201225534.8177-3-Yazen.Ghannam@amd.com
This driver supports the Intel 10nm series server integrated memory
controller. It gets the memory capacity and topology information by
reading the registers in PCI configuration space and memory-mapped I/O.
It decodes the memory error address to the platform specific address
by using the ACPI Address Translation (ADXL) Device Specific Method
(DSM).
Co-developed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/20190130191519.15393-5-tony.luck@intel.com
Delete the duplicated code from skx_edac.c and rename skx_edac.c to
skx_base.c. Update the Makefile to build the skx_edac driver from
skx_base.c and skx_common.c.
Add SPDX to skx_base.c and clean out unnecessary #include lines.
[ bp: Drop the license boilerplate - there's an SPDX identifier now. ]
Co-developed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/20190130191519.15393-4-tony.luck@intel.com
Parts of skx_edac can be shared with the Intel 10nm server EDAC driver.
Carve out the common parts from skx_edac in preparation to support both
skx_edac driver and i10nm_edac drivers.
Co-developed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/20190130191519.15393-3-tony.luck@intel.com
When calling debugfs functions, there is no need to ever check the
return value. The function can work or not, but the code logic should
never do something different based on this.
[ bp: Make edac_debugfs_init() return void too, while at it. ]
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/20190122152151.16139-17-gregkh@linuxfoundation.org
Gate error injection feature with CONFIG_EDAC_DEBUG so that it is not
visible in production setups.
Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: York Sun <york.sun@nxp.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/20181119225303.13265-1-york.sun@nxp.com
Current debugfs shows the decoded result in its own print format which
is inconvenient for analysis/statistics.
Use skx_mce_check_error() instead of skx_decode() for debugfs,
then the decoded result is showed via EDAC core in a more readable
format like "CPU_SrcID#[0-9]_MC#[0-9]_Chan#[0-9]_DIMM#[0-9]".
Print a warning the first time this interface is used so the
administrator can see the console log that error(s) have been faked.
Co-developed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
CC: Mauro Carvalho Chehab <mchehab@kernel.org>
CC: arozansk@redhat.com
CC: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1542353705-13531-1-git-send-email-qiuxu.zhuo@intel.com
The debugfs node is /sys/kernel/debug/skx_edac_test. Rename it and move
under EDAC debugfs root directory. Remove the unused 'skx_fake_addr' and
remove the 'skx_test' on error.
Co-developed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
CC: Mauro Carvalho Chehab <mchehab@kernel.org>
CC: arozansk@redhat.com
CC: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1542353684-13496-1-git-send-email-qiuxu.zhuo@intel.com
Some debug/error strings in hex formatting do not have the '0x' prefix.
Prepend hex formatting with '0x' for them, but with one exception:
"Couldn't enable %04x:%04x", instead of putting '0x' in this line,
add the word 'device'. We commonly use 8086:1234 without the leading
'0x' (e.g. as '-d' argument to lspci(8) and setpci(8) commands).
Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
CC: Mauro Carvalho Chehab <mchehab@kernel.org>
CC: Tony Luck <tony.luck@intel.com>
CC: arozansk@redhat.com
CC: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1542353660-13458-1-git-send-email-qiuxu.zhuo@intel.com
The order of function calling in skx_exit() is not the reversed order in
skx_init(). Fix it.
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
CC: Mauro Carvalho Chehab <mchehab@kernel.org>
CC: Tony Luck <tony.luck@intel.com>
CC: arozansk@redhat.com
CC: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1542353616-13421-1-git-send-email-qiuxu.zhuo@intel.com
... and use the single edac_subsys object returned from
subsys_system_register(). The idea is to have a single bus
and multiple devices on it.
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
CC: Aristeu Rozanski Filho <arozansk@redhat.com>
CC: Greg KH <gregkh@linuxfoundation.org>
CC: Justin Ernst <justin.ernst@hpe.com>
CC: linux-edac <linux-edac@vger.kernel.org>
CC: Mauro Carvalho Chehab <mchehab@kernel.org>
CC: Russ Anderson <rja@hpe.com>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20180926152752.GG5584@zn.tnic
Nobody(*) uses them. Dropping this will allow us to make the total
number of memory controllers configurable (as we won't have to worry
about duplicated device names under this directory).
(*) https://lkml.kernel.org/r/20180927221054.580220e5@coco.lan
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
CC: Aristeu Rozanski Filho <arozansk@redhat.com>
CC: Greg KH <gregkh@linuxfoundation.org>
CC: Justin Ernst <justin.ernst@hpe.com>
CC: Mauro Carvalho Chehab <mchehab@kernel.org>
CC: Russ Anderson <rja@hpe.com>
CC: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20181001224313.GA9487@agluck-desk
It was previously noted that Kconfig complained about unmet dependencies
when trying to configure skx_edac together with CONFIG_ACPI=n. First fix
for this checked for ACPI when doing
select ACPI_ADXL
but this required stub functions for the case where ACPI wasn't
selected. It also allowed building a driver that didn't actually work
for a system that has non-volatile DIMMs.
Arnd Bergmann pointed out that the right fix is to make EDAC_SKX
"depend on ACPI".
Fixes: a324e9396c ("EDAC, skx: Fix randconfig builds")
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
CC: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
CC: Arnd Bergmann <arnd@arndb.de>
CC: Mauro Carvalho Chehab <mchehab@kernel.org>
CC: linux-edac <linux-edac@vger.kernel.org>
CC: qiuxu.zhuo@intel.com
Link: http://lkml.kernel.org/r/20181106183914.GA26731@agluck-desk
Fix this gcc -Wunused-but-set-variable warning:
drivers/edac/i82975x_edac.c:378:16: warning: variable 'dtype'
set but not used [-Wunused-but-set-variable]
It was introduced in
084a4fccef ("edac: move dimm properties to struct dimm_info")
but never used.
Also, remove the function i82975x_dram_type() and move the comment and
the assignment to the place where it is used.
[ bp: massage commit message and shorten comment. ]
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
CC: "Arvind R." <arvino55@gmail.com>
CC: Mauro Carvalho Chehab <mchehab@kernel.org>
CC: ravi@jetztechnologies.com
CC: arvino55@gmail.com
CC: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20181107022237.14048-1-yuehaibing@huawei.com
irq_handled isn't initialized to false on function entry. However, it is
not really needed and the IRQ handler return value can be set directly
instead.
[ bp: rewrite commit message. ]
Fixes: 27450653f1 ("drivers: edac: Add EDAC driver support for QCOM SoCs")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
CC: Channagoud Kadabi <ckadabi@codeaurora.org>
CC: Mauro Carvalho Chehab <mchehab@kernel.org>
CC: Venkata Narendra Kumar Gutta <vnkgutta@codeaurora.org>
CC: kernel-janitors@vger.kernel.org
CC: linux-arm-msm@vger.kernel.org
CC: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20181018141522.rywdvjmlpk4ticiw@kili.mountain
* ACPI_ADXL build fix
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAlvcKnEACgkQEsHwGGHe
VUr45g/9E67lU84Dz41ly6zFDTmQdYBNPRayz9QgIGHfIMwIN8aAVoezC8B4NCqc
8rQ3W48REkewLmPO2GoEVld4UnHbvVlZYZ4bGcxhYzWL5dcleoJIFVupF4Ifo6/Y
SPbVUyihtw4FFr+Ft8x/bOJQ6QQ4CIiVX3mJBdwdqQ6Lm9yoEz6AlSbTJiyyzr8I
gGfcKD5TcmJWpsDzRXJ/xWddfA+hfUpKxkuJqPIRZvmKnJpy79af8MlQAZwXuXVS
361wj5SzP0LzktT5JQn73V04NzSSDTbFSycnWXUex3lxIsE6KolsEwfglccdhvIy
Nz/Du4kJn1Ye/zbsO27qtkGCXSz0qYKsdfUg1RY+MnZPe4mFmmAm0izTKH3EltQx
+OQWtcBz5vvNf3Odwnfw1nNvYrbnzaDNIsjHspopVrsmD2oRefmpsYsOEzGyAGDw
PtUC3l70u7i4e2NtF7Doo23g1yIyXWQZtrEDDFmQZGUo7YKxE7AkGPzINxpNSQME
z11ny9GyISaUc/Zf5zUHYmIFcYtiCngecl4F8hCXvfyp4MbxYgTI+5YY185NfLSU
pQHwyMGKLifI6ndhWd3sO9KSCqFzSZKaVF/DdKScSqB+v1NqflMyjzfulR625/5U
cSWWQP8Zu6H7os/X3+o5KCC/Pzbs5Nx/QPLwCabrOwcI3kdmQ2E=
=JQUK
-----END PGP SIGNATURE-----
Merge tag 'edac_for_4.20_2' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp
Pull more EDAC updates from Borislav Petkov:
"The second part of the EDAC pile which contains the ADXL user and a
build fix which addresses a not-so-sensical .config but fixes
randconfig builds people do:
- skx_edac: Address translation for NVDIMMs (Tony Luck and Qiuxu Zhuo)
- ACPI_ADXL build fix"
[ I don't think "sensical" is a word, particularly when used in the
context of actually meaning "nonsensical", but I like it - Linus ]
* tag 'edac_for_4.20_2' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
EDAC, skx: Fix randconfig builds
EDAC, skx_edac: Add address translation for non-volatile DIMMs
The driver depends on the ADXL component glue and selects it. However,
ADXL itself implicitly depends on ACPI and in nonsensical randconfig
builds like this:
# CONFIG_ACPI is not set
CONFIG_ACPI_ADXL=y
where ACPI is not enabled, the build fails with:
drivers/edac/skx_edac.o: In function `skx_mce_check_error':
skx_edac.c:(.text+0xab): undefined reference to `adxl_decode'
drivers/edac/skx_edac.o: In function `skx_init':
skx_edac.c:(.init.text+0x8bf): undefined reference to `adxl_get_component_names'
make: *** [vmlinux] Error 1
Add stubs for that case so that the build succeeds. CONFIG_ACPI=n
doesn't make any sense for real configurations but this fix will at
least silence randconfig builds.
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
The most noteworthy SoC driver changes this time include:
- The TEE subsystem gains an in-kernel interface to access the TEE
from device drivers.
- The reset controller subsystem gains a driver for the Qualcomm
Snapdragon 845 Power Domain Controller.
- The Xilinx Zynq platform now has a firmware interface for its
platform management unit. This contains a firmware "ioctl" interface
that was a little controversial at first, but the version we merged
solved that by not exposing arbitrary firmware calls to user space.
- The Amlogic Meson platform gains a "canvas" driver that is used
for video processing and shared between different high-level drivers.
The rest is more of the usual, mostly related to SoC specific power
management support and core drivers in drivers/soc:
- Several Renesas SoCs (RZ/G1N, RZ/G2M, R-Car V3M, RZ/A2M) gain new
features related to power and reset control.
- The Mediatek mt8183 and mt6765 SoC platforms gain support for
their respective power management chips.
- A new driver for NXP i.MX8, which need a firmware interface for
power management.
- The SCPI firmware interface now contains support estimating power
usage of performance states
- The NVIDIA Tegra "pmc" driver gains a few new features, in particular
a pinctrl interface for configuring the pads.
- Lots of small changes for Qualcomm, in particular the "smem"
device driver.
- Some cleanups for the TI OMAP series related to their sysc
controller.
Additional cleanups and bugfixes in SoC specific drivers include the
Meson, Keystone, NXP, AT91, Sunxi, Actions, and Tegra platforms.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJb1zEhAAoJEGCrR//JCVInnYQP/1pPXWsR/DV4COf4kGJFSAFn
EfHXJM1vKtb7AWl6SClpHFlUMt+fvL+dzDNJ9aeRr2GjcuWfzKDcrBM1ZvM70I31
C1Oc3b6OXEERCozDpRg/Vt8OpIvvWnVpaVffS9E5y6KqF8KZ0UbpWIxUJ87ik44D
UvNXYOU/LUGPxR1UFm5rm2zWF4i+rBvqnpVaXbeOsXsLElzxXVfv2ymhhqIpo2ws
o6e00DSjUImg8hLL4HCGFs2EX1KSD+oFzYaOHIE0/DEaiOnxVOpMSRhX2tZ+tRRb
DekbjL+wz5gOAKJTQfQ2sNNkOuK8WFqmE5G0RJ0iYPXuNsB/17UNb2bhTJeqGdcD
dqCQBLQuDUD2iHJ/d4RK5Kx3a8h2X63n5bdefgF5UX/2RBpXwFk1QtHr8X0DuY8c
o/dPGFNBOn3egzMyXrD5VEtnaTwK1Y6/h09qfuOOF1ZuYDmELKRkWMV9l8dIsvd8
ANjaw5B8MOUAf8DccBmPgUGu0XLCDyuFGqNVd9Kj5u3az+tyggIsgkEjWg1pxTv0
7dDDyv4Ara1V1HVDZ23l3CgmYCZQx2R/vdpX/DjuDPGEHGjZ5s2TW8P6oegdxtIh
LcTonNoTsRYzMrGD/aqhG/8fYsAScXePa3CLKl1Hrl+wFVV0XcaggH23GwD/k+7S
eDBrEzLkOTxM+WXvsvKY
=c/PQ
-----END PGP SIGNATURE-----
Merge tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC driver updates from Arnd Bergmann:
"The most noteworthy SoC driver changes this time include:
- The TEE subsystem gains an in-kernel interface to access the TEE
from device drivers.
- The reset controller subsystem gains a driver for the Qualcomm
Snapdragon 845 Power Domain Controller.
- The Xilinx Zynq platform now has a firmware interface for its
platform management unit. This contains a firmware "ioctl"
interface that was a little controversial at first, but the version
we merged solved that by not exposing arbitrary firmware calls to
user space.
- The Amlogic Meson platform gains a "canvas" driver that is used for
video processing and shared between different high-level drivers.
The rest is more of the usual, mostly related to SoC specific power
management support and core drivers in drivers/soc:
- Several Renesas SoCs (RZ/G1N, RZ/G2M, R-Car V3M, RZ/A2M) gain new
features related to power and reset control.
- The Mediatek mt8183 and mt6765 SoC platforms gain support for their
respective power management chips.
- A new driver for NXP i.MX8, which need a firmware interface for
power management.
- The SCPI firmware interface now contains support estimating power
usage of performance states
- The NVIDIA Tegra "pmc" driver gains a few new features, in
particular a pinctrl interface for configuring the pads.
- Lots of small changes for Qualcomm, in particular the "smem" device
driver.
- Some cleanups for the TI OMAP series related to their sysc
controller.
Additional cleanups and bugfixes in SoC specific drivers include the
Meson, Keystone, NXP, AT91, Sunxi, Actions, and Tegra platforms"
* tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (129 commits)
firmware: tegra: bpmp: Implement suspend/resume support
drivers: clk: Add ZynqMP clock driver
dt-bindings: clock: Add bindings for ZynqMP clock driver
firmware: xilinx: Add zynqmp IOCTL API for device control
Documentation: xilinx: Add documentation for eemi APIs
MAINTAINERS: imx: include drivers/firmware/imx path
firmware: imx: add misc svc support
firmware: imx: add SCU firmware driver support
reset: Fix potential use-after-free in __of_reset_control_get()
dt-bindings: arm: fsl: add scu binding doc
soc: fsl: qbman: add interrupt coalesce changing APIs
soc: fsl: bman_portals: defer probe after bman's probe
soc: fsl: qbman: Use last response to determine valid bit
soc: fsl: qbman: Add 64 bit DMA addressing requirement to QBMan
soc: fsl: qbman: replace CPU 0 with any online CPU in hotplug handlers
soc: fsl: qbman: Check if CPU is offline when initializing portals
reset: qcom: PDC Global (Power Domain Controller) reset controller
dt-bindings: reset: Add PDC Global binding for SDM845 SoCs
reset: Grammar s/more then once/more than once/
bus: ti-sysc: Just use SET_NOIRQ_SYSTEM_SLEEP_PM_OPS
...
- Sync dtc with upstream version v1.4.7-14-gc86da84d30e4
- Work to get rid of direct accesses to struct device_node name and
type pointers in preparation for removing them. New helpers for
parsing DT cpu nodes and conversions to use the helpers. printk
conversions to %pOFn for printing DT node names. Most went thru
subystem trees, so this is the remainder.
- Fixes to DT child node lookups to actually be restricted to child
nodes instead of treewide.
- Refactoring of dtb targets out of arch code. This makes the support
more uniform and enables building all dtbs on c6x, microblaze, and
powerpc.
- Various DT binding updates for Renesas r8a7744 SoC
- Vendor prefixes for Facebook, OLPC
- Restructuring of some ARM binding docs moving some peripheral bindings
out of board/SoC binding files
- New "secure-chosen" binding for secure world settings on ARM
- Dual licensing of 2 DT IRQ binding headers
-----BEGIN PGP SIGNATURE-----
iQJEBAABCgAuFiEEktVUI4SxYhzZyEuo+vtdtY28YcMFAlvTKWYQHHJvYmhAa2Vy
bmVsLm9yZwAKCRD6+121jbxhw8J5EACMAnrTxWQmXfQXOZEVxztcFavH6LP8mh2e
7FZIZ38jzHXXvl81tAg1nBhzFUU/qtvqW8NDCZ9OBxKvp6PFDNhWu241ZodSB1Kw
MZWy2A9QC+qbHYCC+SB5gOT0+Py3v7LNCBa5/TxhbFd35THJM8X0FP7gmcCGX593
9Ml1rqawT4mK5XmCpczT0cXxyC4TgVtpfDWZH2KgJTR/kwXVQlOQOGZ8a1y/wrt7
8TLIe7Qy4SFRzjhwbSta1PUehyYfe4uTSsXIJ84kMvNMxinLXQtvd7t9TfsK8p/R
WjYUneJskVjtxVrMQfdV4MxyFL1YEt2mYcr0PMKIWxMCgGDAZsHPoUZmjyh/PrCI
uiZtEHn3fXpUZAV/xEHHNirJxYyQfHGiksAT+lPrUXYYLCcZ3ZmqiTEYhGoQAfH5
CQPMuxA6yXxp6bov6zJwZSTZtkXciju8aQRhUhlxIfHTqezmGYeql/bnWd+InNuR
upANLZBh6D2jTWzDyobconkCCLlVkSqDoqOx725mMl6hIcdH9d2jVX7hwRf077VI
5i3CyPSJOkSOLSdB8bAPYfBoaDtH2bthxieUrkkSbIjbwHO1H6a2lxPeG/zah0a3
ePMGhi7J84UM4VpJEi000cP+bhPumJtJrG7zxP7ldXdfAF436sQ6KRptlcpLpj5i
IwMhUQNH+g==
=335v
-----END PGP SIGNATURE-----
Merge tag 'devicetree-for-4.20' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull Devicetree updates from Rob Herring:
"A bit bigger than normal as I've been busy this cycle.
There's a few things with dependencies and a few things subsystem
maintainers didn't pick up, so I'm taking them thru my tree.
The fixes from Johan didn't get into linux-next, but they've been
waiting for some time now and they are what's left of what subsystem
maintainers didn't pick up.
Summary:
- Sync dtc with upstream version v1.4.7-14-gc86da84d30e4
- Work to get rid of direct accesses to struct device_node name and
type pointers in preparation for removing them. New helpers for
parsing DT cpu nodes and conversions to use the helpers. printk
conversions to %pOFn for printing DT node names. Most went thru
subystem trees, so this is the remainder.
- Fixes to DT child node lookups to actually be restricted to child
nodes instead of treewide.
- Refactoring of dtb targets out of arch code. This makes the support
more uniform and enables building all dtbs on c6x, microblaze, and
powerpc.
- Various DT binding updates for Renesas r8a7744 SoC
- Vendor prefixes for Facebook, OLPC
- Restructuring of some ARM binding docs moving some peripheral
bindings out of board/SoC binding files
- New "secure-chosen" binding for secure world settings on ARM
- Dual licensing of 2 DT IRQ binding headers"
* tag 'devicetree-for-4.20' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (78 commits)
ARM: dt: relicense two DT binding IRQ headers
power: supply: twl4030-charger: fix OF sibling-node lookup
NFC: nfcmrvl_uart: fix OF child-node lookup
net: stmmac: dwmac-sun8i: fix OF child-node lookup
net: bcmgenet: fix OF child-node lookup
drm/msm: fix OF child-node lookup
drm/mediatek: fix OF sibling-node lookup
of: Add missing exports of node name compare functions
dt-bindings: Add OLPC vendor prefix
dt-bindings: misc: bk4: Add device tree binding for Liebherr's BK4 SPI bus
dt-bindings: thermal: samsung: Add SPDX license identifier
dt-bindings: clock: samsung: Add SPDX license identifiers
dt-bindings: timer: ostm: Add R7S9210 support
dt-bindings: phy: rcar-gen2: Add r8a7744 support
dt-bindings: can: rcar_can: Add r8a7744 support
dt-bindings: timer: renesas, cmt: Document r8a7744 CMT support
dt-bindings: watchdog: renesas-wdt: Document r8a7744 support
dt-bindings: thermal: rcar: Add device tree support for r8a7744
Documentation: dt: Add binding for /secure-chosen/stdout-path
dt-bindings: arm: zte: Move sysctrl bindings to their own doc
...
Currently, this driver doesn't support address translation for
non-volatile DIMMs.
The ACPI ADXL DSM method provides address translation for both volatile
and non-volatile DIMMs. Enable it to use the ACPI DSM methods if they
are supported and there are non-volatile DIMMs populated on the system.
Co-developed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
CC: Mauro Carvalho Chehab <mchehab@kernel.org>
CC: arozansk@redhat.com
CC: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1540106336-5212-1-git-send-email-qiuxu.zhuo@intel.com
Hygon Dhyana support (Pu Wen)
- sb_edac: New maintainer + fixes (Tony Luck)
Error reporting improvements and fixes (Qiuxu Zhuo)
- ghes_edac: SMBIOS handle type 17 for DIMM locating and per-DIMM error
accounting (Fan Wu)
- altera_edac: Stratix10 support and refactoring (Thor Thayer)
Out of tree addition:
- acpi_adxl: Address Translation interface using an ACPI DSM (Tony Luck)
- the usual amount of other misc fixes and cleanups all over.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAlvRbEYACgkQEsHwGGHe
VUqMXRAArOg2zJd2IuAMH/QxGSpwCIWYAQRN3mhLV7ngnBQ9d12CaadxUbhFLfgK
j70DkvG7a3QzeaXkm/wf0EVro590x7zi7AgAPuOJNx1gdOEJTJpT/Bc7YSN0m8R3
gjruqIVTKuaeY33tmhWN8XoMpX5W8Hmff7j1wrLxDfz/8vTPxZto/XpJaNVkmzUQ
cIsDzOJzedGIeeto/dEqlwMr1sm4qddoTrVJhSsHrtnXKBRQMcxLQVL5LFtvbTPS
tnXNN+xphtknzXWh151YmmEv4pPL34FN6JZOYpfViY/uTtiIuYD3O1Gsl1CVMXV7
XPGAngo5lT+HV6Nb00sv4Ncdkl0vatQVMFWynOM0eCjVU39bD4ZrfL6alxBgx0PH
JEBqaGGZEs71J3/GXpFeFPiy1SiZmUTwncPCouVAwkFu1HPBfRT5yVFLnl53ITg4
Z0Hmdgd/zDkowfiXDQxKg2muFLHQwIYj1L0m0d+YfjYqoTIbPSsOsZMDlokwwHCS
/AUGoi+pQCcexdeTbSfmz/8SS2shLhSudvLGtpehPiM8evff6qgzOvF7mVhaN8M7
gPUIOiiTaC5z68RqbksqlxPdDEfq+A6R+0a2VUYkma2qMHLVCCuKZ9g+XiFBJ55D
f8xDzwrZ4jkVI925bsThz7wxBITGgb6cHGNw3s9XwUUbns2Oz9g=
=HEJK
-----END PGP SIGNATURE-----
Merge tag 'edac_for_4.20' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp
Pull EDAC updates from Borislav Petkov:
"The EDAC tree was busier than usual this cycle as the shortlog below
shows.
Also, this pull request is carrying an ACPI DSM driver which is used
to ask the platform to supply the DIMM location of a reported hardware
error and thus simplify all the EDAC logic when trying to map the
error address to the respective DIMM.
Core EDAC updates:
- amd64_edac: AMD family 0x17, models 0x10-0x2f support (Michael Jin)
Hygon Dhyana support (Pu Wen)
- sb_edac: New maintainer + fixes (Tony Luck) Error reporting
improvements and fixes (Qiuxu Zhuo)
- ghes_edac: SMBIOS handle type 17 for DIMM locating and per-DIMM
error accounting (Fan Wu)
- altera_edac: Stratix10 support and refactoring (Thor Thayer)
Out of tree addition:
- acpi_adxl: Address Translation interface using an ACPI DSM (Tony
Luck)
- the usual amount of other misc fixes and cleanups all over"
* tag 'edac_for_4.20' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: (22 commits)
ACPI/ADXL: Add address translation interface using an ACPI DSM
EDAC, thunderx: Fix memory leak in thunderx_l2c_threaded_isr()
EDAC, skx_edac: Fix logical channel intermediate decoding
EDAC, {i7core,sb,skx}_edac: Fix uncorrected error counting
EDAC, altera: Work around int-to-pointer-cast warnings
EDAC, amd64: Add Hygon Dhyana support
EDAC: Raise the maximum number of memory controllers
arm64: dts: stratix10: Add peripheral EDAC nodes
EDAC, altera: Add Stratix10 peripheral support
EDAC, altera: Merge Stratix10 into the Arria10 SDRAM probe routine
arm64: dts: stratix10: Add SDRAM node
EDAC, altera: Combine Stratix10 and Arria10 probe functions
arm64: dts: stratix10: Additions to EDAC System Manager
EDAC, i7core: Remove set but not used variable pvt
EDAC, ghes: Use CPER module handles to locate DIMMs
EDAC: Correct DIMM capacity unit symbol
EDAC, sb_edac: Fix signedness bugs in *_get_ha() functions
EDAC, sb_edac: Fix reporting for patrol scrubber errors
EDAC, sb_edac: Return early on ADDRV bit and address type test
MAINTAINERS: Update maintainer for drivers/edac/sb_edac.c
...
Pull perf updates from Ingo Molnar:
"The main updates in this cycle were:
- Lots of perf tooling changes too voluminous to list (big perf trace
and perf stat improvements, lots of libtraceevent reorganization,
etc.), so I'll list the authors and refer to the changelog for
details:
Benjamin Peterson, Jérémie Galarneau, Kim Phillips, Peter
Zijlstra, Ravi Bangoria, Sangwon Hong, Sean V Kelley, Steven
Rostedt, Thomas Gleixner, Ding Xiang, Eduardo Habkost, Thomas
Richter, Andi Kleen, Sanskriti Sharma, Adrian Hunter, Tzvetomir
Stoyanov, Arnaldo Carvalho de Melo, Jiri Olsa.
... with the bulk of the changes written by Jiri Olsa, Tzvetomir
Stoyanov and Arnaldo Carvalho de Melo.
- Continued intel_rdt work with a focus on playing well with perf
events. This also imported some non-perf RDT work due to
dependencies. (Reinette Chatre)
- Implement counter freezing for Arch Perfmon v4 (Skylake and newer).
This allows to speed up the PMI handler by avoiding unnecessary MSR
writes and make it more accurate. (Andi Kleen)
- kprobes cleanups and simplification (Masami Hiramatsu)
- Intel Goldmont PMU updates (Kan Liang)
- ... plus misc other fixes and updates"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (155 commits)
kprobes/x86: Use preempt_enable() in optimized_callback()
x86/intel_rdt: Prevent pseudo-locking from using stale pointers
kprobes, x86/ptrace.h: Make regs_get_kernel_stack_nth() not fault on bad stack
perf/x86/intel: Export mem events only if there's PEBS support
x86/cpu: Drop pointless static qualifier in punit_dev_state_show()
x86/intel_rdt: Fix initial allocation to consider CDP
x86/intel_rdt: CBM overlap should also check for overlap with CDP peer
x86/intel_rdt: Introduce utility to obtain CDP peer
tools lib traceevent, perf tools: Move struct tep_handler definition in a local header file
tools lib traceevent: Separate out tep_strerror() for strerror_r() issues
perf python: More portable way to make CFLAGS work with clang
perf python: Make clang_has_option() work on Python 3
perf tools: Free temporary 'sys' string in read_event_files()
perf tools: Avoid double free in read_event_file()
perf tools: Free 'printk' string in parse_ftrace_printk()
perf tools: Cleanup trace-event-info 'tdata' leak
perf strbuf: Match va_{add,copy} with va_end
perf test: S390 does not support watchpoints in test 22
perf auxtrace: Include missing asm/bitsperlong.h to get BITS_PER_LONG
tools include: Adopt linux/bits.h
...
Going primarily by:
https://en.wikipedia.org/wiki/List_of_Intel_Atom_microprocessors
with additional information gleaned from other related pages; notably:
- Bonnell shrink was called Saltwell
- Moorefield is the Merriefield refresh which makes it Airmont
The general naming scheme is: FAM6_ATOM_UARCH_SOCTYPE
for i in `git grep -l FAM6_ATOM` ; do
sed -i -e 's/ATOM_PINEVIEW/ATOM_BONNELL/g' \
-e 's/ATOM_LINCROFT/ATOM_BONNELL_MID/' \
-e 's/ATOM_PENWELL/ATOM_SALTWELL_MID/g' \
-e 's/ATOM_CLOVERVIEW/ATOM_SALTWELL_TABLET/g' \
-e 's/ATOM_CEDARVIEW/ATOM_SALTWELL/g' \
-e 's/ATOM_SILVERMONT1/ATOM_SILVERMONT/g' \
-e 's/ATOM_SILVERMONT2/ATOM_SILVERMONT_X/g' \
-e 's/ATOM_MERRIFIELD/ATOM_SILVERMONT_MID/g' \
-e 's/ATOM_MOOREFIELD/ATOM_AIRMONT_MID/g' \
-e 's/ATOM_DENVERTON/ATOM_GOLDMONT_X/g' \
-e 's/ATOM_GEMINI_LAKE/ATOM_GOLDMONT_PLUS/g' ${i}
done
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: dave.hansen@linux.intel.com
Cc: len.brown@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The count of errors is picked up from bits 52:38 of the machine check
bank status register. But this is the count of *corrected* errors. If an
uncorrected error is being logged, the h/w sets this field to 0. Which
means that when edac_mc_handle_error() is called, the EDAC core will
carefully add zero to the appropriate uncorrected error counts.
Signed-off-by: Tony Luck <tony.luck@intel.com>
[ Massage commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org
Cc: Aristeu Rozanski <aris@redhat.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20180928213934.19890-1-tony.luck@intel.com
Use the for_each_of_cpu_node iterator to iterate over cpu nodes. This
has the side effect of defaulting to iterating using "cpu" node names in
preference to the deprecated (for FDT) device_type == "cpu".
The error messages are removed in the process as it's not the driver's
job to be checking cpu nodes. Any problems with cpu nodes should be
noticed by the architecture code.
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac@vger.kernel.org
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Rob Herring <robh@kernel.org>
The altera edac driver passes a token from a DT resource as
resource_size_t into an SMC call, but casts it to an __iomem pointer and
then a plain void pointer inbetween, mixing three or four incompatible
types in the process. The compiler complains about one of the
conversions:
drivers/edac/altera_edac.c: In function 'altr_init_a10_ecc_block':
drivers/edac/altera_edac.c:1053:10: error: cast to pointer from integer of \
different size [-Werror=int-to-pointer-cast]
base = (void __iomem *)res.start;
^
drivers/edac/altera_edac.c: In function 'altr_edac_a10_probe':
drivers/edac/altera_edac.c:2062:10: error: cast to pointer from integer of \
different size [-Werror=int-to-pointer-cast]
base = (void __iomem *)res.start;
Using a static checker probably also notices the __iomem cast. Solving
this properly isn't trivial, but simply casting to a 'uintptr_t' instead
of 'void __iomem *' makes it less wrong and should avoid the warnings.
Fixes: d5fc912556 ("EDAC, altera: Combine Stratix10 and Arria10 probe functions")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Thor Thayer <thor.thayer@linux.intel.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: David Frey <dpfrey@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Cc: linux-edac@vger.kernel.org
Link: https://lkml.kernel.org/r/20180927100949.973078-1-arnd@arndb.de
Remove the unused local variable pvt:
drivers/edac/i7core_edac.c: In function 'i7core_mce_check_error':
drivers/edac/i7core_edac.c:1818:21: warning: variable 'pvt' set but not used \
[-Wunused-but-set-variable]
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1537841043-108267-1-git-send-email-yuehaibing@huawei.com
The {i3200|i7core|sb|skx}_edac drivers show DIMM capacity using the
wrong unit symbol: 'Mb' - megabit. Fix them by replacing 'Mb' with
'MiB' - mebibyte.
[Tony: These are all "edac_dbg()" messages, so this won't break scripts
that parse console logs.]
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac@vger.kernel.org
Link: https://lkml.kernel.org/r/20180919003433.16475-1-tony.luck@intel.com
A static checker gave the following warnings:
drivers/edac/sb_edac.c:1030 ibridge_get_ha() warn: signedness bug returning '(-22)'
drivers/edac/sb_edac.c:1037 knl_get_ha() warn: signedness bug returning '(-22)'
Both because the functions are declared to return a "u8", but try to
return -EINVAL for the error case.
Fix by returning 0xff (since the caller doesn't look at, or pass on, the
return value).
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20180914201905.GA30946@agluck-desk
Signed-off-by: Borislav Petkov <bp@suse.de>
Add error reporting driver for Single Bit Errors (SBEs) and Double Bit
Errors (DBEs). As of now, this driver supports error reporting for
Last Level Cache Controller (LLCC) of Tag RAM and Data RAM. Interrupts
are triggered when the errors happen in the cache, the driver handles
those interrupts and dumps the syndrome registers.
Signed-off-by: Channagoud Kadabi <ckadabi@codeaurora.org>
Signed-off-by: Venkata Narendra Kumar Gutta <vnkgutta@codeaurora.org>
Co-developed-by: Venkata Narendra Kumar Gutta <vnkgutta@codeaurora.org>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Andy Gross <andy.gross@linaro.org>
sb_edac sometimes reports the wrong DIMM for a memory error found by
the patrol scrubber. That is because the hardware provides only a 4KB
page-aligned address for the error case.
This means that the EDAC driver will point at the DIMM matching offset
0x0 in the 4KB page, but because of interleaving across channels and
ranks, the actual DIMM involved may be different if the error is on some
other cache line within the page.
Therefore, reconstruct the socket/iMC/channel information from the "mce"
structure passed to the EDAC driver. The DIMM cannot be determined, so
pass "dimm=-1" to the EDAC core. It will report that all the DIMMs on
that channel may be affected.
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: Aristeu Rozanski <aris@redhat.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20180907230828.13901-3-tony.luck@intel.com
[ Improve comments on the functions to convert bank number
to memory controller number. Minor cleanup to commit message. ]
Signed-off-by: Tony Luck <tony.luck@intel.com>
[ Massage commit message more. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Users of the mce_register_decode_chain() are called for every logged
error. EDAC drivers should check:
1) Is this a memory error? [bit 7 in status register]
2) Is there a valid address? [bit 58 in status register]
3) Is the address a system address? [bitfield 8:6 in misc register]
The sb_edac driver performed test "1" twice. Waited far too long to
perform check "2". Didn't do check "3" at all.
Fix it by moving the test for valid address from
sbridge_mce_output_error() into sbridge_mce_check_error() and add a test
for the type immediately after. Delete the redundant check for the type
of the error from sbridge_mce_output_error().
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: Aristeu Rozanski <aris@redhat.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20180907230828.13901-2-tony.luck@intel.com
[ Re-word commit message. ]
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Split regmap_config.use_single_rw into use_single_read and
use_single_write. This change enables drivers of devices which only
support bulk operations in one direction to use the regmap_bulk_*()
functions for both directions and have their bulk operation split into
single operations only when necessary.
Update all struct regmap_config instances where use_single_rw==true to
instead set both use_single_read and use_single_write. No attempt was
made to evaluate whether it is possible to set only one of
use_single_read or use_single_write.
Signed-off-by: David Frey <dpfrey@gmail.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Add new device IDs for family 17h, models 10h-2fh.
This is required by amd64_edac_mod in order to properly detect PCI
device functions 0 and 6.
Signed-off-by: Michael Jin <mikhail.jin@gmail.com>
Reviewed-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20180816192840.31166-1-mikhail.jin@gmail.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Extend the driver to check whether segment number and bus number matches
when deciding how to group memory controller PCI devices to CPU sockets.
Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20180724190213.26359-1-msys.mizuma@gmail.com
[ Cleanup commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
In the quest to remove all stack VLA usage from the kernel[1], switch to
using a kmalloc-allocated buffer instead of stack space. This should be
fine since the existing routine is allocating memory too.
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Jan Glauber <jglauber@cavium.com>
Cc: David Daney <david.daney@cavium.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20180629184850.GA37464@beast
Link: https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com [1]
Signed-off-by: Borislav Petkov <bp@suse.de>
Make sure to free and deregister the addrmatch and chancounts devices
allocated during probe in all error paths. Also fix use-after-free in a
probe error path and in the remove success path where the devices were
being put before before deregistration.
Signed-off-by: Johan Hovold <johan@kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Fixes: 356f0a3086 ("i7core_edac: change the mem allocation scheme to make Documentation/kobject.txt happy")
Link: http://lkml.kernel.org/r/20180612124335.6420-2-johan@kernel.org
Signed-off-by: Borislav Petkov <bp@suse.de>
Make sure to use put_device() to free the initialised struct device so
that resources managed by driver core also gets released in the event of
a registration failure.
Signed-off-by: Johan Hovold <johan@kernel.org>
Cc: Denis Kirjanov <kirjanov@gmail.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Fixes: 2d56b109e3 ("EDAC: Handle error path in edac_mc_sysfs_init() properly")
Link: http://lkml.kernel.org/r/20180612124335.6420-1-johan@kernel.org
Signed-off-by: Borislav Petkov <bp@suse.de>
If regmap_write() fails, we should release some resources as done in all
the other error handling paths of the function.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Thor Thayer <thor.thayer@linux.intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20180610174532.22071-1-christophe.jaillet@wanadoo.fr
Fixes: e9918d7faf ("EDAC, altera: Handle SDRAM Uncorrectable Errors on Stratix10")
Signed-off-by: Borislav Petkov <bp@suse.de>
ARM machines all have DMI tables so if they request hw error reporting
through GHES, then the driver should be able to detect DIMMs and report
errors successfully (famous last words :)).
Make the platform-based list x86-specific so that ghes_edac can load on
ARM.
Reported-by: Qiang Zheng <zhengqiang10@huawei.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: James Morse <james.morse@arm.com>
Tested-by: James Morse <james.morse@arm.com>
Tested-by: Qiang Zheng <zhengqiang10@huawei.com>
Link: https://lkml.kernel.org/r/1526039543-180996-1-git-send-email-zhengqiang10@huawei.com
The kbuild test robot reported the following warning:
drivers/edac/altera_edac.c: In function 'ocram_free_mem':
drivers/edac/altera_edac.c:1410:42: warning: cast from pointer to integer
of different size [-Wpointer-to-int-cast]
gen_pool_free((struct gen_pool *)other, (u32)p, size);
^
After adding support for ARM64 architectures, the unsigned long
parameter is 64 bits and causes a build warning on 64-bit configs. Fix
by casting to the correct size (unsigned long) instead of u32.
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Thor Thayer <thor.thayer@linux.intel.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Fixes: c3eea1942a ("EDAC, altera: Add Altera L2 cache and OCRAM support")
Link: http://lkml.kernel.org/r/1526317441-4996-1-git-send-email-thor.thayer@linux.intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Prevent build error when CONFIG_ACPI_NFIT=m and CONFIG_EDAC_SKX=y by
limiting EDAC_SKX based on how ACPI_NFIT is set.
Fixes this build error:
drivers/edac/skx_edac.o: In function `get_nvdimm_info':
../drivers/edac/skx_edac.c:399: undefined reference to `nfit_get_smbios_id'
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Fixes: 58ca9ac146 ("EDAC, skx_edac: Detect non-volatile DIMMs")
Link: http://lkml.kernel.org/r/3af91354-8e19-d2af-1bba-ced8dce053f1@infradead.org
Signed-off-by: Borislav Petkov <bp@suse.de>
... for improved readability. Also, add a local mask variable for the
same reason.
No functional changes.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
The ghes_edac driver obtains memory type from SMBIOS type 17, but it
does not recognize DDR4 and NVDIMM types.
Add support of DDR4 and NVDIMM types. NVDIMM type is denoted by memory
type DDR3/4 and non-volatile.
Reported-by: Robert Elliott <elliott@hpe.com>
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20180509222030.9299-1-toshi.kani@hpe.com
Signed-off-by: Borislav Petkov <bp@suse.de>
On Stratix10, uncorrectable errors are routed to the SError exception
instead of the IRQ exceptions. In Stratix10, uncorrectable SErrors
must be treated as fatal and will cause a panic. Older Altera/Intel
parts printed out a message for UE so do that here using the notifier
framework.
Record the UE in sticky registers that retain the state through a reset.
Check these registers on probe and printout the error on startup.
Signed-off-by: Thor Thayer <thor.thayer@linux.intel.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: mark.rutland@arm.com
Cc: mchehab@kernel.org
Cc: will.deacon@arm.com
Link: http://lkml.kernel.org/r/1526079610-5527-1-git-send-email-thor.thayer@linux.intel.com
[ Remove unused var in s10_edac_dberr_handler(), reorder args. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
The use of the @ghes argument was removed in a previous commit, but
function signature was not updated to reflect this.
Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
Acked-by: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: linux-acpi@vger.kernel.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20180430213358.8319-1-mr.nuke.me@gmail.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Add a null check for ghes_pvt, before dereferencing it. The pointer
could still be null in case the return path is taken before initialising
ghes_pvt in the registration function.
Reviewed-by: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Sughosh Ganu <sughosh.ganu@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: lkml <linux-kernel@vger.kernel.org>
Link: http://lkml.kernel.org/r/1524737809-24475-1-git-send-email-sughosh.ganu@arm.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Tony reported seeing
"Internal error: Can't find EDAC structure"
when injecting correctable errors due to the fact that ghes_edac would
still load even if the whitelist won't hit. Drop the pr_err() in
ghes_edac_report_mem_error() for now due to the hacky way how ghes_edac
depends on ghes.c.
While at it, make ghes_edac_register() return an error if it doesn't hit
in the whitelist as it is the only sensible thing to do in that
situation.
Furthermore, move the call to it to happen last in ghes_probe() so that
GHES initializing properly does not depend on ghes_edac init at all
as latter is only reporting errors and not required for GHES's proper
functioning.
Reviewed-by: Toshi Kani <toshi.kani@hpe.com>
Tested-by: Sughosh Ganu <sughosh.ganu@arm.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20180420182015.zao3olss4tvvlxki@agluck-desk
* misc fixes
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAlrDZ6EACgkQEsHwGGHe
VUpWWA//RG9WqqdXxPa05TxiRFWMiNSH/HSGJQZsc3K7xm2TEGVdDv5thYsxXCaW
kA0oiUJA3zd62HeUIWMeGypm+hLM8T73836aIKcfxMduxQ6DR9nk51IV4RmHsLvT
rd4bUsrjCov6spapqS265LOxZVnpKvxZv0NVTj6pIk0pkeghWMZh63vZtrxVkLsS
6dHsdIDBr0fMhcFBVQ67SEDpMIyMAqziPOP9sv6NNElG4Q93RnZr6KYvQdE32Iev
Z6lIUU/sjVJEuhgp6FjJrhvjc+FmGJGERsFqo3Z8RT9P26Cj3wq0ov8IY/KSBazG
mfJFGbL+O3lppGKjjuAwMGr4R3fZMLQpgLxdYe7VNvOHB9ea7732q9MDGiNkN/u7
z8o+dtvmPWr0Y1ir8te+7LnSQXIAwxMkKP/lMStzzuFUETQDd0RWr7GAKlbrgPOU
+DhiQRPBFS7qpslA7BdqV6Ybr3nhDVZciDYWsvNpDMwdem05TxlFwC5l/BzP9NtM
FmQ3B0rQ5we6gib6UZWUjW3BH9wUgFUpu/WXKJl1unMHwYNe3CvdM8yD5W7mUylh
6SVYAmkZyuJ1Du40+YMeimroe77XV60IqoUFNEzMSFeB4/GxMG6+u2gUqNuTpHpZ
aMmCRj9DxWGxZyDsTc49An0X/JbVyIE/Aq6heMTYScgBvCaD1VE=
=K89j
-----END PGP SIGNATURE-----
Merge tag 'edac_for_4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp
Pull EDAC updates from Borislav Petkov:
"Noteworthy is the NVDIMM support:
- NVDIMM support to EDAC (Tony Luck)
- misc fixes"
* tag 'edac_for_4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
EDAC, sb_edac: Remove variable length array usage
EDAC, skx_edac: Detect non-volatile DIMMs
firmware, DMI: Add function to look up a handle and return DIMM size
acpi, nfit: Add function to look up nvdimm device and provide SMBIOS handle
EDAC: Add new memory type for non-volatile DIMMs
EDAC: Drop duplicated array of strings for memory type names
EDAC, layerscape: Allow building for LS1021A
This removes the entire architecture code for blackfin, cris, frv, m32r,
metag, mn10300, score, and tile, including the associated device drivers.
I have been working with the (former) maintainers for each one to ensure
that my interpretation was right and the code is definitely unused in
mainline kernels. Many had fond memories of working on the respective
ports to start with and getting them included in upstream, but also saw
no point in keeping the port alive without any users.
In the end, it seems that while the eight architectures are extremely
different, they all suffered the same fate: There was one company
in charge of an SoC line, a CPU microarchitecture and a software
ecosystem, which was more costly than licensing newer off-the-shelf
CPU cores from a third party (typically ARM, MIPS, or RISC-V). It seems
that all the SoC product lines are still around, but have not used the
custom CPU architectures for several years at this point. In contrast,
CPU instruction sets that remain popular and have actively maintained
kernel ports tend to all be used across multiple licensees.
The removal came out of a discussion that is now documented at
https://lwn.net/Articles/748074/. Unlike the original plans, I'm not
marking any ports as deprecated but remove them all at once after I made
sure that they are all unused. Some architectures (notably tile, mn10300,
and blackfin) are still being shipped in products with old kernels,
but those products will never be updated to newer kernel releases.
After this series, we still have a few architectures without mainline
gcc support:
- unicore32 and hexagon both have very outdated gcc releases, but the
maintainers promised to work on providing something newer. At least
in case of hexagon, this will only be llvm, not gcc.
- openrisc, risc-v and nds32 are still in the process of finishing their
support or getting it added to mainline gcc in the first place.
They all have patched gcc-7.3 ports that work to some degree, but
complete upstream support won't happen before gcc-8.1. Csky posted
their first kernel patch set last week, their situation will be similar.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJawdL2AAoJEGCrR//JCVInuH0P/RJAZh1nTD+TR34ZhJq2TBoo
PgygwDU7Z2+tQVU+EZ453Gywz9/NMRFk1RWAZqrLix4ZtyIMvC6A1qfT2yH1Y7Fb
Qh6tccQeLe4ezq5u4S/46R/fQXu3Txr92yVwzJJUuPyU0arF9rv5MmI8e6p7L1en
yb74kSEaCe+/eMlsEj1Cc1dgthDNXGKIURHkRsILoweysCpesjiTg4qDcL+yTibV
FP2wjVbniKESMKS6qL71tiT5sexvLsLwMNcGiHPj94qCIQuI7DLhLdBVsL5Su6gI
sbtgv0dsq4auRYAbQdMaH1hFvu6WptsuttIbOMnz2Yegi2z28H8uVXkbk2WVLbqG
ZESUwutGh8MzOL2RJ4jyyQq5sfo++CRGlfKjr6ImZRv03dv0pe/W85062cK5cKNs
cgDDJjGRorOXW7dyU6jG2gRqODOQBObIv3w5efdq5OgzOWlbI4EC+Y5u1Z0JF/76
pSwtGXA6YhwC+9LLAlnVTHG+yOwuLmAICgoKcTbzTVDKA2YQZG/cYuQfI5S1wD8e
X6urPx3Md2GCwLXQ9mzKBzKZUpu/Tuhx0NvwF4qVxy6x1PELjn68zuP7abDHr46r
57/09ooVN+iXXnEGMtQVS/OPvYHSa2NgTSZz6Y86lCRbZmUOOlK31RDNlMvYNA+s
3iIVHovno/JuJnTOE8LY
=fQ8z
-----END PGP SIGNATURE-----
Merge tag 'arch-removal' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pul removal of obsolete architecture ports from Arnd Bergmann:
"This removes the entire architecture code for blackfin, cris, frv,
m32r, metag, mn10300, score, and tile, including the associated device
drivers.
I have been working with the (former) maintainers for each one to
ensure that my interpretation was right and the code is definitely
unused in mainline kernels. Many had fond memories of working on the
respective ports to start with and getting them included in upstream,
but also saw no point in keeping the port alive without any users.
In the end, it seems that while the eight architectures are extremely
different, they all suffered the same fate: There was one company in
charge of an SoC line, a CPU microarchitecture and a software
ecosystem, which was more costly than licensing newer off-the-shelf
CPU cores from a third party (typically ARM, MIPS, or RISC-V). It
seems that all the SoC product lines are still around, but have not
used the custom CPU architectures for several years at this point. In
contrast, CPU instruction sets that remain popular and have actively
maintained kernel ports tend to all be used across multiple licensees.
[ See the new nds32 port merged in the previous commit for the next
generation of "one company in charge of an SoC line, a CPU
microarchitecture and a software ecosystem" - Linus ]
The removal came out of a discussion that is now documented at
https://lwn.net/Articles/748074/. Unlike the original plans, I'm not
marking any ports as deprecated but remove them all at once after I
made sure that they are all unused. Some architectures (notably tile,
mn10300, and blackfin) are still being shipped in products with old
kernels, but those products will never be updated to newer kernel
releases.
After this series, we still have a few architectures without mainline
gcc support:
- unicore32 and hexagon both have very outdated gcc releases, but the
maintainers promised to work on providing something newer. At least
in case of hexagon, this will only be llvm, not gcc.
- openrisc, risc-v and nds32 are still in the process of finishing
their support or getting it added to mainline gcc in the first
place. They all have patched gcc-7.3 ports that work to some
degree, but complete upstream support won't happen before gcc-8.1.
Csky posted their first kernel patch set last week, their situation
will be similar
[ Palmer Dabbelt points out that RISC-V support is in mainline gcc
since gcc-7, although gcc-7.3.0 is the recommended minimum - Linus ]"
This really says it all:
2498 files changed, 95 insertions(+), 467668 deletions(-)
* tag 'arch-removal' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: (74 commits)
MAINTAINERS: UNICORE32: Change email account
staging: iio: remove iio-trig-bfin-timer driver
tty: hvc: remove tile driver
tty: remove bfin_jtag_comm and hvc_bfin_jtag drivers
serial: remove tile uart driver
serial: remove m32r_sio driver
serial: remove blackfin drivers
serial: remove cris/etrax uart drivers
usb: Remove Blackfin references in USB support
usb: isp1362: remove blackfin arch glue
usb: musb: remove blackfin port
usb: host: remove tilegx platform glue
pwm: remove pwm-bfin driver
i2c: remove bfin-twi driver
spi: remove blackfin related host drivers
watchdog: remove bfin_wdt driver
can: remove bfin_can driver
mmc: remove bfin_sdh driver
input: misc: remove blackfin rotary driver
input: keyboard: remove bf54x driver
...
The Tile architecture is obsolete and getting removed from the kernel,
this driver appears to only be used there, and not on the ARM based
successors (Tile-Mx, BlueField), so we should remove it as well.
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
In preparation for enabling -Wvla, remove VLA and replace it with a
fixed-length array instead.
Also, remove max_interleave as it is no longer needed.
Reviewed-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20180314182131.GA25259@embeddedgus
Signed-off-by: Borislav Petkov <bp@suse.de>
This just covers the topology function of the EDAC driver. We locate
which DIMM slots are populated with NVDIMMs and query the NFIT and
SMBIOS tables to get the size.
Reviewed-by: Jean Delvare <jdelvare@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Cc: Aristeu Rozanski <aris@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: linux-acpi@vger.kernel.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: linux-nvdimm@lists.01.org
Link: http://lkml.kernel.org/r/20180312182430.10335-6-tony.luck@intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
There are now non-volatile versions of DIMMs. Add a new entry to "enum
mem_type" and a new string in edac_mem_types[].
Signed-off-by: Tony Luck <tony.luck@intel.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Aristeu Rozanski <aris@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jean Delvare <jdelvare@suse.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: linux-acpi@vger.kernel.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: linux-nvdimm@lists.01.org
Link: http://lkml.kernel.org/r/20180312182430.10335-3-tony.luck@intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Somehow we ended up with two separate arrays of strings to describe the
"enum mem_type" values.
In edac_mc.c we have an exported list edac_mem_types[] that is used
by a couple of drivers in debug messaged.
In edac_mc_sysfs.c we have a private list that is used to display
values in:
/sys/devices/system/edac/mc/mc*/dimm*/dimm_mem_type
/sys/devices/system/edac/mc/mc*/csrow*/mem_type
This list was missing a value for MEM_LRDDR3.
The string values in the two lists were different :-(
Combining the lists, I kept the values so that the sysfs output
will be unchanged as some scripts may depend on that.
Reported-by: Borislav Petkov <bp@suse.de>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Aristeu Rozanski <aris@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jean Delvare <jdelvare@suse.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: linux-acpi@vger.kernel.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: linux-nvdimm@lists.01.org
Link: http://lkml.kernel.org/r/20180312182430.10335-2-tony.luck@intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
The LS1021A has a memory controller supported by this driver. It builds
just fine, and I've done some rudimentary testing using the error
injection facility, which suggests that it is indeed working.
Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Acked-by: York Sun <york.sun@nxp.com>
Cc: Alexander Stein <alexander.stein@systec-electronic.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20180220150912.2954-1-rasmus.villemoes@prevas.dk
Signed-off-by: Borislav Petkov <bp@suse.de>
Commit
3286d3eb90 ("EDAC, sb_edac: Drop NUM_CHANNELS from 8 back to 4")
decreased NUM_CHANNELS from 8 to 4, but this is not enough for Knights
Landing which supports up to 6 channels.
This caused out-of-bounds writes to pvt->mirror_mode and pvt->tolm
variables which don't pay critical role on KNL code path, so the memory
corruption wasn't causing any visible driver failures.
The easiest way of fixing it is to change NUM_CHANNELS to 6. Do that.
An alternative solution would be to restructure the KNL part of the
driver to 2MC/3channel representation.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Anna Karbownik <anna.karbownik@intel.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: jim.m.snow@intel.com
Cc: krzysztof.paliswiat@intel.com
Cc: lukasz.odzioba@intel.com
Cc: qiuxu.zhuo@intel.com
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: <stable@vger.kernel.org>
Fixes: 3286d3eb90 ("EDAC, sb_edac: Drop NUM_CHANNELS from 8 back to 4")
Link: http://lkml.kernel.org/r/1519312693-4789-1-git-send-email-anna.karbownik@intel.com
[ Massage commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Currently, bank 4 is reserved on Fam17h, so we chose not to initialize
bank 4 in the smca_banks array. This means that when we check if a bank
is initialized, like during boot or resume, we will see that bank 4 is
not initialized and try to initialize it.
This will cause a call trace, when resuming from suspend, due to
rdmsr_*on_cpu() calls in the init path. The rdmsr_*on_cpu() calls issue
an IPI but we're running with interrupts disabled. This triggers:
WARNING: CPU: 0 PID: 11523 at kernel/smp.c:291 smp_call_function_single+0xdc/0xe0
...
Reserved banks will be read-as-zero, so their MCA_IPID register will be
zero. So, like the smca_banks array, the threshold_banks array will not
have an entry for a reserved bank since all its MCA_MISC* registers will
be zero.
Enumerate a "Reserved" bank type that matches on a HWID_MCATYPE of 0,0.
Use the "Reserved" type when checking if a bank is reserved. It's
possible that other bank numbers may be reserved on future systems.
Don't try to find the block address on reserved banks.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org> # 4.14.x
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20180221101900.10326-7-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull x86 PTI and Spectre related fixes and updates from Ingo Molnar:
"Here's the latest set of Spectre and PTI related fixes and updates:
Spectre:
- Add entry code register clearing to reduce the Spectre attack
surface
- Update the Spectre microcode blacklist
- Inline the KVM Spectre helpers to get close to v4.14 performance
again.
- Fix indirect_branch_prediction_barrier()
- Fix/improve Spectre related kernel messages
- Fix array_index_nospec_mask() asm constraint
- KVM: fix two MSR handling bugs
PTI:
- Fix a paranoid entry PTI CR3 handling bug
- Fix comments
objtool:
- Fix paranoid_entry() frame pointer warning
- Annotate WARN()-related UD2 as reachable
- Various fixes
- Add Add Peter Zijlstra as objtool co-maintainer
Misc:
- Various x86 entry code self-test fixes
- Improve/simplify entry code stack frame generation and handling
after recent heavy-handed PTI and Spectre changes. (There's two
more WIP improvements expected here.)
- Type fix for cache entries
There's also some low risk non-fix changes I've included in this
branch to reduce backporting conflicts:
- rename a confusing x86_cpu field name
- de-obfuscate the naming of single-TLB flushing primitives"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (41 commits)
x86/entry/64: Fix CR3 restore in paranoid_exit()
x86/cpu: Change type of x86_cache_size variable to unsigned int
x86/spectre: Fix an error message
x86/cpu: Rename cpu_data.x86_mask to cpu_data.x86_stepping
selftests/x86/mpx: Fix incorrect bounds with old _sigfault
x86/mm: Rename flush_tlb_single() and flush_tlb_one() to __flush_tlb_one_[user|kernel]()
x86/speculation: Add <asm/msr-index.h> dependency
nospec: Move array_index_nospec() parameter checking into separate macro
x86/speculation: Fix up array_index_nospec_mask() asm constraint
x86/debug: Use UD2 for WARN()
x86/debug, objtool: Annotate WARN()-related UD2 as reachable
objtool: Fix segfault in ignore_unreachable_insn()
selftests/x86: Disable tests requiring 32-bit support on pure 64-bit systems
selftests/x86: Do not rely on "int $0x80" in single_step_syscall.c
selftests/x86: Do not rely on "int $0x80" in test_mremap_vdso.c
selftests/x86: Fix build bug caused by the 5lvl test which has been moved to the VM directory
selftests/x86/pkeys: Remove unused functions
selftests/x86: Clean up and document sscanf() usage
selftests/x86: Fix vDSO selftest segfault for vsyscall=none
x86/entry/64: Remove the unused 'icebp' macro
...
x86_mask is a confusing name which is hard to associate with the
processor's stepping.
Additionally, correct an indent issue in lib/cpu.c.
Signed-off-by: Jia Zhang <qianyue.zj@alibaba-inc.com>
[ Updated it to more recent kernels. ]
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bp@alien8.de
Cc: tony.luck@intel.com
Link: http://lkml.kernel.org/r/1514771530-70829-1-git-send-email-qianyue.zj@alibaba-inc.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We should not call edac_mc_del_mc() if a corresponding call to
edac_mc_add_mc() has not been performed yet.
So here, we should go to err instead of err2 to branch at the right
place of the error handling path.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20180107205400.14068-1-christophe.jaillet@wanadoo.fr
Signed-off-by: Borislav Petkov <bp@suse.de>
TI Keystone and DRA7xx SoCs have support for EDAC on DDR3 memory that can
correct one bit errors and detect two bit errors. Add EDAC driver for this
feature which plugs into the generic kernel EDAC framework.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: linux-omap@vger.kernel.org
Link: http://lkml.kernel.org/r/1510578490-14510-1-git-send-email-t-kristo@ti.com
[ Add SPDX tag and make _emif_get_id() use edac_printk(). ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Fix an uninitialized variable warning in the Octeon EDAC driver, as seen
in MIPS cavium_octeon_defconfig builds since v4.14 with Codescape GNU
Tools 2016.05-03:
drivers/edac/octeon_edac-lmc.c In function ‘octeon_lmc_edac_poll_o2’:
drivers/edac/octeon_edac-lmc.c:87:24: warning: ‘((long unsigned int*)&int_reg)[1]’ may \
be used uninitialized in this function [-Wmaybe-uninitialized]
if (int_reg.s.sec_err || int_reg.s.ded_err) {
^
Iinitialise the whole int_reg variable to zero before the conditional
assignments in the error injection case.
Signed-off-by: James Hogan <jhogan@kernel.org>
Acked-by: David Daney <david.daney@cavium.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: linux-mips@linux-mips.org
Cc: <stable@vger.kernel.org> # 3.15+
Fixes: 1bc021e815 ("EDAC: Octeon: Add error injection support")
Link: http://lkml.kernel.org/r/20171113161206.20990-1-james.hogan@mips.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Summary of modules changes for the 4.15 merge window:
- Treewide module_param_call() cleanup, fix up set/get function
prototype mismatches, from Kees Cook
- Minor code cleanups
Signed-off-by: Jessica Yu <jeyu@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIcBAABCgAGBQJaDCyzAAoJEMBFfjjOO8FyaYQP/AwHBy6XmwwVlWDP4BqIF6hL
Vhy3ccVLYEORvePv68tWSRPUz5n6+1Ebqanmwtkw6i8l+KwxY2SfkZql09cARc33
2iBE4bHF98iWQmnJbF6me80fedY9n5bZJNMQKEF9VozJWwTMOTQFTCfmyJRDBmk9
iidQj6M3idbSUOYIJjvc40VGx5NyQWSr+FFfqsz1rU5iLGRGEvA3I2/CDT0oTuV6
D4MmFxzE2Tv/vIMa2GzKJ1LGScuUfSjf93Lq9Kk0cG36qWao8l930CaXyVdE9WJv
bkUzpf3QYv/rDX6QbAGA0cada13zd+dfBr8YhchclEAfJ+GDLjMEDu04NEmI6KUT
5lP0Xw0xYNZQI7bkdxDMhsj5jaz/HJpXCjPCtZBnSEKiL4OPXVMe+pBHoCJ2/yFN
6M716XpWYgUviUOdiE+chczB5p3z4FA6u2ykaM4Tlk0btZuHGxjcSWwvcIdlPmjm
kY4AfDV6K0bfEBVguWPJicvrkx44atqT5nWbbPhDwTSavtsuRJLb3GCsHedx7K8h
ZO47lCQFAWCtrycK1HYw+oupNC3hYWQ0SR42XRdGhL1bq26C+1sei1QhfqSgA9PQ
7CwWH4UTOL9fhtrzSqZngYOh9sjQNFNefqQHcecNzcEjK2vjrgQZvRNWZKHSwaFs
fbGX8juZWP4ypbK+irTB
=c8vb
-----END PGP SIGNATURE-----
Merge tag 'modules-for-v4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux
Pull module updates from Jessica Yu:
"Summary of modules changes for the 4.15 merge window:
- treewide module_param_call() cleanup, fix up set/get function
prototype mismatches, from Kees Cook
- minor code cleanups"
* tag 'modules-for-v4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
module: Do not paper over type mismatches in module_param_call()
treewide: Fix function prototypes for module_param_call()
module: Prepare to convert all module_param_call() prototypes
kernel/module: Delete an error message for a failed memory allocation in add_module_usage()
Pull core locking updates from Ingo Molnar:
"The main changes in this cycle are:
- Another attempt at enabling cross-release lockdep dependency
tracking (automatically part of CONFIG_PROVE_LOCKING=y), this time
with better performance and fewer false positives. (Byungchul Park)
- Introduce lockdep_assert_irqs_enabled()/disabled() and convert
open-coded equivalents to lockdep variants. (Frederic Weisbecker)
- Add down_read_killable() and use it in the VFS's iterate_dir()
method. (Kirill Tkhai)
- Convert remaining uses of ACCESS_ONCE() to
READ_ONCE()/WRITE_ONCE(). Most of the conversion was Coccinelle
driven. (Mark Rutland, Paul E. McKenney)
- Get rid of lockless_dereference(), by strengthening Alpha atomics,
strengthening READ_ONCE() with smp_read_barrier_depends() and thus
being able to convert users of lockless_dereference() to
READ_ONCE(). (Will Deacon)
- Various micro-optimizations:
- better PV qspinlocks (Waiman Long),
- better x86 barriers (Michael S. Tsirkin)
- better x86 refcounts (Kees Cook)
- ... plus other fixes and enhancements. (Borislav Petkov, Juergen
Gross, Miguel Bernal Marin)"
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (70 commits)
locking/x86: Use LOCK ADD for smp_mb() instead of MFENCE
rcu: Use lockdep to assert IRQs are disabled/enabled
netpoll: Use lockdep to assert IRQs are disabled/enabled
timers/posix-cpu-timers: Use lockdep to assert IRQs are disabled/enabled
sched/clock, sched/cputime: Use lockdep to assert IRQs are disabled/enabled
irq_work: Use lockdep to assert IRQs are disabled/enabled
irq/timings: Use lockdep to assert IRQs are disabled/enabled
perf/core: Use lockdep to assert IRQs are disabled/enabled
x86: Use lockdep to assert IRQs are disabled/enabled
smp/core: Use lockdep to assert IRQs are disabled/enabled
timers/hrtimer: Use lockdep to assert IRQs are disabled/enabled
timers/nohz: Use lockdep to assert IRQs are disabled/enabled
workqueue: Use lockdep to assert IRQs are disabled/enabled
irq/softirqs: Use lockdep to assert IRQs are disabled/enabled
locking/lockdep: Add IRQs disabled/enabled assertion APIs: lockdep_assert_irqs_enabled()/disabled()
locking/pvqspinlock: Implement hybrid PV queued/unfair locks
locking/rwlocks: Fix comments
x86/paravirt: Set up the virt_spin_lock_key after static keys get initialized
block, locking/lockdep: Assign a lock_class per gendisk used for wait_for_completion()
workqueue: Remove now redundant lock acquisitions wrt. workqueue flushes
...
Worth noting are the changes to ghes_edac to use a whitelist of
known-good platforms on which GHES error reporting works relatively
reliably. By Toshi Kani and Borislav Petkov.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAloIt2sACgkQEsHwGGHe
VUpzmw/+PETRJYIwp9TnmLHsNjVA/TPFHmjQqYc+jR9sWzTdrsZqI1JbP6GNC4l6
0bMdcyCO5J57F8xLKkgYC5sv24m1vL8MKx0+sQSI+EXupCyqDhhclU3c2rfSj0Um
+qpPV4MnewqJNZlenKKHC+R2wADSsijIpIiMeV682/8HCf0EB68P5m6NRMSqL+hz
IVOD2xAyn+3dDw4ZazJgra5xSVt+xJdt+x0/bvovPapzvF81b+PCB4XXVm4J5d4f
gi0AwDbmyLpMx2HK5vLT5JgcbGg6aUS7OhwcZLSwGuWJTTsYM5uEk+F+dj3pxNge
hk0JSeCLm0S4f60p496VuiweIOrkk4feDbz+RAK9GJaR270K621Zb/06jCybB6aB
htR/VQS50BgCyehYzSWw9VAOcsEvNPHKpYSGkrDwPRkFZOxfigdnKb4MN8XbGund
bpQqx6VlgPhV32v0F1F893IIPHrcuc7qFo4KJPSGnP58+iBOeJ6iy2vlUdtMIfyL
opjS0nqg7B6pXCcANLFRWV0iPTH4e1PWAMGSSxp64cqjuEmHPtlvQGinevBq56Ob
HpxbtnOqen14xxMulkPPHBw3yKWQA0/teAGcSn2D5P/Qv0X3aTGsifr3Ruk/Cl0E
wQaGBNcn+5xJ8nhBS7GdUDdyfE5o9zW8wjw3eZEphREoOM3N8ko=
=XEah
-----END PGP SIGNATURE-----
Merge tag 'edac_for_4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp
Pull EDAC updates from Borislav Petkov:
"The usual pile of bugfixes, cleanups and minor driver enhancements.
Worth noting are the changes to ghes_edac to use a whitelist of
known-good platforms on which GHES error reporting works relatively
reliably. By Toshi Kani and Borislav Petkov"
* tag 'edac_for_4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
EDAC, sb_edac: Fix missing break in switch
MAINTAINERS: Split Cavium EDAC entry and add myself
EDAC, sb_edac: Fix missing DIMM sysfs entries with KNL SNC2/SNC4 mode
EDAC, skx_edac: Handle systems with segmented PCI busses
EDAC, thunderx: Remove suspend/resume support
EDAC, skx_edac: Fix detection of single-rank DIMMs
EDAC, sb_edac: Don't create a second memory controller if HA1 is not present
EDAC: Add owner check to the x86 platform drivers
EDAC: Add helper which returns the loaded platform driver
EDAC, ghes: Add platform check
EDAC, ghes: Model a single, logical memory controller
EDAC, ghes: Remove symbol exports
EDAC: Handle return value of kasprintf()
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Several function prototypes for the set/get functions defined by
module_param_call() have a slightly wrong argument types. This fixes
those in an effort to clean up the calls when running under type-enforced
compiler instrumentation for CFI. This is the result of running the
following semantic patch:
@match_module_param_call_function@
declarer name module_param_call;
identifier _name, _set_func, _get_func;
expression _arg, _mode;
@@
module_param_call(_name, _set_func, _get_func, _arg, _mode);
@fix_set_prototype
depends on match_module_param_call_function@
identifier match_module_param_call_function._set_func;
identifier _val, _param;
type _val_type, _param_type;
@@
int _set_func(
-_val_type _val
+const char * _val
,
-_param_type _param
+const struct kernel_param * _param
) { ... }
@fix_get_prototype
depends on match_module_param_call_function@
identifier match_module_param_call_function._get_func;
identifier _val, _param;
type _val_type, _param_type;
@@
int _get_func(
-_val_type _val
+char * _val
,
-_param_type _param
+const struct kernel_param * _param
) { ... }
Two additional by-hand changes are included for places where the above
Coccinelle script didn't notice them:
drivers/platform/x86/thinkpad_acpi.c
fs/lockd/svc.c
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
preference to ACCESS_ONCE(), and new code is expected to use one of the
former. So far, there's been no reason to change most existing uses of
ACCESS_ONCE(), as these aren't currently harmful.
However, for some features it is necessary to instrument reads and
writes separately, which is not possible with ACCESS_ONCE(). This
distinction is critical to correct operation.
It's possible to transform the bulk of kernel code using the Coccinelle
script below. However, this doesn't handle comments, leaving references
to ACCESS_ONCE() instances which have been removed. As a preparatory
step, this patch converts the Altera EDAC code and comments to use
{READ,WRITE}_ONCE() consistently.
----
virtual patch
@ depends on patch @
expression E1, E2;
@@
- ACCESS_ONCE(E1) = E2
+ WRITE_ONCE(E1, E2)
@ depends on patch @
expression E;
@@
- ACCESS_ONCE(E)
+ READ_ONCE(E)
----
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Thor Thayer <thor.thayer@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: davem@davemloft.net
Cc: linux-arch@vger.kernel.org
Cc: mpe@ellerman.id.au
Cc: shuah@kernel.org
Cc: snitzer@redhat.com
Cc: tj@kernel.org
Cc: viro@zeniv.linux.org.uk
Cc: will.deacon@arm.com
Link: http://lkml.kernel.org/r/1508792849-3115-2-git-send-email-paulmck@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Add missing break statement in order to prevent the code from falling
through.
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20171016174029.GA19757@embeddedor.com
Signed-off-by: Borislav Petkov <bp@suse.de>
When figuring out the size of the DIMMs and the cluster mode is SNC2 or SNC4 the
current algorithm ignores the contribution of some of the channels resulting in
EDAC never knowing of the existence of some DIMMs attached to such channels (thus
sysfs is not populated).
Instead of selectively iterating from 0 to interlv_ways when looking for all the
participants in the interleave, do an exhaustive search and iterate from 0 to
KNL_MAX_CHANNELS. The algorithm is already smart enough to consider participants
only one time.
This works fine in all KNL cluster modes and even when there are missing DIMMs
as the contribution of those channels is 0.
Signed-off-by: Luis Felipe Sandoval Castro <luis.felipe.sandoval.castro@intel.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: arozansk@redhat.com
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: qiuxu.zhuo@intel.com
Link: http://lkml.kernel.org/r/1506606882-90521-1-git-send-email-luis.felipe.sandoval.castro@intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Large systems separate their PCI busses into segments since
the limit of only 256 PCI busses can be too restrictive.
Extend this driver to check whether <segment, bus-number> matches
when deciding how to group memory controller PCI devices to
CPU sockets.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Cc: Aristeu Rozanski <arozansk@redhat.com>
Cc: Charles Rose <charles.rose@dell.com>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/f58abfd10bf73c8bc5adc1fe4de7408128b00625.1506358467.git.tony.luck@intel.com
[ Make skx_dev.seg an int. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
The memory controller on ThunderX/OcteonTX systems does not support
power management. Therefore remove the suspend/resume callbacks.
Signed-off-by: Jan Glauber <jglauber@cavium.com>
Cc: David Daney <david.daney@cavium.com>
Cc: Jan Glauber <jglauber@cavium.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Suzuki K Poulose <Suzuki.Poulose@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Zhangshaokun <zhangshaokun@hisilicon.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: linux-mips@linux-mips.org
Link: http://lkml.kernel.org/r/20170925123502.17289-2-jglauber@cavium.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Single-rank DIMMs did not get detected on Skylake/Kabylake systems due
to wrong limit check. The single rank DIMM check is a simple typo. "0"
is a legal value in this field meaning single rank.
Signed-off-by: Charles Rose <charles.rose@dell.com>
Cc: Aristeu Rozanski <arozansk@redhat.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/66df72d327c265fbf92fe25df96daa228a35f076.1506358467.git.tony.luck@intel.com
[ Also fix debug message to print number of ranks. ]
Signed-off-by: Tony Luck <tony.luck@intel.com>
[ Expand commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Yi Zhang reported the following failure on a 2-socket Haswell (E5-2603v3)
server (DELL PowerEdge 730xd):
EDAC sbridge: Some needed devices are missing
EDAC MC: Removed device 0 for sb_edac.c Haswell SrcID#0_Ha#0: DEV 0000:7f:12.0
EDAC MC: Removed device 1 for sb_edac.c Haswell SrcID#1_Ha#0: DEV 0000:ff:12.0
EDAC sbridge: Couldn't find mci handler
EDAC sbridge: Couldn't find mci handler
EDAC sbridge: Failed to register device with error -19.
The refactored sb_edac driver creates the IMC1 (the 2nd memory
controller) if any IMC1 device is present. In this case only
HA1_TA of IMC1 was present, but the driver expected to find
HA1/HA1_TM/HA1_TAD[0-3] devices too, leading to the above failure.
The document [1] says the 'E5-2603 v3' CPU has 4 memory channels max. Yi
Zhang inserted one DIMM per channel for each CPU, and did random error
address injection test with this patch:
4024 addresses fell in TOLM hole area
12715 addresses fell in CPU_SrcID#0_Ha#0_Chan#0_DIMM#0
12774 addresses fell in CPU_SrcID#0_Ha#0_Chan#1_DIMM#0
12798 addresses fell in CPU_SrcID#0_Ha#0_Chan#2_DIMM#0
12913 addresses fell in CPU_SrcID#0_Ha#0_Chan#3_DIMM#0
12674 addresses fell in CPU_SrcID#1_Ha#0_Chan#0_DIMM#0
12686 addresses fell in CPU_SrcID#1_Ha#0_Chan#1_DIMM#0
12882 addresses fell in CPU_SrcID#1_Ha#0_Chan#2_DIMM#0
12934 addresses fell in CPU_SrcID#1_Ha#0_Chan#3_DIMM#0
106400 addresses were injected totally.
The test result shows that all the 4 channels belong to IMC0 per CPU, so
the server really only has one IMC per CPU.
In the 1st page of chapter 2 in datasheet [2], it also says 'E5-2600 v3'
implements either one or two IMCs. For CPUs with one IMC, IMC1 is not
used and should be ignored.
Thus, do not create a second memory controller if the key HA1 is absent.
[1] http://ark.intel.com/products/83349/Intel-Xeon-Processor-E5-2603-v3-15M-Cache-1_60-GHz
[2] https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/xeon-e5-v3-datasheet-vol-2.pdf
Reported-and-tested-by: Yi Zhang <yizhan@redhat.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Fixes: e2f747b1f4 ("EDAC, sb_edac: Assign EDAC memory controller per h/w controller")
Link: http://lkml.kernel.org/r/20170913104214.7325-1-qiuxu.zhuo@intel.com
[ Massage commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Change x86 EDAC platform drivers to verify the module owner at the
beginning of their module init functions. This allows them to fail their
init immediately when ghes_edac is enabled. Similar change can be made
to other edac drivers if necessary.
Also, remove ".c" from module names of pnp2_edac, sb_edac, and skx_edac.
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Suggested-by: Borislav Petkov <bp@alien8.de>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170823225447.15608-6-toshi.kani@hpe.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Only a single EDAC platform driver can be loaded. When ghes_edac is
enabled, an EDAC platform driver still attempts to register itself and
fails in edac_mc_add_mc().
Add edac_get_owner() so that EDAC platform drivers can check the owner
first.
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Suggested-by: Borislav Petkov <bp@alien8.de>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170823225447.15608-5-toshi.kani@hpe.com
[ Massage commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
The ghes_edac driver was introduced in 2013 [1], but it has not been
enabled by any distro yet. This driver obtains error info from firmware
interfaces (APEI), which are not properly implemented on many platforms,
as the driver says on load:
This EDAC driver relies on BIOS to enumerate memory and get error
reports. Unfortunately, not all BIOSes reflect the memory layout
correctly. So, the end result of using this driver varies from vendor
to vendor. If you find incorrect reports, please contact your hardware
vendor to correct its BIOS.
To get out from this situation, add a platform check to selectively
enable the driver on platforms that are known to have proper APEI
firmware implementation.
"ghes_edac.force_load=1" skips this platform check.
[1]: https://lkml.kernel.org/r/cover.1360931635.git.mchehab@redhat.com
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-acpi@vger.kernel.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170823225447.15608-4-toshi.kani@hpe.com
Signed-off-by: Borislav Petkov <bp@suse.de>
We're enumerating the DIMMs through a DMI walk and since we can't get
any more detailed topological information about which DIMMs belong to
which memory controller, convert it to a single, logical controller
which contains all the DIMMs.
The error reporting path from GHES ghes_edac_report_mem_error() doesn't
get called in NMI context but add a warning about it to catch any
changes in the future as if so, our locking scheme will be insufficient
then.
Signed-off-by: Borislav Petkov <bp@suse.de>
kasprintf() can fail and we must check its return value.
Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Cc: linux-edac@vger.kernel.org
[ Merged into a single patch, small formatting fixups. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
struct mce.cpuid contains CPUID(1).EAX which contains family, model and
stepping and thus has enough information for our purposes. Thus get rid
of some external dependencies which are not really needed.
No functionality change.
Signed-off-by: Borislav Petkov <bp@suse.de>
Make these const as they are only stored in the type field of a device
structure, which is const.
Done using Coccinelle.
Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1503130946-2854-2-git-send-email-bhumirks@gmail.com
Signed-off-by: Borislav Petkov <bp@suse.de>
On Deverton server, the P2SB PCI device (DEV:1F, FUN:1) is used by multiple
device drivers.
If it's hidden by some device driver (e.g. with the i801 I2C driver,
the commit
9424693035 ("i2c: i801: Create iTCO device on newer Intel PCHs")
unconditionally hid the P2SB PCI device wrongly) it will make the
pnd2_edac driver read out an invalid BAR value of 0xffffffff and then
fail on ioremap().
Therefore, store the presence state of P2SB PCI device before unhiding
it for reading BAR and restore the presence state after reading BAR.
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: linux-i2c@vger.kernel.org
Link: http://lkml.kernel.org/r/20170814154845.21663-1-qiuxu.zhuo@intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Bit[0] of BAR is always zero. Bit[2:1] and bit[3] of BAR contain the
information of 'type' and the 'prefetchable' accordingly. Therefore,
mask the lower four bits to retrieve the actual base address of a BAR.
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170814154813.21619-1-qiuxu.zhuo@intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
I've been waing a long time for the generic sideband driver to
appear. Patience has run out, so include the minimum here to
just read registers.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Cc: Aristeu Rozanski <arozansk@redhat.com>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: Patrick Geary <patrickg@supermicro.com>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170803210536.5662-1-tony.luck@intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Basically, there are full memory mirroring and address range partial
memory mirroring (supported by Haswell EX and Broadwell EX) modes.
a) In full memory mirroring, the memory behind each memory controller
is mirrored, i.e. the memory is split into two identical mirrors
(primary and secondary), half of the memory is reserved for redundancy.
b) In address range partial memory mirroring, the memory size (range)
of primary and secondary behind each memory controller can be user
defined by the TAD0 register. The rest of memory ranges defined by
TAD1/TAD2/... in that memory controller are non-mirrored.
For more detail on memory mirroring, see the following link written by Tony Luck:
https://01.org/lkp/blogs/tonyluck/2016/address-range-partial-memory-mirroring-linux
Currently the sb_edac driver only supports address decoding in full
memory mirroring and non-mirroring modes. In address range partial
memory mirroring mode, it may fail to decode an address that falls in a
non-mirroring area (the following was one of this kind of failed logs).
mce: Uncorrected hardware memory error in user-access at 566d53a400
Memory failure: 0x566d53a: Killing einj_mem_uc:4647 due to hardware memory corruption
Memory failure: 0x566d53a: recovery action for dirty LRU page: Recovered
mce: [Hardware Error]: Machine check events logged
EDAC sbridge MC1: HANDLING MCE MEMORY ERROR
EDAC sbridge MC1: CPU 48: Machine Check Event: 0 Bank 7: ec00000000010090
EDAC sbridge MC1: TSC 4b914aa5a99dab
EDAC sbridge MC1: ADDR 566d53a400
EDAC sbridge MC1: MISC 1443a0c86
EDAC sbridge MC1: PROCESSOR 0:406f1 TIME 1499712764 SOCKET 2 APIC 80
EDAC MC1: 0 UE Can't discover the memory rank for ch addr 0x7fb54e900 on any memory ( page:0x0 offset:0x0 grain:32)
mce: [Hardware Error]: Machine check events logged
Therefore, classify memory mirroring modes and make the address decoding
in address range partial memory mode correct.
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170730180651.30060-1-qiuxu.zhuo@intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Now that we have a custom printf format specifier, convert users of
full_name to use %pOF instead. This is preparation to remove storing of
the full path string for each node.
Signed-off-by: Rob Herring <robh@kernel.org>
Cc: devicetree@vger.kernel.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170718214339.7774-19-robh@kernel.org
Signed-off-by: Borislav Petkov <bp@suse.de>
It is a write-only variable so get rid of it.
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Robert Richter <rric@kernel.org>
Acked-by: Michal Simek <michal.simek@xilinx.com>
Acked-by: Thor Thayer <thor.thayer@linux.intel.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: Mark Gross <mark.gross@intel.com>
Cc: Tim Small <tim@buttersideup.com>
Cc: Ranganathan Desikan <ravi@jetztechnologies.com>
Cc: "Arvind R." <arvino55@gmail.com>
Cc: Jason Baron <jbaron@akamai.com>
Cc: "Sören Brinkmann" <soren.brinkmann@xilinx.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Daney <david.daney@cavium.com>
Cc: Loc Ho <lho@apm.com>
Cc: linux-edac@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-mips@linux-mips.org
Using the homegrown amd_get_nb_id() to find a node ID on AMD was fine
while the L3 to node mapping was 1:1. And Zen topology broke this. So
let's start slowly moving away from it and use the topology interfaces
instead.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/1490041614-90057-2-git-send-email-Yazen.Ghannam@amd.com
[ Massage commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Non-existent or empty DIMM slots result in error return from
RD_REGP(). But we shouldn't give up on failure.
So long as we find at least one DIMM we can continue.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170628234407.21521-1-tony.luck@intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
The function sbi_send() is local to just pnd2_edac.c and does not need
to be in global scope, so make it static.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170623084855.9197-1-colin.king@canonical.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Add code comment to make it clear that the fall-through is intentional
and, OR ret with its previous value to avoid overwriting it so that
callers can check the correct return value.
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170622220535.GA4896@embeddedgus
[ Massage a bit. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Use of_address_to_resource() and resource_size() instead of manually
parsing the "reg" property from the "memory" node(s).
Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Tested-by: Thor Thayer <thor.thayer@linux.intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170606235500.22772-3-chris.packham@alliedtelesis.co.nz
Signed-off-by: Borislav Petkov <bp@suse.de>
Xiaolong Ye reported the following failure on Broadwell D server:
EDAC sbridge: Some needed devices are missing
EDAC MC: Removed device 0 for sbridge_edac.c Broadwell SrcID#0_Ha#0: DEV 0000:ff:12.0
EDAC sbridge: Couldn't find mci handler
EDAC sbridge: Failed to register device with error -19.
Broadwell D (only IMC0 per socket) and Broadwell X (IMC0 and IMC1 per
socket) use the same PCI device IDs for IMC0 per socket, then they
share pci_dev_descr_broadwell_table (n_imcs_per_sock=2). In this case,
Broadwell D wrongly creates the nonexistent SOCK EDAC memory controller
and reports above error messages, since it has no IMC1 per socket.
Avoid creating the nonexistent SOCK memory controller.
Reported-and-tested-by: Xiaolong Ye <xiaolong.ye@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170608113351.25323-1-qiuxu.zhuo@intel.com
[ Massage. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
edac_op_state is a module parameter which affects the behaviour of
the driver probe which can potentially be invoked as soon as the
platform driver registration happens. Because of this we need to
ensure that we sanity check the module parameter before calling
platform_register_drivers().
Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170607215530.8604-1-chris.packham@alliedtelesis.co.nz
Signed-off-by: Borislav Petkov <bp@suse.de>
Compare the number of debugfs entries created by
thunderx_create_debugfs_nodes() with the requested number of entries to
properly determine whether to print a warning.
Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@caviumnetworks.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: linux-mips@linux-mips.org
Link: http://lkml.kernel.org/r/20170531155157.93583-1-stemerkhanov@cavium.com
Signed-off-by: Sergey Temerkhanov <s.temerkhanov@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Check the return status of platform_driver_register() in
mv64x60_edac_init(). Only output messages and initialise the
edac_op_state if the registration is successful.
Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/20170529212142.25572-2-chris.packham@alliedtelesis.co.nz
Signed-off-by: Borislav Petkov <bp@suse.de>
Kaby Lake seems to work just like Skylake.
Reported-and-tested-by: Doug Thompson <bc.tdw@recursor.net>
Signed-off-by: Jason Baron <jbaron@akamai.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Tony Luck <tony.luck@intel.com
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1495823683-32569-1-git-send-email-jbaron@akamai.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Collapse 'case:' in *_mci_bind_devs() and update driver version from
1.1.1 to 1.1.2.
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170523000934.87971-1-qiuxu.zhuo@intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
This is based on previous work by Patrick Geary, see Link.
Additional cleanups ontop:
- Remove the code to read MCMTR from pci_ha1_ta and CHN_TO_HA macro,
now that TA0 and TA1 are unified.
- Remove get_pdev_same_bus(), since in get_dimm_config() the
variable "pvt->pci_ta" for KNL is also ready, we can simply use
pci_read_config_dword(pvt->pci_ta, KNL_MCMTR, &pvt->info.mcmtr) to read
MCMTR.
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/57884350.1030401@supermicro.com
Link: http://lkml.kernel.org/r/20170523000910.87925-1-qiuxu.zhuo@intel.com
[ Make __populate_dimms() return int. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
We don't need this quirk anymore now that the EDAC memory controller
representation matches the hardware.
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170523000834.87881-1-qiuxu.zhuo@intel.com
[ Commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Tony pointed out: "currently the driver pretends there is one big
8-channel memory controller per socket instead of 2 4-channel
controllers. This is fine with all memory controller populated with
symmetrical DIMM configurations, but runs into difficulties on
asymmetrical setups".
Restructure the driver to assign an EDAC memory controller to each real
h/w memory controller to resolve the issue.
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170523000731.87793-1-qiuxu.zhuo@intel.com
[ Break some lines at convenient points. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
EDAC assigns logical memory controller numbers in the order that we find
memory controllers, which depends on which PCI bus they are on. Some
systems end up with MC0 on socket0, others (e.g Haswell) have MC0 on
socket3.
All this is made more confusing for users because we use the string
"Socket" while generating names for memory controllers, but the number
that we attach there is the memory controller number. E.g.
EDAC MC0: Giving out device to module sbridge_edac.c controller
Haswell Socket#0: DEV 0000:ff:12.0 (INTERRUPT)
Change the names to say "SrcID#%d" (where the number we use is read from
the h/w associated with the memory controller instead of some logical
number internal to the EDAC driver). New message:
EDAC MC0: Giving out device to module sbridge_edac.c controller
Haswell SrcID#3: DEV 0000:ff:12.0 (INTERRUPT)
Reported-by: Andrey Korolyov <andrey@xdel.ru>
Reported-by: Patrick Geary <patrickg@supermicro.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170523000603.87748-1-qiuxu.zhuo@intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Each of the PCI device IDs belongs to a CPU socket, or to one of the
integrated memory controllers. Provide an enum to specify the domain of
each, and distinguish the resource number in each domain: the number
of the PCI device IDs per integrated memory controller/socket, and the
number of integrated memory controllers per socket.
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170523000533.87704-1-qiuxu.zhuo@intel.com
[ Realign pci_dev_descr_knl members. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
struct irq_domain_ops is not modified, so it can be made const.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Cc: Thor Thayer <thor.thayer@linux.intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170524133505.1233-1-tklauser@distanz.ch
Signed-off-by: Borislav Petkov <bp@suse.de>
The wrong index into the csbases/csmasks arrays was being passed to
the function to compute the chip select sizes, which resulted in the
wrong size being computed. Address that so that the correct values are
computed and printed.
Also, redo how we calculate the number of pages in a CS row.
Reported-by: Benjamin Bennett <benbennett@gmail.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Cc: <stable@vger.kernel.org> # 4.10.x
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1493313114-11260-1-git-send-email-Yazen.Ghannam@amd.com
[ Remove unneeded integer math comment, minor cleanups. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Leave it to the user to decide whether to enable this or not. Otherwise,
platform-specific drivers won't initialize (currently, EDAC supports
only a single platform driver loaded).
Signed-off-by: Borislav Petkov <bp@suse.de>
Move all the EDAC core functionality behind CONFIG_EDAC and get rid of
that indirection. Update defconfigs which had it.
While at it, fix dependencies such that EDAC depends on RAS for the
tracepoints.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: linux-edac@vger.kernel.org
Use mc_devices list instead to check whether we have EDAC driver
instances successfully registered with EDAC core.
Signed-off-by: Borislav Petkov <bp@suse.de>
Apparently, some machines used to report DRAM errors through a PCI SERR
NMI. This is why we have a call into EDAC in the NMI handler. See
c0d1217202 ("drivers/edac: add new nmi rescan").
From looking at the patch above, that's two drivers: e752x_edac.c and
e7xxx_edac.c. Now, I wanna say those are old machines which are probably
decommissioned already.
Tony says that "[t]the newest CPU supported by either of those drivers
is the Xeon E7520 (a.k.a. "Nehalem") released in Q1'2010. Possibly some
folks are still using these ... but people that hold onto h/w for 7
years generally cling to old s/w too ... so I'd guess it unlikely that
we will get complaints for breaking these in upstream."
So even if there is a small number still in use, we did load EDAC with
edac_op_state == EDAC_OPSTATE_POLL by default (we still do, in fact)
which means a default EDAC setup without any parameters supplied on the
command line or otherwise would never even log the error in the NMI
handler because we're polling by default:
inline int edac_handler_set(void)
{
if (edac_op_state == EDAC_OPSTATE_POLL)
return 0;
return atomic_read(&edac_handlers);
}
So, long story short, I'd like to get rid of that nastiness called
edac_stub.c and confine all the EDAC drivers solely to drivers/edac/. If
we ever have to do stuff like that again, it should be notifiers we're
using and not some insanity like this one.
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
The peripherals' RAS functionality only exist on the Arria10 SoCFPGA.
The Cyclone5 initialization generates EDAC warnings when the peripherals
aren't found in the device tree. Fix by checking for Arria10 in the init
functions.
Signed-off-by: Thor Thayer <thor.thayer@linux.intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1491415262-5018-1-git-send-email-thor.thayer@linux.intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
DIMM number passed to edac_mc_handle_error() was accidentally hardcoded
to zero. Pass in the correct daddr->dimm value.
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Provide debugfs function stubs when EDAC_DEBUG is not enabled so that we
don't fail the build:
drivers/edac/pnd2_edac.c: In function ‘pnd2_init’:
drivers/edac/pnd2_edac.c:1521:2: error: implicit declaration of function ‘setup_pnd2_debug’ [-Werror=implicit-function-declaration]
setup_pnd2_debug();
^
drivers/edac/pnd2_edac.c: In function ‘pnd2_exit’:
drivers/edac/pnd2_edac.c:1529:2: error: implicit declaration of function ‘teardown_pnd2_debug’ [-Werror=implicit-function-declaration]
teardown_pnd2_debug();
^
Signed-off-by: Borislav Petkov <bp@suse.de>
Initial target for this driver is the Intel Apollo Lake platform and
Denverton micro-server, they use the same internal memory controller IP
called Pondicherry2.
Memory controller registers are not in PCI config space like earlier
Intel memory controllers. For Apollo Lake platform they are accessed via
a "side-band" interface, for Denverton micro-server they are access via
PCI config space and memory map I/O. This driver is for Apollo Lake and
Denverton, but only the Denverton is fully enabled while we wait for the
sideband driver.
Apollo lake driver and initial cut at Denverton driver by Tony Luck.
Extensive cleanup, refactoring and basic verification by Qiuxu Zhuo.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170308174539.14432-1-qiuxu.zhuo@intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
The MTR_DRAM_WIDTH macro returns the data width. It is sometimes used
as if it returned a boolean true if the width if 8. Fix the tests where
MTR_DRAM_WIDTH is misused.
Signed-off-by: Jérémy Lefaure <jeremy.lefaure@lse.epita.fr>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170309011809.8340-1-jeremy.lefaure@lse.epita.fr
Signed-off-by: Borislav Petkov <bp@suse.de>
Fix spelling mistake in dev_err message.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Loc Ho <lho@apm.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170223002609.9440-1-colin.king@canonical.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Pull RAS updates from Ingo Molnar:
"The main changes in this cycle were:
- Assign notifier chain priorities for all RAS related handlers to
make the ordering explicit (Borislav Petkov)
- Improve the AMD MCA banks sysfs output (Yazen Ghannam)
- Various cleanups and restructuring of the x86 RAS code (Borislav
Petkov)"
* 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/ras, EDAC, acpi: Assign MCE notifier handlers a priority
x86/ras: Get rid of mce_process_work()
EDAC/mce/amd: Dump TSC value
EDAC/mce/amd: Unexport amd_decode_mce()
x86/ras/amd/inj: Change dependency
x86/ras: Flip the TSC-adding logic
x86/ras/amd: Make sysfs names of banks more user-friendly
x86/ras/therm_throt: Do not log a fake MCE for thermal events
x86/ras/inject: Make it depend on X86_LOCAL_APIC=y
Currently, the IPID and Syndrome are printed on the same line as the
Address. There are cases when we can have a valid Syndrome but not a
valid Address.
For example, the MCA_SYND register can be used to hold more detailed
error info that the hardware folks can use. It's not just DRAM ECC
syndromes. There are some error types that aren't related to memory that
may have valid syndromes, like some errors related to links in the Data
Fabric, etc.
In these cases, the IPID and Syndrome are not printed at the same log
level as the rest of the stanza, so users won't see them on the console.
Console:
[Hardware Error]: CPU:16 (17:1:0) MC22_STATUS[Over|CE|MiscV|-|-|-|-|SyndV|-]: 0xd82000000002080b
[Hardware Error]: Power, Interrupts, etc. Extended Error Code: 2
Dmesg:
[Hardware Error]: CPU:16 (17:1:0) MC22_STATUS[Over|CE|MiscV|-|-|-|-|SyndV|-]: 0xd82000000002080b
, Syndrome: 0x000000010b404000, IPID: 0x0001002e00000002
[Hardware Error]: Power, Interrupts, etc. Extended Error Code: 2
Print the IPID first and on a new line. The IPID should always be
printed on SMCA systems. The Syndrome will then be printed with the IPID
and at the same log level when valid:
[Hardware Error]: CPU:16 (17:1:0) MC22_STATUS[Over|CE|MiscV|-|-|-|-|SyndV|-]: 0xd82000000002080b
[Hardware Error]: IPID: 0x0001002e00000002, Syndrome: 0x000000010b404000
[Hardware Error]: Power, Interrupts, etc. Extended Error Code: 2
Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1487192182-2474-1-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Last time we did that was when we enabled Bulldozer. Now, we enabled Zen
so it is only natural ... :-)
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Yazen Ghannam <Yazen.Ghannam@amd.com>
Fix the following sparse warnings:
drivers/edac/fsl_ddr_edac.c:148:1: warning:
symbol 'dev_attr_inject_data_hi' was not declared. Should it be static?
drivers/edac/fsl_ddr_edac.c:150:1: warning:
symbol 'dev_attr_inject_data_lo' was not declared. Should it be static?
drivers/edac/fsl_ddr_edac.c:152:1: warning:
symbol 'dev_attr_inject_ctrl' was not declared. Should it be static?
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170209150424.15124-1-weiyj.lk@gmail.com
Signed-off-by: Borislav Petkov <bp@suse.de>
The L2 cache controller on the T2080 SoC has similar capabilities to the
others already supported by the mpc85xx_edac driver. Add it to the list
of compatible devices.
Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Acked-by: Johannes Thumshirn <jth@kernel.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: devicetree@vger.kernel.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/20170201231624.28843-1-chris.packham@alliedtelesis.co.nz
Signed-off-by: Borislav Petkov <bp@suse.de>
Match one of the devices in amd64_cpuids[] before loading the module.
This is an additional sanity check against users trying to load
amd64_edac_mod on unsupported systems.
Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1485537863-2707-9-git-send-email-Yazen.Ghannam@amd.com
[ Get rid of err_ret label, make it a bit more readable this way. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Having ECC disabled on a node doesn't necessarily mean that it's
disabled for the entire system. So let's return a non-failing code when
ECC is disabled on a node. This way we can skip initialization for the
node but still continue with the remaining nodes.
After probing all instances, make sure we have at least one MC device
allocated.
This issue is seen and fix tested on Fam15h and Fam17h MCM systems.
Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1485537863-2707-8-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Print the node number when informing that DRAM ECC is disabled so
that we can show which nodes have DRAM ECC disabled. Also, print more
detailed system information as edac_dbg(), so as to not bother general
users.
Switch amd64_notice to amd64_info to match the message above it.
Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1485537863-2707-5-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
We have a few functions that register/unregister an ECC error decoding
routine. These functions are called when we init/remove instances.
However, they are global and so don't need to be registered/unregistered
multiple times.
So move them out of the init/remove instance functions and into the
module init/exit routines.
Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1485297149-13733-4-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Users may not be familiar with the concept of deferred errors. There is
no action for users to take on this type of error, so give more context
in the error message to make this more clear.
Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1485297149-13733-2-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
REDMEMB[17] is the ECC_Locator bit, which, when set, identifies the
CS[3:2] as the simbols in error. And thus the second channel.
The macro computing it was wrong so get rid of it (it was used at one
place only) and get rid of the conditional too. Generates better code
this way anyway.
Signed-off-by: Borislav Petkov <bp@suse.de>
Reported-by: David Binderman <dcb314@hotmail.com>
Reviewed-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Assign all notifiers on the MCE decode chain a priority so that they get
called in the correct order.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170123183514.13356-10-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Dump the TSC value of the time when the MCE got logged.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170123183514.13356-8-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
It is not used outside of the driver anymore.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170123183514.13356-7-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Function sbridge_register_mci() sets pvt->info.show_interleave_mode
to knl_show_interleave_mode() on Knight's Landing and
show_interleave_mode() anywhere else.
Merge show_interleave_mode() and knl_show_interleave_mode() in a single
implementation and use it without an indirect function pointer.
Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170122172806.10412-1-nicolas.iooss_linux@m4x.org
[ Call it get_intlv_mode_str(). ]
Signed-off-by: Borislav Petkov <bp@suse.de>
The old csrowX sysfs directories have per-csrow error counters, but the
new dimmX directories do not currently expose error counts.
EDAC already keeps these counts, add them to sysfs so per-DIMM counts
are still available when CONFIG_EDAC_LEGACY_SYSFS=n.
Signed-off-by: Aaron Miller <aaronmiller@fb.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20161103220153.3997328-1-aaronmiller@fb.com
Signed-off-by: Borislav Petkov <bp@suse.de>
We should save the return code from probe_one_instance() so that it can
be returned from the module init function. Otherwise, we'll be returning
the -ENOMEM from above.
Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1484322741-41884-1-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
If ioremap_nocache() fails, it will return NULL. Which will then cause a
NULL-pointer dereference. Handle the returned value properly.
Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Cc: "Arvind R." <arvino55@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1484549092-11349-1-git-send-email-arvind.yadav.cs@gmail.com
[ Boris: massage commit message and improve error message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
The dev_attr_sdram_scrub_rate is not declared in a header or used
anywhere else, so make it static to fix the following warning:
drivers/edac/edac_mc_sysfs.c:816:1: warning: symbol
'dev_attr_sdram_scrub_rate' was not declared. Should it be static?
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Reviewed-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1465407356-7357-1-git-send-email-ben.dooks@codethink.co.uk
Signed-off-by: Borislav Petkov <bp@suse.de>
This was entirely automated, using the script by Al:
PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>'
sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \
$(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)
to do the replacement at the end of the merge window.
Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Some kernel-doc tags don't provide good descriptions or use
a different style. Adjust them.
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Currently, there's no device driver documentation for the EDAC
subsystem at the driver-api book. Fill in the blanks for the
structures and functions that misses documentation, uniform
the word on the existing ones, and add a new edac.rst file at
driver-api, in order to document the EDAC subsystem.
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Several functions are documented at edac_mc.c.
As we'll be including edac_core.h at drivers-api book, move
those, in order for the kernel-doc markups be part of the API
documentation book.
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Several functions are documented at edac_pci.c and edac_pci_sysfs.c.
As we'll be including edac_pci.h at drivers-api book, move those,
in order for the kernel-doc markups be part of the API
documentation book.
As several of those kernel-doc macros are not in the right format,
fix them.
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Several functions are documented at edac_device.c.
As we'll be including edac_core.h at drivers-api book, move those,
in order for the kernel-doc markups be part of the API
documentation book.
As several of those kernel-doc macros are not in the right format,
fix them.
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
The edac_core.h header contain data structures and function
definitions for both EDAC MC and EDAC device.
Let's move the devices ones to a separate header file, as part
of a header reorganization.
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
The edac_core.h header contain data structures and function
definitions for the 3 parts of EDAC: MC, PCI and device.
Let's move the PCI ones to a separate header file, as part
of a header reorganization.
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Prefix the warn and error macros with the respective string so that
callers don't have to say "Error" or "Warning". We save us string length
this way in the actual calls.
While at it, shorten the calls in reserve_mc_sibling_devs().
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Yazen Ghannam <Yazen.Ghannam@amd.com>
How we need to decode UMC errors is different from how we decode bus
errors, so let's define a new function for this. We also need a way to
determine the UMC channel since we're not guaranteed that there is a
fixed relation between channel and MCA bank.
Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/1480359593-80369-1-git-send-email-Yazen.Ghannam@amd.com
[ Fold in decode_synd_reg(), simplify. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
We need to determine the EDAC capabilities from all UMCs on the node. We
should only check UMCs that are enabled and make sure they all agree.
Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/1479423463-8536-15-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
The UMCs on Fam17h are independent memory controllers so we need to
read the capabilities from all UMCs and make sure they agree. Once
we determine what capabilities are available we should save them for
convenience.
Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/1480431116-94683-1-git-send-email-Yazen.Ghannam@amd.com
[ Simplify f17h_determine_edac_ctl_cap(), preinit edac_mode in init_csrows(). ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Read a few more UMC registers and provide debug output in order to be as
similar as possible to older AMD systems.
Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/1480344621-14966-1-git-send-email-Yazen.Ghannam@amd.com
[ Remove unneeded K8 check and comments, fixup others. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Fam17h has new register offsets and fields for setting up the DRAM
scrubber so add support for this.
Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/1479423463-8536-17-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
MCA_STATUS[43] has been defined as "Poison" or "Reserved" for every bank
since Fam15h except for Fam15h, bank 4 in which case it's defined as
part of the McaStatSubCache bitfield.
Filter out that case.
Reported-by: Dean Liberty <Dean.Liberty@amd.com>
Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/1479478222-19896-1-git-send-email-Yazen.Ghannam@amd.com
[ Split an almost unparseable ternary conditional, add a comment. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Fam17h has a different set of registers and bitfields. Most of these
registers are read through SMN (System Management Network) rather
than PCI config space. Also, the derivation of various values is now
different.
Update amd64_edac to read the appropriate registers and extract the
correct values for Fam17h.
Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/1479423463-8536-12-git-send-email-Yazen.Ghannam@amd.com
[ Save us the indentation level in read_mc_regs(), add defines ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Fam17h needs PCI device functions 0 and 6 instead of 1 and 2 as on older
systems. Update struct amd64_pvt to hold the new functions and reserve
them if on Fam17h.
Also, allocate an array of UMC structs within our newly allocated PVT
struct.
Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/1479423463-8536-11-git-send-email-Yazen.Ghannam@amd.com
[ init_one_instance() error handling, shorten lines, unbreak >80 cols lines. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Add a family type and associated ops for Fam17h. Define a struct to hold
all the UMC registers that we need. Make this a part of struct amd64_pvt
in order to maximize code reuse in the rest of the driver.
Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/1479423463-8536-10-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Update the ecc_enabled() function to work on Fam17h. This entails
reading a different set of registers and using the SMN (System
Management Network) rather than PCI devices.
Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/1479423463-8536-9-git-send-email-Yazen.Ghannam@amd.com
[ Fixup ecc_en assignment and get_umc_base(). ]
Signed-off-by: Borislav Petkov <bp@suse.de>
tip:ras/core contains the respective Fam17h x86 RAS bits which
amd64_edac is going to use. So merge it into the EDAC branch.
Signed-off-by: Borislav Petkov <bp@suse.de>
It's not recommended for the OS to try and force-enable ECC checking.
This is considered a firmware task since it includes memory training,
etc, so don't change ECC settings on Fam17h or newer systems and inform
the user.
Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1479850816-1595-1-git-send-email-Yazen.Ghannam@amd.com
[ Put the "forcing" message in an else branch. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Currently, deferred errors are classified as correctable in EDAC. Add a
new error type for deferred errors so that they are correctly reported
to the user.
Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1479423463-8536-7-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
We only use __log_bus_error() to log DRAM ECC errors, so let's change
the name to reflect this. We'll also use this function for DRAM ECC
errors on Fam17h, but we'll call it from a different function than
decode_bus_error().
Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1479423463-8536-6-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
AMD Fam17h will not be using PCI function 2 for EDAC, but will continue
to use function 3. So let's get the name of F3 instead of F2 to support
Fam17h and previous families.
Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1479423463-8536-5-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
nb_bus_decoder() is only used for DRAM ECC errors so rename it so that
the name is more generic and descriptive.
Also, call it for DRAM ECC errors on SMCA systems.
[ Boris: rename it to real function name with a verb in it. ]
Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1479423463-8536-4-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
If we execute the below steps without this patch:
modprobe mpc85xx_edac [The first insmod, everything is well.]
modprobe -r mpc85xx_edac
modprobe mpc85xx_edac [insmod again, error happens.]
We would get the error messages as below:
BUG: recent printk recursion!
Oops: Kernel access of bad area, sig: 11 [#48]
Modules linked in: mpc85xx_edac edac_core softdog [last unloaded: mpc85xx_edac]
CPU: 5 PID: 14773 Comm: modprobe Tainted: G D C 4.8.3-rt2
.vsnprintf
.vscnprintf
.vprintk_emit
.printk
.edac_pci_add_device
.mpc85xx_pci_err_probe
.platform_drv_probe
.driver_probe_device
.__driver_attach
.bus_for_each_dev
.driver_attach
.bus_add_driver
.driver_register
.__platform_register_drivers
.mpc85xx_mc_init
.do_one_initcall
.do_init_module
.load_module
.SyS_finit_module
system_call
Address this by cleaning up properly when removing the platform driver.
Tested on a T4240QDS board.
Signed-off-by: Yanjiang Jin <yanjiang.jin@windriver.com>
Acked-by: Johannes Thumshirn <jthumshirn@suse.de>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: york.sun@nxp.com
Link: http://lkml.kernel.org/r/1479351380-17109-2-git-send-email-yanjiang.jin@windriver.com
[ Boris: massage commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Trivial fix to spelling mistake "Mutilple" to "Multiple" in error
messages.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Loc Ho <lho@apm.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20161114231104.5585-1-colin.king@canonical.com
Signed-off-by: Borislav Petkov <bp@suse.de>
When accessing the mc_devices list of memory controller descriptors, we
need to hold mem_ctls_mutex. This was not always the case, fix that.
Make all external callers call a version which grabs the mutex since the
last is local to edac_mc.c.
Reported-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Add accessor functions and hide the smca_names array. Also, add a
sanity-check to bank HWID assignment in get_smca_bank_info().
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/20161104152317.5r276t35df53qk76@pd.tnic
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Make it differ more from struct smca_bank_name for better readability.
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Yazen Ghannam <yazen.ghannam@amd.com>
Link: http://lkml.kernel.org/r/20161103125556.15482-3-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Call it simply smca_hwid and call local variables "hwid". More readable.
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Yazen Ghannam <yazen.ghannam@amd.com>
Link: http://lkml.kernel.org/r/20161103125556.15482-2-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Disable IRQs while injecting SDRAM errors. The RT patches exposed
a spinlock deadlock where the spinlock taken for the regmap write
deadlocked with the IRQ clear regmap write.
Error injection is not normally enabled for ECC but only for testing.
Signed-off-by: Thor Thayer <tthayer@opensource.altera.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1476906827-9412-1-git-send-email-tthayer@opensource.altera.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Fix the following sparse warnings:
drivers/edac/skx_edac.c:266:25: warning:
symbol 'skx_cpuids' was not declared. Should it be static?
drivers/edac/skx_edac.c:1040:12: warning:
symbol 'skx_init' was not declared. Should it be static?
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1477147098-2842-1-git-send-email-weiyj.lk@gmail.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Add Knights Mill (KNM) to the list of CPU models supported by sb_edac.
Signed-off-by: Piotr Luc <piotr.luc@intel.com>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20161013153105.2517-6-piotr.luc@intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
We now have symbolic names for a bunch of Intel CPU models via
asm/intel-family.h. The original conversion missed the EDAC drivers.
Convert them.
Signed-off-by: Dave Hansen <dave.hansen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20160929204321.9FAE5F84@viggo.jf.intel.com
[ Remove comment, macro name is descriptive enough. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
buffers (Thor Thayer)
* Split the memory controller part out of mpc85xx and share it with a
* new Freescale ARM Layerscape driver (York Sun)
* amd64_edac fixes (Yazen Ghannam)
* Misc cleanups, refactoring and fixes all over the place
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJX80EwAAoJEBLB8Bhh3lVKWwYP/1bbODQ7o+XhO8IaCDffYk30
8y4WdSnI0/QcP8JbSvFA7y6Zn4L0BbrbYhKLRDAg9c34V2bMaqonCnkDtT6YatUb
6l0H/hQ/Cah9AOm5PJLYg6O9s+ZBT8zA5b+F2Z9kUsuB6LSnVhp9skNrH6KPlm0U
4pFaLnHQenQPZbuRCfRxPU49ZuKtBZtQDkJLJlHXwn7e1qZy2Q4tMnnEtsY6U2ea
t3Hj+F8g+cdoiTQXOceCcOTR8GqDI6szgzn7vpXAGYvljBndszauAkxO7by79jg1
I8AQfgwoBF5CYL2Q0pzT1maHmmG2sydeRAHIvhmGxiEfFz1abWhriXbS33c32q8a
iFiVMAUIaSKpB/sB+5w5ymuBctI1mX5EQVW+8Xl2Gxt+olnhdJMocHnvQdYkfsYm
Ka8LcbaiK6ZQTbs/cIMOc2paE0AFPu5uXKHCPeZlhQAxOBvSPuDAv0+qUB/of5Uq
1SPidtsTmCI7X2hrdHAH9hLEkSjq68v3kqL5YnZL3H4gA3WohQEmX9ybjk097Kus
WWEhdi/PSFX0qQKotMUUDuxfNcKI6PZH9p+i2dN6tNCkiTDdb0Eo5lCXN7RVVhvq
qfE0Fcc4uDzh5MUS5jT58MWpA1cfdu9jbAf2BwFIU/poJcaeqy/SMyzCL+1D2/u6
dmDAtQbKUUwiltB8QzQd
=pcI8
-----END PGP SIGNATURE-----
Merge tag 'edac_for_4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp
Pull EDAC updates from Borislav Petkov:
"A lot of movement in the EDAC tree this time around, coarse summary
below:
- Altera Arria10 enablement of NAND, DMA, USB, QSPI and SD-MMC FIFO
buffers (Thor Thayer)
- split the memory controller part out of mpc85xx and share it with a
new Freescale ARM Layerscape driver (York Sun)
- amd64_edac fixes (Yazen Ghannam)
- misc cleanups, refactoring and fixes all over the place"
* tag 'edac_for_4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: (37 commits)
EDAC, altera: Add IRQ Flags to disable IRQ while handling
EDAC, altera: Correct EDAC IRQ error message
EDAC, amd64: Autoload module using x86_cpu_id
EDAC, sb_edac: Remove NULL pointer check on array pci_tad
EDAC: Remove NO_IRQ from powerpc-only drivers
EDAC, fsl_ddr: Fix error return code in fsl_mc_err_probe()
EDAC, fsl_ddr: Add entry to MAINTAINERS
EDAC: Move Doug Thompson to CREDITS
EDAC, I3000: Orphan driver
EDAC, fsl_ddr: Replace simple_strtoul() with kstrtoul()
EDAC, layerscape: Add Layerscape EDAC support
EDAC, fsl_ddr: Fix IRQ dispose warning when module is removed
EDAC, fsl_ddr: Add support for little endian
EDAC, fsl_ddr: Add missing DDR DRAM types
EDAC, fsl_ddr: Rename macros and names
EDAC, fsl-ddr: Separate FSL DDR driver from MPC85xx
EDAC, mpc85xx: Replace printk() with pr_* format
EDAC, mpc85xx: Drop setting/clearing RFXE bit in HID1
EDAC, altera: Rename MC trigger to common name
EDAC, altera: Rename device trigger to common name
...
Add the IRQF_ONESHOT and IRQF_TRIGGER_HIGH flags to disable the IRQ
while executing the IRQ handler. Remove the IRQF_SHARED because these
are not shared IRQs in the domain. Exposed when flooding IRQs.
Signed-off-by: Thor Thayer <tthayer@opensource.altera.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1474582419-7053-2-git-send-email-tthayer@opensource.altera.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Bank 4 is reserved on family 0x17 and shouldn't generate any MCE
records. However, broken hardware and software is not something unheard
of so warn about bank 4 errors. They shouldn't be coming from bank 4
naturally but users can still use mce_amd_inj to simulate errors from it
for testing purposed.
Also, avoid special handling in the injector mce_amd_inj like it is
being done on the older families.
[ bp: Rewrite commit message and merge into one patch. Use boot_cpu_data. ]
Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Link: http://lkml.kernel.org/r/1473384591-5323-1-git-send-email-Yazen.Ghannam@amd.com
Link: http://lkml.kernel.org/r/1473384591-5323-2-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The MCA_SYND and MCA_IPID registers contain valuable information and
should be included in MCE output. The MCA_SYND register contains
syndrome and other error information, and the MCA_IPID register will
uniquely identify the MCA bank's type without having to rely on system
software.
Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1472680624-34221-2-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Scalable MCA defines a number of IP types. An MCA bank on an SMCA
system is defined as one of these IP types. A bank's type is uniquely
identified by the combination of the HWID and MCATYPE values read from
its MCA_IPID register.
Add the required tables in order to be able to lookup error descriptions
based on a bank's type and the error's extended error code.
[ bp: Align comments, simplify a bit. ]
Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1472741832-1690-1-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The error descriptions defined for Fam17h can be reused for other SMCA
systems, so their names should reflect this.
Change f17h prefix to smca for error descriptions.
Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1472673994-12235-4-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Print SyndV bit status and print the raw value of the MCA_SYND register.
Further decoding of the syndrome from struct mce.synd can be done in
other places where appropriate, e.g. DRAM ECC.
Boris: make the error stanza more compact by putting the error address
and syndrome on the same line:
[Hardware Error]: Corrected error, no action required.
[Hardware Error]: CPU:2 (17:0:0) MC4_STATUS[-|CE|-|PCC|AddrV|-|-|SyndV|CECC]: 0x96204100001e0117
[Hardware Error]: Error Addr: 0x000000007f4c52e3, Syndrome: 0x0000000000000000
[Hardware Error]: Invalid IP block specified.
[Hardware Error]: cache level: L3/GEN, tx: DATA, mem-tx: RD
Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1467633035-32080-2-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
pvt->pci_tad is a NUM_CHANNELS array of struct pci_dev pointers and
hence cannot be NULL, so the NULL pointer check on pci_tad is redundant.
Remove it.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20160908083801.14766-1-colin.king@canonical.com
Signed-off-by: Borislav Petkov <bp@suse.de>
We'd like to eventually remove NO_IRQ on powerpc, so remove usages of it
from powerpc-only drivers.
The pdata structs are kzalloc'ed, so we don't need to initialise those
to 0, we can just drop the assignments entirely.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Johannes Thumshirn <morbidrsa@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: linuxppc-dev@ozlabs.org
Link: http://lkml.kernel.org/r/1473674436-19467-1-git-send-email-mpe@ellerman.id.au
Signed-off-by: Borislav Petkov <bp@suse.de>
Return negative error code from the edac_mc_add_mc() error handling case
instead of 0, as done elsewhere in this function.
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: York Sun <york.sun@nxp.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1473350284-26482-1-git-send-email-weiyj.lk@gmail.com
Signed-off-by: Borislav Petkov <bp@suse.de>
The compatible DDR controllers may support DDR, DDR2, DDR3, DDR4 DRAM.
An individual controller doesn't support all of them. The EDAC driver
reads SDRAM_CFG to determine which mode is configured.
Add DDR4 and drop the defines used only in the mtype assignment.
Signed-off-by: York Sun <york.sun@nxp.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: morbidrsa@gmail.com
Cc: oss@buserror.net
Cc: stuart.yoder@nxp.com
Link: http://lkml.kernel.org/r/1470779760-16483-6-git-send-email-york.sun@nxp.com
Signed-off-by: Borislav Petkov <bp@suse.de>
The mpc85xx-compatible DDR controllers are used on ARM-based SoCs too.
Carve out the DDR part from the mpc85xx EDAC driver in preparation to
support both architectures.
Signed-off-by: York Sun <york.sun@nxp.com>
Cc: Johannes Thumshirn <morbidrsa@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: oss@buserror.net
Cc: stuart.yoder@nxp.com
Link: http://lkml.kernel.org/r/1470946525-3410-1-git-send-email-york.sun@nxp.com
Signed-off-by: Borislav Petkov <bp@suse.de>
On e500v1, read fault exception enable (RFXE) controls whether assertion
of core_fault_in causes a machine check interrupt. Assertion of
core_fault_in can result from uncorrectable data error, such as an L2
multi-bit ECC error. It can also occur from a system error if logic on
the integrated device signals a fault for nonfatal errors. RFXE bit is
cleared out of reset, and should be left clear for normal operation.
Assertion of core_fault_in does not cause a machine check.
RFXE is set specifically for RIO (Rapid IO) and PCI for book E to catch
the errors by machine check. With this bit set, the EDAC driver can't
get the interrupt in case of uncorrectable error. So this bit is cleared
in favor of EDAC. However, the benefit of catching such uncorrectable
error doesn't outweigh the other errors which may hang the system.
Besides, e500v2 has different errors masked by RFXE, and e500mc doesn't
support this bit. It is more reasonable to leave RFXE as is in the EDAC
driver, and leave the uncorrectable errors triggering machine check for
e500v1.
Suggested-by: Scott Wood <oss@buserror.net>
Signed-off-by: York Sun <york.sun@nxp.com>
Cc: Johannes Thumshirn <morbidrsa@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: oss@buserror.net
Cc: stuart.yoder@nxp.com
Link: http://lkml.kernel.org/r/1470779760-16483-2-git-send-email-york.sun@nxp.com
Signed-off-by: Borislav Petkov <bp@suse.de>
The L2 and OCRAM devices have different ecc trigger names than the other
EDAC devices (FIFO peripherals). Make them all the same and remove the
character array from the device structure.
Signed-off-by: Thor Thayer <tthayer@opensource.altera.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1471622666-15197-2-git-send-email-tthayer@opensource.altera.com
Signed-off-by: Borislav Petkov <bp@suse.de>
This is an entirely new driver instead of yet another set of patches
to sb_edac.c because:
1) Mapping from PCI devices to socket/memory controller is significantly
different. Skylake scatters devices on a socket across a number of
PCI buses.
2) There is an extra level of interleaving via the "mcroute" register
that would be a little messy to squeeze into the old driver.
3) Validation is getting too expensive. Changes to sb_edac need to
be checked against Sandy Bridge, Ivy Bridge, Haswell, Broadwell and
Knights Landing.
Acked-by: Aristeu Rozanski <aris@redhat.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
According to the reference manual of MPC8572 and T4240, bit 31 of
PEX_ERR_CAP_STAT is W1C (write 1 to clear).
Add the corresponding write to PEX_ERR_CAP_STAT in order to fix the PCIe
error capture.
Tested on a T4240 processor.
Signed-off-by: Tillmann Heidsieck <theidsieck@leenox.de>
Acked-by: Johannes Thumshirn <jthumshirn@suse.de>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20160815190849.29327-1-theidsieck@leenox.de
Signed-off-by: Borislav Petkov <bp@suse.de>
Replace the deprecated create_singlethread_workqueue() with
alloc_ordered_workqueue() with WQ_MEM_RECLAIM. This is the identity
conversion.
It's not recommended to stall it from memory pressure. Hence,
WQ_MEM_RECLAIM has been set to ensure forward progress under memory
pressure.
Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20160813164124.GA9077@Karyakshetra
Signed-off-by: Borislav Petkov <bp@suse.de>
Fix the following sparse warning:
drivers/edac/altera_edac.c:1649:23: warning:
symbol 'a10_eccmgr_ic_ops' was not declared. Should it be static?
Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com>
Reviewed-by: Thor Thayer <tthayer@opensource.altera.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: lkml <linux-kernel@vger.kernel.org>
Link: http://lkml.kernel.org/r/1470836667-11822-1-git-send-email-weiyj.lk@gmail.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Fam15hMod60h systems are using the channel decode of Fam15hMod30h which
gives incorrect results. Fam15hMod60h systems should use the generic
channel decode method plus a couple more cases.
Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1470236355-30039-1-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
On Intel Xeon Phi Knights Landing processor family the channels of the
memory controller have untypical arrangement - MC0 is mapped to CH3,4,5
and MC1 is mapped to CH0,1,2. This causes the EDAC driver to report the
channel name incorrectly.
We missed this change earlier, so the code already contains similar
comment, but the translation function is incorrect.
Without this patch:
errors in DIMM_A and DIMM_D were reported in DIMM_D
errors in DIMM_B and DIMM_E were reported in DIMM_E
errors in DIMM_C and DIMM_F were reported in DIMM_F
Correct this.
Hubert Chrzaniuk:
- rebased to 4.8
- comments and code cleanup
Fixes: d0cdf90031 ("sb_edac: Add Knights Landing (Xeon Phi gen 2) support")
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: lukasz.anaczkowski@intel.com
Cc: lukasz.odzioba@intel.com
Cc: mchehab@kernel.org
Cc: <stable@vger.kernel.org> # v4.5..
Link: http://lkml.kernel.org/r/1469231089-22837-1-git-send-email-lukasz.odzioba@intel.com
Signed-off-by: Lukasz Odzioba <lukasz.odzioba@intel.com>
[ Boris: Simplify a bit by removing char mc. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
* Minor cleanups
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJXlbvGAAoJEBLB8Bhh3lVK9g0P/16Z5/WIqrrLjJegcZ3qKb4m
u2b5FBwbTDQyj/smZmfZfhZUleO7HDayL188fHPKO11my+Mvjyws8IZCvHshy7+3
b6Krno8NXZuNYtm32srSQQoLE4eN5pookOPMe2KJcRFghwnDqOo0ZfMZ8Lt6RVGg
YtqiVZ3DwaqoHUbvxkHvdJuMbtVEZIEC6SzK9t4URmdncAx+3/BTLqDSYc4LaL+B
tI6REgMy+QT1IUMImuUbOA5ktd78K+D26YKF9NuTSWNQvwzjSYy4y6WmDsjYpZF6
h6YWgGwERIy6gws5IdvkBMKX0s1RV7/sY6xFBsEQrX+IFywyY5OcI1gPRttD6VAn
ORJ4j7WNH8Phqld9YxjEX8yTwGxlxge88J/lb0i9s5jXPMX7ioxAqAjdpsJj6PlA
sNDP0fTyCECqME73cWaAS9bdrzB2dgOxEDfcWKWR24vZeHfnh72owD1k+J0iWu9c
0R2usPB1Yjbo6ODGpqG1R+u09LR+RpNe38oT5GcsFrj6/EaGz0kn2QulaNtylsDo
J5qkm02X1h3+F8/UHrzKavOGk82IQzLCbcW7eHPmyzkH8PTe1semLk9jHnXN5sK3
uL2nirZQVIF6QLbYxf0Uf2oSVs05xc7u9ZT+QOl6hqh/6/K7DWC45SoABwMXDaHH
jXr+y4vtr5IR8GQfZ/nd
=mdxx
-----END PGP SIGNATURE-----
Merge tag 'edac_for_4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp
Pull EDAC updates from Borislav Petkov:
"This last cycle, Thor was busy adding Arria10 eth FIFO support to the
altera_edac driver along with other improvements. We have two
cleanups/fixes too.
Summary:
- Altera Arria10 ethernet FIFO buffer support (Thor Thayer)
- Minor cleanups"
* tag 'edac_for_4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
ARM: dts: Add Arria10 Ethernet EDAC devicetree entry
EDAC, altera: Add Arria10 Ethernet EDAC support
EDAC, altera: Add Arria10 ECC memory init functions
Documentation: dt: socfpga: Add Arria10 Ethernet binding
EDAC, altera: Drop some ifdeffery
EDAC, altera: Add panic flag check to A10 IRQ
EDAC, altera: Check parent status for Arria10 EDAC block
EDAC, altera: Make all private data structures static
EDAC: Correct channel count limit
EDAC, amd64_edac: Init opstate at the proper time during init
EDAC, altera: Handle Arria10 SDRAM child node
EDAC, altera: Add ECC Manager IRQ controller support
Documentation: dt: socfpga: Add interrupt-controller to ecc-manager
In commit 2c1ea4c700 ("EDAC, sb_edac: Use cpu family/model in driver
detection") I broke Knights Landing because I failed to notice that it
called a wrapper macro "sbridge_get_all_devices_knl" instead of
"sbridge_get_all_devices" like all the other types.
Now that we include the processor type in the pci_id_table structure we
can skip the wrappers and just have the sbridge_get_all_devices() check
the type to decide whether to allow duplicate devices and controllers to
have registers spread across buses.
Fixes: 2c1ea4c700 ("EDAC, sb_edac: Use cpu family/model in driver detection")
Tested-by: Lukasz Odzioba <lukasz.odzioba@intel.com>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In preparation for additional memory module ECCs, the IRQ function will
check a panic flag before doing a kernel panic on double bit errors.
OCRAM uncorrectable errors cause a panic because sleep/resume functions
and FPGA contents during sleep are stored in OCRAM.
ECCs on peripheral FIFO buffers will not cause a kernel panic on DBERRs
because the packet can be retried and therefore recovered.
Signed-off-by: Thor Thayer <tthayer@opensource.altera.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1466603939-7526-3-git-send-email-tthayer@opensource.altera.com
Signed-off-by: Borislav Petkov <bp@suse.de>
c44696fff0 ("EDAC: Remove arbitrary limit on number of channels")
lifted the arbitrary limit on memory controller channels in EDAC.
However, the dynamic channel attributes dynamic_csrow_dimm_attr and
dynamic_csrow_ce_count_attr remained 6.
This wasn't a problem except channels 6 and 7 weren't visible in sysfs
on machines with more than 6 channels after the conversion to static
attr groups with
2c1946b6d6 ("EDAC: Use static attribute groups for managing sysfs entries")
[ without that, we're exploding in edac_create_sysfs_mci_device()
because we're dereferencing out of the bounds of the
dynamic_csrow_dimm_attr array. ]
Add attributes for channels 6 and 7 along with a guard for the
future, should more channels be required and/or to sanity check for
misconfigured machines.
We still need to check against the number of channels present on the MC
first, as Thor reported.
Signed-off-by: Borislav Petkov <bp@suse.de>
Reported-by: Hironobu Ishii <ishii.hironobu@jp.fujitsu.com>
Tested-by: Thor Thayer <tthayer@opensource.altera.com>
Cc: <stable@vger.kernel.org> # 4.2
It is useless to do it if we're loaded on unsupported hardware so do
that only after we have detected at least 1 supported AMD northbridge.
Signed-off-by: Borislav Petkov <bp@suse.de>
Separate the device match arrays for each platform to prevent CycloneV
matches when calling of_platform_populate() on the Arria10 ECC manager
node.
If the SDRAM is a child node of ECC manager, call probe function via
of_platform_populate().
Signed-off-by: Thor Thayer <tthayer@opensource.altera.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1464193783-5071-4-git-send-email-tthayer@opensource.altera.com
Signed-off-by: Borislav Petkov <bp@suse.de>
In commit
2c1ea4c700 ("EDAC, sb_edac: Use cpu family/model in driver detection")
we switched from using PCI ids to determine which platform we are
running on to using CPU model instead.
I forgot that Broadwell-DE has its own distinct model number different
from Broadwell-EP or -EX.
Fixing this isn't just adding a line to the array of cpuids - the
exising code assumed a 1:1 mapping between entries in that array and the
"enum type" values. Added the type to pci_id_table structure to remove
this dependency and allows two Broadwell cpu models.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Cc: Aristeu Rozanski <arozansk@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Fixes: 2c1ea4c700 ("EDAC, sb_edac: Use cpu family/model in driver detection")
Link: http://lkml.kernel.org/r/b3cffe40dec6dfe0235a5d52a504f0ba86a07ce7.1464902605.git.tony.luck@intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Broadwell made a small change to the rank target register moving the
target rank ID field up from bits 16:19 to bits 20:23.
Also found that the offset field grew by one bit in the IVY_BRIDGE to
HASWELL transition, so fix the RIR_OFFSET() macro too.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Cc: stable@vger.kernel.org # v3.19+
Cc: Aristeu Rozanski <arozansk@redhat.com>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/2943fb819b1f7e396681165db9c12bb3df0e0b16.1464735623.git.tony.luck@intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
* Remove ad-hoc buffering of MCE records in sb_edac and i7core_edac. (Tony Luck)
* Do not register sb_edac with pci_register_driver(). (Tony Luck)
* Add support for Skylake to ie31200_edac. (Jason Baron)
* Do not register amd64_edac with pci_register_driver(). (Borislav Petkov)
+ the usual round of cleanups and fixes all over the place.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJXOaHaAAoJEBLB8Bhh3lVKRdoP/jDewaX4GAxgcKaK9Kx7VjMe
i42E/7syh5iLoyFgZ1hUjvqiQHN4RyWloUYyNHbNxGS9uXCMXIUoJQd2FjOY442s
oeEaoMCfZsJLzKQti3fUPK9GugZ57BSZxfn6NlJPpyG4FxSirOcZFCxGaNuyqqrh
0M+gFvbWfZcVDLE0FI5CUYZWRxsk//+pLM4ZkQZFA8Eo1tGVc3r0zEEAlL/uEMDW
P0ATvRHNUeib9YQGTdSD7cNUpRX4SX+T8lDUiaVm+tHL5ES3b4rayMBdbVMrahfc
Gnxu9GtO5gWXEY6QDDpOx+VkbfFmDFodx833psJ3MVD8evEFHHdinkgDaptLrrV6
92ZDKR5s3W6tKXkcmGuExrtc17UgjLcRZCebXbv+5FJlVrslzvQ9ESOTRyiZBrCD
ZpFi2TZhpYU1uEWuoBZCbWNXW2pcSt7/bQ9bYUvfrvNfgPzPnblubJuVKRcfY0WB
x2vj0PNnckpoPRvskV7GEe0Y/JISzAxBQUK6XO+GJgMgz5M+23SEaSVU5yeyf26e
x/yD5yImQGN84AxfRMMvbR2JvpVLN3vdFWtWigneht160erVA/Qgw5Gcrpw53tZr
zPD3fGaetrNgS8BvJAy/c6xHV1j6A1RGCGThy8Ivqep9WD90a5JhR1NRjZp6m8sf
aB0A1zTP7e+hRFkZGahO
=ud2P
-----END PGP SIGNATURE-----
Merge tag 'edac_for_4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp
Pull EDAC updates from Borislav Petkov:
"It was pretty busy in EDAC land this time:
- Altera Arria10 L2 cache and On-Chip RAM ECC handling (Thor Thayer)
- Remove ad-hoc buffering of MCE records in sb_edac and i7core_edac
(Tony Luck)
- Do not register sb_edac with pci_register_driver() (Tony Luck)
- Add support for Skylake to ie31200_edac (Jason Baron)
- Do not register amd64_edac with pci_register_driver() (Borislav
Petkov)
... plus the usual round of cleanups and fixes all over the place"
* tag 'edac_for_4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: (25 commits)
EDAC, amd64_edac: Drop pci_register_driver() use
EDAC, ie31200_edac: Add Skylake support
EDAC, sb_edac: Use cpu family/model in driver detection
EDAC, i7core: Remove double buffering of error records
EDAC, amd64_edac: Issue driver banner only on success
ARM: socfpga: Initialize Arria10 OCRAM ECC on startup
EDAC: Increment correct counter in edac_inc_ue_error()
EDAC, sb_edac: Remove double buffering of error records
EDAC: Fix used after kfree() error in edac_unregister_sysfs()
EDAC, altera: Avoid unused function warnings
EDAC, altera: Remove useless casts
ARM: socfpga: Enable Arria10 OCRAM ECC on startup
EDAC, altera: Add Arria10 OCRAM ECC support
Documentation: dt: socfpga: Add Altera Arria10 OCRAM binding
EDAC, altera: Make OCRAM ECC dependency check generic
EDAC, altera: Add register offset for ECC Enable
EDAC, altera: Extract error inject operations to a struct fops
ARM: socfpga: Enable Arria10 L2 cache ECC on startup
EDAC, altera: Add Arria10 L2 Cache ECC handling
Documentation, dt, socfpga: Add Altera Arria10 L2 cache binding
...
Use X86_FEATURE_SMCA when detecting if SMCA is available instead of
directly using CPUID 0x80000007_EBX.
Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1462971509-3856-7-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
- remove homegrown instances counting.
- take F3 PCI device from amd_nb caching instead of F2 which was used with the
PCI core.
With those changes, the driver doesn't need to register a PCI driver and
relies on the northbridges caching which we do anyway on AMD.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Yazen Ghannam <yazen.ghannam@amd.com>
Skylake adjusts some register locations, but otherwise follows the
existing model quite closely. I was able to verify that the 'ce_count'
increments when 'bad dimms' are used. The accounting of 'ce_count' and
'ue_count' is the primary functionality of interest for us. Tested on
Intel(R) Xeon(R) CPU E3-1260L v5 @ 2.90GHz.
Signed-off-by: Jason Baron <jbaron@akamai.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1462547927-22679-1-git-send-email-jbaron@akamai.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Instead of picking a random PCI ID from the dozen or so we need to
access, just use x86_match_cpu() to pick based on CPU model number. The
choosing of PCI devices has been problematic in the past, see
11249e7399 ("sb_edac: Fix detection on SNB machines")
which fixed problems introduced by
d0585cd815 ("sb_edac: Claim a different PCI device").
This is especially ugly if future hardware might not even have
EDAC-relevant registers in PCI config space and we would still be
required to choose some "random" PCI devices to scan for just so our
driver loads.
Is this cleaner/clearer? It deletes much more code than it adds. Only
tested on Broadwell. The driver loads/unloads and loads again. Still
decodes errors too.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
In the bad old days the functions from x86_mce_decoder_chain could be
called in machine check context. So we used to carefully copy them and
defer processing until later. But in
f29a7aff4b ("x86/mce: Avoid potential deadlock due to printk() in MCE context")
we switched the logging code to save the record in a genpool, and call
the functions that registered to be notified later from a work queue.
So drop all the double buffering and do all the work we want to do as
soon as i7core_mce_check_error() is called.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/29ab2c370915c6e132fc5d88e7b72cb834bedbfe.1461855008.git.tony.luck@intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Both of these drivers can return NOTIFY_BAD, but this terminates
processing other callbacks that were registered later on the chain.
Since the driver did nothing to log the error it seems wrong to prevent
other interested parties from seeing it. E.g. neither of them had even
bothered to check the type of the error to see if it was a memory error
before the return NOTIFY_BAD.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Acked-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/72937355dd92318d2630979666063f8a2853495b.1461864507.git.tony.luck@intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
In the bad old days the functions from x86_mce_decoder_chain could be
called in machine check context. So we used to carefully copy them and
defer processing until later. But in
f29a7aff4b ("x86/mce: Avoid potential deadlock due to printk() in MCE context")
we switched the logging code to save the record in a genpool, and call
the functions that registered to be notified later from a work queue.
So drop all the double buffering and do all the work we want to do as
soon as sbridge_mce_check_error() is called.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Cc: Aristeu Rozanski <arozansk@redhat.com>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: patrickg@supermicro.com
Link: http://lkml.kernel.org/r/100025611cd780d9bca72792b2b2146760da53e0.1460756761.git.tony.luck@intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Code flow looks like this:
device_unregister(&mci->dev);
-> kobject_put+0x25/0x50
-> kobject_cleanup+0x77/0x190
-> device_release+0x32/0xa0
-> mci_attr_release+0x36/0x70
-> kfree(mci);
bus_unregister(mci->bus);
Fix is to grab a local copy of "mci->bus" and use that when we call
bus_unregister().
Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/21d595b0ab3d718d9cb206647f4ec91c05e62ec4.1461261078.git.tony.luck@intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
The recently added Arria10 OCRAM ECC support caused some new harmless
warnings about unused functions when it is disabled:
drivers/edac/altera_edac.c:1067:20: error: 'altr_edac_a10_ecc_irq' defined but not used [-Werror=unused-function]
drivers/edac/altera_edac.c:658:12: error: 'altr_check_ecc_deps' defined but not used [-Werror=unused-function]
This rearranges the code slightly to have those two functions inside
of the same #ifdef that hides their callers. It also manages to
avoid a forward declaration of the IRQ handler in the process.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Thor Thayer <tthayer@opensource.altera.com>
Cc: Alan Tull <atull@opensource.altera.com>
Cc: Dinh Nguyen <dinguyen@opensource.altera.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Fixes: c7b4be8db8 ("EDAC, altera: Add Arria10 OCRAM ECC support")
Link: http://lkml.kernel.org/r/1460837650-1237650-2-git-send-email-arnd@arndb.de
Signed-off-by: Borislav Petkov <bp@suse.de>
The altera EDAC driver refers to its per-device data
using a cast to '(void *)', which makes the pointer
non-const, though both the source and destination are
actually const.
Removing the annotation makes the reference (almost)
fit into a single line for improved readability, and
ensures that it is actually defined as const.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Thor Thayer <tthayer@opensource.altera.com>
Cc: Alan Tull <atull@opensource.altera.com>
Cc: Dinh Nguyen <dinguyen@opensource.altera.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1460837650-1237650-1-git-send-email-arnd@arndb.de
Signed-off-by: Borislav Petkov <bp@suse.de>
Haswell and Broadwell can be configured to hash the channel
interleave function using bits [27:12] of the physical address.
On those processor models we must check to see if hashing is
enabled (bit21 of the HASWELL_HASYSDEFEATURE2 register) and
act accordingly.
Based on a patch by patrickg <patrickg@supermicro.com>
Tested-by: Patrick Geary <patrickg@supermicro.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: Aristeu Rozanski <arozansk@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-edac@vger.kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In commit:
eb1af3b71f ("Fix computation of channel address")
I switched the "sck_way" variable from holding the log2 value read
from the h/w to instead be the actual number. Unfortunately it
is needed in log2 form when used to shift the address.
Tested-by: Patrick Geary <patrickg@supermicro.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: Aristeu Rozanski <arozansk@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-edac@vger.kernel.org
Cc: stable@vger.kernel.org
Fixes: eb1af3b71f ("Fix computation of channel address")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch fix spelling typos found in printk
within various part of the kernel sources.
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Pull RAS updates from Ingo Molnar:
"Various RAS updates:
- AMD MCE support updates for future CPUs, fixes and 'SMCA' (Scalable
MCA) error decoding support (Aravind Gopalakrishnan)
- x86 memcpy_mcsafe() support, to enable smart(er) hardware error
recovery in NVDIMM drivers, based on an extension of the x86
exception handling code. (Tony Luck)"
* 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
EDAC/sb_edac: Fix computation of channel address
x86/mm, x86/mce: Add memcpy_mcsafe()
x86/mce/AMD: Document some functionality
x86/mce: Clarify comments regarding deferred error
x86/mce/AMD: Fix logic to obtain block address
x86/mce/AMD, EDAC: Enable error decoding of Scalable MCA errors
x86/mce: Move MCx_CONFIG MSR definitions
x86/mce: Check for faults tagged in EXTABLE_CLASS_FAULT exception table entries
x86/mm: Expand the exception table logic to allow new handling options
x86/mce/AMD: Set MCAX Enable bit
x86/mce/AMD: Carve out threshold block preparation
x86/mce/AMD: Fix LVT offset configuration for thresholding
x86/mce/AMD: Reduce number of blocks scanned per bank
x86/mce/AMD: Do not perform shared bank check for future processors
x86/mce: Fix order of AMD MCE init function call
Large memory Haswell-EX systems with multiple DIMMs per channel were
sometimes reporting the wrong DIMM.
Found three problems:
1) Debug printouts for socket and channel interleave were not interpreting
the register fields correctly. The socket interleave field is a 2^X
value (0=1, 1=2, 2=4, 3=8). The channel interleave is X+1 (0=1, 1=2,
2=3. 3=4).
2) Actual use of the socket interleave value didn't interpret as 2^X
3) Conversion of address to channel address was complicated, and wrong.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Aristeu Rozanski <arozansk@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-edac@vger.kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
For Scalable MCA enabled processors, errors are listed per IP block. And
since it is not required for an IP to map to a particular bank, we need
to use HWID and McaType values from the MCx_IPID register to figure out
which IP a given bank represents.
We also have a new bit (TCC) in the MCx_STATUS register to indicate Task
context is corrupt.
Add logic here to decode errors from all known IP blocks for Fam17h
Model 00-0fh and to print TCC errors.
[ Minor fixups. ]
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1457021458-2522-3-git-send-email-Aravind.Gopalakrishnan@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Correct a typo introduced by
d0cdf90031 ("EDAC, sb_edac: Add Knights Landing (Xeon Phi gen 2) support")
As a result under some configurations DIMMs were not correctly
recognized. Problem affects only Xeon Phi architecture.
Signed-off-by: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: lukasz.anaczkowski@intel.com
Link: http://lkml.kernel.org/r/1457361045-26221-1-git-send-email-hubert.chrzaniuk@intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
debugfs_remove() is used to remove a file or a directory from the
debugfs filesystem on an EDAC device exit. However edac_debugfs might
not be empty. This is similar to
30f84a891b ("EDAC: Use edac_debugfs_remove_recursive()")
which changed the EDAC MCI code to use edac_debugfs_remove_recursive().
Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Thor Thayer <tthayer@opensource.altera.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1455064165-3816-1-git-send-email-tthayer@opensource.altera.com
Signed-off-by: Borislav Petkov <bp@suse.de>
We were getting this build warning:
drivers/edac/mpc85xx_edac.c:1247:6: warning: unused variable 'pvr'
pvr is only used if CONFIG_FSL_SOC_BOOKE is defined. Declare it
__maybe_unused.
Suggested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1454427573-7994-1-git-send-email-sudipm.mukherjee@gmail.com
Signed-off-by: Borislav Petkov <bp@suse.de>
They're both running only when ->edac_check is initialized so remove
that check from the workqueue function itself. Synchronize/generalize
the ->op_state check between the two.
Kill useless comments, while at it.
Signed-off-by: Borislav Petkov <bp@suse.de>
We use the ->edac_check function pointers to determine whether we need
to setup a polling workqueue. However, the destroy path is not balanced
and we might try to teardown an unitialized workqueue.
Balance init and destroy paths by looking at ->edac_check in both cases.
Set op_state to OP_OFFLINE *before* destroying anything.
Reported-by: Zhiqiang Hou <Zhiqiang.Hou@freescale.com>
Cc: Varun Sethi <Varun.Sethi@freescale.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
dct_sel_base_off is declared as a u64 but we're only using the lower 32
bits because of a shift wrapping bug. This can possibly truncate the
upper 16 bits of DctSelBaseOffset[47:26], causing us to misdecode the CS
row.
Fixes: c8e518d567 ('amd64_edac: Sanitize f10_get_base_addr_offset')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20160120095451.GB19898@mwanda
Signed-off-by: Borislav Petkov <bp@suse.de>
Knights Landing does not come with register that could be used to fetch
DIMM width. However the value is fixed for this architecture so it can
be hardcoded.
Signed-off-by: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: lukasz.anaczkowski@intel.com
Link: http://lkml.kernel.org/r/1449840082-18673-1-git-send-email-hubert.chrzaniuk@intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Hide the EDAC workqueue pointer in a separate compilation unit and add
accessors for the workqueue manipulations needed.
Remove edac_pci_reset_delay_period() which wasn't used by anything. It
seems it got added without a user with
91b99041c1 ("drivers/edac: updated PCI monitoring")
Signed-off-by: Borislav Petkov <bp@suse.de>
It cannot fail now. We either load EDAC core after having successfully
initialized edac_subsys or we don't.
Signed-off-by: Borislav Petkov <bp@suse.de>
This was really dumb - reference counting for the main EDAC sysfs
object. While we could've simply registered it as the first thing in the
module init path and then hand it around to what needs it.
Do that and rip out all the code around it, thus simplifying the whole
handling significantly.
Move the edac_subsys node back to edac_module.c.
Signed-off-by: Borislav Petkov <bp@suse.de>
EDAC workqueue destruction is really fragile. We cancel delayed work
but if it is still running and requeues itself, we still go ahead and
destroy the workqueue and the queued work explodes when workqueue core
attempts to run it.
Make the destruction more robust by switching op_state to offline so
that requeuing stops. Cancel any pending work *synchronously* too.
EDAC i7core: Driver loaded.
general protection fault: 0000 [#1] SMP
CPU 12
Modules linked in:
Supported: Yes
Pid: 0, comm: kworker/0:1 Tainted: G IE 3.0.101-0-default #1 HP ProLiant DL380 G7
RIP: 0010:[<ffffffff8107dcd7>] [<ffffffff8107dcd7>] __queue_work+0x17/0x3f0
< ... regs ...>
Process kworker/0:1 (pid: 0, threadinfo ffff88019def6000, task ffff88019def4600)
Stack:
...
Call Trace:
call_timer_fn
run_timer_softirq
__do_softirq
call_softirq
do_softirq
irq_exit
smp_apic_timer_interrupt
apic_timer_interrupt
intel_idle
cpuidle_idle_call
cpu_idle
Code: ...
RIP __queue_work
RSP <...>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
Originally the mpc85xx-pci-edac driver bound directly to the PCI
controller node.
Commit
905e75c46d ("powerpc/fsl-pci: Unify pci/pcie initialization code")
turned the PCI controller code into a platform device. Since we can't
have two drivers binding to the same device, the EDAC code was changed
to be called into as a library-style submodule. However, this doesn't
work if the EDAC driver is built as a module.
Commit
8d8fcba6d1ea ("EDAC: Rip out the edac_subsys reference counting")
exposed another problem with this approach -- mpc85xx_pci_err_probe()
was being called in the same early boot phase that the PCI controller
is initialized, rather than in the device_initcall phase that the EDAC
layer expects. This caused a crash on boot.
To fix this, the PCI controller code now creates a child platform device
specifically for EDAC, which the mpc85xx-pci-edac driver binds to.
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Daniel Axtens <dja@axtens.net>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: Jia Hongtao <B38951@freescale.com>
Cc: Jiri Kosina <jkosina@suse.com>
Cc: Kim Phillips <kim.phillips@freescale.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Masanari Iida <standby24x7@gmail.com>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Rob Herring <robh@kernel.org>
Link: http://lkml.kernel.org/r/1449774432-18593-1-git-send-email-scottwood@freescale.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Knights Landing is the next generation architecture for HPC market.
KNL introduces concept of a tile and CHA - Cache/Home Agent for memory
accesses.
Some things are fixed in KNL:
() There's single DIMM slot per channel
() There's 2 memory controllers with 3 channels each, however,
from EDAC standpoint, it is presented as single memory controller
with 6 channels. In order to represent 2 MCs w/ 3 CH, it would
require major redesign of EDAC core driver.
Basically, two functionalities are added/extended:
() during driver initialization KNL topology is being recognized, i.e.
which channels are populated with what DIMM sizes
(knl_get_dimm_capacity function)
() handle MCE errors - channel swizzling
Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Jim Snow <jim.m.snow@intel.com>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: lukasz.anaczkowski@intel.com
Link: http://lkml.kernel.org/r/1449136134-23706-5-git-send-email-hubert.chrzaniuk@intel.com
[ Rebase to 4.4-rc3. ]
Signed-off-by: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Add options to sbridge_get_all_devices() to allow for duplicate device
IDs and devices that are scattered across mulitple PCI buses.
Signed-off-by: Jim Snow <jim.m.snow@intel.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: lukasz.anaczkowski@intel.com
Link: http://lkml.kernel.org/r/1449136134-23706-4-git-send-email-hubert.chrzaniuk@intel.com
[ Rebase to 4.4-rc3. ]
Signed-off-by: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
SAD limit, interleave mode and DRAM related functionalities are now
virtualized, so that overriding them is easier.
Signed-off-by: Jim Snow <jim.m.snow@intel.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: lukasz.anaczkowski@intel.com
Link: http://lkml.kernel.org/r/1449136134-23706-3-git-send-email-hubert.chrzaniuk@intel.com
[ Rebase to 4.4-rc3. ]
Signed-off-by: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
These new helpers simplify implementing multi-driver modules and
properly handle failure to register one driver by unregistering all
previously registered drivers.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1449073138-10852-2-git-send-email-thierry.reding@gmail.com
Signed-off-by: Borislav Petkov <bp@suse.de>
These new helpers simplify implementing multi-driver modules and
properly handle failure to register one driver by unregistering all
previously registered drivers.
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1449136632-11680-1-git-send-email-thierry.reding@gmail.com
Signed-off-by: Borislav Petkov <bp@suse.de>
The asm-generic changes for 4.4 are mostly a series from Christoph Hellwig
to clean up various abuses of headers in there. The patch to rename the
io-64-nonatomic-*.h headers caused some conflicts with new users, so I
added a workaround that we can remove in the next merge window.
The only other patch is a warning fix from Marek Vasut
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIVAwUAVjzaf2CrR//JCVInAQImmhAA20fZ91sUlnA5skKNPT1phhF6Z7UF2Sx5
nPKcHQD3HA3lT1OKfPBYvCo+loYflvXFLaQThVylVcnE/8ecAEMtft4nnGW2nXvh
sZqHIZ8fszTB53cynAZKTjdobD1wu33Rq7XRzg0ugn1mdxFkOzCHW/xDRvWRR5TL
rdQjzzgvn2PNlqFfHlh6cZ5ykShM36AIKs3WGA0H0Y/aYsE9GmDOAUp41q1mLXnA
4lKQaIxoeOa+kmlsUB0wEHUecWWWJH4GAP+CtdKzTX9v12bGNhmiKUMCETG78BT3
uL8irSqaViNwSAS9tBxSpqvmVUsa5aCA5M3MYiO+fH9ifd7wbR65g/wq39D3Pc01
KnZ3BNVRW5XSA3c86pr8vbg/HOynUXK8TN0lzt6rEk8bjoPBNEDy5YWzy0t6reVe
wX65F+ver8upjOKe9yl2Jsg+5Kcmy79GyYjLUY3TU2mZ+dIdScy/jIWatXe/OTKZ
iB4Ctc4MDe9GDECmlPOWf98AXqsBUuKQiWKCN/OPxLtFOeWBvi4IzvFuO8QvnL9p
jZcRDmIlIWAcDX/2wMnLjV+Hqi3EeReIrYznxTGnO7HHVInF555GP51vFaG5k+SN
smJQAB0/sostmC1OCCqBKq5b6/li95/No7+0v0SUhJJ5o76AR5CcNsnolXesw1fu
vTUkB/I66Hk=
=dQKG
-----END PGP SIGNATURE-----
Merge tag 'asm-generic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull asm-generic cleanups from Arnd Bergmann:
"The asm-generic changes for 4.4 are mostly a series from Christoph
Hellwig to clean up various abuses of headers in there. The patch to
rename the io-64-nonatomic-*.h headers caused some conflicts with new
users, so I added a workaround that we can remove in the next merge
window.
The only other patch is a warning fix from Marek Vasut"
* tag 'asm-generic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
asm-generic: temporarily add back asm-generic/io-64-nonatomic*.h
asm-generic: cmpxchg: avoid warnings from macro-ized cmpxchg() implementations
gpio-mxc: stop including <asm-generic/bug>
n_tracesink: stop including <asm-generic/bug>
n_tracerouter: stop including <asm-generic/bug>
mlx5: stop including <asm-generic/kmap_types.h>
hifn_795x: stop including <asm-generic/kmap_types.h>
drbd: stop including <asm-generic/kmap_types.h>
move count_zeroes.h out of asm-generic
move io-64-nonatomic*.h out of asm-generic
Pull RAS changes from Ingo Molnar:
"The main system reliability related changes were from x86, but also
some generic RAS changes:
- AMD MCE error injection subsystem enhancements. (Aravind
Gopalakrishnan)
- Fix MCE and CPU hotplug interaction bug. (Ashok Raj)
- kcrash bootup robustness fix. (Baoquan He)
- kcrash cleanups. (Borislav Petkov)
- x86 microcode driver rework: simplify it by unmodularizing it and
other cleanups. (Borislav Petkov)"
* 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
x86/mce: Add a default case to the switch in __mcheck_cpu_ancient_init()
x86/mce: Add a Scalable MCA vendor flags bit
MAINTAINERS: Unify the microcode driver section
x86/microcode/intel: Move #ifdef DEBUG inside the function
x86/microcode/amd: Remove maintainers from comments
x86/microcode: Remove modularization leftovers
x86/microcode: Merge the early microcode loader
x86/microcode: Unmodularize the microcode driver
x86/mce: Fix thermal throttling reporting after kexec
kexec/crash: Say which char is the unrecognized
x86/setup/crash: Check memblock_reserve() retval
x86/setup/crash: Cleanup some more
x86/setup/crash: Remove alignment variable
x86/setup: Cleanup crashkernel reservation functions
x86/amd_nb, EDAC: Rename amd_get_node_id()
x86/setup: Do not reserve crashkernel high memory if low reservation failed
x86/microcode/amd: Do not overwrite final patch levels
x86/microcode/amd: Extract current patch level read to a function
x86/ras/mce_amd_inj: Inject bank 4 errors on the NBC
x86/ras/mce_amd_inj: Trigger deferred and thresholding errors interrupts
...
The PAGES_TO_MiB macro is used for unit conversion but the
trace_mc_event() tracepoint expects a page address. Fix that.
Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1445341538-24271-1-git-send-email-tanxiaojun@huawei.com
Signed-off-by: Borislav Petkov <bp@suse.de>
This function doesn't give us the "Node ID" as the function name
suggests. Rather, it receives a PCI device as argument, checks
the available F3 PCI device IDs in the system and returns the
index of the matching Bus/Device IDs.
Rename it to amd_pci_dev_to_node_id().
No functional change is introduced.
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1445246268-26285-3-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The bootloader may or may not enable the ECC_CORR_EN bit. By
not enabling ECC_CORR_EN, when error happens, it is the user's
responsibility to perform a full SDRAM scrub.
Remove the check for ECC_CORR_EN.
Signed-off-by: Dinh Nguyen <dinguyen@opensource.altera.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: Thor Thayer <tthayer@opensource.altera.com>
Link: http://lkml.kernel.org/r/1444864456-21778-1-git-send-email-dinguyen@opensource.altera.com
Signed-off-by: Borislav Petkov <bp@suse.de>
These are not implementations of default architecture code but helpers
for drivers. Move them to the place they belong to.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Darren Hart <dvhart@linux.intel.com>
Acked-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
debugfs_remove() is used to remove a file or a directory from the
debugfs filesystem, but mci->debugfs might not empty.
This can be triggered by the following sequence:
1) Enable CONFIG_EDAC_DEBUG
2) insmod an EDAC module (like i3000_edac or similar)
3) rmmod this module
4) we can see files remaining under <debugfs_mountpoint>/edac/ like
"fake_inject", for example.
Removing edac_core then, causes a NULL pointer dereference.
Reported-by: Yun Wu (Abel) <wuyun.wu@huawei.com>
Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1444787364-104353-1-git-send-email-tanxiaojun@huawei.com
Signed-off-by: Borislav Petkov <bp@suse.de>
This platform driver has an OF device ID table but the OF module alias
information is not created so module autoloading won't work.
Signed-off-by: Luis de Bethencourt <luisbg@osg.samsung.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20150917114619.GA13145@goodgumbo.baconseed.org
Signed-off-by: Borislav Petkov <bp@suse.de>
Git provides us all the changelogs anyway. So trim the comments section
here. Update the copyrights info while at it.
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1443440593-2316-3-git-send-email-Aravind.Gopalakrishnan@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
The scrub rate control register has moved to function 2 in PCI config
space and is at a different offset on family 0x15, models 0x60 and
later. The minimum recommended scrub rate has also changed. (Refer to
D18F2x1c9_dct[1:0][DramScrub] in Fam15hM60h BKDG).
Adjust set_scrub_rate() and get_scrub_rate() functions to accommodate
this.
Tested on F15hM60h, Fam15h, models 00h-0fh and Fam10h systems.
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1443440593-2316-2-git-send-email-Aravind.Gopalakrishnan@amd.com
[ Cleanup conditionals. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Updating dimm_label to an empty string does not make much sense. Change
the sysfs dimm_label store operation to fail a request when an input
string is empty.
Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: elliott@hpe.com
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1443124767.25474.172.camel@hpe.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Sysfs "dimm_label" and "chX_dimm_label" nodes have the following issues
in their store operation:
1) A newline-terminated input string causes redundant newlines:
# echo "test" > /sys/bus/mc0/devices/dimm0/dimm_label
# cat /sys/bus/mc0/devices/dimm0/dimm_label
test
# od -bc /sys/bus/mc0/devices/dimm0/dimm_label
0000000 164 145 163 164 012 012
t e s t \n \n
0000006
2) The original label string (31 characters) cannot be stored due to
an improper size check:
# echo "CPU_SrcID#0_Ha#0_Chan#0_DIMM#0" > /sys/bus/mc0/devices/dimm0/dimm_label
# cat /sys/bus/mc0/devices/dimm0/dimm_label
# od -bc /sys/bus/mc0/devices/dimm0/dimm_label
0000000 012 012
\n \n
0000002
3) An input string longer than the buffer size results a wrong label
info as it allows a retry with the remaining string:
# echo "CPU_SrcID#0_Ha#0_Chan#0_DIMM#0_TEST" > /sys/bus/mc0/devices/dimm0/dimm_label
# cat /sys/bus/mc0/devices/dimm0/dimm_label
_TEST
Fix these issues by making the following changes:
1) Replace a newline character at the end by setting a null. It also
assures that the string is null-terminated in the label buffer.
2) Check the label buffer size with 'sizeof(dimm->label)'.
3) Fail a request if its string exceeds the label buffer size.
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: Robert Elliott <elliott@hpe.com>
Link: http://lkml.kernel.org/r/1443121564.25474.160.camel@hpe.com
Signed-off-by: Borislav Petkov <bp@suse.de>
After
7d375bffa5 ("sb_edac: Fix support for systems with two home agents per socket")
sysfs "dimm_label" and "chX_dimm_label" show their label string without a
newline "\n" at the end.
[root@orange ~]# cat /sys/bus/mc0/devices/dimm0/dimm_label
CPU_SrcID#0_Ha#0_Chan#0_DIMM#0[root@orange ~]#
[root@orange ~]# cat /sys/devices/system/edac/mc/mc0/csrow0/ch0_dimm_label
CPU_SrcID#0_Ha#0_Chan#0_DIMM#0[root@orange ~]#
The label strings now have 31 characters, which are the same as
EDAC_MC_LABEL_LEN. Since the snprintf()s in channel_dimm_label_show()
and dimmdev_label_show() limit the whole length by EDAC_MC_LABEL_LEN,
the newline in the format "%s\n" is ignored.
[root@orange ~]# od -bc /sys/bus/mc0/devices/dimm0/dimm_label
0000000 103 120 125 137 123 162 143 111 104 043 060 137 110 141 043 060
C P U _ S r c I D # 0 _ H a # 0
0000020 137 103 150 141 156 043 060 137 104 111 115 115 043 060 000
_ C h a n # 0 _ D I M M # 0 \0
0000037
Fix it by using 'sizeof(dimm->label) + 1' as the whole length in the
snprintf()s in channel_dimm_label_show() and dimmdev_label_show().
Reported-by: Robert Elliott <elliott@hpe.com>
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Link: http://lkml.kernel.org/r/1442933883-21587-2-git-send-email-toshi.kani@hpe.com
Signed-off-by: Borislav Petkov <bp@suse.de>
In commit
7d375bffa5 ("sb_edac: Fix support for systems with two home agents per socket")
NUM_CHANNELS was changed to 8 and the channel space was renumerated to
handle EN, EP, and EX configurations.
The *_mci_bind_devs() functions - except for sbridge_mci_bind_devs() -
got a new device presence check in the form of saw_chan_mask. However,
sbridge_mci_bind_devs() still uses the NUM_CHANNELS for loop.
With the increase in NUM_CHANNELS, this loop fails at index 4 since
SB only has 4 TADs. This results in the following error on SB machines:
EDAC sbridge: Some needed devices are missing
EDAC sbridge: Couldn't find mci handler
EDAC sbridge: Couldn't find mci handle
This patch adapts the saw_chan_mask logic for sbridge_mci_bind_devs() as
well.
After this patch:
EDAC MC0: Giving out device to module sbridge_edac.c controller Sandy Bridge Socket#0: DEV 0000:3f:0e.0 (POLLED)
EDAC MC1: Giving out device to module sbridge_edac.c controller Sandy Bridge Socket#1: DEV 0000:7f:0e.0 (POLLED)
Signed-off-by: Seth Jennings <sjenning@redhat.com>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Tested-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org> # v4.2
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1438798561-10180-1-git-send-email-sjenning@redhat.com
Signed-off-by: Borislav Petkov <bp@suse.de>
We already have edac_mem_types[] that enumerates the different kinds of
memory. So, use that and remove the redundant memory_type[] array here.
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1442436811-23382-2-git-send-email-Aravind.Gopalakrishnan@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Drop CONFIG_EDAC_DEBUG ifdeffery too, while at it.
Tested-by: Loc Ho <lho@apm.com>
Cc: linux-edac@vger.kernel.org
Signed-off-by: Borislav Petkov <bp@suse.de>
This driver creates its debugfs hierarchy under the toplevel debugfs dir
- see i5100_init() - so make it use edac_debugfs_create_dir_at( , NULL)
because we're not breaking userspace. Oh well.
Signed-off-by: Borislav Petkov <bp@suse.de>
... into a separate compilation unit and drop a couple of
CONFIG_EDAC_DEBUG ifdefferies. Rename edac_create_debug_nodes() to
edac_create_debugfs_nodes(), while at it.
No functionality change.
Cc: <linux-edac@vger.kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
dimm_dev_type has been incorrectly determined in sb_edac. This patch fixes it
for Ivy Bridge and Haswell only since nothing like exists for Sandy Bridge.
We tested this patch in multiple systems matching the results with the
installed memory modules.
Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
In case the memory banks are populated so the first channel isn't used, the
DDRIO PCI device won't be visible and it won't be possible to determine the
memory type.
Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Pull RAS updates from Ingo Molnar:
"MCE handling updates, but also some generic drivers/edac/ changes to
better organize the Kconfig space"
* 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/ras: Move AMD MCE injector to arch/x86/ras/
x86/mce: Add a wrapper around mce_log() for injection
x86/mce: Rename rcu_dereference_check_mce() to mce_log_get_idx_check()
RAS: Add a menuconfig option with descriptive text
x86/mce: Reenable CMCI banks when swiching back to interrupt mode
x86/mce: Clear Local MCE opt-in before kexec
x86/mce: Remove unused function declarations
x86/mce: Kill drain_mcelog_buffer()
x86/mce: Avoid potential deadlock due to printk() in MCE context
x86/mce: Remove the MCE ring for Action Optional errors
x86/mce: Don't use percpu workqueues
x86/mce: Provide a lockless memory pool to save error records
x86/mce: Reuse one of the u16 padding fields in 'struct mce'
This is an x86-specific module and would benefit from being
closer to the arch code. Move it there. Update copyright while
at it.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/1439396985-12812-14-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This used to flush out MCEs logged during early boot and which
were in the MCA registers from a previous system run. No need
for that now, since we've moved to a genpool.
Suggested-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1439396985-12812-7-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Use unified genpool to save Action Optional error events and put
Action Optional error handling in the same notification chain as
MCE error decoding.
Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
[ Fold in subsequent patch from Boris for early boot logging. ]
Signed-off-by: Tony Luck <tony.luck@intel.com>
[ Correct a lot. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1439396985-12812-5-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The commit
de3910eb79 ("edac: change the mem allocation scheme to
make Documentation/kobject.txt happy")
changed the memory allocation for the csrows member. But ppc4xx_edac was
forgotten in the patch. Fix it.
Signed-off-by: Michael Walle <michael@walle.cc>
Cc: <stable@vger.kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Link: http://lkml.kernel.org/r/1437469253-8611-1-git-send-email-michael@walle.cc
Signed-off-by: Borislav Petkov <bp@suse.de>
Currently, when decoding an MCE, we display 'CE' for a Deferred error, like
this:
[Hardware Error]: CPU:0 (15:2:0) MC4_STATUS[Over|CE|MiscV|-|AddrV|Deferred|-|UECC]: 0xdc04b00095080813
When the 'UC' bit in the MCx_STATUS register is clear, the error status
is either a Corrected error or Deferred error as determined by the
'Deferred' bit. So do not print 'CE' on a deferred error.
Refer to AMD Error Scope Hierarchy table in a newer BKDG (example:
49125_15h_Models_30h-3Fh_BKDG.pdf, section "RAS Features").
Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1436788382-6463-1-git-send-email-aravind.gopalakrishnan@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
platform_driver does not need to set an owner because
platform_driver_register() will set it.
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Loc Ho <lho@apm.com>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Link: http://lkml.kernel.org/r/1436507243-11159-1-git-send-email-k.kozlowski@samsung.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Here is the driver core / firmware changes for 4.2-rc1.
A number of small changes all over the place in the driver core, and in
the firmware subsystem. Nothing really major, full details in the
shortlog. Some of it is a bit of churn, given that the platform driver
probing changes was found to not work well, so they were reverted.
All of these have been in linux-next for a while with no reported
issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iEYEABECAAYFAlWNoCQACgkQMUfUDdst+ym4JACdFrrXoMt2pb8nl5gMidGyM9/D
jg8AnRgdW8ArDA/xOarULd/X43eA3J3C
=Al2B
-----END PGP SIGNATURE-----
Merge tag 'driver-core-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here is the driver core / firmware changes for 4.2-rc1.
A number of small changes all over the place in the driver core, and
in the firmware subsystem. Nothing really major, full details in the
shortlog. Some of it is a bit of churn, given that the platform
driver probing changes was found to not work well, so they were
reverted.
All of these have been in linux-next for a while with no reported
issues"
* tag 'driver-core-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (31 commits)
Revert "base/platform: Only insert MEM and IO resources"
Revert "base/platform: Continue on insert_resource() error"
Revert "of/platform: Use platform_device interface"
Revert "base/platform: Remove code duplication"
firmware: add missing kfree for work on async call
fs: sysfs: don't pass count == 0 to bin file readers
base:dd - Fix for typo in comment to function driver_deferred_probe_trigger().
base/platform: Remove code duplication
of/platform: Use platform_device interface
base/platform: Continue on insert_resource() error
base/platform: Only insert MEM and IO resources
firmware: use const for remaining firmware names
firmware: fix possible use after free on name on asynchronous request
firmware: check for file truncation on direct firmware loading
firmware: fix __getname() missing failure check
drivers: of/base: move of_init to driver_init
drivers/base: cacheinfo: fix annoying typo when DT nodes are absent
sysfs: disambiguate between "error code" and "failure" in comments
driver-core: fix build for !CONFIG_MODULES
driver-core: make __device_attach() static
...
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJVjFrRAAoJEAhfPr2O5OEVD70P/A17My+ABfqme1snLyJuiQsz
Uuu1JpS4ZcCxEdc4J/nCxexlbwr7b5/2bkSWASIQaOYMOcgM7CkBs5Z7fIDjpgOt
UMCMCnmVW6A/nMeVKfmcDri0BDzW73Yt8jT8Urtp4yz0k5wRV5fBHHK0JH2l8Zj6
ND9KPJQqA96cshe4TiLBJj+stZhRJhPG/FeQ5GwTs6j4mn+lyu7bGJ/V5h45iYkG
6frYP8UJMYR6g1VXl3ohTlyBJfEFdoG5Uz0H5oyqfRuFJtEuXNSCQVy2xWscBUc5
A4l/brTXXDhHgo6mq7tFDO2BldiIMWHZhLbDvzoAlVOK5s14q30DhSF8q25CP3+t
uszwV6mihGerr1VjUEm3n1Uus5wuAXsgn75k1dnqbL/3aut0UYkOS9skml3ioWiV
PQB+Ix3Bjz7k9Hwi4Rd2KZEhwvPAWXBKyRjScq9DYjo3+Er8ClgoJTcADiYXPTRM
ctR2WY6PLkMhTdePBMJbK51ZljMGKZj0boDFHofZceQWVaDod8dSFn5HFjQkc3nu
Ty9Qb3LMDPZeensVg5bOlIgXsLoUoo7zlwhdxVRAyc6duSJndKAJoQL6l1RTvXYM
UQbEdMQZURGzTRyGui+tX9+NBER0n7mJtw9qUKWaxi3GlTGH8ETg/ga0COM0SEkD
Ax+ZvqYeyJSh4/F2nGFV
=bjTk
-----END PGP SIGNATURE-----
Merge tag 'edac/v4.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac
Pull edac updates from Mauro Carvalho Chehab:
"Some fixes and additions to the EDAC driver used on modern Intel x86
CPUs. It includes support for Broadwell EP/EX platforms and fixes for
motherboards with more than 2 CPU sockets"
* tag 'edac/v4.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac:
sb_edac: support for Broadwell -EP and -EX
sb_edac: Fix support for systems with two home agents per socket
sb_edac: Fix a typo and a thinko in address handling for Haswell
EDAC: Remove arbitrary limit on number of channels
When during injection we populate MCi_MISC by writing into misc, we need
to set the MiscV bit in the corresponding MCi_STATUS register which
denotes that there's valid info in the MCi_MISC register.
Signed-off-by: Borislav Petkov <bp@suse.de>
Save us an indentation level, widen to 80 cols, make the text more
succinct and slender. Use i as the bank variable, same as what the
documentation uses.
Signed-off-by: Borislav Petkov <bp@suse.de>
Suspend-to-RAM and EDAC support are mutually exclusive on SOCFPGA. If
EDAC is enabled, it will prevent the platform from going into suspend.
The reason is that the IRQ vectors for OCRAM reside on DDR and in
Suspend-to-RAM mode we're executing out of OCRAM. If an ECC error
occurs, we can't handle it so it was decided to make them mutually
exclusive.
Signed-off-by: Alan Tull <atull@opensource.altera.com>
Signed-off-by: Dinh Nguyen <dinguyen@opensource.altera.com>
Cc: dinh.linux@gmail.com
Cc: dougthompson@xmission.com
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: mchehab@osg.samsung.com
Cc: tthayer@opensource.altera.com
Link: http://lkml.kernel.org/r/1433512155-9906-1-git-send-email-dinguyen@opensource.altera.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Provide information about each injection file and its usage for ease of
use and in-band documentation. This is a good idea adapted from ftrace.
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: mchehab@osg.samsung.com
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/1433277362-10911-7-git-send-email-Aravind.Gopalakrishnan@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Add per-file permissions to the dfs_fls[] array.
In a later patch, we will add a README file that needs different
permissions. Hence the move here to add a perm field.
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: mchehab@osg.samsung.com
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/1433277362-10911-6-git-send-email-Aravind.Gopalakrishnan@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Use strings such as "hw" or "sw" to indicate the type of error injection
to be performed.
Current flags attribute derives the meanings of values that can be
programmed into it from asm/mce.h. Moving to defined strings for the
attribute allows this module to be self-sufficient and removes the
dependency. Also, we can introduce new flags as and when needed without
having to worry about conflicting with the flags already defined in
asm/mce.h.
Also, modify do_inject() to use the newly defined injection_type enum to
figure out the injection mechanism we need to use
Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: mchehab@osg.samsung.com
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/1433277362-10911-4-git-send-email-Aravind.Gopalakrishnan@amd.com
[ Use strstrip() return value. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
The number of banks for a given processor is encoded in
MSR_IA32_MCG_CAP[7:0]. So obtain the value from that MSR and use it for
sanity checking in inj_bank_set() instead of doing a family/model check.
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: mchehab@osg.samsung.com
Link: http://lkml.kernel.org/r/1432753418-2985-3-git-send-email-Aravind.Gopalakrishnan@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Basic support for the single socket Broadwell-DE processor
was added back in commit 1f39581a9a
sb_edac: Add support for Broadwell-DE processor
This patch extends Broadwell support to cover the two
socket "-EP" and four socket "-EX" versions of Broadwell.
Only tested on the 2 socket - but this code is largely
cloned from the Haswell path.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
First noticed a problem on a 4 socket machine where EDAC only reported
half the DIMMS. Tracked this down to the code that assumes that systems
with two home agents only have two memory channels on each agent. This
is true on 2 sockect ("-EP") machines. But four socket ("-EX") machines
have four memory channels on each home agent.
The old code would have had problems on two socket systems as it did
a shuffling trick to make the internals of the code think that the
channels from the first agent were '0' and '1', with the second agent
providing '2' and '3'. But the code didn't uniformly convert from
{ha,channel} tuples to this internal representation.
New code always considers up to eight channels.
On a machine with a single home agent these map easily to edac channels
0, 1, 2, 3. On machines with two home agents we map using:
edac_channel = 4*ha# + channel
So on a -EP machine where each home agent supports only two channels
we'll fill in channels 0, 1, 4, 5, and on a -EX machine we use all of 0,
1, 2, 3, 4, 5, 6, 7.
[mchehab@osg.samsung.com: fold a fixup patch as per Tony's request and fixed
a few CodingStyle issues]
Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
typo: "a7mode" chooses whether to use bits {8, 7, 9} or {8, 7, 6}
in the algorithm to spread access between memory resources. But
the non-a7mode path was incorrectly using GET_BITFIELD(addr, 7, 9)
and so picking bits {9, 8, 7}
thinko: BIT(1) of the dram_rule registers chooses whether to just
use the {8, 7, 6} (or {8, 7, 9}) bits mentioned above as they are,
or to XOR them with bits {18, 17, 16} but the code inverted the
test. We need the additional XOR when dram_rule{1} == 0.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Currently set to "6", but the reset of the code will dynamically
allocate as needed. We need to go to "8" today, but drop the check
completely to save doing this again when we need even larger numbers.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
<asm/edac.h> contains only the arch-specific scrubbing function and is
thus not needed in edac_stub.c. Kill it.
Signed-off-by: Borislav Petkov <bp@suse.de>
So first of all, this atomic_scrub() function's naming is bad. It looks
like an atomic_t helper. Change it to edac_atomic_scrub().
The bigger problem is that this function is arch-specific and every new
arch which doesn't necessarily need that functionality still needs to
define it, otherwise EDAC doesn't compile.
So instead of doing that and including arch-specific headers, have each
arch define an EDAC_ATOMIC_SCRUB symbol which can be used in edac_mc.c
for ifdeffery. Much cleaner.
And we already are doing this with another symbol - EDAC_SUPPORT. This
is also much cleaner than having CONFIG_EDAC enumerate all the arches
which need/have EDAC support and drivers.
This way I can kill the useless edac.h header in tile too.
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Chris Metcalf <cmetcalf@ezchip.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-edac@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: "Maciej W. Rozycki" <macro@codesourcery.com>
Cc: Markos Chandras <markos.chandras@imgtec.com>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "Steven J. Hill" <Steven.Hill@imgtec.com>
Cc: x86@kernel.org
Signed-off-by: Borislav Petkov <bp@suse.de>
While testing asynchronous PCI probe on this driver I noticed it failed
because the driver checks if any of the PCI devices have been bound to
the driver after registering it, which obviously does not work if
probing is asynchronous.
While there are patches and discussions on how the driver should behave
are ongoing, let's enforce synchronous probe for this driver for now.
Reviewed-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The SDRAM EDAC requires SDRAM configuration/initialization before SDRAM
is accessed (in the preloader) and therefore before Linux is loaded.
Having a module compile is not desired so force to be built into kernel.
Signed-off-by: Thor Thayer <tthayer@opensource.altera.com>
Cc: mchehab@osg.samsung.com
Cc: Takashi Iwai <tiwai@suse.de>
Link: http://lkml.kernel.org/r/1429308974-26380-3-git-send-email-tthayer@opensource.altera.com
Signed-off-by: Borislav Petkov <bp@suse.de>
The semantic patch that fixes this problem is as follows:
(http://coccinelle.lip6.fr/)
// <smpl>
@r@
type T;
identifier f;
@@
static T f (...) { ... }
@@
identifier r.f;
declarer name EXPORT_SYMBOL_GPL;
@@
-EXPORT_SYMBOL_GPL(f);
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: Tim Small <tim@buttersideup.com>
Link: http://lkml.kernel.org/r/1426092997-30605-13-git-send-email-Julia.Lawall@lip6.fr
Signed-off-by: Borislav Petkov <bp@suse.de>
edac_init() does not deallocate already allocated resources on failure
path.
Found by Linux Driver Verification project (linuxtesting.org).
[ Boris: The unwind path functions have __exit annotation but are being
used in an __init function, leading to section mismatches. Drop the
section annotation and make them normal functions. ]
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Link: http://lkml.kernel.org/r/1423203162-26368-1-git-send-email-khoroshilov@ispras.ru
Signed-off-by: Borislav Petkov <bp@suse.de>
Instead of calling device_create_file() and device_remove_file()
manually, pass the static attribute groups with the new
edac_mc_add_mc_with_groups(). The conditional creation of inject sysfs
files is done by a proper is_visible callback.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: http://lkml.kernel.org/r/1423046938-18111-4-git-send-email-tiwai@suse.de
Signed-off-by: Borislav Petkov <bp@suse.de>
Add edac_mc_add_mc_with_groups() for initializing the mem_ctl_info
object with the optional attribute groups. This allows drivers to
pass additional sysfs entries without manual (and racy)
device_create_file() and co calls.
edac_mc_add_mc() is kept as is, just calling edac_mc_add_with_groups()
with NULL groups.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: http://lkml.kernel.org/r/1423046938-18111-3-git-send-email-tiwai@suse.de
Signed-off-by: Borislav Petkov <bp@suse.de>
Instead of manual calls of device_create_file() and
device_remove_file(), use static attribute groups with proper
is_visible callbacks for managing the sysfs entries.
This simplifies the code a lot and avoids the possible races.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: http://lkml.kernel.org/r/1423046938-18111-2-git-send-email-tiwai@suse.de
Signed-off-by: Borislav Petkov <bp@suse.de>
The pci_dev_put() function tests whether its argument is NULL and thus
the test around the call is not needed.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Link: http://lkml.kernel.org/r/54CFC12C.9010002@users.sourceforge.net
Signed-off-by: Borislav Petkov <bp@suse.de>
* A fix to amd64_edac to not explode on Numascale machines with more
than 16 memory controllers, from Daniel J Blueman.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJU5aSqAAoJEBLB8Bhh3lVKGbgP+gOzBaeBlIa9BcdrohiF2mKz
UyS/2v/RN/OK0F9u/LUJr15jwZex4TPbE7QPoMF6IvsHQR/5Jh5k4fZUaU8/3sY8
F6ugd7/I4x2mFrYvcJsK5PUm5ZqdYG6dyQKNAInqQw+/sAdL9i9shXz9SUUJXh98
Qa2zGSEyJVNIYmTi3rcSy8gxwJR8xdwL56iGmt4HEc/asjziSoIxOCwEVLh6OR9e
vgLDzOntgS56ymbPiNXHfpEE7IqBkJELCFma6RHer4L4R2fTlJRqk1oahS91tx3X
/m70IVrLu/v6/aDWMdDzSj9oTm6mZD2IpQEAurrp+kFqzLj7EUOlhnXJF1fsyEJY
OmDO+/Z72iRfgVv0SlT0AsrDkVvXUOfERBhyKZpd4wxUV2/XUYFqhwVPE86+al8p
wSuwUJrvKQ4bGdRQYEufZBO7JOUTk8K09iFEzREEYbzvEZPc4ZPMUTXAgOA54x6V
HOhD6NhPR0RJwg9OgqJndgnF0XTcruh6/LuFO2ioyKCR92hjutwuoYyxVI8jfv1W
rHB09wdNzv4EIIoH/5BeBNIv3Vtc04n5d6MRWbYHmBWAt6Ib+jBXWVNMoWSrAOIQ
4MNBgxH5sEhTFKnFOt+/cXZktLVr0G79vii5GaodXKgaTnV59Jm4sV06k5vxHxly
eee2zEWMoT3M4fhYC3St
=Zypx
-----END PGP SIGNATURE-----
Merge tag 'edac_fixes_for_3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp
Pull two EDAC fixes from Borislav Petkov:
- A fix to sb_edac for proper detection on SNB machines
- A fix to amd64_edac to not explode on Numascale machines with more
than 16 memory controllers, from Daniel J Blueman.
* tag 'edac_fixes_for_3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
EDAC, amd64_edac: Prevent OOPS with >16 memory controllers
sb_edac: Fix detection on SNB machines
d0585cd815 ("sb_edac: Claim a different PCI device") changed the
probing of sb_edac to look for PCI device 0x3ca0:
3f:0e.0 System peripheral: Intel Corporation Xeon E5/Core i7 Processor Home Agent (rev 07)
00: 86 80 a0 3c 00 00 00 00 07 00 80 08 00 00 80 00
...
but we're matching for 0x3ca8, i.e. PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_TA
in sbridge_probe() therefore the probing fails.
Changing it to probe for 0x3ca0 (PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_HA0),
.i.e., the 14.0 device, fixes the issue and driver loads successfully
again:
[ 2449.013120] EDAC DEBUG: sbridge_init:
[ 2449.017029] EDAC sbridge: Seeking for: PCI ID 8086:3ca0
[ 2449.022368] EDAC DEBUG: sbridge_get_onedevice: Detected 8086:3ca0
[ 2449.028498] EDAC sbridge: Seeking for: PCI ID 8086:3ca0
[ 2449.033768] EDAC sbridge: Seeking for: PCI ID 8086:3ca8
[ 2449.039028] EDAC DEBUG: sbridge_get_onedevice: Detected 8086:3ca8
[ 2449.045155] EDAC sbridge: Seeking for: PCI ID 8086:3ca8
...
Add a debug printk while at it to be able to catch the failure in the
future and dump driver version on successful load.
Fixes: d0585cd815 ("sb_edac: Claim a different PCI device")
Cc: stable@vger.kernel.org # 3.18
Acked-by: Aristeu Rozanski <aris@redhat.com>
Cc: Tony Luck <tony.luck@intel.com>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
If edac_mc_add_mc() fails then we should preserve the error code, but
instead the current code returns success.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: http://lkml.kernel.org/r/20150128191351.GC10259@mwanda
Signed-off-by: Borislav Petkov <bp@suse.de>
Also use goto labels for all failure paths in
edac_create_sysfs_mci_device and update meaningless labels.
Signed-off-by: Junjie Mao <junjie.mao@hotmail.com>
Link: http://lkml.kernel.org/r/BLU436-SMTP25291B6B612942A212AEBFE95300@phx.gbl
[ Boris: Use ! for 0 checks and add newlines for less crammed code. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Add EDAC support for ecc errors reporting on the synopsys ddr
controller. The ddr ecc controller corrects single bit errors and
detects double bit errors.
Selected important-ish notes from the changelog:
- I have not taken care of spliting synps_edac_geterror_info function as
it adds additional indentation levels and moreover the existing changes
were made as part of the v2 review comments
- Removed dt binding info as already there is a binding info available
under memorycontroller. so, updated ecc info there.
- Shortened the prefix "sysnopsys" to "synps"
Signed-off-by: Punnaiah Choudary Kalluri <punnaia@xilinx.com>
Link: http://lkml.kernel.org/r/a728a8d4678f4dbf9de189a480297c3d@BY2FFO11FD034.protection.gbl
[ Boris: massage commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Fixes the following sparse warnings:
drivers/edac/mce_amd_inj.c:204:3: warning:
symbol 'dfs_fls' was not declared. Should it be static?
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Link: http://lkml.kernel.org/r/1418087095-14174-1-git-send-email-weiyj_lk@163.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Here's the set of driver core patches for 3.19-rc1.
They are dominated by the removal of the .owner field in platform
drivers. They touch a lot of files, but they are "simple" changes, just
removing a line in a structure.
Other than that, a few minor driver core and debugfs changes. There are
some ath9k patches coming in through this tree that have been acked by
the wireless maintainers as they relied on the debugfs changes.
Everything has been in linux-next for a while.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iEYEABECAAYFAlSOD20ACgkQMUfUDdst+ylLPACg2QrW1oHhdTMT9WI8jihlHVRM
53kAoLeteByQ3iVwWurwwseRPiWa8+MI
=OVRS
-----END PGP SIGNATURE-----
Merge tag 'driver-core-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core update from Greg KH:
"Here's the set of driver core patches for 3.19-rc1.
They are dominated by the removal of the .owner field in platform
drivers. They touch a lot of files, but they are "simple" changes,
just removing a line in a structure.
Other than that, a few minor driver core and debugfs changes. There
are some ath9k patches coming in through this tree that have been
acked by the wireless maintainers as they relied on the debugfs
changes.
Everything has been in linux-next for a while"
* tag 'driver-core-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (324 commits)
Revert "ath: ath9k: use debugfs_create_devm_seqfile() helper for seq_file entries"
fs: debugfs: add forward declaration for struct device type
firmware class: Deletion of an unnecessary check before the function call "vunmap"
firmware loader: fix hung task warning dump
devcoredump: provide a one-way disable function
device: Add dev_<level>_once variants
ath: ath9k: use debugfs_create_devm_seqfile() helper for seq_file entries
ath: use seq_file api for ath9k debugfs files
debugfs: add helper function to create device related seq_file
drivers/base: cacheinfo: remove noisy error boot message
Revert "core: platform: add warning if driver has no owner"
drivers: base: support cpu cache information interface to userspace via sysfs
drivers: base: add cpu_device_create to support per-cpu devices
topology: replace custom attribute macros with standard DEVICE_ATTR*
cpumask: factor out show_cpumap into separate helper function
driver core: Fix unbalanced device reference in drivers_probe
driver core: fix race with userland in device_add()
sysfs/kernfs: make read requests on pre-alloc files use the buffer.
sysfs/kernfs: allow attributes to request write buffer be pre-allocated.
fs: sysfs: return EGBIG on write if offset is larger than file size
...
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJUhyDVAAoJEAhfPr2O5OEVsdMP/it7jKAKu1uHvVoWfwPcB0nG
QZ7Dt3Hu9aZtaZJs9OFOl2RrGyp4tXOK1SM2QInRGRjjquEP83AASk7XJRkXHr61
DTiMxHnKXvGfky4ONS76ZP3JWwfBffT5Y8oXIGiarAyUgcK66mnelrlju0r1XM11
uoUZqOzE5ySfARFcYm9VG53qLG4RxOERqP+EKSyctRE5gCsuymPaSKIwqj7rcg4R
QikDJQv2H8y2Ui0T9Dk6urdOjlUpBczGRVaXdY+2FyDfs0KPLEWUQVl8q2Lj/bur
1A10H+p9HdIA9emV2WEetQmJzz7POgn/j+BLkRKImPS5NgKv2Gu5BWiVUKQg4oRt
shvcpp38PYSmeDIy16v+3RDf0THSSYxhq7ExlGacIzaRk7G2asgRAWixiVS45NzI
yMb17ZJlHhzVMEyoSlTJH90o4y6UT9njZ7TcK3+9wmhbFyIlGWrrD1Mp56PDCYes
KO5VRYkwnw8GAd7gH0xwtHRREQYt4VneVyb7Ccw2VxrdIffVp1fDhpxnS+tw+Qri
294+RoJlnrp4JIpXerAJgQIIiMsRhg64EQ3rsT4Skj6Hfq4KExkmdrHE38p+FZLO
xHURKAt2jDSuu/DqaWpBclJXE4Q+MhS303pn80x0MHbFmRAYxwmWFjgPwQKrzNU9
x47JgdYNpBoF2+WK6ZGo
=CeIU
-----END PGP SIGNATURE-----
Merge tag 'edac/v3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac
Pull edac updates from Mauro Carvalho Chehab:
- Broadwell-DE support on sb-edac driver
- Some fixes at sb-edac driver
* tag 'edac/v3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac:
sb_edac: Fix typo computing number of banks
sb_edac: Add support for Broadwell-DE processor
sb_edac: Fix discovery of top-of-low-memory for Haswell
sb_edac: Fix erroneous bytes->gigabytes conversion
sb_edac: Fix off-by-one error in number of channels
Pull x86 RAS update from Ingo Molnar:
"The biggest change in this cycle is better support for UCNA
(UnCorrected No Action) events:
"Handle all uncorrected error reports in the same way (soft
offline the page). We used to only do that for SRAO
(software recoverable action optional) machine checks, but
it makes sense to also do it for UCNA (UnCorrected No
Action) logs found by CMCI or polling."
plus various x86 MCE handling updates and fixes"
* 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mce: Spell "panicked" correctly
x86, mce: Support memory error recovery for both UCNA and Deferred error in machine_check_poll
x86, mce, severity: Extend the the mce_severity mechanism to handle UCNA/DEFERRED error
x86, MCE, AMD: Assign interrupt handler only when bank supports it
x86, MCE, AMD: Drop software-defined bank in error thresholding
x86, MCE, AMD: Move invariant code out from loop body
x86, MCE, AMD: Correct thresholding error logging
x86, MCE, AMD: Use macros to compute bank MSRs
RAS, HWPOISON: Fix wrong error recovery status
GHES: Make ghes_estatus_caches static
APEI, GHES: Cleanup unnecessary function for lockless list
* Enablement for AMD F15h models 0x60 CPUs. Most notably DDR4 RAM
support. Out of tree stuff is
arch/x86/kernel/amd_nb.c | 2 +
include/linux/pci_ids.h | 2 +
adding the required PCI IDs. From Aravind Gopalakrishnan.
* Enable amd64_edac for 32-bit due to popular demand. From Tomasz Pala.
* Convert the AMD MCE injection module to debugfs, where it belongs.
* Misc EDAC cleanups
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJUhZbcAAoJEBLB8Bhh3lVKd5kQAK77qsB4ebpke1rEBerQl9jQ
YCVrByCKu7QTRt/xvlqU9Vyp7EvcnpNxFbRCCqzIpBcJjre9v17QVRA2/zFS0q81
QRDTOWf9uhMWbssI2Zu1hbjDNMWYiEb9+aeZHjScVtkzPDmsgYuKGdWfTSLw9dkS
AG/UUZ3ojwyc2dK3i0W3pCjakKYLUsCmijyTZBfb37+u4rRGuAgrQ9G8fBn2lL+l
huptgV2BsTCQqdL554zTs64Yt912PnidsJWYCPCjMubgEPSeNcWRzTTBYDUf9NIn
RxFXYHnOQxetPSQqfLVXlX3V/cGNQg0yEXFZ9S0tCt5uLNbbN/D8Uumtst0rq9x3
XkJ/EGHXBFP+KwHdV/i9j6OYM5rq4z+4ql4OqbWzsvPrEDbh/4p5gRbhqd1Hhy9U
zgHJjVPpD/l2t82Tpz0jIOscQruZ6VqGMDSYo3LiLnNt724pcrmr3DiN9mc6ljzJ
rsNsemMH0IoH8KbBHKGtMLnBVO6HbnrtC6iKFfocNBisvo1PKKzn9s2O1pdjmsCs
jHwz5njoM7Ki/ygkJhbKiSDMXPs67eggwoGIGzpNMoY4RWxrcQzYE9yKfKzRNxET
Qb3xUwDWDyL8ErwHtL3xMxGwyfkhb+SZdMd5aKYA5Rdbf+TN8P6iAv05nrnfpkyk
lPTv5o9EQYvr8/Tc9FZW
=ULmb
-----END PGP SIGNATURE-----
Merge tag 'edac_for_3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp
Pull EDAC updates from Borislav Petkov:
"EDAC updates all over the place:
- Enablement for AMD F15h models 0x60 CPUs. Most notably DDR4 RAM
support. Out of tree stuff is adding the required PCI IDs. From
Aravind Gopalakrishnan.
- Enable amd64_edac for 32-bit due to popular demand. From Tomasz
Pala.
- Convert the AMD MCE injection module to debugfs, where it belongs.
- Misc EDAC cleanups"
* tag 'edac_for_3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
EDAC, MCE, AMD: Correct formatting of decoded text
EDAC, mce_amd_inj: Add an injector function
EDAC, mce_amd_inj: Add hw-injection attributes
EDAC, mce_amd_inj: Enable direct writes to MCE MSRs
EDAC, mce_amd_inj: Convert mce_amd_inj module to debugfs
EDAC: Delete unnecessary check before calling pci_dev_put()
EDAC, pci_sysfs: remove unneccessary ifdef around entire file
ghes_edac: Use snprintf() to silence a static checker warning
amd64_edac: Build module on x86-32
EDAC, MCE, AMD: Add decoding table for MC6 xec
amd64_edac: Add F15h M60h support
{mv64x60,ppc4xx}_edac,: Remove deprecated IRQF_DISABLED
EDAC: Sync memory types and names
EDAC: Add DDR3 LRDIMM entries to edac_mem_types
x86, amd_nb: Add device IDs to NB tables for F15h M60h
pci_ids: Add PCI device IDs for F15h M60h
Code will always think there are 16 banks because of a typo
Reported-by: Misha
Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Broadwell-DE is the microserver version of next generation Xeon
processors. A whole bunch of new PCIe device ids, but otherwise
pretty much the same as Haswell.
Acked-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Haswell moved the TOLM/TOHM registers to a different device and offset.
The sb_edac driver accounted for the change of device, but not for the
new offset. There was also a typo in the constant to fill in the low
26 bits (was 0x1ffffff, should be 0x3ffffff).
This resulted in a bogus value for the top of low memory:
EDAC DEBUG: get_memory_layout: TOLM: 0.032 GB (0x0000000001ffffff)
which would result in EDAC refusing to translate addresses for
errors above the bogus value and below 4GB:
sbridge MC3: HANDLING MCE MEMORY ERROR
sbridge MC3: CPU 0: Machine Check Event: 0 Bank 7: 8c00004000010090
sbridge MC3: TSC 0
sbridge MC3: ADDR 2000000
sbridge MC3: MISC 523eac86
sbridge MC3: PROCESSOR 0:306f3 TIME 1414600951 SOCKET 0 APIC 0
MC3: 1 CE Error at TOLM area, on addr 0x02000000 on any memory ( page:0x0 offset:0x0 grain:32 syndrome:0x0)
With the fix we see the correct TOLM value:
DEBUG: get_memory_layout: TOLM: 2.048 GB (0x000000007fffffff)
and we decode address 2000000 correctly:
sbridge MC3: HANDLING MCE MEMORY ERROR
sbridge MC3: CPU 0: Machine Check Event: 0 Bank 7: 8c00004000010090
sbridge MC3: TSC 0
sbridge MC3: ADDR 2000000
sbridge MC3: MISC 523e1086
sbridge MC3: PROCESSOR 0:306f3 TIME 1414601319 SOCKET 0 APIC 0
DEBUG: get_memory_error_data: SAD interleave package: 0 = CPU socket 0, HA 0, shiftup: 0
DEBUG: get_memory_error_data: TAD#0: address 0x0000000002000000 < 0x000000007fffffff, socket interleave 1, channel interleave 4 (offset 0x00000000), index 0, base ch: 0, ch mask: 0x01
DEBUG: get_memory_error_data: RIR#0, limit: 4.095 GB (0x00000000ffffffff), way: 1
DEBUG: get_memory_error_data: RIR#0: channel address 0x00200000 < 0xffffffff, RIR interleave 0, index 0
DEBUG: sbridge_mce_output_error: area:DRAM err_code:0001:0090 socket:0 channel_mask:1 rank:0
MC3: 1 CE memory read error on CPU_SrcID#0_Channel#0_DIMM#0 (channel:0 slot:0 page:0x2000 offset:0x0 grain:32 syndrome:0x0 - area:DRAM err_code:0001:0090 socket:0 channel_mask:1 rank:0)
Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Write out MCx_ADDR into the more humanly readable "MCx Error Address"
and remove double colon in the output.
Cc: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Selectively inject either a real MCE or a sw-only version which
exercises the decoding path only. The hardware-injected MCE triggers a
machine check exception (#MC) so that the MCE handler can be bothered to
do something too.
Signed-off-by: Borislav Petkov <bp@suse.de>
Until now, the mce_severity mechanism can only identify the severity
of UCNA error as MCE_KEEP_SEVERITY. Meanwhile, it is not able to filter
out DEFERRED error for AMD platform.
This patch extends the mce_severity mechanism for handling
UCNA/DEFERRED error. In order to do this, the patch introduces a new
severity level - MCE_UCNA/DEFERRED_SEVERITY.
In addition, mce_severity is specific to machine check exception,
and it will check MCIP/EIPV/RIPV bits. In order to use mce_severity
mechanism in non-exception context, the patch also introduces a new
argument (is_excp) for mce_severity. `is_excp' is used to explicitly
specify the calling context of mce_severity.
Reviewed-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Signed-off-by: Chen Yucong <slaoub@gmail.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The pci_dev_put() function tests whether its argument is NULL and then
returns immediately. Thus the test before the call is not needed.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Link: http://lkml.kernel.org/r/546CB20D.4070808@users.sourceforge.net
[ Boris: commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
The file edac_pci_sysfs.c is dependent on CONFIG_PCI. This is already
modelled in the Makefile, but edac_pci_sysfs.o is still contained in
the list of files compiled even without CONFIG_PCI.
This change removes edac_pci_sysfs.o from the list of built objects
when not having CONFIG_PCI enabled and removes the then-unnecessary
ifdef from the source file.
Signed-off-by: Andreas Ruprecht <rupran@einserver.de>
Link: http://lkml.kernel.org/r/1407697803-3837-1-git-send-email-rupran@einserver.de
Signed-off-by: Borislav Petkov <bp@suse.de>
My static checker complains because the "e->location" has up to 256
characters but we are copying it into the "pvt->detail_location" which
only has space for 240 characters. That's not counting the surrounding
text and the "e->other_detail" string which can be over 80 characters
long.
I am not familiar with this code but presumably it normally works.
Let's add a limit though for safety.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Link: http://lkml.kernel.org/r/20140801082514.GD28869@mwanda
Signed-off-by: Borislav Petkov <bp@suse.de>
By popular demand, enable amd64_edac on 32-bit too.
Boris:
- update Kconfig text.
- add a warning on load which states that 32-bit configurations are unsupported.
Signed-off-by: Tomasz Pala <gotar@polanet.pl>
Link: http://lkml.kernel.org/r/20141102102212.GA7034@polanet.pl
Signed-off-by: Borislav Petkov <bp@suse.de>
This patch adds support for ECC error decoding for F15h M60h processor.
Aside from the usual changes, the patch adds support for some new features
in the processor:
- DDR4(unbuffered, registered); LRDIMM DDR3 support
- relevant debug messages have been modified/added to report these
memory types
- new dbam_to_cs mappers
- if (F15h M60h && LRDIMM); we need a 'multiplier' value to find
cs_size. This multiplier value is obtained from the per-dimm
DCSM register. So, change the interface to accept a 'cs_mask_nr'
value to facilitate this calculation
- switch-casing determine_memory_type()
- done to cleanse the function of too many if-else statements
and improve readability
- This is now called early in read_mc_regs() to cache dram_type
Misc cleanup:
- amd64_pci_table[] is condensed by using PCI_VDEVICE macro.
Testing details:
Tested the patch by injecting 'ECC' type errors using mce_amd_inj
and error decoding works fine.
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Link: http://lkml.kernel.org/r/1414617483-4941-1-git-send-email-Aravind.Gopalakrishnan@amd.com
[ Boris: determine_memory_type() cleanups ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Make keeping the sync between the mem_types enum and the actual string
names simpler by using designated initializers.
Signed-off-by: Borislav Petkov <bp@suse.de>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJUNsUKAAoJEAhfPr2O5OEVXSMQAJ953vtyqmMEi02NMf24NpA+
OuTmoe5c8RT7/+YD2edHzG7VgYIE1L4evVOm71XLoPwI3mmGwuAIYZAFvVXsYPtB
U7xfWHpC5r29DlTQen7L0doD1NnRTQfOWFUtsHnd5ygdrzmIHToXqqN1IjoQudpK
i8ttU5zshSMt1IPwFh+CXMHkp8wA11hGX4LyonmJ0WD/bCeu8kreilRrK/ehZtAU
sBjzEif2+xtqAWBaxyZ0IzWBJdYBjo6u68jyifK0liM8oBZ8vov11i6cCZBuGGAy
eNG6lNBmak77U187yUeeyqjbdhnPy7NPLEvvfDN/C5voGHqPkrKEjH64j54ayeaR
TQ6u6VlPJ+3RXeqtRfYbMOgHAsxNMpJZLkKx0NNty073RQc2qgX8Xxs4t33+9B37
TfkoL8fnkh7Mq9x8czRw4n5X4qiw2ZeDDNX4TZPdGP2QQwP3JVDfMOzyr6x+BhMp
4TwCp+Sr9PeohyGYZ8InjRdxkA3yTrc1CbqSC00lXBe0lt9Pw4aQ9HGFQWSXxYH3
uZ8PoqBpxy3C5xvcwDQ8kjmu6y0GPtsRHAdb0G3HW3bjMiLuQbVohTxILEvg9HWr
2tSyKSfUZXJaTckXfWOVelyexkh3IHvQ0AoQFtG6qXQh2HF12S1OV7EoGxTua2Ye
/XDAjKzdi12tRLlY172L
=FRmr
-----END PGP SIGNATURE-----
Merge tag 'edac/v3.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac
Pull edac updates from Mauro Carvalho Chehab:
"Nothing really exiting here: just one bug fix at sb_edac, and some
changes to allow other drivers to use some shared PCI addresses"
* tag 'edac/v3.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac:
sb_edac: Claim a different PCI device
Move Intel SNB device ids from sb_edac to pci_ids.h
sb_edac: avoid INTERNAL ERROR message in EDAC with unspecified channel
These are changes for drivers that are intimately tied to some SoC
and for some reason could not get merged through the respective
subsystem maintainer tree.
Most of the new code is for the Keystone Navigator driver, which is
new base support that is going to be needed for their hardware
accelerated network driver and other units.
Most of the commits are for moving old code around from at91 and omap
for things that are done in device drivers nowadays.
- at91: move reset, poweroff, memory and clocksource code into drivers
directories
- socfpga: add edac driver (through arm-soc, as requested by Boris)
- omap: move omap-intc code to drivers/irqchip
- sunxi: added an RTC driver for sun6i
- omap: mailbox driver related changes
- keystone: support for the "Navigator" component
- versatile: new reboot, led and soc drivers
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIVAwUAVDWWQGCrR//JCVInAQKX7Q//bDkoseKCZsGaXN7vfQ2YhT3SAc52mROV
YQKdNmtMUrHqDgngATZTx5ogOh1hInnqueFjGGhfMYsHQO1Vj8+odj0r+4jhjuUY
3YfY+qZ+91tq33JlUOhKn+mfVMdxJc8XarGgR6MSWYkqWVYCtLtBluum7hKm2UJ6
/e4hd2zzImX5ATwj/LXWLx5eTf1qAVFGWzNUph1DrW+1V5lOu58X4gKwk1QOCVEh
Pa0GV9oRTkjoswwz9drzjeFtie2yofQ2mygj6QKxg5NsosIF0+B8kJ61Sxwg56Ak
tF+qn1hGtB2cDQkpxK4o2cZgCELhkh5Aqgol/vZUS1DMBSUEGCV9PPp2eOW83r3B
0zsTgsShyVcTh7khdpQmHNRigvcc7e69LaAGC4o/RxaZpCU/LUNCQ+/iqVExSE8A
VNEXr+JNxGxhj3m9KUHuEktdWx1oNvaYR8Rr4RPr6EWR8R6emJ04I7kXInvzhJZL
HOGh75vSuAU83FrsP8fFRLadoHNVDXylAs38BPfGEMngVpjvwJLgQ3+729CwW+Q4
+xQXAKSwKfr8xA8eg6wBSbFcwnEW4QwRqFqQ5XPw7zTZkCZbiLtvn3JpI5bH5A5Q
/d2D+M2vFbB7VbWJBM4etO95eNS/pfhqJhcQh4t0DjXjoW6WqLiHCxhEx8Ogfvop
/4ckyGvtEOI=
=POJD
-----END PGP SIGNATURE-----
Merge tag 'drivers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC driver updates from Arnd Bergmann:
"These are changes for drivers that are intimately tied to some SoC and
for some reason could not get merged through the respective subsystem
maintainer tree.
Most of the new code is for the Keystone Navigator driver, which is
new base support that is going to be needed for their hardware
accelerated network driver and other units.
Most of the commits are for moving old code around from at91 and omap
for things that are done in device drivers nowadays.
- at91: move reset, poweroff, memory and clocksource code into
drivers directories
- socfpga: add edac driver (through arm-soc, as requested by Boris)
- omap: move omap-intc code to drivers/irqchip
- sunxi: added an RTC driver for sun6i
- omap: mailbox driver related changes
- keystone: support for the "Navigator" component
- versatile: new reboot, led and soc drivers"
* tag 'drivers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (92 commits)
bus: arm-ccn: Fix spurious warning message
leds: add device tree bindings for register bit LEDs
soc: add driver for the ARM RealView
power: reset: driver for the Versatile syscon reboot
leds: add a driver for syscon-based LEDs
drivers/soc: ti: fix build break with modules
MAINTAINERS: Add Keystone Multicore Navigator drivers entry
soc: ti: add Keystone Navigator DMA support
Documentation: dt: soc: add Keystone Navigator DMA bindings
soc: ti: add Keystone Navigator QMSS driver
Documentation: dt: soc: add Keystone Navigator QMSS bindings
rtc: sunxi: Depend on platforms sun4i/sun7i that actually have the rtc
rtc: sun6i: Add sun6i RTC driver
irqchip: omap-intc: remove unnecessary comments
irqchip: omap-intc: correct maximum number or MIR registers
irqchip: omap-intc: enable TURBO idle mode
irqchip: omap-intc: enable IP protection
irqchip: omap-intc: remove unnecesary of_address_to_resource() call
irqchip: omap-intc: comment style cleanup
irqchip: omap-intc: minor improvement to omap_irq_pending()
...
sb_edac controls a large number of different PCI functions. Rather
than registering as a normal PCI driver for all of them, it
registers for just one so that it gets probed and, at probe time, it
looks for all the others.
Coincidentally, the device it registers for also contains the SMBUS
registers, so the PCI core will refuse to probe both sb_edac and a
future iMC SMBUS driver. The drivers don't actually conflict, so
just change sb_edac's device table to probe a different device.
An alternative fix would be to merge the two drivers, but sb_edac
will also refuse to load on non-ECC systems, whereas i2c_imc would
still be useful without ECC.
The only user-visible change should be that sb_edac appears to bind
a different device.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Rui Wang <ruiv.wang@gmail.com>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
The i2c_imc driver will use two of them, and moving only part of
the list seems messier.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Intel IA32 SDM Table 15-14 defines channel 0xf as 'not specified', but
EDAC doesn't know about this and returns and INTERNAL ERROR when the
channel is greater than NUM_CHANNELS:
kernel: [ 1538.886456] CPU 0: Machine Check Exception: 0 Bank 1: 940000000000009f
kernel: [ 1538.886669] TSC 2bc68b22e7e812 ADDR 46dae7000 MISC 0 PROCESSOR 0:306e4 TIME 1390414572 SOCKET 0 APIC 0
kernel: [ 1538.971948] EDAC MC1: INTERNAL ERROR: channel value is out of range (15 >= 4)
kernel: [ 1538.972203] EDAC MC1: 0 CE memory read error on unknown memory (slot:0 page:0x46dae7 offset:0x0 grain:0 syndrome:0x0 - area:DRAM err_code:0000:009f socket:1 channel_mask:1 rank:0)
This commit changes sb_edac to forward a channel of -1 to EDAC if the
channel is not specified. edac_mc_handle_error() sets the channel to -1
internally after the error message anyway, so this commit should have no
effect other than avoiding the INTERNAL ERROR message when the channel
is not specified.
Signed-off-by: Seth Jennings <sjenning@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Rationale behind this change:
- F2x1xx addresses were stopped from being mapped explicitly to DCT1
from F15h (OR) onwards. They use _dct[0:1] mechanism to access the
registers. So we should move away from using address ranges to select
DCT for these families.
- On newer processors, the address ranges used to indicate DCT1 (0x140,
0x1a0) have different meanings than what is assumed currently.
Changes introduced:
- amd64_read_dct_pci_cfg() now takes in dct value and uses it for
'selecting the dct'
- Update usage of the function. Keep in mind that different families
have specific handling requirements
- Remove [k8|f10]_read_dct_pci_cfg() as they don't do much different
from amd64_read_pci_cfg()
- Move the k8 specific check to amd64_read_pci_cfg
- Remove f15_read_dct_pci_cfg() and move logic to amd64_read_dct_pci_cfg()
- Remove now needless .read_dct_pci_cfg
Testing:
- Tested on Fam 10h; Fam15h Models: 00h, 30h; Fam16h using 'EDAC_DEBUG'
and mce_amd_inj
- driver obtains info from F2x registers and caches it in pvt
structures correctly
- ECC decoding works fine
Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
Link: http://lkml.kernel.org/r/1410799058-3149-1-git-send-email-aravind.gopalakrishnan@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Fix the following error
drivers/edac/ppc4xx_edac.c:977:45: error: request for member 'dimm' in something
not a structure or union
by changing member access to pointer dereference.
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Link: http://lkml.kernel.org/r/1408482646-22541-1-git-send-email-bobby.prani@gmail.com
CC: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
This patch adds support for the CycloneV and ArriaV SDRAM controllers.
Correction and reporting of SBEs, Panic on DBEs.
There was a discussion thread on whether this driver should be an mfd driver
or just make use of syscon, which is already a mfd. Ultimately, the
decision to use a simple syscon interface was reached.[1]
[1] https://lkml.org/lkml/2014/7/30/514
[dinguyen] Fixed Kconfig to have EDAC_ALTERA_MC as a tristate to prevent a
build failure for allmodconfig.
Signed-off-by: Thor Thayer <tthayer@opensource.altera.com>
Acked-by: Borislav Petkov <bp@suse.de>
[dinguyen] cleaned up commit message
Signed-off-by: Dinh Nguyen <dinguyen@opensource.altera.com>
Pull EDAC updates from Mauro Carvalho Chehab.
* 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac:
sb_edac: add support for Haswell based systems
sb_edac: Fix mix tab/spaces alignments
edac: add DDR4 and RDDR4
sb_edac: remove bogus assumption on mc ordering
sb_edac: make minimal use of channel_mask
sb_edac: fix socket detection on Ivy Bridge controllers
sb_edac: update Kconfig description
sb_edac: search devices using product id
sb_edac: make RIR limit retrieval per model
sb_edac: make node id retrieval per model
sb_edac: make memory type detection per memory controller
window:
Group changes to the device tree. In preparation for adding device tree
overlay support, OF_DYNAMIC is reworked so that a set of device tree
changes can be prepared and applied to the tree all at once. OF_RECONFIG
notifiers see the most significant change here so that users always get
a consistent view of the tree. Notifiers generation is moved from before
a change to after it, and notifiers for a group of changes are emitted
after the entire block of changes have been applied
Automatic console selection from DT. Console drivers can now use
of_console_check() to see if the device node is specified as a console
device. If so then it gets added as a preferred console. UART devices
get this support automatically when uart_add_one_port() is called.
DT unit tests no longer depend on pre-loaded data in the device tree.
Data is loaded dynamically at the start of unit tests, and then unloaded
again when the tests have completed.
Also contains a few bugfixes for reserved regions and early memory setup.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJT6M7XAAoJEMWQL496c2LNTTgP/2rXyrTTGZpK/qrLKHWKYHvr
XL7tcTkhA0OLU64E37fB+xtDXyBYnLsharuwUFSd1LlL1Wnc1cZN5ORRlMJbmjUR
Wvwl0A8/mkhGl4tzzKgJ4z4rMJXvlZfJRpnVoRB5FOn90LI7k/jsf5rIwF/6S90B
6D6II0r4RG9ku1m7g70cToxcIFCzp0V+eu2tym9GnhsyGKlunPT9iNiTpwfVhPAj
QUvMPKIQXReOv6xDU5q6E07839IMf7SdAvciBTHGnCDD7sGziHvnBIShj/2vTDAF
27sGRKrWUnnuxRUMOoIudiYyeHXIdt1WXp6FsS/ztVI37Ijh9YPShJwWGhQDppnp
4tcoSdefqw9IRUajAVWsB0RUW/tCL4a7tggWofylOA6itDi+HZnw6YafE1G1YzxF
q8OFo9uqLcmFQfHDJpk+sdtXoMZzOgrxlEscsGsQ8kd2Uoe8+chgR9EY1sqPkWVF
Zw0FJCTB6spBlsnXeboBGrnvpbPkacwhvesIFO0IANy4j4R55xlEeTcs1fe3ubUf
UhNyyFdnCAUA7e5CAabcAQYsdbEKG6v0Il3H6xts36c8lXCSFXVgNcw5mdCpFCcQ
DZ3/1FGSVzkYNXX8hlzIn1W0dqvwn4x0ZbnpXUJrGUBpSmjt9qkXx0XbdeFwartU
7X4tNGS1jfNYhOdP87jF
=8IFz
-----END PGP SIGNATURE-----
Merge tag 'devicetree-for-linus' of git://git.secretlab.ca/git/linux
Pull device tree updates from Grant Likely:
"The branch contains the following device tree changes the v3.17 merge
window:
Group changes to the device tree. In preparation for adding device
tree overlay support, OF_DYNAMIC is reworked so that a set of device
tree changes can be prepared and applied to the tree all at once.
OF_RECONFIG notifiers see the most significant change here so that
users always get a consistent view of the tree. Notifiers generation
is moved from before a change to after it, and notifiers for a group
of changes are emitted after the entire block of changes have been
applied
Automatic console selection from DT. Console drivers can now use
of_console_check() to see if the device node is specified as a console
device. If so then it gets added as a preferred console. UART
devices get this support automatically when uart_add_one_port() is
called.
DT unit tests no longer depend on pre-loaded data in the device tree.
Data is loaded dynamically at the start of unit tests, and then
unloaded again when the tests have completed.
Also contains a few bugfixes for reserved regions and early memory
setup"
* tag 'devicetree-for-linus' of git://git.secretlab.ca/git/linux: (21 commits)
of: Fixing OF Selftest build error
drivers: of: add automated assignment of reserved regions to client devices
of: Use proper types for checking memory overflow
of: typo fix in __of_prop_dup()
Adding selftest testdata dynamically into live tree
of: Add todo tasklist for Devicetree
of: Transactional DT support.
of: Reorder device tree changes and notifiers
of: Move dynamic node fixups out of powerpc and into common code
of: Make sure attached nodes don't carry along extra children
of: Make devicetree sysfs update functions consistent.
of: Create unlocked versions of node and property add/remove functions
OF: Utility helper functions for dynamic nodes
of: Move CONFIG_OF_DYNAMIC code into a separate file
of: rename of_aliases_mutex to just of_mutex
of/platform: Fix of_platform_device_destroy iteration of devices
of: Migrate of_find_node_by_name() users to for_each_node_by_name()
tty: Update hypervisor tty drivers to use core stdout parsing code.
arm/versatile: Add the uart as the stdout device.
of: Enable console on serial ports specified by /chosen/stdout-path
...
Pull RAS updates from Ingo Molnar:
"The main changes in this cycle are:
- RAS tracing/events infrastructure, by Gong Chen.
- Various generalizations of the APEI code to make it available to
non-x86 architectures, by Tomasz Nowicki"
* 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/ras: Fix build warnings in <linux/aer.h>
acpi, apei, ghes: Factor out ioremap virtual memory for IRQ and NMI context.
acpi, apei, ghes: Make NMI error notification to be GHES architecture extension.
apei, mce: Factor out APEI architecture specific MCE calls.
RAS, extlog: Adjust init flow
trace, eMCA: Add a knob to adjust where to save event log
trace, RAS: Add eMCA trace event interface
RAS, debugfs: Add debugfs interface for RAS subsystem
CPER: Adjust code flow of some functions
x86, MCE: Robustify mcheck_init_device
trace, AER: Move trace into unified interface
trace, RAS: Add basic RAS trace event
x86, MCE: Kill CPU_POST_DEAD
Haswell memory controllers are very similar to Ivy Bridge and Sandy Bridge
ones. This patch adds support to Haswell based systems.
[m.chehab@samsung.com: Fix CodingStyle issues]
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
Haswell memory controller can make use of DDR4 and Registered DDR4
Cc: tony.luck@intel.com
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
When a MC is handled, the correct sbridge_dev is searched based on the node,
checking again later with the assumption the first memory controller found is
the first socket's memory controller is a bogus assumption. Get rid of it.
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
channel_mask will be used in the future to determine which group of memory
modules is causing the errors since when mirroring, lockstep and close page
are enabled you can't. While that doesn't happen, use the channel_mask to
determine the channel instead of relying on the MC event/exception.
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
This patch fixes the obvious bug while handling the socket/HA bitmask used in
Ivy Bridge memory controllers.
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
This patch changes the way devices are searched by using product id instead of
device/function numbers. Tested in a Sandy Bridge and a Ivy Bridge machine to
make sure everything works properly.
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
Haswell has a different way to retrieve RIR limits, make this procedure per
model.
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
Haswell has a different way to retrieve the node id, make so this procedure
can be reimplemented.
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
Haswell has different register, offset to determine memory type and supports
DDR4 in some models. This patch makes it easier to have a different method
depending on the memory controller type.
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
There are a bunch of users open coding the for_each_node_by_name() by
calling of_find_node_by_name() directly instead of using the macro. This
is getting in the way of some cleanups, and the possibility of removing
of_find_node_by_name() entirely. Clean it up so that all the users are
consistent.
Signed-off-by: Grant Likely <grant.likely@linaro.org>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Takashi Iwai <tiwai@suse.de>
To avoid confuision and conflict of usage for RAS related trace event,
add an unified RAS trace event stub.
Start a RAS subsystem menu which will be fleshed out in time, when more
features get added to it.
Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
Link: http://lkml.kernel.org/r/1402475691-30045-2-git-send-email-gong.chen@linux.intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Assign PCI resources before pci_bus_add_device(). The resources must be
assigned before a driver can claim the device.
[bhelgaas: changelog]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
pci_bus_add_device() always returns 0, so there's no point in returning
anything at all. Make it a void function and remove the tests of the
return value from the callers.
[bhelgaas: changelog, remove unused "err" from i82875p_setup_overfl_dev()]
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
295d8cda26 ("EDAC, MCE, AMD: Drop local coreid reporting") removed the
code snippet which used that mask but forgot to drop the mask itself. Do
that now.
Signed-off-by: Borislav Petkov <bp@suse.de>
Pull media updates from Mauro Carvalho Chehab:
"The main set of series of patches for media subsystem, including:
- document RC sysfs class
- added an API to setup scancode to allow waking up systems using the
Remote Controller
- add API for SDR devices. Drivers are still on staging
- some API improvements for getting EDID data from media
inputs/outputs
- new DVB frontend driver for drx-j (ATSC)
- one driver (it913x/it9137) got removed, in favor of an improvement
on another driver (af9035)
- added a skeleton V4L2 PCI driver at documentation
- added a dual flash driver (lm3646)
- added a new IR driver (img-ir)
- added an IR scancode decoder for the Sharp protocol
- some improvements at the usbtv driver, to allow its core to be
reused.
- added a new SDR driver (rtl2832u_sdr)
- added a new tuner driver (msi001)
- several improvements at em28xx driver to fix PM support, device
removal and to split the V4L2 specific bits into a separate
sub-driver
- one driver got converted to videobuf2 (s2255drv)
- the e4000 tuner driver now follows an improved binding model
- some fixes at V4L2 compat32 code
- several fixes and enhancements at videobuf2 code
- some cleanups at V4L2 API documentation
- usual driver enhancements, new board additions and misc fixups"
[ NOTE! This merge effective drops commit 4329b93b28 ("of: Reduce
indentation in of_graph_get_next_endpoint").
The of_graph_get_next_endpoint() function was moved and renamed by
commit fd9fdb78a9 ("[media] of: move graph helpers from
drivers/media/v4l2-core to drivers/of"). It was originally called
v4l2_of_get_next_endpoint() and lived in the file
drivers/media/v4l2-core/v4l2-of.c.
In that original location, it was then fixed to support empty port
nodes by commit b9db140c1e ("[media] v4l: of: Support empty port
nodes"), and that commit clashes badly with the dropped "Reduce
intendation" commit. I had to choose one or the other, and decided
that the "Support empty port nodes" commit was more important ]
* 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (426 commits)
[media] em28xx-dvb: fix PCTV 461e tuner I2C binding
Revert "[media] em28xx-dvb: fix PCTV 461e tuner I2C binding"
[media] em28xx: fix PCTV 290e LNA oops
[media] em28xx-dvb: fix PCTV 461e tuner I2C binding
[media] m88ds3103: fix bug on .set_tone()
[media] saa7134: fix WARN_ON during resume
[media] v4l2-dv-timings: add module name, description, license
[media] videodev2.h: add parenthesis around macro arguments
[media] saa6752hs: depends on CRC32
[media] si4713: fix Kconfig dependencies
[media] Sensoray 2255 uses videobuf2
[media] adv7180: free an interrupt on failure paths in init_device()
[media] e4000: make VIDEO_V4L2 dependency optional
[media] af9033: Don't export functions for the hardware filter
[media] af9035: use af9033 PID filters
[media] af9033: implement PID filter
[media] rtl2832_sdr: do not use dynamic stack allocation
[media] e4000: fix 32-bit build error
[media] em28xx-audio: make sure audio is unmuted on open()
[media] DocBook media: v4l2_format_sdr was renamed to v4l2_sdr_format
...
Pull sb_edac patches from Mauro Carvalho Chehab:
"A couple sb_edac driver improvements, cleaning a little bit the amount
of data sent to dmesg, and fixing one error message"
* 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac:
sb_edac: mark MCE messages as KERN_DEBUG
sb_edac: use "event" instead of "exception" when MC wasnt signaled
Pull MIPS updates from Ralf Baechle:
- Support for Imgtec's Aptiv family of MIPS cores.
- Improved detection of BCM47xx configurations.
- Fix hiberation for certain configurations.
- Add support for the Chinese Loongson 3 CPU, a MIPS64 R2 core and
systems.
- Detection and support for the MIPS P5600 core.
- A few more random fixes that didn't make 3.14.
- Support for the EVA Extended Virtual Addressing
- Switch Alchemy to the platform PATA driver
- Complete unification of Alchemy support
- Allow availability of I/O cache coherency to be runtime detected
- Improvments to multiprocessing support for Imgtec platforms
- A few microoptimizations
- Cleanups of FPU support
- Paul Gortmaker's fixes for the init stuff
- Support for seccomp
* 'mips-for-linux-next' of git://git.linux-mips.org/pub/scm/ralf/upstream-sfr: (165 commits)
MIPS: CPC: Use __raw_ memory access functions
MIPS: CM: use __raw_ memory access functions
MIPS: Fix warning when including smp-ops.h with CONFIG_SMP=n
MIPS: Malta: GIC IPIs may be used without MT
MIPS: smp-mt: Use common GIC IPI implementation
MIPS: smp-cmp: Remove incorrect core number probe
MIPS: Fix gigaton of warning building with microMIPS.
MIPS: Fix core number detection for MT cores
MIPS: MT: core_nvpes function to retrieve VPE count
MIPS: Provide empty mips_mt_set_cpuoptions when CONFIG_MIPS_MT=n
MIPS: Lasat: Replace del_timer by del_timer_sync
MIPS: Malta: Setup PM I/O region on boot
MIPS: Loongson: Add a Loongson-3 default config file
MIPS: Loongson 3: Add CPU hotplug support
MIPS: Loongson 3: Add Loongson-3 SMP support
MIPS: Loongson: Add Loongson-3 Kconfig options
MIPS: Loongson: Add swiotlb to support All-Memory DMA
MIPS: Loongson 3: Add serial port support
MIPS: Loongson 3: Add IRQ init and dispatch support
MIPS: Loongson 3: Add HT-linked PCI support
...
* Support for new AMD models, along with more graceful fallback for
unsupported hw.
* Bunch of fixes from SUSE accumulated from bug reports
* Misc other fixes and cleanups
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJTMUueAAoJEBLB8Bhh3lVKbGIP/iXBanEUUXbg027hvNch8tbJ
Yzsk3VkhmZ6hWXU4XNkvFFbSfCfHuHYQSEAxro6yc2t+6aVhMBrKIcCBNejNYt37
louswtfhC9A4geX206bzfOFNxOVsWzX2vxhhQFbqybZe6BnmcnCSKn//v+kF0/5E
6CrwQyKQjl2vGewPU4pBlrbnRNduPRs6bg6lWgzNxbKrhD8Ce/7NQRVd5v698pNm
NTqfS37J90o1DP3I6yY30trtprPjjfnFANxb2DPaf09Z/X+aI1ONNe4s/URBd+qg
yYIWAphbKopXPZOS0wDU0ikb7dGWH1lcQdVE69GxnCEvz9hz2ISg3plguwbYLPzK
CmI4RGU0D8n5Irxu8bUID4SLT6fSAqU+cgJCdu6dqUncB7khcwG8dWJxMYNGjlBm
TD9aCRRs+/b07KUrBeL1tufgd9gGPKWDeecTw55VMofIFkYvwIAz8TFdj3ql4Uz/
AsYJkAqvoO4dTz1ai+PJ06EWSijUT+KLkyZbybS5+PiP4uRAWY8pkCKbvTJQb1xN
0SAwm0tqT7mQKc/oIsvpZ/TLjwYxRhAeVMvrkElkNRPSsrt70rDVQiPgzQh9GTqV
d5GmYI65Tk2Jrl7dVN+Pk+NDdS/y52SPk1AMDsfoBP6dndftdF6/gtqnijmbLgMY
/yH+hIAKJXJUUKO4pELe
=vafk
-----END PGP SIGNATURE-----
Merge tag 'edac_for_3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp
Pull EDAC updates from Borislav Petkov:
"A bunch of EDAC updates all over the place:
- Support for new AMD models, along with more graceful fallback for
unsupported hw.
- Bunch of fixes from SUSE accumulated from bug reports
- Misc other fixes and cleanups"
* tag 'edac_for_3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
amd64_edac: Add support for newer F16h models
i7core_edac: Drop unused variable
i82875p_edac: Drop redundant call to pci_get_device()
amd8111_edac: Fix leaks in probe error paths
e752x_edac: Drop pvt->bridge_ck
MCE, AMD: Fix decoding module loading on unsupported hw
i5100_edac: Remove an unneeded condition in i5100_init_csrows()
sb_edac: Degrade log level for device registration
amd64_edac: Fix logic to determine channel for F15 M30h processors
edac/85xx: Remove deprecated IRQF_DISABLED
i3200_edac: Add a missing pci_disable_device() on the exit path
i5400_edac: Disable device when unloading module
e752x_edac: Simplify call to pci_get_device()
Since the driver is decoding the MCE, it's useless to have these
messages printed unless you're debugging a problem in the driver.
Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
Corrected Errors are MC events, not exceptions and reporting as the
later might confuse users.
Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
Extend ECC decoding support for F16h M30h. Tested on F16h M30h with ECC
turned on using mce_amd_inj module and the patch works fine.
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Link: http://lkml.kernel.org/r/1392913726-16961-1-git-send-email-Aravind.Gopalakrishnan@amd.com
Tested-by: Arindam Nath <Arindam.Nath@amd.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Fix the following warning:
drivers/edac/i7core_edac.c: In function "core_mce_output_error":
drivers/edac/i7core_edac.c:1711:8: warning: variable "type" set but not used [-Wunused-but-set-variable]
char *type, *optype, *err;
^
According to Mauro, type can just be dropped, as tp_event now maps if
the error is corrected, uncorrected non-fatal or uncorrected fatal
one.
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Link: http://lkml.kernel.org/r/20140224171358.692d7e5a@endymion.delvare
Acked-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
The first call to pci_get_device() in i82875p_probe1() is useless. The
result is immediately reset at the beginning of
i82875p_setup_overfl_dev(), which then issues the same
pci_get_device() call. Dropping the redundant call avoids a device
reference leak.
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Link: http://lkml.kernel.org/r/20140224160316.60e55fb6@endymion.delvare
Cc: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
pvt->bridge_ck always points to the same device as pvt->dev_d0f1, so
get rid of the former and only use the latter.
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Tested-by: Aristeu Rozanski <aris@redhat.com>
Link: http://lkml.kernel.org/r/20140224151544.16ba28a0@endymion.delvare
Cc: Mark Gross <mark.gross@intel.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
pci_get_device() decrements the reference count of "from" (last
argument) so when we break off the loop successfully we have only one
device reference - and we don't know which device we have. If we want
a reference to each device, we must take them explicitly and let
the pci_get_device() walk complete to avoid duplicate references.
This is serious, as over-putting device references will cause
the device to eventually disappear. Without this fix, the kernel
crashes after a few insmod/rmmod cycles.
Tested on an Intel S7000FC4UR system with a 7300 chipset.
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Link: http://lkml.kernel.org/r/20140224111656.09bbb7ed@endymion.delvare
Cc: Mauro Carvalho Chehab <m.chehab@samsung.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: stable@vger.kernel.org
Signed-off-by: Borislav Petkov <bp@suse.de>
The reference count changes done by pci_get_device can be a little
misleading when the usage diverges from the most common scheme. The
reference count of the device passed as the last parameter is always
decreased, even if the function returns no new device. So if we are
going to try alternative device IDs, we must manually increment the
device reference count before each retry. If we don't, we end up
decreasing the reference count, and after a few modprobe/rmmod cycles
the PCI devices will vanish.
In other words and as Alan put it: without this fix the EDAC code
corrupts the PCI device list.
This fixes kernel bug #50491:
https://bugzilla.kernel.org/show_bug.cgi?id=50491
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Link: http://lkml.kernel.org/r/20140224093927.7659dd9d@endymion.delvare
Reviewed-by: Alan Cox <alan@linux.intel.com>
Cc: Mauro Carvalho Chehab <m.chehab@samsung.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: stable@vger.kernel.org
Signed-off-by: Borislav Petkov <bp@suse.de>
We want to still be able to issue some error information on systems for
which there is no decoding support (think older distro kernels here,
for example). Therefore, we allow module registration but skip the
per-family bank-specific decoders and issue the general information
only, i.e.:
[ 46.822828] [Hardware Error]: Error Status: Uncorrected, software containable error.
[ 46.822846] [Hardware Error]: CPU:0 (15:30:0) MC0_STATUS[-|UE|-|-|-|-|-]: 0xa000000000010f0f
[ 46.822858] [Hardware Error]: cache level: L3/GEN, mem/io: GEN, mem-tx: GEN, part-proc: GEN (timed out)
with the hope that it still contains helpful useful bits.
Suggested-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
Tested-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
Link: http://lkml.kernel.org/r/1392659391-2411-1-git-send-email-Aravind.Gopalakrishnan@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
We checked that "npages" was not zero a couple lines earlier so we can
remove this and pull the code in one indent level.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: http://lkml.kernel.org/r/20140213111738.GB15549@elgon.mountain
Signed-off-by: Borislav Petkov <bp@suse.de>
On a system with four Intel processors, it generates too many messages
"EDAC sbridge: Seeking for: dev 1d.3 PCI ID xxxx". And it doesn't give
many useful information for normal users, so change log level from INFO
to DEBUG.
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Link: http://lkml.kernel.org/r/1392613824-11230-1-git-send-email-jiang.liu@linux.intel.com
Acked-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
We're using edac_mc_workq_setup() both on the init path, when
we load an edac driver and when we change the polling period
(edac_mc_reset_delay_period) through /sys/.../edac_mc_poll_msec.
On that second path we don't need to init the workqueue which has been
initialized already.
Thanks to Tejun for workqueue insights.
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1391457913-881-1-git-send-email-prarit@redhat.com
Cc: <stable@vger.kernel.org>
Sanitize code even more to accept unsigned longs only and to not allow
polling intervals below 1 second as this is unnecessary and doesn't make
much sense anyway for polling errors.
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1391457913-881-1-git-send-email-prarit@redhat.com
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: <stable@vger.kernel.org>
Currently, i3200_edac.c forgets to call pci_disable_device() during a
process of disabling device. This patch adds that call.
The problem was detected by Huqiu Liu.
Reported-by: Huqiu Liu <liuhq11@mails.tsinghua.edu.cn>
Signed-off-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
Link: http://lkml.kernel.org/r/1389587178-10167-1-git-send-email-mitake.hitoshi@gmail.com
Signed-off-by: Borislav Petkov <bp@suse.de>
This was found by Huqiu Liu using a static analysis.
Reported-by: Huqiu Liu <liuhq11@mails.tsinghua.edu.cn>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Link: http://lkml.kernel.org/r/20140116162021.GY15716@redhat.com
Signed-off-by: Borislav Petkov <bp@suse.de>
There are several left overs with my old email address.
Remove their occurrences and add myself at CREDITS, to
allow people to be able to reach me on my new addresses.
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
The last parameter, pvt->bridge_ck, is always NULL as far as I can
see, so just use that.
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Link: http://lkml.kernel.org/r/20140130091101.22f2b550@endymion.delvare
Cc: Mark Gross <mark.gross@intel.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Pull x86 RAS changes from Ingo Molnar:
- SCI reporting for other error types not only correctable ones
- GHES cleanups
- Add the functionality to override error reporting agents as some
machines are sporting a new extended error logging capability which,
if done properly in the BIOS, makes a corresponding EDAC module
redundant
- PCIe AER tracepoint severity levels fix
- Error path correction for the mce device init
- MCE timer fix
- Add more flexibility to the error injection (EINJ) debugfs interface
* 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, mce: Fix mce_start_timer semantics
ACPI, APEI, GHES: Cleanup ghes memory error handling
ACPI, APEI: Cleanup alignment-aware accesses
ACPI, APEI, GHES: Do not report only correctable errors with SCI
ACPI, APEI, EINJ: Changes to the ACPI/APEI/EINJ debugfs interface
ACPI, eMCA: Combine eMCA/EDAC event reporting priority
EDAC, sb_edac: Modify H/W event reporting policy
EDAC: Add an edac_report parameter to EDAC
PCI, AER: Fix severity usage in aer trace event
x86, mce: Call put_device on device_register failure
* misc small enhancements/fixes all over the place.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.15 (GNU/Linux)
iQIcBAABAgAGBQJS3PEWAAoJEBLB8Bhh3lVKziAP/3nNouipUvIcRpL3o8U3jX19
Z38tD0KPjIIX46rgexjoXbI/bKEy6whE1eDc0CG9GPpOaN9p9WWe81hfBIQc9DYK
epFNdH3Wk83Ml+3+01CBKfLfiIV25c7rc9IOM7g+nMOg65Z++iC1Cea7i7Ul7T5a
C+CIDdsmco9VrDBNjHHsrdYVZ1Ufx4NxPb5xhemQAiGCXseBtoD1IDS/gOM1F+aP
UBDSJobjDrFq5JasEkp83U62VBTDndEcxb6yHROVahIPRn4miroc0NMHJ3XEd1/H
K9u24iXmHgjjG3Euzd3mq8Axsscz8dUksKQT+lL2YIXz+AM8PcrCDrXK6TiVF9/A
k5Vj91svnWUDrEU0OF1HysOvn0Z/9Y0+Fwyt6//HMyl/7ehCZBINwxsZ4sc4whtd
nVHV2Ddcc/LjZzQEAvHuyWdf2pfxi3M1VRhNvuGyWtdHBPf5P8tqVUteoztELPM+
bKr7wkpvKzKajX6xldNNrVP9xNps8hrYdETjhP5FpUYpF4HKntAe+VJvIue+nOja
K2AxB25qGHyK5rgh69DkSuatZmImflG4brKWbOE+4La0E8OajrM790YdqP2hRnDQ
3zpuIRr/htIAy7C2vGbmWITRhxXnGx5FhDI7AQRr1VjvgJprWM5zgIacDuH+5C9K
ftouZ1hFTDI1WEEMDxRF
=EJ+3
-----END PGP SIGNATURE-----
Merge tag 'edac_for_3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp
Pull EDAC updates from Borislav Petkov:
- mpc85xx PCIe error interrupt support
- misc small enhancements/fixes all over the place.
* tag 'edac_for_3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
EDAC: Don't try to cancel workqueue when it's never setup
e752x_edac: Fix pci_dev usage count
sb_edac: Mark get_mci_for_node_id as static
EDAC: Mark edac_create_debug_nodes as static
amd64_edac: Remove "amd64" prefix from static functions
amd64_edac: Simplify code around decode_bus_error
amd64_edac: Mark amd64_decode_bus_error as static
EDAC: Remove DEFINE_PCI_DEVICE_TABLE macro
amd64_edac: Fix condition to verify max channels allowed for F15 M30h
edac/85xx: Add PCIe error interrupt edac support
We only setup a workqueue for edac devices that use the polling
method. We still try to cancel the workqueue if an edac_device
uses the irq method though. This causes a warning from debug
objects when we remove an edac device:
WARNING: CPU: 0 PID: 56 at lib/debugobjects.c:260 debug_print_object+0x98/0xc0()
ODEBUG: assert_init not available (active state 0) object type: timer_list hint: stub_timer+0x0/0x28
Modules linked in: krait_edac(-)
CPU: 0 PID: 56 Comm: rmmod Not tainted 3.12.0-rc2-00035-g15292a0 #69
(unwind_backtrace+0x0/0x144)
(show_stack+0x20/0x24)
(dump_stack+0x74/0xb4)
(warn_slowpath_common+0x78/0x9c)
(warn_slowpath_fmt+0x40/0x48)
(debug_print_object+0x98/0xc0)
(debug_object_assert_init+0xdc/0xec)
(del_timer+0x24/0x7c)
(try_to_grab_pending+0xc0/0x1b0)
(cancel_delayed_work+0x2c/0xa0)
(edac_device_workq_teardown+0x1c/0x38)
(edac_device_del_device+0xb8/0xe4)
(krait_edac_remove+0x50/0x70 [krait_edac])
(platform_drv_remove+0x24/0x28)
(__device_release_driver+0x68/0xc0)
(driver_detach+0xc4/0xc8)
(bus_remove_driver+0xac/0x114)
(driver_unregister+0x38/0x58)
(platform_driver_unregister+0x1c/0x20)
(krait_edac_driver_exit+0x14/0x1c [krait_edac])
(SyS_delete_module+0x178/0x2b4)
Fix it by skipping the workqueue teardown for such devices.
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Link: http://lkml.kernel.org/r/1388434457-4194-2-git-send-email-sboyd@codeaurora.org
Signed-off-by: Borislav Petkov <bp@suse.de>
In case the device 0, function 1 is not found using pci_get_device(),
pci_scan_single_device() will be used but, differently than
pci_get_device(), it allocates a pci_dev but doesn't does bump the usage
count on the pci_dev and after few module removals and loads the pci_dev
will be freed.
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Reviewed-by: mark gross <mark.gross@intel.com>
Link: http://lkml.kernel.org/r/20131205153755.GL4545@redhat.com
Signed-off-by: Borislav Petkov <bp@suse.de>
machines are sporting a new extended error logging capability which, if
done properly in the BIOS, makes a corresponding EDAC module redundant,
from Gong Chen.
* PCIe AER tracepoint severity levels fix, from Rui Wang.
* Error path correction for the mce device init, from Levente Kurusa.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.15 (GNU/Linux)
iQIcBAABAgAGBQJSrCysAAoJEBLB8Bhh3lVK1ikP/0hKY1Kk4tjbSta9A9Z8LdQG
9F5JzEny47DpTrLaKij7MqAlbYFO8sSm7Zw0CEztTF7Ou/H37GAuxhMlB8ECMGOm
Dzu53X1rySTna9mB+1gyXXd+pJypp/oe18/o16rw1QKjI9o2Kfgwfj7lKvytR549
kDM1dhxEImQIS5cpJPkOPbcpVlSqYN7BnK9/Qx3h0W70httT/8qrr9xVtVL7wjOT
auTA0R5/TkV06FtxyfHUNULEWTSP+2yNP/iJbusR6f4Jk1j0XmyCFr0BYOkPA1UO
9+wC9+2R+r7rJw8MBfMzNmPrRzDJHdaiHPwYqse05yewRHfRHe5cgZWJYbL8Qv0u
2WOX+fY12EfDYlihcOYtlupRzhGfGKRsaRpSuG1zX87ctDxAfNZencv4hnaJvfqG
Xk6ggIX6tHKEivO2gmaPsmhoKveh0zcozUs+wgh/tvV5QB6ioFCjzHfSEsix5+BH
ryyg1ri7IZnh92g3UuSUpE0OCbAquMfI7XIJo+kFs0u79dZTL/kD3wVu6oYazwdy
yTrvIq7Bq5cMWnnni5w7dIU09ef2uvDgyHyAS6+RiqaQxhYFsW8/yx2zJrIloWRs
7txz6t3CVmWFiejIg2gw6KyjaG6pXRBkDkI1XU6T+bKLb31ojx2+i9UKIIUeRZTB
iisWAOI6ZSdt4eAkgeaI
=r//I
-----END PGP SIGNATURE-----
Merge tag 'ras_for_3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp into x86/ras
Pull RAS updates from Borislav Petkov:
* Add the functionality to override error reporting agents as some
machines are sporting a new extended error logging capability which, if
done properly in the BIOS, makes a corresponding EDAC module redundant,
from Gong Chen.
* PCIe AER tracepoint severity levels fix, from Rui Wang.
* Error path correction for the mce device init, from Levente Kurusa.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch marks the function get_mci_for_node_id() as static because it
is not used outside of sb_edac.c.
Thus, it also eliminates the following warning:
drivers/edac/sb_edac.c:918:22: warning: no previous prototype for ‘get_mci_for_node_id’ [-Wmissing-prototypes]
Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Link: http://lkml.kernel.org/r/0441f508186fc4eeabc8e9c3e4dde013d99405d4.1387029387.git.rashika.kheria@gmail.com
Signed-off-by: Borislav Petkov <bp@suse.de>
This patch marks the function edac_create_debug_nodes() as static
because it is not used outside of edac_mc_sysfs.c.
Thus, it also eliminates the following warning:
drivers/edac/edac_mc_sysfs.c:917:5: warning: no previous prototype for ‘edac_create_debug_nodes’ [-Wmissing-prototypes]
Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Link: http://lkml.kernel.org/r/a1c863b08c0d6f67d03280cf908c771bf26a3239.1387029387.git.rashika.kheria@gmail.com
Signed-off-by: Borislav Petkov <bp@suse.de>
This patch marks the function amd64_decode_bus_error() as static because
it is not used outside of amd64_edac.c.
It also eliminates the following warning:
drivers/edac/amd64_edac.c:2038:6: warning: no previous prototype for ‘amd64_decode_bus_error’ [-Wmissing-prototypes]
Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Link: http://lkml.kernel.org/r/7cddbd4c69ed493f183383e98853181aaf75b26b.1387029387.git.rashika.kheria@gmail.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Newer Intel platforms support more than one method to report H/W event.
On this kind of platform, H/W event report can adopt new method and
traditional EDAC method should be disabled. Moreover, if EDAC event
report method is set to *force*, it means event must be reported via
EDAC interface. IOW, it overrides the default event report policy.
Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/1386310630-12529-3-git-send-email-gong.chen@linux.intel.com
[ Boris: massage commit and error messages ]
Signed-off-by: Borislav Petkov <bp@suse.de>
This new parameter is used to control how to report HW error reporting,
especially for newer Intel platform, like Ivybridge-EX, which contains
an enhanced error decoding functionality in the firmware, i.e. eMCA.
Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/1386310630-12529-2-git-send-email-gong.chen@linux.intel.com
[ Boris: massage commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Currently, there is no other bus that has something like this macro for
their device ids. Thus, DEFINE_PCI_DEVICE_TABLE macro should be removed.
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Link: http://lkml.kernel.org/r/001c01ceefb3$5724d860$056e8920$%han@samsung.com
[ Boris: swap commit message with better one. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
The value returned from 'f15_m30h_determine_channel' will
always be 0x3 max. The condition
(channel > 4 || channel < 0)
works as hardware never returns a value of 4, but
it leads to static checker analysis errors like
http://marc.info/?l=linux-edac&m=138607615131951&w=2.
Fix that.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Link: http://lkml.kernel.org/r/20131203130857.GA32170@elgon.mountain
[ Boris: massage commit message a bit. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Fix this:
In file included from drivers/edac/sb_edac.c:27:0:
drivers/edac/sb_edac.c: In function ‘sbridge_mce_output_error’:
drivers/edac/edac_core.h:50:8: warning: ‘limit’ may be used uninitialized in this function [-Wmaybe-uninitialized]
printk(level "EDAC " prefix ": " fmt, ##arg)
^
drivers/edac/sb_edac.c:948:25: note: ‘limit’ was declared here
u64 ch_addr, offset, limit, prv = 0;
Limit can be initialized to 0. The only way limit wouldn't be
initialized is if there are no DIMMs present (which would be a bug of
course) and it'd fail on the next test.
Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
Link: http://lkml.kernel.org/r/20131121122021.GD26009@pd.tnic
Signed-off-by: Borislav Petkov <bp@suse.de>
Add pcie error interrupt edac support for mpc85xx, p3041, p4080, and
p5020. The mpc85xx uses the legacy interrupt report mechanism - the
error interrupts are reported directly to mpic. While the p3041/
p4080/p5020 attaches the most of error interrupts to interrupt zero. And
report error interrupts to mpic via interrupt 0.
This patch can handle both of them.
Signed-off-by: Chunhe Lan <Chunhe.Lan@freescale.com>
Link: http://lkml.kernel.org/r/1384712714-8826-3-git-send-email-morbidrsa@gmail.com
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: Dave Jiang <dave.jiang@gmail.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@men.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Pull EDAC driver updates from Mauro Carvalho Chehab:
- sb_edac: add support for Ivy Bridge support
- cell_edac: add a missing of_node_put() call
* 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac:
cell_edac: fix missing of_node_put
sb_edac: add support for Ivy Bridge
sb_edac: avoid decoding the same error multiple times
sb_edac: rename mci_bind_devs()
sb_edac: enable multiple PCI id tables to be used
sb_edac: rework sad_pkg
sb_edac: allow different interleave lists
sb_edac: allow different dram_rule arrays
sb_edac: isolate TOHM retrieval
sb_edac: rename pci_br
sb_edac: isolate TOLM retrieval
sb_edac: make RANK_CFG_A value part of sbridge_info
Highlights:
* Support for Calxeda ECX-2000 memory controller, from Robert Richter
* Misc Calxeda Highbank drivers and EDAC core cleanups, from Rob Herring
and Robert Richter
* New maintainer for Freescale's MPC85xx EDAC driver: Johannes Thumshirn
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
iQIcBAABAgAGBQJSiRmeAAoJEBLB8Bhh3lVKYswQAKPCYSsIQg3L3K3320uvWV3x
NEAxAAdYyN2ds7ksLv//54FBLgH9UeiC56glTMmnK9QvfGCcgHGo1WJCa84LwqIf
M7H6mKSgTEBZXr7HpWgtarMYjpNve4Nh8SwHv7tlWRJNd3ufcBbOfCY6rEreULJd
sBTRMuEPc1Ki7NxZr2m/xsPyzWXS1N1nSd2aewiszyY3Rwp1vIAPv/Chr7UwF/Fm
GGeQonc801hVRIQONwxsXzS2qwj/wgx8OPab05psfMuv6CWLxQQJAzGWbe+gv3V4
mYx64+U4nOkQ/knRAf9s0fLwJX6DWSTtQer7m5YSUey0dYDfgV+DemLFvS5We7XB
os9PBGL+0bGUrJ0cnLE6O+6S1qniWaKZrhSZndcYiVoQeDZmaMuartFlIaeRvY21
WJML2oqqUop2ZyaIKInJEyeD74FIf7BsG3V+RJwsCZx5+38Pm0EnBZqGJ9bnJl7x
OxXlHjwjZRhlVFIdcN5WeaKoKmXpdcnzLcL1XE2wMgs9ZFaleeyTDXuQm+XxkKGd
ExD/1TbAoBqFF7FwIAQwqPf1f2HPDBKlSg38X6l6NV5uLK/u5rdffKTMQPWi/6p2
RDO+Ddbypzm72850hcYc9mY+B8Qe3T3F/iKlbELiK1S5+IQzm/hmfTDcyKStwKZn
cCTvsIo9QHflhshXR1a8
=pU/h
-----END PGP SIGNATURE-----
Merge tag 'edac_for_3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp
Pull EDAC updates from Borislav Petkov:
"Following up on last week's discussion, here's my part of the EDAC
pile, highlights in the signed tag.
The last two patches have a date from just now because I've just
applied them to the tree after Johannes sent them to me earlier. I
decided to forward them now because they're trivial.
There's a third one for MPC85xx which adds PCIe error interrupt
support but since it is not so trivial and hasn't seen any linux-next
time, I'm deferring it to 3.14
EDAC update highlights:
- Support for Calxeda ECX-2000 memory controller, from Robert Richter
- Misc Calxeda Highbank drivers and EDAC core cleanups, from Rob
Herring and Robert Richter
- New maintainer for Freescale's MPC85xx EDAC driver: Johannes
Thumshirn"
* tag 'edac_for_3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
edac/85xx: Remove mpc85xx_pci_err_remove
EDAC: Add edac-mpc85xx driver to MAINTAINERS
edac, highbank: Moving error injection to sysfs for edac
edac, highbank: Add MAINTAINERS entry
edac: Unify reporting of device info for device, mc and pci
edac, highbank: Improve and unify naming
edac, highbank: Add Calxeda ECX-2000 support
ARM: dts: calxeda: move memory-controller node out of ecx-common.dtsi
edac, highbank: Fix interrupt setup of mem and l2 controller
Remove mpc85xx_pci_err_remove(...) which is obsolete, this removes the
compiler warning which can be seen when building the driver either
statically or as a module.
Signed-off-by: Johannes Thumshirn <morbidrsa@gmail.com>
Link: https://lkml.kernel.org/r/20131112161901.GA15637@jtlinux
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@men.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Since Ivy Bridge memory controller is very similar to Sandy Bridge, it's
wiser to modify sb_edac to support both instead of creating another
driver.
[m.chehab@samsung.com: Fix CodingStyle]
Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
Whenever the extended error reporting is active, multiple MCEs will be
generated for the same event, which will lead to multiple repeated
errors to be reported. So check ADDRV and only decode the error if the
MCE address is valid.
Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
This is in preparation for Ivy Bridge support
Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
This is needed to allow separated PCI id tables for Sandy Bridge and Ivy
Bridge.
Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
This is in preparation for Ivy Bridge support
Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
This is in preparation for Ivy Bridge support
Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
This is in preparation for Ivy Bridge support
Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
Ivy Bridge has more than one, so rename pci_br to pci_br0
Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
This is in preparation for the Ivy Bridge support.
Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
This is in preparation of Ivy Bridge support.
Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
usual for this cycle with lots of clean-up.
- Cross arch clean-up and consolidation of early DT scanning code.
- Clean-up and removal of arch prom.h headers. Makes arch specific
prom.h optional on all but Sparc.
- Addition of interrupts-extended property for devices connected to
multiple interrupt controllers.
- Refactoring of DT interrupt parsing code in preparation for deferred
probe of interrupts.
- ARM cpu and cpu topology bindings documentation.
- Various DT vendor binding documentation updates.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQEcBAABAgAGBQJSgPQ4AAoJEMhvYp4jgsXif28H/1WkrXq5+lCFQZF8nbYdE2h0
R8PsfiJJmAl6/wFgQTsRel+ScMk2hiP08uTyqf2RLnB1v87gCF7MKVaLOdONfUDi
huXbcQGWCmZv0tbBIklxJe3+X3FIJch4gnyUvPudD1m8a0R0LxWXH/NhdTSFyB20
PNjhN/IzoN40X1PSAhfB5ndWnoxXBoehV/IVHVDU42vkPVbVTyGAw5qJzHW8CLyN
2oGTOalOO4ffQ7dIkBEQfj0mrgGcODToPdDvUQyyGZjYK2FY2sGrjyquir6SDcNa
Q4gwatHTu0ygXpyphjtQf5tc3ZCejJ/F0s3olOAS1ahKGfe01fehtwPRROQnCK8=
=GCbY
-----END PGP SIGNATURE-----
Merge tag 'devicetree-for-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull devicetree updates from Rob Herring:
"DeviceTree updates for 3.13. This is a bit larger pull request than
usual for this cycle with lots of clean-up.
- Cross arch clean-up and consolidation of early DT scanning code.
- Clean-up and removal of arch prom.h headers. Makes arch specific
prom.h optional on all but Sparc.
- Addition of interrupts-extended property for devices connected to
multiple interrupt controllers.
- Refactoring of DT interrupt parsing code in preparation for
deferred probe of interrupts.
- ARM cpu and cpu topology bindings documentation.
- Various DT vendor binding documentation updates"
* tag 'devicetree-for-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (82 commits)
powerpc: add missing explicit OF includes for ppc
dt/irq: add empty of_irq_count for !OF_IRQ
dt: disable self-tests for !OF_IRQ
of: irq: Fix interrupt-map entry matching
MIPS: Netlogic: replace early_init_devtree() call
of: Add Panasonic Corporation vendor prefix
of: Add Chunghwa Picture Tubes Ltd. vendor prefix
of: Add AU Optronics Corporation vendor prefix
of/irq: Fix potential buffer overflow
of/irq: Fix bug in interrupt parsing refactor.
of: set dma_mask to point to coherent_dma_mask
of: add vendor prefix for PHYTEC Messtechnik GmbH
DT: sort vendor-prefixes.txt
of: Add vendor prefix for Cadence
of: Add empty for_each_available_child_of_node() macro definition
arm/versatile: Fix versatile irq specifications.
of/irq: create interrupts-extended property
microblaze/pci: Drop PowerPC-ism from irq parsing
of/irq: Create of_irq_parse_and_map_pci() to consolidate arch code.
of/irq: Use irq_of_parse_and_map()
...
Always have the error injection i/f available, even if there is no
debugfs or EDAC_DEBUG enabled. We need this for testing production
kernels and environments.
Thus, the entry moves from:
/sys/kernel/debug/edac/mc0/inject_ctrl
to:
/sys/devices/system/edac/mc/mc0/inject_ctrl
No other changes of the interface.
Signed-off-by: Robert Richter <robert.richter@linaro.org>
Signed-off-by: Robert Richter <rric@kernel.org>
Log messages slightly differ between edac subsystems. Unifying it.
Signed-off-by: Robert Richter <robert.richter@linaro.org>
Acked-by: Rob Herring <rob.herring@calxeda.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Robert Richter <rric@kernel.org>
Assinging correct names of the 'hb_mc_edac' and 'hb_l2_edac' edac
modules for module, controller and device. Reported values for
Highbank in dmesg are now:
EDAC MC0: Giving out device to module hb_mc_edac controller
calxeda,hb-ddr-ctrl: DEV fff00000.memory-controller (INTERRUPT)
EDAC DEVICE0: Giving out device to module hb_l2_edac controller
calxeda,hb-sregs-l2-ecc: DEV fff3c200.sregs (INTERRUPT)
Signed-off-by: Robert Richter <robert.richter@linaro.org>
Acked-by: Rob Herring <rob.herring@calxeda.com>
Signed-off-by: Robert Richter <rric@kernel.org>
Implement edac support for Calxeda ECX-2000.
The ECX-2000 memory controller is similar to Highbank but has
different register bases for error and interrupt registers. There is
an own device tree name "calxeda,ecx-2000-ddr-ctrl" for identification
and initialization of the ECX-2000 and its base addresses.
Signed-off-by: Robert Richter <robert.richter@linaro.org>
Acked-by: Rob Herring <rob.herring@calxeda.com>
Signed-off-by: Robert Richter <rric@kernel.org>
Register and enable interrupts after the edac registration. Otherwise
incomming ecc error interrupts lead to crashes during device setup.
Fixing this in drivers for mc and l2.
Signed-off-by: Robert Richter <robert.richter@linaro.org>
Acked-by: Rob Herring <rob.herring@calxeda.com>
Cc: stable <stable@vger.kernel.org> # 3.6+
Signed-off-by: Robert Richter <rric@kernel.org>
In latest UEFI spec(by now it's 2.4) there are some new
fields for memory error reporting. Add these new fields for
ghes_edac interface.
Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
Cc: Mauro Carvalho Chehab <m.chehab@samsung.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
In latest UEFI spec(by now it is 2.4) memory error definition
for CPER (UEFI 2.4 Appendix N Common Platform Error Record)
adds some new fields. These fields help people to locate
memory error to an actual DIMM location.
Original-author: Tony Luck <tony.luck@intel.com>
Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
GENMASK is used to create a contiguous bitmask([hi:lo]). It is
implemented twice in current kernel. One is in EDAC driver, the other
is in SiS/XGI FB driver. Move it to a more generic place for other
usage.
Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Thomas Winischhofer <thomas@winischhofer.net>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Powerpc is a mess of implicit includes by prom.h. Add the necessary
explicit includes to drivers in preparation of prom.h cleanup.
Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Acked-by: Grant Likely <grant.likely@linaro.org>
Pull Tile arch updates from Chris Metcalf:
"These changes bring in a bunch of new functionality that has been
maintained internally at Tilera over the last year, plus other stray
bits of work that I've taken into the tile tree from other folks.
The changes include some PCI root complex work, interrupt-driven
console support, support for performing fast-path unaligned data
fixups by kernel-based JIT code generation, CONFIG_PREEMPT support,
vDSO support for gettimeofday(), a serial driver for the tilegx
on-chip UART, KGDB support, more optimized string routines, support
for ftrace and kprobes, improved ASLR, and many bug fixes.
We also remove support for the old TILE64 chip, which is no longer
buildable"
* git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile: (85 commits)
tile: refresh tile defconfig files
tile: rework <asm/cmpxchg.h>
tile PCI RC: make default consistent DMA mask 32-bit
tile: add null check for kzalloc in tile/kernel/setup.c
tile: make __write_once a synonym for __read_mostly
tile: remove support for TILE64
tile: use asm-generic/bitops/builtin-*.h
tile: eliminate no-op "noatomichash" boot argument
tile: use standard tile_bundle_bits type in traps.c
tile: simplify code referencing hypervisor API addresses
tile: change <asm/system.h> to <asm/switch_to.h> in comments
tile: mark pcibios_init() as __init
tile: check for correct compiler earlier in asm-offsets.c
tile: use standard 'generic-y' model for <asm/hw_irq.h>
tile: use asm-generic version of <asm/local64.h>
tile PCI RC: add comment about "PCI hole" problem
tile: remove DEBUG_EXTRA_FLAGS kernel config option
tile: add virt_to_kpte() API and clean up and document behavior
tile: support FRAME_POINTER
tile: support reporting Tilera hypervisor statistics
...
dct_base and dct_limit obtain 32 bit register values when they read
their respective pci config space registers. A left shift beyond 32 bits
will cause them to wrap around. Similar case for chan_addr as can be
seen from the bug report (link below). In the patch, we rectify this by
casting chan_addr to u64 and by comparing dct_base and dct_limit against
properly shifted sys_addr in order to compare the correct bits.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Link: http://lkml.kernel.org/r/20130819132302.GA12171@elgon.mountain
Signed-off-by: Borislav Petkov <bp@suse.de>
Basically we want to cover all 0x0-0xf models, i.e. Orochi and later.
Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Link: http://lkml.kernel.org/r/20130819192321.GF4165@pd.tnic
Signed-off-by: Borislav Petkov <bp@suse.de>
The struct should be terminated by using empty braces in order to
fix the following sparse warning.
drivers/edac/cpc925_edac.c:792:10: warning: Using plain integer as NULL pointer
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
[ drop obvious comment ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Now that we cache (family, model, stepping) locally, use them instead of
boot_cpu_data.
No functionality change.
Signed-off-by: Borislav Petkov <bp@suse.de>
On newer models, support has been included for upto 4 DCT's, however,
only DCT0 and DCT3 are currently configured (cf BKDG Section 2.10).
Also, the routing DRAM Requests algorithm is different for F15h M30h.
Thus it is cleaner to use a brand new function rather than adding quirks
to the more generic f1x_match_to_this_node(). Refer to "2.10.5 DRAM
Routing Requests" in the BKDG for further info.
Tested on Fam15h M30h with ECC turned on using mce_amd_inj facility and
verified to be functionally correct.
While at it, verify if erratum workarounds for E505 and E637 still hold.
From email conversations within AMD, the current status of the errata
is:
* Erratum 505: fixed in model 0x1, stepping 0x1 and later.
* Erratum 637: not fixed.
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
[ Cleanups, corrections ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Make a local function static in order to fix the following sparse
warning:
drivers/edac/x38_edac.c:252:14: warning: symbol 'x38_map_mchbar' was not declared. Should it be static?
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
[ Boris: Correct commit message ]
Signed-off-by: Borislav Petkov <bp@suse.de>
This local symbol is used only in this file.
Fix the following sparse warnings:
drivers/edac/i3200_edac.c:264:14: warning: symbol 'i3200_map_mchbar' was not declared. Should it be static?
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
It can happen that configurations are running in a single-channel mode
even with a dual-channel memory controller, by, say, putting the DIMMs
only on the one channel and leaving the other empty. This causes a
problem in init_csrows which implicitly assumes that when the second
channel is enabled, i.e. channel 1, the struct dimm hierarchy will be
present. Which is not.
So always allocate two channels unconditionally.
This provides for the nice side effect that the data structures are
initialized so some day, when memory hotplug is supported, it should
just work out of the box when all of a sudden a second channel appears.
Reported-and-tested-by: Roger Leigh <rleigh@debian.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
The usage of strict_strtol() is not preferred, because strict_strtol()
is obsolete. Thus, kstrtol() should be used.
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Commit 0998d06310 (device-core: Ensure drvdata = NULL when no
driver is bound) removes the need to set driver data field to
NULL.
Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Pull MIPS updates from Ralf Baechle:
"MIPS updates:
- All the things that didn't make 3.10.
- Removes the Windriver PPMC platform. Nobody will miss it.
- Remove a workaround from kernel/irq/irqdomain.c which was there
exclusivly for MIPS. Patch by Grant Likely.
- More small improvments for the SEAD 3 platform
- Improvments on the BMIPS / SMP support for the BCM63xx series.
- Various cleanups of dead leftovers.
- Platform support for the Cavium Octeon-based EdgeRouter Lite.
Two large KVM patchsets didn't make it for this pull request because
their respective authors are vacationing"
* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (124 commits)
MIPS: Kconfig: Add missing MODULES dependency to VPE_LOADER
MIPS: BCM63xx: CLK: Add dummy clk_{set,round}_rate() functions
MIPS: SEAD3: Disable L2 cache on SEAD-3.
MIPS: BCM63xx: Enable second core SMP on BCM6328 if available
MIPS: BCM63xx: Add SMP support to prom.c
MIPS: define write{b,w,l,q}_relaxed
MIPS: Expose missing pci_io{map,unmap} declarations
MIPS: Malta: Update GCMP detection.
Revert "MIPS: make CAC_ADDR and UNCAC_ADDR account for PHYS_OFFSET"
MIPS: APSP: Remove <asm/kspd.h>
SSB: Kconfig: Amend SSB_EMBEDDED dependencies
MIPS: microMIPS: Fix improper definition of ISA exception bit.
MIPS: Don't try to decode microMIPS branch instructions where they cannot exist.
MIPS: Declare emulate_load_store_microMIPS as a static function.
MIPS: Fix typos and cleanup comment
MIPS: Cleanup indentation and whitespace
MIPS: BMIPS: support booting from physical CPU other than 0
MIPS: Only set cpu_has_mmips if SYS_SUPPORTS_MICROMIPS
MIPS: GIC: Fix gic_set_affinity infinite loop
MIPS: Don't save/restore OCTEON wide multiplier state on syscalls.
...
Add a new error signature for Family 15h, models 30h-3fh. Patch has been
tested on Fam15h using mce_amd_inj facility and has been verified to
work correctly.
Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
[ cleanup commit message and error string ]
Signed-off-by: Borislav Petkov <bp@suse.de>
The usage of strict_strtoul() is not preferred, because strict_strtoul()
is obsolete. Thus, kstrtoul() should be used.
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Ever since commit 45f035ab9b ("CONFIG_HOTPLUG should be always on"),
it has been basically impossible to build a kernel with CONFIG_HOTPLUG
turned off. Remove all the remaining references to it.
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Acked-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix yet another issue caught by 8f46baaa7e ("base: core: WARN() about
bogus permissions on device attributes").
Signed-off-by: Borislav Petkov <bp@suse.de>
I get the following warning on boot:
------------[ cut here ]------------
WARNING: at drivers/base/core.c:575 device_create_file+0x9a/0xa0()
Hardware name: -[8737R2A]-
Write permission without 'store'
...
</snip>
Drilling down, this is related to dynamic channel ce_count attribute
files sporting a S_IWUSR mode without a ->store() function. Looking
around, it appears that they aren't supposed to have a ->store()
function. So remove the bogus write permission to get rid of the
warning.
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Mauro Carvalho Chehab <mchehab@redhat.com>
Cc: <stable@vger.kernel.org> # 3.[89]
[ shorten commit message ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Pull edac fixes from Mauro Carvalho Chehab:
"Two edac fixes:
- i7300_edac currently reports a wrong number of DIMMs when the
memory controller is in single channel mode
- on some Sandy Bridge machines, the EDAC driver bails out as one of
the PCI IDs used by the driver is hidden by BIOS. As the driver
uses it only to detect the type of memory, make it optional at the
driver"
* 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac:
edac: sb_edac.c should not require prescence of IMC_DDRIO device
i7300_edac: Fix memory detection in single mode
The Sandy Bridge EDAC driver uses a register in the IMC_DDRIO CSR
space to determine the type of DIMMs (registered or unregistered).
But this device does not exist on some single socket Sandy Bridge
servers. While the type of DIMMs is nice to know, it is not essential
for this driver's other functions. So it seems harsh to have it
refuse to load at all when it cannot find this device.
Make the check for this device be optional. If it isn't present
just report the memory type as "MEM_UNKNOWN".
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Add code to handle DRAM ECC errors decoding for Fam16h.
Tested on Fam16h with ECC turned on using the mce_amd_inj facility and
works fine.
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
[ Boris: cleanups and clarifications ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Both mci.mem_is_per_rank and mci.csbased denote the same thing: the
memory controller is csrows based. Merge both fields into one.
There's no need for the driver to actually fill it, as the core detects
it by checking if one of the layers has the csrows type as part of the
memory hierarchy:
if (layers[i].type == EDAC_MC_LAYER_CHIP_SELECT)
per_rank = true;
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
We were filling the csrow size with a wrong value. 16a528ee39 ("EDAC:
Fix csrow size reported in sysfs") tried to address the issue. It fixed
the report with the old API but not with the new one. Correct it for the
new API too.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
[ make it a per-csrow accounting regardless of ->channel_count ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Pull EDAC fixes and ghes-edac from Mauro Carvalho Chehab:
"For:
- Some fixes at edac drivers (i7core_edac, sb_edac, i3200_edac);
- error injection support for i5100, when EDAC debug is enabled;
- fix edac when it is loaded builtin (early init for the subsystem);
- a "Firmware First" EDAC driver, allowing ghes to report errors via
EDAC (ghes-edac).
With regards to ghes-edac, this fixes a longstanding BZ at Red Hat
that happens with Nehalem and Sandy Bridge CPUs: when both GHES and
i7core_edac or sb_edac are running, the error reports are
unpredictable, as both BIOS and OS race to access the registers. With
ghes-edac, the EDAC core will refuse to register any other concurrent
memory error driver.
This patchset moves the ghes struct definitions to a separate header
file (include/acpi/ghes.h) and adds 3 hooks at apei/ghes.c to
register/unregister and to report errors via ghes-edac. Those changes
were acked by ghes driver maintainer (Huang)."
* 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac: (30 commits)
i5100_edac: convert to use simple_open()
ghes_edac: fix to use list_for_each_entry_safe() when delete list items
ghes_edac: Fix RAS tracing
ghes_edac: Make it compliant with UEFI spec 2.3.1
ghes_edac: Improve driver's printk messages
ghes_edac: Don't credit the same memory dimm twice
ghes_edac: do a better job of filling EDAC DIMM info
ghes_edac: add support for reporting errors via EDAC
ghes_edac: Register at EDAC core the BIOS report
ghes: add the needed hooks for EDAC error report
ghes: move structures/enum to a header file
edac: add support for error type "Info"
edac: add support for raw error reports
edac: reduce stack pressure by using a pre-allocated buffer
edac: lock module owner to avoid error report conflicts
edac: remove proc_name from mci structure
edac: add a new memory layer type
edac: initialize the core earlier
edac: better report error conditions in debug mode
i5100_edac: Remove two checkpatch warnings
...
This removes an open coded simple_open() function and
replaces file operations references to the function
with simple_open() instead.
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Since we will remove items off the list using list_del() we need
to use a safe version of the list_for_each_entry() macro aptly named
list_for_each_entry_safe().
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
With the current version of CPER, there's no way to associate an
error with the memory error. So, the error location in EDAC
layers is unused.
As CPER has its own idea about memory architectural layers, just
output whatever is there inside the driver's detail at the RAS
tracepoint.
The EDAC location keeps untouched, in the case that, in some future,
we could actually map the error into the dimm labels.
Now, the error message:
[ 72.396625] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0
[ 72.396627] {1}[Hardware Error]: APEI generic hardware error status
[ 72.396628] {1}[Hardware Error]: severity: 2, corrected
[ 72.396630] {1}[Hardware Error]: section: 0, severity: 2, corrected
[ 72.396632] {1}[Hardware Error]: flags: 0x01
[ 72.396634] {1}[Hardware Error]: primary
[ 72.396635] {1}[Hardware Error]: section_type: memory error
[ 72.396637] {1}[Hardware Error]: error_status: 0x0000000000000400
[ 72.396638] {1}[Hardware Error]: node: 3
[ 72.396639] {1}[Hardware Error]: card: 0
[ 72.396640] {1}[Hardware Error]: module: 0
[ 72.396641] {1}[Hardware Error]: device: 0
[ 72.396643] {1}[Hardware Error]: error_type: 18, unknown
[ 72.396666] EDAC MC0: 1 CE reserved error (18) on unknown label (node:3 card:0 module:0 page:0x0 offset:0x0 grain:0 syndrome:0x0 - status(0x0000000000000400): Storage error in DRAM memory)
Is properly represented on the trace event:
kworker/0:2-584 [000] .... 72.396657: mc_event: 1 Corrected error: reserved error (18) on unknown label (mc:0 location👎-1:-1 address:0x00000000 grain:1 syndrome:0x00000000 APEI location: node:3 card:0 module:0 status(0x0000000000000400): Storage error in DRAM memory)
Tested on a 4 sockets E5-4650 Sandy Bridge machine.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Provide a better infrastructure for printk's inside the driver:
- use edac_dbg() for debug messages;
- standardize the usage of pr_info();
- provide warning about the risk of relying on this
driver.
While here, changes the size of a fake memory to 1 page. This is
as good or as bad as 1000 pages, but it is easier for userspace to
detect, as I don't expect that any machine implementing GHES would
provide just 1 page available ;)
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Conflicts:
drivers/edac/ghes_edac.c
Instead of just faking a random value for the DIMM data, get
the information that it is available via DMI table.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Now that the EDAC core is capable of just forward the errors via
the userspace API, add a report mechanism for the GHES errors.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Register GHES at EDAC MC core, in order to avoid other
drivers to also handle errors and mangle with error data.
The edac core will warrant that just one driver will be used,
so the first one to register (BIOS first) will be the one that
will be reporting the hardware errors.
For now, the EDAC driver does nothing but to register at the
EDAC core, preventing the hardware-driven mechanism to
interfere with GHES.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Here is the big driver core merge for 3.9-rc1
There are two major series here, both of which touch lots of drivers all
over the kernel, and will cause you some merge conflicts:
- add a new function called devm_ioremap_resource() to properly be
able to check return values.
- remove CONFIG_EXPERIMENTAL
If you need me to provide a merged tree to handle these resolutions,
please let me know.
Other than those patches, there's not much here, some minor fixes and
updates.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iEYEABECAAYFAlEmV0cACgkQMUfUDdst+yncCQCfbmnQZju7kzWXk6PjdFuKspT9
weAAoMCzcAtEzzc4LXuUxxG/sXBVBCjW
=yWAQ
-----END PGP SIGNATURE-----
Merge tag 'driver-core-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core patches from Greg Kroah-Hartman:
"Here is the big driver core merge for 3.9-rc1
There are two major series here, both of which touch lots of drivers
all over the kernel, and will cause you some merge conflicts:
- add a new function called devm_ioremap_resource() to properly be
able to check return values.
- remove CONFIG_EXPERIMENTAL
Other than those patches, there's not much here, some minor fixes and
updates"
Fix up trivial conflicts
* tag 'driver-core-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (221 commits)
base: memory: fix soft/hard_offline_page permissions
drivercore: Fix ordering between deferred_probe and exiting initcalls
backlight: fix class_find_device() arguments
TTY: mark tty_get_device call with the proper const values
driver-core: constify data for class_find_device()
firmware: Ignore abort check when no user-helper is used
firmware: Reduce ifdef CONFIG_FW_LOADER_USER_HELPER
firmware: Make user-mode helper optional
firmware: Refactoring for splitting user-mode helper code
Driver core: treat unregistered bus_types as having no devices
watchdog: Convert to devm_ioremap_resource()
thermal: Convert to devm_ioremap_resource()
spi: Convert to devm_ioremap_resource()
power: Convert to devm_ioremap_resource()
mtd: Convert to devm_ioremap_resource()
mmc: Convert to devm_ioremap_resource()
mfd: Convert to devm_ioremap_resource()
media: Convert to devm_ioremap_resource()
iommu: Convert to devm_ioremap_resource()
drm: Convert to devm_ioremap_resource()
...
The number of variables at the stack is too big.
Reduces the stack usage by using a pre-allocated error
buffer.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
APEI GHES and i7core_edac/sb_edac currently can be loaded at
the same time, but those are Highlander modules:
"There can be only one".
There are two reasons for that:
1) Each driver assumes that it is the only one registering at
the EDAC core, as it is driver's responsibility to number
the memory controllers, and all of them start from 0;
2) If BIOS is handling the memory errors, the OS can't also be
doing it, as one will mangle with the other.
So, we need to add an module owner's lock at the EDAC core,
in order to avoid having two different modules handling memory
errors at the same time. The best way for doing this lock seems
to use the driver's name, as this is unique, and won't require
changes on every driver.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
There are some cases where the memory controller layout is
completely hidden. This is the case of firmware-driven error
code, like the one provided by GHES. Add a new layer to be
used on such memory error report mechanisms.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
In order for it to work with it builtin, the EDAC core should
be initialized earlier, otherwise the ghes_edac driver initializes
before edac_mc_sysfs_init() being called:
...
[ 4.998373] EDAC MC0: Giving out device to 'ghes_edac.c' 'ghes_edac': DEV ghes
...
[ 4.998373] EDAC MC1: Giving out device to 'ghes_edac.c' 'ghes_edac': DEV ghes
[ 6.519495] EDAC MC: Ver: 3.0.0
[ 6.523749] EDAC DEBUG: edac_mc_sysfs_init: device mc created
The net result is that no EDAC sysfs nodes will appear.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
The last changeset introduced a few checkpatch warnings:
WARNING: debugfs_remove_recursive(NULL) is safe this check is probably not required
261: FILE: drivers/edac/i5100_edac.c:1207:
+ if (priv->debugfs)
+ debugfs_remove_recursive(priv->debugfs);
WARNING: debugfs_remove(NULL) is safe this check is probably not required
290: FILE: drivers/edac/i5100_edac.c:1250:
+ if (i5100_debugfs)
+ debugfs_remove(i5100_debugfs);
Get rid of them.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Create a debugfs direcotry i5100_edac/mcX for each memory controller and
add nodes to control how fault injection is preformed.
After configuring an injection using inject_channel, inject_deviceptr1,
inject_deviceptr2, inject_eccmask1, inject_eccmask2 and inject_hlinesel
trigger the injection by writing anything to inject_enable.
Example of a CE injection:
echo 0 > /sys/kernel/debug/i5100_edac/mc0/inject_channel
echo 1 > /sys/kernel/debug/i5100_edac/mc0/inject_hlinesel
echo 61440 > /sys/kernel/debug/i5100_edac/mc0/inject_eccmask1
echo 1 > /sys/kernel/debug/i5100_edac/mc0/inject_enable
Example of UE injection:
echo 0 > /sys/kernel/debug/i5100_edac/mc0/inject_channel
echo 2 > /sys/kernel/debug/i5100_edac/mc0/inject_hlinesel
echo 65535 > /sys/kernel/debug/i5100_edac/mc0/inject_eccmask1
echo 65535 > /sys/kernel/debug/i5100_edac/mc0/inject_eccmask2
echo 17 > /sys/kernel/debug/i5100_edac/mc0/inject_deviceptr1
echo 0 > /sys/kernel/debug/i5100_edac/mc0/inject_deviceptr2
echo 1 > /sys/kernel/debug/i5100_edac/mc0/inject_enable
Sometimes it is needed to enable the injection more then once (echo to
the inject_enable node) for the injection to happen, I am not sure why.
Signed-off-by: Niklas Söderlund <niklas.soderlund@ericsson.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Add fault injection based on information datasheet for i5100, see 1. In
addition to the i5100 datasheet some missing information on injection
functions where found through experimentation and the i7300 datasheet,
see 2.
[1] Intel 5100 Memory Controller Hub Chipset
Doc.Nr: 318378
http://www.intel.com/content/dam/doc/datasheet/5100-
memory-controller-hub-chipset-datasheet.pdf
[2] Intel 7300 Chipset MemoryController Hub (MCH)
Doc.Nr: 318082
http://www.intel.com/assets/pdf/datasheet/318082.pdf
Signed-off-by: Niklas Söderlund <niklas.soderlund@ericsson.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Probe and store the device handle for the device 19 function 0 during
driver initialization. The device is used during fault injection.
Signed-off-by: Niklas Söderlund <niklas.soderlund@ericsson.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Currently, sdram_scrub_rate sysfs node is created even if the device
doesn't support get/set the scub rate. Change the logic to only
create this device node when the operation is supported.
Reported-by: Felipe Balbi <balbi@ti.com>
Acked-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
After running a series of tests on an HP DL320, filled with different
memory sizes, it was noticed that, when filled with just one DIMM
on such hardware, the driver wrongly detects twice the memory, and
thinks that both channels 0 and 1 are filled.
It seems to be partially caused by the BIOS and partially by the driver.
The i3200_edac current logic would be working fine if the BIOS were
disabling the unused second channel when just one DIMM is connected,
in order to do power-saving, as recommended on this chipset's datasheet.
However, the BIOS on this particular machine doesn't do it:
[ 16.741421] EDAC DEBUG: how_many_channels: In dual channel mode
[ 16.741424] EDAC DEBUG: how_many_channels: 2 DIMMS per channel enabled
So, the driver were assuming that 2 channels are enabled (well, they are,
but the second is unused).
Combined with that, I found two issues at the logic that creates the
EDAC data, that were failing when the two channels are not equally
filled (AFAICT, that happens only when just 1 DIMM is plugged).
The first one is that a 0 at DRB means that nothing is filled. The
driver's logic, however, do some calculation with that.
The second one is that the logic that fills the DIMM data currently
assumes that both channels are equally filled.
I tested the system already with the current configuration and my
patch and it is now working fine. So, for a 2R single DIMM 2Gb memory
at dimm slot 01 (channel 0), it is now displaying:
[ 16.741406] EDAC DEBUG: i3200_get_drbs: drb[0][0] = 16, drb[1][0] = 0
[ 16.741410] EDAC DEBUG: i3200_get_drbs: drb[0][1] = 32, drb[1][1] = 0
[ 16.741413] EDAC DEBUG: i3200_get_drbs: drb[0][2] = 32, drb[1][2] = 0
[ 16.741416] EDAC DEBUG: i3200_get_drbs: drb[0][3] = 32, drb[1][3] = 0
...
[ 16.741896] EDAC DEBUG: i3200_probe1: csrow 0, channel 0, size = 1024 Mb
[ 16.741899] EDAC DEBUG: i3200_probe1: csrow 1, channel 0, size = 1024 Mb
and the corresponding sysfs nodes are now properly filled.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Currently, it is not possible to know, when debug is enabled,
if the driver is using 2 DIMMS per channel mode or not. It is
not possible to know the values of the drbs registers, used
to identify the memory rank sizes.
Add debug for both, as it helps to track issues on the driver.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Only one: AMD F16h MCE decoding enablement from Jacob Shin.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABAgAGBQJRI2t7AAoJEBLB8Bhh3lVKmLAP/RbYNVVvLZZxXqB6heCkX7D3
Avhi4GtuHqjY+79nqBoJEH9HjgzBzL6ffsJGPF3gtXrWybi6nzZkqog5QGRQFfGS
gIoldbfP9MMF/RrMHbUF5x9Ha9EcdudDgYE3eqCq6KUfOMlUrBRECaU5YRrB+FKR
qsuQNmF6J3uMPYwO9oGhTpA/vPwwapFaKgm+KND8q5h5JcpJrRpooKNQc1DheW+x
nfYFpmSBW8eamVwV5DTjqKVhKeG3gB/3i6uSTpLvh4i0ZmV2vQ+drwC3aU0Eh0Q4
wnslEpQENjODsvbZH6b0j2WTDgBGKEpALRkMUDTnnqvYrR96cV02MWUfp6cbKYCv
+d/iPt3hEBCyDTDzfP66N+1b+Wqls4Dx8lhhqLdPhsIpY2Qu8OQ8/mjyIIs12qK5
g9w3lsfy3shWmdsK/Ehd0IhNt1bzNV+63CINA2isC2QAp0Eqecj3jKrXor+MTPu1
uRY3wM+8CrdeFNNhhhsMQYIb4+DFGLgMwY5mV6z8ceFJAjxvyUxjObnSMopjBKog
9+JcyEHCADbpHsRup/zRIoGtSs+44sQ6l8l7pq6d4K4UfkG7Biq9lcxThID7rImn
Rl6MOJVmJuM5n9JVIU4/bmhjfuTcI+gdshexDEH2HnqaXxd7ceIbnrFWFaBZEO5t
t3WasBuIb30g1XCQmNT6
=LbcU
-----END PGP SIGNATURE-----
Merge tag 'edac_3.9' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp
Pull EDAC updates from Borislav Petkov:
"Mostly AMD's side of EDAC. It is basically a new family enablement
stuff: AMD F16h MCE decoding enablement from Jacob Shin. The rest is
trivial cleanups."
* tag 'edac_3.9' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
mpc85xx_edac: Fix typo
EDAC, MCE, AMD: Remove unneeded exports
EDAC, MCE, AMD: Add MCE decoding support for Family 16h
EDAC, MCE, AMD: Make MC2 decoding per-family
amd64_edac: Remove dead code
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.13 (GNU/Linux)
iQEcBAABAgAGBQJRFWw4AAoJEHm+PkMAQRiGnTAH/jBHA2umNc3lc7VkUpusve4q
GGIlNzYh6iuvIGwKQVj9YPsl37qtQnkDUmY8f8WxZjfxiIDw3TkRKDgNLJaM3Jy5
E426/FGlRx/Iia5+4tuBeoVYMoIPnndgW5lEAMRK1SvhTByhIYAXsaM0UwPBetb+
Z5NMdH1f1HVF7RCCmHAkzEk9z78UpdeCzI0t0XuasP2hp2ARAcE1okdO7fNaLiyo
EmenGhRvy9bAsbRwV0rCdl0rQiZXEYM353DWS7n6q4fyVm8MXFwloUxnWCJTzOIL
ZLJaz18adFj7Ip/X6ksnMQiQU2Q3B7aThs5ycv0QGxxL2rDFveYRRQ5ICrXOy3M=
=jjBc
-----END PGP SIGNATURE-----
Merge tag 'v3.8-rc7' into next
Linux 3.8-rc7
* tag 'v3.8-rc7': (12052 commits)
Linux 3.8-rc7
net: sctp: sctp_endpoint_free: zero out secret key data
net: sctp: sctp_setsockopt_auth_key: use kzfree instead of kfree
atm/iphase: rename fregt_t -> ffreg_t
ARM: 7641/1: memory: fix broken mmap by ensuring TASK_UNMAPPED_BASE is aligned
ARM: DMA mapping: fix bad atomic test
ARM: realview: ensure that we have sufficient IRQs available
ARM: GIC: fix GIC cpumask initialization
net: usb: fix regression from FLAG_NOARP code
l2tp: dont play with skb->truesize
net: sctp: sctp_auth_key_put: use kzfree instead of kfree
netback: correct netbk_tx_err to handle wrap around.
xen/netback: free already allocated memory on failure in xen_netbk_get_requests
xen/netback: don't leak pages on failure in xen_netbk_tx_check_gop.
xen/netback: shutdown the ring if it contains garbage.
drm/ttm: fix fence locking in ttm_buffer_object_transfer, 2nd try
virtio_console: Don't access uninitialized data.
net: qmi_wwan: add more Huawei devices, including E320
net: cdc_ncm: add another Huawei vendor specific device
ipv6/ip6_gre: fix error case handling in ip6gre_tunnel_xmit()
...
Pull x86 platform changes from Ingo Molnar:
- Support for the Technologic Systems TS-5500 platform, by Vivien
Didelot
- Improved NUMA support on AMD systems:
Add support for federated systems where multiple memory controllers
can exist and see each other over multiple PCI domains. This
basically means that AMD node ids can be more than 8 now and the code
handling this is taught to incorporate PCI domain into those IDs.
- Support for the Goldfish virtual Android emulator, by Jun Nakajima,
Intel, Google, et al.
- Misc fixlets.
* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: Add TS-5500 platform support
x86/srat: Simplify memory affinity init error handling
x86/apb/timer: Remove unnecessary "if"
goldfish: platform device for x86
amd64_edac: Fix type usage in NB IDs and memory ranges
amd64_edac: Fix PCI function lookup
x86, AMD, NB: Use u16 for northbridge IDs in amd_get_nb_id
x86, AMD, NB: Add multi-domain support