Commit Graph

2110 Commits

Author SHA1 Message Date
Thomas Gleixner
457c899653 treewide: Add SPDX license identifier for missed files
Add SPDX license identifiers to all files which:

 - Have no license information of any form

 - Have EXPORT_.*_SYMBOL_GPL inside which was used in the
   initial scan/conversion to ignore the file

These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:

  GPL-2.0-only

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-21 10:50:45 +02:00
Linus Torvalds
11b1177399 * Do not build mpc85_edac as a module (Michael Ellerman)
* Correct edac_mc_find()'s return value on error	(Robert Richter)
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAlzdOjIACgkQEsHwGGHe
 VUrkxQ/+J+mSL23ush4HNq43OqW7Wgpkdoljo+5hr1XWH790Kc/HNI+B/Qdsgl6I
 UoFkJPwMTfxqV/JZFuNC22ADnrXw6nRDZSE7+K+fHmpBgj2rK4hQ6tOYON9dTg/X
 9nPF1ylBQcdzwNt4eTTnS5U1eKBSWqmQm5NtYmq2s83DQ1v3r2Pckqx00SC5Stdy
 9vQLCPUR5SjsGpeQZnrdesxO89t3pWWzfZrAsCxPiY36VbUVmsHQVbHC9yfpLSIE
 6pzc0esQx6zqyb4e0Rx+aKJPJ6fS43NsxifkI81h7S5F/xPEuZy39rhAmnEOYCNE
 0ceQwl+ktzzGzmurgXf9Y5LLy8vOcCUJX4HBCkaweHSDRiP1xZtIl7Ao3aAaP4no
 AUQqX/O0BydkO+awarhCduM7vIGdr0B9xn0xCtUODitoTIzaoVZ8F2iF7xSOi/3F
 QkP4dSsf72ZK/EdA/8k1bxyBnxQ/TKgwLIQEJ1n9Bz7biPxZmqCyHQMDg7HaA1Uc
 crGkJJr0cGHb9TEAzy0ygtRNrKE+hpt0FGFGE0zlE4uX7YP1Y/Bw3R38zbcl2x9h
 nIqEmHwswVfBdm28RfxRH3LnS9SHwsXTtUk0y89Je1pK0qZ2Kv6nRiEFzWIAqPyY
 I/7eorSkNXU+vVnZko8cvuHEuJh1VUj2bGJeS6NUo9X5+IOE0LE=
 =yjXI
 -----END PGP SIGNATURE-----

Merge tag 'edac_fixes_for_5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp

Pull EDAC fixes from Borislav Petkov:

 - Do not build mpc85_edac as a module (Michael Ellerman)

 - Correct edac_mc_find()'s return value on error (Robert Richter)

* tag 'edac_fixes_for_5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
  EDAC/mc: Fix edac_mc_find() in case no device is found
  EDAC/mpc85xx: Prevent building as a module
2019-05-16 11:55:35 -07:00
Robert Richter
29a0c84397 EDAC/mc: Fix edac_mc_find() in case no device is found
The function should return NULL in case no device is found, but it
always returns the last checked mc device from the list even if the
index did not match. Fix that.

I did some analysis why this did not raise any issues for about 3 years
and the reason is that edac_mc_find() is mostly used to search for
existing devices. Thus, the bug is not triggered.

 [ bp: Drop the if (mci->mc_idx > idx) test in favor of readability. ]

Fixes: c73e8833be ("EDAC, mc: Fix locking around mc_devices list")
Signed-off-by: Robert Richter <rrichter@marvell.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Link: https://lkml.kernel.org/r/20190514104838.15065-1-rrichter@marvell.com
2019-05-14 17:08:46 +02:00
Michael Ellerman
2b8358a951 EDAC/mpc85xx: Prevent building as a module
The mpc85xx EDAC driver can be configured as a module but then fails to
build because it uses two unexported symbols:

  ERROR: ".pci_find_hose_for_OF_device" [drivers/edac/mpc85xx_edac_mod.ko] undefined!
  ERROR: ".early_find_capability" [drivers/edac/mpc85xx_edac_mod.ko] undefined!

We don't want to export those symbols just for this driver, so make the
driver only configurable as a built-in.

This seems to have been broken since at least

  c92132f598 ("edac/85xx: Add PCIe error interrupt edac support")

(Nov 2013).

 [ bp: make it depend on EDAC=y so that the EDAC core doesn't get built
   as a module. ]

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Johannes Thumshirn <jth@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: linuxppc-dev@ozlabs.org
Cc: morbidrsa@gmail.com
Link: https://lkml.kernel.org/r/20190502141941.12927-1-mpe@ellerman.id.au
2019-05-10 20:15:11 +02:00
Linus Torvalds
ffa6f55eb6 Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RAS updates from Borislav Petkov:

 - Support for varying MCA bank numbers per CPU: this is in preparation
   for future CPU enablement (Yazen Ghannam)

 - MCA banks read race fix (Tony Luck)

 - Facility to filter MCEs which should not be logged (Yazen Ghannam)

 - The usual round of cleanups and fixes

* 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/MCE/AMD: Don't report L1 BTB MCA errors on some family 17h models
  x86/MCE: Add an MCE-record filtering function
  RAS/CEC: Increment cec_entered under the mutex lock
  x86/mce: Fix debugfs_simple_attr.cocci warnings
  x86/mce: Remove mce_report_event()
  x86/mce: Handle varying MCA bank counts
  x86/mce: Fix machine_check_poll() tests for error types
  MAINTAINERS: Fix file pattern for X86 MCE INFRASTRUCTURE
  x86/MCE: Group AMD function prototypes in <asm/mce.h>
2019-05-06 19:54:57 -07:00
Borislav Petkov
8de9930a46 Revert "EDAC/amd64: Support more than two controllers for chip select handling"
This reverts commit 0a227af521.

Unfortunately, this commit caused wrong detection of chip select sizes
on some F17h client machines:

  --- 00-rc6+     2019-02-14 14:28:03.126622904 +0100
  +++ 01-rc4+     2019-04-14 21:06:16.060614790 +0200
   EDAC amd64: MC: 0:     0MB 1:     0MB
  -EDAC amd64: MC: 2: 16383MB 3: 16383MB
  +EDAC amd64: MC: 2:     0MB 3: 2097151MB
   EDAC amd64: MC: 4:     0MB 5:     0MB
   EDAC amd64: MC: 6:     0MB 7:     0MB
   EDAC MC: UMC1 chip selects:
   EDAC amd64: MC: 0:     0MB 1:     0MB
  -EDAC amd64: MC: 2: 16383MB 3: 16383MB
  +EDAC amd64: MC: 2:     0MB 3: 2097151MB
   EDAC amd64: MC: 4:     0MB 5:     0MB
   EDAC amd64: MC: 6:     0MB 7:     0M

Revert it for now until it has been solved properly.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Yazen Ghannam <yazen.ghannam@amd.com>
2019-04-25 16:30:34 +02:00
Yazen Ghannam
71a84402b9 x86/MCE/AMD: Don't report L1 BTB MCA errors on some family 17h models
AMD family 17h Models 10h-2Fh may report a high number of L1 BTB MCA
errors under certain conditions. The errors are benign and can safely be
ignored. However, the high error rate may cause the MCA threshold
counter to overflow causing a high rate of thresholding interrupts.

In addition, users may see the errors reported through the AMD MCE
decoder module, even with the interrupt disabled, due to MCA polling.

Clear the "Counter Present" bit in the Instruction Fetch bank's
MCA_MISC0 register. This will prevent enabling MCA thresholding on this
bank which will prevent the high interrupt rate due to this error.

Define an AMD-specific function to filter these errors from the MCE
event pool so that they don't get reported during early boot.

Rename filter function in EDAC/mce_amd to avoid a naming conflict, while
at it.

 [ bp: Move function prototype to the internal header and
   massage/cleanup, fix typos. ]

Reported-by: Rafał Miłecki <rafal@milecki.pl>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "clemej@gmail.com" <clemej@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Morse <james.morse@arm.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Pu Wen <puwen@hygon.cn>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: Shirish S <Shirish.S@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Cc: <stable@vger.kernel.org> # 5.0.x: c95b323dcd: x86/MCE/AMD: Turn off MC4_MISC thresholding on all family 0x15 models
Cc: <stable@vger.kernel.org> # 5.0.x: 30aa3d26ed: x86/MCE/AMD: Carve out the MC4_MISC thresholding quirk
Cc: <stable@vger.kernel.org> # 5.0.x: 9308fd4074: x86/MCE: Group AMD function prototypes in <asm/mce.h>
Cc: <stable@vger.kernel.org> # 5.0.x
Link: https://lkml.kernel.org/r/20190325163410.171021-2-Yazen.Ghannam@amd.com
2019-04-23 18:16:07 +02:00
Thor Thayer
fad9fab975 EDAC/altera, firmware/intel: Add Stratix10 ECC DBE SMC call
Reserve ECC Double Bit Error SMC call to alert U-Boot that a DBE has
occurred. Move the call from local EDAC header file to a common header.

 [ bp: Merge the two patches. ]

Signed-off-by: Thor Thayer <thor.thayer@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Richard Gong <richard.gong@intel.com>
Reviewed-by: Alan Tull <atull@kernel.org> # firmware
Cc: Greg KH <greg@kroah.com>
Cc: James Morse <james.morse@arm.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: mchehab@kernel.org
Link: https://lkml.kernel.org/r/1553870639-23895-1-git-send-email-thor.thayer@linux.intel.com

Signed-off-by: Borislav Petkov <bp@suse.de>
2019-04-02 17:42:15 +02:00
Thor Thayer
788586efd1 EDAC/altera: Initialize peripheral FIFOs in probe()
The FIFO memory and ECC initialization doesn't need to be
done as a separate operation early in the startup.

Improve the Arria10 and Stratix10 peripheral FIFO init
by initializing memory and enabling ECC as part of the
device driver initialization.

Signed-off-by: Thor Thayer <thor.thayer@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/1553635771-32693-2-git-send-email-thor.thayer@linux.intel.com
2019-03-29 11:46:23 +01:00
Thor Thayer
436b0a583a EDAC/altera: Do less intrusive error injection
Improve the Arria10 and Stratix10 error injection routine
by reading the data and changing just 1 bit before writing
back out. Previous routine would overwrite the first bytes
to 0 then change 1 bit but this method is less intrusive.

Signed-off-by: Thor Thayer <thor.thayer@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/1553635771-32693-1-git-send-email-thor.thayer@linux.intel.com
2019-03-29 11:35:39 +01:00
Yazen Ghannam
fc00c6a416 EDAC/amd64: Adjust printed chip select sizes when interleaved
AMD systems may support chip select interleaving. However, on family
17h+ this was not taken into account when printing the chip select
sizes.

Add support to detect if chip selects are interleaved on family 17h+,
and adjust the sizes accordingly.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Kim Phillips <kim.phillips@amd.com>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/20190228153558.127292-6-Yazen.Ghannam@amd.com
2019-03-27 00:13:25 +01:00
Yazen Ghannam
0a227af521 EDAC/amd64: Support more than two controllers for chip select handling
The struct chip_select array that's used for saving chip select bases
and masks is fixed at length of two. There should be one struct
chip_select for each controller, so this array should be increased to
support systems that may have more than two controllers.

Increase the size of the struct chip_select array to eight, which is the
largest number of controllers per die currently supported on AMD
systems.

Also, carve out the Family 17h+ reading of the bases/masks into a
separate function. This effectively reverts the original bases/masks
reading code to before Family 17h support was added.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Kim Phillips <kim.phillips@amd.com>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/20190228153558.127292-5-Yazen.Ghannam@amd.com
2019-03-27 00:13:25 +01:00
Yazen Ghannam
7835961d37 EDAC/amd64: Recognize x16 symbol size
Future AMD systems may support x16 symbol sizes.

Recognize if a system is using x16 symbol size. Also, simplify the print
statement.

Note that a x16 syndrome vector table is not necessary like with x4 or
x8 syndromes. This is because systems that support x16 symbol sizes are
SMCA systems and in that case, the syndrome can be directly extracted
from the MCA_SYND[Syndrome] field.

 [ bp: massage. ]

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Kim Phillips <kim.phillips@amd.com>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/20190228153558.127292-4-Yazen.Ghannam@amd.com
2019-03-27 00:13:25 +01:00
Yazen Ghannam
869adc4316 EDAC/amd64: Set maximum channel layer size depending on family
The AMD64 EDAC module currently hardcodes the EDAC channel layer size
count to two. Future AMD systems may have more channels than this.

Set the EDAC channel layer size equal to the maximum number of channels
possible for the system. On Family 17h and later, this is set in the
num_umcs variable. Older systems will continue to use two as the
default.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/20190325203319.7603-1-Yazen.Ghannam@amd.com
2019-03-27 00:13:25 +01:00
Yazen Ghannam
bdcee7747f EDAC/amd64: Support more than two Unified Memory Controllers
The first few models of Family 17h all had 2 Unified Memory Controllers
per Die, so this was treated as a fixed value. However, future systems
may have more Unified Memory Controllers per Die.

Related to this, the channel number and base address of a Unified Memory
Controller were found by matching on fixed, known values. However,
current and future systems follow this pattern for the channel number
and base address of a Unified Memory Controller: 0xYXXXXX, where Y is
the channel number. So matching on hardcoded values is not necessary.

Set the number of Unified Memory Controllers at driver init time based
on the family/model. Also, update the functions that find the channel
number and base address of a Unified Memory Controller to support more
than two.

 [ bp: Move num_umcs into the .c file and simplify comment. ]

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Kim Phillips <kim.phillips@amd.com>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/20190228153558.127292-3-Yazen.Ghannam@amd.com
2019-03-27 00:13:25 +01:00
Yazen Ghannam
4d30d2bc3c EDAC/amd64: Use a macro for iterating over Unified Memory Controllers
Define and use a macro for looping over the number of Unified Memory
Controllers.

No functional change.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Kim Phillips <kim.phillips@amd.com>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/20190228153558.127292-2-Yazen.Ghannam@amd.com
2019-03-27 00:13:25 +01:00
Yazen Ghannam
6e846239e5 EDAC/amd64: Add Family 17h Model 30h PCI IDs
Add the new Family 17h Model 30h PCI IDs to the AMD64 EDAC module.

This also fixes a probe failure that appeared when some other PCI IDs
for Family 17h Model 30h were added to the AMD NB code.

Fixes: be3518a16e (x86/amd_nb: Add PCI device IDs for family 17h, model 30h)
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Kim Phillips <kim.phillips@amd.com>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/20190228153558.127292-1-Yazen.Ghannam@amd.com
2019-03-27 00:13:25 +01:00
Thor Thayer
1bd76ff448 EDAC, altera: Fix S10 Double Bit Error Notification
Stratix10 Double Bit Error Address was always read from SDRAM Address
register instead of each device's Address register.

To determine which device had the DBE, cycle through the EDAC devices
comparing the DBE value to the db_irq value. Once found, report the DBE
Address from the device registers as well as the device name.

Finally, notify the system via an SMC call and indicate the panic should
result in a system reboot. Change a run-time check to a Stratix10
compile-time check for a clean SMC notification.

Fixes: d5fc912556 ("EDAC, altera: Combine Stratix10 and Arria10 probe functions")
Signed-off-by: Thor Thayer <thor.thayer@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/1552490842-25440-1-git-send-email-thor.thayer@linux.intel.com
2019-03-23 10:03:30 +01:00
Qiuxu Zhuo
fe783516e3 EDAC, skx, i10nm: Make skx_common.c a pure library
The following Kconfig constellations fail randconfig builds:

  CONFIG_ACPI_NFIT=y
  CONFIG_EDAC_DEBUG=y
  CONFIG_EDAC_SKX=m
  CONFIG_EDAC_I10NM=y

or

  CONFIG_ACPI_NFIT=y
  CONFIG_EDAC_DEBUG=y
  CONFIG_EDAC_SKX=y
  CONFIG_EDAC_I10NM=m

with:
  ...
  CC [M]  drivers/edac/skx_common.o
  ...
  .../skx_common.o:.../skx_common.c:672: undefined reference to `__this_module'

That is because if one of the two drivers - skx_edac or i10nm_edac - is
built-in and the other one is a module, the shared file skx_common.c
gets linked into a module object by kbuild. Therefore, when linking that
same file into vmlinux, the '__this_module' symbol used in debugfs isn't
defined, leading to the above error.

Fix it by moving all debugfs code from skx_common.c to both skx_base.c
and i10nm_base.c respectively. Thus, skx_common.c doesn't refer to the
'__this_module' symbol anymore.

Clarify skx_common.c's purpose at the top of the file for future
reference, while at it.

 [ bp: Make text more readable. ]

Fixes: d4dc89d069 ("EDAC, i10nm: Add a driver for Intel 10nm server processors")
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/20190321221339.GA32323@agluck-desk
2019-03-23 09:43:50 +01:00
Linus Torvalds
e13284da94 Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RAS updates from Borislav Petkov:
 "This time around we have in store:

   - Disable MC4_MISC thresholding banks on all AMD family 0x15 models
     (Shirish S)

   - AMD MCE error descriptions update and error decode improvements
     (Yazen Ghannam)

   - The usual smaller conversions and fixes"

* 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce: Improve error message when kernel cannot recover, p2
  EDAC/mce_amd: Decode MCA_STATUS in bit definition order
  EDAC/mce_amd: Decode MCA_STATUS[Scrub] bit
  EDAC, mce_amd: Print ExtErrorCode and description on a single line
  EDAC, mce_amd: Match error descriptions to latest documentation
  x86/MCE/AMD, EDAC/mce_amd: Add new error descriptions for some SMCA bank types
  x86/MCE/AMD, EDAC/mce_amd: Add new McaTypes for CS, PSP, and SMU units
  x86/MCE/AMD, EDAC/mce_amd: Add new MP5, NBIO, and PCIE SMCA bank types
  RAS: Add a MAINTAINERS entry
  RAS: Use consistent types for UUIDs
  x86/MCE/AMD: Carve out the MC4_MISC thresholding quirk
  x86/MCE/AMD: Turn off MC4_MISC thresholding on all family 0x15 models
  x86/MCE: Switch to use the new generic UUID API
2019-03-08 09:11:39 -08:00
Linus Torvalds
1b37b8c48d * A new EDAC AST 2500 SoC driver (Stefan M Schaeckeler)
* New i10nm EDAC driver for Intel 10nm CPUs (Qiuxu Zhuo and Tony Luck)
 
 * Altera SDRAM functionality carveout for separate enablement of RAS and
 SDRAM capabilities on some Altera chips. (Thor Thayer)
 
 * The usual round of cleanups and fixes
 
 Last but not least:
 
 * Recruit James Morse as a reviewer for the ARM side
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAlx+SqYACgkQEsHwGGHe
 VUpwxg//fDdlIcnNjPUKWcBQxfy7meFd5xlDwbbIbkdE1mfHLBP6n2gRVM9NguSm
 shYPXcdqIrFTn4D7nOxVLS2Gqa7cF/j9M+YaqTNfe9/OVI0oSeM84D2+kEUi2tHQ
 LkCbBL9W+SAk4wjcFUPrEuwPaABfPdt0g9wuEf3Yg+PQsZ4FojwF7p91plBiKo/X
 GewLIM4+QT/mIkyn5u+2UJWayUvtdc1nchBGg3klYaDTRsUqH9pn284bInj7/Woj
 r34288yXuksIhDnUd2h4F9RCdZegBLIZf/k7Rqdg+Acot64c3PprE+/SI9nFcYfn
 fcF/48Sv6vMfP5kDKeJhsDjWu85VdpP+Cp4bxebXx4NURWn30kyYGDdpvbpgWxzc
 XDOXiEDxfh43/dNEyqCRr86dcZS8ro1pQNlnQvxOJyMljdEGjbB4JizG2ZvVluBP
 hSu3ifgpTiBGJMRQQijha41SMuWE7Z1ZgZt/XnyPAKwEEFtQVrm7IfnDohag3VYw
 6kWMVeyenmx/yF1JmA0fTxAdeeZPMnbUx0JxHRo1wJXF+1b19b0P+1nYUjgKlXQN
 Wq78DGPkQ9InfISFegS/A2AMWk+ZgLZ5d4pVwRVWdyeOMQVUoXO4R3KQur1tV7gu
 vm5BpWRZUszhcVvuhly8fOTyOsudYsNe7EeMd2V0Q2FZBy81MH8=
 =A1kA
 -----END PGP SIGNATURE-----

Merge tag 'edac_for_5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp

Pull EDAC updates from Borislav Petkov:

 - A new EDAC AST 2500 SoC driver (Stefan M Schaeckeler)

 - New i10nm EDAC driver for Intel 10nm CPUs (Qiuxu Zhuo and Tony Luck)

 - Altera SDRAM functionality carveout for separate enablement of RAS
   and SDRAM capabilities on some Altera chips. (Thor Thayer)

 - The usual round of cleanups and fixes

And last but not least: recruit James Morse as a reviewer for the ARM
side.

* tag 'edac_for_5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
  EDAC/altera: Add separate SDRAM EDAC config
  EDAC, altera: Add missing of_node_put()
  EDAC, skx_common: Add code to recognise new compound error code
  EDAC, i10nm: Fix randconfig builds
  EDAC, i10nm: Add a driver for Intel 10nm server processors
  EDAC, skx_edac: Delete duplicated code
  EDAC, skx_common: Separate common code out from skx_edac
  EDAC: Do not check return value of debugfs_create() functions
  EDAC: Add James Morse as a reviewer
  dt-bindings, EDAC: Add Aspeed AST2500
  EDAC, aspeed: Add an Aspeed AST2500 EDAC driver
2019-03-08 09:07:07 -08:00
Thor Thayer
580b5cf50c EDAC/altera: Add separate SDRAM EDAC config
The CONFIG_ALTERA_EDAC Kconfig symbol always enables the SDRAM EDAC
functionality. On the newer architectures, however, there are cases
where the peripheral EDAC functionality is enabled but SDRAM needs to be
disabled.

Move SDRAM functions so they can be contained inside the conditional
CONFIG. Create new CONFIG option just for SDRAM.

 [ bp: Massage commit message. ]

Signed-off-by: Thor Thayer <thor.thayer@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: dinguyen@kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: linux@armlinux.org.uk
Link: https://lkml.kernel.org/r/1551121006-4657-2-git-send-email-thor.thayer@linux.intel.com
2019-02-26 16:18:57 +01:00
Yazen Ghannam
a0bcd3c0b8 EDAC/mce_amd: Decode MCA_STATUS in bit definition order
Sort the MCA_STATUS bits in decode output to follow how they are defined
in the register.

The order is as follows:

  Bit | Decode
  ------------
  62  | Over
  61  | UC
  59  | MiscV
  58  | AddrV
  57  | PCC
  55  | TCC
  53  | SyndV
  46  | CECC
  45  | UECC
  44  | Deferred
  43  | Poison
  40  | Scrub

 [ bp: Massage a bit. ]

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86@kernel.org
Link: https://lkml.kernel.org/r/20190212212417.107049-2-Yazen.Ghannam@amd.com
2019-02-15 14:36:31 +01:00
Yazen Ghannam
3f4da372ec EDAC/mce_amd: Decode MCA_STATUS[Scrub] bit
Previous AMD systems have had a bit in MCA_STATUS to indicate that an
error was detected on a scrub operation. However, this bit was defined
differently within different banks and families/models.

Starting with Family 17h, MCA_STATUS[40] is either Reserved/Read-as-Zero
or defined as "Scrub", for all MCA banks and CPU models. Therefore, this
bit can be defined as the "Scrub" bit.

Define MCA_STATUS[40] as "Scrub" and decode it in the AMD MCE decoding
module for Family 17h and newer systems.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Morse <james.morse@arm.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Pu Wen <puwen@hygon.cn>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190212212417.107049-1-Yazen.Ghannam@amd.com
2019-02-15 14:25:58 +01:00
Huang Zijiang
7f736599d6 EDAC, altera: Add missing of_node_put()
The call to of_parse_phandle() returns a node pointer with refcount
incremented thus it must be explicitly decremented here after the last
usage.

Signed-off-by: Huang Zijiang <huang.zijiang@zte.com.cn>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Thor Thayer <thor.thayer@linux.intel.com>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: wang.yi59@zte.com.cn
Link: https://lkml.kernel.org/r/1550126347-27984-1-git-send-email-huang.zijiang@zte.com.cn
2019-02-15 12:02:47 +01:00
Tony Luck
cbfa482f7e EDAC, skx_common: Add code to recognise new compound error code
A new error code for systems that use DRAM as an extra level of cache
looks like:

    000F 0010 1MMM CCCC

where the MMM and CCCC bits are used for the same purpose as the
original code. For this new class of errors the ADXL translation will
provide details of both the DIMM used as cache for the error location
and the component that is being cached.

Note: This new error code is first supported in Skylake. Older EDAC
drivers do not need to be updated.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Aristeu Rozanski <aris@redhat.com>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/20190205182109.27828-1-tony.luck@intel.com
2019-02-06 11:03:06 +01:00
Tony Luck
d6a9f7336d EDAC, i10nm: Fix randconfig builds
I10NM_EDAC depends on CONFIG_ACPI so make that dependency explicit.

Reported-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Aristeu Rozanski <aris@redhat.com>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/20190205180200.26865-1-tony.luck@intel.com
2019-02-06 10:40:58 +01:00
Yazen Ghannam
1c1522d32a EDAC, mce_amd: Print ExtErrorCode and description on a single line
Save a log line by printing the extended error code and the description
on a single line. This is similar to how errors are printed in other
subsystems, e.g. "#, description". If we don't have a valid description
then only the number/code is printed.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: x86@kernel.org
Link: https://lkml.kernel.org/r/20190201225534.8177-6-Yazen.Ghannam@amd.com
2019-02-04 19:29:13 +01:00
Yazen Ghannam
e03447ee71 EDAC, mce_amd: Match error descriptions to latest documentation
Update the error descriptions to match the latest documentation for
easier searching. In some cases the changes are small and in other cases
the changes may be total rewording of the description.

No functional changes.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: x86@kernel.org
Link: https://lkml.kernel.org/r/20190201225534.8177-5-Yazen.Ghannam@amd.com
2019-02-03 13:16:50 +01:00
Yazen Ghannam
8a5dd2cd2f x86/MCE/AMD, EDAC/mce_amd: Add new error descriptions for some SMCA bank types
Some SMCA bank types on future systems will report new error types even
though the bank type is not treated as a new version. These new error
types will reported by bits that are reserved in past systems.

Add the new error descriptions to the lists in edac_mce_amd.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Shirish S <Shirish.S@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190201225534.8177-4-Yazen.Ghannam@amd.com
2019-02-03 13:05:16 +01:00
Yazen Ghannam
3ad7e748c1 x86/MCE/AMD, EDAC/mce_amd: Add new McaTypes for CS, PSP, and SMU units
The existing CS, PSP, and SMU SMCA bank types will see new versions (as
indicated by their McaTypes) in future SMCA systems.

Add the new (HWID, MCATYPE) tuples for these new versions. Reuse the
same names as the older versions, since they are logically the same to
the user. SMCA systems won't mix and match IP blocks with different
McaType versions in the same system, so there isn't a need to
distinguish them. The MCA_IPID register is saved when logging an MCA
error, and that can be used to triage the error.

Also, add the new error descriptions to edac_mce_amd. Some error types
(positions in the list) are overloaded compared to the previous
McaTypes. Therefore, just create new lists of the error descriptions to
keep things simple even if some of the error descriptions are the same
between versions.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Pu Wen <puwen@hygon.cn>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: Shirish S <Shirish.S@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190201225534.8177-3-Yazen.Ghannam@amd.com
2019-02-03 13:01:57 +01:00
Yazen Ghannam
cbfa447edd x86/MCE/AMD, EDAC/mce_amd: Add new MP5, NBIO, and PCIE SMCA bank types
Add the (HWID, MCATYPE) tuples and names for the new MP5, NBIO, and
PCIE SMCA bank types.

Also, add their respective error descriptions to the MCE decoding module
edac_mce_amd.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Pu Wen <puwen@hygon.cn>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: Shirish S <Shirish.S@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190201225534.8177-2-Yazen.Ghannam@amd.com
2019-02-03 13:01:44 +01:00
Qiuxu Zhuo
d4dc89d069 EDAC, i10nm: Add a driver for Intel 10nm server processors
This driver supports the Intel 10nm series server integrated memory
controller. It gets the memory capacity and topology information by
reading the registers in PCI configuration space and memory-mapped I/O.

It decodes the memory error address to the platform specific address
by using the ACPI Address Translation (ADXL) Device Specific Method
(DSM).

Co-developed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/20190130191519.15393-5-tony.luck@intel.com
2019-02-02 13:33:18 +01:00
Qiuxu Zhuo
98f2fc829e EDAC, skx_edac: Delete duplicated code
Delete the duplicated code from skx_edac.c and rename skx_edac.c to
skx_base.c. Update the Makefile to build the skx_edac driver from
skx_base.c and skx_common.c.

Add SPDX to skx_base.c and clean out unnecessary #include lines.

 [ bp: Drop the license boilerplate - there's an SPDX identifier now. ]

Co-developed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/20190130191519.15393-4-tony.luck@intel.com
2019-02-02 13:33:11 +01:00
Qiuxu Zhuo
88a242c987 EDAC, skx_common: Separate common code out from skx_edac
Parts of skx_edac can be shared with the Intel 10nm server EDAC driver.

Carve out the common parts from skx_edac in preparation to support both
skx_edac driver and i10nm_edac drivers.

Co-developed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/20190130191519.15393-3-tony.luck@intel.com
2019-02-02 10:50:59 +01:00
Thor Thayer
245b6c6558 EDAC, altera: Fix S10 persistent register offset
Correct the persistent register offset where address and status are
stored.

Fixes: 08f08bfb7b ("EDAC, altera: Merge Stratix10 into the Arria10 SDRAM probe routine")
Signed-off-by: Thor Thayer <thor.thayer@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: devicetree@vger.kernel.org
Cc: dinguyen@kernel.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: mark.rutland@arm.com
Cc: robh+dt@kernel.org
Cc: stable <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/1548179287-21760-2-git-send-email-thor.thayer@linux.intel.com
2019-01-24 17:13:59 +01:00
Greg Kroah-Hartman
912ebd99ed EDAC: Do not check return value of debugfs_create() functions
When calling debugfs functions, there is no need to ever check the
return value. The function can work or not, but the code logic should
never do something different based on this.

 [ bp: Make edac_debugfs_init() return void too, while at it. ]

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/20190122152151.16139-17-gregkh@linuxfoundation.org
2019-01-23 10:41:18 +01:00
Stefan M Schaeckeler
9b7e6242ee EDAC, aspeed: Add an Aspeed AST2500 EDAC driver
Add support for the Aspeed AST2500 SoC.

Signed-off-by: Stefan M Schaeckeler <sschaeck@cisco.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Jeffery <andrew@aj.id.au>
Cc: Joel Stanley <joel@jms.id.au>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: devicetree@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-aspeed@lists.ozlabs.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/1547743097-5236-2-git-send-email-schaecsn@gmx.net
2019-01-18 15:23:11 +01:00
Patrick Havelange
75dfa87035 EDAC, fsl_ddr: Add LS1021A to the list of supported hardware
The Freescale ddr driver also works on the LS1021A board.

Signed-off-by: Patrick Havelange <patrick.havelange@essensium.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: York Sun <york.sun@nxp.com>
Cc: arnout.vandecappelle@essensium.com
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: matthew.weber@rockwellcollins.com
Cc: patrick.havelange@essensium.com
Link: https://lkml.kernel.org/r/20181219104323.10324-1-patrick.havelange@essensium.com
2018-12-19 11:57:45 +01:00
YueHaibing
bd44735418 EDAC, i5000: Remove set but not used local variables
Remove unused local variables as reported by gcc's -Wunused-but-set-variable option.

 [ bp: simplify commit message. ]

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/20181211095207.25936-1-yuehaibing@huawei.com
2018-12-11 14:53:49 +01:00
Alexandre Belloni
37d964f914 EDAC, i82975x: Fix spelling mistake "reserverd" -> "reserved"
Fix a spelling mistake in a register layout description.

Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "Arvind R." <arvino55@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/20181120153304.1218-1-alexandre.belloni@bootlin.com
2018-11-20 17:46:01 +01:00
York Sun
a59817fa8f EDAC, fsl: Move error injection under CONFIG_EDAC_DEBUG
Gate error injection feature with CONFIG_EDAC_DEBUG so that it is not
visible in production setups.

Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: York Sun <york.sun@nxp.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/20181119225303.13265-1-york.sun@nxp.com
2018-11-20 17:33:32 +01:00
Qiuxu Zhuo
fa1c071c1e EDAC, skx: Let EDAC core show the decoded result for debugfs
Current debugfs shows the decoded result in its own print format which
is inconvenient for analysis/statistics.

Use skx_mce_check_error() instead of skx_decode() for debugfs,
then the decoded result is showed via EDAC core in a more readable
format like "CPU_SrcID#[0-9]_MC#[0-9]_Chan#[0-9]_DIMM#[0-9]".

Print a warning the first time this interface is used so the
administrator can see the console log that error(s) have been faked.

Co-developed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
CC: Mauro Carvalho Chehab <mchehab@kernel.org>
CC: arozansk@redhat.com
CC: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1542353705-13531-1-git-send-email-qiuxu.zhuo@intel.com
2018-11-16 11:19:53 +01:00
Qiuxu Zhuo
85b9c8bfee EDAC, skx: Move debugfs node under EDAC's hierarchy
The debugfs node is /sys/kernel/debug/skx_edac_test. Rename it and move
under EDAC debugfs root directory. Remove the unused 'skx_fake_addr' and
remove the 'skx_test' on error.

Co-developed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
CC: Mauro Carvalho Chehab <mchehab@kernel.org>
CC: arozansk@redhat.com
CC: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1542353684-13496-1-git-send-email-qiuxu.zhuo@intel.com
2018-11-16 11:17:24 +01:00
Qiuxu Zhuo
e235dd43d8 EDAC, skx: Prepend hex formatting with '0x'
Some debug/error strings in hex formatting do not have the '0x' prefix.

Prepend hex formatting with '0x' for them, but with one exception:
"Couldn't enable %04x:%04x", instead of putting '0x' in this line,
add the word 'device'. We commonly use 8086:1234 without the leading
'0x' (e.g. as '-d' argument to lspci(8) and setpci(8) commands).

Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
CC: Mauro Carvalho Chehab <mchehab@kernel.org>
CC: Tony Luck <tony.luck@intel.com>
CC: arozansk@redhat.com
CC: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1542353660-13458-1-git-send-email-qiuxu.zhuo@intel.com
2018-11-16 11:17:19 +01:00
Qiuxu Zhuo
a6a386152a EDAC, skx: Fix function calling order in skx_exit()
The order of function calling in skx_exit() is not the reversed order in
skx_init(). Fix it.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
CC: Mauro Carvalho Chehab <mchehab@kernel.org>
CC: Tony Luck <tony.luck@intel.com>
CC: arozansk@redhat.com
CC: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1542353616-13421-1-git-send-email-qiuxu.zhuo@intel.com
2018-11-16 11:17:11 +01:00
Borislav Petkov
861e6ed667 EDAC: Drop per-memory controller buses
... and use the single edac_subsys object returned from
subsys_system_register(). The idea is to have a single bus
and multiple devices on it.

Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
CC: Aristeu Rozanski Filho <arozansk@redhat.com>
CC: Greg KH <gregkh@linuxfoundation.org>
CC: Justin Ernst <justin.ernst@hpe.com>
CC: linux-edac <linux-edac@vger.kernel.org>
CC: Mauro Carvalho Chehab <mchehab@kernel.org>
CC: Russ Anderson <rja@hpe.com>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20180926152752.GG5584@zn.tnic
2018-11-13 21:55:24 +01:00
Tony Luck
88a10b1517 EDAC: Don't add devices under /sys/bus/edac
Nobody(*) uses them. Dropping this will allow us to make the total
number of memory controllers configurable (as we won't have to worry
about duplicated device names under this directory).

(*) https://lkml.kernel.org/r/20180927221054.580220e5@coco.lan

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
CC: Aristeu Rozanski Filho <arozansk@redhat.com>
CC: Greg KH <gregkh@linuxfoundation.org>
CC: Justin Ernst <justin.ernst@hpe.com>
CC: Mauro Carvalho Chehab <mchehab@kernel.org>
CC: Russ Anderson <rja@hpe.com>
CC: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20181001224313.GA9487@agluck-desk
2018-11-13 20:26:41 +01:00
Colin Ian King
1722bc0e8c EDAC: Fix indentation issues in several EDAC drivers
Replace spaces with tabs and insert missing indentation.

 [ bp: Rewrite commit message. ]

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
CC: "Arvind R." <arvino55@gmail.com>
CC: Mark Gross <mark.gross@intel.com>
CC: Mauro Carvalho Chehab <mchehab@kernel.org>
CC: Ranganathan Desikan <ravi@jetztechnologies.com>
CC: kernel-janitors@vger.kernel.org
CC: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20181109133757.21471-1-colin.king@canonical.com
2018-11-10 16:56:16 +01:00
Luck, Tony
24c9d423e8 EDAC, skx: Fix randconfig builds in a better way
It was previously noted that Kconfig complained about unmet dependencies
when trying to configure skx_edac together with CONFIG_ACPI=n. First fix
for this checked for ACPI when doing

  select ACPI_ADXL

but this required stub functions for the case where ACPI wasn't
selected. It also allowed building a driver that didn't actually work
for a system that has non-volatile DIMMs.

Arnd Bergmann pointed out that the right fix is to make EDAC_SKX
"depend on ACPI".

Fixes: a324e9396c ("EDAC, skx: Fix randconfig builds")
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
CC: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
CC: Arnd Bergmann <arnd@arndb.de>
CC: Mauro Carvalho Chehab <mchehab@kernel.org>
CC: linux-edac <linux-edac@vger.kernel.org>
CC: qiuxu.zhuo@intel.com
Link: http://lkml.kernel.org/r/20181106183914.GA26731@agluck-desk
2018-11-07 22:58:29 +01:00
YueHaibing
96c1c58eb0 EDAC, i82975x: Remove set but not used variable dtype
Fix this gcc -Wunused-but-set-variable warning:

  drivers/edac/i82975x_edac.c:378:16: warning: variable 'dtype'
  	set but not used [-Wunused-but-set-variable]

It was introduced in

  084a4fccef ("edac: move dimm properties to struct dimm_info")

but never used.

Also, remove the function i82975x_dram_type() and move the comment and
the assignment to the place where it is used.

 [ bp: massage commit message and shorten comment. ]

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
CC: "Arvind R." <arvino55@gmail.com>
CC: Mauro Carvalho Chehab <mchehab@kernel.org>
CC: ravi@jetztechnologies.com
CC: arvino55@gmail.com
CC: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20181107022237.14048-1-yuehaibing@huawei.com
2018-11-07 20:19:19 +01:00
Dan Carpenter
8fd8cbfead EDAC, qcom_edac: Remove irq_handled local variable
irq_handled isn't initialized to false on function entry. However, it is
not really needed and the IRQ handler return value can be set directly
instead.

 [ bp: rewrite commit message. ]

Fixes: 27450653f1 ("drivers: edac: Add EDAC driver support for QCOM SoCs")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
CC: Channagoud Kadabi <ckadabi@codeaurora.org>
CC: Mauro Carvalho Chehab <mchehab@kernel.org>
CC: Venkata Narendra Kumar Gutta <vnkgutta@codeaurora.org>
CC: kernel-janitors@vger.kernel.org
CC: linux-arm-msm@vger.kernel.org
CC: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20181018141522.rywdvjmlpk4ticiw@kili.mountain
2018-11-06 12:03:16 +01:00
Manish Narani
1a81361f75 EDAC, synopsys: Add Error Injection support for ZynqMP DDR controller
Add support for Error Injection for ZynqMP DDR controller IP. For
injecting errors, the Row, Column, Bank, Bank Group and Rank bits
positions are determined via Address Map registers of the Synopsys DDR
controller.

Signed-off-by: Manish Narani <manish.narani@xilinx.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
CC: Mauro Carvalho Chehab <mchehab@kernel.org>
CC: Michal Simek <michal.simek@xilinx.com>
CC: amit.kucheria@linaro.org
CC: devicetree@vger.kernel.org
CC: leoyang.li@nxp.com
CC: linux-arm-kernel@lists.infradead.org
CC: linux-edac <linux-edac@vger.kernel.org>
CC: mark.rutland@arm.com
CC: robh+dt@kernel.org
CC: sudeep.holla@arm.com
Link: http://lkml.kernel.org/r/1540447621-22870-7-git-send-email-manish.narani@xilinx.com
2018-11-06 10:38:27 +01:00
Manish Narani
b500b4a029 EDAC, synopsys: Add ECC support for ZynqMP DDR controller
Add ECC support for ZynqMP DDR controller IP. The IP supports interrupts
for corrected and uncorrected errors. Add interrupt handlers for the
same.

Signed-off-by: Manish Narani <manish.narani@xilinx.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
CC: Mauro Carvalho Chehab <mchehab@kernel.org>
CC: Michal Simek <michal.simek@xilinx.com>
CC: amit.kucheria@linaro.org
CC: devicetree@vger.kernel.org
CC: leoyang.li@nxp.com
CC: linux-arm-kernel@lists.infradead.org
CC: linux-edac <linux-edac@vger.kernel.org>
CC: mark.rutland@arm.com
CC: robh+dt@kernel.org
CC: sudeep.holla@arm.com
Link: http://lkml.kernel.org/r/1540447621-22870-5-git-send-email-manish.narani@xilinx.com
2018-11-06 10:30:29 +01:00
Manish Narani
e926ae573b EDAC, synopsys: Add macro defines for ZynqMP DDRC
Add macro defines for ZynqMP DDR controller. These macros will be used
for ZynqMP ECC operations.

Signed-off-by: Manish Narani <manish.narani@xilinx.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
CC: Mauro Carvalho Chehab <mchehab@kernel.org>
CC: Michal Simek <michal.simek@xilinx.com>
CC: amit.kucheria@linaro.org
CC: devicetree@vger.kernel.org
CC: leoyang.li@nxp.com
CC: linux-arm-kernel@lists.infradead.org
CC: linux-edac <linux-edac@vger.kernel.org>
CC: mark.rutland@arm.com
CC: robh+dt@kernel.org
CC: sudeep.holla@arm.com
Link: http://lkml.kernel.org/r/1540447621-22870-4-git-send-email-manish.narani@xilinx.com
2018-11-05 13:40:32 +01:00
Manish Narani
84de0b493f EDAC, synopsys: Add error handling for the of_device_get_match_data() result
The function of_device_get_match_data() can return NULL in case of
error. Add error handling for the same in the mc_probe() function.

Signed-off-by: Manish Narani <manish.narani@xilinx.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
CC: Mauro Carvalho Chehab <mchehab@kernel.org>
CC: Michal Simek <michal.simek@xilinx.com>
CC: amit.kucheria@linaro.org
CC: devicetree@vger.kernel.org
CC: leoyang.li@nxp.com
CC: linux-arm-kernel@lists.infradead.org
CC: linux-edac <linux-edac@vger.kernel.org>
CC: mark.rutland@arm.com
CC: robh+dt@kernel.org
CC: sudeep.holla@arm.com
Link: http://lkml.kernel.org/r/1540447621-22870-2-git-send-email-manish.narani@xilinx.com
2018-11-05 13:36:20 +01:00
Manish Narani
3d02a8975e EDAC, synopsys: Add platform specific structures for the DDR Controller
Add platform specific structures so that different IP support can be
added later using quirks.

 [ bp: fix function names. ]

Signed-off-by: Manish Narani <manish.narani@xilinx.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
CC: Mauro Carvalho Chehab <mchehab@kernel.org>
CC: Michal Simek <michal.simek@xilinx.com>
CC: amit.kucheria@linaro.org
CC: devicetree@vger.kernel.org
CC: leoyang.li@nxp.com
CC: linux-arm-kernel@lists.infradead.org
CC: linux-edac <linux-edac@vger.kernel.org>
CC: mark.rutland@arm.com
CC: robh+dt@kernel.org
CC: sudeep.holla@arm.com
Link: http://lkml.kernel.org/r/1538667328-9465-6-git-send-email-manish.narani@xilinx.com
2018-11-05 13:30:49 +01:00
Manish Narani
fa9f6b9e1c EDAC, synopsys: Return void for functions always returning 0
The current driver has functions which are always returning 0 - make
them return void instead.

Signed-off-by: Manish Narani <manish.narani@xilinx.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
CC: Mauro Carvalho Chehab <mchehab@kernel.org>
CC: Michal Simek <michal.simek@xilinx.com>
CC: amit.kucheria@linaro.org
CC: devicetree@vger.kernel.org
CC: leoyang.li@nxp.com
CC: linux-arm-kernel@lists.infradead.org
CC: linux-edac <linux-edac@vger.kernel.org>
CC: mark.rutland@arm.com
CC: robh+dt@kernel.org
CC: sudeep.holla@arm.com
Link: http://lkml.kernel.org/r/1538667328-9465-5-git-send-email-manish.narani@xilinx.com
2018-11-05 13:30:00 +01:00
Manish Narani
225af74d63 EDAC, synopsys: Correct comments
Spellcheck and improve/correct comments.

Signed-off-by: Manish Narani <manish.narani@xilinx.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
CC: Mauro Carvalho Chehab <mchehab@kernel.org>
CC: Michal Simek <michal.simek@xilinx.com>
CC: amit.kucheria@linaro.org
CC: devicetree@vger.kernel.org
CC: leoyang.li@nxp.com
CC: linux-arm-kernel@lists.infradead.org
CC: linux-edac <linux-edac@vger.kernel.org>
CC: mark.rutland@arm.com
CC: michal.simek@xilinx.com
CC: robh+dt@kernel.org
CC: sudeep.holla@arm.com
Link: http://lkml.kernel.org/r/1538667328-9465-4-git-send-email-manish.narani@xilinx.com
2018-11-05 13:28:29 +01:00
Manish Narani
bb894bc46e EDAC, synopsys: Shorten static function names
Shorten static function names, remove the unnecessary 'synps_' prefix in
function names.

 [ bp: Drop the "edac_" prefix too as that prefix is reserved for
   EDAC core functions. ]

Signed-off-by: Manish Narani <manish.narani@xilinx.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
CC: Mauro Carvalho Chehab <mchehab@kernel.org>
CC: Michal Simek <michal.simek@xilinx.com>
CC: amit.kucheria@linaro.org
CC: devicetree@vger.kernel.org
CC: leoyang.li@nxp.com
CC: linux-arm-kernel@lists.infradead.org
CC: linux-edac <linux-edac@vger.kernel.org>
CC: mark.rutland@arm.com
CC: robh+dt@kernel.org
CC: sudeep.holla@arm.com
Link: http://lkml.kernel.org/r/1538667328-9465-3-git-send-email-manish.narani@xilinx.com
2018-11-05 13:27:56 +01:00
Manish Narani
1b51adc6b7 EDAC, synopsys: Improve code readability
Clean up the driver code. Update the debug messages for EDAC errors
reported. Increase the indentation of the macros for better readability.

Signed-off-by: Manish Narani <manish.narani@xilinx.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
CC: Mauro Carvalho Chehab <mchehab@kernel.org>
CC: Michal Simek <michal.simek@xilinx.com>
CC: amit.kucheria@linaro.org
CC: devicetree@vger.kernel.org
CC: leoyang.li@nxp.com
CC: linux-arm-kernel@lists.infradead.org
CC: linux-edac <linux-edac@vger.kernel.org>
CC: manish.narani@xilinx.com
CC: mark.rutland@arm.com
CC: mchehab@kernel.org
CC: michal.simek@xilinx.com
CC: robh+dt@kernel.org
CC: sudeep.holla@arm.com
Link: http://lkml.kernel.org/r/1538667328-9465-2-git-send-email-manish.narani@xilinx.com
2018-11-05 13:22:55 +01:00
Linus Torvalds
0b21f21ae0 * skx_edac: Address translation for NVDIMMs (Tony Luck and Qiuxu Zhuo)
* ACPI_ADXL build fix
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAlvcKnEACgkQEsHwGGHe
 VUr45g/9E67lU84Dz41ly6zFDTmQdYBNPRayz9QgIGHfIMwIN8aAVoezC8B4NCqc
 8rQ3W48REkewLmPO2GoEVld4UnHbvVlZYZ4bGcxhYzWL5dcleoJIFVupF4Ifo6/Y
 SPbVUyihtw4FFr+Ft8x/bOJQ6QQ4CIiVX3mJBdwdqQ6Lm9yoEz6AlSbTJiyyzr8I
 gGfcKD5TcmJWpsDzRXJ/xWddfA+hfUpKxkuJqPIRZvmKnJpy79af8MlQAZwXuXVS
 361wj5SzP0LzktT5JQn73V04NzSSDTbFSycnWXUex3lxIsE6KolsEwfglccdhvIy
 Nz/Du4kJn1Ye/zbsO27qtkGCXSz0qYKsdfUg1RY+MnZPe4mFmmAm0izTKH3EltQx
 +OQWtcBz5vvNf3Odwnfw1nNvYrbnzaDNIsjHspopVrsmD2oRefmpsYsOEzGyAGDw
 PtUC3l70u7i4e2NtF7Doo23g1yIyXWQZtrEDDFmQZGUo7YKxE7AkGPzINxpNSQME
 z11ny9GyISaUc/Zf5zUHYmIFcYtiCngecl4F8hCXvfyp4MbxYgTI+5YY185NfLSU
 pQHwyMGKLifI6ndhWd3sO9KSCqFzSZKaVF/DdKScSqB+v1NqflMyjzfulR625/5U
 cSWWQP8Zu6H7os/X3+o5KCC/Pzbs5Nx/QPLwCabrOwcI3kdmQ2E=
 =JQUK
 -----END PGP SIGNATURE-----

Merge tag 'edac_for_4.20_2' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp

Pull more EDAC updates from Borislav Petkov:
 "The second part of the EDAC pile which contains the ADXL user and a
  build fix which addresses a not-so-sensical .config but fixes
  randconfig builds people do:

   - skx_edac: Address translation for NVDIMMs (Tony Luck and Qiuxu Zhuo)

   - ACPI_ADXL build fix"

[ I don't think "sensical" is a word, particularly when used in the
  context of actually meaning "nonsensical", but I like it   - Linus ]

* tag 'edac_for_4.20_2' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
  EDAC, skx: Fix randconfig builds
  EDAC, skx_edac: Add address translation for non-volatile DIMMs
2018-11-02 11:17:22 -07:00
Borislav Petkov
a324e9396c EDAC, skx: Fix randconfig builds
The driver depends on the ADXL component glue and selects it. However,
ADXL itself implicitly depends on ACPI and in nonsensical randconfig
builds like this:

  # CONFIG_ACPI is not set
  CONFIG_ACPI_ADXL=y

where ACPI is not enabled, the build fails with:

  drivers/edac/skx_edac.o: In function `skx_mce_check_error':
  skx_edac.c:(.text+0xab): undefined reference to `adxl_decode'
  drivers/edac/skx_edac.o: In function `skx_init':
  skx_edac.c:(.init.text+0x8bf): undefined reference to `adxl_get_component_names'
  make: *** [vmlinux] Error 1

Add stubs for that case so that the build succeeds. CONFIG_ACPI=n
doesn't make any sense for real configurations but this fix will at
least silence randconfig builds.

Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
2018-10-31 19:24:21 +01:00
Linus Torvalds
b22b6beae6 ARM: SoC driver updates for 4.17
The most noteworthy SoC driver changes this time include:
 
 - The TEE subsystem gains an in-kernel interface to access the TEE
   from device drivers.
 
 - The reset controller subsystem gains a driver for the Qualcomm
   Snapdragon 845 Power Domain Controller.
 
 - The Xilinx Zynq platform now has a firmware interface for its
   platform management unit. This contains a firmware "ioctl" interface
   that was a little controversial at first, but the version we merged
   solved that by not exposing arbitrary firmware calls to user space.
 
 - The Amlogic Meson platform gains a "canvas" driver that is used
   for video processing and shared between different high-level drivers.
 
 The rest is more of the usual, mostly related to SoC specific power
 management support and core drivers in drivers/soc:
 
 - Several Renesas SoCs (RZ/G1N, RZ/G2M, R-Car V3M, RZ/A2M) gain new
   features related to power and reset control.
 
 - The Mediatek mt8183 and mt6765 SoC platforms gain support for
   their respective power management chips.
 
 - A new driver for NXP i.MX8, which need a firmware interface for
   power management.
 
 - The SCPI firmware interface now contains support estimating power
   usage of performance states
 
 - The NVIDIA Tegra "pmc" driver gains a few new features, in particular
   a pinctrl interface for configuring the pads.
 
 - Lots of small changes for Qualcomm, in particular the "smem"
   device driver.
 
 - Some cleanups for the TI OMAP series related to their sysc
   controller.
 
 Additional cleanups and bugfixes in SoC specific drivers include the
 Meson, Keystone, NXP, AT91, Sunxi, Actions, and Tegra platforms.
 
 Signed-off-by: Arnd Bergmann <arnd@arndb.de>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJb1zEhAAoJEGCrR//JCVInnYQP/1pPXWsR/DV4COf4kGJFSAFn
 EfHXJM1vKtb7AWl6SClpHFlUMt+fvL+dzDNJ9aeRr2GjcuWfzKDcrBM1ZvM70I31
 C1Oc3b6OXEERCozDpRg/Vt8OpIvvWnVpaVffS9E5y6KqF8KZ0UbpWIxUJ87ik44D
 UvNXYOU/LUGPxR1UFm5rm2zWF4i+rBvqnpVaXbeOsXsLElzxXVfv2ymhhqIpo2ws
 o6e00DSjUImg8hLL4HCGFs2EX1KSD+oFzYaOHIE0/DEaiOnxVOpMSRhX2tZ+tRRb
 DekbjL+wz5gOAKJTQfQ2sNNkOuK8WFqmE5G0RJ0iYPXuNsB/17UNb2bhTJeqGdcD
 dqCQBLQuDUD2iHJ/d4RK5Kx3a8h2X63n5bdefgF5UX/2RBpXwFk1QtHr8X0DuY8c
 o/dPGFNBOn3egzMyXrD5VEtnaTwK1Y6/h09qfuOOF1ZuYDmELKRkWMV9l8dIsvd8
 ANjaw5B8MOUAf8DccBmPgUGu0XLCDyuFGqNVd9Kj5u3az+tyggIsgkEjWg1pxTv0
 7dDDyv4Ara1V1HVDZ23l3CgmYCZQx2R/vdpX/DjuDPGEHGjZ5s2TW8P6oegdxtIh
 LcTonNoTsRYzMrGD/aqhG/8fYsAScXePa3CLKl1Hrl+wFVV0XcaggH23GwD/k+7S
 eDBrEzLkOTxM+WXvsvKY
 =c/PQ
 -----END PGP SIGNATURE-----

Merge tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM SoC driver updates from Arnd Bergmann:
 "The most noteworthy SoC driver changes this time include:

   - The TEE subsystem gains an in-kernel interface to access the TEE
     from device drivers.

   - The reset controller subsystem gains a driver for the Qualcomm
     Snapdragon 845 Power Domain Controller.

   - The Xilinx Zynq platform now has a firmware interface for its
     platform management unit. This contains a firmware "ioctl"
     interface that was a little controversial at first, but the version
     we merged solved that by not exposing arbitrary firmware calls to
     user space.

   - The Amlogic Meson platform gains a "canvas" driver that is used for
     video processing and shared between different high-level drivers.

  The rest is more of the usual, mostly related to SoC specific power
  management support and core drivers in drivers/soc:

   - Several Renesas SoCs (RZ/G1N, RZ/G2M, R-Car V3M, RZ/A2M) gain new
     features related to power and reset control.

   - The Mediatek mt8183 and mt6765 SoC platforms gain support for their
     respective power management chips.

   - A new driver for NXP i.MX8, which need a firmware interface for
     power management.

   - The SCPI firmware interface now contains support estimating power
     usage of performance states

   - The NVIDIA Tegra "pmc" driver gains a few new features, in
     particular a pinctrl interface for configuring the pads.

   - Lots of small changes for Qualcomm, in particular the "smem" device
     driver.

   - Some cleanups for the TI OMAP series related to their sysc
     controller.

  Additional cleanups and bugfixes in SoC specific drivers include the
  Meson, Keystone, NXP, AT91, Sunxi, Actions, and Tegra platforms"

* tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (129 commits)
  firmware: tegra: bpmp: Implement suspend/resume support
  drivers: clk: Add ZynqMP clock driver
  dt-bindings: clock: Add bindings for ZynqMP clock driver
  firmware: xilinx: Add zynqmp IOCTL API for device control
  Documentation: xilinx: Add documentation for eemi APIs
  MAINTAINERS: imx: include drivers/firmware/imx path
  firmware: imx: add misc svc support
  firmware: imx: add SCU firmware driver support
  reset: Fix potential use-after-free in __of_reset_control_get()
  dt-bindings: arm: fsl: add scu binding doc
  soc: fsl: qbman: add interrupt coalesce changing APIs
  soc: fsl: bman_portals: defer probe after bman's probe
  soc: fsl: qbman: Use last response to determine valid bit
  soc: fsl: qbman: Add 64 bit DMA addressing requirement to QBMan
  soc: fsl: qbman: replace CPU 0 with any online CPU in hotplug handlers
  soc: fsl: qbman: Check if CPU is offline when initializing portals
  reset: qcom: PDC Global (Power Domain Controller) reset controller
  dt-bindings: reset: Add PDC Global binding for SDM845 SoCs
  reset: Grammar s/more then once/more than once/
  bus: ti-sysc: Just use SET_NOIRQ_SYSTEM_SLEEP_PM_OPS
  ...
2018-10-29 15:16:01 -07:00
Linus Torvalds
b27186abb3 Devicetree updates for 4.20:
- Sync dtc with upstream version v1.4.7-14-gc86da84d30e4
 
 - Work to get rid of direct accesses to struct device_node name and
   type pointers in preparation for removing them. New helpers for
   parsing DT cpu nodes and conversions to use the helpers. printk
   conversions to %pOFn for printing DT node names. Most went thru
   subystem trees, so this is the remainder.
 
 - Fixes to DT child node lookups to actually be restricted to child
   nodes instead of treewide.
 
 - Refactoring of dtb targets out of arch code. This makes the support
   more uniform and enables building all dtbs on c6x, microblaze, and
   powerpc.
 
 - Various DT binding updates for Renesas r8a7744 SoC
 
 - Vendor prefixes for Facebook, OLPC
 
 - Restructuring of some ARM binding docs moving some peripheral bindings
   out of board/SoC binding files
 
 - New "secure-chosen" binding for secure world settings on ARM
 
 - Dual licensing of 2 DT IRQ binding headers
 -----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCgAuFiEEktVUI4SxYhzZyEuo+vtdtY28YcMFAlvTKWYQHHJvYmhAa2Vy
 bmVsLm9yZwAKCRD6+121jbxhw8J5EACMAnrTxWQmXfQXOZEVxztcFavH6LP8mh2e
 7FZIZ38jzHXXvl81tAg1nBhzFUU/qtvqW8NDCZ9OBxKvp6PFDNhWu241ZodSB1Kw
 MZWy2A9QC+qbHYCC+SB5gOT0+Py3v7LNCBa5/TxhbFd35THJM8X0FP7gmcCGX593
 9Ml1rqawT4mK5XmCpczT0cXxyC4TgVtpfDWZH2KgJTR/kwXVQlOQOGZ8a1y/wrt7
 8TLIe7Qy4SFRzjhwbSta1PUehyYfe4uTSsXIJ84kMvNMxinLXQtvd7t9TfsK8p/R
 WjYUneJskVjtxVrMQfdV4MxyFL1YEt2mYcr0PMKIWxMCgGDAZsHPoUZmjyh/PrCI
 uiZtEHn3fXpUZAV/xEHHNirJxYyQfHGiksAT+lPrUXYYLCcZ3ZmqiTEYhGoQAfH5
 CQPMuxA6yXxp6bov6zJwZSTZtkXciju8aQRhUhlxIfHTqezmGYeql/bnWd+InNuR
 upANLZBh6D2jTWzDyobconkCCLlVkSqDoqOx725mMl6hIcdH9d2jVX7hwRf077VI
 5i3CyPSJOkSOLSdB8bAPYfBoaDtH2bthxieUrkkSbIjbwHO1H6a2lxPeG/zah0a3
 ePMGhi7J84UM4VpJEi000cP+bhPumJtJrG7zxP7ldXdfAF436sQ6KRptlcpLpj5i
 IwMhUQNH+g==
 =335v
 -----END PGP SIGNATURE-----

Merge tag 'devicetree-for-4.20' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux

Pull Devicetree updates from Rob Herring:
 "A bit bigger than normal as I've been busy this cycle.

  There's a few things with dependencies and a few things subsystem
  maintainers didn't pick up, so I'm taking them thru my tree.

  The fixes from Johan didn't get into linux-next, but they've been
  waiting for some time now and they are what's left of what subsystem
  maintainers didn't pick up.

  Summary:

   - Sync dtc with upstream version v1.4.7-14-gc86da84d30e4

   - Work to get rid of direct accesses to struct device_node name and
     type pointers in preparation for removing them. New helpers for
     parsing DT cpu nodes and conversions to use the helpers. printk
     conversions to %pOFn for printing DT node names. Most went thru
     subystem trees, so this is the remainder.

   - Fixes to DT child node lookups to actually be restricted to child
     nodes instead of treewide.

   - Refactoring of dtb targets out of arch code. This makes the support
     more uniform and enables building all dtbs on c6x, microblaze, and
     powerpc.

   - Various DT binding updates for Renesas r8a7744 SoC

   - Vendor prefixes for Facebook, OLPC

   - Restructuring of some ARM binding docs moving some peripheral
     bindings out of board/SoC binding files

   - New "secure-chosen" binding for secure world settings on ARM

   - Dual licensing of 2 DT IRQ binding headers"

* tag 'devicetree-for-4.20' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (78 commits)
  ARM: dt: relicense two DT binding IRQ headers
  power: supply: twl4030-charger: fix OF sibling-node lookup
  NFC: nfcmrvl_uart: fix OF child-node lookup
  net: stmmac: dwmac-sun8i: fix OF child-node lookup
  net: bcmgenet: fix OF child-node lookup
  drm/msm: fix OF child-node lookup
  drm/mediatek: fix OF sibling-node lookup
  of: Add missing exports of node name compare functions
  dt-bindings: Add OLPC vendor prefix
  dt-bindings: misc: bk4: Add device tree binding for Liebherr's BK4 SPI bus
  dt-bindings: thermal: samsung: Add SPDX license identifier
  dt-bindings: clock: samsung: Add SPDX license identifiers
  dt-bindings: timer: ostm: Add R7S9210 support
  dt-bindings: phy: rcar-gen2: Add r8a7744 support
  dt-bindings: can: rcar_can: Add r8a7744 support
  dt-bindings: timer: renesas, cmt: Document r8a7744 CMT support
  dt-bindings: watchdog: renesas-wdt: Document r8a7744 support
  dt-bindings: thermal: rcar: Add device tree support for r8a7744
  Documentation: dt: Add binding for /secure-chosen/stdout-path
  dt-bindings: arm: zte: Move sysctrl bindings to their own doc
  ...
2018-10-26 12:09:58 -07:00
Qiuxu Zhuo
ad6e16059d EDAC, skx_edac: Add address translation for non-volatile DIMMs
Currently, this driver doesn't support address translation for
non-volatile DIMMs.

The ACPI ADXL DSM method provides address translation for both volatile
and non-volatile DIMMs. Enable it to use the ACPI DSM methods if they
are supported and there are non-volatile DIMMs populated on the system.

Co-developed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
CC: Mauro Carvalho Chehab <mchehab@kernel.org>
CC: arozansk@redhat.com
CC: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1540106336-5212-1-git-send-email-qiuxu.zhuo@intel.com
2018-10-25 16:59:18 +02:00
Linus Torvalds
36168d7123 - amd64_edac: AMD family 0x17, models 0x10-0x2f support (Michael Jin)
Hygon Dhyana support (Pu Wen)
 
 - sb_edac: New maintainer + fixes (Tony Luck)
 	   Error reporting improvements and fixes (Qiuxu Zhuo)
 
 - ghes_edac: SMBIOS handle type 17 for DIMM locating and per-DIMM error
 	     accounting (Fan Wu)
 
 - altera_edac: Stratix10 support and refactoring (Thor Thayer)
 
 Out of tree addition:
 
 - acpi_adxl: Address Translation interface using an ACPI DSM (Tony Luck)
 
 - the usual amount of other misc fixes and cleanups all over.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAlvRbEYACgkQEsHwGGHe
 VUqMXRAArOg2zJd2IuAMH/QxGSpwCIWYAQRN3mhLV7ngnBQ9d12CaadxUbhFLfgK
 j70DkvG7a3QzeaXkm/wf0EVro590x7zi7AgAPuOJNx1gdOEJTJpT/Bc7YSN0m8R3
 gjruqIVTKuaeY33tmhWN8XoMpX5W8Hmff7j1wrLxDfz/8vTPxZto/XpJaNVkmzUQ
 cIsDzOJzedGIeeto/dEqlwMr1sm4qddoTrVJhSsHrtnXKBRQMcxLQVL5LFtvbTPS
 tnXNN+xphtknzXWh151YmmEv4pPL34FN6JZOYpfViY/uTtiIuYD3O1Gsl1CVMXV7
 XPGAngo5lT+HV6Nb00sv4Ncdkl0vatQVMFWynOM0eCjVU39bD4ZrfL6alxBgx0PH
 JEBqaGGZEs71J3/GXpFeFPiy1SiZmUTwncPCouVAwkFu1HPBfRT5yVFLnl53ITg4
 Z0Hmdgd/zDkowfiXDQxKg2muFLHQwIYj1L0m0d+YfjYqoTIbPSsOsZMDlokwwHCS
 /AUGoi+pQCcexdeTbSfmz/8SS2shLhSudvLGtpehPiM8evff6qgzOvF7mVhaN8M7
 gPUIOiiTaC5z68RqbksqlxPdDEfq+A6R+0a2VUYkma2qMHLVCCuKZ9g+XiFBJ55D
 f8xDzwrZ4jkVI925bsThz7wxBITGgb6cHGNw3s9XwUUbns2Oz9g=
 =HEJK
 -----END PGP SIGNATURE-----

Merge tag 'edac_for_4.20' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp

Pull EDAC updates from Borislav Petkov:
 "The EDAC tree was busier than usual this cycle as the shortlog below
  shows.

  Also, this pull request is carrying an ACPI DSM driver which is used
  to ask the platform to supply the DIMM location of a reported hardware
  error and thus simplify all the EDAC logic when trying to map the
  error address to the respective DIMM.

  Core EDAC updates:

   - amd64_edac: AMD family 0x17, models 0x10-0x2f support (Michael Jin)
     Hygon Dhyana support (Pu Wen)

   - sb_edac: New maintainer + fixes (Tony Luck) Error reporting
     improvements and fixes (Qiuxu Zhuo)

   - ghes_edac: SMBIOS handle type 17 for DIMM locating and per-DIMM
     error accounting (Fan Wu)

   - altera_edac: Stratix10 support and refactoring (Thor Thayer)

  Out of tree addition:

   - acpi_adxl: Address Translation interface using an ACPI DSM (Tony
     Luck)

   - the usual amount of other misc fixes and cleanups all over"

* tag 'edac_for_4.20' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: (22 commits)
  ACPI/ADXL: Add address translation interface using an ACPI DSM
  EDAC, thunderx: Fix memory leak in thunderx_l2c_threaded_isr()
  EDAC, skx_edac: Fix logical channel intermediate decoding
  EDAC, {i7core,sb,skx}_edac: Fix uncorrected error counting
  EDAC, altera: Work around int-to-pointer-cast warnings
  EDAC, amd64: Add Hygon Dhyana support
  EDAC: Raise the maximum number of memory controllers
  arm64: dts: stratix10: Add peripheral EDAC nodes
  EDAC, altera: Add Stratix10 peripheral support
  EDAC, altera: Merge Stratix10 into the Arria10 SDRAM probe routine
  arm64: dts: stratix10: Add SDRAM node
  EDAC, altera: Combine Stratix10 and Arria10 probe functions
  arm64: dts: stratix10: Additions to EDAC System Manager
  EDAC, i7core: Remove set but not used variable pvt
  EDAC, ghes: Use CPER module handles to locate DIMMs
  EDAC: Correct DIMM capacity unit symbol
  EDAC, sb_edac: Fix signedness bugs in *_get_ha() functions
  EDAC, sb_edac: Fix reporting for patrol scrubber errors
  EDAC, sb_edac: Return early on ADDRV bit and address type test
  MAINTAINERS: Update maintainer for drivers/edac/sb_edac.c
  ...
2018-10-25 06:40:00 -07:00
Linus Torvalds
c05f3642f4 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
 "The main updates in this cycle were:

   - Lots of perf tooling changes too voluminous to list (big perf trace
     and perf stat improvements, lots of libtraceevent reorganization,
     etc.), so I'll list the authors and refer to the changelog for
     details:

       Benjamin Peterson, Jérémie Galarneau, Kim Phillips, Peter
       Zijlstra, Ravi Bangoria, Sangwon Hong, Sean V Kelley, Steven
       Rostedt, Thomas Gleixner, Ding Xiang, Eduardo Habkost, Thomas
       Richter, Andi Kleen, Sanskriti Sharma, Adrian Hunter, Tzvetomir
       Stoyanov, Arnaldo Carvalho de Melo, Jiri Olsa.

     ... with the bulk of the changes written by Jiri Olsa, Tzvetomir
     Stoyanov and Arnaldo Carvalho de Melo.

   - Continued intel_rdt work with a focus on playing well with perf
     events. This also imported some non-perf RDT work due to
     dependencies. (Reinette Chatre)

   - Implement counter freezing for Arch Perfmon v4 (Skylake and newer).
     This allows to speed up the PMI handler by avoiding unnecessary MSR
     writes and make it more accurate. (Andi Kleen)

   - kprobes cleanups and simplification (Masami Hiramatsu)

   - Intel Goldmont PMU updates (Kan Liang)

   - ... plus misc other fixes and updates"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (155 commits)
  kprobes/x86: Use preempt_enable() in optimized_callback()
  x86/intel_rdt: Prevent pseudo-locking from using stale pointers
  kprobes, x86/ptrace.h: Make regs_get_kernel_stack_nth() not fault on bad stack
  perf/x86/intel: Export mem events only if there's PEBS support
  x86/cpu: Drop pointless static qualifier in punit_dev_state_show()
  x86/intel_rdt: Fix initial allocation to consider CDP
  x86/intel_rdt: CBM overlap should also check for overlap with CDP peer
  x86/intel_rdt: Introduce utility to obtain CDP peer
  tools lib traceevent, perf tools: Move struct tep_handler definition in a local header file
  tools lib traceevent: Separate out tep_strerror() for strerror_r() issues
  perf python: More portable way to make CFLAGS work with clang
  perf python: Make clang_has_option() work on Python 3
  perf tools: Free temporary 'sys' string in read_event_files()
  perf tools: Avoid double free in read_event_file()
  perf tools: Free 'printk' string in parse_ftrace_printk()
  perf tools: Cleanup trace-event-info 'tdata' leak
  perf strbuf: Match va_{add,copy} with va_end
  perf test: S390 does not support watchpoints in test 22
  perf auxtrace: Include missing asm/bitsperlong.h to get BITS_PER_LONG
  tools include: Adopt linux/bits.h
  ...
2018-10-23 13:32:18 +01:00
Dan Carpenter
d8c27ba86a EDAC, thunderx: Fix memory leak in thunderx_l2c_threaded_isr()
Fix memory leak in L2c threaded interrupt handler.

 [ bp: Rewrite commit message. ]

Fixes: 41003396f9 ("EDAC, thunderx: Add Cavium ThunderX EDAC driver")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
CC: David Daney <david.daney@cavium.com>
CC: Jan Glauber <jglauber@cavium.com>
CC: Mauro Carvalho Chehab <mchehab@kernel.org>
CC: Sergey Temerkhanov <s.temerkhanov@gmail.com>
CC: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20181013102843.GG16086@mwanda
2018-10-13 13:58:06 +02:00
Qiuxu Zhuo
8f18973877 EDAC, skx_edac: Fix logical channel intermediate decoding
The code "lchan = (lchan << 1) | ~lchan" for logical channel
intermediate decoding is wrong. The wrong intermediate decoding
result is {0xffffffff, 0xfffffffe}.

Fix it by replacing '~' with '!'. The correct intermediate
decoding result is {0x1, 0x2}.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
CC: Aristeu Rozanski <aris@redhat.com>
CC: Mauro Carvalho Chehab <mchehab@kernel.org>
CC: linux-edac <linux-edac@vger.kernel.org>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20181009172025.18594-1-tony.luck@intel.com
2018-10-09 19:30:04 +02:00
Peter Zijlstra
f2c4db1bd8 x86/cpu: Sanitize FAM6_ATOM naming
Going primarily by:

  https://en.wikipedia.org/wiki/List_of_Intel_Atom_microprocessors

with additional information gleaned from other related pages; notably:

 - Bonnell shrink was called Saltwell
 - Moorefield is the Merriefield refresh which makes it Airmont

The general naming scheme is: FAM6_ATOM_UARCH_SOCTYPE

  for i in `git grep -l FAM6_ATOM` ; do
	sed -i  -e 's/ATOM_PINEVIEW/ATOM_BONNELL/g'		\
		-e 's/ATOM_LINCROFT/ATOM_BONNELL_MID/'		\
		-e 's/ATOM_PENWELL/ATOM_SALTWELL_MID/g'		\
		-e 's/ATOM_CLOVERVIEW/ATOM_SALTWELL_TABLET/g'	\
		-e 's/ATOM_CEDARVIEW/ATOM_SALTWELL/g'		\
		-e 's/ATOM_SILVERMONT1/ATOM_SILVERMONT/g'	\
		-e 's/ATOM_SILVERMONT2/ATOM_SILVERMONT_X/g'	\
		-e 's/ATOM_MERRIFIELD/ATOM_SILVERMONT_MID/g'	\
		-e 's/ATOM_MOOREFIELD/ATOM_AIRMONT_MID/g'	\
		-e 's/ATOM_DENVERTON/ATOM_GOLDMONT_X/g'		\
		-e 's/ATOM_GEMINI_LAKE/ATOM_GOLDMONT_PLUS/g' ${i}
  done

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: dave.hansen@linux.intel.com
Cc: len.brown@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-02 10:14:32 +02:00
Tony Luck
432de7fd76 EDAC, {i7core,sb,skx}_edac: Fix uncorrected error counting
The count of errors is picked up from bits 52:38 of the machine check
bank status register. But this is the count of *corrected* errors. If an
uncorrected error is being logged, the h/w sets this field to 0. Which
means that when edac_mc_handle_error() is called, the EDAC core will
carefully add zero to the appropriate uncorrected error counts.

Signed-off-by: Tony Luck <tony.luck@intel.com>
[ Massage commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org
Cc: Aristeu Rozanski <aris@redhat.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20180928213934.19890-1-tony.luck@intel.com
2018-09-29 10:58:16 +02:00
Rob Herring
37dc218bed edac: cpc925: use for_each_of_cpu_node iterator
Use the for_each_of_cpu_node iterator to iterate over cpu nodes. This
has the side effect of defaulting to iterating using "cpu" node names in
preference to the deprecated (for FDT) device_type == "cpu".

The error messages are removed in the process as it's not the driver's
job to be checking cpu nodes. Any problems with cpu nodes should be
noticed by the architecture code.

Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac@vger.kernel.org
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Rob Herring <robh@kernel.org>
2018-09-28 14:25:58 -05:00
Arnd Bergmann
8537bf1097 EDAC, altera: Work around int-to-pointer-cast warnings
The altera edac driver passes a token from a DT resource as
resource_size_t into an SMC call, but casts it to an __iomem pointer and
then a plain void pointer inbetween, mixing three or four incompatible
types in the process. The compiler complains about one of the
conversions:

  drivers/edac/altera_edac.c: In function 'altr_init_a10_ecc_block':
  drivers/edac/altera_edac.c:1053:10: error: cast to pointer from integer of \
	  different size [-Werror=int-to-pointer-cast]
     base = (void __iomem *)res.start;
            ^
  drivers/edac/altera_edac.c: In function 'altr_edac_a10_probe':
  drivers/edac/altera_edac.c:2062:10: error: cast to pointer from integer of \
	  different size [-Werror=int-to-pointer-cast]
     base = (void __iomem *)res.start;

Using a static checker probably also notices the __iomem cast.  Solving
this properly isn't trivial, but simply casting to a 'uintptr_t' instead
of 'void __iomem *' makes it less wrong and should avoid the warnings.

Fixes: d5fc912556 ("EDAC, altera: Combine Stratix10 and Arria10 probe functions")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Thor Thayer <thor.thayer@linux.intel.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: David Frey <dpfrey@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Cc: linux-edac@vger.kernel.org
Link: https://lkml.kernel.org/r/20180927100949.973078-1-arnd@arndb.de
2018-09-28 10:09:27 +02:00
Pu Wen
c4a3e94641 EDAC, amd64: Add Hygon Dhyana support
Add support for Hygon Dhyana CPU to EDAC.

Signed-off-by: Pu Wen <puwen@hygon.cn>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: mchehab@kernel.org
Cc: tglx@linutronix.de
Cc: mingo@redhat.com
Cc: hpa@zytor.com
Cc: thomas.lendacky@amd.com
Cc: linux-edac@vger.kernel.org
Link: https://lkml.kernel.org/r/9d71061301177822bc55b3bfd44f91057458d886.1537533369.git.puwen@hygon.cn
2018-09-27 18:38:26 +02:00
Thor Thayer
064acbd4f4 EDAC, altera: Add Stratix10 peripheral support
Add a new peripheral ECC error injection algorithm for Stratix10 and
some Arria10 peripherals. Inject a single bit error and upon readback,
it will be corrected and the SBE IRQ handler will be called.

Add regmap selection for Stratix10 or Arria10 peripheral device memory
initialization.

Add checks for both Arria10 and Stratix10 to the peripheral ECC setup.

Signed-off-by: Thor Thayer <thor.thayer@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: dinguyen@kernel.org
Cc: robh+dt@kernel.org
Cc: mark.rutland@arm.com
Cc: mchehab@kernel.org
Cc: devicetree@vger.kernel.org
Cc: linux-edac@vger.kernel.org
Link: https://lkml.kernel.org/r/1537883342-30180-6-git-send-email-thor.thayer@linux.intel.com
2018-09-25 21:22:20 +02:00
Thor Thayer
08f08bfb7b EDAC, altera: Merge Stratix10 into the Arria10 SDRAM probe routine
Change Stratix10 regmap to use offsets from a base to match
the Arria10 regmap and allow re-use of the Arria10 functions.
Only the regmap initialization differs (Arria10 mmio_regmap
vs Stratix10 custom regmap).

Modify the SDRAM probe function to handle Stratix10. Remove the
Stratix10 offset defines if Arria10 can be used. Remove the unused
Stratix10 probe function.

Signed-off-by: Thor Thayer <thor.thayer@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: dinguyen@kernel.org
Cc: robh+dt@kernel.org
Cc: mark.rutland@arm.com
Cc: mchehab@kernel.org
Cc: devicetree@vger.kernel.org
Cc: linux-edac@vger.kernel.org
Link: https://lkml.kernel.org/r/1537883342-30180-5-git-send-email-thor.thayer@linux.intel.com
2018-09-25 21:21:26 +02:00
Thor Thayer
d5fc912556 EDAC, altera: Combine Stratix10 and Arria10 probe functions
On Stratix10, the ECC offsets are similar to the existing
Arria10 functions and this can be leveraged to simplify
the EDAC driver as follows:

1. Fold Stratix10 specifics into Arria10 structures and
functions.

2. Implement the Stratix10 System Manager register accesses
using a custom regmap to allow use with the Arria10 System
Manager regmaps.

3. Stratix10 double bit errors are implemented as SError
instead of interrupts so use a panic notifier.

Signed-off-by: Thor Thayer <thor.thayer@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: dinguyen@kernel.org
Cc: robh+dt@kernel.org
Cc: mark.rutland@arm.com
Cc: mchehab@kernel.org
Cc: devicetree@vger.kernel.org
Cc: linux-edac@vger.kernel.org
Link: https://lkml.kernel.org/r/1537883342-30180-3-git-send-email-thor.thayer@linux.intel.com
2018-09-25 21:17:38 +02:00
YueHaibing
0751736905 EDAC, i7core: Remove set but not used variable pvt
Remove the unused local variable pvt:

  drivers/edac/i7core_edac.c: In function 'i7core_mce_check_error':
  drivers/edac/i7core_edac.c:1818:21: warning: variable 'pvt' set but not used \
	  [-Wunused-but-set-variable]

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1537841043-108267-1-git-send-email-yuehaibing@huawei.com
2018-09-25 12:02:28 +02:00
Fan Wu
c798c88f39 EDAC, ghes: Use CPER module handles to locate DIMMs
Use SMBIOS module handle type 17, on platforms which provide valid
ones, to locate the corresponding DIMM and thus have per-DIMM error
counter updates.

Signed-off-by: Fan Wu <wufan@codeaurora.org>
[ Massage commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tyler Baicar <baicar.tyler@gmail.com>
Reviewed-by: James Morse <james.morse@arm.com>
Tested-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: baicar.tyler@gmail.com
Cc: john.garry@huawei.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: shiju.jose@huawei.com
Cc: tanxiaofei@huawei.com
Cc: wanghuiqiang@huawei.com
Link: http://lkml.kernel.org/r/1537322340-1860-1-git-send-email-wufan@codeaurora.org
2018-09-22 18:35:40 +02:00
Qiuxu Zhuo
6f6da13604 EDAC: Correct DIMM capacity unit symbol
The {i3200|i7core|sb|skx}_edac drivers show DIMM capacity using the
wrong unit symbol: 'Mb' - megabit. Fix them by replacing 'Mb' with
'MiB' - mebibyte.

[Tony: These are all "edac_dbg()" messages, so this won't break scripts
       that parse console logs.]

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac@vger.kernel.org
Link: https://lkml.kernel.org/r/20180919003433.16475-1-tony.luck@intel.com
2018-09-22 18:18:57 +02:00
Luck, Tony
c968ed0859 EDAC, sb_edac: Fix signedness bugs in *_get_ha() functions
A static checker gave the following warnings:

  drivers/edac/sb_edac.c:1030 ibridge_get_ha() warn: signedness bug returning '(-22)'
  drivers/edac/sb_edac.c:1037 knl_get_ha() warn: signedness bug returning '(-22)'

Both because the functions are declared to return a "u8", but try to
return -EINVAL for the error case.

Fix by returning 0xff (since the caller doesn't look at, or pass on, the
return value).

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20180914201905.GA30946@agluck-desk
Signed-off-by: Borislav Petkov <bp@suse.de>
2018-09-15 11:41:08 +02:00
Channagoud Kadabi
27450653f1 drivers: edac: Add EDAC driver support for QCOM SoCs
Add error reporting driver for Single Bit Errors (SBEs) and Double Bit
Errors (DBEs). As of now, this driver supports error reporting for
Last Level Cache Controller (LLCC) of Tag RAM and Data RAM. Interrupts
are triggered when the errors happen in the cache, the driver handles
those interrupts and dumps the syndrome registers.

Signed-off-by: Channagoud Kadabi <ckadabi@codeaurora.org>
Signed-off-by: Venkata Narendra Kumar Gutta <vnkgutta@codeaurora.org>
Co-developed-by: Venkata Narendra Kumar Gutta <vnkgutta@codeaurora.org>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Andy Gross <andy.gross@linaro.org>
2018-09-13 15:54:05 -05:00
Qiuxu Zhuo
8489b17ce2 EDAC, sb_edac: Fix reporting for patrol scrubber errors
sb_edac sometimes reports the wrong DIMM for a memory error found by
the patrol scrubber. That is because the hardware provides only a 4KB
page-aligned address for the error case.

This means that the EDAC driver will point at the DIMM matching offset
0x0 in the 4KB page, but because of interleaving across channels and
ranks, the actual DIMM involved may be different if the error is on some
other cache line within the page.

Therefore, reconstruct the socket/iMC/channel information from the "mce"
structure passed to the EDAC driver. The DIMM cannot be determined, so
pass "dimm=-1" to the EDAC core. It will report that all the DIMMs on
that channel may be affected.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: Aristeu Rozanski <aris@redhat.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20180907230828.13901-3-tony.luck@intel.com
[ Improve comments on the functions to convert bank number
  to memory controller number. Minor cleanup to commit message. ]
Signed-off-by: Tony Luck <tony.luck@intel.com>
[ Massage commit message more. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2018-09-11 11:09:54 +02:00
Qiuxu Zhuo
dcc960b225 EDAC, sb_edac: Return early on ADDRV bit and address type test
Users of the mce_register_decode_chain() are called for every logged
error. EDAC drivers should check:

1) Is this a memory error? [bit 7 in status register]
2) Is there a valid address? [bit 58 in status register]
3) Is the address a system address? [bitfield 8:6 in misc register]

The sb_edac driver performed test "1" twice. Waited far too long to
perform check "2". Didn't do check "3" at all.

Fix it by moving the test for valid address from
sbridge_mce_output_error() into sbridge_mce_check_error() and add a test
for the type immediately after. Delete the redundant check for the type
of the error from sbridge_mce_output_error().

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: Aristeu Rozanski <aris@redhat.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20180907230828.13901-2-tony.luck@intel.com
[ Re-word commit message. ]
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
2018-09-11 10:59:21 +02:00
David Frey
1c96a2f67c
regmap: split up regmap_config.use_single_rw
Split regmap_config.use_single_rw into use_single_read and
use_single_write. This change enables drivers of devices which only
support bulk operations in one direction to use the regmap_bulk_*()
functions for both directions and have their bulk operation split into
single operations only when necessary.

Update all struct regmap_config instances where use_single_rw==true to
instead set both use_single_read and use_single_write. No attempt was
made to evaluate whether it is possible to set only one of
use_single_read or use_single_write.

Signed-off-by: David Frey <dpfrey@gmail.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
2018-09-07 13:03:55 +01:00
Andy Shevchenko
bd1852317f EDAC: Get rid of custom ICPU() macro
Replace custom grown macro with generic INTEL_CPU_FAM6() one.

No functional change intended.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20180831082341.72363-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2018-09-03 12:18:35 +02:00
Michael Jin
8960de4a5c EDAC, amd64: Add Family 17h, models 10h-2fh support
Add new device IDs for family 17h, models 10h-2fh.

This is required by amd64_edac_mod in order to properly detect PCI
device functions 0 and 6.

Signed-off-by: Michael Jin <mikhail.jin@gmail.com>
Reviewed-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20180816192840.31166-1-mikhail.jin@gmail.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2018-08-27 06:17:21 +02:00
Takashi Iwai
b748f2de4b EDAC: Add missing MEM_LRDDR4 entry in edac_mem_types[]
The edac_mem_types[] array misses a MEM_LRDDR4 entry, which leads to
NULL pointer dereference when accessed via sysfs or such.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20180810141426.8918-1-tiwai@suse.de
Fixes: 1e8096bb20 ("EDAC: Add LRDDR4 DRAM type")
Signed-off-by: Borislav Petkov <bp@suse.de>
2018-08-17 15:13:34 +02:00
Masayoshi Mizuma
190bd6e98a EDAC, sb_edac: Add support for systems with segmented PCI buses
Extend the driver to check whether segment number and bus number matches
when deciding how to group memory controller PCI devices to CPU sockets.

Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20180724190213.26359-1-msys.mizuma@gmail.com
[ Cleanup commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2018-07-25 11:17:15 +02:00
Kees Cook
6663484b4e EDAC, thunderx: Remove VLA usage
In the quest to remove all stack VLA usage from the kernel[1], switch to
using a kmalloc-allocated buffer instead of stack space. This should be
fine since the existing routine is allocating memory too.

Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Jan Glauber <jglauber@cavium.com>
Cc: David Daney <david.daney@cavium.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20180629184850.GA37464@beast
Link: https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com [1]
Signed-off-by: Borislav Petkov <bp@suse.de>
2018-07-09 11:33:02 +02:00
Johan Hovold
6c974d4dfa EDAC, i7core: Fix memleaks and use-after-free on probe and remove
Make sure to free and deregister the addrmatch and chancounts devices
allocated during probe in all error paths. Also fix use-after-free in a
probe error path and in the remove success path where the devices were
being put before before deregistration.

Signed-off-by: Johan Hovold <johan@kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Fixes: 356f0a3086 ("i7core_edac: change the mem allocation scheme to make Documentation/kobject.txt happy")
Link: http://lkml.kernel.org/r/20180612124335.6420-2-johan@kernel.org
Signed-off-by: Borislav Petkov <bp@suse.de>
2018-06-18 13:26:05 +02:00
Johan Hovold
4708aa85d5 EDAC: Fix memleak in module init error path
Make sure to use put_device() to free the initialised struct device so
that resources managed by driver core also gets released in the event of
a registration failure.

Signed-off-by: Johan Hovold <johan@kernel.org>
Cc: Denis Kirjanov <kirjanov@gmail.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Fixes: 2d56b109e3 ("EDAC: Handle error path in edac_mc_sysfs_init() properly")
Link: http://lkml.kernel.org/r/20180612124335.6420-1-johan@kernel.org
Signed-off-by: Borislav Petkov <bp@suse.de>
2018-06-17 13:09:44 +02:00
Christophe JAILLET
9d72fe1ce8 EDAC, altera: Fix an error handling path in altr_s10_sdram_probe()
If regmap_write() fails, we should release some resources as done in all
the other error handling paths of the function.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Thor Thayer <thor.thayer@linux.intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20180610174532.22071-1-christophe.jaillet@wanadoo.fr
Fixes: e9918d7faf ("EDAC, altera: Handle SDRAM Uncorrectable Errors on Stratix10")
Signed-off-by: Borislav Petkov <bp@suse.de>
2018-06-17 12:23:33 +02:00
Kees Cook
6396bb2215 treewide: kzalloc() -> kcalloc()
The kzalloc() function has a 2-factor argument form, kcalloc(). This
patch replaces cases of:

        kzalloc(a * b, gfp)

with:
        kcalloc(a * b, gfp)

as well as handling cases of:

        kzalloc(a * b * c, gfp)

with:

        kzalloc(array3_size(a, b, c), gfp)

as it's slightly less ugly than:

        kzalloc_array(array_size(a, b), c, gfp)

This does, however, attempt to ignore constant size factors like:

        kzalloc(4 * 1024, gfp)

though any constants defined via macros get caught up in the conversion.

Any factors with a sizeof() of "unsigned char", "char", and "u8" were
dropped, since they're redundant.

The Coccinelle script used for this was:

// Fix redundant parens around sizeof().
@@
type TYPE;
expression THING, E;
@@

(
  kzalloc(
-	(sizeof(TYPE)) * E
+	sizeof(TYPE) * E
  , ...)
|
  kzalloc(
-	(sizeof(THING)) * E
+	sizeof(THING) * E
  , ...)
)

// Drop single-byte sizes and redundant parens.
@@
expression COUNT;
typedef u8;
typedef __u8;
@@

(
  kzalloc(
-	sizeof(u8) * (COUNT)
+	COUNT
  , ...)
|
  kzalloc(
-	sizeof(__u8) * (COUNT)
+	COUNT
  , ...)
|
  kzalloc(
-	sizeof(char) * (COUNT)
+	COUNT
  , ...)
|
  kzalloc(
-	sizeof(unsigned char) * (COUNT)
+	COUNT
  , ...)
|
  kzalloc(
-	sizeof(u8) * COUNT
+	COUNT
  , ...)
|
  kzalloc(
-	sizeof(__u8) * COUNT
+	COUNT
  , ...)
|
  kzalloc(
-	sizeof(char) * COUNT
+	COUNT
  , ...)
|
  kzalloc(
-	sizeof(unsigned char) * COUNT
+	COUNT
  , ...)
)

// 2-factor product with sizeof(type/expression) and identifier or constant.
@@
type TYPE;
expression THING;
identifier COUNT_ID;
constant COUNT_CONST;
@@

(
- kzalloc
+ kcalloc
  (
-	sizeof(TYPE) * (COUNT_ID)
+	COUNT_ID, sizeof(TYPE)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(TYPE) * COUNT_ID
+	COUNT_ID, sizeof(TYPE)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(TYPE) * (COUNT_CONST)
+	COUNT_CONST, sizeof(TYPE)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(TYPE) * COUNT_CONST
+	COUNT_CONST, sizeof(TYPE)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(THING) * (COUNT_ID)
+	COUNT_ID, sizeof(THING)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(THING) * COUNT_ID
+	COUNT_ID, sizeof(THING)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(THING) * (COUNT_CONST)
+	COUNT_CONST, sizeof(THING)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(THING) * COUNT_CONST
+	COUNT_CONST, sizeof(THING)
  , ...)
)

// 2-factor product, only identifiers.
@@
identifier SIZE, COUNT;
@@

- kzalloc
+ kcalloc
  (
-	SIZE * COUNT
+	COUNT, SIZE
  , ...)

// 3-factor product with 1 sizeof(type) or sizeof(expression), with
// redundant parens removed.
@@
expression THING;
identifier STRIDE, COUNT;
type TYPE;
@@

(
  kzalloc(
-	sizeof(TYPE) * (COUNT) * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kzalloc(
-	sizeof(TYPE) * (COUNT) * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kzalloc(
-	sizeof(TYPE) * COUNT * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kzalloc(
-	sizeof(TYPE) * COUNT * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kzalloc(
-	sizeof(THING) * (COUNT) * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kzalloc(
-	sizeof(THING) * (COUNT) * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kzalloc(
-	sizeof(THING) * COUNT * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kzalloc(
-	sizeof(THING) * COUNT * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
)

// 3-factor product with 2 sizeof(variable), with redundant parens removed.
@@
expression THING1, THING2;
identifier COUNT;
type TYPE1, TYPE2;
@@

(
  kzalloc(
-	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
+	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
  , ...)
|
  kzalloc(
-	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
  , ...)
|
  kzalloc(
-	sizeof(THING1) * sizeof(THING2) * COUNT
+	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
  , ...)
|
  kzalloc(
-	sizeof(THING1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
  , ...)
|
  kzalloc(
-	sizeof(TYPE1) * sizeof(THING2) * COUNT
+	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
  , ...)
|
  kzalloc(
-	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
  , ...)
)

// 3-factor product, only identifiers, with redundant parens removed.
@@
identifier STRIDE, SIZE, COUNT;
@@

(
  kzalloc(
-	(COUNT) * STRIDE * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc(
-	COUNT * (STRIDE) * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc(
-	COUNT * STRIDE * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc(
-	(COUNT) * (STRIDE) * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc(
-	COUNT * (STRIDE) * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc(
-	(COUNT) * STRIDE * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc(
-	(COUNT) * (STRIDE) * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc(
-	COUNT * STRIDE * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
)

// Any remaining multi-factor products, first at least 3-factor products,
// when they're not all constants...
@@
expression E1, E2, E3;
constant C1, C2, C3;
@@

(
  kzalloc(C1 * C2 * C3, ...)
|
  kzalloc(
-	(E1) * E2 * E3
+	array3_size(E1, E2, E3)
  , ...)
|
  kzalloc(
-	(E1) * (E2) * E3
+	array3_size(E1, E2, E3)
  , ...)
|
  kzalloc(
-	(E1) * (E2) * (E3)
+	array3_size(E1, E2, E3)
  , ...)
|
  kzalloc(
-	E1 * E2 * E3
+	array3_size(E1, E2, E3)
  , ...)
)

// And then all remaining 2 factors products when they're not all constants,
// keeping sizeof() as the second factor argument.
@@
expression THING, E1, E2;
type TYPE;
constant C1, C2, C3;
@@

(
  kzalloc(sizeof(THING) * C2, ...)
|
  kzalloc(sizeof(TYPE) * C2, ...)
|
  kzalloc(C1 * C2 * C3, ...)
|
  kzalloc(C1 * C2, ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(TYPE) * (E2)
+	E2, sizeof(TYPE)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(TYPE) * E2
+	E2, sizeof(TYPE)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(THING) * (E2)
+	E2, sizeof(THING)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(THING) * E2
+	E2, sizeof(THING)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	(E1) * E2
+	E1, E2
  , ...)
|
- kzalloc
+ kcalloc
  (
-	(E1) * (E2)
+	E1, E2
  , ...)
|
- kzalloc
+ kcalloc
  (
-	E1 * E2
+	E1, E2
  , ...)
)

Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-12 16:19:22 -07:00
Borislav Petkov
eaa3a1d46c EDAC, ghes: Make platform-based whitelisting x86-only
ARM machines all have DMI tables so if they request hw error reporting
through GHES, then the driver should be able to detect DIMMs and report
errors successfully (famous last words :)).

Make the platform-based list x86-specific so that ghes_edac can load on
ARM.

Reported-by: Qiang Zheng <zhengqiang10@huawei.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: James Morse <james.morse@arm.com>
Tested-by: James Morse <james.morse@arm.com>
Tested-by: Qiang Zheng <zhengqiang10@huawei.com>
Link: https://lkml.kernel.org/r/1526039543-180996-1-git-send-email-zhengqiang10@huawei.com
2018-05-21 12:18:57 +02:00
Thor Thayer
9ef20753e0 EDAC, altera: Fix ARM64 build warning
The kbuild test robot reported the following warning:

  drivers/edac/altera_edac.c: In function 'ocram_free_mem':
  drivers/edac/altera_edac.c:1410:42: warning: cast from pointer to integer
	of different size [-Wpointer-to-int-cast]
    gen_pool_free((struct gen_pool *)other, (u32)p, size);
                                             ^

After adding support for ARM64 architectures, the unsigned long
parameter is 64 bits and causes a build warning on 64-bit configs. Fix
by casting to the correct size (unsigned long) instead of u32.

Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Thor Thayer <thor.thayer@linux.intel.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Fixes: c3eea1942a ("EDAC, altera: Add Altera L2 cache and OCRAM support")
Link: http://lkml.kernel.org/r/1526317441-4996-1-git-send-email-thor.thayer@linux.intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2018-05-15 11:01:26 +02:00
Randy Dunlap
de245ae073 EDAC, skx: Fix skx_edac build error when ACPI_NFIT=m
Prevent build error when CONFIG_ACPI_NFIT=m and CONFIG_EDAC_SKX=y by
limiting EDAC_SKX based on how ACPI_NFIT is set.

Fixes this build error:
  drivers/edac/skx_edac.o: In function `get_nvdimm_info':
  ../drivers/edac/skx_edac.c:399: undefined reference to `nfit_get_smbios_id'

Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Fixes: 58ca9ac146 ("EDAC, skx_edac: Detect non-volatile DIMMs")
Link: http://lkml.kernel.org/r/3af91354-8e19-d2af-1bba-ced8dce053f1@infradead.org
Signed-off-by: Borislav Petkov <bp@suse.de>
2018-05-14 11:35:23 +02:00
Borislav Petkov
a0671c39dd EDAC, ghes: Use BIT() macro
... for improved readability. Also, add a local mask variable for the
same reason.

No functional changes.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
2018-05-12 14:35:30 +02:00
Toshi Kani
ad0d73b324 EDAC, ghes: Add DDR4 and NVDIMM memory types
The ghes_edac driver obtains memory type from SMBIOS type 17, but it
does not recognize DDR4 and NVDIMM types.

Add support of DDR4 and NVDIMM types. NVDIMM type is denoted by memory
type DDR3/4 and non-volatile.

Reported-by: Robert Elliott <elliott@hpe.com>
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20180509222030.9299-1-toshi.kani@hpe.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2018-05-12 14:16:49 +02:00
Thor Thayer
e9918d7faf EDAC, altera: Handle SDRAM Uncorrectable Errors on Stratix10
On Stratix10, uncorrectable errors are routed to the SError exception
instead of the IRQ exceptions. In Stratix10, uncorrectable SErrors
must be treated as fatal and will cause a panic. Older Altera/Intel
parts printed out a message for UE so do that here using the notifier
framework.

Record the UE in sticky registers that retain the state through a reset.
Check these registers on probe and printout the error on startup.

Signed-off-by: Thor Thayer <thor.thayer@linux.intel.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: mark.rutland@arm.com
Cc: mchehab@kernel.org
Cc: will.deacon@arm.com
Link: http://lkml.kernel.org/r/1526079610-5527-1-git-send-email-thor.thayer@linux.intel.com
[ Remove unused var in s10_edac_dberr_handler(), reorder args. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2018-05-12 12:29:41 +02:00
Thor Thayer
3dab6bd526 EDAC, altera: Add support for Stratix10 SDRAM EDAC
Support for Stratix10 SDRAM ECC requires the use of SMC calls to Secure
Monitor for accessing registers.

Signed-off-by: Thor Thayer <thor.thayer@linux.intel.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: catalin.marinas@arm.com
Cc: devicetree@vger.kernel.org
Cc: dinguyen@kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: mark.rutland@arm.com
Cc: robh+dt@kernel.org
Cc: will.deacon@arm.com
Link: http://lkml.kernel.org/r/1524854238-19394-3-git-send-email-thor.thayer@linux.intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2018-05-12 11:13:39 +02:00
Alexandru Gagniuc
305d0e006a EDAC, ghes: Remove unused argument to ghes_edac_report_mem_error()
The use of the @ghes argument was removed in a previous commit, but
function signature was not updated to reflect this.

Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
Acked-by: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: linux-acpi@vger.kernel.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20180430213358.8319-1-mr.nuke.me@gmail.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2018-05-12 10:59:15 +02:00
Colin Ian King
83e548bea4 EDAC, i7core: Fix spelling mistake: "redundacy" -> "redundancy"
Trivial fix to spelling mistake in err string.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: kernel-janitors@vger.kernel.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20180504113804.17103-1-colin.king@canonical.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2018-05-04 14:59:29 +02:00
Sughosh Ganu
a66bdf5d11 EDAC, ghes: Add a null pointer check in ghes_edac_unregister()
Add a null check for ghes_pvt, before dereferencing it. The pointer
could still be null in case the return path is taken before initialising
ghes_pvt in the registration function.

Reviewed-by: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Sughosh Ganu <sughosh.ganu@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: lkml <linux-kernel@vger.kernel.org>
Link: http://lkml.kernel.org/r/1524737809-24475-1-git-send-email-sughosh.ganu@arm.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2018-05-02 13:57:49 +02:00
Borislav Petkov
cc7f3f1326 ghes, EDAC: Fix ghes_edac registration
Tony reported seeing

  "Internal error: Can't find EDAC structure"

when injecting correctable errors due to the fact that ghes_edac would
still load even if the whitelist won't hit. Drop the pr_err() in
ghes_edac_report_mem_error() for now due to the hacky way how ghes_edac
depends on ghes.c.

While at it, make ghes_edac_register() return an error if it doesn't hit
in the whitelist as it is the only sensible thing to do in that
situation.

Furthermore, move the call to it to happen last in ghes_probe() so that
GHES initializing properly does not depend on ghes_edac init at all
as latter is only reporting errors and not required for GHES's proper
functioning.

Reviewed-by: Toshi Kani <toshi.kani@hpe.com>
Tested-by: Sughosh Ganu <sughosh.ganu@arm.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20180420182015.zao3olss4tvvlxki@agluck-desk
2018-05-02 13:57:30 +02:00
Linus Torvalds
dd972f924d * Add NVDIMM support to EDAC (Tony Luck)
* misc fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAlrDZ6EACgkQEsHwGGHe
 VUpWWA//RG9WqqdXxPa05TxiRFWMiNSH/HSGJQZsc3K7xm2TEGVdDv5thYsxXCaW
 kA0oiUJA3zd62HeUIWMeGypm+hLM8T73836aIKcfxMduxQ6DR9nk51IV4RmHsLvT
 rd4bUsrjCov6spapqS265LOxZVnpKvxZv0NVTj6pIk0pkeghWMZh63vZtrxVkLsS
 6dHsdIDBr0fMhcFBVQ67SEDpMIyMAqziPOP9sv6NNElG4Q93RnZr6KYvQdE32Iev
 Z6lIUU/sjVJEuhgp6FjJrhvjc+FmGJGERsFqo3Z8RT9P26Cj3wq0ov8IY/KSBazG
 mfJFGbL+O3lppGKjjuAwMGr4R3fZMLQpgLxdYe7VNvOHB9ea7732q9MDGiNkN/u7
 z8o+dtvmPWr0Y1ir8te+7LnSQXIAwxMkKP/lMStzzuFUETQDd0RWr7GAKlbrgPOU
 +DhiQRPBFS7qpslA7BdqV6Ybr3nhDVZciDYWsvNpDMwdem05TxlFwC5l/BzP9NtM
 FmQ3B0rQ5we6gib6UZWUjW3BH9wUgFUpu/WXKJl1unMHwYNe3CvdM8yD5W7mUylh
 6SVYAmkZyuJ1Du40+YMeimroe77XV60IqoUFNEzMSFeB4/GxMG6+u2gUqNuTpHpZ
 aMmCRj9DxWGxZyDsTc49An0X/JbVyIE/Aq6heMTYScgBvCaD1VE=
 =K89j
 -----END PGP SIGNATURE-----

Merge tag 'edac_for_4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp

Pull EDAC updates from Borislav Petkov:
 "Noteworthy is the NVDIMM support:

   - NVDIMM support to EDAC (Tony Luck)

   - misc fixes"

* tag 'edac_for_4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
  EDAC, sb_edac: Remove variable length array usage
  EDAC, skx_edac: Detect non-volatile DIMMs
  firmware, DMI: Add function to look up a handle and return DIMM size
  acpi, nfit: Add function to look up nvdimm device and provide SMBIOS handle
  EDAC: Add new memory type for non-volatile DIMMs
  EDAC: Drop duplicated array of strings for memory type names
  EDAC, layerscape: Allow building for LS1021A
2018-04-05 14:21:13 -07:00
Linus Torvalds
f5a8eb632b arch: remove obsolete architecture ports
This removes the entire architecture code for blackfin, cris, frv, m32r,
 metag, mn10300, score, and tile, including the associated device drivers.
 
 I have been working with the (former) maintainers for each one to ensure
 that my interpretation was right and the code is definitely unused in
 mainline kernels. Many had fond memories of working on the respective
 ports to start with and getting them included in upstream, but also saw
 no point in keeping the port alive without any users.
 
 In the end, it seems that while the eight architectures are extremely
 different, they all suffered the same fate: There was one company
 in charge of an SoC line, a CPU microarchitecture and a software
 ecosystem, which was more costly than licensing newer off-the-shelf
 CPU cores from a third party (typically ARM, MIPS, or RISC-V). It seems
 that all the SoC product lines are still around, but have not used the
 custom CPU architectures for several years at this point. In contrast,
 CPU instruction sets that remain popular and have actively maintained
 kernel ports tend to all be used across multiple licensees.
 
 The removal came out of a discussion that is now documented at
 https://lwn.net/Articles/748074/. Unlike the original plans, I'm not
 marking any ports as deprecated but remove them all at once after I made
 sure that they are all unused. Some architectures (notably tile, mn10300,
 and blackfin) are still being shipped in products with old kernels,
 but those products will never be updated to newer kernel releases.
 
 After this series, we still have a few architectures without mainline
 gcc support:
 
 - unicore32 and hexagon both have very outdated gcc releases, but the
   maintainers promised to work on providing something newer. At least
   in case of hexagon, this will only be llvm, not gcc.
 
 - openrisc, risc-v and nds32 are still in the process of finishing their
   support or getting it added to mainline gcc in the first place.
   They all have patched gcc-7.3 ports that work to some degree, but
   complete upstream support won't happen before gcc-8.1. Csky posted
   their first kernel patch set last week, their situation will be similar.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJawdL2AAoJEGCrR//JCVInuH0P/RJAZh1nTD+TR34ZhJq2TBoo
 PgygwDU7Z2+tQVU+EZ453Gywz9/NMRFk1RWAZqrLix4ZtyIMvC6A1qfT2yH1Y7Fb
 Qh6tccQeLe4ezq5u4S/46R/fQXu3Txr92yVwzJJUuPyU0arF9rv5MmI8e6p7L1en
 yb74kSEaCe+/eMlsEj1Cc1dgthDNXGKIURHkRsILoweysCpesjiTg4qDcL+yTibV
 FP2wjVbniKESMKS6qL71tiT5sexvLsLwMNcGiHPj94qCIQuI7DLhLdBVsL5Su6gI
 sbtgv0dsq4auRYAbQdMaH1hFvu6WptsuttIbOMnz2Yegi2z28H8uVXkbk2WVLbqG
 ZESUwutGh8MzOL2RJ4jyyQq5sfo++CRGlfKjr6ImZRv03dv0pe/W85062cK5cKNs
 cgDDJjGRorOXW7dyU6jG2gRqODOQBObIv3w5efdq5OgzOWlbI4EC+Y5u1Z0JF/76
 pSwtGXA6YhwC+9LLAlnVTHG+yOwuLmAICgoKcTbzTVDKA2YQZG/cYuQfI5S1wD8e
 X6urPx3Md2GCwLXQ9mzKBzKZUpu/Tuhx0NvwF4qVxy6x1PELjn68zuP7abDHr46r
 57/09ooVN+iXXnEGMtQVS/OPvYHSa2NgTSZz6Y86lCRbZmUOOlK31RDNlMvYNA+s
 3iIVHovno/JuJnTOE8LY
 =fQ8z
 -----END PGP SIGNATURE-----

Merge tag 'arch-removal' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic

Pul removal of obsolete architecture ports from Arnd Bergmann:
 "This removes the entire architecture code for blackfin, cris, frv,
  m32r, metag, mn10300, score, and tile, including the associated device
  drivers.

  I have been working with the (former) maintainers for each one to
  ensure that my interpretation was right and the code is definitely
  unused in mainline kernels. Many had fond memories of working on the
  respective ports to start with and getting them included in upstream,
  but also saw no point in keeping the port alive without any users.

  In the end, it seems that while the eight architectures are extremely
  different, they all suffered the same fate: There was one company in
  charge of an SoC line, a CPU microarchitecture and a software
  ecosystem, which was more costly than licensing newer off-the-shelf
  CPU cores from a third party (typically ARM, MIPS, or RISC-V). It
  seems that all the SoC product lines are still around, but have not
  used the custom CPU architectures for several years at this point. In
  contrast, CPU instruction sets that remain popular and have actively
  maintained kernel ports tend to all be used across multiple licensees.

  [ See the new nds32 port merged in the previous commit for the next
    generation of "one company in charge of an SoC line, a CPU
    microarchitecture and a software ecosystem"   - Linus ]

  The removal came out of a discussion that is now documented at
  https://lwn.net/Articles/748074/. Unlike the original plans, I'm not
  marking any ports as deprecated but remove them all at once after I
  made sure that they are all unused. Some architectures (notably tile,
  mn10300, and blackfin) are still being shipped in products with old
  kernels, but those products will never be updated to newer kernel
  releases.

  After this series, we still have a few architectures without mainline
  gcc support:

   - unicore32 and hexagon both have very outdated gcc releases, but the
     maintainers promised to work on providing something newer. At least
     in case of hexagon, this will only be llvm, not gcc.

   - openrisc, risc-v and nds32 are still in the process of finishing
     their support or getting it added to mainline gcc in the first
     place. They all have patched gcc-7.3 ports that work to some
     degree, but complete upstream support won't happen before gcc-8.1.
     Csky posted their first kernel patch set last week, their situation
     will be similar

  [ Palmer Dabbelt points out that RISC-V support is in mainline gcc
    since gcc-7, although gcc-7.3.0 is the recommended minimum  - Linus ]"

This really says it all:

 2498 files changed, 95 insertions(+), 467668 deletions(-)

* tag 'arch-removal' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: (74 commits)
  MAINTAINERS: UNICORE32: Change email account
  staging: iio: remove iio-trig-bfin-timer driver
  tty: hvc: remove tile driver
  tty: remove bfin_jtag_comm and hvc_bfin_jtag drivers
  serial: remove tile uart driver
  serial: remove m32r_sio driver
  serial: remove blackfin drivers
  serial: remove cris/etrax uart drivers
  usb: Remove Blackfin references in USB support
  usb: isp1362: remove blackfin arch glue
  usb: musb: remove blackfin port
  usb: host: remove tilegx platform glue
  pwm: remove pwm-bfin driver
  i2c: remove bfin-twi driver
  spi: remove blackfin related host drivers
  watchdog: remove bfin_wdt driver
  can: remove bfin_can driver
  mmc: remove bfin_sdh driver
  input: misc: remove blackfin rotary driver
  input: keyboard: remove bf54x driver
  ...
2018-04-02 20:20:12 -07:00
Arnd Bergmann
0833f7634f edac: remove tile driver
The Tile architecture is obsolete and getting removed from the kernel,
this driver appears to only be used there, and not on the ARM based
successors (Tile-Mx, BlueField), so we should remove it as well.

Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2018-03-26 15:56:17 +02:00
Gustavo A. R. Silva
6fd0526652 EDAC, sb_edac: Remove variable length array usage
In preparation for enabling -Wvla, remove VLA and replace it with a
fixed-length array instead.

Also, remove max_interleave as it is no longer needed.

Reviewed-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20180314182131.GA25259@embeddedgus
Signed-off-by: Borislav Petkov <bp@suse.de>
2018-03-17 05:24:55 +01:00
Tony Luck
58ca9ac146 EDAC, skx_edac: Detect non-volatile DIMMs
This just covers the topology function of the EDAC driver. We locate
which DIMM slots are populated with NVDIMMs and query the NFIT and
SMBIOS tables to get the size.

Reviewed-by: Jean Delvare <jdelvare@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Cc: Aristeu Rozanski <aris@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: linux-acpi@vger.kernel.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: linux-nvdimm@lists.01.org
Link: http://lkml.kernel.org/r/20180312182430.10335-6-tony.luck@intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2018-03-15 00:33:55 +01:00
Tony Luck
001f86137d EDAC: Add new memory type for non-volatile DIMMs
There are now non-volatile versions of DIMMs. Add a new entry to "enum
mem_type" and a new string in edac_mem_types[].

Signed-off-by: Tony Luck <tony.luck@intel.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Aristeu Rozanski <aris@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jean Delvare <jdelvare@suse.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: linux-acpi@vger.kernel.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: linux-nvdimm@lists.01.org
Link: http://lkml.kernel.org/r/20180312182430.10335-3-tony.luck@intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2018-03-14 12:32:06 +01:00
Tony Luck
d6dd77ebcd EDAC: Drop duplicated array of strings for memory type names
Somehow we ended up with two separate arrays of strings to describe the
"enum mem_type" values.

In edac_mc.c we have an exported list edac_mem_types[] that is used
by a couple of drivers in debug messaged.

In edac_mc_sysfs.c we have a private list that is used to display
values in:
  /sys/devices/system/edac/mc/mc*/dimm*/dimm_mem_type
  /sys/devices/system/edac/mc/mc*/csrow*/mem_type

This list was missing a value for MEM_LRDDR3.

The string values in the two lists were different :-(

Combining the lists, I kept the values so that the sysfs output
will be unchanged as some scripts may depend on that.

Reported-by: Borislav Petkov <bp@suse.de>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Aristeu Rozanski <aris@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jean Delvare <jdelvare@suse.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: linux-acpi@vger.kernel.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: linux-nvdimm@lists.01.org
Link: http://lkml.kernel.org/r/20180312182430.10335-2-tony.luck@intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2018-03-14 12:20:16 +01:00
Thomas Gleixner
422caa5f7a Merge branch 'ras/urgent' into ras/core
Pick up urgent fixes to apply further development changes.
2018-03-08 15:52:08 +01:00
Rasmus Villemoes
28dd6726e7 EDAC, layerscape: Allow building for LS1021A
The LS1021A has a memory controller supported by this driver. It builds
just fine, and I've done some rudimentary testing using the error
injection facility, which suggests that it is indeed working.

Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Acked-by: York Sun <york.sun@nxp.com>
Cc: Alexander Stein <alexander.stein@systec-electronic.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20180220150912.2954-1-rasmus.villemoes@prevas.dk
Signed-off-by: Borislav Petkov <bp@suse.de>
2018-02-27 14:57:47 +01:00
Anna Karbownik
bf8486709a EDAC, sb_edac: Fix out of bound writes during DIMM configuration on KNL
Commit

  3286d3eb90 ("EDAC, sb_edac: Drop NUM_CHANNELS from 8 back to 4")

decreased NUM_CHANNELS from 8 to 4, but this is not enough for Knights
Landing which supports up to 6 channels.

This caused out-of-bounds writes to pvt->mirror_mode and pvt->tolm
variables which don't pay critical role on KNL code path, so the memory
corruption wasn't causing any visible driver failures.

The easiest way of fixing it is to change NUM_CHANNELS to 6. Do that.

An alternative solution would be to restructure the KNL part of the
driver to 2MC/3channel representation.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Anna Karbownik <anna.karbownik@intel.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: jim.m.snow@intel.com
Cc: krzysztof.paliswiat@intel.com
Cc: lukasz.odzioba@intel.com
Cc: qiuxu.zhuo@intel.com
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: <stable@vger.kernel.org>
Fixes: 3286d3eb90 ("EDAC, sb_edac: Drop NUM_CHANNELS from 8 back to 4")
Link: http://lkml.kernel.org/r/1519312693-4789-1-git-send-email-anna.karbownik@intel.com
[ Massage commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2018-02-23 12:05:37 +01:00
Yazen Ghannam
68627a697c x86/mce/AMD, EDAC/mce_amd: Enumerate Reserved SMCA bank type
Currently, bank 4 is reserved on Fam17h, so we chose not to initialize
bank 4 in the smca_banks array. This means that when we check if a bank
is initialized, like during boot or resume, we will see that bank 4 is
not initialized and try to initialize it.

This will cause a call trace, when resuming from suspend, due to
rdmsr_*on_cpu() calls in the init path. The rdmsr_*on_cpu() calls issue
an IPI but we're running with interrupts disabled. This triggers:

  WARNING: CPU: 0 PID: 11523 at kernel/smp.c:291 smp_call_function_single+0xdc/0xe0
  ...

Reserved banks will be read-as-zero, so their MCA_IPID register will be
zero. So, like the smca_banks array, the threshold_banks array will not
have an entry for a reserved bank since all its MCA_MISC* registers will
be zero.

Enumerate a "Reserved" bank type that matches on a HWID_MCATYPE of 0,0.

Use the "Reserved" type when checking if a bank is reserved. It's
possible that other bank numbers may be reserved on future systems.

Don't try to find the block address on reserved banks.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org> # 4.14.x
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20180221101900.10326-7-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-21 17:00:54 +01:00
Linus Torvalds
d4667ca142 Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 PTI and Spectre related fixes and updates from Ingo Molnar:
 "Here's the latest set of Spectre and PTI related fixes and updates:

  Spectre:
   - Add entry code register clearing to reduce the Spectre attack
     surface
   - Update the Spectre microcode blacklist
   - Inline the KVM Spectre helpers to get close to v4.14 performance
     again.
   - Fix indirect_branch_prediction_barrier()
   - Fix/improve Spectre related kernel messages
   - Fix array_index_nospec_mask() asm constraint
   - KVM: fix two MSR handling bugs

  PTI:
   - Fix a paranoid entry PTI CR3 handling bug
   - Fix comments

  objtool:
   - Fix paranoid_entry() frame pointer warning
   - Annotate WARN()-related UD2 as reachable
   - Various fixes
   - Add Add Peter Zijlstra as objtool co-maintainer

  Misc:
   - Various x86 entry code self-test fixes
   - Improve/simplify entry code stack frame generation and handling
     after recent heavy-handed PTI and Spectre changes. (There's two
     more WIP improvements expected here.)
   - Type fix for cache entries

  There's also some low risk non-fix changes I've included in this
  branch to reduce backporting conflicts:

   - rename a confusing x86_cpu field name
   - de-obfuscate the naming of single-TLB flushing primitives"

* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (41 commits)
  x86/entry/64: Fix CR3 restore in paranoid_exit()
  x86/cpu: Change type of x86_cache_size variable to unsigned int
  x86/spectre: Fix an error message
  x86/cpu: Rename cpu_data.x86_mask to cpu_data.x86_stepping
  selftests/x86/mpx: Fix incorrect bounds with old _sigfault
  x86/mm: Rename flush_tlb_single() and flush_tlb_one() to __flush_tlb_one_[user|kernel]()
  x86/speculation: Add <asm/msr-index.h> dependency
  nospec: Move array_index_nospec() parameter checking into separate macro
  x86/speculation: Fix up array_index_nospec_mask() asm constraint
  x86/debug: Use UD2 for WARN()
  x86/debug, objtool: Annotate WARN()-related UD2 as reachable
  objtool: Fix segfault in ignore_unreachable_insn()
  selftests/x86: Disable tests requiring 32-bit support on pure 64-bit systems
  selftests/x86: Do not rely on "int $0x80" in single_step_syscall.c
  selftests/x86: Do not rely on "int $0x80" in test_mremap_vdso.c
  selftests/x86: Fix build bug caused by the 5lvl test which has been moved to the VM directory
  selftests/x86/pkeys: Remove unused functions
  selftests/x86: Clean up and document sscanf() usage
  selftests/x86: Fix vDSO selftest segfault for vsyscall=none
  x86/entry/64: Remove the unused 'icebp' macro
  ...
2018-02-14 17:02:15 -08:00
Jia Zhang
b399151cb4 x86/cpu: Rename cpu_data.x86_mask to cpu_data.x86_stepping
x86_mask is a confusing name which is hard to associate with the
processor's stepping.

Additionally, correct an indent issue in lib/cpu.c.

Signed-off-by: Jia Zhang <qianyue.zj@alibaba-inc.com>
[ Updated it to more recent kernels. ]
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bp@alien8.de
Cc: tony.luck@intel.com
Link: http://lkml.kernel.org/r/1514771530-70829-1-git-send-email-qianyue.zj@alibaba-inc.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-15 01:15:52 +01:00
Christophe JAILLET
68fa24f912 EDAC, mv64x60: Fix an error handling path
We should not call edac_mc_del_mc() if a corresponding call to
edac_mc_add_mc() has not been performed yet.

So here, we should go to err instead of err2 to branch at the right
place of the error handling path.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20180107205400.14068-1-christophe.jaillet@wanadoo.fr
Signed-off-by: Borislav Petkov <bp@suse.de>
2018-01-09 20:14:23 +01:00
Tero Kristo
86a18ee21e EDAC, ti: Add support for TI keystone and DRA7xx EDAC
TI Keystone and DRA7xx SoCs have support for EDAC on DDR3 memory that can
correct one bit errors and detect two bit errors. Add EDAC driver for this
feature which plugs into the generic kernel EDAC framework.

Signed-off-by: Tero Kristo <t-kristo@ti.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: linux-omap@vger.kernel.org
Link: http://lkml.kernel.org/r/1510578490-14510-1-git-send-email-t-kristo@ti.com
[ Add SPDX tag and make _emif_get_id() use edac_printk(). ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-11-27 13:51:19 +01:00
James Hogan
544e92581a EDAC, octeon: Fix an uninitialized variable warning
Fix an uninitialized variable warning in the Octeon EDAC driver, as seen
in MIPS cavium_octeon_defconfig builds since v4.14 with Codescape GNU
Tools 2016.05-03:

  drivers/edac/octeon_edac-lmc.c In function ‘octeon_lmc_edac_poll_o2’:
  drivers/edac/octeon_edac-lmc.c:87:24: warning: ‘((long unsigned int*)&int_reg)[1]’ may \
    be used uninitialized in this function [-Wmaybe-uninitialized]
    if (int_reg.s.sec_err || int_reg.s.ded_err) {
                        ^
Iinitialise the whole int_reg variable to zero before the conditional
assignments in the error injection case.

Signed-off-by: James Hogan <jhogan@kernel.org>
Acked-by: David Daney <david.daney@cavium.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: linux-mips@linux-mips.org
Cc: <stable@vger.kernel.org> # 3.15+
Fixes: 1bc021e815 ("EDAC: Octeon: Add error injection support")
Link: http://lkml.kernel.org/r/20171113161206.20990-1-james.hogan@mips.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-11-27 11:57:26 +01:00
Linus Torvalds
1be2172e96 Modules updates for v4.15
Summary of modules changes for the 4.15 merge window:
 
 - Treewide module_param_call() cleanup, fix up set/get function
   prototype mismatches, from Kees Cook
 
 - Minor code cleanups
 
 Signed-off-by: Jessica Yu <jeyu@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCgAGBQJaDCyzAAoJEMBFfjjOO8FyaYQP/AwHBy6XmwwVlWDP4BqIF6hL
 Vhy3ccVLYEORvePv68tWSRPUz5n6+1Ebqanmwtkw6i8l+KwxY2SfkZql09cARc33
 2iBE4bHF98iWQmnJbF6me80fedY9n5bZJNMQKEF9VozJWwTMOTQFTCfmyJRDBmk9
 iidQj6M3idbSUOYIJjvc40VGx5NyQWSr+FFfqsz1rU5iLGRGEvA3I2/CDT0oTuV6
 D4MmFxzE2Tv/vIMa2GzKJ1LGScuUfSjf93Lq9Kk0cG36qWao8l930CaXyVdE9WJv
 bkUzpf3QYv/rDX6QbAGA0cada13zd+dfBr8YhchclEAfJ+GDLjMEDu04NEmI6KUT
 5lP0Xw0xYNZQI7bkdxDMhsj5jaz/HJpXCjPCtZBnSEKiL4OPXVMe+pBHoCJ2/yFN
 6M716XpWYgUviUOdiE+chczB5p3z4FA6u2ykaM4Tlk0btZuHGxjcSWwvcIdlPmjm
 kY4AfDV6K0bfEBVguWPJicvrkx44atqT5nWbbPhDwTSavtsuRJLb3GCsHedx7K8h
 ZO47lCQFAWCtrycK1HYw+oupNC3hYWQ0SR42XRdGhL1bq26C+1sei1QhfqSgA9PQ
 7CwWH4UTOL9fhtrzSqZngYOh9sjQNFNefqQHcecNzcEjK2vjrgQZvRNWZKHSwaFs
 fbGX8juZWP4ypbK+irTB
 =c8vb
 -----END PGP SIGNATURE-----

Merge tag 'modules-for-v4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux

Pull module updates from Jessica Yu:
 "Summary of modules changes for the 4.15 merge window:

   - treewide module_param_call() cleanup, fix up set/get function
     prototype mismatches, from Kees Cook

   - minor code cleanups"

* tag 'modules-for-v4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
  module: Do not paper over type mismatches in module_param_call()
  treewide: Fix function prototypes for module_param_call()
  module: Prepare to convert all module_param_call() prototypes
  kernel/module: Delete an error message for a failed memory allocation in add_module_usage()
2017-11-15 13:46:33 -08:00
Linus Torvalds
8e9a2dba86 Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core locking updates from Ingo Molnar:
 "The main changes in this cycle are:

   - Another attempt at enabling cross-release lockdep dependency
     tracking (automatically part of CONFIG_PROVE_LOCKING=y), this time
     with better performance and fewer false positives. (Byungchul Park)

   - Introduce lockdep_assert_irqs_enabled()/disabled() and convert
     open-coded equivalents to lockdep variants. (Frederic Weisbecker)

   - Add down_read_killable() and use it in the VFS's iterate_dir()
     method. (Kirill Tkhai)

   - Convert remaining uses of ACCESS_ONCE() to
     READ_ONCE()/WRITE_ONCE(). Most of the conversion was Coccinelle
     driven. (Mark Rutland, Paul E. McKenney)

   - Get rid of lockless_dereference(), by strengthening Alpha atomics,
     strengthening READ_ONCE() with smp_read_barrier_depends() and thus
     being able to convert users of lockless_dereference() to
     READ_ONCE(). (Will Deacon)

   - Various micro-optimizations:

        - better PV qspinlocks (Waiman Long),
        - better x86 barriers (Michael S. Tsirkin)
        - better x86 refcounts (Kees Cook)

   - ... plus other fixes and enhancements. (Borislav Petkov, Juergen
     Gross, Miguel Bernal Marin)"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (70 commits)
  locking/x86: Use LOCK ADD for smp_mb() instead of MFENCE
  rcu: Use lockdep to assert IRQs are disabled/enabled
  netpoll: Use lockdep to assert IRQs are disabled/enabled
  timers/posix-cpu-timers: Use lockdep to assert IRQs are disabled/enabled
  sched/clock, sched/cputime: Use lockdep to assert IRQs are disabled/enabled
  irq_work: Use lockdep to assert IRQs are disabled/enabled
  irq/timings: Use lockdep to assert IRQs are disabled/enabled
  perf/core: Use lockdep to assert IRQs are disabled/enabled
  x86: Use lockdep to assert IRQs are disabled/enabled
  smp/core: Use lockdep to assert IRQs are disabled/enabled
  timers/hrtimer: Use lockdep to assert IRQs are disabled/enabled
  timers/nohz: Use lockdep to assert IRQs are disabled/enabled
  workqueue: Use lockdep to assert IRQs are disabled/enabled
  irq/softirqs: Use lockdep to assert IRQs are disabled/enabled
  locking/lockdep: Add IRQs disabled/enabled assertion APIs: lockdep_assert_irqs_enabled()/disabled()
  locking/pvqspinlock: Implement hybrid PV queued/unfair locks
  locking/rwlocks: Fix comments
  x86/paravirt: Set up the virt_spin_lock_key after static keys get initialized
  block, locking/lockdep: Assign a lock_class per gendisk used for wait_for_completion()
  workqueue: Remove now redundant lock acquisitions wrt. workqueue flushes
  ...
2017-11-13 12:38:26 -08:00
Linus Torvalds
1ec1699122 The usual pile of bugfixes, cleanups and minor driver enhancements.
Worth noting are the changes to ghes_edac to use a whitelist of
 known-good platforms on which GHES error reporting works relatively
 reliably. By Toshi Kani and Borislav Petkov.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAloIt2sACgkQEsHwGGHe
 VUpzmw/+PETRJYIwp9TnmLHsNjVA/TPFHmjQqYc+jR9sWzTdrsZqI1JbP6GNC4l6
 0bMdcyCO5J57F8xLKkgYC5sv24m1vL8MKx0+sQSI+EXupCyqDhhclU3c2rfSj0Um
 +qpPV4MnewqJNZlenKKHC+R2wADSsijIpIiMeV682/8HCf0EB68P5m6NRMSqL+hz
 IVOD2xAyn+3dDw4ZazJgra5xSVt+xJdt+x0/bvovPapzvF81b+PCB4XXVm4J5d4f
 gi0AwDbmyLpMx2HK5vLT5JgcbGg6aUS7OhwcZLSwGuWJTTsYM5uEk+F+dj3pxNge
 hk0JSeCLm0S4f60p496VuiweIOrkk4feDbz+RAK9GJaR270K621Zb/06jCybB6aB
 htR/VQS50BgCyehYzSWw9VAOcsEvNPHKpYSGkrDwPRkFZOxfigdnKb4MN8XbGund
 bpQqx6VlgPhV32v0F1F893IIPHrcuc7qFo4KJPSGnP58+iBOeJ6iy2vlUdtMIfyL
 opjS0nqg7B6pXCcANLFRWV0iPTH4e1PWAMGSSxp64cqjuEmHPtlvQGinevBq56Ob
 HpxbtnOqen14xxMulkPPHBw3yKWQA0/teAGcSn2D5P/Qv0X3aTGsifr3Ruk/Cl0E
 wQaGBNcn+5xJ8nhBS7GdUDdyfE5o9zW8wjw3eZEphREoOM3N8ko=
 =XEah
 -----END PGP SIGNATURE-----

Merge tag 'edac_for_4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp

Pull EDAC updates from Borislav Petkov:
 "The usual pile of bugfixes, cleanups and minor driver enhancements.

  Worth noting are the changes to ghes_edac to use a whitelist of
  known-good platforms on which GHES error reporting works relatively
  reliably. By Toshi Kani and Borislav Petkov"

* tag 'edac_for_4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
  EDAC, sb_edac: Fix missing break in switch
  MAINTAINERS: Split Cavium EDAC entry and add myself
  EDAC, sb_edac: Fix missing DIMM sysfs entries with KNL SNC2/SNC4 mode
  EDAC, skx_edac: Handle systems with segmented PCI busses
  EDAC, thunderx: Remove suspend/resume support
  EDAC, skx_edac: Fix detection of single-rank DIMMs
  EDAC, sb_edac: Don't create a second memory controller if HA1 is not present
  EDAC: Add owner check to the x86 platform drivers
  EDAC: Add helper which returns the loaded platform driver
  EDAC, ghes: Add platform check
  EDAC, ghes: Model a single, logical memory controller
  EDAC, ghes: Remove symbol exports
  EDAC: Handle return value of kasprintf()
2017-11-13 08:54:06 -08:00
Ingo Molnar
8c5db92a70 Merge branch 'linus' into locking/core, to resolve conflicts
Conflicts:
	include/linux/compiler-clang.h
	include/linux/compiler-gcc.h
	include/linux/compiler-intel.h
	include/uapi/linux/stddef.h

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-07 10:32:44 +01:00
Greg Kroah-Hartman
b24413180f License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier.  The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
 - file had no licensing information it it.
 - file was a */uapi/* one with no licensing information in it,
 - file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne.  Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed.  Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
 - Files considered eligible had to be source code files.
 - Make and config files were included as candidates if they contained >5
   lines of source
 - File already had some variant of a license header in it (even if <5
   lines).

All documentation files were explicitly excluded.

The following heuristics were used to determine which SPDX license
identifiers to apply.

 - when both scanners couldn't find any license traces, file was
   considered to have no license information in it, and the top level
   COPYING file license applied.

   For non */uapi/* files that summary was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0                                              11139

   and resulted in the first patch in this series.

   If that file was a */uapi/* path one, it was "GPL-2.0 WITH
   Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0 WITH Linux-syscall-note                        930

   and resulted in the second patch in this series.

 - if a file had some form of licensing information in it, and was one
   of the */uapi/* ones, it was denoted with the Linux-syscall-note if
   any GPL family license was found in the file or had no licensing in
   it (per prior point).  Results summary:

   SPDX license identifier                            # files
   ---------------------------------------------------|------
   GPL-2.0 WITH Linux-syscall-note                       270
   GPL-2.0+ WITH Linux-syscall-note                      169
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
   LGPL-2.1+ WITH Linux-syscall-note                      15
   GPL-1.0+ WITH Linux-syscall-note                       14
   ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
   LGPL-2.0+ WITH Linux-syscall-note                       4
   LGPL-2.1 WITH Linux-syscall-note                        3
   ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
   ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1

   and that resulted in the third patch in this series.

 - when the two scanners agreed on the detected license(s), that became
   the concluded license(s).

 - when there was disagreement between the two scanners (one detected a
   license but the other didn't, or they both detected different
   licenses) a manual inspection of the file occurred.

 - In most cases a manual inspection of the information in the file
   resulted in a clear resolution of the license that should apply (and
   which scanner probably needed to revisit its heuristics).

 - When it was not immediately clear, the license identifier was
   confirmed with lawyers working with the Linux Foundation.

 - If there was any question as to the appropriate license identifier,
   the file was flagged for further research and to be revisited later
   in time.

In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.

Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights.  The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.

Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.

In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.

Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
 - a full scancode scan run, collecting the matched texts, detected
   license ids and scores
 - reviewing anything where there was a license detected (about 500+
   files) to ensure that the applied SPDX license was correct
 - reviewing anything where there was no detection but the patch license
   was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
   SPDX license was correct

This produced a worksheet with 20 files needing minor correction.  This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.

These .csv files were then reviewed by Greg.  Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected.  This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.)  Finally Greg ran the script using the .csv files to
generate the patches.

Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-02 11:10:55 +01:00
Kees Cook
e4dca7b7aa treewide: Fix function prototypes for module_param_call()
Several function prototypes for the set/get functions defined by
module_param_call() have a slightly wrong argument types. This fixes
those in an effort to clean up the calls when running under type-enforced
compiler instrumentation for CFI. This is the result of running the
following semantic patch:

@match_module_param_call_function@
declarer name module_param_call;
identifier _name, _set_func, _get_func;
expression _arg, _mode;
@@

 module_param_call(_name, _set_func, _get_func, _arg, _mode);

@fix_set_prototype
 depends on match_module_param_call_function@
identifier match_module_param_call_function._set_func;
identifier _val, _param;
type _val_type, _param_type;
@@

 int _set_func(
-_val_type _val
+const char * _val
 ,
-_param_type _param
+const struct kernel_param * _param
 ) { ... }

@fix_get_prototype
 depends on match_module_param_call_function@
identifier match_module_param_call_function._get_func;
identifier _val, _param;
type _val_type, _param_type;
@@

 int _get_func(
-_val_type _val
+char * _val
 ,
-_param_type _param
+const struct kernel_param * _param
 ) { ... }

Two additional by-hand changes are included for places where the above
Coccinelle script didn't notice them:

	drivers/platform/x86/thinkpad_acpi.c
	fs/lockd/svc.c

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
2017-10-31 15:30:37 +01:00
Mark Rutland
332efa6374 locking/atomics, EDAC/altera: Convert ACCESS_ONCE() to READ_ONCE()/WRITE_ONCE()
For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
preference to ACCESS_ONCE(), and new code is expected to use one of the
former. So far, there's been no reason to change most existing uses of
ACCESS_ONCE(), as these aren't currently harmful.

However, for some features it is necessary to instrument reads and
writes separately, which is not possible with ACCESS_ONCE(). This
distinction is critical to correct operation.

It's possible to transform the bulk of kernel code using the Coccinelle
script below. However, this doesn't handle comments, leaving references
to ACCESS_ONCE() instances which have been removed. As a preparatory
step, this patch converts the Altera EDAC code and comments to use
{READ,WRITE}_ONCE() consistently.

----
virtual patch

@ depends on patch @
expression E1, E2;
@@

- ACCESS_ONCE(E1) = E2
+ WRITE_ONCE(E1, E2)

@ depends on patch @
expression E;
@@

- ACCESS_ONCE(E)
+ READ_ONCE(E)
----

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Thor Thayer <thor.thayer@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: davem@davemloft.net
Cc: linux-arch@vger.kernel.org
Cc: mpe@ellerman.id.au
Cc: shuah@kernel.org
Cc: snitzer@redhat.com
Cc: tj@kernel.org
Cc: viro@zeniv.linux.org.uk
Cc: will.deacon@arm.com
Link: http://lkml.kernel.org/r/1508792849-3115-2-git-send-email-paulmck@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-25 11:00:56 +02:00
Gustavo A. R. Silva
a8e9b186f1 EDAC, sb_edac: Fix missing break in switch
Add missing break statement in order to prevent the code from falling
through.

Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20171016174029.GA19757@embeddedor.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-10-19 10:53:42 +02:00
Luis Felipe Sandoval Castro
24281a2f4c EDAC, sb_edac: Fix missing DIMM sysfs entries with KNL SNC2/SNC4 mode
When figuring out the size of the DIMMs and the cluster mode is SNC2 or SNC4 the
current algorithm ignores the contribution of some of the channels resulting in
EDAC never knowing of the existence of some DIMMs attached to such channels (thus
sysfs is not populated).

Instead of selectively iterating from 0 to interlv_ways when looking for all the
participants in the interleave, do an exhaustive search and iterate from 0 to
KNL_MAX_CHANNELS. The algorithm is already smart enough to consider participants
only one time.

This works fine in all KNL cluster modes and even when there are missing DIMMs
as the contribution of those channels is 0.

Signed-off-by: Luis Felipe Sandoval Castro <luis.felipe.sandoval.castro@intel.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: arozansk@redhat.com
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: qiuxu.zhuo@intel.com
Link: http://lkml.kernel.org/r/1506606882-90521-1-git-send-email-luis.felipe.sandoval.castro@intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-10-11 15:57:25 +02:00
Tony Luck
88ae80aa60 EDAC, skx_edac: Handle systems with segmented PCI busses
Large systems separate their PCI busses into segments since
the limit of only 256 PCI busses can be too restrictive.

Extend this driver to check whether <segment, bus-number> matches
when deciding how to group memory controller PCI devices to
CPU sockets.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Cc: Aristeu Rozanski <arozansk@redhat.com>
Cc: Charles Rose <charles.rose@dell.com>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/f58abfd10bf73c8bc5adc1fe4de7408128b00625.1506358467.git.tony.luck@intel.com
[ Make skx_dev.seg an int. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-09-28 18:01:55 +02:00
Jan Glauber
f821fe8cc7 EDAC, thunderx: Remove suspend/resume support
The memory controller on ThunderX/OcteonTX systems does not support
power management. Therefore remove the suspend/resume callbacks.

Signed-off-by: Jan Glauber <jglauber@cavium.com>
Cc: David Daney <david.daney@cavium.com>
Cc: Jan Glauber <jglauber@cavium.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Suzuki K Poulose <Suzuki.Poulose@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Zhangshaokun <zhangshaokun@hisilicon.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: linux-mips@linux-mips.org
Link: http://lkml.kernel.org/r/20170925123502.17289-2-jglauber@cavium.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-09-27 17:35:58 +02:00
Charles Rose
a9c0a10888 EDAC, skx_edac: Fix detection of single-rank DIMMs
Single-rank DIMMs did not get detected on Skylake/Kabylake systems due
to wrong limit check. The single rank DIMM check is a simple typo. "0"
is a legal value in this field meaning single rank.

Signed-off-by: Charles Rose <charles.rose@dell.com>
Cc: Aristeu Rozanski <arozansk@redhat.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/66df72d327c265fbf92fe25df96daa228a35f076.1506358467.git.tony.luck@intel.com
[ Also fix debug message to print number of ranks. ]
Signed-off-by: Tony Luck <tony.luck@intel.com>
[ Expand commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-09-27 17:31:38 +02:00
Qiuxu Zhuo
15cc3ae001 EDAC, sb_edac: Don't create a second memory controller if HA1 is not present
Yi Zhang reported the following failure on a 2-socket Haswell (E5-2603v3)
server (DELL PowerEdge 730xd):

  EDAC sbridge: Some needed devices are missing
  EDAC MC: Removed device 0 for sb_edac.c Haswell SrcID#0_Ha#0: DEV 0000:7f:12.0
  EDAC MC: Removed device 1 for sb_edac.c Haswell SrcID#1_Ha#0: DEV 0000:ff:12.0
  EDAC sbridge: Couldn't find mci handler
  EDAC sbridge: Couldn't find mci handler
  EDAC sbridge: Failed to register device with error -19.

The refactored sb_edac driver creates the IMC1 (the 2nd memory
controller) if any IMC1 device is present. In this case only
HA1_TA of IMC1 was present, but the driver expected to find
HA1/HA1_TM/HA1_TAD[0-3] devices too, leading to the above failure.

The document [1] says the 'E5-2603 v3' CPU has 4 memory channels max. Yi
Zhang inserted one DIMM per channel for each CPU, and did random error
address injection test with this patch:

      4024  addresses fell in TOLM hole area
     12715  addresses fell in CPU_SrcID#0_Ha#0_Chan#0_DIMM#0
     12774  addresses fell in CPU_SrcID#0_Ha#0_Chan#1_DIMM#0
     12798  addresses fell in CPU_SrcID#0_Ha#0_Chan#2_DIMM#0
     12913  addresses fell in CPU_SrcID#0_Ha#0_Chan#3_DIMM#0
     12674  addresses fell in CPU_SrcID#1_Ha#0_Chan#0_DIMM#0
     12686  addresses fell in CPU_SrcID#1_Ha#0_Chan#1_DIMM#0
     12882  addresses fell in CPU_SrcID#1_Ha#0_Chan#2_DIMM#0
     12934  addresses fell in CPU_SrcID#1_Ha#0_Chan#3_DIMM#0
    106400  addresses were injected totally.

The test result shows that all the 4 channels belong to IMC0 per CPU, so
the server really only has one IMC per CPU.

In the 1st page of chapter 2 in datasheet [2], it also says 'E5-2600 v3'
implements either one or two IMCs. For CPUs with one IMC, IMC1 is not
used and should be ignored.

Thus, do not create a second memory controller if the key HA1 is absent.

[1] http://ark.intel.com/products/83349/Intel-Xeon-Processor-E5-2603-v3-15M-Cache-1_60-GHz
[2] https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/xeon-e5-v3-datasheet-vol-2.pdf

Reported-and-tested-by: Yi Zhang <yizhan@redhat.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Fixes: e2f747b1f4 ("EDAC, sb_edac: Assign EDAC memory controller per h/w controller")
Link: http://lkml.kernel.org/r/20170913104214.7325-1-qiuxu.zhuo@intel.com
[ Massage commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-09-27 12:15:43 +02:00
Toshi Kani
301375e764 EDAC: Add owner check to the x86 platform drivers
Change x86 EDAC platform drivers to verify the module owner at the
beginning of their module init functions. This allows them to fail their
init immediately when ghes_edac is enabled. Similar change can be made
to other edac drivers if necessary.

Also, remove ".c" from module names of pnp2_edac, sb_edac, and skx_edac.

Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Suggested-by: Borislav Petkov <bp@alien8.de>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170823225447.15608-6-toshi.kani@hpe.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-09-25 13:09:39 +02:00
Toshi Kani
3877c7d1e2 EDAC: Add helper which returns the loaded platform driver
Only a single EDAC platform driver can be loaded. When ghes_edac is
enabled, an EDAC platform driver still attempts to register itself and
fails in edac_mc_add_mc().

Add edac_get_owner() so that EDAC platform drivers can check the owner
first.

Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Suggested-by: Borislav Petkov <bp@alien8.de>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170823225447.15608-5-toshi.kani@hpe.com
[ Massage commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-09-25 12:55:59 +02:00
Toshi Kani
5deed6b6a4 EDAC, ghes: Add platform check
The ghes_edac driver was introduced in 2013 [1], but it has not been
enabled by any distro yet. This driver obtains error info from firmware
interfaces (APEI), which are not properly implemented on many platforms,
as the driver says on load:

  This EDAC driver relies on BIOS to enumerate memory and get error
  reports. Unfortunately, not all BIOSes reflect the memory layout
  correctly. So, the end result of using this driver varies from vendor
  to vendor. If you find incorrect reports, please contact your hardware
  vendor to correct its BIOS.

To get out from this situation, add a platform check to selectively
enable the driver on platforms that are known to have proper APEI
firmware implementation.

"ghes_edac.force_load=1" skips this platform check.

[1]: https://lkml.kernel.org/r/cover.1360931635.git.mchehab@redhat.com

Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-acpi@vger.kernel.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170823225447.15608-4-toshi.kani@hpe.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-09-25 12:55:12 +02:00
Borislav Petkov
0fe5f281f7 EDAC, ghes: Model a single, logical memory controller
We're enumerating the DIMMs through a DMI walk and since we can't get
any more detailed topological information about which DIMMs belong to
which memory controller, convert it to a single, logical controller
which contains all the DIMMs.

The error reporting path from GHES ghes_edac_report_mem_error() doesn't
get called in NMI context but add a warning about it to catch any
changes in the future as if so, our locking scheme will be insufficient
then.

Signed-off-by: Borislav Petkov <bp@suse.de>
2017-09-25 12:35:28 +02:00
Borislav Petkov
c9c8b4d6d0 EDAC, ghes: Remove symbol exports
They're called from builtin code so no need for the exports.

Signed-off-by: Borislav Petkov <bp@suse.de>
2017-09-25 12:35:27 +02:00
Arvind Yadav
75f029c3a8 EDAC: Handle return value of kasprintf()
kasprintf() can fail and we must check its return value.

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Cc: linux-edac@vger.kernel.org
[ Merged into a single patch, small formatting fixups. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-09-21 12:18:44 +02:00
Borislav Petkov
398443471f EDAC, mce_amd: Get rid of local var in amd_filter_mce()
... and use the macro for that.

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
2017-08-21 17:59:38 +02:00
Borislav Petkov
f3c0891c2f EDAC, mce_amd: Get rid of most struct cpuinfo_x86 uses
struct mce.cpuid contains CPUID(1).EAX which contains family, model and
stepping and thus has enough information for our purposes. Thus get rid
of some external dependencies which are not really needed.

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
2017-08-21 17:54:57 +02:00
Borislav Petkov
4ab1784b48 EDAC, mce_amd: Rename decode_smca_errors() to decode_smca_error()
Singular fits better because it decodes a single error.

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
2017-08-21 17:44:09 +02:00
Bhumika Goyal
b2b3e7362e EDAC: Make device_type const
Make these const as they are only stored in the type field of a device
structure, which is const.

Done using Coccinelle.

Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1503130946-2854-2-git-send-email-bhumirks@gmail.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-08-20 13:12:32 +02:00
Qiuxu Zhuo
bc8f10babc EDAC, pnd2: Properly toggle hidden state for P2SB PCI device
Properly handle hidden state of P2SB PCI device (DEV:D, FUN:0) for
Apollo Lake.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170814154905.21707-1-qiuxu.zhuo@intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-08-19 10:51:11 +02:00
Qiuxu Zhuo
5fd77cb3ba EDAC, pnd2: Conditionally unhide/hide the P2SB PCI device to read BAR
On Deverton server, the P2SB PCI device (DEV:1F, FUN:1) is used by multiple
device drivers.

If it's hidden by some device driver (e.g. with the i801 I2C driver,
the commit

  9424693035 ("i2c: i801: Create iTCO device on newer Intel PCHs")

unconditionally hid the P2SB PCI device wrongly) it will make the
pnd2_edac driver read out an invalid BAR value of 0xffffffff and then
fail on ioremap().

Therefore, store the presence state of P2SB PCI device before unhiding
it for reading BAR and restore the presence state after reading BAR.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: linux-i2c@vger.kernel.org
Link: http://lkml.kernel.org/r/20170814154845.21663-1-qiuxu.zhuo@intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-08-19 10:47:24 +02:00
Qiuxu Zhuo
d84676a9e1 EDAC, pnd2: Mask off the lower four bits of a BAR
Bit[0] of BAR is always zero. Bit[2:1] and bit[3] of BAR contain the
information of 'type' and the 'prefetchable' accordingly. Therefore,
mask the lower four bits to retrieve the actual base address of a BAR.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170814154813.21619-1-qiuxu.zhuo@intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-08-19 10:33:30 +02:00
Christophe JAILLET
3eaef0fa39 EDAC, thunderx: Fix error handling path in thunderx_lmc_probe()
Return the proper error value if ioremap() fails (and not 0).

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Cc: David Daney <david.daney@cavium.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: linux-mips@linux-mips.org
Link: http://lkml.kernel.org/r/20170816045821.14165-1-christophe.jaillet@wanadoo.fr
[ Massage commit message, remove newline. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-08-18 19:12:08 +02:00
Christophe JAILLET
8b073d945c EDAC, altera: Fix error handling path in altr_edac_device_probe()
Return the proper error value if devm_ioremap() fails (and not 0).

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Thor Thayer <thor.thayer@linux.intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170816050506.14541-1-christophe.jaillet@wanadoo.fr
[ Massage commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-08-18 18:21:27 +02:00
Tony Luck
3e5d2bd191 EDAC, pnd2: Build in a minimal sideband driver for Apollo Lake
I've been waing a long time for the generic sideband driver to
appear. Patience has run out, so include the minimum here to
just read registers.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Cc: Aristeu Rozanski <arozansk@redhat.com>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: Patrick Geary <patrickg@supermicro.com>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170803210536.5662-1-tony.luck@intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-08-04 05:58:23 +02:00
Qiuxu Zhuo
039d7af651 EDAC, sb_edac: Classify memory mirroring modes
Basically, there are full memory mirroring and address range partial
memory mirroring (supported by Haswell EX and Broadwell EX) modes.

a) In full memory mirroring, the memory behind each memory controller
   is mirrored, i.e. the memory is split into two identical mirrors
   (primary and secondary), half of the memory is reserved for redundancy.

b) In address range partial memory mirroring, the memory size (range)
   of primary and secondary behind each memory controller can be user
   defined by the TAD0 register. The rest of memory ranges defined by
   TAD1/TAD2/... in that memory controller are non-mirrored.

For more detail on memory mirroring, see the following link written by Tony Luck:

  https://01.org/lkp/blogs/tonyluck/2016/address-range-partial-memory-mirroring-linux

Currently the sb_edac driver only supports address decoding in full
memory mirroring and non-mirroring modes. In address range partial
memory mirroring mode, it may fail to decode an address that falls in a
non-mirroring area (the following was one of this kind of failed logs).

  mce: Uncorrected hardware memory error in user-access at 566d53a400
  Memory failure: 0x566d53a: Killing einj_mem_uc:4647 due to hardware memory corruption
  Memory failure: 0x566d53a: recovery action for dirty LRU page: Recovered
  mce: [Hardware Error]: Machine check events logged
  EDAC sbridge MC1: HANDLING MCE MEMORY ERROR
  EDAC sbridge MC1: CPU 48: Machine Check Event: 0 Bank 7: ec00000000010090
  EDAC sbridge MC1: TSC 4b914aa5a99dab
  EDAC sbridge MC1: ADDR 566d53a400
  EDAC sbridge MC1: MISC 1443a0c86
  EDAC sbridge MC1: PROCESSOR 0:406f1 TIME 1499712764 SOCKET 2 APIC 80
  EDAC MC1: 0 UE Can't discover the memory rank for ch addr 0x7fb54e900 on any memory ( page:0x0 offset:0x0 grain:32)
  mce: [Hardware Error]: Machine check events logged

Therefore, classify memory mirroring modes and make the address decoding
in address range partial memory mode correct.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170730180651.30060-1-qiuxu.zhuo@intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-08-02 05:40:11 +02:00
Rob Herring
2efdda4a41 EDAC, cpc925, ppc4xx: Convert to using %pOF instead of full_name
Now that we have a custom printf format specifier, convert users of
full_name to use %pOF instead. This is preparation to remove storing of
the full path string for each node.

Signed-off-by: Rob Herring <robh@kernel.org>
Cc: devicetree@vger.kernel.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170718214339.7774-19-robh@kernel.org
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-07-19 07:42:41 +02:00
Borislav Petkov
c54182ec0e EDAC: Get rid of mci->mod_ver
It is a write-only variable so get rid of it.

Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Robert Richter <rric@kernel.org>
Acked-by: Michal Simek <michal.simek@xilinx.com>
Acked-by: Thor Thayer <thor.thayer@linux.intel.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: Mark Gross <mark.gross@intel.com>
Cc: Tim Small <tim@buttersideup.com>
Cc: Ranganathan Desikan <ravi@jetztechnologies.com>
Cc: "Arvind R." <arvino55@gmail.com>
Cc: Jason Baron <jbaron@akamai.com>
Cc: "Sören Brinkmann" <soren.brinkmann@xilinx.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Daney <david.daney@cavium.com>
Cc: Loc Ho <lho@apm.com>
Cc: linux-edac@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-mips@linux-mips.org
2017-07-17 13:42:48 +02:00
Arvind Yadav
1c18be5a4e EDAC: Constify attribute_group structures
attribute_groups are not supposed to change at runtime. All functions
working with attribute_groups provided by <linux/sysfs.h> work with
const attribute_group. So mark the non-const structs as const.

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
CC: linux-edac@vger.kernel.org
Link: http://lkml.kernel.org/r/776cb8265509054abd01b0b551624cc0da3b88e7.1499078335.git.arvind.yadav.cs@gmail.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-07-17 10:20:25 +02:00
Yazen Ghannam
fbe63acf62 EDAC, mce_amd: Use cpu_to_node() to find the node ID
Using the homegrown amd_get_nb_id() to find a node ID on AMD was fine
while the L3 to node mapping was 1:1. And Zen topology broke this. So
let's start slowly moving away from it and use the topology interfaces
instead.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/1490041614-90057-2-git-send-email-Yazen.Ghannam@amd.com
[ Massage commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-07-17 07:01:08 +02:00
Tony Luck
164c29244d EDAC, pnd2: Fix Apollo Lake DIMM detection
Non-existent or empty DIMM slots result in error return from
RD_REGP(). But we shouldn't give up on failure.

So long as we find at least one DIMM we can continue.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170628234407.21521-1-tony.luck@intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-06-29 10:37:50 +02:00
Jérémy Lefaure
a8c8261425 EDAC, i5000, i5400: Fix definition of NRECMEMB register
In the i5000 and i5400 drivers, the NRECMEMB register is defined as a
16-bit value, which results in wrong shifts in the code, as reported by
sparse.

In the datasheets ([1], section 3.9.22.20 and [2], section 3.9.22.21),
this register is a 32-bit register. A u32 value for the register fixes
the wrong shifts warnings and matches the datasheet.

Also fix the mask to access to the CAS bits [27:16] in the i5000 driver.

[1]: https://www.intel.com/content/dam/doc/datasheet/5000p-5000v-5000z-chipset-memory-controller-hub-datasheet.pdf
[2]: https://www.intel.se/content/dam/doc/datasheet/5400-chipset-memory-controller-hub-datasheet.pdf

Signed-off-by: Jérémy Lefaure <jeremy.lefaure@lse.epita.fr>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170629005729.8478-1-jeremy.lefaure@lse.epita.fr
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-06-29 10:33:13 +02:00
Colin Ian King
77641dacea EDAC, pnd2: Make function sbi_send() static
The function sbi_send() is local to just pnd2_edac.c and does not need
to be in global scope, so make it static.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170623084855.9197-1-colin.king@canonical.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-06-26 16:13:25 +02:00
Gustavo A. R. Silva
ee514c7a23 EDAC, pnd2: Return proper error value from apl_rd_reg()
Add code comment to make it clear that the fall-through is intentional
and, OR ret with its previous value to avoid overwriting it so that
callers can check the correct return value.

Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170622220535.GA4896@embeddedgus
[ Massage a bit. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-06-23 09:48:50 +02:00
Chris Packham
ff0abed492 EDAC, altera: Simplify calculation of total memory
Use of_address_to_resource() and resource_size() instead of manually
parsing the "reg" property from the "memory" node(s).

Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Tested-by: Thor Thayer <thor.thayer@linux.intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170606235500.22772-3-chris.packham@alliedtelesis.co.nz
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-06-14 13:49:25 +02:00
Qiuxu Zhuo
133e4455c9 EDAC, sb_edac: Avoid creating SOCK memory controller
Xiaolong Ye reported the following failure on Broadwell D server:

  EDAC sbridge: Some needed devices are missing
  EDAC MC: Removed device 0 for sbridge_edac.c Broadwell SrcID#0_Ha#0: DEV 0000:ff:12.0
  EDAC sbridge: Couldn't find mci handler
  EDAC sbridge: Failed to register device with error -19.

Broadwell D (only IMC0 per socket) and Broadwell X (IMC0 and IMC1 per
socket) use the same PCI device IDs for IMC0 per socket, then they
share pci_dev_descr_broadwell_table (n_imcs_per_sock=2). In this case,
Broadwell D wrongly creates the nonexistent SOCK EDAC memory controller
and reports above error messages, since it has no IMC1 per socket.

Avoid creating the nonexistent SOCK memory controller.

Reported-and-tested-by: Xiaolong Ye <xiaolong.ye@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170608113351.25323-1-qiuxu.zhuo@intel.com
[ Massage. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-06-14 11:53:39 +02:00
Yazen Ghannam
bdf1bf1744 EDAC, mce_amd: Fix typo in SMCA error description
Fix typo in "poison consumption" error description.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1497286703-62853-1-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-06-12 19:03:55 +02:00
Chris Packham
3b405e30cb EDAC, mv64x60: Sanity check edac_op_state before registering
edac_op_state is a module parameter which affects the behaviour of
the driver probe which can potentially be invoked as soon as the
platform driver registration happens. Because of this we need to
ensure that we sanity check the module parameter before calling
platform_register_drivers().

Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170607215530.8604-1-chris.packham@alliedtelesis.co.nz
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-06-09 11:55:55 +02:00
Vadim Lomovtsev
cf97825862 EDAC, thunderx: Fix a warning during l2c debugfs node creation
Compare the number of debugfs entries created by
thunderx_create_debugfs_nodes() with the requested number of entries to
properly determine whether to print a warning.

Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@caviumnetworks.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: linux-mips@linux-mips.org
Link: http://lkml.kernel.org/r/20170531155157.93583-1-stemerkhanov@cavium.com
Signed-off-by: Sergey Temerkhanov <s.temerkhanov@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-06-01 20:08:17 +02:00
Chris Packham
7d2fdaa694 EDAC, mv64x60: Check driver registration success
Check the return status of platform_driver_register() in
mv64x60_edac_init(). Only output messages and initialise the
edac_op_state if the registration is successful.

Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/20170529212142.25572-2-chris.packham@alliedtelesis.co.nz
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-05-30 09:55:51 +02:00
Jason Baron
7103de0e58 EDAC, ie31200: Add Intel Kaby Lake CPU support
Kaby Lake seems to work just like Skylake.

Reported-and-tested-by: Doug Thompson <bc.tdw@recursor.net>
Signed-off-by: Jason Baron <jbaron@akamai.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Tony Luck <tony.luck@intel.com
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1495823683-32569-1-git-send-email-jbaron@akamai.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-05-28 19:29:40 +02:00
Chris Packham
8b9afe5946 EDAC, mv64x60: Replace in_le32()/out_le32() with readl()/writel()
To allow this driver to be used on non-powerpc platforms it needs to use
io accessors suitable for all platforms.

Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/20170518083135.28048-4-chris.packham@alliedtelesis.co.nz
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-05-26 22:55:50 +02:00
Chris Packham
0b3df44eeb EDAC, mv64x60: Fix pdata->name
Change this from mpc85xx_pci_err to mv64x60_pci_err. The former is
likely a hangover from when this driver was created.

Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/20170518083135.28048-3-chris.packham@alliedtelesis.co.nz
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-05-26 22:54:04 +02:00
Qiuxu Zhuo
d14e3a201f EDAC, sb_edac: Bump driver version and do some cleanups
Collapse 'case:' in *_mci_bind_devs() and update driver version from
1.1.1 to 1.1.2.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170523000934.87971-1-qiuxu.zhuo@intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-05-25 15:00:36 +02:00
Qiuxu Zhuo
4d475dde79 EDAC, sb_edac: Check if ECC enabled when at least one DIMM is present
This is based on previous work by Patrick Geary, see Link.

Additional cleanups ontop:

 - Remove the code to read MCMTR from pci_ha1_ta and CHN_TO_HA macro,
 now that TA0 and TA1 are unified.

 - Remove get_pdev_same_bus(), since in get_dimm_config() the
 variable "pvt->pci_ta" for KNL is also ready, we can simply use
 pci_read_config_dword(pvt->pci_ta, KNL_MCMTR, &pvt->info.mcmtr) to read
 MCMTR.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/57884350.1030401@supermicro.com
Link: http://lkml.kernel.org/r/20170523000910.87925-1-qiuxu.zhuo@intel.com
[ Make __populate_dimms() return int. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-05-25 14:57:52 +02:00
Qiuxu Zhuo
3286d3eb90 EDAC, sb_edac: Drop NUM_CHANNELS from 8 back to 4
We don't need this quirk anymore now that the EDAC memory controller
representation matches the hardware.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170523000834.87881-1-qiuxu.zhuo@intel.com
[ Commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-05-25 14:40:40 +02:00
Borislav Petkov
6696522957 EDAC, sb_edac: Carve out dimm-populating loop
... to slim down get_dimm_config().

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
2017-05-25 14:37:34 +02:00
Borislav Petkov
199389acd9 EDAC, sb_edac: Fix mod_name
It is called "sb_edac.c" now.

Signed-off-by: Borislav Petkov <bp@suse.de>
2017-05-25 14:37:33 +02:00
Qiuxu Zhuo
e2f747b1f4 EDAC, sb_edac: Assign EDAC memory controller per h/w controller
Tony pointed out: "currently the driver pretends there is one big
8-channel memory controller per socket instead of 2 4-channel
controllers. This is fine with all memory controller populated with
symmetrical DIMM configurations, but runs into difficulties on
asymmetrical setups".

Restructure the driver to assign an EDAC memory controller to each real
h/w memory controller to resolve the issue.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170523000731.87793-1-qiuxu.zhuo@intel.com
[ Break some lines at convenient points. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-05-25 14:37:21 +02:00
Tony Luck
7fd562b75d EDAC, sb_edac: Don't use "Socket#" in the memory controller name
EDAC assigns logical memory controller numbers in the order that we find
memory controllers, which depends on which PCI bus they are on. Some
systems end up with MC0 on socket0, others (e.g Haswell) have MC0 on
socket3.

All this is made more confusing for users because we use the string
"Socket" while generating names for memory controllers, but the number
that we attach there is the memory controller number. E.g.

  EDAC MC0: Giving out device to module sbridge_edac.c controller
    Haswell Socket#0: DEV 0000:ff:12.0 (INTERRUPT)

Change the names to say "SrcID#%d" (where the number we use is read from
the h/w associated with the memory controller instead of some logical
number internal to the EDAC driver). New message:

  EDAC MC0: Giving out device to module sbridge_edac.c controller
    Haswell SrcID#3: DEV 0000:ff:12.0 (INTERRUPT)

Reported-by: Andrey Korolyov <andrey@xdel.ru>
Reported-by: Patrick Geary <patrickg@supermicro.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170523000603.87748-1-qiuxu.zhuo@intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-05-25 11:47:11 +02:00
Qiuxu Zhuo
00cf50d90a EDAC, sb_edac: Classify PCI-IDs by topology
Each of the PCI device IDs belongs to a CPU socket, or to one of the
integrated memory controllers. Provide an enum to specify the domain of
each, and distinguish the resource number in each domain: the number
of the PCI device IDs per integrated memory controller/socket, and the
number of integrated memory controllers per socket.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170523000533.87704-1-qiuxu.zhuo@intel.com
[ Realign pci_dev_descr_knl members. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-05-25 11:19:25 +02:00
Tobias Klauser
18caec20bf EDAC, altera: Constify irq_domain_ops
struct irq_domain_ops is not modified, so it can be made const.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Cc: Thor Thayer <thor.thayer@linux.intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170524133505.1233-1-tklauser@distanz.ch
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-05-24 15:46:25 +02:00
Yazen Ghannam
eb77e6b80f EDAC, amd64: Fix reporting of Chip Select sizes on Fam17h
The wrong index into the csbases/csmasks arrays was being passed to
the function to compute the chip select sizes, which resulted in the
wrong size being computed. Address that so that the correct values are
computed and printed.

Also, redo how we calculate the number of pages in a CS row.

Reported-by: Benjamin Bennett <benbennett@gmail.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Cc: <stable@vger.kernel.org> # 4.10.x
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1493313114-11260-1-git-send-email-Yazen.Ghannam@amd.com
[ Remove unneeded integer math comment, minor cleanups. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-05-03 16:27:36 +02:00
Borislav Petkov
f8d5549df2 EDAC, ghes: Do not enable it by default
Leave it to the user to decide whether to enable this or not. Otherwise,
platform-specific drivers won't initialize (currently, EDAC supports
only a single platform driver loaded).

Signed-off-by: Borislav Petkov <bp@suse.de>
2017-04-27 14:15:38 +02:00
Borislav Petkov
bffc7dece9 EDAC: Rename report status accessors
Change them to have the edac_ prefix.

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
2017-04-10 17:15:02 +02:00
Borislav Petkov
fee27d7d97 EDAC: Delete edac_stub.c
Move the remaining functionality to edac_mc.c. Convert "edac_report=" to
a module parameter.

Signed-off-by: Borislav Petkov <bp@suse.de>
2017-04-10 17:14:48 +02:00
Borislav Petkov
a06b85ff07 EDAC: Update Kconfig help text
Remove the old URLs.

Signed-off-by: Borislav Petkov <bp@suse.de>
2017-04-10 17:14:44 +02:00
Borislav Petkov
e3c4ff6d8c EDAC: Remove EDAC_MM_EDAC
Move all the EDAC core functionality behind CONFIG_EDAC and get rid of
that indirection. Update defconfigs which had it.

While at it, fix dependencies such that EDAC depends on RAS for the
tracepoints.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: linux-edac@vger.kernel.org
2017-04-10 17:14:41 +02:00
Borislav Petkov
be1d162948 EDAC: Issue tracepoint only when it is defined
... and this happens only when CONFIG_RAS is enabled.

Signed-off-by: Borislav Petkov <bp@suse.de>
2017-04-10 17:14:38 +02:00
Borislav Petkov
8c22b4fece EDAC: Move edac_op_state to edac_mc.c
... as part of moving stuff away from edac_stub.c

Signed-off-by: Borislav Petkov <bp@suse.de>
2017-04-10 17:14:29 +02:00
Borislav Petkov
d3116a0837 EDAC: Remove edac_err_assert
... and the glue around it. It is not needed anymore.

Signed-off-by: Borislav Petkov <bp@suse.de>
2017-04-10 17:14:21 +02:00
Borislav Petkov
97bb6c17ad EDAC: Get rid of edac_handlers
Use mc_devices list instead to check whether we have EDAC driver
instances successfully registered with EDAC core.

Signed-off-by: Borislav Petkov <bp@suse.de>
2017-04-10 17:14:17 +02:00
Borislav Petkov
db47d5f856 x86/nmi, EDAC: Get rid of DRAM error reporting thru PCI SERR NMI
Apparently, some machines used to report DRAM errors through a PCI SERR
NMI. This is why we have a call into EDAC in the NMI handler. See

  c0d1217202 ("drivers/edac: add new nmi rescan").

From looking at the patch above, that's two drivers: e752x_edac.c and
e7xxx_edac.c. Now, I wanna say those are old machines which are probably
decommissioned already.

Tony says that "[t]the newest CPU supported by either of those drivers
is the Xeon E7520 (a.k.a. "Nehalem") released in Q1'2010. Possibly some
folks are still using these ... but people that hold onto h/w for 7
years generally cling to old s/w too ... so I'd guess it unlikely that
we will get complaints for breaking these in upstream."

So even if there is a small number still in use, we did load EDAC with
edac_op_state == EDAC_OPSTATE_POLL by default (we still do, in fact)
which means a default EDAC setup without any parameters supplied on the
command line or otherwise would never even log the error in the NMI
handler because we're polling by default:

  inline int edac_handler_set(void)
  {
         if (edac_op_state == EDAC_OPSTATE_POLL)
                 return 0;

         return atomic_read(&edac_handlers);
  }

So, long story short, I'd like to get rid of that nastiness called
edac_stub.c and confine all the EDAC drivers solely to drivers/edac/. If
we ever have to do stuff like that again, it should be notifiers we're
using and not some insanity like this one.

Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
2017-04-10 17:13:48 +02:00
Borislav Petkov
76f6a26ce9 EDAC, highbank: Align Makefile directives
... like the rest of the file.

Signed-off-by: Borislav Petkov <bp@suse.de>
2017-04-10 17:10:43 +02:00
Sergey Temerkhanov
5195c206fd EDAC, thunderx: Remove unused code
Remove unused code reserved for upcoming CPUs.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Sergey Temerkhanov <s.temerkhanov@gmail.com>
Cc: David Daney <david.daney@cavium.com>
Cc: Jan.Glauber@cavium.com
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170406113834.17153-1-s.temerkhanov@gmail.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-04-07 11:49:32 +02:00
Sergey Temerkhanov
3d2d8c0f84 EDAC, thunderx: Change LMC index calculation
Shift the node number by 3 bits instead of 8 allowing proper functioning
with default EDAC_MAX_MCS.

Signed-off-by: Sergey Temerkhanov <s.temerkhanov@gmail.com>
Cc: David Daney <david.daney@cavium.com>
Cc: Jan.Glauber@cavium.com
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170406113755.17082-1-s.temerkhanov@gmail.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-04-07 11:47:44 +02:00
Thor Thayer
25b223ddfe EDAC, altera: Fix peripheral warnings for Cyclone5
The peripherals' RAS functionality only exist on the Arria10 SoCFPGA.
The Cyclone5 initialization generates EDAC warnings when the peripherals
aren't found in the device tree. Fix by checking for Arria10 in the init
functions.

Signed-off-by: Thor Thayer <thor.thayer@linux.intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1491415262-5018-1-git-send-email-thor.thayer@linux.intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-04-06 11:42:38 +02:00
Jan Glauber
621c4fe3cc EDAC, thunderx: Fix L2C MCI interrupt disable
Fix a typo that disabled the MCI interrupts using the wrong bitmask.

Signed-off-by: Jan Glauber <jglauber@cavium.com>
Cc: David Daney <david.daney@cavium.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Sergey Temerkhanov <s.temerkhanov@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170405102739.6301-1-jglauber@cavium.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-04-05 14:46:05 +02:00
Sergey Temerkhanov
41003396f9 EDAC, thunderx: Add Cavium ThunderX EDAC driver
Add support for Cavium ThunderX EDAC capable on-chip peripherals, namely
the DRAM controller (LMC), cache coherent processor interconnect (CCPI)
and level 2 cache blocks (L2C-TAD, L2C-MCI, L2C-CBC)

Signed-off-by: Sergey Temerkhanov <s.temerkhanov@gmail.com>
Cc: David.Daney@cavium.com
Cc: Jan.Glauber@cavium.com
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170324222837.60583-1-s.temerkhanov@gmail.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-03-27 11:43:56 +02:00
Qiuxu Zhuo
819f60fb7d EDAC, pnd2_edac: Fix reported DIMM number
DIMM number passed to edac_mc_handle_error() was accidentally hardcoded
to zero. Pass in the correct daddr->dimm value.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-03-26 09:36:28 +02:00
Borislav Petkov
cd1be315ac EDAC, pnd2_edac: Fix !EDAC_DEBUG build
Provide debugfs function stubs when EDAC_DEBUG is not enabled so that we
don't fail the build:

  drivers/edac/pnd2_edac.c: In function ‘pnd2_init’:
  drivers/edac/pnd2_edac.c:1521:2: error: implicit declaration of function ‘setup_pnd2_debug’ [-Werror=implicit-function-declaration]
    setup_pnd2_debug();
    ^
  drivers/edac/pnd2_edac.c: In function ‘pnd2_exit’:
  drivers/edac/pnd2_edac.c:1529:2: error: implicit declaration of function ‘teardown_pnd2_debug’ [-Werror=implicit-function-declaration]
    teardown_pnd2_debug();
    ^

Signed-off-by: Borislav Petkov <bp@suse.de>
2017-03-23 12:56:23 +01:00
Borislav Petkov
1c5bf78114 EDAC: Select DEBUG_FS
The debugfs.c functionality relies on DEBUG_FS so select it.

Signed-off-by: Borislav Petkov <bp@suse.de>
2017-03-23 12:56:09 +01:00
Tony Luck
5c71ad17f9 EDAC, pnd2_edac: Add new EDAC driver for Intel SoC platforms
Initial target for this driver is the Intel Apollo Lake platform and
Denverton micro-server, they use the same internal memory controller IP
called Pondicherry2.

Memory controller registers are not in PCI config space like earlier
Intel memory controllers. For Apollo Lake platform they are accessed via
a "side-band" interface, for Denverton micro-server they are access via
PCI config space and memory map I/O. This driver is for Apollo Lake and
Denverton, but only the Denverton is fully enabled while we wait for the
sideband driver.

Apollo lake driver and initial cut at Denverton driver by Tony Luck.
Extensive cleanup, refactoring and basic verification by Qiuxu Zhuo.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170308174539.14432-1-qiuxu.zhuo@intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-03-16 12:40:52 +01:00
Jérémy Lefaure
e61555c29c EDAC, i5000, i5400: Fix use of MTR_DRAM_WIDTH macro
The MTR_DRAM_WIDTH macro returns the data width. It is sometimes used
as if it returned a boolean true if the width if 8. Fix the tests where
MTR_DRAM_WIDTH is misused.

Signed-off-by: Jérémy Lefaure <jeremy.lefaure@lse.epita.fr>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170309011809.8340-1-jeremy.lefaure@lse.epita.fr
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-03-09 09:25:29 +01:00
Colin Ian King
4bd035eae2 EDAC, xgene: Fix wrongly spelled "procesing"
Fix spelling mistake in dev_err message.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Loc Ho <lho@apm.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170223002609.9440-1-colin.king@canonical.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-03-06 17:08:19 +01:00
Linus Torvalds
60c906bab1 Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RAS updates from Ingo Molnar:
 "The main changes in this cycle were:

  - Assign notifier chain priorities for all RAS related handlers to
    make the ordering explicit (Borislav Petkov)

  - Improve the AMD MCA banks sysfs output (Yazen Ghannam)

  - Various cleanups and restructuring of the x86 RAS code (Borislav
    Petkov)"

* 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/ras, EDAC, acpi: Assign MCE notifier handlers a priority
  x86/ras: Get rid of mce_process_work()
  EDAC/mce/amd: Dump TSC value
  EDAC/mce/amd: Unexport amd_decode_mce()
  x86/ras/amd/inj: Change dependency
  x86/ras: Flip the TSC-adding logic
  x86/ras/amd: Make sysfs names of banks more user-friendly
  x86/ras/therm_throt: Do not log a fake MCE for thermal events
  x86/ras/inject: Make it depend on X86_LOCAL_APIC=y
2017-02-20 12:47:44 -08:00
Yazen Ghannam
75bf2f6478 EDAC, mce_amd: Print IPID and Syndrome on a separate line
Currently, the IPID and Syndrome are printed on the same line as the
Address. There are cases when we can have a valid Syndrome but not a
valid Address.

For example, the MCA_SYND register can be used to hold more detailed
error info that the hardware folks can use. It's not just DRAM ECC
syndromes. There are some error types that aren't related to memory that
may have valid syndromes, like some errors related to links in the Data
Fabric, etc.

In these cases, the IPID and Syndrome are not printed at the same log
level as the rest of the stanza, so users won't see them on the console.

Console:
  [Hardware Error]: CPU:16 (17:1:0) MC22_STATUS[Over|CE|MiscV|-|-|-|-|SyndV|-]: 0xd82000000002080b
  [Hardware Error]: Power, Interrupts, etc. Extended Error Code: 2

Dmesg:
  [Hardware Error]: CPU:16 (17:1:0) MC22_STATUS[Over|CE|MiscV|-|-|-|-|SyndV|-]: 0xd82000000002080b
  , Syndrome: 0x000000010b404000, IPID: 0x0001002e00000002
  [Hardware Error]: Power, Interrupts, etc. Extended Error Code: 2

Print the IPID first and on a new line. The IPID should always be
printed on SMCA systems. The Syndrome will then be printed with the IPID
and at the same log level when valid:

  [Hardware Error]: CPU:16 (17:1:0) MC22_STATUS[Over|CE|MiscV|-|-|-|-|SyndV|-]: 0xd82000000002080b
  [Hardware Error]: IPID: 0x0001002e00000002, Syndrome: 0x000000010b404000
  [Hardware Error]: Power, Interrupts, etc. Extended Error Code: 2

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1487192182-2474-1-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-02-16 15:39:32 +01:00
Borislav Petkov
e62d2ca9d0 EDAC, amd64: Bump driver version
Last time we did that was when we enabled Bulldozer. Now, we enabled Zen
so it is only natural ... :-)

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Yazen Ghannam <Yazen.Ghannam@amd.com>
2017-02-14 11:58:05 +01:00
Wei Yongjun
279fa58035 EDAC, fsl_ddr: Make locally used symbols static
Fix the following sparse warnings:

  drivers/edac/fsl_ddr_edac.c:148:1: warning:
   symbol 'dev_attr_inject_data_hi' was not declared. Should it be static?
  drivers/edac/fsl_ddr_edac.c:150:1: warning:
   symbol 'dev_attr_inject_data_lo' was not declared. Should it be static?
  drivers/edac/fsl_ddr_edac.c:152:1: warning:
   symbol 'dev_attr_inject_ctrl' was not declared. Should it be static?

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170209150424.15124-1-weiyj.lk@gmail.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-02-09 17:40:54 +01:00
Chris Packham
321d17c19b EDAC, mpc85xx: Add T2080 l2-cache support
The L2 cache controller on the T2080 SoC has similar capabilities to the
others already supported by the mpc85xx_edac driver. Add it to the list
of compatible devices.

Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Acked-by: Johannes Thumshirn <jth@kernel.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: devicetree@vger.kernel.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/20170201231624.28843-1-chris.packham@alliedtelesis.co.nz
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-02-03 10:36:35 +01:00
Yazen Ghannam
1bd9900b83 EDAC, amd64: Add x86cpuid sanity check during init
Match one of the devices in amd64_cpuids[] before loading the module.
This is an additional sanity check against users trying to load
amd64_edac_mod on unsupported systems.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1485537863-2707-9-git-send-email-Yazen.Ghannam@amd.com
[ Get rid of err_ret label, make it a bit more readable this way. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-01-28 14:43:06 +01:00
Yazen Ghannam
4688c9b42d EDAC, amd64: Don't treat ECC disabled as failure
Having ECC disabled on a node doesn't necessarily mean that it's
disabled for the entire system. So let's return a non-failing code when
ECC is disabled on a node. This way we can skip initialization for the
node but still continue with the remaining nodes.

After probing all instances, make sure we have at least one MC device
allocated.

This issue is seen and fix tested on Fam15h and Fam17h MCM systems.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1485537863-2707-8-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-01-28 14:38:49 +01:00
Yazen Ghannam
d7fc9d77ac EDAC: Add routine to check if MC devices list is empty
We need to know if any MC devices have been allocated.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1485537863-2707-7-git-send-email-Yazen.Ghannam@amd.com
[ Prettify text. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-01-28 14:36:47 +01:00
Yazen Ghannam
df64636fa4 EDAC, amd64: Remove unused printing macros
amd64_{debug,notice} don't have any users, so remove them.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1485537863-2707-6-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-01-28 13:20:06 +01:00
Yazen Ghannam
11ab1cae58 EDAC, amd64: Rework messages in ecc_enabled()
Print the node number when informing that DRAM ECC is disabled so
that we can show which nodes have DRAM ECC disabled. Also, print more
detailed system information as edac_dbg(), so as to not bother general
users.

Switch amd64_notice to amd64_info to match the message above it.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1485537863-2707-5-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-01-28 13:18:33 +01:00
Yazen Ghannam
234365f56e EDAC, amd64: Move global code out of instance functions
We have a few functions that register/unregister an ECC error decoding
routine. These functions are called when we init/remove instances.
However, they are global and so don't need to be registered/unregistered
multiple times.

So move them out of the init/remove instance functions and into the
module init/exit routines.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1485297149-13733-4-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-01-28 13:08:10 +01:00
Yazen Ghannam
2b9b2c4659 EDAC, amd64: Free unused memory when init_one_instance() fails
Jump to memory freeing routines when init_one_instance() fails.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1485297149-13733-3-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-01-28 13:03:40 +01:00
Yazen Ghannam
67d7fd306e EDAC, mce_amd: Give more context to deferred error message
Users may not be familiar with the concept of deferred errors. There is
no action for users to take on this type of error, so give more context
in the error message to make this more clear.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1485297149-13733-2-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-01-28 13:03:29 +01:00
Borislav Petkov
58fb24cb95 EDAC, i7300: Test for the second channel properly
REDMEMB[17] is the ECC_Locator bit, which, when set, identifies the
CS[3:2] as the simbols in error. And thus the second channel.

The macro computing it was wrong so get rid of it (it was used at one
place only) and get rid of the conditional too. Generates better code
this way anyway.

Signed-off-by: Borislav Petkov <bp@suse.de>
Reported-by: David Binderman <dcb314@hotmail.com>
Reviewed-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2017-01-26 11:35:23 +01:00
Borislav Petkov
9026cc82b6 x86/ras, EDAC, acpi: Assign MCE notifier handlers a priority
Assign all notifiers on the MCE decode chain a priority so that they get
called in the correct order.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170123183514.13356-10-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-24 09:14:57 +01:00
Borislav Petkov
0bceab677d EDAC/mce/amd: Dump TSC value
Dump the TSC value of the time when the MCE got logged.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170123183514.13356-8-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-24 09:14:56 +01:00
Borislav Petkov
1fbcd90903 EDAC/mce/amd: Unexport amd_decode_mce()
It is not used outside of the driver anymore.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170123183514.13356-7-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-24 09:14:55 +01:00
Nicolas Iooss
127c1225bf EDAC, sb_edac: Get rid of ->show_interleave_mode()
Function sbridge_register_mci() sets pvt->info.show_interleave_mode
to knl_show_interleave_mode() on Knight's Landing and
show_interleave_mode() anywhere else.

Merge show_interleave_mode() and knl_show_interleave_mode() in a single
implementation and use it without an indirect function pointer.

Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170122172806.10412-1-nicolas.iooss_linux@m4x.org
[ Call it get_intlv_mode_str(). ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-01-23 11:39:48 +01:00
Aaron Miller
4fb6fde74d EDAC: Expose per-DIMM error counts in sysfs
The old csrowX sysfs directories have per-csrow error counters, but the
new dimmX directories do not currently expose error counts.

EDAC already keeps these counts, add them to sysfs so per-DIMM counts
are still available when CONFIG_EDAC_LEGACY_SYSFS=n.

Signed-off-by: Aaron Miller <aaronmiller@fb.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20161103220153.3997328-1-aaronmiller@fb.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-01-19 10:29:40 +01:00
Yazen Ghannam
2287c63643 EDAC, amd64: Save and return err code from probe_one_instance()
We should save the return code from probe_one_instance() so that it can
be returned from the module init function. Otherwise, we'll be returning
the -ENOMEM from above.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1484322741-41884-1-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-01-16 11:53:39 +01:00
Arvind Yadav
f5c61277f6 EDAC, i82975x: Add ioremap_nocache() error handling
If ioremap_nocache() fails, it will return NULL. Which will then cause a
NULL-pointer dereference. Handle the returned value properly.

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Cc: "Arvind R." <arvino55@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1484549092-11349-1-git-send-email-arvind.yadav.cs@gmail.com
[ Boris: massage commit message and improve error message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-01-16 11:09:22 +01:00
Ben Dooks
628ea92f09 EDAC: Make dev_attr_sdram_scrub_rate static
The dev_attr_sdram_scrub_rate is not declared in a header or used
anywhere else, so make it static to fix the following warning:

  drivers/edac/edac_mc_sysfs.c:816:1: warning: symbol
  'dev_attr_sdram_scrub_rate' was not declared. Should it be static?

Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Reviewed-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1465407356-7357-1-git-send-email-ben.dooks@codethink.co.uk
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-01-06 15:54:16 +01:00
Linus Torvalds
7c0f6ba682 Replace <asm/uaccess.h> with <linux/uaccess.h> globally
This was entirely automated, using the script by Al:

  PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>'
  sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \
        $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)

to do the replacement at the end of the merge window.

Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-24 11:46:01 -08:00
Mauro Carvalho Chehab
66c222a02f edac: fix kernel-doc tags at the drivers/edac_*.h
Some kernel-doc tags don't provide good descriptions or use
a different style. Adjust them.

Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2016-12-15 08:58:10 -02:00
Mauro Carvalho Chehab
6634fbb6b6 driver-api: create an edac.rst file with EDAC documentation
Currently, there's no device driver documentation for the EDAC
subsystem at the driver-api book. Fill in the blanks for the
structures and functions that misses documentation, uniform
the word on the existing ones, and add a new edac.rst file at
driver-api, in order to document the EDAC subsystem.

Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2016-12-15 08:54:52 -02:00
Mauro Carvalho Chehab
e01aa14cf2 edac: move documentation from edac_mc.c to edac_core.h
Several functions are documented at edac_mc.c.

As we'll be including edac_core.h at drivers-api book, move
those, in order for the kernel-doc markups be part of the API
documentation book.

Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2016-12-15 08:54:52 -02:00
Mauro Carvalho Chehab
fdaf0b3505 edac: move documentation from edac_pci*.c to edac_pci.h
Several functions are documented at edac_pci.c and edac_pci_sysfs.c.

As we'll be including edac_pci.h at drivers-api book, move those,
in order for the kernel-doc markups be part of the API
documentation book.

As several of those kernel-doc macros are not in the right format,
fix them.

Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2016-12-15 08:54:51 -02:00
Mauro Carvalho Chehab
5336f75499 edac: move documentation from edac_device to edac_core.h
Several functions are documented at edac_device.c.

As we'll be including edac_core.h at drivers-api book, move those,
in order for the kernel-doc markups be part of the API
documentation book.

As several of those kernel-doc macros are not in the right format,
fix them.

Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2016-12-15 08:54:51 -02:00
Mauro Carvalho Chehab
78d88e8a3d edac: rename edac_core.h to edac_mc.h
Now, all left at edac_core.h are at drivers/edac/edac_mc.c,
so rename it to edac_mc.h.

Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2016-12-15 08:54:51 -02:00
Mauro Carvalho Chehab
6d8ef24724 edac: move EDAC device definitions to drivers/edac/edac_device.h
The edac_core.h header contain data structures and function
definitions for both EDAC MC and EDAC device.

Let's move the devices ones to a separate header file, as part
of a header reorganization.

Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2016-12-15 08:54:51 -02:00
Mauro Carvalho Chehab
0b892c7177 edac: move EDAC PCI definitions to drivers/edac/edac_pci.h
The edac_core.h header contain data structures and function
definitions for the 3 parts of EDAC: MC, PCI and device.

Let's move the PCI ones to a separate header file, as part
of a header reorganization.

Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2016-12-15 08:54:50 -02:00
Mauro Carvalho Chehab
7230617537 edac: edac_core.h: remove prototype for edac_pci_reset_delay_period()
This function doesn't exist. So, remove its prototype.

Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2016-12-15 08:54:49 -02:00
Mauro Carvalho Chehab
a2c223b5ed edac: edac_core.h: get rid of unused kobj_complete
This element of struct edac_pci_ctl_info is never used. So,
get rid of it.

Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2016-12-15 08:54:49 -02:00
Pan Bian
0de2788447 EDAC, amd64: Fix improper return value
When the call to zalloc_cpumask_var() fails, returning "false" seems
improper. The real value of macro "false" is 0, and 0 means no error.
Return -ENOMEM instead.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=189071

Signed-off-by: Pan Bian <bianpan2016@163.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1480831638-5361-1-git-send-email-bianpan201604@163.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-12-04 10:51:42 +01:00
Borislav Petkov
5246c54007 EDAC, amd64: Improve amd64-specific printing macros
Prefix the warn and error macros with the respective string so that
callers don't have to say "Error" or "Warning". We save us string length
this way in the actual calls.

While at it, shorten the calls in reserve_mc_sibling_devs().

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Yazen Ghannam <Yazen.Ghannam@amd.com>
2016-12-01 11:35:07 +01:00
Yazen Ghannam
95d3af6bd1 EDAC, amd64: Autoload amd64_edac_mod on Fam17h systems
Add Fam17h to the list of families to autoload amd64_edac_mod.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/1479423463-8536-18-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-11-29 18:05:49 +01:00
Yazen Ghannam
713ad54675 EDAC, amd64: Define and register UMC error decode function
How we need to decode UMC errors is different from how we decode bus
errors, so let's define a new function for this. We also need a way to
determine the UMC channel since we're not guaranteed that there is a
fixed relation between channel and MCA bank.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/1480359593-80369-1-git-send-email-Yazen.Ghannam@amd.com
[ Fold in decode_synd_reg(), simplify. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-11-29 18:05:48 +01:00
Yazen Ghannam
d27f3a348e EDAC, amd64: Determine EDAC capabilities on Fam17h systems
We need to determine the EDAC capabilities from all UMCs on the node. We
should only check UMCs that are enabled and make sure they all agree.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/1479423463-8536-15-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-11-29 18:05:47 +01:00
Yazen Ghannam
2d09d8f301 EDAC, amd64: Determine EDAC MC capabilities on Fam17h
The UMCs on Fam17h are independent memory controllers so we need to
read the capabilities from all UMCs and make sure they agree. Once
we determine what capabilities are available we should save them for
convenience.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/1480431116-94683-1-git-send-email-Yazen.Ghannam@amd.com
[ Simplify f17h_determine_edac_ctl_cap(), preinit edac_mode in init_csrows(). ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-11-29 18:04:54 +01:00
Yazen Ghannam
07ed82ef93 EDAC, amd64: Add Fam17h debug output
Read a few more UMC registers and provide debug output in order to be as
similar as possible to older AMD systems.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/1480344621-14966-1-git-send-email-Yazen.Ghannam@amd.com
[ Remove unneeded K8 check and comments, fixup others. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-11-29 17:16:09 +01:00
Yazen Ghannam
8051c0af3c EDAC, amd64: Add Fam17h scrubber support
Fam17h has new register offsets and fields for setting up the DRAM
scrubber so add support for this.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/1479423463-8536-17-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-11-28 17:50:12 +01:00
Yazen Ghannam
a6c14dce85 EDAC, mce_amd: Don't report poison bit on Fam15h, bank 4
MCA_STATUS[43] has been defined as "Poison" or "Reserved" for every bank
since Fam15h except for Fam15h, bank 4 in which case it's defined as
part of the McaStatSubCache bitfield.

Filter out that case.

Reported-by: Dean Liberty <Dean.Liberty@amd.com>
Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/1479478222-19896-1-git-send-email-Yazen.Ghannam@amd.com
[ Split an almost unparseable ternary conditional, add a comment. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-11-28 17:50:12 +01:00
Yazen Ghannam
b64ce7cd7f EDAC, amd64: Read MC registers on AMD Fam17h
Fam17h has a different set of registers and bitfields. Most of these
registers are read through SMN (System Management Network) rather
than PCI config space. Also, the derivation of various values is now
different.

Update amd64_edac to read the appropriate registers and extract the
correct values for Fam17h.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/1479423463-8536-12-git-send-email-Yazen.Ghannam@amd.com
[ Save us the indentation level in read_mc_regs(), add defines ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-11-28 17:50:11 +01:00
Yazen Ghannam
936fc3afaa EDAC, amd64: Reserve correct PCI devices on AMD Fam17h
Fam17h needs PCI device functions 0 and 6 instead of 1 and 2 as on older
systems. Update struct amd64_pvt to hold the new functions and reserve
them if on Fam17h.

Also, allocate an array of UMC structs within our newly allocated PVT
struct.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/1479423463-8536-11-git-send-email-Yazen.Ghannam@amd.com
[ init_one_instance() error handling, shorten lines, unbreak >80 cols lines. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-11-28 17:49:40 +01:00
Yazen Ghannam
f1cbbec9fc EDAC, amd64: Add AMD Fam17h family type and ops
Add a family type and associated ops for Fam17h. Define a struct to hold
all the UMC registers that we need. Make this a part of struct amd64_pvt
in order to maximize code reuse in the rest of the driver.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/1479423463-8536-10-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-11-24 21:24:09 +01:00
Yazen Ghannam
196b79fcc8 EDAC, amd64: Extend ecc_enabled() to Fam17h
Update the ecc_enabled() function to work on Fam17h. This entails
reading a different set of registers and using the SMN (System
Management Network) rather than PCI devices.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/1479423463-8536-9-git-send-email-Yazen.Ghannam@amd.com
[ Fixup ecc_en assignment and get_umc_base(). ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-11-24 21:07:03 +01:00
Borislav Petkov
627bc29ed9 Merge tip:ras/core to pick up dependent changes
tip:ras/core contains the respective Fam17h x86 RAS bits which
amd64_edac is going to use. So merge it into the EDAC branch.

Signed-off-by: Borislav Petkov <bp@suse.de>
2016-11-23 21:13:40 +01:00
Yazen Ghannam
044e7a414b EDAC, amd64: Don't force-enable ECC checking on newer systems
It's not recommended for the OS to try and force-enable ECC checking.
This is considered a firmware task since it includes memory training,
etc, so don't change ECC settings on Fam17h or newer systems and inform
the user.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1479850816-1595-1-git-send-email-Yazen.Ghannam@amd.com
[ Put the "forcing" message in an else branch. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-11-23 19:04:11 +01:00
Yazen Ghannam
d12a969ebb EDAC, amd64: Add Deferred Error type
Currently, deferred errors are classified as correctable in EDAC. Add a
new error type for deferred errors so that they are correctly reported
to the user.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1479423463-8536-7-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-11-21 10:57:19 +01:00
Yazen Ghannam
e70984d9eb EDAC, amd64: Rename __log_bus_error() to be more specific
We only use __log_bus_error() to log DRAM ECC errors, so let's change
the name to reflect this. We'll also use this function for DRAM ECC
errors on Fam17h, but we'll call it from a different function than
decode_bus_error().

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1479423463-8536-6-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-11-21 10:42:55 +01:00
Yazen Ghannam
e7934b70d7 EDAC, amd64: Change target of pci_name from F2 to F3
AMD Fam17h will not be using PCI function 2 for EDAC, but will continue
to use function 3. So let's get the name of F3 instead of F2 to support
Fam17h and previous families.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1479423463-8536-5-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-11-21 10:24:20 +01:00
Yazen Ghannam
5c332202f8 EDAC, mce_amd: Rename nb_bus_decoder to dram_ecc_decoder
nb_bus_decoder() is only used for DRAM ECC errors so rename it so that
the name is more generic and descriptive.

Also, call it for DRAM ECC errors on SMCA systems.

[ Boris: rename it to real function name with a verb in it. ]

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1479423463-8536-4-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-11-21 09:43:15 +01:00
Yanjiang Jin
27bda205ba EDAC, mpc85xx: Implement remove method for the platform driver
If we execute the below steps without this patch:

  modprobe mpc85xx_edac [The first insmod, everything is well.]
  modprobe -r mpc85xx_edac
  modprobe mpc85xx_edac [insmod again, error happens.]

We would get the error messages as below:

  BUG: recent printk recursion!
  Oops: Kernel access of bad area, sig: 11 [#48]
  Modules linked in: mpc85xx_edac edac_core softdog [last unloaded: mpc85xx_edac]
  CPU: 5 PID: 14773 Comm: modprobe Tainted: G D C 4.8.3-rt2
   .vsnprintf
   .vscnprintf
   .vprintk_emit
   .printk
   .edac_pci_add_device
   .mpc85xx_pci_err_probe
   .platform_drv_probe
   .driver_probe_device
   .__driver_attach
   .bus_for_each_dev
   .driver_attach
   .bus_add_driver
   .driver_register
   .__platform_register_drivers
   .mpc85xx_mc_init
   .do_one_initcall
   .do_init_module
   .load_module
   .SyS_finit_module
   system_call

Address this by cleaning up properly when removing the platform driver.

Tested on a T4240QDS board.

Signed-off-by: Yanjiang Jin <yanjiang.jin@windriver.com>
Acked-by: Johannes Thumshirn <jthumshirn@suse.de>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: york.sun@nxp.com
Link: http://lkml.kernel.org/r/1479351380-17109-2-git-send-email-yanjiang.jin@windriver.com
[ Boris: massage commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-11-17 11:19:50 +01:00
Colin Ian King
8176170e03 EDAC, xgene: Fix spelling mistake in error messages
Trivial fix to spelling mistake "Mutilple" to "Multiple" in error
messages.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Loc Ho <lho@apm.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20161114231104.5585-1-colin.king@canonical.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-11-15 11:10:02 +01:00
Borislav Petkov
c73e8833be EDAC, mc: Fix locking around mc_devices list
When accessing the mc_devices list of memory controller descriptors, we
need to hold mem_ctls_mutex. This was not always the case, fix that.

Make all external callers call a version which grabs the mutex since the
last is local to edac_mc.c.

Reported-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-11-14 13:26:11 +01:00
Borislav Petkov
c09a8c40e0 x86/RAS: Hide SMCA bank names
Add accessor functions and hide the smca_names array. Also, add a
sanity-check to bank HWID assignment in get_smca_bank_info().

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/20161104152317.5r276t35df53qk76@pd.tnic
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-08 17:10:15 +01:00
Borislav Petkov
a9a1c0ee04 x86/RAS: Rename smca_bank_names to smca_names
Make it differ more from struct smca_bank_name for better readability.

Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Yazen Ghannam <yazen.ghannam@amd.com>
Link: http://lkml.kernel.org/r/20161103125556.15482-3-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-08 17:10:14 +01:00
Borislav Petkov
1ce9cd7f9f x86/RAS: Simplify SMCA HWID descriptor struct
Call it simply smca_hwid and call local variables "hwid". More readable.

Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Yazen Ghannam <yazen.ghannam@amd.com>
Link: http://lkml.kernel.org/r/20161103125556.15482-2-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-08 17:10:14 +01:00
Thor Thayer
90e493d7d5 EDAC, altera: Disable IRQs while injecting SDRAM errors
Disable IRQs while injecting SDRAM errors. The RT patches exposed
a spinlock deadlock where the spinlock taken for the regmap write
deadlocked with the IRQ clear regmap write.

Error injection is not normally enabled for ECC but only for testing.

Signed-off-by: Thor Thayer <tthayer@opensource.altera.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1476906827-9412-1-git-send-email-tthayer@opensource.altera.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-10-22 20:14:03 +02:00
Wei Yongjun
240ea9214a EDAC, skx_edac: Fix non static symbol warnings
Fix the following sparse warnings:

  drivers/edac/skx_edac.c:266:25: warning:
   symbol 'skx_cpuids' was not declared. Should it be static?
  drivers/edac/skx_edac.c:1040:12: warning:
   symbol 'skx_init' was not declared. Should it be static?

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1477147098-2842-1-git-send-email-weiyj.lk@gmail.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-10-22 20:10:38 +02:00
Piotr Luc
9a9260ca92 EDAC, sb_edac: Add Knights Mill support
Add Knights Mill (KNM) to the list of CPU models supported by sb_edac.

Signed-off-by: Piotr Luc <piotr.luc@intel.com>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20161013153105.2517-6-piotr.luc@intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-10-19 12:37:44 +02:00
Dave Hansen
20f4d69243 EDAC, {sb,skx}_edac: Use Intel model macros instead of open-coding them
We now have symbolic names for a bunch of Intel CPU models via
asm/intel-family.h. The original conversion missed the EDAC drivers.
Convert them.

Signed-off-by: Dave Hansen <dave.hansen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20160929204321.9FAE5F84@viggo.jf.intel.com
[ Remove comment, macro name is descriptive enough. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-10-19 12:32:40 +02:00
Linus Torvalds
19fe416532 * Altera Arria10 enablement of NAND, DMA, USB, QSPI and SD-MMC FIFO
buffers (Thor Thayer)
 
 * Split the memory controller part out of mpc85xx and share it with a
 * new Freescale ARM Layerscape driver (York Sun)
 
 * amd64_edac fixes (Yazen Ghannam)
 
 * Misc cleanups, refactoring and fixes all over the place
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJX80EwAAoJEBLB8Bhh3lVKWwYP/1bbODQ7o+XhO8IaCDffYk30
 8y4WdSnI0/QcP8JbSvFA7y6Zn4L0BbrbYhKLRDAg9c34V2bMaqonCnkDtT6YatUb
 6l0H/hQ/Cah9AOm5PJLYg6O9s+ZBT8zA5b+F2Z9kUsuB6LSnVhp9skNrH6KPlm0U
 4pFaLnHQenQPZbuRCfRxPU49ZuKtBZtQDkJLJlHXwn7e1qZy2Q4tMnnEtsY6U2ea
 t3Hj+F8g+cdoiTQXOceCcOTR8GqDI6szgzn7vpXAGYvljBndszauAkxO7by79jg1
 I8AQfgwoBF5CYL2Q0pzT1maHmmG2sydeRAHIvhmGxiEfFz1abWhriXbS33c32q8a
 iFiVMAUIaSKpB/sB+5w5ymuBctI1mX5EQVW+8Xl2Gxt+olnhdJMocHnvQdYkfsYm
 Ka8LcbaiK6ZQTbs/cIMOc2paE0AFPu5uXKHCPeZlhQAxOBvSPuDAv0+qUB/of5Uq
 1SPidtsTmCI7X2hrdHAH9hLEkSjq68v3kqL5YnZL3H4gA3WohQEmX9ybjk097Kus
 WWEhdi/PSFX0qQKotMUUDuxfNcKI6PZH9p+i2dN6tNCkiTDdb0Eo5lCXN7RVVhvq
 qfE0Fcc4uDzh5MUS5jT58MWpA1cfdu9jbAf2BwFIU/poJcaeqy/SMyzCL+1D2/u6
 dmDAtQbKUUwiltB8QzQd
 =pcI8
 -----END PGP SIGNATURE-----

Merge tag 'edac_for_4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp

Pull EDAC updates from Borislav Petkov:
 "A lot of movement in the EDAC tree this time around, coarse summary
  below:

   - Altera Arria10 enablement of NAND, DMA, USB, QSPI and SD-MMC FIFO
     buffers (Thor Thayer)

   - split the memory controller part out of mpc85xx and share it with a
     new Freescale ARM Layerscape driver (York Sun)

   - amd64_edac fixes (Yazen Ghannam)

   - misc cleanups, refactoring and fixes all over the place"

* tag 'edac_for_4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: (37 commits)
  EDAC, altera: Add IRQ Flags to disable IRQ while handling
  EDAC, altera: Correct EDAC IRQ error message
  EDAC, amd64: Autoload module using x86_cpu_id
  EDAC, sb_edac: Remove NULL pointer check on array pci_tad
  EDAC: Remove NO_IRQ from powerpc-only drivers
  EDAC, fsl_ddr: Fix error return code in fsl_mc_err_probe()
  EDAC, fsl_ddr: Add entry to MAINTAINERS
  EDAC: Move Doug Thompson to CREDITS
  EDAC, I3000: Orphan driver
  EDAC, fsl_ddr: Replace simple_strtoul() with kstrtoul()
  EDAC, layerscape: Add Layerscape EDAC support
  EDAC, fsl_ddr: Fix IRQ dispose warning when module is removed
  EDAC, fsl_ddr: Add support for little endian
  EDAC, fsl_ddr: Add missing DDR DRAM types
  EDAC, fsl_ddr: Rename macros and names
  EDAC, fsl-ddr: Separate FSL DDR driver from MPC85xx
  EDAC, mpc85xx: Replace printk() with pr_* format
  EDAC, mpc85xx: Drop setting/clearing RFXE bit in HID1
  EDAC, altera: Rename MC trigger to common name
  EDAC, altera: Rename device trigger to common name
  ...
2016-10-04 12:06:26 -07:00
Thor Thayer
a29d64a45e EDAC, altera: Add IRQ Flags to disable IRQ while handling
Add the IRQF_ONESHOT and IRQF_TRIGGER_HIGH flags to disable the IRQ
while executing the IRQ handler. Remove the IRQF_SHARED because these
are not shared IRQs in the domain. Exposed when flooding IRQs.

Signed-off-by: Thor Thayer <tthayer@opensource.altera.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1474582419-7053-2-git-send-email-tthayer@opensource.altera.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-09-23 12:03:34 +02:00
Thor Thayer
3763569f4c EDAC, altera: Correct EDAC IRQ error message
Correct the error message sent out in the case of a single bit error IRQ
allocation.

Signed-off-by: Thor Thayer <tthayer@opensource.altera.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1474582419-7053-1-git-send-email-tthayer@opensource.altera.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-09-23 11:52:39 +02:00
Yazen Ghannam
d6efab74f6 EDAC, amd64: Autoload module using x86_cpu_id
Reinstate driver autoloading now that PCI dependency is gone.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1473984445-1726-2-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-09-21 12:48:15 +02:00
Yazen Ghannam
a884675b87 x86/MCE/AMD, EDAC: Handle reserved bank 4 on Fam17h properly
Bank 4 is reserved on family 0x17 and shouldn't generate any MCE
records. However, broken hardware and software is not something unheard
of so warn about bank 4 errors. They shouldn't be coming from bank 4
naturally but users can still use mce_amd_inj to simulate errors from it
for testing purposed.

Also, avoid special handling in the injector mce_amd_inj like it is
being done on the older families.

[ bp: Rewrite commit message and merge into one patch. Use boot_cpu_data. ]

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Aravind Gopalakrishnan  <aravindksg.lkml@gmail.com>
Link: http://lkml.kernel.org/r/1473384591-5323-1-git-send-email-Yazen.Ghannam@amd.com
Link: http://lkml.kernel.org/r/1473384591-5323-2-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-13 15:23:14 +02:00
Yazen Ghannam
4b711f92c9 x86/mce, EDAC/mce_amd: Print MCA_SYND and MCA_IPID during MCE on SMCA systems
The MCA_SYND and MCA_IPID registers contain valuable information and
should be included in MCE output. The MCA_SYND register contains
syndrome and other error information, and the MCA_IPID register will
uniquely identify the MCA bank's type without having to rely on system
software.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1472680624-34221-2-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-13 15:23:13 +02:00
Yazen Ghannam
5896820e0a x86/mce/AMD, EDAC/mce_amd: Define and use tables for known SMCA IP types
Scalable MCA defines a number of IP types. An MCA bank on an SMCA
system is defined as one of these IP types. A bank's type is uniquely
identified by the combination of the HWID and MCATYPE values read from
its MCA_IPID register.

Add the required tables in order to be able to lookup error descriptions
based on a bank's type and the error's extended error code.

[ bp: Align comments, simplify a bit. ]

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1472741832-1690-1-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-13 15:23:10 +02:00
Yazen Ghannam
856095b179 EDAC/mce_amd: Use SMCA prefix for error descriptions arrays
The error descriptions defined for Fam17h can be reused for other SMCA
systems, so their names should reflect this.

Change f17h prefix to smca for error descriptions.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1472673994-12235-4-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-13 15:23:09 +02:00
Yazen Ghannam
c019b951e1 EDAC/mce_amd: Add missing SMCA error descriptions
Add missing SMCA error descriptions to the error descriptions arrays.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1472673994-12235-3-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-13 15:23:09 +02:00
Yazen Ghannam
b300e87300 EDAC/mce_amd: Print syndrome register value on SMCA systems
Print SyndV bit status and print the raw value of the MCA_SYND register.
Further decoding of the syndrome from struct mce.synd can be done in
other places where appropriate, e.g. DRAM ECC.

Boris: make the error stanza more compact by putting the error address
and syndrome on the same line:

  [Hardware Error]: Corrected error, no action required.
  [Hardware Error]: CPU:2 (17:0:0) MC4_STATUS[-|CE|-|PCC|AddrV|-|-|SyndV|CECC]: 0x96204100001e0117
  [Hardware Error]: Error Addr: 0x000000007f4c52e3, Syndrome: 0x0000000000000000
  [Hardware Error]: Invalid IP block specified.
  [Hardware Error]: cache level: L3/GEN, tx: DATA, mem-tx: RD

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1467633035-32080-2-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-13 15:23:07 +02:00
Colin Ian King
c7c35407cd EDAC, sb_edac: Remove NULL pointer check on array pci_tad
pvt->pci_tad is a NUM_CHANNELS array of struct pci_dev pointers and
hence cannot be NULL, so the NULL pointer check on pci_tad is redundant.
Remove it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20160908083801.14766-1-colin.king@canonical.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-09-12 20:15:43 +02:00
Michael Ellerman
372095723a EDAC: Remove NO_IRQ from powerpc-only drivers
We'd like to eventually remove NO_IRQ on powerpc, so remove usages of it
from powerpc-only drivers.

The pdata structs are kzalloc'ed, so we don't need to initialise those
to 0, we can just drop the assignments entirely.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Johannes Thumshirn <morbidrsa@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: linuxppc-dev@ozlabs.org
Link: http://lkml.kernel.org/r/1473674436-19467-1-git-send-email-mpe@ellerman.id.au
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-09-12 13:04:56 +02:00
Wei Yongjun
43fa9ba632 EDAC, fsl_ddr: Fix error return code in fsl_mc_err_probe()
Return negative error code from the edac_mc_add_mc() error handling case
instead of 0, as done elsewhere in this function.

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: York Sun <york.sun@nxp.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1473350284-26482-1-git-send-email-weiyj.lk@gmail.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-09-09 19:27:22 +02:00
York Sun
f47ae798d8 EDAC, fsl_ddr: Replace simple_strtoul() with kstrtoul()
Replace obsolete simple_strtoul() with kstrtoul().

Signed-off-by: York Sun <york.sun@nxp.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1471990593-27536-1-git-send-email-york.sun@nxp.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-09-01 10:28:03 +02:00
York Sun
eeb3d68b6c EDAC, layerscape: Add Layerscape EDAC support
Add DDR EDAC driver for ARM-based compatible controllers. Both
big-endian and little-endian are supported, as specified in device tree.

Signed-off-by: York Sun <york.sun@nxp.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1471990465-27443-1-git-send-email-york.sun@nxp.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-09-01 10:28:03 +02:00
York Sun
55764ed37e EDAC, fsl_ddr: Fix IRQ dispose warning when module is removed
When compiled as a module, removing it causes kernel warnings
when irq_dispose_mapping() is called. Instead of calling
irq_of_parse_and_map(), use platform_get_irq() to acquire the IRQ
number.

Signed-off-by: York Sun <york.sun@nxp.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: morbidrsa@gmail.com
Cc: oss@buserror.net
Cc: stuart.yoder@nxp.com
Link: http://lkml.kernel.org/r/1470779760-16483-8-git-send-email-york.sun@nxp.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-09-01 10:28:02 +02:00
York Sun
339fdff14c EDAC, fsl_ddr: Add support for little endian
Get endianness from device tree. Both big endian and little endian are
supported. Default to big endian for backwards compatibility to MPC85xx.

Signed-off-by: York Sun <york.sun@nxp.com>
Acked-by: Rob Herring <robh+dt@kernel.org>
Cc: devicetree@vger.kernel.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: morbidrsa@gmail.com
Cc: oss@buserror.net
Cc: stuart.yoder@nxp.com
Link: http://lkml.kernel.org/r/1470779760-16483-7-git-send-email-york.sun@nxp.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-09-01 10:28:02 +02:00
York Sun
4e2c3252d2 EDAC, fsl_ddr: Add missing DDR DRAM types
The compatible DDR controllers may support DDR, DDR2, DDR3, DDR4 DRAM.
An individual controller doesn't support all of them. The EDAC driver
reads SDRAM_CFG to determine which mode is configured.

Add DDR4 and drop the defines used only in the mtype assignment.

Signed-off-by: York Sun <york.sun@nxp.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: morbidrsa@gmail.com
Cc: oss@buserror.net
Cc: stuart.yoder@nxp.com
Link: http://lkml.kernel.org/r/1470779760-16483-6-git-send-email-york.sun@nxp.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-09-01 10:28:01 +02:00
York Sun
d43a9fb202 EDAC, fsl_ddr: Rename macros and names
Use FSL-specific prefix for macros, variables and functions.

Signed-off-by: York Sun <york.sun@nxp.com>
Cc: Johannes Thumshirn <morbidrsa@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: oss@buserror.net
Cc: stuart.yoder@nxp.com
Link: http://lkml.kernel.org/r/1470779760-16483-5-git-send-email-york.sun@nxp.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-09-01 10:28:01 +02:00
York Sun
ea2eb9a8b6 EDAC, fsl-ddr: Separate FSL DDR driver from MPC85xx
The mpc85xx-compatible DDR controllers are used on ARM-based SoCs too.
Carve out the DDR part from the mpc85xx EDAC driver in preparation to
support both architectures.

Signed-off-by: York Sun <york.sun@nxp.com>
Cc: Johannes Thumshirn <morbidrsa@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: oss@buserror.net
Cc: stuart.yoder@nxp.com
Link: http://lkml.kernel.org/r/1470946525-3410-1-git-send-email-york.sun@nxp.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-09-01 10:28:00 +02:00
York Sun
88857ebe71 EDAC, mpc85xx: Replace printk() with pr_* format
Replace printk() with pr_err/pr_warn/pr_info macros.

Signed-off-by: York Sun <york.sun@nxp.com>
Cc: Johannes Thumshirn <morbidrsa@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: oss@buserror.net
Cc: stuart.yoder@nxp.com
Link: http://lkml.kernel.org/r/1470779760-16483-3-git-send-email-york.sun@nxp.com
[ Boris: unbreak strings for easier greppability. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-09-01 10:27:59 +02:00
York Sun
9e6a03a044 EDAC, mpc85xx: Drop setting/clearing RFXE bit in HID1
On e500v1, read fault exception enable (RFXE) controls whether assertion
of core_fault_in causes a machine check interrupt. Assertion of
core_fault_in can result from uncorrectable data error, such as an L2
multi-bit ECC error. It can also occur from a system error if logic on
the integrated device signals a fault for nonfatal errors. RFXE bit is
cleared out of reset, and should be left clear for normal operation.
Assertion of core_fault_in does not cause a machine check.

RFXE is set specifically for RIO (Rapid IO) and PCI for book E to catch
the errors by machine check. With this bit set, the EDAC driver can't
get the interrupt in case of uncorrectable error. So this bit is cleared
in favor of EDAC. However, the benefit of catching such uncorrectable
error doesn't outweigh the other errors which may hang the system.
Besides, e500v2 has different errors masked by RFXE, and e500mc doesn't
support this bit. It is more reasonable to leave RFXE as is in the EDAC
driver, and leave the uncorrectable errors triggering machine check for
e500v1.

Suggested-by: Scott Wood <oss@buserror.net>
Signed-off-by: York Sun <york.sun@nxp.com>
Cc: Johannes Thumshirn <morbidrsa@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: oss@buserror.net
Cc: stuart.yoder@nxp.com
Link: http://lkml.kernel.org/r/1470779760-16483-2-git-send-email-york.sun@nxp.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-09-01 10:27:59 +02:00
Thor Thayer
b8978badc4 EDAC, altera: Rename MC trigger to common name
Rename the Memory Controller debug trigger to the same common name as
the EDAC devices.

Signed-off-by: Thor Thayer <tthayer@opensource.altera.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1471622666-15197-3-git-send-email-tthayer@opensource.altera.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-09-01 09:06:42 +02:00
Thor Thayer
f399f34bdb EDAC, altera: Rename device trigger to common name
The L2 and OCRAM devices have different ecc trigger names than the other
EDAC devices (FIFO peripherals). Make them all the same and remove the
character array from the device structure.

Signed-off-by: Thor Thayer <tthayer@opensource.altera.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1471622666-15197-2-git-send-email-tthayer@opensource.altera.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-09-01 08:55:24 +02:00
Tony Luck
4ec656bdf4 EDAC, skx_edac: Add EDAC driver for Skylake
This is an entirely new driver instead of yet another set of patches
to sb_edac.c because:

1) Mapping from PCI devices to socket/memory controller is significantly
   different. Skylake scatters devices on a socket across a number of
   PCI buses.
2) There is an extra level of interleaving via the "mcroute" register
   that would be a little messy to squeeze into the old driver.
3) Validation is getting too expensive. Changes to sb_edac need to
   be checked against Sandy Bridge, Ivy Bridge, Haswell, Broadwell and
   Knights Landing.

Acked-by: Aristeu Rozanski <aris@redhat.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-21 10:58:34 -07:00
Tillmann Heidsieck
6fa06b0d9e EDAC, mpc85xx: Fix PCIe error capture
According to the reference manual of MPC8572 and T4240, bit 31 of
PEX_ERR_CAP_STAT is W1C (write 1 to clear).

Add the corresponding write to PEX_ERR_CAP_STAT in order to fix the PCIe
error capture.

Tested on a T4240 processor.

Signed-off-by: Tillmann Heidsieck <theidsieck@leenox.de>
Acked-by: Johannes Thumshirn <jthumshirn@suse.de>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20160815190849.29327-1-theidsieck@leenox.de
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-08-18 10:17:40 +02:00
Bhaktipriya Shridhar
7bb8b77779 EDAC, wq: Remove deprecated create_singlethread_workqueue()
Replace the deprecated create_singlethread_workqueue() with
alloc_ordered_workqueue() with WQ_MEM_RECLAIM. This is the identity
conversion.

It's not recommended to stall it from memory pressure. Hence,
WQ_MEM_RECLAIM has been set to ensure forward progress under memory
pressure.

Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20160813164124.GA9077@Karyakshetra
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-08-15 07:21:29 +02:00
Wei Yongjun
9bcd919eb8 EDAC, altera: Make a10_eccmgr_ic_ops static
Fix the following sparse warning:

  drivers/edac/altera_edac.c:1649:23: warning:
   symbol 'a10_eccmgr_ic_ops' was not declared. Should it be static?

Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com>
Reviewed-by: Thor Thayer <tthayer@opensource.altera.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: lkml <linux-kernel@vger.kernel.org>
Link: http://lkml.kernel.org/r/1470836667-11822-1-git-send-email-weiyj.lk@gmail.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-08-10 18:37:06 +02:00
Thor Thayer
911049845d EDAC, altera: Add Arria10 SD-MMC EDAC support
Add Altera Arria10 SD-MMC FIFO memory EDAC support. The SD-MMC is a
dual port RAM implementation which is different than any of the other
peripherals and therefore requires additional code.

Signed-off-by: Thor Thayer <tthayer@opensource.altera.com>
Cc: dinguyen@opensource.altera.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1470753653-23465-3-git-send-email-tthayer@opensource.altera.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-08-10 14:43:14 +02:00
Yazen Ghannam
dc0a50a841 EDAC, amd64: Fix channel decode on Fam15hMod60h systems
Fam15hMod60h systems are using the channel decode of Fam15hMod30h which
gives incorrect results. Fam15hMod60h systems should use the generic
channel decode method plus a couple more cases.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1470236355-30039-1-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-08-08 05:59:42 +02:00
Thor Thayer
485fe9e24e EDAC, altera: Add Arria10 QSPI support
Add Altera Arria10 QSPI FIFO memory support.

Signed-off-by: Thor Thayer <tthayer@opensource.altera.com>
Cc: dinguyen@opensource.altera.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1468512408-5156-9-git-send-email-tthayer@opensource.altera.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-08-08 05:59:39 +02:00
Thor Thayer
c609581d1f EDAC, altera: Add Arria10 USB support
Add Altera Arria10 USB FIFO memory support.

Signed-off-by: Thor Thayer <tthayer@opensource.altera.com>
Cc: dinguyen@opensource.altera.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1468512408-5156-8-git-send-email-tthayer@opensource.altera.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-08-08 05:59:37 +02:00
Thor Thayer
e8263793b7 EDAC, altera: Add Arria10 DMA support
Add Altera Arria10 DMA FIFO memory support.

Signed-off-by: Thor Thayer <tthayer@opensource.altera.com>
Cc: dinguyen@opensource.altera.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1468512408-5156-7-git-send-email-tthayer@opensource.altera.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-08-08 05:59:36 +02:00
Thor Thayer
c6882fb2e8 EDAC, altera: Add Arria10 NAND support
Add Altera Arria10 NAND FIFO memory support.

Signed-off-by: Thor Thayer <tthayer@opensource.altera.com>
Cc: dinguyen@opensource.altera.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1468512408-5156-6-git-send-email-tthayer@opensource.altera.com
[ Reformat loop in altr_edac_a10_probe() for better readability. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-08-08 05:59:35 +02:00
Lukasz Odzioba
c5b48fa7e2 EDAC, sb_edac: Fix channel reporting on Knights Landing
On Intel Xeon Phi Knights Landing processor family the channels of the
memory controller have untypical arrangement - MC0 is mapped to CH3,4,5
and MC1 is mapped to CH0,1,2. This causes the EDAC driver to report the
channel name incorrectly.

We missed this change earlier, so the code already contains similar
comment, but the translation function is incorrect.

Without this patch:
  errors in DIMM_A and DIMM_D were reported in DIMM_D
  errors in DIMM_B and DIMM_E were reported in DIMM_E
  errors in DIMM_C and DIMM_F were reported in DIMM_F

Correct this.

Hubert Chrzaniuk:
 - rebased to 4.8
 - comments and code cleanup

Fixes: d0cdf90031 ("sb_edac: Add Knights Landing (Xeon Phi gen 2) support")
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: lukasz.anaczkowski@intel.com
Cc: lukasz.odzioba@intel.com
Cc: mchehab@kernel.org
Cc: <stable@vger.kernel.org> # v4.5..
Link: http://lkml.kernel.org/r/1469231089-22837-1-git-send-email-lukasz.odzioba@intel.com
Signed-off-by: Lukasz Odzioba <lukasz.odzioba@intel.com>
[ Boris: Simplify a bit by removing char mc. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-08-08 05:52:08 +02:00
Linus Torvalds
c79a14defb * Altera Arria10 ethernet FIFO buffer support (Thor Thayer)
* Minor cleanups
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXlbvGAAoJEBLB8Bhh3lVK9g0P/16Z5/WIqrrLjJegcZ3qKb4m
 u2b5FBwbTDQyj/smZmfZfhZUleO7HDayL188fHPKO11my+Mvjyws8IZCvHshy7+3
 b6Krno8NXZuNYtm32srSQQoLE4eN5pookOPMe2KJcRFghwnDqOo0ZfMZ8Lt6RVGg
 YtqiVZ3DwaqoHUbvxkHvdJuMbtVEZIEC6SzK9t4URmdncAx+3/BTLqDSYc4LaL+B
 tI6REgMy+QT1IUMImuUbOA5ktd78K+D26YKF9NuTSWNQvwzjSYy4y6WmDsjYpZF6
 h6YWgGwERIy6gws5IdvkBMKX0s1RV7/sY6xFBsEQrX+IFywyY5OcI1gPRttD6VAn
 ORJ4j7WNH8Phqld9YxjEX8yTwGxlxge88J/lb0i9s5jXPMX7ioxAqAjdpsJj6PlA
 sNDP0fTyCECqME73cWaAS9bdrzB2dgOxEDfcWKWR24vZeHfnh72owD1k+J0iWu9c
 0R2usPB1Yjbo6ODGpqG1R+u09LR+RpNe38oT5GcsFrj6/EaGz0kn2QulaNtylsDo
 J5qkm02X1h3+F8/UHrzKavOGk82IQzLCbcW7eHPmyzkH8PTe1semLk9jHnXN5sK3
 uL2nirZQVIF6QLbYxf0Uf2oSVs05xc7u9ZT+QOl6hqh/6/K7DWC45SoABwMXDaHH
 jXr+y4vtr5IR8GQfZ/nd
 =mdxx
 -----END PGP SIGNATURE-----

Merge tag 'edac_for_4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp

Pull EDAC updates from Borislav Petkov:
 "This last cycle, Thor was busy adding Arria10 eth FIFO support to the
  altera_edac driver along with other improvements.  We have two
  cleanups/fixes too.

  Summary:

   - Altera Arria10 ethernet FIFO buffer support (Thor Thayer)

   - Minor cleanups"

* tag 'edac_for_4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
  ARM: dts: Add Arria10 Ethernet EDAC devicetree entry
  EDAC, altera: Add Arria10 Ethernet EDAC support
  EDAC, altera: Add Arria10 ECC memory init functions
  Documentation: dt: socfpga: Add Arria10 Ethernet binding
  EDAC, altera: Drop some ifdeffery
  EDAC, altera: Add panic flag check to A10 IRQ
  EDAC, altera: Check parent status for Arria10 EDAC block
  EDAC, altera: Make all private data structures static
  EDAC: Correct channel count limit
  EDAC, amd64_edac: Init opstate at the proper time during init
  EDAC, altera: Handle Arria10 SDRAM child node
  EDAC, altera: Add ECC Manager IRQ controller support
  Documentation: dt: socfpga: Add interrupt-controller to ecc-manager
2016-07-27 13:40:47 -07:00
Tony Luck
0ba169ac36 EDAC, sb_edac: Fix Knights Landing
In commit 2c1ea4c700 ("EDAC, sb_edac: Use cpu family/model in driver
detection") I broke Knights Landing because I failed to notice that it
called a wrapper macro "sbridge_get_all_devices_knl" instead of
"sbridge_get_all_devices" like all the other types.

Now that we include the processor type in the pci_id_table structure we
can skip the wrappers and just have the sbridge_get_all_devices() check
the type to decide whether to allow duplicate devices and controllers to
have registers spread across buses.

Fixes: 2c1ea4c700 ("EDAC, sb_edac: Use cpu family/model in driver detection")
Tested-by: Lukasz Odzioba <lukasz.odzioba@intel.com>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-16 06:11:59 +09:00
Thor Thayer
ab8c1e0fb0 EDAC, altera: Add Arria10 Ethernet EDAC support
Add Altera Arria10 Ethernet FIFO memory EDAC support. Update to support
a common compatibility string for all Ethernet FIFOs in the DT.

Signed-off-by: Thor Thayer <tthayer@opensource.altera.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1466603939-7526-8-git-send-email-tthayer@opensource.altera.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-06-25 11:31:34 +02:00
Thor Thayer
1166fde93d EDAC, altera: Add Arria10 ECC memory init functions
In preparation for additional memory module ECCs, add the memory
initialization functions and helpers.

Signed-off-by: Thor Thayer <tthayer@opensource.altera.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1466603939-7526-7-git-send-email-tthayer@opensource.altera.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-06-24 21:16:38 +02:00
Thor Thayer
6b300fb953 EDAC, altera: Drop some ifdeffery
Make the IRQ and check_deps() functions available to all the memory
buffers by moving them outside of the OCRAM only area.

Signed-off-by: Thor Thayer <tthayer@opensource.altera.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1466603939-7526-5-git-send-email-tthayer@opensource.altera.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-06-24 14:18:33 +02:00
Thor Thayer
2b083d65ff EDAC, altera: Add panic flag check to A10 IRQ
In preparation for additional memory module ECCs, the IRQ function will
check a panic flag before doing a kernel panic on double bit errors.

OCRAM uncorrectable errors cause a panic because sleep/resume functions
and FPGA contents during sleep are stored in OCRAM.

ECCs on peripheral FIFO buffers will not cause a kernel panic on DBERRs
because the packet can be retried and therefore recovered.

Signed-off-by: Thor Thayer <tthayer@opensource.altera.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1466603939-7526-3-git-send-email-tthayer@opensource.altera.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-06-24 11:58:43 +02:00
Thor Thayer
44ec9b307e EDAC, altera: Check parent status for Arria10 EDAC block
In preparation for the Arria10 ECC modules, check the status of the
parent in the device tree to ensure the block is enabled. Skip if no
parent phandle is set in the device tree.

Signed-off-by: Thor Thayer <tthayer@opensource.altera.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1466603939-7526-2-git-send-email-tthayer@opensource.altera.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-06-24 11:58:42 +02:00
Thor Thayer
1cf7037724 EDAC, altera: Make all private data structures static
The device private data structures are used only here so make them
static.

Signed-off-by: Thor Thayer <tthayer@opensource.altera.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1466603939-7526-4-git-send-email-tthayer@opensource.altera.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-06-24 11:58:42 +02:00
Borislav Petkov
bba142957e EDAC: Correct channel count limit
c44696fff0 ("EDAC: Remove arbitrary limit on number of channels")
lifted the arbitrary limit on memory controller channels in EDAC.
However, the dynamic channel attributes dynamic_csrow_dimm_attr and
dynamic_csrow_ce_count_attr remained 6.

This wasn't a problem except channels 6 and 7 weren't visible in sysfs
on machines with more than 6 channels after the conversion to static
attr groups with

  2c1946b6d6 ("EDAC: Use static attribute groups for managing sysfs entries")

 [ without that, we're exploding in edac_create_sysfs_mci_device()
   because we're dereferencing out of the bounds of the
   dynamic_csrow_dimm_attr array. ]

Add attributes for channels 6 and 7 along with a guard for the
future, should more channels be required and/or to sanity check for
misconfigured machines.

We still need to check against the number of channels present on the MC
first, as Thor reported.

Signed-off-by: Borislav Petkov <bp@suse.de>
Reported-by: Hironobu Ishii <ishii.hironobu@jp.fujitsu.com>
Tested-by: Thor Thayer <tthayer@opensource.altera.com>
Cc: <stable@vger.kernel.org> # 4.2
2016-06-16 10:06:35 +02:00
Borislav Petkov
6ba92fea1b EDAC, amd64_edac: Init opstate at the proper time during init
It is useless to do it if we're loaded on unsupported hardware so do
that only after we have detected at least 1 supported AMD northbridge.

Signed-off-by: Borislav Petkov <bp@suse.de>
2016-06-16 01:13:18 +02:00
Thor Thayer
ab564cb51e EDAC, altera: Handle Arria10 SDRAM child node
Separate the device match arrays for each platform to prevent CycloneV
matches when calling of_platform_populate() on the Arria10 ECC manager
node.

If the SDRAM is a child node of ECC manager, call probe function via
of_platform_populate().

Signed-off-by: Thor Thayer <tthayer@opensource.altera.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1464193783-5071-4-git-send-email-tthayer@opensource.altera.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-06-08 13:23:09 +02:00
Thor Thayer
13ab8448d2 EDAC, altera: Add ECC Manager IRQ controller support
To better support child devices, the ECC manager needs to be
implemented as an IRQ controller.

Signed-off-by: Thor Thayer <tthayer@opensource.altera.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1465331757-10227-1-git-send-email-tthayer@opensource.altera.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-06-08 13:23:01 +02:00
Tony Luck
665f05e0b8 EDAC, sb_edac: Readd accidentally dropped Broadwell-D support
In commit

  2c1ea4c700 ("EDAC, sb_edac: Use cpu family/model in driver detection")

we switched from using PCI ids to determine which platform we are
running on to using CPU model instead.

I forgot that Broadwell-DE has its own distinct model number different
from Broadwell-EP or -EX.

Fixing this isn't just adding a line to the array of cpuids - the
exising code assumed a 1:1 mapping between entries in that array and the
"enum type" values. Added the type to pci_id_table structure to remove
this dependency and allows two Broadwell cpu models.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Cc: Aristeu Rozanski <arozansk@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Fixes: 2c1ea4c700 ("EDAC, sb_edac: Use cpu family/model in driver detection")
Link: http://lkml.kernel.org/r/b3cffe40dec6dfe0235a5d52a504f0ba86a07ce7.1464902605.git.tony.luck@intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-06-03 17:28:21 +02:00
Nicholas Krause
fbedcaf43f EDAC: Fix workqueues poll period resetting
After the workqueue cleanup, we're registering workqueues based on
the presence of an ->edac_check function. When that is the case,
we're setting OP_RUNNING_POLL. But we forgot to check that in
edac_mc_reset_delay_period(), leading to:

  BUG: unable to handle kernel paging request at 0000000000015d10
  IP: [ .. ] queued_spin_lock_slowpath
  PGD 3ffcc8067 PUD 3ffc56067 PMD 0
  Oops: 0002 [#1] SMP
  Modules linked in: ...
  CPU: 1 PID: 2792 Comm: edactest Not tainted 4.6.0-dirty #1
  Hardware name: HP ProLiant MicroServer, BIOS O41     10/01/2013
  Stack:
  Call Trace:
    ? _raw_spin_lock_irqsave
    ? lock_timer_base.isra.34
    ? del_timer
    ? try_to_grab_pending
    ? mod_delayed_work_on
    ? edac_mc_reset_delay_period
    ? edac_set_poll_msec
    ? param_attr_store
    ? module_attr_store
    ? kernfs_fop_write
    ? __vfs_write
    ? __vfs_read
    ? __alloc_fd
    ? vfs_write
    ? SyS_write
    ? entry_SYSCALL_64_fastpath
  Code:
  RIP  [ .. ] queued_spin_lock_slowpath
   RSP <>
  CR2: 0000000000015d10
  ---[ end trace 3f286bc71cca15d1 ]---
  Kernel panic - not syncing: Fatal exception

Fix it.

Signed-off-by: Nicholas Krause <xerofoify@gmail.com>
Cc: <stable@vger.kernel.org> # 4.5
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1463697958-13406-1-git-send-email-xerofoify@gmail.com
[ Rewrite commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-06-03 11:14:27 +02:00
Tony Luck
c7103f650a EDAC, sb_edac: Fix rank lookup on Broadwell
Broadwell made a small change to the rank target register moving the
target rank ID field up from bits 16:19 to bits 20:23.

Also found that the offset field grew by one bit in the IVY_BRIDGE to
HASWELL transition, so fix the RIR_OFFSET() macro too.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Cc: stable@vger.kernel.org # v3.19+
Cc: Aristeu Rozanski <arozansk@redhat.com>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/2943fb819b1f7e396681165db9c12bb3df0e0b16.1464735623.git.tony.luck@intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-06-03 10:05:49 +02:00
Linus Torvalds
16bf834805 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree updates from Jiri Kosina.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (21 commits)
  gitignore: fix wording
  mfd: ab8500-debugfs: fix "between" in printk
  memstick: trivial fix of spelling mistake on management
  cpupowerutils: bench: fix "average"
  treewide: Fix typos in printk
  IB/mlx4: printk fix
  pinctrl: sirf/atlas7: fix printk spelling
  serial: mctrl_gpio: Grammar s/lines GPIOs/line GPIOs/, /sets/set/
  w1: comment spelling s/minmum/minimum/
  Blackfin: comment spelling s/divsor/divisor/
  metag: Fix misspellings in comments.
  ia64: Fix misspellings in comments.
  hexagon: Fix misspellings in comments.
  tools/perf: Fix misspellings in comments.
  cris: Fix misspellings in comments.
  c6x: Fix misspellings in comments.
  blackfin: Fix misspelling of 'register' in comment.
  avr32: Fix misspelling of 'definitions' in comment.
  treewide: Fix typos in printk
  Doc: treewide : Fix typos in DocBook/filesystem.xml
  ...
2016-05-17 17:05:30 -07:00
Linus Torvalds
1cc3880a3c * Altera Arria10 L2 cache and On-Chip RAM ECC handling. (Thor Thayer)
* Remove ad-hoc buffering of MCE records in sb_edac and i7core_edac. (Tony Luck)
 
 * Do not register sb_edac with pci_register_driver(). (Tony Luck)
 
 * Add support for Skylake to ie31200_edac. (Jason Baron)
 
 * Do not register amd64_edac with pci_register_driver(). (Borislav Petkov)
 
 + the usual round of cleanups and fixes all over the place.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXOaHaAAoJEBLB8Bhh3lVKRdoP/jDewaX4GAxgcKaK9Kx7VjMe
 i42E/7syh5iLoyFgZ1hUjvqiQHN4RyWloUYyNHbNxGS9uXCMXIUoJQd2FjOY442s
 oeEaoMCfZsJLzKQti3fUPK9GugZ57BSZxfn6NlJPpyG4FxSirOcZFCxGaNuyqqrh
 0M+gFvbWfZcVDLE0FI5CUYZWRxsk//+pLM4ZkQZFA8Eo1tGVc3r0zEEAlL/uEMDW
 P0ATvRHNUeib9YQGTdSD7cNUpRX4SX+T8lDUiaVm+tHL5ES3b4rayMBdbVMrahfc
 Gnxu9GtO5gWXEY6QDDpOx+VkbfFmDFodx833psJ3MVD8evEFHHdinkgDaptLrrV6
 92ZDKR5s3W6tKXkcmGuExrtc17UgjLcRZCebXbv+5FJlVrslzvQ9ESOTRyiZBrCD
 ZpFi2TZhpYU1uEWuoBZCbWNXW2pcSt7/bQ9bYUvfrvNfgPzPnblubJuVKRcfY0WB
 x2vj0PNnckpoPRvskV7GEe0Y/JISzAxBQUK6XO+GJgMgz5M+23SEaSVU5yeyf26e
 x/yD5yImQGN84AxfRMMvbR2JvpVLN3vdFWtWigneht160erVA/Qgw5Gcrpw53tZr
 zPD3fGaetrNgS8BvJAy/c6xHV1j6A1RGCGThy8Ivqep9WD90a5JhR1NRjZp6m8sf
 aB0A1zTP7e+hRFkZGahO
 =ud2P
 -----END PGP SIGNATURE-----

Merge tag 'edac_for_4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp

Pull EDAC updates from Borislav Petkov:
 "It was pretty busy in EDAC land this time:

   - Altera Arria10 L2 cache and On-Chip RAM ECC handling (Thor Thayer)

   - Remove ad-hoc buffering of MCE records in sb_edac and i7core_edac
     (Tony Luck)

   - Do not register sb_edac with pci_register_driver() (Tony Luck)

   - Add support for Skylake to ie31200_edac (Jason Baron)

   - Do not register amd64_edac with pci_register_driver() (Borislav
     Petkov)

  ... plus the usual round of cleanups and fixes all over the place"

* tag 'edac_for_4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: (25 commits)
  EDAC, amd64_edac: Drop pci_register_driver() use
  EDAC, ie31200_edac: Add Skylake support
  EDAC, sb_edac: Use cpu family/model in driver detection
  EDAC, i7core: Remove double buffering of error records
  EDAC, amd64_edac: Issue driver banner only on success
  ARM: socfpga: Initialize Arria10 OCRAM ECC on startup
  EDAC: Increment correct counter in edac_inc_ue_error()
  EDAC, sb_edac: Remove double buffering of error records
  EDAC: Fix used after kfree() error in edac_unregister_sysfs()
  EDAC, altera: Avoid unused function warnings
  EDAC, altera: Remove useless casts
  ARM: socfpga: Enable Arria10 OCRAM ECC on startup
  EDAC, altera: Add Arria10 OCRAM ECC support
  Documentation: dt: socfpga: Add Altera Arria10 OCRAM binding
  EDAC, altera: Make OCRAM ECC dependency check generic
  EDAC, altera: Add register offset for ECC Enable
  EDAC, altera: Extract error inject operations to a struct fops
  ARM: socfpga: Enable Arria10 L2 cache ECC on startup
  EDAC, altera: Add Arria10 L2 Cache ECC handling
  Documentation, dt, socfpga: Add Altera Arria10 L2 cache binding
  ...
2016-05-16 18:44:39 -07:00
Yazen Ghannam
a348ed83d9 EDAC, mce_amd: Detect SMCA using X86_FEATURE_SMCA
Use X86_FEATURE_SMCA when detecting if SMCA is available instead of
directly using CPUID 0x80000007_EBX.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1462971509-3856-7-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-12 09:08:23 +02:00
Borislav Petkov
3f37a36b62 EDAC, amd64_edac: Drop pci_register_driver() use
- remove homegrown instances counting.
- take F3 PCI device from amd_nb caching instead of F2 which was used with the
PCI core.

With those changes, the driver doesn't need to register a PCI driver and
relies on the northbridges caching which we do anyway on AMD.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Yazen Ghannam <yazen.ghannam@amd.com>
2016-05-09 20:41:16 +02:00
Jason Baron
953dee9bbd EDAC, ie31200_edac: Add Skylake support
Skylake adjusts some register locations, but otherwise follows the
existing model quite closely. I was able to verify that the 'ce_count'
increments when 'bad dimms' are used. The accounting of 'ce_count' and
'ue_count' is the primary functionality of interest for us. Tested on
Intel(R) Xeon(R) CPU E3-1260L v5 @ 2.90GHz.

Signed-off-by: Jason Baron <jbaron@akamai.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1462547927-22679-1-git-send-email-jbaron@akamai.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-05-06 18:50:14 +02:00
Tony Luck
2c1ea4c700 EDAC, sb_edac: Use cpu family/model in driver detection
Instead of picking a random PCI ID from the dozen or so we need to
access, just use x86_match_cpu() to pick based on CPU model number. The
choosing of PCI devices has been problematic in the past, see

  11249e7399 ("sb_edac: Fix detection on SNB machines")

which fixed problems introduced by

  d0585cd815 ("sb_edac: Claim a different PCI device").

This is especially ugly if future hardware might not even have
EDAC-relevant registers in PCI config space and we would still be
required to choose some "random" PCI devices to scan for just so our
driver loads.

Is this cleaner/clearer? It deletes much more code than it adds. Only
tested on Broadwell. The driver loads/unloads and loads again. Still
decodes errors too.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-05-02 19:44:43 +02:00
Tony Luck
5359534505 EDAC, i7core: Remove double buffering of error records
In the bad old days the functions from x86_mce_decoder_chain could be
called in machine check context. So we used to carefully copy them and
defer processing until later. But in

  f29a7aff4b ("x86/mce: Avoid potential deadlock due to printk() in MCE context")

we switched the logging code to save the record in a genpool, and call
the functions that registered to be notified later from a work queue.

So drop all the double buffering and do all the work we want to do as
soon as i7core_mce_check_error() is called.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/29ab2c370915c6e132fc5d88e7b72cb834bedbfe.1461855008.git.tony.luck@intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-04-29 16:41:24 +02:00
Tony Luck
c4fc1956fa EDAC: i7core, sb_edac: Don't return NOTIFY_BAD from mce_decoder callback
Both of these drivers can return NOTIFY_BAD, but this terminates
processing other callbacks that were registered later on the chain.
Since the driver did nothing to log the error it seems wrong to prevent
other interested parties from seeing it. E.g. neither of them had even
bothered to check the type of the error to see if it was a memory error
before the return NOTIFY_BAD.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Acked-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/72937355dd92318d2630979666063f8a2853495b.1461864507.git.tony.luck@intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-04-29 15:43:10 +02:00
Borislav Petkov
de0336b30d EDAC, amd64_edac: Issue driver banner only on success
... and don't mislead users into thinking that the driver has loaded
successfully.

Signed-off-by: Borislav Petkov <bp@suse.de>
2016-04-27 12:30:26 +02:00
Emmanouil Maroudas
993f88f1cc EDAC: Increment correct counter in edac_inc_ue_error()
Fix typo in edac_inc_ue_error() to increment ue_noinfo_count instead of
ce_noinfo_count.

Signed-off-by: Emmanouil Maroudas <emmanouil.maroudas@gmail.com>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Fixes: 4275be6355 ("edac: Change internal representation to work with layers")
Link: http://lkml.kernel.org/r/1461425580-5898-1-git-send-email-emmanouil.maroudas@gmail.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-04-23 18:10:09 +02:00
Tony Luck
ad08c4e974 EDAC, sb_edac: Remove double buffering of error records
In the bad old days the functions from x86_mce_decoder_chain could be
called in machine check context. So we used to carefully copy them and
defer processing until later. But in

  f29a7aff4b ("x86/mce: Avoid potential deadlock due to printk() in MCE context")

we switched the logging code to save the record in a genpool, and call
the functions that registered to be notified later from a work queue.

So drop all the double buffering and do all the work we want to do as
soon as sbridge_mce_check_error() is called.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Cc: Aristeu Rozanski <arozansk@redhat.com>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: patrickg@supermicro.com
Link: http://lkml.kernel.org/r/100025611cd780d9bca72792b2b2146760da53e0.1460756761.git.tony.luck@intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-04-23 14:02:02 +02:00
Tony Luck
ab67b6c22d EDAC: Fix used after kfree() error in edac_unregister_sysfs()
Code flow looks like this:

  device_unregister(&mci->dev);
   -> kobject_put+0x25/0x50
    -> kobject_cleanup+0x77/0x190
      -> device_release+0x32/0xa0
	-> mci_attr_release+0x36/0x70
	  -> kfree(mci);
  bus_unregister(mci->bus);

Fix is to grab a local copy of "mci->bus" and use that when we call
bus_unregister().

Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/21d595b0ab3d718d9cb206647f4ec91c05e62ec4.1461261078.git.tony.luck@intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-04-23 13:23:54 +02:00
Arnd Bergmann
1aa6eb5c5b EDAC, altera: Avoid unused function warnings
The recently added Arria10 OCRAM ECC support caused some new harmless
warnings about unused functions when it is disabled:

  drivers/edac/altera_edac.c:1067:20: error: 'altr_edac_a10_ecc_irq' defined but not used [-Werror=unused-function]
  drivers/edac/altera_edac.c:658:12: error: 'altr_check_ecc_deps' defined but not used [-Werror=unused-function]

This rearranges the code slightly to have those two functions inside
of the same #ifdef that hides their callers. It also manages to
avoid a forward declaration of the IRQ handler in the process.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Thor Thayer <tthayer@opensource.altera.com>
Cc: Alan Tull <atull@opensource.altera.com>
Cc: Dinh Nguyen <dinguyen@opensource.altera.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Fixes: c7b4be8db8 ("EDAC, altera: Add Arria10 OCRAM ECC support")
Link: http://lkml.kernel.org/r/1460837650-1237650-2-git-send-email-arnd@arndb.de
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-04-23 12:00:12 +02:00
Arnd Bergmann
2c911f6cac EDAC, altera: Remove useless casts
The altera EDAC driver refers to its per-device data
using a cast to '(void *)', which makes the pointer
non-const, though both the source and destination are
actually const.

Removing the annotation makes the reference (almost)
fit into a single line for improved readability, and
ensures that it is actually defined as const.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Thor Thayer <tthayer@opensource.altera.com>
Cc: Alan Tull <atull@opensource.altera.com>
Cc: Dinh Nguyen <dinguyen@opensource.altera.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1460837650-1237650-1-git-send-email-arnd@arndb.de
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-04-23 11:54:42 +02:00
Tony Luck
ea5dfb5fae x86 EDAC, sb_edac.c: Take account of channel hashing when needed
Haswell and Broadwell can be configured to hash the channel
interleave function using bits [27:12] of the physical address.

On those processor models we must check to see if hashing is
enabled (bit21 of the HASWELL_HASYSDEFEATURE2 register) and
act accordingly.

Based on a patch by patrickg <patrickg@supermicro.com>

Tested-by: Patrick Geary <patrickg@supermicro.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: Aristeu Rozanski <arozansk@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-edac@vger.kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-22 10:10:01 +02:00
Tony Luck
ff15e95c82 x86 EDAC, sb_edac.c: Repair damage introduced when "fixing" channel address
In commit:

  eb1af3b71f ("Fix computation of channel address")

I switched the "sck_way" variable from holding the log2 value read
from the h/w to instead be the actual number. Unfortunately it
is needed in log2 form when used to shift the address.

Tested-by: Patrick Geary <patrickg@supermicro.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: Aristeu Rozanski <arozansk@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-edac@vger.kernel.org
Cc: stable@vger.kernel.org
Fixes: eb1af3b71f ("Fix computation of channel address")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-22 10:10:01 +02:00
Masanari Iida
c19ca6cb4c treewide: Fix typos in printk
This patch fix spelling typos found in printk
within various part of the kernel sources.

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2016-04-18 11:23:24 +02:00
Thor Thayer
c7b4be8db8 EDAC, altera: Add Arria10 OCRAM ECC support
Add Arria10 On-Chip RAM ECC handling.

Signed-off-by: Thor Thayer <tthayer@opensource.altera.com>
Cc: devicetree@vger.kernel.org
Cc: dinguyen@opensource.altera.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux@arm.linux.org.uk
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1459992174-8015-1-git-send-email-tthayer@opensource.altera.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-04-07 12:42:56 +02:00
Thor Thayer
aa1f06dcc0 EDAC, altera: Make OCRAM ECC dependency check generic
In preparation for the Arria10 peripheral ECCs, move the OCRAM ECC
dependency check into the general ECC area since this same function can
be used by other memories.

Signed-off-by: Thor Thayer <tthayer@opensource.altera.com>
Cc: devicetree@vger.kernel.org
Cc: dinguyen@opensource.altera.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux@arm.linux.org.uk
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1459450087-24792-4-git-send-email-tthayer@opensource.altera.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-04-02 13:47:15 +02:00
Thor Thayer
943ad91798 EDAC, altera: Add register offset for ECC Enable
In preparation for the Arria10 peripheral ECCs, add a register offset
from the ECC base to index to the ECC enable register.

Signed-off-by: Thor Thayer <tthayer@opensource.altera.com>
Cc: devicetree@vger.kernel.org
Cc: dinguyen@opensource.altera.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux@arm.linux.org.uk
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1459450087-24792-3-git-send-email-tthayer@opensource.altera.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-04-02 13:38:57 +02:00
Thor Thayer
e17ced2cb6 EDAC, altera: Extract error inject operations to a struct fops
In preparation for the Arria10 peripheral ECCs, extract the inject file
operations because the Arria10 IRQ trigger mechanism is different than
Cyclone5/Arria5 and Arria10 L2 cache.

Signed-off-by: Thor Thayer <tthayer@opensource.altera.com>
Cc: devicetree@vger.kernel.org
Cc: dinguyen@opensource.altera.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux@arm.linux.org.uk
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1459450087-24792-2-git-send-email-tthayer@opensource.altera.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-04-02 13:27:06 +02:00
Thor Thayer
588cb03ea2 EDAC, altera: Add Arria10 L2 Cache ECC handling
Add a private data structure for Arria10 L2 cache ECC and the probe
function for it.

The Arria10 ECC device IRQs are in a shared register so the ECC Manager
parent/child relationship requires a different probe function.

Signed-off-by: Thor Thayer <tthayer@opensource.altera.com>
Cc: devicetree@vger.kernel.org
Cc: dinguyen@opensource.altera.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux@arm.linux.org.uk
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1458576106-24505-8-git-send-email-tthayer@opensource.altera.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-03-29 10:34:06 +02:00
Thor Thayer
811fce4f2a EDAC, altera: Add register offset for ECC Error Inject
In preparation for the Arria10 peripheral ECCs, add a register offset
from the ECC base to the private data structure to index to the error
injection register.

Signed-off-by: Thor Thayer <tthayer@opensource.altera.com>
Cc: devicetree@vger.kernel.org
Cc: dinguyen@opensource.altera.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux@arm.linux.org.uk
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1458576106-24505-6-git-send-email-tthayer@opensource.altera.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-03-29 10:20:48 +02:00
Thor Thayer
27439a1a63 EDAC, altera: Abstract ECC Enable Mask in check_deps()
In preparation for the Arria10 peripheral ECCs, use the ECC Enable mask
in place of hard coded masks in the check dependency functions.

Signed-off-by: Thor Thayer <tthayer@opensource.altera.com>
Cc: devicetree@vger.kernel.org
Cc: dinguyen@opensource.altera.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux@arm.linux.org.uk
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1458576106-24505-5-git-send-email-tthayer@opensource.altera.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-03-29 10:17:32 +02:00
Thor Thayer
328ca7ae81 EDAC, altera: Remove platform device from check_deps()
In preparation for the Arria10 peripheral ECCs, remove the platform
device parameter from the check_deps() functions because it is not
needed and makes the Arria10 check_deps() cleaner.

Signed-off-by: Thor Thayer <tthayer@opensource.altera.com>
Cc: devicetree@vger.kernel.org
Cc: dinguyen@opensource.altera.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux@arm.linux.org.uk
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1458576106-24505-4-git-send-email-tthayer@opensource.altera.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-03-29 10:14:21 +02:00
Thor Thayer
05b088b6f8 EDAC, altera: Move device structs and defines to the header
Move the device structs and defines to altera_edac.h in preparation for
adding the Arria10 L2 cache ECC.

Signed-off-by: Thor Thayer <tthayer@opensource.altera.com>
Cc: devicetree@vger.kernel.org
Cc: dinguyen@opensource.altera.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux@arm.linux.org.uk
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1458576106-24505-3-git-send-email-tthayer@opensource.altera.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-03-29 10:11:16 +02:00
Thor Thayer
3a8f21f170 EDAC, altera: Make L2C depend on L2x0 cache controller
Make L2 cache depend instead of forcibly select the L2 cache support.

Signed-off-by: Thor Thayer <tthayer@opensource.altera.com>
Cc: devicetree@vger.kernel.org
Cc: dinguyen@opensource.altera.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux@arm.linux.org.uk
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1458576106-24505-2-git-send-email-tthayer@opensource.altera.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-03-29 10:06:11 +02:00
Linus Torvalds
047486d8e7 EDAC queue for 4.6
* Altera: L2 cache and On-Chip RAM support (Thor Thayer).
 
 * EDAC: Workqueue handling cleanups (Borislav Petkov).
 
 * Xgene: Register bus error handling (Loc Ho).
 
 * Misc small fixes.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJW5sS0AAoJEBLB8Bhh3lVKxPEP/j8pleG17s1HI9wfhukzuT9z
 epjyaYjEv1BR6+3drmFjbAYGgk7hhL8khVePEhouS/P5WQSzRHMdgimL5NMpnwSZ
 6XXyR2szhz86eYMC1lQxdRnESZarbnVtqyiRG0Mv1hFbObyzM0ewiHt2lTtCa1Uj
 DeprrSE6QCQLwq0WSIF8MJ9LBcmGh5dXJTy0sTeATfkmgBaBQeJhiZOmJer25Jqf
 tEyvxkkFw0OLAy3BoI7eeI7ALmgzIXPLIWVOo0t1qTeKsURwdfap8xjteT9/5iiJ
 HHB+4A+8iUMSPYMoIiSr0qsyZT1CI2ncVV4VOAhm11if4gB8sCnOkioMo6bHhtbq
 NZrug6BnwizBBh4FFGHmTTpN8MlhLgYA8YfSkUD5jal3jM/l5VvlM9DDM72JuzLy
 0hd5zaAqlqEJj4plvSx4ieuDHAkA3Qzd58U12LuN+R+nbN2EO8Svr9CvEMSLE+1r
 exxOYRP5xO9SPeK7pTnXJFsI09MrRmXGinRQu/LChWAm1j7NxaHkeUl7ulzxjV+5
 E6Bx+mOPvYTUF32CQi4i/oIePK2kVGoExaUSkRNKeRgwF/ObQaVxjM+cYvM+VJEh
 2OknzHNG/F87UvkcOdGMEy6G4E5xkHSuEYtq7bHGWI2prKn+f0nbPpWgx9d4DiE8
 jo0vi58v+Irwk9H465lH
 =ssy9
 -----END PGP SIGNATURE-----

Merge tag 'edac_for_4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp

Pull EDAC updates from Borislav Petkov:

 - Altera: L2 cache and On-Chip RAM support (Thor Thayer).

 - EDAC: Workqueue handling cleanups (Borislav Petkov).

 - Xgene: Register bus error handling (Loc Ho).

 - Misc small fixes.

* tag 'edac_for_4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
  ARM: socfpga: Enable OCRAM ECC on startup
  ARM: socfpga: Enable L2 cache ECC on startup
  ARM: dts: Add Altera L2 Cache and OCRAM EDAC entries
  EDAC, altera: Add Altera L2 cache and OCRAM support
  EDAC: Use edac_debugfs_remove_recursive() in edac_debugfs_exit()
  EDAC, mpc85xx: Silence unused variable warning
  EDAC: Cleanup/sync workqueue functions
  EDAC: Kill workqueue setup/teardown functions
  EDAC: Balance workqueue setup and teardown
  arm64: Update the APM X-Gene EDAC node with the RB register resource
  EDAC, xgene: Add missing SoC register bus error handling
  Documentation, EDAC: Update xgene binding for missing register bus
  EDAC, amd64_edac: Shift wrapping issue in f1x_get_norm_dct_addr()
2016-03-16 08:36:55 -07:00
Linus Torvalds
d88bfe1d68 Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RAS updates from Ingo Molnar:
 "Various RAS updates:

   - AMD MCE support updates for future CPUs, fixes and 'SMCA' (Scalable
     MCA) error decoding support (Aravind Gopalakrishnan)

   - x86 memcpy_mcsafe() support, to enable smart(er) hardware error
     recovery in NVDIMM drivers, based on an extension of the x86
     exception handling code.  (Tony Luck)"

* 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  EDAC/sb_edac: Fix computation of channel address
  x86/mm, x86/mce: Add memcpy_mcsafe()
  x86/mce/AMD: Document some functionality
  x86/mce: Clarify comments regarding deferred error
  x86/mce/AMD: Fix logic to obtain block address
  x86/mce/AMD, EDAC: Enable error decoding of Scalable MCA errors
  x86/mce: Move MCx_CONFIG MSR definitions
  x86/mce: Check for faults tagged in EXTABLE_CLASS_FAULT exception table entries
  x86/mm: Expand the exception table logic to allow new handling options
  x86/mce/AMD: Set MCAX Enable bit
  x86/mce/AMD: Carve out threshold block preparation
  x86/mce/AMD: Fix LVT offset configuration for thresholding
  x86/mce/AMD: Reduce number of blocks scanned per bank
  x86/mce/AMD: Do not perform shared bank check for future processors
  x86/mce: Fix order of AMD MCE init function call
2016-03-14 18:43:51 -07:00
Luck, Tony
eb1af3b71f EDAC/sb_edac: Fix computation of channel address
Large memory Haswell-EX systems with multiple DIMMs per channel were
sometimes reporting the wrong DIMM.

Found three problems:

 1) Debug printouts for socket and channel interleave were not interpreting
    the register fields correctly. The socket interleave field is a 2^X
    value (0=1, 1=2, 2=4, 3=8). The channel interleave is X+1 (0=1, 1=2,
    2=3. 3=4).

 2) Actual use of the socket interleave value didn't interpret as 2^X

 3) Conversion of address to channel address was complicated, and wrong.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Aristeu Rozanski <arozansk@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-edac@vger.kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-10 18:31:55 +01:00
Aravind Gopalakrishnan
be0aec23bf x86/mce/AMD, EDAC: Enable error decoding of Scalable MCA errors
For Scalable MCA enabled processors, errors are listed per IP block. And
since it is not required for an IP to map to a particular bank, we need
to use HWID and McaType values from the MCx_IPID register to figure out
which IP a given bank represents.

We also have a new bit (TCC) in the MCx_STATUS register to indicate Task
context is corrupt.

Add logic here to decode errors from all known IP blocks for Fam17h
Model 00-0fh and to print TCC errors.

[ Minor fixups. ]
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1457021458-2522-3-git-send-email-Aravind.Gopalakrishnan@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-08 11:48:14 +01:00
Hubert Chrzaniuk
83bdaad4d9 EDAC, sb_edac: Fix logic when computing DIMM sizes on Xeon Phi
Correct a typo introduced by

  d0cdf90031 ("EDAC, sb_edac: Add Knights Landing (Xeon Phi gen 2) support")

As a result under some configurations DIMMs were not correctly
recognized. Problem affects only Xeon Phi architecture.

Signed-off-by: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: lukasz.anaczkowski@intel.com
Link: http://lkml.kernel.org/r/1457361045-26221-1-git-send-email-hubert.chrzaniuk@intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-03-07 19:07:40 +01:00
Thor Thayer
c3eea1942a EDAC, altera: Add Altera L2 cache and OCRAM support
Add L2 Cache and On-Chip RAM EDAC support for the Altera SoCs. The SDRAM
controller is using the Memory Controller model.

Each type of ECC is individually configurable.

Signed-off-by: Thor Thayer <tthayer@opensource.altera.com>
Cc: devicetree@vger.kernel.org
Cc: dinguyen@opensource.altera.com
Cc: galak@codeaurora.org
Cc: grant.likely@linaro.org
Cc: ijc+devicetree@hellion.org.uk
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux@arm.linux.org.uk
Cc: linux-doc@vger.kernel.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: mark.rutland@arm.com
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: pawel.moll@arm.com
Cc: robh+dt@kernel.org
Link: http://lkml.kernel.org/r/1455132384-17108-1-git-send-email-tthayer@opensource.altera.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-02-11 12:23:06 +01:00
Thor Thayer
9bf4f00567 EDAC: Use edac_debugfs_remove_recursive() in edac_debugfs_exit()
debugfs_remove() is used to remove a file or a directory from the
debugfs filesystem on an EDAC device exit. However edac_debugfs might
not be empty. This is similar to

  30f84a891b ("EDAC: Use edac_debugfs_remove_recursive()")

which changed the EDAC MCI code to use edac_debugfs_remove_recursive().

Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Thor Thayer <tthayer@opensource.altera.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1455064165-3816-1-git-send-email-tthayer@opensource.altera.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-02-10 10:37:46 +01:00
Sudip Mukherjee
f2b59ac66f EDAC, mpc85xx: Silence unused variable warning
We were getting this build warning:

  drivers/edac/mpc85xx_edac.c:1247:6: warning: unused variable 'pvr'

pvr is only used if CONFIG_FSL_SOC_BOOKE is defined. Declare it
__maybe_unused.

Suggested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1454427573-7994-1-git-send-email-sudipm.mukherjee@gmail.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-02-02 18:53:15 +01:00
Borislav Petkov
06e912d4d4 EDAC: Cleanup/sync workqueue functions
They're both running only when ->edac_check is initialized so remove
that check from the workqueue function itself. Synchronize/generalize
the ->op_state check between the two.

Kill useless comments, while at it.

Signed-off-by: Borislav Petkov <bp@suse.de>
2016-02-02 11:38:50 +01:00
Borislav Petkov
626a7a4dba EDAC: Kill workqueue setup/teardown functions
We have the generic wrappers now, use those. edac_pci_workq_setup() had
an unused argument anyway.

Signed-off-by: Borislav Petkov <bp@suse.de>
2016-02-02 11:38:43 +01:00
Borislav Petkov
0966760619 EDAC: Balance workqueue setup and teardown
We use the ->edac_check function pointers to determine whether we need
to setup a polling workqueue. However, the destroy path is not balanced
and we might try to teardown an unitialized workqueue.

Balance init and destroy paths by looking at ->edac_check in both cases.
Set op_state to OP_OFFLINE *before* destroying anything.

Reported-by: Zhiqiang Hou <Zhiqiang.Hou@freescale.com>
Cc: Varun Sethi <Varun.Sethi@freescale.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-02-02 11:04:29 +01:00
Loc Ho
4d67e3ce75 EDAC, xgene: Add missing SoC register bus error handling
Add missing register bus error handling for APM X-Gene EDAC SoC and fix
a checking condition for CE error promoted to UE.

Signed-off-by: Loc Ho <lho@apm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: devicetree@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: patches@apm.com
Link: http://lkml.kernel.org/r/1453495625-28006-3-git-send-email-lho@apm.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-01-25 11:17:22 +01:00
Dan Carpenter
6f3508f61c EDAC, amd64_edac: Shift wrapping issue in f1x_get_norm_dct_addr()
dct_sel_base_off is declared as a u64 but we're only using the lower 32
bits because of a shift wrapping bug. This can possibly truncate the
upper 16 bits of DctSelBaseOffset[47:26], causing us to misdecode the CS
row.

Fixes: c8e518d567 ('amd64_edac: Sanitize f10_get_base_addr_offset')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20160120095451.GB19898@mwanda
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-01-25 11:17:14 +01:00
Geliang Tang
1cac5503fb EDAC, i5100: Use to_delayed_work()
Use to_delayed_work() instead of open-coding it.

Signed-off-by: Geliang Tang <geliangtang@163.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/58c0e319c7263a10b692100c657c06c42814aecf.1451659910.git.geliangtang@163.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-01-01 18:31:34 +01:00
Hubert Chrzaniuk
45f4d3ab3e EDAC, sb_edac: Set fixed DIMM width on Xeon Knights Landing
Knights Landing does not come with register that could be used to fetch
DIMM width. However the value is fixed for this architecture so it can
be hardcoded.

Signed-off-by: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: lukasz.anaczkowski@intel.com
Link: http://lkml.kernel.org/r/1449840082-18673-1-git-send-email-hubert.chrzaniuk@intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-12-11 16:58:32 +01:00
Borislav Petkov
c4cf3b454e EDAC: Rework workqueue handling
Hide the EDAC workqueue pointer in a separate compilation unit and add
accessors for the workqueue manipulations needed.

Remove edac_pci_reset_delay_period() which wasn't used by anything. It
seems it got added without a user with

  91b99041c1 ("drivers/edac: updated PCI monitoring")

Signed-off-by: Borislav Petkov <bp@suse.de>
2015-12-11 16:56:43 +01:00
Borislav Petkov
e136fa016f EDAC: Make edac_device workqueue setup/teardown functions static
They're not used anywhere else.

Signed-off-by: Borislav Petkov <bp@suse.de>
2015-12-11 16:56:42 +01:00
Borislav Petkov
d4538000ca EDAC: Remove edac_get_sysfs_subsys() error handling
It cannot fail now. We either load EDAC core after having successfully
initialized edac_subsys or we don't.

Signed-off-by: Borislav Petkov <bp@suse.de>
2015-12-11 16:56:41 +01:00
Borislav Petkov
a97d262701 EDAC: Unexport and make edac_subsys static
... and use the accessor instead.

Signed-off-by: Borislav Petkov <bp@suse.de>
2015-12-11 16:56:40 +01:00
Borislav Petkov
733476cf20 EDAC: Rip out the edac_subsys reference counting
This was really dumb - reference counting for the main EDAC sysfs
object. While we could've simply registered it as the first thing in the
module init path and then hand it around to what needs it.

Do that and rip out all the code around it, thus simplifying the whole
handling significantly.

Move the edac_subsys node back to edac_module.c.

Signed-off-by: Borislav Petkov <bp@suse.de>
2015-12-11 16:56:39 +01:00
Borislav Petkov
fcd5c4dd82 EDAC: Robustify workqueues destruction
EDAC workqueue destruction is really fragile. We cancel delayed work
but if it is still running and requeues itself, we still go ahead and
destroy the workqueue and the queued work explodes when workqueue core
attempts to run it.

Make the destruction more robust by switching op_state to offline so
that requeuing stops. Cancel any pending work *synchronously* too.

  EDAC i7core: Driver loaded.
  general protection fault: 0000 [#1] SMP
  CPU 12
  Modules linked in:
  Supported: Yes
  Pid: 0, comm: kworker/0:1 Tainted: G          IE   3.0.101-0-default #1 HP ProLiant DL380 G7
  RIP: 0010:[<ffffffff8107dcd7>]  [<ffffffff8107dcd7>] __queue_work+0x17/0x3f0
  < ... regs ...>
  Process kworker/0:1 (pid: 0, threadinfo ffff88019def6000, task ffff88019def4600)
  Stack:
   ...
  Call Trace:
   call_timer_fn
   run_timer_softirq
   __do_softirq
   call_softirq
   do_softirq
   irq_exit
   smp_apic_timer_interrupt
   apic_timer_interrupt
   intel_idle
   cpuidle_idle_call
   cpu_idle
  Code: ...
  RIP  __queue_work
   RSP <...>

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
2015-12-11 16:56:39 +01:00
Borislav Petkov
12e26969b3 EDAC, mc_sysfs: Fix freeing bus' name
I get the splat below when modprobing/rmmoding EDAC drivers. It happens
because bus->name is invalid after bus_unregister() has run. The Code: section
below corresponds to:

  .loc 1 1108 0
  movq    672(%rbx), %rax # mci_1(D)->bus, mci_1(D)->bus
  .loc 1 1109 0
  popq    %rbx    #

  .loc 1 1108 0
  movq    (%rax), %rdi    # _7->name,
  jmp     kfree   #

and %rax has some funky stuff 2030203020312030 which looks a lot like
something walked over it.

Fix that by saving the name ptr before doing stuff to string it points to.

  general protection fault: 0000 [#1] SMP
  Modules linked in: ...
  CPU: 4 PID: 10318 Comm: modprobe Tainted: G          I EN  3.12.51-11-default+ #48
  Hardware name: HP ProLiant DL380 G7, BIOS P67 05/05/2011
  task: ffff880311320280 ti: ffff88030da3e000 task.ti: ffff88030da3e000
  RIP: 0010:[<ffffffffa019da92>]  [<ffffffffa019da92>] edac_unregister_sysfs+0x22/0x30 [edac_core]
  RSP: 0018:ffff88030da3fe28  EFLAGS: 00010292
  RAX: 2030203020312030 RBX: ffff880311b4e000 RCX: 000000000000095c
  RDX: 0000000000000001 RSI: ffff880327bb9600 RDI: 0000000000000286
  RBP: ffff880311b4e750 R08: 0000000000000000 R09: ffffffff81296110
  R10: 0000000000000400 R11: 0000000000000000 R12: ffff88030ba1ac68
  R13: 0000000000000001 R14: 00000000011b02f0 R15: 0000000000000000
  FS:  00007fc9bf8f5700(0000) GS:ffff8801a7c40000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  CR2: 0000000000403c90 CR3: 000000019ebdf000 CR4: 00000000000007e0
  Stack:
  Call Trace:
    i7core_unregister_mci.isra.9
    i7core_remove
    pci_device_remove
    __device_release_driver
    driver_detach
    bus_remove_driver
    pci_unregister_driver
    i7core_exit
    SyS_delete_module
    system_call_fastpath
    0x7fc9bf426536
  Code: 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 53 48 89 fb e8 52 2a 1f e1 48 8b bb a0 02 00 00 e8 46 59 1f e1 48 8b 83 a0 02 00 00 5b <48> 8b 38 e9 26 9a fe e0 66 0f 1f 44 00 00 66 66 66 66 90 48 8b
  RIP  [<ffffffffa019da92>] edac_unregister_sysfs+0x22/0x30 [edac_core]
   RSP <ffff88030da3fe28>

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: <stable@vger.kernel.org> # v3.6..
Fixes: 7a623c0390 ("edac: rewrite the sysfs code to use struct device")
2015-12-11 16:56:38 +01:00
Scott Wood
666db563d3 EDAC, mpc85xx: Make mpc85xx-pci-edac a platform device
Originally the mpc85xx-pci-edac driver bound directly to the PCI
controller node.

Commit

  905e75c46d ("powerpc/fsl-pci: Unify pci/pcie initialization code")

turned the PCI controller code into a platform device. Since we can't
have two drivers binding to the same device, the EDAC code was changed
to be called into as a library-style submodule. However, this doesn't
work if the EDAC driver is built as a module.

Commit

  8d8fcba6d1ea ("EDAC: Rip out the edac_subsys reference counting")

exposed another problem with this approach -- mpc85xx_pci_err_probe()
was being called in the same early boot phase that the PCI controller
is initialized, rather than in the device_initcall phase that the EDAC
layer expects. This caused a crash on boot.

To fix this, the PCI controller code now creates a child platform device
specifically for EDAC, which the mpc85xx-pci-edac driver binds to.

Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Daniel Axtens <dja@axtens.net>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: Jia Hongtao <B38951@freescale.com>
Cc: Jiri Kosina <jkosina@suse.com>
Cc: Kim Phillips <kim.phillips@freescale.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Masanari Iida <standby24x7@gmail.com>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Rob Herring <robh@kernel.org>
Link: http://lkml.kernel.org/r/1449774432-18593-1-git-send-email-scottwood@freescale.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-12-11 16:56:16 +01:00
Jim Snow
d0cdf90031 EDAC, sb_edac: Add Knights Landing (Xeon Phi gen 2) support
Knights Landing is the next generation architecture for HPC market.

KNL introduces concept of a tile and CHA - Cache/Home Agent for memory
accesses.

Some things are fixed in KNL:
() There's single DIMM slot per channel
() There's 2 memory controllers with 3 channels each, however,
   from EDAC standpoint, it is presented as single memory controller
   with 6 channels. In order to represent 2 MCs w/ 3 CH, it would
   require major redesign of EDAC core driver.

Basically, two functionalities are added/extended:
() during driver initialization KNL topology is being recognized, i.e.
   which channels are populated with what DIMM sizes
   (knl_get_dimm_capacity function)
() handle MCE errors - channel swizzling

Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Jim Snow <jim.m.snow@intel.com>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: lukasz.anaczkowski@intel.com
Link: http://lkml.kernel.org/r/1449136134-23706-5-git-send-email-hubert.chrzaniuk@intel.com
[ Rebase to 4.4-rc3. ]
Signed-off-by: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-12-05 19:00:52 +01:00
Jim Snow
c1979ba254 EDAC, sb_edac: Add support for duplicate device IDs
Add options to sbridge_get_all_devices() to allow for duplicate device
IDs and devices that are scattered across mulitple PCI buses.

Signed-off-by: Jim Snow <jim.m.snow@intel.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: lukasz.anaczkowski@intel.com
Link: http://lkml.kernel.org/r/1449136134-23706-4-git-send-email-hubert.chrzaniuk@intel.com
[ Rebase to 4.4-rc3. ]
Signed-off-by: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-12-05 18:57:41 +01:00
Jim Snow
c59f9c06bd EDAC, sb_edac: Virtualize several hard-coded functions
SAD limit, interleave mode and DRAM related functionalities are now
virtualized, so that overriding them is easier.

Signed-off-by: Jim Snow <jim.m.snow@intel.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: lukasz.anaczkowski@intel.com
Link: http://lkml.kernel.org/r/1449136134-23706-3-git-send-email-hubert.chrzaniuk@intel.com
[ Rebase to 4.4-rc3. ]
Signed-off-by: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-12-05 18:54:45 +01:00
Thierry Reding
768ce42cce EDAC, mv64x60: Use platform_register/unregister_drivers()
These new helpers simplify implementing multi-driver modules and
properly handle failure to register one driver by unregistering all
previously registered drivers.

Signed-off-by: Thierry Reding <treding@nvidia.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1449073138-10852-2-git-send-email-thierry.reding@gmail.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-12-03 12:05:40 +01:00
Thierry Reding
d54051f1cc EDAC, mpc85xx: Use platform_register/unregister_drivers()
These new helpers simplify implementing multi-driver modules and
properly handle failure to register one driver by unregistering all
previously registered drivers.

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1449136632-11680-1-git-send-email-thierry.reding@gmail.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-12-03 12:03:36 +01:00
Borislav Petkov
e8f937da74 EDAC, pci: Remove old disabled code
Remove an unused edac_pci_find() function iterating over edac_pci_list.

Signed-off-by: Borislav Petkov <bp@suse.de>
2015-11-18 14:00:05 +01:00
Linus Torvalds
9cf5c095b6 asm-generic cleanups
The asm-generic changes for 4.4 are mostly a series from Christoph Hellwig
 to clean up various abuses of headers in there. The patch to rename the
 io-64-nonatomic-*.h headers caused some conflicts with new users, so I
 added a workaround that we can remove in the next merge window.
 
 The only other patch is a warning fix from Marek Vasut
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIVAwUAVjzaf2CrR//JCVInAQImmhAA20fZ91sUlnA5skKNPT1phhF6Z7UF2Sx5
 nPKcHQD3HA3lT1OKfPBYvCo+loYflvXFLaQThVylVcnE/8ecAEMtft4nnGW2nXvh
 sZqHIZ8fszTB53cynAZKTjdobD1wu33Rq7XRzg0ugn1mdxFkOzCHW/xDRvWRR5TL
 rdQjzzgvn2PNlqFfHlh6cZ5ykShM36AIKs3WGA0H0Y/aYsE9GmDOAUp41q1mLXnA
 4lKQaIxoeOa+kmlsUB0wEHUecWWWJH4GAP+CtdKzTX9v12bGNhmiKUMCETG78BT3
 uL8irSqaViNwSAS9tBxSpqvmVUsa5aCA5M3MYiO+fH9ifd7wbR65g/wq39D3Pc01
 KnZ3BNVRW5XSA3c86pr8vbg/HOynUXK8TN0lzt6rEk8bjoPBNEDy5YWzy0t6reVe
 wX65F+ver8upjOKe9yl2Jsg+5Kcmy79GyYjLUY3TU2mZ+dIdScy/jIWatXe/OTKZ
 iB4Ctc4MDe9GDECmlPOWf98AXqsBUuKQiWKCN/OPxLtFOeWBvi4IzvFuO8QvnL9p
 jZcRDmIlIWAcDX/2wMnLjV+Hqi3EeReIrYznxTGnO7HHVInF555GP51vFaG5k+SN
 smJQAB0/sostmC1OCCqBKq5b6/li95/No7+0v0SUhJJ5o76AR5CcNsnolXesw1fu
 vTUkB/I66Hk=
 =dQKG
 -----END PGP SIGNATURE-----

Merge tag 'asm-generic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic

Pull asm-generic cleanups from Arnd Bergmann:
 "The asm-generic changes for 4.4 are mostly a series from Christoph
  Hellwig to clean up various abuses of headers in there.  The patch to
  rename the io-64-nonatomic-*.h headers caused some conflicts with new
  users, so I added a workaround that we can remove in the next merge
  window.

  The only other patch is a warning fix from Marek Vasut"

* tag 'asm-generic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
  asm-generic: temporarily add back asm-generic/io-64-nonatomic*.h
  asm-generic: cmpxchg: avoid warnings from macro-ized cmpxchg() implementations
  gpio-mxc: stop including <asm-generic/bug>
  n_tracesink: stop including <asm-generic/bug>
  n_tracerouter: stop including <asm-generic/bug>
  mlx5: stop including <asm-generic/kmap_types.h>
  hifn_795x: stop including <asm-generic/kmap_types.h>
  drbd: stop including <asm-generic/kmap_types.h>
  move count_zeroes.h out of asm-generic
  move io-64-nonatomic*.h out of asm-generic
2015-11-06 14:22:15 -08:00
Linus Torvalds
b831ef2cad Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RAS changes from Ingo Molnar:
 "The main system reliability related changes were from x86, but also
  some generic RAS changes:

   - AMD MCE error injection subsystem enhancements.  (Aravind
     Gopalakrishnan)

   - Fix MCE and CPU hotplug interaction bug.  (Ashok Raj)

   - kcrash bootup robustness fix.  (Baoquan He)

   - kcrash cleanups.  (Borislav Petkov)

   - x86 microcode driver rework: simplify it by unmodularizing it and
     other cleanups.  (Borislav Petkov)"

* 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
  x86/mce: Add a default case to the switch in __mcheck_cpu_ancient_init()
  x86/mce: Add a Scalable MCA vendor flags bit
  MAINTAINERS: Unify the microcode driver section
  x86/microcode/intel: Move #ifdef DEBUG inside the function
  x86/microcode/amd: Remove maintainers from comments
  x86/microcode: Remove modularization leftovers
  x86/microcode: Merge the early microcode loader
  x86/microcode: Unmodularize the microcode driver
  x86/mce: Fix thermal throttling reporting after kexec
  kexec/crash: Say which char is the unrecognized
  x86/setup/crash: Check memblock_reserve() retval
  x86/setup/crash: Cleanup some more
  x86/setup/crash: Remove alignment variable
  x86/setup: Cleanup crashkernel reservation functions
  x86/amd_nb, EDAC: Rename amd_get_node_id()
  x86/setup: Do not reserve crashkernel high memory if low reservation failed
  x86/microcode/amd: Do not overwrite final patch levels
  x86/microcode/amd: Extract current patch level read to a function
  x86/ras/mce_amd_inj: Inject bank 4 errors on the NBC
  x86/ras/mce_amd_inj: Trigger deferred and thresholding errors interrupts
  ...
2015-11-03 17:51:33 -08:00
Tan Xiaojun
990995bad1 EDAC: Fix PAGES_TO_MiB macro misuse
The PAGES_TO_MiB macro is used for unit conversion but the
trace_mc_event() tracepoint expects a page address. Fix that.

Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1445341538-24271-1-git-send-email-tanxiaojun@huawei.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-10-22 22:57:30 +02:00
Aravind Gopalakrishnan
1a6775c1a2 x86/amd_nb, EDAC: Rename amd_get_node_id()
This function doesn't give us the "Node ID" as the function name
suggests. Rather, it receives a PCI device as argument, checks
the available F3 PCI device IDs in the system and returns the
index of the matching Bus/Device IDs.

Rename it to amd_pci_dev_to_node_id().

No functional change is introduced.

Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1445246268-26285-3-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-21 11:10:55 +02:00
Dinh Nguyen
941fd2e709 EDAC, altera: SoCFPGA EDAC should not look for ECC_CORR_EN
The bootloader may or may not enable the ECC_CORR_EN bit. By
not enabling ECC_CORR_EN, when error happens, it is the user's
responsibility to perform a full SDRAM scrub.

Remove the check for ECC_CORR_EN.

Signed-off-by: Dinh Nguyen <dinguyen@opensource.altera.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: Thor Thayer <tthayer@opensource.altera.com>
Link: http://lkml.kernel.org/r/1444864456-21778-1-git-send-email-dinguyen@opensource.altera.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-10-15 11:57:23 +02:00
Christoph Hellwig
2f8e2c8777 move io-64-nonatomic*.h out of asm-generic
These are not implementations of default architecture code but helpers
for drivers. Move them to the place they belong to.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Darren Hart <dvhart@linux.intel.com>
Acked-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2015-10-15 00:21:07 +02:00
Tan Xiaojun
30f84a891b EDAC: Use edac_debugfs_remove_recursive()
debugfs_remove() is used to remove a file or a directory from the
debugfs filesystem, but mci->debugfs might not empty.

This can be triggered by the following sequence:

1) Enable CONFIG_EDAC_DEBUG
2) insmod an EDAC module (like i3000_edac or similar)
3) rmmod this module
4) we can see files remaining under <debugfs_mountpoint>/edac/ like
   "fake_inject", for example.

Removing edac_core then, causes a NULL pointer dereference.

Reported-by: Yun Wu (Abel) <wuyun.wu@huawei.com>
Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1444787364-104353-1-git-send-email-tanxiaojun@huawei.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-10-14 18:50:32 +02:00
Luis de Bethencourt
90b3b37383 EDAC, ppc4xx_edac: Fix module autoload for OF platform driver
This platform driver has an OF device ID table but the OF module alias
information is not created so module autoloading won't work.

Signed-off-by: Luis de Bethencourt <luisbg@osg.samsung.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20150917114619.GA13145@goodgumbo.baconseed.org
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-10-03 12:19:42 +02:00
Aravind Gopalakrishnan
1a8bc7707e EDAC, amd64_edac: Update copyright and remove changelog
Git provides us all the changelogs anyway. So trim the comments section
here. Update the copyrights info while at it.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1443440593-2316-3-git-send-email-Aravind.Gopalakrishnan@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-09-29 13:41:04 +02:00
Aravind Gopalakrishnan
da92110dfd EDAC, amd64_edac: Extend scrub rate support to F15hM60h
The scrub rate control register has moved to function 2 in PCI config
space and is at a different offset on family 0x15, models 0x60 and
later. The minimum recommended scrub rate has also changed. (Refer to
D18F2x1c9_dct[1:0][DramScrub] in Fam15hM60h BKDG).

Adjust set_scrub_rate() and get_scrub_rate() functions to accommodate
this.

Tested on F15hM60h, Fam15h, models 00h-0fh and Fam10h systems.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1443440593-2316-2-git-send-email-Aravind.Gopalakrishnan@amd.com
[ Cleanup conditionals. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-09-29 13:25:33 +02:00
Toshi Kani
d0c9c93019 EDAC: Don't allow empty DIMM labels
Updating dimm_label to an empty string does not make much sense. Change
the sysfs dimm_label store operation to fail a request when an input
string is empty.

Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: elliott@hpe.com
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1443124767.25474.172.camel@hpe.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-09-28 16:39:05 +02:00
Toshi Kani
438470b84c EDAC: Fix sysfs dimm_label store operation
Sysfs "dimm_label" and "chX_dimm_label" nodes have the following issues
in their store operation:

 1) A newline-terminated input string causes redundant newlines:

  # echo "test" > /sys/bus/mc0/devices/dimm0/dimm_label
  # cat  /sys/bus/mc0/devices/dimm0/dimm_label
  test

  #  od -bc /sys/bus/mc0/devices/dimm0/dimm_label
  0000000 164 145 163 164 012 012
            t   e   s   t  \n  \n
  0000006

 2) The original label string (31 characters) cannot be stored due to
    an improper size check:

  # echo "CPU_SrcID#0_Ha#0_Chan#0_DIMM#0" > /sys/bus/mc0/devices/dimm0/dimm_label
  # cat /sys/bus/mc0/devices/dimm0/dimm_label

  # od -bc /sys/bus/mc0/devices/dimm0/dimm_label
   0000000 012 012
            \n  \n
   0000002

 3) An input string longer than the buffer size results a wrong label
    info as it allows a retry with the remaining string:

  # echo "CPU_SrcID#0_Ha#0_Chan#0_DIMM#0_TEST" > /sys/bus/mc0/devices/dimm0/dimm_label
  # cat  /sys/bus/mc0/devices/dimm0/dimm_label
  _TEST

Fix these issues by making the following changes:
 1) Replace a newline character at the end by setting a null. It also
    assures that the string is null-terminated in the label buffer.
 2) Check the label buffer size with 'sizeof(dimm->label)'.
 3) Fail a request if its string exceeds the label buffer size.

Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: Robert Elliott <elliott@hpe.com>
Link: http://lkml.kernel.org/r/1443121564.25474.160.camel@hpe.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-09-25 19:45:59 +02:00
Toshi Kani
1ea62c59c8 EDAC: Fix sysfs dimm_label show operation
After

  7d375bffa5 ("sb_edac: Fix support for systems with two home agents per socket")

sysfs "dimm_label" and "chX_dimm_label" show their label string without a
newline "\n" at the end.

  [root@orange ~]# cat /sys/bus/mc0/devices/dimm0/dimm_label
  CPU_SrcID#0_Ha#0_Chan#0_DIMM#0[root@orange ~]#

  [root@orange ~]# cat /sys/devices/system/edac/mc/mc0/csrow0/ch0_dimm_label
  CPU_SrcID#0_Ha#0_Chan#0_DIMM#0[root@orange ~]#

The label strings now have 31 characters, which are the same as
EDAC_MC_LABEL_LEN. Since the snprintf()s in channel_dimm_label_show()
and dimmdev_label_show() limit the whole length by EDAC_MC_LABEL_LEN,
the newline in the format "%s\n" is ignored.

  [root@orange ~]# od -bc /sys/bus/mc0/devices/dimm0/dimm_label
  0000000 103 120 125 137 123 162 143 111 104 043 060 137 110 141 043 060
            C   P   U   _   S   r   c   I   D   #   0   _   H   a   #   0
  0000020 137 103 150 141 156 043 060 137 104 111 115 115 043 060 000
            _   C   h   a   n   #   0   _   D   I   M   M   #   0  \0
  0000037

Fix it by using 'sizeof(dimm->label) + 1' as the whole length in the
snprintf()s in channel_dimm_label_show() and dimmdev_label_show().

Reported-by: Robert Elliott <elliott@hpe.com>
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Link: http://lkml.kernel.org/r/1442933883-21587-2-git-send-email-toshi.kani@hpe.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-09-25 19:19:21 +02:00
Loc Ho
f864b79ba2 EDAC, xgene: Add SoC support
Add support for the SoC component.

Signed-off-by: Loc Ho <lho@apm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: devicetree@vger.kernel.org
Cc: ijc+devicetree@hellion.org.uk
Cc: jcm@redhat.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: mark.rutland@arm.com
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: patches@apm.com
Cc: robh+dt@kernel.org
Link: http://lkml.kernel.org/r/1443055261-8613-4-git-send-email-lho@apm.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-09-25 15:41:46 +02:00
Loc Ho
9bc1c0c0ec EDAC, xgene: Fix possible sprintf() overflow issue
Replace sprintf() with snprintf() to avoid possible string array
overflow.

Signed-off-by: Loc Ho <lho@apm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: devicetree@vger.kernel.org
Cc: ijc+devicetree@hellion.org.uk
Cc: jcm@redhat.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: mark.rutland@arm.com
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: patches@apm.com
Cc: robh+dt@kernel.org
Link: http://lkml.kernel.org/r/1443116287-11752-1-git-send-email-lho@apm.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-09-25 15:39:52 +02:00
Loc Ho
9347473c7d EDAC, xgene: Add L3 support
Add EDAC support for the L3 component.

Signed-off-by: Loc Ho <lho@apm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: devicetree@vger.kernel.org
Cc: ijc+devicetree@hellion.org.uk
Cc: jcm@redhat.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: mark.rutland@arm.com
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: patches@apm.com
Cc: robh+dt@kernel.org
Link: http://lkml.kernel.org/r/1443055261-8613-3-git-send-email-lho@apm.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-09-25 15:36:31 +02:00
Seth Jennings
2900ea6096 EDAC, sb_edac: Fix TAD presence check for sbridge_mci_bind_devs()
In commit

  7d375bffa5 ("sb_edac: Fix support for systems with two home agents per socket")

NUM_CHANNELS was changed to 8 and the channel space was renumerated to
handle EN, EP, and EX configurations.

The *_mci_bind_devs() functions - except for sbridge_mci_bind_devs() -
got a new device presence check in the form of saw_chan_mask. However,
sbridge_mci_bind_devs() still uses the NUM_CHANNELS for loop.

With the increase in NUM_CHANNELS, this loop fails at index 4 since
SB only has 4 TADs.  This results in the following error on SB machines:

  EDAC sbridge: Some needed devices are missing
  EDAC sbridge: Couldn't find mci handler
  EDAC sbridge: Couldn't find mci handle

This patch adapts the saw_chan_mask logic for sbridge_mci_bind_devs() as
well.

After this patch:

  EDAC MC0: Giving out device to module sbridge_edac.c controller Sandy Bridge Socket#0: DEV 0000:3f:0e.0 (POLLED)
  EDAC MC1: Giving out device to module sbridge_edac.c controller Sandy Bridge Socket#1: DEV 0000:7f:0e.0 (POLLED)

Signed-off-by: Seth Jennings <sjenning@redhat.com>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Tested-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org> # v4.2
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1438798561-10180-1-git-send-email-sjenning@redhat.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-09-24 20:40:50 +02:00
Aravind Gopalakrishnan
58a9c251c9 EDAC, ghes_edac: Remove redundant memory_type array
We already have edac_mem_types[] that enumerates the different kinds of
memory. So, use that and remove the redundant memory_type[] array here.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1442436811-23382-2-git-send-email-Aravind.Gopalakrishnan@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-09-23 16:59:25 +02:00
Borislav Petkov
09bd1b4f81 EDAC, xgene: Convert to debugfs wrappers
Drop CONFIG_EDAC_DEBUG ifdeffery too, while at it.

Tested-by: Loc Ho <lho@apm.com>
Cc: linux-edac@vger.kernel.org
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-09-23 11:37:45 +02:00
Borislav Petkov
52019e406c EDAC, i5100: Convert to debugfs wrappers
This driver creates its debugfs hierarchy under the toplevel debugfs dir
- see i5100_init() - so make it use edac_debugfs_create_dir_at( , NULL)
because we're not breaking userspace. Oh well.

Signed-off-by: Borislav Petkov <bp@suse.de>
2015-09-22 18:21:12 +02:00
Borislav Petkov
bba3b31e44 EDAC, altera: Convert to debugfs wrappers
Use the EDAC-specific wrappers. Drop CONFIG_EDAC_DEBUG ifdeffery.

Cc: Thor Thayer <tthayer@opensource.altera.com>
Cc: <linux-edac@vger.kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-09-22 18:20:56 +02:00
Borislav Petkov
4397bcb4fa EDAC: Add debugfs wrappers
Later patches will convert EDAC users to those.

Signed-off-by: Borislav Petkov <bp@suse.de>
2015-09-22 18:10:22 +02:00
Borislav Petkov
7ac8bf9bc9 EDAC: Carve out debugfs functionality
... into a separate compilation unit and drop a couple of
CONFIG_EDAC_DEBUG ifdefferies. Rename edac_create_debug_nodes() to
edac_create_debugfs_nodes(), while at it.

No functionality change.

Cc: <linux-edac@vger.kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-09-22 12:29:46 +02:00
Linus Torvalds
d9b44fe30f edac updates for v4.3-rc1
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJV8giFAAoJEAhfPr2O5OEVYxsP/imxjTa1utbeYToA+oqut9Yw
 HbMB6yjscfZF/CwP/rB/T5jNTTww6pvavBntw29NewTmPIfREuvYMvvOtCkyKdQf
 yEVeME0cvJAiSdtiqeWoAMJYm3VMDKPh0p/cncvhcpip0ORU3prV9XEqNo1byvEU
 S4laHNfxgDbTHptVhvaT7Li/yN3KyenlaT61K3dYn0LtbPf6uxSsvqX+ExuZSix7
 ZWS+wQrzX35hQHvL7Ax1iZv15uTfrC2kD2bwZrgetvl6wVdwBKx9vbWosaZ+XffV
 gv4dxJG5zNV2nFPfFUxqQZ++OcieuqP1yeWtwbuBPx3qibDKTb/IJYX8VvHAAc8n
 EgaDwFsYH6YnLsxNL58+W9PYigVyc2VS2Nfqz9uRuXmuryoyDbCpcgVmZYdmgYKn
 c1Rc4DzOqvOWrv5jdtSxYmdmAMqgf0LjlMzUmZ7shJYWprsqjgwqxevPXmH9xb7s
 myBpM604KofiWGPGqVNVYIdt8AldXc1QNnyCxGHcAAdIzEOn+7WEdeWqVQ665Vya
 x2Cp9h1IURPtcniw7Dx/0nqBu64Y6OMGz9W6H0SxBUWc2cC6QZZ4k9zOg2krtFkt
 c/S6RCYdIoxIcwTPfhyoBfCAHueSfUaJ8sIAWMrQLFUCgALN2f1WslIYEJVGcPOT
 UeQzSo1pe9wyZu5hQzji
 =DrsG
 -----END PGP SIGNATURE-----

Merge tag 'edac/v4.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac

Pull edac updates from Mauro Carvalho Chehab:
 "Two EDAC fixes for Intel systems (Haswell and Ivy Bridge)"

* tag 'edac/v4.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac:
  sb_edac: correctly fetch DIMM width on Ivy Bridge and Haswell
  sb_edac: look harder for DDRIO on Haswell systems
2015-09-11 16:21:12 -07:00
Aristeu Rozanski
12f0721c5a sb_edac: correctly fetch DIMM width on Ivy Bridge and Haswell
dimm_dev_type has been incorrectly determined in sb_edac. This patch fixes it
for Ivy Bridge and Haswell only since nothing like exists for Sandy Bridge.
We tested this patch in multiple systems matching the results with the
installed memory modules.

Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
2015-09-08 20:33:48 -03:00
Aristeu Rozanski
7179385afe sb_edac: look harder for DDRIO on Haswell systems
In case the memory banks are populated so the first channel isn't used, the
DDRIO PCI device won't be visible and it won't be possible to determine the
memory type.

Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
2015-09-08 20:32:13 -03:00
Linus Torvalds
2d678b68e8 Minor stuff: AMD MCE decoding correction and xgene_edac cleanup.
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJV5aADAAoJEBLB8Bhh3lVKHM8P/RLgWShqZ0CwXz+2v9e8wZxr
 Uwo3xhx0jqUTefBNZqMGgJKq7DexNXWa/40vHydPO4S7yNRka4aftev01OVr2tVG
 3v/ggqxV3SRYkTyhQhjzs39MDeb6+jZlMPz17DmM75iry1og9iT213jPIk6aAt9G
 OoIGgWWT6rtsDa55+Yat/15qc4ajE5mmJnwej/Ybr/B3qI4/Zm3XGVqopbg2axrw
 CtmcK7+3OJ2mvbBTGCoNAL+xtcCRXsJvch8+eo1PhJDjA1lY37lCdZa2TaD0RvsU
 /YFDotPNYf09RJW+f8FvMkj4P3/Upwo0w97e1nhqLRYemmezdSBd+iU3KWHCdWkf
 wj1XFtdpDEDuDzpo2zX5KOA/KgB7ozcRKsm9NzcQHNnbt5UCXmHIWuK3+3ED/RdM
 Pbm4SrLzFX0Fp/08/Vd55CSZ0bP8+bduP5su8yBfX+oUuNY3cvNO2e7zmSRaifON
 CI+hpjfjpaDOmydK4/RbtPvB3YkcykeB7Jk/gyDTD2ZKjxoQw8YMH5+7Xy/cMQYA
 aKnnbE0O14+fnp7JrZJ/ER2FlcSOMiWBhgPJ2wi+0BqcJZhpwvhkEMmClsMtrQss
 u7rX9B5+GqeAQ145X+67CK3VwckyD6ZO6rwP7q6VEiknmkMAO9ML4DFUhCO7YeHx
 8s5Cpy94v2rPext0O2Uw
 =23M/
 -----END PGP SIGNATURE-----

Merge tag 'edac_for_4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp

Pull EDAC fixes from Borislav Petkov:
 "Two minor fixlets this time: AMD MCE decoding correction and
  xgene_edac cleanup"

* tag 'edac_for_4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
  EDAC, mce_amd: Don't emit 'CE' for Deferred error
  EDAC, xgene: Drop owner assignment from platform_driver
2015-09-01 18:34:22 -07:00
Linus Torvalds
3959df1dfb Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RAS updates from Ingo Molnar:
 "MCE handling updates, but also some generic drivers/edac/ changes to
  better organize the Kconfig space"

* 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/ras: Move AMD MCE injector to arch/x86/ras/
  x86/mce: Add a wrapper around mce_log() for injection
  x86/mce: Rename rcu_dereference_check_mce() to mce_log_get_idx_check()
  RAS: Add a menuconfig option with descriptive text
  x86/mce: Reenable CMCI banks when swiching back to interrupt mode
  x86/mce: Clear Local MCE opt-in before kexec
  x86/mce: Remove unused function declarations
  x86/mce: Kill drain_mcelog_buffer()
  x86/mce: Avoid potential deadlock due to printk() in MCE context
  x86/mce: Remove the MCE ring for Action Optional errors
  x86/mce: Don't use percpu workqueues
  x86/mce: Provide a lockless memory pool to save error records
  x86/mce: Reuse one of the u16 padding fields in 'struct mce'
2015-08-31 20:20:30 -07:00
Borislav Petkov
6c36dfe949 x86/ras: Move AMD MCE injector to arch/x86/ras/
This is an x86-specific module and would benefit from being
closer to the arch code. Move it there. Update copyright while
at it.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/1439396985-12812-14-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-13 10:12:54 +02:00
Borislav Petkov
eef4dfa0cb x86/mce: Kill drain_mcelog_buffer()
This used to flush out MCEs logged during early boot and which
were in the MCA registers from a previous system run. No need
for that now, since we've moved to a genpool.

Suggested-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1439396985-12812-7-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-13 10:12:52 +02:00
Chen, Gong
fd4cf79fcc x86/mce: Remove the MCE ring for Action Optional errors
Use unified genpool to save Action Optional error events and put
Action Optional error handling in the same notification chain as
MCE error decoding.

Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
[ Fold in subsequent patch from Boris for early boot logging. ]
Signed-off-by: Tony Luck <tony.luck@intel.com>
[ Correct a lot. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1439396985-12812-5-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-13 10:12:51 +02:00
Michael Walle
5c16179b55 EDAC, ppc4xx: Access mci->csrows array elements properly
The commit

  de3910eb79 ("edac: change the mem allocation scheme to
		 make Documentation/kobject.txt happy")

changed the memory allocation for the csrows member. But ppc4xx_edac was
forgotten in the patch. Fix it.

Signed-off-by: Michael Walle <michael@walle.cc>
Cc: <stable@vger.kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Link: http://lkml.kernel.org/r/1437469253-8611-1-git-send-email-michael@walle.cc
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-08-13 06:02:19 +02:00
Aravind Gopalakrishnan
99e1dfb7d2 EDAC, mce_amd: Don't emit 'CE' for Deferred error
Currently, when decoding an MCE, we display 'CE' for a Deferred error, like
this:

[Hardware Error]: CPU:0 (15:2:0) MC4_STATUS[Over|CE|MiscV|-|AddrV|Deferred|-|UECC]: 0xdc04b00095080813

When the 'UC' bit in the MCx_STATUS register is clear, the error status
is either a Corrected error or Deferred error as determined by the
'Deferred' bit. So do not print 'CE' on a deferred error.

Refer to AMD Error Scope Hierarchy table in a newer BKDG (example:
49125_15h_Models_30h-3Fh_BKDG.pdf, section "RAS Features").

Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1436788382-6463-1-git-send-email-aravind.gopalakrishnan@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-07-14 06:32:53 +02:00
Krzysztof Kozlowski
ca12bb14fe EDAC, xgene: Drop owner assignment from platform_driver
platform_driver does not need to set an owner because
platform_driver_register() will set it.

Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Loc Ho <lho@apm.com>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Link: http://lkml.kernel.org/r/1436507243-11159-1-git-send-email-k.kozlowski@samsung.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-07-10 14:16:24 +02:00
Linus Torvalds
b53343fc6c A build fix for octeon_edac from Aaro Koskinen.
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJVlPv5AAoJEBLB8Bhh3lVKlxIP/35j4O/i4+xgw2CRQvAflJgz
 haUZsguT7QRXqbVmQdAMqyCrz77jnIIFIJEcoqpvpcHSVuNblbjBS6xc7u1UAfmF
 h7M+XIFZuIX7mp6svVJujN+VgP1Obgsx860bG335HIJZZzPfOfs31pisxUddXNpC
 deuP4opLlKcykioFF4GXhoLzDBoc2PLfPGczNRzw9HUisN2F+SK4NEa7iehp8RBX
 UCQTtwsLFuBzi4AHqAfrNKgDiEQ+DZ529sVh5bqAk/U8VDqUulKqIjLh0G75tYm2
 SauVE2woOn4WqUTG8CJBjCTauL+BIc2LTcPh6qIzWxsgZnyeHCve/1QBBOMi/yk2
 FAJYg2VDhPqj0OyUqC7JUXicB3ZKviWYTZ3knv9mIF8m98N8K4kAwGPuUkTc3Z2P
 fBBusvzb+6/OTN2WyQaLx8DjG/+Ha/o0AEeWE6S4A9+urCrJIx+Dkelt3dngdrVe
 kbHyvKmTPy13U1zEo5jmMZkjf7xNfotpckTIwCz3l00Z5AwPPVjC5KylhPvlwM7L
 BpftYJsPRJrIK0zkuU3Yfcrhy8U2QcQAV14IMEW1r5pZwpLH+EfoIHAbh8PWJc5C
 toporRXFc2R7MT1+0DA4faNzfwXXBLbT5BYRzNfTZPClAeSqKE0OPj4Ytv+JWyO5
 kx3HK/kWiSH2SxQ1Sid1
 =7C57
 -----END PGP SIGNATURE-----

Merge tag 'edac_urgent_for_4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp

Pull EDAC fix from Borislav Petkov:
 "A build fix for octeon_edac from Aaro Koskinen"

* tag 'edac_urgent_for_4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
  EDAC, octeon: Fix broken build due to model helper renames
2015-07-03 12:10:12 -07:00
Aaro Koskinen
75a15a7864 EDAC, octeon: Fix broken build due to model helper renames
Commit

  debe6a623d ("MIPS: OCTEON: Update octeon-model.h code for new SoCs.")

renamed some SoC model helper functions, but forgot to update the EDAC
drivers resulting in build failures. Fix that.

Cc: stable@vger.kernel.org # v4.0+
Signed-off-by: Aaro Koskinen <aaro.koskinen@nokia.com>
Acked-by: David Daney <david.daney@cavium.com>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: linux-mips@linux-mips.org
Link: http://lkml.kernel.org/r/1435747132-10954-1-git-send-email-aaro.koskinen@nokia.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-07-02 10:46:28 +02:00
Linus Torvalds
8d7804a2f0 Driver core patches for 4.2-rc1
Here is the driver core / firmware changes for 4.2-rc1.
 
 A number of small changes all over the place in the driver core, and in
 the firmware subsystem.  Nothing really major, full details in the
 shortlog.  Some of it is a bit of churn, given that the platform driver
 probing changes was found to not work well, so they were reverted.
 
 All of these have been in linux-next for a while with no reported
 issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iEYEABECAAYFAlWNoCQACgkQMUfUDdst+ym4JACdFrrXoMt2pb8nl5gMidGyM9/D
 jg8AnRgdW8ArDA/xOarULd/X43eA3J3C
 =Al2B
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updates from Greg KH:
 "Here is the driver core / firmware changes for 4.2-rc1.

  A number of small changes all over the place in the driver core, and
  in the firmware subsystem.  Nothing really major, full details in the
  shortlog.  Some of it is a bit of churn, given that the platform
  driver probing changes was found to not work well, so they were
  reverted.

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'driver-core-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (31 commits)
  Revert "base/platform: Only insert MEM and IO resources"
  Revert "base/platform: Continue on insert_resource() error"
  Revert "of/platform: Use platform_device interface"
  Revert "base/platform: Remove code duplication"
  firmware: add missing kfree for work on async call
  fs: sysfs: don't pass count == 0 to bin file readers
  base:dd - Fix for typo in comment to function driver_deferred_probe_trigger().
  base/platform: Remove code duplication
  of/platform: Use platform_device interface
  base/platform: Continue on insert_resource() error
  base/platform: Only insert MEM and IO resources
  firmware: use const for remaining firmware names
  firmware: fix possible use after free on name on asynchronous request
  firmware: check for file truncation on direct firmware loading
  firmware: fix __getname() missing failure check
  drivers: of/base: move of_init to driver_init
  drivers/base: cacheinfo: fix annoying typo when DT nodes are absent
  sysfs: disambiguate between "error code" and "failure" in comments
  driver-core: fix build for !CONFIG_MODULES
  driver-core: make __device_attach() static
  ...
2015-06-26 15:07:37 -07:00
Linus Torvalds
da996f7310 edac updates for v4.2-rc1
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJVjFrRAAoJEAhfPr2O5OEVD70P/A17My+ABfqme1snLyJuiQsz
 Uuu1JpS4ZcCxEdc4J/nCxexlbwr7b5/2bkSWASIQaOYMOcgM7CkBs5Z7fIDjpgOt
 UMCMCnmVW6A/nMeVKfmcDri0BDzW73Yt8jT8Urtp4yz0k5wRV5fBHHK0JH2l8Zj6
 ND9KPJQqA96cshe4TiLBJj+stZhRJhPG/FeQ5GwTs6j4mn+lyu7bGJ/V5h45iYkG
 6frYP8UJMYR6g1VXl3ohTlyBJfEFdoG5Uz0H5oyqfRuFJtEuXNSCQVy2xWscBUc5
 A4l/brTXXDhHgo6mq7tFDO2BldiIMWHZhLbDvzoAlVOK5s14q30DhSF8q25CP3+t
 uszwV6mihGerr1VjUEm3n1Uus5wuAXsgn75k1dnqbL/3aut0UYkOS9skml3ioWiV
 PQB+Ix3Bjz7k9Hwi4Rd2KZEhwvPAWXBKyRjScq9DYjo3+Er8ClgoJTcADiYXPTRM
 ctR2WY6PLkMhTdePBMJbK51ZljMGKZj0boDFHofZceQWVaDod8dSFn5HFjQkc3nu
 Ty9Qb3LMDPZeensVg5bOlIgXsLoUoo7zlwhdxVRAyc6duSJndKAJoQL6l1RTvXYM
 UQbEdMQZURGzTRyGui+tX9+NBER0n7mJtw9qUKWaxi3GlTGH8ETg/ga0COM0SEkD
 Ax+ZvqYeyJSh4/F2nGFV
 =bjTk
 -----END PGP SIGNATURE-----

Merge tag 'edac/v4.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac

Pull edac updates from Mauro Carvalho Chehab:
 "Some fixes and additions to the EDAC driver used on modern Intel x86
  CPUs.  It includes support for Broadwell EP/EX platforms and fixes for
  motherboards with more than 2 CPU sockets"

* tag 'edac/v4.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac:
  sb_edac: support for Broadwell -EP and -EX
  sb_edac: Fix support for systems with two home agents per socket
  sb_edac: Fix a typo and a thinko in address handling for Haswell
  EDAC: Remove arbitrary limit on number of channels
2015-06-25 18:22:20 -07:00
Borislav Petkov
cda9459da7 EDAC, mce_amd_inj: Set MISCV on injection
When during injection we populate MCi_MISC by writing into misc, we need
to set the MiscV bit in the corresponding MCi_STATUS register which
denotes that there's valid info in the MCi_MISC register.

Signed-off-by: Borislav Petkov <bp@suse.de>
2015-06-24 18:17:38 +02:00
Borislav Petkov
6d1e9bf5b0 EDAC, mce_amd_inj: Move bit preparations before the injection
We do get_online_cpus() and then start noodling with the bits. Do that
*before* we grab the hotplug lock.

Signed-off-by: Borislav Petkov <bp@suse.de>
2015-06-24 18:17:37 +02:00
Borislav Petkov
f2f3dca1b7 EDAC, mce_amd_inj: Cleanup and simplify README
Save us an indentation level, widen to 80 cols, make the text more
succinct and slender. Use i as the bank variable, same as what the
documentation uses.

Signed-off-by: Borislav Petkov <bp@suse.de>
2015-06-24 18:17:34 +02:00
Alan Tull
6f2b6422d4 EDAC, altera: Do not allow suspend when EDAC is enabled
Suspend-to-RAM and EDAC support are mutually exclusive on SOCFPGA. If
EDAC is enabled, it will prevent the platform from going into suspend.

The reason is that the IRQ vectors for OCRAM reside on DDR and in
Suspend-to-RAM mode we're executing out of OCRAM. If an ECC error
occurs, we can't handle it so it was decided to make them mutually
exclusive.

Signed-off-by: Alan Tull <atull@opensource.altera.com>
Signed-off-by: Dinh Nguyen <dinguyen@opensource.altera.com>
Cc: dinh.linux@gmail.com
Cc: dougthompson@xmission.com
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: mchehab@osg.samsung.com
Cc: tthayer@opensource.altera.com
Link: http://lkml.kernel.org/r/1433512155-9906-1-git-send-email-dinguyen@opensource.altera.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-06-24 18:16:12 +02:00
kbuild test robot
de2776787f EDAC, mce_amd_inj: Make inj_type static
It is used there only anyway.

Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Cc: Mauro Carvalho Chehab <m.chehab@samsung.com>
Cc: kbuild-all@01.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20150605112426.GA97073@lkp-sb04
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-06-24 18:16:11 +02:00
Thor Thayer
73bcc942f4 EDAC, altera: Add Arria10 EDAC support
The Arria10 SDRAM and ECC system differs significantly from the
Cyclone5 and Arria5 SoCs. This patch adds support for the Arria10
SoC.
1) IRQ handler needs to support SHARED IRQ
2) Support sberr and dberr address reporting.

Signed-off-by: Thor Thayer <tthayer@opensource.altera.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: devicetree@vger.kernel.org
Cc: dinguyen@opensource.altera.com
Cc: galak@codeaurora.org
Cc: grant.likely@linaro.org
Cc: ijc+devicetree@hellion.org.uk
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: m.chehab@samsung.com
Cc: mark.rutland@arm.com
Cc: pawel.moll@arm.com
Cc: robh+dt@kernel.org
Cc: tthayer.linux@gmail.com
Link: http://lkml.kernel.org/r/1433428128-7292-4-git-send-email-tthayer@opensource.altera.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-06-24 18:16:09 +02:00
Thor Thayer
143f4a5ac5 EDAC, altera: Refactor for Altera CycloneV SoC
The Arria10 SoC uses a completely different SDRAM controller from the
earlier CycloneV and ArriaV SoCs. This patch abstracts the SDRAM bits
for the CycloneV/ArriaV SoCs in preparation for the Arria10 support.

Signed-off-by: Thor Thayer <tthayer@opensource.altera.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: devicetree@vger.kernel.org
Cc: dinguyen@opensource.altera.com
Cc: galak@codeaurora.org
Cc: grant.likely@linaro.org
Cc: ijc+devicetree@hellion.org.uk
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: m.chehab@samsung.com
Cc: mark.rutland@arm.com
Cc: pawel.moll@arm.com
Cc: robh+dt@kernel.org
Cc: tthayer.linux@gmail.com
Link: http://lkml.kernel.org/r/1433428128-7292-3-git-send-email-tthayer@opensource.altera.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-06-24 18:16:08 +02:00
Thor Thayer
f9ae487e04 EDAC, altera: Generalize driver to use DT Memory size
The Arria10 SOC uses a completely different SDRAM controller from the
earlier CycloneV and ArriaV SoCs. The memory size is calculated in the
bootloader and passed via the device tree. Using this device tree size
is more generic than using the register fields to calculate the memory
size for different SDRAM controllers.

Signed-off-by: Thor Thayer <tthayer@opensource.altera.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: devicetree@vger.kernel.org
Cc: dinguyen@opensource.altera.com
Cc: galak@codeaurora.org
Cc: grant.likely@linaro.org
Cc: ijc+devicetree@hellion.org.uk
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: m.chehab@samsung.com
Cc: mark.rutland@arm.com
Cc: pawel.moll@arm.com
Cc: robh+dt@kernel.org
Cc: tthayer.linux@gmail.com
Link: http://lkml.kernel.org/r/1433428128-7292-2-git-send-email-tthayer@opensource.altera.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-06-24 18:16:07 +02:00
Aravind Gopalakrishnan
99e21fea47 EDAC, mce_amd_inj: Add README file
Provide information about each injection file and its usage for ease of
use and in-band documentation. This is a good idea adapted from ftrace.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: mchehab@osg.samsung.com
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/1433277362-10911-7-git-send-email-Aravind.Gopalakrishnan@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-06-24 18:15:51 +02:00
Aravind Gopalakrishnan
4c6034e8e1 EDAC, mce_amd_inj: Add individual permissions field to dfs_node
Add per-file permissions to the dfs_fls[] array.

In a later patch, we will add a README file that needs different
permissions. Hence the move here to add a perm field.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: mchehab@osg.samsung.com
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/1433277362-10911-6-git-send-email-Aravind.Gopalakrishnan@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-06-24 15:17:18 +02:00
Aravind Gopalakrishnan
0451d14d05 EDAC, mce_amd_inj: Modify flags attribute to use string arguments
Use strings such as "hw" or "sw" to indicate the type of error injection
to be performed.

Current flags attribute derives the meanings of values that can be
programmed into it from asm/mce.h. Moving to defined strings for the
attribute allows this module to be self-sufficient and removes the
dependency. Also, we can introduce new flags as and when needed without
having to worry about conflicting with the flags already defined in
asm/mce.h.

Also, modify do_inject() to use the newly defined injection_type enum to
figure out the injection mechanism we need to use

Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: mchehab@osg.samsung.com
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/1433277362-10911-4-git-send-email-Aravind.Gopalakrishnan@amd.com
[ Use strstrip() return value. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-06-03 16:47:51 +02:00
Aravind Gopalakrishnan
685d46d72b EDAC, mce_amd_inj: Read out number of MCE banks from the hardware
The number of banks for a given processor is encoded in
MSR_IA32_MCG_CAP[7:0]. So obtain the value from that MSR and use it for
sanity checking in inj_bank_set() instead of doing a family/model check.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: mchehab@osg.samsung.com
Link: http://lkml.kernel.org/r/1432753418-2985-3-git-send-email-Aravind.Gopalakrishnan@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-06-03 16:18:22 +02:00
Aravind Gopalakrishnan
e7f2ea1dbe EDAC, mce_amd_inj: Use MCE_INJECT_GET macro for bank node too
inj_bank_get() is generic enough that we can use the MCE_INJECT_GET
macro instead.

No functionality change.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: mchehab@osg.samsung.com
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/1433277362-10911-2-git-send-email-Aravind.Gopalakrishnan@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-06-03 16:16:21 +02:00
Tony Luck
fa2ce64f85 sb_edac: support for Broadwell -EP and -EX
Basic support for the single socket Broadwell-DE processor
was added back in commit 1f39581a9a
   sb_edac: Add support for Broadwell-DE processor
This patch extends Broadwell support to cover the two
socket "-EP" and four socket "-EX" versions of Broadwell.
Only tested on the 2 socket - but this code is largely
cloned from the Haswell path.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
2015-06-03 10:10:59 -03:00
Tony Luck
7d375bffa5 sb_edac: Fix support for systems with two home agents per socket
First noticed a problem on a 4 socket machine where EDAC only reported
half the DIMMS.  Tracked this down to the code that assumes that systems
with two home agents only have two memory channels on each agent. This
is true on 2 sockect ("-EP") machines. But four socket ("-EX") machines
have four memory channels on each home agent.

The old code would have had problems on two socket systems as it did
a shuffling trick to make the internals of the code think that the
channels from the first agent were '0' and '1', with the second agent
providing '2' and '3'. But the code didn't uniformly convert from
{ha,channel} tuples to this internal representation.

New code always considers up to eight channels.
On a machine with a single home agent these map easily to edac channels
0, 1, 2, 3. On machines with two home agents we map using:
  edac_channel = 4*ha# + channel
So on a -EP machine where each home agent supports only two channels
we'll fill in channels 0, 1, 4, 5, and on a -EX machine we use all of 0,
1, 2, 3, 4, 5, 6, 7.

[mchehab@osg.samsung.com: fold a fixup patch as per Tony's request and fixed
 a few CodingStyle issues]
Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
2015-06-03 10:10:52 -03:00
Tony Luck
bb89e7141a sb_edac: Fix a typo and a thinko in address handling for Haswell
typo: "a7mode" chooses whether to use bits {8, 7, 9} or {8, 7, 6}
in the algorithm to spread access between memory resources. But
the non-a7mode path was incorrectly using GET_BITFIELD(addr, 7, 9)
and so picking bits {9, 8, 7}

thinko: BIT(1) of the dram_rule registers chooses whether to just
use the {8, 7, 6} (or {8, 7, 9}) bits mentioned above as they are,
or to XOR them with bits {18, 17, 16} but the code inverted the
test. We need the additional XOR when dram_rule{1} == 0.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
2015-06-03 10:10:47 -03:00
Tony Luck
c44696fff0 EDAC: Remove arbitrary limit on number of channels
Currently set to "6", but the reset of the code will dynamically
allocate as needed.  We need to go to "8" today, but drop the check
completely to save doing this again when we need even larger numbers.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
2015-06-03 10:10:22 -03:00
Arnd Bergmann
451bb7fbcc EDAC, xgene: Fix cpuid abuse
The new x-gene EDAC driver incorrectly tried to figure out the version
of one of its IP blocks by looking at the version of the CPU core, which
is only vagely related.

This removes the incorrect code and instead uses the version of the IP
block in the compatible string where it belongs.

Found using build testing on x86, which does not provide the arm64
cpuid interface.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
[ Changed subnode to "apm,xgene-edac-pmd-v2", adjusted check. ]
Signed-off-by: Loc Ho <lho@apm.com>
Cc: devicetree@vger.kernel.org
Cc: dougthompson@xmission.com
Cc: ijc+devicetree@hellion.org.uk
Cc: jcm@redhat.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: mark.rutland@arm.com
Cc: mchehab@osg.samsung.com
Cc: patches@apm.com
Cc: robh+dt@kernel.org
Link: http://lkml.kernel.org/r/3195065.IK73o60xya@wuerfel
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-06-02 19:07:50 +02:00
York Sun
2ce39109a5 EDAC, mpc85xx: Extend error address to 64 bit
Extend err_addr to cover 64 bits for DDR errors.

Signed-off-by: York Sun <yorksun@freescale.com>
Acked-by: Johannes Thumshirn <morbidrsa@gmail.com>
Cc: Mingkai.hu@freescale.com
Link: http://lkml.kernel.org/r/1431425022-44766-2-git-send-email-Wenbin.Song@freescale.com
Signed-off-by: songwenbin <wenbin.song@freescale.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-05-31 12:51:08 +02:00
York Sun
74210267a5 EDAC, mpc8xxx: Adapt for FSL SoC
Remove mpc83xx and mpc85xx as dependency.

Signed-off-by: York Sun <yorksun@freescale.com>
Acked-by: Johannes Thumshirn <morbidrsa@gmail.com>
Cc: Mingkai.hu@freescale.com
Link: http://lkml.kernel.org/r/1431425022-44766-1-git-send-email-Wenbin.Song@freescale.com
Signed-off-by: songwenbin <wenbin.song@freescale.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-05-31 12:50:31 +02:00
Borislav Petkov
cc14c5a808 EDAC, edac_stub: Drop arch-specific include
<asm/edac.h> contains only the arch-specific scrubbing function and is
thus not needed in edac_stub.c. Kill it.

Signed-off-by: Borislav Petkov <bp@suse.de>
2015-05-29 22:01:00 +02:00
Loc Ho
0d4429301c EDAC: Add APM X-Gene SoC EDAC driver
Add support for the APM X-Gene SoC EDAC driver.

Signed-off-by: Loc Ho <lho@apm.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Cc: devicetree@vger.kernel.org
Cc: dougthompson@xmission.com
Cc: ijc+devicetree@hellion.org.uk
Cc: jcm@redhat.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: mark.rutland@arm.com
Cc: mchehab@osg.samsung.com
Cc: patches@apm.com
Cc: robh+dt@kernel.org
Link: http://lkml.kernel.org/r/1432337580-3750-5-git-send-email-lho@apm.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-05-29 11:39:24 +02:00
Borislav Petkov
b01aec9b2c EDAC: Cleanup atomic_scrub mess
So first of all, this atomic_scrub() function's naming is bad. It looks
like an atomic_t helper. Change it to edac_atomic_scrub().

The bigger problem is that this function is arch-specific and every new
arch which doesn't necessarily need that functionality still needs to
define it, otherwise EDAC doesn't compile.

So instead of doing that and including arch-specific headers, have each
arch define an EDAC_ATOMIC_SCRUB symbol which can be used in edac_mc.c
for ifdeffery. Much cleaner.

And we already are doing this with another symbol - EDAC_SUPPORT. This
is also much cleaner than having CONFIG_EDAC enumerate all the arches
which need/have EDAC support and drivers.

This way I can kill the useless edac.h header in tile too.

Acked-by: Ralf Baechle <ralf@linux-mips.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Chris Metcalf <cmetcalf@ezchip.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-edac@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: "Maciej W. Rozycki" <macro@codesourcery.com>
Cc: Markos Chandras <markos.chandras@imgtec.com>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "Steven J. Hill" <Steven.Hill@imgtec.com>
Cc: x86@kernel.org
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-05-28 15:31:53 +02:00
Luis R. Rodriguez
735c0f8f12 amd64_edac: enforce synchronous probe
While testing asynchronous PCI probe on this driver I noticed it failed
because the driver checks if any of the PCI devices have been bound to
the driver after registering it, which obviously does not work if
probing is asynchronous.

While there are patches and discussions on how the driver should behave
are ongoing, let's enforce synchronous probe for this driver for now.

Reviewed-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-05-20 00:25:25 -07:00
Thor Thayer
7e52a03646 EDAC, altera: Do not build it as a module
The SDRAM EDAC requires SDRAM configuration/initialization before SDRAM
is accessed (in the preloader) and therefore before Linux is loaded.
Having a module compile is not desired so force to be built into kernel.

Signed-off-by: Thor Thayer <tthayer@opensource.altera.com>
Cc: mchehab@osg.samsung.com
Cc: Takashi Iwai <tiwai@suse.de>
Link: http://lkml.kernel.org/r/1429308974-26380-3-git-send-email-tthayer@opensource.altera.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-05-03 11:56:52 +02:00
Fabian Frederick
1afaa05515 EDAC: Constify of_device_id array
of_device_id is always used as const. See driver.of_match_table and open
firmware functions.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: Robert Richter <rric@kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: Johannes Thumshirn <johannes.thumshirn@men.de>
Cc: Michal Simek <michal.simek@xilinx.com>
Cc: Sören Brinkmann <soren.brinkmann@xilinx.com>
Cc: linux-edac@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/1426535685-25996-10-git-send-email-fabf@skynet.be
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-03-20 17:50:07 +01:00
Julia Lawall
18b44b2b95 EDAC, i82443bxgx: Don't export static symbol
The semantic patch that fixes this problem is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@r@
type T;
identifier f;
@@

static T f (...) { ... }

@@
identifier r.f;
declarer name EXPORT_SYMBOL_GPL;
@@

-EXPORT_SYMBOL_GPL(f);
// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: Tim Small <tim@buttersideup.com>
Link: http://lkml.kernel.org/r/1426092997-30605-13-git-send-email-Julia.Lawall@lip6.fr
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-03-11 20:39:15 +01:00
Borislav Petkov
2ec591ac74 EDAC, amd64_edac: Get rid of per-node driver instances
... and do the proper thing using EDAC core facilities.

Cc: Daniel J Blueman <daniel@numascale.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-02-23 13:16:01 +01:00
Alexey Khoroshilov
c6b97bcf8e EDAC: Properly unwind on failure path in edac_init()
edac_init() does not deallocate already allocated resources on failure
path.

Found by Linux Driver Verification project (linuxtesting.org).

 [ Boris: The unwind path functions have __exit annotation but are being
   used in an __init function, leading to section mismatches. Drop the
   section annotation and make them normal functions. ]

Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Link: http://lkml.kernel.org/r/1423203162-26368-1-git-send-email-khoroshilov@ispras.ru
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-02-23 13:13:07 +01:00
Takashi Iwai
fc7cc6b782 EDAC: highbank: Use static attribute groups for sysfs entries
... instead of manual device_create_file() and device_remove_file()
calls.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: http://lkml.kernel.org/r/1423046938-18111-9-git-send-email-tiwai@suse.de
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-02-23 13:12:22 +01:00
Takashi Iwai
1bf06a0d55 EDAC: octeon: Use static attribute groups for sysfs entries
... instead of manual device_create_file() and device_remove_file()
calls.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: http://lkml.kernel.org/r/1423046938-18111-8-git-send-email-tiwai@suse.de
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-02-23 13:12:05 +01:00
Takashi Iwai
917c85b545 EDAC: mpc85xx: Use static attribute groups for sysfs entries
... instead of manual device_create_file() and device_remove_file()
calls.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Cc: Johannes Thumshirn <johannes.thumshirn@men.de>
Link: http://lkml.kernel.org/r/1423046938-18111-7-git-send-email-tiwai@suse.de
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-02-23 13:11:40 +01:00
Takashi Iwai
2eace188f6 EDAC: i7core: Use static attribute groups for sysfs entries
... instead of manual device_create_file() and device_remove_file()
calls.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
[ Add NULL terminator to i7core_dev_attrs[] caught by the build robot. ]
Reported-by: Huang Ying <ying.huang@intel.com>
Link: http://lkml.kernel.org/r/1423046938-18111-6-git-send-email-tiwai@suse.de
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-02-23 13:09:39 +01:00
Takashi Iwai
e97d7e3816 EDAC: i7core: Return proper error codes for kzalloc() errors
... instead of possibly uninitialized return value.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: http://lkml.kernel.org/r/1423046938-18111-5-git-send-email-tiwai@suse.de
[ Add a commit message, albeit a small one. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-02-23 13:08:45 +01:00
Takashi Iwai
e339f1ec97 EDAC: amd64: Use static attribute groups
Instead of calling device_create_file() and device_remove_file()
manually, pass the static attribute groups with the new
edac_mc_add_mc_with_groups(). The conditional creation of inject sysfs
files is done by a proper is_visible callback.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: http://lkml.kernel.org/r/1423046938-18111-4-git-send-email-tiwai@suse.de
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-02-23 13:08:09 +01:00
Takashi Iwai
4e8d230de9 EDAC: Allow to pass driver-specific attribute groups
Add edac_mc_add_mc_with_groups() for initializing the mem_ctl_info
object with the optional attribute groups.  This allows drivers to
pass additional sysfs entries without manual (and racy)
device_create_file() and co calls.

edac_mc_add_mc() is kept as is, just calling edac_mc_add_with_groups()
with NULL groups.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: http://lkml.kernel.org/r/1423046938-18111-3-git-send-email-tiwai@suse.de
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-02-23 13:07:41 +01:00
Takashi Iwai
2c1946b6d6 EDAC: Use static attribute groups for managing sysfs entries
Instead of manual calls of device_create_file() and
device_remove_file(), use static attribute groups with proper
is_visible callbacks for managing the sysfs entries.

This simplifies the code a lot and avoids the possible races.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: http://lkml.kernel.org/r/1423046938-18111-2-git-send-email-tiwai@suse.de
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-02-23 13:06:58 +01:00
Markus Elfring
7260194595 EDAC: Delete unnecessary checks before pci_dev_put()
The pci_dev_put() function tests whether its argument is NULL and thus
the test around the call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Link: http://lkml.kernel.org/r/54CFC12C.9010002@users.sourceforge.net
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-02-23 13:06:23 +01:00
Linus Torvalds
477ea11696 * A fix to sb_edac for proper detection on SNB machines
* A fix to amd64_edac to not explode on Numascale machines with more
 than 16 memory controllers, from Daniel J Blueman.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJU5aSqAAoJEBLB8Bhh3lVKGbgP+gOzBaeBlIa9BcdrohiF2mKz
 UyS/2v/RN/OK0F9u/LUJr15jwZex4TPbE7QPoMF6IvsHQR/5Jh5k4fZUaU8/3sY8
 F6ugd7/I4x2mFrYvcJsK5PUm5ZqdYG6dyQKNAInqQw+/sAdL9i9shXz9SUUJXh98
 Qa2zGSEyJVNIYmTi3rcSy8gxwJR8xdwL56iGmt4HEc/asjziSoIxOCwEVLh6OR9e
 vgLDzOntgS56ymbPiNXHfpEE7IqBkJELCFma6RHer4L4R2fTlJRqk1oahS91tx3X
 /m70IVrLu/v6/aDWMdDzSj9oTm6mZD2IpQEAurrp+kFqzLj7EUOlhnXJF1fsyEJY
 OmDO+/Z72iRfgVv0SlT0AsrDkVvXUOfERBhyKZpd4wxUV2/XUYFqhwVPE86+al8p
 wSuwUJrvKQ4bGdRQYEufZBO7JOUTk8K09iFEzREEYbzvEZPc4ZPMUTXAgOA54x6V
 HOhD6NhPR0RJwg9OgqJndgnF0XTcruh6/LuFO2ioyKCR92hjutwuoYyxVI8jfv1W
 rHB09wdNzv4EIIoH/5BeBNIv3Vtc04n5d6MRWbYHmBWAt6Ib+jBXWVNMoWSrAOIQ
 4MNBgxH5sEhTFKnFOt+/cXZktLVr0G79vii5GaodXKgaTnV59Jm4sV06k5vxHxly
 eee2zEWMoT3M4fhYC3St
 =Zypx
 -----END PGP SIGNATURE-----

Merge tag 'edac_fixes_for_3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp

Pull two EDAC fixes from Borislav Petkov:

 - A fix to sb_edac for proper detection on SNB machines

 - A fix to amd64_edac to not explode on Numascale machines with more
   than 16 memory controllers, from Daniel J Blueman.

* tag 'edac_fixes_for_3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
  EDAC, amd64_edac: Prevent OOPS with >16 memory controllers
  sb_edac: Fix detection on SNB machines
2015-02-19 11:18:14 -08:00
Daniel J Blueman
0c510cc83b EDAC, amd64_edac: Prevent OOPS with >16 memory controllers
When DRAM errors occur on memory controllers after EDAC_MAX_MCS (16),
the kernel fatally dereferences unallocated structures, see splat below;
this occurs on at least NumaConnect systems.

Fix by checking if a memory controller info structure was found.

BUG: unable to handle kernel NULL pointer dereference at 0000000000000320
IP: [<ffffffff819f714f>] decode_bus_error+0x2f/0x2b0
PGD 2f8b5a3067 PUD 2f8b5a2067 PMD 0
Oops: 0000 [#2] SMP
Modules linked in:
CPU: 224 PID: 11930 Comm: stream_c.exe.gn Tainted: G   D    3.19.0 #1
Hardware name: Supermicro H8QGL/H8QGL, BIOS 3.5b    01/28/2015
task: ffff8807dbfb8c00 ti: ffff8807dd16c000 task.ti: ffff8807dd16c000
RIP: 0010:[<ffffffff819f714f>] [<ffffffff819f714f>] decode_bus_error+0x2f/0x2b0
RSP: 0000:ffff8907dfc03c48 EFLAGS: 00010297
RAX: 0000000000000001 RBX: 9c67400010080a13 RCX: 0000000000001dc6
RDX: 000000001dc61dc6 RSI: ffff8907dfc03df0 RDI: 000000000000001c
RBP: ffff8907dfc03ce8 R08: 0000000000000000 R09: 0000000000000022
R10: ffff891fffa30380 R11: 00000000001cfc90 R12: 0000000000000008
R13: 0000000000000000 R14: 000000000000001c R15: 00009c6740001000
FS: 00007fa97ee18700(0000) GS:ffff8907dfc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000320 CR3: 0000003f889b8000 CR4: 00000000000407e0
Stack:
 0000000000000000 ffff8907dfc03df0 0000000000000008 9c67400010080a13
 000000000000001c 00009c6740001000 ffff8907dfc03c88 ffffffff810e4f9a
 ffff8907dfc03ce8 ffffffff81b375b9 0000000000000000 0000000000000010
Call Trace:
 <IRQ>
 ? vprintk_default
 ? printk
 amd_decode_mce
 notifier_call_chain
 atomic_notifier_call_chain
 mce_log
 machine_check_poll
 mce_timer_fn
 ? mce_cpu_restart
 call_timer_fn.isra.29
 run_timer_softirq
 __do_softirq
 irq_exit
 smp_apic_timer_interrupt
 apic_timer_interrupt
 <EOI>
 ? down_read_trylock
 __do_page_fault
 ? __schedule
 do_page_fault
 page_fault

Signed-off-by: Daniel J Blueman <daniel@numascale.com>
Link: http://lkml.kernel.org/r/1424144078-24589-1-git-send-email-daniel@numascale.com
Cc: stable@vger.kernel.org
[ Boris: massage commit message ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-02-17 10:32:12 +01:00
Borislav Petkov
11249e7399 sb_edac: Fix detection on SNB machines
d0585cd815 ("sb_edac: Claim a different PCI device") changed the
probing of sb_edac to look for PCI device 0x3ca0:

3f:0e.0 System peripheral: Intel Corporation Xeon E5/Core i7 Processor Home Agent (rev 07)
00: 86 80 a0 3c 00 00 00 00 07 00 80 08 00 00 80 00
...

but we're matching for 0x3ca8, i.e. PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_TA
in sbridge_probe() therefore the probing fails.

Changing it to probe for 0x3ca0 (PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_HA0),
.i.e., the 14.0 device, fixes the issue and driver loads successfully
again:

[ 2449.013120] EDAC DEBUG: sbridge_init:
[ 2449.017029] EDAC sbridge: Seeking for: PCI ID 8086:3ca0
[ 2449.022368] EDAC DEBUG: sbridge_get_onedevice: Detected 8086:3ca0
[ 2449.028498] EDAC sbridge: Seeking for: PCI ID 8086:3ca0
[ 2449.033768] EDAC sbridge: Seeking for: PCI ID 8086:3ca8
[ 2449.039028] EDAC DEBUG: sbridge_get_onedevice: Detected 8086:3ca8
[ 2449.045155] EDAC sbridge: Seeking for: PCI ID 8086:3ca8
...

Add a debug printk while at it to be able to catch the failure in the
future and dump driver version on successful load.

Fixes: d0585cd815 ("sb_edac: Claim a different PCI device")
Cc: stable@vger.kernel.org # 3.18
Acked-by: Aristeu Rozanski <aris@redhat.com>
Cc: Tony Luck <tony.luck@intel.com>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-02-09 16:55:26 +01:00
Dan Carpenter
30263b4052 EDAC, mv64x60_edac: Fix an error code in probe()
If edac_mc_add_mc() fails then we should preserve the error code, but
instead the current code returns success.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: http://lkml.kernel.org/r/20150128191351.GC10259@mwanda
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-01-30 17:00:43 +01:00
Borislav Petkov
f11135d87d EDAC: edac_mc_sysfs: Make stuff static
Fix sparse warnings.

Signed-off-by: Borislav Petkov <bp@suse.de>
2015-01-30 14:49:04 +01:00
Junjie Mao
1bf1950c4e EDAC: Fix the leak of mci->bus->name when bus_register fails
Also use goto labels for all failure paths in
edac_create_sysfs_mci_device and update meaningless labels.

Signed-off-by: Junjie Mao <junjie.mao@hotmail.com>
Link: http://lkml.kernel.org/r/BLU436-SMTP25291B6B612942A212AEBFE95300@phx.gbl
[ Boris: Use ! for 0 checks and add newlines for less crammed code. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-01-30 14:38:45 +01:00
Rickard Strandqvist
a4972b1b9a edac: i5100_edac: Remove unused i5100_recmema_dm_buf_id
Remove the function i5100_recmema_dm_buf_id() that is not used anywhere.

This was partially found by using a static code analysis program called
cppcheck.

Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Link: http://lkml.kernel.org/r/1420999030-21770-1-git-send-email-rickard_strandqvist@spectrumdigital.se
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-01-12 16:00:56 +01:00
Punnaiah Choudary Kalluri
ae9b56e399 EDAC, synps: Add EDAC support for zynq ddr ecc controller
Add EDAC support for ecc errors reporting on the synopsys ddr
controller. The ddr ecc controller corrects single bit errors and
detects double bit errors.

Selected important-ish notes from the changelog:

- I have not taken care of spliting synps_edac_geterror_info function as
it adds additional indentation levels and moreover the existing changes
were made as part of the v2 review comments

- Removed dt binding info as already there is a binding info available
under memorycontroller. so, updated ecc info there.

- Shortened the prefix "sysnopsys" to "synps"

Signed-off-by: Punnaiah Choudary Kalluri <punnaia@xilinx.com>
Link: http://lkml.kernel.org/r/a728a8d4678f4dbf9de189a480297c3d@BY2FFO11FD034.protection.gbl
[ Boris: massage commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-01-07 11:42:05 +01:00
Alexander Kuleshov
775c503f65 mpc85xx_edac: Fix a typo in comments
s/kenel/kernel/g

[ Boris: massage commit message a bit ]

Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com>
Cc: Johannes Thumshirn <johannes.thumshirn@men.de>
Link: http://lkml.kernel.org/r/1419749085-7128-1-git-send-email-kuleshovmail@gmail.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-01-02 14:03:33 +01:00
Wei Yongjun
8c2b117fdf EDAC, mce_amd_inj: Fix sparse non static symbol warning
Fixes the following sparse warnings:

drivers/edac/mce_amd_inj.c:204:3: warning:
 symbol 'dfs_fls' was not declared. Should it be static?

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Link: http://lkml.kernel.org/r/1418087095-14174-1-git-send-email-weiyj_lk@163.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2014-12-21 12:44:58 +01:00
Linus Torvalds
e6b5be2be4 Driver core patches for 3.19-rc1
Here's the set of driver core patches for 3.19-rc1.
 
 They are dominated by the removal of the .owner field in platform
 drivers.  They touch a lot of files, but they are "simple" changes, just
 removing a line in a structure.
 
 Other than that, a few minor driver core and debugfs changes.  There are
 some ath9k patches coming in through this tree that have been acked by
 the wireless maintainers as they relied on the debugfs changes.
 
 Everything has been in linux-next for a while.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iEYEABECAAYFAlSOD20ACgkQMUfUDdst+ylLPACg2QrW1oHhdTMT9WI8jihlHVRM
 53kAoLeteByQ3iVwWurwwseRPiWa8+MI
 =OVRS
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core update from Greg KH:
 "Here's the set of driver core patches for 3.19-rc1.

  They are dominated by the removal of the .owner field in platform
  drivers.  They touch a lot of files, but they are "simple" changes,
  just removing a line in a structure.

  Other than that, a few minor driver core and debugfs changes.  There
  are some ath9k patches coming in through this tree that have been
  acked by the wireless maintainers as they relied on the debugfs
  changes.

  Everything has been in linux-next for a while"

* tag 'driver-core-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (324 commits)
  Revert "ath: ath9k: use debugfs_create_devm_seqfile() helper for seq_file entries"
  fs: debugfs: add forward declaration for struct device type
  firmware class: Deletion of an unnecessary check before the function call "vunmap"
  firmware loader: fix hung task warning dump
  devcoredump: provide a one-way disable function
  device: Add dev_<level>_once variants
  ath: ath9k: use debugfs_create_devm_seqfile() helper for seq_file entries
  ath: use seq_file api for ath9k debugfs files
  debugfs: add helper function to create device related seq_file
  drivers/base: cacheinfo: remove noisy error boot message
  Revert "core: platform: add warning if driver has no owner"
  drivers: base: support cpu cache information interface to userspace via sysfs
  drivers: base: add cpu_device_create to support per-cpu devices
  topology: replace custom attribute macros with standard DEVICE_ATTR*
  cpumask: factor out show_cpumap into separate helper function
  driver core: Fix unbalanced device reference in drivers_probe
  driver core: fix race with userland in device_add()
  sysfs/kernfs: make read requests on pre-alloc files use the buffer.
  sysfs/kernfs: allow attributes to request write buffer be pre-allocated.
  fs: sysfs: return EGBIG on write if offset is larger than file size
  ...
2014-12-14 16:10:09 -08:00
Linus Torvalds
709d9f09b6 edac updates for v3.19-rc1
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJUhyDVAAoJEAhfPr2O5OEVsdMP/it7jKAKu1uHvVoWfwPcB0nG
 QZ7Dt3Hu9aZtaZJs9OFOl2RrGyp4tXOK1SM2QInRGRjjquEP83AASk7XJRkXHr61
 DTiMxHnKXvGfky4ONS76ZP3JWwfBffT5Y8oXIGiarAyUgcK66mnelrlju0r1XM11
 uoUZqOzE5ySfARFcYm9VG53qLG4RxOERqP+EKSyctRE5gCsuymPaSKIwqj7rcg4R
 QikDJQv2H8y2Ui0T9Dk6urdOjlUpBczGRVaXdY+2FyDfs0KPLEWUQVl8q2Lj/bur
 1A10H+p9HdIA9emV2WEetQmJzz7POgn/j+BLkRKImPS5NgKv2Gu5BWiVUKQg4oRt
 shvcpp38PYSmeDIy16v+3RDf0THSSYxhq7ExlGacIzaRk7G2asgRAWixiVS45NzI
 yMb17ZJlHhzVMEyoSlTJH90o4y6UT9njZ7TcK3+9wmhbFyIlGWrrD1Mp56PDCYes
 KO5VRYkwnw8GAd7gH0xwtHRREQYt4VneVyb7Ccw2VxrdIffVp1fDhpxnS+tw+Qri
 294+RoJlnrp4JIpXerAJgQIIiMsRhg64EQ3rsT4Skj6Hfq4KExkmdrHE38p+FZLO
 xHURKAt2jDSuu/DqaWpBclJXE4Q+MhS303pn80x0MHbFmRAYxwmWFjgPwQKrzNU9
 x47JgdYNpBoF2+WK6ZGo
 =CeIU
 -----END PGP SIGNATURE-----

Merge tag 'edac/v3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac

Pull edac updates from Mauro Carvalho Chehab:
 - Broadwell-DE support on sb-edac driver
 - Some fixes at sb-edac driver

* tag 'edac/v3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac:
  sb_edac: Fix typo computing number of banks
  sb_edac: Add support for Broadwell-DE processor
  sb_edac: Fix discovery of top-of-low-memory for Haswell
  sb_edac: Fix erroneous bytes->gigabytes conversion
  sb_edac: Fix off-by-one error in number of channels
2014-12-11 11:58:50 -08:00
Linus Torvalds
c9f861c772 Merge branch 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 RAS update from Ingo Molnar:
 "The biggest change in this cycle is better support for UCNA
  (UnCorrected No Action) events:

    "Handle all uncorrected error reports in the same way (soft
     offline the page). We used to only do that for SRAO
     (software recoverable action optional) machine checks, but
     it makes sense to also do it for UCNA (UnCorrected No
     Action) logs found by CMCI or polling."

  plus various x86 MCE handling updates and fixes"

* 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce: Spell "panicked" correctly
  x86, mce: Support memory error recovery for both UCNA and Deferred error in machine_check_poll
  x86, mce, severity: Extend the the mce_severity mechanism to handle UCNA/DEFERRED error
  x86, MCE, AMD: Assign interrupt handler only when bank supports it
  x86, MCE, AMD: Drop software-defined bank in error thresholding
  x86, MCE, AMD: Move invariant code out from loop body
  x86, MCE, AMD: Correct thresholding error logging
  x86, MCE, AMD: Use macros to compute bank MSRs
  RAS, HWPOISON: Fix wrong error recovery status
  GHES: Make ghes_estatus_caches static
  APEI, GHES: Cleanup unnecessary function for lockless list
2014-12-10 14:20:10 -08:00
Linus Torvalds
0160928e79 EDAC updates all over the place:
* Enablement for AMD F15h models 0x60 CPUs. Most notably DDR4 RAM
 support. Out of tree stuff is
 
  arch/x86/kernel/amd_nb.c       |   2 +
  include/linux/pci_ids.h        |   2 +
 
 adding the required PCI IDs. From Aravind Gopalakrishnan.
 
 * Enable amd64_edac for 32-bit due to popular demand. From Tomasz Pala.
 
 * Convert the AMD MCE injection module to debugfs, where it belongs.
 
 * Misc EDAC cleanups
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJUhZbcAAoJEBLB8Bhh3lVKd5kQAK77qsB4ebpke1rEBerQl9jQ
 YCVrByCKu7QTRt/xvlqU9Vyp7EvcnpNxFbRCCqzIpBcJjre9v17QVRA2/zFS0q81
 QRDTOWf9uhMWbssI2Zu1hbjDNMWYiEb9+aeZHjScVtkzPDmsgYuKGdWfTSLw9dkS
 AG/UUZ3ojwyc2dK3i0W3pCjakKYLUsCmijyTZBfb37+u4rRGuAgrQ9G8fBn2lL+l
 huptgV2BsTCQqdL554zTs64Yt912PnidsJWYCPCjMubgEPSeNcWRzTTBYDUf9NIn
 RxFXYHnOQxetPSQqfLVXlX3V/cGNQg0yEXFZ9S0tCt5uLNbbN/D8Uumtst0rq9x3
 XkJ/EGHXBFP+KwHdV/i9j6OYM5rq4z+4ql4OqbWzsvPrEDbh/4p5gRbhqd1Hhy9U
 zgHJjVPpD/l2t82Tpz0jIOscQruZ6VqGMDSYo3LiLnNt724pcrmr3DiN9mc6ljzJ
 rsNsemMH0IoH8KbBHKGtMLnBVO6HbnrtC6iKFfocNBisvo1PKKzn9s2O1pdjmsCs
 jHwz5njoM7Ki/ygkJhbKiSDMXPs67eggwoGIGzpNMoY4RWxrcQzYE9yKfKzRNxET
 Qb3xUwDWDyL8ErwHtL3xMxGwyfkhb+SZdMd5aKYA5Rdbf+TN8P6iAv05nrnfpkyk
 lPTv5o9EQYvr8/Tc9FZW
 =ULmb
 -----END PGP SIGNATURE-----

Merge tag 'edac_for_3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp

Pull EDAC updates from Borislav Petkov:
 "EDAC updates all over the place:

   - Enablement for AMD F15h models 0x60 CPUs.  Most notably DDR4 RAM
     support.  Out of tree stuff is adding the required PCI IDs.  From
     Aravind Gopalakrishnan.

   - Enable amd64_edac for 32-bit due to popular demand.  From Tomasz
     Pala.

   - Convert the AMD MCE injection module to debugfs, where it belongs.

   - Misc EDAC cleanups"

* tag 'edac_for_3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
  EDAC, MCE, AMD: Correct formatting of decoded text
  EDAC, mce_amd_inj: Add an injector function
  EDAC, mce_amd_inj: Add hw-injection attributes
  EDAC, mce_amd_inj: Enable direct writes to MCE MSRs
  EDAC, mce_amd_inj: Convert mce_amd_inj module to debugfs
  EDAC: Delete unnecessary check before calling pci_dev_put()
  EDAC, pci_sysfs: remove unneccessary ifdef around entire file
  ghes_edac: Use snprintf() to silence a static checker warning
  amd64_edac: Build module on x86-32
  EDAC, MCE, AMD: Add decoding table for MC6 xec
  amd64_edac: Add F15h M60h support
  {mv64x60,ppc4xx}_edac,: Remove deprecated IRQF_DISABLED
  EDAC: Sync memory types and names
  EDAC: Add DDR3 LRDIMM entries to edac_mem_types
  x86, amd_nb: Add device IDs to NB tables for F15h M60h
  pci_ids: Add PCI device IDs for F15h M60h
2014-12-08 20:17:49 -08:00
Tony Luck
fec53af531 sb_edac: Fix typo computing number of banks
Code will always think there are 16 banks because of a typo

Reported-by: Misha
Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
2014-12-02 16:46:33 -02:00
Tony Luck
1f39581a9a sb_edac: Add support for Broadwell-DE processor
Broadwell-DE is the microserver version of next generation Xeon
processors.  A whole bunch of new PCIe device ids, but otherwise
pretty much the same as Haswell.

Acked-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
2014-12-02 16:46:08 -02:00
Tony Luck
f7cf2a22a2 sb_edac: Fix discovery of top-of-low-memory for Haswell
Haswell moved the TOLM/TOHM registers to a different device and offset.
The sb_edac driver accounted for the change of device, but not for the
new offset.  There was also a typo in the constant to fill in the low
26 bits (was 0x1ffffff, should be 0x3ffffff).

This resulted in a bogus value for the top of low memory:

  EDAC DEBUG: get_memory_layout: TOLM: 0.032 GB (0x0000000001ffffff)

which would result in EDAC refusing to translate addresses for
errors above the bogus value and below 4GB:

   sbridge MC3: HANDLING MCE MEMORY ERROR
   sbridge MC3: CPU 0: Machine Check Event: 0 Bank 7: 8c00004000010090
   sbridge MC3: TSC 0
   sbridge MC3: ADDR 2000000
   sbridge MC3: MISC 523eac86
   sbridge MC3: PROCESSOR 0:306f3 TIME 1414600951 SOCKET 0 APIC 0
   MC3: 1 CE Error at TOLM area, on addr 0x02000000 on any memory ( page:0x0 offset:0x0 grain:32 syndrome:0x0)

With the fix we see the correct TOLM value:

   DEBUG: get_memory_layout: TOLM: 2.048 GB (0x000000007fffffff)

and we decode address 2000000 correctly:

   sbridge MC3: HANDLING MCE MEMORY ERROR
   sbridge MC3: CPU 0: Machine Check Event: 0 Bank 7: 8c00004000010090
   sbridge MC3: TSC 0
   sbridge MC3: ADDR 2000000
   sbridge MC3: MISC 523e1086
   sbridge MC3: PROCESSOR 0:306f3 TIME 1414601319 SOCKET 0 APIC 0
   DEBUG: get_memory_error_data: SAD interleave package: 0 = CPU socket 0, HA 0, shiftup: 0
   DEBUG: get_memory_error_data: TAD#0: address 0x0000000002000000 < 0x000000007fffffff, socket interleave 1, channel interleave 4 (offset 0x00000000), index 0, base ch: 0, ch mask: 0x01
   DEBUG: get_memory_error_data: RIR#0, limit: 4.095 GB (0x00000000ffffffff), way: 1
   DEBUG: get_memory_error_data: RIR#0: channel address 0x00200000 < 0xffffffff, RIR interleave 0, index 0
   DEBUG: sbridge_mce_output_error:  area:DRAM err_code:0001:0090 socket:0 channel_mask:1 rank:0
   MC3: 1 CE memory read error on CPU_SrcID#0_Channel#0_DIMM#0 (channel:0 slot:0 page:0x2000 offset:0x0 grain:32 syndrome:0x0 -  area:DRAM err_code:0001:0090 socket:0 channel_mask:1 rank:0)

Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
2014-12-02 12:06:52 -02:00
Jim Snow
8c00910029 sb_edac: Fix erroneous bytes->gigabytes conversion
Signed-off-by: Jim Snow <jim.snow@intel.com>
Signed-off-by: Lukasz Anaczkowski <lukasz.anaczkowski@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
2014-12-02 12:06:51 -02:00
Jim Snow
50043e257c sb_edac: Fix off-by-one error in number of channels
This prevented edac sysfs code from properly handling 6 channels
per memory controller.

Signed-off-by: Jim Snow <jim.snow@intel.com>
Signed-off-by: Lukasz Anaczkowski <lukasz.anaczkowski@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
2014-12-02 12:06:51 -02:00
Borislav Petkov
50872ccd87 EDAC, MCE, AMD: Correct formatting of decoded text
Write out MCx_ADDR into the more humanly readable "MCx Error Address"
and remove double colon in the output.

Cc: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
2014-11-25 13:09:49 +01:00
Borislav Petkov
51756a5034 EDAC, mce_amd_inj: Add an injector function
Selectively inject either a real MCE or a sw-only version which
exercises the decoding path only. The hardware-injected MCE triggers a
machine check exception (#MC) so that the MCE handler can be bothered to
do something too.

Signed-off-by: Borislav Petkov <bp@suse.de>
2014-11-25 13:09:45 +01:00
Borislav Petkov
b18f3864bf EDAC, mce_amd_inj: Add hw-injection attributes
Expose struct mce->inject_flags.

Signed-off-by: Borislav Petkov <bp@suse.de>
2014-11-25 13:09:42 +01:00
Borislav Petkov
21690934d9 EDAC, mce_amd_inj: Enable direct writes to MCE MSRs
Normally, writing those causes a #GP but HWCR[McStatusWrEn] controls
that. Provide a knob.

Signed-off-by: Borislav Petkov <bp@suse.de>
2014-11-25 13:09:37 +01:00
Borislav Petkov
fd19fcd632 EDAC, mce_amd_inj: Convert mce_amd_inj module to debugfs
This module's interface belongs in debugfs, not in sysfs.

Signed-off-by: Borislav Petkov <bp@suse.de>
2014-11-25 13:09:33 +01:00
Chen Yucong
e3480271f5 x86, mce, severity: Extend the the mce_severity mechanism to handle UCNA/DEFERRED error
Until now, the mce_severity mechanism can only identify the severity
of UCNA error as MCE_KEEP_SEVERITY. Meanwhile, it is not able to filter
out DEFERRED error for AMD platform.

This patch extends the mce_severity mechanism for handling
UCNA/DEFERRED error. In order to do this, the patch introduces a new
severity level - MCE_UCNA/DEFERRED_SEVERITY.

In addition, mce_severity is specific to machine check exception,
and it will check MCIP/EIPV/RIPV bits. In order to use mce_severity
mechanism in non-exception context, the patch also introduces a new
argument (is_excp) for mce_severity. `is_excp' is used to explicitly
specify the calling context of mce_severity.

Reviewed-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Signed-off-by: Chen Yucong <slaoub@gmail.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2014-11-19 10:55:43 -08:00
Markus Elfring
0a98babd85 EDAC: Delete unnecessary check before calling pci_dev_put()
The pci_dev_put() function tests whether its argument is NULL and then
returns immediately. Thus the test before the call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Link: http://lkml.kernel.org/r/546CB20D.4070808@users.sourceforge.net
[ Boris: commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2014-11-19 16:33:48 +01:00
Andreas Ruprecht
19ca5a3cc4 EDAC, pci_sysfs: remove unneccessary ifdef around entire file
The file edac_pci_sysfs.c is dependent on CONFIG_PCI. This is already
modelled in the Makefile, but edac_pci_sysfs.o is still contained in
the list of files compiled even without CONFIG_PCI.

This change removes edac_pci_sysfs.o from the list of built objects
when not having CONFIG_PCI enabled and removes the then-unnecessary
ifdef from the source file.

Signed-off-by: Andreas Ruprecht <rupran@einserver.de>
Link: http://lkml.kernel.org/r/1407697803-3837-1-git-send-email-rupran@einserver.de
Signed-off-by: Borislav Petkov <bp@suse.de>
2014-11-11 18:17:57 +01:00
Dan Carpenter
665aa8cdc4 ghes_edac: Use snprintf() to silence a static checker warning
My static checker complains because the "e->location" has up to 256
characters but we are copying it into the "pvt->detail_location" which
only has space for 240 characters.  That's not counting the surrounding
text and the "e->other_detail" string which can be over 80 characters
long.

I am not familiar with this code but presumably it normally works.
Let's add a limit though for safety.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Link: http://lkml.kernel.org/r/20140801082514.GD28869@mwanda
Signed-off-by: Borislav Petkov <bp@suse.de>
2014-11-11 18:08:56 +01:00
Tomasz Pala
f5b10c45ef amd64_edac: Build module on x86-32
By popular demand, enable amd64_edac on 32-bit too.

Boris:
 - update Kconfig text.
 - add a warning on load which states that 32-bit configurations are unsupported.

Signed-off-by: Tomasz Pala <gotar@polanet.pl>
Link: http://lkml.kernel.org/r/20141102102212.GA7034@polanet.pl
Signed-off-by: Borislav Petkov <bp@suse.de>
2014-11-05 15:54:34 +01:00
Aravind Gopalakrishnan
bc4febe93c EDAC, MCE, AMD: Add decoding table for MC6 xec
Extended error code meanings are tabulated for other banks. Extend that
tradition for MC6 too.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Link: http://lkml.kernel.org/r/1415122868-10969-1-git-send-email-aravind.gopalakrishnan@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2014-11-04 18:49:20 +01:00
Greg Kroah-Hartman
a8a93c6f99 Merge branch 'platform/remove_owner' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux into driver-core-next
Remove all .owner fields from platform drivers
2014-11-03 19:53:56 -08:00
Aravind Gopalakrishnan
a597d2a5d9 amd64_edac: Add F15h M60h support
This patch adds support for ECC error decoding for F15h M60h processor.
Aside from the usual changes, the patch adds support for some new features
in the processor:
 - DDR4(unbuffered, registered); LRDIMM DDR3 support
   - relevant debug messages have been modified/added to report these
     memory types
 - new dbam_to_cs mappers
   - if (F15h M60h && LRDIMM); we need a 'multiplier' value to find
     cs_size. This multiplier value is obtained from the per-dimm
     DCSM register. So, change the interface to accept a 'cs_mask_nr'
     value to facilitate this calculation
 - switch-casing determine_memory_type()
   - done to cleanse the function of too many if-else statements
     and improve readability
   - This is now called early in read_mc_regs() to cache dram_type

Misc cleanup:
 - amd64_pci_table[] is condensed by using PCI_VDEVICE macro.

Testing details:
Tested the patch by injecting 'ECC' type errors using mce_amd_inj
and error decoding works fine.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Link: http://lkml.kernel.org/r/1414617483-4941-1-git-send-email-Aravind.Gopalakrishnan@amd.com
[ Boris: determine_memory_type() cleanups ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2014-10-30 13:42:48 +01:00
Jason Baron
8030122a9c e7xxx_edac: Report CE events properly
Fix CE event being reported as HW_EVENT_ERR_UNCORRECTED.

Signed-off-by: Jason Baron <jbaron@akamai.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/e6dd616f2cd51583a7e77af6f639b86313c74144.1413405053.git.jbaron@akamai.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2014-10-22 22:59:00 +02:00
Jason Baron
fa19ac4b92 cpc925_edac: Report UE events properly
Fix UE event being reported as HW_EVENT_ERR_CORRECTED.

Signed-off-by: Jason Baron <jbaron@akamai.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/8beb13803500076fef827eab33d523e355d83759.1413405053.git.jbaron@akamai.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2014-10-22 22:58:45 +02:00
Jason Baron
ab0543de6f i82860_edac: Report CE events properly
Fix CE event being reported as HW_EVENT_ERR_UNCORRECTED.

Signed-off-by: Jason Baron <jbaron@akamai.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/7aee8e244a32ff86b399a8f966c4aae70296aae0.1413405053.git.jbaron@akamai.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2014-10-22 22:58:31 +02:00
Jason Baron
8a3f075d6c i3200_edac: Report CE events properly
Fix CE event being reported as HW_EVENT_ERR_UNCORRECTED.

Signed-off-by: Jason Baron <jbaron@akamai.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/d02465b4f30314b390c12c061502eda5e9d29c52.1413405053.git.jbaron@akamai.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2014-10-22 22:58:13 +02:00
Wolfram Sang
b7382f8349 edac: drop owner assignment from platform_drivers
A platform_driver does not need to set an owner, it will be populated by the
driver core.

Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2014-10-20 16:20:30 +02:00
Michael Opdenacker
5c43cbdf78 {mv64x60,ppc4xx}_edac,: Remove deprecated IRQF_DISABLED
It's a NOOP since 2.6.35.

Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com>
Link: http://lkml.kernel.org/r/1412159043-7348-1-git-send-email-michael.opdenacker@free-electrons.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2014-10-20 14:23:09 +02:00
Borislav Petkov
4cfc3a40f7 EDAC: Sync memory types and names
Make keeping the sync between the mem_types enum and the actual string
names simpler by using designated initializers.

Signed-off-by: Borislav Petkov <bp@suse.de>
2014-10-20 14:22:50 +02:00
Aravind Gopalakrishnan
348fec7021 EDAC: Add DDR3 LRDIMM entries to edac_mem_types
F15hM60h adds support for DDR4 and DDR3 LRDIMMs. Add them here.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Link: http://lkml.kernel.org/r/1411070218-10258-1-git-send-email-Aravind.Gopalakrishnan@amd.com
[ Boris: improve comments. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2014-10-20 14:22:22 +02:00
Linus Torvalds
bf65dea87e edac updates for v3.18-rc1
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJUNsUKAAoJEAhfPr2O5OEVXSMQAJ953vtyqmMEi02NMf24NpA+
 OuTmoe5c8RT7/+YD2edHzG7VgYIE1L4evVOm71XLoPwI3mmGwuAIYZAFvVXsYPtB
 U7xfWHpC5r29DlTQen7L0doD1NnRTQfOWFUtsHnd5ygdrzmIHToXqqN1IjoQudpK
 i8ttU5zshSMt1IPwFh+CXMHkp8wA11hGX4LyonmJ0WD/bCeu8kreilRrK/ehZtAU
 sBjzEif2+xtqAWBaxyZ0IzWBJdYBjo6u68jyifK0liM8oBZ8vov11i6cCZBuGGAy
 eNG6lNBmak77U187yUeeyqjbdhnPy7NPLEvvfDN/C5voGHqPkrKEjH64j54ayeaR
 TQ6u6VlPJ+3RXeqtRfYbMOgHAsxNMpJZLkKx0NNty073RQc2qgX8Xxs4t33+9B37
 TfkoL8fnkh7Mq9x8czRw4n5X4qiw2ZeDDNX4TZPdGP2QQwP3JVDfMOzyr6x+BhMp
 4TwCp+Sr9PeohyGYZ8InjRdxkA3yTrc1CbqSC00lXBe0lt9Pw4aQ9HGFQWSXxYH3
 uZ8PoqBpxy3C5xvcwDQ8kjmu6y0GPtsRHAdb0G3HW3bjMiLuQbVohTxILEvg9HWr
 2tSyKSfUZXJaTckXfWOVelyexkh3IHvQ0AoQFtG6qXQh2HF12S1OV7EoGxTua2Ye
 /XDAjKzdi12tRLlY172L
 =FRmr
 -----END PGP SIGNATURE-----

Merge tag 'edac/v3.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac

Pull edac updates from Mauro Carvalho Chehab:
 "Nothing really exiting here: just one bug fix at sb_edac, and some
  changes to allow other drivers to use some shared PCI addresses"

* tag 'edac/v3.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac:
  sb_edac: Claim a different PCI device
  Move Intel SNB device ids from sb_edac to pci_ids.h
  sb_edac: avoid INTERNAL ERROR message in EDAC with unspecified channel
2014-10-10 22:07:55 -04:00
Linus Torvalds
8b45bc892e ARM: SoC driver updates for 3.18
These are changes for drivers that are intimately tied to some SoC
 and for some reason could not get merged through the respective
 subsystem maintainer tree.
 
 Most of the new code is for the Keystone Navigator driver, which is
 new base support that is going to be needed for their hardware
 accelerated network driver and other units.
 
 Most of the commits are for moving old code around from at91 and omap
 for things that are done in device drivers nowadays.
 
 - at91: move reset, poweroff, memory and clocksource code into drivers
   directories
 - socfpga: add edac driver (through arm-soc, as requested by Boris)
 - omap: move omap-intc code to drivers/irqchip
 - sunxi: added an RTC driver for sun6i
 - omap: mailbox driver related changes
 - keystone: support for the "Navigator" component
 - versatile: new reboot, led and soc drivers
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIVAwUAVDWWQGCrR//JCVInAQKX7Q//bDkoseKCZsGaXN7vfQ2YhT3SAc52mROV
 YQKdNmtMUrHqDgngATZTx5ogOh1hInnqueFjGGhfMYsHQO1Vj8+odj0r+4jhjuUY
 3YfY+qZ+91tq33JlUOhKn+mfVMdxJc8XarGgR6MSWYkqWVYCtLtBluum7hKm2UJ6
 /e4hd2zzImX5ATwj/LXWLx5eTf1qAVFGWzNUph1DrW+1V5lOu58X4gKwk1QOCVEh
 Pa0GV9oRTkjoswwz9drzjeFtie2yofQ2mygj6QKxg5NsosIF0+B8kJ61Sxwg56Ak
 tF+qn1hGtB2cDQkpxK4o2cZgCELhkh5Aqgol/vZUS1DMBSUEGCV9PPp2eOW83r3B
 0zsTgsShyVcTh7khdpQmHNRigvcc7e69LaAGC4o/RxaZpCU/LUNCQ+/iqVExSE8A
 VNEXr+JNxGxhj3m9KUHuEktdWx1oNvaYR8Rr4RPr6EWR8R6emJ04I7kXInvzhJZL
 HOGh75vSuAU83FrsP8fFRLadoHNVDXylAs38BPfGEMngVpjvwJLgQ3+729CwW+Q4
 +xQXAKSwKfr8xA8eg6wBSbFcwnEW4QwRqFqQ5XPw7zTZkCZbiLtvn3JpI5bH5A5Q
 /d2D+M2vFbB7VbWJBM4etO95eNS/pfhqJhcQh4t0DjXjoW6WqLiHCxhEx8Ogfvop
 /4ckyGvtEOI=
 =POJD
 -----END PGP SIGNATURE-----

Merge tag 'drivers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM SoC driver updates from Arnd Bergmann:
 "These are changes for drivers that are intimately tied to some SoC and
  for some reason could not get merged through the respective subsystem
  maintainer tree.

  Most of the new code is for the Keystone Navigator driver, which is
  new base support that is going to be needed for their hardware
  accelerated network driver and other units.

  Most of the commits are for moving old code around from at91 and omap
  for things that are done in device drivers nowadays.

   - at91: move reset, poweroff, memory and clocksource code into
     drivers directories
   - socfpga: add edac driver (through arm-soc, as requested by Boris)
   - omap: move omap-intc code to drivers/irqchip
   - sunxi: added an RTC driver for sun6i
   - omap: mailbox driver related changes
   - keystone: support for the "Navigator" component
   - versatile: new reboot, led and soc drivers"

* tag 'drivers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (92 commits)
  bus: arm-ccn: Fix spurious warning message
  leds: add device tree bindings for register bit LEDs
  soc: add driver for the ARM RealView
  power: reset: driver for the Versatile syscon reboot
  leds: add a driver for syscon-based LEDs
  drivers/soc: ti: fix build break with modules
  MAINTAINERS: Add Keystone Multicore Navigator drivers entry
  soc: ti: add Keystone Navigator DMA support
  Documentation: dt: soc: add Keystone Navigator DMA bindings
  soc: ti: add Keystone Navigator QMSS driver
  Documentation: dt: soc: add Keystone Navigator QMSS bindings
  rtc: sunxi: Depend on platforms sun4i/sun7i that actually have the rtc
  rtc: sun6i: Add sun6i RTC driver
  irqchip: omap-intc: remove unnecessary comments
  irqchip: omap-intc: correct maximum number or MIR registers
  irqchip: omap-intc: enable TURBO idle mode
  irqchip: omap-intc: enable IP protection
  irqchip: omap-intc: remove unnecesary of_address_to_resource() call
  irqchip: omap-intc: comment style cleanup
  irqchip: omap-intc: minor improvement to omap_irq_pending()
  ...
2014-10-08 17:37:16 -04:00
Andy Lutomirski
d0585cd815 sb_edac: Claim a different PCI device
sb_edac controls a large number of different PCI functions.  Rather
than registering as a normal PCI driver for all of them, it
registers for just one so that it gets probed and, at probe time, it
looks for all the others.

Coincidentally, the device it registers for also contains the SMBUS
registers, so the PCI core will refuse to probe both sb_edac and a
future iMC SMBUS driver.  The drivers don't actually conflict, so
just change sb_edac's device table to probe a different device.

An alternative fix would be to merge the two drivers, but sb_edac
will also refuse to load on non-ECC systems, whereas i2c_imc would
still be useful without ECC.

The only user-visible change should be that sb_edac appears to bind
a different device.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Rui Wang <ruiv.wang@gmail.com>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
2014-10-08 17:04:16 -03:00
Andy Lutomirski
68939df1d7 Move Intel SNB device ids from sb_edac to pci_ids.h
The i2c_imc driver will use two of them, and moving only part of
the list seems messier.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
2014-10-08 17:04:16 -03:00
Seth Jennings
351fc4a99d sb_edac: avoid INTERNAL ERROR message in EDAC with unspecified channel
Intel IA32 SDM Table 15-14 defines channel 0xf as 'not specified', but
EDAC doesn't know about this and returns and INTERNAL ERROR when the
channel is greater than NUM_CHANNELS:

kernel: [ 1538.886456] CPU 0: Machine Check Exception: 0 Bank 1: 940000000000009f
kernel: [ 1538.886669] TSC 2bc68b22e7e812 ADDR 46dae7000 MISC 0 PROCESSOR 0:306e4 TIME 1390414572 SOCKET 0 APIC 0
kernel: [ 1538.971948] EDAC MC1: INTERNAL ERROR: channel value is out of range (15 >= 4)
kernel: [ 1538.972203] EDAC MC1: 0 CE memory read error on unknown memory (slot:0 page:0x46dae7 offset:0x0 grain:0 syndrome:0x0 -  area:DRAM err_code:0000:009f socket:1 channel_mask:1 rank:0)

This commit changes sb_edac to forward a channel of -1 to EDAC if the
channel is not specified.  edac_mc_handle_error() sets the channel to -1
internally after the error message anyway, so this commit should have no
effect other than avoiding the INTERNAL ERROR message when the channel
is not specified.

Signed-off-by: Seth Jennings <sjenning@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
2014-10-08 17:04:16 -03:00
Borislav Petkov
a18c3f16a9 mpc85xx_edac: Make L2 interrupt shared too
The other two interrupt handlers in this driver are shared, except this
one. When loading the driver, it fails like this.

So make the IRQ line shared.

Freescale(R) MPC85xx EDAC driver, (C) 2006 Montavista Software
mpc85xx_mc_err_probe: No ECC DIMMs discovered
EDAC DEVICE0: Giving out device to module MPC85xx_edac controller mpc85xx_l2_err: DEV mpc85xx_l2_err (INTERRUPT)
genirq: Flags mismatch irq 16. 00000000 ([EDAC] L2 err) vs. 00000080 ([EDAC] PCI err)
mpc85xx_l2_err_probe: Unable to request irq 16 for MPC85xx L2 err
remove_proc_entry: removing non-empty directory 'irq/16', leaking at least 'aerdrv'
------------[ cut here ]------------
WARNING: at fs/proc/generic.c:521
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.17.0-rc5-dirty #1
task: ee058000 ti: ee046000 task.ti: ee046000
NIP: c016c0c4 LR: c016c0c4 CTR: c037b51c
REGS: ee047c10 TRAP: 0700 Not tainted (3.17.0-rc5-dirty)
MSR: 00029000 <CE,EE,ME> CR: 22008022 XER: 20000000

GPR00: c016c0c4 ee047cc0 ee058000 00000053 00029000 00000000 c037c744 00000003
GPR08: c09aab28 c09aab24 c09aab28 00000156 20008028 00000000 c0002ac8 00000000
GPR16: 00000000 00000000 00000000 00000000 00000000 00000000 00000139 c0950394
GPR24: c09f0000 ee5585b0 ee047d08 c0a10000 ee047d08 ee15f808 00000002 ee03f660
NIP [c016c0c4] remove_proc_entry
LR [c016c0c4] remove_proc_entry
Call Trace:
remove_proc_entry (unreliable)
unregister_irq_proc
free_desc
irq_free_descs
mpc85xx_l2_err_probe
platform_drv_probe
really_probe
__driver_attach
bus_for_each_dev
bus_add_driver
driver_register
mpc85xx_mc_init
do_one_initcall
kernel_init_freeable
kernel_init
ret_from_kernel_thread
Instruction dump: ...

Reported-and-tested-by: <lpb_098@163.com>
Acked-by: Johannes Thumshirn <johannes.thumshirn@men.de>
Cc: stable@vger.kernel.org
Signed-off-by: Borislav Petkov <bp@suse.de>
2014-09-30 12:55:41 +02:00
Aravind Gopalakrishnan
7981a28f1a amd64_edac: Modify usage of amd64_read_dct_pci_cfg()
Rationale behind this change:
 - F2x1xx addresses were stopped from being mapped explicitly to DCT1
   from F15h (OR) onwards. They use _dct[0:1] mechanism to access the
   registers. So we should move away from using address ranges to select
   DCT for these families.
 - On newer processors, the address ranges used to indicate DCT1 (0x140,
   0x1a0) have different meanings than what is assumed currently.

Changes introduced:
 - amd64_read_dct_pci_cfg() now takes in dct value and uses it for
   'selecting the dct'
 - Update usage of the function. Keep in mind that different families
   have specific handling requirements
 - Remove [k8|f10]_read_dct_pci_cfg() as they don't do much different
   from amd64_read_pci_cfg()
   - Move the k8 specific check to amd64_read_pci_cfg
 - Remove f15_read_dct_pci_cfg() and move logic to amd64_read_dct_pci_cfg()
 - Remove now needless .read_dct_pci_cfg

Testing:
 - Tested on Fam 10h; Fam15h Models: 00h, 30h; Fam16h using 'EDAC_DEBUG'
   and mce_amd_inj
 - driver obtains info from F2x registers and caches it in pvt
   structures correctly
 - ECC decoding works fine

Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
Link: http://lkml.kernel.org/r/1410799058-3149-1-git-send-email-aravind.gopalakrishnan@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2014-09-23 13:16:05 +02:00
Pranith Kumar
2d34056d27 ppc4xx_edac: Fix build error caused by wrong member access
Fix the following error

drivers/edac/ppc4xx_edac.c:977:45: error: request for member 'dimm' in something
not a structure or union

by changing member access to pointer dereference.

Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Link: http://lkml.kernel.org/r/1408482646-22541-1-git-send-email-bobby.prani@gmail.com
CC: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
2014-09-15 14:20:56 +02:00
Thor Thayer
71bcada88b edac: altera: Add Altera SDRAM EDAC support
This patch adds support for the CycloneV and ArriaV SDRAM controllers.
Correction and reporting of SBEs, Panic on DBEs.

There was a discussion thread on whether this driver should be an mfd driver
or just make use of syscon, which is already a mfd. Ultimately, the
decision to use a simple syscon interface was reached.[1]

[1] https://lkml.org/lkml/2014/7/30/514

[dinguyen] Fixed Kconfig to have EDAC_ALTERA_MC as a tristate to prevent a
build failure for allmodconfig.

Signed-off-by: Thor Thayer <tthayer@opensource.altera.com>
Acked-by: Borislav Petkov <bp@suse.de>
[dinguyen] cleaned up commit message
Signed-off-by: Dinh Nguyen <dinguyen@opensource.altera.com>
2014-09-04 13:41:46 -05:00
Borislav Petkov
f4ce6eca71 EDAC: Fix mem_types strings type
This one got forgotten during an earlier cleanup.

Signed-off-by: Borislav Petkov <bp@suse.de>
2014-09-02 09:11:16 +02:00
Linus Torvalds
68ffeca4f4 Merge branch 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac
Pull EDAC updates from Mauro Carvalho Chehab.

* 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac:
  sb_edac: add support for Haswell based systems
  sb_edac: Fix mix tab/spaces alignments
  edac: add DDR4 and RDDR4
  sb_edac: remove bogus assumption on mc ordering
  sb_edac: make minimal use of channel_mask
  sb_edac: fix socket detection on Ivy Bridge controllers
  sb_edac: update Kconfig description
  sb_edac: search devices using product id
  sb_edac: make RIR limit retrieval per model
  sb_edac: make node id retrieval per model
  sb_edac: make memory type detection per memory controller
2014-08-15 17:56:45 -06:00
Linus Torvalds
ae36e95cf8 The branch contains the following device tree changes the v3.17 merge
window:
 
 Group changes to the device tree. In preparation for adding device tree
 overlay support, OF_DYNAMIC is reworked so that a set of device tree
 changes can be prepared and applied to the tree all at once. OF_RECONFIG
 notifiers see the most significant change here so that users always get
 a consistent view of the tree. Notifiers generation is moved from before
 a change to after it, and notifiers for a group of changes are emitted
 after the entire block of changes have been applied
 
 Automatic console selection from DT. Console drivers can now use
 of_console_check() to see if the device node is specified as a console
 device. If so then it gets added as a preferred console. UART devices
 get this support automatically when uart_add_one_port() is called.
 
 DT unit tests no longer depend on pre-loaded data in the device tree.
 Data is loaded dynamically at the start of unit tests, and then unloaded
 again when the tests have completed.
 
 Also contains a few bugfixes for reserved regions and early memory setup.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJT6M7XAAoJEMWQL496c2LNTTgP/2rXyrTTGZpK/qrLKHWKYHvr
 XL7tcTkhA0OLU64E37fB+xtDXyBYnLsharuwUFSd1LlL1Wnc1cZN5ORRlMJbmjUR
 Wvwl0A8/mkhGl4tzzKgJ4z4rMJXvlZfJRpnVoRB5FOn90LI7k/jsf5rIwF/6S90B
 6D6II0r4RG9ku1m7g70cToxcIFCzp0V+eu2tym9GnhsyGKlunPT9iNiTpwfVhPAj
 QUvMPKIQXReOv6xDU5q6E07839IMf7SdAvciBTHGnCDD7sGziHvnBIShj/2vTDAF
 27sGRKrWUnnuxRUMOoIudiYyeHXIdt1WXp6FsS/ztVI37Ijh9YPShJwWGhQDppnp
 4tcoSdefqw9IRUajAVWsB0RUW/tCL4a7tggWofylOA6itDi+HZnw6YafE1G1YzxF
 q8OFo9uqLcmFQfHDJpk+sdtXoMZzOgrxlEscsGsQ8kd2Uoe8+chgR9EY1sqPkWVF
 Zw0FJCTB6spBlsnXeboBGrnvpbPkacwhvesIFO0IANy4j4R55xlEeTcs1fe3ubUf
 UhNyyFdnCAUA7e5CAabcAQYsdbEKG6v0Il3H6xts36c8lXCSFXVgNcw5mdCpFCcQ
 DZ3/1FGSVzkYNXX8hlzIn1W0dqvwn4x0ZbnpXUJrGUBpSmjt9qkXx0XbdeFwartU
 7X4tNGS1jfNYhOdP87jF
 =8IFz
 -----END PGP SIGNATURE-----

Merge tag 'devicetree-for-linus' of git://git.secretlab.ca/git/linux

Pull device tree updates from Grant Likely:
 "The branch contains the following device tree changes the v3.17 merge
  window:

  Group changes to the device tree.  In preparation for adding device
  tree overlay support, OF_DYNAMIC is reworked so that a set of device
  tree changes can be prepared and applied to the tree all at once.
  OF_RECONFIG notifiers see the most significant change here so that
  users always get a consistent view of the tree.  Notifiers generation
  is moved from before a change to after it, and notifiers for a group
  of changes are emitted after the entire block of changes have been
  applied

  Automatic console selection from DT.  Console drivers can now use
  of_console_check() to see if the device node is specified as a console
  device.  If so then it gets added as a preferred console.  UART
  devices get this support automatically when uart_add_one_port() is
  called.

  DT unit tests no longer depend on pre-loaded data in the device tree.
  Data is loaded dynamically at the start of unit tests, and then
  unloaded again when the tests have completed.

  Also contains a few bugfixes for reserved regions and early memory
  setup"

* tag 'devicetree-for-linus' of git://git.secretlab.ca/git/linux: (21 commits)
  of: Fixing OF Selftest build error
  drivers: of: add automated assignment of reserved regions to client devices
  of: Use proper types for checking memory overflow
  of: typo fix in __of_prop_dup()
  Adding selftest testdata dynamically into live tree
  of: Add todo tasklist for Devicetree
  of: Transactional DT support.
  of: Reorder device tree changes and notifiers
  of: Move dynamic node fixups out of powerpc and into common code
  of: Make sure attached nodes don't carry along extra children
  of: Make devicetree sysfs update functions consistent.
  of: Create unlocked versions of node and property add/remove functions
  OF: Utility helper functions for dynamic nodes
  of: Move CONFIG_OF_DYNAMIC code into a separate file
  of: rename of_aliases_mutex to just of_mutex
  of/platform: Fix of_platform_device_destroy iteration of devices
  of: Migrate of_find_node_by_name() users to for_each_node_by_name()
  tty: Update hypervisor tty drivers to use core stdout parsing code.
  arm/versatile: Add the uart as the stdout device.
  of: Enable console on serial ports specified by /chosen/stdout-path
  ...
2014-08-14 09:53:39 -06:00
Linus Torvalds
d782cebd6b Merge branch 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RAS updates from Ingo Molnar:
 "The main changes in this cycle are:

   - RAS tracing/events infrastructure, by Gong Chen.

   - Various generalizations of the APEI code to make it available to
     non-x86 architectures, by Tomasz Nowicki"

* 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/ras: Fix build warnings in <linux/aer.h>
  acpi, apei, ghes: Factor out ioremap virtual memory for IRQ and NMI context.
  acpi, apei, ghes: Make NMI error notification to be GHES architecture extension.
  apei, mce: Factor out APEI architecture specific MCE calls.
  RAS, extlog: Adjust init flow
  trace, eMCA: Add a knob to adjust where to save event log
  trace, RAS: Add eMCA trace event interface
  RAS, debugfs: Add debugfs interface for RAS subsystem
  CPER: Adjust code flow of some functions
  x86, MCE: Robustify mcheck_init_device
  trace, AER: Move trace into unified interface
  trace, RAS: Add basic RAS trace event
  x86, MCE: Kill CPU_POST_DEAD
2014-08-04 17:21:59 -07:00
Aravind Gopalakrishnan
eba4bfb34d EDAC, MCE, AMD: Add MCE decoding for F15h M60h
Add decoding logic for new Fam15h model 60h.

Tested using mce_amd_inj module and works fine.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Link: http://lkml.kernel.org/r/1405098795-4678-1-git-send-email-Aravind.Gopalakrishnan@amd.com
[ Boris: simplify a bit. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2014-07-14 16:58:19 +02:00
Jason Baron
78fd4d1242 ie31200_edac: Allocate mci and map mchbar first
Check for memory allocation and mchbar mapping failures before
initializing the dimm info tables needlessly.

Signed-off-by: Jason Baron <jbaron@akamai.com>
Suggested-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/ead8f53e699f1ce21c2e17f3cffb4685d4faf72a.1404939455.git.jbaron@akamai.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2014-07-10 10:55:12 +02:00
Jason Baron
7ee40b897d ie31200_edac: Introduce the driver
Add a driver for the E3-1200 series of Intel DRAM controllers, based on
the following E3-1200 specs:

http://www.intel.com/content/www/us/en/processors/xeon/xeon-e3-1200-family-vol-2-datasheet.html
http://www.intel.com/content/www/us/en/processors/xeon/xeon-e3-1200v3-vol-2-datasheet.html

I've tested this on bad memory hardware, and observed correlating bad
reads and uncorrected memory errors as reported by the driver.

Tested against:

CPU E3-1270 v3 @ 3.50GHz : 8086:0c08 (haswell)
CPU E3-1270 V2 @ 3.50GHz : 8086:0158 (ivy bridge)
CPU E31270 @ 3.40GHz : 8086:0108 (sandy bridge)

Signed-off-by: Jason Baron <jbaron@akamai.com>
Link: http://lkml.kernel.org/r/95c83e80dd40b5377e8bb206285c5d95ac623872.1403818526.git.jbaron@akamai.com
[ Boris: realign defines ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2014-07-04 14:00:26 +02:00
Jason Baron
a21e98ce1e x38_edac: make use of lo_hi_readq()
Convert to the generic API.

Signed-off-by: Jason Baron <jbaron@akamai.com>
Link: http://lkml.kernel.org/r/bb9a4cbb980cc7b51be75cbfcf644553bf6a04cd.1403818526.git.jbaron@akamai.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2014-07-04 13:46:03 +02:00
Aristeu Rozanski
50d1bb9367 sb_edac: add support for Haswell based systems
Haswell memory controllers are very similar to Ivy Bridge and Sandy Bridge
ones. This patch adds support to Haswell based systems.

[m.chehab@samsung.com: Fix CodingStyle issues]
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2014-06-26 15:47:01 -03:00
Mauro Carvalho Chehab
c41afdca29 sb_edac: Fix mix tab/spaces alignments
We should not have spaces before ^I on alignments.

Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2014-06-26 15:47:00 -03:00
Aristeu Rozanski
7b8278358c edac: add DDR4 and RDDR4
Haswell memory controller can make use of DDR4 and Registered DDR4

Cc: tony.luck@intel.com
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2014-06-26 15:46:55 -03:00
Aristeu Rozanski
adc61bcd91 sb_edac: remove bogus assumption on mc ordering
When a MC is handled, the correct sbridge_dev is searched based on the node,
checking again later with the assumption the first memory controller found is
the first socket's memory controller is a bogus assumption. Get rid of it.

Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2014-06-26 15:46:51 -03:00
Aristeu Rozanski
d7c660b7dc sb_edac: make minimal use of channel_mask
channel_mask will be used in the future to determine which group of memory
modules is causing the errors since when mirroring, lockstep and close page
are enabled you can't. While that doesn't happen, use the channel_mask to
determine the channel instead of relying on the MC event/exception.

Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2014-06-26 15:46:46 -03:00
Aristeu Rozanski
2ff3a308b5 sb_edac: fix socket detection on Ivy Bridge controllers
This patch fixes the obvious bug while handling the socket/HA bitmask used in
Ivy Bridge memory controllers.

Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2014-06-26 15:46:42 -03:00
Aristeu Rozanski
66ca727491 sb_edac: update Kconfig description
Kconfig wasn't updated when Ivy Bridge support was added.

Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2014-06-26 15:46:37 -03:00
Aristeu Rozanski
dbc954dddd sb_edac: search devices using product id
This patch changes the way devices are searched by using product id instead of
device/function numbers. Tested in a Sandy Bridge and a Ivy Bridge machine to
make sure everything works properly.

Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2014-06-26 15:46:33 -03:00
Aristeu Rozanski
b976bcf249 sb_edac: make RIR limit retrieval per model
Haswell has a different way to retrieve RIR limits, make this procedure per
model.

Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2014-06-26 15:46:29 -03:00
Aristeu Rozanski
f14d6892e4 sb_edac: make node id retrieval per model
Haswell has a different way to retrieve the node id, make so this procedure
can be reimplemented.

Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2014-06-26 15:46:25 -03:00
Aristeu Rozanski
9e37544615 sb_edac: make memory type detection per memory controller
Haswell has different register, offset to determine memory type and supports
DDR4 in some models. This patch makes it easier to have a different method
depending on the memory controller type.

Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2014-06-26 15:23:46 -03:00
Grant Likely
ccdb8ed3b3 of: Migrate of_find_node_by_name() users to for_each_node_by_name()
There are a bunch of users open coding the for_each_node_by_name() by
calling of_find_node_by_name() directly instead of using the macro. This
is getting in the way of some cleanups, and the possibility of removing
of_find_node_by_name() entirely. Clean it up so that all the users are
consistent.

Signed-off-by: Grant Likely <grant.likely@linaro.org>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Takashi Iwai <tiwai@suse.de>
2014-06-26 17:12:24 +01:00
Fabian Frederick
6866b39056 EDAC, edac_module.c: Remove unnecessary test on unsigned value
unsigned long value is never < 0.

Cc: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Link: http://lkml.kernel.org/r/1402341618-10674-1-git-send-email-fabf@skynet.be
Signed-off-by: Borislav Petkov <bp@suse.de>
2014-06-24 15:13:08 +02:00
Chen, Gong
76ac8275f2 trace, RAS: Add basic RAS trace event
To avoid confuision and conflict of usage for RAS related trace event,
add an unified RAS trace event stub.

Start a RAS subsystem menu which will be fleshed out in time, when more
features get added to it.

Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
Link: http://lkml.kernel.org/r/1402475691-30045-2-git-send-email-gong.chen@linux.intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2014-06-23 10:12:19 -07:00
Linus Torvalds
425553209b PCI changes for the v3.16 merge window:
Enumeration
     - Notify driver before and after device reset (Keith Busch)
     - Use reset notification in NVMe (Keith Busch)
 
   NUMA
     - Warn if we have to guess host bridge node information (Myron Stowe)
     - Work around AMD Fam15h BIOSes that fail to provide _PXM (Suravee Suthikulpanit)
     - Clean up and mark early_root_info_init() as deprecated (Suravee Suthikulpanit)
 
   Driver binding
     - Add "driver_override" for force specific binding (Alex Williamson)
     - Fail "new_id" addition for devices we already know about (Bandan Das)
 
   Resource management
     - Support BAR sizes up to 8GB (Nikhil Rao, Alan Cox)
     - Don't move IORESOURCE_PCI_FIXED resources (Bjorn Helgaas)
     - Mark SBx00 HPET BAR as IORESOURCE_PCI_FIXED (Bjorn Helgaas)
     - Fail safely if we can't handle BARs larger than 4GB (Bjorn Helgaas)
     - Reject BAR above 4GB if dma_addr_t is too small (Bjorn Helgaas)
     - Don't convert BAR address to resource if dma_addr_t is too small (Bjorn Helgaas)
     - Don't set BAR to zero if dma_addr_t is too small (Bjorn Helgaas)
     - Don't print anything while decoding is disabled (Bjorn Helgaas)
     - Don't add disabled subtractive decode bus resources (Bjorn Helgaas)
     - Add resource allocation comments (Bjorn Helgaas)
     - Restrict 64-bit prefetchable bridge windows to 64-bit resources (Yinghai Lu)
     - Assign i82875p_edac PCI resources before adding device (Yinghai Lu)
 
   PCI device hotplug
     - Remove unnecessary "dev->bus" test (Bjorn Helgaas)
     - Use PCI_EXP_SLTCAP_PSN define (Bjorn Helgaas)
     - Fix rphahp endianess issues (Laurent Dufour)
     - Acknowledge spurious "cmd completed" event (Rajat Jain)
     - Allow hotplug service drivers to operate in polling mode (Rajat Jain)
     - Fix cpqphp possible NULL dereference (Rickard Strandqvist)
 
   MSI
     - Replace pci_enable_msi_block() by pci_enable_msi_exact() (Alexander Gordeev)
     - Replace pci_enable_msix() by pci_enable_msix_exact() (Alexander Gordeev)
     - Simplify populate_msi_sysfs() (Jan Beulich)
 
   Virtualization
     - Add Intel Patsburg (X79) root port ACS quirk (Alex Williamson)
     - Mark RTL8110SC INTx masking as broken (Alex Williamson)
 
   Generic host bridge driver
     - Add generic PCI host controller driver (Will Deacon)
 
   Freescale i.MX6
     - Use new clock names (Lucas Stach)
     - Drop old IRQ mapping (Lucas Stach)
     - Remove optional (and unused) IRQs (Lucas Stach)
     - Add support for MSI (Lucas Stach)
     - Fix imx6_add_pcie_port() section mismatch warning (Sachin Kamat)
 
   Renesas R-Car
     - Add gen2 device tree support (Ben Dooks)
     - Use new OF interrupt mapping when possible (Lucas Stach)
     - Add PCIe driver (Phil Edworthy)
     - Add PCIe MSI support (Phil Edworthy)
     - Add PCIe device tree bindings (Phil Edworthy)
 
   Samsung Exynos
     - Remove unnecessary OOM messages (Jingoo Han)
     - Fix add_pcie_port() section mismatch warning (Sachin Kamat)
 
   Synopsys DesignWare
     - Make MSI ISR shared IRQ aware (Lucas Stach)
 
   Miscellaneous
     - Check for broken config space aliasing (Alex Williamson)
     - Update email address (Ben Hutchings)
     - Fix Broadcom CNB20LE unintended sign extension (Bjorn Helgaas)
     - Fix incorrect vgaarb conditional in WARN_ON() (Bjorn Helgaas)
     - Remove unnecessary __ref annotations (Bjorn Helgaas)
     - Add arch/x86/kernel/quirks.c to MAINTAINERS PCI file patterns (Bjorn Helgaas)
     - Fix use of uninitialized MPS value (Bjorn Helgaas)
     - Tidy x86/gart messages (Bjorn Helgaas)
     - Fix return value from pci_user_{read,write}_config_*() (Gavin Shan)
     - Turn pcibios_penalize_isa_irq() into a weak function (Hanjun Guo)
     - Remove unused serial device IDs (Jean Delvare)
     - Use designated initialization in PCI_VDEVICE (Mark Rustad)
     - Fix powerpc NULL dereference in pci_root_buses traversal (Mike Qiu)
     - Configure MPS on ARM (Murali Karicheri)
     - Remove unnecessary includes of <linux/init.h> (Paul Gortmaker)
     - Move Open Firmware devspec attribute to PCI common code (Sebastian Ott)
     - Use pdev->dev.groups for attribute creation on s390 (Sebastian Ott)
     - Remove pcibios_add_platform_entries() (Sebastian Ott)
     - Add new ID for Intel GPU "spurious interrupt" quirk (Thomas Jarosch)
     - Rename pci_is_bridge() to pci_has_subordinate() (Yijing Wang)
     - Add and use new pci_is_bridge() interface (Yijing Wang)
     - Make pci_bus_add_device() void (Yijing Wang)
 
   DMA API
     - Clarify physical/bus address distinction in docs (Bjorn Helgaas)
     - Fix typos in docs (Emilio López)
     - Update dma_pool_create ()and dma_pool_alloc() descriptions (Gioh Kim)
     - Change dma_declare_coherent_memory() CPU address to phys_addr_t (Bjorn Helgaas)
     - Pass GAPSPCI_DMA_BASE CPU & bus address to dma_declare_coherent_memory() (Bjorn Helgaas)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJTjMaQAAoJEFmIoMA60/r8XncQAKX7cD6btXCZnrcYo7inseyp
 3rwOlrsNkWyHqSj/RqqzE1NY6L1h5G2uliI6xg1SKenuHPDcosm5d8FYO0ORKiUs
 xrqBkmZJHXN7fck//tJwsTXiYh5u42RO8QWbvZVr5UqXe40LyaMHMh9Y7VarrU/o
 sM2ADzFKagv1qMQ13nmYxqT+Zl+CqpimyLP+ep6Nfqxi6ils+KJ6b9SKYqrpqE6t
 Mcq2K5ShqU5SaYub1JIXLcQ9XylID+t1M9+cwixcs7a87HbJiktfkGqQvQJoUIuK
 Q5U+abcIGk4vfOnDCctSnoRyrcbTAZ/vqfo0vpX22TokESjwrD8hFOX5HPOFtD+4
 wIDbYurW/8oBhLRaJ0uTPzSH8bXjXTynAwxHZgIrEur5908eECKQ/WiFCxyrovvv
 r4ThAN0FaobllEr0XOFESOzDNSt/ME00WWI7+puAJ/KJkFEtcXt9othLmLmvLz8H
 2GWXrm/aOR0WUO7foGUxI3bXYlDN6NbSKpfuZsLAi2VAyJJ6L6yVSo/fT0X07e3z
 qRy9LOohuiwIKv/I4F2SEq2REfGGsnkrJBoeQi/oBZDcBy1Lsi7P9LWIERhLQEM+
 Hm+30lC/f326nI3hoyThj2k2xxZOQzCIvrt658xP4qd9Zfe1bvCH58FF8K62CoOd
 p8XAf7Sl6v6YUodUrT/t
 =km55
 -----END PGP SIGNATURE-----

Merge tag 'pci-v3.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci into next

Pull PCI changes from Bjorn Helgaas:
 "Enumeration
    - Notify driver before and after device reset (Keith Busch)
    - Use reset notification in NVMe (Keith Busch)

  NUMA
    - Warn if we have to guess host bridge node information (Myron Stowe)
    - Work around AMD Fam15h BIOSes that fail to provide _PXM (Suravee
      Suthikulpanit)
    - Clean up and mark early_root_info_init() as deprecated (Suravee
      Suthikulpanit)

  Driver binding
    - Add "driver_override" for force specific binding (Alex Williamson)
    - Fail "new_id" addition for devices we already know about (Bandan
      Das)

  Resource management
    - Support BAR sizes up to 8GB (Nikhil Rao, Alan Cox)
    - Don't move IORESOURCE_PCI_FIXED resources (Bjorn Helgaas)
    - Mark SBx00 HPET BAR as IORESOURCE_PCI_FIXED (Bjorn Helgaas)
    - Fail safely if we can't handle BARs larger than 4GB (Bjorn Helgaas)
    - Reject BAR above 4GB if dma_addr_t is too small (Bjorn Helgaas)
    - Don't convert BAR address to resource if dma_addr_t is too small
      (Bjorn Helgaas)
    - Don't set BAR to zero if dma_addr_t is too small (Bjorn Helgaas)
    - Don't print anything while decoding is disabled (Bjorn Helgaas)
    - Don't add disabled subtractive decode bus resources (Bjorn Helgaas)
    - Add resource allocation comments (Bjorn Helgaas)
    - Restrict 64-bit prefetchable bridge windows to 64-bit resources
      (Yinghai Lu)
    - Assign i82875p_edac PCI resources before adding device (Yinghai Lu)

  PCI device hotplug
    - Remove unnecessary "dev->bus" test (Bjorn Helgaas)
    - Use PCI_EXP_SLTCAP_PSN define (Bjorn Helgaas)
    - Fix rphahp endianess issues (Laurent Dufour)
    - Acknowledge spurious "cmd completed" event (Rajat Jain)
    - Allow hotplug service drivers to operate in polling mode (Rajat Jain)
    - Fix cpqphp possible NULL dereference (Rickard Strandqvist)

  MSI
    - Replace pci_enable_msi_block() by pci_enable_msi_exact()
      (Alexander Gordeev)
    - Replace pci_enable_msix() by pci_enable_msix_exact() (Alexander Gordeev)
    - Simplify populate_msi_sysfs() (Jan Beulich)

  Virtualization
    - Add Intel Patsburg (X79) root port ACS quirk (Alex Williamson)
    - Mark RTL8110SC INTx masking as broken (Alex Williamson)

  Generic host bridge driver
    - Add generic PCI host controller driver (Will Deacon)

  Freescale i.MX6
    - Use new clock names (Lucas Stach)
    - Drop old IRQ mapping (Lucas Stach)
    - Remove optional (and unused) IRQs (Lucas Stach)
    - Add support for MSI (Lucas Stach)
    - Fix imx6_add_pcie_port() section mismatch warning (Sachin Kamat)

  Renesas R-Car
    - Add gen2 device tree support (Ben Dooks)
    - Use new OF interrupt mapping when possible (Lucas Stach)
    - Add PCIe driver (Phil Edworthy)
    - Add PCIe MSI support (Phil Edworthy)
    - Add PCIe device tree bindings (Phil Edworthy)

  Samsung Exynos
    - Remove unnecessary OOM messages (Jingoo Han)
    - Fix add_pcie_port() section mismatch warning (Sachin Kamat)

  Synopsys DesignWare
    - Make MSI ISR shared IRQ aware (Lucas Stach)

  Miscellaneous
    - Check for broken config space aliasing (Alex Williamson)
    - Update email address (Ben Hutchings)
    - Fix Broadcom CNB20LE unintended sign extension (Bjorn Helgaas)
    - Fix incorrect vgaarb conditional in WARN_ON() (Bjorn Helgaas)
    - Remove unnecessary __ref annotations (Bjorn Helgaas)
    - Add arch/x86/kernel/quirks.c to MAINTAINERS PCI file patterns
      (Bjorn Helgaas)
    - Fix use of uninitialized MPS value (Bjorn Helgaas)
    - Tidy x86/gart messages (Bjorn Helgaas)
    - Fix return value from pci_user_{read,write}_config_*() (Gavin Shan)
    - Turn pcibios_penalize_isa_irq() into a weak function (Hanjun Guo)
    - Remove unused serial device IDs (Jean Delvare)
    - Use designated initialization in PCI_VDEVICE (Mark Rustad)
    - Fix powerpc NULL dereference in pci_root_buses traversal (Mike Qiu)
    - Configure MPS on ARM (Murali Karicheri)
    - Remove unnecessary includes of <linux/init.h> (Paul Gortmaker)
    - Move Open Firmware devspec attribute to PCI common code (Sebastian Ott)
    - Use pdev->dev.groups for attribute creation on s390 (Sebastian Ott)
    - Remove pcibios_add_platform_entries() (Sebastian Ott)
    - Add new ID for Intel GPU "spurious interrupt" quirk (Thomas Jarosch)
    - Rename pci_is_bridge() to pci_has_subordinate() (Yijing Wang)
    - Add and use new pci_is_bridge() interface (Yijing Wang)
    - Make pci_bus_add_device() void (Yijing Wang)

  DMA API
    - Clarify physical/bus address distinction in docs (Bjorn Helgaas)
    - Fix typos in docs (Emilio López)
    - Update dma_pool_create ()and dma_pool_alloc() descriptions (Gioh Kim)
    - Change dma_declare_coherent_memory() CPU address to phys_addr_t
      (Bjorn Helgaas)
    - Pass GAPSPCI_DMA_BASE CPU & bus address to dma_declare_coherent_memory()
      (Bjorn Helgaas)"

* tag 'pci-v3.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (92 commits)
  MAINTAINERS: Add generic PCI host controller driver
  PCI: generic: Add generic PCI host controller driver
  PCI: imx6: Add support for MSI
  PCI: designware: Make MSI ISR shared IRQ aware
  PCI: imx6: Remove optional (and unused) IRQs
  PCI: imx6: Drop old IRQ mapping
  PCI: imx6: Use new clock names
  i82875p_edac: Assign PCI resources before adding device
  ARM/PCI: Call pcie_bus_configure_settings() to set MPS
  PCI: imx6: Fix imx6_add_pcie_port() section mismatch warning
  PCI: Make pci_bus_add_device() void
  PCI: exynos: Fix add_pcie_port() section mismatch warning
  PCI: Introduce new device binding path using pci_dev.driver_override
  PCI: rcar: Add gen2 device tree support
  PCI: cpqphp: Fix possible null pointer dereference
  PCI: rcar: Add R-Car PCIe device tree bindings
  PCI: rcar: Add MSI support for PCIe
  PCI: rcar: Add Renesas R-Car PCIe driver
  PCI: Fix return value from pci_user_{read,write}_config_*()
  PCI: exynos: Remove unnecessary OOM messages
  ...
2014-06-02 12:15:19 -07:00
Bjorn Helgaas
617b4157a5 Merge branches 'pci/host-exynos', 'pci/host-imx6', 'pci/resource' and 'pci/misc' into next
* pci/host-exynos:
  PCI: exynos: Fix add_pcie_port() section mismatch warning

* pci/host-imx6:
  PCI: imx6: Add support for MSI
  PCI: designware: Make MSI ISR shared IRQ aware
  PCI: imx6: Remove optional (and unused) IRQs
  PCI: imx6: Drop old IRQ mapping
  PCI: imx6: Use new clock names
  PCI: imx6: Fix imx6_add_pcie_port() section mismatch warning

* pci/resource:
  i82875p_edac: Assign PCI resources before adding device

* pci/misc:
  ARM/PCI: Call pcie_bus_configure_settings() to set MPS
  PCI: Make pci_bus_add_device() void

Conflicts:
	drivers/edac/i82875p_edac.c
2014-05-30 11:41:17 -06:00
Yinghai Lu
06b00514b7 i82875p_edac: Assign PCI resources before adding device
Assign PCI resources before pci_bus_add_device().  The resources must be
assigned before a driver can claim the device.

[bhelgaas: changelog]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2014-05-30 10:56:46 -06:00
Yijing Wang
c893d133ea PCI: Make pci_bus_add_device() void
pci_bus_add_device() always returns 0, so there's no point in returning
anything at all.  Make it a void function and remove the tests of the
return value from the callers.

[bhelgaas: changelog, remove unused "err" from i82875p_setup_overfl_dev()]
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2014-05-30 09:34:27 -06:00
Loc Ho
aa2064d7dd EDAC: Fix MC scrub mode comparsion bug for correctable errors
The MC structure field scrub_mode is of integer type - not bit field.
Use it accordingly.

Signed-off-by: Loc Ho <lho@apm.com>
Link: http://lkml.kernel.org/r/1399590199-12256-2-git-send-email-lho@apm.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2014-05-09 09:35:04 +02:00
Borislav Petkov
c5c0903b2c EDAC, MCE, AMD: Remove leftover unused mask
295d8cda26 ("EDAC, MCE, AMD: Drop local coreid reporting") removed the
code snippet which used that mask but forgot to drop the mask itself. Do
that now.

Signed-off-by: Borislav Petkov <bp@suse.de>
2014-05-08 20:37:07 +02:00
Linus Torvalds
3c83e61e67 Merge branch 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media updates from Mauro Carvalho Chehab:
 "The main set of series of patches for media subsystem, including:
   - document RC sysfs class
   - added an API to setup scancode to allow waking up systems using the
     Remote Controller
   - add API for SDR devices.  Drivers are still on staging
   - some API improvements for getting EDID data from media
     inputs/outputs
   - new DVB frontend driver for drx-j (ATSC)
   - one driver (it913x/it9137) got removed, in favor of an improvement
     on another driver (af9035)
   - added a skeleton V4L2 PCI driver at documentation
   - added a dual flash driver (lm3646)
   - added a new IR driver (img-ir)
   - added an IR scancode decoder for the Sharp protocol
   - some improvements at the usbtv driver, to allow its core to be
     reused.
   - added a new SDR driver (rtl2832u_sdr)
   - added a new tuner driver (msi001)
   - several improvements at em28xx driver to fix PM support, device
     removal and to split the V4L2 specific bits into a separate
     sub-driver
   - one driver got converted to videobuf2 (s2255drv)
   - the e4000 tuner driver now follows an improved binding model
   - some fixes at V4L2 compat32 code
   - several fixes and enhancements at videobuf2 code
   - some cleanups at V4L2 API documentation
   - usual driver enhancements, new board additions and misc fixups"

[ NOTE! This merge effective drops commit 4329b93b28 ("of: Reduce
  indentation in of_graph_get_next_endpoint").

  The of_graph_get_next_endpoint() function was moved and renamed by
  commit fd9fdb78a9 ("[media] of: move graph helpers from
  drivers/media/v4l2-core to drivers/of").  It was originally called
  v4l2_of_get_next_endpoint() and lived in the file
  drivers/media/v4l2-core/v4l2-of.c.

  In that original location, it was then fixed to support empty port
  nodes by commit b9db140c1e ("[media] v4l: of: Support empty port
  nodes"), and that commit clashes badly with the dropped "Reduce
  intendation" commit.  I had to choose one or the other, and decided
  that the "Support empty port nodes" commit was more important ]

* 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (426 commits)
  [media] em28xx-dvb: fix PCTV 461e tuner I2C binding
  Revert "[media] em28xx-dvb: fix PCTV 461e tuner I2C binding"
  [media] em28xx: fix PCTV 290e LNA oops
  [media] em28xx-dvb: fix PCTV 461e tuner I2C binding
  [media] m88ds3103: fix bug on .set_tone()
  [media] saa7134: fix WARN_ON during resume
  [media] v4l2-dv-timings: add module name, description, license
  [media] videodev2.h: add parenthesis around macro arguments
  [media] saa6752hs: depends on CRC32
  [media] si4713: fix Kconfig dependencies
  [media] Sensoray 2255 uses videobuf2
  [media] adv7180: free an interrupt on failure paths in init_device()
  [media] e4000: make VIDEO_V4L2 dependency optional
  [media] af9033: Don't export functions for the hardware filter
  [media] af9035: use af9033 PID filters
  [media] af9033: implement PID filter
  [media] rtl2832_sdr: do not use dynamic stack allocation
  [media] e4000: fix 32-bit build error
  [media] em28xx-audio: make sure audio is unmuted on open()
  [media] DocBook media: v4l2_format_sdr was renamed to v4l2_sdr_format
  ...
2014-04-04 09:50:07 -07:00
Linus Torvalds
4a4389abdd Merge branch 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac
Pull sb_edac patches from Mauro Carvalho Chehab:
 "A couple sb_edac driver improvements, cleaning a little bit the amount
  of data sent to dmesg, and fixing one error message"

* 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac:
  sb_edac: mark MCE messages as KERN_DEBUG
  sb_edac: use "event" instead of "exception" when MC wasnt signaled
2014-04-03 17:07:17 -07:00
Linus Torvalds
bdfc7cbdee Merge branch 'mips-for-linux-next' of git://git.linux-mips.org/pub/scm/ralf/upstream-sfr
Pull MIPS updates from Ralf Baechle:
 - Support for Imgtec's Aptiv family of MIPS cores.
 - Improved detection of BCM47xx configurations.
 - Fix hiberation for certain configurations.
 - Add support for the Chinese Loongson 3 CPU, a MIPS64 R2 core and
   systems.
 - Detection and support for the MIPS P5600 core.
 - A few more random fixes that didn't make 3.14.
 - Support for the EVA Extended Virtual Addressing
 - Switch Alchemy to the platform PATA driver
 - Complete unification of Alchemy support
 - Allow availability of I/O cache coherency to be runtime detected
 - Improvments to multiprocessing support for Imgtec platforms
 - A few microoptimizations
 - Cleanups of FPU support
 - Paul Gortmaker's fixes for the init stuff
 - Support for seccomp

* 'mips-for-linux-next' of git://git.linux-mips.org/pub/scm/ralf/upstream-sfr: (165 commits)
  MIPS: CPC: Use __raw_ memory access functions
  MIPS: CM: use __raw_ memory access functions
  MIPS: Fix warning when including smp-ops.h with CONFIG_SMP=n
  MIPS: Malta: GIC IPIs may be used without MT
  MIPS: smp-mt: Use common GIC IPI implementation
  MIPS: smp-cmp: Remove incorrect core number probe
  MIPS: Fix gigaton of warning building with microMIPS.
  MIPS: Fix core number detection for MT cores
  MIPS: MT: core_nvpes function to retrieve VPE count
  MIPS: Provide empty mips_mt_set_cpuoptions when CONFIG_MIPS_MT=n
  MIPS: Lasat: Replace del_timer by del_timer_sync
  MIPS: Malta: Setup PM I/O region on boot
  MIPS: Loongson: Add a Loongson-3 default config file
  MIPS: Loongson 3: Add CPU hotplug support
  MIPS: Loongson 3: Add Loongson-3 SMP support
  MIPS: Loongson: Add Loongson-3 Kconfig options
  MIPS: Loongson: Add swiotlb to support All-Memory DMA
  MIPS: Loongson 3: Add serial port support
  MIPS: Loongson 3: Add IRQ init and dispatch support
  MIPS: Loongson 3: Add HT-linked PCI support
  ...
2014-04-02 13:40:50 -07:00
Linus Torvalds
62ff577fa2 A bunch of EDAC updates all over the place:
* Support for new AMD models, along with more graceful fallback for
 unsupported hw.
 
 * Bunch of fixes from SUSE accumulated from bug reports
 
 * Misc other fixes and cleanups
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJTMUueAAoJEBLB8Bhh3lVKbGIP/iXBanEUUXbg027hvNch8tbJ
 Yzsk3VkhmZ6hWXU4XNkvFFbSfCfHuHYQSEAxro6yc2t+6aVhMBrKIcCBNejNYt37
 louswtfhC9A4geX206bzfOFNxOVsWzX2vxhhQFbqybZe6BnmcnCSKn//v+kF0/5E
 6CrwQyKQjl2vGewPU4pBlrbnRNduPRs6bg6lWgzNxbKrhD8Ce/7NQRVd5v698pNm
 NTqfS37J90o1DP3I6yY30trtprPjjfnFANxb2DPaf09Z/X+aI1ONNe4s/URBd+qg
 yYIWAphbKopXPZOS0wDU0ikb7dGWH1lcQdVE69GxnCEvz9hz2ISg3plguwbYLPzK
 CmI4RGU0D8n5Irxu8bUID4SLT6fSAqU+cgJCdu6dqUncB7khcwG8dWJxMYNGjlBm
 TD9aCRRs+/b07KUrBeL1tufgd9gGPKWDeecTw55VMofIFkYvwIAz8TFdj3ql4Uz/
 AsYJkAqvoO4dTz1ai+PJ06EWSijUT+KLkyZbybS5+PiP4uRAWY8pkCKbvTJQb1xN
 0SAwm0tqT7mQKc/oIsvpZ/TLjwYxRhAeVMvrkElkNRPSsrt70rDVQiPgzQh9GTqV
 d5GmYI65Tk2Jrl7dVN+Pk+NDdS/y52SPk1AMDsfoBP6dndftdF6/gtqnijmbLgMY
 /yH+hIAKJXJUUKO4pELe
 =vafk
 -----END PGP SIGNATURE-----

Merge tag 'edac_for_3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp

Pull EDAC updates from Borislav Petkov:
 "A bunch of EDAC updates all over the place:

   - Support for new AMD models, along with more graceful fallback for
     unsupported hw.

   - Bunch of fixes from SUSE accumulated from bug reports

   - Misc other fixes and cleanups"

* tag 'edac_for_3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
  amd64_edac: Add support for newer F16h models
  i7core_edac: Drop unused variable
  i82875p_edac: Drop redundant call to pci_get_device()
  amd8111_edac: Fix leaks in probe error paths
  e752x_edac: Drop pvt->bridge_ck
  MCE, AMD: Fix decoding module loading on unsupported hw
  i5100_edac: Remove an unneeded condition in i5100_init_csrows()
  sb_edac: Degrade log level for device registration
  amd64_edac: Fix logic to determine channel for F15 M30h processors
  edac/85xx: Remove deprecated IRQF_DISABLED
  i3200_edac: Add a missing pci_disable_device() on the exit path
  i5400_edac: Disable device when unloading module
  e752x_edac: Simplify call to pci_get_device()
2014-04-01 13:54:00 -07:00
Daniel Walker
1bc021e815 EDAC: Octeon: Add error injection support
This adds an ad-hoc error injection method. Octeon II doesn't have
hardware support for injection, so this simulates it.

Signed-off-by: Daniel Walker <dwalker@fifo99.com>
Cc: David Daney <david.daney@cavium.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: linux-edac@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/5873/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2014-03-31 18:17:12 +02:00
Daniel Walker
5331de0570 EDAC: Octeon: Fix lack of opstate_init
If the opstate_init() isn't called the driver won't start properly.

I just added it in what appears to be an appropriate place.

Signed-off-by: Daniel Walker <dwalker@fifo99.com>
Cc: David Daney <david.daney@cavium.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: linux-edac@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/5872/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2014-03-31 18:17:12 +02:00
Aristeu Rozanski
49856dc973 sb_edac: mark MCE messages as KERN_DEBUG
Since the driver is decoding the MCE, it's useless to have these
messages printed unless you're debugging a problem in the driver.

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2014-03-13 09:52:09 -03:00
Aristeu Rozanski
cf40f80cbe sb_edac: use "event" instead of "exception" when MC wasnt signaled
Corrected Errors are MC events, not exceptions and reporting as the
later might confuse users.

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2014-03-13 09:52:01 -03:00
Mauro Carvalho Chehab
c897df0e2d Linux 3.14-rc5
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJTE+9XAAoJEHm+PkMAQRiGrMQIAKI2V49Kj8WlnwGchFvsbGJB
 SLALwNi33T/IBKdZRhrfryBu02Zj7eVvZ2ML35dJEnmF88O+dJBDMTkKV1xalrip
 mtkBrjUnfAI04fq/daLQ1TsAy4qqlra5tSTuDCw8ILOnGPwT0VydIEHNdtmoUIfw
 xlZLxHzny1MslZ78d7uR/cUnV9ylKRRajWzfw1HT8hL51fCt8nRWY0sCvwvl+kMJ
 LsK+6I7mHDUuzA7QBmBI+dhzQgos5+JkkrnpmqHAqwmIh+AI3ksmjUCQ4dM7owrO
 IvEx+ZNDqxAdLcm1WAxATNfxddFXHc62JTvKuuKqTVWuaxVfK1Aqt8MjDMIPeAQ=
 =yV5u
 -----END PGP SIGNATURE-----

Merge tag 'v3.14-rc5' into patchwork

Linux 3.14-rc5

* tag 'v3.14-rc5': (1117 commits)
  Linux 3.14-rc5
  drm/vmwgfx: avoid null pointer dereference at failure paths
  drm/vmwgfx: Make sure backing mobs are cleared when allocated. Update driver date.
  drm/vmwgfx: Remove some unused surface formats
  MAINTAINERS: add maintainer entry for Armada DRM driver
  arm64: Fix !CONFIG_SMP kernel build
  arm64: mm: Add double logical invert to pte accessors
  dm cache: fix truncation bug when mapping I/O to >2TB fast device
  perf tools: Fix strict alias issue for find_first_bit
  powerpc/powernv: Fix indirect XSCOM unmangling
  powerpc/powernv: Fix opal_xscom_{read,write} prototype
  powerpc/powernv: Refactor PHB diag-data dump
  powerpc/powernv: Dump PHB diag-data immediately
  powerpc: Increase stack redzone for 64-bit userspace to 512 bytes
  powerpc/ftrace: bugfix for test_24bit_addr
  powerpc/crashdump : Fix page frame number check in copy_oldmem_page
  powerpc/le: Ensure that the 'stop-self' RTAS token is handled correctly
  kvm, vmx: Really fix lazy FPU on nested guest
  perf tools: fix BFD detection on opensuse
  drm/radeon: enable speaker allocation setup on dce3.2
  ...
2014-03-11 06:55:49 -03:00
Aravind Gopalakrishnan
85a8885bd0 amd64_edac: Add support for newer F16h models
Extend ECC decoding support for F16h M30h. Tested on F16h M30h with ECC
turned on using mce_amd_inj module and the patch works fine.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Link: http://lkml.kernel.org/r/1392913726-16961-1-git-send-email-Aravind.Gopalakrishnan@amd.com
Tested-by: Arindam Nath <Arindam.Nath@amd.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
2014-02-27 18:03:16 +01:00
Jean Delvare
f118920baf i7core_edac: Drop unused variable
Fix the following warning:

drivers/edac/i7core_edac.c: In function "core_mce_output_error":
drivers/edac/i7core_edac.c:1711:8: warning: variable "type" set but not used [-Wunused-but-set-variable]
  char *type, *optype, *err;
        ^
According to Mauro, type can just be dropped, as tp_event now maps if
the error is corrected, uncorrected non-fatal or uncorrected fatal
one.

Signed-off-by: Jean Delvare <jdelvare@suse.de>
Link: http://lkml.kernel.org/r/20140224171358.692d7e5a@endymion.delvare
Acked-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
2014-02-25 10:22:34 +01:00
Jean Delvare
5fdae41900 i82875p_edac: Drop redundant call to pci_get_device()
The first call to pci_get_device() in i82875p_probe1() is useless. The
result is immediately reset at the beginning of
i82875p_setup_overfl_dev(), which then issues the same
pci_get_device() call. Dropping the redundant call avoids a device
reference leak.

Signed-off-by: Jean Delvare <jdelvare@suse.de>
Link: http://lkml.kernel.org/r/20140224160316.60e55fb6@endymion.delvare
Cc: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
2014-02-25 10:13:46 +01:00
Jean Delvare
5576fb8696 amd8111_edac: Fix leaks in probe error paths
Both probe error paths are incomplete and leak memory and device
references. Add the missing cleanups.

Signed-off-by: Jean Delvare <jdelvare@suse.de>
Link: http://lkml.kernel.org/r/20140224154534.5a3b797a@endymion.delvare
Cc: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
2014-02-25 10:09:09 +01:00
Jean Delvare
0e089c1828 e752x_edac: Drop pvt->bridge_ck
pvt->bridge_ck always points to the same device as pvt->dev_d0f1, so
get rid of the former and only use the latter.

Signed-off-by: Jean Delvare <jdelvare@suse.de>
Tested-by: Aristeu Rozanski <aris@redhat.com>
Link: http://lkml.kernel.org/r/20140224151544.16ba28a0@endymion.delvare
Cc: Mark Gross <mark.gross@intel.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
2014-02-25 10:01:30 +01:00
Jean Delvare
75135da0d6 i7300_edac: Fix device reference count
pci_get_device() decrements the reference count of "from" (last
argument) so when we break off the loop successfully we have only one
device reference - and we don't know which device we have. If we want
a reference to each device, we must take them explicitly and let
the pci_get_device() walk complete to avoid duplicate references.

This is serious, as over-putting device references will cause
the device to eventually disappear. Without this fix, the kernel
crashes after a few insmod/rmmod cycles.

Tested on an Intel S7000FC4UR system with a 7300 chipset.

Signed-off-by: Jean Delvare <jdelvare@suse.de>
Link: http://lkml.kernel.org/r/20140224111656.09bbb7ed@endymion.delvare
Cc: Mauro Carvalho Chehab <m.chehab@samsung.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: stable@vger.kernel.org
Signed-off-by: Borislav Petkov <bp@suse.de>
2014-02-25 09:43:13 +01:00
Jean Delvare
c0f5eeed0f i7core_edac: Fix PCI device reference count
The reference count changes done by pci_get_device can be a little
misleading when the usage diverges from the most common scheme. The
reference count of the device passed as the last parameter is always
decreased, even if the function returns no new device. So if we are
going to try alternative device IDs, we must manually increment the
device reference count before each retry. If we don't, we end up
decreasing the reference count, and after a few modprobe/rmmod cycles
the PCI devices will vanish.

In other words and as Alan put it: without this fix the EDAC code
corrupts the PCI device list.

This fixes kernel bug #50491:
https://bugzilla.kernel.org/show_bug.cgi?id=50491

Signed-off-by: Jean Delvare <jdelvare@suse.de>
Link: http://lkml.kernel.org/r/20140224093927.7659dd9d@endymion.delvare
Reviewed-by: Alan Cox <alan@linux.intel.com>
Cc: Mauro Carvalho Chehab <m.chehab@samsung.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: stable@vger.kernel.org
Signed-off-by: Borislav Petkov <bp@suse.de>
2014-02-25 08:54:45 +01:00
Borislav Petkov
fd0f5ffff8 MCE, AMD: Fix decoding module loading on unsupported hw
We want to still be able to issue some error information on systems for
which there is no decoding support (think older distro kernels here,
for example). Therefore, we allow module registration but skip the
per-family bank-specific decoders and issue the general information
only, i.e.:

[   46.822828] [Hardware Error]: Error Status: Uncorrected, software containable error.
[   46.822846] [Hardware Error]: CPU:0 (15:30:0) MC0_STATUS[-|UE|-|-|-|-|-]: 0xa000000000010f0f
[   46.822858] [Hardware Error]: cache level: L3/GEN, mem/io: GEN, mem-tx: GEN, part-proc: GEN (timed out)

with the hope that it still contains helpful useful bits.

Suggested-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
Tested-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
Link: http://lkml.kernel.org/r/1392659391-2411-1-git-send-email-Aravind.Gopalakrishnan@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2014-02-24 10:25:47 +01:00
Dan Carpenter
9d6c7cbe30 i5100_edac: Remove an unneeded condition in i5100_init_csrows()
We checked that "npages" was not zero a couple lines earlier so we can
remove this and pull the code in one indent level.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: http://lkml.kernel.org/r/20140213111738.GB15549@elgon.mountain
Signed-off-by: Borislav Petkov <bp@suse.de>
2014-02-20 11:52:58 +01:00
Jiang Liu
ec5a0b3824 sb_edac: Degrade log level for device registration
On a system with four Intel processors, it generates too many messages
"EDAC sbridge: Seeking for: dev 1d.3 PCI ID xxxx". And it doesn't give
many useful information for normal users, so change log level from INFO
to DEBUG.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Link: http://lkml.kernel.org/r/1392613824-11230-1-git-send-email-jiang.liu@linux.intel.com
Acked-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
2014-02-20 11:42:36 +01:00
Borislav Petkov
cb6ef42e51 EDAC: Correct workqueue setup path
We're using edac_mc_workq_setup() both on the init path, when
we load an edac driver and when we change the polling period
(edac_mc_reset_delay_period) through /sys/.../edac_mc_poll_msec.

On that second path we don't need to init the workqueue which has been
initialized already.

Thanks to Tejun for workqueue insights.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1391457913-881-1-git-send-email-prarit@redhat.com
Cc: <stable@vger.kernel.org>
2014-02-14 10:40:47 +01:00
Borislav Petkov
9da21b1509 EDAC: Poll timeout cannot be zero, p2
Sanitize code even more to accept unsigned longs only and to not allow
polling intervals below 1 second as this is unnecessary and doesn't make
much sense anyway for polling errors.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1391457913-881-1-git-send-email-prarit@redhat.com
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: <stable@vger.kernel.org>
2014-02-14 10:40:29 +01:00
Prarit Bhargava
79040cad3f drivers/edac/edac_mc_sysfs.c: poll timeout cannot be zero
If you do

  echo 0 > /sys/module/edac_core/parameters/edac_mc_poll_msec

the following stack trace is output because the edac module is not
designed to poll with a timeout of zero.

  WARNING: CPU: 12 PID: 0 at lib/list_debug.c:33 __list_add+0xac/0xc0()
  list_add corruption. prev->next should be next (ffff8808291dd1b8), but was           (null). (prev=ffff8808286fe3f8).
  Modules linked in: sg nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache cfg80211 rfkill x86_pkg_temp_thermal coretemp kvm_intel kvm ixgbe e1000e crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt ptp sb_edac iTCO_vendor_support pps_core mdio ipmi_devintf edac_core ioatdma microcode shpchp lpc_ich pcspkr i2c_i801 dca mfd_core ipmi_si wmi ipmi_msghandler nfsd auth_rpcgss nfs_acl lockd sunrpc xfs libcrc32c sd_mod sr_mod cdrom crc_t10dif crct10dif_common mgag200 syscopyarea sysfillrect sysimgblt isci i2c_algo_bit drm_kms_helper ttm drm libsas ahci libahci scsi_transport_sas libata i2c_core dm_mirror dm_region_hash dm_log dm_mod
  CPU: 12 PID: 0 Comm: swapper/12 Not tainted 3.13.0+ #1
  Hardware name: Intel Corporation LH Pass ........../SVRBD-ROW_T, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
  Call Trace:
   <IRQ>
    __list_add+0xac/0xc0
    __internal_add_timer+0xab/0x130
    internal_add_timer+0x17/0x40
    mod_timer_pinned+0xca/0x170
    intel_pstate_timer_func+0x28a/0x380
    call_timer_fn+0x36/0x100
    run_timer_softirq+0x1ff/0x2f0
    __do_softirq+0xf5/0x2e0
    irq_exit+0x10d/0x120
    smp_apic_timer_interrupt+0x45/0x60
    apic_timer_interrupt+0x6d/0x80
   <EOI>
    cpuidle_idle_call+0xb9/0x1f0
    arch_cpu_idle+0xe/0x30
    cpu_startup_entry+0x9e/0x240
    start_secondary+0x1e4/0x290

  kernel BUG at kernel/timer.c:1084!
  invalid opcode: 0000 [#1] SMP
  Modules linked in: sg nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache cfg80211 rfkill x86_pkg_temp_thermal coretemp kvm_intel kvm ixgbe e1000e crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt ptp sb_edac iTCO_vendor_support pps_core mdio ipmi_devintf edac_core ioatdma microcode shpchp lpc_ich pcspkr i2c_i801 dca mfd_core ipmi_si wmi ipmi_msghandler nfsd auth_rpcgss nfs_acl lockd sunrpc xfs libcrc32c sd_mod sr_mod cdrom crc_t10dif crct10dif_common mgag200 syscopyarea sysfillrect sysimgblt isci i2c_algo_bit drm_kms_helper ttm drm libsas ahci libahci scsi_transport_sas libata i2c_core dm_mirror dm_region_hash dm_log dm_mod
  CPU: 12 PID: 0 Comm: swapper/12 Tainted: G        W    3.13.0+ #1
  Hardware name: Intel Corporation LH Pass ........../SVRBD-ROW_T, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
  Call Trace:
   <IRQ>
    run_timer_softirq+0x245/0x2f0
    __do_softirq+0xf5/0x2e0
    irq_exit+0x10d/0x120
    smp_apic_timer_interrupt+0x45/0x60
    apic_timer_interrupt+0x6d/0x80
   <EOI>
    cpuidle_idle_call+0xb9/0x1f0
    arch_cpu_idle+0xe/0x30
    cpu_startup_entry+0x9e/0x240
    start_secondary+0x1e4/0x290
  RIP   cascade+0x93/0xa0

  WARNING: CPU: 36 PID: 1154 at kernel/workqueue.c:1461 __queue_delayed_work+0xed/0x1a0()
  Modules linked in: sg nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache cfg80211 rfkill x86_pkg_temp_thermal coretemp kvm_intel kvm ixgbe e1000e crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt ptp sb_edac iTCO_vendor_support pps_core mdio ipmi_devintf edac_core ioatdma microcode shpchp lpc_ich pcspkr i2c_i801 dca mfd_core ipmi_si wmi ipmi_msghandler nfsd auth_rpcgss nfs_acl lockd sunrpc xfs libcrc32c sd_mod sr_mod cdrom crc_t10dif crct10dif_common mgag200 syscopyarea sysfillrect sysimgblt isci i2c_algo_bit drm_kms_helper ttm drm libsas ahci libahci scsi_transport_sas libata i2c_core dm_mirror dm_region_hash dm_log dm_mod
  CPU: 36 PID: 1154 Comm: kworker/u481:3 Tainted: G        W    3.13.0+ #1
  Hardware name: Intel Corporation LH Pass ........../SVRBD-ROW_T, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
  Workqueue: edac-poller edac_mc_workq_function [edac_core]
  Call Trace:
    dump_stack+0x45/0x56
    warn_slowpath_common+0x7d/0xa0
    warn_slowpath_null+0x1a/0x20
    __queue_delayed_work+0xed/0x1a0
    queue_delayed_work_on+0x27/0x50
    edac_mc_workq_function+0x72/0xa0 [edac_core]
    process_one_work+0x17b/0x460
    worker_thread+0x11b/0x400
    kthread+0xd2/0xf0
    ret_from_fork+0x7c/0xb0

This patch adds a range check in the edac_mc_poll_msec code to check for 0.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-02-10 16:01:41 -08:00
Aravind Gopalakrishnan
9d0e8d8348 amd64_edac: Fix logic to determine channel for F15 M30h processors
Update current channel selection logic to include F15h, M30h memory
controllers.

Refer F15 M30h BKDG D18F2x110[7:6] (DRAM Controller Select Low)
(Link:http://support.amd.com/TechDocs/49125_15h_Models_30h-3Fh_BKDG.pdf)

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Link: http://lkml.kernel.org/r/1390338216-3873-1-git-send-email-Aravind.Gopalakrishnan@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2014-02-07 15:01:19 +01:00
Johannes Thumshirn
e245e3b25f edac/85xx: Remove deprecated IRQF_DISABLED
Remove IRQF_DISABLED as it is a NOOP.

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@men.de>
Link: http://lkml.kernel.org/r/1390293747-11185-1-git-send-email-johannes.thumshirn@men.de
Signed-off-by: Borislav Petkov <bp@suse.de>
2014-02-07 14:48:07 +01:00
Hitoshi Mitake
b90fe1568f i3200_edac: Add a missing pci_disable_device() on the exit path
Currently, i3200_edac.c forgets to call pci_disable_device() during a
process of disabling device. This patch adds that call.

The problem was detected by Huqiu Liu.

Reported-by: Huqiu Liu <liuhq11@mails.tsinghua.edu.cn>
Signed-off-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
Link: http://lkml.kernel.org/r/1389587178-10167-1-git-send-email-mitake.hitoshi@gmail.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2014-02-07 11:34:55 +01:00
Aristeu Rozanski
c2e650c49a i5400_edac: Disable device when unloading module
This was found by Huqiu Liu using a static analysis.

Reported-by: Huqiu Liu <liuhq11@mails.tsinghua.edu.cn>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Link: http://lkml.kernel.org/r/20140116162021.GY15716@redhat.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2014-02-07 11:28:49 +01:00
Mauro Carvalho Chehab
37e59f876b [media, edac] Change my email address
There are several left overs with my old email address.
Remove their occurrences and add myself at CREDITS, to
allow people to be able to reach me on my new addresses.

Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2014-02-07 08:03:07 -02:00
Jean Delvare
2edbf56997 e752x_edac: Simplify call to pci_get_device()
The last parameter, pvt->bridge_ck, is always NULL as far as I can
see, so just use that.

Signed-off-by: Jean Delvare <jdelvare@suse.de>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Link: http://lkml.kernel.org/r/20140130091101.22f2b550@endymion.delvare
Cc: Mark Gross <mark.gross@intel.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
2014-02-03 19:03:35 +01:00
Linus Torvalds
fab5669d55 Merge branch 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 RAS changes from Ingo Molnar:

 - SCI reporting for other error types not only correctable ones

 - GHES cleanups

 - Add the functionality to override error reporting agents as some
   machines are sporting a new extended error logging capability which,
   if done properly in the BIOS, makes a corresponding EDAC module
   redundant

 - PCIe AER tracepoint severity levels fix

 - Error path correction for the mce device init

 - MCE timer fix

 - Add more flexibility to the error injection (EINJ) debugfs interface

* 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, mce: Fix mce_start_timer semantics
  ACPI, APEI, GHES: Cleanup ghes memory error handling
  ACPI, APEI: Cleanup alignment-aware accesses
  ACPI, APEI, GHES: Do not report only correctable errors with SCI
  ACPI, APEI, EINJ: Changes to the ACPI/APEI/EINJ debugfs interface
  ACPI, eMCA: Combine eMCA/EDAC event reporting priority
  EDAC, sb_edac: Modify H/W event reporting policy
  EDAC: Add an edac_report parameter to EDAC
  PCI, AER: Fix severity usage in aer trace event
  x86, mce: Call put_device on device_register failure
2014-01-20 12:10:27 -08:00
Linus Torvalds
09854dde94 * mpc85xx PCIe error interrupt support
* misc small enhancements/fixes all over the place.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.15 (GNU/Linux)
 
 iQIcBAABAgAGBQJS3PEWAAoJEBLB8Bhh3lVKziAP/3nNouipUvIcRpL3o8U3jX19
 Z38tD0KPjIIX46rgexjoXbI/bKEy6whE1eDc0CG9GPpOaN9p9WWe81hfBIQc9DYK
 epFNdH3Wk83Ml+3+01CBKfLfiIV25c7rc9IOM7g+nMOg65Z++iC1Cea7i7Ul7T5a
 C+CIDdsmco9VrDBNjHHsrdYVZ1Ufx4NxPb5xhemQAiGCXseBtoD1IDS/gOM1F+aP
 UBDSJobjDrFq5JasEkp83U62VBTDndEcxb6yHROVahIPRn4miroc0NMHJ3XEd1/H
 K9u24iXmHgjjG3Euzd3mq8Axsscz8dUksKQT+lL2YIXz+AM8PcrCDrXK6TiVF9/A
 k5Vj91svnWUDrEU0OF1HysOvn0Z/9Y0+Fwyt6//HMyl/7ehCZBINwxsZ4sc4whtd
 nVHV2Ddcc/LjZzQEAvHuyWdf2pfxi3M1VRhNvuGyWtdHBPf5P8tqVUteoztELPM+
 bKr7wkpvKzKajX6xldNNrVP9xNps8hrYdETjhP5FpUYpF4HKntAe+VJvIue+nOja
 K2AxB25qGHyK5rgh69DkSuatZmImflG4brKWbOE+4La0E8OajrM790YdqP2hRnDQ
 3zpuIRr/htIAy7C2vGbmWITRhxXnGx5FhDI7AQRr1VjvgJprWM5zgIacDuH+5C9K
 ftouZ1hFTDI1WEEMDxRF
 =EJ+3
 -----END PGP SIGNATURE-----

Merge tag 'edac_for_3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp

Pull EDAC updates from Borislav Petkov:
 - mpc85xx PCIe error interrupt support
 - misc small enhancements/fixes all over the place.

* tag 'edac_for_3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
  EDAC: Don't try to cancel workqueue when it's never setup
  e752x_edac: Fix pci_dev usage count
  sb_edac: Mark get_mci_for_node_id as static
  EDAC: Mark edac_create_debug_nodes as static
  amd64_edac: Remove "amd64" prefix from static functions
  amd64_edac: Simplify code around decode_bus_error
  amd64_edac: Mark amd64_decode_bus_error as static
  EDAC: Remove DEFINE_PCI_DEVICE_TABLE macro
  amd64_edac: Fix condition to verify max channels allowed for F15 M30h
  edac/85xx: Add PCIe error interrupt edac support
2014-01-20 09:26:15 -08:00
Stephen Boyd
881f0fcef9 EDAC: Don't try to cancel workqueue when it's never setup
We only setup a workqueue for edac devices that use the polling
method. We still try to cancel the workqueue if an edac_device
uses the irq method though. This causes a warning from debug
objects when we remove an edac device:

WARNING: CPU: 0 PID: 56 at lib/debugobjects.c:260 debug_print_object+0x98/0xc0()
ODEBUG: assert_init not available (active state 0) object type: timer_list hint: stub_timer+0x0/0x28
Modules linked in: krait_edac(-)
CPU: 0 PID: 56 Comm: rmmod Not tainted 3.12.0-rc2-00035-g15292a0 #69
(unwind_backtrace+0x0/0x144)
(show_stack+0x20/0x24)
(dump_stack+0x74/0xb4)
(warn_slowpath_common+0x78/0x9c)
(warn_slowpath_fmt+0x40/0x48)
(debug_print_object+0x98/0xc0)
(debug_object_assert_init+0xdc/0xec)
(del_timer+0x24/0x7c)
(try_to_grab_pending+0xc0/0x1b0)
(cancel_delayed_work+0x2c/0xa0)
(edac_device_workq_teardown+0x1c/0x38)
(edac_device_del_device+0xb8/0xe4)
(krait_edac_remove+0x50/0x70 [krait_edac])
(platform_drv_remove+0x24/0x28)
(__device_release_driver+0x68/0xc0)
(driver_detach+0xc4/0xc8)
(bus_remove_driver+0xac/0x114)
(driver_unregister+0x38/0x58)
(platform_driver_unregister+0x1c/0x20)
(krait_edac_driver_exit+0x14/0x1c [krait_edac])
(SyS_delete_module+0x178/0x2b4)

Fix it by skipping the workqueue teardown for such devices.

Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Link: http://lkml.kernel.org/r/1388434457-4194-2-git-send-email-sboyd@codeaurora.org
Signed-off-by: Borislav Petkov <bp@suse.de>
2014-01-10 15:57:36 +01:00
Aristeu Rozanski
90ed4988b8 e752x_edac: Fix pci_dev usage count
In case the device 0, function 1 is not found using pci_get_device(),
pci_scan_single_device() will be used but, differently than
pci_get_device(), it allocates a pci_dev but doesn't does bump the usage
count on the pci_dev and after few module removals and loads the pci_dev
will be freed.

Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Reviewed-by: mark gross <mark.gross@intel.com>
Link: http://lkml.kernel.org/r/20131205153755.GL4545@redhat.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-12-21 13:26:57 +01:00
Ingo Molnar
014952270e * Add the functionality to override error reporting agents as some
machines are sporting a new extended error logging capability which, if
 done properly in the BIOS, makes a corresponding EDAC module redundant,
 from Gong Chen.
 
 * PCIe AER tracepoint severity levels fix, from Rui Wang.
 
 * Error path correction for the mce device init, from Levente Kurusa.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.15 (GNU/Linux)
 
 iQIcBAABAgAGBQJSrCysAAoJEBLB8Bhh3lVK1ikP/0hKY1Kk4tjbSta9A9Z8LdQG
 9F5JzEny47DpTrLaKij7MqAlbYFO8sSm7Zw0CEztTF7Ou/H37GAuxhMlB8ECMGOm
 Dzu53X1rySTna9mB+1gyXXd+pJypp/oe18/o16rw1QKjI9o2Kfgwfj7lKvytR549
 kDM1dhxEImQIS5cpJPkOPbcpVlSqYN7BnK9/Qx3h0W70httT/8qrr9xVtVL7wjOT
 auTA0R5/TkV06FtxyfHUNULEWTSP+2yNP/iJbusR6f4Jk1j0XmyCFr0BYOkPA1UO
 9+wC9+2R+r7rJw8MBfMzNmPrRzDJHdaiHPwYqse05yewRHfRHe5cgZWJYbL8Qv0u
 2WOX+fY12EfDYlihcOYtlupRzhGfGKRsaRpSuG1zX87ctDxAfNZencv4hnaJvfqG
 Xk6ggIX6tHKEivO2gmaPsmhoKveh0zcozUs+wgh/tvV5QB6ioFCjzHfSEsix5+BH
 ryyg1ri7IZnh92g3UuSUpE0OCbAquMfI7XIJo+kFs0u79dZTL/kD3wVu6oYazwdy
 yTrvIq7Bq5cMWnnni5w7dIU09ef2uvDgyHyAS6+RiqaQxhYFsW8/yx2zJrIloWRs
 7txz6t3CVmWFiejIg2gw6KyjaG6pXRBkDkI1XU6T+bKLb31ojx2+i9UKIIUeRZTB
 iisWAOI6ZSdt4eAkgeaI
 =r//I
 -----END PGP SIGNATURE-----

Merge tag 'ras_for_3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp into x86/ras

Pull RAS updates from Borislav Petkov:

  * Add the functionality to override error reporting agents as some
  machines are sporting a new extended error logging capability which, if
  done properly in the BIOS, makes a corresponding EDAC module redundant,
  from Gong Chen.

  * PCIe AER tracepoint severity levels fix, from Rui Wang.

  * Error path correction for the mce device init, from Levente Kurusa.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-12-16 14:33:17 +01:00
Rashika Kheria
8112c0cdf7 sb_edac: Mark get_mci_for_node_id as static
This patch marks the function get_mci_for_node_id() as static because it
is not used outside of sb_edac.c.

Thus, it also eliminates the following warning:
drivers/edac/sb_edac.c:918:22: warning: no previous prototype for ‘get_mci_for_node_id’ [-Wmissing-prototypes]

Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Link: http://lkml.kernel.org/r/0441f508186fc4eeabc8e9c3e4dde013d99405d4.1387029387.git.rashika.kheria@gmail.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-12-15 18:02:13 +01:00
Rashika Kheria
95285933d0 EDAC: Mark edac_create_debug_nodes as static
This patch marks the function edac_create_debug_nodes() as static
because it is not used outside of edac_mc_sysfs.c.

Thus, it also eliminates the following warning:
drivers/edac/edac_mc_sysfs.c:917:5: warning: no previous prototype for ‘edac_create_debug_nodes’ [-Wmissing-prototypes]

Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Link: http://lkml.kernel.org/r/a1c863b08c0d6f67d03280cf908c771bf26a3239.1387029387.git.rashika.kheria@gmail.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-12-15 17:59:59 +01:00
Borislav Petkov
d1ea71cdc9 amd64_edac: Remove "amd64" prefix from static functions
No need for the namespace tagging there. Cleanup setup_pci_device while
at it.

Signed-off-by: Borislav Petkov <bp@suse.de>
2013-12-15 17:54:27 +01:00
Borislav Petkov
df781d0386 amd64_edac: Simplify code around decode_bus_error
Drop wrapper function and prefixes.

Signed-off-by: Borislav Petkov <bp@suse.de>
2013-12-15 17:29:44 +01:00
Rashika Kheria
79db57cef9 amd64_edac: Mark amd64_decode_bus_error as static
This patch marks the function amd64_decode_bus_error() as static because
it is not used outside of amd64_edac.c.

It also eliminates the following warning:
drivers/edac/amd64_edac.c:2038:6: warning: no previous prototype for ‘amd64_decode_bus_error’ [-Wmissing-prototypes]

Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Link: http://lkml.kernel.org/r/7cddbd4c69ed493f183383e98853181aaf75b26b.1387029387.git.rashika.kheria@gmail.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-12-15 17:16:59 +01:00
Chen, Gong
fd52103966 EDAC, sb_edac: Modify H/W event reporting policy
Newer Intel platforms support more than one method to report H/W event.
On this kind of platform, H/W event report can adopt new method and
traditional EDAC method should be disabled. Moreover, if EDAC event
report method is set to *force*, it means event must be reported via
EDAC interface. IOW, it overrides the default event report policy.

Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/1386310630-12529-3-git-send-email-gong.chen@linux.intel.com
[ Boris: massage commit and error messages ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-12-11 18:19:25 +01:00
Chen, Gong
c700f013ad EDAC: Add an edac_report parameter to EDAC
This new parameter is used to control how to report HW error reporting,
especially for newer Intel platform, like Ivybridge-EX, which contains
an enhanced error decoding functionality in the firmware, i.e. eMCA.

Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/1386310630-12529-2-git-send-email-gong.chen@linux.intel.com
[ Boris: massage commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-12-11 18:06:47 +01:00
Jingoo Han
ba935f4097 EDAC: Remove DEFINE_PCI_DEVICE_TABLE macro
Currently, there is no other bus that has something like this macro for
their device ids. Thus, DEFINE_PCI_DEVICE_TABLE macro should be removed.

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Link: http://lkml.kernel.org/r/001c01ceefb3$5724d860$056e8920$%han@samsung.com
[ Boris: swap commit message with better one. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-12-06 10:23:41 +01:00
Aravind Gopalakrishnan
7f3f5240ce amd64_edac: Fix condition to verify max channels allowed for F15 M30h
The value returned from 'f15_m30h_determine_channel' will
always be 0x3 max. The condition

	(channel > 4 || channel < 0)

works as hardware never returns a value of 4, but
it leads to static checker analysis errors like
http://marc.info/?l=linux-edac&m=138607615131951&w=2.

Fix that.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Link: http://lkml.kernel.org/r/20131203130857.GA32170@elgon.mountain
[ Boris: massage commit message a bit. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-12-06 10:04:17 +01:00
Aristeu Rozanski
bd4b968362 sb_edac: Shut up compiler warning when EDAC_DEBUG is enabled
Fix this:

In file included from drivers/edac/sb_edac.c:27:0:
drivers/edac/sb_edac.c: In function ‘sbridge_mce_output_error’:
drivers/edac/edac_core.h:50:8: warning: ‘limit’ may be used uninitialized in this function [-Wmaybe-uninitialized]
  printk(level "EDAC " prefix ": " fmt, ##arg)
        ^
drivers/edac/sb_edac.c:948:25: note: ‘limit’ was declared here
  u64   ch_addr, offset, limit, prv = 0;

Limit can be initialized to 0. The only way limit wouldn't be
initialized is if there are no DIMMs present (which would be a bug of
course) and it'd fail on the next test.

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
Link: http://lkml.kernel.org/r/20131121122021.GD26009@pd.tnic
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-11-30 12:26:36 +01:00
Chunhe Lan
c92132f598 edac/85xx: Add PCIe error interrupt edac support
Add pcie error interrupt edac support for mpc85xx, p3041, p4080, and
p5020. The mpc85xx uses the legacy interrupt report mechanism - the
error interrupts are reported directly to mpic. While the p3041/
p4080/p5020 attaches the most of error interrupts to interrupt zero. And
report error interrupts to mpic via interrupt 0.

This patch can handle both of them.

Signed-off-by: Chunhe Lan <Chunhe.Lan@freescale.com>
Link: http://lkml.kernel.org/r/1384712714-8826-3-git-send-email-morbidrsa@gmail.com
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: Dave Jiang <dave.jiang@gmail.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@men.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-11-25 11:29:15 +01:00
Linus Torvalds
cdd278db0e Merge branch 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac
Pull EDAC driver updates from Mauro Carvalho Chehab:
 - sb_edac: add support for Ivy Bridge support
 - cell_edac: add a missing of_node_put() call

* 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac:
  cell_edac: fix missing of_node_put
  sb_edac: add support for Ivy Bridge
  sb_edac: avoid decoding the same error multiple times
  sb_edac: rename mci_bind_devs()
  sb_edac: enable multiple PCI id tables to be used
  sb_edac: rework sad_pkg
  sb_edac: allow different interleave lists
  sb_edac: allow different dram_rule arrays
  sb_edac: isolate TOHM retrieval
  sb_edac: rename pci_br
  sb_edac: isolate TOLM retrieval
  sb_edac: make RANK_CFG_A value part of sbridge_info
2013-11-18 14:51:52 -08:00
Linus Torvalds
794e96e8ec EDAC updates for 3.13
Highlights:
 
 * Support for Calxeda ECX-2000 memory controller, from Robert Richter
 
 * Misc Calxeda Highbank drivers and EDAC core cleanups, from Rob Herring
   and Robert Richter
 
 * New maintainer for Freescale's MPC85xx EDAC driver: Johannes Thumshirn
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJSiRmeAAoJEBLB8Bhh3lVKYswQAKPCYSsIQg3L3K3320uvWV3x
 NEAxAAdYyN2ds7ksLv//54FBLgH9UeiC56glTMmnK9QvfGCcgHGo1WJCa84LwqIf
 M7H6mKSgTEBZXr7HpWgtarMYjpNve4Nh8SwHv7tlWRJNd3ufcBbOfCY6rEreULJd
 sBTRMuEPc1Ki7NxZr2m/xsPyzWXS1N1nSd2aewiszyY3Rwp1vIAPv/Chr7UwF/Fm
 GGeQonc801hVRIQONwxsXzS2qwj/wgx8OPab05psfMuv6CWLxQQJAzGWbe+gv3V4
 mYx64+U4nOkQ/knRAf9s0fLwJX6DWSTtQer7m5YSUey0dYDfgV+DemLFvS5We7XB
 os9PBGL+0bGUrJ0cnLE6O+6S1qniWaKZrhSZndcYiVoQeDZmaMuartFlIaeRvY21
 WJML2oqqUop2ZyaIKInJEyeD74FIf7BsG3V+RJwsCZx5+38Pm0EnBZqGJ9bnJl7x
 OxXlHjwjZRhlVFIdcN5WeaKoKmXpdcnzLcL1XE2wMgs9ZFaleeyTDXuQm+XxkKGd
 ExD/1TbAoBqFF7FwIAQwqPf1f2HPDBKlSg38X6l6NV5uLK/u5rdffKTMQPWi/6p2
 RDO+Ddbypzm72850hcYc9mY+B8Qe3T3F/iKlbELiK1S5+IQzm/hmfTDcyKStwKZn
 cCTvsIo9QHflhshXR1a8
 =pU/h
 -----END PGP SIGNATURE-----

Merge tag 'edac_for_3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp

Pull EDAC updates from Borislav Petkov:
 "Following up on last week's discussion, here's my part of the EDAC
  pile, highlights in the signed tag.

  The last two patches have a date from just now because I've just
  applied them to the tree after Johannes sent them to me earlier.  I
  decided to forward them now because they're trivial.

  There's a third one for MPC85xx which adds PCIe error interrupt
  support but since it is not so trivial and hasn't seen any linux-next
  time, I'm deferring it to 3.14

  EDAC update highlights:
   - Support for Calxeda ECX-2000 memory controller, from Robert Richter
   - Misc Calxeda Highbank drivers and EDAC core cleanups, from Rob
     Herring and Robert Richter
   - New maintainer for Freescale's MPC85xx EDAC driver: Johannes
     Thumshirn"

* tag 'edac_for_3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
  edac/85xx: Remove mpc85xx_pci_err_remove
  EDAC: Add edac-mpc85xx driver to MAINTAINERS
  edac, highbank: Moving error injection to sysfs for edac
  edac, highbank: Add MAINTAINERS entry
  edac: Unify reporting of device info for device, mc and pci
  edac, highbank: Improve and unify naming
  edac, highbank: Add Calxeda ECX-2000 support
  ARM: dts: calxeda: move memory-controller node out of ecx-common.dtsi
  edac, highbank: Fix interrupt setup of mem and l2 controller
2013-11-18 14:50:17 -08:00
Johannes Thumshirn
0f1741c74a edac/85xx: Remove mpc85xx_pci_err_remove
Remove mpc85xx_pci_err_remove(...) which is obsolete, this removes the
compiler warning which can be seen when building the driver either
statically or as a module.

Signed-off-by: Johannes Thumshirn <morbidrsa@gmail.com>
Link: https://lkml.kernel.org/r/20131112161901.GA15637@jtlinux
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@men.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-11-17 20:05:50 +01:00
Libo Chen
3e45588825 cell_edac: fix missing of_node_put
Decrease device_node refcount np after task completion.

Signed-off-by: Libo Chen <libo.chen@huawei.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2013-11-14 17:13:43 -02:00
Aristeu Rozanski
4d715a805b sb_edac: add support for Ivy Bridge
Since Ivy Bridge memory controller is very similar to Sandy Bridge, it's
wiser to modify sb_edac to support both instead of creating another
driver.

[m.chehab@samsung.com: Fix CodingStyle]
Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2013-11-14 17:13:22 -02:00
Aristeu Rozanski
be3036d220 sb_edac: avoid decoding the same error multiple times
Whenever the extended error reporting is active, multiple MCEs will be
generated for the same event, which will lead to multiple repeated
errors to be reported. So check ADDRV and only decode the error if the
MCE address is valid.

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2013-11-14 16:50:03 -02:00
Aristeu Rozanski
ea779b5a09 sb_edac: rename mci_bind_devs()
This is in preparation for Ivy Bridge support

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2013-11-14 16:49:47 -02:00
Aristeu Rozanski
5153a0f94c sb_edac: enable multiple PCI id tables to be used
This is needed to allow separated PCI id tables for Sandy Bridge and Ivy
Bridge.

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2013-11-14 16:49:27 -02:00
Aristeu Rozanski
cc311991a7 sb_edac: rework sad_pkg
This is in preparation for Ivy Bridge support

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2013-11-14 16:49:04 -02:00
Aristeu Rozanski
ef1ce51e7b sb_edac: allow different interleave lists
This is in preparation for Ivy Bridge support

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2013-11-14 16:48:42 -02:00
Aristeu Rozanski
464f1d829a sb_edac: allow different dram_rule arrays
This is in preparation for Ivy Bridge support

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2013-11-14 16:48:16 -02:00
Aristeu Rozanski
8fd6a43ac9 sb_edac: isolate TOHM retrieval
This is preparation of Ivy Bridge support.

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2013-11-14 16:43:17 -02:00
Aristeu Rozanski
5f8a1b8a7b sb_edac: rename pci_br
Ivy Bridge has more than one, so rename pci_br to pci_br0

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2013-11-14 16:43:06 -02:00
Aristeu Rozanski
fb79a50926 sb_edac: isolate TOLM retrieval
This is in preparation for the Ivy Bridge support.

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2013-11-14 16:25:27 -02:00
Aristeu Rozanski
ef1e8d03b1 sb_edac: make RANK_CFG_A value part of sbridge_info
This is in preparation of Ivy Bridge support.

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2013-11-14 16:20:37 -02:00
Linus Torvalds
10d0c9705e DeviceTree updates for 3.13. This is a bit larger pull request than
usual for this cycle with lots of clean-up.
 
 - Cross arch clean-up and consolidation of early DT scanning code.
 - Clean-up and removal of arch prom.h headers. Makes arch specific
   prom.h optional on all but Sparc.
 - Addition of interrupts-extended property for devices connected to
   multiple interrupt controllers.
 - Refactoring of DT interrupt parsing code in preparation for deferred
   probe of interrupts.
 - ARM cpu and cpu topology bindings documentation.
 - Various DT vendor binding documentation updates.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQEcBAABAgAGBQJSgPQ4AAoJEMhvYp4jgsXif28H/1WkrXq5+lCFQZF8nbYdE2h0
 R8PsfiJJmAl6/wFgQTsRel+ScMk2hiP08uTyqf2RLnB1v87gCF7MKVaLOdONfUDi
 huXbcQGWCmZv0tbBIklxJe3+X3FIJch4gnyUvPudD1m8a0R0LxWXH/NhdTSFyB20
 PNjhN/IzoN40X1PSAhfB5ndWnoxXBoehV/IVHVDU42vkPVbVTyGAw5qJzHW8CLyN
 2oGTOalOO4ffQ7dIkBEQfj0mrgGcODToPdDvUQyyGZjYK2FY2sGrjyquir6SDcNa
 Q4gwatHTu0ygXpyphjtQf5tc3ZCejJ/F0s3olOAS1ahKGfe01fehtwPRROQnCK8=
 =GCbY
 -----END PGP SIGNATURE-----

Merge tag 'devicetree-for-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux

Pull devicetree updates from Rob Herring:
 "DeviceTree updates for 3.13.  This is a bit larger pull request than
  usual for this cycle with lots of clean-up.

   - Cross arch clean-up and consolidation of early DT scanning code.
   - Clean-up and removal of arch prom.h headers.  Makes arch specific
     prom.h optional on all but Sparc.
   - Addition of interrupts-extended property for devices connected to
     multiple interrupt controllers.
   - Refactoring of DT interrupt parsing code in preparation for
     deferred probe of interrupts.
   - ARM cpu and cpu topology bindings documentation.
   - Various DT vendor binding documentation updates"

* tag 'devicetree-for-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (82 commits)
  powerpc: add missing explicit OF includes for ppc
  dt/irq: add empty of_irq_count for !OF_IRQ
  dt: disable self-tests for !OF_IRQ
  of: irq: Fix interrupt-map entry matching
  MIPS: Netlogic: replace early_init_devtree() call
  of: Add Panasonic Corporation vendor prefix
  of: Add Chunghwa Picture Tubes Ltd. vendor prefix
  of: Add AU Optronics Corporation vendor prefix
  of/irq: Fix potential buffer overflow
  of/irq: Fix bug in interrupt parsing refactor.
  of: set dma_mask to point to coherent_dma_mask
  of: add vendor prefix for PHYTEC Messtechnik GmbH
  DT: sort vendor-prefixes.txt
  of: Add vendor prefix for Cadence
  of: Add empty for_each_available_child_of_node() macro definition
  arm/versatile: Fix versatile irq specifications.
  of/irq: create interrupts-extended property
  microblaze/pci: Drop PowerPC-ism from irq parsing
  of/irq: Create of_irq_parse_and_map_pci() to consolidate arch code.
  of/irq: Use irq_of_parse_and_map()
  ...
2013-11-12 16:52:17 +09:00
Robert Richter
78cfbf0bbf edac, highbank: Moving error injection to sysfs for edac
Always have the error injection i/f available, even if there is no
debugfs or EDAC_DEBUG enabled. We need this for testing production
kernels and environments.

Thus, the entry moves from:

 /sys/kernel/debug/edac/mc0/inject_ctrl

to:

 /sys/devices/system/edac/mc/mc0/inject_ctrl

No other changes of the interface.

Signed-off-by: Robert Richter <robert.richter@linaro.org>
Signed-off-by: Robert Richter <rric@kernel.org>
2013-11-04 17:01:11 -06:00
Robert Richter
7270a6085a edac: Unify reporting of device info for device, mc and pci
Log messages slightly differ between edac subsystems. Unifying it.

Signed-off-by: Robert Richter <robert.richter@linaro.org>
Acked-by: Rob Herring <rob.herring@calxeda.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Robert Richter <rric@kernel.org>
2013-11-04 17:01:09 -06:00
Robert Richter
41ec0e8da0 edac, highbank: Improve and unify naming
Assinging correct names of the 'hb_mc_edac' and 'hb_l2_edac' edac
modules for module, controller and device. Reported values for
Highbank in dmesg are now:

 EDAC MC0: Giving out device to module hb_mc_edac controller
 calxeda,hb-ddr-ctrl: DEV fff00000.memory-controller (INTERRUPT)

 EDAC DEVICE0: Giving out device to module hb_l2_edac controller
 calxeda,hb-sregs-l2-ecc: DEV fff3c200.sregs (INTERRUPT)

Signed-off-by: Robert Richter <robert.richter@linaro.org>
Acked-by: Rob Herring <rob.herring@calxeda.com>
Signed-off-by: Robert Richter <rric@kernel.org>
2013-11-04 17:01:07 -06:00
Robert Richter
0ec8579e16 edac, highbank: Add Calxeda ECX-2000 support
Implement edac support for Calxeda ECX-2000.

The ECX-2000 memory controller is similar to Highbank but has
different register bases for error and interrupt registers. There is
an own device tree name "calxeda,ecx-2000-ddr-ctrl" for identification
and initialization of the ECX-2000 and its base addresses.

Signed-off-by: Robert Richter <robert.richter@linaro.org>
Acked-by: Rob Herring <rob.herring@calxeda.com>
Signed-off-by: Robert Richter <rric@kernel.org>
2013-11-04 17:01:06 -06:00
Robert Richter
a72b8859fd edac, highbank: Fix interrupt setup of mem and l2 controller
Register and enable interrupts after the edac registration. Otherwise
incomming ecc error interrupts lead to crashes during device setup.

Fixing this in drivers for mc and l2.

Signed-off-by: Robert Richter <robert.richter@linaro.org>
Acked-by: Rob Herring <rob.herring@calxeda.com>
Cc: stable <stable@vger.kernel.org>     # 3.6+
Signed-off-by: Robert Richter <rric@kernel.org>
2013-11-04 15:43:18 -06:00
Chen, Gong
56507694de EDAC, GHES: Update ghes error record info
In latest UEFI spec(by now it's 2.4) there are some new
fields for memory error reporting. Add these new fields for
ghes_edac interface.

Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
Cc: Mauro Carvalho Chehab <m.chehab@samsung.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2013-10-23 10:11:00 -07:00
Chen, Gong
147de14772 ACPI, APEI, CPER: Add UEFI 2.4 support for memory error
In latest UEFI spec(by now it is 2.4) memory error definition
for CPER (UEFI 2.4 Appendix N Common Platform Error Record)
adds some new fields. These fields help people to locate
memory error to an actual DIMM location.

Original-author: Tony Luck <tony.luck@intel.com>
Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2013-10-23 10:10:20 -07:00
Chen, Gong
10ef6b0dff bitops: Introduce a more generic BITMASK macro
GENMASK is used to create a contiguous bitmask([hi:lo]). It is
implemented twice in current kernel. One is in EDAC driver, the other
is in SiS/XGI FB driver. Move it to a more generic place for other
usage.

Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Thomas Winischhofer <thomas@winischhofer.net>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2013-10-21 15:12:01 -07:00
Rob Herring
5af5073004 drivers: clean-up prom.h implicit includes
Powerpc is a mess of implicit includes by prom.h. Add the necessary
explicit includes to drivers in preparation of prom.h cleanup.

Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Acked-by: Grant Likely <grant.likely@linaro.org>
2013-10-09 20:04:04 -05:00
Linus Torvalds
4de9ad9bc0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile
Pull Tile arch updates from Chris Metcalf:
 "These changes bring in a bunch of new functionality that has been
  maintained internally at Tilera over the last year, plus other stray
  bits of work that I've taken into the tile tree from other folks.

  The changes include some PCI root complex work, interrupt-driven
  console support, support for performing fast-path unaligned data
  fixups by kernel-based JIT code generation, CONFIG_PREEMPT support,
  vDSO support for gettimeofday(), a serial driver for the tilegx
  on-chip UART, KGDB support, more optimized string routines, support
  for ftrace and kprobes, improved ASLR, and many bug fixes.

  We also remove support for the old TILE64 chip, which is no longer
  buildable"

* git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile: (85 commits)
  tile: refresh tile defconfig files
  tile: rework <asm/cmpxchg.h>
  tile PCI RC: make default consistent DMA mask 32-bit
  tile: add null check for kzalloc in tile/kernel/setup.c
  tile: make __write_once a synonym for __read_mostly
  tile: remove support for TILE64
  tile: use asm-generic/bitops/builtin-*.h
  tile: eliminate no-op "noatomichash" boot argument
  tile: use standard tile_bundle_bits type in traps.c
  tile: simplify code referencing hypervisor API addresses
  tile: change <asm/system.h> to <asm/switch_to.h> in comments
  tile: mark pcibios_init() as __init
  tile: check for correct compiler earlier in asm-offsets.c
  tile: use standard 'generic-y' model for <asm/hw_irq.h>
  tile: use asm-generic version of <asm/local64.h>
  tile PCI RC: add comment about "PCI hole" problem
  tile: remove DEBUG_EXTRA_FLAGS kernel config option
  tile: add virt_to_kpte() API and clean up and document behavior
  tile: support FRAME_POINTER
  tile: support reporting Tilera hypervisor statistics
  ...
2013-09-06 11:14:33 -07:00
Aravind Gopalakrishnan
4fc06b3171 amd64_edac: Fix incorrect wraparounds
dct_base and dct_limit obtain 32 bit register values when they read
their respective pci config space registers. A left shift beyond 32 bits
will cause them to wrap around. Similar case for chan_addr as can be
seen from the bug report (link below). In the patch, we rectify this by
casting chan_addr to u64 and by comparing dct_base and dct_limit against
properly shifted sys_addr in order to compare the correct bits.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Link: http://lkml.kernel.org/r/20130819132302.GA12171@elgon.mountain
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-08-27 15:00:22 +02:00
Borislav Petkov
3f0aba4fc0 amd64_edac: Correct erratum 505 range
Basically we want to cover all 0x0-0xf models, i.e. Orochi and later.

Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Link: http://lkml.kernel.org/r/20130819192321.GF4165@pd.tnic
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-08-27 14:25:08 +02:00
Ingo Molnar
c874b6ba55 An amd64_edac fix for single channel configurations + trivial cleanups
courtesy of Jingoo Han.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJSC6mSAAoJEBLB8Bhh3lVKhKYQALt3lklah1BpZIqyTiLqtizN
 YVF/lSubpcHB2BfnZUzaSInE+17uo21DJ44tQ56O0lWDsxLqadJzTDiJ2BG5Ol3u
 JOIWti/Yl8YoCQQcir6QksBAUkR26iCpelO8kO6J/JWGEekVgl3Oik4NOmYrdN55
 rUoHIyVdK5Z5y4j79Pmth7//+c6OFli1cAeUmBlIvxS9T4T2ZCz30jBim76VTS8H
 AjgaX/aBlE3SxAYoMWLZh1VglukxAVCG2qZ9lm7iNLCpkGwP59jT/DE9Gok2IXBg
 id9SMTkrpitijCyM3oKYox14Tl+QP/brElnWCVyVeRIpkVH8s2WUkU+qHeZztBg9
 8i/aU9x4bOCkDjhvBjIM4jbYJAvvaVKlIXPvANO8xl/A0D7nCsmCs7DpritRHZEr
 3y4N8SQsaamVD083+UaVciAo1XCpl5cNq9gH/Y7+U8h2bIThZn8HTbZ1uWgXaCY1
 OwfElbJDKInsSmDVBEklMI8CF42YsjGQc2JC+A/3M3CapTfepKPg6SvptNQueZb+
 SUw0mgGBumpdDQSHo5tPf5JL1y57ERkdOBVryrqYtr6qdjw3ox9o/+B8eTy3gkg1
 b71LEhzG2UbdSVaHiYhRE/IR3W3yKzNX8Oh3BTHfp04jqnNijx4hHu9oX7j3W59M
 P/bd6GHHI5ZY66aLqCue
 =r2kL
 -----END PGP SIGNATURE-----

Merge tag 'edac_for_3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp into x86/ras

Pull RAS/EDAC updates from Boris Petkov:

 "An amd64_edac fix for single channel configurations + trivial cleanups
  courtesy of Jingoo Han."

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-08-15 10:07:20 +02:00
Jingoo Han
75a9551f2b cpc925_edac: Use proper array termination
The struct should be terminated by using empty braces in order to
fix the following sparse warning.

drivers/edac/cpc925_edac.c:792:10: warning: Using plain integer as NULL pointer

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
[ drop obvious comment ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-08-14 12:46:46 +02:00
Borislav Petkov
a4b4bedce8 amd64_edac: Get rid of boot_cpu_data accesses
Now that we cache (family, model, stepping) locally, use them instead of
boot_cpu_data.

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
2013-08-12 16:01:56 +02:00
Aravind Gopalakrishnan
18b94f66f9 amd64_edac: Add ECC decoding support for newer F15h models
On newer models, support has been included for upto 4 DCT's, however,
only DCT0 and DCT3 are currently configured (cf BKDG Section 2.10).
Also, the routing DRAM Requests algorithm is different for F15h M30h.
Thus it is cleaner to use a brand new function rather than adding quirks
to the more generic f1x_match_to_this_node(). Refer to "2.10.5 DRAM
Routing Requests" in the BKDG for further info.

Tested on Fam15h M30h with ECC turned on using mce_amd_inj facility and
verified to be functionally correct.

While at it, verify if erratum workarounds for E505 and E637 still hold.
From email conversations within AMD, the current status of the errata
is:

      * Erratum 505: fixed in model 0x1, stepping 0x1 and later.
      * Erratum 637: not fixed.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
[ Cleanups, corrections ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-08-12 16:00:10 +02:00
Jingoo Han
e0d391ab04 x38_edac: Make a local function static
Make a local function static in order to fix the following sparse
warning:

drivers/edac/x38_edac.c:252:14: warning: symbol 'x38_map_mchbar' was not declared. Should it be static?

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
[ Boris: Correct commit message ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-08-09 15:29:10 +02:00
Jingoo Han
166e9334e9 i3200_edac: Make a local function static
This local symbol is used only in this file.
Fix the following sparse warnings:

drivers/edac/i3200_edac.c:264:14: warning: symbol 'i3200_map_mchbar' was not declared. Should it be static?

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-08-09 15:23:02 +02:00
Borislav Petkov
f0a56c4801 amd64_edac: Fix single-channel setups
It can happen that configurations are running in a single-channel mode
even with a dual-channel memory controller, by, say, putting the DIMMs
only on the one channel and leaving the other empty. This causes a
problem in init_csrows which implicitly assumes that when the second
channel is enabled, i.e. channel 1, the struct dimm hierarchy will be
present. Which is not.

So always allocate two channels unconditionally.

This provides for the nice side effect that the data structures are
initialized so some day, when memory hotplug is supported, it should
just work out of the box when all of a sudden a second channel appears.

Reported-and-tested-by: Roger Leigh <rleigh@debian.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-07-29 17:22:41 +02:00
Jingoo Han
c542b53da9 EDAC: Replace strict_strtol() with kstrtol()
The usage of strict_strtol() is not preferred, because strict_strtol()
is obsolete. Thus, kstrtol() should be used.

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-07-24 09:30:03 +02:00
Borislav Petkov
88d84ac973 EDAC: Fix lockdep splat
Fix the following:

BUG: key ffff88043bdd0330 not in .data!
------------[ cut here ]------------
WARNING: at kernel/lockdep.c:2987 lockdep_init_map+0x565/0x5a0()
DEBUG_LOCKS_WARN_ON(1)
Modules linked in: glue_helper sb_edac(+) edac_core snd acpi_cpufreq lrw gf128mul ablk_helper iTCO_wdt evdev i2c_i801 dcdbas button cryptd pcspkr iTCO_vendor_support usb_common lpc_ich mfd_core soundcore mperf processor microcode
CPU: 2 PID: 599 Comm: modprobe Not tainted 3.10.0 #1
Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A08 01/24/2013
 0000000000000009 ffff880439a1d920 ffffffff8160a9a9 ffff880439a1d958
 ffffffff8103d9e0 ffff88043af4a510 ffffffff81a16e11 0000000000000000
 ffff88043bdd0330 0000000000000000 ffff880439a1d9b8 ffffffff8103dacc
Call Trace:
  dump_stack
  warn_slowpath_common
  warn_slowpath_fmt
  lockdep_init_map
  ? trace_hardirqs_on_caller
  ? trace_hardirqs_on
  debug_mutex_init
  __mutex_init
  bus_register
  edac_create_sysfs_mci_device
  edac_mc_add_mc
  sbridge_probe
  pci_device_probe
  driver_probe_device
  __driver_attach
  ? driver_probe_device
  bus_for_each_dev
  driver_attach
  bus_add_driver
  driver_register
  __pci_register_driver
  ? 0xffffffffa0010fff
  sbridge_init
  ? 0xffffffffa0010fff
  do_one_initcall
  load_module
  ? unset_module_init_ro_nx
  SyS_init_module
  tracesys
---[ end trace d24a70b0d3ddf733 ]---
EDAC MC0: Giving out device to 'sbridge_edac.c' 'Sandy Bridge Socket#0': DEV 0000:3f:0e.0
EDAC sbridge: Driver loaded.

What happens is that bus_register needs a statically allocated lock_key
because the last is handed in to lockdep. However, struct mem_ctl_info
embeds struct bus_type (the whole struct, not a pointer to it) and the
whole thing gets dynamically allocated.

Fix this by using a statically allocated struct bus_type for the MC bus.

Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Mauro Carvalho Chehab <mchehab@infradead.org>
Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
Cc: stable@kernel.org # v3.10
Signed-off-by: Tony Luck <tony.luck@intel.com>
2013-07-23 16:01:28 -07:00
Sachin Kamat
8e42e211e4 edac: Remove redundant platform_set_drvdata()
Commit 0998d06310 (device-core: Ensure drvdata = NULL when no
driver is bound) removes the need to set driver data field to
NULL.

Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2013-07-17 12:49:55 -04:00
Linus Torvalds
d144746478 Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus
Pull MIPS updates from Ralf Baechle:
 "MIPS updates:

   - All the things that didn't make 3.10.
   - Removes the Windriver PPMC platform.  Nobody will miss it.
   - Remove a workaround from kernel/irq/irqdomain.c which was there
     exclusivly for MIPS.  Patch by Grant Likely.
   - More small improvments for the SEAD 3 platform
   - Improvments on the BMIPS / SMP support for the BCM63xx series.
   - Various cleanups of dead leftovers.
   - Platform support for the Cavium Octeon-based EdgeRouter Lite.

  Two large KVM patchsets didn't make it for this pull request because
  their respective authors are vacationing"

* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (124 commits)
  MIPS: Kconfig: Add missing MODULES dependency to VPE_LOADER
  MIPS: BCM63xx: CLK: Add dummy clk_{set,round}_rate() functions
  MIPS: SEAD3: Disable L2 cache on SEAD-3.
  MIPS: BCM63xx: Enable second core SMP on BCM6328 if available
  MIPS: BCM63xx: Add SMP support to prom.c
  MIPS: define write{b,w,l,q}_relaxed
  MIPS: Expose missing pci_io{map,unmap} declarations
  MIPS: Malta: Update GCMP detection.
  Revert "MIPS: make CAC_ADDR and UNCAC_ADDR account for PHYS_OFFSET"
  MIPS: APSP: Remove <asm/kspd.h>
  SSB: Kconfig: Amend SSB_EMBEDDED dependencies
  MIPS: microMIPS: Fix improper definition of ISA exception bit.
  MIPS: Don't try to decode microMIPS branch instructions where they cannot exist.
  MIPS: Declare emulate_load_store_microMIPS as a static function.
  MIPS: Fix typos and cleanup comment
  MIPS: Cleanup indentation and whitespace
  MIPS: BMIPS: support booting from physical CPU other than 0
  MIPS: Only set cpu_has_mmips if SYS_SUPPORTS_MICROMIPS
  MIPS: GIC: Fix gic_set_affinity infinite loop
  MIPS: Don't save/restore OCTEON wide multiplier state on syscalls.
  ...
2013-07-13 14:52:21 -07:00
Linus Torvalds
f3acb96f38 Add MCE signatures for family 0x15, models 30-3f.
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJR0UeRAAoJEBLB8Bhh3lVKp/oP/1Bn5wFlVNVQymyqjUv8PCui
 W7gro9vf67KSZrEvbvWYuBl3/Z7JI9e7MQMzWOtj1mx/li6G/mSHPWYsoK/g4MRD
 nuxr5EWmXoXvoJvaIPDf/ne9qeMZ88sTv1zQavCWEO+fCY95rUY5DzhJuw8kgjMX
 g++C1AZJFzRGHBF0/VswcBKU1cBaEIJ2h0UI6zJQDPWtTUL5VJcfZ+7CduHf07IA
 rMKqRLKLzBuuQ/aiE52bPQAHirbYbJ9d2jfNqpDRsdBFrOoOeqJbco7ac/itk9Lo
 4Qlh52HuF0rpi9Ub+QIchaQm4SZCFnBuw5liDgT9kp8cOO9KWS6JBNmYa2cEw54F
 LALbdp0XeHCYWoQskzVlTaZlAGADmr4f0C0o1e+XXJYfp8TFIR3ZoconyBKHulsh
 JbQDp1NNC1JzoyZyZW73Gzi4a1yLmf5tB7Y3+O6NMTDy+6RFck3oCwctcr3dRaP2
 pArQA0OCApDixcGfB0h1uX+H8zhVrUen+JcF3KfuXhTepS5koIRbMBglBiJYtSjN
 4RRaV1DWR3ii+gvSfWEaApzX2XxnLYA/6ZJSvicwoRHi6AcZPn5mcLDCXIyYU0p/
 jDGLPY1RU+EmIQlgXUr6CbtlpoCaQOZuyUgSAqQL0pPfCvaeUvX1M7Ly0GqXy5b5
 dKMw7cf7T37S2S9YUBJ/
 =KQCd
 -----END PGP SIGNATURE-----

Merge tag 'edac_for_3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp

Pull AMD EDAC update from Borislav Petkov:
 "Add MCE signatures for family 0x15, models 30-3f"

* tag 'edac_for_3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
  EDAC, MCE, AMD: Add an MCE signature for new Fam15h models
  EDAC: Replace strict_strtoul() with kstrtoul()
2013-07-03 13:11:18 -07:00
David Daney
9ddebc46e7 MIPS: OCTEON: Rename Kconfig CAVIUM_OCTEON_REFERENCE_BOARD to CAVIUM_OCTEON_SOC
CAVIUM_OCTEON_SOC most place we used to use CPU_CAVIUM_OCTEON.  This
allows us to CPU_CAVIUM_OCTEON in places where we have no OCTEON SOC.

Remove CAVIUM_OCTEON_SIMULATOR as it doesn't really do anything, we can
get the same configuration with CAVIUM_OCTEON_SOC.

Signed-off-by: David Daney <david.daney@cavium.com>
Cc: linux-mips@linux-mips.org
Cc: linux-ide@vger.kernel.org
Cc: linux-edac@vger.kernel.org
Cc: linux-i2c@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: spi-devel-general@lists.sourceforge.net
Cc: devel@driverdev.osuosl.org
Cc: linux-usb@vger.kernel.org
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Wolfram Sang <wsa@the-dreams.de>
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Patchwork: https://patchwork.linux-mips.org/patch/5295/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2013-06-10 18:01:25 +02:00
Aravind Gopalakrishnan
aad19e5176 EDAC, MCE, AMD: Add an MCE signature for new Fam15h models
Add a new error signature for Family 15h, models 30h-3fh. Patch has been
tested on Fam15h using mce_amd_inj facility and has been verified to
work correctly.

Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
 [ cleanup commit message and error string ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-06-08 10:17:03 +02:00
Jingoo Han
c7f62fc87b EDAC: Replace strict_strtoul() with kstrtoul()
The usage of strict_strtoul() is not preferred, because strict_strtoul()
is obsolete. Thus, kstrtoul() should be used.

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-06-08 10:16:33 +02:00
Stephen Rothwell
40b313608a Finally eradicate CONFIG_HOTPLUG
Ever since commit 45f035ab9b ("CONFIG_HOTPLUG should be always on"),
it has been basically impossible to build a kernel with CONFIG_HOTPLUG
turned off.  Remove all the remaining references to it.

Cc: Russell King <linux@arm.linux.org.uk>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Acked-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-06-03 14:20:18 -07:00
Borislav Petkov
bbb013b920 amd64_edac: Fix bogus sysfs file permissions
Fix yet another issue caught by 8f46baaa7e ("base: core: WARN() about
bogus permissions on device attributes").

Signed-off-by: Borislav Petkov <bp@suse.de>
2013-05-21 09:13:11 +02:00
Linus Torvalds
7462543abb Two small EDAC fixes.
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJRi31iAAoJEBLB8Bhh3lVKiaQQAIZ4R6/9QSt7KbP5VLN1ksv2
 qV2VOGPuxkhGoO/vfUJ0RKFj0NgnQFYizKl+vjZL/ahQy9yqelWInoc/TO7tHGol
 rxlyCoxmS5kuXwbs8CYhC5OQBvuFSTfk/8Ecu4lZa/PfPKs4IXaq+jzjqfk6/QvC
 jjfJ/4TIbyHLR48tbjoiJtjMQlsZOZ+R9o6Ic0Qw49GlqbIdZ15KzZVnuLRxJWby
 4S2hQnBWkMfc/RYXfliUs6TsXd55qyd88La6PHeY/BJRxQoaBvsPWLEd6Pk6RNWd
 RfpoEHCdUfnFBQMvO5C50/Dp6iKXJ4xqh9qrPg0LVlTtGbPL8gdDfK5QEElhEiuw
 vM9dzfNXFvE4Pavx32WSm7ql2LD9Qf8ZdLPxMlvjNGW14+oQexHmDI6sdU5iVYxb
 toct5jF7MwEy6GQQcwmp84V8y5FU+MyEtT+w9OKLpay/9Bqcq0I/Mv6LJWH10IDC
 bpfkaUm10C1aF2/vP6BB48NGUUElZIxXg9VapzX+AuRs6kN7LLOmM38G371HPEbV
 wcsRCU1znxo1Yjehen6oI9I2AQ4NuMcHplK2FiD0I1AzlRQ/BM6TeHejy84SJMgf
 QEQkjwh8DAClzcKJFlt9uoIglCLLjY/WDVSLgvvhv+/kQIXrLV7zCAhGR2CE27ci
 XQsmruJYlPt0A0xkOh//
 =uyan
 -----END PGP SIGNATURE-----

Merge tag 'edac_fixes_for_3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp

Pull two small EDAC fixes from Borislav Petkov.

* tag 'edac_fixes_for_3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
  EDAC: Don't give write permission to read-only files
  EDAC, mc_sysfs.c: Fix string array pointer types
2013-05-09 10:11:08 -07:00
Srivatsa S. Bhat
c8c64d165c EDAC: Don't give write permission to read-only files
I get the following warning on boot:

------------[ cut here ]------------
WARNING: at drivers/base/core.c:575 device_create_file+0x9a/0xa0()
Hardware name:  -[8737R2A]-
Write permission without 'store'
...
</snip>

Drilling down, this is related to dynamic channel ce_count attribute
files sporting a S_IWUSR mode without a ->store() function. Looking
around, it appears that they aren't supposed to have a ->store()
function. So remove the bogus write permission to get rid of the
warning.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Mauro Carvalho Chehab <mchehab@redhat.com>
Cc: <stable@vger.kernel.org> # 3.[89]
[ shorten commit message ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-05-09 12:40:45 +02:00
Linus Torvalds
e2823299cd Merge branch 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac
Pull edac fixes from Mauro Carvalho Chehab:
 "Two edac fixes:

   - i7300_edac currently reports a wrong number of DIMMs when the
     memory controller is in single channel mode

   - on some Sandy Bridge machines, the EDAC driver bails out as one of
     the PCI IDs used by the driver is hidden by BIOS.  As the driver
     uses it only to detect the type of memory, make it optional at the
     driver"

* 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac:
  edac: sb_edac.c should not require prescence of IMC_DDRIO device
  i7300_edac: Fix memory detection in single mode
2013-04-30 10:00:49 -07:00
Luck, Tony
de4772c621 edac: sb_edac.c should not require prescence of IMC_DDRIO device
The Sandy Bridge EDAC driver uses a register in the IMC_DDRIO CSR
space to determine the type of DIMMs (registered or unregistered).
But this device does not exist on some single socket Sandy Bridge
servers.  While the type of DIMMs is nice to know, it is not essential
for this driver's other functions. So it seems harsh to have it
refuse to load at all when it cannot find this device.

Make the check for this device be optional. If it isn't present
just report the memory type as "MEM_UNKNOWN".

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2013-04-29 10:32:40 -03:00
Mauro Carvalho Chehab
33ad41263d i7300_edac: Fix memory detection in single mode
When the machine is on single mode, only branch 0 channel 0
is valid. However, the code is not honouring it:

[ 1952.639341] EDAC DEBUG: i7300_get_mc_regs: Memory controller operating on single mode
...
[ 1952.639351] EDAC DEBUG: i7300_init_csrows: 		AMB-present CH0 = 0x1:
[ 1952.639353] EDAC DEBUG: i7300_init_csrows: 		AMB-present CH1 = 0x0:
[ 1952.639355] EDAC DEBUG: i7300_init_csrows: 		AMB-present CH2 = 0x0:
[ 1952.639358] EDAC DEBUG: i7300_init_csrows: 		AMB-present CH3 = 0x0:
...
[ 1952.639360] EDAC DEBUG: decode_mtr: 	MTR0 CH0: DIMMs are Present (mtr)
[ 1952.639362] EDAC DEBUG: decode_mtr: 		WIDTH: x8
[ 1952.639363] EDAC DEBUG: decode_mtr: 		ELECTRICAL THROTTLING is enabled
[ 1952.639364] EDAC DEBUG: decode_mtr: 		NUMBANK: 4 bank(s)
[ 1952.639366] EDAC DEBUG: decode_mtr: 		NUMRANK: single
[ 1952.639367] EDAC DEBUG: decode_mtr: 		NUMROW: 16,384 - 14 rows
[ 1952.639368] EDAC DEBUG: decode_mtr: 		NUMCOL: 1,024 - 10 columns
[ 1952.639370] EDAC DEBUG: decode_mtr: 		SIZE: 512 MB
[ 1952.639371] EDAC DEBUG: decode_mtr: 		ECC code is 8-byte-over-32-byte SECDED+ code
[ 1952.639373] EDAC DEBUG: decode_mtr: 		Scrub algorithm for x8 is on enhanced mode
[ 1952.639374] EDAC DEBUG: decode_mtr: 	MTR0 CH1: DIMMs are Present (mtr)
[ 1952.639376] EDAC DEBUG: decode_mtr: 		WIDTH: x8
[ 1952.639377] EDAC DEBUG: decode_mtr: 		ELECTRICAL THROTTLING is enabled
[ 1952.639379] EDAC DEBUG: decode_mtr: 		NUMBANK: 4 bank(s)
[ 1952.639380] EDAC DEBUG: decode_mtr: 		NUMRANK: single
[ 1952.639381] EDAC DEBUG: decode_mtr: 		NUMROW: 16,384 - 14 rows
[ 1952.639383] EDAC DEBUG: decode_mtr: 		NUMCOL: 1,024 - 10 columns
[ 1952.639384] EDAC DEBUG: decode_mtr: 		SIZE: 512 MB
[ 1952.639385] EDAC DEBUG: decode_mtr: 		ECC code is 8-byte-over-32-byte SECDED+ code
[ 1952.639387] EDAC DEBUG: decode_mtr: 		Scrub algorithm for x8 is on enhanced mode
...
[ 1952.639449] EDAC DEBUG: print_dimm_size:               channel 0 | channel 1 | channel 2 | channel 3 |
[ 1952.639451] EDAC DEBUG: print_dimm_size: -------------------------------------------------------------
[ 1952.639453] EDAC DEBUG: print_dimm_size: csrow/SLOT 0   512 MB   |  512 MB   |    0 MB   |    0 MB   |
[ 1952.639456] EDAC DEBUG: print_dimm_size: csrow/SLOT 1     0 MB   |    0 MB   |    0 MB   |    0 MB   |
[ 1952.639458] EDAC DEBUG: print_dimm_size: csrow/SLOT 2     0 MB   |    0 MB   |    0 MB   |    0 MB   |
[ 1952.639460] EDAC DEBUG: print_dimm_size: csrow/SLOT 3     0 MB   |    0 MB   |    0 MB   |    0 MB   |
[ 1952.639462] EDAC DEBUG: print_dimm_size: csrow/SLOT 4     0 MB   |    0 MB   |    0 MB   |    0 MB   |
[ 1952.639464] EDAC DEBUG: print_dimm_size: csrow/SLOT 5     0 MB   |    0 MB   |    0 MB   |    0 MB   |
[ 1952.639466] EDAC DEBUG: print_dimm_size: csrow/SLOT 6     0 MB   |    0 MB   |    0 MB   |    0 MB   |
[ 1952.639468] EDAC DEBUG: print_dimm_size: csrow/SLOT 7     0 MB   |    0 MB   |    0 MB   |    0 MB   |
[ 1952.639470] EDAC DEBUG: print_dimm_size: -------------------------------------------------------------

Instead of detecting a single memory at channel 0, it is showing
twice the memory.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2013-04-29 10:32:39 -03:00
Aravind Gopalakrishnan
94c1acf2c8 amd64_edac: Add Family 16h support
Add code to handle DRAM ECC errors decoding for Fam16h.

Tested on Fam16h with ECC turned on using the mce_amd_inj facility and
works fine.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
[ Boris: cleanups and clarifications ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-04-19 12:46:50 +02:00
Borislav Petkov
8b7719e08a EDAC, mc_sysfs.c: Fix string array pointer types
Those should be const ptr to a const string, fix them.

Signed-off-by: Borislav Petkov <bp@suse.de>
2013-03-25 15:44:25 +01:00
Mauro Carvalho Chehab
9713faecff EDAC: Merge mci.mem_is_per_rank with mci.csbased
Both mci.mem_is_per_rank and mci.csbased denote the same thing: the
memory controller is csrows based. Merge both fields into one.

There's no need for the driver to actually fill it, as the core detects
it by checking if one of the layers has the csrows type as part of the
memory hierarchy:

	if (layers[i].type == EDAC_MC_LAYER_CHIP_SELECT)
			per_rank = true;

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-03-16 06:32:30 +01:00
Mauro Carvalho Chehab
1eef128254 amd64_edac: Correct DIMM sizes
We were filling the csrow size with a wrong value. 16a528ee39 ("EDAC:
Fix csrow size reported in sysfs") tried to address the issue. It fixed
the report with the old API but not with the new one. Correct it for the
new API too.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
[ make it a per-csrow accounting regardless of ->channel_count ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-03-16 06:32:02 +01:00
Stephen Hemminger
fbe2d3616c EDAC: Make sysfs functions static
Fixes lots of sparse warnings here.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-03-05 11:33:57 +01:00
Linus Torvalds
ad6c2c2eb3 Merge branch 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac
Pull EDAC fixes and ghes-edac from Mauro Carvalho Chehab:
 "For:

   - Some fixes at edac drivers (i7core_edac, sb_edac, i3200_edac);
   - error injection support for i5100, when EDAC debug is enabled;
   - fix edac when it is loaded builtin (early init for the subsystem);
   - a "Firmware First" EDAC driver, allowing ghes to report errors via
     EDAC (ghes-edac).

  With regards to ghes-edac, this fixes a longstanding BZ at Red Hat
  that happens with Nehalem and Sandy Bridge CPUs: when both GHES and
  i7core_edac or sb_edac are running, the error reports are
  unpredictable, as both BIOS and OS race to access the registers.  With
  ghes-edac, the EDAC core will refuse to register any other concurrent
  memory error driver.

  This patchset moves the ghes struct definitions to a separate header
  file (include/acpi/ghes.h) and adds 3 hooks at apei/ghes.c to
  register/unregister and to report errors via ghes-edac.  Those changes
  were acked by ghes driver maintainer (Huang)."

* 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac: (30 commits)
  i5100_edac: convert to use simple_open()
  ghes_edac: fix to use list_for_each_entry_safe() when delete list items
  ghes_edac: Fix RAS tracing
  ghes_edac: Make it compliant with UEFI spec 2.3.1
  ghes_edac: Improve driver's printk messages
  ghes_edac: Don't credit the same memory dimm twice
  ghes_edac: do a better job of filling EDAC DIMM info
  ghes_edac: add support for reporting errors via EDAC
  ghes_edac: Register at EDAC core the BIOS report
  ghes: add the needed hooks for EDAC error report
  ghes: move structures/enum to a header file
  edac: add support for error type "Info"
  edac: add support for raw error reports
  edac: reduce stack pressure by using a pre-allocated buffer
  edac: lock module owner to avoid error report conflicts
  edac: remove proc_name from mci structure
  edac: add a new memory layer type
  edac: initialize the core earlier
  edac: better report error conditions in debug mode
  i5100_edac: Remove two checkpatch warnings
  ...
2013-02-28 20:42:33 -08:00
Wei Yongjun
b0769891ba i5100_edac: convert to use simple_open()
This removes an open coded simple_open() function and
replaces file operations references to the function
with simple_open() instead.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2013-02-26 10:06:18 -03:00
Wei Yongjun
5dae92a718 ghes_edac: fix to use list_for_each_entry_safe() when delete list items
Since we will remove items off the list using list_del() we need
to use a safe version of the list_for_each_entry() macro aptly named
list_for_each_entry_safe().

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2013-02-26 10:05:35 -03:00
Mauro Carvalho Chehab
8ae8f50ad8 ghes_edac: Fix RAS tracing
With the current version of CPER, there's no way to associate an
error with the memory error. So, the error location in EDAC
layers is unused.

As CPER has its own idea about memory architectural layers, just
output whatever is there inside the driver's detail at the RAS
tracepoint.

The EDAC location keeps untouched, in the case that, in some future,
we could actually map the error into the dimm labels.

Now, the error message:

[   72.396625] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0
[   72.396627] {1}[Hardware Error]: APEI generic hardware error status
[   72.396628] {1}[Hardware Error]: severity: 2, corrected
[   72.396630] {1}[Hardware Error]: section: 0, severity: 2, corrected
[   72.396632] {1}[Hardware Error]: flags: 0x01
[   72.396634] {1}[Hardware Error]: primary
[   72.396635] {1}[Hardware Error]: section_type: memory error
[   72.396637] {1}[Hardware Error]: error_status: 0x0000000000000400
[   72.396638] {1}[Hardware Error]: node: 3
[   72.396639] {1}[Hardware Error]: card: 0
[   72.396640] {1}[Hardware Error]: module: 0
[   72.396641] {1}[Hardware Error]: device: 0
[   72.396643] {1}[Hardware Error]: error_type: 18, unknown
[   72.396666] EDAC MC0: 1 CE reserved error (18) on unknown label (node:3 card:0 module:0 page:0x0 offset:0x0 grain:0 syndrome:0x0 - status(0x0000000000000400): Storage error in DRAM memory)

Is properly represented on the trace event:

     kworker/0:2-584   [000] ....    72.396657: mc_event: 1 Corrected error: reserved error (18) on unknown label (mc:0 location👎-1:-1 address:0x00000000 grain:1 syndrome:0x00000000 APEI location: node:3 card:0 module:0 status(0x0000000000000400): Storage error in DRAM memory)

Tested on a 4 sockets E5-4650 Sandy Bridge machine.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2013-02-25 19:42:17 -03:00
Mauro Carvalho Chehab
689c9cd812 ghes_edac: Make it compliant with UEFI spec 2.3.1
The UEFI spec defines the memory error types ans the bits that
validate each field on the memory error record, at
Appendix N om items N.2.5 (Memory Error Section) and
N.2.11 (Error Status). Make the error description compliant with
it, only showing the valid fields.

The EDAC error log is now properly reporting the error:

[  281.556854] mce: [Hardware Error]: Machine check events logged
[  281.557042] {2}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0
[  281.557044] {2}[Hardware Error]: APEI generic hardware error status
[  281.557046] {2}[Hardware Error]: severity: 2, corrected
[  281.557048] {2}[Hardware Error]: section: 0, severity: 2, corrected
[  281.557050] {2}[Hardware Error]: flags: 0x01
[  281.557052] {2}[Hardware Error]: primary
[  281.557053] {2}[Hardware Error]: section_type: memory error
[  281.557055] {2}[Hardware Error]: error_status: 0x0000000000000400
[  281.557056] {2}[Hardware Error]: node: 3
[  281.557057] {2}[Hardware Error]: card: 0
[  281.557058] {2}[Hardware Error]: module: 1
[  281.557059] {2}[Hardware Error]: device: 0
[  281.557061] {2}[Hardware Error]: error_type: 18, unknown
[  281.557067] EDAC DEBUG: ghes_edac_report_mem_error: error validation_bits: 0x000040b9
[  281.557084] EDAC MC0: 1 CE reserved error (18) on unknown label (node:3 card:0 module:1 page:0x0 offset:0x0 grain:0 syndrome:0x0 - status(0x0000000000000400): Storage error in DRAM memory)

Tested on a 4 CPUs E5-4650 Sandy Bridge machine.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2013-02-25 19:42:16 -03:00
Mauro Carvalho Chehab
d2a6856614 ghes_edac: Improve driver's printk messages
Provide a better infrastructure for printk's inside the driver:
	- use edac_dbg() for debug messages;
	- standardize the usage of pr_info();
	- provide warning about the risk of relying on this
	  driver.

While here, changes the size of a fake memory to 1 page. This is
as good or as bad as 1000 pages, but it is easier for userspace to
detect, as I don't expect that any machine implementing GHES would
provide just 1 page available ;)

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

Conflicts:
	drivers/edac/ghes_edac.c
2013-02-25 19:42:15 -03:00
Mauro Carvalho Chehab
5ee726db52 ghes_edac: Don't credit the same memory dimm twice
On my tests on a 4xE5-4650 CPU's system, the GHES
EDAC driver is called twice. As the SMBIOS DMI enumeration
call will seek for the entire DIMM sockets in the system, on
this machine, equipped with 128 GB of RAM, the memory is
displayed twice:

          +-----------------------+
          |    mc0    |    mc1    |
----------+-----------------------+
memory45: |  8192 MB  |  8192 MB  |
memory44: |     0 MB  |     0 MB  |
----------+-----------------------+
memory43: |     0 MB  |     0 MB  |
memory42: |  8192 MB  |  8192 MB  |
----------+-----------------------+
memory41: |     0 MB  |     0 MB  |
memory40: |     0 MB  |     0 MB  |
----------+-----------------------+
memory39: |  8192 MB  |  8192 MB  |
memory38: |     0 MB  |     0 MB  |
----------+-----------------------+
memory37: |     0 MB  |     0 MB  |
memory36: |  8192 MB  |  8192 MB  |
----------+-----------------------+
memory35: |     0 MB  |     0 MB  |
memory34: |     0 MB  |     0 MB  |
----------+-----------------------+
memory33: |  8192 MB  |  8192 MB  |
memory32: |     0 MB  |     0 MB  |
----------+-----------------------+
memory31: |     0 MB  |     0 MB  |
memory30: |  8192 MB  |  8192 MB  |
----------+-----------------------+
memory29: |     0 MB  |     0 MB  |
memory28: |     0 MB  |     0 MB  |
----------+-----------------------+
memory27: |  8192 MB  |  8192 MB  |
memory26: |     0 MB  |     0 MB  |
----------+-----------------------+
memory25: |     0 MB  |     0 MB  |
memory24: |  8192 MB  |  8192 MB  |
----------+-----------------------+
memory23: |     0 MB  |     0 MB  |
memory22: |     0 MB  |     0 MB  |
----------+-----------------------+
memory21: |  8192 MB  |  8192 MB  |
memory20: |     0 MB  |     0 MB  |
----------+-----------------------+
memory19: |     0 MB  |     0 MB  |
memory18: |  8192 MB  |  8192 MB  |
----------+-----------------------+
memory17: |     0 MB  |     0 MB  |
memory16: |     0 MB  |     0 MB  |
----------+-----------------------+
memory15: |  8192 MB  |  8192 MB  |
memory14: |     0 MB  |     0 MB  |
----------+-----------------------+
memory13: |     0 MB  |     0 MB  |
memory12: |  8192 MB  |  8192 MB  |
----------+-----------------------+
memory11: |     0 MB  |     0 MB  |
memory10: |     0 MB  |     0 MB  |
----------+-----------------------+
memory9:  |  8192 MB  |  8192 MB  |
memory8:  |     0 MB  |     0 MB  |
----------+-----------------------+
memory7:  |     0 MB  |     0 MB  |
memory6:  |  8192 MB  |  8192 MB  |
----------+-----------------------+
memory5:  |     0 MB  |     0 MB  |
memory4:  |     0 MB  |     0 MB  |
----------+-----------------------+
memory3:  |  8192 MB  |  8192 MB  |
memory2:  |     0 MB  |     0 MB  |
----------+-----------------------+
memory1:  |     0 MB  |     0 MB  |
memory0:  |  8192 MB  |  8192 MB  |
----------+-----------------------+

Total sum of 256 GB.

As there's no reliable way to credit DIMMS to the right memory
controller, just put everything on memory controller 0 (with should
always exist).

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2013-02-25 19:42:14 -03:00
Mauro Carvalho Chehab
32fa1f53c2 ghes_edac: do a better job of filling EDAC DIMM info
Instead of just faking a random value for the DIMM data, get
the information that it is available via DMI table.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2013-02-25 19:42:13 -03:00
Mauro Carvalho Chehab
f04c62a703 ghes_edac: add support for reporting errors via EDAC
Now that the EDAC core is capable of just forward the errors via
the userspace API, add a report mechanism for the GHES errors.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2013-02-25 19:42:13 -03:00
Mauro Carvalho Chehab
77c5f5d2f2 ghes_edac: Register at EDAC core the BIOS report
Register GHES at EDAC MC core, in order to avoid other
drivers to also handle errors and mangle with error data.

The edac core will warrant that just one driver will be used,
so the first one to register (BIOS first) will be the one that
will be reporting the hardware errors.

For now, the EDAC driver does nothing but to register at the
EDAC core, preventing the hardware-driven mechanism to
interfere with GHES.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2013-02-25 19:42:12 -03:00
Linus Torvalds
06991c28f3 Driver core patches for 3.9-rc1
Here is the big driver core merge for 3.9-rc1
 
 There are two major series here, both of which touch lots of drivers all
 over the kernel, and will cause you some merge conflicts:
   - add a new function called devm_ioremap_resource() to properly be
     able to check return values.
   - remove CONFIG_EXPERIMENTAL
 
 If you need me to provide a merged tree to handle these resolutions,
 please let me know.
 
 Other than those patches, there's not much here, some minor fixes and
 updates.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iEYEABECAAYFAlEmV0cACgkQMUfUDdst+yncCQCfbmnQZju7kzWXk6PjdFuKspT9
 weAAoMCzcAtEzzc4LXuUxxG/sXBVBCjW
 =yWAQ
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core patches from Greg Kroah-Hartman:
 "Here is the big driver core merge for 3.9-rc1

  There are two major series here, both of which touch lots of drivers
  all over the kernel, and will cause you some merge conflicts:

   - add a new function called devm_ioremap_resource() to properly be
     able to check return values.

   - remove CONFIG_EXPERIMENTAL

  Other than those patches, there's not much here, some minor fixes and
  updates"

Fix up trivial conflicts

* tag 'driver-core-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (221 commits)
  base: memory: fix soft/hard_offline_page permissions
  drivercore: Fix ordering between deferred_probe and exiting initcalls
  backlight: fix class_find_device() arguments
  TTY: mark tty_get_device call with the proper const values
  driver-core: constify data for class_find_device()
  firmware: Ignore abort check when no user-helper is used
  firmware: Reduce ifdef CONFIG_FW_LOADER_USER_HELPER
  firmware: Make user-mode helper optional
  firmware: Refactoring for splitting user-mode helper code
  Driver core: treat unregistered bus_types as having no devices
  watchdog: Convert to devm_ioremap_resource()
  thermal: Convert to devm_ioremap_resource()
  spi: Convert to devm_ioremap_resource()
  power: Convert to devm_ioremap_resource()
  mtd: Convert to devm_ioremap_resource()
  mmc: Convert to devm_ioremap_resource()
  mfd: Convert to devm_ioremap_resource()
  media: Convert to devm_ioremap_resource()
  iommu: Convert to devm_ioremap_resource()
  drm: Convert to devm_ioremap_resource()
  ...
2013-02-21 12:05:51 -08:00
Mauro Carvalho Chehab
e7e248304c edac: add support for raw error reports
That allows APEI GHES driver to report errors directly, using
the EDAC error report API.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2013-02-21 14:16:03 -03:00
Mauro Carvalho Chehab
c7ef764554 edac: reduce stack pressure by using a pre-allocated buffer
The number of variables at the stack is too big.
Reduces the stack usage by using a pre-allocated error
buffer.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2013-02-21 13:48:45 -03:00
Mauro Carvalho Chehab
80cc7d87d5 edac: lock module owner to avoid error report conflicts
APEI GHES and i7core_edac/sb_edac currently can be loaded at
the same time, but those are Highlander modules:
	"There can be only one".

There are two reasons for that:

1) Each driver assumes that it is the only one registering at
   the EDAC core, as it is driver's responsibility to number
   the memory controllers, and all of them start from 0;

2) If BIOS is handling the memory errors, the OS can't also be
   doing it, as one will mangle with the other.

So, we need to add an module owner's lock at the EDAC core,
in order to avoid having two different modules handling memory
errors at the same time. The best way for doing this lock seems
to use the driver's name, as this is unique, and won't require
changes on every driver.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2013-02-21 11:06:38 -03:00
Mauro Carvalho Chehab
c66b5a79a9 edac: add a new memory layer type
There are some cases where the memory controller layout is
completely hidden. This is the case of firmware-driven error
code, like the one provided by GHES. Add a new layer to be
used on such memory error report mechanisms.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2013-02-21 11:06:37 -03:00
Mauro Carvalho Chehab
4ab19b06ac edac: initialize the core earlier
In order for it to work with it builtin, the EDAC core should
be initialized earlier, otherwise the ghes_edac driver initializes
before edac_mc_sysfs_init() being called:

...
[    4.998373] EDAC MC0: Giving out device to 'ghes_edac.c' 'ghes_edac': DEV ghes
...
[    4.998373] EDAC MC1: Giving out device to 'ghes_edac.c' 'ghes_edac': DEV ghes
[    6.519495] EDAC MC: Ver: 3.0.0
[    6.523749] EDAC DEBUG: edac_mc_sysfs_init: device mc created

The net result is that no EDAC sysfs nodes will appear.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2013-02-21 11:06:36 -03:00
Mauro Carvalho Chehab
3d958823e2 edac: better report error conditions in debug mode
It is hard to find what's wrong without a proper error
report. Improve it, in debug mode.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2013-02-21 11:06:35 -03:00
Mauro Carvalho Chehab
59b9796d1e i5100_edac: Remove two checkpatch warnings
The last changeset introduced a few checkpatch warnings:

WARNING: debugfs_remove_recursive(NULL) is safe this check is probably not required
261: FILE: drivers/edac/i5100_edac.c:1207:
+       if (priv->debugfs)
+               debugfs_remove_recursive(priv->debugfs);

WARNING: debugfs_remove(NULL) is safe this check is probably not required
290: FILE: drivers/edac/i5100_edac.c:1250:
+       if (i5100_debugfs)
+               debugfs_remove(i5100_debugfs);

Get rid of them.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2013-02-21 11:06:34 -03:00
Niklas Söderlund
9cbc6d38f2 i5100_edac: connect fault injection to debugfs node
Create a debugfs direcotry i5100_edac/mcX for each memory controller and
add nodes to control how fault injection is preformed.

After configuring an injection using inject_channel, inject_deviceptr1,
inject_deviceptr2, inject_eccmask1, inject_eccmask2 and inject_hlinesel
trigger the injection by writing anything to inject_enable.

Example of a CE injection:

echo 0 > /sys/kernel/debug/i5100_edac/mc0/inject_channel
echo 1 > /sys/kernel/debug/i5100_edac/mc0/inject_hlinesel
echo 61440 > /sys/kernel/debug/i5100_edac/mc0/inject_eccmask1
echo 1 > /sys/kernel/debug/i5100_edac/mc0/inject_enable

Example of UE injection:

echo 0 > /sys/kernel/debug/i5100_edac/mc0/inject_channel
echo 2 > /sys/kernel/debug/i5100_edac/mc0/inject_hlinesel
echo 65535 > /sys/kernel/debug/i5100_edac/mc0/inject_eccmask1
echo 65535 > /sys/kernel/debug/i5100_edac/mc0/inject_eccmask2
echo 17 > /sys/kernel/debug/i5100_edac/mc0/inject_deviceptr1
echo 0 > /sys/kernel/debug/i5100_edac/mc0/inject_deviceptr2
echo 1 > /sys/kernel/debug/i5100_edac/mc0/inject_enable

Sometimes it is needed to enable the injection more then once (echo to
the inject_enable node) for the injection to happen, I am not sure why.

Signed-off-by: Niklas Söderlund <niklas.soderlund@ericsson.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2013-02-21 11:06:34 -03:00
Niklas Söderlund
53ceafd6a2 i5100_edac: add fault injection code
Add fault injection based on information datasheet for i5100, see 1. In
addition to the i5100 datasheet some missing information on injection
functions where found through experimentation and the i7300 datasheet,
see 2.

[1] Intel 5100 Memory Controller Hub Chipset
    Doc.Nr: 318378
    http://www.intel.com/content/dam/doc/datasheet/5100-
    memory-controller-hub-chipset-datasheet.pdf

[2] Intel 7300 Chipset MemoryController Hub (MCH)
    Doc.Nr: 318082
	http://www.intel.com/assets/pdf/datasheet/318082.pdf

Signed-off-by: Niklas Söderlund <niklas.soderlund@ericsson.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2013-02-21 11:06:33 -03:00
Niklas Söderlund
52608ba205 i5100_edac: probe for device 19 function 0
Probe and store the device handle for the device 19 function 0 during
driver initialization. The device is used during fault injection.

Signed-off-by: Niklas Söderlund <niklas.soderlund@ericsson.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2013-02-21 11:06:32 -03:00
Mauro Carvalho Chehab
e7100478fa edac: only create sdram_scrub_rate where supported
Currently, sdram_scrub_rate sysfs node is created even if the device
doesn't support get/set the scub rate. Change the logic to only
create this device node when the operation is supported.

Reported-by: Felipe Balbi <balbi@ti.com>
Acked-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2013-02-21 11:06:31 -03:00
Mauro Carvalho Chehab
61734e1835 i3200_edac: Fix the logic that detects filled memories
After running a series of tests on an HP DL320, filled with different
memory sizes, it was noticed that, when filled with just one DIMM
on such hardware, the driver wrongly detects twice the memory, and
thinks that both channels 0 and 1 are filled.

It seems to be partially caused by the BIOS and partially by the driver.

The i3200_edac current logic would be working fine if the BIOS were
disabling the unused second channel when just one DIMM is connected,
in order to do power-saving, as recommended on this chipset's datasheet.

However, the BIOS on this particular machine doesn't do it:

[   16.741421] EDAC DEBUG: how_many_channels: In dual channel mode
[   16.741424] EDAC DEBUG: how_many_channels: 2 DIMMS per channel enabled

So, the driver were assuming that 2 channels are enabled (well, they are,
but the second is unused).

Combined with that, I found two issues at the logic that creates the
EDAC data, that were failing when the two channels are not equally
filled (AFAICT, that happens only when just 1 DIMM is plugged).

The first one is that a 0 at DRB means that nothing is filled. The
driver's logic, however, do some calculation with that.

The second one is that the logic that fills the DIMM data currently
assumes that both channels are equally filled.

I tested the system already with the current configuration and my
patch and it is now working fine. So, for a 2R single DIMM 2Gb memory
at dimm slot 01 (channel 0), it is now displaying:

[   16.741406] EDAC DEBUG: i3200_get_drbs: drb[0][0] = 16, drb[1][0] = 0
[   16.741410] EDAC DEBUG: i3200_get_drbs: drb[0][1] = 32, drb[1][1] = 0
[   16.741413] EDAC DEBUG: i3200_get_drbs: drb[0][2] = 32, drb[1][2] = 0
[   16.741416] EDAC DEBUG: i3200_get_drbs: drb[0][3] = 32, drb[1][3] = 0
...
[   16.741896] EDAC DEBUG: i3200_probe1: csrow 0, channel 0, size = 1024 Mb
[   16.741899] EDAC DEBUG: i3200_probe1: csrow 1, channel 0, size = 1024 Mb

and the corresponding sysfs nodes are now properly filled.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2013-02-21 11:06:31 -03:00
Mauro Carvalho Chehab
5f466cb025 i3200_edac: Add more debug to the driver
Currently, it is not possible to know, when debug is enabled,
if the driver is using 2 DIMMS per channel mode or not. It is
not possible to know the values of the drbs registers, used
to identify the memory rank sizes.

Add debug for both, as it helps to track issues on the driver.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2013-02-21 11:06:22 -03:00
Linus Torvalds
55529fa576 EDAC updates for 3.9
Only one: AMD F16h MCE decoding enablement from Jacob Shin.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJRI2t7AAoJEBLB8Bhh3lVKmLAP/RbYNVVvLZZxXqB6heCkX7D3
 Avhi4GtuHqjY+79nqBoJEH9HjgzBzL6ffsJGPF3gtXrWybi6nzZkqog5QGRQFfGS
 gIoldbfP9MMF/RrMHbUF5x9Ha9EcdudDgYE3eqCq6KUfOMlUrBRECaU5YRrB+FKR
 qsuQNmF6J3uMPYwO9oGhTpA/vPwwapFaKgm+KND8q5h5JcpJrRpooKNQc1DheW+x
 nfYFpmSBW8eamVwV5DTjqKVhKeG3gB/3i6uSTpLvh4i0ZmV2vQ+drwC3aU0Eh0Q4
 wnslEpQENjODsvbZH6b0j2WTDgBGKEpALRkMUDTnnqvYrR96cV02MWUfp6cbKYCv
 +d/iPt3hEBCyDTDzfP66N+1b+Wqls4Dx8lhhqLdPhsIpY2Qu8OQ8/mjyIIs12qK5
 g9w3lsfy3shWmdsK/Ehd0IhNt1bzNV+63CINA2isC2QAp0Eqecj3jKrXor+MTPu1
 uRY3wM+8CrdeFNNhhhsMQYIb4+DFGLgMwY5mV6z8ceFJAjxvyUxjObnSMopjBKog
 9+JcyEHCADbpHsRup/zRIoGtSs+44sQ6l8l7pq6d4K4UfkG7Biq9lcxThID7rImn
 Rl6MOJVmJuM5n9JVIU4/bmhjfuTcI+gdshexDEH2HnqaXxd7ceIbnrFWFaBZEO5t
 t3WasBuIb30g1XCQmNT6
 =LbcU
 -----END PGP SIGNATURE-----

Merge tag 'edac_3.9' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp

Pull EDAC updates from Borislav Petkov:
 "Mostly AMD's side of EDAC.  It is basically a new family enablement
  stuff: AMD F16h MCE decoding enablement from Jacob Shin.  The rest is
  trivial cleanups."

* tag 'edac_3.9' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
  mpc85xx_edac: Fix typo
  EDAC, MCE, AMD: Remove unneeded exports
  EDAC, MCE, AMD: Add MCE decoding support for Family 16h
  EDAC, MCE, AMD: Make MC2 decoding per-family
  amd64_edac: Remove dead code
2013-02-20 11:28:50 -08:00
Mauro Carvalho Chehab
1339730e73 Linux 3.8-rc7
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.13 (GNU/Linux)
 
 iQEcBAABAgAGBQJRFWw4AAoJEHm+PkMAQRiGnTAH/jBHA2umNc3lc7VkUpusve4q
 GGIlNzYh6iuvIGwKQVj9YPsl37qtQnkDUmY8f8WxZjfxiIDw3TkRKDgNLJaM3Jy5
 E426/FGlRx/Iia5+4tuBeoVYMoIPnndgW5lEAMRK1SvhTByhIYAXsaM0UwPBetb+
 Z5NMdH1f1HVF7RCCmHAkzEk9z78UpdeCzI0t0XuasP2hp2ARAcE1okdO7fNaLiyo
 EmenGhRvy9bAsbRwV0rCdl0rQiZXEYM353DWS7n6q4fyVm8MXFwloUxnWCJTzOIL
 ZLJaz18adFj7Ip/X6ksnMQiQU2Q3B7aThs5ycv0QGxxL2rDFveYRRQ5ICrXOy3M=
 =jjBc
 -----END PGP SIGNATURE-----

Merge tag 'v3.8-rc7' into next

Linux 3.8-rc7

* tag 'v3.8-rc7': (12052 commits)
  Linux 3.8-rc7
  net: sctp: sctp_endpoint_free: zero out secret key data
  net: sctp: sctp_setsockopt_auth_key: use kzfree instead of kfree
  atm/iphase: rename fregt_t -> ffreg_t
  ARM: 7641/1: memory: fix broken mmap by ensuring TASK_UNMAPPED_BASE is aligned
  ARM: DMA mapping: fix bad atomic test
  ARM: realview: ensure that we have sufficient IRQs available
  ARM: GIC: fix GIC cpumask initialization
  net: usb: fix regression from FLAG_NOARP code
  l2tp: dont play with skb->truesize
  net: sctp: sctp_auth_key_put: use kzfree instead of kfree
  netback: correct netbk_tx_err to handle wrap around.
  xen/netback: free already allocated memory on failure in xen_netbk_get_requests
  xen/netback: don't leak pages on failure in xen_netbk_tx_check_gop.
  xen/netback: shutdown the ring if it contains garbage.
  drm/ttm: fix fence locking in ttm_buffer_object_transfer, 2nd try
  virtio_console: Don't access uninitialized data.
  net: qmi_wwan: add more Huawei devices, including E320
  net: cdc_ncm: add another Huawei vendor specific device
  ipv6/ip6_gre: fix error case handling in ip6gre_tunnel_xmit()
  ...
2013-02-20 15:45:52 -03:00
Linus Torvalds
f98982ce80 Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 platform changes from Ingo Molnar:

 - Support for the Technologic Systems TS-5500 platform, by Vivien
   Didelot

 - Improved NUMA support on AMD systems:

   Add support for federated systems where multiple memory controllers
   can exist and see each other over multiple PCI domains.  This
   basically means that AMD node ids can be more than 8 now and the code
   handling this is taught to incorporate PCI domain into those IDs.

 - Support for the Goldfish virtual Android emulator, by Jun Nakajima,
   Intel, Google, et al.

 - Misc fixlets.

* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Add TS-5500 platform support
  x86/srat: Simplify memory affinity init error handling
  x86/apb/timer: Remove unnecessary "if"
  goldfish: platform device for x86
  amd64_edac: Fix type usage in NB IDs and memory ranges
  amd64_edac: Fix PCI function lookup
  x86, AMD, NB: Use u16 for northbridge IDs in amd_get_nb_id
  x86, AMD, NB: Add multi-domain support
2013-02-19 20:11:07 -08:00
Baruch Siach
e7d2c215e5 mpc85xx_edac: Fix typo
Correct typos.

Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Cc: Dave Jiang <djiang@mvista.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-02-10 13:28:41 +01:00
Joe Perches
d3d09e1820 EDAC: Fix kcalloc argument order
First number, then size.

Signed-off-by: Joe Perches <joe@perches.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-01-30 11:38:25 +01:00