summaryrefslogtreecommitdiff
path: root/drivers/edac
AgeCommit message (Collapse)Author
2009-12-16Merge remote branch 'i7core_edac/linux_next'Stephen Rothwell
2009-12-11x86, msr: Add support for non-contiguous cpumasksBorislav Petkov
The current rd/wrmsr_on_cpus helpers assume that the supplied cpumasks are contiguous. However, there are machines out there like some K8 multinode Opterons which have a non-contiguous core enumeration on each node (e.g. cores 0,2 on node 0 instead of 0,1), see http://www.gossamer-threads.com/lists/linux/kernel/1160268. This patch fixes out-of-bounds writes (see URL above) by adding per-CPU msr structs which are used on the respective cores. Additionally, two helpers, msrs_{alloc,free}, are provided for use by the callers of the MSR accessors. Cc: H. Peter Anvin <hpa@zytor.com> Cc: Mauro Carvalho Chehab <mchehab@redhat.com> Cc: Aristeu Rozanski <aris@redhat.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> LKML-Reference: <20091211171440.GD31998@aftab> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-12-08amd64_edac: bump driver versionBorislav Petkov
This was long overdue ... Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-08amd64_edac: fix use-uninitialised bugAndrew Morton
drivers/edac/amd64_edac.c: In function 'amd64_edac_init': drivers/edac/amd64_edac.c:2840: warning: 'ret' may be used uninitialized in this function Cc: Doug Thompson <dougthompson@xmission.com> Cc: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-08amd64_edac: correct sys address to chip select mappingBorislav Petkov
The routine does the reverse mapping of the error address of a CECC back to the node id, DRAM controller and chip select of the DIMM which caused the error. We should lookup the channel using the syndromes _only_ when the DCTs are ganged so fix that. Also, add an early exit when there's an error while scanning for the csrow thus decreasing indentation levels for better readability. Finally, fixup comments. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-08amd64_edac: add a leaner syndrome decoding algorithmBorislav Petkov
Instead of using the whole syndrome tables for channel decoding, use a set of eigenvectors with which the tables can be generated to search for the syndrome in error. The algorithm operates independently of symbol size and can be used for both x4 and x8 syndromes. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-07amd64_edac: remove early hw support checkBorislav Petkov
The .probe_valid_hardware low_ops member checked whether the DCTs are in DDR3 mode and bailed out if so. Now that all the needed changes for DDR3 support is in place, remove it. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-07amd64_edac: detect DDR3 memory typeBorislav Petkov
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-07edac: add memory types strings for debuggingBorislav Petkov
Instead of using deeply-nested conditionals for dumping the DIMM type in debug mode, add a strings array of the supported DIMM types. This is useful in cases where an edac driver supports multiple DRAM types and is only defined in debug builds. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-07edac, mce: update AMD F10h revD checkBorislav Petkov
F10h revD start with model number 8. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-07amd64_edac: remove unneeded extract_error_address wrapperBorislav Petkov
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-07amd64_edac: rename StinkyIdentifierBorislav Petkov
SystemAddress -> sys_addr Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-07amd64_edac: remove superfluous dbg printkBorislav Petkov
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-07amd64_edac: enhance address to DRAM bank mappingBorislav Petkov
Add cs mode to cs size mapping tables for DDR2 and DDR3 and F10 and all K8 flavors and remove klugdy table of pseudo values. Add a low_ops->dbam_to_cs member which is family-specific and replaces low_ops->dbam_map_to_pages since the pages calculation is a one liner now. Further cleanups, while at it: - shorten family name defines - align amd64_family_types struct members Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-07amd64_edac: cleanup f10_early_channel_countBorislav Petkov
Do not read DCLR[01] again since this is done in amd64_read_mc_registers() earlier. There can be more than two physical DIMMs present so clamp the channels value to max 2. Also, do not report DCT data width - it is also done earlier. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-07amd64_edac: dump DIMM sizes on K8 tooBorislav Petkov
Extend f10_debug_display_dimm_sizes to dump the logical DIMMs configuration on K8 revF too. Remove the ganged arg since we print the DCT operating mode (ganged vs unganged) earlier. Also, DCT csrow configuration is relevant therefore dump it as KERN_DEBUG instead of only on debug builds. Remove misleading DIMM output since there's no reliable way of mapping of chip selects to actual physical DIMMs. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-07amd64_edac: cleanup rest of amd64_dump_misc_regsBorislav Petkov
Clarify bitfields description, add PCI config function/offset names to registers for easy reference, simplify code layout, remove unneeded info. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-07amd64_edac: cleanup DRAM cfg low debug outputBorislav Petkov
Carve out the register-specific debug statements into a separate function, clarify meanings of the single bitfields in the register, remove irrelevant output and macros. There should be no functionality change resulting from this patch. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-07amd64_edac: wrap-up pci config read error handlingBorislav Petkov
Add a pci config read wrapper for signaling pci config space access errors instead of them being visible only on a debug build. This is important on amd64_edac since it uses all those pci config register values to access the DRAM/DIMM configuration of the nodes. In addition, the wrapper makes a _lot_ (look at the diffstat!) of error handling code superfluous and improves much of the overall code readability by removing error handling details out of the way. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-07amd64_edac: unify MCGCTL ECC switchingBorislav Petkov
Unify almost identical code into one function and remove NUMA-specific usage (specifically cpumask_of_node()) in favor of generic topology methods. Remove unused defines, while at it. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-07cpumask: use modern cpumask style in drivers/edac/amd64_edac.cRusty Russell
cpumask_t -> struct cpumask, and don't put one on the stack. (Note: this is actually on the stack unless CONFIG_CPUMASK_OFFSTACK=y). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-07amd64_edac: make DRAM regions output more human-readableBorislav Petkov
Do not shift the TOP_MEM and TOP_MEM2 values by 23 but rather save the whole 64-bit value read from the MSR. Although the TOP_MEM/TOP_MEM2 bits are only a subset of the 64bit register, the values are correct since the remaining bits are Read-As-Zero and no shifting is needed. Also, cleanup DRAM base/limit debug output. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-07amd64_edac: clarify DRAM CTL debug reportingBorislav Petkov
Make debug info formulations about the DRAM and DCT configuration of the machine more human readable. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-03Merge branch 'perf/mce' into perf/coreIngo Molnar
Merge reason: It's ready for v2.6.33. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-08edac: fix i7core buildRandy Dunlap
Fix build warning (missing header file) and build error when CONFIG_SMP=n. drivers/edac/i7core_edac.c:860: error: implicit declaration of function 'msleep' drivers/edac/i7core_edac.c:1700: error: 'struct cpuinfo_x86' has no member named 'phys_proc_id' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2009-11-08edac: i7core_edac produces undefined behaviour on 32bitAlan Cox
Fix the shifts up Signed-off-by: Alan Cox <alan@linux.intel.com> Acked-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2009-11-04amd64_edac: fix CECCs reportingBorislav Petkov
Shift error type bits properly. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-11-04amd64_edac: fix a wrong goto clause in amd64_edac.cLi Hong
In amd64_edac_init(void) in amd64_edac.c, cache_k8_northbridges() is called before pci_register_driver. If it fails, should exit with err directly. Signed-off-by: Li Hong <lihong.hi@gmail.com> Acked-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-10-29edac: i5100 fix initialization codeKeith Mannthey
Allow csrows to properly initialize when the topology only has active channels on 2 and 3. This new check allows proper detection and initialization in this topology. Only checking the first mrt that represented channels 0 and 1 is not sufficient. I also fixed up the related debug information path. I can submit as a 2nd patch if needed. Signed-off-by: Keith Mannthey <kmannth@us.ibm.com> Acked-by: Aristeu Rozanski <aris@ruivo.org> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-29edac: i5400 fix missing CONFIG_PCI defineIra W. Snyder
When building without CONFIG_PCI the edac_pci_idx variable is unused, causing a build-time warning. Wrap the variable in #ifdef CONFIG_PCI, just like the rest of the PCI support. Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-29edac: i5400 fix csrow mappingJeff Roberson
The i5400 EDAC driver has several bugs with chip-select row computation which most likely lead to bugs in detailed error reporting. Attempts to contact the authors have gone mostly unanswered so I am presenting my diff here. I do not subscribe to lkml and would appreciate being kept in the cc. The most egregious problem was miscalculating the addresses of MTR registers after register 0 by assuming they are 32bit rather than 16. This caused the driver to miss half of the memories. Most motherboards tend to have only 8 dimm slots and not 16, so this may not have been noticed before. Further, the row calculations multiplied the number of dimms several times, ultimately ending up with a maximum row of 32. The chipset only supports 4 dimms in each of 4 channels, so csrow could not be higher than 4 unless you use a row per-rank with dual-rank dimms. I opted to eliminate this behavior as it is confusing to the user and the error reporting works by slot and not rank. This gives a much clearer view of memory by slot and channel in /sys. Signed-off-by: Jeff Roberson <jroberson@jroberson.net> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-16amd64_edac: fix DRAM base and limit extraction masks, v2Borislav Petkov
This is a proper fix as a follow-up to 66216a7 and 916d11b. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-10-14i7core_edac: Use a more generic approach for probing PCI devicesMauro Carvalho Chehab
Currently, only one PCI set of tables is allowed. This prevents using the driver for other devices like Lynnfield, with have a different set of PCI ID's. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2009-10-14i7core_edac: PCI device is called NONCORE, instead of NOCOREMauro Carvalho Chehab
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2009-10-13i7core_edac: Fix ringbuffer maxsizeMauro Carvalho Chehab
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2009-10-13i7core_edac: First store, then incrementMauro Carvalho Chehab
Fix ringbuffer store logic. While here, add a few comments to the code and remove the undesired printk that could otherwise be called during NMI time. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2009-10-13i7core_edac: Better parse "any" addrmaskMauro Carvalho Chehab
Instead of accepting just "any", accept also "any\n" Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2009-10-13i7core_edac: Use a lockless ringbufferMauro Carvalho Chehab
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2009-10-13edac: Create an unique instance for each kobjMauro Carvalho Chehab
Current code only works when there's just one memory controller, since we need one kobj for each instance. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2009-10-13i7core_edac: Convert UDIMM error counters into a proper sysfs groupMauro Carvalho Chehab
Instead of displaying 3 values at the same var, break it into 3 different sysfs nodes: /sys/devices/system/edac/mc/mc0/all_channel_counts/udimm0 /sys/devices/system/edac/mc/mc0/all_channel_counts/udimm1 /sys/devices/system/edac/mc/mc0/all_channel_counts/udimm2 For registered dimms, however, the error counters are already being displayed at: /sys/devices/system/edac/mc/mc0/csrow*/ce_count So, there's no need to add any extra sysfs nodes. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2009-10-13edac: Don't create csrow entries on instance groupsMauro Carvalho Chehab
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2009-10-13edac: store/show methods for device groups weren't workingMauro Carvalho Chehab
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2009-10-13i7core_edac: Add support for sysfs addrmatch groupMauro Carvalho Chehab
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2009-10-13edac_core: Allow the creation of sysfs groupsMauro Carvalho Chehab
Currently, all sysfs nodes are stored at /sys/.*/mc. (regex) However, sometimes it is needed to create attribute groups. This patch extends edac_core to allow groups creation. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2009-10-13i7core_edac: Avoid printing a warning when debug is disabledMauro Carvalho Chehab
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2009-10-13i7core_edac: We need to use list_for_each_entry_safe to avoid errorsMauro Carvalho Chehab
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2009-10-13i7core_edac: change remove module strategyMauro Carvalho Chehab
The old remove module stragegy didn't work on devices with multiple cores, since only one PCI device is used to open all mc's, due to Nehalem nature. Also, it were based at pdev value. However, this doesn't point to the pci device used at mci->dev. So, instead, it unregisters all devices at once, deleting them from the device list. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2009-10-13i7core_edac: remove static counter for max socketsMauro Carvalho Chehab
The number of sockets is now fully dynamic. Get rid of this obsolete var. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2009-10-13i7core_edac: at remove, don't remove all pci devices at onceMauro Carvalho Chehab
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2009-10-13i7core_edac: Fix a bug when printing error counts with RDIMMsMauro Carvalho Chehab
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>