Age | Commit message (Collapse) | Author |
|
|
|
The current rd/wrmsr_on_cpus helpers assume that the supplied
cpumasks are contiguous. However, there are machines out there
like some K8 multinode Opterons which have a non-contiguous core
enumeration on each node (e.g. cores 0,2 on node 0 instead of 0,1), see
http://www.gossamer-threads.com/lists/linux/kernel/1160268.
This patch fixes out-of-bounds writes (see URL above) by adding per-CPU
msr structs which are used on the respective cores.
Additionally, two helpers, msrs_{alloc,free}, are provided for use by
the callers of the MSR accessors.
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Mauro Carvalho Chehab <mchehab@redhat.com>
Cc: Aristeu Rozanski <aris@redhat.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <20091211171440.GD31998@aftab>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
|
|
This was long overdue ...
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
|
drivers/edac/amd64_edac.c: In function 'amd64_edac_init':
drivers/edac/amd64_edac.c:2840: warning: 'ret' may be used uninitialized in this function
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
|
The routine does the reverse mapping of the error address of a CECC back
to the node id, DRAM controller and chip select of the DIMM which caused
the error. We should lookup the channel using the syndromes _only_ when
the DCTs are ganged so fix that.
Also, add an early exit when there's an error while scanning for the
csrow thus decreasing indentation levels for better readability.
Finally, fixup comments.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
|
Instead of using the whole syndrome tables for channel decoding, use a
set of eigenvectors with which the tables can be generated to search for
the syndrome in error. The algorithm operates independently of symbol
size and can be used for both x4 and x8 syndromes.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
|
The .probe_valid_hardware low_ops member checked whether the DCTs are in
DDR3 mode and bailed out if so. Now that all the needed changes for DDR3
support is in place, remove it.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
|
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
|
Instead of using deeply-nested conditionals for dumping the DIMM type in
debug mode, add a strings array of the supported DIMM types.
This is useful in cases where an edac driver supports multiple DRAM
types and is only defined in debug builds.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
|
F10h revD start with model number 8.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
|
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
|
SystemAddress -> sys_addr
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
|
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
|
Add cs mode to cs size mapping tables for DDR2 and DDR3 and F10
and all K8 flavors and remove klugdy table of pseudo values. Add a
low_ops->dbam_to_cs member which is family-specific and replaces
low_ops->dbam_map_to_pages since the pages calculation is a one liner
now.
Further cleanups, while at it:
- shorten family name defines
- align amd64_family_types struct members
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
|
Do not read DCLR[01] again since this is done in
amd64_read_mc_registers() earlier. There can be more than two physical
DIMMs present so clamp the channels value to max 2. Also, do not report
DCT data width - it is also done earlier.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
|
Extend f10_debug_display_dimm_sizes to dump the logical DIMMs
configuration on K8 revF too. Remove the ganged arg since we print the
DCT operating mode (ganged vs unganged) earlier.
Also, DCT csrow configuration is relevant therefore dump it as
KERN_DEBUG instead of only on debug builds. Remove misleading DIMM
output since there's no reliable way of mapping of chip selects to
actual physical DIMMs.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
|
Clarify bitfields description, add PCI config function/offset names to
registers for easy reference, simplify code layout, remove unneeded
info.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
|
Carve out the register-specific debug statements into a separate
function, clarify meanings of the single bitfields in the register,
remove irrelevant output and macros.
There should be no functionality change resulting from this patch.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
|
Add a pci config read wrapper for signaling pci config space access
errors instead of them being visible only on a debug build. This is
important on amd64_edac since it uses all those pci config register
values to access the DRAM/DIMM configuration of the nodes.
In addition, the wrapper makes a _lot_ (look at the diffstat!) of
error handling code superfluous and improves much of the overall code
readability by removing error handling details out of the way.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
|
Unify almost identical code into one function and remove NUMA-specific
usage (specifically cpumask_of_node()) in favor of generic topology
methods.
Remove unused defines, while at it.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
|
cpumask_t -> struct cpumask, and don't put one on the stack. (Note: this
is actually on the stack unless CONFIG_CPUMASK_OFFSTACK=y).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
|
Do not shift the TOP_MEM and TOP_MEM2 values by 23 but rather save the
whole 64-bit value read from the MSR. Although the TOP_MEM/TOP_MEM2 bits
are only a subset of the 64bit register, the values are correct since
the remaining bits are Read-As-Zero and no shifting is needed.
Also, cleanup DRAM base/limit debug output.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
|
Make debug info formulations about the DRAM and DCT configuration of the
machine more human readable.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
|
Merge reason: It's ready for v2.6.33.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Fix build warning (missing header file) and
build error when CONFIG_SMP=n.
drivers/edac/i7core_edac.c:860: error: implicit declaration of function 'msleep'
drivers/edac/i7core_edac.c:1700: error: 'struct cpuinfo_x86' has no member named 'phys_proc_id'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
Fix the shifts up
Signed-off-by: Alan Cox <alan@linux.intel.com>
Acked-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
Shift error type bits properly.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
|
In amd64_edac_init(void) in amd64_edac.c, cache_k8_northbridges() is
called before pci_register_driver. If it fails, should exit with err
directly.
Signed-off-by: Li Hong <lihong.hi@gmail.com>
Acked-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
|
Allow csrows to properly initialize when the topology only has active
channels on 2 and 3. This new check allows proper detection and
initialization in this topology. Only checking the first mrt that
represented channels 0 and 1 is not sufficient.
I also fixed up the related debug information path. I can submit as a 2nd
patch if needed.
Signed-off-by: Keith Mannthey <kmannth@us.ibm.com>
Acked-by: Aristeu Rozanski <aris@ruivo.org>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
When building without CONFIG_PCI the edac_pci_idx variable is unused,
causing a build-time warning. Wrap the variable in #ifdef CONFIG_PCI,
just like the rest of the PCI support.
Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The i5400 EDAC driver has several bugs with chip-select row computation
which most likely lead to bugs in detailed error reporting. Attempts to
contact the authors have gone mostly unanswered so I am presenting my diff
here. I do not subscribe to lkml and would appreciate being kept in the
cc.
The most egregious problem was miscalculating the addresses of MTR
registers after register 0 by assuming they are 32bit rather than 16.
This caused the driver to miss half of the memories. Most motherboards
tend to have only 8 dimm slots and not 16, so this may not have been
noticed before.
Further, the row calculations multiplied the number of dimms several
times, ultimately ending up with a maximum row of 32. The chipset only
supports 4 dimms in each of 4 channels, so csrow could not be higher than
4 unless you use a row per-rank with dual-rank dimms. I opted to
eliminate this behavior as it is confusing to the user and the error
reporting works by slot and not rank. This gives a much clearer view of
memory by slot and channel in /sys.
Signed-off-by: Jeff Roberson <jroberson@jroberson.net>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This is a proper fix as a follow-up to 66216a7 and 916d11b.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
|
Currently, only one PCI set of tables is allowed. This prevents using
the driver for other devices like Lynnfield, with have a different
set of PCI ID's.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
Fix ringbuffer store logic.
While here, add a few comments to the code and remove the undesired
printk that could otherwise be called during NMI time.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
Instead of accepting just "any", accept also "any\n"
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
Current code only works when there's just one memory
controller, since we need one kobj for each instance.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
Instead of displaying 3 values at the same var, break it into 3
different sysfs nodes:
/sys/devices/system/edac/mc/mc0/all_channel_counts/udimm0
/sys/devices/system/edac/mc/mc0/all_channel_counts/udimm1
/sys/devices/system/edac/mc/mc0/all_channel_counts/udimm2
For registered dimms, however, the error counters are already being
displayed at:
/sys/devices/system/edac/mc/mc0/csrow*/ce_count
So, there's no need to add any extra sysfs nodes.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
Currently, all sysfs nodes are stored at /sys/.*/mc. (regex)
However, sometimes it is needed to create attribute groups.
This patch extends edac_core to allow groups creation.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
The old remove module stragegy didn't work on devices with multiple
cores, since only one PCI device is used to open all mc's, due to
Nehalem nature.
Also, it were based at pdev value. However, this doesn't point to the
pci device used at mci->dev.
So, instead, it unregisters all devices at once, deleting them from the
device list.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
The number of sockets is now fully dynamic. Get rid of this obsolete
var.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|