
System Event Log Troubleshooting Guide for Intel® S5500/S3420 Series Server Boards: Memory Subsystem
Revision 1.0, Intel order number G74211-001
[7:5] – Indicates the Processor Socket to which the DDR3 DIMM having the ECC error is attached:
000b = Processor Socket 1
001b = Processor Socket 2
All other values are reserved.
[4:3] – Indicates the processor Memory Channel to which the failing DDR3 DIMM is attached:
00b = Channel A
01b = Channel B
10b = Channel C
11b is reserved.
[2:0] – Indicates the DIMM Socket on the channel to which the failing DDR3 DIMM is attached:
000b = DIMM Socket 1
001b = DIMM Socket 2
All other values are reserved.
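
The bit layout above can be decoded with a few shifts and masks. The following is a minimal sketch, not Intel tooling: the names `decode_dimm_location` and `DimmLocation` are hypothetical, and it assumes you have already extracted the single location byte described above from the hex SEL record. Reserved encodings are rejected rather than guessed at.

```python
from typing import NamedTuple


class DimmLocation(NamedTuple):
    """Decoded location of the failing DIMM (hypothetical helper type)."""
    processor_socket: int  # 1 or 2
    channel: str           # "A", "B" or "C"
    dimm_socket: int       # 1 or 2


# Bits [4:3]: 00b = Channel A, 01b = Channel B, 10b = Channel C, 11b reserved.
_CHANNELS = {0b00: "A", 0b01: "B", 0b10: "C"}


def decode_dimm_location(location_byte: int) -> DimmLocation:
    """Split the location byte into processor socket, channel and DIMM socket."""
    if not 0 <= location_byte <= 0xFF:
        raise ValueError("location byte must be in the range 0x00-0xFF")

    socket_bits = (location_byte >> 5) & 0b111   # bits [7:5]
    channel_bits = (location_byte >> 3) & 0b11   # bits [4:3]
    dimm_bits = location_byte & 0b111            # bits [2:0]

    # Only 000b and 001b are defined for the processor socket and DIMM socket.
    if socket_bits > 0b001:
        raise ValueError(f"reserved processor socket value: {socket_bits:03b}b")
    if channel_bits not in _CHANNELS:
        raise ValueError(f"reserved channel value: {channel_bits:02b}b")
    if dimm_bits > 0b001:
        raise ValueError(f"reserved DIMM socket value: {dimm_bits:03b}b")

    # The encodings are zero-based; the board labels are one-based.
    return DimmLocation(socket_bits + 1, _CHANNELS[channel_bits], dimm_bits + 1)


if __name__ == "__main__":
    # 0x29 = 001 01 001b
    print(decode_dimm_location(0x29))
```

For example, a byte of 0x29 (001 01 001b) decodes to Processor Socket 2, Channel B, DIMM Socket 2.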
Table 61: Correctable and Uncorrectable ECC Error Sensor Event Trigger Offset – Next Steps
An uncorrectable (multi-bit) ECC error has occurred. This is a fatal issue that will typically
lead to an OS crash (unless memory has been configured in a RAS mode). The system
will generate a CATERR# (catastrophic error) and an MCE (Machine Check Exception).
While the error may be due to a failing DRAM chip on the DIMM, it could also be caused by
incorrect seating or improper contact between the socket and the DIMM, or by bent pins in the
processor socket.
1. If needed, decode the DIMM location from the hex version of the SEL.
2. Verify the DIMM is seated properly.
3. Examine the gold fingers on the edge of the DIMM to verify the contacts are clean.
4. Inspect the processor socket this DIMM is connected to for bent pins, and if found, replace the board.
5. Consider replacing the DIMM as a preventative measure. For multiple occurrences, replace the DIMM.