Secondary Error Codes

From XenonLibrary
Revision as of 18:18, 5 January 2023 by Octal450

SMC Errors

These errors are generated by the System Management Controller.

0001

Hex | Name | Description | Type | Repair Guide
0x01 | ERROR_V_12P0 | ANA_V12P0_PWRGD negated unexpectedly | EC_FATAL | 0001

ANA_V12P0_PWRGD is driven high by the ANA (later HANA) as long as the V_12P0 rail is within tolerance. If V_12P0 ever drops out of tolerance, the signal is de-asserted, causing the SMC to enter EC_FATAL and the 0001 code to be displayed on the front panel.

0002

Hex | Name | Description | Type | Repair Guide
0x02 | ERROR_V_CPUCORE | VREG_CPU_PWRGD negated unexpectedly | EC_FATAL | 0002

VREG_CPU_PWRGD is driven high by the CPU power controller as long as the V_CPUCORE rail is within tolerance. If V_CPUCORE ever drops out of tolerance, the signal is de-asserted, causing the SMC to enter EC_FATAL and the 0002 code to be displayed on the front panel.

0003

Hex | Name | Description | Type | Repair Guide
0x03 | ERROR_V_GPUCORE | VREG_GPU_PWRGD negated unexpectedly | EC_FATAL | 0003

VREG_GPU_PWRGD is driven high by the GPU power controller as long as the V_GPUCORE rail is within tolerance. If V_GPUCORE ever drops out of tolerance, the signal is de-asserted, causing the SMC to enter EC_FATAL and the 0003 code to be displayed on the front panel.
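
The three power-good checks above share one pattern: a signal is held high while its rail is in tolerance, and a de-asserted signal sends the SMC to EC_FATAL with the matching code. The following is a minimal sketch of that pattern, not the SMC firmware itself. The function name, the dictionary input, the RUNNING state, and the order in which signals are checked are all assumptions made for illustration; only the signal names and codes come from the tables above.

```python
from enum import Enum


class SmcState(Enum):
    RUNNING = "RUNNING"    # hypothetical name for "no error"
    EC_FATAL = "EC_FATAL"  # error type from the tables above


# (signal name, error hex, front-panel code), copied from the tables above.
POWER_GOOD_SIGNALS = [
    ("ANA_V12P0_PWRGD", 0x01, "0001"),
    ("VREG_CPU_PWRGD", 0x02, "0002"),
    ("VREG_GPU_PWRGD", 0x03, "0003"),
]


def check_power_good(levels):
    """Return (state, displayed_code) for a snapshot of signal levels.

    levels maps a signal name to True (driven high, rail in tolerance)
    or False (de-asserted). The check order is an assumption.
    """
    for signal, _error_hex, code in POWER_GOOD_SIGNALS:
        if not levels[signal]:
            return SmcState.EC_FATAL, code
    return SmcState.RUNNING, None
```

For example, passing a snapshot in which only VREG_CPU_PWRGD is False yields EC_FATAL together with the code 0002.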

0010

Hex | Name | Description | Type | Repair Guide
0x04 | ERROR_NO_ANA | ANA/HANA is not responding to reads or writes | EC_FATAL | 0010

The SMC communicates with the ANA/HANA via the SMBus. If communication is lost, the SMC enters EC_FATAL and the 0010 code is displayed on the front panel.
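
Note that hex value 0x04 corresponds to the displayed code 0010, not 0004. Every Hex and code pair on this page is consistent with the displayed code being the hex value written as four base-4 digits. The sketch below assumes that relationship holds in general; the function name is hypothetical and is not taken from the SMC firmware.

```python
def displayed_code(error_hex: int) -> str:
    """Write an error value as four base-4 digits, most significant first.

    Assumes the front-panel code is the base-4 form of the Hex column,
    which matches every entry listed on this page.
    """
    if not 0 <= error_hex <= 0xFF:
        raise ValueError("four base-4 digits can only represent 0..255")
    digits = []
    value = error_hex
    for _ in range(4):
        digits.append(str(value % 4))  # extract the least significant digit
        value //= 4
    return "".join(reversed(digits))


# Pairs taken from the tables on this page.
for error_hex, code in [(0x01, "0001"), (0x04, "0010"), (0x0A, "0022")]:
    assert displayed_code(error_hex) == code
```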

0011

Hex | Name | Description | Type | Repair Guide
0x05 | ERROR_THERMAL_CPU | CPU thermal overload | EC_THERMAL | Thermal Overload

The SMC monitors the CPU thermal diode as reported by the ANA/HANA. If the CPU temperature exceeds the Trip Temperature defined in the SMC Config, the SMC enters EC_THERMAL and the 0011 code is displayed on the front panel.

0012

Hex | Name | Description | Type | Repair Guide
0x06 | ERROR_THERMAL_GPU | GPU thermal overload | EC_THERMAL | Thermal Overload

The SMC monitors the GPU thermal diode as reported by the ANA/HANA. If the GPU temperature exceeds the Trip Temperature defined in the SMC Config, the SMC enters EC_THERMAL and the 0012 code is displayed on the front panel.

0013

Hex | Name | Description | Type | Repair Guide
0x07 | ERROR_THERMAL_EDRAM | eDRAM thermal overload | EC_THERMAL | Thermal Overload

The SMC monitors the eDRAM thermal diode as reported by the ANA/HANA. If the eDRAM temperature exceeds the Trip Temperature defined in the SMC Config, the SMC enters EC_THERMAL and the 0013 code is displayed on the front panel.
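
The three thermal checks above can be sketched as a single comparison loop. This is an illustration under stated assumptions, not the actual SMC logic: the function name, the dictionary inputs, the check order, and the idea that each sensor has its own trip value are assumptions. The codes come from the tables above.

```python
# sensor -> (error hex, front-panel code), copied from the tables above.
THERMAL_ERRORS = {
    "CPU": (0x05, "0011"),
    "GPU": (0x06, "0012"),
    "EDRAM": (0x07, "0013"),
}


def check_thermal(temps, trips):
    """Return ("EC_THERMAL", code) for the first sensor over its trip point.

    temps and trips map sensor name -> temperature in the same unit.
    "Exceeds" is read as strictly greater than. Returns None when no
    sensor is over its trip temperature.
    """
    for sensor in ("CPU", "GPU", "EDRAM"):  # check order is an assumption
        if temps[sensor] > trips[sensor]:
            return "EC_THERMAL", THERMAL_ERRORS[sensor][1]
    return None
```

The trip values themselves come from the SMC Config and are not given on this page, so any numbers passed to this function are placeholders.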

0020

Hex | Name | Description | Type | Repair Guide
0x08 | ERROR_GPU_RST_DONE | GPU_RST_DONE signal not asserted after seqUnReset time passed | EC_BOOT | 0020

After GPU power and clocking are available, the SMC starts seqUnReset, which releases the GPU from reset. It then waits for the GPU to assert GPU_RST_DONE. If GPU_RST_DONE is not asserted within the allotted time, the SMC retries seqUnReset up to 4 more times. If GPU_RST_DONE is still not asserted, the SMC enters EC_BOOT and the 0020 code is displayed on the front panel.

0021

Hex | Name | Description | Type | Repair Guide
0x09 | ERROR_NO_PCIE | PCIe link did not enter L0 after seqUnReset time passed | EC_BOOT | 0021

After receiving GPU_RST_DONE during seqUnReset, the SMC monitors the PCIe link status and waits for the link to enter the L0 state. If the link does not enter L0 within the allotted time, the SMC retries seqUnReset up to 4 more times. If the link still does not enter L0, the SMC enters EC_BOOT and the 0021 code is displayed on the front panel.

0022

Hex | Name | Description | Type | Repair Guide
0x0A | ERROR_NO_HANDSHAKE | CPU did not send GetPowerUpCause to SMC | EC_BOOT | 0022

After the PCIe link has entered the L0 state during seqUnReset, the SMC releases the CPU from reset. The CPU runs the Bootloaders and starts the XSS. When the XSS starts, it attempts to retrieve the power up cause from the SMC. If the SMC does not receive GetPowerUpCause within the allotted time, the SMC retries seqUnReset up to 4 more times. If GetPowerUpCause is still not received, the SMC enters EC_BOOT and the 0022 code is displayed on the front panel.
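
The 0020, 0021 and 0022 descriptions together outline a staged boot with retries, which can be sketched as follows. This is a simplified model, not the SMC firmware. It assumes that the retry count is shared across all three stages and that the displayed code belongs to the stage that failed on the final attempt; neither detail is stated on this page. The names run_boot, attempt_fn and the "BOOTED" result are hypothetical.

```python
MAX_ATTEMPTS = 5  # one initial seqUnReset plus 4 retries

# (stage, front-panel code on timeout), in the order described above.
STAGES = [
    ("GPU_RST_DONE", "0020"),
    ("PCIE_L0", "0021"),
    ("GetPowerUpCause", "0022"),
]


def run_boot(attempt_fn):
    """Model seqUnReset with retries.

    attempt_fn(attempt_index) returns a dict mapping stage name -> bool,
    where True means the event arrived within its allotted time.
    Returns (result, displayed_code, attempts_used).
    """
    failed_code = None
    for attempt in range(MAX_ATTEMPTS):
        events = attempt_fn(attempt)
        failed_code = None
        for stage, code in STAGES:
            if not events.get(stage, False):
                failed_code = code  # first stage that timed out
                break
        if failed_code is None:
            return "BOOTED", None, attempt + 1
    return "EC_BOOT", failed_code, MAX_ATTEMPTS
```

For example, an attempt_fn that always reports GPU_RST_DONE but never PCIE_L0 gives EC_BOOT with code 0021 after 5 attempts, while one that succeeds at every stage on its third call boots after 3 attempts.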