Secondary Error Codes






Revision as of 17:52, 5 January 2023

SMC Errors

These errors are generated by the System Management Controller.

0001

ANA_V12P0_PWRGD is driven high by the ANA (later HANA) as long as the V_12P0 rail is within tolerance. If V_12P0 ever drops out of tolerance, the signal is de-asserted, causing the SMC to enter EC_FATAL and the 0001 code to be displayed on the front panel.

{|class="wikitable"
! Hex !! Name !! Description !! Type !! Repair Guide
|-
| 0x01 || ERROR_V_12P0 || ANA_V12P0_PWRGD negated unexpectedly || EC_FATAL || 0001
|}
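
Every hex value and front-panel code listed on this page is consistent with the displayed code being the hex value written as four base-4 digits (for example, 0x09 appears as 0021). This is an observation from the tables, not something the page states outright. The helper below is a minimal sketch of that mapping; the name `display_code` is hypothetical.

```python
def display_code(hex_value: int) -> str:
    """Render an error byte as four base-4 digits (hypothetical helper)."""
    if not 0 <= hex_value <= 0xFF:
        raise ValueError("expected a single byte")
    digits = []
    for _ in range(4):
        digits.append(str(hex_value % 4))  # least significant digit first
        hex_value //= 4
    return "".join(reversed(digits))


# Hex value -> displayed code, copied from the tables on this page.
DOCUMENTED = {
    0x01: "0001", 0x02: "0002", 0x03: "0003",
    0x04: "0010", 0x05: "0011", 0x06: "0012", 0x07: "0013",
    0x08: "0020", 0x09: "0021",
}

# The conversion reproduces every documented pair.
mismatches = [k for k, v in DOCUMENTED.items() if display_code(k) != v]
```

Values outside the documented range are not covered by this page, so the mapping should be treated as unverified for them.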

0002

{|class="wikitable"
! Hex !! Name !! Description !! Type !! Repair Guide
|-
| 0x02 || ERROR_V_CPUCORE || VREG_CPU_PWRGD negated unexpectedly || EC_FATAL || 0002
|}

VREG_CPU_PWRGD is driven high by the CPU power controller as long as the V_CPUCORE rail is within tolerance. If V_CPUCORE ever drops out of tolerance, the signal is de-asserted, causing the SMC to enter EC_FATAL and the 0002 code to be displayed on the front panel.

0003

{|class="wikitable"
! Hex !! Name !! Description !! Type !! Repair Guide
|-
| 0x03 || ERROR_V_GPUCORE || VREG_GPU_PWRGD negated unexpectedly || EC_FATAL || 0003
|}

VREG_GPU_PWRGD is driven high by the GPU power controller as long as the V_GPUCORE rail is within tolerance. If V_GPUCORE ever drops out of tolerance, the signal is de-asserted, causing the SMC to enter EC_FATAL and the 0003 code to be displayed on the front panel.
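
The three power-good checks (0001 to 0003) share one pattern: a signal that should stay high drops, and the SMC enters EC_FATAL. The following is a minimal sketch of that pattern, not the SMC firmware. The function name, the dictionary of sampled signal levels, and the order in which the signals are checked are all assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple

# (signal name, hex value, displayed code) as listed in the 0001-0003 tables.
POWER_GOOD_FAULTS = [
    ("ANA_V12P0_PWRGD", 0x01, "0001"),
    ("VREG_CPU_PWRGD", 0x02, "0002"),
    ("VREG_GPU_PWRGD", 0x03, "0003"),
]


def check_power_good(signals: Dict[str, bool]) -> Optional[Tuple[str, int, str]]:
    """Return (error type, hex value, displayed code) for the first
    de-asserted power-good signal, or None if all are high.

    The check order is an assumption; the page does not define a priority.
    """
    for name, hex_value, display in POWER_GOOD_FAULTS:
        if not signals[name]:
            return ("EC_FATAL", hex_value, display)
    return None


all_good = {"ANA_V12P0_PWRGD": True, "VREG_CPU_PWRGD": True, "VREG_GPU_PWRGD": True}
gpu_rail_dropped = dict(all_good, VREG_GPU_PWRGD=False)
```

With these inputs, `check_power_good(all_good)` reports no fault and `check_power_good(gpu_rail_dropped)` reports the 0003 entry.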

0010

{|class="wikitable"
! Hex !! Name !! Description !! Type !! Repair Guide
|-
| 0x04 || ERROR_NO_ANA || ANA/HANA is not responding to reads or writes || EC_FATAL || 0010
|}

The SMC communicates with the ANA/HANA via the SMBus. If communication is lost, the SMC enters EC_FATAL and the 0010 code is displayed on the front panel.

0011

{|class="wikitable"
! Hex !! Name !! Description !! Type !! Repair Guide
|-
| 0x05 || ERROR_THERMAL_CPU || CPU thermal overload || EC_THERMAL || Thermal Overload
|}

The SMC monitors the CPU thermal diode as reported by the ANA/HANA. If the CPU temperature exceeds the Trip Temperature defined in the SMC Config, the SMC enters EC_THERMAL and the 0011 code is displayed on the front panel.

0012

{|class="wikitable"
! Hex !! Name !! Description !! Type !! Repair Guide
|-
| 0x06 || ERROR_THERMAL_GPU || GPU thermal overload || EC_THERMAL || Thermal Overload
|}

The SMC monitors the GPU thermal diode as reported by the ANA/HANA. If the GPU temperature exceeds the Trip Temperature defined in the SMC Config, the SMC enters EC_THERMAL and the 0012 code is displayed on the front panel.

0013

{|class="wikitable"
! Hex !! Name !! Description !! Type !! Repair Guide
|-
| 0x07 || ERROR_THERMAL_EDRAM || eDRAM thermal overload || EC_THERMAL || Thermal Overload
|}

The SMC monitors the eDRAM thermal diode as reported by the ANA/HANA. If the eDRAM temperature exceeds the Trip Temperature defined in the SMC Config, the SMC enters EC_THERMAL and the 0013 code is displayed on the front panel.
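
The three thermal checks (0011 to 0013) compare a diode reading against a per-sensor Trip Temperature from the SMC Config. The sketch below illustrates that comparison only. The function name, the sensor keys, the check order, and the trip values used in the example are placeholders invented for illustration, not real SMC Config values.

```python
from typing import Dict, Optional, Tuple

# Sensor -> (hex value, displayed code) as listed in the 0011-0013 tables.
THERMAL_FAULTS = {
    "CPU": (0x05, "0011"),
    "GPU": (0x06, "0012"),
    "EDRAM": (0x07, "0013"),
}


def check_thermal(temps: Dict[str, float],
                  trip: Dict[str, float]) -> Optional[Tuple[str, int, str]]:
    """Return the first sensor fault whose reading exceeds its trip
    temperature, or None. "Exceeds" is read as strictly greater than."""
    for sensor, (hex_value, display) in THERMAL_FAULTS.items():
        if temps[sensor] > trip[sensor]:
            return ("EC_THERMAL", hex_value, display)
    return None


# Placeholder trip temperatures; real values live in the SMC Config.
example_trip = {"CPU": 100.0, "GPU": 100.0, "EDRAM": 100.0}
normal = {"CPU": 60.0, "GPU": 65.0, "EDRAM": 62.0}
gpu_hot = {"CPU": 60.0, "GPU": 101.0, "EDRAM": 62.0}
```

A reading exactly at the trip temperature does not trigger a fault in this sketch, since the page says the temperature must exceed the limit.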

0020

{|class="wikitable"
! Hex !! Name !! Description !! Type !! Repair Guide
|-
| 0x08 || ERROR_GPU_RST_DONE || GPU_RST_DONE signal not asserted after seqUnReset time passed || EC_BOOT || 0020
|}

On power up, once GPU power and clocking are available, the SMC starts seqUnReset and releases the GPU from reset. It then waits for the GPU to assert GPU_RST_DONE. If GPU_RST_DONE is not asserted within the allotted time, the SMC retries seqUnReset 4 more times. If GPU_RST_DONE is still not asserted, the SMC enters EC_BOOT and the 0020 code is displayed on the front panel.

0021

{|class="wikitable"
! Hex !! Name !! Description !! Type !! Repair Guide
|-
| 0x09 || ERROR_NO_PCIE || PCIe link did not enter L0 after seqUnReset time passed || EC_BOOT || 0021
|}

After receiving GPU_RST_DONE, the SMC monitors the PCIe link status and waits for the link to enter the L0 state. If the link does not enter L0 within the allotted time, the SMC retries seqUnReset 4 more times. If the link still does not enter L0, the SMC enters EC_BOOT and the 0021 code is displayed on the front panel.
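
The 0020 and 0021 checks form a two-stage boot sequence with retries, which can be sketched as below. This is a model of the description above, not the SMC firmware. The function name and the two callables are hypothetical, and two details are assumptions the page does not confirm: that both stages share a single retry budget, and that the code reported is the one for the last stage that failed.

```python
from typing import Callable, Optional, Tuple

RETRIES = 4  # "4 more times", so 5 attempts in total


def run_unreset(wait_rst_done: Callable[[], bool],
                wait_pcie_l0: Callable[[], bool]) -> Optional[Tuple[str, int, str]]:
    """Model seqUnReset with retries.

    Each callable returns True if its event occurred within the allotted
    time. Returns None on success, or (error type, hex value, displayed
    code) once every attempt has failed.
    """
    last_failure = None
    for _attempt in range(1 + RETRIES):
        # Stage 1: release the GPU from reset and wait for GPU_RST_DONE.
        if not wait_rst_done():
            last_failure = ("EC_BOOT", 0x08, "0020")
            continue
        # Stage 2: wait for the PCIe link to reach L0.
        if not wait_pcie_l0():
            last_failure = ("EC_BOOT", 0x09, "0021")
            continue
        return None
    return last_failure


# Example: GPU_RST_DONE never arrives, so all five attempts fail.
attempt_log = []


def never_done() -> bool:
    attempt_log.append("seqUnReset")
    return False


result = run_unreset(never_done, lambda: True)
```

In this example `result` is the 0020 entry and `attempt_log` holds five entries, one for the initial attempt and one for each of the four retries.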