Secondary Error Codes
Revision as of 21:06, 5 January 2023
SMC Errors
These errors are generated by the System Management Controller.
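The tables on this page pair each SMC error value with a name, a type and a front-panel code. The front-panel code is not simply the hex value written out (0x04 shows 0010, 0x0A shows 0022), so decoding one into the other needs a lookup table. The sketch below collects the rows into a Python dictionary. The names `SmcError`, `SMC_ERRORS`, `decode` and `from_panel_code` are hypothetical and are not taken from any SMC firmware or tool.

```python
from typing import Dict, NamedTuple


class SmcError(NamedTuple):
    name: str
    error_type: str  # EC_FATAL, EC_THERMAL or EC_BOOT
    panel_code: str  # code shown on the front panel


# Rows transcribed from the tables on this page (the structure itself is hypothetical).
SMC_ERRORS: Dict[int, SmcError] = {
    0x01: SmcError("ERROR_V_12P0", "EC_FATAL", "0001"),
    0x02: SmcError("ERROR_V_CPUCORE", "EC_FATAL", "0002"),
    0x03: SmcError("ERROR_V_GPUCORE", "EC_FATAL", "0003"),
    0x04: SmcError("ERROR_NO_ANA", "EC_FATAL", "0010"),
    0x05: SmcError("ERROR_THERMAL_CPU", "EC_THERMAL", "0011"),
    0x06: SmcError("ERROR_THERMAL_GPU", "EC_THERMAL", "0012"),
    0x07: SmcError("ERROR_THERMAL_EDRAM", "EC_THERMAL", "0013"),
    0x08: SmcError("ERROR_GPU_RST_DONE", "EC_BOOT", "0020"),
    0x09: SmcError("ERROR_NO_PCIE", "EC_BOOT", "0021"),
    0x0A: SmcError("ERROR_NO_HANDSHAKE", "EC_BOOT", "0022"),
}


def decode(value: int) -> SmcError:
    """Look up an SMC error value; raises KeyError for values not listed."""
    return SMC_ERRORS[value]


def from_panel_code(panel_code: str) -> int:
    """Reverse lookup from a front-panel code such as "0021" to its hex value."""
    for value, error in SMC_ERRORS.items():
        if error.panel_code == panel_code:
            return value
    raise KeyError(panel_code)
```

For example, `decode(0x04).panel_code` is `"0010"`, and `from_panel_code("0022")` is `0x0A`.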
0001
Hex | Name | Description | Type | Repair Guide |
---|---|---|---|---|
0x01 | ERROR_V_12P0 | ANA_V12P0_PWRGD negated unexpectedly | EC_FATAL | 0001 |
ANA_V12P0_PWRGD is driven high by the ANA (later HANA) as long as the V_12P0 rail is within tolerance. If V_12P0 ever drops out of tolerance, the signal is de-asserted, causing the SMC to enter EC_FATAL and the 0001 code to be displayed on the front panel.
0002
Hex | Name | Description | Type | Repair Guide |
---|---|---|---|---|
0x02 | ERROR_V_CPUCORE | VREG_CPU_PWRGD negated unexpectedly | EC_FATAL | 0002 |
VREG_CPU_PWRGD is driven high by the CPU power controller as long as the V_CPUCORE rail is within tolerance. If V_CPUCORE ever drops out of tolerance, the signal is de-asserted, causing the SMC to enter EC_FATAL and the 0002 code to be displayed on the front panel.
0003
Hex | Name | Description | Type | Repair Guide |
---|---|---|---|---|
0x03 | ERROR_V_GPUCORE | VREG_GPU_PWRGD negated unexpectedly | EC_FATAL | 0003 |
VREG_GPU_PWRGD is driven high by the GPU power controller as long as the V_GPUCORE rail is within tolerance. If V_GPUCORE ever drops out of tolerance, the signal is de-asserted, causing the SMC to enter EC_FATAL and the 0003 code to be displayed on the front panel.
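The three power-good checks above follow one pattern: each PWRGD signal stays high while its rail is in tolerance, and a negated signal maps to a fixed EC_FATAL error. A minimal sketch, assuming signal levels are sampled into a dictionary (True = asserted) and that a missing signal counts as negated. The check order and the `check_power_good` name are illustrative assumptions; the page does not say which error wins if several rails drop at once.

```python
from typing import Dict, Optional, Tuple

# Power-good signal -> (hex value, error name, front-panel code), from the tables above.
PWRGD_ERRORS = [
    ("ANA_V12P0_PWRGD", 0x01, "ERROR_V_12P0", "0001"),
    ("VREG_CPU_PWRGD", 0x02, "ERROR_V_CPUCORE", "0002"),
    ("VREG_GPU_PWRGD", 0x03, "ERROR_V_GPUCORE", "0003"),
]


def check_power_good(signals: Dict[str, bool]) -> Optional[Tuple[int, str, str]]:
    """Return (hex, name, panel code) for the first negated signal, or None if all are high.

    `signals` maps a signal name to its sampled level (True = asserted).
    A signal missing from the dictionary is treated as negated (assumption).
    """
    for signal, value, name, panel_code in PWRGD_ERRORS:
        if not signals.get(signal, False):
            return value, name, panel_code  # the SMC would enter EC_FATAL here
    return None
```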
0010
Hex | Name | Description | Type | Repair Guide |
---|---|---|---|---|
0x04 | ERROR_NO_ANA | ANA/HANA is not responding to reads or writes | EC_FATAL | 0010 |
The SMC communicates with the ANA/HANA via the SMBus. If communication is lost, the SMC enters EC_FATAL and the 0010 code is displayed on the front panel.
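A minimal sketch of how a lost SMBus link could be turned into ERROR_NO_ANA. It assumes a hypothetical `read_byte(register)` accessor that raises `OSError` when the ANA/HANA does not respond, and it treats a single failed transfer as lost communication. The page does not say whether the SMC retries a transfer before giving up, so real firmware may behave differently; `SmcFatal` and `ana_read` are illustrative names.

```python
from typing import Callable


class SmcFatal(Exception):
    """Raised where the SMC would enter EC_FATAL."""

    def __init__(self, value: int, name: str, panel_code: str):
        super().__init__(f"{name} (0x{value:02X}), front-panel code {panel_code}")
        self.value = value
        self.name = name
        self.panel_code = panel_code


def ana_read(read_byte: Callable[[int], int], register: int) -> int:
    """Read one ANA/HANA register over SMBus.

    `read_byte` is a hypothetical bus accessor that raises OSError when the
    device does not respond; any such error is treated as lost communication.
    """
    try:
        return read_byte(register)
    except OSError as exc:
        raise SmcFatal(0x04, "ERROR_NO_ANA", "0010") from exc
```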
0011
Hex | Name | Description | Type | Repair Guide |
---|---|---|---|---|
0x05 | ERROR_THERMAL_CPU | CPU thermal overload | EC_THERMAL | Thermal Overload |
The SMC monitors the CPU thermal diode as reported by the ANA/HANA. If the CPU temperature exceeds the Trip Temperature defined in the SMC Config, the SMC enters EC_THERMAL and the 0011 code is displayed on the front panel.
0012
Hex | Name | Description | Type | Repair Guide |
---|---|---|---|---|
0x06 | ERROR_THERMAL_GPU | GPU thermal overload | EC_THERMAL | Thermal Overload |
The SMC monitors the GPU thermal diode as reported by the ANA/HANA. If the GPU temperature exceeds the Trip Temperature defined in the SMC Config, the SMC enters EC_THERMAL and the 0012 code is displayed on the front panel.
0013
Hex | Name | Description | Type | Repair Guide |
---|---|---|---|---|
0x07 | ERROR_THERMAL_EDRAM | eDRAM thermal overload | EC_THERMAL | Thermal Overload |
The SMC monitors the eDRAM thermal diode as reported by the ANA/HANA. If the eDRAM temperature exceeds the Trip Temperature defined in the SMC Config, the SMC enters EC_THERMAL and the 0013 code is displayed on the front panel.
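The three thermal checks share one rule: a diode reading above its Trip Temperature from the SMC Config puts the SMC into EC_THERMAL. The sketch below assumes readings and trip points in degrees Celsius and takes the trip points as a caller-supplied dictionary, because the actual SMC Config values are not given here; `check_thermal` is a hypothetical name.

```python
from typing import Dict, Optional, Tuple

# Diode -> (hex value, error name, front-panel code), from the tables above.
THERMAL_ERRORS = {
    "CPU": (0x05, "ERROR_THERMAL_CPU", "0011"),
    "GPU": (0x06, "ERROR_THERMAL_GPU", "0012"),
    "EDRAM": (0x07, "ERROR_THERMAL_EDRAM", "0013"),
}


def check_thermal(
    temps_c: Dict[str, float], trip_c: Dict[str, float]
) -> Optional[Tuple[int, str, str]]:
    """Return the error for the first diode whose reading exceeds its trip point, else None.

    temps_c: diode readings as reported by the ANA/HANA (degrees C, assumed).
    trip_c: Trip Temperatures from the SMC Config, supplied by the caller.
    """
    for diode, error in THERMAL_ERRORS.items():
        if temps_c[diode] > trip_c[diode]:
            return error  # the SMC would enter EC_THERMAL here
    return None
```

With purely illustrative trip points of 90 °C for every diode, a GPU reading of 95 °C returns `(0x06, "ERROR_THERMAL_GPU", "0012")`, while a reading of exactly 90 °C does not trip, since the text says the temperature must exceed the trip point.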
0020
Hex | Name | Description | Type | Repair Guide |
---|---|---|---|---|
0x08 | ERROR_GPU_RST_DONE | GPU_RST_DONE signal not asserted after seqUnReset time passed | EC_BOOT | 0020 |
After GPU power and clocking are available, the SMC starts seqUnReset, which releases the GPU from reset. It then waits for the GPU to assert GPU_RST_DONE. If GPU_RST_DONE is not asserted within the allotted time, EC_BOOT is reported and the SMC retries 4 more times. If GPU_RST_DONE is still not asserted on the final attempt, the SMC remains in EC_BOOT and the 0020 code is displayed on the front panel.
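"Within the allotted time" plus "4 more times" means the SMC makes 5 attempts in total before settling on 0020. A sketch of that wait-and-retry pattern, assuming a polling loop with a deadline. The allotted time is not given on this page, so `timeout_s` is a parameter, and `restart` is a hypothetical hook standing in for whatever the SMC redoes before each attempt (for example, releasing the GPU from reset again).

```python
import time
from typing import Callable, Optional

MAX_ATTEMPTS = 5  # the first attempt plus 4 retries


def wait_for(signal: Callable[[], bool], timeout_s: float, poll_s: float = 0.001) -> bool:
    """One attempt: poll signal() until it returns True or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if signal():
            return True
        time.sleep(poll_s)
    return signal()  # final check at the deadline


def wait_with_retries(
    signal: Callable[[], bool],
    timeout_s: float,
    restart: Callable[[], None],
    attempts: int = MAX_ATTEMPTS,
) -> Optional[int]:
    """Return the 1-based attempt that succeeded, or None if every attempt timed out."""
    for attempt in range(1, attempts + 1):
        restart()  # hypothetical hook: redo the step that precedes the wait
        if wait_for(signal, timeout_s):
            return attempt
        # Timed out: EC_BOOT is reported, then the next attempt starts.
    return None  # all attempts failed: stay in EC_BOOT and show 0020
```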
0021
Hex | Name | Description | Type | Repair Guide |
---|---|---|---|---|
0x09 | ERROR_NO_PCIE | PCIe link did not enter L0 after seqUnReset time passed | EC_BOOT | 0021 |
After receiving GPU_RST_DONE during seqUnReset, the SMC monitors the PCIe L0 status and waits for the link to enter the L0 state. If the link does not enter L0 within the allotted time, EC_BOOT is reported and the SMC retries 4 more times. If the link still has not entered L0 on the final attempt, the SMC remains in EC_BOOT and the 0021 code is displayed on the front panel.
0022
Hex | Name | Description | Type | Repair Guide |
---|---|---|---|---|
0x0A | ERROR_NO_HANDSHAKE | CPU did not send GetPowerUpCause to SMC | EC_BOOT | 0022 |
After the PCIe link has entered the L0 state during seqUnReset, the SMC releases the CPU from reset. The CPU runs the Bootloaders and starts the XSS, which requests the power-up cause from the SMC with GetPowerUpCause. If the SMC does not receive GetPowerUpCause within the allotted time, EC_BOOT is reported and the SMC retries 4 more times. If GetPowerUpCause is still not received on the final attempt, the SMC remains in EC_BOOT and the 0022 code is displayed on the front panel.
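The three EC_BOOT codes follow the order of the seqUnReset stages, so the front-panel code identifies the first stage that never completed. The sketch below assumes that a timeout restarts the whole sequence from the GPU stage rather than repeating only the failed stage; the page does not say which, so treat that as an assumption. `stage_ok` is a hypothetical hook reporting whether a stage completed in its allotted time on the current attempt, and `run_boot_sequence` is an illustrative name.

```python
from typing import Callable, Optional

# seqUnReset stages in order, each with the values reported if it never completes.
BOOT_STAGES = [
    ("GPU_RST_DONE asserted", 0x08, "ERROR_GPU_RST_DONE", "0020"),
    ("PCIe link in L0", 0x09, "ERROR_NO_PCIE", "0021"),
    ("GetPowerUpCause received", 0x0A, "ERROR_NO_HANDSHAKE", "0022"),
]


def run_boot_sequence(stage_ok: Callable[[str], bool], attempts: int = 5) -> Optional[str]:
    """Run the stages in order, restarting the sequence after a timeout (assumption).

    Returns None if an attempt completes every stage, otherwise the
    front-panel code of the stage that failed on the final attempt.
    """
    failed_code = None
    for _ in range(attempts):
        failed_code = None
        for stage, _value, _name, panel_code in BOOT_STAGES:
            if not stage_ok(stage):
                failed_code = panel_code  # EC_BOOT reported for this attempt
                break
        if failed_code is None:
            return None  # all stages completed
    return failed_code
```

If the PCIe link never reaches L0, every attempt stops at the second stage and the function returns `"0021"`; a handshake that only succeeds on the third attempt returns `None`.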