Secondary Error Codes
Revision as of 18:09, 5 January 2023
SMC Errors
These errors are generated by the System Management Controller.
0001
ANA_V12P0_PWRGD is driven high by the ANA (later HANA) as long as the V_12P0 rail is within tolerance. If V_12P0 ever drops out of tolerance, the signal is de-asserted, causing the SMC to enter EC_FATAL and the 0001 code to be displayed on the front panel.
Hex | Name | Description | Type | Repair Guide |
---|---|---|---|---|
0x01 | ERROR_V_12P0 | ANA_V12P0_PWRGD negated unexpectedly | EC_FATAL | 0001 |
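Comparing the Hex and Repair Guide columns in the tables on this page suggests a pattern: each four-digit displayed code matches the Hex value written in base 4. The page does not state this rule, so the sketch below treats it as an assumption and only checks it against the rows listed here. The function name `displayed_code` is hypothetical.

```python
def displayed_code(hex_value: int) -> str:
    """Render an error value as four base-4 digits (assumed encoding)."""
    if not 0 <= hex_value < 4 ** 4:
        raise ValueError("value does not fit in four base-4 digits")
    digits = []
    for _ in range(4):
        # Peel off the least significant base-4 digit each pass.
        hex_value, remainder = divmod(hex_value, 4)
        digits.append(str(remainder))
    return "".join(reversed(digits))


# Hex values and displayed codes copied from the tables on this page.
TABLE_ROWS = {
    0x01: "0001", 0x02: "0002", 0x03: "0003",
    0x04: "0010", 0x05: "0011", 0x06: "0012", 0x07: "0013",
    0x08: "0020", 0x09: "0021", 0x0A: "0022",
}

# True if the assumed rule reproduces every row listed above.
rule_matches_tables = all(
    displayed_code(value) == code for value, code in TABLE_ROWS.items()
)
```

If the rule holds beyond these rows, a value read from the front panel could be converted back with `int(code, 4)`.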
0002
Hex | Name | Description | Type | Repair Guide |
---|---|---|---|---|
0x02 | ERROR_V_CPUCORE | VREG_CPU_PWRGD negated unexpectedly | EC_FATAL | 0002 |
VREG_CPU_PWRGD is driven high by the CPU power controller as long as the V_CPUCORE rail is within tolerance. If V_CPUCORE ever drops out of tolerance, the signal is de-asserted, causing the SMC to enter EC_FATAL and the 0002 code to be displayed on the front panel.
0003
Hex | Name | Description | Type | Repair Guide |
---|---|---|---|---|
0x03 | ERROR_V_GPUCORE | VREG_GPU_PWRGD negated unexpectedly | EC_FATAL | 0003 |
VREG_GPU_PWRGD is driven high by the GPU power controller as long as the V_GPUCORE rail is within tolerance. If V_GPUCORE ever drops out of tolerance, the signal is de-asserted, causing the SMC to enter EC_FATAL and the 0003 code to be displayed on the front panel.
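The three power-good checks above follow the same shape: a signal that is high while the rail is in tolerance, and a fatal error when it drops. A minimal sketch of that logic follows, assuming the signals are sampled into a dictionary of booleans. The function name and the check order are illustrative assumptions, not the SMC firmware's actual behavior.

```python
from typing import Dict, Optional, Tuple

# (signal name, error name, hex value) taken from the tables above.
POWER_GOOD_CHECKS = [
    ("ANA_V12P0_PWRGD", "ERROR_V_12P0", 0x01),
    ("VREG_CPU_PWRGD", "ERROR_V_CPUCORE", 0x02),
    ("VREG_GPU_PWRGD", "ERROR_V_GPUCORE", 0x03),
]


def check_power_good(signals: Dict[str, bool]) -> Optional[Tuple[str, int]]:
    """Return the first EC_FATAL error whose power-good signal is low.

    The ordering is an assumption; the page does not say which error
    wins if several rails drop at once.
    """
    for signal, error_name, value in POWER_GOOD_CHECKS:
        if not signals[signal]:
            return error_name, value
    return None  # all rails within tolerance


# Example: the CPU core rail has dropped out of tolerance.
sample = {
    "ANA_V12P0_PWRGD": True,
    "VREG_CPU_PWRGD": False,
    "VREG_GPU_PWRGD": True,
}
result = check_power_good(sample)
```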
0010
Hex | Name | Description | Type | Repair Guide |
---|---|---|---|---|
0x04 | ERROR_NO_ANA | ANA/HANA is not responding to reads or writes | EC_FATAL | 0010 |
The SMC communicates with the ANA/HANA via the SMBus. If communication is lost, the SMC enters EC_FATAL and the 0010 code is displayed on the front panel.
0011
Hex | Name | Description | Type | Repair Guide |
---|---|---|---|---|
0x05 | ERROR_THERMAL_CPU | CPU thermal overload | EC_THERMAL | Thermal Overload |
The SMC monitors the CPU thermal diode as reported by the ANA/HANA. If the CPU temperature exceeds the Trip Temperature defined in the SMC Config, the SMC enters EC_THERMAL and the 0011 code is displayed on the front panel.
0012
Hex | Name | Description | Type | Repair Guide |
---|---|---|---|---|
0x06 | ERROR_THERMAL_GPU | GPU thermal overload | EC_THERMAL | Thermal Overload |
The SMC monitors the GPU thermal diode as reported by the ANA/HANA. If the GPU temperature exceeds the Trip Temperature defined in the SMC Config, the SMC enters EC_THERMAL and the 0012 code is displayed on the front panel.
0013
Hex | Name | Description | Type | Repair Guide |
---|---|---|---|---|
0x07 | ERROR_THERMAL_EDRAM | eDRAM thermal overload | EC_THERMAL | Thermal Overload |
The SMC monitors the eDRAM thermal diode as reported by the ANA/HANA. If the eDRAM temperature exceeds the Trip Temperature defined in the SMC Config, the SMC enters EC_THERMAL and the 0013 code is displayed on the front panel.
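The three thermal checks compare a diode reading against a per-sensor Trip Temperature from the SMC Config. The sketch below illustrates that comparison under stated assumptions: the sensor keys, the function name, and the numeric values in the example are hypothetical and are not real trip points.

```python
from typing import Dict, Optional, Tuple

# Sensor key -> (error name, hex value), taken from the tables above.
THERMAL_ERRORS = {
    "CPU": ("ERROR_THERMAL_CPU", 0x05),
    "GPU": ("ERROR_THERMAL_GPU", 0x06),
    "EDRAM": ("ERROR_THERMAL_EDRAM", 0x07),
}


def check_thermal(
    readings: Dict[str, float], trip_temps: Dict[str, float]
) -> Optional[Tuple[str, int]]:
    """Return the first EC_THERMAL error whose reading exceeds its trip point."""
    for sensor, (error_name, value) in THERMAL_ERRORS.items():
        # "Exceeds" is read as strictly greater than the trip temperature.
        if readings[sensor] > trip_temps[sensor]:
            return error_name, value
    return None


# Placeholder numbers for illustration only; not real SMC Config values.
example_trips = {"CPU": 100.0, "GPU": 100.0, "EDRAM": 100.0}
example_readings = {"CPU": 80.0, "GPU": 101.5, "EDRAM": 90.0}
thermal_result = check_thermal(example_readings, example_trips)
```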
0020
Hex | Name | Description | Type | Repair Guide |
---|---|---|---|---|
0x08 | ERROR_GPU_RST_DONE | GPU_RST_DONE signal not asserted after seqUnReset time passed | EC_BOOT | 0020 |
After GPU power and clocking are available, the SMC starts seqUnReset, which releases the GPU from reset. It then waits for the GPU to assert GPU_RST_DONE. If GPU_RST_DONE is not asserted in the time allotted, the SMC retries seqUnReset up to 4 more times. If GPU_RST_DONE is still not asserted, the SMC enters EC_BOOT and the 0020 code is displayed on the front panel.
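The retry behavior described above amounts to one initial attempt plus 4 retries, for at most 5 runs of seqUnReset. A minimal sketch of that counting follows, assuming each attempt can be modeled as a callable that returns True on success. The names `run_with_retries` and `RETRIES` are hypothetical.

```python
from typing import Callable, Tuple

RETRIES = 4  # "4 more times" after the first attempt, per the text above


def run_with_retries(attempt: Callable[[], bool]) -> Tuple[bool, int]:
    """Run `attempt` once, then up to RETRIES more times until it succeeds.

    Returns (succeeded, number_of_attempts_made).
    """
    attempts_made = 0
    for _ in range(1 + RETRIES):
        attempts_made += 1
        if attempt():
            return True, attempts_made
    return False, attempts_made  # caller would enter EC_BOOT here


# Example: GPU_RST_DONE is never asserted, so every attempt times out.
never_done = run_with_retries(lambda: False)

# Example: the third attempt succeeds.
outcomes = iter([False, False, True])
third_try = run_with_retries(lambda: next(outcomes))
```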
0021
Hex | Name | Description | Type | Repair Guide |
---|---|---|---|---|
0x09 | ERROR_NO_PCIE | PCIe link did not enter L0 after seqUnReset time passed | EC_BOOT | 0021 |
After receiving GPU_RST_DONE during seqUnReset, the SMC monitors the PCIe L0 status and waits for the link to enter the L0 state. If the link does not enter the L0 state in the time allotted, the SMC will retry seqUnReset 4 more times. If the link still does not enter the L0 state, the SMC enters EC_BOOT and the 0021 code is displayed on the front panel.
0022
Hex | Name | Description | Type | Repair Guide |
---|---|---|---|---|
0x0A | ERROR_NO_HANDSHAKE | CPU did not send GetPowerUpCause to SMC | EC_BOOT | 0022 |
After the PCIe link has entered the L0 state during seqUnReset, the SMC releases the CPU from reset. The CPU will run the Bootloaders and start the XSS. When the XSS starts, it will attempt to retrieve the power-up cause from the SMC. If the SMC does not receive GetPowerUpCause in the time allotted, the SMC will retry seqUnReset 4 more times. If GetPowerUpCause is still not received, the SMC enters EC_BOOT and the 0022 code is displayed on the front panel.
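Taken together, the 0020, 0021, and 0022 descriptions outline three ordered checkpoints within seqUnReset, each with its own EC_BOOT error. The sketch below models that ordering under stated assumptions: the stage keys, function names, and the choice to report the error from the last failed attempt are illustrative, since the page does not say how failures at different stages across retries are resolved.

```python
from typing import Callable, Optional, Tuple

# (stage key, error name, hex value) in the order described on this page.
STAGES = [
    ("GPU_RST_DONE", "ERROR_GPU_RST_DONE", 0x08),
    ("PCIE_L0", "ERROR_NO_PCIE", 0x09),
    ("GET_POWER_UP_CAUSE", "ERROR_NO_HANDSHAKE", 0x0A),
]

BootError = Optional[Tuple[str, int]]


def seq_un_reset(stage_ok: Callable[[str], bool]) -> BootError:
    """Run one attempt; return None on success or the failing stage's error."""
    for stage, error_name, value in STAGES:
        if not stage_ok(stage):
            return error_name, value
    return None


def boot(stage_ok: Callable[[str], bool], retries: int = 4) -> BootError:
    """Try seqUnReset once plus `retries` more times.

    Returns None if any attempt completes, else the last attempt's error
    (an assumption made for this sketch).
    """
    last_error: BootError = None
    for _ in range(1 + retries):
        last_error = seq_un_reset(stage_ok)
        if last_error is None:
            return None
    return last_error


# Example: the GPU comes out of reset but the PCIe link never reaches L0.
pcie_failure = boot(lambda stage: stage != "PCIE_L0")
```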