Don't wake a runtime-suspended dGPU to service NVPCF/GPS ACPI notifies #1181
Open
ElXreno wants to merge 1 commit into
…notifies

On RTD3 laptops the dGPU is runtime-suspended (D3cold) while idle. Some platforms still deliver ACPI Notify() events for the NVPCF device and for GPS status changes while the GPU is suspended (for example around battery/AC transitions, or when the SBIOS pushes a thermal or power-limit hint).

rm_acpi_nvpcf_notify() and RmHandleGPSStatusChange() both call os_ref_dynamic_power() unconditionally, resuming the GPU only to deliver an event that is meaningless while it is powered down. The GPU then re-suspends, and where the next notify arrives immediately it never settles in D3cold, cycling D0/D3cold and draining the battery (see NVIDIA#860, where users work around it by patching the ACPI tables to drop the Notify(NPCF, 0xC0)).

Skip the work when the GPU is already runtime-suspended (NV_DYNAMIC_POWER_STATE_IDLE_INDICATED), the same guard that rm_pmu_perfmon_get_load() already uses. The NVPCF event is only consumed while the GPU is active, and GPS/SBIOS state is re-read on the next StateLoad, so no state is lost.
On RTD3 laptops the discrete GPU sits in D3cold while idle. On some machines the platform keeps delivering ACPI `Notify()` events (to the NVPCF device, and as GPS status changes) even while the GPU is suspended, for example around battery/AC transitions or when the SBIOS pushes a new thermal or power-limit hint.

The two handlers that service those notifies, `rm_acpi_nvpcf_notify()` and `RmHandleGPSStatusChange()`, both call `os_ref_dynamic_power()` unconditionally. That resumes the GPU purely to deliver an event that does nothing while it's powered down, and it then re-suspends. On a fair number of laptops the next notify lands right away, so the GPU never settles in D3cold. Folks in #860 describe it cycling D0/D3cold every ~11 seconds on battery, and the current workaround is to patch the ACPI tables to strip the `Notify(NPCF, 0xC0)`.

This skips the resume when the GPU is already runtime-suspended (`NV_DYNAMIC_POWER_STATE_IDLE_INDICATED`), the same guard `rm_pmu_perfmon_get_load()` already uses a few functions away.

Why it's safe:

- … `StateLoad`, so a skipped sync is recovered when the GPU next powers up
- … (`rm_power_source_change_event`) and are left untouched

I traced the wake on an RTX 4060 mobile (ASUS TUF, open module):
With the guard in place the handlers still run, but the GPU stays in D3cold. Verified on 595.45.04 and 610.43.02.
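
For reviewers who want the behavioural difference in isolation, here is a minimal standalone model of the guard. It is a sketch, not the driver code: `gpu_model_t`, `dyn_power_state_t`, `model_ref_dynamic_power()`, `model_notify_unguarded()` and `model_notify_guarded()` are hypothetical names invented for illustration, and the two enum values only stand in for the driver's dynamic power states (`DYN_POWER_STATE_IDLE_INDICATED` models `NV_DYNAMIC_POWER_STATE_IDLE_INDICATED`). The real handlers take driver-specific arguments and do more work than this.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the driver's dynamic power state. */
typedef enum {
    DYN_POWER_STATE_IN_USE,         /* GPU is active (D0) */
    DYN_POWER_STATE_IDLE_INDICATED  /* GPU is runtime-suspended (D3cold) */
} dyn_power_state_t;

/* Hypothetical model of one GPU; counters exist only to observe behaviour. */
typedef struct {
    dyn_power_state_t state;
    int resume_count;    /* times a notify forced a resume from D3cold */
    int events_handled;  /* times the notify body actually ran */
} gpu_model_t;

/* Models taking a power reference: resumes the GPU if it was suspended. */
static void model_ref_dynamic_power(gpu_model_t *gpu)
{
    if (gpu->state == DYN_POWER_STATE_IDLE_INDICATED) {
        gpu->resume_count++;
        gpu->state = DYN_POWER_STATE_IN_USE;
    }
}

/* Old behaviour: always take the reference, so a suspended GPU is woken,
 * handles the event, and then drops back to the suspended state. */
static bool model_notify_unguarded(gpu_model_t *gpu)
{
    bool was_suspended = (gpu->state == DYN_POWER_STATE_IDLE_INDICATED);

    model_ref_dynamic_power(gpu);
    gpu->events_handled++;

    if (was_suspended)
        gpu->state = DYN_POWER_STATE_IDLE_INDICATED; /* re-suspend */
    return true;
}

/* Guarded behaviour: skip the event entirely while runtime-suspended. */
static bool model_notify_guarded(gpu_model_t *gpu)
{
    if (gpu->state == DYN_POWER_STATE_IDLE_INDICATED)
        return false; /* nothing to deliver, leave the GPU in D3cold */

    return model_notify_unguarded(gpu);
}
```

In this model, three notifies against a suspended GPU cost three resumes through `model_notify_unguarded()` and none through `model_notify_guarded()`, while an active GPU handles the event the same way in both paths.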