fix(pool): clamp pool top-up to runners_maximum_count#5187
Open
jeff-french wants to merge 1 commit into
Open
Conversation
The pool lambda (`adjustPool`) topped up purely against `pool_size` and never read `runners_maximum_count`, so under sustained load—where newly created runners immediately become busy and stop counting toward the idle-only pool size—it would launch a fresh `pool_size` batch every cycle with no upper bound, driving total runners far past the configured maximum while the scale-up lambda correctly refused to launch. Clamp the top-up to the remaining headroom under `runners_maximum_count` (busy + idle). `ec2runners` already holds every running runner for the type, so its length is the current total—no extra API call. The env var defaults to `-1` (unlimited), matching scale-up semantics and preserving behavior when unset. Thread the value into the pool lambda's environment via Terraform. Fixes github-aws-runners#5186 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
runners_maximum_countwas enforced only by the scale-up lambda. The pool lambda (adjustPool) had no knowledge of the maximum and topped up purely againstpool_size, so a warm pool could drive the total number of runners far pastrunners_maximum_count.calculatePooSize()counts only idle runners. Under a sustained burst of queued jobs, runners created to fill the pool are immediately picked up and become busy, so they stop counting towardnumberOfRunnersInPool. Every scheduled pool cycle therefore sees ~0 idle runners and launches another fullpool_sizebatch — with no upper bound — while the scale-up lambda correctly refuses to launch ("maximum number of runners reached"). The two lambdas actively disagree about the cap.Fixes #5186.
Changes
lambdas/.../pool/pool.ts— readRUNNERS_MAXIMUM_COUNT(default-1= unlimited, matching scale-up semantics) and clamptopUpto the remaining headroom under the cap.ec2runnersalready contains every running runner for the type (busy + idle), so its length is the current total — no extra API call. Logs when the cap limits the top-up.modules/runners/pool/main.tf:RUNNERS_MAXIMUM_COUNT = var.config.runners_maximum_countmodules/runners/pool/variables.tf: addrunners_maximum_countto theconfigobjectmodules/runners/pool.tf:runners_maximum_count = var.runners_maximum_countmodules/runners/pool/README.md: regenerated docsBackward compatibility
Defaulting the env to
-1preserves current behavior when it is unset and matches the documented "-1disables the maximum check" semantics.Relationship to #5062
#5062 added
Math.max(0, …)in scale-up to stop a negativeTotalTargetCapacityreaching CreateFleet whencurrentRunnersalready exceedsmaximumRunners. That guards the crash symptom; this PR addresses the root cause of howcurrentRunnersexceedsmaximumRunners(the pool creating past the cap). The two are complementary.Tests
pool.test.tsadds cap coverage: at-max ⇒ 0 created, over-max ⇒ 0, headroom-clamped ⇒ 2, within-headroom ⇒ pool-driven, and-1⇒ unlimited. The baseRUNNERS_MAXIMUM_COUNTin the suite is set to-1so the existing pool-logic tests remain cap-free.terraform validate/terraform fmt: clean🤖 Generated with Claude Code