fix(pool): clamp pool top-up to runners_maximum_count by jeff-french · Pull Request #5187 · github-aws-runners/terraform-aws-github-runner

jeff-french · 2026-06-26T18:55:10Z

Description

runners_maximum_count was enforced only by the scale-up lambda. The pool lambda (adjustPool) had no knowledge of the maximum and topped up purely against pool_size, so a warm pool could drive the total number of runners far past runners_maximum_count.

calculatePooSize() counts only idle runners. Under a sustained burst of queued jobs, runners created to fill the pool are immediately picked up and become busy, so they stop counting toward numberOfRunnersInPool. Every scheduled pool cycle therefore sees ~0 idle runners and launches another full pool_size batch — with no upper bound — while the scale-up lambda correctly refuses to launch ("maximum number of runners reached"). The two lambdas actively disagree about the cap.
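
The runaway described above can be illustrated with a toy model. This is only a sketch, not code from `pool.ts`: the function name `simulateUncappedPool` and the numbers are hypothetical, and it assumes every runner the pool launches is claimed by a queued job before the next scheduled cycle.

```typescript
// Toy model only: names and numbers are illustrative, not taken from pool.ts.
// Assumption: every runner launched by the pool becomes busy before the next
// scheduled cycle, so the idle count seen by each cycle is always 0.
function simulateUncappedPool(poolSize: number, cycles: number): number[] {
  const totals: number[] = [];
  let totalRunners = 0;
  for (let cycle = 0; cycle < cycles; cycle++) {
    const idleRunners = 0; // all previously launched runners are busy
    const topUp = Math.max(0, poolSize - idleRunners); // idle-only comparison
    totalRunners += topUp; // nothing bounds the running total
    totals.push(totalRunners);
  }
  return totals;
}

// With a pool size of 5, the total grows by 5 on every cycle.
const totals = simulateUncappedPool(5, 4); // [5, 10, 15, 20]
```

Because the comparison only ever looks at idle runners, no value of `runners_maximum_count` can stop the growth in this model.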

Fixes #5186.

Changes

lambdas/.../pool/pool.ts — read RUNNERS_MAXIMUM_COUNT (default -1 = unlimited, matching scale-up semantics) and clamp topUp to the remaining headroom under the cap. ec2runners already contains every running runner for the type (busy + idle), so its length is the current total — no extra API call. Logs when the cap limits the top-up.
Terraform — thread the value into the pool lambda's environment:
- modules/runners/pool/main.tf: RUNNERS_MAXIMUM_COUNT = var.config.runners_maximum_count
- modules/runners/pool/variables.tf: add runners_maximum_count to the config object
- modules/runners/pool.tf: runners_maximum_count = var.runners_maximum_count
- modules/runners/pool/README.md: regenerated docs
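
The clamp described in the first item might look roughly like the following. This is a minimal sketch under stated assumptions: `clampTopUp` and its parameter names are hypothetical, the actual change in `pool.ts` may be structured differently, and the logging mentioned above is omitted.

```typescript
// Hypothetical helper illustrating the clamp; not the PR's actual code.
// poolSize:       configured pool_size
// idleRunners:    idle runners currently counted toward the pool
// totalRunners:   all running runners for the type (busy + idle)
// maximumRunners: runners_maximum_count, where -1 means unlimited
function clampTopUp(
  poolSize: number,
  idleRunners: number,
  totalRunners: number,
  maximumRunners: number,
): number {
  // Pool-driven top-up: how many idle runners are missing from the pool.
  const topUp = Math.max(0, poolSize - idleRunners);
  if (maximumRunners === -1) {
    return topUp; // cap disabled, keep pool-driven behavior
  }
  // Headroom left under the cap; never negative, even when already over the maximum.
  const headroom = Math.max(0, maximumRunners - totalRunners);
  return Math.min(topUp, headroom);
}

const atMax = clampTopUp(5, 0, 10, 10); // 0: no headroom left
const overMax = clampTopUp(5, 0, 12, 10); // 0: already past the cap
const clamped = clampTopUp(5, 0, 8, 10); // 2: limited by headroom
const withinHeadroom = clampTopUp(5, 2, 3, 10); // 3: pool-driven
const unlimited = clampTopUp(5, 0, 100, -1); // 5: cap disabled
```

The five calls mirror the scenarios listed under Tests, with illustrative inputs chosen here rather than taken from `pool.test.ts`.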

Backward compatibility

Defaulting the env to -1 preserves current behavior when it is unset and matches the documented "-1 disables the maximum check" semantics.
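
As a rough sketch of that default, assuming the value is read as a base-10 integer: the helper name `readMaximumRunners` is hypothetical, and the environment is passed in as a plain object so the snippet stays self-contained. How the real code treats malformed values is not stated in this PR.

```typescript
// Hypothetical helper; the actual parsing in pool.ts may differ.
// An unset or empty RUNNERS_MAXIMUM_COUNT falls back to -1 (unlimited).
function readMaximumRunners(env: Record<string, string | undefined>): number {
  const raw = env["RUNNERS_MAXIMUM_COUNT"];
  if (raw === undefined || raw === "") {
    return -1;
  }
  return Number.parseInt(raw, 10);
}

const unsetValue = readMaximumRunners({}); // -1
const explicitValue = readMaximumRunners({ RUNNERS_MAXIMUM_COUNT: "10" }); // 10
```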

Relationship to #5062

#5062 added Math.max(0, …) in scale-up to stop a negative TotalTargetCapacity reaching CreateFleet when currentRunners already exceeds maximumRunners. That guards the crash symptom; this PR addresses the root cause of how currentRunners exceeds maximumRunners (the pool creating past the cap). The two are complementary.

Tests

pool.test.ts adds cap coverage: at-max ⇒ 0 created, over-max ⇒ 0, headroom-clamped ⇒ 2, within-headroom ⇒ pool-driven, and -1 ⇒ unlimited. The base RUNNERS_MAXIMUM_COUNT in the suite is set to -1 so the existing pool-logic tests remain cap-free.

control-plane vitest suite: 499 passed
eslint / prettier --check: clean
terraform validate / terraform fmt: clean

🤖 Generated with Claude Code

The pool lambda (`adjustPool`) topped up purely against `pool_size` and never read `runners_maximum_count`. Under sustained load, where newly created runners immediately become busy and stop counting toward the idle-only pool size, it would launch a fresh `pool_size` batch every cycle with no upper bound, driving total runners far past the configured maximum while the scale-up lambda correctly refused to launch.

Clamp the top-up to the remaining headroom under `runners_maximum_count` (busy + idle). `ec2runners` already holds every running runner for the type, so its length is the current total and no extra API call is needed. The env var defaults to `-1` (unlimited), matching scale-up semantics and preserving behavior when unset. Thread the value into the pool lambda's environment via Terraform.

Fixes github-aws-runners#5186

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

jeff-french requested a review from a team as a code owner June 26, 2026 18:55

jeff-french wants to merge 1 commit into github-aws-runners:main from jeff-french:fix/pool-respect-runners-maximum-count
