feat(buildrunner): add Buildkite BuildRunner implementation by JamyDev · Pull Request #190 · uber/submitqueue

JamyDev · 2026-06-04T04:05:22Z

What

Adds submitqueue/extension/buildrunner/buildkite — a BuildRunner implementation backed by the Buildkite CI platform. The package is four files:

buildkite.go — runner struct implementing BuildRunner.Trigger, .Status, and .Cancel
client.go — thin HTTP wrapper over the three Buildkite REST endpoints (create/get/cancel build)
buildkite_test.go — 16 unit tests backed by an in-process httptest server
BUILD.bazel — Bazel build file

Trigger calls the Buildkite API to create a build, encoding the base and head change URIs as JSON-encoded environment variables (SQ_BASE_URIS, SQ_HEAD_URIS, SQ_QUEUE). The Buildkite
pipeline script uses these to fetch diffs via the GitHub API, apply them in order, and run CI. Status and Cancel derive the Buildkite build number from the BuildID string — no local
state is needed.

Auth and base URL are injected via HTTP transport layers on the caller-provided *http.Client (using httpclient.BaseURLTransport + an auth transport), following the existing httpclient
pattern and keeping credentials out of the struct.

Why

The BuildRunner extension interface was designed to support multiple CI platforms as pluggable backends. This adds the first real (non-noop) implementation. Buildkite is the target CI
platform for speculative merges: its pipeline model maps cleanly onto the base/head separation in BuildRunner.Trigger — the pipeline applies the base layer first (potentially cacheable
across batches), then the head layer, then runs the full test suite.

Using environment variables to pass change URIs keeps the SQ→Buildkite contract simple and auditable; the pipeline script is responsible for materialising the diffs, isolating SQ from
Buildkite's pipeline YAML.

Tests

All three interface methods are tested against an in-process httptest server — no external dependencies or mocks:

Trigger: validates HTTP method, request payload (env var keys/values), URI flattening across multiple Change entries, and that a nil base produces [] (not null) in JSON
Status: state mapping — exhaustive table over all documented Buildkite states (creating, scheduled, running, blocked, passed, failed, not_run, skipped, canceling,
canceled) plus unknown future states (mapped to non-terminal Unknown so the poll loop keeps waiting rather than failing a batch on an unrecognised state); also validates build URL is
returned in metadata
Cancel: verifies the cancel endpoint is called; verifies 422 (already-terminal build) is treated as a no-op
Helpers: encode/parse round-trip, invalid build ID rejection

Implements buildrunner.BuildRunner and buildrunner.Factory backed by the Buildkite CI platform, under submitqueue/extension/buildrunner/buildkite. Design: - Trigger/Cancel return promptly (per the async/sync contract in the build-runner RFC): both enqueue work on a buffered channel and a background worker contacts Buildkite, keeping the orchestrator's queue loops decoupled from Buildkite availability. - The in-memory build-ID -> Buildkite-ref map is a pure latency cache, not the source of truth. The SQ build ID is stamped into the Buildkite build's metadata at create time; Status and Cancel re-derive the ref via a metadata-filtered build lookup on cache miss, so they remain correct after a process restart that empties the cache (no orphaned Accepted-forever builds). - The background worker retries transient create/cancel failures with backoff. If a submission exhausts its retries, the build is recorded as a submission failure and Status reports terminal Failed (with the reason in BuildMetadata["error"]) rather than polling Accepted forever. - mapState collapses Buildkite states into BuildStatus; unrecognised states map to the non-terminal Unknown (not Failed), so an unknown state does not terminally fail a batch. - Each job is dispatched to its own goroutine so a retrying submit does not head-of-line block other builds. Errors are returned plain (classification left to the controller, per core/errs). Includes config knobs (queue sizes, submit timeout, retry attempts/backoff, test overrides) and unit tests covering trigger/status/cancel, cache-miss recovery, retry-then-succeed, retry-exhaustion-fails, and state mapping. Known limitation: a process restart while a trigger job is still buffered (never submitted to Buildkite) leaves the build Accepted with nothing to recover; catching that belongs to an orchestrator-level Accepted deadline, which is out of scope for this change. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

… via HTTP transport - Replace New(Config) with NewBuildRunner(Params) following the MergeChecker Params pattern: Config holds queue-specific settings, HTTPClient is injected by the caller with auth and base URL already configured via transports - Remove APIToken from Config; auth is now the caller's concern via a transport layer (e.g. Authorization-header RoundTripper) - Remove BaseURL from Config and client; client now uses relative paths so BaseURLTransport on the injected client resolves them to absolute URLs - Delete factory.go; the Factory belongs in the example server (per-queue wiring is an application-level concern, not an extension concern)

Replace the channel-based async machinery (triggerCh, cancelCh, background worker goroutine, retry loop, submitFailures map) with direct synchronous Buildkite API calls in Trigger and Cancel. Errors propagate to the caller so the queue consumer can nack and retry through the normal path. Config is simplified to QueueName only; the channel buffer sizes, submit timeout, max attempts, and backoff fields are removed. Tests drop the drainTrigger/drainCancel helpers and async-specific cases in favour of straightforward call-and-assert patterns.

albertywu · 2026-06-05T15:24:41Z

+	return uris
+}
+
+// encodeBuildNumber encodes a Buildkite build number as the SQ build ID.


does this build id need to be namespaced somehow to prevent conflicts if there are multiple build systems used that might use the same id number?

I think it doesn't matter since inside the orchestrator they are all bounded by their batch ids, but we can always switch it to uuid if we want to fully avoid conflicts. Build numbers on BK are unique per pipeline.

albertywu · 2026-06-05T15:25:17Z

+// build number as the build ID. Errors are propagated to the caller so the
+// queue consumer can nack and retry.
+func (r *runner) Trigger(ctx context.Context, base, head []entity.Change, _ entity.BuildMetadata) (entity.BuildID, error) {
+	baseJSON, _ := json.Marshal(flattenURIs(base))


just curious (outside scope of this PR) -- will we actually apply multiple base changes, and if so why? This is different from SQV1 right?

v1 did apply multiple changes as bases, only the last change in the list was considered the "head change". I think distinctly in v2 we allow for more than a single test change to be applied to the batch, but the bases follow the same pattern as v1.

albertywu · 2026-06-05T15:26:38Z

+}
+
+// Status fetches the current state of the build from Buildkite and returns it
+// with the build URL in BuildMetadata["url"].


Can we check w/CI team to see if any concerns with the polling approach mainly w.r.t rate limits -- are we are planning on using polling in production or a webhook-based approach?

albertywu · 2026-06-05T15:29:45Z

+	// HTTPClient is a pre-configured HTTP client. The caller is responsible
+	// for the base URL (via httpclient.BaseURLTransport) and auth (via a
+	// transport layer). If nil, http.DefaultClient is used.
+	HTTPClient *http.Client


will we be able to use this extension as-is for our on-prem deployment or would we need one that does everything here except uses CI Gateway?

Will have to double check, this was just to have a reference implementation for OSS and just to test the design; I would likely have a completely separate internal one for ci-gw

JamyDev requested review from a team, behinddwalls and sbalabanov as code owners June 4, 2026 04:05

JamyDev added the ChangesPlanned label Jun 4, 2026

behinddwalls reviewed Jun 4, 2026

View reviewed changes

JamyDev added 3 commits June 4, 2026 11:42

fixup! move some more variables into the httpClient

a8a0678

behinddwalls reviewed Jun 5, 2026

View reviewed changes

Comment thread submitqueue/extension/buildrunner/buildkite/config.go Outdated

Comment thread submitqueue/extension/buildrunner/buildkite/buildkite.go Outdated

Comment thread submitqueue/extension/buildrunner/buildkite/buildkite.go Outdated

JamyDev and others added 4 commits June 4, 2026 20:24

fixup! reduce complexity

6e8c498

Merge branch 'main' into feat/buildkite-buildrunner

e80be59

fixup! gazelle

5ce5e8c

remove config

4228731

JamyDev removed the ChangesPlanned label Jun 5, 2026

behinddwalls reviewed Jun 5, 2026

View reviewed changes

Comment thread submitqueue/extension/buildrunner/buildkite/buildkite.go Outdated

behinddwalls approved these changes Jun 5, 2026

View reviewed changes

JamyDev enabled auto-merge June 5, 2026 04:57

JamyDev disabled auto-merge June 5, 2026 04:57

JamyDev and others added 2 commits June 4, 2026 21:59

fixup! error on no client or config

67c40d4

Merge branch 'main' into feat/buildkite-buildrunner

dc9f929

JamyDev added this pull request to the merge queue Jun 5, 2026

Merged via the queue into main with commit d2bc17c Jun 5, 2026
15 checks passed

JamyDev deleted the feat/buildkite-buildrunner branch June 5, 2026 05:05

JamyDev temporarily deployed to stack-rebase June 5, 2026 05:05 — with GitHub Actions Inactive

albertywu reviewed Jun 5, 2026

View reviewed changes

Conversation

JamyDev commented Jun 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What

Why

Tests

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

albertywu Jun 5, 2026

Choose a reason for hiding this comment

Uh oh!

JamyDev Jun 5, 2026

Choose a reason for hiding this comment

Uh oh!

albertywu Jun 5, 2026

Choose a reason for hiding this comment

Uh oh!

JamyDev Jun 5, 2026

Choose a reason for hiding this comment

Uh oh!

albertywu Jun 5, 2026

Choose a reason for hiding this comment

Uh oh!

albertywu Jun 5, 2026

Choose a reason for hiding this comment

Uh oh!

JamyDev Jun 5, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

JamyDev commented Jun 4, 2026 •

edited

Loading