ci: reduce unit test flakiness and shard re-run cost #3844

Open
nicktrn wants to merge 1 commit into main from ci/speed-up-flaky-unit-tests-tri-10485

nicktrn (Collaborator) commented Jun 5, 2026

A unit-test shard recently failed on a timing race rather than a real regression: a run-engine waitpoint test sleeps 1250ms waiting on a 1000ms timeout that is processed by a ~1000ms worker poll, so on a CPU-starved shard the margin evaporates and the whole matrix goes red. Because fail-fast defaults to on, that one flake cancels the sibling shards, and the only recovery is re-running the entire matrix "just to be sure", which is itself slow.
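To make the race concrete, here is a rough sketch of the timing budget. It assumes a deliberately simple model that is not taken from the run-engine source: the timeout becomes due at `timeoutMs`, and a worker polling every `pollIntervalMs` picks it up somewhere between immediately and one full poll interval later. The names `TimingBudget` and `slackMs` are hypothetical.

```typescript
// Simplified model (an assumption, not the actual run-engine logic):
// the timeout is due at `timeoutMs`, and the polling worker processes it
// between 0 and `pollIntervalMs` after that point.
interface TimingBudget {
  sleepMs: number;        // how long the test sleeps before asserting
  timeoutMs: number;      // when the waitpoint timeout becomes due
  pollIntervalMs: number; // how often the worker checks for due work
}

// Slack left when the test wakes up; a negative value means the assertion
// can run before the worker has processed the timeout.
function slackMs(b: TimingBudget): { bestCase: number; worstCase: number } {
  return {
    bestCase: b.sleepMs - b.timeoutMs,
    worstCase: b.sleepMs - (b.timeoutMs + b.pollIntervalMs),
  };
}

// Numbers from the description above.
const waitpointSlack = slackMs({
  sleepMs: 1250,
  timeoutMs: 1000,
  pollIntervalMs: 1000,
});
// bestCase: 250, worstCase: -750
```

Under this model the test has at most 250ms of headroom even when the poll lands perfectly, so any scheduling delay on a busy shard can push it over.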

This is the low-risk first pass at that pain:

  • fail-fast: false on the webapp and internal shard matrices, so one flaky shard no longer cancels its siblings. "Re-run failed jobs" now re-runs just the failed shard instead of the whole matrix.
  • CI-scoped retry: process.env.CI ? 2 : 0 on the timing-sensitive packages (run-engine, redis-worker, schedule-engine). Flakes self-heal in CI; local runs stay at retry: 0 so they still surface in dev. A stopgap until the timing tests are made deterministic.
  • fetch-depth: 1 on the unit-test checkouts - they don't use git history, so the full clone was wasted setup time across ~20 jobs.
  • Reconcile the pre-pull image tags with what testcontainers actually pulls (redis:7-alpine -> redis:7.2, ryuk:0.11.0 -> ryuk:0.14.0) and add minio/minio:latest to the webapp pre-pull. Otherwise those images pull unauthenticated at test time and risk Docker Hub rate-limit flakes (worst on fork PRs, where the authenticated pre-pull is skipped entirely).
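The CI-scoped retry from the list above can be sketched as follows. This is a minimal illustration rather than the actual diff: `ciRetryCount`, `testOptions`, and the `Env` type are hypothetical names, and the environment is passed in as a parameter so the snippet runs on its own.

```typescript
// Minimal env shape so the sketch does not depend on Node's typings.
type Env = Record<string, string | undefined>;

// Hypothetical helper; the PR description inlines the ternary instead.
function ciRetryCount(env: Env): number {
  // Any non-empty CI value counts as "running in CI".
  // Note that the string "false" is also truthy here.
  return env.CI ? 2 : 0;
}

// The fragment that would sit under `test` in a vitest.config.ts.
function testOptions(env: Env): { retry: number } {
  return { retry: ciRetryCount(env) };
}

const inCi = testOptions({ CI: "true" }); // { retry: 2 }
const local = testOptions({});            // { retry: 0 }
```

In a real config the same expression would presumably appear inline, for example `defineConfig({ test: { retry: process.env.CI ? 2 : 0 } })` with `defineConfig` imported from `vitest/config`.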

Deeper follow-ups - bigger runners, turbo remote cache, runtime-weighted sharding, and the real root-cause fix (container reuse / template-DB isolation + deterministic timing tests) - are tracked under TRI-10484.

nicktrn self-assigned this Jun 5, 2026
changeset-bot Bot commented Jun 5, 2026

⚠️ No Changeset found

Latest commit: f380957

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

coderabbitai Bot (Contributor) commented Jun 5, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 5bbf409f-8942-4e2d-8678-e23269eb4fdb

📥 Commits

Reviewing files that changed from the base of the PR and between 7c70982 and f380957.

📒 Files selected for processing (6)
  • .github/workflows/unit-tests-internal.yml
  • .github/workflows/unit-tests-packages.yml
  • .github/workflows/unit-tests-webapp.yml
  • internal-packages/run-engine/vitest.config.ts
  • internal-packages/schedule-engine/vitest.config.ts
  • packages/redis-worker/vitest.config.ts
✅ Files skipped from review due to trivial changes (1)
  • internal-packages/schedule-engine/vitest.config.ts
🚧 Files skipped from review as they are similar to previous changes (5)
  • .github/workflows/unit-tests-packages.yml
  • packages/redis-worker/vitest.config.ts
  • internal-packages/run-engine/vitest.config.ts
  • .github/workflows/unit-tests-internal.yml
  • .github/workflows/unit-tests-webapp.yml
📜 Recent review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
  • GitHub Check: internal / 🧪 Unit Tests: Internal (8, 8)
  • GitHub Check: internal / 🧪 Unit Tests: Internal (4, 8)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (6, 8)
  • GitHub Check: internal / 🧪 Unit Tests: Internal (7, 8)
  • GitHub Check: internal / 🧪 Unit Tests: Internal (2, 8)
  • GitHub Check: internal / 🧪 Unit Tests: Internal (6, 8)
  • GitHub Check: internal / 🧪 Unit Tests: Internal (5, 8)
  • GitHub Check: internal / 🧪 Unit Tests: Internal (3, 8)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (4, 8)
  • GitHub Check: e2e-webapp / 🧪 E2E Tests: Webapp
  • GitHub Check: typecheck / typecheck
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (1, 8)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (5, 8)
  • GitHub Check: internal / 🧪 Unit Tests: Internal (1, 8)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (2, 8)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (7, 8)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (8, 8)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (3, 8)
  • GitHub Check: packages / 🧪 Unit Tests: Packages (1, 1)
  • GitHub Check: Build and publish previews

Walkthrough

This PR updates CI infrastructure and test configuration across the codebase. GitHub Actions unit test workflows now disable matrix fail-fast where set, use shallow repository clones (fetch-depth: 1) for unitTests and merge-reports jobs, and pre-pull updated Docker images (redis:7.2, testcontainers/ryuk:0.14.0, plus minio in webapp). Three vitest config files add a CI-only retry setting (2 retries when running under CI, otherwise 0).
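One way to picture the image reconciliation is as a set difference between what the workflow pre-pulls and what the tests pull at run time. The sketch below is illustrative only: `notPrePulled` is a hypothetical helper, and the two lists are reconstructed from the PR description (the `testcontainers/` prefix on the old ryuk tag is an assumption), not read from the workflow files.

```typescript
// Returns the images the tests would pull that the workflow did not
// pre-pull, i.e. the ones fetched unauthenticated at test time.
function notPrePulled(
  prePulled: readonly string[],
  pulledByTests: readonly string[],
): string[] {
  const cached = new Set(prePulled);
  return pulledByTests.filter((image) => !cached.has(image));
}

// Assumed state of the webapp workflow before this PR.
const pulledByTests = [
  "redis:7.2",
  "testcontainers/ryuk:0.14.0",
  "minio/minio:latest",
];
const missingBefore = notPrePulled(
  ["redis:7-alpine", "testcontainers/ryuk:0.11.0"],
  pulledByTests,
);

// After the PR the pre-pull list matches what the tests pull.
const missingAfter = notPrePulled(pulledByTests, pulledByTests);
```

With tags compared as exact strings, `redis:7-alpine` does not cover `redis:7.2`, which is why a stale pre-pull list gives no protection against rate limits.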

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately summarizes the main changes: reducing unit test flakiness and shard re-run costs through CI optimizations and configuration adjustments. |
| Description check | ✅ Passed | The PR description is comprehensive and addresses all key sections, providing clear context about the problem, the multi-faceted solution, and future work. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

devin-ai-integration Bot (Contributor) left a comment

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 4 additional findings.

nicktrn force-pushed the ci/speed-up-flaky-unit-tests-tri-10485 branch from 7c70982 to f380957 on June 5, 2026 15:01
pkg-pr-new Bot commented Jun 5, 2026

@trigger.dev/build

npm i https://pkg.pr.new/@trigger.dev/build@f380957

trigger.dev

npm i https://pkg.pr.new/trigger.dev@f380957

@trigger.dev/core

npm i https://pkg.pr.new/@trigger.dev/core@f380957

@trigger.dev/plugins

npm i https://pkg.pr.new/@trigger.dev/plugins@f380957

@trigger.dev/python

npm i https://pkg.pr.new/@trigger.dev/python@f380957

@trigger.dev/react-hooks

npm i https://pkg.pr.new/@trigger.dev/react-hooks@f380957

@trigger.dev/redis-worker

npm i https://pkg.pr.new/@trigger.dev/redis-worker@f380957

@trigger.dev/rsc

npm i https://pkg.pr.new/@trigger.dev/rsc@f380957

@trigger.dev/schema-to-json

npm i https://pkg.pr.new/@trigger.dev/schema-to-json@f380957

@trigger.dev/sdk

npm i https://pkg.pr.new/@trigger.dev/sdk@f380957

commit: f380957
