
Parallelize Firecracker fork restore hot path #258

Closed
sjmiller609 wants to merge 3 commits into codex/uffd-snapshot-pager from codex/uffd-fork-parallel-latency

Conversation

@sjmiller609
Collaborator

Summary

  • remove the source snapshot alias write lock from the fork restore path
  • add detailed network allocation tracing
  • reserve network identity under a short mutex, create TAP devices outside it, and defer TC shaping to a serialized async path
  • cache initialized default network metadata so burst forks do not repeatedly query bridge state
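
For illustration, the reserve / create / defer-shaping split from the third bullet could look roughly like the sketch below. This is not the code in this PR: `Allocator`, `reserve`, `createTAP`, and `tcWorker` are hypothetical names, and the TAP and TC steps are placeholders that do no real network work. It only shows the locking shape: a short mutex around identity reservation, device creation outside the lock, and a single worker that serializes TC shaping off the response path.

```go
package main

import (
	"fmt"
	"sync"
)

// Allocator is a hypothetical stand-in for the network allocator.
type Allocator struct {
	mu      sync.Mutex      // guards nextID and inUse only
	nextID  int
	inUse   map[string]bool
	tcQueue chan string     // consumed by one worker, so shaping is serialized
	tcDone  sync.WaitGroup  // tracks queued shaping jobs
	shaped  []string        // written only by tcWorker
}

func NewAllocator() *Allocator {
	a := &Allocator{inUse: map[string]bool{}, tcQueue: make(chan string, 64)}
	go a.tcWorker()
	return a
}

// reserve picks a unique device name while holding the mutex briefly.
func (a *Allocator) reserve() string {
	a.mu.Lock()
	defer a.mu.Unlock()
	a.nextID++
	name := fmt.Sprintf("tap%d", a.nextID)
	a.inUse[name] = true
	return name
}

// createTAP is a placeholder for the slow device creation step.
func (a *Allocator) createTAP(name string) error { return nil }

// Allocate reserves an identity, creates the device without holding the
// mutex, then queues TC shaping so the caller does not wait for it.
func (a *Allocator) Allocate() (string, error) {
	name := a.reserve()
	if err := a.createTAP(name); err != nil {
		a.mu.Lock()
		delete(a.inUse, name) // roll back the reservation
		a.mu.Unlock()
		return "", err
	}
	a.tcDone.Add(1)
	a.tcQueue <- name // blocks only if the buffered queue is full
	return name, nil
}

// tcWorker applies shaping one device at a time (placeholder body).
func (a *Allocator) tcWorker() {
	for name := range a.tcQueue {
		a.shaped = append(a.shaped, name)
		a.tcDone.Done()
	}
}

// WaitTC blocks until every queued shaping job has run.
func (a *Allocator) WaitTC() { a.tcDone.Wait() }

func main() {
	a := NewAllocator()
	var wg sync.WaitGroup
	for i := 0; i < 25; i++ { // 25-way burst, as in the benchmark
		wg.Add(1)
		go func() {
			defer wg.Done()
			a.Allocate()
		}()
	}
	wg.Wait()
	a.WaitTC()
	fmt.Println("shaped devices:", len(a.shaped))
}
```

The trade-off in this shape is that a fork can return before its shaping has been applied, so anything that depends on rate limits being in place would need to wait on the queue explicitly.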

Benchmarks

  • 25-way browser fork/CDP burst before async TC: fork avg ~2.96s, p50 ~2.80s, p95 ~5.50s
  • traced pre-async TC sample: fork avg ~3.42s, p50 ~3.23s, p95 ~6.07s
  • traced async-TC sample: fork avg ~1.65s, p50 ~1.27s, p95 ~3.14s; TC queue moved off the fork response path
  • non-OTEL async-TC warm sample: fork avg ~1.74s, p50 ~1.70s, p95 ~3.02s
  • a later non-OTEL sample on the same host regressed to ~3.94s avg while host load was ~112, so the concurrency benchmark is currently sensitive to host load

Testing

  • go test ./lib/network ./lib/hypervisor/firecracker ./lib/instances ./cmd/api/api -run 'TestPrepareFork|TestWithSnapshotSourceDirAlias|TestGetInstanceWaitsDuringSnapshotSourceAlias|TestForkInstanceStoppedSourceUsesReadLock|TestForkInstance_InsufficientResources|TestForkInstance|TestAllocateUniqueMAC|TestGenerate|TestIncrement|TestFormatTcRate|TestNameExists|TestCreateAllocation|TestPendingAllocation|TestDefaultNetworkCache'
  • remote Linux: go test ./lib/network ./lib/instances ./cmd/api/api -run 'TestForkInstance_InsufficientResources|TestForkInstanceStoppedSourceUsesReadLock|TestGetInstanceWaitsDuringSnapshotSourceAlias|TestAllocateUniqueMAC|TestIncrement|TestFormatTcRate|TestNameExists|TestCreateAllocation|TestPendingAllocation|TestDefaultNetworkCache'

Note: the full local package run also hit an unrelated macOS environment failure: mkfs.ext4 is not installed, and an API volume test requires it.

@sjmiller609
Collaborator Author

Closing in favor of the split replacement PRs: #263 for Firecracker fork concurrency and #266 for network/TAP/TC parallelization. Keeping the old branch around for comparison.

@sjmiller609 sjmiller609 closed this Jun 1, 2026