Improve Firecracker fork concurrency by sjmiller609 · Pull Request #263 · kernel/hypeman

sjmiller609 · 2026-06-01T11:50:00Z

Summary

rewrites Firecracker snapshot-state data-dir paths for forks when an equal-length replacement is possible, including CRC recomputation
avoids the temporary source-dir alias on restore when the snapshot state can be rewritten directly
narrows alias locking to only the restore windows that still need a temporary source-dir alias
gates concurrent stopped/standby fork preparation behind SupportsConcurrentForkPrepare, with Firecracker opting in and other hypervisors keeping the previous exclusive source lock behavior

Tests

go test ./lib/hypervisor/firecracker ./lib/hypervisor ./lib/mailbox ./lib/guest ./lib/system/guest_agent ./lib/oapi -count=1
go test ./lib/instances -run 'TestForkInstanceStoppedSourceUsesReadLock|TestForkInstanceStoppedSourceRequiresWriteLockWithoutConcurrentPrepare' -count=1\n\n## Notes\n- the targeted ./lib/instances tests required local placeholder files for embedded test binaries that are not present in this checkout.

Note

Medium Risk
Changes fork, restore, and snapshot locking paths plus shared restore admission; behavior is well covered by tests but mistakes could cause rare restore/fork races or overload.

Overview
This PR improves Firecracker fork/restore concurrency and surfaces resource exhaustion more clearly on fork/restore APIs.

Firecracker fork prepare can now rewrite snapshot state file paths (with CRC64 validation) when source and target data-dir strings are the same length, so many forks avoid the temporary source data-dir symlink alias during restore. When rewrite is not possible, prepare reports RequiresSnapshotSourceAlias and restore still uses a narrow, shared alias lock (readers block on metadata/copy; writers only during the alias mutation window).

Instance fork uses a source read lock for stopped/standby sources when the hypervisor advertises SupportsConcurrentForkPrepare (Firecracker enables it); other hypervisors keep the exclusive source lock. Fork metadata admission is serialized with a final name check before save, and snapshot memory prepare is per snapshot directory.

Restore capacity is capped via configurable firecracker_max_concurrent_restores (default 32), with a shared slot pool across managers for that hypervisor/limit.

API maps ErrInsufficientResources to 409 with code insufficient_resources on restore, fork instance, and fork snapshot.

^{Reviewed by Cursor Bugbot for commit bb2602e. Bugbot is set up for automated code reviews on this repo. Configure here.}

firetiger-agent · 2026-06-02T14:42:10Z

Firetiger deploy monitoring skipped

This PR didn't match the auto-monitor filter configured on your GitHub connection:

PRs in the kernel, infra, hypeman, and hypeship repos. kernel is a ~mono repo with many logical services underneath, ensure to focus on the implicated service for the PR

Reason: PR repository is not specified, making it impossible to confirm it belongs to one of the allowed repos (kernel, infra, hypeman, hypeship); please clarify which repo this PR is in or opt in manually for deploy monitoring.

To monitor this PR anyway, reply with @firetiger monitor this.

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

^{Reviewed by Cursor Bugbot for commit ba08c4c. Configure here.}

This reverts commit a7bb3c5.

rgarcia

Did a read-through focused on the fork/restore concurrency changes. Overall the capability-gated approach is clean and the snapshot-state path rewrite is a nice way to avoid the symlink dance. Three things I'd like to discuss before approving — none are blockers on their own, but #1 and #2 are about long-term operability.

1. The symlink-alias fallback is load-bearing but invisible, and gives us two failure modes to reason about.

For the common instance→instance fork the rewrite is guaranteed to work (IDs are fixed-length cuid2, so guests/<id> source and target are always equal length). The fallback in recordSnapshotSourceAliasFallback really only matters for (a) symlink-resolved paths from EvalSymlinks and (b) diff-snapshot chains that retain an upstream base path (RetainSnapshotSourceDataDirAlias), where the alias is genuinely required for correctness.

My concern is operational: the alias path is slower and globally serialized, and right now it's entered silently. During an incident I don't want to have to reason about whether a host is hitting the rewrite path or the alias path. Two asks:

Emit a metric/log counter whenever the alias fallback fires, so "are we on the slow path?" is a one-query answer.
Consider making the non-rewritable case a hard error unless it's the legitimate retained-base diff-snapshot case. That way "alias path" only ever means the one well-understood scenario, instead of a catch-all.

If there's a path to giving diff-snapshot base reuse a stable fixed-length canonical path, we could eventually drop the alias entirely — but I don't think that needs to happen in this PR.

2. Firecracker-specific concept leaking into the generic instances manager.

This PR introduces the clean pattern for hypervisor-specific behavior — a capability plus supportsConcurrentForkPrepare(hvType) — and then doesn't use it for the restore limiter. acquireFirecrackerRestoreSlot hard-codes stored.HypervisorType != hypervisor.TypeFirecracker, and we now have a firecrackerRestoreSlots field on the generic manager, a firecrackerRestoreSlotPools package global, and FirecrackerMaxConcurrentRestores in the manager config. To stay consistent with the capability approach a few lines up, I'd rather see the concurrency limit advertised by the hypervisor (e.g. a MaxConcurrentRestores on Capabilities or typed config keyed by hypervisor.Type) and the manager keep a generic map[hypervisor.Type]chan struct{}, so manager.go/restore.go never name Firecracker directly. To be clear, the existence of a restore limiter is fine and probably inherent to Firecracker's restore cost — it's the naming and the hard-coded type switch in the generic layer I'd like to clean up.

3. ForkInstance's read-lock logic needs explanation.

The useReadLock block is a double-checked-locking pattern (load metadata under RLock, drop it, re-acquire and re-load to re-validate) and its correctness isn't self-evident. The escalation condition (useReadLock && sourceState == StateStandby && targetState == StateRunning && targetRestoreNeedsSourceLock) is also a dense boolean. Please add a comment explaining why we re-validate under the second lock and what each escalation condition guards against — this is exactly the kind of concurrency code the next person will be afraid to touch without context.

Also +1 on getting the Bugbot duplicate-name finding resolved (concurrent same-name forks both passing instanceNameExists/NameExists under the read lock) before merge, unless the serialized name-admission already covers it — if so, a comment pointing at where that's enforced would help.

rgarcia

Approving — the review feedback is well addressed:

Hypervisor leakage: clean refactor. acquireRestoreSlot(ctx, hvType) + restoreSlotsByHypervisor map[hypervisor.Type]chan struct{} + MaxConcurrentRestoresByHypervisor, with the only Firecracker-specific wiring now living in providers.go. Exactly the capability-consistent shape I was hoping for. (Tiny nit, non-blocking: defaultManagerFirecrackerMaxConcurrentRestores is now applied as the generic per-hypervisor default, so the name reads a little oddly.)
Alias fallback visibility: the WarnContext lines in both fork paths are enough to answer "are we on the slow path?" during an incident. A counter metric would be a nice future add but I don't consider it blocking.
useReadLock docs: the three added comments make the double-checked-lock + escalation logic readable. Thank you.
Duplicate fork names: agreed that saveForkMetadata under forkMetadataMu with the re-check before save is the right serialized admission point.

Two open Bugbot threads still look valid to me and don't appear to be touched by 32b62db. Approving so as not to block, but I'd like these resolved (or explicitly dispositioned) before merge:

VsockCID collision on concurrent standby forks. fork.go still assigns standby forks the source's CID (forkMeta.VsockCID = stored.VsockCID) while stopped forks get a fresh generateVsockCID. With the new read-lock concurrency, multiple standby forks of one source can be prepared in parallel and then started to running, all sharing the source CID → vsock routing / guest-agent breakage. The comment says rewriting CID in restored snapshots isn't reliable across hypervisors, which I get — but then we likely need to either reallocate the CID at restore time for forks or serialize "running" admission per source CID. Can you confirm what prevents two running guests with the same CID here?
Stale source state in the escalated write-lock restore. applyForkTargetStateWithSourceLock re-takes the source write lock, but the targetRestoreNeedsSourceLock / sourceState == StateStandby decision was captured under an earlier RLock that was already released. Nothing re-validates the source state under the write lock before the aliased restore runs. If the source was restored to running in that gap, we could alias its data dir while its VMM is live. Does holding the source write lock actually guarantee the source isn't running by this point, or should we re-load + re-check source state under the write lock before aliasing?

Neither is a correctness concern for the common (rewrite-succeeds, single-fork) path, so I'm comfortable approving — but I'd treat #2 as the higher-risk of the two given it's the data-dir-corruption path.

sjmiller609 mentioned this pull request Jun 1, 2026

Parallelize Firecracker fork restore hot path #258

Closed

sjmiller609 force-pushed the hypeship/fc-fork-concurrency-v2 branch from cd4377a to c1b06c9 Compare June 1, 2026 19:55

sjmiller609 changed the base branch from hypeship/fc-resume-on-load-v2 to main June 1, 2026 19:55

sjmiller609 commented Jun 2, 2026

View reviewed changes

Comment thread lib/hypervisor/firecracker/snapshot_state.go

sjmiller609 commented Jun 2, 2026

View reviewed changes

Comment thread lib/hypervisor/firecracker/snapshot_state.go

sjmiller609 commented Jun 2, 2026

View reviewed changes

Comment thread lib/hypervisor/firecracker/snapshot_state_fork.go

sjmiller609 commented Jun 2, 2026

View reviewed changes

Comment thread lib/hypervisor/hypervisor.go

sjmiller609 marked this pull request as ready for review June 2, 2026 14:42

cursor Bot reviewed Jun 2, 2026

View reviewed changes

Comment thread lib/instances/snapshot_alias_lock.go

cursor Bot reviewed Jun 2, 2026

View reviewed changes

Comment thread lib/instances/snapshot_alias_lock.go

cursor Bot reviewed Jun 2, 2026

View reviewed changes

Comment thread lib/hypervisor/firecracker/snapshot_state_fork.go Outdated

sjmiller609 commented Jun 2, 2026

View reviewed changes

Comment thread config.example.yaml

cursor Bot reviewed Jun 2, 2026

View reviewed changes

Comment thread lib/instances/manager.go

cursor Bot reviewed Jun 2, 2026

View reviewed changes

Comment thread lib/instances/snapshot_alias_lock.go

cursor Bot reviewed Jun 2, 2026

View reviewed changes

Comment thread lib/network/bridge_linux.go