Add fork mailbox payload API#267
Conversation
✱ Stainless preview builds for hypemanThis PR will update the Edit this comment to update it. It will appear in the SDK's changelogs. ✅ hypeman-openapi studio · code · diff
✅ hypeman-go studio · code · diff
✅ hypeman-typescript studio · code · diff
This comment is auto-generated by GitHub Actions and is automatically kept up to date as you push. |
dffc792 to
05bc363
Compare
…-payloads # Conflicts: # cmd/api/api/snapshots.go # cmd/api/api/snapshots_test.go # lib/forkvm/README.md # lib/guest/client.go # lib/instances/fork.go # lib/instances/guest_resume_network.go # lib/instances/manager.go # lib/instances/restore.go # lib/instances/restore_egress_test.go # lib/system/guest_agent/resume_network.go
|
Firetiger deploy monitoring skipped This PR didn't match the auto-monitor filter configured on your GitHub connection:
Reason: PR repository cannot be determined from provided information; please confirm this is in the To monitor this PR anyway, reply with |
rgarcia
left a comment
There was a problem hiding this comment.
blocking on a few maintainability/design issues:
-
lib/instances/fork.go:50andlib/instances/snapshot.go:390only gate mailbox usage on standby + running target, butlib/instances/fork_mailbox.go:225assumes a Firecracker-style rawmemorysnapshot file and fixed mailbox offsets. Non-Firecracker standby forks can accept mailbox requests and fail later inside restore. Please make this an explicit capability/hypervisor check before fork work begins, so the API rejects unsupported mailbox forks at the boundary. -
lib/instances/fork_mailbox.go:209duplicates the open/stat/mmap/find/write/cache logic already inlib/instances/guest_resume_network.go:186. This is the same mailbox-frame operation with a different layout. Please push the common primitive intolib/mailboxwith layout config, instead of maintaining two snapshot-memory patchers that can drift. Ideally the fork path should also preflight all requested markers before writing any payloads, so multi-mailbox patching is structurally all-or-nothing rather than relying on outer fork cleanup. -
lib/instances/guest_resume_network.go:143adds mailbox ACK matching as raw substring checks.mailbox=foocan matchmailbox=foobar, and any free-form UDP text containingstage=appliedis accepted. Since this ACK now gates successful fork restore, please parse the key/value payload and require exact fields forstageandmailbox.
rgarcia
left a comment
There was a problem hiding this comment.
- Suggestion, maybe too early: Unify the two handoffs into one interface + slice (main ask).
After this PR, restoreInstance prepares two handoffs back-to-back (restore.go:263 and :271) that now implement the identical trio — prepare… / Close() / AfterResume(ctx) error — and the two post-resume teardown blocks (restore.go:332-341 and :343-349) are line-for-line the same recovery (hv.Shutdown → rollbackAdmissionAllocationActive → releaseNetwork → wrapped return). That's the setup for:
type resumeHandoff interface {
AfterResume(ctx context.Context) error
Close()
}
handoffs := []resumeHandoff{resumeNetworkHandoff, forkMailboxHandoff}
for _, h := range handoffs { defer h.Close() }
// after resume, in order:
for _, h := range handoffs {
if err := h.AfterResume(ctx); err != nil { /* one shared teardown */ }
}
Order is preserved (network up before the guest can UDP-ack), so the slice is safe. Collapses the duplicated teardown and makes a third handoff one line. No guest changes needed.
-
Light suggestion: Collapse the duplicated UDP-wait machinery.
WaitApplied and WaitMailboxApplied (guest_resume_network.go:119 and :140) are methods on the same waiter, reading the same channel, with identical select loops — they differ only in the match predicate. Suggest one wait(ctx, match func(fields map[string]string) bool) and route both through parseUDPAckFields. That also removes the case-handling mismatch between them (one lowercases the whole string, the other only the keys). -
WriteForkMailboxPayloadAt is a thin wrapper used only by its own test.
mailbox.go:110 just calls WritePayloadAt(w, ForkLayout, …); production calls WritePayloadAt directly (fork_mailbox.go:272), and the only caller of the wrapper is TestWriteForkMailboxPayloadAt. Suggest deleting it and pointing the test at WritePayloadAt. -
AckTimeout validation is asymmetric + message is off.
fork_mailbox.go:99-104: the < 0 check is gated on WaitForAck but the > 30s check isn't, and "must be positive" is misleading since 0 is valid (maps to the 2s default). Suggest aligning the gating and changing the message to "must not be negative." -
Question on wait_for_ack failure semantics.
On a missing ack, a fork that resumed successfully gets torn down (restore.go:343). Two asymmetries vs the resume-network path worth confirming are intentional: that path falls back to host-initiated reconfigure (resume_network_handoff.go:81) rather than aborting, and its default timeout is 5s vs 2s here. Since the guest-side consumer of a fork mailbox is caller-implemented, wait_for_ack=true against a guest that doesn't ack will reliably destroy the fork after 2s — fine if that's the intended contract, but worth a note on the API field and a deliberate choice on the 2s vs 5s default.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 1778dc2. Configure here.
|
Closing this for now because we are dropping the mailbox direction. Leaving the branch intact for reference. |

Summary
Tests
Note
High Risk
Changes fork/restore orchestration and direct snapshot memory patching; incorrect markers or ack handling can fail forks or leave guests misconfigured, and the feature is limited to Firecracker standby→Running paths.
Overview
Adds optional named JSON mailboxes to instance and snapshot fork APIs so callers can inject per-fork configuration into guest memory before a forked standby VM resumes.
Fork requests accept a
mailboxesarray (name, guest-provided token, JSON payload, optionalwait_for_ack/ack_timeout_ms). The API layer maps these into domain types; Firecracker standby forks that target Running validate payloads (limits, duplicate names, JSON object shape) and patch matching markers in snapshot memory before restore. Whenwait_for_ackis set, the host injectsack_portand fails restore if the guest does not send a UDPstage=appliedack for that mailbox in time.lib/mailboxgains fork-specific layout/marker helpers and sharedFindMarker/WritePayloadAtused by resume-network and fork paths. Restore runs resume-network and fork-mailbox handoffs through a common post-resume hook; UDP ack parsing is stricter (field-based matching).Reviewed by Cursor Bugbot for commit f763c6e. Bugbot is set up for automated code reviews on this repo. Configure here.