feat(server): support out-of-tree compute drivers via --compute-driver-socket#1703
Open
st-gr wants to merge 5 commits into
Open
feat(server): support out-of-tree compute drivers via --compute-driver-socket#1703st-gr wants to merge 5 commits into
st-gr wants to merge 5 commits into
Conversation
Carries the UDS path supplied by --compute-driver-socket. Drops Copy from the enum derive (PathBuf is not Copy); existing callers use Clone or owned values. FromStr accepts 'external:<path>' and rejects bare 'external' with a message pointing at the CLI flag. Signed-off-by: st-gr <38470677+st-gr@users.noreply.github.com>
Adds an opt-in `--compute-driver-socket=PATH` flag (env `OPENSHELL_COMPUTE_DRIVER_SOCKET`) on the gateway's `RunArgs`. When set, the gateway pins `ComputeDriverKind::External(<path>)` and skips both the `--drivers` list and the auto-detection probe. This lets out-of-tree driver binaries (Kyma, custom backends) connect to a stock gateway without a rebuild. `effective_single_driver` and the `Config.compute_drivers` payload both honour the new flag so pre-runtime checks and the runtime factory dispatch agree on the configured driver. The companion dispatch arm in `lib.rs::build_compute_runtime` is wired in the follow-up commit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: st-gr <38470677+st-gr@users.noreply.github.com>
`build_compute_runtime` now connects a tonic `Channel` to the configured Unix domain socket using `hyper-util`'s `TokioIo` connector and wraps it in `RemoteComputeDriver` -- the same proxy used by the VM driver. Replaces the placeholder `External(_) => Err(...)` arm. Adds two helpers in `compute/mod.rs`: * `connect_external_compute_driver(socket_path)` -- a small tonic-Endpoint + tower::service_fn + UnixStream connector, parallel to the one in `compute::vm` but with no VM-specific logging or capability probing. Out-of-tree drivers manage their own readiness; the gateway just dials. * `ComputeRuntime::new_remote_external(channel, ...)` -- mirrors `new_remote_vm` but takes no `ManagedDriverProcess`. The external driver's lifecycle is the operator's responsibility (systemd unit, sidecar container, etc.). Smoke-tested: `--compute-driver-socket /tmp/nonexistent.sock` now starts the gateway, logs "Connecting to external compute driver", and fails with a clear "failed to connect to external compute driver socket '<path>': transport error" message that points at the new arm. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: st-gr <38470677+st-gr@users.noreply.github.com>
…uteDriverKind Adds three short inline comments at the points where the non-`Copy`-ness of `ComputeDriverKind` becomes visible to readers, so reviewers don't have to ask "why was Copy removed?" or "why clone() instead of *deref?": - `crates/openshell-core/src/config.rs`: comment above the enum derive explaining the `Copy` and `const fn as_str(self)` revert was forced by the `External(PathBuf)` variant, and that all in-tree call sites already handle `Clone`. - `crates/openshell-server/src/cli.rs`: comment at the `effective_single_driver` match arm noting why we clone instead of deref. - `crates/openshell-server/src/lib.rs`: same comment shape at the `configured_compute_driver` match arm. Squash into the corresponding earlier commit if preferred — split out as its own commit to make the rationale grep-able. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: st-gr <38470677+st-gr@users.noreply.github.com>
Adds a row to the Runtime Summary table describing the new
`--compute-driver-socket=<path>` flag and how the External arm relates
to the existing in-tree drivers (operator-managed driver process,
Unix domain socket, identical proto contract to the VM driver, same
trust boundary).
Keeps the rest of the document untouched — the per-runtime
implementation notes (e.g. which driver crate README to read) are
intentionally NOT extended for `External` because external drivers
ship their own documentation in their own repos. The reference
implementation lives at:
https://github.com/st-gr/openshell-driver-kyma
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: st-gr <38470677+st-gr@users.noreply.github.com>
|
All contributors have signed the DCO ✍️ ✅ |
Author
|
I have read the DCO document and I hereby sign the DCO. |
Author
|
recheck |
Collaborator
|
/ok to test 8689967 |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Adds an opt-in
--compute-driver-socket=<path>flag (envOPENSHELL_COMPUTE_DRIVER_SOCKET) to the gateway. When set, the gateway dispatches sandbox lifecycle to an out-of-tree compute driver speaking the existingcompute_driver.protocontract over a Unix domain socket, instead of one of the four built-in drivers (Kubernetes / Podman / Docker / VM).Related Issue
Supersedes the closed draft #1604 (same author, same patch shape — re-opened as a non-draft after rebasing onto current
main, with all commits DCO-signed and st-gr-attributed). Same direction ascheese-head's vouch in #1345 ("extend or provide new out-of-tree compute drivers to OpenShell").No upstream tracking issue filed — happy to file one if reviewers prefer.
Changes
crates/openshell-core/src/config.rs: addsExternal(PathBuf)variant toComputeDriverKind. Drops theCopyderive andconst fn as_str(self)(both incompatible withPathBuf).FromStracceptsexternal:<path>(case-insensitive prefix) and rejects bareexternalwith a message pointing at the CLI flag.crates/openshell-server/src/cli.rs: adds--compute-driver-socket=<path>flag (envOPENSHELL_COMPUTE_DRIVER_SOCKET) onRunArgs. When set, pinsComputeDriverKind::External(<path>)and skips both--driversand the auto-detection probe.crates/openshell-server/src/config_file.rs: surfaces the new field through the config-file path.crates/openshell-server/src/lib.rs/crates/openshell-server/src/compute/mod.rs: inbuild_compute_runtime, theExternal(<path>)arm connects atonic::Channelto the UDS viahyper-util::TokioIoand wraps it inRemoteComputeDriver— the same proxy used by the in-tree VM driver.architecture/compute-runtimes.md: adds an "External" row to the Runtime Summary table, describing the trust boundary (operator-owned UDS file permissions) and the activation flag.Testing
mise run pre-commitpasses — not run end-to-end (misenot on author's dev environment), but the equivalent rust pieces verified independently:cargo fmt --all -- --checkclean;cargo clippy --no-deps -p openshell-core -p openshell-server --all-targets -- -D warningsclean.compute_driver_kind_*tests incrates/openshell-core/src/config.rs::testscover the new variant: parsesexternal:<path>, rejects bareexternaland empty path, displays asexternal:<path>, round-trips throughFromStr+Display, and is case-insensitive on the prefix. Plus 3 CLI-arg tests for the new flag (presence, override-of-drivers, env-var binding).Externaldriver in CI would require shipping a stub driver binary; deferring until at least one in-tree consumer wants to test against it. The patch is exercised in production at SAP via thest-gr/openshell-driver-kymareference driver running against a Gardener-managed Kyma cluster since 2026-05-28 (sandbox CR creation,openshell sandbox exec,openshell sandbox upload/downloadall work).Checklist
feat(core):,feat(server):,docs(core,server):,docs(arch):)architecture/compute-runtimes.md::Runtime Summary. Per-runtime implementation-notes section is intentionally NOT extended forExternalbecause external drivers ship their own documentation; the reference implementation lives atst-gr/openshell-driver-kyma.Notes for reviewers
Out of scope (intentional):
External(PathBuf)+ UDS dispatch). Operators bring their own driver process and point--compute-driver-socketat its socket path.compute_driver.protochanges. The existing protocol is reused verbatim — the same one already used by the in-tree VM driver to talk to its tonic peer.--drivers vmposture.hyper-utilis already in the workspaceCargo.toml; the patch reuses it via the sameTokioIoconnector path the VM driver uses.Operator context:
The forked gateway image at
ghcr.io/st-gr/openshell-gateway(carrying these patches plus an in-treeDockerfile.gateway) has been driving production-shaped Kubernetes clusters with thest-gr/openshell-driver-kymacompute driver since 2026-05-28. The flag has needed zero changes since it landed — the API shape is stable. Merging this PR lets the driver consume the unmodified upstream gateway image, retiring the fork.