[SPARK-56413][SPARK-56661][UDF][BUILD] Confine gRPC to a dedicated udf-worker-grpc module #56273
Closed
haiyangsun-db wants to merge 4 commits into
Conversation
cloud-fan approved these changes on Jun 2, 2026
Contributor: thanks, merging to master/4.x!
cloud-fan pushed a commit that referenced this pull request on Jun 3, 2026: …f-worker-grpc module
What changes were proposed in this pull request?

This PR extracts the gRPC-based UDF worker transport into a new `udf/worker/grpc` Maven/SBT module, sibling to the existing `udf/worker/proto` and `udf/worker/core` modules, so that gRPC is no longer pulled onto the shared Spark classpath.
Concretely:
- **New module `spark-udf-worker-grpc`** — generates the gRPC service stubs (`UdfWorkerGrpc`) from the `.proto` definitions in `udf-worker-proto` (`compile-custom` / grpc-java only), and owns the gRPC runtime dependencies (`grpc-api`, `grpc-protobuf`, `grpc-stub`, plus `grpc-inprocess` for tests).
- **`udf-worker-proto`** now generates only protobuf-java message classes (dropped the grpc-java codegen goal and the `grpc-*` dependencies).
- **`udf-worker-core`** no longer depends on gRPC (the `grpc-inprocess` test dependency was removed).
- **`EchoProtocolSuite`** (the gRPC protocol test) moved from `udf-worker-core` to the new `udf-worker-grpc` module and re-packaged to `org.apache.spark.udf.worker.grpc`.
- Registered the module in the root `pom.xml` and in `project/SparkBuild.scala` (new `udfWorkerGrpc` project, `UDFWorkerGrpc` settings for grpc-stub-only codegen, and `UDFWorkerProto` restricted to message-only codegen).
- Regenerated `dev/deps/spark-deps-hadoop-3-hive-2.3`, which drops `grpc-api`, `grpc-protobuf`, `grpc-protobuf-lite`, `grpc-stub`, `proto-google-common-protos`, `animal-sniffer-annotations`, and `error_prone_annotations` from the assembly classpath.
Module dependency shape after this change:
```
udf-worker-proto (protobuf-java messages only)
^ ^
| |
core/catalyst/sql-core -- use message types + worker abstractions (NO gRPC)
|
udf-worker-core (worker abstractions, no gRPC)
^
|
udf-worker-grpc (gRPC service stubs + gRPC runtime -- confined here)
```
Why are the changes needed?

Introducing the language-agnostic UDF worker framework made `spark-udf-worker-proto`/`-core` compile dependencies of `core`, `catalyst`, and `sql/core`. Because the proto module carried the gRPC stack as compile-scope dependencies (needed to compile its generated gRPC service stubs), this dragged `grpc-api`, `grpc-protobuf{,-lite}`, `grpc-stub`, and `proto-google-common-protos` transitively onto the widely-shared Spark core/assembly classpath. Spark has historically kept gRPC isolated to Spark Connect (relocated/shaded) to avoid `io.grpc`/protobuf version clashes on that classpath.
No code on the runtime classpath actually uses the gRPC stubs yet (only `EchoProtocolSuite` did, a test). Confining gRPC to its own module removes the unnecessary footprint from `core`/`catalyst`/`sql-core` while keeping the framework's message types and worker abstractions available to them.
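The transitive drag described above can be illustrated with a small reachability check. This is only a sketch under stated assumptions: the two dependency maps are a simplified, hand-written model of the module shape before and after the change (they are not read from the real `pom.xml` files), and the class and method names (`ModuleGraphSketch`, `transitiveDeps`, `pullsInGrpc`) are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, simplified model of the module graph; not derived from the real build files.
class ModuleGraphSketch {
    // Before the split: the proto module carries the gRPC stack at compile scope.
    static final Map<String, List<String>> BEFORE = Map.of(
        "core", List.of("udf-worker-proto", "udf-worker-core"),
        "udf-worker-core", List.of("udf-worker-proto"),
        "udf-worker-proto", List.of("protobuf-java", "grpc-api", "grpc-protobuf", "grpc-stub"));

    // After the split: only udf-worker-grpc depends on the gRPC artifacts.
    static final Map<String, List<String>> AFTER = Map.of(
        "core", List.of("udf-worker-proto", "udf-worker-core"),
        "udf-worker-core", List.of("udf-worker-proto"),
        "udf-worker-proto", List.of("protobuf-java"),
        "udf-worker-grpc", List.of("udf-worker-core", "grpc-api", "grpc-protobuf", "grpc-stub"));

    // Collects everything reachable from root by following dependency edges.
    static Set<String> transitiveDeps(Map<String, List<String>> graph, String root) {
        Set<String> seen = new TreeSet<>();
        Deque<String> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            String node = stack.pop();
            for (String dep : graph.getOrDefault(node, List.of())) {
                if (seen.add(dep)) {
                    stack.push(dep);
                }
            }
        }
        return seen;
    }

    // True if any artifact named grpc-* is reachable from root.
    static boolean pullsInGrpc(Map<String, List<String>> graph, String root) {
        return transitiveDeps(graph, root).stream().anyMatch(d -> d.startsWith("grpc-"));
    }

    public static void main(String[] args) {
        System.out.println("before: core pulls in gRPC? " + pullsInGrpc(BEFORE, "core"));
        System.out.println("after:  core pulls in gRPC? " + pullsInGrpc(AFTER, "core"));
        System.out.println("after:  udf-worker-grpc deps = "
            + transitiveDeps(AFTER, "udf-worker-grpc"));
    }
}
```

In this model, `core` reaches the `grpc-*` artifacts only through `udf-worker-proto`, so moving the stub generation and gRPC dependencies into a separate leaf module is enough to take them off the path from `core`.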
Does this PR introduce any user-facing change?

No. This is a build/module reorganization; the affected UDF worker framework is experimental and not yet consumed at runtime.
How was this patch tested?

- Existing tests, relocated: `EchoProtocolSuite` now runs under `udf-worker-grpc`.
- Verified with SBT that `udf-worker-grpc/Test`, `udf-worker-core/Test`, `catalyst`, `core`, and `sql` compile, and confirmed the codegen split on disk (proto -> `generated-sources/protobuf/java` messages only; grpc -> `generated-sources/protobuf/grpc-java/UdfWorkerGrpc.java`).
- Regenerated and validated the dependency manifest via `./dev/test-dependencies.sh --replace-manifest`.
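As a complement to the manifest regeneration, a quick check along the following lines could confirm that the dropped artifacts no longer appear in the manifest. It is a minimal sketch, not part of the PR: the class name `ManifestGrpcCheck` is hypothetical, and it assumes each manifest line begins with the artifact ID followed by `/`, which should be verified against the actual `dev/deps/spark-deps-hadoop-3-hive-2.3` file.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper; assumes manifest lines look like "<artifactId>/<rest of line>".
class ManifestGrpcCheck {
    // Artifacts the PR description lists as dropped from the assembly classpath.
    static final Set<String> DROPPED = Set.of(
        "grpc-api", "grpc-protobuf", "grpc-protobuf-lite", "grpc-stub",
        "proto-google-common-protos", "animal-sniffer-annotations",
        "error_prone_annotations");

    // Returns the dropped artifacts that still appear in the manifest, in file order.
    static List<String> stillPresent(List<String> manifestLines) {
        List<String> found = new ArrayList<>();
        for (String line : manifestLines) {
            String trimmed = line.trim();
            int slash = trimmed.indexOf('/');
            if (slash <= 0) {
                continue; // blank line or unexpected format: skip it
            }
            String artifactId = trimmed.substring(0, slash);
            if (DROPPED.contains(artifactId)) {
                found.add(artifactId);
            }
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: ManifestGrpcCheck <path-to-manifest>");
            return;
        }
        List<String> leftovers = stillPresent(Files.readAllLines(Paths.get(args[0])));
        System.out.println(leftovers.isEmpty()
            ? "no dropped artifacts found in manifest"
            : "still present: " + leftovers);
    }
}
```

Matching on the exact artifact ID rather than a `grpc` substring keeps the check from flagging unrelated entries; the trade-off is that the list has to be maintained by hand if the set of gRPC dependencies changes.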
Was this patch authored or co-authored using generative AI tooling?

Yes
Closes #56273 from haiyangsun-db/SPARK-56661.
Authored-by: Haiyang Sun <haiyang.sun@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 13b526d)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>