[SPARK-56413][SPARK-56661][UDF][BUILD] Confine gRPC to a dedicated udf-worker-grpc module #56273

Closed
haiyangsun-db wants to merge 4 commits into apache:master from haiyangsun-db:SPARK-56661

haiyangsun-db (Contributor) commented Jun 2, 2026

What changes were proposed in this pull request?

This PR extracts the gRPC-based UDF worker transport into a new udf/worker/grpc Maven/SBT module, sibling to the existing udf/worker/proto and udf/worker/core modules, so that gRPC is no longer pulled onto the shared Spark classpath.

Concretely:

  • New module spark-udf-worker-grpc — generates the gRPC service stubs (UdfWorkerGrpc) from the .proto definitions in udf-worker-proto (compile-custom / grpc-java only), and owns the gRPC runtime dependencies (grpc-api, grpc-protobuf, grpc-stub, plus grpc-inprocess for tests).
  • udf-worker-proto now generates only protobuf-java message classes (dropped the grpc-java codegen goal and the grpc-* dependencies).
  • udf-worker-core no longer depends on gRPC (the grpc-inprocess test dependency was removed).
  • EchoProtocolSuite (the gRPC protocol test) moved from udf-worker-core to the new udf-worker-grpc module and re-packaged to org.apache.spark.udf.worker.grpc.
  • Registered the module in the root pom.xml and in project/SparkBuild.scala (new udfWorkerGrpc project, UDFWorkerGrpc settings for grpc-stub-only codegen, and UDFWorkerProto restricted to message-only codegen).
  • Regenerated dev/deps/spark-deps-hadoop-3-hive-2.3, which drops grpc-api, grpc-protobuf, grpc-protobuf-lite, grpc-stub, proto-google-common-protos, animal-sniffer-annotations, and error_prone_annotations from the assembly classpath.
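The effect of the last bullet can be illustrated with a small check that scans a dependency manifest for the dropped artifacts. This is only a sketch: it assumes each manifest line starts with the artifact name followed by `/`, and the function name `find_dropped_artifacts` and the sample entries (including their version numbers) are made up for illustration, not taken from this PR.

```python
# Artifacts the PR description says were dropped from the assembly classpath.
DROPPED_ARTIFACTS = {
    "grpc-api",
    "grpc-protobuf",
    "grpc-protobuf-lite",
    "grpc-stub",
    "proto-google-common-protos",
    "animal-sniffer-annotations",
    "error_prone_annotations",
}


def find_dropped_artifacts(manifest_text):
    """Return the dropped artifact names that still appear in a manifest.

    Assumption: one entry per line, with the artifact name before the
    first '/'. Blank lines are ignored.
    """
    found = set()
    for line in manifest_text.splitlines():
        line = line.strip()
        if not line:
            continue
        name = line.split("/", 1)[0]
        if name in DROPPED_ARTIFACTS:
            found.add(name)
    return sorted(found)


# Hypothetical sample entries, for illustration only.
before = "grpc-api/1.0.0//grpc-api-1.0.0.jar\nguava/1.0.0//guava-1.0.0.jar\n"
after = "guava/1.0.0//guava-1.0.0.jar\n"
print(find_dropped_artifacts(before))
print(find_dropped_artifacts(after))
```

An empty result for the regenerated manifest would indicate that none of the listed artifacts remain on the assembly classpath.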

Module dependency shape after this change:

udf-worker-proto  (protobuf-java messages only)
      ^   ^
      |   |
core/catalyst/sql-core -- use message types + worker abstractions (NO gRPC)
      |
udf-worker-core   (worker abstractions, no gRPC)
      ^
      |
udf-worker-grpc   (gRPC service stubs + gRPC runtime -- confined here)
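The dependency shape above can be expressed as a small graph and checked for the property this PR is after: nothing reachable from core, catalyst, or sql-core pulls in the gRPC runtime. The sketch below is one reading of the diagram, not the actual build definition. The edge list is an assumption, and `grpc-runtime` is a hypothetical stand-in node for grpc-api, grpc-protobuf, and grpc-stub.

```python
# Direct dependencies, as one interpretation of the diagram (assumed edges).
# "grpc-runtime" is a hypothetical node standing in for the gRPC libraries.
DEPENDS_ON = {
    "udf-worker-proto": [],
    "udf-worker-core": ["udf-worker-proto"],
    "udf-worker-grpc": ["udf-worker-core", "udf-worker-proto", "grpc-runtime"],
    "core": ["udf-worker-proto", "udf-worker-core"],
    "catalyst": ["udf-worker-proto", "udf-worker-core"],
    "sql-core": ["udf-worker-proto", "udf-worker-core"],
    "grpc-runtime": [],
}


def transitive_deps(module, graph):
    """Return every module reachable from `module` by following dependencies."""
    seen = set()
    stack = list(graph[module])
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(graph[dep])
    return seen


for module in ("core", "catalyst", "sql-core"):
    leaked = "grpc-runtime" in transitive_deps(module, DEPENDS_ON)
    print(module, "pulls in gRPC:", leaked)
print("udf-worker-grpc pulls in gRPC:",
      "grpc-runtime" in transitive_deps("udf-worker-grpc", DEPENDS_ON))
```

Under these assumed edges, only udf-worker-grpc reaches the gRPC runtime, which matches the "confined here" note in the diagram.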

Why are the changes needed?

Introducing the language-agnostic UDF worker framework made spark-udf-worker-proto and spark-udf-worker-core compile dependencies of core, catalyst, and sql/core. Because the proto module carried the gRPC stack as compile-scope dependencies (needed to compile its generated gRPC service stubs), this dragged grpc-api, grpc-protobuf, grpc-protobuf-lite, grpc-stub, and proto-google-common-protos transitively onto the widely shared Spark core/assembly classpath. Spark has historically kept gRPC isolated to Spark Connect (relocated/shaded) to avoid io.grpc/protobuf version clashes on that classpath.

No code on the runtime classpath actually uses the gRPC stubs yet; the only user was EchoProtocolSuite, which is a test. Confining gRPC to its own module removes the unnecessary footprint from core/catalyst/sql-core while keeping the framework's message types and worker abstractions available to them.

Does this PR introduce any user-facing change?

No. This is a build/module reorganization; the affected UDF worker framework is experimental and not yet consumed at runtime.

How was this patch tested?

  • Existing tests, relocated: EchoProtocolSuite now runs under udf-worker-grpc.
  • Verified with SBT that udf-worker-grpc/Test, udf-worker-core/Test, catalyst, core, and sql compile, and confirmed the codegen split on disk (proto -> generated-sources/protobuf/java messages only; grpc -> generated-sources/protobuf/grpc-java/UdfWorkerGrpc.java).
  • Regenerated and validated the dependency manifest via ./dev/test-dependencies.sh --replace-manifest.
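The on-disk codegen split described above could be checked with a short script along these lines. This is a sketch under assumptions: it relies on the grpc-java convention that service stubs are generated as `<Service>Grpc.java`, and the temporary directory layout, the helper names, and `ExampleMessage.java` are hypothetical placeholders rather than paths from the actual build.

```python
import tempfile
from pathlib import Path


def grpc_stub_files(generated_root):
    """List generated files whose name ends in 'Grpc.java' under a directory."""
    root = Path(generated_root)
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*Grpc.java"))


def make_tree(base, relative_files):
    """Create placeholder files so the check can be demonstrated."""
    for rel in relative_files:
        path = Path(base) / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text("// placeholder\n")


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-ins for the two modules' generated-sources directories.
    proto_root = Path(tmp) / "proto" / "generated-sources" / "protobuf"
    grpc_root = Path(tmp) / "grpc" / "generated-sources" / "protobuf"
    make_tree(proto_root, ["java/ExampleMessage.java"])
    make_tree(grpc_root, ["grpc-java/UdfWorkerGrpc.java"])
    proto_stubs = grpc_stub_files(proto_root)
    grpc_stubs = grpc_stub_files(grpc_root)

print("stubs under proto module:", proto_stubs)
print("stubs under grpc module:", grpc_stubs)
```

Pointing `grpc_stub_files` at the real output directories of each module would show whether any service stub is still generated outside udf-worker-grpc.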

Was this patch authored or co-authored using generative AI tooling?

Yes

haiyangsun-db changed the title from "[SPARK-56661] Fix grpc dep - separate test-only grpc away from udf core module." to "[SPARK-56413][SPARK-56661][UDF][BUILD] Confine gRPC to a dedicated udf-worker-grpc module" on Jun 2, 2026
cloud-fan (Contributor) commented:

thanks, merging to master/4.x!

cloud-fan closed this in 13b526d on Jun 3, 2026
cloud-fan pushed a commit that referenced this pull request Jun 3, 2026
…f-worker-grpc module

Closes #56273 from haiyangsun-db/SPARK-56661.

Authored-by: Haiyang Sun <haiyang.sun@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 13b526d)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>