[FLINK-39136][filesystems] Bump google-cloud-storage to 2.68.0 in flink-gs-fs-hadoop #28286
- org.apache.httpcomponents:httpclient:4.5.13
- org.apache.httpcomponents:httpcore:4.4.14
- org.conscrypt:conscrypt-openjdk-uber:2.5.2
- org.jspecify:jspecify:1.0.0
do we need it?
AFAIK it is only annotations for compile time checks
Excluded it from the shaded jar and dropped it from the NOTICE, consistent with the other annotation-only dependencies already excluded in this module (checker-qual, error_prone_annotations, j2objc-annotations).
One nuance: jspecify's annotations are actually @Retention(RUNTIME) rather than CLASS/SOURCE, but they're static-analysis nullness markers that nothing in the gcs-connector / google-cloud-storage stack reads reflectively at runtime, so excluding them is safe. Verified the module still builds, the unit tests pass, and the newly added real-GCS RecoverableWriter/FileSystem ITCases pass against a live bucket.
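To illustrate the retention nuance mentioned above, here is a minimal sketch in plain Java. `RuntimeMarker` and `ClassMarker` are hypothetical stand-ins, not jspecify or checker-qual types. It only shows that a `RUNTIME`-retained annotation is visible to reflection while a `CLASS`-retained one is not, which is why the exclusion argument depends on nothing reading the annotations reflectively.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

final class RetentionDemo {

    // Hypothetical stand-in for a RUNTIME-retained nullness marker.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface RuntimeMarker {}

    // Hypothetical stand-in for a CLASS-retained marker.
    @Retention(RetentionPolicy.CLASS)
    @Target(ElementType.METHOD)
    @interface ClassMarker {}

    @RuntimeMarker
    static String runtimeMarked() {
        return "a";
    }

    @ClassMarker
    static String classMarked() {
        return "b";
    }

    // Counts the annotations that reflection can see on the named no-arg method.
    static int reflectiveAnnotationCount(String methodName) {
        try {
            Method m = RetentionDemo.class.getDeclaredMethod(methodName);
            return m.getDeclaredAnnotations().length;
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(reflectiveAnnotationCount("runtimeMarked")); // prints 1
        System.out.println(reflectiveAnnotationCount("classMarked"));   // prints 0
    }
}
```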
…nk-gs-fs-hadoop
The GCS file system bundled google-cloud-storage 2.29.1, which throws a NullPointerException instead of retrying certain GCS 503 Service Unavailable errors during resumable uploads, breaking checkpointing for jobs writing to gs:// via a RecoverableWriter. The upstream fix is in googleapis/java-storage#2987.
Bump google-cloud-storage 2.29.1 -> 2.68.0 and the matching grpc artifacts 1.59.1 -> 1.81.0, regenerate the bundled-dependency NOTICE accordingly, add the bundled license file for the newly bundled stax2-api, and update the version links in the GCS filesystem documentation.
The newly pulled-in jspecify dependency provides only static-analysis nullness annotations that are not needed at runtime, so it is excluded from the shaded jar like the other annotation-only dependencies (checker-qual, error_prone_annotations, j2objc-annotations).
Generated-by: Claude Code (Opus 4.8)
0d2fd30 to
bb16e08
…em against a real bucket
Add integration tests that run against a real GCS bucket, mirroring the existing
S3 filesystem integration tests. They are skipped unless a bucket is configured
via the IT_CASE_GCS_BUCKET environment variable; authentication uses Application
Default Credentials (GOOGLE_APPLICATION_CREDENTIALS).
- GSTestCredentials gates the tests on IT_CASE_GCS_BUCKET.
- GSFileSystemBehaviorITCase runs the shared FileSystemBehaviorTestSuite.
- GSRecoverableWriterITCase exercises the write / persist (checkpoint) /
recover / commit flow that backs exactly-once FileSink checkpointing on GCS.
Generated-by: Claude Code (Opus 4.8)
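The environment-variable gating described in this commit message could look roughly like the following. This is a sketch under assumptions, not the actual GSTestCredentials code: the class `GcsBucketGate` and method `configuredBucket` are hypothetical names, and only the variable name IT_CASE_GCS_BUCKET comes from the commit message.

```java
import java.util.Map;
import java.util.Optional;

final class GcsBucketGate {

    // Variable name taken from the commit message; everything else is illustrative.
    static final String BUCKET_ENV = "IT_CASE_GCS_BUCKET";

    // Returns the configured bucket, or empty if the variable is unset or blank.
    static Optional<String> configuredBucket(Map<String, String> env) {
        String value = env.get(BUCKET_ENV);
        if (value == null || value.trim().isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(value.trim());
    }

    public static void main(String[] args) {
        Optional<String> bucket = configuredBucket(System.getenv());
        if (!bucket.isPresent()) {
            // A real test would signal "skipped" to the test framework here.
            System.out.println("Skipping: " + BUCKET_ENV + " is not set");
            return;
        }
        System.out.println("Would run against gs://" + bucket.get());
    }
}
```

Passing the environment in as a map keeps the gate itself testable without touching the real process environment.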
bb16e08 to
d36b8b6
What is the purpose of the change
flink-gs-fs-hadoop bundled google-cloud-storage 2.29.1, which throws a NullPointerException instead of retrying certain GCS 503 Service Unavailable errors during resumable uploads. This breaks checkpointing for jobs that write to gs:// through a RecoverableWriter (e.g. the FileSink). The upstream fix is in googleapis/java-storage#2987, which is included in newer releases of the library.
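For readers unfamiliar with the expected behaviour, the following sketch shows what "retrying a 503" means in general terms. It is not the google-cloud-storage implementation and does not reproduce the upstream fix: `HttpStatusException` and `callWithRetry` are hypothetical names, and the exponential backoff policy is an assumption chosen for illustration.

```java
import java.util.concurrent.Callable;

final class RetryOn503 {

    // Hypothetical exception type carrying an HTTP status code.
    static final class HttpStatusException extends Exception {
        final int status;

        HttpStatusException(int status) {
            super("HTTP " + status);
            this.status = status;
        }
    }

    // Retries the call while it fails with 503, doubling the wait between attempts.
    // Any other status, or exhausting maxAttempts, rethrows the last failure.
    static <T> T callWithRetry(Callable<T> call, int maxAttempts, long initialBackoffMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long backoff = initialBackoffMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return call.call();
            } catch (HttpStatusException e) {
                boolean retryable = e.status == 503;
                if (!retryable || attempt >= maxAttempts) {
                    throw e;
                }
                Thread.sleep(backoff);
                backoff *= 2;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        String result = callWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new HttpStatusException(503); // simulated transient failure
            }
            return "uploaded";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " attempts"); // uploaded after 3 attempts
    }
}
```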
This PR takes over the stale #27679 (thanks @jonchase) and completes it by also regenerating the bundled-dependency NOTICE, which is required because this module shades all of its dependencies into the plugin jar.
Brief change log
- Bump google-cloud-storage 2.29.1 -> 2.68.0 (latest stable) and the matching grpc artifacts 1.59.1 -> 1.81.0 in flink-gs-fs-hadoop/pom.xml.
- Regenerate META-INF/NOTICE to match the bundled dependency set produced by the shade plugin (verified to match exactly via tools/ci/license_check.sh).
- Add the bundled license file for org.codehaus.woodstox:stax2-api.
- Update the google-cloud-storage version link in the GCS filesystem docs (English + Chinese).
Verifying this change
This change is covered by the existing flink-gs-fs-hadoop tests (236 tests pass, including GSRecoverableWriterTest, the committer/serializer tests, and the LocalStorageHelper-based GSBlobStorageImplTest, confirming the pinned google-cloud-nio test dependency stays compatible).
In addition, the license/NOTICE was validated locally with tools/ci/license_check.sh (no severe issues; the NOTICE matches the 98 bundled dependencies exactly) and the dependency-convergence and ban-unsafe-jackson enforcers pass under -Pcheck-convergence.
Finally, the upgraded SDK was manually verified against a real GCS bucket: a RecoverableWriter wrote a multi-chunk object, took a checkpoint via persist(), recovered from that checkpoint via recover(), committed, and the committed object was read back and asserted byte-for-byte (exactly-once), with no NPE/503 failure.
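The write / persist / recover / commit sequence used in the manual verification can be modelled in a few lines. The following is a toy in-memory model only: `ToyRecoverableWriter` is a hypothetical class that loosely mirrors the method names persist() and recover(), and it is not Flink's RecoverableWriter API or the GCS implementation. It shows why bytes written after the last checkpoint must not appear in the committed object.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class ToyRecoverableWriter {

    // Bytes written since the last checkpoint; lost if the writer fails.
    private final ByteArrayOutputStream pending = new ByteArrayOutputStream();
    // Bytes made durable by the most recent persist().
    private byte[] persisted;

    private ToyRecoverableWriter(byte[] persisted) {
        this.persisted = persisted.clone();
    }

    static ToyRecoverableWriter open() {
        return new ToyRecoverableWriter(new byte[0]);
    }

    // Resumes from a checkpoint previously returned by persist().
    static ToyRecoverableWriter recover(byte[] checkpoint) {
        return new ToyRecoverableWriter(checkpoint);
    }

    void write(byte[] data) {
        pending.write(data, 0, data.length);
    }

    // Makes pending bytes durable and returns a checkpoint (a copy of the durable state).
    byte[] persist() {
        byte[] tail = pending.toByteArray();
        byte[] merged = Arrays.copyOf(persisted, persisted.length + tail.length);
        System.arraycopy(tail, 0, merged, persisted.length, tail.length);
        persisted = merged;
        pending.reset();
        return persisted.clone();
    }

    // Publishes everything written so far as the final object.
    byte[] commit() {
        return persist();
    }

    // Runs the full flow: write, checkpoint, lose uncheckpointed data, recover, commit.
    static String roundTrip() {
        ToyRecoverableWriter first = open();
        first.write("chunk-1|".getBytes(StandardCharsets.UTF_8));
        byte[] checkpoint = first.persist();
        // Written after the checkpoint and never persisted: simulates data lost in a failure.
        first.write("lost-after-checkpoint|".getBytes(StandardCharsets.UTF_8));

        ToyRecoverableWriter resumed = recover(checkpoint);
        resumed.write("chunk-2".getBytes(StandardCharsets.UTF_8));
        return new String(resumed.commit(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(roundTrip()); // chunk-1|chunk-2
    }
}
```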
Does this pull request potentially affect one of the following parts:
- Dependencies: yes (google-cloud-storage and grpc, with corresponding NOTICE/license updates)
- Public API, i.e. classes annotated with @Public(Evolving): no
- Checkpointing, Kubernetes/Yarn, ZooKeeper: yes (improves reliability of GCS RecoverableWriter checkpoint/recovery by fixing 503 retry handling)
Documentation
Was generative AI tooling used to co-author this PR?
Generated-by: Claude Code (Opus 4.8)