[SPARK-57211][SQL] Cast strings to TIMESTAMP_NTZ(p)/TIMESTAMP_LTZ(p) #56288
Closed
MaxGekk wants to merge 4 commits into
MaxGekk (Member, Author): @stevomitric @uros-db Could you review this PR, please?
uros-db (Contributor) reviewed and approved these changes on Jun 3, 2026:
Solid, idiomatic, well-tested. Left a few nit comments, but otherwise LGTM!
Wire Cast to support CAST(<string> AS TIMESTAMP_NTZ(p)) and
CAST(<string> AS TIMESTAMP_LTZ(p)) for fractional-seconds precision p in
[7, 9], on both the interpreted and codegen paths and across LEGACY, ANSI
and TRY eval modes. Reuses the SPARK-57032 string->nanos parse helpers on
SparkDateTimeUtils, which already return a normalized TimestampNanosVal and
apply per-precision truncation.
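Per-precision truncation of the fractional seconds can be illustrated with a small standalone sketch. It is not the SPARK-57032 helper: the object and method names are hypothetical, it handles only the fraction digits (no date or time parsing), and it assumes that digits beyond precision `p` are dropped rather than rounded, which is how the word "truncation" is read here. Fractions longer than nine digits are simply rejected, since the PR does not say how the real parser treats them.

```scala
// Hypothetical illustration of per-precision truncation; not Spark's implementation.
object FractionTruncationSketch {
  private def isAsciiDigit(c: Char): Boolean = c >= '0' && c <= '9'

  /** Nanosecond-of-second for `fraction` (1 to 9 ASCII digits), truncated to `precision` digits. */
  def nanosOfSecond(fraction: String, precision: Int): Int = {
    require(precision >= 7 && precision <= 9, s"precision must be in [7, 9], got $precision")
    require(fraction.nonEmpty && fraction.length <= 9 && fraction.forall(isAsciiDigit),
      s"expected 1 to 9 ASCII digits, got '$fraction'")
    // Right-pad with zeros so the value is expressed in nanoseconds, e.g. "12" -> 120000000.
    val nanos = (fraction + "0" * (9 - fraction.length)).toInt
    // One unit at the target precision: 100 ns for p = 7, 10 ns for p = 8, 1 ns for p = 9.
    val unit = List.fill(9 - precision)(10).product
    // Integer division drops the extra digits (truncation, not rounding).
    nanos / unit * unit
  }
}
```

For example, `nanosOfSecond("123456789", 7)` yields `123456700`, while `p = 9` keeps all nine digits.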
- Add StringType -> Timestamp{NTZ,LTZ}NanosType arms to canCast/canAnsiCast.
- Add (StringType, TimestampLTZNanosType) to Cast.needsTimeZone (NTZ string
is zone-independent, mirroring micro TIMESTAMP_NTZ).
- Add interpreted castToTimestamp{LTZ,NTZ}Nanos and matching codegen,
dispatched with the precision taken from the target type. NTZ adopts
allowTimeZone = true to match the micro TIMESTAMP_NTZ string cast.
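Why only the LTZ cast needs a time zone can be seen in a standalone sketch built on `java.time`. The names below are hypothetical, the result is modelled as a plain `Long` of epoch nanoseconds rather than Spark's `TimestampNanosVal`, and zone suffixes in the input string are not handled. It only shows that the NTZ value depends on the wall-clock fields alone, while the LTZ value depends on the session time zone used to interpret them.

```scala
import java.time.{LocalDateTime, ZoneId, ZoneOffset}

// Hypothetical stand-ins for the two nanos target types; not Spark's classes.
sealed trait NanosTargetSketch { def precision: Int }
final case class NtzNanosSketch(precision: Int) extends NanosTargetSketch
final case class LtzNanosSketch(precision: Int) extends NanosTargetSketch

object ZoneSemanticsSketch {
  // Mirrors the idea behind Cast.needsTimeZone: only the LTZ target consults the session zone.
  def needsTimeZone(target: NanosTargetSketch): Boolean = target match {
    case _: LtzNanosSketch => true
    case _: NtzNanosSketch => false
  }

  // Accepts "yyyy-MM-dd HH:mm:ss[.fffffffff]" by switching to the ISO 'T' separator.
  def parseLocal(s: String): LocalDateTime = LocalDateTime.parse(s.trim.replace(' ', 'T'))

  private def epochNanos(epochSecond: Long, nanoOfSecond: Int): Long =
    Math.addExact(Math.multiplyExact(epochSecond, 1000000000L), nanoOfSecond.toLong)

  // NTZ: encode the wall-clock fields as if they were UTC; no session zone involved.
  def toNtzNanos(ldt: LocalDateTime): Long =
    epochNanos(ldt.toEpochSecond(ZoneOffset.UTC), ldt.getNano)

  // LTZ: interpret the wall-clock fields in the session zone to obtain an instant.
  def toLtzNanos(ldt: LocalDateTime, sessionZone: ZoneId): Long = {
    val instant = ldt.atZone(sessionZone).toInstant
    epochNanos(instant.getEpochSecond, instant.getNano)
  }
}
```

With a UTC session zone both encodings coincide for `1970-01-01 00:00:00.123456789`; with `+01:00` the LTZ result shifts back by one hour while the NTZ result does not, which is the dependency a `needsTimeZone`-style check records.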
Tests cover success cases over p in [7, 9], ANSI parse errors, LEGACY/TRY
null on malformed input, and a flag-off FEATURE_NOT_ENABLED guard.
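The eval-mode split that these tests check (ANSI raises on malformed input, LEGACY and TRY return null) can be sketched in a few lines. This is a hypothetical model, not Spark's `Cast`: `java.time.LocalDateTime.parse` stands in for the SPARK-57032 parse helpers, SQL `NULL` is modelled as `None`, and the ANSI branch throws a plain `java.time.DateTimeException` whose message only imitates the `CAST_INVALID_INPUT` error class.

```scala
import java.time.{DateTimeException, LocalDateTime}
import java.time.format.DateTimeParseException

// Hypothetical eval modes mirroring LEGACY, ANSI and TRY.
sealed trait EvalModeSketch
case object LegacyMode extends EvalModeSketch
case object AnsiMode extends EvalModeSketch
case object TryMode extends EvalModeSketch

object EvalModeCastSketch {
  // Stand-in parser: None when the string is not a valid local date-time.
  private def parse(s: String): Option[LocalDateTime] =
    try { Some(LocalDateTime.parse(s.trim.replace(' ', 'T'))) }
    catch { case _: DateTimeParseException => None }

  // Some(value) on success; on malformed input ANSI throws, LEGACY and TRY yield None (SQL NULL).
  def cast(s: String, mode: EvalModeSketch): Option[LocalDateTime] =
    parse(s) match {
      case parsed @ Some(_) => parsed
      case None =>
        mode match {
          case AnsiMode => throw new DateTimeException(s"CAST_INVALID_INPUT (illustrative): '$s'")
          case LegacyMode | TryMode => None
        }
    }
}
```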
…estamps

Add end-to-end golden-file coverage to cast.sql for casting strings to TIMESTAMP_NTZ(p)/TIMESTAMP_LTZ(p), mirroring the existing timestamp, timestamp_ntz and TIME cast checks:

- Positive cases assert the result type via typeof (the reverse direction, nanos -> string rendering, is not wired yet; tracked under SPARK-57162).
- Negative cases exercise the parse-error path: ANSI mode throws CAST_INVALID_INPUT, non-ANSI returns NULL.

Golden files regenerated with SPARK_GENERATE_GOLDEN_FILES=1.
…r thrift

ThriftServerQueryTestSuite failed on nonansi/cast.sql because a bare TIMESTAMP_NTZ(9)/TIMESTAMP_LTZ(9) result column cannot be mapped to a JDBC/Hive type name yet (nanos -> string serialization is out of scope, tracked under SPARK-57162). Wrap the negative cast checks in IS NULL so the result column is boolean; the ANSI parse-error path is unchanged.
…st import

- Cast.scala: make the interpreted NTZ string parse pass allowTimeZone = true explicitly so it matches the codegen path (which must pass it since Scala default args are not visible from generated Java).
- CastWithAnsiOffSuite: import foreachNanosPrecision instead of using the fully qualified name inline, consistent with the other Cast suites.
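The codegen constraint mentioned in this commit (Scala default args are not visible from generated Java) comes from how scalac compiles defaults: the method keeps its full JVM parameter list, and each default value becomes a separate synthetic getter named `<method>$default$<n>`. The sketch below inspects this with Java reflection; the helper and its default value are hypothetical and only mirror the shape of an `allowTimeZone` flag.

```scala
// Hypothetical helper with a defaulted flag; not Spark's DateTimeUtils API.
object DefaultArgsSketch {
  def parse(s: String, allowTimeZone: Boolean = true): (String, Boolean) = (s, allowTimeZone)
}

object JvmViewSketch {
  private val methods = DefaultArgsSketch.getClass.getMethods.toList

  // Arity of `parse` as Java (and therefore generated Java code) sees it.
  val parseArities: List[Int] = methods.filter(_.getName == "parse").map(_.getParameterCount)

  // scalac emits the default value as a separate synthetic method, not as an overload.
  val hasDefaultGetter: Boolean = methods.exists(_.getName == "parse$default$2")
}
```

So a Scala caller may write `parse(s)`, but Java code must call `parse(s, true)`; passing the flag explicitly on the interpreted path as well keeps both paths visibly in sync.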
Force-pushed from 4a1a084 to 63b5890.
uros-b pushed a commit that referenced this pull request on Jun 3, 2026:
### What changes were proposed in this pull request?

This PR wires `Cast` to support casting `StringType` to the nanosecond-capable timestamp types `TimestampNTZNanosType(p)` and `TimestampLTZNanosType(p)` with fractional-seconds precision `p` in `[7, 9]`, on both the interpreted and codegen paths and across all eval modes (`LEGACY`, `ANSI`, `TRY`):

- `CAST(<string> AS TIMESTAMP_NTZ(p))`
- `CAST(<string> AS TIMESTAMP_LTZ(p))`

Concretely, in `Cast.scala`:

- Add `StringType -> TimestampNTZNanosType(p)` / `TimestampLTZNanosType(p)` arms to `canCast` and `canAnsiCast`. Try-cast is covered automatically (`canTryCast` delegates to `canAnsiCast`, and `canUseLegacyCastForTryCast` already matches `(StringType, DatetimeType)`, which the nanos types extend).
- Add `(StringType, TimestampLTZNanosType)` to `Cast.needsTimeZone`. The NTZ string is zone-independent, mirroring the micro `TIMESTAMP_NTZ` cast.
- Add interpreted `castToTimestampLTZNanos` / `castToTimestampNTZNanos` and matching codegen, dispatched from `castInternal` / `nullSafeCastFunction` with the precision taken from the target type. The result is a `TimestampNanosVal` (or `null` in legacy/try mode on malformed input).
- The NTZ cast adopts `allowTimeZone = true` to match the existing micro `TIMESTAMP_NTZ` string cast, and resolves the `TODO(SPARK-57032)` left on `stringToTimestampNTZNanosAnsi`.

This reuses the parse entry points added in SPARK-57032 on `SparkDateTimeUtils` (inherited by `DateTimeUtils`), which already return a normalized `TimestampNanosVal` and apply per-precision truncation, so no separate normalization module is required for the string path.

Existing preview gating is unchanged: `Cast.checkInputDataTypes` calls `TypeUtils.failUnsupportedDataType`, which throws `FEATURE_NOT_ENABLED` when `spark.sql.timestampNanosTypes.enabled` is off.

### Why are the changes needed?
This is a sub-task of [SPARK-56822](https://issues.apache.org/jira/browse/SPARK-56822) (SPIP: Timestamps with nanosecond precision). The logical types, the `TIMESTAMP_NTZ(p)` / `TIMESTAMP_LTZ(p)` SQL syntax, the physical row value `TimestampNanosVal`, and the string-to-nanos parse helpers all exist, but `Cast` had no arms for the nanos types. As a result, `CAST(s AS TIMESTAMP_NTZ(9))` failed type-checking with `CAST_WITHOUT_SUGGESTION` even when the preview flag `spark.sql.timestampNanosTypes.enabled` was on. String ingestion is the most common entry point for these types, and supporting it unblocks typed literals, filters, and CTAS once coercion lands.

### Does this PR introduce _any_ user-facing change?

Yes, but only when the preview flag `spark.sql.timestampNanosTypes.enabled` is enabled (it defaults to off in production). With the flag on, `CAST(<string> AS TIMESTAMP_NTZ(p))` and `CAST(<string> AS TIMESTAMP_LTZ(p))` for `p` in `[7, 9]` now produce correct nanosecond values in `LEGACY`, `ANSI`, and `TRY` modes; previously they failed type-checking. With the flag off, the behavior is unchanged (`FEATURE_NOT_ENABLED`). Existing microsecond timestamp string casts are unchanged.

### How was this patch tested?

- `CastSuiteBase`: success cases for both types over `p` in `[7, 9]` and a corpus of 7- to 9-digit fractional seconds; LTZ parameterized over time zones, NTZ zone-independent (including a discarded zone suffix); plus a flag-off guard asserting `FEATURE_NOT_ENABLED`.
- `CastWithAnsiOnSuite`: malformed-input parse errors (`DateTimeException` / `CAST_INVALID_INPUT`).
- `CastWithAnsiOffSuite` / `TryCastSuite`: malformed input returns `NULL`.
- Golden-file checks added to `cast.sql` (regenerated with `SPARK_GENERATE_GOLDEN_FILES=1`): positive cases assert the result type via `typeof` (the reverse direction, nanos -> string rendering, is not wired yet and is tracked under SPARK-57162); negative cases exercise the ANSI parse-error path (and `NULL` in non-ANSI mode).
Verified locally:

```
$ build/sbt 'catalyst/testOnly *CastSuite *CastWithAnsiOnSuite *CastWithAnsiOffSuite *TryCastSuite'
$ build/sbt 'sql/testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z cast.sql'
$ ./dev/scalastyle
```

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor (Claude Opus 4.8)

Closes #56288 from MaxGekk/nanos-cast-string.

Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Uros Bojanic <221401595+uros-b@users.noreply.github.com>
(cherry picked from commit 7d0a8cd)
Signed-off-by: Uros Bojanic <221401595+uros-b@users.noreply.github.com>
MaxGekk (Member, Author): Thank you @uros-b!