[SPARK-57211][SQL] Cast strings to TIMESTAMP_NTZ(p)/TIMESTAMP_LTZ(p) by MaxGekk · Pull Request #56288 · apache/spark

MaxGekk · 2026-06-02T20:46:30Z

What changes were proposed in this pull request?

This PR wires Cast to support casting StringType to the nanosecond-capable timestamp types TimestampNTZNanosType(p) and TimestampLTZNanosType(p) with fractional-seconds precision p in [7, 9], on both the interpreted and codegen paths and across all eval modes (LEGACY, ANSI, TRY):

CAST(<string> AS TIMESTAMP_NTZ(p))
CAST(<string> AS TIMESTAMP_LTZ(p))

Concretely, in Cast.scala:

- Add StringType -> TimestampNTZNanosType(p) / TimestampLTZNanosType(p) arms to canCast and canAnsiCast. Try-cast is covered automatically (canTryCast delegates to canAnsiCast, and canUseLegacyCastForTryCast already matches (StringType, DatetimeType), which the nanos types extend).
- Add (StringType, TimestampLTZNanosType) to Cast.needsTimeZone. The NTZ string is zone-independent, mirroring the micro TIMESTAMP_NTZ cast.
- Add interpreted castToTimestampLTZNanos / castToTimestampNTZNanos and matching codegen, dispatched from castInternal / nullSafeCastFunction with the precision taken from the target type. The result is a TimestampNanosVal (or null in legacy/try mode on malformed input).
- The NTZ cast adopts allowTimeZone = true to match the existing micro TIMESTAMP_NTZ string cast, and resolves the TODO(SPARK-57032) left on stringToTimestampNTZNanosAnsi.
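To illustrate the zone handling described in the list above, here is a minimal, self-contained sketch in plain Python. It is not Spark's parser: the function names, the regular expression, and the single-integer "epoch nanoseconds" result are all assumptions made for illustration (the real result type is TimestampNanosVal, whose layout is not shown in this PR description). The LTZ rule that an explicit offset in the string wins over the session zone is assumed to mirror the existing microsecond timestamp cast.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical, simplified grammar: "yyyy-MM-dd HH:mm:ss[.fraction][Z|+HH:MM|-HH:MM]".
_PATTERN = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})(?:\.(\d{1,9}))?(Z|[+-]\d{2}:\d{2})?$"
)


def _split(s):
    """Split a string into (naive local datetime, nanos of second, zone suffix or None)."""
    m = _PATTERN.match(s.strip())
    if m is None:
        raise ValueError(f"malformed timestamp string: {s!r}")
    base, frac, zone = m.groups()
    nanos = int((frac or "").ljust(9, "0"))  # right-pad the fraction to 9 digits
    return datetime.strptime(base, "%Y-%m-%d %H:%M:%S"), nanos, zone


def _zone(suffix):
    """Convert "Z" or "+HH:MM" / "-HH:MM" into a fixed-offset tzinfo."""
    if suffix == "Z":
        return timezone.utc
    sign = -1 if suffix[0] == "-" else 1
    return timezone(sign * timedelta(hours=int(suffix[1:3]), minutes=int(suffix[4:6])))


def _epoch_nanos(local, nanos, tz):
    seconds = int(local.replace(tzinfo=tz).timestamp())
    return seconds * 1_000_000_000 + nanos


def to_ntz_nanos(s):
    """NTZ: a zone suffix is accepted but discarded; only the wall-clock fields count."""
    local, nanos, _ignored_zone = _split(s)
    return _epoch_nanos(local, nanos, timezone.utc)


def to_ltz_nanos(s, session_tz):
    """LTZ: use the zone in the string if present, otherwise the session time zone."""
    local, nanos, zone = _split(s)
    return _epoch_nanos(local, nanos, _zone(zone) if zone else session_tz)
```

In this sketch the NTZ result never depends on a session zone, which is why only the LTZ function takes one; that matches the reason given above for adding only (StringType, TimestampLTZNanosType) to Cast.needsTimeZone.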

This reuses the parse entry points added in SPARK-57032 on SparkDateTimeUtils (inherited by DateTimeUtils), which already return a normalized TimestampNanosVal and apply per-precision truncation, so no separate normalization module is required for the string path.
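As a rough illustration of what "per-precision truncation" means for p in [7, 9], the following plain-Python sketch drops the digits beyond the requested precision. The function names are hypothetical and this is not the SparkDateTimeUtils code; in particular, rejecting fractions longer than 9 digits is an assumption of the sketch, not a statement about Spark's behavior.

```python
NANOS_PER_SECOND = 1_000_000_000


def truncate_to_precision(nanos_of_second: int, p: int) -> int:
    """Zero out the digits of a nanosecond fraction beyond precision p (7 <= p <= 9)."""
    if not 7 <= p <= 9:
        raise ValueError(f"precision must be in [7, 9], got {p}")
    if not 0 <= nanos_of_second < NANOS_PER_SECOND:
        raise ValueError(f"nanos of second out of range: {nanos_of_second}")
    step = 10 ** (9 - p)  # p=9 -> 1, p=8 -> 10, p=7 -> 100
    return nanos_of_second - nanos_of_second % step


def parse_fraction(digits: str, p: int) -> int:
    """Turn fractional-second digits (e.g. "123456789") into nanoseconds, truncated to p."""
    if not (digits.isascii() and digits.isdigit() and len(digits) <= 9):
        raise ValueError(f"malformed fraction: {digits!r}")
    nanos = int(digits.ljust(9, "0"))  # "5" means 0.5 s, i.e. 500000000 ns
    return truncate_to_precision(nanos, p)
```

For example, `parse_fraction("123456789", 7)` keeps seven digits and yields 123456700, while the same input with p = 9 is returned unchanged.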

Existing preview gating is unchanged: Cast.checkInputDataTypes calls TypeUtils.failUnsupportedDataType, which throws FEATURE_NOT_ENABLED when spark.sql.timestampNanosTypes.enabled is off.

Why are the changes needed?

This is a sub-task of SPARK-56822 (SPIP: Timestamps with nanosecond precision).

The logical types, the TIMESTAMP_NTZ(p) / TIMESTAMP_LTZ(p) SQL syntax, the physical row value TimestampNanosVal, and the string-to-nanos parse helpers all exist, but Cast had zero arms for the nanos types. As a result CAST(s AS TIMESTAMP_NTZ(9)) failed type-check with CAST_WITHOUT_SUGGESTION even when the preview flag spark.sql.timestampNanosTypes.enabled was on. String ingestion is the most common entry point for these types and unblocks typed literals, filters, and CTAS once coercion lands.

Does this PR introduce any user-facing change?

Yes, but only when the preview flag spark.sql.timestampNanosTypes.enabled is enabled (it defaults to off in production). With the flag on, CAST(<string> AS TIMESTAMP_NTZ(p)) and CAST(<string> AS TIMESTAMP_LTZ(p)) for p in [7, 9] now produce correct nanosecond values in LEGACY, ANSI, and TRY modes; previously they failed type-checking. With the flag off, the behavior is unchanged (FEATURE_NOT_ENABLED). Existing microsecond timestamp string casts are unchanged.

How was this patch tested?

- CastSuiteBase: success cases for both types over p in [7, 9] and a 7-9 digit fractional corpus; LTZ parameterized over time zones, NTZ zone-independent (including a discarded zone suffix). Plus a flag-off guard asserting FEATURE_NOT_ENABLED.
- CastWithAnsiOnSuite: malformed-input parse errors (DateTimeException / CAST_INVALID_INPUT).
- CastWithAnsiOffSuite / TryCastSuite: malformed input returns NULL.
- Golden-file checks added to cast.sql (regenerated with SPARK_GENERATE_GOLDEN_FILES=1): positive cases assert the result type via typeof (the reverse direction, nanos -> string rendering, is not wired yet and is tracked under SPARK-57162); negative cases exercise the ANSI parse-error path (and NULL in non-ANSI mode).
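The malformed-input behavior these suites check can be summarized with a small plain-Python model. Everything here is hypothetical (the EvalMode enum, the CastInvalidInput class, and the toy parser are stand-ins, not Spark classes); the sketch only encodes the rule stated above: ANSI raises a parse error, while LEGACY and TRY return NULL.

```python
from enum import Enum
from typing import Callable, Optional


class EvalMode(Enum):
    LEGACY = "LEGACY"
    ANSI = "ANSI"
    TRY = "TRY"


class CastInvalidInput(ValueError):
    """Stand-in for the CAST_INVALID_INPUT error condition (hypothetical class)."""


def toy_parse_fraction(s: str) -> int:
    """Hypothetical parser: 1-9 ASCII digits of fractional seconds -> nanoseconds."""
    if not (s.isascii() and s.isdigit() and len(s) <= 9):
        raise ValueError(f"malformed input: {s!r}")
    return int(s.ljust(9, "0"))


def cast_with_mode(s: str, parse: Callable[[str], int], mode: EvalMode) -> Optional[int]:
    """Apply `parse`; on malformed input raise in ANSI mode, return None otherwise."""
    try:
        return parse(s)
    except ValueError as err:
        if mode is EvalMode.ANSI:
            raise CastInvalidInput(f"[CAST_INVALID_INPUT] cannot cast {s!r}") from err
        return None  # LEGACY and TRY both map malformed input to NULL
```

Well-formed input behaves the same in all three modes; only the failure path differs, which is why the negative cases are split across the ANSI-on, ANSI-off, and try-cast suites.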

Verified locally:

$ build/sbt 'catalyst/testOnly *CastSuite *CastWithAnsiOnSuite *CastWithAnsiOffSuite *TryCastSuite'
$ build/sbt 'sql/testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z cast.sql'
$ ./dev/scalastyle

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor (Claude Opus 4.8)

MaxGekk · 2026-06-03T08:14:35Z

@stevomitric @uros-db Could you review this PR, please?

uros-db

Solid, idiomatic, well-tested. Left a few nit comments, but otherwise LGTM!

Wire Cast to support CAST(<string> AS TIMESTAMP_NTZ(p)) and CAST(<string> AS TIMESTAMP_LTZ(p)) for fractional-seconds precision p in [7, 9], on both the interpreted and codegen paths and across LEGACY, ANSI and TRY eval modes. Reuses the SPARK-57032 string->nanos parse helpers on SparkDateTimeUtils, which already return a normalized TimestampNanosVal and apply per-precision truncation.

- Add StringType -> Timestamp{NTZ,LTZ}NanosType arms to canCast/canAnsiCast.
- Add (StringType, TimestampLTZNanosType) to Cast.needsTimeZone (NTZ string is zone-independent, mirroring micro TIMESTAMP_NTZ).
- Add interpreted castToTimestamp{LTZ,NTZ}Nanos and matching codegen, dispatched with the precision taken from the target type. NTZ adopts allowTimeZone = true to match the micro TIMESTAMP_NTZ string cast.

Tests cover success cases over p in [7, 9], ANSI parse errors, LEGACY/TRY null on malformed input, and a flag-off FEATURE_NOT_ENABLED guard.

…estamps

Add end-to-end golden-file coverage to cast.sql for casting strings to TIMESTAMP_NTZ(p)/TIMESTAMP_LTZ(p), mirroring the existing timestamp, timestamp_ntz and TIME cast checks:

- Positive cases assert the result type via typeof (the reverse direction, nanos -> string rendering, is not wired yet; tracked under SPARK-57162).
- Negative cases exercise the parse-error path: ANSI mode throws CAST_INVALID_INPUT, non-ANSI returns NULL.

Golden files regenerated with SPARK_GENERATE_GOLDEN_FILES=1.

…r thrift

ThriftServerQueryTestSuite failed on nonansi/cast.sql because a bare TIMESTAMP_NTZ(9)/TIMESTAMP_LTZ(9) result column cannot be mapped to a JDBC/Hive type name yet (nanos -> string serialization is out of scope, tracked under SPARK-57162). Wrap the negative cast checks in IS NULL so the result column is boolean; the ANSI parse-error path is unchanged.

…st import

- Cast.scala: make the interpreted NTZ string parse pass allowTimeZone = true explicitly so it matches the codegen path (which must pass it since Scala default args are not visible from generated Java).
- CastWithAnsiOffSuite: import foreachNanosPrecision instead of using the fully-qualified name inline, consistent with the other Cast suites.

stevomitric

LGTM.

Closes #56288 from MaxGekk/nanos-cast-string.

Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Uros Bojanic <221401595+uros-b@users.noreply.github.com>
(cherry picked from commit 7d0a8cd)
Signed-off-by: Uros Bojanic <221401595+uros-b@users.noreply.github.com>

MaxGekk · 2026-06-03T13:11:41Z

Thank you @uros-db!

uros-db reviewed Jun 3, 2026

Comment thread sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala

uros-db reviewed Jun 3, 2026

Comment thread ...catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastWithAnsiOffSuite.scala Outdated

uros-db approved these changes Jun 3, 2026

MaxGekk added 4 commits June 3, 2026 10:33

MaxGekk force-pushed the nanos-cast-string branch from 4a1a084 to 63b5890 Compare June 3, 2026 08:34

stevomitric approved these changes Jun 3, 2026

uros-db closed this in 7d0a8cd Jun 3, 2026

MaxGekk wants to merge 4 commits into apache:master from MaxGekk:nanos-cast-string
