[SPARK-57211][SQL] Cast strings to TIMESTAMP_NTZ(p)/TIMESTAMP_LTZ(p) #56288

Closed
MaxGekk wants to merge 4 commits into apache:master from MaxGekk:nanos-cast-string

MaxGekk commented Jun 2, 2026

Member

What changes were proposed in this pull request?

This PR wires Cast to support casting StringType to the nanosecond-capable timestamp types TimestampNTZNanosType(p) and TimestampLTZNanosType(p) with fractional-seconds precision p in [7, 9], on both the interpreted and codegen paths and across all eval modes (LEGACY, ANSI, TRY):

  • CAST(<string> AS TIMESTAMP_NTZ(p))
  • CAST(<string> AS TIMESTAMP_LTZ(p))

Concretely, in Cast.scala:

  • Add StringType -> TimestampNTZNanosType(p) / TimestampLTZNanosType(p) arms to canCast and canAnsiCast. Try-cast is covered automatically (canTryCast delegates to canAnsiCast, and canUseLegacyCastForTryCast already matches (StringType, DatetimeType), which the nanos types extend).
  • Add (StringType, TimestampLTZNanosType) to Cast.needsTimeZone. The NTZ string cast is zone-independent, mirroring the micro TIMESTAMP_NTZ cast, so it is not added.
  • Add interpreted castToTimestampLTZNanos / castToTimestampNTZNanos and matching codegen, dispatched from castInternal / nullSafeCastFunction with the precision taken from the target type. The result is a TimestampNanosVal (or null in legacy/try mode on malformed input).
  • The NTZ cast adopts allowTimeZone = true to match the existing micro TIMESTAMP_NTZ string cast, and resolves the TODO(SPARK-57032) left on stringToTimestampNTZNanosAnsi.
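
The difference between the eval modes on malformed input can be pictured with a small standalone sketch. It is not the Cast.scala code: `EvalMode`, `CastInvalidInput`, `cast_string` and the toy `parse_digits` parser are hypothetical names used only for illustration, and the only behavior taken from the description above is that ANSI raises while LEGACY and TRY return null.

```python
from enum import Enum
from typing import Callable, Optional


class EvalMode(Enum):
    LEGACY = "legacy"
    ANSI = "ansi"
    TRY = "try"


class CastInvalidInput(ValueError):
    """Hypothetical stand-in for the CAST_INVALID_INPUT error condition."""


def parse_digits(text: str) -> Optional[int]:
    # Toy parser standing in for the string-to-nanos helpers:
    # it accepts ASCII digits only and returns None for anything else.
    return int(text) if text.isascii() and text.isdigit() else None


def cast_string(
    text: str, parse: Callable[[str], Optional[int]], mode: EvalMode
) -> Optional[int]:
    """Apply `parse`; on failure raise in ANSI mode, return None otherwise."""
    value = parse(text)
    if value is None and mode is EvalMode.ANSI:
        raise CastInvalidInput(f"cannot cast {text!r}")
    return value
```

In this sketch the parser itself never throws; the mode-specific decision sits in one place, which is roughly the shape the interpreted and codegen paths need to agree on.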

This reuses the parse entry points added in SPARK-57032 on SparkDateTimeUtils (inherited by DateTimeUtils), which already return a normalized TimestampNanosVal and apply per-precision truncation, so no separate normalization module is required for the string path.
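
As a rough illustration of what per-precision truncation means for the fractional part, here is a minimal Python sketch. It is an assumption-laden model, not the SparkDateTimeUtils code: `truncate_fraction_nanos` is a hypothetical helper, it only looks at the first nine fractional digits, and it truncates rather than rounds, as the description above states.

```python
def truncate_fraction_nanos(fraction_digits: str, precision: int) -> int:
    """Return the nanos-of-second for a fractional-seconds digit string,
    truncated (not rounded) to `precision` digits, with precision in [7, 9]."""
    if not 7 <= precision <= 9:
        raise ValueError("precision must be in [7, 9]")
    if not (fraction_digits.isascii() and fraction_digits.isdigit()):
        raise ValueError("fraction must be a non-empty run of ASCII digits")
    # Right-pad to nine digits so the integer is expressed in nanoseconds.
    # Assumption: digits past the ninth are ignored in this sketch.
    nanos = int(fraction_digits[:9].ljust(9, "0"))
    # Zero out the digits below the requested precision.
    unit = 10 ** (9 - precision)
    return nanos - nanos % unit
```

For example, the fraction `123456789` stays `123456789` at p = 9, becomes `123456780` at p = 8, and `123456700` at p = 7.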

Existing preview gating is unchanged: Cast.checkInputDataTypes calls TypeUtils.failUnsupportedDataType, which throws FEATURE_NOT_ENABLED when spark.sql.timestampNanosTypes.enabled is off.
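
The gating can be thought of as a single check that runs before any cast arm is consulted. The sketch below is a hypothetical model in Python, not the TypeUtils code: only the flag name and the FEATURE_NOT_ENABLED condition come from this description, while `check_nanos_cast_allowed` and the dict-based configuration are assumptions.

```python
NANOS_FLAG = "spark.sql.timestampNanosTypes.enabled"


class FeatureNotEnabled(Exception):
    """Hypothetical stand-in for the FEATURE_NOT_ENABLED error condition."""


def check_nanos_cast_allowed(target_is_nanos_type: bool, conf: dict) -> None:
    """Reject casts to nanosecond timestamp types unless the preview flag is on."""
    # The flag is treated as off when it is absent, matching "defaults to off".
    enabled = conf.get(NANOS_FLAG, "false").lower() == "true"
    if target_is_nanos_type and not enabled:
        raise FeatureNotEnabled(
            f"set {NANOS_FLAG}=true to use nanosecond timestamp types"
        )
```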

Why are the changes needed?

This is a sub-task of SPARK-56822 (SPIP: Timestamps with nanosecond precision).

The logical types, the TIMESTAMP_NTZ(p) / TIMESTAMP_LTZ(p) SQL syntax, the physical row value TimestampNanosVal, and the string-to-nanos parse helpers all exist, but Cast had no arms for the nanos types. As a result, CAST(s AS TIMESTAMP_NTZ(9)) failed type-checking with CAST_WITHOUT_SUGGESTION even when the preview flag spark.sql.timestampNanosTypes.enabled was on. String ingestion is the most common entry point for these types, and it unblocks typed literals, filters, and CTAS once coercion lands.

Does this PR introduce any user-facing change?

Yes, but only when the preview flag spark.sql.timestampNanosTypes.enabled is on (it defaults to off in production). With the flag on, CAST(<string> AS TIMESTAMP_NTZ(p)) and CAST(<string> AS TIMESTAMP_LTZ(p)) for p in [7, 9] now produce correct nanosecond values in LEGACY, ANSI, and TRY modes; previously they failed type-checking. With the flag off, behavior is unchanged (FEATURE_NOT_ENABLED). Existing microsecond timestamp string casts are also unchanged.

How was this patch tested?

  • CastSuiteBase: success cases for both types over p in [7, 9] and a 7-9 digit fractional corpus; LTZ parameterized over time zones, NTZ zone-independent (including a discarded zone suffix). Plus a flag-off guard asserting FEATURE_NOT_ENABLED.
  • CastWithAnsiOnSuite: malformed-input parse errors (DateTimeException / CAST_INVALID_INPUT).
  • CastWithAnsiOffSuite / TryCastSuite: malformed input returns NULL.
  • Golden-file checks added to cast.sql (regenerated with SPARK_GENERATE_GOLDEN_FILES=1): positive cases assert the result type via typeof (the reverse direction, nanos -> string rendering, is not wired yet and is tracked under SPARK-57162); negative cases exercise the ANSI parse-error path (and NULL in non-ANSI mode).
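
The zone behavior that the CastSuiteBase cases exercise can be sketched in plain Python. This is an illustrative model only: `cast_ntz` and `cast_ltz` are hypothetical helpers, Python's `datetime` carries microseconds rather than nanoseconds, and the rule that an explicit zone in the string takes precedence over the session zone is an assumption carried over from the existing micro timestamp cast.

```python
from datetime import datetime, timedelta, timezone, tzinfo


def cast_ntz(text: str) -> datetime:
    """NTZ: keep the wall-clock fields and discard any zone suffix."""
    return datetime.fromisoformat(text).replace(tzinfo=None)


def cast_ltz(text: str, session_zone: tzinfo) -> datetime:
    """LTZ: resolve the string to an instant, returned here in UTC.
    The session zone applies only when the string carries no zone."""
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=session_zone)
    return parsed.astimezone(timezone.utc)


# Fixed offsets keep the sketch independent of a time zone database.
plus_two = timezone(timedelta(hours=2))
minus_seven = timezone(timedelta(hours=-7))
```

With these helpers, `cast_ntz` gives the same wall-clock value for `2026-06-02T10:00:00` and `2026-06-02T10:00:00+02:00`, while `cast_ltz` on the zone-less string yields different instants under `plus_two` and `minus_seven`, which is why only the LTZ cases are parameterized over time zones.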

Verified locally:

$ build/sbt 'catalyst/testOnly *CastSuite *CastWithAnsiOnSuite *CastWithAnsiOffSuite *TryCastSuite'
$ build/sbt 'sql/testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z cast.sql'
$ ./dev/scalastyle

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor (Claude Opus 4.8)

MaxGekk commented Jun 3, 2026

Member Author

@stevomitric @uros-db Could you review this PR, please?

uros-db left a comment

Contributor

Solid, idiomatic, well-tested. Left a few nit comments, but otherwise LGTM!

MaxGekk added 4 commits June 3, 2026 10:33
Wire Cast to support CAST(<string> AS TIMESTAMP_NTZ(p)) and
CAST(<string> AS TIMESTAMP_LTZ(p)) for fractional-seconds precision p in
[7, 9], on both the interpreted and codegen paths and across LEGACY, ANSI
and TRY eval modes. Reuses the SPARK-57032 string->nanos parse helpers on
SparkDateTimeUtils, which already return a normalized TimestampNanosVal and
apply per-precision truncation.

- Add StringType -> Timestamp{NTZ,LTZ}NanosType arms to canCast/canAnsiCast.
- Add (StringType, TimestampLTZNanosType) to Cast.needsTimeZone (NTZ string
  is zone-independent, mirroring micro TIMESTAMP_NTZ).
- Add interpreted castToTimestamp{LTZ,NTZ}Nanos and matching codegen,
  dispatched with the precision taken from the target type. NTZ adopts
  allowTimeZone = true to match the micro TIMESTAMP_NTZ string cast.

Tests cover success cases over p in [7, 9], ANSI parse errors, LEGACY/TRY
null on malformed input, and a flag-off FEATURE_NOT_ENABLED guard.
…estamps

Add end-to-end golden-file coverage to cast.sql for casting strings to
TIMESTAMP_NTZ(p)/TIMESTAMP_LTZ(p), mirroring the existing timestamp,
timestamp_ntz and TIME cast checks:

- Positive cases assert the result type via typeof (the reverse direction,
  nanos -> string rendering, is not wired yet; tracked under SPARK-57162).
- Negative cases exercise the parse-error path: ANSI mode throws
  CAST_INVALID_INPUT, non-ANSI returns NULL.

Golden files regenerated with SPARK_GENERATE_GOLDEN_FILES=1.
…r thrift

ThriftServerQueryTestSuite failed on nonansi/cast.sql because a bare
TIMESTAMP_NTZ(9)/TIMESTAMP_LTZ(9) result column cannot be mapped to a
JDBC/Hive type name yet (nanos -> string serialization is out of scope,
tracked under SPARK-57162). Wrap the negative cast checks in IS NULL so the
result column is boolean; the ANSI parse-error path is unchanged.
…st import

- Cast.scala: make the interpreted NTZ string parse pass allowTimeZone = true
  explicitly so it matches the codegen path (which must pass it since Scala
  default args are not visible from generated Java).
- CastWithAnsiOffSuite: import foreachNanosPrecision instead of using the
  fully-qualified name inline, consistent with the other Cast suites.
MaxGekk force-pushed the nanos-cast-string branch from 4a1a084 to 63b5890 Compare June 3, 2026 08:34

stevomitric left a comment

Contributor

LGTM.

uros-b closed this in 7d0a8cd Jun 3, 2026
uros-b pushed a commit that referenced this pull request Jun 3, 2026
Closes #56288 from MaxGekk/nanos-cast-string.

Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Uros Bojanic <221401595+uros-b@users.noreply.github.com>
(cherry picked from commit 7d0a8cd)
Signed-off-by: Uros Bojanic <221401595+uros-b@users.noreply.github.com>

MaxGekk commented Jun 3, 2026

Member Author

Thank you @uros-b!
