GH-50078: [C++][ORC] Avoid signed overflow when converting timestamps #50035

Merged
kou merged 3 commits into apache:main from jmestwa-coder:orc-timestamp-nanos-overflow
Jun 5, 2026
Conversation

@jmestwa-coder

@jmestwa-coder jmestwa-coder commented May 25, 2026

Contributor

Rationale for this change

A far-future ORC timestamp (after ~2262) makes AppendTimestampBatch in cpp/src/arrow/adapters/orc/util.cc overflow int64 nanoseconds when it evaluates seconds * kOneSecondNanos + nanos. Reproducing the multiplication under -fsanitize=signed-integer-overflow reports:

runtime error: signed integer overflow:
  10000000000 * 1000000000 cannot be represented in type 'int64_t'
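
To make the "~2262" bound concrete, the following is a minimal sketch of where the limit sits, assuming kOneSecondNanos is 1000000000 as the sanitizer message suggests. The names kMaxSafeSeconds and ProductFits are hypothetical and exist only for this illustration; they are not part of the patch.

```cpp
#include <cstdint>
#include <limits>

// Assumed to match the constant used in util.cc (one second in nanoseconds).
constexpr int64_t kOneSecondNanos = 1000000000LL;

// Hypothetical helper constant: the largest whole-second count whose
// nanosecond product still fits in int64_t.
constexpr int64_t kMaxSafeSeconds =
    std::numeric_limits<int64_t>::max() / kOneSecondNanos;

// Hypothetical helper: true when seconds * kOneSecondNanos cannot overflow.
// The sub-second nanos term is deliberately ignored here.
constexpr bool ProductFits(int64_t seconds) {
  return seconds <= kMaxSafeSeconds &&
         seconds >= std::numeric_limits<int64_t>::min() / kOneSecondNanos;
}

static_assert(kMaxSafeSeconds == 9223372036LL,
              "int64 nanoseconds cover about 9.22e9 seconds past the epoch");
static_assert(!ProductFits(10000000000LL),
              "the value from the sanitizer report is out of range");
```

9223372036 seconds after the Unix epoch falls in April 2262, so the 10000000000 second value in the report is well past the representable range.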

What changes are included in this PR?

Detect the overflow with MultiplyWithOverflow/AddWithOverflow and return Status::Invalid for out-of-range values instead of computing the product with undefined behavior. Null slots are skipped.
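
As a rough illustration of the checked arithmetic described above, here is a standalone sketch. It is not the code from the patch: it swaps in the GCC/Clang builtins __builtin_mul_overflow and __builtin_add_overflow for Arrow's MultiplyWithOverflow/AddWithOverflow, returns std::optional where the patch returns Status::Invalid, and uses a hypothetical function name, TimestampToNanosChecked.

```cpp
#include <cstdint>
#include <optional>

// Assumed value of the constant named in the PR description.
constexpr int64_t kOneSecondNanos = 1000000000LL;

// Hypothetical stand-in for the patched conversion. Returns std::nullopt
// where the real code would return Status::Invalid.
std::optional<int64_t> TimestampToNanosChecked(int64_t seconds, int64_t nanos) {
  int64_t scaled = 0;
  int64_t total = 0;
  // Both builtins return true when the exact result does not fit in the
  // destination, so no undefined signed overflow is ever evaluated.
  if (__builtin_mul_overflow(seconds, kOneSecondNanos, &scaled) ||
      __builtin_add_overflow(scaled, nanos, &total)) {
    return std::nullopt;
  }
  return total;
}
```

With this sketch, TimestampToNanosChecked(1, 5) yields 1000000005, while TimestampToNanosChecked(10000000000, 0) yields std::nullopt instead of overflowing.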

Are these changes tested?

Verified the offending expression with a standalone -fsanitize=signed-integer-overflow reproducer and confirmed the patched path returns an error rather than overflowing.

Are there any user-facing changes?

No.

@kou

kou commented May 25, 2026

Member

@wgtmac

wgtmac commented May 26, 2026

Member

I agree with @kou. Changes like this require an issue. The change itself looks good.

@jmestwa-coder jmestwa-coder changed the title from "MINOR: [C++][ORC] Avoid signed overflow when converting timestamps" to "GH-50078: [C++][ORC] Avoid signed overflow when converting timestamps" Jun 2, 2026
@jmestwa-coder

Contributor Author

Opened #50078 for this and retitled the PR to reference it. Also restored the PR template. Thanks both.

@github-actions

github-actions Bot commented Jun 2, 2026


⚠️ GitHub issue #50078 has been automatically assigned in GitHub to PR creator.

@wgtmac

wgtmac commented Jun 2, 2026

Member

Thanks for updating it! Is it possible to add a test case?

@jmestwa-coder

Contributor Author

Added one in adapter_test.cc (TimestampOutOfRangeIsRejected) - it feeds a far-future ORC timestamp (10000000000s, ~year 2286) through AppendBatch and asserts it returns Invalid instead of overflowing.

@wgtmac wgtmac left a comment

Member

Thank you, @jmestwa-coder!

Comment thread cpp/src/arrow/adapters/orc/adapter_test.cc Outdated

Copilot AI left a comment

Contributor

Pull request overview

This PR fixes a signed integer overflow (UB) in the ORC-to-Arrow timestamp conversion path (AppendTimestampBatch) when scaling very large ORC timestamps (seconds + nanos) to int64 nanoseconds, returning a clear Status::Invalid instead of overflowing.

Changes:

  • Add checked arithmetic (MultiplyWithOverflow / AddWithOverflow) when computing seconds * 1e9 + nanos for ORC timestamps.
  • Switch timestamp appending to an explicit loop that skips conversion for null slots and uses UnsafeAppend* after reserving capacity.
  • Add a unit test asserting out-of-range timestamps are rejected with Invalid.
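
The explicit loop described in the list above can be sketched roughly as follows. This is a simplified model under stated assumptions, not the Arrow implementation: plain std::vector columns stand in for the ORC batch and the Arrow builder, reserve plus push_back stands in for reserving capacity and calling UnsafeAppend*, std::nullopt stands in for Status::Invalid, and the names ConvertedColumn and ConvertTimestampColumn are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

constexpr int64_t kNanosPerSecond = 1000000000LL;

// Hypothetical output type: one value and one validity flag per slot.
struct ConvertedColumn {
  std::vector<int64_t> values;
  std::vector<bool> valid;
};

// Hypothetical model of the conversion loop. Returns std::nullopt when a
// non-null slot does not fit in int64 nanoseconds.
std::optional<ConvertedColumn> ConvertTimestampColumn(
    const std::vector<int64_t>& seconds, const std::vector<int64_t>& nanos,
    const std::vector<bool>& not_null) {
  if (nanos.size() != seconds.size() || not_null.size() != seconds.size()) {
    return std::nullopt;
  }
  ConvertedColumn out;
  out.values.reserve(seconds.size());  // reserve once, then append
  out.valid.reserve(seconds.size());
  for (std::size_t i = 0; i < seconds.size(); ++i) {
    if (!not_null[i]) {
      // Null slot: its seconds/nanos are never read, so leftover data in
      // the slot cannot trigger a spurious out-of-range error.
      out.values.push_back(0);
      out.valid.push_back(false);
      continue;
    }
    int64_t scaled = 0;
    int64_t total = 0;
    if (__builtin_mul_overflow(seconds[i], kNanosPerSecond, &scaled) ||
        __builtin_add_overflow(scaled, nanos[i], &total)) {
      return std::nullopt;
    }
    out.values.push_back(total);
    out.valid.push_back(true);
  }
  return out;
}
```

In this model, an out-of-range value sitting in a null slot is ignored, while the same value in a non-null slot makes the whole conversion fail.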

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| cpp/src/arrow/adapters/orc/util.cc | Detect and report overflow when converting ORC timestamps to int64 nanoseconds; skip nulls safely. |
| cpp/src/arrow/adapters/orc/adapter_test.cc | Add regression test to ensure far-future timestamps that overflow nanosecond scaling are rejected. |

@kou kou merged commit e980b7e into apache:main Jun 5, 2026
59 of 60 checks passed
@kou kou removed the awaiting review label Jun 5, 2026
@conbench-apache-arrow

After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit e980b7e.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 2 possible false positives for unstable benchmarks that are known to sometimes produce them.

4 participants