GH-50078: [C++][ORC] Avoid signed overflow when converting timestamps#50035
Conversation
|
|
I agree with @kou. Changes like this require an issue. The change itself looks good. |
|
Opened #50078 for this and retitled the PR to reference it. Also restored the PR template. Thanks both. |
|
|
|
Thanks for updating it! Is it possible to add a test case? |
|
Added one in adapter_test.cc (TimestampOutOfRangeIsRejected) - it feeds a far-future ORC timestamp (10000000000s, ~year 2286) through AppendBatch and asserts it returns Invalid instead of overflowing. |
There was a problem hiding this comment.
Pull request overview
This PR fixes a signed integer overflow (UB) in the ORC-to-Arrow timestamp conversion path (AppendTimestampBatch) when scaling very large ORC timestamps (seconds + nanos) to int64 nanoseconds, returning a clear Status::Invalid instead of overflowing.
Changes:
- Add checked arithmetic (
MultiplyWithOverflow/AddWithOverflow) when computingseconds * 1e9 + nanosfor ORC timestamps. - Switch timestamp appending to an explicit loop that skips conversion for null slots and uses
UnsafeAppend*after reserving capacity. - Add a unit test asserting out-of-range timestamps are rejected with
Invalid.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| cpp/src/arrow/adapters/orc/util.cc | Detect and report overflow when converting ORC timestamps to int64 nanoseconds; skip nulls safely. |
| cpp/src/arrow/adapters/orc/adapter_test.cc | Add regression test to ensure far-future timestamps that overflow nanosecond scaling are rejected. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit e980b7e. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 2 possible false positives for unstable benchmarks that are known to sometimes produce them. |
Rationale for this change
A far-future ORC timestamp (after ~2262) makes
AppendTimestampBatchincpp/src/arrow/adapters/orc/util.ccoverflow int64 nanoseconds inseconds * kOneSecondNanos + nanos. Reducing the multiply under-fsanitize=signed-integer-overflow:What changes are included in this PR?
Detect the overflow with
MultiplyWithOverflow/AddWithOverflowand returnStatus::Invalidfor out-of-range values instead of computing the product with undefined behavior. Null slots are skipped.Are these changes tested?
Verified the offending expression with a standalone
-fsanitize=signed-integer-overflowreproducer and confirmed the patched path returns an error rather than overflowing.Are there any user-facing changes?
No.