[SPARK-57201][SQL] Skip the null check in UnsafeProjection codegen for statically non-null values#56257
Closed
gengliangwang wants to merge 1 commit into
Closed
Conversation
…r statically non-null values
GenerateUnsafeProjection writes a nullable-schema field as
`if (input.isNull) { setNull } else { write }`. When the bound value is
statically non-null its isNull is the literal `false`, producing a dead
`if (false) { setNullAt } else { write }`. Skip the check when
`input.isNull == FalseLiteral` (emit just the write, as the !nullable path
already does); emit only setNull when `input.isNull == TrueLiteral`.
Across the TPC-DS queries this removes ~358 dead `if (false)` null-write blocks.
Generated-by: Claude Code (Opus 4.8)
LuciferYang
approved these changes
Jun 3, 2026
LuciferYang
approved these changes
Jun 3, 2026
Member
Author
|
@LuciferYang thanks for the review. Merging to master/4.x |
gengliangwang
added a commit
that referenced
this pull request
Jun 3, 2026
…r statically non-null values
### What changes were proposed in this pull request?
`GenerateUnsafeProjection` writes each field as `if (${input.isNull}) { setNull } else { write }` whenever the field's schema is nullable. When the bound value is statically non-null its `isNull` is the literal `false`, so this emits a dead `if (false) { setNullAt(i); } else { write(...); }`.
This PR skips the null check (and the dead `setNull` branch) when `input.isNull == FalseLiteral`, emitting just the write -- the same code the existing `!nullable` path already produces. Symmetrically, when `input.isNull == TrueLiteral` (statically null), it emits only `setNull`. The `FalseLiteral` comparison is the idiom already used in this file for the top-level `zeroOutNullBytes` fast path.
Before (for a non-null value in a nullable-schema field):
```java
if (false) {
rowWriter.setNullAt(0);
} else {
rowWriter.write(0, value);
}
```
After:
```java
rowWriter.write(0, value);
```
### Why are the changes needed?
Sub-task of SPARK-56908 (reduce generated Java size in whole-stage codegen). Dumping the TPC-DS whole-stage codegen shows ~358 such dead `if (false) { setNullAt } else { write }` blocks; the schema marks the field nullable while the bound value is statically non-null. Emitting only the write removes the dead branch and the redundant conditional from the generated projection code.
### Does this PR introduce _any_ user-facing change?
No. The row writer's null bits are already cleared up front (`resetRowWriter` / `zeroOutNullBytes`), so writing a non-null value without `setNullAt` is exactly what the existing `!nullable` path does; results are unchanged.
### How was this patch tested?
Behavior-preserving change covered by the existing projection suites: `GeneratedProjectionSuite` and `UnsafeRowConverterSuite` (39 tests) pass. Additionally verified by re-dumping the TPC-DS whole-stage codegen: the ~358 dead `if (false) { setNullAt }` blocks are gone and every generated subtree still compiles.
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Code (Opus 4.8)
Closes #56257 from gengliangwang/spark-unsafeproj-nullcheck.
Authored-by: Gengliang Wang <gengliang@apache.org>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
(cherry picked from commit 8a73a0c)
Signed-off-by: Gengliang Wang <gengliang@apache.org>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What changes were proposed in this pull request?
GenerateUnsafeProjectionwrites each field asif (${input.isNull}) { setNull } else { write }whenever the field's schema is nullable. When the bound value is statically non-null itsisNullis the literalfalse, so this emits a deadif (false) { setNullAt(i); } else { write(...); }.This PR skips the null check (and the dead
setNullbranch) wheninput.isNull == FalseLiteral, emitting just the write -- the same code the existing!nullablepath already produces. Symmetrically, wheninput.isNull == TrueLiteral(statically null), it emits onlysetNull. TheFalseLiteralcomparison is the idiom already used in this file for the top-levelzeroOutNullBytesfast path.Before (for a non-null value in a nullable-schema field):
After:
Why are the changes needed?
Sub-task of SPARK-56908 (reduce generated Java size in whole-stage codegen). Dumping the TPC-DS whole-stage codegen shows ~358 such dead
if (false) { setNullAt } else { write }blocks; the schema marks the field nullable while the bound value is statically non-null. Emitting only the write removes the dead branch and the redundant conditional from the generated projection code.Does this PR introduce any user-facing change?
No. The row writer's null bits are already cleared up front (
resetRowWriter/zeroOutNullBytes), so writing a non-null value withoutsetNullAtis exactly what the existing!nullablepath does; results are unchanged.How was this patch tested?
Behavior-preserving change covered by the existing projection suites:
GeneratedProjectionSuiteandUnsafeRowConverterSuite(39 tests) pass. Additionally verified by re-dumping the TPC-DS whole-stage codegen: the ~358 deadif (false) { setNullAt }blocks are gone and every generated subtree still compiles.Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Code (Opus 4.8)