Skip to content

[SPARK-57201][SQL] Skip the null check in UnsafeProjection codegen for statically non-null values#56257

Closed
gengliangwang wants to merge 1 commit into
apache:masterfrom
gengliangwang:spark-unsafeproj-nullcheck
Closed

[SPARK-57201][SQL] Skip the null check in UnsafeProjection codegen for statically non-null values#56257
gengliangwang wants to merge 1 commit into
apache:masterfrom
gengliangwang:spark-unsafeproj-nullcheck

Conversation

@gengliangwang

Copy link
Copy Markdown
Member

What changes were proposed in this pull request?

GenerateUnsafeProjection writes each field as if (${input.isNull}) { setNull } else { write } whenever the field's schema is nullable. When the bound value is statically non-null its isNull is the literal false, so this emits a dead if (false) { setNullAt(i); } else { write(...); }.

This PR skips the null check (and the dead setNull branch) when input.isNull == FalseLiteral, emitting just the write -- the same code the existing !nullable path already produces. Symmetrically, when input.isNull == TrueLiteral (statically null), it emits only setNull. The FalseLiteral comparison is the idiom already used in this file for the top-level zeroOutNullBytes fast path.

Before (for a non-null value in a nullable-schema field):

if (false) {
  rowWriter.setNullAt(0);
} else {
  rowWriter.write(0, value);
}

After:

rowWriter.write(0, value);

Why are the changes needed?

Sub-task of SPARK-56908 (reduce generated Java size in whole-stage codegen). Dumping the TPC-DS whole-stage codegen shows ~358 such dead if (false) { setNullAt } else { write } blocks; the schema marks the field nullable while the bound value is statically non-null. Emitting only the write removes the dead branch and the redundant conditional from the generated projection code.

Does this PR introduce any user-facing change?

No. The row writer's null bits are already cleared up front (resetRowWriter / zeroOutNullBytes), so writing a non-null value without setNullAt is exactly what the existing !nullable path does; results are unchanged.

How was this patch tested?

Behavior-preserving change covered by the existing projection suites: GeneratedProjectionSuite and UnsafeRowConverterSuite (39 tests) pass. Additionally verified by re-dumping the TPC-DS whole-stage codegen: the ~358 dead if (false) { setNullAt } blocks are gone and every generated subtree still compiles.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Opus 4.8)

…r statically non-null values

GenerateUnsafeProjection writes a nullable-schema field as
`if (input.isNull) { setNull } else { write }`. When the bound value is
statically non-null its isNull is the literal `false`, producing a dead
`if (false) { setNullAt } else { write }`. Skip the check when
`input.isNull == FalseLiteral` (emit just the write, as the !nullable path
already does); emit only setNull when `input.isNull == TrueLiteral`.

Across the TPC-DS queries this removes ~358 dead `if (false)` null-write blocks.

Generated-by: Claude Code (Opus 4.8)
@gengliangwang

Copy link
Copy Markdown
Member Author

@LuciferYang thanks for the review. Merging to master/4.x

gengliangwang added a commit that referenced this pull request Jun 3, 2026
…r statically non-null values

### What changes were proposed in this pull request?

`GenerateUnsafeProjection` writes each field as `if (${input.isNull}) { setNull } else { write }` whenever the field's schema is nullable. When the bound value is statically non-null its `isNull` is the literal `false`, so this emits a dead `if (false) { setNullAt(i); } else { write(...); }`.

This PR skips the null check (and the dead `setNull` branch) when `input.isNull == FalseLiteral`, emitting just the write -- the same code the existing `!nullable` path already produces. Symmetrically, when `input.isNull == TrueLiteral` (statically null), it emits only `setNull`. The `FalseLiteral` comparison is the idiom already used in this file for the top-level `zeroOutNullBytes` fast path.

Before (for a non-null value in a nullable-schema field):

```java
if (false) {
  rowWriter.setNullAt(0);
} else {
  rowWriter.write(0, value);
}
```

After:

```java
rowWriter.write(0, value);
```

### Why are the changes needed?

Sub-task of SPARK-56908 (reduce generated Java size in whole-stage codegen). Dumping the TPC-DS whole-stage codegen shows ~358 such dead `if (false) { setNullAt } else { write }` blocks; the schema marks the field nullable while the bound value is statically non-null. Emitting only the write removes the dead branch and the redundant conditional from the generated projection code.

### Does this PR introduce _any_ user-facing change?

No. The row writer's null bits are already cleared up front (`resetRowWriter` / `zeroOutNullBytes`), so writing a non-null value without `setNullAt` is exactly what the existing `!nullable` path does; results are unchanged.

### How was this patch tested?

Behavior-preserving change covered by the existing projection suites: `GeneratedProjectionSuite` and `UnsafeRowConverterSuite` (39 tests) pass. Additionally verified by re-dumping the TPC-DS whole-stage codegen: the ~358 dead `if (false) { setNullAt }` blocks are gone and every generated subtree still compiles.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Opus 4.8)

Closes #56257 from gengliangwang/spark-unsafeproj-nullcheck.

Authored-by: Gengliang Wang <gengliang@apache.org>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
(cherry picked from commit 8a73a0c)
Signed-off-by: Gengliang Wang <gengliang@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants