Skip to content

[SPARK-57216][SQL] Skip dead null-check branches in NaNvl codegen for non-nullable children#56276

Closed
LuciferYang wants to merge 2 commits into
apache:masterfrom
LuciferYang:nanvl-codegen
Closed

[SPARK-57216][SQL] Skip dead null-check branches in NaNvl codegen for non-nullable children#56276
LuciferYang wants to merge 2 commits into
apache:masterfrom
LuciferYang:nanvl-codegen

Conversation

@LuciferYang

Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

NaNvl.doGenCode always wrapped each child in a null check: an outer if (leftGen.isNull) { ev.isNull = true; } else { ... }, and (in the NaN-fallback path) an inner if (rightGen.isNull) { ev.isNull = true; } else { ev.value = rightGen.value; }. For a non-nullable child the isNull is statically false, so the wrapper is dead.

This drops the outer wrapper when left is non-nullable and the inner one when right is non-nullable, emitting just the value assignment. The NaN check and the lazy evaluation of the right child (evaluated only when left is non-null and NaN) are preserved. The body is refactored into a few small Block values to express the gating cleanly.

Follows the SPARK-56908 non-nullable-children dead-branch pattern of GenArrayData.genCodeToCreateArrayData / CreateNamedStruct.

Why are the changes needed?

It removes a dead comparison/branch per non-nullable child, shrinking the generated method (~3 lines per non-nullable child, ~6 when both are non-nullable), which helps with the JVM 64KB method / constant-pool limits, Janino compile time, and JIT work. Part of SPARK-56908.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing NullExpressionsSuite "nanvl" and ColumnExpressionSuite "nanvl", plus added checkEvaluation cases with nullable BoundReference children that exercise the both-children-nullable codegen path (the existing non-nullable-literal cases now take the dead-branch-skipping paths instead). checkEvaluation runs the interpreted and codegen paths; behavior is unchanged.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Claude Opus 4.8)

… non-nullable children

NaNvl.doGenCode wrapped each child in a null check: an outer if (leftGen.isNull) {...} else {...} and, in the NaN-fallback path, an inner if (rightGen.isNull) {...} else { ev.value = rightGen.value; }. For a non-nullable child the isNull is statically false, so drop the outer wrapper when left is non-nullable and the inner one when right is non-nullable, emitting just the value assignment. The NaN check and lazy right-child evaluation are preserved. The body is refactored into small Block values to express the gating cleanly. Follows the non-nullable-children dead-branch pattern of GenArrayData.genCodeToCreateArrayData / CreateNamedStruct; shrinks the generated method per non-nullable child. Adds checkEvaluation cases with nullable BoundReference children to cover the both-children-nullable codegen path.

Part of SPARK-56908.

@dongjoon-hyun dongjoon-hyun left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM. Thank you, @LuciferYang .

@gengliangwang

Copy link
Copy Markdown
Member

Flagging against the scope guidance added to the umbrella SPARK-56908.

This is a source-only dead-branch elimination: Janino constant-folds if (true)/if (false), so there is no bytecode change. Measured on TPC-DS whole-stage codegen, even the most frequent such patterns (×358–445, ~0.4% of generated source) saved compile time below the run-to-run noise floor (~0.7%). The changes that were worth it removed more than source — e.g. SPARK-57198 dropped an unreachable references[]/constant-pool entry, which Janino cannot recover.

Before merging, please confirm this removes more than source text (a references[]/constant-pool entry, or keeps a method under the 64KB limit) — raw occurrence count alone is not sufficient. If not, it adds codegen-logic complexity for negligible benefit and should be dropped.

@LuciferYang

Copy link
Copy Markdown
Contributor Author

Close this PR after checking against the preceding criteria.

@LuciferYang LuciferYang closed this Jun 4, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants