Skip to content

[SPARK-57217][SQL] Skip dead null assignments in If codegen when branch values are non-nullable#56278

Closed
LuciferYang wants to merge 2 commits into
apache:masterfrom
LuciferYang:if-codegen-nonnull
Closed

[SPARK-57217][SQL] Skip dead null assignments in If codegen when branch values are non-nullable#56278
LuciferYang wants to merge 2 commits into
apache:masterfrom
LuciferYang:if-codegen-nonnull

Conversation

@LuciferYang

Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

If.doGenCode always emitted, for each branch, an ev.isNull = branchEval.isNull; assignment after the boolean ev.isNull = false; declaration. For a non-nullable branch that assignment is ev.isNull = false;, which is dead (ev.isNull is already initialized to false). This drops that assignment for a non-nullable branch. When the whole If is non-nullable (both branches non-nullable), it also drops the boolean ev.isNull = false; declaration and returns isNull = FalseLiteral, which lets parent operators skip null checks on the result.

The body is refactored into small Block values (isNullDecl / setTrueIsNull / setFalseIsNull, EmptyBlock when not needed). This follows the SPARK-56908 non-nullable-children dead-branch pattern and the standard non-nullable codegen contract (isNull = FalseLiteral).

Why are the changes needed?

It removes dead assignments (and, for a non-nullable If, the ev.isNull variable itself), shrinking the generated method. Because If is ubiquitous and often nested, the isNull = FalseLiteral propagation additionally lets parent operators drop null checks on a non-nullable If result. This helps with the JVM 64KB method / constant-pool limits, Janino compile time, and JIT work. Part of SPARK-56908.

Does this PR introduce any user-facing change?

No. If.nullable, dataType, and eval are unchanged, so results and output-schema nullability are identical; only the generated Java is smaller.

How was this patch tested?

Existing ConditionalExpressionSuite "if" / "case when" (the latter delegates single-branch codegen to If) and the SPARK-49396 nullability test, plus added checkEvaluation cases with nullable BoundReference branches that exercise the both-children-nullable codegen template (the existing non-null-literal cases now take the dead-branch-skipping paths instead). checkEvaluation runs the interpreted and codegen paths; behavior is unchanged.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Claude Opus 4.8)

…ch values are non-nullable

If.doGenCode emitted ev.isNull = branchEval.isNull; for each branch after declaring boolean ev.isNull = false;. For a non-nullable branch that assignment is a dead ev.isNull = false;, so drop it. When the whole If is non-nullable, also drop the boolean ev.isNull declaration and return isNull = FalseLiteral, letting parent operators skip null checks on the result. Refactored via small Block values (isNullDecl/setTrueIsNull/setFalseIsNull, EmptyBlock when not needed). Follows the non-nullable-children dead-branch pattern and the standard non-nullable codegen contract; shrinks the generated method (and, for non-nullable If, the downstream null checks). Adds checkEvaluation cases with nullable BoundReference branches to cover the both-children-nullable codegen template.

Part of SPARK-56908.

@dongjoon-hyun dongjoon-hyun left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM. Thank you, @LuciferYang .

@gengliangwang

Copy link
Copy Markdown
Member

Flagging against the scope guidance added to the umbrella SPARK-56908.

This is a source-only dead-branch elimination: Janino constant-folds if (true)/if (false), so there is no bytecode change. Measured on TPC-DS whole-stage codegen, even the most frequent such patterns (×358–445, ~0.4% of generated source) saved compile time below the run-to-run noise floor (~0.7%). The changes that were worth it removed more than source — e.g. SPARK-57198 dropped an unreachable references[]/constant-pool entry, which Janino cannot recover.

Before merging, please confirm this removes more than source text (a references[]/constant-pool entry, or keeps a method under the 64KB limit) — raw occurrence count alone is not sufficient. If not, it adds codegen-logic complexity for negligible benefit and should be dropped.

@LuciferYang

Copy link
Copy Markdown
Contributor Author

Close this PR after checking against the preceding criteria.

@LuciferYang LuciferYang closed this Jun 4, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants