Skip to content

[SPARK-57210][SQL] Skip statically-dead null-write branch in CreateNamedStruct codegen for non-nullable fields#56270

Closed
LuciferYang wants to merge 2 commits into
apache:masterfrom
LuciferYang:createnamedstruct-codegen
Closed

[SPARK-57210][SQL] Skip statically-dead null-write branch in CreateNamedStruct codegen for non-nullable fields#56270
LuciferYang wants to merge 2 commits into
apache:masterfrom
LuciferYang:createnamedstruct-codegen

Conversation

@LuciferYang

Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

CreateNamedStruct.doGenCode emitted, for every value child, an if (isNull) values[i] = null; else values[i] = value; block. For a non-nullable child the isNull branch is statically dead (eval.isNull is the constant false), so this skips it and emits a single values[i] = value;. Nullable children are unchanged.

This mirrors GenArrayData.genCodeToCreateArrayData in the same file, which already gates the per-element null write on !expr.nullable (so CreateArray / CreateMap already do this) — CreateNamedStruct was the lone holdout.

Why are the changes needed?

It removes a dead branch and the unreachable values[i] = null write per non-nullable field, shrinking the generated method. The win scales with struct width (e.g. struct(*) over a non-null schema), which helps with the JVM 64KB method / constant-pool limits, Janino compile time, and JIT work. Part of SPARK-56908.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing ComplexTypeSuite (CreateStruct, CreateNamedStruct, SPARK-22693 codegen) and CodeGenerationSuite wide-struct tests already exercise both the non-nullable child (the new branch) and nullable children via checkEvaluation, which runs the interpreted and codegen paths. Behavior is unchanged — the skipped write targets a slot that new Object[] already zero-initializes to null — so no new test is added, matching the in-file array (GenArrayData) convention.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Claude Opus 4.8)

…medStruct codegen for non-nullable fields

CreateNamedStruct.doGenCode emitted an if (isNull) values[i]=null else values[i]=value block for every value child. For a non-nullable child the isNull branch is statically dead (eval.isNull is the constant false), so emit a single values[i]=value instead. Nullable children are unchanged. This mirrors GenArrayData.genCodeToCreateArrayData in the same file, which already gates the per-element null write on !expr.nullable (CreateArray/CreateMap already do this); CreateNamedStruct was the lone holdout. Shrinks the generated method per non-nullable field, scaling with struct width.

Part of SPARK-56908.

@dongjoon-hyun dongjoon-hyun left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM. Thank you, @LuciferYang .

@gengliangwang

Copy link
Copy Markdown
Member

Flagging against the scope guidance added to the umbrella SPARK-56908.

This is a source-only dead-branch elimination: Janino constant-folds if (true)/if (false), so there is no bytecode change. Measured on TPC-DS whole-stage codegen, even the most frequent such patterns (×358–445, ~0.4% of generated source) saved compile time below the run-to-run noise floor (~0.7%). The changes that were worth it removed more than source — e.g. SPARK-57198 dropped an unreachable references[]/constant-pool entry, which Janino cannot recover.

Before merging, please confirm this removes more than source text (a references[]/constant-pool entry, or keeps a method under the 64KB limit) — raw occurrence count alone is not sufficient. If not, it adds codegen-logic complexity for negligible benefit and should be dropped.

@LuciferYang

Copy link
Copy Markdown
Contributor Author

Close this PR after checking against the preceding criteria. thank you @gengliangwang @dongjoon-hyun

@LuciferYang LuciferYang closed this Jun 4, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants