Skip to content

[FLINK-39763][table] Allow inferring constants with IS NOT DISTINCT FROM#28256

Merged
snuyanzin merged 2 commits into
apache:masterfrom
snuyanzin:flink39763
Jun 16, 2026
Merged

[FLINK-39763][table] Allow inferring constants with IS NOT DISTINCT FROM#28256
snuyanzin merged 2 commits into
apache:masterfrom
snuyanzin:flink39763

Conversation

@snuyanzin

@snuyanzin snuyanzin commented May 26, 2026

Copy link
Copy Markdown
Contributor

What is the purpose of the change

The PRs enables Calcite's optimization for inferring constants https://issues.apache.org/jira/browse/CALCITE-5336

Brief change log

RexUtil

Verifying this change

Existing tests and updated tests

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): ( no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
  • The serializers: (no)
  • The runtime per-record code paths (performance sensitive): (no)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no)
  • The S3 file system connector: (no)

Documentation

  • Does this pull request introduce a new feature? (no)
  • If yes, how is the feature documented? (not applicable)

Was generative AI tooling used to co-author this PR?
  • Yes (please specify the tool below)

<![CDATA[
Calc(select=[a, b, CAST(2 AS INTEGER) AS d, e])
+- MultipleInput(readOrder=[0,1], members=[\nHashJoin(joinType=[InnerJoin], where=[((a = d) AND (b = e))], select=[a, b, d, e], isBroadcast=[true], build=[right])\n:- HashAggregate(isMerge=[true], groupBy=[a], select=[a, Final_COUNT(count$0) AS b])\n: +- [#2] Exchange(distribution=[hash[a]])\n+- [#1] Exchange(distribution=[broadcast])\n])
Calc(select=[CAST(2 AS INTEGER) AS a, b, CAST(2 AS INTEGER) AS d, e])

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the constant pulled up

join
(select d, count(e) as e from MyTable2 group by d)
on a = d and b = e and d = 2 and b = 1
]]>

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here and for other joins (except for NestedLoop)
now Calcite is able to understand that a == 2 and e == 1
after that it is able to push down these values to corresponding table sources
as a result non nested loop join will fail as there is only true in join condition.

when all joins are available then NestedLoop will be selected, there is also e2e case in DecimalITCase in this PR

@flinkbot

flinkbot commented May 26, 2026

Copy link
Copy Markdown
Collaborator

CI report:

Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

@jnh5y

jnh5y commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

Do we need/want to add a test for IS NOT DISTINCT FROM or is that already covered sufficiently?

Comment on lines -182 to -183
tEnv.getConfig
.set(ExecutionConfigOptions.TABLE_EXEC_DISABLED_OPERATORS, "HashJoin, NestedLoopJoin")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this being changed?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same reason as explained above #28256 (comment)

Comment on lines 78 to 80
@@ -79,10 +79,7 @@
* because of current Calcite way of inferring constants from IS NOT DISTINCT FROM clashes with
* filter push down.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like this comment needs to be updated?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense, updated

@snuyanzin

Copy link
Copy Markdown
Contributor Author

Do we need/want to add a test for IS NOT DISTINCT FROM or is that already covered sufficiently?

if you mean IS NOT DISTINCT FROM in SQL the there are already some existing like

testSqlApi("NULLIF(1,1) IS NOT DISTINCT FROM NULLIF(1,1)", "TRUE")
testSqlApi("NULLIF(1,1) IS NOT DISTINCT FROM NULLIF(1,1)", "TRUE")

@Test
def testJoinWithIsNotDistinctFrom(): Unit = {
val sql =
"""
|SELECT L.*, R.*
|FROM (
| SELECT
| a,
| window_start,
| window_end,
| window_time,
| count(*) as cnt,
| count(distinct c) AS uv
| FROM TABLE(TUMBLE(TABLE MyTable, DESCRIPTOR(rowtime), INTERVAL '15' MINUTE))
| GROUP BY a, window_start, window_end, window_time
|) L
|JOIN (
| SELECT
| a,
| window_start,
| window_end,
| window_time,
| count(*) as cnt,
| count(distinct c) AS uv
| FROM TABLE(TUMBLE(TABLE MyTable2, DESCRIPTOR(rowtime), INTERVAL '15' MINUTE))
| GROUP BY a, window_start, window_end, window_time
|) R
|ON L.window_start = R.window_start AND L.window_end = R.window_end AND
|L.a IS NOT DISTINCT FROM R.a
""".stripMargin

@github-actions github-actions Bot added the community-reviewed PR has been reviewed by the community. label Jun 16, 2026
@snuyanzin snuyanzin merged commit 2d515d3 into apache:master Jun 16, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

community-reviewed PR has been reviewed by the community.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants