Canonicalize MX fake quant export through Q-DQ #762

Merged
mhs4670go merged 3 commits into Samsung:main from mhs4670go:mx on Jun 8, 2026

@mhs4670go

Contributor

Introduce a separate MX fake-quant frontend op and lower it to a logical quantize_mx/dequantize_mx pair before Circle export.

Related: #436
TICO-DCO-1.0-Signed-off-by: seongwoo <mhs4670go@naver.com>

@mhs4670go

Contributor Author

@stamalakhov

Thanks for the draft. I slightly changed the code that introduces mx_fake_quantize.

Then DecomposeFakeQuantize lowers mx_fake_quantize to quantize_mx -> dequantize_mx, and FoldQuantOps folds that Q-DQ pair into the producer node's QPARAM metadata, just like affine quantization already does. This avoids overloading quantize_mx with two different meanings.
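
A minimal sketch of that two-step rewrite on a toy, linear graph. It is not the actual TICO passes: the `Node` class, the `decompose` and `fold` helpers, and the `qparam` field are hypothetical stand-ins for DecomposeFakeQuantize, FoldQuantOps, and the QPARAM metadata. Only the op names are taken from this PR.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass
class Node:
    op: str                       # e.g. "linear", "mx_fake_quantize"
    dtype: Optional[str] = None   # MX element type carried by quant ops
    qparam: Optional[str] = None  # hypothetical stand-in for QPARAM metadata


def decompose(graph: List[Node]) -> List[Node]:
    """Replace each mx_fake_quantize with quantize_mx -> dequantize_mx."""
    out: List[Node] = []
    for node in graph:
        if node.op == "mx_fake_quantize":
            out.append(Node("quantize_mx", dtype=node.dtype))
            out.append(Node("dequantize_mx", dtype=node.dtype))
        else:
            out.append(node)
    return out


def fold(graph: List[Node]) -> List[Node]:
    """Fold each Q-DQ pair into the node that produces its input."""
    out: List[Node] = []
    i = 0
    while i < len(graph):
        node = graph[i]
        is_pair = (
            node.op == "quantize_mx"
            and i + 1 < len(graph)
            and graph[i + 1].op == "dequantize_mx"
            and len(out) > 0  # a producer must exist to receive the metadata
        )
        if is_pair:
            # Record the quantization info on the producer and drop the pair.
            out[-1] = replace(out[-1], qparam=node.dtype)
            i += 2
        else:
            out.append(node)
            i += 1
    return out


graph = [Node("linear"), Node("mx_fake_quantize", dtype="mxint8"), Node("relu")]
lowered = decompose(graph)
folded = fold(lowered)
print([n.op for n in lowered])
print([(n.op, n.qparam) for n in folded])
```

In this sketch quantize_mx only ever means "quantize", and the fake-quant meaning lives in its own op, which is the separation described above.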

Please feel free to give your opinions.

def CircleMXFakeQuantize():
    """Register the eager MX fake-quantization custom operator."""

    @custom_op("circle_custom::mx_fake_quantize", mutates_args=())

Contributor

👍

Comment thread tico/serialize/circle_mapping.py Outdated
    # TODO Add more dtypes
}

optional_dtypes = {

Contributor

👍
Although I'm not sure why these:

 "mxint8": "MXINT8",
 "mxfp4": "MXFP4",

cannot be inserted directly into dmap.

Contributor Author

Good point. I used the indirect getattr mapping only to avoid breaking environments where circle_schema does not define MX tensor types yet. In that case, non-MX dtype conversion can still work and MX dtype conversion fails only when requested.

However, since this PR is adding MX export support, it is reasonable to require a schema version that already has MXINT8/MXFP4. I agree direct entries in dmap are simpler and clearer. I will update it that way.
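
The trade-off between the two mappings can be sketched roughly as follows. `OldSchema`, `NewSchema`, and both lookup helpers are hypothetical stand-ins for the circle_schema tensor-type enum and the lookup in circle_mapping.py, and the enum values are invented for illustration.

```python
class OldSchema:
    """Hypothetical schema that predates MX tensor types."""
    FLOAT32 = 0
    INT8 = 9


class NewSchema(OldSchema):
    """Hypothetical schema that defines MX tensor types (values invented)."""
    MXINT8 = 100
    MXFP4 = 101


def to_circle_dtype_optional(schema, name: str) -> int:
    """Indirect mapping: MX names are resolved lazily via getattr."""
    dmap = {"float32": schema.FLOAT32, "int8": schema.INT8}
    optional_dtypes = {"mxint8": "MXINT8", "mxfp4": "MXFP4"}
    if name in dmap:
        return dmap[name]
    if name in optional_dtypes:
        value = getattr(schema, optional_dtypes[name], None)
        if value is None:
            # Fails only when an MX dtype is actually requested.
            raise RuntimeError(f"schema does not define {optional_dtypes[name]}")
        return value
    raise RuntimeError(f"unsupported dtype: {name}")


def to_circle_dtype_direct(schema, name: str) -> int:
    """Direct mapping: building dmap already requires the MX attributes."""
    dmap = {
        "float32": schema.FLOAT32,
        "int8": schema.INT8,
        "mxint8": schema.MXINT8,
        "mxfp4": schema.MXFP4,
    }
    if name not in dmap:
        raise RuntimeError(f"unsupported dtype: {name}")
    return dmap[name]
```

With the old schema, the optional variant still converts "float32" and rejects only "mxint8", while the direct variant raises AttributeError for every dtype. That is acceptable once the MX-capable schema is a hard requirement, and the direct table is shorter and easier to read.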

@stamalakhov

Contributor

@stamalakhov

Thanks for the draft. I slightly changed the code that introduces mx_fake_quantize.

Then DecomposeFakeQuantize lowers mx_fake_quantize to quantize_mx -> dequantize_mx, and FoldQuantOps folds that Q-DQ pair into the producer node's QPARAM metadata, just like affine quantization already does. This avoids overloading quantize_mx with two different meanings.

Please feel free to give your opinions.

@mhs4670go
I tried to keep the existing code untouched so that it can be refactored later. I planned to introduce Q-DQ after #760, and I intended to introduce MX serialization in another PR, since it lives in another module. The draft contains all of these features. Should I approve #762?

@mhs4670go

Contributor Author

@stamalakhov

Thanks for the clarification. I understand your intention: #760 introduces the MX Q/DQ stubs first, and the Q-DQ canonicalization and serialization could be layered on top in later PRs. I'm sorry if my comment came across as dismissing your code. That was not my intention at all.

Even though it can be refactored later, my preference is to let #762 supersede #760 if you are comfortable with the scope, because the fake-quant op, Q-DQ decomposition, and folding logic are tightly coupled and are easier to review as one coherent export flow.

From the design perspective, this keeps the MX export path consistent with the affine fake-quant path: fake quant API -> Q-DQ decomposition -> qparam folding into producer metadata -> Circle qmodel export.
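
The decomposition step in this flow relies on fake quantization being numerically the same as a quantize followed by a dequantize. A simplified sketch with one shared power-of-two scale per block illustrates that. It is an MX-style scheme for illustration only: the function names and the 8-bit element range are assumptions, and it does not reproduce the exact MXINT8 encoding or TICO's kernels.

```python
import math
from typing import List, Tuple

QMAX = 127  # largest magnitude of a signed 8-bit element (assumed range)


def quantize_block(xs: List[float]) -> Tuple[int, List[int]]:
    """Return (shared exponent, integer elements) for one block."""
    max_abs = max(abs(x) for x in xs)
    if max_abs == 0.0:
        return 0, [0] * len(xs)
    # Smallest power-of-two scale that keeps max_abs / scale within QMAX.
    exp = math.ceil(math.log2(max_abs / QMAX))
    scale = 2.0 ** exp
    return exp, [max(-QMAX, min(QMAX, round(x / scale))) for x in xs]


def dequantize_block(exp: int, qs: List[int]) -> List[float]:
    """Map integer elements back to floats using the shared exponent."""
    return [q * 2.0 ** exp for q in qs]


def fake_quantize_block(xs: List[float]) -> List[float]:
    """Fake quant is quantize followed by dequantize on the same block."""
    return dequantize_block(*quantize_block(xs))


block = [0.5, -1.0, 0.3, 2.0]
print(quantize_block(block))       # (-5, [16, -32, 10, 64])
print(fake_quantize_block(block))  # [0.5, -1.0, 0.3125, 2.0]
```

Because the fake-quant result is exactly the Q-DQ result, replacing the frontend op with the pair does not change the values seen by downstream nodes.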

If you're okay with it, I'd prefer to proceed directly on top of the refactored code, even if it may require a bit of extra work.

@stamalakhov

Contributor

If you're okay with it, I'd prefer to proceed directly on top of the refactored code, even if it may require a bit of extra work.

I'm ok with it.

@stamalakhov stamalakhov left a comment

Contributor

LGTM!

@mhs4670go mhs4670go merged commit f3098af into Samsung:main Jun 8, 2026
7 checks passed
@mhs4670go mhs4670go deleted the mx branch June 8, 2026 13:14

2 participants