Canonicalize MX fake quant export through Q-DQ #762
Introduce a separate MX fake-quant frontend op and lower it to a logical quantize_mx/dequantize_mx pair before Circle export.
TICO-DCO-1.0-Signed-off-by: seongwoo <mhs4670go@naver.com>
Thanks for the draft. I slightly changed the code that introduces [...]. Please feel free to share your opinions.
def CircleMXFakeQuantize():
    """Register the eager MX fake-quantization custom operator."""

    @custom_op("circle_custom::mx_fake_quantize", mutates_args=())
    # TODO Add more dtypes
}

optional_dtypes = {
👍
Although I'm not sure why these:
"mxint8": "MXINT8",
"mxfp4": "MXFP4",
cannot be inserted directly into dmap.
Good point. I used the indirect getattr mapping only to avoid breaking environments where circle_schema does not define MX tensor types yet. In that case, non-MX dtype conversion can still work and MX dtype conversion fails only when requested.
However, since this PR is adding MX export support, it is reasonable to require a schema version that already has MXINT8/MXFP4. I agree direct entries in dmap are simpler and clearer. I will update it that way.
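The trade-off discussed above can be sketched as follows. This is an illustration only, not the code in this PR: `old_schema` and `new_schema` are hypothetical stand-ins for a schema's tensor-type enum, their numeric values are arbitrary, and both builder functions are assumed names.

```python
from types import SimpleNamespace

# Hypothetical stand-ins for a schema's tensor-type enum (values are arbitrary).
old_schema = SimpleNamespace(FLOAT32=0, INT8=9)
new_schema = SimpleNamespace(FLOAT32=0, INT8=9, MXINT8=100, MXFP4=101)


def build_dmap_optional(tensor_type):
    """Indirect mapping: MX entries are added only if the schema defines them."""
    dmap = {"float32": tensor_type.FLOAT32, "int8": tensor_type.INT8}
    optional_dtypes = {"mxint8": "MXINT8", "mxfp4": "MXFP4"}
    for key, attr in optional_dtypes.items():
        value = getattr(tensor_type, attr, None)
        if value is not None:
            dmap[key] = value
    return dmap


def build_dmap_direct(tensor_type):
    """Direct mapping: simpler, but requires a schema that defines the MX types."""
    return {
        "float32": tensor_type.FLOAT32,
        "int8": tensor_type.INT8,
        "mxint8": tensor_type.MXINT8,
        "mxfp4": tensor_type.MXFP4,
    }
```

With the indirect form, a lookup of "mxint8" against the older schema fails only at the point where an MX dtype is requested. With the direct form, building the map itself raises `AttributeError` on the older schema, which is acceptable once the newer schema is a hard requirement.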
@mhs4670go
Thanks for the clarification. I understand your intention: #760 introduces the MX Q/DQ stubs first, and the Q-DQ canonicalization and serialization could be layered on top in later PRs. I'm sorry if my comment came across as dismissing your code. That was not my intention at all. Even though it can be refactored later, my preference was to let #762 supersede #760 if you are comfortable with the scope, because the fake-quant op, Q-DQ decomposition, and folding logic are tightly coupled and are easier to review as one coherent export flow. From the design perspective, this keeps the MX export path consistent with the affine fake-quant path: fake quant API -> Q-DQ decomposition -> qparam folding into producer metadata -> Circle qmodel export. If you're okay with it, I'd prefer to proceed directly on top of the refactored code, even if it may require a bit of extra work.
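The Q-DQ decomposition step in that flow can be sketched as follows, as a rough illustration of why a fake-quant op can be rewritten as a quantize/dequantize pair. This is not the PR's implementation: it is a simplified MXINT8-like scheme on a plain Python list, assuming one shared power-of-two exponent per block and 8-bit elements with 6 fractional bits. The function names mirror the quantize_mx/dequantize_mx pair from the PR description, but their signatures and bodies here are assumptions.

```python
import math


def quantize_mx(block):
    """Return (shared_exp, int8 elements) for one block of floats."""
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return 0, [0] * len(block)
    # frexp gives amax = f * 2**e with 0.5 <= f < 1, so floor(log2(amax)) = e - 1.
    shared_exp = math.frexp(amax)[1] - 1
    scale = 2.0 ** shared_exp / 64.0  # assumed: 6 fractional bits per element
    q = [max(-127, min(127, round(v / scale))) for v in block]
    return shared_exp, q


def dequantize_mx(shared_exp, q):
    """Map the integer elements back to floats using the shared exponent."""
    scale = 2.0 ** shared_exp / 64.0
    return [e * scale for e in q]


def mx_fake_quantize(block):
    """Fake quantization expressed as the composition dequantize(quantize(x))."""
    return dequantize_mx(*quantize_mx(block))
```

Because the fake-quant result is by construction the output of the Q/DQ pair, an exporter can replace the single op with the pair without changing numerics, and then fold the quantization parameters into the producer's metadata.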
I'm OK with it.
Introduce a separate MX fake-quant frontend op and lower it to a logical quantize_mx/dequantize_mx pair before Circle export.
Related: #436
TICO-DCO-1.0-Signed-off-by: seongwoo <mhs4670go@naver.com>