[Iluvatar] Support CINN for PaddleOCR-VL by converting max_seqlens to Tensor inputs#7997

Open
wuyujiji wants to merge 1 commit into PaddlePaddle:develop from wuyujiji:yuzhe_dev

Conversation

wuyujiji (Contributor) commented Jun 4, 2026

Motivation

Enable CINN for PaddleOCR-VL on Iluvatar (天数智芯) hardware.

Modifications

N/A

Usage or Command

N/A

Accuracy Tests

N/A

Checklist

  • Add at least a tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code, run pre-commit before commit.
  • Add unit tests. Please write the reason in this PR if no unit tests.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

codecov-commenter commented Jun 4, 2026

Codecov Report

❌ Patch coverage is 0% with 17 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@4ba6625).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| fastdeploy/worker/iluvatar_model_runner.py | 0.00% | 9 Missing ⚠️ |
| ...eploy/model_executor/ops/iluvatar/attention_ops.py | 0.00% | 4 Missing ⚠️ |
| ...rs/backends/iluvatar/attention/mha_attn_backend.py | 0.00% | 2 Missing ⚠️ |
| fastdeploy/worker/iluvatar_worker.py | 0.00% | 2 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #7997   +/-   ##
==========================================
  Coverage           ?   66.53%           
==========================================
  Files              ?      475           
  Lines              ?    66679           
  Branches           ?    10289           
==========================================
  Hits               ?    44363           
  Misses             ?    19477           
  Partials           ?     2839           
| Flag | Coverage Δ |
|---|---|
| GPU | 76.38% <ø> (?) |
| XPU | 6.97% <0.00%> (?) |

Flags with carried forward coverage won't be shown.

PaddlePaddle-bot commented Jun 6, 2026

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-06-16 15:35:16

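CI report generated from the following code (updated every 30 minutes):
PR commit: e8dd413 | Merge base: 4ba6625 (branch: develop)


1 Required jobs: 10/10 passed

| Total runs (reruns) | Total jobs | ✅ Passed | ❌ Failed | ⏳ Running | ⏸️ Waiting | Skipped |
|---|---|---|---|---|---|---|
| 42(0) | 42 | 39 | 2 | 1 | 0 | 0 |

2 Failure details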

@wuyujiji wuyujiji changed the title [Iluvatar] Support CINN for paddleocr-vl [Iluvatar] Support CINN for PaddleOCR-VL by converting max_seqlens to Tensor inputs Jun 11, 2026
wuyujiji (Contributor, Author)

/re-run all-failed

wuyujiji (Contributor, Author)

/re-run all-failed

PaddlePaddle-bot left a comment

🤖 Paddle-CI-Agent | pr_review | 2026-06-16 16:31:19 Asia/Shanghai

📋 Review summary

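PR overview: changes `max_seqlens_q/k` of the Iluvatar `cuinfer_flash_attn_unpadded` op from attrs to Tensor inputs, and updates the PaddleOCR-VL CINN/CUDAGraph path, the Iluvatar worker, and the docs/CI script configuration to match.
Scope of changes: Iluvatar custom op, attention backend/model runner, default KV cache memory, Iluvatar installation docs, and the CI script.
Affected tags: [OP] [Iluvatar] [Graph Optimization] [CI] [Docs]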
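Why the attr-to-Tensor move matters can be illustrated with a toy model. It assumes, as a simplification, that a graph compiler such as CINN treats an op attr as a compile-time constant (so the attr value becomes part of the compiled kernel's identity), while a Tensor input is only read at run time. `ToyGraphCompiler` is hypothetical and is not how CINN is implemented internally; it only shows why a per-batch `max_seqlens_q` passed as an attr keeps triggering recompilation, whereas the same value passed as an input reuses a single compiled kernel.

```python
class ToyGraphCompiler:
    """Toy compiler (hypothetical): attrs are baked into each compiled kernel, inputs are not."""

    def __init__(self):
        self._cache = {}
        self.num_compilations = 0

    def __call__(self, attrs, inputs):
        key = tuple(sorted(attrs.items()))  # attr values are part of the cache key
        kernel = self._cache.get(key)
        if kernel is None:
            self.num_compilations += 1
            kernel = self._compile(dict(attrs))
            self._cache[key] = kernel
        return kernel(inputs)

    @staticmethod
    def _compile(attrs):
        def kernel(inputs):
            if "max_seqlens_q" in attrs:
                return attrs["max_seqlens_q"]  # frozen at compile time
            return inputs["max_seqlens_q"][0]  # read at run time
        return kernel


batch_max_lens = [7, 12, 5, 12]  # per-batch longest sequence (made-up values)

as_attr = ToyGraphCompiler()
for n in batch_max_lens:
    as_attr({"max_seqlens_q": n}, {})

as_input = ToyGraphCompiler()
for n in batch_max_lens:
    as_input({}, {"max_seqlens_q": [n]})

print(as_attr.num_compilations, as_input.num_compilations)  # 3 1
```

In the PR itself the change is made on the C++ side, by moving `max_seqlens_q/k` from `.Attrs` to `.Inputs` in `PD_BUILD_STATIC_OP`.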

Issues

No new blocking code issues found. PR-convention issues are reported in the section below and are not repeated here.

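Status of previous findings

| Finding | Issue | Status |
|---|---|---|
| F1 | No empty-tensor / `numel` check before dereferencing the `max_seqlens_q/k` Tensor | ⚠️ Still present |
| F2 | New `paddle.to_tensor` call in the attention hot path | ⚠️ Still present |
| F3 | No dtype check on the `max_seqlens_q/k` Tensor | ⚠️ Still present |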
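A minimal sketch of what findings F1 to F3 ask for, using only the Python standard library. It assumes that `max_seqlens_q/k` holds the length of the longest sequence in a packed (unpadded) batch, derived from cumulative offsets `cu_seqlens` as in variable-length flash attention, and that the value lives in a one-element int32 buffer. `array.array` stands in for a Paddle Tensor, and `MaxSeqlenBuffers`, `max_seqlen_from_cu_seqlens`, and `read_max_seqlen` are hypothetical names, not FastDeploy or Paddle APIs.

```python
from array import array


def max_seqlen_from_cu_seqlens(cu_seqlens):
    """Longest sequence in a packed batch, given offsets [0, l0, l0 + l1, ...]."""
    if len(cu_seqlens) < 2:
        return 0  # no sequences in the batch
    return max(end - start for start, end in zip(cu_seqlens, cu_seqlens[1:]))


def read_max_seqlen(buf, name):
    """Validate before dereferencing, mirroring F1 (numel) and F3 (dtype)."""
    if len(buf) == 0:
        raise ValueError(f"{name} must have numel >= 1, got 0")
    if buf.typecode != "i" or buf.itemsize != 4:
        raise TypeError(f"{name} must be int32, got typecode {buf.typecode!r}")
    return buf[0]


class MaxSeqlenBuffers:
    """Allocated once and updated in place every step, instead of creating
    a new tensor on the attention hot path (F2)."""

    def __init__(self):
        self.max_seqlens_q = array("i", [0])
        self.max_seqlens_k = array("i", [0])

    def update(self, cu_seqlens_q, cu_seqlens_k):
        # Overwrite element 0; the buffer object and its memory stay the same.
        self.max_seqlens_q[0] = max_seqlen_from_cu_seqlens(cu_seqlens_q)
        self.max_seqlens_k[0] = max_seqlen_from_cu_seqlens(cu_seqlens_k)


bufs = MaxSeqlenBuffers()
bufs.update([0, 3, 10, 12], [0, 3, 10, 12])
print(read_max_seqlen(bufs.max_seqlens_q, "max_seqlens_q"))  # 7
```

Keeping the buffer fixed and updating it in place also matters if the step is captured with CUDAGraph, since replay reuses the memory addresses recorded at capture time. In the real op, the F1/F3 checks would sit in the C++ kernel before the value is read.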

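📝 PR convention check

The Modifications, Usage or Command, and Accuracy Tests sections are all filled in with "Pass" and contain no substantive content. Please complete them according to the template.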

Suggested title (ready to copy):

  • [Iluvatar] Support CINN for PaddleOCR-VL by converting max_seqlens to Tensor inputs
Suggested PR description (ready to copy):
## Motivation
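On the Iluvatar (天数智芯) platform, the `cuinfer_flash_attn_unpadded` op previously registered `max_seqlens_q/k` as scalar attrs, which prevented CINN from handling dynamic sequence lengths. This PR changes them to Tensor inputs so that PaddleOCR-VL can run with CINN enabled (`graph_opt_level: 2`) on Iluvatar hardware.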

## Modifications
- `custom_ops/iluvatar_ops/flash_attn_unpadded.cu`:
  - `FlashAttnUnpaddedKernel` / `FlashAttnUnpadded` signatures: `int max_seqlens_q/k` → `const paddle::Tensor& max_seqlens_q_/k_`
  - `PD_BUILD_STATIC_OP`: move `max_seqlens_q/k` from `.Attrs` to `.Inputs`
  - `FlashAttnUnpaddedInferShape` / `FlashAttnUnpaddedInferDtype`: add the corresponding parameters
- `custom_ops/setup_ops.py`: append `-std=c++17` to the Iluvatar compile flags
- `docs/`: update the container name, mount paths, and launch command arguments (`max-num-seqs: 240`, `gpu-memory-utilization: 0.7`, `graph_opt_level: 2`)
- `scripts/run_ci_iluvatar.sh`: update `graph-optimization-config` in the CI script to match

## Usage or Command
```bash
python3 -m fastdeploy.entrypoints.openai.api_server \
    --model /data1/fastdeploy/PaddleOCR-VL \
    --max-model-len 16384 \
    --max-num-batched-tokens 16384 \
    --max-num-seqs 240 \
    --block-size 16 \
    --workers 2 \
    --gpu-memory-utilization 0.7 \
    --graph-optimization-config '{"graph_opt_level":2, "use_cudagraph": true}'
```

## Accuracy Tests
Tested PaddleOCR-VL inference accuracy on Iluvatar hardware; results are consistent with those before CINN was enabled (or attach specific metrics).

## Checklist

- [x] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [x] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

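Overall assessment

In this round, in order of risk, I re-traced the custom op signature migration, the Python call sites, the dynamic-dimension marking in the Iluvatar attention backend, the per-layer KV-heads backend initialization, the KV cache shape allocation, and the CI/requirements linkage. Apart from the previously unresolved findings and the still-incomplete PR description, no new blocking issues were confirmed.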

wuyujiji (Contributor, Author)

/re-run all-failed

Labels

None yet

Projects

None yet

3 participants