mtmd: add unlimited-ocr (converter, full MHA) #24969

Merged: ngxson merged 1 commit into ggml-org:master from sfallah:sf/unlimited-ocr on Jun 24, 2026

sfallah (Contributor) commented Jun 24, 2026

Overview

  • Adds baidu/Unlimited-OCR to mtmd: converts and runs the model in llama.cpp.
  • Converter-only, no C++ changes.
  • Registers UnlimitedOCRForCausalLM (text + mmproj). Writes the decoder sliding_window
    to the GGUF as metadata; the decoder runs full MHA and ignores it for now.
  • Parity test: single-page output compared against the HF reference (transformers 4.46.3) with full MHA; the results match within tolerance.

Scope: conversion and full-MHA inference. A follow-up PR (stacked on this one) implements R-SWA in the decoder.
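
For readers who want to repeat a parity check like the one listed in the overview, here is a minimal sketch in Python. The PR does not say which quantity was compared (logits, embeddings, or decoded text) or which tolerance was used, so the function names `max_abs_diff` and `within_tolerance` and the default `atol`/`rtol` values are hypothetical placeholders, not the author's test.

```python
import math


def max_abs_diff(reference, candidate):
    """Largest element-wise absolute difference between two equal-length sequences."""
    if len(reference) != len(candidate):
        raise ValueError("reference and candidate must have the same length")
    return max((abs(r - c) for r, c in zip(reference, candidate)), default=0.0)


def within_tolerance(reference, candidate, atol=1e-2, rtol=1e-2):
    """True if every candidate value is close to its reference value.

    atol and rtol are illustrative defaults (assumption), not values from the PR.
    """
    if len(reference) != len(candidate):
        return False
    return all(
        math.isclose(r, c, rel_tol=rtol, abs_tol=atol)
        for r, c in zip(reference, candidate)
    )
```

In practice the two sequences would be flattened outputs dumped from the HF reference run and from the llama.cpp run on the same page; reporting `max_abs_diff` alongside the pass/fail result makes it easier to see how close a failing run was.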

How to run

GGUF models: sabafallah/Unlimited-OCR-GGUF

build/bin/llama-mtmd-cli -hf sabafallah/Unlimited-OCR-GGUF:bf16 \
  --image tools/mtmd/test-1.jpeg -p "document parsing." \
  --chat-template deepseek-ocr \
  --temp 0 --flash-attn off --no-warmup \
  -n 4096 -c 16384 \
  --dry-multiplier 0.8 --dry-base 1.75 --dry-allowed-length 2 \
  --dry-penalty-last-n -1 --dry-sequence-breaker none

Additional information

  • Unlimited-OCR is trained with R-SWA (Reference Sliding Window Attention). This PR runs
    the decoder as full MHA.
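
To illustrate what ignoring the window means, the sketch below contrasts a full causal mask with a plain sliding-window causal mask. It is not R-SWA: this PR does not describe the reference mechanism, so only the generic windowing idea is shown. The convention that a window of size `w` covers the current token plus the `w - 1` tokens before it is an assumption.

```python
def causal_mask(n):
    """Full causal attention: token i may attend to every token j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def sliding_window_mask(n, window):
    """Plain sliding-window causal attention (generic SWA, not R-SWA).

    Token i may attend to tokens j with i - window < j <= i,
    i.e. itself and the window - 1 tokens before it (assumed convention).
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    return [[i - window < j <= i for j in range(n)] for i in range(n)]
```

With a window at least as long as the sequence the two masks are identical; they differ only once the sequence grows past the window, which is where a model trained with windowed attention and run with full attention sees context it was not trained on.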

Requirements

  • I have read and agree with the contributing guidelines
  • AI usage disclosure: YES - I used AI assistance for code review, debugging, implementation checks, and testing. I have reviewed the submitted changes and take responsibility for the full contents of this PR.

sfallah requested review from a team and CISC as code owners Jun 24, 2026 11:11
github-actions bot added the examples and python (python script changes) labels Jun 24, 2026
ngxson merged commit 894bb27 into ggml-org:master Jun 24, 2026
5 checks passed
3 participants