docs(xtoken): X-Token distillation guide and README updates by avenkateshha · Pull Request #2854 · NVIDIA-NeMo/RL

avenkateshha · 2026-06-16T21:50:10Z

What

Documentation-only changes for X-Token (cross-tokenizer off-policy) distillation, split out from #2797 so the docs can be reviewed independently of the multi-teacher code:

- docs/guides/xtoken-off-policy-distillation.md: guide updates (simplified per review, multi-teacher run for results + eval table, and a tokenizer-overlap motivation paragraph + figure).
- README.md: rename the feature to "X-Token Distillation" and add a distillation support matrix.
- docs/assets/: add tokenizer_overlap_matrix.png and xtoken_mt_curves.png; drop the stale xtoken_pkl_smoke_curves.png.

Why a separate PR

These docs incorporate offline review feedback on the (now-merged) #2508. The same changes are also present in the multi-teacher PR #2797; this PR carries only the docs so they can land independently of the code.

🤖 Generated with Claude Code

copy-pr-bot · 2026-06-16T21:50:13Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

- Shorten the H1 to 'Cross-Tokenizer (X-Token)'.
- Trim the future-work note to a generic 'actively improving support'.
- Reframe Step 2 around tokenizer overlap (similar algorithms such as BPE).
- Describe Step 3's sparse [V_student, top_k] projection representation and why it avoids a dense [student_vocab, teacher_vocab] matrix.
- Remove the --preserve_last paragraph; the recommended recipe disables the scale trick, so it never engages.
- Drop the 'via CUDA IPC' qualifier from the P-KL loss-mode row.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Adithya Hanasoge <avenkateshha@nvidia.com>
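For reviewers who want a feel for why the sparse layout matters, here is a rough entry-count comparison. It is only a sketch: the vocabulary sizes and `top_k` value are hypothetical, and it assumes the sparse form stores one index array and one weight array of shape [V_student, top_k], which this PR does not spell out.

```python
# Illustrative sizes only; none of these numbers come from the guide.
STUDENT_VOCAB = 128_000   # hypothetical student vocabulary size
TEACHER_VOCAB = 150_000   # hypothetical teacher vocabulary size
TOP_K = 8                 # hypothetical number of kept entries per student token


def dense_entries(student_vocab: int, teacher_vocab: int) -> int:
    """Entries in a dense [student_vocab, teacher_vocab] projection matrix."""
    return student_vocab * teacher_vocab


def sparse_entries(student_vocab: int, top_k: int) -> int:
    """Entries in a sparse [student_vocab, top_k] layout.

    Assumes two arrays of that shape: one of indices and one of weights.
    """
    return 2 * student_vocab * top_k


dense = dense_entries(STUDENT_VOCAB, TEACHER_VOCAB)
sparse = sparse_entries(STUDENT_VOCAB, TOP_K)
print(f"dense:  {dense:,} entries")
print(f"sparse: {sparse:,} entries")
print(f"ratio:  {dense / sparse:,.0f}x")
```

The dense count grows with the product of the two vocabulary sizes, while the sparse count grows only with the student vocabulary size times `top_k`.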

- Drop 'Off-Policy' from the X-Token section/subsection headings and the feature-list entry; update the matching table-of-contents anchors.
- Reword the section's first line to 'distillation' (no 'off-policy').
- Add a link to the x-token distillation paper alongside the implementation guide in the 'read about the details' line.

Real file paths (run_xtoken_off_policy_distillation.py, xtoken_off_policy_distillation.yaml, the guide filename) are left unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Adithya Hanasoge <avenkateshha@nvidia.com>

Add a paragraph and figure on pairwise tokenizer vocabulary overlap (intersection over min vocab size) to motivate why the cross-tokenizer projection is necessary.

Signed-off-by: Adithya Hanasoge <avenkateshha@nvidia.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
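As a concrete reading of "intersection over min vocab size", the metric can be sketched as below. The function names and the toy vocabularies are hypothetical and are not taken from the guide or the repository; with real tokenizers one would presumably pass the set of token strings from each vocabulary.

```python
from itertools import combinations


def vocab_overlap(vocab_a: set[str], vocab_b: set[str]) -> float:
    """Shared tokens divided by the size of the smaller vocabulary."""
    if not vocab_a or not vocab_b:
        return 0.0
    return len(vocab_a & vocab_b) / min(len(vocab_a), len(vocab_b))


def overlap_matrix(vocabs: dict[str, set[str]]) -> dict[tuple[str, str], float]:
    """Overlap for every unordered pair of named vocabularies."""
    return {
        (a, b): vocab_overlap(vocabs[a], vocabs[b])
        for a, b in combinations(vocabs, 2)
    }


# Toy vocabularies, for illustration only.
toy_vocabs = {
    "tok_a": {"the", "cat", "sat", "ing"},
    "tok_b": {"the", "cat", "on", "mat", "##ing"},
    "tok_c": {"dog", "ran"},
}

for pair, score in overlap_matrix(toy_vocabs).items():
    print(pair, score)
```

Dividing by the smaller vocabulary means a small vocabulary fully contained in a larger one scores 1.0, so the metric reflects how much of the smaller tokenizer is covered rather than penalizing the size difference.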

Add a Distillation overview table comparing the MOPD (on-policy, Megatron) and xToken (off-policy, DTensor V2) recipes across multi-teacher, async, policy, loss, tokenizer, and backend, preceding the recipe sections.

Signed-off-by: Adithya Hanasoge <avenkateshha@nvidia.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

avenkateshha requested review from a team as code owners June 16, 2026 21:50

github-actions Bot added the Documentation Improvements or additions to documentation label Jun 16, 2026

avenkateshha and others added 3 commits June 16, 2026 15:04

avenkateshha force-pushed the avenkateshha/xtoken-docs branch from 6b0c4a0 to 439d6dd Compare June 16, 2026 22:05

avenkateshha wants to merge 4 commits into main from avenkateshha/xtoken-docs
