fix(data): stabilize multi-turn chat chunking and tokenization by jinglinglingling · Pull Request #2856 · NVIDIA-NeMo/RL

jinglinglingling · 2026-06-17T04:55:05Z

Use overlap-aware chunk extraction and context-aware token slicing in get_formatted_message_log so non-monotonic reasoning templates do not duplicate prior assistant text and sentencepiece tokenizers do not produce leading-space drift on assistant targets.

What does this PR do ?

Add a one line overview of what this PR aims to accomplish.

Issues

#2844
#2821

copy-pr-bot · 2026-06-17T04:55:08Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

jinglinglingling · 2026-06-17T04:56:47Z

/ok to test c716f68

Use overlap-aware chunk extraction and context-aware token slicing in get_formatted_message_log so non-monotonic reasoning templates do not duplicate prior assistant text and sentencepiece tokenizers do not produce leading-space drift on assistant targets. Signed-off-by: Linglin Jing <linglinj@cw-dfw-cs-001-vscode-01.cm.cluster>

Signed-off-by: Linglin Jing <linglinj@cw-dfw-cs-001-vscode-01.cm.cluster>

jinglinglingling · 2026-06-17T05:09:39Z

/ok to test df2f8a2

Signed-off-by: Linglin Jing <linglinj@cw-dfw-cs-001-vscode-01.cm.cluster>

jinglinglingling · 2026-06-17T05:55:29Z

hi @yuki-97, please review this PR for #2844 and #2821 .

jinglinglingling requested review from a team as code owners June 17, 2026 04:55

jinglinglingling added the CI:Lfast Runs a fast test suite and re-use nightly `main` container (but sync dependencies to PRs version) label Jun 17, 2026

copy-pr-bot Bot temporarily deployed to public June 17, 2026 04:57 Inactive

copy-pr-bot Bot temporarily deployed to public June 17, 2026 05:00 Inactive

Linglin Jing added 2 commits June 16, 2026 22:06

style(data): trim verbose comments in message log fix

df2f8a2

Signed-off-by: Linglin Jing <linglinj@cw-dfw-cs-001-vscode-01.cm.cluster>

jinglinglingling force-pushed the fix/issue-2821-2844-message-log-tokenization-main branch from aeb1875 to df2f8a2 Compare June 17, 2026 05:08

copy-pr-bot Bot temporarily deployed to public June 17, 2026 05:09 Inactive

copy-pr-bot Bot temporarily deployed to public June 17, 2026 05:10 Inactive

copy-pr-bot Bot temporarily deployed to public June 17, 2026 05:14 Inactive

jinglinglingling force-pushed the fix/issue-2821-2844-message-log-tokenization-main branch from eb07166 to df2f8a2 Compare June 17, 2026 05:20

fix(tests): import cast in llm message utils test

6431256

Signed-off-by: Linglin Jing <linglinj@cw-dfw-cs-001-vscode-01.cm.cluster>

jinglinglingling requested a review from yuki-97 June 17, 2026 05:54

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(data): stabilize multi-turn chat chunking and tokenization#2856

fix(data): stabilize multi-turn chat chunking and tokenization#2856
jinglinglingling wants to merge 3 commits into
mainfrom
fix/issue-2821-2844-message-log-tokenization-main

jinglinglingling commented Jun 17, 2026 •

edited

Loading

Uh oh!

copy-pr-bot Bot commented Jun 17, 2026

Uh oh!

jinglinglingling commented Jun 17, 2026

Uh oh!

jinglinglingling commented Jun 17, 2026

Uh oh!

jinglinglingling commented Jun 17, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

jinglinglingling commented Jun 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do ?

Issues

Uh oh!

copy-pr-bot Bot commented Jun 17, 2026

Uh oh!

jinglinglingling commented Jun 17, 2026

Uh oh!

jinglinglingling commented Jun 17, 2026

Uh oh!

jinglinglingling commented Jun 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

jinglinglingling commented Jun 17, 2026 •

edited

Loading

jinglinglingling commented Jun 17, 2026 •

edited

Loading