feat(ppo): Megatron value-model sequence packing + context parallelism by bg51717 · Pull Request #2839 · NVIDIA-NeMo/RL

bg51717 · 2026-06-16T10:05:10Z

What does this PR do ?

Adds sequence packing and context parallelism (CP) support to the Megatron PPO value model, building on the in-model value head from PR #2825. Also fixes a value loss-scaling bug (the per-microbatch value loss was averaged instead of summed).

Issues

Closes #2687 (PR 2 of 2). Stacked on #2825 — the base branch is ppo-mcore-value-head until #2825 merges, after which this retargets to main.

Usage

Context parallelism for the value model requires sequence packing (matching Megatron-Core). Both are configured like the policy:

value:
  sequence_packing:
    enabled: true                    # now supported
  megatron_cfg:
    context_parallel_size: 2         # now supported (requires sequence_packing)
  make_sequence_length_divisible_by: 4   # must be a multiple of 2*CP

Before your PR is "Ready for review"

Pre checks:

Make sure you read and followed Contributor guidelines
Did you write any new necessary tests?
Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Test

Tests run (all pass):

Unit (tests/unit/models/value/test_megatron_value_worker.py, mcore, 2 GPU): packed-unpack (variable-length, CPU), packed train step (incl. PP2), and get_values equivalence across SP / dynamic batching / PP / sequence packing / CP (CP=1 vs CP=2, bf16), plus DP/TP regression. 19 passed.
Functional (tests/functional/ppo_megatron.sh): passes.
Convergence (Qwen2.5-1.5B, GSM8K, 1n8g, 40 steps), train reward / val accuracy — the two nightly value-parallelism recipes:

valuetp2sp-dynbatch (TP2 + SP + dynamic batching; re-validates the loss-scaling fix does not regress convergence): 0.85 / 0.76

valuetp2sp-pp2cp2-pack (TP2 + SP + PP2 + CP2 + sequence packing): 0.84 / 0.72

copy-pr-bot · 2026-06-16T10:05:15Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Signed-off-by: bg51717 <biguo@nvidia.com>

bg51717 · 2026-06-19T12:14:46Z

/ok to test 6cfe536

bg51717 · 2026-06-19T12:33:36Z

/ok to test 6cfe536

bg51717 added 5 commits June 18, 2026 05:49

feat(ppo): Megatron value-model sequence packing + context parallelism

3ccae05

Signed-off-by: bg51717 <biguo@nvidia.com>

test(ppo): add value packed-unpack + packed train-step unit coverage

9124cbd

Signed-off-by: bg51717 <biguo@nvidia.com>

fix(ppo): counteract Megatron loss averaging in value forward step

6e103d0

Signed-off-by: bg51717 <biguo@nvidia.com>

test(ppo): consolidate value nightly recipes into two max-loaded configs

03bc6d9

Signed-off-by: bg51717 <biguo@nvidia.com>

fix(ppo): use packed micro_batch_size in value train forward-backward

b75f7d3

Signed-off-by: bg51717 <biguo@nvidia.com>

bg51717 force-pushed the ppo-mcore-value-seqpack-cp branch from dde8f94 to b75f7d3 Compare June 19, 2026 09:01

cp>1 unit test

6cfe536

Signed-off-by: bg51717 <biguo@nvidia.com>

copy-pr-bot Bot temporarily deployed to public June 19, 2026 12:15 Inactive

copy-pr-bot Bot temporarily deployed to public June 19, 2026 12:18 Inactive

bg51717 added the CI:L1 Run doctests, unit tests, and functional tests label Jun 19, 2026

copy-pr-bot Bot temporarily deployed to test June 19, 2026 12:37 Inactive

bg51717 marked this pull request as ready for review June 19, 2026 15:32

bg51717 requested review from a team as code owners June 19, 2026 15:32

bg51717 requested a review from yuki-97 June 19, 2026 15:32

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(ppo): Megatron value-model sequence packing + context parallelism#2839

feat(ppo): Megatron value-model sequence packing + context parallelism#2839
bg51717 wants to merge 6 commits into
NVIDIA-NeMo:mainfrom
bg51717:ppo-mcore-value-seqpack-cp

bg51717 commented Jun 16, 2026 •

edited

Loading

Uh oh!

copy-pr-bot Bot commented Jun 16, 2026

Uh oh!

bg51717 commented Jun 19, 2026

Uh oh!

bg51717 commented Jun 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

bg51717 commented Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do ?

Issues

Usage

Before your PR is "Ready for review"

Test

Uh oh!

copy-pr-bot Bot commented Jun 16, 2026

Uh oh!

bg51717 commented Jun 19, 2026

Uh oh!

bg51717 commented Jun 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

bg51717 commented Jun 16, 2026 •

edited

Loading