Skip to content

graph, store: reject grafts below the base subgraph's earliest_block#6617

Open
rslowinski wants to merge 3 commits into
graphprotocol:masterfrom
rslowinski:upstream-fix-graft-below-prune-floor
Open

graph, store: reject grafts below the base subgraph's earliest_block#6617
rslowinski wants to merge 3 commits into
graphprotocol:masterfrom
rslowinski:upstream-fix-graft-below-prune-floor

Conversation

@rslowinski
Copy link
Copy Markdown

The bug

Graft::validate permits grafting at a block below the base subgraph's earliest_block_number. When that happens, the copy silently produces a destination with reset state for any entity whose live-at-graft version had been pruned from the base. This is most visible on heavily-updated mutable singletons: every sub-graft-block version of such an entity is a closed historical version, every closed historical version below the prune floor is eligible for pruning, so the live-at-graft version is gone, the copy (lower(block_range) <= graft_block) finds nothing for that entity, and the handler subsequently re-initialises it from defaults. Immutable entities and rarely-updated mutable entities (whose live-at-graft version is still open) are unaffected, so the corruption is selective and easy to miss in spot checks.

A structural fingerprint of an affected deployment: its earliest_block_number is strictly greater than its head/graft block. That state cannot arise from a correctly-performed graft; it's the inversion produced by copy_earliest_block propagating the parent's prune floor into a destination seeded at a block below it.

Reproduction (new tests)

graft_validate_rejects_below_prune_floor and graft_store_path_rejects_below_prune_floor capture the minimal repro against a fresh Postgres in ~1–2s:

  1. Start with three test entities at blocks 0, 1, 2 (the default setup in graft.rs).
  2. Update user 2 at block 3, closing its [1, 3) version (which was live at block 2).
  3. Forward head to block 6, then prune with history_blocks=2earliest_block=4. This removes the closed user 2 [1, 3) and user 3 [1, 2).
  4. Without this fix, grafting at block 2 succeeds, and the destination is missing user 2: its only surviving version [3, ) is excluded by the copy filter lower(block_range) <= 2. With this fix, the graft is rejected with an actionable error naming the prune floor.

Fix (3 commits)

  1. graph, store: validate graft block against base subgraph's earliest_block — adds SubgraphStore::earliest_block_number, uses it in Graft::validate to reject sub-floor grafts at manifest validation time. Makes Graft::validate pub so tooling and tests can validate a graft directly without resolving a full UnvalidatedSubgraphManifest.

  2. store: enforce graft prune-floor invariant in start_subgraph — defence in depth: a pre-copy check in DeploymentStore::start_subgraph catches callers that reach the store path without going through Graft::validate (graphman, custom tooling, test_store::create_subgraph). A post-copy check detects the prune-during-copy race that copy_earliest_block already warns about.

  3. store: take the max of source and destination history_blocks during graft — secondary issue found in the same code path. Graft finalization currently overwrites the destination's history_blocks with the source's value unconditionally, silently downgrading prune: never children to whatever retention the parent used. Use max(src, dst) instead.

Related upstream work

#6384 added the analogous check on the graphman copy admin path. This PR extends the same invariant to the deployment graft path (Graft::validate + start_subgraph). The two PRs are complementary — neither makes the other redundant.

Tests

Three new regression tests in store/test-store/tests/postgres/graft.rs:

  • graft_validate_rejects_below_prune_floor — manifest-level check.
  • graft_store_path_rejects_below_prune_floor — store-level check (also confirms graft at the floor is still accepted).
  • graft_history_blocks_takes_max_of_parent_and_child — secondary fix.

Existing graft, prune, copy, on_sync tests are unchanged and green. cargo fmt --all -- --check, cargo check --workspace --tests, and cargo check --release --workspace all pass on rustc 1.91.

Notes for the reviewer

  • Graft::validate becomes pub. The struct and the validation error type were already public; making the function callable directly from tooling/tests removes the need to resolve a full manifest in those callers. Happy to use a different approach (e.g. a thin pub wrapper, or a test-only re-export) if you'd prefer the function itself to stay crate-private.

  • The history_blocks change to max() is a deliberate behaviour change. The previous overwrite was internally consistent if you assumed destinations should match their source's retention, but it silently overrides an explicit prune: never on the destination. max() honours both constraints simultaneously. Easy to split into its own PR if you'd rather discuss it independently from the prune-floor fix.

  • Does this recover existing subgraphs already corrupted by this bug? No — re-indexing is the remediation. The fix prevents new occurrences only.

…lock

Reject grafts whose block is below the base subgraph's
earliest_block_number. When the base has been pruned past the
requested graft block, the versions of mutable entities that were
live at the graft block have been removed (closed versions with
upper(block_range) <= earliest_block are eligible for pruning).
The copy then reads versions where lower(block_range) <= graft_block
and finds none for those entities, leaving them absent in the
destination; the post-copy re-open (RevertClampQuery) has no
version to re-open. Entities whose live-at-graft version is still
open (never updated again, hence never pruned) and all immutable
entities are unaffected — only heavily-updated mutable singletons
are silently reset to their default state.

Add SubgraphStore::earliest_block_number and use it in
Graft::validate to reject the manifest before deployment
registration. Make Graft::validate `pub` so tooling and tests can
validate a graft directly without resolving a full
UnvalidatedSubgraphManifest.
Add a pre-copy check rejecting graft.block <
src.earliest_block_number, mirroring the manifest-level check in
Graft::validate. Callers that reach DeploymentStore::start_subgraph
directly (graphman, custom tooling, test-store helpers) bypass the
registrar and therefore the manifest validator; the store-level
check guarantees the invariant is enforced for every graft.

Also add a post-copy check that fails the transaction if the
source's earliest_block_number advanced past the graft block while
the copy was running — the prune-during-copy race that
copy_earliest_block already acknowledges. Partial copied data
remains in the destination's tables and must be cleaned up by
deleting and recreating the deployment; document this in the
error message.
…raft

The graft finalization previously overwrote the destination's
history_blocks with the source's value unconditionally. As a
result, a destination subgraph manifesting `prune: never`
(history_blocks = BLOCK_NUMBER_MAX) grafted from a `prune: auto`
source silently inherited the source's much smaller retention
window, contradicting the destination's manifest.

Use max(src, dst) instead. The destination's retention is at
least as long as the source's (otherwise the inherited copied
data would be immediately eligible for pruning) while a
destination that requests longer retention is no longer silently
downgraded.

Also expose `SubgraphStore::history_blocks` mirroring
`set_history_blocks`, so the test can verify the post-graft
retention value.
@rslowinski rslowinski force-pushed the upstream-fix-graft-below-prune-floor branch from 7ee5da8 to e9f7dc3 Compare June 4, 2026 10:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant