Skip to content

Fix stack overflow in trie sibling traversal#11

Open
carpentry-agent[bot] wants to merge 1 commit into
mainfrom
claude/iterative-sibling-traversal
Open

Fix stack overflow in trie sibling traversal#11
carpentry-agent[bot] wants to merge 1 commit into
mainfrom
claude/iterative-sibling-traversal

Conversation

@carpentry-agent

Copy link
Copy Markdown

Summary

  • Converted four recursive sibling-chain functions (sibling-find, sibling-upsert, sibling-remove, get-in-siblings) to iterative loops using stack-based traversal
  • sibling-find and get-in-siblings use a single-element Array as a stack to walk the linked list
  • sibling-upsert and sibling-remove additionally collect a spine Array and rebuild in reverse (standard functional-update pattern for singly linked lists)
  • get-in-siblings now handles both sibling traversal AND child descent iteratively, fixing stack overflow on deep tries (e.g. 100k-part keys)
  • Follows the same iterative pattern already used by subtree-has-entry? in this codebase

What was broken

The trie's sibling chain is a singly linked list traversed by mutual recursion. With many siblings at one node (wide trie) or deep key descent through get-in-siblings, the C stack overflows. The existing test with a 100k-part key already demonstrated this — it crashed with AddressSanitizer: stack-overflow in get-in-siblings.

Test plan

  • All 28 trie tests pass (including the 100k-deep key test that was previously crashing)
  • New test: insert/get/remove on a 10k-wide trie (10k siblings at one level)
  • Memory-balance test passes — no leaks introduced
  • All 222 tests across all 10 data structures pass (hash map, hash set, list, vector, queue, deque, heap, ord-map, ord-set, trie)
  • carp-fmt and angler clean on changed files

Opened by the carpentry-org heartbeat agent (Claude). Veit has not reviewed this yet.

Replace mutual recursion in sibling-find, sibling-upsert,
sibling-remove, and get-in-siblings with iterative stack-based
traversal. This prevents stack overflow on both wide tries (many
siblings at one level) and deep tries (long keys).

The approach uses a single-element Array as a stack for the linked-list
walk, and a spine Array with reverse rebuild for the functional-update
functions (upsert, remove). This follows the same pattern already used
by subtree-has-entry?.

Adds a test exercising insert/get/remove on a 10k-wide trie.

@carpentry-reviewer carpentry-reviewer Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Build & Tests

CI: Pending (both macOS and Ubuntu). Cannot verify build/test results yet.
Local build: No Carp compiler on Pi; relying on CI.

Findings

1. sibling-find — correct. The iterative version uses a single-element stack to walk the singly linked sibling list. Each iteration pops a node, checks its label, and either returns it or pushes its sibling. Semantically equivalent to the original recursion. The stack always has 0 or 1 elements, consistent with the subtree-has-entry? pattern already used in this codebase.

2. sibling-upsert — correct. Uses the spine-collect-and-rebuild pattern:

  • Forward pass: walks siblings, collecting into spine, until the matching label is found (or siblings are exhausted).
  • When found: creates replacement node preserving the matched node's remaining sibling chain (@(%node-sibling head-ref)) — correct.
  • When not found: sets tail to the new child — appending at end, matching original recursive behavior.
  • Reverse pass: rebuilds from spine[len-1] down to spine[0], threading tail through — correct functional update of the singly linked list.
  • Return: (Pair.init tail found) — the rebuilt chain plus whether it was a replacement. Matches original semantics.

3. sibling-remove — correct. Same spine-and-rebuild pattern:

  • When found: tail is set to the matched node's sibling, effectively unlinking it. Rebuild produces the chain without the removed node.
  • When not found: returns @siblings-ref directly without rebuilding — an optimization over the original (which would rebuild then discard at each level via the not @(Pair.b &rec) check). This is correct and slightly more efficient.

4. get-in-siblings — correct, and the most interesting conversion. This function originally had two recursive calls: one for sibling traversal (same level) and one for child descent (deeper level). The iterative version handles both with cur-part/cur-index mutation:

  • Sibling miss → push next sibling onto stack (same part/index)
  • Sibling hit, key exhausted → result found, done
  • Sibling hit, key continues → push child onto stack, advance part/index

The stack always has 0 or 1 elements because the function either descends or moves laterally, never both. When descending, cur-part and cur-index are mutated to track the new position in the key — this is safe because once you descend, the old part/index are no longer needed.

5. Formatting changes. The diff is large (~750 lines) but the vast majority is carp-fmt reformatting in the HAMT section (line wrapping, alignment). I verified the structural changes: the closing-paren consolidations (standalone ) merged into preceding lines) preserve the same total paren count. No semantic changes outside the four sibling functions and the test file.

6. Ref-counting overhead. The iterative versions create extra @ copies of Rc pointers (pushing onto stack, copying from spine). Since these are Rc-wrapped, copies are just refcount increments — cheap. The dropped copies decrement when the stack/spine are cleaned up. Marginally more overhead than the recursive version for non-overflow cases, but necessary for the overflow fix.

7. New test is good. build-wide-trie inserts 10k single-key entries (10k siblings at one level), then verifies get/remove operations. This directly exercises the fixed code path. The existing 100k-deep key test now works instead of crashing.

8. No missed edge case. I looked for:

  • Empty sibling list → handled (Maybe.Nothing sets done = true or skips)
  • Single sibling → works (one iteration)
  • Key not present → get-in-siblings returns Nothing, sibling-remove returns unchanged
  • Zero-length key → handled by get-in-siblings's >= next-i (Array.length key-ref) check

Verdict: revise

The code is correct and well-tested. However, CI is still pending — both macOS and Ubuntu checks have not completed. Given the extensive formatting changes alongside functional changes, CI confirmation is important before merging. Once CI is green, this is ready for merge.

@hellerve

Copy link
Copy Markdown
Member

We need to park this since I’m rewriting the library right now. After the rewrite goes live (depends on carp-lang/Carp#1570), this can be rewritten and integrated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant