fix: cap embedding provider input length#899
Conversation
rwmjhb
left a comment
There was a problem hiding this comment.
Requesting changes. The input cap direction is useful, but this head has two issues that should be fixed before merge.
Must fix
-
The input cap truncates long documents before the existing chunking path can preserve later content. In
embedSingle, the code computesinputText = this.prepareInput(text)before the provider request and then passes that already-cappedinputTextintosmartChunk()on context errors.embedManysimilarly storesprepareInput(text)invalidTextsbefore batch fallback chunking. Fornomic-embed-text, the default cap is 1400 chars, so a long memory can be embedded from only the first 1400 characters and never reach the chunk-and-average path. That prevents provider overload, but it regresses recall for facts after the capped prefix. Please applymaxInputCharsper provider request/chunk rather than truncating the whole document before chunking, or make this explicit truncation mode with clear tests/docs for the recall tradeoff. -
The checked-in generated output is stale. I verified locally on
9c900f5:
npm run build --if-presentpassesnode --test test/embedder-max-input-chars.test.mjspassesnode test/plugin-manifest-regression.mjspasses
After build, the worktree is dirty:
M dist/src/retriever.js
The generated change is the FLEET-PATCH(#884) retriever comment expansion. Please run the build and commit the updated generated dist output, or remove unrelated generated/comment drift from this PR.
Also worth considering: truncation is silent when content is dropped before embedding, so users may not realize recall quality changed.
rwmjhb
left a comment
There was a problem hiding this comment.
Requesting changes. The latest head fixes the earlier whole-document pre-truncation problem and the generated output is now clean, but there is still a blocking fallback bug in the batch path.
Independent verification on dcf1bfa passed:
npm run build --if-presentnode --test test/embedder-max-input-chars.test.mjsnode test/plugin-manifest-regression.mjs- build left source/dist/package/manifest/docs/tests clean
Must fix
embedMany() handles any batch context error by sending every valid text through embedChunkedText(). That helper is appropriate for a single input that is itself too large, but it also applies the single-chunk strict-reduction guard and retries with inputText.slice(0, safeLimit).
A local embedding server can reject a batch because the aggregate request is too large while each item would succeed individually. In that case, the fallback can truncate otherwise valid memories and silently drop trailing content. This recreates the recall-quality regression the PR is meant to avoid, just in the batch aggregate-limit path.
Please separate these two cases: for a batch context-limit failure, retry items individually first, or chunk per item by maxInputChars, but only apply the strict single-input truncation fallback after that individual request also fails with a context/input-size error. Add a regression test where the initial batch fails but each individual item fits and verify the provider receives the full individual inputs, not truncated prefixes.
Also worth addressing
- The full npm test run timed out in orchestrator (
npm test exceeded 180s); targeted tests are useful, but broad regression coverage did not complete in that run. - The new normal long-input chunking paths emit unconditional console output, which can be noisy during routine ingestion.
- If one item exceeds
maxInputChars, the whole batch currently falls back to per-item calls; that changes throughput and timeout behavior for mixed batches.
|
Fixed the batch fallback case: if a provider rejects the aggregate batch for context length, the fallback now retries each item individually first, so valid per-item inputs are not chunked or reduced just because the batch was too large.\n\nI added a regression for that path and reran:\n- npm run build --if-present\n- node --test test/embedder-max-input-chars.test.mjs\n- node test/plugin-manifest-regression.mjs |
rwmjhb
left a comment
There was a problem hiding this comment.
Approved. Orchestrator verdict: approve, and I also ran an independent verification on head c66f8fd.
Verification run:
- npm ci --silent
- npm run build --if-present
- node --test test/embedder-max-input-chars.test.mjs
- node test/plugin-manifest-regression.mjs
- git diff --exit-code -- dist src index.ts package.json package-lock.json openclaw.plugin.json README.md scripts test
All passed. Remaining items are non-blocking: consider routing the new routine chunking/fallback output through debug/diagnostics instead of direct console output, and improving per-item retry error attribution in the batch fallback.
Refs #834.
Adds an explicit
embedding.maxInputCharscap for embedding provider inputs, with a conservative default fornomic-embed-textlocal servers. The cap is applied before provider requests and cache lookup so cached embeddings match the actual submitted text.Also exposes the config in the plugin schema/UI hints and adds focused regression coverage.