
perf(workspace): batch deps-cache invalidation into one workspace fs scan#10445

Open
davidfirst wants to merge 3 commits into master from component-loading-batch-deps-invalidation

Conversation

@davidfirst

Member

Part of the component-loading redesign (scopes/workspace/workspace/component-loading-redesign.md, Phase 2).

The deps fs-cache freshness check ran a recursive globby per component that followed each component's node_modules symlink into the shared workspace node_modules — 226k of 230k scanned entries, run 313× per command. This replaces it with a single node_modules-ignoring workspace scan, memoized as a command-scoped mtime index on FsCache and invalidated through the workspace's existing clear-cache hooks (so watch/start stay correct), with a per-component fallback for entries not in the scan.

Measured on this repo's workspace (~313 components): warm bit status filesystem syscalls 74.3k → 44.8k (~40% fewer); the statFiles sub-step dropped 22.3s → 0.15s aggregate; readFile traffic is unchanged (checked against the bootstrap fs-read e2e metric — no read regression).

Warm wall is ~flat on a fast local SSD: the eliminated work is I/O-wait that overlaps with CPU on the single JS thread, so it was never on the warm critical path. The win lands on cold / CI / networked filesystems, where those ~30k syscalls aren't free. §4.1 of the redesign doc is updated with the full breakdown (this also corrects an earlier "object materialization" misread — deserialize is 9ms).

…scan

The deps fs-cache freshness check ran a recursive globby per component that
followed each component's node_modules symlink into the shared workspace
node_modules (226k of 230k scanned entries), 313x per command.

Replace it with a single node_modules-ignoring workspace scan, memoized as a
command-scoped mtime index on FsCache and invalidated via the workspace's
clear-cache hooks (so watch stays correct), with a per-component fallback.
Cuts warm `bit status` fs syscalls ~40% (74.3k -> 44.8k); read traffic
unchanged. Warm-wall-neutral on fast SSD (I/O-wait overlapping CPU), a real
win on cold/CI/networked filesystems.
@qodo-free-for-open-source-projects

qodo-free-for-open-source-projects Bot commented Jun 23, 2026


Code Review by Qodo

🐞 Bugs (1) 📘 Rule violations (0) 📜 Skill insights (0)



Action required

1. Stuck index build promise ✓ Resolved 🐞 Bug ☼ Reliability
Description
FsCache.getOrBuildComponentsMtimeIndex() only clears componentsMtimeIndexBuilding on success, so
if the build rejects once, the rejected promise is retained and every later caller in the same
process keeps failing. This can break deps-cache reads for long-lived commands (e.g. watch/start)
until restart.
Code

scopes/workspace/modules/fs-cache/fs-cache.ts[R33-45]

+  async getOrBuildComponentsMtimeIndex(build: () => Promise<Map<string, number>>): Promise<Map<string, number>> {
+    if (this.componentsMtimeIndex) return this.componentsMtimeIndex;
+    if (!this.componentsMtimeIndexBuilding) {
+      const gen = this.componentsMtimeIndexGen;
+      this.componentsMtimeIndexBuilding = build().then((index) => {
+        // if the index was cleared while building, don't cache this now-stale result as canonical.
+        if (gen === this.componentsMtimeIndexGen) this.componentsMtimeIndex = index;
+        this.componentsMtimeIndexBuilding = undefined;
+        return index;
+      });
+    }
+    return this.componentsMtimeIndexBuilding;
+  }
Evidence
The memoized promise is only cleared in the success path, so a rejection leaves
componentsMtimeIndexBuilding set forever. The build path can reject because getPathStatIfExist
rethrows non-ENOENT errors and globby(..., { stats: true }) can also reject.

scopes/workspace/modules/fs-cache/fs-cache.ts[33-45]
scopes/toolbox/fs/last-modified/last-modified.ts[35-41]
scopes/toolbox/fs/last-modified/last-modified.ts[82-100]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`FsCache.getOrBuildComponentsMtimeIndex()` memoizes the first build in `componentsMtimeIndexBuilding`, but it only resets that field in the `.then()` success handler. If the build throws/rejects (e.g. a filesystem stat error), the rejected promise remains stored and future calls cannot recover.

### Issue Context
The build function used by deps invalidation (`buildDirsLastModifiedIndex`) performs a large glob + many stats, and can reject on non-ENOENT filesystem errors.

### Fix Focus Areas
- scopes/workspace/modules/fs-cache/fs-cache.ts[33-45]

### Implementation notes
- Ensure `componentsMtimeIndexBuilding` is cleared in a `finally` (or a `.catch()` that rethrows) so a transient build failure doesn’t permanently poison the cache.
- Consider also logging the error at debug/trace level to aid diagnosing glob/stat failures.
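
A minimal sketch of the `finally` variant, for illustration only. `MtimeIndexMemo` and `runBuild` are hypothetical names, not the actual `FsCache` members, and the generation check from the PR is left out to keep the focus on failure recovery.

```typescript
// Hypothetical stand-in for the memoization inside FsCache (names are assumptions).
class MtimeIndexMemo {
  private index?: Map<string, number>;
  private building?: Promise<Map<string, number>>;

  getOrBuild(build: () => Promise<Map<string, number>>): Promise<Map<string, number>> {
    if (this.index) return Promise.resolve(this.index);
    // Concurrent callers share the same in-flight promise.
    if (!this.building) this.building = this.runBuild(build);
    return this.building;
  }

  private async runBuild(build: () => Promise<Map<string, number>>): Promise<Map<string, number>> {
    // Yield once so the caller has stored the promise before the build can settle,
    // even if `build` throws synchronously.
    await Promise.resolve();
    try {
      const index = await build();
      this.index = index;
      return index;
    } finally {
      // Runs on success and on rejection, so a failed build is never retained.
      this.building = undefined;
    }
  }
}
```

With this shape, a rejected build still rejects for the callers that were waiting on it, but the next call starts a fresh build instead of receiving the stored rejection.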




Remediation recommended

2. Fallback rescans node_modules 🐞 Bug ➹ Performance ⭐ New
Description
DependenciesLoader.getComponentLastModified() falls back to getLastModifiedComponentTimestampMs()
when an index entry is missing, but that per-component scan does not ignore node_modules and can
again traverse the component’s node_modules symlink tree. This means watch/single-component
invalidations (or any index-miss) can still pay the old worst-case traversal cost and behave
differently than the shared index path.
Code

scopes/dependencies/dependencies/dependencies-loader/dependencies-loader.ts[R154-174]

+  private async getComponentLastModified(workspace: Workspace, rootDir: string): Promise<number> {
+    let index: Map<string, number> | undefined;
+    try {
+      index = await workspace.consumer.componentFsCache.getOrBuildComponentsMtimeIndex(() =>
+        buildDirsLastModifiedIndex(
+          workspace.path,
+          workspace.consumer.bitMap.getAllComponents().map((componentMap) => componentMap.getComponentDir())
+        )
+      );
+    } catch (err: any) {
+      // a centralized scan failure (e.g. a filesystem error on one dir) shouldn't fail every
+      // component's load — fall back to the per-component scan below, preserving fault isolation.
+      this.logger.debug(`dependencies-loader, failed building the components mtime index: ${err?.message || err}`);
+    }
+    const fromIndex = index?.get(rootDir);
+    if (fromIndex !== undefined) return fromIndex;
+    const filesPaths = this.component.files.map((file) => file.path);
+    filesPaths.push(path.join(workspace.path, rootDir, COMPONENT_CONFIG_FILE_NAME));
+    const lastModified = await getLastModifiedComponentTimestampMs(rootDir, filesPaths);
+    index?.set(rootDir, lastModified);
+    return lastModified;
Evidence
The shared index path explicitly ignores node_modules, but the fallback calls the legacy
per-component timestamp function, whose internal directory scan has the node_modules ignore
commented out; therefore index misses can still traverse node_modules and pay the old cost.

scopes/dependencies/dependencies/dependencies-loader/dependencies-loader.ts[154-175]
scopes/toolbox/fs/last-modified/last-modified.ts[12-19]
scopes/toolbox/fs/last-modified/last-modified.ts[70-89]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`getComponentLastModified()` uses `buildDirsLastModifiedIndex()` (which ignores `node_modules`) for the shared index, but when the entry is missing it falls back to `getLastModifiedComponentTimestampMs()`, whose directory scan currently does **not** ignore `node_modules`. This reintroduces the expensive traversal on fallback and makes behavior inconsistent between index-hit and index-miss cases.

### Issue Context
The intent of the PR is to avoid following component `node_modules` symlinks into the workspace `node_modules` during deps-cache freshness checks.

### Fix Focus Areas
- scopes/dependencies/dependencies/dependencies-loader/dependencies-loader.ts[154-174]
- scopes/toolbox/fs/last-modified/last-modified.ts[12-19]
- scopes/toolbox/fs/last-modified/last-modified.ts[70-89]

### Suggested fix
Update the fallback path to use the same `node_modules`-ignoring logic as the shared index. Options:
1) In `getComponentLastModified()`, when `fromIndex` is missing, compute last-modified via `buildDirsLastModifiedIndex(workspace.path, [rootDir])` (same ignore defaults) and use that value; optionally also stat the component config explicitly if needed.
2) Alternatively, extend `getLastModifiedComponentTimestampMs()` / `getLastModifiedDirTimestampMs()` to accept an `ignore` list (defaulting to current behavior), and call it from the fallback with `['**/node_modules/**']` to match the batched scan semantics.
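
Option 1 could look roughly like the sketch below. The builder is passed in as a parameter so the example stays self-contained; its shape (`workspacePath`, list of dirs, resolves to a `Map<string, number>`) is inferred from the call in the diff above, and `lastModifiedWithFallback` is a hypothetical name. Returning `0` when the rescan yields no entry is an assumption, not documented behavior.

```typescript
// Assumed shape of the index builder, inferred from how the diff calls it.
type BuildIndex = (workspacePath: string, dirs: string[]) => Promise<Map<string, number>>;

// Hypothetical helper: resolve one component's last-modified time, rescanning
// only that component on an index miss through the same builder as the shared scan.
async function lastModifiedWithFallback(
  index: Map<string, number> | undefined,
  workspacePath: string,
  rootDir: string,
  buildIndex: BuildIndex
): Promise<number> {
  const fromIndex = index?.get(rootDir);
  if (fromIndex !== undefined) return fromIndex;
  // Same builder, so the same ignore rules apply on the fallback path.
  const single = await buildIndex(workspacePath, [rootDir]);
  const lastModified = single.get(rootDir) ?? 0; // assumption: 0 when nothing was found
  // Memoize the result back into the shared index, if one exists.
  index?.set(rootDir, lastModified);
  return lastModified;
}
```

Routing the miss through the same builder keeps index-hit and index-miss results comparable, since both come from one set of ignore rules.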



3. Inflight build invalidation race ✓ Resolved 🐞 Bug ≡ Correctness
Description
deleteComponentMtimeIndexEntry() only deletes from an already-built map and does not invalidate an
in-flight index build, so a build started before a watch-triggered cache clear can still be cached
as canonical afterward. This can cause deps-cache staleness checks to miss a just-changed component
and incorrectly reuse cached dependencies.
Code

scopes/workspace/modules/fs-cache/fs-cache.ts[R33-57]

+  async getOrBuildComponentsMtimeIndex(build: () => Promise<Map<string, number>>): Promise<Map<string, number>> {
+    if (this.componentsMtimeIndex) return this.componentsMtimeIndex;
+    if (!this.componentsMtimeIndexBuilding) {
+      const gen = this.componentsMtimeIndexGen;
+      this.componentsMtimeIndexBuilding = build().then((index) => {
+        // if the index was cleared while building, don't cache this now-stale result as canonical.
+        if (gen === this.componentsMtimeIndexGen) this.componentsMtimeIndex = index;
+        this.componentsMtimeIndexBuilding = undefined;
+        return index;
+      });
+    }
+    return this.componentsMtimeIndexBuilding;
+  }
+
+  /** drop the whole index (e.g. on a full workspace cache clear). */
+  clearComponentsMtimeIndex() {
+    this.componentsMtimeIndex = undefined;
+    this.componentsMtimeIndexBuilding = undefined;
+    this.componentsMtimeIndexGen += 1;
+  }
+
+  /** drop a single component's entry so its next load recomputes it (e.g. on a watch file change). */
+  deleteComponentMtimeIndexEntry(rootDir: string) {
+    this.componentsMtimeIndex?.delete(rootDir);
+  }
Evidence
Watch triggers workspace.clearComponentCache(), which deletes an index entry, but the delete
method only affects the already-built map. The build caching decision is based on the generation
captured at build start and delete does not bump it, so an in-flight build can still be accepted as
canonical.

scopes/workspace/watcher/watcher.ts[651-664]
scopes/workspace/workspace/workspace.ts[840-846]
scopes/workspace/modules/fs-cache/fs-cache.ts[33-45]
scopes/workspace/modules/fs-cache/fs-cache.ts[54-57]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`deleteComponentMtimeIndexEntry()` is intended to force recomputation for a component after `workspace.clearComponentCache()`, but it’s a no-op if the shared index hasn’t been materialized yet (or is still building). Because the in-flight build is still eligible to be cached (generation unchanged), it can reintroduce the stale entry.

### Issue Context
Watch flows call `workspace.clearComponentCache()` on file changes, which calls `deleteComponentMtimeIndexEntry(componentDir)`.

### Fix Focus Areas
- scopes/workspace/modules/fs-cache/fs-cache.ts[33-57]

### Implementation notes
Pick one:
- On `deleteComponentMtimeIndexEntry()`, if `componentsMtimeIndexBuilding` is set, bump `componentsMtimeIndexGen` (and/or call `clearComponentsMtimeIndex()`) so the in-flight build result will not be cached as canonical.
- Alternatively track pending deletions and apply them to the resolved index before caching/returning it.



Previous review results

Review updated until commit db242dc


@qodo-free-for-open-source-projects


PR Summary by Qodo

perf(workspace): batch deps-cache invalidation with shared workspace mtime index
✨ Enhancement 📝 Documentation 🕐 40+ Minutes


Description

• Replace per-component recursive mtime scans with a single workspace scan shared across components.
• Memoize the workspace-wide mtime index in FsCache and invalidate it via existing clear-cache hooks.
• Update the component-loading redesign doc with corrected profiling conclusions and new perf numbers.
Diagram

graph TD
DL["DependenciesLoader"] --> FC["FsCache mtime index"] --> BLD["buildDirsLastModifiedIndex"] --> FS[("Workspace FS")]
DL --> FB["Per-component fallback"] --> FS
WS["Workspace clear-cache"] --> FC
High-Level Assessment

The following are alternative approaches to this PR:

1. Persistent mtime index across commands
  • ➕ Eliminates the workspace scan on every command; best-case warm performance on very large workspaces
  • ➕ Can be incrementally updated on watch events rather than rebuilt
  • ➖ More correctness edge cases (stale index on crashes, external file changes, git operations)
  • ➖ Requires durable storage format/versioning and robust invalidation rules
2. Custom filesystem walker instead of globby(stats: true)
  • ➕ Potentially fewer allocations and lower overhead than globby’s generalized matcher
  • ➕ More control over symlink handling and ignore rules
  • ➖ Higher maintenance surface (OS quirks, symlinks, dotfiles)
  • ➖ Reimplements behavior globby already provides; risk of subtle traversal bugs

Recommendation: The PR’s approach (single scan per command + memoized index with clear-cache invalidation and per-component fallback) is the best near-term tradeoff: it removes the pathological N× scan behavior while keeping correctness localized and leveraging existing cache-clear pathways (watch/start). A persistent cross-command index could yield further gains but adds significant correctness and maintenance risk.

Files changed (6) +184 / -28

Enhancement (5) +130 / -4
dependencies-loader.ts +25/-4

Use workspace-wide mtime index for deps-cache staleness checks

• Replaces the per-component recursive last-modified calculation with a helper that consults a shared workspace mtime index. Falls back to the previous per-component scan when the index lacks an entry (e.g., after single-component cache clears) and memoizes the computed fallback result back into the index.

scopes/dependencies/dependencies/dependencies-loader/dependencies-loader.ts

index.ts +1/-0

Export buildDirsLastModifiedIndex from fs last-modified toolbox

• Adds the new multi-directory last-modified index builder to the package’s public exports so workspace code can reuse it.

scopes/toolbox/fs/last-modified/index.ts

last-modified.ts +64/-0

Add single-scan multi-directory last-modified index builder

• Introduces buildDirsLastModifiedIndex(), which computes max mtime per component directory via a single globby scan (including nested dirs) while ignoring node_modules by default. Adds ownerDir() to associate scanned entries with the deepest matching input dir and explicitly stats the root dirs to capture deletions directly under them.
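
The grouping step described above can be illustrated with a pure function over already-scanned entries. This is a sketch, not the code from `last-modified.ts`: `ScannedEntry`, `ownerDirOf`, and `maxMtimePerDir` are hypothetical names, and paths are assumed to be workspace-relative with `/` separators.

```typescript
// Hypothetical shape of one scanned entry: a relative path plus its mtime.
type ScannedEntry = { path: string; mtimeMs: number };

/** Deepest input dir that contains `entryPath`, or undefined if none does. */
function ownerDirOf(entryPath: string, dirs: string[]): string | undefined {
  let owner: string | undefined;
  for (const dir of dirs) {
    // Match on a path-segment boundary so "comps/a" does not claim "comps/ab/x.ts".
    const isInside = entryPath === dir || entryPath.startsWith(`${dir}/`);
    if (isInside && (owner === undefined || dir.length > owner.length)) owner = dir;
  }
  return owner;
}

/** Max mtime per input dir, computed in one pass over the scan results. */
function maxMtimePerDir(entries: ScannedEntry[], dirs: string[]): Map<string, number> {
  const index = new Map<string, number>();
  for (const entry of entries) {
    const owner = ownerDirOf(entry.path, dirs);
    if (owner === undefined) continue; // not part of any component dir
    index.set(owner, Math.max(index.get(owner) ?? 0, entry.mtimeMs));
  }
  return index;
}
```

Attributing each entry to the deepest matching dir means a component nested inside another component's directory gets its own timestamp, and its files do not bump the outer component's entry.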

scopes/toolbox/fs/last-modified/last-modified.ts

fs-cache.ts +37/-0

Memoize command-scoped component mtime index with safe invalidation

• Adds a cached componentsMtimeIndex plus a shared in-flight build promise to prevent duplicate concurrent builds. Implements generation-based protection so an index cleared mid-build doesn’t get re-cached, and provides APIs to clear the whole index or delete a single component entry.

scopes/workspace/modules/fs-cache/fs-cache.ts

workspace.ts +3/-0

Invalidate shared mtime index via existing workspace cache clear hooks

• Extends clearAllComponentsCache() to drop the entire mtime index and clearComponentCache() to delete the specific component directory entry. This keeps watch/start correctness by ensuring future loads recompute mtimes after cache invalidation events.

scopes/workspace/workspace/workspace.ts

Documentation (1) +54 / -24
component-loading-redesign.md +54/-24

Update redesign doc with corrected profiling + batched invalidation results

• Marks the batched deps-cache invalidation scan as shipped in Phase 2 and updates the profiling narrative to reflect that the hotspot was filesystem traversal (not dependency object materialization). Adds the measured syscall/statFiles reductions and clarifies wall-time vs aggregate self-time interpretation.

scopes/workspace/workspace/component-loading-redesign.md


Comment thread scopes/workspace/modules/fs-cache/fs-cache.ts
…nd watch races

Address qodo review on the deps-cache invalidation index:
- clear the in-flight build promise in `finally`, so a transient build
  rejection (glob/stat error) no longer poisons all later reads in a
  long-lived process (watch/start).
- in `deleteComponentMtimeIndexEntry`, bump the generation when a build is
  in-flight, so a watch-triggered clear that races the first build discards
  that build's result instead of caching the now-stale entry as canonical.
- fall back to the per-component scan if the centralized index build throws,
  preserving fault isolation (one bad dir no longer fails every load).
@qodo-free-for-open-source-projects


Code review by qodo was updated up to the latest commit db242dc
