
feat: criticality worker init [CM-1214]#4161

Merged
mbani01 merged 15 commits into main from feat/criticality_worker on Jun 5, 2026

Conversation


mbani01 commented on Jun 3, 2026

Contributor

This pull request introduces a new criticality scoring pipeline for open source packages, implementing the ADR-0001 methodology. It adds a manual override table for criticality, a new impact scoring formula, and a standalone PageRank-based centrality computation. The changes include new SQL migrations, worker scripts, and supporting TypeScript modules for graph construction and scoring. These updates make the system more flexible, auditable, and tunable.

Database schema and scoring logic:

  • Added a new package_criticality_spotlight table to allow manual overrides for critical package designation, ensuring certain packages are always marked as critical regardless of computed score.
  • Updated the scoring formula in rank_packages_universe() to use an "impact" metric (replacing criticality_score), based on weighted percentiles of downloads, direct dependents, and transitive dependents. The function now also applies spotlight overrides and propagates scores to the main packages table.
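The impact blend described above can be sketched in TypeScript as follows. This is a minimal illustration, not the SQL in the migration: `UniverseRow`, `computeImpact`, and the camelCase field names are hypothetical, and `percentRank` assumes Postgres `percent_rank()` semantics ((rank - 1) / (n - 1), ties share the lowest rank, 0 for a single row). Because a percentile rank depends only on ordering and `log(1 + x)` is strictly increasing for non-negative counts, the log transform does not change the ranks; it is kept here only to mirror the formula. NULL handling is out of scope.

```typescript
// Hypothetical row shape; fields mirror the columns named in the PR description.
interface UniverseRow {
  id: number
  ecosystem: string
  downloadsLast30d: number
  dependentCount: number
  transitiveDependentCount: number
}

interface ImpactWeights {
  downloads: number
  dependents: number
  transitive: number
}

// Defaults from the PR description (0.25 / 0.25 / 0.50).
const DEFAULT_WEIGHTS: ImpactWeights = { downloads: 0.25, dependents: 0.25, transitive: 0.5 }

// Postgres-style percent_rank: (rank - 1) / (n - 1), ties share the lowest rank.
function percentRank(values: number[]): number[] {
  const n = values.length
  if (n <= 1) return values.map(() => 0)
  const sorted = [...values].sort((a, b) => a - b)
  return values.map((v) => {
    // Binary search for the number of values strictly less than v (= rank - 1).
    let lo = 0
    let hi = n
    while (lo < hi) {
      const mid = (lo + hi) >> 1
      if (sorted[mid] < v) lo = mid + 1
      else hi = mid
    }
    return lo / (n - 1)
  })
}

// Ranks each signal within its own ecosystem, then takes the weighted blend.
function computeImpact(rows: UniverseRow[], w: ImpactWeights = DEFAULT_WEIGHTS): Map<number, number> {
  const byEco = new Map<string, UniverseRow[]>()
  for (const r of rows) {
    const group = byEco.get(r.ecosystem) ?? []
    group.push(r)
    byEco.set(r.ecosystem, group)
  }
  const impact = new Map<number, number>()
  for (const group of byEco.values()) {
    const logOf = (f: (r: UniverseRow) => number) => group.map((r) => Math.log1p(f(r)))
    const pDownloads = percentRank(logOf((r) => r.downloadsLast30d))
    const pDependents = percentRank(logOf((r) => r.dependentCount))
    const pTransitive = percentRank(logOf((r) => r.transitiveDependentCount))
    group.forEach((r, i) => {
      impact.set(
        r.id,
        w.downloads * pDownloads[i] + w.dependents * pDependents[i] + w.transitive * pTransitive[i],
      )
    })
  }
  return impact
}
```

Ranking within each ecosystem keeps a mid-sized cargo crate from being drowned out by npm's much larger download numbers.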

PageRank centrality computation:

  • Added new TypeScript modules to build a dependency graph, compute PageRank centrality scores, and merge results into the database. This includes efficient graph construction, iterative scoring, and chunked updates for scalability.
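A minimal sketch of building a reverse CSR (compressed sparse row) graph from direct edges, in the layout that the PageRank loop quoted later in this thread reads: `colData[rowPtr[v] .. rowPtr[v + 1])` lists the dependents of node `v`, and `numDeps[u]` is the number of direct dependencies of `u`. The `DirectEdge` field names (`packageId`, `dependencyId`) are assumptions, and the sketch assumes edges are already deduplicated.

```typescript
// Hypothetical edge shape: packageId depends directly on dependencyId.
interface DirectEdge {
  packageId: number
  dependencyId: number
}

interface Graph {
  N: number
  nodeIds: number[] // node index -> package id
  numDeps: Int32Array // out-degree: direct dependencies of each node
  rowPtr: Int32Array // CSR offsets, length N + 1
  colData: Int32Array // dependents of v are colData[rowPtr[v] .. rowPtr[v + 1])
}

function buildGraph(edges: DirectEdge[]): Graph {
  // Pass 0: assign contiguous indices to every package id seen.
  const nodeIndex = new Map<number, number>()
  const nodeIds: number[] = []
  const indexOf = (id: number): number => {
    let i = nodeIndex.get(id)
    if (i === undefined) {
      i = nodeIds.length
      nodeIndex.set(id, i)
      nodeIds.push(id)
    }
    return i
  }
  for (const e of edges) {
    indexOf(e.packageId)
    indexOf(e.dependencyId)
  }
  const N = nodeIds.length

  // Pass 1: count out-degrees and row sizes (incoming edges per dependency).
  const numDeps = new Int32Array(N)
  const rowPtr = new Int32Array(N + 1)
  for (const e of edges) {
    numDeps[indexOf(e.packageId)]++
    rowPtr[indexOf(e.dependencyId) + 1]++
  }
  // Prefix sum turns row sizes into row offsets.
  for (let v = 0; v < N; v++) rowPtr[v + 1] += rowPtr[v]

  // Pass 2: scatter each dependent into its dependency's row.
  const cursor = rowPtr.slice(0, N)
  const colData = new Int32Array(edges.length)
  for (const e of edges) {
    const dep = indexOf(e.dependencyId)
    colData[cursor[dep]++] = indexOf(e.packageId)
  }
  return { N, nodeIds, numDeps, rowPtr, colData }
}
```

Typed arrays keep the graph compact, which matters for ecosystems with millions of edges; the two counting passes avoid per-node JavaScript arrays entirely.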

Worker scripts and developer tools:

  • Added new scripts to package.json for running and developing the criticality worker, PageRank, and impact scorer, supporting both production and local environments.
  • Added standalone command-line runners for PageRank (run-pagerank.ts), with validation and spot-checks, and for triggering the impact scoring function (run-impact.ts) with tunable parameters.

Note

High Risk
Renames scoring columns and redefines is_critical selection for large package universes; incorrect weights, top-N JSON, or graph/spotlight logic would mis-rank Tier 2 enrichment targets.

Overview
Introduces the criticality slice of ADR-0001: auditable spotlight overrides, a new impact ranking pass in Postgres, and an in-worker PageRank path that writes centrality_score ahead of folding it into impact.

Database: Adds package_criticality_spotlight and replaces rank_packages_universe() so that criticality_score becomes impact on packages_universe and packages. Impact is a per-ecosystem weighted blend of percentile ranks on log downloads, direct dependents, and transitive dependents (default weights 0.25 / 0.25 / 0.50). The function then flags the top N packages per ecosystem as is_critical, forces spotlight packages to critical, and propagates the results to packages. ADR-0001 is updated to describe this slimmer formula (PageRank is stored but not yet part of impact).
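The selection order described above (rank by impact, flag the top N per ecosystem, then force spotlight packages to critical) can be sketched as follows. This only illustrates the ordering; `ScoredPackage`, `markCritical`, and the id tie-breaker are hypothetical, and leaving an ecosystem with no top-N entry entirely unflagged mirrors the behaviour Bugbot reports further down in this thread rather than any documented design.

```typescript
interface ScoredPackage {
  id: number
  ecosystem: string
  impact: number
}

// Returns the ids that end up with is_critical = true.
function markCritical(
  pkgs: ScoredPackage[],
  topNByEcosystem: Record<string, number>,
  spotlightIds: Set<number>,
): Set<number> {
  const critical = new Set<number>()
  const byEco = new Map<string, ScoredPackage[]>()
  for (const p of pkgs) {
    const group = byEco.get(p.ecosystem) ?? []
    group.push(p)
    byEco.set(p.ecosystem, group)
  }
  for (const [eco, group] of byEco) {
    const n: number | undefined = topNByEcosystem[eco]
    if (n === undefined) continue // missing key: nothing flagged (like the NULL cast)
    // Highest impact first; id as a deterministic tie-breaker (an assumption).
    group.sort((a, b) => b.impact - a.impact || a.id - b.id)
    for (const p of group.slice(0, n)) critical.add(p.id)
  }
  // Spotlight overrides are applied last, so they win regardless of score.
  for (const id of spotlightIds) critical.add(id)
  return critical
}
```

Applying the spotlight after the top-N cut is what makes the override auditable: a spotlighted package is critical even if its computed impact is low.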

Worker / tooling: New src/criticality/ builds a CSR graph from direct package_dependencies, runs PageRank, and bulk-updates packages_universe.centrality_score. criticality-worker is a DB health stub for now; run:pagerank (with DB spot-checks) and run:impact invoke scoring on demand. package.json gains start/dev scripts for the criticality worker and the CLIs.

Reviewed by Cursor Bugbot for commit 6054c53. Bugbot is set up for automated code reviews on this repo.

Signed-off-by: Mouad BANI <mouad-mb@outlook.com>
Copilot AI review requested due to automatic review settings June 3, 2026 10:15
@CLAassistant


CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

Copilot AI left a comment

Contributor

Pull request overview

Introduces initial “criticality” groundwork in packages_worker by adding a PageRank-based centrality computation (written into packages_universe) and updating the database ranking function/migrations to support an ADR-based criticality scoring formula.

Changes:

  • Add CSR graph building + PageRank computation and a standalone runner for validating graph correctness.
  • Add DB queries to load direct dependency edges and merge computed centrality scores back into packages_universe.
  • Add migrations for graph-derived signals and a v2 rank_packages_universe() implementation using weighted percentile ranks.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 11 comments.

Summary per file:

  • services/apps/packages_worker/src/criticality/types.ts: Adds types for centrality input/output and criticality weight definitions.
  • services/apps/packages_worker/src/criticality/run-pagerank.ts: Adds a standalone CLI-like script to build/validate the graph and optionally run full PageRank.
  • services/apps/packages_worker/src/criticality/queries.ts: Adds SQL helpers to load direct dependency edges and bulk-merge centrality scores.
  • services/apps/packages_worker/src/criticality/graph.ts: Implements CSR graph construction and PageRank iteration utilities.
  • services/apps/packages_worker/src/criticality/activities.ts: Implements the Temporal activity to compute PageRank and persist centrality scores in chunks.
  • backend/src/osspckgs/migrations/V1780416481__rank_packages_universe_v2.sql: Replaces/updates rank_packages_universe() scoring + ranking logic to match the ADR methodology.
  • backend/src/osspckgs/migrations/V1780394591__packages_universe_graph_signals.sql: Adds transitive_dependent_count and centrality_score columns to packages_universe.


Comment thread backend/src/osspckgs/migrations/V1780416481__rank_packages_universe_v2.sql Outdated
Comment thread backend/src/osspckgs/migrations/V1780416481__rank_packages_universe_v2.sql Outdated
Comment thread backend/src/osspckgs/migrations/V1780416481__rank_packages_universe_v2.sql Outdated
Comment thread backend/src/osspckgs/migrations/V1780416481__rank_packages_universe_v2.sql Outdated
Comment thread services/apps/packages_worker/src/criticality/graph.ts Outdated
Comment thread services/apps/packages_worker/src/criticality/activities.ts
Comment thread services/apps/packages_worker/src/criticality/run-pagerank.ts
Comment thread services/apps/packages_worker/src/criticality/run-pagerank.ts Outdated
Comment thread services/apps/packages_worker/src/criticality/types.ts Outdated
Comment thread services/apps/packages_worker/src/criticality/graph.ts
Signed-off-by: Mouad BANI <mouad-mb@outlook.com>
mbani01 self-assigned this on Jun 4, 2026
Signed-off-by: Mouad BANI <mouad-mb@outlook.com>

Copilot AI left a comment

Contributor

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

mbani01 added 3 commits June 4, 2026 18:39
Signed-off-by: Mouad BANI <mouad-mb@outlook.com>
Signed-off-by: Mouad BANI <mouad-mb@outlook.com>
Signed-off-by: Mouad BANI <mouad-mb@outlook.com>
Copilot AI review requested due to automatic review settings June 4, 2026 17:59

Copilot AI left a comment

Contributor

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Signed-off-by: Mouad BANI <mouad-mb@outlook.com>
mbani01 changed the title from "feat: criticality worker init (wip)" to "feat: criticality worker init [CM-1214]" on Jun 4, 2026
Signed-off-by: Mouad BANI <mbani@contractor.linuxfoundation.org>
Copilot AI review requested due to automatic review settings June 4, 2026 18:07

Copilot AI left a comment

Contributor

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Signed-off-by: Mouad BANI <mouad-mb@outlook.com>
mbani01 marked this pull request as ready for review on June 4, 2026 18:11
Copilot AI review requested due to automatic review settings June 4, 2026 18:11

Copilot AI left a comment

Contributor

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Comment thread backend/src/osspckgs/migrations/V1780589607__rank_packages_universe_v2.sql Outdated
Comment thread services/apps/packages_worker/src/criticality/run-impact.ts Outdated
Comment thread services/apps/packages_worker/src/criticality/graph.ts Outdated

Copilot AI left a comment

Contributor

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

mbani01 requested a review from joanagmaia on June 4, 2026 18:13
Signed-off-by: Mouad BANI <mouad-mb@outlook.com>
Comment on lines +14 to +20
export interface CriticalityWeights {
wCentrality: number // 0.40
wTransitive: number // 0.10
wDepPkgs: number // 0.20
wDepRepos: number // 0.15
wDownloads: number // 0.15
}

Contributor

Is this still up to date?

Contributor Author

Good catch! This is no longer used, so I removed it.

weight_downloads numeric DEFAULT 0.25,
weight_dependent_packages numeric DEFAULT 0.25,
weight_transitive numeric DEFAULT 0.50,
critical_top_n_by_ecosystem jsonb DEFAULT '{}'::jsonb

Contributor

Not sure where this is defined, but to cover all of the possible registries up front, I would suggest defaults like:

  • npm: 30% (210k)
  • PyPI: 20% (140k)
  • Maven Central: 17% (120k)
  • NuGet: 10% (70k)
  • Packagist: 8% (56k)
  • Go modules: 6% (42k)
  • crates.io: 4% (28k)
  • RubyGems: 3% (21k)
  • Docker Hub: 2% (13k)
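If these caps were adopted as the `critical_top_n_by_ecosystem` default, the jsonb argument could be built as below. The ecosystem keys are assumptions: `npm`, `pypi`, `maven`, `nuget`, `go`, and `cargo` match keys mentioned elsewhere in this thread, while `packagist`, `rubygems`, and `docker` are hypothetical. The suggested caps add up to 700k, the total implied by "npm: 30% (210k)".

```typescript
// Suggested per-registry caps from the review comment.
// Keys other than npm/pypi/maven/nuget/go/cargo are hypothetical.
const SUGGESTED_TOP_N: Record<string, number> = {
  npm: 210_000,
  pypi: 140_000,
  maven: 120_000,
  nuget: 70_000,
  packagist: 56_000,
  go: 42_000,
  cargo: 28_000,
  rubygems: 21_000,
  docker: 13_000,
}

// Serialised form, suitable for binding to the jsonb parameter.
const topNJson = JSON.stringify(SUGGESTED_TOP_N)

// Sanity check: the caps should add up to the overall budget.
const total = Object.values(SUGGESTED_TOP_N).reduce((a, b) => a + b, 0)
console.log(total) // 700000
```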

Contributor Author

The idea was to set them during the function call, but yes, we should have default values.
Fixed.

Signed-off-by: Mouad BANI <mouad-mb@outlook.com>
Copilot AI review requested due to automatic review settings June 5, 2026 10:08

cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).


Reviewed by Cursor Bugbot for commit 025534c.


const [result] = await qx.select(
`SELECT * FROM rank_packages_universe($/wDownloads/, $/wDepPkgs/, $/wTransitive/, $/topN/::jsonb)`,
{ wDownloads, wDepPkgs, wTransitive, topN },


Partial top-N clears critical flags

Medium Severity

run:impact always passes critical_top_n_by_ecosystem as JSON, and the default only includes cargo and maven. In rank_packages_universe(), missing ecosystem keys make (critical_top_n_by_ecosystem ->> ecosystem)::int null, so is_critical stays false for npm, pypi, go, nuget, and others when the script runs without --top-n.
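One way to avoid the NULL cast described here is to merge any user-supplied `--top-n` JSON over a complete per-ecosystem default map before calling `rank_packages_universe()`, so every ecosystem key is always present. A minimal sketch; `DEFAULT_TOP_N`, its placeholder values, and `resolveTopN` are hypothetical and are not the defaults that were merged. A COALESCE fallback inside the SQL function would be an alternative fix.

```typescript
// Hypothetical defaults: every ecosystem the ranking covers must have a key.
const DEFAULT_TOP_N: Record<string, number> = {
  npm: 1000,
  pypi: 1000,
  maven: 1000,
  cargo: 1000,
  go: 1000,
  nuget: 1000,
}

// Merges a partial --top-n override over the defaults and validates the values.
function resolveTopN(overrideJson: string | undefined): string {
  const override: unknown = overrideJson ? JSON.parse(overrideJson) : {}
  if (typeof override !== 'object' || override === null || Array.isArray(override)) {
    throw new Error('--top-n must be a JSON object, e.g. {"npm": 500}')
  }
  const merged: Record<string, number> = { ...DEFAULT_TOP_N }
  for (const [eco, n] of Object.entries(override)) {
    if (!Number.isInteger(n) || (n as number) < 0) {
      throw new Error(`top-n for ${eco} must be a non-negative integer`)
    }
    merged[eco] = n as number
  }
  return JSON.stringify(merged)
}
```

With this shape, `--top-n '{"cargo": 50}'` only changes cargo and leaves npm, pypi, and the rest at their defaults instead of silently clearing their critical flags.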


Reviewed by Cursor Bugbot for commit 025534c.

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 10 out of 11 changed files in this pull request and generated 10 comments.

Comment on lines +23 to +24
const ecosystem = process.argv[2] ?? 'cargo'
const graphOnly = process.argv.includes('--graph-only')
Comment on lines +27 to +30
function parseJsonArg(flag: string, fallback: string): string {
const idx = process.argv.indexOf(flag)
return idx !== -1 ? process.argv[idx + 1] : fallback
}
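As quoted, `parseJsonArg` returns `undefined` (despite its `string` return type) when the flag is the last argument, and it does not check that the value is JSON. A hedged sketch of a stricter variant; `parseJsonFlag` is a hypothetical name, and taking `argv` as a parameter instead of reading `process.argv` directly is a change made here for testability.

```typescript
// Returns the JSON text following `flag`, or `fallback` when the flag is absent.
function parseJsonFlag(argv: string[], flag: string, fallback: string): string {
  const idx = argv.indexOf(flag)
  if (idx === -1) return fallback
  const value: string | undefined = argv[idx + 1]
  // Reject a missing value or another flag in the value position.
  if (value === undefined || value.startsWith('--')) {
    throw new Error(`${flag} requires a JSON value`)
  }
  JSON.parse(value) // throws a SyntaxError early if the value is not valid JSON
  return value
}
```

Usage would look like `parseJsonFlag(process.argv, '--top-n', '{}')`.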
if (delta < convergence) break // scores have stabilised
}

return { scores, iterations: iters }
Comment on lines +73 to +86
// Each node v collects votes from packages that depend on it.
// numDeps[dependent] is always >= 1 here — only packages with at least one
// outgoing edge appear in colData, so division by zero cannot occur.
// Dangling nodes (numDeps = 0) never appear in colData; their score
// accumulates but never redistributes. This is acceptable because scores
// are used for relative ranking via pct_rank(), not as absolute values.
for (let v = 0; v < N; v++) {
let incoming = 0
for (let j = rowPtr[v]; j < rowPtr[v + 1]; j++) {
const dependent = colData[j]
incoming += scores[dependent] / numDeps[dependent]
}
next[v] = teleportation + damping * incoming
}
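For comparison with the loop above, the common textbook variant redistributes the score held by dangling nodes (here, packages with no direct dependencies, `numDeps = 0`) uniformly on every iteration, so the scores keep summing to 1. The quoted loop may well be fine for relative ranking, as its comment argues; this sketch only shows the alternative over the same CSR layout. `pageRankStep` is a hypothetical helper, not part of the PR.

```typescript
// One PageRank iteration with uniform redistribution of dangling-node mass.
// rowPtr/colData: dependents of v are colData[rowPtr[v] .. rowPtr[v + 1]).
// numDeps[u]: number of direct dependencies of u (0 means u is dangling).
function pageRankStep(
  scores: Float64Array,
  numDeps: Int32Array,
  rowPtr: Int32Array,
  colData: Int32Array,
  damping = 0.85,
): Float64Array {
  const N = scores.length
  let danglingMass = 0
  for (let u = 0; u < N; u++) if (numDeps[u] === 0) danglingMass += scores[u]

  // Teleportation plus an equal share of the dangling mass, identical for every node.
  const base = (1 - damping) / N + (damping * danglingMass) / N
  const next = new Float64Array(N)
  for (let v = 0; v < N; v++) {
    let incoming = 0
    for (let j = rowPtr[v]; j < rowPtr[v + 1]; j++) {
      const dependent = colData[j]
      incoming += scores[dependent] / numDeps[dependent]
    }
    next[v] = base + damping * incoming
  }
  return next
}
```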
Comment on lines +26 to +29
// Bulk-update centrality_score on packages_universe rows by joining through packages.
// Uses unnest — one parameterised query regardless of row count, no string interpolation.
// Isolated packages (not in the graph) remain NULL; rank_packages_universe() treats
// NULL as 0 via COALESCE. Idempotent — safe for Temporal retries.
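Since the unnest approach binds parallel arrays (one of ids, one of scores), the caller only has to split the PageRank output into fixed-size chunks and issue one UPDATE per chunk. A minimal sketch of that chunking; `ScoreChunk`, `chunkScores`, the default chunk size, and the array casts in the comment are hypothetical, and the UPDATE statement itself (including how packages_universe is joined through packages) is not reproduced here.

```typescript
// Parallel arrays, e.g. for unnest($/ids/::bigint[], $/scores/::float8[]) (casts assumed).
interface ScoreChunk {
  ids: number[]
  scores: number[]
}

// Splits aligned (nodeIds[i], scores[i]) pairs into chunks of at most `size` rows.
function chunkScores(nodeIds: number[], scores: Float64Array, size = 10_000): ScoreChunk[] {
  if (nodeIds.length !== scores.length) throw new Error('nodeIds and scores must align')
  if (size <= 0) throw new Error('chunk size must be positive')
  const chunks: ScoreChunk[] = []
  for (let start = 0; start < nodeIds.length; start += size) {
    const end = Math.min(start + size, nodeIds.length)
    chunks.push({
      ids: nodeIds.slice(start, end),
      scores: Array.from(scores.subarray(start, end)),
    })
  }
  return chunks
}
```

Because each chunk's UPDATE overwrites scores with the same values on a retry, re-running a partially completed activity stays idempotent.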
Comment on lines +55 to +59
export function computePageRank(
{ numDeps, rowPtr, colData, N }: Graph,
damping = 0.85,
maxIter = 100,
convergence = 1e-6,
Comment on lines +4 to +7
-- Formula (ADR-0001 §Criticality scoring methodology):
-- impact = w_downloads * pct_rank( LOG(1 + downloads_last_30d) ) within ecosystem
-- + w_dep_pkgs * pct_rank( LOG(1 + dependent_count) ) within ecosystem
-- + w_transitive * pct_rank( LOG(1 + transitive_dependent_count) ) within ecosystem
Comment on lines +121 to +123
impact = w_downloads * pct_rank( LOG(1 + downloads_last_30d) ) within ecosystem
+ w_dep_pkgs * pct_rank( LOG(1 + dependent_count) ) within ecosystem
+ w_transitive * pct_rank( LOG(1 + transitive_dependent_count) ) within ecosystem
Comment on lines +57 to +62
// ── Build graph ───────────────────────────────────────────────────────────
console.log(`Building graph for ecosystem=${ecosystem} ...`)
let t = Date.now()
const edges = await loadDirectEdges(qx, ecosystem)
const edgeCount = edges.length
const graph = buildGraph(edges)
}
},
)
log.info({ ecosystem, iterations, nodeCount: graph.N }, 'PageRank converged')
joanagmaia previously approved these changes Jun 5, 2026
Signed-off-by: Mouad BANI <mouad-mb@outlook.com>
Signed-off-by: Mouad BANI <mbani@contractor.linuxfoundation.org>
Copilot AI review requested due to automatic review settings June 5, 2026 13:16
Signed-off-by: Mouad BANI <mouad-mb@outlook.com>
mbani01 merged commit bba3a14 into main on Jun 5, 2026
15 checks passed
mbani01 deleted the feat/criticality_worker branch on June 5, 2026 13:18

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 10 out of 11 changed files in this pull request and generated 7 comments.

Comment on lines +5 to +9
"start:packages-worker": "CROWD_TEMPORAL_TASKQUEUE=packages-worker CROWD_TEMPORAL_NAMESPACE=$CROWD_PACKAGES_TEMPORAL_NAMESPACE SERVICE=packages-worker tsx src/bin/packages-worker.ts",
"start:criticality-worker": "CROWD_TEMPORAL_TASKQUEUE=packages-worker CROWD_TEMPORAL_NAMESPACE=$CROWD_PACKAGES_TEMPORAL_NAMESPACE SERVICE=criticality-worker tsx src/bin/criticality-worker.ts",
"start:deps-dev-ingest": "CROWD_TEMPORAL_TASKQUEUE=deps-dev-ingest CROWD_TEMPORAL_NAMESPACE=$CROWD_PACKAGES_TEMPORAL_NAMESPACE SERVICE=deps-dev-ingest tsx src/bin/deps-dev-ingest.ts",
"start:github-repos-enricher": "SERVICE=github-repos-enricher tsx src/bin/github-repos-enricher.ts",
"start:npm-worker": "CROWD_TEMPORAL_TASKQUEUE=npm-worker CROWD_TEMPORAL_NAMESPACE=$CROWD_PACKAGES_TEMPORAL_NAMESPACE SERVICE=npm-worker tsx src/bin/npm-worker.ts",
"start:packages-worker": "CROWD_TEMPORAL_TASKQUEUE=packages-worker CROWD_TEMPORAL_NAMESPACE=$CROWD_PACKAGES_TEMPORAL_NAMESPACE SERVICE=packages-worker tsx src/bin/packages-worker.ts",
"start:github-repos-enricher": "SERVICE=github-repos-enricher tsx src/bin/github-repos-enricher.ts",
Comment on lines 4 to +6
"scripts": {
"start:packages-worker": "CROWD_TEMPORAL_TASKQUEUE=packages-worker CROWD_TEMPORAL_NAMESPACE=$CROWD_PACKAGES_TEMPORAL_NAMESPACE SERVICE=packages-worker tsx src/bin/packages-worker.ts",
"start:criticality-worker": "CROWD_TEMPORAL_TASKQUEUE=packages-worker CROWD_TEMPORAL_NAMESPACE=$CROWD_PACKAGES_TEMPORAL_NAMESPACE SERVICE=criticality-worker tsx src/bin/criticality-worker.ts",
Comment on lines +16 to +20
FROM package_dependencies pd
JOIN packages p
ON p.id = pd.package_id
AND p.ecosystem = $/ecosystem/
WHERE pd.dependency_kind = 'direct'`,
if (delta < convergence) break // scores have stabilised
}

return { scores, iterations: iters }
Comment on lines +12 to +16
export function buildGraph(edges: DirectEdge[]): Graph {
// Pass 0: assign contiguous indices
const nodeIndex = new Map<number, number>()
const nodeIds: number[] = []

Comment on lines +27 to +30
function parseJsonArg(flag: string, fallback: string): string {
const idx = process.argv.indexOf(flag)
return idx !== -1 ? process.argv[idx + 1] : fallback
}
Comment on lines 118 to +122
Per-ecosystem percentile-rank of each log-transformed signal, then weighted blend:

```diff
- score = w_downloads  * pct_rank( LN(1 + downloads_last_30d) )         within ecosystem
-       + w_dep_pkgs   * pct_rank( LN(1 + dependent_packages_count) )   within ecosystem
-       + w_dep_repos  * pct_rank( LN(1 + dependent_repos_count) )      within ecosystem
-       + w_transitive * pct_rank( LN(1 + transitive_dependent_count) ) within ecosystem
-       + w_centrality * pct_rank( centrality_score )                   within ecosystem
+ impact = w_downloads * pct_rank( LOG(1 + downloads_last_30d) ) within ecosystem
+        + w_dep_pkgs  * pct_rank( LOG(1 + dependent_count) )    within ecosystem
```

Labels

None yet

Projects

None yet


4 participants