Skip to content

Standardized metrics export - feature flagged#125

Open
chrishagglund-ship-it wants to merge 22 commits into
mainfrom
gated-metrics-standardization
Open

Standardized metrics export - feature flagged#125
chrishagglund-ship-it wants to merge 22 commits into
mainfrom
gated-metrics-standardization

Conversation

@chrishagglund-ship-it

@chrishagglund-ship-it chrishagglund-ship-it commented May 4, 2026

Copy link
Copy Markdown
Contributor

Provides cross-sdk standardized metrics output format, enabled by setting WORKER_CANONICAL_METRICS=true, see standardized metrics catalog for the proposed standard metrics catalog

see TOOL-34 for more info

@codecov

codecov Bot commented May 4, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 89.57529% with 135 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
...rc/sdk/worker/metrics/MetricsCollectorInterface.ts 0.00% 54 Missing ⚠️
...rc/sdk/worker/metrics/CanonicalMetricsCollector.ts 91.88% 43 Missing ⚠️
src/sdk/clients/workflow/WorkflowExecutor.ts 14.28% 24 Missing ⚠️
src/sdk/worker/metrics/LegacyMetricsCollector.ts 86.36% 9 Missing ⚠️
.../sdk/worker/metrics/CanonicalPrometheusRegistry.ts 98.21% 3 Missing ⚠️
...dk/createConductorClient/helpers/fetchWithRetry.ts 97.29% 2 Missing ⚠️
Flag Coverage Δ
integration-v4 64.39% <36.52%> (-2.64%) ⬇️
integration-v5 69.78% <39.15%> (-3.08%) ⬇️
unit 49.33% <88.80%> (+3.23%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
src/sdk/builders/tasks/pullWorkflowMessages.ts 100.00% <100.00%> (ø)
src/sdk/clients/worker/Poller.ts 98.63% <100.00%> (+<0.01%) ⬆️
src/sdk/clients/worker/TaskRunner.ts 100.00% <100.00%> (ø)
src/sdk/clients/worker/events/EventDispatcher.ts 100.00% <100.00%> (ø)
...sdk/createConductorClient/createConductorClient.ts 100.00% <100.00%> (ø)
...eateConductorClient/helpers/metricsInterceptors.ts 100.00% <100.00%> (ø)
...reateConductorClient/helpers/resolveOrkesConfig.ts 100.00% <100.00%> (ø)
src/sdk/worker/core/TaskHandler.ts 73.84% <100.00%> (+0.04%) ⬆️
src/sdk/worker/metrics/MetricsServer.ts 100.00% <100.00%> (ø)
src/sdk/worker/metrics/accumulators.ts 100.00% <100.00%> (ø)
... and 8 more

... and 5 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@bradyyie

Copy link
Copy Markdown

One thing on the metrics wiring: the recordApiRequestTime calls in wrapFetchWithRetry (both the success .then and the error handler) aren't guarded. If a metric emit ever throws, the success path turns a good response into a rejected promise, and on the error path it throws over the original network error before throw error ever runs. Metrics shouldn't be able to fail a real request.

Can we wrap the emit in a try/catch (or a small safeEmit helper) so a collector blowing up just gets logged and the request still goes through? Same goes for the other observer calls added on the request path, e.g. recordWorkflowStartError in the start-workflow catch.

Comment on lines +329 to 332
_metricUri?: string,
): void {
const key = `${method}:${uri}:${status}`;
this.observeSummary(this.metrics.apiRequestDurationMs, key, durationMs, "api_request", "endpoint");

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that this path is wired into the live HTTP pipeline, ignoring the template here is a cardinality problem. _metricUri (the bounded path template) is accepted but unused, and the key is built from the interpolated uri, so id-bearing endpoints each become their own endpoint series and their own Map entry, e.g. GET:/api/workflow/<uuid>:200. Any app calling things like getWorkflow(id) will grow this unbounded.

The canonical collector already uses metricUri ?? uri. Legacy should use the template the same way so the endpoint label stays bounded.

@chrishagglund-ship-it chrishagglund-ship-it Jun 17, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed - i think wiring it up was unintentional, the metrics in main branch don't include this in the output. i think keeping it in the legacy output as a new newly outputted thing (but correctly without cardinality problem) is fine b/c it is just adding a new metric and doesn't change any existing one. wdyt (get rid of it from legacy to match main existing output, or keep it in but do it right?)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants