feat(metrics): emit workflow execution and per-block metrics to CloudWatch by TheodoreSpeaks · Pull Request #4931 · simstudioai/sim

TheodoreSpeaks · 2026-06-09T23:17:44Z

Summary

generalize lib/monitoring/metrics.ts into a namespace-aware CloudWatch emitter (per-namespace PutMetricData batching, same buffer/flush/exit-drain machinery); hostedKeyMetrics API unchanged
new Sim/Workflow namespace: ExecutionStarted, ExecutionCompleted (Trigger/Status dims), ExecutionDuration, ExecutionPaused, BlockExecuted (BlockType/Operation/Success dims), BlockDuration
emit lifecycle metrics from LoggingSession (all completion paths incl. cost-only fallback and markAsFailed), guarded against double-counting; pause is tracked separately from terminal completions
emit per-block metrics from BlockExecutor at blockLog finalization (success + error paths); Operation dim is regex-guarded against dynamic values exploding cardinality
cleanup-stale-executions cron now counts crashed-worker executions as failed completions (only place those can be recorded)
add GRAFANA_DEPLOYMENT_ENVIRONMENT to the Environment dimension fallback chain so staging workers stop tagging metrics as production

Type of Change

New feature

Testing

Unit tests for the emitter (namespace grouping, buffer drain, dimension shapes, failure drop) and LoggingSession metric emission (start/success/failed/cancelled/paused, dedup across fallback/markAsFailed). bun run lint:check, tsc --noEmit, and check:api-validation:strict pass.

Checklist

Code follows project style guidelines
Self-reviewed my changes
Tests added/updated and passing
No new warnings introduced
I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

…Watch

vercel · 2026-06-09T23:17:51Z

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment

Project	Deployment	Actions	Updated (UTC)
docs	Skipped		Jun 9, 2026 11:45pm

TheodoreSpeaks · 2026-06-09T23:17:53Z

@greptile review

cursor · 2026-06-09T23:17:54Z

PR Summary

Medium Risk
Touches execution logging completion paths and the stale-execution cron; incorrect dedup or race handling could skew error-rate metrics, but changes are guarded and mostly additive telemetry.

Overview
Extends CloudWatch application metrics with a Sim/Workflow namespace and wires emission through execution lifecycle and block runs.

lib/monitoring/metrics.ts is generalized to buffer and flush per namespace (Sim/HostedKey unchanged API; flushMetrics replaces the hosted-key-only flush). New workflowMetrics: ExecutionStarted, ExecutionCompleted (+ optional ExecutionDuration), ExecutionPaused, and per-block BlockExecuted / BlockDuration with low-cardinality dimensions. GRAFANA_DEPLOYMENT_ENVIRONMENT is added to the Environment dimension fallback so staging workers are not labeled as production.

LoggingSession records start (not on resume), terminal outcomes (success/failed/cancelled), pause separately, and cost-only fallback paths, with guards so completions are not double-counted; markExecutionAsFailed emits failed only when transitioning from running/pending.

BlockExecutor emits block metrics on success and error; the Operation dimension is regex-limited to avoid cardinality blowups.

The stale-execution cleanup cron only force-fails rows still running (avoids racing a finishing worker) and records ExecutionCompleted failed for worker crashes that never hit LoggingSession.

^{Reviewed by Cursor Bugbot for commit 83bfcbd. Bugbot is set up for automated code reviews on this repo. Configure here.}

greptile-apps · 2026-06-09T23:25:21Z

Greptile Summary

This PR generalises lib/monitoring/metrics.ts into a namespace-aware CloudWatch emitter and wires workflow execution and per-block metrics throughout the executor, LoggingSession, and the stale-execution cleanup cron.

New Sim/Workflow namespace adds ExecutionStarted, ExecutionCompleted (with Trigger/Status dimensions), ExecutionDuration, ExecutionPaused, BlockExecuted (with BlockType/Operation/Success), and BlockDuration metrics; cardinality is guarded by an operation regex and the existing low-cardinality dimension convention.
LoggingSession emits metrics across all terminal paths (success, error, cancellation, pause, cost-only fallback, markAsFailed) with a completionMetricEmitted boolean to prevent double-counting; the cleanup cron fills the gap for crashed workers that never reach a LoggingSession completion path.
hostedKeyMetrics API is unchanged; the refactor only adds the namespace parameter to the internal enqueue function and renames flushHostedKeyMetrics → flushMetrics.

Confidence Score: 5/5

Safe to merge; all metric calls are fire-and-forget with errors caught and dropped, so no request-path impact.

The new metric paths are fully additive to existing logic and cannot break execution correctness. The completionMetricEmitted guard and the cron conditional-update pattern correctly prevent double-counting. Test coverage for the emitter and all LoggingSession emission paths is solid. The only finding is an omitted optional field in a telemetry call.

apps/sim/app/api/cron/cleanup-stale-executions/route.ts — the already-computed totalDurationMs is not forwarded to recordExecutionCompleted.

Important Files Changed

Filename	Overview
apps/sim/lib/monitoring/metrics.ts	Cleanly generalised to a namespace-aware emitter; buffer, flush, and exit-drain logic unchanged; GRAFANA_DEPLOYMENT_ENVIRONMENT added to the env chain.
apps/sim/lib/logs/execution/logging-session.ts	Metric emission integrated across all completion paths; completionMetricEmitted guard prevents double-counting; success path passes raw duration = totalDurationMs
apps/sim/executor/execution/block-executor.ts	recordBlockMetric correctly gated behind the blockLog check on both success and error paths; operation cardinality guard via regex looks correct.
apps/sim/app/api/cron/cleanup-stale-executions/route.ts	Conditional-update pattern correctly prevents double-counting with concurrent worker completions; recordExecutionCompleted emitted correctly but totalDurationMs is not forwarded, so ExecutionDuration data is never emitted for this failure class.
apps/sim/lib/monitoring/metrics.test.ts	New test file covering namespace grouping, buffer drain, dimension shapes, failure drop, and missing-duration skipping; good coverage of the emitter contract.
apps/sim/lib/logs/execution/logging-session.test.ts	Comprehensive metric tests added: start/resume discrimination, all completion paths, pause→markAsFailed sequencing, dedup guard, and primary-write-failure fallback.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Workflow execution starts] --> B[LoggingSession.start / safeStart]
    B --> C[recordExecutionStarted Sim/Workflow]
    C --> D{Execution path}
    D --> E[BlockExecutor.execute]
    E --> F{Block outcome}
    F -->|success| G[recordBlockExecuted success=true + BlockDuration]
    F -->|error| H[recordBlockExecuted success=false + BlockDuration]
    D --> I{Session completion}
    I -->|complete| J[recordExecutionCompleted status=success + ExecutionDuration]
    I -->|completeWithError| K[recordExecutionCompleted status=failed + ExecutionDuration]
    I -->|completeWithCancellation| L[recordExecutionCompleted status=cancelled + ExecutionDuration]
    I -->|completeWithPause| M[recordExecutionPaused]
    I -->|cost-only fallback| N[recordExecutionCompleted status=failed/cancelled/success]
    I -->|markAsFailed| O[recordExecutionCompleted status=failed no durationMs]
    M -->|later markAsFailed| O
    P[cleanup-stale-executions cron] -->|crashed workers only| Q[recordExecutionCompleted status=failed no durationMs]
    J & K & L & N & O & Q --> R[(CloudWatch Sim/Workflow)]
    G & H --> R

_{Reviews (3): Last reviewed commit: "improvement(metrics): one-shot guard on ..." | Re-trigger Greptile}

greptile-apps · 2026-06-09T23:25:26Z

 const ENVIRONMENT =
  process.env.OTEL_DEPLOYMENT_ENVIRONMENT ||
  process.env.DEPLOYMENT_ENVIRONMENT ||
+  process.env.GRAFANA_DEPLOYMENT_ENVIRONMENT ||
  process.env.NODE_ENV ||
  'development'


GRAFANA_DEPLOYMENT_ENVIRONMENT ordering may not fix staging

GRAFANA_DEPLOYMENT_ENVIRONMENT is checked only after DEPLOYMENT_ENVIRONMENT. If staging workers already have DEPLOYMENT_ENVIRONMENT set to any value — even 'production' (common when a shared env template is used) — the Grafana variable is silently shadowed and the fix has no effect. If staging workers consistently have GRAFANA_DEPLOYMENT_ENVIRONMENT but not DEPLOYMENT_ENVIRONMENT, the current order is fine; but if the opposite is possible, GRAFANA_DEPLOYMENT_ENVIRONMENT should move earlier in the chain, or the comment should document the expected env layout explicitly.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Verified the premise does not hold: the infra CDK stacks set neither OTEL_DEPLOYMENT_ENVIRONMENT nor DEPLOYMENT_ENVIRONMENT anywhere (grepped application-stack and constructs), and CloudWatch currently shows only Environment=production series — which is exactly the bug. The ordering is intentional: the OTEL_/DEPLOYMENT_ vars stay as explicit overrides, with the Grafana label (already set per-env for trigger.dev workers) as the next-best signal. The authoritative fix — DEPLOYMENT_ENVIRONMENT: config.environment on the ECS env maps — lands in the infra repo separately.

greptile-apps · 2026-06-09T23:25:48Z

Greptile Summary

This PR generalizes the existing metrics.ts CloudWatch emitter to support multiple namespaces and wires up new Sim/Workflow metrics for execution lifecycle and per-block execution tracking. The hostedKeyMetrics API is unchanged; new workflowMetrics helpers (recordExecutionStarted, recordExecutionCompleted, recordExecutionPaused, recordBlockExecuted) are added with appropriate cardinality guards.

metrics.ts: The buffer is now keyed by namespace (BufferedDatum), flushMetrics() groups by namespace before calling PutMetricData, and GRAFANA_DEPLOYMENT_ENVIRONMENT is added to the ENVIRONMENT fallback chain so trigger.dev staging workers no longer collapse to production.
logging-session.ts: A completionMetricEmitted flag guards all terminal paths (complete, completeWithError, completeWithCancellation, cost-only fallback, markAsFailed) against double-counting; pause is tracked separately via recordExecutionPaused.
block-executor.ts: recordBlockMetric is called on both success and error paths inside the blockLog guard; the Operation dimension is regex-validated to prevent cardinality explosion.
cleanup-stale-executions/route.ts: Adds trigger to the stale-execution query and calls recordExecutionCompleted(failed) for crashed workers that never reach a LoggingSession completion path.

Confidence Score: 4/5

Safe to merge; all terminal completion paths are guarded against double-counting and the CloudWatch flush machinery is unmodified. The two findings are confined to metric accuracy, not execution correctness.

The dedup flag (completionMetricEmitted), the regex cardinality guard on Operation, and the trigger type safety (notNull DB column) are all correct. The safeStart double-emit risk is latent rather than present — nothing currently triggers it — but the pattern is fragile enough to note. The cost-only fallback's || 0 duration passes a zero ExecutionDuration data point whenever totalDurationMs is absent, which will skew CloudWatch percentile math for that path.

The two metric-emission sites in logging-session.ts — around safeStart (line 796) and the cost-only fallback duration (line 1080) — are worth a second look before the staging rollout produces misleading percentile data.

Important Files Changed

Filename	Overview
apps/sim/lib/monitoring/metrics.ts	Core emitter generalized to support namespace-aware buffering; new workflowMetrics API added with correct cardinality guards and GRAFANA_DEPLOYMENT_ENVIRONMENT env-var fallback.
apps/sim/lib/logs/execution/logging-session.ts	Metric emission wired across all terminal paths; completionMetricEmitted flag prevents double-counting; a subtle gap exists in the safeStart fallback path (see comment).
apps/sim/executor/execution/block-executor.ts	recordBlockMetric added to both success and error paths, guarded behind blockLog existence check and operation regex filter; duration is always computed before the guard.
apps/sim/app/api/cron/cleanup-stale-executions/route.ts	trigger column added to select query (notNull in schema); recordExecutionCompleted(failed) correctly placed after the DB update succeeds, handling crashed-worker completions.
apps/sim/lib/monitoring/metrics.test.ts	New test file covering namespace grouping, buffer drain, dimension shapes, and failure-drop behavior for the generalized emitter.
apps/sim/lib/logs/execution/logging-session.test.ts	Comprehensive new test suite for metric emission: start/resume gate, all completion paths, pause/failed dedup, cost-only fallback idempotency, and cancelled-skip behavior.

Sequence Diagram

sequenceDiagram
    participant Caller
    participant LoggingSession
    participant BlockExecutor
    participant workflowMetrics
    participant Buffer
    participant CloudWatch

    Caller->>LoggingSession: start() / safeStart()
    LoggingSession->>workflowMetrics: "recordExecutionStarted({trigger})"
    workflowMetrics->>Buffer: enqueue(Sim/Workflow, ExecutionStarted)

    loop Per block
        BlockExecutor->>BlockExecutor: execute block
        alt Success
            BlockExecutor->>workflowMetrics: "recordBlockExecuted({blockType, operation?, success:true, durationMs})"
        else Error
            BlockExecutor->>workflowMetrics: "recordBlockExecuted({blockType, operation?, success:false, durationMs})"
        end
        workflowMetrics->>Buffer: enqueue(Sim/Workflow, BlockExecuted + BlockDuration)
    end

    alt Normal completion
        LoggingSession->>workflowMetrics: "recordExecutionCompleted({trigger, status, durationMs})"
    else Pause
        LoggingSession->>workflowMetrics: "recordExecutionPaused({trigger})"
    else markAsFailed
        LoggingSession->>workflowMetrics: "recordExecutionCompleted({trigger, status:failed}) [guarded by completionMetricEmitted]"
    else Crashed worker (cron)
        Note over LoggingSession,workflowMetrics: cleanup-stale-executions route
        LoggingSession->>workflowMetrics: "recordExecutionCompleted({trigger, status:failed})"
    end
    workflowMetrics->>Buffer: enqueue(Sim/Workflow, ExecutionCompleted + ExecutionDuration?)

    Note over Buffer,CloudWatch: Every 5s / threshold / SIGTERM
    Buffer->>CloudWatch: PutMetricData per namespace (Sim/Workflow, Sim/HostedKey)

_{Reviews (2): Last reviewed commit: "feat(metrics): emit workflow execution a..." | Re-trigger Greptile}

…etric on actual row transition

TheodoreSpeaks · 2026-06-09T23:26:58Z

@greptile review

…/fallback paths

…d omit unknown durations

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

^{Reviewed by Cursor Bugbot for commit 83bfcbd. Configure here.}

feat(metrics): emit workflow execution and per-block metrics to Cloud…

64a80b4

…Watch

cursor Bot reviewed Jun 9, 2026

View reviewed changes

Comment thread apps/sim/lib/logs/execution/logging-session.ts Outdated

Comment thread apps/sim/app/api/cron/cleanup-stale-executions/route.ts

Comment thread apps/sim/lib/logs/execution/logging-session.ts

greptile-apps Bot reviewed Jun 9, 2026

View reviewed changes

Comment thread apps/sim/lib/logs/execution/logging-session.ts Outdated

Comment thread apps/sim/lib/logs/execution/logging-session.ts

fix(metrics): skip zero-ms fallback duration and gate stale-cleanup m…

912349e

…etric on actual row transition

vercel Bot temporarily deployed to Preview June 9, 2026 23:26 Inactive

improvement(metrics): one-shot guard on ExecutionStarted across start…

8673e5c

…/fallback paths

vercel Bot temporarily deployed to Preview June 9, 2026 23:34 Inactive

cursor Bot reviewed Jun 9, 2026

View reviewed changes

Comment thread apps/sim/lib/logs/execution/logging-session.ts Outdated

Comment thread apps/sim/lib/logs/execution/logging-session.ts Outdated

fix(metrics): count static markExecutionAsFailed terminal failures an…

83bfcbd

…d omit unknown durations

vercel Bot temporarily deployed to Preview June 9, 2026 23:45 Inactive

cursor Bot reviewed Jun 9, 2026

View reviewed changes

Comment thread apps/sim/lib/logs/execution/logging-session.ts

Conversation

TheodoreSpeaks commented Jun 9, 2026

Summary

Type of Change

Testing

Checklist

Uh oh!

vercel Bot commented Jun 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

TheodoreSpeaks commented Jun 9, 2026

Uh oh!

cursor Bot commented Jun 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Summary

Uh oh!

Uh oh!

Uh oh!

Uh oh!

greptile-apps Bot commented Jun 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Flowchart

Uh oh!

Uh oh!

Uh oh!

greptile-apps Bot Jun 9, 2026

Choose a reason for hiding this comment

Uh oh!

TheodoreSpeaks Jun 9, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot commented Jun 9, 2026

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Sequence Diagram

Uh oh!

Uh oh!

Uh oh!

TheodoreSpeaks commented Jun 9, 2026

Uh oh!

Uh oh!

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

vercel Bot commented Jun 9, 2026 •

edited

Loading

cursor Bot commented Jun 9, 2026 •

edited

Loading

greptile-apps Bot commented Jun 9, 2026 •

edited

Loading