
Emit scheduler.executor_events_duration per executor #68152

Open
dkranchii wants to merge 2 commits into apache:main from dkranchii:metrics-process-executor-events-timer

@dkranchii (Contributor)

Summary

Wrap _process_executor_events() in a per-executor stats.timer named scheduler.executor_events_duration, tagged with the executor's class name. Multi-executor deployments can then attribute per-loop event-processing cost to each configured executor, instead of seeing it only as part of the aggregate scheduler.scheduler_loop_duration.
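A minimal sketch of the per-executor timing pattern described above. `RecordingStats`, `process_executor_events`, and the two executor classes are hypothetical stand-ins defined only for illustration. This is not Airflow's `stats` implementation, just an in-memory recorder that keys each measurement by metric name and executor tag.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class RecordingStats:
    """Hypothetical metrics client that records timer durations in memory."""

    def __init__(self):
        # Maps (metric name, executor tag) -> list of durations in seconds.
        self.timings = defaultdict(list)

    @contextmanager
    def timer(self, name, tags=None):
        start = time.perf_counter()
        try:
            yield
        finally:
            executor = (tags or {}).get("executor")
            self.timings[(name, executor)].append(time.perf_counter() - start)


# Hypothetical executor classes: only their class names are used as tags.
class LocalExecutor:
    pass


class CeleryExecutor:
    pass


def process_executor_events(executor):
    """Hypothetical placeholder for one executor's event-processing work."""
    time.sleep(0.001)


stats = RecordingStats()
executors = [LocalExecutor(), CeleryExecutor()]

# One scheduler-loop iteration: time event processing separately per executor,
# tagging each measurement with the executor's class name.
for executor in executors:
    with stats.timer(
        "scheduler.executor_events_duration",
        tags={"executor": type(executor).__name__},
    ):
        process_executor_events(executor)

for (name, executor_tag), durations in sorted(stats.timings.items()):
    print(name, executor_tag, len(durations))
```

Because the tag in this sketch is `type(executor).__name__`, two configured instances of the same executor class would report under the same tag value.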

This mirrors the precedent set by #66808, which added scheduler.executor_heartbeat_duration for executor.heartbeat(). With the two timers side by side, operators can pinpoint which stage of the scheduler loop a given executor is slowing down.
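To illustrate how two per-executor timers help localize a slow stage, here is a hedged sketch with hypothetical executors (`FastExecutor`, `SlowEventsExecutor`), a hypothetical `process_events` method, and a hypothetical in-memory `timer` helper. The order of the two stages below is illustrative only and is not meant to reflect where each call sits in Airflow's real scheduler loop.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Hypothetical in-memory store: (metric name, executor tag) -> total seconds.
totals = defaultdict(float)


@contextmanager
def timer(name, executor_name):
    start = time.perf_counter()
    try:
        yield
    finally:
        totals[(name, executor_name)] += time.perf_counter() - start


class FastExecutor:
    """Hypothetical executor whose heartbeat and event processing are quick."""

    def heartbeat(self):
        pass

    def process_events(self):
        pass


class SlowEventsExecutor(FastExecutor):
    """Hypothetical executor whose event processing is slow."""

    def process_events(self):
        time.sleep(0.02)


for executor in [FastExecutor(), SlowEventsExecutor()]:
    tag = type(executor).__name__
    # Each stage is timed per executor under its own metric name.
    with timer("scheduler.executor_heartbeat_duration", tag):
        executor.heartbeat()
    with timer("scheduler.executor_events_duration", tag):
        executor.process_events()

# The (stage, executor) pair with the largest total time points at the culprit.
slowest_stage, slowest_executor = max(totals, key=totals.get)
print(slowest_stage, slowest_executor)
```

With only the aggregate loop timer, the 20 ms spent in `SlowEventsExecutor.process_events` would be indistinguishable from time spent anywhere else in the loop.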

Why

Today, when the scheduler loop runs long in a multi-executor deployment, operators can see the aggregate via scheduler.scheduler_loop_duration but cannot tell which executor's event processing is to blame. A per-executor timer around _process_executor_events provides the same granular signal that the heartbeat timer added for executor.heartbeat(). The change is additive, the runtime code change is confined to a single file (scheduler_job_runner.py), and the timer adds effectively no cost when metrics are disabled.

Changes

  • airflow-core/src/airflow/jobs/scheduler_job_runner.py — wrap the per-executor _process_executor_events() call in stats.timer("scheduler.executor_events_duration", tags={"executor": type(executor).__name__}).
  • shared/observability/src/airflow_shared/observability/metrics/metrics_template.yaml — declare the new timer in the metrics registry.
  • airflow-core/tests/unit/jobs/test_scheduler_job.py — add test_process_executor_events_emits_timer, mirroring the existing test_executor_heartbeat_emits_timer structure.

Test plan

  • New unit test test_process_executor_events_emits_timer asserts the timer is emitted once per executor with the expected tag.
  • Existing test_executor_heartbeat_emits_timer still passes (sibling test, same loop).
  • ruff format / ruff check clean.
  • prek run --from-ref main --stage pre-commit and --stage manual clean (excluding host-only mypy/breeze hooks; CI will run them).
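A hedged sketch of the shape such a test could take, using `unittest.mock.MagicMock` as the metrics client. `run_event_processing` and the two executor classes are hypothetical stand-ins for the scheduler-loop step under test; the actual test_process_executor_events_emits_timer in test_scheduler_job.py may be structured differently.

```python
from unittest import mock


class LocalExecutor:
    """Hypothetical executor class; only its name is used as a tag."""


class KubernetesExecutor:
    """Hypothetical executor class; only its name is used as a tag."""


def run_event_processing(stats, executors, process_events):
    """Hypothetical stand-in for the scheduler-loop step under test."""
    for executor in executors:
        with stats.timer(
            "scheduler.executor_events_duration",
            tags={"executor": type(executor).__name__},
        ):
            process_events(executor)


def test_process_executor_events_emits_timer():
    # MagicMock supports the context-manager protocol, so `with stats.timer(...)` works.
    stats = mock.MagicMock()
    process_events = mock.MagicMock()
    executors = [LocalExecutor(), KubernetesExecutor()]

    run_event_processing(stats, executors, process_events)

    # Exactly one timer per executor, tagged with that executor's class name.
    assert stats.timer.call_args_list == [
        mock.call("scheduler.executor_events_duration", tags={"executor": "LocalExecutor"}),
        mock.call("scheduler.executor_events_duration", tags={"executor": "KubernetesExecutor"}),
    ]
    # Each executor's events were processed exactly once.
    assert process_events.call_count == len(executors)


test_process_executor_events_emits_timer()
print("ok")
```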

Notes

  • Additive change: no behavioral impact when metrics are disabled.
  • No newsfragment, per CLAUDE.md: new optional metrics are not considered major or breaking changes.
  • No config or schema changes.
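A sketch of why the wrapper is cheap and behavior-preserving when metrics are disabled, assuming the disabled metrics client hands back a no-op context manager. `NullStats` and `process_with_timer` are hypothetical; Airflow's own disabled-metrics logger may be implemented differently.

```python
from contextlib import nullcontext


class NullStats:
    """Hypothetical disabled-metrics client: every timer is a no-op."""

    def timer(self, name, tags=None):
        # nullcontext() does nothing on enter or exit, so the wrapper adds
        # only the overhead of a method call and a with-statement.
        return nullcontext()


def process_with_timer(stats, executor_name, work):
    """Hypothetical wrapper: runs `work` inside the per-executor timer."""
    with stats.timer(
        "scheduler.executor_events_duration",
        tags={"executor": executor_name},
    ):
        return work()


result = process_with_timer(NullStats(), "LocalExecutor", lambda: 42)
print(result)  # 42
```

The wrapped work still runs and its return value passes through unchanged, which is the sense in which the timer is purely additive.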

Was generative AI tooling used to co-author this PR?
  • Yes (Cursor)

@boring-cyborg (bot) added the area:Scheduler label (including HA (high availability) scheduler) on Jun 7, 2026
@potiuk added the ready for maintainer review label (set after triaging when all criteria pass) on Jun 8, 2026
@dkranchii (Contributor, Author)

@ashb, could you please review this PR? Thanks.

@ashb (Member) left a comment

Code is fine, but please update the commit message or self to follow our PR guidelines

Labels

  • area:Scheduler (including HA (high availability) scheduler)
  • ready for maintainer review (set after triaging when all criteria pass)

3 participants