
Add CloudSQLNoOperationInProgressSensor for parallel admin ops #68151

Open: Vamsi-klu wants to merge 4 commits into apache:main from Vamsi-klu:feature/cloudsql-no-op-sensor-68040

Vamsi-klu (Contributor) commented Jun 7, 2026

What

Adds a new operation-agnostic, deferrable CloudSQLNoOperationInProgressSensor to the Google provider that waits until a Cloud SQL instance has no administrative operation in flight before downstream tasks submit a new one. Supporting changes:

  • A list_operations hook method on CloudSQLHook (calls sqladmin.operations.list, filtered to the instance via targetId).
  • A shared CLOUD_SQL_NON_TERMINAL_STATUSES frozenset (PENDING/RUNNING) used to decide whether an operation is still in progress.
  • A CloudSQLNoOperationInProgressTrigger for deferrable mode.
  • Registration of the new sensor module in provider.yaml and a how-to section in the Cloud SQL docs.
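To illustrate the hook method and the status set described above, here is a minimal sketch assuming `service` is a googleapiclient discovery client for the Cloud SQL Admin API (the kind of object a hook's `get_conn()` returns). It calls `operations().list(project=..., instance=...)`, keeps only operations whose `targetId` matches the instance, and treats only PENDING/RUNNING as in progress. The `nextPageToken` pagination loop and the helper name `has_operation_in_progress` are assumptions of this sketch, not the PR's actual code.

```python
from typing import Any, Optional

# Mirrors the PR's CLOUD_SQL_NON_TERMINAL_STATUSES: only PENDING and RUNNING
# count as "in progress"; DONE or any unexpected status does not.
CLOUD_SQL_NON_TERMINAL_STATUSES = frozenset({"PENDING", "RUNNING"})


def list_operations(service: Any, project_id: str, instance: str) -> list[dict]:
    """Return the instance's operations, following nextPageToken pagination (assumed)."""
    operations: list[dict] = []
    page_token: Optional[str] = None
    while True:
        params = {"project": project_id, "instance": instance}
        if page_token:
            params["pageToken"] = page_token
        response = service.operations().list(**params).execute()
        operations.extend(response.get("items", []))
        page_token = response.get("nextPageToken")
        if not page_token:
            break
    # Keep only operations that target this instance.
    return [op for op in operations if op.get("targetId") == instance]


def has_operation_in_progress(operations: list[dict]) -> bool:
    """True if any operation is still PENDING or RUNNING."""
    return any(op.get("status") in CLOUD_SQL_NON_TERMINAL_STATUSES for op in operations)
```

In this sketch the `targetId` filter is a second, client-side check on top of the `instance` query parameter.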

Why

Cloud SQL serializes administrative operations per instance, so two parallel CloudSQLImportInstanceOperator / CloudSQLExportInstanceOperator tasks against the same instance race, and the second fails immediately with HTTP 409 operationInProgress (#68040). The provider does not retry on this error and had no sensor to wait for the instance to become idle, so users had to write their own PythonOperator polling loop or rely on coarse workarounds (per-instance pools, blind retries).

Impact

  • Place the sensor upstream of the import/export operators (or between mutually exclusive admin operators) to serialize work against the same instance.
  • The sensor polls sqladmin.operations.list and succeeds once no operation is in a non-terminal (PENDING/RUNNING) state. Keying off an explicit status set (rather than status != DONE) avoids treating an UNKNOWN/unexpected status as in-progress and poking forever.
  • Runs in either a synchronous poke loop or deferrable mode (deferrable=True, or the operators.default_deferrable config default).
  • Fails fast with CloudSQLOperationError (a subclass of AirflowException) on HTTP 403/404 (instance missing or access denied) instead of poking until timeout; other HTTP errors are re-raised.
  • Operator submit semantics are unchanged. This is best-effort: it greatly reduces the chance of a 409 but cannot guarantee exclusivity, since a new operation (console-triggered, automated backup, etc.) could still start between the sensor passing and the operator submitting.
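The poke behaviour listed above can be modeled without Airflow or the Google client libraries. In this sketch, `fetch_operations`, `HttpErrorLike` (a stand-in for `googleapiclient.errors.HttpError`) and `AirflowExceptionStub` (a stand-in for `AirflowException`) are hypothetical. Only the decision logic is meant to mirror the description: 403/404 fail fast, other HTTP errors propagate, and the sensor succeeds once nothing is PENDING/RUNNING.

```python
from typing import Callable

# Same PENDING/RUNNING set the PR describes.
CLOUD_SQL_NON_TERMINAL_STATUSES = frozenset({"PENDING", "RUNNING"})


class HttpErrorLike(Exception):
    """Hypothetical stand-in for an HTTP error carrying a status code."""

    def __init__(self, status: int):
        super().__init__(f"HTTP {status}")
        self.status = status


class AirflowExceptionStub(Exception):
    """Hypothetical stand-in for airflow.exceptions.AirflowException."""


class CloudSQLOperationError(AirflowExceptionStub):
    """Dedicated error type, mirroring the PR's AirflowException subclass."""


def poke(fetch_operations: Callable[[], list[dict]], instance: str) -> bool:
    try:
        operations = fetch_operations()
    except HttpErrorLike as err:
        if err.status in (403, 404):
            # Missing instance or no permission: waiting longer will not help.
            raise CloudSQLOperationError(
                f"Cannot list operations for instance {instance!r}: HTTP {err.status}"
            ) from err
        raise  # Any other HTTP error propagates unchanged.
    # Done poking only when no operation is PENDING/RUNNING.
    return not any(op.get("status") in CLOUD_SQL_NON_TERMINAL_STATUSES for op in operations)
```

Because only an explicit PENDING/RUNNING set counts as busy, an operation reported with an unexpected status does not block the sensor indefinitely.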

Testing

New/updated unit tests (no live GCP calls; get_conn and the hook are mocked):

  • tests/unit/google/cloud/hooks/test_cloud_sql.py::test_list_operations — asserts operations.list is invoked for the right project/instance and that results are filtered to operations whose targetId matches the instance.
  • tests/unit/google/cloud/sensors/test_cloud_sql.py (new) — poke returns True when only terminal ops exist and False when a RUNNING/PENDING op exists (test_poke_returns_true_when_no_in_progress_operations, test_poke_returns_false_when_operation_in_progress); fast-fails on 403/404 and re-raises other HTTP errors (test_poke_fails_fast_on_403_404, test_poke_reraises_other_http_errors); execute defers with the trigger when deferrable and not idle, does not defer when already idle, and delegates to BaseSensorOperator.execute when non-deferrable (test_execute_defers_when_deferrable_and_not_idle, test_execute_does_not_defer_when_idle, test_execute_non_deferrable_delegates_to_super); execute_complete is a no-op on success and raises on failed/error events.
  • tests/unit/google/cloud/triggers/test_cloud_sql.py::TestCloudSQLNoOperationInProgressTrigger — serialization round-trip; run yields a success TriggerEvent when no/only-terminal ops are present, keeps sleeping while an op is in progress, fails fast on 403/404, and emits a failure event on a generic exception.
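The trigger behaviour these tests exercise (succeed when idle, keep sleeping while busy, fail fast on 403/404, emit a failure event on a generic exception) can be sketched as a plain asyncio loop. This is a simplified model under stated assumptions: a real Airflow trigger subclasses `BaseTrigger` and yields `TriggerEvent` objects from its async `run()`, whereas here `run` is an async generator yielding dicts, and `fetch_operations` and `HttpErrorLike` are hypothetical. The "failed"/"error" event statuses follow the `execute_complete` description above.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable

# Same PENDING/RUNNING set the PR describes.
CLOUD_SQL_NON_TERMINAL_STATUSES = frozenset({"PENDING", "RUNNING"})


class HttpErrorLike(Exception):
    """Hypothetical stand-in for an HTTP error carrying a status code."""

    def __init__(self, status: int):
        super().__init__(f"HTTP {status}")
        self.status = status


async def run(
    fetch_operations: Callable[[], Awaitable[list[dict]]],
    poll_interval: float,
) -> AsyncIterator[dict]:
    """Poll until the instance is idle, then yield exactly one event."""
    while True:
        try:
            operations = await fetch_operations()
        except Exception as err:
            # 403/404 means "failed" (no point retrying); anything else is "error".
            status = getattr(err, "status", None)
            kind = "failed" if status in (403, 404) else "error"
            yield {"status": kind, "message": str(err)}
            return
        if not any(op.get("status") in CLOUD_SQL_NON_TERMINAL_STATUSES for op in operations):
            yield {"status": "success", "message": "No operation in progress"}
            return
        await asyncio.sleep(poll_interval)  # Still busy: wait, then poll again.
```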

closes: #68040


Was generative AI tooling used to co-author this PR?
  • Yes — Claude Code (Opus 4.8)

Generated-by: Claude Code (Opus 4.8) following the guidelines

Vamsi-klu requested a review from shahar1 as a code owner June 7, 2026 00:34
boring-cyborg (bot) added the labels area:providers, kind:documentation, and provider:google ("Google (including GCP) related issues") on Jun 7, 2026
Ramachandra Nalam added 3 commits June 6, 2026 18:05
The sensor must be referenced by an example DAG to satisfy the Google
provider project-structure test (test_missing_examples). It gates the
import task on the instance having no admin operation in progress,
which is the documented usage pattern.
- Raise a dedicated CloudSQLOperationError (a subclass of AirflowException)
  instead of the base AirflowException, so the new sensor passes the
  check-no-new-airflow-exceptions static check.
- Add the new cloud_sql sensor to the generated get_provider_info.py.
- Restore the correct generated/provider_dependencies.json.sha256sum
  (the .json itself is unchanged from main).
The provider.yaml change triggers provider-dependencies regeneration,
which produces this checksum on top of current main; commit it with a
trailing newline so update-providers-build-files and end-of-file-fixer
both pass.
potiuk (Member) commented Jun 9, 2026

@Vamsi-klu A few things need addressing before review — see our Pull Request quality criteria.

  • Merge conflicts with main. See docs.

No rush.

Note: This comment was drafted by an AI-assisted triage tool and may contain mistakes. Once you have addressed the points above, an Apache Airflow maintainer — a real person — will take the next look at your PR. We use this two-stage triage process so that our maintainers' limited time is spent where it matters most: the conversation with you.



Development

Successfully merging this pull request may close these issues.

Cloud SQL — CloudSQLImportInstanceOperator / CloudSQLExportInstanceOperator 409 operationInProgress on parallel tasks against the same instance
