server: add admission control for tenant startup wait queue #171225

Open
shankeleven wants to merge 2 commits into cockroachdb:master from shankeleven:tenant_system

@shankeleven

fixes #154857

When a SQL connection arrives for a default virtual cluster whose tenant server is still initializing, the server controller may wait for the tenant server to become available before accepting the connection.

Previously, an unbounded number of connections could wait concurrently, allowing a bootstrapping tenant to consume excessive resources through open TCP connections.

Add admission control to limit the number of concurrent waiters and reject excess connections immediately. Introduce metrics for wait queue observability, rate-limit timeout/rejection logs, and improve client errors to encourage retry. Also clarify documentation around DataStateReady and tenant runtime readiness.

Release note (ops change): Added cluster setting `server.controller.mux_virtual_cluster_wait.max_concurrent` (default 10) to limit the number of SQL connections that may wait concurrently for a default virtual cluster to become available. Connections above the limit are rejected immediately. Added metrics under `server.controller.mux_virtual_cluster_wait.*` for wait queue observability.

Relates to: cockroachdb#154857

@shankeleven requested review from a team as code owners on May 29, 2026 22:21
@shankeleven requested review from spilchen and removed the request for a team on May 29, 2026 22:21
trunk-io Bot commented May 29, 2026

Contributor

Merging to master in this repository is managed by Trunk.

  • To merge this pull request, check the box to the left or comment /trunk merge below.

After your PR is submitted to the merge queue, this comment will be automatically updated with its status. If the PR fails, failure details will also be posted here.

blathers-crl Bot commented May 29, 2026

Thank you for contributing to CockroachDB. Please ensure you have followed the guidelines for creating a PR.

My owl senses detect your PR is good for review. Please keep an eye out for any test failures in CI.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

blathers-crl Bot added the O-community (Originated from the community) label on May 29, 2026
@cockroach-teamcity

Member

This change is Reviewable

blathers-crl Bot commented May 29, 2026

Thank you for updating your pull request.

Before a member of our team reviews your PR, I have some potential action items for you:

  • We notice you have more than one commit in your PR. We try to break logical changes into separate commits, but commits such as "fix typo" or "address review comments" should be squashed into one commit and pushed with --force.
  • Please ensure your git commit message contains a release note.
  • When CI has completed, please ensure no errors have appeared.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

Labels

O-community Originated from the community

Development

Successfully merging this pull request may close these issues.

server: system tenant should be able to query which app tenants can serve sql

2 participants