Skip to content

feat(node): opt-in --node-disable-conn-pool for local-first proxy failback#10

Open
DeviaVir wants to merge 3 commits into
masterfrom
feat/node-conn-pool-toggle
Open

feat(node): opt-in --node-disable-conn-pool for local-first proxy failback#10
DeviaVir wants to merge 3 commits into
masterfrom
feat/node-conn-pool-toggle

Conversation

@DeviaVir

@DeviaVir DeviaVir commented Jun 19, 2026

Copy link
Copy Markdown

What

Add an opt-in --node-disable-conn-pool flag (env NODE_DISABLE_CONN_POOL, default false). When set, the node reqwest::Client is built with pool_max_idle_per_host(0), so every request to the node opens a fresh connection instead of reusing a pooled keep-alive one.

Why

When waterfalls reaches bitcoind/elements through a locality-aware L4 proxy a pooled keep-alive connection is a problem. nginx stream picks the upstream once, at TCP connect, and never migrates a live connection. waterfalls polls the node REST tip ~1×/s, so the keep-alive connection never idles out and stays pinned to whichever backend it first hit. After a transient local-node outage (electrs/waterfalls fail over to the remote region), the warm connection stays stuck on the remote daemon forever until the pod restarts.

Disabling the pool makes every request re-establish and re-evaluate local-first (sub-ms against a local node). This is the waterfalls analog of electrs' --daemon-rpc-conn-max-age.

Why opt-in (not the default)

Keeping the pool is the right default for most deployments:

  • Remote/esplora backend (use_esplora, or a WAN node): no pooling means a fresh TCP + TLS handshake per request; a real regression.
  • Burst/catch-up: sequential block fetches benefit from connection reuse; no pool adds handshake + TIME_WAIT churn.
  • Single-region / non-proxied: nothing to get pinned to, so recycling buys nothing.

So the flag defaults to pooled.

Note: reqwest has no native per-connection max-age (only idle-based pool_idle_timeout, ineffective here because ~1/s polling keeps connections warm), so disabling the pool is the available mechanism.

Changes

  • Arguments::node_disable_conn_pool: bool (#[arg(env, long)], documented) + Debug field.
  • fetch.rs: conditionally apply .pool_max_idle_per_host(0).

Testing

  • cargo check and cargo check --tests pass. Default-off means existing behavior is unchanged unless the flag/env is set; test_env.rs builds via ..Default::default().

@DeviaVir DeviaVir self-assigned this Jun 19, 2026
Add --node-disable-conn-pool (default false: keep pooling). When set,
the reqwest client uses pool_max_idle_per_host(0) so every request to the
node opens a fresh connection.

Needed when the node is reached through a locality-aware L4 proxy (nginx
local-primary/remote-backup bitcoin-proxy in front of a GKE MCS
ClusterSetIP): a pooled keep-alive connection gets pinned by the proxy to
whichever backend it first hit, so after a transient local-node outage it
stays stuck on the remote region. Disabling the pool makes each request
re-evaluate local-first (sub-ms against a local node) — the waterfalls
analog of electrs --daemon-rpc-conn-max-age.

Default stays pooled so remote/esplora backends and single-region deploys
keep keep-alive (no TCP+TLS handshake per request).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@DeviaVir DeviaVir force-pushed the feat/node-conn-pool-toggle branch from 075afb1 to e6bbff4 Compare June 19, 2026 05:48
Adds three unit tests in fetch.rs (no node/network needed, run under
`cargo test --lib`):

- test_node_disable_conn_pool_flag: locks the CLI contract — defaults to
  pooled, --node-disable-conn-pool flips it on.
- test_conn_pool_enabled_reuses_connection: against a loopback HTTP/1.1
  server that counts accepted TCP connections, three sequential requests
  ride a single reused keep-alive connection (default behavior).
- test_conn_pool_disabled_opens_fresh_connection_each_request: with the
  flag set, each request opens its own connection — proving the proxy
  re-evaluates the upstream every request instead of staying pinned.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@DeviaVir DeviaVir marked this pull request as ready for review June 19, 2026 06:13
@DeviaVir DeviaVir requested review from RCasatta and Copilot June 19, 2026 06:13

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds an opt-in CLI/env flag to disable HTTP keep-alive pooling for node REST/RPC calls, to avoid connection “pinning” when waterfalls talks to a locality-aware L4 proxy in front of bitcoind/elementsd.

Changes:

  • Introduces --node-disable-conn-pool / NODE_DISABLE_CONN_POOL (default false) in Arguments.
  • Conditionally configures the reqwest::Client with pool_max_idle_per_host(0) when the flag is enabled.
  • Adds tests validating flag parsing and that connections are/aren’t reused depending on the setting.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
src/server/mod.rs Adds the new Arguments::node_disable_conn_pool flag and includes it in the custom Debug output.
src/fetch.rs Applies the conditional reqwest client configuration and adds tests for flag behavior and connection reuse.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/fetch.rs

@RCasatta RCasatta left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK

Comment thread src/fetch.rs Outdated
Comment thread src/fetch.rs
Address review feedback on PR #10:
- node_disable_conn_pool no longer disables keep-alive pooling for the
  Esplora backend (would force TCP+TLS per request, a regression). When
  the flag is set together with --use-esplora it is ignored and a loud
  warning is logged instead of being silently dropped.
- Document the node-only scope in the flag help text.
- Drop the clap flag-parse test; the behavior is covered by the
  connection-reuse tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants