Harden StoppableQueueBlockingRunnable shutdown to not depend solely on the RUNNABLE_STOP wakeup #5325

Description

@Ma77Ball

The three StoppableQueueBlockingRunnable threads (MainLoop, NetworkSender, PortStorageWriter) block in interruptible_get on a queue.get() with no timeout. By design this get() blocks indefinitely, so the engine advances only when a real message or the RUNNABLE_STOP marker arrives; the Scala side mirrors this. That design is intentional and should be preserved: the loop must not return to receive() on a quiet queue.

This is a defensive-hardening request, not a report of a reproducible failure. As the code stands today, stop() reliably enqueues and delivers the RUNNABLE_STOP marker through correctly-locked queues, so no shutdown hang has been observed or reproduced in CI. The goal is to decouple "stop was requested" from "the marker wakeup was delivered," so that a future change (a new stop path that forgets the marker, or a change to the queue's notify logic) cannot silently reintroduce a shutdown hang.

Proposed approach (data path and blocking semantics preserved):

  • Add a threading.Event stop flag; stop() sets it in addition to enqueueing the marker.
  • interruptible_get polls with a short timeout and treats queue.Empty as "loop and wait again" (continue), so it never returns control to receive() / the handling loop on a timeout. It only returns on a real item, or raises InterruptRunnable when the flag is set. This keeps the indefinite-blocking semantics for the data path intact while ensuring a stop request is honored within one poll interval even if the single marker wakeup were ever missed.
  • Thread an optional timeout through Getable.get -> InternalQueue.get -> LinkedBlockingMultiQueue.get (the last via Condition.wait_for). Default timeout=None keeps the existing blocking behavior unchanged; only the stoppable threads opt into polling.
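
A minimal sketch of the proposal above, not the project's actual code. SketchQueue is a hypothetical stand-in for LinkedBlockingMultiQueue, poll_interval and its 0.1 s default are illustrative, and the RUNNABLE_STOP and InterruptRunnable definitions here are placeholders for the real ones. Raising InterruptRunnable on the marker is also an assumption about the existing behavior.

```python
import queue
import threading
from collections import deque

RUNNABLE_STOP = object()  # placeholder for the real stop marker


class InterruptRunnable(Exception):
    """Placeholder for the exception that unwinds a stoppable thread."""


class SketchQueue:
    """Hypothetical condition-based queue with an optional get() timeout."""

    def __init__(self):
        self._items = deque()
        self._cond = threading.Condition()

    def put(self, item):
        with self._cond:
            self._items.append(item)
            self._cond.notify()

    def get(self, timeout=None):
        # timeout=None keeps the original indefinitely-blocking behavior.
        with self._cond:
            # wait_for returns the predicate's last value; False means
            # the timeout elapsed with the queue still empty.
            if not self._cond.wait_for(lambda: len(self._items) > 0, timeout=timeout):
                raise queue.Empty
            return self._items.popleft()


def interruptible_get(q, stop_event, poll_interval=0.1):
    """Return the next real item, or raise InterruptRunnable on stop."""
    while True:
        if stop_event.is_set():
            raise InterruptRunnable
        try:
            item = q.get(timeout=poll_interval)
        except queue.Empty:
            continue  # quiet queue: wait again, never return to the caller
        if item is RUNNABLE_STOP:
            raise InterruptRunnable
        return item
```

In this sketch the flag is checked before each poll, so a stop request wins over items still sitting in the queue. Whether pending items should be drained first is a design choice the proposal leaves open.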

Open question: whether this hardening is worth the added surface area (a timeout on a queue that is intentionally infinite-blocking) given no failure has been reproduced. See discussion in #5326.
