fix(pyamber): prevent indefinite worker shutdown hang #5326
Closed
Ma77Ball wants to merge 4 commits into
…or tuple command arrives
9168cdb to 83c15c5
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## main #5326 +/- ##
============================================
- Coverage 51.15% 49.13% -2.02%
+ Complexity 2413 2379 -34
============================================
Files 1054 1051 -3
Lines 40923 40205 -718
Branches 4381 4271 -110
============================================
- Hits 20933 19756 -1177
- Misses 18791 19292 +501
+ Partials 1199 1157 -42
What changes were proposed in this PR?
Hardens the shutdown path of the three `StoppableQueueBlockingRunnable` threads (`MainLoop`, `NetworkSender`, `PortStorageWriter`) so a stop request no longer depends solely on a single queue wakeup.
- Adds a `threading.Event` stop flag to `StoppableQueueBlockingRunnable`. `stop()` now sets the flag and enqueues the existing `RUNNABLE_STOP` marker.
- `interruptible_get` blocks on `get(timeout=STOP_POLL_INTERVAL)` and treats a `queue.Empty` timeout as a cue to re-check the flag and exit if it is set.
- Passes an optional `timeout` through `Getable.get`, `InternalQueue.get`, and `LinkedBlockingMultiQueue.get` (via `Condition.wait_for`). The default `timeout=None` preserves the existing blocking behavior exactly, so the tuple data path is untouched and only the stoppable threads opt into polling.
Why?
This is defensive hardening, not a fix for a currently reproducible hang. As the code stands today, `stop()` always sets the flag and enqueues the marker through correctly locked queues, so the wakeup is reliably delivered and no thread parks indefinitely. The value of the change is to decouple "stop was requested" (the flag, checked on a timer) from "the queue delivered the marker" (the notify), so that a future refactor (a new stop path that forgets the marker, or a change to the queue's notify logic) cannot silently reintroduce a shutdown hang.
Performance
- The tuple data path is unchanged (`timeout=None` everywhere except the stop threads).
- Idle stoppable threads wake every `STOP_POLL_INTERVAL` (1s) to re-check the flag instead of sleeping indefinitely, a no-op wakeup with negligible CPU cost. The interval can be raised if needed.
Any related issues, documentation, or discussions?
Closes: #5325
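For readers of the linked issue, the stop path described in this PR can be sketched roughly as follows. This is a minimal illustration, not the pyamber code: `SketchStoppableRunnable`, its `put` helper, the `received` list, and the use of the standard-library `queue.Queue` in place of the project's internal queue are all assumptions made for the example. Only the names `stop()`, `run()`, `receive()`, `interruptible_get`, `RUNNABLE_STOP`, and `STOP_POLL_INTERVAL` come from the description above.

```python
import queue
import threading

STOP_POLL_INTERVAL = 1.0  # seconds; the PR description states 1s
RUNNABLE_STOP = object()  # stand-in for the project's real stop marker


class SketchStoppableRunnable:
    """Hypothetical stand-in for StoppableQueueBlockingRunnable."""

    def __init__(self, poll_interval=STOP_POLL_INTERVAL):
        self._queue = queue.Queue()  # assumption: stdlib queue, not the real InternalQueue
        self._stop_requested = threading.Event()
        self._poll_interval = poll_interval
        self.received = []  # hypothetical: records what receive() saw

    def put(self, item):
        self._queue.put(item)

    def receive(self, item):
        self.received.append(item)

    def interruptible_get(self):
        """Return the next item, or RUNNABLE_STOP once the flag is seen set."""
        while True:
            try:
                return self._queue.get(timeout=self._poll_interval)
            except queue.Empty:
                # Timed out with nothing queued: re-check the stop flag.
                if self._stop_requested.is_set():
                    return RUNNABLE_STOP

    def run(self):
        while True:
            item = self.interruptible_get()
            if item is RUNNABLE_STOP:
                break
            self.receive(item)

    def stop(self):
        # Two independent signals: the flag (polled) and the marker (notified).
        self._stop_requested.set()
        self._queue.put(RUNNABLE_STOP)
```

In this sketch the marker gives a prompt wakeup in the normal case, while the flag bounds the worst-case shutdown delay by one poll interval if the marker is never delivered.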
How was this PR tested?
- `test_stoppable_queue_blocking_thread.py`: items reach `receive()` and `stop()` ends `run()`; and setting only the stop flag (no marker) still terminates `run()`, exercising the timeout recheck path. This last case injects a state the production code does not currently produce, verifying the safety-net mechanism directly.
- `test_linked_blocking_multi_queue.py`: timeout cases for `get` (raises `Empty` on expiry, returns an available item, returns an item arriving mid-wait); the file has 17 passing tests.
Was this PR authored or co-authored using generative AI tooling?
Co-authored with Claude Opus 4.7 in compliance with ASF
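
As a supplementary illustration of the timeout behavior exercised by the queue tests, a `get(timeout=...)` built on `Condition.wait_for` might look like the sketch below. `SketchBlockingQueue` is a hypothetical single-queue stand-in written for this example; it is not the project's `LinkedBlockingMultiQueue`, whose multi-queue logic is not reproduced here.

```python
import queue
import threading
from collections import deque


class SketchBlockingQueue:
    """Hypothetical stand-in; shows only the timeout handling in get()."""

    def __init__(self):
        self._items = deque()
        self._not_empty = threading.Condition()

    def put(self, item):
        with self._not_empty:
            self._items.append(item)
            self._not_empty.notify()

    def get(self, timeout=None):
        with self._not_empty:
            # wait_for returns the predicate's final value: False means the
            # timeout expired while the queue was still empty.
            # timeout=None blocks until an item arrives.
            has_item = self._not_empty.wait_for(
                lambda: len(self._items) > 0, timeout=timeout
            )
            if not has_item:
                raise queue.Empty
            return self._items.popleft()
```

With `timeout=None` the call never raises `queue.Empty`, which matches the description's point that the default preserves plain blocking behavior.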