
[Feature] Fixed batched_futures() #1006

Closed
jan-janssen wants to merge 9 commits into main from ilgar

jan-janssen (Member) commented Jun 9, 2026

Repo: executorlib · Base: upstream main @ f926c40 · Files: standalone/batched.py, task_scheduler/interactive/dependency.py, tests/unit/standalone/test_batched.py, tests/unit/executor/test_single_dependencies.py · Size: +55 −12 (production) + new tests

Verified: the patch applies cleanly with git apply, and both production files pass python -m py_compile.

Context. A dependency-scheduler workload submits ~100k interdependent futures and uses a batched collector, exe.submit(batched, lst=<all upstream futures>, skip_lst=[...], n=<batch size>), to stream completed results into the next stage in completion order (downstream must start on the first n done, not wait for all of lst). At that scale, three independent issues surfaced in this path: two are scaling bugs and one is a fault-tolerance bug. They live in two files, and batched_futures is touched by two of them, so the reasoning below is given per issue and the code per function, with each change attributed to its issue.

Issues at a glance

| # | Function | The bug | The fix |
|---|---|---|---|
| 1 | `batched_futures` | de-dups already-returned results by value (`x not in list`) → O(N²); on rich objects (ASE Atoms, whose `__eq__` costs ~100× a dict compare) a scan took >6.7 min at N=5000. Value equality is also wrong: two distinct results can be `==`. | de-dup by `id()` in a set (O(1), identity-correct) |
| 2 | `_execute_tasks_with_dependencies` | the collector is ingested like an ordinary task, so `get_future_objects_from_input` treats all 100k `lst` futures as inputs and scans them: O(N) per collector, on every wait-list pass → the scheduler stalls and downstream starves | special-case `fn=="batched"`: register only the small `skip_lst`; never ingest `lst` |
| 3 | `batched_futures` + `_update_waiting_task` | `.result()` on an exception-completed future re-raises; the scheduler then calls `set_exception()` on the batch future, which cascades to every downstream dependent. One failed input → all stages "finish" with zero output. | exclude failed futures (logging each once) and emit a partial final batch; a scoped try/except confines any real exception to the one batch future |

The three are independent and can be reviewed/merged separately.

Why each fix is correct

Issue 1: id() dedup. skipped_ids holds the identity of every result already emitted in a prior batch, so id(result) not in skipped_ids answers exactly the intended question: "have I already returned this object?" Provided every result is a distinct object, len(skipped_ids) == len(old list) and n_expected is unchanged. Behavior is therefore identical, while each check becomes an O(1) identity lookup instead of an O(N) value comparison (with no reliance on a possibly expensive or ambiguous __eq__). At N=5000 the scan time dropped from >6.7 min to ~0.86 s.

Issue 2: don't ingest lst. A batched collector's readiness never depended on lst. It is driven by skip_lst (the few prior batch futures) plus the batched_futures scan, which only runs after skip_lst is done and still returns the first n done results, so completion order is preserved. Walking 100k futures on ingestion (and on every pass) was pure waste; registering only skip_lst removes the O(N)-per-collector stall with no semantic change.
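A minimal sketch of the registration rule, assuming a task dictionary with fn, args, and kwargs keys and fn equal to the string "batched" for collectors (both are assumptions for illustration). The hypothetical dependencies_to_track() is not executorlib's get_future_objects_from_input; it only shows that the ordinary path scales with the argument size while the collector path touches skip_lst alone.

```python
from concurrent.futures import Future


def dependencies_to_track(task: dict) -> list:
    """Futures a scheduler would wait on before evaluating task (hypothetical)."""
    if task["fn"] == "batched":
        # Collector: readiness depends only on the few prior batch futures in
        # skip_lst; the (possibly 100k-long) lst is never walked here.
        return list(task["kwargs"].get("skip_lst", []))
    # Ordinary task: every Future among the arguments (one level into lists and
    # tuples) is an input, so the walk costs O(size of the arguments).
    found = []
    for value in list(task["args"]) + list(task["kwargs"].values()):
        items = value if isinstance(value, (list, tuple)) else [value]
        found.extend(item for item in items if isinstance(item, Future))
    return found


upstream = [Future() for _ in range(10_000)]  # ~100k in the workload above
previous_batch = Future()
collector = {
    "fn": "batched",
    "args": (),
    "kwargs": {"lst": upstream, "skip_lst": [previous_batch], "n": 100},
}
ordinary = {"fn": "postprocess", "args": (upstream[:3],), "kwargs": {}}
print(len(dependencies_to_track(collector)), len(dependencies_to_track(ordinary)))
```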

Issue 3: drop failed futures instead of re-raising. Future.result() re-raises a failed future's exception, so calling it inside the collector lets a single failed input abort the whole batch, which the scheduler turns into a failure of the batch future and of all its dependents. Checking v.exception() first (on a future that is already done, so the call does not block) and skipping that future confines the failure to one item; the batch is still built from the successful results. The all_resolved / partial-batch return covers the tail case (every input resolved, but n unreachable because some failed), so the collector cannot wait forever for a batch that will never fill. The try/except in the scheduler is a backstop: a genuine exception (e.g. from a skip_lst future) fails only that batch future instead of crashing the scheduler thread.

Semantics note for review: Issue 3 makes the collector drop failed inputs (logging each once) instead of failing the batch, a deliberate policy choice. If upstream prefers fail-loud behavior (or wants it configurable), this is the place to decide it. Issues 1 and 2 are pure performance/correctness fixes with no behavior change.

Tests

New unit tests cover all new code paths:

  • test_batched_futures_with_failed_future: a failed future inside lst is excluded and the batch is built from the remaining successful results (no exception raised).
  • test_batched_futures_failed_future_logged_once: _logged_failed_ids dedup; the same failed future is recorded only once, even across repeated calls.
  • test_batched_futures_partial_batch_due_to_failures: when all futures are resolved but n is unreachable due to failures, a partial batch is emitted instead of blocking forever.
  • test_batched_with_failed_upstream_future (integration): end-to-end check via SingleNodeExecutor.batched() that a failed upstream future is silently dropped and all batch futures still resolve successfully, without cascading the exception downstream.
  • test_update_waiting_task_batched_exception: calls _update_waiting_task directly with a mock skip_lst future whose .result() raises, verifying that the except Exception handler in _update_waiting_task catches the error and sets it on the batch future instead of crashing the scheduler thread.
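The last test exercises the scheduler backstop with a mock future whose .exception() returns None but whose .result() raises. A minimal sketch of that pattern, assuming a hypothetical scheduler step resolve_batch_task() and a task dict with skip_lst and future keys; this is not executorlib's _update_waiting_task.

```python
from concurrent.futures import Future
from unittest.mock import MagicMock


def resolve_batch_task(task: dict, collect) -> bool:
    """Hypothetical scheduler step for one batched task.

    Returns True when the task's future has been completed (result or exception)
    and False when the task should stay on the waiting list.
    """
    try:
        batch = collect(task)
    except Exception as exc:
        # Scoped backstop: the error is stored on this batch future only, instead
        # of escaping the scheduler loop and terminating its thread.
        task["future"].set_exception(exc)
        return True
    if batch is None:
        return False  # not ready yet; re-check on a later pass
    task["future"].set_result(batch)
    return True


def collect_from_skip_lst(task: dict):
    # Stand-in collector that reads the prior batch futures in skip_lst.
    return [f.result() for f in task["skip_lst"]]


# Mock that passes an exception() check but raises from result(), as in the test.
prev_batch = MagicMock()
prev_batch.exception.return_value = None
prev_batch.result.side_effect = RuntimeError("skip_lst future failed")

task = {"skip_lst": [prev_batch], "future": Future()}
handled = resolve_batch_task(task, collect_from_skip_lst)
print(handled, repr(task["future"].exception()))
```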

Summary by CodeRabbit

Release Notes

  • Bug Fixes
    • Batched futures now exclude failed operations from produced batches and log them separately, preventing exception cascading.
    • The task scheduler now catches exceptions raised while resolving a batched task and sets them on that batch's future instead of crashing the scheduler thread.
    • Partial batches are now correctly emitted when full batches cannot be formed.

coderabbitai Bot (Contributor) commented Jun 9, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9d0ad792-b16f-41de-8f5c-46203aa8bb62

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Walkthrough

The PR modifies batching to exclude and safely log exception-completing futures, preventing exceptions from cascading downstream. It rewrites the core batching loop to track skipped identities, emit batches upon collecting eligible results, and integrates exception-safe handling into the task scheduler to avoid crashes.

Changes

Batched futures exception handling and scheduler integration

| Layer | File(s) | Summary |
|---|---|---|
| Exception-aware batching implementation | src/executorlib/standalone/batched.py | `batched_futures` adds module-level `_logged_failed_ids` to log failed futures once, updates the docstring to describe exception exclusion and partial-batch behavior, and replaces skip-list deduplication with identity-based precomputation; the batching loop excludes futures with exceptions (logging each once), collects successful results, returns immediately upon reaching n eligible results, and emits partial batches only when all input futures are resolved. |
| Scheduler integration for batched tasks | src/executorlib/task_scheduler/interactive/dependency.py | The scheduler adds a dedicated branch for `fn=="batched"` tasks that queues only `skip_lst` futures for readiness tracking (avoiding the full dependency scan), and wraps the `batched_futures` computation in try/except to propagate exceptions to the batch task's future without crashing the scheduler thread. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • pyiron/executorlib#756: Introduces the batching implementation directly refined by this PR, which updates exception filtering and skip-list handling in batched_futures and extends scheduler integration for batched tasks.

Suggested reviewers

  • samwaseda

Poem

🐰 A batch of futures, some may fail,
We'll log them once and tell their tale.
Skip the bad ones, keep the bright,
Scheduler sleeps—no crash tonight! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Title check | ❓ Inconclusive | The title 'Fixed batched_futures()' is vague and does not clearly convey what problem was fixed or what the main change accomplishes. | Replace with a more descriptive title that explains the specific fix, e.g., 'Exclude failed futures from batches in batched_futures()' or 'Handle exceptions in batched_futures() batching logic'. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


jan-janssen marked this pull request as draft on Jun 9, 2026, 19:19

codecov Bot commented Jun 9, 2026
Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 94.25%. Comparing base (4396b94) to head (98cb58c).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1006      +/-   ##
==========================================
+ Coverage   94.19%   94.25%   +0.05%     
==========================================
  Files          39       39              
  Lines        2103     2122      +19     
==========================================
+ Hits         1981     2000      +19     
  Misses        122      122              
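The percentages follow from Hits / Lines. A quick check, assuming Codecov's default display of two decimals rounded down; it also shows why the delta reads +0.05% rather than the 0.06 obtained by subtracting the displayed values.

```python
from fractions import Fraction
from math import floor


def round_down_2(value: Fraction) -> Fraction:
    """Truncate a percentage to two decimals (assumed Codecov 'round: down')."""
    return Fraction(floor(value * 100), 100)


base_exact = Fraction(1981 * 100, 2103)  # main: hits / lines, in percent
head_exact = Fraction(2000 * 100, 2122)  # PR head

base = round_down_2(base_exact)
head = round_down_2(head_exact)
delta = round_down_2(head_exact - base_exact)  # computed from the exact values
print(float(base), float(head), float(delta))
```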


coderabbitai Bot (Contributor) left a comment

🧹 Nitpick comments (1)
src/executorlib/standalone/batched.py (1)

39-43: ⚡ Quick win

Consider using a proper logger instead of print().

The current implementation uses print() for logging failed futures. Using Python's logging module would provide better control over log levels, formatting, and output destinations, making it easier to integrate with application-wide logging configuration.

📝 Suggested improvement
+import logging
 from concurrent.futures import Future

+logger = logging.getLogger(__name__)
+
 # Future objects we have already reported as failed -- so each failed job is logged once, not on
 # every scheduler pass (batched_futures is re-evaluated many times until a batch fills).
 _logged_failed_ids: set = set()

Then replace the print statement:

             if v.exception() is not None:
                 if id(v) not in _logged_failed_ids:
                     _logged_failed_ids.add(id(v))
-                    print(
-                        f"[batched_futures] EXCLUDING failed future from batch: "
-                        f"{type(v.exception()).__name__}: {v.exception()}",
-                        flush=True,
-                    )
+                    logger.warning(
+                        "EXCLUDING failed future from batch: %s: %s",
+                        type(v.exception()).__name__,
+                        v.exception(),
+                    )
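The AI prompt that follows suggests exc_info=True. When logging an exception retrieved from a future there is no exception currently being handled, so exc_info=True attaches nothing useful; passing the exception object itself does. A small sketch using only the standard logging module (the logger name and the in-memory handler are illustrative):

```python
import logging
from concurrent.futures import Future


class ListHandler(logging.Handler):
    """In-memory handler so the attached exc_info can be inspected."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


logger = logging.getLogger("batched_sketch")  # illustrative logger name
handler = ListHandler()
logger.addHandler(handler)
logger.propagate = False  # keep the demo output out of the root logger

failed = Future()
failed.set_exception(ValueError("boom"))
exc = failed.exception()  # the future is done, so this returns immediately

msg = "EXCLUDING failed future from batch: %s: %s"
# Outside an except block, exc_info=True falls back to sys.exc_info(),
# which is (None, None, None), so the future's exception is not attached.
logger.warning(msg, type(exc).__name__, exc, exc_info=True)
# Passing the exception instance attaches it (and its traceback, if any).
logger.warning(msg, type(exc).__name__, exc, exc_info=exc)
```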
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/executorlib/standalone/batched.py` around lines 39 - 43, Replace the
ad-hoc print in the batched_futures code with the Python logging module:
create/get a module logger (e.g., logger = logging.getLogger(__name__)) and use
an appropriate level (logger.warning or logger.error) to log the message about
excluding a failed future, include the exception type and message in the log
string and pass exc_info=True or the exception object for richer context; update
any imports to include logging.
ℹ️ Review info
📥 Commits

Reviewing files that changed from the base of the PR and between 4396b94 and 6976d48.

📒 Files selected for processing (2)
  • src/executorlib/standalone/batched.py
  • src/executorlib/task_scheduler/interactive/dependency.py

jan-janssen (Member, Author)

@copilot Extend the unit tests to cover the newly added lines.

Copilot AI (Contributor) commented Jun 10, 2026

@copilot Extend the unit tests to cover the newly added lines.

Added tests in commit 0a12ed7:

tests/unit/standalone/test_batched.py — 3 new tests:

  • test_batched_futures_with_failed_future: a failed future in lst is excluded and the batch is built from remaining successful results
  • test_batched_futures_failed_future_logged_once: _logged_failed_ids ensures the same failed future is only logged once across multiple calls
  • test_batched_futures_partial_batch_due_to_failures: when all futures are resolved but n is unreachable due to failures, a partial batch is emitted instead of blocking forever

tests/unit/executor/test_single_dependencies.py — 1 new integration test:

  • test_batched_with_failed_upstream_future: end-to-end check that a failed upstream future is silently dropped and the batch futures still resolve successfully (no exception cascades downstream)

jan-janssen (Member, Author)

@copilot There are still three newly added lines in dependency.py that are not covered by unit tests. Please add unit tests for those new lines. In addition, do not commit the _version.py file; I want it to remain exactly the way it is.

Copilot AI (Contributor) commented Jun 10, 2026

@copilot There are still three newly added lines in dependency.py which are not covered by unit tests. Please add unit tests for those new lines? In addition, do not commit the _version.py file I ...

Added in commit b76bd7a:

tests/unit/executor/test_single_dependencies.py — new test test_update_waiting_task_batched_exception:

  • Calls _update_waiting_task directly with a batched task dict whose skip_lst contains a mock future that passes the get_exception_lst check (.exception() returns None) but raises on .result(), triggering the except Exception as exc / set_exception / continue block
  • Verifies the batch future receives the exception (not a scheduler crash) and the task is not re-queued

Regarding _version.py: it is not modified by any of my commits — git diff cad969f..HEAD --name-only only shows the two test files.

jan-janssen changed the title from "Fixed batched_futures()" to "[Feature] Fixed batched_futures()" on Jun 10, 2026

jan-janssen (Member, Author)

I split the changes into three separate pull requests:

The concepts are the same; only the places where the individual features are implemented differ slightly.


Labels: None yet
Projects: None yet
3 participants