[Feature] Fixed batched_futures() by jan-janssen · Pull Request #1006 · pyiron/executorlib

jan-janssen · 2026-06-09T19:16:12Z

Repo: executorlib · Base: upstream main @ f926c40 · Files: standalone/batched.py, task_scheduler/interactive/dependency.py, tests/unit/standalone/test_batched.py, tests/unit/executor/test_single_dependencies.py · Size: +55 −12 (production) + new tests

Verified: the patch applies cleanly with git apply, and both files pass python -m py_compile.

Context. A dependency-scheduler workload submits ~100k interdependent futures and uses a batched
collector — exe.submit(batched, lst=<all upstream futures>, skip_lst=[...], n=<batch size>) — to
stream completed results into the next stage in completion order (downstream must start on the first
n done, not wait for all of lst). At that scale, three independent issues surfaced in this path: two
are scaling bugs, one is a fault-tolerance bug. They live in two files, and batched_futures is touched
by two of them — so below, the reasoning is per-issue and the code is shown per-function, each change
attributed to its issue.

Issues at a glance

| # | Function | The bug | The fix |
| --- | --- | --- | --- |
| 1 | `batched_futures` | De-duplicates already-returned results by value (`x not in list`) → O(N²); on rich objects (ASE `Atoms`, whose `__eq__` costs ~100× a dict compare) a scan took >6.7 min at N=5000. Value equality is also wrong: two distinct results can be `==`. | De-duplicate by `id()` in a set (O(1), identity-correct) |
| 2 | `_execute_tasks_with_dependencies` | The collector is ingested like an ordinary task, so `get_future_objects_from_input` treats all 100k `lst` futures as inputs and scans them: O(N) per collector, on every wait-list pass → scheduler stalls, downstream starves | Special-case `fn=="batched"`: register only the small `skip_lst`; never ingest `lst` |
| 3 | `batched_futures` + `_update_waiting_task` | `.result()` on an exception-completed future re-raises; the scheduler then calls `set_exception` on the batch future → it cascades to every downstream dependent. One failed input → all stages "finish" with zero output. | Exclude failed futures (log once) + emit a partial final batch; a scoped `try/except` confines any real exception to the one batch future |

The three are independent and can be reviewed/merged separately.

Why each fix is correct

Issue 1 — id() dedup. skipped_ids holds the identity of every result already emitted in a prior
batch, so id(result) not in skipped_ids answers exactly the intended question — "have I already returned
this object?" Every result is a distinct object, so len(skipped_ids) == len(old list) and n_expected
is unchanged: identical behavior, O(1) identity instead of an O(N) value compare (and no reliance on a
possibly-expensive/ambiguous __eq__). Scan dropped from >6.7 min → ~0.86 s at N=5000.
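
To make the difference concrete, here is a minimal sketch of the two de-dup strategies. It is not the code from the patch: `Expensive`, `not_yet_returned_by_value`, and `not_yet_returned_by_identity` are hypothetical names invented for this illustration, and `Expensive` merely stands in for a rich result type with a costly `__eq__`.

```python
class Expensive:
    """Hypothetical stand-in for a rich result object with a costly __eq__."""

    eq_calls = 0  # counts how often value comparison is invoked

    def __init__(self, payload):
        self.payload = payload

    def __eq__(self, other):
        Expensive.eq_calls += 1
        return isinstance(other, Expensive) and self.payload == other.payload


def not_yet_returned_by_value(results, already_returned):
    # O(len(results) * len(already_returned)) value comparisons.
    return [r for r in results if r not in already_returned]


def not_yet_returned_by_identity(results, already_returned):
    # One pass to build the id set, then O(1) membership checks.
    # ids are only unique among live objects; `already_returned` keeps them alive here.
    skipped_ids = {id(r) for r in already_returned}
    return [r for r in results if id(r) not in skipped_ids]


a, b, c = Expensive(1), Expensive(1), Expensive(2)  # a == b, but a is not b
already = [a]

Expensive.eq_calls = 0
by_value = not_yet_returned_by_value([a, b, c], already)
value_eq_calls = Expensive.eq_calls

Expensive.eq_calls = 0
by_identity = not_yet_returned_by_identity([a, b, c], already)
identity_eq_calls = Expensive.eq_calls
```

In this sketch the value-based filter drops `b` because it compares equal to `a`, even though `b` was never returned, while the identity-based filter keeps it and never calls `__eq__`.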

Issue 2 — don't ingest lst. A batched collector's readiness never depended on lst: it is driven
by skip_lst (the few prior batch futures) plus the batched_futures scan, which runs once, when
skip_lst is done, and still returns the first-n-done — so completion order is preserved. Walking
100k futures on ingestion (and every pass) was pure waste; registering only skip_lst removes the
O(N)-per-collector stall with no semantic change.
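
The special case can be sketched as follows. This is an illustration under stated assumptions, not the scheduler code: the task-dict layout (`"fn"`, `"kwargs"`), `find_futures`, and `futures_to_wait_on` are hypothetical, and `find_futures` only mimics the idea of a generic input scan.

```python
from concurrent.futures import Future


def find_futures(obj):
    """Hypothetical generic scan: collect every Future nested in the inputs."""
    if isinstance(obj, Future):
        return [obj]
    if isinstance(obj, (list, tuple)):
        return [f for item in obj for f in find_futures(item)]
    if isinstance(obj, dict):
        return [f for item in obj.values() for f in find_futures(item)]
    return []


def futures_to_wait_on(task):
    """Return the futures whose completion gates this task (sketch only)."""
    if task["fn"] == "batched":
        # Readiness of a collector is driven by the prior batch futures only,
        # so the large `lst` is deliberately not walked here.
        return list(task["kwargs"]["skip_lst"])
    return find_futures(task["kwargs"])


upstream = [Future() for _ in range(1000)]
prior_batches = [Future(), Future()]

collector = {
    "fn": "batched",
    "kwargs": {"lst": upstream, "skip_lst": prior_batches, "n": 10},
}
ordinary = {"fn": "sum", "kwargs": {"lst": upstream}}

collector_deps = futures_to_wait_on(collector)
ordinary_deps = futures_to_wait_on(ordinary)
```

With these assumptions the collector registers two dependencies instead of a thousand, while an ordinary task is still scanned in full.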

Issue 3 — drop failed futures instead of re-raising. Future.result() re-raises a failed future's
exception, so calling it inside the collector lets a single failed input abort the whole batch, which
the scheduler converts into a failure of the batch future and all its dependents. Checking
v.exception() first and skipping that future confines the failure to one item; the batch is still built
from the successful results. The all_resolved / partial-batch return covers the tail case (input fully
resolved but n unreachable because some failed) so the collector can't wait forever for a batch that can
never fill. The try/except in the scheduler is a backstop: a genuine exception (e.g. from a skip_lst
future) fails only that batch future instead of crashing the scheduler thread.
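
A minimal sketch of the exclusion and partial-batch logic, using only `concurrent.futures.Future`. The function name `collect_batch`, its signature, the `None` return for "not ready yet", and the extra guard for cancelled futures are assumptions made for this illustration; the actual `batched_futures` may differ in all of these.

```python
from concurrent.futures import Future


def collect_batch(lst, already_returned, n):
    """Sketch: return the next batch of up to n results, or None if not ready."""
    skipped_ids = {id(r) for r in already_returned}
    batch = []
    all_resolved = True
    for fut in lst:
        if not fut.done():
            all_resolved = False
            continue
        # Check the failure state first so .result() is never called on a
        # failed future (it would re-raise). The cancelled() guard is an
        # addition of this sketch, since exception() raises on cancelled futures.
        if fut.cancelled() or fut.exception() is not None:
            continue
        result = fut.result()
        if id(result) not in skipped_ids:
            batch.append(result)
            if len(batch) == n:
                return batch
    # Tail case: every input is resolved but n is unreachable, so emit what is left.
    if all_resolved and batch:
        return batch
    return None


def done(value):
    f = Future()
    f.set_result(value)
    return f


def failed(exc):
    f = Future()
    f.set_exception(exc)
    return f


r1, r2, r3 = ["a"], ["b"], ["c"]  # distinct objects, so ids are distinct
pending = Future()
lst = [done(r1), failed(ValueError("boom")), done(r2), pending]

first = collect_batch(lst, [], n=2)       # full batch despite the failed input
waiting = collect_batch(lst, first, n=2)  # one input still pending
pending.set_result(r3)
tail = collect_batch(lst, first, n=2)     # all resolved: partial final batch
```

The failed future never reaches `.result()`, so it cannot abort the batch, and the final call returns a one-element batch instead of waiting for a second result that will never arrive.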

Semantics note for review: Issue 3 makes the collector silently drop + log failed inputs — a
deliberate policy choice. If upstream prefers fail-loud (or wants it configurable), this is the place to
decide it. Issues 1 and 2 are pure perf/correctness with no behavior change.

Tests

New unit tests cover all new code paths:

test_batched_futures_with_failed_future — a failed future inside lst is excluded and the batch is built from the remaining successful results (no exception raised).
test_batched_futures_failed_future_logged_once — _logged_failed_ids dedup: the same failed future is recorded only once, even across repeated calls.
test_batched_futures_partial_batch_due_to_failures — when all futures are resolved but n is unreachable due to failures, a partial batch is emitted instead of blocking forever.
test_batched_with_failed_upstream_future (integration) — end-to-end check via SingleNodeExecutor.batched() that a failed upstream future is silently dropped and all batch futures still resolve successfully without cascading the exception downstream.
test_update_waiting_task_batched_exception — directly calls _update_waiting_task with a mock skip_lst future whose .result() raises, verifying that the except Exception handler in _update_waiting_task catches the error and propagates it to the batch future (instead of crashing the scheduler thread).
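
The pattern exercised by the last test can be illustrated in isolation. The sketch below does not call the real `_update_waiting_task`; `update_batch_future` is a hypothetical stand-in that only reproduces the try/except backstop, and the mock mirrors the described setup (`.exception()` returns `None`, `.result()` raises).

```python
from concurrent.futures import Future
from unittest.mock import MagicMock


def update_batch_future(batch_future, skip_lst):
    """Hypothetical stand-in for the scheduler's batched branch (sketch only)."""
    try:
        prior_results = [f.result() for f in skip_lst]
        batch_future.set_result(prior_results)
    except Exception as exc:
        # Confine the failure to this one batch future instead of letting it
        # propagate out of the scheduler loop.
        batch_future.set_exception(exc)
        return False
    return True


# A future that looks healthy to an exception pre-check but raises on .result().
bad = MagicMock(spec=Future)
bad.exception.return_value = None
bad.result.side_effect = RuntimeError("late failure")

batch_future = Future()
ok = update_batch_future(batch_future, [bad])
```

After the call, the error is stored on `batch_future` and the caller continues normally, which is the behavior the test is described as verifying.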

Summary by CodeRabbit

Release Notes

Bug Fixes
- Batched futures now exclude failed operations from produced batches and log them separately, preventing exception cascading.
- Enhanced error handling in task scheduler for batched operations with improved exception propagation.
- Partial batches are now correctly emitted when full batches cannot be formed.

coderabbitai · 2026-06-09T19:16:25Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9d0ad792-b16f-41de-8f5c-46203aa8bb62

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


📝 Walkthrough

Walkthrough

The PR modifies batching to exclude and safely log exception-completing futures, preventing exceptions from cascading downstream. It rewrites the core batching loop to track skipped identities, emit batches upon collecting eligible results, and integrates exception-safe handling into the task scheduler to avoid crashes.

Changes

Batched futures exception handling and scheduler integration

| Layer / File(s) | Summary |
| --- | --- |
| Exception-aware batching implementation `src/executorlib/standalone/batched.py` | `batched_futures` adds module-level `_logged_failed_ids` to log failed futures once, updates docstring to describe exception exclusion and partial-batch behavior, and replaces skip-list deduplication with identity-based precomputation; the batching loop excludes futures with exceptions (logging each once), collects successful results, returns immediately upon reaching n eligible results, and emits partial batches only when all input futures are resolved. |
| Scheduler integration for batched tasks `src/executorlib/task_scheduler/interactive/dependency.py` | Scheduler adds a dedicated branch for `fn=="batched"` tasks that queues only `skip_lst` futures for readiness tracking (avoiding full dependency scan), and wraps `batched_futures` computation in try/except to propagate exceptions to the batch task's future without crashing the scheduler thread. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

pyiron/executorlib#756: Introduces the batching implementation directly refined by this PR, which updates exception filtering and skip-list handling in batched_futures and extends scheduler integration for batched tasks.

Suggested reviewers

samwaseda

Poem

🐰 A batch of futures, some may fail,
We'll log them once and tell their tale.
Skip the bad ones, keep the bright,
Scheduler sleeps—no crash tonight! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Title check | ❓ Inconclusive | The title 'Fixed batched_futures()' is vague and does not clearly convey what problem was fixed or what the main change accomplishes. | Replace with a more descriptive title that explains the specific fix, e.g., 'Exclude failed futures from batches in batched_futures()' or 'Handle exceptions in batched_futures() batching logic'. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


codecov · 2026-06-09T19:20:40Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 94.25%. Comparing base (4396b94) to head (98cb58c).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1006      +/-   ##
==========================================
+ Coverage   94.19%   94.25%   +0.05%     
==========================================
  Files          39       39              
  Lines        2103     2122      +19     
==========================================
+ Hits         1981     2000      +19     
  Misses        122      122


coderabbitai

🧹 Nitpick comments (1)

src/executorlib/standalone/batched.py (1)

39-43: ⚡ Quick win

Consider using a proper logger instead of print().

The current implementation uses print() for logging failed futures. Using Python's logging module would provide better control over log levels, formatting, and output destinations, making it easier to integrate with application-wide logging configuration.

📝 Suggested improvement

+import logging
 from concurrent.futures import Future

+logger = logging.getLogger(__name__)
+
 # Future objects we have already reported as failed -- so each failed job is logged once, not on
 # every scheduler pass (batched_futures is re-evaluated many times until a batch fills).
 _logged_failed_ids: set = set()

Then replace the print statement:

             if v.exception() is not None:
                 if id(v) not in _logged_failed_ids:
                     _logged_failed_ids.add(id(v))
-                    print(
-                        f"[batched_futures] EXCLUDING failed future from batch: "
-                        f"{type(v.exception()).__name__}: {v.exception()}",
-                        flush=True,
-                    )
+                    logger.warning(
+                        "EXCLUDING failed future from batch: %s: %s",
+                        type(v.exception()).__name__,
+                        v.exception(),
+                    )

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/executorlib/standalone/batched.py` around lines 39 - 43, Replace the
ad-hoc print in the batched_futures code with the Python logging module:
create/get a module logger (e.g., logger = logging.getLogger(__name__)) and use
an appropriate level (logger.warning or logger.error) to log the message about
excluding a failed future, include the exception type and message in the log
string and pass exc_info=True or the exception object for richer context; update
any imports to include logging.


ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c66e7c31-af06-47bc-9340-31aef8ad288e

📥 Commits

Reviewing files that changed from the base of the PR and between 4396b94 and 6976d48.

📒 Files selected for processing (2)

src/executorlib/standalone/batched.py
src/executorlib/task_scheduler/interactive/dependency.py

jan-janssen · 2026-06-10T08:05:50Z

@copilot Extend the unit tests to cover the newly added lines.

Copilot · 2026-06-10T08:11:54Z

@copilot Extend the unit tests to cover the newly added lines.

Added tests in commit 0a12ed7:

tests/unit/standalone/test_batched.py — 3 new tests:

test_batched_futures_with_failed_future: a failed future in lst is excluded and the batch is built from remaining successful results
test_batched_futures_failed_future_logged_once: _logged_failed_ids ensures the same failed future is only logged once across multiple calls
test_batched_futures_partial_batch_due_to_failures: when all futures are resolved but n is unreachable due to failures, a partial batch is emitted instead of blocking forever

tests/unit/executor/test_single_dependencies.py — 1 new integration test:

test_batched_with_failed_upstream_future: end-to-end check that a failed upstream future is silently dropped and the batch futures still resolve successfully (no exception cascades downstream)

jan-janssen · 2026-06-10T08:37:51Z

@copilot There are still three newly added lines in dependency.py which are not covered by unit tests. Please add unit tests for those new lines. In addition, do not commit the _version.py file; I want this to remain exactly the way it is.

Copilot · 2026-06-10T08:44:26Z

@copilot There are still three newly added lines in dependency.py which are not covered by unit tests. Please add unit tests for those new lines? In addition, do not commit the _version.py file I ...

Added in commit b76bd7a:

tests/unit/executor/test_single_dependencies.py — new test test_update_waiting_task_batched_exception:

Calls _update_waiting_task directly with a batched task dict whose skip_lst contains a mock future that passes the get_exception_lst check (.exception() returns None) but raises on .result(), triggering the except Exception as exc / set_exception / continue block
Verifies the batch future receives the exception (not a scheduler crash) and the task is not re-queued

Regarding _version.py: it is not modified by any of my commits — git diff cad969f..HEAD --name-only only shows the two test files.

jan-janssen · 2026-06-11T13:41:08Z

I split the changes in three different pull requests:

The concepts are the same; the places where the individual features are implemented differ slightly.

Fixed batched_futures()

9fd1c79

Format black

6976d48

jan-janssen marked this pull request as draft June 9, 2026 19:19

coderabbitai Bot reviewed Jun 9, 2026

View reviewed changes

Copilot started work on behalf of jan-janssen June 10, 2026 08:06 View session

Copilot AI and others added 3 commits June 10, 2026 08:09

Plan: Add unit tests for newly added lines in batched.py and dependen…

67f2570

…cy.py

Refactor versioning to use setuptools-scm

cad969f

Updated versioning information and cleaned up type hints.

Add unit tests for new batched_futures behavior and dependency schedu…

0a12ed7

…ler batched path

Copilot finished work on behalf of jan-janssen June 10, 2026 08:12

Copilot started work on behalf of jan-janssen June 10, 2026 08:38 View session

Add test for exception handler in _update_waiting_task batched path

11c7bcb

Copilot finished work on behalf of jan-janssen June 10, 2026 08:45

pyiron-runner and others added 2 commits June 10, 2026 08:46

Format black

3177aff

Update _version.py

98cb58c

jan-janssen changed the title ~~Fixed batched_futures()~~ [Feature] Fixed batched_futures() Jun 10, 2026

jan-janssen mentioned this pull request Jun 10, 2026

[Bug] Batching for failed futures #1012

Closed

Merge branch 'main' into ilgar

796f1c4

jan-janssen closed this Jun 11, 2026

jan-janssen wants to merge 9 commits into main from ilgar
