Add baml.math.sum, baml.math.mean, and baml.math.median aggregation builtins #3824

Open
ATX24 wants to merge 10 commits into canary from cursor/array-sum-mean-median-d818
@ATX24 ATX24 commented Jun 22, 2026

The error

The baml.math namespace exposed only trunc, and Array<T> had no aggregation methods, so any numerical workload (finance/analytics) had to hand-roll sum/mean/median. Compiling a program that uses them failed:

function UseMath() -> float {
  let xs = [1.0, 2.0, 3.0, 4.0];
  baml.math.sum(xs)
}
function UseArrayMethods() -> float {
  [1.0, 2.0, 3.0].mean()
}

error: unresolved name: sum
 3 │   baml.math.sum(xs)
   │             ╰── unresolved name: sum   (E0003)

error: type `float[]` has no member `mean`
 8 │   xs.mean()
   │      ╰── type `float[]` has no member `mean`   (E0007)

Root cause

The standard library simply had no such functions. baml.math (crates/baml_builtins2/baml_std/baml/ns_math/math.baml) defined only trunc, and Array<T> (crates/baml_builtins2/baml_std/baml/containers.baml) defined no aggregation methods.

The fix

Added three pure-BAML functions to the baml.math namespace (ns_math/math.baml). They build on existing builtins (Array.reduce, Array.slice, float[].sort() from the Sortable blanket impl), so no VM/native ($rust_function) code was needed:

  • sum(values: float[]) -> float throws never: left-to-right fold from 0.0; an empty array sums to 0.0.
  • mean(values: float[]) -> float throws root.errors.InvalidArgument: computes sum(values) / values.length(); throws on empty input.
  • median(values: float[]) -> float throws root.errors.InvalidArgument: sorts a copy (so the caller's array is not mutated), then returns the middle element for odd counts or the mean of the two middle elements for even counts; throws on empty input.
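
The contract described in these three bullets can be sketched in Rust as follows. This is only an illustration of the semantics (fold from 0.0, error on empty input, sort a copy); the `InvalidArgument` struct and the function signatures are hypothetical stand-ins rather than the stdlib's actual definitions, and the use of `total_cmp` for ordering is an assumption of the sketch.

```rust
/// Hypothetical stand-in for root.errors.InvalidArgument.
#[derive(Debug, PartialEq)]
pub struct InvalidArgument(pub &'static str);

/// Left-to-right fold starting from 0.0; an empty slice sums to 0.0.
pub fn sum(values: &[f64]) -> f64 {
    values.iter().fold(0.0, |acc, v| acc + v)
}

/// Arithmetic mean; fails on empty input instead of dividing by zero.
pub fn mean(values: &[f64]) -> Result<f64, InvalidArgument> {
    if values.is_empty() {
        return Err(InvalidArgument("mean of empty array"));
    }
    Ok(sum(values) / values.len() as f64)
}

/// Median computed on a sorted copy, so the caller's slice is left untouched.
pub fn median(values: &[f64]) -> Result<f64, InvalidArgument> {
    if values.is_empty() {
        return Err(InvalidArgument("median of empty array"));
    }
    let mut sorted = values.to_vec(); // copy; the input is never mutated
    sorted.sort_by(|a, b| a.total_cmp(b));
    let mid = sorted.len() / 2;
    if sorted.len() % 2 == 1 {
        Ok(sorted[mid])
    } else {
        // Even count: average the two middle elements.
        Ok((sorted[mid - 1] + sorted[mid]) / 2.0)
    }
}
```

With these definitions, `median(&[3.0, 1.0, 2.0])` evaluates to `Ok(2.0)` while the input keeps its original order, and `mean(&[])` returns an error rather than a NaN.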

I chose the baml.math namespace over methods on Array<T> because (a) the issue explicitly allows it ("or equivalents under baml.math"), (b) ARCHITECTURE.md prefers sub-namespaces over polluting the root, and (c) Array<T> is generic and the compiler intentionally has no Numeric bound. The API is float[]-based because this compiler deliberately removed int <: float (see normalize.rs::test_int_not_subtype_of_float) and has no implicit int→float widening; the sum docstring documents how to map integer data to floats first.
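
Rust is similarly strict about numeric widening, so it offers a convenient analogy for the float-only contract: integer data has to be converted explicitly before it can be aggregated. A minimal sketch, where the helper names `to_floats` and `sum_floats` are hypothetical and not part of the BAML stdlib:

```rust
/// Explicitly widen integers to floats; nothing is converted implicitly.
pub fn to_floats(ints: &[i64]) -> Vec<f64> {
    ints.iter().map(|&i| i as f64).collect()
}

/// Float-only sum, mirroring a float[]-based signature.
pub fn sum_floats(values: &[f64]) -> f64 {
    values.iter().fold(0.0, |acc, v| acc + v)
}
```

A caller holding integers would write `sum_floats(&to_floats(&[1, 2, 3]))`. Note that `as f64` rounds integers whose magnitude exceeds 2^53, which is one reason to keep the conversion visible at the call site.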

Every new function carries a /// docstring with parameters, return value, errors, and examples.

Also added a test project crates/baml_tests/baml_src/ns_math/math.baml (13 test blocks) and regenerated the affected snapshots (describe builtin listings and the __baml_std__ HIR/TIR/MIR/codegen snapshots — additive only).

Verification

Reproduction now compiles and the runtime tests pass:

$ ./target/debug/baml-cli check --from /tmp/repro/baml_src
    Finished checked 1 file(s)

$ ./target/debug/baml-cli test --from /tmp/repro/baml_src
PASS ::sum_basic
PASS ::sum_empty
PASS ::mean_basic
PASS ::mean_empty_throws
PASS ::median_odd
PASS ::median_even
PASS ::median_does_not_mutate
PASS ::median_empty_throws
    Finished 8 passed, 0 failed

Commands run from baml_language/:

  • cargo build -p baml_cli: builds clean.
  • ./target/debug/baml-cli test --from crates/baml_tests/baml_src: 1971 passed, 0 failed (incl. 13 new math_* tests).
  • cargo test -p baml_cli --lib describe_command: 81 passed.
  • cargo test -p baml_tests --lib __baml_std__: 8 passed.

No Rust files were changed, so cargo fmt is a no-op for this diff.

CodeRabbit: CODERABBIT_API_KEY was empty in this environment, so coderabbit auth login failed and the automated review could not run. I self-reviewed the diff and corrected one issue I found: the initial sum docstring incorrectly claimed int[] was accepted via covariance — verified false (int[] <: float[] does not hold here), so I rewrote it to document the float-only contract and the int→float mapping workaround.

PR Checklist

  • Unit/integration tests added (ns_math/math.baml, 13 cases)
  • Self-review performed
  • New public stdlib items documented with docstrings + examples
  • No new warnings (no Rust changed)

Summary by CodeRabbit

  • New Features
    • Added float-array math built-ins: sum, mean, and median.
    • sum([]) returns 0.0 (never throws); mean([]) and median([]) throw InvalidArgument.
  • Bug Fixes
    • median no longer mutates the caller’s input array.
  • Tests
    • Added a BAML test suite for sum, mean, and median, including empty, single-element, and unsorted inputs.
  • Documentation
    • Updated local build/test troubleshooting with cargo-insta snapshot remediation steps.

Add sum, mean, and median functions to the baml.math stdlib namespace
for numeric (float[]) aggregation, which finance/analytics workloads
commonly need. They are pure-BAML functions built on existing builtins
(reduce, slice, sort). mean/median throw InvalidArgument on empty input;
median sorts a copy so it does not mutate the caller's array.

Regenerated the describe and __baml_std__ HIR/TIR/MIR/codegen snapshots
to include the three new functions.

Co-authored-by: Dhilan Shah <ATX24@users.noreply.github.com>
vercel Bot commented Jun 22, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| beps | Ready | Preview, Comment | Jun 24, 2026 5:26pm |
| promptfiddle | Ready | Preview, Comment | Jun 24, 2026 5:26pm |
| promptfiddle2 | Ready | Preview, Comment | Jun 24, 2026 5:26pm |

coderabbitai Bot commented Jun 22, 2026

Important

Review skipped

Review was skipped due to path filters

⛔ Files ignored due to path filters (7)
  • baml_language/crates/baml_cli/src/snapshots/baml_cli__describe_command_tests__render_builtin_package_listing.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/compiles/__baml_std__/baml_tests__compiles____baml_std____03_hir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/compiles/__baml_std__/baml_tests__compiles____baml_std____04_5_mir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/compiles/__baml_std__/baml_tests__compiles____baml_std____06_codegen.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/src/compiler2_tir/snapshots/baml_tests__compiler2_tir__phase5__snapshot_baml_package_items.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/tests/bytecode_format/snapshots/bytecode_format__bytecode_display_expanded.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/tests/bytecode_format/snapshots/bytecode_format__bytecode_display_expanded_unoptimized.snap is excluded by !**/*.snap

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: cd2e0707-fcb4-4e53-9c61-b479c901d9c5

Walkthrough

Three float-array aggregation functions—sum, mean, and median—are added to the baml.math stdlib with corresponding Rust runtime implementations. The BAML declarations specify the public contracts (return types, error conditions), while the Rust layer provides internal helpers for diagnostics and type extraction, implements the aggregation logic with proper error handling, and validates behavior through a comprehensive test suite. Developer documentation is updated to guide snapshot test regeneration during local development and to expand the contribution and testing workflow.

Changes

baml.math float-array aggregation builtins

| Layer | File(s) | Summary |
| --- | --- | --- |
| BAML function declarations and module documentation | baml_language/crates/baml_builtins2/baml_std/baml/ns_math/math.baml, baml_language/crates/bex_vm/src/package_baml/mod.rs | Adds sum(float[]) -> float (never throws), mean(float[]) -> float (throws root.errors.InvalidArgument for empty), and median(float[]) -> float (throws root.errors.InvalidArgument for empty). Module documentation updated to list all four math operations. |
| Rust implementation foundations | baml_language/crates/bex_vm/src/package_baml/math.rs | Introduces value_type_name helper for runtime type diagnostics and expect_float helper to safely extract f64 from boxed Values; establishes the BamlNamespaceMath impl context for new operations. |
| Rust sum and mean implementations | baml_language/crates/bex_vm/src/package_baml/math.rs | sum iterates values and accumulates via f64 extraction; mean guards against empty input and divides sum by array length, returning InvalidArgument error when needed. |
| Rust median implementation | baml_language/crates/bex_vm/src/package_baml/math.rs | Copies input to Vec<f64>, sorts using f64::total_cmp, and returns middle element for odd lengths or average of two middle elements for even lengths; throws InvalidArgument for empty input. |
| Test suite | baml_language/crates/baml_tests/baml_src/ns_math/math.baml | Tests normal cases (basic, single-element, negative, unsorted, odd/even counts), error paths for mean and median on empty inputs via try/catch, and side-effect verification that median does not mutate the caller's array. |
| Developer documentation updates | CONTRIBUTING.md, README-DEV.md | CONTRIBUTING.md is expanded with a numbered contributing workflow, revised local development setup steps including snapshot test remediation (install cargo-insta, accept snapshots, rerun tests), integration-testing prerequisites, and language-specific test execution for TypeScript, Python, and Ruby. README-DEV.md adds Rust snapshot test failure troubleshooting. |
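
The Rust layer summarized in this table can be pictured roughly as below. The `Value` enum, its variants, the error type, and the error messages are hypothetical simplifications, since the actual `bex_vm` types are not shown on this page; only the helper names `value_type_name` and `expect_float` and the use of `f64::total_cmp` come from the walkthrough.

```rust
/// Hypothetical, simplified runtime value (the real VM type is not shown here).
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Int(i64),
    Float(f64),
    Str(String),
}

/// Hypothetical stand-in for the InvalidArgument error.
#[derive(Debug, PartialEq)]
pub struct InvalidArgument(pub String);

/// Human-readable type name used in diagnostics.
pub fn value_type_name(v: &Value) -> &'static str {
    match v {
        Value::Int(_) => "int",
        Value::Float(_) => "float",
        Value::Str(_) => "string",
    }
}

/// Extract an f64, reporting the actual type on mismatch (assumed behavior).
pub fn expect_float(v: &Value) -> Result<f64, InvalidArgument> {
    match v {
        Value::Float(f) => Ok(*f),
        other => Err(InvalidArgument(format!(
            "expected float, got {}",
            value_type_name(other)
        ))),
    }
}

/// Median over runtime values: copy into Vec<f64>, sort with total_cmp.
pub fn median(values: &[Value]) -> Result<f64, InvalidArgument> {
    if values.is_empty() {
        return Err(InvalidArgument("median of empty array".to_string()));
    }
    let mut floats = values
        .iter()
        .map(expect_float)
        .collect::<Result<Vec<f64>, InvalidArgument>>()?;
    // total_cmp is a total order on f64, so sorting is well defined even with NaN.
    floats.sort_by(f64::total_cmp);
    let mid = floats.len() / 2;
    if floats.len() % 2 == 1 {
        Ok(floats[mid])
    } else {
        Ok((floats[mid - 1] + floats[mid]) / 2.0)
    }
}
```

Sorting with `f64::total_cmp` avoids the `partial_cmp(...).unwrap()` pattern, which would panic if a NaN reached the comparator. Copying into a fresh `Vec<f64>` is what keeps the caller's array unchanged.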

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: adding three aggregation functions (sum, mean, median) to the baml.math namespace, which is the primary focus of the PR. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

github-actions Bot commented Jun 22, 2026

⏭️ Performance benchmarks were skipped

Perf benchmarks (CodSpeed) are opt-in on pull requests — they no longer run on every push. They always run automatically after merge to canary/main.

To run them on this PR, do any of the following, then push a commit (or re-run CI):

  • Add RUN_CODSPEED=1 to the PR description, or
  • Include run-perf or /perf in the PR title or any commit message.

ATX24 commented Jun 23, 2026

These should probably be Rust functions. There is no reason to implement them in BAML, and native code is much faster.

…icts


@coderabbitai coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
CONTRIBUTING.md (1)

117-117: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Fix duplicate step numbering.

Step 5 is referenced twice (lines 117 and 119). Line 119's "Set up Git hooks" should be numbered as step 6 to maintain a consistent sequence.

📝 Proposed fix
 5. Run the integration tests.

-5. **Set up Git hooks (Recommended)**:
+6. **Set up Git hooks (Recommended)**:

Also applies to: 119-119

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@CONTRIBUTING.md` at line 117, The CONTRIBUTING.md file has duplicate step
numbering where step 5 appears twice in the installation/setup instructions. The
first occurrence at line 117 is "Run the integration tests" and the second
occurrence at line 119 is "Set up Git hooks". Change the step number for "Set up
Git hooks" from 5 to 6 to maintain a proper sequential numbering in the list.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: f5303132-a267-417b-b58f-b24dd7a117e7

📥 Commits

Reviewing files that changed from the base of the PR and between 2b9fcd4 and 0010246.

📒 Files selected for processing (1)
  • CONTRIBUTING.md

…an-median-d818

# Conflicts:
#	baml_language/crates/baml_tests/tests/bytecode_format/snapshots/bytecode_format__bytecode_display_expanded.snap
#	baml_language/crates/baml_tests/tests/bytecode_format/snapshots/bytecode_format__bytecode_display_expanded_unoptimized.snap

…hot conflicts
