Add baml.math.sum, baml.math.mean, and baml.math.median aggregation builtins#3824
Add baml.math.sum, baml.math.mean, and baml.math.median aggregation builtins#3824ATX24 wants to merge 10 commits into
baml.math.sum, baml.math.mean, and baml.math.median aggregation builtins#3824Conversation
Add sum, mean, and median functions to the baml.math stdlib namespace for numeric (float[]) aggregation, which finance/analytics workloads commonly need. They are pure-BAML functions built on existing builtins (reduce, slice, sort). mean/median throw InvalidArgument on empty input; median sorts a copy so it does not mutate the caller's array. Regenerated the describe and __baml_std__ HIR/TIR/MIR/codegen snapshots to include the three new functions. Co-authored-by: Dhilan Shah <ATX24@users.noreply.github.com>
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
Important Review skippedReview was skipped due to path filters ⛔ Files ignored due to path filters (7)
CodeRabbit blocks several paths by default. You can override this behavior by explicitly including those paths in the path filters. For example, including ⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughThree float-array aggregation functions— Changesbaml.math float-array aggregation builtins
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
⏭️ Performance benchmarks were skippedPerf benchmarks (CodSpeed) are opt-in on pull requests — they no longer run on every push. They always run automatically after merge to To run them on this PR, do any of the following, then push a commit (or re-run CI):
|
|
No description provided. |
Co-authored-by: Dhilan Shah <ATX24@users.noreply.github.com>
|
Should probably be using rust functions here. No reason to use baml. Much faster. |
Co-authored-by: Dhilan Shah <ATX24@users.noreply.github.com>
…icts Co-authored-by: Dhilan Shah <ATX24@users.noreply.github.com>
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
CONTRIBUTING.md (1)
117-117: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick winFix duplicate step numbering.
Step 5 is referenced twice (lines 117 and 119). Line 119's "Set up Git hooks" should be numbered as step 6 to maintain a consistent sequence.
📝 Proposed fix
5. Run the integration tests. -5. **Set up Git hooks (Recommended)**: +6. **Set up Git hooks (Recommended)**:Also applies to: 119-119
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@CONTRIBUTING.md` at line 117, The CONTRIBUTING.md file has duplicate step numbering where step 5 appears twice in the installation/setup instructions. The first occurrence at line 117 is "Run the integration tests" and the second occurrence at line 119 is "Set up Git hooks". Change the step number for "Set up Git hooks" from 5 to 6 to maintain a proper sequential numbering in the list.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@CONTRIBUTING.md`:
- Line 117: The CONTRIBUTING.md file has duplicate step numbering where step 5
appears twice in the installation/setup instructions. The first occurrence at
line 117 is "Run the integration tests" and the second occurrence at line 119 is
"Set up Git hooks". Change the step number for "Set up Git hooks" from 5 to 6 to
maintain a proper sequential numbering in the list.
Co-authored-by: Dhilan Shah <ATX24@users.noreply.github.com>
Co-authored-by: Dhilan Shah <ATX24@users.noreply.github.com>
…an-median-d818 # Conflicts: # baml_language/crates/baml_tests/tests/bytecode_format/snapshots/bytecode_format__bytecode_display_expanded.snap # baml_language/crates/baml_tests/tests/bytecode_format/snapshots/bytecode_format__bytecode_display_expanded_unoptimized.snap Co-authored-by: Dhilan Shah <ATX24@users.noreply.github.com>
Co-authored-by: Dhilan Shah <ATX24@users.noreply.github.com>
…hot conflicts Co-authored-by: Dhilan Shah <ATX24@users.noreply.github.com>
Co-authored-by: Dhilan Shah <ATX24@users.noreply.github.com>
The error
The
baml.mathnamespace exposed onlytrunc, andArray<T>had no aggregation methods, so any numerical workload (finance/analytics) had to hand-rollsum/mean/median. Compiling a program that uses them failed:Root cause
The standard library simply had no such functions.
baml.math(crates/baml_builtins2/baml_std/baml/ns_math/math.baml) defined onlytrunc, andArray<T>(crates/baml_builtins2/baml_std/baml/containers.baml) defined no aggregation methods.The fix
Added three pure-BAML functions to the
baml.mathnamespace (ns_math/math.baml). They build on existing builtins (Array.reduce,Array.slice,float[].sort()from theSortableblanket impl), so no VM/native ($rust_function) code was needed:sum(values: float[]) -> float throws never— left-to-right fold from0.0; empty array sums to0.0.mean(values: float[]) -> float throws root.errors.InvalidArgument—sum(values) / values.length(); throws on empty input.median(values: float[]) -> float throws root.errors.InvalidArgument— sorts a copy (so the caller's array is not mutated), returns the middle element for odd counts or the mean of the two middles for even counts; throws on empty input.I chose the
baml.mathnamespace over methods onArray<T>because (a) the issue explicitly allows it ("or equivalents underbaml.math"), (b)ARCHITECTURE.mdprefers sub-namespaces over polluting the root, and (c)Array<T>is generic and the compiler intentionally has noNumericbound. The API isfloat[]-based because this compiler deliberately removedint <: float(seenormalize.rs::test_int_not_subtype_of_float) and has no implicit int→float widening; thesumdocstring documents how to map integer data to floats first.Every new function carries a
///docstring with parameters, return value, errors, and examples.Also added a test project
crates/baml_tests/baml_src/ns_math/math.baml(13testblocks) and regenerated the affected snapshots (describebuiltin listings and the__baml_std__HIR/TIR/MIR/codegen snapshots — additive only).Verification
Reproduction now compiles and the runtime tests pass:
Commands run from
baml_language/:cargo build -p baml_cli— builds clean../target/debug/baml-cli test --from crates/baml_tests/baml_src— 1971 passed, 0 failed (incl. 13 newmath_*tests).cargo test -p baml_cli --lib describe_command— 81 passed.cargo test -p baml_tests --lib __baml_std__— 8 passed.No Rust files were changed, so
cargo fmtis a no-op for this diff.CodeRabbit:
CODERABBIT_API_KEYwas empty in this environment, socoderabbit auth loginfailed and the automated review could not run. I self-reviewed the diff and corrected one issue I found: the initialsumdocstring incorrectly claimedint[]was accepted via covariance — verified false (int[] <: float[]does not hold here), so I rewrote it to document the float-only contract and the int→float mapping workaround.PR Checklist
ns_math/math.baml, 13 cases)Summary by CodeRabbit
mathbuilt-ins:sum,mean, andmedian.sum([])returns0.0(never throws);mean([])andmedian([])throwInvalidArgument.medianno longer mutates the caller’s input array.sum,mean, andmedian, including empty, single-element, and unsorted inputs.cargo-instasnapshot remediation steps.