[codex] Fix media union matching#3462

Open
aaronvg wants to merge 2 commits into canary from aaron/media-fix

Conversation

aaronvg (Contributor) commented May 4, 2026

Summary

Fix runtime union discrimination for BAML media values so media outputs can match media union arms such as image | string.

Root Cause

Media values were converted to external Adt(Media(_)) values, but union matching did not inspect the underlying MediaValue.kind. That made the matcher report a generic media value as not matching image or string.

Changes

  • Teach external value matching to compare MediaValue.kind against Ty::Media union members.
  • Teach VM-side union discrimination to recognize both raw Rust media data and class-shaped media instances with _data.
  • Add coverage for raw media union wrapping and engine-level pdf | string returns.
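The kind-based matching described above can be sketched roughly as follows. This is a minimal illustration, not the code in conversion.rs: `MediaKind`, `Ty`, and `Value` here are simplified stand-ins for the real engine types, and the rule that a declared `Generic` member accepts any media kind is an assumption about how the `Generic` case is meant to behave.

```rust
// Simplified stand-ins for the engine types; the real definitions differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MediaKind {
    Image,
    Pdf,
    Generic,
}

#[derive(Debug, Clone, PartialEq)]
enum Ty {
    String,
    Media(MediaKind),
}

#[derive(Debug, Clone, PartialEq)]
enum Value {
    Str(String),
    Media { kind: MediaKind },
}

// Assumption: a declared Generic member accepts any runtime media kind.
fn media_kind_matches(actual: MediaKind, expected: MediaKind) -> bool {
    actual == expected || expected == MediaKind::Generic
}

// A media value matches a media arm only when the kinds are compatible,
// instead of being treated as an opaque value that matches nothing.
fn value_matches_type(value: &Value, ty: &Ty) -> bool {
    match (value, ty) {
        (Value::Str(_), Ty::String) => true,
        (Value::Media { kind }, Ty::Media(expected)) => media_kind_matches(*kind, *expected),
        _ => false,
    }
}

// Returns the first union member compatible with the value, if any.
fn find_matching_union_member<'a>(value: &Value, members: &'a [Ty]) -> Option<&'a Ty> {
    members.iter().find(|member| value_matches_type(value, member))
}
```

With members `image | string`, an image value selects the `image` arm, while a pdf value selects nothing because neither arm is compatible with its kind.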

Validation

  • cargo fmt --check
  • cargo test -p bex_engine media

Note

Medium Risk
Changes union member selection logic for Ty::Media during VM↔external conversions, which can affect runtime type wrapping and downstream consumers of union metadata. Scope is focused and covered by new unit and integration tests.

Overview
Fixes union discrimination so BAML media values correctly match Ty::Media arms (e.g. image | string) by inspecting the underlying MediaValue.kind rather than treating media as an opaque ADT.

Adds helpers to extract media kind from both external values (Adt(Media(_)), union wrappers, and class-shaped instances with _data) and VM objects (including RustData and instances), and extends tests to cover raw media union wrapping plus an engine-level pdf | string return selecting the pdf arm.
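As a rough picture of the extraction step on the external side, the sketch below walks a value through union wrappers and class-shaped instances until it finds media data. `ExternalValue` is a simplified, hypothetical stand-in for the engine's external value type, and only the `_data` field name is taken from the description above.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MediaKind {
    Image,
    Pdf,
}

// Hypothetical, simplified model of an external value.
enum ExternalValue {
    Str(String),
    // Raw media data carrying its kind.
    Media(MediaKind),
    // Class-shaped instance; media wrappers keep their payload in `_data`.
    Instance { fields: HashMap<String, ExternalValue> },
    // Union wrapper around an inner value.
    Union { value: Box<ExternalValue> },
}

// Recursively looks for the media kind behind wrappers and instances.
fn external_media_kind(value: &ExternalValue) -> Option<MediaKind> {
    match value {
        ExternalValue::Media(kind) => Some(*kind),
        ExternalValue::Instance { fields } => {
            fields.get("_data").and_then(external_media_kind)
        }
        ExternalValue::Union { value } => external_media_kind(value),
        _ => None,
    }
}
```

Note that this version inspects `_data` on any instance, which is the behavior the review discussion on this PR asks to narrow to the stdlib media wrapper classes.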

Reviewed by Cursor Bugbot for commit 9a358bf. Bugbot is set up for automated code reviews on this repo.

Summary by CodeRabbit

  • New Features

    • Media values now correctly integrate with union types, with intelligent type matching based on media kind (e.g., PDF, Generic)
    • Enhanced support for media value handling across different internal representations and nested structures
  • Tests

    • Expanded test coverage for PDF media behavior in union member scenarios

vercel (Bot) commented May 4, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
|---|---|---|---|
| baml-website-redesign | Error | | May 4, 2026 7:16pm |
| beps | Ready | Preview, Comment | May 4, 2026 7:16pm |
| promptfiddle | Ready | Preview, Comment | May 4, 2026 7:16pm |

coderabbitai (Bot, Contributor) commented May 4, 2026

Walkthrough

This PR extends union and type matching logic to discriminate media values by their MediaKind at runtime. New helper functions extract media kind from external and VM-level representations, enabling Ty::Media members to match values with compatible kinds. Tests validate that raw media values wrap into the correct union member and reject mismatches.

Changes

Media Kind Discrimination in Unions

| Layer | File(s) | Summary |
|---|---|---|
| Imports & Types | baml_language/crates/bex_engine/src/conversion.rs | MediaKind added to the baml_type imports. |
| Type Matching Logic | baml_language/crates/bex_engine/src/conversion.rs | value_matches_type now treats Ty::Media(kind, _) as matching values whose extracted actual media kind equals the same kind or MediaKind::Generic. |
| Union Member Selection | baml_language/crates/bex_engine/src/conversion.rs | find_matching_union_member adds an early discriminator path that extracts runtime media kind from the VM object and selects the first compatible Ty::Media union member. |
| Media Kind Extraction Helpers | baml_language/crates/bex_engine/src/conversion.rs | New private functions media_kind_matches, external_media_kind, object_media_kind, instance_media_kind, and vm_value_media_kind extract MediaKind from different runtime representations (raw media ADT, heap-backed instances, and nested unions/optionals). |
| Union Wrapping Tests | baml_language/crates/bex_engine/src/conversion.rs | Two new tests verify successful wrapping of raw media values into matching Ty::Media union arms and rejection of mismatched kinds. |
| PDF Union Test | baml_language/crates/bex_engine/tests/media_roundtrip.rs | New async test media_pdf_matches_union_member invokes unwrap_union, asserts the returned union's selected_option matches the expected PDF media type, and validates the wrapped payload contains media instance data with correct MediaKind. |
| Test Imports | baml_language/crates/bex_engine/tests/media_roundtrip.rs | Ty and TyAttr imports added to support type assertion metadata in the new test. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes


🐰 A media kind hops through the union's gate,
Matching kinds decide its fate,
PDF wraps secure and tight,
Generic accepts the sight,
Type-safe discrimination—what a delight!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 40.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title '[codex] Fix media union matching' is a clear and concise description of the main change: fixing runtime union discrimination for BAML media values in union type matching. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@aaronvg aaronvg marked this pull request as ready for review May 4, 2026 18:56

coderabbitai (Bot, Contributor) left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@baml_language/crates/bex_engine/src/conversion.rs`:
- Around line 550-556: The current object_media_kind path treats any instance
whose `_data` field is Adt(Media(_)) as Ty::Media and thus may misclassify
ordinary user classes before Ty::Class matching; update the logic in
object_media_kind (and the companion branch around media_kind_matches/Ty::Media)
to only consider the `_data` ADT-as-media heuristic when the object's runtime
class is one of the stdlib media wrapper classes (e.g., check the class
identity/name/ID against the known stdlib media wrappers) and otherwise fall
back to normal Ty::Class handling; after changing the guard add a regression
test demonstrating a user-defined class (e.g., `class Holder { _data pdf }`)
does not get treated as the `pdf` media variant.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 5d3b6d6a-cd3b-4474-9845-ed74eb408b74

📥 Commits

Reviewing files that changed from the base of the PR and between 953cd2f and 9a358bf.

📒 Files selected for processing (2)
  • baml_language/crates/bex_engine/src/conversion.rs
  • baml_language/crates/bex_engine/tests/media_roundtrip.rs

Comment on lines +550 to +556
if let Some(actual_media_kind) = object_media_kind(obj) {
    if let Some(member) = members.iter().find(|m| {
        matches!(m, Ty::Media(kind, _) if media_kind_matches(actual_media_kind, *kind))
    }) {
        return Some(member);
    }
}

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Limit _data-based media detection to actual media classes.

These helpers currently treat any instance whose _data field contains Adt(Media(_)) as Ty::Media. That changes union discrimination for ordinary user classes before Ty::Class matching runs, e.g. class Holder { _data pdf } against Holder | pdf can now resolve to pdf. Please gate this path on the instance/class being one of the stdlib media wrappers, then add a regression for a non-media class with _data.

Possible guard
 fn external_media_kind(value: &BexExternalValue) -> Option<MediaKind> {
     match value {
         BexExternalValue::Adt(BexExternalAdt::Media(media)) => Some(media.kind),
-        BexExternalValue::Instance { fields, .. } => {
+        BexExternalValue::Instance { class_name, fields }
+            if is_media_class_name(class_name) =>
+        {
             fields.get("_data").and_then(external_media_kind)
         }
         BexExternalValue::Union { value, .. } => external_media_kind(value),
         _ => None,
     }
 }

 fn instance_media_kind(instance: &bex_vm_types::Instance) -> Option<MediaKind> {
     let class_obj = unsafe { instance.class.get() };
     let Object::Class(class) = class_obj else {
         return None;
     };
+    if !is_media_type_name(&class.name) {
+        return None;
+    }

     class
         .fields
         .iter()
         .zip(instance.fields.iter())

Also applies to: 624-662
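To make the concern concrete, here is a small self-contained sketch of the guarded lookup and the regression case the comment asks for. Everything in it is illustrative: `ExternalValue` is a simplified stand-in, `is_media_class_name` is a hypothetical helper, and the wrapper class names in `MEDIA_WRAPPER_CLASSES` are placeholders, since the actual stdlib wrapper names are not shown in this thread.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MediaKind {
    Image,
    Pdf,
}

// Simplified stand-in for the external value type.
enum ExternalValue {
    Media(MediaKind),
    Instance {
        class_name: String,
        fields: HashMap<String, ExternalValue>,
    },
}

// Placeholder names; the real stdlib media wrapper classes may be named differently.
const MEDIA_WRAPPER_CLASSES: &[&str] = &["Image", "Pdf"];

// Hypothetical helper: true only for known media wrapper classes.
fn is_media_class_name(name: &str) -> bool {
    MEDIA_WRAPPER_CLASSES.contains(&name)
}

// Only looks inside `_data` when the instance is a media wrapper class,
// so ordinary user classes fall through to normal class matching.
fn guarded_media_kind(value: &ExternalValue) -> Option<MediaKind> {
    match value {
        ExternalValue::Media(kind) => Some(*kind),
        ExternalValue::Instance { class_name, fields } if is_media_class_name(class_name) => {
            fields.get("_data").and_then(guarded_media_kind)
        }
        _ => None,
    }
}

// Test helper: builds an instance of `class_name` with a single `_data` field.
fn instance(class_name: &str, data: ExternalValue) -> ExternalValue {
    let mut fields = HashMap::new();
    fields.insert("_data".to_string(), data);
    ExternalValue::Instance {
        class_name: class_name.to_string(),
        fields,
    }
}
```

Under these assumptions, a user-defined `Holder` instance whose `_data` holds a pdf yields no media kind, while a wrapper-class instance with the same payload still resolves to `Pdf`.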

codspeed-hq (Bot) commented May 4, 2026

Merging this PR will improve performance by 44.04%

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚠️ Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments,
which may affect the accuracy of the results.

Open the report in CodSpeed to investigate

⚡ 10 improved benchmarks
✅ 9 untouched benchmarks

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|
| WallTime | vm_array_push_50k | 9.4 ms | 8 ms | +18.26% |
| WallTime | vm_array_iter_10k | 5.8 ms | 4.6 ms | +24.53% |
| WallTime | vm_closure_call_50k | 10.1 ms | 9 ms | +12.49% |
| WallTime | vm_field_access_50k | 8.4 ms | 7.5 ms | +11.71% |
| WallTime | vm_nested_loop | 6.1 ms | 5.2 ms | +17.49% |
| WallTime | vm_mixed_ops | 20.8 ms | 16.9 ms | +23.54% |
| WallTime | vm_string_concat_5k | 54.1 ms | 45.3 ms | +19.44% |
| WallTime | vm_fib_20 | 5.2 ms | 4.5 ms | +14.65% |
| WallTime | engine_init_cost | 2.9 ms | 2 ms | +44.04% |
| WallTime | vm_call_chain_100_x_5k | 37.9 ms | 32.7 ms | +16% |

Comparing aaron/media-fix (9a358bf) with canary (953cd2f)

github-actions (Bot) commented May 4, 2026

Binary size checks passed

7 passed

| Artifact | Platform | Gzip | Baseline | Delta | Status |
|---|---|---|---|---|---|
| bridge_cffi | Linux | 6.1 MB | 5.7 MB | +459.6 KB (+8.1%) | OK |
| bridge_cffi-stripped | Linux | 6.1 MB | 5.7 MB | +424.2 KB (+7.5%) | OK |
| bridge_cffi | macOS | 5.0 MB | 4.6 MB | +424.3 KB (+9.2%) | OK |
| bridge_cffi-stripped | macOS | 5.0 MB | 4.7 MB | +361.5 KB (+7.7%) | OK |
| bridge_cffi | Windows | 5.1 MB | 4.6 MB | +437.2 KB (+9.5%) | OK |
| bridge_cffi-stripped | Windows | 5.0 MB | 4.7 MB | +378.4 KB (+8.1%) | OK |
| bridge_wasm | WASM | 3.3 MB | 3.4 MB | -26.5 KB (-0.8%) | OK |

Generated by cargo size-gate · workflow run
