Skip to content

feat(lsp): resolution-first semantic tokens rework#3867

Open
codeshaunted wants to merge 1 commit into
canaryfrom
avery/sem-tok-cleanup
Open

feat(lsp): resolution-first semantic tokens rework#3867
codeshaunted wants to merge 1 commit into
canaryfrom
avery/sem-tok-cleanup

Conversation

@codeshaunted

@codeshaunted codeshaunted commented Jun 27, 2026

Copy link
Copy Markdown
Collaborator

Reworks the BAML semantic-tokens classifier to be resolution-driven, mirroring rust-analyzer: walk the lossless CST for token positions + syntactic tokens (keywords/operators/punctuation), and resolve identifier meaning through the HIR/inference instead of guessing from the CST.

What's here

  • tokens/index.rs — span-keyed resolution index over every inference-bearing scope (function / lambda / let) via the new uniform tir::scope_body lookup, so identifiers inside spawn/block/closure bodies classify with their own scope's inference.
  • Interface members resolve like class membersresolve_interface_member now records MemberResolution::InterfaceMethod / InterfaceField. So .as<I>.method(), interface Self methods, and chained interface calls classify through the normal path; typos stay neutral. MIR-safe (new variants → None item-refs, fall through to existing interface dispatch; 323 runtime interface tests pass).
  • Highlighting fixes: as / with / is keywords, generators, generic type args + params (typeParameter), associated types (decl + binding), self member accesses (no longer namespace), methods after .as<Interface>, string escapes (escapeSequence), byte strings, qualified type paths, true/false/null, object/map keys vs values.
  • tools_semantic_tokens — standalone web viewer for the classifier (no VSCode needed).

Tests

78 semantic-token fixtures; 439 LSP + 323 runtime-interface + 172 tir/mir tests pass; clippy-clean.

Next steps / gaps

See crates/baml_lsp2_actions/src/tokens/GAPS.md — the priority is scaling like rust-analyzer (on-demand per-token resolution + semanticTokens/range viewport + delta), plus a couple remaining feature items (optional params at call site, a boolean literal token type).

Summary by CodeRabbit

  • New Features

    • Added richer semantic-token support across the editor, including token modifiers, more token types, and improved highlighting for strings, keywords, methods, properties, and interfaces.
    • Added a local semantic-tokens viewer for inspecting, comparing, and accepting fixture outputs.
    • Expanded test coverage with many new language scenarios for patterns, generators, clients, matches, loops, templates, and interfaces.
  • Bug Fixes

    • Improved resolution of interface members, field accesses, and multi-segment paths for more accurate navigation and highlighting.
    • Fixed several tokenization edge cases so expected editor coloring and definitions are more consistent.

Rework the BAML semantic-tokens classifier to be resolution-driven
(rust-analyzer style): walk the lossless CST for token positions and
syntactic tokens (keywords/operators/punctuation), and resolve identifier
meaning through the HIR/inference rather than guessing from the CST.

- tokens/index.rs: span-keyed resolution index built over every
  inference-bearing scope (function / lambda / let) via the new uniform
  tir::scope_body lookup, so names in spawn/block/closure bodies classify
  with their own scope's inference.
- Interface members now resolve like class members: resolve_interface_member
  records MemberResolution::InterfaceMethod / InterfaceField. MIR-safe (new
  variants lower to None item-refs and fall through to interface dispatch).
- Highlighting fixes: as / with / is keywords, generators, generic type args
  and params (typeParameter), associated types (decl + binding), self member
  accesses, methods after .as<Interface>, string escapes (escapeSequence),
  byte strings, qualified type paths, true/false/null literals, object/map
  keys vs values.
- bitflags-backed ModifierSet; classifier shared with `baml describe`.
- tools_semantic_tokens: standalone web viewer for the classifier.

78 fixtures. See crates/baml_lsp2_actions/src/tokens/GAPS.md for the
scaling next-steps (on-demand resolution, range/viewport, delta) and the
remaining known gaps.
@vercel

vercel Bot commented Jun 27, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
beps Ready Ready Preview, Comment Jun 27, 2026 8:13am
promptfiddle Ready Ready Preview, Comment Jun 27, 2026 8:13am
promptfiddle2 Ready Ready Preview, Comment Jun 27, 2026 8:13am

Request Review

@coderabbitai

coderabbitai Bot commented Jun 27, 2026

Copy link
Copy Markdown
Contributor
📝 Walkthrough

Walkthrough

Semantic-token classification now carries modifiers and escape-sequence tokens, interface-member resolution is added through TIR/MIR/LSP paths, and a local viewer plus fixtures reflect the revised output.

Changes

Semantic Token Pipeline

Layer / File(s) Summary
Compiler resolution and lowering
baml_language/crates/baml_compiler2_tir/src/*, baml_language/crates/baml_compiler2_mir/src/lower.rs, baml_language/crates/baml_compiler_syntax/src/syntax_kind.rs, baml_language/crates/baml_lsp2_actions/src/definition.rs
ScopeBody and interface-member resolution variants are added, MIR lowering skips interface members in callable paths, is_keyword() includes is and with, and interface go-to-definition resolves to the declaring interface span.
Semantic token model and classification
baml_language/Cargo.toml, baml_language/crates/baml_lsp2_actions/Cargo.toml, baml_language/crates/baml_lsp2_actions/src/{lib.rs,paint.rs,tokens.rs,tokens/{classify.rs,index.rs,GAPS.md}}, baml_language/crates/baml_lsp2_actions_tests/src/runner.rs
SemanticTokenType gains EscapeSequence, tokens carry modifiers, resolution-based classification/indexing is added, the palette and re-exports are updated, and the test harness prints modifiers.
LSP semantic-token transport
baml_language/crates/bex_project/src/bex_lsp/multi_project/request.rs
The semantic-token legend now advertises modifiers and emitted token_modifiers_bitset values come from the compiler tokens.
Viewer tool
baml_language/crates/tools_semantic_tokens/{Cargo.toml,README.md,src/*}
The viewer loads fixtures, computes and accepts semantic-token snapshots, exposes HTTP routes, renders the browser UI, and restarts on source changes.
Syntax, control-flow, and matching fixtures
baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/{array_pattern.baml,backtick_string.baml,block_attributes.baml,break_continue.baml,byte_strings.baml,cancel_token.baml,catch_all_keyword.baml,catch_throw.baml,const_binding.baml,const_let_else_defer.baml,constructor_spread.baml,enum_decls.baml,exhaustive.baml,for_loops.baml,function.baml,header_comments.baml,if_else.baml,if_expression.baml,if_let_chain.baml,is_operator.baml,literal_types.baml,match_expr.baml,match_literal_types.baml,patterns_new.baml,task_group.baml,test_expr_throwing_body.baml,testset.baml,top_level_binding.baml,watch_let.baml,while_loop.baml,with_keyword.baml,throws_clause.baml}
Fixtures update token spans, modifiers, and diagnostics for loops, matches, literals, scoping, cancellation, throws, and test syntax.
Member, path, and type-resolution fixtures
baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/{class.baml,deep_method_call.baml,enum_member_access.baml,extends_chain.baml,field_alias.baml,generic_field_chain.baml,generic_function_call.baml,implements_for.baml,index_access.baml,interface_impl.baml,interfaces_sort_comparable.baml,iterator.baml,json_map_literal.baml,json_parse_stringify_intrinsics.baml,lambda_advanced.baml,map_literal_keys.baml,map_type_and_methods.baml,method_explicit_type_args.baml,multi_segment_path.baml,requires_clause.baml,self_qualified_call.baml,type_alias.baml}
Fixtures update token spans for declarations, member access, multi-segment paths, generic inference, JSON/map operations, and type aliases.
Client, prompt, and template fixtures
baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/{client.baml,config_dictionary.baml,dynamic_type_builder.baml,generators.baml,http_config.baml,llm_function.baml,ns_*.baml,remap_role.baml,retry_policy_valid_retry.baml,signature_variety.baml,stream_llm_inferred_typeargs.baml,template_string.baml,template_string_calls.baml,test_block.baml,testset_vibes_nested.baml,type_aliases_jinja.baml}
Client declarations, prompt bodies, template strings, generator configs, retry policies, and nested testsets update expected semantic-token output.

Sequence Diagram(s)

Accept snapshot flow.

sequenceDiagram
  participant BrowserUI
  participant server_accept_fixture
  participant analysis_accept_fixture
  participant runner_run_test
  participant updater_update_test_file
  BrowserUI->>server_accept_fixture: POST /api/accept
  server_accept_fixture->>analysis_accept_fixture: accept_fixture(name)
  analysis_accept_fixture->>runner_run_test: run_test(path)
  runner_run_test->>updater_update_test_file: write semantic_tokens block
  updater_update_test_file-->>analysis_accept_fixture: updated file
  analysis_accept_fixture-->>server_accept_fixture: ok
  server_accept_fixture-->>BrowserUI: { ok: true }
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

  • BoundaryML/baml#3577: Shares the backtick-string escape-sequence token work that feeds the new EscapeSequence semantic token ranges.
  • BoundaryML/baml#3658: Touches the same SyntaxKind::is_keyword() path used to classify keyword tokens.
  • BoundaryML/baml#3842: Updates SemanticTokenType::EscapeSequence styling in paint.rs, matching the new token kind.

Poem

I hopped through tokens in the code,
With escape-sequence sparkles on my road.
Interfaces blinked, the viewer sighed,
And modifiers twinkled side by side.
Hop-hoppity—semantic gardens grow! 🐇

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately captures the main change: a resolution-first rework of semantic token classification in LSP.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch avery/sem-tok-cleanup
⚔️ Resolve merge conflicts
  • Resolve merge conflict in branch avery/sem-tok-cleanup

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 20

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
baml_language/crates/baml_compiler2_mir/src/lower.rs (1)

6111-6145: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Handle captured roots in interface-method value lowering.

This path only runs when the first segment is a local. A captured receiver in let f = x.eq skips the candidate-switch branch and falls through to lower_multi_segment_path_as_field_chain, so the method value gets lowered like a field/map lookup instead of a bound-method dispatch.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@baml_language/crates/baml_compiler2_mir/src/lower.rs` around lines 6111 -
6145, The interface-method value lowering in the path that starts by checking
self.locals for the first segment only handles plain locals, so captured
receivers like let f = x.eq bypass bound-method dispatch and get treated as
field/map chains. Update the logic in lower.rs around the method-candidate
resolution branch to recognize captured roots as valid receivers, then route
them through lower_path_receiver_to_local and emit_bound_method_candidate_switch
just like local roots, instead of falling through to
lower_multi_segment_path_as_field_chain.
🧹 Nitpick comments (1)
baml_language/crates/tools_semantic_tokens/src/analysis.rs (1)

46-244: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Add unit tests around the offset/parser helpers.

This file is mostly pure string/offset logic, so a small #[cfg(test)] module for UTF-16 offsets, escaped snapshot lines, and text-only diff cases would give faster and more targeted coverage than end-to-end fixtures alone. As per coding guidelines, "Prefer writing Rust unit tests over integration tests where possible."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@baml_language/crates/tools_semantic_tokens/src/analysis.rs` around lines 46 -
244, Add a focused #[cfg(test)] module in analysis.rs to unit test the pure
helpers instead of relying only on fixture flows. Cover offset_to_line_col,
utf16_offset, parse_expected, and diff_count with cases for non-ASCII/UTF-16
positioning, escaped or quoted snapshot text lines, and text-only token diffs.
Use the existing helper symbols and Token shape to keep the tests local and
resilient to line changes.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@baml_language/crates/baml_compiler2_tir/src/builder.rs`:
- Around line 12397-12405: The default-method branch in builder.rs is writing a
redundant MemberResolution::InterfaceMethod into self.resolutions that is always
overwritten later by InterfaceDefaultMethod. Remove the insert in the
default-method loop around the bound check, and keep InterfaceMethod only in the
required-method loop where no later overwrite occurs; use the existing
iface_loc/method_name and func_loc resolution paths to keep the intended
behavior clear.

In
`@baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/catch_throw.baml`:
- Line 10: The semantic tokens fixture for catch handling is missing coverage
for the catch binding identifier, so update the `catch_throw.baml` expectations
to include the bound name in the `MayFail(x) catch (e) {` case and the other
referenced catch blocks. Make sure the `semantic_tokens` test asserts token
classification for the catch variable itself, not just the `catch` keyword, so
the resolution-driven identifier behavior is verified through the existing
fixture.

In
`@baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/constructor_spread.baml`:
- Line 8: The semantic-tokens fixture for constructor spreads is missing the
spread operand token, so extend the expected token list in the semantic token
test around the MyClass constructor spread case to include the x in ...x. Update
the fixture used by the semantic_tokens test for constructor_spread.baml (and
the related cases in the same test block) so the spread operand is asserted and
regressions in operand classification are caught.

In
`@baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/dynamic_type_builder.baml`:
- Around line 21-27: The semantic-tokens fixture records `functions
[ExtractResume]` but never asserts that `ExtractResume` is classified as a
function reference, so add the missing token assertion in the
`dynamic_type_builder.baml` test around `test ReturnDynamicClassTest` using the
existing semantic token expectations for `functions[...]`. Make sure the
assertion explicitly covers the `ExtractResume` identifier so regressions in
function-reference classification inside `functions [...]` are caught, and apply
the same update to the matching fixture noted in the comment.

In
`@baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/field_alias.baml`:
- Around line 35-37: Update the semantic tokens fixture for field alias handling
so it asserts both alias endpoints, not just the `as` keyword. In
`field_alias.baml`, expand the expectations around the `name as display_name`
alias to include tokens for the source field (`name`) and the target alias
(`display_name`) alongside the existing `as` token, using the same semantic
token test pattern used elsewhere in these fixtures.

In
`@baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/index_access.baml`:
- Around line 45-46: The semantic tokens fixture is incorrectly marked as
diagnostic-free even though the preceding examples document validation errors.
Update the fixture around the diagnostics marker so it no longer uses
<no-diagnostics-expected> in the index_access.baml test content; either add the
expected diagnostics for the cases shown above or split the negative examples
into a separate fixture and keep this one truly diagnostic-free. Use the
existing semantic_tokens fixture structure and diagnostics comment block to
locate the change.

In
`@baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/ns_images_pipeline.baml`:
- Around line 40-41: The semantic token test only asserts the client property
key in ns_images_pipeline.baml and misses the actual client symbol reference, so
regressions in client-reference classification could go unnoticed. Update the
expected tokens in the semantic tokens test data to also assert the
AiGatewayImagenImageClient reference inside the function config, using the
existing semantic token fixture around the client/prompt entries to keep the
check tied to the relevant symbols.

In
`@baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/requires_clause.baml`:
- Around line 1-30: The semantic-tokens fixture for requires-chain coverage does
not currently exercise the transitive Aged member through Person requires Named,
Aged. Update the test content in requires_clause.baml to add a Person-typed or
self.age access that resolves age via the requires chain, so the fixture
actually validates the grandparent-interface flow instead of only the concrete
Employee field. Use the existing Person, Aged, and get_age symbols to place the
new access alongside the current example.

In `@baml_language/crates/baml_lsp2_actions/src/definition.rs`:
- Around line 426-436: The interface member resolution in definition.rs is
currently collapsing both MemberResolution::InterfaceMethod and
MemberResolution::InterfaceField to Definition::Interface, which makes
go-to-definition jump to the interface header instead of the accessed member.
Update the definition lookup in the branch handling
MemberResolution::InterfaceMethod and MemberResolution::InterfaceField to use
the member-specific identifier from the resolution and resolve the corresponding
interface method/field span, following the same pattern used by the Field and
method branches above. Ensure utils::definition_span is called with the member
declaration target rather than the interface declaration.

In `@baml_language/crates/baml_lsp2_actions/src/tokens/index.rs`:
- Around line 56-63: The token indexing logic in the scope filter is skipping
template-body scopes entirely, which leaves template interpolations like string
tokens unprocessed. Update the scope handling in the token indexer to not
unconditionally continue on scope.is_template_body, and instead let those bodies
be indexed with the same inference context as the enclosing ExprBody. Keep the
existing function/lambda/let scope checks in the token indexing flow, but ensure
template-body synthetic scopes are still traversed so resolution-driven
highlighting works.
- Around line 157-176: In tokens/index.rs, the fallback in the
path-classification logic currently marks unresolved leaf segments as Namespace,
which incorrectly masks typos as valid identifiers. Update the resolve/classify
flow around the `resolve_path_at` and `classify::classify_resolved` branch so
only non-leaf prefixes can fall back to `SemanticTokenType::Namespace`, while
the final unresolved segment stays neutral when `inference.resolution(expr_id)`
is missing and `local_root` does not apply.

In `@baml_language/crates/tools_semantic_tokens/src/analysis.rs`:
- Around line 202-207: `load_fixture` and `list_fixture_names` are swallowing
load failures by fabricating empty results, so update them to return errors
instead of defaulting to empty source or empty names. In `load_fixture`, use the
parsed file presence in `parsed.files` as a required condition and surface an
`InvalidData` error when no virtual file is available; in `list_fixture_names`,
stop flattening away `read_dir` failures and propagate the underlying I/O error.
Keep the fix localized to the `load_fixture` and `list_fixture_names` paths so
callers can distinguish “no data” from “failed to load.”
- Around line 225-243: The diff_count function is only comparing token type and
modifiers for matching position/length, so text-only snapshot drift can be
missed even when the committed semantic_tokens fixture would still change.
Update diff_count in analysis.rs to also compare the normalized committed lexeme
or raw snapshot text for each token key, and treat any text-only mismatch as a
diff alongside the existing type/modifier checks.

In `@baml_language/crates/tools_semantic_tokens/src/index.html`:
- Around line 440-445: The load-failure handler in the fixture fetch flow should
clear stale state instead of only rendering the error pane. In the catch path
around getJson in index.html, reset detail, disable the Accept control, and
clear the inspector/status state so the UI does not keep reflecting the previous
fixture after a failed /api/fixture request.
- Around line 456-469: The accept button flow in the click handler for acceptEl
leaves the control disabled after a failed /api/accept request, so the user
cannot retry without reselecting a fixture. Update the async click handler to
restore the enabled state in a finally block after the getJson("/api/accept"),
refreshFixtures(), and select(selected) sequence, and base that restore on the
current selection state (for example the same condition used before disabling).
- Around line 296-312: Stop interpolating fixture names directly into HTML
attributes in renderList(). The current esc() usage in the fixture button markup
still leaves data-name and title vulnerable because quotes are not escaped.
Update the row-building logic in renderList() to set these attributes through
DOM APIs or use an attribute-safe escape for f.name before inserting it, while
keeping the Scratchpad row consistent.
- Around line 414-427: The renderScratch flow applies any finished /api/tokens
response even if it was started for an older scratchText value, so update it to
ignore stale results. Add a request version/sequence check inside renderScratch
around the getJson("/api/tokens") call and only assign scratchTokens and render
when the response still matches the latest request for the current scratchText,
using the existing renderScratch, scratchTokens, and selected/SCRATCH logic to
locate the fix.

In `@baml_language/crates/tools_semantic_tokens/src/main.rs`:
- Around line 87-95: The port selection logic in bind can overflow when adding
the scan offset to a high preferred port. Update the loop in bind to clamp the
upper bound at u16::MAX before probing ports, so the SocketAddr construction
never wraps or panics; keep the rest of the TcpListener::bind retry behavior
unchanged and preserve the existing error path when no port is available.

In `@baml_language/crates/tools_semantic_tokens/src/server.rs`:
- Around line 183-196: resolve_fixture currently trusts Path::is_file, which
follows symlinks and can let fixture names escape fixtures_dir; update the
validation in resolve_fixture to reject symlinked entries before returning the
path. Use the existing resolve_fixture helper and its dir.join(name) result to
check the resolved candidate itself is a regular file owned inside the fixture
tree, and return an error for any symlink or non-regular path before
/api/fixture or /api/accept can use it.

In `@baml_language/crates/tools_semantic_tokens/src/staleness.rs`:
- Around line 31-37: The staleness watcher in newest_source_mtime only tracks
this crate and baml_lsp2_actions, but semantic token behavior also depends on
baml_compiler2_tir, baml_compiler2_mir, and baml_compiler_syntax. Update the
roots list (and any related watcher logic in staleness.rs) so those compiler
crates are included in the mtime scan, ensuring edits there trigger the viewer
restart and keep semantic tokens fresh.

---

Outside diff comments:
In `@baml_language/crates/baml_compiler2_mir/src/lower.rs`:
- Around line 6111-6145: The interface-method value lowering in the path that
starts by checking self.locals for the first segment only handles plain locals,
so captured receivers like let f = x.eq bypass bound-method dispatch and get
treated as field/map chains. Update the logic in lower.rs around the
method-candidate resolution branch to recognize captured roots as valid
receivers, then route them through lower_path_receiver_to_local and
emit_bound_method_candidate_switch just like local roots, instead of falling
through to lower_multi_segment_path_as_field_chain.

---

Nitpick comments:
In `@baml_language/crates/tools_semantic_tokens/src/analysis.rs`:
- Around line 46-244: Add a focused #[cfg(test)] module in analysis.rs to unit
test the pure helpers instead of relying only on fixture flows. Cover
offset_to_line_col, utf16_offset, parse_expected, and diff_count with cases for
non-ASCII/UTF-16 positioning, escaped or quoted snapshot text lines, and
text-only token diffs. Use the existing helper symbols and Token shape to keep
the tests local and resilient to line changes.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 3f4c5b6e-52a8-4a11-bea9-e5d65b8d4846

📥 Commits

Reviewing files that changed from the base of the PR and between d5f68a6 and 775a89e.

⛔ Files ignored due to path filters (1)
  • baml_language/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (100)
  • baml_language/Cargo.toml
  • baml_language/crates/baml_cli/src/paint.rs
  • baml_language/crates/baml_compiler2_mir/src/lower.rs
  • baml_language/crates/baml_compiler2_tir/src/builder.rs
  • baml_language/crates/baml_compiler2_tir/src/inference.rs
  • baml_language/crates/baml_compiler_syntax/src/syntax_kind.rs
  • baml_language/crates/baml_lsp2_actions/Cargo.toml
  • baml_language/crates/baml_lsp2_actions/src/definition.rs
  • baml_language/crates/baml_lsp2_actions/src/lib.rs
  • baml_language/crates/baml_lsp2_actions/src/tokens.rs
  • baml_language/crates/baml_lsp2_actions/src/tokens/GAPS.md
  • baml_language/crates/baml_lsp2_actions/src/tokens/classify.rs
  • baml_language/crates/baml_lsp2_actions/src/tokens/index.rs
  • baml_language/crates/baml_lsp2_actions_tests/src/runner.rs
  • baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/array_pattern.baml
  • baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/attributes.baml
  • baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/backtick_string.baml
  • baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/block_attributes.baml
  • baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/break_continue.baml
  • baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/byte_strings.baml
  • baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/cancel_token.baml
  • baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/catch_all_keyword.baml
  • baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/catch_throw.baml
  • baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/class.baml
  • baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/client.baml
  • baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/config_dictionary.baml
  • baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/const_binding.baml
  • baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/const_let_else_defer.baml
  • baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/constructor_spread.baml
  • baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/deep_method_call.baml
  • baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/dynamic_type_builder.baml
  • baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/enum_decls.baml
  • baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/enum_member_access.baml
  • baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/exhaustive.baml
  • baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/extends_chain.baml
  • baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/field_alias.baml
  • baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/for_loops.baml
  • baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/function.baml
  • baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/generators.baml
  • baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/generic_field_chain.baml
  • baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/generic_function_call.baml
  • baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/header_comments.baml
  • baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/http_config.baml
  • baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/if_else.baml
  • baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/if_expression.baml
  • baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/if_let_chain.baml
  • baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/implements_for.baml
  • baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/index_access.baml
  • baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/interface_impl.baml
  • baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/interfaces_inferred_generic_type_args.baml
  • baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/interfaces_iter_core.baml
  • baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/interfaces_sort_comparable.baml
  • baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/is_operator.baml
  • baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/iterator.baml
  • baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/jinja_control_prompt.baml
  • baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/json_map_literal.baml
  • baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/json_parse_stringify_intrinsics.baml
  • baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/lambda_advanced.baml
  • baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/literal_types.baml
  • baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/llm_function.baml
  • baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/map_literal_keys.baml
  • baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/map_type_and_methods.baml
  • baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/match_expr.baml
  • baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/match_literal_types.baml
  • baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/method_explicit_type_args.baml
  • baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/multi_segment_path.baml
  • baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/ns_agent_clients.baml
  • baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/ns_ansi.baml
  • baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/ns_game_clients.baml
  • baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/ns_images_pipeline.baml
  • baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/ns_sentiment.baml
  • baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/patterns_new.baml
  • baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/remap_role.baml
  • baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/requires_clause.baml
  • baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/retry_policy_valid_retry.baml
  • baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/self_qualified_call.baml
  • baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/signature_variety.baml
  • baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/stream_llm_inferred_typeargs.baml
  • baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/task_group.baml
  • baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/template_string.baml
  • baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/template_string_calls.baml
  • baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/test_block.baml
  • baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/test_expr_throwing_body.baml
  • baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/testset.baml
  • baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/testset_vibes_nested.baml
  • baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/throws_clause.baml
  • baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/top_level_binding.baml
  • baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/type_alias.baml
  • baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/type_aliases_jinja.baml
  • baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/watch_let.baml
  • baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/while_loop.baml
  • baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/with_keyword.baml
  • baml_language/crates/bex_project/src/bex_lsp/multi_project/request.rs
  • baml_language/crates/tools_semantic_tokens/Cargo.toml
  • baml_language/crates/tools_semantic_tokens/README.md
  • baml_language/crates/tools_semantic_tokens/src/analysis.rs
  • baml_language/crates/tools_semantic_tokens/src/index.html
  • baml_language/crates/tools_semantic_tokens/src/main.rs
  • baml_language/crates/tools_semantic_tokens/src/server.rs
  • baml_language/crates/tools_semantic_tokens/src/staleness.rs

Comment on lines +12397 to +12405
if bound {
self.resolutions.insert(
at,
crate::inference::MemberResolution::InterfaceMethod {
iface_loc,
method_name: member.clone(),
},
);
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Redundant InterfaceMethod resolution overwritten by InterfaceDefaultMethod.

In the default-method loop this records MemberResolution::InterfaceMethod at at, but Line 12667-12673 unconditionally overwrites self.resolutions[at] with InterfaceDefaultMethod before this iteration returns. So the insert here is always clobbered and never observable downstream.

Default methods have a func_loc and should resolve as InterfaceDefaultMethod (which this code already does at Line 12667); the InterfaceMethod variant is the one needed for the required-method loop (Line 12702-12710), where there is no later override. Consider dropping this insertion to avoid the redundant write and the implication that a bound default-method call resolves to InterfaceMethod.

♻️ Proposed removal
                 let func_loc = baml_compiler2_hir::loc::FunctionLoc::new(db, file, fn_id);
-                if bound {
-                    self.resolutions.insert(
-                        at,
-                        crate::inference::MemberResolution::InterfaceMethod {
-                            iface_loc,
-                            method_name: member.clone(),
-                        },
-                    );
-                }
                 let sig = baml_compiler2_ppir::elaborated_function_signature(db, func_loc);

If the intent was instead for InterfaceMethod to survive in some path, please confirm — as written it cannot.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if bound {
self.resolutions.insert(
at,
crate::inference::MemberResolution::InterfaceMethod {
iface_loc,
method_name: member.clone(),
},
);
}
let func_loc = baml_compiler2_hir::loc::FunctionLoc::new(db, file, fn_id);
let sig = baml_compiler2_ppir::elaborated_function_signature(db, func_loc);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@baml_language/crates/baml_compiler2_tir/src/builder.rs` around lines 12397 -
12405, The default-method branch in builder.rs is writing a redundant
MemberResolution::InterfaceMethod into self.resolutions that is always
overwritten later by InterfaceDefaultMethod. Remove the insert in the
default-method loop around the bound check, and keep InterfaceMethod only in the
required-method loop where no later overwrite occurs; use the existing
iface_loc/method_name and func_loc resolution paths to keep the intended
behavior clear.

}

function Handled(x: int) -> string {
MayFail(x) catch (e) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Cover the catch binding identifier.

The expected output asserts catch on Line 10, but skips the bound name e. Since this PR is about resolution-driven identifier classification, the fixture should verify that the catch binding itself is tokenized, not just the keyword.

Also applies to: 37-40

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/catch_throw.baml`
at line 10, The semantic tokens fixture for catch handling is missing coverage
for the catch binding identifier, so update the `catch_throw.baml` expectations
to include the bound name in the `MayFail(x) catch (e) {` case and the other
referenced catch blocks. Make sure the `semantic_tokens` test asserts token
classification for the catch variable itself, not just the `catch` keyword, so
the resolution-driven identifier behavior is verified through the existing
fixture.


let x = MyClass { a: 1, b: "2" };

let y = MyClass { a: 1, ...x };

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Assert the spread operand token too.

Line 8 is the new case under test, but the expected semantic-token list stops before the x in ...x. That means this fixture won't catch regressions where spread operands fail to resolve/classify.

Suggested fixture addition
 // constructor_spread.baml:8:9 (class) len=7 "MyClass"
 // constructor_spread.baml:8:19 (property) len=1 "a"
 // constructor_spread.baml:8:22 (number) len=1 "1"
+// constructor_spread.baml:8:28 (variable) len=1 "x"
 // constructor_spread.baml:11:1 (keyword) len=3 "let"

Also applies to: 43-48

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/constructor_spread.baml`
at line 8, The semantic-tokens fixture for constructor spreads is missing the
spread operand token, so extend the expected token list in the semantic token
test around the MyClass constructor spread case to include the x in ...x. Update
the fixture used by the semantic_tokens test for constructor_spread.baml (and
the related cases in the same test block) so the spread operand is asserted and
regressions in operand classification are caught.

Comment on lines +21 to +27
function ExtractResume(from_text: string) -> Resume {
client "openai/gpt-4o-mini"
prompt #"Hello"#
}

test ReturnDynamicClassTest {
functions [ExtractResume]

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Assert the ExtractResume reference here.

This fixture records the functions property but never asserts a token for ExtractResume, so it won't catch regressions in function-reference classification inside functions [...].

Also applies to: 116-118

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/dynamic_type_builder.baml`
around lines 21 - 27, The semantic-tokens fixture records `functions
[ExtractResume]` but never asserts that `ExtractResume` is classified as a
function reference, so add the missing token assertion in the
`dynamic_type_builder.baml` test around `test ReturnDynamicClassTest` using the
existing semantic token expectations for `functions[...]`. Make sure the
assertion explicitly covers the `ExtractResume` identifier so regressions in
function-reference classification inside `functions [...]` are caught, and apply
the same update to the matching fixture noted in the comment.

Comment on lines +35 to +37
// field_alias.baml:11:3 (keyword) len=10 "implements"
// field_alias.baml:11:14 (interface) len=5 "Named"
// field_alias.baml:12:10 (keyword) len=2 "as"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Assert both alias endpoints on Line 12.

This fixture only checks the as keyword for name as display_name. It never asserts semantic tokens for name or display_name, so a regression in interface-field/class-field alias resolution would still pass.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/field_alias.baml`
around lines 35 - 37, Update the semantic tokens fixture for field alias
handling so it asserts both alias endpoints, not just the `as` keyword. In
`field_alias.baml`, expand the expectations around the `name as display_name`
alias to include tokens for the source field (`name`) and the target alias
(`display_name`) alongside the existing `as` token, using the same semantic
token test pattern used elsewhere in these fixtures.

Comment on lines +440 to +445
try {
detail = await getJson(`/api/fixture?name=${encodeURIComponent(name)}`);
} catch (e) {
panesEl.innerHTML = `<div class="pane-body"><span class="muted">${esc(String(e))}</span></div>`;
return;
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Clear the previous fixture state on load failures.

If /api/fixture fails here, the old detail and accept/status state stay live. That leaves the page showing an error pane while still reflecting the previous fixture's actions and metadata. Reset detail, disable Accept, and clear the inspector in this catch path.

🧰 Tools
🪛 ast-grep (0.44.0)

[warning] 442-442: Avoid assigning untrusted data to innerHTML/outerHTML or document.write
Context: panesEl.innerHTML = <div class="pane-body"><span class="muted">${esc(String(e))}</span></div>
Note: [CWE-79] Improper Neutralization of Input During Web Page Generation ('Cross-site Scripting').

(inner-outer-html)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@baml_language/crates/tools_semantic_tokens/src/index.html` around lines 440 -
445, The load-failure handler in the fixture fetch flow should clear stale state
instead of only rendering the error pane. In the catch path around getJson in
index.html, reset detail, disable the Accept control, and clear the
inspector/status state so the UI does not keep reflecting the previous fixture
after a failed /api/fixture request.

Comment on lines +456 to +469
acceptEl.addEventListener("click", async () => {
if (selected === SCRATCH || acceptEl.disabled) return;
acceptEl.disabled = true;
try {
await getJson("/api/accept", {
method: "POST",
headers: { "content-type": "application/json" },
body: JSON.stringify({ name: selected }),
});
await refreshFixtures();
await select(selected);
} catch (e) {
statusEl.textContent = `Accept failed: ${e}`;
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Re-enable Accept after a failed save.

Line 458 disables the button, but the failure path never restores it. One transient write error forces the user to reselect the fixture before retrying. Recompute the enabled state in a finally block.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@baml_language/crates/tools_semantic_tokens/src/index.html` around lines 456 -
469, The accept button flow in the click handler for acceptEl leaves the control
disabled after a failed /api/accept request, so the user cannot retry without
reselecting a fixture. Update the async click handler to restore the enabled
state in a finally block after the getJson("/api/accept"), refreshFixtures(),
and select(selected) sequence, and base that restore on the current selection
state (for example the same condition used before disabling).

Comment on lines +87 to +95
async fn bind(base: u16) -> Result<(TcpListener, u16)> {
for offset in 0..50 {
let port = base + offset;
let addr = SocketAddr::from(([127, 0, 0, 1], port));
if let Ok(listener) = TcpListener::bind(addr).await {
return Ok((listener, port));
}
}
anyhow::bail!("no free port in {base}..{}", base + 50)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Clamp the port scan to u16::MAX.

Line 89 can overflow when a user passes a preferred port near 65535. In debug builds this panics; in release builds it can wrap and start probing unrelated low ports.

Proposed fix
 async fn bind(base: u16) -> Result<(TcpListener, u16)> {
-    for offset in 0..50 {
-        let port = base + offset;
+    for port in base..=base.saturating_add(49) {
         let addr = SocketAddr::from(([127, 0, 0, 1], port));
         if let Ok(listener) = TcpListener::bind(addr).await {
             return Ok((listener, port));
         }
     }
-    anyhow::bail!("no free port in {base}..{}", base + 50)
+    anyhow::bail!("no free port in {base}..={}", base.saturating_add(49))
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
async fn bind(base: u16) -> Result<(TcpListener, u16)> {
for offset in 0..50 {
let port = base + offset;
let addr = SocketAddr::from(([127, 0, 0, 1], port));
if let Ok(listener) = TcpListener::bind(addr).await {
return Ok((listener, port));
}
}
anyhow::bail!("no free port in {base}..{}", base + 50)
async fn bind(base: u16) -> Result<(TcpListener, u16)> {
for port in base..=base.saturating_add(49) {
let addr = SocketAddr::from(([127, 0, 0, 1], port));
if let Ok(listener) = TcpListener::bind(addr).await {
return Ok((listener, port));
}
}
anyhow::bail!("no free port in {base}..={}", base.saturating_add(49))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@baml_language/crates/tools_semantic_tokens/src/main.rs` around lines 87 - 95,
The port selection logic in bind can overflow when adding the scan offset to a
high preferred port. Update the loop in bind to clamp the upper bound at
u16::MAX before probing ports, so the SocketAddr construction never wraps or
panics; keep the rest of the TcpListener::bind retry behavior unchanged and
preserve the existing error path when no port is available.

Comment on lines +183 to +196
fn resolve_fixture(dir: &Path, name: &str) -> Result<PathBuf, ApiError> {
let is_basename = Path::new(name).file_name().map(|f| f.to_string_lossy())
== Some(std::borrow::Cow::Borrowed(name));
let is_baml = Path::new(name)
.extension()
.is_some_and(|ext| ext.eq_ignore_ascii_case("baml"));
if !is_basename || !is_baml {
return Err((StatusCode::BAD_REQUEST, "invalid fixture name".to_string()));
}
let path = dir.join(name);
if !path.is_file() {
return Err((StatusCode::NOT_FOUND, "unknown fixture".to_string()));
}
Ok(path)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔒 Security & Privacy | 🟠 Major | ⚡ Quick win

Reject symlinked fixtures before returning the path.

Line 193 uses Path::is_file(), which follows symlinks. A checked-in foo.baml symlink can therefore escape fixtures_dir, letting /api/fixture read and /api/accept overwrite files outside the fixture tree.

Suggested fix
 fn resolve_fixture(dir: &Path, name: &str) -> Result<PathBuf, ApiError> {
     let is_basename = Path::new(name).file_name().map(|f| f.to_string_lossy())
         == Some(std::borrow::Cow::Borrowed(name));
     let is_baml = Path::new(name)
         .extension()
         .is_some_and(|ext| ext.eq_ignore_ascii_case("baml"));
     if !is_basename || !is_baml {
         return Err((StatusCode::BAD_REQUEST, "invalid fixture name".to_string()));
     }
-    let path = dir.join(name);
-    if !path.is_file() {
+    let root = dir
+        .canonicalize()
+        .map_err(|e| (StatusCode::INTERNAL_SERVER_ERROR, e.to_string()))?;
+    let path = dir.join(name);
+    let meta = std::fs::symlink_metadata(&path)
+        .map_err(|_| (StatusCode::NOT_FOUND, "unknown fixture".to_string()))?;
+    if meta.file_type().is_symlink() {
+        return Err((StatusCode::BAD_REQUEST, "invalid fixture name".to_string()));
+    }
+    let path = path
+        .canonicalize()
+        .map_err(|_| (StatusCode::NOT_FOUND, "unknown fixture".to_string()))?;
+    if !path.starts_with(&root) || !path.is_file() {
         return Err((StatusCode::NOT_FOUND, "unknown fixture".to_string()));
     }
     Ok(path)
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
fn resolve_fixture(dir: &Path, name: &str) -> Result<PathBuf, ApiError> {
let is_basename = Path::new(name).file_name().map(|f| f.to_string_lossy())
== Some(std::borrow::Cow::Borrowed(name));
let is_baml = Path::new(name)
.extension()
.is_some_and(|ext| ext.eq_ignore_ascii_case("baml"));
if !is_basename || !is_baml {
return Err((StatusCode::BAD_REQUEST, "invalid fixture name".to_string()));
}
let path = dir.join(name);
if !path.is_file() {
return Err((StatusCode::NOT_FOUND, "unknown fixture".to_string()));
}
Ok(path)
fn resolve_fixture(dir: &Path, name: &str) -> Result<PathBuf, ApiError> {
let is_basename = Path::new(name).file_name().map(|f| f.to_string_lossy())
== Some(std::borrow::Cow::Borrowed(name));
let is_baml = Path::new(name)
.extension()
.is_some_and(|ext| ext.eq_ignore_ascii_case("baml"));
if !is_basename || !is_baml {
return Err((StatusCode::BAD_REQUEST, "invalid fixture name".to_string()));
}
let root = dir
.canonicalize()
.map_err(|e| (StatusCode::INTERNAL_SERVER_ERROR, e.to_string()))?;
let path = dir.join(name);
let meta = std::fs::symlink_metadata(&path)
.map_err(|_| (StatusCode::NOT_FOUND, "unknown fixture".to_string()))?;
if meta.file_type().is_symlink() {
return Err((StatusCode::BAD_REQUEST, "invalid fixture name".to_string()));
}
let path = path
.canonicalize()
.map_err(|_| (StatusCode::NOT_FOUND, "unknown fixture".to_string()))?;
if !path.starts_with(&root) || !path.is_file() {
return Err((StatusCode::NOT_FOUND, "unknown fixture".to_string()));
}
Ok(path)
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@baml_language/crates/tools_semantic_tokens/src/server.rs` around lines 183 -
196, resolve_fixture currently trusts Path::is_file, which follows symlinks and
can let fixture names escape fixtures_dir; update the validation in
resolve_fixture to reject symlinked entries before returning the path. Use the
existing resolve_fixture helper and its dir.join(name) result to check the
resolved candidate itself is a regular file owned inside the fixture tree, and
return an error for any symlink or non-regular path before /api/fixture or
/api/accept can use it.

Comment on lines +31 to +37
/// Newest mtime among the classifier (`baml_lsp2_actions`) and viewer sources.
fn newest_source_mtime() -> Option<SystemTime> {
let manifest = Path::new(env!("CARGO_MANIFEST_DIR"));
let roots = [
manifest.join("src"),
manifest.join("../baml_lsp2_actions/src"),
];

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Watch the compiler crates that semantic tokens actually depend on.

The watcher only tracks this crate plus baml_lsp2_actions, but this PR also changes token behavior in baml_compiler2_tir, baml_compiler2_mir, and baml_compiler_syntax. Editing those files leaves the viewer stale, so the self-restart contract breaks across a big part of the classifier stack.

Proposed fix
 fn newest_source_mtime() -> Option<SystemTime> {
     let manifest = Path::new(env!("CARGO_MANIFEST_DIR"));
     let roots = [
         manifest.join("src"),
         manifest.join("../baml_lsp2_actions/src"),
+        manifest.join("../baml_compiler2_tir/src"),
+        manifest.join("../baml_compiler2_mir/src"),
+        manifest.join("../baml_compiler_syntax/src"),
     ];
     let mut newest = None;
     for root in roots {
         newest_under(&root, &mut newest);
     }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
/// Newest mtime among the classifier (`baml_lsp2_actions`) and viewer sources.
fn newest_source_mtime() -> Option<SystemTime> {
let manifest = Path::new(env!("CARGO_MANIFEST_DIR"));
let roots = [
manifest.join("src"),
manifest.join("../baml_lsp2_actions/src"),
];
fn newest_source_mtime() -> Option<SystemTime> {
let manifest = Path::new(env!("CARGO_MANIFEST_DIR"));
let roots = [
manifest.join("src"),
manifest.join("../baml_lsp2_actions/src"),
manifest.join("../baml_compiler2_tir/src"),
manifest.join("../baml_compiler2_mir/src"),
manifest.join("../baml_compiler_syntax/src"),
];
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@baml_language/crates/tools_semantic_tokens/src/staleness.rs` around lines 31
- 37, The staleness watcher in newest_source_mtime only tracks this crate and
baml_lsp2_actions, but semantic token behavior also depends on
baml_compiler2_tir, baml_compiler2_mir, and baml_compiler_syntax. Update the
roots list (and any related watcher logic in staleness.rs) so those compiler
crates are included in the mtime scan, ensuring edits there trigger the viewer
restart and keep semantic tokens fresh.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant