feat(lsp): resolution-first semantic tokens rework#3867
Conversation
Rework the BAML semantic-tokens classifier to be resolution-driven (rust-analyzer style): walk the lossless CST for token positions and syntactic tokens (keywords/operators/punctuation), and resolve identifier meaning through the HIR/inference rather than guessing from the CST. - tokens/index.rs: span-keyed resolution index built over every inference-bearing scope (function / lambda / let) via the new uniform tir::scope_body lookup, so names in spawn/block/closure bodies classify with their own scope's inference. - Interface members now resolve like class members: resolve_interface_member records MemberResolution::InterfaceMethod / InterfaceField. MIR-safe (new variants lower to None item-refs and fall through to interface dispatch). - Highlighting fixes: as / with / is keywords, generators, generic type args and params (typeParameter), associated types (decl + binding), self member accesses, methods after .as<Interface>, string escapes (escapeSequence), byte strings, qualified type paths, true/false/null literals, object/map keys vs values. - bitflags-backed ModifierSet; classifier shared with `baml describe`. - tools_semantic_tokens: standalone web viewer for the classifier. 78 fixtures. See crates/baml_lsp2_actions/src/tokens/GAPS.md for the scaling next-steps (on-demand resolution, range/viewport, delta) and the remaining known gaps.
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
📝 WalkthroughWalkthroughSemantic-token classification now carries modifiers and escape-sequence tokens, interface-member resolution is added through TIR/MIR/LSP paths, and a local viewer plus fixtures reflect the revised output. ChangesSemantic Token Pipeline
Sequence Diagram(s)Accept snapshot flow. sequenceDiagram
participant BrowserUI
participant server_accept_fixture
participant analysis_accept_fixture
participant runner_run_test
participant updater_update_test_file
BrowserUI->>server_accept_fixture: POST /api/accept
server_accept_fixture->>analysis_accept_fixture: accept_fixture(name)
analysis_accept_fixture->>runner_run_test: run_test(path)
runner_run_test->>updater_update_test_file: write semantic_tokens block
updater_update_test_file-->>analysis_accept_fixture: updated file
analysis_accept_fixture-->>server_accept_fixture: ok
server_accept_fixture-->>BrowserUI: { ok: true }
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
⚔️ Resolve merge conflicts
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 20
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
baml_language/crates/baml_compiler2_mir/src/lower.rs (1)
6111-6145: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick winHandle captured roots in interface-method value lowering.
This path only runs when the first segment is a local. A captured receiver in
let f = x.eqskips the candidate-switch branch and falls through tolower_multi_segment_path_as_field_chain, so the method value gets lowered like a field/map lookup instead of a bound-method dispatch.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@baml_language/crates/baml_compiler2_mir/src/lower.rs` around lines 6111 - 6145, The interface-method value lowering in the path that starts by checking self.locals for the first segment only handles plain locals, so captured receivers like let f = x.eq bypass bound-method dispatch and get treated as field/map chains. Update the logic in lower.rs around the method-candidate resolution branch to recognize captured roots as valid receivers, then route them through lower_path_receiver_to_local and emit_bound_method_candidate_switch just like local roots, instead of falling through to lower_multi_segment_path_as_field_chain.
🧹 Nitpick comments (1)
baml_language/crates/tools_semantic_tokens/src/analysis.rs (1)
46-244: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winAdd unit tests around the offset/parser helpers.
This file is mostly pure string/offset logic, so a small
#[cfg(test)]module for UTF-16 offsets, escaped snapshot lines, and text-only diff cases would give faster and more targeted coverage than end-to-end fixtures alone. As per coding guidelines, "Prefer writing Rust unit tests over integration tests where possible."🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@baml_language/crates/tools_semantic_tokens/src/analysis.rs` around lines 46 - 244, Add a focused #[cfg(test)] module in analysis.rs to unit test the pure helpers instead of relying only on fixture flows. Cover offset_to_line_col, utf16_offset, parse_expected, and diff_count with cases for non-ASCII/UTF-16 positioning, escaped or quoted snapshot text lines, and text-only token diffs. Use the existing helper symbols and Token shape to keep the tests local and resilient to line changes.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@baml_language/crates/baml_compiler2_tir/src/builder.rs`:
- Around line 12397-12405: The default-method branch in builder.rs is writing a
redundant MemberResolution::InterfaceMethod into self.resolutions that is always
overwritten later by InterfaceDefaultMethod. Remove the insert in the
default-method loop around the bound check, and keep InterfaceMethod only in the
required-method loop where no later overwrite occurs; use the existing
iface_loc/method_name and func_loc resolution paths to keep the intended
behavior clear.
In
`@baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/catch_throw.baml`:
- Line 10: The semantic tokens fixture for catch handling is missing coverage
for the catch binding identifier, so update the `catch_throw.baml` expectations
to include the bound name in the `MayFail(x) catch (e) {` case and the other
referenced catch blocks. Make sure the `semantic_tokens` test asserts token
classification for the catch variable itself, not just the `catch` keyword, so
the resolution-driven identifier behavior is verified through the existing
fixture.
In
`@baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/constructor_spread.baml`:
- Line 8: The semantic-tokens fixture for constructor spreads is missing the
spread operand token, so extend the expected token list in the semantic token
test around the MyClass constructor spread case to include the x in ...x. Update
the fixture used by the semantic_tokens test for constructor_spread.baml (and
the related cases in the same test block) so the spread operand is asserted and
regressions in operand classification are caught.
In
`@baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/dynamic_type_builder.baml`:
- Around line 21-27: The semantic-tokens fixture records `functions
[ExtractResume]` but never asserts that `ExtractResume` is classified as a
function reference, so add the missing token assertion in the
`dynamic_type_builder.baml` test around `test ReturnDynamicClassTest` using the
existing semantic token expectations for `functions[...]`. Make sure the
assertion explicitly covers the `ExtractResume` identifier so regressions in
function-reference classification inside `functions [...]` are caught, and apply
the same update to the matching fixture noted in the comment.
In
`@baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/field_alias.baml`:
- Around line 35-37: Update the semantic tokens fixture for field alias handling
so it asserts both alias endpoints, not just the `as` keyword. In
`field_alias.baml`, expand the expectations around the `name as display_name`
alias to include tokens for the source field (`name`) and the target alias
(`display_name`) alongside the existing `as` token, using the same semantic
token test pattern used elsewhere in these fixtures.
In
`@baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/index_access.baml`:
- Around line 45-46: The semantic tokens fixture is incorrectly marked as
diagnostic-free even though the preceding examples document validation errors.
Update the fixture around the diagnostics marker so it no longer uses
<no-diagnostics-expected> in the index_access.baml test content; either add the
expected diagnostics for the cases shown above or split the negative examples
into a separate fixture and keep this one truly diagnostic-free. Use the
existing semantic_tokens fixture structure and diagnostics comment block to
locate the change.
In
`@baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/ns_images_pipeline.baml`:
- Around line 40-41: The semantic token test only asserts the client property
key in ns_images_pipeline.baml and misses the actual client symbol reference, so
regressions in client-reference classification could go unnoticed. Update the
expected tokens in the semantic tokens test data to also assert the
AiGatewayImagenImageClient reference inside the function config, using the
existing semantic token fixture around the client/prompt entries to keep the
check tied to the relevant symbols.
In
`@baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/requires_clause.baml`:
- Around line 1-30: The semantic-tokens fixture for requires-chain coverage does
not currently exercise the transitive Aged member through Person requires Named,
Aged. Update the test content in requires_clause.baml to add a Person-typed or
self.age access that resolves age via the requires chain, so the fixture
actually validates the grandparent-interface flow instead of only the concrete
Employee field. Use the existing Person, Aged, and get_age symbols to place the
new access alongside the current example.
In `@baml_language/crates/baml_lsp2_actions/src/definition.rs`:
- Around line 426-436: The interface member resolution in definition.rs is
currently collapsing both MemberResolution::InterfaceMethod and
MemberResolution::InterfaceField to Definition::Interface, which makes
go-to-definition jump to the interface header instead of the accessed member.
Update the definition lookup in the branch handling
MemberResolution::InterfaceMethod and MemberResolution::InterfaceField to use
the member-specific identifier from the resolution and resolve the corresponding
interface method/field span, following the same pattern used by the Field and
method branches above. Ensure utils::definition_span is called with the member
declaration target rather than the interface declaration.
In `@baml_language/crates/baml_lsp2_actions/src/tokens/index.rs`:
- Around line 56-63: The token indexing logic in the scope filter is skipping
template-body scopes entirely, which leaves template interpolations like string
tokens unprocessed. Update the scope handling in the token indexer to not
unconditionally continue on scope.is_template_body, and instead let those bodies
be indexed with the same inference context as the enclosing ExprBody. Keep the
existing function/lambda/let scope checks in the token indexing flow, but ensure
template-body synthetic scopes are still traversed so resolution-driven
highlighting works.
- Around line 157-176: In tokens/index.rs, the fallback in the
path-classification logic currently marks unresolved leaf segments as Namespace,
which incorrectly masks typos as valid identifiers. Update the resolve/classify
flow around the `resolve_path_at` and `classify::classify_resolved` branch so
only non-leaf prefixes can fall back to `SemanticTokenType::Namespace`, while
the final unresolved segment stays neutral when `inference.resolution(expr_id)`
is missing and `local_root` does not apply.
In `@baml_language/crates/tools_semantic_tokens/src/analysis.rs`:
- Around line 202-207: `load_fixture` and `list_fixture_names` are swallowing
load failures by fabricating empty results, so update them to return errors
instead of defaulting to empty source or empty names. In `load_fixture`, use the
parsed file presence in `parsed.files` as a required condition and surface an
`InvalidData` error when no virtual file is available; in `list_fixture_names`,
stop flattening away `read_dir` failures and propagate the underlying I/O error.
Keep the fix localized to the `load_fixture` and `list_fixture_names` paths so
callers can distinguish “no data” from “failed to load.”
- Around line 225-243: The diff_count function is only comparing token type and
modifiers for matching position/length, so text-only snapshot drift can be
missed even when the committed semantic_tokens fixture would still change.
Update diff_count in analysis.rs to also compare the normalized committed lexeme
or raw snapshot text for each token key, and treat any text-only mismatch as a
diff alongside the existing type/modifier checks.
In `@baml_language/crates/tools_semantic_tokens/src/index.html`:
- Around line 440-445: The load-failure handler in the fixture fetch flow should
clear stale state instead of only rendering the error pane. In the catch path
around getJson in index.html, reset detail, disable the Accept control, and
clear the inspector/status state so the UI does not keep reflecting the previous
fixture after a failed /api/fixture request.
- Around line 456-469: The accept button flow in the click handler for acceptEl
leaves the control disabled after a failed /api/accept request, so the user
cannot retry without reselecting a fixture. Update the async click handler to
restore the enabled state in a finally block after the getJson("/api/accept"),
refreshFixtures(), and select(selected) sequence, and base that restore on the
current selection state (for example the same condition used before disabling).
- Around line 296-312: Stop interpolating fixture names directly into HTML
attributes in renderList(). The current esc() usage in the fixture button markup
still leaves data-name and title vulnerable because quotes are not escaped.
Update the row-building logic in renderList() to set these attributes through
DOM APIs or use an attribute-safe escape for f.name before inserting it, while
keeping the Scratchpad row consistent.
- Around line 414-427: The renderScratch flow applies any finished /api/tokens
response even if it was started for an older scratchText value, so update it to
ignore stale results. Add a request version/sequence check inside renderScratch
around the getJson("/api/tokens") call and only assign scratchTokens and render
when the response still matches the latest request for the current scratchText,
using the existing renderScratch, scratchTokens, and selected/SCRATCH logic to
locate the fix.
In `@baml_language/crates/tools_semantic_tokens/src/main.rs`:
- Around line 87-95: The port selection logic in bind can overflow when adding
the scan offset to a high preferred port. Update the loop in bind to clamp the
upper bound at u16::MAX before probing ports, so the SocketAddr construction
never wraps or panics; keep the rest of the TcpListener::bind retry behavior
unchanged and preserve the existing error path when no port is available.
In `@baml_language/crates/tools_semantic_tokens/src/server.rs`:
- Around line 183-196: resolve_fixture currently trusts Path::is_file, which
follows symlinks and can let fixture names escape fixtures_dir; update the
validation in resolve_fixture to reject symlinked entries before returning the
path. Use the existing resolve_fixture helper and its dir.join(name) result to
check the resolved candidate itself is a regular file owned inside the fixture
tree, and return an error for any symlink or non-regular path before
/api/fixture or /api/accept can use it.
In `@baml_language/crates/tools_semantic_tokens/src/staleness.rs`:
- Around line 31-37: The staleness watcher in newest_source_mtime only tracks
this crate and baml_lsp2_actions, but semantic token behavior also depends on
baml_compiler2_tir, baml_compiler2_mir, and baml_compiler_syntax. Update the
roots list (and any related watcher logic in staleness.rs) so those compiler
crates are included in the mtime scan, ensuring edits there trigger the viewer
restart and keep semantic tokens fresh.
---
Outside diff comments:
In `@baml_language/crates/baml_compiler2_mir/src/lower.rs`:
- Around line 6111-6145: The interface-method value lowering in the path that
starts by checking self.locals for the first segment only handles plain locals,
so captured receivers like let f = x.eq bypass bound-method dispatch and get
treated as field/map chains. Update the logic in lower.rs around the
method-candidate resolution branch to recognize captured roots as valid
receivers, then route them through lower_path_receiver_to_local and
emit_bound_method_candidate_switch just like local roots, instead of falling
through to lower_multi_segment_path_as_field_chain.
---
Nitpick comments:
In `@baml_language/crates/tools_semantic_tokens/src/analysis.rs`:
- Around line 46-244: Add a focused #[cfg(test)] module in analysis.rs to unit
test the pure helpers instead of relying only on fixture flows. Cover
offset_to_line_col, utf16_offset, parse_expected, and diff_count with cases for
non-ASCII/UTF-16 positioning, escaped or quoted snapshot text lines, and
text-only token diffs. Use the existing helper symbols and Token shape to keep
the tests local and resilient to line changes.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 3f4c5b6e-52a8-4a11-bea9-e5d65b8d4846
⛔ Files ignored due to path filters (1)
baml_language/Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (100)
baml_language/Cargo.tomlbaml_language/crates/baml_cli/src/paint.rsbaml_language/crates/baml_compiler2_mir/src/lower.rsbaml_language/crates/baml_compiler2_tir/src/builder.rsbaml_language/crates/baml_compiler2_tir/src/inference.rsbaml_language/crates/baml_compiler_syntax/src/syntax_kind.rsbaml_language/crates/baml_lsp2_actions/Cargo.tomlbaml_language/crates/baml_lsp2_actions/src/definition.rsbaml_language/crates/baml_lsp2_actions/src/lib.rsbaml_language/crates/baml_lsp2_actions/src/tokens.rsbaml_language/crates/baml_lsp2_actions/src/tokens/GAPS.mdbaml_language/crates/baml_lsp2_actions/src/tokens/classify.rsbaml_language/crates/baml_lsp2_actions/src/tokens/index.rsbaml_language/crates/baml_lsp2_actions_tests/src/runner.rsbaml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/array_pattern.bamlbaml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/attributes.bamlbaml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/backtick_string.bamlbaml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/block_attributes.bamlbaml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/break_continue.bamlbaml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/byte_strings.bamlbaml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/cancel_token.bamlbaml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/catch_all_keyword.bamlbaml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/catch_throw.bamlbaml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/class.bamlbaml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/client.bamlbaml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/config_dictionary.bamlbaml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/const_binding.bamlbaml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/const_let_else_defer.bamlbaml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/constructor_spread.bamlbaml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/deep_method_call.bamlbaml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/dynamic_type_builder.bamlbaml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/enum_decls.bamlbaml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/enum_member_access.bamlbaml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/exhaustive.bamlbaml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/extends_chain.bamlbaml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/field_alias.bamlbaml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/for_loops.bamlbaml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/function.bamlbaml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/generators.bamlbaml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/generic_field_chain.bamlbaml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/generic_function_call.bamlbaml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/header_comments.bamlbaml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/http_config.bamlbaml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/if_else.bamlbaml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/if_expression.bamlbaml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/if_let_chain.bamlbaml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/implements_for.bamlbaml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/index_access.bamlbaml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/interface_impl.bamlbaml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/interfaces_inferred_generic_type_args.bamlbaml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/interfaces_iter_core.bamlbaml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/interfaces_sort_comparable.bamlbaml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/is_operator.bamlbaml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/iterator.bamlbaml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/jinja_control_prompt.bamlbaml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/json_map_literal.bamlbaml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/json_parse_stringify_intrinsics.bamlbaml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/lambda_advanced.bamlbaml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/literal_types.bamlbaml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/llm_function.bamlbaml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/map_literal_keys.bamlbaml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/map_type_and_methods.bamlbaml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/match_expr.bamlbaml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/match_literal_types.bamlbaml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/method_explicit_type_args.bamlbaml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/multi_segment_path.bamlbaml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/ns_agent_clients.bamlbaml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/ns_ansi.bamlbaml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/ns_game_clients.bamlbaml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/ns_images_pipeline.bamlbaml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/ns_sentiment.bamlbaml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/patterns_new.bamlbaml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/remap_role.bamlbaml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/requires_clause.bamlbaml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/retry_policy_valid_retry.bamlbaml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/self_qualified_call.bamlbaml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/signature_variety.bamlbaml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/stream_llm_inferred_typeargs.bamlbaml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/task_group.bamlbaml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/template_string.bamlbaml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/template_string_calls.bamlbaml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/test_block.bamlbaml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/test_expr_throwing_body.bamlbaml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/testset.bamlbaml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/testset_vibes_nested.bamlbaml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/throws_clause.bamlbaml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/top_level_binding.bamlbaml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/type_alias.bamlbaml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/type_aliases_jinja.bamlbaml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/watch_let.bamlbaml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/while_loop.bamlbaml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/with_keyword.bamlbaml_language/crates/bex_project/src/bex_lsp/multi_project/request.rsbaml_language/crates/tools_semantic_tokens/Cargo.tomlbaml_language/crates/tools_semantic_tokens/README.mdbaml_language/crates/tools_semantic_tokens/src/analysis.rsbaml_language/crates/tools_semantic_tokens/src/index.htmlbaml_language/crates/tools_semantic_tokens/src/main.rsbaml_language/crates/tools_semantic_tokens/src/server.rsbaml_language/crates/tools_semantic_tokens/src/staleness.rs
| if bound { | ||
| self.resolutions.insert( | ||
| at, | ||
| crate::inference::MemberResolution::InterfaceMethod { | ||
| iface_loc, | ||
| method_name: member.clone(), | ||
| }, | ||
| ); | ||
| } |
There was a problem hiding this comment.
📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win
Redundant InterfaceMethod resolution overwritten by InterfaceDefaultMethod.
In the default-method loop this records MemberResolution::InterfaceMethod at at, but Line 12667-12673 unconditionally overwrites self.resolutions[at] with InterfaceDefaultMethod before this iteration returns. So the insert here is always clobbered and never observable downstream.
Default methods have a func_loc and should resolve as InterfaceDefaultMethod (which this code already does at Line 12667); the InterfaceMethod variant is the one needed for the required-method loop (Line 12702-12710), where there is no later override. Consider dropping this insertion to avoid the redundant write and the implication that a bound default-method call resolves to InterfaceMethod.
♻️ Proposed removal
let func_loc = baml_compiler2_hir::loc::FunctionLoc::new(db, file, fn_id);
- if bound {
- self.resolutions.insert(
- at,
- crate::inference::MemberResolution::InterfaceMethod {
- iface_loc,
- method_name: member.clone(),
- },
- );
- }
let sig = baml_compiler2_ppir::elaborated_function_signature(db, func_loc);If the intent was instead for InterfaceMethod to survive in some path, please confirm — as written it cannot.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if bound { | |
| self.resolutions.insert( | |
| at, | |
| crate::inference::MemberResolution::InterfaceMethod { | |
| iface_loc, | |
| method_name: member.clone(), | |
| }, | |
| ); | |
| } | |
| let func_loc = baml_compiler2_hir::loc::FunctionLoc::new(db, file, fn_id); | |
| let sig = baml_compiler2_ppir::elaborated_function_signature(db, func_loc); |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@baml_language/crates/baml_compiler2_tir/src/builder.rs` around lines 12397 -
12405, The default-method branch in builder.rs is writing a redundant
MemberResolution::InterfaceMethod into self.resolutions that is always
overwritten later by InterfaceDefaultMethod. Remove the insert in the
default-method loop around the bound check, and keep InterfaceMethod only in the
required-method loop where no later overwrite occurs; use the existing
iface_loc/method_name and func_loc resolution paths to keep the intended
behavior clear.
| } | ||
|
|
||
| function Handled(x: int) -> string { | ||
| MayFail(x) catch (e) { |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
Cover the catch binding identifier.
The expected output asserts catch on Line 10, but skips the bound name e. Since this PR is about resolution-driven identifier classification, the fixture should verify that the catch binding itself is tokenized, not just the keyword.
Also applies to: 37-40
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/catch_throw.baml`
at line 10, The semantic tokens fixture for catch handling is missing coverage
for the catch binding identifier, so update the `catch_throw.baml` expectations
to include the bound name in the `MayFail(x) catch (e) {` case and the other
referenced catch blocks. Make sure the `semantic_tokens` test asserts token
classification for the catch variable itself, not just the `catch` keyword, so
the resolution-driven identifier behavior is verified through the existing
fixture.
|
|
||
| let x = MyClass { a: 1, b: "2" }; | ||
|
|
||
| let y = MyClass { a: 1, ...x }; |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
Assert the spread operand token too.
Line 8 is the new case under test, but the expected semantic-token list stops before the x in ...x. That means this fixture won't catch regressions where spread operands fail to resolve/classify.
Suggested fixture addition
// constructor_spread.baml:8:9 (class) len=7 "MyClass"
// constructor_spread.baml:8:19 (property) len=1 "a"
// constructor_spread.baml:8:22 (number) len=1 "1"
+// constructor_spread.baml:8:28 (variable) len=1 "x"
// constructor_spread.baml:11:1 (keyword) len=3 "let"Also applies to: 43-48
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/constructor_spread.baml`
at line 8, The semantic-tokens fixture for constructor spreads is missing the
spread operand token, so extend the expected token list in the semantic token
test around the MyClass constructor spread case to include the x in ...x. Update
the fixture used by the semantic_tokens test for constructor_spread.baml (and
the related cases in the same test block) so the spread operand is asserted and
regressions in operand classification are caught.
| function ExtractResume(from_text: string) -> Resume { | ||
| client "openai/gpt-4o-mini" | ||
| prompt #"Hello"# | ||
| } | ||
|
|
||
| test ReturnDynamicClassTest { | ||
| functions [ExtractResume] |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
Assert the ExtractResume reference here.
This fixture records the functions property but never asserts a token for ExtractResume, so it won't catch regressions in function-reference classification inside functions [...].
Also applies to: 116-118
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/dynamic_type_builder.baml`
around lines 21 - 27, The semantic-tokens fixture records `functions
[ExtractResume]` but never asserts that `ExtractResume` is classified as a
function reference, so add the missing token assertion in the
`dynamic_type_builder.baml` test around `test ReturnDynamicClassTest` using the
existing semantic token expectations for `functions[...]`. Make sure the
assertion explicitly covers the `ExtractResume` identifier so regressions in
function-reference classification inside `functions [...]` are caught, and apply
the same update to the matching fixture noted in the comment.
| // field_alias.baml:11:3 (keyword) len=10 "implements" | ||
| // field_alias.baml:11:14 (interface) len=5 "Named" | ||
| // field_alias.baml:12:10 (keyword) len=2 "as" |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
Assert both alias endpoints on Line 12.
This fixture only checks the as keyword for name as display_name. It never asserts semantic tokens for name or display_name, so a regression in interface-field/class-field alias resolution would still pass.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@baml_language/crates/baml_lsp2_actions_tests/test_files/semantic_tokens/field_alias.baml`
around lines 35 - 37, Update the semantic tokens fixture for field alias
handling so it asserts both alias endpoints, not just the `as` keyword. In
`field_alias.baml`, expand the expectations around the `name as display_name`
alias to include tokens for the source field (`name`) and the target alias
(`display_name`) alongside the existing `as` token, using the same semantic
token test pattern used elsewhere in these fixtures.
| try { | ||
| detail = await getJson(`/api/fixture?name=${encodeURIComponent(name)}`); | ||
| } catch (e) { | ||
| panesEl.innerHTML = `<div class="pane-body"><span class="muted">${esc(String(e))}</span></div>`; | ||
| return; | ||
| } |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
Clear the previous fixture state on load failures.
If /api/fixture fails here, the old detail and accept/status state stay live. That leaves the page showing an error pane while still reflecting the previous fixture's actions and metadata. Reset detail, disable Accept, and clear the inspector in this catch path.
🧰 Tools
🪛 ast-grep (0.44.0)
[warning] 442-442: Avoid assigning untrusted data to innerHTML/outerHTML or document.write
Context: panesEl.innerHTML = <div class="pane-body"><span class="muted">${esc(String(e))}</span></div>
Note: [CWE-79] Improper Neutralization of Input During Web Page Generation ('Cross-site Scripting').
(inner-outer-html)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@baml_language/crates/tools_semantic_tokens/src/index.html` around lines 440 -
445, The load-failure handler in the fixture fetch flow should clear stale state
instead of only rendering the error pane. In the catch path around getJson in
index.html, reset detail, disable the Accept control, and clear the
inspector/status state so the UI does not keep reflecting the previous fixture
after a failed /api/fixture request.
| acceptEl.addEventListener("click", async () => { | ||
| if (selected === SCRATCH || acceptEl.disabled) return; | ||
| acceptEl.disabled = true; | ||
| try { | ||
| await getJson("/api/accept", { | ||
| method: "POST", | ||
| headers: { "content-type": "application/json" }, | ||
| body: JSON.stringify({ name: selected }), | ||
| }); | ||
| await refreshFixtures(); | ||
| await select(selected); | ||
| } catch (e) { | ||
| statusEl.textContent = `Accept failed: ${e}`; | ||
| } |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
Re-enable Accept after a failed save.
Line 458 disables the button, but the failure path never restores it. One transient write error forces the user to reselect the fixture before retrying. Recompute the enabled state in a finally block.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@baml_language/crates/tools_semantic_tokens/src/index.html` around lines 456 -
469, The accept button flow in the click handler for acceptEl leaves the control
disabled after a failed /api/accept request, so the user cannot retry without
reselecting a fixture. Update the async click handler to restore the enabled
state in a finally block after the getJson("/api/accept"), refreshFixtures(),
and select(selected) sequence, and base that restore on the current selection
state (for example the same condition used before disabling).
| async fn bind(base: u16) -> Result<(TcpListener, u16)> { | ||
| for offset in 0..50 { | ||
| let port = base + offset; | ||
| let addr = SocketAddr::from(([127, 0, 0, 1], port)); | ||
| if let Ok(listener) = TcpListener::bind(addr).await { | ||
| return Ok((listener, port)); | ||
| } | ||
| } | ||
| anyhow::bail!("no free port in {base}..{}", base + 50) |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
Clamp the port scan to u16::MAX.
Line 89 can overflow when a user passes a preferred port near 65535. In debug builds this panics; in release builds it can wrap and start probing unrelated low ports.
Proposed fix
async fn bind(base: u16) -> Result<(TcpListener, u16)> {
- for offset in 0..50 {
- let port = base + offset;
+ for port in base..=base.saturating_add(49) {
let addr = SocketAddr::from(([127, 0, 0, 1], port));
if let Ok(listener) = TcpListener::bind(addr).await {
return Ok((listener, port));
}
}
- anyhow::bail!("no free port in {base}..{}", base + 50)
+ anyhow::bail!("no free port in {base}..={}", base.saturating_add(49))
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| async fn bind(base: u16) -> Result<(TcpListener, u16)> { | |
| for offset in 0..50 { | |
| let port = base + offset; | |
| let addr = SocketAddr::from(([127, 0, 0, 1], port)); | |
| if let Ok(listener) = TcpListener::bind(addr).await { | |
| return Ok((listener, port)); | |
| } | |
| } | |
| anyhow::bail!("no free port in {base}..{}", base + 50) | |
| async fn bind(base: u16) -> Result<(TcpListener, u16)> { | |
| for port in base..=base.saturating_add(49) { | |
| let addr = SocketAddr::from(([127, 0, 0, 1], port)); | |
| if let Ok(listener) = TcpListener::bind(addr).await { | |
| return Ok((listener, port)); | |
| } | |
| } | |
| anyhow::bail!("no free port in {base}..={}", base.saturating_add(49)) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@baml_language/crates/tools_semantic_tokens/src/main.rs` around lines 87 - 95,
The port selection logic in bind can overflow when adding the scan offset to a
high preferred port. Update the loop in bind to clamp the upper bound at
u16::MAX before probing ports, so the SocketAddr construction never wraps or
panics; keep the rest of the TcpListener::bind retry behavior unchanged and
preserve the existing error path when no port is available.
| fn resolve_fixture(dir: &Path, name: &str) -> Result<PathBuf, ApiError> { | ||
| let is_basename = Path::new(name).file_name().map(|f| f.to_string_lossy()) | ||
| == Some(std::borrow::Cow::Borrowed(name)); | ||
| let is_baml = Path::new(name) | ||
| .extension() | ||
| .is_some_and(|ext| ext.eq_ignore_ascii_case("baml")); | ||
| if !is_basename || !is_baml { | ||
| return Err((StatusCode::BAD_REQUEST, "invalid fixture name".to_string())); | ||
| } | ||
| let path = dir.join(name); | ||
| if !path.is_file() { | ||
| return Err((StatusCode::NOT_FOUND, "unknown fixture".to_string())); | ||
| } | ||
| Ok(path) |
There was a problem hiding this comment.
🔒 Security & Privacy | 🟠 Major | ⚡ Quick win
Reject symlinked fixtures before returning the path.
Line 193 uses Path::is_file(), which follows symlinks. A checked-in foo.baml symlink can therefore escape fixtures_dir, letting /api/fixture read and /api/accept overwrite files outside the fixture tree.
Suggested fix
fn resolve_fixture(dir: &Path, name: &str) -> Result<PathBuf, ApiError> {
let is_basename = Path::new(name).file_name().map(|f| f.to_string_lossy())
== Some(std::borrow::Cow::Borrowed(name));
let is_baml = Path::new(name)
.extension()
.is_some_and(|ext| ext.eq_ignore_ascii_case("baml"));
if !is_basename || !is_baml {
return Err((StatusCode::BAD_REQUEST, "invalid fixture name".to_string()));
}
- let path = dir.join(name);
- if !path.is_file() {
+ let root = dir
+ .canonicalize()
+ .map_err(|e| (StatusCode::INTERNAL_SERVER_ERROR, e.to_string()))?;
+ let path = dir.join(name);
+ let meta = std::fs::symlink_metadata(&path)
+ .map_err(|_| (StatusCode::NOT_FOUND, "unknown fixture".to_string()))?;
+ if meta.file_type().is_symlink() {
+ return Err((StatusCode::BAD_REQUEST, "invalid fixture name".to_string()));
+ }
+ let path = path
+ .canonicalize()
+ .map_err(|_| (StatusCode::NOT_FOUND, "unknown fixture".to_string()))?;
+ if !path.starts_with(&root) || !path.is_file() {
return Err((StatusCode::NOT_FOUND, "unknown fixture".to_string()));
}
Ok(path)
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| fn resolve_fixture(dir: &Path, name: &str) -> Result<PathBuf, ApiError> { | |
| let is_basename = Path::new(name).file_name().map(|f| f.to_string_lossy()) | |
| == Some(std::borrow::Cow::Borrowed(name)); | |
| let is_baml = Path::new(name) | |
| .extension() | |
| .is_some_and(|ext| ext.eq_ignore_ascii_case("baml")); | |
| if !is_basename || !is_baml { | |
| return Err((StatusCode::BAD_REQUEST, "invalid fixture name".to_string())); | |
| } | |
| let path = dir.join(name); | |
| if !path.is_file() { | |
| return Err((StatusCode::NOT_FOUND, "unknown fixture".to_string())); | |
| } | |
| Ok(path) | |
| fn resolve_fixture(dir: &Path, name: &str) -> Result<PathBuf, ApiError> { | |
| let is_basename = Path::new(name).file_name().map(|f| f.to_string_lossy()) | |
| == Some(std::borrow::Cow::Borrowed(name)); | |
| let is_baml = Path::new(name) | |
| .extension() | |
| .is_some_and(|ext| ext.eq_ignore_ascii_case("baml")); | |
| if !is_basename || !is_baml { | |
| return Err((StatusCode::BAD_REQUEST, "invalid fixture name".to_string())); | |
| } | |
| let root = dir | |
| .canonicalize() | |
| .map_err(|e| (StatusCode::INTERNAL_SERVER_ERROR, e.to_string()))?; | |
| let path = dir.join(name); | |
| let meta = std::fs::symlink_metadata(&path) | |
| .map_err(|_| (StatusCode::NOT_FOUND, "unknown fixture".to_string()))?; | |
| if meta.file_type().is_symlink() { | |
| return Err((StatusCode::BAD_REQUEST, "invalid fixture name".to_string())); | |
| } | |
| let path = path | |
| .canonicalize() | |
| .map_err(|_| (StatusCode::NOT_FOUND, "unknown fixture".to_string()))?; | |
| if !path.starts_with(&root) || !path.is_file() { | |
| return Err((StatusCode::NOT_FOUND, "unknown fixture".to_string())); | |
| } | |
| Ok(path) | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@baml_language/crates/tools_semantic_tokens/src/server.rs` around lines 183 -
196, resolve_fixture currently trusts Path::is_file, which follows symlinks and
can let fixture names escape fixtures_dir; update the validation in
resolve_fixture to reject symlinked entries before returning the path. Use the
existing resolve_fixture helper and its dir.join(name) result to check the
resolved candidate itself is a regular file owned inside the fixture tree, and
return an error for any symlink or non-regular path before /api/fixture or
/api/accept can use it.
| /// Newest mtime among the classifier (`baml_lsp2_actions`) and viewer sources. | ||
| fn newest_source_mtime() -> Option<SystemTime> { | ||
| let manifest = Path::new(env!("CARGO_MANIFEST_DIR")); | ||
| let roots = [ | ||
| manifest.join("src"), | ||
| manifest.join("../baml_lsp2_actions/src"), | ||
| ]; |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
Watch the compiler crates that semantic tokens actually depend on.
The watcher only tracks this crate plus baml_lsp2_actions, but this PR also changes token behavior in baml_compiler2_tir, baml_compiler2_mir, and baml_compiler_syntax. Editing those files leaves the viewer stale, so the self-restart contract breaks across a big part of the classifier stack.
Proposed fix
fn newest_source_mtime() -> Option<SystemTime> {
let manifest = Path::new(env!("CARGO_MANIFEST_DIR"));
let roots = [
manifest.join("src"),
manifest.join("../baml_lsp2_actions/src"),
+ manifest.join("../baml_compiler2_tir/src"),
+ manifest.join("../baml_compiler2_mir/src"),
+ manifest.join("../baml_compiler_syntax/src"),
];
let mut newest = None;
for root in roots {
newest_under(&root, &mut newest);
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| /// Newest mtime among the classifier (`baml_lsp2_actions`) and viewer sources. | |
| fn newest_source_mtime() -> Option<SystemTime> { | |
| let manifest = Path::new(env!("CARGO_MANIFEST_DIR")); | |
| let roots = [ | |
| manifest.join("src"), | |
| manifest.join("../baml_lsp2_actions/src"), | |
| ]; | |
| fn newest_source_mtime() -> Option<SystemTime> { | |
| let manifest = Path::new(env!("CARGO_MANIFEST_DIR")); | |
| let roots = [ | |
| manifest.join("src"), | |
| manifest.join("../baml_lsp2_actions/src"), | |
| manifest.join("../baml_compiler2_tir/src"), | |
| manifest.join("../baml_compiler2_mir/src"), | |
| manifest.join("../baml_compiler_syntax/src"), | |
| ]; |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@baml_language/crates/tools_semantic_tokens/src/staleness.rs` around lines 31
- 37, The staleness watcher in newest_source_mtime only tracks this crate and
baml_lsp2_actions, but semantic token behavior also depends on
baml_compiler2_tir, baml_compiler2_mir, and baml_compiler_syntax. Update the
roots list (and any related watcher logic in staleness.rs) so those compiler
crates are included in the mtime scan, ensuring edits there trigger the viewer
restart and keep semantic tokens fresh.
Reworks the BAML semantic-tokens classifier to be resolution-driven, mirroring rust-analyzer: walk the lossless CST for token positions + syntactic tokens (keywords/operators/punctuation), and resolve identifier meaning through the HIR/inference instead of guessing from the CST.
What's here
tokens/index.rs— span-keyed resolution index over every inference-bearing scope (function / lambda / let) via the new uniformtir::scope_bodylookup, so identifiers insidespawn/block/closure bodies classify with their own scope's inference.resolve_interface_membernow recordsMemberResolution::InterfaceMethod/InterfaceField. So.as<I>.method(), interfaceSelfmethods, and chained interface calls classify through the normal path; typos stay neutral. MIR-safe (new variants →Noneitem-refs, fall through to existing interface dispatch; 323 runtime interface tests pass).as/with/iskeywords, generators, generic type args + params (typeParameter), associated types (decl + binding), self member accesses (no longernamespace), methods after.as<Interface>, string escapes (escapeSequence), byte strings, qualified type paths,true/false/null, object/map keys vs values.tools_semantic_tokens— standalone web viewer for the classifier (no VSCode needed).Tests
78 semantic-token fixtures; 439 LSP + 323 runtime-interface + 172 tir/mir tests pass; clippy-clean.
Next steps / gaps
See
crates/baml_lsp2_actions/src/tokens/GAPS.md— the priority is scaling like rust-analyzer (on-demand per-token resolution +semanticTokens/rangeviewport + delta), plus a couple remaining feature items (optional params at call site, abooleanliteral token type).Summary by CodeRabbit
New Features
Bug Fixes