RHIDP-12952: persist interrupted conversation #1971

Merged: tisnik merged 10 commits into lightspeed-core:main from Jdubrick:interrupt-message-persistence on Jun 25, 2026
Conversation

@Jdubrick

@Jdubrick Jdubrick commented Jun 22, 2026

Description

  • When query interruption was first added, it replaced the entire interrupted portion of the conversation with the interrupt message. This change keeps the partially completed message in place, after closing any unterminated code fences, HTML, tables, etc. that would otherwise break rendering.

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement
  • Benchmarks improvement

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

  • Assisted-by: Claude (Cursor)
  • Generated by: Claude (Cursor)

Related Tickets & Documents

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

Release Notes

  • New Features

    • Interrupted streaming responses now preserve partial text already sent and continue emitting the interruption update with correct, sequential chunk ordering.
    • Interruption output is reconstructed dynamically and includes automatic cleanup for truncated Markdown/HTML so rendering stays intact.
  • Bug Fixes

    • Improved interruption persistence and display by rebuilding the interrupted message from the streamed content instead of using a static placeholder.
  • Updates

    • Updated the interruption notice to: “Response stopped by the user.”
  • Tests

    • Expanded unit coverage for interruption sequencing, partial-token accumulation, and Markdown repair behavior.

@coderabbitai

coderabbitai Bot commented Jun 22, 2026

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: e011ca15-98f4-47fc-bd94-069b82361aca

📥 Commits

Reviewing files that changed from the base of the PR and between c87cf1b and 458b1d6.

📒 Files selected for processing (2)
  • src/utils/markdown_repair.py
  • tests/unit/utils/test_markdown_repair.py

Walkthrough

The PR replaces the static interrupted-response message with interruption handling that rebuilds text from streamed partial tokens, repairs open Markdown, tracks SSE chunk ids, and emits a final token suffix on cancellation in both agent and endpoint streaming paths.

Changes

Structured Interrupted Response from Partial Tokens

| Layer / File(s) | Summary |
|---|---|
| TurnSummary fields and Markdown repair utility: src/models/common/turn_summary.py, src/utils/markdown_repair.py, tests/unit/utils/test_markdown_repair.py | TurnSummary gains partial_tokens and next_chunk_id. close_open_markdown(text) is added to return closing suffixes for open fences and tracked block-level HTML tags, with tests covering fences, HTML nesting, combined cases, and empty/plain input. |
| Interrupted response helper and persistence: src/constants.py, src/utils/stream_interrupts.py, tests/unit/utils/test_stream_interrupts.py | build_interrupted_response(partial_tokens) joins tokens, repairs Markdown, appends the interrupt indicator, and returns (full_text, suffix). _on_interrupt stores the computed text, persist_interrupted_turn uses turn_summary.llm_response, and the interruption message constant is updated. Tests cover helper output and persistence content. |
| Agent streaming cancellation and chunk tracking: src/utils/agents/streaming.py, tests/unit/utils/agents/test_streaming.py | Token processing appends to turn_summary.partial_tokens and updates turn_summary.next_chunk_id. Cancellation now emits a computed suffix token before the interrupted payload. Tests verify token emission, partial-token accumulation, chunk-id sequencing, and cancellation before any tokens. |
| Endpoint streaming cancellation and delta tracking: src/app/endpoints/streaming_query.py, tests/unit/app/endpoints/test_streaming_query.py | The endpoint streaming path records deltas in turn_summary.partial_tokens, keeps next_chunk_id aligned with emitted SSE events, and emits an LLM_TOKEN_EVENT with the computed suffix on cancellation. Tests expect token events during interruption and verify the persisted interrupted response text. |
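As a rough illustration of the helper described in the summary above, the sketch below joins the partial tokens, appends a repair suffix, and adds the interruption notice. It is not the code from this PR: the separator between the streamed text and the notice, the handling of empty input, the constant name, and the `close_open_fence_stub` stand-in (which only balances triple-backtick fences) are assumptions made for the example. Only the `(full_text, suffix)` return shape and the notice text come from the summary.

```python
"""Sketch of rebuilding an interrupted response from streamed tokens."""

# Notice text taken from the release notes; the constant name is hypothetical.
INTERRUPTED_NOTICE = "Response stopped by the user."
FENCE = "`" * 3  # a triple-backtick fence marker


def close_open_fence_stub(text: str) -> str:
    """Return a suffix closing an unterminated fence (simplified stand-in)."""
    fence_open = False
    for line in text.splitlines():
        if line.lstrip().startswith(FENCE):
            fence_open = not fence_open
    if not fence_open:
        return ""
    # Put the closing fence on its own line.
    return FENCE if text.endswith("\n") else "\n" + FENCE


def build_interrupted_response(partial_tokens: list[str]) -> tuple[str, str]:
    """Return (full_text, suffix) for an interrupted turn."""
    streamed = "".join(partial_tokens)
    repair = close_open_fence_stub(streamed)
    separator = "\n\n" if streamed else ""  # assumed layout, not from the PR
    suffix = f"{repair}{separator}{INTERRUPTED_NOTICE}"
    return streamed + suffix, suffix


full_text, suffix = build_interrupted_response(
    ["Run:\n", FENCE + "python\n", "print("]
)
```

In this reading, `suffix` is what would be streamed as the final token event and `full_text` is what would be persisted, so the client and the stored turn end up with the same content.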

Sequence Diagram(s)

sequenceDiagram
  participant StreamingPath
  participant build_interrupted_response
  participant close_open_markdown
  participant persist_interrupted_turn
  participant Client

  StreamingPath->>StreamingPath: collect partial_tokens and next_chunk_id
  StreamingPath->>build_interrupted_response: partial_tokens on CancelledError
  build_interrupted_response->>close_open_markdown: joined text
  close_open_markdown-->>build_interrupted_response: repair suffix
  build_interrupted_response-->>StreamingPath: full_text and suffix
  StreamingPath->>persist_interrupted_turn: llm_response = full_text
  StreamingPath-->>Client: final token SSE event with suffix

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • lightspeed-core/lightspeed-stack#1918: Both PRs modify the streaming interruption flow by changing utils/stream_interrupts.py persistence/interrupt handling and coordinating CancelledError behavior in streaming endpoints.
  • lightspeed-core/lightspeed-stack#1919: Both PRs touch agent streaming interruption handling in src/utils/agents/streaming.py, including cancellation-path persistence and emitted token data.

Suggested reviewers

  • tisnik
  • jrobertboos
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title is concise and accurately reflects the main change: persisting interrupted conversation content. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 90.32% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/app/endpoints/streaming_query.py`:
- Around line 637-640: The build_interrupted_response call at line 637 relies
solely on turn_summary.partial_tokens, which may be empty or incomplete if
cancellation occurs after response.output_text.done has populated
turn_summary.llm_response but before all deltas are processed. Modify the
build_interrupted_response call to use turn_summary.llm_response as a fallback
when partial_tokens is empty, ensuring that model output is not lost when
interrupted responses are reconstructed and persisted.

In `@src/utils/markdown_repair.py`:
- Around line 75-90: In the fence closing logic (the elif condition checking
`char == fence_char and len(matched_group) >= fence_len`), add validation to
ensure that any trailing content after the fence marker contains only whitespace
characters (spaces and tabs). Extract the remainder of the line after the
matched fence group and check that it either doesn't exist or contains only
whitespace using a string method like strip() or a regex check. Only allow the
fence to close if this whitespace validation passes, otherwise treat the line as
regular content inside the code block.
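The fence-closing rule requested in the second comment can be sketched as follows. This is a simplified, CommonMark-inspired illustration, not the PR's `close_open_markdown`: the function name `closing_fence_suffix`, the regular expression, and the shape of the returned suffix are all assumptions. A line only closes the open fence when it uses the same fence character, is at least as long as the opening marker, and is followed by nothing but spaces or tabs.

```python
"""Sketch of fence tracking with whitespace-only closing fences."""
import re

# Up to three spaces of indent, then a run of 3+ backticks or tildes,
# then the rest of the line (info string or trailing content).
_FENCE_RE = re.compile(r"^ {0,3}(`{3,}|~{3,})(.*)$")


def closing_fence_suffix(text: str) -> str:
    """Return the text needed to close a fence left open in ``text``."""
    fence_char = ""
    fence_len = 0
    for line in text.splitlines():
        match = _FENCE_RE.match(line)
        if match is None:
            continue
        marker, rest = match.group(1), match.group(2)
        if not fence_char:
            # Opening fence: remember its character and length.
            fence_char, fence_len = marker[0], len(marker)
        elif (
            marker[0] == fence_char
            and len(marker) >= fence_len
            and not rest.strip(" \t")
        ):
            # Valid closing fence: only whitespace follows the marker.
            fence_char, fence_len = "", 0
        # Anything else is ordinary content inside the code block.
    if not fence_char:
        return ""
    newline = "" if text.endswith("\n") else "\n"
    return f"{newline}{fence_char * fence_len}\n"
```

With this rule, a line such as three backticks followed by the word "trailing" inside an open block is treated as content, so the block is still reported as open and receives a closing suffix.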
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 093e2b47-86b6-41de-a11e-9ff652c597b9

📥 Commits

Reviewing files that changed from the base of the PR and between 9ff72ff and 6aeea11.

📒 Files selected for processing (10)
  • src/app/endpoints/streaming_query.py
  • src/constants.py
  • src/models/common/turn_summary.py
  • src/utils/agents/streaming.py
  • src/utils/markdown_repair.py
  • src/utils/stream_interrupts.py
  • tests/unit/app/endpoints/test_streaming_query.py
  • tests/unit/utils/agents/test_streaming.py
  • tests/unit/utils/test_markdown_repair.py
  • tests/unit/utils/test_stream_interrupts.py
📜 Review details
⏰ Context from checks skipped due to timeout. (12)
  • GitHub Check: E2E: library mode / ci / group 3
  • GitHub Check: E2E: server mode / ci / group 3
  • GitHub Check: E2E: library mode / ci / group 2
  • GitHub Check: E2E: server mode / ci / group 2
  • GitHub Check: E2E: server mode / ci / group 1
  • GitHub Check: E2E: library mode / ci / group 1
  • GitHub Check: build-pr
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
  • GitHub Check: integration_tests (3.13)
  • GitHub Check: integration_tests (3.12)
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-0-6-on-pull-request
🧰 Additional context used
📓 Path-based instructions (5)
src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.py: Use absolute imports for internal modules: from authentication import get_auth_dependency
Llama Stack imports: Use from llama_stack_client import AsyncLlamaStackClient
Check constants.py for shared constants before defining new ones
All modules must start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
All functions must have complete type annotations for parameters and return types, use modern syntax (str | int), and include descriptive docstrings
Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying function parameters
Use async def for I/O operations and external API calls
Use standard log levels with clear purposes: debug() for diagnostic info, info() for program execution, warning() for unexpected events, error() for serious problems
All classes must have descriptive docstrings explaining purpose and use PascalCase with standard suffixes: Configuration, Error/Exception, Resolver, Interface
Abstract classes must use ABC with @abstractmethod decorators
Follow Google Python docstring conventions with required sections: Parameters, Returns, Raises, and Attributes for classes

Files:

  • src/constants.py
  • src/models/common/turn_summary.py
  • src/utils/markdown_repair.py
  • src/utils/stream_interrupts.py
  • src/app/endpoints/streaming_query.py
  • src/utils/agents/streaming.py
src/constants.py

📄 CodeRabbit inference engine (AGENTS.md)

Use constants.py for shared constants with descriptive comments and type hints using Final[type]

Files:

  • src/constants.py
src/models/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Pydantic models must use @model_validator and @field_validator for validation and complete type annotations for all attributes, avoiding Any type

Files:

  • src/models/common/turn_summary.py
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Use pytest.mark.asyncio marker for async tests

Files:

  • tests/unit/utils/test_markdown_repair.py
  • tests/unit/app/endpoints/test_streaming_query.py
  • tests/unit/utils/agents/test_streaming.py
  • tests/unit/utils/test_stream_interrupts.py
src/app/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/app/**/*.py: FastAPI dependencies: Import from fastapi module for APIRouter, HTTPException, Request, status, Depends
Use FastAPI HTTPException with appropriate status codes for API endpoints and handle APIConnectionError from Llama Stack

Files:

  • src/app/endpoints/streaming_query.py
🧠 Learnings (3)
📚 Learning: 2026-01-12T10:58:40.230Z
Learnt from: blublinsky
Repo: lightspeed-core/lightspeed-stack PR: 972
File: src/models/config.py:459-513
Timestamp: 2026-01-12T10:58:40.230Z
Learning: In lightspeed-core/lightspeed-stack, for Python files under src/models, when a user claims a fix is done but the issue persists, verify the current code state before accepting the fix. Steps: review the diff, fetch the latest changes, run relevant tests, reproduce the issue, search the codebase for lingering references to the original problem, confirm the fix is applied and not undone by subsequent commits, and validate with local checks to ensure the issue is resolved.

Applied to files:

  • src/models/common/turn_summary.py
📚 Learning: 2026-02-25T07:46:33.545Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:33.545Z
Learning: In the Python codebase, requests.py should use OpenAIResponseInputTool as Tool while responses.py uses OpenAIResponseTool as Tool. This difference is intentional due to differing schemas for input vs output tools in llama-stack-api. Apply this distinction consistently to other models under src/models (e.g., ensure request-related tools use the InputTool variant and response-related tools use the ResponseTool variant). If adding new tools, choose the corresponding InputTool or Tool class based on whether the tool represents input or output, and document the rationale in code comments.

Applied to files:

  • src/models/common/turn_summary.py
📚 Learning: 2026-04-06T20:18:07.852Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1463
File: src/app/endpoints/rlsapi_v1.py:266-271
Timestamp: 2026-04-06T20:18:07.852Z
Learning: In the lightspeed-stack codebase, within `src/app/endpoints/` inference/MCP endpoints, treat `tools: Optional[list[Any]]` in MCP tool definitions as an intentional, consistent typing pattern (used across `query`, `responses`, `streaming_query`, `rlsapi_v1`). Do not raise or suggest this as a typing issue during code review; changing it in isolation could break endpoint typing consistency across the codebase.

Applied to files:

  • src/app/endpoints/streaming_query.py
🔇 Additional comments (7)
src/models/common/turn_summary.py (1)

117-126: LGTM!

src/constants.py (1)

15-15: LGTM!

src/utils/stream_interrupts.py (1)

23-23: LGTM!

Also applies to: 219-239, 277-277, 286-286, 368-369

tests/unit/utils/test_stream_interrupts.py (1)

8-21: LGTM!

Also applies to: 49-49, 71-72, 100-100, 169-193

src/utils/agents/streaming.py (1)

28-28: LGTM!

Also applies to: 68-68, 201-217, 358-364, 415-415

tests/unit/utils/agents/test_streaming.py (1)

67-68: LGTM!

Also applies to: 720-722, 813-813, 966-1106

tests/unit/app/endpoints/test_streaming_query.py (1)

54-54: LGTM!

Also applies to: 74-75, 1385-1385, 1394-1400

Comment thread src/app/endpoints/streaming_query.py
Comment thread src/utils/markdown_repair.py Outdated
@Jdubrick

/cc @tisnik

@Jdubrick Jdubrick force-pushed the interrupt-message-persistence branch from 969f5d6 to 13718e0 Compare June 23, 2026 17:23

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/app/endpoints/streaming_query.py (1)

774-783: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Advance next_chunk_id before yielding each SSE chunk.

Because response_generator suspends at yield, cancellation immediately after generate_response re-yields a token can skip Lines 782-783 or 808-809. The cancellation handler then emits the interruption suffix with the stale turn_summary.next_chunk_id, duplicating the ID of the just-emitted token. Move the ID advancement before each yield while preserving the event’s current ID.

Proposed fix
         if event_type == "response.content_part.added":
+            event_id = chunk_id
+            chunk_id += 1
+            turn_summary.next_chunk_id = chunk_id
             yield stream_event(
                 {
-                    "id": chunk_id,
+                    "id": event_id,
                     "token": "",
                 },
                 LLM_TOKEN_EVENT,
                 media_type,
             )
-            chunk_id += 1
-            turn_summary.next_chunk_id = chunk_id
 
         # Text streaming - emit token delta
         elif event_type == "response.output_text.delta":
             delta_chunk = cast(TextDeltaChunk, chunk)
             text_parts.append(delta_chunk.delta)
             turn_summary.partial_tokens.append(delta_chunk.delta)
+            event_id = chunk_id
+            chunk_id += 1
+            turn_summary.next_chunk_id = chunk_id
             yield stream_event(
                 {
-                    "id": chunk_id,
+                    "id": event_id,
                     "token": delta_chunk.delta,
                 },
                 LLM_TOKEN_EVENT,
                 media_type,
             )
-            chunk_id += 1
-            turn_summary.next_chunk_id = chunk_id
@@
         elif event_type == "response.completed":
             latest_response_object = cast(
                 OpenAIResponseObject,
                 getattr(chunk, "response"),  # noqa: B009
             )
             turn_summary.llm_response = turn_summary.llm_response or "".join(text_parts)
             # Capture structured output items for compacted-mode turn storage
             # (LCORE-1572), so the persisted turn keeps non-text output items
             # rather than being flattened to the response text.
             turn_summary.output_items = list(latest_response_object.output or [])
+            event_id = chunk_id
+            chunk_id += 1
+            turn_summary.next_chunk_id = chunk_id
             yield stream_event(
                 {
-                    "id": chunk_id,
+                    "id": event_id,
                     "token": turn_summary.llm_response,
                 },
                 LLM_TURN_COMPLETE_EVENT,
                 media_type,
             )
-            chunk_id += 1
-            turn_summary.next_chunk_id = chunk_id

Also applies to: 800-809, 889-898
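The effect of the ordering can be reproduced with a plain generator. The sketch below is only an analogy: it uses a synchronous generator and `close()` as a stand-in for async cancellation at the `yield`, and `TurnState`, `emit_late`, `emit_early`, and `first_event_then_stop` are hypothetical names, not code from the endpoint.

```python
"""Sketch: advancing a shared chunk counter before vs. after yield."""
from collections.abc import Callable, Iterator
from dataclasses import dataclass


@dataclass
class TurnState:
    """Minimal stand-in for the turn summary's next_chunk_id field."""

    next_chunk_id: int = 0


def emit_late(tokens: list[str], state: TurnState) -> Iterator[dict]:
    """Advance the counter after yielding (skipped if stopped at yield)."""
    chunk_id = state.next_chunk_id
    for token in tokens:
        yield {"id": chunk_id, "token": token}
        chunk_id += 1
        state.next_chunk_id = chunk_id


def emit_early(tokens: list[str], state: TurnState) -> Iterator[dict]:
    """Advance the counter before yielding, keeping the event's own id."""
    chunk_id = state.next_chunk_id
    for token in tokens:
        event_id = chunk_id
        chunk_id += 1
        state.next_chunk_id = chunk_id
        yield {"id": event_id, "token": token}


def first_event_then_stop(
    emit: Callable[[list[str], TurnState], Iterator[dict]],
    tokens: list[str],
) -> tuple[dict, int]:
    """Consume one event, stop the generator, and report the shared counter."""
    state = TurnState()
    stream = emit(tokens, state)
    event = next(stream)
    stream.close()  # the generator never resumes past its yield
    return event, state.next_chunk_id


late_event, late_next = first_event_then_stop(emit_late, ["a", "b"])
early_event, early_next = first_event_then_stop(emit_early, ["a", "b"])
```

Both variants emit the first event with id 0, but only the early variant leaves the shared counter at 1, so a suffix event built from it would not reuse the id of the token just sent.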

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/app/endpoints/streaming_query.py` around lines 774 - 783, The ID
advancement in the stream_event yielding blocks happens after the yield
statement, which can cause the increment to be skipped during cancellation. Move
the `chunk_id += 1` and `turn_summary.next_chunk_id = chunk_id` statements to
occur before each `yield stream_event()` call while preserving the event's ID
field to use the current chunk_id value at the time of emission. This pattern
appears in three locations: the stream_event yielding block around lines
774-783, the similar block around lines 800-809, and the block around lines
889-898. Save the current chunk_id value before incrementing, use the saved
value in the event's ID field, and increment the counters before yielding to
ensure the ID advancement cannot be skipped by cancellation.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/utils/markdown_repair.py`:
- Around line 30-50: The function `_process_html_tags` currently mutates its
`html_stack` parameter in-place using `.pop()` and `.append()` operations.
Refactor this function to return the updated stack instead of modifying the
parameter directly. Change the function signature to return a list (the updated
html_stack) and update all callers to capture the returned value rather than
relying on side effects. This applies to both the main function definition and
any other similar patterns mentioned at lines 77-79.
- Around line 31-36: The docstring for the function starting at line 31 uses
"Parameters:" section header, but Google Python docstring conventions require
"Args:" instead. Update the docstring header from "Parameters:" to "Args:" in
the function at line 31-36, and apply the same fix to the other docstring
mentioned at lines 53-64. Additionally, review both docstrings to ensure they
include all required Google format sections (Args, Returns, Raises) where
applicable based on what each function actually does.

In `@src/utils/stream_interrupts.py`:
- Around line 219-232: The docstring for the build_interrupted_response function
uses "Parameters:" and "Returns:" sections, but the repository standard requires
Google-style docstring format with "Args", "Returns", and "Raises" sections.
Update the docstring by renaming the "Parameters:" section to "Args:" to match
the required convention. Additionally, add a "Raises:" section if the function
can raise any exceptions during execution, following the repository's Google
Python docstring conventions.

---

Outside diff comments:
In `@src/app/endpoints/streaming_query.py`:
- Around line 774-783: The ID advancement in the stream_event yielding blocks
happens after the yield statement, which can cause the increment to be skipped
during cancellation. Move the `chunk_id += 1` and `turn_summary.next_chunk_id =
chunk_id` statements to occur before each `yield stream_event()` call while
preserving the event's ID field to use the current chunk_id value at the time of
emission. This pattern appears in three locations: the stream_event yielding
block around lines 774-783, the similar block around lines 800-809, and the
block around lines 889-898. Save the current chunk_id value before incrementing,
use the saved value in the event's ID field, and increment the counters before
yielding to ensure the ID advancement cannot be skipped by cancellation.
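The two `markdown_repair.py` suggestions above (return the updated stack instead of mutating the parameter, and use Google-style `Args:`/`Returns:` sections) could look roughly like this. The function names, the set of tracked tags, and the tag-matching regular expression are assumptions for illustration; the PR's `_process_html_tags` signature is not shown in this conversation.

```python
"""Sketch of a non-mutating HTML tag tracker with a Google-style docstring."""
import re

# Hypothetical set of block-level tags worth closing after an interruption.
TRACKED_TAGS = frozenset({"details", "table", "div"})
_TAG_RE = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)\b[^>]*?(/?)>")


def process_html_tags(line: str, html_stack: list[str]) -> list[str]:
    """Return the stack of open tracked tags after scanning one line.

    Args:
        line: A single line of partially streamed Markdown/HTML.
        html_stack: Tracked tags currently open, outermost first.

    Returns:
        A new list with the tags still open after this line. The input
        list is left unchanged.
    """
    stack = list(html_stack)  # copy so the caller's list is never modified
    for match in _TAG_RE.finditer(line):
        is_close, self_closing = match.group(1), match.group(3)
        name = match.group(2).lower()
        if name not in TRACKED_TAGS or self_closing:
            continue
        if not is_close:
            stack.append(name)
        elif name in stack:
            # Drop everything up to and including the matching open tag.
            while stack.pop() != name:
                pass
    return stack


def closing_html_suffix(html_stack: list[str]) -> str:
    """Return closing tags for the open stack, innermost first.

    Args:
        html_stack: Tracked tags currently open, outermost first.

    Returns:
        The concatenated closing tags, or an empty string.
    """
    return "".join(f"</{name}>" for name in reversed(html_stack))
```

Callers would then write `stack = process_html_tags(line, stack)` inside their loop rather than relying on side effects.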
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 187ae3b8-ac25-444d-befd-f5f266b56a23

📥 Commits

Reviewing files that changed from the base of the PR and between 6aeea11 and 13718e0.

📒 Files selected for processing (10)
  • src/app/endpoints/streaming_query.py
  • src/constants.py
  • src/models/common/turn_summary.py
  • src/utils/agents/streaming.py
  • src/utils/markdown_repair.py
  • src/utils/stream_interrupts.py
  • tests/unit/app/endpoints/test_streaming_query.py
  • tests/unit/utils/agents/test_streaming.py
  • tests/unit/utils/test_markdown_repair.py
  • tests/unit/utils/test_stream_interrupts.py
📜 Review details
⏰ Context from checks skipped due to timeout. (14)
  • GitHub Check: unit_tests (3.13)
  • GitHub Check: unit_tests (3.12)
  • GitHub Check: build-pr
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
  • GitHub Check: integration_tests (3.13)
  • GitHub Check: integration_tests (3.12)
  • GitHub Check: E2E: library mode / ci / group 2
  • GitHub Check: E2E: server mode / ci / group 3
  • GitHub Check: E2E: server mode / ci / group 2
  • GitHub Check: E2E: server mode / ci / group 1
  • GitHub Check: E2E: library mode / ci / group 1
  • GitHub Check: E2E: library mode / ci / group 3
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-0-6-on-pull-request
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
🔇 Additional comments (13)
src/models/common/turn_summary.py (1)

117-126: LGTM!

tests/unit/utils/test_markdown_repair.py (1)

1-175: LGTM!

src/constants.py (1)

15-15: LGTM!

src/utils/stream_interrupts.py (1)

23-23: LGTM!

Also applies to: 277-287, 368-369

tests/unit/utils/test_stream_interrupts.py (1)

8-21: LGTM!

Also applies to: 49-49, 71-73, 100-100, 169-193

src/utils/agents/streaming.py (3)

28-28: LGTM!

Also applies to: 68-68


201-217: LGTM!


358-364: LGTM!

Also applies to: 415-415

tests/unit/utils/agents/test_streaming.py (2)

67-68: LGTM!

Also applies to: 720-722, 813-813


966-1106: LGTM!

src/app/endpoints/streaming_query.py (1)

124-124: LGTM!

Also applies to: 637-652

tests/unit/app/endpoints/test_streaming_query.py (2)

54-54: LGTM!

Also applies to: 74-75


1385-1400: LGTM!

Comment thread src/utils/markdown_repair.py Outdated
Comment thread src/utils/markdown_repair.py
Comment thread src/utils/stream_interrupts.py
Jdubrick added 6 commits June 24, 2026 09:32
Signed-off-by: Jordan Dubrick <jdubrick@redhat.com>
Signed-off-by: Jordan Dubrick <jdubrick@redhat.com>
Signed-off-by: Jordan Dubrick <jdubrick@redhat.com>
Signed-off-by: Jordan Dubrick <jdubrick@redhat.com>
Signed-off-by: Jordan Dubrick <jdubrick@redhat.com>
Signed-off-by: Jordan Dubrick <jdubrick@redhat.com>
@Jdubrick Jdubrick force-pushed the interrupt-message-persistence branch from 13718e0 to a23ccbb Compare June 24, 2026 13:32

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
tests/unit/app/endpoints/test_streaming_query.py (1)

1385-1400: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Cover the real delta path and suffix payload.

This mock generator never exercises response.output_text.delta, so the new partial_tokens and next_chunk_id endpoint logic can regress while this test still passes. Add a cancellation case through response_generator and assert the final token payload/id before "interrupted".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/app/endpoints/test_streaming_query.py` around lines 1385 - 1400,
The streaming query test is only covering the interrupted path and is missing
the real delta payload behavior for response.output_text.delta. Update the test
around response_generator to include a cancellation case that exercises the
partial_tokens/next_chunk_id logic, then assert the emitted final token payload
and chunk id before the "interrupted" event so regressions in the endpoint flow
are caught.
src/app/endpoints/streaming_query.py (1)

774-783: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Advance next_chunk_id before yielding SSE events.

The chunk_id increment and the turn_summary.next_chunk_id update run only after the generator resumes from the yield. If cancellation hits while generate_response is paused at the yield for that SSE event, the interrupt suffix reads the stale next_chunk_id and can reuse the id of the last emitted token.

Proposed fix
         if event_type == "response.content_part.added":
+            event_id = chunk_id
+            chunk_id += 1
+            turn_summary.next_chunk_id = chunk_id
             yield stream_event(
                 {
-                    "id": chunk_id,
+                    "id": event_id,
                     "token": "",
                 },
                 LLM_TOKEN_EVENT,
                 media_type,
             )
-            chunk_id += 1
-            turn_summary.next_chunk_id = chunk_id
@@
         elif event_type == "response.output_text.delta":
             delta_chunk = cast(TextDeltaChunk, chunk)
             text_parts.append(delta_chunk.delta)
             turn_summary.partial_tokens.append(delta_chunk.delta)
+            event_id = chunk_id
+            chunk_id += 1
+            turn_summary.next_chunk_id = chunk_id
             yield stream_event(
                 {
-                    "id": chunk_id,
+                    "id": event_id,
                     "token": delta_chunk.delta,
                 },
                 LLM_TOKEN_EVENT,
                 media_type,
             )
-            chunk_id += 1
-            turn_summary.next_chunk_id = chunk_id
@@
             turn_summary.output_items = list(latest_response_object.output or [])
+            event_id = chunk_id
+            chunk_id += 1
+            turn_summary.next_chunk_id = chunk_id
             yield stream_event(
                 {
-                    "id": chunk_id,
+                    "id": event_id,
                     "token": turn_summary.llm_response,
                 },
                 LLM_TURN_COMPLETE_EVENT,
                 media_type,
             )
-            chunk_id += 1
-            turn_summary.next_chunk_id = chunk_id

Also applies to: 800-809, 889-898

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/app/endpoints/streaming_query.py` around lines 774 - 783, Advance the
turn_summary.next_chunk_id update in generate_response before each
`yield stream_event(...)` statement, using the chunk_id assigned for the SSE
token event; this ensures the state is already updated if cancellation
interrupts while the generator is paused. Apply the same ordering fix in the
other matching send paths as well, so the interrupt suffix logic reads the
latest next_chunk_id instead of reusing the last emitted token id.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/unit/utils/agents/test_streaming.py`:
- Around line 1054-1057: The chunk ID assertions in the token event test only
verify ordering and non-negativity, so duplicates can still slip through; update
the checks around the token_events-derived chunk_ids to assert the exact
contiguous sequence starting at 0 with no gaps or repeats. Use the existing
chunk_ids logic in test_streaming.py to compare against the expected range based
on the list length, so the test catches skipped or duplicated SSE chunk ids.

---

Outside diff comments:
In `@src/app/endpoints/streaming_query.py`:
- Around line 774-783: Advance the turn_summary.next_chunk_id update in
generate_response before each `yield stream_event(...)` statement, using the
chunk_id assigned for the SSE token event; this ensures the state is already
updated if cancellation interrupts while the generator is paused. Apply the same
ordering fix in the other matching send paths as well, so the interrupt suffix
logic reads the latest next_chunk_id instead of reusing the last emitted token id.

In `@tests/unit/app/endpoints/test_streaming_query.py`:
- Around line 1385-1400: The streaming query test is only covering the
interrupted path and is missing the real delta payload behavior for
response.output_text.delta. Update the test around response_generator to include
a cancellation case that exercises the partial_tokens/next_chunk_id logic, then
assert the emitted final token payload and chunk id before the "interrupted"
event so regressions in the endpoint flow are caught.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: c294eef1-573b-43fe-861a-22502d9a87a5

📥 Commits

Reviewing files that changed from the base of the PR and between 13718e0 and a23ccbb.

📒 Files selected for processing (10)
  • src/app/endpoints/streaming_query.py
  • src/constants.py
  • src/models/common/turn_summary.py
  • src/utils/agents/streaming.py
  • src/utils/markdown_repair.py
  • src/utils/stream_interrupts.py
  • tests/unit/app/endpoints/test_streaming_query.py
  • tests/unit/utils/agents/test_streaming.py
  • tests/unit/utils/test_markdown_repair.py
  • tests/unit/utils/test_stream_interrupts.py
📜 Review details
⏰ Context from checks skipped due to timeout. (13)
  • GitHub Check: unit_tests (3.12)
  • GitHub Check: build-pr
  • GitHub Check: integration_tests (3.13)
  • GitHub Check: integration_tests (3.12)
  • GitHub Check: E2E: server mode / ci / group 3
  • GitHub Check: E2E: library mode / ci / group 2
  • GitHub Check: E2E: server mode / ci / group 2
  • GitHub Check: E2E: library mode / ci / group 1
  • GitHub Check: E2E: server mode / ci / group 1
  • GitHub Check: E2E: library mode / ci / group 3
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-0-6-on-pull-request
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
🧰 Additional context used
📓 Path-based instructions (5)
src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.py: Use absolute imports for internal modules: from authentication import get_auth_dependency
Llama Stack imports: Use from llama_stack_client import AsyncLlamaStackClient
Check constants.py for shared constants before defining new ones
All modules must start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
All functions must have complete type annotations for parameters and return types, use modern syntax (str | int), and include descriptive docstrings
Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying function parameters
Use async def for I/O operations and external API calls
Use standard log levels with clear purposes: debug() for diagnostic info, info() for program execution, warning() for unexpected events, error() for serious problems
All classes must have descriptive docstrings explaining purpose and use PascalCase with standard suffixes: Configuration, Error/Exception, Resolver, Interface
Abstract classes must use ABC with @abstractmethod decorators
Follow Google Python docstring conventions with required sections: Parameters, Returns, Raises, and Attributes for classes

Files:

  • src/models/common/turn_summary.py
  • src/utils/markdown_repair.py
  • src/constants.py
  • src/utils/agents/streaming.py
  • src/utils/stream_interrupts.py
  • src/app/endpoints/streaming_query.py
src/models/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Pydantic models must use @model_validator and @field_validator for validation and complete type annotations for all attributes, avoiding Any type

Files:

  • src/models/common/turn_summary.py
src/constants.py

📄 CodeRabbit inference engine (AGENTS.md)

Use constants.py for shared constants with descriptive comments and type hints using Final[type]

Files:

  • src/constants.py
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Use pytest.mark.asyncio marker for async tests

Files:

  • tests/unit/app/endpoints/test_streaming_query.py
  • tests/unit/utils/agents/test_streaming.py
  • tests/unit/utils/test_stream_interrupts.py
  • tests/unit/utils/test_markdown_repair.py
src/app/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/app/**/*.py: FastAPI dependencies: Import from fastapi module for APIRouter, HTTPException, Request, status, Depends
Use FastAPI HTTPException with appropriate status codes for API endpoints and handle APIConnectionError from Llama Stack

Files:

  • src/app/endpoints/streaming_query.py
🔇 Additional comments (13)
src/utils/markdown_repair.py (3)

30-50: 📐 Maintainability & Code Quality | 💤 Low value

In-place mutation of html_stack parameter.

_process_html_tags still mutates its html_stack argument via .pop()/.append(). As per coding guidelines, "Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying function parameters."

Source: Coding guidelines
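
One way to satisfy the guideline quoted above is to treat the tag stack as an immutable value and return a new one on every step. The sketch below is only an illustration: `apply_tag` and its signature are hypothetical and are not taken from `src/utils/markdown_repair.py`.

```python
def apply_tag(
    html_stack: tuple[str, ...], tag_name: str, is_closing: bool
) -> tuple[str, ...]:
    """Return the tag stack that results from seeing one tag.

    Hypothetical helper: the caller's tuple is never modified, so the
    function has no side effects on its parameters.
    """
    if not is_closing:
        # Opening tag: the new stack has the tag on top.
        return html_stack + (tag_name,)
    if html_stack and html_stack[-1] == tag_name:
        # Closing tag that matches the innermost open tag: drop it.
        return html_stack[:-1]
    # Unmatched closing tag: leave the stack as it was.
    return html_stack


before = ("details",)
after = apply_tag(before, "pre", is_closing=False)
```

The caller rebinds its own variable (`stack = apply_tag(stack, ...)`), which keeps the bookkeeping explicit at the cost of a small tuple copy per tag.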


33-36: 📐 Maintainability & Code Quality | 💤 Low value

Docstring sections should use Google Args/Returns.

Both _process_html_tags and close_open_markdown use a Parameters: heading. As per coding guidelines, "Follow Google Python docstring conventions with required sections: Parameters, Returns, Raises, and Attributes for classes."

Also applies to: 59-64

Source: Coding guidelines


89-96: LGTM!

src/utils/stream_interrupts.py (2)

225-231: 📐 Maintainability & Code Quality | 💤 Low value

Docstring should use Google Args/Returns headings.

build_interrupted_response uses a Parameters: heading. As per coding guidelines, "Follow Google Python docstring conventions with required sections: Parameters, Returns, Raises, and Attributes for classes."

Source: Coding guidelines


233-238: LGTM!

Also applies to: 277-277, 286-286, 368-369

src/models/common/turn_summary.py (1)

117-126: LGTM!

tests/unit/utils/test_markdown_repair.py (1)

1-175: LGTM!

src/constants.py (1)

28-28: LGTM!

tests/unit/utils/test_stream_interrupts.py (1)

8-20: LGTM!

Also applies to: 49-49, 71-72, 100-100, 171-193

src/utils/agents/streaming.py (1)

28-68: LGTM!

Also applies to: 201-217, 358-364, 415-415

tests/unit/utils/agents/test_streaming.py (1)

67-68: LGTM!

Also applies to: 720-722, 813-813, 966-1053, 1059-1106

src/app/endpoints/streaming_query.py (1)

124-124: LGTM!

Also applies to: 637-652

tests/unit/app/endpoints/test_streaming_query.py (1)

54-54: LGTM!

Also applies to: 74-75, 1394-1400

Comment thread tests/unit/utils/agents/test_streaming.py Outdated
Jdubrick added 2 commits June 24, 2026 09:47
…ace with yield

Signed-off-by: Jordan Dubrick <jdubrick@redhat.com>
Signed-off-by: Jordan Dubrick <jdubrick@redhat.com>

@tisnik tisnik left a comment

LGTM (I have some comments on the HTML tag processing/closing, but IMHO it's OK for now)

"tbody",
"details",
"summary",
"pre",

There are a lot more tags than these, but if that's OK for your use case...

if html_stack and html_stack[-1] == tag_name:
html_stack.pop()
else:
html_stack.append(tag_name)

I guess it can be broken by HTML comments etc., but again, if that's OK for your use case...
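
To make the concern about comments concrete, here is a minimal sketch of a stack-based tag closer that can optionally strip HTML comments first. It is an assumption-laden illustration, not the PR's implementation: `closing_suffix`, the regular expressions, and the small void-tag set are all hypothetical.

```python
import re

# Hypothetical patterns for this sketch only.
_TAG_RE = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>")
# A comment runs to "-->" or, if truncated, to the end of the text.
_COMMENT_RE = re.compile(r"<!--.*?(?:-->|\Z)", re.DOTALL)
_VOID_TAGS = {"br", "hr", "img"}  # deliberately incomplete


def closing_suffix(text: str, strip_comments: bool) -> str:
    """Return the closing tags needed to balance the open tags in text."""
    if strip_comments:
        text = _COMMENT_RE.sub("", text)
    stack: list[str] = []
    for match in _TAG_RE.finditer(text):
        is_closing = match.group(1) == "/"
        name = match.group(2).lower()
        if name in _VOID_TAGS:
            continue  # void elements never need a closing tag
        if is_closing:
            if stack and stack[-1] == name:
                stack.pop()
        else:
            stack.append(name)
    # Close the innermost tag first.
    return "".join(f"</{name}>" for name in reversed(stack))


truncated = "<details><!-- <pre> -->partial"
naive = closing_suffix(truncated, strip_comments=False)
careful = closing_suffix(truncated, strip_comments=True)
```

Without comment stripping, the `<pre>` inside the comment is counted as an open tag and a spurious `</pre>` is appended; with stripping, only `</details>` is emitted.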

Signed-off-by: Jordan Dubrick <jdubrick@redhat.com>

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/utils/markdown_repair.py`:
- Around line 42-72: The _strip_comments helper in markdown_repair is unused and
should be removed. Delete the _strip_comments function entirely, and keep
close_open_markdown and _strip_comments_with_zone_priority as the only
comment-stripping path so the module no longer carries dead code.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 8d3f6593-9aec-4fc9-ad4a-3c66d14eb875

📥 Commits

Reviewing files that changed from the base of the PR and between a23ccbb and c87cf1b.

📒 Files selected for processing (4)
  • src/app/endpoints/streaming_query.py
  • src/utils/markdown_repair.py
  • tests/unit/utils/agents/test_streaming.py
  • tests/unit/utils/test_markdown_repair.py
📜 Review details
⏰ Context from checks skipped due to timeout. (13)
  • GitHub Check: Pylinter
  • GitHub Check: integration_tests (3.12)
  • GitHub Check: integration_tests (3.13)
  • GitHub Check: build-pr
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-0-6-on-pull-request
  • GitHub Check: E2E: server mode / ci / group 1
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
  • GitHub Check: E2E: server mode / ci / group 3
  • GitHub Check: E2E: server mode / ci / group 2
  • GitHub Check: E2E: library mode / ci / group 2
  • GitHub Check: E2E: library mode / ci / group 1
  • GitHub Check: E2E: library mode / ci / group 3
🧰 Additional context used
📓 Path-based instructions (3)
src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.py: Use absolute imports for internal modules: from authentication import get_auth_dependency
Llama Stack imports: Use from llama_stack_client import AsyncLlamaStackClient
Check constants.py for shared constants before defining new ones
All modules must start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
All functions must have complete type annotations for parameters and return types, use modern syntax (str | int), and include descriptive docstrings
Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying function parameters
Use async def for I/O operations and external API calls
Use standard log levels with clear purposes: debug() for diagnostic info, info() for program execution, warning() for unexpected events, error() for serious problems
All classes must have descriptive docstrings explaining purpose and use PascalCase with standard suffixes: Configuration, Error/Exception, Resolver, Interface
Abstract classes must use ABC with @abstractmethod decorators
Follow Google Python docstring conventions with required sections: Parameters, Returns, Raises, and Attributes for classes

Files:

  • src/app/endpoints/streaming_query.py
  • src/utils/markdown_repair.py
src/app/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/app/**/*.py: FastAPI dependencies: Import from fastapi module for APIRouter, HTTPException, Request, status, Depends
Use FastAPI HTTPException with appropriate status codes for API endpoints and handle APIConnectionError from Llama Stack

Files:

  • src/app/endpoints/streaming_query.py
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Use pytest.mark.asyncio marker for async tests

Files:

  • tests/unit/utils/agents/test_streaming.py
  • tests/unit/utils/test_markdown_repair.py
🧠 Learnings (2)
📚 Learning: 2026-04-06T20:18:07.852Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1463
File: src/app/endpoints/rlsapi_v1.py:266-271
Timestamp: 2026-04-06T20:18:07.852Z
Learning: In the lightspeed-stack codebase, within `src/app/endpoints/` inference/MCP endpoints, treat `tools: Optional[list[Any]]` in MCP tool definitions as an intentional, consistent typing pattern (used across `query`, `responses`, `streaming_query`, `rlsapi_v1`). Do not raise or suggest this as a typing issue during code review; changing it in isolation could break endpoint typing consistency across the codebase.

Applied to files:

  • src/app/endpoints/streaming_query.py
📚 Learning: 2026-06-24T13:45:37.249Z
Learnt from: Jdubrick
Repo: lightspeed-core/lightspeed-stack PR: 1971
File: src/utils/markdown_repair.py:31-36
Timestamp: 2026-06-24T13:45:37.249Z
Learning: In the lightspeed-stack repository, docstrings must use the section header name "Parameters:" (not "Args:") for function arguments, even if the project references Google Python docstring conventions. Ensure docstrings follow the project’s established "Parameters:" header format for any documented function parameters.

Applied to files:

  • src/app/endpoints/streaming_query.py
  • src/utils/markdown_repair.py
  • tests/unit/utils/agents/test_streaming.py
  • tests/unit/utils/test_markdown_repair.py
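
As an illustration of the "Parameters:" convention recorded in this learning, a docstring might be laid out as in the sketch below. The function itself, `is_unterminated_comment`, is a hypothetical example and does not come from the repository.

```python
def is_unterminated_comment(text: str) -> bool:
    """Check whether the text ends inside an unclosed HTML comment.

    Parameters:
        text: Markdown or HTML text, possibly truncated mid-stream.

    Returns:
        True if the last comment opener has no closer after it, else False.
    """
    # rfind returns -1 when the marker is absent, so text with no
    # comment markers at all compares -1 > -1 and yields False.
    return text.rfind("<!--") > text.rfind("-->")
```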
🪛 ast-grep (0.44.0)
src/utils/markdown_repair.py

[warning] 58-58: XPath query is request-/variable-derived; use parameterized XPath to prevent injection.
Context: line.find(_COMMENT_CLOSE, i)
Note: [CWE-643] Improper Neutralization of Data within XPath Expressions ('XPath Injection').

(xpath-injection-python)


[warning] 64-64: XPath query is request-/variable-derived; use parameterized XPath to prevent injection.
Context: line.find(_COMMENT_OPEN, i)
Note: [CWE-643] Improper Neutralization of Data within XPath Expressions ('XPath Injection').

(xpath-injection-python)


[warning] 189-189: XPath query is request-/variable-derived; use parameterized XPath to prevent injection.
Context: remaining.find(_COMMENT_CLOSE)
Note: [CWE-643] Improper Neutralization of Data within XPath Expressions ('XPath Injection').

(xpath-injection-python)


[warning] 200-200: XPath query is request-/variable-derived; use parameterized XPath to prevent injection.
Context: remaining.find(_COMMENT_OPEN)
Note: [CWE-643] Improper Neutralization of Data within XPath Expressions ('XPath Injection').

(xpath-injection-python)

🔇 Additional comments (8)
src/utils/markdown_repair.py (2)

217-247: LGTM!


250-313: LGTM!

tests/unit/utils/test_markdown_repair.py (1)

133-472: LGTM!

tests/unit/utils/agents/test_streaming.py (2)

1055-1061: Chunk-id assertions are sufficient for contiguity.

sorted + non-negative + no-duplicates + chunk_ids[-1] == num_chunks - 1 together force the set to be exactly {0..num_chunks-1}, so this correctly catches skipped or duplicated SSE ids.
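
The equivalence claimed here can be checked with a small sketch. Both helpers below are hypothetical and are written only to compare the four combined conditions against a direct comparison with `range`; they are not the assertions used in `test_streaming.py`.

```python
def passes_combined_checks(chunk_ids: list[int]) -> bool:
    """Apply the four conditions named in the review comment."""
    num_chunks = len(chunk_ids)
    return (
        chunk_ids == sorted(chunk_ids)
        and all(chunk_id >= 0 for chunk_id in chunk_ids)
        and len(set(chunk_ids)) == num_chunks
        and (num_chunks == 0 or chunk_ids[-1] == num_chunks - 1)
    )


def is_contiguous_from_zero(chunk_ids: list[int]) -> bool:
    """Compare directly against 0, 1, ..., n-1."""
    return chunk_ids == list(range(len(chunk_ids)))


samples = [
    [0, 1, 2, 3],  # contiguous
    [0, 1, 1, 2],  # duplicated id
    [0, 1, 3],     # skipped id
    [1, 2, 3],     # does not start at 0
    [0, 2, 1],     # out of order
]
agreement = [
    passes_combined_checks(ids) == is_contiguous_from_zero(ids) for ids in samples
]
```

The two checks agree on every sample because n distinct, non-negative, sorted integers whose largest value is n-1 can only be 0 through n-1. The `range` comparison is simply a more compact way to state the same thing.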


720-723: LGTM!

Also applies to: 813-814, 1063-1108

src/app/endpoints/streaming_query.py (3)

773-784: LGTM!


800-811: Increment-before-yield correctly avoids the duplicate chunk-id race.

Appending to partial_tokens and advancing next_chunk_id prior to the yield ensures that if cancellation lands at the yield point, the interrupt handler emits the suffix with a fresh, non-colliding id and the delta is retained.
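
The ordering issue can be reproduced with plain generators. The sketch below is a simplified model under stated assumptions: `TurnState` is a hypothetical stand-in for the turn summary, and `gen.close()` stands in for cancellation arriving while the generator is paused at a `yield`.

```python
from collections.abc import Callable, Iterator
from dataclasses import dataclass, field


@dataclass
class TurnState:
    """Hypothetical stand-in for the interrupt bookkeeping."""

    partial_tokens: list[str] = field(default_factory=list)
    next_chunk_id: int = 0


def update_after_yield(deltas: list[str], state: TurnState) -> Iterator[dict]:
    """Old ordering: the counter is published only after resuming."""
    chunk_id = 0
    for delta in deltas:
        state.partial_tokens.append(delta)
        yield {"id": chunk_id, "token": delta}
        chunk_id += 1
        state.next_chunk_id = chunk_id


def update_before_yield(deltas: list[str], state: TurnState) -> Iterator[dict]:
    """New ordering: the counter is published before the event is yielded."""
    chunk_id = 0
    for delta in deltas:
        state.partial_tokens.append(delta)
        event_id = chunk_id
        chunk_id += 1
        state.next_chunk_id = chunk_id
        yield {"id": event_id, "token": delta}


def interrupt_after_first_event(
    stream_fn: Callable[[list[str], TurnState], Iterator[dict]],
) -> tuple[int, int]:
    """Return (id of the emitted event, id an interrupt suffix would use)."""
    state = TurnState()
    gen = stream_fn(["Hel", "lo"], state)
    first = next(gen)
    gen.close()  # code after the paused yield never runs
    return first["id"], state.next_chunk_id


stale = interrupt_after_first_event(update_after_yield)
fresh = interrupt_after_first_event(update_before_yield)
```

With the old ordering the suffix would reuse id 0, the same id as the event already sent; with the new ordering it gets id 1.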


891-901: LGTM!

Comment thread src/utils/markdown_repair.py Outdated
Signed-off-by: Jordan Dubrick <jdubrick@redhat.com>

@tisnik tisnik left a comment

LGTM

@tisnik tisnik merged commit 2596e0a into lightspeed-core:main Jun 25, 2026
33 of 34 checks passed