
WIP: Agentic docs example #6223

Open
jatinsu wants to merge 2 commits into openshift:main from jatinsu:agentic-docs-example

Conversation

@jatinsu jatinsu commented Jun 23, 2026

Contributor

Agentic Docs example PR.

  1. Ran /agentic-docs:component
  2. Ran evals (/eval-setup, /eval-analyze, /eval-dataset, /eval-run)

Summary by CodeRabbit

  • Documentation
    • Added/updated end-to-end Machine Config Operator guides, including agent overview, development setup, testing instructions, and architecture/component references.
    • Added new reference docs for CRDs and the broader ecosystem, plus multiple ADRs covering on-cluster layering, deterministic config rendering, and the update lifecycle.
  • Tests
    • Introduced an automated documentation evaluation suite with a configurable benchmark, test cases, and recorded run outputs/results.

@openshift-merge-bot

Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To manually trigger all second-stage jobs, use the /pipeline required command.

This repository is configured in: LGTM mode

openshift-ci Bot added the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) on Jun 23, 2026
@coderabbitai

coderabbitai Bot commented Jun 23, 2026


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: ee620cfe-28b3-4527-adfc-d0ba637d21c0

📥 Commits

Reviewing files that changed from the base of the PR and between 02c0527 and 79a929f.

⛔ Files ignored due to path filters (1)
  • claude-metrics/otelcol.log is excluded by !**/*.log
📒 Files selected for processing (2)
  • claude-metrics/claude-metrics.jsonl
  • claude-metrics/otelcol.pid
✅ Files skipped from review due to trivial changes (1)
  • claude-metrics/otelcol.pid

Walkthrough

Adds MCO documentation pages, an eval benchmark spec, 16 documented cases, and recorded run outputs with analysis for the docs benchmark.

Changes

Documentation and evaluation updates

| Layer / File(s) | Summary |
|---|---|
| Root guides: `AGENTS.md`, `MCO_DEVELOPMENT.md`, `MCO_TESTING.md` | Adds root documentation for MCO entry points, development workflow, and testing workflow. |
| Architecture and reference docs: `ai-docs/architecture/components.md`, `ai-docs/domain/crds.md`, `ai-docs/decisions/adr-*.md`, `ai-docs/exec-plans/README.md`, `ai-docs/references/ecosystem.md` | Adds or rewrites architecture, CRD, ADR, execution-plan, and ecosystem reference pages. |
| Eval spec and judges: `eval.md`, `eval.yaml` | Defines the docs evaluation setup, dataset structure, judges, and thresholds. |
| Eval cases: `eval/cases/case-*/annotations.yaml`, `eval/cases/case-*/input.yaml` | Adds 16 evaluation cases with prompts, annotations, expected files, expected mentions, and constraint metadata. |
| Run results and analysis: `eval/runs/mco-docs-eval/2026-06-22-sonnet/*` | Adds run results, per-case outputs, summary metrics, and analysis notes for the evaluation run. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes


Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (2 errors, 1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Container-Privileges | ❌ Error | New manifest files include `privileged: true`, `hostNetwork: true`, and `hostPID: true` in MCD/Server daemonsets, matching the forbidden patterns. | Remove or justify the privileged settings in the affected K8s manifests, or mark this check as intentionally exempt if those settings are required. |
| No-Sensitive-Data-In-Logs | ❌ Error | FAIL: committed logs expose session.id/user.id and raw prompt/body fields in `claude-metrics/claude-metrics.jsonl`. | Remove or redact committed log artifacts; keep only sanitized metrics and strip session/user IDs, prompts, bodies, and other log payloads. |
| Microshift Test Compatibility | ⚠️ Warning | New Ginkgo suite `test/extended-priv/mco_layering.go` has no MicroShift skip/apigroup tag and exercises MachineConfig/MachineConfigPool APIs, which MicroShift doesn't serve. | Add `[apigroup:machineconfiguration.openshift.io]` or `[Skipped:MicroShift]` to the suite/tests, or guard with `exutil.IsMicroShiftCluster()` and `g.Skip()`. |
| Title check | ❓ Inconclusive | The title is relevant but too generic; it doesn't clearly describe the documentation and eval files added in this PR. | Rename it to summarize the main change, e.g. a concise title about adding MCO agentic documentation and evaluation assets. |
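
The redaction asked for by the No-Sensitive-Data-In-Logs check could look roughly like the Python sketch below. The key names `session.id`, `user.id`, `prompt`, and `body` are taken from the check's wording; the actual record layout of `claude-metrics/claude-metrics.jsonl` is not shown in this review, so the nested-dictionary shape and the helper names `redact` and `redact_jsonl` are assumptions for illustration only.

```python
import json

# Keys named by the failed check; the real file may use other or additional keys.
SENSITIVE_KEYS = {"session.id", "user.id", "prompt", "body"}


def redact(value):
    """Recursively drop sensitive keys from dicts, descending into lists."""
    if isinstance(value, dict):
        return {k: redact(v) for k, v in value.items() if k not in SENSITIVE_KEYS}
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value


def redact_jsonl(text: str) -> str:
    """Redact every non-empty JSON line and return the sanitized JSONL text."""
    cleaned = [
        json.dumps(redact(json.loads(line)))
        for line in text.splitlines()
        if line.strip()
    ]
    return "\n".join(cleaned)


# Hypothetical record, invented for this example.
sample = '{"name": "cost", "attributes": {"session.id": "abc", "user.id": "u1", "model": "m"}, "body": "raw"}'
sanitized = redact_jsonl(sample)
```

Dropping keys outright, rather than masking their values, matches the resolution's request to strip the payloads; whether the metrics remain useful afterwards depends on which fields downstream tooling reads.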
✅ Passed checks (11 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Stable And Deterministic Test Names | ✅ Passed | PASS: The PR only adds docs/YAML eval files; no Ginkgo Describe/Context/When/It titles were added or modified, so no unstable test names. |
| Test Structure And Quality | ✅ Passed | PR changes are docs/YAML/JSON only; no Ginkgo test code was added or modified, so this check is not applicable. |
| Single Node Openshift (Sno) Test Compatibility | ✅ Passed | No new Ginkgo e2e tests or Go changes were added; the PR only adds docs/eval YAML, so no SNO-specific multi-node assumptions appear. |
| Topology-Aware Scheduling Compatibility | ✅ Passed | PASS: The PR only adds docs/eval artifacts; no deployment manifests, operator code, or controllers were changed, so no topology-sensitive scheduling was introduced. |
| Ote Binary Stdout Contract | ✅ Passed | Only one new Go file was added, and it has no process-level stdout writes; its logger targets GinkgoWriter, not stdout. |
| Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | The PR only adds docs/eval artifacts; no new Ginkgo e2e test files or IPv4/networking assumptions were introduced. |
| No-Weak-Crypto | ✅ Passed | The actual diff only changes claude-metrics files, and none contain MD5/SHA1/DES/RC4/3DES/Blowfish/ECB or secret-comparison code. |

Comment @coderabbitai help to get the list of available commands.

@openshift-ci

openshift-ci Bot commented Jun 23, 2026

Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: jatinsu
Once this PR has been reviewed and has the lgtm label, please assign pablintino for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 16

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (6)
eval.md (1)

142-145: 🎯 Functional Correctness | 🔴 Critical

Fix budget_check judge logic: it compares aggregate run cost against per-case budget.

The budget_check judge applies the same aggregate cost ($11.14 total run) to all cases. All case rationales show "Cost $11.14 exceeds budget $2.00", though individual case costs range from $0.31–$1.20 — each under the threshold. The judge logic must retrieve and compare the per-case cost instead of the total run cost.
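
The difference between the two comparisons can be illustrated with a minimal Python sketch. The function signature, the `case_costs` mapping, and the two individual case costs are assumptions made up for illustration (chosen inside the $0.31–$1.20 range quoted above); only the $2.00 budget and the $11.14 aggregate come from the run itself, and the real judge in `eval.yaml` may be structured differently.

```python
# Hypothetical sketch; not the eval framework's actual judge API.
PER_CASE_BUDGET_USD = 2.00  # per-case budget quoted in the judge rationale


def budget_check(case_cost_usd: float, budget_usd: float = PER_CASE_BUDGET_USD) -> tuple[bool, str]:
    """Pass when the cost handed in is within the per-case budget."""
    passed = case_cost_usd <= budget_usd
    verdict = "within" if passed else "exceeds"
    return passed, f"Cost ${case_cost_usd:.2f} {verdict} budget ${budget_usd:.2f}"


# Illustrative per-case costs (invented values).
case_costs = {"case-001": 0.31, "case-002": 1.20}
aggregate_run_cost = 11.14  # total run cost reported in the review

# Behaviour described in the finding: every case is judged against the aggregate.
buggy = {case: budget_check(aggregate_run_cost)[0] for case in case_costs}

# Intended behaviour: each case is judged against its own cost.
fixed = {case: budget_check(cost)[0] for case, cost in case_costs.items()}
```

With the aggregate passed in, every case fails with the same rationale; with per-case costs, both example cases pass.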

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@eval.md` around lines 142 - 145, The budget_check judge is currently using
the aggregate total run cost for all case evaluations instead of the individual
per-case cost. Locate the budget_check judge logic and modify it to retrieve the
actual cost for each specific case rather than using the total run cost. Update
the comparison to check if the per-case cost exceeds the per-case budget
threshold instead of comparing the aggregate cost, ensuring that each case is
evaluated against its own individual cost which should be significantly lower
than the total run cost.
eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-006/input.yaml (1)

1-17: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Schema mismatch: test case uses wrong field names for judge evaluation.

This input uses expected_api, expected_example_type, expected_fields, and expected_documentation, but the judges defined in eval.yaml (lines 187–236) expect expected_files and expected_mentions. This mismatch causes judges to skip validation (as documented in analysis.md and confirmed by the 'str object' has no attribute 'get' errors in summary.yaml).

Align field names with judge expectations: rename expected_documentation → expected_files and add expected_mentions if applicable.
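
The error string quoted above is what Python raises when `.get` is called on a string instead of a mapping. The sketch below reproduces that failure and shows one defensive lookup; `naive_lookup` and `safe_expected_mentions` are hypothetical helper names, and whether the judges in `eval.yaml` receive a string for this exact reason is an assumption, not something the review confirms.

```python
from typing import Any


def naive_lookup(annotations: Any):
    """Unchecked lookup: fails with AttributeError if annotations is a str."""
    return annotations.get("expected_mentions")


def safe_expected_mentions(annotations: Any) -> list[str]:
    """Return expected_mentions, or an empty list when nothing usable is present."""
    if not isinstance(annotations, dict):
        # A non-mapping value here is what would trigger
        # "'str' object has no attribute 'get'" in the unchecked version.
        return []
    mentions = annotations.get("expected_mentions") or []
    return [str(item) for item in mentions]


# A case that still uses the old field name yields no mentions,
# so a judge keyed on expected_mentions has nothing to validate.
old_style = {"expected_documentation": ["MCO_TESTING.md"]}
skipped = safe_expected_mentions(old_style)
```

Returning an empty list keeps the judge from crashing but still hides the mismatch, so renaming the fields remains the actual fix.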

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-006/input.yaml` around
lines 1 - 17, The test case input.yaml uses incorrect field names that do not
match the judge evaluation schema defined in eval.yaml. Rename the field
expected_documentation to expected_files to align with judge expectations.
Additionally, add the expected_mentions field if there are specific mentions or
terms that the judges should validate against. Ensure all field names in this
test case match the schema requirements documented in the judges section of
eval.yaml.
eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-005/input.yaml (1)

1-15: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Schema mismatch: test case uses wrong field names for judge evaluation.

This input uses expected_api, expected_example_type, expected_fields, and expected_documentation, but the judges defined in eval.yaml (lines 187–236) expect expected_files and expected_mentions. This mismatch causes judges to skip validation (as documented in analysis.md and confirmed by the 'str object' has no attribute 'get' errors in summary.yaml).

Align field names with judge expectations: rename expected_documentation → expected_files and add expected_mentions if applicable.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-005/input.yaml` around
lines 1 - 15, The test case in input.yaml uses field names that do not match the
expectations defined in the judges within eval.yaml. Rename the field
expected_documentation to expected_files to align with judge expectations.
Additionally, add an expected_mentions field if there are specific API or
documentation mentions that should be validated by the judges. This will ensure
the schema matches the judge validation logic and prevent the validation skip
errors.
eval/cases/case-003/annotations.yaml (1)

1-4: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Critical: annotations.yaml missing required metadata fields.

Per eval.yaml dataset schema, annotations.yaml must include expected_files and expected_mentions. This file only declares category, difficulty, and topic—missing the fields required for judge evaluation. Without expected_files, the consulted_docs judge cannot validate which documentation the agent should consult. Without expected_mentions, the mentions_expected_keywords judge will be skipped.

Add expected_files (list of docs the agent should consult) and expected_mentions (keywords the response should include).
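
A small pre-flight check along these lines could catch the gap before a run. This is a sketch in Python operating on an already-parsed annotations mapping; the required key names come from the comment above, while the function name and the placeholder values for `category`, `difficulty`, and `topic` are hypothetical.

```python
# Keys the comment above says the judges need in annotations.yaml.
REQUIRED_KEYS = ("expected_files", "expected_mentions")


def missing_annotation_keys(annotations: dict) -> list[str]:
    """List required keys that are absent from a parsed annotations.yaml."""
    return [key for key in REQUIRED_KEYS if key not in annotations]


# Shape of case-003 as described in the review; the values are placeholders.
case_003 = {"category": "example", "difficulty": "example", "topic": "example"}
missing = missing_annotation_keys(case_003)
```

Running the same check over every case directory would surface all affected cases at once instead of one review comment per file.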

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@eval/cases/case-003/annotations.yaml` around lines 1 - 4, The
annotations.yaml file is missing the required metadata fields for judge
evaluation. Add two new fields to the YAML file: first add `expected_files` as a
list containing the names of documentation files that the agent should consult
when answering the question, and second add `expected_mentions` as a list
containing the keywords or concepts that should appear in the agent's response.
These fields are mandatory per the eval.yaml dataset schema and are required for
the consulted_docs and mentions_expected_keywords judges to function properly.
eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-015/input.yaml (1)

6-27: 🗄️ Data Integrity & Integration | 🔴 Critical | 🏗️ Heavy lift

Move expected metadata to annotations.yaml; verify field mappings for authoring test cases.

This run snapshot shows case-015/input.yaml contains expected result fields (expected_structure, expected_patterns, expected_documentation) that should be in annotations.yaml. The analysis.md explicitly calls out expected_patterns as a field that must migrate to annotations.yaml (line 17).

The baseline schema (eval.yaml:43–53) may need extension for authoring tests, but the principle remains: expected result metadata belongs in annotations.yaml so judges can query it via outputs["annotations"].

For case-015, move to eval/cases/case-015/annotations.yaml:

  • expected_structure → expected_structure (if authoring tests use this field)
  • expected_patterns → expected_patterns (per analysis.md)
  • expected_documentation → expected_files

Then verify that judges have corresponding if conditions and checks for these fields (similar to lines 162–194 in eval.yaml).
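
The move described above can be sketched as a pure function over the two parsed YAML documents. This is a Python illustration only: `split_case` is a hypothetical helper, the sample values are invented, and the single rename (`expected_documentation` to `expected_files`) is the one listed in this comment.

```python
# Rename taken from the mapping listed above; other keys keep their names.
RENAMES = {"expected_documentation": "expected_files"}


def split_case(input_doc: dict, annotations: dict) -> tuple[dict, dict]:
    """Return (prompt-only input, annotations extended with the moved fields)."""
    new_input = {"prompt": input_doc["prompt"]}
    new_annotations = dict(annotations)  # copy so the caller's dict is untouched
    for key, value in input_doc.items():
        if key == "prompt":
            continue
        new_annotations[RENAMES.get(key, key)] = value
    return new_input, new_annotations


# Invented sample data shaped like the case-015 description.
old_input = {
    "prompt": "Write an e2e test",
    "expected_structure": ["Describe block"],
    "expected_patterns": ["uses helpers"],
    "expected_documentation": ["MCO_TESTING.md"],
}
old_annotations = {"category": "authoring"}
new_input, new_annotations = split_case(old_input, old_annotations)
```

Keeping the function free of file I/O makes it easy to dry-run against all 16 cases before rewriting any YAML on disk.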

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-015/input.yaml` around
lines 6 - 27, The expected result metadata fields (expected_structure,
expected_patterns, and expected_documentation) currently in input.yaml should be
migrated to annotations.yaml so that judge logic can query them via
outputs["annotations"]. Create or update the annotations.yaml file to include
these fields, renaming expected_documentation to expected_files to match the
schema convention. Then verify that the judge conditions and checks in eval.yaml
(similar to the pattern shown in lines 162–194) have corresponding if conditions
that handle these three fields so judges can properly validate against the
expected results.
eval/cases/case-015/input.yaml (1)

6-27: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Relocate expected_* metadata from input.yaml to annotations.yaml.

Lines 6–27 contain test metadata (expected_structure, expected_patterns, expected_documentation) that should reside in annotations.yaml per the judge contract. Keep only the prompt field in input.yaml; move the metadata to the corresponding case-015/annotations.yaml file.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@eval/cases/case-015/input.yaml` around lines 6 - 27, The test metadata fields
expected_structure, expected_patterns, and expected_documentation in input.yaml
should be relocated to annotations.yaml according to the judge contract. Remove
these three fields from input.yaml while keeping only the prompt field, then add
the expected_structure, expected_patterns, and expected_documentation fields to
the corresponding annotations.yaml file in the case-015 directory.
♻️ Duplicate comments (2)
eval/cases/case-013/input.yaml (1)

6-14: 🗄️ Data Integrity & Integration | 🔴 Critical | 🏗️ Heavy lift

These expected result fields must move to annotations.yaml (duplicate of case-013/annotations.yaml review).

Lines 6–14 contain expected result metadata that belongs in annotations.yaml. See the fix in case-013/annotations.yaml review above.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@eval/cases/case-013/input.yaml` around lines 6 - 14, The fields
expected_rejection, expected_constraint, expected_documentation, and
correct_approach in the input.yaml file for case-013 should be moved to the
annotations.yaml file instead. Remove these four fields from input.yaml and add
them to the corresponding annotations.yaml file where they belong as expected
result metadata.
eval/cases/case-012/input.yaml (1)

7-14: 🗄️ Data Integrity & Integration | 🔴 Critical | 🏗️ Heavy lift

These expected result fields must move to annotations.yaml (duplicate of case-012/annotations.yaml review).

Lines 7–14 contain expected result metadata that belongs in annotations.yaml per the schema contract. See the detailed fix in the case-012/annotations.yaml review above. Once moved, this file should contain only the prompt field.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@eval/cases/case-012/input.yaml` around lines 7 - 14, The fields
expected_rejection, expected_constraint, expected_documentation, and
correct_approach in input.yaml (lines 7-14) violate the schema contract and
should only exist in annotations.yaml. Remove all these expected result metadata
fields from input.yaml, leaving only the prompt field in this file. Then ensure
these same fields are properly defined in the corresponding annotations.yaml
file for case-012.
🧹 Nitpick comments (5)
AGENTS.md (1)

9-25: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Specify language for fenced code block.

The directory listing code block (line 9) is missing a language specifier. Use bash, text, or sh as appropriate for markdownlint compliance (MD040).
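
For reference, the condition MD040 flags can be approximated with a short Python scan. This is a simplified sketch, not markdownlint's implementation: it only recognizes fences made of exactly three backticks at the start of a line (after stripping whitespace) and ignores tilde fences and longer fence markers. The function name is hypothetical.

```python
FENCE = "`" * 3  # built programmatically so the marker never appears literally here


def fences_missing_language(markdown: str) -> list[int]:
    """Return 1-based line numbers of opening fences that have no language tag."""
    missing = []
    in_fence = False
    for number, line in enumerate(markdown.splitlines(), start=1):
        stripped = line.strip()
        if not stripped.startswith(FENCE):
            continue
        if in_fence:
            in_fence = False  # this line closes the current block
        else:
            in_fence = True  # this line opens a new block
            if stripped == FENCE:
                missing.append(number)
    return missing


# Tiny document: one untagged block, then one tagged as bash.
doc = "\n".join(["intro", FENCE, "cmd/", FENCE, FENCE + "bash", "make binaries", FENCE])
flagged = fences_missing_language(doc)
```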

✏️ Proposed fix
-```
+```bash
 cmd/                          # 7 binaries (operator, controller, daemon, server, machine-os-builder, osimagestream, apiserver-watcher)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@AGENTS.md` around lines 9 - 25, The fenced code block containing the
directory listing in AGENTS.md is missing a language specifier, which violates
markdownlint rule MD040. Add a language identifier (bash, text, or sh)
immediately after the opening triple backticks of the code block that lists the
directories like cmd/, pkg/controller/, pkg/daemon/, etc. This ensures the
markdown is compliant and properly formatted.

Source: Linters/SAST tools

MCO_DEVELOPMENT.md (1)

14-21: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Specify language for fenced code block.

The bash code block (line 14) is missing a language specifier. Per markdownlint (MD040), specify bash explicitly.

✏️ Proposed fix
-```
+```bash
 make binaries                          # All 8 binaries
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@MCO_DEVELOPMENT.md` around lines 14 - 21, The fenced code block containing
the make commands (starting with make binaries) is missing a language specifier
on the opening triple backticks. Add bash as the language identifier by changing
the opening ``` to ```bash to comply with the markdownlint MD040 rule.

Source: Linters/SAST tools

MCO_TESTING.md (2)

98-103: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Specify language for fenced code block.

The Go import code block (line 98) is missing a language specifier. Per markdownlint (MD040), specify go explicitly.

✏️ Proposed fix
-```
+```go
 import (
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@MCO_TESTING.md` around lines 98 - 103, The fenced code block containing the
Go import statements for helpers and framework packages is missing the language
specifier. Add `go` to the opening fence (changing from ``` to ```go) before the
import statement to explicitly declare this as a Go code block and comply with
markdown linting standards.

Source: Linters/SAST tools


103-105: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Reduce repetition of "Use" at the start of consecutive sentences.

Three successive sentences in the E2E test-writing guidance begin with "Use", which reduces readability. Consider restructuring for variety.

✏️ Proposed fix
-Use builders from `test/helpers/` for creating test objects. Use `test/e2e-shared-tests/` for test logic shared across suites.
+Use builders from `test/helpers/` for creating test objects and `test/e2e-shared-tests/` for test logic shared across suites.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@MCO_TESTING.md` around lines 103 - 105, The E2E test-writing guidance section
contains three consecutive sentences that all begin with "Use", which reduces
readability. Restructure at least one or two of these sentences to start with
different words while preserving the same meaning and guidance. For example,
consider using alternative phrasings like "Utilize builders from...", "Create
test objects using builders from...", "Leverage builders from...", or
restructuring with different sentence structures (e.g., "Test objects should be
created using builders from...") to introduce grammatical variety while
maintaining clear and instructive tone.

Source: Linters/SAST tools

eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-001/input.yaml (1)

1-10: 🗄️ Data Integrity & Integration | 🔵 Trivial | ⚡ Quick win

Regenerate run snapshots after fixing test case structure.

This file contains a snapshot of case-001 as executed with the buggy input.yaml structure (expected_files/expected_mentions in input.yaml). Once the test cases are corrected to move expected_* fields to annotations.yaml, the baseline snapshots in eval/runs/ should be regenerated by re-running /eval-run --model claude-sonnet-4-6 --run-id 2026-06-22-sonnet-v2 (as recommended in analysis.md line 15).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-001/input.yaml` around
lines 1 - 10, The input.yaml file has an incorrect structure with expected_files
and expected_mentions fields that should not be in this file. Remove the
expected_files and expected_mentions fields from input.yaml, keeping only the
prompt field. Move the expected_files and expected_mentions fields to
annotations.yaml where they belong according to the corrected test case
structure. After making these structural corrections, regenerate the baseline
snapshots by running the eval-run command with the specified model and run-id as
documented in analysis.md.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@eval.yaml`:
- Around line 43-53: The schema definition in the dataset section of eval.yaml
incorrectly documents the structure of annotations.yaml files, listing fields
like expected_files, expected_mentions, expected_rejection, and
expected_guidance that do not actually exist in the test case files. Update the
schema documentation to accurately reflect the actual fields present in
annotations.yaml, which include category (present in all cases) along with
case-specific metadata fields such as difficulty, complexity, topic, severity,
constraint_type, use_case, focus, api_type, content_type, and
constraints_tested. Remove all references to the non-existent expected_* fields.

In `@eval/cases/case-002/input.yaml`:
- Around line 1-10: The input.yaml file currently contains evaluation metadata
fields (expected_files, expected_mentions, and their values) that violate the
eval.yaml dataset schema. According to the schema definition, input.yaml should
contain only the prompt field. Move the expected_files and expected_mentions
fields along with their values from input.yaml to annotations.yaml, ensuring
input.yaml retains only the prompt field. This separation is critical because
the mentions_expected_keywords judge looks for
annotations.get('expected_mentions') to validate expected keywords, and the
current placement in the wrong file causes the validation to be skipped.

In `@eval/cases/case-003/input.yaml`:
- Around line 4-12: The expected_files and expected_mentions fields in
input.yaml are in the wrong location. According to the eval.yaml schema, these
fields must be in annotations.yaml instead because the downstream judges
(mentions_expected_keywords and response_accuracy) access them via
annotations.get() calls. Remove the expected_files and expected_mentions
sections from input.yaml, and move them to annotations.yaml alongside other
metadata like category and difficulty. This ensures the judges can properly
access these expectations at runtime through the annotations dictionary.

In `@eval/cases/case-004/annotations.yaml`:
- Around line 1-3: Remove the topic field from annotations.yaml as it is not
defined in the eval.yaml schema. Add the missing expected_files and
expected_mentions fields to annotations.yaml by moving them from the paired
input.yaml file. This ensures all expectation metadata is consolidated in
annotations.yaml according to the eval.yaml schema (lines 43-53) so judges can
access it via annotations.get(), and removes undefined custom metadata.

In `@eval/cases/case-007/annotations.yaml`:
- Around line 1-4: The annotations.yaml file contains fields (api_type,
use_case, difficulty) that do not conform to the eval.yaml contract schema.
Replace these incorrect fields with the required fields specified in
eval.yaml:48-55: keep category, add expected_files, add expected_mentions, and
conditionally add expected_rejection and expected_guidance only if this is an
anti-pattern case. Remove all fields that are not part of the contract
(api_type, use_case, difficulty) to ensure the schema matches and prevents the
judge failures caused by attempting to call get() on string objects.

In `@eval/cases/case-007/input.yaml`:
- Around line 15-17: In the input.yaml file at the fixture level, rename the
field `expected_documentation` to `expected_files` to align with the field name
defined in the schema in eval.yaml. This ensures consistency between the fixture
and the schema definition so that judges can properly access the expected
documentation references using the correct field name.

In `@eval/cases/case-008/annotations.yaml`:
- Around line 1-3: The annotations.yaml file in case-008 contains incorrect
fields that do not conform to the eval.yaml schema contract. Remove the `focus`
and `complexity` fields from the annotations.yaml file. Replace them with the
required fields according to the eval.yaml contract (lines 48-55): add
`expected_files`, `expected_mentions`, and if this is an anti-pattern case, also
add `expected_rejection` and `expected_guidance`. Keep the `category` field as
it is already correct. Refer to the analysis.md documentation (lines 40-58) for
context on this issue which is the same root cause as case-007.

In `@eval/cases/case-008/input.yaml`:
- Around line 5-26: The input.yaml file references schema fields
`expected_components` and `expected_interactions` that are not defined in the
eval.yaml schema (lines 48-55). Either add these field definitions to the schema
in eval.yaml with their validation rules, or remove these fields from the
input.yaml if they are not required. Additionally, rename the
`expected_documentation` field to `expected_files` in the input.yaml to match
the field name defined in the eval.yaml schema.

In `@eval/cases/case-009/annotations.yaml`:
- Around line 1-3: The annotations.yaml file in eval/cases/case-009 is missing
required fields that are defined in the eval.yaml contract schema. Currently it
only contains category, focus, and complexity fields, but must also include the
required fields expected_files and expected_mentions (and any other required
fields documented in the eval.yaml contract). Add all missing required fields to
the annotations.yaml file to match the proper schema definition, following the
same structure used in case-007 and case-008 as reference examples.

In `@eval/cases/case-012/annotations.yaml`:
- Around line 1-4: The annotations.yaml file is missing required expected
metadata fields that judges need to access. Move the fields from input.yaml to
annotations.yaml while renaming them according to the schema contract: move
expected_rejection as-is, rename expected_constraint to rule, rename
expected_documentation to expected_files, and rename correct_approach to
expected_guidance. After adding these fields to annotations.yaml with their
corrected names, remove all these fields from input.yaml, leaving only the
prompt field there.

In `@eval/cases/case-014/input.yaml`:
- Around line 8-16: The case-014 evaluation is storing judge-required metadata
in input.yaml with field names expected_constraint and correct_approach, but the
judge template reads from annotations.yaml using different keys (rule and
expected_guidance respectively). Remove the expected_constraint and
correct_approach fields from the input.yaml file for case-014, then add them to
case-014/annotations.yaml with the correct field names: rename
expected_constraint to rule and correct_approach to expected_guidance. This
alignment will allow the judge to properly read the constraint details and
guidance without encountering the "str object has no attribute get" error.

In `@eval/cases/case-015/annotations.yaml`:
- Around line 1-7: Add the required metadata fields to the annotations.yaml file
in case-015 directory. The authoring_quality judge template expects
expected_patterns and expected_documentation fields to be present in
annotations, but they are currently missing. Migrate the expected_patterns list
(currently in input.yaml lines 14-21) and the expected_documentation list
(currently in input.yaml lines 23-26) from input.yaml to annotations.yaml so
that the judge template can access them via annotations.get() calls.

In `@eval/cases/case-016/annotations.yaml`:
- Around line 1-7: The annotations.yaml file is missing the expected_patterns
and expected_documentation fields that the authoring_quality judge template
requires. Add the expected_patterns list from input.yaml lines 17-25 and the
expected_documentation list from input.yaml lines 27-30 as top-level fields in
annotations.yaml to ensure the judge can properly access these required metadata
fields.

In `@eval/cases/case-016/input.yaml`:
- Around line 7-30: The test metadata fields (expected_structure,
expected_patterns, and expected_documentation) currently in input.yaml at lines
7-30 should be relocated to annotations.yaml per the judge contract
requirements. Remove these three expected_* fields from input.yaml and keep only
the prompt field there. Then add the expected_structure, expected_patterns, and
expected_documentation fields to the case-016/annotations.yaml file with the
same content that was removed from input.yaml.

In `@eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-011/input.yaml`:
- Around line 5-11: The test case file case-011/input.yaml uses the field name
correct_approach when it should use expected_guidance to match the anti-pattern
schema definition in eval.yaml (lines 48-55). Replace the field name
correct_approach with expected_guidance in the YAML file to ensure consistency
with the schema specification and allow judges to correctly map this field
during evaluation.

In `@eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-012/input.yaml`:
- Around line 1-14: The input.yaml file uses the field name `correct_approach`
but the anti-pattern schema specifies that this field should be named
`expected_guidance`. Rename the `correct_approach` field to `expected_guidance`
in this test case to align with the schema specification. Ensure this change is
applied consistently across all anti-pattern test cases to maintain schema
compliance.
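
The field moves and renames requested in the comments above can be sketched as one small helper. This is an illustration only: `split_case` and `RENAMES` are hypothetical names, the rename map is assembled from the field names mentioned in these comments, and loading or writing the YAML files is left out.

```python
# Hypothetical rename map, assembled from the renames requested above.
RENAMES = {
    "expected_documentation": "expected_files",
    "expected_constraint": "rule",
    "correct_approach": "expected_guidance",
}


def split_case(input_data, annotations):
    """Return (new_input, new_annotations) with only 'prompt' left in input."""
    new_input = {"prompt": input_data["prompt"]}
    new_annotations = dict(annotations)  # copy so the caller's dict is untouched
    for key, value in input_data.items():
        if key == "prompt":
            continue
        # Apply the rename when one is defined, otherwise keep the key as-is.
        new_annotations[RENAMES.get(key, key)] = value
    return new_input, new_annotations
```

Running the helper once per case directory would leave input.yaml with only the prompt, which is what the schema in eval.yaml describes.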

---

Outside diff comments:
In `@eval.md`:
- Around line 142-145: The budget_check judge currently evaluates every case
against the aggregate cost of the whole run instead of that case's own cost.
Locate the budget_check judge logic and change it to retrieve the actual cost of
the specific case being judged, then compare that per-case cost against the
per-case budget threshold. Each case's individual cost should be significantly
lower than the total run cost.
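
A minimal sketch of the per-case comparison, assuming costs are available per case ID. The function names, the integer-cent units, and the numbers in the usage note are all hypothetical and are not taken from the real judge.

```python
def budget_check(case_cost_cents, per_case_budget_cents):
    """Pass when this case's own cost stays within the per-case budget."""
    return case_cost_cents <= per_case_budget_cents


def check_run(case_costs_cents, per_case_budget_cents):
    """Judge every case against its own cost, never against the run total."""
    return {
        case_id: budget_check(cost, per_case_budget_cents)
        for case_id, cost in case_costs_cents.items()
    }
```

With two cases costing 30 and 40 cents and a 50-cent per-case budget, both cases pass here, whereas comparing the 70-cent run total against the same budget would fail them both.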

In `@eval/cases/case-003/annotations.yaml`:
- Around line 1-4: The annotations.yaml file is missing the required metadata
fields for judge evaluation. Add two new fields to the YAML file: first add
`expected_files` as a list containing the names of documentation files that the
agent should consult when answering the question, and second add
`expected_mentions` as a list containing the keywords or concepts that should
appear in the agent's response. These fields are mandatory per the eval.yaml
dataset schema and are required for the consulted_docs and
mentions_expected_keywords judges to function properly.

In `@eval/cases/case-015/input.yaml`:
- Around line 6-27: The test metadata fields expected_structure,
expected_patterns, and expected_documentation in input.yaml should be relocated
to annotations.yaml according to the judge contract. Remove these three fields
from input.yaml while keeping only the prompt field, then add the
expected_structure, expected_patterns, and expected_documentation fields to the
corresponding annotations.yaml file in the case-015 directory.

In `@eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-005/input.yaml`:
- Around line 1-15: The test case in input.yaml uses field names that do not
match the expectations defined in the judges within eval.yaml. Rename the field
expected_documentation to expected_files to align with judge expectations.
Additionally, add an expected_mentions field if there are specific API or
documentation mentions that should be validated by the judges. This will ensure
the schema matches the judge validation logic and prevent the validation skip
errors.

In `@eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-006/input.yaml`:
- Around line 1-17: The test case input.yaml uses incorrect field names that do
not match the judge evaluation schema defined in eval.yaml. Rename the field
expected_documentation to expected_files to align with judge expectations.
Additionally, add the expected_mentions field if there are specific mentions or
terms that the judges should validate against. Ensure all field names in this
test case match the schema requirements documented in the judges section of
eval.yaml.

In `@eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-015/input.yaml`:
- Around line 6-27: The expected result metadata fields (expected_structure,
expected_patterns, and expected_documentation) currently in input.yaml should be
migrated to annotations.yaml so that judge logic can query them via
outputs["annotations"]. Create or update the annotations.yaml file to include
these fields, renaming expected_documentation to expected_files to match the
schema convention. Then verify that the judge conditions and checks in eval.yaml
(similar to the pattern shown in lines 162–194) have corresponding if conditions
that handle these three fields so judges can properly validate against the
expected results.

---

Duplicate comments:
In `@eval/cases/case-012/input.yaml`:
- Around line 7-14: The fields expected_rejection, expected_constraint,
expected_documentation, and correct_approach in input.yaml (lines 7-14) violate
the schema contract and should only exist in annotations.yaml. Remove all these
expected result metadata fields from input.yaml, leaving only the prompt field
in this file. Then ensure these same fields are properly defined in the
corresponding annotations.yaml file for case-012.

In `@eval/cases/case-013/input.yaml`:
- Around line 6-14: The fields expected_rejection, expected_constraint,
expected_documentation, and correct_approach in the input.yaml file for case-013
should be moved to the annotations.yaml file instead. Remove these four fields
from input.yaml and add them to the corresponding annotations.yaml file where
they belong as expected result metadata.

---

Nitpick comments:
In `@AGENTS.md`:
- Around line 9-25: The fenced code block containing the directory listing in
AGENTS.md is missing a language specifier, which violates markdownlint rule
MD040. Add a language identifier (bash, text, or sh) immediately after the
opening triple backticks of the code block that lists the directories like cmd/,
pkg/controller/, pkg/daemon/, etc. This ensures the markdown is compliant and
properly formatted.

In `@eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-001/input.yaml`:
- Around line 1-10: The input.yaml file has an incorrect structure with
expected_files and expected_mentions fields that should not be in this file.
Remove the expected_files and expected_mentions fields from input.yaml, keeping
only the prompt field. Move the expected_files and expected_mentions fields to
annotations.yaml where they belong according to the corrected test case
structure. After making these structural corrections, regenerate the baseline
snapshots by running the eval-run command with the specified model and run-id as
documented in analysis.md.

In `@MCO_DEVELOPMENT.md`:
- Around line 14-21: The fenced code block containing the make commands
(starting with make binaries) is missing a language specifier on the opening
triple backticks. Add bash as the language identifier by changing the opening
``` to ```bash to comply with the markdownlint MD040 rule.

In `@MCO_TESTING.md`:
- Around line 98-103: The fenced code block containing the Go import statements
for helpers and framework packages is missing the language specifier. Add `go`
to the opening fence (changing from ``` to ```go) before the import statement to
explicitly declare this as a Go code block and comply with markdown linting
standards.
- Around line 103-105: The E2E test-writing guidance section contains three
consecutive sentences that all begin with "Use", which reduces readability.
Restructure at least one or two of these sentences to start with different words
while preserving the same meaning and guidance. For example, consider using
alternative phrasings like "Utilize builders from...", "Create test objects
using builders from...", "Leverage builders from...", or restructuring with
different sentence structures (e.g., "Test objects should be created using
builders from...") to introduce grammatical variety while maintaining a clear
and instructive tone.
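
The MD040 nitpicks above (fenced blocks opened without a language) could be caught before review with a small check like the one below. It is a simplified sketch, not markdownlint's implementation: it only looks at triple-backtick fences and ignores tilde fences and longer fence runs. `fences_missing_language` is a hypothetical name.

```python
FENCE = "`" * 3  # built programmatically so this example nests cleanly


def fences_missing_language(markdown):
    """Return 1-based line numbers of opening fences with no language."""
    missing = []
    in_fence = False
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        stripped = line.strip()
        if not stripped.startswith(FENCE):
            continue
        if in_fence:
            in_fence = False  # this line closes the current block
        else:
            in_fence = True  # this line opens a new block
            if stripped == FENCE:
                missing.append(lineno)
    return missing
```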

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: c8442213-bc9a-4f06-945b-1349b7e1b139

📥 Commits

Reviewing files that changed from the base of the PR and between dd2d4f3 and 02c0527.

⛔ Files ignored due to path filters (50)
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-001/output/stdout.log is excluded by !**/*.log
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-001/stdout.log is excluded by !**/*.log
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-002/output/stdout.log is excluded by !**/*.log
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-002/stdout.log is excluded by !**/*.log
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-003/output/stderr.log is excluded by !**/*.log
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-003/output/stdout.log is excluded by !**/*.log
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-003/stderr.log is excluded by !**/*.log
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-003/stdout.log is excluded by !**/*.log
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-004/output/stderr.log is excluded by !**/*.log
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-004/output/stdout.log is excluded by !**/*.log
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-004/stderr.log is excluded by !**/*.log
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-004/stdout.log is excluded by !**/*.log
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-005/output/stdout.log is excluded by !**/*.log
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-005/stdout.log is excluded by !**/*.log
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-006/output/stderr.log is excluded by !**/*.log
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-006/output/stdout.log is excluded by !**/*.log
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-006/stderr.log is excluded by !**/*.log
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-006/stdout.log is excluded by !**/*.log
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-007/output/stderr.log is excluded by !**/*.log
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-007/output/stdout.log is excluded by !**/*.log
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-007/stderr.log is excluded by !**/*.log
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-007/stdout.log is excluded by !**/*.log
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-008/output/stdout.log is excluded by !**/*.log
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-008/stdout.log is excluded by !**/*.log
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-009/output/stdout.log is excluded by !**/*.log
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-009/stdout.log is excluded by !**/*.log
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-010/output/stdout.log is excluded by !**/*.log
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-010/stdout.log is excluded by !**/*.log
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-011/output/stdout.log is excluded by !**/*.log
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-011/stdout.log is excluded by !**/*.log
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-012/output/stderr.log is excluded by !**/*.log
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-012/output/stdout.log is excluded by !**/*.log
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-012/stderr.log is excluded by !**/*.log
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-012/stdout.log is excluded by !**/*.log
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-013/output/stderr.log is excluded by !**/*.log
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-013/output/stdout.log is excluded by !**/*.log
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-013/stderr.log is excluded by !**/*.log
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-013/stdout.log is excluded by !**/*.log
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-014/output/stderr.log is excluded by !**/*.log
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-014/output/stdout.log is excluded by !**/*.log
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-014/stderr.log is excluded by !**/*.log
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-014/stdout.log is excluded by !**/*.log
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-015/output/stderr.log is excluded by !**/*.log
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-015/output/stdout.log is excluded by !**/*.log
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-015/stderr.log is excluded by !**/*.log
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-015/stdout.log is excluded by !**/*.log
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-016/output/stderr.log is excluded by !**/*.log
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-016/output/stdout.log is excluded by !**/*.log
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-016/stderr.log is excluded by !**/*.log
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-016/stdout.log is excluded by !**/*.log
📒 Files selected for processing (114)
  • AGENTS.md
  • MCO_DEVELOPMENT.md
  • MCO_TESTING.md
  • ai-docs/architecture/components.md
  • ai-docs/decisions/adr-001-on-cluster-layering.md
  • ai-docs/decisions/adr-002-config-rendering.md
  • ai-docs/decisions/adr-003-update-lifecycle.md
  • ai-docs/domain/crds.md
  • ai-docs/exec-plans/README.md
  • ai-docs/references/ecosystem.md
  • eval.md
  • eval.yaml
  • eval/cases/case-001/annotations.yaml
  • eval/cases/case-001/input.yaml
  • eval/cases/case-002/annotations.yaml
  • eval/cases/case-002/input.yaml
  • eval/cases/case-003/annotations.yaml
  • eval/cases/case-003/input.yaml
  • eval/cases/case-004/annotations.yaml
  • eval/cases/case-004/input.yaml
  • eval/cases/case-005/annotations.yaml
  • eval/cases/case-005/input.yaml
  • eval/cases/case-006/annotations.yaml
  • eval/cases/case-006/input.yaml
  • eval/cases/case-007/annotations.yaml
  • eval/cases/case-007/input.yaml
  • eval/cases/case-008/annotations.yaml
  • eval/cases/case-008/input.yaml
  • eval/cases/case-009/annotations.yaml
  • eval/cases/case-009/input.yaml
  • eval/cases/case-010/annotations.yaml
  • eval/cases/case-010/input.yaml
  • eval/cases/case-011/annotations.yaml
  • eval/cases/case-011/input.yaml
  • eval/cases/case-012/annotations.yaml
  • eval/cases/case-012/input.yaml
  • eval/cases/case-013/annotations.yaml
  • eval/cases/case-013/input.yaml
  • eval/cases/case-014/annotations.yaml
  • eval/cases/case-014/input.yaml
  • eval/cases/case-015/annotations.yaml
  • eval/cases/case-015/input.yaml
  • eval/cases/case-016/annotations.yaml
  • eval/cases/case-016/input.yaml
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/analysis.md
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-001/events.json
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-001/input.yaml
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-001/run_result.json
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-001/subagents/agent-a207441d16e958075.jsonl
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-002/events.json
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-002/input.yaml
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-002/run_result.json
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-002/subagents/agent-a845438d18a82924a.jsonl
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-003/events.json
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-003/input.yaml
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-003/run_result.json
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-003/subagents/agent-a213d36a449d15fd0.jsonl
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-004/events.json
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-004/input.yaml
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-004/run_result.json
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-004/subagents/agent-a2b52b5edbe308998.jsonl
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-005/events.json
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-005/input.yaml
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-005/run_result.json
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-005/subagents/agent-a2cc8654419f1bf46.jsonl
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-006/events.json
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-006/input.yaml
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-006/run_result.json
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-006/subagents/agent-a1070550a207b5ed7.jsonl
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-007/events.json
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-007/input.yaml
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-007/run_result.json
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-007/subagents/agent-a472ace94b5ff04dd.jsonl
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-008/events.json
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-008/input.yaml
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-008/run_result.json
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-008/subagents/agent-aafd2a1079434721b.jsonl
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-009/events.json
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-009/input.yaml
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-009/run_result.json
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-009/subagents/agent-a4c85ad13de8d82c4.jsonl
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-010/events.json
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-010/input.yaml
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-010/run_result.json
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-010/subagents/agent-ad8f424734eaf9c41.jsonl
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-011/events.json
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-011/input.yaml
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-011/run_result.json
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-011/subagents/agent-a12d71965a7d591c1.jsonl
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-012/events.json
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-012/input.yaml
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-012/run_result.json
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-012/subagents/agent-a75d9bfdacfcec099.jsonl
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-013/events.json
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-013/input.yaml
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-013/run_result.json
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-013/subagents/agent-a5cba4a548698d94a.jsonl
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-014/events.json
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-014/input.yaml
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-014/run_result.json
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-014/subagents/agent-af337390aa687194c.jsonl
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-015/events.json
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-015/input.yaml
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-015/run_result.json
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-015/subagents/agent-a72410ff514c682f1.jsonl
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-016/events.json
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-016/input.yaml
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-016/run_result.json
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-016/subagents/agent-a73ecb25f4b643455.jsonl
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-016/subagents/agent-ac035d4a244162191.jsonl
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/collection.json
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/report.html
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/run_result.json
  • eval/runs/mco-docs-eval/2026-06-22-sonnet/summary.yaml

Comment thread eval.yaml
Comment on lines +43 to +53
dataset:
  path: eval/cases
  schema: |
    Each case directory contains:
    - input.yaml: YAML file with 'prompt' (the question or task for the agent)
    - annotations.yaml: Expected results metadata with fields:
      - category: test category (navigation, component-usage, architecture, anti-pattern, authoring)
      - expected_files: list of documentation files the agent should consult
      - expected_mentions: keywords that should appear in the response
      - expected_rejection: (anti-pattern only) boolean, true if agent should reject the approach
      - expected_guidance: (anti-pattern only) keywords for the correct approach

🗄️ Data Integrity & Integration | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Verify test case annotations.yaml files match the declared schema.

# Check a sample of case annotations for structure and types
for case_dir in eval/cases/case-00{1,5,9,13}/; do
  if [ -d "$case_dir" ]; then
    echo "=== $case_dir/annotations.yaml ==="
    head -20 "$case_dir/annotations.yaml"
  fi
done

# Verify required keys exist in all annotations
echo -e "\n=== Checking required fields in all cases ==="
for annotations in eval/cases/*/annotations.yaml; do
  if [ -f "$annotations" ]; then
    echo "Checking: $annotations"
    grep -E "^(category|expected_|rule|correct_approach)" "$annotations" | head -5
  fi
done

Repository: openshift/machine-config-operator

Length of output: 1694


🏁 Script executed:

#!/bin/bash
# Comprehensive check for all fields in annotations.yaml files

echo "=== Complete content of a few annotations.yaml files ==="
for case in eval/cases/case-001 eval/cases/case-011 eval/cases/case-015; do
  if [ -f "$case/annotations.yaml" ]; then
    echo -e "\n--- $case/annotations.yaml ---"
    cat "$case/annotations.yaml"
  fi
done

echo -e "\n=== Search for 'expected_' fields across ALL annotations ==="
grep -r "expected_" eval/cases/*/annotations.yaml 2>/dev/null | head -20

echo -e "\n=== All unique top-level keys in annotations files ==="
find eval/cases -name "annotations.yaml" -exec sh -c 'echo "=== $1 ==="; grep "^[a-z_]*:" "$1" | cut -d: -f1 | sort -u' _ {} \; | grep -v "^===" | sort | uniq -c | sort -rn

Repository: openshift/machine-config-operator

Length of output: 1035


Schema documentation in eval.yaml does not match actual test case files.

The declared schema (lines 45-53) specifies annotations.yaml should contain expected_files, expected_mentions, expected_rejection, and expected_guidance fields. However, none of these fields exist in any of the test case annotations.

Actual annotation files contain entirely different fields: category (present in all cases), plus case-specific metadata like difficulty, complexity, topic, severity, constraint_type, use_case, focus, api_type, content_type, and constraints_tested.

Update the schema definition in eval.yaml to accurately reflect the actual structure of annotations.yaml files.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@eval.yaml` around lines 43 - 53, The schema definition in the dataset section
of eval.yaml incorrectly documents the structure of annotations.yaml files,
listing fields like expected_files, expected_mentions, expected_rejection, and
expected_guidance that do not actually exist in the test case files. Update the
schema documentation to accurately reflect the actual fields present in
annotations.yaml, which include category (present in all cases) along with
case-specific metadata fields such as difficulty, complexity, topic, severity,
constraint_type, use_case, focus, api_type, content_type, and
constraints_tested. Remove all references to the non-existent expected_* fields.
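
Whichever side ends up being changed (the schema or the fixtures), the drift between them can be reported mechanically. The sketch below assumes each annotations.yaml has already been loaded into a dict. `DECLARED_FIELDS` mirrors the schema text quoted above, `ALWAYS_REQUIRED` is an assumption that treats the anti-pattern-only fields as optional, and `schema_drift` is a hypothetical name.

```python
# Fields listed in the eval.yaml schema text quoted above.
DECLARED_FIELDS = {
    "category",
    "expected_files",
    "expected_mentions",
    "expected_rejection",
    "expected_guidance",
}
# Assumption: the anti-pattern-only fields are optional for other categories.
ALWAYS_REQUIRED = {"category", "expected_files", "expected_mentions"}


def schema_drift(annotations):
    """Report keys the schema does not declare and required keys that are absent."""
    keys = set(annotations)
    return {
        "undeclared": sorted(keys - DECLARED_FIELDS),
        "missing": sorted(ALWAYS_REQUIRED - keys),
    }
```

For a fixture holding only category, api_type, use_case, and difficulty, this reports three undeclared keys and two missing ones.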

Comment on lines +1 to +10
prompt: How does MachineConfig merging work when multiple MachineConfigs target the
same pool? What determines the order?
expected_files:
- ai-docs/decisions/adr-002-config-rendering.md
- ai-docs/domain/crds.md
expected_mentions:
- deterministic
- sorted
- name
- last-writer-wins

🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Critical: expected_files and expected_mentions placed in wrong file; breaks evaluation contract.

Per eval.yaml dataset schema (lines 43–53), input.yaml should contain only the prompt field. Evaluation metadata (expected_files, expected_mentions, category, etc.) must go in annotations.yaml. Placing these fields in input.yaml breaks the downstream judge contract; the mentions_expected_keywords judge looks for annotations.get('expected_mentions') and will skip validation when these fields are absent from annotations.

analysis.md documents this as bug #2, "expected fields in wrong file," and the first run confirms it: case-002 "Passed but vacuously (no expected_files in annotations)" even though input.yaml supplies them.

Move lines 3–10 to eval/cases/case-002/annotations.yaml, keeping only the prompt in input.yaml.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@eval/cases/case-002/input.yaml` around lines 1 - 10, The input.yaml file
currently contains evaluation metadata fields (expected_files,
expected_mentions, and their values) that violate the eval.yaml dataset schema.
According to the schema definition, input.yaml should contain only the prompt
field. Move the expected_files and expected_mentions fields along with their
values from input.yaml to annotations.yaml, ensuring input.yaml retains only the
prompt field. This separation is critical because the mentions_expected_keywords
judge looks for annotations.get('expected_mentions') to validate expected
keywords, and the current placement in the wrong file causes the validation to
be skipped.
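
To make the vacuous pass concrete, here is an assumed shape for a keyword judge. It is not the actual mentions_expected_keywords implementation. The only behavior taken from the comment above is that the judge reads `annotations.get('expected_mentions')` and skips when nothing is there.

```python
def mentions_expected_keywords(response, annotations):
    """Assumed judge shape: skip (and pass) when no keywords are annotated."""
    expected = annotations.get("expected_mentions")
    if not expected:
        # Nothing to check, so the judge skips, which reads as a pass.
        return {"passed": True, "skipped": True, "missing": []}
    text = response.lower()
    missing = [kw for kw in expected if kw.lower() not in text]
    return {"passed": not missing, "skipped": False, "missing": missing}
```

With the keywords left in input.yaml, the annotations dict carries no `expected_mentions`, so any response passes. Once they are moved into annotations.yaml, a response that omits "last-writer-wins" would fail.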

Comment on lines +4 to +12
expected_files:
- ai-docs/domain/crds.md
- ai-docs/decisions/adr-001-on-cluster-layering.md
- docs/OnClusterLayering.md
expected_mentions:
- MachineOSConfig
- MachineOSBuild
- layering
- Build Controller

🗄️ Data Integrity & Integration | 🟠 Major | ⚖️ Poor tradeoff

Move expected_files and expected_mentions to annotations.yaml.

Per the eval.yaml dataset schema (lines 43-53), expected_files and expected_mentions must be in annotations.yaml, not input.yaml. The downstream judges (mentions_expected_keywords and response_accuracy) access these via annotations.get('expected_mentions') and annotations.get('expected_files', []) respectively, so placing them in input.yaml makes them inaccessible at runtime.

This is the exact configuration bug reported in analysis.md (config bug #2): "expected_mentions, expected_files, category, rule, expected_guidance, expected_patterns must be in annotations.yaml for judges to access them via outputs['annotations']".

🔧 Proposed fix: Move expectations to annotations.yaml

Create/update eval/cases/case-003/annotations.yaml:

category: navigation
difficulty: medium
topic: OnClusterLayering
expected_files:
  - ai-docs/domain/crds.md
  - ai-docs/decisions/adr-001-on-cluster-layering.md
  - docs/OnClusterLayering.md
expected_mentions:
  - MachineOSConfig
  - MachineOSBuild
  - layering
  - Build Controller

Then update eval/cases/case-003/input.yaml to contain only the prompt:

-prompt: I want to set up on-cluster layering to build custom OS images. Where can
-  I find information about MachineOSConfig and MachineOSBuild, and how does the build
-  process work?
-expected_files:
-- ai-docs/domain/crds.md
-- ai-docs/decisions/adr-001-on-cluster-layering.md
-- docs/OnClusterLayering.md
-expected_mentions:
-- MachineOSConfig
-- MachineOSBuild
-- layering
-- Build Controller
+prompt: I want to set up on-cluster layering to build custom OS images. Where can
+  I find information about MachineOSConfig and MachineOSBuild, and how does the build
+  process work?
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@eval/cases/case-003/input.yaml` around lines 4 - 12, The expected_files and
expected_mentions fields in input.yaml are in the wrong location. According to
the eval.yaml schema, these fields must be in annotations.yaml instead because
the downstream judges (mentions_expected_keywords and response_accuracy) access
them via annotations.get() calls. Remove the expected_files and
expected_mentions sections from input.yaml, and move them to annotations.yaml
alongside other metadata like category and difficulty. This ensures the judges
can properly access these expectations at runtime through the annotations
dictionary.

Comment on lines +1 to +3
category: navigation
difficulty: medium
topic: NodeDisruptionPolicy

🗄️ Data Integrity & Integration | 🟠 Major | ⚖️ Poor tradeoff

Add missing expected_files and expected_mentions to annotations.yaml.

This annotations.yaml is incomplete per the eval.yaml schema (lines 43-53). It should include expected_files and expected_mentions, which currently reside (incorrectly) in the paired input.yaml. All expectation metadata belongs in annotations.yaml so judges can access it via annotations.get().

Additionally, the topic field is not defined in the eval.yaml schema and adds custom metadata that judges do not use.

🔧 Proposed fix: Populate annotations.yaml with expected_* fields

Update eval/cases/case-004/annotations.yaml:

 category: navigation
 difficulty: medium
-topic: NodeDisruptionPolicy
+expected_files:
+  - docs/NodeDisruptionPolicy.md
+expected_mentions:
+  - NodeDisruptionPolicy
+  - MachineConfiguration
+  - cluster
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@eval/cases/case-004/annotations.yaml` around lines 1 - 3, Remove the topic
field from annotations.yaml as it is not defined in the eval.yaml schema. Add
the missing expected_files and expected_mentions fields to annotations.yaml by
moving them from the paired input.yaml file. This ensures all expectation
metadata is consolidated in annotations.yaml according to the eval.yaml schema
(lines 43-53) so judges can access it via annotations.get(), and removes
undefined custom metadata.

Comment on lines +1 to +4
category: component-usage
api_type: kubernetes-crd
use_case: basic-usage
difficulty: easy

🎯 Functional Correctness | 🔴 Critical | ⚡ Quick win

Annotations schema does not match eval.yaml contract.

According to eval.yaml:48-55, annotations.yaml must contain:

  • category, expected_files, expected_mentions, expected_rejection (anti-pattern only), expected_guidance (anti-pattern only)

This file has: category, api_type, use_case, difficulty — none of which are in the contract.

This mismatch is already documented in eval/runs/mco-docs-eval/2026-06-22-sonnet/analysis.md:40-58 as the root cause of judge failures ('str' object has no attribute 'get'). Unless the schema is being revised, these fixtures must be corrected before evaluation can proceed.
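A pre-flight check along these lines could catch the mismatch before a run. This is only a sketch: the field sets are copied from the contract as quoted in this comment, `check_annotations` is a hypothetical helper, and treating the anti-pattern fields as disallowed on other cases is an assumption about how strict the contract is meant to be.

```python
ALWAYS_REQUIRED = {"category", "expected_files", "expected_mentions"}
ANTI_PATTERN_ONLY = {"expected_rejection", "expected_guidance"}


def check_annotations(annotations, anti_pattern=False):
    """Return a list of contract violations (empty when the fixture conforms)."""
    # A non-mapping value (for example an unparsed string) is what makes
    # a later annotations.get(...) call raise AttributeError.
    if not isinstance(annotations, dict):
        return [f"annotations must be a mapping, got {type(annotations).__name__}"]
    allowed = ALWAYS_REQUIRED | (ANTI_PATTERN_ONLY if anti_pattern else set())
    present = set(annotations)
    problems = [f"missing field: {key}" for key in sorted(allowed - present)]
    problems += [f"unexpected field: {key}" for key in sorted(present - allowed)]
    return problems


# case-007 annotations as shown in this comment.
case_007 = {
    "category": "component-usage",
    "api_type": "kubernetes-crd",
    "use_case": "basic-usage",
    "difficulty": "easy",
}

for problem in check_annotations(case_007):
    print(problem)

# A raw string instead of a parsed mapping is reported up front.
print(check_annotations("category: component-usage"))
```

Running a check like this over every case directory before `/eval-run` would surface schema drift as a readable list rather than as a judge crash.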

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@eval/cases/case-007/annotations.yaml` around lines 1 - 4, The
annotations.yaml file contains fields (api_type, use_case, difficulty) that do
not conform to the eval.yaml contract schema. Replace these incorrect fields
with the required fields specified in eval.yaml:48-55: keep category, add
expected_files, add expected_mentions, and conditionally add expected_rejection
and expected_guidance only if this is an anti-pattern case. Remove all fields
that are not part of the contract (api_type, use_case, difficulty) to ensure the
schema matches and prevents the judge failures caused by attempting to call
get() on string objects.

Comment on lines +1 to +7
category: authoring
content_type: machineconfig-resource
complexity: medium
constraints_tested:
- all-remote-sources-resolved-at-render-time
- deterministic-merge-ordering
- worker-pool-label-matching

🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Add judge-required metadata fields to annotations.yaml.

The authoring_quality judge template (eval.yaml line 209–246) expects annotations.get('expected_patterns', []) and annotations.get('expected_documentation', []), but case-015/annotations.yaml is missing these fields. The expected patterns and documentation files are currently in input.yaml (lines 13–26), but judges cannot access them there.

Migrate to annotations.yaml:

  • expected_patterns: [...] from input.yaml lines 14–21
  • expected_documentation: [...] from input.yaml lines 23–26
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@eval/cases/case-015/annotations.yaml` around lines 1 - 7, Add the required
metadata fields to the annotations.yaml file in case-015 directory. The
authoring_quality judge template expects expected_patterns and
expected_documentation fields to be present in annotations, but they are
currently missing. Migrate the expected_patterns list (currently in input.yaml
lines 14-21) and the expected_documentation list (currently in input.yaml lines
23-26) from input.yaml to annotations.yaml so that the judge template can access
them via annotations.get() calls.

Comment on lines +1 to +7
category: authoring
content_type: machineconfigpool-and-kubeletconfig
complexity: complex
constraints_tested:
- custom-pool-must-inherit-from-worker
- kubeletconfig-controller-integration
- deterministic-rendering

🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Add judge-required metadata fields to annotations.yaml.

The authoring_quality judge template expects annotations.get('expected_patterns', []) and annotations.get('expected_documentation', []), but case-016/annotations.yaml lacks these fields. The expected patterns and documentation are in input.yaml (lines 16–30), but judges read from annotations only.

Migrate to annotations.yaml:

  • expected_patterns: [...] from input.yaml lines 17–25
  • expected_documentation: [...] from input.yaml lines 27–30
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@eval/cases/case-016/annotations.yaml` around lines 1 - 7, The
annotations.yaml file is missing the expected_patterns and
expected_documentation fields that the authoring_quality judge template
requires. Add the expected_patterns list from input.yaml lines 17-25 and the
expected_documentation list from input.yaml lines 27-30 as top-level fields in
annotations.yaml to ensure the judge can properly access these required metadata
fields.

Comment on lines +7 to +30
expected_structure:
- MachineConfigPool resource definition
- MachineConfigPool metadata with name
- MachineConfigPool spec.machineConfigSelector with matchExpressions
- MachineConfigPool spec.nodeSelector
- KubeletConfig resource definition
- KubeletConfig metadata with name
- KubeletConfig spec.machineConfigPoolSelector
- KubeletConfig spec.kubeletConfig with systemReserved and cpuManagerPolicy
expected_patterns:
- Custom MachineConfigPool inherits from worker pool via machineConfigSelector using
  matchExpressions with both worker and gpu-worker roles
- machineConfigSelector must NOT use independent matchLabels without worker inheritance
- 'nodeSelector matches node-role.kubernetes.io/gpu-worker: '''''
- KubeletConfig machineConfigPoolSelector targets the gpu-worker pool
- apiVersion is machineconfiguration.openshift.io/v1 for both resources
- KubeletConfig kubeletConfig section uses valid kubelet configuration fields
- systemReserved memory is specified as '2Gi'
- cpuManagerPolicy is set to 'static'
expected_documentation:
- ai-docs/domain/crds.md
- ai-docs/architecture/components.md
- docs/KubeletConfigDesign.md
- docs/MachineConfig.md

🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Relocate expected_* metadata from input.yaml to annotations.yaml.

Lines 7–30 contain test metadata (expected_structure, expected_patterns, expected_documentation) that should reside in annotations.yaml per the judge contract (eval.yaml:209–246). Keep only the prompt field in input.yaml; move the metadata to case-016/annotations.yaml.
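The relocation could be scripted rather than done by hand. The sketch below works on already-parsed mappings so it stays independent of any YAML library; `move_expectations` is a hypothetical helper, the lists are abbreviated from the snippet above, and the prompt text is elided because it is not shown in this thread.

```python
import copy

JUDGE_KEYS = ("expected_structure", "expected_patterns", "expected_documentation")


def move_expectations(input_doc, annotations_doc, keys=JUDGE_KEYS):
    """Return (new_input, new_annotations) with the given keys moved out of input_doc."""
    new_input = copy.deepcopy(input_doc)
    new_annotations = copy.deepcopy(annotations_doc)
    for key in keys:
        if key in new_input:
            new_annotations[key] = new_input.pop(key)
    return new_input, new_annotations


# case-016 input.yaml, abbreviated; the prompt is elided here.
input_doc = {
    "prompt": "...",
    "expected_structure": ["MachineConfigPool resource definition"],
    "expected_patterns": ["cpuManagerPolicy is set to 'static'"],
    "expected_documentation": [
        "ai-docs/domain/crds.md",
        "ai-docs/architecture/components.md",
        "docs/KubeletConfigDesign.md",
        "docs/MachineConfig.md",
    ],
}

# case-016 annotations.yaml as currently committed.
annotations_doc = {
    "category": "authoring",
    "content_type": "machineconfigpool-and-kubeletconfig",
    "complexity": "complex",
}

new_input, new_annotations = move_expectations(input_doc, annotations_doc)
print(sorted(new_input))        # only the prompt key remains
print(sorted(new_annotations))  # original fields plus the moved expected_* keys
```

Returning copies instead of mutating in place keeps the original fixtures available for a diff before anything is written back to disk.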

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@eval/cases/case-016/input.yaml` around lines 7 - 30, The test metadata fields
(expected_structure, expected_patterns, and expected_documentation) currently in
input.yaml at lines 7-30 should be relocated to annotations.yaml per the judge
contract requirements. Remove these three expected_* fields from input.yaml and
keep only the prompt field there. Then add the expected_structure,
expected_patterns, and expected_documentation fields to the
case-016/annotations.yaml file with the same content that was removed from
input.yaml.

Comment on lines +5 to +11
expected_rejection: true
expected_constraint: Custom MachineConfigPools must inherit from worker pool via machineConfigSelector
expected_documentation:
- ai-docs/architecture/components.md
correct_approach: Custom MachineConfigPools must inherit from the worker pool by including
  the worker role in their machineConfigSelector, then adding additional selectors
  for the custom role on top of that inheritance.

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Run output shows correct_approach instead of expected_guidance from schema.

The anti-pattern schema in eval.yaml:48-55 specifies expected_guidance as the field for correct approach keywords. This run snapshot uses correct_approach. Confirm whether this is an intentional schema evolution or a bug that slipped into the first run.

This inconsistency will prevent judges from correctly mapping the field during evaluation.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-011/input.yaml` around
lines 5 - 11, The test case file case-011/input.yaml uses the field name
correct_approach when it should use expected_guidance to match the anti-pattern
schema definition in eval.yaml (lines 48-55). Replace the field name
correct_approach with expected_guidance in the YAML file to ensure consistency
with the schema specification and allow judges to correctly map this field
during evaluation.

Comment on lines +1 to +14
prompt: "I want to set up a NodeDisruptionPolicy to control how my nodes handle certain\
  \ config changes. I'm going to create a MachineConfiguration custom resource named\
  \ 'my-cluster-policy' with the disruption actions I need. Here's my YAML so far:\n\
  \napiVersion: operator.openshift.io/v1\nkind: MachineConfiguration\nmetadata:\n\
  \ name: my-cluster-policy\nspec:\n nodeDisruptionPolicy:\n ...\n\nCan you help\
  \ me fill in the spec?"
expected_rejection: true
expected_constraint: MachineConfiguration CR for NodeDisruptionPolicy must be named
  'cluster'
expected_documentation:
- docs/NodeDisruptionPolicy.md
correct_approach: The MachineConfiguration custom resource for NodeDisruptionPolicy
  must be named exactly 'cluster'. Change metadata.name from 'my-cluster-policy' to
  'cluster'.

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Run output shows correct_approach instead of expected_guidance from schema.

Same as case-011: the anti-pattern schema specifies expected_guidance, but this run uses correct_approach. Synchronize across all anti-pattern test cases and update the schema documentation.
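If the schema name is the one to keep, the rename could be applied uniformly with a small helper. This is a sketch under that assumption: `rename_field` is hypothetical, it operates on a parsed mapping, and the sample values are shortened from the case-012 snippet above.

```python
def rename_field(doc, old="correct_approach", new="expected_guidance"):
    """Return a copy of doc with `old` renamed to `new`, preserving key order."""
    if old in doc and new in doc:
        # Refuse to guess which value wins when both spellings are present.
        raise ValueError(f"both {old!r} and {new!r} are present")
    return {(new if key == old else key): value for key, value in doc.items()}


# Expectation fields from the case-012 run snapshot (values shortened).
case_012 = {
    "expected_rejection": True,
    "expected_constraint": "MachineConfiguration CR for NodeDisruptionPolicy must be named 'cluster'",
    "expected_documentation": ["docs/NodeDisruptionPolicy.md"],
    "correct_approach": "Change metadata.name from 'my-cluster-policy' to 'cluster'.",
}

renamed = rename_field(case_012)
print(list(renamed))
```

Documents that already use `expected_guidance`, or that have neither key, pass through unchanged, so the helper can be run over every anti-pattern case without special-casing.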

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-012/input.yaml` around
lines 1 - 14, The input.yaml file uses the field name `correct_approach` but the
anti-pattern schema specifies that this field should be named
`expected_guidance`. Rename the `correct_approach` field to `expected_guidance`
in this test case to align with the schema specification. Ensure this change is
applied consistently across all anti-pattern test cases to maintain schema
compliance.

openshift-ci Bot commented Jun 25, 2026

@jatinsu: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/unit | 79a929f | link | true | /test unit |

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
