WIP: Agentic docs example #6223
Pipeline controller notification For optional jobs, comment This repository is configured in: LGTM mode |
No actionable comments were generated in the recent review. 🎉
ℹ️ Recent review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID:
⛔ Files ignored due to path filters (1)
📒 Files selected for processing (2)
✅ Files skipped from review due to trivial changes (1)
Walkthrough
Adds MCO documentation pages, an eval benchmark spec, 16 documented cases, and recorded run outputs with analysis for the docs benchmark.
Changes
Documentation and evaluation updates
Estimated code review effort
🎯 2 (Simple) | ⏱️ ~10 minutes
Important
Pre-merge checks failed
Please resolve all errors before merging. Addressing warnings is optional.
❌ Failed checks (2 errors, 1 warning, 1 inconclusive)
✅ Passed checks (11 passed)
✨ Finishing Touches
🧪 Generate unit tests (beta)
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: jatinsu
The full list of commands accepted by this bot can be found here.
Details
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Actionable comments posted: 16
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (6)
eval.md (1)
142-145: 🎯 Functional Correctness | 🔴 Critical
Fix budget_check judge logic: it compares aggregate run cost against per-case budget.
The `budget_check` judge applies the same aggregate cost ($11.14 total run) to all cases. All case rationales show "Cost $11.14 exceeds budget $2.00", though individual case costs range from $0.31–$1.20, each under the threshold. The judge logic must retrieve and compare the per-case cost instead of the total run cost.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@eval.md` around lines 142 - 145, The budget_check judge is currently using the aggregate total run cost for all case evaluations instead of the individual per-case cost. Locate the budget_check judge logic and modify it to retrieve the actual cost for each specific case rather than using the total run cost. Update the comparison to check if the per-case cost exceeds the per-case budget threshold instead of comparing the aggregate cost, ensuring that each case is evaluated against its own individual cost which should be significantly lower than the total run cost.
eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-006/input.yaml (1)
1-17: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win
Schema mismatch: test case uses wrong field names for judge evaluation.
This input uses `expected_api`, `expected_example_type`, `expected_fields`, and `expected_documentation`, but the judges defined in `eval.yaml` (lines 187–236) expect `expected_files` and `expected_mentions`. This mismatch causes judges to skip validation (as documented in `analysis.md` and confirmed by the `'str object' has no attribute 'get'` errors in `summary.yaml`).
Align field names with judge expectations: rename `expected_documentation` → `expected_files` and add `expected_mentions` if applicable.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-006/input.yaml` around lines 1 - 17, The test case input.yaml uses incorrect field names that do not match the judge evaluation schema defined in eval.yaml. Rename the field expected_documentation to expected_files to align with judge expectations. Additionally, add the expected_mentions field if there are specific mentions or terms that the judges should validate against. Ensure all field names in this test case match the schema requirements documented in the judges section of eval.yaml.
eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-005/input.yaml (1)
1-15: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win
Schema mismatch: test case uses wrong field names for judge evaluation.
This input uses `expected_api`, `expected_example_type`, `expected_fields`, and `expected_documentation`, but the judges defined in `eval.yaml` (lines 187–236) expect `expected_files` and `expected_mentions`. This mismatch causes judges to skip validation (as documented in `analysis.md` and confirmed by the `'str object' has no attribute 'get'` errors in `summary.yaml`).
Align field names with judge expectations: rename `expected_documentation` → `expected_files` and add `expected_mentions` if applicable.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-005/input.yaml` around lines 1 - 15, The test case in input.yaml uses field names that do not match the expectations defined in the judges within eval.yaml. Rename the field expected_documentation to expected_files to align with judge expectations. Additionally, add an expected_mentions field if there are specific API or documentation mentions that should be validated by the judges. This will ensure the schema matches the judge validation logic and prevent the validation skip errors.
eval/cases/case-003/annotations.yaml (1)
1-4: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win
Critical: annotations.yaml missing required metadata fields.
Per eval.yaml dataset schema, `annotations.yaml` must include `expected_files` and `expected_mentions`. This file only declares `category`, `difficulty`, and `topic`, missing the fields required for judge evaluation. Without `expected_files`, the `consulted_docs` judge cannot validate which documentation the agent should consult. Without `expected_mentions`, the `mentions_expected_keywords` judge will be skipped.
Add `expected_files` (list of docs the agent should consult) and `expected_mentions` (keywords the response should include).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@eval/cases/case-003/annotations.yaml` around lines 1 - 4, The annotations.yaml file is missing the required metadata fields for judge evaluation. Add two new fields to the YAML file: first add `expected_files` as a list containing the names of documentation files that the agent should consult when answering the question, and second add `expected_mentions` as a list containing the keywords or concepts that should appear in the agent's response. These fields are mandatory per the eval.yaml dataset schema and are required for the consulted_docs and mentions_expected_keywords judges to function properly.
eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-015/input.yaml (1)
6-27: 🗄️ Data Integrity & Integration | 🔴 Critical | 🏗️ Heavy lift
Move expected metadata to annotations.yaml; verify field mappings for authoring test cases.
This run snapshot shows case-015/input.yaml contains expected result fields (expected_structure, expected_patterns, expected_documentation) that should be in annotations.yaml. The analysis.md explicitly calls out expected_patterns as a field that must migrate to annotations.yaml (line 17).
The baseline schema (eval.yaml:43–53) may need extension for authoring tests, but the principle remains: expected result metadata belongs in annotations.yaml so judges can query it via `outputs["annotations"]`.
For case-015, move to eval/cases/case-015/annotations.yaml:
- `expected_structure` → `expected_structure` (if authoring tests use this field)
- `expected_patterns` → `expected_patterns` (per analysis.md)
- `expected_documentation` → `expected_files`
Then verify that judges have corresponding `if` conditions and checks for these fields (similar to lines 162–194 in eval.yaml).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-015/input.yaml` around lines 6 - 27, The expected result metadata fields (expected_structure, expected_patterns, and expected_documentation) currently in input.yaml should be migrated to annotations.yaml so that judge logic can query them via outputs["annotations"]. Create or update the annotations.yaml file to include these fields, renaming expected_documentation to expected_files to match the schema convention. Then verify that the judge conditions and checks in eval.yaml (similar to the pattern shown in lines 162–194) have corresponding if conditions that handle these three fields so judges can properly validate against the expected results.
eval/cases/case-015/input.yaml (1)
6-27: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win
Relocate expected_* metadata from input.yaml to annotations.yaml.
Lines 6–27 contain test metadata (`expected_structure`, `expected_patterns`, `expected_documentation`) that should reside in annotations.yaml per the judge contract. Keep only the `prompt` field in input.yaml; move the metadata to the corresponding case-015/annotations.yaml file.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@eval/cases/case-015/input.yaml` around lines 6 - 27, The test metadata fields expected_structure, expected_patterns, and expected_documentation in input.yaml should be relocated to annotations.yaml according to the judge contract. Remove these three fields from input.yaml while keeping only the prompt field, then add the expected_structure, expected_patterns, and expected_documentation fields to the corresponding annotations.yaml file in the case-015 directory.
♻️ Duplicate comments (2)
eval/cases/case-013/input.yaml (1)
6-14: 🗄️ Data Integrity & Integration | 🔴 Critical | 🏗️ Heavy lift
These expected result fields must move to annotations.yaml (duplicate of case-013/annotations.yaml review).
Lines 6–14 contain expected result metadata that belongs in annotations.yaml. See the fix in case-013/annotations.yaml review above.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@eval/cases/case-013/input.yaml` around lines 6 - 14, The fields expected_rejection, expected_constraint, expected_documentation, and correct_approach in the input.yaml file for case-013 should be moved to the annotations.yaml file instead. Remove these four fields from input.yaml and add them to the corresponding annotations.yaml file where they belong as expected result metadata.
eval/cases/case-012/input.yaml (1)
7-14: 🗄️ Data Integrity & Integration | 🔴 Critical | 🏗️ Heavy lift
These expected result fields must move to annotations.yaml (duplicate of case-012/annotations.yaml review).
Lines 7–14 contain expected result metadata that belongs in annotations.yaml per the schema contract. See the detailed fix in the case-012/annotations.yaml review above. Once moved, this file should contain only the `prompt` field.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@eval/cases/case-012/input.yaml` around lines 7 - 14, The fields expected_rejection, expected_constraint, expected_documentation, and correct_approach in input.yaml (lines 7-14) violate the schema contract and should only exist in annotations.yaml. Remove all these expected result metadata fields from input.yaml, leaving only the prompt field in this file. Then ensure these same fields are properly defined in the corresponding annotations.yaml file for case-012.
🧹 Nitpick comments (5)
AGENTS.md (1)
9-25: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value
Specify language for fenced code block.
The directory listing code block (line 9) is missing a language specifier. Use `bash`, `text`, or `sh` as appropriate for markdownlint compliance (MD040).
✏️ Proposed fix
-```
+```bash
 cmd/ # 7 binaries (operator, controller, daemon, server, machine-os-builder, osimagestream, apiserver-watcher)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@AGENTS.md` around lines 9 - 25, The fenced code block containing the directory listing in AGENTS.md is missing a language specifier, which violates markdownlint rule MD040. Add a language identifier (bash, text, or sh) immediately after the opening triple backticks of the code block that lists the directories like cmd/, pkg/controller/, pkg/daemon/, etc. This ensures the markdown is compliant and properly formatted.
Source: Linters/SAST tools
MCO_DEVELOPMENT.md (1)
14-21: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value
Specify language for fenced code block.
The bash code block (line 14) is missing a language specifier. Per markdownlint (MD040), specify `bash` explicitly.
✏️ Proposed fix
-```
+```bash
 make binaries # All 8 binaries
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@MCO_DEVELOPMENT.md` around lines 14 - 21, The fenced code block containing the make commands (starting with make binaries) is missing a language specifier on the opening triple backticks. Add bash as the language identifier by changing the opening ``` to ```bash to comply with the markdownlint MD040 rule.
Source: Linters/SAST tools
MCO_TESTING.md (2)
98-103: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value
Specify language for fenced code block.
The Go import code block (line 98) is missing a language specifier. Per markdownlint (MD040), specify `go` explicitly.
✏️ Proposed fix
-```
+```go
 import (
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@MCO_TESTING.md` around lines 98 - 103, The fenced code block containing the Go import statements for helpers and framework packages is missing the language specifier. Add `go` to the opening fence (changing from ``` to ```go) before the import statement to explicitly declare this as a Go code block and comply with markdown linting standards.
Source: Linters/SAST tools
103-105: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value
Reduce repetition of "Use" at the start of consecutive sentences.
Three successive sentences in the E2E test-writing guidance begin with "Use", which reduces readability. Consider restructuring for variety.
✏️ Proposed fix
-Use builders from `test/helpers/` for creating test objects. Use `test/e2e-shared-tests/` for test logic shared across suites.
+Use builders from `test/helpers/` for creating test objects and `test/e2e-shared-tests/` for test logic shared across suites.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@MCO_TESTING.md` around lines 103 - 105, The E2E test-writing guidance section contains three consecutive sentences that all begin with "Use", which reduces readability. Restructure at least one or two of these sentences to start with different words while preserving the same meaning and guidance. For example, consider using alternative phrasings like "Utilize builders from...", "Create test objects using builders from...", "Leverage builders from...", or restructuring with different sentence structures (e.g., "Test objects should be created using builders from...") to introduce grammatical variety while maintaining clear and instructive tone.
Source: Linters/SAST tools
eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-001/input.yaml (1)
1-10: 🗄️ Data Integrity & Integration | 🔵 Trivial | ⚡ Quick win
Regenerate run snapshots after fixing test case structure.
This file contains a snapshot of case-001 as executed with the buggy input.yaml structure (expected_files/expected_mentions in input.yaml). Once the test cases are corrected to move expected_* fields to annotations.yaml, the baseline snapshots in eval/runs/ should be regenerated by re-running `/eval-run --model claude-sonnet-4-6 --run-id 2026-06-22-sonnet-v2` (as recommended in analysis.md line 15).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-001/input.yaml` around lines 1 - 10, The input.yaml file has an incorrect structure with expected_files and expected_mentions fields that should not be in this file. Remove the expected_files and expected_mentions fields from input.yaml, keeping only the prompt field. Move the expected_files and expected_mentions fields to annotations.yaml where they belong according to the corrected test case structure. After making these structural corrections, regenerate the baseline snapshots by running the eval-run command with the specified model and run-id as documented in analysis.md.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@eval.yaml`:
- Around line 43-53: The schema definition in the dataset section of eval.yaml
incorrectly documents the structure of annotations.yaml files, listing fields
like expected_files, expected_mentions, expected_rejection, and
expected_guidance that do not actually exist in the test case files. Update the
schema documentation to accurately reflect the actual fields present in
annotations.yaml, which include category (present in all cases) along with
case-specific metadata fields such as difficulty, complexity, topic, severity,
constraint_type, use_case, focus, api_type, content_type, and
constraints_tested. Remove all references to the non-existent expected_* fields.
In `@eval/cases/case-002/input.yaml`:
- Around line 1-10: The input.yaml file currently contains evaluation metadata
fields (expected_files, expected_mentions, and their values) that violate the
eval.yaml dataset schema. According to the schema definition, input.yaml should
contain only the prompt field. Move the expected_files and expected_mentions
fields along with their values from input.yaml to annotations.yaml, ensuring
input.yaml retains only the prompt field. This separation is critical because
the mentions_expected_keywords judge looks for
annotations.get('expected_mentions') to validate expected keywords, and the
current placement in the wrong file causes the validation to be skipped.
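To illustrate why the placement matters, here is a minimal Python sketch of a keyword judge. Only the `annotations.get('expected_mentions')` lookup comes from the comment above; the function name `mentions_expected_keywords_judge`, the result dictionary shape, and the case-insensitive substring match are assumptions for illustration, and the real judge template in eval.yaml may differ.

```python
from typing import Any


def mentions_expected_keywords_judge(response: str, annotations: Any) -> dict:
    """Hypothetical judge: check that a response mentions every expected keyword."""
    # A malformed annotations payload (for example a plain string) has no .get();
    # report it explicitly instead of raising AttributeError.
    if not isinstance(annotations, dict):
        return {"status": "error", "reason": "annotations is not a mapping"}

    expected = annotations.get("expected_mentions")
    if not expected:
        # Nothing to validate: this is the silent skip described in the comment.
        return {"status": "skipped", "reason": "no expected_mentions in annotations"}

    text = response.lower()
    missing = [kw for kw in expected if kw.lower() not in text]
    return {"status": "pass" if not missing else "fail", "missing": missing}
```

With the expectations left in input.yaml, the annotations mapping carries no `expected_mentions` key, so a judge written this way would report `skipped` rather than `pass` or `fail`.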
In `@eval/cases/case-003/input.yaml`:
- Around line 4-12: The expected_files and expected_mentions fields in
input.yaml are in the wrong location. According to the eval.yaml schema, these
fields must be in annotations.yaml instead because the downstream judges
(mentions_expected_keywords and response_accuracy) access them via
annotations.get() calls. Remove the expected_files and expected_mentions
sections from input.yaml, and move them to annotations.yaml alongside other
metadata like category and difficulty. This ensures the judges can properly
access these expectations at runtime through the annotations dictionary.
In `@eval/cases/case-004/annotations.yaml`:
- Around line 1-3: Remove the topic field from annotations.yaml as it is not
defined in the eval.yaml schema. Add the missing expected_files and
expected_mentions fields to annotations.yaml by moving them from the paired
input.yaml file. This ensures all expectation metadata is consolidated in
annotations.yaml according to the eval.yaml schema (lines 43-53) so judges can
access it via annotations.get(), and removes undefined custom metadata.
In `@eval/cases/case-007/annotations.yaml`:
- Around line 1-4: The annotations.yaml file contains fields (api_type,
use_case, difficulty) that do not conform to the eval.yaml contract schema.
Replace these incorrect fields with the required fields specified in
eval.yaml:48-55: keep category, add expected_files, add expected_mentions, and
conditionally add expected_rejection and expected_guidance only if this is an
anti-pattern case. Remove all fields that are not part of the contract
(api_type, use_case, difficulty) to ensure the schema matches and prevents the
judge failures caused by attempting to call get() on string objects.
In `@eval/cases/case-007/input.yaml`:
- Around line 15-17: In the input.yaml file at the fixture level, rename the
field `expected_documentation` to `expected_files` to align with the field name
defined in the schema in eval.yaml. This ensures consistency between the fixture
and the schema definition so that judges can properly access the expected
documentation references using the correct field name.
In `@eval/cases/case-008/annotations.yaml`:
- Around line 1-3: The annotations.yaml file in case-008 contains incorrect
fields that do not conform to the eval.yaml schema contract. Remove the `focus`
and `complexity` fields from the annotations.yaml file. Replace them with the
required fields according to the eval.yaml contract (lines 48-55): add
`expected_files`, `expected_mentions`, and if this is an anti-pattern case, also
add `expected_rejection` and `expected_guidance`. Keep the `category` field as
it is already correct. Refer to the analysis.md documentation (lines 40-58) for
context on this issue which is the same root cause as case-007.
In `@eval/cases/case-008/input.yaml`:
- Around line 5-26: The input.yaml file references schema fields
`expected_components` and `expected_interactions` that are not defined in the
eval.yaml schema (lines 48-55). Either add these field definitions to the schema
in eval.yaml with their validation rules, or remove these fields from the
input.yaml if they are not required. Additionally, rename the
`expected_documentation` field to `expected_files` in the input.yaml to match
the field name defined in the eval.yaml schema.
In `@eval/cases/case-009/annotations.yaml`:
- Around line 1-3: The annotations.yaml file in eval/cases/case-009 is missing
required fields that are defined in the eval.yaml contract schema. Currently it
only contains category, focus, and complexity fields, but must also include the
required fields expected_files and expected_mentions (and any other required
fields documented in the eval.yaml contract). Add all missing required fields to
the annotations.yaml file to match the proper schema definition, following the
same structure used in case-007 and case-008 as reference examples.
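A check like the following could surface missing annotation fields before a run. It is a sketch only: the field lists are taken from the review comments above (not from eval.yaml itself), and `missing_annotation_fields` and its `anti_pattern` flag are hypothetical names.

```python
# Field names as described in the review comments; treat them as assumptions
# until checked against the dataset schema in eval.yaml.
REQUIRED_FIELDS = ("category", "expected_files", "expected_mentions")
ANTI_PATTERN_FIELDS = ("expected_rejection", "expected_guidance")


def missing_annotation_fields(annotations: dict, anti_pattern: bool = False) -> list:
    """Return the required field names that are absent from an annotations mapping."""
    required = REQUIRED_FIELDS + (ANTI_PATTERN_FIELDS if anti_pattern else ())
    return [name for name in required if name not in annotations]


# Example with the three keys the comment says case-009 currently has
# (the values here are placeholders, not the real file contents).
case_009 = {"category": "example", "focus": "example", "complexity": "example"}
print(missing_annotation_fields(case_009))
```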
In `@eval/cases/case-012/annotations.yaml`:
- Around line 1-4: The annotations.yaml file is missing required expected
metadata fields that judges need to access. Move the fields from input.yaml to
annotations.yaml while renaming them according to the schema contract: move
expected_rejection as-is, rename expected_constraint to rule, rename
expected_documentation to expected_files, and rename correct_approach to
expected_guidance. After adding these fields to annotations.yaml with their
corrected names, remove all these fields from input.yaml, leaving only the
prompt field there.
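The move-and-rename described for case-012 can be sketched on plain dictionaries. The rename table follows the comment above; `migrate_case` is a hypothetical helper, and loading and saving the YAML files is left out so the example stays self-contained.

```python
# Old key in input.yaml -> new key in annotations.yaml, per the comment above.
RENAMES = {
    "expected_rejection": "expected_rejection",
    "expected_constraint": "rule",
    "expected_documentation": "expected_files",
    "correct_approach": "expected_guidance",
}


def migrate_case(input_data: dict, annotations: dict) -> tuple:
    """Return (new_input, new_annotations) without mutating the arguments."""
    new_annotations = dict(annotations)
    for old_key, new_key in RENAMES.items():
        if old_key in input_data:
            new_annotations[new_key] = input_data[old_key]
    # Drop only the migrated keys; for case-012 this leaves just the prompt.
    new_input = {k: v for k, v in input_data.items() if k not in RENAMES}
    return new_input, new_annotations
```

Returning new dictionaries instead of editing in place makes it easy to diff the result against the original files before writing anything back.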
In `@eval/cases/case-014/input.yaml`:
- Around line 8-16: The case-014 evaluation is storing judge-required metadata
in input.yaml with field names expected_constraint and correct_approach, but the
judge template reads from annotations.yaml using different keys (rule and
expected_guidance respectively). Remove the expected_constraint and
correct_approach fields from the input.yaml file for case-014, then add them to
case-014/annotations.yaml with the correct field names: rename
expected_constraint to rule and correct_approach to expected_guidance. This
alignment will allow the judge to properly read the constraint details and
guidance without encountering the "str object has no attribute get" error.
In `@eval/cases/case-015/annotations.yaml`:
- Around line 1-7: Add the required metadata fields to the annotations.yaml file
in case-015 directory. The authoring_quality judge template expects
expected_patterns and expected_documentation fields to be present in
annotations, but they are currently missing. Migrate the expected_patterns list
(currently in input.yaml lines 14-21) and the expected_documentation list
(currently in input.yaml lines 23-26) from input.yaml to annotations.yaml so
that the judge template can access them via annotations.get() calls.
In `@eval/cases/case-016/annotations.yaml`:
- Around line 1-7: The annotations.yaml file is missing the expected_patterns
and expected_documentation fields that the authoring_quality judge template
requires. Add the expected_patterns list from input.yaml lines 17-25 and the
expected_documentation list from input.yaml lines 27-30 as top-level fields in
annotations.yaml to ensure the judge can properly access these required metadata
fields.
In `@eval/cases/case-016/input.yaml`:
- Around line 7-30: The test metadata fields (expected_structure,
expected_patterns, and expected_documentation) currently in input.yaml at lines
7-30 should be relocated to annotations.yaml per the judge contract
requirements. Remove these three expected_* fields from input.yaml and keep only
the prompt field there. Then add the expected_structure, expected_patterns, and
expected_documentation fields to the case-016/annotations.yaml file with the
same content that was removed from input.yaml.
In `@eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-011/input.yaml`:
- Around line 5-11: The test case file case-011/input.yaml uses the field name
correct_approach when it should use expected_guidance to match the anti-pattern
schema definition in eval.yaml (lines 48-55). Replace the field name
correct_approach with expected_guidance in the YAML file to ensure consistency
with the schema specification and allow judges to correctly map this field
during evaluation.
In `@eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-012/input.yaml`:
- Around line 1-14: The input.yaml file uses the field name `correct_approach`
but the anti-pattern schema specifies that this field should be named
`expected_guidance`. Rename the `correct_approach` field to `expected_guidance`
in this test case to align with the schema specification. Ensure this change is
applied consistently across all anti-pattern test cases to maintain schema
compliance.
---
Outside diff comments:
In `@eval.md`:
- Around line 142-145: The budget_check judge is currently using the aggregate
total run cost for all case evaluations instead of the individual per-case cost.
Locate the budget_check judge logic and modify it to retrieve the actual cost
for each specific case rather than using the total run cost. Update the
comparison to check if the per-case cost exceeds the per-case budget threshold
instead of comparing the aggregate cost, ensuring that each case is evaluated
against its own individual cost which should be significantly lower than the
total run cost.
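As a rough illustration of the intended comparison, the sketch below evaluates each case against the per-case budget. The `budget_check` signature, the case IDs, and the result shape are assumptions; the dollar figures are the ones quoted in the review ($0.31 and $1.20 per case, $11.14 aggregate, $2.00 budget).

```python
def budget_check(case_costs: dict, per_case_budget: float) -> dict:
    """Hypothetical per-case budget judge: compare each case's own cost to the budget."""
    results = {}
    for case_id, cost in case_costs.items():
        passed = cost <= per_case_budget
        verdict = "within" if passed else "exceeds"
        results[case_id] = {
            "pass": passed,
            "rationale": f"Cost ${cost:.2f} {verdict} budget ${per_case_budget:.2f}",
        }
    return results


# Per-case costs at the two ends of the range quoted in the review.
per_case = budget_check({"case-a": 0.31, "case-b": 1.20}, per_case_budget=2.00)

# The reported bug is equivalent to feeding the aggregate cost in for every case.
aggregate = budget_check({"case-a": 11.14, "case-b": 11.14}, per_case_budget=2.00)
```

In the first call both cases pass; in the second both fail with the same "Cost $11.14 exceeds budget $2.00" rationale the review describes.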
In `@eval/cases/case-003/annotations.yaml`:
- Around line 1-4: The annotations.yaml file is missing the required metadata
fields for judge evaluation. Add two new fields to the YAML file: first add
`expected_files` as a list containing the names of documentation files that the
agent should consult when answering the question, and second add
`expected_mentions` as a list containing the keywords or concepts that should
appear in the agent's response. These fields are mandatory per the eval.yaml
dataset schema and are required for the consulted_docs and
mentions_expected_keywords judges to function properly.
In `@eval/cases/case-015/input.yaml`:
- Around line 6-27: The test metadata fields expected_structure,
expected_patterns, and expected_documentation in input.yaml should be relocated
to annotations.yaml according to the judge contract. Remove these three fields
from input.yaml while keeping only the prompt field, then add the
expected_structure, expected_patterns, and expected_documentation fields to the
corresponding annotations.yaml file in the case-015 directory.
In `@eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-005/input.yaml`:
- Around line 1-15: The test case in input.yaml uses field names that do not
match the expectations defined in the judges within eval.yaml. Rename the field
expected_documentation to expected_files to align with judge expectations.
Additionally, add an expected_mentions field if there are specific API or
documentation mentions that should be validated by the judges. This will ensure
the schema matches the judge validation logic and prevent the validation skip
errors.
In `@eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-006/input.yaml`:
- Around line 1-17: The test case input.yaml uses incorrect field names that do
not match the judge evaluation schema defined in eval.yaml. Rename the field
expected_documentation to expected_files to align with judge expectations.
Additionally, add the expected_mentions field if there are specific mentions or
terms that the judges should validate against. Ensure all field names in this
test case match the schema requirements documented in the judges section of
eval.yaml.
In `@eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-015/input.yaml`:
- Around line 6-27: The expected result metadata fields (expected_structure,
expected_patterns, and expected_documentation) currently in input.yaml should be
migrated to annotations.yaml so that judge logic can query them via
outputs["annotations"]. Create or update the annotations.yaml file to include
these fields, renaming expected_documentation to expected_files to match the
schema convention. Then verify that the judge conditions and checks in eval.yaml
(similar to the pattern shown in lines 162–194) have corresponding if conditions
that handle these three fields so judges can properly validate against the
expected results.
---
Duplicate comments:
In `@eval/cases/case-012/input.yaml`:
- Around line 7-14: The fields expected_rejection, expected_constraint,
expected_documentation, and correct_approach in input.yaml (lines 7-14) violate
the schema contract and should only exist in annotations.yaml. Remove all these
expected result metadata fields from input.yaml, leaving only the prompt field
in this file. Then ensure these same fields are properly defined in the
corresponding annotations.yaml file for case-012.
In `@eval/cases/case-013/input.yaml`:
- Around line 6-14: The fields expected_rejection, expected_constraint,
expected_documentation, and correct_approach in the input.yaml file for case-013
should be moved to the annotations.yaml file instead. Remove these four fields
from input.yaml and add them to the corresponding annotations.yaml file where
they belong as expected result metadata.
---
Nitpick comments:
In `@AGENTS.md`:
- Around line 9-25: The fenced code block containing the directory listing in
AGENTS.md is missing a language specifier, which violates markdownlint rule
MD040. Add a language identifier (bash, text, or sh) immediately after the
opening triple backticks of the code block that lists the directories like cmd/,
pkg/controller/, pkg/daemon/, etc. This ensures the markdown is compliant and
properly formatted.
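For spotting other fences without a language before the linter does, a simplified scan might look like this. It is not markdownlint's implementation: it only handles backtick fences of exactly three characters, treats every fence line as toggling the block, and `fences_missing_language` is a hypothetical helper name.

```python
FENCE = "`" * 3  # built at runtime so this example contains no literal fence


def fences_missing_language(markdown: str) -> list:
    """Return 1-based line numbers of opening fences that have no info string."""
    missing = []
    in_block = False
    for number, line in enumerate(markdown.splitlines(), start=1):
        stripped = line.strip()
        if not stripped.startswith(FENCE):
            continue
        if in_block:
            in_block = False  # treat any fence line inside a block as the close
        else:
            in_block = True  # opening fence
            if stripped == FENCE:
                missing.append(number)
    return missing


sample = "\n".join(["Intro", FENCE, "cmd/", FENCE, FENCE + "bash", "make binaries", FENCE])
print(fences_missing_language(sample))
```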
In `@eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-001/input.yaml`:
- Around line 1-10: The input.yaml file has an incorrect structure with
expected_files and expected_mentions fields that should not be in this file.
Remove the expected_files and expected_mentions fields from input.yaml, keeping
only the prompt field. Move the expected_files and expected_mentions fields to
annotations.yaml where they belong according to the corrected test case
structure. After making these structural corrections, regenerate the baseline
snapshots by running the eval-run command with the specified model and run-id as
documented in analysis.md.
In `@MCO_DEVELOPMENT.md`:
- Around line 14-21: The fenced code block containing the make commands
(starting with make binaries) is missing a language specifier on the opening
triple backticks. Add bash as the language identifier by changing the opening
``` to ```bash to comply with the markdownlint MD040 rule.
In `@MCO_TESTING.md`:
- Around line 98-103: The fenced code block containing the Go import statements
for helpers and framework packages is missing the language specifier. Add `go`
to the opening fence (changing from ``` to ```go) before the import statement to
explicitly declare this as a Go code block and comply with markdown linting
standards.
- Around line 103-105: The E2E test-writing guidance section contains three
consecutive sentences that all begin with "Use", which reduces readability.
Restructure at least one or two of these sentences to start with different words
while preserving the same meaning and guidance. For example, consider using
alternative phrasings like "Utilize builders from...", "Create test objects
using builders from...", "Leverage builders from...", or restructuring with
different sentence structures (e.g., "Test objects should be created using
builders from...") to introduce grammatical variety while maintaining clear and
instructive tone.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: c8442213-bc9a-4f06-945b-1349b7e1b139
⛔ Files ignored due to path filters (50)
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-001/output/stdout.log is excluded by !**/*.log
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-001/stdout.log is excluded by !**/*.log
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-002/output/stdout.log is excluded by !**/*.log
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-002/stdout.log is excluded by !**/*.log
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-003/output/stderr.log is excluded by !**/*.log
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-003/output/stdout.log is excluded by !**/*.log
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-003/stderr.log is excluded by !**/*.log
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-003/stdout.log is excluded by !**/*.log
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-004/output/stderr.log is excluded by !**/*.log
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-004/output/stdout.log is excluded by !**/*.log
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-004/stderr.log is excluded by !**/*.log
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-004/stdout.log is excluded by !**/*.log
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-005/output/stdout.log is excluded by !**/*.log
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-005/stdout.log is excluded by !**/*.log
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-006/output/stderr.log is excluded by !**/*.log
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-006/output/stdout.log is excluded by !**/*.log
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-006/stderr.log is excluded by !**/*.log
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-006/stdout.log is excluded by !**/*.log
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-007/output/stderr.log is excluded by !**/*.log
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-007/output/stdout.log is excluded by !**/*.log
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-007/stderr.log is excluded by !**/*.log
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-007/stdout.log is excluded by !**/*.log
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-008/output/stdout.log is excluded by !**/*.log
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-008/stdout.log is excluded by !**/*.log
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-009/output/stdout.log is excluded by !**/*.log
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-009/stdout.log is excluded by !**/*.log
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-010/output/stdout.log is excluded by !**/*.log
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-010/stdout.log is excluded by !**/*.log
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-011/output/stdout.log is excluded by !**/*.log
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-011/stdout.log is excluded by !**/*.log
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-012/output/stderr.log is excluded by !**/*.log
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-012/output/stdout.log is excluded by !**/*.log
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-012/stderr.log is excluded by !**/*.log
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-012/stdout.log is excluded by !**/*.log
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-013/output/stderr.log is excluded by !**/*.log
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-013/output/stdout.log is excluded by !**/*.log
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-013/stderr.log is excluded by !**/*.log
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-013/stdout.log is excluded by !**/*.log
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-014/output/stderr.log is excluded by !**/*.log
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-014/output/stdout.log is excluded by !**/*.log
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-014/stderr.log is excluded by !**/*.log
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-014/stdout.log is excluded by !**/*.log
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-015/output/stderr.log is excluded by !**/*.log
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-015/output/stdout.log is excluded by !**/*.log
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-015/stderr.log is excluded by !**/*.log
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-015/stdout.log is excluded by !**/*.log
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-016/output/stderr.log is excluded by !**/*.log
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-016/output/stdout.log is excluded by !**/*.log
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-016/stderr.log is excluded by !**/*.log
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-016/stdout.log is excluded by !**/*.log
📒 Files selected for processing (114)
- AGENTS.md
- MCO_DEVELOPMENT.md
- MCO_TESTING.md
- ai-docs/architecture/components.md
- ai-docs/decisions/adr-001-on-cluster-layering.md
- ai-docs/decisions/adr-002-config-rendering.md
- ai-docs/decisions/adr-003-update-lifecycle.md
- ai-docs/domain/crds.md
- ai-docs/exec-plans/README.md
- ai-docs/references/ecosystem.md
- eval.md
- eval.yaml
- eval/cases/case-001/annotations.yaml
- eval/cases/case-001/input.yaml
- eval/cases/case-002/annotations.yaml
- eval/cases/case-002/input.yaml
- eval/cases/case-003/annotations.yaml
- eval/cases/case-003/input.yaml
- eval/cases/case-004/annotations.yaml
- eval/cases/case-004/input.yaml
- eval/cases/case-005/annotations.yaml
- eval/cases/case-005/input.yaml
- eval/cases/case-006/annotations.yaml
- eval/cases/case-006/input.yaml
- eval/cases/case-007/annotations.yaml
- eval/cases/case-007/input.yaml
- eval/cases/case-008/annotations.yaml
- eval/cases/case-008/input.yaml
- eval/cases/case-009/annotations.yaml
- eval/cases/case-009/input.yaml
- eval/cases/case-010/annotations.yaml
- eval/cases/case-010/input.yaml
- eval/cases/case-011/annotations.yaml
- eval/cases/case-011/input.yaml
- eval/cases/case-012/annotations.yaml
- eval/cases/case-012/input.yaml
- eval/cases/case-013/annotations.yaml
- eval/cases/case-013/input.yaml
- eval/cases/case-014/annotations.yaml
- eval/cases/case-014/input.yaml
- eval/cases/case-015/annotations.yaml
- eval/cases/case-015/input.yaml
- eval/cases/case-016/annotations.yaml
- eval/cases/case-016/input.yaml
- eval/runs/mco-docs-eval/2026-06-22-sonnet/analysis.md
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-001/events.json
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-001/input.yaml
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-001/run_result.json
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-001/subagents/agent-a207441d16e958075.jsonl
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-002/events.json
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-002/input.yaml
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-002/run_result.json
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-002/subagents/agent-a845438d18a82924a.jsonl
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-003/events.json
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-003/input.yaml
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-003/run_result.json
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-003/subagents/agent-a213d36a449d15fd0.jsonl
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-004/events.json
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-004/input.yaml
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-004/run_result.json
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-004/subagents/agent-a2b52b5edbe308998.jsonl
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-005/events.json
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-005/input.yaml
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-005/run_result.json
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-005/subagents/agent-a2cc8654419f1bf46.jsonl
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-006/events.json
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-006/input.yaml
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-006/run_result.json
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-006/subagents/agent-a1070550a207b5ed7.jsonl
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-007/events.json
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-007/input.yaml
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-007/run_result.json
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-007/subagents/agent-a472ace94b5ff04dd.jsonl
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-008/events.json
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-008/input.yaml
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-008/run_result.json
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-008/subagents/agent-aafd2a1079434721b.jsonl
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-009/events.json
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-009/input.yaml
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-009/run_result.json
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-009/subagents/agent-a4c85ad13de8d82c4.jsonl
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-010/events.json
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-010/input.yaml
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-010/run_result.json
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-010/subagents/agent-ad8f424734eaf9c41.jsonl
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-011/events.json
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-011/input.yaml
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-011/run_result.json
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-011/subagents/agent-a12d71965a7d591c1.jsonl
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-012/events.json
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-012/input.yaml
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-012/run_result.json
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-012/subagents/agent-a75d9bfdacfcec099.jsonl
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-013/events.json
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-013/input.yaml
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-013/run_result.json
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-013/subagents/agent-a5cba4a548698d94a.jsonl
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-014/events.json
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-014/input.yaml
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-014/run_result.json
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-014/subagents/agent-af337390aa687194c.jsonl
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-015/events.json
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-015/input.yaml
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-015/run_result.json
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-015/subagents/agent-a72410ff514c682f1.jsonl
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-016/events.json
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-016/input.yaml
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-016/run_result.json
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-016/subagents/agent-a73ecb25f4b643455.jsonl
- eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-016/subagents/agent-ac035d4a244162191.jsonl
- eval/runs/mco-docs-eval/2026-06-22-sonnet/collection.json
- eval/runs/mco-docs-eval/2026-06-22-sonnet/report.html
- eval/runs/mco-docs-eval/2026-06-22-sonnet/run_result.json
- eval/runs/mco-docs-eval/2026-06-22-sonnet/summary.yaml
| dataset: | ||
| path: eval/cases | ||
| schema: | | ||
| Each case directory contains: | ||
| - input.yaml: YAML file with 'prompt' (the question or task for the agent) | ||
| - annotations.yaml: Expected results metadata with fields: | ||
| - category: test category (navigation, component-usage, architecture, anti-pattern, authoring) | ||
| - expected_files: list of documentation files the agent should consult | ||
| - expected_mentions: keywords that should appear in the response | ||
| - expected_rejection: (anti-pattern only) boolean, true if agent should reject the approach | ||
| - expected_guidance: (anti-pattern only) keywords for the correct approach |
🗄️ Data Integrity & Integration | 🟠 Major
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Description: Verify test case annotations.yaml files match the declared schema.
# Check a sample of case annotations for structure and types
for case_dir in eval/cases/case-00{1,5,9,13}/; do
if [ -d "$case_dir" ]; then
echo "=== $case_dir/annotations.yaml ==="
head -20 "$case_dir/annotations.yaml"
fi
done
# Verify required keys exist in all annotations
echo -e "\n=== Checking required fields in all cases ==="
for annotations in eval/cases/*/annotations.yaml; do
if [ -f "$annotations" ]; then
echo "Checking: $annotations"
grep -E "^(category|expected_|rule|correct_approach)" "$annotations" | head -5
fi
done
Repository: openshift/machine-config-operator
Length of output: 1694
🏁 Script executed:
#!/bin/bash
# Comprehensive check for all fields in annotations.yaml files
echo "=== Complete content of a few annotations.yaml files ==="
for case in eval/cases/case-001 eval/cases/case-011 eval/cases/case-015; do
if [ -f "$case/annotations.yaml" ]; then
echo -e "\n--- $case/annotations.yaml ---"
cat "$case/annotations.yaml"
fi
done
echo -e "\n=== Search for 'expected_' fields across ALL annotations ==="
grep -r "expected_" eval/cases/*/annotations.yaml 2>/dev/null | head -20
echo -e "\n=== All unique top-level keys in annotations files ==="
find eval/cases -name "annotations.yaml" -exec sh -c 'echo "=== $1 ==="; grep "^[a-z_]*:" "$1" | cut -d: -f1 | sort -u' _ {} \; | grep -v "^===" | sort | uniq -c | sort -rn
Repository: openshift/machine-config-operator
Length of output: 1035
Schema documentation in eval.yaml does not match actual test case files.
The declared schema (lines 45-53) specifies annotations.yaml should contain expected_files, expected_mentions, expected_rejection, and expected_guidance fields. However, none of these fields exist in any of the test case annotations.
Actual annotation files contain entirely different fields: category (present in all cases), plus case-specific metadata like difficulty, complexity, topic, severity, constraint_type, use_case, focus, api_type, content_type, and constraints_tested.
Update the schema definition in eval.yaml to accurately reflect the actual structure of annotations.yaml files.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@eval.yaml` around lines 43 - 53, The schema definition in the dataset section
of eval.yaml incorrectly documents the structure of annotations.yaml files,
listing fields like expected_files, expected_mentions, expected_rejection, and
expected_guidance that do not actually exist in the test case files. Update the
schema documentation to accurately reflect the actual fields present in
annotations.yaml, which include category (present in all cases) along with
case-specific metadata fields such as difficulty, complexity, topic, severity,
constraint_type, use_case, focus, api_type, content_type, and
constraints_tested. Remove all references to the non-existent expected_* fields.
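The mismatch described above can be checked mechanically. The following is a minimal sketch, not the repository's tooling: it assumes each annotations.yaml has already been loaded into a plain dict, and `schema_drift` is a hypothetical helper name. The declared field list is copied from the schema text quoted in this review.

```python
# Fields the eval.yaml schema documents for annotations.yaml
# (copied from the schema text quoted in this review).
DECLARED_FIELDS = {
    "category",
    "expected_files",
    "expected_mentions",
    "expected_rejection",
    "expected_guidance",
}


def schema_drift(annotations: dict) -> tuple[set, set]:
    """Return (undeclared, missing) field names for one annotations mapping.

    Hypothetical helper: 'annotations' is the already-parsed content of an
    annotations.yaml file.
    """
    present = set(annotations)
    undeclared = present - DECLARED_FIELDS
    # expected_rejection and expected_guidance are documented as
    # anti-pattern only, so they are required only for that category.
    required = {"category", "expected_files", "expected_mentions"}
    if annotations.get("category") == "anti-pattern":
        required |= {"expected_rejection", "expected_guidance"}
    missing = required - present
    return undeclared, missing


# Annotations shaped like the case-004 file shown in this review.
example = {
    "category": "navigation",
    "difficulty": "medium",
    "topic": "NodeDisruptionPolicy",
}
undeclared, missing = schema_drift(example)
print("undeclared:", sorted(undeclared))
print("missing:", sorted(missing))
```

Running a check like this over every case directory would show whether the schema text or the fixtures should change; the review leaves that decision to the author.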
| prompt: How does MachineConfig merging work when multiple MachineConfigs target the | ||
| same pool? What determines the order? | ||
| expected_files: | ||
| - ai-docs/decisions/adr-002-config-rendering.md | ||
| - ai-docs/domain/crds.md | ||
| expected_mentions: | ||
| - deterministic | ||
| - sorted | ||
| - name | ||
| - last-writer-wins |
🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win
Critical: expected_files and expected_mentions placed in wrong file; breaks evaluation contract.
Per eval.yaml dataset schema (lines 43–53), input.yaml should contain only the prompt field. Evaluation metadata (expected_files, expected_mentions, category, etc.) must go in annotations.yaml. Placing these fields in input.yaml breaks the downstream judge contract; the mentions_expected_keywords judge looks for annotations.get('expected_mentions') and will skip validation when these fields are absent from annotations.
Analysis.md explicitly documents this as bug #2: "expected fields in wrong file," and the first run confirms case-002 "Passed but vacuously (no expected_files in annotations)" despite supplying them in input.yaml.
Move lines 3–10 to eval/cases/case-002/annotations.yaml, keeping only the prompt in input.yaml.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@eval/cases/case-002/input.yaml` around lines 1 - 10, The input.yaml file
currently contains evaluation metadata fields (expected_files,
expected_mentions, and their values) that violate the eval.yaml dataset schema.
According to the schema definition, input.yaml should contain only the prompt
field. Move the expected_files and expected_mentions fields along with their
values from input.yaml to annotations.yaml, ensuring input.yaml retains only the
prompt field. This separation is critical because the mentions_expected_keywords
judge looks for annotations.get('expected_mentions') to validate expected
keywords, and the current placement in the wrong file causes the validation to
be skipped.
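The vacuous pass described above is easy to reproduce in isolation. This is a sketch under assumptions: the function below is a hypothetical stand-in for the `mentions_expected_keywords` judge, not its actual implementation, and only the `annotations.get('expected_mentions')` lookup is taken from the review text.

```python
def mentions_expected_keywords(response: str, annotations: dict) -> tuple[bool, str]:
    """Hypothetical stand-in for the judge; returns (passed, rationale)."""
    expected = annotations.get("expected_mentions")
    if not expected:
        # Nothing to check, so the judge cannot fail: the pass carries no signal.
        return True, "skipped: no expected_mentions in annotations"
    lowered = response.lower()
    missing = [kw for kw in expected if kw.lower() not in lowered]
    if missing:
        return False, f"missing keywords: {missing}"
    return True, "all expected keywords present"


response = "Configs are merged in some order."

# Expectations left in input.yaml never reach the judge, so it passes vacuously.
vacuous = mentions_expected_keywords(response, {"category": "navigation"})

# With the same expectations in annotations, the weak response is caught.
strict = mentions_expected_keywords(
    response, {"expected_mentions": ["deterministic", "sorted"]}
)
print(vacuous)
print(strict)
```

A judge that reports "skipped" separately from "passed" would make this class of misconfiguration visible in the run summary.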
| expected_files: | ||
| - ai-docs/domain/crds.md | ||
| - ai-docs/decisions/adr-001-on-cluster-layering.md | ||
| - docs/OnClusterLayering.md | ||
| expected_mentions: | ||
| - MachineOSConfig | ||
| - MachineOSBuild | ||
| - layering | ||
| - Build Controller |
🗄️ Data Integrity & Integration | 🟠 Major | ⚖️ Poor tradeoff
Move expected_files and expected_mentions to annotations.yaml.
Per the eval.yaml dataset schema (lines 43-53), expected_files and expected_mentions must be in annotations.yaml, not input.yaml. The downstream judges (mentions_expected_keywords and response_accuracy) access these via annotations.get('expected_mentions') and annotations.get('expected_files', []) respectively, so placing them in input.yaml makes them inaccessible at runtime.
This is the exact configuration bug reported in analysis.md (config bug #2): "expected_mentions, expected_files, category, rule, expected_guidance, expected_patterns must be in annotations.yaml for judges to access them via outputs['annotations']".
🔧 Proposed fix: Move expectations to annotations.yaml
Create/update eval/cases/case-003/annotations.yaml:
category: navigation
difficulty: medium
topic: OnClusterLayering
expected_files:
- ai-docs/domain/crds.md
- ai-docs/decisions/adr-001-on-cluster-layering.md
- docs/OnClusterLayering.md
expected_mentions:
- MachineOSConfig
- MachineOSBuild
- layering
- Build Controller
Then update eval/cases/case-003/input.yaml to contain only the prompt:
-prompt: I want to set up on-cluster layering to build custom OS images. Where can
- I find information about MachineOSConfig and MachineOSBuild, and how does the build
- process work?
-expected_files:
-- ai-docs/domain/crds.md
-- ai-docs/decisions/adr-001-on-cluster-layering.md
-- docs/OnClusterLayering.md
-expected_mentions:
-- MachineOSConfig
-- MachineOSBuild
-- layering
-- Build Controller
+prompt: I want to set up on-cluster layering to build custom OS images. Where can
+ I find information about MachineOSConfig and MachineOSBuild, and how does the build
+ process work?
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@eval/cases/case-003/input.yaml` around lines 4 - 12, The expected_files and
expected_mentions fields in input.yaml are in the wrong location. According to
the eval.yaml schema, these fields must be in annotations.yaml instead because
the downstream judges (mentions_expected_keywords and response_accuracy) access
them via annotations.get() calls. Remove the expected_files and
expected_mentions sections from input.yaml, and move them to annotations.yaml
alongside other metadata like category and difficulty. This ensures the judges
can properly access these expectations at runtime through the annotations
dictionary.
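Because the same move is requested for several cases, it could be scripted rather than done by hand. The sketch below assumes both files are already parsed into dicts; `split_case` is a hypothetical helper, and the choice to let existing annotation values win on conflict is an assumption, not something the review specifies.

```python
def split_case(input_doc: dict, annotations: dict) -> tuple[dict, dict]:
    """Move every key except 'prompt' from input_doc into annotations.

    Hypothetical helper. Existing annotation values win on conflict so
    nothing already in annotations.yaml is silently overwritten.
    """
    new_input = {"prompt": input_doc["prompt"]}
    moved = {key: value for key, value in input_doc.items() if key != "prompt"}
    new_annotations = {**moved, **annotations}
    return new_input, new_annotations


# Data shaped like case-003 as quoted in this review (prompt shortened).
case_input = {
    "prompt": "I want to set up on-cluster layering to build custom OS images.",
    "expected_files": ["ai-docs/domain/crds.md"],
    "expected_mentions": ["MachineOSConfig", "MachineOSBuild"],
}
case_annotations = {"category": "navigation", "difficulty": "medium"}

new_input, new_annotations = split_case(case_input, case_annotations)
print(sorted(new_input))
print(sorted(new_annotations))
```

Writing the two dicts back to YAML is left out here because it depends on which YAML library the project uses.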
| category: navigation | ||
| difficulty: medium | ||
| topic: NodeDisruptionPolicy |
🗄️ Data Integrity & Integration | 🟠 Major | ⚖️ Poor tradeoff
Add missing expected_files and expected_mentions to annotations.yaml.
This annotations.yaml is incomplete per the eval.yaml schema (lines 43-53). It should include expected_files and expected_mentions, which currently reside (incorrectly) in the paired input.yaml. All expectation metadata belongs in annotations.yaml so judges can access it via annotations.get().
Additionally, the topic field is not defined in the eval.yaml schema and adds custom metadata that judges do not use.
🔧 Proposed fix: Populate annotations.yaml with expected_* fields
Update eval/cases/case-004/annotations.yaml:
category: navigation
difficulty: medium
-topic: NodeDisruptionPolicy
+expected_files:
+ - docs/NodeDisruptionPolicy.md
+expected_mentions:
+ - NodeDisruptionPolicy
+ - MachineConfiguration
+ - cluster
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@eval/cases/case-004/annotations.yaml` around lines 1 - 3, Remove the topic
field from annotations.yaml as it is not defined in the eval.yaml schema. Add
the missing expected_files and expected_mentions fields to annotations.yaml by
moving them from the paired input.yaml file. This ensures all expectation
metadata is consolidated in annotations.yaml according to the eval.yaml schema
(lines 43-53) so judges can access it via annotations.get(), and removes
undefined custom metadata.
| category: component-usage | ||
| api_type: kubernetes-crd | ||
| use_case: basic-usage | ||
| difficulty: easy |
🎯 Functional Correctness | 🔴 Critical | ⚡ Quick win
Annotations schema does not match eval.yaml contract.
According to eval.yaml:48-55, annotations.yaml must contain:
- category
- expected_files
- expected_mentions
- expected_rejection (anti-pattern only)
- expected_guidance (anti-pattern only)
This file has: category, api_type, use_case, difficulty — none of which are in the contract.
This mismatch is already documented in eval/runs/mco-docs-eval/2026-06-22-sonnet/analysis.md:40-58 as the root cause of judge failures (str object has no attribute get). Unless the schema is being revised, these fixtures must be corrected before evaluation can proceed.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@eval/cases/case-007/annotations.yaml` around lines 1 - 4, The
annotations.yaml file contains fields (api_type, use_case, difficulty) that do
not conform to the eval.yaml contract schema. Replace these incorrect fields
with the required fields specified in eval.yaml:48-55: keep category, add
expected_files, add expected_mentions, and conditionally add expected_rejection
and expected_guidance only if this is an anti-pattern case. Remove all fields
that are not part of the contract (api_type, use_case, difficulty) to ensure the
schema matches and prevents the judge failures caused by attempting to call
get() on string objects.
| category: authoring | ||
| content_type: machineconfig-resource | ||
| complexity: medium | ||
| constraints_tested: | ||
| - all-remote-sources-resolved-at-render-time | ||
| - deterministic-merge-ordering | ||
| - worker-pool-label-matching |
🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win
Add judge-required metadata fields to annotations.yaml.
The authoring_quality judge template (eval.yaml line 209–246) expects annotations.get('expected_patterns', []) and annotations.get('expected_documentation', []), but case-015/annotations.yaml is missing these fields. The expected patterns and documentation files are currently in input.yaml (lines 13–26), but judges cannot access them there.
Migrate to annotations.yaml:
- expected_patterns: [...] from input.yaml lines 14–21
- expected_documentation: [...] from input.yaml lines 23–26
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@eval/cases/case-015/annotations.yaml` around lines 1 - 7, Add the required
metadata fields to the annotations.yaml file in case-015 directory. The
authoring_quality judge template expects expected_patterns and
expected_documentation fields to be present in annotations, but they are
currently missing. Migrate the expected_patterns list (currently in input.yaml
lines 14-21) and the expected_documentation list (currently in input.yaml lines
23-26) from input.yaml to annotations.yaml so that the judge template can access
them via annotations.get() calls.
| category: authoring | ||
| content_type: machineconfigpool-and-kubeletconfig | ||
| complexity: complex | ||
| constraints_tested: | ||
| - custom-pool-must-inherit-from-worker | ||
| - kubeletconfig-controller-integration | ||
| - deterministic-rendering |
🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win
Add judge-required metadata fields to annotations.yaml.
The authoring_quality judge template expects annotations.get('expected_patterns', []) and annotations.get('expected_documentation', []), but case-016/annotations.yaml lacks these fields. The expected patterns and documentation are in input.yaml (lines 16–30), but judges read from annotations only.
Migrate to annotations.yaml:
- expected_patterns: [...] from input.yaml lines 17–25
- expected_documentation: [...] from input.yaml lines 27–30
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@eval/cases/case-016/annotations.yaml` around lines 1 - 7, The
annotations.yaml file is missing the expected_patterns and
expected_documentation fields that the authoring_quality judge template
requires. Add the expected_patterns list from input.yaml lines 17-25 and the
expected_documentation list from input.yaml lines 27-30 as top-level fields in
annotations.yaml to ensure the judge can properly access these required metadata
fields.
| expected_structure: | ||
| - MachineConfigPool resource definition | ||
| - MachineConfigPool metadata with name | ||
| - MachineConfigPool spec.machineConfigSelector with matchExpressions | ||
| - MachineConfigPool spec.nodeSelector | ||
| - KubeletConfig resource definition | ||
| - KubeletConfig metadata with name | ||
| - KubeletConfig spec.machineConfigPoolSelector | ||
| - KubeletConfig spec.kubeletConfig with systemReserved and cpuManagerPolicy | ||
| expected_patterns: | ||
| - Custom MachineConfigPool inherits from worker pool via machineConfigSelector using | ||
| matchExpressions with both worker and gpu-worker roles | ||
| - machineConfigSelector must NOT use independent matchLabels without worker inheritance | ||
| - 'nodeSelector matches node-role.kubernetes.io/gpu-worker: ''''' | ||
| - KubeletConfig machineConfigPoolSelector targets the gpu-worker pool | ||
| - apiVersion is machineconfiguration.openshift.io/v1 for both resources | ||
| - KubeletConfig kubeletConfig section uses valid kubelet configuration fields | ||
| - systemReserved memory is specified as '2Gi' | ||
| - cpuManagerPolicy is set to 'static' | ||
| expected_documentation: | ||
| - ai-docs/domain/crds.md | ||
| - ai-docs/architecture/components.md | ||
| - docs/KubeletConfigDesign.md | ||
| - docs/MachineConfig.md |
🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win
Relocate expected_* metadata from input.yaml to annotations.yaml.
Lines 7–30 contain test metadata (expected_structure, expected_patterns, expected_documentation) that should reside in annotations.yaml per the judge contract (eval.yaml:209–246). Keep only the prompt field in input.yaml; move the metadata to case-016/annotations.yaml.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@eval/cases/case-016/input.yaml` around lines 7 - 30, The test metadata fields
(expected_structure, expected_patterns, and expected_documentation) currently in
input.yaml at lines 7-30 should be relocated to annotations.yaml per the judge
contract requirements. Remove these three expected_* fields from input.yaml and
keep only the prompt field there. Then add the expected_structure,
expected_patterns, and expected_documentation fields to the
case-016/annotations.yaml file with the same content that was removed from
input.yaml.
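The first two expected_patterns for case-016 (inheritance from the worker pool through matchExpressions, and no standalone matchLabels) lend themselves to a structural check. The sketch below is an illustration only: `inherits_from_worker` is a hypothetical helper, the manifests are assumed to be parsed into dicts, and the label key `machineconfiguration.openshift.io/role` is not named in the review text and is supplied here as an assumption about the role label a pool selector matches on.

```python
# Assumed role label key; not quoted in the review itself.
ROLE_KEY = "machineconfiguration.openshift.io/role"


def inherits_from_worker(pool: dict) -> bool:
    """Return True if the pool's machineConfigSelector includes the worker role.

    Hypothetical check: looks for a matchExpressions entry on ROLE_KEY with
    operator 'In' whose values contain 'worker'.
    """
    selector = pool.get("spec", {}).get("machineConfigSelector", {})
    for expr in selector.get("matchExpressions", []):
        if (
            expr.get("key") == ROLE_KEY
            and expr.get("operator") == "In"
            and "worker" in expr.get("values", [])
        ):
            return True
    return False


# Pool that follows the expected pattern: worker and gpu-worker roles together.
good_pool = {
    "apiVersion": "machineconfiguration.openshift.io/v1",
    "kind": "MachineConfigPool",
    "metadata": {"name": "gpu-worker"},
    "spec": {
        "machineConfigSelector": {
            "matchExpressions": [
                {"key": ROLE_KEY, "operator": "In", "values": ["worker", "gpu-worker"]}
            ]
        },
        "nodeSelector": {"matchLabels": {"node-role.kubernetes.io/gpu-worker": ""}},
    },
}

# Pool that selects only its own role, which the expected patterns reject.
bad_pool = {
    "apiVersion": "machineconfiguration.openshift.io/v1",
    "kind": "MachineConfigPool",
    "metadata": {"name": "gpu-worker"},
    "spec": {"machineConfigSelector": {"matchLabels": {ROLE_KEY: "gpu-worker"}}},
}

print(inherits_from_worker(good_pool))
print(inherits_from_worker(bad_pool))
```

A check like this only covers structure; whether the judge should score authored YAML this way or through an LLM rubric is a separate design decision.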
| expected_rejection: true | ||
| expected_constraint: Custom MachineConfigPools must inherit from worker pool via machineConfigSelector | ||
| expected_documentation: | ||
| - ai-docs/architecture/components.md | ||
| correct_approach: Custom MachineConfigPools must inherit from the worker pool by including | ||
| the worker role in their machineConfigSelector, then adding additional selectors | ||
| for the custom role on top of that inheritance. |
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
Run output shows correct_approach instead of expected_guidance from schema.
The anti-pattern schema in eval.yaml:48-55 specifies expected_guidance as the field for correct approach keywords. This run snapshot uses correct_approach. Confirm whether this is an intentional schema evolution or a bug that slipped into the first run.
This inconsistency will prevent judges from correctly mapping the field during evaluation.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-011/input.yaml` around
lines 5 - 11, The test case file case-011/input.yaml uses the field name
correct_approach when it should use expected_guidance to match the anti-pattern
schema definition in eval.yaml (lines 48-55). Replace the field name
correct_approach with expected_guidance in the YAML file to ensure consistency
with the schema specification and allow judges to correctly map this field
during evaluation.
| prompt: "I want to set up a NodeDisruptionPolicy to control how my nodes handle certain\ | ||
| \ config changes. I'm going to create a MachineConfiguration custom resource named\ | ||
| \ 'my-cluster-policy' with the disruption actions I need. Here's my YAML so far:\n\ | ||
| \napiVersion: operator.openshift.io/v1\nkind: MachineConfiguration\nmetadata:\n\ | ||
| \ name: my-cluster-policy\nspec:\n nodeDisruptionPolicy:\n ...\n\nCan you help\ | ||
| \ me fill in the spec?" | ||
| expected_rejection: true | ||
| expected_constraint: MachineConfiguration CR for NodeDisruptionPolicy must be named | ||
| 'cluster' | ||
| expected_documentation: | ||
| - docs/NodeDisruptionPolicy.md | ||
| correct_approach: The MachineConfiguration custom resource for NodeDisruptionPolicy | ||
| must be named exactly 'cluster'. Change metadata.name from 'my-cluster-policy' to | ||
| 'cluster'. |
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
Run output shows correct_approach instead of expected_guidance from schema.
Same as case-011: the anti-pattern schema specifies expected_guidance, but this run uses correct_approach. Synchronize across all anti-pattern test cases and update the schema documentation.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@eval/runs/mco-docs-eval/2026-06-22-sonnet/cases/case-012/input.yaml` around
lines 1 - 14, The input.yaml file uses the field name `correct_approach` but the
anti-pattern schema specifies that this field should be named
`expected_guidance`. Rename the `correct_approach` field to `expected_guidance`
in this test case to align with the schema specification. Ensure this change is
applied consistently across all anti-pattern test cases to maintain schema
compliance.
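If the rename turns out to be the intended fix (and not a deliberate schema evolution), it could be applied to each parsed anti-pattern case along these lines. This is a sketch under assumptions: `rename_guidance_field` is a hypothetical helper, and cases are assumed to be plain mappings after parsing.

```python
from typing import Any, Dict


def rename_guidance_field(
    case: Dict[str, Any],
    old: str = "correct_approach",
    new: str = "expected_guidance",
) -> Dict[str, Any]:
    """Return a copy of the case with `old` renamed to `new`, preserving key order.

    Raises ValueError if both keys are present, since it would be unclear
    which value the judges should read.
    """
    if old in case and new in case:
        raise ValueError(f"case defines both {old!r} and {new!r}")
    return {(new if key == old else key): value for key, value in case.items()}


if __name__ == "__main__":
    # Values abbreviated from the case-012 snippet quoted in this review.
    case_012 = {
        "expected_rejection": True,
        "expected_documentation": ["docs/NodeDisruptionPolicy.md"],
        "correct_approach": "Change metadata.name from 'my-cluster-policy' to 'cluster'.",
    }
    print(list(rename_guidance_field(case_012)))
```

Cases that already use `expected_guidance` pass through unchanged, so the helper can be run over every anti-pattern case without special handling.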
@jatinsu: The following test failed, say
Agentic Docs example PR.
Summary by CodeRabbit