fix: Support CP2K 2025 output format for energy and forces (fixes #850)#978
fix: Support CP2K 2025 output format for energy and forces (fixes #850)#978newtontech wants to merge 5 commits into
Conversation
…pmodeling#850) This commit adds support for parsing CP2K 2025 version output files: **Changes in CP2K 2025 format:** 1. Energy line format changed from: 'ENERGY| Total FORCE_EVAL ( QS ) energy (a.u.): -7.997403996236343' to: 'ENERGY| Total FORCE_EVAL ( QS ) energy [hartree] -7.364190264587725' 2. Forces output format changed from: 'ATOMIC FORCES in [a.u.]' table with ' Atom Kind Element X Y Z' header to: 'FORCES| Atomic forces [hartree/bohr]' with 'FORCES| Atom x y z |f|' prefix lines **Implementation:** - Detect CP2K 2025 format by checking for 'energy [hartree]' in the content - Parse energy from new '[hartree]' format - Parse forces from new 'FORCES|' prefixed lines - Maintain backward compatibility with CP2K 2023 format **Testing:** - Added test file for CP2K 2025 format (tests/cp2k/cp2k_2025_output/) - Added test case TestCp2k2025Output to verify parsing - Added regression test TestCp2k2023FormatStillWorks to ensure backward compatibility - All existing CP2K tests pass
for more information, see https://pre-commit.ci
- Add tests for energy parsing with extra whitespace - Add tests for FORCES| header line filtering (Atom x y z, Atomic forces) - Add integration test for CP2K 2025 format with LabeledSystem - Improve code coverage for CP2K 2025 format support
for more information, see https://pre-commit.ci
- Raise clear RuntimeError when energy cannot be parsed from CP2K 2025 line - Fix literal backslash-n in test fixture line 71 - Replace truthiness checks with explicit None checks in test helper - Add numeric value assertions to edge case tests - Add force value assertions to header filtering tests
📝 WalkthroughWalkthrough
ChangesCP2K 2025 format support
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
dpdata/formats/cp2k/output.py (1)
405-483:⚠️ Potential issue | 🟠 MajorFix ruff linting errors in this file.
This file has 2 linting issues found by ruff check that must be fixed to comply with coding guidelines:
- Line 118: Rename unused loop variable
iito_ii- Line 534: Prefix unused variable
tmp_nameswith an underscoreWhile the code changes at lines 405-483 themselves appear compliant, the file contains linting violations elsewhere that must be resolved before committing.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@dpdata/formats/cp2k/output.py` around lines 405 - 483, The file has two ruff linting violations that need to be fixed: (1) on line 118, rename the unused loop variable `ii` to `_ii` to indicate it is intentionally unused, and (2) on line 534, prefix the unused variable `tmp_names` with an underscore to make it `_tmp_names`. These changes follow Python naming conventions for variables that are intentionally not used in the code.Source: Coding guidelines
🧹 Nitpick comments (1)
tests/test_cp2k_2025_output.py (1)
11-212: ⚡ Quick winConsider adding test for energy parsing error condition.
The parser raises a
RuntimeErrorwhen energy parsing fails (dpdata/formats/cp2k/output.py:455-457), but there's no test coverage for this error path. Adding a test that provides a malformed energy line and asserts the expected exception would improve coverage.🧪 Suggested test for error condition
def test_cp2k2025_energy_parsing_failure_raises_error(self): """Test that malformed energy line raises RuntimeError with clear message.""" fname = self.create_cp2k_output_2025( energy_line=" ENERGY| Total FORCE_EVAL ( QS ) energy [hartree] invalid" ) try: with self.assertRaises(RuntimeError) as cm: dpdata.LabeledSystem(fname, fmt="cp2k/output") self.assertIn("Cannot parse energy from CP2K 2025 output", str(cm.exception)) finally: os.unlink(fname)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_cp2k_2025_output.py` around lines 11 - 212, Add a new test method to the TestCp2k2025EdgeCases class to verify that the energy parser properly raises a RuntimeError when encountering a malformed energy line. The test should call create_cp2k_output_2025() with an energy_line parameter containing invalid data (e.g., a non-numeric value where the energy should be), then use assertRaises to verify that dpdata.LabeledSystem raises a RuntimeError when attempting to parse the file, and optionally verify the error message contains expected text like "Cannot parse energy from CP2K 2025 output". Remember to clean up the temporary file in a finally block after the test completes.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@dpdata/formats/cp2k/output.py`:
- Around line 405-406: Fix the ruff linting violations in the file
dpdata/formats/cp2k/output.py by renaming the unused loop variable ii to _ii on
line 118 to comply with the B007 rule, and by prefixing the unused unpacked
variable tmp_names with an underscore to become _tmp_names on line 534 to comply
with the RUF059 rule. These changes follow the convention of marking unused
variables with a leading underscore to satisfy linting requirements.
---
Outside diff comments:
In `@dpdata/formats/cp2k/output.py`:
- Around line 405-483: The file has two ruff linting violations that need to be
fixed: (1) on line 118, rename the unused loop variable `ii` to `_ii` to
indicate it is intentionally unused, and (2) on line 534, prefix the unused
variable `tmp_names` with an underscore to make it `_tmp_names`. These changes
follow Python naming conventions for variables that are intentionally not used
in the code.
---
Nitpick comments:
In `@tests/test_cp2k_2025_output.py`:
- Around line 11-212: Add a new test method to the TestCp2k2025EdgeCases class
to verify that the energy parser properly raises a RuntimeError when
encountering a malformed energy line. The test should call
create_cp2k_output_2025() with an energy_line parameter containing invalid data
(e.g., a non-numeric value where the energy should be), then use assertRaises to
verify that dpdata.LabeledSystem raises a RuntimeError when attempting to parse
the file, and optionally verify the error message contains expected text like
"Cannot parse energy from CP2K 2025 output". Remember to clean up the temporary
file in a finally block after the test completes.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 7409312a-3219-49a4-8a0e-ceb4c476fc28
📒 Files selected for processing (10)
dpdata/formats/cp2k/output.pytests/cp2k/cp2k_2025_output/cp2k_2025_outputtests/cp2k/cp2k_2025_output/deepmd/set.000/box.npytests/cp2k/cp2k_2025_output/deepmd/set.000/coord.npytests/cp2k/cp2k_2025_output/deepmd/set.000/energy.npytests/cp2k/cp2k_2025_output/deepmd/set.000/force.npytests/cp2k/cp2k_2025_output/deepmd/set.000/virial.npytests/cp2k/cp2k_2025_output/deepmd/type.rawtests/cp2k/cp2k_2025_output/deepmd/type_map.rawtests/test_cp2k_2025_output.py
| # Check if this is CP2K 2025 format | ||
| is_cp2k_2025 = "energy [hartree]" in content |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Description: Verify that "energy [hartree]" does not appear in CP2K 2023 format files
# Search for the detection string in CP2K 2023 test fixtures
rg -n 'energy \[hartree\]' tests/cp2k/ -g '!*2025*'Repository: deepmodeling/dpdata
Length of output: 45
🏁 Script executed:
# Verify the detection string appears in CP2K 2025 format files
rg -n 'energy \[hartree\]' tests/cp2k/ -g '*2025*'Repository: deepmodeling/dpdata
Length of output: 180
🏁 Script executed:
# Check ruff linting compliance for the file
cd dpdata && ruff check formats/cp2k/output.pyRepository: deepmodeling/dpdata
Length of output: 1286
🏁 Script executed:
# Inspect the code context around lines 405-406
sed -n '400,490p' dpdata/formats/cp2k/output.py | cat -nRepository: deepmodeling/dpdata
Length of output: 4362
Fix ruff linting violations before committing.
The format detection via substring search works as intended—the string "energy [hartree]" correctly discriminates CP2K 2025 format from 2023 format and appears only in the former. However, the file has linting violations that must be resolved per the coding guidelines:
- Line 118: Rename unused loop variable
iito_ii(B007) - Line 534: Prefix unused unpacked variable
tmp_nameswith underscore (RUF059)
Run ruff check --fix dpdata/formats/cp2k/output.py to address these issues.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@dpdata/formats/cp2k/output.py` around lines 405 - 406, Fix the ruff linting
violations in the file dpdata/formats/cp2k/output.py by renaming the unused loop
variable ii to _ii on line 118 to comply with the B007 rule, and by prefixing
the unused unpacked variable tmp_names with an underscore to become _tmp_names
on line 534 to comply with the RUF059 rule. These changes follow the convention
of marking unused variables with a leading underscore to satisfy linting
requirements.
This is a recreation of #947 which was closed because the head repository was deleted.
Adds support for parsing CP2K 2025 version output files.
Changes in CP2K 2025 format:
ENERGY| Total FORCE_EVAL ( QS ) energy (a.u.):to:ENERGY| Total FORCE_EVAL ( QS ) energy [hartree]ATOMIC FORCES in [a.u.]table toFORCES| Atomic forces [hartree/bohr]withFORCES| Atom x y z |f|prefix linesImplementation:
Review comments addressed since #947:
\nin test fixtureTesting:
Summary by CodeRabbit
Release Notes
New Features
Tests