fix: Support CP2K 2025 output format for energy and forces (fixes #850) by newtontech · Pull Request #978 · deepmodeling/dpdata

newtontech · 2026-06-18T15:13:18Z

This is a recreation of #947 which was closed because the head repository was deleted.

Adds support for parsing CP2K 2025 version output files.

Changes in CP2K 2025 format:

- Energy line format changed from `ENERGY| Total FORCE_EVAL ( QS ) energy (a.u.):` to `ENERGY| Total FORCE_EVAL ( QS ) energy [hartree]`
- Forces output format changed from the `ATOMIC FORCES in [a.u.]` table to `FORCES| Atomic forces [hartree/bohr]` with `FORCES| Atom x y z |f|` prefix lines

Implementation:

- Detect CP2K 2025 format by checking for 'energy [hartree]' in the content
- Parse energy from new '[hartree]' format
- Parse forces from new 'FORCES|' prefixed lines
- Maintain backward compatibility with CP2K 2023 format
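
The detection and energy steps listed above can be sketched as follows. This is a minimal illustration, not the code from `dpdata/formats/cp2k/output.py`: the helper names `is_cp2k_2025` (as a function) and `parse_energy` are hypothetical, and it assumes the energy value is the last whitespace-separated token on the line in both formats, which matches the two sample lines quoted in the commit message.

```python
# Hypothetical helpers; the real logic lives inside get_frames and may differ.
ENERGY_PREFIX = "ENERGY| Total FORCE_EVAL ( QS ) energy"


def is_cp2k_2025(content: str) -> bool:
    """Detect the 2025 format with the substring check described in the PR."""
    return "energy [hartree]" in content


def parse_energy(line: str) -> float:
    """Return the energy (hartree) from either the '(a.u.):' or '[hartree]' line."""
    if ENERGY_PREFIX not in line:
        raise ValueError("not an energy line")
    # Assumption: the numeric value is the last token in both variants.
    token = line.split()[-1]
    try:
        return float(token)
    except ValueError as exc:
        # Message text taken from the suggested test in the review.
        raise RuntimeError(
            f"Cannot parse energy from CP2K 2025 output: {line!r}"
        ) from exc


old_line = "ENERGY| Total FORCE_EVAL ( QS ) energy (a.u.): -7.997403996236343"
new_line = " ENERGY| Total FORCE_EVAL ( QS ) energy [hartree]   -7.364190264587725"
print(is_cp2k_2025(old_line), parse_energy(old_line))
print(is_cp2k_2025(new_line), parse_energy(new_line))
```

Taking the last token rather than a fixed column keeps the sketch tolerant of extra whitespace, which is one of the edge cases the PR tests.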

Review comments addressed since #947:

- Raise clear RuntimeError when energy cannot be parsed from CP2K 2025 format
- Fix literal \n in test fixture
- Replace truthiness checks with explicit None checks in test helper
- Add numeric value assertions to edge case tests
- Add force value assertions to header filtering tests
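
To illustrate why the truthiness checks were replaced with explicit None checks, here is a small generic sketch. The `describe` function is hypothetical and is not the test helper from this PR; it only shows how a legitimate falsy value such as `0.0` or an empty string is misclassified by a truthiness check.

```python
def describe(value):
    """Compare a truthiness check with an explicit None check (illustrative only)."""
    # Truthiness: 0.0, "" and [] are all treated as "missing".
    truthy = "set" if value else "missing"
    # Explicit check: only None is treated as "missing".
    explicit = "set" if value is not None else "missing"
    return truthy, explicit


for candidate in (None, 0.0, "", -7.364190264587725):
    print(repr(candidate), describe(candidate))
```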

Testing:

- Added test file for CP2K 2025 format (tests/cp2k/cp2k_2025_output/)
- Added regression test for CP2K 2023 backward compatibility
- Added edge case tests for whitespace, header lines, and atomic forces variants
- All 110 CP2K tests pass
- Previously approved by @wanghan-iapcm

Summary by CodeRabbit

Release Notes

New Features
- CP2K 2025 output format is now supported with enhanced energy and force data extraction capabilities
Tests
- Comprehensive test coverage added for CP2K 2025 format parsing, including edge cases and backward compatibility validation for earlier CP2K versions

…pmodeling#850)

This commit adds support for parsing CP2K 2025 version output files:

**Changes in CP2K 2025 format:**

1. Energy line format changed from: 'ENERGY| Total FORCE_EVAL ( QS ) energy (a.u.): -7.997403996236343' to: 'ENERGY| Total FORCE_EVAL ( QS ) energy [hartree] -7.364190264587725'
2. Forces output format changed from: 'ATOMIC FORCES in [a.u.]' table with ' Atom Kind Element X Y Z' header to: 'FORCES| Atomic forces [hartree/bohr]' with 'FORCES| Atom x y z |f|' prefix lines

**Implementation:**

- Detect CP2K 2025 format by checking for 'energy [hartree]' in the content
- Parse energy from new '[hartree]' format
- Parse forces from new 'FORCES|' prefixed lines
- Maintain backward compatibility with CP2K 2023 format

**Testing:**

- Added test file for CP2K 2025 format (tests/cp2k/cp2k_2025_output/)
- Added test case TestCp2k2025Output to verify parsing
- Added regression test TestCp2k2023FormatStillWorks to ensure backward compatibility
- All existing CP2K tests pass

for more information, see https://pre-commit.ci

- Add tests for energy parsing with extra whitespace
- Add tests for FORCES| header line filtering (Atom x y z, Atomic forces)
- Add integration test for CP2K 2025 format with LabeledSystem
- Improve code coverage for CP2K 2025 format support


- Raise clear RuntimeError when energy cannot be parsed from CP2K 2025 line
- Fix literal backslash-n in test fixture line 71
- Replace truthiness checks with explicit None checks in test helper
- Add numeric value assertions to edge case tests
- Add force value assertions to header filtering tests

coderabbitai · 2026-06-18T15:17:58Z

Walkthrough

`get_frames` in the CP2K output parser gains an `is_cp2k_2025` flag, set by detecting `energy [hartree]` in the file content. Energy and force parsing then branch on this flag: the 2025 path uses token-based extraction and `FORCES|` line scanning, while the prior fixed-field/state-machine path is kept for older formats. A fixture file and a new test module with integration, regression, and edge-case tests are added.
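
The `FORCES|` line scanning mentioned in the walkthrough might look roughly like the sketch below. It is an illustration under stated assumptions, not the parser from this PR: the function name `parse_forces_2025` is hypothetical, and the layout of the data rows (an integer atom index followed by x, y, z and |f|) is inferred from the `FORCES| Atom x y z |f|` header; the sample numbers are made up.

```python
def parse_forces_2025(lines):
    """Collect [fx, fy, fz] rows from 'FORCES|'-prefixed lines (hartree/bohr)."""
    forces = []
    for line in lines:
        stripped = line.strip()
        if not stripped.startswith("FORCES|"):
            continue
        tokens = stripped[len("FORCES|"):].split()
        # Header and summary lines do not start with an integer atom index,
        # so they are filtered out here. Assumes data rows carry at least
        # index, x, y, z.
        if len(tokens) < 4 or not tokens[0].isdigit():
            continue
        forces.append([float(tok) for tok in tokens[1:4]])
    return forces


# Made-up sample block following the header layout quoted in the PR.
sample = [
    " FORCES| Atomic forces [hartree/bohr]",
    " FORCES| Atom     x            y            z           |f|",
    " FORCES|    1   1.0E-03    -2.0E-03     0.0E+00    2.2E-03",
    " FORCES|    2  -1.0E-03     2.0E-03     0.0E+00    2.2E-03",
]
print(parse_forces_2025(sample))
```

Filtering on the first token keeps both header variants (`Atomic forces` and `Atom x y z`) out of the result, which is the behaviour the header filtering tests check.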

Changes

CP2K 2025 format support

Layer / File(s)	Summary
Format detection and energy/force parsing `dpdata/formats/cp2k/output.py`	Adds `is_cp2k_2025` detection via `energy [hartree]` header check, then branches energy extraction to a token/float-fallback path and force extraction to `FORCES
CP2K 2025 test fixture `tests/cp2k/cp2k_2025_output/cp2k_2025_output`, `tests/cp2k/cp2k_2025_output/deepmd/type.raw`, `tests/cp2k/cp2k_2025_output/deepmd/type_map.raw`	Adds a complete CP2K 2025 run transcript (banner through timing footer) and matching deepmd reference type files used by integration tests.
Test module `tests/test_cp2k_2025_output.py`	Adds `TestCp2k2025Output` (energy and forces assertions against fixture), `TestCp2k2023FormatStillWorks` (regression guard), and `TestCp2k2025EdgeCases` (temporary-file tests for whitespace in energy lines and header skipping in force blocks).

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 70.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and concisely summarizes the main change: adding CP2K 2025 output format support with backward compatibility, directly matching the PR's primary objective.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

dpdata/formats/cp2k/output.py (1)
405-483: ⚠️ Potential issue | 🟠 Major

Fix ruff linting errors in this file.

This file has 2 linting issues found by ruff check that must be fixed to comply with coding guidelines:

Line 118: Rename unused loop variable ii to _ii

Line 534: Prefix unused variable tmp_names with an underscore

While the code changes at lines 405-483 themselves appear compliant, the file contains linting violations elsewhere that must be resolved before committing.
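
For readers unfamiliar with the two ruff rules involved (B007, unused loop control variable, and RUF059, unused unpacked variable), the following generic sketch shows the underscore-prefix convention the review asks for. The functions are hypothetical and unrelated to the actual code at lines 118 and 534 of `output.py`.

```python
def repeat_marker(count):
    """Build a marker string; the loop index itself is never read."""
    marker = ""
    # B007: the loop variable is unused in the body, so it is named _ii.
    for _ii in range(count):
        marker += "-"
    return marker


def first_of_pair(pair):
    """Return only the first element of a two-item tuple."""
    # RUF059: the second unpacked value is unused, so it gets an underscore.
    value, _tmp_names = pair
    return value


print(repeat_marker(3))
print(first_of_pair((42, ["H", "O"])))
```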
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dpdata/formats/cp2k/output.py` around lines 405 - 483, The file has two ruff
linting violations that need to be fixed: (1) on line 118, rename the unused
loop variable `ii` to `_ii` to indicate it is intentionally unused, and (2) on
line 534, prefix the unused variable `tmp_names` with an underscore to make it
`_tmp_names`. These changes follow Python naming conventions for variables that
are intentionally not used in the code.
Source: Coding guidelines

🧹 Nitpick comments (1)

tests/test_cp2k_2025_output.py (1)

11-212: ⚡ Quick win

Consider adding test for energy parsing error condition.

The parser raises a RuntimeError when energy parsing fails (dpdata/formats/cp2k/output.py:455-457), but there's no test coverage for this error path. Adding a test that provides a malformed energy line and asserts the expected exception would improve coverage.

🧪 Suggested test for error condition

def test_cp2k2025_energy_parsing_failure_raises_error(self):
    """Test that malformed energy line raises RuntimeError with clear message."""
    fname = self.create_cp2k_output_2025(
        energy_line=" ENERGY| Total FORCE_EVAL ( QS ) energy [hartree] invalid"
    )
    try:
        with self.assertRaises(RuntimeError) as cm:
            dpdata.LabeledSystem(fname, fmt="cp2k/output")
        self.assertIn("Cannot parse energy from CP2K 2025 output", str(cm.exception))
    finally:
        os.unlink(fname)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_cp2k_2025_output.py` around lines 11 - 212, Add a new test method
to the TestCp2k2025EdgeCases class to verify that the energy parser properly
raises a RuntimeError when encountering a malformed energy line. The test should
call create_cp2k_output_2025() with an energy_line parameter containing invalid
data (e.g., a non-numeric value where the energy should be), then use
assertRaises to verify that dpdata.LabeledSystem raises a RuntimeError when
attempting to parse the file, and optionally verify the error message contains
expected text like "Cannot parse energy from CP2K 2025 output". Remember to
clean up the temporary file in a finally block after the test completes.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 7409312a-3219-49a4-8a0e-ceb4c476fc28

📥 Commits

Reviewing files that changed from the base of the PR and between 1b63c9b and 62a3892.

📒 Files selected for processing (10)

dpdata/formats/cp2k/output.py
tests/cp2k/cp2k_2025_output/cp2k_2025_output
tests/cp2k/cp2k_2025_output/deepmd/set.000/box.npy
tests/cp2k/cp2k_2025_output/deepmd/set.000/coord.npy
tests/cp2k/cp2k_2025_output/deepmd/set.000/energy.npy
tests/cp2k/cp2k_2025_output/deepmd/set.000/force.npy
tests/cp2k/cp2k_2025_output/deepmd/set.000/virial.npy
tests/cp2k/cp2k_2025_output/deepmd/type.raw
tests/cp2k/cp2k_2025_output/deepmd/type_map.raw
tests/test_cp2k_2025_output.py

coderabbitai · 2026-06-18T15:18:01Z

+    # Check if this is CP2K 2025 format
+    is_cp2k_2025 = "energy [hartree]" in content


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Verify that "energy [hartree]" does not appear in CP2K 2023 format files
# Search for the detection string in CP2K 2023 test fixtures
rg -n 'energy \[hartree\]' tests/cp2k/ -g '!*2025*'

Repository: deepmodeling/dpdata

Length of output: 45

🏁 Script executed:

# Verify the detection string appears in CP2K 2025 format files
rg -n 'energy \[hartree\]' tests/cp2k/ -g '*2025*'

Repository: deepmodeling/dpdata

Length of output: 180

🏁 Script executed:

# Check ruff linting compliance for the file
cd dpdata && ruff check formats/cp2k/output.py

Repository: deepmodeling/dpdata

Length of output: 1286

🏁 Script executed:

# Inspect the code context around lines 405-406
sed -n '400,490p' dpdata/formats/cp2k/output.py | cat -n

Repository: deepmodeling/dpdata

Length of output: 4362

Fix ruff linting violations before committing.

The format detection via substring search works as intended: the string "energy [hartree]" appears only in CP2K 2025 output, so it correctly discriminates the 2025 format from the 2023 format. However, the file has linting violations that must be resolved per the coding guidelines:

Line 118: Rename unused loop variable ii to _ii (B007)

Line 534: Prefix unused unpacked variable tmp_names with underscore (RUF059)

Run ruff check --fix dpdata/formats/cp2k/output.py to address these issues.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate.

In `@dpdata/formats/cp2k/output.py` around lines 405 - 406, Fix the ruff linting violations in the file dpdata/formats/cp2k/output.py by renaming the unused loop variable ii to _ii on line 118 to comply with the B007 rule, and by prefixing the unused unpacked variable tmp_names with an underscore to become _tmp_names on line 534 to comply with the RUF059 rule. These changes follow the convention of marking unused variables with a leading underscore to satisfy linting requirements.

OpenClaw Bot and others added 5 commits June 18, 2026 23:10

[pre-commit.ci] auto fixes from pre-commit.com hooks

f3909eb


[pre-commit.ci] auto fixes from pre-commit.com hooks

8a6d7b5


dosubot Bot added the size:M (This PR changes 30-99 lines, ignoring generated files), cp2k, dpdata, and enhancement (New feature or request) labels on Jun 18, 2026

newtontech wants to merge 5 commits into deepmodeling:master from newtontech:fix-cp2k-2025-format-v2
