
H2D/disk reconstruction casts LIST offsets child to INT64, producing malformed cuDF LIST columns #147

@yayen-lin

Summary

When reconstructing a cuDF table during H2D conversion (and likewise when reconstructing from disk), the offsets child of LIST columns is cast from INT32 to INT64. cuDF LIST columns require INT32 (cudf::size_type) offsets, so the resulting column is malformed: list algorithms, which read the offsets as size_type, misinterpret it and the sublists collapse.

Location

src/data/representation_converter.cpp:

  • reconstruct_column (H2D path), LIST branch: lines ~1147–1165 (the INT32→INT64 cast at ~1153–1158)
  • reconstruct_column_from_disk (disk→GPU path), LIST branch: lines ~1772–1789 (cast at ~1780–1783)

Root cause

The INT32→INT64 offsets cast is fine for STRING columns (cuDF's large-strings support accepts 64-bit offsets), but it was also applied to the LIST branch. cuDF's lists_column_view and list algorithms treat the offsets child as size_type (INT32). make_lists_column does not validate or re-cast the offsets type, so the malformed column is constructed silently.
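To make the failure mode concrete, here is a small host-side C++ sketch that reads an INT64 offsets buffer the way a size_type consumer would. It is not cuCascade or libcudf code: the helper `misread_as_int32` is hypothetical, and the sketch assumes little-endian byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: copy the first `count` INT32-sized slots out of an
// INT64 offsets buffer, i.e. interpret the 64-bit buffer as if it held
// cudf::size_type (INT32) offsets. Host-side illustration only; in the bug the
// misread happens on device memory inside cuDF list algorithms.
std::vector<std::int32_t> misread_as_int32(std::vector<std::int64_t> const& offsets64,
                                           std::size_t count) {
  // An INT64 buffer of N elements spans 2 * N INT32-sized slots.
  assert(count <= 2 * offsets64.size());
  std::vector<std::int32_t> out(count);
  // Raw byte copy: no value conversion, just reinterpretation of the bytes.
  std::memcpy(out.data(), offsets64.data(), count * sizeof(std::int32_t));
  return out;
}
```

For offsets `{0, 3, 5, 9}` (three lists of lengths 3, 2 and 4), reading the expected four INT32 values yields `0, 0, 3, 0`: on a little-endian layout each 64-bit offset contributes its low word followed by a zero high word, and only the first half of the buffer is consumed. Read as list offsets, row 0 looks empty, row 1 looks like it has three elements, and row 2 ends before it starts.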

Expected

For LIST columns, the offsets child should remain INT32 after reconstruction. The INT64 promotion should apply only to STRING offsets.
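The expected rule can be written down as a tiny C++ sketch. The enums and `reconstructed_offsets_type` are hypothetical stand-ins, not cuDF's `cudf::type_id` or cuCascade's actual reconstruction code; the point is only that the offsets type is chosen per column kind instead of being promoted unconditionally.

```cpp
#include <cassert>

// Hypothetical stand-ins for the two column kinds whose offsets are touched
// during reconstruction (not the real cudf::type_id enum).
enum class col_kind { LIST, STRING };
enum class offsets_type { INT32, INT64 };

// Offsets type a reconstructed column should carry under the expected behavior:
// LIST offsets stay INT32 (cudf::size_type); only STRING offsets are promoted.
constexpr offsets_type reconstructed_offsets_type(col_kind kind) {
  switch (kind) {
    case col_kind::LIST: return offsets_type::INT32;    // required by lists_column_view
    case col_kind::STRING: return offsets_type::INT64;  // large-strings offsets
  }
  return offsets_type::INT32;  // not reached for the kinds above
}

// Compile-time check that the LIST branch never promotes.
static_assert(reconstructed_offsets_type(col_kind::LIST) == offsets_type::INT32,
              "LIST offsets must remain INT32");
```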

Actual

LIST offsets are promoted to INT64; downstream cuDF list operations read the offsets incorrectly and collapse the sublists.

Impact / workaround

We currently work around it by recasting every LIST column's offsets back to INT32 immediately after the H2D conversion.
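The workaround amounts to a checked INT64→INT32 narrowing of each LIST column's offsets. On device this would be done with libcudf (for example by casting the offsets child with `cudf::cast` and rebuilding the column with `cudf::make_lists_column`); the host-side C++ sketch below only shows the conversion logic, and `narrow_list_offsets` is a hypothetical name. Because the offsets were INT32 before the faulty cast, every value should fit; the range check guards against silently truncating anything that does not.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>
#include <vector>

// Hypothetical host-side sketch: narrow INT64 list offsets back to INT32.
// Returns std::nullopt if any offset is negative or exceeds the INT32
// (cudf::size_type) range, since such values cannot be valid LIST offsets.
std::optional<std::vector<std::int32_t>> narrow_list_offsets(
    std::vector<std::int64_t> const& offsets64) {
  std::vector<std::int32_t> out;
  out.reserve(offsets64.size());
  for (std::int64_t v : offsets64) {
    if (v < 0 || v > std::numeric_limits<std::int32_t>::max()) { return std::nullopt; }
    out.push_back(static_cast<std::int32_t>(v));  // value-preserving conversion
  }
  return out;
}
```

Unlike the byte-level reinterpretation that causes the bug, this converts each value, so `{0, 3, 5, 9}` comes back as the same four INT32 offsets.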

Environment

  • cuCascade at commit b4abc2d64cc9ade1efe252f69d27cd2300d9c94e
  • libcudf 26.06
