Summary
When reconstructing a cuDF table during H2D conversion, the offsets child of LIST columns is cast from INT32 to INT64. cuDF LIST columns require INT32 (cudf::size_type) offsets, so the resulting column is malformed: list algorithms (which read the offsets as size_type) misinterpret it and the sublists collapse.
Location
src/data/representation_converter.cpp:
- reconstruct_column (H2D path), LIST branch: lines ~1147–1165 (the INT32→INT64 cast at ~1153–1158)
- reconstruct_column_from_disk (disk→GPU path), LIST branch: lines ~1772–1789 (cast at ~1780–1783)
Root cause
The INT32→INT64 offsets cast is correct for STRING columns (cuDF's large-strings convention uses 64-bit offsets), but it was applied to the LIST branch as well. cuDF's lists_column_view and list algorithms treat the offsets child as size_type (INT32). make_lists_column does not validate or re-cast the offsets type, so the malformed column is constructed silently.
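The following sketch illustrates how the sublists collapse. It assumes a consumer that treats the offsets buffer as a plain int32 array with no type check, on a little-endian device. Both are assumptions made for illustration and do not describe any specific cuDF kernel. The function name sizes_seen_by_int32_reader is hypothetical. With offsets {0, 2, 5} stored as INT64, the first three int32 slots read as {0, 0, 2}. As a result, row 0 appears empty and row 1 receives the two elements that belonged to row 0.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: returns the per-row list sizes that a size_type (int32)
// reader would compute if handed offsets that are actually stored as INT64.
// Assumes little-endian layout and a reader that does no type checking.
std::vector<std::int32_t> sizes_seen_by_int32_reader(const std::vector<std::int64_t>& offsets64) {
    if (offsets64.empty()) {
        return {};
    }

    // Reinterpret the INT64 bytes as int32 values; memcpy avoids strict-aliasing UB.
    std::vector<std::int32_t> as_int32(offsets64.size() * 2);
    std::memcpy(as_int32.data(), offsets64.data(), offsets64.size() * sizeof(std::int64_t));

    // A LIST column with num_rows rows has num_rows + 1 offsets, so the int32
    // reader only ever looks at the first num_rows + 1 int32 slots.
    std::vector<std::int32_t> sizes;
    for (std::size_t row = 0; row + 1 < offsets64.size(); ++row) {
        sizes.push_back(as_int32[row + 1] - as_int32[row]);
    }
    return sizes;
}
```

For the two lists [a, b] and [c, d, e], the intended sizes are {2, 3}, but the int32 reader sees {0, 2}.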
Expected
For LIST columns, the offsets child should remain INT32 after reconstruction. The INT64 promotion should apply only to STRING offsets.
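The expected behavior can be stated as a per-parent-type rule. The following is a minimal sketch. The enum is a hypothetical stand-in for the relevant cudf::type_id values, and expected_offsets_type is an illustrative name that does not correspond to a function in representation_converter.cpp. The STRING rule reflects the INT64 promotion described in this report.

```cpp
#include <stdexcept>

// Hypothetical stand-in for the cudf::type_id values relevant here.
enum class type_id { INT32, INT64, STRING, LIST };

// Offsets type the reconstructed offsets child should have, by parent type.
type_id expected_offsets_type(type_id parent) {
    switch (parent) {
        case type_id::STRING:
            return type_id::INT64;  // promotion applies to STRING offsets only
        case type_id::LIST:
            return type_id::INT32;  // cudf::size_type; must not be promoted
        default:
            throw std::invalid_argument("type has no offsets child");
    }
}
```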
Actual
LIST offsets are promoted to INT64; downstream cuDF list operations read the offsets incorrectly and collapse the sublists.
Impact / workaround
We currently work around it by recasting every LIST column's offsets back to INT32 immediately after the H2D conversion.
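The recast in the workaround is only safe if every offset fits in cudf::size_type. The following host-side sketch states that invariant. The actual workaround operates on device columns. The name narrow_list_offsets is hypothetical, and the monotonicity check is an extra safeguard added for illustration.

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <vector>

// Hypothetical host-side helper: narrows INT64 list offsets back to int32,
// rejecting values that would not survive the conversion.
std::vector<std::int32_t> narrow_list_offsets(const std::vector<std::int64_t>& offsets64) {
    std::vector<std::int32_t> out;
    out.reserve(offsets64.size());
    std::int64_t previous = 0;
    for (std::int64_t v : offsets64) {
        // Every offset must be representable as cudf::size_type (int32).
        if (v < 0 || v > std::numeric_limits<std::int32_t>::max()) {
            throw std::out_of_range("list offset does not fit in int32");
        }
        // Valid list offsets are non-decreasing.
        if (v < previous) {
            throw std::invalid_argument("list offsets are not monotonic");
        }
        previous = v;
        out.push_back(static_cast<std::int32_t>(v));
    }
    return out;
}
```

Because the LIST offsets started out as INT32 before the promotion, the range check should never fire in practice. It only documents why the round trip is lossless.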
Environment
- cuCascade at commit b4abc2d64cc9ade1efe252f69d27cd2300d9c94e
- libcudf 26.06