Remove libcudf dependency from cuCascade #142

@vyasr

Summary

cuCascade should be a generic tiered GPU memory/data manager. Currently, some components of the library are tightly coupled to cudf, which introduces a dependency that many consumers do not want. The cudf-specific concrete representations (gpu_table_representation, host_data_representation, host_data_packed_representation) are, for the moment, specific to Sirius and should be moved there. cuCascade's public API also exposes cudf::table, cudf::table_view, and cudf::type_id directly, and links cudf::cudf as a PUBLIC dependency. As part of removing the cudf-specific types from cuCascade, the public APIs should stop exposing these cudf types. If a common use case for cudf-aware representations emerges in other projects, they can be added back as separate plugins without forcing every cuCascade consumer to depend on cudf.

Motivation

  • Build isolation: cuCascade consumers should only need RMM + CUDA, not the entire libcudf
  • Clean architecture: cuCascade's abstractions (idata_representation, converter registry, data_batch, disk I/O backends) are already generic
  • Separation of concerns: The 1800-line representation_converter.cpp understands cudf's nested column model (STRING/LIST/STRUCT/DICTIONARY reconstruction) — this is Sirius domain logic, not generic memory management

Note that RMM stays as a dependency. rmm::mr::device_memory_resource and rmm::cuda_stream_view are core parts of cuCascade's memory management abstractions.

Current State

What's already generic (stays in cuCascade)

| Component | Location | Notes |
|---|---|---|
| idata_representation interface | include/cucascade/data/common.hpp | Abstract base, no cudf types |
| representation_converter_registry | include/cucascade/data/representation_converter.hpp | Type-erased plugin registry |
| data_batch / read_only_data_batch / mutable_data_batch | include/cucascade/data/data_batch.hpp | Generic batch lifecycle |
| data_repository / data_repository_manager | include/cucascade/data/data_repository.hpp | Generic collections |
| idisk_io_backend + pipeline backend | include/cucascade/data/disk_io_backend.hpp, src/data/pipeline_io_backend.cpp | Generic disk I/O (CUDA runtime + POSIX O_DIRECT) |
| disk_data_representation | include/cucascade/data/disk_data_representation.hpp | No cudf types in header |
| disk_file_format.hpp | include/cucascade/data/disk_file_format.hpp | Only defines alignment |
| Entire memory subsystem | include/cucascade/memory/ (except host_table.hpp) | RMM-only |
| Topology discovery | include/cucascade/memory/topology_discovery.hpp | Neither RMM nor cudf |
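
To illustrate why the disk file format is independent of cudf, here is a minimal sketch of the kind of alignment arithmetic an O_DIRECT backend needs. The names kDiskAlignment and align_up are hypothetical and are not taken from disk_file_format.hpp; the 4096-byte value is assumed from the "4KB alignment" mentioned under Notes for Implementation.

```cpp
#include <cassert>
#include <cstddef>

// Assumed alignment for O_DIRECT transfers (hypothetical constant name).
constexpr std::size_t kDiskAlignment = 4096;

// Round n up to the next multiple of alignment. Works for any positive
// alignment, not only powers of two.
constexpr std::size_t align_up(std::size_t n, std::size_t alignment = kDiskAlignment)
{
  return (n + alignment - 1) / alignment * alignment;
}

static_assert(align_up(0) == 0, "zero stays zero");
static_assert(align_up(4096) == 4096, "aligned sizes are unchanged");
static_assert(align_up(4097) == 8192, "unaligned sizes round up");
```

Nothing here refers to columns, tables, or type IDs, which is consistent with keeping the file format in cuCascade.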

What must be extracted (moves to Sirius)

| Component | Location | cudf Symbols Used |
|---|---|---|
| gpu_table_representation | include/cucascade/data/gpu_data_representation.hpp | cudf::table, cudf::table_view |
| host_data_representation | include/cucascade/data/cpu_data_representation.hpp | Semantically cudf column-tree oriented |
| host_data_packed_representation | include/cucascade/data/cpu_data_representation.hpp | Based on cudf::pack output |
| host_table.hpp (column_metadata) | include/cucascade/memory/host_table.hpp | cudf::type_id, cudf::size_type |
| host_table_packed.hpp | include/cucascade/memory/host_table_packed.hpp | Semantically tied to cudf::pack |
| Built-in converters | src/data/representation_converter.cpp | Deep cudf: pack/unpack, column reconstruction, type dispatch, STRING/LIST/STRUCT/DICTIONARY handling |
| register_builtin_converters() | src/data/representation_converter.cpp | Registers all cudf-specific conversions |
| gpu_data_representation.cpp | src/data/gpu_data_representation.cpp | cudf::table, cudf::copying |
| bandwidth_profiler.cpp | src/data/bandwidth_profiler.cpp | Uses cudf to generate synthetic tables |

CMake coupling (must change)

These requirements should be removed from cuCascade's CMakeLists.txt and config:

# CMakeLists.txt:64 - currently REQUIRED
find_package(cudf REQUIRED CONFIG)

# CMakeLists.txt:153 - currently PUBLIC link
set(CUCASCADE_PUBLIC_LINK_LIBS rmm::rmm cudf::cudf CUDA::cudart_static Threads::Threads ${NUMA_LIB})

# cmake/cuCascadeConfig.cmake.in:9 - propagates to consumers
find_dependency(cudf REQUIRED CONFIG)

Sirius Breakage Surface

Sirius consumes cuCascade as a git submodule (add_subdirectory(cucascade ...)). Sirius must be updated to provide the necessary types itself before cuCascade can remove them.

End State

After all phases:

  • cuCascade depends on: RMM, CUDA runtime, NVML, libnuma
  • cuCascade does NOT depend on: libcudf
  • cuCascade provides: tiered memory management, generic data batch lifecycle, disk I/O backends, converter registry infrastructure
  • Sirius provides: cudf-specific representations, cudf-specific converters, all cudf domain logic
  • Sirius depends on: cuCascade (for infrastructure) + libcudf (for its own data types)

Notes for Implementation

  • The disk_data_representation header is already generic and stays — but the DISK converters that serialize cudf column trees to disk format move to Sirius
  • The disk file format itself (disk_file_header, 4KB alignment) stays in cuCascade — it's not cudf-specific
  • disk_data_representation's destructor (RAII file deletion) stays in cuCascade
  • The representation_converter_fn signature uses idata_representation — it doesn't need cudf types
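
The split described in these notes can be sketched roughly as follows. This is a simplified, self-contained model and not cuCascade's actual API: converter_registry, mock_device_repr, mock_host_repr, size_in_bytes, and the exact shape of representation_converter_fn are assumptions made for illustration. The point is only that a type-erased registry keyed on the dynamic types of the representations lets a consumer such as Sirius register its own converters while the library itself never names a cudf type.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <typeindex>
#include <typeinfo>
#include <utility>
#include <vector>

// --- Library side (hypothetical stand-ins, no cudf anywhere) ---
struct idata_representation {
  virtual ~idata_representation() = default;
  virtual std::size_t size_in_bytes() const = 0;
};

// Type-erased converter: takes any representation, returns a new one.
using representation_converter_fn =
  std::function<std::unique_ptr<idata_representation>(idata_representation const&)>;

class converter_registry {
 public:
  template <typename Source, typename Target>
  void register_converter(representation_converter_fn fn)
  {
    converters_[key_type{std::type_index(typeid(Source)), std::type_index(typeid(Target))}] =
      std::move(fn);
  }

  // Looks up the converter by the *dynamic* type of src and the requested target.
  template <typename Target>
  std::unique_ptr<idata_representation> convert(idata_representation const& src) const
  {
    auto it =
      converters_.find(key_type{std::type_index(typeid(src)), std::type_index(typeid(Target))});
    if (it == converters_.end()) { throw std::runtime_error("no converter registered"); }
    return it->second(src);
  }

 private:
  using key_type = std::pair<std::type_index, std::type_index>;
  std::map<key_type, representation_converter_fn> converters_;
};

// --- Consumer side (where cudf-specific types would live; mocks here) ---
struct mock_device_repr : idata_representation {
  explicit mock_device_repr(std::vector<unsigned char> p) : payload(std::move(p)) {}
  std::size_t size_in_bytes() const override { return payload.size(); }
  std::vector<unsigned char> payload;
};

struct mock_host_repr : idata_representation {
  explicit mock_host_repr(std::vector<unsigned char> p) : payload(std::move(p)) {}
  std::size_t size_in_bytes() const override { return payload.size(); }
  std::vector<unsigned char> payload;
};

// Registers a device-to-host converter, converts a 4096-byte mock buffer,
// and returns the size of the converted representation.
inline std::size_t demo_round_trip()
{
  converter_registry registry;
  registry.register_converter<mock_device_repr, mock_host_repr>(
    [](idata_representation const& src) -> std::unique_ptr<idata_representation> {
      // Safe: the registry only dispatches here when src is a mock_device_repr.
      auto const& dev = static_cast<mock_device_repr const&>(src);
      return std::make_unique<mock_host_repr>(dev.payload);
    });
  mock_device_repr dev(std::vector<unsigned char>(4096, 0xAB));
  auto host = registry.convert<mock_host_repr>(dev);
  assert(dynamic_cast<mock_host_repr*>(host.get()) != nullptr);
  return host->size_in_bytes();
}
```

In this model, the mock types play the role that the cudf-backed representations would play once they live in Sirius, and the same mocks could back cuCascade's own tests.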

Acceptance Criteria

  • find_package(cudf) does not appear in cuCascade's CMakeLists.txt
  • cudf::cudf does not appear in any link target
  • No #include <cudf/...> in any cuCascade source or header file
  • pixi run cmake-release && pixi run build-release succeeds without cudf installed
  • cuCascade tests pass using only synthetic/mock data representations
  • Sirius builds and tests pass against the cudf-free cuCascade (using its own representation types)
  • cuCascade's disk I/O backends remain functional (tested with raw GPU buffer writes/reads)
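
The "no #include <cudf/...>" criterion could be checked mechanically. The following is a minimal sketch of such a check using C++17 std::filesystem; the function name find_cudf_includes and the set of file extensions are assumptions, and the scan is purely textual (it does not understand comments or conditional compilation).

```cpp
#include <cassert>
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <set>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Returns one "path:line" entry for every preprocessor include of a cudf
// header found in source/header files under root.
inline std::vector<std::string> find_cudf_includes(fs::path const& root)
{
  static const std::set<std::string> extensions{".h", ".hpp", ".cuh", ".cpp", ".cu"};
  std::vector<std::string> hits;
  for (auto const& entry : fs::recursive_directory_iterator(root)) {
    if (!entry.is_regular_file()) { continue; }
    if (extensions.count(entry.path().extension().string()) == 0) { continue; }

    std::ifstream in(entry.path());
    std::string line;
    std::size_t line_no = 0;
    while (std::getline(in, line)) {
      ++line_no;
      // Only consider lines whose first non-blank character is '#'.
      auto const first = line.find_first_not_of(" \t");
      if (first == std::string::npos || line[first] != '#') { continue; }
      if (line.find("include", first) == std::string::npos) { continue; }
      if (line.find("<cudf/") != std::string::npos ||
          line.find("\"cudf/") != std::string::npos) {
        hits.push_back(entry.path().string() + ":" + std::to_string(line_no));
      }
    }
  }
  return hits;
}
```

An empty result for the cuCascade source tree would satisfy that criterion; a grep in CI would do the same job with less code.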
