
[Tile] Mark alignment helpers as _CCCL_HOST_DEVICE_API#9487

Open
miscco wants to merge 3 commits into
NVIDIA:mainfrom
miscco:tile_disable_builtin_assume_aligned

Conversation

@miscco

@miscco miscco commented Jun 16, 2026

Contributor

Currently, tile does not support `__builtin_assume_aligned`.

Previously, we simply disabled the whole code path, but with the added support for `__builtin_is_constant_evaluated`, it became active again.
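Below is a minimal host-side sketch of the pattern described above. It assumes GCC or Clang, since both provide the two builtins. The name `assume_aligned_sketch` is hypothetical, and this is not the CCCL implementation. During constant evaluation the pointer comes back unchanged. At run time, execution takes the branch that calls `__builtin_assume_aligned`, and that branch is the one a compiler mode without the builtin cannot handle.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper for illustration only; not the CCCL implementation.
// Requires GCC >= 9 or Clang >= 9 for __builtin_is_constant_evaluated.
template <std::size_t Align, class T>
constexpr T* assume_aligned_sketch(T* ptr) noexcept
{
  static_assert(Align != 0 && (Align & (Align - 1)) == 0, "Align must be a power of two");
  if (__builtin_is_constant_evaluated())
  {
    // Constant evaluation: the optimizer hint is meaningless, return the pointer unchanged.
    return ptr;
  }
  // Run time: promise the optimizer that ptr is Align-byte aligned.
  // A compiler mode that does not implement this builtin breaks on this branch.
  return static_cast<T*>(__builtin_assume_aligned(ptr, Align));
}

// Exercises the constant-evaluated branch when used in a static_assert.
constexpr int read_at_compile_time()
{
  int value = 42;
  return *assume_aligned_sketch<alignof(int)>(&value);
}
static_assert(read_at_compile_time() == 42, "constant-evaluated branch returns the pointer as is");

// Exercises the run-time branch; the caller must pass 16-byte aligned storage.
inline float read_last(const float* data)
{
  assert(reinterpret_cast<std::uintptr_t>(data) % 16 == 0);
  const float* p = assume_aligned_sketch<16>(data);
  return p[3];
}
```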

@miscco miscco requested a review from a team as a code owner June 16, 2026 09:51
@miscco miscco requested a review from pciolkosz June 16, 2026 09:51
@github-project-automation github-project-automation Bot moved this to Todo in CCCL Jun 16, 2026
@cccl-authenticator-app cccl-authenticator-app Bot moved this from Todo to In Review in CCCL Jun 16, 2026
@coderabbitai

coderabbitai Bot commented Jun 16, 2026

Contributor


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 8aeb4488-52dc-43c9-bc8d-95f66a2c6d23

📥 Commits

Reviewing files that changed from the base of the PR and between 5dbf44a and 41d1720.

📒 Files selected for processing (7)
  • libcudacxx/test/libcudacxx/cuda/utilities/expected/device_only_types.pass.cpp
  • libcudacxx/test/libcudacxx/cuda/utilities/expected/expected.void/device_only_types.pass.cpp
  • libcudacxx/test/libcudacxx/cuda/utilities/optional/device_only_types.pass.cpp
  • libcudacxx/test/libcudacxx/cuda/utilities/tuple/device_only_types.pass.cpp
  • libcudacxx/test/libcudacxx/cuda/utilities/unexpected/device_only_types.pass.cpp
  • libcudacxx/test/libcudacxx/cuda/utilities/utility/pair/device_only_types.pass.cpp
  • libcudacxx/test/libcudacxx/cuda/utilities/variant/device_only_types.pass.cpp

Note: CodeRabbit is enabled on this repository as a convenience for maintainers
and contributors. Use your best judgment when considering its review comments and
suggestions — a suggested change may be inadequate, unnecessary, or safe to ignore.
Contributors are not expected to address every comment. Human reviews are what
ultimately matter for merging.

Overview

This PR updates alignment helper functions to use the _CCCL_HOST_DEVICE_API attribute instead of _CCCL_API. This change addresses a compatibility issue with Tile, which does not support the __builtin_assume_aligned compiler builtin. With the recent addition of support for __builtin_is_constant_evaluated, the code paths using __builtin_assume_aligned became active again in Tile mode, necessitating this adjustment.

Changes to Library Code

Alignment helper functions in libcudacxx/include/cuda/__memory/ and libcudacxx/include/cuda/std/__memory/:

  • align_down.h: Updated align_down(_Tp*, size_t) annotation from _CCCL_API to _CCCL_HOST_DEVICE_API
  • align_up.h: Updated align_up(_Tp*, size_t) annotation from _CCCL_API to _CCCL_HOST_DEVICE_API
  • ptr_rebind.h: Updated all four ptr_rebind template overloads (non-const, const, volatile, and const volatile pointer variants) from _CCCL_API to _CCCL_HOST_DEVICE_API
  • align.h: Updated align(size_t, size_t, void*&, size_t&) annotation from _CCCL_API to _CCCL_HOST_DEVICE_API
  • assume_aligned.h: Updated assume_aligned(_Tp*) annotation from _CCCL_API to _CCCL_HOST_DEVICE_API
  • runtime_assume_aligned.h: Updated __runtime_assume_aligned(_Tp*, size_t) annotation from _CCCL_API to _CCCL_HOST_DEVICE_API and added [[maybe_unused]] annotation to the __alignment parameter
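The pointer overloads of align_up and align_down round an address up or down to a multiple of a power-of-two alignment. Below is a hypothetical sketch of that arithmetic. The names `align_up_sketch` and `align_down_sketch` are illustrative and do not come from CCCL. According to the commit message, the marked helpers are the ones that use `__builtin_assume_aligned`, possibly indirectly. This sketch leaves the hint out and shows only the rounding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch of pointer alignment helpers; not the CCCL code.
inline bool is_power_of_two(std::size_t x) noexcept
{
  return x != 0 && (x & (x - 1)) == 0;
}

template <class T>
T* align_up_sketch(T* ptr, std::size_t alignment)
{
  assert(is_power_of_two(alignment));
  const auto addr = reinterpret_cast<std::uintptr_t>(ptr);
  const auto mask = static_cast<std::uintptr_t>(alignment - 1);
  // Round up to the next multiple of alignment (no-op if already aligned).
  return reinterpret_cast<T*>((addr + mask) & ~mask);
}

template <class T>
T* align_down_sketch(T* ptr, std::size_t alignment)
{
  assert(is_power_of_two(alignment));
  const auto addr = reinterpret_cast<std::uintptr_t>(ptr);
  const auto mask = static_cast<std::uintptr_t>(alignment - 1);
  // Clear the low bits to round down to the previous multiple of alignment.
  return reinterpret_cast<T*>(addr & ~mask);
}
```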

mdspan accessor:

  • aligned_accessor.h: Updated the access and offset member functions from _CCCL_API to _CCCL_HOST_DEVICE_API
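For context, the C++26 `std::aligned_accessor` defines `access(p, i)` as an element access through an assume-aligned pointer and `offset(p, i)` as plain `p + i`. Below is a simplified, hypothetical `aligned_accessor_sketch` modeled on that pair. It calls `__builtin_assume_aligned` directly so that it builds without C++20. The CCCL accessor may differ in detail.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified accessor modeled on the access/offset pair of
// std::aligned_accessor (C++26); not the CCCL implementation.
template <class ElementType, std::size_t ByteAlignment>
struct aligned_accessor_sketch
{
  static_assert((ByteAlignment & (ByteAlignment - 1)) == 0, "alignment must be a power of two");
  static_assert(ByteAlignment >= alignof(ElementType), "alignment too small for element type");

  using element_type     = ElementType;
  using reference        = ElementType&;
  using data_handle_type = ElementType*;
  static constexpr std::size_t byte_alignment = ByteAlignment;

  // In this sketch, access() is where the alignment promise reaches the optimizer.
  reference access(data_handle_type p, std::size_t i) const noexcept
  {
    auto* aligned = static_cast<data_handle_type>(__builtin_assume_aligned(p, byte_alignment));
    return aligned[i];
  }

  // p + i is in general no longer byte_alignment-aligned, so no promise is made here.
  data_handle_type offset(data_handle_type p, std::size_t i) const noexcept
  {
    return p + i;
  }
};
```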

Changes to Tests

Test annotation updates for Tile compatibility:

  • Updated tile failure diagnostics in align_down.pass.cpp, align_up.pass.cpp, and align.pass.cpp from "asm statement is unsupported in tile code" to nvbug6327166 (Internal Compiler Error related to unknown tile builtin function)
  • Added UNSUPPORTED: enable-tile directives with nvbug6327166 comments to ptr_rebind.pass.cpp, assume_aligned.pass.cpp, and assume_aligned.runfail.cpp

Test guard adjustments:

  • proclaim_return_type.pass.cpp: Updated test annotations to use TEST_TILE_FUNC and TEST_TILE_DEVICE_FUNC instead of TEST_DEVICE_FUNC for tiled compilation paths
  • copy_backward.pass.cpp, copy_n.pass.cpp, copy_rand.pass.cpp: Restricted device-only iterator tests to run only when TEST_CUDA_COMPILATION() && !_CCCL_TILE_COMPILATION() rather than just TEST_CUDA_COMPILATION()
  • aligned_accessor.pass.cpp: Added UNSUPPORTED: enable-tile directive with nvbug6327166 comment
  • extended_data_types.fp8.fail.cpp: Tightened guard to check both !_CCCL_HAS_NVFP8() AND !_CCCL_TILE_COMPILATION()
  • device_fp128_functions.pass.cpp: Added UNSUPPORTED: enable-tile directive
  • 16b_integral_ref.pass.cpp: Marked enable-tile as XFAIL with an expected diagnostic about asm statements being unsupported in tile code
  • ctor.pass.cpp and get.pass.cpp in format tests: Added UNSUPPORTED: enable-tile directives with expected errors about bit field read/write in tile code
  • device_only_types.pass.cpp files (in expected, optional, tuple, unexpected, pair, variant test directories): Removed _CCCL_DEVICE_COMPILATION() preprocessor guards around device test functions, allowing them to be compiled unconditionally
  • assume_aligned.runfail.cpp: Updated to include <cuda/std/memory> instead of the internal header
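The narrowed guard in the copy tests is a plain preprocessor conjunction. Below is a host-compilable sketch that uses hypothetical stand-ins, `SKETCH_CUDA_COMPILATION()` and `SKETCH_TILE_COMPILATION()`, plus an invented `SKETCH_TILE_BUILD` flag. The real `TEST_CUDA_COMPILATION()` and `_CCCL_TILE_COMPILATION()` are defined in the CCCL test and config headers and may be computed differently.

```cpp
#include <cassert>

// Hypothetical stand-in for TEST_CUDA_COMPILATION().
#if defined(__CUDACC__)
#  define SKETCH_CUDA_COMPILATION() 1
#else
#  define SKETCH_CUDA_COMPILATION() 0
#endif

// Hypothetical stand-in for _CCCL_TILE_COMPILATION(); SKETCH_TILE_BUILD is invented.
#if defined(SKETCH_TILE_BUILD)
#  define SKETCH_TILE_COMPILATION() 1
#else
#  define SKETCH_TILE_COMPILATION() 0
#endif

// Returns how many test groups were compiled into this translation unit.
inline int compiled_test_groups()
{
  int groups = 1; // tests that run everywhere
#if SKETCH_CUDA_COMPILATION() && !SKETCH_TILE_COMPILATION()
  // Device-only iterator tests: compiled for CUDA builds, excluded from tile builds.
  groups += 1;
#endif
  return groups;
}
```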

Impact

The PR keeps the signatures and logic of all affected functions unchanged and only adjusts their execution-space annotations. As the commit message explains, the PR does not disable the __builtin_assume_aligned code path for all of CCCL in tile mode. Instead, only the functions that use the builtin are marked host+device, so they stay callable from host and device code but are no longer offered to tile code. The affected tile tests are marked UNSUPPORTED or XFAIL accordingly.

Walkthrough

Functions in seven memory and alignment headers transition from _CCCL_API to _CCCL_HOST_DEVICE_API for host+device compilation. The corresponding tests update their tile-codegen failure expectations to reference nvbug6327166 and adjust include paths. Functional and algorithm tests refine tile-compilation handling through annotations and narrowed preprocessor conditions. Tests elsewhere in the suite gain tile-compilation UNSUPPORTED/XFAIL directives. Device-only utility tests are widened to compile for both tile and device targets by broadening their preprocessor guards.

Changes

Host+Device API Annotation and Tile Compilation Integration

Layer / File(s) / Summary

  • _CCCL_API → _CCCL_HOST_DEVICE_API on memory/alignment functions
    Files: libcudacxx/include/cuda/__memory/align_down.h, align_up.h, ptr_rebind.h, libcudacxx/include/cuda/std/__mdspan/aligned_accessor.h, libcudacxx/include/cuda/std/__memory/align.h, assume_aligned.h, runtime_assume_aligned.h
    Summary: Functions and methods in seven headers re-annotated for host+device compilation; __runtime_assume_aligned additionally marks __alignment as [[maybe_unused]].
  • Memory/alignment test tile-codegen failure markers
    Files: libcudacxx/test/libcudacxx/cuda/memory/align_down.pass.cpp, align_up.pass.cpp, ptr_rebind.pass.cpp, libcudacxx/test/.../mdspan/aligned_accessor.pass.cpp, libcudacxx/test/.../ptr.align/align.pass.cpp, assume_aligned.pass.cpp, assume_aligned.runfail.cpp
    Summary: Prior asm-unsupported tile expectations replaced with nvbug6327166 ICE markers; assume_aligned.runfail.cpp switches to the public <cuda/std/memory> header.
  • Functional test tile compilation annotations
    Files: libcudacxx/test/libcudacxx/cuda/functional/proclaim_return_type.pass.cpp
    Summary: The lambda in the tiled path changes from _CCCL_TILE() to TEST_TILE_FUNC; the d_callable operator() overloads switch to TEST_TILE_DEVICE_FUNC.
  • Algorithm test device-only iterator tile exclusion
    Files: libcudacxx/test/libcudacxx/std/algorithms/alg.modifying/alg.copy/copy_backward.pass.cpp, copy_n.pass.cpp, copy_rand.pass.cpp
    Summary: Device-only iterator test blocks narrowed from TEST_CUDA_COMPILATION() to TEST_CUDA_COMPILATION() && !_CCCL_TILE_COMPILATION() to exclude tile builds.
  • Tile UNSUPPORTED/XFAIL directives across the broader test suite
    Files: libcudacxx/test/libcudacxx/libcxx/macros/extended_data_types.fp8.fail.cpp, libcudacxx/test/libcudacxx/libcxx/numerics/floating.point/device_fp128_functions.pass.cpp, libcudacxx/test/libcudacxx/std/atomics/atomics.types.generic/integral/16b_integral_ref.pass.cpp, libcudacxx/test/libcudacxx/std/text/format/format.fmt.string/ctor.pass.cpp, get.pass.cpp
    Summary: The fp8 test tightens its condition to !_CCCL_HAS_NVFP8() && !_CCCL_TILE_COMPILATION(); the fp128, atomic, and format tests gain enable-tile UNSUPPORTED/XFAIL directives with expected compiler errors.
  • Device-only utility tests compiled for tile+device
    Files: libcudacxx/test/libcudacxx/cuda/utilities/expected/device_only_types.pass.cpp, expected/expected.void/device_only_types.pass.cpp, optional/device_only_types.pass.cpp, tuple/device_only_types.pass.cpp, unexpected/device_only_types.pass.cpp, utility/pair/device_only_types.pass.cpp, variant/device_only_types.pass.cpp
    Summary: Widen _CCCL_DEVICE_COMPILATION() guards to also cover _CCCL_TILE_COMPILATION(), so the tests compile for both tile and device targets.
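The [[maybe_unused]] on __alignment suggests a configuration in which the parameter is never read. Below is a hypothetical sketch of how that can happen. `SKETCH_NO_ASSUME_ALIGNED` stands in for a build mode that lacks the builtin. Clang requires the builtin's alignment argument to be a constant, so the sketch dispatches the run-time value over a few common alignments. This is an assumption about the shape of such code, not the actual `__runtime_assume_aligned`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch; not the CCCL __runtime_assume_aligned.
// Define SKETCH_NO_ASSUME_ALIGNED to mimic a build mode without the builtin.
template <class T>
T* runtime_assume_aligned_sketch(T* ptr, [[maybe_unused]] std::size_t alignment) noexcept
{
#if !defined(SKETCH_NO_ASSUME_ALIGNED)
  assert((alignment & (alignment - 1)) == 0 && "alignment must be a power of two");
  // The builtin needs a constant alignment, so dispatch on common values.
  switch (alignment)
  {
    case 2: return static_cast<T*>(__builtin_assume_aligned(ptr, 2));
    case 4: return static_cast<T*>(__builtin_assume_aligned(ptr, 4));
    case 8: return static_cast<T*>(__builtin_assume_aligned(ptr, 8));
    case 16: return static_cast<T*>(__builtin_assume_aligned(ptr, 16));
    default: return ptr; // other values: no hint
  }
#else
  // Without the builtin, 'alignment' is never read; [[maybe_unused]]
  // keeps unused-parameter warnings quiet in this configuration.
  return ptr;
#endif
}
```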

Suggested reviewers

  • davebayer
  • Jacobfaib


@coderabbitai coderabbitai Bot left a comment

Contributor


🧹 Nitpick comments (1)
libcudacxx/include/cuda/__memory/ptr_rebind.h (1)

34-34: ⚡ Quick win

suggestion: align the cv-qualified ptr_rebind overload annotations with Line 34 so the full overload set has a single callable-context contract and tile diagnostics happen at the same API boundary. As per coding guidelines, “Mark every function with the correct CCCL execution/availability macro … keep annotations consistent with the intended callable contexts.”

Source: Coding guidelines
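To illustrate the nitpick, here is a hypothetical ptr_rebind-like overload set in which all four cv-qualified overloads carry the same annotation macro. `SKETCH_API` and `ptr_rebind_sketch` are invented for this example. The assumed semantics (reinterpret as U*, keep the source cv-qualifiers, require alignof(U) alignment) are not taken from the CCCL source.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>
#include <utility>

// Hypothetical annotation macro; here it is plain `inline`.
// What matters is that every overload below uses the same one.
#define SKETCH_API inline

// Shared precondition: p must be suitably aligned for U.
template <class U>
SKETCH_API bool sketch_is_aligned_for(const volatile void* p) noexcept
{
  return reinterpret_cast<std::uintptr_t>(p) % alignof(U) == 0;
}

template <class U, class T>
SKETCH_API U* ptr_rebind_sketch(T* p) noexcept
{
  assert(sketch_is_aligned_for<U>(p));
  return reinterpret_cast<U*>(p);
}

template <class U, class T>
SKETCH_API const U* ptr_rebind_sketch(const T* p) noexcept
{
  assert(sketch_is_aligned_for<U>(p));
  return reinterpret_cast<const U*>(p);
}

template <class U, class T>
SKETCH_API volatile U* ptr_rebind_sketch(volatile T* p) noexcept
{
  assert(sketch_is_aligned_for<U>(p));
  return reinterpret_cast<volatile U*>(p);
}

template <class U, class T>
SKETCH_API const volatile U* ptr_rebind_sketch(const volatile T* p) noexcept
{
  assert(sketch_is_aligned_for<U>(p));
  return reinterpret_cast<const volatile U*>(p);
}

// Overload resolution picks the matching cv-qualified overload for each argument.
static_assert(std::is_same<decltype(ptr_rebind_sketch<int>(std::declval<char*>())), int*>::value, "");
static_assert(std::is_same<decltype(ptr_rebind_sketch<int>(std::declval<const char*>())), const int*>::value, "");
static_assert(std::is_same<decltype(ptr_rebind_sketch<int>(std::declval<volatile char*>())), volatile int*>::value, "");
static_assert(std::is_same<decltype(ptr_rebind_sketch<int>(std::declval<const volatile char*>())), const volatile int*>::value, "");
```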


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: ec8ac322-61c3-4c98-b881-96491cb0627a

📥 Commits

Reviewing files that changed from the base of the PR and between 910bea4 and 53ef7e0.

📒 Files selected for processing (23)
  • libcudacxx/include/cuda/__memory/align_down.h
  • libcudacxx/include/cuda/__memory/align_up.h
  • libcudacxx/include/cuda/__memory/ptr_rebind.h
  • libcudacxx/include/cuda/std/__mdspan/aligned_accessor.h
  • libcudacxx/include/cuda/std/__memory/align.h
  • libcudacxx/include/cuda/std/__memory/assume_aligned.h
  • libcudacxx/include/cuda/std/__memory/runtime_assume_aligned.h
  • libcudacxx/test/libcudacxx/cuda/functional/proclaim_return_type.pass.cpp
  • libcudacxx/test/libcudacxx/cuda/memory/align_down.pass.cpp
  • libcudacxx/test/libcudacxx/cuda/memory/align_up.pass.cpp
  • libcudacxx/test/libcudacxx/cuda/memory/ptr_rebind.pass.cpp
  • libcudacxx/test/libcudacxx/libcxx/macros/extended_data_types.fp8.fail.cpp
  • libcudacxx/test/libcudacxx/libcxx/numerics/floating.point/device_fp128_functions.pass.cpp
  • libcudacxx/test/libcudacxx/std/algorithms/alg.modifying/alg.copy/copy_backward.pass.cpp
  • libcudacxx/test/libcudacxx/std/algorithms/alg.modifying/alg.copy/copy_n.pass.cpp
  • libcudacxx/test/libcudacxx/std/algorithms/alg.modifying/alg.copy/copy_rand.pass.cpp
  • libcudacxx/test/libcudacxx/std/atomics/atomics.types.generic/integral/16b_integral_ref.pass.cpp
  • libcudacxx/test/libcudacxx/std/containers/views/mdspan/mdspan.aligned_accessor/aligned_accessor.pass.cpp
  • libcudacxx/test/libcudacxx/std/text/format/format.fmt.string/ctor.pass.cpp
  • libcudacxx/test/libcudacxx/std/text/format/format.fmt.string/get.pass.cpp
  • libcudacxx/test/libcudacxx/std/utilities/memory/ptr.align/align.pass.cpp
  • libcudacxx/test/libcudacxx/std/utilities/memory/ptr.align/assume_aligned.pass.cpp
  • libcudacxx/test/libcudacxx/std/utilities/memory/ptr.align/assume_aligned.runfail.cpp

miscco added 2 commits June 16, 2026 12:04
Currently tile does not support `__builtin_assume_aligned`

Previously we would just disable the whole codepath but with added support for `__builtin_is_constant_evaluated` it became active again.

Rather than disabling it for all of CCCL with tile mode, we mark those functions that use the builtin as `_CCCL_HOST_DEVICE`
@miscco miscco force-pushed the tile_disable_builtin_assume_aligned branch from 9508476 to 5dbf44a

@miscco miscco force-pushed the tile_disable_builtin_assume_aligned branch from 5dbf44a to 41d1720
@github-actions

Contributor

🥳 CI Workflow Results

🟩 Finished in 1h 51m: Pass: 100%/118 | Total: 1d 08h | Max: 51m 06s | Hits: 100%/338432



Labels

None yet

Projects

Status: In Review


3 participants