Wilcoxon refactor by Intron7 · Pull Request #636 · scverse/rapids-singlecell

Intron7 · 2026-04-24T15:09:47Z

Implement a better GPU accelerated version of wilcoxon in rapids-singlecell inspired by the super fast illico verison.

codecov-commenter · 2026-04-24T15:22:30Z

Codecov Report

❌ Patch coverage is 83.25723% with 110 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.28%. Comparing base (0eaeaaf) to head (00eccb2).

Files with missing lines	Patch %	Lines
...s_singlecell/tools/_rank_genes_groups/_wilcoxon.py	79.95%	88 Missing ⚠️
...apids_singlecell/tools/_rank_genes_groups/_core.py	87.59%	17 Missing ⚠️
...pids_singlecell/tools/_rank_genes_groups/_utils.py	86.84%	5 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #636      +/-   ##
==========================================
- Coverage   90.29%   89.28%   -1.01%     
==========================================
  Files         104      104              
  Lines        8964     9464     +500     
==========================================
+ Hits         8094     8450     +356     
- Misses        870     1014     +144

Files with missing lines	Coverage Δ
src/rapids_singlecell/_cuda/__init__.py	`80.00% <ø> (ø)`
...ds_singlecell/tools/_rank_genes_groups/__init__.py	`87.95% <100.00%> (+5.49%)`	⬆️
...ecell/tools/_rank_genes_groups/_wilcoxon_binned.py	`90.56% <100.00%> (-0.05%)`	⬇️
...pids_singlecell/tools/_rank_genes_groups/_utils.py	`83.33% <86.84%> (-2.59%)`	⬇️
...apids_singlecell/tools/_rank_genes_groups/_core.py	`82.00% <87.59%> (-9.73%)`	⬇️
...s_singlecell/tools/_rank_genes_groups/_wilcoxon.py	`81.39% <79.95%> (-16.23%)`	⬇️

... and 1 file with indirect coverage changes

Intron7 · 2026-06-08T21:37:01Z

@coderabbitci review

coderabbitai · 2026-06-08T21:37:14Z

@Intron7: I'll review the changes in this PR now.

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai · 2026-06-08T21:54:39Z

📝 Walkthrough

Summary by CodeRabbit

New Features
- GPU-accelerated sparse Wilcoxon rank computation for both one-vs-rest and one-vs-one comparisons
- Added skip_empty_groups parameter to filter groups with insufficient samples
- Added return_u_values option to return Mann–Whitney U statistics
- Enhanced GPU densification pipeline for sparse matrix processing
Bug Fixes
- Sparse matrices with negative values now fall back to dense ranking instead of raising errors
Infrastructure
- Updated CUDA support from 12.8 to 12.9.1
- Enhanced GPU library discovery and build integration

Walkthrough

This PR replaces Wilcoxon rank-sum GPU computation with new OVR/OVO tiered kernel implementations using CUB segmented radix sort, adds a _rank_stats_cuda module for group statistics, upgrades infrastructure to CUDA 12.9.1, refactors Python rank_genes_groups to support new sparse/dense streaming paths and optional U-value output, and expands test coverage for non-negative data and empty-group filtering.

Changes

Wilcoxon GPU Refactor and Rank Statistics

Layer / File(s)	Summary
Infrastructure and Dependency Updates `.github/workflows/*`, `conda/rsc_rapids_26.04_cuda12.yml`, `docker/Dockerfile`, `CMakeLists.txt`, `pyproject.toml`, `.gitignore`	CUDA toolkit bumped to 12.9.1, Docker and CI configuration updated to link librmm and rapids_logger, RMM discovery added to CMakeLists.txt, wheel repair commands extended, and `/benchmarks/` added to .gitignore.
CUDA Support Types and RMM Allocation `src/rapids_singlecell/_cuda/__init__.py`, `src/rapids_singlecell/_cuda/nb_types.h`, `src/rapids_singlecell/_cuda/wilcoxon/wilcoxon_rmm.cu`	Host-side NumPy array type alias, lazy module import for `_rank_stats_cuda`, and RMM memory management using `mr::get_current_device_resource_ref()` with error handling.
New Rank Statistics CUDA Module `src/rapids_singlecell/_cuda/rank_genes/csr_tile_to_dense.cuh`, `src/rapids_singlecell/_cuda/rank_genes/rank_stats.cu`	New `_rank_stats_cuda` module exposing FDR-BH reverse cumulative-minimum, per-group chunk statistics aggregation, and CSR-to-dense tile densification with Python bindings and CUDA stream support.
Wilcoxon CUDA Infrastructure `src/rapids_singlecell/_cuda/wilcoxon/wilcoxon_fast_common.cuh`	Shared utilities including CUB scratch-size helpers, device shared-memory caching, RAII wrappers for pinned memory and RMM scratch pools, CSR gather/cast/accumulate kernels, and segment offset utilities.
OVR (One-vs-Rest) Kernel Suite `src/rapids_singlecell/_cuda/wilcoxon/kernels_wilcoxon.cuh`, `src/rapids_singlecell/_cuda/wilcoxon/wilcoxon_ovr_kernels.cuh`, `src/rapids_singlecell/_cuda/wilcoxon/wilcoxon_sparse_kernels.cuh`, `src/rapids_singlecell/_cuda/wilcoxon/wilcoxon_ovr_sparse.cuh`	Dense OVR rank-sums with optional tie-correction, sparse OVR kernels with shared/global-memory accumulators, CSR histogram and scatter kernels, and streaming pipelines for sparse CSC/CSR with host/device input variants.
OVO (One-vs-One) Kernel Suite `src/rapids_singlecell/_cuda/wilcoxon/kernels_wilcoxon_ovo.cuh`, `src/rapids_singlecell/_cuda/wilcoxon/wilcoxon_ovo_kernels.cuh`, `src/rapids_singlecell/_cuda/wilcoxon/wilcoxon_ovo_device_sparse.cuh`, `src/rapids_singlecell/_cuda/wilcoxon/wilcoxon_ovo_host_sparse.cuh`	Five OVO tier kernels (WARP/SMALL/MEDIUM/LARGE/HUGE) sized by group cardinality, tier-dispatch orchestration with CUB segmented sort, sparse device/host pipelines for CSC/CSR, and reference-only tie-sum precomputation.
GPU Launchers and Bindings `src/rapids_singlecell/_cuda/wilcoxon/wilcoxon.cu`, `src/rapids_singlecell/_cuda/wilcoxon/wilcoxon_sparse.cu`	High-level launchers for OVR dense streaming and OVO tiered ranking orchestrating multi-stream execution and scratch allocation; module bindings expose Python-facing entry points for sparse/dense host/device paths with optional tie-correction and statistics.
Python rank_genes_groups API `src/rapids_singlecell/tools/_rank_genes_groups/__init__.py`, `src/rapids_singlecell/tools/_rank_genes_groups/_core.py`, `src/rapids_singlecell/tools/_rank_genes_groups/_utils.py`	Added `skip_empty_groups` and `return_u_values` options, GPU-side group statistics via `_rank_stats_cuda.group_chunk_stats`, new ranking/FDR-BH helpers, array-to-records converters, sparse-negative fallback detection, and `_csr_tile_to_dense_block` for fused GPU densification.
Wilcoxon Python Workflows `src/rapids_singlecell/tools/_rank_genes_groups/_wilcoxon.py`, `src/rapids_singlecell/tools/_rank_genes_groups/_wilcoxon_binned.py`	Wilcoxon computational paths now select OVR/OVO kernels for sparse/dense/host/device inputs with optional U-value scores, group accumulator helpers, and automatic fallback to dense when sparse contains negatives; binned variant passes `force_dense` flag for negative-sparse fallback.
Test Coverage `tests/test_rank_genes_groups_ttest.py`, `tests/test_rank_genes_groups_wilcoxon.py`, `tests/test_rank_genes_groups_wilcoxon_binned.py`	Test helpers enforce non-negative expression data, new `skip_empty_groups` behavior validation, Wilcoxon tie-handling parity against Scanpy, sparse-negative fallback regression guards, and expanded correctness/format-coverage testing across dense/sparse OVO/OVR modes.

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly Related PRs

scverse/rapids-singlecell#651: Modifies _rank_genes_groups/_core.py to adjust group/reference subset handling and validation in the same computational pipeline.
scverse/rapids-singlecell#570: Modifies _core.py and _utils.py to refactor group selection and stats coding that this PR builds upon.
scverse/rapids-singlecell#619: Adds CUDA_CHECK_LAST_ERROR macro used by the new rank_stats.cu kernel error checking.

Suggested Reviewers

Zethson

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch wilcoxon-refactor

coderabbitai

Actionable comments posted: 20

🧹 Nitpick comments (1)

.gitignore (1)

57-57: ⚡ Quick win

Remove duplicate /benchmarks/ entry.

This pattern already appears on line 7.

🧹 Proposed fix

-/benchmarks/

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.gitignore at line 57, Duplicate .gitignore entry: remove the redundant
"/benchmarks/" line so the pattern appears only once; locate the duplicate entry
matching "/benchmarks/" and delete one occurrence (keep the first instance) to
avoid repeated patterns in .gitignore.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pyproject.toml`:
- Line 176: Update the auditwheel repair command string (the line containing
"auditwheel repair --exclude libcublas.so.12 --exclude libcublas.so.13 ...") to
use wildcard SONAME exclusions and include the same libraries excluded in
publish.yml: replace hardcoded suffixes (e.g., libcublas.so.12/.13) with
wildcard patterns like libcublas.so.*, libcublasLt.so.*, libcudart.so.*, and add
libcusolver.so.*, libcusparse.so.*, libnvJitLink.so* (and keep librmm.so and
librapids_logger.so) so the CIBW_REPAIR_WHEEL_COMMAND behavior in pyproject.toml
matches the publish.yml exclusions and avoids bundling large CUDA-dependent
libraries.

In `@src/rapids_singlecell/_cuda/rank_genes/csr_tile_to_dense.cuh`:
- Around line 28-32: The kernel narrows templated IndexT by doing const int col
= static_cast<int>(indices[k]) which can truncate large indices; change the code
to keep 64-bit math by declaring col using the template type (e.g. IndexT or
int64_t), perform comparisons against col_lb/col_ub without narrowing, compute
the column offset as a 64-bit value (col - col_lb) and only cast to the integer
type required for indexing/atomicAdd at the final array access; update the
occurrence of static_cast<int>(indices[k]) and any uses of (col - col_lb) *
n_cells to use 64-bit arithmetic so writes into out and the atomicAdd call are
correct for large indices.

In `@src/rapids_singlecell/_cuda/rank_genes/rank_stats.cu`:
- Around line 73-85: Before launching csr_tile_to_dense_kernel, validate CSR and
output buffer contracts: ensure indptr.shape(0) >= 2 and equals n_cells+1,
indptr is monotonic non-decreasing and its last value equals the expected nnz
(so indices.size() and data.size() >= nnz), out.dim(0) == n_cells and out.dim(1)
>= (col_ub - col_lb) (or exactly equals the tile width used), and that
pointers/data buffers (indptr.data(), indices.data(), data.data(), out.data())
are non-null; also validate col_lb and col_ub are within [0, num_columns] (use
available column count) and col_lb < col_ub before computing grid and launching
csr_tile_to_dense_kernel; if any check fails, return or propagate an error
instead of launching the kernel (leave
CUDA_CHECK_LAST_ERROR(csr_tile_to_dense_kernel) after the launch).
- Around line 125-138: The kernel launch trusts caller-provided shapes and can
OOB; before computing grid and launching group_chunk_stats_kernel, validate that
block is 2D and n_rows*n_cols>0 (already checked), that group_codes.length ==
n_rows, and that group_sums.size(), group_sum_sq.size(), and group_nnz.size()
all equal n_groups (the value derived from group_sums.shape(0)), and that none
of the device pointers/arrays (block.data(), group_codes.data(),
group_sums.data(), group_sum_sq.data(), group_nnz.data()) are null; if any check
fails, return or raise an error instead of launching the kernel. Ensure these
checks are placed immediately before the call site that computes grid =
strided_grid(total, GROUP_STATS_BLOCK) and before invoking
group_chunk_stats_kernel (passing compute_nnz unchanged).

In `@src/rapids_singlecell/_cuda/wilcoxon/wilcoxon_ovo_device_sparse.cuh`:
- Around line 158-161: The segmented radix sort calls to
cub::DeviceSegmentedRadixSort::SortKeys currently drop the returned cudaError_t;
capture the return value (e.g. cudaError_t err =
cub::DeviceSegmentedRadixSort::SortKeys(...)) for both occurrences (the call
using ref_cub_temp_buf/ref_temp/d_ref_dense/d_ref_sorted/d_ref_seg_offsets and
the second call around lines 357-360) and check err != cudaSuccess; on failure
log or propagate the error and abort/return early from the enclosing function so
downstream ranking code never consumes invalid d_ref_sorted data. Ensure the
error check uses the same stream/context as the call (ref_stream) and returns or
forwards the cudaError_t so callers can handle the failure.
- Line 17: The early-return combines the all-empty test-group case (n_all_grp ==
0) with other short-circuits, leaving rank_sums and tie_corr uninitialized;
instead, detect n_all_grp == 0 separately and write the expected defaults
(rank_sums = 0.0 and tie_corr = 1.0 for each relevant output entry) before
returning so downstream OVO kernels see the valid values. Update the conditional
in wilcoxon_ovo_device_sparse.cuh to only early-return for n_cols == 0 || n_ref
== 0, and add a branch that iterates the output slots and sets rank_sums and
tie_corr to the correct default when n_all_grp == 0; apply the same change at
the other occurrence noted (around the duplicated early-return).

In `@src/rapids_singlecell/_cuda/wilcoxon/wilcoxon_ovo_host_sparse.cuh`:
- Around line 226-229: The call to cub::DeviceSegmentedRadixSort::SortKeys is
currently invoked without checking its return status, so failures can leave
buf.ref_sorted invalid; update both occurrences (the SortKeys calls around the
ref_sorted usage) to capture the returned cudaError_t (or cub return code),
check it for success, and on failure propagate/return the error or log and abort
the kernel path before any ranking kernels run (and ensure the stream is
synchronized or the error forwarded to the caller) so downstream code never
operates on invalid ref_sorted.
- Line 19: The early return when (n_cols == 0 || n_ref == 0 || n_all_grp == 0)
leaves outputs uninitialized for the "all-empty test-group" case; before
returning when n_all_grp == 0 explicitly zero/initialize the output buffers used
by both CSC and CSR paths (e.g. d_rank_sums, d_tie_corr) and set the group-stat
output tensors to their deterministic values (zeros for sums/corrections and
appropriate zeros/ones for stats/p-values) so downstream consumers get defined
results; apply the same initialization at the other identical early-return site
referenced (around the other occurrence) so both exits behave consistently.
- Around line 209-214: The OVO host ranking path is incorrectly downcasting
double inputs to float (see launch_ovr_cast_and_accumulate_sparse using
d_sparse_data_f32 and the use of d_ref_data_f32/d_grp_data_f32 at the other
locations), which can collapse distinct doubles into ties; fix by making the OVO
ranking pipeline templated on the input value type (add and wire through double
variants like d_sparse_data_f64, d_ref_data_f64, d_grp_data_f64 and templated
kernels such as launch_ovr_cast_and_accumulate_sparse<InT=double, ...>) so no
double→float cast occurs for _f64 entrypoints, and add the numerical-stability
guards called out in the guidelines (epsilon checks before divisions, safe
accumulation/reduction for double, and NaN/Inf handling) in the corresponding
kernels referenced around lines 209, 535-541, and 706-713.
- Around line 141-154: The per-stream scratch buffers bufs[s].d_group_sums,
bufs[s].d_group_sq_sums, and bufs[s].d_group_nnz must be zeroed before each CSC
sub-batch to avoid leaking previous-batch stats into
launch_ovr_cast_and_accumulate_sparse(...); add a zeroing step just before
calling launch_ovr_cast_and_accumulate_sparse that issues asynchronous memsets
(e.g. cudaMemsetAsync) on each of those device pointers using the stream for
that stream, using the correct byte size (n_groups_stats * sub_batch_cols *
sizeof(double) for the first two and the same/count fallback used at allocation
for the third), and apply the same fix at the other allocation site referenced
(lines ~209-214) so all per-stream reused scratch buffers are cleared before
use.

In `@src/rapids_singlecell/_cuda/wilcoxon/wilcoxon_ovo_kernels.cuh`:
- Around line 268-279: The CUB segmented sort return value is ignored causing
the HUGE kernel (ovo_rank_huge_kernel) to run on potentially invalid
sc.grp_sorted; capture the return from cub::DeviceSegmentedRadixSort::SortKeys
into a cudaError_t (e.g., status) and check it before launching
ovo_rank_huge_kernel — on error use the existing error handling pattern
(CUDA_CHECK or equivalent logging/return) to abort/propagate the failure instead
of proceeding to launch the kernel that reads sc.grp_sorted.

In `@src/rapids_singlecell/_cuda/wilcoxon/wilcoxon_ovr_kernels.cuh`:
- Around line 34-49: The kernel csr_scatter_to_csc_kernel wrongly assumes CSR
row indices are sorted; replace the binary-search + early-break logic with a
full linear scan of the CSR row range [rs,re) using indices[p] to test
membership in [col_start,col_stop) so unsorted rows are handled correctly:
iterate p from rs to re, for each int c = (int)indices[p] if (c >= col_start &&
c < col_stop) use atomicAdd(&write_pos[c - col_start],1) and write data[p] and
row into csc_vals/csc_row_idx. Keep references to indices, indptr bounds rs/re,
col_start/col_stop and write_pos/csc_vals/csc_row_idx as the places to modify.

In `@src/rapids_singlecell/_cuda/wilcoxon/wilcoxon_sparse.cu`:
- Around line 22-41: The sparse binding macro RSC_OVR_SPARSE_DEVICE_BINDING
currently forwards the int parameter sub_batch_cols verbatim into IMPL, which
can be zero or negative and cause invalid launches/allocations; update the
lambda body so it validates and normalizes sub_batch_cols before calling IMPL
(e.g., if sub_batch_cols <= 0 then set it to SUB_BATCH_COLS or clamp to a
minimum of 1), then pass the sanitized value to IMPL; apply the same validation
change for every sparse binding instance referenced by this file so all calls
into the IMPL functions receive a safe, validated sub_batch_cols.
- Around line 196-200: The _f64 binding variants (RSC_OVO_CSC_HOST_BINDING
entries "ovo_streaming_csc_host_f64", "ovo_streaming_csc_host_f64_i64" and the
CSR equivalents) advertise double precision but the implementations
ovo_streaming_csc_host_impl and ovo_streaming_csr_host_impl currently downcast
InT=double to float32 for sorting/tie handling, which can change ordering and
ties; either remove these _f64 bindings or implement a true double-precision
path in those functions that preserves double inputs (avoid casting
double→float), add small epsilons before any divisions, handle accumulation
stability in reductions, and explicitly treat NaN/Inf inputs during ranking so
the exported _f64 entrypoints genuinely preserve double precision.

In `@src/rapids_singlecell/_cuda/wilcoxon/wilcoxon.cu`:
- Around line 91-94: The calls to cub::DeviceSegmentedRadixSort::SortPairs in
the dense launchers are currently ignoring the cudaError_t return value (e.g.,
the call passing buf.cub_temp, temp, keys_in, buf.keys_out, buf.vals_in,
buf.vals_out, sb_items, sb_cols, buf.seg_offsets, buf.seg_offsets + 1,
BEGIN_BIT, END_BIT, stream), which can hide failures and corrupt downstream rank
computations; update both occurrences (around the shown call and the one at
lines ~263-266) to capture the return value, check it against cudaSuccess, and
on error propagate/return an appropriate cudaError_t (or call CUDA
error-handling macro used in this module) so the caller sees the failure instead
of receiving corrupted rank_sums/tie_corr.
- Around line 138-139: The early return on n_all_grp == 0 leaves rank_sums and
tie_corr uninitialized; instead, before returning set the OVO outputs to the
defined empty-group contract: fill rank_sums with 0.0 and tie_corr with 1.0 for
all expected slots (use n_cols and n_groups / n_all_grp semantics used in this
file). Locate the checks around n_all_grp, n_groups, n_cols in wilcoxon.cu and,
when n_all_grp == 0, perform a host/device fill (or launch a small kernel) that
writes 0.0 into the rank_sums buffer for every column×group slot and 1.0 into
the tie_corr buffer, then return. Ensure you respect the same memory
layout/strides used elsewhere in this file so downstream code sees the
initialized values.

In `@src/rapids_singlecell/tools/_rank_genes_groups/_core.py`:
- Around line 430-440: Call _reject_complex(self.X) before branching on method
to ensure complex inputs are rejected for all statistic methods (not just
Wilcoxon); leave the existing Wilcoxon-only negative-sparse fallback logic
(_sparse_negative_fallback = _sparse_has_negative(self.X)) in place inside the
if block. In short: move or add a standalone _reject_complex(self.X) invocation
just prior to the method check so _compute_statistics_arrays and any
astype(self._score_dtype) conversions never silently drop imaginary parts, while
keeping the method-specific setting of self._sparse_negative_fallback for
"wilcoxon" and "wilcoxon_binned".
- Around line 520-526: The top-N selector _rank_indices_matrix mishandles
n_top==0 because np.argpartition(..., -0) returns all columns; add an explicit
check at the start of _rank_indices_matrix for n_top == 0 and return an empty 2D
indices array with shape (scores.shape[0], 0) (dtype int) so
rank_genes_groups(..., n_genes=0) yields no genes; otherwise keep the existing
logic and fall back to _RankGenes._argsort_desc_matrix when n_top >=
scores.shape[1].

In `@src/rapids_singlecell/tools/_rank_genes_groups/_wilcoxon.py`:
- Around line 367-388: The helper _device_sparse_arrays_f32 currently accepts
cp.float64 inputs but always downcasts X.data via X.data.astype(cp.float32,
copy=False), which silently loses precision; instead, detect cp.float64 in
_device_sparse_arrays_f32 and do not downcast—either raise a clear TypeError
stating device-sparse float64 is unsupported (and recommend converting to
float32 or using a host/dense float64 path) or explicitly dispatch to a float64
device kernel/fallback routine; update the code around the data_dtype check to
branch for np.float64 and either raise the error or call the corresponding
float64 handler rather than calling X.data.astype(cp.float32,...).

---

Nitpick comments:
In @.gitignore:
- Line 57: Duplicate .gitignore entry: remove the redundant "/benchmarks/" line
so the pattern appears only once; locate the duplicate entry matching
"/benchmarks/" and delete one occurrence (keep the first instance) to avoid
repeated patterns in .gitignore.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 598c312f-2884-4f0e-8a38-b33679306584

📥 Commits

Reviewing files that changed from the base of the PR and between 0eaeaaf and 490bec2.

📒 Files selected for processing (31)

.github/workflows/docker.yml
.github/workflows/publish.yml
.gitignore
CMakeLists.txt
conda/rsc_rapids_26.04_cuda12.yml
docker/Dockerfile
pyproject.toml
src/rapids_singlecell/_cuda/__init__.py
src/rapids_singlecell/_cuda/nb_types.h
src/rapids_singlecell/_cuda/rank_genes/csr_tile_to_dense.cuh
src/rapids_singlecell/_cuda/rank_genes/rank_stats.cu
src/rapids_singlecell/_cuda/wilcoxon/kernels_wilcoxon.cuh
src/rapids_singlecell/_cuda/wilcoxon/kernels_wilcoxon_ovo.cuh
src/rapids_singlecell/_cuda/wilcoxon/wilcoxon.cu
src/rapids_singlecell/_cuda/wilcoxon/wilcoxon_fast_common.cuh
src/rapids_singlecell/_cuda/wilcoxon/wilcoxon_ovo_device_sparse.cuh
src/rapids_singlecell/_cuda/wilcoxon/wilcoxon_ovo_host_sparse.cuh
src/rapids_singlecell/_cuda/wilcoxon/wilcoxon_ovo_kernels.cuh
src/rapids_singlecell/_cuda/wilcoxon/wilcoxon_ovr_kernels.cuh
src/rapids_singlecell/_cuda/wilcoxon/wilcoxon_ovr_sparse.cuh
src/rapids_singlecell/_cuda/wilcoxon/wilcoxon_rmm.cu
src/rapids_singlecell/_cuda/wilcoxon/wilcoxon_sparse.cu
src/rapids_singlecell/_cuda/wilcoxon/wilcoxon_sparse_kernels.cuh
src/rapids_singlecell/tools/_rank_genes_groups/__init__.py
src/rapids_singlecell/tools/_rank_genes_groups/_core.py
src/rapids_singlecell/tools/_rank_genes_groups/_utils.py
src/rapids_singlecell/tools/_rank_genes_groups/_wilcoxon.py
src/rapids_singlecell/tools/_rank_genes_groups/_wilcoxon_binned.py
tests/test_rank_genes_groups_ttest.py
tests/test_rank_genes_groups_wilcoxon.py
tests/test_rank_genes_groups_wilcoxon_binned.py

Intron7 added 2 commits April 24, 2026 15:38

first iteration of refactor

bc51273

add rmm

4094e6b

Intron7 added the run-gpu-ci label Apr 24, 2026

github-actions Bot removed the run-gpu-ci label Apr 24, 2026

update publish and cmake

9c391ed

Intron7 added run-gpu-ci labels Apr 24, 2026

github-actions Bot removed the run-gpu-ci label Apr 24, 2026

Intron7 added 2 commits April 24, 2026 18:10

update notebooks

76389bb

make dense faster

f69f1d8

Intron7 added the run-gpu-ci label Apr 24, 2026

github-actions Bot removed the run-gpu-ci label Apr 24, 2026

Intron7 and others added 2 commits April 24, 2026 23:08

update tests and fix issues

a0e9b0c

Merge branch 'main' into wilcoxon-refactor

2e36351

Intron7 added the run-gpu-ci label Apr 29, 2026

github-actions Bot removed the run-gpu-ci label Apr 29, 2026

Merge branch 'main' into wilcoxon-refactor

3682c92

Intron7 added the run-gpu-ci label May 12, 2026

github-actions Bot removed the run-gpu-ci label May 12, 2026

Intron7 and others added 7 commits May 13, 2026 12:24

Merge branch 'main' into wilcoxon-refactor

43bc9c2

fix tests

73cda58

Merge branch 'main' into wilcoxon-refactor

9769f3c

Merge branch 'main' into wilcoxon-refactor

a99fdf7

Merge branch 'main' into wilcoxon-refactor

ce6bc48

Merge branch 'main' into wilcoxon-refactor

bcbd4d3

Merge branch 'main' into wilcoxon-refactor

0f5e89e

Intron7 marked this pull request as ready for review June 3, 2026 09:40

Intron7 added 2 commits June 3, 2026 12:53

update

49d43d0

safety commit

4e4b55d

Intron7 added 2 commits June 3, 2026 17:45

start cleanup

75b810a

first draft

f8c00d8

Intron7 added the run-gpu-ci label Jun 3, 2026

github-actions Bot removed the run-gpu-ci label Jun 3, 2026

update rmm

28bc282

Intron7 added the run-gpu-ci label Jun 5, 2026

Merge branch 'main' into wilcoxon-refactor

e8638de

github-actions Bot removed the run-gpu-ci label Jun 5, 2026

update ci buildwheel

c165d9f

Intron7 added the run-gpu-ci label Jun 5, 2026

github-actions Bot removed the run-gpu-ci label Jun 5, 2026

Intron7 and others added 4 commits June 6, 2026 11:08

add csr densification columnwise clean up rmm

e8a0ba0

Merge branch 'main' into wilcoxon-refactor

4c087d3

fix docker

df988d8

Merge branch 'main' into wilcoxon-refactor

490bec2

Intron7 added the run-gpu-ci label Jun 8, 2026

github-actions Bot removed the run-gpu-ci label Jun 8, 2026

coderabbitai Bot reviewed Jun 8, 2026

View reviewed changes

fix issues

82703d2

Intron7 added the run-gpu-ci label Jun 9, 2026

github-actions Bot removed the run-gpu-ci label Jun 9, 2026

redo memory allocation

00eccb2

Intron7 added the run-gpu-ci label Jun 9, 2026

github-actions Bot removed the run-gpu-ci label Jun 9, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Wilcoxon refactor#636

Wilcoxon refactor#636
Intron7 wants to merge 28 commits into
mainfrom
wilcoxon-refactor

Intron7 commented Apr 24, 2026 •

edited

Loading

Uh oh!

codecov-commenter commented Apr 24, 2026 •

edited

Loading

Uh oh!

Intron7 commented Jun 8, 2026

Uh oh!

coderabbitai Bot commented Jun 8, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot commented Jun 8, 2026

Summary by CodeRabbit

Walkthrough

Changes

Possibly Related PRs

Suggested Reviewers

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

Intron7 commented Apr 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codecov-commenter commented Apr 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Intron7 commented Jun 8, 2026

Uh oh!

coderabbitai Bot commented Jun 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot commented Jun 8, 2026

Summary by CodeRabbit

Walkthrough

Changes

Possibly Related PRs

Suggested Reviewers

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Intron7 commented Apr 24, 2026 •

edited

Loading

codecov-commenter commented Apr 24, 2026 •

edited

Loading

coderabbitai Bot commented Jun 8, 2026 •

edited

Loading