perf(fastlanes): fuse bit-packed compare into a transposed mask + untranspose by joseph-isaacs · Pull Request #8239 · vortex-data/vortex

joseph-isaacs · 2026-06-03T16:48:02Z

Summary

Stacked on #8238 (the benchmark) so the change lands as a CodSpeed diff.

Replaces the unpack-then-compare streaming kernel for compare-against-constant with the FastLanes fused `unpack_cmp`:

- compare each value as it is unpacked, accumulating results straight into a transposed 1024-bit mask (`[u64; 16]`, one register-resident word per lane; no `[bool; 1024]`/`[T; 1024]` scratch),
- a single SIMD `untranspose_bits` per block rotates the mask into logical row order, copied directly into the output bit buffer,
- inline patches are spliced in afterwards; sliced (offset != 0) arrays fall back to the scalar streaming predicate.
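
As a rough illustration of the data flow described above, here is a minimal scalar sketch. It is not the FastLanes implementation or API: the function names, the back-to-back packing format, and the lane layout (logical element `i` lives in lane `i % 16`, row `i / 16`) are simplifying assumptions chosen only to show how a compare result can be accumulated into a `[u64; 16]` transposed mask and then rotated into logical order.

```rust
// Simplified sketch. NOT the FastLanes layout or API; all names are hypothetical.
const BLOCK: usize = 1024; // values per block
const LANES: usize = 16; // one u64 mask word per lane

/// Pack `values` (each must fit in `width` bits, 1..=64) back to back.
fn pack(values: &[u64], width: usize) -> Vec<u64> {
    let mut out = vec![0u64; (values.len() * width + 63) / 64];
    for (i, &v) in values.iter().enumerate() {
        let bit = i * width;
        let (word, shift) = (bit / 64, bit % 64);
        out[word] |= v << shift;
        if shift + width > 64 {
            // The value straddles two words; store its high bits in the next one.
            out[word + 1] |= v >> (64 - shift);
        }
    }
    out
}

/// Extract the `i`-th `width`-bit value from `packed`.
fn unpack_one(packed: &[u64], width: usize, i: usize) -> u64 {
    let bit = i * width;
    let (word, shift) = (bit / 64, bit % 64);
    let low_mask = if width == 64 { u64::MAX } else { (1u64 << width) - 1 };
    let lo = packed[word] >> shift;
    let hi = if shift + width > 64 { packed[word + 1] << (64 - shift) } else { 0 };
    (lo | hi) & low_mask
}

/// Fused unpack + `<` compare: no unpacked scratch buffer is materialized,
/// each result bit goes straight into the transposed mask.
/// Assumed layout: logical element i sits in lane i % LANES at row i / LANES.
fn unpack_lt_transposed(packed: &[u64], width: usize, rhs: u64) -> [u64; LANES] {
    let mut mask = [0u64; LANES];
    for row in 0..BLOCK / LANES {
        for lane in 0..LANES {
            let v = unpack_one(packed, width, row * LANES + lane);
            mask[lane] |= ((v < rhs) as u64) << row;
        }
    }
    mask
}

/// Scalar stand-in for the untranspose step: rotate the lane-major mask
/// into logical row order, 64 result bits per output word.
fn untranspose_bits(mask: &[u64; LANES]) -> [u64; BLOCK / 64] {
    let mut out = [0u64; BLOCK / 64];
    for i in 0..BLOCK {
        let bit = (mask[i % LANES] >> (i / LANES)) & 1;
        out[i / 64] |= bit << (i % 64);
    }
    out
}
```

Calling `untranspose_bits(&unpack_lt_transposed(&packed, width, rhs))` on one packed block yields 1024 result bits in logical order. In the real kernel the inner loop is vectorized across lanes and the untranspose is a single SIMD routine, which this scalar version does not attempt to model.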

Add `bitpack_compare_sweep`, which exercises the public `array.binary(rhs, op)` compare-against-constant path over all eight integer types and every valid bit width (64Ki in-range elements per case, no patches). It isolates the `<BitPacked as CompareKernel>` unpack + per-element compare kernel so a kernel change shows up as a CodSpeed diff.

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
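
To make the shape of such a sweep concrete, the following sketch shows one way in-range inputs could be generated per bit width. It is not taken from the benchmark source: the helper names are hypothetical, the assumed range of valid widths (`1..=bits`) is a guess, and signed and unsigned types of the same size are collapsed into one entry.

```rust
/// Elements per benchmark case (64Ki, as stated in the commit message).
const LEN: usize = 64 * 1024;

/// Hypothetical helper: `LEN` values that all fit in `width` bits,
/// so a bit-packed encoding at that width would need no patches.
fn in_range_values(width: u32) -> Vec<u64> {
    assert!((1..=64).contains(&width));
    let max = if width == 64 { u64::MAX } else { (1u64 << width) - 1 };
    (0..LEN as u64).map(|i| i & max).collect()
}

/// Hypothetical enumeration of (type bits, bit width) cases.
/// The range 1..=bits is an assumption, not the benchmark's actual range.
fn sweep_cases() -> Vec<(u32, u32)> {
    [8u32, 16, 32, 64]
        .into_iter()
        .flat_map(|bits| (1..=bits).map(move |w| (bits, w)))
        .collect()
}
```

Masking with `max` keeps every element inside the packed range, which matches the "no patches" condition described above.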

codspeed-hq · 2026-06-03T16:57:14Z

Merging this PR will not alter performance

⚠️

Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚡ 1 improved benchmark
❌ 2 regressed benchmarks
✅ 1504 untouched benchmarks

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| | Mode | Benchmark | `BASE` | `HEAD` | Efficiency |
|---|---|---|---|---|---|
| ❌ | Simulation | `chunked_varbinview_canonical_into[(100, 100)]` | 269.9 µs | 304.9 µs | -11.47% |
| ❌ | Simulation | `baseline_lt[16, 65536]` | 216.1 µs | 244.1 µs | -11.44% |
| ⚡ | Simulation | `chunked_varbinview_canonical_into[(1000, 10)]` | 197.1 µs | 160.7 µs | +22.59% |

Tip

Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.

Comparing claude/confident-hamilton-mZIEo (211903c) with develop (4e6e9ed)

…ranspose

Replace the unpack-then-compare streaming kernel for compare-against-constant with the FastLanes fused `unpack_cmp`: compare each value as it is unpacked, accumulating results straight into a transposed 1024-bit mask (`[u64; 16]`, one register-resident word per lane - no `[bool; 1024]`/`[T; 1024]` scratch), then a single SIMD `untranspose_bits` per block rotates the mask into logical row order, copied directly into the output bit buffer. Inline patches are spliced in afterwards; sliced (offset != 0) arrays fall back to the scalar streaming predicate.

This requires the in-development FastLanes (PR #141 fused mask + PR #145 width-generic BMI2/VBMI untranspose), pinned via a git patch until released.

Benchmarked end-to-end through the public compare path (`bitpack_compare_sweep`, 64Ki elements, all integer types and bit widths): fused beats the streaming baseline for every type and width:

- i8/u8: ~6.2-7.7x
- i16/u16: ~4.5-6.0x
- i32/u32: ~1.9-4.3x
- i64/u64: ~1.2-1.9x

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
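
The patch-splicing step mentioned in this commit message can be pictured with a small sketch. It assumes patches are (index, value) pairs for elements that did not fit in the packed width and that the output mask is a plain `u64` bit buffer in logical order; the function name and signature are hypothetical and are not taken from the Vortex source.

```rust
/// Hypothetical sketch: after the fused kernel has filled `mask` from the
/// packed bits, re-evaluate the predicate (`<` here) for each patched
/// element and overwrite its result bit.
fn splice_patches_lt(
    mask: &mut [u64],
    patch_indices: &[usize],
    patch_values: &[u64],
    rhs: u64,
) {
    for (&idx, &value) in patch_indices.iter().zip(patch_values) {
        let (word, bit) = (idx / 64, idx % 64);
        if value < rhs {
            mask[word] |= 1u64 << bit; // patched value satisfies the predicate
        } else {
            mask[word] &= !(1u64 << bit); // clear whatever the packed bits produced
        }
    }
}
```

Because the packed slot of a patched element holds a placeholder rather than the real value, its bit has to be overwritten in both directions, not only set.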

joseph-isaacs · 2026-06-04T10:27:11Z

+[patch.crates-io]
+fastlanes = { git = "https://github.com/spiraldb/fastlanes", rev = "6c10ea72cf693a17e994aa6401604ebedbeda453" }


We will remove this before we merge this PR

…space

wasm-test is excluded from the workspace, so it does not inherit the root `[patch.crates-io]` and was building vortex-fastlanes against published fastlanes 0.5.0 (old `[bool;1024]` `unpack_cmp`, no `untranspose_bits`), which caused a compile error in compare_fused.rs. Add the matching git `rev` pin here.

Temporary, like the root pin: both are removed when a FastLanes release is cut and the version is bumped.

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

joseph-isaacs added the `changelog/performance` label (A performance improvement) on Jun 3, 2026, with Claude

joseph-isaacs force-pushed the claude/confident-hamilton-mZIEo branch from e27f5f4 to 48da899 Compare June 3, 2026 17:00

Base automatically changed from claude/confident-hamilton-mZIEo-benches to develop June 4, 2026 10:07

Merge branch 'develop' into claude/confident-hamilton-mZIEo

ab9c8d6

joseph-isaacs commented Jun 4, 2026

joseph-isaacs added the `do not merge` label (Pull requests that are not intended to merge) on Jun 4, 2026

claude and others added 3 commits June 4, 2026 10:31

wip

816032b

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

wip

e4dd660

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

joseph-isaacs removed the `do not merge` label (Pull requests that are not intended to merge) on Jun 4, 2026

joseph-isaacs added 2 commits June 4, 2026 14:10

wip

933ca0e

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

wip

211903c

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

joseph-isaacs wants to merge 8 commits into develop from claude/confident-hamilton-mZIEo