chore(rtx): clean up TRT-RTX 1.4-era WARs and test skips by tp5uiuc · Pull Request #4306 · pytorch/TensorRT

tp5uiuc · 2026-05-29T02:31:22Z

Description

TensorRT-RTX 1.5 resolves the upstream issues that several convolution-validator WARs and test skips were guarding against.

Removed:

BF16 depthwise conv/deconv fallback to PyTorch.
Grouped 3D deconv fallback to PyTorch (any dtype).
Engine-cache timing-flakiness skip on test_caching_small_model.
test_grouped_deconv3d_fallback (asserted behaviour no longer applies).

Kept (narrower):

convolution_capability_validator now rejects only 3D transposed conv with stride > 1 AND dilation > 1 — still no TRT-RTX kernel for this combo.
Matching in-test guard in test_deconv3d — the converter harness drives TRTInterpreter directly and bypasses the partitioner, so validator-rejected nodes raise UnsupportedOperatorException instead of falling back to PyTorch as they would in torch_tensorrt.compile.
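The narrowed rule above can be sketched as a plain predicate. This is an illustration only, not the actual convolution_capability_validator: the function name and signature are hypothetical, and treating "stride > 1" as "any stride element > 1" (likewise for dilation) is an assumption.

```python
from typing import Sequence


def is_supported_on_rtx_sketch(
    transposed: bool,
    stride: Sequence[int],
    dilation: Sequence[int],
) -> bool:
    """Hypothetical stand-in for the narrowed validator rule.

    Rejects (returns False) only a 3D transposed convolution that
    combines stride > 1 with dilation > 1; everything else is accepted.
    """
    is_3d = len(stride) == 3  # assumption: dimensionality inferred from stride length
    strided = any(s > 1 for s in stride)
    dilated = any(d > 1 for d in dilation)
    return not (transposed and is_3d and strided and dilated)


# Strided and dilated 3D deconv: rejected.
print(is_supported_on_rtx_sketch(True, (2, 2, 2), (2, 2, 2)))
# Strided-only 3D deconv: accepted.
print(is_supported_on_rtx_sketch(True, (2, 2, 2), (1, 1, 1)))
# Regular (non-transposed) 3D conv: accepted.
print(is_supported_on_rtx_sketch(False, (2, 2, 2), (2, 2, 2)))
```

In torch_tensorrt.compile a rejected node would fall back to PyTorch; in the converter harness the same rejection surfaces as an exception, which is why the test needs its own guard.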

Replaced:

The timing-flakiness skip on test_dynamo_compile_with_refittable_weight_stripped_engine is replaced with a new skip naming the actual underlying issue: a static input-shape mismatch between the export example_inputs (batch 100) and the compile arg_inputs (batch 128). Tracked as a follow-up — the old skip was incidentally masking this.
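To make the mismatch concrete, here is a small sketch that compares the static shapes used at export time with those passed at compile time. The helper name is hypothetical, and the non-batch dimensions of the batch-128 input are assumed to match the export input; only the batch sizes 100 and 128 come from the description above.

```python
from typing import List, Sequence, Tuple

Shape = Tuple[int, ...]


def find_shape_mismatches(
    export_shapes: Sequence[Shape],
    compile_shapes: Sequence[Shape],
) -> List[Tuple[int, Shape, Shape]]:
    """Return (index, export_shape, compile_shape) for each differing input."""
    if len(export_shapes) != len(compile_shapes):
        raise ValueError("export and compile received a different number of inputs")
    return [
        (i, tuple(e), tuple(c))
        for i, (e, c) in enumerate(zip(export_shapes, compile_shapes))
        if tuple(e) != tuple(c)
    ]


export_shapes = [(100, 3, 224, 224)]   # shape of example_inputs given to export
compile_shapes = [(128, 3, 224, 224)]  # assumed shape of arg_inputs given to compile

for index, exported, compiled in find_shape_mismatches(export_shapes, compile_shapes):
    print(f"input {index}: exported with {exported}, compiled with {compiled}")
```

With static shapes, any entry reported here means the exported program and the inputs handed to the compiler disagree.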

Docs: Bumped a missed TensorRT-RTX-1.4.0.76 Windows-install path example to 1.5.0.114.

Verified locally on an A100 with TRT-RTX 1.5.0.114: BF16 mobilenet_v2/efficientnet_b0, grouped 3D deconv tests, and test_caching_small_model all pass; test_deconv3d_10_combined_params (strided+dilated) cleanly skips.

Type of change

Bug fix (non-breaking change which fixes an issue)
This change requires a documentation update

Checklist:

My code follows the style guidelines of this project
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas and hacks
I have made corresponding changes to the documentation
I have added tests to verify my fix or my feature
New and existing unit tests pass locally with my changes

…dilated deconv

TensorRT-RTX 1.5 (PR pytorch#4297) resolves the upstream cuDNN and JIT issues that the original convolution capability validator and test skips were guarding against. The remaining TRT-RTX limitation in this area is 1D/2D/3D transposed convolutions that combine stride > 1 with dilation > 1, which have no kernel support and crash the build with "Strided & Dilated Deconv are currently not supported". Regular convolutions are unaffected.

Changes:

1. py/torch_tensorrt/dynamo/conversion/aten_ops_converters.py
   - Drop the old WARs in convolution_capability_validator:
     a. Depthwise conv/deconv BF16 fallback to PyTorch.
     b. Grouped 3D deconv fallback to PyTorch (any dtype).
     Both ops now run on TRT directly.
   - Keep the validator with a single, narrower rule: any transposed convolution (1D/2D/3D) with both stride > 1 and dilation > 1 still falls back to PyTorch.

2. tests/py/dynamo/conversion/test_deconvolution_aten.py
   - Drop the previous in-test guard for grouped 3D deconv.
   - Add a shared `_skip_if_rtx_strided_dilated_deconv` helper that mirrors the validator predicate and document why the converter test harness needs it (it bypasses the partitioner, so a validator-rejected op raises UnsupportedOperatorException rather than falling back to PyTorch).
   - Wire the helper into test_deconv1d/2d/3d.
   - Add explicit `strided_dilated` parametrize entries to test_deconv1d and test_deconv2d (test_deconv3d's existing combined_params already covers the case). All three skip cleanly on TRT-RTX.

3. tests/py/dynamo/models/test_models.py
   - Delete test_grouped_deconv3d_fallback; the asserted fallback behavior no longer exists.

4. tests/py/dynamo/models/test_engine_cache.py
   - Remove the unittest.skipIf(tensorrt_rtx, "Engine caching compilation time assertion is unreliable...") decorator on test_caching_small_model. Refit-engine perf is now reliable on TRT-RTX 1.5.

5. tests/py/dynamo/models/test_weight_stripped_engine.py
   - Drop the old TRT-RTX timing-based skip on test_dynamo_compile_with_refittable_weight_stripped_engine and fix the underlying test bug it was masking: example_inputs to torch.export.export (batch 100) and arg_inputs to torch_trt.dynamo.compile (batch 128) disagreed, so the engine was built for the export shape and runtime failed when fed the compile-inputs shape. Reuse a single `inputs` list at both call sites so the shapes can't drift. Verified passing on both standard TRT and TRT-RTX nightlies.

6. docsrc/getting_started/tensorrt_rtx.rst
   - Bump the Windows install-path example from TensorRT-RTX-1.4.0.76 to TensorRT-RTX-1.5.0.114; the Linux example was updated in the 1.5 bump but the Windows block was missed.
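A test-side guard of the kind described in item 2 might look roughly like the sketch below. It is not the `_skip_if_rtx_strided_dilated_deconv` helper from the PR: the function name, the explicit `is_rtx` argument, and the "any element > 1" reading of the predicate are all assumptions made for illustration.

```python
import unittest
from typing import Sequence


def skip_strided_dilated_deconv_on_rtx(
    stride: Sequence[int],
    dilation: Sequence[int],
    is_rtx: bool,
) -> None:
    """Hypothetical guard: skip the calling test for the unsupported combo.

    Mirrors the validator rule (stride > 1 together with dilation > 1) so a
    harness that bypasses the partitioner skips instead of erroring out.
    """
    strided = any(s > 1 for s in stride)
    dilated = any(d > 1 for d in dilation)
    if is_rtx and strided and dilated:
        raise unittest.SkipTest(
            "TRT-RTX has no kernel for strided and dilated transposed conv"
        )


class DeconvSketch(unittest.TestCase):
    def test_strided_dilated(self) -> None:
        # is_rtx is hard-coded here; a real suite would read it from its config.
        skip_strided_dilated_deconv_on_rtx((2, 2), (2, 2), is_rtx=True)
        self.fail("not reached when the guard skips")
```

Raising unittest.SkipTest from a helper works under both unittest and pytest, so the guard can be called at the top of each parametrized test body.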

tp5uiuc · 2026-05-29T04:36:00Z

+        # Use the same inputs for both export and compile to avoid a
+        # static-shape mismatch between the exported program and the engine.
+        inputs = [torch.randn((100, 3, 224, 224)).to("cuda")]
+        exp_program = torch.export.export(pyt_model, args=tuple(inputs))


@zewenli98: Without this change, both standard TRT and TRT-RTX fail this test. I am not sure whether, or at what cadence, it currently runs on CI (my understanding is that this is an L2 test, as it's not marked with pytest.mark.critical).

Good catch! I found that the error has been caught since PR #4222, but I'm not sure why it was not caught in the previous CI.

tp5uiuc · 2026-05-29T08:55:46Z

[by Claude Code] CI failures look unrelated to this PR — two upstream/infra issues:

Most build jobs (Linux cu130/cu132, Windows cu130/cu132, RTX Linux + Windows) fail at step 9 (test-infra/setup-binary-builds) with TypeError: dataclass() got an unexpected keyword argument 'slots'. The pkg-helpers conda env is on Python 3.9 but its pip uses @dataclass(slots=True) (3.10+). pip's own module fails to import; the wheel build never starts.
SBSA cu132: Unable to find a match: libnccl-2.27.7-1+cuda13.2 — the cu13.2 NCCL RPM isn't published for aarch64 yet. SBSA cu130 passes.

No code changes needed here. Re-running after the test-infra fix lands should turn it green.
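The first failure can be reproduced in isolation with the standard library alone: the `slots` argument to `dataclass` was added in Python 3.10, so on 3.9 the decorator call itself raises TypeError. A minimal probe (the function and class names are hypothetical):

```python
import sys
from dataclasses import dataclass


def supports_dataclass_slots() -> bool:
    """Return True if this interpreter accepts dataclass(slots=True)."""
    try:
        @dataclass(slots=True)  # raises TypeError before Python 3.10
        class _Probe:
            x: int = 0
    except TypeError:
        return False
    return True


print(sys.version_info[:2], supports_dataclass_slots())
```

Running this inside the affected environment would confirm whether the interpreter, rather than the PR, is the cause.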

lanluo-nvidia

LGTM

meta-cla Bot added the cla signed label May 29, 2026

github-actions Bot requested a review from zewenli98 May 29, 2026 02:31

tp5uiuc commented May 29, 2026

Comment thread py/torch_tensorrt/dynamo/conversion/aten_ops_converters.py Outdated

tp5uiuc force-pushed the rtx/cleanup-1.5-wars branch from 6bf1c82 to a7e191d Compare May 29, 2026 03:02

tp5uiuc force-pushed the rtx/cleanup-1.5-wars branch from a7e191d to ae0753e Compare May 29, 2026 03:44

tp5uiuc self-assigned this May 29, 2026

tp5uiuc requested a review from lanluo-nvidia May 29, 2026 03:45

tp5uiuc marked this pull request as ready for review May 29, 2026 03:46

lanluo-nvidia approved these changes Jun 3, 2026

lanluo-nvidia merged commit a6a4365 into pytorch:main Jun 3, 2026
83 of 92 checks passed

lanluo-nvidia merged 1 commit into pytorch:main from tp5uiuc:rtx/cleanup-1.5-wars
