
chore(rtx): clean up TRT-RTX 1.4-era WARs and test skips#4306

Merged
lanluo-nvidia merged 1 commit into
pytorch:mainfrom
tp5uiuc:rtx/cleanup-1.5-wars
Jun 3, 2026


@tp5uiuc tp5uiuc commented May 29, 2026

Collaborator

Description

TensorRT-RTX 1.5 resolves the upstream issues that several convolution-validator WARs and test skips were guarding against.

Removed:

  • BF16 depthwise conv/deconv fallback to PyTorch.
  • Grouped 3D deconv fallback to PyTorch (any dtype).
  • Engine-cache timing-flakiness skip on test_caching_small_model.
  • test_grouped_deconv3d_fallback (asserted behaviour no longer applies).

Kept (narrower):

  • convolution_capability_validator now rejects only 3D transposed conv with stride > 1 AND dilation > 1 — still no TRT-RTX kernel for this combo.
  • Matching in-test guard in test_deconv3d — the converter harness drives TRTInterpreter directly and bypasses the partitioner, so validator-rejected nodes raise UnsupportedOperatorException instead of falling back to PyTorch as they would in torch_tensorrt.compile.
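The narrowed rule can be sketched as a small predicate. This is an illustration only, not the project's validator: the function names are hypothetical, and it assumes "stride > 1" and "dilation > 1" mean "greater than 1 in at least one spatial dimension". The bullet above names the 3D case; the sketch takes the stride and dilation sequences as given and does not check rank.

```python
from typing import Sequence


def is_strided_dilated_deconv(
    transposed: bool,
    stride: Sequence[int],
    dilation: Sequence[int],
) -> bool:
    """Return True when a transposed conv combines stride > 1 with dilation > 1.

    Assumption: the comparison is per dimension, and one offending
    dimension on each side is enough to trigger the rule.
    """
    if not transposed:
        # Regular convolutions are not affected by the limitation.
        return False
    return any(s > 1 for s in stride) and any(d > 1 for d in dilation)


def validator_accepts(
    transposed: bool,
    stride: Sequence[int],
    dilation: Sequence[int],
    is_rtx: bool,
) -> bool:
    """Hypothetical validator: reject only the unsupported combination on TRT-RTX."""
    return not (is_rtx and is_strided_dilated_deconv(transposed, stride, dilation))
```

In a full compile the partitioner would route a rejected node to PyTorch. A harness that calls the interpreter directly has no such fallback, which is why the tests need a matching guard.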

Replaced:

  • The timing-flakiness skip on test_dynamo_compile_with_refittable_weight_stripped_engine is replaced with a new skip naming the actual underlying issue: a static input-shape mismatch between the export example_inputs (batch 100) and the compile arg_inputs (batch 128). Tracked as a follow-up — the old skip was incidentally masking this.
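To make the shape mismatch concrete, here is a toy model of the failure in plain Python. `StaticShapeEngine` is a hypothetical stand-in, not a Torch-TensorRT or TensorRT class; it only assumes that an engine built for one static input shape rejects any other shape at run time.

```python
from typing import Tuple

Shape = Tuple[int, ...]


class StaticShapeEngine:
    """Hypothetical engine that accepts exactly the shape it was built for."""

    def __init__(self, built_for: Shape) -> None:
        self.built_for = built_for

    def run(self, input_shape: Shape) -> Shape:
        if input_shape != self.built_for:
            raise ValueError(
                f"engine built for {self.built_for}, got {input_shape}"
            )
        return input_shape


# The two batch sizes named in the PR description.
export_shape: Shape = (100, 3, 224, 224)
compile_shape: Shape = (128, 3, 224, 224)

# Built from the export shape, so only that shape is accepted.
engine = StaticShapeEngine(built_for=export_shape)
```

Calling `engine.run(compile_shape)` raises `ValueError`, while `engine.run(export_shape)` succeeds. Deriving both shapes from one shared value removes the possibility of drift.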

Docs: Bumped a missed TensorRT-RTX-1.4.0.76 Windows-install path example to 1.5.0.114.

Verified locally on an A100 with TRT-RTX 1.5.0.114: BF16 mobilenet_v2/efficientnet_b0, grouped 3D deconv tests, and test_caching_small_model all pass; test_deconv3d_10_combined_params (strided+dilated) cleanly skips.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • This change requires a documentation update

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas and hacks
  • I have made corresponding changes to the documentation
  • I have added tests to verify my fix or my feature
  • New and existing unit tests pass locally with my changes

@meta-cla meta-cla Bot added the cla signed label May 29, 2026
@github-actions github-actions Bot added the documentation, component: tests, component: conversion, component: core, component: api [Python], and component: dynamo labels May 29, 2026
@github-actions github-actions Bot requested a review from zewenli98 May 29, 2026 02:31
Comment thread py/torch_tensorrt/dynamo/conversion/aten_ops_converters.py Outdated
Comment thread py/torch_tensorrt/dynamo/conversion/aten_ops_converters.py Outdated
Comment thread py/torch_tensorrt/dynamo/conversion/aten_ops_converters.py Outdated
@tp5uiuc tp5uiuc force-pushed the rtx/cleanup-1.5-wars branch from 6bf1c82 to a7e191d Compare May 29, 2026 03:02
…dilated deconv

TensorRT-RTX 1.5 (PR pytorch#4297) resolves the upstream cuDNN and JIT issues
that the original convolution capability validator and test skips were
guarding against. The remaining TRT-RTX limitation in this area is
1D/2D/3D transposed convolutions that combine stride > 1 with
dilation > 1, which have no kernel support and crash the build with
"Strided & Dilated Deconv are currently not supported". Regular
convolutions are unaffected.

Changes:

1. py/torch_tensorrt/dynamo/conversion/aten_ops_converters.py
   - Drop the old WARs in convolution_capability_validator:
       a. Depthwise conv/deconv BF16 fallback to PyTorch.
       b. Grouped 3D deconv fallback to PyTorch (any dtype).
     Both ops now run on TRT directly.
   - Keep the validator with a single, narrower rule: any transposed
     convolution (1D/2D/3D) with both stride > 1 and dilation > 1 still
     falls back to PyTorch.

2. tests/py/dynamo/conversion/test_deconvolution_aten.py
   - Drop the previous in-test guard for grouped 3D deconv.
   - Add a shared `_skip_if_rtx_strided_dilated_deconv` helper that
     mirrors the validator predicate and document why the converter
     test harness needs it (it bypasses the partitioner, so a
     validator-rejected op raises UnsupportedOperatorException rather
     than falling back to PyTorch).
   - Wire the helper into test_deconv1d/2d/3d.
   - Add explicit `strided_dilated` parametrize entries to test_deconv1d
     and test_deconv2d (test_deconv3d's existing combined_params already
     covers the case). All three skip cleanly on TRT-RTX.

3. tests/py/dynamo/models/test_models.py
   - Delete test_grouped_deconv3d_fallback; the asserted fallback behavior
     no longer exists.

4. tests/py/dynamo/models/test_engine_cache.py
   - Remove the unittest.skipIf(tensorrt_rtx, "Engine caching compilation
     time assertion is unreliable...") decorator on test_caching_small_model.
     Refit-engine perf is now reliable on TRT-RTX 1.5.

5. tests/py/dynamo/models/test_weight_stripped_engine.py
   - Drop the old TRT-RTX timing-based skip on
     test_dynamo_compile_with_refittable_weight_stripped_engine and fix
     the underlying test bug it was masking: example_inputs to
     torch.export.export (batch 100) and arg_inputs to torch_trt.dynamo.compile
     (batch 128) disagreed, so the engine was built for the export
     shape and runtime failed when fed the compile-inputs shape. Reuse a
     single `inputs` list at both call sites so the shapes can't drift.
     Verified passing on both standard TRT and TRT-RTX nightlies.

6. docsrc/getting_started/tensorrt_rtx.rst
   - Bump the Windows install-path example from TensorRT-RTX-1.4.0.76 to
     TensorRT-RTX-1.5.0.114; the Linux example was updated in the 1.5
     bump but the Windows block was missed.
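The in-test guard described in item 2 could look roughly like the sketch below. It uses only `unittest` from the standard library. The helper name, its signature, and the `IS_TENSORRT_RTX` flag are assumptions made for illustration; the real `_skip_if_rtx_strided_dilated_deconv` helper may differ.

```python
import unittest
from typing import Sequence

# Hypothetical flag; the real project detects TRT-RTX by its own means.
IS_TENSORRT_RTX = True


def skip_if_strided_dilated(
    test: unittest.TestCase,
    stride: Sequence[int],
    dilation: Sequence[int],
    is_rtx: bool = IS_TENSORRT_RTX,
) -> None:
    """Skip the calling test when the case is strided and dilated on TRT-RTX.

    Mirrors the validator predicate so that a harness without a PyTorch
    fallback skips instead of raising on a validator-rejected op.
    """
    if is_rtx and any(s > 1 for s in stride) and any(d > 1 for d in dilation):
        test.skipTest("TRT-RTX has no kernel for strided and dilated deconv")


class DeconvCases(unittest.TestCase):
    def test_strided_only(self) -> None:
        skip_if_strided_dilated(self, stride=(2,), dilation=(1,))

    def test_strided_dilated(self) -> None:
        skip_if_strided_dilated(self, stride=(2,), dilation=(2,))
        self.fail("not reached when the guard skips")


def run_cases() -> unittest.TestResult:
    """Run the two cases above and return the collected result."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(DeconvCases)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

With the flag set, `run_cases()` reports one skipped test and no failures. Keeping the guard's condition identical to the validator's is the point: if the two diverge, the harness either skips supported cases or raises on unsupported ones.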
@tp5uiuc tp5uiuc force-pushed the rtx/cleanup-1.5-wars branch from a7e191d to ae0753e Compare May 29, 2026 03:44
@tp5uiuc tp5uiuc self-assigned this May 29, 2026
@tp5uiuc tp5uiuc requested a review from lanluo-nvidia May 29, 2026 03:45
@tp5uiuc tp5uiuc marked this pull request as ready for review May 29, 2026 03:46
# Use the same inputs for both export and compile to avoid a
# static-shape mismatch between the exported program and the engine.
inputs = [torch.randn((100, 3, 224, 224)).to("cuda")]
exp_program = torch.export.export(pyt_model, args=tuple(inputs))

Collaborator Author


@zewenli98: Without this change, this test fails on both standard TRT and TRT-RTX. I am not sure whether it currently runs in CI, or how often (my understanding is that it is an L2 test, since it is not marked with pytest.mark.critical).

Collaborator


Good catch! I found that the error has been caught since PR #4222, but I am not sure why earlier CI runs did not catch it.


tp5uiuc commented May 29, 2026

Collaborator Author

[by Claude Code] CI failures look unrelated to this PR — two upstream/infra issues:

  1. Most build jobs (Linux cu130/cu132, Windows cu130/cu132, RTX Linux + Windows) fail at step 9 (test-infra/setup-binary-builds) with TypeError: dataclass() got an unexpected keyword argument 'slots'. The pkg-helpers conda env is on Python 3.9 but its pip uses @dataclass(slots=True) (3.10+). pip's own module fails to import; the wheel build never starts.
  2. SBSA cu132: Unable to find a match: libnccl-2.27.7-1+cuda13.2 — the cu13.2 NCCL RPM isn't published for aarch64 yet. SBSA cu130 passes.

No code changes needed here. Re-running after the test-infra fix lands should turn it green.
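The first failure can be reproduced in isolation. The sketch below probes whether the running interpreter accepts `@dataclass(slots=True)`, which was added in Python 3.10; on 3.9 the decorator call raises the same `TypeError` quoted above. The function name is hypothetical.

```python
import sys
from dataclasses import dataclass


def supports_dataclass_slots() -> bool:
    """Return True if this interpreter accepts dataclass(slots=True)."""
    try:

        @dataclass(slots=True)
        class _Probe:
            value: int = 0

    except TypeError:
        # Python < 3.10: dataclass() got an unexpected keyword argument 'slots'
        return False
    return True


if __name__ == "__main__":
    version = f"{sys.version_info.major}.{sys.version_info.minor}"
    print(f"Python {version}: slots supported = {supports_dataclass_slots()}")
```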

@lanluo-nvidia lanluo-nvidia left a comment

Collaborator


LGTM

@lanluo-nvidia lanluo-nvidia merged commit a6a4365 into pytorch:main Jun 3, 2026
83 of 92 checks passed

Labels

cla signed, component: api [Python], component: conversion, component: core, component: dynamo, component: tests, documentation


3 participants