Releases: apache/tvm

v0.25.0

19 Jun 17:49
c7ba073

Introduction

The TVM community has worked since the last release to deliver the following exciting new improvements!

The main tags are below (bold text marks areas with significant progress): Relax, Frontend, TIR, Runtime, etc.

Please visit the full listing of commits for a complete view: v0.24.0...v0.25.0.

Community

None.

RFCs

None.

Arith

  • #19604 - [REFACTOR][TIR]Phase out ControlFlowGraph, NarrowPredicateExpression, and rename Simplify to StmtSimplify
  • #19638 - [REFACTOR]Phase out arith/scalable_expression; arith no longer proves over scalable vectors
  • #19670 - Memoize IntervalSet variable relaxation to avoid exponential blowup
  • #19669 - Gate canonical-simplify LT Case 2 on extra scale == +1
  • #19675 - Make Analyzer a tvm-ffi Object
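#19670 memoizes IntervalSet variable relaxation to avoid exponential blowup. The underlying problem can be sketched in plain Python. When a variable is defined in terms of an earlier variable that is used more than once, relaxing each use recursively re-derives the same interval repeatedly. Caching one result per variable makes the work linear in the number of bindings. This is a minimal illustration that supports only addition, and it uses hypothetical names (`Interval`, `relax`, `build_chain`) that are not TVM's `arith` API.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple, Union

# Hypothetical sketch: names here are illustrative, not TVM's arith API.


@dataclass(frozen=True)
class Interval:
    lo: int
    hi: int

    def __add__(self, other: "Interval") -> "Interval":
        return Interval(self.lo + other.lo, self.hi + other.hi)


# An expression is a variable name or an ("add", lhs, rhs) tuple.
Expr = Union[str, Tuple[str, "Expr", "Expr"]]


def relax(expr: Expr, bindings: Dict[str, Expr], leaf_ranges: Dict[str, Interval],
          memo: Optional[Dict[str, Interval]], stats: Dict[str, int]) -> Interval:
    """Relax `expr` to an Interval; pass memo=None to disable caching."""
    if isinstance(expr, str):
        if memo is not None and expr in memo:
            return memo[expr]  # reuse the interval already derived for this var
        stats["var_visits"] += 1
        if expr in leaf_ranges:
            result = leaf_ranges[expr]
        else:
            result = relax(bindings[expr], bindings, leaf_ranges, memo, stats)
        if memo is not None:
            memo[expr] = result
        return result
    op, lhs, rhs = expr
    if op != "add":
        raise ValueError(f"unsupported op {op!r}")
    return (relax(lhs, bindings, leaf_ranges, memo, stats)
            + relax(rhs, bindings, leaf_ranges, memo, stats))


def build_chain(n: int):
    """v0 ranges over [0, 1]; v_i = v_{i-1} + v_{i-1}, so v_n spans [0, 2**n]."""
    bindings = {f"v{i}": ("add", f"v{i-1}", f"v{i-1}") for i in range(1, n + 1)}
    return bindings, {"v0": Interval(0, 1)}
```

With `memo={}`, relaxing `v10` from `build_chain(10)` visits 11 variables; with `memo=None` it visits 2**11 - 1 = 2047, and that count roughly doubles with each extra link in the chain.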

BugFix

  • #19502 - [TIR] Skip bool-typed expressions in CSE
  • #19497 - [Relax] Fix scatter_elements and scatter_nd CUDA compilation
  • #19498 - [Relax][ONNX] Resolve param Vars in Concat to handle mixed Shape/Tensor inputs
  • #19511 - [Relax][Torch] Honor multi-axis dims in torch.flip converter
  • #19512 - [Relax][Torch] Honor correction in std/var converter
  • #19514 - [S-TIR] Wrap bare scalar bodies in DefaultGPUSchedule to avoid root-block crash
  • #19527 - [Relax]: handle ONNX ScatterElements reduction
  • #19535 - [Fix][Relax]: ONNX Clip NaN bounds and preserve input NaN (ORT parity)
  • #19554 - [Fix][CI]: remove astral-sh/setup-uv from lint workflow
  • #19557 - [Fix][Relax] Lower bool prod as logical all
  • #19567 - [Target][LLVM] Use libm for asin/acos instead of buggy inline Taylor
  • #19568 - [Target][LLVM] Route sinh/cosh/atan/asinh/erf through libm extern
  • #19619 - [Vulkan][CodeGen] Change OpControlBarrier to AcquireRelease
  • #19643 - [Fix] Stabilize layer_norm variance computation with two-pass reduction
  • #19650 - [Fix][Relax] Support ND batched matmul chains in AdjustMatmulOrder pass
  • #19683 - [Fix] CommReduce could handle 0-dim data
  • #19779 - [Fix] nn.attention support dynamic batch_size
  • #19808 - [Fix] Revert C++20-only lambda captures for C++17 build
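Two of the fixes above concern variance. #19512 makes the torch std/var converter honor `correction`, and #19643 switches layer_norm to a two-pass reduction. A minimal plain-Python sketch of both ideas follows. The function names are hypothetical and this is not TVM code. The divisor is `n - correction`: 0 gives the population variance, and 1 gives the Bessel-corrected sample variance that `torch.var` uses by default. The two-pass form subtracts the mean before squaring. This avoids the cancellation that the one-pass `E[x^2] - E[x]^2` form can suffer when the mean is large relative to the spread.

```python
import math
from typing import Sequence

# Hypothetical helpers illustrating the math, not TVM's lowering.


def variance_one_pass(xs: Sequence[float], correction: int = 0) -> float:
    """E[x^2] - E[x]^2 style; can lose precision when |mean| >> spread."""
    n = len(xs)
    total = sum(xs)
    total_sq = sum(x * x for x in xs)
    return (total_sq - total * total / n) / (n - correction)


def variance_two_pass(xs: Sequence[float], correction: int = 0) -> float:
    """First pass computes the mean; second pass sums squared deviations."""
    n = len(xs)
    if n - correction <= 0:
        raise ValueError("need len(xs) > correction")
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - correction)


def std_two_pass(xs: Sequence[float], correction: int = 0) -> float:
    return math.sqrt(variance_two_pass(xs, correction))
```

For `[1e8 + 1, 1e8 + 2, 1e8 + 3]`, the two-pass result is 2/3 (population) or 1.0 (`correction=1`). The one-pass form works with squares near 1e16, where adjacent doubles are 2 apart, so its result for that input is not reliable.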

CI

  • #19629 - Remove tvm-lint from tvm-bot
  • #19656 - Add cibw-based wheel publishing to PyPI
  • #19659 - Wheel publishing follow-ups
  • #19665 - Derive the version from Git tags via setuptools_scm
  • #19664 - Reformat the macOS repair-wheel-command as a multiline script
  • #19697 - Target apache-tvm for PyPI wheel publishing
  • #19775 - Merge PR against its target branch instead of main (#19712)
  • #19685 - Remove PyPI-only tag ref guard from wheel publishing
  • #19703 - Pin actions by version tag, trim wheel perms
  • #19706 - [Tests] Fix s_tir tests using removed T.block API in TIRx script
  • #19700 - Fix release verification script
  • #19704 - [Tests] Skip test modules cleanly when optional deps are missing
  • #19713 - Fix CI script test subprocess environment
  • #19724 - [Tests][Disco] Skip CCL tests when runtime support is absent
  • #19725 - [Tests][Relax] Gate multi-GPU VM test on three devices
  • #19726 - [Tests][Hexagon] Lazily import pytest plugin dependencies
  • #19730 - [Tests][NNAPI] Skip tests cleanly when remote environment is unavailable
  • #19729 - [Tests][S-TIR] Fix stale MetaSchedule sketch expectations and migrate let binds to T.let
  • #19715 - [Tests] Remove test_runtime_ndarray (covered by tvm-ffi)
  • #19731 - [Script][Tests] Fix dialect redirect module re-execution and stray category-less tirx.intrin_test op
  • #19735 - [S-TIR][Tests] Fix transform test failures after TIRx bringup
  • #19740 - [Tests] Check WebGPU volatile allreduce annotation structurally
  • #19746 - [Tests] Fix flaky popen pool executor test
  • #19738 - Align cuda-python with PyTorch cuda-bindings
  • #19745 - [Tests][LLVM] Gate stepvector intrinsic rename on LLVM 20
  • #19751 - [S-TIR][Tests] Mark test_cp_async_in_if_then_else as xfail
  • #19737 - Run s_tir/transform tests in the python-unittest stage
  • #19754 - Updated cibw to 4.1.0
  • #19752 - [Tests][AArch64] Make SVE codegen assertions robust across LLVM versions
  • #19761 - Drop redundant cmake/ninja install from the Linux wheel CUDA sidecar
  • #19777 - [Tests] Modernize test gating
  • #19786 - [Tests] Make TargetCreation.DeduplicateKeys host-agnostic on AArch64
  • #19787 - [Tests] Replace remaining requires_* helpers with standard pytest
  • #19793 - Pin GitHub Actions to SHA for ASF INFRA compliance
  • #19798 - Remove Jenkins PR linter step
  • #19800 - [Tests][Refactor] Remove unused testing helpers

Docs

  • #19606 - Reorganize development guide content
  • #19720 - Clarify loading serialized artifacts requires a trusted source
  • #19782 - [CI] Bump tlcpack-sphinx-addon to restore search result summaries
  • #19788 - Modernize test-gating documentation

Frontend

  • #19590 - [ONNX] Add RMSNormalization converter for ONNX opset 23

Hexagon

  • #19747 - [Tests] Clean up stale hexagon tests
  • #19796 - [REFACTOR]Phase out Hexagon app and test wrappers

LLVM

  • #19716 - [Codegen]Accept splat form in VLA broadcast test
  • #19744 - [Codegen][Tests] Gate +v9a vscale_range expectation on LLVM version

Relax

  • #19495 - [Frontend] Add ParameterList and ParameterDict containers
  • #19491 - [Frontend][TFLite] Add segment operator mappings
  • #19499 - [Frontend][TFLite] Add tests coverage for SPACE_TO_BATCH_ND and BATCH_TO_SPACE_ND
  • #19516 - [TFLite] Add gather frontend expected IRModule tests
  • #19488 - [PyTorch] Fix segfault in from_exported_program when model uses index_put_ with tuple output
  • #19523 - [Frontend][TFLite] Add Conv3D support
  • #19525 - [ONNX] Normalize negative indices before the take call for Gather operator
  • #19530 - [Frontend] Add TFLite Frontend Support for CONV_3D_TRANSPOSE
  • #19536 - [Frontend][TFLite] Add initial StableHLO builtin operator support
  • #19547 - [ONNX] Set max_output_boxes_per_class default value to 0 for NonMaxSuppression
  • #19515 - [ONNX] Add ONNX Backend Tests for systematic frontend coverage
  • #19566 - [ONNX] Prevent Div divide-by-zero crashes
  • #19573 - [ONNX] Fix TopK scalar K extraction in from_onnx
  • #19587 - [Frontend][TFLite] Support StableHLO region-based ops and multi-subgraph models
  • [#19588](#1...
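Several frontend fixes in this release, such as #19525 for ONNX Gather, deal with negative indices. ONNX defines these as counting back from the end of an axis. The sketch below shows how to normalize them to the non-negative range before emitting a take-style operation. The range check mirrors the ONNX rule that indices must lie in `[-s, s - 1]` for an axis of size `s`. The same mapping applies to a negative axis argument, such as a negative concat axis. The helper names are hypothetical, and TVM's frontends perform this normalization on IR rather than on Python lists.

```python
from typing import List, Sequence

# Hypothetical helpers; TVM frontends normalize on IR, not on Python lists.


def wrap_negative(value: int, size: int) -> int:
    """Map a value in [-size, size - 1] to [0, size - 1], rejecting anything else."""
    if not -size <= value < size:
        raise IndexError(f"{value} out of range for size {size}")
    return value + size if value < 0 else value


def normalize_axis(axis: int, ndim: int) -> int:
    return wrap_negative(axis, ndim)


def normalize_indices(indices: Sequence[int], dim_size: int) -> List[int]:
    return [wrap_negative(i, dim_size) for i in indices]


def gather_1d(data: Sequence[int], indices: Sequence[int]) -> List[int]:
    # After normalization, every index is a plain non-negative offset.
    return [data[i] for i in normalize_indices(indices, len(data))]
```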

v0.25.0.rc1

16 Jun 22:55
c7ba073

v0.25.0.rc1 Pre-release

What's Changed

Full Changelog: v0.25.0.rc0...v0.25.0.rc1

v0.25.0.rc0

08 Jun 20:11
5ec6844

v0.25.0.rc0 Pre-release

What's Changed

  • [release][Dont Squash] Update version to 0.24.0 and 0.25.0.dev on main branch by @ysh329 in #19446
  • [Relax][Frontend] Add ParameterList and ParameterDict containers by @mshr-h in #19495
  • [Relax][Frontend][TFLite] Add segment operator mappings by @Aharrypotter in #19491
  • [BUGFIX][TIR] Skip bool-typed expressions in CSE by @tqchen in #19502
  • [Relax][Frontend][TFLite] Add tests coverage for SPACE_TO_BATCH_ND and BATCH_TO_SPACE_ND by @rknastenka in #19499
  • [BugFix][Relax] Fix scatter_elements and scatter_nd CUDA compilation by @as4230 in #19497
  • [BugFix][Relax][ONNX] Resolve param Vars in Concat to handle mixed Shape/Tensor inputs by @swjng in #19498
  • [Web] Add support for OPFS by @akaashrp in #19494
  • [BugFix][Relax][Torch] Honor multi-axis dims in torch.flip converter by @swjng in #19511
  • [BugFix][Relax][Torch] Honor correction in std/var converter by @swjng in #19512
  • [BugFix][S-TIR] Wrap bare scalar bodies in DefaultGPUSchedule to avoid root-block crash by @swjng in #19514
  • [Relax][TFLite] Add gather frontend expected IRModule tests by @weicheng-hsu in #19516
  • [Relax][PyTorch] Fix segfault in from_exported_program when model uses index_put_ with tuple output by @cchung100m in #19488
  • [Relax][Frontend][TFLite] Add Conv3D support by @weicheng-hsu in #19523
  • [REFACTOR][IR] Remove dead AttrFunctor template by @tqchen in #19528
  • [Relax][ONNX] Normalize negative indices before the take call for Gather operator by @cchung100m in #19525
  • [Relax][Frontend] Add TFLite Frontend Support for CONV_3D_TRANSPOSE by @weicheng-hsu in #19530
  • [TIR] Add cooperative_tensor builtins and metal.cooperative_tensor storage scope by @oraluben in #19423
  • [Relax][Frontend][TFLite] Add initial StableHLO builtin operator support by @Aharrypotter in #19536
  • [Contrib] Fix CUDA contrib build after FFI/header cleanups by @MasterJH5574 in #19539
  • [BugFix][Relax]: handle ONNX ScatterElements reduction by @THINKER-ONLY in #19527
  • [Fix][Relax]: ONNX Clip NaN bounds and preserve input NaN (ORT parity) by @ConvolutedDog in #19535
  • [Fix][CI]: remove astral-sh/setup-uv from lint workflow by @ConvolutedDog in #19554
  • [Relax][ONNX] Set max_output_boxes_per_class default value to 0 for NonMaxSuppression by @cchung100m in #19547
  • [Relax][ONNX] Add ONNX Backend Tests for systematic frontend coverage by @Aharrypotter in #19515
  • [Fix][Relax] Lower bool prod as logical all by @ConvolutedDog in #19557
  • [Relax][ONNX] Prevent Div divide-by-zero crashes by @cchung100m in #19566
  • [TIRx] Bringup TIRx Infrastructure by @spectrometerHBH in #19581
  • [BugFix][Target][LLVM] Use libm for asin/acos instead of buggy inline Taylor by @swjng in #19567
  • [RFC][CodeGen][CUDA]: Gate fast math intrinsic lowering behind target option by @ConvolutedDog in #19565
  • [TVMScript] Handle undefined functions when dumping IRModule by @ConvolutedDog in #19583
  • [BugFix][Target][LLVM] Route sinh/cosh/atan/asinh/erf through libm extern by @swjng in #19568
  • [Relax][ONNX] Fix TopK scalar K extraction in from_onnx by @javierdejesusda in #19573
  • [Relax][Frontend][TFLite] Support StableHLO region-based ops and multi-subgraph models by @Aharrypotter in #19587
  • [ONNX] Add RMSNormalization converter for ONNX opset 23 by @q55180514 in #19590
  • [BUILD] Modularize device runtime into per-backend DSOs by @tqchen in #19594
  • [Relax] Normalize negative concat axis in ReorderPermuteDimsAfterConcat by @cchung100m in #19588
  • [RPC][Tracker] Bound msg_size to MAX_TRACKER_MSG_BYTES to prevent unbounded buffer growth by @bl4cksku11 in #19586
  • [CodeGen][CUDA] Move fast math intrinsic lowering option to PassContext by @tlopex in #19596
  • [IR] Add annotations to Call nodes by @tlopex in #19597
  • [REFACTOR][RELAX] Fold CalleeCollector into relax DeadCodeElimination by @tqchen in #19603
  • [Relax][Frontend][TFLite] Support quantized TFLite import via QDQ decomposition by @Aharrypotter in #19538
  • Fix PytestUnknownMarkWarning: Unknown pytest.mark.adreno_clml by @cchung100m in #19602
  • [REFACTOR][IR] Cleanup attrs.h: drop NullValue, AttrsNodeReflAdapter, legacy BaseAttrsNode methods by @tqchen in #19607
  • [Docs] Reorganize development guide content by @tlopex in #19606
  • [REFACTOR] Move src/ir/script_printer.cc to src/script/printer/ by @tqchen in #19611
  • [REFACTOR][IR] Phase out src/ir/structural_{hash,equal}.cc to tvm-ffi by @tqchen in #19613
  • [REFACTOR][IR] Inline ApplyPassToFunction into relax decompose_ops, delete the util by @tqchen in #19612
  • [REFACTOR][TIR][ARITH] Phase out ControlFlowGraph, NarrowPredicateExpression, and rename Simplify to StmtSimplify by @tqchen in #19604
  • [REFACTOR][IR] Phase out class Integer and class Bool in Attrs and PassConfig by @tqchen in #19614
  • [CMAKE][RUNTIME] Link tvm_rpc with all backend runtime libraries by @cbalint13 in #19617
  • [REFACTOR][IR] attrs.h follow-up cleanup: drop legacy vtable / rename / phase out AttrFieldInfo by @tqchen in #19615
  • [REFACTOR][TIR] Tie AnnotateDeviceRegions/SplitHostDevice/LowerDeviceKernelLaunch together by @tqchen in #19605
  • [Relax][Frontend][TFLite] Support control-flow multi-subgraph operators by @Aharrypotter in #19616
  • [Relax][Frontend][TFLite] Add UNIDIRECTIONAL_SEQUENCE_RNN converter by @LudovicoYIN in #19601
  • [IR] Rename Call annotations to attrs by @tlopex in #19618
  • [REFACTOR][RUNTIME] Phase out tvm::runtime::regex_match by @tqchen in #19620
  • [REFACTOR][RUNTIME] Remove leftover microTVM/CRT crumbs by @tqchen in #19622
  • [REFACTOR][RUNTIME] Relocate nvtx.h to tvm/support/cuda and make it header-only by @tqchen in #19621
  • [REFACTOR][PYTHON] Lift compiler/CLI/process modules from tvm.contrib to tvm.support by @tqchen in #19624
  • [REFACTOR][IR][FFI] Bump tvm-ffi (+ SEqHashDef migration) and phase out tvm/ir/repr.h by @tqchen in #19627
  • [REFACTOR][IR] Inline ReplaceGlobalVars into AttachGlobalSymbol by @tqchen in #19625
  • [BugFix][Vulkan][CodeGen] Change OpControlBarrier to AcquireRelease by @kistenklaus in #19619
  • [REFACTOR][RUNTIME] Structural reorganization: locality moves for thread_map, texture, minrpc, disco, contrib by @tqchen in #19628
  • [REFACTOR][PYTHON] Consolidate derived_object into tvm.ir.utils by @tqchen in #19630
  • [CI] Remove tvm-lint from tvm-bot by @yongwww in #19629
  • [REFACTOR][SCRIPT] tvmscript streamline: lift printer.h, restore one-way dep, migrate dialect config to extra_config by @tqchen in #19631
  • [REFACTOR][ARITH] Phase out arith/scalable_expression; arith no longer proves over scalable vectors by @tqchen in #19638
  • [Relax][Frontend][TFLite] Add REDUCE_WINDOW support by @THINKER-ONLY in #19637
  • [Relax][Frontend][TFLite] Add RNN converter by @LudovicoYIN in #19632
  • [REFACTOR][IR] Delete class Bool and class Integer boxed-type wrappers by @tqchen in #19636
  • [Relax][Frontend][TFLite] Add LSTM and SVDF converter by @LudovicoYIN in #19633
  • [Relax][Frontend][TFLite] Add TFLite Resource Variable and Static Hashtable Import Support by @Aharrypotter in #19639
  • [TIRx] Fix stale Simplify import in lowering test by @tlopex in #19642
  • [Relax][Frontend][TFLite] Support sequence LSTM and RNN operators by @LudovicoYIN in #19634
  • [Relax][Frontend][TFLite] Support STABLEHLO_WHILE by @Aharrypotter in #19646
  • [Fix] Stabilize layer_norm variance computation with two-pass reduction by @ConvolutedDog in #19643
    ...
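#19586 bounds the RPC tracker's `msg_size` by `MAX_TRACKER_MSG_BYTES` to prevent unbounded buffer growth. For a length-prefixed protocol, the general defense is to validate the declared length before reading or allocating the body. The sketch below assumes a 4-byte little-endian signed length prefix and a 1 MiB limit. Both are illustrative assumptions, not the tracker's actual wire format or constant value.

```python
import io
import struct
from typing import BinaryIO

# Hypothetical framing and limit for illustration; not the TVM tracker's actual values.
MAX_TRACKER_MSG_BYTES = 1 << 20


class MessageTooLarge(ValueError):
    pass


def read_exact(stream: BinaryIO, n: int) -> bytes:
    """Read exactly n bytes or raise EOFError."""
    buf = bytearray()
    while len(buf) < n:
        chunk = stream.read(n - len(buf))
        if not chunk:
            raise EOFError("stream closed mid-message")
        buf.extend(chunk)
    return bytes(buf)


def read_message(stream: BinaryIO, max_bytes: int = MAX_TRACKER_MSG_BYTES) -> bytes:
    """Validate the declared size before reading (and buffering) the body."""
    (size,) = struct.unpack("<i", read_exact(stream, 4))
    if size < 0 or size > max_bytes:
        raise MessageTooLarge(f"declared size {size} outside [0, {max_bytes}]")
    return read_exact(stream, size)


def frame(payload: bytes) -> bytes:
    """Prefix a payload with its 4-byte length (the assumed framing)."""
    return struct.pack("<i", len(payload)) + payload


if __name__ == "__main__":
    print(read_message(io.BytesIO(frame(b'{"key": "gpu"}'))))
```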

Apache TVM v0.24.0

09 May 01:20

Introduction

The TVM community has worked since the last release to deliver the following exciting new improvements!

The main tags are below (bold text marks areas with significant progress): Relax, etc.

Please visit the full listing of commits for a complete view: v0.24.dev0...v0.24.0.rc0.

Community

None.

RFCs

None.

Adreno

  • #18867 - Revive and consolidate Adreno features

Arith

  • #19417 - Expose allow_override parameter in Python Analyzer.bind()

BugFix

  • #19432 - [Fix][CUDA] Version compatibility of CUDA symbols
  • #19427 - [FIX] Skip metal target tag registration for unsupported LLVM CPUs
  • #19390 - [LLVM] Fix insertDeclare API mismatch for ROCm-bundled LLVM 20
  • #19410 - [Fix][Runtime][RPC] Fix remote tensor handle cleanup for RPC return values
  • #19385 - [MetaSchedule] Fix compile_relax to apply MetaScheduleApplyDatabase after FuseOps
  • #19383 - [TIRx] Fix bad-optional-access in BF16/FP8 legalize passes for target-less PrimFuncs
  • #19382 - [TIRx] Fix VerifyMemory crash for PrimFuncs without target attribute
  • #19380 - [TOPI] Fix get_const_tuple hanging indefinitely when passed a te.Tensor
  • #19368 - Align tir.round to ties-to-even across all backends
  • #19367 - [ONNX] Fix Round op to use ties-to-even
  • #19362 - [TVMScript] Fix invalid f-string format spec causing TypeError on Python 3.14
  • #19352 - [TVMScript] Add doc.keyword handling for ExprEvaluator._visit
  • #18957 - [FIX] Inline ceil_log2 in gpu_2d_continuous_cumsum to fix MakePackedAPI error
  • #18940 - [Fix] Fix tvm.tir references in Tflite frontend
  • #18887 - [FIX] Fix cumsum kernel sblock_alloc_buffer for non-sblock buffer
  • #18881 - [FIX][Adreno] Replace AllocBuffer with Bind in texture alloc injection
  • #18838 - [TOPI] Fix resize accuracy issue with non-floor rounding
  • #18782 - [S-TIR][FIX] Remove redundant std::move() to itself
  • #18742 - [Fix] Handle empty variable name in NameSupply::FreshName
  • #18694 - [TIR] Fix incorrect optimization when lowering floordiv and f…
  • #18695 - [FIX] Fix T.sblock due to concurrent merge
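#19368 and #19367 align rounding on ties-to-even (round half to even), which is the rule the ONNX Round operator specifies. Python's built-in `round()` already uses this rule, so a small sketch can show how it differs from round-half-away-from-zero. The helper names are hypothetical, and the functions are intended for finite inputs only.

```python
import math

# Hypothetical helpers for finite inputs only.


def round_half_to_even(x: float) -> float:
    """Ties go to the nearest even integer (Python's round() rule)."""
    return float(round(x))


def round_half_away_from_zero(x: float) -> float:
    """Ties go away from zero; uses the exact fractional part to avoid x + 0.5 rounding errors."""
    a = abs(x)
    whole = math.floor(a)
    frac = a - whole  # exact for finite doubles
    r = whole + 1 if frac >= 0.5 else whole
    return math.copysign(float(r), x)
```

The two rules agree everywhere except at exact halves. For example, 2.5 becomes 2.0 under ties-to-even but 3.0 under half-away-from-zero.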

CI

  • #19445 - [REFACTOR] Decouple data.py from Jenkins script and docker images
  • #18827 - Update images to 20260301-134651-63f099ad
  • #18863 - [S-TIR][Test] Mark meta_schedule tuning tests as skip
  • #18851 - Remove stale test scripts (i386, hexagon, mypy)
  • #18850 - [TEST] Remove stale URL mappings from request_hook
  • #18848 - Remove legacy lint scripts and Apache RAT
  • #18817 - [REFACTOR]Further cleanup docker images
  • #18812 - [REFACTOR]Modernize Python dependency management with uv
  • #18809 - Add GitHub Actions lint workflow
  • #18805 - [REFACTOR][TEST] Migrate tir-transform tests from TE to TVMScript
  • #18804 - [REFACTOR][TEST] Remove unused te imports from test files
  • #18800 - Update images to 20260219-160550-72f51851
  • #18796 - Refactor Dockerfiles and installation scripts
  • #18775 - Update images to 20260214-152058-2a448ce4
  • #18783 - Update system cuda version 12.4->12.8
  • #18780 - Remove unity from tvm-bot
  • #18777 - Update Pillow, pytest-rerunfailures, junitparser, xgboost, onnx and pytorch
  • #18647 - Upgrade Python to 3.10 in CI
  • #18749 - Remove i386 and Hexagon from CI pipeline (2)
  • #18757 - Further cleanup CI after merging unity to main test
  • #18456 - Move conda config files to tests/conda and remove unused conda build infrastructure
  • #18755 - [TEST] Cleanup legacy tests and migrate unity tests to main one
  • #18737 - Remove i386 and Hexagon from CI pipeline (1)
  • #18748 - Remove i386 and hexagon from .asf.yaml
  • #18719 - [REFACTOR][TEST] Migrate all codegen test to tvmscript
  • #18717 - Fix double newlines in nightly docker update
  • #18711 - [REFACTOR][TEST] Replace CompareBeforeAfter for pytest compact
  • #18692 - Fix NameError in nightly docker update workflow

Docker

  • #18854 - Refactor bash.sh: auto-detect rootless, add --shell, TVM_DEV_MOUNTS
  • #18710 - [ci]Nightly Docker image update

Docs

  • #19439 - Refactor BYOC example NPU tutorial
  • #19414 - Fix stale tvm.tirx exclude list and add missing legalize_ops.unary entry
  • #19409 - Fix outdated source install and API reference docs
  • #19407 - Fix #18714: python -c "import tvm; print(tvm.__file__)" fails
  • #19396 - Add code generation architecture documentation
  • #19398 - Add TVMScript architecture documentation
  • #19397 - Add PyModule tutorial to How-To toctree
  • #19399 - Clean up architecture docs: remove duplicates, fix stale content
  • #19389 - Add Relax VM architecture documentation
  • #19394 - Add operator fusion architecture documentation
  • #19395 - Add BYOC external library dispatch architecture documentation
  • #19387 - Add docstrings for nn.Module classes and core APIs in relax.frontend.nn
  • #19386 - Add tvm.s_tir.tensor_intrin API reference and remove empty legacy tvm/tir directory
  • #19379 - Add API reference for tvm.arith, tvm.testing, tvm.exec, tvm.tirx.backend and extend topi/contrib/ir/target docs
  • #19369 - Add API reference for tvm.s_tir submodules: dlight, meta_schedule, backend
  • #19366 - Add API reference documentation for tvm.script module
  • #19356 - Add DLight and MetaSchedule deep-dive instructions
  • #19364 - TFLite tests requiring Python 3.10 and specific package versions to avoid core dumps
  • #19354 - Add tutorial for importing models from PyTorch, ONNX, and TFLite
  • #19358 - Add Dataflow Pattern Language (DPL) documentation for Relax
  • #19357 - Add Disco distributed runtime architecture overview
  • #19351 - Fix outdated paths, links, and add missing API references across documentation(3)
  • #19353 - Add tvm.s_tir.analysis API reference page
  • #19350 - Add Relax VM architecture overview in documentation
  • #19344 - Fix outdated code examples, typos, and missing API reference in documentation(2)
  • #18965 - Fix outdated code examples, types, and missing references across documentation
  • #18966 - [DOC] Fix various issues
  • #18953 - Align documentation with tirx/s_tir namespace split
  • #18947 - Add tutorial for mixing Python/PyTorch with TVM using BasePyModule
  • #18939 - [DOC] Fix inconsistent code comments
  • [#18941](#1894...

Apache TVM v0.23.0

01 Feb 10:49

Introduction

The TVM community has worked since the last release to deliver the following exciting new improvements!

The main tags are below (bold text marks areas with significant progress): Relax (especially the PyTorch frontend), TIR, etc.

Please visit the full listing of commits for a complete view: v0.23.dev0...v0.23.0.rc0.

Community

None.

RFCs

None.

Adreno

  • #18523 - [TEXTURE] Texture based lowering

Arith

  • #18542 - Revert "Fix InternalError: Check failed: (eval_vec_) is false"
  • #18536 - Fix InternalError: Check failed: (eval_vec_) is false

BugFix

  • #18628 - [Fix] Fix typo in file header comment
  • #18589 - [OpenCL] Guard QCOM perf hint behind USE_OPENCL_EXTN_QCOM to avoid undefined symbol on non-QCOM runtimes
  • #18534 - Prevent segfault when instantiating abstract SearchStrategy

CI

  • #18549 - Remove hardcoded user and repo values
  • #18484 - Update file patterns for specific linting hooks
  • #18470 - Enhance python linting scripts to support revision-based checks
  • #18498 - Use glob for conda/build-environment.yaml in cache key
  • #18495 - Update actions/cache to v4 in setup action
  • #18457 - Fix crash when grep finds no matches
  • #18448 - Update pre-commit configuration
  • #18432 - Enable username checks in PR title and body
  • #18430 - [TEST][CODEGEN] Fix the test scripts tries to tell numpy a dtype name that it cannot recognise
  • #18419 - [TEST] Refactor: remove the deprecated warning message check from test cases

Docs

  • #18545 - Improve static shape tuning parameter configuration (follow-up to commit c71aefc)
  • #18539 - Fix e2e_opt_model tutorial for GPU deployment
  • #18451 - Update the merge setting
  • #18436 - Remove prebuilt package references and disable Colab button at tutorials
  • #18413 - Update cross-compilation and RPC tutorial with modern PyTorch deployment workflow
  • #18412 - Update tutorial for exporting and loading back Relax executables
  • #18404 - Add tutorial for exporting and loading back Relax executables

Frontend

  • #18435 - [ONNX] Fix operator Transpose: TVMError: PermuteDims expects the number of input axes to equal the ndim of the input tensor

LLVM

  • #18586 - [Codegen] Avoid segfault when arith::GetVScaleValues returns empty vector

MetaSchedule

  • #18547 - Fix tune_tir crash with ScheduleError in RewriteParallelVectorizeUnroll

Relax

  • #18676 - Implement dynamic output trimming for NMS
  • #18664 - Add FDataDependent operator attribute for LegalizeOps
  • #18668 - [Onnx] Support Local Response Normalization (LRN)
  • #18667 - Add native size operator
  • #18675 - [LAYOUT] Support for dynamic layout specification
  • #18652 - [ONNX] add support for unique optional outputs
  • #18665 - Replace topi.take with relax.op.take
  • #18663 - Fix wrong memory planning when only lower bound was provided
  • #18666 - [Onnx][Resize] Handle non-4D input tensors
  • #18658 - [Onnx][PReLU] Handle slope and axis argument with different slope shapes
  • #18649 - Remove obsolete TODO comments
  • #18642 - Add FRelaxInferLayout for gather_elements operator
  • #18643 - Add FRelaxInferLayout for scatter_nd operator
  • #18641 - [Op] Fixed incorrect output shape of Pool op when ceil_mode = true
  • #18638 - Add FRelaxInferLayout for scatter_elements operator
  • #18637 - Add FRelaxInferLayout for flip operator
  • #18633 - Add FRelaxInferLayout and TMixedPrecisionPolicy for dynamic_strided_slice
  • #18635 - [Onnx] Pass output_padding param in ConvTranspose
  • #18632 - Move GetUsedVars to analysis module
  • #18629 - Add FInferMixedPrecision and FRelaxInferLayout for conv transpose ops
  • #18626 - [Op][PyTorch] Supported Median operator
  • #18576 - Correct YaRN RoPE frequency scaling formula to align with the original paper
  • #18615 - Add gpu-generic fallback for unrecognized GPU targets
  • #18621 - Use weight shape instead of dim in Embedding.forward
  • #18613 - Remove duplicated test case: test_if_branch_var_scope
  • #18616 - Replaced call_pure_packed with tensor_to_shape operator
  • #18593 - feat: Implement FRelaxInferLayout for tile operator
  • #18618 - Add test case for op attributes in AST printer
  • #18619 - [PyTorch] Fix PyTorch Dynamo frontend for Darwin compatibility
  • #18575 - [ONNX] Add edge padding mode
  • #18620 - Fix flaky test_conv2d gradient numeric test
  • #18609 - Fix batch normalization computation logic
  • #18574 - [Torch] AssertionError: Unsupported function types ['mean.default']
  • #18591 - Chore: Fix the DeprecationWarning: invalid escape sequence \
  • #18577 - Clean up scatter_elements unknown dtype handling
  • #18579 - Add layout inference support for repeat operator
  • #18583 - [Torch] Fixed issues related to sum op when without dim and keep dim
  • #18554 - Enhance unique block name generation with numeric suffixes
  • #18558 - Add edge padding mode
  • #18559 - Add mod operator support
  • #18544 - [PyTorch] Add support for Custom Ops for ExportedProgram frontend
  • #18535 - [PyTorch] Add support for masked_select
  • #18551 - [Frontend] Introduce ModuleDict
  • #18550 - [PyTorch] Enhance scale_factor handling in interpolation
  • #18553 - [PyTorch] Unify dtype used in conv2d tests
  • #18548 - [PyTorch] Add NHWC layout support
  • #18533 - [PyTorch] Fix index_put with broadcast indices
  • #18521 - [PyTorch] Handle unknown output shapes for _sym_size_int
  • #18532 - [PyTorch] Add support for bidirectional GRU
  • #18530 - [PyTorch] Add boolean tensor support for max operation and corresponding test case
  • #18524 - [PyTorch] Fix InternalError when converting scaled_dot_product_attention with 2D inputs
  • #18527 - [PyTorch] Add support for non-persistent buffers in ExportedProgram frontend
  • #18529 - [PyTorch] Add support for binary scalar operations in ExportedProgram frontend and corresponding tests
  • #18522 - [PyTorch] Unify tests using shared tvm.testing.assert_allclose
  • #18516 - [PyTorch] Add support for bidirectional LSTM
  • #18499 - [PyTorch] Add support for sparse matrix multiplication
  • #18518 - [PyTorch] Fix batch normalization training mode correctness
  • #18517 - [PyTorch] Unify tests using shared verify_mo...
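#18641 fixes the output shape of Pool ops when `ceil_mode = true`. For reference, PyTorch's documented pooling convention computes the output length with ceiling division. It then ignores a final window that would start entirely inside the right padding. The sketch below encodes that convention for one spatial axis. It is an assumption that this is the exact rule TVM's fix adopts.

```python
# Sketch of PyTorch's documented pooling-size convention; whether TVM matches it exactly is assumed.


def pool_output_size(in_size: int, kernel: int, stride: int, padding: int = 0,
                     dilation: int = 1, ceil_mode: bool = False) -> int:
    """Output length along one axis."""
    span = in_size + 2 * padding - dilation * (kernel - 1) - 1
    if ceil_mode:
        out = -(-span // stride) + 1  # ceiling division
        # Ignore a last window that would start inside the right padding.
        if (out - 1) * stride >= in_size + padding:
            out -= 1
    else:
        out = span // stride + 1
    return out
```

With `in_size=5, kernel=2, stride=2`, floor mode gives 2 and ceil mode gives 3, because the extra window still starts inside the input.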

Apache TVM v0.22.0

24 Oct 16:01

Introduction

The TVM community has worked since the last release to deliver the following exciting new improvements!

The main tags are below (bold text marks areas with significant progress): Relax (especially the PyTorch frontend), FFI, etc.

Please visit the full listing of commits for a complete view: v0.22.dev0...v0.22.0.rc0.

Community

None.

RFCs

None.

BugFix

  • #18352 - [Fix] Update ShapeView use in nccl.cc
  • #18324 - Fixing binding for bert
  • #18296 - [Fix] Add libxml2 dependency to fix Windows CI build failure
  • #18294 - [Fix] Set DRefObj and CUDAIPCMemoryObj as mutable
  • #18285 - [FFI]Enable load_inline on macos
  • #18287 - [Hotfix] Fix the conflicts about ffi-related updated names
  • #18281 - [FFI]Fix bug of ffi.cpp.load_inline on Windows
  • #18262 - [NNAPI] Use kind() instead of type_key() after FFI refactor
  • #18244 - [Fix] Update FlashInfer JIT header lookup
  • #18237 - [FFI]Fix type_traits on DataType after SmallStr update
  • #18232 - [LLVM][Fix] Do not emit debuginfo on vscale or other unknown types
  • #18219 - [Fix] Resolve deadlock in PopenPoolExecutor and LocalBuilder
  • #18207 - [Fix][ONNX] No precision widening for numpy binary operations
  • #18209 - [ONNX][FRONTEND][Fix] Update Resize to accept ShapeExpr
  • #18210 - [Bug] Fix core dump in InferLayoutRMSNorm and fix typo
  • #18208 - [FFI][Fix] Update datatype registry calls to the new paths
  • #18190 - [Fix] Codegen fix for relax cutlass
  • #18170 - [Fix] Fix the wrong check for tuple node in #18163
  • #18174 - [Misc]Fix missing PadAttrs register in op_attrs.py
  • #18158 - Fix NCCL build with GlobalDef registration
  • #18140 - [NNAPI] Fix type mismatch and test_mean annotation
  • #18138 - [Fix][ONNX] Fixed constant ROI handling in resize2d when loading onnx models
  • #18137 - [Fix][ONNX] Fix CumSum conversion when loading ONNX model

CI

  • #18245 - [LLVM][MSWIN]Fix LLVM module build with latest CI update
  • #18227 - Exit the build for AbortException
  • #18145 - [Test] Use roi_list variable instead of hardcoded values in ROI tensor creation

Docs

  • #18279 - [FFI]Initial bringup of cpp docs
  • #18264 - Misc docs fix
  • #18263 - [FFI]Initial docs scaffolding
  • #18261 - [FFI]Add missing files in packaging example
  • #18256 - [FFI]Wheel Packaging
  • #18128 - [Doc] Visualize the architecture using a UML sequence diagram

Frontend

  • #18143 - [ONNX] Extend axes for layer_norm when gamma/beta are multi-dimensional

LLVM

  • #18204 - Fixes up to the latest LLVM21
  • #18202 - [CPPTEST] Small fixes for LLVM >= 20

MetaSchedule

  • #18243 - [LLVM]Add RISCV V-extension v1.0 kernels to metaschedule

Metal

  • #18290 - Fix MetalModuleCreate
  • #18283 - [Fix]Fix type for device array in Metal API

ROCm

  • #18225 - Minor fixes for latest refactor

FFI

  • #18375 - [TE] [FFI] Fix broken axis/reduce_axis properties in BaseComputeOp and ScanOp after FFI refactoring
  • #18376 - [FFI] Bump tvm-ffi to 0.1.0rc2
  • #18370 - [FFI] Bump tvm-ffi dependency
  • #18354 - [FFI][ABI] Bump tvm-ffi to latest
  • #18349 - [FFI][ABI] Bump tvm-ffi to latest
  • #18345 - [FFI][ABI] Bump tvm-ffi version to reflect RC ABI Update
  • #18332 - [FFI][ABI] Bump version ffi to latest
  • #18314 - [REFACTOR][FFI] Split tvm-ffi into a separate repo
  • #18312 - [FFI][REFACTOR] Update TVM_FFI_STATIC_INIT_BLOCK to fn style
  • #18311 - [FFI][ABI] Better String and Nested Container handling
  • #18308 - [FFI][ABI] Refactor the naming of DLPack speed converter
  • #18307 - [FFI] Update load_inline interface
  • #18306 - [FFI][ABI][REFACTOR] Enhance DLPack Exchange Speed and Behavior
  • #18302 - [FFI][REFACTOR] Refactor python ffi call mechanism for perf
  • #18298 - [FFI] Fix system library symbol lookup
  • #18297 - [FFI] Temp skip windows tests
  • #18295 - [FFI][ABI] Introduce generic stream exchange protocol
  • #18289 - [FFI][REFACTOR] Streamline Object Declare Macros
  • #18284 - [FFI][REFACTOR] Introduce UnsafeInit and enhance ObjectRef null safety
  • #18282 - [FFI] Relax default alignment and contiguous requirement
  • #18280 - [FFI][REFACTOR] Cleanup namespace
  • #18278 - [FFI] Temp skip load_inline tests nonlinux
  • #18277 - [FFI][REFACTOR] Cleanup tvm_ffi python API and types
  • #18276 - [FFI] Add ffi::Tensor.strides()
  • #18275 - [FFI][REFACTOR][ABI] Rename NDArray to Tensor
  • #18274 - [FFI] Update the interface of ffi.load_inline to match torch
  • #18273 - [FFI][ABI] Append symbol prefix for ffi exported functions
  • #18272 - [FFI] Construct NDArray.strides by default
  • #18271 - [FFI] Support inline module
  • #18270 - [FFI] Support Opaque PyObject
  • #18266 - [FFI] Update torch stream getter to use native torch c api
  • #18259 - [FFI][ABI] Introduce weak rc support
  • #18258 - [FFI] fix two seemingly migration issue
  • #18254 - [FFI][ABI] ABI Updates to for future metadata and complex ordering
  • #18249 - [FFI][CMAKE] Revert cmake libbacktrace URL and update submodule
  • #18246 - [FFI][CMAKE] Add missing download path for libbacktrace
  • #18234 - [FFI] Misc fixup for windows
  • #18233 - [FFI] Robustify the pyproject setup
  • #18226 - [FFI][REFACTOR] Establish tvm_ffi python module
  • #18221 - [FFI] Fix JSON parser/writer for the fast-math flag
  • #18218 - [FFI][REFACTOR] Cleanup API locations
  • #18217 - [FFI] AutoDLPack compatible with torch stream context
  • #18216 - [FFI][REFACTOR] Establish Stream Context in ffi
  • #18214 - [FFI][REFACTOR] Establish ffi.Module in python
  • #18213 - [FFI] Formalize ffi.Module
  • #18212 - [FFI] Make JSON Parser/Write fastmath safe
  • #18205 - [FFI][REFACTOR] Cleanup entry function to redirect
  • #18200 - [FFI][REFACTOR] Update Map ABI to enable flexible smallMap switch
  • #18198 - [FFI][REFACTOR] Move Downcast out of ffi for now
  • #18192 - [FFI] Phase out ObjectPath in favor of AccessPath
  • #18191 - [FFI][REFACTOR] Refactor AccessPath to...

Apache TVM v0.21.0

17 Jul 02:12

Introduction

The TVM community has worked since the last release to deliver the following new exciting improvements!

The main tags are below (bold text is with lots of progress): Relax (especially the PyTorch frontend), FFI, etc.

Please visit the full listing of commits for a complete view: v0.21.dev0...v0.21.0.rc0.

Community

None.

RFCs

None.

Arith

  • #18067 - Add IsBound method to ConstIntBoundAnalyzer
  • #18031 - Canonicalize mul-coefficient to rhs
  • #18025 - Fix canonical simplify for LE with incorrect range assumptions

BugFix

  • #18115 - [Fix][Serialization] Add support for NaN value serialization
  • #18103 - [Fix] Replace dmlc::Error with std::exception in VerifyGPUCode
  • #18092 - [Fix] Fix ExecBuilderDeclareFunction method name in exec_builder.py
  • #18087 - fix exception when tvm not built with llvm support
  • #18035 - [CUDA] Fix: Update settings for rerun on Increase FloatImm precision when printing 64 bit values in CUDA codegen
  • #17968 - [Relax][Pytorch] Bugfix of conv_transpose1d and conv_transpose2d
  • #17950 - [Fix][Relax] Fix dangling reference in GetTargetFunctions()
  • #17902 - Fix off-by-one error in the type index range check within Object::IsInstance()
  • #17882 - [Relax][Pytorch] Fix incorrect behaviour of % (mod) operator in TVM frontend
  • #17875 - [Relax][Pytorch] Incorrect Handling of In-Place Ops in FX-Based TVM Frontend
  • #17838 - [TIR] Schedule support reverse-inline with reduction blocks

CI

  • #18071 - Update windows to 2025
  • #18058 - [TEST] Move temp files into tempdir
  • #18037 - Further robustify is_last_build check
  • #17981 - Update images to 20250513-063354-70aa3797
  • #17891 - Update images to 20250428-080833-03eadc65
  • #17905 - Install PyTorch 2.7 compatible with CUDA 11.8
  • #17887 - Upgrade pytorch to 2.7.0, torchvision to 0.22.0, and vulkan sdk to 1.4.309
  • #17846 - Upgrade ubuntu runner image for GitHub CI

Docker

  • #17955 - [CI] Reintroduce NNEF to CI images

Docs

  • #18056 - Update installation instruction based ffi refactor

Frontend

  • #18090 - [Relax][ONNX] Update Reduce ops to support axes as input
  • #18072 - [Relax][ONNX] Update ReduceL1 to opset 18
  • #18016 - [Relax][ONNX] Replace deprecated mapping.TENSOR_TYPE_TO_NP_TYPE usage
  • #18001 - [Relax][ONNX] Fix: bitwise_not misclassified as binary (is …
  • #17990 - [Relax]Fix: Output tensor with zero dimension after torch.u…
  • #17925 - [Relax][PyTorch] Re-enable test_subgraph_capture in dynamo test
  • #17980 - [ONNX] Make bias input optional in LayerNormalization
  • #17918 - [Relax][PyTorch] Add ReLU6 Op Support for Exported Program and FX graph
  • #17930 - [Relax][PyTorch] Add torch.outer Op Support for Exported Program and FX graph
  • #17932 - [Relax][PyTorch] Add UpSample Bicubic Op Support for Exported Program and FX graph
  • #17921 - [Relax][PyTorch] Add AvgPool 1D and 3D Op Support for Exported Program and FX graph
  • #17922 - [Relax][PyTorch] Add Adaptive AvgPool 1D and 3D Op Support for Exported Program and FX graph
  • #17863 - [Relax][PyTorch] CrossEntropyLoss
  • #17919 - [Relax][PyTorch] Add MaxPool 1D and 3D Op Support for Exported Program and FX graph
  • #17926 - [Relax][PyTorch] Add tests for all the dtypes supported in the PyTorch frontend
  • #17924 - [Relax][PyTorch] Add div.Tensor_mode and trunc Op Support for Exported Program and FX graph
  • #17904 - [Relax][PyTorch] Add Meshgrid Op Support for Exported Program and FX graph
  • #17915 - [Relax][PyTorch] Add support for linspace op in fx graph
  • #17886 - [Relax][PyTorch] Add Pixel Shuffle Op Support for Exported Program and FX graph
  • #17908 - [Relax][PyTorch] Add support for eye op in fx graph
  • #17893 - [Relax][Pytorch] Add fmod support
  • #17894 - [Relax][PyTorch] Support torch.bfloat16 dtype in pytorch frontend
  • #17878 - [Relax][PyTorch] Add torch.isin Op Support for Exported Program and FX graph
  • #17889 - [Relax][PyTorch] Support linspace op for ExportedProgram importer
  • #17868 - [Relax][Pytorch] Add support for ones_like, zero_, zeros, type_as, item ops
  • #17857 - [Relax][PyTorch] Refactor norm op for ExportedProgram importer
  • #17852 - [Relax][PyTorch] Sort.default
  • #17871 - [Relax][Pytorch] Add support for bitwise_or op support
  • #17836 - [Relax][PyTorch] support for index.Tensor
  • #17864 - [Relax][PyTorch] Support eye op for ExportedProgram importer
  • #17858 - [Relax][PyTorch] Add copy_ op support in fxGraph
  • #17851 - [Relax][PyTorch] Support leaky_relu_.default and reshape_as.default in ExportedProgram frontend
  • #17843 - [Relax][PyTorch] Add mul_.Tensor, max.default, min.default and pow.Scalar Op Support into Exported Program Frontend
  • #17821 - [Relax][PyTorch] Add Pad Op Support for Exported Program and FX graph
  • #17819 - [Relax][PyTorch] Add Stack Op Support for Exported Program
  • #17849 - [Relax][PyTorch] Add RSub Op Support for Exported Program and FX graph
  • #17850 - [Relax][Pytorch] Add masked_fill op support in ExportedProgram
  • #17816 - [Relax][PyTorch] Add PReLU Op Support for Exported Program and FX graph
  • #17803 - [Relax][PyTorch] Add Logaddexp op support for exported program
  • #17841 - [Relax][PyTorch] Add support for norm op
  • #17832 - [Relax][PyTorch] full.default, full_like.default, ones.default
  • #17830 - [Relax][PyTorch] Support narrow and broadcast_to ops for ExportedProgram importer

LLVM

  • #17859 - [Codegen] Enable SVE/VLA for RISCV targets
  • #17958 - Fix JIT unknown reloc issue for case of RISCV
  • #17954 - [FFI]Fix compilation errors with clang20

Metal

  • #18034 - Fix GetFunction of metal runtime

ROCm

  • #18029 - Fix ROCm build after FFI refactor

Relax

  • #18102 - Fix rotary embedding buffer size calculation
  • #17928 - [KVCache] Per Layer Sliding Window
  • #17840 - Refactor missing op check into shared utility for Torch frontends
  • #17826 - Fix Torch frontends to report all the missing ops

Runtime

  • #18097 - CutensorMap support

TIR

  • #18068 - Extend address_of to support Buffer objects
  • #18069 - Fix block access region detection for nested let bindings
  • #18057 - Phase out ProducerStore, ProducerRealize and Prefetch

TOPI

  • #18039 - [Relax] Support InstanceNorm & Bugfix of InstanceNorm
  • #18063 - [N...

Apache TVM v0.20.0

19 Apr 12:21

Introduction

The TVM community has worked since the last release to deliver the following new exciting improvements!

The main tags are below (bold text is with lots of progress): Relax (especially the PyTorch frontend), CUDA, etc.

Please visit the full listing of commits for a complete view: v0.20.dev0...v0.20.0.rc0.

Community

None.

RFCs

None.

Adreno

  • #17608 - [WINDOWS] Windows build dependencies for Adreno target

BugFix

  • #17761 - [FIX][RELAX] fix fusion of transpose + matmul when constant weight
  • #17762 - [Fix] Fix OpenCL header in attention utils
  • #17711 - [Fix][dlight] add an explicit reduction loop check in Reduce
  • #17697 - [Fix] Include <chrono> for std::chrono
  • #17677 - Declare build backend for python package
  • #17598 - [TIR][FIX] update FlopEstimator to include missing nodes
  • #17601 - [Flashinfer][Fix] fix missing args in flashinfer test
  • #17607 - [FIX][TVMC] Fix the mixed precision conversion pipeline

CI

  • #17687 - Update images to 20250226-223225-63bc315f
  • #17680 - update images to 20250225-035137-aeadc31c
  • #17675 - [skip ci]Update github tvmbot
  • #17635 - Cleanup legacy files
  • #17634 - [skip ci]Improve build time
  • #17629 - [skip ci]Robustify CI for SPOT failure
  • #17620 - Unpin pytest-profiling
  • #17621 - [skip ci] Remove legacy CI runners protection
  • #17619 - [Refactor]Remove legacy frontend tests

Dlight

  • #17754 - Fix general reduction rule to support non-last reduction axis
  • #17663 - [CPU] Add CPU Backend Support for GEMV Optimization

Docker

  • #17691 - Fix ml_dtypes downgrade issue introduced by TensorFlow
  • #17686 - Update ml_dtypes to 0.5.1+
  • #17676 - Use Torch GPU on gpu device
  • #17648 - Tensorflow (aka TFLite) upgrade to 2.18.0
  • #17643 - Update ml_dtypes version
  • #17638 - [skip ci]Update ml_dtypes version
  • #17617 - Tensorflow upgrade to 2.18.0

Docs

  • #17650 - Update README
  • #17611 - Download 3rd party embeds to local files
  • #17604 - Update README

MetaSchedule

  • #17104 - Adding post optimization in MetaSchedule to Improve Scheduling

OpenCL & CLML

  • #17571 - [OPENCL][TEXTURE] Improved texture memory planning

Relax

  • #17814 - [PyTorch] Add stack.default and sum.default to exported programs translator
  • #17820 - [PyTorch] Add support for broadcast_to, narrow ops
  • #17822 - [PyTorch] Cleanup tests for ExportedProgram frontend
  • #17806 - [PyTorch] Add Softplus Op Support for Exported Program and FX graph
  • #17817 - [PyTorch] Support dynamic shapes in ExportedProgram frontend
  • #17813 - [PyTorch] Improve ExportedProgram frontend by supporting unflatten.int, hardtanh_.default, dropout_.default, silu_.default, add_.Tensor and relu_.default
  • #17812 - [PyTorch] Support argsort, topk ops for ExportedProgram importer
  • #17810 - [PyTorch] Add support for argsort, sort, topk ops
  • #17809 - [PyTorch] Delete duplicate converter function _to
  • #17807 - [PyTorch] Fix torch 2.6 compatibility issues
  • #17797 - [Pytorch] Update SELU Implementation Using Decomposed Core-Level Ops
  • #17802 - [Pytorch] support for arange in exported programs translator
  • #17801 - [PyTorch] Support where, cumprod and reciprocal ops for ExportedProgram importer
  • #17790 - [PyTorch] Add support for index_select
  • #17786 - [PyTorch] Support softshrink op for ExportedProgram
  • #17788 - [PyTorch] Add support for where, cumprod and reciprocal ops
  • #17785 - [PyTorch] Support prod, std and var ops for ExportedProgram importer
  • #17778 - [PyTorch] Support log2, log10 and log1p ops for ExportedProgram importer
  • #17772 - [PyTorch] Add support for prod, std and var ops
  • #17766 - [PyTorch] Add support for log2, log10 and log1p ops
  • #17760 - [PyTorch] Add support for lerp, select and clone ops
  • #17751 - [PyTorch] Support one_hot, empty_like ops for ExportedProgram importer
  • #17747 - [PyTorch] Support flip, gather, take ops for ExportedProgram importer
  • #17738 - [PyTorch] Support elu, celu, selu ops for ExportedProgram importer
  • #17726 - [PyTorch] Add support for numel, empty_like and one_hot ops
  • #17707 - [PyTorch] Add support for gather, flip and take ops
  • #17702 - [PyTorch] Add support for celu, selu, is_floating_point ops
  • #17694 - [PyTorch] Add support for elu, hardtanh ops
  • #17689 - [PyTorch] Support several binary ops for ExportedProgram importer
  • #17672 - [PyTorch] Refactor binary ops tests
  • #17679 - [PyTorch] Support several unary ops for ExportedProgram importer
  • #17668 - [PyTorch] Add support for and_, lshift, min, or_, rshift, xor ops
  • #17664 - [PyTorch] Add support for ge, gt, le, mod, ne ops
  • #17659 - [PyTorch] Add support for bitwise_not, isfinite, isinf, isnan, logical_not, sign and square ops
  • #17622 - [PyTorch] Add support for abs, ceil, erf, floor, log ops and refactor unary tests
  • #17566 - [ONNX] Add prim expression support to Neg converter and update Arange converter to use relax.op.arange
  • #17642 - [ONNX]replace topi.split with relax.op.split in the onnx frontend
  • #17674 - [KVCache] PagedKVCache refactor, FlashInfer JIT and MLA integration
  • #17618 - [KVCache] TIR attention kernel support for MLA
  • #17615 - [KVCache] Add KV Cache for CPU Runtime
  • #17616 - [Runtime][KVCache] Initial interface setup for MLA
  • #17782 - [Frontend] Support max/min in frontend op interface
  • #17758 - Allow ingesting tensor.chunk() from exported torch program
  • #17781 - Enable bfloat16 for softmax struct-info inference
  • #17752 - Batch norm correctness on eval mode
  • #17774 - check for tensor_meta in exported_program_translator
  • #17757 - Tensor.split with uneven tensors
  • #17749 - Move TIR backend to gpu_generic
  • #17725 - Ingest Tensor.clamp from torch export
  • #17724 - Add support to ingest Tensor.expand_as()
  • #17723 - Add torch exported program ingestion capability for Tensor.detach(), Tensor.copy_, and aten.lift_fresh_copy
  • #17721 - Allow ingesting Upsample module from torch.export either using Size or Scale Factor argument
  • [#17722](https://g...

Apache TVM v0.19.0

24 Jan 02:05

Introduction

The TVM community has worked since the last release to deliver the following new exciting improvements!

The main tags are below (bold text is with lots of progress): Relax, OpenCL, MetaSchedule.

Please visit the full listing of commits for a complete view: v0.19.dev0...v0.19.0.rc0.

Community

None.

RFCs

None.

Arith

  • #17469 - [LLVM]Presburger compile fix for MLIR/LLVM 19.x

BugFix

  • #17595 - [Fix][KVCache] Fix incorrect tile size calculation
  • #17549 - [FIX][LLVM] Workaround -mcpu=apple-latest for llvm above 18.0 (#17492)
  • #17537 - [FIX][topi.scatter_nd] fixed shape equality assert by using analyzer to prove equality
  • #17502 - [FIX][TOPI][strided_slice] Fix topi.strided_slice output shape
  • #17505 - [RELAX][ONNX][FIX] add a parser to handle expression in the shape dim names
  • #17490 - [FIX][ONNX][RELAX] Add support for dynamic ShapeExpr in Slice, Squeeze and Flatten
  • #17467 - [FIX][RELAX][ONNX] Fix typo in onnx frontend

CI

  • #17596 - [Test] Skip flaky test to unblock CI
  • #17451 - Upgrade CI image to 20241105-030952-3e386fd3
  • #17534 - Upgrade zephyr-sdk to 0.16.9
  • #17503 - Upgrade oneflow==0.9.0
  • #17485 - Revert jax, keras, tensorflow, and tflite upgrades introduced #17425
  • #17470 - Pin cpplint==1.6.1

Docs

  • #17518 - Few fixes for broken Adreno docs
  • #17527 - Fix typo in TensorIR
  • #17528 - Fix Typo in Debugging TVM

LLVM

  • #17547 - Make compilable with LLVM-20
  • #17538 - [RUNTIME] Make ORCJIT LLVM executor the default one

MetaSchedule

  • #17465 - Fix a multilevel tiling error on dynamic relax workload

OpenCL & CLML

  • #17516 - [RUNTIME][CLML] Dynamic backward compatibility
  • #17519 - [OPENCL][ADRENO] Introduce Qualcomm extension support
  • #17517 - [TEST][CLML] Clip test case updated
  • #17472 - [Device][OpenCL] add CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST to …

Relax

  • #17541 - Fix bug in convert_layout pass
  • #17539 - [KVCache] Fix attention prefill kernel for Metal and Android
  • #17540 - Add support for ONNX LPPool
  • #17536 - [Frontend][Onnx] Add auto_pad support for conv
  • #17525 - support masked_scatter
  • #17506 - [Python]Update Rotary positional embedding scaling
  • #17523 - Add gather_elements and gather_nd operators
  • #17511 - Update ONNX frontend for unique, nonzero and compress
  • #17509 - support scatter ops
  • #17504 - [ONNX] Add support for dynamic shape expression in Expand
  • #17482 - [KVCACHE] Improved schedule for prefill attention
  • #17445 - [MetaSchedule] Support CPU weight prepack
  • #17462 - Enhance Relax op and ONNX frontend
  • #17466 - Revert "[KVCACHE] Improved schedule for prefill attention"

Runtime

  • #17557 - [Dist] Implementation of KV cache transfer
  • #17498 - [mrvl]: Support Marvell Hardware Runtime

TIR

  • #17423 - [Schedule] Add annotate_buffer_access primitive

web

  • #17545 - Allows setting powerPreference on webgpu

Misc

  • #17593 - Fix GPU detection in PerStoreFeatureNode
  • #17554 - [Refactor] Phase out microTVM
  • #17542 - [REFACTOR] Phase out VTA
  • #17533 - [Contrib] Remove CLML version print
  • #17532 - [3rdparty] Update Picojson with const operator[] function (#327)
  • #17474 - [TE][CreatePrimFunc] Fix loop carried dependency case with nested block levels
  • #17501 - Fix InternalError in StaticPlanBlockMemory when visiting DataflowBlockNode
  • #17455 - Compiled with Default Target(LLVM) and Built with USE_MRVL=ON
  • #17481 - [Marvell BYOC]: global_max_pool2d and squeeze op support
  • #17484 - Replace np.int with np.int32
  • #17476 - Pin pytest-profiling==1.7.0
  • #17464 - [JVM] Align Java GraphModule Initialization with Python API
  • #17458 - Show the record if the escape sequence is unsupported

Apache TVM v0.18.0

17 Oct 15:36

Introduction

The TVM community has worked since the last release to deliver the following new exciting improvements!

The main tags are below (bold text is with lots of progress):

  • Frontend: PyTorch's ExportedProgram is supported in the Relax frontend (#17346)
  • Community, RFCs
  • AOT, Hexagon, OpenCL & CLML, Web, Metal
  • Relax, Dlight, Disco
  • TIR, TVMScript
  • Docs, Docker, CI, Misc, BugFix

Please visit the full listing of commits for a complete view: v0.18.dev0...v0.18.0.rc0.

Community

  • #17450 - update contributors

RFCs

This RFC introduces a new BYOC backend for the Android Neural Networks API (NNAPI), a graph-level neural network inference API provided by the Android runtime. Prior to this RFC, TVM on Android mobile devices relied mainly on OpenCL for GPU acceleration. The RFC adds a new codegen and runtime via the BYOC framework, enabling execution on custom accelerators from SoC vendors on mobile devices.

  • #109 - [RFC] NNAPI Integration via BYOC

BYOC

  • #17385 - [NNAPI] Add NNAPI backend for BYOC

BugFix

  • #17440 - [TIR][Schedule] TileWithTensorIntrin skip ComputeInline if bu…
  • #17419 - [FFI]Grab GIL when check env signals
  • #17403 - [Fix][LLVM] Fix getHostCPUFeatures LLVM version cutoff
  • #17383 - [ONNX] Skip constant If node generated by PyTorch
  • #17360 - [FIX] fix bug when normalize iter with different lower bounds
  • #17148 - [Relax] Preserve existing DataflowBlock in ConvertToDataflow
  • #17345 - [Fix][Relax] Add the missing tree-attn func arg for KV cache creation
  • #17073 - [Relax]FCallPacked not checked in CodegenVMTIR
  • #17315 - [MSC]Bugfix for strided_slice op
  • #17335 - [Relax][PyTorch][Fix] use_convert_torch_tensor_to_relax() where possible
  • #17330 - [Relax][PyTorch]Update layer_norm converter to support immutable_list for normalized_shape
  • #17324 - [Fix] Remove tvm. prefix from image name when ./docker/build.sh
  • #17308 - [TVM4J]Fix unhandled return type in JNI
  • #17307 - [Fix][TIR] LowerThreadAllreduce warp reduction mask
  • #17312 - [Relax]Infer TIR values from shapes inside a tuple
  • #17292 - [Relax]Support torch.unbind op and fix bugs for expand && split
  • #17263 - [Relax]Preserve dtype in ToMixedPrecision for kNever ops
  • #17229 - [Cutlass] fix cutlass instantiate attention template bugs
  • #17121 - [Relax]Fix a bug about the IR construction in test file
  • #17142 - Allow import of TVM when current directory is read-only

CI

  • #17444 - [Docs] Upgrade Sphinx
  • #17425 - Upgrade CI to Python 3.9
  • #17410 - Upgrade unity image tag to 20240917-153130-9f281758
  • #17409 - [Windows] Workaround for error in FindLLVM
  • #17397 - Update image tag to 20240917-153130-9f281758
  • #17338 - Upgrade PyTorch to 2.4.1
  • #17337 - Disable NNPACK build and fix error on Android SDK installation
  • #17355 - Upgrade github upload-artifact action
  • #17334 - [Hexagon] Forward gtest tests into pytest as separate tests
  • #17271 - Resolve CI compilation failures on MacOSX
  • #17221 - Reduce logging level when checking if docker image exists
  • #17206 - Update dummy-variable regex for pylint
  • #17117 - [CLML]Fix for few clml regression issues
  • #17155 - Remove lint step from unity/pr-head step

Disco

  • #17398 - Enable float8 data type in disco
  • #17275 - Fix double free of nccl communicator
  • #17264 - Disable splitting nccl communicator in single-group
  • #17182 - Implement SocketSession
  • #17191 - Cross-group and p2p send/receive primitives
  • #17180 - Group-wise operation

Dlight

  • #17430 - [GPU] Improve matmul schedule for adreno
  • #17363 - Fix Matmul rule for Conv3D
  • #17259 - [ADRENO] Fix for opencl adreno matmul schedule
  • #17187 - [GPU] Add OpenCL dequant matmul schedule

Docker

  • #17433 - [CI] Add NNEF dependency to CI images

Docs

  • #17436 - [Relax][PyTorch] Use torch.export instead of fx.symbolic_trace for tutorial
  • #17402 - [Doc] Update Architecture Overview
  • #17382 - More clarity on security model of RPC server
  • #17380 - [Doc] Relax Deep Dive
  • #17377 - Update document to include security model of RPC server
  • #17378 - Link to project-specific security page
  • #17352 - TVM pip Installation fix
  • #17343 - Minor fix typo in developer howto guide
  • #17328 - [Doc] Deep Dive TensorIR
  • #17327 - [Doc] How to Optimize a Language Model
  • #17320 - [Doc] Customize Optimization
  • #17319 - [Doc] Fix doc build error in e2e_opt_model.py
  • #17306 - [Doc] Refactor How-To
  • #17296 - [Doc] Overview
  • #17298 - [Doc] IRModule
  • #17286 - Introduce Relax API and move legacy part to standalone page
  • #17289 - [Doc] Quick Start
  • #17287 - [Doc] Refactor install docs

Frontend

  • #17431 - [Relax][Onnx] Add support for pad-2
  • #17447 - [ONNX] Move relax related tests to the correct file
  • #17427 - [Relax][ONNX] Expand op support for ONNX frontend
  • #17429 - [Relax][PyTorch] Support tensor manipulation and creation ops for ExportedProgram importer
  • #17426 - [Relax][PyTorch] Support neural network ops for ExportedProgram importer
  • #17424 - [Relax][PyTorch] Support binary, statistical and search ops for ExportedProgram importer
  • #17421 - [Relax][PyTorch] Support more unary ops for ExportedProgram importer
  • #17396 - [Relax][PyTorch] Add support for torch.export.ExportedProgram in Relax PyTorch Frontend
  • #17379 - [Relax][PyTorch] Fix output shape of torch.nn.functional.scaled_dot_product_attention
  • #17376 - [Relax][PyTorch] Cleanup Tensor Manipulation and Creation op converters
  • #17372 - [Relax][PyTorch] Cleanup Statistical, Search and DataType op converters
  • #17369 - [Relax][PyTorch] Cleanup Neural Network op converters
  • #17366 - [Relax][PyTorch] Cleanup binary op converters
  • #17356 - [Relax][PyTorch] Cleanup unary op converters
  • #17350 - [Relax][Onnx] fix params name bug in onnx frontend
  • #17342 - [Relax][PyTorch] Add support for torch.ops.aten.sym_size.int
  • #17300 - [Relax][PyTorch] Add support for torchvision.ops.stochastic_depth
  • #17325 - [Relax][PyTorch] Add support for `torc...