
v0.55.0-rc18

Pre-release
github-actions released this on 25 Jan 02:06 · 138 commits to main since this release

Note

If you are installing from a release, please refer to the README, INSTALLATION instructions, and any other documentation packaged with the release, not to the versions on the main branch; there may be differences between the latest main and the previous release.

The changelog follows, showing the changes since the last release.

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/12960537424

📦 Uncategorized

  • Remove ARCH_NAME from host library code
  • #12253: Implement Batch norm operation for inference mode
  • #16443: Add a programming example of vecadd_multi_core and gtest
  • Enable to/from torch tests for 0D/1D tensors
  • Port all data movements ops to compute_output_specs
  • #15246: Add sweep tests for addcdiv, addcmul, rdiv, rsub, ceil
  • Fix build break
  • Logical sharding for input tensor and halo output
  • #16495: reduce grid for falcon7b mlp matmul
  • Stress NOC mcast test
  • [skip ci] Update subdevice doc
  • Read from and write to partial buffer regions for interleaved buffers where the offset and size of the specified buffer region are divisible by the buffer page size
  • Fix resnet large on GS
  • Fix Pre-allgather Layernorm bad PCC when use 1D reduction
  • #16353: Skip zero-volume tensors
  • Create README.md
  • Update README.md
  • #16367: Added support to enable dram and l1 memory collection without saving to disk
  • Update .clang-format-ignore
  • Tweak BH csrrs init code
  • #0: Clean up confusing refs to Greyskull from ttnn.copy error messages.
  • Update perf and latest features for llm models (Jan 13)
  • Update README.md
  • #16657: Fix to_layout conversion into row major for 1D tensors
  • Tilize with val padding results in L1 cache OOM
  • #0: Fixes from commit ae61802
  • #0: Skip build-docker-image during post-commit code-analysis since the docker image is already built in a previous job
  • Generate test executables per architecture
  • #16587: Update UMD submodule commit for P150 compatibility
  • Replace some instances of Tensor::get_shape with get_logical_shape
  • Update METALIUM_GUIDE.md
  • #16621: Add barriers at end of cq_dispatch_slave.cpp on IERISC
  • Finish porting OPs to compute_output_specs
  • ScopedGraphCapture
  • #15756 Pull in BH LLK fix for maxpool hang
  • #15246: Add sweep tests for logical_and, logical_or, logical_xor
  • #0: (MINOR) Bump to v0.55.0
  • #11512: Add sweeps for eltwise sharded ops 3
  • Add sweeps for unary, unary_sharded and binary_sharded versions of ops: fmod, remainder, maximum, minimum.
  • Don't leak tt_cluster.hpp through kernel_types.hpp
  • #6983: Re-enable skipped TT-NN unit test
  • #15450: Remove default values from circular buffer parameters in LLK compute APIs
  • Update build flag in the programming examples docs
  • Fix for P100 board type
  • Sever TT-Train's dependency on TT-Metalium's tests
  • [TT-Train] Update generate of LLM
  • [TT-Train] Add bias=false in LinearLayer
  • TT-Fabric Bringup Initial Check-in
  • #0: Sanitize writes to mailbox on ethernet cores.
  • Add Llama11B-N300 and Llama70B-TG (TP=32) to LLM table in README.md
  • [skip ci] Update llms.md
  • Update test_slice.py
  • #16625: Refactor tracking of sub-device managers from Device to a new class
  • Update code-analysis.yaml
  • [skip ci] Update llms.md
  • Remove references to LFS
  • Fixes for conversion to row major for 0D and 0-volume tensors
  • #0: Disable BH tools test at workflow level
  • Removing some usages of LegacyShape, improve Tensor::to_string
  • [skip ci] Fix lint on a doc
  • #0: API Unification for Device and MeshDevice
  • Port ttnn::random and uniform from LegacyShape to SimpleShape
  • #16379: make softmax call moreh_softmax if rank above 4
  • #7126: remove skip for test_sd_matmul test
  • #0: Make device an optional parameter in the tensor distribution API
  • Added build-wheels to fast-dispatch-build-and-unit-tests-wrapper.yaml
  • Adding CCL Async test cases to TG nightly and bug fix
  • #11119: Move op_profiler.hpp under the ttnn folder
  • #15979: Switch to google benchmark for pgm dispatch tests
  • [tt-train] Add weight tying option for NanoGPT demo
  • #0: Fix build of test_pgm_dispatch
  • [tt-train] Update serialization of tensor for DDP
  • #0: Fix failing TG regression tests
  • [skip ci] Update llms.md
  • Add tiled interleaved permute for when width dimension doesn't move (row-major tiled invariant)
  • Add Fabric Router Config to Hal
  • [skip ci] Update llms.md
  • Reflect ARCH_NAME Changes in CI Workflows
  • [skip ci] Update llms.md
  • #0: Migrate pytensor to use from_vector Tensor creation APIs
  • Afuller/metalium api reorg
  • Ngrujic/sweep tests 3
  • #0: Enable nlp create heads tests on BH
  • Fix to_layout shard bug
  • Fix broken link to host API
  • Add noc flag to test stress noc mcast
  • Set codeowners for transformer ttnn ops
  • #15450: Remove default value for ocb argument in LLK compute API
  • Linking tensor.reshape to ttnn.reshape
  • #16646: Fix dangling reference in sharded tensor args
  • #15450: Remove default values from circular buffer parameters in LLK compute APIs: Transpose and Reduce
  • Add new python api to get architecture name
  • Remove base.hpp
  • [tt-train] Change weights initialization for GPT-2
  • [skip ci] Update llms.md
  • Fuse residual add with layernorm
  • [TT-Train] Add multidevice support to dropout
  • #16171: Preload kernels before receiving go message
  • #15450: Remove default values from circular buffer parameters in LLK compute APIs: Test Kernels
  • #16366: Changed kernel config to HiFi4 for 32F matmul
  • Add nightly APC run in debug mode
  • [skip ci] Update llms.md
  • [skip ci] Update llms.md
  • Remove some ARCH_NAME ENV usage at runtime
  • Move out tensor storage into a separate .hpp/.cpp
  • #16460: Add more helpful error message when tt-topology needs to be run
  • Make creation functions use SimpleShape, expose SimpleShape to Python
  • #16242: Initial implementation of MeshBuffer
  • Enable use-override check
  • Privatize Taskflow
  • Fix test_new_all_gather.py regressions caused by API unification between Device/MeshDevice
  • Fix CB allocation warnings from ttnn.reshard
  • Optimize upsample for bilinear mode
  • Remove Shape usage from MultiDeviceStorage
  • Remove redundant bank offset from destination address in ttnn.reshard
  • Add option to raise error on failed local/global tensor comparison
  • Padded Shards for Concat Support
  • #0: Add support for tracing some sub-devices while others are still running programs
  • #16769: Bring up all reduce async as a composite op and add a Llama-shape CCL test sweep
  • #0: Lower Size to metalium as Shape2D
  • #15976: Ensure reports insert all devices into the devices table
  • Modify UNet Shallow to return output in CHW channel ordering
  • #16758: Optimize usage and implementation of encode/decode tensor data
  • Device to Device profiler sync
  • Templating and Queue Size Adjustments for Packet Queue
  • Refactor Superset model benchmarking tools to use Pydantic classes and save one json
  • #16078: Fix back-to-back calls of ttnn.close_device()
  • #16434: DPRINT to read buffer once
  • Bring Taskflow from CPM
  • This file seems to be kernel-only
  • Minor SDPA optimizations
  • #15450: Remove default values from circular buffer parameters in LLK compute APIs: Eltwise Unary
  • Fix scaling issue with RT arguments in tilize/untilize with padding
  • Make stress noc mcast test respect physical coordinates + allow option to skip mcaster
  • Fix some shapes for Prefetcher + Matmul, Use Multi-device Global CB
  • Do not build UMD tests
  • Move risc_attribs back to hw/inc
  • Re-enable UNet Shallow trace+2CQ test case
  • Upgrade error message in control plane
  • #15824 Workaround LLK issue in max_pool
  • [skip ci] Fixed TG configuration description in documentation
  • #0: Update pgm_dispatch_golden.json
  • #0: Fix stack overflow in eth tun
  • #0: Refactor enqueue_write_buffer
  • #0: Add skip for mnist tests because I can't take this anymore
  • #0: Remove SetLazyCommandQueueMode from Metal API
  • #16868: Update profiler post-proc asserts that were tripping due to kernel preload
  • #16350: Update reciprocal docs
  • [skip ci]: Update INSTALLING.md
  • Remove sharded_to_interleaved workaround in UNet Shallow
  • Add CI job for running models in comparison mode
  • Expose MeshDevice::reshape via pybind
  • #0: Update sweeps README
  • Work around issue #16895; fix PCC checking for wormhole in Resnet50 demo