
v0.55.0-rc18

Pre-release
github-actions released this on 25 Jan 02:06 · 138 commits to main since this release

Note

If you are installing from a release, please refer to the README, INSTALLATION instructions, and any other documentation packaged with the release, not to the versions on the main branch; there may be differences between the latest main and the previous release.

The changelog follows, showing the changes since the last release.

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/12960537424

📦 Uncategorized

  • Remove ARCH_NAME from host library code
  • #12253: Implement Batch norm operation for inference mode
  • #16443: Add a programming example of vecadd_multi_core and gtest
  • Enable to/from torch tests for 0D/1D tensors
  • Port all data movements ops to compute_output_specs
  • #15246: Add sweep tests for addcdiv, addcmul, rdiv, rsub, ceil
  • Fix build break
  • Logical sharding for input tensor and halo output
  • #16495: reduce grid for falcon7b mlp matmul
  • Stress NOC mcast test
  • [skip ci] Update subdevice doc
  • Read from and write to partial buffer regions for interleaved buffers where the offset and size of the specified buffer region are divisible by the buffer page size
  • Fix resnet large on GS
  • Fix Pre-allgather Layernorm bad PCC when use 1D reduction
  • #16353: Skip zero-volume tensors
  • Create README.md
  • Update README.md
  • #16367: Added support to enable dram and l1 memory collection without saving to disk
  • Update .clang-format-ignore
  • Tweak BH csrrs init code
  • #0: Clean up confusing refs to Greyskull from ttnn.copy error messages.
  • Update perf and latest features for llm models (Jan 13)
  • Update README.md
  • #16657: Fix to_layout conversion into row major for 1D tensors
  • Tilize with val padding results in L1 cache OOM
  • #0: Fixes from commit ae61802
  • #0: Skip build-docker-image during post-commit code-analysis since the docker image is already built in a previous job
  • Generate test executables per architecture
  • #16587: Update UMD submodule commit for P150 compatibility
  • Replace some instances of Tensor::get_shape with get_logical_shape
  • Update METALIUM_GUIDE.md
  • #16621: Add barriers at end of cq_dispatch_slave.cpp on IERISC
  • Finish porting OPs to compute_output_specs
  • ScopedGraphCapture
  • #15756 Pull in BH LLK fix for maxpool hang
  • #15246: Add sweep tests for logical_and, logical_or, logical_xor
  • #0: (MINOR) Bump to v0.55.0
  • #11512: Add sweeps for eltwise sharded ops 3
  • Add sweeps for unary, unary_sharded and binary_sharded versions of ops: fmod, remainder, maximum, minimum.
  • Don't leak tt_cluster.hpp through kernel_types.hpp
  • #6983: Re-enable skipped TT-NN unit test
  • #15450: Remove default values from circular buffer parameters in LLK compute APIs
  • Update build flag in the programming examples docs
  • Fix for P100 board type
  • Sever TT-Train's dependency on TT-Metalium's tests
  • [TT-Train] Update generate of LLM
  • [TT-Train] Add bias=false in LinearLayer
  • TT-Fabric Bringup Initial Check-in
  • #0: Sanitize writes to mailbox on ethernet cores.
  • Add Llama11B-N300 and Llama70B-TG (TP=32) to LLM table in README.md
  • [skip ci] Update llms.md
  • Update test_slice.py
  • #16625: Refactor tracking of sub-device managers from Device to a new class
  • Update code-analysis.yaml
  • [skip ci] Update llms.md
  • Remove references to LFS
  • Fixes for conversion to row major for 0D and 0-volume tensors
  • #0: Disable BH tools test at workflow level
  • Removing some usages of LegacyShape, improve Tensor::to_string
  • [skip ci] Fix lint on a doc
  • #0: API Unification for Device and MeshDevice
  • Port ttnn::random and uniform from LegacyShape to SimpleShape
  • #16379: make softmax call moreh_softmax if rank above 4
  • #7126: remove skip for test_sd_matmul test
  • #0: Make device an optional parameter in the tensor distribution API
  • Added build-wheels to fast-dispatch-build-and-unit-tests-wrapper.yaml
  • Adding CCL Async test cases to TG nightly and bug fix
  • #11119: Move op_profiler.hpp under the ttnn folder
  • #15979: Switch to google benchmark for pgm dispatch tests
  • [tt-train] Add weight tying option for NanoGPT demo
  • #0: Fix build of test_pgm_dispatch
  • [tt-train] Update serialization of tensor for DDP
  • #0: Fix failing TG regression tests
  • [skip ci] Update llms.md
  • Add tiled interleaved permute for when width dimension doesn't move (row-major tiled invariant)
  • Add Fabric Router Config to Hal
  • [skip ci] Update llms.md
  • Reflect ARCH_NAME Changes in CI Workflows
  • [skip ci] Update llms.md
  • #0: Migrate pytensor to use from_vector Tensor creation APIs
  • Afuller/metalium api reorg
  • Ngrujic/sweep tests 3
  • #0: Enable nlp create heads tests on BH
  • Fix to_layout shard bug
  • Fix broken link to host API
  • Add noc flag to test stress noc mcast
  • Set codeowners for transformer ttnn ops
  • #15450: Remove default value for ocb argument in LLK compute API
  • Linking tensor.reshape to ttnn.reshape
  • #16646: Fix dangling reference in sharded tensor args
  • #15450: Remove default values from circular buffer parameters in LLK compute APIs: Transpose and Reduce
  • Add new python api to get architecture name
  • Remove base.hpp
  • [tt-train] Change weights initialization for GPT-2
  • [skip ci] Update llms.md
  • Fuse residual add with layernorm
  • [TT-Train] Add multidevice support to dropout
  • #16171: Preload kernels before receiving go message
  • #15450: Remove default values from circular buffer parameters in LLK compute APIs: Test Kernels
  • #16366: Changed kernel config to HiFi4 for 32F matmul
  • Add nightly APC run in debug mode
  • [skip ci] Update llms.md
  • [skip ci] Update llms.md
  • Remove some ARCH_NAME ENV usage at runtime
  • Move out tensor storage into a separate .hpp/.cpp
  • #16460: Add more helpful error message when tt-topology needs to be run
  • Make creation functions use SimpleShape, expose SimpleShape to Python
  • #16242: Initial implementation of MeshBuffer
  • Enable use-override check
  • Privatize Taskflow
  • Fix test_new_all_gather.py regressions caused by API unification between Device/MeshDevice
  • Fix CB allocation warnings from ttnn.reshard
  • Optimize upsample for bilinear mode
  • Remove Shape usage from MultiDeviceStorage
  • Remove redundant bank offset from destination address in ttnn.reshard
  • Add option to raise error on failed local/global tensor comparison
  • Padded Shards for Concat Support
  • #0: Add support for tracing some sub-devices while others are still running programs
  • #16769: Bring up all reduce async as a composite op and add a Llama-shape CCL test sweep
  • #0: Lower Size to metalium as Shape2D
  • #15976: Ensure reports insert all devices into the devices table
  • Modify UNet Shallow to return output in CHW channel ordering
  • #16758: Optimize usage and implementation of encode/decode tensor data
  • Device to Device profiler sync
  • Templating and Queue Size Adjustments for Packet Queue
  • Refactor Superset model benchmarking tools to use Pydantic classes and save one json
  • #16078: Fix back-to-back calls of ttnn.close_device()
  • #16434: DPRINT to read buffer once
  • Bring Taskflow from CPM
  • This file seems to be kernel-only
  • Minor SDPA optimizations
  • #15450: Remove default values from circular buffer parameters in LLK compute APIs: Eltwise Unary
  • Fix scaling issue with RT arguments in tilize/untilize with padding
  • Make stress noc mcast test respect physical coordinates + allow option to skip mcaster
  • Fix some shapes for Prefetcher + Matmul, Use Multi-device Global CB
  • Do not build UMD tests
  • Move risc_attribs back to hw/inc
  • Re-enable UNet Shallow trace+2CQ test case
  • Upgrade error message in control plane
  • #15824 Workaround LLK issue in max_pool
  • [skip ci] Fixed TG configuration description in documentation
  • #0: Update pgm_dispatch_golden.json
  • #0: Fix stack overflow in eth tun
  • #0: Refactor enqueue_write_buffer
  • #0: Add skip for mnist tests because I can't take this anymore
  • #0: Remove SetLazyCommandQueueMode from Metal API
  • #16868: Update profiler post-proc asserts that were tripping due to kernel preload
  • #16350: Update reciprocal docs
  • [skip ci]: Update INSTALLING.md
  • Remove sharded_to_interleaved workaround in UNet Shallow
  • Add CI job for running models in comparison mode
  • Expose MeshDevice::reshape via pybind
  • #0: Update sweeps README
  • Work around issue #16895; fix PCC checking for wormhole in Resnet50 demo