
Releases: tenstorrent/tt-metal

v0.51.0-rc3

16 Jul 02:20
e1835e2
Pre-release

📦 Uncategorized

  • Migrate Pad Device and All references
  • #0: Multi-CQ support for R-Chip
  • #10028: Remove skip and reduce test case for moreh_groupnorm test
  • #10005: Change input tensor parameter to optional in moreh_sum_backward
  • #10004: Revise bias tensor usage in moreh_linear_backward
  • #9663: support moreh_nll_loss_unreduced
  • #8865: Switch ported ops from tt_lib to ttnn for host dispatch time m…
  • #0: Update README.md grammar for idiomatic description of TT-NN
  • #9767: removed more no longer needed manually specified attributes for reflection
  • Add distributed layernorm kernel documentation
  • #10031: Fix -Werror=return-type error in composite_ops
  • #9492: update matmul path in CODEOWNERS
  • #9450: change silicon fixtures to session scope
  • Uplift UMD to grab support for configuring static TLBs and Hugepage for BH
  • #9441: add all typecasts to unit test
  • #9801: Add cb alignment fix for blackhole that was missed in rebase
  • #9973: Fix addrmod for reduce scalar, port over missing narrow tile c…
  • #10052: Add metal pack untilize test
  • Add ttnn matmul tests to TG unit tests
  • Add ssm_prefix_scan test coverage for N=16
  • Add PyBind to TTNN Slice (Formerly Referred to as Unpad in TT Lib)
  • #8450: Cleanup items pending from PR #9068
  • #10030: fix moreh_nll_loss hang
  • #7736: Remove unused reduce dim & type from reduce_init*
  • #9871: Update backward files
  • #9874: Move Unary Backward ops to TTNN
  • Update op_perf_results
  • #9962: Enable flags for profiler globals in jit build
  • Added prefill mode for mamba modules
  • Increase timeout for Mamba full model tests
  • Support multiple user indices in paged_update_cache
  • #10085: Make ttnn::Buffer deallocate execute without querying a potentially destroyed buffer instance
  • Pack runtime arguments across brisc/ncrisc/trisc
  • Llama Demo Refactor
  • #5424: Delegated sfpu reciprocal calls to wh_b0 submodule functions
  • #0: Move t3k demo tests to perf pipeline because it requires perf governor
  • #5424: Delegated sfpu reciprocal calls to gs submodule functions
  • Add trace and multi cq implementations/tests for WH Resnet
  • #0: (MINOR) Update to v0.51.0
  • #0: bump python3.8 venv versioning since apt repos updated
  • #10099: fix semaphores init for packet mux/demux
  • #10112: Drop hard pin for installation instructions for python3.8-venv in dependencies
  • Revert "#5424: Delegated sfpu reciprocal calls to wh_b0 submodule functions"
  • #0: Remove stray assert forcing single CQ on R-Chips
  • #9490: Replace tt_dnn op's usage in C++ with TTNN
  • #9874: Merge Next set of unary backward ops to TTNN
  • #10073: Move unary backward ops to TTNN
  • Unary backward op migration
  • #10087: update tt-umd submodule
  • #9959: Migrated pad to ttnn sweeps
  • Adding distributed layernorm to llama prefill
  • Add pytest xdist multiprocess to single-chip demo tests
  • Revert "Revert "#5424: Delegated sfpu reciprocal calls to wh_b0 submodule functions""
  • #10071 : Move second set of Unary Backward ops to TTNN
  • #10083: added tt::stl::json::to_json and tt::stl::json::from_json
  • #10086: Add logic for splitting cmds that exceed the subcmd limit into separate cmds for semaphores
  • #5424: Delegated sqrt api call to thirdparty gs submodule sqrt call
  • #5424: Delegated sfpu api call to sqrt for wh to submodule sqrt call
  • #0: Fix galaxy eth dispatch init to only init the specified number of cqs (galaxy only supports single cq)
  • Fix undefined memory bug in ssm_prefix_scan
  • removed weight copies from DRAM to L1
  • fix syntax issues with test dispatch workflow
  • #9609: Reorganize libs into ttnn
  • #10165: Fix build error with g++-12
  • Adding support for dram sharded matmuls
  • #10076: Migrate Unary bw ops and replace tt_eager ops with ttnn ops
  • #10072: Move next set of Unary Backward ops to TTNN
  • #9082: ping individual falcon member since slack user group is not wo…
  • #8681: Add Floor, Trunc blocker ops
  • #9419: use memcpy to avoid mem misalignment
  • #10079: Move Unary Backward ops to TTNN
  • Migrate unary ops to TTNN
  • #9945: Skip SD for nightly FD, device perf tests, and single-card demos as it hangs on di/dt
  • #10045: use struct for matmul parameter passing and update doc string
  • #10045: remove use_1d_systolic_array from ttnn matmul
  • Ngrujic/profiling
  • #9319: Upload benchmark data for t3k falcon 7b tests
  • Aliu/build opt
  • #10107: Fix hangs w/ launch_msg size >32bytes
  • [CCL] Making buffer size dynamic to input slice
  • #7617: remove failing experimental model test
  • #7618: delete failing experimental model test
  • #0: fix prefill CI for mamba
  • Move Mamba tests to wh_b0_only_eth pipeline
  • #9747: Implement ttnn::tilize in C++
  • Aliu/prevent aho tanking
  • #10045: fix up missed parameter change in mamba block model
  • #9490: Added ttnn support for unary ops py file
  • #10101: [Blackhole Bringup] Revert Zeroacc to legacy behaviour
  • Update README.md
  • #0: Fix imports after tt_lib change
  • #10226: [Blackhole Bringup] Add new sfpu files
  • Suppress g++-12 build errors with -Wno flags
  • #0: Fix BH regression caused by unaligned L1_UNRESERVED_BASE
  • #10077: Migrate Unary comparison backward ops to TTNN with Overloading
  • #10175: Remove std::function and restructure ternary_bw
  • Falcon40b attn mask optimization
  • #10074: Move Unary backward ops to TTNN
  • Replace all TT Lib Unpad with TTNN Slice (see the sketch after this list)
  • #10082: Migrate unary bw ops to TTNN and remove std::function
  • #9715: Use build artifacts for profiler tests
  • #9021: adding resnet api into ci.
  • Update README.md
  • Move pad_on_host/unpad_on_host to host function in TTNN
  • #9874: Move polygamma_bw to TTNN
  • #5337: increase t3k frequent test timeout
  • Update falcon40b readme
  • #0: add layernorm rmsnorm pybind, move to ttnn
  • #0: Re-enable read cache in llama_model_optimized.
  • Update Mistral/Mixtral README files
  • #0: Update LLama2/3 readme with demo details
  • #0: resnet perf fix
  • Update Mamba README.md
  • OPT convs in RN50 to get better device perf
  • Increase timeout for N300 WH-only model pipeline
  • Prefill+Decode Demo Functional Implementation
  • [Falcon7b] Add wormhole demo perf mode and output verification tests
  • Update Falcon7/40b READMEs with details on model functionality and perf-mode
  • bump python 3.8 venv package version
  • Git bisect workflow on CI runners
  • #9613: scaffolding for weekly scheduled t3k perplexity tests
  • fix syntax issue with bisect script
  • #10231: Clean up t3k runs-on tags to minimum
  • #9490: Remove tt_eager unary ops and bindings
  • only build for arch that a dispatched workflow is running for
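
Several entries in this list migrate the old tt_lib unpad op to ttnn.slice (e.g. "Replace all TT Lib Unpad with TTNN Slice", flagged above). Below is a minimal sketch of the Python-side call; the shapes are illustrative, and the exact signature (in particular whether end coordinates are inclusive or exclusive) is an assumption to verify against the pybind added in this release.

```python
import torch
import ttnn

# Illustrative sketch only: slice a 1x1x64x64 tensor down to 1x1x32x32.
# Assumes ttnn.slice takes per-dim start (inclusive) and end (exclusive)
# coordinates; the old tt_lib unpad used inclusive ends, so verify against
# the pybind for this release.
device = ttnn.open_device(device_id=0)

x = ttnn.from_torch(
    torch.randn(1, 1, 64, 64),
    dtype=ttnn.bfloat16,
    layout=ttnn.TILE_LAYOUT,
    device=device,
)
y = ttnn.slice(x, (0, 0, 0, 0), (1, 1, 32, 32))  # assumed signature
print(ttnn.to_torch(y).shape)  # expected: torch.Size([1, 1, 32, 32])

ttnn.close_device(device)
```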

v0.51.0-rc2

15 Jul 02:19
Pre-release


v0.51.0-rc1

11 Jul 02:01
07aacde
Pre-release


v0.50.0

10 Jul 22:04
f7c10a2

📦 Uncategorized

  • Fix issue with Mamba SSM A weight preprocessing
  • Make build key unique for mmio and remote devices with same harvest mask
  • #5337: Removed eth_dispatch yaml flag from mistral tests
  • New workflow for custom test dispatch on CI runners
  • #9312: Add single-header boost-ext/reflect library as dependency
  • Opt LayerNorm/RMSNorm with 2D reduce
  • Revert "#8630: support uint8 data type"
  • #0: Fix codeowners for metal bert
  • Revert "Revert "#8630: support uint8 data type""
  • #9642: fix matmul2d in1 sharded with batch>1
  • #0: add tile layout support for GN
  • FD2 packed binary commands
  • #9082: t3k demo with slack notifications for owners. split jobs
  • Rtawfik/issue 9142
  • #9688: Remove redundant left shift in DEBUG_SANITIZE_NOC_READ_TRANSACTION_FROM_STATE
  • #9500: Update eth_interface include in tt_cluster to not be hardcoded for WH
  • #9578: Add WITH_PYTHON_BINDINGS option to allow build w/o python
  • #9587: Update CB and worker Go signals to respect max sub cmd limit introduced by dispatch packed write local copy change
  • Add support for bfloat4 weights in Mamba
  • Use in-place binary operations in Mamba block
  • #5337: Relaxed Mistral expected compilation time in CI by 1 sec
  • Mo/9406 profiler build flags
  • Add support for single col/row/core output grid for matmul 2D
  • #9725: Set release candidate releases on GitHub to pre-release, not draft, to enable downstream users
  • add tagged docker image with releases
  • Rtawfik/issue 9164
  • #5562: resolve reduce scatter issues (nd hang and correctness)
  • Create benchmarking tools for saving run/measurement data (with Falcon7b example) and model-demo utilities for verifying tokens/perf
  • #0: Fix bug with var name in single-chip falcon7b demo tests
  • #9735: fix issues with including reflect library
  • #9527: Remove usage of bcast where multiply is used
  • Mchiou/9082 slack notification owners
  • #9681: set name attribute for ttnn operations when fast runtime m…
  • #9553: Add prefix scan op for Mamba prefill
  • #9628: Merge Binary backward ops from tt_eager to TTNN
  • Namhyeong kim/support fp32 dest acc in moreh adam
  • #0: Update t3k workflow timeouts (except freq pipeline)
  • Temporary update Mixtral perf times to pass CI
  • #9479: fix cpu core worker bug
  • #4858: add typecast fp32 <-> int32
  • #0: ViT demo fix
  • #9389: Add support for integer type in sum operation
  • Transfer llama2/3 from experimental to demo folder.
  • #9657: add topk multicore to support larger dimension sizes
  • #4858: add typecast bfp8_b
  • #9082: t3k model perf split tests with slack notifications, disabled cnn
  • #0: Add ttnn/cpp to packages to enable using ttnn kernels in tt_eager ops
  • #9741: Set stricter pytest timeouts
  • #9492: Change models matmul usage to ttnn
  • #9778: test prefetcher hanging with changes to test
  • #9490: TTNN eltwise/unary migration
  • Update timeout for falcon40b t3k demo test
  • #0: Remove extra t3k falcon40b matrix test group
  • #9044: Move dispatch core x y to be part of launch msg
  • Modify rot mat each iteration to avoid allocating 10k tensors upfront
  • Optimize bcast sharded op
  • Start using reflect library
  • #0: Properly delete source folders for wheel testing
  • #9479: Update Mixtral perf estimates
  • #0: Added github community issue workflow
  • #8729: Pytest multiprocess reset infrastructure
  • Enable switching between 1 and 2 cqs in the same process
  • Fixed failing tests for SD Conv tests for WH using new conv
  • #0: Switch org-membership check to an authenticated call
  • #0: Decrease num loops in trace stress tests
  • #9628: Support optional return tensor
  • #0: Use CV to wait for cq_reader in production mode. Remove enqueue_record_event for NB calls
  • #9628: Merge second set of binary backward op from tt_eager to TTNN
  • #0: Bump bert compile time threshold since it's been intermittently failing on ci
  • Mchiou/9792 t3k runner management
  • #0: Bump up Bert inference time due to instability on ci
  • #8865: For host dispatch time measuring, increase failing reference t…
  • #9484: Add output_tensor queue_id to dependency ops
  • Adding the new op: Flash Decode!
  • #0: Add missing permissions to issue notification job
  • #9275: Fix Falcon7b demo failing to run by default on a Grayskull e75
  • #9801: Account for 64B BH PCIe alignment in cq cmd sizing
  • #0: Make prefetcher early exit after fetching/reading exec_buf
  • #8683: Add Unary bitwise AND, OR
  • Ngrujic/profiling
  • #9628: Merge third set of binary backward op from tt_eager to TTNN
  • #4858: add typecast uint32
  • Migrate Pad Host Code, Bindings, C++ Usages from TT Eager to TTNN
  • Support longer sequence lengths in ssm_prefix_scan
  • #9709: Add optional transpose_a and transpose_b to ttnn matmul and linear (see the sketch after this list)
  • #0: Only run batch 12 bert for GS profiling and tighten some bert/resnet thresholds
  • Asarje/resnet highres 20240624
  • #9492: replace falcon specific matmul calls
  • Extend ssm_eltwise_mul for num_users > 32
  • Update documentation for adding new ttnn operation
  • Extend ssm_1d_reduce for the batch>32
  • #0: rn50 fix add api
  • #9123: Add support for optional output tensors to run in the worker t…
  • #9861: support check_tensor helper_function
  • Fix syntax issues in custom test dispatch workflow
  • Add Mixtral accuracy tests and cleanup its other tests (CI-friendly)
  • #9876: Increase timeout on falcon7b perplexity tests.
  • #9492: Remove bmm/resnet_matmul from models
  • #9410: enable fp32 precision unpacking for interm. CBs
  • #9903: Fix conditional statements and indexing of y values in CoreRange::diff
  • #9860: fix test create device apis
  • #0: delete unused code
  • #9719: fixed l1 clear issue on nlp create qkv heads decode test case
  • Fixing typo in llama demo readme
  • #9892: Device only op report
  • #8704: define consts for registers that hold x-y coordinates and amount to shift address to get x-y coord
  • CODEOWNERS update
  • Abhullar/bh misc fix
  • Auto-register C++ ttnn operations in python
  • #9788: Remove TopK from TTLib and replace all references with the TTNN api
  • #0: add owners for resnet demo
  • 7-way split of eager tests
  • #9910: Improve Softplus kernel accuracy
  • #9818: Add cache check to op info V2
  • #0: update noc test bound
  • Fix branching bug in softplus kernel
  • propagate error upwards for tests in falcon 40b suite
  • #0: Fix falcon40b softmax import failure
  • #9755: move ttnn.concat to match the new file structure
  • #9837: Assign workers after performing ref count cleanup in async mode
  • #0: Make event_synchronize API safer
  • #0: Update buffer asserts to account for trace buffers
  • Clean up ttnn operation registration on python side
  • #9164: [Blackhole bringup] Add fix for unpack untilize
  • Aliu/no l1 clear
  • Restructure ttnn::permute to match the new standard format
  • #9815: Update host to pass packed write max unicast sub cmds to cq dispatch
  • Distributed layernorm op
  • #9831: re-enable test
  • #8835: cleaned up ttnn operation registration on C++ side
  • #9941: update dram/l1 to noc xy header to do the appropriate shift
  • #9336: Refactoring moreh layernorm
  • #9745: move unpad to slice ttnn cpp references
  • #9980: Update falcon updated outputs
  • Fix Main after Pad Merge
  • Update eltwise bcast unary ops to use memory_config and fix PCC issue for interleaved output
  • Update FD cmds to be PCIe aligned
  • Fix N150 product name to nebula_x1 even if it's unharvested.
  • #0: add a second codeowner for conv
  • #0: Get tt-metal to compile with gcc-12
  • #9492: Change to ttnn matmul in tests and tt_eager
  • #9441: add typecast uint16->uint32
  • Move ttnn::embedding to match new pybind structure and replace C++ ttlib embeddings usage with it
    ...
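
One entry in this list adds optional transpose_a and transpose_b arguments to ttnn.matmul and ttnn.linear (#9709, flagged above). A hedged sketch of how such flags are typically used follows; the keyword names come from the changelog entry, while the shapes and dtypes are purely illustrative.

```python
import torch
import ttnn

# Sketch of the optional transpose flags from #9709. The keyword names follow
# the changelog entry; treat the rest of the signature as an assumption.
device = ttnn.open_device(device_id=0)

a = ttnn.from_torch(torch.randn(1, 1, 64, 32), dtype=ttnn.bfloat16,
                    layout=ttnn.TILE_LAYOUT, device=device)
b = ttnn.from_torch(torch.randn(1, 1, 64, 32), dtype=ttnn.bfloat16,
                    layout=ttnn.TILE_LAYOUT, device=device)

# transpose_a=True computes a^T @ b: (32, 64) @ (64, 32) -> (32, 32)
out = ttnn.matmul(a, b, transpose_a=True, transpose_b=False)
print(ttnn.to_torch(out).shape)  # expected: torch.Size([1, 1, 32, 32])

ttnn.close_device(device)
```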

v0.49.0

12 Jun 14:05

📦 Uncategorized

  • #5044: Add optional output to addalpha
  • #9059: Fix matmul for single core grid
  • readme update
  • #0: (MINOR) Update to v0.49.0
  • #7586: Move common models for single-card nightly to ln model
  • Update Mamba README
  • TTLIB interval to sharded sweeps
  • #0: Update dataflow api comments
  • #9196: Merge new op: Fast reduce nc into main
  • #0: New resnet50 test skipped on WH since it's WIP
  • #9329: Restructure ttnn::argmax
  • #9323: Introduce template for new ttnn pull requests
  • #0: skip release build on GH runners, we already test it via build a…
  • Remove unused dependencies and fetch gtest via CPM
  • #8764: Part 3 of docs and model demos changes
  • Ngrujic/profiling
  • [Mistral-7B] Add flags for weight paths
  • Typecast int32->fp16b (see the sketch after this list)
  • #9258: Remove ARCH_NAME and TT_METAL_ENV from wheel testing
  • Implemented SD using new Conv API
  • #9258: Re-add wheel into release assets
  • #9361: Install Clang-17 and gdb 14.2
  • #7525: Re-skip demo batch 7 metal_BERT_large_11 on WH because it still hangs ND
  • #9206: add sfpu config reg init to llk sfpu inits
  • #9059: Avoid a couple of fatals in matmul
  • Add Galaxy support.
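
The "Typecast int32->fp16b" entry above adds an int32 to bfloat16 conversion. Below is a sketch under the assumption that the op is exposed as ttnn.typecast taking a target dtype; at this point in the history it may still have lived under tt_lib. Values beyond bfloat16's 8-bit mantissa lose precision, which is inherent to the conversion.

```python
import torch
import ttnn

# Sketch of an int32 -> bfloat16 ("fp16b") typecast. Assumes a ttnn.typecast
# entry point taking the target dtype; at this release the op may still have
# been exposed via tt_lib instead.
device = ttnn.open_device(device_id=0)

ints = ttnn.from_torch(
    torch.arange(32 * 32, dtype=torch.int32).reshape(1, 1, 32, 32),
    dtype=ttnn.int32,
    layout=ttnn.TILE_LAYOUT,
    device=device,
)
floats = ttnn.typecast(ints, ttnn.bfloat16)  # assumed signature
print(ttnn.to_torch(floats).dtype)  # expected: torch.bfloat16

ttnn.close_device(device)
```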

v0.48.0

10 Jun 18:09

📦 Uncategorized

  • #7744: Add support for non-4D tensor in moreh_sum, moreh_sum_backward
  • #5544: Add output tensors parameter to moreh_nll_loss op
  • #5544: Add output tensors parameter to moreh_sgd op
  • #5544: Fix package build error
  • #5544: Add output tensors parameter to moreh_linear op
  • #5544: Prevent eager unit test failures
  • #7997: Support non-4D tensor in moreh_softmax
  • #7816: Bump SD perf target
  • #8098: Remove temp buffer copying when reading from hugepage to host buffer
  • #0: Specify DEBUG_STATUS as a string literal instead of multiple chars
  • #8212: Fix uneven shards for interleaved_to_sharded op
  • #0: Refactor unpad tile to modify rt args in place and remove dynamic…
  • #7838: Add support for non-4D tensor in moreh_linear OPs
  • #0: Use split_work_for_tilize in both tilize and untilize
  • #8131: resnet-50 fix for b20.
  • Add support for multiple parameters in EltwiseUnary
  • #7625: Enable multicore for tilize with padding by default
  • Trace Support
  • #0: Switch set runtime args assertion for if kernel was placed on core to TT_ASSERT
  • #7179: enabling test case. The issue was not reproducible on 8.12 dri…
  • #4625: Multicore runs for untilize with unpadding on interleaved tensors
  • #0: Cache program cmds, convert cb configs from write linear to write packed
  • #0: Make skip and xfail optional in defining sweep tests
  • Shwetank tt/bcast op
  • #8364: Disable implicit fallback for ttnn.pad
  • #8513: Add slack notifications to several more pipelines
  • #0: Update common RT args to use no stride flag for packed cmd.
  • #0: Option to write compile_commands.json from CMake
  • #8718: eltwise testing for bfloat8
  • Add support for bfloat8 input tensors in Mamba SSM block custom kernels
  • #8460: Enable Clang-17
  • #0: Remove overhead in calling functions wrapped in tensor_impl_wrapper
  • #0: Updating the perf threshold to incorporate the "Merge back uneven reshard" commit.
  • #6365: Add ttnn host tests
  • #6365: Revert "#6365: Add ttnn host tests (#8210)"
  • #4382: fix GH reported vulnerabilities
  • #0: bump C++ timeout limit to 45 minutes
  • update unpad doc for slice generality
  • Convert Falcon7b tt_lib ops and tensors to ttnn.experimental
  • #6365: Fix ttnn host wheel tests
  • Add git bisect script
  • #0: Move falcon40b ci unit tests to different pipeline
  • #8437: remove default matmul program config
  • #0: Add myself to ttnn codeowners
  • #0: Update README.md to include mention of TTNN_CONFIG_OVERRIDES (see the sketch after this list)
  • #0: Fix typos and add TTNN_CONFIG_OVERRIDES parameter descriptions to readme
  • #0: Add basic sanity checks during matmul program config creation
  • #8907: Sweep tests for tilize/untilize
  • #8902: Fixed program caching bug in nlp load slice op and added additional test cases for the op
  • #8917: Add sweep test for the fold op
  • #0: Properly support trivial single core case for 1D matmuls
  • #6343: updated test_perf with test for bloom causal_lm
  • #6343: Add functional_bloom test_demo
  • Update README.md
  • Enable optimised attention by default in falcon prefill.
  • Replace FreeList shared_ptr with local_shared_ptr
  • Add dummy_weights mode for mixtral tests
  • Refactor operation calls: Replace operation::run() with operation::launch_op()
  • Use HiFi2 to bump Falcon7b prefill PCC
  • #8902: add input and attn_mask del
  • #8930: Disable llama perf test
  • #0: Add third codeowner to matmul path
  • #0: Add create_venv.sh as environment option in installation instructions
  • #7083: Composite conv fix for relu called after matmul
  • #7525: Skip batch 7 metal BERT on WH B0 because it still hangs too often
  • #8871: Add initial infra/support for dram sharding
  • #8531: delete all makefiles
  • #0: Delete dead code from work_split.hpp
  • #8853: Uplift SFPI to latest w/ BH support
  • #8725: Warn user if kernel cache is enabled
  • #0: Minor test_prefetcher fixes
  • #5389: Move ttnn.repeat to c++
  • #8131: temp fix for PCC issue on W0.
  • Optimize Falcon40b e2e perf by modifying layernorm
  • #0: Relax Falcon7b perf target
  • #0: Resolve segfault in llama async mode
  • Resnet Optimizations
  • Create Falcon7b perplexity test and utility functions for text-gen datasets
  • Revert "#8131: temp fix for PCC issue on W0."
  • bmm dram sharded opt
  • #8943: Clean up profiler python_env build flow
  • #8904: Add slack notifications for T3000 unit-tests
  • Add unet shallow functional, performance and demo test files
  • #8932: Multi-Device Mixtral Argmax Support
  • #8264: Worker thread optimizations:
  • TTNN tests for bf8 with mk tiled scalar
  • Ihamer/7468 inject noc delays
  • Support changed csv row orderings in Mixtral's op_perf_results.py
  • Correct merge issue in op_perf_results.py
  • #0: Add kernel groups to test_pgm_dispatch
  • #0: Add docs requirements to python env cache key because it can change the environment as well
  • #0: Add helper function to create CBs
  • #8973: Remove TT_METAL_ENV because we don't need it anymore
  • #5773: Move SD model to demo folder
  • #6938: Implement softplus as a single kernel
  • Model team/rotary embeddings llama
  • #8735: Fix hw/inc/blackhole files for compilation
  • Improve Mixtral perf with ttlib
  • Update README.md
  • #3712: fix old version of GN test
  • #0: Don't error on unused functions in compiler call
  • Revert " #8904: Add slack notifications for T3000 unit-tests"
  • Rtawfik/bh llk api
  • #0: Added interactive demo
  • Move Falcon7b before Mixtral in demo pipeline to workaround issue
  • #8112: Add support for ND tensors to matmul
  • #0: fix dram read benchmark
  • Fix bug in utility_functions::Profiler
  • Remove 1x1 matmul fallback on convolution and generalize convo…
  • #5389: Remove ttnn.split
  • #8767: decouple build folder name from build.cpp
  • #8735: Update common flags for BH build after sfpi module update
  • #8895: Fix ttnn.as_tensor(..) method for placing tensors on-device
  • #8539: Add cq_id to run_operation function args
  • #8632: Support fp32 dest acc en in moreh_sum and moreh_sum_backward
  • #5044: Add optional output tensor and remove autoformat in eltwise binary ops
  • #8895: Fix failing regression test in dump_tensor(...) API
  • More Resnet Optimizations
  • #4858: add typecast fp32 to uint32 op
  • #8995: refactoring moreh arange
  • #0: Add ccache option to build_metal.sh
  • Update Mixtral perf figures
  • #8349: Use BFP4_B for attention mask in falcon7b optimised prefill.
  • #0: Add CODEOWNERS for build_metal.sh
  • Rtawfik/add binary reuse metal
  • Update watcher.rst - use double backticks
  • Falcon40b tt_lib to ttnn.experimental
  • #0: fix dram sharded program cache
  • #7083: New halo fix for enabled program cache
  • #9051: Enable Llama model perf test
  • #8764: Single card WH demo tests
  • #8764: Various docs fixes for WH release
  • #0: Correct script locations for nightly single card
  • #8764: Use new device_l1_small_size fixture for SD demo interactive test
  • #9059: Update matmul test pcc
  • #0: Ensure weka mount is active for demo tests otherwise it won't run
  • #0: remove reserve to avoid bad alloc
  • #8764: Separate n150/n300 demo tests to not run BERT 11 on N150
  • Remove unnecessary llk sfpu param files
  • #9059: Add fallback for getting matmul program config
  • Add grouped convolution support
  • #8282: Support non-4d tensor and fp32_dest_acc_en for moreh nllloss backward
  • #8976: moreh_getitem receive signed integer index tensors
  • #9049: fix moreh_sgd callback and add callback test
  • #0: Remove argmax multi-device test due to segfault
  • #7724: Add prototype for autonomous streams for use in tunneller
  • #9036: GS & BH --> Combine llk param files using variable args
  • #0: optimize allgather for small tensor sizes
    ...
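
Two README entries above document the TTNN_CONFIG_OVERRIDES environment variable (flagged in the list). A hypothetical sketch of setting such an override follows; the JSON keys shown are illustrative assumptions, so check the README for the actual set supported by this version.

```python
import json
import os

# Hypothetical sketch of TTNN_CONFIG_OVERRIDES as documented in the README
# entries above: a JSON blob that overrides ttnn's configuration when ttnn
# is imported. Both keys below are illustrative assumptions.
os.environ["TTNN_CONFIG_OVERRIDES"] = json.dumps({
    "enable_fast_runtime_mode": False,  # assumed key
    "enable_logging": True,             # assumed key
})

import ttnn  # overrides are read at import time (assumed behavior)
```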

v0.46.0

05 Apr 13:57

📦 Uncategorized

  • user-triggerable C++ post-commit suite
  • #6406: add missing position_ids/attention_mask to bert demo
  • #6282: Add AdamW
  • #6315: Fix dprint tests for T3000
  • FD2: prefetch stall, dispatch wait, linear read, delay and cleanup
  • #6609: update wording in demo section of main README.md
  • #6364: Autocomplete for pybinded types
  • Asarje/ttnn rn50 b20
  • FD2.0 Test - Fix l1 buffer not page-size aligned after FD-on-eth changes to L1_UNRESERVED_BASE
  • #6593: Add resharding to Llama2 model when possible.
  • #6572: Fix ttnn.repeat_interleave example in documentation (see the sketch after this list)
  • #5780: Re-enable 100K enqueue program stress test on grayskull
  • Enable basic width sharding support in all-gather
  • Alex/metal/remove cb wait markers
  • #6657: Use sysmem manager cq size instead of recomputing it each time…
  • #0: (MINOR) Add Grayskull purchase link and update version to 0.46.0
  • #5063: add TopK API to metal
  • #5480: FD2.0 Test - Fix test_prefetcher for dram paged read test (-t 3) on whb0
  • Fix logit low pcc
  • Backward op - Fixed ldexp, hardsigmoid and asin
  • #6598: Fix softplus
  • Add support for BFP4_B tensor serialization
  • Eltwise mul for different batch size
  • #6575: Split docs into separate Metalium and nn docs
  • #0: Add two separate links for documentation (tt-metalium/ttnn) on README
  • #6361: Update ttnn repeat to use correct shapes when formatting output
  • #0: Sayonaraaaaaaa
  • FD2.0 Test fix test_prefetcher add_paged_dram_data_to_worker_data dropping start_page
  • #5785: Watcher ringbuffer implementation
  • Add FD 2.0 WriteHost Command
  • #0: Put back frequent api tests because I'm an idiot
  • Optimize All Gather Interleaved Worker send/receive
  • #0: changing all #include common/* to #include tt_metal/common/*
  • #6676: Fix issues related to unary lte and gte
  • #5817: Fix lerp
  • #6589: Fix for relu_bw
  • #6633: Backward test update
  • #0: Skip logit, logiteps test
  • #0: Testing CI fix
  • #5480: Update test_prefetcher to pass added hugepage args to dispatch kernel
  • Fix l1 acc, add whb0 optimized conv tests
  • Alignment fix for eth core kernels
  • Add data parallel (multi-chip) for Falcon7b (prefill/decode) model and corresponding tests
  • CQ_DISPATCH_CMD_WRITE_PAGED support in test_dispatcher and passing tests
  • #6647: disable failing ci cpp tests and reenable cpp pipeline on CI
  • Backward test updates
  • Ngrujic/check bugs
  • Add Llama matmul perf tests to main
  • TTLIB: removing working tests from broken
  • #6443: Update backward asin and addcdiv logic
  • #0: Fix output cb size calculation in reshard op for bfp8b
  • #0: use smart ptrs in allocator
  • Jvasilje docs 0322
  • DRAM based device profiler with Tracy support
  • #6553: Fix ttnn.reshape(..) handling for bfloat16, TILE_LAYOUT
  • PR: #6746
  • Add Llama2 demo to tt-metal docs
  • Mistral-7B WH demo
  • Revert "#0: Put back frequent api tests because I'm an idiot"
  • FP32 support
  • #0: Add back frequent api tests to run.sh
  • Bteng/watcher ci3
  • Remove cpuprof
  • logo update
  • #6184: sharded row major silu support.
  • #6443: Update div_bw and backward ops test file
  • #6705: Relax forcing of keyword argument in ttnn.open_device
  • Forward op tests
  • #6691: Allow blocking of inner dim within a core for sharded in0 for 2d and 1d systolic matmuls
  • #6662: Width Sharding support for eltwise OP
  • Stable diffusion python API level perf improvements
  • Add get_compute_kernel_config_args function
  • #0: Add fd-2/main triggers for pull_request and push for post-commit
  • #5480: FD2 refactor for pre/dis patch variants
  • #6654: Add perf tests for ttnn ResNet50
  • #5480: Fix fd gtest unit test test_write_host
  • #0: Set myself as setup.py owner
  • #6780: Add mistral7b to demos list in getting started
  • #4003: re-added TTNN_ENABLE_LOGGING as runtime flag
  • #0: Fix semaphore address gen bug
  • #6769: Disable program caching for failing Llama tests.
  • #5480: Fix zero sized write transaction request that could occur in write_linear_host
  • #6077: Fix unet pcc issues
  • Remove DstSync from llk api templates
  • FP32 Support
  • #6680: Reverting move op change
  • #6443: Update asinh and softsign backward
  • Backward tests with updated test modules
  • Ngrujic/check bugs 1
  • #6654: Moving init for self.compute_kernel_config
  • #6805: reproduce the bug with sharded split_query_key_value_and_split_heads
  • #6832: Account for tile-padding in softmax for mistral 7B
  • Enable support for uint32 format to be consumed by SFPU (issue #4624)
  • #4252: fix clang build error since std::log2 only constexpr in gcc
  • #4003: log, debug and add pre- and post- hooks only for top-level ttnn ops
  • #6823: Fix core count to not include dispatch cores in op report
  • #6197: Align pages for interleaved <-> sharded.
  • METALIUM_GUIDE
  • Bteng/watcher post commit
  • #6443: update backward test file for relational ops and concat op
  • Revert "Bteng/watcher post commit"
  • #6443: Update backward ops
  • Backward test updates
  • #0: Add the dim 0 support repeat backward
  • Update hard related test ops
  • #6757: Remove set_profiler_location
  • #6443: Update backward ops erfinv elu hypot cos sin
  • #6861: Enable Watcher/dprint tests on T3000 CI
  • Update Mistral perf regression for CI, until issue is resolved
  • Mamba/perf v1
  • #0: remove data movement ops related to silu in SD
  • #4003: added proper fallback for getitem of ttnn.Tensor. Slice the tensor only on the tile boundary but set the shape based on whatever user provided
  • #4003: added proper fallbacks for every op that falls back to torch
  • #6731: add fix to LN width sharding
  • #5797: add back sweep test for ln
  • Integrate GroupNorm V2 to SD model
  • METALIUM_GUIDE.md updates
  • [Falcon7b] Fix bugs with inference throughput measurements in demo
  • #0: shallow unet add perf_mode
  • #6154: 2d matmul in0 height, in1 width sharding
  • #5249: Various Falcon40b test and demo cleanup
  • #0: fix incremental build
  • #0: remove upsample spill to DRAM
  • [Llama2 Prefill] Model Functionality completed
  • Watcher alignment checking for PCIe/DRAM <-> L1
  • #6920: fixed the error in whisper
  • Update METALIUM_GUIDE.md
  • #6644: save l1 buffers to data base
  • Update usage.rst
  • #6804: fix ttnn falcon7b demo regression + add to CI regressions
  • #6285: Add backward support for floor round and div_no_nan
  • [skip ci] Update INSTALLING.md
  • #6873: Add more test combinations to tt_lib sweeps add, add_unary, su…
  • Ngrujic/check bugs 3
  • #6882: Updated Mistral-7b perf estimate
  • #6850: Update install links in Sphinx docs to point directly to INSTALLING.md
  • #6619: Fix per op profiler sum
  • #6644: sync before calling print l1 buffers
  • Barsic/ttlib ops check
  • Barsic/ttlib params fix
  • #6962: Move cd tt-metal earlier in the command list of INSTALLING.md
  • #6819: Add support for CreateKernel absolute file paths
  • #6356: Remove half-half grid logic for bmms
  • #4003: added a flag to disable ttnn fallbacks. Don't throw an error w…
  • #0: Correct FW versions, tt-smi versions, and add note about tt-topology
  • #0: Capitalize tt to TT consistently for marketing
  • #0: Add myself as CODEOWNER for INSTALLING.md
  • #6644: ttnn visualizer
  • #6847: Allow disabling individual watcher features
  • #6889: Support printing/padding/tilizing multi-device tensors
  • #4003: removed ttnn.print_l1_buffers and consolidated all ttnn flags into a CONFIG class
  • #6217: tt_lib async mode support (single chipp tensors supported)
  • Reshard With Ranges
  • #4003: updated buffer report to show...
    ...
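
One entry above fixes the ttnn.repeat_interleave documentation example (#6572, flagged in the list). A small sketch follows, assuming the semantics mirror torch.repeat_interleave with an explicit dim; the shapes are illustrative.

```python
import torch
import ttnn

# Sketch of ttnn.repeat_interleave (see #6572 above). Semantics are assumed
# to mirror torch.repeat_interleave with an explicit dim.
device = ttnn.open_device(device_id=0)

x = ttnn.from_torch(torch.randn(1, 1, 32, 32), dtype=ttnn.bfloat16,
                    layout=ttnn.TILE_LAYOUT, device=device)

# Repeat every row twice along dim 2: (1, 1, 32, 32) -> (1, 1, 64, 32)
y = ttnn.repeat_interleave(x, repeats=2, dim=2)
print(ttnn.to_torch(y).shape)  # expected: torch.Size([1, 1, 64, 32])

ttnn.close_device(device)
```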

v0.45.0

22 Mar 18:03

🚀 Features

  • #6204: added support for num_users < 32 for update cache op.
  • #6247 Llama2 Galaxy MLP implementation

📦 Uncategorized

  • #4736: Add support for moreh_norm op
  • Fix moreh_layernorm rstd
  • #5508: Change test_moreh_layernorm.py for debugging
  • #4686: add infra for sharing global struct among ops
  • #5592: Fix pcc on Falcon 7b prefill by turning on l1 packer on MLP 4h-to-h matmul
  • Fix layernorm beta data format reconfig
  • Add linked support for in0 in1 mcast in matmul
  • #4957: optimizing construct_2d_padded_tensor_list
  • #4003: added ttnn.as_tensor and enabled support for caching torch tensor
  • Revert "#0: Fix for fail in asinh backward"
  • #5829: Use moreh_common.hpp for data movement kernels across moreh OPs
  • Barsic/ttnn ops
  • #6030: Update resnet performance metrics
  • #5876: pytest & c++ test logging cleanup
  • #0: Use both 2x2 and 2x4 machines on every scheduled run
  • Add single core matmul benchmark
  • #6079: Update FORCE_INLINE to be nop when watcher is enabled
  • #5980: Fix a hard-coded bounds check in dprint
  • #5389: merged ttl and ttnn tensor classes into one
  • Initial Performance Model
  • fix ci
  • TTNN RN50 :: on the road to match perf with TTLIB version
  • #4438: Optimized single-core fold op
  • #5589: Add repeat-interleave and addcmul sweeps
  • #6055: Add square backward support
  • #6057: Add backward support for lgamma
  • #6056: Add backward support for frac and trunc
  • #6066: Add support for backward log sigmoid
  • #6002: Add backward support for binary maximum
  • Ngrujic/improve conversion to bfloat8b in sweeps
  • #5829: Use moreh_common.hpp for compute kernels across moreh OPs
  • #0: Remove post-commit label from multi device pipeline because it's not actually post commit
  • Add pack l1 acc to resnet conv
  • #6144: Skip 512x512 cross attn 2d upblock for now in nightly because it hangs
  • #6061: Add tanhshrink, threshold, Unary EQ backward ops support
  • Width Sharded Concat for Unet
  • #5184: uncommenting various moreh test cases.
  • Fix compute kernel config arg for resnet50
  • Nsmith/untilize unit test
  • Revert "Revert "#5389: merged ttl and tensor classes into one""
  • #4438: Do not use the new fold op in Resnet tests
  • Remove corerangeset that does not work on wormhole
  • #6129: Expose kernel config attrs and use 4 dst tiles for fp32 configs
  • #5391: Add device perf
  • #0: Use multiplier for wormhole b0 mulsi3
  • #4003: removed ttnn.Tensor autoclass from tensor.rst
  • TTNN MultiDevice Support
  • build artifacts
  • #4947: Add noc alignment checks to watcher
  • Add ttnn multi-chip unit test for checking device shards
  • Nsmith/fix unet
  • #6043: Random program stress test of command queues
  • Logit and logiteps backward support
  • Backward support for log2
  • Add missing ttnn tests and disable broken tests until issues are fixed
  • Fix Events feature for FD1.3 (out-of-order event ids, events feature missing) #6093
  • #5873: make top-level post commit workflow re-useable
  • #5589: add groupnorm for ttnn sweeps
  • Ngrujic/ttnn sweeps 4
  • Add ethernet datamover (EDM) - a foundational ethernet transfer engine
  • #6116: Add backward support for softshrink
  • #0: Add verbose make logs to artifact and make nicer name on metal
  • #0: Only use 2x4 setup for multi-card WH CI as 2x2 does not provide us good feedback
  • #4809 dprint tensix regs
  • #4003: fixed bloom perf test
  • #6187: Conv bugfix
  • #0: concat RM support variable stick widths across inputs
  • TTNN RN50 on WHB0
  • #6084: Lower thresholds slightly after using proper configs for device resnet
  • Fast dispatch 2.0 proof of concept
  • #6218: add pytest for matmul 1d 2d
  • #6177: use is_tensor_storage_on_device so it works for MultiDeviceStorage
  • #6082: support workers + eth cores in one program
  • #6215: Rename TensorToMeshMapper/MeshToTensorComposer
  • #6164: Update test_noc_unicast_vs_multicast_to_single_core_latency to not use same cores for producer and consumer on WH
  • #6117: Add backward support for softplus
  • #6223: remove redundant call to context switch
  • Integrate EDM with all-gather.
  • #6136: Add backward support for unary LE and GE
  • #5398: fix unicast binaries
  • Barsic/ttnn ops 2
  • #5380: Add wormhole_b0 model perf tests, only falcon7b in ttlib for now
  • #5372: Updated README.md file for demo
  • #4003: updated ttnn.concat to have a registered fallback
  • Llama2 functional bringup
  • #5589: Add working BFLOAT8_B sweeps to working folder
  • FD2.0 rename HostQ->PrefetchQ, add multi-core capability, fix NOC coords
  • #0: bugfix in ttnn resnet caught by nightly
  • #0: fix tt_bisect build bug
  • Watcher Asserts
  • #6183: add unit test for sd matmul ops
  • #6254: Make program cache per device:
  • #5394: Add functional version of Mamba architecture
  • #6257: Add temporary convenience script for 800MHz / new eth reset dependent CI
  • #5661: Enable gtests for fast dispatch + R chip
  • Alex/metal/bmm large block untilize out
  • #5389: made tensor attributes public and use ttnn::Shape instead of tt::tt_metal::Shape for storing shape
  • Revert "#6183: add unit test for sd matmul ops"
  • #4003: print all of the L1 buffers using ttnn.print_l1_buffer_state
  • #4003: print all of the L1 buffers using ttnn.print_l1_buffers
  • #4438: Implement sharded multi-core fold op for Resnet50
  • #6149: disabled the check for comparing generated report with GOLDEN_L1_BUFFER_REPORT because on pipelines it looks different than when running locally
  • FD2.0 fixes+mcast support for write and packed_write
  • Shwetank tt/config
  • #0: Change order of device and use_program_cache fixture in remaining pytests
  • Softplus with beta and threshold param (see the reference sketch after this list)
  • Build tests during artifact creation
  • #6149: disabled test_print_l1_buffers_of_add_operation
  • #4003: updated ttnn.to_torch to work with bfloat8_b tensors that are not multiple of tile size without tile padding
  • #0: add to/from L1 reshard test
  • #0: Add back deleted shape assertions for interleaved concat
  • test errors flagged by watcher
  • #0: fix incremental build
  • Merge xuncai/llama-attention-galaxy to main: First version of llama-attention galaxy on emulated chips
  • #6329: Fixing a bug causing mismatch on indices
  • #6321: Test which sweeps read/write buffer and just checks that the e…
  • Support moreh_getitem forward
  • #6125: Update in0_block_w to be full shard width for sharded 2D systolic matmul
  • #6107: Add softsign, sign, unary ceil backward support
  • #6226: Add backward support for div
  • #6234: Add backward support for rdiv
  • #6236: Add backward support for fmod and remainder
  • #4003: added positional embeddings to bert and updated ttnn_sharded_optimized_bert to run with batch size of 12
  • Indexed Fill
  • #5589: remove dtype in gen function sweep tests where needed
  • #6347: Print built-in defines once only
  • #0: Add Mo as code owner on profiler code
  • #0: Simplify tt_lib.scripts package by adding a specific tt_eager/scripts directory and putting the production scripts in there, whereas development scripts will stay in /scripts
  • #0: Fixture reorder changes reverted for falcon_7b perf test
  • #5424: remove metal_ckernel_sfpu
  • #0: Update remaining tt_lib.program_cache calls to use device APIs
  • #6183: add unit test for sd matmul ops
  • #6289: fix dispatcher page calculation
  • #5924: Enable unet on wormhole_b0 changes
  • #6325: skip test_multi_device.py for grayskull arch
  • Alex/metal/pack untilize no repack
  • #6144: Not hanging on GS or WH with or without Watcher
  • Agrebenisan/swq hwq cardinality cleanup
  • #6146: Add backward support for conj
  • #0: bug fix UTWH div_up instead of div trunc for calculating CB sizes
  • Fix To/From Sharded Bug
  • #6206: Fix resharding page mapp...
    ...
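
"Softplus with beta and threshold param" (flagged above) extends softplus beyond the plain log(1 + e^x) form. Below is a host-side reference of the usual convention (the one PyTorch uses), which the kernel presumably matches: once beta * x exceeds the threshold, the op returns x directly to avoid overflow.

```python
import torch

# Reference semantics for softplus with beta/threshold, following the PyTorch
# convention; whether the kernel change above matches it exactly is an
# assumption.
#   softplus(x) = (1 / beta) * log(1 + exp(beta * x))
# with a fallback to the identity once beta * x exceeds the threshold.
def softplus_reference(x: torch.Tensor, beta: float = 1.0,
                       threshold: float = 20.0) -> torch.Tensor:
    scaled = beta * x
    return torch.where(scaled > threshold, x,
                       torch.log1p(torch.exp(scaled)) / beta)

print(torch.allclose(
    softplus_reference(torch.linspace(-5, 5, 64)),
    torch.nn.functional.softplus(torch.linspace(-5, 5, 64)),
))  # expected: True
```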

v0.44.0

27 Feb 15:57

📦 Uncategorized

  • Update CreateBuffer to return shared_ptr, and Enqueue R/W buffer to accept std::shared_ptr
  • #4794: Implement DownBlock2D using ttnn for stable_diffusion model
  • #4797: Implement BasicTransformerBlock sub-module using ttnn for stab…
  • #0: write cluster config for FD mode, non tunneling cores as well
  • Update bw test, change mulsi calls to use *
  • #3003: updated tt-lib documentation
  • #0: Update to v0.44.0
  • #4003: added ability to trace ttnn operations using torchtrail library
  • Support moreh logsoftmax
  • #4614: gitmodules: Use https URLs for submodules
  • #0: add reviewers to frequently touched ops docs file
  • backward ops - hypot and atan2
  • #4885: Move program device map to program
  • #4858: Add support for float to int typecast
  • Matmul_block on a smaller grid size
  • Revert "#0: Add support for typecast float to int"
  • Add dst ethernet router support and remote command processor to accept FD packets on remote chip
  • Falcon40B TT Implementation
  • #5198: Fix moreh softmax related bug
  • #0: skip MOREH Softmax tests from main
  • #3122: Use device grid size in falcon_attention to be generic...
  • #0: Add assertions for interleaved tensors for ops that don't support sharding
  • #5169: Add activation ops to ttnn
  • #3003: add duration to the ttnn operation nodes when TTNN_ENABLE_LOGGING=1 is used to compile the code
  • #5027: Optimize group attn matmul for Falcon40B decode
  • #0: add documentation about managing documentation
  • Adding docs for maxpool, avg pool and upsample
  • Revert "#0: skip MOREH Softmax tests from d5811b7
  • #5165: Add hyperbolic ops to ttnn
  • #4866: Add grayskull open source llk-library
  • #5002: simplified preprocessing of CNNs using preprocess_model
  • Create GroupNorm sharded in TTNN
  • #5097: Support for dedicated completion queue thread
  • upsample test calculate grid
  • fix for sharded allocator when num banks == num cores
  • MHA tutorial interactive notebook with diagrams
  • #4003: Adding a profile tutorial
  • #0: Added non-blocking read stress test
  • Revert "MHA tutorial interactive notebook with diagrams"
  • #0: Update all_gather to work for multi_link. Update falcon-40b to use 2 links for all gathers
  • #5142: Remove slow dispatch mode from working sweeps
  • #3003: fixed the input tensor documentation
  • #0: Temp slower resnet VM run
  • throw on fast dispatch for to_host_sharded as it's not supported
  • #5253: Fix kv_past_len being passed in to rotary embedding for falcon models
  • #5233: started adding ttnn_functional_resnet
  • #3003: updated ttnn documentation to explain what features it has over tt_lib. Added standalone examples of basic usage of ttnn (see the sketch after this list)
  • #0: Speedup incremental builds
  • #0: Change setup.py to be git worktree friendly
  • MHA tutorial interactive notebook with diagrams
  • #3003: disable tutorial 6 from running as the unit test
  • Agrebenisan/non blocking tensor reads
  • #5275: CODEOWNERS: update to include files relevant for ttnn team
  • Fix an intermittent launch message transfer error
  • Revert "MHA tutorial interactive notebook with diagrams"
  • #0: add parens in LLK doc
  • #3003: only unit test tutorials that work on pipelines
  • #5246: Add unary math ops to ttnn
  • Vignesh/stable diffusion ttnn basic transformer block fix
  • #4854: Implement attention and rms_norm sub-module using ttnn for mis…
  • #4795: Add upblock2d to functional stable diffusion model
  • #4796: Implement Transformer2DModel using ttnn for stable_diffusion m…
  • #0: Adding llk wormhole_b0 submodule
  • #4003: Adding pybind11 to ttnn
  • #5296: Fix broken link to host_api.hpp in README.md
  • #0: Fix bug with the way we were measuring bert inference time
  • #0: Change local tt_lib._C module install from symlink to copy
  • #5233: added ability to fold batch_norm2d into conv2d
  • #5222: replace hex8_to_hex32.py with cpp to shave off some compile time - temporary fix
  • Enable tests for WHB0
  • #5137: Cleanups for newer Linux distro / toolchains
  • #5233: implemented support for converting all Resnet-18 modules using preprocess_model function
  • #3003: fix model preprocessing bug
  • #4799: Implement CrossAttnDownBlock2D sub-module using ttnn for stabl…
  • #4800: Implement UNetMidBlock2DCrossAttn using ttnn for stable_diffus…
  • #4798: Add ttnn cross attn upblock2d in functional stable diffusion m…
  • #4801: Implement Unet 2D Condition model using ttnn for stable_diffus…
  • #4965: Rename Conv2D to Conv2d and MaxPool2D to MaxPool2d to match torch
  • #0: Remove departed team member from CODEOWNERS
  • #0: add to codeowners
  • #5314: Only stall on first scheduled read after commands with side effects
  • #4965: fix bad rebase
  • #0: Add more instructions for dispatching workflow actions and a note about skipping git hooks
  • Update optimized Bert to support WH grid sizes, add sharding support for RMSNorm
  • #4642: create gtest_smoke as a sanity test suit
  • #5341: context switch if eth txq is full
  • #5323: Convolutions of small size fail during parallelization calculations
  • Npetrovic/transformer softmax
  • Fix groupnorm for narrow channels
  • #4862: added more test for ttnn bloom. Update optimized ttnn bert to match the structure of non-optimized ttnn bert
  • #0: Add an envvar parser with value detection and default value setti…
  • #4732: Clean up compute kernel apis
  • #5318: Modify Falcon7B to use attn_matmul for wormhole
  • #0: make logLocationsRecord a static function
  • #5233: run convs with auto-format
  • #5377: Avoid segfault by checking buffer !null before getting device
  • Alex/metal/pack untilize b0
  • #4487: Support block sharding in upsample
  • #5359: update python package transformers + dependencies to include Falcon
  • #3708: Add support for LN having gamma/beta in bfp8
  • #4003: Skip sweep tests if not available
  • #4003: use faster TMs in optimized ttnn whisper
  • #4732: Clean up compute_kernel_api
  • More optimizations for group_attn_matmul
  • #5233: updated resnet18 to run residual connections
  • #3003: added more meaningful errors to ttnn. Updated getitem to run on device in the cases when it can
  • #5233: simplified the logic in tracer
  • #3003: include ttl operations and necessary types under ttnn.ttl
  • #0: Add note about no merge commits in main
  • #0: Add timeout in profiler regression workflow
  • codeowners update
  • #5365: Add device argument to determine grid size based on target
  • disable whisper until further investigation, see issue #5430
  • #3003: fixed ttnn convs
  • #3886: Fix build error for C++ tests in debug mode
  • #4954: Support depth 32 in maxpool writer
  • #0: Pass output cb to pack init functions
  • #0: skipping DeviceLoadBlankKernels on remote devices
  • #5359: transformers: update version and relax pcc asserts
  • #3003: guidelines for adding new op
  • Don't assume user has one entry in their $PYTHONPATH
  • FP32 tensor support for matmul
  • #3003: updated tutorial 001 to describe the tensor more comprehensively before showing the add
  • Onboard additional metal code owners
  • #5402: Add redesigned host-side sw command queue, it can be configured i…
  • #3003: fixed docs
  • Alex/metal/enable conv tests on b0
  • #5356: git bisect script to find broken commits
  • #0: Update data_format.cpp file
  • Add skip to full grid matmul whb0
  • #3003: simplified the logic in ttnn/operations/matmul.py. Added dataclasses instead of tuples for CoreGrid and ShardShape
  • #5204: adding moreh's test suite. removing an absolute assertion.
  • Npetrovic/lt gt ne fix
  • #0: Move device id attribute from tensor to DeviceStorage
  • #3003: fixed scheduled pipeline
  • Npetrovic/transformer concat sweeps ttnn
  • #3003: added support for running ttnn.matmul using 1D_systolic_array. Also, added support for passsing in the program config directly
    ...
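
One entry above notes that standalone examples of basic ttnn usage were added to the documentation (flagged in the list). In that spirit, a minimal end-to-end sketch: move two torch tensors to the device, add them, and bring the result back. Device id 0 is an assumption, and tile layout expects the last two dims padded to multiples of 32.

```python
import torch
import ttnn

# Minimal end-to-end ttnn example of the kind the documentation entry above
# describes. Tile layout expects the last two dims padded to multiples of 32.
device = ttnn.open_device(device_id=0)

a = ttnn.from_torch(torch.randn(32, 32), dtype=ttnn.bfloat16,
                    layout=ttnn.TILE_LAYOUT, device=device)
b = ttnn.from_torch(torch.randn(32, 32), dtype=ttnn.bfloat16,
                    layout=ttnn.TILE_LAYOUT, device=device)

c = ttnn.add(a, b)          # elementwise add on device
result = ttnn.to_torch(c)   # back to a torch tensor on host

ttnn.close_device(device)
```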

v0.43.0

08 Feb 18:02

📦 Uncategorized

  • #4668: Yolov5 GS Demo Benchmarking
  • #0: uplift umd; pick up fix for n150 cluster
  • #3178: Fix for wormhole b0 reduce w
  • #4489: fixed bugs in the program caching of eltwise unary and eltwise binary. Updated bloom to use L1 memory config
  • #4821: Add cumsum op to tt_dnn
  • Dispatch/Bandwidth tests
  • #4003: fixed test_eltwise_unary_op
  • Argmax and Argmin Support
  • #3212: softmax works after reduce fix of max, sum, etc. for WHB0
  • #0: (MINOR) Update version to v0.43.0
  • #4761: Add call to ttl repeat_interleave and also provide script for …
  • #4003: fixed the bug with printing the compile-time attributes
  • Support moreh arange
  • Remove skip_for_wormhole_b0 for test_moreh_softmax and test_moreh_softmin
  • #4541: remove unpad start at 0 limitation
  • Agrebenisan/restart cmd fix
  • Support moreh SGD
  • #0: Use fetch-depth: 0 instead of fetch-tags because otherwise git complains of commit SHA/tag conflict
  • #0: Add code owners for primary operations api binding
  • #4547: Add 2x2 window unit tests to ttnn maxpool
  • #4003: restructure ttnn
  • #4889: Change TileSlice printing to only print tile data
  • #4836: Add support for blocking conv activation in 2d systolic conv v…
  • #0: Update unicast cycles lower bound
  • #4904: Add support for 1d width sharded LN
  • #4941: Convert command header to struct for easier maintainability
  • #4823: enable sum_0 operation fails with low PCC [Wormhole,Grayskull]
  • Fix sharded buffers for one core in fast dispatch
  • #4906: global reduce sum, mean, max, min operations added
  • Revert "#4823: enable sum_0 operation fails with low PCC [Wormhole,GS]
  • #0: Change codeowners from specific op binding files/dirs to all tt_lib bindings
  • #4003: split unary sweep into per op sweeps
  • #4232: added support for converting from numpy arrays to ttnn tensors. Borrow data whenever possible when converting from numpy/torch
  • Uplift AttnMatmul to support GroupAttnMatmul
  • Add watcher-specific CI tests
  • #4916: Add avg pool to ttnn
  • #0: Add a lock on DPRINT server raise/wait structures
  • #4967: added validation for input tensors
  • #4971: update documentation by a new doc hierarchy;
  • #0: Leftover decorate_operation replacement for avg pool
  • #4899: fix the permute to operate on the intended shape
  • #4730: Add tt_lib.tensor.concat
  • Aliu/enqueue eth
  • #4003: Updating functional performance from changes in ttnn.permute w…
  • #4984: Remove dead OP_INFO and graph interpreter
  • #4878: initial commit to add Conv parameters to ttnn.preprocess_model_parameters
  • Update Program Hashes for Ops using Mem config
  • #4984: Remove unused dprint functionality
  • Aliu/ci fix
  • #4215: Add Argmax and Argmin Fallback
  • #4999: added input tensor validation to add, sub and mul operations.
  • Support for softmax rm major sharding and causal mask sharding
  • #0: provide API for where() to support scalar True/False branches (see the sketch after this list)
  • #5003: Update expected compile and runtimes for perf regression on VM
  • Revert "Update Program Hashes for Ops using Mem config"
  • #4931: add apis to get ethernet by socket ids
  • #4786: Add upsample_nearest2d functional stable diffusion
  • #4986: deploy docs only to main and enable devs to run docs build on different pages
  • Deploy ttnn sweeps results to docs
  • #4958: Move all python api unit tests to frequent in order to reduce SD pipeline length
  • #4999: Added input validation for ttnn.matmul and ttnn.linear. Add unit test for linear operation. Update input tensor validation in binary.py. Fix compute_output_shapes in bmm_op.cpp
  • #4620: Fix+improve bw test
  • #4852: Add unit tests for functional bloom
  • #5032: scalar argument versions for relops
  • #0: Add some README recommendations from MCW to clarify issue about access to internal workflows VM installation page
  • #4790: Implement GEGLU using ttnn for stable_diffusion model
  • #4999: Adding validation checks
  • #4791: Implement Feedforward sub-module using ttnn for stable_diffusi…
  • Npetrovic/bw ops sweeps
  • #4999: update documentation of ttnn operations to include the validation schema
  • #0: Remove model run from frequent_api_pipeline per @tt-rkim
  • Minor dprint/watcher cleanup
  • #4858: Add support for typecast
  • #0: Disable dprint tests because they're flaky at the moment
  • #4946: Add trig ops to ttnn
  • Nshanker/convs split by 2
  • #4946: Add inv trig ops to ttnn
  • #4003: fixed circular dependency in decorators
  • #5054: Removed asserts from conv op host code that are not required. …
  • #4003: fixed circular dependencies in ttnn
  • #4852: Fix CI pipeline by re-enabling functional bloom for causal LM
  • GroupNorm Sharded support
  • #4972: is_sharded and memory_config is free from tensor
  • #0: eltwise ops/activate operator tracking for GS, and WHB0
  • Aliu/fd tunneling pr
  • #4642: Converted 14 old cpp tests to use gtest, with capabilities to switch between FD/SD when possible
  • #4852: Add tests for functional ttnn bloom implementation.
  • #4003: correctly convert all parameters of torch module to ttnn parameters
  • #5082: Pow gradient calculation method is different from pytorch
  • Argmax/Argmin support for channel, batch and all dim
  • #4420: switch to shared_ptr
  • #4420: return shared_future from taskflow async wrapper
  • Minor DPrint fixes
  • #0: Enable/disable clearing L1 from env var
  • #4003: started moving ttnn operation to C++
  • #4003: Add script to help with finding issues that we need approval for
  • #5044: Adding support for optional output tensors
  • #4003: Adding the open flag to show only open PRs
  • #5048: Add CreateDevices and CloseDevices api to detail
  • decouple ClearProgramCache from CommandQueue
  • Conv fixes for padding input channels. Shallow conv fixes. Conv input/output autoformatting. Cleanup
  • Asarje/mp unpack tilize fused
  • Update CreateBuffer to return shared_ptr, and Enqueue R/W buffer to accept std::shared_ptr
  • #5137: Cleanups for newer Linux distro / toolchains
  • Revert "#5137: Cleanups for newer Linux distro / toolchains"
  • Revert "Update CreateBuffer to return shared_ptr, and Enqueue R/W buffer to accept std::shared_ptr"
  • #4793: Implement ResnetBlock2D using ttnn for stable_diffusion model
  • #4788: Implement Downsample2D using ttnn for stable_diffusion model
  • #4792: Implement CrossAttention sub-module using ttnn for stable_diff…
  • #4747: Reduce amount of samples in bert sweeps
  • #4789: Add upsample2d to functional_stable_diffusion model
  • #0: Add fix for lamb optimizer
  • #5057: Add relational ops support to TTNN
  • skip eth test suite on GS
  • #4003: updated ttnn.Tensor to be derived form ttl.tensor.Tensor
  • Asarje/shwetank upsample
  • #5082: power gradient is erroneous when exponent is in range (0-1)
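
The where() entry above adds scalar True/False branches, so a full tensor no longer has to be materialized for each branch. Below is a sketch of the resulting call shape, shown with the ttnn entry point; the change itself landed in tt_lib at this release, so the exact module path and signature are assumptions.

```python
import torch
import ttnn

# Sketch of where() with scalar branches (see the entry above): scalars stand
# in for the True/False tensors. Shown via ttnn.where; the original change
# landed in tt_lib, so treat the entry point and signature as assumptions.
device = ttnn.open_device(device_id=0)

cond = ttnn.from_torch(
    (torch.randn(1, 1, 32, 32) > 0).to(torch.bfloat16),
    layout=ttnn.TILE_LAYOUT,
    device=device,
)
mask = ttnn.where(cond, 1.0, 0.0)  # 1.0 where cond is nonzero, else 0.0

ttnn.close_device(device)
```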