v0.44.0
📦 Uncategorized
- Update CreateBuffer to return a shared_ptr, and EnqueueReadBuffer/EnqueueWriteBuffer to accept std::shared_ptr
- PR: #5154
- #4794: Implement DownBlock2D using ttnn for stable_diffusion model
- PR: #5091
- #4797: Implement BasicTransformerBlock sub-module using ttnn for stab…
- PR: #5081
- #0: write cluster config for FD mode, including non-tunneling cores
- PR: #5161
- Update bw test, change mulsi calls to use *
- PR: #5149
- #3003: updated tt-lib documentation
- PR: #5179
- #0: Update to v0.44.0
- PR: #5188
- #4003: added ability to trace ttnn operations using torchtrail library
- PR: #5135
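  A minimal sketch of what this tracing enables (the `torchtrail.trace`/`torchtrail.visualize` calls are assumptions based on that library's README, not on this changelog):

  ```python
  import torch
  import torchtrail

  # Capture the graph of torch operations executed inside the context.
  with torchtrail.trace():
      tensor = torch.rand(1, 64)
      tensor = torch.exp(tensor)

  # Render the captured operation graph to a file.
  torchtrail.visualize(tensor, file_name="trace.svg")
  ```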
- Support moreh logsoftmax
- PR: #4961
- #4614: gitmodules: Use https URLs for submodules
- PR: #5183
- #0: add reviewers to frequently touched ops docs file
- PR: #5190
- backward ops - hypot and atan2
- PR: #5045
- #4885: Move program device map to program
- PR: #5193
- #4858: Add support for float to int typecast
- PR: #5058
- Matmul_block on a smaller grid size
- PR: #5170
- Revert "#0: Add support for typecast float to int"
- PR: #5199
- Add dst ethernet router support and remote command processor to accept FD packets on remote chip
- PR: #5102
- Falcon40B TT Implementation
- PR: #5046
- #5198: Fix moreh softmax related bug
- PR: #5200
- #0: skip MOREH Softmax tests from main
- PR: #5202
- #3122: Use device grid size in falcon_attention to be generic...
- PR: #5207
- #0: Add assertions for interleaved tensors for ops that don't support sharding
- PR: #5195
- #5169: Add activation ops to ttnn
- PR: #5217
- #3003: add duration to the ttnn operation nodes when TTNN_ENABLE_LOGGING=1 is used to compile the code
- PR: #5201
- #5027: Optimize group attn matmul for Falcon40B decode
- PR: #5127
- #0: add documentation about managing documentation
- PR: #5227
- Adding docs for maxpool, avg pool and upsample
- PR: #5223
- Revert "#0: skip MOREH Softmax tests from d5811b7…
- PR: #5228
- #5165: Add hyperbolic ops to ttnn
- PR: #5166
- #4866: Add grayskull open source llk-library
- PR: #5136
- #5002: simplified preprocessing of CNNs using preprocess_model
- PR: #5181
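  A hedged sketch of the simplified flow (the exact `preprocess_model` keyword arguments are assumptions and may differ in this version):

  ```python
  import torch
  import ttnn
  from ttnn.model_preprocessing import preprocess_model

  # A toy CNN whose Conv2d/BatchNorm2d pair can be folded and converted.
  torch_model = torch.nn.Sequential(
      torch.nn.Conv2d(3, 16, kernel_size=3, padding=1),
      torch.nn.BatchNorm2d(16),
      torch.nn.ReLU(),
  ).eval()
  torch_input = torch.rand(1, 3, 32, 32)

  device = ttnn.open_device(device_id=0)
  # Run the torch model once so the preprocessor can observe its
  # structure and produce ttnn-ready parameters (argument names are
  # assumptions; check ttnn.model_preprocessing for the current API).
  parameters = preprocess_model(
      initialize_model=lambda: torch_model,
      run_model=lambda model: model(torch_input),
      reader_patterns_cache={},
      device=device,
  )
  ttnn.close_device(device)
  ```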
- Create GroupNorm sharded in TTNN
- PR: #5221
- #5097: Support for dedicated completion queue thread
- PR: #5098
- upsample test calculate grid
- PR: #5238
- fix for sharded allocator when num banks == num cores
- PR: #5229
- MHA tutorial interactive notebook with diagrams
- PR: #5239
- #4003: Adding a profile tutorial
- PR: #5242
- #0: Added non-blocking read stress test
- PR: #5243
- Revert "MHA tutorial interactive notebook with diagrams"
- PR: #5245
- #0: Update all_gather to work for multi_link. Update falcon-40b to use 2 links for all gathers
- PR: #5214
- #5142: Remove slow dispatch mode from working sweeps
- PR: #5146
- #3003: fixed the input tensor documentation
- PR: #5255
- #0: Temp slower resnet VM run
- PR: #5256
- throw on fast dispatch for to_host_sharded as it's not supported
- PR: #5264
- #5253: Fix kv_past_len being passed in to rotary embedding for falcon models
- PR: #5254
- #5233: started adding ttnn_functional_resnet
- PR: #5240
- #3003: updated ttnn documentation to explain what features it has over tt_lib. Added standalone examples of basic usage of ttnn
- PR: #5265
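  The standalone examples added to the docs follow this basic pattern (a minimal sketch; shapes and dtype are illustrative):

  ```python
  import torch
  import ttnn

  device = ttnn.open_device(device_id=0)

  # Move two torch tensors to the device in tile layout, add them with
  # ttnn, and bring the result back to torch.
  a = ttnn.from_torch(torch.rand(32, 32), dtype=ttnn.bfloat16,
                      layout=ttnn.TILE_LAYOUT, device=device)
  b = ttnn.from_torch(torch.rand(32, 32), dtype=ttnn.bfloat16,
                      layout=ttnn.TILE_LAYOUT, device=device)
  result = ttnn.to_torch(a + b)

  ttnn.close_device(device)
  ```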
- #0: Speedup incremental builds
- PR: #5251
- #0: Change setup.py to be git worktree friendly
- PR: #5234
- MHA tutorial interactive notebook with diagrams
- PR: #5277
- #3003: disable tutorial 6 from running as the unit test
- PR: #5278
- Agrebenisan/non blocking tensor reads
- PR: #5244
- #5275: CODEOWNERS: update to include files relevant for ttnn team
- PR: #5276
- Fix an intermittent launch message transfer error
- PR: #5152
- Revert "MHA tutorial interactive notebook with diagrams"
- PR: #5282
- #0: add parens in LLK doc
- PR: #5283
- #3003: only unit test tutorials that work on pipelines
- PR: #5291
- #5246: Add unary math ops to ttnn
- PR: #5259
- Vignesh/stable diffusion ttnn basic transformer block fix
- PR: #5211
- #4854: Implement attention and rms_norm sub-module using ttnn for mis…
- PR: #5175
- #4795: Add upblock2d to functional stable diffusion model
- PR: #5085
- #4796: Implement Transformer2DModel using ttnn for stable_diffusion m…
- PR: #5092
- #0: Adding llk wormhole_b0 submodule
- PR: #5262
- #4003: Adding pybind11 to ttnn
- PR: #5236
- #5296: Fix broken link to host_api.hpp in README.md
- PR: #5297
- #0: Fix bug with the way we were measuring bert inference time
- PR: #5312
- #0: Change local tt_lib._C module install from symlink to copy
- PR: #5292
- #5233: added ability to fold batch_norm2d into conv2d
- PR: #5317
- #5222: replace hex8_to_hex32.py with cpp to shave off some compile time (temporary fix)
- PR: #5220
- Enable tests for WHB0
- PR: #5307
- #5137: Cleanups for newer Linux distro / toolchains
- PR: #5162
- #5233: implemented support for converting all Resnet-18 modules using preprocess_model function
- PR: #5325
- #3003: fix model preprocessing bug
- PR: #5332
- #4799: Implement CrossAttnDownBlock2D sub-module using ttnn for stabl…
- PR: #5086
- #4800: Implement UNetMidBlock2DCrossAttn using ttnn for stable_diffus…
- PR: #5093
- #4798: Add ttnn cross attn upblock2d in functional stable diffusion m…
- PR: #5089
- #4801: Implement Unet 2D Condition model using ttnn for stable_diffus…
- PR: #5119
- #4965: Rename Conv2D to Conv2d and MaxPool2D to MaxPool2d to match torch
- PR: #5219
- #0: Remove departed team member from CODEOWNERS
- PR: #5340
- #0: add to codeowners
- PR: #5339
- #5314: Only stall on first scheduled read after commands with side effects
- PR: #5315
- #4965: fix bad rebase
- PR: #5342
- #0: Add more instructions for dispatching workflow actions and a note about skipping git hooks
- PR: #5345
- Update optimized Bert to support WH grid sizes, add sharding support for RMSNorm
- PR: #5308
- #4642: create gtest_smoke as a sanity test suite
- PR: #5112
- #5341: context switch if eth txq is full
- PR: #5347
- #5323: Convolutions of small size fail during parallelization calculations
- PR: #5324
- Npetrovic/transformer softmax
- PR: #5298
- Fix groupnorm for narrow channels
- PR: #5320
- #4862: added more tests for ttnn bloom. Update optimized ttnn bert to match the structure of non-optimized ttnn bert
- PR: #5336
- #0: Add an envvar parser with value detection and default value setti…
- PR: #5367
- #4732: Clean up compute kernel apis
- PR: #5316
- #5318: Modify Falcon7B to use attn_matmul for wormhole
- PR: #5322
- #0: make logLocationsRecord a static function
- PR: #5351
- #5233: run convs with auto-format
- PR: #5364
- #5377: Avoid segfault by checking buffer !null before getting device
- PR: #5381
- Alex/metal/pack untilize b0
- PR: #5378
- #4487: Support block sharding in upsample
- PR: #5361
- #5359: update python package transformers + dependencies to include Falcon
- PR: #5360
- #3708: Add support for LN having gamma/beta in bfp8
- PR: #5376
- #4003: Skip sweep tests if not available
- PR: #5392
- #4003: use faster TMs in optimized ttnn whisper
- PR: #5384
- #4732: Clean up compute_kernel_api
- PR: #5375
- More optimizations for group_attn_matmul
- PR: #5385
- #5233: updated resnet18 to run residual connections
- PR: #5390
- #3003: added more meaningful errors to ttnn. Updated getitem to run on device in the cases when it can
- PR: #5403
- #5233: simplified the logic in tracer
- PR: #5370
- #3003: include ttl operations and necessary types under ttnn.ttl
- PR: #5405
- #0: Add note about no merge commits in main
- PR: #5349
- #0: Add timeout in profiler regression workflow
- PR: #5355
- codeowners update
- PR: #5407
- #5365: Add device argument to determine grid size based on target
- PR: #5366
- disable whisper until further investigation, see issue #5430
- PR: #5431
- #3003: fixed ttnn convs
- PR: #5432
- #3886: Fix build error for C++ tests in debug mode
- PR: #5434
- #4954: Support depth 32 in maxpool writer
- PR: #4956
- #0: Pass output cb to pack init functions
- PR: #5418
- #0: skipping DeviceLoadBlankKernels on remote devices
- PR: #5437
- #5359: transformers: update version and relax pcc asserts
- PR: #5421
- #3003: guidelines for adding new op
- PR: #5440
- Don't assume user has one entry in their $PYTHONPATH
- PR: #5250
- FP32 tensor support for matmul
- PR: #5414
- #3003: updated tutorial 001 to describe the tensor more comprehensively before showing the add
- PR: #5441
- Onboard additional metal code owners
- PR: #5445
- #5402: Add redesigned host-side sw command queue, it can be configured i…
- PR: #5382
- #3003: fixed docs
- PR: #5455
- Alex/metal/enable conv tests on b0
- PR: #5425
- #5356: git bisect script to find broken commits
- PR: #5348
- #0: Update data_format.cpp file
- PR: #5399
- Add skip to full grid matmul whb0
- PR: #5461
- #3003: simplified the logic in ttnn/operations/matmul.py. Added dataclasses instead of tuples for CoreGrid and ShardShape
- PR: #5450
- #5204: adding moreh's test suite; removing an absolute assertion.
- PR: #5373
- Npetrovic/lt gt ne fix
- PR: #5304
- #0: Move device id attribute from tensor to DeviceStorage
- PR: #5467
- #3003: fixed scheduled pipeline
- PR: #5466
- Npetrovic/transformer concat sweeps ttnn
- PR: #5208
- #3003: added support for running ttnn.matmul using 1D_systolic_array. Also, added support for passing in the program config directly
- PR: #5468
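  Together with the `CoreGrid` dataclass introduced a few entries above, a call would look roughly like this (a sketch; passing a full program config is also possible per the entry, but its exact class names are not shown here):

  ```python
  import torch
  import ttnn

  device = ttnn.open_device(device_id=0)

  a = ttnn.from_torch(torch.rand(1024, 1024), dtype=ttnn.bfloat16,
                      layout=ttnn.TILE_LAYOUT, device=device)
  b = ttnn.from_torch(torch.rand(1024, 1024), dtype=ttnn.bfloat16,
                      layout=ttnn.TILE_LAYOUT, device=device)

  # core_grid takes the CoreGrid dataclass instead of a bare tuple;
  # matmul can then pick a 1D-systolic-array strategy on that grid.
  out = ttnn.matmul(a, b, core_grid=ttnn.CoreGrid(y=8, x=8))

  ttnn.close_device(device)
  ```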
- #5247: Add unary ops to ttnn
- PR: #5279
- #4854: Implement feed-forward sub-module using ttnn for mistral model
- PR: #5176
- #5326: Add tensor manip ops to ttnn
- PR: #5410
- #4775: Add lamb optimizer sweep
- PR: #5039
- disable model perf measurements on VMs
- PR: #5464
- #0: Add more explicit checks for Q and KV heads sharding for group_attn_matmul
- PR: #5481
- #4003: added manage_device decorator. Renamed ttnn.open to ttnn.open_device and ttnn.close to ttnn.close_device
- PR: #5473
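  The rename and the new helper in one hedged sketch (using `manage_device` as a context manager is an assumption about its exact form):

  ```python
  import ttnn

  # Renamed entry points: ttnn.open  -> ttnn.open_device,
  #                        ttnn.close -> ttnn.close_device
  device = ttnn.open_device(device_id=0)
  ttnn.close_device(device)

  # manage_device pairs open_device/close_device automatically.
  with ttnn.manage_device(device_id=0) as device:
      pass  # run ttnn operations here
  ```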
- #5442: Add math ops to ttnn
- PR: #5465
- #5343: Add binary ops to ttnn
- PR: #5443
- #5474: Add backward op support for expm1 and exp2
- PR: #5476
- #3003: renamed module.(hpp/cpp) to init.(hpp/cpp). Added stubs for matmul
- PR: #5475
- #5216: Fix broken link for developer's docs in CONTRIBUTING.md
- PR: #5488
- #5334: Add unary math ops to ttnn
- PR: #5404
- #5492: Assert before exiting completion queue thread to ensure users …
- PR: #5497
- #4003: updated create_sharded_memory_config
- PR: #5503
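  A hedged sketch of calling the updated helper (parameter names such as `strategy` and `orientation` are assumptions about this version of the API):

  ```python
  import ttnn

  # Block-shard a 1024x1024 tensor across an 8x8 core grid.
  memory_config = ttnn.create_sharded_memory_config(
      shape=(1024, 1024),
      core_grid=ttnn.CoreGrid(y=8, x=8),
      strategy=ttnn.ShardStrategy.BLOCK,
      orientation=ttnn.ShardOrientation.ROW_MAJOR,
  )
  ```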
- #5099: Add dprint/watcher enabled to kernel hash
- PR: #5502
- A collection of small DPrint features/fixes
- PR: #5463
- #0: Fix rebase issue from previous commit
- PR: #5505
- #5056: Impl moreh_groupnorm
- PR: #5490
- #4003: fix docs
- PR: #5507
- #5113: Added generation function which excludes ranges
- PR: #5371
- Barsic/bw sweeps
- PR: #5141
- #4416: Add post-commit workflow to check build for all available CONFIGs
- PR: #5495
- #0: Update ATOL to 0.0081 for roberta
- PR: #5493
- #3420: remove S/R EthernetConfig, move out of experimental ns
- PR: #5470
- #4420: eager EnqueueHostToDeviceTransfer impl
- PR: #5293
- #5446: Remove restart command and associated functionality
- PR: #5501
- #4982: Upgrade to checkout v4 so we can use new node 20 and get rid of warnings
- PR: #5524
- #4003: removed pad_to_tile and unpad_to_tile
- PR: #5528
- #0: Add fail-fast: false for build-and-upload
- PR: #5527
- #4003: changed ttnn.ttl to ttnn.experimental
- PR: #5533
- #5531: Remove ppa registration for Git
- PR: #5532
- #4620: Add host<->L1 bw/latency tests
- PR: #5537
- #4443: distribute data transfer between brisc and ncrisc
- PR: #5534