v0.44.0
📦 Uncategorized
- Update CreateBuffer to return a shared_ptr, and EnqueueReadBuffer/EnqueueWriteBuffer to accept std::shared_ptr
- PR: #5154
- #4794: Implement DownBlock2D using ttnn for stable_diffusion model
- PR: #5091
- #4797: Implement BasicTransformerBlock sub-module using ttnn for stab…
- PR: #5081
- #0: write cluster config for FD mode, including non-tunneling cores
- PR: #5161
- Update bw test, change mulsi calls to use *
- PR: #5149
- #3003: updated tt-lib documentation
- PR: #5179
- #0: Update to v0.44.0
- PR: #5188
- #4003: added ability to trace ttnn operations using torchtrail library
- PR: #5135
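  A minimal sketch of what this tracing enables (the `torchtrail.trace`/`torchtrail.visualize` calls are assumptions based on that library's README, not on this changelog):

  ```python
  import torch
  import torchtrail

  # Capture the graph of torch operations executed inside the context.
  with torchtrail.trace():
      tensor = torch.rand(1, 64)
      tensor = torch.exp(tensor)

  # Render the captured operation graph to a file.
  torchtrail.visualize(tensor, file_name="trace.svg")
  ```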
- Support moreh logsoftmax
- PR: #4961
- #4614: gitmodules: Use https URLs for submodules
- PR: #5183
- #0: add reviewers to frequently touched ops docs file
- PR: #5190
- backward ops - hypot and atan2
- PR: #5045
- #4885: Move program device map to program
- PR: #5193
- #4858: Add support for float to int typecast
- PR: #5058
- Matmul_block on a smaller grid size
- PR: #5170
- Revert "#0: Add support for typecast float to int"
- PR: #5199
- Add dst ethernet router support and remote command processor to accept FD packets on remote chip
- PR: #5102
- Falcon40B TT Implementation
- PR: #5046
- #5198: Fix moreh softmax related bug
- PR: #5200
- #0: skip MOREH Softmax tests from main
- PR: #5202
- #3122: Use device grid size in falcon_attention to be generic...
- PR: #5207
- #0: Add assertions for interleaved tensors for ops that don't support sharding
- PR: #5195
- #5169: Add activation ops to ttnn
- PR: #5217
- #3003: add duration to the ttnn operation nodes when TTNN_ENABLE_LOGGING=1 is used to compile the code
- PR: #5201
- #5027: Optimize group attn matmul for Falcon40B decode
- PR: #5127
- #0: add documentation about managing documentation
- PR: #5227
- Adding docs for maxpool, avg pool and upsample
- PR: #5223
- Revert "#0: skip MOREH Softmax tests from d5811b7…
- PR: #5228
- #5165: Add hyperbolic ops to ttnn
- PR: #5166
- #4866: Add grayskull open source llk-library
- PR: #5136
- #5002: simplified preprocessing of CNNs using preprocess_model
- PR: #5181
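  A hedged sketch of the simplified flow (the exact `preprocess_model` keyword arguments are assumptions and may differ in this version):

  ```python
  import torch
  import ttnn
  from ttnn.model_preprocessing import preprocess_model

  # A toy CNN whose Conv2d/BatchNorm2d pair can be folded and converted.
  torch_model = torch.nn.Sequential(
      torch.nn.Conv2d(3, 16, kernel_size=3, padding=1),
      torch.nn.BatchNorm2d(16),
      torch.nn.ReLU(),
  ).eval()
  torch_input = torch.rand(1, 3, 32, 32)

  device = ttnn.open_device(device_id=0)
  # Run the torch model once so the preprocessor can observe its
  # structure and produce ttnn-ready parameters (argument names are
  # assumptions; check ttnn.model_preprocessing for the current API).
  parameters = preprocess_model(
      initialize_model=lambda: torch_model,
      run_model=lambda model: model(torch_input),
      reader_patterns_cache={},
      device=device,
  )
  ttnn.close_device(device)
  ```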
- Create GroupNorm sharded in TTNN
- PR: #5221
- #5097: Support for dedicated completion queue thread
- PR: #5098
- upsample test calculate grid
- PR: #5238
- fix for sharded allocator when num banks == num cores
- PR: #5229
- MHA tutorial interactive notebook with diagrams
- PR: #5239
- #4003: Adding a profile tutorial
- PR: #5242
- #0: Added non-blocking read stress test
- PR: #5243
- Revert "MHA tutorial interactive notebook with diagrams"
- PR: #5245
- #0: Update all_gather to work for multi_link. Update falcon-40b to use 2 links for all gathers
- PR: #5214
- #5142: Remove slow dispatch mode from working sweeps
- PR: #5146
- #3003: fixed the input tensor documentation
- PR: #5255
- #0: Temp slower resnet VM run
- PR: #5256
- throw on fast dispatch for to_host_sharded as it's not supported
- PR: #5264
- #5253: Fix kv_past_len being passed in to rotary embedding for falcon models
- PR: #5254
- #5233: started adding ttnn_functional_resnet
- PR: #5240
- #3003: updated ttnn documentation to explain what features it has over tt_lib. Added standalone examples of basic usage of ttnn
- PR: #5265
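  The standalone examples added to the docs follow this basic pattern (a minimal sketch; shapes and dtype are illustrative):

  ```python
  import torch
  import ttnn

  device = ttnn.open_device(device_id=0)

  # Move two torch tensors to the device in tile layout, add them with
  # ttnn, and bring the result back to torch.
  a = ttnn.from_torch(torch.rand(32, 32), dtype=ttnn.bfloat16,
                      layout=ttnn.TILE_LAYOUT, device=device)
  b = ttnn.from_torch(torch.rand(32, 32), dtype=ttnn.bfloat16,
                      layout=ttnn.TILE_LAYOUT, device=device)
  result = ttnn.to_torch(a + b)

  ttnn.close_device(device)
  ```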
- #0: Speedup incremental builds
- PR: #5251
- #0: Change setup.py to be git worktree friendly
- PR: #5234
- MHA tutorial interactive notebook with diagrams
- PR: #5277
- #3003: disable tutorial 6 from running as the unit test
- PR: #5278
- Agrebenisan/non blocking tensor reads
- PR: #5244
- #5275: CODEOWNERS: update to include files relevant for ttnn team
- PR: #5276
- Fix an intermittent launch message transfer error
- PR: #5152
- Revert "MHA tutorial interactive notebook with diagrams"
- PR: #5282
- #0: add parens in LLK doc
- PR: #5283
- #3003: only unit test tutorials that work on pipelines
- PR: #5291
- #5246: Add unary math ops to ttnn
- PR: #5259
- Vignesh/stable diffusion ttnn basic transformer block fix
- PR: #5211
- #4854: Implement attention and rms_norm sub-module using ttnn for mis…
- PR: #5175
- #4795: Add upblock2d to functional stable diffusion model
- PR: #5085
- #4796: Implement Transformer2DModel using ttnn for stable_diffusion m…
- PR: #5092
- #0: Adding llk wormhole_b0 submodule
- PR: #5262
- #4003: Adding pybind11 to ttnn
- PR: #5236
- #5296: Fix broken link to host_api.hpp in README.md
- PR: #5297
- #0: Fix bug with the way we were measuring bert inference time
- PR: #5312
- #0: Change local tt_lib._C module install from symlink to copy
- PR: #5292
- #5233: added ability to fold batch_norm2d into conv2d
- PR: #5317
- #5222: replace hex8_to_hex32.py with cpp to shave off some compile time (temporary fix)
- PR: #5220
- Enable tests for WHB0
- PR: #5307
- #5137: Cleanups for newer Linux distro / toolchains
- PR: #5162
- #5233: implemented support for converting all Resnet-18 modules using preprocess_model function
- PR: #5325
- #3003: fix model preprocessing bug
- PR: #5332
- #4799: Implement CrossAttnDownBlock2D sub-module using ttnn for stabl…
- PR: #5086
- #4800: Implement UNetMidBlock2DCrossAttn using ttnn for stable_diffus…
- PR: #5093
- #4798: Add ttnn cross attn upblock2d in functional stable diffusion m…
- PR: #5089
- #4801: Implement Unet 2D Condition model using ttnn for stable_diffus…
- PR: #5119
- #4965: Rename Conv2D to Conv2d and MaxPool2D to MaxPool2d to match torch
- PR: #5219
- #0: Remove departed team member from CODEOWNERS
- PR: #5340
- #0: add to codeowners
- PR: #5339
- #5314: Only stall on first scheduled read after commands with side effects
- PR: #5315
- #4965: fix bad rebase
- PR: #5342
- #0: Add more instructions for dispatching workflow actions and a note about skipping git hooks
- PR: #5345
- Update optimized Bert to support WH grid sizes, add sharding support for RMSNorm
- PR: #5308
- #4642: create gtest_smoke as a sanity test suite
- PR: #5112
- #5341: context switch if eth txq is full
- PR: #5347
- #5323: Convolutions of small size fail during parallelization calculations
- PR: #5324
- Npetrovic/transformer softmax
- PR: #5298
- Fix groupnorm for narrow channels
- PR: #5320
- #4862: added more tests for ttnn bloom. Update optimized ttnn bert to match the structure of non-optimized ttnn bert
- PR: #5336
- #0: Add an envvar parser with value detection and default value setti…
- PR: #5367
- #4732: Clean up compute kernel apis
- PR: #5316
- #5318: Modify Falcon7B to use attn_matmul for wormhole
- PR: #5322
- #0: make logLocationsRecord a static function
- PR: #5351
- #5233: run convs with auto-format
- PR: #5364
- #5377: Avoid segfault by checking buffer !null before getting device
- PR: #5381
- Alex/metal/pack untilize b0
- PR: #5378
- #4487: Support block sharding in upsample
- PR: #5361
- #5359: update python package transformers + dependencies to include Falcon
- PR: #5360
- #3708: Add support for LN having gamma/beta in bfp8
- PR: #5376
- #4003: Skip sweep tests if not available
- PR: #5392
- #4003: use faster TMs in optimized ttnn whisper
- PR: #5384
- #4732: Clean up compute_kernel_api
- PR: #5375
- More optimizations for group_attn_matmul
- PR: #5385
- #5233: updated resnet18 to run residual connections
- PR: #5390
- #3003: added more meaningful errors to ttnn. Updated getitem to run on device in the cases when it can
- PR: #5403
- #5233: simplified the logic in tracer
- PR: #5370
- #3003: include ttl operations and necessary types under ttnn.ttl
- PR: #5405
- #0: Add note about no merge commits in main
- PR: #5349
- #0: Add timeout in profiler regression workflow
- PR: #5355
- codeowners update
- PR: #5407
- #5365: Add device argument to determine grid size based on target
- PR: #5366
- disable whisper until further investigation, see issue #5430
- PR: #5431
- #3003: fixed ttnn convs
- PR: #5432
- #3886: Fix build error for C++ tests in debug mode
- PR: #5434
- #4954: Support depth 32 in maxpool writer
- PR: #4956
- #0: Pass output cb to pack init functions
- PR: #5418
- #0: skipping DeviceLoadBlankKernels on remote devices
- PR: #5437
- #5359: transformers: update version and relax pcc asserts
- PR: #5421
- #3003: guidelines for adding new op
- PR: #5440
- Don't assume user has one entry in their $PYTHONPATH
- PR: #5250
- FP32 tensor support for matmul
- PR: #5414
- #3003: updated tutorial 001 to describe the tensor more comprehensively before showing the add
- PR: #5441
- Onboard additional metal code owners
- PR: #5445
- #5402: Add redesigned host-side sw command queue, it can be configured i…
- PR: #5382
- #3003: fixed docs
- PR: #5455
- Alex/metal/enable conv tests on b0
- PR: #5425
- #5356: git bisect script to find broken commits
- PR: #5348
- #0: Update data_format.cpp file
- PR: #5399
- Add skip to full grid matmul whb0
- PR: #5461
- #3003: simplified the logic in ttnn/operations/matmul.py. Added dataclasses instead of tuples for CoreGrid and ShardShape
- PR: #5450
- #5204: adding moreh's test suite; removing an absolute assertion.
- PR: #5373
- Npetrovic/lt gt ne fix
- PR: #5304
- #0: Move device id attribute from tensor to DeviceStorage
- PR: #5467
- #3003: fixed scheduled pipeline
- PR: #5466
- Npetrovic/transformer concat sweeps ttnn
- PR: #5208
- #3003: added support for running ttnn.matmul using 1D_systolic_array. Also, added support for passing in the program config directly
- PR: #5468
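  Together with the `CoreGrid` dataclass introduced a few entries above, a call would look roughly like this (a sketch; passing a full program config is also possible per the entry, but its exact class names are not shown here):

  ```python
  import torch
  import ttnn

  device = ttnn.open_device(device_id=0)

  a = ttnn.from_torch(torch.rand(1024, 1024), dtype=ttnn.bfloat16,
                      layout=ttnn.TILE_LAYOUT, device=device)
  b = ttnn.from_torch(torch.rand(1024, 1024), dtype=ttnn.bfloat16,
                      layout=ttnn.TILE_LAYOUT, device=device)

  # core_grid takes the CoreGrid dataclass instead of a bare tuple;
  # matmul can then pick a 1D-systolic-array strategy on that grid.
  out = ttnn.matmul(a, b, core_grid=ttnn.CoreGrid(y=8, x=8))

  ttnn.close_device(device)
  ```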
- #5247: Add unary ops to ttnn
- PR: #5279
- #4854: Implement feed-forward sub-module using ttnn for mistral model
- PR: #5176
- #5326: Add tensor manip ops to ttnn
- PR: #5410
- #4775: Add lamb optimizer sweep
- PR: #5039
- disable model perf measurements on VMs
- PR: #5464
- #0: Add more explicit checks for Q and KV heads sharding for group_attn_matmul
- PR: #5481
- #4003: added manage_device decorator. Renamed ttnn.open to ttnn.open_device and ttnn.close to ttnn.close_device
- PR: #5473
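  The rename and the new helper in one hedged sketch (using `manage_device` as a context manager is an assumption about its exact form):

  ```python
  import ttnn

  # Renamed entry points: ttnn.open  -> ttnn.open_device,
  #                        ttnn.close -> ttnn.close_device
  device = ttnn.open_device(device_id=0)
  ttnn.close_device(device)

  # manage_device pairs open_device/close_device automatically.
  with ttnn.manage_device(device_id=0) as device:
      pass  # run ttnn operations here
  ```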
- #5442: Add math ops to ttnn
- PR: #5465
- #5343: Add binary ops to ttnn
- PR: #5443
- #5474: Add backward op support for expm1 and exp2
- PR: #5476
- #3003: renamed module.(hpp/cpp) to init.(hpp/cpp). Added stubs for matmul
- PR: #5475
- #5216: Fix broken link for developer's docs in CONTRIBUTING.md
- PR: #5488
- #5334: Add unary math ops to ttnn
- PR: #5404
- #5492: Assert before exiting completion queue thread to ensure users …
- PR: #5497
- #4003: updated create_sharded_memory_config
- PR: #5503
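  A hedged sketch of calling the updated helper (parameter names such as `strategy` and `orientation` are assumptions about this version of the API):

  ```python
  import ttnn

  # Block-shard a 1024x1024 tensor across an 8x8 core grid.
  memory_config = ttnn.create_sharded_memory_config(
      shape=(1024, 1024),
      core_grid=ttnn.CoreGrid(y=8, x=8),
      strategy=ttnn.ShardStrategy.BLOCK,
      orientation=ttnn.ShardOrientation.ROW_MAJOR,
  )
  ```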
- #5099: Add dprint/watcher enabled to kernel hash
- PR: #5502
- A collection of small DPrint features/fixes
- PR: #5463
- #0: Fix rebase issue from previous commit
- PR: #5505
- #5056: Impl moreh_groupnorm
- PR: #5490
- #4003: fix docs
- PR: #5507
- #5113: Added generation function which excludes ranges
- PR: #5371
- Barsic/bw sweeps
- PR: #5141
- #4416: Add post-commit workflow to check build for all available CONFIGs
- PR: #5495
- #0: Update ATOL to 0.0081 for roberta
- PR: #5493
- #3420: remove S/R EthernetConfig, move out of experimental ns
- PR: #5470
- #4420: eager EnqueueHostToDeviceTransfer impl
- PR: #5293
- #5446: Remove restart command and associated functionality
- PR: #5501
- #4982: Upgrade to checkout v4 so we can use new node 20 and get rid of warnings
- PR: #5524
- #4003: removed pad_to_tile and unpad_to_tile
- PR: #5528
- #0: Add fail-fast: false for build-and-upload
- PR: #5527
- #4003: changed ttnn.ttl to ttnn.experimental
- PR: #5533
- #5531: Remove ppa registration for Git
- PR: #5532
- #4620: Add host<->L1 bw/latency tests
- PR: #5537
- #4443: distribute data transfer between brisc and ncrisc
- PR: #5534