
v0.44.0

github-actions released this 27 Feb 15:57

📦 Uncategorized

  • Update CreateBuffer to return shared_ptr, and Enqueue R/W buffer to accept std::shared_ptr
  • #4794: Implement DownBlock2D using ttnn for stable_diffusion model
  • #4797: Implement BasicTransformerBlock sub-module using ttnn for stab…
  • #0: write cluster config for FD mode, non tunneling cores as well
  • Update bw test, change mulsi calls to use *
  • #3003: updated tt-lib documentation
  • #0: Update to v0.44.0
  • #4003: added ability to trace ttnn operations using torchtrail library
  • Support moreh logsoftmax
  • #4614: gitmodules: Use https URLs for submodules
  • #0: add reviewers to frequently touched ops docs file
  • backward ops - hypot and atan2
  • #4885: Move program device map to program
  • #4858: Add support for float to int typecast
  • Matmul_block on a smaller grid size
  • Revert "#0: Add support for typecast float to int"
  • Add dst ethernet router support and remote command processor to accept FD packets on remote chip
  • Falcon40B TT Implementation
  • #5198: Fix moreh softmax related bug
  • #0: skip MOREH Softmax tests from main
  • #3122: Use device grid size in falcon_attention to be generic...
  • #0: Add assertions for interleaved tensors for ops that don't support sharding
  • #5169: Add activation ops to ttnn
  • #3003: add duration to the ttnn operation nodes when TTNN_ENABLE_LOGGING=1 is used to compile the code
  • #5027: Optimize group attn matmul for Falcon40B decode
  • #0: add documentation about managing documentation
  • Adding docs for maxpool, avg pool and upsample
  • Revert "#0: skip MOREH Softmax tests from d5811b7
  • #5165: Add hyperbolic ops to ttnn
  • #4866: Add grayskull open source llk-library
  • #5002: simplified preprocessing of CNNs using preprocess_model (see the hedged sketch after this list)
  • Create GroupNorm sharded in TTNN
  • #5097: Support for dedicated completion queue thread
  • Upsample test: calculate grid
  • fix for sharded allocator when num banks == num cores
  • MHA tutorial interactive notebook with diagrams
  • #4003: Adding a profile tutorial
  • #0: Added non-blocking read stress test
  • Revert "MHA tutorial interactive notebook with diagrams"
  • #0: Update all_gather to work for multi_link. Update falcon-40b to use 2 links for all gathers
  • #5142: Remove slow dispatch mode from working sweeps
  • #3003: fixed the input tensor documentation
  • #0: Temp slower resnet VM run
  • throw on fast dispatch for to_host_sharded as it's not supported
  • #5253: Fix kv_past_len being passed in to rotary embedding for falcon models
  • #5233: started adding ttnn_functional_resnet
  • #3003: updated ttnn documentation to explain what features it has over tt_lib. Added standalone examples of basic usage of ttnn
  • #0: Speedup incremental builds
  • #0: Change setup.py to be git worktree friendly
  • MHA tutorial interactive notebook with diagrams
  • #3003: disable tutorial 6 from running as the unit test
  • Agrebenisan/non blocking tensor reads
  • #5275: CODEOWNERS: update to include files relevant for ttnn team
  • Fix an intermittent launch message transfer error
  • Revert "MHA tutorial interactive notebook with diagrams"
  • #0: add parens in LLK doc
  • #3003: only unit test tutorials that work on pipelines
  • #5246: Add unary math ops to ttnn
  • Vignesh/stable diffusion ttnn basic transformer block fix
  • #4854: Implement attention and rms_norm sub-module using ttnn for mis…
  • #4795: Add upblock2d to functional stable diffusion model
  • #4796: Implement Transformer2DModel using ttnn for stable_diffusion m…
  • #0: Adding llk wormhole_b0 submodule
  • #4003: Adding pybind11 to ttnn
  • #5296: Fix broken link to host_api.hpp in README.md
  • #0: Fix bug with the way we were measuring bert inference time
  • #0: Change local tt_lib._C module install from symlink to copy
  • #5233: added ability to fold batch_norm2d into conv2d
  • #5222: replace hex8_to_hex32.py with cpp to shave off some compile time -temporary fix
  • Enable tests for WHB0
  • #5137: Cleanups for newer Linux distro / toolchains
  • #5233: implemented support for converting all Resnet-18 modules using preprocess_model function
  • #3003: fix model preprocessing bug
  • #4799: Implement CrossAttnDownBlock2D sub-module using ttnn for stabl…
  • #4800: Implement UNetMidBlock2DCrossAttn using ttnn for stable_diffus…
  • #4798: Add ttnn cross attn upblock2d in functional stable diffusion m…
  • #4801: Implement Unet 2D Condition model using ttnn for stable_diffus…
  • #4965: Rename Conv2D to Conv2d and MaxPool2D to MaxPool2d to match torch
  • #0: Remove departed team member from CODEOWNERS
  • #0: add to codeowners
  • #5314: Only stall on first scheduled read after commands with side effects
  • #4965: fix bad rebase
  • #0: Add more instructions for dispatching workflow actions and a note about skipping git hooks
  • Update optimized Bert to support WH grid sizes, add sharding support for RMSNorm
  • #4642: create gtest_smoke as a sanity test suite
  • #5341: context switch if eth txq is full
  • #5323: Convolutions of small size fail during parallelization calculations
  • Npetrovic/transformer softmax
  • Fix groupnorm for narrow channels
  • #4862: added more tests for ttnn bloom. Update optimized ttnn bert to match the structure of non-optimized ttnn bert
  • #0: Add an envvar parser with value detection and default value setti…
  • #4732: Clean up compute kernel apis
  • #5318: Modify Falcon7B to use attn_matmul for wormhole
  • #0: make logLocationsRecord a static function
  • #5233: run convs with auto-format
  • #5377: Avoid segfault by checking buffer !null before getting device
  • Alex/metal/pack untilize b0
  • #4487: Support block sharding in upsample
  • #5359: update python package transformers + dependencies to include Falcon
  • #3708: Add support for LN having gamma/beta in bfp8
  • #4003: Skip sweep tests if not available
  • #4003: use faster TMs in optimized ttnn whisper
  • #4732: Clean up compute_kernel_api
  • More optimizations for group_attn_matmul
  • #5233: updated resnet18 to run residual connections
  • #3003: added more meaningful errors to ttnn. Updated getitem to run on device in the cases when it can
  • #5233: simplified the logic in tracer
  • #3003: include ttl operations and necessary types under ttnn.ttl
  • #0: Add note about no merge commits in main
  • #0: Add timeout in profiler regression workflow
  • codeowners update
  • #5365: Add device argument to determine grid size based on target
  • disable whisper until further investigation, see issue #5430
  • #3003: fixed ttnn convs
  • #3886: Fix build error for C++ tests in debug mode
  • #4954: Support depth 32 in maxpool writer
  • #0: Pass output cb to pack init functions
  • #0: skipping DeviceLoadBlankKernels on remote devices
  • #5359: transformers: update version and relax pcc asserts
  • #3003: guidelines for adding new op
  • Don't assume user has one entry in their $PYTHONPATH
  • FP32 tensor support for matmul
  • #3003: updated tutorial 001 to describe the tensor more comprehensively before showing the add
  • Onboard additional metal code owners
  • #5402: Add redesigned host-side sw command queue, it can be configured i…
  • #3003: fixed docs
  • Alex/metal/enable conv tests on b0
  • #5356: git bisect script to find broken commits
  • #0: Update data_format.cpp file
  • Add skip to full grid matmul whb0
  • #3003: simplified the logic in ttnn/operations/matmul.py. Added dataclasses instead of tuples for CoreGrid and ShardShape
  • #5204: adding moreh's test suite. removing an absolute assertion.
  • Npetrovic/lt gt ne fix
  • #0: Move device id attribute from tensor to DeviceStorage
  • #3003: fixed scheduled pipeline
  • Npetrovic/transformer concat sweeps ttnn
  • #3003: added support for running ttnn.matmul using 1D_systolic_array. Also, added support for passing in the program config directly (see the sketch after this list)
  • #5247: Add unary ops to ttnn
  • #4854: Implement feed-forward sub-module using ttnn for mistral model
  • #5326: Add tensor manip ops to ttnn
  • #4775: Add lamb optimizer sweep
  • disable model perf measurements on VMs
  • #0: Add more explicit checks for Q and KV heads sharding for group_attn_matmul
  • #4003: added manage_device decorator. Renamed ttnn.open to ttnn.open_device and ttnn.close to ttnn.close_device (see the sketch after this list)
  • #5442: Add math ops to ttnn
  • #5343: Add binary ops to ttnn
  • #5474: Add backward op support for expm1 and exp2
  • #3003: renamed module.(hpp/cpp) to init.(hpp/cpp). Added stubs for matmul
  • #5216: Fix broken link for developer's docs in CONTRIBUTING.md
  • #5334: Add unary math ops to ttnn
  • #5492: Assert before exiting completion queue thread to ensure users …
  • #4003: updated create_sharded_memory_config (see the sketch after this list)
  • #5099: Add dprint/watcher enabled to kernel hash
  • A collection of small DPrint features/fixes
  • #0: Fix rebase issue from previous commit
  • #5056: Impl moreh_groupnorm
  • #4003: fix docs
  • #5113: Added generation function which excludes ranges
  • Barsic/bw sweeps
  • #4416: Add post-commit workflow to check build for all available CONFIGs
  • #0: Update ATOL to 0.0081 for roberta
  • #3420: remove S/R EthernetConfig, move out of experimental ns
  • #4420: eager EnqueueHostToDeviceTransfer impl
  • #5446: Remove restart command and associated functionality
  • #4982: Upgrade to checkout v4 so we can use new node 20 and get rid of warnings
  • #4003: removed pad_to_tile and unpad_to_tile
  • #0: Add fail-fast: false for build-and-upload
  • #4003: changed ttnn.ttl to ttnn.experimental
  • #5531: Remove ppa registration for Git
  • #4620: Add host<->L1 bw/latency tests
  • #4443: distribute data transfer between brisc and ncrisc
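
A hedged sketch of the preprocess_model flow referenced in #5002/#5233 above: it folds a small torch CNN (Conv2d + BatchNorm2d) into ttnn parameters. The keyword-argument names (initialize_model, run_model, reader_patterns_cache) and the device handling are assumptions based on ttnn's model-preprocessing utilities and may differ slightly in this release.

```python
# Hedged sketch: preprocessing a small torch CNN with preprocess_model.
# Keyword-argument names are assumptions, not a confirmed API for v0.44.0.
import torch
import ttnn
from ttnn.model_preprocessing import preprocess_model


class SmallCNN(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Conv2d(3, 16, kernel_size=3, padding=1)
        self.bn = torch.nn.BatchNorm2d(16)

    def forward(self, x):
        return torch.relu(self.bn(self.conv(x)))


torch_input = torch.randn(1, 3, 32, 32)

device = ttnn.open_device(device_id=0)
parameters = preprocess_model(
    initialize_model=lambda: SmallCNN().eval(),   # batch norm can be folded into the conv here
    run_model=lambda model: model(torch_input),   # sample forward pass used to trace the model
    reader_patterns_cache={},                     # cache reused across conv reader kernels
    device=device,
)
ttnn.close_device(device)
```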
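For the ttnn.matmul items from #3003 (1D systolic array support and passing a program config directly), a minimal sketch follows. The use_1d_systolic_array and core_grid keyword names are assumptions from the ttnn Python API of this period and may not match this release exactly.

```python
# Hedged sketch: ttnn.matmul on a 1D systolic core grid.
# The use_1d_systolic_array kwarg is an assumption; per the note, a program
# config can also be passed directly instead of letting ttnn derive one.
import torch
import ttnn

device = ttnn.open_device(device_id=0)

a = ttnn.from_torch(torch.randn(1024, 1024), dtype=ttnn.bfloat16,
                    layout=ttnn.TILE_LAYOUT, device=device)
b = ttnn.from_torch(torch.randn(1024, 1024), dtype=ttnn.bfloat16,
                    layout=ttnn.TILE_LAYOUT, device=device)

out = ttnn.matmul(
    a, b,
    core_grid=ttnn.CoreGrid(y=8, x=8),   # grid to parallelize over
    use_1d_systolic_array=True,          # assumption: selects the 1D systolic program config
)

result = ttnn.to_torch(out)
ttnn.close_device(device)
```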
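The #4003 device-lifecycle renames (ttnn.open to ttnn.open_device, ttnn.close to ttnn.close_device) and the new manage_device helper might be used as below; the release note calls manage_device a decorator, so the context-manager form shown here is an assumption.

```python
# Sketch of the renamed device lifecycle API from #4003.
import ttnn

# Explicit open/close (renamed from ttnn.open / ttnn.close).
device = ttnn.open_device(device_id=0)
try:
    pass  # ... run ttnn ops on `device` ...
finally:
    ttnn.close_device(device)

# manage_device wraps the same open/close pair; usage as a context manager
# is an assumption for this release.
with ttnn.manage_device(device_id=0) as device:
    pass  # ... run ttnn ops on the managed device ...
```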
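The updated ttnn.create_sharded_memory_config from #4003 might look roughly like this when block-sharding a tensor; the shape/core_grid/strategy/orientation argument names follow the ttnn sharding helpers as generally documented and are assumptions for this specific release.

```python
# Hedged sketch: build a block-sharded memory config and move a tensor into it.
import torch
import ttnn

device = ttnn.open_device(device_id=0)

sharded_memory_config = ttnn.create_sharded_memory_config(
    shape=(1024, 1024),                          # logical shape being sharded
    core_grid=ttnn.CoreGrid(y=8, x=8),           # cores the shards are spread over
    strategy=ttnn.ShardStrategy.BLOCK,
    orientation=ttnn.ShardOrientation.ROW_MAJOR,
)

x = ttnn.from_torch(torch.randn(1024, 1024), dtype=ttnn.bfloat16,
                    layout=ttnn.TILE_LAYOUT, device=device)
x_sharded = ttnn.to_memory_config(x, sharded_memory_config)

ttnn.close_device(device)
```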