Blackhole Bring-Up Programming Guide

Introduction

This guide collects information relevant to programming Blackhole (BH) while it is being brought up.

Wormhole N150 vs. Blackhole

| | Wormhole N150 | Blackhole |
|---|---|---|
| Tensix: total | 8x10 | 14x10 |
| Tensix: available for compute | 8x8 | 13x10 |
| Tensix: L1 | 1464 KB | 1464 KB, data cache added |
| Ethernet: total | 16 | 14 |
| Ethernet: programmability | 1x RISC-V, 256 KB L1 | 2x RISC-V, 512 KB L1 |
| DRAM: total | 12 banks | 8 banks |
| DRAM: bank size | 1 GB | ~4 GB |
| DRAM: programmability | N/A | 1x RISC-V, 128 KB L1 |
| NoC alignment: DRAM | Read: 32B, Write: 16B | Read: 64B, Write: 16B |
| NoC alignment: PCIe | Read: 32B, Write: 16B | Read: 64B, Write: 16B |
| NoC alignment: L1 | Read: 16B, Write: 16B | Read: 16B, Write: 16B |
| NoC: multicast | Rectangular | Rectangular, Strided, L-shaped |
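
The read alignment for DRAM and PCIe doubles from 32B to 64B on Blackhole, so an address that was valid on Wormhole N150 can be misaligned on Blackhole. The sketch below encodes the alignment figures from the table and checks an address against them. The struct, constants, and helper names are hypothetical and exist only for illustration; they are not tt-metal identifiers.

```cpp
#include <cassert>
#include <cstdint>

// Alignment requirements in bytes, copied from the comparison table.
struct NocAlignment {
    std::uint32_t dram_read;
    std::uint32_t dram_write;
    std::uint32_t pcie_read;
    std::uint32_t pcie_write;
    std::uint32_t l1_read;
    std::uint32_t l1_write;
};

// Hypothetical constants for illustration; not tt-metal identifiers.
constexpr NocAlignment kWormholeN150{32, 16, 32, 16, 16, 16};
constexpr NocAlignment kBlackhole{64, 16, 64, 16, 16, 16};

// True when addr is a multiple of alignment (alignment must be a power of two).
constexpr bool is_aligned(std::uint64_t addr, std::uint32_t alignment) {
    return (addr & (alignment - 1)) == 0;
}

// Rounds addr up to the next multiple of alignment (power of two).
constexpr std::uint64_t align_up(std::uint64_t addr, std::uint32_t alignment) {
    return (addr + alignment - 1) & ~static_cast<std::uint64_t>(alignment - 1);
}

// Runtime variant that also checks the power-of-two precondition.
inline std::uint64_t checked_align_up(std::uint64_t addr, std::uint32_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return align_up(addr, alignment);
}
```

For example, address `0x1020` satisfies the 32B DRAM read alignment of Wormhole N150 but not the 64B requirement of Blackhole; `align_up(0x1020, kBlackhole.dram_read)` gives `0x1040`.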

L1 Data Cache

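Blackhole added a small (4 x 16B cachelines) write-through data cache in L1. Because the cache is write-through, a write always reaches L1, so the writing core needs no extra step. When one core writes an address and another core reads it, the reader must invalidate its cache only if it previously read that address and may therefore hold a stale copy.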

The cache can be invalidated by calling invalidate_l1_cache(). Hardware can also clear the cache at a randomized time interval, but this is slower than invalidating explicitly. The hardware timeout is disabled by default and can be enabled by setting the env var TT_METAL_ENABLE_HW_CACHE_INVALIDATION.
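
The invalidation rule can be illustrated with a toy software model of a 4 x 16B write-through cache. This is only a sketch: the class, its round-robin replacement policy, and the byte-level interface are assumptions made for illustration and say nothing about how the hardware is actually built. The `invalidate()` method stands in for the effect of invalidate_l1_cache().

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr std::size_t kLineBytes = 16;  // from the guide: 16B cachelines
constexpr std::size_t kNumLines = 4;    // from the guide: 4 cachelines

// Backing store standing in for L1.
using L1Memory = std::vector<std::uint8_t>;

// Toy per-core write-through cache. The round-robin replacement policy
// is an assumption for illustration only.
class ToyL1Cache {
public:
    explicit ToyL1Cache(L1Memory& l1) : l1_(l1) {}

    std::uint8_t read(std::size_t addr) {
        const std::size_t base = addr - (addr % kLineBytes);
        for (auto& line : lines_) {
            if (line.valid && line.base == base) {
                return line.data[addr - base];  // hit: may be stale
            }
        }
        // Miss: fetch the whole line from L1 into the next victim slot.
        assert(base + kLineBytes <= l1_.size());
        Line& victim = lines_[next_victim_];
        next_victim_ = (next_victim_ + 1) % kNumLines;
        victim.valid = true;
        victim.base = base;
        std::memcpy(victim.data.data(), &l1_[base], kLineBytes);
        return victim.data[addr - base];
    }

    // Write-through: always update L1, plus any cached copy of the line.
    void write(std::size_t addr, std::uint8_t value) {
        assert(addr < l1_.size());
        l1_[addr] = value;
        const std::size_t base = addr - (addr % kLineBytes);
        for (auto& line : lines_) {
            if (line.valid && line.base == base) line.data[addr - base] = value;
        }
    }

    // Drop every cached line so the next read goes to L1.
    void invalidate() {
        for (auto& line : lines_) line.valid = false;
    }

private:
    struct Line {
        bool valid = false;
        std::size_t base = 0;
        std::array<std::uint8_t, kLineBytes> data{};
    };
    L1Memory& l1_;
    std::array<Line, kNumLines> lines_{};
    std::size_t next_victim_ = 0;
};
```

In this model, if a reader cache has already read address 8 and a second cache then writes 42 to it, the reader keeps returning the old value until it calls `invalidate()`. An address the reader has never touched is fetched from L1 on first read and needs no invalidation.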

The cache can be disabled through an env var:

export TT_METAL_DISABLE_L1_DATA_CACHE_RISCVS=<BR,NC,TR,ER>

where the listed values select the RISC-V cores on which the cache is disabled.

Ethernet Cores

The runtime has enabled access to the second RISC-V on idle Ethernet cores.

Support for Fast Dispatch out of idle Ethernet cores has been added but is temporarily disabled while multi-chip Ethernet support is being brought up.

DRAM

The runtime has not yet enabled programming of the RISC-V on DRAM.

NoC

Non-rectangular multicast shapes and strided multicast have been brought up and tested. See gtest DispatchFixture.DRAMtoL1MulticastExcludeRegionUpLeft for a usage example.

Some kernels written for previous architectures issue NoC commands without explicit flushes. On BH these caused non-deterministic (ND) mismatches or hangs, because data and semaphore signals were updated before the NoC had a chance to service the command. Adding flushes resolved these failures. Previous architectures did not need the flushes because their RISC-to-L1 latency was higher relative to NoC latency.
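
The hazard can be sketched with a toy model in which a NoC command reads its source buffer only when it is serviced, not when it is issued. Everything here is hypothetical: `ToyNoc`, `issue_write`, and `flush` are illustrative names rather than the kernel API, and the queue is a simplification of the real hardware. The toy `flush()` stands in for whichever flush call the kernel uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Toy NoC: commands are queued at issue time and read their source
// buffer only when serviced. All names here are hypothetical.
class ToyNoc {
public:
    // Queue an asynchronous write of `len` words from src to dst.
    void issue_write(const std::uint32_t* src, std::uint32_t* dst, std::size_t len) {
        assert(src != nullptr && dst != nullptr);
        pending_.push_back({src, dst, len});
    }

    // Service every queued command; each reads its source data now.
    void flush() {
        while (!pending_.empty()) {
            const Command cmd = pending_.front();
            pending_.pop_front();
            for (std::size_t i = 0; i < cmd.len; ++i) cmd.dst[i] = cmd.src[i];
        }
    }

private:
    struct Command {
        const std::uint32_t* src;
        std::uint32_t* dst;
        std::size_t len;
    };
    std::deque<Command> pending_;
};

// The sender reuses its buffer right after issuing: the first payload is lost.
inline std::vector<std::uint32_t> send_without_flush(ToyNoc& noc) {
    std::uint32_t src = 1;
    std::vector<std::uint32_t> dst(2, 0);
    noc.issue_write(&src, &dst[0], 1);
    src = 2;  // overwritten before the NoC has read it
    noc.issue_write(&src, &dst[1], 1);
    noc.flush();
    return dst;  // {2, 2}
}

// Flushing before reusing the buffer preserves both payloads.
inline std::vector<std::uint32_t> send_with_flush(ToyNoc& noc) {
    std::uint32_t src = 1;
    std::vector<std::uint32_t> dst(2, 0);
    noc.issue_write(&src, &dst[0], 1);
    noc.flush();  // source buffer is safe to reuse after this
    src = 2;
    noc.issue_write(&src, &dst[1], 1);
    noc.flush();
    return dst;  // {1, 2}
}
```

In the model, `send_without_flush` delivers `{2, 2}` because the source word is overwritten while the first command is still pending, whereas `send_with_flush` delivers the intended `{1, 2}`. On a core where writes to L1 land faster than the NoC services commands, the first pattern is the one that needs an explicit flush.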

Debug

Debug tools are functional on BH. Using Watcher is recommended when triaging op failures, as it can catch potential alignment issues. Disabling the L1 cache can help identify missed cache invalidations.

Resetting

Depending on the firmware, reset via tt-smi -r 0 may not work, in which case the board will need to be rebooted.

CI

Bringing up the full post-commit pipeline on BH is a work in progress; currently only the cpp tests run. The pipeline is triggered on pushes to main, but the machines have shown some instability, with ND failures.

Issue Tracking

Please file issues, including any instances of ND behaviour, on the Blackhole board.