#0: [skip_ci] Updating BH bring-up programming guide
abhullar-tt committed Sep 12, 2024
1 parent 9aee509 commit c8bea61
Showing 2 changed files with 98 additions and 9 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -95,3 +95,4 @@ Get started with [simple kernels](https://docs.tenstorrent.com/tt-metalium/lates
- [Flash Attention on Wormhole](./tech_reports/FlashAttention/FlashAttention.md) (updated Sept 6th)
- [CNNs on TT Architectures](./tech_reports/CNNs/ttcnn.md) (updated Sept 6th)
- [Ethernet and Multichip Basics](./tech_reports/CCL/CclDeveloperGuide.md) (Updated Sept 12th)
- [Blackhole Bring-Up Programming Guide](./tech_reports/Blackhole/BlackholeBringUpProgrammingGuide.md) (Updated Sept 12th)
106 changes: 97 additions & 9 deletions tech_reports/Blackhole/BlackholeBringUpProgrammingGuide.md
@@ -4,17 +4,105 @@

Information relevant to programming Blackhole while it is being brought up.

## Memory Alignment
## Wormhole N150 vs. Blackhole

- 32 bytes for L1
- 64 bytes for DRAM
<table><thead>
<tr>
<th rowspan="3"></th>
<th colspan="3">Tensix</th>
<th colspan="2">Ethernet</th>
<th colspan="3">DRAM</th>
<th colspan="4">NoC</th>
</tr>
<tr>
<th rowspan="2">Total</th>
<th rowspan="2">Available for Compute</th>
<th rowspan="2">L1</th>
<th rowspan="2">Total</th>
<th rowspan="2">Programmability&nbsp;&nbsp;</th>
<th rowspan="2">Total</th>
<th rowspan="2">Bank Size </th>
<th rowspan="2">Programmability</th>
<th colspan="3">Alignments</th>
<th rowspan="2">Multicast</th>
</tr>
<tr>
<th>DRAM</th>
<th>PCIe</th>
<th>L1</th>
</tr></thead>
<tbody>
<tr>
<td>Wormhole N150</td>
<td>8x10</td>
<td>8x8</td>
<td>1464 KB</td>
<td>16</td>
<td>1x RISC-V<br>256 KB L1</td>
<td>12 banks</td>
<td>1 GB</td>
<td>N/A</td>
<td>Read: 32B<br>Write: 16B</td>
<td>Read: 32B<br>Write: 16B</td>
<td>Read: 16B<br>Write: 16B</td>
<td>Rectangular</td>
</tr>
<tr>
<td>Blackhole</td>
<td>14x10</td>
<td>13x10</td>
<td>1464 KB<br>Data cache added </td>
<td>14</td>
<td>2x RISC-V<br>512 KB L1</td>
<td>8 banks</td>
<td>~4 GB</td>
<td>1x RISC-V<br>128 KB L1</td>
<td>Read: 64B<br>Write: 16B</td>
<td>Read: 64B<br>Write: 16B</td>
<td>Read: 16B<br>Write: 16B</td>
<td>Rectangular<br>Strided<br>L-shaped</td>
</tr>
</tbody></table>
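
The alignment columns above can be turned into a small compile-time lookup, which is handy when checking whether a buffer address that worked on Wormhole still satisfies Blackhole's 64 B read alignment for DRAM and PCIe. The following is a minimal sketch, not part of tt-metal: the names `Arch`, `Mem`, `noc_read_alignment`, `noc_write_alignment`, `align_up`, and `is_aligned` are hypothetical, and the values are copied from the table. It assumes C++14 or later.

```cpp
#include <cstdint>

// Hypothetical names for illustration only; these are not tt-metal APIs.
enum class Arch { WormholeN150, Blackhole };
enum class Mem { DRAM, PCIe, L1 };

// NoC read alignment in bytes, copied from the comparison table.
constexpr std::uint32_t noc_read_alignment(Arch arch, Mem mem) {
    if (mem == Mem::L1) {
        return 16;  // L1 reads are 16 B aligned on both architectures
    }
    // DRAM and PCIe reads: 64 B on Blackhole, 32 B on Wormhole N150
    return arch == Arch::Blackhole ? 64 : 32;
}

// NoC write alignment in bytes: 16 B for every entry in the table.
constexpr std::uint32_t noc_write_alignment(Arch, Mem) { return 16; }

// Round addr up to the next multiple of alignment (must be a power of two).
constexpr std::uint64_t align_up(std::uint64_t addr, std::uint32_t alignment) {
    return (addr + alignment - 1) & ~static_cast<std::uint64_t>(alignment - 1);
}

// True when addr is a multiple of alignment (must be a power of two).
constexpr bool is_aligned(std::uint64_t addr, std::uint32_t alignment) {
    return (addr & (alignment - 1)) == 0;
}

// 0x1020 is 32 B aligned, so it is a valid DRAM read address on Wormhole,
// but it must be bumped to 0x1040 to meet Blackhole's 64 B requirement.
static_assert(align_up(0x1020, noc_read_alignment(Arch::WormholeN150, Mem::DRAM)) == 0x1020,
              "already aligned for Wormhole DRAM reads");
static_assert(align_up(0x1020, noc_read_alignment(Arch::Blackhole, Mem::DRAM)) == 0x1040,
              "needs padding for Blackhole DRAM reads");
```

Keeping the values in one place makes it easier to spot code that hard-codes the Wormhole 32 B read alignment.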

## Command Buffer
### L1 Data Cache

BH cmd\_buffer is known to have issues, need to turn on cmd\_buffer\_fifo.
Instead of using noc\_cmd\_buf\_ready and cmd\_buffer for sending out mcast requests (as well as other read/write requests),
use cmd\_buffer\_fifo and CMD\_BUF\_AVAIL
Blackhole adds a data cache in L1. When one core writes an address and another core reads it, the reader needs to invalidate its cache only if it has previously read that address.

## tt-smi
The cache can be invalidated by calling `invalidate_l1_cache()`.
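
A typical case where the reader has already read the address is polling a flag that another core updates. The sketch below illustrates that pattern under stated assumptions: `wait_for_flag` and `g_invalidate_calls` are hypothetical names, and the `invalidate_l1_cache()` defined here is only a host-side stand-in (its real signature is assumed to take no arguments) so that the example compiles off-device. For simplicity it invalidates before every read, including the first, which is more conservative than strictly required.

```cpp
#include <cstdint>

// Host-side stand-in so the sketch compiles off-device. On Blackhole the real
// invalidate_l1_cache() comes from the kernel environment; this stub only
// counts how often it is called.
static std::uint32_t g_invalidate_calls = 0;
inline void invalidate_l1_cache() { ++g_invalidate_calls; }

// Hypothetical helper: poll an L1 word that another core is expected to write.
// Returns true once *flag equals expected, or false after max_polls attempts.
inline bool wait_for_flag(const volatile std::uint32_t* flag,
                          std::uint32_t expected,
                          std::uint32_t max_polls) {
    for (std::uint32_t i = 0; i < max_polls; ++i) {
        // The address may have been read on an earlier iteration, so drop any
        // cached copy before reading it again.
        invalidate_l1_cache();
        if (*flag == expected) {
            return true;
        }
    }
    return false;
}
```

The bounded poll count is a design choice for the sketch so that a missed write shows up as a `false` return rather than a hang.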

Depending on the firmware, tt-smi reset may not work and the board will need to be rebooted.
The cache can be disabled through an environment variable that names the RISC-V processors (any of BR, NC, TR, ER) it applies to:
```
export TT_METAL_DISABLE_L1_DATA_CACHE_RISCVS=<BR,NC,TR,ER>
```

### Ethernet Cores

The runtime has not yet enabled access to the second RISC-V on the Ethernet cores.

Fast dispatch can run on the Ethernet cores.

### DRAM

The runtime does not yet support programming the RISC-V on DRAM.

### NoC

Non-rectangular multicast shapes have not been tested yet.

BH provides a 16-deep FIFO for each of the four command buffers. The FIFOs are enabled by default in `noc_init` because the BH cmd\_buffer has known issues. The NoC APIs are not affected by this change.

## Debug

Debug tools are functional on BH. Watcher is recommended when triaging Op failures because it can catch potential alignment issues. Disabling the L1 cache can help identify missed cache invalidations.

## Resetting

Depending on the firmware, a reset via `tt-smi -r 0` may not work, in which case the board needs to be rebooted.

## CI

Bringing up full post-commit CI on BH is a work in progress; currently only the C++ tests run. The workflow is triggered on pushes to main, but the machines have shown some instability, with non-deterministic (ND) failures.

## Issue Tracking

Please file issues, including any instances of ND behaviour, on the Blackhole [board](https://github.com/orgs/tenstorrent/projects/50/views/1).
