-
Notifications
You must be signed in to change notification settings - Fork 111
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[UMD] Switch pci_cores and dram_cores to CoreCoord api. #17620
Conversation
on WH the translated space does not modify pcie/dram cores but on BH we do so I think when noc translation is enabled Metal host will use the translated pcie/dram coords from Andrews spec In my testing I was trying to pass in physical pcie coordinate and get translated but I saw it returns physical rather than (19,24). I think this might be a bug? |
The above mistranslation seems more like a machine issue since noc translation is being readback as not enabled |
side note: after hardcoding UMD's Cluster to set noc translation enabled to true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not gonna block approval on the pcie translation issue. can be debugged separately
…17707) ### Ticket Related to #17002 ### Problem description A couple of leftover usages of old soc descriptor API. After this and other PRs from this set, tt-metal will finally build with harvesting code completely removed from tt::umd::Cluster and members of tt_SocDescriptor made private, so that all usages are forced through get_cores() and other APIs. Related PRs: #17620 #17642 #17645 #17674 #17678 ### What's changed - .dram_cores changed with get_cores_for_dram_channel - dram_cores.size() changed with get_grid_size - replace .workers and .ethernet_cores from tlb_config with get_cores ### Checklist - [x] All post-commit tests : https://github.com/tenstorrent/tt-metal/actions/runs/13197621962 - [x] Newest All post-commit tests : https://github.com/tenstorrent/tt-metal/actions/runs/13499189013 - [x] Blackhole post-commit tests : https://github.com/tenstorrent/tt-metal/actions/runs/13197623746 - [ ] (Single-card) Model perf tests : https://github.com/tenstorrent/tt-metal/actions/runs/13197626137 - [ ] (Single-card) Device perf regressions : https://github.com/tenstorrent/tt-metal/actions/runs/13197628487 - [ ] (T3K) T3000 unit tests : https://github.com/tenstorrent/tt-metal/actions/runs/13197630092 - [ ] (T3K) T3000 demo tests : https://github.com/tenstorrent/tt-metal/actions/runs/13197632086 - [ ] (TG) TG unit tests : https://github.com/tenstorrent/tt-metal/actions/runs/13197633394 - [ ] (TG) TG demo tests : https://github.com/tenstorrent/tt-metal/actions/runs/13197635275 - [x] (TGG) TGG unit tests : https://github.com/tenstorrent/tt-metal/actions/runs/13197637219 - [x] (TGG) TGG demo tests : https://github.com/tenstorrent/tt-metal/actions/runs/13197639736
…onfigurations (#539) ### Issue #279 ### Description This is how it is used by tt-metal, so this change will support the metal's scenario for both boards. ### List of the changes - Exposed the board type and "is_chip_remote" up to soc descriptor - Don't use _type1 and _type2 soc descriptors. Add default bh soc descriptor - Fix router cores for _type2 soc descriptor (not needed) - Introduced get_blackhole_chip_type to be used at a couple of places, along with BlackholeChipType enum, for clarity. - Added BoardBasedPCIE test - test in test_cluster_bh that only single PCIE core exists in resulting soc descriptor - changed "is_chip_remote" to "asic_location" which communicates better the intention. Although we still don't have p300 to confirm behavior - Changed constructor of local_chip to accept sdesc_path, so that it works on tt_metal. We still have a problem that our internal soc descriptors are unused. - Changed _no_eth soc descriptor to hold both pcie cores. ### Testing Added tests to test new feature. tt_metal test, which removes its own logic on choosing pcie_core, passes: - APC run which passes on tt_metal with this branch: https://github.com/tenstorrent/tt-metal/actions/runs/13541164354 - BH run: https://github.com/tenstorrent/tt-metal/actions/runs/13541162298/job/37843118101 ### API Changes There are no API changes in this PR. Previous PR which changed P150A to P150 broke metal build. But fortunately, exactly the changes which I'm doing in this PR and related metal PR fix that (remove that dependancy altogether): tenstorrent/tt-metal#17620
…18352) ### Ticket Related to #17002 ### Problem description A follow up from removing .dram_cores in #17620 and removing eth_cores in #17642 Not sure if I made mistake during merging changes, or if these changes happened in the meantime. With this changes, tt_metal builds with this UMD change which is the ultimate goal: tenstorrent/tt-umd#509 ### What's changed - Changed .dram_cores with .get_dram_cores() - Changed .ethernet_cores with .get_cores(CoreType::ETH) ### Checklist All runs on brosko/umd_api_last_changes : - [ ] All post-commit tests : https://github.com/tenstorrent/tt-metal/actions/runs/13548653822 - [x] Blackhole post-commit tests : https://github.com/tenstorrent/tt-metal/actions/runs/13548656443 - [ ] (Single-card) Model perf tests : https://github.com/tenstorrent/tt-metal/actions/runs/13548658812 - [ ] (Single-card) Device perf regressions : https://github.com/tenstorrent/tt-metal/actions/runs/13548660915 - [ ] (T3K) T3000 unit tests : https://github.com/tenstorrent/tt-metal/actions/runs/13548663044 - [ ] (T3K) T3000 demo tests : https://github.com/tenstorrent/tt-metal/actions/runs/13548664896 - [ ] (TG) TG unit tests : https://github.com/tenstorrent/tt-metal/actions/runs/13548666720 - [ ] (TG) TG demo tests : https://github.com/tenstorrent/tt-metal/actions/runs/13548669182 - [x] (TGG) TGG unit tests : https://github.com/tenstorrent/tt-metal/actions/runs/13548671739 - [x] (TGG) TGG demo tests : https://github.com/tenstorrent/tt-metal/actions/runs/13548674558
### Issue Last one for #439 ### Description Finally removing old harvesting code. All the tt_metal's usages have been moved to the new soc_descriptor and core_coordinate_manager apis. ### List of the changes - Remove using_harvested_soc_descriptors, get_harvesting_masks_for_soc_descriptors and translate_to_noc_table_coords functions, and performed_harvesting, harvested_rows_per_target, translation_tables_en from the public cluster API - Remove harvesting code from Cluster with many internal functions - move old soc_descriptor structures to be private. - Adjust tests accordingly - Added noc_translation public member to soc_descriptor. Verify that the one passed through constructor and through soc_descriptor matches. This code should be improved. ### Testing Code builds ### API Changes There are no breaking API changes in this PR. Related metal PRs prior to this change: tenstorrent/tt-metal#17003 tenstorrent/tt-metal#17306 tenstorrent/tt-metal#17543 tenstorrent/tt-metal#17620 tenstorrent/tt-metal#17642 tenstorrent/tt-metal#17645 tenstorrent/tt-metal#17674 tenstorrent/tt-metal#17678 tenstorrent/tt-metal#17707
Ticket
Related to #17002
Problem description
Switched .pci_cores and .dram_cores to new API.
What's changed
Checklist