-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add physical <-> translated mapping for all BH pcie cores #530
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a comment which highlights some of the things that I think can be problematic in coordinate manager.
But in general, #497 was supposed to introduce creation of Cluster for Blackhole, it should figure out for metal runtime which PCI core should be used. I suppose that the problem can be the fact that Metal still can't omit the soc descriptor path out of Cluster constructor? If that is the case I think we should focus on fixing that problem
blackhole::pcie_translated_coordinate_start_y, | ||
CoreType::PCIE, | ||
CoordSystem::TRANSLATED); | ||
CoreCoord translated_coord = CoreCoord( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we are using maps, I think having the same translated coordinate for different physical coordinates may cause problems. Is this working on all board types we have?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even when order of PCI cores in soc desc yaml is switched?
So I thought about this and have a simple proposal on how to maybe fix this in a better way. If Blackhole Coordinate Manager had additional info on which board type it is working on (plus chip location if necessary), it could already map the proper PCI core to the translated one, and for the other PCI core do 1-1 mapping with physical. I think this could help to remove this logic out of metal runtime. Let me know what you think |
As discussed with @abhullar-tt offline, this will be checked in so metal is not blocked, any further changes can be put through a metal blackhole ci to not break things |
I think some of your proposal might be captured by #539. Im okay with letting that be merged and testing translation support on top of those changes + closing this PR |
Issue
N/A
Description
In Metal the BH soc descriptor lists both potential physical PCIe cores (2,0) and (11,0) but later picks which physical (noc0) PCIe core to use for the board type.
When switching to translated coordinates Metal tests were failing to translate (11,0) because the BH coordinate manager assumes only 1 PCIe core
List of the changes
Update physical <-> translated mapping for BH to iterate over all PCIe cores.
Testing
abhullar/bh-virtual