Summary
We need a way to split models into their constituent operations. For example:
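a hypothetical exported module with two ops (the graph below is made up for illustration and does not come from a real model):

```mlir
module {
  func.func @main(%arg0: tensor<32x32xf32>, %arg1: tensor<32x32xf32>) -> tensor<32x32xf32> {
    %0 = stablehlo.add %arg0, %arg1 : tensor<32x32xf32>
    %1 = stablehlo.multiply %0, %arg1 : tensor<32x32xf32>
    return %1 : tensor<32x32xf32>
  }
}
```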
should be split into
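two single-op modules, roughly like these:

```mlir
module {
  func.func @main(%arg0: tensor<32x32xf32>, %arg1: tensor<32x32xf32>) -> tensor<32x32xf32> {
    %0 = stablehlo.add %arg0, %arg1 : tensor<32x32xf32>
    return %0 : tensor<32x32xf32>
  }
}
```

```mlir
module {
  func.func @main(%arg0: tensor<32x32xf32>, %arg1: tensor<32x32xf32>) -> tensor<32x32xf32> {
    %0 = stablehlo.multiply %arg0, %arg1 : tensor<32x32xf32>
    return %0 : tensor<32x32xf32>
  }
}
```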
Instead of running the entire model graph through our compiler + silicon run, this will allow us to run the model op by op and individually analyze each of them: are they running end to end, are they failing compile, at which concrete compile step are they failing, etc. The full model graph doesn't allow us to do this since if, for example, `%1 = op2(arg1, arg2)` fails, we won't know what the status of `op3` is. After doing this op-by-op analysis, we will be able to tell how many ops (which ops, what shapes, ...) we still need to support in order to run the entire model end to end.
Proposal
The following steps are needed in order to achieve this:

- Export the model as a `stablehlo` graph
- Split the `stablehlo` graph into constituent `stablehlo` ops (each wrapped in a separate function and module so it can easily be run through the compiler)
- For each op:
  - Compile it:
    - Convert stablehlo -> ttir
    - Convert ttir -> ttnn
    - Convert ttnn -> flatbuffer
  - Run it (the flatbuffer) on device
  - Collect statistics (which model op it came from, input shapes, compile depth, whether it runs on silicon, etc.)
- Report these statistics in the nightly CI run and display them on a dashboard

1. Export model as stablehlo graph
Most frontends can do this natively. In python we can do something like the sketch below to be able to register `stablehlo` as a dialect and do `Module.parse(stablehlo_module_str)` to get the NN graph wrapped in a module.
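A minimal sketch, assuming the upstream MLIR + StableHLO Python bindings are available (exact module paths and the `register_dialect` helper may differ between versions); the hand-written module string is only there to keep the snippet self-contained:

```python
from mlir.ir import Context, Module
from mlir.dialects import stablehlo  # assumption: StableHLO Python bindings are installed

# In practice this string is produced by the frontend export; a tiny
# hand-written module is used here so the snippet runs on its own.
stablehlo_module_str = """
module {
  func.func @main(%arg0: tensor<32x32xf32>, %arg1: tensor<32x32xf32>) -> tensor<32x32xf32> {
    %0 = stablehlo.add %arg0, %arg1 : tensor<32x32xf32>
    return %0 : tensor<32x32xf32>
  }
}
"""

with Context() as ctx:
    stablehlo.register_dialect(ctx)              # register stablehlo as a dialect
    module = Module.parse(stablehlo_module_str)  # NN graph wrapped in a module
    print(module)
```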
2. Split stablehlo graph into constituent stablehlo ops
This shouldn't be too hard to do, and it isn't frontend specific, so it is reusable. Here is a snippet of what it might look like:
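A rough sketch under the same assumptions as above (MLIR Python bindings available); for brevity it re-emits each op in generic form and skips attributes, multi-result ops and ops with nested regions:

```python
from mlir.ir import Context, Module
from mlir.dialects import stablehlo  # assumption: StableHLO Python bindings are installed

def split_into_single_op_modules(stablehlo_module_str: str) -> list[str]:
    """Return one single-op module (as text) per op of the main function.

    Illustrative only: attributes, multiple results and nested regions are
    ignored to keep the example short.
    """
    single_op_modules = []
    with Context() as ctx:
        stablehlo.register_dialect(ctx)
        parent = Module.parse(stablehlo_module_str)
        main_func = parent.body.operations[0]                  # assume a single func.func
        main_func = getattr(main_func, "operation", main_func)
        for op in main_func.regions[0].blocks[0].operations:
            op = getattr(op, "operation", op)                  # normalize OpView -> Operation
            if op.name == "func.return" or not op.results:
                continue
            operand_types = [str(v.type) for v in op.operands]
            result_type = str(op.results[0].type)
            args = ", ".join(f"%arg{i}: {t}" for i, t in enumerate(operand_types))
            arg_names = ", ".join(f"%arg{i}" for i in range(len(operand_types)))
            single_op_modules.append(
                "module {\n"
                f"  func.func @main({args}) -> {result_type} {{\n"
                # Generic op syntax so we don't need to know each op's custom form.
                f'    %0 = "{op.name}"({arg_names}) : '
                f'({", ".join(operand_types)}) -> ({result_type})\n'
                f"    return %0 : {result_type}\n"
                "  }\n"
                "}"
            )
    return single_op_modules
```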
3. Compile, run and dump statistics op by op
Compilation is a process consisting of a couple of well-known steps:

- StableHLO -> TTIR (`stablehlo-to-ttir-pipeline`)
- TTIR -> TTNN (`ttir-to-ttnn-backend-pipeline`)
- TTNN -> flatbuffer (`ttnn-to-flatbuffer`)

Each of these processes can be pybinded and exposed in python in the form of a `ttmlir` lib. This also isn't frontend specific and can be reused. It would look something like:
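A hypothetical sketch of such a wrapper: `StableHLOCompiler` is the name used in this proposal, but the `CompileDepth` enum and the helpers called on the pybound lib are made up here purely for illustration, mirroring the pipelines listed above.

```python
from enum import IntEnum

class CompileDepth(IntEnum):
    """How far a single op made it through the pipeline (illustrative)."""
    STABLEHLO = 0   # parsed, nothing compiled yet
    TTIR = 1        # stablehlo-to-ttir-pipeline succeeded
    TTNN = 2        # ttir-to-ttnn-backend-pipeline succeeded
    FLATBUFFER = 3  # ttnn-to-flatbuffer succeeded
    EXECUTED = 4    # ran on silicon

class StableHLOCompiler:
    """Pushes one single-op stablehlo module through the compile steps one by one."""

    def __init__(self, ttmlir_lib):
        # `ttmlir_lib` stands for the proposed pybound lib; the three helpers
        # called below don't exist yet, they just mirror the pipelines above.
        self.ttmlir = ttmlir_lib

    def compile(self, stablehlo_module_str: str):
        depth, error, flatbuffer = CompileDepth.STABLEHLO, None, None
        try:
            ttir = self.ttmlir.stablehlo_to_ttir(stablehlo_module_str)
            depth = CompileDepth.TTIR
            ttnn = self.ttmlir.ttir_to_ttnn(ttir)
            depth = CompileDepth.TTNN
            flatbuffer = self.ttmlir.ttnn_to_flatbuffer(ttnn)
            depth = CompileDepth.FLATBUFFER
        except Exception as e:
            # Don't abort the whole sweep; record where and why this op failed.
            error = str(e)
        return flatbuffer, depth, error
```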
Now, we can expand `StableHLOCompiler` with the remaining steps (e.g. running the produced flatbuffer on device), and so on. This way it is easy to keep track of how far the op has come in this entire process.
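For example, the run step could be attached in the same style (again purely hypothetical; the runtime call is a stand-in for whatever API ends up executing flatbuffers on silicon):

```python
class StableHLOCompilerWithRun(StableHLOCompiler):
    """Expands the sketch above with an (assumed) on-device run step."""

    def __init__(self, ttmlir_lib, runtime):
        super().__init__(ttmlir_lib)
        self.runtime = runtime  # hypothetical handle to the device runtime

    def compile_and_run(self, stablehlo_module_str: str, inputs):
        flatbuffer, depth, error = self.compile(stablehlo_module_str)
        if flatbuffer is not None:
            try:
                self.runtime.run(flatbuffer, inputs)  # placeholder for the real run API
                depth = CompileDepth.EXECUTED
            except Exception as e:
                error = str(e)
        return depth, error
```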
4. Report statistics in nightly run
For each op we should dump info in the format described by the pydantic model https://github.com/tenstorrent/tt-github-actions/pull/12/files#:~:text=class%20OpTest(BaseModel)%3A. The `ml_kernel_op` pipeline described here and here will pick it up and allow us to query it in Superset.
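A sketch of what the dump could look like; the field names below are placeholders, the authoritative schema is the `OpTest` pydantic model linked above.

```python
from typing import Optional

from pydantic import BaseModel

class OpRecord(BaseModel):
    """Placeholder record; the real fields come from the linked OpTest model."""
    model_name: str                 # which model the op came from
    op_name: str                    # e.g. "stablehlo.add"
    input_shapes: list[list[int]]
    compile_depth: str              # TTIR / TTNN / FLATBUFFER / EXECUTED
    runs_on_silicon: bool
    error: Optional[str] = None

def dump_records(records: list[OpRecord], path: str) -> None:
    # One JSON object per line so the nightly pipeline can ingest it incrementally.
    with open(path, "w") as f:
        for record in records:
            f.write(record.model_dump_json() + "\n")  # pydantic v2 API (.json() in v1)
```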