RFC: TFRT Kernel Fallback #266
Conversation
Force-pushed with the following changes:
- Formatting fixes; removed option 2 for C API
- Fixed top table
- Adjusted some of the wording
- Changed kernel fallback RFC name, updated selective registration section
- Fix links to cs.opensource.google
Minor comments, thanks for putting this RFC together!
Can you introduce why we are talking about fallback? Are the non-fallback ones codegen? What is the native kernel concept in TFRT?
@bhack, thank you for the comments! Note that I am still keeping it as draft to gather comments from coworkers/coauthors. I will hopefully announce it for review very soon.
"Native kernels" are kernels implemented in the new runtime (as opposed to existing TensorFlow runtime). See: https://github.com/tensorflow/runtime/blob/62695daf59a51774709fac081165f36611dfe67c/documents/tfrt_host_runtime_design.md#low-level-abstractions-asyncvalue-kernel-threadpool-allocator . @fdxmw let me know if there is anything I am missing. Since TFRT is a new runtime, it only supports a small subset of kernels. At the same time, existing TensorFlow runtime supports many kernels (probably around 1000). We want to define a way for TFRT to call these existing TensorFlow kernels and we call it "fallback".
Impact is optional. We are thinking of having two types of fallback: Runtime Fallback and Kernel Fallback. Runtime Fallback should support everything out of the box, including custom C++ ops. Kernel Fallback aims to support a subset of kernels with a subset of functionality. Custom ops can choose to support kernel fallback by extending
The main concern is binary size. It might be that Runtime Fallback can run on mobile as well, which would automatically support all kernels. However, Runtime Fallback has larger binary size demands, which might or might not fit mobile constraints. The focus of this doc, Kernel Fallback, instead aims to support a subset of kernels while minimizing binary size. So it would be a better option for models that only use this subset.
Most significant update: added example for op registration.
@bhack thank you for the questions, see replies below:
Correct.
Yes, we just update existing kernel code instead of adding new kernels. We will add separate registration macros, but these will also live in the TensorFlow repo.
Design review notes:
Will mark as accepted once the next steps are completed by Anna above.
Just a heads up that I am still working on running more benchmarks to make the final template/inheritance decision and update the doc.
@annarev How are the kernels sampled for these template/inheritance benchmarks? By "popularity"? (I suppose you have a small set.)
We try to look at kernels that are as small as possible (e.g., scalar ops) so that framework overhead is emphasized.
Are you also analyzing the compile time? Do we need to introduce precompiled headers if we go on with inheritance? You probably don't care too much about compile time, but it is already quite heavy to compile TensorFlow sources for contributors without access to the "internal" cache (tensorflow/build#5).
We definitely don't want to increase compile time significantly, given that builds already take a long time. I would actually expect a larger compile-time increase from templates, since all templates would need to be instantiated. Do you expect inheritance to have a larger build-time penalty?
I don't know all the internals, but I suppose it is a bit hard to isolate the compile time of a good sample of kernels. In general, ISO C++ guidance itself notes that templates increase compile time, but we still need to verify this in our empirical case so we don't discover it too late (after we have accumulated a significant number of registered kernels).
`final` keyword removes previously observed large regression
Quick update. I ran the build 3 times at head and 3 times with the `final` keyword change.
@ematejska This RFC is now ready to accept and merge. |
Feedback open until 7/28
Objective
This proposal focuses on getting a majority of "well-behaved" TensorFlow ops running efficiently on
mobile devices by removing the need to execute them via the TensorFlow eager runtime, instead
calling kernels directly from the new TFRT TensorFlow runtime.
Note that there is a parallel effort to call existing kernels by delegating to the TensorFlow eager runtime instead. That approach is called Runtime Fallback, and a corresponding RFC will be published soon. The goals of the two fallback mechanisms are as follows: