
RFC: TFRT Kernel Fallback #266

Merged on Nov 13, 2020 (30 commits)

Conversation

@annarev (Contributor) commented Jul 13, 2020

Feedback open until 7/28

| Status        | Proposed                                                           |
| :------------ | :----------------------------------------------------------------- |
| **RFC #**     | 266                                                                |
| **Author(s)** | Anna Revinskaya (annarev@google.com), Jeremy Lau (lauj@google.com) |
| **Sponsor**   | Jeremy Lau (lauj@google.com)                                       |
| **Updated**   | 2020-09-09                                                         |

Objective

This proposal focuses on getting a majority of "well-behaved" TensorFlow ops running efficiently on
mobile devices by removing the need to execute them via the TensorFlow eager runtime, instead
calling kernels directly from the new TFRT TensorFlow runtime.

Note that there is a parallel effort to call existing kernels by delegating to the TensorFlow eager runtime instead. That approach is called Runtime Fallback, and the corresponding RFC will be published soon. The goals of the two fallback mechanisms are as follows:

  • Runtime Fallback aims to reuse all current TensorFlow kernels in TFRT.
  • Kernel Fallback (focus of this document) aims to get a large number of existing kernels working in TFRT while reducing binary size to support mobile devices.

@fdxmw left a comment

Minor comments, thanks for putting this RFC together!

@bhack (Contributor) commented Jul 15, 2020

Can you introduce why we are talking about fallback? Are the non-fallback kernels codegen? What is the native kernel concept in TFRT?
Is there an impact on custom C++ ops in SIG repositories? Will there be a way to run them on mobile?

@annarev (Contributor, Author) commented Jul 15, 2020

@bhack, thank you for the comments! Note that I am still keeping this as a draft to gather comments from coworkers/coauthors. I will hopefully announce it for review very soon.

Can you introduce why we are talking about fallback? Are the non-fallback kernels codegen? What is the native kernel concept in TFRT?

"Native kernels" are kernels implemented in the new runtime (as opposed to existing TensorFlow runtime). See: https://github.com/tensorflow/runtime/blob/62695daf59a51774709fac081165f36611dfe67c/documents/tfrt_host_runtime_design.md#low-level-abstractions-asyncvalue-kernel-threadpool-allocator . @fdxmw let me know if there is anything I am missing.

Since TFRT is a new runtime, it only supports a small subset of kernels. At the same time, the existing TensorFlow runtime supports many kernels (probably around 1000). We want to define a way for TFRT to call these existing TensorFlow kernels, and we call that "fallback".

Is there an impact on custom C++ ops in SIG repositories?

Any impact is opt-in. We are thinking of having two types of "fallback": Runtime Fallback and Kernel Fallback. Runtime Fallback should support everything out of the box, including custom C++ ops. Kernel Fallback aims to support a subset of kernels with a subset of functionality. Custom ops can choose to support Kernel Fallback by extending OpKernelBase instead of OpKernel and by taking OpKernelConstructionInterface/OpKernelContextInterface instead of OpKernelConstruction/OpKernelContext as inputs (see the sketch below). Depending on the kernel, this might or might not be feasible.
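To make that concrete, here is a rough sketch of what an opted-in kernel could look like. The OpKernelBase/OpKernelConstructionInterface/OpKernelContextInterface names come from this thread; the method names and signatures below are hypothetical, not the finalized API.

```cpp
// Hypothetical sketch only: method names and signatures are illustrative.
#include "tensorflow/core/framework/op_kernel.h"  // assumed home of the new interfaces

class MyAddOp : public tensorflow::OpKernelBase {
 public:
  explicit MyAddOp(tensorflow::OpKernelConstructionInterface* ctx)
      : OpKernelBase(ctx) {}

  // Written against the context *interface*, so the same kernel body
  // can run under the existing runtime and under TFRT Kernel Fallback.
  void Compute_(tensorflow::OpKernelContextInterface* ctx) override {
    // ... fetch inputs, allocate outputs, and write results through
    // the interface methods ...
  }
};
```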

Will there be a way to run them on mobile?

The main concern is binary size. It might be that Runtime Fallback can run on mobile as well, which would automatically support all kernels. However, Runtime Fallback has larger binary size demands, which might or might not fit mobile constraints. Kernel Fallback, the focus of this doc, aims to support a subset of kernels while minimizing binary size, so it would be a better option for models that only use that subset.

@annarev (Contributor, Author) commented Jul 17, 2020

@bhack thank you for the questions, see replies below:

@annarev So is this just a more or less long-lived temporary solution while waiting for native kernels to populate TFRT?

Correct.

Will these fallback kernels be maintained in the TensorFlow repo?

Yes, we just update existing kernel code instead of adding new kernels. We will add separate registration macros, but these will also live in the TensorFlow repo.
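As an illustration of "separate registration macros": a converted kernel might keep its existing registration and add a fallback one next to it. REGISTER_KERNEL_BUILDER is the real TF macro; REGISTER_FALLBACK_KERNEL below is a made-up name for this sketch.

```cpp
// Existing registration for the full TensorFlow runtime (real macro):
REGISTER_KERNEL_BUILDER(Name("MyAdd").Device(DEVICE_CPU), MyAddOp);

// Additional registration for Kernel Fallback, living in the same TF
// source file. The macro name here is hypothetical.
REGISTER_FALLBACK_KERNEL(Name("MyAdd"), MyAddOp);
```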

@annarev (Contributor, Author) commented Aug 1, 2020

Design review notes:

  • tfrt_fallback as preferred dialect prefix
  • String interning less important now, will consider later
  • Why called fallback? Because we fall back from TFRT
  • Gradients op comment is not super relevant, can be removed
    • Can kernel fallback be used for training?
      • Kernels are available
  • Size measurement
    • Clarified: 200k per kernel per dtype
    • Should compare TFLite select (Flex) delegate with Kernel Fallback
      • Is the main saving from the framework or kernel side?
    • Ideal size number: 200k framework, 20k for kernels
  • Kernel fallback
    • Short term, usable for TFLite Select and also TFMRT, to get ~400 kernels while we develop proper native kernels
  • What's required per kernel?
    • Add base classes, change TF kernels to take interfaces, add registrations
    • Unexpected kernel conversion work, because some kernels use special base classes
  • TFRT attribute names are sorted alphabetically
    • Long-standing issue; there are some solutions, but we have not committed to one yet
  • Kernel registration not related to native kernel
  • Subset of kernels: how many?
    • Ideally ~400 based on estimate
    • Will get a better idea of whether we can convert all 400 kernels
  • How to fallback to full TF for kernels that use C API?
    • Depends on which platform you are using, select at compile time
  • Do we expect external users to use Kernel Fallback to fall back to their custom kernels?
    • We release two binaries, one with full TF, and the other with kernel fallback
    • One kernel implementation can work with both
  • Want to use TFRT kernel fallback for both TFLite Select and also TFMRT
    • TFMRT won't use the same API as TFLite
  • Should be able to package both runtime fallback and kernel fallback together
    • If we do that, we won't get binary size benefits
    • In the future, may consider kernel fallback to be used together with runtime fallback for non-mobile
  • Correct, this is a temporary solution until we have native kernels
    • Yes, will maintain them in TF repo
  • ENG cost (not scalability) for the kernel fallback:
    • Anna is responsible for adding those new kernels
    • Will provide an ENG estimate in Q4
  • Whether we want to use templates or inheritance? (see the sketch after these notes)
    • We need to convert a lot of kernels; how do we convert them?
    • On latency, inheritance is slower, due to vtable lookup for small kernels
    • basic_ops_benchmark calls methods that are used by all TF kernels
      • A regression would mean a lot more GCU used in TF
    • Will cause a regression on existing TF => this inheritance approach doesn't seem feasible
      • Could #ifdef the stuff we don't want to use; we already tried, but it is really hard to make it work
      • Have a simple base class with the common methods? But might as well use template
    • Only new TF kernels that want conversion require templates
    • If we go with templates, then would it be easy to just add native kernels?
      • You will have to add a few lines of code to every kernel
      • But likely can use ClangMR to update kernels in batch
      • If it's just a script that can update all kernels, then it seems fine
      • If it requires a lot more manual work, then maybe we need to prioritize on the set of kernels
    • What's the current scope of kernels?
      • There are two for now
      • Will estimate the ENG cost in Q3. Will implement in Q4.
    • Should come up with a list
    • TF has a lot of kernels. Need to depend on Runtime Fallback and/or Kernel Fallback
      • Kernels outside the TF repo do not use special TensorFlow headers, but are more likely to use C++ code that we don't handle
  • What's the next steps?
    • We are moving forward with kernel fallback, there are some details that need to be figured out
    • Anna will add more numbers, change inheritance to template, in a week
    • Need to know how many kernels can be converted:
      • 20 not so much
      • 400 is great!
      • What's the code size overhead? estimated at ~3% based on one kernel
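Since the template-vs-inheritance trade-off recurs throughout these notes, here is a minimal, self-contained sketch of the two dispatch styles under discussion, using toy types rather than the real TF classes:

```cpp
#include <cstdio>

// Option A: inheritance. The kernel codes against an abstract context,
// so one compiled kernel body serves any runtime, but each context call
// goes through a vtable (the small-kernel latency cost noted above).
struct ContextInterface {
  virtual ~ContextInterface() = default;
  virtual float input(int i) const = 0;
  virtual void set_output(float value) = 0;
};

void AddKernelVirtual(ContextInterface* ctx) {
  ctx->set_output(ctx->input(0) + ctx->input(1));
}

// Option B: templates. The kernel is stamped out per concrete context
// type: calls resolve at compile time (no vtable lookup), but binary
// size grows with every instantiation.
template <typename ContextT>
void AddKernelTemplated(ContextT* ctx) {
  ctx->set_output(ctx->input(0) + ctx->input(1));
}

// A toy context usable with either style.
struct ToyContext : ContextInterface {
  float in[2] = {1.0f, 2.0f};
  float out = 0.0f;
  float input(int i) const override { return in[i]; }
  void set_output(float value) override { out = value; }
};

int main() {
  ToyContext a, b;
  AddKernelVirtual(&a);    // dynamic dispatch through the vtable
  AddKernelTemplated(&b);  // static dispatch, resolved at compile time
  std::printf("%.1f %.1f\n", a.out, b.out);  // prints: 3.0 3.0
  return 0;
}
```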

@ematejska (Contributor) commented:
Will mark as accepted once the next steps above are completed by Anna.

@annarev (Contributor, Author) commented Aug 4, 2020

Just a heads up that I am still working on running more benchmarks to make the final template/inheritance decision and update the doc.
I will add a comment here once this doc is ready to merge.

@bhack (Contributor) commented Aug 4, 2020

@annarev How are the kernels sampled for these template/inheritance benchmarks? By "popularity"? (I suppose you have a small set.)

@annarev (Contributor, Author) commented Aug 4, 2020

@annarev How are the kernels sampled for these template/inheritance benchmarks? By "popularity"? (I suppose you have a small set.)

We try to look at kernels that are as small as possible (e.g., scalar ops) so that framework overhead is emphasized.

@bhack (Contributor) commented Aug 4, 2020

Are you also analyzing compile time? Do we need to introduce precompiled headers if we go ahead with inheritance? You probably don't care too much about compile time, but it is already quite heavy for contributors to compile TensorFlow sources without access to the "internal" cache (tensorflow/build#5).

@annarev (Contributor, Author) commented Aug 4, 2020

Are you also analyzing compile time? Do we need to introduce precompiled headers if we go ahead with inheritance? You probably don't care too much about compile time, but it is already quite heavy for contributors to compile TensorFlow sources without access to the "internal" cache (tensorflow/build#5).

We definitely don't want to increase compile time significantly, given that it already takes a long time. I would actually expect a larger compile-time increase from templates, since all templates would need to be expanded. Do you expect inheritance to have a larger build-time penalty?
I tried measuring build time, but it is not easy: building a single op is so dominated by everything else TensorFlow already has to build that I couldn't measure the overhead of this specific change.

@bhack (Contributor) commented Aug 4, 2020

Yes, I don't know all the internals, but I suppose it is a bit hard to isolate the compile time of a good sample of kernels. In general, of course, ISOCPP itself notes the increased compile time of templates, but we still need to check our empirical case, just so we are not made aware of it too late (once we have accumulated a significant number of registered kernels).

@annarev (Contributor, Author) commented Sep 11, 2020

Yes, I don't know all the internals, but I suppose it is a bit hard to isolate the compile time of a good sample of kernels. In general, of course, ISOCPP itself notes the increased compile time of templates, but we still need to check our empirical case, just so we are not made aware of it too late (once we have accumulated a significant number of registered kernels).

Quick update. I ran the build 3 times at head and 3 times with OpKernelContext and OpKernelConstruction updated to inherit from interfaces (our preferred approach right now is inheritance, so I didn't test templates).
Time measurements at head (seconds):
base = np.array([6776.767, 6595.033, 6926.461])
Time measurements with inheritance (seconds):
inheritance = np.array([7109.257, 6722.884, 6660.029])
The p-value (calculated with stats.ttest_ind(inheritance, base)) is quite high at 0.72, so most likely there is no build-time regression.

@annarev (Contributor, Author) commented Sep 14, 2020

@ematejska This RFC is now ready to accept and merge.

@ematejska ematejska added RFC: Accepted RFC Design Document: Accepted by Review and removed RFC: Proposed RFC Design Document labels Nov 13, 2020
@ematejska ematejska merged commit 9c4102d into tensorflow:master Nov 13, 2020