From f1ce513693d188481e9f4184f7901225a87a5be5 Mon Sep 17 00:00:00 2001
From: annarev
Date: Tue, 7 Jul 2020 02:09:41 +0000
Subject: [PATCH] Initial commit for TFRT Kernel Fallback RFC

Formatting fixes
Formatting fixes and removed option 2 for C API
Fixed top table
Adjusted some of the wording
Changed kernel fallback RFC name, updated selective registration section
Fix links to cs.opensource.google
---
 rfcs/20200712-tfrt-kernel-fallback.md | 641 ++++++++++++++++++++++++++
 1 file changed, 641 insertions(+)
 create mode 100644 rfcs/20200712-tfrt-kernel-fallback.md

diff --git a/rfcs/20200712-tfrt-kernel-fallback.md b/rfcs/20200712-tfrt-kernel-fallback.md
new file mode 100644
index 000000000..6c5e1328f
--- /dev/null
+++ b/rfcs/20200712-tfrt-kernel-fallback.md
@@ -0,0 +1,641 @@

# TFRT Kernel Fallback

| Status        | Proposed                                                           |
| :------------ | :----------------------------------------------------------------- |
| **RFC #**     | [NNN](https://github.com/tensorflow/community/pull/NNN)             |
| **Author(s)** | Anna Revinskaya (annarev@google.com), Jeremy Lau (lauj@google.com)  |
| **Sponsor**   | Jeremy Lau (lauj@google.com)                                        |
| **Updated**   | 2020-07-06                                                          |

## Objective

This proposal focuses on getting the majority of “well-behaved” ops running in
[TF Lite](https://www.tensorflow.org/lite) by skipping the current eager runtime
and calling kernels directly in [TFRT](https://github.com/tensorflow/runtime) (a
new TensorFlow runtime).

Note that there is a parallel effort to call existing kernels by delegating to
the TensorFlow eager runtime instead. That approach is called Runtime Fallback,
and a corresponding RFC will be published soon. The goals of the two fallback
mechanisms are as follows:

* Runtime Fallback aims to reuse all current TensorFlow kernels in TFRT.
* Kernel Fallback (the focus of this document) aims to get a large number of
  existing kernels working in TFRT while reducing binary size to support
  mobile devices.

## Goals

High-level goals of the project:

* Call existing kernels from the new TensorFlow runtime.
* Reduce size and overhead to make this a feasible option for mobile.

We address the first goal by implementing a new fallback mechanism that directly
calls TensorFlow kernels without going through the Eager runtime first. We plan
to address the second goal by trimming down dependencies, switching to a more
compact proto representation, and so on.

### Op Coverage Goals

First of all, we plan to target all the easier-to-support ops that don’t require
implementing extensive pieces of infrastructure, but at the same time provide
the most value to the TF Lite team.

We analyzed how many kernels we can support in the future and include our
findings in the spreadsheets below. As we describe in
[Design Proposal](#design-proposal) below, Kernel Fallback depends on
customizing the
[OpKernelConstruction](https://cs.opensource.google/tensorflow/tensorflow/+/master:tensorflow/core/framework/op_kernel.h;l=256?q=OpKernelConstruction)
and
[OpKernelContext](https://cs.opensource.google/tensorflow/tensorflow/+/master:tensorflow/core/framework/op_kernel.h;l=584?q=OpKernelContext&ss=tensorflow%2Ftensorflow)
classes. The number of supported kernels will depend on how much of this surface
we manage to customize. (Note that I have already started prototyping an
implementation that includes a few common methods such as `input` and `output`.
The spreadsheet below considers these methods to be already *supported*.)
* List of kernels and the `OpKernelConstruction`/`OpKernelContext` methods they
  require:
  [spreadsheet](https://docs.google.com/spreadsheets/d/18bOu2gJQnZtCRGPZ4yerEAKUHgp1V429dPdCuzoCSkU/edit?usp=sharing)
* Proposed implementation order for these methods:
  [spreadsheet](https://docs.google.com/spreadsheets/d/10u6tcTE9PAi45A04nxSz61whSnwhscRLlSUNRPJugIY/edit?usp=sharing)

Based on these estimates, we can support at least 423 kernels. Note that this
number is based only on the `OpKernelConstruction`/`OpKernelContext` coverage
that we can provide; it doesn't take into consideration other issues we might
face.

### TFRT Integration Goals

We want to support executing a [BEF](https://github.com/tensorflow/runtime) file
on a mobile device that calls kernels using the Kernel Fallback mechanism. Users
will be able to generate a BEF file based on a saved model, and we will provide
a script to create it.

We might also want to support running ops using TFRT eager mode (that is, add a
custom
[OpHandler](https://github.com/tensorflow/runtime/blob/3c7a1ea02c87325f1b47aebb24b3ca6e84e7e7e7/include/tfrt/core_runtime/op_handler.h#L47)).

## Non-goals

* Supporting all existing ops. The `OpKernelContext` surface is quite large,
  and implementing all of it would require a significant amount of time.
  Instead, we will start by adding the most common and easily supported
  functionality. If certain functionality is only used by a handful of
  kernels, it might make more sense to implement TFRT native kernels instead.
  One notable example is
  [ResourceMgr](https://cs.opensource.google/tensorflow/tensorflow/+/master:tensorflow/core/framework/resource_mgr.h;l=152?q=ResourceMgr).
  We might support it later, but it is definitely not a first priority due to
  the extra effort required.
* Gradients will not be supported by the first iteration of Kernel Fallback,
  but we might revisit them later.
* Exact details of the TFRT integration are still being worked out by the TFRT
  and TF Lite teams. Since these teams might change the plan, exact details are
  not part of this doc. The takeaway is that we will integrate Kernel Fallback
  following whatever approach they decide on.

## Motivation

Currently, [TF Lite](https://www.tensorflow.org/lite) supports a
[limited set of ops](https://www.tensorflow.org/lite/guide/ops_compatibility).
As the range and variety of applications grows, it becomes essential to grow the
pool of available ops as well, ideally supporting everything that fully-fledged
TensorFlow supports now.

However, supporting TensorFlow ops on mobile devices presents some challenges.
Specifically, binary size on mobile platforms is tightly constrained. The TF
Lite team provided us with the following *ideal* numbers:

* 100-200k overhead to call TF kernels
* 20k per-kernel marginal size

To get closer to these size restrictions, we plan to define a call path from
TFRT to TensorFlow kernels that minimizes the amount of generated code.

## User Benefit

Running more kernels on mobile devices would allow TensorFlow users to implement
a wider range of models.

## Design Proposal

We propose to call the kernel’s `Compute` method directly from
[TFRT](https://github.com/tensorflow/runtime) without going through the
TensorFlow Eager C API first.

We introduce kernel context and registration implementations that support core
kernel functionality with minimal dependencies.
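To make the shape of this concrete, here is a rough sketch of the intended call
path, using the classes introduced in the sections that follow (attribute
plumbing, output handling, and error checking are elided):

```cpp
// Sketch: look the kernel up in the fallback registry, build lightweight
// construction/context objects from TFRT data, and call Compute() directly.
TFRTOpKernelConstruction op_kernel_construction(attributes);
std::unique_ptr<OpKernelBase> op =
    tfrt_forwarding_kernel_factories->CreateKernel("AddN",
                                                   &op_kernel_construction);
TFRTOpKernelContext op_kernel_context(inputs, outputs.size(), op_meta,
                                      exec_ctx.host());
op->Compute(&op_kernel_context);
```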
## Kernel registration

We will use a separate registry for kernels supported by TFRT forwarding. To do
so, we will define a `TFRTOpKernelFactories` class that keeps a map from kernel
name to a list of registrations.

```cpp
class TFRTOpKernelFactories {
 public:
  TFRTOpKernelFactories();
  void RegisterFactory(StringPiece kernel_class_name,
                       TFRTOpKernelReg kernel_info);

  // Creates a kernel with the given name and passes op_kernel_construction
  // to the kernel constructor.
  // Returns the constructed kernel on success.
  // In case of failure, returns a nullptr. Kernel creation can fail in one
  // of the following cases:
  //   1. No kernel with the given name is found.
  //   2. Attributes in op_kernel_construction don't match type constraints
  //      for any of the kernels with this name.
  //      Note that we consider a constraint to be "not matched" if the
  //      attribute it applies to is not in op_kernel_construction.
  std::unique_ptr<OpKernelBase> CreateKernel(
      StringPiece kernel_class_name,
      TFRTOpKernelConstruction* op_kernel_construction) const;

 private:
  llvm::StringMap<std::vector<TFRTOpKernelReg>> factories_;
};

extern llvm::ManagedStatic<TFRTOpKernelFactories>
    tfrt_forwarding_kernel_factories;
```

Similar to the current TensorFlow kernel registration, we will introduce a
registration macro that adds a kernel to `TFRTOpKernelFactories`:

```cpp
#define REGISTER_KERNEL_FALLBACK_KERNEL(name, ...) \
  REGISTER_KERNEL_FALLBACK_KERNEL_UNIQ_HELPER(__COUNTER__, name, __VA_ARGS__)

#define REGISTER_KERNEL_FALLBACK_KERNEL_UNIQ_HELPER(ctr, name, ...) \
  REGISTER_KERNEL_FALLBACK_KERNEL_UNIQ(ctr, name, __VA_ARGS__)

#define REGISTER_KERNEL_FALLBACK_KERNEL_UNIQ(ctr, name, ...)               \
  static bool global_tfrt_forwarding_kernel_##ctr##_registered_ = []() {   \
    ::tensorflow::tfrt_forwarding_kernel_factories->RegisterFactory(       \
        name, TFRTOpKernelReg([](TFRTOpKernelConstruction* construction)   \
                                  -> std::unique_ptr<OpKernelBase> {       \
          return std::make_unique<__VA_ARGS__>(construction);              \
        }));                                                               \
    return true;                                                           \
  }();
```

## Op registration

To support type specification, we will also provide a minimal op registry and a
corresponding macro, `REGISTER_KERNEL_FALLBACK_OP`.
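As a sketch of what such a registration might look like for `AddN` (the
builder-style `Input`/`Output`/`Attr` methods below are modeled on the existing
`REGISTER_OP` API and are an assumption, not a finalized interface):

```cpp
// Sketch: register op metadata so the fallback can check type constraints
// and infer output types without depending on the full OpDef machinery.
REGISTER_KERNEL_FALLBACK_OP("AddN")
    .Input("inputs: N * T")
    .Output("sum: T")
    .Attr("N: int >= 1")
    .Attr("T: {numbertype}");
```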
## Kernel implementation

TensorFlow kernels inherit from the
[OpKernel](https://cs.opensource.google/tensorflow/tensorflow/+/master:tensorflow/core/framework/op_kernel.h;l=82?q=opkernel)
class and depend on two key classes:
[OpKernelConstruction](https://cs.opensource.google/tensorflow/tensorflow/+/master:tensorflow/core/framework/op_kernel.h;l=256?q=opkernel)
and
[OpKernelContext](https://cs.opensource.google/tensorflow/tensorflow/+/master:tensorflow/core/framework/op_kernel.h;l=584?q=opkernel).
We want to provide custom implementations of these two classes in terms of the
data we get from TFRT (e.g., inputs and attributes).

There are two main approaches to customizing class implementations:

* Use inheritance and define common interfaces.
* Use templates.

We ran multiple benchmarks to get an idea of the trade-offs between the
inheritance and templating approaches. Key findings are summarized below:

* The time difference is negligible for full model benchmarks.
* A simple scalar op benchmark with Kernel Fallback (runs scalar
  multiplication, division, and addition) was only 0.3% slower on mobile with
  inheritance compared to templates.
* [basic\_ops\_benchmark](https://cs.opensource.google/tensorflow/tensorflow/+/master:tensorflow/core/kernels/basic_ops_benchmark_test.cc?q=basic_ops_benchmark_test)
  with inheritance is significantly slower: ~7% (median) or ~19% (mean),
  running on Linux. Note that this difference was measured *without* Kernel
  Fallback. Adding inheritance would impact all existing TensorFlow kernels,
  even those that don't support Kernel Fallback.
* The binary size increase when using templates compared to inheritance is
  estimated at 2.6% (based on adding the `AddN` op).

Right now, we are leaning towards using inheritance. The time increase appears
significant only when running many scalar ops in a sequence, which is probably
a rare use case in the real world.

To use inheritance, we will define `OpKernelConstructionInterface` and
`OpKernelContextInterface` interfaces. Ideally, these interfaces should be pure
virtual. However, we will have one exception: a templated `eigen_device` method
that calls per-device pure-virtual implementations.

We will then introduce `TFRTOpKernelConstruction` and `TFRTOpKernelContext`
subclasses that implement `OpKernelConstructionInterface` and
`OpKernelContextInterface` in terms of TFRT data structures. Here is an example
of what `TFRTOpKernelConstruction` might look like:

```cpp
class TFRTOpKernelConstruction : public OpKernelConstructionInterface {
 public:
  explicit TFRTOpKernelConstruction(AttrMap attributes);
  ~TFRTOpKernelConstruction() override {}

  Status GetAttr(StringPiece attr_name, int32* value) const override;
  Status GetAttr(StringPiece attr_name, DataType* value) const override;

  void CtxFailure(const Status& s);
  void CtxFailureWithWarning(const Status& s);
  void CtxFailure(const char* file, int line, const Status& s);
  void CtxFailureWithWarning(const char* file, int line, const Status& s);
  ...
};
```

When forwarding, we instantiate the kernel interfaces with TFRT’s lightweight
OpKernel definitions rather than, for example, TensorFlow’s
[heavyweight OpKernel definitions](https://cs.opensource.google/android/platform/superproject/+/master:external/tensorflow/tensorflow/core/framework/op_kernel.h;l=612?q=opkernelcontext).

Example `AddN` kernel implementation using these new interfaces:

```cpp
class AddNOp : public OpKernelBase {
 public:
  explicit AddNOp(OpKernelConstructionInterface* construction)
      : OpKernelBase(construction) {}

  void Compute(OpKernelContextInterface* ctx) override {
    if (!ctx->ValidateInputsAreSameShape(this)) return;
    ...
```

Here, the `OpKernelBase` implementation will be minimal:

```cpp
class OpKernelBase {
 public:
  explicit OpKernelBase(OpKernelConstructionInterface* context) {}
  virtual ~OpKernelBase() {}
  virtual void Compute(OpKernelContextInterface* context) = 0;
};
```

(For details on how extending from `OpKernelBase` instead of `OpKernel` would
work with the current TensorFlow runtime, see [Appendix 1](#appendix-1).)

The corresponding .cc file then registers the kernel using the correct kernel
and context classes. For example, this is how we register the `AddN` kernel
with TFRT:

```cpp
REGISTER_KERNEL_FALLBACK_KERNEL("AddN", AddNOp);
```

## Calling kernel

We add a new TFRT BEF kernel called `tfd.kernel_fallback`. This kernel directly
calls a TF kernel’s `Compute` method by creating `TFRTOpKernel*` data structures
that forward to corresponding TFRT concepts. For example, the following code
accesses an input in the `llvm::ArrayRef<tfrt::RCReference<tfrt::AsyncValue>>`
that we get from TFRT:

```cpp
const Tensor& TFRTOpKernelContext::input(int index) {
  return inputs_[index]->get<Tensor>();
}
```

Simplified definition of `tfd.kernel_fallback`:

```cpp
// Instantiate a kernel. This would be a TensorFlow kernel converted to inherit
// from `OpKernelBase` instead of `OpKernel`.
std::unique_ptr<OpKernelBase> op = …;

// Create TFRTOpKernelContext. The variable exec_ctx here is the
// tfrt::ExecutionContext passed to the kernel handler.
TFRTOpKernelContext op_kernel_context(inputs, outputs.size(), op_meta,
                                      exec_ctx.host());

// Directly invoke the TF kernel's Compute() method.
op->Compute(&op_kernel_context);
```
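For completeness, the handler implementing `tfd.kernel_fallback` would be
registered with the BEF executor like any other TFRT kernel. A minimal sketch,
assuming a handler function named `KernelFallbackExecute` (both the name and
the trivial signature here are illustrative):

```cpp
#include "tfrt/host_context/chain.h"
#include "tfrt/host_context/kernel_registry.h"
#include "tfrt/host_context/kernel_utils.h"

// Illustrative stand-in for the real handler, which would unpack inputs,
// attributes, and outputs as in the simplified definition above.
static tfrt::Chain KernelFallbackExecute(tfrt::Chain chain) { return chain; }

// Make tfd.kernel_fallback visible to the BEF executor.
void RegisterKernelFallbackKernels(tfrt::KernelRegistry* registry) {
  registry->AddKernel("tfd.kernel_fallback",
                      TFRT_KERNEL(KernelFallbackExecute));
}
```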
## tfd.kernel\_fallback call structure

We will be using the following conventions (essentially, these are based on the
Runtime Fallback work, which will probably have an RFC coming soon):

* Attributes are passed as key-value pairs, where both the key and the value
  are represented as strings.
* Types have a specific string representation. We try to use names consistent
  with BEF syntax as much as possible (e.g., `f32` represents `float`).
* Inputs and outputs have type `tensorflow::Tensor`. We will provide BEF
  kernels to construct these from BEF data (e.g., constant values).

Example of invoking the Conv3D kernel:

```
%tft_c = "tfd.kernel_fallback"(%tft_a, %tft_b) {
  _op_name = "Conv3D", attr1_name="data_format",
  attr1_value="string$NDHWC", attr2_name="strides",
  attr2_value="list(i32)$1,1,1,1,1", attr3_name="dilations",
  attr3_value="list(i32)$1,1,1,1,1", attr4_name="padding",
  attr4_value="padding$SAME"}: (!tfd.tensor, !tfd.tensor) -> !tfd.tensor
```

For example, the `dilations` attribute here has a value of `[1, 1, 1, 1, 1]`.

## Reusing Kernels

TensorFlow currently reuses kernels instantiated for a particular node in a
graph. It would be nice to have this optimization for Kernel Fallback as well.

The BEF executor keeps track of offsets within a BEF file. We can use this
offset to cache the corresponding kernel objects.

We should make sure that Kernel Fallback is thread-safe when reusing kernel
objects, since `Compute` for the same kernel can be called from multiple
threads. We can take a simple approach and support the kernel cache only for
stateless kernels. Stateless kernels only update `OpKernelContext` and not
`OpKernel` state itself.
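A minimal sketch of such a cache, keyed by BEF offset (the class and member
names below are illustrative, not part of the proposal; the mutex makes
concurrent lookups safe, while only kernels known to be stateless should be
cached here):

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <mutex>

// Sketch: cache stateless fallback kernels, keyed by their BEF file offset.
class FallbackKernelCache {
 public:
  // Returns the kernel cached for `bef_offset`, constructing it on first use.
  OpKernelBase* GetOrCreate(size_t bef_offset, StringPiece kernel_class_name,
                            TFRTOpKernelConstruction* construction) {
    std::lock_guard<std::mutex> lock(mu_);
    std::unique_ptr<OpKernelBase>& entry = kernels_[bef_offset];
    if (entry == nullptr) {
      entry = tfrt_forwarding_kernel_factories->CreateKernel(
          kernel_class_name, construction);
    }
    return entry.get();
  }

 private:
  std::mutex mu_;
  std::map<size_t, std::unique_ptr<OpKernelBase>> kernels_;
};
```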
## C API Integration

The Modular TensorFlow effort aims to break up giant monolithic TensorFlow
binaries into smaller shared libraries. Specifically, James (@sjamesr) and
Gunhan (@gunhan) looked at splitting kernels out of TensorFlow core. The
initial kernel C API definition is at
[kernels.h](https://cs.opensource.google/tensorflow/tensorflow/+/master:tensorflow/c/kernels.h)
and its implementation is at
[kernels.cc](https://cs.opensource.google/tensorflow/tensorflow/+/master:tensorflow/c/kernels.cc?q=kernels.cc).

Kernel Fallback should support kernels migrated to the C API as well. We can
implement this support behind the C API, so that we don’t have to update
individual kernels.

### C API multiple implementation structure

There are a few important takeaways from the current kernel C API
implementation that will impact decisions in this document:

1. We register a
   [COpKernel](https://cs.opensource.google/tensorflow/tensorflow/+/master:tensorflow/c/kernels.cc;l=104?q=copkernel)
   object (with the TensorFlow op kernel registry) for _any_ kernel defined
   using the C API.
1. `OpKernelContext` and `OpKernelConstruction` are passed around as opaque
   pointers on the C API surface (they get cast to `TF_OpKernelContext` and
   `TF_OpKernelConstruction` aliases).
1. Most of the functions just provide accessors into the
   `OpKernelContext`/`OpKernelConstruction` types.

Given the current API structure, we can consider two approaches going forward:

1. TFRT fully supports all functionality available in the C API. This way, any
   kernel defined using the C API would automatically be available using
   either full TensorFlow or the TFRT-to-TF forwarding delegate.
1. Certain functionality is only available with the TF backend. The TFRT C API
   implementation falls back to full TensorFlow in these cases.

I recommend that we prioritize option 1 and try to get it working (i.e.,
support all functionality with both the TensorFlow and TFRT C API backends). It
already takes a significant effort to support more kernels with the C API, so
we can put in a little extra effort and make sure they are supported by both
runtimes.

We propose to provide two implementations of the kernel C API. The first
implementation is the
[current one](https://cs.opensource.google/tensorflow/tensorflow/+/master:tensorflow/c/kernels.cc),
implemented in terms of the TensorFlow runtime. The second implementation will
use TFRT Kernel Fallback instead. We can select between the two kernel C API
implementations by adding a build config setting:

```
# Whether to use TFRT-based implementation of the kernel C API.
config_setting(
    name = "tfrt_kernel_c_api",
    define_values = {
        "tfrt_kernel_c_api": "True",
    },
)
```

Most of the kernel C API implementation will be the same between the two, with
a few notable exceptions:

* The TFRT Kernel Fallback implementation will cast `TF_OpKernelContext` and
  `TF_OpKernelConstruction` to `TFRTOpKernelContext` and
  `TFRTOpKernelConstruction`, respectively.
* The TFRT Kernel Fallback implementation will use the Kernel Fallback
  registration mechanism.

### TFRT forwarding kernel registration using C API

We plan to implement a C API for TFRT kernel registration that calls the TFRT
Kernel Fallback registration mechanism. Note that this is analogous to TF Lite
providing
[their own C API registration mechanism](https://cs.opensource.google/tensorflow/tensorflow/+/master:tensorflow/lite/c/common.h;l=739?q=tfliteregistration&ss=tensorflow%2Ftensorflow).
(The kernel class name in the snippet below is illustrative.)

```cpp
TF_KernelBuilder* TF_NewKernelBuilder(
    const char* op_name, const char* device_name,
    void* (*create_func)(TF_OpKernelConstruction*),
    void (*compute_func)(void*, TF_OpKernelContext*),
    void (*delete_func)(void*)) {
  TF_KernelBuilder* result = new TF_KernelBuilder;
  result->create_function = create_func;
  result->compute_function = compute_func;
  result->delete_function = delete_func;
  return result;
}

void TF_RegisterKernelBuilder(const char* name,
                              TF_KernelBuilder* builder,
                              TF_Status* status) {
  auto* create_fn = builder->create_function;
  auto* compute_fn = builder->compute_function;
  auto* delete_fn = builder->delete_function;
  auto create_kernel = [create_fn, compute_fn, delete_fn](
                           TFRTOpKernelConstruction* construction) {
    // TFRTOpKernelC (name illustrative): a COpKernel-like fallback kernel
    // that forwards to the C function pointers.
    return std::make_unique<TFRTOpKernelC>(
        construction, create_fn, compute_fn, delete_fn);
  };
  ::tensorflow::TFRTOpKernelReg kernel_info(create_kernel);
  kernel_info.type_constraints = builder->attr_to_type;
  ::tensorflow::tfrt_forwarding_kernel_factories->RegisterFactory(
      name, kernel_info);
  tensorflow::TFRTOpRegisterer(tensorflow::TFRTOpMetaBuilder(name));
  TF_DeleteKernelBuilder(builder);
  TF_SetStatus(status, TF_OK, "");
}
```
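For reference, this is roughly what a kernel defined against the C API looks
like today (the op name and the empty kernel state are illustrative); with the
TFRT-backed implementation of `kernels.h` selected, the same registration code
would route into the Kernel Fallback registry:

```cpp
#include "tensorflow/c/kernels.h"
#include "tensorflow/c/tf_status.h"
#include "tensorflow/c/tf_tensor.h"

// Illustrative stateless kernel: create/compute/delete callbacks.
void* BitcastCreate(TF_OpKernelConstruction* ctx) { return nullptr; }

void BitcastCompute(void* kernel, TF_OpKernelContext* ctx) {
  TF_Status* status = TF_NewStatus();
  TF_Tensor* input = nullptr;
  TF_GetInput(ctx, 0, &input, status);
  if (TF_GetCode(status) == TF_OK) {
    // ... compute the result and publish it with TF_SetOutput ...
  }
  TF_DeleteTensor(input);
  TF_DeleteStatus(status);
}

void BitcastDelete(void* kernel) {}

void RegisterBitcastKernel() {
  TF_Status* status = TF_NewStatus();
  TF_KernelBuilder* builder = TF_NewKernelBuilder(
      "Bitcast", "CPU", &BitcastCreate, &BitcastCompute, &BitcastDelete);
  TF_RegisterKernelBuilder("BitcastKernel", builder, status);
  TF_DeleteStatus(status);
}
```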
## TFRT integration

The current preferred direction is to generate a
[BEF](https://github.com/tensorflow/runtime) file in advance and then run that
file on a mobile device. The generated BEF file would have to call either
native, TF Lite, Runtime Fallback, or Kernel Fallback kernels, and provide any
glue logic in between (such as tensor conversions).

We also need to consider how Kernel or Runtime Fallback will be selected. This
could be a parameter at the BEF file creation step. It might also be good to
package both Runtime and Kernel Fallback implementations in a BEF file so that
one can be selected at runtime.

## Size Reduction

Since we want to run on mobile platforms, we need to look for any opportunity
to cut down size. First of all, we remove the dependency on the current
TensorFlow runtime (e.g., we no longer depend on the `NodeDef` and `OpDef`
protos). We are also looking at ways to reduce the large size contributions of
the [absl libraries](https://github.com/abseil/abseil-cpp/tree/master/absl) and
[protos](https://github.com/protocolbuffers/protobuf).

### Protos

We are currently investigating the following options:

* Switch to [upb](https://github.com/protocolbuffers/upb). This proto
  implementation provides C interfaces and is more compact.
* Remove the dependency on protos altogether.

### ABSL

We can hide ABSL references behind aliases (see
[tensorflow::StringPiece](https://cs.opensource.google/tensorflow/tensorflow/+/master:tensorflow/core/platform/stringpiece.h;l=33;drc=af7fd02ca40f362c4ac96dd064d6a2224b65d784)
for example) to make it easier to replace all references and save binary size.

@gunhan is also starting an effort to define a library of STL utilities that
will help us cut down on binary size.
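To illustrate the alias indirection mentioned above (the first alias mirrors
the linked `stringpiece.h`; the commented-out retarget is hypothetical):

```cpp
#include "absl/strings/string_view.h"

namespace tensorflow {

// Callers only ever name tensorflow::StringPiece, so the underlying type can
// be swapped without touching call sites. Today it points at absl:
using StringPiece = absl::string_view;

// If absl's binary-size cost proves too high on mobile, the alias could later
// be retargeted to a smaller replacement type, e.g.:
// using StringPiece = lite_strings::StringView;  // hypothetical

}  // namespace tensorflow
```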
## Selecting which kernels to register

We want to add a script to the TF Lite build setup that can determine the
required kernels based on a model. We would then build only these kernels. For
now, we will only support selective registration when building from source.

Script details still need to be worked out.

### Alternatives Considered

The main alternative to TFRT Kernel Fallback is TFRT Runtime Fallback. TFRT
Runtime Fallback will call the TensorFlow Eager C API (the corresponding RFC
should be published soon). The main trade-offs between the two fallbacks are
described in the table below:

Property    | TFRT Kernel Fallback     | TFRT Runtime Fallback
----------- | ------------------------ | ---------------------
Generality  | Supports a subset of ops | Supports all ops
Performance | Lower overhead           | Higher overhead
Binary size | Lower (no TF runtime)    | Higher

### Performance Implications

* Slowdown due to adding inheritance for `OpKernelContext` and
  `OpKernelConstruction`.
* Speedup from lighter-weight kernel calls.

We will run benchmarks to check performance numbers as we work on the
implementation.

### Dependencies

No new dependencies.

### Engineering Impact

* Build time, startup time, and binary size will be impacted by the additional
  code added to implement Kernel Fallback. At the same time, one of the goals
  of Kernel Fallback is to provide a lower-binary-size way to run existing
  TensorFlow kernels in TF Lite.
* Code will be maintained by the TensorFlow DevInfra and TFRT teams.

### Platforms and Environments

* Primarily geared towards mobile platforms, but should work on non-mobile
  platforms as well.

### Best Practices

* It might be preferable to implement future kernels so that they extend
  `OpKernelBase` and take the
  `OpKernelConstructionInterface`/`OpKernelContextInterface` interfaces. This
  would allow new kernels to be used by Kernel Fallback. Currently, there is
  no plan to enforce this beyond providing advice at code review time.

### Tutorials and Examples

* It would be useful to update the
  [Create an op](https://www.tensorflow.org/guide/create_op) documentation.

### Compatibility

This proposal should not impact compatibility.

### User Impact

* There will be a new way to implement a kernel, but it will be optional.
  Current APIs should still work.

## Questions and Discussion Topics

Seed this with open questions you require feedback on from the RFC process.

## Appendix 1

As discussed above, we want to convert (some) kernels to extend `OpKernelBase`
instead of `OpKernel`. This lets us remove runtime-specific information from
kernel subclasses and lets us support both the current and the new TensorFlow
runtime.

However, the TensorFlow runtime assumes that kernels extend `OpKernel` and
support all of its functionality. In other words, we want kernels to extend
`OpKernelBase` but be added to the existing TensorFlow registry as `OpKernel`
objects.

It seems easiest to wrap `OpKernelBase` in a class that extends `OpKernel`
(this wrapper is called `WrappedOpKernel` below):

```cpp
class WrappedOpKernel : public OpKernel {
 public:
  explicit WrappedOpKernel(OpKernelConstruction* context,
                           std::unique_ptr<OpKernelBase> impl)
      : OpKernel(context), impl_(std::move(impl)) {}

  void Compute(OpKernelContext* context) override {
    impl_->Compute(context);
  }

 private:
  std::unique_ptr<OpKernelBase> impl_;
};
```

Kernels of type `WrappedOpKernel` will be created with a corresponding
`WrappedOpKernelFactory` in TensorFlow:

```cpp
struct WrappedOpKernelFactory : public OpKernelFactory {
  explicit WrappedOpKernelFactory(
      OpKernelBase* (*create_func)(OpKernelConstructionInterface*))
      : create_func_(create_func) {}

  OpKernel* Create(OpKernelConstruction* context) override;
  OpKernelBase* (*create_func_)(OpKernelConstructionInterface*);
};

OpKernel* WrappedOpKernelFactory::Create(OpKernelConstruction* context) {
  std::unique_ptr<OpKernelBase> impl((*create_func_)(context));
  return new WrappedOpKernel(context, std::move(impl));
}
```

This approach has several benefits:

* Existing, non-converted kernels still extend `OpKernel`; no code change is
  needed.
* Converted kernels registered with TensorFlow are still wrapped in an
  `OpKernel`, and therefore the TensorFlow runtime can access all fields
  currently supported by `OpKernel`.
* Converted kernels registered with TFRT only depend on `OpKernelBase` (for
  example, they do not have `NodeDef`-related properties that are not
  supported by TFRT).
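As a usage sketch tying the pieces together (the factory-registration line is
hypothetical; the existing `REGISTER_KERNEL_BUILDER` plumbing would need a
small extension to accept a `WrappedOpKernelFactory`):

```cpp
// Factory function for the converted AddNOp from the Kernel implementation
// section; it returns the runtime-agnostic OpKernelBase type.
OpKernelBase* CreateAddNOp(OpKernelConstructionInterface* construction) {
  return new AddNOp(construction);
}

// Hypothetical registration with the current TensorFlow runtime: Create()
// wraps the kernel in a WrappedOpKernel, so the registry still sees OpKernels.
static WrappedOpKernelFactory addn_factory(&CreateAddNOp);
```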