Task Abstraction #12114
Replies: 2 comments 2 replies
-
Another idea: create a wrapper around the … The main downside I can see immediately is that I do not know how to incorporate spaCy's built-in components, since these are already registered.
-
Hi @coltonflowers! These are some interesting thoughts. Got some questions to better understand your use case.
I.e., all components your engineers are building and your data scientists will be using will be custom spaCy components, and your EL system will be assembled and run as a spaCy pipeline. Is that correct?
From your description it seems to me that the
"Users" is equivalent to data scientists in your use case?
-
I am currently building my own entity linking system that has two overarching goals:

1. Allow our NLP engineers to implement their own solutions to the various tasks and subtasks needed: extracting entities/spans, linking spans/entities to knowledge base IDs, and doing both in an end-to-end fashion. I'd like to be able to define a task by defining the document components it requires and the ones it will assign, much like in the `Language.factory` method, and have the NLP engineers implement concrete pipeline components that have the corresponding `requires` and `assigns` arguments.
2. Allow our data scientists to deploy the concrete implementations for the various tasks/subtasks provided by the NLP engineers without needing to know anything but a component's `requires` and `assigns` arguments.

To keep in line with the open-closed principle, I originally found myself creating a series of abstract base classes of models, each defined by its `requires` and `assigns` arguments.

I eventually realized that I was reimplementing a lot of functionality already present in spaCy, so I decided to do the entire project using spaCy's API. I thought that I could replace the abstract base classes with their respective pipeline components, but these two concepts are not really isomorphic, since the concrete pipeline components determine some (but not a lot) of the implementation/architecture detail. For example, the entity-recognizer pipeline component is inherently a transition-based parser, but one could imagine doing NER using something else. Interestingly, different concrete pipeline components can have the same input/output type combinations, but each pipeline component class is dedicated to a different type of implementation, e.g. rule-based vs. statistical, as in the `entity_ruler` vs. `ner`. I would like some sort of task abstraction to unite these two and tell users that they can replace a component of one task type with another component of the same task type.

I see three options:
`Pipe`/`TrainablePipe` component classes and their concrete component classes. Possibly use a mix-in? But then I'm not sure how to enforce that the mix-in only gets inherited along with either the `Pipe` or `TrainablePipe` class. Maybe I just need to do some more research about this.

Are there any additional pros/cons with these two approaches, or possibly an entirely different solution? Or am I better off not trying to do this altogether?
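
On the mix-in question, here is a minimal sketch of one way the co-inheritance rule might be enforced, using Python's `__init_subclass__` hook. Everything in it is an assumption for illustration: `TaskMixin`, `EntityRecognitionTask`, `RuleBasedNER`, `StatisticalNER`, `interchangeable`, and the `abstract` class keyword are hypothetical names, and `Pipe`/`TrainablePipe` are plain-Python stand-ins so the snippet runs without spaCy installed. It has not been tested against spaCy's real (Cython-based) classes.

```python
from typing import ClassVar, Tuple


# Stand-ins so the sketch runs on its own. In a real project these would
# presumably be imported: `from spacy.pipeline import Pipe, TrainablePipe`.
class Pipe:
    def __call__(self, doc):
        raise NotImplementedError


class TrainablePipe(Pipe):
    pass


class TaskMixin:
    """Hypothetical task abstraction: a task is its requires/assigns pair."""

    requires: ClassVar[Tuple[str, ...]] = ()
    assigns: ClassVar[Tuple[str, ...]] = ()

    def __init_subclass__(cls, abstract: bool = False, **kwargs):
        super().__init_subclass__(**kwargs)
        # Task definitions opt out with `abstract=True`; every other subclass
        # must also derive from Pipe or TrainablePipe.
        if not abstract and not issubclass(cls, (Pipe, TrainablePipe)):
            raise TypeError(
                f"{cls.__name__} uses a task mix-in but does not inherit "
                "from Pipe or TrainablePipe"
            )


class EntityRecognitionTask(TaskMixin, abstract=True):
    # Task definition only: declares the contract, no implementation.
    requires = ()
    assigns = ("doc.ents",)


class RuleBasedNER(EntityRecognitionTask, Pipe):
    def __call__(self, doc):
        return doc  # placeholder: a rule-based implementation would go here


class StatisticalNER(EntityRecognitionTask, TrainablePipe):
    def __call__(self, doc):
        return doc  # placeholder: a trainable implementation would go here


def interchangeable(a: type, b: type) -> bool:
    """True if two component classes declare the same task signature."""
    return set(a.requires) == set(b.requires) and set(a.assigns) == set(b.assigns)
```

With this, defining `class Broken(EntityRecognitionTask): ...` without a `Pipe` base raises a `TypeError` at class-definition time, while `interchangeable(RuleBasedNER, StatisticalNER)` tells a data scientist that one can stand in for the other. The class-level `requires` and `assigns` tuples could also be forwarded to the `requires` and `assigns` arguments when registering the component, so the task declaration and the registration stay in sync.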