First of all, thank you for your interest in this extension.
Currently, spaCy fishing relies on entity-fishing version 0.0.5 which supports 11 languages (English, French, German, Spanish, Italian, Arabic, Japanese, Chinese (Mandarin), Russian, Portuguese and Farsi). Unfortunately, Danish resources are not yet supported by entity-fishing.
Creating the resources for a new language is a process that depends strongly on the evolution of the entity-fishing tool, and not (directly) on spaCy fishing.
However, if the Wikipedia corpus for Danish is sufficient, you can create the resources for a new language with the grisp tool and start a new entity-fishing instance for Danish. The full process for initializing a new language with grisp and entity-fishing is described here.
Feel free to open an issue on entity-fishing for more details on this process (perhaps this language is already in progress?).
There is currently no plan to support Danish, because the Danish Wikipedia is too small for decent entity disambiguation: it has 286,583 articles. In my experiments, below roughly 1M articles it becomes difficult to get correct coverage of entities, enough statistics, and enough disambiguation context examples. The resulting disambiguator would be too limited and inaccurate for practical use.
So I am currently focusing on languages with around 1M articles or more.
However, with some cross-lingual approaches, it might be possible in the future to support languages with limited Wikipedia size.
I really love this library, and it would be awesome if support for the Danish Wikipedia were added.
What is needed for this to happen?