Ungreedy subword tokenizer and vocabulary trainer for Python, Go & JavaScript
Updated Jul 2, 2024 - Go
TokenScript schema, specs and paper
Open morphology for Finnish
Frames iOS: making native card payments simple
Frames Android: making native card payments simple
A Python 3 module that provides functions for splitting identifiers found in source code files.
Taiwanese Hokkien Transliterator and Tokeniser
A complete search engine built on an inverted index of the 2018 Wikipedia corpus (72 GB). It returns the top search results for the given query words.
A tiny utility that takes a string and decomposes it into the letters of the Hungarian alphabet.
A search engine that returns customised recipes ranked by one of three sorting algorithms. Pre-processing and an inverted index improve query speed.
This project predicts MBTI personality types from a user's 50 most recent posts using NLP and ML techniques.
An R2E (research-to-earn) dapp for legal researchers
An end-to-end text summarizer application that uses Meta's BART model, fine-tuned on the Samsung dataset.