
tokenizer-parser

Here are 40 public repositories matching this topic...

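Data Annotation is a tool built specifically for processing and annotating text data. With a streamlined, fast text-annotation workflow and dynamic algorithmic feedback, it lets users annotate keywords quickly, while the algorithm continuously reduces the cost and time of manual annotation. Annotation proceeds in three stages: manual annotation first lays the foundation, automatic annotation then feeds back into the manual work, and a final manual pass corrects errors, which greatly improves both accuracy and efficiency. Data Annotation is a fully open-source project with no commercial edition, but it relies on an open-source digital foundation platform for managing personnel and roles. The resulting lexicons of various kinds are published on this platform periodically.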

  • Updated Dec 13, 2024
  • Java
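The three-stage loop described above (manual seeding, an automatic pass, then manual correction that feeds back into the next automatic pass) can be sketched as follows. This is a hypothetical Python illustration, not code from the project: `Lexicon`, `add_manual`, `auto_annotate`, and `apply_corrections` are invented names, and plain substring matching stands in for whatever algorithm the project actually uses.

```python
from dataclasses import dataclass, field


@dataclass
class Lexicon:
    """Hypothetical keyword lexicon that grows from human annotations."""
    keywords: set = field(default_factory=set)

    def add_manual(self, words):
        # Stage 1: human annotators seed the lexicon.
        self.keywords.update(words)

    def auto_annotate(self, text):
        # Stage 2: the automatic pass proposes a (start, end, keyword) span
        # for every occurrence of every known keyword.
        spans = []
        for kw in sorted(k for k in self.keywords if k):
            start = text.find(kw)
            while start != -1:
                spans.append((start, start + len(kw), kw))
                start = text.find(kw, start + 1)
        return sorted(spans)

    def apply_corrections(self, rejected, added):
        # Stage 3: human review drops false positives and adds missed
        # keywords, so the next automatic pass improves.
        self.keywords.difference_update(rejected)
        self.keywords.update(added)


lex = Lexicon()
lex.add_manual({"parser", "token"})
print(lex.auto_annotate("the parser reads each token"))
```

Each correction round changes what the next `auto_annotate` call proposes, which is the feedback effect the description refers to; a real system would replace substring matching with a learned model.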
antlr4-experiments

🔧 My studies on context-free grammar, using ANTLR4 (C++) to generate the parser files. The basics covered include token processing, recursion, variable definition, array processing, Abstract Syntax Tree (AST) manipulation, Unicode support, and error handling.

  • Updated Oct 17, 2022
  • Java
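Several of the topics listed for this repository (token processing, recursion, variable definition, AST construction, Unicode identifiers, error handling) can be illustrated with a small hand-written recursive-descent parser. This is a Python sketch under stated assumptions, not ANTLR-generated code and not the repository's grammar: the statement syntax (`let x = expr;`), the node classes, and the helper names are all hypothetical.

```python
import operator
import re
from dataclasses import dataclass


@dataclass
class Num:
    value: int

@dataclass
class Var:
    name: str

@dataclass
class BinOp:
    op: str
    left: object
    right: object

@dataclass
class Let:
    name: str
    expr: object


def tokenize(src):
    # Integers, Unicode-aware identifiers, or any other single non-space char.
    return re.findall(r"\d+|[^\W\d]\w*|\S", src)


class Parser:
    def __init__(self, tokens):
        self.tokens, self.i = tokens, 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def expect(self, tok=None):
        cur = self.peek()
        if cur is None or (tok is not None and cur != tok):
            raise SyntaxError(f"expected {tok!r}, got {cur!r}")
        self.i += 1
        return cur

    def program(self):
        stmts = []
        while self.peek() is not None:
            stmts.append(self.statement())
        return stmts

    def statement(self):
        # statement := 'let' IDENT '=' expr ';' | expr ';'
        if self.peek() == "let":
            self.expect("let")
            name = self.expect()
            if not name.isidentifier():
                raise SyntaxError(f"invalid variable name {name!r}")
            self.expect("=")
            node = Let(name, self.expr())
        else:
            node = self.expr()
        self.expect(";")
        return node

    def binary(self, ops, operand):
        # Left-associative loop: operand (op operand)*
        node = operand()
        while self.peek() in ops:
            op = self.expect()
            node = BinOp(op, node, operand())
        return node

    def expr(self):
        return self.binary(("+", "-"), self.term)

    def term(self):
        return self.binary(("*", "/"), self.factor)

    def factor(self):
        # factor := NUMBER | IDENT | '(' expr ')'  (recursion through parens)
        tok = self.expect()
        if tok.isdecimal():
            return Num(int(tok))
        if tok == "(":
            node = self.expr()
            self.expect(")")
            return node
        if tok.isidentifier():
            return Var(tok)
        raise SyntaxError(f"unexpected token {tok!r}")


OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.floordiv}  # '/' is integer division


def eval_expr(node, env):
    if isinstance(node, Num):
        return node.value
    if isinstance(node, Var):
        if node.name not in env:
            raise NameError(f"undefined variable {node.name!r}")
        return env[node.name]
    return OPS[node.op](eval_expr(node.left, env), eval_expr(node.right, env))


def run(src):
    # Evaluate statements in order; 'let' binds, bare expressions set the result.
    env, result = {}, None
    for stmt in Parser(tokenize(src)).program():
        if isinstance(stmt, Let):
            env[stmt.name] = eval_expr(stmt.expr, env)
        else:
            result = eval_expr(stmt, env)
    return result
```

For example, `run("let x = 2 + 3; let y = x * (4 - 1); y - 1;")` evaluates to `14`. ANTLR would generate the equivalent of `Parser` from a grammar file; writing it by hand makes the recursion between `expr`, `term`, and `factor` explicit.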
lex-yacc-experiments

🔧 My studies involving context-free grammar analysis. The analyzers were built using familiar tools such as YACC, Lex and Bison. Topics covered include token filtering, simple variable manipulation, and arrays.

  • Updated Oct 17, 2022
  • Yacc
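Token filtering in a Lex-style scanner means that some rules match input but produce no token (typically whitespace and comments). Lex resolves overlapping rules by taking the longest match and, on a tie, the rule listed first. The sketch below mimics that behavior in Python with the standard `re` module; it is not Lex itself, and the rule table (keywords, identifiers, numbers, array brackets, operators) is a hypothetical example loosely matching the variable and array topics mentioned above.

```python
import re

# Rules in priority order, as in a Lex specification: (token name, pattern).
# A name of None means "match but discard", i.e. the token is filtered out.
RULES = [
    ("KEYWORD", r"int|print"),
    ("IDENT", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("NUMBER", r"[0-9]+"),
    ("LBRACKET", r"\["),
    ("RBRACKET", r"\]"),
    ("OP", r"[=+\-*/;]"),
    (None, r"[ \t\n]+"),   # whitespace
    (None, r"//[^\n]*"),   # line comment
]
COMPILED = [(name, re.compile(pat)) for name, pat in RULES]


def lex(src):
    pos, out = 0, []
    while pos < len(src):
        best = None  # (match length, token name)
        for name, rx in COMPILED:
            m = rx.match(src, pos)
            if m and m.end() > pos:
                length = m.end() - pos
                # Longest match wins; a strict '>' keeps the earlier rule on ties.
                if best is None or length > best[0]:
                    best = (length, name)
        if best is None:
            raise SyntaxError(f"unexpected character {src[pos]!r} at {pos}")
        length, name = best
        if name is not None:
            out.append((name, src[pos:pos + length]))
        pos += length
    return out
```

With these rules, `int` is a `KEYWORD` (tie with `IDENT`, earlier rule wins) while `integer` is an `IDENT` (longer match wins), and `// ...` comments never reach the parser.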

A basic experiment in parsers, compilers, and a stack VM: a simple stack-based CPU with an assembly language and basic commands. A basic programming language is parsed into tokens, the tokens are parsed into expressions, and the expressions are compiled to assembly code that runs on the virtual CPU. It is also intended for parsing English grammar to build abstract syntax trees.

  • Updated May 2, 2021
  • Visual Basic .NET
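The pipeline in this entry (source text → tokens → expressions → assembly → virtual CPU) can be sketched in Python. This is not the project's VB.NET code: the instruction names `PUSH`, `ADD`, `SUB`, `MUL`, and `DIV` are hypothetical, and Dijkstra's shunting-yard algorithm is swapped in for an explicit expression-tree stage, emitting postfix stack code directly.

```python
import re

# Hypothetical instruction set for a tiny stack CPU.
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


def tokenize(src):
    # Integer literals or single non-space characters.
    return re.findall(r"\d+|\S", src)


def compile_expr(src):
    """Shunting-yard: emit stack-machine assembly in postfix order."""
    code, ops = [], []
    for tok in tokenize(src):
        if tok.isdecimal():
            code.append(f"PUSH {tok}")
        elif tok in PRECEDENCE:
            # Pop operators of higher or equal precedence (left-associative).
            while ops and ops[-1] != "(" and PRECEDENCE[ops[-1]] >= PRECEDENCE[tok]:
                code.append(OPCODES[ops.pop()])
            ops.append(tok)
        elif tok == "(":
            ops.append(tok)
        elif tok == ")":
            while ops and ops[-1] != "(":
                code.append(OPCODES[ops.pop()])
            if not ops:
                raise SyntaxError("unbalanced ')'")
            ops.pop()  # discard the matching '('
        else:
            raise SyntaxError(f"unexpected token {tok!r}")
    while ops:
        op = ops.pop()
        if op == "(":
            raise SyntaxError("unbalanced '('")
        code.append(OPCODES[op])
    return code


def run(code):
    """Execute the assembly on an operand stack, as a stack CPU would."""
    stack = []
    for instr in code:
        if instr.startswith("PUSH "):
            stack.append(int(instr.split()[1]))
            continue
        if len(stack) < 2:
            raise RuntimeError(f"stack underflow at {instr}")
        b, a = stack.pop(), stack.pop()  # right operand is on top
        if instr == "ADD":
            stack.append(a + b)
        elif instr == "SUB":
            stack.append(a - b)
        elif instr == "MUL":
            stack.append(a * b)
        elif instr == "DIV":
            stack.append(a // b)  # integer (floor) division
        else:
            raise RuntimeError(f"unknown instruction {instr!r}")
    if len(stack) != 1:
        raise RuntimeError("malformed program: expected exactly one result")
    return stack[0]
```

For `2 * (3 + 4) - 5`, `compile_expr` produces `PUSH 2, PUSH 3, PUSH 4, ADD, MUL, PUSH 5, SUB`, and `run` returns `9`. The sketch handles only integer literals, binary operators, and parentheses; unary minus and variables are left out.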

A machine learning approach to part-of-speech (POS) tagging of a Bengali corpus using BNLTK. This is an experimental project under the mentorship of Prof. Sandipan Ganguly, HIT-K.

  • Updated May 2, 2022
  • Jupyter Notebook
