Releases: cedricrupb/code_tokenize
code_tokenize v0.2.1
This release mainly contains bug fixes and dependency updates. It should not change anything drastically.
code_tokenize v0.2.0
Major API redesign
code_tokenize v0.2.0 now mainly uses the visitor pattern to process the AST.
Changes
- tokenize now processes source code by parsing it into an AST and traversing the AST with a visitor
- Custom tokenizing visitors can be defined per language
- For Python, we corrected the tokenization process: indentation is now computed from the AST
- The code is extensively tested by parsing large libraries (Python and Java)
- More languages are more closely integrated
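The visitor-based design described above can be illustrated with a small, self-contained sketch. It is not code_tokenize's implementation or API: the Node class stands in for a Tree-Sitter node, and the node type names, the visitor classes and the #NEWLINE#/#INDENT#/#DEDENT# marker tokens are hypothetical choices made for illustration. A generic visitor emits the text of every leaf, while a Python-specific subclass overrides the handler for block nodes, so that indentation tokens are derived from the AST structure rather than from whitespace.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    # Hypothetical stand-in for a parse-tree node (e.g. a Tree-Sitter node)
    type: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


class TokenVisitor:
    """Generic visitor: dispatches on node type and emits leaf text as tokens."""

    def __init__(self) -> None:
        self.tokens: List[str] = []

    def visit(self, node: Node) -> None:
        # Look up a handler named visit_<node type>; fall back to generic_visit
        handler = getattr(self, "visit_" + node.type, None)
        if handler is not None:
            handler(node)
        else:
            self.generic_visit(node)

    def generic_visit(self, node: Node) -> None:
        if not node.children:
            self.tokens.append(node.text)
        for child in node.children:
            self.visit(child)


class PythonTokenVisitor(TokenVisitor):
    """Hypothetical language-specific visitor: a block node opens an indented region."""

    def visit_block(self, node: Node) -> None:
        self.tokens.extend(["#NEWLINE#", "#INDENT#"])
        for child in node.children:
            self.visit(child)
        self.tokens.append("#DEDENT#")


def tokenize(root: Node, visitor_cls=TokenVisitor) -> List[str]:
    visitor = visitor_cls()
    visitor.visit(root)
    return visitor.tokens


# Hand-built tree for: def my_func(): bar()
tree = Node("function_definition", children=[
    Node("def", "def"),
    Node("identifier", "my_func"),
    Node("parameters", children=[Node("(", "("), Node(")", ")")]),
    Node(":", ":"),
    Node("block", children=[
        Node("call", children=[
            Node("identifier", "bar"),
            Node("argument_list", children=[Node("(", "("), Node(")", ")")]),
        ]),
    ]),
])

print(tokenize(tree, PythonTokenVisitor))
# ['def', 'my_func', '(', ')', ':', '#NEWLINE#', '#INDENT#', 'bar', '(', ')', '#DEDENT#']
```

Dispatching on the node type name keeps the traversal logic in one place; in this sketch, supporting another language only requires a subclass that overrides the handlers that differ.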
code_tokenize v0.1.0
First major release of code.tokenize
First version to extend the functionality of the underlying AST parser.
Changes
- tokenize now parses source code with a language-specific configuration
- For Python, we automatically detect indentation and add special tokens
- The code is now extensively tested by parsing large libraries (Python and Java)
- Updated documentation to make usage easier
Minor features (still under test)
- AST-path-based detection of token types (variable usages, definitions, or function calls)
- Language-specific configuration for Java
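AST-path-based token type detection can be sketched by classifying each identifier leaf according to the path of its ancestors. The rules below (a function's name and its parameters are definitions, the left-hand side of an assignment is a definition, the callee of a call is a function call, everything else is a usage) and the node type names are hypothetical assumptions loosely modelled on Tree-Sitter's Python grammar; they are not the rules code_tokenize actually applies.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    # Hypothetical stand-in for a parse-tree node (e.g. a Tree-Sitter node)
    type: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


def classify(node: Node, path: List[Node]) -> str:
    """Hypothetical rules: infer an identifier's role from its ancestor path."""
    if node.type != "identifier":
        return "other"
    if any(ancestor.type == "parameters" for ancestor in path):
        return "definition"  # a parameter name
    parent: Optional[Node] = path[-1] if path else None
    if parent is None:
        return "usage"
    first = parent.children[0] is node  # identity check, not value equality
    if parent.type == "function_definition":
        return "definition"
    if parent.type == "assignment" and first:
        return "definition"
    if parent.type == "call" and first:
        return "call"
    return "usage"


def annotate(root: Node) -> List[Tuple[str, str]]:
    """Walk the tree, tracking the ancestor path, and label every leaf token."""
    labels: List[Tuple[str, str]] = []

    def walk(node: Node, path: List[Node]) -> None:
        if not node.children:
            labels.append((node.text, classify(node, path)))
        for child in node.children:
            walk(child, path + [node])

    walk(root, [])
    return labels


# Hand-built tree for: x = foo(y)
tree = Node("assignment", children=[
    Node("identifier", "x"),
    Node("=", "="),
    Node("call", children=[
        Node("identifier", "foo"),
        Node("argument_list", children=[
            Node("(", "("), Node("identifier", "y"), Node(")", ")"),
        ]),
    ]),
])

print(annotate(tree))
# [('x', 'definition'), ('=', 'other'), ('foo', 'call'), ('(', 'other'), ('y', 'usage'), (')', 'other')]
```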
code_tokenize v0.0.1
The first version of code.tokenize.
This version introduces the following features:
- Introduction of Token API
- AST-backed tokenization: the token interface gives easy access to the complete AST structure
- Fast AST parsing backend based on Tree-Sitter
- Full Tree-Sitter support: currently, every language supported by Tree-Sitter can be tokenized
- Auto-loading: the parser definition for each language is downloaded automatically
Minor features (still under test):
- Convention-based statement head identification (the first token of a statement)
- Convention-based statement splitting
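Convention-based statement head identification and splitting can be sketched as follows, assuming (hypothetically) that any node whose type ends in "_statement" is a statement, as in several Tree-Sitter grammars. The first leaf emitted under a statement node is its head, and the token stream is split at every head. This illustrates the idea only; the naming convention, the Node class and the function names are assumptions, not the library's convention or API.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Node:
    # Hypothetical stand-in for a parse-tree node (e.g. a Tree-Sitter node)
    type: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


def is_statement(node: Node) -> bool:
    # Hypothetical naming convention: statement node types end in "_statement"
    return node.type.endswith("_statement")


def tokens_and_heads(root: Node) -> Tuple[List[str], Set[int]]:
    """Collect leaf tokens and the indices of tokens that start a statement."""
    tokens: List[str] = []
    heads: Set[int] = set()

    def walk(node: Node) -> None:
        if is_statement(node):
            heads.add(len(tokens))  # the next leaf emitted is this statement's head
        if not node.children:
            tokens.append(node.text)
        for child in node.children:
            walk(child)

    walk(root)
    return tokens, heads


def split_statements(root: Node) -> List[List[str]]:
    """Start a new token segment at every statement head."""
    tokens, heads = tokens_and_heads(root)
    segments: List[List[str]] = []
    current: List[str] = []
    for i, token in enumerate(tokens):
        if i in heads and current:
            segments.append(current)
            current = []
        current.append(token)
    if current:
        segments.append(current)
    return segments


# Hand-built tree for two statements: x = 1 / return x
tree = Node("module", children=[
    Node("expression_statement", children=[
        Node("assignment", children=[
            Node("identifier", "x"), Node("=", "="), Node("integer", "1"),
        ]),
    ]),
    Node("return_statement", children=[
        Node("return", "return"), Node("identifier", "x"),
    ]),
])

print(split_statements(tree))  # [['x', '=', '1'], ['return', 'x']]
```

With nested statements (for example an if statement whose body contains further statements), this simple rule places the header tokens of the outer statement in their own segment; a real convention would likely need to handle such compound statements explicitly.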