Releases: cedricrupb/code_tokenize

code_tokenize v0.2.1

14 Jan 09:15

This release mainly contains bug fixes and dependency updates. It should not change anything drastically.

code_tokenize v0.2.0

28 Jun 11:07
2705388

Major API redesign

As of v0.2.0, code_tokenize relies mainly on the visitor pattern to process the AST.

Changes

  • tokenize now processes source code by parsing it into an AST and traversing that AST with a visitor
  • Custom tokenizing visitors can be defined per language
  • For Python, the tokenization process has been corrected: indentation is now computed from the AST
  • Code is extensively tested by parsing large libraries (Python and Java)
  • More languages are integrated more closely
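
To illustrate what visitor-based tokenization with per-language customization can look like, here is a minimal, self-contained sketch. It does not use code_tokenize's actual classes: `Node`, `TokenVisitor`, `SkipCommentsVisitor`, and `tokenize_tree` are hypothetical names invented for this example, and the tiny hand-built tree stands in for a real Tree-Sitter AST.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical AST node: a type name, optional text, and children."""
    type: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


class TokenVisitor:
    """Walks the tree depth-first and collects the text of leaf nodes."""

    def __init__(self):
        self.tokens = []

    def visit(self, node):
        # Dispatch to visit_<type> if it is defined, else use the default.
        handler = getattr(self, "visit_" + node.type, self.generic_visit)
        handler(node)

    def generic_visit(self, node):
        if not node.children:
            # Leaf node: its text becomes a token.
            self.tokens.append(node.text)
            return
        for child in node.children:
            self.visit(child)


class SkipCommentsVisitor(TokenVisitor):
    """A customized visitor (hypothetical) that drops comment nodes."""

    def visit_comment(self, node):
        pass  # emit nothing for comments


def tokenize_tree(root, visitor_cls=TokenVisitor):
    """Run a visitor over the tree and return the collected tokens."""
    visitor = visitor_cls()
    visitor.visit(root)
    return visitor.tokens


# Hand-built tree standing in for the parse of "# set x" followed by "x = 1".
TREE = Node("module", children=[
    Node("comment", "# set x"),
    Node("assignment", children=[
        Node("identifier", "x"),
        Node("operator", "="),
        Node("integer", "1"),
    ]),
])

print(tokenize_tree(TREE))
print(tokenize_tree(TREE, SkipCommentsVisitor))
```

The point of the design is that the traversal logic lives in one base class, while language-specific behavior is added by overriding `visit_<node type>` methods in a subclass.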

code_tokenize v0.1.0

19 Jan 18:53

First main release of code.tokenize

First version to extend the functionality of the underlying AST parser.

Changes

  • tokenize now parses source code with language-specific configuration
  • For Python, we automatically detect indentation and add special tokens
  • Code is now extensively tested by parsing large libraries (Python and Java)
  • Updated documentation to make usage easier
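
The idea of marking Python indentation with special tokens can be sketched with the standard library alone. This is not how code_tokenize implements it: the example below uses Python's built-in `tokenize` module, and the marker strings `<INDENT>` and `<DEDENT>` as well as the function name `tokens_with_indentation` are placeholders chosen for illustration.

```python
import io
import tokenize


def tokens_with_indentation(source):
    """Return token strings, with placeholder markers for indentation changes."""
    result = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.INDENT:
            result.append("<INDENT>")   # block opens
        elif tok.type == tokenize.DEDENT:
            result.append("<DEDENT>")   # block closes
        elif tok.type in (tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER):
            continue                    # drop line breaks and end marker
        else:
            result.append(tok.string)
    return result


SOURCE = "def f():\n    return 1\n"
print(tokens_with_indentation(SOURCE))
```

Because whitespace carries block structure in Python, explicit indent and dedent markers let a flat token sequence preserve that structure.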

Minor features (still under test)

  • AST-path-based detection of token types (variable usages, definitions, or function calls)
  • Language-specific configuration for Java

code_tokenize v0.0.1

01 Nov 16:14

The first version of code(dot)tokenize.
This version introduces the following features:

  • Introduction of Token API
  • AST backed tokenization: The token interface enables easy access to the complete AST structure
  • Fast AST parsing backend based on Tree-Sitter
  • Full support of Tree-Sitter: Currently, all languages which are supported by Tree-Sitter can be tokenized
  • Auto loading: The parser definition for each language is automatically downloaded

Minor features (still under test):

  • Convention-based statement head identification (the starting token of a statement)
  • Convention-based statement splitting