A library that explores source code to model it and make predictions using the N-gram method.
The original idea and library can be found here
You can add SLP to your project in several ways.
We will be using the JitPack service.
- Add JitPack to repositories
repositories {
maven { url "https://jitpack.io" }
}
- Add library to compile dependencies
dependencies {
implementation "com.github.AndreyBychkov:SLP:x"
}
and replace x with the latest version number.
Alternatively, download the jar
and source code from the latest release
and add them as a dependency in whatever way you prefer.
val file = File("path/to/file.ext")
val manager = ModelRunnerManager()
val modelRunner = manager.getModelRunner(file.extension)
modelRunner.train(file)
val suggestion = modelRunner.getSuggestion("your source code")
println(suggestion)
Here we train a model on the specified file
and get a suggestion for the next token given inputs like for (
or System.out.
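For intuition, the N-gram idea behind such suggestions can be sketched in a few lines of self-contained Kotlin. This is an illustration of the technique only, not the library's internals: count which token follows each context and propose the most frequent continuation.

```kotlin
// Toy bigram "suggestion": count which token follows each preceding token
// and propose the most frequently seen continuation.
fun train(tokens: List<String>): Map<String, Map<String, Int>> {
    val counts = mutableMapOf<String, MutableMap<String, Int>>()
    for ((prev, next) in tokens.zipWithNext()) {
        val followers = counts.getOrPut(prev) { mutableMapOf() }
        followers[next] = (followers[next] ?: 0) + 1
    }
    return counts
}

fun suggest(counts: Map<String, Map<String, Int>>, context: String): String? =
    counts[context]?.maxByOrNull { it.value }?.key
```

For example, suggest(train(listOf("for", "(", "int", "i")), "for") returns "(", because "(" is the only token ever seen after "for" in the training data.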
ModelRunnerManager is a class that
- Provides models for a specified file extension or Language.
- Contains and manages all your models.
- Provides I/O operations to save and load itself, and thus the models it contains.
Example
val storeDirectory = File("path/to/dir")
ModelRunnerManager().apply {
load(storeDirectory)
getModelRunner(Language.JAVA).train("int i = 0;")
save(storeDirectory)
}
ModelRunner and LocalGlobalModelRunner are classes that wrap N-gram models and
- provide train and forget operations for texts and files
- provide a flexible suggestion API for predicting next tokens
ModelRunner's aim is to build a pipeline of the form
Input -> Lexer -> Vocabulary -> Model -> Vocabulary -> Reverse Lexer -> Output
so it requires 3 components:
- LexerRunner
- Vocabulary
- Model
Providing custom components can help you customize ModelRunner for your own needs.
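To make the pipeline concrete, here is a self-contained toy sketch with stand-ins for the three components. The real LexerRunner, Vocabulary, and Model interfaces differ; this only shows how data flows from text to indices and back.

```kotlin
// Toy end-to-end pipeline: text -> tokens -> indices -> model -> index -> token.
class ToyVocabulary {
    private val toIndex = mutableMapOf<String, Int>()
    private val toToken = mutableListOf<String>()
    fun index(token: String): Int = toIndex.getOrPut(token) {
        toToken.add(token); toToken.size - 1
    }
    fun token(index: Int): String = toToken[index]
}

fun pipeline(input: String): String {
    val vocab = ToyVocabulary()
    // "Lexer": naive whitespace split; real lexers are language-aware.
    val ids = input.split(" ").map { vocab.index(it) }
    // "Model": a stub that just echoes the last token id as its prediction.
    val predicted = ids.last()
    // "Reverse lexer": translate the predicted index back into text.
    return vocab.token(predicted)
}
```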
LocalGlobalModelRunner is an extension of ModelRunner that handles two different models: Local and Global.
We propose using the Local model in quickly changing contexts, such as within a file.
Conversely, we propose using the Global model in large, static contexts such as modules or projects.
Together they generate more balanced suggestions than either does individually.
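One way to picture such a blend (a sketch only, not the library's actual mixing math): average the scores a local and a global estimate assign to each candidate token, so that neither model dominates.

```kotlin
// Sketch: blend local and global next-token scores by simple averaging.
fun mixScores(local: Map<String, Double>, global: Map<String, Double>): Map<String, Double> =
    (local.keys + global.keys).associateWith { token ->
        ((local[token] ?: 0.0) + (global[token] ?: 0.0)) / 2
    }
```

A token scored highly by the local model but unknown to the global one (or vice versa) still surfaces, just with a damped score.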
LexerRunner is the class that manages a Lexer and implements the lexing pipeline.
Example
val lexerRunner = LexerRunnerFactory.getLexerRunner(Language.JAVA)
val code = "for (int i = 0; i != 10; ++i) {"
println(lexerRunner.lexLine(code).toList())
will print the list
[<s>, for, (, int, i, =, 0, ;, i, !=, 10, ;, ++, i, ), {, </s>]
You can use LexerRunnerFactory to get a predefined LexerRunner for the implemented languages.
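For intuition, a tiny regex-based tokenizer sketch (much simpler than the bundled lexers) that produces a similar token stream, including the <s> and </s> sentence markers, for simple Java-like code:

```kotlin
// Naive tokenizer sketch: words/numbers, a few two-char operators,
// then any single non-space character; wrapped in sentence markers.
fun toyLex(line: String): List<String> {
    val pattern = Regex("""\w+|!=|==|\+\+|--|\S""")
    return listOf("<s>") + pattern.findAll(line).map { it.value } + "</s>"
}
```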
Vocabulary is the class that translates tokens to numbers for the model.
You will never interact with it directly, but if you wish to control its contents manually, you can save and load it with VocabularyRunner and pass an already filled vocabulary to ModelRunner.
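The save-and-load idea can be sketched like this (a toy illustration, not VocabularyRunner's actual file format): persist the token list in order, so the vocabulary can later be rebuilt with stable indices.

```kotlin
import java.io.File

// Toy sketch: store one token per line so indices survive a save/load round trip.
fun saveVocab(tokens: List<String>, file: File) =
    file.writeText(tokens.joinToString("\n"))

fun loadVocab(file: File): Map<String, Int> =
    file.readLines().withIndex().associate { (i, t) -> t to i }
```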
Model is the interface every model must implement so it can be used by ModelRunner.
All our abstract models, like NGramModel,
have a static method standard
which returns the generally best model in its category.
If you want to mix, for instance, your neural network model with an N-gram based one,
your model should implement the Model
interface;
it can then be mixed in with MixModel.
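The contract can be pictured like this. Note that ToyModel and ToyMix are hypothetical simplifications for illustration; the real Model interface and MixModel have richer APIs.

```kotlin
// Hypothetical simplification of a model contract and a mixing wrapper.
interface ToyModel {
    fun score(context: List<String>, candidate: String): Double
}

// Mixes two models by averaging their scores for each candidate token.
class ToyMix(private val a: ToyModel, private val b: ToyModel) : ToyModel {
    override fun score(context: List<String>, candidate: String) =
        (a.score(context, candidate) + b.score(context, candidate)) / 2
}
```

Any model that satisfies the interface, N-gram or neural, can be dropped into the mix without the caller knowing the difference.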
If your language is not provided by SLP and you want to improve its performance or output appearance, we suggest the following steps:
- Implement
Lexer
to gain control over token extraction.
- Implement
CodeFilter
to gain control over the appearance of the output text. This class translates tokens into text. Also, feel free to use some predefined filters from Filters.
- Add your language to
Language
and LexerRunnerFactory
with its file extensions.
- Make a pull request.
Currently I am working on removing the need for a pull request, so you will be able to extend SLP directly in your project.