Elastiknn Development

This document includes some notes about development of Elastiknn.

Local Development Setup

You need at least the following software installed: git, Java 21, Python 3.10, SBT, Docker, Docker Compose, and Task. We assume the operating system is Linux or macOS. Some required software might be missing from this list; if so, please submit an issue or PR.

AWS Development Setup

The aws directory contains a Terraform file and instructions for creating a development instance in AWS.

Development

Run a local Elasticsearch instance with the plugin installed

Once you have the prerequisites installed, clone the project and run:

task jvmRunLocal

This starts a local instance of Elasticsearch with the plugin installed. It can take about five minutes the first time you run it.

Once you see "EXECUTING", open another shell and run curl localhost:9200. You should see the usual Elasticsearch JSON response containing the version, cluster name, etc.
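If you'd rather script this check, here is a minimal Python sketch using only the standard library. It assumes the default address http://localhost:9200 used above; fetch_root and summarize_root are hypothetical helpers, not part of Elastiknn.

```python
import json
import urllib.request

# Hypothetical helpers; assumes the default address used by `task jvmRunLocal`.
def fetch_root(url: str = "http://localhost:9200") -> dict:
    """GET the Elasticsearch root endpoint and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))

def summarize_root(info: dict) -> str:
    """Format the cluster name and Elasticsearch version from the root response."""
    return f"{info['cluster_name']} @ {info['version']['number']}"
```

With the instance running, print(summarize_root(fetch_root())) should print the cluster name and version.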

Project Structure

Elastiknn currently consists of several subprojects managed by Task and SBT:

  • client-python - Python client.
  • elastiknn-api4s - SBT project containing Scala case classes that model the Elastiknn API.
  • elastiknn-client-elastic4s - SBT project containing a Scala client based on Elastic4s.
  • elastiknn-lucene - SBT project containing custom Lucene queries implemented in Java.
  • elastiknn-models - SBT project containing custom similarity models implemented in Java.
  • elastiknn-plugin - SBT project containing the actual plugin implementation.
  • elastiknn-testing - SBT project containing Scala tests for all the other SBT subprojects.
  • ann-benchmarks - Python project for benchmarking based on erikbern/ann-benchmarks.

The lucene and models subprojects are implemented in Java for two reasons:

  1. It makes it easier to ask questions on the Lucene issue tracker and mailing list.
  2. They are the most CPU-bound parts of the codebase. While Scala's abstractions are nicer than Java's, they sometimes have a surprising performance cost (e.g., boxing).

Build tools: Task and SBT

SBT manages the plugin and all the Java and Scala subprojects.

Task is used to define command aliases with simple dependencies. This makes it relatively easy to run tests, generate docs, publish artifacts, etc., all from one file.

IDE

I recommend using IntelliJ IDEA to work on the SBT projects and PyCharm to work on the client-python project.

For IntelliJ, install the IntelliJ Scala plugin and open the elastiknn directory in IntelliJ. IntelliJ should recognize the SBT project. You might have to specify the JDK and Scala SDK; as of April 2024, we're using JDK 21 and Scala 3.3.3. Since early 2023, we've also been using an incubating JDK feature (the Vector API in jdk.incubator.vector), which requires some additional compiler settings. Go to Settings > Build, Execution, Deployment > Java Compiler, and add --add-modules jdk.incubator.vector --add-exports java.base/jdk.internal.vm.vector=ALL-UNNAMED --add-exports java.base/jdk.internal.vm.annotation=ALL-UNNAMED to the "Additional command line parameters". Then go to Settings > Build, Execution, Deployment > Scala Compiler, and add the same parameters to the "Additional compiler options".

For Python and PyCharm, first create a virtual environment in client-python/venv by running task pyCreateVenv. Then configure PyCharm to use the interpreter in client-python/venv.
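For reference, the venv-creation step can be approximated with Python's standard venv module. This is a sketch that assumes task pyCreateVenv does at least this much; the real task may also install the client's dependencies, so prefer the task in practice. create_dev_venv is a hypothetical helper.

```python
import venv
from pathlib import Path

def create_dev_venv(env_dir: str = "client-python/venv", with_pip: bool = True) -> Path:
    """Create a virtual environment and return the interpreter path to give PyCharm.

    Hypothetical helper: it only covers venv creation; `task pyCreateVenv`
    may also install the client's dependencies.
    """
    venv.create(env_dir, with_pip=with_pip)
    # On Linux and macOS, the interpreter lives in <env_dir>/bin/python.
    return Path(env_dir) / "bin" / "python"
```

The returned path is the one to select as the project interpreter in PyCharm.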

Testing

Elastiknn has a fairly thorough test suite.

To run it, you'll first need to run task dockerRunTestingCluster or task jvmRun to start a local Elasticsearch server.

Then, run task jvmUnitTest to run the SBT test suite, or task pyTest to run the smaller Python test suite.
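Because the first start can take several minutes, it can help to wait until the server responds before running the tests. A minimal sketch, assuming the server listens on http://localhost:9200 and that a green or yellow status from the standard _cluster/health endpoint is good enough; cluster_health and wait_until_ready are hypothetical names. The defaults poll for roughly five minutes (60 attempts, 5 seconds apart).

```python
import json
import time
import urllib.request
from typing import Callable

def cluster_health(url: str = "http://localhost:9200") -> dict:
    """GET /_cluster/health and decode the JSON body (hypothetical helper)."""
    with urllib.request.urlopen(url + "/_cluster/health", timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))

def wait_until_ready(fetch: Callable[[], dict] = cluster_health,
                     attempts: int = 60, delay_s: float = 5.0) -> dict:
    """Poll until the cluster reports green or yellow, else raise TimeoutError."""
    for _ in range(attempts):
        try:
            health = fetch()
            if health.get("status") in ("green", "yellow"):
                return health
        except OSError:
            # Covers URLError, refused connections, and socket timeouts:
            # the server is not accepting requests yet, so retry after a pause.
            pass
        time.sleep(delay_s)
    raise TimeoutError("Elasticsearch did not become ready in time")
```

Call wait_until_ready() first, then run task jvmUnitTest or task pyTest. The fetch parameter lets the polling logic be exercised without a live server.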

Debugging

You can attach IntelliJ's debugger to a local Elasticsearch process. This can be immensely helpful when dealing with bugs or just figuring out how the code is structured.

First, open your project in IntelliJ and run the Debug Elasticsearch run configuration (usually in the upper right corner). Then run task jvmRunLocalDebug in your terminal.

Now you can set and hit breakpoints in IntelliJ. To try it out, open the RestPluginsAction.java file in IntelliJ, add a breakpoint in the getTableWithHeader method, and run curl localhost:9200/_cat/plugins. IntelliJ should stop execution at your breakpoint.
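The same request can be scripted. This sketch calls _cat/plugins with format=json, which returns one object per node and plugin with name (the node), component (the plugin), and version fields, and checks for a component named elastiknn. The plugin name elastiknn is an assumption here, and list_plugins and has_plugin are hypothetical helpers. Since it goes through the same REST handler as the curl command, it should also stop at the breakpoint.

```python
import json
import urllib.request

def list_plugins(url: str = "http://localhost:9200") -> list:
    """GET /_cat/plugins as JSON: one row per (node, plugin) pair."""
    with urllib.request.urlopen(url + "/_cat/plugins?format=json", timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))

def has_plugin(rows: list, component: str = "elastiknn") -> bool:
    """True if any node reports the given plugin (name assumed to be 'elastiknn')."""
    return any(row.get("component") == component for row in rows)
```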

Local Cluster

Use task dockerRunTestingCluster to run a local cluster with one master node and one data node (using Docker Compose). A couple of parts of the codebase deal with serializing queries for use in a distributed environment, and running this small local cluster exercises those code paths.
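To confirm that both nodes have joined, you can query _cat/nodes. This sketch requests only the name, node.role, and master columns; node.role is a string of role letters (m for master-eligible, d for the generic data role), and the elected master is marked with * in the master column. Real node names depend on the Docker Compose file, list_nodes and describe_cluster are hypothetical helpers, and a data node left with default roles will also count as master-eligible.

```python
import json
import urllib.request

def list_nodes(url: str = "http://localhost:9200") -> list:
    """GET /_cat/nodes as JSON, keeping only the name, role, and master columns."""
    query = "/_cat/nodes?format=json&h=name,node.role,master"
    with urllib.request.urlopen(url + query, timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))

def describe_cluster(rows: list) -> dict:
    """Summarize node roles from `_cat/nodes` rows (hypothetical helper)."""
    return {
        "nodes": len(rows),
        # "m" marks master-eligible nodes; "d" marks the generic data role.
        "master_eligible": sum(1 for r in rows if "m" in r["node.role"]),
        "data": sum(1 for r in rows if "d" in r["node.role"]),
        # The elected master has "*" in the "master" column; others have "-".
        "elected_master": next((r["name"] for r in rows if r["master"] == "*"), None),
    }
```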

Benchmarking and Profiling

See ann-benchmarks/README.md.

Miscellaneous Quirks

  • To run Elasticsearch on Linux, you need to increase the vm.max_map_count setting. See the Elasticsearch docs.
  • To run ann-benchmarks on macOS, you might need to export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES. See this Stack Overflow answer.
  • If you're running macOS 13.x (Ventura), the operating system's privacy settings might block task jvmRunLocal from starting. One solution is to go to System Settings > Privacy & Security > Developer Tools, add your terminal (e.g., iTerm) to the list of developer apps, and make sure it is checked. If that doesn't work, see this thread for more ideas: elastic/elasticsearch#91159.
  • When running tests from IntelliJ, you might need to add --add-modules jdk.incubator.vector to the VM options.
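For the vm.max_map_count quirk, this sketch reads the current value on Linux and compares it against 262144, the minimum the Elasticsearch documentation asks for. The documented fix is sudo sysctl -w vm.max_map_count=262144, with the same setting added to /etc/sysctl.conf to persist it across reboots. The helper names are hypothetical.

```python
from pathlib import Path

# Minimum value the Elasticsearch docs require for vm.max_map_count.
ES_MIN_MAX_MAP_COUNT = 262144

def read_max_map_count(path: str = "/proc/sys/vm/max_map_count") -> int:
    """Read the kernel setting on Linux (the file does not exist on macOS)."""
    return int(Path(path).read_text().strip())

def max_map_count_ok(value: int, minimum: int = ES_MIN_MAX_MAP_COUNT) -> bool:
    """True if the value meets Elasticsearch's minimum."""
    return value >= minimum
```

On Linux, max_map_count_ok(read_max_map_count()) returns False if you still need to raise the setting.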

Nearest Neighbors Search

Nearest neighbors search is a large topic. Some good places to start are: