added github action
Showing 17 changed files with 906 additions and 70 deletions.
@@ -0,0 +1,63 @@
env:
  CGO_ENABLED: "0"
  REGISTRY: ghcr.io
  IMAGE_NAME: alash3al/vecdb

on:
  release:
    types: [created]

jobs:
  binary-releaser:
    name: Release Go Binary
    runs-on: ubuntu-latest
    strategy:
      matrix:
        goos: [linux, darwin, windows]
        goarch: [amd64, arm64]
    steps:
      - uses: actions/checkout@v4.1.3
      - uses: wangyoucao577/go-release-action@v1.50
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          goos: ${{ matrix.goos }}
          goarch: ${{ matrix.goarch }}
          goversion: "https://dl.google.com/go/go1.22.0.linux-amd64.tar.gz"
          project_path: "."
          binary_name: "vecdb"
          ldflags: "-s -w"

  docker-releaser:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4.1.3

      - name: Get Latest Tag
        id: var_tag
        run: echo "name=$(git describe --tags --abbrev=0)" >> $GITHUB_OUTPUT

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3.3.0

      - name: Log in to Github Container registry
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Build and push Docker image
        uses: docker/build-push-action@v5.3.0
        with:
          file: Dockerfile
          context: .
          push: true
          tags: |
            ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:latest
            ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ steps.var_tag.outputs.name }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
@@ -1,15 +1,104 @@
VecDB `[WIP]`
===========
> a very simple vector embedding database
VecDB
======
> a very simple vector embedding database,
> you can say that it is a hash table that lets you find items similar to the item you're searching for.
TODO
Why!
====
- [x] Build the storage layer interface
- [x] Introduce "Cosine Similarity" as the first vector similarity search algorithm in `v1` driver.
- [x] Introduce "BoltDB" as the first storage driver `v1` driver.
- [ ] Expose a server interface.
- [ ] Expose a command line interface.
> I'm a database enthusiast, and this is a for-fun learning project that could be used in production ;).
>
> **P.S**: I like to re-invent the wheel in my free time, because it is my free time!
Why?
====
> I'm a databases enthusiast, and this is a for fun and learning project.
Data Model
==========
> I'm using the `{key => value}` model,
> - `key` should be a unique value that represents the item.
> - `value` should be the vector itself (a list of floats).
Configurations
==============
> By default, `vecdb` looks for `config.yml` in the current working directory,
> but you can point it at your own file with the `--config /path/to/config.yml` flag.
```yaml
# http server related configs
server:
  # the address to listen on in the form of '[host]:port'
  listen: "0.0.0.0:3000"

# storage related configs
store:
  # the driver you want to use
  # currently vecdb supports "bolt", which is based on BoltDB, an in-process embedded database
  driver: "bolt"
  # the arguments required by the driver
  # bolt requires a key called `database` that points to the path where you want to store the data.
  args:
    database: "./vec.db"

# embeddings related configs
embedder:
  # whether to enable the embedder and all endpoints using it or not
  enabled: true
  # the driver you want to use, currently vecdb supports gemini
  driver: gemini
  # the arguments required by the driver
  # currently gemini driver requires `api_key` and `text_embedding_model`
  args:
    api_key: "${GEMINI_API_KEY}"
    text_embedding_model: "text-embedding-004"
```
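
The `api_key` value in the sample configuration is written as `${GEMINI_API_KEY}`, which suggests the key comes from the environment rather than being committed to the file. How `vecdb` resolves that placeholder is not shown in this diff. The Python sketch below only illustrates the general idea, assuming plain `${NAME}` substitution; the function name `expand_placeholders` is hypothetical and not part of `vecdb`.

```python
import os
import re

# Matches ${NAME} where NAME looks like a typical environment variable name.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_placeholders(text, env=None):
    """Replace each ${NAME} in text with env[NAME].

    Hypothetical helper: raises KeyError when a referenced variable is
    missing, so a misconfigured key fails loudly instead of silently.
    """
    env = os.environ if env is None else env

    def _substitute(match):
        name = match.group(1)
        if name not in env:
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return _PLACEHOLDER.sub(_substitute, text)


config_line = 'api_key: "${GEMINI_API_KEY}"'
print(expand_placeholders(config_line, {"GEMINI_API_KEY": "demo-key"}))
```

Passing the environment as a plain dictionary keeps the helper easy to test; omitting the second argument falls back to `os.environ`.
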
Components
===========
- Raw Vectors Layer (low-level)
  - send [VectorWriteRequest](#VectorWriteRequest) to `POST /v1/vectors/write` when you have a vector and want to store it somewhere.
  - send [VectorSearchRequest](#VectorSearchRequest) to `POST /v1/vectors/search` when you have a vector and want to list all similar vectors' keys/ids ordered by cosine similarity in descending order.
- Embedding Layer (optional)
  - send [TextEmbeddingWriteRequest](#TextEmbeddingWriteRequest) to `POST /v1/embeddings/text/write` when you have text and want `vecdb` to build and store the vector for you using the configured embedder (gemini for now).
  - send [TextEmbeddingSearchRequest](#TextEmbeddingSearchRequest) to `POST /v1/embeddings/text/search` when you have text and want `vecdb` to build a vector and search for similar vectors' keys for you, ordered by cosine similarity in descending order.

Requests
========

### VectorWriteRequest
```json5
{
  "bucket": "BUCKET_NAME", // consider it a collection or a table
  "key": "product-id-1", // should be unique and represent a valid value in your main data store (example: the row id in your mysql/postgres, etc.)
  "vector": [1.929292, 0.3848484, -1.9383838383, ... ] // the vector you want to store
}
```

### VectorSearchRequest
```json5
{
  "bucket": "BUCKET_NAME", // consider it a collection or a table
  "vector": [1.929292, 0.3848484, -1.9383838383, ... ], // you will get a list ordered by cosine similarity in descending order
  "min_cosine_similarity": 0.0, // the higher the threshold, the fewer results you will get
  "max_result_count": 10 // max vectors to return (vecdb will first order by cosine similarity then apply the limit)
}
```
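
The search semantics described above (score by cosine similarity, drop results below `min_cosine_similarity`, sort in descending order, then apply `max_result_count`) can be sketched in a few lines of Python. This is an in-memory illustration, not the BoltDB-backed implementation from the commit; treating the threshold as inclusive and scoring zero-length vectors as `0.0` are assumptions made here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: a zero vector is similar to nothing
    return dot / (norm_a * norm_b)


def search(bucket, vector, min_cosine_similarity=0.0, max_result_count=10):
    """Return (key, score) pairs, best match first.

    `bucket` is a plain dict of key -> vector standing in for one bucket.
    """
    scored = [(key, cosine_similarity(vector, stored)) for key, stored in bucket.items()]
    kept = [(key, score) for key, score in scored if score >= min_cosine_similarity]
    kept.sort(key=lambda pair: pair[1], reverse=True)  # order first ...
    return kept[:max_result_count]                     # ... then apply the limit


bucket = {
    "product-id-1": [1.0, 0.0],
    "product-id-2": [0.0, 1.0],
    "product-id-3": [1.0, 1.0],
}
results = search(bucket, [1.0, 0.0], min_cosine_similarity=0.5, max_result_count=2)
print(results)
```

With the query `[1.0, 0.0]`, `product-id-2` is orthogonal to it and falls below the `0.5` threshold, so only the other two keys are returned, with `product-id-1` ranked first.
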

### TextEmbeddingWriteRequest
> Available only if you set `embedder.enabled` to `true`.

```json5
{
  "bucket": "BUCKET_NAME", // consider it a collection or a table
  "key": "product-id-1", // should be unique and represent a valid value in your main data store (example: the row id in your mysql/postgres, etc.)
  "content": "This is some text representing the product" // this will be converted to a vector using the configured embedder
}
```

### TextEmbeddingSearchRequest
> Available only if you set `embedder.enabled` to `true`.

```json5
{
  "bucket": "BUCKET_NAME", // consider it a collection or a table
  "content": "A Product Text", // you will get a list ordered by cosine similarity in descending order
  "min_cosine_similarity": 0.0, // the higher the threshold, the fewer results you will get
  "max_result_count": 10 // max vectors to return (vecdb will first order by cosine similarity then apply the limit)
}
```
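
As a rough client-side illustration, the request bodies above can be posted with nothing but the Python standard library. The base URL `http://localhost:3000` is an assumption derived from the sample `server.listen` value, the `Content-Type: application/json` header is assumed, and `build_request` is a hypothetical helper name. The response format is not described in this diff, so the sketch only builds the requests and leaves sending them commented out.

```python
import json
import urllib.request

# Assumption: a vecdb server is reachable here (see `server.listen` in the config).
BASE_URL = "http://localhost:3000"


def build_request(path, payload, base_url=BASE_URL):
    """Build a JSON POST request for one of the documented endpoints."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        base_url + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


write_req = build_request(
    "/v1/embeddings/text/write",
    {
        "bucket": "BUCKET_NAME",
        "key": "product-id-1",
        "content": "This is some text representing the product",
    },
)

search_req = build_request(
    "/v1/embeddings/text/search",
    {
        "bucket": "BUCKET_NAME",
        "content": "A Product Text",
        "min_cosine_similarity": 0.0,
        "max_result_count": 10,
    },
)

# Sending requires a running server with `embedder.enabled: true`:
# with urllib.request.urlopen(write_req) as response:
#     print(response.status, response.read().decode("utf-8"))
print(write_req.get_method(), write_req.full_url)
print(search_req.get_method(), search_req.full_url)
```

The same helper works for `/v1/vectors/write` and `/v1/vectors/search` by swapping `content` for a `vector` list.
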
@@ -0,0 +1,14 @@
server:
  listen: ":3000"

store:
  driver: "bolt"
  args:
    database: "./vec.db"

embedder:
  enabled: true
  driver: gemini
  args:
    api_key: "${GEMINI_API_KEY}"
    text_embedding_model: "text-embedding-004"