Commit

finalized most requirements
added github action
alash3al committed Aug 7, 2024
1 parent 48595b8 commit c699417
Showing 17 changed files with 906 additions and 70 deletions.
63 changes: 63 additions & 0 deletions .github/workflows/releaser.yaml
@@ -0,0 +1,63 @@
env:
  CGO_ENABLED: "0"
  REGISTRY: ghcr.io
  IMAGE_NAME: alash3al/vecdb

on:
  release:
    types: [created]

jobs:
  binary-releaser:
    name: Release Go Binary
    runs-on: ubuntu-latest
    strategy:
      matrix:
        goos: [linux, darwin, windows]
        goarch: [amd64, arm64]
    steps:
      - uses: actions/checkout@v4.1.3
      - uses: wangyoucao577/go-release-action@v1.50
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          goos: ${{ matrix.goos }}
          goarch: ${{ matrix.goarch }}
          goversion: "https://dl.google.com/go/go1.22.0.linux-amd64.tar.gz"
          project_path: "."
          binary_name: "vecdb"
          ldflags: "-s -w"

  docker-releaser:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4.1.3

      - name: Get Latest Tag
        id: var_tag
        run: echo "name=$(git describe --tags --abbrev=0)" >> $GITHUB_OUTPUT

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3.3.0

      - name: Log in to Github Container registry
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Build and push Docker image
        uses: docker/build-push-action@v5.3.0
        with:
          file: Dockerfile
          context: .
          push: true
          tags: |
            ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:latest
            ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ steps.var_tag.outputs.name }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
113 changes: 101 additions & 12 deletions README.md
@@ -1,15 +1,104 @@
VecDB `[WIP]`
===========
> a very simple vector embedding database
VecDB
======
> a very simple vector embedding database;
> you can think of it as a hash table that lets you find items similar to the item you're searching for.
TODO
Why!
====
- [x] Build the storage layer interface
- [x] Introduce "Cosine Similarity" as the first vector similarity search algorithm in `v1` driver.
- [x] Introduce "BoltDB" as the first storage driver `v1` driver.
- [ ] Expose a server interface.
- [ ] Expose a command line interface.
> I'm a database enthusiast, and this is a for-fun learning project that could be used in production ;).
>
> **P.S**: I like to re-invent the wheel in my free time, because it is my free time!
Why?
====
> I'm a database enthusiast, and this is a for-fun learning project.
Data Model
==========
> I'm using the `{key => value}` model,
> - `key` should be a unique value that represents the item.
> - `value` should be the vector itself (List of Floats).
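The `{key => value}` model above can be illustrated with a small sketch of how a vector might be serialized into bytes for a key-value store. The 8-bytes-per-float little-endian layout, the helper names `encodeVector`/`decodeVector`, and the in-memory `bucket` map are assumptions for illustration only; the diff does not show how vecdb actually encodes values.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// encodeVector packs a vector into bytes, 8 bytes per float64, little-endian.
// This layout is an assumption, not vecdb's confirmed on-disk format.
func encodeVector(vec []float64) []byte {
	buf := make([]byte, 8*len(vec))
	for i, f := range vec {
		binary.LittleEndian.PutUint64(buf[i*8:], math.Float64bits(f))
	}
	return buf
}

// decodeVector reverses encodeVector and rejects truncated input.
func decodeVector(buf []byte) ([]float64, error) {
	if len(buf)%8 != 0 {
		return nil, fmt.Errorf("invalid vector payload: %d bytes", len(buf))
	}
	vec := make([]float64, len(buf)/8)
	for i := range vec {
		vec[i] = math.Float64frombits(binary.LittleEndian.Uint64(buf[i*8:]))
	}
	return vec, nil
}

func main() {
	// Hypothetical in-memory stand-in for a bucket: key => encoded vector.
	bucket := map[string][]byte{}
	bucket["product-id-1"] = encodeVector([]float64{1.929292, 0.3848484, -1.9383838383})

	vec, err := decodeVector(bucket["product-id-1"])
	if err != nil {
		panic(err)
	}
	fmt.Println(len(vec), vec)
}
```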
Configurations
==============
> By default, `vecdb` looks for `config.yml` in the current working directory.
> You can override this by passing your own file path with the `--config /path/to/config.yml` flag.
```yaml
# http server related configs
server:
  # the address to listen on in the form of '[host]:port'
  listen: "0.0.0.0:3000"

# storage related configs
store:
  # the driver you want to use
  # currently vecdb supports "bolt", which is based on BoltDB, an in-process embedded database
  driver: "bolt"
  # the arguments required by the driver
  # bolt requires a key called `database` that points to the path where you want to store the data
  args:
    database: "./vec.db"

# embeddings related configs
embedder:
  # whether to enable the embedder and all endpoints that use it
  enabled: true
  # the driver you want to use; currently vecdb supports gemini
  driver: gemini
  # the arguments required by the driver
  # currently the gemini driver requires `api_key` and `text_embedding_model`
  args:
    api_key: "${GEMINI_API_KEY}"
    text_embedding_model: "text-embedding-004"
```
Components
===========
- Raw Vectors Layer (low-level)
  - send [VectorWriteRequest](#VectorWriteRequest) to `POST /v1/vectors/write` when you have a vector and want to store it somewhere.
  - send [VectorSearchRequest](#VectorSearchRequest) to `POST /v1/vectors/search` when you have a vector and want to list all similar vectors' keys/ids ordered by cosine similarity in descending order.
- Embedding Layer (optional)
  - send [TextEmbeddingWriteRequest](#TextEmbeddingWriteRequest) to `POST /v1/embeddings/text/write` when you have a text and want `vecdb` to build and store the vector for you using the configured embedder (gemini for now).
  - send [TextEmbeddingSearchRequest](#TextEmbeddingSearchRequest) to `POST /v1/embeddings/text/search` when you have a text and want `vecdb` to build a vector and search for similar vectors' keys for you ordered by cosine similarity in descending order.
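
The two layers described above can be pictured as an embedding step that produces a vector and then delegates to the raw vectors layer. The sketch below is only an illustration of that layering: the `Embedder` and `VectorStore` interfaces, `memStore`, `lengthEmbedder` (a toy stand-in for the gemini driver), and `writeText` are all hypothetical names, not vecdb's actual types.

```go
package main

import (
	"errors"
	"fmt"
)

// Embedder turns text into a vector (hypothetical interface).
type Embedder interface {
	Embed(text string) ([]float64, error)
}

// VectorStore is the raw vectors layer (hypothetical interface).
type VectorStore interface {
	Write(bucket, key string, vec []float64) error
}

// memStore is an in-memory store: bucket => key => vector.
type memStore map[string]map[string][]float64

func (m memStore) Write(bucket, key string, vec []float64) error {
	if len(vec) == 0 {
		return errors.New("empty vector")
	}
	if m[bucket] == nil {
		m[bucket] = map[string][]float64{}
	}
	m[bucket][key] = vec
	return nil
}

// lengthEmbedder is a toy embedder used only so the sketch runs offline.
type lengthEmbedder struct{}

func (lengthEmbedder) Embed(text string) ([]float64, error) {
	return []float64{float64(len(text)), 1}, nil
}

// writeText embeds the content, then stores the vector through the raw layer.
func writeText(e Embedder, s VectorStore, bucket, key, content string) error {
	vec, err := e.Embed(content)
	if err != nil {
		return fmt.Errorf("embed: %w", err)
	}
	return s.Write(bucket, key, vec)
}

func main() {
	store := memStore{}
	err := writeText(lengthEmbedder{}, store, "BUCKET_NAME", "product-id-1",
		"This is some text representing the product")
	if err != nil {
		panic(err)
	}
	fmt.Println(store["BUCKET_NAME"]["product-id-1"])
}
```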

Requests
========

### VectorWriteRequest
```json5
{
  "bucket": "BUCKET_NAME", // consider it a collection or a table
  "key": "product-id-1", // should be unique and represent a valid value in your main data store (example: the row id in your mysql/postgres ... etc)
  "vector": [1.929292, 0.3848484, -1.9383838383, ... ] // the vector you want to store
}
```

### VectorSearchRequest
```json5
{
  "bucket": "BUCKET_NAME", // consider it a collection or a table
  "vector": [1.929292, 0.3848484, -1.9383838383, ... ], // you will get a list ordered by cosine-similarity in descending order
  "min_cosine_similarity": 0.0, // the higher this threshold, the fewer results you will get
  "max_result_count": 10 // max vectors to return (vecdb will first order by cosine similarity then apply the limit)
}
```

### TextEmbeddingWriteRequest
> Available only if you set `embedder.enabled` to `true`.

```json5
{
  "bucket": "BUCKET_NAME", // consider it a collection or a table
  "key": "product-id-1", // should be unique and represent a valid value in your main data store (example: the row id in your mysql/postgres ... etc)
  "content": "This is some text representing the product" // this will be converted to a vector using the configured embedder
}
```

### TextEmbeddingSearchRequest
> Available only if you set `embedder.enabled` to `true`.

```json5
{
  "bucket": "BUCKET_NAME", // consider it a collection or a table
  "content": "A Product Text", // you will get a list ordered by cosine-similarity in descending order
  "min_cosine_similarity": 0.0, // the higher this threshold, the fewer results you will get
  "max_result_count": 10 // max vectors to return (vecdb will first order by cosine similarity then apply the limit)
}
```
14 changes: 14 additions & 0 deletions config.yml
@@ -0,0 +1,14 @@
server:
  listen: ":3000"

store:
  driver: "bolt"
  args:
    database: "./vec.db"

embedder:
  enabled: true
  driver: gemini
  args:
    api_key: "${GEMINI_API_KEY}"
    text_embedding_model: "text-embedding-004"
50 changes: 49 additions & 1 deletion go.mod
@@ -8,7 +8,55 @@ require (
)

require (
cloud.google.com/go v0.115.0 // indirect
cloud.google.com/go/ai v0.8.0 // indirect
cloud.google.com/go/auth v0.7.3 // indirect
cloud.google.com/go/auth/oauth2adapt v0.2.3 // indirect
cloud.google.com/go/compute/metadata v0.5.0 // indirect
cloud.google.com/go/longrunning v0.5.7 // indirect
github.com/andybalholm/brotli v1.0.5 // indirect
github.com/felixge/httpsnoop v1.0.4 // indirect
github.com/gabriel-vasile/mimetype v1.4.3 // indirect
github.com/go-logr/logr v1.4.2 // indirect
github.com/go-logr/stdr v1.2.2 // indirect
github.com/go-playground/locales v0.14.1 // indirect
github.com/go-playground/universal-translator v0.18.1 // indirect
github.com/go-playground/validator/v10 v10.22.0 // indirect
github.com/gofiber/fiber/v2 v2.52.5 // indirect
github.com/golang/groupcache v0.0.0-20210331224755-41bb18bfe9da // indirect
github.com/golang/protobuf v1.5.4 // indirect
github.com/google/generative-ai-go v0.17.0 // indirect
github.com/google/s2a-go v0.1.8 // indirect
github.com/google/uuid v1.6.0 // indirect
github.com/googleapis/enterprise-certificate-proxy v0.3.2 // indirect
github.com/googleapis/gax-go/v2 v2.13.0 // indirect
github.com/klauspost/compress v1.17.0 // indirect
github.com/leodido/go-urn v1.4.0 // indirect
github.com/mattn/go-colorable v0.1.13 // indirect
github.com/mattn/go-isatty v0.0.20 // indirect
github.com/mattn/go-runewidth v0.0.15 // indirect
github.com/rivo/uniseg v0.2.0 // indirect
github.com/stretchr/testify v1.9.0 // indirect
github.com/valyala/bytebufferpool v1.0.0 // indirect
github.com/valyala/fasthttp v1.51.0 // indirect
github.com/valyala/tcplisten v1.0.0 // indirect
go.opencensus.io v0.24.0 // indirect
go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc v0.51.0 // indirect
go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.51.0 // indirect
go.opentelemetry.io/otel v1.26.0 // indirect
go.opentelemetry.io/otel/metric v1.26.0 // indirect
go.opentelemetry.io/otel/trace v1.26.0 // indirect
golang.org/x/crypto v0.25.0 // indirect
golang.org/x/net v0.27.0 // indirect
golang.org/x/oauth2 v0.21.0 // indirect
golang.org/x/sync v0.7.0 // indirect
golang.org/x/sys v0.18.0 // indirect
golang.org/x/sys v0.22.0 // indirect
golang.org/x/text v0.16.0 // indirect
golang.org/x/time v0.5.0 // indirect
google.golang.org/api v0.190.0 // indirect
google.golang.org/genproto/googleapis/api v0.0.0-20240711142825-46eb208f015d // indirect
google.golang.org/genproto/googleapis/rpc v0.0.0-20240730163845-b1a4ccb954bf // indirect
google.golang.org/grpc v1.64.1 // indirect
google.golang.org/protobuf v1.34.2 // indirect
gopkg.in/yaml.v3 v3.0.1 // indirect
)