Add SingleStore vectorstore integration (#1409)
* add SingleStore vectorstore integration

* undo wrong index formatting

* update yarn.lock file

* fix merge error

* switched to mysql2/promise and addressed other review comments

* update docs

* sanitize sql queries

* sanitize sql queries

---------

Co-authored-by: Volodymyr Tkachuk <vtkachuk-ua@singlestore.com>
volodymyr-memsql and volodymyr-memsql authored Jun 2, 2023
1 parent b495505 commit 08bfda4
Showing 13 changed files with 381 additions and 1 deletion.
1 change: 1 addition & 0 deletions docs/docs/modules/indexes/vector_stores/index.mdx
@@ -89,6 +89,7 @@ Here's a quick guide to help you pick the right vector store for your use case:
- If you're looking for an open-source full-featured vector database that you can run locally in a docker container, then go for [Chroma](./integrations/chroma)
- If you're using Supabase already then look at the [Supabase](./integrations/supabase) vector store to use the same Postgres database for your embeddings too
- If you're looking for a production-ready vector store you don't have to worry about hosting yourself, then go for [Pinecone](./integrations/pinecone)
- If you're already using SingleStore, or you need a distributed, high-performance database, then consider the [SingleStore](./integrations/singlestore) vector store

## All Vector Stores

@@ -0,0 +1,30 @@
---
sidebar_class_name: node-only
---

import CodeBlock from "@theme/CodeBlock";

# SingleStore

[SingleStoreDB](https://singlestore.com/) is a high-performance, distributed database system. It has long supported vector functions such as [dot_product](https://docs.singlestore.com/managed-service/en/reference/sql-reference/vector-functions/dot_product.html), which makes it well suited to AI applications that need text similarity matching.

:::tip Compatibility
Only available on Node.js.
:::

LangChain.js accepts a `mysql2/promise` `Pool` as the connection pool for the SingleStore vector store.

## Setup

1. Set up a SingleStoreDB environment. You can choose between the [Cloud-based](https://docs.singlestore.com/managed-service/en/getting-started-with-singlestoredb-cloud.html) and [On-Premise](https://docs.singlestore.com/db/v8.1/en/developer-resources/get-started-using-singlestoredb-for-free.html) editions.
2. Install the mysql2 JS client:

```bash npm2yarn
npm install -S mysql2
```

## Usage

import UsageExample from "@examples/indexes/vector_stores/singlestore.ts";

<CodeBlock language="typescript">{UsageExample}</CodeBlock>
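
Besides `connectionPool`, the store's config takes optional table and column names. Going by the constructor in `langchain/src/vectorstores/singlestore.ts`, these fall back to `embeddings`, `content`, `vector`, and `metadata`. The following is a minimal sketch of that fallback logic only; `TableNaming` and `resolveNaming` are hypothetical names used for illustration and are not part of the library.

```typescript
// Hypothetical local mirror of the optional naming fields in
// SingleStoreVectorStoreConfig. The connection pool is left out so the
// sketch runs without a database.
interface TableNaming {
  tableName?: string;
  contentColumnName?: string;
  vectorColumnName?: string;
  metadataColumnName?: string;
}

// Apply the same fallbacks the constructor in this commit uses.
function resolveNaming(config: TableNaming): Required<TableNaming> {
  return {
    tableName: config.tableName ?? "embeddings",
    contentColumnName: config.contentColumnName ?? "content",
    vectorColumnName: config.vectorColumnName ?? "vector",
    metadataColumnName: config.metadataColumnName ?? "metadata",
  };
}

// Override two names and keep the defaults for the rest.
const naming = resolveNaming({ tableName: "docs", vectorColumnName: "vec" });
console.log(naming);
```

Because these names are interpolated directly into the SQL text, it seems prudent to pass only trusted, fixed identifiers here rather than user input.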
5 changes: 5 additions & 0 deletions examples/.env.example
@@ -26,6 +26,11 @@ MYSCALE_PORT=ADD_YOURS_HERE
MYSCALE_USERNAME=ADD_YOURS_HERE
MYSCALE_PASSWORD=ADD_YOURS_HERE
REDIS_URL=ADD_YOURS_HERE
SINGLESTORE_HOST=ADD_YOURS_HERE
SINGLESTORE_PORT=ADD_YOURS_HERE
SINGLESTORE_USERNAME=ADD_YOURS_HERE
SINGLESTORE_PASSWORD=ADD_YOURS_HERE
SINGLESTORE_DATABASE=ADD_YOURS_HERE
TIGRIS_URI=ADD_YOURS_HERE
TIGRIS_PROJECT=ADD_YOURS_HERE
TIGRIS_CLIENT_ID=ADD_YOURS_HERE
1 change: 1 addition & 0 deletions examples/package.json
@@ -42,6 +42,7 @@
"langchain": "workspace:*",
"ml-distance": "^4.0.0",
"mongodb": "^5.2.0",
"mysql2": "^3.3.3",
"pickleparser": "^0.1.0",
"prisma": "^4.11.0",
"redis": "^4.6.6",
26 changes: 26 additions & 0 deletions examples/src/indexes/vector_stores/singlestore.ts
@@ -0,0 +1,26 @@
import { SingleStoreVectorStore } from "langchain/vectorstores/singlestore";
import { OpenAIEmbeddings } from "langchain/embeddings/openai";
import { createPool } from "mysql2/promise";

export const run = async () => {
const pool = createPool({
host: process.env.SINGLESTORE_HOST,
port: Number(process.env.SINGLESTORE_PORT),
user: process.env.SINGLESTORE_USERNAME,
password: process.env.SINGLESTORE_PASSWORD,
database: process.env.SINGLESTORE_DATABASE,
});

const vectorStore = await SingleStoreVectorStore.fromTexts(
["Hello world", "Bye bye", "hello nice world"],
[{ id: 2 }, { id: 1 }, { id: 3 }],
new OpenAIEmbeddings(),
{
connectionPool: pool,
}
);

const resultOne = await vectorStore.similaritySearch("hello world", 1);
console.log(resultOne);
await pool.end();
};
5 changes: 5 additions & 0 deletions langchain/.env.example
@@ -31,6 +31,11 @@ MYSCALE_USERNAME=ADD_YOURS_HERE
MYSCALE_PASSWORD=ADD_YOURS_HERE
FIGMA_ACCESS_TOKEN=ADD_YOURS_HERE
REDIS_URL=ADD_YOURS_HERE
SINGLESTORE_HOST=ADD_YOURS_HERE
SINGLESTORE_PORT=ADD_YOURS_HERE
SINGLESTORE_USERNAME=ADD_YOURS_HERE
SINGLESTORE_PASSWORD=ADD_YOURS_HERE
SINGLESTORE_DATABASE=ADD_YOURS_HERE
UPSTASH_REDIS_REST_URL=https://ADD_YOURS_HERE.upstash.io
UPSTASH_REDIS_REST_TOKEN=ADD_YOURS_HERE
TIGRIS_URI=ADD_YOURS_HERE
3 changes: 3 additions & 0 deletions langchain/.gitignore
@@ -136,6 +136,9 @@ vectorstores/myscale.d.ts
vectorstores/redis.cjs
vectorstores/redis.js
vectorstores/redis.d.ts
vectorstores/singlestore.cjs
vectorstores/singlestore.js
vectorstores/singlestore.d.ts
vectorstores/tigris.cjs
vectorstores/tigris.js
vectorstores/tigris.d.ts
13 changes: 13 additions & 0 deletions langchain/package.json
@@ -148,6 +148,9 @@
"vectorstores/redis.cjs",
"vectorstores/redis.js",
"vectorstores/redis.d.ts",
"vectorstores/singlestore.cjs",
"vectorstores/singlestore.js",
"vectorstores/singlestore.d.ts",
"vectorstores/tigris.cjs",
"vectorstores/tigris.js",
"vectorstores/tigris.d.ts",
@@ -446,6 +449,7 @@
"jest": "^29.5.0",
"mammoth": "^1.5.1",
"mongodb": "^5.2.0",
"mysql2": "^3.3.3",
"pdf-parse": "1.1.1",
"peggy": "^3.0.2",
"pickleparser": "^0.1.0",
@@ -498,6 +502,7 @@
"ignore": "^5.2.0",
"mammoth": "*",
"mongodb": "^5.2.0",
"mysql2": "^3.3.3",
"pdf-parse": "1.1.1",
"peggy": "^3.0.2",
"pickleparser": "^0.1.0",
@@ -612,6 +617,9 @@
"mongodb": {
"optional": true
},
"mysql2": {
"optional": true
},
"pdf-parse": {
"optional": true
},
@@ -920,6 +928,11 @@
"import": "./vectorstores/redis.js",
"require": "./vectorstores/redis.cjs"
},
"./vectorstores/singlestore": {
"types": "./vectorstores/singlestore.d.ts",
"import": "./vectorstores/singlestore.js",
"require": "./vectorstores/singlestore.cjs"
},
"./vectorstores/tigris": {
"types": "./vectorstores/tigris.d.ts",
"import": "./vectorstores/tigris.js",
2 changes: 2 additions & 0 deletions langchain/scripts/create-entrypoints.js
@@ -61,6 +61,7 @@ const entrypoints = {
"vectorstores/prisma": "vectorstores/prisma",
"vectorstores/myscale": "vectorstores/myscale",
"vectorstores/redis": "vectorstores/redis",
"vectorstores/singlestore": "vectorstores/singlestore",
"vectorstores/tigris": "vectorstores/tigris",
// text_splitter
text_splitter: "text_splitter",
@@ -193,6 +194,7 @@ const requiresOptionalDependency = [
"vectorstores/milvus",
"vectorstores/myscale",
"vectorstores/redis",
"vectorstores/singlestore",
"vectorstores/tigris",
"memory/zep",
"document_loaders/web/apify_dataset",
142 changes: 142 additions & 0 deletions langchain/src/vectorstores/singlestore.ts
@@ -0,0 +1,142 @@
import type {
Pool,
RowDataPacket,
OkPacket,
ResultSetHeader,
FieldPacket,
} from "mysql2/promise";
import { format } from "mysql2";
import { VectorStore } from "./base.js";
import { Embeddings } from "../embeddings/base.js";
import { Document } from "../document.js";

export interface SingleStoreVectorStoreConfig {
connectionPool: Pool;
tableName?: string;
contentColumnName?: string;
vectorColumnName?: string;
metadataColumnName?: string;
}

export class SingleStoreVectorStore extends VectorStore {
connectionPool: Pool;

tableName: string;

contentColumnName: string;

vectorColumnName: string;

metadataColumnName: string;

constructor(embeddings: Embeddings, config: SingleStoreVectorStoreConfig) {
super(embeddings, config);
this.connectionPool = config.connectionPool;
this.tableName = config.tableName ?? "embeddings";
this.contentColumnName = config.contentColumnName ?? "content";
this.vectorColumnName = config.vectorColumnName ?? "vector";
this.metadataColumnName = config.metadataColumnName ?? "metadata";
}

async createTableIfNotExists(): Promise<void> {
await this.connectionPool
.execute(`CREATE TABLE IF NOT EXISTS ${this.tableName} (
${this.contentColumnName} TEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci,
${this.vectorColumnName} BLOB,
${this.metadataColumnName} JSON);`);
}

async addDocuments(documents: Document[]): Promise<void> {
const texts = documents.map(({ pageContent }) => pageContent);
const vectors = await this.embeddings.embedDocuments(texts);
return this.addVectors(vectors, documents);
}

async addVectors(vectors: number[][], documents: Document[]): Promise<void> {
await this.createTableIfNotExists();
const { tableName } = this;

await Promise.all(
vectors.map(async (vector, idx) => {
try {
await this.connectionPool.execute(
format(
`INSERT INTO ${tableName} VALUES (?, JSON_ARRAY_PACK('[?]'), ?);`,
[
documents[idx].pageContent,
vector,
JSON.stringify(documents[idx].metadata),
]
)
);
} catch (error) {
console.error(`Error adding vector at index ${idx}:`, error);
}
})
);
}

async similaritySearchVectorWithScore(
query: number[],
k: number,
_filter?: undefined
): Promise<[Document, number][]> {
// use vector DOT_PRODUCT as a distance function
const [rows]: [
(
| RowDataPacket[]
| RowDataPacket[][]
| OkPacket
| OkPacket[]
| ResultSetHeader
),
FieldPacket[]
] = await this.connectionPool.query(
format(
`SELECT ${this.contentColumnName},
${this.metadataColumnName},
DOT_PRODUCT(${this.vectorColumnName}, JSON_ARRAY_PACK('[?]')) as __score FROM ${this.tableName}
ORDER BY __score DESC LIMIT ?;`,
[query, k]
)
);
const result: [Document, number][] = [];
for (const row of rows as RowDataPacket[]) {
const rowData = row as unknown as Record<string, unknown>;
result.push([
new Document({
pageContent: rowData[this.contentColumnName] as string,
metadata: rowData[this.metadataColumnName] as Record<string, unknown>,
}),
// The query aliases the DOT_PRODUCT result as `__score`, so read that key.
Number(rowData.__score),
]);
}
return result;
}

static async fromTexts(
texts: string[],
metadatas: object[],
embeddings: Embeddings,
dbConfig: SingleStoreVectorStoreConfig
): Promise<SingleStoreVectorStore> {
const docs = texts.map((text, idx) => {
const metadata = Array.isArray(metadatas) ? metadatas[idx] : metadatas;
return new Document({
pageContent: text,
metadata,
});
});
return SingleStoreVectorStore.fromDocuments(docs, embeddings, dbConfig);
}

static async fromDocuments(
docs: Document[],
embeddings: Embeddings,
dbConfig: SingleStoreVectorStoreConfig
): Promise<SingleStoreVectorStore> {
const instance = new this(embeddings, dbConfig);
await instance.addDocuments(docs);
return instance;
}
}
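
The query in `similaritySearchVectorWithScore` ranks rows by `DOT_PRODUCT` against the query vector, sorts descending, and keeps the first `k`. The sketch below reproduces that ranking in memory to show what the SQL computes. `dotProduct`, `topKByDotProduct`, and the sample rows are hypothetical names and data for illustration, not part of this commit. Note that the dot product matches cosine similarity only when the stored vectors are unit length, which this sketch does not check.

```typescript
// A scored result, in the same [item, score] shape the store returns.
type Scored<T> = [T, number];

// Plain dot product of two equal-length vectors.
function dotProduct(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  return a.reduce((sum, value, i) => sum + value * b[i], 0);
}

// In-memory equivalent of:
//   SELECT ..., DOT_PRODUCT(vector, query) AS __score
//   ORDER BY __score DESC LIMIT k
function topKByDotProduct<T>(
  rows: { item: T; vector: number[] }[],
  query: number[],
  k: number
): Scored<T>[] {
  return rows
    .map(({ item, vector }): Scored<T> => [item, dotProduct(vector, query)])
    .sort((x, y) => y[1] - x[1]) // highest score first
    .slice(0, k);
}

// Illustrative data only.
const exampleRows = [
  { item: "a", vector: [1, 0] },
  { item: "b", vector: [0, 1] },
  { item: "c", vector: [0.6, 0.8] },
];
console.log(topKByDotProduct(exampleRows, [1, 0], 2));
```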
70 changes: 70 additions & 0 deletions langchain/src/vectorstores/tests/singlestore.int.test.ts
@@ -0,0 +1,70 @@
/* eslint-disable no-process-env */
/* eslint-disable import/no-extraneous-dependencies */
import { test, expect } from "@jest/globals";
import { createPool } from "mysql2/promise";
import { OpenAIEmbeddings } from "../../embeddings/openai.js";
import { SingleStoreVectorStore } from "../singlestore.js";
import { Document } from "../../document.js";

test("SingleStoreVectorStore", async () => {
expect(process.env.SINGLESTORE_HOST).toBeDefined();
expect(process.env.SINGLESTORE_PORT).toBeDefined();
expect(process.env.SINGLESTORE_USERNAME).toBeDefined();
expect(process.env.SINGLESTORE_PASSWORD).toBeDefined();
expect(process.env.SINGLESTORE_DATABASE).toBeDefined();

const pool = createPool({
host: process.env.SINGLESTORE_HOST,
port: Number(process.env.SINGLESTORE_PORT),
user: process.env.SINGLESTORE_USERNAME,
password: process.env.SINGLESTORE_PASSWORD,
database: process.env.SINGLESTORE_DATABASE,
});
const vectorStore = await SingleStoreVectorStore.fromTexts(
["Hello world", "Bye bye", "hello nice world"],
[
{ id: 2, name: "2" },
{ id: 1, name: "1" },
{ id: 3, name: "3" },
],
new OpenAIEmbeddings(),
{
connectionPool: pool,
contentColumnName: "cont",
metadataColumnName: "met",
vectorColumnName: "vec",
}
);
expect(vectorStore).toBeDefined();

const results = await vectorStore.similaritySearch("hello world", 1);

expect(results).toEqual([
new Document({
pageContent: "Hello world",
metadata: { id: 2, name: "2" },
}),
]);

await vectorStore.addDocuments([
new Document({
pageContent: "Green forest",
metadata: { id: 4, name: "4" },
}),
new Document({
pageContent: "Green field",
metadata: { id: 5, name: "5" },
}),
]);

const results2 = await vectorStore.similaritySearch("forest", 1);

expect(results2).toEqual([
new Document({
pageContent: "Green forest",
metadata: { id: 4, name: "4" },
}),
]);

await pool.end();
});
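
The test above asserts that each `SINGLESTORE_*` variable is defined before building the pool. One way to centralize that check is sketched below, under the assumption that failing fast with a single error message is preferable. `PoolSettings` and `poolSettingsFromEnv` are hypothetical names, not part of this commit, and the returned object only mirrors the five fields the test passes to `createPool`.

```typescript
// Hypothetical shape holding the five connection fields used in the test.
interface PoolSettings {
  host: string;
  port: number;
  user: string;
  password: string;
  database: string;
}

const REQUIRED_VARS = [
  "SINGLESTORE_HOST",
  "SINGLESTORE_PORT",
  "SINGLESTORE_USERNAME",
  "SINGLESTORE_PASSWORD",
  "SINGLESTORE_DATABASE",
] as const;

// Validate the environment once and return typed settings.
function poolSettingsFromEnv(
  env: Record<string, string | undefined>
): PoolSettings {
  const missing = REQUIRED_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  const port = Number(env["SINGLESTORE_PORT"]);
  if (!Number.isInteger(port)) {
    throw new Error("SINGLESTORE_PORT must be an integer");
  }
  return {
    host: env["SINGLESTORE_HOST"] as string,
    port,
    user: env["SINGLESTORE_USERNAME"] as string,
    password: env["SINGLESTORE_PASSWORD"] as string,
    database: env["SINGLESTORE_DATABASE"] as string,
  };
}
```

The result could then be passed to `createPool` from `mysql2/promise`, for example `createPool(poolSettingsFromEnv(process.env))`.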
1 change: 1 addition & 0 deletions langchain/tsconfig.json
@@ -75,6 +75,7 @@
"src/vectorstores/prisma.ts",
"src/vectorstores/myscale.ts",
"src/vectorstores/redis.ts",
"src/vectorstores/singlestore.ts",
"src/vectorstores/tigris.ts",
"src/text_splitter.ts",
"src/memory/index.ts",