Add support for Xata as a vector store (#2125)
* Added Xata as vector store + integration tests

* Added examples and docs

* Declare xata in peerDependencies as per the contributing guide

* Removed any for the client type

* review comments
tsg authored Aug 3, 2023
1 parent 7691e10 commit 8352ffc
Showing 18 changed files with 485 additions and 1 deletion.
@@ -0,0 +1,52 @@
# Xata

[Xata](https://xata.io) is a serverless data platform, based on PostgreSQL. It provides a type-safe TypeScript/JavaScript SDK for interacting with your database, and a UI for managing your data.

Xata has a native vector type that can be added to any table and supports similarity search. LangChain inserts vectors directly into Xata and queries it for the nearest neighbors of a given vector, so you can use any of the LangChain Embeddings integrations with Xata.
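
To make the nearest-neighbor idea concrete, here is a minimal in-memory sketch. It assumes cosine similarity as the metric, and the names `StoredVector`, `cosineSimilarity`, and `nearestNeighbors` are hypothetical, introduced only for illustration. It shows the concept and is not how Xata or LangChain implement the search.

```typescript
// Hypothetical helper names; illustrative only, not part of Xata or LangChain.
type StoredVector = { id: string; embedding: number[] };

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same dimension");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i += 1) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Treat a zero vector as dissimilar to everything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k stored vectors most similar to the query, best match first.
function nearestNeighbors(
  query: number[],
  rows: StoredVector[],
  k: number
): Array<{ id: string; score: number }> {
  return rows
    .map((row) => ({ id: row.id, score: cosineSimilarity(query, row.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

In a real deployment the ranking happens inside the database, so only the top `k` rows travel over the network.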

## Setup

### Install the Xata CLI

```bash
npm install @xata.io/cli -g
```

### Create a database to be used as a vector store

In the [Xata UI](https://app.xata.io), create a new database. You can name it whatever you want, but for this example we'll use `langchain`.
Then create a table. Again, you can name it anything, but we will use `vectors`. Add the following columns via the UI:

* `content` of type "Long text". This is used to store the `Document.pageContent` values.
* `embedding` of type "Vector". Set the dimension to match the embedding model you plan to use (1536 for OpenAI).
* Any other columns you want to use as metadata. They are populated from the `Document.metadata` object. For example, if the `Document.metadata` object has a `title` property, you can create a `title` column in the table and it will be populated.
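
The column mapping described in the list above can be pictured with a short sketch. The `Doc` and `Row` types and the `toRow` helper are hypothetical names introduced for illustration; the mapping the integration actually performs may differ in detail.

```typescript
// Hypothetical shapes for illustration; not the integration's actual code.
type Doc = { pageContent: string; metadata: Record<string, unknown> };
type Row = Record<string, unknown> & { content: string; embedding: number[] };

// Build a table row: pageContent goes to `content`, the vector goes to
// `embedding`, and each metadata property goes to a column of the same name.
// The reserved columns are written last, so in this sketch a metadata key
// named `content` or `embedding` would be overwritten.
function toRow(doc: Doc, embedding: number[]): Row {
  return { ...doc.metadata, content: doc.pageContent, embedding };
}
```

In this sketch, a document with `metadata: { title: "Intro" }` produces a row whose `title` field is `"Intro"`, which is why a matching `title` column needs to exist in the table.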

### Initialize the project

In your project, run:

```bash
xata init
```

and then choose the database you created above. This also generates a `xata.ts` or `xata.js` file that defines the client you can use to interact with the database. See the [Xata getting started docs](https://xata.io/docs/getting-started/installation) for more details on using the Xata JavaScript/TypeScript SDK.

## Usage

import CodeBlock from "@theme/CodeBlock";

### Example: Q&A chatbot using OpenAI and Xata as vector store

This example uses the `VectorDBQAChain` to search the documents stored in Xata and then pass them as context to the OpenAI model, in order to answer the question asked by the user.

import FromDocs from "@examples/indexes/vector_stores/xata.ts";

<CodeBlock language="typescript">{FromDocs}</CodeBlock>

### Example: Similarity search with a metadata filter

This example shows how to implement semantic search using LangChain.js and Xata. Before running it, make sure to add an `author` column of type String to the `vectors` table in Xata.

import SimSearch from "@examples/indexes/vector_stores/xata_metadata.ts";

<CodeBlock language="typescript">{SimSearch}</CodeBlock>
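
The metadata example applies `author` as a pre-filter to the similarity search. Conceptually, a pre-filter narrows the candidate rows first and ranks only the survivors. The sketch below illustrates that order of operations in memory; `Row`, `matchesFilter`, `dot`, and `filteredSearch` are hypothetical names, exact-match filtering and dot-product scoring are assumptions, and none of this is the integration's real implementation.

```typescript
// Hypothetical in-memory illustration of "filter first, then rank".
type Row = { id: string; embedding: number[]; metadata: Record<string, string> };

// True when every key in the filter matches the row's metadata exactly.
function matchesFilter(row: Row, filter: Record<string, string>): boolean {
  return Object.entries(filter).every(([key, value]) => row.metadata[key] === value);
}

// Dot product; equals cosine similarity when both vectors are unit length.
function dot(a: number[], b: number[]): number {
  return a.reduce((sum, value, i) => sum + value * b[i], 0);
}

// Drop rows that fail the filter, then return the k best-scoring survivors.
function filteredSearch(
  query: number[],
  rows: Row[],
  k: number,
  filter: Record<string, string>
): Array<{ id: string; score: number }> {
  return rows
    .filter((row) => matchesFilter(row, filter))
    .map((row) => ({ id: row.id, score: dot(query, row.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

Because filtering runs before ranking, asking for `k` results can return fewer than `k` rows when only a few rows satisfy the filter.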
1 change: 1 addition & 0 deletions examples/package.json
@@ -35,6 +35,7 @@
"@tensorflow/tfjs-backend-cpu": "^4.4.0",
"@tigrisdata/vector": "^1.1.0",
"@upstash/redis": "^1.20.6",
"@xata.io/client": "^0.25.1",
"@zilliz/milvus2-sdk-node": "^2.2.7",
"axios": "^0.26.0",
"chromadb": "^1.4.0",
65 changes: 65 additions & 0 deletions examples/src/indexes/vector_stores/xata.ts
@@ -0,0 +1,65 @@
import { XataVectorSearch } from "langchain/vectorstores/xata";
import { OpenAIEmbeddings } from "langchain/embeddings/openai";
import { BaseClient } from "@xata.io/client";
import { Document } from "langchain/document";
import { VectorDBQAChain } from "langchain/chains";
import { OpenAI } from "langchain/llms/openai";

// First, follow set-up instructions at
// https://js.langchain.com/docs/modules/data_connection/vectorstores/integrations/xata

// If you use the generated client, you don't need this function.
// Just import getXataClient from the generated xata.ts instead.
const getXataClient = () => {
  if (!process.env.XATA_API_KEY) {
    throw new Error("XATA_API_KEY not set");
  }

  if (!process.env.XATA_DB_URL) {
    throw new Error("XATA_DB_URL not set");
  }
  const xata = new BaseClient({
    databaseURL: process.env.XATA_DB_URL,
    apiKey: process.env.XATA_API_KEY,
    branch: process.env.XATA_BRANCH || "main",
  });
  return xata;
};

export async function run() {
  const client = getXataClient();

  const table = "vectors";
  const embeddings = new OpenAIEmbeddings();
  const store = new XataVectorSearch(embeddings, { client, table });

  // Add documents
  const docs = [
    new Document({
      pageContent: "Xata is a Serverless Data platform based on PostgreSQL",
    }),
    new Document({
      pageContent:
        "Xata offers a built-in vector type that can be used to store and query vectors",
    }),
    new Document({
      pageContent: "Xata includes similarity search",
    }),
  ];

  const ids = await store.addDocuments(docs);

  // eslint-disable-next-line no-promise-executor-return
  await new Promise((r) => setTimeout(r, 2000));

  const model = new OpenAI();
  const chain = VectorDBQAChain.fromLLM(model, store, {
    k: 1,
    returnSourceDocuments: true,
  });
  const response = await chain.call({ query: "What is Xata?" });

  console.log(JSON.stringify(response, null, 2));

  await store.delete({ ids });
}
61 changes: 61 additions & 0 deletions examples/src/indexes/vector_stores/xata_metadata.ts
@@ -0,0 +1,61 @@
import { XataVectorSearch } from "langchain/vectorstores/xata";
import { OpenAIEmbeddings } from "langchain/embeddings/openai";
import { BaseClient } from "@xata.io/client";
import { Document } from "langchain/document";

// First, follow set-up instructions at
// https://js.langchain.com/docs/modules/data_connection/vectorstores/integrations/xata
// Also, add a column named "author" to the "vectors" table.

// If you use the generated client, you don't need this function.
// Just import getXataClient from the generated xata.ts instead.
const getXataClient = () => {
  if (!process.env.XATA_API_KEY) {
    throw new Error("XATA_API_KEY not set");
  }

  if (!process.env.XATA_DB_URL) {
    throw new Error("XATA_DB_URL not set");
  }
  const xata = new BaseClient({
    databaseURL: process.env.XATA_DB_URL,
    apiKey: process.env.XATA_API_KEY,
    branch: process.env.XATA_BRANCH || "main",
  });
  return xata;
};

export async function run() {
  const client = getXataClient();
  const table = "vectors";
  const embeddings = new OpenAIEmbeddings();
  const store = new XataVectorSearch(embeddings, { client, table });
  // Add documents
  const docs = [
    new Document({
      pageContent: "Xata works great with Langchain.js",
      metadata: { author: "Xata" },
    }),
    new Document({
      pageContent: "Xata works great with Langchain",
      metadata: { author: "Langchain" },
    }),
    new Document({
      pageContent: "Xata includes similarity search",
      metadata: { author: "Xata" },
    }),
  ];
  const ids = await store.addDocuments(docs);

  // eslint-disable-next-line no-promise-executor-return
  await new Promise((r) => setTimeout(r, 2000));

  // author is applied as a pre-filter to the similarity search
  const results = await store.similaritySearchWithScore("xata works great", 6, {
    author: "Langchain",
  });

  console.log(JSON.stringify(results, null, 2));

  await store.delete({ ids });
}
3 changes: 3 additions & 0 deletions langchain/.gitignore
@@ -202,6 +202,9 @@ vectorstores/tigris.d.ts
vectorstores/vectara.cjs
vectorstores/vectara.js
vectorstores/vectara.d.ts
vectorstores/xata.cjs
vectorstores/xata.js
vectorstores/xata.d.ts
text_splitter.cjs
text_splitter.js
text_splitter.d.ts
13 changes: 13 additions & 0 deletions langchain/package.json
@@ -214,6 +214,9 @@
"vectorstores/vectara.cjs",
"vectorstores/vectara.js",
"vectorstores/vectara.d.ts",
"vectorstores/xata.cjs",
"vectorstores/xata.js",
"vectorstores/xata.d.ts",
"text_splitter.cjs",
"text_splitter.js",
"text_splitter.d.ts",
@@ -574,6 +577,7 @@
"@typescript-eslint/eslint-plugin": "^5.58.0",
"@typescript-eslint/parser": "^5.58.0",
"@upstash/redis": "^1.20.6",
"@xata.io/client": "^0.25.1",
"@zilliz/milvus2-sdk-node": ">=2.2.7",
"apify-client": "^2.7.1",
"axios": "^0.26.0",
@@ -655,6 +659,7 @@
"@tensorflow/tfjs-core": "*",
"@tigrisdata/vector": "^1.1.0",
"@upstash/redis": "^1.20.6",
"@xata.io/client": "^0.25.1",
"@zilliz/milvus2-sdk-node": ">=2.2.7",
"apify-client": "^2.7.1",
"axios": "*",
@@ -772,6 +777,9 @@
"@upstash/redis": {
"optional": true
},
"@xata.io/client": {
"optional": true
},
"@zilliz/milvus2-sdk-node": {
"optional": true
},
@@ -1267,6 +1275,11 @@
"import": "./vectorstores/vectara.js",
"require": "./vectorstores/vectara.cjs"
},
"./vectorstores/xata": {
"types": "./vectorstores/xata.d.ts",
"import": "./vectorstores/xata.js",
"require": "./vectorstores/xata.cjs"
},
"./text_splitter": {
"types": "./text_splitter.d.ts",
"import": "./text_splitter.js",
4 changes: 3 additions & 1 deletion langchain/scripts/create-entrypoints.js
@@ -84,6 +84,7 @@ const entrypoints = {
"vectorstores/singlestore": "vectorstores/singlestore",
"vectorstores/tigris": "vectorstores/tigris",
"vectorstores/vectara": "vectorstores/vectara",
"vectorstores/xata": "vectorstores/xata",
// text_splitter
text_splitter: "text_splitter",
// memory
@@ -197,7 +198,8 @@ const entrypoints = {
"experimental/babyagi": "experimental/babyagi/index",
"experimental/generative_agents": "experimental/generative_agents/index",
"experimental/plan_and_execute": "experimental/plan_and_execute/index",
"experimental/multimodal_embeddings/googlevertexai": "experimental/multimodal_embeddings/googlevertexai",
"experimental/multimodal_embeddings/googlevertexai":
"experimental/multimodal_embeddings/googlevertexai",
// evaluation
evaluation: "evaluation/index",
};
1 change: 1 addition & 0 deletions langchain/src/load/import_map.ts
@@ -19,6 +19,7 @@ export * as vectorstores__base from "../vectorstores/base.js";
export * as vectorstores__memory from "../vectorstores/memory.js";
export * as vectorstores__prisma from "../vectorstores/prisma.js";
export * as vectorstores__vectara from "../vectorstores/vectara.js";
export * as vectorstores__xata from "../vectorstores/xata.js";
export * as text_splitter from "../text_splitter.js";
export * as memory from "../memory/index.js";
export * as document from "../document.js";