Support delete vectorstore operations (#1830)
* Adds optional id field to document base class to support upsert and delete operations

* Rehydrate retrieved Chroma docs with ids

* Allow ids or a filter type to be passed into vector store delete method

* Remove id from TypeORM Document model

* Revert base document class changes and use Pinecone precedent to pass delete args

* Add deletion tests

* Update tests

* Change vectorstore add docs methods to accept an options object, update docs

* Update VectorStoreRetriever interface
jacoblee93 authored Jul 9, 2023
1 parent bbbe315 commit 99b1985
Showing 27 changed files with 762 additions and 93 deletions.
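
To make the contract described in the commit message concrete, here is a minimal sketch of a store whose `addDocuments` returns ids and whose `delete` accepts them. `Doc` and `InMemoryStore` are hypothetical stand-ins written for illustration; they are not classes from this commit, and real providers may return nothing from `addDocuments` or accept different `delete` parameters.

```typescript
// Hypothetical stand-in for the library's Document type.
interface Doc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Hypothetical in-memory store that mimics the add/delete contract only.
class InMemoryStore {
  private docs: Record<string, Doc> = {};
  private counter = 0;

  // Returns the id assigned to each document, in input order.
  // Callers may supply their own ids through the options object.
  async addDocuments(
    documents: Doc[],
    options?: { ids?: string[] }
  ): Promise<string[]> {
    const ids = options?.ids ?? documents.map(() => `doc-${this.counter++}`);
    if (ids.length !== documents.length) {
      throw new Error("ids and documents must have the same length");
    }
    documents.forEach((doc, i) => {
      this.docs[ids[i]] = doc;
    });
    return ids;
  }

  // Removes the documents with the given ids; unknown ids are ignored.
  async delete(params: { ids: string[] }): Promise<void> {
    for (const id of params.ids) {
      delete this.docs[id];
    }
  }

  size(): number {
    return Object.keys(this.docs).length;
  }
}

// Adds two documents, deletes the first, and reports the size before and after.
async function demo(): Promise<number[]> {
  const store = new InMemoryStore();
  const ids = await store.addDocuments([
    { pageContent: "hello", metadata: {} },
    { pageContent: "world", metadata: {} },
  ]);
  const before = store.size();
  await store.delete({ ids: [ids[0]] });
  return [before, store.size()];
}
```

Passing a single options object rather than positional arguments leaves room for provider-specific settings without changing the method signature again.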
22 changes: 19 additions & 3 deletions docs/docs/modules/indexes/vector_stores/index.mdx
@@ -14,9 +14,15 @@ A vector store is a particular type of database optimized for storing documents
```typescript
interface VectorStore {
/**
- * Add more documents to an existing VectorStore
+ * Add more documents to an existing VectorStore.
* Some providers support additional parameters, e.g. to associate custom ids
* with added documents or to change the batch size of bulk inserts.
* Returns an array of ids for the documents or nothing.
*/
- addDocuments(documents: Document[]): Promise<void>;
+ addDocuments(
+ documents: Document[],
+ options?: Record<string, any>
+ ): Promise<string[] | void>;

/**
* Search for the most similar documents to a query
@@ -42,11 +48,21 @@ interface VectorStore {
*/
asRetriever(k?: number): BaseRetriever;

/**
* Delete embedded documents from the vector store matching the passed in parameter.
* Not supported by every provider.
*/
delete(params?: Record<string, any>): Promise<void>;

/**
* Advanced: Add more documents to an existing VectorStore,
* when you already have their embeddings
*/
- addVectors(vectors: number[][], documents: Document[]): Promise<void>;
+ addVectors(
+ vectors: number[][],
+ documents: Document[],
+ options?: Record<string, any>
+ ): Promise<string[] | void>;

/**
* Advanced: Search for the most similar documents to a query,
@@ -74,3 +74,9 @@ import FromTexts from "@examples/indexes/vector_stores/chroma/fromTexts.ts";
import Search from "@examples/indexes/vector_stores/chroma/search.ts";

<CodeBlock language="typescript">{Search}</CodeBlock>

## Usage, delete docs

import Delete from "@examples/indexes/vector_stores/chroma/delete.ts";

<CodeBlock language="typescript">{Delete}</CodeBlock>
77 changes: 75 additions & 2 deletions docs/docs/modules/indexes/vector_stores/integrations/pinecone.md
@@ -8,10 +8,10 @@ sidebar_class_name: node-only
Only available on Node.js.
:::

- Langchain.js accepts [@pinecone-database/pinecone](https://docs.pinecone.io/docs/node-client) as the client for Pinecone vectorstore. Install the library with
+ LangChain.js accepts [@pinecone-database/pinecone](https://docs.pinecone.io/docs/node-client) as the client for Pinecone vectorstore. Install the library with:

```bash npm2yarn
- npm install -S dotenv langchain @pinecone-database/pinecone
+ npm install -S dotenv @pinecone-database/pinecone
```

## Index docs
@@ -114,3 +114,76 @@ console.log(response);
}
*/
```

## Delete docs

```typescript
import { PineconeClient } from "@pinecone-database/pinecone";
import * as dotenv from "dotenv";
import { Document } from "langchain/document";
import { OpenAIEmbeddings } from "langchain/embeddings/openai";
import { PineconeStore } from "langchain/vectorstores/pinecone";

dotenv.config();

const client = new PineconeClient();
await client.init({
apiKey: process.env.PINECONE_API_KEY,
environment: process.env.PINECONE_ENVIRONMENT,
});
const pineconeIndex = client.Index(process.env.PINECONE_INDEX);
const embeddings = new OpenAIEmbeddings();
const pineconeStore = new PineconeStore(embeddings, { pineconeIndex });

const docs = [
new Document({
metadata: { foo: "bar" },
pageContent: "pinecone is a vector db",
}),
new Document({
metadata: { foo: "bar" },
pageContent: "the quick brown fox jumped over the lazy dog",
}),
new Document({
metadata: { baz: "qux" },
pageContent: "lorem ipsum dolor sit amet",
}),
new Document({
metadata: { baz: "qux" },
pageContent: "pinecones are the woody fruiting body of a pine tree",
}),
];

const ids = await pineconeStore.addDocuments(docs);

const results = await pineconeStore.similaritySearch("pinecone", 2, {
foo: "bar",
});

console.log(results);
/*
[
Document {
pageContent: 'pinecone is a vector db',
metadata: { foo: 'bar' },
},
Document {
pageContent: "the quick brown fox jumped over the lazy dog",
metadata: { foo: "bar" },
}
]
*/

await pineconeStore.delete({
ids: [ids[0], ids[1]],
});

const results2 = await pineconeStore.similaritySearch("pinecone", 2, {
foo: "bar",
});

console.log(results2);
/*
[]
*/
```
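
The interface comment in this commit notes that some providers let callers associate custom ids with added documents. One way to use that is to derive stable ids up front, so a later `delete` does not depend on having kept the array returned by `addDocuments`. The sketch below is illustrative only: `stableIds` is a hypothetical helper, the `source` metadata key is an assumption, and whether a given store accepts an `ids` option should be checked against that provider.

```typescript
// Hypothetical document shape; assumes every document records its source file.
type SourcedDoc = { pageContent: string; metadata: { source: string } };

// Hypothetical helper: builds one stable id per chunk, e.g. "guide.md#0".
// The counter restarts for each distinct source.
function stableIds(docs: SourcedDoc[]): string[] {
  const counts: Record<string, number> = {};
  return docs.map((doc) => {
    const source = doc.metadata.source;
    const n = counts[source] ?? 0;
    counts[source] = n + 1;
    return `${source}#${n}`;
  });
}

// Assumed usage with a store that accepts custom ids:
//   const ids = stableIds(docs);
//   await store.addDocuments(docs, { ids });
//   await store.delete({ ids });
```

Because the ids are a pure function of the input order and source, re-running the same ingestion produces the same ids, which is what makes a later deletion targetable.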
@@ -61,6 +61,7 @@ import CodeBlock from "@theme/CodeBlock";
import Example from "@examples/indexes/vector_stores/supabase.ts";
import MetadataFilterExample from "@examples/indexes/vector_stores/supabase_with_metadata_filter.ts";
import MetadataQueryBuilderFilterExample from "@examples/indexes/vector_stores/supabase_with_query_builder_metadata_filter.ts";
import DeletionExample from "@examples/indexes/vector_stores/supabase_deletion.ts";

### Standard Usage

@@ -81,3 +82,7 @@ Given the above `match_documents` Postgres function, you can also pass a filter
You can also use query builder-style filtering similar to how [the Supabase JavaScript library works](https://supabase.com/docs/reference/javascript/using-filters) instead of passing an object. Note that since most of the filter properties are in the metadata column, you need to use arrow operators (`->` for integer or `->>` for text) as defined in [Postgrest API documentation](https://postgrest.org/en/stable/references/api/tables_views.html?highlight=operators#json-columns) and specify the data type of the property (e.g. the column should look something like `metadata->some_int_value::int`).

<CodeBlock language="typescript">{MetadataQueryBuilderFilterExample}</CodeBlock>

### Document deletion

<CodeBlock language="typescript">{DeletionExample}</CodeBlock>
@@ -29,3 +29,9 @@ import InsertExample from "@examples/indexes/vector_stores/weaviate_fromTexts.ts
import QueryExample from "@examples/indexes/vector_stores/weaviate_search.ts";

<CodeBlock language="typescript">{QueryExample}</CodeBlock>

## Usage delete documents

import DeleteExample from "@examples/indexes/vector_stores/weaviate_delete.ts";

<CodeBlock language="typescript">{DeleteExample}</CodeBlock>
69 changes: 69 additions & 0 deletions examples/src/indexes/vector_stores/chroma/delete.ts
@@ -0,0 +1,69 @@
import { Chroma } from "langchain/vectorstores/chroma";
import { OpenAIEmbeddings } from "langchain/embeddings/openai";

const embeddings = new OpenAIEmbeddings();
const vectorStore = new Chroma(embeddings, {
collectionName: "godel-escher-bach",
});

const documents = [
{
pageContent: `Tortoise: Labyrinth? Labyrinth? Could it Are we in the notorious Little
Harmonic Labyrinth of the dreaded Majotaur?`,
metadata: {
speaker: "Tortoise",
},
},
{
pageContent: "Achilles: Yiikes! What is that?",
metadata: {
speaker: "Achilles",
},
},
{
pageContent: `Tortoise: They say-although I person never believed it myself-that an I
Majotaur has created a tiny labyrinth sits in a pit in the middle of
it, waiting innocent victims to get lost in its fears complexity.
Then, when they wander and dazed into the center, he laughs and
laughs at them-so hard, that he laughs them to death!`,
metadata: {
speaker: "Tortoise",
},
},
{
pageContent: "Achilles: Oh, no!",
metadata: {
speaker: "Achilles",
},
},
{
pageContent: "Tortoise: But it's only a myth. Courage, Achilles.",
metadata: {
speaker: "Tortoise",
},
},
];

const ids = await vectorStore.addDocuments(documents);

const response = await vectorStore.similaritySearch("scared", 2);
console.log(response);
/*
[
Document {
pageContent: 'Achilles: Oh, no!',
metadata: { speaker: 'Achilles' }
},
Document {
pageContent: 'Achilles: Yiikes! What is that?',
metadata: { speaker: 'Achilles' }
}
]
*/

// You can also pass a "filter" parameter instead
await vectorStore.delete({ ids });

const response2 = await vectorStore.similaritySearch("scared", 2);
console.log(response2);

/*
[]
*/
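
The example above deletes by id and mentions that a "filter" parameter can be passed instead. As a rough illustration of how the two parameter shapes might be resolved, the sketch below works over a plain in-memory record. `DeleteParams`, `matchesFilter`, and `idsToDelete` are hypothetical names, and the exact-match filter semantics are an assumption; Chroma's own filter syntax is richer and is not modelled here.

```typescript
type Metadata = Record<string, unknown>;

// Hypothetical parameter shape: delete either by explicit ids or by metadata filter.
type DeleteParams = { ids: string[] } | { filter: Metadata };

// True when every key in the filter strictly equals the same key in the metadata.
function matchesFilter(metadata: Metadata, filter: Metadata): boolean {
  return Object.keys(filter).every((key) => metadata[key] === filter[key]);
}

// Returns the stored ids that a delete call with these params would remove.
function idsToDelete(
  stored: Record<string, Metadata>,
  params: DeleteParams
): string[] {
  const allIds = Object.keys(stored);
  if ("ids" in params) {
    const wanted = params.ids;
    // Ids that are not in the store are silently skipped.
    return allIds.filter((id) => wanted.indexOf(id) !== -1);
  }
  const filter = params.filter;
  return allIds.filter((id) => matchesFilter(stored[id], filter));
}
```

For instance, with the speakers used above, a filter of `{ speaker: "Tortoise" }` would select only the Tortoise lines and leave the Achilles lines in place.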
15 changes: 13 additions & 2 deletions examples/src/indexes/vector_stores/elasticsearch/elasticsearch.ts
@@ -54,10 +54,11 @@ export async function run() {
baseOptions: { temperature: 0 },
});

await ElasticVectorSearch.fromDocuments(docs, embeddings, clientArgs);

// await ElasticVectorSearch.fromDocuments(docs, embeddings, clientArgs);
const vectorStore = new ElasticVectorSearch(embeddings, clientArgs);

const ids = await vectorStore.addDocuments(docs);

/* Search the vector DB independently with meta filters */
const results = await vectorStore.similaritySearch("fox jump", 1);
console.log(JSON.stringify(results, null, 2));
@@ -93,4 +94,14 @@ export async function run() {
]
}
*/

await vectorStore.delete({ ids });

const response2 = await chain.call({ query: "What is Elasticsearch?" });

console.log(JSON.stringify(response2, null, 2));

/*
[]
*/
}
49 changes: 49 additions & 0 deletions examples/src/indexes/vector_stores/supabase_deletion.ts
@@ -0,0 +1,49 @@
import { SupabaseVectorStore } from "langchain/vectorstores/supabase";
import { OpenAIEmbeddings } from "langchain/embeddings/openai";
import { createClient } from "@supabase/supabase-js";

// First, follow set-up instructions at
// https://js.langchain.com/docs/modules/indexes/vector_stores/integrations/supabase

const privateKey = process.env.SUPABASE_PRIVATE_KEY;
if (!privateKey) throw new Error(`Expected env var SUPABASE_PRIVATE_KEY`);

const url = process.env.SUPABASE_URL;
if (!url) throw new Error(`Expected env var SUPABASE_URL`);

export const run = async () => {
const client = createClient(url, privateKey);

const embeddings = new OpenAIEmbeddings();

const store = new SupabaseVectorStore(embeddings, {
client,
tableName: "documents",
});

const docs = [
{ pageContent: "hello", metadata: { b: 1, c: 9, stuff: "right" } },
{ pageContent: "hello", metadata: { b: 1, c: 9, stuff: "wrong" } },
];

const ids = await store.addDocuments(docs);

const resultA = await store.similaritySearch("hello", 2);
console.log(resultA);

/*
[
Document { pageContent: "hello", metadata: { b: 1, c: 9, stuff: "right" } },
Document { pageContent: "hello", metadata: { b: 1, c: 9, stuff: "wrong" } },
]
*/

await store.delete({ ids });

const resultB = await store.similaritySearch("hello", 2);
console.log(resultB);

/*
[]
*/
};
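
The updated interface declares `addDocuments` as returning `Promise<string[] | void>`, so code that must work across providers cannot assume `ids` is an array. A small guard like the one below is one way to narrow the type before calling `delete`. `requireIds` is a hypothetical helper, not part of this commit.

```typescript
// Hypothetical helper: narrows `string[] | void` to `string[]`,
// failing loudly when the store did not return ids.
function requireIds(ids: string[] | void): string[] {
  if (!Array.isArray(ids)) {
    throw new Error(
      "This vector store did not return ids, so deletion by id is unavailable"
    );
  }
  return ids;
}

// Assumed usage with any store that follows the interface in this commit:
//   const ids = requireIds(await store.addDocuments(docs));
//   await store.delete({ ids });
```

Throwing at the point where ids are first needed keeps the failure close to its cause, rather than surfacing later as a `delete` call with an undefined argument.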
41 changes: 41 additions & 0 deletions examples/src/indexes/vector_stores/weaviate_delete.ts
@@ -0,0 +1,41 @@
/* eslint-disable @typescript-eslint/no-explicit-any */
import weaviate from "weaviate-ts-client";
import { WeaviateStore } from "langchain/vectorstores/weaviate";
import { OpenAIEmbeddings } from "langchain/embeddings/openai";

export async function run() {
// The weaviate-ts-client types appear to be incorrect, so we cast to `any` (hence the eslint-disable above)
const client = (weaviate as any).client({
scheme: process.env.WEAVIATE_SCHEME || "https",
host: process.env.WEAVIATE_HOST || "localhost",
apiKey: new (weaviate as any).ApiKey(
process.env.WEAVIATE_API_KEY || "default"
),
});

// Create a store for an existing index
const store = await WeaviateStore.fromExistingIndex(new OpenAIEmbeddings(), {
client,
indexName: "Test",
metadataKeys: ["foo"],
});

const docs = [{ pageContent: "see ya!", metadata: { foo: "bar" } }];

const ids = await store.addDocuments(docs);

// Search the index without any filters
const results = await store.similaritySearch("see ya!", 1);
console.log(results);
/*
[ Document { pageContent: 'see ya!', metadata: { foo: 'bar' } } ]
*/

await store.delete({ ids });

const results2 = await store.similaritySearch("see ya!", 1);
console.log(results2);
/*
[]
*/
}
4 changes: 2 additions & 2 deletions langchain/package.json
@@ -516,7 +516,7 @@
"apify-client": "^2.7.1",
"axios": "^0.26.0",
"cheerio": "^1.0.0-rc.12",
- "chromadb": "^1.5.2",
+ "chromadb": "^1.5.3",
"cohere-ai": "^5.0.2",
"d3-dsv": "^2.0.0",
"dotenv": "^16.0.3",
@@ -588,7 +588,7 @@
"apify-client": "^2.7.1",
"axios": "*",
"cheerio": "^1.0.0-rc.12",
- "chromadb": "^1.5.2",
+ "chromadb": "^1.5.3",
"cohere-ai": "^5.0.2",
"d3-dsv": "^2.0.0",
"epub2": "^3.0.1",