From c4eaf3bec169fb2a90f78f81f970881acdc59f0d Mon Sep 17 00:00:00 2001
From: Milvus-doc-bot
Date: Mon, 23 Sep 2024 06:15:19 +0000
Subject: [PATCH] Release new docs

---
 .../en/embeddings/embed-with-instructor.md    | 102 ++++++++++++++++++
 v2.4.x/site/en/embeddings/embed-with-mgte.md  |  96 +++++++++++++++++
 .../en/embeddings/embed-with-mistral-ai.md    |  88 +++++++++++++++
 v2.4.x/site/en/embeddings/embed-with-nomic.md |  95 ++++++++++++++++
 .../site/en/embeddings/embed-with-voyage.md   |   4 +-
 v2.4.x/site/en/embeddings/embeddings.md       |   4 +
 v2.4.x/site/en/menuStructure/en.json          |  24 +++++
 7 files changed, 411 insertions(+), 2 deletions(-)
 create mode 100644 v2.4.x/site/en/embeddings/embed-with-instructor.md
 create mode 100644 v2.4.x/site/en/embeddings/embed-with-mgte.md
 create mode 100644 v2.4.x/site/en/embeddings/embed-with-mistral-ai.md
 create mode 100644 v2.4.x/site/en/embeddings/embed-with-nomic.md

diff --git a/v2.4.x/site/en/embeddings/embed-with-instructor.md b/v2.4.x/site/en/embeddings/embed-with-instructor.md
new file mode 100644
index 000000000..e13c3e7f8
--- /dev/null
+++ b/v2.4.x/site/en/embeddings/embed-with-instructor.md
@@ -0,0 +1,102 @@
---
id: embed-with-instructor.md
order: 10
summary: This article describes how to use the InstructorEmbeddingFunction to encode documents and queries using the Instructor embedding model.
title: Instructor
---

# Instructor

[Instructor](https://instructor-embedding.github.io/) is an instruction-finetuned text embedding model that can generate text embeddings tailored to any task (e.g., classification, retrieval, clustering, text evaluation) and domain (e.g., science, finance) simply by providing the task instruction, without any further finetuning.

Milvus integrates with Instructor's embedding models via the InstructorEmbeddingFunction class. This class provides methods for encoding documents and queries using the Instructor embedding models and returning the embeddings as dense vectors compatible with Milvus indexing.

To use this feature, install the necessary dependencies:

```python
pip install --upgrade pymilvus
pip install "pymilvus[model]"
```

Then, instantiate the InstructorEmbeddingFunction:

```python
from pymilvus.model.dense import InstructorEmbeddingFunction

ef = InstructorEmbeddingFunction(
    model_name="hkunlp/instructor-xl", # Defaults to `hkunlp/instructor-xl`
    query_instruction="Represent the question for retrieval:",
    doc_instruction="Represent the document for retrieval:"
)
```

**Parameters**:

- `model_name` (*string*)

    The name of the Instructor embedding model to use for encoding. The value defaults to `hkunlp/instructor-xl`. For more information, refer to [Model List](https://github.com/xlang-ai/instructor-embedding?tab=readme-ov-file#model-list).

- `query_instruction` (*string*)

    Task-specific instruction that guides the model on how to generate an embedding for a query or question.

- `doc_instruction` (*string*)

    Task-specific instruction that guides the model on how to generate an embedding for a document.
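
The `query_instruction` and `doc_instruction` strings above target retrieval. Because Instructor is instruction-finetuned, the same checkpoint can be pointed at a different task or domain simply by rewording these strings. The following is an illustrative sketch only — the instruction wordings are assumptions, not values required by the class; see the Instructor project page for recommended phrasings:

```python
from pymilvus.model.dense import InstructorEmbeddingFunction

# A hypothetical configuration for clustering scientific text rather than
# retrieval. Instructor instructions generally follow the pattern
# "Represent the <domain> <text unit> for <task>:".
cluster_ef = InstructorEmbeddingFunction(
    model_name="hkunlp/instructor-xl",
    query_instruction="Represent the science sentence for clustering:",
    doc_instruction="Represent the science document for clustering:",
)
```
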
To create embeddings for documents, use the `encode_documents()` method:

```python
docs = [
    "Artificial intelligence was founded as an academic discipline in 1956.",
    "Alan Turing was the first person to conduct substantial research in AI.",
    "Born in Maida Vale, London, Turing was raised in southern England.",
]

docs_embeddings = ef.encode_documents(docs)

# Print embeddings
print("Embeddings:", docs_embeddings)
# Print dimension and shape of embeddings
print("Dim:", ef.dim, docs_embeddings[0].shape)
```

The expected output is similar to the following:

```python
Embeddings: [array([ 1.08575663e-02,  3.87877878e-03,  3.18090729e-02, -8.12458917e-02,
       -4.68971021e-02, -5.85585833e-02, -5.95418774e-02, -8.55880603e-03,
       -5.54775111e-02, -6.08020350e-02,  1.76202394e-02,  1.06648318e-02,
       -5.89960292e-02, -7.46861771e-02,  6.60329172e-03, -4.25189249e-02,
       ...
       -1.26921125e-02,  3.01475357e-02,  8.25323071e-03, -1.88470203e-02,
        6.04814291e-03, -2.81618331e-02,  5.91602828e-03,  7.13866428e-02],
      dtype=float32)]
Dim: 768 (768,)
```

To create embeddings for queries, use the `encode_queries()` method:

```python
queries = ["When was artificial intelligence founded",
           "Where was Alan Turing born?"]

query_embeddings = ef.encode_queries(queries)

print("Embeddings:", query_embeddings)
print("Dim", ef.dim, query_embeddings[0].shape)
```

The expected output is similar to the following:

```python
Embeddings: [array([ 1.21721877e-02,  1.88485277e-03,  3.01732980e-02, -8.10302645e-02,
       -6.13401756e-02, -3.98149453e-02, -5.18723316e-02, -6.76784338e-03,
       -6.59285188e-02, -5.38365729e-02, -5.13435388e-03, -2.49210224e-02,
       -5.74403182e-02, -7.03031123e-02,  6.63730130e-03, -3.42259370e-02,
       ...
        7.36595877e-03,  2.85532661e-02, -1.55952033e-02,  2.13342719e-02,
        1.51187545e-02, -2.82798670e-02,  2.69396193e-02,  6.16136603e-02],
      dtype=float32)]
Dim 768 (768,)
```

diff --git a/v2.4.x/site/en/embeddings/embed-with-mgte.md b/v2.4.x/site/en/embeddings/embed-with-mgte.md
new file mode 100644
index 000000000..86cef2874
--- /dev/null
+++ b/v2.4.x/site/en/embeddings/embed-with-mgte.md
@@ -0,0 +1,96 @@
---
id: embed-with-mgte.md
order: 13
summary: This article describes how to use the MGTEEmbeddingFunction to encode documents and queries using the mGTE embedding model.
title: mGTE
---

# mGTE

mGTE is a multilingual text representation and reranking model for text retrieval tasks.

Milvus integrates with the mGTE embedding model via the MGTEEmbeddingFunction class. This class provides methods for encoding documents and queries using the mGTE embedding model and returning the embeddings as dense and sparse vectors compatible with Milvus indexing.

To use this feature, install the necessary dependencies:

```python
pip install --upgrade pymilvus
pip install "pymilvus[model]"
```

Then, instantiate the MGTEEmbeddingFunction:

```python
from pymilvus.model.hybrid import MGTEEmbeddingFunction

ef = MGTEEmbeddingFunction(
    model_name="Alibaba-NLP/gte-multilingual-base", # Defaults to `Alibaba-NLP/gte-multilingual-base`
)
```

**Parameters**:

- `model_name` (*string*)

    The name of the mGTE embedding model to use for encoding. The value defaults to `Alibaba-NLP/gte-multilingual-base`.
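
Because mGTE emits both a dense and a sparse vector for each input (as the examples below show), a Milvus collection that stores its output needs one vector field of each type. The following is a minimal sketch, not a prescribed schema — the collection name (`mgte_demo`), field names (`dense`, `sparse`), and index choices are illustrative assumptions:

```python
from pymilvus import MilvusClient, DataType

client = MilvusClient(uri="http://localhost:19530")

# One field per vector type: a 768-dim dense field (matching ef.dim["dense"])
# and a sparse field for the lexical-weight vectors.
schema = client.create_schema(auto_id=True, enable_dynamic_field=False)
schema.add_field(field_name="pk", datatype=DataType.INT64, is_primary=True)
schema.add_field(field_name="text", datatype=DataType.VARCHAR, max_length=1024)
schema.add_field(field_name="dense", datatype=DataType.FLOAT_VECTOR, dim=768)
schema.add_field(field_name="sparse", datatype=DataType.SPARSE_FLOAT_VECTOR)

# Index both fields; sparse vectors use an inverted index with IP metric.
index_params = client.prepare_index_params()
index_params.add_index(field_name="dense", index_type="AUTOINDEX", metric_type="COSINE")
index_params.add_index(field_name="sparse", index_type="SPARSE_INVERTED_INDEX", metric_type="IP")

client.create_collection(collection_name="mgte_demo", schema=schema, index_params=index_params)
```

With such a collection in place, the dense and sparse halves of each encoding can be inserted together and later combined with Milvus hybrid search.
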
To create embeddings for documents, use the `encode_documents()` method:

```python
docs = [
    "Artificial intelligence was founded as an academic discipline in 1956.",
    "Alan Turing was the first person to conduct substantial research in AI.",
    "Born in Maida Vale, London, Turing was raised in southern England.",
]

docs_embeddings = ef.encode_documents(docs)

# Print embeddings
print("Embeddings:", docs_embeddings)
# Print dimension of embeddings
print(ef.dim)
```

The expected output is similar to the following:

```python
Embeddings: {'dense': [tensor([-4.9149e-03,  1.6553e-02, -9.5524e-03, -2.1800e-02,  1.2075e-02,
         1.8500e-02, -3.0632e-02,  5.5909e-02,  8.7365e-02,  1.8763e-02,
         2.1708e-03, -2.7530e-02, -1.1523e-01,  6.5810e-03, -6.4674e-02,
         6.7966e-02,  1.3005e-01,  1.1942e-01, -1.2174e-02, -4.0426e-02,
         ...
         2.0129e-02, -2.3657e-02,  2.2626e-02,  2.1858e-02, -1.9181e-02,
         6.0706e-02, -2.0558e-02, -4.2050e-02], device='mps:0')],
 'sparse': <Compressed Sparse Row sparse array of dtype 'float64' with ... stored elements and shape (3, 250002)>}

{'dense': 768, 'sparse': 250002}
```

To create embeddings for queries, use the `encode_queries()` method:

```python
queries = ["When was artificial intelligence founded",
           "Where was Alan Turing born?"]

query_embeddings = ef.encode_queries(queries)

print("Embeddings:", query_embeddings)
print(ef.dim)
```

The expected output is similar to the following:

```python
Embeddings: {'dense': [tensor([ 6.5883e-03, -7.9415e-03, -3.3669e-02, -2.6450e-02,  1.4345e-02,
         1.9612e-02, -8.1679e-02,  5.6361e-02,  6.9020e-02,  1.9827e-02,
        -9.2933e-03, -1.9995e-02, -1.0055e-01, -5.4053e-02, -8.5991e-02,
         8.3004e-02,  1.0870e-01,  1.1565e-01,  2.1268e-02, -1.3782e-02,
         ...
         3.2847e-02, -2.3751e-02,  3.4475e-02,  5.3623e-02, -3.3894e-02,
         7.9408e-02,  8.2720e-03, -2.3459e-02], device='mps:0')],
 'sparse': <Compressed Sparse Row sparse array of dtype 'float64' with ... stored elements and shape (2, 250002)>}

{'dense': 768, 'sparse': 250002}
```

diff --git a/v2.4.x/site/en/embeddings/embed-with-mistral-ai.md b/v2.4.x/site/en/embeddings/embed-with-mistral-ai.md
new file mode 100644
index 000000000..a9e214494
--- /dev/null
+++ b/v2.4.x/site/en/embeddings/embed-with-mistral-ai.md
@@ -0,0 +1,88 @@
---
id: embed-with-mistral-ai.md
order: 11
summary: This article describes how to use the MistralAIEmbeddingFunction to encode documents and queries using the Mistral AI embedding model.
title: Mistral AI
---

# Mistral AI

[Mistral AI](https://mistral.ai/)'s embedding models are text embedding models designed to convert textual inputs into dense numerical vectors, effectively capturing the underlying meaning of the text. These models are highly optimized for tasks such as semantic search, natural language understanding, and context-aware applications, making them suitable for a wide range of AI-powered solutions.

Milvus integrates with Mistral AI's embedding models via the MistralAIEmbeddingFunction class. This class provides methods for encoding documents and queries using the Mistral AI embedding models and returning the embeddings as dense vectors compatible with Milvus indexing. To utilize this functionality, obtain an API key from [Mistral AI](https://console.mistral.ai/).
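
The instantiation example below passes the key to the constructor as a literal string for clarity; in practice, you might keep it out of source code and read it from an environment variable first. A minimal sketch (the variable name `MISTRAL_API_KEY` is a convention assumed here, not something the API requires):

```python
import os

# Read the Mistral AI API key from the environment instead of hard-coding it;
# fail fast with a clear message if it has not been set.
mistral_api_key = os.environ.get("MISTRAL_API_KEY")
if not mistral_api_key:
    raise RuntimeError("Set the MISTRAL_API_KEY environment variable first.")
```
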
To use this feature, install the necessary dependencies:

```python
pip install --upgrade pymilvus
pip install "pymilvus[model]"
```

Then, instantiate the MistralAIEmbeddingFunction:

```python
from pymilvus.model.dense import MistralAIEmbeddingFunction

ef = MistralAIEmbeddingFunction(
    model_name="mistral-embed", # Defaults to `mistral-embed`
    api_key="MISTRAL_API_KEY" # Provide your Mistral AI API key
)
```

**Parameters**:

- `model_name` (*string*)

    The name of the Mistral AI embedding model to use for encoding. The value defaults to `mistral-embed`. For more information, refer to [Embeddings](https://docs.mistral.ai/capabilities/embeddings/).

- `api_key` (*string*)

    The API key for accessing the Mistral AI API.

To create embeddings for documents, use the `encode_documents()` method:

```python
docs = [
    "Artificial intelligence was founded as an academic discipline in 1956.",
    "Alan Turing was the first person to conduct substantial research in AI.",
    "Born in Maida Vale, London, Turing was raised in southern England.",
]

docs_embeddings = ef.encode_documents(docs)

# Print embeddings
print("Embeddings:", docs_embeddings)
# Print dimension and shape of embeddings
print("Dim:", ef.dim, docs_embeddings[0].shape)
```

The expected output is similar to the following:

```python
Embeddings: [array([-0.06051636,  0.03207397,  0.04684448, ..., -0.01618958,
        0.02442932, -0.01302338]), array([-0.04675293,  0.06512451,  0.04290771, ..., -0.01454926,
        0.0014801 ,  0.00686646]), array([-0.05978394,  0.08728027,  0.02217102, ..., -0.00681305,
        0.03634644, -0.01802063])]
Dim: 1024 (1024,)
```

To create embeddings for queries, use the `encode_queries()` method:

```python
queries = ["When was artificial intelligence founded",
           "Where was Alan Turing born?"]

query_embeddings = ef.encode_queries(queries)

print("Embeddings:", query_embeddings)
print("Dim", ef.dim, query_embeddings[0].shape)
```

The expected output is similar to the following:

```python
Embeddings: [array([-0.04916382,  0.04568481,  0.03594971, ..., -0.02653503,
        0.02804565,  0.00600815]), array([-0.05938721,  0.07098389,  0.01773071, ..., -0.01708984,
        0.03582764,  0.00366592])]
Dim 1024 (1024,)
```

diff --git a/v2.4.x/site/en/embeddings/embed-with-nomic.md b/v2.4.x/site/en/embeddings/embed-with-nomic.md
new file mode 100644
index 000000000..f970c9224
--- /dev/null
+++ b/v2.4.x/site/en/embeddings/embed-with-nomic.md
@@ -0,0 +1,95 @@
---
id: embed-with-nomic.md
order: 12
summary: This article describes how to use the NomicEmbeddingFunction to encode documents and queries using the Nomic embedding model.
title: Nomic
---

# Nomic

[Nomic](https://atlas.nomic.ai/) models are a series of advanced text and image embedding solutions developed by Nomic AI, designed to convert various forms of data into dense numerical vectors that capture their semantic meaning.

Milvus integrates with Nomic's embedding models via the NomicEmbeddingFunction class. This class provides methods for encoding documents and queries using the Nomic embedding models and returning the embeddings as dense vectors compatible with Milvus indexing. To utilize this functionality, obtain an API key from [Nomic Atlas](https://atlas.nomic.ai/).
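
The instantiation example below passes the token to `nomic.login()` as a literal string; in practice, you might read it from the environment instead of committing it to source control. A minimal sketch (the variable name `NOMIC_API_KEY` is an assumption, not a fixed name):

```python
import os

import nomic

# Log in to Nomic Atlas with a token taken from an environment variable
# rather than a literal string embedded in the code.
nomic.login(os.environ["NOMIC_API_KEY"])
```
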
To use this feature, install the necessary dependencies:

```python
pip install --upgrade pymilvus
pip install "pymilvus[model]"
```

Then, instantiate the NomicEmbeddingFunction:

```python
# Before accessing the Nomic Atlas API, configure your Nomic API token
import nomic
nomic.login('YOUR_NOMIC_API_KEY')

# Import Nomic embedding function
from pymilvus.model.dense import NomicEmbeddingFunction

ef = NomicEmbeddingFunction(
    model_name="nomic-embed-text-v1.5", # Defaults to `nomic-embed-text-v1.5`
)
```

**Parameters**:

- `model_name` (*string*)

    The name of the Nomic embedding model to use for encoding. The value defaults to `nomic-embed-text-v1.5`. For more information, refer to [Nomic official documentation](https://docs.nomic.ai/atlas/models/image-embedding).

To create embeddings for documents, use the `encode_documents()` method:

```python
docs = [
    "Artificial intelligence was founded as an academic discipline in 1956.",
    "Alan Turing was the first person to conduct substantial research in AI.",
    "Born in Maida Vale, London, Turing was raised in southern England.",
]

docs_embeddings = ef.encode_documents(docs)

# Print embeddings
print("Embeddings:", docs_embeddings)
# Print dimension and shape of embeddings
print("Dim:", ef.dim, docs_embeddings[0].shape)
```

The expected output is similar to the following:

```python
Embeddings: [array([ 5.59997560e-02,  7.23266600e-02, -1.51977540e-01, -4.53491200e-02,
        6.49414060e-02,  4.33654800e-02,  2.26593020e-02, -3.51867680e-02,
        3.49998470e-03,  1.75571440e-03, -4.30297850e-03,  1.81274410e-02,
       ...
       -1.64337160e-02, -3.85437000e-02,  6.14318850e-02, -2.82745360e-02,
       -7.25708000e-02, -4.15563580e-04, -7.63320900e-03,  1.88446040e-02,
       -5.78002930e-02,  1.69830320e-02, -8.91876200e-03, -2.37731930e-02])]
Dim: 768 (768,)
```

To create embeddings for queries, use the `encode_queries()` method:

```python
queries = ["When was artificial intelligence founded",
           "Where was Alan Turing born?"]

query_embeddings = ef.encode_queries(queries)

print("Embeddings:", query_embeddings)
print("Dim", ef.dim, query_embeddings[0].shape)
```

The expected output is similar to the following:

```python
Embeddings: [array([ 3.24096680e-02,  7.35473600e-02, -1.63940430e-01, -4.45556640e-02,
        7.83081050e-02,  2.64587400e-02,  1.35898590e-03, -1.59606930e-02,
       -3.33557130e-02,  1.05056760e-02, -2.35290530e-02,  2.23388670e-02,
       ...
        7.67211900e-02,  4.54406740e-02,  9.70459000e-02,  4.00161740e-03,
       -3.12805180e-02, -7.05566400e-02,  5.04760740e-02,  5.22766100e-02,
       -3.87878400e-02, -3.03649900e-03,  5.90515140e-03, -1.95007320e-02])]
Dim 768 (768,)
```

diff --git a/v2.4.x/site/en/embeddings/embed-with-voyage.md b/v2.4.x/site/en/embeddings/embed-with-voyage.md
index 169733c87..29fc1f1cb 100644
--- a/v2.4.x/site/en/embeddings/embed-with-voyage.md
+++ b/v2.4.x/site/en/embeddings/embed-with-voyage.md
@@ -22,7 +22,7 @@ Then, instantiate the `VoyageEmbeddingFunction`:
 from pymilvus.model.dense import VoyageEmbeddingFunction

 voyage_ef = VoyageEmbeddingFunction(
-    model_name="voyage-lite-02-instruct", # Defaults to `voyage-2`
+    model_name="voyage-3", # Defaults to `voyage-3`
     api_key=VOYAGE_API_KEY # Provide your Voyage API key
 )
 ```
@@ -30,7 +30,7 @@ voyage_ef = VoyageEmbeddingFunction(
 __Parameters__:

 - `model_name` (string)
-    The name of the Voyage model to use for encoding. You can specify any of the available Voyage model names, for example, `voyage-law-2`, `voyage-code-2`, etc.
If you leave this parameter unspecified, `voyage-2` will be used. For a list of available models, refer to [Voyage official documentation](https://docs.voyageai.com/docs/embeddings). + The name of the Voyage model to use for encoding. You can specify any of the available Voyage model names, for example, `voyage-3-lite`, `voyage-finance-2`, etc. If you leave this parameter unspecified, `voyage-3` will be used. For a list of available models, refer to [Voyage official documentation](https://docs.voyageai.com/docs/embeddings). - `api_key` (string) The API key for accessing the Voyage API. For information on how to create an API key, refer to [API Key and Python Client](https://docs.voyageai.com/docs/api-key-and-installation). diff --git a/v2.4.x/site/en/embeddings/embeddings.md b/v2.4.x/site/en/embeddings/embeddings.md index 2d140b48c..1ad2e4114 100644 --- a/v2.4.x/site/en/embeddings/embeddings.md +++ b/v2.4.x/site/en/embeddings/embeddings.md @@ -29,6 +29,10 @@ To create embeddings in action, refer to [Using PyMilvus's Model To Generate Tex | [voyageai](https://milvus.io/api-reference/pymilvus/v2.4.x/EmbeddingModels/VoyageEmbeddingFunction/VoyageEmbeddingFunction.md) | Dense | API | | [jina](https://milvus.io/api-reference/pymilvus/v2.4.x/EmbeddingModels/JinaEmbeddingFunction/JinaEmbeddingFunction.md) | Dense | API | | [cohere](https://milvus.io/api-reference/pymilvus/v2.4.x/EmbeddingModels/CohereEmbeddingFunction/CohereEmbeddingFunction.md) | Dense | API | +| [Instructor](https://milvus.io/api-reference/pymilvus/v2.4.x/EmbeddingModels/InstructorEmbeddingFunction/InstructorEmbeddingFunction.md) | Dense | Open-sourced | +| [Mistral AI](https://milvus.io/api-reference/pymilvus/v2.4.x/EmbeddingModels/MistralAIEmbeddingFunction/MistralAIEmbeddingFunction.md) | Dense | API | +| [Nomic](https://milvus.io/api-reference/pymilvus/v2.4.x/EmbeddingModels/NomicEmbeddingFunction/NomicEmbeddingFunction.md) | Dense | API | +| [mGTE](https://milvus.io/api-reference/pymilvus/v2.4.x/EmbeddingModels/MGTEEmbeddingFunction/MGTEEmbeddingFunction.md) | Hybrid | Open-sourced | ## Example 1: Use default embedding function to generate dense vectors diff --git a/v2.4.x/site/en/menuStructure/en.json b/v2.4.x/site/en/menuStructure/en.json index 96c0241a9..17cddac04 100644 --- a/v2.4.x/site/en/menuStructure/en.json +++ b/v2.4.x/site/en/menuStructure/en.json @@ -567,6 +567,30 @@ "id": "embed-with-cohere.md", "order": 8, "children": [] + }, + { + "label": "Instructor", + "id": "embed-with-instructor.md", + "order": 9, + "children": [] + }, + { + "label": "Mistral AI", + "id": "embed-with-mistral-ai.md", + "order": 10, + "children": [] + }, + { + "label": "Nomic", + "id": "embed-with-nomic.md", + "order": 11, + "children": [] + }, + { + "label": "mGTE", + "id": "embed-with-mgte.md", + "order": 12, + "children": [] } ] },