---
id: embed-with-instructor.md
order: 10
summary: This article describes how to use the InstructorEmbeddingFunction to encode documents and queries using the Instructor embedding model.
title: Instructor
---

# Instructor

[Instructor](https://instructor-embedding.github.io/) is an instruction-finetuned text embedding model that can generate text embeddings tailored to any task (e.g., classification, retrieval, clustering, or text evaluation) and domain (e.g., science or finance) simply by providing the task instruction, without any further finetuning.

Milvus integrates with Instructor's embedding models via the InstructorEmbeddingFunction class. This class provides methods for encoding documents and queries using an Instructor embedding model and returning the embeddings as dense vectors compatible with Milvus indexing.

To use this feature, install the necessary dependencies:

```shell
pip install --upgrade pymilvus
pip install "pymilvus[model]"
```
Then, instantiate the InstructorEmbeddingFunction:

```python
from pymilvus.model.dense import InstructorEmbeddingFunction

ef = InstructorEmbeddingFunction(
    model_name="hkunlp/instructor-xl", # Defaults to `hkunlp/instructor-xl`
    query_instruction="Represent the question for retrieval:",
    doc_instruction="Represent the document for retrieval:"
)
```
**Parameters**:

- `model_name` (*string*)

    The name of the Instructor embedding model to use for encoding. The value defaults to `hkunlp/instructor-xl`. For more information, refer to [Model List](https://github.com/xlang-ai/instructor-embedding?tab=readme-ov-file#model-list).

- `query_instruction` (*string*)

    Task-specific instruction that guides the model on how to generate an embedding for a query or question.

- `doc_instruction` (*string*)

    Task-specific instruction that guides the model on how to generate an embedding for a document. Both instructions can be adapted to other tasks, as shown in the sketch after this list.
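The retrieval instructions shown above are just one choice. As a minimal sketch, the snippet below uses hypothetical instruction strings adapted to a clustering task; the exact phrasing is an assumption, so consult the Instructor model card for the recommended wording for each task.

```python
# A minimal sketch: hypothetical instruction strings for a clustering task.
# The phrasing is illustrative, not prescribed by Instructor or Milvus.
clustering_ef = InstructorEmbeddingFunction(
    model_name="hkunlp/instructor-xl",
    query_instruction="Represent the sentence for clustering:",
    doc_instruction="Represent the sentence for clustering:",
)
```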
To create embeddings for documents, use the `encode_documents()` method:

```python
docs = [
    "Artificial intelligence was founded as an academic discipline in 1956.",
    "Alan Turing was the first person to conduct substantial research in AI.",
    "Born in Maida Vale, London, Turing was raised in southern England.",
]

docs_embeddings = ef.encode_documents(docs)

# Print embeddings
print("Embeddings:", docs_embeddings)
# Print dimension and shape of embeddings
print("Dim:", ef.dim, docs_embeddings[0].shape)
```

The expected output is similar to the following:

```python
Embeddings: [array([ 1.08575663e-02, 3.87877878e-03, 3.18090729e-02, -8.12458917e-02,
       -4.68971021e-02, -5.85585833e-02, -5.95418774e-02, -8.55880603e-03,
       -5.54775111e-02, -6.08020350e-02, 1.76202394e-02, 1.06648318e-02,
       -5.89960292e-02, -7.46861771e-02, 6.60329172e-03, -4.25189249e-02,
       ...
       -1.26921125e-02, 3.01475357e-02, 8.25323071e-03, -1.88470203e-02,
        6.04814291e-03, -2.81618331e-02, 5.91602828e-03, 7.13866428e-02],
      dtype=float32)]
Dim: 768 (768,)
```

To create embeddings for queries, use the `encode_queries()` method:

```python
queries = ["When was artificial intelligence founded",
           "Where was Alan Turing born?"]

query_embeddings = ef.encode_queries(queries)

print("Embeddings:", query_embeddings)
print("Dim", ef.dim, query_embeddings[0].shape)
```

The expected output is similar to the following:

```python
Embeddings: [array([ 1.21721877e-02, 1.88485277e-03, 3.01732980e-02, -8.10302645e-02,
       -6.13401756e-02, -3.98149453e-02, -5.18723316e-02, -6.76784338e-03,
       -6.59285188e-02, -5.38365729e-02, -5.13435388e-03, -2.49210224e-02,
       -5.74403182e-02, -7.03031123e-02, 6.63730130e-03, -3.42259370e-02,
       ...
        7.36595877e-03, 2.85532661e-02, -1.55952033e-02, 2.13342719e-02,
        1.51187545e-02, -2.82798670e-02, 2.69396193e-02, 6.16136603e-02],
      dtype=float32)]
Dim 768 (768,)
```
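Once you have both document and query embeddings, you can sanity-check them locally before indexing them in Milvus. The following is a minimal sketch, assuming the `queries`, `docs_embeddings`, and `query_embeddings` variables from the snippets above; it ranks the documents for each query by cosine similarity using NumPy.

```python
import numpy as np

# Stack the per-text vectors into matrices: (num_docs, dim) and (num_queries, dim)
doc_matrix = np.stack(docs_embeddings)
query_matrix = np.stack(query_embeddings)

# Normalize rows so that a dot product equals cosine similarity
doc_matrix = doc_matrix / np.linalg.norm(doc_matrix, axis=1, keepdims=True)
query_matrix = query_matrix / np.linalg.norm(query_matrix, axis=1, keepdims=True)

scores = query_matrix @ doc_matrix.T  # shape: (num_queries, num_docs)
for i, query in enumerate(queries):
    best = int(np.argmax(scores[i]))
    print(f"{query!r} -> doc {best} (score {scores[i][best]:.4f})")
```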
---
id: embed-with-mgte.md
order: 13
summary: This article describes how to use the MGTEEmbeddingFunction to encode documents and queries using the mGTE embedding model.
title: mGTE
---

# mGTE

mGTE is a multilingual text representation and reranking model for text retrieval tasks.

Milvus integrates with the mGTE embedding model via the MGTEEmbeddingFunction class. This class provides methods for encoding documents and queries using the mGTE embedding model and returning the embeddings as dense and sparse vectors compatible with Milvus indexing.

To use this feature, install the necessary dependencies:

```shell
pip install --upgrade pymilvus
pip install "pymilvus[model]"
```
Then, instantiate the MGTEEmbeddingFunction:

```python
from pymilvus.model.hybrid import MGTEEmbeddingFunction

ef = MGTEEmbeddingFunction(
    model_name="Alibaba-NLP/gte-multilingual-base", # Defaults to `Alibaba-NLP/gte-multilingual-base`
)
```

**Parameters**:

- `model_name` (*string*)

    The name of the mGTE embedding model to use for encoding. The value defaults to `Alibaba-NLP/gte-multilingual-base`.
To create embeddings for documents, use the `encode_documents()` method:

```python
docs = [
    "Artificial intelligence was founded as an academic discipline in 1956.",
    "Alan Turing was the first person to conduct substantial research in AI.",
    "Born in Maida Vale, London, Turing was raised in southern England.",
]

docs_embeddings = ef.encode_documents(docs)

# Print embeddings
print("Embeddings:", docs_embeddings)
# Print dimension of embeddings
print(ef.dim)
```

The expected output is similar to the following:

```python
Embeddings: {'dense': [tensor([-4.9149e-03, 1.6553e-02, -9.5524e-03, -2.1800e-02, 1.2075e-02,
        1.8500e-02, -3.0632e-02, 5.5909e-02, 8.7365e-02, 1.8763e-02,
        2.1708e-03, -2.7530e-02, -1.1523e-01, 6.5810e-03, -6.4674e-02,
        6.7966e-02, 1.3005e-01, 1.1942e-01, -1.2174e-02, -4.0426e-02,
        ...
        2.0129e-02, -2.3657e-02, 2.2626e-02, 2.1858e-02, -1.9181e-02,
        6.0706e-02, -2.0558e-02, -4.2050e-02], device='mps:0')],
 'sparse': <Compressed Sparse Row sparse array of dtype 'float64'
        with 41 stored elements and shape (3, 250002)>}

{'dense': 768, 'sparse': 250002}
```

To create embeddings for queries, use the `encode_queries()` method:

```python
queries = ["When was artificial intelligence founded",
           "Where was Alan Turing born?"]

query_embeddings = ef.encode_queries(queries)

print("Embeddings:", query_embeddings)
print(ef.dim)
```

The expected output is similar to the following:

```python
Embeddings: {'dense': [tensor([ 6.5883e-03, -7.9415e-03, -3.3669e-02, -2.6450e-02, 1.4345e-02,
        1.9612e-02, -8.1679e-02, 5.6361e-02, 6.9020e-02, 1.9827e-02,
       -9.2933e-03, -1.9995e-02, -1.0055e-01, -5.4053e-02, -8.5991e-02,
        8.3004e-02, 1.0870e-01, 1.1565e-01, 2.1268e-02, -1.3782e-02,
        ...
        3.2847e-02, -2.3751e-02, 3.4475e-02, 5.3623e-02, -3.3894e-02,
        7.9408e-02, 8.2720e-03, -2.3459e-02], device='mps:0')],
 'sparse': <Compressed Sparse Row sparse array of dtype 'float64'
        with 13 stored elements and shape (2, 250002)>}

{'dense': 768, 'sparse': 250002}
```
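Because mGTE returns both representations in a single dictionary, you typically split them apart before populating separate dense and sparse vector fields in Milvus. The following is a minimal sketch, assuming the `docs_embeddings` variable from the snippets above; exact tensor types and devices may vary with your environment.

```python
# Split the hybrid output from encode_documents() into its two parts.
dense_vectors = docs_embeddings["dense"]    # list with one vector per document
sparse_matrix = docs_embeddings["sparse"]   # SciPy CSR array, one row per document

# Dense part: number of vectors and the dimension of the first one
print(len(dense_vectors), dense_vectors[0].shape)

# Sparse part: row i holds the sparse term weights for document i
first_row = sparse_matrix[[0], :]
print(first_row.nnz, "non-zero terms for the first document")
```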
---
id: embed-with-mistral-ai.md
order: 11
summary: This article describes how to use the MistralAIEmbeddingFunction to encode documents and queries using the Mistral AI embedding model.
title: Mistral AI
---

# Mistral AI

[Mistral AI](https://mistral.ai/)'s embedding models are text embedding models designed to convert textual inputs into dense numerical vectors, effectively capturing the underlying meaning of the text. These models are highly optimized for tasks such as semantic search, natural language understanding, and context-aware applications, making them suitable for a wide range of AI-powered solutions.

Milvus integrates with Mistral AI's embedding models via the MistralAIEmbeddingFunction class. This class provides methods for encoding documents and queries using the Mistral AI embedding models and returning the embeddings as dense vectors compatible with Milvus indexing. To utilize this functionality, obtain an API key from [Mistral AI](https://console.mistral.ai/).

To use this feature, install the necessary dependencies:

```shell
pip install --upgrade pymilvus
pip install "pymilvus[model]"
```
Then, instantiate the MistralAIEmbeddingFunction:

```python
from pymilvus.model.dense import MistralAIEmbeddingFunction

ef = MistralAIEmbeddingFunction(
    model_name="mistral-embed", # Defaults to `mistral-embed`
    api_key="MISTRAL_API_KEY" # Provide your Mistral AI API key
)
```
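Hardcoding the key is shown above only for brevity. A safer pattern, sketched below under the assumption that you have exported the key as a `MISTRAL_API_KEY` environment variable, is to read it at runtime:

```python
import os

from pymilvus.model.dense import MistralAIEmbeddingFunction

# A minimal sketch: read the API key from the environment instead of
# hardcoding it in source files.
api_key = os.environ.get("MISTRAL_API_KEY")
if api_key is None:
    raise RuntimeError("Set the MISTRAL_API_KEY environment variable first.")

ef = MistralAIEmbeddingFunction(model_name="mistral-embed", api_key=api_key)
```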
**Parameters**:

- `model_name` (*string*)

    The name of the Mistral AI embedding model to use for encoding. The value defaults to `mistral-embed`. For more information, refer to [Embeddings](https://docs.mistral.ai/capabilities/embeddings/).

- `api_key` (*string*)

    The API key for accessing the Mistral AI API.

To create embeddings for documents, use the `encode_documents()` method:

```python
docs = [
    "Artificial intelligence was founded as an academic discipline in 1956.",
    "Alan Turing was the first person to conduct substantial research in AI.",
    "Born in Maida Vale, London, Turing was raised in southern England.",
]

docs_embeddings = ef.encode_documents(docs)

# Print embeddings
print("Embeddings:", docs_embeddings)
# Print dimension and shape of embeddings
print("Dim:", ef.dim, docs_embeddings[0].shape)
```

The expected output is similar to the following:

```python
Embeddings: [array([-0.06051636, 0.03207397, 0.04684448, ..., -0.01618958,
        0.02442932, -0.01302338]), array([-0.04675293, 0.06512451, 0.04290771, ..., -0.01454926,
        0.0014801 , 0.00686646]), array([-0.05978394, 0.08728027, 0.02217102, ..., -0.00681305,
        0.03634644, -0.01802063])]
Dim: 1024 (1024,)
```

To create embeddings for queries, use the `encode_queries()` method:

```python
queries = ["When was artificial intelligence founded",
           "Where was Alan Turing born?"]

query_embeddings = ef.encode_queries(queries)

print("Embeddings:", query_embeddings)
print("Dim", ef.dim, query_embeddings[0].shape)
```

The expected output is similar to the following:

```python
Embeddings: [array([-0.04916382, 0.04568481, 0.03594971, ..., -0.02653503,
        0.02804565, 0.00600815]), array([-0.05938721, 0.07098389, 0.01773071, ..., -0.01708984,
        0.03582764, 0.00366592])]
Dim 1024 (1024,)
```
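The dense vectors above can be stored and searched in Milvus directly. The following is a minimal sketch rather than part of the official quickstart; it assumes a local Milvus Lite database file and illustrative names (`milvus_demo.db`, `mistral_demo`), along with the `docs`, `docs_embeddings`, and `query_embeddings` variables from the snippets above.

```python
from pymilvus import MilvusClient

# Store the document embeddings in a local Milvus Lite database
client = MilvusClient("milvus_demo.db")
client.create_collection(collection_name="mistral_demo", dimension=1024)

data = [
    {"id": i, "vector": docs_embeddings[i].tolist(), "text": docs[i]}
    for i in range(len(docs))
]
client.insert(collection_name="mistral_demo", data=data)

# Search with the first query embedding and return the matching text
results = client.search(
    collection_name="mistral_demo",
    data=[query_embeddings[0].tolist()],
    limit=2,
    output_fields=["text"],
)
print(results)
```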
---
id: embed-with-nomic.md
order: 12
summary: This article describes how to use the NomicEmbeddingFunction to encode documents and queries using the Nomic embedding model.
title: Nomic
---

# Nomic

[Nomic](https://atlas.nomic.ai/) models are a series of advanced text and image embedding solutions developed by Nomic AI, designed to convert various forms of data into dense numerical vectors that capture their semantic meaning.

Milvus integrates with Nomic's embedding models via the NomicEmbeddingFunction class. This class provides methods for encoding documents and queries using the Nomic embedding models and returning the embeddings as dense vectors compatible with Milvus indexing. To utilize this functionality, obtain an API key from [Nomic Atlas](https://atlas.nomic.ai/).

To use this feature, install the necessary dependencies:

```shell
pip install --upgrade pymilvus
pip install "pymilvus[model]"
```
Then, instantiate the NomicEmbeddingFunction:

```python
# Before accessing the Nomic Atlas API, configure your Nomic API token
import nomic
nomic.login('YOUR_NOMIC_API_KEY')

# Import Nomic embedding function
from pymilvus.model.dense import NomicEmbeddingFunction

ef = NomicEmbeddingFunction(
    model_name="nomic-embed-text-v1.5", # Defaults to `nomic-embed-text-v1.5`
)
```
**Parameters**:

- `model_name` (*string*)

    The name of the Nomic embedding model to use for encoding. The value defaults to `nomic-embed-text-v1.5`. For more information, refer to [Nomic official documentation](https://docs.nomic.ai/atlas/models/image-embedding).

To create embeddings for documents, use the `encode_documents()` method:

```python
docs = [
    "Artificial intelligence was founded as an academic discipline in 1956.",
    "Alan Turing was the first person to conduct substantial research in AI.",
    "Born in Maida Vale, London, Turing was raised in southern England.",
]

docs_embeddings = ef.encode_documents(docs)

# Print embeddings
print("Embeddings:", docs_embeddings)
# Print dimension and shape of embeddings
print("Dim:", ef.dim, docs_embeddings[0].shape)
```

The expected output is similar to the following:

```python
Embeddings: [array([ 5.59997560e-02, 7.23266600e-02, -1.51977540e-01, -4.53491200e-02,
        6.49414060e-02, 4.33654800e-02, 2.26593020e-02, -3.51867680e-02,
        3.49998470e-03, 1.75571440e-03, -4.30297850e-03, 1.81274410e-02,
       ...
       -1.64337160e-02, -3.85437000e-02, 6.14318850e-02, -2.82745360e-02,
       -7.25708000e-02, -4.15563580e-04, -7.63320900e-03, 1.88446040e-02,
       -5.78002930e-02, 1.69830320e-02, -8.91876200e-03, -2.37731930e-02])]
Dim: 768 (768,)
```

To create embeddings for queries, use the `encode_queries()` method:

```python
queries = ["When was artificial intelligence founded",
           "Where was Alan Turing born?"]

query_embeddings = ef.encode_queries(queries)

print("Embeddings:", query_embeddings)
print("Dim", ef.dim, query_embeddings[0].shape)
```

The expected output is similar to the following:

```python
Embeddings: [array([ 3.24096680e-02, 7.35473600e-02, -1.63940430e-01, -4.45556640e-02,
        7.83081050e-02, 2.64587400e-02, 1.35898590e-03, -1.59606930e-02,
       -3.33557130e-02, 1.05056760e-02, -2.35290530e-02, 2.23388670e-02,
       ...
        7.67211900e-02, 4.54406740e-02, 9.70459000e-02, 4.00161740e-03,
       -3.12805180e-02, -7.05566400e-02, 5.04760740e-02, 5.22766100e-02,
       -3.87878400e-02, -3.03649900e-03, 5.90515140e-03, -1.95007320e-02])]
Dim 768 (768,)
```
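Because Nomic embeddings are produced through a hosted API, large corpora are usually encoded in batches to stay within request-size limits. The following is a minimal sketch, assuming the `ef` instance and `docs` list from above; the batch size of 32 is an illustrative assumption, not a documented Nomic limit.

```python
# Encode a document list in fixed-size batches and collect the results.
def encode_in_batches(ef, documents, batch_size=32):
    embeddings = []
    for start in range(0, len(documents), batch_size):
        batch = documents[start:start + batch_size]
        embeddings.extend(ef.encode_documents(batch))
    return embeddings

all_embeddings = encode_in_batches(ef, docs)
print(len(all_embeddings), all_embeddings[0].shape)
```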