Release new docs
Milvus-doc-bot authored and Milvus-doc-bot committed Sep 23, 2024
1 parent f014221 commit c4eaf3b
Showing 7 changed files with 411 additions and 2 deletions.
102 changes: 102 additions & 0 deletions v2.4.x/site/en/embeddings/embed-with-instructor.md
---
id: embed-with-instructor.md
order: 10
summary: This article describes how to use the InstructorEmbeddingFunction to encode documents and queries using the Instructor embedding model.
title: Instructor
---

# Instructor

[Instructor](https://instructor-embedding.github.io/) is an instruction-finetuned text embedding model that can generate text embeddings tailored to any task (e.g., classification, retrieval, clustering, or text evaluation) and domain (e.g., science or finance) by simply providing the task instruction, without any finetuning.

Milvus integrates with Instructor's embedding models via the InstructorEmbeddingFunction class. This class provides methods for encoding documents and queries using the Instructor embedding models and returning the embeddings as dense vectors compatible with Milvus indexing.

To use this feature, install the necessary dependencies:

```shell
pip install --upgrade pymilvus
pip install "pymilvus[model]"
```

Then, instantiate the InstructorEmbeddingFunction:

```python
from pymilvus.model.dense import InstructorEmbeddingFunction

ef = InstructorEmbeddingFunction(
    model_name="hkunlp/instructor-xl", # Defaults to `hkunlp/instructor-xl`
    query_instruction="Represent the question for retrieval:",
    doc_instruction="Represent the document for retrieval:"
)
```

**Parameters**:

- `model_name` (*string*)

The name of the Instructor embedding model to use for encoding. The value defaults to `hkunlp/instructor-xl`. For more information, refer to [Model List](https://github.com/xlang-ai/instructor-embedding?tab=readme-ov-file#model-list).

- `query_instruction` (*string*)

Task-specific instruction that guides the model on how to generate an embedding for a query or question.

- `doc_instruction` (*string*)

Task-specific instruction that guides the model to generate an embedding for a document. You can tailor both instructions to your task and domain, as shown in the sketch below.
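
Because Instructor is instruction-finetuned, you can steer the same model toward a different task or domain just by changing these two strings. A minimal sketch, reusing the import above (the instruction wording here is illustrative, not prescribed by Instructor):

```python
# Hypothetical science-retrieval setup; tailor the instructions to your own task
science_ef = InstructorEmbeddingFunction(
    model_name="hkunlp/instructor-xl",
    query_instruction="Represent the science question for retrieving supporting documents:",
    doc_instruction="Represent the science document for retrieval:",
)
```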

To create embeddings for documents, use the `encode_documents()` method:

```python
docs = [
    "Artificial intelligence was founded as an academic discipline in 1956.",
    "Alan Turing was the first person to conduct substantial research in AI.",
    "Born in Maida Vale, London, Turing was raised in southern England.",
]

docs_embeddings = ef.encode_documents(docs)

# Print embeddings
print("Embeddings:", docs_embeddings)
# Print dimension and shape of embeddings
print("Dim:", ef.dim, docs_embeddings[0].shape)
```

The expected output is similar to the following:

```python
Embeddings: [array([ 1.08575663e-02, 3.87877878e-03, 3.18090729e-02, -8.12458917e-02,
-4.68971021e-02, -5.85585833e-02, -5.95418774e-02, -8.55880603e-03,
-5.54775111e-02, -6.08020350e-02, 1.76202394e-02, 1.06648318e-02,
-5.89960292e-02, -7.46861771e-02, 6.60329172e-03, -4.25189249e-02,
...
-1.26921125e-02, 3.01475357e-02, 8.25323071e-03, -1.88470203e-02,
6.04814291e-03, -2.81618331e-02, 5.91602828e-03, 7.13866428e-02],
dtype=float32)]
Dim: 768 (768,)
```

To create embeddings for queries, use the `encode_queries()` method:

```python
queries = ["When was artificial intelligence founded",
           "Where was Alan Turing born?"]

query_embeddings = ef.encode_queries(queries)

print("Embeddings:", query_embeddings)
print("Dim", ef.dim, query_embeddings[0].shape)
```

The expected output is similar to the following:

```python
Embeddings: [array([ 1.21721877e-02, 1.88485277e-03, 3.01732980e-02, -8.10302645e-02,
-6.13401756e-02, -3.98149453e-02, -5.18723316e-02, -6.76784338e-03,
-6.59285188e-02, -5.38365729e-02, -5.13435388e-03, -2.49210224e-02,
-5.74403182e-02, -7.03031123e-02, 6.63730130e-03, -3.42259370e-02,
...
7.36595877e-03, 2.85532661e-02, -1.55952033e-02, 2.13342719e-02,
1.51187545e-02, -2.82798670e-02, 2.69396193e-02, 6.16136603e-02],
dtype=float32)]
Dim: 768 (768,)
```
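
Because the returned vectors are compatible with Milvus indexing, you can insert and search them directly. Below is a minimal sketch assuming a local Milvus Lite database; the file name and collection name are illustrative:

```python
from pymilvus import MilvusClient

client = MilvusClient("./instructor_demo.db")  # Milvus Lite stores data in a local file

client.create_collection(
    collection_name="demo_collection",
    dimension=ef.dim,  # 768 for instructor-xl
)

# Reuse the documents and embeddings computed above
client.insert(
    collection_name="demo_collection",
    data=[
        {"id": i, "vector": docs_embeddings[i], "text": docs[i]}
        for i in range(len(docs))
    ],
)

results = client.search(
    collection_name="demo_collection",
    data=ef.encode_queries(["When was artificial intelligence founded"]),
    limit=1,
    output_fields=["text"],
)
print(results)
```
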
96 changes: 96 additions & 0 deletions v2.4.x/site/en/embeddings/embed-with-mgte.md
---
id: embed-with-mgte.md
order: 13
summary: This article describes how to use the MGTEEmbeddingFunction to encode documents and queries using the mGTE embedding model.
title: mGTE
---

# mGTE

[mGTE](https://huggingface.co/Alibaba-NLP/gte-multilingual-base) is a multilingual text representation and reranking model for text retrieval tasks.

Milvus integrates with the mGTE embedding model via the MGTEEmbeddingFunction class. This class provides methods for encoding documents and queries using the mGTE embedding model and returning the embeddings as dense and sparse vectors compatible with Milvus indexing.

To use this feature, install the necessary dependencies:

```shell
pip install --upgrade pymilvus
pip install "pymilvus[model]"
```

Then, instantiate the MGTEEmbeddingFunction:

```python
from pymilvus.model.hybrid import MGTEEmbeddingFunction

ef = MGTEEmbeddingFunction(
    model_name="Alibaba-NLP/gte-multilingual-base", # Defaults to `Alibaba-NLP/gte-multilingual-base`
)
```

**Parameters**:

- `model_name` (*string*)

The name of the mGTE embedding model to use for encoding. The value defaults to `Alibaba-NLP/gte-multilingual-base`.

To create embeddings for documents, use the `encode_documents()` method:

```python
docs = [
    "Artificial intelligence was founded as an academic discipline in 1956.",
    "Alan Turing was the first person to conduct substantial research in AI.",
    "Born in Maida Vale, London, Turing was raised in southern England.",
]

docs_embeddings = ef.encode_documents(docs)

# Print embeddings
print("Embeddings:", docs_embeddings)
# Print dimension of embeddings
print(ef.dim)
```

The expected output is similar to the following:

```python
Embeddings: {'dense': [tensor([-4.9149e-03, 1.6553e-02, -9.5524e-03, -2.1800e-02, 1.2075e-02,
1.8500e-02, -3.0632e-02, 5.5909e-02, 8.7365e-02, 1.8763e-02,
2.1708e-03, -2.7530e-02, -1.1523e-01, 6.5810e-03, -6.4674e-02,
6.7966e-02, 1.3005e-01, 1.1942e-01, -1.2174e-02, -4.0426e-02,
...
2.0129e-02, -2.3657e-02, 2.2626e-02, 2.1858e-02, -1.9181e-02,
6.0706e-02, -2.0558e-02, -4.2050e-02], device='mps:0')],
'sparse': <Compressed Sparse Row sparse array of dtype 'float64'
with 41 stored elements and shape (3, 250002)>}

{'dense': 768, 'sparse': 250002}
```

To create embeddings for queries, use the `encode_queries()` method:

```python
queries = ["When was artificial intelligence founded",
           "Where was Alan Turing born?"]

query_embeddings = ef.encode_queries(queries)

print("Embeddings:", query_embeddings)
print(ef.dim)
```

The expected output is similar to the following:

```python
Embeddings: {'dense': [tensor([ 6.5883e-03, -7.9415e-03, -3.3669e-02, -2.6450e-02, 1.4345e-02,
1.9612e-02, -8.1679e-02, 5.6361e-02, 6.9020e-02, 1.9827e-02,
-9.2933e-03, -1.9995e-02, -1.0055e-01, -5.4053e-02, -8.5991e-02,
8.3004e-02, 1.0870e-01, 1.1565e-01, 2.1268e-02, -1.3782e-02,
...
3.2847e-02, -2.3751e-02, 3.4475e-02, 5.3623e-02, -3.3894e-02,
7.9408e-02, 8.2720e-03, -2.3459e-02], device='mps:0')],
'sparse': <Compressed Sparse Row sparse array of dtype 'float64'
with 13 stored elements and shape (2, 250002)>}

{'dense': 768, 'sparse': 250002}
```
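
Each call returns a dictionary with a `dense` entry (a list of torch tensors, one per input) and a `sparse` entry (a SciPy CSR array with one row per input), as shown in the outputs above. A minimal sketch of unpacking them for use outside Milvus (assumes torch and SciPy are available, as the model subpackage relies on them):

```python
dense_vecs = query_embeddings["dense"]      # list of torch tensors, one per query
sparse_matrix = query_embeddings["sparse"]  # SciPy CSR array, one row per query

# Move a dense vector off the accelerator (e.g., 'mps:0' above) into NumPy
first_dense = dense_vecs[0].cpu().numpy()
print(first_dense.shape)                       # (768,)

# Inspect the sparse side: stored elements and overall shape
print(sparse_matrix.nnz, sparse_matrix.shape)  # e.g., 13 (2, 250002)
```
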
88 changes: 88 additions & 0 deletions v2.4.x/site/en/embeddings/embed-with-mistral-ai.md
---
id: embed-with-mistral-ai.md
order: 11
summary: This article describes how to use the MistralAIEmbeddingFunction to encode documents and queries using the Mistral AI embedding model.
title: Mistral AI
---

# Mistral AI

[Mistral AI](https://mistral.ai/)'s embedding models are text embedding models designed to convert textual inputs into dense numerical vectors, effectively capturing the underlying meaning of the text. These models are highly optimized for tasks such as semantic search, natural language understanding, and context-aware applications, making them suitable for a wide range of AI-powered solutions.

Milvus integrates with Mistral AI's embedding models via the MistralAIEmbeddingFunction class. This class provides methods for encoding documents and queries using the Mistral AI embedding models and returning the embeddings as dense vectors compatible with Milvus indexing. To utilize this functionality, obtain an API key from [Mistral AI](https://console.mistral.ai/).

To use this feature, install the necessary dependencies:

```shell
pip install --upgrade pymilvus
pip install "pymilvus[model]"
```

Then, instantiate the MistralAIEmbeddingFunction:

```python
from pymilvus.model.dense import MistralAIEmbeddingFunction

ef = MistralAIEmbeddingFunction(
    model_name="mistral-embed", # Defaults to `mistral-embed`
    api_key="MISTRAL_API_KEY" # Provide your Mistral AI API key
)
```

**Parameters**:

- `model_name` (*string*)

The name of the Mistral AI embedding model to use for encoding. The value defaults to `mistral-embed`. For more information, refer to [Embeddings](https://docs.mistral.ai/capabilities/embeddings/).

- `api_key` (*string*)

The API key for accessing the Mistral AI API.

To create embeddings for documents, use the `encode_documents()` method:

```python
docs = [
    "Artificial intelligence was founded as an academic discipline in 1956.",
    "Alan Turing was the first person to conduct substantial research in AI.",
    "Born in Maida Vale, London, Turing was raised in southern England.",
]

docs_embeddings = ef.encode_documents(docs)

# Print embeddings
print("Embeddings:", docs_embeddings)
# Print dimension and shape of embeddings
print("Dim:", ef.dim, docs_embeddings[0].shape)
```

The expected output is similar to the following:

```python
Embeddings: [array([-0.06051636, 0.03207397, 0.04684448, ..., -0.01618958,
0.02442932, -0.01302338]), array([-0.04675293, 0.06512451, 0.04290771, ..., -0.01454926,
0.0014801 , 0.00686646]), array([-0.05978394, 0.08728027, 0.02217102, ..., -0.00681305,
0.03634644, -0.01802063])]
Dim: 1024 (1024,)
```

To create embeddings for queries, use the `encode_queries()` method:

```python
queries = ["When was artificial intelligence founded",
           "Where was Alan Turing born?"]

query_embeddings = ef.encode_queries(queries)

print("Embeddings:", query_embeddings)
print("Dim", ef.dim, query_embeddings[0].shape)
```

The expected output is similar to the following:

```python
Embeddings: [array([-0.04916382, 0.04568481, 0.03594971, ..., -0.02653503,
0.02804565, 0.00600815]), array([-0.05938721, 0.07098389, 0.01773071, ..., -0.01708984,
0.03582764, 0.00366592])]
Dim: 1024 (1024,)
```
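
As a quick sanity check outside Milvus, you can rank the documents against each query by cosine similarity. A minimal sketch with NumPy, reusing the embeddings computed above:

```python
import numpy as np

# Stack the returned arrays into matrices and L2-normalize the rows
doc_matrix = np.stack(docs_embeddings)      # shape: (3, 1024)
query_matrix = np.stack(query_embeddings)   # shape: (2, 1024)
doc_matrix /= np.linalg.norm(doc_matrix, axis=1, keepdims=True)
query_matrix /= np.linalg.norm(query_matrix, axis=1, keepdims=True)

# Cosine similarity of every query against every document
scores = query_matrix @ doc_matrix.T        # shape: (2, 3)
for query, row in zip(queries, scores):
    print(query, "->", docs[int(row.argmax())])
```
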
95 changes: 95 additions & 0 deletions v2.4.x/site/en/embeddings/embed-with-nomic.md
---
id: embed-with-nomic.md
order: 12
summary: This article describes how to use the NomicEmbeddingFunction to encode documents and queries using the Nomic embedding model.
title: Nomic
---

# Nomic

[Nomic](https://atlas.nomic.ai/) models are a series of advanced text and image embedding solutions developed by Nomic AI, designed to convert various forms of data into dense numerical vectors that capture their semantic meaning.

Milvus integrates with Nomic's embedding models via the NomicEmbeddingFunction class. This class provides methods for encoding documents and queries using the Nomic embedding models and returning the embeddings as dense vectors compatible with Milvus indexing. To utilize this functionality, obtain an API key from [Nomic Atlas](https://atlas.nomic.ai/).

To use this feature, install the necessary dependencies:

```shell
pip install --upgrade pymilvus
pip install "pymilvus[model]"
```

Then, instantiate the NomicEmbeddingFunction:

```python
# Before accessing the Nomic Atlas API, configure your Nomic API token
import nomic
nomic.login('YOUR_NOMIC_API_KEY')

# Import Nomic embedding function
from pymilvus.model.dense import NomicEmbeddingFunction

ef = NomicEmbeddingFunction(
    model_name="nomic-embed-text-v1.5", # Defaults to `nomic-embed-text-v1.5`
)
```
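
If you prefer not to hardcode the token, you can load it from an environment variable instead. A minimal sketch (the `NOMIC_API_KEY` variable name is an assumption, not a Nomic requirement):

```python
import os

import nomic

# Assumes the NOMIC_API_KEY environment variable is set in your shell
nomic.login(os.environ["NOMIC_API_KEY"])
```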

**Parameters**:

- `model_name` (*string*)

The name of the Nomic embedding model to use for encoding. The value defaults to `nomic-embed-text-v1.5`. For more information, refer to [Nomic official documentation](https://docs.nomic.ai/atlas/models/image-embedding).

To create embeddings for documents, use the `encode_documents()` method:

```python
docs = [
    "Artificial intelligence was founded as an academic discipline in 1956.",
    "Alan Turing was the first person to conduct substantial research in AI.",
    "Born in Maida Vale, London, Turing was raised in southern England.",
]

docs_embeddings = ef.encode_documents(docs)

# Print embeddings
print("Embeddings:", docs_embeddings)
# Print dimension and shape of embeddings
print("Dim:", ef.dim, docs_embeddings[0].shape)
```

The expected output is similar to the following:

```python
Embeddings: [array([ 5.59997560e-02, 7.23266600e-02, -1.51977540e-01, -4.53491200e-02,
6.49414060e-02, 4.33654800e-02, 2.26593020e-02, -3.51867680e-02,
3.49998470e-03, 1.75571440e-03, -4.30297850e-03, 1.81274410e-02,
...
-1.64337160e-02, -3.85437000e-02, 6.14318850e-02, -2.82745360e-02,
-7.25708000e-02, -4.15563580e-04, -7.63320900e-03, 1.88446040e-02,
-5.78002930e-02, 1.69830320e-02, -8.91876200e-03, -2.37731930e-02])]
Dim: 768 (768,)
```

To create embeddings for queries, use the `encode_queries()` method:

```python
queries = ["When was artificial intelligence founded",
           "Where was Alan Turing born?"]

query_embeddings = ef.encode_queries(queries)

print("Embeddings:", query_embeddings)
print("Dim", ef.dim, query_embeddings[0].shape)
```

The expected output is similar to the following:

```python
Embeddings: [array([ 3.24096680e-02, 7.35473600e-02, -1.63940430e-01, -4.45556640e-02,
7.83081050e-02, 2.64587400e-02, 1.35898590e-03, -1.59606930e-02,
-3.33557130e-02, 1.05056760e-02, -2.35290530e-02, 2.23388670e-02,
...
7.67211900e-02, 4.54406740e-02, 9.70459000e-02, 4.00161740e-03,
-3.12805180e-02, -7.05566400e-02, 5.04760740e-02, 5.22766100e-02,
-3.87878400e-02, -3.03649900e-03, 5.90515140e-03, -1.95007320e-02])]
Dim: 768 (768,)
```