Add more information about the chunking strategy parameter

vectara · Nov 25, 2024 · 7072af8
1 parent fe9bdcd
commit 7072af8

Showing 3 changed files with 27 additions and 39 deletions.
diff --git a/www/docs/api-reference/indexing-apis/file-upload/file_upload.md b/www/docs/api-reference/indexing-apis/file-upload/file_upload.md
@@ -54,11 +54,11 @@ following parts:
 
 - `metadata` - (Optional) Specifies a JSON object representing any additional
   metadata to be associated with the extracted document.
-- `chunking_strategy` - (Optional) Specifies how the document should be split into 
+- `chunking_strategy` - (Optional) Specifies whether to split the document into 
   chunks during ingestion. Set the `type` as `max_chars_chunking_strategy` and 
-  then specify the `max_chars_per_chunk` to the number of characters per chunk.
-  If not set, the platform defaults to sentence-based chunking, where each chunk 
-  contains one full sentence. 
+  then set `max_chars_per_chunk` to the number of characters per chunk, 
+  such as `200`. If not set, the platform defaults to sentence-based chunking, 
+  where each chunk contains one full sentence. 
 - `file` - Specifies the file that you want to upload.
 - `filename` - Specified as part of the `file` field with the file name that you 
   want to associate with the uploaded file.
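
The parts listed in this hunk could be assembled on the client side roughly as follows. This is a minimal sketch, not the documented client: the helper name `build_upload_parts` is hypothetical, it assumes the `metadata` and `chunking_strategy` parts are sent as JSON-encoded strings, and it only builds the parts without sending a request.

```python
import json
from typing import Any, Optional


def build_upload_parts(
    filename: str,
    file_bytes: bytes,
    metadata: Optional[dict[str, Any]] = None,
    max_chars_per_chunk: Optional[int] = None,
) -> dict[str, Any]:
    """Assemble the form parts described above (hypothetical helper)."""
    if max_chars_per_chunk is not None and max_chars_per_chunk <= 0:
        raise ValueError("max_chars_per_chunk must be a positive integer")

    # The `file` part carries the filename alongside the raw bytes.
    parts: dict[str, Any] = {"file": (filename, file_bytes)}

    if metadata is not None:
        # Assumption: metadata is serialized as a JSON string.
        parts["metadata"] = json.dumps(metadata)

    # Omitting `chunking_strategy` leaves the default sentence-based chunking.
    if max_chars_per_chunk is not None:
        parts["chunking_strategy"] = json.dumps(
            {
                "type": "max_chars_chunking_strategy",
                "max_chars_per_chunk": max_chars_per_chunk,
            }
        )
    return parts
```

Leaving `max_chars_per_chunk` unset keeps the `chunking_strategy` part out of the request entirely, which matches the "if not set" default described in the changed text.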

diff --git a/www/docs/api-reference/indexing-apis/indexing.md b/www/docs/api-reference/indexing-apis/indexing.md
@@ -64,20 +64,11 @@ sum of both values.
 ### Structured document chunking
 
 Structured documents can also specify a `chunking_strategy` which indicates 
-how to split the document into chunks during ingestion. Set the `type` as 
+whether to split the document into chunks during ingestion. Set the `type` as 
 `max_chars_chunking_strategy` and then specify the `max_chars_per_chunk` to 
-the number of characters per chunk. If not set, the platform defaults to 
-sentence-based chunking, where each chunk contains one full sentence. For more 
-details, see [Chunking strategy](/docs/learn/select-ideal-indexing-api#chunking-strategy).
-
-In this example, you apply a limit of 200 characters per chunk:
-
-```json
-"chunking_strategy": {
-  "type": "max_chars_chunking_strategy",
-  "max_chars_per_chunk": 200
-}
-```
+the number of characters per chunk such as `200`. If not set, the platform 
+defaults to sentence-based chunking, where each chunk contains one full 
+sentence. For more details, see [Document chunking](/docs/learn/select-ideal-indexing-api#document-chunking).
 
 ## Core Document Object Definition
 

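For structured documents, the `chunking_strategy` object that this hunk describes in prose could be attached to a request body as in the sketch below. The helper name `with_chunking_strategy` and the placeholder document fields are hypothetical; only the `type` and `max_chars_per_chunk` keys come from the documentation being changed.

```python
import copy
from typing import Any, Optional


def with_chunking_strategy(
    document: dict[str, Any], max_chars_per_chunk: Optional[int] = None
) -> dict[str, Any]:
    """Return a copy of `document` with an optional chunking strategy."""
    body = copy.deepcopy(document)
    if max_chars_per_chunk is None:
        # Leave the key out so the default sentence-based chunking applies.
        body.pop("chunking_strategy", None)
        return body
    if max_chars_per_chunk <= 0:
        raise ValueError("max_chars_per_chunk must be a positive integer")
    body["chunking_strategy"] = {
        "type": "max_chars_chunking_strategy",
        "max_chars_per_chunk": max_chars_per_chunk,
    }
    return body


# Placeholder document; these field names are illustrative assumptions.
doc = {"id": "example-doc", "sections": [{"text": "First sentence. Second one."}]}
limited = with_chunking_strategy(doc, 200)
```

The helper copies the input so that the same document can be submitted with and without a character limit when comparing the two behaviors.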
diff --git a/www/docs/learn/select-ideal-indexing-api.md b/www/docs/learn/select-ideal-indexing-api.md
@@ -1,7 +1,7 @@
 ---
 id: select-ideal-indexing-api
-title: Indexing
-sidebar_label: Indexing
+title: Data Ingestion
+sidebar_label: Data Ingestion
 ---
 
 import Tabs from '@theme/Tabs';
@@ -10,15 +10,15 @@ import CodeBlock from '@theme/CodeBlock';
 import {vars} from '@site/static/variables.json';
 import {Config} from '@site/docs/definitions.md';
 
-Efficient data ingestion is critical for ensuring that your application 
-delivers fast, accurate, and relevant query results. Whether handling 
-structured, semi-structured, or unstructured data, selecting the 
-right indexing method can significantly impact the performance and usability 
-of your applications. Vectara offers multiple indexing methods to accommodate 
-different use cases that enable users to efficiently index their data and 
-leverage our advanced search capabilities. This flexible approach allows for 
-the precise integration of Vectara’s search functionalities into different 
-applications.
+Efficient data ingestion, also known as indexing, is critical for ensuring 
+that your application delivers fast, accurate, and relevant query results. 
+Whether handling structured, semi-structured, or unstructured data, selecting 
+the right indexing method can significantly impact the performance and 
+usability of your applications. Vectara offers multiple indexing methods to 
+accommodate different use cases that enable users to efficiently index their 
+data and leverage our advanced search capabilities. This flexible approach 
+allows for the precise integration of Vectara’s search functionalities into 
+different applications.
 
 ## Vectara Ingest: sample data ingestion framework
 
@@ -74,7 +74,6 @@ We recommend this option for applications where documents already have a
 clear and consistent structure like news articles, product descriptions, 
 rows in database tables or CSV files, or records from an ERP system.
 
-
 ### Core document definition
 
 For the most advanced use cases, if you want full, granular control to chunk 
@@ -97,21 +96,19 @@ By leveraging the appropriate data indexing method is based on the nature of
 your documents, you can ingest and structure your data for optimal performance 
 with Vectara's Retrieval Augmented Generation as-a-Service platform.
 
-## Chunking strategy
+## Document chunking
 
 Chunking refers to the process of breaking a document into smaller parts 
 (chunks) for efficient indexing and retrieval. Chunking is critical for 
 optimizing search performance, particularly for large documents.
 
-Both the [File Upload API](/docs/api-reference/indexing-apis/file-upload/file-upload) and [Indexing API](/docs/api-reference/indexing-apis/indexing) provide an optional `chunking_strategy` 
-parameter that enables you to define how documents should be chunked during 
-ingestion. When you set the `type` to `max_chars_chunking_strategy`, you can 
-then define the maximum number of characters per chunk which enables more 
-granular control over how the platform splits the document.
+Both the [File Upload API](/docs/api-reference/indexing-apis/file-upload/file-upload) and [Indexing API](/docs/api-reference/indexing-apis/indexing) provide an optional 
+`chunking_strategy` parameter that enables you to define how to chunk 
+documents during ingestion. When you set the `type` to 
+`max_chars_chunking_strategy`, you can then define the maximum number of 
+characters per chunk, which enables more granular control over how the 
+platform splits the document.
 
 If you do not set this option, then the platform uses the default chunking 
-strategy that defaults to each chunk containing one full sentence.
-
-
-
+strategy, in which each chunk contains one full sentence.