Commit

Update table extract details and misc update (#370)
pwoznic authored Dec 19, 2024
1 parent 64a65a0 commit f9fc461
Showing 5 changed files with 11 additions and 10 deletions.
@@ -57,8 +57,8 @@ following parts:
 metadata to be associated with the extracted document.
 - `chunking_strategy` (Optional) Specifies whether to split the document into
 chunks during ingestion. If not set, the platform defaults to sentence-based
-chunking, where each chunk contains one full sentence. Set the `type` as
-`max_chars_chunking_strategy` and then specify the `max_chars_per_chunk` to
+chunking, where each chunk contains typically one full sentence. Set the `type`
+as `max_chars_chunking_strategy` and then specify the `max_chars_per_chunk` to
 the number of characters per chunk like `512` or `1024`. Smaller chunks may improve granularity
 but can lead to excessive latency, especially in applications with high
 document volumes or large corpora.
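
As a reader's aid for the `chunking_strategy` text in this hunk, here is a minimal Python sketch of the object it describes. The `type` and `max_chars_per_chunk` field names and the value `512` come from the doc text. The helper name `build_chunking_strategy` and the enclosing `request_fragment` dict are hypothetical. The rest of the request body is not shown in this diff and is left out.

```python
import json


def build_chunking_strategy(max_chars_per_chunk: int) -> dict:
    """Return a chunking_strategy object as described in the doc text.

    The helper itself is hypothetical; only the two field names are
    taken from the documentation being edited in this commit.
    """
    if max_chars_per_chunk <= 0:
        raise ValueError("max_chars_per_chunk must be a positive integer")
    return {
        "type": "max_chars_chunking_strategy",
        "max_chars_per_chunk": max_chars_per_chunk,
    }


# Hypothetical fragment of a request body. Other fields (such as the
# document metadata mentioned in the hunk) are omitted on purpose.
request_fragment = {"chunking_strategy": build_chunking_strategy(512)}

print(json.dumps(request_fragment, indent=2))
```

Omitting `chunking_strategy` altogether would, per the text above, leave the platform on its default sentence-based chunking.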
2 changes: 1 addition & 1 deletion www/docs/api-reference/indexing-apis/indexing.md
@@ -63,7 +63,7 @@ sum of both values.
 
 ### Structured document chunking
 
-By default, Vectara uses sentence-based chunking, where each chunk consists of
+By default, Vectara uses sentence-based chunking, where each chunk typically contains
 one complete sentence. This strategy works well but can lead to higher
 retrieval latency because of the increased number of chunks. Alternatively,
 you can use character-based chunking to make the chunks larger.
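
To make the chunk-count trade-off in this hunk concrete, the following Python sketch compares a naive sentence splitter with a fixed-size character splitter. Both functions are illustrative assumptions, not Vectara's actual chunking algorithm: the regex only splits after `.`, `!`, or `?` followed by whitespace, and the character splitter cuts at fixed offsets without regard for word boundaries.

```python
import re


def sentence_chunks(text: str) -> list[str]:
    """Naive sentence split: break after ., !, or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def max_chars_chunks(text: str, max_chars: int) -> list[str]:
    """Cut the text into consecutive pieces of at most max_chars characters."""
    if max_chars <= 0:
        raise ValueError("max_chars must be a positive integer")
    text = text.strip()
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


sample = (
    "Chunks are indexed separately. Smaller chunks are more granular. "
    "Larger chunks mean fewer parts. Fewer parts can reduce latency."
)

by_sentence = sentence_chunks(sample)
by_chars = max_chars_chunks(sample, 64)

# Larger character-based chunks yield fewer chunks for the same text.
print(len(by_sentence), len(by_chars))
```

For this sample, the character-based split produces fewer chunks than the sentence-based split, which is the effect the paragraph above attributes to larger chunks.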
4 changes: 2 additions & 2 deletions www/docs/console-ui/manage_documents.md
@@ -44,8 +44,8 @@ shows the Text, Context, and Metadata.
 
 ![View Document Parts](/img/parts_tab.png)
 
-Select the **Tables** tab to view information about extracted tables, including
+Select the **Tables** tab to view information about ingested tables, including
 their ID, Title, Rows, and Description. You can also select **View Table** in
-the Table column.
+the Table column to view the rendered table.
 
 ![View Table Tab](/img/tables_tab.png)
6 changes: 3 additions & 3 deletions www/docs/learn/querying-table-data.md
@@ -130,9 +130,9 @@ table-specific metadata that’s shaped like this:
 * **row_num:** This value is a number if the search result is for a specific row
 of the table.
 
-When you open a corpus in the UI and select the **Data** tab, you can then
-select the **Tables** tab to view the ingested table data as well as view the
-rendered table. For more details, see [Manage Documents](/docs/console-ui/manage-documents).
+When you open a corpus in the UI and select the **Data** tab, you can click on
+**each uploaded document** and select the **Tables** tab to view the ingested table
+data as well as view the rendered table. For more details, see [Manage Documents](/docs/console-ui/manage-documents).
 
 ## Table examples
 
5 changes: 3 additions & 2 deletions www/docs/learn/select-ideal-indexing-api.md
@@ -110,8 +110,9 @@ the trade-offs between granularity and latency.
 ### Default chunking
 
 By default, the platform uses sentence-based chunking, where each chunk
-contains one complete sentence. This strategy can lead to higher retrieval
-latency for large documents due to the increased number of chunks created.
+typically contains one complete sentence. This strategy can lead to higher
+retrieval latency for large documents due to the increased number of chunks
+created.
 
 ### Fixed-size chunking
 
