diff --git a/www/docs/api-reference/search-apis/interpreting-responses/metadata.md b/www/docs/api-reference/search-apis/interpreting-responses/metadata.md
index ac1c597b3..78a60fcb7 100644
--- a/www/docs/api-reference/search-apis/interpreting-responses/metadata.md
+++ b/www/docs/api-reference/search-apis/interpreting-responses/metadata.md
@@ -12,7 +12,9 @@ In , when you [index a document](/docs/api-reference/
 document has a `type` parameter that determines the format of the document as `core`
 or `structured`. The `core` type has `document_parts` and the `structured` type has
 `sections`. Both can be nested and both can contain separate `metadata`,
-including some metadata that will auto-generate.
+including some metadata that will auto-generate.
+
+## Metadata structure
 
 For example, a document might have global attributes such as the `URL` or
 `owner` but individual sections have a `section` attribute and a `lang`.
@@ -65,17 +67,83 @@ document-level metadata, which can reduce the total time for the response.
 
 ## Metadata type consistency
 
-The metadata type conversion applies only to the query responses. Metadata
-remains unconverted during the document upload process, even when using API v2:
+The metadata type conversion applies only to the `part_metadata` and
+`document_metadata` fields in query responses. Metadata remains
+unconverted during the document upload process, even when using API v2:
 
 * **Numbers** are returned as numbers (for example, `section: 2`,
 `publicationyear: 1979`).
-* **Booleans** are returned as booleans.
+* **Booleans** are returned as `true` or `false` (case-sensitive).
 * **JSON objects** maintain their native structure.
 
 This behavior differs from API v1, where metadata such as `section` or
 `publicationyear` might have been returned as strings (`"2"`, `"1979"`). Ensure
 client applications handle these types correctly for smooth integration.
 
+## Metadata type regex patterns
+
+The following regex examples provide information about how each type is
+identified and processed. By understanding these patterns, users can account
+for type conversion in their client applications.
+
+### Numbers regex
+
+This pattern matches valid numeric formats, including integers, decimals, and
+scientific notation, ensuring they are returned as numbers instead of strings.
+Examples include `section: 2` or `offset: 316`.
+
+**Pattern:** `-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+-]?\d+)?`
+
+
+| Input | Matches | Explanation |
+|------------|---------|--------------------------------------------|
+| `123` | ✅ | Valid integer. |
+| `0` | ✅ | Valid zero. |
+| `-456` | ✅ | Valid negative integer. |
+| `3.14` | ✅ | Valid decimal number. |
+| `-0.001` | ✅ | Valid negative decimal. |
+| `2e10` | ✅ | Valid scientific notation. |
+| `-1.23E-4` | ✅ | Valid negative number in scientific notation. |
+| `.5` | ❌ | Invalid (missing leading integer). |
+| `1e` | ❌ | Invalid (missing exponent value). |
+| `1.2.3` | ❌ | Invalid (multiple decimal points). |
+| `-` | ❌ | Invalid (missing digits). |
+
+
+### Boolean regex
+
+This pattern matches exact boolean values (`true` or `false`), with exact case
+sensitivity and no extra characters.
+
+**Pattern:** `^(true|false)$`
+
+| Input | Matches | Explanation |
+|------------|---------|-------------------------------------------------|
+| `true` | ✅ | Exact match for `true`. |
+| `false` | ✅ | Exact match for `false`. |
+| ` true` | ❌ | Invalid (leading space). |
+| `false ` | ❌ | Invalid (trailing space). |
+| `True` | ❌ | Invalid (case-sensitive; must be lowercase). |
+| `TRUE` | ❌ | Invalid (case-sensitive; must be lowercase). |
+| `falsey` | ❌ | Invalid (extra characters after `false`). |
+| `truest` | ❌ | Invalid (extra characters after `true`). |
+| `tru` | ❌ | Invalid (partial match; incomplete `true`). |
+
+### JSON regex
+
+This pattern identifies JSON-like structures, ensuring valid JSON objects so
+that `{}` or arrays like `[]` are properly maintained.
+
+**Pattern:** `^[{|\[].*$`
+
+| Input | Matches | Explanation |
+|---------------|---------|----------------------------------------------|
+| `{example}` | ✅ | Starts with `{` and has additional content. |
+| `[data]` | ✅ | Starts with `[` and has additional content. |
+| `{` | ✅ | Matches a single `{` at the start. |
+| `[` | ✅ | Matches a single `[` at the start. |
+| `example` | ❌ | Does not start with `{` or `[`. |
+| `something{` | ❌ | Starts with other characters, not `{`. |
+| `(empty)` | ❌ | Empty string does not match. |
 ## Combining document and section metadata
 
diff --git a/www/docs/learn/recommendation-systems/overview.md b/www/docs/learn/recommendation-systems/overview.md
index bdd52c339..5cfe2d1d7 100644
--- a/www/docs/learn/recommendation-systems/overview.md
+++ b/www/docs/learn/recommendation-systems/overview.md
@@ -46,7 +46,7 @@ are similar to the one they're looking at or a recently purchased product.
 These use cases can be dealt with by using in a document-to-document
 search/recommendation platform. In order to do this, the most important
 change is that you'll need to use `RESPONSE` similarity measure
-(available to [our Scale plan users](https://vectara.com/pricing/)).
+(available to [our Pro and Enterprise plan users](https://vectara.com/pricing/)).
 It's easier to explain how this is different by first explaining how the
 `DEFAULT` similarity works.
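
Reviewer note: the three patterns added to `metadata.md` can be checked against their tables with a short script. The following is a minimal sketch, not the service's implementation. The names `classify` and `coerce` are hypothetical. It assumes each pattern must match the entire value, which the tables imply by listing `1.2.3` and `.5` as non-matching for the unanchored number pattern. Falling back to the raw string when `json.loads` fails is also an assumption.

```python
import json
import re

# Patterns copied verbatim from the metadata.md change.
NUMBER_RE = re.compile(r"-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+-]?\d+)?")
BOOLEAN_RE = re.compile(r"^(true|false)$")
JSON_RE = re.compile(r"^[{|\[].*$")


def classify(raw: str) -> str:
    """Name the documented pattern that matches the whole string, if any."""
    # Assumption: whole-string matching, so fullmatch is used throughout.
    if BOOLEAN_RE.fullmatch(raw):
        return "boolean"
    if NUMBER_RE.fullmatch(raw):
        return "number"
    if JSON_RE.fullmatch(raw):
        return "json"
    return "string"


def coerce(raw):
    """Hypothetical client helper: turn a string value into a native type."""
    if not isinstance(raw, str):
        return raw  # already a native number, boolean, or object
    kind = classify(raw)
    if kind == "boolean":
        return raw == "true"
    if kind == "number":
        # Keep integers as int; decimals and exponents become float.
        return float(raw) if any(c in raw for c in ".eE") else int(raw)
    if kind == "json":
        try:
            return json.loads(raw)
        except ValueError:
            return raw  # looks JSON-like but does not parse
    return raw
```

As written, the character class in the JSON pattern, `[{|\[]`, also accepts a leading `|`, and the pattern checks only the first character, so a value such as `{example}` matches without being parseable JSON. A client that relies on this classification would probably still want to parse defensively, as the sketch does.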