Add regex patterns
Added regex patterns and other misc updates to improve usability
pwoznic committed Nov 22, 2024
1 parent 16253b5 commit 8a6870a
Showing 2 changed files with 73 additions and 5 deletions.
@@ -12,7 +12,9 @@ In <Config v="names.product"/>, when you [index a document](/docs/api-reference/
document has a `type` parameter that determines the format of the document
as `core` or `structured`. The `core` type has `document_parts` and the `structured`
type has `sections`. Both can be nested and both can contain separate `metadata`,
including some metadata that <Config v="names.product"/> will auto-generate.

## Metadata structure

For example, a document might have global attributes such as the `URL` or `owner`
but individual sections have a `section` attribute and a `lang`.
@@ -65,17 +67,83 @@ document-level metadata, which can reduce the total time for the response.

## Metadata type consistency

The metadata type conversion applies only to the `part_metadata` and
`document_metadata` fields in query responses. Metadata remains
unconverted during the document upload process, even when using API v2:

* **Numbers** are returned as numbers (for example, `section: 2`, `publicationyear: 1979`).
* **Booleans** are returned as `true` or `false` (case-sensitive).
* **JSON objects** maintain their native structure.

This behavior differs from API v1, where metadata such as `section` or
`publicationyear` might have been returned as strings (`"2"`, `"1979"`).
Ensure client applications handle these types correctly for smooth integration.
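
A client that has to work with both response styles can normalize values defensively. The following is a minimal sketch, not part of the product API: `as_int` is a hypothetical helper, and the sample dictionaries are made-up values modeled on the examples above rather than real responses.

```python
def as_int(value):
    """Return an int whether the metadata value arrived as a number or a numeric string."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python, so reject it explicitly.
        raise TypeError("expected a number, got a boolean")
    if isinstance(value, (int, float)):
        return int(value)
    if isinstance(value, str):
        return int(value.strip())
    raise TypeError(f"unsupported metadata type: {type(value).__name__}")


# Hypothetical metadata in the string style (API v1) and the native style (API v2).
v1_style = {"section": "2", "publicationyear": "1979"}
v2_style = {"section": 2, "publicationyear": 1979}

assert as_int(v1_style["section"]) == as_int(v2_style["section"]) == 2
assert as_int(v1_style["publicationyear"]) == as_int(v2_style["publicationyear"]) == 1979
```

Normalizing at the edge of the client keeps the rest of the application independent of which API version produced the metadata.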

## Metadata type regex patterns

The following regex examples provide information about how each type is
identified and processed. By understanding these patterns, users can account
for type conversion in their client applications.

### Numbers regex

This pattern matches valid numeric formats, including integers, decimals, and
scientific notation, ensuring they are returned as numbers instead of strings.
Examples include `section: 2` or `offset: 316`.

**Pattern:** `-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+-]?\d+)?`


| Input      | Matches | Explanation                                   |
|------------|---------|-----------------------------------------------|
| `123`      | Yes     | Valid integer.                                |
| `0`        | Yes     | Valid zero.                                   |
| `-456`     | Yes     | Valid negative integer.                       |
| `3.14`     | Yes     | Valid decimal number.                         |
| `-0.001`   | Yes     | Valid negative decimal.                       |
| `2e10`     | Yes     | Valid scientific notation.                    |
| `-1.23E-4` | Yes     | Valid negative number in scientific notation. |
| `.5`       | No      | Invalid (missing leading integer).            |
| `1e`       | No      | Invalid (missing exponent value).             |
| `1.2.3`    | No      | Invalid (multiple decimal points).            |
| `-`        | No      | Invalid (missing digits).                     |
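
The table can be reproduced locally with a short Python sketch. The pattern is copied from above; `is_number` is a hypothetical helper name. Because the pattern has no `^` or `$` anchors, the sketch assumes it must match the entire value, which is what the "Invalid" rows imply, and uses `re.fullmatch` to enforce that.

```python
import re

# Pattern copied verbatim from the documentation above.
NUMBER_PATTERN = re.compile(r"-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+-]?\d+)?")


def is_number(text: str) -> bool:
    # Assumption: the whole value must match. A plain search would
    # find the "5" inside ".5" and report a match.
    return NUMBER_PATTERN.fullmatch(text) is not None


for sample in ["123", "0", "-456", "3.14", "-0.001", "2e10", "-1.23E-4"]:
    assert is_number(sample), sample

for sample in [".5", "1e", "1.2.3", "-"]:
    assert not is_number(sample), sample
```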


### Boolean regex

This pattern matches exact boolean values (`true` or `false`), with exact case
sensitivity and no extra characters.

**Pattern:** `^(true|false)$`

| Input      | Matches | Explanation                                     |
|------------|---------|-------------------------------------------------|
| `true`     | Yes     | Exact match for `true`.                         |
| `false`    | Yes     | Exact match for `false`.                        |
| ` true`    | No      | Invalid (leading space).                        |
| `false `   | No      | Invalid (trailing space).                       |
| `True`     | No      | Invalid (case-sensitive; must be lowercase).    |
| `TRUE`     | No      | Invalid (case-sensitive; must be lowercase).    |
| `falsey`   | No      | Invalid (extra characters after `false`).       |
| `truest`   | No      | Invalid (extra characters after `true`).        |
| `tru`      | No      | Invalid (partial match; incomplete `true`).     |
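
The same check can be sketched in Python. The pattern is copied from above; `is_boolean` and `to_boolean` are hypothetical helper names. The sketch uses `re.fullmatch` because Python's `$` also matches just before a trailing newline, so `re.match` alone would accept `"true\n"`.

```python
import re

# Pattern copied verbatim from the documentation above.
BOOLEAN_PATTERN = re.compile(r"^(true|false)$")


def is_boolean(text: str) -> bool:
    # fullmatch rejects a trailing newline that "$" would otherwise tolerate.
    return BOOLEAN_PATTERN.fullmatch(text) is not None


def to_boolean(text: str) -> bool:
    """Convert an exact 'true' or 'false' string to a Python bool."""
    if not is_boolean(text):
        raise ValueError(f"not a boolean literal: {text!r}")
    return text == "true"


assert to_boolean("true") is True
assert to_boolean("false") is False
for sample in [" true", "false ", "True", "TRUE", "falsey", "truest", "tru"]:
    assert not is_boolean(sample), sample
```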

### JSON regex

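This pattern identifies JSON-like values, meaning values that start with `{`
or `[`, so that objects and arrays are returned in their native structure.
It checks only the first character and does not validate that the value is
well-formed JSON. Because `|` appears inside the character class, a value that
starts with a literal `|` also matches.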

**Pattern:** `^[{|\[].*$`

| Input         | Matches | Explanation                                  |
|---------------|---------|----------------------------------------------|
| `{example}`   | Yes     | Starts with `{` and has additional content.  |
| `[data]`      | Yes     | Starts with `[` and has additional content.  |
| `{`           | Yes     | Matches a single `{` at the start.           |
| `[`           | Yes     | Matches a single `[` at the start.           |
| `example`     | No      | Does not start with `{` or `[`.              |
| `something{`  | No      | Starts with other characters, not `{`.       |
| `(empty)`     | No      | Empty string does not match.                 |
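
Since the pattern inspects only the leading character, a client that needs a real object or array may want to attempt a parse as well. The sketch below is one possible approach and not a description of how the service itself processes values; `looks_like_json` and `parse_if_json` are hypothetical helper names.

```python
import json
import re

# Pattern copied verbatim from the documentation above.
JSON_LIKE_PATTERN = re.compile(r"^[{|\[].*$")


def looks_like_json(text: str) -> bool:
    return JSON_LIKE_PATTERN.match(text) is not None


def parse_if_json(text: str):
    """Return the parsed value when text is well-formed JSON, else the original string."""
    if not looks_like_json(text):
        return text
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Matched the pattern but is not valid JSON, for example "{example}".
        return text


assert looks_like_json("{example}") and looks_like_json("[data]")
assert looks_like_json("{") and looks_like_json("[")
assert not looks_like_json("example")
assert not looks_like_json("something{")
assert not looks_like_json("")

assert parse_if_json('{"lang": "eng"}') == {"lang": "eng"}
assert parse_if_json("[1, 2]") == [1, 2]
assert parse_if_json("{example}") == "{example}"
```

Falling back to the original string keeps the client tolerant of values that only resemble JSON.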

## Combining document and section metadata

2 changes: 1 addition & 1 deletion www/docs/learn/recommendation-systems/overview.md
@@ -46,7 +46,7 @@ are similar to the one they're looking at or a recently purchased product. These
use cases can be dealt with by using <Config v="names.product"/> in a
document-to-document search/recommendation platform. In order to do this, the
most important change is that you'll need to use `RESPONSE` similarity measure
(available to [our Pro and Enterprise plan users](https://vectara.com/pricing/)).
It's easier to explain how this is different by first explaining how the `DEFAULT`
similarity works.
