Should search engines follow database semantic conventions? #1869

Open
lmolkova opened this issue Feb 4, 2025 · 1 comment
Labels
area:db question Further information is requested

Comments

@lmolkova
Contributor

lmolkova commented Feb 4, 2025

Services like Azure Search, AWS Kendra, or Google Cloud Search provide an interface similar to a database and let users query data, but they usually do not store that data themselves.
For example, Azure Search indexes data from external storage. It can also act as a proxy when writing data.

Search engines are used in different scenarios than databases: they support non-exact matches and natural language queries, and they normalize content (chunking, OCR, etc.) on the indexing side.

Arguably, they look like a duck, quack like a duck, but they are not databases.

  1. Option 1: pretend that search engines are databases. Populate db.system.name and other DB attributes on them
    • Pros: common DB conventions would apply reasonably well on the client side
    • Cons:
      • they won't apply on the server-side
      • some concepts (ranking scores, big free-form documents, OCR) apply to search engines far more than to databases in general, and the terminology is different. Evolving search inside the db namespace would be limiting for search and not beneficial to db
  2. Option 2: have separate conventions for search engines
    • Pros: we'll be able to record semantics more precisely and in a more idiomatic way on the client and server side
    • Cons:
      • more conventions to maintain, and some confusion about why a database (data source) and a search engine are treated separately
      • it's not clear what Elasticsearch or OpenSearch are (they can be either: a search engine with an internal data store or one with an external data store)
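
To make the trade-off concrete, here is a rough sketch of what a client-side search call might record under each option. Plain dicts stand in for span attributes. The `db.*` keys are existing DB attribute names; the `search.*` keys and the `azure.search` system value are hypothetical and only illustrate the shape a separate namespace could take.

```python
# Illustrative only: plain dicts standing in for span attributes.


def option1_attributes(index: str, query: str) -> dict[str, object]:
    """Option 1: describe a search call with existing db.* attributes."""
    return {
        "db.system.name": "azure.search",  # hypothetical value, not a registered one
        "db.operation.name": "search",
        "db.collection.name": index,
        "db.query.text": query,
    }


def option2_attributes(index: str, query: str, top_score: float) -> dict[str, object]:
    """Option 2: a separate search.* namespace (every key here is hypothetical)."""
    return {
        "search.system.name": "azure.search",
        "search.index.name": index,
        "search.query.text": query,
        # A search-specific concept that has no natural home under db.*
        "search.result.top_score": top_score,
    }
```

In this sketch, Option 1 has nowhere obvious to put the ranking score, which is the kind of gap the cons above describe.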

Proposal (Option 2):

  1. Document what DB conventions cover: data stores, not search engines
  2. Search engines should have separate conventions that are not in db namespace and are focused on search capabilities.
  3. Each search engine system can decide what it is (or whether it's both) and can use attributes from both namespaces. That is, it may end up conforming to more than one set of conventions.
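
One way to picture item 3 is as a system declaring its role(s) and the instrumentation choosing namespaces accordingly. This is a minimal sketch, not a proposed API: the `Role` flag, the `namespaces_for` helper, and the `search` namespace name are all assumptions made for illustration.

```python
from enum import Flag, auto


class Role(Flag):
    """Hypothetical roles a system might declare for itself."""
    DATA_STORE = auto()
    SEARCH_ENGINE = auto()


def namespaces_for(role: Role) -> list[str]:
    """Return the attribute namespaces a system with this role would use."""
    namespaces = []
    if Role.DATA_STORE in role:
        namespaces.append("db")
    if Role.SEARCH_ENGINE in role:
        namespaces.append("search")  # hypothetical namespace name
    return namespaces
```

A system that both stores and searches data would pass `Role.DATA_STORE | Role.SEARCH_ENGINE` and get both namespaces back, while a pure search proxy would get only the search one.
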
@lmolkova
Contributor Author

lmolkova commented Feb 5, 2025

Curious what the Elasticsearch and OpenSearch folks think about having separate search and DB conventions. /cc @gregkalapos @Aneurysm9
