VE-3102: DMH vector search: AION updates #590
…eId to vtn-standard schema
@ndthang15 we also need to support GLC's new AION format, which is similar but introduces a few more fields such as fingerprintVector. The ticket links to an example in the search draft doc, but the example is collapsed, so you have to expand it: open https://veritone.atlassian.net/wiki/spaces/ENG/pages/3330900143/Vector+Search+Draft#AION-updates and search for "GLC prototype". Note that the "object" object does not need to have start and stop times. Here is another view of GLC's needed AION changes: https://veritone.atlassian.net/wiki/spaces/VP/pages/2859532299/Next+generation+tracker+AION+format#Accepted-Solution
@mgiasiVeri Today I updated the schema to support GLC's new AION format based on your information, and this PR is ready for review again. Thank you so much for reviewing!
@ndthang15 I think we need "tags" added at the objectResult level too. If you look at the GLC example, they use tags in the top-level object, which is outside of the seriesItem. Let me know if you have any questions. To be clear, we do not need the start and stop time values added at the objectResult level; that is a mistake in the examples provided.
@mgiasiVeri You're right that I'm missing the
Based on the information of GLC prototype https://veritone.atlassian.net/wiki/spaces/ENG/pages/3330900143/Vector+Search+Draft#GLC-prototype%3A I see that fingerPrintVector has the properties
@@ -767,6 +785,30 @@
"vendor": {
"description": "Custom data that doesn't conform to any other field. You can add any arbitrary data inside this object, but it will not be indexed, searchable, or have any impact on the system. However it will be returned when reading the data back out.",
"type": "object"
},
"fingerprintVector": {
Shouldn't we define the type and range of the items in the fingerprintVector array? For example:
"items": {
"type": "number",
"minimum": 0,
"maximum": 1
}
I defined the type of the items in the fingerprintVector array in e534527. I don't think we need to define a range for them; the kNN example in the Elasticsearch documentation, for instance, uses values outside 0 to 1:
curl -X POST "localhost:9200/image-index/_bulk?refresh=true&pretty" -H 'Content-Type: application/json' -d'
{ "index": { "_id": "1" } }
{ "image-vector": [1, 5, -20], "title-vector": [12, 50, -10, 0, 1], "title": "moose family", "file-type": "jpg" }
{ "index": { "_id": "2" } }
{ "image-vector": [42, 8, -15], "title-vector": [25, 1, 4, -12, 2], "title": "alpine lake", "file-type": "png" }
{ "index": { "_id": "3" } }
{ "image-vector": [15, 11, 23], "title-vector": [1, 5, 25, 50, 20], "title": "full moon", "file-type": "jpg" }
...
'
Could you please review again? Thank you!
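The trade-off in this thread can be sketched in Python. The helper below is hypothetical: it is not part of veritone-json-schemas or of any JSON Schema library, and only mimics an `items` constraint of `"type": "number"` with an optional `minimum` and `maximum`. Under that assumption, it shows that a 0 to 1 range would reject a vector from the Elasticsearch kNN example quoted above, while a type-only check accepts it.

```python
from numbers import Real
from typing import Optional, Sequence


def is_valid_vector(
    values: Sequence[object],
    minimum: Optional[float] = None,
    maximum: Optional[float] = None,
) -> bool:
    """Return True if every item is a number within the optional bounds.

    Hypothetical stand-in for a JSON Schema "items" constraint. Booleans are
    rejected explicitly because Python treats bool as a kind of int.
    """
    for value in values:
        if isinstance(value, bool) or not isinstance(value, Real):
            return False
        if minimum is not None and value < minimum:
            return False
        if maximum is not None and value > maximum:
            return False
    return True


# Vector copied from the Elasticsearch kNN example in the comment above.
image_vector = [1, 5, -20]

# Type-only check, matching the approach taken in the PR.
unbounded_ok = is_valid_vector(image_vector)

# Type plus the 0 to 1 range proposed in the review suggestion.
bounded_ok = is_valid_vector(image_vector, minimum=0, maximum=1)

# An illustrative, made-up vector whose values already lie in 0 to 1.
normalized_ok = is_valid_vector([0.12, 0.5, 0.98], minimum=0, maximum=1)
```

Whether a range is appropriate depends on how the embedding engine normalizes its output, which this thread does not specify.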
@crondonveritone I think that was a typo. label, referenceId, and type should all be outside of fingerprintVector, and fingerprintVector should only be an array of numbers. @frankayars @alex-oleksiiuk could you confirm?
LGTM
Suggested two minor description changes.
@@ -767,6 +785,33 @@
"vendor": {
"description": "Custom data that doesn't conform to any other field. You can add any arbitrary data inside this object, but it will not be indexed, searchable, or have any impact on the system. However it will be returned when reading the data back out.",
"type": "object"
},
"fingerprintVector": {
"description": "An array of vectors related to the vectorized data",
Suggestion:
An array of floats representing objects in vector space.
}
},
"embedding": {
"description": "The embedding engine result was generated",
Suggestion:
An array of floats representing objects in vector space.
@frankayars Thanks for your suggestions. I changed the descriptions of these definitions. Also, the fingerprintVector definition was renamed to 'vector'. Could you please review again?
👍
Overview
AION needs to be updated with additional properties that could be used by GLC and DMH vector search.
Description
- Add `referenceId` property to `objectResult` and `seriesItem` definitions.
- Add `embedding` object (with `vector`, `tags`, and `referenceId` properties) as a new top-level entry in the AION schema.
- Add `tags` property to the top level of `objectResult`.
Related Issue
https://veritone.atlassian.net/browse/VE-3102
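As a rough illustration of the changes listed under Description, the Python sketch below builds a made-up AION fragment and collects every `referenceId` in it. Everything beyond the property names from this PR is an assumption: the `object` and `series` keys as the homes of `objectResult` and `seriesItem`, the key/value shape of `tags`, the time fields, and all values are hypothetical and are not taken from the actual schema.

```python
from typing import Any, Dict, List

# Hypothetical AION fragment; the nesting and all values are illustrative only.
aion_example: Dict[str, Any] = {
    "object": [
        {
            "type": "object",
            "label": "car",
            "referenceId": "ref-001",  # new on objectResult
            "tags": [{"key": "source", "value": "GLC"}],  # new on objectResult
        }
    ],
    "series": [
        {
            "startTimeMs": 0,
            "stopTimeMs": 1000,
            "referenceId": "ref-002",  # new on seriesItem
            "object": {"type": "object", "label": "car"},
        }
    ],
    "embedding": {  # new top-level entry
        "referenceId": "ref-001",
        "vector": [0.12, -0.48, 0.91],
        "tags": [{"key": "model", "value": "example"}],
    },
}


def collect_reference_ids(doc: Dict[str, Any]) -> List[str]:
    """Gather referenceId values from objects, series items, and the embedding."""
    ids: List[str] = []
    for obj in doc.get("object", []):
        if "referenceId" in obj:
            ids.append(obj["referenceId"])
    for item in doc.get("series", []):
        if "referenceId" in item:
            ids.append(item["referenceId"])
    embedding = doc.get("embedding", {})
    if "referenceId" in embedding:
        ids.append(embedding["referenceId"])
    return ids


reference_ids = collect_reference_ids(aion_example)
```

In this sketch the top-level object and the embedding share a `referenceId`, which is one plausible way the property could link a vector back to the object it describes.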
How Has This Been Tested
Run npm i && npm run test in the /packages/veritone-json-schemas folder.