cuvs-java: Support providing indexing data off-heap #698

Open
ChrisHegarty opened this issue Feb 14, 2025 · 0 comments
Index builders currently accept the input vector dataset (the vectors to index) only as a float[][]. This requires every vector to be materialized on the Java heap at the same time, which can be expensive and may exhaust the Java heap. Usage example:

float[][] dataset = ...

CagraIndex.newBuilder(resources)
  .withDataset(dataset)

For the Lucene use case, we write the raw vectors to disk and mmap them. To index them, we then need to copy them back onto the Java heap as a float[][], which is not ideal. It would be better if the index builder could accept the dataset as an opaque pointer + length.
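To make the request concrete, here is a minimal sketch of what the caller side could look like if the vectors stayed off-heap. It uses only standard `java.nio` calls to mmap a file of row-major floats. The `OffHeapDataset` holder and the `writeVectors` / `mapVectors` helpers are hypothetical names invented for illustration; they are not part of cuvs-java, and how a builder would actually consume such a buffer is left open.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class OffHeapDatasetSketch {

    /** Hypothetical holder: a row-major float dataset that lives outside the Java heap. */
    public static final class OffHeapDataset {
        public final ByteBuffer data; // direct (mmapped) buffer, not a float[][]
        public final int rows;
        public final int dims;

        OffHeapDataset(ByteBuffer data, int rows, int dims) {
            this.data = data;
            this.rows = rows;
            this.dims = dims;
        }

        /** Reads one component straight from the mapped memory. */
        public float get(int row, int col) {
            return data.getFloat((row * dims + col) * Float.BYTES);
        }
    }

    /** Setup helper: writes the vectors to a temp file as raw floats in native byte order. */
    public static Path writeVectors(float[][] vectors) throws IOException {
        int dims = vectors[0].length;
        ByteBuffer buf = ByteBuffer.allocate(vectors.length * dims * Float.BYTES)
                .order(ByteOrder.nativeOrder());
        for (float[] vector : vectors) {
            for (float value : vector) {
                buf.putFloat(value);
            }
        }
        Path file = Files.createTempFile("vectors", ".bin");
        Files.write(file, buf.array());
        return file;
    }

    /** Maps the file read-only; the vectors are never copied into a float[][]. */
    public static OffHeapDataset mapVectors(Path file, int rows, int dims) throws IOException {
        long byteSize = (long) rows * dims * Float.BYTES;
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // A single MappedByteBuffer is limited to Integer.MAX_VALUE bytes.
            // The mapping stays valid after the channel is closed.
            ByteBuffer mapped = channel.map(FileChannel.MapMode.READ_ONLY, 0, byteSize)
                    .order(ByteOrder.nativeOrder());
            return new OffHeapDataset(mapped, rows, dims);
        }
    }

    public static void main(String[] args) throws IOException {
        float[][] vectors = {{1f, 2f}, {3f, 4f}, {5f, 6f}};
        Path file = writeVectors(vectors);
        OffHeapDataset dataset = mapVectors(file, vectors.length, vectors[0].length);
        System.out.println("direct buffer: " + dataset.data.isDirect());
        System.out.println("row 2, col 1: " + dataset.get(2, 1));
    }
}
```

With something along these lines, a builder would only need the buffer (or its address and byte length) plus the row and dimension counts, rather than a fully populated float[][].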
