Add Support for pgvector Extension #301

Closed

KellyRousselHoomano

This pull request aims to enhance sqlacodegen's capabilities, allowing users to seamlessly integrate their PostgreSQL databases with pgvector for advanced data types such as 'Vector.' The feature caters to the increasing demand for efficient handling of embeddings in the context of Large Language Models (LLMs) and retrieval tools.

@coveralls

coveralls commented Nov 14, 2023

Coverage Status

coverage: 97.639%. remained the same
when pulling 67057e7 on hoomano:feature-pgvector
into 8eae529 on agronholm:master.

@KellyRousselHoomano
Author

Changes Made:
Created a dedicated branch: feature-pgvector.
Followed a similar process employed for previous extensions such as "citext" or "geoalchemy2" to enable support for the pgvector extension.
Verified successful installation using:

pip install git+https://github.com/hoomano/sqlacodegen.git@feature-pgvector#egg=sqlacodegen\[pgvector\]

Issue:
The current issue is evident when attempting to export PostgreSQL database models using sqlacodegen, resulting in the following warning:

/Users/kellyroussel/anaconda3/lib/python3.10/site-packages/sqlacodegen/cli.py:81: SAWarning: Did not recognize type 'vector' of column 'embedding'
  metadata.reflect(engine, schema, not args.noviews, tables)

@agronholm
Owner

Could you also add a note to the changelog?

@KellyRousselHoomano KellyRousselHoomano marked this pull request as draft November 14, 2023 15:31
Co-authored-by: Alex Grönholm <alex.gronholm@nextday.fi>
@agronholm
Owner

You'll need to add the other suggested change too (I can't do it for you as you prevented me from modifying the branch).

Co-authored-by: Alex Grönholm <alex.gronholm@nextday.fi>
@agronholm
Owner

Alright, if you're okay with me merging this now, then remove the draft status please.

@KellyRousselHoomano
Author

@agronholm Thank you for that. The problem is that I'm not sure this is working!

I have verified that pgvector is correctly installed when using the command:

pip install git+https://github.com/hoomano/sqlacodegen.git@feature-pgvector#egg=sqlacodegen\[pgvector\]

However, despite successful installation, running the sqlacodegen command line to export PostgreSQL database models results in the following warning:

sqlacodegen/cli.py:81: SAWarning: Did not recognize type 'vector' of column 'embedding'
  metadata.reflect(engine, schema, not args.noviews, tables)

I don't know whether the installation command is wrong (egg=sqlacodegen\[pgvector\]?) or whether installing the extra is simply not enough to enable pgvector.

@agronholm
Owner

I thought you already tried it before sending this PR. Did you check if pgvector actually gets installed?

@agronholm
Owner

Adding #egg=... to an install URL is not kosher. If you want to do a git-based install, the syntax for that is pip install sqlacodegen[pgvector]@git+https://github.com/hoomano/sqlacodegen.git@feature-pgvector. I can confirm that this does install the pgvector extra.
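A quick way to confirm which distributions actually ended up in the environment is to ask the standard library's `importlib.metadata`. This is only a minimal sketch; `installed_version` is a hypothetical helper name, and it should be run with the same interpreter that runs sqlacodegen.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Report on the two distributions discussed in this thread.
for name in ("sqlacodegen", "pgvector"):
    print(name, installed_version(name) or "not installed")
```

Note that this only shows that the package is installed. It says nothing about whether its types are registered with SQLAlchemy.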

@KellyRousselHoomano
Author

Thank you for the correct syntax. I can also confirm the pgvector extension is installed correctly.
But then running sqlacodegen ... it doesn't find type Vector. I may either be pointing to the wrong sqlacodegen or the PR is not working and I have no idea why 🤔

@agronholm
Owner

Have you managed to get it to work on a local checkout? Is importing pgvector enough?

@agronholm
Owner

It looks like pgvector is a namespace package and doesn't actually contain anything that would register classes with any PostgreSQL driver.

@agronholm
Owner

What if you change it to import pgvector.sqlalchemy?

@agronholm
Owner

You could also try to add:

import pgvector.asyncpg
import pgvector.psycopg
import pgvector.psycopg2

@KellyRousselHoomano
Author

I managed to add:

import pgvector.sqlalchemy
import pgvector.psycopg2

Those are correctly imported, but I still face the same error when running the sqlacodegen command...

@agronholm
Owner

And you're using the psycopg2 driver to connect to your postgresql db?

@agronholm
Owner

According to the README, these types need to be explicitly registered with a connection :P

@agronholm
Owner

Too bad pgvector doesn't auto-register its types in the SQLAlchemy dialects like GeoAlchemy2 does.

@KellyRousselHoomano
Author

You seem to understand what the problem is, but I don't get it! Which part of the README are you referring to? Is there anything I can do to implement what's missing?

@agronholm
Owner

Scratch that part about registering types on a connection. What needs to happen is for pgvector-python to insert its own types to SQLAlchemy's type registries on import, like GeoAlchemy2 does. You should probably make a PR against pgvector-python to make that happen.
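For readers unfamiliar with the mechanism, here is a rough sketch of what "inserting a type into SQLAlchemy's type registries on import" can look like for the PostgreSQL dialect. The `Vector` class below is a hypothetical stand-in written for illustration, not pgvector-python's actual implementation; the thread does not show what the upstream PR contained. The sketch assumes SQLAlchemy is installed and relies on `ischema_names`, the dictionary the PostgreSQL dialect consults to map reflected type names to type classes.

```python
from sqlalchemy.dialects.postgresql.base import ischema_names
from sqlalchemy.types import UserDefinedType


class Vector(UserDefinedType):
    """Hypothetical stand-in for a pgvector column type (illustration only)."""

    cache_ok = True

    def __init__(self, dim=None):
        self.dim = dim

    def get_col_spec(self, **kw):
        # DDL fragment emitted for this column type.
        return "VECTOR" if self.dim is None else f"VECTOR({self.dim})"


# Module-level side effect: merely importing the module that contains this
# line makes the PostgreSQL dialect aware of the 'vector' type name during
# reflection, so it is no longer reported as unrecognized.
ischema_names["vector"] = Vector
```

Because the registration is a side effect of importing the module, it only takes effect if that specific module is imported before `metadata.reflect()` runs.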

@KellyRousselHoomano
Author

PR submitted to pgvector-python repo 🤞

@agronholm
Owner

Awesome, now we just need to wait for a release.

@KellyRousselHoomano KellyRousselHoomano marked this pull request as ready for review November 27, 2023 08:35
Comment on lines +24 to +25
pgvector = None
print("Import error")
Owner

Suggested change
- pgvector = None
- print("Import error")
+ pass

Comment on lines +4 to +5
**UNRELEASED**
- Added support for the ``pgvector`` extension
Owner

Suggested change
- **UNRELEASED**
- - Added support for the ``pgvector`` extension
+ **UNRELEASED**
+ - Added support for the ``pgvector`` extension (PR by KellyRousselHoomano)

@agronholm
Owner

I made the necessary changes on my own and credited you in the changelog.

@agronholm agronholm closed this Nov 30, 2023
@KellyRousselHoomano
Author

Ow thank you !!

@pezafar

pezafar commented Jan 12, 2024

Hi !

Thank you @agronholm and @KellyRousselHoomano for the project and the feature, this is much appreciated :)

I have a problem when installing from github, I still have this issue:
xxx/xxx/lib/python3.10/site-packages/sqlacodegen/cli.py:85: SAWarning: Did not recognize type 'vector' of column 'xxx'
when using the cli.

To install, I ran pip install "sqlacodegen[pgvector]@git+https://github.com/agronholm/sqlacodegen.git". pgvector is imported correctly, since I get the Using pgvector 0.2.4 output, but I can't figure out why I still have this issue.

(Although when installing from the @KellyRousselHoomano fork everything seemed to work)

Thanks !

@monneyboi

I'm also still running into this issue, versions installed:

poetry show | grep "sqla\|pgvec"
pgvector             0.2.4      pgvector support for Python
sqlacodegen          3.0.0rc4   Automatic model code generator for SQLAlchemy
sqlalchemy           2.0.27     Database Abstraction Library

Output of sqlacodegen (abbreviated for clarity)

poetry run sqlacodegen --generator=dataclasses --schemas=private postgresql://... --outfile models.py
Using pgvector 0.2.4
...lib/python3.11/site-packages/sqlacodegen/cli.py:85: SAWarning: Did not recognize type 'vector' of column 'embedding'
  metadata.reflect(engine, schema, not args.noviews, tables)

Mapping of embedding is output as Any:

    embedding: Mapped[Optional[Any]] = mapped_column(NullType)

@agronholm
Owner

@KellyRousselHoomano is it working for you?

@KellyRousselHoomano
Author

KellyRousselHoomano commented Feb 22, 2024

Hi,
Indeed it's working for me on Hoomano's feature-pgvector branch (pip install "sqlacodegen[pgvector]@git+https://github.com/hoomano/sqlacodegen.git@feature-pgvector") but not with pip install "sqlacodegen[pgvector]@git+https://github.com/agronholm/sqlacodegen.git"

Trying to understand what the difference is...

@KellyRousselHoomano
Author

@agronholm On this repo's master branch, the import in src/sqlacodegen/cli.py (lines 21 - 24) is:

try:
    import pgvector
except ImportError:
    pgvector = None

While in my PR (and on Hoomano's fork branch) it is:

try:
    import pgvector.sqlalchemy
except ImportError:
    pgvector = None

I think this might be the issue !
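The difference between the two imports can be reproduced with the standard library alone. The sketch below builds a throwaway namespace package on disk; every name in it (`fakeregistry`, `fakevector`, `fakevector.integration`) is hypothetical and only mimics the layout discussed above, where the registration code lives in a submodule rather than in the top-level package.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def demo():
    """Show that importing a package does not import its submodules."""
    root = Path(tempfile.mkdtemp())
    # A stand-in for the type registry that a submodule writes into.
    (root / "fakeregistry.py").write_text("TYPES = {}\n")
    # 'fakevector' has no __init__.py, so it is a namespace package.
    pkg = root / "fakevector"
    pkg.mkdir()
    (pkg / "integration.py").write_text(
        "import fakeregistry\n"
        "fakeregistry.TYPES['vector'] = 'Vector'  # runs only on import\n"
    )
    sys.path.insert(0, str(root))
    importlib.invalidate_caches()

    registry = importlib.import_module("fakeregistry")
    importlib.import_module("fakevector")  # like `import pgvector`
    before = dict(registry.TYPES)
    importlib.import_module("fakevector.integration")  # like `import pgvector.sqlalchemy`
    after = dict(registry.TYPES)
    return before, after


before, after = demo()
print(before)  # {}
print(after)   # {'vector': 'Vector'}
```

Importing the top-level package succeeds in both cases, which would explain why an ImportError guard around a bare package import does not reveal that the registration step never ran.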

@agronholm
Owner

Indeed. I've pushed a fix now.

@agronholm
Owner

There's a new release out now.

@pezafar

pezafar commented Feb 22, 2024

All good with 3.0.0rc5, thanks !
