
pgai

Power your RAG and Agentic applications with PostgreSQL


A Python library that transforms PostgreSQL into a robust, production-ready retrieval engine for RAG and Agentic applications.

  • 🔄 Automatically create and synchronize vector embeddings from PostgreSQL data and S3 documents. Embeddings update automatically as data changes.

  • 🔍 Powerful vector and semantic search with pgvector and pgvectorscale.

  • 🛡️ Production-ready out-of-the-box: Supports batch processing for efficient embedding generation, with built-in handling for model failures, rate limits, and latency spikes.

  • 🐘 Works with any PostgreSQL database, including Timescale Cloud, Amazon RDS, Supabase, and more.

Auto Create and Sync Vector Embeddings in 1 Line of SQL (pgai Vectorizer)

Install via pip:

pip install pgai

Quick Start

This quickstart demonstrates how pgai Vectorizer enables semantic search and RAG over PostgreSQL data by automatically creating and synchronizing embeddings as data changes.

The key "secret sauce" of pgai Vectorizer is its declarative approach to embedding generation. Simply define your pipeline and let Vectorizer handle the operational complexity of keeping embeddings in sync, even when embedding endpoints are unreliable. You can define a simple version of the pipeline as follows:

SELECT ai.create_vectorizer(
     'wiki'::regclass,
     loading => ai.loading_column(column_name=>'text'),
     destination => ai.destination_table(target_table=>'wiki_embedding_storage'),
     embedding => ai.embedding_openai(model=>'text-embedding-ada-002', dimensions=>1536)
    )

The vectorizer automatically creates embeddings for all the rows in the wiki table and, more importantly, keeps the embeddings in sync with the underlying data as it changes. Think of it almost like declaring an index on the wiki table: but instead of the database managing an index data structure for you, the Vectorizer manages the embeddings.
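Once the vectorizer has run, the embeddings live in ordinary pgvector columns, so you can query them directly in SQL. A minimal sketch, assuming the embedding view is named wiki_embedding (as configured in the walkthrough below) and that $1 is bound to a query embedding:

SELECT title, chunk, embedding <=> $1 AS distance
FROM wiki_embedding
ORDER BY distance
LIMIT 5;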

Running the quick start

Prerequisites:

Create a .env file with the following:

OPENAI_API_KEY=<your-openai-api-key>
DB_URL=<your-database-url>

You can download the full Python code and requirements.txt from the quickstart example and run it in the same directory as the .env file.

Or use this bash script to run the quickstart:
curl -O https://raw.githubusercontent.com/timescale/pgai/main/examples/quickstart/main.py
curl -O https://raw.githubusercontent.com/timescale/pgai/main/examples/quickstart/requirements.txt
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python main.py
Sample output:
Search results 1:
[WikiSearchResult(id=7,
                  url='https://en.wikipedia.org/wiki/Aristotle',
                  title='Aristotle',
                  text='Aristotle (;  Aristotélēs, ; 384–322\xa0BC) was an '
                       'Ancient Greek philosopher and polymath. His writings '
                       'cover a broad range of subjects spanning the natural '
                       'sciences, philosophy, linguistics, economics, '
                       'politics, psychology and the arts. As the founder of '
                       'the Peripatetic school of philosophy in the Lyceum in '
                       'Athens, he began the wider Aristotelian tradition that '
                       'followed, which set the groundwork for the development '
                       'of modern science.\n'
                       '\n'
                       "Little is known about Aristotle's life. He was born in "
                       'the city of Stagira in northern Greece during the '
                       'Classical period. His father, Nicomachus, died when '
                       'Aristotle was a child, and he was brought up by a '
                       "guardian. At 17 or 18 he joined Plato's Academy in "
                       'Athens and remained there till the age of 37 (). '
                       'Shortly after Plato died, Aristotle left Athens and, '
                       'at the request of Philip II of Macedon, tutored his '
                       'son Alexander the Great beginning in 343 BC. He '
                       'established a library in the Lyceum which helped him '
                       'to produce many of his hundreds of books on papyru',
                  chunk='Aristotle (;  Aristotélēs, ; 384–322\xa0BC) was an '
                        'Ancient Greek philosopher and polymath. His writings '
                        'cover a broad range of subjects spanning the natural '
                        'sciences, philosophy, linguistics, economics, '
                        'politics, psychology and the arts. As the founder of '
                        'the Peripatetic school of philosophy in the Lyceum in '
                        'Athens, he began the wider Aristotelian tradition '
                        'that followed, which set the groundwork for the '
                        'development of modern science.',
                  distance=0.22242502364217387)]
Search results 2:
[WikiSearchResult(id=41,
                  url='https://en.wikipedia.org/wiki/pgai',
                  title='pgai',
                  text='pgai is a Python library that turns PostgreSQL into '
                       'the retrieval engine behind robust, production-ready '
                       'RAG and Agentic applications. It does this by '
                       'automatically creating vector embeddings for your data '
                       'based on the vectorizer you define.',
                  chunk='pgai is a Python library that turns PostgreSQL into '
                        'the retrieval engine behind robust, production-ready '
                        'RAG and Agentic applications. It does this by '
                        'automatically creating vector embeddings for your '
                        'data based on the vectorizer you define.',
                  distance=0.13639101792546204)]
RAG response:
The main thing pgai does right now is generating vector embeddings for data in PostgreSQL databases based on the vectorizer defined by the user, enabling the creation of robust RAG and Agentic applications.

Code walkthrough

Install the pgai database components

import pgai

pgai.install(DB_URL)

Create the vectorizer

This defines the vectorizer, which tells the system how to create the embeddings from the text column in the wiki table. The vectorizer creates a view wiki_embedding that we can query for the embeddings (as we'll see below).

async def create_vectorizer(conn: psycopg.AsyncConnection):
    async with conn.cursor() as cur:    
        await cur.execute("""
            SELECT ai.create_vectorizer(
                'wiki'::regclass,
                if_not_exists => true,
                loading => ai.loading_column(column_name=>'text'),
                embedding => ai.embedding_openai(model=>'text-embedding-ada-002', dimensions=>1536),
                destination => ai.destination_table(view_name=>'wiki_embedding')
            )
        """)   
    await conn.commit()

Run the vectorizer worker

In this example, we run the vectorizer worker once to create the embeddings for the existing data.

worker = Worker(DB_URL, once=True)
worker.run()

In a real application, you would not invoke the worker manually each time embeddings need updating. Instead, you would run the worker in the background, where it runs continuously and polls the vectorizer for work.

You can run the worker in the background from the application, the CLI, or Docker. See the vectorizer worker documentation for more details.
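For example, a minimal sketch of running the worker from the application on a background thread (this assumes Worker is imported as in the quickstart script; omitting once=True makes it poll continuously):

import threading

# Without once=True, the worker keeps polling the vectorizer's queue for new work.
worker = Worker(DB_URL)

# Run the worker off the main thread so embedding generation never blocks the app.
threading.Thread(target=worker.run, daemon=True).start()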

Search the wiki articles using semantic search

This is standard pgvector semantic search in PostgreSQL. The search is performed against the wiki_embedding view, which is created by the vectorizer and includes all the columns from the wiki table plus the embedding column and the chunk text. This function returns both the entire text column from the wiki table and smaller chunks of the text that are most relevant to the query.

@dataclass
class WikiSearchResult:
    id: int
    url: str
    title: str
    text: str
    chunk: str
    distance: float

async def _find_relevant_chunks(client: AsyncOpenAI, query: str, limit: int = 1) -> List[WikiSearchResult]:
    # `pool` is the psycopg connection pool created elsewhere in the quickstart script
    # Generate an embedding for the query using OpenAI's API
    response = await client.embeddings.create(
        model="text-embedding-ada-002",
        input=query,
        encoding_format="float",
    )
    
    embedding = np.array(response.data[0].embedding)
    
    # Query the database for the most similar chunks using pgvector's cosine distance operator (<=>)
    async with pool.connection() as conn:
        async with conn.cursor(row_factory=class_row(WikiSearchResult)) as cur:
            await cur.execute("""
                SELECT w.id, w.url, w.title, w.text, w.chunk, w.embedding <=> %s as distance
                FROM wiki_embedding w
                ORDER BY distance
                LIMIT %s
            """, (embedding, limit))
            
            return await cur.fetchall()

Insert a new article into the wiki table

This code is notable for what it is not doing. This is a simple insert of a new article into the wiki table. We did not need to do anything different to create the embeddings, the vectorizer worker will take care of updating the embeddings as the data changes.

async def insert_article_about_pgai(conn: psycopg.AsyncConnection):
    async with conn.cursor() as cur:
        await cur.execute("""
            INSERT INTO wiki (url, title, text) VALUES
            ('https://en.wikipedia.org/wiki/pgai', 'pgai', 'pgai is a Python library that turns PostgreSQL into the retrieval engine behind robust, production-ready RAG and Agentic applications. It does this by automatically creating vector embeddings for your data based on the vectorizer you define.')
        """)
    await conn.commit() 

Perform RAG with the LLM

This code performs RAG with the LLM. It uses the _find_relevant_chunks function defined above to find the most relevant chunks of text from the wiki table and then uses the LLM to generate a response.

    query = "What is the main thing pgai does right now?"
    relevant_chunks = await _find_relevant_chunks(client, query)
    context = "\n\n".join(
        f"{chunk.title}:\n{chunk.text}" 
        for chunk in relevant_chunks
    )
    prompt = f"""Question: {query}

Please use the following context to provide an accurate response:   

{context}

Answer:"""

    response = await client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    print("RAG response:")
    print(response.choices[0].message.content)

Next steps

Check out other quickstarts:

  • Quickstart with FastAPI and psycopg

To explore more about the vectorizer, see the vectorizer documentation.

Features

Our pgai Python library lets you work with embeddings generated from your data:

  • Automatically create and sync vector embeddings for your data using the vectorizer.
  • Load data from a column in your table or from a file, S3 bucket, etc.
  • Create multiple embeddings for the same data with different models and parameters for testing and experimentation.
  • Customize how your embedding pipeline parses, chunks, formats, and embeds your data.

You can use the vector embeddings to:

  • Perform semantic search using pgvector.
  • Implement Retrieval Augmented Generation (RAG).
  • Perform high-performance, cost-efficient ANN search on large vector workloads with pgvectorscale, which complements pgvector.
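As a sketch of the pgvectorscale option, assuming the extension is installed and using the embedding storage table from the quickstart (pgvectorscale provides the diskann index type):

CREATE EXTENSION IF NOT EXISTS vectorscale;

CREATE INDEX wiki_embedding_idx
    ON wiki_embedding_storage
    USING diskann (embedding vector_cosine_ops);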

We also offer a PostgreSQL extension that can perform LLM model calling directly from SQL. This is often useful for use cases like classification, summarization, and data enrichment on your existing data.
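For instance, a sketch of classifying rows with the extension's OpenAI chat-completion function (the exact signature may vary by extension version, so treat this as an outline rather than a drop-in query):

SELECT w.title,
       ai.openai_chat_complete(
           'gpt-4o-mini',
           jsonb_build_array(
               jsonb_build_object(
                   'role', 'user',
                   'content', 'Classify this article in one word: ' || w.text
               )
           )
       )->'choices'->0->'message'->>'content' AS classification
FROM wiki w
LIMIT 5;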

A configurable vectorizer pipeline

The vectorizer is designed to be flexible and customizable. Each vectorizer defines a pipeline for creating embeddings from your data. The pipeline is defined by a series of components that are applied in sequence to the data:

  • Loading: First, you define the source of the data to embed. It can be data stored directly in a column of the source table, or a URI in a column that points to a file, S3 bucket, etc.
  • Parsing: Then, you define how the data is parsed if it is a non-text document such as a PDF, HTML, or Markdown file.
  • Chunking: Next, you define the way text data is split into chunks.
  • Formatting: Then, for each chunk, you define the way the data is formatted before it is sent for embedding. For example, you can add the title of the document as the first line of the chunk.
  • Embedding: Finally, you specify the LLM provider, model, and the parameters to be used when generating the embeddings.
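Putting it together, a sketch of a vectorizer that configures each stage explicitly (the function names follow the vectorizer API, but parameter names and defaults may vary by version, so check the API reference before copying):

SELECT ai.create_vectorizer(
    'wiki'::regclass,
    loading     => ai.loading_column(column_name=>'text'),
    parsing     => ai.parsing_auto(),
    chunking    => ai.chunking_recursive_character_text_splitter(chunk_size=>800, chunk_overlap=>400),
    formatting  => ai.formatting_python_template('$title: $chunk'),
    embedding   => ai.embedding_openai(model=>'text-embedding-ada-002', dimensions=>1536),
    destination => ai.destination_table(target_table=>'wiki_embedding_storage')
);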

Supported embedding models

The vectorizer supports embedding models from multiple providers, including OpenAI; see the vectorizer API documentation for the full, up-to-date list.

The devil is in the error handling

Simply creating vector embeddings is easy. The challenge is that embedding endpoints are unreliable: they exhibit intermittent failures and degraded performance. A critical part of handling those failures properly is that your primary data-modification operations (INSERT, UPDATE, DELETE) must not depend on the embedding operation. Otherwise, your application goes down every time the endpoint is slow or unavailable, and your user experience suffers.

Normally, you would need to build a custom MLOps pipeline to handle endpoint failures properly. This commonly involves a queuing system such as Kafka, specialized workers, and other infrastructure for managing the queue and retrying failed requests. That is a lot of work, and it is easy to get wrong.

With pgai, you can skip all of that and focus on building your application, because the vectorizer manages the embeddings for you. It has built-in queuing and retry logic to handle the failure modes you will encounter, and because this work happens in the background, your primary data-modification operations never depend on the embedding operation. This is why pgai is production-ready out of the box.

Many specialized vector databases create embeddings for you. However, they typically fail when embedding endpoints are down or degraded, placing the burden of error handling and retries back on you.

Resources

  • Why we built it
  • Quick start guides
  • Tutorials about pgai vectorizer

Contributing

We welcome contributions to pgai! See the Contributing page for more information.

Get involved

pgai is still at an early stage. Now is a great time to help shape the direction of this project; we are currently deciding priorities. Have a look at the list of features we're thinking of working on. Feel free to comment, expand the list, or hop on the Discussions forum.

To get started, take a look at how to contribute and how to set up a dev/test environment.

About Timescale

Timescale is a PostgreSQL database company. To learn more, visit timescale.com.

Timescale Cloud is a high-performance, developer-focused cloud platform that provides PostgreSQL services for the most demanding AI, time-series, analytics, and event workloads. Timescale Cloud is ideal for production applications, offering high availability, streaming backups, seamless upgrades, roles and permissions, and strong security.