Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

✨ Add knowledge retrieval by using RAG #805

Open
wants to merge 10 commits into
base: main
Choose a base branch
from
Open

✨ Add knowledge retrieval by using RAG #805

wants to merge 10 commits into from

Conversation

sasamuku
Copy link
Member

@sasamuku sasamuku commented Mar 4, 2025

2025-03-04.18.47.26.mov

Issue

  • resolve:

Why is this change needed?

What would you like reviewers to focus on?

Testing Verification

What was done

🤖 Generated by PR Agent at 81ac3b5

  • Introduced a knowledge retrieval feature using RAG (Retrieval-Augmented Generation).
    • Added a KnowledgeRetrievalWindow component for user queries and AI responses.
    • Implemented a retrieval API endpoint using vector embeddings.
  • Added vectorization capabilities for text and URLs.
    • Created TextVectorizer and UrlVectorizer components.
    • Developed APIs for text and URL vectorization.
  • Integrated Supabase for vector storage and retrieval.
    • Added a documents table with vector embeddings.
    • Configured Supabase client and vector store in the application.
  • Enhanced the migration web app with a responsive header and navigation.
    • Updated layout to include a Header component.
    • Added pages for vectorization and knowledge retrieval.

Detailed Changes

Relevant files
Enhancement
12 files
KnowledgeRetrievalWindow.tsx
Add Knowledge Retrieval Window component                                 
+160/-0 
UrlVectorizer.tsx
Add URL Vectorizer component                                                         
+92/-0   
route.ts
Implement retrieval API for knowledge queries                       
+94/-0   
TextVectorizer.tsx
Add Text Vectorizer component                                                       
+85/-0   
vectorization.ts
Add vectorization logic for text and URLs                               
+79/-0   
route.ts
Implement vectorization API for text and URLs                       
+49/-0   
Header.tsx
Add responsive header with navigation                                       
+49/-0   
layout.tsx
Update layout to include Header component                               
+5/-1     
page.tsx
Add Vectorize page with text and URL vectorization             
+13/-0   
page.tsx
Add Knowledge Retrieval page                                                         
+14/-0   
vectorStore.ts
Configure vector store with Supabase                                         
+13/-0   
index.ts
Export vectorization and Supabase utilities                           
+3/-0     
Configuration changes
4 files
supabase.ts
Add Supabase client configuration                                               
+6/-0     
20250304050615_remote_schema.sql
Add Supabase migration for documents table and vector functions
+80/-0   
.env.template
Add Supabase environment variables                                             
+2/-0     
pnpm-workspace.yaml
Exclude migration-web from workspace packages                       
+2/-0     
Formatting
7 files
KnowledgeRetrievalWindow.module.css
Add styles for Knowledge Retrieval Window                               
+127/-0 
TextVectorizer.module.css
Add styles for Text Vectorizer component                                 
+77/-0   
UrlVectorizer.module.css
Add styles for URL Vectorizer component                                   
+74/-0   
Header.module.css
Add styles for Header component                                                   
+66/-0   
globals.css
Update global styles for layout and main content                 
+11/-0   
page.module.css
Add styles for Knowledge Retrieval page                                   
+15/-0   
page.module.css
Add styles for Vectorize page                                                       
+14/-0   
Dependencies
2 files
dependency_decisions.yml
Approve new dependencies and licenses                                       
+15/-6   
package.json
Update dependencies and Supabase CLI commands                       
+4/-2     
Additional files
1 files
pnpm-lock.yaml +910/-156

Additional Notes


Need help?
  • Type /help how to ... in the comments thread for any questions about Qodo Merge usage.
  • Check out the documentation for more information.
  • Copy link

    changeset-bot bot commented Mar 4, 2025

    ⚠️ No Changeset found

    Latest commit: 81ac3b5

    Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

    This PR includes no changesets

    When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

    Click here to learn what changesets are, and how to add one.

    Click here if you're a maintainer who wants to add a changeset to this PR

    Copy link

    vercel bot commented Mar 4, 2025

    The latest updates on your projects. Learn more about Vercel for Git ↗︎

    Name Status Preview Comments Updated (UTC)
    liam-docs ✅ Ready (Inspect) Visit Preview 💬 Add feedback Mar 4, 2025 9:53am
    liam-erd-sample ✅ Ready (Inspect) Visit Preview Mar 4, 2025 9:53am
    3 Skipped Deployments
    Name Status Preview Comments Updated (UTC)
    test-liam-docs ⬜️ Ignored (Inspect) Mar 4, 2025 9:53am
    test-liam-erd-sample ⬜️ Ignored (Inspect) Mar 4, 2025 9:53am
    test-liam-erd-web ⬜️ Ignored (Inspect) Mar 4, 2025 9:53am

    Copy link
    Contributor

    qodo-merge-pro-for-open-source bot commented Mar 4, 2025

    CI Feedback 🧐

    (Feedback updated until commit 81ac3b5)

    A test triggered by this PR failed. Here is an AI-generated analysis of the failure:

    Action: run-e2e / e2e-test (Mobile Safari)

    Failed stage: Run e2e tests [❌]

    Failed test name: [Mobile Safari] › tests/vrt/vrt.test.ts › top

    Failure summary:

    The action failed due to two test failures in the Mobile Safari environment:
    1. Visual Regression
    Test (VRT) failure: The top test failed because the screenshot comparison did not match the expected
    result. The test was retried 3 times but continued to fail.
    2. E2E Test timeout: The test "Table
    node should be highlighted when clicked" failed due to a timeout of 10000ms while trying to execute
    tableNode.click().

    Relevant error logs:
    1:  ##[group]Operating System
    2:  Ubuntu
    ...
    
    181:  Scope: all 11 workspace projects
    182:  Lockfile is up to date, resolution step is skipped
    183:  Progress: resolved 1, reused 0, downloaded 0, added 0
    184:  Packages: +1440
    185:  ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
    186:  Progress: resolved 1440, reused 1282, downloaded 0, added 0
    187:  Progress: resolved 1440, reused 1420, downloaded 0, added 703
    188:  Progress: resolved 1440, reused 1420, downloaded 0, added 1440, done
    189:  WARN  Failed to create bin at /home/runner/work/liam/liam/frontend/apps/erd-sample/node_modules/.bin/liam. ENOENT: no such file or directory, open '/home/runner/work/liam/liam/frontend/packages/cli/dist-cli/bin/cli.js'
    ...
    
    194:  + @turbo/gen 2.1.2
    195:  + syncpack 13.0.0
    196:  + turbo 2.1.2
    197:  frontend/apps/docs postinstall$ fumadocs-mdx
    198:  frontend/apps/docs postinstall: [MDX] types generated
    199:  frontend/apps/docs postinstall: Done
    200:  frontend/apps/erd-web postinstall$ cp ../../packages/db-structure/node_modules/@ruby/prism/src/prism.wasm prism.wasm
    201:  frontend/apps/erd-web postinstall: Done
    202:  WARN  Failed to create bin at /home/runner/work/liam/liam/frontend/apps/erd-sample/node_modules/.bin/liam. ENOENT: no such file or directory, open '/home/runner/work/liam/liam/frontend/apps/erd-sample/node_modules/@liam-hq/cli/dist-cli/bin/cli.js'
    ...
    
    1386:  env:
    1387:  CI: true
    1388:  PNPM_HOME: /home/runner/setup-pnpm/node_modules/.bin
    1389:  URL: https://liam-erd-avlqx7vt0-route-06-core.vercel.app
    1390:  ##[endgroup]
    1391:  Running 17 tests using 1 worker
    1392:  °°°·°×±·····°°°°°××F
    1393:  1) [Mobile Safari] › tests/vrt/vrt.test.ts:21:5 › top ────────────────────────────────────────────
    1394:  Error: �[2mexpect(�[22m�[31mpage�[39m�[2m).�[22mtoHaveScreenshot�[2m(�[22m�[32mexpected�[39m�[2m)�[22m
    ...
    
    1425:  ────────────────────────────────────────────────────────────────────────────────────────────────
    1426:  attachment #2: top-1-actual.png (image/png) ────────────────────────────────────────────────────
    1427:  test-results/vrt-vrt-top-Mobile-Safari/top-1-actual.png
    1428:  ────────────────────────────────────────────────────────────────────────────────────────────────
    1429:  attachment #3: top-1-diff.png (image/png) ──────────────────────────────────────────────────────
    1430:  test-results/vrt-vrt-top-Mobile-Safari/top-1-diff.png
    1431:  ────────────────────────────────────────────────────────────────────────────────────────────────
    1432:  Retry #1 ───────────────────────────────────────────────────────────────────────────────────────
    1433:  Error: �[2mexpect(�[22m�[31mpage�[39m�[2m).�[22mtoHaveScreenshot�[2m(�[22m�[32mexpected�[39m�[2m)�[22m
    ...
    
    1469:  test-results/vrt-vrt-top-Mobile-Safari-retry1/top-1-diff.png
    1470:  ────────────────────────────────────────────────────────────────────────────────────────────────
    1471:  attachment #4: trace (application/zip) ─────────────────────────────────────────────────────────
    1472:  test-results/vrt-vrt-top-Mobile-Safari-retry1/trace.zip
    1473:  Usage:
    1474:  pnpm exec playwright show-trace test-results/vrt-vrt-top-Mobile-Safari-retry1/trace.zip
    1475:  ────────────────────────────────────────────────────────────────────────────────────────────────
    1476:  Retry #2 ───────────────────────────────────────────────────────────────────────────────────────
    1477:  Error: �[2mexpect(�[22m�[31mpage�[39m�[2m).�[22mtoHaveScreenshot�[2m(�[22m�[32mexpected�[39m�[2m)�[22m
    ...
    
    1509:  attachment #2: top-1-actual.png (image/png) ────────────────────────────────────────────────────
    1510:  test-results/vrt-vrt-top-Mobile-Safari-retry2/top-1-actual.png
    1511:  ────────────────────────────────────────────────────────────────────────────────────────────────
    1512:  attachment #3: top-1-diff.png (image/png) ──────────────────────────────────────────────────────
    1513:  test-results/vrt-vrt-top-Mobile-Safari-retry2/top-1-diff.png
    1514:  ────────────────────────────────────────────────────────────────────────────────────────────────
    1515:  2) [Mobile Safari] › tests/e2e/page.test.ts:25:5 › Table node should be highlighted when clicked ─
    1516:  �[31mTest timeout of 10000ms exceeded.�[39m
    1517:  Error: locator.click: Test timeout of 10000ms exceeded.
    ...
    
    1527:  > 33 |   await tableNode.click()
    1528:  |                   ^
    1529:  34 |
    1530:  35 |   await expect(
    1531:  36 |     tableNode.locator('div[class^="TableNode_wrapper"]'),
    1532:  at /home/runner/work/liam/liam/frontend/packages/e2e/tests/e2e/page.test.ts:33:19
    1533:  Slow test file: [Mobile Safari] › tests/e2e/toolbar.test.ts (55.9s)
    1534:  Consider running tests from slow files in parallel, see https://playwright.dev/docs/test-parallel.
    1535:  1 failed
    1536:  [Mobile Safari] › tests/vrt/vrt.test.ts:21:5 › top ─────────────────────────────────────────────
    1537:  1 flaky
    1538:  [Mobile Safari] › tests/e2e/page.test.ts:25:5 › Table node should be highlighted when clicked ──
    1539:  9 skipped
    1540:  6 passed (2.1m)
    1541:  ##[error]Process completed with exit code 1.
    

    @@ -0,0 +1,80 @@
    create extension if not exists "vector" with schema "public" version '0.8.0';
    Copy link
    Member Author

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    📝 Use template for LangChain from Quick Start.
    see: https://supabase.com/docs/guides/ai/langchain#initializing-your-database

    @sasamuku sasamuku changed the title ✨ Add schema review by using RAG ✨ Add knowledge retrieval by using RAG Mar 4, 2025
    @sasamuku sasamuku marked this pull request as ready for review March 4, 2025 09:59
    @sasamuku sasamuku requested a review from a team as a code owner March 4, 2025 09:59
    @sasamuku sasamuku requested review from hoshinotsuyoshi, FunamaYukina, junkisai, MH4GF and NoritakaIkeda and removed request for a team March 4, 2025 09:59
    Copy link
    Contributor

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
    🧪 No relevant tests
    🔒 Security concerns

    Database permissions:
    The Supabase migration grants excessive permissions (delete, insert, update, truncate) to anonymous and authenticated users on the documents table. This could allow malicious users to manipulate or delete vector embeddings and stored documents. Consider restricting permissions to only what's necessary for the application to function.

    ⚡ Recommended focus areas for review

    Invalid Model Name

    The OpenAI model name 'gpt-4o-mini' appears to be invalid. This could cause runtime errors when making API calls to OpenAI.

    model: 'gpt-4o-mini',
    Security Concern

    The migration grants extensive database permissions (delete, insert, update, etc.) to anonymous and authenticated users on the documents table. This could allow unauthorized data manipulation.

    grant delete on table "public"."documents" to "anon";
    
    grant insert on table "public"."documents" to "anon";
    
    grant references on table "public"."documents" to "anon";
    
    grant select on table "public"."documents" to "anon";
    
    grant trigger on table "public"."documents" to "anon";
    
    grant truncate on table "public"."documents" to "anon";
    
    grant update on table "public"."documents" to "anon";
    
    grant delete on table "public"."documents" to "authenticated";
    
    grant insert on table "public"."documents" to "authenticated";
    
    grant references on table "public"."documents" to "authenticated";
    
    grant select on table "public"."documents" to "authenticated";
    
    grant trigger on table "public"."documents" to "authenticated";
    
    grant truncate on table "public"."documents" to "authenticated";
    
    grant update on table "public"."documents" to "authenticated";
    Environment Variables

    The code uses empty string fallbacks for required environment variables. This could cause silent failures if the variables are not properly configured.

    process.env.SUPABASE_URL || '',
    process.env.SUPABASE_SERVICE_ROLE_KEY || '',

    Copy link
    Contributor

    PR Code Suggestions ✨

    Explore these optional code suggestions:

    CategorySuggestion                                                                                                                                    Impact
    Security
    Restrict anonymous user permissions

    Restrict permissions for anonymous users by removing grants to the 'anon' role.
    Anonymous users should not have full access to modify sensitive document data.

    frontend/apps/migration-web/supabase/migrations/20250304050615_remote_schema.sql [40-52]

    -grant delete on table "public"."documents" to "anon";
    -grant insert on table "public"."documents" to "anon";
    -grant references on table "public"."documents" to "anon";
    +-- Remove all grants for anon role, only allow select
     grant select on table "public"."documents" to "anon";
    -grant trigger on table "public"."documents" to "anon";
    -grant truncate on table "public"."documents" to "anon";
    -grant update on table "public"."documents" to "anon";
    • Apply this suggestion
    Suggestion importance[1-10]: 10

    __

    Why: This is a critical security suggestion. Granting full table access permissions (delete, insert, update, truncate) to anonymous users creates significant security risks and potential data exposure vulnerabilities. Limiting anonymous users to read-only access is a crucial security best practice.

    High
    Possible issue
    Fix invalid model name

    The model name 'gpt-4o-mini' appears to be a typo. Use a valid OpenAI model name
    like 'gpt-4' or 'gpt-3.5-turbo' to ensure the API works correctly.

    frontend/apps/migration-web/app/api/retrieve/route.ts [65-68]

     const model = new ChatOpenAI({
       temperature: 0.7,
    -  model: 'gpt-4o-mini',
    +  model: 'gpt-4',
     })
    • Apply this suggestion
    Suggestion importance[1-10]: 9

    __

    Why: The model name 'gpt-4o-mini' is incorrect and would cause API failures. Using a valid OpenAI model name is critical for the application's core functionality.

    High
    Fix incorrect Supabase CLI command

    The supabase:pull script command is using an incorrect syntax. The supabase db
    pull command is needed to pull the database schema, while supabase pull is for
    pulling environment variables.

    frontend/apps/migration-web/package.json [37]

    -"supabase:pull": "pnpm supabase pull",
    +"supabase:pull": "pnpm supabase db pull",

    [To ensure code accuracy, apply this suggestion manually]

    Suggestion importance[1-10]: 8

    __

    Why: The suggestion correctly identifies and fixes a critical issue where the wrong Supabase CLI command was being used, which would prevent proper database schema synchronization.

    Medium
    General
    Fix auto-scroll dependency

    The useEffect hook for scrolling should have retrievalResult as a dependency to
    ensure auto-scrolling happens when new content is added.

    frontend/apps/migration-web/components/KnowledgeRetrievalWindow.tsx [35-40]

     useEffect(() => {
       if (resultContainerRef.current) {
         resultContainerRef.current.scrollTop =
           resultContainerRef.current.scrollHeight
       }
    -}, [])
    +}, [retrievalResult])
    • Apply this suggestion
    Suggestion importance[1-10]: 7

    __

    Why: Missing dependency would prevent auto-scrolling when new content is added, affecting user experience as they would need to manually scroll to see new responses.

    Medium
    • More

    chunkCount: number
    }

    export async function vectorizeUrl(url: string): Promise<VectorizationResult> {
    Copy link
    Member Author

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    Copy link
    Member

    @MH4GF MH4GF left a comment

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    LGTM 👍🏻 Cool!!

    Comment on lines +5 to +13
    const embeddings = new OpenAIEmbeddings({
    modelName: 'text-embedding-3-small',
    })

    export const vectorStore = new SupabaseVectorStore(embeddings, {
    client: supabaseClient,
    tableName: 'documents',
    queryName: 'match_documents',
    })
    Copy link
    Member

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    📝 if we use other Models, It may be necessary to have a corresponding embeddings model and vectorStore for each.

    Comment on lines +27 to +30
    Relevant Context:
    """
    {context}
    """
    Copy link
    Member

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    I see 👀

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Projects
    None yet
    Development

    Successfully merging this pull request may close these issues.

    2 participants