The system repository hosts the final application of CodeMatch. In this repository, we outline the complete workflow, from retrieving code from the web to detecting code clones for a given code snippet. The system is divided into three core services: Backend, Frontend, and Vector Database.
The frontend provides two pages:

- Main Page: Enter the desired code snippet to find existing GitHub projects with similar code.
- Similar GitHub Projects Page: Displays all the GitHub projects with code similar to the input.
The system consists of two main components essential for its operation.

The first covers the structure of the backend and frontend, along with their integration with the vector database.
![Workflow](https://private-user-images.githubusercontent.com/36690071/399595222-5232c3ac-cc52-42fa-b7b5-8dbecb11dc2a.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkxOTE3NDYsIm5iZiI6MTczOTE5MTQ0NiwicGF0aCI6Ii8zNjY5MDA3MS8zOTk1OTUyMjItNTIzMmMzYWMtY2M1Mi00MmZhLWI3YjUtOGRiZWNiMTFkYzJhLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTAlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjEwVDEyNDQwNlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTQzNTRlNDI1ZDU4Mjc1NTI5YjIzMGJmMWRiYjI5Yjc0MDExZGQzMTM1MTdlY2YyZjNiMDE1ZWFkM2YzMGM2NGImWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.VLx5WOIhT1UsldNj8jL0kS5h6wtlnczD2Oy8Uv1_K80)
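At query time, clone detection in this architecture amounts to a nearest-neighbour search. The input snippet is turned into a vector, and the vector database returns the stored vectors closest to it, each tagged with its GitHub project. The minimal sketch below illustrates only that idea. It assumes a toy bag-of-tokens "embedding" and an in-memory list in place of Qdrant; `embed`, `cosine`, `search` and the example project names are hypothetical and not part of CodeMatch.

```python
import math
import re
from collections import Counter


def embed(code: str) -> Counter:
    """Hypothetical toy embedding: counts identifier and keyword tokens.

    A real deployment would use a learned code-embedding model instead.
    """
    return Counter(re.findall(r"[A-Za-z_]\w*", code))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(snippet: str, index: list, top_k: int = 3) -> list:
    """Return the top_k (project, score) pairs most similar to `snippet`."""
    query = embed(snippet)
    scored = [(project, cosine(query, vector)) for project, vector in index]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical stored snippets, each tagged with the project it came from.
index = [
    ("example/sorting-utils", embed("def bubble_sort(items):\n    for i in range(len(items)): pass")),
    ("example/web-server", embed("def handle_request(request):\n    return response")),
]

if __name__ == "__main__":
    query = "def bubble_sort(values):\n    for i in range(len(values)): pass"
    for project, score in search(query, index, top_k=2):
        print(f"{project}: {score:.2f}")
```

Here the renamed bubble sort scores about 0.67 against the stored sorting snippet and about 0.13 against the unrelated one. This is why renaming variables alone does not hide a clone from similarity search.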
The second retrieves code projects from GitHub to populate the vector database.
![Workflow](https://private-user-images.githubusercontent.com/36690071/399595069-d6656be1-762f-4a78-978d-8db500746e4a.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkxOTE3NDYsIm5iZiI6MTczOTE5MTQ0NiwicGF0aCI6Ii8zNjY5MDA3MS8zOTk1OTUwNjktZDY2NTZiZTEtNzYyZi00YTc4LTk3OGQtOGRiNTAwNzQ2ZTRhLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTAlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjEwVDEyNDQwNlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTBmNGJlNmM0NDQyZDE3N2YwYjcyZDZhOTllMGEzMWY5ODUyMWJjMWZhYTdiNjI0YTMyNjVhOWQ1NDFkYTc4YmMmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.wNzn2RMsqw03NVnV8uDkAxv552tK8bLGZpKNpvKlYQw)
This step is implemented in `populate_database.py`.
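`populate_database.py` itself is not reproduced here. As a rough illustration of what an ingestion step like this involves, the sketch below splits a Python source file into function-level chunks and turns each chunk into a Qdrant-style point: an ID, a vector, and a payload recording where the code came from. Everything in it is an assumption. This includes the function-level chunking, the hashed placeholder vector standing in for a real embedding model, the payload fields, and the helper names. It also handles only Python sources, while The-Stack-V2 covers many languages.

```python
import ast
import uuid
import zlib


def extract_functions(source: str) -> list:
    """Return (name, line_number, source_code) for every function defined in `source`."""
    tree = ast.parse(source)
    return [
        (node.name, node.lineno, ast.get_source_segment(source, node))
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]


def placeholder_vector(code: str, dim: int = 8) -> list:
    """Hypothetical stand-in for an embedding model: hashes tokens into `dim` buckets."""
    vector = [0.0] * dim
    for token in code.split():
        vector[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    return vector


def build_points(repo: str, path: str, source: str) -> list:
    """Build one point record (id, vector, payload) per function in a file."""
    points = []
    for name, line, code in extract_functions(source):
        # Deterministic UUID: upserting the same function again overwrites it
        # instead of creating a duplicate point.
        point_id = str(uuid.uuid5(uuid.NAMESPACE_URL, f"{repo}/{path}#{name}:{line}"))
        points.append({
            "id": point_id,
            "vector": placeholder_vector(code),
            "payload": {"repo": repo, "path": path, "function": name, "line": line, "code": code},
        })
    return points


if __name__ == "__main__":
    sample = "def add(a, b):\n    return a + b\n"
    for point in build_points("example/math-utils", "math_utils.py", sample):
        print(point["id"], point["payload"]["function"], point["vector"])
```

Storing the repository and path in the payload is what lets a search hit be traced back to a GitHub project.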
Prerequisites:

- Python: 3.9+
- Docker Desktop: Download and install Docker Desktop.
- Qdrant: Download and run Qdrant, following the "Download and Run" section of its documentation.
- Node.js and npm:
  1. Install Node.js, keeping the option to install the necessary tools checked.
  2. Verify the installation: `npm -v`
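The prerequisites can be checked in one go with a small standard-library script. This is a minimal sketch: the tool list mirrors the items above, Qdrant is left out because it usually runs as a server or Docker container rather than a command on `PATH`, and `python_ok` and `missing_tools` are hypothetical names.

```python
import shutil
import sys

MIN_PYTHON = (3, 9)
# Command-line tools from the prerequisites list (Qdrant is not a CLI check).
REQUIRED_TOOLS = ["docker", "node", "npm"]


def python_ok(version: tuple = tuple(sys.version_info[:2]), minimum: tuple = MIN_PYTHON) -> bool:
    """True if `version` is at least `minimum` (compared as tuples)."""
    return tuple(version) >= tuple(minimum)


def missing_tools(tools: list = REQUIRED_TOOLS) -> list:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    print(f"Python {sys.version.split()[0]}: {'ok' if python_ok() else 'needs 3.9+'}")
    missing = missing_tools()
    print("Missing tools:", ", ".join(missing) if missing else "none")
```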
- Clone the repository:
  ```bash
  git clone https://github.com/codematch-llm/system.git
  ```
- Acquire access to The-Stack-V2 dataset:
  1. Get access to The-Stack-V2 dataset.
  2. Create a Hugging Face personal access token.
  3. Add the token to the `.env` file in the root directory: `HUGGING_FACE_TOKEN=<paste your token here>`
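A Python script can read the token from that file using only the standard library. This is a minimal sketch that assumes plain `KEY=VALUE` lines. The project may load `.env` differently, for example with the python-dotenv package, and `load_env_file` and `get_hf_token` are hypothetical helpers.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_hf_token(path: str = ".env") -> str:
    """Prefer an already-exported HUGGING_FACE_TOKEN, else fall back to the .env file."""
    token = os.environ.get("HUGGING_FACE_TOKEN") or load_env_file(path).get("HUGGING_FACE_TOKEN")
    if not token:
        raise RuntimeError("HUGGING_FACE_TOKEN is not set; add it to the .env file")
    return token
```

Checking the real environment first lets a token exported in the shell (or injected by Docker) override the file.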
To run the system with Docker:

- Ensure you are in the project root directory (`system`).
- Start all services:
  ```bash
  docker-compose up
  ```
  This launches:
  - Backend API at http://localhost:8000
  - Frontend at http://localhost:8080
  - Qdrant Dashboard at http://localhost:6333/dashboard
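- Populate the database from a second terminal in the project root (`docker-compose up` keeps the first terminal busy), using PowerShell:
  ```powershell
  cd backend
  $env:PYTHONPATH = (Get-Location).Path  # Ensure the path ends with 'backend' (check with `Write-Output $env:PYTHONPATH`)
  # Bash equivalent of the previous line: export PYTHONPATH="$(pwd)"
  python populate_database.py
  ```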
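Once `docker-compose up` is running, the three endpoints listed above can be checked from a script. This is a minimal standard-library sketch that assumes the default ports shown above. It treats any HTTP response, even an error status such as 404, as a sign that the service is up. `is_up` and `check_services` are hypothetical names.

```python
import urllib.error
import urllib.request

# Default endpoints started by `docker-compose up`, as listed above.
SERVICES = {
    "Backend API": "http://localhost:8000",
    "Frontend": "http://localhost:8080",
    "Qdrant Dashboard": "http://localhost:6333/dashboard",
}

# Local checks should bypass any HTTP proxy configured in the environment.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def is_up(url: str, timeout: float = 3.0) -> bool:
    """Return True if a server answers at `url`, even with an HTTP error status."""
    try:
        with _opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # A 404 or 500 still means something is listening on that port.
        return True
    except OSError:
        # Connection refused, name resolution failure, or timeout.
        return False


def check_services(services: dict = SERVICES) -> dict:
    """Map each service name to whether it responded."""
    return {name: is_up(url) for name, url in services.items()}


if __name__ == "__main__":
    for name, up in check_services().items():
        print(f"{name}: {'up' if up else 'not reachable'}")
```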
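To run the services manually without Docker, first install the dependencies, running each set of commands from the project root:

- Backend:
  ```bash
  cd backend
  pip install -r requirements.txt
  ```
- Frontend:
  ```bash
  cd frontend
  npm install
  ```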
- Ensure you are in the project root directory (the path ends with `system`).
- Terminal 1: Start the frontend:
  ```bash
  cd frontend
  npm run serve
  ```
  Access the Vue.js frontend at http://localhost:8080.
- Terminal 2: Start the backend (PowerShell):
  ```powershell
  $env:PYTHONPATH = (Get-Location).Path  # Run from the project root: the path should end with 'system' (check with `Write-Output $env:PYTHONPATH`)
  cd backend
  uvicorn backend.main:app --reload
  ```
  The backend API will be available at http://localhost:8000.
- Terminal 3: Start the Qdrant database. If Qdrant is installed separately, run it according to its documentation. If the project includes a `qdrant_server.py` script, run:
  ```bash
  cd backend
  python qdrant_server.py
  ```
- Populate the vector database:
  ```bash
  cd backend
  python populate_database.py
  ```
  Verify that the database is populated by checking the Qdrant dashboard at http://localhost:6333/dashboard.
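The dashboard check can also be scripted against Qdrant's REST API, which listens on the same port. This sketch lists the collections via `GET /collections` and reads each collection's `points_count` from `GET /collections/{collection_name}`. It assumes the default `http://localhost:6333` address and makes no assumption about which collection names CodeMatch uses. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

QDRANT_URL = "http://localhost:6333"  # default Qdrant address used above

# Local requests should not go through an HTTP proxy.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def get_json(url: str, timeout: float = 5.0) -> dict:
    """Fetch `url` and decode its JSON body."""
    with _opener.open(url, timeout=timeout) as response:
        return json.load(response)


def collection_names(body: dict) -> list:
    """Extract collection names from a GET /collections response body."""
    return [c["name"] for c in body["result"]["collections"]]


def points_count(body: dict) -> int:
    """Extract the number of stored points from a GET /collections/{name} response body."""
    return body["result"].get("points_count") or 0


def report(base_url: str = QDRANT_URL) -> dict:
    """Map each collection name to its point count."""
    counts = {}
    for name in collection_names(get_json(f"{base_url}/collections")):
        url = f"{base_url}/collections/{urllib.parse.quote(name)}"
        counts[name] = points_count(get_json(url))
    return counts


if __name__ == "__main__":
    try:
        for name, count in report().items():
            print(f"{name}: {count} points")
    except OSError as error:  # URLError and timeouts are both OSError subclasses
        print(f"Could not reach Qdrant at {QDRANT_URL}: {error}")
```

A collection reporting 0 points, or no collections at all, suggests the population step has not run or did not finish.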