The `system` repository hosts the final CodeMatch application. It covers the complete workflow, from retrieving code from the web to detecting code clones for a given code snippet. The system is divided into three core services: Backend, Frontend, and Vector Database.
- Main Page: Enter the desired code snippet to find existing GitHub projects with similar code.
- Similar GitHub Projects Page: Displays all the GitHub projects with code similar to the input.
The system consists of two main components essential for its operation:
This includes the structure of the backend and frontend, along with their integration with the database.
![Workflow](https://private-user-images.githubusercontent.com/36690071/399595222-5232c3ac-cc52-42fa-b7b5-8dbecb11dc2a.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkxMzcxMzAsIm5iZiI6MTczOTEzNjgzMCwicGF0aCI6Ii8zNjY5MDA3MS8zOTk1OTUyMjItNTIzMmMzYWMtY2M1Mi00MmZhLWI3YjUtOGRiZWNiMTFkYzJhLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMDklMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjA5VDIxMzM1MFomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWZiNmM5NjVmNDhkMGY1ZDYyNzFkMGMzM2M0YzNhMDgyNGZjMGY1Y2Q2NDk1ODEzZDYwZjViMzBlYmVhZjVmYTAmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0._HjWPwNtUk7IoRKfT1iZ9heoWxmZ_BtuyZbdGCTTFK8)
This step involves retrieving code projects from GitHub to populate the database with data.
![Workflow](https://private-user-images.githubusercontent.com/36690071/399595069-d6656be1-762f-4a78-978d-8db500746e4a.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkxMzcxMzAsIm5iZiI6MTczOTEzNjgzMCwicGF0aCI6Ii8zNjY5MDA3MS8zOTk1OTUwNjktZDY2NTZiZTEtNzYyZi00YTc4LTk3OGQtOGRiNTAwNzQ2ZTRhLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMDklMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjA5VDIxMzM1MFomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWZhYzBkOTBlN2E0ODNkOTdiZWY5OTA4MDQ3NDA2Y2ZmNjIzYmE0NTViYjUzZDA4OWYxZWRjM2Y1ZDQxMDQ5ZDMmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.1kSmR1h8BnrSQMfnJ5m43ZxVjYR0wklWvHkCJMuh3Rw)
(This step is performed by the `populate_database.py` script.)
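To make the clone-detection idea concrete, here is a minimal sketch of ranking stored snippets by vector similarity. It is an illustration only: the file names and three-dimensional vectors are hypothetical, and cosine similarity is assumed as the distance measure. In the actual system the vectors would come from an embedding model and the search would be delegated to Qdrant.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(
    query: List[float], index: Dict[str, List[float]], k: int = 3
) -> List[Tuple[str, float]]:
    """Return the k entries of `index` most similar to `query`, best first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical, hand-made vectors; a real index would hold model embeddings.
toy_index = {
    "repo_a/sort.py": [0.9, 0.1, 0.0],
    "repo_b/http.py": [0.0, 0.2, 0.9],
    "repo_c/quicksort.py": [0.8, 0.3, 0.1],
}
matches = top_k_similar([1.0, 0.2, 0.0], toy_index, k=2)
```

The query vector points in nearly the same direction as the two sorting-related entries, so those rank ahead of the unrelated one.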
- Python: 3.9+
- Docker Desktop: Download Docker Desktop.
- Qdrant: Download and Run Qdrant. Follow the Download and Run section for installation.
- Node.js and npm:
  a. Install from Node.js. Keep the option checked to install necessary tools.
  b. Verify installation: `npm -v`
- Clone the Repository: `git clone https://github.com/codematch-llm/system.git`
- Acquire Access to The-Stack-V2 Dataset:
  a. Get access to The-Stack-V2 dataset.
  b. Create a Hugging Face personal access token.
  c. Add the token to the `.env` file in the root directory: `HUGGING_FACE_TOKEN=<paste your token here>`
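Before populating the database, it can help to confirm that the token is actually readable. The sketch below parses a `.env` file using only the standard library and fails early if `HUGGING_FACE_TOKEN` is missing or still the placeholder. The helper names `parse_env_text` and `load_hf_token` are hypothetical and not part of this project; the backend may well load the file differently (for example with a dotenv library).

```python
import os
from pathlib import Path
from typing import Dict


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and '#' comments."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_hf_token(env_path: str = ".env") -> str:
    """Return the token from the environment or the .env file, or raise."""
    path = Path(env_path)
    file_values = parse_env_text(path.read_text(encoding="utf-8")) if path.is_file() else {}
    token = os.environ.get("HUGGING_FACE_TOKEN") or file_values.get("HUGGING_FACE_TOKEN", "")
    if not token or token.startswith("<"):
        raise RuntimeError("HUGGING_FACE_TOKEN is missing or still a placeholder")
    return token
```

Calling `load_hf_token()` from the project root either returns the token or raises before any dataset download is attempted.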
- Ensure you are in the project root directory (`system`).
- Start all services: `docker-compose up`
  - This launches:
    - Backend API at http://localhost:8000
    - Frontend at http://localhost:8080
    - Qdrant Dashboard at http://localhost:6333/dashboard
- Populate the Database (PowerShell):

      cd backend
      $env:PYTHONPATH = (Get-Location).Path  # Ensure the path ends with 'backend' (check with `Write-Output $env:PYTHONPATH`)
      python populate_database.py
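Once `docker-compose up` is running, the three endpoints listed above can be probed with a short script. This is a sketch using only the standard library; the names `SERVICES`, `http_probe`, and `check_services` are hypothetical, and a service counts as "up" here whenever anything answers HTTP at its URL, even with an error status.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict

# URLs taken from the service list in this README.
SERVICES = {
    "backend": "http://localhost:8000",
    "frontend": "http://localhost:8080",
    "qdrant": "http://localhost:6333/dashboard",
}


def http_probe(url: str, timeout: float = 3.0) -> bool:
    """Return True if a server answers at `url`, regardless of status code."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True  # The server responded, just not with a 2xx status.
    except (urllib.error.URLError, OSError):
        return False  # Connection refused, timeout, DNS failure, and so on.


def check_services(
    services: Dict[str, str], probe: Callable[[str], bool] = http_probe
) -> Dict[str, bool]:
    """Map each service name to whether its URL responded."""
    return {name: probe(url) for name, url in services.items()}


# Example: print(check_services(SERVICES))
```

The `probe` parameter is injectable so the function can be exercised without a running stack.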
- Backend:

      cd backend
      pip install -r requirements.txt

- Frontend:

      cd frontend
      npm install
- Ensure you are in the project root directory (ends with `system`).
- Terminal 1: Start the frontend:

      cd frontend
      npm run serve

  - Access the Vue.js frontend at http://localhost:8080.
- Terminal 2: Start the backend (PowerShell):

      $env:PYTHONPATH = (Get-Location).Path  # Ensure the path ends with 'backend' (check with `Write-Output $env:PYTHONPATH`)
      cd backend
      uvicorn backend.main:app --reload

  - The backend API will be available at http://localhost:8000.
- Terminal 3: Start the Qdrant database. If installed separately, run Qdrant per its documentation. If a script (`qdrant_server.py`) is included in this project:

      cd backend
      python qdrant_server.py
- Populate the Vector Database:

      cd backend
      python populate_database.py

  - Verify the database is populated by checking the Qdrant dashboard at http://localhost:6333/dashboard.
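As an alternative to opening the dashboard, the population step can be checked from a script through Qdrant's REST endpoint `GET /collections`. The sketch below assumes Qdrant is reachable at `http://localhost:6333` as above; the function names are hypothetical, and the name of the collection this project creates is not stated here, so the code only lists whatever collections exist.

```python
import json
import urllib.request
from typing import List


def collection_names(payload: dict) -> List[str]:
    """Extract collection names from a Qdrant `GET /collections` response body."""
    return [c["name"] for c in payload.get("result", {}).get("collections", [])]


def fetch_collections(base_url: str = "http://localhost:6333", timeout: float = 5.0) -> List[str]:
    """Query a running Qdrant instance and return the names of its collections."""
    with urllib.request.urlopen(f"{base_url}/collections", timeout=timeout) as resp:
        return collection_names(json.load(resp))


# Example: an empty list after running populate_database.py suggests nothing was stored.
# print(fetch_collections())
```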