A Django-powered medical document search engine that utilizes the TF-IDF (Term Frequency-Inverse Document Frequency) approach for indexing and retrieval. It leverages the clinicaltrials/2021/trec-ct-2021 dataset from ir-datasets, boasting a collection of 376,000 medical documents.
TF-IDF based indexing: Efficiently index documents based on term frequency and document frequency to identify relevant documents for user queries. Django backend: Robust and secure backend framework for managing document processing, indexing, and searching. Tailwind CSS front-end: Modern and responsive web interface built with Tailwind CSS for a sleek and user-friendly experience. Searchable by keywords and filters: Find relevant documents by entering keywords and refining results with additional filters. Highlighting of matched keywords: Easily identify relevant terms within retrieved documents for quick scanning.
- Back-end: Python, Django
- Front-end: HTML, CSS, Tailwind
- Indexing: TF-IDF
- Dataset: clinicaltrials/2021/trec-ct-2021 (376,000 medical documents)
- Clone the repository:
git clone https://github.com/daffafaizan/nubengine.git
- Set up a virtual environment and install dependencies:
python3 -m venv venv && source venv/bin/activate && pip install -r requirements.txt
- Configure database settings in settings.py
- Run database migrations:
python manage.py migrate
- Run the development server:
python manage.py runserver
Visithttp://127.0.0.1:8000
in your browser and start searching!
We welcome contributions to improve and expand the features of this project. Feel free to fork the repository, make changes, and create pull requests!