🧠 Machine Learning NLP: Drugs Reviews

Our project aims to leverage advanced techniques in natural language processing (NLP) and machine learning (ML) to analyze and extract valuable insights from a dataset containing medication reviews. The objective is to understand the relationship between patients' opinions on medications and the quantitative assessments of their effectiveness and side effects. Additionally, we will incorporate various other variables to enhance the reliability and realism of our results, thereby gaining a comprehensive understanding of the impact of different drugs on various patients.

🗂️ Project Overview

1. 📚 Text Preprocessing and Vectorization

Text Preprocessing:

Objective: Prepare text data for analysis by reducing noise and variability.
Steps: Transform raw text into a structured format to facilitate feature extraction. This involves cleaning and normalizing text, such as removing irrelevant characters and standardizing terms.
Outcome: Structured text data ready for vectorization.

Vectorization:

Objective: Convert preprocessed text into numerical representations usable by machine learning models.
Methods:
- Bag-of-Words (BoW) and TF-IDF: Represent text by counting word frequencies and term importance.
- Word2Vec/Glove: Generate word embeddings based on context.
- Doc2Vec: Create document-level embeddings by considering the entire document context.
- LDA (Latent Dirichlet Allocation): Extract topics from the text corpus and represent documents as topic distributions.
Preparation: Clean dataset by removing infrequent and overly common words to refine the vector representations.

2. 🤖 Machine Learning Models

Classification:

Objective: Predict the effectiveness of drugs based on review types (benefits, side effects, comments).
Approach:
- Models Tested: K-Nearest Neighbors (KNN) and Support Vector Classification (SVC).
- Process: Apply cross-validation to select optimal models and configurations for each type of review.
- Comparison: Evaluate model performance using different vectorization techniques (BoW, TF-IDF, Word2Vec, LDA).

Regression:

Objective: Predict drug ratings based on review text using various regression models.
Models Tested: Linear Regression, Random Forest, Gradient Boosting, and Support Vector Regression (SVR).
Techniques: Utilize vectorization methods (BoW, TF-IDF, Word2Vec) and dimensionality reduction (e.g., Singular Value Decomposition) to improve prediction accuracy.
Process: Compare model performance with and without dimensionality reduction to assess impact on prediction.

3. 📊 Dashboard

Objective: Provide an interactive tool for visualizing and analyzing the dataset and model performance.

Tabs:
- Tab 1: LDA Topic Visualization: Interactive charts and pyLDAvis visualizations showing topic distribution in the corpus.
- Tab 2: Topic-Document Similarity Heatmap: Matrix visualizing document-topic probabilities, helping to understand document distribution across topics.
- Tab 3: Classification Model Evaluation: Performance graphs for SVC and KNN, allowing users to explore model accuracy across different configurations and datasets.
- Tab 4: Regression Hyperparameter Tuning: Visualization of RMSE impacts from hyperparameter adjustments in various regression models.

📄 Conclusions

Detailed conclusions and insights from this project are available in the .pdf written report

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
DashboardML		DashboardML
README.md		README.md
Report_NLP_Diego_Hernandez.pdf		Report_NLP_Diego_Hernandez.pdf
_Celia_Ramperez_Diego_Hernandez_Wouter_Hendrickx_Anthony_Mendoza.ipynb		_Celia_Ramperez_Diego_Hernandez_Wouter_Hendrickx_Anthony_Mendoza.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

🧠 Machine Learning NLP: Drugs Reviews

🗂️ Project Overview

1. 📚 Text Preprocessing and Vectorization

2. 🤖 Machine Learning Models

3. 📊 Dashboard

📄 Conclusions

About

Releases

Packages

Languages

Diego-HernSua/Drugs_Reviews_NLP_ML

Folders and files

Latest commit

History

Repository files navigation

🧠 Machine Learning NLP: Drugs Reviews

🗂️ Project Overview

1. 📚 Text Preprocessing and Vectorization

2. 🤖 Machine Learning Models

3. 📊 Dashboard

📄 Conclusions

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages