ToxiScan is a text analysis tool for detecting toxicity in textual data. It combines the Natural Language Toolkit (NLTK), scikit-learn's TfidfVectorizer, and a Naive Bayes classifier to predict whether a given text is toxic or non-toxic, and wraps the workflow in a simple Streamlit user interface so toxicity analysis is easily accessible.
- Toxicity Detection: ToxiScan uses a Naive Bayes classifier, trained on a labeled dataset of toxic and non-toxic tweets, to predict whether a given text is toxic (a minimal end-to-end training sketch follows this feature list).
- Text Preprocessing: ToxiScan employs NLTK, a powerful natural language processing library, for comprehensive text preprocessing. It performs essential tasks such as tokenization, part-of-speech tagging, lemmatization, and stopword removal to ensure the input text is properly prepared for analysis.
- Feature Extraction: TfidfVectorizer is utilized to extract relevant features from the preprocessed text. This vectorization technique transforms text into numerical feature vectors, enabling the Naive Bayes classifier to make predictions.
- Accuracy Evaluation: To assess the classifier, ToxiScan uses scikit-learn's roc_auc_score and roc_curve, which measure how well the model separates toxic from non-toxic text.
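The snippet below is a minimal end-to-end sketch of this pipeline, not ToxiScan's actual code: the `preprocess` helper, the toy dataset, and the train/test split settings are illustrative assumptions, and POS tagging is omitted for brevity.

```python
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# One-time NLTK resource downloads (tokenizer, stopword list, lemmatizer data).
for resource in ("punkt", "punkt_tab", "stopwords", "wordnet", "omw-1.4"):
    nltk.download(resource, quiet=True)

stop_words = set(stopwords.words("english"))
lemmatizer = WordNetLemmatizer()

def preprocess(text: str) -> str:
    """Tokenize, lowercase, drop stopwords and non-alphabetic tokens, lemmatize."""
    tokens = word_tokenize(text.lower())
    kept = [lemmatizer.lemmatize(t) for t in tokens if t.isalpha() and t not in stop_words]
    return " ".join(kept)

# Toy stand-in for the labeled dataset (1 = toxic, 0 = non-toxic).
texts = [
    "you are a worthless idiot", "I hate everything about you",
    "shut up, nobody wants you here", "go away, you disgusting person",
    "have a lovely day", "thanks for the helpful answer",
    "great work on the project", "see you at the meeting tomorrow",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# TF-IDF turns the cleaned text into numerical feature vectors.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform([preprocess(t) for t in texts])

X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.25, stratify=labels, random_state=42
)

# Multinomial Naive Bayes pairs naturally with TF-IDF features.
model = MultinomialNB()
model.fit(X_train, y_train)

# Evaluate with ROC AUC; roc_curve returns the points of the full curve.
scores = model.predict_proba(X_test)[:, 1]
print("ROC AUC:", roc_auc_score(y_test, scores))
fpr, tpr, thresholds = roc_curve(y_test, scores)
```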
The training data for ToxiScan comes from Kaggle, specifically the "Toxic Tweets Dataset" created by ASHWIN U IYER. The dataset is a collection of tweets labeled as toxic or non-toxic, and training the Naive Bayes classifier on it helps the model learn patterns and features indicative of toxicity across a variety of text inputs.
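If you want to retrain the model yourself, the downloaded dataset can be loaded with pandas along these lines; the file name and column names below are hypothetical, so check the CSV you get from Kaggle for the actual ones.

```python
import pandas as pd

# Hypothetical file and column names; inspect the downloaded CSV for the real ones.
df = pd.read_csv("toxic_tweets.csv")
texts = df["tweet"].astype(str).tolist()
labels = df["label"].astype(int).tolist()  # assumed encoding: 1 = toxic, 0 = non-toxic
print(df["label"].value_counts())
```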
To run ToxiScan on your local machine, follow these steps:
- Clone the repository:
git clone https://github.com/<username>/<repository>.git
cd <repository>
- Install the required dependencies (a sample requirements.txt is sketched after these steps):
pip install -r requirements.txt
- Launch the ToxiScan application:
streamlit run toxiscan.py
- Access ToxiScan in your web browser:
http://localhost:8501
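The exact dependency pins live in the repository's requirements.txt; as a rough, unpinned illustration it would cover at least the libraries listed at the end of this README:

```text
nltk
scikit-learn
streamlit
```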
- Input Text: Enter the text you want to analyze for toxicity in the provided text input box.
- Analyze: Click the "Analyze" button to trigger the toxicity prediction process.
- Result: ToxiScan will display the prediction result, indicating whether the text is classified as toxic or non-toxic.
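The usage flow above maps onto a very small Streamlit script. The sketch below is an illustration rather than ToxiScan's actual toxiscan.py: the pickled vectorizer and model file names are assumptions, and the same preprocessing used at training time would normally be applied to the input text before vectorizing.

```python
import pickle

import streamlit as st

# Hypothetical artifact names; the real app may train in place or load its model differently.
with open("vectorizer.pkl", "rb") as f:
    vectorizer = pickle.load(f)
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

st.title("ToxiScan")
text = st.text_input("Enter the text you want to analyze for toxicity")

# Prediction runs only after the user clicks "Analyze" and has entered some text.
if st.button("Analyze") and text:
    features = vectorizer.transform([text])  # same TF-IDF vocabulary as training
    label = model.predict(features)[0]
    st.write("Toxic" if label == 1 else "Non-toxic")
```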
ToxiScan utilizes the following libraries and resources:
- NLTK - Natural Language Toolkit for text preprocessing.
- Scikit-learn - Machine learning library for feature extraction and classification.
- Streamlit - Framework for building interactive web applications.