Web_Application_Firewall

This project is a Web Application Firewall (WAF) designed to protect web applications from malicious requests. Using machine learning, specifically Logistic Regression, the WAF distinguishes between good (legitimate) and bad (malicious) requests. A proxy server intercepts each incoming request, evaluates it with the trained ML model, and allows or blocks it based on the prediction.

| Section | Description |
|---|---|
| Overview | Introduction to the Web Application Firewall (WAF) project. |
| Features | Key features of the WAF, including the proxy server, ML model, and logging. |
| Architecture | Overview of the components and workflow of the WAF. |
| Tech Stack | Technologies and tools used in the project. |
| Installation | Step-by-step guide to installing the WAF. |
| Usage | Instructions on how to run and use the WAF. |
| Dataset | Details on the dataset used to train the ML model. |
| Machine Learning Model | Information on the ML model and training process. |
| Contributing | Guidelines for contributing to the project. |

Overview

Web Application Firewalls (WAFs) are critical components for protecting web applications from attacks such as SQL injection, Cross-Site Scripting (XSS), and other OWASP Top 10 vulnerabilities. This WAF uses a Logistic Regression model to classify incoming HTTP requests as either good or bad, enhancing the security of the web application it protects.

Features

Proxy Server: Intercepts incoming HTTP requests and forwards them to the web server if deemed safe.
Machine Learning Model: Logistic Regression model trained to detect malicious requests.
Real-Time Request Analysis: Analyzes and classifies each request in real time, before it is forwarded.
Logging: Logs all requests and their classification for auditing and further analysis.
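The logging feature can be sketched with Python's standard logging module, which the Tech Stack lists. This is a minimal sketch under assumptions: the logger name `waf`, the key=value message format, and the helpers `make_waf_logger` and `log_decision` are hypothetical and not taken from the project's proxy script.

```python
import io
import logging


def make_waf_logger(stream=None, filename=None):
    """Hypothetical helper: build a logger for request verdicts.

    Writes to `filename` if given, otherwise to `stream` (stderr when None).
    """
    logger = logging.getLogger("waf")  # assumed logger name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate output if called more than once
    handler = logging.FileHandler(filename) if filename else logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.propagate = False  # keep WAF records out of the root logger
    return logger


def log_decision(logger, client_ip, method, path, verdict):
    """Record one request and its classification ("good" or "bad")."""
    # Blocked requests are logged at WARNING so they stand out during audits
    level = logging.WARNING if verdict == "bad" else logging.INFO
    logger.log(level, "client=%s method=%s path=%s verdict=%s",
               client_ip, method, path, verdict)


if __name__ == "__main__":
    buf = io.StringIO()
    log = make_waf_logger(stream=buf)
    log_decision(log, "127.0.0.1", "GET", "/index.html", "good")
    log_decision(log, "127.0.0.1", "GET", "/search?q=' OR 1=1 --", "bad")
    print(buf.getvalue())
```

Passing `filename="waf.log"` (an assumed name) instead of a stream writes the same lines to a file for later analysis.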

Architecture

The architecture of the WAF is composed of the following components:

Proxy Server: Acts as an intermediary between the client and the web server.
Request Logger: Logs incoming requests for analysis and model training.
Feature Extractor: Extracts relevant features from HTTP requests for ML model input.
Logistic Regression Model: Trained model to classify requests as good or bad.
Decision Engine: Uses the model's prediction to allow or block the request.
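The flow between the proxy server and the decision engine can be sketched with the standard library alone. This is a minimal reverse-proxy sketch, not the project's actual proxy: the upstream address, the listening port, and the rule-based `is_malicious` placeholder (standing in for the trained model) are all assumptions.

```python
import http.server
import urllib.error
import urllib.request
from urllib.parse import unquote_plus

UPSTREAM = "http://127.0.0.1:8000"  # assumed address of the protected web server
SKIP_REQUEST_HEADERS = {"host", "content-length", "connection"}
SKIP_RESPONSE_HEADERS = {"transfer-encoding", "connection", "content-length", "server", "date"}


def is_malicious(method, path, body):
    """Placeholder decision rule (assumption); swap in the trained model's prediction."""
    text = unquote_plus(path + " " + body.decode("utf-8", "replace")).lower()
    return "<script" in text or "' or 1=1" in text


class WAFProxyHandler(http.server.BaseHTTPRequestHandler):
    upstream = UPSTREAM
    classify = staticmethod(is_malicious)

    def _handle(self):
        length = int(self.headers.get("Content-Length") or 0)
        body = self.rfile.read(length) if length else b""

        # Decision engine: blocked requests never reach the web server
        if self.classify(self.command, self.path, body):
            self.send_error(403, "Request blocked by WAF")
            return

        # Forward the allowed request upstream, copying the client's headers
        req = urllib.request.Request(self.upstream + self.path,
                                     data=body or None, method=self.command)
        for name, value in self.headers.items():
            if name.lower() not in SKIP_REQUEST_HEADERS:
                req.add_header(name, value)
        try:
            with urllib.request.urlopen(req, timeout=10) as resp:
                status, headers, payload = resp.status, resp.getheaders(), resp.read()
        except urllib.error.HTTPError as err:  # relay upstream 4xx/5xx unchanged
            status, headers, payload = err.code, err.headers.items(), err.read()

        # Relay the upstream response back to the client
        self.send_response(status)
        for name, value in headers:
            if name.lower() not in SKIP_RESPONSE_HEADERS:
                self.send_header(name, value)
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    do_GET = do_POST = do_PUT = do_DELETE = _handle


def run(port=8080):
    """Start the proxy on localhost (blocks until interrupted)."""
    http.server.ThreadingHTTPServer(("127.0.0.1", port), WAFProxyHandler).serve_forever()
```

Calling `run()` listens on 127.0.0.1:8080. Replacing `is_malicious` with a function that extracts features from the request and calls `model.predict` turns the placeholder rule into the ML-based decision engine described above.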

Tech Stack

Programming Language: Python
Machine Learning Library: Scikit-learn
Data Handling: Pandas
HTTP Handling: Requests
Logging: Python's logging module
Network Security: Integration of security best practices and protocols
Web Security: Protection against common web attacks such as SQL injection, Cross-Site Scripting (XSS), and other OWASP Top 10 vulnerabilities.

Installation

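Clone the Repository:

```
git clone https://github.com/Pratham-verma/Web_Application_Firewall.git
cd Web_Application_Firewall
```

Then install the Python dependencies listed under Tech Stack (scikit-learn, pandas, and requests), for example with pip.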

Usage

Run the Proxy Server:
```
python Proxy_server.py
```
Monitor Logs: Check the logs generated by the proxy server to see the classification of requests.

Dataset

The dataset used for training the Logistic Regression model consists of labeled HTTP requests. Each request is classified as either good (legitimate) or bad (malicious). The dataset includes various features extracted from the HTTP headers, body, and other metadata.

To prepare the dataset:

Collect a large number of HTTP requests from various sources.
Label the requests as good or bad.
Extract features from each request.
Split the dataset into training and testing sets.
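The feature-extraction step, and producing a CSV file the training snippet can read, can be sketched as follows. The feature set (length, special-character count, SQL-like tokens, a script-tag flag), the 1 = bad / 0 = good label encoding, and the helper names are illustrative assumptions; the project's real features are not documented here.

```python
import csv
from urllib.parse import unquote_plus

# Hypothetical, deliberately naive features for illustration only
SQL_TOKENS = ("select", "union", "drop", "insert", " or ", "--")
SPECIAL_CHARS = "<>'\";()"


def extract_features(request_line):
    """Map one raw request line (e.g. 'GET /path?x=1') to numeric features."""
    text = unquote_plus(request_line).lower()  # decode %XX escapes and '+' first
    return {
        "length": len(text),
        "special_chars": sum(text.count(c) for c in SPECIAL_CHARS),
        "sql_tokens": sum(text.count(t) for t in SQL_TOKENS),
        "has_script_tag": int("<script" in text),
    }


def build_dataset(labeled_requests, out_path="dataset.csv"):
    """Write one row of features per request plus a 'label' column (1 = bad, 0 = good)."""
    rows = [{**extract_features(req), "label": label} for req, label in labeled_requests]
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return rows


if __name__ == "__main__":
    samples = [
        ("GET /index.html", 0),
        ("GET /search?q=shoes", 0),
        ("GET /search?q=%27+OR+1%3D1+--", 1),
        ("GET /comment?text=%3Cscript%3Ealert(1)%3C/script%3E", 1),
    ]
    for row in build_dataset(samples):
        print(row)
```

Because every feature is numeric and the label sits in its own column, the resulting file matches what `data.drop('label', axis=1)` in the training code expects. The train/test split itself is left to `train_test_split` during training.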

Machine Learning Model

The Logistic Regression model is trained using the prepared dataset. The model learns to identify patterns and features that distinguish good requests from bad ones.
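For reference, this is the standard form of binary logistic regression, written with the positive class taken to be "bad" (an assumption about the label encoding). The model scores a feature vector x with learned weights w and bias b:

```latex
P(\text{bad} \mid \mathbf{x}) = \sigma(\mathbf{w}^\top \mathbf{x} + b)
  = \frac{1}{1 + e^{-(\mathbf{w}^\top \mathbf{x} + b)}},
\qquad
\hat{y} =
\begin{cases}
\text{bad}  & \text{if } \mathbf{w}^\top \mathbf{x} + b > 0 \quad (\text{i.e. } P(\text{bad} \mid \mathbf{x}) > 0.5) \\
\text{good} & \text{otherwise}
\end{cases}
```

Training fits w and b by minimizing the log-loss over the labeled requests; scikit-learn's LogisticRegression adds an L2 penalty by default, with its strength controlled by the parameter C.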

Training the Model

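Prepare the Dataset: Ensure your dataset is in a suitable format, such as a CSV file (the training code reads dataset.csv), with one numeric column per feature and a 'label' column marking each request as good or bad.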

Train the Model:

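```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load the labeled dataset: one numeric column per feature plus a 'label' column
data = pd.read_csv('dataset.csv')
X = data.drop('label', axis=1)
y = data['label']

# Hold out 20% for testing; stratify keeps the good/bad ratio the same in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Train the model; a higher max_iter gives the default lbfgs solver room to converge
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
```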
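After training, the fitted model can be saved so the proxy loads it at startup instead of retraining. The repository listing includes a training_model.pkl file, which suggests pickle is used; this sketch assumes that, and the helpers `save_model` and `load_model` are hypothetical.

```python
import pickle

MODEL_PATH = "training_model.pkl"  # file name from the repository; pickle format is assumed


def save_model(model, path=MODEL_PATH):
    """Serialize a fitted model (or any picklable object) to disk."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path=MODEL_PATH):
    """Load a previously saved model.

    Only unpickle files you trust: loading a pickle can execute arbitrary code.
    """
    with open(path, "rb") as f:
        return pickle.load(f)
```

In this setup, `save_model(model)` would run once after `model.fit(...)`, and the proxy process would call `load_model()` and then `model.predict(...)` on each request's features.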

Evaluate the Model:

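```python
from sklearn.metrics import accuracy_score, classification_report

# Continues from the training code: uses model, X_test and y_test
y_pred = model.predict(X_test)
print(accuracy_score(y_test, y_pred))

# Per-class precision, recall and F1; recall on the bad class is the share of attacks caught
print(classification_report(y_test, y_pred))
```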

Contributing

Contributions are welcome! Please fork the repository and submit a pull request with your improvements.

Thank you!
