This project involves parsing and analyzing log files to extract useful information about user activity, most accessed endpoints, and potential security issues. The analysis includes identifying IP addresses that made the most requests, the most accessed endpoints, and IP addresses with failed login attempts that exceed a specified threshold.
- Requests per IP: Analyzes the log file to count the number of requests made by each IP address.
- Most Accessed Endpoint: Identifies the most frequently accessed endpoint in the log file.
- Suspicious Activity: Flags IP addresses with failed login attempts exceeding a defined threshold.
- Data Visualization: Displays bar charts showing the top IP addresses by request count and the suspicious activity (failed login attempts).
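The first three analyses above can be sketched as follows. This is a minimal illustration, not the code in log_analysis.py: it uses only the standard library rather than pandas, assumes Common Log Format lines, and assumes that an HTTP 401 status marks a failed login. The `analyze` function, `LOG_PATTERN`, and the threshold value shown are hypothetical.

```python
import re
from collections import Counter

# Assumed Common Log Format line, for example:
# 192.168.1.1 - - [03/Dec/2024:10:12:34 +0000] "POST /login HTTP/1.1" 401 128
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<endpoint>\S+)[^"]*" (?P<status>\d{3})'
)

FAILED_LOGIN_THRESHOLD = 2  # illustrative value only; tune it to your log data


def analyze(lines, threshold=FAILED_LOGIN_THRESHOLD):
    """Count requests per IP, hits per endpoint, and failed logins per IP."""
    requests_per_ip = Counter()
    endpoint_hits = Counter()
    failed_logins = Counter()

    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed format
        ip = match["ip"]
        requests_per_ip[ip] += 1
        endpoint_hits[match["endpoint"]] += 1
        # Assumption: an HTTP 401 response marks a failed login attempt.
        if match["status"] == "401":
            failed_logins[ip] += 1

    # Keep only IPs whose failed-login count exceeds the threshold.
    suspicious = {ip: n for ip, n in failed_logins.items() if n > threshold}
    return requests_per_ip, endpoint_hits, suspicious
```

Calling `endpoint_hits.most_common(1)` on the returned counter gives the most accessed endpoint, and `requests_per_ip.most_common(10)` gives the top 10 IP addresses by request count.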
- Clone this repository or download the code files.
- Install the required dependencies:
pip install pandas matplotlib
- Ensure the log file you want to analyze is located at the specified log_file_path.
- Adjust the FAILED_LOGIN_THRESHOLD value if necessary to detect suspicious activity based on your log data.
- Run the script using Python:
python log_analysis.py
The script will output:
- CSV File: The results are saved to log_analysis_results.csv, which contains all the analysis data in a single CSV file.
- Bar Charts: The analysis results are also visualized as bar charts:
  - A bar chart showing the top 10 IP addresses by request count.
  - A bar chart showing the suspicious IP addresses with failed login attempts.
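One way to collect all three result tables in a single CSV file is sketched below. The actual layout of log_analysis_results.csv is not documented here, so the sectioned layout, the `write_results` helper, and its parameters are assumptions made for illustration. The column names follow the sample output in this README.

```python
import csv
from collections import Counter


def write_results(path, requests_per_ip, endpoint_hits, suspicious):
    """Write the three result tables to one CSV file, one section each.

    requests_per_ip and endpoint_hits are Counter objects; suspicious is a
    dict mapping IP address to failed-login count. The layout is hypothetical.
    """
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)

        writer.writerow(["Requests per IP"])
        writer.writerow(["IP Address", "Request Count"])
        writer.writerows(requests_per_ip.most_common())  # highest count first
        writer.writerow([])  # blank line between sections

        writer.writerow(["Most Accessed Endpoint"])
        writer.writerow(["Endpoint", "Access Count"])
        writer.writerows(endpoint_hits.most_common(1))
        writer.writerow([])

        writer.writerow(["Suspicious Activity"])
        writer.writerow(["IP Address", "Failed Login Count"])
        writer.writerows(sorted(suspicious.items(), key=lambda kv: -kv[1]))
```

The same counters could feed the bar charts: `requests_per_ip.most_common(10)` yields the ten (IP, count) pairs for the first chart, and the `suspicious` dict supplies the second.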
- log_analysis.py: The script containing the analysis code.
- sample_data.log: Example log file for analysis (replace with your own log file).
- processed_outputs/: Directory for storing the output CSV files.
- Requests per IP: Displays a table of IP addresses and their request count.
- Most Accessed Endpoint: Displays the most frequently accessed endpoint.
- Suspicious Activity: Displays IP addresses that have exceeded the failed login threshold.
Requests per IP:
IP Address Request Count
0 192.168.1.1 150
1 192.168.1.2 120
...
Most Accessed Endpoint:
Endpoint Access Count
0 /login 200
1 /dashboard 150
...
Suspicious Activity:
IP Address Failed Login Count
0 192.168.1.1 15
...
This project is licensed under the GNU General Public License 3.0 - see the LICENSE file for details.
- This project uses pandas for data analysis and matplotlib for data visualization.
- The log parsing is based on common log formats, but it can be adapted for different log file structures.
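As a rough sketch of how the parsing could be adapted, the line pattern can be kept separate from the parsing function so that a different log structure only requires a different regular expression. Both patterns and the `parse_line` helper below are hypothetical examples; neither is confirmed to match what log_analysis.py expects.

```python
import re

# Assumed Common Log Format, for example:
# 10.0.0.5 - - [03/Dec/2024:10:12:34 +0000] "GET /home HTTP/1.1" 200 512
COMMON_LOG = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"\S+ (?P<endpoint>\S+)[^"]*" (?P<status>\d{3})'
)

# A made-up key=value layout, for example:
# ip=10.0.0.5 path=/login status=401
KEY_VALUE_LOG = re.compile(
    r'^ip=(?P<ip>\S+) path=(?P<endpoint>\S+) status=(?P<status>\d{3})'
)


def parse_line(line, pattern=COMMON_LOG):
    """Return a dict with ip, endpoint and status, or None if there is no match."""
    match = pattern.match(line)
    return match.groupdict() if match else None
```

As long as a new pattern exposes the same named groups (ip, endpoint, status), the downstream counting does not need to change.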