Project Log Analysis- Udacity Full Stack Nanodegree course A Reporting tool was built that will use information from the database to discover what kind of articles the site's readers like. The database contains newspaper articles, as well as the web server log for the site. The log has a database row for each time a reader loaded a web page. Using this information, the following code will answer questions about the site's user activity.
For this project, the task was to create a reporting tool that prints out reports( in plain text) based on the data in the given database. This reporting tool is a Python program using the psycopg2
module to connect to the database. This project sets up a mock PostgreSQL database for a fictional news website. The provided Python script uses the psycopg2 library to query the database and produce a report that answers the following three questions:
- What are the most popular three articles of all time?
- Who are the most popular article authors of all time?
- On which days did more than 1% of requests lead to errors?
In this project, I can stretch my SQL database skills. I will get practice interacting with a live database both from the command line and from my code. I will explore a large database with over a million rows. And I can build and refine complex queries and use them to draw business conclusions from data.
-
To get started, I recommend the user use a virtual machine to ensure they are using the same environment that this project was developed on, running on your computer. You can download Vagrant and VirtualBox to install and manage your virtual machine. Use
vagrant up
to bring the virtual machine online andvagrant ssh
to login. -
Download the data provided by Udacity here. Unzip the file in order to extract newsdata.sql. This file should be inside the Vagrant folder.
- Install Vagrant and VirtualBox
- Download the data from here.
- Unzip this file after downloading it. The file inside is called newsdata.sql.
- Copy the newsdata.sql file.
- Launch the Vagrant VM inside Vagrant sub-directory in the downloaded fullstack-nanodegree-vm repository using command:
$ vagrant up
- Then Log into this using command:
$ vagrant ssh
-
Change directory to /vagrant/Log-Analysis.
-
Load the database using
psql -d news -f newsdata.sql
. -
Connect to the database using
psql -d news
. -
Create the Views given below.
CREATE VIEW art_authors AS
SELECT title, name
FROM articles, authors
WHERE articles.author = authors.id;
CREATE VIEW art_views AS
SELECT title, count(log.id) as views
FROM articles, log
WHERE log.path = CONCAT('/article/', articles.slug)
GROUP BY articles.title
ORDER BY views desc;
CREATE VIEW logs AS
SELECT to_char(time,'DD-MON-YYYY') as Date, count(*) as LogCount
FROM log
GROUP BY Date;
CREATE VIEW errlogs AS
SELECT to_char(time,'DD-MON-YYYY') as Date, count(*) as ErrorCount
FROM log
WHERE STATUS = '404 NOT FOUND'
GROUP BY Date;
-
Then exit
psql
. -
Now execute the Python file -
python project.py
.