A company has been tracking their sales for the year of 2019. At the end of the fisical year the company decided to upgrade tech from using Excel to a more optimal SQL-centric database. The data contains hundreds of thousands of electronics store purchases broken down by 'Order ID', 'Product', 'Quantity Ordered', 'Price Each', 'Order Date', 'Purchase Address'. Analyze the dataset and answer the following quesitons.
Deployable Dashboard: https://salesdashboard.streamlitapp.com/
CSV File: Sales Data
Question 1: What was the best month for sales? How much was earned that month? Plot that in a bar chart.
Question 3: What time should we display advertisements to maximize likelihood of customer's buying product? Plot that in a line chart.
Libraries Used: Pandas for data manipulation, Path for file directory, Matplotlib and Plotly for Visualization.
For this assignment, there are three main tasks:
File: Sales Analysis
First, read and clean our CSV file for analysis. The CSV file includes hundreds of thousands of electronics store purchases broken down by 'Order ID', 'Product', 'Quantity Ordered', 'Price Each', 'Order Date', 'Purchase Address'.
-
Use Pandas to read the sales CSV file as a DataFrame. Dont forget to infer_datetime_format.
sales.csv
: Contains the dataset we will be working with.
-
Detect and remove null values. sales_df.dropna(how='all', inplace=True)
-
Remove the text in the 'Order Date column' by taking the first 2 characters in the string and make that the month column. The months are being read in as 'Or' so make our month column numerical we need to simply state not equal to 'Or'
- sales_df[sales_df['Order Date'].str[0:2]!='Or']
-
Added 'Month', 'Address', 'City', 'State','Total Sales' columns to the dataset
- Parsed cells as strings to make new columns (.str)
- Apply the to_datetime(), to_numeric() and astype() functions to convert columns
Analyze the data to answer the following questions:
- Question 1: What was the best month for sales? How much was earned that month? Plot that in a bar chart.
- Question 2: What city sold the most product? Plot that in a bar chart.
- Question 3: What time should we display advertisements to maximize likelihood of customer's buying product? Plot that in a line chart.
- Use groupby() to perform aggregate analysis
- Plot bar charts and lines graphs to visualize results
- Be sure to clearly label graphs
Streamlit Dashboard: File: Sales Analysis Dashboard
View the Streamlit application Run the Streamlit application to view a completed Sales Analysis Dashboard. To do so, complete the following steps:
- In the terminal, navigate to the projects folder.
- In the terminal, run the Streamlit application by using streamlit run "Sales_Analysis_Dashboard.py"
- Question 1: In Q4 the sales spiked, this can be for a number of reasons, one potentially being the holiday season.
- Question 2: San Francisco has the most sales followed by Los Angeles and New York City.
- Question 3: Around 11am and 7pm, peaks can be observed in our data. It may make the most sence to advertise around that time.
-
The CSV file contains multiple headers throughout the dataset.
-
Be sure to use
head()
orsample()
when you want to look at your data, but don't want to print to a large DataFrame.