-
Notifications
You must be signed in to change notification settings - Fork 0
/
data_tidying_project.Rmd
120 lines (98 loc) · 3.59 KB
/
data_tidying_project.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
---
title: "Data Tidying Project"
author: "Mohammad Hadavand"
date: "6/13/2018"
output: html_document
---
```{r setup, include=FALSE}
## install packages if necessary
list.of.packages <- c("httr", "readxl", "dplyr", "tidyr", "stringr", "reshape2", "skimr", "ggplot2")
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages)
## load packages
library(httr)
library(readxl)
library(dplyr)
library(tidyr)
library(stringr)
library(reshape2)
library(skimr)
library(ggplot2)
```
### Get Data
[Data Set 1](https://data.world/retail/department-store-sales): Sales from the Retail Trade and Food Services Report from the US Census. This dataset only covers Department Stores, though the report covers a wide range of retail types. [1992-2016]
[Data Set 2](https://data.world/garyhoov/retail-sales-growth) US Retail Sales by Store Type with Growth Rate [2009-2014]
```{r data, warning=FALSE}
#1992-2016
#https://data.world/retail/department-store-sales
GET("https://query.data.world/s/gdk7iwtlisq6vkktmybqqr7hjjty5s", write_disk(tf <- tempfile(fileext = ".xls")))
df1 <- read_excel(tf)
#2009-2014
# https://data.world/garyhoov/retail-sales-growth
GET("https://query.data.world/s/py7kinxvyuxjpzwdjs2ti4wdmui6bi", write_disk(tf <- tempfile(fileext = ".xls")))
df2 <- read_excel(tf)
## the the first row and make that the column names of the data frame
colnames(df2) <- df2[1,]
```
### Save Raw Data
```{r save-data, warning=FALSE}
## use saveRDS() to save each object as a .rds file
```
### Wrangle Data
```{r wrangle-df2, warning=FALSE}
## an example working with df2
## let's wrangle!
df_retail <- df2 %>%
## remove the r from the column names of df2
magrittr::set_colnames(gsub("r","",df2[1,])) %>%
## add a new column called "business"
mutate(business = gsub("[…]|[.]","",`Kind of business`)) %>%
## filter to include Retail sales or Department stores sales
filter(grepl('Retail sales, total |Department stores', business)) %>%
## only look at columns with year information in them
select(.,c(matches('19|20'),business)) %>%
## take year column and collapse them into a single column
gather(., "year", "n", 1:(ncol(.)-1)) %>%
## make sure the count column `n` is numeric
mutate(n=as.numeric(n)) %>%
## filter to only include the businesses we're interested in
filter(business == "Retail sales, total "| business=="Department stores ")
```
```{r wrangle-df1, warning=FALSE}
## now, your turn!
## work with df1
df_department <- df1 %>%
## split Period column into one column called "month" and one called "year"
function_name() %>%
## add a column `value` which contains the
## information from the `Value (in millions)`
function_name() %>%
## group the data frame by the `year` column
function_name() %>%
## Summarize the data by creating a new column
## call this column `n`
## have it contain the sum of the `value` column
function_name() %>%
### create a new column called `business`
## set the value of this column to be "department stores"
## for the entire data set
function_name() %>%
## reorder column names to be : business, year, n
function_name()
```
```{r merge-data, warning=FALSE}
## Now, combine the two data frames
df_total <- function_name()
```
```{r plot, warning=FALSE, fig.width=10}
## Plot Retail Sales data
ggplot(df_retail, aes(x=year,y=n,colour=business)) +
geom_point()
## Plot Department Sales data
ggplot(df_department, aes(x=year,y=n)) +
geom_point()
cd
## Plot Combined Data
ggplot(df_total, aes(x=year,y=as.numeric(n), colour=business)) +
geom_point()
```