-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathProject TSA_V3.Rmd
171 lines (122 loc) · 5.94 KB
/
Project TSA_V3.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
---
title: "Project"
output:
html_document: default
pdf_document: default
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## Background
Unemployment is an important yardstick that defines the condition of a country’s economy. The unemployment rate has several consequences on the country’s economy such as loss of productive forces, loss of income, as well as a burden on the state budget. Continued and persistent employment rate helps bolster the country’s social and economic status.
The unemployment rate has been the primary summary statistic for the health of the labor market for quite some time. Recently, however, forecasts of the unemployment rate have come to the forefront, as monetary policy makers are trying to formulate a way of conditioning expectations in the new and extraordinary policy environment.
The unemployment rate has varied from as low as 1% during World War I to as high as 25% during the Great Depression. More recently, it hit 10.8% in November 1982 and 10.0% in October 2009. Unemployment tends to rise during recessions and fall during expansions. From 1948 to 2015, unemployment averaged about 5.8%. The United States has experienced 11 recessions since the end of the postwar period in 1948.
## Purpose
The purpose of this report is to identify the most appropriate model to forecast future unemployment rate in the US using the historical data. We present an in-depth study of the forecasts for the monthly U.S. unemployment rate using various time series models and comparing them to further our understanding of the strengths and deficiencies of these methods.
## About the Data
The data used for the purposes of this report represents the US Unemployment Statistics from January 1990 - December 2016, broken down by state and month. The formatted version of the data in CSV format for the purposes of this analysis was obtained from Kaggle. The raw unformatted data is available at the United States Bureau of Labor Statistics Website.
Note: These unemployment rates are monthly U-3 rates, and are NOT seasonally adjusted or categorized by age, gender, level of education, etc.
```{r , echo=FALSE, message=FALSE, warning=FALSE}
library(dplyr)
library(forecast)
library(urca)
library(fpp2)
unemp_data = read.csv('D:/4th Qtr Study Material/Project/output.csv', header = TRUE, stringsAsFactors = FALSE)
```
## Including Plots
Exploratory Data Analysis
Overview Of Data:
To see how these rates varied from state to state, we plotted a map shown below. We see that states like California, Arizona and Michigan have fairly high average rates of unemployment. We can also see that Central America has a fairly lower unemployment rate than the east & west coast, suggesting that coasts are rather more volatile in jobs.
```{r , echo=FALSE, message=FALSE, warning=FALSE}
str(unemp_data)
```
```{r , echo=FALSE, message=FALSE, warning=FALSE}
state_wise_avg <- unemp_data %>%
select(State,Rate) %>%
group_by(State) %>%
summarise('Average'= mean(Rate)) %>%
mutate(State = tolower(State))
colnames(state_wise_avg)[1]<-"region"
colnames(state_wise_avg)[2]<-"value"
require(choroplethr)
require(choroplethrMaps)
state_choropleth(state_wise_avg, title="Average Unemployment across USA", num_colors=8, legend="Avg unemp rate")
```
States like California, Arizona and Michigan have fairly high average rates of unemployment.
## Analysis
```{r , echo=FALSE, message=FALSE, warning=FALSE}
byYear <- unemp_data %>%
select(-State, -County) %>%
group_by(Year, Month) %>%
summarise(Rate = mean(Rate)) %>%
arrange(Year, match(Month, month.name))
df <- ts(byYear[,3], start = c(1990,1), freq = 12)
plot(stl(df[, 1], s.window = "periodic"))
```
There seems to be some seasonality/cyclicity. Also the data is not stationary. SO we take a log and do a first order difference.
```{r , echo=FALSE, message=FALSE, warning=FALSE}
p1 = autoplot(df)+
ggtitle("US Monthly Unemployment Rate") + xlab("Year") + ylab("Unemployment Rate")
df_bc = BoxCox(df,lambda=BoxCox.lambda(df))
p2 = autoplot(df_bc)+
xlab("Year") + ylab("BoxCox of Unemployment Rate")
gridExtra::grid.arrange(p1,p2, nrow=2)
train <- window(df_bc, end = c(2010,12))
test <- window(df_bc, start = c(2011,1), end = c(2016,12))
```
## ETS Models
```{r , echo=FALSE, message=FALSE, warning=FALSE}
ets_fit0 <- ets(train)
checkresiduals(ets_fit0)
#a2 <- accuracy(ets_fit0)
#a2[,c("RMSE","MAE","MAPE","MASE")]
```
## ARIMA Models
```{r , echo=FALSE, message=FALSE, warning=FALSE}
df_bc %>% nsdiffs()
df_bc %>% diff(lag=12) %>% nsdiffs()
df_bc %>% diff(lag=12) %>% ndiffs()
df_bc %>% diff(lag=12) %>% ur.kpss() %>% summary() #since t-stat is < 5pct, we'll call this series stationary
df_bc %>% diff(lag=12) %>% ggtsdisplay()
```
```{r , echo=FALSE, message=FALSE, warning=FALSE}
#--Data Modeling. This takes a few minutes to run
#auto.arima(log(df), stepwise=FALSE, max.order = 9,
# approximation=FALSE)
```
```{r , echo=FALSE, message=FALSE, warning=FALSE}
fit1 <- Arima(train, order=c(4,0,3), seasonal = c(1,1,1))
fit1 %>% forecast %>% autoplot
checkresiduals(fit1)
```
```{r , echo=FALSE, message=FALSE, warning=FALSE}
fit2 <- Arima(train,order=c(4,0,2), seasonal = c(1,1,1))
fit2 %>% forecast %>% autoplot
checkresiduals(fit2)
```
```{r , echo=FALSE, message=FALSE, warning=FALSE}
fit3 <- Arima(train,order=c(4,0,3), seasonal = c(2,1,1))
fit3 %>% forecast %>% autoplot
checkresiduals(fit3)
```
## Check which Arima Model fits better
Based on the AICc values, we see that Model 1 fits the best with lowest AICc of -1252.247.
```{r echo=FALSE, message=FALSE, warning=FALSE}
fit1$aicc
fit2$aicc
fit3$aicc
```
## Comparing ETS vs ARIMA
```{r}
a1 = ets_fit0 %>% forecast(h=72) %>% accuracy(test)
a2 = fit1 %>% forecast(h=72) %>% accuracy(test)
a1[,c("RMSE","MAE","MAPE","MASE")]
a2[,c("RMSE","MAE","MAPE","MASE")]
```
## Forecasting next 2 years
```{r echo=FALSE, warning=FALSE}
fit = Arima(df_bc, order=c(4,0,3), seasonal = c(1,1,1))
fc <- forecast(fit, h=48)
summary(fc)
plot(fc)
```