---
output: github_document
bibliography: vignettes/references.bib
link-citations: yes
---
```{r echo=FALSE, results = 'asis'}
pkg <- "markovDP"
source("https://raw.githubusercontent.com/mhahsler/pkg_helpers/main/pkg_helpers.R")
pkg_title(pkg)
```
## Introduction
A Markov decision process (MDP) [@Bellman1957; @Howard1960] is a discrete-time
stochastic control process. In each time step, an
agent can perform actions which affect the system (i.e., may cause
the system state to change). The agent's goal
is to maximize its expected
future rewards, which depend on the sequence of system states and the
agent's actions in the future. Solving the MDP means finding
the optimal (or at least a good) policy
that guides the agent's actions.
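More formally, for a discount factor $\gamma \in [0, 1)$, the agent looks for a policy
that maximizes the expected discounted return

$$
G_t = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1},
$$

i.e., the expected value of this sum over the trajectories induced by the policy and
the system dynamics.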
The `markovDP` package provides the infrastructure to work with MDPs in R.
The focus is on the convenient formulation of MDPs with small to medium-sized
state spaces in multiple ways, support for
sparse representations (using sparse matrices, lists, and data.frames), and
visualization of results.
Some key components are implemented in
C++ to speed up computation.
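As a point of reference for the sparse representations mentioned above, a transition
matrix for a single action can be stored as a sparse matrix from the **Matrix**
package, which only keeps the non-zero entries. The chunk below only illustrates the
data structure itself; see the package documentation for how such matrices are passed
when defining an MDP.

```{r sparse_example}
library("Matrix")

# A 3 x 3 transition matrix for one action stored sparsely: only the
# non-zero (row, column, probability) triplets are kept in memory.
P_up <- sparseMatrix(
  i    = c(1, 1, 2, 3),
  j    = c(1, 2, 3, 3),
  x    = c(0.9, 0.1, 1.0, 1.0),
  dims = c(3, 3)
)
P_up
```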
The package provides the following popular tabular methods (a minimal standalone sketch of value iteration follows the list):
* __Dynamic Programming__
- Value Iteration [@Bellman1957]
- Modified Policy Iteration [@Howard1960; @Puterman1978]
- Prioritized Sweeping [@Moore1993; @Andre1997; @Li2008]
* __Linear Programming__
- Primal Formulation [@Manne1960]
* __Monte Carlo Control__
- Monte Carlo Control with Exploring Starts [@Sutton1998]
- On-policy Monte Carlo Control [@Sutton1998]
- Off-policy Monte Carlo Control [@Sutton1998]
* __Temporal Differencing__
- Q-Learning [@Watkins1992]
- Sarsa [@Rummery1994]
- Expected Sarsa [@Sutton1998]
- n-step Sarsa [@Sutton1998]
* __Linear Function Approximation__
- Episodic Semi-gradient Sarsa [@Sutton1998]
* __Sampling__
- Random-sample one-step tabular Q-planning [@Sutton1998]
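To make the first of these methods concrete, here is a minimal sketch of tabular
value iteration in plain base R on a toy two-state MDP. It only illustrates the
algorithm and does not use the package's API; the transition matrices `P`, rewards
`R`, and discount factor `gamma` are made up for the example.

```{r value_iteration_sketch}
# Toy MDP: 2 states, 2 actions. P[[a]] is the transition matrix for action a
# (rows = current state, columns = next state), R[[a]] the expected immediate
# reward for taking a in each state, and gamma the discount factor.
P <- list(
  a1 = matrix(c(0.9, 0.1,
                0.2, 0.8), nrow = 2, byrow = TRUE),
  a2 = matrix(c(0.5, 0.5,
                0.0, 1.0), nrow = 2, byrow = TRUE)
)
R <- list(a1 = c(1, 0), a2 = c(0, 2))
gamma <- 0.9

# Value iteration: repeatedly apply the Bellman optimality update
# until the value function stops changing.
V <- c(0, 0)
repeat {
  Q <- sapply(names(P), function(a) R[[a]] + gamma * drop(P[[a]] %*% V))
  V_new <- apply(Q, 1, max)
  if (max(abs(V_new - V)) < 1e-8) break
  V <- V_new
}

V                                 # (near-)optimal state values
names(P)[apply(Q, 1, which.max)]  # greedy policy: best action per state
```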
These implementations follow the descriptions in [@Russell2020] and
[@Sutton1998]. The implementations represent the state space explicitly, so only
problems with small to medium-sized state spaces can be solved.
Partially observable Markov decision processes (POMDPs) can be modeled in a similar
fashion using the package **pomdp** [@CRAN_pomdp].
```{r echo=FALSE, results = 'asis'}
pkg_citation(pkg, 1)
pkg_install(pkg, CRAN = FALSE)
```
## Usage
Solving the simple maze from [@Russell2020].
```{r problem}
library("markovDP")
data("Maze")
Maze
```
The maze is a gridworld and can be displayed directly.
```{r display}
gw_plot(Maze, state = TRUE)
```
```{r solve}
sol <- solve_MDP(model = Maze)
sol
```
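The solver returns the solved model together with the policy it found. Assuming an
accessor called `policy()` is available for solved models (an assumption made here
for illustration; check the package documentation for the actual accessor), the
tabular policy could be inspected like this:

```{r policy, eval=FALSE}
# Hypothetical accessor (not run): show the chosen action for each state
# of the solved model.
policy(sol)
```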
Display the value function.
```{r value_function}
plot_value_function(sol)
```
The state values can be shown in the gridworld as a colored map.
```{r gridworld_plot}
gw_plot(sol)
```
## Acknowledgments
Development of this package was supported in part by the
National Institute of Standards and Technology (NIST) under grant number
[60NANB17D180](https://www.nist.gov/ctl/pscr/safe-net-integrated-connected-vehicle-computing-platform).
## References