---
output: github_document
bibliography: vignettes/references.bib
link-citations: yes
---
```{r echo=FALSE, results = 'asis'}
pkg <- "markovDP"
source("https://raw.githubusercontent.com/mhahsler/pkg_helpers/main/pkg_helpers.R")
pkg_title(pkg)
```
## Introduction
A Markov decision process (MDP) [@Bellman1957; @Howard1960] is a discrete-time
stochastic control process. In each time step, an
agent can perform actions which affect the system (i.e., may cause
the system state to change). The agent's goal
is to maximize its expected
future rewards, which depend on the sequence of system states and the
agent's actions in the future. Solving the MDP means finding
the optimal (or at least a good) policy
that guides the agent's actions.
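More formally, for a discount factor $\gamma \in [0, 1)$, the agent looks for a policy
that maximizes the expected discounted return

$$
G_t = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1},
$$

i.e., the expected value of this sum over the trajectories induced by the policy and
the system dynamics.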
The `markovDP` package provides the infrastructure to work with MDPs in R.
The focus is on the convenient formulation of MDPs with small to medium-sized
state spaces in multiple ways, support for
sparse representations (using sparse matrices, lists, and data.frames), and
visualization of results.
Some key components are implemented in
C++ to speed up computation.
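As a point of reference for the sparse representations mentioned above, a transition
matrix for a single action can be stored as a sparse matrix from the **Matrix**
package, which only keeps the non-zero entries. The chunk below only illustrates the
data structure itself; see the package documentation for how such matrices are passed
when defining an MDP.

```{r sparse_example}
library("Matrix")

# A 3 x 3 transition matrix for one action stored sparsely: only the
# non-zero (row, column, probability) triplets are kept in memory.
P_up <- sparseMatrix(
  i    = c(1, 1, 2, 3),
  j    = c(1, 2, 3, 3),
  x    = c(0.9, 0.1, 1.0, 1.0),
  dims = c(3, 3)
)
P_up
```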
The package provides the following popular tabular methods (a minimal standalone sketch of value iteration follows the list):
* __Dynamic Programming__
- Value Iteration [@Bellman1957]
- Modified Policy Iteration [@Howard1960; @Puterman1978]
- Prioritized Sweeping [@Moore1993; @Andre1997; @Li2008]
* __Linear Programming__
- Primal Formulation [@Manne1960]
* __Monte Carlo Control__
- Monte Carlo Control with Exploring Starts [@Sutton1998]
- On-policy Monte Carlo Control [@Sutton1998]
- Off-policy Monte Carlo Control [@Sutton1998]
* __Temporal Differencing__
- Q-Learning [@Watkins1992]
- Sarsa [@Rummery1994]
- Expected Sarsa [@Sutton1998]
- n-step Sarsa [@Sutton1998]
* __Linear Function Approximation__
- Episodic Semi-gradient Sarsa [@Sutton1998]
* __Sampling__
- Random-sample one-step tabular Q-planning [@Sutton1998]
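To make the first of these methods concrete, here is a minimal sketch of tabular
value iteration in plain base R on a toy two-state MDP. It only illustrates the
algorithm and does not use the package's API; the transition matrices `P`, rewards
`R`, and discount factor `gamma` are made up for the example.

```{r value_iteration_sketch}
# Toy MDP: 2 states, 2 actions. P[[a]] is the transition matrix for action a
# (rows = current state, columns = next state), R[[a]] the expected immediate
# reward for taking a in each state, and gamma the discount factor.
P <- list(
  a1 = matrix(c(0.9, 0.1,
                0.2, 0.8), nrow = 2, byrow = TRUE),
  a2 = matrix(c(0.5, 0.5,
                0.0, 1.0), nrow = 2, byrow = TRUE)
)
R <- list(a1 = c(1, 0), a2 = c(0, 2))
gamma <- 0.9

# Value iteration: repeatedly apply the Bellman optimality update
# until the value function stops changing.
V <- c(0, 0)
repeat {
  Q <- sapply(names(P), function(a) R[[a]] + gamma * drop(P[[a]] %*% V))
  V_new <- apply(Q, 1, max)
  if (max(abs(V_new - V)) < 1e-8) break
  V <- V_new
}

V                                 # (near-)optimal state values
names(P)[apply(Q, 1, which.max)]  # greedy policy: best action per state
```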
These implementations follow the descriptions in [@Russell2020] and
[@Sutton1998]. The implementations represent the state space explicitly, so only
problems with small to medium-sized state spaces can be solved.
Partially observable Markov decision processes (POMDPs) can be modeled in a similar
fashion using the package **pomdp** [@CRAN_pomdp].
```{r echo=FALSE, results = 'asis'}
pkg_citation(pkg, 1)
pkg_install(pkg, CRAN = FALSE)
```
## Usage
Solving the simple maze from [@Russell2020].
```{r problem}
library("markovDP")
data("Maze")
Maze
```
The maze is a gridworld and can be displayed directly.
```{r display}
gw_plot(Maze, state = TRUE)
```
```{r solve}
sol <- solve_MDP(model = Maze)
sol
```
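The solver returns the solved model together with the policy it found. Assuming an
accessor called `policy()` is available for solved models (an assumption made here
for illustration; check the package documentation for the actual accessor), the
tabular policy could be inspected like this:

```{r policy, eval=FALSE}
# Hypothetical accessor (not run): show the chosen action for each state
# of the solved model.
policy(sol)
```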
Display the value function.
```{r value_function}
plot_value_function(sol)
```
The state values can be shown in the gridworld as a colored map.
```{r gridworld_plot}
gw_plot(sol)
```
## Acknowledgments
Development of this package was supported in part by the
National Institute of Standards and Technology (NIST) under grant number
[60NANB17D180](https://www.nist.gov/ctl/pscr/safe-net-integrated-connected-vehicle-computing-platform).
## References