A-NDVI.Rmd

---
output: 
  html_document:
    toc: TRUE
    toc_float: TRUE
---

# Appendix A: NDVI {-}

## Overview {-}

In this tutorial, we will learn to deal with raster data. For illustrative purposes, we will take a look at the NDVI data (Normalized Difference Vegetation Index) of 21 large counties in the Midwest. Our objectives are to:

* Visualize raster data using the "tmap" package
* Learn the basic techniques of cropping and masking
* Check and analyze summary statistics of raster data

## Environment Setup {-}

### Input/Output {-}

Our inputs include the shapefile of 21 large counties in the Midwest and the quarterly data of NDVI. The files are available for download on this project's [GitHub repo](https://github.com/GeoDaCenter/OpenAirQ-toolkit/tree/master/data).

### Load Libraries {-}

We will use the following packages in this tutorial:

* `raster`: to manipulate and analyze raster data
* `sf`: to conduct basic spatial data manipulation
* `tamp`: to create spatial data visualization

```{r A_package.setup, message = FALSE}
library(raster)
library(sf)
library(tmap)
```


Since we are mainly analyzing raster data in this tutorial, we use the "raster" package heavily, which "implements basic and high-level functions for raster data and for vector data operations such as intersections." The detailed documentation can be found [here](https://cran.r-project.org/web/packages/raster/raster.pdf).

### Load Data {-}

The data we are analyzing are the quarterly average of the Normalized Difference Vegetation Index for 21 large counties in the Midwest, from 2014 to 2018, which are stacked chronologically. Note that the original dataset is quite large (over 50 gigabytes), so we processed the data in advance. We can load the pre-processed data simply by running the code below. 


```{r A_loading ndvi data}
ndvi.quarterly <- stack("./data/NDVILargeAreaQuarterlyStack.tif")
```


(Remark: You may encounter trouble loading the data if you do not have the "rgdal" package installed!)

It is a good idea to take a glance at our dataset before we proceed any further.


```{r A_examining ndvi data}
ndvi.quarterly
```


Also remember to load the shapfile of the 21 large counties. We will use it shortly when we start plotting. 


```{r A_loading shapefile ndvi}
counties <- st_read("./data/LargeAreaCounties")
```

## Data manipulation and Plotting {-}

We start our analysis by looking at the data for one specific quarter - the 3rd quarter of 2018. Since this is the 19th layer of our dataset (recall that our dataset is stacked chronologically), we can easily extract the data using the line of code shown below.


```{r A_extracting ndvi data}
this.qtr <- raster::subset(ndvi.quarterly, 19, drop = FALSE)
```


Thanks to the data processing done beforehand, we don't have much data manipulation to do. It is time for us to start making plots! We begin with the most basic raster map.


```{r message=FALSE, eval = T}
tm_shape(this.qtr) +
  tm_raster() +
  tm_layout(legend.outside = TRUE)
```


This plot gives an overview of the NDVI of the 21 large counties. Without the county borders explicitly drawn, we are unable to compare the NDVI across counties. Moreover, the plot is far from aesthetically pleasing.

Hence we re-draw the graph, this time with counties borders as well as an informative title for the plot. In addition, we modify the title of the legend so that it is more comprehensible to readers. 


```{r fig.align='center', message = FALSE, eval = T}
tm_shape(this.qtr) +
  tm_raster(title = "NDVI") +
  tm_shape(counties) +
  tm_borders() +
  tm_layout(legend.outside = TRUE, main.title = "NDVI - The 3rd Quarter of 2018")
```


With this nicer plot, it is possible to compare the NDVI of different counties. Those who are familiar with the geography of the Midwest would immediately recognize that Cook county seems to have a lower NDVI than other counties. It is not at all surprising since the City of Chicago is located in Cook County, and metropolitan areas tend to have less vegetation than rural areas.

We can also make the plot interactive so that the readers can explore the plot more thoroughly.


```{r A_interavtive ndvi, message = FALSE, warning = FALSE, eval = T}
tmap_mode("view")

tm_shape(this.qtr) +
  tm_raster(title = "NDVI") +
  tm_shape(counties) +
  tm_borders() +
  tm_layout(legend.outside = TRUE, main.title = "NDVI - the 3rd Quarter of 2018")
```


We can turn off the interactive mode by running the following code. 


```{r A_Plot Mode ndvi, message = FALSE, warning = FALSE, eval = T}
tmap_mode("plot")
```


Before we jump to the next section, it may be helpful to quickly examine the summary statistics for the sub-dataset `this.qtr`. Notice that we use `raster::summary` here to specify that we want R to use the `summary` function from the "raster" package.


```{r A_summary ndvi, message = FALSE, warning = FALSE}
raster::summary(this.qtr)
```

## More Plotting {-}


Now that we have learned how to make a raster map, let's make more. In this section, we will place our focus on the NDVI of Cook County, IL - the home county of Chicago. First, we will extract the shape of Cook County.


```{r A_Extract Cook County Shape ndvi}
cook.county <- counties[counties$COUNTYNAME == "Cook", ]
```


Before we do anything else, we can plot the shape of Cook County that we just extracted to make sure we subsetted the dataset correctly. 


```{r A_Plot Cook County Shape ndvi, fig.align='center'}
plot(cook.county$geometry)
```


This is a very crude plot since it only shows the border of Cook County - we confirm the general shape of Cook County seems correct. If you are unsatisfied with how this map looks, rest assured that our final product is much more visually pleasing than this!

The following two lines of code crop and mask raster data to Cook County. Cropping and then masking accelerates the actions for large raster data. It probably does not matter in our case, since we are only looking at Cook County, and the volume of our data is rather small.


```{r A_Crop and Mask ndvi}
cook.ndvi <- raster::crop(this.qtr, cook.county)
cook.ndvi <- raster::mask(cook.ndvi, cook.county)
```


Now we can plot a raster map for Cook County. The commands we use here are very similar to the ones we used before. 


```{r A_Plot Cook County ndvi, fig.align='center', eval = T}
tm_shape(cook.ndvi) +
  tm_raster(title = "NDVI", palette = "Greens") +
  tm_shape(cook.county) +
  tm_borders() +
  tm_layout(legend.outside = TRUE, main.title = "NDVI for Cook County:\nThe 3rd Quarter of 2018")
```

With this raster map, we are able to discern patterns of NDVI with Cook County. The middle-east part of Cook County, where the City of Chicago is located, is shown in a lighter green than other parts of the county, signaling a lower NDVI. The northern and southern parts of the county, which are mostly rural areas, are shown to have a higher NDVI, much as we would expect. 

Before we conclude this tutorial, let's again take a look at the summary statistics.


```{r}
raster::summary(cook.ndvi)
base::mean(na.omit(getValues(cook.ndvi))) 
base::mean(na.omit(getValues(this.qtr))) 
```


The mean NDVI for Cook County is 0.52, while the mean for all 21 counties is 0.67. This is reasonable because NDVI tends to be lower in large cities, and Cook County happens to be at the center of the Chicago metropolitan area. Moreover, this is consistent with what we observe from the plots - Cook County is shown in a lighter green than other counties, which we have pointed out earlier in the tutorial. 

This marks the end of our tutorial. Hopefully, now you feel comfortable dealing with raster data. Of course, there are many more techniques to be learned in order to conduct more sophisticated spatial analysis with raster data. There are abundant online resources that introduce more complicated tools to handle raster data, which you may want to explore on your own.