
---
title: "Creek SelectR"
author: "Mwessel"
date: "7/4/2021"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)


rm(list = ls())

# Marcus suggests install.version(sf, .98), but I haven't done that yet.
library(sf)
library(tidyverse)
library(haven)
library(leaflet)
library(dplyr)
library(here)
library(data.table)
```

# Workflow to Update Tidal Creeks to Run61
This describes an initial attempt to update the dashboard with IWR Run61 data. Because some WBIDs include multiple creeks and some creeks span multiple WBIDs, it is not possible to simply subset the IWR dataset with a static WBID file and run the TC functions. Instead, stations (STA) must be assigned to a creek identifier (JEI) and a WBID using a 200 meter buffer around the creek line file. A further complication is that FDEP routinely changes WBID names and boundaries, so the WBID and station shapefiles released with each IWR run should be used to intersect the creek population, and the data should then be subset from that result. The workflow below uses as much R code as possible. Run 61 is used in the naming convention throughout; names change with each run, so replace run61 with the new run number.
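Because only the run number changes between runs, one option is to set it once and build the run-specific names from it. This is a sketch: `run_id` and `run_layers` are hypothetical names, and it assumes future runs keep the same naming pattern as the Run61 files used in this document.

```{r}
# Hypothetical helper: set the IWR run number once and derive the
# run-specific layer and file names from it (patterns copied from the Run61 names).
run_id <- 61

run_layers <- c(
  wbid      = sprintf("WBID_Run%d", run_id),
  stations  = sprintf("IWR_Stations_Run%d", run_id),
  line_wbid = sprintf("TidalCreek_ALL_Line_WBID%d", run_id),
  sta_join  = sprintf("TidalCreek_ALL_Line_StatWBID%d_join", run_id),
  iwr_rdata = sprintf("iwrraw_run%d.RData", run_id)
)

run_layers
```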

Shapefiles needed to produce the dashboard results:

* WBID_Run61 - shapefile generated by DEP for each run with all WBIDs in the state of Florida

* IWR_Stations_Run61 - DEP shapefile of all water quality and biology stations ever sampled in the state of Florida

* TidalCreek_ALL_Line - line file with CreekID and creek length (m). This line file is not broken into WBIDs but is used to create the 200 meter buffer

* TidalCreek_ALL_Line_Buff200m - polygon file created from the TidalCreek_ALL_Line file above with a 200 m buffer. A flat buffer is used so the buffer doesn't extend past the ends of the line

* TidalCreek_ALL_Line_WBID61 - line file joined to the IWR WBID shapefile so the line is broken into WBIDs, with CreekID, total length (m), WBID, and length in WBID (m).

* TidalCreek_ALL_Line_StatWBID61_Join - IWR station point file joined with the buffer file to select stations within the buffer, then joined to the WBID61 line file so that each station is matched with the closest creek. A distance field (m) is provided and the WBID is retained.
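To illustrate the flat-buffer point, here is a minimal sketch using `sf::st_buffer()` and its `endCapStyle` argument on a toy 1000 m straight line. The `toy_*` objects are illustrative only and planar coordinates in metres are assumed; this is not necessarily how TidalCreek_ALL_Line_Buff200m was actually built.

```{r}
library(sf)

# Toy 1000 m creek line; CRS left unset so buffering is planar (units treated as metres).
toy_line <- st_sfc(st_linestring(rbind(c(0, 0), c(1000, 0))))

# Flat end caps: the buffer stops at the line's end points.
toy_flat <- st_buffer(toy_line, dist = 200, endCapStyle = "FLAT")
# Default round end caps for comparison: reaches 200 m past each end.
toy_round <- st_buffer(toy_line, dist = 200)

st_area(toy_flat)   # 1000 m x 400 m = 400000
st_area(toy_round)  # larger, adds two half-circle caps
st_bbox(toy_flat)
st_bbox(toy_round)
```

With `endCapStyle = "FLAT"` the buffer's xmin is 0, the start of the line, while the default round caps extend to -200.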


## Step 1 Read in the creek line file and the Run61 WBID and station shapefiles to intersect

Use the static creek line file with its pre-established buffer to select the stations that fall in the buffer for each run. Since WBID boundaries and names change over time, it's probably best to redo the intersection that divides the line file by WBID for each run.


```{r}
# setwd() inside a knitr chunk only lasts for that chunk, so pass the folder as dsn instead
shp_dir <- here::here("shapes")

# Full IWR Run61 WBID and station files
IWR_WBID <- sf::read_sf(dsn = shp_dir, layer = "WBID_Run61")
IWR_STA  <- sf::read_sf(dsn = shp_dir, layer = "IWR_Stations_Run61") # this has all stations ever!

# This is the line file with buffer that can be used to select stations. This shouldn't change over time
Creek_buff <- sf::read_sf(dsn = shp_dir, layer = "TidalCreek_All_Line_Buff200m")

```
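The joins and intersections in Step 2 assume these layers share a coordinate reference system; sf stops with an error when they do not. Below is a hedged sketch of a check with a hypothetical helper `align_crs()`, on toy points. EPSG:3086 (NAD83 / Florida GDL Albers) is only an assumed target here, not necessarily the CRS of the DEP or creek files.

```{r}
library(sf)

# Hypothetical helper: reproject `x` to the CRS of `target` only when they differ.
align_crs <- function(x, target) {
  if (!isTRUE(st_crs(x) == st_crs(target))) {
    x <- st_transform(x, st_crs(target))
  }
  x
}

# Toy layers: a station in lon/lat (EPSG:4326) and a WBID feature in an
# assumed projected CRS in metres (EPSG:3086).
toy_sta  <- st_sf(sta = "A", geometry = st_sfc(st_point(c(-82.6, 27.8)), crs = 4326))
toy_wbid <- st_sf(wbid = "W1", geometry = st_sfc(st_point(c(0, 0)), crs = 3086))

toy_sta_aligned <- align_crs(toy_sta, toy_wbid)
st_crs(toy_sta_aligned) == st_crs(toy_wbid)
```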

## Step 2 Workflow to do the GIS work in R:

   1) Join IWR_STA and TidalCreek_All_Line_Buff200m and select the stations that fall in the buffer.
   2) Intersect the line file with IWR_WBID to create a line file that is broken up by WBID.
   3) Subset the unique WBIDs and stations from these files to pull from the IWR dataset.

```{r}

# MARCUS - HERE'S WHERE YOU COME IN

```
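One possible way to do steps 1 and 2, plus the closest-creek matching described for TidalCreek_ALL_Line_StatWBID61_Join, sketched on toy geometries. The `toy_*` objects, the coordinates, and the column names `sta`, `JEI`, `WBID`, `len_wbid_m` and `dist_m` are assumptions for illustration (the real station file uses STATION_ID, for example), and a projected CRS in metres is assumed. This is a sketch under those assumptions, not the final GIS workflow.

```{r}
library(sf)
library(dplyr)

# Toy creek lines (JEI) in an unset planar CRS, units treated as metres.
toy_creeks <- st_sf(
  JEI = c("C1", "C2"),
  geometry = st_sfc(
    st_linestring(rbind(c(0, 0), c(1000, 0))),
    st_linestring(rbind(c(0, 2000), c(1000, 2000)))
  )
) %>% st_set_agr("constant")

# Toy WBID polygons that split both creeks at x = 500.
toy_wbids <- st_sf(
  WBID = c("W1", "W2"),
  geometry = st_sfc(
    st_polygon(list(rbind(c(-500, -500), c(500, -500), c(500, 2500), c(-500, 2500), c(-500, -500)))),
    st_polygon(list(rbind(c(500, -500), c(1500, -500), c(1500, 2500), c(500, 2500), c(500, -500))))
  )
) %>% st_set_agr("constant")

# Toy stations: S1 and S2 sit near a creek, S3 is 1000 m from both.
toy_stations <- st_sf(
  sta = c("S1", "S2", "S3"),
  geometry = st_sfc(st_point(c(100, 150)), st_point(c(800, 1900)), st_point(c(500, 1000)))
)

# 1) Flat 200 m buffer around the creeks; keep the stations that intersect it.
toy_buff <- st_buffer(toy_creeks, dist = 200, endCapStyle = "FLAT")
toy_in_buff <- toy_stations[toy_buff, ]

# 2) Split the creek lines by WBID and record the length of each piece.
toy_line_wbid <- st_intersection(toy_creeks, toy_wbids)
toy_line_wbid$len_wbid_m <- as.numeric(st_length(toy_line_wbid))

# Match each buffered station to its closest creek piece, keeping JEI, WBID
# and the station-to-line distance.
toy_idx <- st_nearest_feature(toy_in_buff, toy_line_wbid)
toy_matched <- toy_in_buff %>%
  mutate(
    JEI    = toy_line_wbid$JEI[toy_idx],
    WBID   = toy_line_wbid$WBID[toy_idx],
    dist_m = as.numeric(st_distance(toy_in_buff, toy_line_wbid[toy_idx, ], by_element = TRUE))
  ) %>%
  st_drop_geometry()

toy_matched
```

`st_nearest_feature()` assigns each station to a single closest line piece, so a station inside two overlapping buffers goes to whichever creek is nearer, in line with the "closest creek" rule above.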


The results of this should look like the shapefiles read in below. Note: I don't think we need the creek_pop SAS file anymore. We should generate it in R from TidalCreek_ALL_Line_WBID61 by selecting JEI and WBID.
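Generating the creek population in R could be as simple as dropping the geometry from TidalCreek_ALL_Line_WBID61 and keeping the distinct JEI and WBID pairs. Here is a minimal sketch on a toy stand-in for that file; `toy_line_wbid61`, `toy_creek_pop` and `toy_wbids_needed` are hypothetical names, and it assumes the real file stores these fields as JEI and WBID.

```{r}
library(sf)
library(dplyr)

# Toy stand-in for TidalCreek_ALL_Line_WBID61: one row per creek piece within a WBID.
# C1 crosses two WBIDs; C2 has two separate pieces inside W2.
toy_line_wbid61 <- st_sf(
  JEI  = c("C1", "C1", "C2", "C2"),
  WBID = c("W1", "W2", "W2", "W2"),
  geometry = st_sfc(
    st_linestring(rbind(c(0, 0), c(500, 0))),
    st_linestring(rbind(c(500, 0), c(1000, 0))),
    st_linestring(rbind(c(500, 2000), c(700, 2000))),
    st_linestring(rbind(c(800, 2000), c(1000, 2000)))
  )
)

# Creek population: unique JEI x WBID pairs, without geometry.
toy_creek_pop <- toy_line_wbid61 %>%
  st_drop_geometry() %>%
  distinct(JEI, WBID) %>%
  arrange(JEI, WBID)

# Unique WBIDs to pull from the IWR dataset.
toy_wbids_needed <- sort(unique(as.character(toy_creek_pop$WBID)))

toy_creek_pop
toy_wbids_needed
```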


```{r}
# setwd() inside a knitr chunk only lasts for that chunk, so pass the folder as dsn instead
shp_dir <- here::here("shapes")

# Created shapes
Creek_sta <- sf::read_sf(dsn = shp_dir, layer = "TidalCreek_ALL_Line_StatWBID61_join") %>%
  mutate(sta = STATION_ID)
Creek_jei <- sf::read_sf(dsn = shp_dir, layer = "TidalCreek_ALL_Line_WBID61")


# Unique list of stations from the creek buffer join (no year filter is applied here)
buffer_sta <- tibble(sta = unique(Creek_sta$sta))

# save to use in index for creekstats page
save(buffer_sta, file = here('./data/buffer_sta.RData'))

```


## Step 3 Read in the full IWR dataset

The full IWR dataset was exported from the FDEP SAS database as a txt file reduced to only the needed parameters. See the SAS file export_iwr.sas, which exports the SAS IWR dataset to the text file. The output can be read in on a computer with 16 GB of RAM.

```{r readin}
here()

yearin <- 2010
thisyr <- 2021

# "Full reduced" means the full dataset reduced to only the variables needed for the creeks run
tmp <- fread('E:/run61/iwrrun61_full_reduced.txt', sep = '\t')

# Take the full IWR dataset, keep years after yearin (excluding thisyr),
# and keep only the stations identified by the buffer
iwrraw_run61 <- tmp %>%
  dplyr::filter(year > yearin & year != thisyr) %>%
  dplyr::select('wbid', 'class', 'sta', 'year', 'month', 'day', 'time', 'depth', 'depth2',
                'param', 'masterCode', 'result', 'rCode', 'newComment') %>%
  # keep only stations that fall inside the 200 m creek buffer
  dplyr::filter(sta %in% buffer_sta$sta)

rm(tmp)

save(iwrraw_run61, file = here::here('./data/iwrraw_run61.RData'))

```
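A quick toy check of which years the filter above keeps: with `yearin <- 2010` and `thisyr <- 2021`, `year > yearin & year != thisyr` retains 2011 through 2020, i.e. ten calendar years. The `toy_*` objects are illustrative and deliberately avoid overwriting `yearin` and `thisyr`.

```{r}
library(dplyr)

# Toy copy of the year window used in the readin chunk.
toy_yearin <- 2010
toy_thisyr <- 2021
toy_iwr <- tibble(year = 2008:2021, sta = "S1")

# Same filter: strictly after toy_yearin, excluding toy_thisyr.
toy_kept <- toy_iwr %>% filter(year > toy_yearin & year != toy_thisyr)

range(toy_kept$year)  # 2011 2020
nrow(toy_kept)        # 10
```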

Note: a station may fall in a WBID that is not part of the WBID population produced by the line file and WBID join. I think the dashboard functions drop anything that is not in the creek population somewhere along the way. This should be rare, but it is possible, and we may want to find a way to correct it.
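One way to hunt these cases down is an anti-join of the station/WBID pairs in the IWR data against the creek population. Here is a hedged sketch with toy tables; `toy_creek_pop`, `toy_iwr_sta` and `toy_orphans` are hypothetical names, and it assumes the IWR data uses a lowercase `wbid` column (as in the Step 3 select) while the creek population uses `WBID`.

```{r}
library(dplyr)

# Toy creek population (JEI x WBID) and station/WBID pairs from the IWR data.
toy_creek_pop <- tibble(JEI = c("C1", "C1", "C2"), WBID = c("W1", "W2", "W2"))
toy_iwr_sta   <- tibble(sta = c("S1", "S2", "S3"), wbid = c("W1", "W2", "W9"))

# Stations whose IWR wbid is not in the creek population (candidates for review).
toy_orphans <- toy_iwr_sta %>%
  distinct(sta, wbid) %>%
  anti_join(toy_creek_pop, by = c("wbid" = "WBID"))

toy_orphans
```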