change README

super-lou · Aug 29, 2024 · e35de7b
1 parent 816b3c5

Showing 6 changed files with 103 additions and 61 deletions.
diff --git a/R/CARD_management.R b/R/CARD_management.R
@@ -27,7 +27,7 @@
 #'
 #' @param CARD_path A [character][base::character] string representing the path to the downloaded CARD directory (it should end with `"CARD"`). In this directory, the `"__all__"` subdirectory contains the CARDs that you can select for an analysis (see [layout] to know how to specify which CARD you want).
 #' @param CARD_out A [character][base::character] string for a path to a directory where the CARD parameterization files will be copied and pasted for an analysis. Default is `NULL` if you want to use a `tmp` subdirectory in the `CARD_path` directory.
-#' @param layout A [character][base::character] [vector][base::c()] specifying the tree structure of files that you want for your analysis. Each element of the vector represents either:
+#' @param layout A [character][base::character] string [vector][base::c()] specifying the tree structure of files that you want for your analysis. Each element of the vector represents either:
 #' * the name of an analysis directory (e.g., `"EX"`)
 #' * the beginning and ending of an analysis directory: `"["` for the start and `"]"` for the end
 #' * the CARD name (e.g., `"QA"`)

diff --git a/R/CARD_process.R b/R/CARD_process.R
@@ -117,7 +117,7 @@ get_last_Process = function (Process) {
 #'
 #' @param data Input data format is a [tibble][tibble::tibble()] from the tibble package. It needs to have:
 #' * Only one column of [Date][base::Date] that are regularly spaced and unique for each time series.
-#' * If there is more than one time serie, at least one column needs to be of [character][base::character] for names of time series in order to identify them. If more than one column of identifier is given, they will all be used in order to identify a unique time serie.
+#' * If there is more than one time series, at least one column needs to be of [character][base::character] strings giving the names of the time series in order to identify them. If more than one identifier column is given, they will all be used together to identify a unique time series.
 #' * At least one column of [numeric][base::numeric] (or [logical][base::logical]) on which the process of variable extraction will be performed. More numerical columns can be left in, but if they are not used, they will be removed.
 #'
 #' e.g.
@@ -142,14 +142,14 @@ get_last_Process = function (Process) {
 #' @param CARD_tmp If you want to locate the `CARD_dir` directory somewhere other than in the `CARD_path` directory, you can specify a [character][base::character] string in `CARD_tmp` for a path where the `CARD_dir` subdirectory of CARDs will be searched. Default is `NULL` if you want to locate the `CARD_dir` subdirectory of CARDs in `CARD_path`.
 #' @param CARD_dir A [character][base::character] string for the name of a subdirectory in `CARD_path` (or `CARD_tmp`) where the CARD parameterization files are located for an analysis. Default is `"WIP"`.
 #' @param CARD_name By default, all CARDs in the `CARD_dir` directory will be used for the analysis. However, you can specify a [vector][base::c()] of [character][base::character] strings with the names of the CARDs to be used. Default is `NULL` for using all the CARDs.
-#' @param period_default A [vector][base::c()] of two [dates][base::Date] (or two unambiguous [character][base::character] that can be coerced to [dates][base::Date]) to restrict the period of analysis. As an example, it can be `c("1950-01-01", "2020-12-31")` to select data from the 1st January of 1950 to the end of December of 2020. The default option is `period=NULL`, which considers all available data for each time serie.
-#' @param suffix A [character][base::character] [vector][base::c()] representing suffixes to be appended to the column names of the extracted variables. This parameter allows handling multiple extraction scenarios. For example, a cumbersome case can be to have a unique function to apply to a multiple list of column. It is possible to give `funct=list(QA_obs=mean, QA_sim=mean)` and `funct_args=list(list("Q_obs", na.rm=TRUE), list("Q_sim", na.rm=TRUE))` or simply `funct=list(QA=mean)` and `funct_args=list("Q", na.rm=TRUE)` with `suffix=c("obs", "sim")`. The two approach give the same result. Default `NULL`.
-#' @param suffix_delimiter [character][base::character] specifies the delimiter to use between the variable name and the suffix if not `NULL`. The default is `"_"`.
+#' @param period_default A [vector][base::c()] of two [dates][base::Date] (or two unambiguous [character][base::character] strings that can be coerced to [dates][base::Date]) to restrict the period of analysis. As an example, it can be `c("1950-01-01", "2020-12-31")` to select data from the 1st of January 1950 to the end of December 2020. The default option is `period=NULL`, which considers all available data for each time series.
+#' @param suffix A [character][base::character] string [vector][base::c()] representing suffixes to be appended to the column names of the extracted variables. This parameter allows handling multiple extraction scenarios. For example, a cumbersome case is having the same function to apply to several columns. It is possible to give `funct=list(QA_obs=mean, QA_sim=mean)` and `funct_args=list(list("Q_obs", na.rm=TRUE), list("Q_sim", na.rm=TRUE))` or simply `funct=list(QA=mean)` and `funct_args=list("Q", na.rm=TRUE)` with `suffix=c("obs", "sim")`. The two approaches give the same result. Default `NULL`.
+#' @param suffix_delimiter A [character][base::character] string specifying the delimiter to use between the variable name and the suffix if not `NULL`. The default is `"_"`.
 #' @param cancel_lim A [logical][base::logical] to specify whether to cancel the NA percentage limits in the CARDs. Default is `FALSE`.
 #' @param simplify A [logical][base::logical] to specify whether to simplify the extracted data by joining the [tibble][tibble::tibble()] extracted from each CARD. Useful when the extracted variable has no temporal extension. Default `FALSE`.
 #' @param expand_overwrite [logical][base::logical] or `NULL`. If `TRUE`, expand the output [tibble][tibble::tibble()] as a [list][base::list()] of [tibbles][tibble::tibble()] for each extracted variable by `suffix`.
 #' Default `NULL` to keep the value specified in the CARDs used.
-#' @param sampling_period_overwrite A [character][base::character] or a [vector][base::c()] of two [characters][base::character] that will indicate how to sample the data for each time step defined by `time_step`. Hence, the choice of this argument needs to be link with the choice of the time step. For example, for a yearly extraction so if `time_step` is set to `"year"`, `sampling_period` needs to be formated as `%m-%d` (a month - a day of the year) in order to indicate the start of the sampling of data for the current year. More precisly, if `time_step="year"` and `sampling_period="03-19"`, `funct` will be apply on every data from the 3rd march of each year to the 2nd march of the following one. In this way, it is possible to create a sub-year sampling with a [vector][base::c()] of two [characters][base::character] as `sampling_period=c("02-01", "07-31")` in order to process data only if the date is between the 1st february and the 31th jully of each year.
+#' @param sampling_period_overwrite A [character][base::character] string or a [vector][base::c()] of two [character][base::character] strings that will indicate how to sample the data for each time step defined by `time_step`. Hence, the choice of this argument needs to be linked with the choice of the time step. For example, for a yearly extraction, so if `time_step` is set to `"year"`, `sampling_period` needs to be formatted as `%m-%d` (a month - a day of the year) in order to indicate the start of the sampling of data for the current year. More precisely, if `time_step="year"` and `sampling_period="03-19"`, `funct` will be applied to all data from the 19th of March of each year to the 18th of March of the following one. In this way, it is possible to create a sub-year sampling with a [vector][base::c()] of two [character][base::character] strings as `sampling_period=c("02-01", "07-31")` in order to process data only if the date is between the 1st of February and the 31st of July of each year.
 #' *not available for now* For a monthly (or seasonal) extraction, `sampling_period` needs to give only the day of the month, so for example `sampling_period="10"` to extract data from the 10th of each month to the 9th of the following month.
 #' Default `NULL` to keep the value specified in the CARDs used.
 #' @param rmNApct [logical][base::logical]. Should the `NApct` column, which shows the percentage of missing values in the output, be removed? Default `TRUE`.
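
The `sampling_period` semantics described in this hunk can be sketched in base R. This is only an illustration of the documented behaviour, not the package's implementation: `sampling_year()` and `in_window()` are hypothetical helper names, and the sketch assumes `"03-19"` is read as `%m-%d` (the 19th of March), that a two-element window does not wrap around the end of the year, and that `"02-29"` is not used as a start date.

```r
# Hypothetical helper (not part of the package): label each date with the
# sampling year it belongs to when the year starts on `sampling_period`.
sampling_year <- function(dates, sampling_period = "03-19") {
  dates <- as.Date(dates)
  year <- as.integer(format(dates, "%Y"))
  # Start of the sampling year inside the calendar year of each date
  start <- as.Date(paste0(year, "-", sampling_period), format = "%Y-%m-%d")
  # Dates before that start belong to the sampling year begun the year before
  ifelse(dates < start, year - 1L, year)
}

# Hypothetical helper: keep only dates inside a sub-year window given as two
# "%m-%d" strings (assumes the window does not wrap around the new year).
in_window <- function(dates, sampling_period = c("02-01", "07-31")) {
  md <- format(as.Date(dates), "%m-%d")
  md >= sampling_period[1] & md <= sampling_period[2]
}

dates <- as.Date(c("2000-03-18", "2000-03-19", "2001-03-18", "2001-03-19"))
sampling_year(dates)
```

With these labels, a yearly extraction amounts to something like `tapply(values, sampling_year(dates), mean)`, where `values` is an assumed numeric vector aligned with `dates`.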

diff --git a/R/process_extraction.R b/R/process_extraction.R
@@ -26,7 +26,7 @@
 #'
 #' @param data Input data format is a [tibble][tibble::tibble()] from the tibble package. It needs to have:
 #' * Only one column of [Date][base::Date] that are regularly spaced and unique for each time series.
-#' * If there is more than one time serie, at least one column needs to be of [character][base::character] for names of time series in order to identify them. If more than one column of identifier is given, they will all be used in order to identify a unique time serie.
+#' * If there is more than one time series, at least one column needs to be of [character][base::character] strings giving the names of the time series in order to identify them. If more than one identifier column is given, they will all be used together to identify a unique time series.
 #' * At least one column of [numeric][base::numeric] (or [logical][base::logical]) on which the process of variable extraction will be performed. More numerical columns can be left in, but if they are not used, they will be removed.
 #'
 #' e.g.
@@ -49,7 +49,7 @@
 #' 
 #' @param funct The function that you want to use for the process of variable extraction. More specifically, it is possible to give a named [list][base::list()] of several functions, in which the name of each element is used as the name of the corresponding extracted column. A simple case will be `funct=mean` and a more complicated one `funct=list(QA=mean, QJXA=max)`. Default [max][base::max()].
 #' @param funct_args A [list][base::list()] of [lists][base::list()] of named arguments needed for each function provided through `funct`. This [list][base::list()] can be a simple [list][base::list()] if there is only one function given by `funct`. An argument can refer to a column name in order to specify on which numerical column the extraction will be performed. For the simple example, `funct_args=list("Q_obs", na.rm=TRUE)` and for the more complex case `funct_args=list(list("Q_obs", na.rm=TRUE), list("Q_sim", na.rm=FALSE))`. Default [list][base::list()].
-#' @param time_step A [character][base::character] specifying the time step of the variable extraction process. Possible values are :
+#' @param time_step A [character][base::character] string specifying the time step of the variable extraction process. Possible values are:
 #' - "year" for a value per year
 #' - "month" for a value for each month of the year (so 12 values if at least a full year is given)
 #' - "year-month" for a value for each month of each year (so 12 times the number of years given)
@@ -58,18 +58,18 @@
 #' - "yearday" for one value per day of the year (so 365 values at the end if at least a full year is given... but more than one year seems obviously more interesting)
 #' - "none" if you want to extract a unique value for the whole time series
 #' Default `"year"`.
-#' @param sampling_period A [character][base::character] or a [vector][base::c()] of two [characters][base::character] that will indicate how to sample the data for each time step defined by `time_step`. Hence, the choice of this argument needs to be link with the choice of the time step. For example, for a yearly extraction so if `time_step` is set to `"year"`, `sampling_period` needs to be formated as `%m-%d` (a month - a day of the year) in order to indicate the start of the sampling of data for the current year. More precisly, if `time_step="year"` and `sampling_period="03-19"`, `funct` will be apply on every data from the 3rd march of each year to the 2nd march of the following one. In this way, it is possible to create a sub-year sampling with a [vector][base::c()] of two [characters][base::character] as `sampling_period=c("02-01", "07-31")` in order to process data only if the date is between the 1st february and the 31th jully of each year.
+#' @param sampling_period A [character][base::character] string or a [vector][base::c()] of two [character][base::character] strings that will indicate how to sample the data for each time step defined by `time_step`. Hence, the choice of this argument needs to be linked with the choice of the time step. For example, for a yearly extraction, so if `time_step` is set to `"year"`, `sampling_period` needs to be formatted as `%m-%d` (a month - a day of the year) in order to indicate the start of the sampling of data for the current year. More precisely, if `time_step="year"` and `sampling_period="03-19"`, `funct` will be applied to all data from the 19th of March of each year to the 18th of March of the following one. In this way, it is possible to create a sub-year sampling with a [vector][base::c()] of two [character][base::character] strings as `sampling_period=c("02-01", "07-31")` in order to process data only if the date is between the 1st of February and the 31st of July of each year.
 #' *not available for now* For a monthly (or seasonal) extraction, `sampling_period` needs to give only the day of the month, so for example `sampling_period="10"` to extract data from the 10th of each month to the 9th of the following month.
 #' Default `NULL`.
-#' @param period A [vector][base::c()] of two [dates][base::Date] (or two unambiguous [character][base::character] that can be coerced to [dates][base::Date]) to restrict the period of analysis. As an example, it can be `c("1950-01-01", "2020-12-31")` to select data from the 1st January of 1950 to the end of December of 2020. The default option is `period=NULL`, which considers all available data for each time serie.
+#' @param period A [vector][base::c()] of two [dates][base::Date] (or two unambiguous [character][base::character] strings that can be coerced to [dates][base::Date]) to restrict the period of analysis. As an example, it can be `c("1950-01-01", "2020-12-31")` to select data from the 1st of January 1950 to the end of December 2020. The default option is `period=NULL`, which considers all available data for each time series.
 #' @param is_date [logical][base::logical]. If `TRUE`, `process_extraction()` will convert the result of the application of `funct` to a day of the year. The aim is for example to give `funct=which.min` and if `is_date=TRUE`, the result will not be the index of the minimum of the sample but the associated day of the year given by an [integer][base::integer] (1 is the 1st of January). Default `FALSE`.
 #' @param NApct_lim [numeric][base::numeric]. The maximum percentage of missing values allowed in each sample. If this threshold is exceeded, the value associated with the current sample will be converted to `NA`. Default `NULL`.
 #' @param NAyear_lim [numeric][base::numeric]. The maximum number of continuous missing years allowed. If this threshold is exceeded, the time series will be split in two around the problematic period and only the longest part will be used for the extraction process. Default `NULL`.
-#' @param Seasons A [vector][base::c()] of [characters][base::character] that indicates the seasonal pattern of a year. All months of the year needs to be contain in the `Seasons` variable. Give months circulary in a vector in which each element is a character chain of several months identify by the first letter of their names. The default is `Seasons=c("DJF", "MAM", "JJA", "SON")` but it can be set for example to `Seasons=c("MAMJJA", "SONDJF")`.
-#' @param nameEX A [character][base::character] specifying the name of the column of the extracted variable if no name is given in `funct`. Default is `"X"`.
-#' @param suffix A [character][base::character] [vector][base::c()] representing suffixes to be appended to the column names of the extracted variables. This parameter allows handling multiple extraction scenarios. For example, a cumbersome case can be to have a unique function to apply to a multiple list of column. It is possible to give `funct=list(QA_obs=mean, QA_sim=mean)` and `funct_args=list(list("Q_obs", na.rm=TRUE), list("Q_sim", na.rm=TRUE))` or simply `funct=list(QA=mean)` and `funct_args=list("Q", na.rm=TRUE)` with `suffix=c("obs", "sim")`. The two approach give the same result. Default `NULL`.
-#' @param suffix_delimiter [character][base::character] specifies the delimiter to use between the variable name and the suffix if not `NULL`. The default is `"_"`.
-#' @param keep *in developpement* A [character][base::character] [vector][base::c()] of column names to keep in the output [tibble][tibble::tibble()]. In the current state, `keep` can only be set to `NULL` if you don't want to keep anythings in the output besides the usefull column, or `"all"` if you want to conserve all the initial column in the output column.
+#' @param Seasons A [vector][base::c()] of [character][base::character] strings that indicate the seasonal pattern of a year. All months of the year need to be contained in the `Seasons` variable. Give the months in cyclic order in a vector in which each element is a character string of several months, each identified by the first letter of its name. The default is `Seasons=c("DJF", "MAM", "JJA", "SON")` but it can be set for example to `Seasons=c("MAMJJA", "SONDJF")`.
+#' @param nameEX A [character][base::character] string specifying the name of the column of the extracted variable if no name is given in `funct`. Default is `"X"`.
+#' @param suffix A [character][base::character] string [vector][base::c()] representing suffixes to be appended to the column names of the extracted variables. This parameter allows handling multiple extraction scenarios. For example, a cumbersome case is having the same function to apply to several columns. It is possible to give `funct=list(QA_obs=mean, QA_sim=mean)` and `funct_args=list(list("Q_obs", na.rm=TRUE), list("Q_sim", na.rm=TRUE))` or simply `funct=list(QA=mean)` and `funct_args=list("Q", na.rm=TRUE)` with `suffix=c("obs", "sim")`. The two approaches give the same result. Default `NULL`.
+#' @param suffix_delimiter A [character][base::character] string specifying the delimiter to use between the variable name and the suffix if not `NULL`. The default is `"_"`.
+#' @param keep *in development* A [character][base::character] string [vector][base::c()] of column names to keep in the output [tibble][tibble::tibble()]. In the current state, `keep` can only be set to `NULL` if you don't want to keep anything in the output besides the useful columns, or `"all"` if you want to keep all the initial columns in the output.
 #' Warning: The number of rows in the output with `keep="all"` will, as a result, be the same as in the input. For example, the extracted value for a year from a daily time series will be assigned to the first day of that year, and `NaN` will be assigned to every other value in the output. Default `NULL`.
 #' @param compress [logical][base::logical]. If `time_step` is set to `"month"`, `"year-month"`, `"season"` or `"year-season"`, should the function return a standard [tibble][tibble::tibble()] or a compressed one? When `compress = TRUE`, the function will perform a [pivot_wider][tidyr::pivot_wider()] operation to display the month or season information in columns instead of rows. Default `FALSE`.
 #'
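
The `Seasons` convention documented above (months given in cyclic order and identified by their first letters) can be illustrated with a short base R sketch. The helper name `season_of_month()` is hypothetical and the package may parse the pattern differently. The sketch only assumes that the concatenated elements form a rotation of `"JFMAMJJASOND"`.

```r
# Hypothetical helper (not part of the package): return, for each month
# number 1 to 12, the season label it belongs to.
season_of_month <- function(Seasons = c("DJF", "MAM", "JJA", "SON")) {
  initials <- "JFMAMJJASOND"            # first letters of January..December
  pattern <- paste(Seasons, collapse = "")
  stopifnot(nchar(pattern) == 12)       # every month must appear once
  # The pattern must be a rotation of the calendar year, so look for it in
  # the initials repeated twice.
  start <- as.integer(regexpr(pattern, paste0(initials, initials), fixed = TRUE))
  stopifnot(start > 0)
  # Month numbers in the order in which the pattern lists them
  months <- ((start - 1 + 0:11) %% 12) + 1
  # Repeat each season label once per month it contains
  labels <- rep(Seasons, times = nchar(Seasons))
  out <- character(12)
  out[months] <- labels
  out
}

season_of_month()
season_of_month(c("MAMJJA", "SONDJF"))
```

With the default pattern, December, January and February map to `"DJF"`. With `c("MAMJJA", "SONDJF")`, March to August map to `"MAMJJA"` and the remaining months to `"SONDJF"`.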