[load] Minor tweak to loadvals() and update documentation (#880)

* [load] skip empty input on write
* [load] hide `calc_chain` argument and extend the documentation for `wb_load()`
* update NEWS
JanMarvin · Dec 31, 2023 · 85503cf
1 parent ca06d05

Showing 7 changed files with 120 additions and 91 deletions.
diff --git a/NEWS.md b/NEWS.md
@@ -1,5 +1,11 @@
 # openxlsx2 (development version)
 
+## Documentation improvement
+
+* Further tweaks to documentation and vignettes to make them more consistent.
+  * `wb_add_pivot_table()` / `wb_add_slicer()`
+  * `wb_load()`: `calc_chain` is no longer a visible argument, and the previous text, which may have been misleading about its use, has been replaced by a more detailed description of the consequences of keeping the calculation chain
+
 ## New features
 
 * Allow further modifications of comments. The background can now be filled with a color or an image. [870](https://github.com/JanMarvin/openxlsx2/pull/870)

diff --git a/R/class-workbook.R b/R/class-workbook.R
@@ -1924,23 +1924,20 @@ wbWorkbook <- R6::R6Class(
     #' @description load workbook
     #' @param file file
     #' @param data_only data_only
-    #' @param calc_chain calc_chain
     #' @return The `wbWorkbook` object invisibly
     load = function(
       file,
       sheet,
       data_only  = FALSE,
-      calc_chain = FALSE,
       ...
     ) {
       # Is this required?
-      if (missing(file)) file <- substitute()
+      if (missing(file))  file  <- substitute()
       if (missing(sheet)) sheet <- substitute()
       self <- wb_load(
         file       = file,
         sheet      = sheet,
         data_only  = data_only,
-        calc_chain = calc_chain,
         ...        = ...
       )
       invisible(self)

diff --git a/R/wb_load.R b/R/wb_load.R
@@ -1,60 +1,76 @@
-#' Load an existing .xlsx file
+#' Load an existing .xlsx, .xlsm or .xlsb file
 #'
-#' `wb_load()` returns a [wbWorkbook] object conserving styles and
-#' formatting of the original input file.
+#' `wb_load()` returns a [wbWorkbook] object conserving the content of the
+#' original input file, including data, styles, and media. This workbook can
+#' be modified, read from, and written back to an xlsx file.
 #'
-#' A warning is displayed if an xml namespace for main is found in the xlsx file.
-#' Certain xlsx files created by third-party applications contain a namespace
-#' (usually `x`). This namespace is not required for the file to work in spreadsheet
-#' software and is not expected by `openxlsx2`. Therefore it is removed when the
-#' file is loaded into a workbook. Removal is generally expected to be safe,
-#' but the feature is still experimental.
+#' @details
+#' If a specific `sheet` is selected, the workbook will still contain sheets
+#' for all worksheets. The arguments `sheet` and `data_only` are used internally
+#' by [wb_to_df()] to read from a file with minimal changes. They are not
+#' specifically designed to create rudimentary but otherwise fully functional
+#' workbooks. It is possible to import with
+#' `wb_load(data_only = TRUE, sheet = NULL)`. In this way, only a workbook
+#' framework is loaded without worksheets or data. This can be useful if only
+#' some workbook properties are of interest.
 #'
-#' Initial support for binary openxml files (`xlsb`) has been added to the package.
-#' We parse the binary file format into pseudo-openxml files that we can import.
-#' Therefore, after importing, it is possible to interact with the file as if it
-#' had been provided as xlsx in the first place. This is of course slower than
-#' reading directly from the binary file. Our implementation is also still missing
-#' some features: some array formulas are still broken, conditional formatting and
-#' data validation are not implemented, nor are pivot tables and slicers.
+#' There are some internal arguments that can be passed to `wb_load()`, which are
+#' used for development. The `debug` argument allows debugging of `xlsb` files
+#' in particular. With `calc_chain` it is possible to maintain the calculation
+#' chain. The calculation chain is used by spreadsheet software to determine
+#' the order in which formulas are evaluated. Removing the calculation chain
+#' has no known effect. The calculation chain is created the next time the
+#' worksheet is loaded into the spreadsheet. Keeping the calculation chain
+#' could only shorten the loading time in said software. Unfortunately, if a
+#' cell is added to the worksheet, the calculation chain may block the
+#' worksheet as the formulas will not be evaluated again until each individual
+#' cell with a formula is selected in the spreadsheet software and the Enter
+#' key is pressed manually. It is therefore strongly recommended not to
+#' activate this function.
 #'
-#' It is possible to import with `wb_load(data_only = TRUE, sheet = NULL)`. This
-#' way only a workbook skeleton is loaded. This can be useful if only some
-#' workbook properties are of interest.
+#' In rare cases, a warning is issued when loading an xlsx file that an xml
+#' namespace has been removed from xml files. This refers to the internal
+#' structure of the loaded xlsx file. Certain xlsx files created by third-party
+#' applications contain a namespace (usually `x`). This namespace is not required
+#' for the file to work in spreadsheet software and is not expected by
+#' `openxlsx2`. It is therefore removed when the file is loaded into a
+#' workbook. Removal is generally considered safe, but this case is still
+#' rarely observed, hence the warning.
+#'
+#' Initial support for binary openxml files (`xlsb`) has been added to the
+#' package. We parse the binary file format into pseudo-openxml files that we
+#' can import. Therefore, once imported, it is possible to interact with the
+#' file as if it had been provided in xlsx file format in the first place. This
+#' parsing into pseudo xml files is of course slower than reading directly from
+#' the binary file. Our implementation is also still missing some functions:
+#' some array formulas are not yet correct, conditional formatting and data
+#' validation are not implemented, nor are pivot tables and slicers.
 #'
 #' @param file A path to an existing .xlsx, .xlsm or .xlsb file
 #' @param sheet optional sheet parameter. if this is applied, only the selected
 #'   sheet will be loaded. This can be a numeric, a string or `NULL`.
 #' @param data_only mode to import if only a data frame should be returned. This
 #'   strips the `wbWorkbook` to a bare minimum.
-#' @param calc_chain optionally you can keep the calculation chain intact. This
-#'   is used by spreadsheet software to identify the order in which formulas are
-#'   evaluated. Removing the calculation chain is considered harmless. The calc
-#'   chain will be created upon the next time the worksheet is loaded in
-#'   spreadsheet software. Keeping it, might only speed loading time in said
-#'   software.
 #' @param ... additional arguments
 #' @return A Workbook object.
-#' @export
 #' @examples
-#' ## load existing workbook from package folder
-#' wb <- wb_load(file = system.file("extdata", "openxlsx2_example.xlsx", package = "openxlsx2"))
-#' wb$get_sheet_names() # list worksheets
-#' wb ## view object
-#' ## Add a worksheet
-#' wb$add_worksheet("A new worksheet")
+#' ## load existing workbook
+#' fl <- system.file("extdata", "openxlsx2_example.xlsx", package = "openxlsx2")
+#' wb <- wb_load(file = fl)
+#' @export
 wb_load <- function(
     file,
     sheet,
     data_only = FALSE,
-    calc_chain = FALSE,
     ...
 ) {
 
-  debug     <- list(...)$debug
-  xlsx_file <- list(...)$xlsx_file
+  calc_chain <- list(...)$calc_chain
+  debug      <- list(...)$debug
+  xlsx_file  <- list(...)$xlsx_file
   standardize_case_names(...)
 
+  if (is.null(calc_chain)) calc_chain <- FALSE
   if (is.null(debug)) debug <- FALSE
 
   if (!is.null(xlsx_file)) {
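
The hunk above removes `calc_chain` from the signature of `wb_load()` and instead looks it up in `...`, falling back to `FALSE` when it is not supplied. A minimal base R sketch of that pattern follows. The function `load_sketch()` and the file name `"book.xlsx"` are hypothetical stand-ins for illustration only; the sketch does not read any file and is not the package's implementation.

```r
# Hypothetical stand-in for wb_load(); only the argument handling is sketched.
load_sketch <- function(file, data_only = FALSE, ...) {
  # Mirror the lookup used in the diff: a name missing from `...` yields NULL
  calc_chain <- list(...)$calc_chain
  debug      <- list(...)$debug

  # Fall back to the defaults that the hidden arguments used to have
  if (is.null(calc_chain)) calc_chain <- FALSE
  if (is.null(debug))      debug      <- FALSE

  # Return the resolved settings instead of loading a workbook
  list(
    file       = file,
    data_only  = data_only,
    calc_chain = calc_chain,
    debug      = debug
  )
}

# Not supplied: the default applies
res_default <- load_sketch("book.xlsx")
res_default$calc_chain  # FALSE

# Supplied through `...`: the value is picked up
res_keep <- load_sketch("book.xlsx", calc_chain = TRUE)
res_keep$calc_chain     # TRUE
```

With this approach the argument no longer appears in the function signature or the generated help page, while callers who pass it by name still get the previous behaviour.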

diff --git a/inst/WORDLIST b/inst/WORDLIST
@@ -47,7 +47,6 @@ autocompletion
 bandedCols
 bandedRows
 bool
-calc
 calcChain
 calculatedColumn
 camelCase

diff --git a/man/wbWorkbook.Rd b/man/wbWorkbook.Rd
diff --git a/man/wb_load.Rd b/man/wb_load.Rd
diff --git a/src/openxlsx2_types.h b/src/openxlsx2_types.h
@@ -99,21 +99,21 @@ inline SEXP wrap(const std::vector<xml_col> &x) {
 
   // struct to vector
   for (size_t i = 0; i < n; ++i) {
-    r[i] = Rcpp::String(x[i].r);
-    row_r[i] = Rcpp::String(x[i].row_r);
-    c_r[i]   = Rcpp::String(x[i].c_r);
-    c_s[i]   = Rcpp::String(x[i].c_s);
-    c_t[i]   = Rcpp::String(x[i].c_t);
-    c_cm[i]  = Rcpp::String(x[i].c_cm);
-    c_ph[i]  = Rcpp::String(x[i].c_ph);
-    c_vm[i]  = Rcpp::String(x[i].c_vm);
-    v[i]     = Rcpp::String(x[i].v);
-    f[i]     = Rcpp::String(x[i].f);
-    f_t[i]   = Rcpp::String(x[i].f_t);
-    f_ref[i] = Rcpp::String(x[i].f_ref);
-    f_ca[i]  = Rcpp::String(x[i].f_ca);
-    f_si[i]  = Rcpp::String(x[i].f_si);
-    is[i]    = Rcpp::String(x[i].is);
+    if (!x[i].r.empty())     r[i]     = Rcpp::String(x[i].r);
+    if (!x[i].row_r.empty()) row_r[i] = Rcpp::String(x[i].row_r);
+    if (!x[i].c_r.empty())   c_r[i]   = Rcpp::String(x[i].c_r);
+    if (!x[i].c_s.empty())   c_s[i]   = Rcpp::String(x[i].c_s);
+    if (!x[i].c_t.empty())   c_t[i]   = Rcpp::String(x[i].c_t);
+    if (!x[i].c_cm.empty())  c_cm[i]  = Rcpp::String(x[i].c_cm);
+    if (!x[i].c_ph.empty())  c_ph[i]  = Rcpp::String(x[i].c_ph);
+    if (!x[i].c_vm.empty())  c_vm[i]  = Rcpp::String(x[i].c_vm);
+    if (!x[i].v.empty())     v[i]     = Rcpp::String(x[i].v);
+    if (!x[i].f.empty())     f[i]     = Rcpp::String(x[i].f);
+    if (!x[i].f_t.empty())   f_t[i]   = Rcpp::String(x[i].f_t);
+    if (!x[i].f_ref.empty()) f_ref[i] = Rcpp::String(x[i].f_ref);
+    if (!x[i].f_ca.empty())  f_ca[i]  = Rcpp::String(x[i].f_ca);
+    if (!x[i].f_si.empty())  f_si[i]  = Rcpp::String(x[i].f_si);
+    if (!x[i].is.empty())    is[i]    = Rcpp::String(x[i].is);
   }
 
   // Assign and return a dataframe