From 437ad5f7b380fb631a1dac3f3d3477fc5716ac99 Mon Sep 17 00:00:00 2001 From: Kyle Haynes <5267027+KyleHaynes@users.noreply.github.com> Date: Sun, 22 Sep 2024 08:34:06 +1000 Subject: [PATCH 01/10] fix inconsistent code formatting in markdown vignette. --- vignettes/datatable-intro.Rmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/vignettes/datatable-intro.Rmd b/vignettes/datatable-intro.Rmd index 5f2718ec1..e08280c5c 100644 --- a/vignettes/datatable-intro.Rmd +++ b/vignettes/datatable-intro.Rmd @@ -665,7 +665,7 @@ We can do much more in `i` by keying a `data.table`, which allows for blazing fa 3. Compute on columns: `DT[, .(sum(colA), mean(colB))]`. -4. Provide names if necessary: `DT[, .(sA =sum(colA), mB = mean(colB))]`. +4. Provide names if necessary: `DT[, .(sA = sum(colA), mB = mean(colB))]`. 5. Combine with `i`: `DT[colA > value, sum(colB)]`. From 5222ed58ae41afd7f05124903eaaf44cf5cd587d Mon Sep 17 00:00:00 2001 From: Kyle Haynes <5267027+KyleHaynes@users.noreply.github.com> Date: Sun, 22 Sep 2024 09:12:10 +1000 Subject: [PATCH 02/10] minor fixes to Rmd files --- vignettes/datatable-faq.Rmd | 2 +- vignettes/datatable-importing.Rmd | 4 ++-- vignettes/datatable-reshape.Rmd | 2 +- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/vignettes/datatable-faq.Rmd b/vignettes/datatable-faq.Rmd index 8d949bfa9..eb03ce694 100644 --- a/vignettes/datatable-faq.Rmd +++ b/vignettes/datatable-faq.Rmd @@ -632,7 +632,7 @@ Yes, for both 32-bit and 64-bit on all platforms. Thanks to CRAN. There are no s ## I think it's great. What can I do? Please file suggestions, bug reports and enhancement requests on our [issues tracker](https://github.com/Rdatatable/data.table/issues). This helps make the package better. -Please do star the package on [GitHub](https://github.com/Rdatatable/data.table/wiki). This helps encourage the developers and helps other R users find the package. +Please do star the package on [GitHub](https://github.com/Rdatatable/data.table). This helps encourage the developers and helps other R users find the package. You can submit pull requests to change the code and/or documentation yourself; see our [Contribution Guidelines](https://github.com/Rdatatable/data.table/blob/master/.github/CONTRIBUTING.md). diff --git a/vignettes/datatable-importing.Rmd b/vignettes/datatable-importing.Rmd index 484047317..ea7a7bc11 100644 --- a/vignettes/datatable-importing.Rmd +++ b/vignettes/datatable-importing.Rmd @@ -15,7 +15,7 @@ h2 { } -This document is focused on using `data.table` as a dependency in other R packages. If you are interested in using `data.table` C code from a non-R application, or in calling its C functions directly, jump to the [last section](#non-r-API) of this vignette. +This document is focused on using `data.table` as a dependency in other R packages. If you are interested in using `data.table` C code from a non-R application, or in calling its C functions directly, jump to the [last section](#importing-from-non-r-applications-non-r-api) of this vignette. Importing `data.table` is no different from importing other R packages. This vignette is meant to answer the most common questions arising around that subject; the lessons presented here can be applied to other R packages. @@ -138,7 +138,7 @@ The option mechanism in R is _global_. Meaning that if a user sets a `data.table If you face any problems in creating a package that uses data.table, please confirm that the problem is reproducible in a clean R session using the R console: `R CMD check package.name`. -Some of the most common issues developers are facing are usually related to helper tools that are meant to automate some package development tasks, for example, using `roxygen` to generate your `NAMESPACE` file from metadata in the R code files. Others are related to helpers that build and check the package. Unfortunately, these helpers sometimes have unintended/hidden side effects which can obscure the source of your troubles. As such, be sure to double check using R console (run R on the command line) and ensure the import is defined in the `DESCRIPTION` and `NAMESPACE` files following the [instructions](#DESCRIPTION) [above](#NAMESPACE). +Some of the most common issues developers are facing are usually related to helper tools that are meant to automate some package development tasks, for example, using `roxygen` to generate your `NAMESPACE` file from metadata in the R code files. Others are related to helpers that build and check the package. Unfortunately, these helpers sometimes have unintended/hidden side effects which can obscure the source of your troubles. As such, be sure to double check using R console (run R on the command line) and ensure the import is defined in the `DESCRIPTION` and `NAMESPACE` files following the [instructions](#description-file-description) [above](#namespace-file-namespace). If you are not able to reproduce problems you have using the plain R console build and check, you may try to get some support based on past issues we've encountered with `data.table` interacting with helper tools: [devtools#192](https://github.com/r-lib/devtools/issues/192) or [devtools#1472](https://github.com/r-lib/devtools/issues/1472). diff --git a/vignettes/datatable-reshape.Rmd b/vignettes/datatable-reshape.Rmd index 41b36c1a0..42045acf3 100644 --- a/vignettes/datatable-reshape.Rmd +++ b/vignettes/datatable-reshape.Rmd @@ -241,7 +241,7 @@ melt(two.iris, measure.vars = measure(value.name, dim, sep=".")) ``` Using the code above we get one value column per flower part. If we -instead want a value column for each measurement dimension, we can do +instead want a value column for each measurement dimension, we can do: ```{r} melt(two.iris, measure.vars = measure(part, value.name, sep=".")) From 11314fa60680b323374be2c7afb9b299236cca1d Mon Sep 17 00:00:00 2001 From: rikivillalba <32423469+rikivillalba@users.noreply.github.com> Date: Sat, 21 Sep 2024 22:32:58 -0300 Subject: [PATCH 03/10] Tweak: out_col missing s Since you're updating the vignette i add a minor tweak, --- vignettes/datatable-reference-semantics.Rmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/vignettes/datatable-reference-semantics.Rmd b/vignettes/datatable-reference-semantics.Rmd index e46412b5e..db4e53380 100644 --- a/vignettes/datatable-reference-semantics.Rmd +++ b/vignettes/datatable-reference-semantics.Rmd @@ -247,7 +247,7 @@ head(flights) * We use the `LHS := RHS` form. We store the input column names and the new columns to add in separate variables and provide them to `.SDcols` and for `LHS` (for better readability). -* Note that since we allow assignment by reference without quoting column names when there is only one column as explained in [Section 2c](#delete-convenience), we can not do `out_cols := lapply(.SD, max)`. That would result in adding one new column named `out_col`. Instead we should do either `c(out_cols)` or simply `(out_cols)`. Wrapping the variable name with `(` is enough to differentiate between the two cases. +* Note that since we allow assignment by reference without quoting column names when there is only one column as explained in [Section 2c](#delete-convenience), we can not do `out_cols := lapply(.SD, max)`. That would result in adding one new column named `out_cols`. Instead we should do either `c(out_cols)` or simply `(out_cols)`. Wrapping the variable name with `(` is enough to differentiate between the two cases. * The `LHS := RHS` form allows us to operate on multiple columns. In the RHS, to compute the `max` on columns specified in `.SDcols`, we make use of the base function `lapply()` along with `.SD` in the same way as we have seen before in the *"Introduction to data.table"* vignette. It returns a list of two elements, containing the maximum value corresponding to `dep_delay` and `arr_delay` for each group. From ca904cb4c87257b38540b7c7cc24da802e44241a Mon Sep 17 00:00:00 2001 From: rikivillalba <32423469+rikivillalba@users.noreply.github.com> Date: Sat, 21 Sep 2024 22:38:56 -0300 Subject: [PATCH 04/10] Tweak: duplicated "version" --- vignettes/datatable-reference-semantics.Rmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/vignettes/datatable-reference-semantics.Rmd b/vignettes/datatable-reference-semantics.Rmd index db4e53380..fb829fdf9 100644 --- a/vignettes/datatable-reference-semantics.Rmd +++ b/vignettes/datatable-reference-semantics.Rmd @@ -68,7 +68,7 @@ DF$c <- 18:13 # (1) -- replace entire column DF$c[DF$ID == "b"] <- 15:13 # (2) -- subassign in column 'c' ``` -both (1) and (2) resulted in deep copy of the entire data.frame in versions of `R` versions `< 3.1`. [It copied more than once](https://stackoverflow.com/q/23898969/559784). To improve performance by avoiding these redundant copies, *data.table* utilised the [available but unused `:=` operator in R](https://stackoverflow.com/q/7033106/559784). +both (1) and (2) resulted in deep copy of the entire data.frame in versions of `R` `< 3.1`. [It copied more than once](https://stackoverflow.com/q/23898969/559784). To improve performance by avoiding these redundant copies, *data.table* utilised the [available but unused `:=` operator in R](https://stackoverflow.com/q/7033106/559784). Great performance improvements were made in `R v3.1` as a result of which only a *shallow* copy is made for (1) and not *deep* copy. However, for (2) still, the entire column is *deep* copied even in `R v3.1+`. This means the more columns one subassigns to in the *same query*, the more *deep* copies R does. From c7ed43d69a530fe63b7e35b8a9b9ef414aa114e6 Mon Sep 17 00:00:00 2001 From: Ivan K Date: Sun, 22 Sep 2024 09:29:01 +0300 Subject: [PATCH 05/10] Missing asterisk made into --- vignettes/datatable-importing.Rmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/vignettes/datatable-importing.Rmd b/vignettes/datatable-importing.Rmd index ea7a7bc11..178ee8103 100644 --- a/vignettes/datatable-importing.Rmd +++ b/vignettes/datatable-importing.Rmd @@ -275,7 +275,7 @@ result <- merge(dt, other_dt, by = "x") ``` ### Benefits of using `Imports` -- **User-Friendliness*: `Depends` alters your users' `search()` path, possibly without their wanting to do so. +- **User-Friendliness**: `Depends` alters your users' `search()` path, possibly without their wanting to do so. - **Namespace Management**: Only the functions your package explicitly imports are available, reducing the risk of function name clashes. - **Cleaner Package Loading**: Your package's dependencies are not attached to the search path, making the loading process cleaner and potentially faster. - **Easier Maintenance**: It simplifies maintenance tasks as upstream dependencies' APIs evolve. Depending too much on `Depends` can lead to conflicts and compatibility issues over time. From 80fcef6d847ebb604053828bc7d20a812fd0f13f Mon Sep 17 00:00:00 2001 From: Ivan K Date: Sun, 22 Sep 2024 09:31:11 +0300 Subject: [PATCH 06/10] If you actually run the code, 'gender' is IDate --- vignettes/datatable-reshape.Rmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/vignettes/datatable-reshape.Rmd b/vignettes/datatable-reshape.Rmd index 42045acf3..ef077a585 100644 --- a/vignettes/datatable-reshape.Rmd +++ b/vignettes/datatable-reshape.Rmd @@ -157,7 +157,7 @@ DT.m1[, c("variable", "child") := tstrsplit(variable, "_", fixed = TRUE)] DT.c1 = dcast(DT.m1, family_id + age_mother + child ~ variable, value.var = "value") DT.c1 -str(DT.c1) ## gender column is character type now! +str(DT.c1) ## gender column is character IDate now! ``` #### Issues From 7d35a00e56d3be314d587ca47459f21add8370f6 Mon Sep 17 00:00:00 2001 From: Michael Chirico Date: Sun, 22 Sep 2024 20:29:48 -0700 Subject: [PATCH 07/10] fix internal link anchors --- vignettes/datatable-importing.Rmd | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/vignettes/datatable-importing.Rmd b/vignettes/datatable-importing.Rmd index 178ee8103..bcb5eb89e 100644 --- a/vignettes/datatable-importing.Rmd +++ b/vignettes/datatable-importing.Rmd @@ -27,11 +27,11 @@ One of the biggest features of `data.table` is its concise syntax which makes ex It is very easy to use `data.table` as a dependency due to the fact that `data.table` does not have any of its own dependencies. This applies both to operating system and to R dependencies. It means that if you have R installed on your machine, it already has everything needed to install `data.table`. It also means that adding `data.table` as a dependency of your package will not result in a chain of other recursive dependencies to install, making it very convenient for offline installation. -## `DESCRIPTION` file {DESCRIPTION} +## `DESCRIPTION` file {#DESCRIPTION} The first place to define a dependency in a package is the `DESCRIPTION` file. Most commonly, you will need to add `data.table` under the `Imports:` field. Doing so will necessitate an installation of `data.table` before your package can compile/install. As mentioned above, no other packages will be installed because `data.table` does not have any dependencies of its own. You can also specify the minimal required version of a dependency; for example, if your package is using the `fwrite` function, which was introduced in `data.table` in version 1.9.8, you should incorporate this as `Imports: data.table (>= 1.9.8)`. This way you can ensure that the version of `data.table` installed is 1.9.8 or later before your users will be able to install your package. Besides the `Imports:` field, you can also use `Depends: data.table` but we strongly discourage this approach (and may disallow it in future) because this loads `data.table` into your user's workspace; i.e. it enables `data.table` functionality in your user's scripts without them requesting that. `Imports:` is the proper way to use `data.table` within your package without inflicting `data.table` on your user. In fact, we hope the `Depends:` field is eventually deprecated in R since this is true for all packages. -## `NAMESPACE` file {NAMESPACE} +## `NAMESPACE` file {#NAMESPACE} The next thing is to define what content of `data.table` your package is using. This needs to be done in the `NAMESPACE` file. Most commonly, package authors will want to use `import(data.table)` which will import all exported (i.e., listed in `data.table`'s own `NAMESPACE` file) functions from `data.table`. @@ -195,7 +195,7 @@ For more canonical documentation of defining packages dependency check the offic Some of internally used C routines are now exported on C level thus can be used in R packages directly from their C code. See [`?cdt`](https://rdatatable.gitlab.io/data.table/reference/cdt.html) for details and [Writing R Extensions](https://cran.r-project.org/doc/manuals/r-release/R-exts.html) _Linking to native routines in other packages_ section for usage. -## Importing from non-r Applications {non-r-api} +## Importing from non-r Applications {#non-r-api} Some tiny parts of `data.table` C code were isolated from the R C API and can now be used from non-R applications by linking to .so / .dll files. More concrete details about this will be provided later; for now you can study the C code that was isolated from the R C API in [src/fread.c](https://github.com/Rdatatable/data.table/blob/master/src/fread.c) and [src/fwrite.c](https://github.com/Rdatatable/data.table/blob/master/src/fwrite.c). From 49af9cdee1343bb11847fdb51ef2d543d23217c9 Mon Sep 17 00:00:00 2001 From: Michael Chirico Date: Sun, 22 Sep 2024 20:31:09 -0700 Subject: [PATCH 08/10] Revert: use correctly-named anchors --- vignettes/datatable-importing.Rmd | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/vignettes/datatable-importing.Rmd b/vignettes/datatable-importing.Rmd index bcb5eb89e..2d08ffcf9 100644 --- a/vignettes/datatable-importing.Rmd +++ b/vignettes/datatable-importing.Rmd @@ -15,7 +15,7 @@ h2 { } -This document is focused on using `data.table` as a dependency in other R packages. If you are interested in using `data.table` C code from a non-R application, or in calling its C functions directly, jump to the [last section](#importing-from-non-r-applications-non-r-api) of this vignette. +This document is focused on using `data.table` as a dependency in other R packages. If you are interested in using `data.table` C code from a non-R application, or in calling its C functions directly, jump to the [last section](#non-r-api) of this vignette. Importing `data.table` is no different from importing other R packages. This vignette is meant to answer the most common questions arising around that subject; the lessons presented here can be applied to other R packages. @@ -138,7 +138,7 @@ The option mechanism in R is _global_. Meaning that if a user sets a `data.table If you face any problems in creating a package that uses data.table, please confirm that the problem is reproducible in a clean R session using the R console: `R CMD check package.name`. -Some of the most common issues developers are facing are usually related to helper tools that are meant to automate some package development tasks, for example, using `roxygen` to generate your `NAMESPACE` file from metadata in the R code files. Others are related to helpers that build and check the package. Unfortunately, these helpers sometimes have unintended/hidden side effects which can obscure the source of your troubles. As such, be sure to double check using R console (run R on the command line) and ensure the import is defined in the `DESCRIPTION` and `NAMESPACE` files following the [instructions](#description-file-description) [above](#namespace-file-namespace). +Some of the most common issues developers are facing are usually related to helper tools that are meant to automate some package development tasks, for example, using `roxygen` to generate your `NAMESPACE` file from metadata in the R code files. Others are related to helpers that build and check the package. Unfortunately, these helpers sometimes have unintended/hidden side effects which can obscure the source of your troubles. As such, be sure to double check using R console (run R on the command line) and ensure the import is defined in the `DESCRIPTION` and `NAMESPACE` files following the [instructions](#DESCRIPTION) [above](#NAMESPACE). If you are not able to reproduce problems you have using the plain R console build and check, you may try to get some support based on past issues we've encountered with `data.table` interacting with helper tools: [devtools#192](https://github.com/r-lib/devtools/issues/192) or [devtools#1472](https://github.com/r-lib/devtools/issues/1472). From 709195b9c29ea297114928ebf23d78a9141404c8 Mon Sep 17 00:00:00 2001 From: Michael Chirico Date: Sun, 22 Sep 2024 20:32:39 -0700 Subject: [PATCH 09/10] combine blocks --- vignettes/datatable-reference-semantics.Rmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/vignettes/datatable-reference-semantics.Rmd b/vignettes/datatable-reference-semantics.Rmd index fb829fdf9..170783165 100644 --- a/vignettes/datatable-reference-semantics.Rmd +++ b/vignettes/datatable-reference-semantics.Rmd @@ -68,7 +68,7 @@ DF$c <- 18:13 # (1) -- replace entire column DF$c[DF$ID == "b"] <- 15:13 # (2) -- subassign in column 'c' ``` -both (1) and (2) resulted in deep copy of the entire data.frame in versions of `R` `< 3.1`. [It copied more than once](https://stackoverflow.com/q/23898969/559784). To improve performance by avoiding these redundant copies, *data.table* utilised the [available but unused `:=` operator in R](https://stackoverflow.com/q/7033106/559784). +both (1) and (2) resulted in deep copy of the entire data.frame in versions of `R < 3.1`. [It copied more than once](https://stackoverflow.com/q/23898969/559784). To improve performance by avoiding these redundant copies, *data.table* utilised the [available but unused `:=` operator in R](https://stackoverflow.com/q/7033106/559784). Great performance improvements were made in `R v3.1` as a result of which only a *shallow* copy is made for (1) and not *deep* copy. However, for (2) still, the entire column is *deep* copied even in `R v3.1+`. This means the more columns one subassigns to in the *same query*, the more *deep* copies R does. From 3e07a8d90716b9a0efc0410ed5442af2ff803d3f Mon Sep 17 00:00:00 2001 From: Michael Chirico Date: Sun, 22 Sep 2024 20:38:47 -0700 Subject: [PATCH 10/10] Correct wording --- vignettes/datatable-reshape.Rmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/vignettes/datatable-reshape.Rmd b/vignettes/datatable-reshape.Rmd index ef077a585..c84c1558d 100644 --- a/vignettes/datatable-reshape.Rmd +++ b/vignettes/datatable-reshape.Rmd @@ -157,7 +157,7 @@ DT.m1[, c("variable", "child") := tstrsplit(variable, "_", fixed = TRUE)] DT.c1 = dcast(DT.m1, family_id + age_mother + child ~ variable, value.var = "value") DT.c1 -str(DT.c1) ## gender column is character IDate now! +str(DT.c1) ## gender column is class IDate now! ``` #### Issues