Commit

Inconsistent code formatting / minor fixes to vignettes (#6521)
* fix inconsistent code formatting in markdown vignette.

* minor fixes to Rmd files

* Tweak: out_col missing s

Since you're updating the vignette, I'm adding a minor tweak.

* Tweak: duplicated "version"

* Missing asterisk made <strong> into <emph>

* If you actually run the code, 'gender' is IDate

* fix internal link anchors

* Revert: use correctly-named anchors

* combine blocks

* Correct wording

---------

Co-authored-by: rikivillalba <32423469+rikivillalba@users.noreply.github.com>
Co-authored-by: Ivan K <krylov.r00t@gmail.com>
Co-authored-by: Michael Chirico <michaelchirico4@gmail.com>
4 people authored Sep 23, 2024
1 parent 075107a commit 47923c9
Showing 5 changed files with 11 additions and 11 deletions.
2 changes: 1 addition & 1 deletion vignettes/datatable-faq.Rmd
@@ -632,7 +632,7 @@ Yes, for both 32-bit and 64-bit on all platforms. Thanks to CRAN. There are no s
## I think it's great. What can I do?
Please file suggestions, bug reports and enhancement requests on our [issues tracker](https://github.com/Rdatatable/data.table/issues). This helps make the package better.

-Please do star the package on [GitHub](https://github.com/Rdatatable/data.table/wiki). This helps encourage the developers and helps other R users find the package.
+Please do star the package on [GitHub](https://github.com/Rdatatable/data.table). This helps encourage the developers and helps other R users find the package.

You can submit pull requests to change the code and/or documentation yourself; see our [Contribution Guidelines](https://github.com/Rdatatable/data.table/blob/master/.github/CONTRIBUTING.md).

10 changes: 5 additions & 5 deletions vignettes/datatable-importing.Rmd
@@ -15,7 +15,7 @@ h2 {
}
</style>

-This document is focused on using `data.table` as a dependency in other R packages. If you are interested in using `data.table` C code from a non-R application, or in calling its C functions directly, jump to the [last section](#non-r-API) of this vignette.
+This document is focused on using `data.table` as a dependency in other R packages. If you are interested in using `data.table` C code from a non-R application, or in calling its C functions directly, jump to the [last section](#non-r-api) of this vignette.

Importing `data.table` is no different from importing other R packages. This vignette is meant to answer the most common questions arising around that subject; the lessons presented here can be applied to other R packages.

@@ -27,11 +27,11 @@ One of the biggest features of `data.table` is its concise syntax which makes ex

It is very easy to use `data.table` as a dependency due to the fact that `data.table` does not have any of its own dependencies. This applies both to operating system and to R dependencies. It means that if you have R installed on your machine, it already has everything needed to install `data.table`. It also means that adding `data.table` as a dependency of your package will not result in a chain of other recursive dependencies to install, making it very convenient for offline installation.

-## `DESCRIPTION` file {DESCRIPTION}
+## `DESCRIPTION` file {#DESCRIPTION}

The first place to define a dependency in a package is the `DESCRIPTION` file. Most commonly, you will need to add `data.table` under the `Imports:` field. Doing so will necessitate an installation of `data.table` before your package can compile/install. As mentioned above, no other packages will be installed because `data.table` does not have any dependencies of its own. You can also specify the minimal required version of a dependency; for example, if your package is using the `fwrite` function, which was introduced in `data.table` in version 1.9.8, you should incorporate this as `Imports: data.table (>= 1.9.8)`. This way you can ensure that the version of `data.table` installed is 1.9.8 or later before your users will be able to install your package. Besides the `Imports:` field, you can also use `Depends: data.table` but we strongly discourage this approach (and may disallow it in future) because this loads `data.table` into your user's workspace; i.e. it enables `data.table` functionality in your user's scripts without them requesting that. `Imports:` is the proper way to use `data.table` within your package without inflicting `data.table` on your user. In fact, we hope the `Depends:` field is eventually deprecated in R since this is true for all packages.

-## `NAMESPACE` file {NAMESPACE}
+## `NAMESPACE` file {#NAMESPACE}

The next thing is to define what content of `data.table` your package is using. This needs to be done in the `NAMESPACE` file. Most commonly, package authors will want to use `import(data.table)` which will import all exported (i.e., listed in `data.table`'s own `NAMESPACE` file) functions from `data.table`.

@@ -195,7 +195,7 @@ For more canonical documentation of defining packages dependency check the offic

Some of internally used C routines are now exported on C level thus can be used in R packages directly from their C code. See [`?cdt`](https://rdatatable.gitlab.io/data.table/reference/cdt.html) for details and [Writing R Extensions](https://cran.r-project.org/doc/manuals/r-release/R-exts.html) _Linking to native routines in other packages_ section for usage.

-## Importing from non-r Applications {non-r-api}
+## Importing from non-r Applications {#non-r-api}

Some tiny parts of `data.table` C code were isolated from the R C API and can now be used from non-R applications by linking to .so / .dll files. More concrete details about this will be provided later; for now you can study the C code that was isolated from the R C API in [src/fread.c](https://github.com/Rdatatable/data.table/blob/master/src/fread.c) and [src/fwrite.c](https://github.com/Rdatatable/data.table/blob/master/src/fwrite.c).

@@ -275,7 +275,7 @@ result <- merge(dt, other_dt, by = "x")
```

### Benefits of using `Imports`
-- **User-Friendliness*: `Depends` alters your users' `search()` path, possibly without their wanting to do so.
+- **User-Friendliness**: `Depends` alters your users' `search()` path, possibly without their wanting to do so.
- **Namespace Management**: Only the functions your package explicitly imports are available, reducing the risk of function name clashes.
- **Cleaner Package Loading**: Your package's dependencies are not attached to the search path, making the loading process cleaner and potentially faster.
- **Easier Maintenance**: It simplifies maintenance tasks as upstream dependencies' APIs evolve. Depending too much on `Depends` can lead to conflicts and compatibility issues over time.
2 changes: 1 addition & 1 deletion vignettes/datatable-intro.Rmd
@@ -665,7 +665,7 @@ We can do much more in `i` by keying a `data.table`, which allows for blazing fa

3. Compute on columns: `DT[, .(sum(colA), mean(colB))]`.

-4. Provide names if necessary: `DT[, .(sA =sum(colA), mB = mean(colB))]`.
+4. Provide names if necessary: `DT[, .(sA = sum(colA), mB = mean(colB))]`.

5. Combine with `i`: `DT[colA > value, sum(colB)]`.

4 changes: 2 additions & 2 deletions vignettes/datatable-reference-semantics.Rmd
@@ -68,7 +68,7 @@ DF$c <- 18:13 # (1) -- replace entire column
DF$c[DF$ID == "b"] <- 15:13 # (2) -- subassign in column 'c'
```

-both (1) and (2) resulted in deep copy of the entire data.frame in versions of `R` versions `< 3.1`. [It copied more than once](https://stackoverflow.com/q/23898969/559784). To improve performance by avoiding these redundant copies, *data.table* utilised the [available but unused `:=` operator in R](https://stackoverflow.com/q/7033106/559784).
+both (1) and (2) resulted in deep copy of the entire data.frame in versions of `R < 3.1`. [It copied more than once](https://stackoverflow.com/q/23898969/559784). To improve performance by avoiding these redundant copies, *data.table* utilised the [available but unused `:=` operator in R](https://stackoverflow.com/q/7033106/559784).

Great performance improvements were made in `R v3.1` as a result of which only a *shallow* copy is made for (1) and not *deep* copy. However, for (2) still, the entire column is *deep* copied even in `R v3.1+`. This means the more columns one subassigns to in the *same query*, the more *deep* copies R does.

@@ -247,7 +247,7 @@ head(flights)

* We use the `LHS := RHS` form. We store the input column names and the new columns to add in separate variables and provide them to `.SDcols` and for `LHS` (for better readability).

-* Note that since we allow assignment by reference without quoting column names when there is only one column as explained in [Section 2c](#delete-convenience), we can not do `out_cols := lapply(.SD, max)`. That would result in adding one new column named `out_col`. Instead we should do either `c(out_cols)` or simply `(out_cols)`. Wrapping the variable name with `(` is enough to differentiate between the two cases.
+* Note that since we allow assignment by reference without quoting column names when there is only one column as explained in [Section 2c](#delete-convenience), we can not do `out_cols := lapply(.SD, max)`. That would result in adding one new column named `out_cols`. Instead we should do either `c(out_cols)` or simply `(out_cols)`. Wrapping the variable name with `(` is enough to differentiate between the two cases.

* The `LHS := RHS` form allows us to operate on multiple columns. In the RHS, to compute the `max` on columns specified in `.SDcols`, we make use of the base function `lapply()` along with `.SD` in the same way as we have seen before in the *"Introduction to data.table"* vignette. It returns a list of two elements, containing the maximum value corresponding to `dep_delay` and `arr_delay` for each group.

4 changes: 2 additions & 2 deletions vignettes/datatable-reshape.Rmd
@@ -157,7 +157,7 @@ DT.m1[, c("variable", "child") := tstrsplit(variable, "_", fixed = TRUE)]
DT.c1 = dcast(DT.m1, family_id + age_mother + child ~ variable, value.var = "value")
DT.c1
-str(DT.c1) ## gender column is character type now!
+str(DT.c1) ## gender column is class IDate now!
```

#### Issues
@@ -241,7 +241,7 @@ melt(two.iris, measure.vars = measure(value.name, dim, sep="."))
```

Using the code above we get one value column per flower part. If we
-instead want a value column for each measurement dimension, we can do
+instead want a value column for each measurement dimension, we can do:

```{r}
melt(two.iris, measure.vars = measure(part, value.name, sep="."))
