Skip to content

Commit

Permalink
GForce works with data.table::shift (#5944)
Browse files Browse the repository at this point in the history
* GForce works with data.table::shift

* use if+implicit else in return for concision

* better comment
  • Loading branch information
MichaelChirico authored Feb 22, 2024
1 parent 47e2637 commit 87d12c2
Show file tree
Hide file tree
Showing 3 changed files with 29 additions and 7 deletions.
2 changes: 2 additions & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,8 @@

2. `cedta()` now returns `FALSE` if `.datatable.aware = FALSE` is set in the calling environment, [#5654](https://github.com/Rdatatable/data.table/issues/5654).

3. Namespace-qualifying `data.table::shift()`, `data.table::first()`, or `data.table::last()` will not deactivate GForce, [#5942](https://github.com/Rdatatable/data.table/issues/5942). Thanks @MichaelChirico for the proposal and fix. Namespace-qualifying other calls like `stats::sum()`, `base::prod()`, etc., continue to work as an escape valve to avoid GForce, e.g. to ensure S3 method dispatch.

## NOTES

1. `transform` method for data.table sped up substantially when creating new columns on large tables. Thanks to @OfekShilon for the report and PR. The implemented solution was proposed by @ColeMiller1.
Expand Down
26 changes: 19 additions & 7 deletions R/data.table.R
Original file line number Diff line number Diff line change
Expand Up @@ -1783,15 +1783,25 @@ replace_dot_alias = function(e) {
is_constantish(q[["na.rm"]]) &&
(is.null(q[["w"]]) || eval(call('is.numeric', q[["w"]]), envir=x))
}
# run GForce for simple f(x) calls and f(x, na.rm = TRUE)-like calls where x is a column of .SD
.get_gcall = function(q) {
if (!is.call(q)) return(NULL)
# is.symbol() is for #1369, #1974 and #2949
if (!is.symbol(q[[2L]])) return(NULL)
q1 <- q[[1L]]
if (is.symbol(q1)) return(if (q1 %chin% gfuns) q1)
if (!q1 %iscall% "::") return(NULL)
if (q1[[2L]] != "data.table") return(NULL)
return(if (q1[[3L]] %chin% gdtfuns) q1[[3L]])
}
.gforce_ok = function(q) {
if (dotN(q)) return(TRUE) # For #334
# run GForce for simple f(x) calls and f(x, na.rm = TRUE)-like calls where x is a column of .SD
# is.symbol() is for #1369, #1974 and #2949
if (!(is.call(q) && is.symbol(q[[1L]]) && is.symbol(q[[2L]]) && (q[[1L]]) %chin% gfuns)) return(FALSE)
q1 <- .get_gcall(q)
if (is.null(q1)) return(FALSE)
if (!(q2 <- q[[2L]]) %chin% names(SDenv$.SDall) && q2 != ".I") return(FALSE) # 875
if (length(q)==2L || (!is.null(names(q)) && startsWith(names(q)[3L], "na") && is_constantish(q[[3L]]))) return(TRUE)
# ^^ base::startWith errors on NULL unfortunately
switch(as.character(q[[1L]]),
switch(as.character(q1),
"shift" = .gshift_ok(q),
"weighted.mean" = .gweighted.mean_ok(q, x),
"tail" = , "head" = .ghead_ok(q),
Expand All @@ -1806,7 +1816,8 @@ replace_dot_alias = function(e) {
}
} else GForce = .gforce_ok(jsub)
gforce_jsub = function(x, names_x) {
x[[1L]] = as.name(paste0("g", x[[1L]]))
call_name <- if (is.symbol(x[[1L]])) x[[1L]] else x[[1L]][[3L]] # latter is like data.table::shift, #5942. .gshift_ok checked this will work.
x[[1L]] = as.name(paste0("g", call_name))
# gforce needs to evaluate arguments before calling C part TODO: move the evaluation into gforce_ok
# do not evaluate vars present as columns in x
if (length(x) >= 3L) {
Expand Down Expand Up @@ -3040,8 +3051,9 @@ rleidv = function(x, cols=seq_along(x), prefix=NULL) {
# (1) add it to gfuns
# (2) edit .gforce_ok (defined within `[`) to catch which j will apply the new function
# (3) define the gfun = function() R wrapper
gfuns = c("[", "[[", "head", "tail", "first", "last", "sum", "mean", "prod",
"median", "min", "max", "var", "sd", ".N", "shift", "weighted.mean") # added .N for #334
gdtfuns = c("first", "last", "shift") # exported by data.table, not generic, thus also accept data.table:: form under GForce, #5942.
gfuns = c(gdtfuns,
"[", "[[", "head", "tail", "sum", "mean", "prod", "median", "min", "max", "var", "sd", ".N", "weighted.mean") # added .N for #334
`g[` = `g[[` = function(x, n) .Call(Cgnthvalue, x, as.integer(n)) # n is of length=1 here.
ghead = function(x, n) .Call(Cghead, x, as.integer(n))
gtail = function(x, n) .Call(Cgtail, x, as.integer(n))
Expand Down
8 changes: 8 additions & 0 deletions inst/tests/tests.Rraw
Original file line number Diff line number Diff line change
Expand Up @@ -18278,3 +18278,11 @@ test(2245.1, dt[1], data.table(foo = 1L, bar = 4L)) # Single index should be int
test(2245.2, copy(dt[1]), data.table(foo = 1:3)) # Revert to data.frame syntax: Interpret index as column
rm(.datatable.aware)
test(2245.3, dt[1], data.table(foo = 1L, bar = 4L)) # Default in this environment should be datatable-aware

# data.table:: doesn't turn off GForce, #5942
DT = data.table(a = rep(1:5, 2L), b = 1:10)
old = options(datatable.optimize=Inf, datatable.verbose=TRUE)
test(2246.1, DT[, data.table::shift(b), by=a], DT[, shift(b), by=a], output="GForce TRUE")
test(2246.2, DT[, data.table::first(b), by=a], DT[, first(b), by=a], output="GForce TRUE")
test(2246.3, DT[, data.table::last(b), by=a], DT[, last(b), by=a], output="GForce TRUE")
options(old)

0 comments on commit 87d12c2

Please sign in to comment.