Skip to content

Commit

Permalink
Merge branch 'master' into stopf-nframe
Browse files Browse the repository at this point in the history
  • Loading branch information
MichaelChirico authored Jul 15, 2024
2 parents 6caf66a + bbe7563 commit 353675d
Show file tree
Hide file tree
Showing 10 changed files with 90 additions and 44 deletions.
6 changes: 1 addition & 5 deletions .dev/cc.R
Original file line number Diff line number Diff line change
Expand Up @@ -54,11 +54,7 @@ sourceImports = function(path=getwd(), quiet=FALSE) {
return(invisible())
}

cc = function(test=FALSE, clean=FALSE, debug=FALSE, omp=!debug, cc_dir, path=Sys.getenv("PROJ_PATH", unset="."), CC="gcc", quiet=FALSE) {
if (!missing(cc_dir)) {
warning("'cc_dir' arg is deprecated, use 'path' argument or 'PROJ_PATH' env var instead")
path = cc_dir
}
cc = function(test=FALSE, clean=FALSE, debug=FALSE, omp=!debug, path=Sys.getenv("PROJ_PATH", unset=normalizePath(".")), CC="gcc", quiet=FALSE) {
stopifnot(is.character(CC), length(CC)==1L, !is.na(CC), nzchar(CC))
gc()

Expand Down
27 changes: 10 additions & 17 deletions .github/workflows/R-CMD-check-occasional.yaml
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
on:
schedule:
- cron: '17 13 13 * *' # 13th of month at 13:17 UTC
- cron: '17 13 14 * *' # 14th of month at 13:17 UTC

# A more complete suite of checks to run monthly; each PR/merge need not pass all these, but they should pass before CRAN release
name: R-CMD-check-occasional
Expand Down Expand Up @@ -42,7 +42,6 @@ jobs:
r: '4.1'

env:
R_LIBS_USER: /home/runner/work/r-lib
R_REMOTES_NO_ERRORS_FROM_WARNINGS: true
GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}

Expand Down Expand Up @@ -95,16 +94,6 @@ jobs:
eval sudo $cmd
done < <(Rscript -e 'writeLines(remotes::system_requirements("ubuntu", "20.04"))')
- name: Install dependencies
run: |
remotes::install_deps(dependencies=TRUE, force=TRUE)
other_deps_expr = parse('inst/tests/other.Rraw', n=1L)
eval(other_deps_expr)
pkgs <- get(as.character(other_deps_expr[[1L]][[2L]]))
# May not install on oldest R versions
try(remotes::install_cran(c(pkgs, "rcmdcheck"), force=TRUE))
shell: Rscript {0}

- name: Check
env:
# several Suggests dependencies have R dependencies more recent than ours
Expand All @@ -114,22 +103,26 @@ jobs:
run: |
options(crayon.enabled = TRUE)
remotes::install_deps(dependencies=TRUE, force=TRUE)
other_deps_expr = parse('inst/tests/other.Rraw', n=1L)
eval(other_deps_expr)
pkgs <- get(as.character(other_deps_expr[[1L]][[2L]]))
# Many will not install on oldest R versions
try(remotes::install_cran(c(pkgs, "rcmdcheck"), force=TRUE))
# we define this in data.table namespace, but it appears to be exec
if (!exists("isFALSE", "package:base")) {
if (!exists("isFALSE", asNamespace("data.table"))) {
cat("isFALSE not found in base, but data.table did not define it either!\n")
message("isFALSE not found in base, but data.table did not define it either!\n")
}
# attempt defining it here as a workaround...
isFALSE = function(x) is.logical(x) && length(x) == 1L && !is.na(x) && !x
}
other_deps_expr = parse('inst/tests/other.Rraw', n=1L)
eval(other_deps_expr)
pkgs = get(as.character(other_deps_expr[[1L]][[2L]]))
has_pkg = sapply(pkgs, requireNamespace, quietly=TRUE)
run_other = all(has_pkg)
if (!run_other) {
cat(sprintf("Skipping other.Rraw since some required packages are not available: %s\n", toString(pkgs[!has_pkg])))
message(sprintf("Skipping other.Rraw since some required packages are not available: %s\n", toString(pkgs[!has_pkg])))
}
Sys.setenv(TEST_DATA_TABLE_WITH_OTHER_PACKAGES=as.character(run_other))
Expand Down
24 changes: 24 additions & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
Expand Up @@ -62,6 +62,12 @@

10. `dt[,,by=año]` (i.e., using a column name containing a non-ASCII character in `by` as a plain symbol) no longer errors with "object 'año' not found", #4708. Thanks @pfv07 for the report, and Michael Chirico for the fix.

11. Fix some memory management issues in the C routines backing `melt()`, `froll()`, and GForce `mean()`, as identified by `rchk`. Thanks Tomas Kalibera and the CRAN team for setting up the `rchk` system, and @MichaelChirico for the fix.

12. data.table's `all.equal()` method now dispatches to each column's own `all.equal()` method as appropriate, [#4543](https://github.com/Rdatatable/data.table/issues/4543). Thanks @MichaelChirico for the report and fix. Note that this had two noteworthy changes to data.table's own test suite that might affect you: (1) comparisons of POSIXct columns compare absolute, not relative differences, meaning that millisecond-scale differences might trigger a "not equal" report that was hidden before; and (2) comparisons of integer64 columns could be totally wrong since they were being compared on the basis of their representation as doubles, not long integers. The former might be a matter of preference requiring you to specify a different `tolerance=`, while the latter was clearly a bug.

12. `rbindlist` could lead to a protection stack overflow when applied to a list containing many nested lists exceeding the pointer protection stack size, [#4536](https://github.com/Rdatatable/data.table/issues/4536). Thanks to @ProfFancyPants for reporting, and Benjamin Schwendinger for the fix.

## NOTES

1. `transform` method for data.table sped up substantially when creating new columns on large tables. Thanks to @OfekShilon for the report and PR. The implemented solution was proposed by @ColeMiller1.
Expand Down Expand Up @@ -114,6 +120,24 @@
1. Fix a typo in a Mandarin translation of an error message that was hiding the actual error message, [#6172](https://github.com/Rdatatable/data.table/issues/6172). Thanks @trafficfan for the report and @MichaelChirico for the fix.
# data.table [v1.15.4](https://github.com/Rdatatable/data.table/milestone/33) (27 March 2024)
## BUG FIXES
1. Optimized `shift` per group produced wrong results when simultaneously subsetting, for example, `DT[i==1L, shift(x), by=group]`, [#5962](https://github.com/Rdatatable/data.table/issues/5962). Thanks to @renkun-ken for the report and Benjamin Schwendinger for the fix.
## NOTES
1. Updated a test relying on `>` working for comparing language objects to a string, which will be deprecated by R, [#5977](https://github.com/Rdatatable/data.table/issues/5977); no user-facing effect. Thanks to R-core for continuously improving the language.
# data.table [v1.15.2](https://github.com/Rdatatable/data.table/milestone/32) (27 Feb 2024)
## BUG FIXES
1. An error in `fwrite()` is more robust across platforms -- CRAN found the use of `PRId64` does not always match the output of `xlength()`, e.g. on some Mac M1 builds [#5935](https://github.com/Rdatatable/data.table/issues/5935). Thanks CRAN for identifying the issue and @ben-schwen for the fix.
2. `shift()` of a vector in grouped queries (under GForce) returns a vector, consistent with `shift()` in other contexts, [#5939](https://github.com/Rdatatable/data.table/issues/5939). Thanks @shrektan for the report and @MichaelChirico for the fix.
# data.table [v1.15.0](https://github.com/Rdatatable/data.table/milestone/29) (30 Jan 2024)
## BREAKING CHANGE
Expand Down
10 changes: 7 additions & 3 deletions R/setops.R
Original file line number Diff line number Diff line change
Expand Up @@ -262,7 +262,7 @@ all.equal.data.table = function(target, current, trim.levels=TRUE, check.attribu
# trim.levels moved here
x = target[[i]]
y = current[[i]]
if (xor(is.factor(x),is.factor(y)))
if (XOR(is.factor(x), is.factor(y)))
stopf("Internal error: factor type mismatch should have been caught earlier") # nocov
cols.r = TRUE
if (is.factor(x)) {
Expand All @@ -282,8 +282,12 @@ all.equal.data.table = function(target, current, trim.levels=TRUE, check.attribu
# dealt with according to trim.levels
}
} else {
cols.r = all.equal(unclass(x), unclass(y), tolerance=tolerance, ..., check.attributes=check.attributes)
# classes were explicitly checked earlier above, so ignore classes here.
# for test 1710.5 and #4543, we want to (1) make sure we dispatch to
# any existing all.equal methods for x while also (2) treating class(x)/class(y)
# as attributes as regards check.attributes argument
cols.r = all.equal(x, y, tolerance=tolerance, ..., check.attributes=check.attributes)
if (!isTRUE(cols.r) && !check.attributes && isTRUE(all.equal(unclass(x), unclass(y), tolerance=tolerance, ..., check.attributes=FALSE)))
cols.r = TRUE
}
if (!isTRUE(cols.r)) return(paste0("Column '", names(target)[i], "': ", paste(cols.r,collapse=" ")))
}
Expand Down
3 changes: 3 additions & 0 deletions R/utils.R
Original file line number Diff line number Diff line change
Expand Up @@ -74,6 +74,9 @@ vapply_1i = function(x, fun, ..., use.names = TRUE) {
vapply(X = x, FUN = fun, ..., FUN.VALUE = NA_integer_, USE.NAMES = use.names)
}

# base::xor(), but with scalar operators
XOR = function(x, y) (x || y) && !(x && y)

# not is.atomic because is.atomic(matrix) is true
cols_with_dims = function(x) vapply_1i(x, function(j) length(dim(j))) > 0L

Expand Down
43 changes: 31 additions & 12 deletions inst/tests/tests.Rraw
Original file line number Diff line number Diff line change
Expand Up @@ -15114,7 +15114,7 @@ if (test_bit64) {
# int64 in x
test(2044.69, dt1[dt2, ..cols, on="int64==int", nomatch=0L, verbose=TRUE],
ans<-data.table(x.int=1:5, x.doubleInt=as.double(1:5), x.realDouble=c(0.5,1.0,1.5,2.0,2.5), x.int64=as.integer64(1:5),
i.int=1:5, i.doubleInt=as.double(1:5), i.realDouble=c(0.5,1.0,1.5,2.0,2.5), i.int64=as.integer64(1:5)),
i.int=1:5, i.doubleInt=as.double(1:5), i.realDouble=c(0.5,1.0,1.5,2.0,2.5), i.int64=as.integer64(c(1:4, 3000000000))),
output = "Coercing integer column i.int to type integer64 to match type of x.int64")
test(2044.70, dt1[dt2, ..cols, on="int64==doubleInt", nomatch=0L, verbose=TRUE],
ans,
Expand Down Expand Up @@ -16895,16 +16895,17 @@ times = .POSIXct(tz = 'UTC', c(
1201364458.72486, 939956943.690777
))
DT = data.table(dates, times)
DT_trunc = copy(DT)[, times := as.POSIXct(trunc(times))]
tmp = tempfile()
## ISO8601 format (%FT%TZ) by default
fwrite(DT, tmp)
test(2150.01, fread(tmp), DT) # defaults for fwrite/fread simple and preserving
fwrite(DT, tmp, dateTimeAs='write.csv') # as write.csv, writes the UTC times as-is not local because the time column has tzone=="UTC", but without the Z marker
fwrite(DT, tmp, dateTimeAs='write.csv') # as write.csv, writes the UTC times as-is not local because the time column has tzone=="UTC", but without the Z marker. Also truncates milliseconds, hence DT_trunc below.
test(2150.021, env=list(TZ=NULL), sapply(fread(tmp,tz=""), typeof), c(dates="integer", times="character")) # from v1.14.0 tz="" needed to read datetime as character
test(2150.022, env=list(TZ=NULL), fread(tmp,tz="UTC"), DT) # user can tell fread to interpret the unmarked datetimes as UTC
test(2150.023, env=c(TZ='UTC'), fread(tmp), DT) # TZ environment variable is also recognized
test(2150.022, env=list(TZ=NULL), fread(tmp,tz="UTC"), DT_trunc) # user can tell fread to interpret the unmarked datetimes as UTC
test(2150.023, env=c(TZ='UTC'), fread(tmp), DT_trunc) # TZ environment variable is also recognized
if (.Platform$OS.type!="windows") {
test(2150.024, env=c(TZ=''), fread(tmp), DT) # on Windows this unsets TZ, see ?Sys.setenv
test(2150.024, env=c(TZ=''), fread(tmp), DT_trunc) # on Windows this unsets TZ, see ?Sys.setenv
# blank TZ env variable on non-Windows is recognized as UTC consistent with C and R; but R's tz= argument is the opposite and uses "" for local
}
# Notes:
Expand All @@ -16919,14 +16920,14 @@ test(2150.025,
attr(fread(tmp, colClasses=list(POSIXct="times"), tz="")$times, "tzone"),
"")
# the times will be different though here because as.POSIXct read them as local time.
fwrite(copy(DT)[ , times := format(times, '%FT%T+00:00')], tmp)
test(2150.03, fread(tmp), DT)
fwrite(copy(DT)[ , times := format(times, '%FT%T+00:00')], tmp) #NB: %T-->truncate milliseconds.
test(2150.03, fread(tmp), DT_trunc)
fwrite(copy(DT)[ , times := format(times, '%FT%T+0000')], tmp)
test(2150.04, fread(tmp), DT)
test(2150.04, fread(tmp), DT_trunc)
fwrite(copy(DT)[ , times := format(times, '%FT%T+0115')], tmp)
test(2150.05, fread(tmp), copy(DT)[ , times := times - 4500])
test(2150.05, fread(tmp), copy(DT_trunc)[ , times := times - 4500])
fwrite(copy(DT)[ , times := format(times, '%FT%T+01')], tmp)
test(2150.06, fread(tmp), copy(DT)[ , times := times - 3600])
test(2150.06, fread(tmp), copy(DT_trunc)[ , times := times - 3600])
## invalid tz specifiers
fwrite(copy(DT)[ , times := format(times, '%FT%T+3600')], tmp)
test(2150.07, fread(tmp), copy(DT)[ , times := format(times, '%FT%T+3600')])
Expand Down Expand Up @@ -17458,8 +17459,8 @@ if (test_bit64) {
# X[Y,,by=.EACHI] when Y contains integer64 also fixed in 1.12.4, #3779
X = data.table(x=1:3)
Y = data.table(x=1:2, y=as.integer64(c(10,20)))
test(2193.2, copy(X)[Y, `:=`(y=i.y), on="x", by=.EACHI], data.table(x=1:3, y=as.integer64(10L,20L,NA)))
test(2193.3, copy(X)[Y, let(y=i.y), on="x", by=.EACHI], data.table(x=1:3, y=as.integer64(10L,20L,NA)))
test(2193.2, copy(X)[Y, `:=`(y=i.y), on="x", by=.EACHI], data.table(x=1:3, y=as.integer64(c(10L,20L,NA))))
test(2193.3, copy(X)[Y, let(y=i.y), on="x", by=.EACHI], data.table(x=1:3, y=as.integer64(c(10L,20L,NA))))
}

# endsWithAny added in #5097 for internal use replacing one use of base::endsWith (in fread.R)
Expand Down Expand Up @@ -18713,3 +18714,21 @@ DT = data.table(a = rep(1:3, 2))
# NB: recall we can't use non-ASCII symbols here. the text is a-<n-tilde>-o (year in Spanish)
setnames(DT, "a", "a\U00F1o")
test(2266, eval(parse(text="DT[ , .N, a\U00F1o]$N[1L]")), 2L)

# all.equal failed to dispatch to methods of columns, #4543
DT1 = data.table(t = .POSIXct(1590973200, tz='UTC'))
DT2 = data.table(t = .POSIXct(1590974413, tz='UTC'))
test(2267, all.equal(DT1, DT2), "Column 't': Mean absolute difference: 1213")

# rbindlist with many nested lists leads to protection stack overflow #4536
# test case needs to bind list columns with atomic columns to trigger
x = list(
data.table(V1="a"),
data.table(V1=""),
data.table(V2=list(FALSE)),
data.table(V2=list(list(V2A=FALSE, V2B="", V2C="" ))),
data.table(V1=list(list(V1A="", V1B="", V1C="", V1D="", V1E="")), V3 = list(list(V3A=TRUE, V3B="", V3C=1L, V3D=0L, V3F=1L)))
)
N = 5e4
y = x[rep(1:5, N)]
test(2268, rbindlist(y, fill=TRUE), rbindlist(x, fill=TRUE)[rep(1:5, N)])
5 changes: 3 additions & 2 deletions src/fmelt.c
Original file line number Diff line number Diff line change
Expand Up @@ -197,8 +197,9 @@ SEXP uniq_diff(SEXP int_or_list, int ncol, bool is_measure) {
INTEGER(unique_col_numbers)[unique_i++] = INTEGER(int_vec)[i];
}
}
SEXP out = set_diff(unique_col_numbers, ncol);
UNPROTECT(3);
return set_diff(unique_col_numbers, ncol);
return out;
}

SEXP cols_to_int_or_list(SEXP cols, SEXP dtnames, bool is_measure) {
Expand Down Expand Up @@ -247,7 +248,7 @@ SEXP checkVars(SEXP DT, SEXP id, SEXP measure, Rboolean verbose) {
if (length(value_col)) Rprintf(_("Assigned 'measure.vars' are [%s].\n"), concat(dtnames, value_col));
}
} else if (isNull(id) && !isNull(measure)) {
SEXP measure_int_or_list = cols_to_int_or_list(measure, dtnames, true);
SEXP measure_int_or_list = PROTECT(cols_to_int_or_list(measure, dtnames, true)); protecti++;
idcols = PROTECT(uniq_diff(measure_int_or_list, ncol, true)); protecti++;
if (isNewList(measure)) valuecols = measure_int_or_list;
else {
Expand Down
9 changes: 6 additions & 3 deletions src/frollR.c
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,8 @@ SEXP coerceToRealListR(SEXP obj) {
SEXP this_obj = VECTOR_ELT(obj, i);
if (!(isReal(this_obj) || isInteger(this_obj) || isLogical(this_obj)))
error(_("x must be of type numeric or logical, or a list, data.frame or data.table of such"));
SET_VECTOR_ELT(x, i, coerceAs(this_obj, ScalarReal(NA_REAL), /*copyArg=*/ScalarLogical(false))); // copyArg=false will make type-class match to return as-is, no copy
SET_VECTOR_ELT(x, i, coerceAs(this_obj, PROTECT(ScalarReal(NA_REAL)), /*copyArg=*/ScalarLogical(false))); // copyArg=false will make type-class match to return as-is, no copy
UNPROTECT(1); // as= input to coerceAs()
}
UNPROTECT(protecti);
return x;
Expand Down Expand Up @@ -145,7 +146,8 @@ SEXP frollfunR(SEXP fun, SEXP obj, SEXP k, SEXP fill, SEXP algo, SEXP align, SEX
error(_("fill must be a vector of length 1"));
if (!isInteger(fill) && !isReal(fill) && !isLogical(fill))
error(_("fill must be numeric or logical"));
double dfill = REAL(PROTECT(coerceAs(fill, ScalarReal(NA_REAL), ScalarLogical(true))))[0]; protecti++;
double dfill = REAL(PROTECT(coerceAs(fill, PROTECT(ScalarReal(NA_REAL)), ScalarLogical(true))))[0]; protecti++;
UNPROTECT(1); // as= input to coerceAs()

bool bnarm = LOGICAL(narm)[0];

Expand Down Expand Up @@ -257,7 +259,8 @@ SEXP frollapplyR(SEXP fun, SEXP obj, SEXP k, SEXP fill, SEXP align, SEXP rho) {
error(_("fill must be a vector of length 1"));
if (!isInteger(fill) && !isReal(fill) && !isLogical(fill))
error(_("fill must be numeric or logical"));
double dfill = REAL(PROTECT(coerceAs(fill, ScalarReal(NA_REAL), ScalarLogical(true))))[0]; protecti++;
double dfill = REAL(PROTECT(coerceAs(fill, PROTECT(ScalarReal(NA_REAL)), ScalarLogical(true))))[0]; protecti++;
UNPROTECT(1); // as= input to coerceAs()

SEXP ans = PROTECT(allocVector(VECSXP, nk * nx)); protecti++;
if (verbose)
Expand Down
3 changes: 2 additions & 1 deletion src/gsumm.c
Original file line number Diff line number Diff line change
Expand Up @@ -591,7 +591,8 @@ SEXP gmean(SEXP x, SEXP narmArg)
x = PROTECT(coerceVector(x, REALSXP)); protecti++;
case REALSXP: {
if (INHERITS(x, char_integer64)) {
x = PROTECT(coerceAs(x, /*as=*/ScalarReal(1), /*copyArg=*/ScalarLogical(TRUE))); protecti++;
x = PROTECT(coerceAs(x, /*as=*/PROTECT(ScalarReal(1)), /*copyArg=*/ScalarLogical(TRUE))); protecti++;
UNPROTECT(1); // as= input to coerceAs()
}
const double *restrict gx = gather(x, &anyNA);
ans = PROTECT(allocVector(REALSXP, ngrp)); protecti++;
Expand Down
4 changes: 3 additions & 1 deletion src/rbindlist.c
Original file line number Diff line number Diff line change
Expand Up @@ -515,12 +515,14 @@ SEXP rbindlist(SEXP l, SEXP usenamesArg, SEXP fillArg, SEXP idcolArg)
if (w==-1 || !length(thisCol=VECTOR_ELT(li, w))) { // !length for zeroCol warning above; #1871
writeNA(target, ansloc, thisnrow, false); // writeNA is integer64 aware and writes INT64_MIN
} else {
bool listprotect = false;
if ((TYPEOF(target)==VECSXP || TYPEOF(target)==EXPRSXP) && TYPEOF(thisCol)!=TYPEOF(target)) {
// do an as.list() on the atomic column; #3528
thisCol = PROTECT(coerceVector(thisCol, TYPEOF(target))); nprotect++;
thisCol = PROTECT(coerceVector(thisCol, TYPEOF(target))); listprotect = true;
}
// else coerces if needed within memrecycle; with a no-alloc direct coerce from 1.12.4 (PR #3909)
const char *ret = memrecycle(target, R_NilValue, ansloc, thisnrow, thisCol, 0, -1, idcol+j+1, foundName);
if (listprotect) UNPROTECT(1); // earlier unprotect rbindlist calls with lots of lists #4536
if (ret) warning(_("Column %d of item %d: %s"), w+1, i+1, ret);
// e.g. when precision is lost like assigning 3.4 to integer64; test 2007.2
// TODO: but maxType should handle that and this should never warn
Expand Down

0 comments on commit 353675d

Please sign in to comment.