Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restore comma-separated columns to by= and key= #6049

Merged
merged 11 commits into from
Jul 11, 2024
Merged
4 changes: 0 additions & 4 deletions NEWS.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,10 +2,6 @@

# data.table [v1.15.99](https://github.com/Rdatatable/data.table/milestone/30) (in development)

## BREAKING CHANGES

1. Usage of comma-separated character strings representing multiple columns in `data.table()`'s `key=` argument and `[`'s `by=`/`keyby=` arguments is deprecated, [#4357](https://github.com/Rdatatable/data.table/issues/4357). While sometimes convenient, ultimately it introduces inconsistency in implementation that is not worth the benefit to maintain. NB: this hard deprecation is temporary in the development version. Before release, it will soften into the normal data.table deprecation cycle starting from introducing the new behavior with an option, then changing the default for the option with a warning, then upgrading the warning to an error before finally removing the option and the error.

## NEW FEATURES

1. `print.data.table()` shows empty (`NULL`) list column entries as `[NULL]` for emphasis. Previously they would just print nothing (same as for empty string). Part of [#4198](https://github.com/Rdatatable/data.table/issues/4198). Thanks @sritchie73 for the proposal and fix.
Expand Down
7 changes: 3 additions & 4 deletions R/data.table.R
Original file line number Diff line number Diff line change
Expand Up @@ -53,9 +53,7 @@ data.table = function(..., keep.rownames=FALSE, check.names=FALSE, key=NULL, str
ans = as.data.table.list(x, keep.rownames=keep.rownames, check.names=check.names, .named=nd$.named) # see comments inside as.data.table.list re copies
if (!is.null(key)) {
if (!is.character(key)) stopf("key argument of data.table() must be character")
if (length(key)==1L) {
if (key != strsplit(key,split=",")[[1L]]) stopf("Usage of comma-separated literals in %s is deprecated, please split such entries yourself before passing to data.table", "key=")
}
if (length(key)==1L) key = cols_from_csv(key)
setkeyv(ans,key)
} else {
# retain key of cbind(DT1, DT2, DT3) where DT2 is keyed but not DT1. cbind calls data.table().
Expand Down Expand Up @@ -797,7 +795,8 @@ replace_dot_alias = function(e) {

if (mode(bysub) == "character") {
if (any(grepl(",", bysub, fixed = TRUE))) {
stopf("Usage of comma-separated literals in %s is deprecated, please split such entries yourself before passing to data.table", "by=")
if (length(bysub) > 1L) stopf("'by' is a character vector length %d but one or more items include a comma. Either pass a vector of column names (which can contain spaces, but no commas), or pass a vector length 1 containing comma separated column names. See ?data.table for other possibilities.", length(bysub))
bysub = cols_from_csv(bysub)
}
bysub = gsub("^`(.*)`$", "\\1", bysub) # see test 138
nzidx = nzchar(bysub)
Expand Down
5 changes: 2 additions & 3 deletions R/fread.R
Original file line number Diff line number Diff line change
Expand Up @@ -340,9 +340,8 @@ yaml=FALSE, autostart=NA, tmpdir=tempdir(), tz="UTC")
if (!is.null(key) && data.table) {
if (!is.character(key))
stopf("key argument of data.table() must be a character vector naming columns (NB: col.names are applied before this)")
if (length(key) == 1L) {
if (key != strsplit(key,split=",")[[1L]]) stopf("Usage of comma-separated literals in %s is deprecated, please split such entries yourself before passing to data.table", "key=")
}
if (length(key) == 1L)
key = cols_from_csv(key)
setkeyv(ans, key)
}
if (yaml) setattr(ans, 'yaml_metadata', yaml_header)
Expand Down
4 changes: 4 additions & 0 deletions R/utils.R
Original file line number Diff line number Diff line change
Expand Up @@ -114,6 +114,10 @@ brackify = function(x, quote=FALSE) {
sprintf('[%s]', toString(x))
}

# convenience for specifying columns in some cases, e.g. by= and key=
# caller should ensure length(x) == 1 & handle accordingly.
cols_from_csv = function(x) strsplit(x, ',', fixed=TRUE)[[1L]]

# patterns done via NSE in melt.data.table and .SDcols in `[.data.table`
# was called do_patterns() before PR#4731
eval_with_cols = function(orig_call, all_cols) {
Expand Down
9 changes: 7 additions & 2 deletions inst/tests/tests.Rraw
Original file line number Diff line number Diff line change
Expand Up @@ -1710,7 +1710,8 @@ test(540, DT[,sum(v),by=eval(a)], data.table(a=1:0,V1=c(11L,10L)))
test(541, DT[,sum(v),keyby=eval(a)], data.table(a=0:1,V1=c(10L,11L),key="a"))

test(542, DT[,sum(v),keyby=c("a","b","c")]$V1, INT(1,3,4,6,5,2))
# tests 543,544 were of deprecated behavior to allow comma-separated entries to keyby
test(543, DT[,sum(v),keyby="a,b,c"]$V1, INT(1,3,4,6,5,2))
test(544, DT[,sum(v),keyby=c("a","b,c")], error="but one or more items include a comma")

# Test single expressions passed to by, FR#1743 in v1.8.0
DT = data.table(a=1:4,date=as.IDate("2012-02-28")+0:3,v=5:8)
Expand Down Expand Up @@ -1777,7 +1778,11 @@ test(569, DT[,list(.N=.N),list(a,b)][,.N,a], error="The column '.N' can't be gro
test(570, DT[,list(.N=.N),list(a,b)][,unique(.N),a], error="The column '.N' can't be grouped because")
test(570.1, DT[,list(.I=.I),list(a,b)][,.I,a], error="The column '.I' can't be grouped because")

# tests 571-573 were of deprecated behavior to allow comma-separated entries in by=
# Test spaces in by="..." format, datatable-help on 31 March
DT = data.table("a "=1:2, "b"=3:4," b"=5:6, v=1:6)
test(571, DT[,sum(v),by="b, b"], data.table("b"=3:4, " b"=5:6, V1=c(9L,12L)))
test(572, DT[,sum(v),by="a , b"], data.table("a "=1:2, " b"=5:6, V1=c(9L,12L)))
test(573, DT[,sum(v),by="b, a"], error=base_messages$missing_object(" a"))

# Test base::unname, used by melt, and only supported by data.table for DF compatibility for non-dtaware packages
DT = data.table(a=1:3, b=4:6)
Expand Down
Loading