-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
data.table melt memory efficiency #27
Comments
I created a test here but it shows no difference: # Improvement brought by: https://github.com/Rdatatable/data.table/pull/5054
"melt improved in #5054" = atime::atime_test(
N = 10^seq(1, 10),
setup = {
dt = data.table(x = sample(c(1:N, NA), N, replace = TRUE),
y = sample(c(letters, NA), N, replace = TRUE),
z = sample(c(NA, 1:N), N, replace = TRUE),
w = replicate(N, paste0(sample(letters, 20, replace = TRUE), collapse = "")),
v = I(lapply(1:N, function(i) list(sample(1:100, 10, replace = TRUE))))
)
},
expr = data.table:::melt(dt, id.vars = c("x", "z"), na.rm = TRUE),
Slow = "fd24a3105953f7785ea7414678ed8e04524e6955", # Parent of the merge commit (https://github.com/Rdatatable/data.table/commit/ed72e398df76a0fcfd134a4ad92356690e4210ea) of the PR (https://github.com/Rdatatable/data.table/pull/5054) that brought the improvement
Fast = "ed72e398df76a0fcfd134a4ad92356690e4210ea") # Merge commit of the PR (https://github.com/Rdatatable/data.table/pull/5054) that brought the improvement Would you suggest me to revise this? (also, anything particular that I should test for here? I tried to have a lot of NAs as input, with |
NAs in the input data table do not affect this PR, which improved efficiency when there were NAs in Using the code below pkg.edit.fun = function(old.Package, new.Package, sha, new.pkg.path) {
pkg_find_replace <- function(glob, FIND, REPLACE) {
atime::glob_find_replace(file.path(new.pkg.path, glob), FIND, REPLACE)
}
Package_regex <- gsub(".", "_?", old.Package, fixed = TRUE)
Package_ <- gsub(".", "_", old.Package, fixed = TRUE)
new.Package_ <- paste0(Package_, "_", sha)
pkg_find_replace(
"DESCRIPTION",
paste0("Package:\\s+", old.Package),
paste("Package:", new.Package))
pkg_find_replace(
file.path("src", "Makevars.*in"),
Package_regex,
new.Package_)
pkg_find_replace(
file.path("R", "onLoad.R"),
Package_regex,
new.Package_)
pkg_find_replace(
file.path("R", "onLoad.R"),
sprintf('packageVersion\\("%s"\\)', old.Package),
sprintf('packageVersion\\("%s"\\)', new.Package))
pkg_find_replace(
file.path("src", "init.c"),
paste0("R_init_", Package_regex),
paste0("R_init_", gsub("[.]", "_", new.Package_)))
pkg_find_replace(
"NAMESPACE",
sprintf('useDynLib\\("?%s"?', Package_regex),
paste0('useDynLib(', new.Package_))
}
vres <- atime::atime_versions(
pkg.path="~/R/data.table",
pkg.edit.fun=pkg.edit.fun,
setup={
DT <- as.data.table(as.list(1:N))
measure.vars <- lapply(1:N,function(i){
x=rep(NA,N)
x[i]=i
x
})
},
expr=data.table:::melt(DT, measure.vars=measure.vars),
Slow = "fd24a3105953f7785ea7414678ed8e04524e6955", # Parent of the merge commit (https://github.com/Rdatatable/data.table/commit/ed72e398df76a0fcfd134a4ad92356690e4210ea) of the PR (https://github.com/Rdatatable/data.table/pull/5054) that brought the improvement
Fast = "ed72e398df76a0fcfd134a4ad92356690e4210ea") I get the result below, which indicates a small constant factor speedup, but I don't see any difference in memory usage. |
Ah I should have had them in
Do you think it's worth adding a test case just for that improvement in time when we are anticipating a memory improvement? (I think we can skip this one, but up to you) |
yes please submit a PR |
Sure! |
Completed in Rdatatable/data.table#6526 |
verify that this PR actually is more memory efficient Rdatatable/data.table#5054
The text was updated successfully, but these errors were encountered: