Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fread read compressed .bgz #5474

Merged
merged 5 commits into from
May 19, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
Expand Up @@ -42,6 +42,8 @@

13. `dcast`gains `value.var.in.dots`, `value.var.in.LHSdots` and `value.var.in.RHSdots` arguments, [#5824](https://github.com/Rdatatable/data.table/issues/5824). This allows the `value.var` variable(s) in `dcast` to be represented by `...` in the formula (if not otherwise mentioned). Thanks to @iago-pssjd for the report and PR.

14. `fread` loads `.bgz` files directly, [#5461](https://github.com/Rdatatable/data.table/issues/5461). Thanks to @TMRHarrison for the request with proposed fix, and Benjamin Schwendinger for the PR.

## BUG FIXES

1. `unique()` returns a copy the case when `nrows(x) <= 1` instead of a mutable alias, [#5932](https://github.com/Rdatatable/data.table/pull/5932). This is consistent with existing `unique()` behavior when the input has no duplicates but more than one row. Thanks to @brookslogan for the report and @dshemetov for the fix.
Expand Down
4 changes: 2 additions & 2 deletions R/fread.R
Original file line number Diff line number Diff line change
Expand Up @@ -116,10 +116,10 @@ yaml=FALSE, autostart=NA, tmpdir=tempdir(), tz="UTC")
gz_signature = as.raw(c(0x1F, 0x8B))
bz2_signature = as.raw(c(0x42, 0x5A, 0x68))
gzsig = FALSE
if ((w <- endsWithAny(file, c(".gz",".bz2"))) || (gzsig <- identical(head(file_signature, 2L), gz_signature)) || identical(head(file_signature, 3L), bz2_signature)) {
if ((w <- endsWithAny(file, c(".gz", ".bgz",".bz2"))) || (gzsig <- identical(head(file_signature, 2L), gz_signature)) || identical(head(file_signature, 3L), bz2_signature)) {
if (!requireNamespace("R.utils", quietly = TRUE))
stopf("To read gz and bz2 files directly, fread() requires 'R.utils' package which cannot be found. Please install 'R.utils' using 'install.packages('R.utils')'.") # nocov
FUN = if (w==1L || gzsig) gzfile else bzfile
FUN = if (w<=2L || gzsig) gzfile else bzfile
R.utils::decompressFile(file, decompFile<-tempfile(tmpdir=tmpdir), ext=NULL, FUN=FUN, remove=FALSE) # ext is not used by decompressFile when destname is supplied, but isn't optional
file = decompFile # don't use 'tmpFile' symbol again, as tmpFile might be the http://domain.org/file.csv.gz download
on.exit(unlink(decompFile), add=TRUE)
Expand Down
Binary file added inst/tests/ch11b.dat.bgz
Binary file not shown.
3 changes: 3 additions & 0 deletions inst/tests/tests.Rraw
Original file line number Diff line number Diff line change
Expand Up @@ -17903,6 +17903,9 @@ if (test_R.utils) {
test(2229.4, fread(f, logical01=FALSE)[,1], data.table(V1 = 1:100))
file.copy(testDir("issue_785_fread.txt.gz"), f, overwrite=TRUE)
test(2229.5, fread(f, logical01=FALSE)[,25], data.table(Sv3 = c(10,14,14,15)))
# support .bgz with fread #5461
file.copy(testDir("ch11b.dat.bgz"), f, overwrite=TRUE)
test(2229.55, fread(f, logical01=FALSE)[,1], data.table(V1 = 1:100))
}
unlink(f)
# not supporting multi file zips yet
Expand Down
Loading