- remove `PackedStatExtent` class, releasing it for `terra`; `reproducible` now uses a `PackedStatExtent2`; this will eventually be replaced by the `terra` `PackedStatExtent` when this conflict is removed;
- `.robustDigest` method for `"character"` no longer evaluates character strings as files by default. A user can force the old behaviour with `options(reproducible.testCharacterAsFile = TRUE)`. The old behaviour could cause an unwanted and inexplicable hang, e.g., with a `data.frame` containing thousands of rows of a character vector that represent filenames that existed, but whose content was not expected to be digested; digesting could take hours. To digest files, the user must now explicitly coerce to `"Path"` with `asPath(x)`, or `fs::as_fs_path`, as the previous hanging behaviour was surprising and could not be easily diagnosed;
- `url` in `prepInputs` can now point to a directory; use `alsoExtract` to pick files by regular expression;
- improved handling of symlinks in `remapFileNames()`;
- pass `terra::project()` arguments `use_gdal` and `by_util` through `projectTo()` to `terra::project()`;
- in some cases of downloading a file within `preProcess`, supplying a `user_agent` (which happens automatically within the function) would cause the download to fail; now there is some redundancy within `dlGeneric` that will retry without a `user_agent` if it detects this issue;
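A minimal sketch of the `"character"` digesting change described above, assuming `reproducible` is installed. The temporary file `f` and the variable names are hypothetical and only serve the illustration; the option name and `asPath()` are taken from the entry itself.

```r
# Hypothetical file used only for this illustration
f <- tempfile(fileext = ".txt")
writeLines("some content", f)

if (requireNamespace("reproducible", quietly = TRUE)) {
  # New default: `f` is treated as an ordinary character string
  digestAsString <- reproducible::.robustDigest(f)

  # Explicit opt-in: coerce to "Path" so that it is treated as a file
  digestAsPath <- reproducible::.robustDigest(reproducible::asPath(f))

  # Session-wide opt-in to the old behaviour, restored afterwards
  old <- options(reproducible.testCharacterAsFile = TRUE)
  digestOldBehaviour <- reproducible::.robustDigest(f)
  options(old)
}
```

Coercing individual columns or arguments with `asPath()` keeps the expensive file digest limited to the places where it is actually wanted.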
- begin transition to use `cli` instead of custom messaging functions;
- remove `crayon` dependency;
- begin to replace `httr` with `httr2` for some pieces; transition not complete;
- When forcing `cacheId`, e.g., in `Cache(..., cacheId = "myCacheItem")`, `myCacheItem` was not used. Fixed.
- `prepInputs(..., fun = sf::st_read)` now works as expected, like `prepInputs(..., fun = "sf::st_read")`
- new family of functions that are called inside `postProcessTo` that use `sf::gdal_utils` directly. These are still experimental and will only be activated with `options("reproducible.gdalwarp" = TRUE)`
- default for `gdalMask` has changed for "touches". It now has an equivalent of `terra::mask(..., touches = TRUE)`, using `"-wo CUTLINE_ALL_TOUCHED=TRUE"`
- `gdalProject` now uses 2 threads, setting `"-wo NUM_THREADS=2"`; can be changed by user with `options("reproducible.gdalwarpThreads" = X)`; see `?reproducibleOptions`
- `gdal*` functions now address `datatype` issues
- `gdal*` defaults to `FLT8S` if `datatype` not passed
- `makeRelative`, `makeAbsolute` and similar have been created to ease many issues encountered in `preProcess`
- `showSimilar` (e.g., `options(reproducible.showSimilar = 1)`) now preferentially shows the most recent item in cache if there are several with equivalent matching.
- overhaul of messaging in `Cache` and `prepInputs` families; functions are highlighted with a different colour; indent level reflects nesting of both `Cache` and `prepInputs`, so it is easier to identify which message goes with which function call.
- `preProcess` is a lot faster now for large numbers of files; uses `CHECKSUMS` more effectively and fewer times
- `retry` now captures its `expr` so it doesn't need a `quote`; is like `try` now.
- `showSimilar` mechanisms now return the most recent, if there are >1 similar that are equivalently similar
- if a user is having troubles with `googledrive` for e.g., large files on spotty connections, instructions for using `gdown` are provided
- `showCache`, `clearCache` now have extra arguments `fun`, `cacheId`, and `...` now can take any arbitrary `tag = value` pair. The `cacheId` argument will be very fast if a user is not using `DBI`, i.e., `useDBI()` is `FALSE`.
- `.wrap` and `.unwrap` can now deal with `SpatVectorCollection` (a `terra` class that does not have a `wrap`/`unwrap` method in `terra`)
- ALTREP digesting when using `spooky` or `fastdigest` was not stable for `integers` and `factors`. There is now a workaround in `.robustDigest` that stabilizes these by expanding them from their ALTREP representation first. Since they will be saved and recovered anyway, this will have little effect.
- `.wrap` and `.unwrap` are becoming more mature and can handle many more classes effectively. Methods can still be written, if needed.
- lots of testing with `cacheSaveFormat = "qs"`, which previously was not reliable, especially for environments. With all recent changes to `.wrap` and `.unwrap`, these appear stable now and should be able to be used for `environments`.
- `switchDataType` can now correctly switch between `gdal` formats and `terra`
- many messaging fixes that were imprecise or missing
- re-submission after removal from CRAN
- `fastdigest` was removed from CRAN and so is removed from here.
- critical bugfixes for file-backed `SpatRaster` objects
- new function `isUpdated()` to determine whether a cached object has been updated;
- `makeRelative()` is now exported for use downstream (e.g., `SpaDES.core`);
- new functions `getRelative()` and `normPathRel()` for improved symlink handling (#362);
- messaging is improved for `Cache` with the function named instead of just `cacheId`
- messaging for `prepInputs`: minor changes
- more edge cases for `Checksums` dealt with, so fewer unneeded downloads
- `wrapSpatRaster` (`wrap` for file-backed `SpatRaster` objects) fixes for more edge cases
- `postProcessTo` can now use `sf::gdal_utils` for the case where `from` is a gridded object and `to` is a polygon vector. This appears to be between 2x and 10x faster in tests.
- `postProcessTo` does a pre-crop (with buffer) to make the `projectTo` faster. When both `from` and `to` are vector objects, this pre-crop appears to create slivers in some cases. This step is now skipped for these cases.
- `Cache` can now deal with unnamed functions, e.g., `Cache((function(x) x)(1))`. It will be referred to as "headless".
- `terra` would fail if internet was unavailable, even when internet is not necessary, due to needing to retrieve projection information. Many cases where this happens will now divert to use `sf`.
- `Cache` can now skip calculating `objSize`, which can take a non-trivial amount of time for large, complicated objects; see `reproducibleOptions()`
- `Filenames` for some classes returned ""; now returns NULL so character vectors are only pointers to files
- Cache on a terra object that writes file to disk, when `quick` argument is specified, was failing, always creating the same object; fixed with #PR368
- `useDBI` was incorrectly used if a user had set the option prior to package loading. Now works as expected.
- several other minor
- `preProcess` deals better with more cases of nested paths in archives.
- more edge cases corrected for `inputPaths`
- minor formatting changes
- sometimes a cache entry gets corrupted. Previously, a message was supplied on how to fix; now this is just tried directly instead of just suggesting a user do it.
- only use character strings when comparing `getRVersion() <= "XXX"`
- fixes for `assessDataType` for categorical (factor) `Raster` and `SpatRaster`
- Address change in `round` with `R > 4.3.1`; it is now a primitive that does method dispatch. The failure was identified with unit tests by Luke Tierney, who was making the change in `base::round`.
- several identified and fixed (PRs by Ceres Barros, notably, PRs #341, #342, #343). These fix a missing argument in a `.unwrap` call, and a missing check in `preProcess`, when `targetFilePath` was `NULL`.
- minor documentation updates
- Updates of `Copy` & new `.wrap`, `.unwrap` generics and methods to wrap classes that don't save well to disk as is. This uses a name similar to `terra::wrap`, but with slight differences internally to allow for `SpatRaster` objects that are file-backed and must have their files moved when they are unwrapped.
- `loadFiles` updated for more cases
- convert to using `withr` throughout testing for cleaning up
- more methods for `Filename` added, including for `Path` class
- `Cache(..., useCloud = TRUE)` had many cases that were not working; known cases are now working. Also, files from file-backed cases are now placed inside the `cacheOutputs` folder rather than inside a separate folder (used to be "rasters")
- several small for edge cases
- none
- `reproducible.useFuture` now defaults to `"multisession"`
- updated tests to deal with `data.table` development branch (#314)
- removed all use of `data.table::setattr` to deal with "modified compiler constants" issue that was detected during CRAN checks
- Improvements with testing using GitHub Actions
- `preProcess` failed when `googledrive` url filename could be found, but `destinationPath` was not `"."`
- `normPath` had different behaviour on *nix-alikes and Windows. Now it is the same.
- `SpatRaster` objects if saved to a specific, non relative (to `getwd()`) path would not be recovered correctly (#316)
- Several other Issues that addressed edge cases for `prepInputs` and family.
- new optional backend for `Cache` via `options(reproducible.useDBI = FALSE)` is single data files with the same `basename` as the cached object, i.e., with the same `cacheId` in the file name. This is a replacement for `RSQLite` and will likely become the default in the next release. This approach makes cloud caching easier as all metadata are available in small binary files for each cached object. This is simpler, faster and creates far fewer package dependencies (now 11 recursive; before 27 recursive). If a user has `DBI` and `RSQLite` installed, then the backend will default to use these currently, i.e., the previous behaviour. The user can change the backend without loss of Cache data.
- moved `raster` and `sp` to `Suggests`; no more internal functions use these. User can still work with `Raster` and `sp` class objects as before.
- `preProcess` can now handle Google docs files, if `type = ...` is passed.
- `postProcess` now uses `terra` and `sf` internally (with #253) throughout the family of `postProcess` functions. The previous `*Input` and `*Output` functions now redirect to the new `*To*` functions. These are faster, more stable, and cover vastly more cases than the previous `*Inputs` family. The old backends no longer work as before.
- minor functions to assist with transition from `raster` to `terra`: `maxFn`, `minFn`, `rasterRead`
- `.dealWithClass` and `.dealWithClassOnRecovery` are now exported generics, with several methods here, notably, list, environment, default
- other miscellaneous changes to deal with `raster` to `terra` transition (e.g. `studyAreaName` can deal with `SpatVector`)
- `prepInputs` now deals correctly with archives that have sub-folder structure in all examples and tests, esp. #181.
- `prepInputs` can now deal with `.gdb` files. Though, it is limited to `sf` out of the box, so e.g., Raster layers inside `gdb` files are not supported (yet?). User can pass `fun = NA` to not try to load it, but at least have the `.gdb` file locally on disk.
- `hardLinkOrCopy` now uses `linkOrCopy(symlink = FALSE)`; more cases dealt with especially nested directory structures that do not exist in the `to`.
- many GitHub issues closed after transition to using `terra` and `sf`.
- `preProcess` had multiple changes. The following now work: archives with subfolders, archives with subfolders with identical basenames (different dirnames), gdb files, other files where `targetFile` is a directory.
- ~40 issues were closed with current release.
- code coverage now approaching 85%
- substantial changes to `preProcess` for minor efficiency gains, edge cases, code cleaning
- new function `CacheGeo` that weaves together `prepInputs` and `Cache` to create a geo-spatial caching. See help and examples.
- `maskTo` now allows `touches` arg for `terra::mask`
- `Spatial` class is also "fixed" in `fixErrorsIn`
- `prepInputs` and `preProcess` now capture `dlFun`, so user can pass unquoted `dlFun`
- `Copy` method for `SpatRaster`, with and without file-backing
- `Cache(..., useCloud = TRUE)` reworked so appears to be more robust than previously.
- `maskTo` now works even if `to` is larger than `from`
- `netCDF` works with `prepInputs`; thanks to user nbsmokee with PR #300.
- no spatial packages are automatically installed any more; to work with `prepInputs` and family, the user will have to install `terra` and `sf` at a minimum.
- `terra`, `sf` are in `Suggests`
- removed entirely: `fasterize`, `fpCompare`, `magrittr`
- moved to `Suggests`: `raster`, `sp`, `rlang`
- A normal (minimal) install of `reproducible` no longer installs `DBI`, nor does it use `RSQLite`. All cache repository database files will be individual binary files in the `cacheOutputs` folder. If a user has `DBI` and a `SQLite` engine, then the previous behaviour will be used.
- `reproducible.useNewDigestAlgorithm` is no longer an option as the old algorithms do not work reliably.
- removed `assessDataTypeGDAL()`, `clearStubArtifacts()`,
- removed non-exported `digestRasterLayer2()`; `evalArgsOnly()`; `.getSourceURL()`; `.getTargetCRS()`; `.checkSums()`, `.groupedMessage()`; `.checkForAuxililaryFiles()`
- `option("reproducible.polygonShortcut")` removed
- `.basename` renamed to `basename2`
- `Cache` was incorrectly dealing with `environment` and `environment-like` objects. Since some objects, e.g., `Spat*` objects in `terra`, must be wrapped prior to saving, environments must be scanned for these classes of objects prior to saving. This previously only occurred for `list` objects;
- When working with revdep `SpaDES.core`, there were some cases where the `Cache` was failing as it could not find the module name;
- during transition from `postProcess` (using `raster` and `sp`) to `postProcessTo`, some cases were falling through the cracks; these have been addressed.
- none
- `Cache` now captures the first argument passed to it without evaluating it, so `Cache(rnorm(1))` now works as expected.
- As a result of previous, `Cache` now works with base pipe |> (with R >= 4.1).
- Due to some internal changes in the way arguments are evaluated and digested, there may be some cache entries that will be rerun. However, in simple cases of `FUN` passed to `Cache`, there should be no problems with previous cache databases being successfully recovered.
- Added more unit tests
- Reworked `Cache` internals so that digesting is more accurate, as the correct methods for functions are more accurately found, objects within functions are more precisely evaluated.
- Improved documentation:
- Examples were reworked, replaced, improved;
- All user-facing exported functions and methods now have complete documentation;
- Added `()` in DESCRIPTION for functions;
- Added `\value` in `.Rd` files for exported methods (structure, the class, the output meaning);
- Remove commented code in examples.
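The unevaluated-first-argument and base-pipe entries for `Cache` above can be sketched as follows, assuming `reproducible` is installed and R >= 4.1. The cache folder name and the variables `a`, `b`, `d` are hypothetical; only `Cache()` and the `reproducible.cachePath` option come from this changelog.

```r
hasPkg <- requireNamespace("reproducible", quietly = TRUE)

if (hasPkg) {
  library(reproducible)

  # Hypothetical throwaway cache location for this illustration
  old <- options(reproducible.cachePath = file.path(tempdir(), "cacheSketch"))

  a <- Cache(rnorm(1))   # first call: rnorm(1) is evaluated and the result stored
  b <- Cache(rnorm(1))   # same call again: expected to be recovered from the cache

  # Base pipe: the parser rewrites this to Cache(rnorm(1))
  d <- rnorm(1) |> Cache()

  options(old)           # restore the previous cache path setting
}
```

Because the pipe form is rewritten to the same call before evaluation, it is expected to hit the same cache entry as the direct form.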
- `postProcess` now also checks resolution when assessing whether to project
- `prepInputs` has an internal `Cache` call for loading the object into memory; this was incorrectly evaluating all files if there were more than one file downloaded and extracted. This resulted in cases, e.g. shapefiles, being considered identical if they had the identical geometries, even if their data were different. This is fixed now as it uses the digest of all files extracted.
- remove defunct argument `digestPathContent` from `Cache`
- `options("reproducible.useGDAL")` is now deprecated; the package is moving towards `terra`.
- none
- none
- fix tests for `postProcessTo` to deal with changes in GDAL/PROJ/GEOS (#253; @rsbivand)
- fixed issue with masking
- Drop support for R 3.6 (#230)
- remove `gdalUtilities`, `gdalUtils`, and `rgeos` from `Suggests`
- Added minimum versions of `raster` and `terra`, because previous versions were causing collisions.
- all direct calls to GDAL are removed: only `terra` and `sf` are used throughout
- `prepInputs` can now take `fun` as a quoted expression on `x`, the object loaded by `dlFun` in `preProcess`
- `preProcess` arg `dlFun` can now be a quoted expression
- changes to the internals and outputs of `objSize`; now is primarily a wrapper around `lobstr::obj_size`, but has an option to get more detail for lists and environments.
- `.robustDigest` now deals explicitly with numerics, which digest differently on different OSs. Namely, they get rounded prior to digesting. Through trial and error, it was found that setting `options("reproducible.digestDigits" = 7)` was sufficient for all known cases. Rounding to deeper than 7 decimal places was insufficient. There are also new methods for `language`, `integer`, `data.frame` (which does each column one at a time to address the numeric issue)
- New version of `postProcess` called `postProcessTo`. This will eventually replace `postProcess` as it is much faster in all cases and has a simpler code base, thanks to the fantastic work of Robert Hijmans (`terra`) and all the upstream work that `terra` relies on
- Minor message updates, especially for "adding to memoised copy...". The three dots made it seem like it was taking a long time, when in reality it is instantaneous and is the last thing that happens in the `Cache` call. If there is a delay after this message, then it is the code following the `Cache` call that is (silently) slow.
- `retry` can now return a named list for the `exprBetween`, which allows for more than one object to be modified between retries.
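The idea behind rounding numerics before digesting, from the `.robustDigest` entry above, can be illustrated with base R alone. This is only a sketch of the principle, not the package's internal code: ordinary floating-point arithmetic stands in for the small cross-platform differences, and `x` and `y` are hypothetical values.

```r
# Two values that are mathematically equal but differ in their last bits
x <- 0.1 + 0.2
y <- 0.3

# The raw doubles are not identical, so a hash of their bytes would differ
rawEqual <- identical(x, y)

# Rounding to 7 decimal places (the reproducible.digestDigits value noted
# above) removes the difference before any hashing would take place
roundedEqual <- identical(round(x, 7), round(y, 7))

print(c(raw = rawEqual, rounded = roundedEqual))
```

Rounding trades a small loss of precision for digests that stay stable when the same computation is run on different systems.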
- `.robustDigest` was removing Cache attributes from objects under many conditions, when it should have left them there. It is unclear what the issues were, as this would likely not have impacted `Cache`. Now these attributes are left on.
- `data.table` objects appear to not be recovered correctly from disk (e.g., from Cache repository). We have added `data.table::copy` when recovering from Cache repository
- `clearCache` and `cc` did not correctly remove file-backed raster files (when not clearing whole CacheRepo); this may have resulted in a proliferation of files, each a filename with an underscore and a new higher number. This fix should eliminate this problem.
- deal with development versions of GDAL in `getGDALVersion()` (#239)
- fix issue with `maskInputs()` when not passing `rasterToMatch`.
- fix issue with `isna.SpatialFix` when using `postProcess.quosure`
- `lwgeom` now a suggested package
- `terra` class objects can now be correctly saved and recovered by `Cache`
- `fixErrors` can now distinguish `testValidity = NA`, meaning don't fix errors; `testValidity = FALSE`, meaning run buffering, which fixes many errors, without first testing whether there are any invalid polygons (the test may be slow); and `testValidity = TRUE`, meaning test for validity and, if some polygons are invalid, run the buffer.
- Change default option to `reproducible.useNewDigestAlgorithm = 2` which will have user visible changes. To keep old behaviour, set `options(reproducible.useNewDigestAlgorithm = 1)`
- minor changes to messaging when `options(reproducible.showSimilar)` is set. It is now more compact e.g., 3 lines instead of 5.
- added `sf` methods to `studyAreaName`
- A small, but very impactful bug that created false positive `Cache` returns; i.e., a 2nd time through a Cache would return a cached copy, when some of the arguments were different. It occurred when the differences were in unnamed arguments only.
`reproducible` will be slowly changing the defaults for vector GIS datasets from the `sp` package to the `sf` package.
There is a large user-visible change that will come (in the next release), which will cause `prepInputs` to read `.shp` files with `sf::st_read` instead of `raster::shapefile`, as it is much faster. To change now, set `options("reproducible.shapefileRead" = "sf::st_read")`
- default `fun` in `prepInputs` for shapefiles (`.shp`) is now `sf::st_read` if the system has `sf` installed. This can be overridden with `options("reproducible.shapefileRead" = "raster::shapefile")`, and this is indicated with a message at the moment this is occurring, as it will cause different behaviour.
- `quick` argument in `Cache` can now be a character vector, allowing individual character arguments to be digested as character vectors and others to be digested as files located at the specified path as represented by the character vector.
- `objSize` previously included objects in `namespaces`, `baseenv` and `emptyenv`, so it was generally too large. Now uses the same criteria as `pryr::object_size`
- improvements with messaging when `unzip` missing (thanks to @CeresBarros #202)
- while unzipping, will also search for `7z.exe` on Windows if the object is larger than 2GB, if can't find `unzip`.
- `fun` argument in `prepInputs` and family can now be a quoted expression.
- `archive` argument in `prepInputs` can now be `NA` which means to treat the file downloaded not as an archive, even if it has a `.zip` file extension
- many minor improvements to functioning of esp. `prepInputs`
- speed improvements during `postProcess` especially for very large objects (>5GB tested). Previously, it was running many `fixErrors` calls; now only calls `fixErrors` on fail of the proximate call (e.g., st_crop or whatever)
- `retry` now has a new argument `exprBetween` to allow for doing something after the fail (for example, if an operation fails, e.g., `st_crop`, then run `fixErrors`, then return back to `st_crop` for the retry)
- `Cache` now has MUCH better nested levels detection, with messaging... and control of how deep the Caching goes seems good, via useCache = 2 will only Cache 2 levels in...
- `archive` argument in `prepInputs` family can now be NA ... meaning do not try to unzip even if it is a `.zip` file or other standard archive extension
- `gdb.zip` files (e.g., a file with a .zip extension, but that should not be opened with an unzip-type program) can now be opened with `prepInputs(url = "whateverUrl", archive = NA, fun = "sf::st_read")`
- `fun` argument in `prepInputs` can now be a quoted function call.
- `preProcess` now does a better job with large archives that can't be correctly handled with the default `zip` and `unzip` with R, by trying `system2` calls to possible `7z.exe` or other options on Linux-alikes.
- `Copy` generic no longer has `fileBackedDir` argument. It is now passed through with the `...`. This was creating a bug with some cases where `fileBackedDir` was not being correctly executed.
- `fixErrors()` now better handles `sf` polygons with mixed geometries that include points.
- inadvertent deleting of file-backed rasters in multi-filed stacks during `Cache`
- `writeOutputs.Raster` attempted to change `datatype` of `Raster` class objects using the setReplacement `dataType<-`, without subsequently writing to disk via `writeRaster`. This created bad values in the `Raster*` object. This now performs a `writeRaster` if there is a `datatype` passed to `writeOutputs` e.g., through `prepInputs` or `postProcess`.
- `updateSlotFilename` has many more tests.
- `prepInputs(..., fun = NA)` now is the correct specification for "do not load object into R". This essentially replicates `preProcess` with same arguments.
- several minor bugfixes
- `Copy` did not correctly copy `RasterStack`s when some of the `RasterLayer` objects were in memory, some on disk; `raster::fromDisk` returned `FALSE` in those cases, so `Copy` didn't occur on the file-backed layer files. Using `Filenames` instead to determine if there are any files that need copying.
- Optional (and may be default soon) -- An update to the internal digesting for file-backed Rasters that should be substantially faster, and smaller disk footprint. Set using `options("reproducible.useNewDigestAlgorithm" = 2)`
- changed default of `options("reproducible.polygonShortcut" = FALSE)` as there were still too many edge cases that were not covered.
- fixed an error with rcnst on CRAN
- `RasterStack` objects with a single file (thus acting like a `RasterBrick`) are now handled correctly by `Cache` and `prepInputs` families, especially with new `options("reproducible.useNewDigestAlgorithm" = 2)`, though in tests, it worked with default also
- `RSQLite` now uses a RNG during `dbAppend`; this affected 2 tests (#185).
- typo in date
- minor url fix
- removed several uses of `rgeos`
- moved `paddedFloatToChar` to reproducible from SpaDES.core.
- increased code coverage
- Pull in legacy `%>%` code from `magrittr` to allow the cached alternative, `%C%`. With new `magrittr` pipe now in compiled source code, more of the legacy code is required here.
- several minor
- harmonized message colours that are user adjustable via options: `reproducible.messageColourPrepInputs` for all `prepInputs` functions; `reproducible.messageColourCache` for all `Cache` functions; and `reproducible.messageColourQuestion` for questions that require user input. Defaults are `cyan`, `blue` and `green` respectively. These are user-visible colour changes.
- improved messaging for `Cache` cases where a `file.link` is used instead of saving.
- with improved messaging, now `options(reproducible.verbose = 0)` will turn off almost all messaging.
- `postProcess` and family now have `filename2 = NULL` as the default, so not saved to disk. This is a change.
- `verbose` is now an argument throughout, whose default is `getOption("reproducible.verbose")`, which is set by default to `1`. Thus, individual function calls can be more or less verbose, or the whole session via option.
- `RasterStack` objects were not correctly saved to disk under some conditions in `postProcess` - fixed
- several minor
- `postProcess` now uses a simpler single call to `gdalwarp`, if available, for `RasterLayer` class to accomplish `cropInputs`, `projectInputs`, `maskInputs`, and `writeOutputs` all at once. This should be faster, simpler and, perhaps, more stable. It will only be invoked if the `RasterLayer` is too large to fit into RAM. To force it to be used the user must set `useGDAL = "force"` in `prepInputs` or `postProcess` or globally with `options("reproducible.useGDAL" = "force")`
- `postProcess` when using the new `gdalwarp`, has better persistence of colour table, and NA values as these are kept with better reliability
- concurrent `Cache` now works as expected (e.g., with parallel processing, it will avoid collisions) with SQLite thanks to suggestion here: https://stackoverflow.com/a/44445010
- updated digesting of `Raster` class objects to account for more of the metadata (including the colortable). This will change the digest value of all `Raster` layers, causing re-run of `Cache`
- removed `Require`, `pkgDep`, `trimVersionNumber`, `normPath`, `checkPath` that were moved to `Require` package. For backwards compatibility, these are imported and reexported
- address permanently or temporarily new changes in GDAL>3 and PROJ>6 in the spatial packages.
- new function `file.move` used to rename/copy files across disks (a situation where `file.rename` would fail)
- all `DBI` type functions now have default `cachePath` of `getOption("reproducible.cachePath")`
- `Cache(prepInputs, ...)` on a file-backed `Raster*` class object now gives the non-Cache repository folder as the `filename(returnRaster)`. Previously, the return object would contain the cache repository as the folder for the file-backed `Raster*`
- net reduction in number of packages that are imported from by 14. Removed completely: `backports`, `memoise`, `quickPlot`, `R.utils`, `remotes`, `tools`, and `versions`; moved to Suggests: `fastdigest`, `gdalUtils`, `googledrive`, `httr`, `qs`, `rgdal`, `sf`, `testthat`; added: `Require`. Now there are 12 non-base packages listed in Imports. This is down from 31 prior to Ver 1.0.0.
- fix over-wide tables in PDF manual (#144)
- use `file.link` not `file.symlink` for `saveToCache`. This would have resulted in C Stack overflow errors due to missing original file in the `file.symlink`
- use system call to `unzip` when extracting large (>= 4GB) files (#145, @tati-micheletti)
- several minor including `projectInputs` when converting to longlat projections, `setMinMax` for `gdalwarp` results
- `Filenames` now consistently returns a character vector (#149)
- improvements to file-backed Raster caching to accommodate a few more edge cases
- none
- none
- fix CRAN test failure when `file.link` does not succeed.
- begin to accommodate changes in GDAL/PROJ and associated updates to other spatial packages. More updates are expected as other spatial packages (namely `raster`) are updated.
- can now change `options('reproducible.cacheSaveFormat')` on the fly; cache will look for the file by `cacheId` and write it using `options('reproducible.cacheSaveFormat')`. If it is in another format, Cache will load it and resave it with the new format. Experimental still.
- new `Copy` methods for `refClass` objects, `SQLite` and moved `environment` method into `ANY` as it would be dispatched for unknown classes that inherit from `environment`, of which there are many and this should be intercepted
- `Require` can now handle minimum version numbers, e.g., `Require("bit (>=1.1-15.2)")`; this can be worked into downstream tools. Still experimental.
- Cache will do `file.link` or `file.symlink` if an existing Cache entry with identical output exists and it is large (currently `1e6` bytes); this will save disk space.
- Cache database now has tags for elapsed time of "digest", "original call", and "subsequent recovery from file", `elapsedTimeDigest`, `elapsedTimeFirstRun`, and `elapsedTimeLoad`, respectively.
- Better management of temporary files in package and tests, e.g., during downloading (`preProcess`). Includes 2 new functions, `tempdir2` and `tempfile2` for use with `reproducible` package
- New option: `reproducible.tempPath`, which is used for the new control of temporary files. Defaults to `file.path(tempdir(), "reproducible")`. This feature was requested to help manage large amounts of temporary objects that were not being easily and automatically cleaned
- Copying or moving of Cache directories now works automatically if using default `drv` and `conn`; user may need to manually call `movedCache` if cache is not responding correctly. File-backed Rasters are automatically updated with new paths.
- Cache now treats file-backed Rasters as though they had a relative path instead of their absolute path. This means that Cache directories can be copied from one location to another and the file-backed `Raster*` will have their filenames updated on the fly during a Cache recovery. User doesn't need to do anything.
- `postProcess` now will perform simple tests and skip `cropInputs` and `projectInputs` with a message if it can, rather than using `Cache` to "skip". This should speed up `postProcess` in many cases.
- messaging with `Cache` has changed. Now, `cacheId` is shown in all cases, making it easier to identify specific items in the cache.
- Automatically cleanup temporary (intermediate) raster files (with #110).
- none
- `Copy` only creates a temporary directory for filebacked rasters; previously any `Copy` command was creating a temporary directory, regardless of whether it was needed
- `cropInputs.spatialObjects` had a bug when object was a large non-Raster class.
- `cropInputs` may have failed due to "self intersection" error when x was a `SpatialPolygons*` object; now catches error, runs `fixErrors` and retries `crop`. Great reprex by @tati-micheletti. Fixed in commit `89e652ef111af7de91a17a613c66312c1b848847`.
- `Filenames` bugfix related to `RasterBrick`
- `prepInputs` does a better job of keeping all temporary files in a temporary folder; and cleans up after itself better.
- `prepInputs` now will not show message that it is loading object into R if `fun = NULL` (#135).
- This version is not backwards-compatible out of the box. To maintain backwards compatibility, set: `options("reproducible.useDBI" = FALSE)`
- A new backend was introduced that uses `DBI` package directly, without `archivist`. This has much improved speed.
- New option: `options("reproducible.cacheSaveFormat")`. This can be either `rds` (default) or `qs`. All cached objects will be saved with this format. Previously it was `rda`.
- Cache objects can now be saved with `qs::qsave`. In many cases, this has much improved speed and file sizes compared to `rds`; however, testing across a wide range of conditions will occur before it becomes the default.
- Changed default behaviour for memoising ... because `Cache` is now much faster, the default is to turn memoising off, via `options("reproducible.useMemoise" = FALSE)`. In cases of large objects, memoising should still be faster, so user can still activate it, setting the option to `TRUE`.
- Much better SQLite database handling for concurrent write attempts. Tested with dozens of write attempts per second by 3 cores with abundant locked database occurrences.
- `postProcess` arg `useGDAL` can now take `"force"` as the default behaviour is to not use GDAL if the problem can fit into RAM and `sf` or `raster` tools will be faster than `GDAL` tools
- `useCloud` argument in `Cache` and family has slightly modified functionality (see ?Cache new section `useCloud`) and now has more tests including edge cases, such as `useCloud = TRUE, useCache = 'overwrite'`. The cloud version now will also follow the `"overwrite"` command.
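The three options named in the entries above can be set together for a session. This is a configuration sketch only: the values are the ones mentioned in the entries, the meaning of `reproducible.useDBI` is as described for this release, and `"qs"` additionally assumes the `qs` package is installed.

```r
# Set the options described above, keeping the previous values in `old`
old <- options(
  "reproducible.useDBI" = FALSE,          # backwards-compatible backend, per the entry above
  "reproducible.cacheSaveFormat" = "qs",  # alternative to the default "rds"
  "reproducible.useMemoise" = TRUE        # re-enable memoising, e.g. for large objects
)

# ... calls to Cache() would go here ...

# Restore whatever was set before
options(old)
```

Capturing the return value of `options()` makes it easy to undo the changes once the experiment is over.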
- deprecating `archivist`; moved to Suggests.
- removed imports for `bitops`, `dplyr`, `fasterize`, `flock`, `git2r`, `lubridate`, `RcppArmadillo`, `RCurl` and `tidyselect`. Some of these went to Suggests.
- `postProcess` calls that use GDAL made more robust (including #93).
- Several minor, edge cases were detected and fixed.
- remove `dplyr` as a direct dependency. It is still an indirect dependency through `DiagrammeR`
- new option: `reproducible.showSimilarDepth` allows for a deeper assessment of nested lists for differences between the nearest cached object and the present object. This greater depth may allow more fine tuned understanding of why an object is not correctly caching
- for downloading large files from GoogleDrive (currently only implemented), if user has set `options("reproducible.futurePlan")` to something other than `FALSE`, then it will show download progress if the file is "large".
- Several minor, edge cases were detected and fixed.
- made compatible with
googledrive
v 1.0.0 (#119)
- `pkgDep2`, a new convenience function to get the dependencies of the "first order" dependencies.
- `useCache`, used in many functions (incl. `Cache`, `postProcess`), can now be numeric, a qualitative indicator of "how deep" nested `Cache` calls should set `useCache = TRUE` -- implemented as 1 or 2 in `postProcess` currently. See `?Cache`.
- `pkgDep` was becoming unreliable for unknown reasons. It has been reimplemented, much faster, without memoising. The speed gains should be immediately noticeable (6 seconds to 0.1 seconds for `pkgDep("reproducible")`).
- improved `retry` to use exponential backoff when attempting to access online resources (#121)
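
The idea of retrying with exponential backoff can be sketched in base R as follows. This is not the package's `retry` implementation; `retry_with_backoff`, its arguments, and the `flaky` function are hypothetical names used only for illustration.

```r
# Hypothetical helper: call `fun` until it succeeds or `retries` is exhausted,
# doubling the wait after each failure (exponential backoff).
retry_with_backoff <- function(fun, retries = 5, base_delay = 0.01) {
  for (attempt in seq_len(retries)) {
    result <- try(fun(), silent = TRUE)
    if (!inherits(result, "try-error")) {
      return(result)
    }
    if (attempt < retries) {
      Sys.sleep(base_delay * 2^(attempt - 1))  # 0.01, 0.02, 0.04, ... seconds
    }
  }
  stop("all ", retries, " attempts failed: ",
       conditionMessage(attr(result, "condition")))
}

# A flaky function that fails twice and then succeeds.
calls <- 0
flaky <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("simulated HTTP2 error")
  "downloaded"
}

out <- retry_with_backoff(flaky)
```

Doubling the delay keeps early retries quick while giving a struggling remote resource progressively more time to recover.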
- Cache has 2 new arguments, `useCloud` and `cloudFolderID`. This is a new approach to cloud caching. It has been tested with file-backed `RasterLayer`, `RasterStack` and `RasterBrick` and all normal R objects. It will not work for any other class of disk-backed files, e.g., `ff` or `bigmatrix`, nor is it likely to work for R6 class objects.
- Slowly deprecating `cloudCache` and family of functions in favour of a new approach using arguments to `Cache`, i.e., `useCloud` and `cloudFolderID`
- `downloadData` from Google Drive now protects against HTTP2 error by capturing the error and retrying. This is a curl issue for interrupted connections.
- fixes for `rcnst` errors on R-devel, tested using `devtools::check(env_vars = list("R_COMPILE_PKGS"=1, "R_JIT_STRATEGY"=4, "R_CHECK_CONSTANTS"=5))`
- other minor improvements, including fixes for #115
- new functions for accessing specific items from the `cacheRepo`: `getArtifact`, `getCacheId`, `getUserTags`
- `retry`, a new function, wraps `try` with an explicit attempt to retry the same code upon error. Useful for flaky functions, such as `googledrive::drive_download`, which sometimes fails due to a `curl` HTTP2 error.
- removed all `Rcpp` functionality as the functions were no longer faster than their R base alternatives.
- `prepInputs` was not correctly passing `useCache`
- `cropInputs` was reprojecting extent of y as a time-saving approach, but this was incorrect if `studyArea` is a `SpatialPolygon` that is not close to filling the extent. It now reprojects `studyArea` directly, which will be slower, but correct. (#93)
- other minor improvements
- `CHECKSUMS.txt` should now be ordered consistently across operating systems (note: `base::order` will not succeed in doing this --> now using `.orderDotsUnderscoreFirst`)
- `cloudSyncCache` has a new argument: `cacheIds`. Now user can control entries by `cacheId`, so can delete/upload individual objects by `cacheId`
- Experimental support within the `postProcess` family for `sf` class objects
- mostly minor `cloudCache` bugfixes for more cases
- remove `tibble` from Imports as it's no longer being used
- remove `%>%` pipe that was long ago deprecated. User should use `%C%` if they want a pipe that is Cache-aware. See examples.
- Full rewrite of all `options` descriptions now in `reproducible`, see `?reproducibleOptions`
- now `cacheRepo` and `options("reproducible.cachePath")` can take a vector of paths. Similar to how `.libPaths()` works for libraries, `Cache` will search first in the first entry in the `cacheRepo`, then the second etc. until it finds an entry. It will only write to the first entry.
- new value for the option: `options("reproducible.useCache" = "devMode")`. The point of this mode is to facilitate using the Cache when functions and datasets are continually in flux, and old Cache entries are likely stale very often. In `devMode`, the cache mechanism will work as normal if the Cache call is the first time for a function OR if it successfully finds a copy in the cache based on the normal Cache mechanism. It differs from the normal Cache if the Cache call does not find a copy in the `cacheRepo`, but it does find an entry that matches based on `userTags`. In this case, it will delete the old entry in the `cacheRepo` (identified based on matching `userTags`), then continue with normal `Cache`. For this to work correctly, `userTags` must be unique for each function call. This should be used with caution as it is still experimental.
- change to how hashes are calculated. This will cause existing caches to not work correctly. To allow a user to keep old behaviour (during a transition period), the "old" algorithm can be used, with `options("reproducible.useNewDigestAlgorithm" = FALSE)`. There is a message of this change on package load.
- add experimental `cloud*` functions, especially `cloudCache` which allows sharing of Cache among collaborators. Currently only works with `googledrive`
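
The vector-of-paths behaviour described above (search each path in order, write only to the first) can be illustrated with a small base R sketch. This is not how `Cache` stores entries internally; `lookup_cache`, `save_cache`, and the one-`.rds`-file-per-id layout are assumptions made purely for the example.

```r
# Hypothetical helpers; the file layout (<cacheId>.rds) is an assumption.
lookup_cache <- function(cache_id, cache_paths) {
  for (path in cache_paths) {
    f <- file.path(path, paste0(cache_id, ".rds"))
    if (file.exists(f)) return(readRDS(f))  # first hit wins
  }
  NULL  # not found in any path
}

save_cache <- function(obj, cache_id, cache_paths) {
  # Writes always go to the first path only.
  saveRDS(obj, file.path(cache_paths[1], paste0(cache_id, ".rds")))
}

personal <- file.path(tempdir(), "personalCache")
shared <- file.path(tempdir(), "sharedCache")
dir.create(personal, showWarnings = FALSE)
dir.create(shared, showWarnings = FALSE)
paths <- c(personal, shared)

saveRDS("from shared", file.path(shared, "abc123.rds"))  # pre-existing shared entry
hit <- lookup_cache("abc123", paths)       # found in the second path
save_cache("new result", "def456", paths)  # written to the first path only
```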
- updated `assessDataType` to consolidate `assessDataTypeGDAL` and `assessDataType` into single function (#71, @ianmseddy)
- `cc`: new function -- a shortcut for some commonly used options for `clearCache()`
- added experimental capacity for `prepInputs` to handle `.rar` archives, on systems with correct binaries to deal with them (#86, @tati-micheletti)
- remove `fastdigest::fastdigest` as it does not return the identical hash across operating systems
- `prepInputs` on GIS objects that don't use `raster::raster` to load object were skipping `postProcess`. Fixed.
- under some circumstances, `prepInputs` would cause virtually all entries in `CHECKSUMS.txt` to be deleted. 2 cases where this happened were identified and corrected.
- `data.table` class objects would give an error sometimes due to use of `attr(DT)`. Internally, attributes are now added with `data.table::setattr` to deal with this.
- calling `gdalwarp` from `postProcess` now correctly matches extent (#73, @tati-micheletti)
- files from url that have unknown extension are now guessed by `preProcess` (#92, @tati-micheletti)
- Added `remotes` to Imports and removed `devtools`
- New value possible for `options(reproducible.useCache = 'overwrite')`, which allows use of `Cache` in cases where the function call already has an entry in the `cacheRepo`: it will purge that entry and add the output of the current call instead.
- New option `reproducible.inputPaths` (default `NULL`) and `reproducible.inputPathsRecursive` (default `FALSE`), which will be used in `prepInputs` as possible directory sources (searched recursively or not) for files being downloaded/extracted/prepared. This allows the use of local copies of files in (an)other location(s) instead of downloading them. If the local location does not have the required files, it will proceed to download, so there is little cost in setting this option. If files do exist on the local system, the function will attempt to use a hardlink before making a copy.
- `dlGoogle()` now sets `options(httr_oob_default = TRUE)` if using RStudio Server.
- Files in `CHECKSUMS` now sorted alphabetically.
- `Checksums` can now have a `CHECKSUMS.txt` file located in a different place than the `destinationPath`
- Attempt to select raster resampling method based on raster type if no method supplied (#63, @ianmseddy)
- `projectInputs`
- new function `assessDataTypeGDAL`, used in `postProcess`, to identify smallest `datatype` for large Raster* objects passed to GDAL system call
  - when masking and reprojecting large `Raster` objects, enact `gdalwarp` system call if `raster::canProcessInMemory(x,4) = FALSE` for faster and memory-safe processing
  - better handling of various data types in `Raster` objects, including factor rasters
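
The "hardlink before making a copy" behaviour noted for `reproducible.inputPaths` can be sketched with base R file utilities. The helper name `link_or_copy` is hypothetical and this is only an illustration of the fallback pattern, not the package's own code.

```r
# Hypothetical helper: try a hardlink first, fall back to a full copy.
link_or_copy <- function(from, to) {
  # file.link() returns FALSE (with a warning) when linking is not possible,
  # e.g. across file systems.
  linked <- suppressWarnings(file.link(from, to))
  if (isTRUE(linked)) return("hardlink")
  if (file.copy(from, to)) return("copy")
  stop("could not link or copy ", from)
}

src <- tempfile(fileext = ".txt")
dst <- tempfile(fileext = ".txt")
writeLines("local copy of a data file", src)

how <- link_or_copy(src, dst)
```

A hardlink costs no extra disk space, which matters for large spatial data files; the copy is only the fallback.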
- Work around internally inside `extractFromArchive` for large (>2GB) zip files. According to the `R` help manual, `unzip` fails for zip files >2GB. This uses a system call if the zip file is too large and fails using `utils::unzip`.
- Work around for `raster::getData` issues.
- Speed up of `Cache()` when deeply nested, due to `grep(sys.calls(), ...)` that would take long and hang.
- Bugfix for `preProcess(url = NULL)` (#65, @tati-micheletti)
- Improved memory performance of `clearCache` (#67), especially for large `Raster` objects that are stored as binary `R` files (i.e., `.rda`)
- Other minor bugfixes
- Deal with new `raster` package changes in development version of `raster` package
- Added checks for floating point number issues in raster resolutions produced by `raster::projectRaster`
- `.robustDigest` now does not include `Cache`-added attributes
- Additional tests for `preProcess()` (#68, @tati-micheletti)
- Many new unit tests written, which caught several minor bugs
- fix and skip downloading test on CRAN
- Add `future` to Suggests.
- new option on non-Windows OSs to use `future` for `Cache` saving to SQLite database, via `options("reproducible.futurePlan")`, if the `future` package is installed. This is `FALSE` by default.
- If a `do.call` function is Cached, previously, it would be labelled in the database as `do.call`. Now it attempts to extract the actual function being called by the `do.call`. Messaging is similarly changed.
- new option `reproducible.ask`, logical, indicating whether `clearCache` should ask for deletions when in an interactive session
- `prepInputs`, `preProcess` and `downloadFile` now have `dlFun`, to pass a custom function for downloading (e.g., `"raster::getData"`)
- `prepInputs` will automatically use `readRDS` if the file is a `.rds`.
- `prepInputs` will return a `list` if `fun = "base::load"`, with a message; can still pass an `envir` to obtain standard behaviour of `base::load`.
- `clearCache` - new argument `ask`.
- new function `assessDataType`, used in `postProcess`, to identify smallest `datatype` for Raster* objects, if user does not pass an explicit `datatype` in `prepInputs` or `postProcess` (#39, @CeresBarros).
- fix problems with tests introduced by recent `git2r` update (@stewid, #36).
- `.prepareRasterBackedFile` -- now will append an incremented numeric to a cached copy of a file-backed Raster object, if it already exists. This mirrors the behaviour of the `.rda` file. Previously, if two Cache events returned the same file name backing a Raster object, even if the content was different, it would allow the same file name. If either cached object was deleted, therefore, it would cause the other one to break as its file-backing would be missing.
- options were wrongly pointing to `spades.XXX` and should have been `reproducible.XXX`.
- `copyFile` did not perform correctly under all cases; now better handling of these cases, often sending to `file.copy` (slower, but more reliable)
- `extractFromArchive` needed a new `Checksum` function call under some circumstances
- several other minor bug fixes.
- `extractFromArchive` -- when dealing with nested zips, not all args were passed in recursively (#37, @CeresBarros)
- `prepInputs` -- arguments that were the same as those of `Cache` were not being correctly passed internally to `Cache`, and if wrapped in `Cache`, they were not passed into `prepInputs`. Fixed.
- `.prepareFileBackedRaster` was failing in some cases (specifically if it was inside a `do.call`) (#40, @CeresBarros).
- `Cache` was failing under some cases of `Cache(do.call, ...)`. Fixed.
- `Cache` -- when arguments to Cache were the same as the arguments in `FUN`, Cache would "take" them. Now, they are correctly passed to the `FUN`.
- `preProcess` -- writing to checksums may have produced a warning if `CHECKSUMS.txt` was not present. Now it does not.
- numerous other minor bugfixes
- most tests now use a standardized approach to attaching libraries, creating objects, paths, enabling easier, error-resistant test building
- new functions: `convertPaths` and `convertRasterPaths` to assist with renaming moved files.
- `prepInputs` -- new features
  - `alsoExtract` now has more options (`NULL`, `NA`, `"similar"`) and defaults to extracting all files in an archive (`NULL`).
  - skips `postProcess` altogether if no `studyArea` or `rasterToMatch`. Previously, this would invoke Cache even if there was nothing to `postProcess`.
- `copyFile` correctly handles directory names containing spaces.
- `makeMemoisable` fixed to handle additional edge cases.
- other minor bug fixes.
- new functions:
  - `prepInputs` to aid in data downloading and preparation problems, solved in a reproducible, Cache-aware way.
  - `postProcess` which is a wrapper for sequences of several other new functions (`cropInputs`, `fixErrors`, `projectInputs`, `maskInputs`, `writeOutputs`, and `determineFilename`)
  - `downloadFile` can handle Google Drive and ftp/http(s) files
  - `zipCache` and `mergeCache`
  - `compareNA` does comparisons with NA as a possible value e.g., `compareNA(c(1,NA), c(2, NA))` returns `FALSE, TRUE`
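
The NA-aware comparison described for `compareNA` can be illustrated with a base R sketch. `compare_na_sketch` is a hypothetical stand-in written for this example, not the package's implementation; it treats two `NA` values in the same position as equal.

```r
# Hypothetical stand-in: elementwise equality where NA matches NA.
compare_na_sketch <- function(v1, v2) {
  same <- v1 == v2                      # NA wherever either side is NA
  both_na <- is.na(v1) & is.na(v2)      # TRUE only where both sides are NA
  same[is.na(same)] <- FALSE            # an NA comparison is FALSE by default
  same | both_na
}

res <- compare_na_sketch(c(1, NA), c(2, NA))
res
# [1] FALSE  TRUE
```

By contrast, plain `c(1, NA) == c(2, NA)` gives `FALSE NA`, which is awkward to use inside `if` or `all()`.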
- Cache -- new features:
  - new arguments `showSimilar`, `verbose` which can help with debugging
  - new argument `useCache` which allows turning caching on and off at a high level (e.g., `options("useCache")`)
  - new argument `cacheId` which allows user to hard code a result from a Cache
  - deprecated arguments: `digestPathContent` --> `quick`, `compareRasterFileLength` --> `length`
  - Cache arguments now propagate inward to nested `Cache` function calls, unless explicitly set on the inner functions
  - more precise messages provided upon each use
  - many more `userTags` added automatically to cache entries so much more powerful searching via `showCache(userTags="something")`
- `checksums` now returns a data.table with the same columns whether `write = TRUE` or `write = FALSE`.
- `clearCache` and `showCache` now give messages and require user intervention if a request to `clearCache` would delete large quantities of data
- `memoise::memoise` now used on 3rd run through an identical `Cache` call, dramatically speeding up in most cases
- new options: `reproducible.cachePath`, `reproducible.quick`, `reproducible.useMemoise`, `reproducible.useCache`, `reproducible.useragent`, `reproducible.verbose`
- `asPath` has a new argument indicating how deep the path should be considered when included in caching (only relevant when `quick = TRUE`)
- New vignette on using Cache
- Cache is `parallel`-safe, meaning there are `tryCatch` around every attempt at writing to SQLite database so it can be used safely on multi-threaded machines
- bug fixes, unit tests, more `imports` for packages e.g., `stats`
- updates for R 3.6.0 compact storage of sequence vectors
- experimental pipes (`%>%`, `%C%`) and assign `%<%`
- several performance enhancements
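
The "memoise on the 3rd run" behaviour can be pictured as an in-memory layer sitting in front of a disk cache. The sketch below uses only base R; `cached_sqrt`, the environment `memo_env`, and the `.rds` layout are hypothetical and do not reflect how `Cache` or `memoise::memoise` are implemented.

```r
memo_env <- new.env()                               # in-memory (RAM) layer
disk_dir <- file.path(tempdir(), "memoSketch")      # disk layer
unlink(disk_dir, recursive = TRUE)                  # start from an empty disk cache
dir.create(disk_dir, showWarnings = FALSE)

cached_sqrt <- function(x) {
  key <- paste0("sqrt_", x)
  # 1. Fastest: already held in memory.
  if (exists(key, envir = memo_env, inherits = FALSE)) {
    return(list(value = get(key, envir = memo_env), source = "memory"))
  }
  # 2. Next: on disk; load it and keep a copy in memory for next time.
  f <- file.path(disk_dir, paste0(key, ".rds"))
  if (file.exists(f)) {
    value <- readRDS(f)
    assign(key, value, envir = memo_env)
    return(list(value = value, source = "disk"))
  }
  # 3. Otherwise compute and write to disk.
  value <- sqrt(x)
  saveRDS(value, f)
  list(value = value, source = "computed")
}

r1 <- cached_sqrt(16)  # computed and saved to disk
r2 <- cached_sqrt(16)  # read from disk, then memoised
r3 <- cached_sqrt(16)  # served from memory
```

The trade-off is memory use: every memoised result stays in RAM for the rest of the session.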
- `mergeCache`: a new function to merge two different Cache repositories
- `memoise::memoise` is now used on `loadFromLocalRepo`, meaning that the 3rd time `Cache()` is run on the same arguments (and the 2nd time in a session), the returned Cache will be from a RAM object via memoise. To stop this behaviour and use only disk-based Caching, set `options(reproducible.useMemoise = FALSE)`.
- Cache assign -- `%<%` can be used instead of normal assign, equivalent to `lhs <- Cache(rhs)`.
- new option: `reproducible.verbose`, set to `FALSE` by default, but if set to `TRUE` may help understand caching behaviour, especially for complex highly nested code.
- all options now described in `?reproducible`.
- All Cache arguments other than `FUN` and `...` will now propagate to internal, nested Cache calls, if they are not specified explicitly in each of the inner Cache calls.
- Cached pipe operator `%C%` -- use to begin a pipe sequence, e.g., `Cache() %C% ...`
- Cache arg `sideEffect` can now be a path
- Cache arg `digestPathContent` default changed from `FALSE` (was for speed) to `TRUE` (for content accuracy)
- New function, `searchFull`, which shows the full search path, known alternatively as "scope", or "binding environments". It is where R will search for a function when requested by a user.
- Uses `memoise::memoise` for several functions (`loadFromLocalRepo`, `pkgDep`, `package_dependencies`, `available.packages`) for speed -- this gains speed at the expense of memory.
- New `Require` function
  - attempts to create a lighter weight package reproducibility chain. This function is usable in a reproducible workflow: it includes both installing and loading of packages, it can maintain version numbers, and uses smart caching for speed. In tests, it can evaluate whether 20 packages and their dependencies (~130 packages) are installed and loaded quickly (i.e., if all TRUE, ~0.1 seconds). This is much slower than running `require` on those 20 packages, but `require` does not check for dependencies and deal with them if missing: it just errors. This speed should be fast enough for many purposes.
  - can accept unquoted name, if length 1.
- remove `dplyr` from Imports
- Add `RCurl` to Imports
- change name of `digestRaster` to `.digestRaster`
- fix R CMD check errors on Solaris that were not previously resolved
- fix R CMD check errors on Solaris
- fix bug in `digestRaster` affecting in-memory rasters
- move `rgdal` to Suggests
- cleanup examples and do run them (per CRAN)
- add tests to ensure all exported (non-dot) functions have examples
- A new package, which takes all caching utilities out of the `SpaDES` package.