Releases: ropensci/drake
Flexible triggers
- Overhaul the interface for triggers and add new trigger types ("condition" and "change").
- Offload `drake`'s code examples to this repository and make `drake_example()` and `drake_examples()` download examples from there.
- Optionally show output files in graph visualizations. See the `show_output_files` argument to `vis_drake_graph()` and friends.
- Repair output file checksum operations for distributed backends like `"clustermq_staged"` and `"future_lapply"`.
- Internally refactor the `igraph` attributes of the dependency graph to allow for smarter dependency/memory management during `make()`.
- Enable `vis_drake_graph()` and `sankey_drake_graph()` to save static image files via `webshot`.
- Deprecate `static_drake_graph()` and `render_static_drake_graph()` in favor of `drake_ggraph()` and `render_drake_ggraph()`.
- Add a `columns` argument to `evaluate_plan()` so users can evaluate wildcards in columns other than the `command` column of `plan`.
- Name the arguments of `target()` so users do not have to (explicitly).
- Lay the groundwork for a special pretty print method for workflow plan data frames.
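
As a rough illustration of the new trigger types, the sketch below attaches a "change" trigger and a "condition" trigger to individual targets through `target()`. It assumes a `drake` version from around this release or later is installed; `latest_version()` is a hypothetical helper invented for the example and is not part of `drake`.

```r
library(drake)

# Hypothetical helper: stands in for any cheap check of an external resource.
latest_version <- function() {
  "v1"
}

plan <- drake_plan(
  # "change" trigger: rebuild when the value of the expression differs
  # from the value recorded during the previous make().
  data = target(
    command = head(mtcars),
    trigger = trigger(change = latest_version())
  ),
  # "condition" trigger: rebuild whenever the expression evaluates to TRUE.
  stamp = target(
    command = Sys.time(),
    trigger = trigger(condition = TRUE)
  )
)

print(plan)
```

Passing a plan like this to `make()` should apply each trigger only to its own target; a trigger supplied directly to `make()` would presumably act as the default for targets without one.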
Node clusters, Sankey diagrams, and clustermq_staged parallelism
- Allow multiple output files per command.
- Add Sankey diagram visuals: `sankey_drake_graph()` and `render_sankey_drake_graph()`.
- Add `static_drake_graph()` and `render_static_drake_graph()` for `ggplot2`/`ggraph` static graph visualizations.
- Add `group` and `clusters` arguments to `vis_drake_graph()`, `static_drake_graph()`, and `drake_graph_info()` to optionally condense nodes into clusters.
- Implement a `trace` argument to `evaluate_plan()` to optionally add indicator columns to show which targets got expanded/evaluated with which wildcard values.
- Rename the `always_rename` argument to `rename` in `evaluate_plan()`.
- Add a `rename` argument to `expand_plan()`.
- Implement `make(parallelism = "clustermq_staged")`, a `clustermq`-based staged parallelism backend (see #452).
- Implement `make(parallelism = "future_lapply_staged")`, a `future`-based staged parallelism backend (see #450).
- Depend on `codetools` rather than `CodeDepends` for finding global variables.
- Detect `loadd()` and `readd()` dependencies in `knitr` reports referenced with `knitr_in()` inside imported functions. Previously, this feature was only available in explicit `knitr_in()` calls in commands.
- Skip more tests on CRAN. White-list tests instead of blacklisting them in order to try to keep check time under the official 10-minute cap.
- Disallow wildcard names to grep-match other wildcard names or any replacement values. This will prevent careless mistakes and confusion when generating `drake_plan()`s.
- Prevent persistent workers from hanging when a target fails.
- Move the example template files to https://github.com/ropensci/drake/tree/master/inst/hpc_template_files.
- Deprecate `drake_batchtools_tmpl_file()` in favor of `drake_hpc_template_file()` and `drake_hpc_template_files()`.
- Add a `garbage_collection` argument to `make()`. If `TRUE`, `gc()` is called after every new build of a target.
- Remove redundant calls to `sanitize_plan()` in `make()`.
- Change `tracked()` to accept only a `drake_config()` object as an argument. Yes, it is technically a breaking change, but it is only a small break, and it is the correct API choice.
- Move visualization and hpc package dependencies to "Suggests:" rather than "Imports:" in the `DESCRIPTION` file.
- Allow processing of codeless `knitr` reports without warnings.
Intermediate mini-release
This release comes right before an implementation of #283. Changelog:
- Add Sankey diagram visuals: `sankey_drake_graph()` and `render_sankey_drake_graph()`.
- Add `static_drake_graph()` and `render_static_drake_graph()` for `ggplot2`/`ggraph` static graph visualizations.
- Add `group` and `clusters` arguments to `vis_drake_graph()`, `static_drake_graph()`, and `drake_graph_info()` to optionally condense nodes into clusters.
- Implement a `trace` argument to `evaluate_plan()` to optionally add indicator columns to show which targets got expanded/evaluated with which wildcard values.
- Rename the `always_rename` argument to `rename` in `evaluate_plan()`.
- Add a `rename` argument to `expand_plan()`.
- Implement `make(parallelism = "clustermq_staged")`, a `clustermq`-based staged parallelism backend (see #452).
- Implement `make(parallelism = "future_lapply_staged")`, a `future`-based staged parallelism backend (see #450).
- Depend on `codetools` rather than `CodeDepends` for finding global variables.
- Detect `loadd()` and `readd()` dependencies in `knitr` reports referenced with `knitr_in()` inside imported functions. Previously, this feature was only available in explicit `knitr_in()` calls in commands.
- Skip more tests on CRAN. White-list tests instead of blacklisting them in order to try to keep check time under the official 10-minute cap.
- Disallow wildcard names to grep-match other wildcard names or any replacement values. This will prevent careless mistakes and confusion when generating `drake_plan()`s.
- Prevent persistent workers from hanging when a target fails.
- Move the example template files to https://github.com/ropensci/drake/tree/master/inst/hpc_template_files.
- Deprecate `drake_batchtools_tmpl_file()` in favor of `drake_hpc_template_file()` and `drake_hpc_template_files()`.
- Add a `garbage_collection` argument to `make()`. If `TRUE`, `gc()` is called after every new build of a target.
- Remove redundant calls to `sanitize_plan()` in `make()`.
- Change `tracked()` to accept only a `drake_config()` object as an argument. Yes, it is technically a breaking change, but it is only a small break, and it is the correct API choice.
Parallel computing improvements
- Sequester staged parallelism in backends "mclapply_staged" and "parLapply_staged". For the other `lapply`-like backends, `drake` uses persistent workers and a master process. In the case of `"future_lapply"` parallelism, the master process is a separate background process called by `Rscript`.
- Remove the appearance of staged parallelism from single-job `make()`'s. (Previously, there were "check" messages and a call to `staged_parallelism()`.)
- Remove uncontained remnants of staged parallelism internals.
- Allow different parallel backends for imports vs targets. For example, `make(parallelism = c(imports = "mclapply_staged", targets = "mclapply"))`.
- Fix a bug in environment pruning. Previously, dependencies of downstream targets were being dropped from memory in `make(jobs = 1)`. Now, they are kept in memory until no downstream target needs them (for `make(jobs = 1)`).
- Improve `predict_runtime()`. It is a more sensible way to go about predicting runtimes with multiple jobs. Likely to be more accurate.
- Calls to `make()` no longer leave targets in the user's environment.
- Attempt to fix a Solaris CRAN check error. The test at https://github.com/ropensci/drake/blob/b4dbddb840d2549621b76bcaa46c344b0fd2eccc/tests/testthat/test-edge-cases.R#L3 was previously failing on CRAN's Solaris machine (R 3.5.0). In the test, one of the threads deliberately quits in error, and the R/Solaris installation did not handle this properly. The test should work now because it no longer uses any parallelism.
- Deprecate the `imports_only` argument to `make()` and `drake_config()` in favor of `skip_targets`.
- Deprecate `migrate_drake_project()`.
- Deprecate `max_useful_jobs()`.
- For non-distributed parallel backends, stop waiting for all the imports to finish before the targets begin.
- Add an `upstream_only` argument to `failed()` so users can list failed targets that do not have any failed dependencies. Naturally accompanies `make(keep_going = TRUE)`.
- Add an RStudio R Markdown template compatible with https://krlmlr.github.io/drake-pitch/.
- Remove `plyr` as a dependency.
- Handle duplicated targets better in `drake_plan()` and `bind_plans()`.
- Add a true function `target()` to help create drake plans with custom columns.
- In `drake_gc()`, clean out disruptive files in `storr`s with mangled keys (re: #198).
- Move all the vignettes to the up and coming user manual: https://ropenscilabs.github.io/drake-manual/
- Rename the "basic example" to the "mtcars example".
- Deprecate `load_basic_example()` in favor of `load_mtcars_example()`.
- Refocus the `README.md` file on the main example rather than the mtcars example.
- Use a `README.Rmd` file to generate `README.md`.
- Add function `deps_targets()`.
- Deprecate function `deps()` in favor of `deps_code()`.
- Add a `pruning_strategy` argument to `make()` and `drake_config()` so the user can decide how `drake` keeps non-import dependencies in memory when it builds a target.
- Add optional custom (experimental) "workers" and "priorities" columns to the `drake` plans to help users customize scheduling.
- Add a `makefile_path` argument to `make()` and `drake_config()` to avoid potential conflicts between user-side custom `Makefile`s and the one written by `make(parallelism = "Makefile")`.
- Document batch mode for long workflows in the HPC guide.
- Add a `console` argument to `make()` and `drake_config()` so users can redirect console output to a file.
- Make it easier for the user to find out where a target in the cache came from: `show_source()`, `readd(show_source = TRUE)`, `loadd(show_source = TRUE)`.
Intermediate development release
CRAN hotfix
- In R 3.5.0, the `!!` operator from tidyeval and `rlang` is parsed differently than in R <= 3.4.4. This change broke one of the tests in `tests/testthat/tidy-eval.R`. The main purpose of `drake`'s 5.1.2 release is to fix the broken test.
- Fix an elusive `R CMD check` error from building the pdf manual with LaTeX.
- In `drake_plan()`, allow users to customize target-level columns using `target()` inside the commands.
- Add a new `bind_plans()` function to concatenate the rows of drake plans and then sanitize the aggregate plan.
- Add an optional `session` argument to tell `make()` to build targets in a separate, isolated master R session. For example, `make(session = callr::r_vanilla)`.
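
The `target()` and `bind_plans()` entries above can be sketched together as follows. This is a minimal illustration that assumes an installed `drake` from this release or later; the target names are made up, and the `elapsed` column is used only as an example of a target-level setting (a per-target timeout in seconds).

```r
library(drake)

# A plan with default columns only.
plan_a <- drake_plan(
  small = head(mtcars)
)

# A plan whose target carries a custom target-level column via target().
plan_b <- drake_plan(
  large = target(
    command = rbind(mtcars, mtcars),
    elapsed = 60
  )
)

# Concatenate the rows of both plans into one sanitized plan.
plan <- bind_plans(plan_a, plan_b)
print(plan)
```

Because only one of the two plans defines `elapsed`, the combined plan should hold a missing value in that column for the other target.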
Minor release: new file API, tidyselect, and internal fixes
Version 5.1.0
- Add a `reduce_plan()` function to do pairwise reductions on collections of targets.
- Forcibly exclude the dot (`.`) from being a dependency of any target or import. This enforces more consistent behavior in the face of the current static code analysis functionality, which sometimes detects `.` and sometimes does not.
- Use `ignore()` to optionally ignore pieces of workflow plan commands and/or imported functions. Use `ignore(some_code)` to
  - Force `drake` to not track dependencies in `some_code`, and
  - Ignore any changes in `some_code` when it comes to deciding which targets are out of date.
- Force `drake` to only look for imports in environments inheriting from `envir` in `make()` (plus explicitly namespaced functions).
- Force `loadd()` to ignore foreign imports (imports not explicitly found in `envir` when `make()` last imported them).
- Reduce default verbosity. Only targets are printed out by default. Verbosity levels are integers ranging from 0 through 4.
- Change `loadd()` so that only targets (not imports) are loaded if the `...` and `list` arguments are empty.
- Add a check to `drake_plan()` for duplicate targets.
- Add a `.gitignore` file containing `"*"` to the default `.drake/` cache folder every time `new_cache()` is called. This means the cache will not be automatically committed to git. Users need to remove the `.gitignore` file to allow unforced commits, and then subsequent `make()`s on the same cache will respect the user's wishes and not add another `.gitignore`. This only works for the default cache. Not supported for manual `storr`s.
- Add a new experimental `"future"` backend with a manual scheduler.
- Implement `dplyr`-style `tidyselect` functionality in `loadd()`, `clean()`, and `build_times()`. For `build_times()`, there is an API change: for `tidyselect` to work, we needed to insert a new `...` argument as the first argument of `build_times()`.
- Deprecate the single-quoting API for files. Users should now use formal API functions in their commands:
  - `file_in()` for file inputs to commands or imported functions (for imported functions, the input file needs to be an imported file, not a target).
  - `file_out()` for output file targets (ignored if used in imported functions).
  - `knitr_in()` for `knitr`/`rmarkdown` reports. This tells `drake` to look inside the source file for target dependencies in code chunks (explicitly referenced with `loadd()` and `readd()`). Treated as a `file_in()` if used in imported functions.
- Change `drake_plan()` so that it automatically fills in any target names that the user does not supply. Also, any `file_out()`s become the target names automatically (double-quoted internally).
- Make `read_drake_plan()` (rather than an empty `drake_plan()`) the default `plan` argument in all functions that accept a `plan`.
- Add support for active bindings: `loadd(..., lazy = "bind")`. That way, when you have a target loaded in one R session and hit `make()` in another R session, the target in your first session will automatically update.
- Use tibbles for workflow plan data frames and the output of `dataframes_graph()`.
- Return warnings, errors, and other context of each build, all wrapped up with the usual metadata. `diagnose()` will take on the role of returning this metadata.
- Deprecate the `read_drake_meta()` function in favor of `diagnose()`.
- Add a new `expose_imports()` function to optionally force `drake` to detect deeply nested functions inside specific packages.
- Move the "quickstart.Rmd" vignette to "example-basic.Rmd". The so-called "quickstart" didn't end up being very quick, and it was all about the basic example anyway.
- Move `drake_build()` to be an exclusively user-side function.
- Add a `replace` argument to `loadd()` so that objects already in the user's environment need not be replaced.
- When the graph is cyclic, print out all the cycles.
- Prune self-referential loops (and duplicate edges) from the workflow graph. That way, recursive functions are allowed.
- Add a `seed` argument to `make()`, `drake_config()`, and `load_basic_example()`. Also hard-code a default seed of `0`. That way, the pseudo-randomness in projects should be reproducible across R sessions.
- Cache the pseudo-random seed at the time the project is created and use that seed to build targets until the cache is destroyed.
- Add a new `drake_read_seed()` function to read the seed from the cache. Its examples illustrate what `drake` is doing to try to ensure reproducible random numbers.
- Evaluate the quasiquotation operator `!!` for the `...` argument to `drake_plan()`. Suppress this behavior using `tidy_evaluation = FALSE` or by passing in commands through the `list` argument.
- Preprocess workflow plan commands with `rlang::expr()` before evaluating them. That means you can use the quasiquotation operator `!!` in your commands, and `make()` will evaluate them according to the tidy evaluation paradigm.
- Restructure `drake_example("basic")`, `drake_example("gsp")`, and `drake_example("packages")` to demonstrate how to set up the files for serious `drake` projects. More guidance was needed in light of this issue.
- Improve the examples of `drake_plan()` in the help file (`?drake_plan`).
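
To make the new file API a little more concrete, here is a minimal sketch of a plan that uses `file_in()`, `file_out()`, and `knitr_in()` in place of single-quoted file names. It assumes an installed `drake` from this release or later, and every file name is a hypothetical placeholder; the files only need to exist once the plan is handed to `make()`.

```r
library(drake)

# All file names below are hypothetical placeholders.
plan <- drake_plan(
  # file_in() marks an input file whose changes should invalidate the target.
  raw = read.csv(file_in("raw_data.csv")),
  # file_out() marks a file that the command writes.
  cleaned = write.csv(na.omit(raw), file_out("cleaned.csv")),
  # knitr_in() marks a report whose code chunks are scanned for
  # loadd()/readd() dependencies.
  report = rmarkdown::render(
    knitr_in("report.Rmd"),
    output_file = file_out("report.html"),
    quiet = TRUE
  )
)

print(plan)
```

How the targets are named in the printed plan may differ between versions, since this release derives target names from `file_out()` calls.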
Version 5.0.0
- Transfer `drake` to rOpenSci: https://github.com/ropensci/drake
- Several functions now require an explicit `config` argument, which you can get from `drake_config()` or `make()`. Examples:
  - `outdated()`
  - `missed()`
  - `rate_limiting_times()`
  - `predict_runtime()`
  - `vis_drake_graph()`
  - `dataframes_graph()`
- Always process all the imports before building any targets. This is part of the solution to #168: if imports and targets are processed together, the full power of parallelism is taken away from the targets. Also, the way parallelism happens is now consistent for all parallel backends.
- Major speed improvement: dispense with internal inventories and rely on `cache$exists()` instead.
- Let the user define a trigger for each target to customize when `make()` decides to build targets.
- Document triggers and other debugging/testing tools in the new debug vignette.
- Restructure the internals of the `storr` cache in a way that is not back-compatible with projects from versions 4.4.0 and earlier. The main change is to make more intelligent use of `storr` namespaces, improving efficiency (both time and storage) and opening up possibilities for new features. If you attempt to run drake >= 5.0.0 on a project from drake <= 4.0.0, drake will stop you before any damage to the cache is done, and you will be instructed how to migrate your project to the new drake.
- Use `formatR::tidy_source()` instead of `parse()` in `tidy_command()` (originally `tidy()` in `R/dependencies.R`). Previously, `drake` was having problems with an edge case: as a command, the literal string `"A"` was interpreted as the symbol `A` after tidying. With `tidy_source()`, literal quoted strings stay literal quoted strings in commands. This may put some targets out of date in old projects, yet another loss of back compatibility in version 5.0.0.
- Speed up `clean()` by refactoring the cache inventory and using light parallelism.
- Implement `rescue_cache()`, exposed to the user and used in `clean()`. This function removes dangling orphaned files in the cache so that a broken cache can be cleaned and used in the usual ways once more.
- Change the default `cpu` and `elapsed` arguments of `make()` to `NULL`. This solves an elusive bug in how drake imposes timeouts.
- Allow users to set target-level timeouts (overall, cpu, and elapsed) with columns in the workflow plan data frame.
- Document timeouts and retries in the new debug vignette.
- Add a new `graph` argument to functions `make()`, `outdated()`, and `missed()`.
- Export a new `prune_graph()` function for igraph objects.
- Delete long-deprecated functions `prune()` and `status()`.
- Deprecate and rename functions:
  - `analyses()` => `plan_analyses()`
  - `as_file()` => `as_drake_filename()`
  - `backend()` => `future::plan()`
  - `build_graph()` => `build_drake_graph()`
  - `check()` => `check_plan()`
  - `config()` => `drake_config()`
  - `evaluate()` => `evaluate_plan()`
  - `example_drake()` => `drake_example()`
  - `examples_drake()` => `drake_examples()`
  - `expand()` => `expand_plan()`
  - `gather()` => `gather_plan()`
  - `plan()`, `workflow()`, `workplan()` => `drake_plan()`
  - `plot_graph()` => `vis_drake_graph()`
  - `read_config()` => `read_drake_config()`
  - `read_graph()` => `read_drake_graph()`
  - `read_plan()` => `read_drake_plan()`
  - `render_graph()` => `render_drake_graph()`
  - `session()` => `drake_session()`
  - `summaries()` => `plan_summaries()`
- Disallow `output` and `code` as names in the workflow plan data frame. Use `target` and `command` instead. This naming switch has been formally deprecated for several months prior.
- Deprecate the ..analysis.. and ..dataset.. wildcards in favor of analysis__ and dataset__, respectively. The new wildcards are stylistically better and pass linting checks.
- Add new functions `drake_quotes()`, `drake_unquote()`, and `drake_strings()` to remove the silly dependence on the `eply` package.
- Add a `skip_safety_checks` flag to `make()` and `drake_config()`. Increases speed.
- In `sanitize_plan()`, remove rows with blank targets "".
- Add a `purge` argument to `clean()` to optionally remove all target-level information.
- Add a `namespace` argument to `cached()` so users can inspect individual `storr` namespaces.
- Change `verbose` to numeric: 0 = print nothing, 1 = print progress on imports only, 2 = print everyth...
Intermediate release: last version of drake that does not support tidy evaluation
v5.0.1.9000 Fine tune some prose
First release under rOpenSci
TL;DR: this is the first release in which `drake` is part of rOpenSci. Relative to 4.4.0, this release has major changes to cache internals, user-level function names, and documentation.
- Transfer `drake` to rOpenSci: https://github.com/ropensci/drake
- Several functions now require an explicit `config` argument, which you can get from `drake_config()` or `make()`. Examples:
  - `outdated()`
  - `missed()`
  - `rate_limiting_times()`
  - `predict_runtime()`
  - `vis_drake_graph()`
  - `dataframes_graph()`
- Always process all the imports before building any targets. This is part of the solution to #168: if imports and targets are processed together, the full power of parallelism is taken away from the targets. Also, the way parallelism happens is now consistent for all parallel backends.
- Major speed improvement: dispense with internal inventories and rely on `cache$exists()` instead.
- Let the user define a trigger for each target to customize when `make()` decides to build targets.
- Document triggers and other debugging/testing tools in the new debug vignette.
- Restructure the internals of the `storr` cache in a way that is not back-compatible with projects from versions 4.4.0 and earlier. The main change is to make more intelligent use of `storr` namespaces, improving efficiency (both time and storage) and opening up possibilities for new features. If you attempt to run drake >= 5.0.0 on a project from drake <= 4.0.0, drake will stop you before any damage to the cache is done, and you will be instructed how to migrate your project to the new drake.
- Use `formatR::tidy_source()` instead of `parse()` in `tidy_command()` (originally `tidy()` in `R/dependencies.R`). Previously, `drake` was having problems with an edge case: as a command, the literal string `"A"` was interpreted as the symbol `A` after tidying. With `tidy_source()`, literal quoted strings stay literal quoted strings in commands. This may put some targets out of date in old projects, yet another loss of back compatibility in version 5.0.0.
- Speed up `clean()` by refactoring the cache inventory and using light parallelism.
- Implement `rescue_cache()`, exposed to the user and used in `clean()`. This function removes dangling orphaned files in the cache so that a broken cache can be cleaned and used in the usual ways once more.
- Change the default `cpu` and `elapsed` arguments of `make()` to `NULL`. This solves an elusive bug in how drake imposes timeouts.
- Allow users to set target-level timeouts (overall, cpu, and elapsed) with columns in the workflow plan data frame.
- Document timeouts and retries in the new debug vignette.
- Add a new `graph` argument to functions `make()`, `outdated()`, and `missed()`.
- Export a new `prune_graph()` function for igraph objects.
- Delete long-deprecated functions `prune()` and `status()`.
- Deprecate and rename functions:
  - `analyses()` => `plan_analyses()`
  - `as_file()` => `as_drake_filename()`
  - `backend()` => `future::plan()`
  - `build_graph()` => `build_drake_graph()`
  - `check()` => `check_plan()`
  - `config()` => `drake_config()`
  - `evaluate()` => `evaluate_plan()`
  - `example_drake()` => `drake_example()`
  - `examples_drake()` => `drake_examples()`
  - `expand()` => `expand_plan()`
  - `gather()` => `gather_plan()`
  - `plan()`, `workflow()`, `workplan()` => `drake_plan()`
  - `plot_graph()` => `vis_drake_graph()`
  - `read_config()` => `read_drake_config()`
  - `read_graph()` => `read_drake_graph()`
  - `read_plan()` => `read_drake_plan()`
  - `render_graph()` => `render_drake_graph()`
  - `session()` => `drake_session()`
  - `summaries()` => `plan_summaries()`
- Disallow `output` and `code` as names in the workflow plan data frame. Use `target` and `command` instead. This naming switch has been formally deprecated for several months prior.
- Deprecate the ..analysis.. and ..dataset.. wildcards in favor of analysis__ and dataset__, respectively. The new wildcards are stylistically better and pass linting checks.
- Add new functions `drake_quotes()`, `drake_unquote()`, and `drake_strings()` to remove the silly dependence on the `eply` package.
- Add a `skip_safety_checks` flag to `make()` and `drake_config()`. Increases speed.
- In `sanitize_plan()`, remove rows with blank targets "".
- Add a `purge` argument to `clean()` to optionally remove all target-level information.
- Add a `namespace` argument to `cached()` so users can inspect individual `storr` namespaces.
- Change `verbose` to numeric: 0 = print nothing, 1 = print progress on imports only, 2 = print everything.
- Add a new `next_stage()` function to report the targets to be made in the next parallelizable stage.
- Add a new `session_info` argument to `make()`. Apparently, `sessionInfo()` is a bottleneck for small `make()`s, so there is now an option to suppress it. This is mostly for the sake of speeding up unit tests.
- Add a new `log_progress` argument to `make()` to suppress progress logging. This increases storage efficiency and speeds some projects up a tiny bit.
- Add an optional `namespace` argument to `loadd()` and `readd()`. You can now load and read from non-default `storr` namespaces.
- Add `drake_cache_log()`, `drake_cache_log_file()`, and `make(..., cache_log_file = TRUE)` as options to track changes to targets/imports in the drake cache.
- Detect knitr code chunk dependencies in response to commands with `rmarkdown::render()`, not just `knit()`.
- Add a new general best practices vignette to clear up misconceptions about how to use `drake` properly.
Another intermediate release before version 5
Version 4.4.1.9002 is not back compatible with version 4.4.1.9001 because the cache internals were refactored again to solve #154. Anyone relying on the development version for current projects may need to use `packrat` with this release of 4.4.1.9001 to avoid having to rerun projects from scratch.