diff --git a/caching.qmd b/caching.qmd index 3c25c2e..31c6bc9 100644 --- a/caching.qmd +++ b/caching.qmd @@ -3,16 +3,29 @@ title: "Caching" bibliography: references.bib --- -Page goals: +This page will discuss the concept of caching and how it is used in {renv}. -- briefly describe caching and the fact that there are shared libraries -- discuss how {renv} uses caching - - shared {renv} library for packages on local system - - installation of packages, and checks for packages first in the shared library - - "symlinks" to the shared library in `renv/library` + + + + + + + +# What is Caching? + +In the context of software development, caching is used to store data that is frequently accessed, such as packages, to speed up the execution of a program. When a program needs to access a package, it first checks the cache to see if the package is already stored there. If the package is found in the cache, the program can retrieve it quickly without having to download it again. This can significantly reduce the time it takes to run a program, especially if the package is large or if the program is run frequently. + +# Caching in {renv} + +In the context of {renv}, the package cache is a shared library that contains the packages used in your projects. The cache will, when needed, contain multiple different versions of the same package and your project will link to the correct version, only downloading the version specified in the `renv.lock` if you don't already have it somewhere in the renv cache. This shared library is a huge space saver, especially if you have many projects using the same packages. + +A separate cache is built for each minor version of R you use. For example, if you have used {renv} with R versions 4.3 and 4.4 on your computer, then you will end up with one cache for each of these R versions. This can be surprising if you are not aware of the caching behavior. A patch upgrade, e.g. from R 4.3.2 to R 4.3.3, will not create a new cache. 
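To check which cache the current R session would map to, a minimal sketch follows. The helper name `r_minor_label()` is hypothetical, and the `R-<major>.<minor>` folder naming is assumed from the folder listing in the Cache Locations section rather than from a documented guarantee. The `renv::paths$cache()` call only runs if {renv} is installed.

```r
# Hypothetical helper: build an "R-<major>.<minor>" label for an R version.
# Defaults use the running session; R.version$minor looks like "3.2".
r_minor_label <- function(major = R.version$major, minor = R.version$minor) {
  # Keep only the part of `minor` before the first dot (drop the patch level)
  paste0("R-", major, ".", sub("\\..*$", "", minor))
}

# Label for the R version running this session
r_minor_label()

# With {renv} installed, print the cache path renv reports for this session
if (requireNamespace("renv", quietly = TRUE)) {
  print(renv::paths$cache())
}
```

Two sessions that return the same label from `r_minor_label()` would be expected to share a cache; sessions with different labels would not.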
+ +## Cache Locations + +The {renv} caches for R packages will be in one of the following locations, based on your operating system: -> You can find the location of the current cache with `renv::paths$cache()`. By default, it will be in one of the following folders: -> > - Linux: `~/.cache/R/renv/cache` > > - macOS: `~/Library/Caches/org.R-project.R/R/renv/cache` @@ -21,9 +34,22 @@ Page goals: [@posit2024] -## What is Caching? +Within each of these cache folders, you should see subfolders for each version of R that you have used with {renv}. For example, on a macOS system, you might see the following folders: + +``` r +list.files("~/Library/Caches/org.R-project.R/R/renv/cache/v5", full.names = TRUE) + +#> [1] "~/Library/Caches/org.R-project.R/R/renv/cache/v5/macos" +#> [2] "~/Library/Caches/org.R-project.R/R/renv/cache/v5/R-4.2" +#> [3] "~/Library/Caches/org.R-project.R/R/renv/cache/v5/R-4.3" +#> [4] "~/Library/Caches/org.R-project.R/R/renv/cache/v5/R-4.4" +``` + +You should not need to interact with the cache directly, but it can be useful to know where it is located, and particularly that there can be *multiple caches*, in case you need to troubleshoot any issues with {renv}. + +## Project Library -## Caching in {renv} +When you install a package in a project with {renv}, the package is installed in the shared library, and a "symlink" is created in the `renv/library` folder of your project. This symlink points to the package in the shared library, so the package is not duplicated in your project. This helps to save space on your local machine and ensures that the package is only downloaded once, even if it is used in multiple projects. 
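One way to see these links for yourself is sketched below. The helper `library_links()` is hypothetical; it relies only on base R's `list.files()` and `Sys.readlink()`, which returns an empty string for paths that are not symbolic links. Results may differ on Windows, where links are handled differently.

```r
# Hypothetical helper: list the entries of a library folder and show where
# each one points. `target` is "" for entries that are not symlinks.
library_links <- function(lib) {
  pkgs <- list.files(lib, full.names = TRUE)
  data.frame(
    package = basename(pkgs),
    target = Sys.readlink(pkgs),
    stringsAsFactors = FALSE
  )
}

# Usage inside an renv project (assumes {renv} is installed and active):
# library_links(renv::paths$library())
```

In a project that uses the cache, the `target` column would be expected to show paths inside the cache folder rather than empty strings.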
### Symlinks diff --git a/ex_init_snapshot.qmd b/ex_init_snapshot.qmd index 0d02581..b40c9da 100644 --- a/ex_init_snapshot.qmd +++ b/ex_init_snapshot.qmd @@ -2,7 +2,7 @@ title: "Initialize and Snapshot" --- -```{r} +```{r, echo=FALSE} library(magrittr) ``` diff --git a/intro_dependencies.qmd b/intro_dependencies.qmd index 0be8e7e..f317dcc 100644 --- a/intro_dependencies.qmd +++ b/intro_dependencies.qmd @@ -5,19 +5,19 @@ title: "Software Dependencies" ## Dependencies Overview -The general concept of software dependencies is relatively straightforward: "dependencies" are other softwares/programs that the software you're using or developing *depends* on to function. For example, if you are developing an R package, you will need R installed on your machine, or if you download an R package that uses functions from the `dplyr` package, `dplyr` is a library dependency that must be downloaded too. +The general concept of software dependencies is relatively straightforward: "dependencies" are other software or programs that the software you're using or developing *depends* on to function. For example, if you are developing an R package, you will need R installed on your machine, and if the R code you are using includes functions from the `dplyr` package, then `dplyr` must be installed first. -There are many layers of dependencies that can exist in a project, and these dependencies can be difficult to manage. Extending the previous example, one must keep in mind that `dplyr` itself has dependencies, such as `tibble`, `rlang`, and `vctrs`, which must also be downloaded. And `tibble`, `rlang`, and `vctrs` have dependencies too, and so on. This process works recursively until all unique packages are identified, and a project that appears to only use e.g. 5-10 libraries can ultimately require a few dozen or hundred packages. 
This is not an R-specific problem, bur rather a common issue in software development known as "dependency hell."[^left-pad] +There are many layers of dependencies that can exist in a project, and these dependencies can be difficult to manage. Extending the previous example, one must keep in mind that `dplyr` itself has dependencies, such as `tibble`, `rlang`, and `vctrs`, which must also be downloaded. And `tibble`, `rlang`, and `vctrs` have dependencies too, and so on. This process works recursively until all unique packages are identified, and a project that appears to only use e.g. 5-10 libraries can ultimately require a few dozen or hundred packages. This is not an R-specific problem, but rather a common issue in software development known as "dependency hell."[^left-pad] [^left-pad]: If you have enough time, read about how the [left-pad incident](https://qz.com/646467/how-one-programmer-broke-the-internet-by-deleting-a-tiny-piece-of-code) broke the internet as an example of how dependencies can go wrong. -If you're ever curious about the dependencies of a package, there are a variety of ways you can find this information via the package's documentation: +If you're ever curious about the dependencies of a package, they should all be declared in the `DESCRIPTION` file of an R package. 
There are a variety of ways you can access this information: -* Reading the `DESCRIPTION` file of an R package, which can be accessed in a variety of ways +* Methods for reading the `DESCRIPTION` file of an R package: - `utils::packageDescription()` to print the complete `DESCRIPTION` file of a package to the console - `tools::package_dependencies()` for just a list of dependencies from the `DESCRIPTION` file - Looking at the `DESCRIPTION` file online on the package's CRAN or GitHub page -* `pak::pkg_deps_tree()` for a visual representation of the direct dependencies **and** the recursive dependencies +* `pak::pkg_deps_tree()` for a visual representation of the direct dependencies **and** the recursive dependencies. An example of this will be shown in the [Starting Details](starting_details.qmd) chapter. ## Intro to Management Practices diff --git a/restoring_a_project.qmd b/restoring_a_project.qmd index 61b8345..fc606a5 100644 --- a/restoring_a_project.qmd +++ b/restoring_a_project.qmd @@ -8,7 +8,9 @@ We will cover restoring a project from a GitHub repository, restoring a project # From a GitHub Repo -Restoring a project library from a GitHub repository is ideally the most straightforward way to restore a project. The steps are as follows: +Restoring a project library from a GitHub repository is ideally the most straightforward way to restore a project. A repo *should* always have all of the required files for restoring a project including the `renv.lock` file, the `renv` folder, the `*.Rproj` file, and the R scripts. Note that repos generally do not (and should not) contain the `renv/library` folder--this is intended behavior. Recall from the caching chapter that the `renv/library` contains "symlinks" to the actual package files on your machine, and these symlinks are not portable across machines. Moreover, the `renv/library` folder can be quite large and is not necessary for restoring a project. 
+ +The steps for restoring a project from a GitHub repository are as follows: - Clone the repository to your local machine with `git clone` - Open the `*.Rproj` file @@ -18,22 +20,27 @@ There should not be anything else to do! The `renv.lock` file will be used to in # From a lockfile -If a researcher has shared a renv.lock file and other R scripts **outside of an R project** , restoring is still possible but requires an extra step or two: +Another approach to restoring a project is by sharing just the `renv.lock` file and other R scripts. This is useful if you are sharing a project with someone who does not use Git or if you are sharing a project with someone who does not have a GitHub account. The procedure for restoring contains just a few extra steps compared to the previous method, but should still look familiar: -- Place the lockfile in the directory you are working on your project from. This is the same directory that you would run `renv::init()` from, and should be an `*.Rproj` project. +- Place the lockfile in the directory you are working on your project from. This should be a directory with a `*.Rproj` file. - Run `renv::status()`. The results should look like this: + ``` r renv::status() #> This project does not appear to be using renv. #> Use `renv::restore()` to install the packages defined in lockfile. ``` + - Run `renv::restore()`. If prompted, choose the option "Activate the project and use the project library." This will create all other necessary files and directories for a {renv} project. # After Upgrading R -Something that may come as a surprise is the need to restore a project after upgrading R. This is because the `renv` package is tied to the version of R you are using. The `renv.lock` file will have the version of R that the project was last restored with, and if you upgrade R, you will need to restore the project again. This is a good thing! 
It ensures that the project is using the correct versions of the packages that were used when the project was last worked on. +Something that may come as a surprise is the need to restore a project after upgrading R. This occurs because the {renv} package creates a separate cache for each major and minor version of R that you use. The terms "major" and "minor" come from semantic versioning, a scheme that identifies each release of a piece of software with three numbers separated by periods: `major.minor.patch`. For example, an upgrade from R 4.3 to R 4.4 will require you to restore your project, but an upgrade from R 4.3.1 to R 4.3.2 will not. + +Therefore, if you upgrade R by a major or minor version, you should also update the version of R in the `renv.lock` file by running `renv::snapshot()` and update any packages in your project with `renv::update()`. This will ensure that the project is using the correct versions of the packages and of R that were used when the project was last worked on. + +Note that this is unlikely to be an issue if you are using e.g. a managed version of R and RStudio on a server, if you use a containerized environment like Docker, or if you actively manage the R version on your local machine with a solution like [rswitch](https://rud.is/rswitch/) or [rig](https://github.com/r-lib/rig). The latter two solutions are especially useful if you are working on multiple projects that require different versions of R on your local machine. 
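The `major.minor.patch` rule described in this section can be sketched as a small check. The helpers `major_minor()` and `needs_restore()` are hypothetical names used only for illustration, not part of {renv}; they compare the first two components of two version strings.

```r
# Hypothetical helper: reduce "4.3.1" to "4.3" (major and minor only)
major_minor <- function(v) {
  parts <- strsplit(as.character(v), ".", fixed = TRUE)[[1]]
  paste(parts[1:2], collapse = ".")
}

# Hypothetical helper: TRUE when the upgrade changes the major or minor
# version, i.e. when a restore would be expected after the upgrade
needs_restore <- function(old, new) {
  major_minor(old) != major_minor(new)
}

needs_restore("4.3.1", "4.3.2") # patch upgrade
needs_restore("4.3.2", "4.4.0") # minor upgrade

# Compare a previously used version against the running session:
# needs_restore("4.3.2", getRversion())
```

When the check comes back `TRUE`, the {renv} calls named in this section (`renv::restore()`, `renv::snapshot()`, `renv::update()`) are the ones to reach for.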
diff --git a/starting_details.qmd b/starting_details.qmd index a5fedd5..ba97add 100644 --- a/starting_details.qmd +++ b/starting_details.qmd @@ -2,18 +2,20 @@ title: "Getting Started Details" --- -# Initiation +The previous page introduced the main functions needed for using {renv} in a project, and showed short example snippets of how to use them. This page will provide a more detailed explanation of the functions `renv::init()`, `renv::status()`, and `renv::snapshot()`, and will provide a more detailed example of how to use them in a project. + +# Initiating {renv} As you might have noted from the videos in the previous sections, initiating a project with {renv} will cause the following changes to your project: 1. Creation of a lockfile, `renv.lock`, which records the version of R in use, the default download repository, and the packages used in the project. 2. Creation of a `renv` folder which contains the project `library`[^1], a settings file, a `.gitignore` file and a staging area for package installation -3. Addition of the line `source("renv/activate.R")` to your .Rprofile. This file is automatically run anytime a project session is started by e.g. opening the `*.Rproj` file. This line ensures that the project library is used in the session, not the global library. +3. Addition of the line `source("renv/activate.R")` to your `.Rprofile`. This file is automatically run anytime a project session is started by e.g. opening the `*.Rproj` file. This line ensures that the project library is used in the session, not the global library. 4. Updates the `.Rbuildignore` to include the `renv` folder and the `renv.lock` file. This is only relevant if you're building an R package. You should rarely, if ever, need to interact directly with any of the files or folders created by {renv}. 
The lockfile is the most important file, as it records the packages used in the project and their versions, and should be maintained with calls to functions from the {renv} package such as `snapshot()`. -[^1]: Strictly speaking, the directory `renv/library` contains [symlinks or symbolic links](https://www.google.com/search?q=symlink) to packages in the renv library, which is a cache of the packages used in the project. This "cached" library is a shared library, meaning that if you have multiple projects using the same version of the package, the package is only stored once on your computer. This is a huge space saver, especially if you have many projects using the same package. More on this in the Advanced Topics section. +[^1]: Strictly speaking, the directory `renv/library` contains [symlinks or symbolic links](https://www.google.com/search?q=symlink) to packages in the renv library, which is a cache of the packages used in the project. This "cached" library is a shared library, meaning that if you have multiple projects using the same version of the package, the package is only stored once on your computer. This is a huge space saver, especially if you have many projects using the same package. More on this in the [Caching](caching.qmd) chapter. ## Initiation Example @@ -63,7 +65,7 @@ You will likely never need to use `renv::dependencies()` directly, but being awa ## Dependency Trees -As mentioned in the [Dependencies Overview](intro_dependencies.html#dependencies-overview), project dependencies can have their own dependencies. For example, you might have noticed that when installing 1 new package, R asks you if you want to install additional packages. These additional packages are dependencies of the package you are installing, and can be referred to as *dependencies of dependencies* and cumulatively they create a "dependency tree." 
To get a visual representation of this, consider the following example of a recursive dependency tree (using the function `pak::pkg_deps_tree()` from the {pak} package). +As mentioned in the [Dependencies Overview](intro_dependencies.html#dependencies-overview), project dependencies can have their own dependencies. For example, you might have noticed that when installing 1 new package, R asks you if you want to install additional packages. These additional packages are dependencies of the package you are installing, and cumulatively they create a "dependency tree." To get a visual representation of this, consider the following example of a recursive dependency tree (using the function `pak::pkg_deps_tree()` from the {pak} package). ::: output-overflow @@ -118,13 +120,13 @@ pak::pkg_deps_tree("dplyr", dependencies = "hard")
-This tree shows the dependencies of the `dplyr` package at the top level, and the dependencies of those dependencies nested below each package. The tree is recursive, meaning that it will continue to show dependencies of dependencies until it reaches the end of the dependency chain. This is a simplified example, but it is important to understand that the dependencies of dependencies can be quite complex and can lead to a large number of packages being installed in your project. +This tree shows the dependencies of the `dplyr` package at the top level, and the dependencies of those dependencies nested below each package. The tree is recursive, meaning that it will continue to show dependencies of dependencies until it reaches the end of the dependency chain. This is a simplified example, but it is important to understand that the dependencies of dependencies can be quite complex and can lead to a large number of packages being installed in your project. **Above all, you should note that all of the packages on this dependency tree need to be recorded in the `renv.lock` file to ensure that your project is reproducible.** -You might have noticed the argument `dependencies = "hard"` in the function call. The `dependencies` argument is used to specify the *kinds of dependencies to install*. However, the discussion about dependencies so far has implied that that all dependencies are equally important and essential, and you might now be asking yourself: are there different kinds of dependencies? Yes, there are! To be more specific, there are different levels of dependencies, discussed below. +You might have noticed the argument `dependencies = "hard"` in the function call. The `dependencies` argument is used to specify the *kinds of dependencies to install*. However, the discussion about dependencies so far has implied that another package is either a dependency or not, and you might now be asking yourself: are there different kinds of dependencies? Yes, there are!
To be more specific, there are different levels of dependencies, discussed below. ## Dependency Levels -Generally-speaking, there are two levels of dependencies in R: hard and soft. Hard dependencies are those that are absolutely required for the software to function, while soft dependencies are those that are not required but can enhance the package. In the context of R, the following are the dependency levels: +Generally-speaking, there are two levels of dependencies in R: hard and soft. Hard dependencies are those that are absolutely required for the software to function, while soft dependencies are those that are not required but can improve the package or provide some optional functionality. In the context of R, the following are the dependency levels: - Hard Dependencies - Imports @@ -134,7 +136,7 @@ Generally-speaking, there are two levels of dependencies in R: hard and soft. Ha - Suggests - Enhances -These details may seem obscure or trivial, and, in most cases they are; you will be able to successfully use R and {renv} without knowing the difference between hard and soft dependencies 99% of the time. However, they are important to be aware of when when using automated package managers as they usually just install the Hard dependencies, but you might encounter scenarios where you mistakenly *expect* the Soft dependencies to be installed too. +The exact differences between the sub-levels of dependencies are not important for the purposes of this course, and, in most cases, the details of dependency levels are obscure and irrelevant; you will be able to successfully use R and {renv} without knowing the difference between hard and soft dependencies 99% of the time. However, it is important to be aware of them when using automated package managers, as these usually install only the Hard dependencies, and you might encounter scenarios where you mistakenly *expect* the Soft dependencies to be installed too.
In such cases, you might need to explicitly `renv::record()` the dependency into the `renv.lock` file. An example of this will be revisited in the exercises. # Status and Snapshot