Various content updates
NeuroShepherd committed Jul 24, 2024
1 parent d06e47d commit 4ef72e2
Showing 5 changed files with 64 additions and 29 deletions.
46 changes: 36 additions & 10 deletions caching.qmd
@@ -3,16 +3,29 @@ title: "Caching"
bibliography: references.bib
---

Page goals:
This page will discuss the concept of caching and how it is used in {renv}.

- briefly describe caching and the fact that there are shared libraries
- discuss how {renv} uses caching
- shared {renv} library for packages on local system
- installation of packages, and checks for packages first in the shared library
- "symlinks" to the shared library in `renv/library`
<!-- - briefly describe caching and the fact that there are shared libraries -->
<!-- - discuss how {renv} uses caching -->
<!-- - shared {renv} library for packages on local system -->
<!-- - installation of packages, and checks for packages first in the shared library -->
<!-- - "symlinks" to the shared library in `renv/library` -->


# What is Caching?

In the context of software development, caching is used to store data that is frequently accessed, such as packages, to speed up the execution of a program. When a program needs to access a package, it first checks the cache to see if the package is already stored there. If the package is found in the cache, the program can retrieve it quickly without having to download it again. This can significantly reduce the time it takes to run a program, especially if the package is large or if the program is run frequently.
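
The check-then-fetch pattern described above can be sketched in a few lines of R. This is only a toy illustration, not how {renv} is implemented: `package_cache`, `slow_download()`, and `fetch_package()` are hypothetical names, and the "download" is simulated.

``` r
# Toy cache: an environment that maps package names to stored contents.
package_cache <- new.env()

# Hypothetical stand-in for an expensive download.
slow_download <- function(name) {
  message("Downloading ", name, " ...")
  paste("contents of", name)
}

# Check the cache first; only download on a cache miss.
fetch_package <- function(name) {
  if (!exists(name, envir = package_cache, inherits = FALSE)) {
    assign(name, slow_download(name), envir = package_cache)
  }
  get(name, envir = package_cache, inherits = FALSE)
}

first  <- fetch_package("dplyr")  # cache miss: triggers the simulated download
second <- fetch_package("dplyr")  # cache hit: returned without downloading
```

The second call returns the stored value without printing the download message, which is the time saving that a cache provides.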

# Caching in {renv}

In the context of {renv}, the package cache is a shared library that contains the packages used in your projects. The cache will, when needed, contain multiple different versions of the same package and your project will link to the correct version, only downloading the version specified in the `renv.lock` if you don't already have it somewhere in the renv cache. This shared library is a huge space saver, especially if you have many projects using the same packages.

A separate cache is built for each minor version of R you use. For example, if you have used {renv} with R versions 4.3 and 4.4 on your computer, then you will end up with a cache for each of these R versions. This can be surprising if you are not aware of the caching behavior. Upgrading from e.g. R 4.3.2 to R 4.3.3 will not create a new cache, however.

## Cache Locations

The {renv} caches for R packages will be in one of the following locations, based on your operating system:

> You can find the location of the current cache with `renv::paths$cache()`. By default, it will be in one of the following folders:
>
> - Linux: `~/.cache/R/renv/cache`
>
> - macOS: `~/Library/Caches/org.R-project.R/R/renv/cache`
@@ -21,9 +34,22 @@ Page goals:
[@posit2024]

## What is Caching?
Within each of these cache folders, you should see subfolders for each version of R that you have used with {renv}. For example, on a macOS system, you might see the following folders:

``` r
list.files("~/Library/Caches/org.R-project.R/R/renv/cache/v5", full.names = TRUE)

#> [1] "~/Library/Caches/org.R-project.R/R/renv/cache/v5/macos"
#> [2] "~/Library/Caches/org.R-project.R/R/renv/cache/v5/R-4.2"
#> [3] "~/Library/Caches/org.R-project.R/R/renv/cache/v5/R-4.3"
#> [4] "~/Library/Caches/org.R-project.R/R/renv/cache/v5/R-4.4"
```

You should not need to interact with the cache directly, but it can be useful to know where it is located, and particularly that there can be *multiple caches*, in case you need to troubleshoot any issues with {renv}.

## Project Library

## Caching in {renv}
When you install a package in a project with {renv}, the package is installed in the shared library, and a "symlink" is created in the `renv/library` folder of your project. This symlink points to the package in the shared library, so the package is not duplicated in your project. This helps to save space on your local machine and ensures that the package is only downloaded once, even if it is used in multiple projects.
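
To see whether entries in a library folder are symlinks, base R's `Sys.readlink()` can help. The sketch below is a minimal example under stated assumptions: `is_symlink()` is a hypothetical helper, and the demonstration uses a temporary folder rather than a real {renv} project so that it runs anywhere symlinks are supported (it may not behave the same on Windows).

``` r
# Hypothetical helper: TRUE for paths that are symbolic links.
# Sys.readlink() returns "" for ordinary files and folders,
# and NA where symlinks are not supported.
is_symlink <- function(paths) {
  target <- Sys.readlink(paths)
  !is.na(target) & nzchar(target)
}

# Stand-in for a cache folder and a project library entry.
demo_dir <- tempfile("symlink-demo-")
dir.create(demo_dir)
real_pkg <- file.path(demo_dir, "real_pkg")
dir.create(real_pkg)
link_pkg <- file.path(demo_dir, "linked_pkg")
file.symlink(real_pkg, link_pkg)

# Check both paths: the real folder and the link that points to it.
is_symlink(c(real_pkg, link_pkg))
```

In an activated {renv} project, the same helper could be applied to `list.files(.libPaths()[1], full.names = TRUE)`, assuming the first library path is the project library.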

### Symlinks

2 changes: 1 addition & 1 deletion ex_init_snapshot.qmd
@@ -2,7 +2,7 @@
title: "Initialize and Snapshot"
---

```{r}
```{r, echo=FALSE}
library(magrittr)
```

10 changes: 5 additions & 5 deletions intro_dependencies.qmd
@@ -5,19 +5,19 @@ title: "Software Dependencies"

## Dependencies Overview

The general concept of software dependencies is relatively straightforward: "dependencies" are other softwares/programs that the software you're using or developing *depends* on to function. For example, if you are developing an R package, you will need R installed on your machine, or if you download an R package that uses functions from the `dplyr` package, `dplyr` is a library dependency that must be downloaded too.
The general concept of software dependencies is relatively straightforward: "dependencies" are other pieces of software that the software you're using or developing *depends* on to function. For example, if you are developing an R package, you will need R installed on your machine, or if the R code you are using includes functions from the `dplyr` package, then `dplyr` must be downloaded first.

There are many layers of dependencies that can exist in a project, and these dependencies can be difficult to manage. Extending the previous example, one must keep in mind that `dplyr` itself has dependencies, such as `tibble`, `rlang`, and `vctrs`, which must also be downloaded. And `tibble`, `rlang`, and `vctrs` have dependencies too, and so on. This process works recursively until all unique packages are identified, and a project that appears to only use e.g. 5-10 libraries can ultimately require a few dozen or hundred packages. This is not an R-specific problem, bur rather a common issue in software development known as "dependency hell."[^left-pad]
There are many layers of dependencies that can exist in a project, and these dependencies can be difficult to manage. Extending the previous example, one must keep in mind that `dplyr` itself has dependencies, such as `tibble`, `rlang`, and `vctrs`, which must also be downloaded. And `tibble`, `rlang`, and `vctrs` have dependencies too, and so on. This process works recursively until all unique packages are identified, and a project that appears to only use e.g. 5-10 libraries can ultimately require a few dozen or hundred packages. This is not an R-specific problem, but rather a common issue in software development known as "dependency hell."[^left-pad]

[^left-pad]: If you have enough time, read about how the [left-pad incident](https://qz.com/646467/how-one-programmer-broke-the-internet-by-deleting-a-tiny-piece-of-code) broke the internet as an example of how dependencies can go wrong.

If you're ever curious about the dependencies of a package, there are a variety of ways you can find this information via the package's documentation:
If you're ever curious about the dependencies of a package, they should all be declared in the `DESCRIPTION` file of an R package. There are a variety of ways you can access this information:

* Reading the `DESCRIPTION` file of an R package, which can be accessed in a variety of ways
* Methods for reading the `DESCRIPTION` file of an R package:
- `utils::packageDescription()` to print the complete `DESCRIPTION` file of a package to the console
- `tools::package_dependencies()` for just a list of dependencies from the `DESCRIPTION` file
- Looking at the `DESCRIPTION` file online on the package's CRAN or GitHub page
* `pak::pkg_deps_tree()` for a visual representation of the direct dependencies **and** the recursive dependencies
* `pak::pkg_deps_tree()` for a visual representation of the direct dependencies **and** the recursive dependencies. An example of this will be shown in the [Starting Details](starting_details.qmd) chapter.
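
As a rough illustration of direct versus recursive dependencies, the sketch below uses `tools::package_dependencies()` against the packages already installed on the machine, so no network access is needed. `dep_summary()` is a hypothetical helper, and `"stats"` is used only because it ships with R; substituting `"dplyr"` assumes that package is installed.

``` r
# Dependency information for locally installed packages.
installed <- utils::installed.packages()

# Hypothetical helper: direct and recursive dependencies of one package,
# based on the Depends, Imports, and LinkingTo fields.
dep_summary <- function(pkg, db = installed) {
  direct <- tools::package_dependencies(pkg, db = db)[[pkg]]
  recursive <- tools::package_dependencies(pkg, db = db, recursive = TRUE)[[pkg]]
  list(direct = direct, recursive = recursive)
}

deps <- dep_summary("stats")
length(deps$direct)     # number of direct dependencies
length(deps$recursive)  # number of dependencies after following the chain
```

For a package with many layers of dependencies, the recursive count is typically much larger than the direct count.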


## Intro to Management Practices
17 changes: 12 additions & 5 deletions restoring_a_project.qmd
@@ -8,7 +8,9 @@ We will cover restoring a project from a GitHub repository, restoring a project

# From a GitHub Repo

Restoring a project library from a GitHub repository is ideally the most straightforward way to restore a project. The steps are as follows:
Restoring a project library from a GitHub repository is ideally the most straightforward way to restore a project. A repo *should* always have all of the required files for restoring a project including the `renv.lock` file, the `renv` folder, the `*.Rproj` file, and the R scripts. Note that repos generally do not (and should not) contain the `renv/library` folder--this is intended behavior. Recall from the caching chapter that the `renv/library` contains "symlinks" to the actual package files on your machine, and these symlinks are not portable across machines. Moreover, the `renv/library` folder can be quite large and is not necessary for restoring a project.

The steps for restoring a project from a GitHub repository are as follows:

- Clone the repository to your local machine with `git clone`
- Open the `*.Rproj` file
@@ -18,22 +20,27 @@ There should not be anything else to do! The `renv.lock` file will be used to in

# From a lockfile

If a researcher has shared a renv.lock file and other R scripts **outside of an R project** , restoring is still possible but requires an extra step or two:
Another approach to restoring a project is by sharing just the `renv.lock` file and other R scripts. This is useful if you are sharing a project with someone who does not use Git or if you are sharing a project with someone who does not have a GitHub account. The procedure for restoring contains just a few extra steps compared to the previous method, but should still look familiar:

- Place the lockfile in the directory you are working on your project from. This is the same directory that you would run `renv::init()` from, and should be an `*.Rproj` project.
- Place the lockfile in the directory you are working on your project from. This should be a directory with a `*.Rproj` file.
- Run `renv::status()`. The results should look like this:

``` r
renv::status()
#> This project does not appear to be using renv.
#> Use `renv::restore()` to install the packages defined in lockfile.
```

- Run `renv::restore()`. If prompted, choose the option "Activate the project and use the project library." This will create all other necessary files and directories for a {renv} project.



# After Upgrading R

Something that may come as a surprise is the need to restore a project after upgrading R. This is because the `renv` package is tied to the version of R you are using. The `renv.lock` file will have the version of R that the project was last restored with, and if you upgrade R, you will need to restore the project again. This is a good thing! It ensures that the project is using the correct versions of the packages that were used when the project was last worked on.
Something that may come as a surprise is the need to restore a project after upgrading R. This occurs because the {renv} package creates a separate cache for each major and minor version of R that you use. "Major" and "minor" refer to semantic versioning, a scheme that gives software version numbers made of three numbers separated by periods: `major.minor.patch`. For example, an upgrade from R 4.3 to R 4.4 will require you to restore your project, but an upgrade from R 4.3.1 to R 4.3.2 will not.
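
The `major.minor.patch` rule above can be expressed as a small check. This is a sketch rather than anything {renv} itself exposes: `same_minor_series()` is a hypothetical helper that compares only the first two components of two version strings.

``` r
# Hypothetical helper: TRUE when two R versions share the same
# major.minor series, i.e. when they differ at most in the patch number.
same_minor_series <- function(old, new) {
  old_parts <- unclass(as.numeric_version(old))[[1]]
  new_parts <- unclass(as.numeric_version(new))[[1]]
  identical(old_parts[1:2], new_parts[1:2])
}

same_minor_series("4.3.1", "4.3.2")  # patch upgrade: same series
same_minor_series("4.3.2", "4.4.0")  # minor upgrade: different series
```

To compare against the running session, `as.character(getRversion())` can be passed as the `new` argument.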

Therefore, if you upgrade R by a major or minor version, you should also update the version of R in the `renv.lock` file by running `renv::snapshot()` and update any packages in your project with `renv::update()`. This will ensure that the project is using the correct versions of the packages and of R that were used when the project was last worked on.

Note that this is unlikely to be an issue if you are using e.g. a managed version of R and RStudio on a server, if you use a containerized environment like Docker, or if you actively manage the R version on your local machine with a solution like [rswitch](https://rud.is/rswitch/) or [rig](https://github.com/r-lib/rig). The latter two solutions are especially useful if you are working on multiple projects that require different versions of R on your local machine.

Note that this only occurs when upgrading by major or minor versions of R. If you're not familiar with "semantic versioning," it is a way of versioning software that uses three numbers separated by periods: `major.minor.patch`. An upgrade from R4.3 to R4.4 will cause you to need to restore your projects, but an upgrade from R4.3.1 to R4.3.2 will not.
