---
title: The `Require` approach, comparing `pak` and `renv`
output: rmarkdown::html_vignette
author: "Eliot McIntire"
vignette: >
  %\VignetteIndexEntry{The `Require` approach, comparing `pak` and `renv`}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options:
  chunk_output_type: console
---

`Require` is a single package that combines features of `utils::install.packages`, `base::library`, and `base::require`, as well as `pak::pkg_install`, `remotes::install_github`, and `versions::install_version`, plus the snapshotting capabilities of `renv`. It takes its name from the `require` function: a single line loads a package, but here it also installs the package first if necessary. Set it and forget it. The line keeps working even if a dependency is later removed from CRAN ("archived"). Because everything fits on one line, it is easy to share, which facilitates, for example, making reprexes for debugging. This package can be a key part of a reproducible workflow.

# Principles used in `Require`

`Require` is designed with features that facilitate running R code that is part of a continuous reproducible workflow, from data to decisions. For this to work, every function a user calls should do its heavy work the first time it is called and be fast enough on subsequent calls that the user is not tempted to skip lines when re-running code. This is called "rerun-tolerance" or "idempotency", i.e., the line can be rerun under identical conditions and very quickly return the original result. The `reproducible` package has a function, `Cache`, which can give many function calls this property. It does not work well for functions whose objectives are side effects, like installing and loading packages. `Require` fills this gap.

# How it works -- **Version priority**

Three rules describe `Require`'s behaviour completely:

1. **Version-number requirements drive updates.** If the installed version already satisfies the constraint, no update happens.
2. **No version requirement, package present → no install.**
3. **Multiple, apparently incompatible requests for the same package don't error.** 

Therefore, `Require` treats statements about *version* as the top-level priority. A request without a version statement installs a package only if it is not already installed; otherwise it installs nothing. Examples:

```{r,eval=FALSE}
Require::Require("data.table") # installs if missing, otherwise calls require
```

The next line installs `data.table` if missing; otherwise it checks the locally installed version, installs an update if needed to satisfy the version statement, and then calls `require`:

```{r,eval=FALSE}
Require::Require("data.table (>=1.18.0)")
```

This **version priority** behaviour matches the default `install.packages` behaviour in base R, when a package declares a version dependency. `Require` extends this to a user-specified statement.
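
The version-priority rule can be illustrated with a minimal base-R sketch. This is not `Require`'s internal code: the helper names `parse_request()`, `satisfies()` and `needs_install()` are hypothetical, and only the `>=`, `<=` and `==` operators are handled.

```r
# Illustrative only: hypothetical helpers, not part of Require.

# Split "pkg (>= 1.2.3)" into package name, operator and version.
parse_request <- function(x) {
  pattern <- "^ *([^ (]+) *\\( *(>=|<=|==) *([0-9.-]+) *\\) *$"
  m <- regmatches(x, regexec(pattern, x))[[1]]
  if (length(m) == 0L) {
    # No parenthetical constraint: the whole string is the package name
    return(list(name = trimws(x), op = NA_character_, version = NA_character_))
  }
  list(name = m[2], op = m[3], version = m[4])
}

# Does an installed version satisfy the constraint?
satisfies <- function(installed, op, required) {
  if (is.na(op)) return(TRUE) # no constraint: any installed version is fine
  cmp <- utils::compareVersion(installed, required) # -1, 0 or 1
  switch(op, ">=" = cmp >= 0, "<=" = cmp <= 0, "==" = cmp == 0)
}

# TRUE when an install or update would be needed under version priority.
needs_install <- function(request, installed = NA_character_) {
  if (is.na(installed)) return(TRUE) # package is not installed at all
  req <- parse_request(request)
  !satisfies(installed, req$op, req$version)
}

needs_install("data.table")                       # not installed: install
needs_install("data.table", "1.15.0")             # present, no constraint: nothing to do
needs_install("data.table (>=1.18.0)", "1.17.8")  # too old: update
needs_install("data.table (>=1.18.0)", "1.18.0")  # satisfied: nothing to do
```

The key design point is that the decision depends only on the request string and the installed version, never on what happens to be newest on CRAN.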

See below for more detailed examples.

## Apparent package conflicts

When there are apparent package conflicts, `Require` uses, in this order: 

- version requirement;
- CRAN priority;
- order requested

See these examples:

```{r,eval=FALSE}
# No version specifications — CRAN version installed, or nothing if already installed
Require::Install(c("PredictiveEcology/reproducible@development", "reproducible"))

# `HEAD` after the GitHub ref forces the tip of the development branch
Require::Install(c("PredictiveEcology/reproducible@development (HEAD)", "reproducible"))

# Same: `HEAD` after the package name (of either form) forces the tip
Require::Install(c("PredictiveEcology/reproducible@development", "reproducible (HEAD)"))

# No conflict: version requirement is satisfiable by the named branch
Require::Install(c("PredictiveEcology/reproducible@modsForLargeArchives (>= 2.0.10.9010)",
                   "PredictiveEcology/reproducible (>= 2.0.10)"))

# Even if a branch doesn't exist, no error if a later requirement names a different branch
Require::Install(c("PredictiveEcology/reproducible@modsForLargeArchives (>= 2.0.10.9010)",
                   "PredictiveEcology/reproducible@validityTest (>= 2.0.9)"))
```

# New default as of version 2.0.0

The internal package dependency algorithm and the package installation mechanism now both use `pak`, instead of a custom package dependency function plus `install.packages`. This allows a user to mix and match `pak`-based manual installs with `Require`-based code. I highlight below the differences between using `pak` and `Require` with these new default internals. The old, native `Require` approach still works if the user wants it: `options(Require.usePak = FALSE)`.

## Key features (when `usePak = TRUE`)

Features include:

1. Fast, parallel installs and downloads (delegated to `pak`).
2. Installs CRAN and CRAN-alike packages *even if they have been archived.*
3. Installs GitHub packages.
4. Can load packages after installing, if using `Require::Require`.
5. User can specify which version to install using the standard R-version approach (e.g., `==3.5.0` or `>=3.5.0`).
6. Local package **caching** (see below) for fast (re-)installs.
7. Manages (several types of) conflicting package requests, e.g., different GitHub branches.
8. Finds specific versions of packages when using an incomplete CRAN-like repository (such as r-universe.dev), even when the requested *version* is not available there but *is* available on the main CRAN mirrors.

## Rerun-tolerance

To be functionally reproducible, code must be regularly run and tested on many operating systems and computers. When this does not happen, a user/developer does not know that certain code chunks no longer work until they try to run it later. In other words, code gets stale because underlying algorithms and data change. To be rerun-tolerant, a function must:

1. return the same result or outcome every time it is run (first, second or more times later);
2. be very fast after the first time; when it is not fast, users will skip running it "because we don't need to run it again and it is slow".

`Require` does both of these. See "Why is it fast?" below.
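
The two properties can be sketched in base R for the simplest case, installing a package only when it is missing. This is an illustration of the idea, not `Require`'s implementation; `install_if_missing()` and its `installer` argument are hypothetical names introduced here.

```r
# Illustrative only: a hypothetical helper showing rerun-tolerance.
# The slow step (installing) runs at most once; later calls return quickly
# with the same outcome, namely that the package is available.
install_if_missing <- function(pkg, installer = utils::install.packages) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    return(invisible("already installed")) # fast path on every rerun
  }
  installer(pkg)                           # slow path, first run only
  invisible("installed")
}

# `stats` ships with R, so this takes the fast path every time
print(install_if_missing("stats"))
```

Unlike this sketch, `Require` also handles version requirements, GitHub sources and archived packages, as described in the rest of this vignette.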

## Why these features help teams

It is common during code development to work in teams and to be updating package code as you go. The features above help whether the team is very tight, all working on exactly the same project, or looser, sharing only certain components across diverse projects.

### All working on same project

If the whole team is working on the same "whole" project, then it may be useful to use a "package snapshot" approach, as is used with the `renv` package. `Require` offers similar functionality with the function `pkgSnapshot()`. Using this approach provides a mechanism for each team member to update code, then snapshot the project, commit the snapshot and push to the cloud for the team to share.

### Diverse projects

However, if a team is more diversified and its members share the new code but not the whole project, then project snapshots are very inefficient, and package management must happen on a package-by-package basis rather than for the whole project. In other words, the code developer can work on their package, and the various team members have two options: stay at the bleeding edge, or update only when necessary for dependencies. More likely, they will want a mixture of these strategies, i.e., bleeding edge for some code but update-only-if-necessary for the rest. `Require` offers programmatic control for this. For example:

```{r,eval=FALSE}
Require::Install(
  c("PredictiveEcology/reproducible@development (HEAD)",
    "PredictiveEcology/SpaDES.core@development (>=2.0.5.9004)"))
```
This call keeps the project at the bleeding edge of the development branch of `reproducible`, but updates the development branch of `SpaDES.core` only if necessary (based on the version needed, expressed by the inequality). The user does not have to decide at run time whether an update should be made, or for which packages.


# Differences between `pak` and `Require`

## How `Require` differs from `pak` in philosophy

By default, as of version 2.0.0, `Require` uses `pak` to calculate package dependency trees and installations. However, `Require` applies a different philosophy to package management. The two tools answer the same question — "what should be installed?" — in different ways.

`Require` is therefore not an *alternative* to `pak`. It is a complementary *wrapper* that applies a different policy on top of `pak`. The differences described in this vignette are differences in **policy**, not in installation machinery. For example: when you call `pak::pkg_install("data.table")`, `pak` will offer to upgrade `data.table` if a newer version is on CRAN. When you call `Require::Install("data.table")`, `Require` first checks whether the installed version already satisfies your request; if it does, nothing happens at all. The actual install, when one is needed, is done by `pak` either way.

Practically speaking, this means that a user can write the list of packages they need in their code and leave it there, without concern that their packages may be updated unexpectedly, costing time and possibly changing functionality at an inopportune moment.

## Stability vs. Most-recent

The biggest difference is what each tool does when a package is *already installed*.

* **`pak` is current-first.** If you ask `pak` to install a package that is already there, it will check for a newer version and offer to upgrade.
* **`Require` is stability-first.** If the installed version satisfies your request, `Require` does not install. It will only install or upgrade when the version constraint you wrote actually requires it.

This is what makes `Require` "set-and-forget". You can put a `Require::Install(...)` line near the top of a script, run that script every day for a year, and your packages will not silently change underneath you. They only change when you change your version requirement.

| The package state                | `pak::pkg_install("data.table")`            | `Require::Install("data.table")`            |
| -------------------------------- | ------------------------------------------- | ------------------------------------------- |
| Not installed                    | Installs latest                             | Installs latest                             |
| Installed, latest                | No change                                   | No change                                   |
| Installed, but newer on CRAN     | Asks user whether to upgrade                | No change                                   |
| Installed, older than a requested `(>= X)` | User cannot specify in this way   | Upgrades to satisfy                         |

`Require` exposes the upgrade policy through the version constraints in your code. If you want the latest, ask for it (e.g. `data.table (>= 1.16)` or `data.table (HEAD)`); if you want stability, leave the constraint off.

## GitHub branches: exact pin vs. version minimum

The same stability-first policy shows up clearly when you install from a GitHub branch. `pak` reads the `DESCRIPTION` on that branch and enforces every dependency *exactly* as written — even with `upgrade = FALSE`, it will downgrade (or upgrade) installed dependencies to match the pin. `Require` reads the same `DESCRIPTION` but treats each line as a *minimum*: if an installed dependency already satisfies the constraint, it is left alone.

Here `LandR@development` lists `reproducible (>= 3.0.0.9001)` in its `DESCRIPTION`, while the user already has `reproducible 3.0.0.9083` installed:

```
> pak::pak("PredictiveEcology/LandR@development", upgrade = FALSE)

→ Will install 1 package.
→ Will update 1 package.
→ All 2 packages (0 B) are cached.
+ LandR                     1.1.5.9101 [bld][cmp] (GitHub: c5e771d)
+ reproducible 3.0.0.9083 → 3.0.0.9001 [bld][cmp] (GitHub: ffffec4)
✔ All system requirements are already installed.

? Do you want to continue (Y/n) n
Error: Aborted.

> Require::Install("PredictiveEcology/LandR@development", upgrade = FALSE)
Require/pak skipping new package dependency identification: using cache (103 packages, 0.6h old)
All requested packages are in the pak download cache; installing from cache (no metadata refresh, no network)
offline mode: installing 1 package(s) from pak cache: LandR

→ Will install 1 package.
→ The package (0 B) is cached.
+ LandR   1.1.5.9101 [bld][cmp] (GitHub: c5e771d)

ℹ No downloads are needed, 1 pkg is cached
ℹ Building LandR 1.1.5.9101
✔ Built LandR 1.1.5.9101 (24.8s)
✔ Installed LandR 1.1.5.9101 (github::PredictiveEcology/LandR@c5e771d) (37ms)
✔ 1 pkg: added 1 [25.9s]
Installed 1 packages in 26.3 secs
```

`pak` insists on replacing `reproducible 3.0.0.9083` with the exact pin from the branch (3.0.0.9001), even though the installed version is newer. `Require` keeps the newer copy because it still satisfies LandR's constraint. The two behaviours have different use cases: `pak`'s exact-pin enforcement is what you want when you need to reproduce the dependency graph the branch author committed to; `Require`'s version-minimum policy is what you want for "set-and-forget" scripts where any version meeting the minimum is acceptable.
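
The two policies can be contrasted using the version numbers from the output above. This is a worked comparison in base R, not code from either package.

```r
# Version numbers taken from the console output above
installed <- package_version("3.0.0.9083") # reproducible, already installed
declared  <- package_version("3.0.0.9001") # from `reproducible (>= 3.0.0.9001)`

# Version-minimum policy: the installed copy satisfies the constraint,
# so it is left alone.
meets_minimum <- installed >= declared
print(meets_minimum)

# Exact-pin policy: the installed copy differs from the pinned version,
# so it would be replaced.
matches_pin <- installed == declared
print(matches_pin)
```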

## Installs *and loads* in one line

`pak` installs packages. To use them, you still need a separate `library()` call.

`Require::Require()` does both: it installs (if needed) and then loads. The whole package-management story for a script can fit on one line:

```{r,eval=FALSE}
Require(c("data.table (>= 1.16)", "lme4", "PredictiveEcology/SpaDES.core@development"))
```

## Version constraints in the package name

`pak` accepts exact version pins via `pkg@1.2.3`. It does not accept ranges like `>=` or `<=` directly — you would have to either pin a specific version yourself or put the constraint in a `DESCRIPTION` file:

```{r,eval=FALSE}
# Won't work — pak does not parse this
try(pak::pak("data.table (>= 1.8.0)"))

# What you have to write instead — pick an exact version yourself
pak::pak("data.table@1.8.0")
```

Consistent with the version requirements that can be specified in a package DESCRIPTION file, `Require` accepts the full set of R-style constraints right in the call, mixed freely:

```{r,eval=FALSE}
Require::Install(c("data.table (>= 1.16)",
                   "stringfish (<= 0.15.8)",
                   "qs (== 0.27.3)"))
```

This matters because the constraint is what tells `Require` "stop, don't install" or "yes, please upgrade". The constraint is the policy.

## Conflicts: resolved vs. raised as errors

When two of your dependencies (or sub-dependencies) point to different sources or different branches of the same package, `pak` reports a conflict and stops. The user is expected to fix it — usually by adding `any::` prefixes or removing one of the requests.

`Require` resolves the conflict for you, using the priority documented above (version requirement, then CRAN, then order requested).

```{r, eval=FALSE,message=TRUE}
# pak: errors out — both branches of LandR are requested
try(pak::pak(c("PredictiveEcology/LandR@development",
               "PredictiveEcology/LandR@main")))

# Require: takes them in order — main wins
Require::Install(c("PredictiveEcology/LandR@main",
                   "PredictiveEcology/LandR@development"))

# Require: takes by version requirement — development wins because it satisfies the constraint
Require::Install(c("PredictiveEcology/LandR@main",
                   "PredictiveEcology/LandR@development (>= 1.1.5)"))
```

The same conflict-resolution applies to mismatches between a CRAN package and a GitHub `Remotes` field deep inside someone else's package: `Require` picks something and explains why, rather than asking you to untangle it.
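
As a rough illustration of the stated priority (version requirement, then CRAN, then order requested), here is a toy base-R sketch. It is deliberately simplified and is not `Require`'s algorithm: `pick_request()` is a hypothetical name, `(HEAD)` is not modelled, branch existence is not checked, and when several requests carry a version requirement it simply takes the first.

```r
# Illustrative only: a toy model of the documented priority order.
pick_request <- function(requests) {
  # 1. A request with a version requirement wins
  has_version <- grepl("\\((>=|<=|==)", requests)
  if (any(has_version)) return(requests[has_version][1])
  # 2. Otherwise a plain CRAN-style name (no "owner/repo") wins
  is_cran <- !grepl("/", requests, fixed = TRUE)
  if (any(is_cran)) return(requests[is_cran][1])
  # 3. Otherwise the first request wins
  requests[1]
}

# Order requested: the first branch wins
pick_request(c("PredictiveEcology/LandR@main",
               "PredictiveEcology/LandR@development"))

# Version requirement beats order
pick_request(c("PredictiveEcology/LandR@main",
               "PredictiveEcology/LandR@development (>= 1.1.5)"))

# No version requirement: the CRAN name beats the GitHub ref
pick_request(c("PredictiveEcology/reproducible@development", "reproducible"))
```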

## Archived packages: automatic vs. manual

When a package is removed from CRAN ("archived"), `pak` cannot install it from a plain name — you need to give it the explicit URL of the archive tarball (`url::https://...`). And if the archived package is a *sub-dependency* of something else, even that workaround doesn't always help.

`Require` retrieves the most recent archived copy automatically and continues. This means a workflow that worked yesterday continues to work today, even if a CRAN package has been archived overnight.

```{r,eval=FALSE,message=FALSE}
# pak: fails — `knn` is archived
try(pak::pkg_install("knn"))

# Require: succeeds — fetches the most recent archived copy
Require::Install("knn")
```

## Installing from a snapshot

A snapshot is a flat list of exact pins (CRAN versions and GitHub SHAs). On paper, that's the easiest possible install — every version is already chosen. In practice, handing the same list to `pak::pkg_install()` runs into trouble that doesn't apply to a "just install the latest" workflow:

* **All-or-nothing solving.** `pak`'s resolver evaluates every pin together. If *one* pin is unsolvable (an archived version, a sub-dep that contradicts another pin), it refuses to install anything. `Require`'s snapshot installer goes pin-by-pin with `install.packages(dependencies = NA)` against a synthesized local repo, so a bad row removes one package, not all of them.
* **Archived / disappeared versions.** Snapshots routinely pin versions that have since left CRAN. `pak` 404s. `Require` substitutes the nearest available archived version and reports the substitution.
* **Non-CRAN homes.** Rows that came from r-universe, RSPM, or another CRAN-alike carry a `Repository` URL. `pak`'s resolver only consults `options(repos)`, so those rows fail to resolve. `Require` honours each row's `Repository` column.
* **Incomplete snapshots.** A snapshot built from a session that already had a transitive dep loaded from another `libPath` can be missing that dep. `pak` errors with `dependency 'X' is not available`. `Require` auto-fills the missing dep from CRAN/PPM and flags it so the user can add it to the snapshot for full reproducibility.
* **Opaque failures.** When `pak` does fail, the user sees `! error in pak subprocess`. `Require` keeps per-package install logs and prints a structured report: status (`download-failed` / `version-conflict` / `missing-dep` / `compile-failed` / `cascade` / `substituted` / `auto-filled`), the reason, and a concrete fix.
* **Speed on Linux/macOS.** Tarballs are fetched in parallel via `libcurl` multi, with PPM binaries preferred (and the right `User-Agent` set so PPM actually serves binaries). The cache is `pkgcache` — the same cache `pak` uses — so anything downloaded here is reusable by `pak` next time, and vice versa.

### Why snapshot install needs Require's plumbing on top of `pak`

It is tempting to assume `pak`'s own cache would handle a snapshot install end-to-end — hand `pak` a list of `cran::pkg@version` refs and let `pkgcache` deduplicate. We measured this directly. Result, on a 379-package snapshot:

| Strategy                                                           | Cache-warm time | Outcome                                                         |
| ------------------------------------------------------------------ | :-------------: | :-------------------------------------------------------------- |
| `Require` snapshot installer (`local::` source + `install.packages(type = "binary")`) | **~60 s**       | All pins installed at the snapshot's exact version              |
| `pak::pkg_install(c("cran::pkgA@verA", "cran::pkgB@verB", …))`       | **~1240 s** (≈20×) | All packages eventually installed, but several pins *bumped* away from the snapshot version (forced source recompile) |

The reason is structural, not a bug in `pak`. CRAN only builds binaries for the **current** version of each package; older versions live in `src/contrib/Archive/` as source only. So when `pak`'s resolver sees `pkgA@<archived-version>`, it constructs the source-Archive URL — it never tries a binary URL, because no binary URL exists. Even if a binary for that exact pin is sitting in `pkgcache` (because we built it on a previous run), `pak` rebuilds from source. Snapshot installs are dominated by archived-version pins, so this is the common case, not the edge case.

`Require`'s snapshot installer works around this with several mechanisms that `pak` does not expose:

| Layer                   | What `Require` does                                                                                            | What `pak` alone would do                                              |
| ----------------------- | :------------------------------------------------------------------------------------------------------------- | :--------------------------------------------------------------------- |
| Cached binary for an archived pin | `install.packages(type = "binary", repos = NULL)` against the cached `.tgz`/`.zip` (skips compile entirely) | Rebuild from source archive (no binary URL exists for non-current versions) |
| Source tarball for an archived pin | `pak::pkg_install("local::<file>")` — bypasses `pak`'s resolver, installs the on-disk file directly      | Re-download from `src/contrib/Archive/...` even if the file is in `pkgcache` under a different URL key |
| Binary `local::` ref     | n/a — Require routes binaries through `install.packages` instead                                              | Refuses with "Platform mismatch" — `pak`'s `local::` is source-only     |
| GitHub `@SHA` pin        | Built once, then cached as a binary tarball under a synthetic `require-snapshot-bin://` URL so subsequent runs unpack instead of rebuilding | Rebuild on every run (the synthetic URL key is not part of `pak`'s resolver vocabulary) |
| Bump-and-retry for a pin that won't compile | Walks the CRAN Archive listing for newer versions, tries each, records the substitution in the diagnostic report | All-or-nothing: one unsolvable pin aborts the whole install            |

So `Require` is using `pak` for everything `pak` is good at — parallel downloads, the install subprocess, the `pkgcache` cache layout — and adding the orchestration layer that turns "snapshot of mostly-archived pinned versions" into a workflow that actually finishes in a minute instead of twenty.

Snapshot installs are the default path of `Require::Install()` when an `inst/snapshot.txt`-style file is supplied; the behaviour above is what makes "snapshot from one machine, restore on another a year later" actually work.

## Working offline

`Require` can install **and** load packages with no internet, as long as they (or compatible builds of them) were downloaded once before. This is useful, for example, on a high-performance computing cluster whose compute nodes have no internet access. Set:

```r
options(Require.offlineMode = TRUE)
Require::Require("dplyr")
```

`Require` looks for each package in the local `pak` cache and lets `pak` install from there. With a warm cache, installs are near-instant — no rebuild, no download.

Two things make this work that calling `pak` directly does not:

* **Network probes are suppressed.** `pak` normally fetches `bioconductor.org/config.yaml` and refreshes its repo metadata at startup, even when nothing remote is needed. `Require` sets the right environment variables so those calls are skipped for the duration of the install.
* **Refs are translated.** `Require`'s internal `pkg (>= X.Y.Z)` constraint form is rewritten to the bare package name before `pak` sees it (`pak` rejects the parenthetical form).

You don't have to set `Require.offlineMode` yourself. If `Require` tries an online install and any package fails because the network is unreachable, it probes for connectivity (~2 seconds) and, if there really is no internet, automatically retries from the cache. On the happy path you pay nothing extra; the probe only runs when an install fails.
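
The control flow of that automatic fallback can be sketched as follows. This is an assumption-laden illustration, not `Require`'s code: `install_with_fallback()` and its three function arguments are hypothetical stand-ins for the online install, the cache install and the connectivity probe.

```r
# Illustrative only: hypothetical names throughout.
install_with_fallback <- function(pkgs, install_online, install_from_cache,
                                  has_internet) {
  tryCatch(
    {
      install_online(pkgs) # happy path: no probe, no extra cost
      "online"
    },
    error = function(e) {
      # Probe connectivity only after a failure
      if (has_internet()) stop(e) # network is fine, so this is a real failure
      install_from_cache(pkgs)    # no internet: retry from the local cache
      "cache"
    }
  )
}

# Stubs simulating a machine with no network access
no_network <- function(pkgs) stop("could not resolve host")
from_cache <- function(pkgs) invisible(pkgs)
offline    <- function() FALSE

print(install_with_fallback("dplyr", no_network, from_cache, offline))
```

Putting the probe inside the error handler is what keeps the successful path free of any extra checks.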

When a package isn't in the cache at all, `Require` warns "not in pak cache" — a separate message from "tarball was in cache but install failed", so the cause is unambiguous.

## Summary of differences

| *What*                                          | `Require`                              | `pak` (called directly)                                |
| ----------------------------------------------- | :------------------------------------- | :----------------------------------------------------- |
| Installs an already-installed package           | Only if version constraint demands it  | Will offer to upgrade if a newer version exists        |
| Loads packages after install                    | Yes (`Require()`)                      | No, install only                                       |
| Version constraints in package name             | `Pkg (>= X)`, `(== X)`, `(<= X)`, `(HEAD)` | Exact pin only via `Pkg@X`                         |
| Multiple branches/sources for same package      | Resolves by priority                   | Errors as a conflict                                   |
| Archived CRAN package (direct)                  | Automatic                              | Needs explicit `url::...`                              |
| Archived CRAN package (as a dependency)         | Automatic                              | Often fails even with workarounds                      |
| `Additional_repositories` in `DESCRIPTION`      | Honoured                               | Not honoured                                           |
| User-controlled override per package            | `(HEAD)` to force latest               | Not exposed                                            |
| Snapshot creation                               | `pkgSnapshot()` / `pkgSnapshot2()`     | None (use `renv` separately)                           |
| Snapshot install (per-row tolerant)             | Yes — bad row removed, rest installs   | No — one unsolvable pin aborts the whole install       |
| Substitute archived version when pin is gone    | Yes (nearest available)                | No (fails)                                             |
| Honour snapshot row's `Repository` column       | Yes                                    | No (only `options(repos)`)                             |
| Auto-fill missing transitive deps in snapshot   | Yes, with diagnostic                   | No (errors)                                            |
| Per-package failure diagnostic                  | Status / reason / fix per package      | `! error in pak subprocess`                            |

The "installation engine" rows that used to appear here (parallel downloads, parallel installs, local cache) are no longer differences: `Require` uses `pak` for those.

## Set it and forget it speed

Because a major objective of `Require` is to be set-and-forget, rerunning it must not cost meaningful human time. When all packages are already installed, rerunning a `Require` line is 2x-10x faster than the equivalent `pak::pak` line:

```
> system.time(pak::pak(c("devtools", "testthat", "roxygen2")))
                                                                             
ℹ No downloads are needed
✔ 3 pkgs + 90 deps: kept 92 [1.4s]
   user  system elapsed 
   0.00    0.00    1.47 
   
> system.time(Require::Install(c("devtools", "testthat", "roxygen2")))
Require/pak skipping new package dependency identification: using memory cache (93 packages)
No packages to install/update
   user  system elapsed 
   0.04    0.00    0.20 
```


# Why is it fast?

`pak` is already fast due to parallel downloads and package caching. `Require` adds a few other features for speed.

## Extra from `Require`

If the packages supplied to a `Require()` or `Install()` call are identical to those of a previous call (commonly the case for ongoing projects), the package dependency tree is not recalculated: it is stored on disk and in memory, so in-session re-runs are very fast. Because calculating the tree is slow for more than 200 packages, skipping it gives users near-instant package assessments.
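
The in-memory half of this idea can be sketched in base R. This is not `Require`'s cache: `dep_cache`, `cached_deps()` and the `resolver` argument are hypothetical, the key is simply the requested vector as written, and the on-disk copy is omitted.

```r
# Illustrative only: a hypothetical in-session cache of dependency trees.
dep_cache <- new.env(parent = emptyenv())

cached_deps <- function(pkgs, resolver) {
  key <- paste(pkgs, collapse = ",") # identical request gives identical key
  if (!exists(key, envir = dep_cache, inherits = FALSE)) {
    # Slow path: runs once per distinct request
    assign(key, resolver(pkgs), envir = dep_cache)
  }
  get(key, envir = dep_cache, inherits = FALSE)
}

# A stand-in resolver that counts how often it is actually invoked
n_calls <- 0
slow_resolver <- function(pkgs) {
  n_calls <<- n_calls + 1
  paste0(pkgs, "-deps")
}

first  <- cached_deps(c("devtools", "testthat"), slow_resolver)
second <- cached_deps(c("devtools", "testthat"), slow_resolver)
print(n_calls) # the resolver ran only for the first call
```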


# `renv` and `Require`

## Managing projects during development

`renv` has the concept of a lockfile, which records a specific version of each package. If the currently installed version of a package differs from the lockfile (e.g., I am the developer and I increment the local version), `renv` will attempt to revert the local changes (with a prompt to confirm) *unless* the local package is installed from a cloud repository (e.g., GitHub) and a `snapshot` is taken. This sequence is largely incompatible with `pkgload::load_all()` or `devtools::install()`, as these do not record "where" to get the current version from. Thus, the `renv` sequence can be quite time consuming (1-2 minutes, instead of 1 second with `pkgload::load_all()`).

`Require` does not attempt to update anything unless required by a package. Thus, this issue never comes up. If and when it is important to "snapshot", then `pkgSnapshot` or `pkgSnapshot2` can be used.