---
title: "Check the heaviness of package dependencies"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Check the heaviness of package dependencies}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

When developing R packages, we should try to avoid depending directly on
"heavy packages". The "heaviness" of a package is the number of additional
dependency packages it brings in. If your package directly depends
on a heavy package, there are several consequences:

1. Users need to install many additional packages when installing your
   package, which increases the risk that one of them fails to install
   and, as a result, your package cannot be installed either.
2. The set of namespaces loaded into your R session after loading your package will be huge (you can see the loaded namespaces with `sessionInfo()`).
3. Your package will be "heavy" as well, and it may take a long time to load.

In the DESCRIPTION file of your package, the "direct dependency
packages" are listed in the `Depends`, `Imports` and `LinkingTo` fields. There are
also "indirect dependency packages", which are found recursively from each of
the direct dependency packages. What we call "dependency packages" here is
the union of the direct and indirect dependency packages.
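
As a minimal sketch of what "direct dependency packages" means in practice, the following base R code reads the three fields from the DESCRIPTION file of an installed package. The helper `parse_dep_field()` is a hypothetical name introduced only for this illustration, and the **stats** package is used simply because it is always installed. This is not how **pkgndep** itself parses dependencies.

```r
# Split a DESCRIPTION field such as "R (>= 3.5.0), methods, grid" into package names.
# `parse_dep_field()` is a hypothetical helper, not part of pkgndep.
parse_dep_field = function(x) {
    if (is.na(x)) return(character(0))     # the field is absent
    pkgs = strsplit(x, ",")[[1]]
    pkgs = gsub("\\([^)]*\\)", "", pkgs)   # drop version requirements
    pkgs = gsub("\\s+", "", pkgs)          # drop whitespace and line breaks
    setdiff(pkgs[pkgs != ""], "R")         # "R" itself is not a package
}

# Fields that define the direct dependency packages
fields = c("Depends", "Imports", "LinkingTo")
desc = read.dcf(system.file("DESCRIPTION", package = "stats"), fields = fields)

direct = unique(unlist(lapply(desc[1, ], parse_dep_field), use.names = FALSE))
direct
```

Repeating this step for every package in `direct`, and then for their dependencies in turn, would give the indirect dependency packages.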

There are also packages listed in the `Suggests` and `Enhances` fields of the
DESCRIPTION file, but they are not required to be installed when installing
your package. Of course, they also have "indirect dependency packages". To get
rid of heavy packages that are not often used in your package, it is
better to move them into the `Suggests`/`Enhances` fields and to load/install
them only when they are needed.
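
Loading a suggested package only when it is needed is commonly done with `requireNamespace()`. The sketch below assumes a hypothetical function `plot_network()` in your package that uses **igraph** as a stand-in for a heavy package listed in `Suggests`. The helper name `check_optional()` is also made up for this example.

```r
# Stop with an informative message if an optional (suggested) package is missing.
# `check_optional()` is a hypothetical helper name.
check_optional = function(pkg) {
    if (!requireNamespace(pkg, quietly = TRUE)) {
        stop("Package '", pkg, "' is needed for this function. Please install it.",
            call. = FALSE)
    }
    invisible(TRUE)
}

# Hypothetical function of your package; igraph is listed in Suggests, not Imports.
plot_network = function(x) {
    check_optional("igraph")
    g = igraph::graph_from_data_frame(x)   # called with `::`, so no import is needed
    plot(g)
}
```

With this pattern, the heavy package is only loaded when `plot_network()` is actually called, and users who never call it do not need to install it.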

The **pkgndep** package checks the heaviness of the dependency packages
of your package. For each package listed in the `Depends`, `Imports`,
`LinkingTo` and `Suggests`/`Enhances` fields of the DESCRIPTION file,
**pkgndep** checks how many additional packages your package requires. The
dependency summary is visualized as a customized heatmap.

As an example, I am developing a package called
[**cola**](https://github.com/jokergoo/cola/) which depends on [a lot of other
packages](https://github.com/jokergoo/cola/blob/master/DESCRIPTION).
The dependency heatmap looks as follows (please drag the figure to a new tab to see it at its actual size):

```{r, echo = FALSE}
library(pkgndep)
x = readRDS(system.file("extdata", "cola_dep.rds", package = "pkgndep"))
pdf(NULL)
size = dependency_heatmap(x, help = FALSE)
invisible(dev.off())
```

```{r, fig.width = size[1], fig.height = size[2], out.width = "1000px", echo = FALSE}
dependency_heatmap(x)
```

In the heatmap, rows are the packages listed in the `Depends`, `Imports` and
`Suggests` fields, and columns are the additional dependency packages required by
each row package. The barplots on the right show the number of required
packages, the number of imported functions/methods/classes (parsed from the
NAMESPACE file) and the quantitative measure "heaviness" (the definition of
heaviness will be introduced later).

We can see that if all the packages are put in the `Depends` or `Imports` field
(i.e. moving all suggested packages to `Imports`), `r x$n_by_all`
packages are required in total, which is really a lot. In fact, some of the heavy
packages such as **WGCNA**, **clusterProfiler** and **ReactomePA** (the last
three packages in the heatmap rows) are not used very frequently in **cola**.
Moving them to the `Suggests` field and using them only when they are needed
greatly reduces the dependencies of **cola**. The number of required
packages is then reduced to only `r x$n_by_strong`.

## Usage

To use this package:

```r
library(pkgndep)
pkg = pkgndep("package-name")  # if the package is already installed
dependency_heatmap(pkg)
```

or

```r
pkg = pkgndep("path-of-the-package")  # if the package has not been installed yet
dependency_heatmap(pkg)
```

The argument of `pkgndep()` can be: 1. the name of a CRAN/Bioconductor package, 2. the name of an installed package, 3. the path of a local package, or 4. the URL of a GitHub repository.

Executable examples:

```{r}
library(pkgndep)
pkg = pkgndep("ComplexHeatmap")
pkg
```

`pkgndep()` first needs to retrieve package databases from both remote repositories and local libraries, as
the messages from the code above show. This only happens once; the database is saved internally
and reused.

We can directly use the `dependency_heatmap()` function to create the dependency heatmap:

```{r, echo = FALSE}
pdf(NULL)
size = dependency_heatmap(pkg, help = FALSE)
invisible(dev.off())
```

```{r, fig.width = size[1], fig.height = size[2], out.width = "1000px"}
dependency_heatmap(pkg)
```

You can set the `file` argument to save the heatmap directly to an image file, in which case the figure
size is calculated automatically. Supported image formats are `png`/`jpg`/`svg`/`pdf`.

```{r, eval = FALSE}
dependency_heatmap(pkg, file = "test.png")
```

The `heaviness_report()` function generates an HTML report of the dependency heaviness analysis for the package.

```{r, eval = FALSE}
heaviness_report(pkg)
```

## Heaviness

The heaviness of a package dependency can be measured quantitatively. **pkgndep** provides two measures: the absolute
measure and the relative measure.

The heaviness of a dependency package is calculated as follows. If package _B_
is in the `Depends`/`Imports`/`LinkingTo` fields of package _A_, i.e. package _B_
is directly required by package _A_, denote `v1` as the total number of required packages of package _A_,
and `v2` as the total number of required packages if package
_B_ is moved to `Suggests` in package _A_ (i.e. _B_ is no longer required to be installed for package _A_). The
absolute measure of heaviness is simply `v1 - v2` and the relative measure is `(v1 + a)/(v2 + a)`, where `a` is a small constant, e.g. 10.
So the absolute heaviness of package _B_ on package _A_ is the number of additional packages
that package _B_ uniquely brings in.

In the second scenario, package _B_ is in the `Suggests`/`Enhances` fields of package
_A_. Now `v2` is the total number of required packages if package _B_ is moved to `Imports` in package _A_;
the absolute measure of heaviness is `v2 - v1` and the relative measure is `(v2 + a)/(v1 + a)`.
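
To make the definition concrete, here is a small self-contained sketch of the first scenario on a toy dependency graph. The package names, the helpers `all_required()` and `toy_heaviness()`, and the assumption that `v1` and `v2` count every package reachable from the direct dependencies (including the direct ones themselves) are all made up for illustration. The real calculation in **pkgndep** may differ in its details.

```r
# Toy dependency graph: each element lists the packages directly required
# (Depends/Imports/LinkingTo) by the named package. All names are made up.
deps = list(
    A = c("B", "C"),
    B = c("D", "E", "F"),
    C = c("D"),
    D = character(0),
    E = character(0),
    F = character(0)
)

# All packages required by a set of direct dependencies, found recursively.
all_required = function(direct, deps) {
    found = character(0)
    queue = direct
    while (length(queue) > 0) {
        p = queue[1]
        queue = queue[-1]
        if (p %in% found) next          # already visited
        found = c(found, p)
        queue = c(queue, deps[[p]])     # add the dependencies of p
    }
    found
}

# Heaviness of `b` on a package whose direct dependencies are `direct`.
toy_heaviness = function(b, direct, deps, a = 10) {
    v1 = length(all_required(direct, deps))               # b is required
    v2 = length(all_required(setdiff(direct, b), deps))   # b moved to Suggests
    c(absolute = v1 - v2, relative = (v1 + a) / (v2 + a))
}

toy_heaviness("B", deps$A, deps)
toy_heaviness("C", deps$A, deps)
```

In this toy graph, _B_ is heavier than _C_ on _A_ because _D_ is still required through the other package in either case, so only _B_ brings in packages that nothing else needs. The second scenario swaps the roles: `v1` is computed without the suggested package and `v2` with it added to the direct dependencies.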

The heaviness score can be calculated by the function `heaviness()`:

```{r}
heaviness(pkg)
heaviness(pkg, rel = TRUE)
```


## A fast version of `tools::package_dependencies()`

Package dependencies are derived from a "package database", which is normally retrieved by `available.packages()`.
In the **tools** package, there is a `package_dependencies()` function that can be used to get a list of dependency packages.
In the following example code, we retrieve the dependency packages of the **ggplot2** package.

```{r}
chooseCRANmirror(ind = 1) # choose the mirror from RStudio
db = available.packages()
```


```{r}
system.time(p1 <- tools::package_dependencies("ggplot2", db = db, recursive = TRUE)[[1]])
```

In **pkgndep**, we implement a faster version of the `package_dependencies()` function. First the database needs
to be reformatted by the `reformat_db()` function. The returned variable `db2` is a reference class object and
its method `db2$package_dependencies()` can be used to retrieve dependency packages.

```{r}
db2 = reformat_db(db)
db2
system.time(p2 <- db2$package_dependencies("ggplot2", recursive = TRUE, simplify = TRUE))
```

`p1` and `p2` are actually identical:

```{r}
identical(sort(p1), sort(p2))
```


## Session info

```{r}
sessionInfo()
```
