When developing R packages, we should try to avoid depending directly on “heavy packages”. The “heaviness” of a package is the number of additional dependency packages it brings in. If your package directly depends on a heavy package, there are several consequences: users must install all of those additional packages when they install your package, and the namespaces of those packages are loaded together with your package (the loaded namespaces are listed by sessionInfo()).
In the DESCRIPTION file of your package, the “direct dependency packages” are those listed in the Depends, Imports and LinkingTo fields. There are also “indirect dependency packages”, which are found recursively from each of the direct dependency packages. What we call “dependency packages” here is the union of the direct and indirect dependency packages.
There are also packages listed in the Suggests and Enhances fields of the DESCRIPTION file, but they are not required to be installed when your package is installed. Of course, they also have their own “indirect dependency packages”. To get rid of heavy packages that are not often used in your package, it is better to move them to the Suggests/Enhances fields and to load or install them only when they are needed.
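
Loading a suggested package only when it is needed is usually done with requireNamespace(). The following is a minimal sketch, not code from pkgndep: the helper name call_suggested() is hypothetical, and the base package stats stands in for a heavy suggested package so that the example runs on any R installation.

```r
# Hypothetical helper: call a function from a package that is only listed
# in the Suggests field of DESCRIPTION.
call_suggested = function(pkg, fun, ...) {
    # requireNamespace() loads the namespace without attaching it and
    # returns FALSE (instead of failing) when the package is not installed
    if (!requireNamespace(pkg, quietly = TRUE)) {
        stop("Package '", pkg, "' is required for this feature. ",
             "Please install it first.", call. = FALSE)
    }
    # look up the exported function by name and call it
    getExportedValue(pkg, fun)(...)
}

# 'stats' is only a stand-in for a heavy suggested package
call_suggested("stats", "median", c(1, 5, 3))
```

With this pattern the suggested package is loaded only at the moment the feature is used, so users who never call the feature never need to install it.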
The pkgndep package checks the heaviness of the dependency packages of your package. For each package listed in the Depends, Imports, LinkingTo and Suggests/Enhances fields of the DESCRIPTION file, pkgndep checks how many additional packages your package requires. The dependency summary is visualized as a customized heatmap.
As an example, I am developing a package called cola which depends on a lot of other packages. The dependency heatmap looks as follows (please drag the figure to a new tab to see it in its actual size):
## The best device size to visualize the complete plot is 38.91 x 11.82 (in inches),
## or use `plot(obj, fix_size = FALSE)` so that heatmap cells are not in fixed sizes.
In the heatmap, rows are the packages listed in the Depends, Imports and Suggests fields, and columns are the additional dependency packages required by each row package. The barplots on the right show the number of required packages, the number of imported functions/methods/classes (parsed from the NAMESPACE file) and the quantitative measure “heaviness” (the definition of heaviness is introduced later).
We can see that if all the packages are put in the Depends or Imports field (i.e. moving all suggested packages to Imports), 257 packages are required in total, which is really a lot. Actually, some of the heavy packages such as WGCNA, clusterProfiler and ReactomePA (the last three packages in the heatmap rows) are not used very frequently in cola. Moving them to the Suggests field and using them only when they are needed greatly reduces the dependencies of cola: the number of required packages drops to only 65.
To use this package:
library(pkgndep)
pkg = pkgndep("package-name") # if the package is already installed
dependency_heatmap(pkg)
or
pkg = pkgndep("path-of-the-package") # if the package has not been installed yet
dependency_heatmap(pkg)
The argument of pkgndep() should be one of: 1. the name of a CRAN/Bioconductor package, 2. the name of an installed package, 3. the path of a local package, 4. the URL of a GitHub repository.
Executable examples:
## retrieve package database from CRAN/Bioconductor (3.20)...
## - 25066 remote packages on CRAN/Bioconductor.
## - 191 packages installed locally.
## prepare dependency table...
## prepare reverse dependency table...
## 'ComplexHeatmap', version 2.23.0
## - 31 packages are required for installing 'ComplexHeatmap'.
## - 131 packages are required if installing packages listed in all fields in DESCRIPTION.
pkgndep() first needs to retrieve the package databases from both remote repositories and local libraries, as you can see from the messages in the output above. This only happens once; the database is saved internally and reused.
We can directly use the dependency_heatmap() function to create the dependency heatmap:
## The best device size to visualize the complete plot is 21.36 x 8.54 (in inches),
## or use `plot(obj, fix_size = FALSE)` so that heatmap cells are not in fixed sizes.
You can set the file argument to save the image directly to a file, where the figure size is calculated automatically. Supported image formats are png/jpg/svg/pdf.
The heaviness_report() function generates an HTML report of the dependency heaviness analysis for the package.
The heaviness of package dependency can be measured quantitatively. pkgndep provides two measures: the absolute measure and the relative measure.
The heaviness of a dependency package is calculated as follows. If package B is in the Depends/Imports/LinkingTo fields of package A, which means that package B is directly required by package A, denote v1 as the total number of required packages for package A, and denote v2 as the total number of required packages if package B is moved to Suggests in package A (which means that B is no longer enforced to be installed for package A). The absolute measure of heaviness is simply v1 - v2 and the relative measure is (v1 + a)/(v2 + a), where a is a small constant, e.g. 10. So the absolute heaviness of package B on package A is the number of additional packages that package B uniquely brings in.
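
To make "uniquely brings in" concrete, here is a small sketch on a made-up dependency graph. The package names A to E, the dep_table list and the all_dependencies() helper are all hypothetical and only illustrate the definition; they are not how pkgndep computes the values.

```r
# Toy dependency table (made-up packages): each entry lists the direct
# strong dependencies of one package.
dep_table = list(
    A = c("B", "C"),
    B = c("D", "E"),
    C = "D",
    D = character(0),
    E = character(0)
)

# Collect direct and indirect dependencies, starting from the direct ones.
all_dependencies = function(direct, dep_table) {
    seen = character(0)
    queue = direct
    while (length(queue) > 0) {
        current = queue[1]
        queue = queue[-1]
        if (current %in% seen) next          # already visited
        seen = c(seen, current)
        queue = c(queue, dep_table[[current]])
    }
    seen
}

a = 10
# v1: all packages required by A (B, C, D, E)
v1 = length(all_dependencies(dep_table$A, dep_table))
# v2: packages still required by A when B is no longer a strong dependency (C, D)
v2 = length(all_dependencies(setdiff(dep_table$A, "B"), dep_table))
absolute = v1 - v2
relative = (v1 + a)/(v2 + a)
absolute
relative
```

In this toy graph B has an absolute heaviness of 2: it uniquely brings in B itself and E, while D is still required through C.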
In the second scenario, if package B is in the Suggests/Enhances fields of package A, v2 is now the total number of required packages if package B is moved to Imports in package A. The absolute measure of heaviness is v2 - v1 and the relative measure is (v2 + a)/(v1 + a).
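
Both scenarios can be checked by hand against numbers reported in this document for ComplexHeatmap, which requires v1 = 31 packages. In the sketch below the function names are hypothetical, and the v2 values 27 and 62 are not pkgndep output: they are derived here from v1 and the absolute heaviness values that heaviness() reports for doParallel (4) and dendextend (31).

```r
# Scenario 1: B is in Depends/Imports/LinkingTo of A
heaviness_strong = function(v1, v2, a = 10) {
    c(absolute = v1 - v2, relative = (v1 + a)/(v2 + a))
}

# Scenario 2: B is in Suggests/Enhances of A
heaviness_weak = function(v1, v2, a = 10) {
    c(absolute = v2 - v1, relative = (v2 + a)/(v1 + a))
}

v1 = 31  # packages required for installing ComplexHeatmap

# doParallel: v2 = 27 is derived as 31 - 4 (an assumption of this sketch)
heaviness_strong(v1, v2 = 27)

# dendextend: v2 = 62 is derived as 31 + 31 (an assumption of this sketch)
heaviness_weak(v1, v2 = 62)
```

The relative values come out as 41/37 and 72/41, which agree with the 1.108108 and 1.756098 that heaviness() reports for these two packages.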
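The heaviness score can be calculated by the function heaviness(). The two outputs below are the absolute and the relative measures for the dependency packages of ComplexHeatmap: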
##       grDevices        graphics            grid           stats         methods
##               0               0               0               0               0
##    RColorBrewer             png     matrixStats       codetools          digest
##               1               1               1               0               1
##         foreach      colorspace   GlobalOptions            clue      doParallel
##               0               0               0               2               4
##      GetoptLong        circlize         IRanges        dendsort            jpeg
##               3               2               5               1               1
##            tiff     fastcluster           Cairo    gridGraphics            glue
##               1               1               1               1               1
##        markdown        grImport          magick          gplots           knitr
##               3               2               4               5               5
##       grImport2        pheatmap        gridtext   GenomicRanges        testthat
##               4              12              16              14              22
##       rmarkdown      dendextend EnrichedHeatmap
##              25              31              19
##       grDevices        graphics            grid           stats         methods
##        1.000000        1.000000        1.000000        1.000000        1.000000
##    RColorBrewer             png     matrixStats       codetools          digest
##        1.025000        1.025000        1.025000        1.000000        1.025000
##         foreach      colorspace   GlobalOptions            clue      doParallel
##        1.000000        1.000000        1.000000        1.051282        1.108108
##      GetoptLong        circlize         IRanges        dendsort            jpeg
##        1.078947        1.051282        1.138889        1.024390        1.024390
##            tiff     fastcluster           Cairo    gridGraphics            glue
##        1.024390        1.024390        1.024390        1.024390        1.024390
##        markdown        grImport          magick          gplots           knitr
##        1.073171        1.048780        1.097561        1.121951        1.121951
##       grImport2        pheatmap        gridtext   GenomicRanges        testthat
##        1.097561        1.292683        1.390244        1.341463        1.536585
##       rmarkdown      dendextend EnrichedHeatmap
##        1.609756        1.756098        1.463415
tools::package_dependencies()
The package dependencies are based on a “package database”, which is normally retrieved by available.packages(). In the tools package, there is a package_dependencies() function that can be used to get a list of dependency packages. In the following example code, we retrieve the dependency packages of the ggplot2 package.
## user system elapsed
## 0.217 0.000 0.217
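
To see what tools::package_dependencies() returns without downloading a real package database, the sketch below uses a made-up database of four packages. The matrix db and the package names A to D are hypothetical; only the column names mirror those of the matrix returned by available.packages().

```r
# Toy package database with the columns read for strong dependencies.
db = rbind(
    c(Package = "A", Depends = NA,              Imports = "B, C", LinkingTo = NA),
    c(Package = "B", Depends = NA,              Imports = "D",    LinkingTo = NA),
    c(Package = "C", Depends = "R (>= 3.5.0), D", Imports = NA,   LinkingTo = NA),
    c(Package = "D", Depends = NA,              Imports = NA,     LinkingTo = NA)
)
rownames(db) = db[, "Package"]

fields = c("Depends", "Imports", "LinkingTo")

# direct dependencies only
direct = tools::package_dependencies("A", db = db, which = fields)
# direct and indirect dependencies
recursive = tools::package_dependencies("A", db = db, which = fields,
                                        recursive = TRUE)
direct
recursive
```

Both results are lists named by the queried package. The recursive call additionally follows the dependencies of B and C, so D is included as well.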
In pkgndep, we implement a faster version of the package_dependencies() function. First, the database needs to be reformatted by the reformat_db() function. The returned variable db2 is a reference class object, and its method db2$package_dependencies() can be used to retrieve dependency packages.
## prepare dependency table...
## prepare reverse dependency table...
## A package database of 21681 packages.
## - 21681 CRAN / 0 Bioconductor / 0 other packages.
## user system elapsed
## 0.002 0.000 0.003
p1 and p2 are actually identical:
## [1] TRUE
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] grid stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] ComplexHeatmap_2.23.0 pkgndep_1.99.1 knitr_1.49
## [4] rmarkdown_2.29
##
## loaded via a namespace (and not attached):
## [1] jsonlite_1.8.9 compiler_4.4.2 rjson_0.2.23
## [4] crayon_1.5.3 parallel_4.4.2 cluster_2.1.6
## [7] jquerylib_0.1.4 IRanges_2.41.1 png_0.1-8
## [10] yaml_2.3.10 fastmap_1.2.0 R6_2.5.1
## [13] generics_0.1.3 shape_1.4.6.1 BiocGenerics_0.53.3
## [16] iterators_1.0.14 GetoptLong_1.1.0 circlize_0.4.16
## [19] maketools_1.3.1 RColorBrewer_1.1-3 bslib_0.8.0
## [22] rlang_1.1.4 cachem_1.1.0 xfun_0.49
## [25] sass_0.4.9 sys_3.4.3 GlobalOptions_0.1.2
## [28] doParallel_1.0.17 cli_3.6.3 digest_0.6.37
## [31] foreach_1.5.2 clue_0.3-66 lifecycle_1.0.4
## [34] S4Vectors_0.45.2 evaluate_1.0.1 codetools_0.2-20
## [37] buildtools_1.0.0 hash_2.2.6.3 stats4_4.4.2
## [40] colorspace_2.1-1 BiocVersion_3.21.1 matrixStats_1.4.1
## [43] tools_4.4.2 htmltools_0.5.8.1