Title: | Analyze Dependency Heaviness of R Packages |
---|---|
Description: | A new metric named 'dependency heaviness' is proposed that measures the number of additional dependency packages that a parent package brings to its child package and are unique to the dependency packages imported by all other parents. The dependency heaviness analysis is visualized by a customized heatmap. The package is described in <doi:10.1093/bioinformatics/btac449>. We have also performed the dependency heaviness analysis on the CRAN/Bioconductor package ecosystem and the results are implemented as a web-based database which provides comprehensive tools for querying dependencies of individual R packages. The systematic analysis on the CRAN/Bioconductor ecosystem is described in <doi:10.1016/j.jss.2023.111610>. From 'pkgndep' version 2.0.0, the heaviness database includes snapshots of the CRAN/Bioconductor ecosystems for many old R versions. |
Authors: | Zuguang Gu [aut, cre] |
Maintainer: | Zuguang Gu <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.99.1 |
Built: | 2024-11-21 04:37:03 UTC |
Source: | https://github.com/jokergoo/pkgndep |
All Bioconductor releases
ALL_BIOC_RELEASES
A data frame
ALL_BIOC_RELEASES
The complete table of dependency heaviness for all CRAN/Bioconductor packages
all_pkg_stat_snapshot()
The returned data frame comes directly from load_pkg_stat_snapshot, but with only a subset of the columns of heaviness metrics.
Check whether a package is available
check_pkg(pkg, bioc = FALSE)
pkg |
The name of the package. |
bioc |
Whether it is a Bioconductor package. |
One suggestion for avoiding heavy dependencies is to move parent packages that are not frequently used to 'Suggests' and to load them only when the corresponding functions are used. The check_pkg function helps to check whether these parent packages are available; if not, it prints messages that guide users to install the corresponding packages.
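The 'Suggests' pattern described above can be sketched in base R as follows. This is only an illustration of the idea, not the implementation of check_pkg: the helper names check_available and median_if_possible are hypothetical, and the 'stats' package merely stands in for a rarely used parent package.

```r
# Hypothetical helper (not part of pkgndep): returns TRUE if `pkg` can be loaded,
# otherwise prints an installation hint and returns FALSE.
check_available = function(pkg) {
    ok = requireNamespace(pkg, quietly = TRUE)
    if(!ok) {
        message("Package '", pkg, "' is required for this function. ",
                "Please install it, e.g. with install.packages(\"", pkg, "\").")
    }
    invisible(ok)
}

# A function that only needs its 'Suggests' parent when it is actually called.
# 'stats' is used here as a stand-in for such a parent package.
median_if_possible = function(v) {
    if(!check_available("stats")) return(invisible(NULL))
    stats::median(v)
}

median_if_possible(c(1, 2, 3))
```

Calling the parent through `pkg::fun()` after the check keeps the parent out of 'Imports', so it no longer contributes to the number of required packages.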
Get child dependency for a package
child_dependency(package, fields = NULL, online = FALSE)
package |
Package name. |
fields |
Which fields in DESCRIPTION? Values should be in |
online |
Whether to use the newest package database directly from CRAN/Bioconductor or the pre-computed package database. The version of the pre-computed package database can be set via |
A data frame with child packages as well as the heaviness of the package on each of them. If snapshot is set to FALSE, heaviness on child packages is set to NA.
## Not run:
child_dependency("ComplexHeatmap")
## End(Not run)
Co-heaviness for pairs of parent packages
co_heaviness(x, rel = FALSE, a = 10, jaccard = FALSE)
x |
An object returned by |
rel |
Whether to return the absolute measure or the relative measure. |
a |
A constant added for calculating the relative measure. |
jaccard |
Whether to return the Jaccard coefficient. |
Denote a package as P and its two strong parent packages as A and B, i.e., parent packages in "Depends", "Imports" and "LinkingTo". The co-heaviness for A and B is calculated as follows.
Denote S_A as the set of reduced dependency packages when only moving A to "Suggests" of P, S_B as the set of reduced dependency packages when only moving B to "Suggests" of P, and S_AB as the set of reduced dependency packages when moving A and B together to "Suggests" of P. The co-heaviness of A and B on P is calculated as length(setdiff(S_AB, union(S_A, S_B))), which is the number of reduced packages caused only by the co-action of A and B.
Note that the co-heaviness is only calculated for parent packages in "Depends", "Imports" and "LinkingTo".
When jaccard is set to TRUE, the function returns the Jaccard coefficient. setdiff(S_AB, union(S_A, S_B)) is actually the set of dependencies imported by and only by the two parent packages A and B. Thus the Jaccard coefficient is calculated as length(setdiff(S_AB, union(S_A, S_B)))/length(S_AB).
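The set arithmetic above can be tried on a small example. The three sets below are made-up values for illustration; in practice they come from the dependency analysis of P, not from hand-written vectors.

```r
# Hypothetical sets of reduced dependency packages for a package P
# with strong parents A and B.
S_A  = c("pkg1", "pkg2")                          # only A moved to "Suggests"
S_B  = c("pkg3")                                  # only B moved to "Suggests"
S_AB = c("pkg1", "pkg2", "pkg3", "pkg4", "pkg5")  # A and B moved together

# Packages that disappear only when A and B are moved together.
co_set = setdiff(S_AB, union(S_A, S_B))

co_heaviness_AB = length(co_set)                  # absolute co-heaviness
jaccard_AB = length(co_set) / length(S_AB)        # Jaccard coefficient

co_set
co_heaviness_AB
jaccard_AB
```

Here pkg4 and pkg5 are brought in by both A and B, so removing either parent alone does not remove them; they only count towards the co-heaviness.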
## Not run:
# DESeq2 version 1.36.0; the dependencies have changed in later versions.
x = readRDS(system.file("extdata", "DESeq2_dep.rds", package = "pkgndep"))
hm = co_heaviness(x)
ComplexHeatmap::Heatmap(hm)
co_heaviness(x, jaccard = TRUE)
## End(Not run)
Database of package dependency heaviness of all R packages
dependency_database(version = pkgndep_opt$heaviness_db_version)
version |
Version of the heaviness database. See |
if(interactive()) { dependency_database() }
Make the dependency heatmap
dependency_heatmap(x, pkg_fontsize = 10*cex, title_fontsize = 12*cex, legend_fontsize = 10*cex, fix_size = !dev.interactive(), cex = 1, help = TRUE, file = NULL, res = 144)
x |
An object from |
pkg_fontsize |
Font size for the package names. |
title_fontsize |
Font size for the title. |
legend_fontsize |
Font size for the legends. |
fix_size |
Should the rows and columns in the heatmap have fixed size? |
cex |
A factor by which all font sizes are multiplied. |
help |
Whether to print help message? |
file |
A path of the figure. The size of the figure is automatically calculated. |
res |
Resolution of the figure (only for png and jpeg). |
If fix_size is set to TRUE, the size of the whole plot can be obtained by:
size = dependency_heatmap(x, fix_size = TRUE)
where size is a numeric vector of length two holding the width and height of the whole heatmap.
If the file argument is set, the size of the figure is automatically calculated.
If there are no dependency packages stored in x, NULL is returned.
A vector of two numeric values (in inches) that correspond to the width and height of the plot.
# See examples in `pkgndep()`.
HTML report for package dependency heaviness analysis
dependency_report(...)
... |
Pass to |
It is the same as heaviness_report.
Database of package dependency heaviness of all R packages
dependency_website(version = pkgndep_opt$heaviness_db_version)
version |
Version of the heaviness database. See |
if(interactive()) { dependency_website() }
Get downstream dependency for a package
downstream_dependency(package, online = FALSE)
package |
Package name. |
online |
Whether to use the newest package database directly from CRAN/Bioconductor or the pre-computed package database. The version of the pre-computed package database can be set via |
Downstream packages with relations of Depends, Imports and LinkingTo are retrieved.
A data frame with all downstream packages.
## Not run:
downstream_dependency("ComplexHeatmap")
## End(Not run)
Get functions that are imported to its child packages
get_all_functions_imported_to_children(package)
package |
Package name. |
The information is based on pre-computed results for a specific CRAN/Bioconductor snapshot. See pkgndep_opt$heaviness_db_version for how to set the version of the snapshot.
It returns a list of function names that are imported to each of its child packages.
## Not run:
get_all_functions_imported_to_children("circlize")
## End(Not run)
Gini index
gini_index(v)
v |
A numeric vector. |
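This manual does not state the formula behind gini_index. As a rough sketch only, assuming the textbook Gini index for non-negative values, the calculation could look like the following. The function name gini_sketch is hypothetical, and the package's own function may use a different normalization.

```r
# Textbook Gini index for a non-negative numeric vector (an assumption;
# gini_index() in pkgndep may be defined differently).
gini_sketch = function(v) {
    stopifnot(is.numeric(v), all(v >= 0), sum(v) > 0)
    v = sort(v)
    n = length(v)
    # G = 2 * sum(i * v_(i)) / (n * sum(v)) - (n + 1) / n, with v sorted ascending
    2 * sum(seq_len(n) * v) / (n * sum(v)) - (n + 1) / n
}

gini_sketch(c(1, 1, 1, 1))    # heaviness spread evenly over all parents
gini_sketch(c(0, 0, 0, 10))   # heaviness concentrated on one parent
```

Under this definition, a value close to zero means the heaviness is spread evenly across parents, while larger values mean a few parents contribute most of it.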
x = readRDS(system.file("extdata", "ComplexHeatmap_dep.rds", package = "pkgndep"))
gini_index(x$heaviness[x$which_required])
Heaviness from parent packages
heaviness(x, rel = FALSE, a = 10, only_strong_dep = FALSE)
x |
An object returned by |
rel |
Whether to return the absolute measure or the relative measure. |
a |
A constant added for calculating the relative measure. |
only_strong_dep |
Whether to only return the heaviness for strong parents. |
The heaviness from a parent package is calculated as follows. If package B is in the Depends/Imports/LinkingTo fields of package A, which means package B is necessary for package A, denote v1 as the total number of packages required for package A, and v2 as the total number of required packages if moving package B to Suggests (which means B is no longer necessary for A). The absolute measure is simply v1 - v2 and the relative measure is (v1 + a)/(v2 + a).
In the second scenario, if B is in the Suggests/Enhances fields of package A, v2 is now the total number of required packages if moving B to Imports. The absolute measure is v2 - v1 and the relative measure is (v2 + a)/(v1 + a).
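The two scenarios can be worked through with made-up numbers. The package counts below are hypothetical; only the formulas and the default a = 10 are taken from the description above.

```r
a = 10  # constant for the relative measure (default of argument `a`)

# Scenario 1: B is a strong parent (Depends/Imports/LinkingTo) of A.
v1_strong = 30   # hypothetical: packages required for A
v2_strong = 18   # hypothetical: packages required after moving B to Suggests
abs_strong = v1_strong - v2_strong
rel_strong = (v1_strong + a) / (v2_strong + a)

# Scenario 2: B is a weak parent (Suggests/Enhances) of A.
v1_weak = 30     # hypothetical: packages required for A
v2_weak = 45     # hypothetical: packages required after moving B to Imports
abs_weak = v2_weak - v1_weak
rel_weak = (v2_weak + a) / (v1_weak + a)

c(abs_strong = abs_strong, rel_strong = rel_strong,
  abs_weak = abs_weak, rel_weak = rel_weak)
```

The constant a dampens the relative measure when both counts are small, so a parent that adds 2 packages to a package requiring 1 does not look tripled in weight.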
A numeric vector.
x = readRDS(system.file("extdata", "ComplexHeatmap_dep.rds", package = "pkgndep"))
heaviness(x)
heaviness(x, rel = TRUE)
Database of package dependency heaviness of all R packages
heaviness_database(version = pkgndep_opt$heaviness_db_version)
version |
Version of the heaviness database. See |
if(interactive()) { heaviness_database() }
Heaviness from all upstream packages
heaviness_from_upstream(package)
package |
A package name. |
It is calculated based on a specific CRAN/Bioconductor snapshot. The version is set via pkgndep_opt$heaviness_db_version.
A named vector.
Heaviness on all child packages
heaviness_on_children(package, add_values_attr = FALSE, total = FALSE)
package |
A package name. |
add_values_attr |
Whether to include "values" attribute? Internally used. |
total |
Whether to return the total heaviness? |
It is calculated based on a specific CRAN/Bioconductor snapshot. The version is set via pkgndep_opt$heaviness_db_version.
The value is the mean heaviness of the package on all its child packages.
## Not run:
heaviness_on_children("ComplexHeatmap")
## End(Not run)
Heaviness on all downstream packages
heaviness_on_downstream(package, add_values_attr = FALSE, via = NULL, total = FALSE, internal = FALSE)
package |
A package name. |
add_values_attr |
Whether to include "values" attribute? Internally used. |
via |
Whether to only consider downstream packages via an intermediate package? |
total |
Whether to return the total heaviness? |
internal |
Whether to use internally calculated heaviness? |
It is calculated based on a specific CRAN/Bioconductor snapshot. The version is set via pkgndep_opt$heaviness_db_version.
The value is the mean heaviness of the package on all its downstream packages. Denote n as the number of all its downstream packages, k_i as the number of required packages for downstream package i, and v_1 as the total number of required packages over all downstream packages, i.e. v_1 = sum_{i=1}^{n} k_i. Denote p_i as the number of required packages for downstream package i if the package is moved to Suggests, and v_2 as the corresponding total number of required packages, i.e. v_2 = sum_{i=1}^{n} p_i. The final heaviness on downstream packages is (v_1 - v_2)/n.
Note that the interaction from the package to its downstream packages may go through several intermediate packages, which means the reduction of required packages for a downstream package might be a joint effect of all its upstream packages. Thus, to properly calculate the heaviness of a package on its downstream packages, we first make a copy of the package database and move the package to Suggests for all packages that depend on it. Then, for all downstream packages of the package, the dependency analysis by pkgndep is redone with the modified package database. Finally, the heaviness on downstream packages is collected and the mean heaviness is calculated.
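The final averaging step, (v_1 - v_2)/n, can be illustrated with made-up counts. The downstream package names and numbers below are hypothetical; the real values come from redoing the dependency analysis on the modified package database.

```r
# Hypothetical numbers of required packages for three downstream packages.
k = c(pkgX = 40, pkgY = 25, pkgZ = 12)  # with the original package database
p = c(pkgX = 31, pkgY = 25, pkgZ = 9)   # after moving the package to Suggests

n   = length(k)
v_1 = sum(k)
v_2 = sum(p)

mean_heaviness = (v_1 - v_2) / n

# The same value is the mean of the per-package reductions.
per_package = k - p
mean(per_package)
mean_heaviness
```

In this example pkgY is unaffected (its reduction is 0), which is what happens when a downstream package still reaches the same dependencies through other upstream packages.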
## Not run:
heaviness_on_downstream("ComplexHeatmap")
## End(Not run)
HTML report for package dependency heaviness analysis
heaviness_report(pkg, file = NULL)
pkg |
An object from |
file |
The path of the html file. If it is not specified, the report will be automatically opened in the web browser. |
The path of the HTML file of the report.
if(interactive()) {
    x = readRDS(system.file("extdata", "ComplexHeatmap_dep.rds", package = "pkgndep"))
    heaviness_report(x)
}
Test the parent-child relationship
is_parent(parent, child, ...)
parent |
A vector of package names. |
child |
A single package name. |
... |
Pass to |
A logical vector.
Test upstream-downstream relationship
is_upstream(upstream, package, ...)
upstream |
A vector of package names. |
package |
A single package name. |
... |
Pass to |
A logical vector.
Load dependency analysis results of all packages
load_all_pkg_dep(hash = TRUE)
hash |
Whether to convert the named list to a hash table by |
It is calculated based on a specific CRAN/Bioconductor snapshot. The version is set via pkgndep_opt$heaviness_db_version.
A list (as a hash table) of pkgndep objects where each element corresponds to the analysis on one package.
## Not run:
lt = load_all_pkg_dep()
length(lt)
head(names(lt))
lt[["ggplot2"]]
## End(Not run)
Load pre-computed objects
load_from_heaviness_db(file)
file |
File name. |
The path of the file can be set via pkgndep_opt$db_file_template.
Internally used.
Load heaviness statistics at all time points
load_heaviness_timeline()
Used internally.
A list of data frames.
Load package database
load_pkg_db(lib = NULL, online = TRUE, db = NULL, verbose = TRUE)
lib |
Local library path. If the value is |
online |
If the value is |
db |
A pre-computed |
verbose |
Whether to print messages. |
It loads the package database from CRAN/Bioconductor and locally installed packages.
The database object is cached internally for repeated use by other functions in this package.
A pkg_db class object. See reformat_db for how to use the pkg_db object.
## Not run:
pkg_db = load_pkg_db(lib = NA)
pkg_db
## End(Not run)
Load DESCRIPTION files of all packages
load_pkg_description()
It is calculated based on a specific CRAN/Bioconductor snapshot. The version is set via pkgndep_opt$heaviness_db_version.
A list of character vectors.
## Not run:
lt = load_pkg_description()
lt[1:2]
## End(Not run)
Load downstream dependency paths for all packages
load_pkg_downstream_dependency_path_snapshot()
It is calculated based on a specific CRAN/Bioconductor snapshot. The version is set via pkgndep_opt$heaviness_db_version.
A list.
## Not run:
downstream_path_list = load_pkg_downstream_dependency_path_snapshot()
downstream_path_list[["ComplexHeatmap"]]
## End(Not run)
Load NAMESPACE files of all packages
load_pkg_namespace()
It is calculated based on a specific CRAN/Bioconductor snapshot. The version is set via pkgndep_opt$heaviness_db_version.
A list of character vectors.
## Not run:
lt = load_pkg_namespace()
lt[1:2]
## End(Not run)
Load all package dependency statistics
load_pkg_stat_snapshot()
It is calculated based on a specific CRAN/Bioconductor snapshot. The version is set via pkgndep_opt$heaviness_db_version.
A data frame.
## Not run:
df = load_pkg_stat_snapshot()
head(df)
## End(Not run)
Loaded packages
loaded_packages(pkg, verbose = TRUE)
pkg |
A package name. |
verbose |
Whether to print messages. |
It loads pkg into a new R session and collects which other packages are loaded by parsing the output from sessionInfo.
A data frame.
loaded_packages("ComplexHeatmap")
Get parent dependency for a package
parent_dependency(package, fields = NULL, online = FALSE)
package |
Package name. |
fields |
Which fields in DESCRIPTION? Values should be in |
online |
Whether to use the newest package database directly from CRAN/Bioconductor or the pre-computed package database. The version of the pre-computed package database can be set via |
A data frame with parent packages as well as their heaviness on package. If snapshot is set to FALSE, heaviness on child packages is set to NA.
## Not run:
parent_dependency("ComplexHeatmap")
## End(Not run)
Package dependency analysis
pkgndep(package, verbose = TRUE, online = TRUE, load = FALSE, parse_namespace = TRUE)
package |
Package name. The value can be 1. a CRAN/Bioconductor package, 2. an installed package, 3. a path of a local package, 4. URL of a GitHub repository. |
verbose |
Whether to show messages. |
online |
If the value is |
load |
If the value is |
parse_namespace |
Whether to also parse the NAMESPACE file. It is only used internally. |
A pkgndep object.
## Not run:
x = pkgndep("ComplexHeatmap")
## End(Not run)

# The `x` variable generated by `pkgndep()` is already saved in this package.
x = readRDS(system.file("extdata", "ComplexHeatmap_dep.rds", package = "pkgndep"))
x
dependency_heatmap(x)
Global parameters for pkgndep
pkgndep_opt(..., RESET = FALSE, READ.ONLY = NULL, LOCAL = FALSE, ADD = FALSE)
... |
Arguments for the parameters, see "details" section |
RESET |
Reset to default values. |
READ.ONLY |
Please ignore. |
LOCAL |
Please ignore. |
ADD |
Please ignore. |
There are the following parameters:
bioc_version
The Bioconductor version. By default it is the version corresponding to the R version in use. Please note that this option is only for switching between the Bioconductor release version and the development version, not for switching to very old Bioconductor versions.
heaviness_db_version
The version of the heaviness database. The value can be the corresponding Bioconductor version, the R version or the corresponding date of the Bioconductor release. All supported values are in the object ALL_BIOC_RELEASES.
pkgndep_opt
Make the dependency heatmap
## S3 method for class 'pkgndep'
plot(x, ...)
x |
An object from |
... |
Other arguments. |
Please use dependency_heatmap instead.
Print method
## S3 method for class 'pkgndep'
print(x, ...)
x |
An object from |
... |
Other arguments. |
No value is returned.
# See examples in `pkgndep()`.
Format the package database
reformat_db(db, version = NULL)
db |
A data frame returned from |
version |
Version of the database, a self-defined text. |
It reformats the data frame of the package database into a pkg_db class object.
A pkg_db class object. There are the following methods:
pkg_db$get_meta(package, field = NULL)
field can take values in "Package", "Version" and "Repository".
pkg_db$get_dependency_table(package)
Get the dependency table.
pkg_db$get_rev_dependency_table(package)
Get the reverse dependency table.
pkg_db$package_dependencies(package, recursive = FALSE, reverse = FALSE, which = "strong", simplify = FALSE)
All the arguments are the same as in package_dependencies. Argument simplify controls whether to return a data frame or a simplified vector.
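To show what the recursive and reverse arguments mean, here is a small sketch that uses tools::package_dependencies from base R on a made-up package database. It assumes that the package_dependencies mentioned above refers to that function; the four packages A to D are hypothetical, and the matrix only imitates the column layout returned by available.packages().

```r
# A tiny hypothetical package database: A imports B and C, B imports D.
db = matrix(c(
    "A", NA, "B, C", NA,
    "B", NA, "D",    NA,
    "C", NA, NA,     NA,
    "D", NA, NA,     NA
), ncol = 4, byrow = TRUE,
   dimnames = list(NULL, c("Package", "Depends", "Imports", "LinkingTo")))

# The "strong" dependency fields, spelled out explicitly.
strong = c("Depends", "Imports", "LinkingTo")

# Direct parents of A.
direct = tools::package_dependencies("A", db = db, which = strong)

# All upstream packages of A.
all_up = tools::package_dependencies("A", db = db, which = strong,
                                     recursive = TRUE)

# Direct children of D (reverse dependencies).
rev_D = tools::package_dependencies("D", db = db, which = strong,
                                    reverse = TRUE)

direct$A
all_up$A
rev_D$D
```

The result is a named list with one character vector per queried package, which is the form that the simplify argument of the pkg_db method is described as reducing to a vector.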
## Not run:
db = available.packages()
db2 = reformat_db(db)

# a pkg_db object generated on 2021-10-28 can be loaded by load_pkg_db()
db2 = load_pkg_db(online = FALSE)
db2
db2$get_meta("ComplexHeatmap")
db2$get_dependency_table("ComplexHeatmap")
db2$get_rev_dependency_table("ComplexHeatmap")
db2$package_dependencies("ComplexHeatmap")
db2$package_dependencies("ComplexHeatmap", recursive = TRUE)
## End(Not run)
Required dependency packages
required_dependency_packages(x, all = FALSE)
x |
An object from |
all |
Whether to include the packages required if also including packages from "Suggests"/"Enhances" field. |
The function returns all upstream packages.
A vector of package names.
## Not run:
x = readRDS(system.file("extdata", "ComplexHeatmap_dep.rds", package = "pkgndep"))
required_dependency_packages(x)
## End(Not run)
Get upstream dependency for a package
upstream_dependency(package, online = FALSE)
package |
Package name. |
online |
Whether to use the newest package database directly from CRAN/Bioconductor or the pre-computed package database. The version of the pre-computed package database can be set via |
Upstream packages with relations of "Depends", "Imports" and "LinkingTo" are retrieved.
A data frame with all upstream packages.
## Not run:
upstream_dependency("ComplexHeatmap")
## End(Not run)