vignettes/guides/bioconductor_integration.Rmd
bioconductor_integration.Rmd
plotgardener
is designed to be flexibly compatible with
typical Bioconductor classes and libraries of genomic data to easily
integrate genomic data analysis and visualization. In addition to
handling various genomic file types and R
objects, many
plotgardener
functions can also handle GRanges
or GInteractions
objects as input data. Furthermore,
plotgardener
does not hard-code any genomic assemblies and
can utilize TxDb
, OrgDb
, and
BSgenome
objects for various genomic annotations, including
gene and transcript structures and names, chromosome sizes, and
nucleotide sequences. Furthermore, all cytoband data for ideogram plots
is retrieved from the UCSC Genome Browser through
AnnotationHub
. For standard genomic assemblies (i.e. hg19,
hg38, mm10), plotgardener
uses a set of default packages
that can be displayed by calling defaultPackages()
:
defaultPackages("hg38")
#> 'data.frame': 1 obs. of 6 variables:
#> $ Genome : chr "hg38"
#> $ TxDb : chr "TxDb.Hsapiens.UCSC.hg38.knownGene"
#> $ OrgDb : chr "org.Hs.eg.db"
#> $ gene.id.column: chr "ENTREZID"
#> $ display.column: chr "SYMBOL"
#> $ BSgenome : chr "BSgenome.Hsapiens.UCSC.hg38"
defaultPackages("hg19")
#> 'data.frame': 1 obs. of 6 variables:
#> $ Genome : chr "hg19"
#> $ TxDb : chr "TxDb.Hsapiens.UCSC.hg19.knownGene"
#> $ OrgDb : chr "org.Hs.eg.db"
#> $ gene.id.column: chr "ENTREZID"
#> $ display.column: chr "SYMBOL"
#> $ BSgenome : chr "BSgenome.Hsapiens.UCSC.hg19"
defaultPackages("mm10")
#> 'data.frame': 1 obs. of 6 variables:
#> $ Genome : chr "mm10"
#> $ TxDb : chr "TxDb.Mmusculus.UCSC.mm10.knownGene"
#> $ OrgDb : chr "org.Mm.eg.db"
#> $ gene.id.column: chr "ENTREZID"
#> $ display.column: chr "SYMBOL"
#> $ BSgenome : chr "BSgenome.Mmusculus.UCSC.mm10"
To see which assemblies have defaults within
plotgardener
, call genomes()
:
genomes()
#> bosTau8
#> bosTau9
#> canFam3
#> ce6
#> ce11
#> danRer10
#> danRer11
#> dm3
#> dm6
#> galGal4
#> galGal5
#> galGal6
#> hg18
#> hg19
#> hg38
#> mm9
#> mm10
#> rheMac3
#> rheMac8
#> rheMac10
#> panTro5
#> panTro6
#> rn4
#> rn5
#> rn6
#> sacCer2
#> sacCer3
#> susScr3
#> susScr11
plotgardener
functions default to an “hg38” assembly,
but can be customized with any of the other genomic assemblies included
or a assembly
object. To create custom genomic assemblies
and combinations of TxDb
, orgDb
, and
BSgenome
packages for use in plotgardener
functions, we can use the assembly()
constructor. For
example, we can create our own TxDb
from the current human
Ensembl release:
library(GenomicFeatures)
TxDb.Hsapiens.Ensembl.GrCh38.103 <- makeTxDbFromEnsembl(
organism =
"Homo sapiens"
)
We can now create a new assembly
with this
TxDb
and combinations of other Bioconductor packages. The
Genome
parameter can is a string to name or describe this
assembly. Since the TxDb
is now from ENSEMBL, we will
change the gene.id
field to "ENSEMBL"
to map
gene IDs and symbols between our TxDb
and
orgDb
objects. Most gene ID types can be found by calling
AnnotationDbi::keytypes()
on an orgDb
.
Ensembl38 <- assembly(
Genome = "Ensembl.GRCh38.103",
TxDb = TxDb.Hsapiens.Ensembl.GrCh38.103,
OrgDb = "org.Hs.eg.db",
BSgenome = "BSgenome.Hsapiens.NCBI.GRCh38",
gene.id = "ENSEMBL", display.column = "SYMBOL"
)
This assembly
object can now be easily passed into
plotgardener
functions through the assembly
parameter to use custom genomic assembly configurations.
assembly
objects are especially handy for changing the
type of gene or transcript label of our gene and transcript plots. The
default display.column = "SYMBOL"
, but we could change this
value to other available keytypes
in the orgDb
we are using. For example, if we wanted to display the associated
Ensembl IDs of an “hg19” assembly
object, we would set
display.column = "ENSEMBL"
:
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
library(org.Hs.eg.db)
new_hg19 <- assembly(
Genome = "id_hg19",
TxDb = "TxDb.Hsapiens.UCSC.hg19.knownGene",
OrgDb = "org.Hs.eg.db",
gene.id.column = "ENTREZID",
display.column = "ENSEMBL"
)
pageCreate(
width = 5, height = 1.25,
showGuides = FALSE, xgrid = 0, ygrid = 0
)
genePlot <- plotGenes(
chrom = "chr2", chromstart = 1000000, chromend = 20000000,
assembly = new_hg19,
x = 0.25, y = 0.25, width = 4.75, height = 1
)
sessionInfo()
#> R version 4.3.2 (2023-10-31)
#> Platform: x86_64-apple-darwin20 (64-bit)
#> Running under: macOS Sonoma 14.2.1
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
#>
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#>
#> time zone: America/New_York
#> tzcode source: internal
#>
#> attached base packages:
#> [1] stats4 grid stats graphics grDevices utils datasets
#> [8] methods base
#>
#> other attached packages:
#> [1] org.Hs.eg.db_3.18.0
#> [2] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
#> [3] GenomicFeatures_1.54.4
#> [4] AnnotationDbi_1.64.1
#> [5] Biobase_2.62.0
#> [6] GenomicRanges_1.54.1
#> [7] GenomeInfoDb_1.38.8
#> [8] IRanges_2.36.0
#> [9] S4Vectors_0.40.2
#> [10] BiocGenerics_0.48.1
#> [11] plotgardener_1.8.2
#>
#> loaded via a namespace (and not attached):
#> [1] DBI_1.2.2 bitops_1.0-7
#> [3] biomaRt_2.58.2 rlang_1.1.3
#> [5] magrittr_2.0.3 matrixStats_1.2.0
#> [7] compiler_4.3.2 RSQLite_2.3.5
#> [9] png_0.1-8 systemfonts_1.0.6
#> [11] vctrs_0.6.5 stringr_1.5.1
#> [13] pkgconfig_2.0.3 crayon_1.5.2
#> [15] fastmap_1.1.1 dbplyr_2.5.0
#> [17] XVector_0.42.0 utf8_1.2.4
#> [19] Rsamtools_2.18.0 rmarkdown_2.26
#> [21] strawr_0.0.91 ragg_1.3.0
#> [23] purrr_1.0.2 bit_4.0.5
#> [25] xfun_0.43 zlibbioc_1.48.2
#> [27] cachem_1.0.8 jsonlite_1.8.8
#> [29] progress_1.2.3 blob_1.2.4
#> [31] highr_0.10 DelayedArray_0.28.0
#> [33] BiocParallel_1.36.0 parallel_4.3.2
#> [35] prettyunits_1.2.0 R6_2.5.1
#> [37] plyranges_1.22.0 bslib_0.7.0
#> [39] stringi_1.8.3 RColorBrewer_1.1-3
#> [41] rtracklayer_1.62.0 jquerylib_0.1.4
#> [43] Rcpp_1.0.12 SummarizedExperiment_1.32.0
#> [45] knitr_1.45 Matrix_1.6-5
#> [47] tidyselect_1.2.1 rstudioapi_0.16.0
#> [49] abind_1.4-5 yaml_2.3.8
#> [51] codetools_0.2-19 curl_5.2.1
#> [53] lattice_0.22-6 tibble_3.2.1
#> [55] withr_3.0.0 KEGGREST_1.42.0
#> [57] evaluate_0.23 gridGraphics_0.5-1
#> [59] desc_1.4.3 BiocFileCache_2.10.1
#> [61] xml2_1.3.6 Biostrings_2.70.3
#> [63] filelock_1.0.3 pillar_1.9.0
#> [65] MatrixGenerics_1.14.0 generics_0.1.3
#> [67] RCurl_1.98-1.14 hms_1.1.3
#> [69] ggplot2_3.5.0 munsell_0.5.0
#> [71] scales_1.3.0 glue_1.7.0
#> [73] tools_4.3.2 BiocIO_1.12.0
#> [75] data.table_1.15.2 GenomicAlignments_1.38.2
#> [77] fs_1.6.3 XML_3.99-0.16.1
#> [79] colorspace_2.1-0 GenomeInfoDbData_1.2.11
#> [81] restfulr_0.0.15 cli_3.6.2
#> [83] rappdirs_0.3.3 textshaping_0.3.7
#> [85] fansi_1.0.6 S4Arrays_1.2.1
#> [87] dplyr_1.1.4 gtable_0.3.4
#> [89] yulab.utils_0.1.4 sass_0.4.9
#> [91] digest_0.6.35 SparseArray_1.2.4
#> [93] ggplotify_0.1.2 rjson_0.2.21
#> [95] memoise_2.0.1 htmltools_0.5.8
#> [97] pkgdown_2.0.7 lifecycle_1.0.4
#> [99] httr_1.4.7 bit64_4.0.5