vignettes/guides/bioconductor_integration.Rmd
bioconductor_integration.Rmdplotgardener is designed to be flexibly compatible with
typical Bioconductor classes and libraries of genomic data to easily
integrate genomic data analysis and visualization. In addition to
handling various genomic file types and R objects, many
plotgardener functions can also handle GRanges
or GInteractions objects as input data. Furthermore,
plotgardener does not hard-code any genomic assemblies and
can utilize TxDb, OrgDb, and
BSgenome objects for various genomic annotations, including
gene and transcript structures and names, chromosome sizes, and
nucleotide sequences. Furthermore, all cytoband data for ideogram plots
is retrieved from the UCSC Genome Browser through
AnnotationHub. For standard genomic assemblies (i.e. hg19,
hg38, mm10), plotgardener uses a set of default packages
that can be displayed by calling defaultPackages():
defaultPackages("hg38")
#> 'data.frame': 1 obs. of 6 variables:
#> $ Genome : chr "hg38"
#> $ TxDb : chr "TxDb.Hsapiens.UCSC.hg38.knownGene"
#> $ OrgDb : chr "org.Hs.eg.db"
#> $ gene.id.column: chr "ENTREZID"
#> $ display.column: chr "SYMBOL"
#> $ BSgenome : chr "BSgenome.Hsapiens.UCSC.hg38"
defaultPackages("hg19")
#> 'data.frame': 1 obs. of 6 variables:
#> $ Genome : chr "hg19"
#> $ TxDb : chr "TxDb.Hsapiens.UCSC.hg19.knownGene"
#> $ OrgDb : chr "org.Hs.eg.db"
#> $ gene.id.column: chr "ENTREZID"
#> $ display.column: chr "SYMBOL"
#> $ BSgenome : chr "BSgenome.Hsapiens.UCSC.hg19"
defaultPackages("mm10")
#> 'data.frame': 1 obs. of 6 variables:
#> $ Genome : chr "mm10"
#> $ TxDb : chr "TxDb.Mmusculus.UCSC.mm10.knownGene"
#> $ OrgDb : chr "org.Mm.eg.db"
#> $ gene.id.column: chr "ENTREZID"
#> $ display.column: chr "SYMBOL"
#> $ BSgenome : chr "BSgenome.Mmusculus.UCSC.mm10"To see which assemblies have defaults within
plotgardener, call genomes():
genomes()
#> bosTau8
#> bosTau9
#> canFam3
#> ce6
#> ce11
#> danRer10
#> danRer11
#> dm3
#> dm6
#> galGal4
#> galGal5
#> galGal6
#> hg18
#> hg19
#> hg38
#> mm9
#> mm10
#> rheMac3
#> rheMac8
#> rheMac10
#> panTro5
#> panTro6
#> rn4
#> rn5
#> rn6
#> sacCer2
#> sacCer3
#> susScr3
#> susScr11plotgardener functions default to an “hg38” assembly,
but can be customized with any of the other genomic assemblies included
or a assembly object. To create custom genomic assemblies
and combinations of TxDb, orgDb, and
BSgenome packages for use in plotgardener
functions, we can use the assembly() constructor. For
example, we can create our own TxDb from the current human
Ensembl release:
library(GenomicFeatures)
TxDb.Hsapiens.Ensembl.GrCh38.103 <- makeTxDbFromEnsembl(
organism =
"Homo sapiens"
)We can now create a new assembly with this
TxDb and combinations of other Bioconductor packages. The
Genome parameter can is a string to name or describe this
assembly. Since the TxDb is now from ENSEMBL, we will
change the gene.id field to "ENSEMBL" to map
gene IDs and symbols between our TxDb and
orgDb objects. Most gene ID types can be found by calling
AnnotationDbi::keytypes() on an orgDb.
Ensembl38 <- assembly(
Genome = "Ensembl.GRCh38.103",
TxDb = TxDb.Hsapiens.Ensembl.GrCh38.103,
OrgDb = "org.Hs.eg.db",
BSgenome = "BSgenome.Hsapiens.NCBI.GRCh38",
gene.id = "ENSEMBL", display.column = "SYMBOL"
)This assembly object can now be easily passed into
plotgardener functions through the assembly
parameter to use custom genomic assembly configurations.
assembly objects are especially handy for changing the
type of gene or transcript label of our gene and transcript plots. The
default display.column = "SYMBOL", but we could change this
value to other available keytypes in the orgDb
we are using. For example, if we wanted to display the associated
Ensembl IDs of an “hg19” assembly object, we would set
display.column = "ENSEMBL":
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
library(org.Hs.eg.db)
new_hg19 <- assembly(
Genome = "id_hg19",
TxDb = "TxDb.Hsapiens.UCSC.hg19.knownGene",
OrgDb = "org.Hs.eg.db",
gene.id.column = "ENTREZID",
display.column = "ENSEMBL"
)
pageCreate(
width = 5, height = 1.25,
showGuides = FALSE, xgrid = 0, ygrid = 0
)
genePlot <- plotGenes(
chrom = "chr2", chromstart = 1000000, chromend = 20000000,
assembly = new_hg19,
x = 0.25, y = 0.25, width = 4.75, height = 1
)
sessionInfo()
#> R version 4.3.2 (2023-10-31)
#> Platform: x86_64-apple-darwin20 (64-bit)
#> Running under: macOS Sonoma 14.2.1
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
#>
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#>
#> time zone: America/New_York
#> tzcode source: internal
#>
#> attached base packages:
#> [1] stats4 grid stats graphics grDevices utils datasets
#> [8] methods base
#>
#> other attached packages:
#> [1] org.Hs.eg.db_3.18.0
#> [2] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
#> [3] GenomicFeatures_1.54.4
#> [4] AnnotationDbi_1.64.1
#> [5] Biobase_2.62.0
#> [6] GenomicRanges_1.54.1
#> [7] GenomeInfoDb_1.38.8
#> [8] IRanges_2.36.0
#> [9] S4Vectors_0.40.2
#> [10] BiocGenerics_0.48.1
#> [11] plotgardener_1.8.2
#>
#> loaded via a namespace (and not attached):
#> [1] DBI_1.2.2 bitops_1.0-7
#> [3] biomaRt_2.58.2 rlang_1.1.3
#> [5] magrittr_2.0.3 matrixStats_1.2.0
#> [7] compiler_4.3.2 RSQLite_2.3.5
#> [9] png_0.1-8 systemfonts_1.0.6
#> [11] vctrs_0.6.5 stringr_1.5.1
#> [13] pkgconfig_2.0.3 crayon_1.5.2
#> [15] fastmap_1.1.1 dbplyr_2.5.0
#> [17] XVector_0.42.0 utf8_1.2.4
#> [19] Rsamtools_2.18.0 rmarkdown_2.26
#> [21] strawr_0.0.91 ragg_1.3.0
#> [23] purrr_1.0.2 bit_4.0.5
#> [25] xfun_0.43 zlibbioc_1.48.2
#> [27] cachem_1.0.8 jsonlite_1.8.8
#> [29] progress_1.2.3 blob_1.2.4
#> [31] highr_0.10 DelayedArray_0.28.0
#> [33] BiocParallel_1.36.0 parallel_4.3.2
#> [35] prettyunits_1.2.0 R6_2.5.1
#> [37] plyranges_1.22.0 bslib_0.7.0
#> [39] stringi_1.8.3 RColorBrewer_1.1-3
#> [41] rtracklayer_1.62.0 jquerylib_0.1.4
#> [43] Rcpp_1.0.12 SummarizedExperiment_1.32.0
#> [45] knitr_1.45 Matrix_1.6-5
#> [47] tidyselect_1.2.1 rstudioapi_0.16.0
#> [49] abind_1.4-5 yaml_2.3.8
#> [51] codetools_0.2-19 curl_5.2.1
#> [53] lattice_0.22-6 tibble_3.2.1
#> [55] withr_3.0.0 KEGGREST_1.42.0
#> [57] evaluate_0.23 gridGraphics_0.5-1
#> [59] desc_1.4.3 BiocFileCache_2.10.1
#> [61] xml2_1.3.6 Biostrings_2.70.3
#> [63] filelock_1.0.3 pillar_1.9.0
#> [65] MatrixGenerics_1.14.0 generics_0.1.3
#> [67] RCurl_1.98-1.14 hms_1.1.3
#> [69] ggplot2_3.5.0 munsell_0.5.0
#> [71] scales_1.3.0 glue_1.7.0
#> [73] tools_4.3.2 BiocIO_1.12.0
#> [75] data.table_1.15.2 GenomicAlignments_1.38.2
#> [77] fs_1.6.3 XML_3.99-0.16.1
#> [79] colorspace_2.1-0 GenomeInfoDbData_1.2.11
#> [81] restfulr_0.0.15 cli_3.6.2
#> [83] rappdirs_0.3.3 textshaping_0.3.7
#> [85] fansi_1.0.6 S4Arrays_1.2.1
#> [87] dplyr_1.1.4 gtable_0.3.4
#> [89] yulab.utils_0.1.4 sass_0.4.9
#> [91] digest_0.6.35 SparseArray_1.2.4
#> [93] ggplotify_0.1.2 rjson_0.2.21
#> [95] memoise_2.0.1 htmltools_0.5.8
#> [97] pkgdown_2.0.7 lifecycle_1.0.4
#> [99] httr_1.4.7 bit64_4.0.5