vignettes/guides/reading_data_for_plotgardener.Rmd
plotgardener handles a wide array of genomic data types in various formats and file types. Not only does it work with data.frames, data.tables, tibbles, and Bioconductor GRanges and GInteractions objects, but it can also read in common genomic file types like BED, BEDPE, bigWig, and .hic files. While files can be read directly into plotgardener plotting functions, plotgardener also provides functions for reading in these large genomic data sets to work with them within the R environment:
readBigwig(): Read in entire bigWig files, or read in specific genomic regions or strands of bigWig data. Please note that this function does not work on Windows.
library(plotgardener)

bwFile <- system.file("extdata/test.bw", package = "plotgardenerData")

## Read in entire file
bwFileData <- readBigwig(file = bwFile)

## Read in specified region
bwRegion <- readBigwig(file = bwFile,
                       chrom = "chr2",
                       chromstart = 1,
                       chromend = 1500)

## Read in specified region on "+" strand
bwRegionPlus <- readBigwig(file = bwFile,
                           chrom = "chr2",
                           chromstart = 1,
                           chromend = 1500,
                           strand = "+")
The resulting data frame contains seqnames, start, end, width, strand, and score columns:
head(bwRegion)
#> seqnames start end width strand score
#> 1 chr2 1 300 300 * -1.00
#> 2 chr2 301 600 300 * -0.75
#> 3 chr2 601 900 300 * -0.50
#> 4 chr2 901 1200 300 * -0.25
#> 5 chr2 1201 1500 300 * 0.00
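Once the bigWig signal is in a data frame, it can be filtered and summarized with ordinary R tools. The following is a minimal base-R sketch, not part of plotgardener: it rebuilds the five rows printed above by hand so that it runs without the data package, and the window bounds windowStart and windowEnd are hypothetical values chosen only for illustration.

```r
# Rows copied from the head(bwRegion) output shown above
bwRegion <- data.frame(
    seqnames = "chr2",
    start = c(1, 301, 601, 901, 1201),
    end = c(300, 600, 900, 1200, 1500),
    width = 300,
    strand = "*",
    score = c(-1.00, -0.75, -0.50, -0.25, 0.00)
)

# Hypothetical window of interest (not from the original vignette)
windowStart <- 500
windowEnd <- 1000

# Keep the intervals that overlap the window
overlapsWindow <- bwRegion$end >= windowStart & bwRegion$start <= windowEnd
inWindow <- bwRegion[overlapsWindow, ]

# Width-weighted mean signal across the whole region
meanScore <- weighted.mean(bwRegion$score, w = bwRegion$width)
```

Weighting by width matters when intervals differ in length; here every interval spans 300 bp, so the weighted and unweighted means coincide.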
readHic(): Read in genomic regions of .hic files with various data resolutions and normalizations.
hicFile <- system.file("extdata/test_chr22.hic", package = "plotgardenerData")

hicDataChrom <- readHic(file = hicFile,
                        chrom = "22", assembly = "hg19",
                        resolution = 250000, res_scale = "BP", norm = "NONE")

hicDataChromRegion <- readHic(file = hicFile,
                              chrom = "22", assembly = "hg19",
                              chromstart = 20000000, chromend = 47500000,
                              resolution = 100000, res_scale = "BP", norm = "KR")
The data are returned as a 3-column data frame in sparse upper triangular matrix format:
head(hicDataChromRegion)
#> 22_A 22_B counts
#> 1 20000000 20000000 55.390347
#> 2 20000000 20100000 6.737655
#> 3 20100000 20100000 31.963037
#> 4 20000000 20200000 3.204865
#> 5 20100000 20200000 12.864663
#> 6 20200000 20200000 37.828201
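To make the sparse upper triangular format concrete, the rows printed above can be expanded into a dense symmetric matrix. This is a minimal base-R sketch rather than a plotgardener function: the names hicSparse, bins, and hicDense are illustrative, the six rows are typed in by hand from the output above, and columns are accessed by position on the assumption that the first two column names (22_A and 22_B here) vary with the chromosome.

```r
# Rows copied from the head(hicDataChromRegion) output shown above
hicSparse <- data.frame(
    "22_A" = c(20000000, 20000000, 20100000, 20000000, 20100000, 20200000),
    "22_B" = c(20000000, 20100000, 20100000, 20200000, 20200000, 20200000),
    counts = c(55.390347, 6.737655, 31.963037, 3.204865, 12.864663, 37.828201),
    check.names = FALSE
)

# Every bin start that appears in either coordinate column
bins <- sort(unique(c(hicSparse[[1]], hicSparse[[2]])))
binNames <- format(bins, scientific = FALSE, trim = TRUE)

# Map each bin start to its row/column index in the dense matrix
rowIdx <- match(hicSparse[[1]], bins)
colIdx <- match(hicSparse[[2]], bins)

# Bin pairs absent from the sparse table are left at 0
hicDense <- matrix(0, nrow = length(bins), ncol = length(bins),
                   dimnames = list(binNames, binNames))

# Fill the upper triangle, then mirror it into the lower triangle
hicDense[cbind(rowIdx, colIdx)] <- hicSparse$counts
hicDense[cbind(colIdx, rowIdx)] <- hicSparse$counts
```

Each bin pair is stored once in the sparse form, so the dense matrix is obtained by writing every count at both (i, j) and (j, i). A dense matrix grows with the square of the number of bins, so this expansion is only practical for small regions or coarse resolutions.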
It is also possible to use readHic() for interchromosomal Hi-C data:
twoChroms <- readHic(file = "/path/to/hic",
                     chrom = "chr1", altchrom = "chr2",
                     resolution = 250000, res_scale = "BP")
For other file types, we recommend reading in files with data.table or rtracklayer.
library(data.table)
data <- data.table::fread("/path/to/file")
library(rtracklayer)
data <- rtracklayer::import(con = "/path/to/file", format = "fileFormat")
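As a small worked illustration of the data.table route, the sketch below writes a made-up three-column BED file to a temporary path and reads it back. The file contents and the column names chrom, start, and end are assumptions chosen for this example. BED files carry no header line, so names are supplied explicitly. A base-R fallback is included only so the sketch still runs where data.table is not installed.

```r
# Write a tiny, made-up three-column BED file to a temporary location
bedPath <- tempfile(fileext = ".bed")
writeLines(c("chr1\t100\t200",
             "chr1\t300\t450",
             "chr2\t50\t120"),
           bedPath)

# Illustrative column names; BED files have no header row
bedCols <- c("chrom", "start", "end")

if (requireNamespace("data.table", quietly = TRUE)) {
    bed <- data.table::fread(bedPath, header = FALSE, col.names = bedCols)
} else {
    # Base-R fallback so the sketch runs without data.table
    bed <- utils::read.delim(bedPath, header = FALSE, col.names = bedCols)
}

# BED coordinates are 0-based and half-open, so width is end - start
widths <- bed$end - bed$start
```

Either branch yields a tabular object with one row per interval, which is the kind of data.frame-like input described at the top of this guide.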
sessionInfo()
#> R version 4.3.2 (2023-10-31)
#> Platform: x86_64-apple-darwin20 (64-bit)
#> Running under: macOS Sonoma 14.2.1
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
#>
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#>
#> time zone: America/New_York
#> tzcode source: internal
#>
#> attached base packages:
#> [1] grid stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] plotgardenerData_1.8.0 plotgardener_1.8.2
#>
#> loaded via a namespace (and not attached):
#> [1] SummarizedExperiment_1.32.0 gtable_0.3.4
#> [3] rjson_0.2.21 xfun_0.43
#> [5] bslib_0.7.0 ggplot2_3.5.0
#> [7] plyranges_1.22.0 lattice_0.22-6
#> [9] Biobase_2.62.0 vctrs_0.6.5
#> [11] tools_4.3.2 bitops_1.0-7
#> [13] generics_0.1.3 yulab.utils_0.1.4
#> [15] parallel_4.3.2 stats4_4.3.2
#> [17] curl_5.2.1 tibble_3.2.1
#> [19] fansi_1.0.6 pkgconfig_2.0.3
#> [21] Matrix_1.6-5 data.table_1.15.2
#> [23] ggplotify_0.1.2 RColorBrewer_1.1-3
#> [25] desc_1.4.3 S4Vectors_0.40.2
#> [27] lifecycle_1.0.4 GenomeInfoDbData_1.2.11
#> [29] compiler_4.3.2 Rsamtools_2.18.0
#> [31] Biostrings_2.70.3 textshaping_0.3.7
#> [33] munsell_0.5.0 codetools_0.2-19
#> [35] GenomeInfoDb_1.38.8 htmltools_0.5.8
#> [37] sass_0.4.9 RCurl_1.98-1.14
#> [39] yaml_2.3.8 pillar_1.9.0
#> [41] pkgdown_2.0.7 crayon_1.5.2
#> [43] jquerylib_0.1.4 BiocParallel_1.36.0
#> [45] DelayedArray_0.28.0 cachem_1.0.8
#> [47] abind_1.4-5 tidyselect_1.2.1
#> [49] digest_0.6.35 restfulr_0.0.15
#> [51] dplyr_1.1.4 purrr_1.0.2
#> [53] fastmap_1.1.1 SparseArray_1.2.4
#> [55] colorspace_2.1-0 cli_3.6.2
#> [57] magrittr_2.0.3 S4Arrays_1.2.1
#> [59] XML_3.99-0.16.1 utf8_1.2.4
#> [61] withr_3.0.0 scales_1.3.0
#> [63] rmarkdown_2.26 XVector_0.42.0
#> [65] matrixStats_1.2.0 ragg_1.3.0
#> [67] memoise_2.0.1 evaluate_0.23
#> [69] knitr_1.45 BiocIO_1.12.0
#> [71] GenomicRanges_1.54.1 IRanges_2.36.0
#> [73] rtracklayer_1.62.0 gridGraphics_0.5-1
#> [75] rlang_1.1.3 Rcpp_1.0.12
#> [77] glue_1.7.0 BiocGenerics_0.48.1
#> [79] rstudioapi_0.16.0 jsonlite_1.8.8
#> [81] strawr_0.0.91 R6_2.5.1
#> [83] MatrixGenerics_1.14.0 GenomicAlignments_1.38.2
#> [85] systemfonts_1.0.6 fs_1.6.3
#> [87] zlibbioc_1.48.2