TileDBArray 1.19.1
TileDB implements a framework for local and remote storage of dense and sparse arrays.
We can use this as a DelayedArray backend to provide an array-level abstraction,
thus allowing the data to be used in many places where an ordinary array or matrix might be used.
The TileDBArray package implements the necessary wrappers around TileDB-R
to support read/write operations on TileDB arrays within the DelayedArray framework.
Creating a TileDBArray is as easy as:
X <- matrix(rnorm(1000), ncol=10)
library(TileDBArray)
writeTileDBArray(X)
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -1.30454928 0.74611496 1.79487244 . 1.85318179 1.58908245
## [2,] -0.08667888 0.16381559 -0.17712760 . 0.42133654 -0.61055805
## [3,] 1.30696410 -1.12996385 0.84009884 . -0.05980494 -0.05808135
## [4,] 0.97244124 1.25299863 2.27219488 . -1.41528260 -1.10159782
## [5,] 0.37216000 -0.44255627 -0.40288727 . 2.13459884 -0.75574280
## ... . . . . . .
## [96,] -0.7995732 -0.9925561 1.5172446 . -0.07532888 0.75586234
## [97,] -0.9043828 -0.6321335 1.3410267 . 0.74636282 -0.17400115
## [98,] 0.4729055 0.3520508 0.2273485 . 1.59644603 1.88167730
## [99,] -0.6355612 -1.2216645 0.6149573 . 0.53476421 -0.84086019
## [100,] 0.2327278 -0.3415308 -0.5742914 . 0.84602165 -1.41892375
Alternatively, we can use coercion methods:
as(X, "TileDBArray")
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -1.30454928 0.74611496 1.79487244 . 1.85318179 1.58908245
## [2,] -0.08667888 0.16381559 -0.17712760 . 0.42133654 -0.61055805
## [3,] 1.30696410 -1.12996385 0.84009884 . -0.05980494 -0.05808135
## [4,] 0.97244124 1.25299863 2.27219488 . -1.41528260 -1.10159782
## [5,] 0.37216000 -0.44255627 -0.40288727 . 2.13459884 -0.75574280
## ... . . . . . .
## [96,] -0.7995732 -0.9925561 1.5172446 . -0.07532888 0.75586234
## [97,] -0.9043828 -0.6321335 1.3410267 . 0.74636282 -0.17400115
## [98,] 0.4729055 0.3520508 0.2273485 . 1.59644603 1.88167730
## [99,] -0.6355612 -1.2216645 0.6149573 . 0.53476421 -0.84086019
## [100,] 0.2327278 -0.3415308 -0.5742914 . 0.84602165 -1.41892375
This process also works for sparse matrices:
Y <- Matrix::rsparsematrix(1000, 1000, density=0.01)
writeTileDBArray(Y)
## <1000 x 1000> sparse TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] 0 0 0 . 0 0
## [2,] 0 0 0 . 0 0
## [3,] 0 0 0 . 0 0
## [4,] 0 0 0 . 0 0
## [5,] 0 0 0 . 0 0
## ... . . . . . .
## [996,] 0 0 0 . 0 0
## [997,] 0 0 0 . 0 0
## [998,] 0 0 0 . 0 0
## [999,] 0 0 0 . 0 0
## [1000,] 0 0 0 . 0 0
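To check that sparsity is carried over to the backend, one can query the result with is_sparse(). The following is a minimal sketch, assuming is_sparse() is available once TileDBArray (and with it DelayedArray) is attached; the object names sp and dn are illustrative only.

```r
library(TileDBArray)

# A sparse input should yield a sparse TileDB backend.
Y <- Matrix::rsparsematrix(100, 100, density=0.01)
sp <- writeTileDBArray(Y)
is_sparse(sp)

# A dense input should yield a dense backend.
dn <- writeTileDBArray(matrix(rnorm(100), ncol=10))
is_sparse(dn)
```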
Logical and integer matrices are supported:
writeTileDBArray(Y > 0)
## <1000 x 1000> sparse TileDBMatrix object of type "logical":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] FALSE FALSE FALSE . FALSE FALSE
## [2,] FALSE FALSE FALSE . FALSE FALSE
## [3,] FALSE FALSE FALSE . FALSE FALSE
## [4,] FALSE FALSE FALSE . FALSE FALSE
## [5,] FALSE FALSE FALSE . FALSE FALSE
## ... . . . . . .
## [996,] FALSE FALSE FALSE . FALSE FALSE
## [997,] FALSE FALSE FALSE . FALSE FALSE
## [998,] FALSE FALSE FALSE . FALSE FALSE
## [999,] FALSE FALSE FALSE . FALSE FALSE
## [1000,] FALSE FALSE FALSE . FALSE FALSE
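The stored type can be inspected with type(), which is a quick way to confirm that integer and logical inputs are not silently promoted to double. This is a small sketch with illustrative object names (Z, int_arr, lgl_arr), using dense rather than sparse inputs.

```r
library(TileDBArray)

# Integer input: inspect the type reported for the backend.
Z <- matrix(1:20, ncol=4)
int_arr <- writeTileDBArray(Z)
type(int_arr)

# Logical input, derived from the same matrix.
lgl_arr <- writeTileDBArray(Z > 10L)
type(lgl_arr)
```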
Matrices with dimension names are also supported:
rownames(X) <- sprintf("GENE_%i", seq_len(nrow(X)))
colnames(X) <- sprintf("SAMP_%i", seq_len(ncol(X)))
writeTileDBArray(X)
## <100 x 10> TileDBMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 -1.30454928 0.74611496 1.79487244 . 1.85318179 1.58908245
## GENE_2 -0.08667888 0.16381559 -0.17712760 . 0.42133654 -0.61055805
## GENE_3 1.30696410 -1.12996385 0.84009884 . -0.05980494 -0.05808135
## GENE_4 0.97244124 1.25299863 2.27219488 . -1.41528260 -1.10159782
## GENE_5 0.37216000 -0.44255627 -0.40288727 . 2.13459884 -0.75574280
## ... . . . . . .
## GENE_96 -0.7995732 -0.9925561 1.5172446 . -0.07532888 0.75586234
## GENE_97 -0.9043828 -0.6321335 1.3410267 . 0.74636282 -0.17400115
## GENE_98 0.4729055 0.3520508 0.2273485 . 1.59644603 1.88167730
## GENE_99 -0.6355612 -1.2216645 0.6149573 . 0.53476421 -0.84086019
## GENE_100 0.2327278 -0.3415308 -0.5742914 . 0.84602165 -1.41892375
TileDBArrays are simply DelayedArray objects and can be manipulated as such.
The usual conventions for extracting data from matrix-like objects work as expected:
out <- as(X, "TileDBArray")
dim(out)
## [1] 100 10
head(rownames(out))
## [1] "GENE_1" "GENE_2" "GENE_3" "GENE_4" "GENE_5" "GENE_6"
head(out[,1])
## GENE_1 GENE_2 GENE_3 GENE_4 GENE_5 GENE_6
## -1.30454928 -0.08667888 1.30696410 0.97244124 0.37216000 -0.35576229
We can also perform manipulations like subsetting and arithmetic.
Note that these operations do not affect the data in the TileDB backend;
rather, they are delayed until the values are explicitly required,
hence the creation of the DelayedMatrix object.
out[1:5,1:5]
## <5 x 5> DelayedMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5
## GENE_1 -1.30454928 0.74611496 1.79487244 -0.51326681 -0.85877981
## GENE_2 -0.08667888 0.16381559 -0.17712760 0.64723217 0.75693758
## GENE_3 1.30696410 -1.12996385 0.84009884 -1.96377398 0.11555497
## GENE_4 0.97244124 1.25299863 2.27219488 -0.69932416 1.84665205
## GENE_5 0.37216000 -0.44255627 -0.40288727 -2.09257396 1.87552604
out * 2
## <100 x 10> DelayedMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 -2.6090986 1.4922299 3.5897449 . 3.7063636 3.1781649
## GENE_2 -0.1733578 0.3276312 -0.3542552 . 0.8426731 -1.2211161
## GENE_3 2.6139282 -2.2599277 1.6801977 . -0.1196099 -0.1161627
## GENE_4 1.9448825 2.5059973 4.5443898 . -2.8305652 -2.2031956
## GENE_5 0.7443200 -0.8851125 -0.8057745 . 4.2691977 -1.5114856
## ... . . . . . .
## GENE_96 -1.5991464 -1.9851123 3.0344892 . -0.1506578 1.5117247
## GENE_97 -1.8087655 -1.2642669 2.6820534 . 1.4927256 -0.3480023
## GENE_98 0.9458111 0.7041017 0.4546970 . 3.1928921 3.7633546
## GENE_99 -1.2711224 -2.4433289 1.2299146 . 1.0695284 -1.6817204
## GENE_100 0.4654555 -0.6830616 -1.1485827 . 1.6920433 -2.8378475
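To obtain actual values from such a delayed object, it can be realized in memory, for instance with as.matrix(). The sketch below uses a fresh matrix without dimension names and compares the realized result against the same operations done in base R; the names delayed and realized are illustrative.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol=10)
out <- as(X, "TileDBArray")

# Arithmetic and subsetting are recorded on the object, not executed yet.
delayed <- (out * 2)[1:5, 1:5]
class(delayed)

# as.matrix() reads from the TileDB backend and applies the delayed operations.
realized <- as.matrix(delayed)
all.equal(realized, X[1:5, 1:5] * 2)
```

Realizing the whole array defeats the purpose of a file-backed representation, so this is best kept to small subsets or final results.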
We can also do more complex matrix operations that are supported by DelayedArray:
colSums(out)
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5 SAMP_6 SAMP_7 SAMP_8
## -6.628905 -3.854672 9.051895 1.908895 3.341130 5.707348 -5.752570 9.202049
## SAMP_9 SAMP_10
## 7.924505 7.494130
out %*% runif(ncol(out))
## [,1]
## GENE_1 2.72534293
## GENE_2 2.34687395
## GENE_3 0.97673086
## GENE_4 2.16236468
## GENE_5 1.28193192
## GENE_6 0.12517365
## GENE_7 -3.28347289
## GENE_8 0.45001727
## GENE_9 0.26912071
## GENE_10 3.94872957
## GENE_11 -2.04530336
## GENE_12 0.40266428
## GENE_13 0.05933279
## GENE_14 0.59725911
## GENE_15 3.26575168
## GENE_16 0.28212878
## GENE_17 1.79363202
## GENE_18 -0.67826696
## GENE_19 -2.81584082
## GENE_20 -2.74505197
## GENE_21 -3.12611249
## GENE_22 0.04633303
## GENE_23 0.72986121
## GENE_24 3.30286392
## GENE_25 -1.54223166
## GENE_26 -0.14206673
## GENE_27 -0.79863637
## GENE_28 -0.22407264
## GENE_29 4.38092092
## GENE_30 -1.37684405
## GENE_31 -5.61312445
## GENE_32 -1.92396728
## GENE_33 1.18829524
## GENE_34 2.79799165
## GENE_35 0.69243349
## GENE_36 -0.91716141
## GENE_37 -0.14916988
## GENE_38 1.96069101
## GENE_39 -3.33213179
## GENE_40 0.13929131
## GENE_41 -2.04578288
## GENE_42 -0.90128306
## GENE_43 -0.49728141
## GENE_44 -1.56303097
## GENE_45 1.83197516
## GENE_46 4.09875019
## GENE_47 -1.64683141
## GENE_48 -0.95764195
## GENE_49 0.93994897
## GENE_50 4.60739578
## GENE_51 -0.32919623
## GENE_52 3.51656597
## GENE_53 -2.43253613
## GENE_54 -1.36355866
## GENE_55 -0.63626473
## GENE_56 3.68425042
## GENE_57 2.18352707
## GENE_58 -0.79251753
## GENE_59 -3.54821305
## GENE_60 -2.02427055
## GENE_61 1.38565813
## GENE_62 -2.31262713
## GENE_63 0.75478036
## GENE_64 -0.68516654
## GENE_65 1.49734085
## GENE_66 -0.44809108
## GENE_67 -0.41947013
## GENE_68 -0.58861813
## GENE_69 -0.32464755
## GENE_70 1.87496493
## GENE_71 0.49370392
## GENE_72 -3.24944009
## GENE_73 -2.70056672
## GENE_74 2.34939426
## GENE_75 0.70918103
## GENE_76 -1.54354597
## GENE_77 6.08001175
## GENE_78 -0.04864215
## GENE_79 1.82144748
## GENE_80 0.09849927
## GENE_81 -1.14493582
## GENE_82 4.25986154
## GENE_83 -0.31332275
## GENE_84 -0.21749355
## GENE_85 1.59870366
## GENE_86 4.65239223
## GENE_87 -1.23163805
## GENE_88 2.78053506
## GENE_89 -1.00846038
## GENE_90 -0.97225170
## GENE_91 2.14083584
## GENE_92 1.34622188
## GENE_93 1.23127024
## GENE_94 3.32721523
## GENE_95 0.31712645
## GENE_96 -0.72520394
## GENE_97 0.27708376
## GENE_98 3.19247751
## GENE_99 -1.35374567
## GENE_100 -4.58342245
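As a sanity check, the results of these operations can be compared against the same computations on the original in-memory matrix. This is a sketch on a fresh matrix; v, cs_ok and mm_ok are illustrative names, and all.equal() is used because block-wise processing may change the order of floating-point additions.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol=10)
out <- as(X, "TileDBArray")

# Column sums from the TileDB-backed object versus the ordinary matrix.
cs_ok <- all.equal(colSums(out), colSums(X))
cs_ok

# Matrix product with a fixed vector, compared against base R.
v <- runif(ncol(out))
mm_ok <- all.equal(as.matrix(out %*% v), X %*% v)
mm_ok
```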
We can adjust some parameters for creating the backend with appropriate arguments to writeTileDBArray().
For example, the code below controls the path to the backend as well as the name of the attribute containing the data.
X <- matrix(rnorm(1000), ncol=10)
path <- tempfile()
writeTileDBArray(X, path=path, attr="WHEE")
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 1.33815683 1.67606425 0.85609032 . -0.60236990 -0.96544354
## [2,] -0.45206918 1.92332429 1.48133708 . 0.77827044 -0.12919155
## [3,] 0.61446742 -0.22652528 1.40511165 . 1.59927120 -1.12107774
## [4,] 1.79315391 0.09599173 0.86699517 . 0.99022298 -1.83939950
## [5,] 1.94798553 1.20739159 -0.64628009 . -0.05082119 1.07158379
## ... . . . . . .
## [96,] 0.3547901 0.4657901 0.8046488 . -0.7735190 -1.3012874
## [97,] -0.8234590 -0.8667528 -0.6104276 . 1.1498558 -0.2762244
## [98,] -0.8544119 0.2632983 0.6546272 . -0.9557722 -0.3246828
## [99,] -0.2035931 0.6448651 0.2194941 . 0.6687871 -0.0503780
## [100,] -0.2853703 -1.0817441 0.6952653 . 2.0171445 0.3399586
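Knowing the path is mainly useful for reopening the backend later. The sketch below assumes that the TileDBArray() constructor accepts the path together with an attr argument naming the attribute to read; this is not shown above, so treat the call as an assumption to verify against the package documentation.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol=10)
path <- tempfile()
writeTileDBArray(X, path=path, attr="WHEE")

# Reopen the existing backend by pointing the constructor at the same path.
# Assumption: TileDBArray() takes the path and an 'attr' argument.
reopened <- TileDBArray(path, attr="WHEE")
dim(reopened)
all.equal(as.matrix(reopened), X)
```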
As these arguments cannot be passed during coercion, we instead provide global variables that can be set or unset to affect the outcome.
path2 <- tempfile()
setTileDBPath(path2)
as(X, "TileDBArray") # uses path2 to store the backend.
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 1.33815683 1.67606425 0.85609032 . -0.60236990 -0.96544354
## [2,] -0.45206918 1.92332429 1.48133708 . 0.77827044 -0.12919155
## [3,] 0.61446742 -0.22652528 1.40511165 . 1.59927120 -1.12107774
## [4,] 1.79315391 0.09599173 0.86699517 . 0.99022298 -1.83939950
## [5,] 1.94798553 1.20739159 -0.64628009 . -0.05082119 1.07158379
## ... . . . . . .
## [96,] 0.3547901 0.4657901 0.8046488 . -0.7735190 -1.3012874
## [97,] -0.8234590 -0.8667528 -0.6104276 . 1.1498558 -0.2762244
## [98,] -0.8544119 0.2632983 0.6546272 . -0.9557722 -0.3246828
## [99,] -0.2035931 0.6448651 0.2194941 . 0.6687871 -0.0503780
## [100,] -0.2853703 -1.0817441 0.6952653 . 2.0171445 0.3399586
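Because the global setting persists for the rest of the session, it is usually worth unsetting it once the coercion is done. The following is a sketch that assumes getTileDBPath() is the companion getter and that passing NULL to setTileDBPath() unsets the value; path3 and arr are illustrative names.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol=10)

path3 <- tempfile()      # hypothetical location used only for this sketch
setTileDBPath(path3)
getTileDBPath()          # the path that coercion will now use

arr <- as(X, "TileDBArray")
file.exists(path3)       # the backend was created at the requested path

# Unset the global so that later coercions do not try to reuse this path.
setTileDBPath(NULL)
```

Leaving the path set would direct the next coercion to the same, already populated location, which is unlikely to be what is wanted.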
sessionInfo()
## R version 4.5.1 (2025-06-13 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows Server 2022 x64 (build 20348)
##
## Matrix products: default
## LAPACK version 3.12.1
##
## locale:
## [1] LC_COLLATE=C
## [2] LC_CTYPE=English_United States.utf8
## [3] LC_MONETARY=English_United States.utf8
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.utf8
##
## time zone: America/New_York
## tzcode source: internal
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] RcppSpdlog_0.0.22 TileDBArray_1.19.1 DelayedArray_0.35.2
## [4] SparseArray_1.9.0 S4Arrays_1.9.1 IRanges_2.43.0
## [7] abind_1.4-8 S4Vectors_0.47.0 MatrixGenerics_1.21.0
## [10] matrixStats_1.5.0 BiocGenerics_0.55.0 generics_0.1.4
## [13] Matrix_1.7-3 BiocStyle_2.37.0
##
## loaded via a namespace (and not attached):
## [1] bit_4.6.0 jsonlite_2.0.0 compiler_4.5.1
## [4] BiocManager_1.30.26 crayon_1.5.3 Rcpp_1.0.14
## [7] nanoarrow_0.6.0-1 jquerylib_0.1.4 yaml_2.3.10
## [10] fastmap_1.2.0 lattice_0.22-7 R6_2.6.1
## [13] RcppCCTZ_0.2.13 XVector_0.49.0 tiledb_0.32.0
## [16] knitr_1.50 bookdown_0.43 bslib_0.9.0
## [19] rlang_1.1.6 cachem_1.1.0 xfun_0.52
## [22] sass_0.4.10 bit64_4.6.0-1 cli_3.6.5
## [25] spdl_0.0.5 digest_0.6.37 grid_4.5.1
## [28] lifecycle_1.0.4 data.table_1.17.6 evaluate_1.0.4
## [31] nanotime_0.3.12 zoo_1.8-14 rmarkdown_2.29
## [34] tools_4.5.1 htmltools_0.5.8.1