smoothclust 0.99.1
smoothclust
is a method for segmentation of spatial domains and spatially-aware clustering in spatial transcriptomics data. The method generates spatial domains with smooth boundaries by smoothing gene expression profiles across neighboring spatial locations, followed by unsupervised clustering. Spatial domains consisting of consistent mixtures of cell types may then be further investigated by applying cell type compositional analyses or differential analyses.
The development version of the package can be installed from GitHub:
remotes::install_github("lmweber/smoothclust")
Input data can be provided either as a SpatialExperiment object within the Bioconductor framework, or as numeric matrices of expression values and spatial coordinates. See help file (?smoothclust
) for details.
In the example workflow below, we assume the input is in SpatialExperiment
format.
The example workflow in this section demonstrates how to run smoothclust
to generate spatial domains with smooth boundaries for a dataset from the 10x Genomics Visium platform.
library(smoothclust)
library(STexampleData)
library(scuttle)
library(scran)
library(scater)
library(ggspavis)
Load dataset from STexampleData
package:
# load data
spe <- Visium_humanDLPFC()
## see ?STexampleData and browseVignettes('STexampleData') for documentation
## loading from cache
# keep spots over tissue
spe <- spe[, colData(spe)$in_tissue == 1]
dim(spe)
## [1] 33538 3639
assayNames(spe)
## [1] "counts"
Run smoothclust
using default parameter settings, which have been selected to be appropriate for Visium data from human tissue.
# run smoothclust
spe <- smoothclust(spe)
The smoothed expression values are stored as updated values in the counts
assay. The original expression count values are also retained in the object (e.g. in case they are needed later for plotting) and stored in a new assay named counts_unsmoothed
.
# check output object
assayNames(spe)
## [1] "counts" "counts_unsmoothed"
Calculate log-transformed normalized counts (logcounts) on the smoothed expression values:
# calculate logcounts
spe <- logNormCounts(spe)
assayNames(spe)
## [1] "counts" "counts_unsmoothed" "logcounts"
Run clustering. We use a standard clustering workflow from single-cell data, consisting of k-means clustering on the top principal components (PCs) calculated on the set of top highly variable genes (HVGs) with logcounts as the input.
We use a relatively small number of clusters for demonstration purposes in this example:
# preprocessing steps for clustering
# remove mitochondrial genes
is_mito <- grepl("(^MT-)|(^mt-)", rowData(spe)$gene_name)
table(is_mito)
## is_mito
## FALSE TRUE
## 33525 13
spe <- spe[!is_mito, ]
dim(spe)
## [1] 33525 3639
# select top highly variable genes (HVGs)
dec <- modelGeneVar(spe)
top_hvgs <- getTopHVGs(dec, prop = 0.1)
length(top_hvgs)
## [1] 986
spe <- spe[top_hvgs, ]
dim(spe)
## [1] 986 3639
# dimensionality reduction
# compute PCA on top HVGs
set.seed(123)
spe <- runPCA(spe)
# run k-means clustering
set.seed(123)
k <- 5
clust <- kmeans(reducedDim(spe, "PCA"), centers = k)$cluster
table(clust)
## clust
## 1 2 3 4 5
## 967 1574 492 272 334
colLabels(spe) <- factor(clust)
Plot clusters / spatial domains generated by the smoothclust workflow:
# color palettes
pal8 <- "libd_layer_colors"
pal36 <- unname(palette.colors(36, "Polychrome 36"))
# plot clusters / spatial domains
plotSpots(spe, annotate = "label", pal = pal8)
Plot manually annotated reference labels, which can be used to evaluate the performance of the clustering in this dataset:
# plot reference labels
plotSpots(spe, annotate = "ground_truth", pal = pal8)
sessionInfo()
## R Under development (unstable) (2024-03-18 r86148)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.4 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.19-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] ggspavis_1.9.5 scater_1.31.2
## [3] ggplot2_3.5.0 scran_1.31.3
## [5] scuttle_1.13.1 STexampleData_1.11.1
## [7] SpatialExperiment_1.13.1 SingleCellExperiment_1.25.0
## [9] SummarizedExperiment_1.33.3 Biobase_2.63.0
## [11] GenomicRanges_1.55.4 GenomeInfoDb_1.39.9
## [13] IRanges_2.37.1 S4Vectors_0.41.5
## [15] MatrixGenerics_1.15.0 matrixStats_1.2.0
## [17] ExperimentHub_2.11.1 AnnotationHub_3.11.3
## [19] BiocFileCache_2.11.1 dbplyr_2.5.0
## [21] BiocGenerics_0.49.1 smoothclust_0.99.1
## [23] BiocStyle_2.31.0
##
## loaded via a namespace (and not attached):
## [1] RColorBrewer_1.1-3 jsonlite_1.8.8
## [3] wk_0.9.1 magrittr_2.0.3
## [5] ggbeeswarm_0.7.2 magick_2.8.3
## [7] farver_2.1.1 rmarkdown_2.26
## [9] zlibbioc_1.49.3 vctrs_0.6.5
## [11] spdep_1.3-3 memoise_2.0.1
## [13] DelayedMatrixStats_1.25.1 htmltools_0.5.7
## [15] S4Arrays_1.3.6 curl_5.2.1
## [17] BiocNeighbors_1.21.2 s2_1.1.6
## [19] SparseArray_1.3.4 sass_0.4.9
## [21] spData_2.3.0 KernSmooth_2.23-22
## [23] bslib_0.6.1 cachem_1.0.8
## [25] igraph_2.0.3 ggside_0.3.1
## [27] mime_0.12 lifecycle_1.0.4
## [29] pkgconfig_2.0.3 rsvd_1.0.5
## [31] Matrix_1.6-5 R6_2.5.1
## [33] fastmap_1.1.1 GenomeInfoDbData_1.2.11
## [35] digest_0.6.35 colorspace_2.1-0
## [37] AnnotationDbi_1.65.2 dqrng_0.3.2
## [39] irlba_2.3.5.1 RSQLite_2.3.5
## [41] beachmat_2.19.1 labeling_0.4.3
## [43] filelock_1.0.3 fansi_1.0.6
## [45] httr_1.4.7 abind_1.4-5
## [47] compiler_4.4.0 proxy_0.4-27
## [49] bit64_4.0.5 withr_3.0.0
## [51] BiocParallel_1.37.1 viridis_0.6.5
## [53] DBI_1.2.2 highr_0.10
## [55] rappdirs_0.3.3 DelayedArray_0.29.9
## [57] rjson_0.2.21 classInt_0.4-10
## [59] bluster_1.13.0 tools_4.4.0
## [61] units_0.8-5 vipor_0.4.7
## [63] beeswarm_0.4.0 glue_1.7.0
## [65] dbscan_1.1-12 grid_4.4.0
## [67] sf_1.0-15 cluster_2.1.6
## [69] generics_0.1.3 gtable_0.3.4
## [71] class_7.3-22 BiocSingular_1.19.0
## [73] ScaledMatrix_1.11.1 metapod_1.11.1
## [75] sp_2.1-3 utf8_1.2.4
## [77] XVector_0.43.1 ggrepel_0.9.5
## [79] BiocVersion_3.19.1 pillar_1.9.0
## [81] limma_3.59.6 dplyr_1.1.4
## [83] lattice_0.22-6 bit_4.0.5
## [85] deldir_2.0-4 tidyselect_1.2.1
## [87] locfit_1.5-9.9 Biostrings_2.71.4
## [89] knitr_1.45 gridExtra_2.3
## [91] bookdown_0.38 edgeR_4.1.18
## [93] xfun_0.42 statmod_1.5.0
## [95] yaml_2.3.8 boot_1.3-30
## [97] evaluate_0.23 codetools_0.2-19
## [99] tibble_3.2.1 BiocManager_1.30.22
## [101] cli_3.6.2 munsell_0.5.0
## [103] jquerylib_0.1.4 Rcpp_1.0.12
## [105] png_0.1-8 parallel_4.4.0
## [107] blob_1.2.4 sparseMatrixStats_1.15.0
## [109] viridisLite_0.4.2 scales_1.3.0
## [111] e1071_1.7-14 purrr_1.0.2
## [113] crayon_1.5.2 rlang_1.1.3
## [115] KEGGREST_1.43.0