0. Introduction

ClustAll is an R package originally designed for patient stratification in complex diseases.

The ClustAll framework is dedicated to identifying patient subgroups, addressing common challenges encountered in clinical data analysis. The underlying concept is that a robust stratification should be reproducible across different clustering approaches. To achieve patient stratification, ClustAll employs diverse distance metrics (correlation-based distance and Gower distance) and clustering methods (K-Means, K-Medoids, and hierarchical clustering).

Moreover, ClustAll:

  • Handles datasets with missing values, either by computing multiple imputations internally (via the mice package) or by accepting externally imputed data.
  • Retains only those stratifications that prove robust, i.e. reproducible across the different distance metrics and clustering methods.

Additionally, the package includes functions to:

  • Visualize the Jaccard distance between population-robust stratifications (plotJACCARD).
  • Retrieve representative stratifications and compare them with Sankey diagrams (resStratification, plotSANKEY).
  • Merge the selected stratifications back into the original dataset (cluster2data).
  • Validate stratifications against known labels, when available (validateStratification).

1. Installation

ClustAll is developed using S4 object-oriented programming and requires R (>= 4.2.0). It relies on a number of R packages available from CRAN, Bioconductor, and GitHub. The development version can be installed from GitHub with devtools:

if (!require("devtools", quietly = TRUE))
    install.packages("devtools")
devtools::install_github("TranslationalBioinformaticsUnit/ClustAll")
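Alternatively, if ClustAll is available in your Bioconductor release (the Session Info at the end of this document reflects a Bioconductor build, but check availability for your R version), it can be installed with BiocManager:

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("ClustAll")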

After installation, you should be able to load the ClustAll package in your R session:

library(ClustAll)

2. ClustAll workflow

ClustAll provides a user-friendly workflow built around the following main functions:

createClustAll

The createClustAll function is a key component of the ClustAll workflow. It creates an S4 object of class ClustAllObject, which is specifically designed to facilitate the application of the ClustAll algorithm and to store the primary results of the clustering process.

Usage:

createClustAll(data, nImputation, dataImputed, colValidation)

Arguments:

  • data: A dataframe containing the original dataset. This dataset may include missing values (NAs).

  • nImputation: A numeric value indicating the number of imputations to be computed if the original dataset contains NAs.

  • dataImputed: A mids object created with the mice R package. It should contain the imputed data used for clustering; data and dataImputed must originate from the same source and include the same variables.

  • colValidation: Optional. The name of the column containing the true labels of the data (e.g., diagnosis). This column is set aside for later validation and is not used for clustering.

The ClustAll pipeline is versatile, accommodating three possible scenarios based on the nature of the input data.

  1. No Missing Values: In scenarios where the initial dataset does not contain any missing values (NAs), the use of nImputation and dataImputed is not required.
  2. Missing Values Imputed Within ClustAll Framework: When the initial dataset contains missing values (NAs), the ClustAll pipeline offers an integrated solution for imputation using the mice package. In this case:
    • nImputation is a required parameter to specify the number of imputations to be computed within the ClustAll framework.
    • dataImputed is not required, as the imputations are handled internally.
  3. External Imputation with Imputed Data (mids class): If the initial dataset contains missing values (NAs), but the imputations have been conducted externally (resulting in an imputed dataset of class mids), the following parameters are needed:
    • nImputation is not required.
    • dataImputed is necessary to provide the imputed dataset, and it should be of the mids class.

These scenarios offer flexibility in adapting the ClustAll pipeline to various data conditions, providing users with options depending on their data preprocessing needs.
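As a sketch, the three scenarios translate into the following call patterns (the objects data_use, data_use_NA, and imputed_mids are placeholders; Section 3 provides worked examples):

# Scenario 1: no missing values
obj <- createClustAll(data = data_use)

# Scenario 2: missing values, imputed internally (e.g., 5 imputations)
obj <- createClustAll(data = data_use_NA, nImputation = 5)

# Scenario 3: missing values, imputed externally with mice (a mids object)
obj <- createClustAll(data = data_use_NA, dataImputed = imputed_mids)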

runClustAll

The runClustAll method is a pivotal function in the ClustAll pipeline, responsible for executing the ClustAll algorithm and producing clustering results.

Usage:

runClustAll(Object, threads, simplify)

Arguments:

  • Object: A ClustAllObject-class object, created using the createClustAll function.

  • threads: A numeric value specifying the number of cores to use for parallel computing. This parameter enables users to accelerate the computation process.

  • simplify: A logical value. If set to TRUE, only one out of the four depths of the dendrogram is considered, streamlining the results for simplicity. If set to FALSE, all possible depths of the dendrogram are considered, offering a more detailed clustering analysis.

This method is crucial for running the ClustAll algorithm efficiently, and the parameters allow users to customize the computational aspects of the process based on their preferences and system capabilities.
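For instance, assuming obj is a ClustAllObject created with createClustAll (a sketch mirroring the worked example in Section 3):

# Run the ClustAll algorithm on 2 cores, keeping all dendrogram depths
obj_res <- runClustAll(Object = obj, threads = 2, simplify = FALSE)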

plotJACCARD

The plotJACCARD function is designed to generate a correlation matrix heatmap that visually represents the Jaccard Distance between population-robust stratifications stored within a ClustAllObject-class object.

Usage:

plotJACCARD(Object, paint, stratification_similarity)

Arguments:

  • Object: A ClustAllObject-class object created using the createClustAll function, containing the results of the ClustAll algorithm.

  • paint: A logical value. When set to TRUE, groups of similar stratifications are highlighted within a red square, enhancing the visualization of stratification patterns.

  • stratification_similarity: A numeric value representing the minimum Jaccard Distance required to consider a pair of stratifications as similar. The default value is set to 0.7, but users can adjust this threshold based on their specific needs.

This function is valuable for gaining insights into the relationships between different stratifications and understanding the robustness of population clustering within the context of the ClustAll analysis.
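A minimal sketch, assuming obj_res holds the output of runClustAll:

# Heatmap of Jaccard distances; highlight groups of similar stratifications
plotJACCARD(Object = obj_res, paint = TRUE, stratification_similarity = 0.7)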

resStratification

The resStratification function is designed to retrieve stratification representatives by filtering out those that do not contain clusters representing a minimum percentage of the total population. This function offers flexibility, allowing users to obtain either all robust stratifications or a single representative from each group of similar stratifications.

Usage:

resStratification(Object, population, stratification_similarity, all)

Arguments:

  • Object: A ClustAllObject-class object created using the createClustAll function, containing the results of the ClustAll algorithm.
  • population: A numeric value indicating the minimum percentage of the total population that a cluster in a stratification must have to be considered as representative. The default is set to 0.05 (5%).
  • stratification_similarity: A numerical value representing the minimum Jaccard Distance required to consider a pair of stratifications as similar. The default is set to 0.7.
  • all: A logical value. When set to TRUE, the function returns all similar representative stratifications. If set to FALSE, only the centroid stratification for each group of similar stratifications is returned.

This function is useful for extracting meaningful representations of robust stratifications, enabling users to focus on key clusters within the population and streamline the interpretation of the clustering results.
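A minimal sketch, again assuming obj_res holds the output of runClustAll:

# Keep clusters covering at least 5% of the population and return only the
# centroid stratification of each group of similar stratifications
resStratification(Object = obj_res, population = 0.05,
                  stratification_similarity = 0.7, all = FALSE)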

plotSANKEY

The plotSANKEY function is designed to generate a Sankey diagram comparing two selected stratifications within the ClustAllObject-class object, or a stratification against the validation labels. This visualization aids in the comparison of stratifications, providing insights into the flow and distribution of patients between the chosen clusters.

Usage:

plotSANKEY(Object, clusters, validationData)

Arguments:

  • Object: A ClustAllObject-class object created using the createClustAll function, containing the results of the ClustAll algorithm.

  • clusters: A character vector with the names of a pair of stratifications. The names can be obtained using the resStratification function.

  • validationData: A logical value. If set to TRUE, the single stratification given in clusters is compared against the validation labels (true labels) stored in the object instead of against a second stratification.

This function is particularly valuable for visually assessing the differences and similarities between two selected clusters, allowing for a more intuitive understanding of the population distribution within the context of the ClustAll analysis.
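A minimal sketch; the stratification names (e.g., "cuts_a_9", "cuts_c_3") are illustrative and should be taken from resStratification:

# Compare two stratifications
plotSANKEY(Object = obj_res, clusters = c("cuts_a_9", "cuts_c_3"),
           validationData = FALSE)

# Compare one stratification with the validation (true) labels
plotSANKEY(Object = obj_res, clusters = "cuts_a_9", validationData = TRUE)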

cluster2data

The cluster2data function is designed to retrieve a dataframe that combines the original dataset with the selected ClustAll stratification(s), which are included as additional variables. This allows users to explore and analyze the original data in the context of the identified clusters.

Usage:

cluster2data(Object, stratificationName)

Arguments:

  • Object: A ClustAllObject-class object created using the createClustAll function, containing the results of the ClustAll algorithm.

  • stratificationName: A character vector with one or more stratification names. These names select the specific ClustAll stratifications to be included as variables in the resulting dataframe.

This function allows the integration of the clustering information back into the original dataset, facilitating further analysis and interpretation of the data within the context of the identified clusters.
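A minimal sketch, assuming obj_res holds the output of runClustAll and "cuts_a_9" is one of its stratifications:

# Original data plus the selected stratification as an extra column
df <- cluster2data(Object = obj_res, stratificationName = "cuts_a_9")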

validateStratification

The validateStratification function is designed to validate the results obtained from one or more robust stratifications by comparing them with the original data labels, provided that labeling is available. It returns the sensitivity and specificity of the selected stratification with respect to the true labels.

Usage:

validateStratification(Object, stratificationName)

Arguments:

  • Object: A ClustAllObject-class object created using the createClustAll function, containing the results of the ClustAll algorithm and the validation labels (e.g., added via colValidation).

  • stratificationName: A character value with the name of the robust stratification to be compared against the original labels.

This step assesses the reliability of the obtained robust stratifications: by comparing them with the original data labels (if available), it provides a mechanism to evaluate the accuracy and consistency of the clustering results in relation to the true underlying patterns in the data.

Note: Ensure that the original data labeling is present in the ClustAllObject for this step.
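A minimal sketch, assuming the validation labels were supplied via colValidation when the object was created:

# Sensitivity and specificity of a stratification against the true labels
validateStratification(obj_res, "cuts_a_9")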

3. Real data scenario: Breast Cancer Wisconsin (Diagnostic)

This package contains a real breast cancer dataset (doi: 10.24432/C5DW2B). The dataset comprises two types of features (categorical and numerical) derived from digitized images of fine needle aspirates (FNA) of breast masses from 569 patients. Each patient is characterized by 30 features (10 x 3) and belongs to one of two target classes: 'malignant' or 'benign'.

To showcase ClustAll's ability to handle missing data, a modified version of the dataset with randomly introduced missing values is also included. The breast cancer dataset includes the following features:

  1. radius: Mean of distances from the center to points on the perimeter.
  2. texture: Standard deviation of gray-scale values.
  3. perimeter
  4. area
  5. smoothness: Local variation in radius lengths.
  6. compactness: (Perimeter^2 / Area) - 1.0.
  7. concavity: Severity of concave portions of the contour.
  8. concave points: Number of concave portions of the contour.
  9. symmetry
  10. fractal dimension: “Coastline approximation” - 1.

The dataset also includes the patient ID and diagnosis (M = malignant, B = benign).

3.1 Get data from example file

# load example data
data("BreastCancerWisconsin", package = "ClustAll") 

# remove patient ID (non-informative); Diagnosis is kept as the "true label" for validation
data_use <- subset(wdbc, select = -ID)

# explore the features of example data
str(data_use)
#> 'data.frame':    569 obs. of  31 variables:
#>  $ Diagnosis         : chr  "M" "M" "M" "M" ...
#>  $ radius1           : num  18 20.6 19.7 11.4 20.3 ...
#>  $ texture1          : num  10.4 17.8 21.2 20.4 14.3 ...
#>  $ perimeter1        : num  122.8 132.9 130 77.6 135.1 ...
#>  $ area1             : num  1001 1326 1203 386 1297 ...
#>  $ smoothness1       : num  0.1184 0.0847 0.1096 0.1425 0.1003 ...
#>  $ compactness1      : num  0.2776 0.0786 0.1599 0.2839 0.1328 ...
#>  $ concavity1        : num  0.3001 0.0869 0.1974 0.2414 0.198 ...
#>  $ concave_points1   : num  0.1471 0.0702 0.1279 0.1052 0.1043 ...
#>  $ symmetry1         : num  0.242 0.181 0.207 0.26 0.181 ...
#>  $ fractal_dimension1: num  0.0787 0.0567 0.06 0.0974 0.0588 ...
#>  $ radius2           : num  1.095 0.543 0.746 0.496 0.757 ...
#>  $ texture2          : num  0.905 0.734 0.787 1.156 0.781 ...
#>  $ perimeter2        : num  8.59 3.4 4.58 3.44 5.44 ...
#>  $ area2             : num  153.4 74.1 94 27.2 94.4 ...
#>  $ smoothness2       : num  0.0064 0.00522 0.00615 0.00911 0.01149 ...
#>  $ compactness2      : num  0.049 0.0131 0.0401 0.0746 0.0246 ...
#>  $ concavity2        : num  0.0537 0.0186 0.0383 0.0566 0.0569 ...
#>  $ concave_points2   : num  0.0159 0.0134 0.0206 0.0187 0.0188 ...
#>  $ symmetry2         : num  0.03 0.0139 0.0225 0.0596 0.0176 ...
#>  $ fractal_dimension2: num  0.00619 0.00353 0.00457 0.00921 0.00511 ...
#>  $ radius3           : num  25.4 25 23.6 14.9 22.5 ...
#>  $ texture3          : num  17.3 23.4 25.5 26.5 16.7 ...
#>  $ perimeter3        : num  184.6 158.8 152.5 98.9 152.2 ...
#>  $ area3             : num  2019 1956 1709 568 1575 ...
#>  $ smoothness3       : num  0.162 0.124 0.144 0.21 0.137 ...
#>  $ compactness3      : num  0.666 0.187 0.424 0.866 0.205 ...
#>  $ concavity3        : num  0.712 0.242 0.45 0.687 0.4 ...
#>  $ concave_points3   : num  0.265 0.186 0.243 0.258 0.163 ...
#>  $ symmetry3         : num  0.46 0.275 0.361 0.664 0.236 ...
#>  $ fractal_dimension3: num  0.1189 0.089 0.0876 0.173 0.0768 ...

3.2 Scenario 1: Data with no missing values

3.2.1 Create the ClustAll object

obj_noNA <- createClustAll(data = data_use, colValidation = "Diagnosis",
                           nImputation = NULL, dataImputed = NULL)
#> The dataset contains character values.
#> They are converted to categorical (more than one class) or to binary (one class).
#> Before continuing, check that the transformation has been processed correctly.
#> 
#> ClustALL object created successfully. You can use runClustAll.
str(obj_noNA)
#> Formal class 'ClustAllObject' [package "ClustAll"] with 8 slots
#>   ..@ data              :'data.frame':   569 obs. of  30 variables:
#>   .. ..$ radius1           : num [1:569] 18 20.6 19.7 11.4 20.3 ...
#>   .. ..$ texture1          : num [1:569] 10.4 17.8 21.2 20.4 14.3 ...
#>   .. ..$ perimeter1        : num [1:569] 122.8 132.9 130 77.6 135.1 ...
#>   .. ..$ area1             : num [1:569] 1001 1326 1203 386 1297 ...
#>   .. ..$ smoothness1       : num [1:569] 0.1184 0.0847 0.1096 0.1425 0.1003 ...
#>   .. ..$ compactness1      : num [1:569] 0.2776 0.0786 0.1599 0.2839 0.1328 ...
#>   .. ..$ concavity1        : num [1:569] 0.3001 0.0869 0.1974 0.2414 0.198 ...
#>   .. ..$ concave_points1   : num [1:569] 0.1471 0.0702 0.1279 0.1052 0.1043 ...
#>   .. ..$ symmetry1         : num [1:569] 0.242 0.181 0.207 0.26 0.181 ...
#>   .. ..$ fractal_dimension1: num [1:569] 0.0787 0.0567 0.06 0.0974 0.0588 ...
#>   .. ..$ radius2           : num [1:569] 1.095 0.543 0.746 0.496 0.757 ...
#>   .. ..$ texture2          : num [1:569] 0.905 0.734 0.787 1.156 0.781 ...
#>   .. ..$ perimeter2        : num [1:569] 8.59 3.4 4.58 3.44 5.44 ...
#>   .. ..$ area2             : num [1:569] 153.4 74.1 94 27.2 94.4 ...
#>   .. ..$ smoothness2       : num [1:569] 0.0064 0.00522 0.00615 0.00911 0.01149 ...
#>   .. ..$ compactness2      : num [1:569] 0.049 0.0131 0.0401 0.0746 0.0246 ...
#>   .. ..$ concavity2        : num [1:569] 0.0537 0.0186 0.0383 0.0566 0.0569 ...
#>   .. ..$ concave_points2   : num [1:569] 0.0159 0.0134 0.0206 0.0187 0.0188 ...
#>   .. ..$ symmetry2         : num [1:569] 0.03 0.0139 0.0225 0.0596 0.0176 ...
#>   .. ..$ fractal_dimension2: num [1:569] 0.00619 0.00353 0.00457 0.00921 0.00511 ...
#>   .. ..$ radius3           : num [1:569] 25.4 25 23.6 14.9 22.5 ...
#>   .. ..$ texture3          : num [1:569] 17.3 23.4 25.5 26.5 16.7 ...
#>   .. ..$ perimeter3        : num [1:569] 184.6 158.8 152.5 98.9 152.2 ...
#>   .. ..$ area3             : num [1:569] 2019 1956 1709 568 1575 ...
#>   .. ..$ smoothness3       : num [1:569] 0.162 0.124 0.144 0.21 0.137 ...
#>   .. ..$ compactness3      : num [1:569] 0.666 0.187 0.424 0.866 0.205 ...
#>   .. ..$ concavity3        : num [1:569] 0.712 0.242 0.45 0.687 0.4 ...
#>   .. ..$ concave_points3   : num [1:569] 0.265 0.186 0.243 0.258 0.163 ...
#>   .. ..$ symmetry3         : num [1:569] 0.46 0.275 0.361 0.664 0.236 ...
#>   .. ..$ fractal_dimension3: num [1:569] 0.1189 0.089 0.0876 0.173 0.0768 ...
#>   ..@ dataOriginal      :'data.frame':   569 obs. of  30 variables:
#>   .. ..$ radius1           : num [1:569] 18 20.6 19.7 11.4 20.3 ...
#>   .. ..$ texture1          : num [1:569] 10.4 17.8 21.2 20.4 14.3 ...
#>   .. ..$ perimeter1        : num [1:569] 122.8 132.9 130 77.6 135.1 ...
#>   .. ..$ area1             : num [1:569] 1001 1326 1203 386 1297 ...
#>   .. ..$ smoothness1       : num [1:569] 0.1184 0.0847 0.1096 0.1425 0.1003 ...
#>   .. ..$ compactness1      : num [1:569] 0.2776 0.0786 0.1599 0.2839 0.1328 ...
#>   .. ..$ concavity1        : num [1:569] 0.3001 0.0869 0.1974 0.2414 0.198 ...
#>   .. ..$ concave_points1   : num [1:569] 0.1471 0.0702 0.1279 0.1052 0.1043 ...
#>   .. ..$ symmetry1         : num [1:569] 0.242 0.181 0.207 0.26 0.181 ...
#>   .. ..$ fractal_dimension1: num [1:569] 0.0787 0.0567 0.06 0.0974 0.0588 ...
#>   .. ..$ radius2           : num [1:569] 1.095 0.543 0.746 0.496 0.757 ...
#>   .. ..$ texture2          : num [1:569] 0.905 0.734 0.787 1.156 0.781 ...
#>   .. ..$ perimeter2        : num [1:569] 8.59 3.4 4.58 3.44 5.44 ...
#>   .. ..$ area2             : num [1:569] 153.4 74.1 94 27.2 94.4 ...
#>   .. ..$ smoothness2       : num [1:569] 0.0064 0.00522 0.00615 0.00911 0.01149 ...
#>   .. ..$ compactness2      : num [1:569] 0.049 0.0131 0.0401 0.0746 0.0246 ...
#>   .. ..$ concavity2        : num [1:569] 0.0537 0.0186 0.0383 0.0566 0.0569 ...
#>   .. ..$ concave_points2   : num [1:569] 0.0159 0.0134 0.0206 0.0187 0.0188 ...
#>   .. ..$ symmetry2         : num [1:569] 0.03 0.0139 0.0225 0.0596 0.0176 ...
#>   .. ..$ fractal_dimension2: num [1:569] 0.00619 0.00353 0.00457 0.00921 0.00511 ...
#>   .. ..$ radius3           : num [1:569] 25.4 25 23.6 14.9 22.5 ...
#>   .. ..$ texture3          : num [1:569] 17.3 23.4 25.5 26.5 16.7 ...
#>   .. ..$ perimeter3        : num [1:569] 184.6 158.8 152.5 98.9 152.2 ...
#>   .. ..$ area3             : num [1:569] 2019 1956 1709 568 1575 ...
#>   .. ..$ smoothness3       : num [1:569] 0.162 0.124 0.144 0.21 0.137 ...
#>   .. ..$ compactness3      : num [1:569] 0.666 0.187 0.424 0.866 0.205 ...
#>   .. ..$ concavity3        : num [1:569] 0.712 0.242 0.45 0.687 0.4 ...
#>   .. ..$ concave_points3   : num [1:569] 0.265 0.186 0.243 0.258 0.163 ...
#>   .. ..$ symmetry3         : num [1:569] 0.46 0.275 0.361 0.664 0.236 ...
#>   .. ..$ fractal_dimension3: num [1:569] 0.1189 0.089 0.0876 0.173 0.0768 ...
#>   ..@ dataImputed       : NULL
#>   ..@ dataValidation    : num [1:569] 2 2 2 2 2 2 2 2 2 2 ...
#>   ..@ nImputation       : num 0
#>   ..@ processed         : logi FALSE
#>   ..@ summary_clusters  : NULL
#>   ..@ JACCARD_DISTANCE_F: NULL

3.2.2 Execute the ClustAll algorithm

# Consider all the depths of the dendrogram; otherwise, set simplify to TRUE
obj_noNA1 <- runClustAll(Object = obj_noNA, threads = 2, simplify = FALSE)
#>       ______ __              __   ___     __     __
#>      / ____// /__  __ _____ / /_ /   |   / /    / /
#>     / /    / // / / // ___// __// /| |  / /    / /
#>    / /___ / // /_/ /(__  )/ /_ / ___ | / /___ / /___
#>   /_____//_/ |__,_//____/ |__//_/  |_|/_____//_____/
#> Running Data Complexity Reduction and Stratification Process.
#> This step may take some time...
#> 
#> 
#> Calculating correlation distance matrix of the statifications...
#> 
#> 
#> Filtering non-robust statifications...
#> 
#> ClustAll pipeline finished successfully!

3.2.3 Represent the Jaccard Distance between population-robust stratifications

plotJACCARD(Object = obj_noNA1, stratification_similarity = 0.88)

3.2.4 Retrieve stratification representatives

resStratification(Object = obj_noNA1, population = 0.05, 
                  stratification_similarity = 0.88, all = FALSE)
#> $cuts_a_9
#> $cuts_a_9[[1]]
#> 
#>   1   2 
#> 193 376 
#> 
#> 
#> $cuts_c_3
#> $cuts_c_3[[1]]
#> 
#>   1   2 
#> 199 370

3.2.5 Generate Sankey diagrams comparing pairs of stratifications, or a stratification with true labels

plotSANKEY(Object = obj_noNA1, clusters = c("cuts_c_3","cuts_a_9"), validationData = FALSE)

plotSANKEY(Object = obj_noNA1, clusters = c("cuts_c_3","cuts_b_13"), validationData = FALSE)

plotSANKEY(Object = obj_noNA1, clusters = c("cuts_a_9"), validationData = TRUE)

3.2.6 Retrieve the original dataset with the selected ClustAll stratification(s)

df <- cluster2data(Object = obj_noNA1,
                   stratificationName = c("cuts_c_3","cuts_a_9","cuts_b_13"))

3.2.7 Assess the sensitivity and specificity of the selected ClustAll stratifications against the true labels (if available)

# STRATIFICATION 1
validateStratification(obj_noNA1, "cuts_a_9")
#> sensitivity specificity 
#>   0.8396226   0.9579832

# STRATIFICATION 2
validateStratification(obj_noNA1, "cuts_b_13")
#> sensitivity specificity 
#>   0.8160377   0.9243697

# STRATIFICATION 3
validateStratification(obj_noNA1, "cuts_b_9")
#> sensitivity specificity 
#>   0.8679245   0.8991597

3.3 Scenario 2: Dataset with missing values and imputation performed within the ClustAll framework

3.3.1 Create the ClustAll object and compute imputation

data("BreastCancerWisconsinMISSING", package = "ClustAll")
data_use_NA <- wdbcNA
colSums(is.na(data_use_NA)) # the dataset contains NAs

obj_NA <- createClustAll(data_use_NA, nImputation = 2,  
                         colValidation = "Diagnosis") 
#> Before continuing, check that the transformation has been processed correctly.
#> Running default multiple imputation method.
#> For more information check mice package.
#> Warning: Number of logged events: 90
#> 
#> ClustALL object created successfully. You can use runClustAll.

3.3.2 The rest of the pipeline follows as in Scenario 1

obj_NA1 <- runClustAll(obj_NA, threads = 2) 

# Represent the Jaccard Distance between population-robust stratifications
plotJACCARD(Object = obj_NA1, stratification_similarity = 0.88)

# Retrieve stratification representatives
resStratification(Object = obj_NA1, population = 0.05, 
                  stratification_similarity = 0.88, all = FALSE)

# Generate Sankey diagrams comparing pairs of stratifications, or a stratification with true labels
plotSANKEY(Object = obj_NA1, clusters = c("cuts_a_2","cuts_a_8"), 
           validationData = FALSE)

plotSANKEY(Object = obj_NA1, clusters = c("cuts_a_2"), 
           validationData = TRUE)

# Retrieve the original dataset with the selected ClustAll stratification(s)
df_NA <- cluster2data(Object = obj_NA1, stratificationName = c("cuts_a_2"))

# Assess the sensitivity and specificity of the selected ClustAll
# stratifications against the true labels (if available)
validateStratification(obj_NA1, "cuts_a_2")

3.4 Scenario 3: Dataset with missing values and imputation performed externally

3.4.1 Compute the imputation externally

The imputation is performed outside ClustAll and the result is stored in a mids object. In this case, the mice package is used.

require(mice)
#> Loading required package: mice
#> 
#> Attaching package: 'mice'
#> The following object is masked from 'package:stats':
#> 
#>     filter
#> The following objects are masked from 'package:base':
#> 
#>     cbind, rbind
data("BreastCancerWisconsinMISSING", package = "ClustAll") # load example data
data_use_NA <- wdbcNA
str(data_use_NA)

# exclude the first column (Diagnosis) from the imputation
imp_data_use <- mice(data_use_NA[-1], m = 2, maxit = 5, seed = 1234, print = FALSE)
#> Warning: Number of logged events: 90

3.4.2 Create the ClustAll object

# dataImputed contains the mids object with the imputed data
obj_imp1 <- createClustAll(data=data_use_NA, dataImputed = imp_data_use, 
                           colValidation = "Diagnosis") 

3.4.3 The rest of the pipeline follows as in Scenario 1

# The rest of the pipeline follows as in Scenario 1
obj_imp1 <- runClustAll(obj_imp1, threads = 2) 

# Represent the Jaccard Distance between population-robust stratifications
plotJACCARD(Object = obj_imp1, stratification_similarity = 0.88)

# Retrieve stratification representatives
resStratification(Object = obj_imp1, population = 0.05, 
                  stratification_similarity = 0.88, all = FALSE)

# Generate Sankey diagrams comparing pairs of stratifications, or a stratification with true labels
plotSANKEY(Object = obj_imp1, clusters = c("cuts_a_2","cuts_a_22"), validationData = FALSE)

# Retrieve the original dataset with the selected ClustAll stratification(s)
df_imp <- cluster2data(Object = obj_imp1, stratificationName = c("cuts_a_2"))

# Validate stratification
validateStratification(obj_imp1, "cuts_a_2")

Session Info

sessionInfo()
#> R Under development (unstable) (2024-01-16 r85808)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 22.04.3 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.19-bioc/R/lib/libRblas.so 
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB              LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: America/New_York
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] mice_3.16.0      ClustAll_0.99.10 BiocStyle_2.31.0
#> 
#> loaded via a namespace (and not attached):
#>   [1] RColorBrewer_1.1-3    jsonlite_1.8.8        shape_1.4.6          
#>   [4] magrittr_2.0.3        magick_2.8.2          TH.data_1.1-2        
#>   [7] estimability_1.4.1    modeltools_0.2-23     jomo_2.7-6           
#>  [10] nloptr_2.0.3          rmarkdown_2.25        GlobalOptions_0.1.2  
#>  [13] vctrs_0.6.5           Cairo_1.6-2           minqa_1.2.6          
#>  [16] htmltools_0.5.7       broom_1.0.5           mitml_0.4-5          
#>  [19] sass_0.4.8            bslib_0.6.1           htmlwidgets_1.6.4    
#>  [22] sandwich_3.1-0        emmeans_1.10.0        zoo_1.8-12           
#>  [25] cachem_1.0.8          networkD3_0.4         igraph_2.0.1.1       
#>  [28] lifecycle_1.0.4       iterators_1.0.14      pkgconfig_2.0.3      
#>  [31] Matrix_1.6-5          R6_2.5.1              fastmap_1.1.1        
#>  [34] clue_0.3-65           digest_0.6.34         colorspace_2.1-0     
#>  [37] spatial_7.3-17        S4Vectors_0.41.3      ps_1.7.6             
#>  [40] rmio_0.4.0            fansi_1.0.6           compiler_4.4.0       
#>  [43] doParallel_1.0.17     backports_1.4.1       highr_0.10           
#>  [46] bigassertr_0.1.6      pan_1.9               MASS_7.3-60.2        
#>  [49] rjson_0.2.21          scatterplot3d_0.3-44  fBasics_4032.96      
#>  [52] flashClust_1.01-2     tools_4.4.0           bigstatsr_1.5.12     
#>  [55] prabclus_2.3-3        FactoMineR_2.9        nnet_7.3-19          
#>  [58] glue_1.7.0            stabledist_0.7-1      nlme_3.1-164         
#>  [61] bigparallelr_0.3.2    grid_4.4.0            cluster_2.1.6        
#>  [64] generics_0.1.3        snow_0.4-4            gtable_0.3.4         
#>  [67] flock_0.7             class_7.3-22          tidyr_1.3.1          
#>  [70] utf8_1.2.4            rmutil_1.1.10         flexmix_2.3-19       
#>  [73] BiocGenerics_0.49.1   ggrepel_0.9.5         foreach_1.5.2        
#>  [76] pillar_1.9.0          clValid_0.7           robustbase_0.99-2    
#>  [79] circlize_0.4.15       splines_4.4.0         dplyr_1.1.4          
#>  [82] lattice_0.22-5        survival_3.5-7        bit_4.0.5            
#>  [85] tidyselect_1.2.0      ComplexHeatmap_2.19.0 knitr_1.45           
#>  [88] IRanges_2.37.1        stats4_4.4.0          xfun_0.41            
#>  [91] diptest_0.77-0        timeDate_4032.109     matrixStats_1.2.0    
#>  [94] DEoptimR_1.1-3        DT_0.31               yaml_2.3.8           
#>  [97] boot_1.3-28.1         evaluate_0.23         codetools_0.2-19     
#> [100] kernlab_0.9-32        timeSeries_4032.109   tibble_3.2.1         
#> [103] BiocManager_1.30.22   multcompView_0.1-9    cli_3.6.2            
#> [106] rpart_4.1.23          xtable_1.8-4          munsell_0.5.0        
#> [109] jquerylib_0.1.4       Rcpp_1.0.12           doSNOW_1.0.20        
#> [112] stable_1.1.6          coda_0.19-4.1         png_0.1-8            
#> [115] parallel_4.4.0        modeest_2.4.0         ellipsis_0.3.2       
#> [118] leaps_3.1             ggplot2_3.4.4         mclust_6.0.1         
#> [121] lme4_1.1-35.1         ff_4.0.12             glmnet_4.1-8         
#> [124] mvtnorm_1.2-4         scales_1.3.0          statip_0.2.3         
#> [127] purrr_1.0.2           crayon_1.5.2          fpc_2.2-11           
#> [130] GetoptLong_1.0.5      rlang_1.1.3           cowplot_1.1.3        
#> [133] multcomp_1.4-25