About this vignette

This vignette describes how to compute F-informed multidimensional scaling (F-informed MDS) using the FinfoMDS package. FinfoMDS was developed by Soobin Kim. A proposal of the method and its full description can be found at:

The vignette was last updated in May 2025.

1 Introduction

Multidimensional scaling (MDS) is a dimensionality reduction technique used in microbial ecology to represent multivariate structure in a few dimensions while preserving pairwise distances between samples. Refinements of MDS have improved its ability to reveal patterns among sample groups, but these MDS-based methods often require prior assumptions for inference, which limits their broader application in general microbiome analysis.

Here, we introduce a new MDS-based ordination, F-informed MDS (implemented in the R package FinfoMDS), which configures the data distribution based on the F-statistic, that is, the ratio of dispersion between samples with different group labels to dispersion between samples that share a label. Our approach offers a well-founded refinement of MDS that aligns with statistical test results, which can be beneficial for broader compositional data analyses in microbiology and ecology.
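To make the F-statistic concrete, the sketch below computes a PERMANOVA-style pseudo-F directly from a distance matrix and a label vector. It assumes the usual sum-of-squares partition of squared distances; the function name pseudo_F is hypothetical and is not part of FinfoMDS, and the package may compute its statistic differently.

```r
# Hypothetical helper (not part of FinfoMDS): PERMANOVA-style pseudo-F
# computed from a distance object D and a vector of group labels y.
pseudo_F <- function(D, y) {
  d2 <- as.matrix(D)^2          # squared pairwise distances
  y  <- factor(y)
  N  <- nrow(d2)                # number of samples
  a  <- nlevels(y)              # number of groups
  stopifnot(length(y) == N, a > 1, N > a)

  # Total sum of squares: squared distances over all pairs i < j, divided by N
  ss_total <- sum(d2[upper.tri(d2)]) / N

  # Within-group sum of squares: same quantity inside each group,
  # divided by that group's size
  ss_within <- sum(vapply(levels(y), function(g) {
    idx <- which(y == g)
    sub <- d2[idx, idx, drop = FALSE]
    sum(sub[upper.tri(sub)]) / length(idx)
  }, numeric(1)))

  ss_between <- ss_total - ss_within
  (ss_between / (a - 1)) / (ss_within / (N - a))
}

# Toy check: for Euclidean distances on one variable, the pseudo-F
# coincides with the one-way ANOVA F statistic.
x_toy <- c(1, 2, 3, 7, 8, 9)
y_toy <- rep(c("a", "b"), each = 3)
pseudo_F(dist(x_toy), y_toy)
```

For this toy input the between-group sum of squares is 54 and the within-group sum of squares is 4, so the function returns 54.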

2 Installation

2.1 Bioconductor official release

To install an official release version of this package, start R (version “4.5”) and enter:

if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("FinfoMDS")

For older versions of R, please refer to the appropriate Bioconductor release.
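Before installing, it can help to confirm which R version is running and whether the package is already present. The following is a minimal sketch using base R only; the variable names are illustrative.

```r
# Report the running R version and compare it with the version named above.
r_version <- getRversion()
r_ok <- r_version >= "4.5.0"
message("Running R ", as.character(r_version),
        if (r_ok) " (4.5 or later)" else " (older than 4.5)")

# Check whether FinfoMDS is already installed, and report its version if so.
has_finfomds <- requireNamespace("FinfoMDS", quietly = TRUE)
if (has_finfomds) {
  message("FinfoMDS ", as.character(utils::packageVersion("FinfoMDS")),
          " is installed")
} else {
  message("FinfoMDS is not installed yet")
}
```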

2.2 GitHub development version

The package may be updated before any changes migrate to the official release. The development version can be installed by entering:

devtools::install_github("soob-kim/FinfoMDS")

3 Example

This section outlines the steps for applying the FinfoMDS package to a microbiome dataset and obtaining its 2D representation using F-informed MDS. As an example, let’s use an algal-associated bacterial community (Kim et al., 2022). First, load a phyloseq-class object by typing:

library(FinfoMDS)
data("microbiome", package = "FinfoMDS")

Next, compute the weighted UniFrac distance from this dataset and obtain its label set:

require(phyloseq)
#> Loading required package: phyloseq
D <- distance(microbiome, method = 'wunifrac') # requires phyloseq package
y <- microbiome@sam_data@.Data[[1]]
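The two inputs prepared above are a distance object and a label vector with one entry per sample. The sketch below builds a toy pair of the same shape with base R and checks that they agree in size; it uses Euclidean distance on simulated counts as a stand-in for weighted UniFrac, and all object names ending in _toy are illustrative.

```r
set.seed(1)

# Simulated abundance table: 6 samples (rows) by 4 taxa (columns)
abund_toy <- matrix(rpois(6 * 4, lambda = 10), nrow = 6,
                    dimnames = list(paste0("s", 1:6), paste0("taxon", 1:4)))

# Stand-in distance (Euclidean here, not UniFrac) and one label per sample
D_toy <- dist(abund_toy)
y_toy <- factor(rep(c("control", "treated"), each = 3))

# The distance object and the labels must describe the same samples
stopifnot(inherits(D_toy, "dist"),
          attr(D_toy, "Size") == length(y_toy))

# Number of samples per group
table(y_toy)
```

Running the same kind of check on D and y before the next step catches a mismatch between samples and labels early.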

Then, compute the F-informed MDS by running:

result <- fmds(lambda = 0.3, threshold_p = 0.05, D = D, y = y)
#> [1] "epoch 0   lambda 0.3   total 0.52   mds 0.45   conf 0.23   p_z 0.475   p_0 0.094"
#> [1] "epoch 1   lambda 0.3   total 0.35   mds 0.24   conf 0.37   p_z 0.416   p_0 0.094"
#> [1] "epoch 2   lambda 0.3   total 0.30   mds 0.23   conf 0.25   p_z 0.289   p_0 0.094"
#> [1] "epoch 3   lambda 0.3   total 0.27   mds 0.22   conf 0.16   p_z 0.172   p_0 0.094"
#> [1] "epoch 4   lambda 0.3   total 0.24   mds 0.23   conf 0.04   p_z 0.099   p_0 0.094"
#> [1] "Lambda 0.30 ...halt iteration"
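The progress lines printed above follow a regular "name value" pattern, so they can be turned into a table for inspection. The sketch below copies the five epoch lines from the output and parses them with base R; the column names are taken from the log as printed, and their precise meaning is left to the package documentation.

```r
# Epoch lines copied verbatim from the output above
log_lines <- c(
  "epoch 0   lambda 0.3   total 0.52   mds 0.45   conf 0.23   p_z 0.475   p_0 0.094",
  "epoch 1   lambda 0.3   total 0.35   mds 0.24   conf 0.37   p_z 0.416   p_0 0.094",
  "epoch 2   lambda 0.3   total 0.30   mds 0.23   conf 0.25   p_z 0.289   p_0 0.094",
  "epoch 3   lambda 0.3   total 0.27   mds 0.22   conf 0.16   p_z 0.172   p_0 0.094",
  "epoch 4   lambda 0.3   total 0.24   mds 0.23   conf 0.04   p_z 0.099   p_0 0.094"
)

# Split one line into alternating name/value tokens
parse_line <- function(line) {
  tokens <- strsplit(trimws(line), "\\s+")[[1]]
  values <- as.numeric(tokens[c(FALSE, TRUE)])   # even positions: values
  names(values) <- tokens[c(TRUE, FALSE)]        # odd positions: names
  values
}

log_df <- as.data.frame(do.call(rbind, lapply(log_lines, parse_line)))

# Absolute gap between the two reported p-values at each epoch
log_df$gap <- abs(log_df$p_z - log_df$p_0)
log_df[, c("epoch", "total", "p_z", "p_0", "gap")]
```

In this run the gap between p_z and p_0 shrinks from 0.381 at epoch 0 to 0.005 at epoch 4, the last epoch before the halt message.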

The fmds call iterates until the 2D distributions converge, as long as the p-value does not deviate by more than threshold_p, or until it reaches the default maximum of 100 iterations, whichever occurs first. In our experience, setting lambda between 0.3 and 0.5 typically works well; this hyperparameter can be adjusted, but it should not exceed 1.
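To compare several values of lambda, the call can be wrapped in a small helper. This is a sketch only: sweep_lambda is a hypothetical function, not part of FinfoMDS, and it relies solely on the fmds arguments shown in this vignette (lambda, threshold_p, D, y).

```r
# Hypothetical helper (not part of FinfoMDS): run fmds once per lambda value.
sweep_lambda <- function(D, y, lambdas = c(0.3, 0.4, 0.5), threshold_p = 0.05) {
  # The text above states that lambda should not exceed 1
  stopifnot(is.numeric(lambdas), all(lambdas <= 1))

  if (!requireNamespace("FinfoMDS", quietly = TRUE)) {
    stop("FinfoMDS is not installed")
  }

  fits <- lapply(lambdas, function(lam) {
    FinfoMDS::fmds(lambda = lam, threshold_p = threshold_p, D = D, y = y)
  })
  names(fits) <- paste0("lambda_", lambdas)
  fits
}
```

With the objects from this example, sweep_lambda(D, y) would return a named list with one result per lambda, for instance fits[["lambda_0.3"]], so the configurations can be plotted side by side.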

The 2D representation of the community dataset is returned as a matrix and can be visualized by typing:

plot(result, pch = y)
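For reference, a classical (unmodified) MDS of the same kind of input can be drawn with base R's cmdscale, which gives a baseline to compare against the F-informed configuration. The sketch below uses simulated data and Euclidean distance rather than the microbiome dataset; all object names are illustrative.

```r
set.seed(42)

# Two simulated groups of 10 samples each, 3 variables, shifted means
X_toy <- rbind(matrix(rnorm(30, mean = 0), ncol = 3),
               matrix(rnorm(30, mean = 2), ncol = 3))
group_toy <- rep(c(1, 2), each = 10)

# Classical MDS: a 2-column matrix of coordinates, one row per sample
coords <- cmdscale(dist(X_toy), k = 2)

# Color points by group and add a legend
plot(coords, col = group_toy, pch = 19, xlab = "Axis 1", ylab = "Axis 2")
legend("topright", legend = c("group 1", "group 2"), col = 1:2, pch = 19)
```

Because the F-informed result is also returned as a matrix of 2D coordinates, the same plot and legend calls can be reused on it.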

4 Reference

H Kim, JA Kimbrel, CA Vaiana, JR Wollard, X Mayali, and CR Buie (2022). Bacterial response to spatial gradients of algal-derived nutrients in a porous microplate. The ISME Journal, 16(4):1036–1045.

5 Session information

sessionInfo()
#> R version 4.5.0 (2025-04-11)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.2 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.22-bioc/R/lib/libRblas.so 
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB              LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: America/New_York
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] phyloseq_1.53.0  FinfoMDS_0.99.1  BiocStyle_2.37.0
#> 
#> loaded via a namespace (and not attached):
#>  [1] gtable_0.3.6        xfun_0.52           bslib_0.9.0        
#>  [4] ggplot2_3.5.2       rhdf5_2.53.1        Biobase_2.69.0     
#>  [7] lattice_0.22-7      rhdf5filters_1.21.0 vctrs_0.6.5        
#> [10] tools_4.5.0         generics_0.1.4      biomformat_1.37.0  
#> [13] stats4_4.5.0        parallel_4.5.0      tibble_3.3.0       
#> [16] cluster_2.1.8.1     pkgconfig_2.0.3     Matrix_1.7-3       
#> [19] data.table_1.17.4   RColorBrewer_1.1-3  S4Vectors_0.47.0   
#> [22] lifecycle_1.0.4     compiler_4.5.0      farver_2.1.2       
#> [25] stringr_1.5.1       Biostrings_2.77.1   tinytex_0.57       
#> [28] codetools_0.2-20    permute_0.9-7       GenomeInfoDb_1.45.4
#> [31] htmltools_0.5.8.1   sass_0.4.10         yaml_2.3.10        
#> [34] pillar_1.10.2       crayon_1.5.3        jquerylib_0.1.4    
#> [37] MASS_7.3-65         cachem_1.1.0        vegan_2.7-1        
#> [40] magick_2.8.7        iterators_1.0.14    foreach_1.5.2      
#> [43] nlme_3.1-168        tidyselect_1.2.1    digest_0.6.37      
#> [46] stringi_1.8.7       dplyr_1.1.4         reshape2_1.4.4     
#> [49] bookdown_0.43       splines_4.5.0       ade4_1.7-23        
#> [52] fastmap_1.2.0       grid_4.5.0          cli_3.6.5          
#> [55] magrittr_2.0.3      survival_3.8-3      dichromat_2.0-0.1  
#> [58] ape_5.8-1           UCSC.utils_1.5.0    scales_1.4.0       
#> [61] rmarkdown_2.29      XVector_0.49.0      httr_1.4.7         
#> [64] multtest_2.65.0     igraph_2.1.4        evaluate_1.0.3     
#> [67] knitr_1.50          IRanges_2.43.0      mgcv_1.9-3         
#> [70] rlang_1.1.6         Rcpp_1.0.14         glue_1.8.0         
#> [73] BiocManager_1.30.26 BiocGenerics_0.55.0 jsonlite_2.0.0     
#> [76] R6_2.6.1            Rhdf5lib_1.31.0     plyr_1.8.9