Getting started with MuData for Seurat
Danila Bredikhin
European Molecular Biology Laboratory, Heidelberg, Germanydanila.bredikhin@embl.de
Ilia Kats
German Cancer Research Center, Heidelberg, Germanyi.kats@dkfz-heidelberg.de
2023-11-14
Getting-Started.Rmd
Introduction
Multimodal data format — MuData — has been introduced to address the need for cross-platform standard for sharing large-scale multimodal omics data. Importantly, it develops ideas of and is compatible with AnnData standard for storing raw and derived data for unimodal datasets.
In R, multimodal datasets can be stored in Seurat objects. This
MuDataSeurat
package demonstrates how data can be read from
MuData files (H5MU) into Seurat objects as well as how information from
Seurat objects can be saved into H5MU files.
Installation
The most recent MuDataSeurat
build can be installed from
GitHub:
library(remotes)
remotes::install_github("PMBio/MuDataSeurat")
For the purpose of this tutorial, we will use SeuratData
to obtain data in the form of Seurat objects:
devtools::install_github('satijalab/seurat-data')
Loading libraries
library(MuDataSeurat)
library(Seurat)
suppressWarnings(library(SeuratData))
library(hdf5r)
Writing H5MU files
We’ll use a Seurat object distributed via
SeuratData
:
suppressWarnings(InstallData("cbmc"))
#> Installing package into '/home/runner/work/_temp/Library'
#> (as 'lib' is unspecified)
data("cbmc")
cbmc <- UpdateSeuratObject(cbmc)
#> Validating object structure
#> Updating object slots
#> Ensuring keys are in the proper structure
#> Warning: Assay RNA changing from Assay to Assay
#> Warning: Assay ADT changing from Assay to Assay
#> Ensuring keys are in the proper structure
#> Ensuring feature names don't have underscores or pipes
#> Updating slots in RNA
#> Updating slots in ADT
#> Validating object structure for Assay 'RNA'
#> Validating object structure for Assay 'ADT'
#> Object representation is consistent with the most current Seurat version
cbmc
#> An object of class Seurat
#> 20511 features across 8617 samples within 2 assays
#> Active assay: RNA (20501 features, 0 variable features)
#> 2 layers present: counts, data
#> 1 other assay present: ADT
First, we make variable names unique across modalities:
# Append -ADT to feature names in the ADT assay
adt_counts <- cbmc[["ADT"]]@counts
rownames(adt_counts) <- paste(rownames(adt_counts), "ADT", sep = "-")
adt_data <- cbmc[["ADT"]]@data
rownames(adt_data) <- rownames(adt_counts)
adt <- CreateAssayObject(counts = adt_counts)
adt@data <- adt_data
cbmc_u <- CreateSeuratObject(cbmc[["RNA"]])
cbmc_u[["ADT"]] <- adt
DefaultAssay(cbmc_u) <- "ADT"
cbmc_u
#> An object of class Seurat
#> 20511 features across 8617 samples within 2 assays
#> Active assay: ADT (10 features, 0 variable features)
#> 2 layers present: counts, data
#> 1 other assay present: RNA
We can then use WriteH5MU()
to write the contents of the
cbmc
object to an H5MU
file:
WriteH5MU(cbmc_u, "cbmc.h5mu")
Reading H5MU files
We can manually check the top level of the structure of the file:
h5 <- H5File$new("cbmc.h5mu", mode = "r")
h5
#> Class: H5File
#> Filename: /home/runner/work/MuDataSeurat/MuDataSeurat/vignettes/cbmc.h5mu
#> Access type: H5F_ACC_RDONLY
#> Attributes: encoding-type, encoding-version, encoder, encoder-version
#> Listing:
#> name obj_type dataset.dims dataset.type_class
#> mod H5I_GROUP <NA> <NA>
#> obs H5I_GROUP <NA> <NA>
#> obsp H5I_GROUP <NA> <NA>
#> uns H5I_GROUP <NA> <NA>
#> var H5I_GROUP <NA> <NA>
Or dig deeper into the file:
h5[["mod"]]
#> Class: H5Group
#> Filename: /home/runner/work/MuDataSeurat/MuDataSeurat/vignettes/cbmc.h5mu
#> Group: /mod
#> Attributes: mod-order
#> Listing:
#> name obj_type dataset.dims dataset.type_class
#> ADT H5I_GROUP <NA> <NA>
#> RNA H5I_GROUP <NA> <NA>
h5$close()
Creating Seurat objects from H5MU files
This package provides ReadH5MU
to create an object with
data from an H5MU file. Since H5MU structure has been designed to
accommodate more structured information than Seurat, only some data will
be read. For instance, Seurat has no support for loading multimodal
embeddings or pairwise graphs.
cbmc_r <- ReadH5MU("cbmc.h5mu")
cbmc_r
#> An object of class Seurat
#> 20511 features across 8617 samples within 2 assays
#> Active assay: RNA (20501 features, 0 variable features)
#> 2 layers present: counts, data
#> 1 other assay present: ADT
Importantly, we recover the information from the original Seurat object:
head(cbmc_u@meta.data[,1:4])
#> orig.ident nCount_RNA nFeature_RNA nCount_ADT
#> CTGTTTACACCGCTAG SeuratProject 18224 910 1540
#> CTCTACGGTGTGGCTC SeuratProject 21210 1410 5216
#> AGCAGCCAGGCTCATT SeuratProject 19970 1007 1539
#> GAATAAGAGATCCCAT SeuratProject 21842 995 1007
#> GTGCATAGTCATGCAT SeuratProject 17679 1046 1642
#> TACACGACACATCCGG SeuratProject 18712 998 1164
head(cbmc_r@meta.data[,1:4])
#> orig.ident nCount_RNA nFeature_RNA nCount_ADT
#> CTGTTTACACCGCTAG SeuratProject 18224 910 1540
#> CTCTACGGTGTGGCTC SeuratProject 21210 1410 5216
#> AGCAGCCAGGCTCATT SeuratProject 19970 1007 1539
#> GAATAAGAGATCCCAT SeuratProject 21842 995 1007
#> GTGCATAGTCATGCAT SeuratProject 17679 1046 1642
#> TACACGACACATCCGG SeuratProject 18712 998 1164
H5AD files
If a Seurat
object contains a single modality (assay),
it can be written to an H5AD file.
For demonstration, we’ll use a Seurat object with scRNA-seq counts
distributed via SeuratDisk
:
suppressWarnings(InstallData("pbmc3k"))
#> Installing package into '/home/runner/work/_temp/Library'
#> (as 'lib' is unspecified)
data("pbmc3k")
pbmc3k <- UpdateSeuratObject(pbmc3k)
#> Validating object structure
#> Updating object slots
#> Ensuring keys are in the proper structure
#> Warning: Assay RNA changing from Assay to Assay
#> Ensuring keys are in the proper structure
#> Ensuring feature names don't have underscores or pipes
#> Updating slots in RNA
#> Validating object structure for Assay 'RNA'
#> Object representation is consistent with the most current Seurat version
pbmc3k
#> An object of class Seurat
#> 13714 features across 2700 samples within 1 assay
#> Active assay: RNA (13714 features, 0 variable features)
#> 2 layers present: counts, data
We can use WriteH5AD()
to write the contents of the
pbmc3k
object to an H5AD
file since this
dataset contains a single modality (assay):
WriteH5AD(pbmc3k, "pbmc3k.h5ad")
This data can be retrieved from an H5AD file with
ReadH5AD
:
pbmc3k_r <- ReadH5AD("pbmc3k.h5ad")
pbmc3k_r
#> An object of class Seurat
#> 13714 features across 2700 samples within 1 assay
#> Active assay: RNA (13714 features, 0 variable features)
#> 2 layers present: counts, data
References
mudata (Python) documentation
muon documentation and web page
Session Info
sessionInfo()
#> R version 4.3.2 (2023-10-31)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 22.04.3 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
#>
#> locale:
#> [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
#> [4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
#> [7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
#> [10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] pbmc3k.SeuratData_3.1.4 cbmc.SeuratData_3.1.4 hdf5r_1.3.8
#> [4] SeuratData_0.2.2.9001 Seurat_5.0.0 SeuratObject_5.0.0
#> [7] sp_2.1-1 MuDataSeurat_0.0.0.9000 BiocStyle_2.28.1
#>
#> loaded via a namespace (and not attached):
#> [1] RColorBrewer_1.1-3 jsonlite_1.8.7 magrittr_2.0.3
#> [4] spatstat.utils_3.0-4 rmarkdown_2.25 fs_1.6.3
#> [7] ragg_1.2.6 vctrs_0.6.4 ROCR_1.0-11
#> [10] memoise_2.0.1 spatstat.explore_3.2-5 htmltools_0.5.7
#> [13] sass_0.4.7 sctransform_0.4.1 parallelly_1.36.0
#> [16] KernSmooth_2.23-22 bslib_0.5.1 htmlwidgets_1.6.2
#> [19] desc_1.4.2 ica_1.0-3 plyr_1.8.9
#> [22] plotly_4.10.3 zoo_1.8-12 cachem_1.0.8
#> [25] igraph_1.5.1 mime_0.12 lifecycle_1.0.4
#> [28] pkgconfig_2.0.3 Matrix_1.6-1.1 R6_2.5.1
#> [31] fastmap_1.1.1 fitdistrplus_1.1-11 future_1.33.0
#> [34] shiny_1.7.5.1 digest_0.6.33 colorspace_2.1-0
#> [37] patchwork_1.1.3 rprojroot_2.0.4 tensor_1.5
#> [40] RSpectra_0.16-1 irlba_2.3.5.1 textshaping_0.3.7
#> [43] progressr_0.14.0 fansi_1.0.5 spatstat.sparse_3.0-3
#> [46] httr_1.4.7 polyclip_1.10-6 abind_1.4-5
#> [49] compiler_4.3.2 bit64_4.0.5 fastDummies_1.7.3
#> [52] MASS_7.3-60 rappdirs_0.3.3 tools_4.3.2
#> [55] lmtest_0.9-40 httpuv_1.6.12 future.apply_1.11.0
#> [58] goftest_1.2-3 glue_1.6.2 nlme_3.1-163
#> [61] promises_1.2.1 grid_4.3.2 Rtsne_0.16
#> [64] cluster_2.1.4 reshape2_1.4.4 generics_0.1.3
#> [67] gtable_0.3.4 spatstat.data_3.0-3 tidyr_1.3.0
#> [70] data.table_1.14.8 utf8_1.2.4 spatstat.geom_3.2-7
#> [73] RcppAnnoy_0.0.21 ggrepel_0.9.4 RANN_2.6.1
#> [76] pillar_1.9.0 stringr_1.5.0 spam_2.10-0
#> [79] RcppHNSW_0.5.0 later_1.3.1 splines_4.3.2
#> [82] dplyr_1.1.3 lattice_0.21-9 bit_4.0.5
#> [85] survival_3.5-7 deldir_1.0-9 tidyselect_1.2.0
#> [88] miniUI_0.1.1.1 pbapply_1.7-2 knitr_1.45
#> [91] gridExtra_2.3 bookdown_0.36 scattermore_1.2
#> [94] xfun_0.41 matrixStats_1.1.0 stringi_1.7.12
#> [97] lazyeval_0.2.2 yaml_2.3.7 evaluate_0.23
#> [100] codetools_0.2-19 tibble_3.2.1 BiocManager_1.30.22
#> [103] cli_3.6.1 uwot_0.1.16 xtable_1.8-4
#> [106] reticulate_1.34.0 systemfonts_1.0.5 munsell_0.5.0
#> [109] jquerylib_0.1.4 Rcpp_1.0.11 globals_0.16.2
#> [112] spatstat.random_3.2-1 png_0.1-8 parallel_4.3.2
#> [115] ellipsis_0.3.2 pkgdown_2.0.7 ggplot2_3.4.4
#> [118] dotCall64_1.1-0 listenv_0.9.0 viridisLite_0.4.2
#> [121] scales_1.2.1 ggridges_0.5.4 crayon_1.5.2
#> [124] leiden_0.4.3 purrr_1.0.2 rlang_1.1.2
#> [127] cowplot_1.1.1