The multimodal data format MuData has been introduced to address the need for a cross-platform standard for sharing large-scale multimodal omics data. Importantly, it builds on the ideas of, and is compatible with, the AnnData standard for storing raw and derived data for unimodal datasets.

In R, multimodal datasets can be stored in Seurat objects. The MuDataSeurat package demonstrates how data can be read from MuData (H5MU) files into Seurat objects, as well as how information from Seurat objects can be saved into H5MU files.


The most recent MuDataSeurat build can be installed from GitHub:


For the purpose of this tutorial, we will use SeuratData to obtain data in the form of Seurat objects:


Loading libraries
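The functions called in this vignette come from a handful of packages. A minimal setup sketch, assuming Seurat, SeuratData, MuDataSeurat and hdf5r are already installed and that these are the packages the later calls come from:

```r
# Assumed package set: each library() call backs functions used later on
library(MuDataSeurat)  # WriteH5MU(), ReadH5MU(), WriteH5AD(), ReadH5AD()
library(Seurat)        # UpdateSeuratObject(); also attaches SeuratObject (CreateAssayObject(), DefaultAssay())
library(SeuratData)    # example datasets such as cbmc and pbmc3k
library(hdf5r)         # H5File, for inspecting the HDF5 layout of the files directly
```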

Writing H5MU files

We’ll use a Seurat object distributed via SeuratData:

cbmc <- UpdateSeuratObject(cbmc)
#> Validating object structure
#> Updating object slots
#> Ensuring keys are in the proper structure
#> Warning: Assay RNA changing from Assay to Assay
#> Warning: Assay ADT changing from Assay to Assay
#> Ensuring keys are in the proper structure
#> Ensuring feature names don't have underscores or pipes
#> Updating slots in RNA
#> Updating slots in ADT
#> Validating object structure for Assay 'RNA'
#> Validating object structure for Assay 'ADT'
#> Object representation is consistent with the most current Seurat version
#> An object of class Seurat 
#> 20511 features across 8617 samples within 2 assays 
#> Active assay: RNA (20501 features, 0 variable features)
#>  2 layers present: counts, data
#>  1 other assay present: ADT

First, we make variable names unique across modalities:

# Append -ADT to feature names in the ADT assay (both the counts and data slots)
adt_counts <- cbmc[["ADT"]]@counts
rownames(adt_counts) <- paste(rownames(adt_counts), "ADT", sep = "-")
adt_data <- cbmc[["ADT"]]@data
rownames(adt_data) <- rownames(adt_counts)

# Rebuild the ADT assay from the renamed matrices
adt <- CreateAssayObject(counts = adt_counts)
adt@data <- adt_data

# Create a new Seurat object from the RNA assay and attach the renamed ADT assay
cbmc_u <- CreateSeuratObject(cbmc[["RNA"]])
cbmc_u[["ADT"]] <- adt
DefaultAssay(cbmc_u) <- "ADT"
#> An object of class Seurat 
#> 20511 features across 8617 samples within 2 assays 
#> Active assay: ADT (10 features, 0 variable features)
#>  2 layers present: counts, data
#>  1 other assay present: RNA
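The suffix matters because an H5MU file also keeps a global `var` table spanning all modalities (the top-level `var` group in the file listing below), so a feature name that occurs in more than one assay, for example a protein marker named like its gene, would make that global index non-unique. A minimal base-R sketch for detecting such collisions before writing; `find_shared_features()` is a hypothetical helper, not part of MuDataSeurat, and the feature vectors in the usage lines are toy values:

```r
# Hypothetical helper: return feature names that occur in more than one modality.
# `features_by_mod` is a named list of character vectors, one per assay.
find_shared_features <- function(features_by_mod) {
  # De-duplicate within each modality so only cross-modality repeats count
  per_mod <- lapply(features_by_mod, unique)
  all_names <- as.character(unlist(per_mod, use.names = FALSE))
  sort(unique(all_names[duplicated(all_names)]))
}

# Toy feature names (hypothetical), before and after adding the -ADT suffix
rna_features <- c("CD3E", "CD4", "MS4A1")
adt_features <- c("CD3", "CD4", "CD19")
find_shared_features(list(RNA = rna_features, ADT = adt_features))
# returns "CD4"
find_shared_features(list(RNA = rna_features,
                          ADT = paste(adt_features, "ADT", sep = "-")))
# returns character(0)
```

On the object built above, the same check can be run as `find_shared_features(list(RNA = rownames(cbmc_u[["RNA"]]), ADT = rownames(cbmc_u[["ADT"]])))` to confirm that no name is shared after renaming.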

We can then use WriteH5MU() to write the contents of the cbmc_u object to an H5MU file:

WriteH5MU(cbmc_u, "cbmc.h5mu")

Reading H5MU files

We can manually inspect the top level of the file structure with hdf5r:

h5 <- H5File$new("cbmc.h5mu", mode = "r")
#> Class: H5File
#> Filename: /home/runner/work/MuDataSeurat/MuDataSeurat/vignettes/cbmc.h5mu
#> Access type: H5F_ACC_RDONLY
#> Attributes: encoding-type, encoding-version, encoder, encoder-version
#> Listing:
#>  name  obj_type dataset.dims dataset.type_class
#>   mod H5I_GROUP         <NA>               <NA>
#>   obs H5I_GROUP         <NA>               <NA>
#>  obsp H5I_GROUP         <NA>               <NA>
#>   uns H5I_GROUP         <NA>               <NA>
#>   var H5I_GROUP         <NA>               <NA>

Or dig deeper into the file, for example into the /mod group, which holds one subgroup per modality:

h5[["mod"]]
#> Class: H5Group
#> Filename: /home/runner/work/MuDataSeurat/MuDataSeurat/vignettes/cbmc.h5mu
#> Group: /mod
#> Attributes: mod-order
#> Listing:
#>  name  obj_type dataset.dims dataset.type_class
#>   ADT H5I_GROUP         <NA>               <NA>
#>   RNA H5I_GROUP         <NA>               <NA>
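The same inspection can be wrapped in a small function that also reads the `mod-order` attribute (listed among the /mod attributes above) and closes the file handle afterwards. A minimal sketch using hdf5r; `describe_h5mu()` is a hypothetical helper, and the toy file in the usage lines only mimics the /mod layout shown above rather than being a complete H5MU file:

```r
library(hdf5r)

# Hypothetical helper: summarise an H5MU file's top-level groups and modalities
describe_h5mu <- function(path) {
  h5 <- H5File$new(path, mode = "r")
  # Close the file and every object opened from it, even if an error occurs
  on.exit(h5$close_all(), add = TRUE)
  mod <- h5[["mod"]]
  list(
    top_level  = as.character(h5$ls()$name),
    modalities = as.character(mod$ls()$name),
    # NULL when the /mod group carries no mod-order attribute
    mod_order  = h5attributes(mod)[["mod-order"]]
  )
}

# Toy file that only mimics the /mod layout shown above (not a full H5MU file)
toy <- tempfile(fileext = ".h5mu")
f <- H5File$new(toy, mode = "w")
mod <- f$create_group("mod")
invisible(mod$create_group("RNA"))
invisible(mod$create_group("ADT"))
invisible(mod$create_attr("mod-order", robj = c("RNA", "ADT")))
f$close_all()

describe_h5mu(toy)
```

Unlike the interactive `h5` handle opened above, the helper releases the file as soon as it returns; `describe_h5mu("cbmc.h5mu")` would apply it to the file written in this vignette.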

Creating Seurat objects from H5MU files

This package provides ReadH5MU() to create a Seurat object with data from an H5MU file. Since the H5MU structure has been designed to accommodate more structured information than a Seurat object can hold, only some of the data will be read. For instance, Seurat has no support for loading multimodal embeddings or pairwise graphs.

cbmc_r <- ReadH5MU("cbmc.h5mu")
#> An object of class Seurat 
#> 20511 features across 8617 samples within 2 assays 
#> Active assay: RNA (20501 features, 0 variable features)
#>  2 layers present: counts, data
#>  1 other assay present: ADT

Importantly, we recover the cell metadata of the original Seurat object; the first rows of the metadata before and after the round trip are identical:

#>                     orig.ident nCount_RNA nFeature_RNA nCount_ADT
#> CTGTTTACACCGCTAG SeuratProject      18224          910       1540
#> CTCTACGGTGTGGCTC SeuratProject      21210         1410       5216
#> AGCAGCCAGGCTCATT SeuratProject      19970         1007       1539
#> GAATAAGAGATCCCAT SeuratProject      21842          995       1007
#> GTGCATAGTCATGCAT SeuratProject      17679         1046       1642
#> TACACGACACATCCGG SeuratProject      18712          998       1164
#>                     orig.ident nCount_RNA nFeature_RNA nCount_ADT
#> CTGTTTACACCGCTAG SeuratProject      18224          910       1540
#> CTCTACGGTGTGGCTC SeuratProject      21210         1410       5216
#> AGCAGCCAGGCTCATT SeuratProject      19970         1007       1539
#> GAATAAGAGATCCCAT SeuratProject      21842          995       1007
#> GTGCATAGTCATGCAT SeuratProject      17679         1046       1642
#> TACACGACACATCCGG SeuratProject      18712          998       1164
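Rather than comparing the two tables by eye, the round trip can be checked programmatically. A minimal base-R sketch; `metadata_matches()` is a hypothetical helper that aligns cells by row name, compares only the columns both tables share, and treats a factor and a character vector with the same labels as equal (an assumption about how categorical columns may be decoded). The data frames in the usage lines are toy values:

```r
# Hypothetical helper: TRUE if every cell of `original` is present in `restored`
# and all columns the two metadata tables share hold the same values.
metadata_matches <- function(original, restored) {
  if (!all(rownames(original) %in% rownames(restored))) return(FALSE)
  # Align the restored table to the original cell order
  restored <- restored[rownames(original), , drop = FALSE]
  shared_cols <- intersect(colnames(original), colnames(restored))
  for (col in shared_cols) {
    x <- original[[col]]
    y <- restored[[col]]
    # Categorical columns may come back as factors or as character vectors
    if (is.factor(x)) x <- as.character(x)
    if (is.factor(y)) y <- as.character(y)
    if (!isTRUE(all.equal(x, y, check.attributes = FALSE))) return(FALSE)
  }
  TRUE
}

# Toy metadata (hypothetical): same cells in a different order, factor vs character
original <- data.frame(orig.ident = factor(c("A", "A")), nCount_RNA = c(10L, 20L),
                       row.names = c("c1", "c2"))
restored <- data.frame(orig.ident = c("A", "A"), nCount_RNA = c(20, 10),
                       row.names = c("c2", "c1"))
metadata_matches(original, restored)
# returns TRUE
```

For this vignette the call would be `metadata_matches(cbmc_u@meta.data, cbmc_r@meta.data)`. Because only shared columns are compared, columns present in one table but not the other are ignored rather than reported.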

H5AD files

If a Seurat object contains a single modality (assay), it can be written to an H5AD file.

For demonstration, we’ll use a Seurat object with scRNA-seq counts distributed via SeuratData:

pbmc3k <- UpdateSeuratObject(pbmc3k)
#> Validating object structure
#> Updating object slots
#> Ensuring keys are in the proper structure
#> Warning: Assay RNA changing from Assay to Assay
#> Ensuring keys are in the proper structure
#> Ensuring feature names don't have underscores or pipes
#> Updating slots in RNA
#> Validating object structure for Assay 'RNA'
#> Object representation is consistent with the most current Seurat version
#> An object of class Seurat 
#> 13714 features across 2700 samples within 1 assay 
#> Active assay: RNA (13714 features, 0 variable features)
#>  2 layers present: counts, data

We can use WriteH5AD() to write the contents of the pbmc3k object to an H5AD file since this dataset contains a single modality (assay):

WriteH5AD(pbmc3k, "pbmc3k.h5ad")
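The rule stated at the start of this section, that an object with a single assay can go to H5AD, can be encoded in a small dispatch helper that falls back to H5MU otherwise. A sketch under that assumption; `choose_output_format()` and `write_seurat()` are hypothetical helpers wrapping the WriteH5AD() and WriteH5MU() calls used in this vignette, and `Assays()` is taken from SeuratObject:

```r
# Hypothetical helper: pick a file format from the assay names of a Seurat object.
# A single assay fits one AnnData (H5AD); several assays need MuData (H5MU).
choose_output_format <- function(assay_names) {
  stopifnot(is.character(assay_names), length(assay_names) >= 1)
  if (length(assay_names) == 1) "h5ad" else "h5mu"
}

# Hypothetical wrapper: writes `object` to "<stem>.h5ad" or "<stem>.h5mu"
write_seurat <- function(object, stem) {
  fmt <- choose_output_format(SeuratObject::Assays(object))
  path <- paste0(stem, ".", fmt)
  if (fmt == "h5ad") {
    MuDataSeurat::WriteH5AD(object, path)
  } else {
    MuDataSeurat::WriteH5MU(object, path)
  }
  invisible(path)
}

choose_output_format("RNA")             # returns "h5ad"
choose_output_format(c("RNA", "ADT"))   # returns "h5mu"
```

With the objects from this vignette, `write_seurat(pbmc3k, "pbmc3k")` would produce pbmc3k.h5ad and `write_seurat(cbmc_u, "cbmc")` would produce cbmc.h5mu.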

The data can be read back from the H5AD file with ReadH5AD():

pbmc3k_r <- ReadH5AD("pbmc3k.h5ad")
#> An object of class Seurat 
#> 13714 features across 2700 samples within 1 assay 
#> Active assay: RNA (13714 features, 0 variable features)
#>  2 layers present: counts, data


Session Info

