CITE-seq data with MuData and Seurat

Introduction

CITE-seq data provide RNA and surface protein counts for the same cells. There are different workflows to analyse these data in R such as with Seurat or with CiteFuse. This tutorial shows how such data stored in MuData (H5MU) files can be read and integrated with Seurat-based workflows.

Installation

The most recent MuDataSeurat build can be installed from GitHub:

library(remotes)
remotes::install_github("PMBio/MuDataSeurat")

Loading libraries

library(MuDataSeurat)
library(Seurat)

Loading data

For this tutorial, we will use a subset of cells and features from the output of the CITE-seq integration in muon.

This example dataset can be downloaded as a file in the H5MU format, which is then deserialised to create a Seurat object:

# Download file into a temporary directory
fdir <- tempdir()
mdata_path <- file.path(fdir, "minipbcite.h5mu")
download.file(url="https://github.com/gtca/h5xx-datasets/blob/main/datasets/minipbcite.h5mu?raw=true", destfile=mdata_path, mode="wb")

pbmc <- ReadH5MU(mdata_path)
#> Warning: Missing on read: /var. Seurat does not support global variables
#> metadata (feature_types, gene_ids, highly_variable).
#> Warning in read_layers_to_assay(h5[["mod"]][[mod]], mod): The var_names from
#> modality prot have been renamed as feature names cannot contain '_'. E.g.
#> CD3_TotalSeqB -> CD3-TotalSeqB.
#> Warning: No columnames present in cell embeddings, setting to 'MOFA_1:30'
#> Warning: Keys should be one or more alphanumeric characters followed by an
#> underscore, setting key from MOFA_UMAP_ to MOFAUMAP_
#> Warning: No columnames present in cell embeddings, setting to 'MOFAUMAP_1:2'
#> Warning: No columnames present in cell embeddings, setting to 'UMAP_1:2'
#> Warning: Keys should be one or more alphanumeric characters followed by an
#> underscore, setting key from WNN_UMAP_ to WNNUMAP_
#> Warning: No columnames present in cell embeddings, setting to 'WNNUMAP_1:2'
#> Warning: No columnames present in cell embeddings, setting to 'protPCA_1:31'
#> Warning: No columnames present in cell embeddings, setting to 'protUMAP_1:2'
#> Warning: No columnames present in cell embeddings, setting to 'rnaPCA_1:50'
#> Warning: No columnames present in cell embeddings, setting to 'rnaUMAP_1:2'
pbmc
#> An object of class Seurat 
#> 56 features across 411 samples within 2 assays 
#> Active assay: prot (29 features, 29 variable features)
#>  2 layers present: counts, data
#>  1 other assay present: rna
#>  8 dimensional reductions calculated: MOFA, MOFA_UMAP, UMAP, WNN_UMAP, protPCA, protUMAP, rnaPCA, rnaUMAP

Here, pbmc is a full-featured Seurat object that can be used in downstream analysis workflows.

Visualising data

Importantly, data can now be plotted with Seurat.

Idents(pbmc) <- "celltype"
DimPlot(pbmc, reduction = "WNN_UMAP")

Markers

Cells can be coloured by expression level and plotted in a selected latent space:

DefaultAssay(pbmc) <- "rna"
DimPlot(pbmc, reduction = "MOFA_UMAP", label = TRUE, repel = TRUE) + NoLegend() + 
FeaturePlot(pbmc, features = "MS4A1", reduction = "MOFA_UMAP")

RNA expression in different celltypes (genes in the subset were pre-selected to be celltype markers):

rna_features <- c("IRF8", "FCGR3A", "CD14", "MS4A1", "IGHD", 
                  "KLRC2", "NKG7", "CCL5", "CD8B", "IL2RA", "IL7R")
DotPlot(pbmc, features = rna_features) + RotatedAxis()

Naïve/memory T cell surface protein expression:

DefaultAssay(pbmc) <- "prot"
prot_features <- c("CD3-TotalSeqB", "CD45RA-TotalSeqB", "CD45RO-TotalSeqB")
t_cells <- c("CD4+ naïve T", "CD4+ memory T", "Treg", 
             "CD8+ naïve T", "CD8+ memory T")
VlnPlot(pbmc[,pbmc@meta.data$celltype %in% t_cells], 
        features = prot_features, ncol = 3)

Downstream analysis

Since pbmc is a Seurat object, we can use corresponding processing and analysis functions. To provide an example, we will re-calculate cell neighbourhoods using the previously computed (on the full dataset) MOFA factors and then compute a non-linear embedding of cells.

pbmc <- FindNeighbors(pbmc, reduction = "MOFA", dims = 1:30)
#> Computing nearest neighbor graph
#> Computing SNN

pbmc <- RunUMAP(pbmc, reduction = "MOFA", dims = 1:30, reduction.key = "rUMAP_", reduction.name = "rUMAP")
#> Warning: The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
#> To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
#> This message will be shown once per session
#> 09:22:06 UMAP embedding parameters a = 0.9922 b = 1.112
#> 09:22:06 Read 411 rows and found 30 numeric columns
#> 09:22:06 Using Annoy for neighbor search, n_neighbors = 30
#> 09:22:06 Building Annoy index with metric = cosine, n_trees = 50
#> 0%   10   20   30   40   50   60   70   80   90   100%
#> [----|----|----|----|----|----|----|----|----|----|
#> **************************************************|
#> 09:22:06 Writing NN index file to temp file /tmp/RtmpugF5DQ/file22267933ba83
#> 09:22:06 Searching Annoy index using 1 thread, search_k = 3000
#> 09:22:06 Annoy recall = 100%
#> 09:22:06 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
#> 09:22:07 Initializing from normalized Laplacian + noise (using RSpectra)
#> 09:22:07 Commencing optimization for 500 epochs, with 13582 positive edges
#> 09:22:07 Optimization finished

DimPlot(pbmc, reduction = "rUMAP")

References

Muon: multimodal omics analysis framework preprint
Comprehensive Integration of Single-Cell Data paper
mudata (Python) documentation
muon documentation and tutorials

Session Info

sessionInfo()
#> R version 4.3.2 (2023-10-31)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 22.04.3 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0
#> 
#> locale:
#>  [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
#>  [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
#>  [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
#> [10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   
#> 
#> time zone: UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] Seurat_5.0.0            SeuratObject_5.0.0      sp_2.1-1               
#> [4] MuDataSeurat_0.0.0.9000 BiocStyle_2.28.1       
#> 
#> loaded via a namespace (and not attached):
#>   [1] RColorBrewer_1.1-3     jsonlite_1.8.7         magrittr_2.0.3        
#>   [4] spatstat.utils_3.0-4   farver_2.1.1           rmarkdown_2.25        
#>   [7] fs_1.6.3               ragg_1.2.6             vctrs_0.6.4           
#>  [10] ROCR_1.0-11            memoise_2.0.1          spatstat.explore_3.2-5
#>  [13] htmltools_0.5.7        sass_0.4.7             sctransform_0.4.1     
#>  [16] parallelly_1.36.0      KernSmooth_2.23-22     bslib_0.5.1           
#>  [19] htmlwidgets_1.6.2      desc_1.4.2             ica_1.0-3             
#>  [22] plyr_1.8.9             plotly_4.10.3          zoo_1.8-12            
#>  [25] cachem_1.0.8           igraph_1.5.1           mime_0.12             
#>  [28] lifecycle_1.0.4        pkgconfig_2.0.3        Matrix_1.6-1.1        
#>  [31] R6_2.5.1               fastmap_1.1.1          fitdistrplus_1.1-11   
#>  [34] future_1.33.0          shiny_1.7.5.1          digest_0.6.33         
#>  [37] colorspace_2.1-0       patchwork_1.1.3        rprojroot_2.0.4       
#>  [40] tensor_1.5             RSpectra_0.16-1        irlba_2.3.5.1         
#>  [43] textshaping_0.3.7      labeling_0.4.3         progressr_0.14.0      
#>  [46] fansi_1.0.5            spatstat.sparse_3.0-3  httr_1.4.7            
#>  [49] polyclip_1.10-6        abind_1.4-5            compiler_4.3.2        
#>  [52] withr_2.5.2            bit64_4.0.5            fastDummies_1.7.3     
#>  [55] highr_0.10             MASS_7.3-60            tools_4.3.2           
#>  [58] lmtest_0.9-40          httpuv_1.6.12          future.apply_1.11.0   
#>  [61] goftest_1.2-3          glue_1.6.2             nlme_3.1-163          
#>  [64] promises_1.2.1         grid_4.3.2             Rtsne_0.16            
#>  [67] cluster_2.1.4          reshape2_1.4.4         generics_0.1.3        
#>  [70] hdf5r_1.3.8            gtable_0.3.4           spatstat.data_3.0-3   
#>  [73] tidyr_1.3.0            data.table_1.14.8      utf8_1.2.4            
#>  [76] spatstat.geom_3.2-7    RcppAnnoy_0.0.21       ggrepel_0.9.4         
#>  [79] RANN_2.6.1             pillar_1.9.0           stringr_1.5.0         
#>  [82] spam_2.10-0            RcppHNSW_0.5.0         later_1.3.1           
#>  [85] splines_4.3.2          dplyr_1.1.3            lattice_0.21-9        
#>  [88] bit_4.0.5              survival_3.5-7         deldir_1.0-9          
#>  [91] tidyselect_1.2.0       miniUI_0.1.1.1         pbapply_1.7-2         
#>  [94] knitr_1.45             gridExtra_2.3          bookdown_0.36         
#>  [97] scattermore_1.2        xfun_0.41              matrixStats_1.1.0     
#> [100] stringi_1.7.12         lazyeval_0.2.2         yaml_2.3.7            
#> [103] evaluate_0.23          codetools_0.2-19       tibble_3.2.1          
#> [106] BiocManager_1.30.22    cli_3.6.1              uwot_0.1.16           
#> [109] xtable_1.8-4           reticulate_1.34.0      systemfonts_1.0.5     
#> [112] munsell_0.5.0          jquerylib_0.1.4        Rcpp_1.0.11           
#> [115] globals_0.16.2         spatstat.random_3.2-1  png_0.1-8             
#> [118] parallel_4.3.2         ellipsis_0.3.2         pkgdown_2.0.7         
#> [121] ggplot2_3.4.4          dotCall64_1.1-0        listenv_0.9.0         
#> [124] viridisLite_0.4.2      scales_1.2.1           ggridges_0.5.4        
#> [127] leiden_0.4.3           purrr_1.0.2            rlang_1.1.2           
#> [130] cowplot_1.1.1

Danila Bredikhin

Ilia Kats

2023-11-14