CITE-seq data with MuData and Seurat
Danila Bredikhin
European Molecular Biology Laboratory, Heidelberg, Germanydanila.bredikhin@embl.de
Ilia Kats
German Cancer Research Center, Heidelberg, Germanyi.kats@dkfz-heidelberg.de
2023-11-14
PBMC-CITE-seq.Rmd
Introduction
CITE-seq data provide RNA and surface protein counts for the same cells. There are different workflows to analyse these data in R such as with Seurat or with CiteFuse. This tutorial shows how such data stored in MuData (H5MU) files can be read and integrated with Seurat-based workflows.
Installation
The most recent MuDataSeurat
build can be installed from
GitHub:
library(remotes)
remotes::install_github("PMBio/MuDataSeurat")
Loading data
For this tutorial, we will use a subset of cells and features from the output of the CITE-seq integration in muon.
This example dataset can be downloaded as a file in the H5MU format, which is then deserialised to create a Seurat object:
# Download file into a temporary directory
fdir <- tempdir()
mdata_path <- file.path(fdir, "minipbcite.h5mu")
download.file(url="https://github.com/gtca/h5xx-datasets/blob/main/datasets/minipbcite.h5mu?raw=true", destfile=mdata_path, mode="wb")
pbmc <- ReadH5MU(mdata_path)
#> Warning: Missing on read: /var. Seurat does not support global variables
#> metadata (feature_types, gene_ids, highly_variable).
#> Warning in read_layers_to_assay(h5[["mod"]][[mod]], mod): The var_names from
#> modality prot have been renamed as feature names cannot contain '_'. E.g.
#> CD3_TotalSeqB -> CD3-TotalSeqB.
#> Warning: No columnames present in cell embeddings, setting to 'MOFA_1:30'
#> Warning: Keys should be one or more alphanumeric characters followed by an
#> underscore, setting key from MOFA_UMAP_ to MOFAUMAP_
#> Warning: No columnames present in cell embeddings, setting to 'MOFAUMAP_1:2'
#> Warning: No columnames present in cell embeddings, setting to 'UMAP_1:2'
#> Warning: Keys should be one or more alphanumeric characters followed by an
#> underscore, setting key from WNN_UMAP_ to WNNUMAP_
#> Warning: No columnames present in cell embeddings, setting to 'WNNUMAP_1:2'
#> Warning: No columnames present in cell embeddings, setting to 'protPCA_1:31'
#> Warning: No columnames present in cell embeddings, setting to 'protUMAP_1:2'
#> Warning: No columnames present in cell embeddings, setting to 'rnaPCA_1:50'
#> Warning: No columnames present in cell embeddings, setting to 'rnaUMAP_1:2'
pbmc
#> An object of class Seurat
#> 56 features across 411 samples within 2 assays
#> Active assay: prot (29 features, 29 variable features)
#> 2 layers present: counts, data
#> 1 other assay present: rna
#> 8 dimensional reductions calculated: MOFA, MOFA_UMAP, UMAP, WNN_UMAP, protPCA, protUMAP, rnaPCA, rnaUMAP
Here, pbmc
is a full-featured Seurat object that can be
used in downstream analysis workflows.
Visualising data
Importantly, data can now be plotted with Seurat.
Markers
Cells can be coloured by expression level and plotted in a selected latent space:
DefaultAssay(pbmc) <- "rna"
DimPlot(pbmc, reduction = "MOFA_UMAP", label = TRUE, repel = TRUE) + NoLegend() +
FeaturePlot(pbmc, features = "MS4A1", reduction = "MOFA_UMAP")
RNA expression in different celltypes (genes in the subset were pre-selected to be celltype markers):
rna_features <- c("IRF8", "FCGR3A", "CD14", "MS4A1", "IGHD",
"KLRC2", "NKG7", "CCL5", "CD8B", "IL2RA", "IL7R")
DotPlot(pbmc, features = rna_features) + RotatedAxis()
Naïve/memory T cell surface protein expression:
DefaultAssay(pbmc) <- "prot"
prot_features <- c("CD3-TotalSeqB", "CD45RA-TotalSeqB", "CD45RO-TotalSeqB")
t_cells <- c("CD4+ naïve T", "CD4+ memory T", "Treg",
"CD8+ naïve T", "CD8+ memory T")
VlnPlot(pbmc[,pbmc@meta.data$celltype %in% t_cells],
features = prot_features, ncol = 3)
Downstream analysis
Since pbmc
is a Seurat object, we can use corresponding
processing and analysis functions. To provide an example, we will
re-calculate cell neighbourhoods using the previously computed (on the
full dataset) MOFA factors and then compute a non-linear embedding of
cells.
pbmc <- FindNeighbors(pbmc, reduction = "MOFA", dims = 1:30)
#> Computing nearest neighbor graph
#> Computing SNN
pbmc <- RunUMAP(pbmc, reduction = "MOFA", dims = 1:30, reduction.key = "rUMAP_", reduction.name = "rUMAP")
#> Warning: The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
#> To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
#> This message will be shown once per session
#> 09:22:06 UMAP embedding parameters a = 0.9922 b = 1.112
#> 09:22:06 Read 411 rows and found 30 numeric columns
#> 09:22:06 Using Annoy for neighbor search, n_neighbors = 30
#> 09:22:06 Building Annoy index with metric = cosine, n_trees = 50
#> 0% 10 20 30 40 50 60 70 80 90 100%
#> [----|----|----|----|----|----|----|----|----|----|
#> **************************************************|
#> 09:22:06 Writing NN index file to temp file /tmp/RtmpugF5DQ/file22267933ba83
#> 09:22:06 Searching Annoy index using 1 thread, search_k = 3000
#> 09:22:06 Annoy recall = 100%
#> 09:22:06 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
#> 09:22:07 Initializing from normalized Laplacian + noise (using RSpectra)
#> 09:22:07 Commencing optimization for 500 epochs, with 13582 positive edges
#> 09:22:07 Optimization finished
DimPlot(pbmc, reduction = "rUMAP")
References
mudata (Python) documentation
muon documentation and tutorials
Session Info
sessionInfo()
#> R version 4.3.2 (2023-10-31)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 22.04.3 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
#>
#> locale:
#> [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
#> [4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
#> [7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
#> [10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] Seurat_5.0.0 SeuratObject_5.0.0 sp_2.1-1
#> [4] MuDataSeurat_0.0.0.9000 BiocStyle_2.28.1
#>
#> loaded via a namespace (and not attached):
#> [1] RColorBrewer_1.1-3 jsonlite_1.8.7 magrittr_2.0.3
#> [4] spatstat.utils_3.0-4 farver_2.1.1 rmarkdown_2.25
#> [7] fs_1.6.3 ragg_1.2.6 vctrs_0.6.4
#> [10] ROCR_1.0-11 memoise_2.0.1 spatstat.explore_3.2-5
#> [13] htmltools_0.5.7 sass_0.4.7 sctransform_0.4.1
#> [16] parallelly_1.36.0 KernSmooth_2.23-22 bslib_0.5.1
#> [19] htmlwidgets_1.6.2 desc_1.4.2 ica_1.0-3
#> [22] plyr_1.8.9 plotly_4.10.3 zoo_1.8-12
#> [25] cachem_1.0.8 igraph_1.5.1 mime_0.12
#> [28] lifecycle_1.0.4 pkgconfig_2.0.3 Matrix_1.6-1.1
#> [31] R6_2.5.1 fastmap_1.1.1 fitdistrplus_1.1-11
#> [34] future_1.33.0 shiny_1.7.5.1 digest_0.6.33
#> [37] colorspace_2.1-0 patchwork_1.1.3 rprojroot_2.0.4
#> [40] tensor_1.5 RSpectra_0.16-1 irlba_2.3.5.1
#> [43] textshaping_0.3.7 labeling_0.4.3 progressr_0.14.0
#> [46] fansi_1.0.5 spatstat.sparse_3.0-3 httr_1.4.7
#> [49] polyclip_1.10-6 abind_1.4-5 compiler_4.3.2
#> [52] withr_2.5.2 bit64_4.0.5 fastDummies_1.7.3
#> [55] highr_0.10 MASS_7.3-60 tools_4.3.2
#> [58] lmtest_0.9-40 httpuv_1.6.12 future.apply_1.11.0
#> [61] goftest_1.2-3 glue_1.6.2 nlme_3.1-163
#> [64] promises_1.2.1 grid_4.3.2 Rtsne_0.16
#> [67] cluster_2.1.4 reshape2_1.4.4 generics_0.1.3
#> [70] hdf5r_1.3.8 gtable_0.3.4 spatstat.data_3.0-3
#> [73] tidyr_1.3.0 data.table_1.14.8 utf8_1.2.4
#> [76] spatstat.geom_3.2-7 RcppAnnoy_0.0.21 ggrepel_0.9.4
#> [79] RANN_2.6.1 pillar_1.9.0 stringr_1.5.0
#> [82] spam_2.10-0 RcppHNSW_0.5.0 later_1.3.1
#> [85] splines_4.3.2 dplyr_1.1.3 lattice_0.21-9
#> [88] bit_4.0.5 survival_3.5-7 deldir_1.0-9
#> [91] tidyselect_1.2.0 miniUI_0.1.1.1 pbapply_1.7-2
#> [94] knitr_1.45 gridExtra_2.3 bookdown_0.36
#> [97] scattermore_1.2 xfun_0.41 matrixStats_1.1.0
#> [100] stringi_1.7.12 lazyeval_0.2.2 yaml_2.3.7
#> [103] evaluate_0.23 codetools_0.2-19 tibble_3.2.1
#> [106] BiocManager_1.30.22 cli_3.6.1 uwot_0.1.16
#> [109] xtable_1.8-4 reticulate_1.34.0 systemfonts_1.0.5
#> [112] munsell_0.5.0 jquerylib_0.1.4 Rcpp_1.0.11
#> [115] globals_0.16.2 spatstat.random_3.2-1 png_0.1-8
#> [118] parallel_4.3.2 ellipsis_0.3.2 pkgdown_2.0.7
#> [121] ggplot2_3.4.4 dotCall64_1.1-0 listenv_0.9.0
#> [124] viridisLite_0.4.2 scales_1.2.1 ggridges_0.5.4
#> [127] leiden_0.4.3 purrr_1.0.2 rlang_1.1.2
#> [130] cowplot_1.1.1