单细胞之轨迹分析

轨迹分析系列：

单细胞之轨迹分析-1：RNA velocity

单细胞之轨迹分析-2：monocle2 原理解读+实操

Monocle3和Monocle2并没有本质上的区别，只是把降维图从DDRTree改成了UMAP。原因可能是包的作者认为UMAP比DDRTree降维更能反映高维空间的数据。

拟时分析的原理见：Trajectory inference analysis of scRNA-seq data

Monocle2的原理和应用已经介绍过：monocle2

monocle3的三个主要功能：

1. 分群、计数细胞

2. 构建细胞轨迹

3. 差异表达分析

monocle3的工作流程：

Monocle3的官网：https://cole-trapnell-lab.github.io/monocle3/

1. 安装

if (!requireNamespace("BiocManager", quietly = TRUE))install.packages("BiocManager")BiocManager::install(version = "3.10")BiocManager::install(c('BiocGenerics', 'DelayedArray', 'DelayedMatrixStats', 'limma', 'S4Vectors', 'SingleCellExperiment', 'SummarizedExperiment', 'batchelor', 'Matrix.utils'))install.packages("devtools")devtools::install_github('cole-trapnell-lab/leidenbase')devtools::install_github('cole-trapnell-lab/monocle3')

2. 数据准备，创建CDS对象并进行降维。

注意：该数据集使用的是pbmc3k的数据集，由于pbmc都是分化成熟的免疫细胞，理论上并不存在直接的分化关系，因此不适合用来做拟时轨迹分析。这里仅作为学习演示。

library(Seurat)library(monocle3)library(tidyverse)library(patchwork)rm(list=ls())dir.create("Monocle3")setwd("Monocle3")##创建CDS对象并预处理数据pbmc <- readRDS("pbmc.rds")data <- GetAssayData(pbmc, assay = 'RNA', slot = 'counts')cell_metadata <- pbmc@meta.datagene_annotation <- data.frame(gene_short_name = rownames(data))rownames(gene_annotation) <- rownames(data)cds <- new_cell_data_set(data, cell_metadata = cell_metadata, gene_metadata = gene_annotation)

3. 预处理

3.1 标准化和PCA降维

（RNA-seq是使用PCA，如果是处理ATAC-seq的数据用Latent Semantic Indexing)

#⚠️preprocess_cds函数相当于seurat中NormalizeData+ScaleData+RunPCAcds <- preprocess_cds(cds, num_dim = 50)plot_pc_variance_explained(cds)

这里横轴的50个PC就是preprocess_cds()中num_dim 设置的50个PC，如果这里设置的100，图的横轴也会展示100个PC。

3.2 可视化

umap降维

cds <- reduce_dimension(cds,preprocess_method = "PCA") #preprocess_method默认是PCAplot_cells(cds)

color_cells_by参数设置umap图的颜色，可以是colData(cds)中的任何一列。

colnames(colData(cds))[1] "orig.ident" "nCount_RNA" [3] "nFeature_RNA" "percent.mt" [5] "RNA_snn_res.0.5" "seurat_clusters"[7] "cell_type" "Size_Factor"

#以之前的Seurat分群来添加颜色，和原有的Seurat分群对比p1 <- plot_cells(cds, reduction_method="UMAP", color_cells_by="seurat_clusters") + ggtitle('cds.umap')##从seurat导入整合过的umap坐标cds.embed <- cds@int_colData$reducedDims$UMAPint.embed <- Embeddings(pbmc, reduction = "umap")int.embed <- int.embed[rownames(cds.embed),]cds@int_colData$reducedDims$UMAP <- int.embedp2 <- plot_cells(cds, reduction_method="UMAP", color_cells_by="seurat_clusters") + ggtitle('int.umap')p = p1|p2ggsave("Reduction_Compare.pdf", plot = p, width = 10, height = 5)

左图是monocle3重新降维的结果，右图是之前seurat降维的结果

如果细胞数目特别多（>10,000细胞或更多），可以设置一些参数来加快UMAP运行速度。在reduce_dimension()函数中设置umap.fast_sgd=TRUE可以使用随机梯度下降方法（fast stochastic gradient descent method）加速运行。还可以使用cores参数设置多线程运算。

可视化指定基因

ciliated_genes <- c("CD4","CD52","JUN")plot_cells(cds, genes=ciliated_genes, label_cell_groups=FALSE, show_trajectory_graph=FALSE)

也可以使用tSNE降维

cds <- reduce_dimension(cds, reduction_method="tSNE")plot_cells(cds, reduction_method="tSNE", color_cells_by="seurat_clusters")

随后也可使用Monocle3分cluster，鉴定每个cluster的marker基因并进行细胞注释等等。由于在Seurat的操作中已经对数据进行了注释，就不再使用Monocle3进行这些操作。

plot_cells(cds, reduction_method="UMAP", color_cells_by="cell_type")

4. Cluster your cells

这里的cluster其实是做分区，不同分区的细胞会进行单独的轨迹分析。

cds <- cluster_cells(cds)plot_cells(cds, color_cells_by = "partition")

5. 构建细胞轨迹

5.1 轨迹学习Learn the trajectory graph（使用learn_graph()函数）

## 识别轨迹cds <- learn_graph(cds)p = plot_cells(cds, color_cells_by = "cell_type", label_groups_by_cluster=FALSE, label_leaves=FALSE, label_branch_points=FALSE)ggsave("Trajectory.pdf", plot = p, width = 8, height = 6)

上面这个图将被用于许多下游分析，比如分支分析和差异表达分析。

plot_cells(cds, color_cells_by = "cell_type", label_groups_by_cluster=FALSE,+ label_leaves=TRUE, label_branch_points=TRUE,graph_label_size=1.5)

黑色的线显示的是graph的结构。数字带白色圆圈表示不同的结局，也就是叶子。数字带黑色圆圈代表分叉点，从这个点开始，细胞可以有多个结局。这些数字可以通过label_leaves和label_branch_points参数设置。

5.2 细胞按拟时排序

在学习了graph之后，我们就可以根据学习的发育轨迹（拟时序）排列细胞。

为了对细胞进行排序，我们首先需要告诉Monocle哪里是这个过程的起始点。也就是需要指定轨迹的'roots'。

手动选择root

# 解决order_cells(cds)报错"object 'V1' not found"# rownames(cds@principal_graph_aux[["UMAP"]]$dp_mst) <- NULL# colnames(cds@int_colData@listData$reducedDims@listData$UMAP) <- NULLcds <- order_cells(cds)

运行上面的代码后，会跳出这个窗口。可以手动在图上选择一个位置，然后点击Done。（比如图上我选择的红点。）可以选择多个位置。

p = plot_cells(cds, color_cells_by = "pseudotime", label_cell_groups = FALSE, label_leaves = FALSE, label_branch_points = FALSE)ggsave("Trajectory_Pseudotime.pdf", plot = p, width = 8, height = 6)saveRDS(cds, file = "cds.rds")

以刚刚选择的点为root绘制的轨迹图。注意图上有一些灰色的部分，是因为这些细胞和我们选择的起点没有分化关系，因此这部分细胞的拟时轨迹没有被定义。

6. 差异表达分析

There are two approaches for differential analysis in Monocle:

Regression analysis: using fit_models(), you can evaluate whether each gene depends on variables such as time, treatments, etc.

Graph-autocorrelation analysis: using graph_test(), you can find genes that vary over a trajectory or between clusters.

6.1 寻找拟时轨迹差异基因

#graph_test分析最重要的结果是莫兰指数（morans_I），其值在-1至1之间，0代表此基因没有#空间共表达效应，1代表此基因在空间距离相近的细胞中表达值高度相似。Track_genes <- graph_test(cds, neighbor_graph="principal_graph", cores=6)Track_genes <- Track_genes[,c(5,2,3,4,1,6)] %>% filter(q_value < 1e-3)write.csv(Track_genes, "Trajectory_genes.csv", row.names = F)

6.2 挑选top10画图展示

Track_genes_sig <- Track_genes %>% top_n(n=10, morans_I) %>% pull(gene_short_name) %>% as.character()

基因表达趋势图

p <- plot_genes_in_pseudotime(cds[Track_genes_sig,], color_cells_by="seurat_clusters", min_expr=0.5, ncol = 2)ggsave("Genes_Jitterplot.pdf", plot = p, width = 8, height = 6)

FeaturePlot图

p <- plot_cells(cds, genes=Track_genes_sig, show_trajectory_graph=FALSE, label_cell_groups=FALSE, label_leaves=FALSE)p$facet$params$ncol <- 5ggsave("Genes_Featureplot.pdf", plot = p, width = 20, height = 8)

寻找共表达基因模块

Track_genes <- read.csv("Trajectory_genes.csv")genelist <- pull(Track_genes, gene_short_name) %>% as.character()gene_module <- find_gene_modules(cds[genelist,], resolution=1e-1, cores = 6)write.csv(gene_module, "Genes_Module.csv", row.names = F)cell_group <- tibble::tibble(cell=row.names(colData(cds)), cell_group=colData(cds)$seurat_clusters)agg_mat <- aggregate_gene_expression(cds, gene_module, cell_group)row.names(agg_mat) <- stringr::str_c("Module ", row.names(agg_mat))p <- pheatmap::pheatmap(agg_mat, scale="column", clustering_method="ward.D2")ggsave("Genes_Module.pdf", plot = p, width = 8, height = 8)

提取拟时分析结果返回seurat对象

pseudotime <- pseudotime(cds, reduction_method = 'UMAP')pseudotime <- pseudotime[rownames(pbmc@meta.data)]pbmc$pseudotime <- pseudotimep = FeaturePlot(pbmc, reduction = "umap", features = "pseudotime")# pseudotime中有无限值，无法绘图。ggsave("Pseudotime_Seurat.pdf", plot = p, width = 8, height = 6)saveRDS(pbmc, file = "sco_pseudotime.rds")

本站仅提供存储服务，所有内容均由用户发布，如发现有害或侵权内容，请点击举报。