▌multiMiR基本信息
multiMiR包发布于2014年7月,当前版本于2020年4月更新,收录4 个外部数据库,其中8 个为预测的 miRNA-靶基因相互作用关系(DIANA-microT-CDS、ElMMo、MicroCosm、miRanda、miRDB、PicTar、PITA 和 TargetScan),其中有3 个为实验验证的 miRNA-靶基因相互作用关系(miRecords、miRTarBase 和TarBase),以及3个包含药物/疾病等相关信息的数据库(miR2Disease、Pharmaco-miR 和 PhenomiR),为基于miRNA-靶基因调控关系的疾病发病机制及诊疗相关研究提供极大便利。
首先我们安装并加载multiMiR包,资源来自于BiocManager网站,详细可参考:https://www.bioconductor.org/packages/release/bioc/vignettes/multiMiR/inst/doc/multiMiR.html。
##### 了解multiMiR包 #####
# 安装包
BiocManager::install("multiMiR")
# 加载包
library(multiMiR)
# Welcome to multiMiR.
#
# multiMiR database URL has been set to the
# default value: http://multimir.org/
#
# Database Version: 2.3.0 Updated: 2020-04-15
#
# Warning message:
# 程辑包'multiMiR’是用R版本4.1.1 来建造的
关于multiMiR包版本:
multimir_dbInfoVersions()函数查看历史版本,当前2.3.0版本为2020年4月15号更新;
multimir_switchDBVersion()函数切换版本。
## 关于R包版本
db.ver <- multimir_dbInfoVersions() # 查看历史版本
db.ver
# VERSION UPDATED RDA DBNAME SCHEMA PUBLIC
# 1 2.3.0 2020-04-15 multimir_cutoffs_2.3.rda multimir2_3 multiMiR_DB_schema.sql 1
# 2 2.2.0 2017-08-08 multimir_cutoffs_2.2.rda multimir2_2 multiMiR_DB_schema.sql 1
# 3 2.1.0 2016-12-22 multimir_cutoffs_2.1.rda multimir2_1 multiMiR_DB_schema.sql 1
# 4 2.0.0 2015-05-01 multimir_cutoffs.rda multimir multiMiR_DB_schema.sql 1
multimir_switchDBVersion(db_version <- "2.0.0") # 切换版本
# Now using database version: 2.0.0
curr_vers <- db.ver[1, "VERSION"]
multimir_switchDBVersion(db_version <- curr_vers) # 切换版本
# Now using database version: 2.3.0
关于外部数据库:
multimirdbTables()函数查看14个外部数据库,multimirdbInfo()函数查看外部数据库版本信息;
predictedtables() 、validatedtables() 和diseasedrug_tables() 函数分别查看查看网站预测、实验验证的和收录疾病/药物相关性miRNA与靶基因关系数据库;
reversetablelookup("targetscan") 查看targetscan数据库类型为predicted;
multimir_dbCount() 函数查看数据库规模,版本multiMiR包含近 7000 万条记录,其中来自人类、小鼠和大鼠的 5830 个 miRNA 和 97186 个靶基因,以及 64 种药物和 223 个疾病术语。
db.tables <- multimir_dbTables() # 14个外部数据库
db.tables
# [1] "diana_microt" "elmmo" "map_counts" "map_metadata" "microcosm"
# [6] "mir2disease" "miranda" "mirdb" "mirecords" "mirna"
# [11] "mirtarbase" "pharmaco_mir" "phenomir" "pictar" "pita"
# [16] "tarbase" "target" "targetscan"
db.info <- multimir_dbInfo() # 外部数据库版本
db.info
# map_name source_name source_version source_date
# 1 diana_microt DIANA-microT 5 Sept, 2013
# 2 elmmo EIMMo 5 Jan, 2011
# 3 microcosm MicroCosm 5 Sept, 2009
# 4 mir2disease miR2Disease Mar 14, 2011
# 5 miranda miRanda Aug, 2010
# 6 mirdb miRDB 6 June, 2019
# 7 mirecords miRecords 4 Apr 27, 2013
# 8 mirtarbase miRTarBase 7.0 Sept, 2017
# 9 pharmaco_mir Pharmaco-miR (Verified Sets)
# 10 phenomir PhenomiR 2 Feb 15, 2011
# 11 pictar PicTar 2 Dec 21, 2012
# 12 pita PITA 6 Aug 31, 2008
# 13 tarbase TarBase 8 2018
# 14 targetscan TargetScan 7.2 March, 2018
predicted_tables() # 查看网站预测的miRNA-靶基因关系数据库
# [1] "diana_microt" "elmmo" "microcosm" "miranda" "mirdb" "pictar"
# [7] "pita" "targetscan"
validated_tables() # 查看实验验证的miRNA-靶基因关系数据库
# [1] "mirecords" "mirtarbase" "tarbase"
diseasedrug_tables() # 查看疾病/药物相关性miRNA-靶基因关系数据库
# [1] "mir2disease" "pharmaco_mir" "phenomir"
reverse_table_lookup("targetscan") # 查看数据库类型
# [1] "predicted"
db.count <- multimir_dbCount() # 查看数据库规模
db.count
# map_name human_count mouse_count rat_count total_count
# 1 diana_microt 7664602 3747171 0 11411773
# 2 elmmo 3959112 1449133 547191 5955436
# 3 microcosm 762987 534735 353378 1651100
# 4 mir2disease 2875 0 0 2875
# 5 miranda 5429955 2379881 247368 8057204
# 6 mirdb 1990425 1091263 199250 3280938
# 7 mirecords 2425 449 171 3045
# 8 mirtarbase 544588 50673 652 595913
# 9 pharmaco_mir 308 5 0 313
# 10 phenomir 15138 491 0 15629
# 11 pictar 404066 302236 0 706302
# 12 pita 7710936 5163153 0 12874089
# 13 tarbase 433048 209831 1307 644186
# 14 targetscan 13906497 10442093 0 24348590
apply(db.count[,-1], 2, sum) # 当前版本multiMiR包含近 7000 万条记录
# human_count mouse_count rat_count total_count
# 42826962 25371114 1349317 69547393
## 查看multiMiR收录的miRNA、gene、drug和disease
# 当前版本multiMiR收录来自人类、小鼠和大鼠的 5830 个 miRNA 和 97186 个靶基因,以及 64 种药物和 223 个疾病术语
miRNAs <- list_multimir("mirna", limit <- 10)
genes <- list_multimir("gene", limit <- 10)
drugs <- list_multimir("drug", limit <- 10)
diseases <- list_multimir("disease", limit <- 10)
head(miRNAs)
head(genes)
head(drugs)
head(diseases)
▌multiMiR应用示例
getmultimir()函数是multiMiR包的主体,首先运行 ??getmultimir 查看帮助文档。
关于几个参数介绍一下:
get_multimir(url = NULL, org = "hsa", mirna = NULL, target = NULL,
disease.drug = NULL, table = "validated", predicted.cutoff = NULL,
predicted.cutoff.type = "p", predicted.site = "conserved",
summary = FALSE, add.link = FALSE, use.tibble = FALSE, limit = NULL,
legacy.out = FALSE)
org:表示物种来源,默认为人类。即org = "hsa" "human"或 "Homo Sapiens",其他两个物种是小鼠 ("mmu", "mouse"或"Mus musculus")和大鼠 ("rno", "rat"或"Rattus norvegicus")。
mirna:表示待检索的miRNA,默认为NULL。输入miRNA可以是miRNA accession number (i.e. "MIMAT0000072"), mature miRNA ID (i.e. "hsa-miR-199a-3p"), 或二者都有 (i.e. c("MIMAT0000065", "hsa-miR-30a-5p")),可以是单个miRNA或miRNA列表。
target:表示待检索靶基因,默认为NULL。输入基因可以是gene symbol (i.e. c("TP53", "KRAS")),Entrez gene ID (i.e. c(578, 3845))、Ensembl gene ID (i.e. "ENSG00000171791"),或三者都有 (i.e. c("TP53", 3845, "ENSG00000171791")),可以是单个基因或基因列表。
disease.drug:表示检索疾病/药物相关miRNA与靶基因关系,默认为NULL。输入参数可以是 disease(s) 和/或 drug(s) (i.e. c("bladder cancer", "cisplatin"))。
table:表示检索的数据库类型。默认validated,即实验验证的miRNA与靶基因关系数据库("mirecords", "mirtarbase", and "tarbase"), 还可以是predicted,即网站预测的miRNA与靶基因关系数据库("dianamicrot", "elmmo", "microcosm", "miranda", "mirdb", "pictar", "pita", and "targetscan"), 或者是disease.drug,即疾病/药物相关性miRNA与靶基因关系数据库 ( "mir2disease", "pharmacomir", and "phenomir"), 若输入all表示检索以上全部数据库,或者输入某个数据库名称进行单个数据库检索。
predicted.cutoff和predicted.cutoff.type:表示数据库检索范围,默认为NULL,即若predicted.cutoff.type="p",检索范围为预测评分TOP20%,若predicted.cutoff.type="n",检索范围为预测评分TOP300000。
predicted.site:表示predicted数据库中检索保守("conserved")或非保守("nonconserved")位点,或不限制保守或非保守位点("all")。
summary:表示是否统计检索结果,默认FALSE。
1
以hsa-miR-182-5p为例,检索经实验验证的miRNA与靶基因调控关系,检索结果example1为S4格式文件,目录下data为检索结果列表。
## 检索给定miRNA所有经过验证的靶基因
# 以hsa-miR-182-5p为例
example1 <- get_multimir(org = "hsa", # 物种来源 hsa/mmu/rno
mirna = 'hsa-miR-182-5p',
table = "validated", # validated/predicted/all
summary = TRUE) # 检索结果汇总
# Searching mirecords ...
# Searching mirtarbase ...
# Searching tarbase ...
table(example1@data$type) # 查看检索结果 4057条信息
# validated
# 4057
example1_result <- example1@data # 提取data数据,检索结果列表
head(example1_result)
# database mature_mirna_acc mature_mirna_id target_symbol target_entrez
# 1 mirecords MIMAT0000259 hsa-miR-182-5p ADCY6 112
# 2 mirecords MIMAT0000259 hsa-miR-182-5p ADCY6 112
# 3 mirecords MIMAT0000259 hsa-miR-182-5p ADCY6 112
# 4 mirecords MIMAT0000259 hsa-miR-182-5p ADCY6 112
# 5 mirecords MIMAT0000259 hsa-miR-182-5p ADCY6 112
# 6 mirecords MIMAT0000259 hsa-miR-182-5p ADCY6 112
# target_ensembl experiment support_type pubmed_id type
# 1 ENSG00000174233 17597072 validated
# 2 ENSG00000174233 17597072 validated
# 3 ENSG00000174233 17597072 validated
# 4 ENSG00000174233 17597072 validated
# 5 ENSG00000174233 17597072 validated
# 6 ENSG00000174233 17597072 validated
检索结果列表提供hsa-miR-182-5p靶基因gene symbol、entrez ID和ensembl ID,以及来源数据库、实验方法和参考文献等信息。
接下来统计一下检索结果,mirecords数据库来源信息12条,mirtarbase数据库来源信息260条,tarbase数据库来源信息3785条,总计不重复结果3732条。
# 统计一下检索结果
example1_sum <- example1@summary # 提取summary数据
head(example1_sum)
# mature_mirna_acc mature_mirna_id target_symbol target_entrez target_ensembl mirecords
# 1 MIMAT0000259 hsa-miR-182-5p ADCY6 112 ENSG00000174233 6
# 2 MIMAT0000259 hsa-miR-182-5p MITF 4286 ENSG00000187098 4
# 3 MIMAT0000259 hsa-miR-182-5p AMMECR1L 83607 ENSG00000144233 0
# 4 MIMAT0000259 hsa-miR-182-5p ARRDC3 57561 ENSG00000113369 0
# 5 MIMAT0000259 hsa-miR-182-5p ATF1 466 ENSG00000123268 0
# 6 MIMAT0000259 hsa-miR-182-5p ATP13A3 79572 ENSG00000133657 0
# mirtarbase tarbase validated.sum all.sum
# 1 2 1 3 3
# 2 2 1 3 3
# 3 1 1 2 2
# 4 1 2 2 2
# 5 1 1 2 2
# 6 1 1 2 2
apply(example1@summary[, 6:10], 2, sum)
# mirecords mirtarbase tarbase validated.sum all.sum
# 12 260 3785 3732 3732
筛选出经过荧光素酶报告实验验证的结果,60个。
# 筛选经荧光素酶报告实验验证的结果
example1_Lucifer <- example1_result[grep("Luciferase",
example1_result[, "experiment"]), ]
head(example1_Lucifer)
# database mature_mirna_acc mature_mirna_id target_symbol target_entrez
# 16 mirtarbase MIMAT0000259 hsa-miR-182-5p ADCY6 112
# 17 mirtarbase MIMAT0000259 hsa-miR-182-5p ADCY6 112
# 18 mirtarbase MIMAT0000259 hsa-miR-182-5p ATF1 466
# 21 mirtarbase MIMAT0000259 hsa-miR-182-5p BARD1 580
# 22 mirtarbase MIMAT0000259 hsa-miR-182-5p BCL2 596
# 25 mirtarbase MIMAT0000259 hsa-miR-182-5p BDNF 627
# target_ensembl experiment
# 16 ENSG00000174233 Luciferase reporter assay
# 17 ENSG00000174233 Luciferase reporter assay//qRT-PCR
# 18 ENSG00000123268 Luciferase reporter assay
# 21 ENSG00000138376 Luciferase reporter assay
# 22 ENSG00000171791 Flow//Luciferase reporter assay//qRT-PCR//Western blot
# 25 ENSG00000176697 Luciferase reporter assay//qRT-PCR
# support_type pubmed_id type
# 16 Functional MTI 17597072 validated
# 17 Functional MTI 20656788 validated
# 18 Functional MTI 23249749 validated
# 21 Functional MTI 23249749 validated
# 22 Functional MTI 22848417 validated
# 25 Functional MTI 25955435 validated
接下来,将上述3个数据库检索结果绘制Veen图,3个数据库检索结果取交集获得2个基因,即"ADCY6"和"MITF"。
# 提取3个数据检索得到的target_symbol,绘制Veen图
library(VennDiagram)
library(tidyverse)
db <- unique(example1_result$database)
row1 <- example1_result %>% filter(database == db[1])
row2 <- example1_result %>% filter(database == db[2])
row3 <- example1_result %>% filter(database == db[3])
venn.plot <-venn.diagram(
x = list(target1=unique(row1$target_symbol),
target2=unique(row2$target_symbol),
target3=unique(row3$target_symbol)),
filename ="Venn1.tif",
lty ="dotted",
lwd =0.5,
col ="black",
fill =c("dodgerblue", "goldenrod1", "darkorange1"),
alpha =0.60,
cat.col =c("dodgerblue", "goldenrod1", "darkorange1"),
cat.cex =1,
cat.fontface="bold",
margin =0.05,
cex =1)
# 获取交集
venn_list <- list(target1=unique(row1$target_symbol),
target2=unique(row2$target_symbol),
target3=unique(row3$target_symbol))
for (i in 1:length(venn_list)) {
if(i == 1){interGenes <- venn_list[[1]]}
else{interGenes <- intersect(interGenes, venn_list[[i]])}
}
interGenes
# [1] "ADCY6" "MITF"
2
以检索与顺铂有关的人类miRNA-靶基因相互作用关系为例,定义isease.drug参数为顺铂"cisplatin",table 参数为'disease.drug',结果有45个。
# 检索与顺铂有关的人类miRNA-靶基因相互作用关系
example2 <- get_multimir(org = "hsa",
disease.drug = 'cisplatin', # 顺铂
table = 'disease.drug',
summary = TRUE)
# Searching mir2disease ...
# Searching pharmaco_mir ...
# Searching phenomir ...
table(example2@data$type) # 查看检索结果
# disease.drug
# 45
example2_result <- example2@data
head(example2_result)
# database mature_mirna_acc mature_mirna_id target_symbol target_entrez
# 1 pharmaco_mir MIMAT0000772 hsa-miR-345-5p ABCC1 4363
# 2 pharmaco_mir MIMAT0000720 hsa-miR-376c-3p ALK7
# 3 pharmaco_mir MIMAT0000423 hsa-miR-125b-5p BAK1 578
# 4 pharmaco_mir hsa-miR-34 BCL2 596
# 5 pharmaco_mir MIMAT0000318 hsa-miR-200b-3p BCL2 596
# 6 pharmaco_mir MIMAT0000617 hsa-miR-200c-3p BCL2 596
# target_ensembl disease_drug paper_pubmedID type
# 1 ENSG00000103222 cisplatin 20099276 disease.drug
# 2 cisplatin 21224400 disease.drug
# 3 ENSG00000030110 cisplatin 21823019 disease.drug
# 4 ENSG00000171791 cisplatin 18803879 disease.drug
# 5 ENSG00000171791 cisplatin 21993663 disease.drug
# 6 ENSG00000171791 cisplatin 21993663 disease.drug
3
以检索网站预测的小鼠中靶向调节Gnb1基因的miRNA为例,检索范围设置为预测评分TOP 35%,检索结果绘制upset展示。
# 检索网站预测的小鼠中靶向调节Gnb1基因的miRNA
example3 <- get_multimir(org = "mmu", # 物种来源 hsa/mmu/rno
target = "Gnb1", # 目标靶基因
table = "predicted", # 预测的miRNA-target
summary = TRUE,
predicted.cutoff = 35, # 检索范围为预测评分TOP 35%
predicted.cutoff.type = "p", # 检索范围为预测评分TOP 35%
predicted.site = "all") # conserved/nonconserved/all
# Searching diana_microt ...
# Searching elmmo ...
# Searching microcosm ...
# Searching miranda ...
# Searching mirdb ...
# Searching pictar ...
# Searching pita ...
# Searching targetscan ...
table(example3@data$type) # 查看检索结果
# predicted
# 716
example3_result <- example3@data
head(example3_result)
# database mature_mirna_acc mature_mirna_id target_symbol target_entrez
# 1 diana_microt MIMAT0000663 mmu-miR-218-5p Gnb1 14688
# 2 diana_microt MIMAT0017276 mmu-miR-493-5p Gnb1 14688
# 3 diana_microt MIMAT0000656 mmu-miR-139-5p Gnb1 14688
# 4 diana_microt MIMAT0014946 mmu-miR-3074-2-3p Gnb1 14688
# 5 diana_microt MIMAT0000144 mmu-miR-132-3p Gnb1 14688
# 6 diana_microt MIMAT0020608 mmu-miR-5101 Gnb1 14688
# target_ensembl score type
# 1 ENSMUSG00000029064 0.975 predicted
# 2 ENSMUSG00000029064 0.964 predicted
# 3 ENSMUSG00000029064 0.96 predicted
# 4 ENSMUSG00000029064 0.921 predicted
# 5 ENSMUSG00000029064 0.92 predicted
# 6 ENSMUSG00000029064 0.918 predicted
example3_sum <- example3@summary # 统计检索结果
head(example3_sum)
apply(example3@summary[, 6:13], 2, sum)
# diana_microt elmmo microcosm miranda mirdb pictar
# 105 108 5 49 59 18
# pita targetscan
# 175 197
获得8个数据库预测结果,接下来绘制upset图,首先准备upset绘图文件,如下图所示。
# 准备upset绘图文件
db2 <- unique(example3_result$database)
row1 <- example3_result %>% filter(database == db2[1])
row2 <- example3_result %>% filter(database == db2[2])
row3 <- example3_result %>% filter(database == db2[3])
row4 <- example3_result %>% filter(database == db2[4])
row5 <- example3_result %>% filter(database == db2[5])
row6 <- example3_result %>% filter(database == db2[6])
row7 <- example3_result %>% filter(database == db2[7])
row8 <- example3_result %>% filter(database == db2[8])
data3 <- cbind(unique(row1$mature_mirna_id),
unique(row2$mature_mirna_id),
unique(row3$mature_mirna_id),
unique(row4$mature_mirna_id),
unique(row5$mature_mirna_id),
unique(row6$mature_mirna_id),
unique(row7$mature_mirna_id),
unique(row8$mature_mirna_id)) %>%
as.data.frame()
colnames(data3) <- db2
# 构建函数
Upsetdata <- function(data){
require(tidyverse)
Genes <- data %>%
do.call(c,.) %>%
unique()
upset0 = matrix(NA, nrow = length(Genes), ncol = length(data))%>%
as.data.frame() %>%
`rownames<-`(Genes) %>%
rownames_to_column("Gene")
colnames(upset0)<-c("Gene",names(data))
for (i in 2:c(length(data)+1)) {
upset0[match(data[[i-1]],upset0[["Gene"]]),i] = 1
upset0[is.na(upset0[[i]]),i] <- 0
}
return(upset0)
}
upset_data <- Upsetdata(data3)
接下来绘制upset图,需调用UpSetR包。
# 绘制upset图
library(UpSetR)
upset(upset_data,
nsets =13, # 绘制全部13个基因集 默认绘制前5个基因集
keep.order=TRUE, # 保持输入文件列表中基因顺序
point.size =2,
matrix.color=c("#BC80BD"), # 点线图颜色
line.size =0.3,
matrix.dot.alpha = 0.5, # 非交集点透明度
shade.color = "blue", #背景颜色
shade.alpha = 0.1) #背景透明度
4
以示例数据进行演示,示例数据DE.miRNA.up包含 9 个上调miRNA,DE.entrez.dn有47个下调的基因,加载方式为load(url("http://multimir.org/bladder.rda"))。
# 加载示例数据
# DE.miRNA.up包含 9 个上调miRNA,DE.entrez.dn有47个下调的基因
load(url("http://multimir.org/bladder.rda"))
View(DE.gene.info)
View(DE.miRNA.info)
以已知miRNA列表预测靶基因为例,结果14个数据库预测结果中,实验验证的有24806个,网站预测的有37331个,疾病/药物相关的有451个。
# 已知miRNA列表预测靶基因
example4 <- get_multimir(org = "hsa",
mirna = DE.miRNA.up,
table = "all", # 检索14个数据库
summary = TRUE,
predicted.cutoff.type = "p",
predicted.cutoff = 10,
use.tibble = TRUE)
# Searching mirecords ...
# Searching mirtarbase ...
# Searching tarbase ...
# Searching diana_microt ...
# Searching elmmo ...
# Searching microcosm ...
# Searching miranda ...
# Searching mirdb ...
# Searching pictar ...
# Searching pita ...
# Searching targetscan ...
# Searching pharmaco_mir ...
# Joining, by = c("database", "mature_mirna_acc", "mature_mirna_id", "target_symbol", "target_entrez", "target_ensembl", "type")
# Joining, by = c("database", "mature_mirna_acc", "mature_mirna_id", "target_symbol", "target_entrez", "target_ensembl", "type")
save(example4, file = "example4.Rdata") # 时间较长,建议及时保存检索结果
load(file = "example4.Rdata")
table(example4@data$type)
# disease.drug predicted validated
# 451 37331 24806
example4_result <- example4@data
head(example4_result)
example4_sum <- example4@summary
head(example4_sum)
apply(example4@summary[, 6:23], 2, sum)
# diana_microt elmmo microcosm mir2disease miranda mirdb
# 9015 18378 696 114 4048 2722
# mirecords mirtarbase pharmaco_mir phenomir pictar pita
# 115 4175 9 328 25 1656
# tarbase targetscan disease.sum predicted.sum validated.sum all.sum
# 20516 791 22 21626 22970 45847
# 已知mRNA列表检索miRNA
example4_2 <- get_multimir(org = "hsa",
target = DE.entrez.dn,
table = "all", # 检索14个数据库
summary = TRUE,
predicted.cutoff.type = "p",
predicted.cutoff = 10,
use.tibble = TRUE)
save(example4_2, file = "example4_2.Rdata") # 时间较长,建议及时保存检索结果
load(file = "example4_2.Rdata")
table(example4@data$type)
example4_result2 <- example4@data
head(example4_result2)
example4_sum2 <- example4@summary
head(example4_sum2)
apply(example4@summary[, 6:23], 2, sum)
2
以示例数据进行演示,示例数据DE.miRNA.up包含 9 个上调miRNA,DE.entrez.dn有47个下调的基因,加载方式为load(url("http://multimir.org/bladder.rda"))。结果显示,网站预测、实验验证和疾病/药物相关的miRNA与靶基因调节关系有160个、98个和442个。
# 加载示例数据
load(url("http://multimir.org/bladder.rda"))
# 数据库检索
example5 <- get_multimir(org = "hsa",
mirna = DE.miRNA.up,
target = DE.entrez.dn,
table = "all",
summary = TRUE,
predicted.cutoff.type = "p",
predicted.cutoff = 10,
use.tibble = TRUE)
save(example5, file = "example5.Rdata")
load(file = "example5.Rdata")
table(example5@data$type)
# disease.drug predicted validated
# 442 160 98
example5_result <- example5@data
head(example5_result)
example5_sum <- example5@summary
head(example5_sum)
apply(example5@summary[, 6:19], 2, sum)
# diana_microt elmmo mir2disease miranda mirdb mirtarbase
# 27 92 114 13 11 8
# phenomir pita tarbase targetscan disease.sum predicted.sum
# 328 14 90 3 16 88
# validated.sum all.sum
# 91 198
接下来,筛选出实验验证的检索结果,有85个miRNA与靶基因调节关系,其中7个miRNA和41个mRNA,可按照不同数据库展示检索结果。
# 筛选实验验证的结果
result <- select(example5, keytype = "type",
keys = "validated",
columns = columns(example5))
# 去重
unique_pairs <- result[!duplicated(result[, c("mature_mirna_id",
"target_entrez")]), ] # 去重方法一
unique_pairs_2 <-
unique(data.frame(miRNA.ID = as.character(result$mature_mirna_id),
target.Entrez = as.character(result$target_entrez))) # 去重方法二
nrow(unique_pairs) # 85个miRNA-target
nrow(unique_pairs_2) # 85个miRNA-target
unique_miRNA <- unique(result$mature_mirna_id)
length(unique_miRNA) # 7个miRNA
unique_target <- unique(result$target_symbol)
length(unique_target) # 41个mRNA
# 按照不同数据库展示检索结果
example5_split <- split(result, result$database) # 按照数据库分类
example5_split$mirtarbase
# # A tibble: 8 x 13
# database mature_mirna_acc mature_mirna_id target_symbol target_entrez
# <chr> <chr> <chr> <chr> <chr>
# 1 mirtarbase MIMAT0000087 hsa-miR-30a-5p FDX1 2230
# 2 mirtarbase MIMAT0000259 hsa-miR-182-5p CUL5 8065
# 3 mirtarbase MIMAT0000418 hsa-miR-23b-3p RRAS2 22800
# 4 mirtarbase MIMAT0000087 hsa-miR-30a-5p LIMCH1 22998
# 5 mirtarbase MIMAT0000418 hsa-miR-23b-3p SWAP70 23075
# 6 mirtarbase MIMAT0000087 hsa-miR-30a-5p PEG10 23089
# 7 mirtarbase MIMAT0000245 hsa-miR-30d-5p PEG10 23089
# 8 mirtarbase MIMAT0000420 hsa-miR-30b-5p PEG10 23089
# # ... with 8 more variables: target_ensembl <chr>, experiment <chr>,
# # support_type <chr>, pubmed_id <chr>, type <chr>, score <chr>,
# # disease_drug <chr>, paper_pubmedID <chr>
example5_split$tarbase
# # A tibble: 90 x 13
# database mature_mirna_acc mature_mirna_id target_symbol target_entrez
# <chr> <chr> <chr> <chr> <chr>
# 1 tarbase MIMAT0000418 hsa-miR-23b-3p LIMA1 51474
# 2 tarbase MIMAT0000449 hsa-miR-146a-5p LIMA1 51474
# 3 tarbase MIMAT0000259 hsa-miR-182-5p FOXN3 1112
# 4 tarbase MIMAT0000449 hsa-miR-146a-5p FOXN3 1112
# 5 tarbase MIMAT0000418 hsa-miR-23b-3p BCAT1 586
# 6 tarbase MIMAT0000449 hsa-miR-146a-5p BCAT1 586
# 7 tarbase MIMAT0000080 hsa-miR-24-3p BCAT1 586
# 8 tarbase MIMAT0000087 hsa-miR-30a-5p LIMCH1 22998
# 9 tarbase MIMAT0000259 hsa-miR-182-5p LIMCH1 22998
# 10 tarbase MIMAT0000259 hsa-miR-182-5p IPO5 3843
# # ... with 80 more rows, and 8 more variables: target_ensembl <chr>,
# # experiment <chr>, support_type <chr>, pubmed_id <chr>, type <chr>,
# # score <chr>, disease_drug <chr>, paper_pubmedID <chr>
我们在筛选一个与膀胱癌相关的检索结果,即hsa-miR-23b-3p和hsa-miR-146a-5p。再检索一下实验验证的靶基因,以hsa-miR-23b-3p为例,结果以Veen图展示,靶基因有3个,即"NOTCH1","PLAU"和"MET"。
# 筛选膀胱癌相关结果
mykeys <- keys(example5, keytype = "disease_drug")
mykeys <- mykeys[grep("bladder", mykeys, ignore.case = TRUE)]
result2 <- select(example5, keytype = "disease_drug", keys = mykeys,
columns = columns(example5))
unique_pairs2 <- result2[!duplicated(result[, c("mature_mirna_id",
"target_entrez")]), ]
nrow(unique_pairs2) # 2个miRNA-target
# [1] 2
unique_pairs2
# # A tibble: 2 x 13
# database mature_mirna_acc mature_mirna_id target_symbol target_entrez
# <chr> <chr> <chr> <chr> <chr>
# 1 mir2disease MIMAT0000418 hsa-miR-23b-3p NA NA
# 2 phenomir MIMAT0000449 hsa-miR-146a-5p NA NA
# # ... with 8 more variables: target_ensembl <chr>, experiment <chr>,
# # support_type <chr>, pubmed_id <chr>, type <chr>, score <chr>,
# # disease_drug <chr>, paper_pubmedID <chr>
# 以hsa-miR-23b-3p为例检索实验验证的靶基因
example5_2 <- get_multimir(org = "hsa",
mirna = 'hsa-miR-23b-3p',
table = "validated",
summary = TRUE)
apply(example5_2@summary[, 6:10], 2, sum)
# mirecords mirtarbase tarbase validated.sum all.sum
# 7 433 3660 3750 3750
example1_Lucifer <- example5_2@data[grep("Luciferase",
example1_result[, "experiment"]), ]
nrow(example1_Lucifer) # Luciferase实验验证结果4条
# [1] 42
# 绘制Veen图
library(VennDiagram)
library(tidyverse)
db <- unique(example5_2@data$database)
row1 <- example5_2@data %>% filter(database == db[1])
row2 <- example5_2@data %>% filter(database == db[2])
row3 <- example5_2@data %>% filter(database == db[3])
venn.plot <-venn.diagram(
x = list(target1=unique(row1$target_symbol),
target2=unique(row2$target_symbol),
target3=unique(row3$target_symbol)),
filename ="Venn1.tif",
lty ="dotted",
lwd =0.5,
col ="black",
fill =c("dodgerblue", "goldenrod1", "darkorange1"),
alpha =0.60,
cat.col =c("dodgerblue", "goldenrod1", "darkorange1"),
cat.cex =1,
cat.fontface="bold",
margin =0.05,
cex =1)
# 获取交集
venn_list <- list(target1=unique(row1$target_symbol),
target2=unique(row2$target_symbol),
target3=unique(row3$target_symbol))
for (i in 1:length(venn_list)) {
if(i == 1){interGenes <- venn_list[[1]]}
else{interGenes <- intersect(interGenes, venn_list[[i]])}
}
interGenes
[1] "NOTCH1" "PLAU" "MET"
以上就是multiMiR包全部内容,开发并维护R包不易,小伙伴们使用时别忘记引用以下文献哦~!
LncRNA/CircRNA系列
lnRNA生信一站式分析神器!差异表达,临床分析,ceRNA网络都有了,还要啥自行车!
引药生变数据库系列传送门(完结)
1.厉害了!疾病和药物基因哪里找?这个数据库是你的课题思路之源!也太太太太太太实用了吧!
2.一站式搞定中药数据挖掘,生信分析里的万金油!靶点,基因,机制,疾病都有了!
3.药物和基因搞对象的全流程,全靠这个数据库?高分SCI的套路,1分钟搞定!
6.研究药物的有福了!这个网药数据库还有这么强大的分析功能!堪称神器!
10.惊喜!10+SCI用这个数据库添彩,靶基因-药物预测,就靠它了!
11.超实用!比你还专业,用这个药物数据库,让你的基金和SCI瞬间开挂!
12.没想到吧?NCBI旗下的药物综合数据库,竟然还有这么多实用功能!给你的科研加个buff!
甲基化数据库系列传送门(完结)
1. 没想到这个数据库这么好用!国自然、SCI都能用得上!导师不明觉厉,直呼内行!
2. 今天文章重点:实用!(6-11分SCI都偏爱这个Style)
4. 用上这个数据库,至少给你的SCI涨1分!屡试不爽,大神最爱!
5. 听说国自然评议专家都爱这个美图?16+SCI里频频出现?送你个必杀神器,助力你的科研!
8. 甲基化+临床分析,你以为只能用R做吗?高分文章的图,你也可以跟着无脑复现!
9. 听说用这个数据库保底5+SCI?5篇文章复现给你看!进来捡宝贝啦!
10. 没想到吧?这些分子修饰的图,零代码就可以搞定!搭配10+SCI,那叫一个和谐!
11. 绝对有创新性!帮老板审了20份标书,发现这个热点研究,居然这个数据库就搞定了!?
12. 自从学会了这个数据库,分子修饰联合临床分析,这样高大上的图都用的飞起了!
13. 多组学+表观遗传+临床联合分析大杀器!用这个数据库就够了!绝对让你爱不释手!
14. 国自然标书里这个亮点设计你想到了没?用上这些美图的8成都中了!
欢迎大家关注解螺旋生信频道-挑圈联靠公号~
联系客服