Key genes and hub genes (from a biological network perspective)
Preface
Like my earlier posts, this article draws on several papers and on my own day-to-day notes. It focuses on key genes and hub genes. Many people wrongly assume that hub genes are the key genes, and some even treat differentially expressed genes as the key genes. Before the main text, I will briefly explain how I understand the relationship between these three. If you see it differently, please leave a comment.
Differentially expressed genes (DEGs) are genes whose expression differs significantly between two groups. On a microarray, for example, roughly 1,000 of the tens of thousands of probes may be differentially expressed. The exact number depends heavily on the thresholds you choose.
Hub genes are genes with a high degree, meaning they have many connections in the gene expression network. The definition uses degree only; it does not involve betweenness or other centrality measures. Hub selection is also quite subjective. There is no fixed rule on whether to take the top 5% or the top 10%, although 5% is the usual recommendation. In other words, it is a very loose criterion.
Key genes: some people take the top-ranked hubs, while others take the DEGs with the smallest (most significant) p-values. So what counts as a key gene? Broadly speaking, if knocking down a gene makes the phenotype clearly disappear, that gene is certainly a key gene. But how do you choose key genes from bioinformatic analysis alone? No single method solves this directly. Here I only consider the expression-network perspective; I plan to write a later article on screening key genes from several angles. The set of key genes is smaller than the set of hubs. Still, a hub is not necessarily a key gene, and a key gene is not necessarily a hub.
In short, in terms of number or scope:
DEGs > hubs > key genes (candidate genes)
------------------------------------------------
All right, on to the main text.
Hub genes
The WGCNA approach typically identifies gene modules from gene expression levels that are highly correlated across samples. This technique has been used successfully to detect gene modules in Arabidopsis, rice, maize and poplar under various biotic and abiotic stresses. The approach also leads to the construction of a Gene Co-expression Network (GCN), a scale-free network in which genes are represented as nodes and edges depict associations among genes. In such a network, highly connected genes are called hub genes, and they are expected to play an important role in understanding the biological mechanism of response under stresses/conditions. Identifying hub genes will also help in mitigating stress in plants through genetic engineering. Existing approaches have mainly identified hub genes based only on gene connection degrees in the GCN. Moreover, these techniques select such genes empirically, without any statistical criteria. Besides, few approaches can be found in the literature for identifying hub nodes in a scale-free network.
This shows that hub genes are defined in a scale-free co-expression network (the GCN) and correspond to high degree. Existing methods identify hub genes from connectivity in the GCN, but they select them empirically and apply no statistical criterion. In addition, few methods in the literature address the identification of hub nodes in scale-free networks.
The authors therefore proposed an algorithm and implemented it as a package that assigns a p-value to each hub gene, so that the number of hub genes can be reduced using a p-value cutoff.
The package is here
Article link 1
Article link 2
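The empirical "top 5% by degree" rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions. It is not WGCNA and not the authors' p-value package. It assumes a hard correlation threshold, whereas WGCNA itself uses soft-thresholded adjacencies. The threshold, the gene names (`hubG`, `geneU`, ...) and the expression values are all hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors.
    Raises ZeroDivisionError for a constant (zero-variance) vector."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def coexpression_degree(expr, threshold=0.8):
    """Hard-thresholded co-expression network (an assumption; WGCNA uses
    soft thresholding): connect two genes when |r| >= threshold and
    return each gene's degree."""
    degree = {gene: 0 for gene in expr}
    for g1, g2 in combinations(expr, 2):
        if abs(pearson(expr[g1], expr[g2])) >= threshold:
            degree[g1] += 1
            degree[g2] += 1
    return degree


def top_fraction_hubs(degree, fraction=0.05):
    """Empirical rule: the top `fraction` of genes by degree are 'hubs'.
    Ties keep input order; no statistical test is involved."""
    k = max(1, math.ceil(len(degree) * fraction))
    ranked = sorted(degree, key=degree.get, reverse=True)
    return ranked[:k]


# Hypothetical expression values (4 samples per gene).
expr = {
    "hubG": [8, 4, 4, 4],
    "geneU": [6, 4, 6, 4],
    "geneV": [6, 6, 4, 4],
    "geneW": [6, 4, 4, 6],
}
degree = coexpression_degree(expr, threshold=0.5)
print(degree)                     # {'hubG': 3, 'geneU': 1, 'geneV': 1, 'geneW': 1}
print(top_fraction_hubs(degree))  # ['hubG']
```

The cutoff `fraction` is exactly the subjective knob criticized above. Changing it from 0.05 to 0.10 changes the hub list without any statistical justification, which is the gap a p-value-based method tries to close.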
It has been a long-standing goal in systems biology to find relations between the topological properties and functional features of protein networks. However, most of the focus in network studies has been on highly connected proteins (“hubs”). As a complementary notion, it is possible to define bottlenecks as proteins with a high betweenness centrality (i.e., network nodes that have many “shortest paths” going through them, analogous to major bridges and tunnels on a highway map). Bottlenecks are, in fact, key connector proteins with surprising functional and dynamic properties. In particular, they are more likely to be essential proteins. In fact, in regulatory and other directed networks, betweenness (i.e., “bottleneck-ness”) is a much more significant indicator of essentiality than degree (i.e., “hub-ness”). Furthermore, bottlenecks correspond to the dynamic components of the interaction network: they are significantly less well coexpressed with their neighbors than nonbottlenecks, implying that expression dynamics is wired into the network topology.
A network is a graph consisting of a number of nodes with edges connecting them. Recently, network models have been widely applied to biological systems. Here, we are mainly interested in two types of biological networks: the interaction network, where nodes are proteins and edges connect interacting partners; and the regulatory network, where nodes are proteins and edges connect transcription factors and their targets. Betweenness is one of the most important topological properties of a network. It measures the number of shortest paths going through a certain node. Therefore, nodes with the highest betweenness control most of the information flow in the network, representing the critical points of the network. We thus call these nodes the “bottlenecks” of the network. Here, we focus on bottlenecks in protein networks. We find that, in the regulatory network, where there is a clear concept of information flow, protein bottlenecks indeed have a much higher tendency to be essential genes. In this type of network, betweenness is a good predictor of essentiality. Biological researchers can therefore use the betweenness as one more feature to choose potential targets for detailed analysis.
Figure1.png
Figure2.png
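Betweenness, as defined above, counts the shortest paths that pass through a node. The sketch below uses Brandes' algorithm, one standard way to compute betweenness for an unweighted, undirected network. It is not necessarily the implementation the paper used. The toy graph is hypothetical: two triangles joined through a single node `X`, chosen so that `X` has a low degree but the highest betweenness.

```python
from collections import deque


def betweenness_centrality(adj):
    """Brandes' algorithm for an unweighted, undirected graph.
    adj maps each node to an iterable of its neighbours. Returns raw
    (unnormalized) betweenness: for each node v, the sum over unordered
    pairs (s, t), with s != v != t, of the fraction of shortest s-t paths
    that pass through v."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma)
        # and recording each node's predecessors on those paths.
        order = []
        pred = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        # Accumulate dependencies from the farthest nodes back toward s.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # In an undirected graph each pair is counted once from each endpoint.
    return {v: c / 2 for v, c in bc.items()}


# Hypothetical toy network: two triangles bridged by X.
adj = {
    "A1": ["A2", "A3", "X"], "A2": ["A1", "A3"], "A3": ["A1", "A2"],
    "B1": ["B2", "B3", "X"], "B2": ["B1", "B3"], "B3": ["B1", "B2"],
    "X": ["A1", "B1"],
}
bc = betweenness_centrality(adj)
print(bc["X"], bc["A1"], bc["A2"])  # 9.0 8.0 0.0
```

`X` has degree 2 but lies on all 9 shortest paths between the two triangles. It is a bottleneck without being a hub, which is exactly the distinction between "hub-ness" and "bottleneck-ness" drawn above.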
Below is an explanation of the difference between hubs and bottlenecks.
Central complex members have a low betweenness and are hub–nonbottlenecks.
Because of the high connectivity inside these complexes, paths can go through them and all their neighbors. On the other hand, hub–bottlenecks tend to correspond to highly central proteins that connect several complexes or are peripheral members of central complexes.
The fact that they have a high betweenness suggests that these proteins are not, however, simply members of large protein complexes (which is true for nonbottleneck–hubs), but are those members that connect the complex to the rest of the graph; in a sense, real connectivity bottlenecks. While hub–nonbottlenecks mainly consist of structural proteins, hub–bottlenecks are more likely to be part of signal transduction pathways.
Hub–nonbottlenecks: mainly structural proteins.
Hub–bottlenecks: more likely to be part of signal transduction pathways.
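The four categories above come from crossing a degree cutoff with a betweenness cutoff. The sketch below assumes that both cutoffs are "top 20%". This is an assumption for illustration, so check the paper for its exact definition. It also assumes that degree and betweenness have already been computed. All node names and values are hypothetical.

```python
import math


def top_cutoff(values, fraction):
    """Smallest value still inside the top `fraction` of `values`.
    All values tied at the cutoff are included."""
    k = max(1, math.ceil(len(values) * fraction))
    return sorted(values, reverse=True)[k - 1]


def classify_nodes(degree, betweenness, fraction=0.2):
    """Label each node hub/nonhub x bottleneck/nonbottleneck.
    The 20% default is an assumed cutoff, not taken from the paper."""
    deg_cut = top_cutoff(list(degree.values()), fraction)
    btw_cut = top_cutoff(list(betweenness.values()), fraction)
    labels = {}
    for node in degree:
        hub = "hub" if degree[node] >= deg_cut else "nonhub"
        # Require positive betweenness so that an all-zero network
        # does not turn every node into a "bottleneck".
        is_bottleneck = betweenness[node] >= btw_cut and betweenness[node] > 0
        labels[node] = f"{hub}-{'bottleneck' if is_bottleneck else 'nonbottleneck'}"
    return labels


# Hypothetical precomputed values for a 10-node network.
degree = {"n1": 9, "n2": 8, "n3": 2, "n4": 3, "n5": 2,
          "n6": 1, "n7": 2, "n8": 3, "n9": 1, "n10": 1}
betweenness = {"n1": 30.0, "n2": 1.0, "n3": 25.0, "n4": 2.0, "n5": 0.0,
               "n6": 0.0, "n7": 0.0, "n8": 0.0, "n9": 0.0, "n10": 0.0}
labels = classify_nodes(degree, betweenness)
print(labels["n1"], labels["n2"], labels["n3"])
# hub-bottleneck hub-nonbottleneck nonhub-bottleneck
```

Here `n2` behaves like a central complex member: it has many partners but few paths run through it. `n1` connects its complex to the rest of the network.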
Furthermore, hub–bottlenecks are (by construction) the most efficient in disrupting the network upon hub removal. This relates nicely to the date/party-hub concept by Han et al.: hub–bottlenecks tend to be date-hubs, whereas hub–nonbottlenecks tend to be party-hubs.
As Han's article makes clear, date hubs are more likely to be the organizers and maintainers of the overall architecture, the "big bosses". Han's view was published in Nature and is summarized below.
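Why removing a hub–bottleneck is so disruptive can be seen by measuring the largest connected component after deleting a node. The toy network below is hypothetical. It has two four-protein complexes (fully connected modules) that are joined only through `a0` and `b0`. Removing `a0`, which both connects its complex to the rest of the network and is a hub, splits the network. Removing `a1`, a well-connected member inside a complex, barely matters.

```python
from collections import deque
from itertools import combinations


def build_graph(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def largest_component_size(adj, removed=frozenset()):
    """Size of the largest connected component once `removed` nodes are deleted."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        size, queue = 0, deque([start])
        while queue:  # breadth-first search over the surviving nodes
            v = queue.popleft()
            size += 1
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        best = max(best, size)
    return best


# Hypothetical network: two fully connected 4-node complexes bridged by a0-b0.
module_a = ["a0", "a1", "a2", "a3"]
module_b = ["b0", "b1", "b2", "b3"]
edges = list(combinations(module_a, 2)) + list(combinations(module_b, 2)) + [("a0", "b0")]
adj = build_graph(edges)

print(largest_component_size(adj))          # 8  (intact network)
print(largest_component_size(adj, {"a0"}))  # 4  (hub-bottleneck removed)
print(largest_component_size(adj, {"a1"}))  # 7  (complex member removed)
```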
Han's Nature article mentioned above:
https://www.nature.com/articles/nature02555
In apparently scale-free protein–protein interaction networks, or ‘interactome’ networks [1,2], most proteins interact with few partners, whereas a small but significant proportion of proteins, the ‘hubs’, interact with many partners.
Nonhub–bottlenecks are generally less coexpressed with their neighbors than nonhub–nonbottleneck proteins. This is consistent with the observation that betweenness reflects a protein's average expression correlation with its neighbors: the higher the betweenness, the lower the correlation. Nonhub–bottleneck proteins are rarely members of complexes, and most of them are regulatory proteins or part of the signal transduction machinery.
Scale-free networks, whether biological or not, resist the random removal of nodes but are very sensitive to the removal of hubs.
Roughly speaking, in yeast, deleting a gene that encodes a hub protein is about three times more likely to be lethal than deleting a non-hub gene. Han et al. found two types of hubs. Party hubs interact with most of their partners simultaneously. Date hubs bind different partners at different times or locations.
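Han et al. separate date hubs from party hubs by how well a hub is coexpressed with its interaction partners. The sketch below computes each hub's mean Pearson correlation with its partners and applies a single cutoff. The 0.5 cutoff, the hub and partner names, and the expression values are all assumptions for illustration, not values taken from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_partner_pcc(hub, partners, expr):
    """Average expression correlation between a hub and its interaction partners."""
    values = [pearson(expr[hub], expr[p]) for p in partners]
    return sum(values) / len(values)


def classify_hub(hub, partners, expr, cutoff=0.5):
    """Party hub: coexpressed with its partners (interacts with them together).
    Date hub: weakly coexpressed (meets different partners at different times).
    The cutoff value is an assumption for this sketch."""
    return "party" if mean_partner_pcc(hub, partners, expr) >= cutoff else "date"


# Hypothetical expression profiles across 4 conditions.
expr = {
    "H": [1, 2, 3, 4], "P1": [2, 4, 6, 8], "P2": [1, 2, 3, 5],
    "D": [6, 4, 6, 4], "Q1": [6, 6, 4, 4], "Q2": [6, 4, 4, 6],
}
print(classify_hub("H", ["P1", "P2"], expr))  # party
print(classify_hub("D", ["Q1", "Q2"], expr))  # date
```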
Figure3.png
Thus, based on the expression profiles of their partners, hubs in the yeast interaction network can be divided into two classes: date hubs and party hubs. This distinction suggests a model of organized modularity in the yeast proteome, in which modules are connected through regulators, mediators or adaptors. These connectors are the date hubs. Party hubs are integral elements within distinct modules and are important for the functions those modules mediate, so they tend to be essential proteins. They also tend to act at a lower level of the proteome's organization. (Roughly: date hubs are the big bosses that coordinate and connect, while party hubs are the small bosses inside each module.) We propose that date hubs are needed for the overall organization of biological modules across the whole proteome network, because they take part in wide-ranging, integrating connections (although some date hubs may simply be shared and regulate local functions within or across modules). Key properties of the interaction network, such as genetic robustness and resilience to environmental perturbation, become easier to understand within this modular-organization framework.
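Hence the so-called date hubs are those with high betweenness (hub–bottlenecks),
whereas party hubs are more likely to be hubs with low betweenness (hub–nonbottlenecks).
This finding may point to a previously unknown link between the dynamic and topological properties of interaction networks.
The authors believe that, despite some current practical difficulties, betweenness will prove to be a very useful tool for many protein networks, especially those with directed edges (regulatory networks).
In summary, we provide an integrated analysis of two complementary topological network properties, applicable to different network types. This integrated approach reveals previously unknown connections between network topology, protein essentiality and expression dynamics. We believe that integrated approaches such as the one presented here will be crucial for future predictive models.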