精华内容
下载资源
问答
  • Spark GraphX之全局聚类系数、局部聚类系数、网络平均聚类系数

    想要测量一张图的连通性,这可以通过调用GraphX原生支持的triangleCount()来实现。但是如果想要对比多张图的连通性,这时又该如何呢?

    Global clustering coefficient(全局聚类系数)

    Another way to measure connectedness, the global clustering coefficient, is better in that it always returns a number between 0 and 1, making it possible to compare the connectedness of different sized graphs.

    全局聚类系数通常定义如下:closed triplets / total triplets (open or closed)

    A triplet in this case is a set of three vertices that have two or three edges among them. If there are three edges, then it’s a triangle, and this is called a closed triplet. If there are only two edges, then it’s called an open triplet. Triplets are counted for each vertex and then added all together; this means that a triangle will count as three closed triplets, because each of the three vertices will have one closed triplet associated with it.

    在这里插入图片描述

    如图所示,存在与一个三角形相关联的三个闭合三元组:与顶点1相关联的一个闭合三元组(1-2-3),与顶点2相关联的一个闭合三元组(2-3-1),以及与顶点3相关联的一个闭合三元组(3-1-2)。另外,与顶点3相关联的还有两个开放三元组,即(1-3-4)和(2-3 -4)。所以,此图的全局聚类系数为{(1-2-3),(2-3-1),(3-1-2)} / {(1-2-3),(2-3-1),(3-1-2),(1-3-4),(2-3 -4)} = 3 / 5 = 0.6

    计算全局聚类系数的代码如下:

      def globalClusteringCoefficient[VD: ClassTag, ED: ClassTag](g:Graph[VD, ED]) = {
        val numerator  = g.triangleCount().vertices.map(_._2).reduce(_ + _)
        val denominator = g.inDegrees.map{ case (_, d) => d*(d-1) / 2.0 }.reduce(_ + _)
        if(denominator == 0) 0.0 else numerator / denominator
      }
    

    local clustering coefficient(局部聚类系数)

    与全局聚类系数相对应,有一个局部聚类系数的概念。

    that is the ratio of the actual triangle count at a vertex to the number of possible triangles at that vertex based on how
    many neighbors it has. For an undirected graph, the local clustering coefficient C for a vertex that has k neighbors and t triangles is: C = 2t / [ k * (k - 1) ]

    计算局部聚类系数的代码如下:

      def localClusteringCoefficient[VD: ClassTag, ED: ClassTag](g: Graph[VD, ED]) = {
        val triCountGraph = g.triangleCount()
        val maxTrisGraph = g.inDegrees.mapValues(srcAttr => srcAttr*(srcAttr-1) / 2.0 )
        triCountGraph.vertices.innerJoin(maxTrisGraph){ (vid, a, b) => if(b == 0) 0 else a / b }
      }
    

    network average clustering coefficient(网络平均聚类系数)

    Computing the average value of the local clustering coefficient for all of the vertices in the graph gives us the network average clustering coefficient

    测试代码如下:

        val vertices = sc.makeRDD(Seq((1L, ""), (1L, ""), (2L, ""), (3L, ""), (4L, ""), (4L, "")))
        val edges = sc.makeRDD(Seq(Edge(1L, 1L, ""), Edge(1L, 2L, ""), Edge(2L, 3L, ""), Edge(1L, 3L, ""), Edge(3L, 4L, ""))).filter(e => e.srcId != e.dstId).flatMap(e => Array(e, Edge(e.dstId, e.srcId, e.attr))).distinct()  //去掉自循环边,有向图变为无向图,去除重复边
        val testGraph =  Graph(vertices, edges).cache()
    

    打印顶点:

        testGraph.vertices.foreach(println)
    

    输出结果如下:

    (3,)
    (1,)
    (2,)
    (4,)
    

    从上述输出来看,不难发现GraphX自动为我们去除了重复顶点。

    The vertices and edges arguments that we used to construct the Graph instance were regular RDDs—we didn’t even deduplicate the entries in the vertices so that there was only a single instance of each topic. Fortunately, the Graph API does this
    for us, converting the RDDs we passed in to a VertexRDD and an EdgeRDD, so that the vertex counts are now unique.
    Note that if there are duplicate entries in the EdgeRDD for a given pair of vertices, the Graph API will not deduplicate them: GraphX allows us to create multigraphs, which can have multiple edges with different values between the same pair of vertices. This can be useful in applications where the vertices in the graph represent rich objects, like people or businesses, that may have many different kinds of relationships between them (e.g., friends, family members, customers, partners, etc.). It also allows us to treat the edges as either directed or undirected, depending on the context.

    计算全局聚类系数:

        println(globalClusteringCoefficient2(testGraph))
    

    输出如下:

    0.6
    

    计算局部聚类系数:

        localClusteringCoefficient(testGraph).foreach(println)
    

    输出如下:

    (3,0.3333333333333333)
    (1,1.0)
    (2,1.0)
    (4,0.0)
    

    计算网络平均聚类系数:

        println(localClusteringCoefficient(testGraph).map(_._2).sum() / testGraph.vertices.count())
    

    输出如下:

    0.5833333333333333
    
    展开全文
  • 聚类系数

    万次阅读 2019-01-16 09:10:05
    全局聚类系数: 全局集聚系数是基于结点三元组的。 全局集聚系数是封闭的三元组数目/所有三元组数目,即 Clustering coefficient(global) = number of closed triplet / number of triplet(closed+open)   局部...

    全局聚类系数:

    全局集聚系数是基于结点三元组的。 全局集聚系数是封闭的三元组数目/所有三元组数目,即

    Clustering coefficient(global) = number of closed triplet / number of triplet(closed+open)

     

    局部聚类系数:

    局部计算是面向节点的,对于节点vi,找出其直接邻居节点集合Ni,计算Ni构成的网络中的边数K,除以Ni集合可能的边数|Ni|*(|Ni|-1)/2

     平均集聚
    整个网络的集聚系数由Watts和Strogatz定义为所有结点n的局部集聚系数的均值:
    如果一个图的平均集聚系数显著高于相同结点集生成的随机图,而且平均最短距离与相应随机生成的随机图相近,那么这个图被认为是小世界的。
    有更高平均集聚系数的网络被发现有着模块结构,同时在不同结点中还有更小的平均距离

    展开全文
  • 可以计算网络中各节点的聚类系数和网络的平均聚类系数
  • 我想在没有NetworkX内置方法的情况下计算网络的聚类系数。在我现在写的方法似乎正是import networkx as nximport matplotlib.pyplot as plt%matplotlib inline#create graphG=nx.Graph()G.add_edges_from([(0,1),(0,...

    我想在没有NetworkX内置方法的情况下计算网络的聚类系数。在

    我现在写的方法似乎正是import networkx as nx

    import matplotlib.pyplot as plt

    %matplotlib inline

    #create graph

    G=nx.Graph()

    G.add_edges_from([(0,1),(0,2),(0,4),(0,3),(0,5),(1,7),(1,10),(1,11),(1,12),(2,4),(2,5),(2,3),(3,4),(5,8),(5,6),(6,8),(6,9),(6,7),(7,9),(7,10),(10,11),(10,12),(11,13),(12,13)])

    # print 1 test value

    print nx.clustering(G,1)

    def clustering_coefficient(G):

    # this will store the mapping of node/coefficient

    clusteringDict = {}

    for node in G:

    neighboursOfNode = []

    nodesWithMutualFriends = []

    # store all neighbors of the node in an array so we can compare

    for neighbour in G.neighbors(node):

    neighboursOfNode.append(neighbour)

    for neighbour in G.neighbors(node):

    for second_layer_neighbour in G.neighbors(neighbour):

    # compare if any second degree neighbour is also a first degree neighbour (this makes a triangle)

    # if so, append it to the mutual friends list

    if second_layer_neighbour in neighboursOfNode:

    nodesWithMutualFriends.append(second_layer_neighbour)

    # filter duplicates from the mutual friend array

    nodesWithMutualFriends = list(set(nodesWithMutualFriends))

    clusteringCoefficientOfNode = 0

    # apply coefficient formula to calculate

    if len(nodesWithMutualFriends):

    clusteringCoefficientOfNode = (2 * float(len(nodesWithMutualFriends)))/((float(len(G.neighbors(node))) * (float(len(G.neighbors(node))) - 1)))

    clusteringDict[node] = clusteringCoefficientOfNode

    clustering_coefficient(G)

    但是,运行此脚本时,NetworkX值在大多数情况下都会给出与我自己的脚本不同的值。无论如何,这个脚本也可以运行到2.0而不是1.0。在

    我的逻辑怎么了?在

    展开全文
  • 关于聚类系数的原汁原味的介绍,可以参考小世界网络这篇论文 [1]:The clustering coefficient C(p) is defined as follows. Suppose that a vertex v has kv neighbours; then at most kvðkv 21Þ=2 edges can ...

    关于聚类系数的原汁原味的介绍,可以参考小世界网络这篇论文 [1]:

    The clustering coefficient C(p) is defined as follows. Suppose that a vertex v has kv neighbours; then at most kvðkv 21Þ=2 edges can exist between them (this occurs when every neighbourof v is connected to everyother neighbour of v). Let Cv denote the fraction of these allowable edges that actually exist.

    Define C as the average of Cv over all v. For friendship networks, these statistics have intuitive meanings: L is the average number of friendships in the shortest chain connecting two people;

    Cv reflects the extent to which friends of v are also friends of each other; and thus C measures the cliquishness of a typical

    friendship circle. The data shown in the figure are averages over 20 random realizations of the rewiring process described in Fig.1, and have been normalized by the values L(0), C(0) for a regular lattice. All the graphs have n ¼ 1;000 vertices and an average degree of k ¼ 10 edges per vertex. We note that a logarithmic horizontal scale has been used to resolve the rapid drop in L(p), corresponding to the onset of the small-world phenomenon.

    During this drop, C(p) remains almost constant at its value for the regular lattice, indicating that the transition to a small

    world is almost undetectable at the local level .

    用白话解释一下:

    聚类系数的定义:网路中所有长度为2的路径中闭合路径所占的比例 [2]。

    节点的聚类系数是度量某节点的两个邻居节点也互为邻居的平均概率 [2]。也就是以该节点为中心,其周围的邻居中相互之间存在边的个数 除以 该节点周围邻居节点对总数。分子要求必须有连边,分母不要求,也就是 该节点的 度*(度-1)度*(度-1)。

    关于节点 聚类系数 [2, 3] 的定义:

    \[C_i = \frac{节点 i 的邻居中直接相连的节点对的个数}{节点 i 的邻居节点对的总数}\]

    其Python版计算方式如下:

    方法1:

    def cal_ci(G, u):

    '''

    @description: 计算节点的聚类系数

    @param : G, u

    @return:

    '''

    return sum([1 for x in G[u] for y in G[x] if y in G[u]]) / (G.degree[u] * (G.degree[u] - 1)) if G.degree[u] > 1 else 0

    方法2:

    def get_ci(G, z):

    ''' 获取节点聚集系数 Clustering Coefficient'''

    return sum([1 for u in G[z] for v in G[z] if not u == v and (u, v) in G.edges()]) / (len(G[z]) * (len(G[z]) - 1)) if len(G[z]) != 0 and len(G[z]) != 1 else 0

    注:这里可能会存在疑问,用 (u, v) in G.edges() 会不会出错?实际上 G.edges() 打印出来虽然是列表的形式,但其实是NetworkX的内置类型,可以通过以下代码验证:

    G = nx.path_graph(5)

    print(G.edges())

    print(type(G.edges()))

    a = (3, 2) in G.edges()

    b = (3, 2) in [(0, 1), (1, 2), (2, 3), (3, 4)]

    print(a)

    print(b)

    运行结果:

    [(0, 1), (1, 2), (2, 3), (3, 4)]

    True

    False

    方法3:NetworkX库中的方法:

    nx.clustering(G,u)

    对应源代码:

    def clustering(G, nodes=None, weight=None):

    r"""Compute the clustering coefficient for nodes.

    For unweighted graphs, the clustering of a node :math:`u`

    is the fraction of possible triangles through that node that exist,

    .. math::

    c_u = \frac{2 T(u)}{deg(u)(deg(u)-1)},

    where :math:`T(u)` is the number of triangles through node :math:`u` and

    :math:`deg(u)` is the degree of :math:`u`.

    For weighted graphs, there are several ways to define clustering [1]_.

    the one used here is defined

    as the geometric average of the subgraph edge weights [2]_,

    .. math::

    c_u = \frac{1}{deg(u)(deg(u)-1))}

    \sum_{vw} (\hat{w}_{uv} \hat{w}_{uw} \hat{w}_{vw})^{1/3}.

    The edge weights :math:`\hat{w}_{uv}` are normalized by the maximum weight

    in the network :math:`\hat{w}_{uv} = w_{uv}/\max(w)`.

    The value of :math:`c_u` is assigned to 0 if :math:`deg(u) < 2`.

    For directed graphs, the clustering is similarly defined as the fraction

    of all possible directed triangles or geometric average of the subgraph

    edge weights for unweighted and weighted directed graph respectively [3]_.

    .. math::

    c_u = \frac{1}{deg^{tot}(u)(deg^{tot}(u)-1) - 2deg^{\leftrightarrow}(u)}

    T(u),

    where :math:`T(u)` is the number of directed triangles through node

    :math:`u`, :math:`deg^{tot}(u)` is the sum of in degree and out degree of

    :math:`u` and :math:`deg^{\leftrightarrow}(u)` is the reciprocal degree of

    :math:`u`.

    Parameters

    ----------

    G : graph

    nodes : container of nodes, optional (default=all nodes in G)

    Compute clustering for nodes in this container.

    weight : string or None, optional (default=None)

    The edge attribute that holds the numerical value used as a weight.

    If None, then each edge has weight 1.

    Returns

    -------

    out : float, or dictionary

    Clustering coefficient at specified nodes

    Examples

    --------

    >>> G=nx.complete_graph(5)

    >>> print(nx.clustering(G,0))

    1.0

    >>> print(nx.clustering(G))

    {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0}

    Notes

    -----

    Self loops are ignored.

    References

    ----------

    .. [1] Generalizations of the clustering coefficient to weighted

    complex networks by J. Saramäki, M. Kivelä, J.-P. Onnela,

    K. Kaski, and J. Kertész, Physical Review E, 75 027105 (2007).

    http://jponnela.com/web_documents/a9.pdf

    .. [2] Intensity and coherence of motifs in weighted complex

    networks by J. P. Onnela, J. Saramäki, J. Kertész, and K. Kaski,

    Physical Review E, 71(6), 065103 (2005).

    .. [3] Clustering in complex directed networks by G. Fagiolo,

    Physical Review E, 76(2), 026107 (2007).

    """

    if G.is_directed():

    if weight is not None:

    td_iter = _directed_weighted_triangles_and_degree_iter(

    G, nodes, weight)

    clusterc = {v: 0 if t == 0 else t / ((dt * (dt - 1) - 2 * db) * 2)

    for v, dt, db, t in td_iter}

    else:

    td_iter = _directed_triangles_and_degree_iter(G, nodes)

    clusterc = {v: 0 if t == 0 else t / ((dt * (dt - 1) - 2 * db) * 2)

    for v, dt, db, t in td_iter}

    else:

    if weight is not None:

    td_iter = _weighted_triangles_and_degree_iter(G, nodes, weight)

    clusterc = {v: 0 if t == 0 else t / (d * (d - 1)) for

    v, d, t in td_iter}

    else:

    td_iter = _triangles_and_degree_iter(G, nodes)

    clusterc = {v: 0 if t == 0 else t / (d * (d - 1)) for

    v, d, t, _ in td_iter}

    if nodes in G:

    # Return the value of the sole entry in the dictionary.

    return clusterc[nodes]

    return clusterc

    虽然,求网络的聚类系数定义为网络中存在的三角形个数 * 3 再除以 三元组个数,但是实际中大多采用所有节点聚类系数的平均值来计算。

    参考文献:

    D. J. Watts and S. H. Strogatz, “Collective dynamics of ‘small-world’ networks,” Nature, vol. 393, no. 6684, pp. 440–442, Jun. 1998.

    M.E.J.Newman. 网络科学引论[M]. 2014.

    汪小帆, 李翔, 陈关荣. 网络科学导论[J]. 高等教育出版社, 2012.

    展开全文
  • 前面一篇文章,有朋友留言:十亿加节点局部聚类系数能跑出来么?本期,主要学习和解答这个问题。1、聚类系数聚类系数(Clustering coefficient)是表示一个图中节点聚集程度的系数。定义有两种,全局的和局部的(局部的...
  • 聚类系数 matlab

    2016-08-11 14:01:35
    求一个已知网络的聚类系数
  • 聚类系数与小世界网络

    千次阅读 2020-06-02 21:46:19
    聚类系数 聚类系数也称为聚集系数,源自于社会学中的“可传递三元组比率”,聚类系数描述了网络中个体的邻居节点也互为邻居的可能性。 节点i的聚类系数定义为: 表示聚集网络中节点i,j之间的连边,若连边存在...
  • 复杂网络聚类系数的matlab编程代码,将复杂网络存储为矩阵,再对其matLab编程,求得复杂网络的集类系数 复杂网络聚类系数的matlab编程代码,将复杂网络存储为矩阵,再对其matLab编程,求得复杂网络的集类系数
  • 针对实现网络特征的真实情况,提出了一类可调聚类系数的加权无标度网络模型,该模型能够重现现实网络权重和节点度呈幂律分布的统计特性.特别是聚类系数与度之间的非线性关系,恰好符合某些现实网络聚类系数与度之间的...
  • 通过研究RGG模型中网络聚类系数和网络次大特征值之间的关系,来探索聚类系数对网络收敛性能的影响程度,并希望从有边界效应和无边界效应两种情况入手对其进行研究。通过仿真实验发现在无边界情况下的聚类系数是一个...
  • 拓扑性质之聚类系数

    千次阅读 2020-06-25 16:44:25
    节点聚类系数描述节点的邻居节点之间互连的概率 网络的聚类系数C:网络中所有节点的聚类系数的平均值 C=1N∑i=1NCiC=\frac{1}{N}\sum_{i=1}^{N}CiC=N1​i=1∑N​Ci 其中CiCiCi:网络中一个度为kikiki的节点iii的...
  • 复杂网络聚类系数

    2013-12-19 20:56:15
    m文件,求复杂网络聚类系数,方便好用,网络图用矩阵表示。
  • 复杂网络 求聚类系数 maltab代码 子程序代码 非常简洁 调用方便
  • MATLAB源码集锦-复杂网络的聚类系数算法代码
  • matlab程序用于计算复杂网络度相关系数和平均聚类系数的求解。适合于编程新手使用。大家可以使用,用于社交网络中复杂网络的计算。
  • 聚类系数clustering coefficient计算方式 1、出处 William L. Hamilton. (2020). Graph Representation Learning. Synthesis Lectures on Artificial Intelligence and Machine Learning, Vol. 14, No. 3 , Pages ...
  • 比较图元向量和点的聚类系数对差异网络的研究
  • 针对于标准二分图网络推荐算法(NBI)的物质扩散机制过于简单的问题,提出了基于聚类系数的改进NBI算法(简称NBICC)。推荐系统可以被抽象为一个有向加权二分图网络,在物质扩散的过程中,考虑到聚类系数因素的影响...
  • 本文通过定义稠密子团,结合边聚类系数和局部模块度,提出一种DIDE社团挖掘算法。该算法通过选取稠密子团作为初始聚类团,利用边聚类系数扩张该稠密子团,最大化局部模块度值来生成社团结构。在计算机生成网络、三...
  • # -*- coding: cp936 -*-import randomimport networkx as nxfrom networkx.generators.classic import empty_graphdef powerlaw_cluster_graph(n, m, p, seed=None):"""Holme and Kim algorithm for growing graphs...
  • 即时通讯网络已成为大众信息传播的主要途径,研究了...通过实验仿真对群数、聚类系数、传播者和免疫者的变化观察,发现群数越多则聚类系数越大,传播者峰值越高,以及免疫者也相应地有所提高,从而对信息传播的影响也越大。
  • 多网络中,如果节点v1连接于节点v2,节点...可以用聚类系数(CC)来表示,在无向网络中,聚类系数定义为: C = 2*n/(k*(k-1))其中,n表示在节点v的所有k个邻居间的边数。 用Java编写一个求聚类系数的程序,代码注释
  • 用scikit-learn进行k-means聚类,默认使用欧式距离,为了用余弦距离作为度量,找了一个在... 另外,为了确定一个合理的聚类系数,采用轮廓系数作为衡量标准: 轮廓系数取值为[-1, 1],其值越大越好。from sklearn...
  • 聚类系数计算——GCC&LCC

    千次阅读 2020-06-14 13:05:44
    聚类系数计算 在图论中,集聚系数是图中的点倾向于集聚在一起的程度的一种度量。证据显示:在多数实际网络以及特殊的社会网络中,结点有形成团的强烈倾向,这一倾向的特征是有一个相对紧密的连接(Holland and ...
  • 无法反映共邻节点的邻居对潜在节点对的贡献以及度量共邻节点互连密集程度对预测结果的影响这一问题,从局部结构的密集层面来分析共邻节点对潜在节点对的影响,提出了一种集成加权聚类系数的相似度指标(WCCLP)。...

空空如也

空空如也

1 2 3 4 5 ... 20
收藏数 1,297
精华内容 518
关键字:

聚类系数