Clustering in Weighted Networks

tnet » Weighted Networks » Clustering

TripletA fundamental measure that has long received attention in both theoretical and empirical research is the clustering coefficient. This measure assesses the degree to which nodes tend to cluster together. Evidence suggests that in most real-world networks, and in particular social networks, nodes tend to create tightly knit groups characterised by a relatively high density of ties (Feld, 1981; Friedkin, 1984; Heider, 1946; Holland and Leinhardt, 1970; Louch, 2000; Snijders, 2001; Snijders et al., 2006; Watts and Strogatz, 1998). In real-world networks, this likelihood tends to be greater than the average probability of a tie randomly established between two nodes (Holland and Leinhardt, 1971; Wasserman and Faust, 1994; Watts and Strogatz, 1998).

Traditionally, the two versions of the clustering coefficient developed for testing the tendency of nodes to cluster together into tightly knit groups are the global clustering coefficient (Doreian, 1969) and the local clustering coefficient (Watts and Strogatz, 1998). The global version was designed to give an overall indication of the clustering in the network, whereas the local gives an indication of the embeddedness of single nodes.

The Global Clustering Coefficient

The global clustering coefficient is based on triplets of nodes. A triplet is three nodes that are connected by either two (open triplet) or three (closed triplet) undirected ties. A triangle consists of three closed triplets, one centred on each of the nodes. The global clustering coefficient is the number of closed triplets (or 3 x triangles) over the total number of triplets (both open and closed). The first attempt to measure it was made by Luce and Perry (1949). This measure gives an indication of the clustering in the whole network (global), and can be applied to both undirected and directed networks (often called transitivity, see Wasserman and Faust, 1994, page 243). However, it cannot be applied to weighted networks.

Sample networkFor this sample network, the binary global clustering coefficient would be 3 over 9, or 0.33. The closed triplets are B→A←C; A→B←C; A→C←B; whereas the open triplets are A→B←D; A→B←E; C→B←D; C→B←E; D→B←E; B→E←F. As can be seen from the sample network, the strongest ties are inside the triangle. This is not captured by the binary coefficient as the weights are not considered.

A generalisation of the global clustering coefficient to weighted networks was proposed by Opsahl and Panzarasa (2009). This generalisation required a triplet value to be defined, and then, divided the value of closed triplets on the value of all triplets. The generalised clustering coefficient produces the same result as the binary global clustering coefficient when it is applied to a binary network. This is because all triplets have the same value, irrespective of the method used to calculate triplet values. In addition, the generalised coefficient has the same properties as C. Moreover, if weights are randomly assigned in the network, the weighted clustering coefficient is equal to the binary coefficient.

Four methods was suggested for defining a triplet value: the arithmetic mean, geometric mean, maximum, and minimum of the two tie weights that make up the triplet. The arithmetic mean is simply the average of the tie weights. This method is insensitive to differences between the two tie weights as an extreme value can have a major impact on the triplet value. The geometric mean of the weights attached to the two ties overcomes some of the sensitivity issues as a triplet made up by a tie with a low value and a tie with a high value will have a lower value than if the arithmetic mean were used. In addition, it is possible to use two extreme methods to deal with differences in tie weights. The maximum method takes the highest value of the two weights, and will make a triplet with a strong tie and a weak tie equal to a triplet with two strong ties. Conversely, the minimum method takes the lowest value of the two weights, and make triplets with a strong tie and a weak tie equal to triplets with two weak ties. The table below highlights the differences between the methods of defining the triplet value (adopted from Chapter 2 of my thesis).

Triplet value

It is vital to use an appropriate method for defining the value of a triplet as this impacts on the outcome of the coefficient. The method should be chosen based on the research question as well as the way in which the strength of the ties are operationalised into weights. For example, in a network where the weights correspond to the level of flow, and a weak tie would act as a bottleneck, the minimum method might be most appropriate to use.

Are strong ties embedded?To highlight a potential use of the binary and generalised global clustering coefficient, it is possible to see whether stronger triplets are more likely to be closed than weaker ones. This can be done by comparing the weighted version (C_{w}) to the binary version calculated on a binary network where all ties with a weight greater than 0 is set to present (C_{GT0}). In so doing, the two coefficients are calculated on the same network structure. The only difference is that the former takes the weights of ties into consideration. If the ratio between the weighted clustering coefficient and the binary coefficient is higher than 1, it can be argued that triplets made up by strong ties are more likely to be closed than triplets made up by weak ties (\frac{C_{w}}{C_{GT0}} > 1 \Rightarrow p(X) > p(Y)). On the contrary, it can be argued that triplets made up by weak ties are more likely to be closed than those made up by strong ties if the ratio is less than 1.

To empirically investigate whether strong ties are more likely to be part of closed triplets than weak ties, I have used the tnet networks. For simplicity, the arithmetic mean method is used for defining triplet values.

Network Nodes Ties C_{GT0} C_{w} \frac{C_{w}}{C_{GT0}}
1 Freeman’s EIES network (time 1) 48 695 0.7627 0.7702 1.0099
2 Freeman’s EIES network (time 2) 48 830 0.8131 0.8214 1.0102
3 Freeman’s EIES network (messages) 32 460 0.6386 0.7378 1.1555
4 Consulting (advice) 46 879 0.6932 0.7130 1.0285
5 Consulting (value) 46 858 0.6764 0.6852 1.0131
6 Research team (advice) 77 2228 0.6848 0.7127 1.0408
7 Research team (awareness) 77 2326 0.6785 0.6957 1.0253
8 C.elegans’ neural network 306 2345 0.1818 0.2364 1.3009
9 US airport network 500 2980 0.3514 0.4765 1.3562
10 Newman’s scientific collaboration network 16730 47594 0.3596 0.3178 0.8838

As it is possible to see from the above table, 9 of the 10 networks have a weighted clustering coefficient that is higher than the traditional clustering coefficient. This suggest that triangles generally are, in fact as suggested by Simmel (1923[1950]), made up by strong ties. The tenth network, which shows the opposite result, is different from the other nine networks. It is a one-mode projection of a two-mode network, whereas the first nine networks are native one-mode networks. This feature could explain the observed result as the one-mode projection is constructed in a way that penalises the weight of ties within triangles, or more specifically, large fully connected cliques (Newman, 2001). See the projecting two-mode networks.

Local Clustering Coefficient

The local clustering coefficient is based on ego network density or local density (Scott, 2000; Uzzi and Spiro, 2005; Watts and Strogatz, 1998). For a node, this is the fraction of the number of present ties over the total number of possible ties between the node’s neighbours. Therefore, the outcome ranges between 0 and 1: 0 if no ties exist between the neighbours, and 1 if all possible ties exists.

Sample networkFor the sample network, nodes A and C would get a score of 1 since all possible ties among their neighbours are present, whereas node E would get a score of 0 since none of the possible ties among its neighbours are present. Nodes D and F would not get a value since the number of possible ties among their neighbours is 0 (if a node has less than 2 neighbours, the coefficient is undefined). For node B, one out of six possible ties is present, so the coefficient is 1/6 or 0.1667.

The main advantage of this version of the clustering coefficient is that a score is assigned to each node (local measure). This enables researchers to study associations between the coefficient and other nodal properties (e.g. Panzarasa et al., 2009) and perform regression analyses with the observations being the nodes of a network (e.g. Uzzi and Lancaster, 2004). However, this version of the clustering coefficient suffers from three major limitations (see Opsahl and Panzarasa, 2009, for a longer discussion). First, its outcome does not take into consideration the weight of the ties in the network. Second, the local clustering coefficient cannot be calculated on directed networks. Third, a negative correlation with degree is often found in real-world networks. This is due to the fact that it is “easier” for a node with two neighbours to get a score of 1 (only one tie is need) than for a node with 10 neighbours (45 ties must be present).

Recently, there have been a number of attempts to extend the local clustering coefficient to the case of weighted networks (Barrat et al., 2004; Lopez-Fernandez et al., 2004; Onnela et al., 2005; Zhang and Horvath, 2005). I will focus on Barrat et al. (2004) measure. They proposed a generalisation of the local clustering coefficient to weighted networks by taking the weights explicitly into account. First, they assigned a triplet value to each triplet in the network based on the arithmetic mean (normal average). Then for each node, they summed the value of the closed triplets that were centred on the node and divided it by the total value of all triplets centred on the node. Similarily to the weighted global clustering coefficient, triplet values do not have to be the arithmetic mean, but could also be the geometric mean, maximum, and minimum of the two tie weights.

For the above sample network, all the nodes, except node B, would get the same outcome as the binary analysis. Nodes A and C get a value of 1 because all their neighbours are connected, whereas node E get a value of 0 because none of its neighbours are connected. Nodes D and E do not have a value as they have less than 2 neighbours. Node B does not get the same value as it has more than 2 neighbours, and is the centre of both open and closed triplets. The weighted local clustering coefficient is between 0.18 and 0.36, while the binary one is 0.17. The increased weighed coefficient is a reflection of node B’s strong ties being directed towards neighbours that are themselves connected. The table below highlights the local clustering coefficients.

Node Weighted local clustering coefficient using
arithmetic mean geometric mean maximum method minimum method binary version
1 1.00 1.00 1.00 1.00 1.00
2 0.24 0.27 0.18 0.36 0.17
3 1.00 1.00 1.00 1.00 1.00
4 NaN NaN NaN NaN NaN
5 0.00 0.00 0.00 0.00 0.00
6 NaN NaN NaN NaN NaN

Want to test it with your data?

The clustering coefficients can be calculated using tnet. First, you need to download and install tnet in R. Then, you need to create an edgelist of your network (see the data structures in tnet for weighted one-mode networks). The commands below show how the edgelist for the sample network here can manually be entered, and how to calculate the clustering coefficients.

# Load tnet
library(tnet)

# Manually enter the sample network
net <- cbind(
i=c(1,1,2,2,2,2,3,3,4,5,5,6),
j=c(2,3,1,3,4,5,1,2,2,2,6,5),
w=c(4,2,4,4,1,2,2,4,1,2,1,1))

# Compute the global clustering coefficients
clustering_w(net, measure=c("bi", "am", "gm", "ma", "mi"))

# Compute the local clustering coefficients
clustering_local_w(net, measure=c("bi", "am","gm","ma","mi"))
Depending on the size of your network, these R functions might be a bit slow. I have also programmed these functions in c++, which makes them much faster. Send me an email if you would like the source/compiled code of the c++ functions.

The code used to create the above table was:

# Load tnet
library(tnet)

# Load networks
data(tnet)
net <- list()
net[[1]] <- Freemans.EIES.net.1.n48
net[[2]] <- Freemans.EIES.net.2.n48
net[[3]] <- Freemans.EIES.net.3.n32
net[[4]] <- Cross.Parker.Consulting.net.info
net[[5]] <- Cross.Parker.Consulting.net.value
net[[6]] <- Cross.Parker.Manufacturing.net.info
net[[7]] <- Cross.Parker.Manufacturing.net.aware
net[[8]] <- celegans.n306.net
net[[9]] <- USairport.n500.net
net[[10]] <- Newman.Condmat.95.99.net.1mode.wNewman

# Calculate values
output <- data.frame(CGT0=NaN, Cw=NaN, Ratio=NaN)
for(i in 1:length(net)) 
  output[i,c("CGT0","Cw")] <- clustering_w(net[[i]], measure=c("bi","am"))
output[,"Ratio"] <- output[,"Cw"]/output[,"CGT0"]

References

Barrat, A., Barthelemy, M., Pastor-Satorras, R., Vespignani, A., 2004. The architecture of complex weighted networks. Proceedings of the National Academy of Sciences 101 (11), 3747-3752. arXiv:cond-mat/0311416

Burt, R. S., 1992. Structural Holes: The Social Structure of Competition. Harvard University Press, Cambridge, MA.

Coleman, J. S., 1988. Social capital in the creation of human capital. American Journal of Sociology 94, S95-S120.

Colizza, V., Pastor-Satorras, R., Vespignani, A., 2007. Reaction-diffusion processes and metapopulation models in heterogeneous networks. Nature Physics 3, 276-282. arXiv:cond-mat/0703129

Doreian, P., 1969. A note on the detection of cliques in valued graphs. Sociometry 32 (2), 237-242.

Feld, S. L., 1981. The focused organization of social ties. American Journal of Sociology 86, 1015-1035.

Friedkin, N. E., 1984. Structural cohesion and equivalence explanations of social homogeneiety. Sociological Methods and Research 12, 235-261.

Granovetter, M., 1973. The strength of weak ties. American Journal of Sociology 78, 1360-1380.

Heider, F., 1946. Attitudes and cognitive organization. Journal of Psychology 21, 107-112.

Holland, P. W., Leinhardt, S., 1970. A method for detecting structure in sociometric data. American Journal of Sociology 76, 492-513.

Holland, P. W., Leinhardt, S., 1971. Transitivity in structural models of small groups. Comparative Group Studies 2, 107-124.

Louch, H., 2000. Personal network integration: Transitivity and homophily in strong-tie relations. Social Networks 22, 45-64.

Luce, R. D., Perry, A. D., 1949. A method of matrix analysis of group structure. Psychometrika 14 (1), 95-116.

Opsahl, T., Panzarasa, P., 2009. Clustering in weighted networks. Social Networks 31 (2), 155-163.

Panzarasa, P., Opsahl, T., Carley, K. M., 2009. Patterns and dynamics of users’ behavior and interaction: Network analysis of an online community. Journal of the American Society for Information Science and Technology 60 (5), 911-932.

Scott, J., 2000. Social Network Analysis: A Handbook. Sage Publications, London, UK.

Simmel, G., 1950. The Sociology of Georg Simmel (KH Wolff, trans.). Free Press, New York, NY.

Snijders, T. A. B., 2001. The statistical evaluation of social network dynamics. Sociological Methodology 31, 361-395.

Snijders, T. A. B., Pattison, P. E., Robins, G. L., Handcock, M. S., 2006. New specifications for exponential random graph models. Sociological Methodology 35, 99-153.

Uzzi, B., Lancaster, R., 2004. Embeddedness and price formation in the corporate law market. American Sociological Review 69, 319-344.

Uzzi, B., Spiro, J., 2005. Collaboration and creativity: The small world problem. American Journal of Sociology 111, 447-504.

Wasserman, S., Faust, K., 1994. Social Network Analysis. Cambridge University Press, Cambridge, MA.

Watts, D. J., Strogatz, S. H., 1998. Collective dynamics of “small-world” networks. Nature 393, 440-442.

If you use any of the information on this page, please cite: Opsahl, T., Panzarasa, P., 2009. Clustering in weighted networks. Social Networks 31 (2), 155-163

11 Comments Add your own

  • 1. Jinie Pak  |  July 17, 2013 at 9:19 pm

    Hi Tore,
    I have a question about tnet clustering coefficient function.
    My data is weighted one-mode network (edge list).
    I tried to get clustering coefficient score for this network, but I kept getting the following error messages:

    Error in clustering_local_w(data) : Network is not undirected!
    Measure is not defined from directed networks.
    In addition: Warning message:
    In as.tnet(net, type = “weighted one-mode tnet”) :
    There were self-loops in the edgelist, these were removed
    Can you point out what I am doing wrong?
    Thank you!

    Reply
    • 2. Tore Opsahl  |  July 19, 2013 at 12:49 am

      Jinie,

      This happens because you have ties from a node to the same node (i.e., where the i column is equal to the j column).

      Tore

      Reply
  • 3. Theo  |  October 24, 2013 at 9:57 am

    Hi Tore,

    A question about the global clustering function.

    I apply the function to small weighted networks. I normally keep the network data in a matrix format. I have converted the matrixes to edgelists before I applied the global clustering coefficients function but I noticed that the results are the same as when I apply the function directly to the data in the matrix format. Is there any special reason why I should keep converting the data?

    Best regards!
    Theo

    Reply
    • 4. Tore Opsahl  |  October 24, 2013 at 2:14 pm

      Hi Theo,

      Thank you for your comment!

      All tnet functions can be applied to edgelists and matrices. Specifically, the as.tnet-function is also run first by the functions if it hasn’t manually be run already. This function makes a number of assumptions based on the properties of the supplied data object:

      1. if it has 3 columns, it is assumed to be “weighted one-mode tnet”
      2. if it has 2 columns, it is assumed to be “binary two-mode tnet”
      3. if it has 4 columns, it is assumed to be “longitudinal tnet”
      4. if it has more than 4 columns and the same number of rows as columns, it is transformed into an edgelist and assumed to be “weighted one-mode tnet”
      5. if it has more than 4 columns, not the same number of rows as columns, and consists of only 0s and 1s, it is transformed into an edgelist and assumed to be “binary two-mode tnet”
      6. if it has more than 4 columns, not the same number of rows as columns, and consists of not only 0s and 1s, it is transformed into an edgelist and assumed to be “weighted two-mode tnet”

      As good practice, I would always recommend running as.tnet and specifying the type-parameter manually after loading your network. For example:

      net.matrix <- read.table(…)
      net <- as.tnet(net.matrix, type="weighted one-mode tnet")
      clustering_w(net)

      Best,
      Tore

      Reply
  • 5. Theo  |  October 25, 2013 at 10:19 am

    Hi Tore,

    Thank you for your quick reply and a very useful answer! Since I work with square matrices tnet transforms them to weighted one-mode tnet. I will check the type-parameter to be sure.

    Since I work with small networks I want to keep track of different nodes. The fact that nodes are given a number when the matrix is converted to an edgelist makes this more difficult. Is there an easy way to get node ID’s from a matrix to the edgelist?

    Best regards,
    Theo

    Reply
    • 6. Tore Opsahl  |  October 26, 2013 at 2:04 pm

      Hi Theo,

      I am not entire sure what you mean by node ids as this information is not contained in the matrix of ties except for row/column numbers. The as.tnet function takes the row/column number as the id by using which applied to a matrix with arr.ind equal to TRUE (line 40 in the as.tnet-function: net 0, arr.ind = TRUE), net[net > 0]) ).

      If you would like consistent id numbers, I suggest explicitly create edgelists with the numbers.

      Best,
      Tore

      Reply
  • 7. Eduarda  |  March 24, 2014 at 4:40 pm

    Hi Tore,
    a have a question about the output of clustering_w. The result is supposed to be in a range between 0 and 1 right? I’m applying this to a non directed network represented by a symmetric matrix of 44 vertex and weighted edges. All the values are non negative.
    Using the comand:

    clu2<-clustering_w(O2,measure=c("am", "gm", "ma", "mi"))

    I'm getting

    clu2
    am gm ma mi
    1.986471 4.322812 7.850534 1.288835

    Do you have any idea of what am I doing wrong?

    Reply
    • 8. Tore Opsahl  |  March 25, 2014 at 1:51 am

      Hi Eduarda,

      The clustering coefficient should indeed be between 0 and 1. Could you send me your data and code? Then I will have a look at it.

      Best,
      Tore

      Reply
  • 9. Lorenzo  |  July 26, 2014 at 6:12 pm

    Hello. I’d like to leave you a question.
    On en.wikipedia, at a certain point (https://en.wikipedia.org/wiki/Clustering_coefficient#Global_clustering_coefficient), it says that Watts and Strogatz defined the GCC as (as I have understood) the average of the LCCs. But I’ve done some tries and the method of the triplets and the method of the average of the LCCs are not equivalent.
    How is it possible?

    Reply
    • 10. Tore Opsahl  |  July 26, 2014 at 6:33 pm

      Hi Lorenzo,

      There are multiple ways of defining the overall level of clustering in a network. The global clustering coefficient which is defined as 3x triangles over triples has been around for a long time (all the way back to Luce and Perry, 1949). In their 1998 paper, Watts and Strogatz introduced the local clustering coefficient, which is a node level metric (studying the number of ties among neighbors were not a new idea, see Granovetter, 1973, Coleman, 1988, and Burt, 1992). To get an overall coefficient, Watts and Strogatz took a simple average of the node-level scores. By using a simple average, you get a different answer than the global clustering coefficient if the nodes do not all have the same number of ties. To get the same answer, you can use a weighted average where each node is weighted by the number of triplets they are the center of, n*(n-1) (see code below).

      Hope this helps and clarifies!
      Tore

      # Load tnet and tnet's datasets
      library(tnet)
      data(tnet)
      
      # Global clustering coefficient of US 500 airport network
      clustering_w(USairport.n500.net, measure="bi")
      
      # 0.3513818 
      
      # Local clustering scores
      lc <- clustering_local_w(USairport.n500.net, measure="bi")
      
      # Get degree scores
      k <- degree_w(USairport.n500.net, measure="degree")
      
      # Triplets
      t <- k[,"degree"]*(k[,"degree"]-1)
      
      # Weighted mean
      weighted.mean(lc[,"bi"], t)
      
      # 0.3513818
      
      Reply
      • 11. Lorenzo  |  July 27, 2014 at 11:29 am

        Thank you very much for the immediate response!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Subscribe to the comments via RSS Feed


Follow

Get every new post delivered to your Inbox.

Join 75 other followers

%d bloggers like this: