## Clustering in Weighted Networks

A fundamental measure that has long received attention in both theoretical and empirical research is the clustering coefficient. This measure assesses the degree to which nodes tend to cluster together. Evidence suggests that in most real-world networks, and in particular social networks, nodes tend to create tightly knit groups characterised by a relatively high density of ties (Feld, 1981; Friedkin, 1984; Heider, 1946; Holland and Leinhardt, 1970; Louch, 2000; Snijders, 2001; Snijders et al., 2006; Watts and Strogatz, 1998). In real-world networks, this likelihood tends to be greater than the average probability of a tie randomly established between two nodes (Holland and Leinhardt, 1971; Wasserman and Faust, 1994; Watts and Strogatz, 1998).

Traditionally, the two versions of the clustering coefficient developed for testing the tendency of nodes to cluster together into tightly knit groups are the global clustering coefficient (Doreian, 1969) and the local clustering coefficient (Watts and Strogatz, 1998). The global version was designed to give an overall indication of the clustering in the network, whereas the local gives an indication of the embeddedness of single nodes.

**The Global Clustering Coefficient**

The global clustering coefficient is based on triplets of nodes. A triplet is three nodes that are connected by either two (open triplet) or three (closed triplet) undirected ties. A triangle consists of three closed triplets, one centred on each of the nodes. The global clustering coefficient is the number of closed triplets (or *3 x triangles*) over the total number of triplets (both open and closed). The first attempt to measure it was made by Luce and Perry (1949). This measure gives an indication of the clustering in the whole network (global), and can be applied to both undirected and directed networks (often called transitivity, see Wasserman and Faust, 1994, page 243). However, it cannot be applied to weighted networks.

For this sample network, the binary global clustering coefficient would be 3 over 9, or 0.33. The closed triplets are B→A←C; A→B←C; A→C←B; whereas the open triplets are A→B←D; A→B←E; C→B←D; C→B←E; D→B←E; B→E←F. As can be seen from the sample network, the strongest ties are inside the triangle. This is not captured by the binary coefficient as the weights are not considered.

A generalisation of the global clustering coefficient to weighted networks was proposed by Opsahl and Panzarasa (2009). This generalisation required a triplet value to be defined, and then, divided the value of closed triplets on the value of all triplets. The generalised clustering coefficient produces the same result as the binary global clustering coefficient when it is applied to a binary network. This is because all triplets have the same value, irrespective of the method used to calculate triplet values. In addition, the generalised coefficient has the same properties as C. Moreover, if weights are randomly assigned in the network, the weighted clustering coefficient is equal to the binary coefficient.

Four methods was suggested for defining a triplet value: the arithmetic mean, geometric mean, maximum, and minimum of the two tie weights that make up the triplet. The arithmetic mean is simply the average of the tie weights. This method is insensitive to differences between the two tie weights as an extreme value can have a major impact on the triplet value. The geometric mean of the weights attached to the two ties overcomes some of the sensitivity issues as a triplet made up by a tie with a low value and a tie with a high value will have a lower value than if the arithmetic mean were used. In addition, it is possible to use two extreme methods to deal with differences in tie weights. The maximum method takes the highest value of the two weights, and will make a triplet with a strong tie and a weak tie equal to a triplet with two strong ties. Conversely, the minimum method takes the lowest value of the two weights, and make triplets with a strong tie and a weak tie equal to triplets with two weak ties. The table below highlights the differences between the methods of defining the triplet value (adopted from Chapter 2 of my thesis).

It is vital to use an appropriate method for defining the value of a triplet as this impacts on the outcome of the coefficient. The method should be chosen based on the research question as well as the way in which the strength of the ties are operationalised into weights. For example, in a network where the weights correspond to the level of flow, and a weak tie would act as a bottleneck, the minimum method might be most appropriate to use.

To highlight a potential use of the binary and generalised global clustering coefficient, it is possible to see whether stronger triplets are more likely to be closed than weaker ones. This can be done by comparing the weighted version () to the binary version calculated on a binary network where all ties with a weight greater than 0 is set to present (). In so doing, the two coefficients are calculated on the same network structure. The only difference is that the former takes the weights of ties into consideration. If the ratio between the weighted clustering coefficient and the binary coefficient is higher than 1, it can be argued that triplets made up by strong ties are more likely to be closed than triplets made up by weak ties (). On the contrary, it can be argued that triplets made up by weak ties are more likely to be closed than those made up by strong ties if the ratio is less than 1.

To empirically investigate whether strong ties are more likely to be part of closed triplets than weak ties, I have used the tnet networks. For simplicity, the arithmetic mean method is used for defining triplet values.

Network | Nodes | Ties | ||||
---|---|---|---|---|---|---|

1 | Freeman’s EIES network (time 1) | 48 | 695 | 0.7627 | 0.7702 | 1.0099 |

2 | Freeman’s EIES network (time 2) | 48 | 830 | 0.8131 | 0.8214 | 1.0102 |

3 | Freeman’s EIES network (messages) | 32 | 460 | 0.6386 | 0.7378 | 1.1555 |

4 | Consulting (advice) | 46 | 879 | 0.6932 | 0.7130 | 1.0285 |

5 | Consulting (value) | 46 | 858 | 0.6764 | 0.6852 | 1.0131 |

6 | Research team (advice) | 77 | 2228 | 0.6848 | 0.7127 | 1.0408 |

7 | Research team (awareness) | 77 | 2326 | 0.6785 | 0.6957 | 1.0253 |

8 | C.elegans’ neural network | 306 | 2345 | 0.1818 | 0.2364 | 1.3009 |

9 | US airport network | 500 | 2980 | 0.3514 | 0.4765 | 1.3562 |

10 | Newman’s scientific collaboration network | 16730 | 47594 | 0.3596 | 0.3178 | 0.8838 |

As it is possible to see from the above table, 9 of the 10 networks have a weighted clustering coefficient that is higher than the traditional clustering coefficient. This suggest that triangles generally are, in fact as suggested by Simmel (1923[1950]), made up by strong ties. The tenth network, which shows the opposite result, is different from the other nine networks. It is a one-mode projection of a two-mode network, whereas the first nine networks are native one-mode networks. This feature could explain the observed result as the one-mode projection is constructed in a way that penalises the weight of ties within triangles, or more specifically, large fully connected cliques (Newman, 2001). See the projecting two-mode networks.

**Local Clustering Coefficient**

The local clustering coefficient is based on ego network density or local density (Scott, 2000; Uzzi and Spiro, 2005; Watts and Strogatz, 1998). For a node, this is the fraction of the number of present ties over the total number of possible ties between the node’s neighbours. Therefore, the outcome ranges between 0 and 1: 0 if no ties exist between the neighbours, and 1 if all possible ties exists.

For the sample network, nodes A and C would get a score of 1 since all possible ties among their neighbours are present, whereas node E would get a score of 0 since none of the possible ties among its neighbours are present. Nodes D and F would not get a value since the number of possible ties among their neighbours is 0 (if a node has less than 2 neighbours, the coefficient is undefined). For node B, one out of six possible ties is present, so the coefficient is 1/6 or 0.1667.

The main advantage of this version of the clustering coefficient is that a score is assigned to each node (local measure). This enables researchers to study associations between the coefficient and other nodal properties (e.g. Panzarasa et al., 2009) and perform regression analyses with the observations being the nodes of a network (e.g. Uzzi and Lancaster, 2004). However, this version of the clustering coefficient suffers from three major limitations (see Opsahl and Panzarasa, 2009, for a longer discussion). First, its outcome does not take into consideration the weight of the ties in the network. Second, the local clustering coefficient cannot be calculated on directed networks. Third, a negative correlation with degree is often found in real-world networks. This is due to the fact that it is “easier” for a node with two neighbours to get a score of 1 (only one tie is need) than for a node with 10 neighbours (45 ties must be present).

Recently, there have been a number of attempts to extend the local clustering coefficient to the case of weighted networks (Barrat et al., 2004; Lopez-Fernandez et al., 2004; Onnela et al., 2005; Zhang and Horvath, 2005). I will focus on Barrat et al. (2004) measure. They proposed a generalisation of the local clustering coefficient to weighted networks by taking the weights explicitly into account. First, they assigned a triplet value to each triplet in the network based on the arithmetic mean (normal average). Then for each node, they summed the value of the closed triplets that were centred on the node and divided it by the total value of all triplets centred on the node. Similarily to the weighted global clustering coefficient, triplet values do not have to be the arithmetic mean, but could also be the geometric mean, maximum, and minimum of the two tie weights.

For the above sample network, all the nodes, except node B, would get the same outcome as the binary analysis. Nodes A and C get a value of 1 because all their neighbours are connected, whereas node E get a value of 0 because none of its neighbours are connected. Nodes D and E do not have a value as they have less than 2 neighbours. Node B does not get the same value as it has more than 2 neighbours, and is the centre of both open and closed triplets. The weighted local clustering coefficient is between 0.18 and 0.36, while the binary one is 0.17. The increased weighed coefficient is a reflection of node B’s strong ties being directed towards neighbours that are themselves connected. The table below highlights the local clustering coefficients.

Node | Weighted local clustering coefficient using | ||||
---|---|---|---|---|---|

arithmetic mean | geometric mean | maximum method | minimum method | binary version | |

1 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |

2 | 0.24 | 0.27 | 0.18 | 0.36 | 0.17 |

3 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |

4 | NaN | NaN | NaN | NaN | NaN |

5 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |

6 | NaN | NaN | NaN | NaN | NaN |

**Want to test it with your data?**

The clustering coefficients can be calculated using tnet. First, you need to download and install tnet in *R*. Then, you need to create an edgelist of your network (see the data structures in tnet for weighted one-mode networks). The commands below show how the edgelist for the sample network here can manually be entered, and how to calculate the clustering coefficients.

# Load tnet library(tnet) # Manually enter the sample network net <- cbind( i=c(1,1,2,2,2,2,3,3,4,5,5,6), j=c(2,3,1,3,4,5,1,2,2,2,6,5), w=c(4,2,4,4,1,2,2,4,1,2,1,1)) # Compute the global clustering coefficients clustering_w(net, measure=c("bi", "am", "gm", "ma", "mi")) # Compute the local clustering coefficients clustering_local_w(net, measure=c("bi", "am","gm","ma","mi"))

The code used to create the above table was:

# Load tnet library(tnet) # Load networks data(tnet) net <- list() net[[1]] <- Freemans.EIES.net.1.n48 net[[2]] <- Freemans.EIES.net.2.n48 net[[3]] <- Freemans.EIES.net.3.n32 net[[4]] <- Cross.Parker.Consulting.net.info net[[5]] <- Cross.Parker.Consulting.net.value net[[6]] <- Cross.Parker.Manufacturing.net.info net[[7]] <- Cross.Parker.Manufacturing.net.aware net[[8]] <- celegans.n306.net net[[9]] <- USairport.n500.net net[[10]] <- Newman.Condmat.95.99.net.1mode.wNewman # Calculate values output <- data.frame(CGT0=NaN, Cw=NaN, Ratio=NaN) for(i in 1:length(net)) output[i,c("CGT0","Cw")] <- clustering_w(net[[i]], measure=c("bi","am")) output[,"Ratio"] <- output[,"Cw"]/output[,"CGT0"]

**References**

Barrat, A., Barthelemy, M., Pastor-Satorras, R., Vespignani, A., 2004. The architecture of complex weighted networks. Proceedings of the National Academy of Sciences 101 (11), 3747-3752. arXiv:cond-mat/0311416

Burt, R. S., 1992. Structural Holes: The Social Structure of Competition. Harvard University Press, Cambridge, MA.

Coleman, J. S., 1988. Social capital in the creation of human capital. American Journal of Sociology 94, S95-S120.

Colizza, V., Pastor-Satorras, R., Vespignani, A., 2007. Reaction-diffusion processes and metapopulation models in heterogeneous networks. Nature Physics 3, 276-282. arXiv:cond-mat/0703129

Doreian, P., 1969. A note on the detection of cliques in valued graphs. Sociometry 32 (2), 237-242.

Feld, S. L., 1981. The focused organization of social ties. American Journal of Sociology 86, 1015-1035.

Friedkin, N. E., 1984. Structural cohesion and equivalence explanations of social homogeneiety. Sociological Methods and Research 12, 235-261.

Granovetter, M., 1973. The strength of weak ties. American Journal of Sociology 78, 1360-1380.

Heider, F., 1946. Attitudes and cognitive organization. Journal of Psychology 21, 107-112.

Holland, P. W., Leinhardt, S., 1970. A method for detecting structure in sociometric data. American Journal of Sociology 76, 492-513.

Holland, P. W., Leinhardt, S., 1971. Transitivity in structural models of small groups. Comparative Group Studies 2, 107-124.

Louch, H., 2000. Personal network integration: Transitivity and homophily in strong-tie relations. Social Networks 22, 45-64.

Luce, R. D., Perry, A. D., 1949. A method of matrix analysis of group structure. Psychometrika 14 (1), 95-116.

Opsahl, T., Panzarasa, P., 2009. Clustering in weighted networks. Social Networks 31 (2), 155-163.

Panzarasa, P., Opsahl, T., Carley, K. M., 2009. Patterns and dynamics of users’ behavior and interaction: Network analysis of an online community. Journal of the American Society for Information Science and Technology 60 (5), 911-932.

Scott, J., 2000. Social Network Analysis: A Handbook. Sage Publications, London, UK.

Simmel, G., 1950. The Sociology of Georg Simmel (KH Wolff, trans.). Free Press, New York, NY.

Snijders, T. A. B., 2001. The statistical evaluation of social network dynamics. Sociological Methodology 31, 361-395.

Snijders, T. A. B., Pattison, P. E., Robins, G. L., Handcock, M. S., 2006. New specifications for exponential random graph models. Sociological Methodology 35, 99-153.

Uzzi, B., Lancaster, R., 2004. Embeddedness and price formation in the corporate law market. American Sociological Review 69, 319-344.

Uzzi, B., Spiro, J., 2005. Collaboration and creativity: The small world problem. American Journal of Sociology 111, 447-504.

Wasserman, S., Faust, K., 1994. Social Network Analysis. Cambridge University Press, Cambridge, MA.

Watts, D. J., Strogatz, S. H., 1998. Collective dynamics of “small-world” networks. Nature 393, 440-442.

1.Jinie Pak | July 17, 2013 at 9:19 pmHi Tore,

I have a question about tnet clustering coefficient function.

My data is weighted one-mode network (edge list).

I tried to get clustering coefficient score for this network, but I kept getting the following error messages:

Error in clustering_local_w(data) : Network is not undirected!

Measure is not defined from directed networks.

In addition: Warning message:

In as.tnet(net, type = “weighted one-mode tnet”) :

There were self-loops in the edgelist, these were removed

Can you point out what I am doing wrong?

Thank you!

2.Tore Opsahl | July 19, 2013 at 12:49 amJinie,

This happens because you have ties from a node to the same node (i.e., where the i column is equal to the j column).

Tore

3.Theo | October 24, 2013 at 9:57 amHi Tore,

A question about the global clustering function.

I apply the function to small weighted networks. I normally keep the network data in a matrix format. I have converted the matrixes to edgelists before I applied the global clustering coefficients function but I noticed that the results are the same as when I apply the function directly to the data in the matrix format. Is there any special reason why I should keep converting the data?

Best regards!

Theo

4.Tore Opsahl | October 24, 2013 at 2:14 pmHi Theo,

Thank you for your comment!

All tnet functions can be applied to edgelists and matrices. Specifically, the as.tnet-function is also run first by the functions if it hasn’t manually be run already. This function makes a number of assumptions based on the properties of the supplied data object:

notthe same number of rows as columns, and consists of only 0s and 1s, it is transformed into an edgelist and assumed to be “binary two-mode tnet”notthe same number of rows as columns, and consists ofnotonly 0s and 1s, it is transformed into an edgelist and assumed to be “weighted two-mode tnet”As good practice, I would always recommend running as.tnet and specifying the type-parameter manually after loading your network. For example:

net.matrix <- read.table(…)

net <- as.tnet(net.matrix, type="weighted one-mode tnet")

clustering_w(net)

Best,

Tore

5.Theo | October 25, 2013 at 10:19 amHi Tore,

Thank you for your quick reply and a very useful answer! Since I work with square matrices tnet transforms them to weighted one-mode tnet. I will check the type-parameter to be sure.

Since I work with small networks I want to keep track of different nodes. The fact that nodes are given a number when the matrix is converted to an edgelist makes this more difficult. Is there an easy way to get node ID’s from a matrix to the edgelist?

Best regards,

Theo

6.Tore Opsahl | October 26, 2013 at 2:04 pmHi Theo,

I am not entire sure what you mean by node ids as this information is not contained in the matrix of ties except for row/column numbers. The as.tnet function takes the row/column number as the id by using which applied to a matrix with arr.ind equal to TRUE (line 40 in the as.tnet-function: net 0, arr.ind = TRUE), net[net > 0]) ).

If you would like consistent id numbers, I suggest explicitly create edgelists with the numbers.

Best,

Tore

7.Eduarda | March 24, 2014 at 4:40 pmHi Tore,

a have a question about the output of clustering_w. The result is supposed to be in a range between 0 and 1 right? I’m applying this to a non directed network represented by a symmetric matrix of 44 vertex and weighted edges. All the values are non negative.

Using the comand:

clu2<-clustering_w(O2,measure=c("am", "gm", "ma", "mi"))

I'm getting

clu2

am gm ma mi

1.986471 4.322812 7.850534 1.288835

Do you have any idea of what am I doing wrong?

8.Tore Opsahl | March 25, 2014 at 1:51 amHi Eduarda,

The clustering coefficient should indeed be between 0 and 1. Could you send me your data and code? Then I will have a look at it.

Best,

Tore

9.Lorenzo | July 26, 2014 at 6:12 pmHello. I’d like to leave you a question.

On en.wikipedia, at a certain point (https://en.wikipedia.org/wiki/Clustering_coefficient#Global_clustering_coefficient), it says that Watts and Strogatz defined the GCC as (as I have understood) the average of the LCCs. But I’ve done some tries and the method of the triplets and the method of the average of the LCCs are not equivalent.

How is it possible?

10.Tore Opsahl | July 26, 2014 at 6:33 pmHi Lorenzo,

There are multiple ways of defining the overall level of clustering in a network. The global clustering coefficient which is defined as 3x triangles over triples has been around for a long time (all the way back to Luce and Perry, 1949). In their 1998 paper, Watts and Strogatz introduced the local clustering coefficient, which is a node level metric (studying the number of ties among neighbors were not a new idea, see Granovetter, 1973, Coleman, 1988, and Burt, 1992). To get an overall coefficient, Watts and Strogatz took a simple average of the node-level scores. By using a simple average, you get a different answer than the global clustering coefficient if the nodes do not all have the same number of ties. To get the same answer, you can use a weighted average where each node is weighted by the number of triplets they are the center of, n*(n-1) (see code below).

Hope this helps and clarifies!

Tore

11.Lorenzo | July 27, 2014 at 11:29 amThank you very much for the immediate response!

12.Rose | August 28, 2014 at 7:16 pmHi Tore,

I have a geographic distance matrix (distance of one household to another). I have taken the inverse distance, so households located closer to each other have stronger weights. All the households have the same number of connections. I am interested in calculating network level measures, instead of individual nodes. I have 100 20×20 (dimension) matrices. I have tried calculating the global clustering coefficient for each matrix after I have transformed each matrix to an edge list. However, the global clustering coefficient I get for all the matrices are 1, which is not what I would expect because of the different weights for each matrix.

Also, would you be able to recommend other network level measures? I know you have discussed some of the issues of calculating somehow aggregate measures, but I was just wondering maybe it may not be a big issue if all households are connected to the same number of households anyway and that the network structure is fixed (household location)? Thanks in advance.

Sincerely,

Rose

13.Tore Opsahl | August 30, 2014 at 12:02 amHi Rose,

The clustering metrics all become 1 when the network is fully connected as would be in your case. Have you thought about the weighted rich-club?

Let me know how you end up analyzing your network!

Tore

14.Rose | August 30, 2014 at 12:38 amHi Tore,

Ohh, that was what I was guessing. Thanks for confirming.

About the weighted rich-club suggestion, I did see the R commands you posted. I also read through you paper and I think it is a good measure for what I am looking for. I followed what you have posted (which may be a wrong application for me), but I think I may be doing something wrong and hence not getting the weighted rich club coefficient. Here is the command I used:

out <- weighted_richclub_w(edge_b2, rich="k", reshuffle="links", NR=1000, seed=1)

where edge_b2 is an edgelist for one of my neighborhoods. Is that how I do it? Cause I'm not getting the answer. Sorry if this seems like an easy question but I couldn't figure it out as of now. Thanks so much!

Sincerely,

Krisha

15.Tore Opsahl | August 30, 2014 at 12:39 amKrisha,

In the case of a fully connected network, I would use the weight reshuffling. Let me know how it works!

Tore