Local clustering coefficient for two-mode networks

January 6, 2010

This post is part of a preprint titled Triadic closure in two-mode networks: Redefining the global and local clustering coefficient. The preprint is available on the Publications-page.

In a similar vein to the global clustering coefficient that I proposed in Clustering in two-mode networks, the local clustering coefficient can be redefined for two-mode networks. Watts and Strogatz (1998) originally defined the local clustering coefficient of a focal node as the number of ties present among the node’s neighbours divided by the number of possible ties among them. For a focal node i, it can be formalized as follows:

C_{local}(i) = \frac{\mbox{actual ties between a node's neighbours}}{\mbox{possible ties between a node's neighbours}}=\frac{\tau_{i,\Delta}}{\tau_{i}}

where \tau_{i} is the number of 2-paths centered on node i, and \tau_{i,\Delta} is the number of these 2-paths that are closed. While the global clustering coefficient aggregates over all 2-paths in the network, the local coefficient aggregates only over the 2-paths centered on a single node, and can therefore be seen as an intermediate level of aggregation.
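To make the 2-path formulation concrete, here is a minimal base-R sketch of the one-mode coefficient. It assumes an undirected, binary network stored as a symmetric 0/1 adjacency matrix with an empty diagonal, and the function name local_clustering is hypothetical (it is not part of tnet). The 2-paths centered on node i are counted as ordered pairs of distinct neighbours, k_i(k_i - 1). The closed ones are read off the diagonal of A^3, which also counts each closed 2-path once per direction, so the ratio equals \tau_{i,\Delta}/\tau_{i}.

```r
# Hypothetical helper (not part of tnet): one-mode local clustering coefficient
# for an undirected, binary adjacency matrix A with an empty diagonal.
local_clustering <- function(A) {
  k <- rowSums(A)                  # degree of each node
  tau <- k * (k - 1)               # 2-paths centered on each node (ordered pairs)
  closed <- diag(A %*% A %*% A)    # closed 2-paths, also counted in both directions
  closed / tau                     # NaN when a node has fewer than two neighbours
}

# Example: a triangle between nodes 1, 2 and 3, plus node 4 tied only to node 3
A <- matrix(0, nrow = 4, ncol = 4)
A[cbind(c(1, 1, 2, 3), c(2, 3, 3, 4))] <- 1
A <- A + t(A)                      # make the ties undirected
cc <- local_clustering(A)
print(cc)                          # 1, 1, 1/3 and NaN
```

Node 3 has three neighbours (six ordered 2-paths) but only one tie among them (two closed 2-paths), hence 1/3.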

When the traditional local clustering coefficient is applied to the one-mode projection of a two-mode network, every set of nodes connected to a common node in the two-mode network becomes a clique in the projection. These cliques contain a high number of triangles, which affects measures that rely on ego-network density, such as the clustering coefficients and structural holes measures (Burt, 1992). In particular, the average local clustering coefficient is overestimated for projected networks, as even projections of random two-mode networks have a clustering coefficient higher than would be expected in a random one-mode network of the same density. Therefore, a new measure that does not overestimate the level of clustering is needed.
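A minimal illustration of the clique effect, assuming the standard binary projection in which two women are tied if they attended at least one common event: a single event attended by four hypothetical women, W1 to W4, becomes a complete clique after projection, so each woman receives a one-mode local clustering coefficient of 1 even though only one shared event was observed.

```r
# Hypothetical example: four women (W1-W4) who attended a single event E1
B <- matrix(1, nrow = 4, ncol = 1,
            dimnames = list(c("W1", "W2", "W3", "W4"), "E1"))

# Binary one-mode projection: a tie if two women share at least one event
A <- B %*% t(B)
diag(A) <- 0
A[A > 0] <- 1

# One-mode local clustering coefficient via 2-paths
k <- rowSums(A)
cc <- diag(A %*% A %*% A) / (k * (k - 1))
print(cc)   # every woman gets 1
```

In the two-mode network itself, none of these women has a 4-path centered on her, since each attended only one event, so the redefined coefficient proposed below would be undefined for them rather than 1.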

Given the extension of 2-paths in one-mode networks to 4-paths in two-mode networks for the global clustering coefficient, the numerator and denominator of the local clustering coefficient can also be redefined in terms of 4-paths. Whereas the original local coefficient was based on 2-paths centered on the focal node, the two-mode version is based on 4-paths centered on the focal node. This implies that the first and last nodes of each path are of the same mode as the focal node, while the second and fourth nodes are of the other mode. Formally, I propose:

C^{*}_{local}(i) = \frac{\mbox{closed 4-paths centered on ego}}{\mbox{4-paths centered on ego}}=\frac{\tau^{*}_{i,\Delta}}{\tau^{*}_{i}}

where \tau^{*}_{i} is the number of 4-paths with ego as the middle node, and \tau^{*}_{i,\Delta} is the number of these 4-paths in which the first and last nodes of the path share a common node that is not part of the 4-path.
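The definition can be sketched directly in base R. This is not tnet's implementation: tm_local_clustering is a hypothetical helper that, for each primary node (ego), enumerates every 4-path i, p1, ego, p2, k with p1 different from p2 and i different from k. It counts a 4-path as closed when i and k share a secondary node other than p1 and p2. The sketch only handles binary ties, and it counts each path in both directions, which doubles the numerator and the denominator alike and so leaves the ratio unchanged. On the sample network used in the tnet example, my hand count gives the same values as the binary tnet output reported for that network.

```r
# Hypothetical helper (not tnet's clustering_tm_local): binary two-mode local
# clustering. Column 1 of `edges` holds primary nodes, column 2 secondary nodes.
tm_local_clustering <- function(edges) {
  primary <- edges[, 1]
  secondary <- edges[, 2]
  events_of <- lapply(split(secondary, primary), unique)   # secondary nodes per primary node
  members_of <- lapply(split(primary, secondary), unique)  # primary nodes per secondary node
  nodes <- sort(unique(primary))
  lc <- sapply(nodes, function(ego) {
    ego_events <- events_of[[as.character(ego)]]
    total <- 0
    closed <- 0
    for (p1 in ego_events) {
      for (p2 in ego_events) {
        if (p1 == p2) next                     # the two secondary nodes must differ
        for (i in setdiff(members_of[[as.character(p1)]], ego)) {
          for (k in setdiff(members_of[[as.character(p2)]], ego)) {
            if (i == k) next                   # the endpoints must be distinct
            total <- total + 1                 # a 4-path i - p1 - ego - p2 - k
            shared <- intersect(events_of[[as.character(i)]], events_of[[as.character(k)]])
            if (length(setdiff(shared, c(p1, p2))) > 0) closed <- closed + 1
          }
        }
      }
    }
    closed / total                             # NaN if no 4-paths are centered on ego
  })
  data.frame(node = nodes, lc = lc)
}

# The sample network from the tnet example (weights are ignored here)
net <- cbind(
  i = c(1, 1, 2, 2, 2, 3, 3, 4, 5, 5, 6),
  p = c(1, 2, 1, 3, 4, 2, 3, 4, 3, 5, 5),
  w = c(3, 5, 6, 1, 2, 6, 2, 1, 3, 1, 2))
res <- tm_local_clustering(net[, 1:2])
print(res)
```

Nested loops keep the sketch close to the definition. They scale poorly on large networks, which is one reason to prefer the tnet function in practice.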

This coefficient has similar properties to the original local clustering coefficient. First, for each node, the coefficient varies between 0 and 1, as the numerator and denominator are non-negative counts and the numerator is a subset of the denominator. The coefficient is undefined when no 4-paths are centered on the node. Second, all 4-paths are closed in a fully connected network with at least three secondary nodes, and therefore the coefficient is equal to 1. Third, if ties are randomly placed in the network, the expected value of the local clustering coefficient is the same as that of the global coefficient, 1-(1-d^{2})^{(N_{p}-2)}, where d is the density of the network and N_{p} is the number of secondary nodes.
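The expected value can be evaluated directly. This sketch takes N_{p} to be the number of secondary nodes (the p column in tnet's edgelist format). It plugs in the Davis data: 18 women, 14 events and 89 ties, where 89 is the sum of the events-attended column in the table below. The helper name expected_tm_clustering is hypothetical.

```r
# Hypothetical helper: expected two-mode clustering coefficient when ties are
# placed at random, 1 - (1 - d^2)^(N_p - 2), with N_p secondary nodes.
expected_tm_clustering <- function(d, n_secondary) {
  1 - (1 - d^2)^(n_secondary - 2)
}

# Davis' Southern Women data: 18 women, 14 events and 89 ties
d <- 89 / (18 * 14)
e <- expected_tm_clustering(d, n_secondary = 14)
print(round(e, 4))   # 0.7978

# A fully connected network (d = 1) with at least three secondary nodes gives 1
print(expected_tm_clustering(1, n_secondary = 14))
```

Each candidate closing node is one of the N_{p} - 2 secondary nodes not on the path. It is tied to both endpoints with probability d^2, so a 4-path stays open with probability (1 - d^2)^(N_{p} - 2).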

Empirical test

To empirically test the proposed local clustering coefficient for two-mode networks, I have again used Davis' Southern Women dataset, as it has a limited number of nodes. The table below shows the local clustering coefficients obtained from the two-mode network and from the projected one-mode network, as well as the two-mode and one-mode degree scores (i.e., the number of events attended and the number of other women attending the same events, respectively).

Node       Events attended  Other women attending same events  One-mode LCC  Two-mode LCC
EVELYN     8                17                                 0.8971        0.7667
LAURA      7                15                                 0.9619        0.8422
THERESA    8                17                                 0.8971        0.7523
BRENDA     7                15                                 0.9619        0.8388
CHARLOTTE  4                11                                 1.0000        1.0000
FRANCES    4                15                                 0.9619        0.8690
ELEANOR    4                15                                 0.9619        0.7959
PEARL      3                16                                 0.9333        0.6463
RUTH       4                17                                 0.8971        0.6703
VERNE      4                17                                 0.8971        0.6741
MYRNA      4                16                                 0.9333        0.7139
KATHERINE  6                16                                 0.9333        0.7696
SYLVIA     7                17                                 0.8971        0.7462
NORA       8                17                                 0.8971        0.8380
HELEN      5                17                                 0.8971        0.8159
DOROTHY    2                16                                 0.9333        0.5407
OLIVIA     2                12                                 1.0000        0.5806
FLORA      2                12                                 1.0000        0.5806

The two-mode and one-mode degree scores and the traditional and proposed local clustering coefficients (LCC) of the women in Davis' (1941) Southern Women dataset. The randomly expected one-mode clustering coefficient is 0.9085, while the randomly expected two-mode coefficient is 0.7978.
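The one-mode baseline of 0.9085 can be recovered from the table. This assumes it is the density of the projected network, since the expected clustering coefficient of a random one-mode network equals its density. The sum of the "other women" column counts each projected tie twice, so the density is that sum divided by the 18 × 17 ordered pairs of women.

```r
# "Other women attending same events" column of the table, EVELYN to FLORA
other_women <- c(17, 15, 17, 15, 11, 15, 15, 16, 17, 17, 16, 16, 17, 17, 17, 16, 12, 12)
n <- length(other_women)                               # 18 women

# Each projected tie appears twice in the degree sum; divide by ordered pairs
projection_density <- sum(other_women) / (n * (n - 1))
print(round(projection_density, 4))                    # 0.9085
```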

There are a number of observations. First, for every node except Charlotte, whose two coefficients are both 1, the two-mode coefficient is smaller than the coefficient obtained on the projected network. This is not guaranteed: because multiple 4-paths can exist among the same three primary nodes, the two-mode coefficient could in principle be higher than the one obtained on the projected one-mode network. The gap does, however, indicate the bias that arises when three or more primary nodes are connected to a common secondary node. Second, the difference between the two coefficients is greater for the women who attended fewer events (the pairwise correlation between the number of events attended and the difference is -0.69, with a p-value of 0.001). This suggests that the bias is greatest for nodes that attend few events. This is not unexpected, as a woman who attended a single event with at least two others would automatically obtain a coefficient of 1 in the binary one-mode projection.
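The correlation reported above can be checked from the table with base R's cor.test, which computes a Pearson correlation by default. The inputs are the rounded four-decimal values from the table, so the estimate may differ slightly from the original computation in the third decimal.

```r
# Columns of the table above, in row order (EVELYN to FLORA)
events <- c(8, 7, 8, 7, 4, 4, 4, 3, 4, 4, 4, 6, 7, 8, 5, 2, 2, 2)
one_mode <- c(0.8971, 0.9619, 0.8971, 0.9619, 1, 0.9619, 0.9619, 0.9333, 0.8971,
              0.8971, 0.9333, 0.9333, 0.8971, 0.8971, 0.8971, 0.9333, 1, 1)
two_mode <- c(0.7667, 0.8422, 0.7523, 0.8388, 1, 0.8690, 0.7959, 0.6463, 0.6703,
              0.6741, 0.7139, 0.7696, 0.7462, 0.8380, 0.8159, 0.5407, 0.5806, 0.5806)

# Reduction from the one-mode to the two-mode coefficient for each woman
difference <- one_mode - two_mode

# Pearson correlation between events attended and the reduction
ct <- cor.test(events, difference)
print(round(unname(ct$estimate), 2))   # -0.69
print(ct$p.value)
```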

To further highlight some of the features of the redefined local clustering coefficient, Flora and the network around her up to three steps are shown below. In the one-mode projection, all possible ties among Flora’s contacts are present. This is because eleven of her twelve contacts attended event 9, and the twelfth contact, Helen, who did not attend event 9, is connected to all the others through other events. Nevertheless, the redefined clustering coefficient for Flora is less than 1. This is because events 9 and 11, which form the 4-paths centered on Flora, cannot also be used to close them: a 4-path is only closed by a secondary node that is not part of it. More specifically, 4-paths run from the women attending event 11 to the women attending event 9 (excluding a woman paired with herself). In total, there are 31 such 4-paths, of which 18 are closed by events 6, 7, 8 and 10, which gives Flora's two-mode coefficient of 0.5806 (18/31).


Flora’s local network up to three steps. Only non-redundant ties are shown between the second and third steps.
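Flora's 4-paths can be enumerated by hand in R. The event lists below are my transcription of the commonly distributed Davis matrix, covering only Flora and her twelve contacts, with events E1 to E14 numbered 1 to 14. Treat them as an assumption to check against your own copy of the data, for example tnet's Davis.Southern.women.2mode. Each unordered pair of Flora's events is visited once, so each 4-path is counted once, matching the count of 31 given above.

```r
# Events attended by Flora and her twelve contacts (assumed transcription)
events_of <- list(
  EVELYN    = c(1, 2, 3, 4, 5, 6, 8, 9),
  THERESA   = c(2, 3, 4, 5, 6, 7, 8, 9),
  PEARL     = c(6, 8, 9),
  RUTH      = c(5, 7, 8, 9),
  VERNE     = c(7, 8, 9, 12),
  MYRNA     = c(8, 9, 10, 12),
  KATHERINE = c(8, 9, 10, 12, 13, 14),
  SYLVIA    = c(7, 8, 9, 10, 12, 13, 14),
  NORA      = c(6, 7, 9, 10, 11, 12, 13, 14),
  HELEN     = c(7, 8, 10, 11, 12),
  DOROTHY   = c(8, 9),
  OLIVIA    = c(9, 11),
  FLORA     = c(9, 11)
)

# Women (among those listed) who attended a given event
attendees <- function(event) {
  names(events_of)[vapply(events_of, function(e) event %in% e, logical(1))]
}

ego <- "FLORA"
event_pairs <- combn(events_of[[ego]], 2)   # each unordered pair of Flora's events once
paths <- 0
closed <- 0
for (col in seq_len(ncol(event_pairs))) {
  p1 <- event_pairs[1, col]
  p2 <- event_pairs[2, col]
  for (i in setdiff(attendees(p1), ego)) {
    for (k in setdiff(attendees(p2), ego)) {
      if (i == k) next                       # endpoints of a 4-path must differ
      paths <- paths + 1                     # 4-path i - p1 - FLORA - p2 - k
      # Closed if i and k share an event other than p1 and p2
      others <- setdiff(intersect(events_of[[i]], events_of[[k]]), c(p1, p2))
      if (length(others) > 0) closed <- closed + 1
    }
  }
}
cat(paths, closed, round(closed / paths, 4), "\n")   # 31 18 0.5806
```

The eleven other attendees of event 9 and the three of event 11 give 33 pairs. Nora and Olivia attended both events, so the two self-pairs are dropped, which leaves 31. None of the 4-paths ending at Olivia can be closed, because she attended no event other than 9 and 11.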

References

Burt, R.S., 1992. Structural Holes. Harvard University Press, Cambridge, MA.
Davis, A., Gardner, B.B., Gardner, M.R., 1941. Deep South. University of Chicago Press, Chicago, IL.
Watts, D.J., Strogatz, S.H., 1998. Collective dynamics of 'small-world' networks. Nature 393, 440-442.

Want to try it with your data?

The redefined local clustering coefficient is implemented in tnet as clustering_tm_local. The code below analyses a sample network and Davis' (1941) Southern Women dataset.

# Load tnet
library(tnet)

# Load networks
net <- cbind(
 i=c(1,1,2,2,2,3,3,4,5,5,6),
 p=c(1,2,1,3,4,2,3,4,3,5,5),
 w=c(3,5,6,1,2,6,2,1,3,1,2))

# Obtain the binary local clustering coefficients of the nodes in the sample network
clustering_tm_local(net[,1:2])

# Obtain the weighted local clustering coefficients of the nodes in the sample network
clustering_tm_local(net)

# Obtain the binary local clustering coefficients of the women in Davis' (1941) Southern Women dataset
data("Davis.Southern.women")
clustering_tm_local(Davis.Southern.women.2mode)

The output from the binary and weighted analyses of the sample network is:

 node  lc
    1 1.0
    2 0.2
    3 0.5
    4 NaN
    5 0.0
    6 NaN

 node  lc     lc.am     lc.gm     lc.ma     lc.mi
    1 1.0 1.0000000 1.0000000 1.0000000 1.0000000
    2 0.2 0.2400000 0.2313222 0.2608696 0.2000000
    3 0.5 0.4666667 0.4317651 0.5000000 0.3333333
    4 NaN       NaN       NaN       NaN       NaN
    5 0.0 0.0000000 0.0000000 0.0000000 0.0000000
    6 NaN       NaN       NaN       NaN       NaN

I would like to acknowledge John Skvoretz and Filip Agneessens for helping to develop the idea behind this post.
Please cite or link to this post if you use it.

Entry filed under: Network thoughts.


Welcome

Tore Opsahl

My aim for this blog is to explore and throw out in the open some of the ideas about social network analysis that I have but have no time to implement. Many of my ideas stem from my interest in weighted networks and my belief that the weights are an enormous source of data. However, many social network measures require that the weights are discarded. In so doing, the richness of the data is considerably reduced. In turn, this limits the analysis.

Licensing

The information on this blog is published under the Creative Commons Attribution-Noncommercial 3.0 license.

This means that you are free to:
· share (copy, distribute and transmit)
· remix (adapt)
under the following conditions:
· attribution (you must cite this blog)
· noncommercial (you may not use it for commercial purposes).