Clustering in Two-mode Networks
A subject that has long received attention in both theoretical and empirical research is nodes’ tendency to cluster together. Evidence suggests that in most real-world networks, and especially in social networks, nodes cluster into densely connected groups (Holland and Leinhardt, 1970; Opsahl and Panzarasa, 2009). A number of measures have been developed for testing this tendency. Specifically, the global clustering coefficient assesses the overall level of clustering in a network (Luce and Perry, 1949), and the local clustering coefficient assesses the clustering in a single node’s immediate network (i.e., the node and its contacts; Watts and Strogatz, 1998). Formally, the global coefficient is the fraction of triplets or 2-paths (i.e., three nodes connected by two ties) that are closed by the presence of a tie between the first and the third node, and the local clustering coefficient is the fraction of 2-paths centered on the focal node that are closed by the presence of a tie between the first and the third node. For a review, see Opsahl and Panzarasa (2009) or the Clustering in Weighted Networks-page.
Applying the two clustering coefficients directly to a two-mode network is senseless as two-mode networks are bipartite and, thus, the contacts of a node cannot be connected to each other by construction and no triangles can exist (Borgatti and Everett, 1997). When one-mode measures cannot directly be applied to two-mode networks, there are two options: (1) transform two-mode networks into one-mode networks so that the one-mode measure can be used and (2) use the specially designed measures for two-mode networks.
Why not transform two-mode networks into one-mode networks when measuring clustering
A number of issues are present when measuring clustering in transformed two-mode network. Transforming a two-mode network to a one-mode network is often done using a method known as projection. This method operates by selecting the primary nodes and linking two nodes if they have at least one common secondary node. Projected two-mode network tends to have more and larger fully-connected cliques than prototypical one-mode networks (Wasserman and Faust, 1994). This has direct implications for network measures, especially those based on triangles including the structural holes measures (Burt, 1992, 2005) and the clustering coefficients (Opsahl and Panzarasa, 2009). To exemplify the cliques, and the many closed triplets, produced when projecting a two-mode network, this diagram shows the main component of the interpersonal network among Norwegian directors (Seierstad and Opsahl, 2011).
To further explore whether projected networks contain many more triangles than prototypical networks with a similar tendency for triadic closure, I randomized the two-mode structure of a scientific collaboration network (Newman, 2001) while maintaining the degree distributions (i.e., randomly assigning the ties in the two-mode network while keeping each author’s number of co-authored papers, and each paper’s number of authors; see Random Two-mode Networks) before projecting it onto a one-mode network and calculating the global clustering coefficient. By using this randomization procedure, the triangles formed due to the two-mode structure are maintained in the one-mode projection. Across 1,000 projected random two-mode networks, the average global clustering coefficient was 0.1236, which is over 350 times larger than the coefficient in corresponding one-mode classical random networks. As such, the default benchmark value of the clustering coefficient is not appropriate when projecting a two-mode network.The local clustering coefficient also suffers from similar issues if applied to projections of two-mode data. Specifically, it is inversely related to nodes’ two-mode degree (i.e., the number of ties in the two-mode network). If a node is connected to a single secondary node in the two-mode network with at least two other primary nodes, it will automatically have a clustering coefficient of 1 as the node’s contacts are connected by default. To highlight the relationship between the local clustering coefficient and nodes’ two-mode degree, this figure shows the local clustering coefficient for nodes in Newman’s (2001) scientific collaboration network where the ties have been randomly allocated in such a way that each node (i.e., authors and papers) maintain their number of ties. As can be seen, in random versions of a sparse two-mode network, the local clustering coefficient is roughly the inverse of nodes’ two-mode degree, and not simply the density of ties among their neighbours.
Clustering coefficients for two-mode networks: Reinforcement
To overcome this bias, a number of clustering coefficients for two-mode networks have been proposed in the literature (Latapy et al., 2008; Lind et al., 2005; Opsahl, 2012; Robins and Alexander, 2004; Zhang et al., 2008). The first set of clustering coefficients for two-mode networks are based on 4-cycles, which is the smallest possible cycle in two-mode networks. For example, Robins and Alexander (2004) defined a coefficient as the ratio between the number of 4-cycles and the number of 3-paths. This measure is illustrated in Panel A of the diagram to the right. The solid lines represent a 3-path and the count of these in a network would be the denominator. If the dashed line was present, the 3-path would be closed and part of a 4-cycle. The count of these would be the numerator. Formally, this coefficient is:
This measure departs from the concept of triadic closure as a 3-path would simply be, in the case of scientific collaboration networks, the number of papers that an author’s collaborators have co-authored, and a 4-cycle would be two authors collaborating twice (see Panel B of the diagram). Although this could be viewed as a form of clustering, it would not be triadic closure as it includes only two individuals. In fact, it could be considered a measure of reinforcement between two individuals rather than clustering of a group of individuals. From an evolving network perspective (e.g., in ERG models and SIENA; Peng et al., 2009; Snijders et al., 2011), it could be conceptualized as the adaptation or agreement that leads a node to connect to the same group as others.
Clustering coefficients for two-mode networks: Global coefficientThe fundamental purpose of the one-mode clustering coefficient was to detect closure among three nodes. Based on this concept, Opsahl (2012) proposed a new coefficient for two-mode networks that measures closure among three nodes from the primary node set instead of only two primary nodes (e.g., Robins and Alexander, 2004). Specifically, the denominator and numerator of the one-mode global clustering coefficient can be redefined in terms of 4-paths and closed 4-paths, respectively. This is due to the fact that all 4-paths in a two-mode network are 2-paths in a one-mode projection of the network; however, not all 2-paths in a one-mode projection are created from 4-paths. In fact, 2-paths can also be created due to multiple nodes being connected to the same node. The 2-paths created by the latter mechanism would be excluded when only considering 4-paths in the two-mode structure. This feature is illustrated in Panels A and B of the diagram to the right. In the first panel (A), there are five 4-paths, three of which are closed. These 4-paths represent five 2-paths in the one-mode projection (panel B). However, in the one-mode projection, there are an additional three 2-paths. These are created among node 2, node 3, and node 4 as these nodes are all connected to node C in the two-mode network. The clustering coefficient of the two-mode network (panel A) is 0.6, while the clustering coefficient of the one-mode projection (panel B) is 0.75. Formally, the global clustering coefficient for two-mode networks is:
where 4-paths that are closed by being part of at least one 6-cycle (i.e., a loop of six ties with five nodes).
The coefficient has a number of properties. First, it varies between 0 and 1 as the numerator and denominator are positive numbers, and the numerator is a subset of the denominator. Second, in a fully-connected network, it is equal to 1 as all 4-paths are closed. Third, in classical random two-mode networks (i.e., with a set number of nodes and density), the expected value is , where is the number of secondary nodes and d is the density of the two-mode network (Borgatti and Everett, 1997).
The one-mode global clustering coefficient has been extended to weighted networks by assigning a value to each triplet or 2-path (Opsahl and Panzarasa, 2009). This value is based on the weights of the two ties that compose the 2-path (e.g., the arithmetic mean, geometric mean, maximum, or minimum of the tie weights). The coefficient incorporates these values by being defined as the total value of closed 2-paths over the total value of all 2-paths. In addition to having the same parameters as the original coefficient, this coefficient produces the same outcome as the original one if all 2-paths have the same value (e.g., 1 in a binary network) or if the weights are randomly reshuffled in the network. In a similar spirit, the global two-mode coefficient can be generalized to weighted two-mode networks, such as those created from online forums where the weights are the number of messages or characters posted to a thread. Specifically, a 4-path-value could be defined instead of a 2-path value. This value should be constructed based on the four tie weights, and could be defined using the same four methods as the one-mode weighted clustering coefficient. For example, the 4-path from node 1 to node 4 (via nodes A, 2, and C) in Panel C of the diagram above would be assigned a value of 3.25, 2.71, 6, or 1 if the arithmetic mean, geometric mean, maximum, or minimum was used, respectively. The global coefficient based on the four methods would be 0.6494, 0.6806, 0.6, and 0.7778, respectively. Given that all 4-paths contain a tie weight of 6 and the topology of the network is identical to the binary two-mode network in Panel A, the coefficient attained with the maximum method is equal to the binary coefficient. The increases in the coefficients, when other methods for defining 4-path values are used, are a reflection of the fact that the closed 4-paths have relatively stronger ties than the open 4-paths. The various explanations given in Opsahl and Panzarasa (2009) for the differences between the methods for defining 2-paths values are also applicable in the case of 4-path values.
Clustering coefficients for two-mode networks: Local coefficient
As noted above, the local global clustering coefficient suffers from a number of issues if applied to projections of two-mode data. The local clustering coefficient can be redefined in a similar vein as the global clustering coefficient for two-mode networks. While the one-mode local coefficient was based on 2-paths centered on a focal node, this could be extended to 4-paths centered on a focal node in two-mode networks. This would imply that the first and last nodes of the path are of the same node set as the focal node. For example, node 3 in the network shown in Panel A of the diagram above is in the center of two 4-paths, where node 1 can be seen as the first node, and nodes 2 and 4 as the last ones. A 4-path does not exist from node 2 to node 4 (via node C, node 3, and node C) as node C is part of it twice. If such a path would be included in the measure, it would be closed by definition. Formally, the local clustering coefficient for two-mode networks is:
This coefficient has similar properties as the global coefficient. First, for each node, the coefficient varies between 0 and 1 as the numerator and denominator are positive numbers, and the numerator is a subset of the denominator. Second, all 4-paths are closed in a fully connected network, and therefore, coefficient is equal to 1. Third, if ties are randomly placed in the network, the expected value of the local clustering coefficient is . By using the generalization of the local clustering coefficient for weighted one-mode networks (Barrat et al., 2004), the two-mode local clustering coefficient can be generalized for weighted two-mode networks. By using the same 4-path values as the global coefficient, it is possible to differentiate them. In turn, it is possible to define a local clustering coefficient for weighted two-mode networks that share the same properties as the binary one, and is roughly equal to the binary one if weights are randomly assigned to ties.
Want to try with your data
The global and local clustering coefficients for two-mode networks can be calculated using tnet, and the reinforcement coefficient can be calculated using the R-code below. First, you need to download and install tnet in R. Then, you need to create an edgelist of your network (see the data structures in tnet for two-mode networks). The commands below show how to load Davis’ (1940) Southern Women data and how to calculate the three coefficients.
# Load tnet library(tnet) # Load Davis' (1940) Southern Women Dataset data(tnet) net <- Davis.Southern.women.2mode net <- as.tnet(net, type="binary two-mode tnet") # Calculate the reinforcement coefficient (Robins and Alexander, 2004) reinforcement_tm(net) # Calculate the global coefficient (Opsahl, 2012) clustering_tm(net) # Calculate the local coefficient (Opsahl, 2012) clustering_local_tm(net)
Code used on this page
The code below can be used to reproduce the first diagram showing the Norwegian Interlocking directorate.
# Load tnet library(tnet) # Download Norwegian interlocking directorate network net <- read.table("http://boardsandgender.com/data/net2m/net2m_2009-08-01.txt") net <- as.tnet(cbind(net[,2], net[,1]), type="binary two-mode tnet") # Download gender information for colouring net.gender <- read.table("http://boardsandgender.com/data/data_people.txt", header=TRUE) net.gender <- as.character(net.gender[,"gender"]) net.gender[net.gender=="1"] <- "#000000" net.gender[net.gender=="2"] <- "#FFFFFF" # Project onto a one-mode network net.projected <- projecting_tm(net, method="binary") # Load as igraph object net.igraph <- tnet_igraph(net.projected) # Set gender as a vertex attribute net.igraph <- set.vertex.attribute(net.igraph, "color", value=net.gender[1:vcount(net.igraph)]) # Extract main component net.components <- clusters(net.igraph) net.main.component <- induced.subgraph(net.igraph, which(net.components$membership==which.max(net.components$csize))) # Plot network plot(net.main.component, layout=layout.fruchterman.reingold.grid, vertex.size=3, vertex.label=NA)
Barrat, A., Barthelemy, M., Pastor-Satorras, R., Vespignani, A., 2004. The architecture of complex weighted networks. Proceedings of the National Academy of Sciences 101 (11), 3747-3752.
Borgatti, S. P., Everett, M. G., 1997. Network analysis of 2-mode data. Social networks 19, 243-269.
Burt, R.S., 1992. Structural holes. Harvard University Press, Cambridge, MA.
Burt, R.S., 2005. Brokerage and closure: An introduction to social capital. Oxford University Press, Oxford, UK.
Holland, P. W., Leinhardt, S., 1970. A method for detecting structure in sociometric data. American Journal of Sociology 76, 492-513.
Latapy, M., Magnien, C., Del Vecchio, N., 2008. Basic notions for the analysis of large two-mode networks. Social Networks 30(1), 31-48.
Lind, P.G., González, M.C., Herrmann, H.J., 2005. Cycles and clustering in bipartite networks. Physical Review E 72, 056127.
Luce, R. D., Perry, A. D., 1949. A method of matrix analysis of group structure. Psychometrika 14 (1), 95-116.
Newman, M. E. J., 2001. Scientific collaboration networks. II. Shortest paths, weighted networks, and centrality. Physical Review E 64, 016132.
Opsahl, T., 2013. Triadic closure in two-mode networks: Redefining the global and local clustering coefficients. Social Networks 35, doi:10.1016/j.socnet.2011.07.001.
Opsahl, T., Panzarasa, P., 2009. Clustering in weighted networks. Social networks 31, 155-163.
Peng, W., Sharpe, K., Robins, G.L., Pattison, P.E., 2009. Exponential random graph (p*) models for affiliation networks. Social Networks 31, 12–25.
Robins, G., Alexander, M., 2004. Small worlds among interlocking Directors: Network structure and distance in bipartite graphs. Computational and Mathematical Organization Theory 10 (1), 69–94.
Seierstad, C., Opsahl, T., 2011. For the few, not the many? The effects of affirmative action on presence, prominence, and social capital of women directors in Norway. Scandinavian Journal of Management 27 (1), 44-54.
Snijders, T. A. B., 2001. The statistical evaluation of social network dynamics. Sociological Methodology 31, 361-395.
Watts, D. J., Strogatz, S. H., 1998. Collective dynamics of “small-world” networks. Nature 393, 440-442.
Wasserman, S., Faust, K., 1994. Social Network Analysis: Methods and Applications. Cambridge University Press, New York, NY.
Zhang, P., Wang, J., Li, X., Li, M., Di, Z., Fan, Y., 2008. Clustering coefficient and community structure of bipartite networks. Physica A 387, 6869–6875.