Clustering in two-mode networks
September 11, 2009
Introduction
Networks are representations of systems in which the elements (or nodes) are connected by ties (Wasserman and Faust, 1994). Most networks are defined as one-mode networks with one set of nodes that are similar to each other. However, many network dataset are in fact two-mode networks (also known as affiliation or bipartite networks). These networks are a particular kind of networks with two different sets of nodes (e.g., employees and projects), and ties are only established between nodes belonging to different sets (e.g., employees are linked to a project if they have worked on it). In most two-mode networks, the nodes of one node set are considered more responsible for tie creation (primary node set) than the other (secondary node set). For example, employees decide whether or not to collaborate on a project.
One of the first two-mode datasets to be analysed was the Davis Southern Club Women dataset (Davis et al., 1941), which recorded the attendance of a group of women (node set 1) to a series of events (set 2). A woman would be linked to an event if she attended it. Another category of two-mode networks that has become popular in recent years is scientific collaboration networks (e.g., Newman, 2001). In these networks, the two sets of nodes are scientists (node set 1) and papers (node set 2), and a tie is established if a scientist have authored a paper.
Two-mode networks are rarely analysed without transforming them. A likely cause of this is that most network measures are defined for one-mode network, and few of them have been redefined for two-mode networks. Therefore, researchers have constructed one-mode networks from the two-mode networks. This is often done using a method referred to as projection. This method operates by selecting one of the two node sets and linking two nodes from that set if they were connected to a common node of the other set. For example, interlocking directorates can be studied in two ways from a network perspective (Mizruchi, 1996; Zajaz, 1988). In this kind of networks, the two node sets are directors and corporate boards, and ties represent the affiliation of directors with boards. On the one hand, it is possible to study the interdependence among companies. Levine (1979) studied how companies were connected by having the same individual on both their boards of directors. By being connected, communication, knowledge transfer, and coordination are facilitated between companies (Zajaz, 1988). On the other, it is possible to study the interpersonal relationships among individuals that are directors on a set of boards of directors. In this case, individuals can be linked by being members of the same boards (Opsahl and Seierstad, 2009).
Projection of two-mode networks creates a number of modelling issues. First, each tie in a regular one-mode network is created separately; however, this is not the case in projected two-mode networks. For example, while a standard phone call creates a communication tie from one person to another, a director forms ties with all the other directors on a board when she or he joins that board. This has direct implications for models that utilize random networks to create benchmark values (e.g., Opsahl et al., 2008) and when comparing values found in an observed network with those found in corresponding random networks. Ties in random networks are assumed to be independent of each other. (Footnote 1) Although this is neither the case in observed one-mode nor two-mode networks, in projected two-mode networks, ties are not even formed separately. Thus, they are less comparable to the observed network than to genuine one-mode networks. Second, depending on the degree distribution of the non-selected node set, a projected one-mode network tends to have more and larger fully-connected cliques than non-projected one-mode networks (Wasserman and Faust, 1994). These are produced when a group of nodes are connected to the same other node in the two-mode network (e.g., all the directors on a single board are all connected). This feature impacts a number of network measures, and in particular, the clustering coefficient (for a review, see Opsahl and Panzarasa, 2009). This measure is the fraction of 2-paths (i.e., three nodes connected by two ties) that are part of a triangle. Given that the cliques contain a number of triangles, this measures might be biased. To exemplify the cliques, and the many triangles, produced when projecting a two-mode network, Figure 1 shows the main component of the interpersonal network among Norwegian directors (Opsahl and Seierstad, 2009).

Figure 1: The network structure among directors (circles) who form part of the largest group of interconnected directors. Two directors are connected if they are members of the same board. The solid circles refer to women, whereas the hollow circles refer to men.
The rest of the paper is organised as follows. First, I will describe the binary clustering coefficient and its properties. Then, I will propose a new measure that detects triadic closure in two-mode networks while omitting triangles formed by three nodes solely being connected to the same other nodes (e.g., three scientists writing a paper). I will then suggest a method for integrating the generalization by Opsahl and Panzarasa (2009) of the clustering coefficient to weighted one-mode networks by offering a generalisation of the proposed coefficient to weighted two-mode networks. This will be followed by applications of the proposed coefficients to four two-mode networks from the domains of event attendance, scientific collaboration, interlocking directorates, and online communication. Finally, I will offer some concluding remarks.
Clustering coefficient for one-mode networks
A measure that has long received much attention in both theoretical and empirical research is concerned with the degree to which nodes tend to cluster together. Evidence suggests that in most real-world networks, and especially social networks, nodes tend to cluster into densely connected groups (Holland and Leinhardt, 1970; Opsahl and Panzarasa, 2009; Watts and Strogatz, 1998). Two main measures have been developed for testing this tendency: the local clustering coefficient (Watts and Strogatz, 1998) and the global clustering coefficient (Luce and Perry, 1949; Opsahl and Panzarasa, 2009). The global coefficient assesses the overall level of clustering in the network, whereas the local coefficient measures the density of ties among a node’s contacts. This paper is concerned with the former of these two measures. This coefficient is the fraction of triplets or 2-paths that are closed by the presence of a tie between the first and the third node. It is formally defined as:
where is the number of 2-paths, and
is the number of these 2-paths that are closed by being part of a triangle. This coefficient varies between 0 and 1. It is equal to 0 if no triangles exist in the network, and equal to 1 if all 2-paths are closed. In a completely connected network, the coefficient would be 1 as all 2-paths would be closed. Moreover, in a classical random network, the expected value of the clustering coefficient is equal to the probability of a tie being formed (i.e., the density) as ties are independent of each other (Erdos and Rényi, 1959).
Clustering coefficient for two-mode networks
In two-mode networks, the denominator and numerator of the clustering coefficient can be redefined in terms of 4-paths and closed 4-paths, respectively. (Footnote 2) First, a 4-path is a path that starts at a node, goes via two nodes of the other node set and one node of the same node set and, and ends up at a node of the same node set. For example, in Figure 2a, a 4-path exists between node 1 and node 3 via node A, node 2, and node C. Second, a closed 4-path is a 4-path in which the two end nodes are connected to a common node (excluding the nodes that are already part of the path). For example, the mentioned 4-path between node 1 and node 3 is closed as these nodes are both connected to node B, which is not part of the path.

Figure 2: (a) A two-mode network where the shape and colour of nodes represent the set to which a node belongs, and (b) the one-mode projection of the two-mode network in panel a. The round blue nodes are the primary nodes in this projection. The clustering coefficient of the two-mode network (panel a) is 0.3, while the clustering coefficient of the one-mode projection (panel b) is 0.4615.
All 4-paths in a two-mode network are 2-paths in a one-mode projection of the network; however, not all 2-paths in a one-mode projection are created from 4-paths. In fact, 2-paths can also be created due to multiple nodes being connected to the same node. The 2-paths created by the latter mechanism would be excluded when only considering 4-paths in the two-mode structure. This feature is illustrated in Figure 2. In the first panel (a), there are ten 4-paths, three of which are closed. (Footnote 3) These 4-paths represent ten 2-paths in the one-mode projection (panel b). (Footnote 4) However, in the one-mode projection (panel b), there are an additional three 2-paths. These are created among node 2, node 3, and node 5 as these nodes are all connected to node C in the two-mode network.
Formally, the proposed coefficient can be defined as:
where is the number of 4-paths, and
is the number of these 4-paths that are closed by being part of a 6-cycle (i.e., a loop of six ties with five nodes).
The proposed coefficient has a number of properties. First, it varies between 0 and 1 as the numerator and denominator are positive numbers, and the numerator is a subset of the denominator. Second, in a fully connected network, the coefficient is equal to 1 as all 4-paths are closed. Third, in an ensemble of random two-mode networks with a set number of nodes and ties, the average clustering coefficient is approximately , where
is the density (i.e.,
for two-mode networks),
is the number of primary nodes, and
is the number of secondary nodes). (Footnotes 5 and 6)
Weighted networks
The one-mode clustering coefficient has been extended to weighted networks by assigning a value to each triplet or 2-path, (Opsahl and Panzarasa, 2009). This value is based on the weights of the two ties that compose the 2-path, e.g., the arithmetic mean, geometric mean, minimum, or maximum of the tie weights. The coefficient incorporates these values by being defined as the total value of closed 2-paths over the total value of all 2-paths. In addition to having the same parameters as the original coefficient, this coefficient produces the same outcome as the original one if all 2-paths have the same value (e.g., a binary network) or if the weights are randomly reshuffled in the network.
The coefficient proposed in this paper can be generalized to weighted two-mode networks, such as those created from online forums where the weights is the number of messages or characters posted to a topic. In a similar spirit as the generalization to weighted networks of the one-mode clustering coefficient, a 4-path-value could be defined. This valued should be constructed based on the four tie weights, and could be defined as the arithmetic mean, geometric mean, minimum, or maximum. For example, the 4-path from node 1 to node 4 (via nodes A, 2, and D) in Figure 3 could be assigned a value of 3, 2.45, 1, or 6, respectively. The clustering coefficients based on the four methods for the sample network shown in Figure 3 would be 0.4, 0.398, 0.385, and 0.383. The topology of the networks is identical to the binary two-mode network in Figure 2, which had a clustering coefficient of 0.3. The increase in the coefficient when tie weights are considered is a reflection of the fact that the closed 4-paths have relatively stronger ties than the open 4-paths. The various explanations given in Opsahl and Panzarasa (2009) for the differences between the methods for defining 2-paths values are also applicable in the case of 4-path values. Formally, the coefficient for weighted two-mode networks could be defined as:

Figure 3: A weighted two-mode network.
The generalized coefficient has a number of properties. In addition to having the same properties as the binary two-mode clustering coefficient, the generalized coefficient is equal to the binary one when all ties have the same value (i.e., binary). Moreover, the outcome of the generalised coefficient is approximately equal to the binary one if tie weights are randomly reshuffled in the network.
Empirical test
To illustrate the proposed coefficient, I will apply it to Davis Southern Club Women dataset (Davis et al., 1941), Newman’s (2001) scientific collaboration network, the interpersonal network among directors on Norwegian public limited company boards (Opsahl and Seierstad, 2009), and data collected from an online forum. First, Davis Southern Club Women was collected in the 1930s, and contains the attendance of 18 women to 14 events. This network is relatively dense as 91 percent of the possible ties are present in the one-mode projection, and a slightly higher clustering coefficient is found (0.9284). Thus, one would assume that a triadic closure effect drives tie formation. However, the observed two-mode clustering coefficient (0.77) is less than the randomly expected value (0.80). This suggests that a triadic closure effect is not at play, in fact, it suggests an opposite reverse effect.
Second, Newman’s (2001) collaboration network is based upon 22,016 co-authored papers published on the arXiv e-repository website between 1995 and 1999. In total, these papers have 16,726 authors listed. The one-mode projection of this network using the authors as the primary node set has often been used as an empirical example for a variety of network measures (e.g., Opsahl et al., 2008). In the projected one-mode network, a clustering coefficient of 0.3596 is obtained, while the randomly expected value is 0.0003. Due to the great difference between the observed coefficient and the expected one, it has been argued that there is a strong tendency towards clustering (Newman, 2001). However, this number includes many of the triangles that are form by construction as the average number of authors per paper is 2.66. (Footnote 7) The proposed clustering coefficient for two-mode networks can be applied in an effort to exclude these triangles. This coefficient is 0.2769, while the randomly expected value is 0.0006. Albeit a weaker effect than the one found in the projected one-mode network, this suggests that there is an actual triadic closure effect in this network.
Third, the interlocking directorate dataset is composed of the 367 Norwegian public limited company boards as of August 1, 2009 (Opsahl and Seierstad, 2009). Based on this dataset, it is possible to construct an interpersonal network among the 1,413 directors, and link two individuals if they sit on the same board. In this one-mode projection, the observed and randomly expected clustering coefficients are 0.6477 and 0.0035, respectively, giving a ratio between them of 185. However, in the two-mode network, the observed and randomly expected clustering coefficients are 0.0119 and 0.0041, respectively. As the ration between these two coefficients is only 2.93, a much weaker triadic closure effect in the network is suggested than assumed from the projected one-mode network.
Fourth, the online forum data was collected from an online community of students at University of California, Irvine, in 2004. This community consisted of 2,995 students who could create any group that they wanted, and post messages to any group that they were a member of. In total, 889 students posted 33,720 messages to 552 groups. On average, each student posted 4.76 messages or 480.19 characters to each group they actively participated in. A two-mode network can be constructed from this dataset by linking a student to a group if she or he posted to it. Unlike the other datasets, it is possible to create both binary and weighted to-mode networks from this one. As weights can be assigned to each tie based on either the number of messages or characters posted, I have created a binary and two weighted two-mode networks from this dataset. The various clustering coefficients obtained from these networks are listed in Table 1. (Footnote 8)
| Network | 2M |
1M |
1M CC | Random 1M CC | 2M CC | Random 2M CC | ||
|---|---|---|---|---|---|---|---|---|
| Davis Southern Women Club | 18 | 14 | 89 | 139 | 0.9284 | 0.9085 | 0.7719 | 0.7978 |
| Scientific collaboration network | 16,726 | 22,016 | 58,595 | 47,594 | 0.3596 | 0.0003 | 0.2769 | 0.0006 |
| Interpersonal network among directors | 1,413 | 367 | 1,734 | 3,414 | 0.6477 | 0.0035 | 0.0119 | 0.0041 |
| Online forum (binary) | 899 | 522 | 7,089 | 16,860 | 0.5049 | 0.1768 | X | 0.1119 |
| Online forum (weighted, messages) | … | … | … | … | … | … | X | 0.1119 |
| Online forum (weighted, characters) | … | … | … | … | … | … | X | 0.1119 |
Table 1: The various clustering coefficients (CC) obtained on the four empirical datasets. 2M refers to two-mode networks, and 1M to one-mode ones. The one-mode clustering coefficients are obtained on binary projections.
Concluding remarks
Two-mode networks are rarely analysed without projecting them onto one-mode networks as there are few methods for this purpose. However, by projecting two-mode networks, certain assumptions in the one-mode methods might be violated, such as the ability of each tie to be formed separately. Moreover, measures based on triangles or ties among nodes’ contacts might be biased. This is due to the fact that projected two-mode networks generally contain more and larger fully-connected cliques than regular one-mode networks. In particular, depending on the degree distribution of the non-projected nodes, the measure that assesses the overall level of clustering in a one-mode network, the clustering coefficient, could be biased. More specifically, if the non-projected nodes have a degree greater than 2, triangles will be automatically formed in the one-mode projection, which will increase the coefficient. Thus, there is a need to redefine one-mode measures for two-mode networks, and in particular, the ones that are directly affected by the increase in triangles.
This paper proposed such a redefinition for the clustering coefficient. It subtracts the level of triadic closure caused by groups of nodes that are connected to common nodes in the two-mode structure. More specifically, it excludes triplets that are formed due to this effect, and only counts the number of triplets that are formed based on independent dyadic interactions.
The proposed coefficient is not without its limitations. A major one is that one of the node sets must be considered to be responsible for tie generation, and designated as the primary node set. Nodes from this set would be first and last nodes of the 4-paths. Generally, the selection of the primary node set depends on the perspective of analysis. If the network is projected onto a one-mode network, the chosen node set is generally thought of as the primary one. In certain empirical cases, the selection of these nodes is obvious. For example, in scientific collaboration networks where researchers come together and co-author papers, the researchers are often assumed to be the primary nodes (Newman, 2001). However, in the case of interlocking directorates, it is not obvious whether directors or boards are the primary node set (Levine, 1979; Opsahl and Seierstad, 2009). This is due to the fact that tie formation is a mutual process where the directors must (1) be invited to join the board, and (2) accept the invitation.
The proposed coefficient represents only a first step in the process of redefining one-mode network measures for two-mode networks. Although the clustering coefficient is particularly affected by the projection procedure, there are many other measures that are also affected, such as the local clustering coefficient (Watts and Strogatz, 1998) and the array of structural holes measures (Burt, 1992). Redefining these measures for two-mode networks could increase their accuracy, and might give rise to novel insights.
Footnotes
1: In random networks where the present of each tie is given by a sought after density (i.e., where
is the number of ties, and
is the number of nodes), ties are independent of each other. However, ties are dependent upon each other in random networks with a set number of nodes and ties. For example, in an undirected random network, the likelihood of the first tie to be present is equal to the density. However, the likelihood of the second tie being present is dependent upon whether the first tie is present. Thus, it is equal to
where
is equal to 1 if the first was created and 0 otherwise. In fact, the presence of the last possible tie in a random network is given by the presence or lack thereof of the other ties in the network. Nevertheless, in large and sparse random networks, this dependence plays a negligible role as the number of actual ties is marginal relative to the number of possible ties.
2: I have knowingly chosen not to define the measure as the number of 4-cycles over the number of 3-paths. This is due to the fact that a 3-path would simply be, in the case of collaboration networks, the number of projects that a person’s collaborators are affiliated with. A 4-cycle would be, in that case, two individuals collaborating twice. Although this could be viewed as a form of clustering, it would not be triadic closure. In fact, it is a measure of reinforcement between two individuals rather than clustering of a group of individuals.
3: The 4-paths are 1-A-2-C-3 (closed); 1-A-2-C-5; 1-A-2-D-4; 1-B-3-C-2 (closed); 1-B-3-C-5; 2-A-1-B-3 (closed); 2-C-5-E-6; 3-C-2-D-4; 3-C-5-E-6; 4-D-2-C-5.
4: These 2-paths are: 1-2-3 (closed); 1-2-4; 1-2-5; 1-3-2 (closed); 1-3-5; 2-1-3 (closed); 2-5-6; 3-2-4; 3-5-6; 4-2-5.
5: The components of this express are the following. The density of the two-mode network, , is the likelihood that a tie is present. The square of
is the likelihood that (1) a tie is present from the first node on a 4-path to a node, and (2) a tie is present from that node to the last node on the 4-path. The inner brackets,
, is the likelihood that these two ties are not present. The power of the inner bracket,
, ensures that
, is the likelihood that no node connect the first and last node on the 4-path. As the two nodes that are on the 4-path cannot close it, 2 is subtracted from
. By subtracting the inner bracket and its power from 1, the outcome is the likelihood that a 4-path is closed. This is the randomly expected value of Eq. 2.
6: Simulations conducted on ensembles of random networks with different number of nodes and ties produced outcomes that were not statistically different from the outcome attained with the above expression. Each ensemble contained 1,000 random networks.
7: To highlight the intrinsic level of clustering in this network, I randomised the two-mode structure while keeping the degree distributions (i.e., randomly assign the ties in the two-mode network while keeping each author’s number of co-authored papers, and each paper’s number of authors) before projecting it onto a one-mode network and calculating the clustering coefficient (Eq. 1). The average clustering coefficient across 100 random networks was 0.1235764, which is over 350 times larger than the coefficient found in a random one-mode network with the same number of nodes and ties.
8: Unless otherwise stated, the weighted clustering coefficients are based on the arithmetic mean method for defining the 2-path and 4-path values.
References
Burt, R.S., 1992. Structural Holes. Harvard University Press, Cambridge, MA.
Davis, A., Gardner, B. B., Gardner, M. R., 1941. Deep South. University of Chicago Press, Chicago, IL.
Erdo s, P., Rényi, A., 1959. On random graphs. Publicationes Mathematicae 6, 290-297.
Holland, P.W., Leinhardt, S., 1970. A method for detecting structure in sociometric data. American Journal of Sociology 76, 492-513.
Levine, J., 1979. Joint-space analysis of “pick-any” data: Analysis of choices from an unconstrained set of alternatives. Psychometrika 44(1), 85-92.
Luce, R.D., Perry, A.D., 1949. A method of matrix analysis of group structure. Psychometrika 14 (1), 95-116.
Mizruchi, M. S., 1996. What Do Interlocks Do? An Analysis, Critique, and Assessment of Research on Interlocking Directorates. Annual Review of Sociology 22, 271-298.
Newman, M. E. J., 2001. Scientific collaboration networks. II. Shortest paths, weighted networks, and centrality. Physical Review E 64, 016132.
Opsahl, T., Colizza, V., Panzarasa, P., Ramasco, J.J., 2008. Prominence and control: The weighted rich-club effect. Physical Review Letters 101, 168702.
Opsahl, T., Panzarasa, P., 2009. Clustering in weighted networks. Social networks 31, 155-163.
Opsahl, T., Seierstad, C., 2009. For the few, not the many: The effects of affirmative action on presence, prominence, and social capital of female directors in Norway. Unpublished manuscript available on http://toreopsahl.com/publications/
Wasserman, S., Faust, K., 1994. Social Network Analysis: Methods and Applications. Cambridge University Press, New York, NY.
Watts, D.J., Strogatz, S.H., 1998. Collective dynamics of small-world networks. Nature 393, 440-442.
Zajac, E., 1988. Interlocking directorates as an interorganizational strategy: A test of critical assumptions. Academy of Management Journal 31(2), 428-438.
What to try it with your data
# Load tnet library(tnet) # This is the code for the binary two-mode network above net <- cbind( i=c(1,1,2,2,2,3,3,4,5,5,6), p=c(1,2,1,3,4,2,3,4,3,5,5)) # Calculate measure clustering_tm(net) # This is the code for the weighted two-mode network above net <- cbind( i=c(1,1,2,2,2,3,3,4,5,5,6), p=c(1,2,1,3,4,2,3,4,3,5,5), w=c(3,5,6,1,2,6,2,1,3,1,2)) # Calculate measure clustering_tm(net)
Entry Filed under: Network thoughts. Tags: actors, affiliation networks, arcs, bipartite networks, clustering coefficient, community, complex networks, degree, edges, embeddedness, global, graphs, Links, local, network, nodes, reinforcement, social network analysis, strength of ties, ties, two-mode networks, undirected networks, valued networks, vertices.

Trackback this post | Subscribe to the comments via RSS Feed