## Thesis: 2 Clustering in Weighted Networks

An article based on this chapter has been published (Opsahl and Panzarasa, 2009). This article was written after this chapter and contains a number of changes.

This chapter is methodological in nature: it builds on, and extends, a fundamental measure of network structure, the clustering coefficient, which has long received attention in both theoretical and empirical research. This measure assesses the degree to which nodes tend to cluster together. Evidence suggests that in most real-world networks, and in social networks in particular, nodes tend to form tightly knit groups characterised by a relatively high density of ties (Feld, 1981; Heider, 1946; Holland and Leinhardt, 1970; Freeman, 1992; Friedkin, 1984; Louch, 2000; Snijders, 2001; Snijders et al., 2006; Watts and Strogatz, 1998). More generally, one can ask: given three nodes in a network, i, j, and k, where i is tied to both j and k, how likely is it that j and k are also tied? In real-world networks, this likelihood tends to be greater than the probability of a tie between two randomly chosen nodes (Holland and Leinhardt, 1971; Wasserman and Faust, 1994).

For social networks, scholars have investigated the mechanisms responsible for the increased probability that two people will be tied if they share an acquaintance (Holland and Leinhardt, 1971; Snijders, 2001; Snijders et al., 2006). The nature of these mechanisms can be social, as in the case of third-party referral (Davis, 1970; Heider, 1946), or non-social, as in the case of focus constraints (Feld, 1981). On the one hand, an individual may reduce cognitive stress by introducing his or her acquaintances to each other (Heider, 1946). Moreover, indirect ties foster trust, enhance a sense of belonging, facilitate the enforcement of social norms, and enable the creation of a common culture (Coleman, 1988). Burt (2005) found that the reputation of a person is maintained only if his or her contacts can communicate or gossip. This also applies to inter-organisational networks, where organisations in tightly knit groups create informal governance arrangements (Uzzi and Lancaster, 2004). On the other hand, focus constraints refer to the increased likelihood of interaction and clustering among nodes that share the same physical, institutional, organisational or social environment. For example, people who share the same office are more likely to create independent dyadic ties, leading to a heightened tendency towards clustering, than people who reside in distant geographical locations (Feld, 1981).

Traditionally, the tendency of nodes to cluster together is measured using either the global clustering coefficient (e.g. Feld, 1981; Karlberg, 1997, 1999; Louch, 2000; Newman, 2003) or the local clustering coefficient (Watts and Strogatz, 1998). This chapter deals with the former. Nevertheless, the local clustering coefficient is briefly introduced below in order to highlight the differences between the two measures.

The local clustering coefficient is based on ego network density or local density (Scott, 2000; Uzzi and Spiro, 2005). For a node i, this is the number of ties present between node i's neighbours divided by the total number of possible ties between them. For undirected networks, the local clustering coefficient is formally defined as (equation 1):

$C_i = \frac{\text{number of ties between node }i\text{'s neighbours}}{\text{number of node }i\text{'s neighbours } \times \text{ (number of node }i\text{'s neighbours } - 1) / 2}$
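As a minimal sketch of equation 1, the following Python function computes the local clustering coefficient for each node of an undirected, binary network. The function name `local_clustering`, the adjacency representation (a dictionary of neighbour sets) and the toy network are illustrative assumptions, not part of the original analysis. Nodes with fewer than two neighbours are given `None`, since the denominator of equation 1 is zero for them; this convention is also an assumption.

```python
from itertools import combinations


def local_clustering(adjacency):
    """Local clustering coefficient (equation 1) for every node.

    `adjacency` maps each node to the set of its neighbours in an
    undirected, binary network (hypothetical representation).
    """
    coefficients = {}
    for node, neighbours in adjacency.items():
        k = len(neighbours)
        if k < 2:
            # No pair of neighbours exists, so the ratio is undefined.
            coefficients[node] = None
            continue
        possible = k * (k - 1) / 2
        # Count the ties actually present between pairs of neighbours.
        present = sum(
            1 for u, v in combinations(neighbours, 2) if v in adjacency[u]
        )
        coefficients[node] = present / possible
    return coefficients


# Hypothetical toy network: a triangle a-b-c plus a pendant node d tied to a.
toy = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
scores = local_clustering(toy)
print(scores)
```

In this toy network, node a has three neighbours with only one tie among them (between b and c), so its coefficient is 1/3, whereas nodes b and c each have two neighbours that are tied to each other, giving a coefficient of 1.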

To obtain an overall coefficient for a network, the fractions for all the nodes in the network are averaged. The main advantage of this measure is that a score is assigned to each node. This enables researchers to study correlations with other nodal properties (e.g. Panzarasa et al., 2009) and to perform regression analyses in which the observations are the nodes of a network (e.g. Uzzi and Lancaster, 2004). However, this coefficient suffers from two major limitations. First, it does not take into consideration the weights of the ties in the network. As a result, the same value of the coefficient might be attributed to networks that share the same topology but differ in how weights are distributed across ties and, therefore, may be characterised by different likelihoods of friends being friends with each other. Second, the local clustering coefficient does not take into consideration the directionality of the ties connecting a node to its neighbours. A neighbour of node i might be: 1) a node that has directed a tie towards node i, 2) a node towards which node i has directed a tie, or 3) a node that has directed a tie towards node i and towards which node i has also directed a tie. Barrat et al. (2004) proposed a generalisation of the coefficient that takes the weights of the ties into consideration. However, the issue of directionality remains unsolved (Caldarelli, 2007).
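For reference, the weighted generalisation of Barrat et al. (2004) is commonly written as follows. The notation is not defined in this chapter and is introduced here only for illustration: $a_{ij}$ is 1 if nodes i and j are tied and 0 otherwise, $w_{ij}$ is the weight of that tie, $k_i$ is the number of node i's neighbours, and $s_i = \sum_j a_{ij} w_{ij}$ is the sum of the weights of node i's ties.

$C_i^{w} = \frac{1}{s_i (k_i - 1)} \sum_{j,h} \frac{w_{ij} + w_{ih}}{2} \, a_{ij} a_{ih} a_{jh}$

The sum runs over ordered pairs of nodes j and h, so each pair of tied neighbours is counted twice. When all ties have the same weight, $s_i$ is proportional to $k_i$ and the expression reduces to equation 1, which is why the weighted coefficient can differ from the binary one only when weights are unevenly distributed across ties.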

Unlike the local clustering coefficient, the global coefficient is based on a clustering measure for directed networks: transitivity (Wasserman and Faust, 1994, 243). However, it is only defined for networks in which ties have no weights. When weights are attached to the ties, researchers have typically set an arbitrary cut-off level and dichotomised the network by removing ties with weights below the cut-off and then removing the weights from the remaining ties (this process is described in detail in Section 1.1). The result is a binary network consisting of ties that are either present (or equal to 1) or absent (or equal to 0; Scott, 2000). Doreian (1969) studied clustering in a weighted network by creating a series of binary networks from the original weighted network using different cut-offs. Such a sensitivity analysis can address some of the problems arising from the subjectivity inherent in the choice of the cut-off. However, it tells us little about the original weighted network, except that the value of the clustering coefficient changes at different cut-off levels. While we also conduct similar sensitivity analyses on various datasets, here we propose a generalisation that explicitly takes the weights of ties into consideration and, for this reason, does not depend on a cut-off to dichotomise weighted networks.
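The dichotomisation procedure and the cut-off sensitivity analysis described above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the tie-list representation and the toy weights are hypothetical, and the global coefficient is computed with the standard binary, undirected definition (the fraction of triplets, i.e. pairs of ties sharing a node, that are closed by a third tie), not with the generalisation proposed in this chapter.

```python
from itertools import combinations


def dichotomise(weighted_ties, cutoff):
    """Remove ties with weight below `cutoff`, then drop the weights.

    `weighted_ties` maps an undirected tie (u, v) to its weight.
    Returns a dictionary mapping each node to its set of neighbours.
    """
    adjacency = {}
    for (u, v), weight in weighted_ties.items():
        # Keep every node, even if all of its ties are removed.
        adjacency.setdefault(u, set())
        adjacency.setdefault(v, set())
        if weight >= cutoff:
            adjacency[u].add(v)
            adjacency[v].add(u)
    return adjacency


def global_clustering(adjacency):
    """Closed triplets divided by all triplets (binary, undirected)."""
    closed = 0
    total = 0
    for neighbours in adjacency.values():
        # Each pair of neighbours of a node forms one triplet centred on it.
        for u, v in combinations(neighbours, 2):
            total += 1
            if v in adjacency[u]:
                closed += 1
    return closed / total if total else None


# Hypothetical weighted network with four nodes and four ties.
ties = {("a", "b"): 4, ("a", "c"): 2, ("b", "c"): 3, ("a", "d"): 1}

# Sensitivity analysis: one binary network per cut-off level.
results = {c: global_clustering(dichotomise(ties, c)) for c in (1, 2, 3, 4)}
for cutoff, value in results.items():
    print(cutoff, value)
```

In this toy example the coefficient is 0.6 at a cut-off of 1, rises to 1 at a cut-off of 2 (only the triangle a-b-c survives), falls to 0 at a cut-off of 3, and is undefined at a cut-off of 4 because no triplets remain. The same weighted network therefore yields very different values depending on an arbitrary choice of cut-off.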

In what follows, we start by discussing the existing literature on the global clustering coefficient in undirected and binary networks¹. In Section 2.2 we propose our generalised measure of clustering. We then test the generalisation and compare it with the existing measure using a number of empirical datasets based on weighted and undirected networks. In Section 2.4, we turn our attention to directed networks and discuss the existing literature on clustering in that type of network. We then extend our generalised measure to cover weighted and directed networks. Finally, Sections 2.5 and 2.6 highlight the contribution to the literature and offer a critical assessment of the main results.
_____________________
¹ Even though directionality of ties is a key advantage in choosing the global clustering coefficient over the local, for the sake of simplicity, we choose to start by focusing on undirected ties.