## Thesis: 2.6 Conclusion and discussion

An article based on this chapter has been published (Opsahl and Panzarasa, 2009). This article was written after this chapter and contains a number of changes.

Relations to unique people are unique. We live in an increasingly connected world with an increasing number of contacts to whom we relate in different ways, with different frequencies, and for different reasons. Each social relationship bears a special meaning to us, and it would be overly simplistic and grossly unfair to treat every contact in the same manner. Therefore, it is important to capture this difference when studying social networks. We believe that social network measures should capture the richness of information that the weight of ties contains. There are a great number of networks where the weights of the ties are recorded (see Section 2.3, but also Ebel et al., 2002; Holme et al., 2004; Kossinets and Watts, 2006); nevertheless, only a limited number of measures take the weight into account (e.g. Burt, 1992; Freeman et al., 1991; Nordlund, 2007; Yang and Knoke, 2001). As a result, most measures can only be calculated on binary networks. This means that researchers must set a subjective cut-off: ties whose weight falls below this cut-off are removed, and those whose weight is above it are simply set to present (Doreian, 1969). However, this procedure is detrimental to the quality of the analysis, primarily because it discards information that the data hold.

To overcome this shortcoming of network measures applied to weighted networks, we offered a generalisation of the clustering coefficient that takes the weight of ties explicitly into account. It does so by attaching a value to each triplet. A triplet consists of two ties, either two undirected ties or two directed ties, depending on the nature of the network. The value of a triplet is based on the weight of these two ties. We proposed four methods for calculating this value. The first two methods are based on a mean, namely the arithmetic mean and the geometric mean. These methods discount a strong tie when it is coupled with a weak tie, with the “geometric mean” method discounting more than the “arithmetic mean” method. The last two methods represent extreme methods, namely the “maximum” and “minimum” methods. These methods are not sensitive to differences between the weights of the two ties that constitute the triplet. They simply set the value of the triplet equal to the maximum or the minimum weight of the two ties, respectively. The advantages and shortcomings of each of these four methods should be evaluated based on the research question and the type of network dataset at hand. For example, in a network where the weights correspond to the level of flow, and a weak tie would act as a bottleneck, the minimum method might be most appropriate to use. By contrast, when ties are weighted in terms of costs or time, it may be more suitable to apply the maximum method so as not to underestimate the value of triplets. The appropriateness of the two methods based on averages is linked to the question of whether extreme tie weights should be discounted. For example, the geometric mean might be more appropriate than the other methods when studying knowledge transfer in a social network. This is because proportionally less knowledge is transferred over a strong tie (e.g., a tie with a weight of five is unlikely to transfer five times as much knowledge as a tie with a weight of one; Granovetter, 1973). To account for this feature, the geometric mean might be suitable as it discounts triplets with extreme weights (see Table 1). By contrast, in networks where the weights are directly proportional to capacity, the arithmetic mean might be more suitable. This is often the case in non-social networks, such as the US airport network.
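The four ways of valuing a triplet can be illustrated with a minimal Python sketch. The function name `triplet_value` and the example weights (a strong tie of weight 4 coupled with a weak tie of weight 1) are hypothetical and chosen only for illustration; they are not taken from the datasets analysed in this chapter.

```python
from math import sqrt


def triplet_value(w1, w2, method="arithmetic"):
    """Value of a triplet whose two ties have weights w1 and w2."""
    if method == "arithmetic":
        return (w1 + w2) / 2
    if method == "geometric":
        return sqrt(w1 * w2)
    if method == "maximum":
        return max(w1, w2)
    if method == "minimum":
        return min(w1, w2)
    raise ValueError(f"unknown method: {method}")


METHODS = ("arithmetic", "geometric", "maximum", "minimum")

# Hypothetical triplet: a strong tie (weight 4) coupled with a weak tie (weight 1).
for method in METHODS:
    print(method, triplet_value(4, 1, method))
```

For these illustrative weights, the geometric mean (2.0) lies below the arithmetic mean (2.5), which reflects the stronger discounting of the strong tie described above, while the maximum and minimum methods return 4 and 1 and ignore the other tie altogether.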

The generalised coefficient produces the same results as the binary clustering coefficient when applied to a binary network. The binary coefficient divides the number of closed triplets by the total number of triplets, whereas the generalised coefficient divides the total value of the closed triplets by the total value of all triplets. When the data are binary, all the triplets have the same value, $\omega=1$, regardless of the method used to calculate the value. Therefore, the total value of closed triplets equals the number of closed triplets, and the total value of triplets equals the total number of triplets. Hence, the generalised coefficient equals the binary coefficient when applied to a binary network.
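This equivalence can be checked numerically. The sketch below is a simplified illustration for undirected networks only and does not cover the directed case. The function name `generalised_clustering`, the tie-list input format, and the four-node example network are assumptions made for this example.

```python
from itertools import combinations
from math import sqrt

# Triplet-value methods, keyed by the names used in the text.
VALUE = {
    "arithmetic": lambda a, b: (a + b) / 2,
    "geometric": lambda a, b: sqrt(a * b),
    "maximum": max,
    "minimum": min,
}


def generalised_clustering(ties, method="arithmetic"):
    """Total value of closed triplets divided by total value of all triplets.

    ties: iterable of (u, v, weight) for an undirected network (assumed format).
    """
    value = VALUE[method]
    adj = {}
    for u, v, w in ties:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    closed = total = 0.0
    # Each triplet is a centre node together with two of its neighbours.
    for nbrs in adj.values():
        for i, k in combinations(nbrs, 2):
            val = value(nbrs[i], nbrs[k])
            total += val
            if k in adj[i]:  # the triplet is closed if its two ends are tied
                closed += val
    return closed / total if total else float("nan")


# Hypothetical network: a triangle A-B-C with a pendant node D attached to C.
weighted_ties = [("A", "B", 4), ("A", "C", 1), ("B", "C", 2), ("C", "D", 1)]
binary_ties = [(u, v, 1) for u, v, _ in weighted_ties]

for method in VALUE:
    print(method,
          generalised_clustering(binary_ties, method),
          generalised_clustering(weighted_ties, method))
```

In the binary version of this network, three of the five triplets are closed, so every method returns 3/5. Once the weights are restored, the four methods give different values.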

We measured and compared the binary and the generalised clustering coefficients on a number of empirical datasets where the weights of relations are recorded. First, we found that the binary coefficient generally decreased as the cut-off increased. However, as the rate of decrease varies, it is difficult to interpret this result. Second, we found that there were differences among the outcomes when different methods for defining the triplet value were used. The generalised coefficient using the “minimum” method mostly yielded the highest outcome, whereas the “maximum” method generally yielded the lowest. There were three exceptions. These were the networks with the greatest range of tie weights. We speculated that this might have affected results because the difference between the weights of the ties that make up a triplet in these networks is likely to be relatively greater than in other networks. Third, we found that, in all social networks studied, the value of the generalised coefficient was greater than the value of the binary one. This signals that triplets composed of stronger ties are more likely to be closed than triplets composed of weaker ties. This is not surprising as social interactions generally occur in groups larger than two, and if two people spend a great deal of time with the same third person, they are also likely to meet and develop a bond with each other. Moreover, this finding provides support for an assumption used by Granovetter (1973) when he formulated the strength of weak ties theory (Freeman, 1992). Our finding suggests that triplets composed of weak ties are more likely to be open than triplets composed of strong ties. Thus, acquaintances are less likely to be connected with a person’s contacts than friends.

One of the advantages of the generalised clustering coefficient is also a limitation. Unlike the binary clustering coefficient, the generalised coefficient does not require the ties in a weighted network to be transformed. This becomes an issue when all the ties within a network are defined, even by a marginal weight, because the network is then fully connected and the coefficient is 1. The binary clustering coefficient does not have this issue as ties with a marginal weight are set to absent, and the network is therefore no longer fully connected. An example of a weighted, fully connected network is a network of cities in which the ties represent the distances between them. Since there is a finite distance between any two cities, all possible ties in this network are defined. The binary clustering coefficient overcomes this issue by setting weak ties (or long distances) to absent. A possible solution when applying the generalised coefficient (which does not normally transform the data) is to apply exactly this transformation and set weak ties to absent. However, the suitability and appropriateness of this solution depend on the data, the context in which the data were collected, and the research question.
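The effect of marginal ties can be seen in a small sketch. It assumes an undirected network and uses the “minimum” method by default. The function name `clustering`, the four-node network, and the cut-off of 1 are hypothetical choices made only to illustrate the point.

```python
from itertools import combinations


def clustering(ties, value=min):
    """Generalised clustering coefficient for an undirected tie list (u, v, weight)."""
    adj = {}
    for u, v, w in ties:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    closed = total = 0.0
    for nbrs in adj.values():
        for i, k in combinations(nbrs, 2):
            val = value(nbrs[i], nbrs[k])
            total += val
            if k in adj[i]:  # closed triplet: the two end nodes are tied
                closed += val
    return closed / total if total else float("nan")


# Hypothetical fully connected network: every pair is tied, two ties only marginally.
ties = [("A", "B", 5), ("A", "C", 4), ("B", "C", 3),
        ("C", "D", 2), ("A", "D", 0.1), ("B", "D", 0.1)]

full = clustering(ties)  # every triplet is closed, so the coefficient is 1

# Set ties below an (assumed) cut-off of 1 to absent, keeping the remaining weights.
kept = [(u, v, w) for u, v, w in ties if w >= 1]
thresholded = clustering(kept)

print(full, thresholded)
```

Here the cut-off is used only to remove marginal ties. The remaining ties keep their weights, so the coefficient still reflects tie strength rather than mere presence.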

We believe that researchers should carefully operationalise variables when dealing with research questions concerned with tie weights. Marsden and Campbell (1984) conducted a comparative analysis of Granovetter’s (1973, p. 1361) four criteria for defining tie weights. They found that emotional intensity was a better indicator of friendship than the other three criteria. Nevertheless, we believe that researchers should assess which criterion is the most appropriate measure for operationalising each variable of the study. This, in turn, will depend on the nature of the nodes and ties and, more generally, on the context of the research setting.

In addition, the scale of the weights should be carefully defined. The scale should represent the chosen criteria to minimise subjective biases. A standard network question often used in the studies of advice networks is:

Please indicate how often you have turned to this person for information or advice on work-related topics in the past three months.

with the scale: 0, Do not know this person; 1, Never; 2, Seldom; 3, Sometimes; 4, Often; 5, Very Often.¹ In this case, answers are prone to the bias that arises when different people judge frequency differently and interpret labels such as “Seldom” or “Often” in different ways. One way to overcome this problem is to design a scale that reflects actual time. For example, a better scale could be: 0, Never; 1, Once; 3, Monthly; 6, Bi-weekly; 12, Weekly, where each value approximates the number of interactions over the three-month period. Compared with the former, this scale is likely to yield a dataset that is richer in information, more robust against potential subjective biases, and more suitable for network studies that rely on generalised measures, such as our proposed clustering coefficient.
_____________________
¹ Cross and Parker (2004) used this question to create the advice network in the consulting company used in Section 2.3.