Thesis: 1.3 Projects and outline of thesis
In the last few years, I have focused on a number of research projects aimed at improving our understanding of networks. These include theoretical and empirical studies as well as the development of a software package. In this thesis, I highlight four of these projects, two of which concern the structural analysis of weighted networks. The first project, presented in Chapter 2, generalises a much-studied network measure, the clustering coefficient, to weighted networks. The clustering coefficient is the fraction of triplets (i.e., three nodes connected by at least two ties) that are closed, that is, part of a triangle (Luce and Perry, 1949; Wasserman and Faust, 1994). In social networks, the clustering coefficient tends to be higher than it would be if ties were formed at random among people (Erdős and Rényi, 1959; Solomonoff and Rapoport, 1951). This suggests that people are more likely to be tied together if they share a common contact. In sociology, it has been speculated that this is due to a person’s cognitive need to create balance by introducing his or her contacts to each other (Heider, 1946; Holland and Leinhardt, 1971).
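To make the triplet-based definition concrete, the following minimal sketch computes the clustering coefficient of a binary, undirected network supplied as a list of node pairs. The function name and the edge-list input are illustrative assumptions, not part of the thesis. Each node with two neighbours centres one triplet per pair of neighbours, and the triplet counts as closed when those two neighbours are also tied.

```python
from itertools import combinations


def global_clustering(edges):
    """Fraction of triplets that are closed in a binary, undirected network.

    A triplet is a centre node together with two of its neighbours; it is
    closed if those two neighbours are also tied to each other.
    """
    # Build an undirected adjacency structure, ignoring self-loops.
    neighbours = {}
    for i, j in edges:
        if i == j:
            continue
        neighbours.setdefault(i, set()).add(j)
        neighbours.setdefault(j, set()).add(i)

    triplets = 0
    closed = 0
    for centre, nbrs in neighbours.items():
        # Every pair of neighbours forms one triplet centred on this node.
        for a, b in combinations(nbrs, 2):
            triplets += 1
            if b in neighbours[a]:
                closed += 1
    return closed / triplets if triplets else 0.0


# A triangle A-B-C with a pendant tie C-D: 3 of the 5 triplets are closed.
print(global_clustering([("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]))
```

Because every triangle closes three triplets (one per centre), this ratio equals three times the number of triangles divided by the number of triplets.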
A limitation of the clustering coefficient is that it can only be applied to binary networks. This is a major weakness, as the rich information conveyed by tie strength is lost when ties are reduced to their mere presence or absence. In this thesis, I propose a new measure that takes tie strength into account by incorporating the weights directly into its definition. The generalisation is flexible and allows the coefficient to be applied to different types of networks, including directed ones. In directed networks, ties do not simply connect two nodes; each tie originates from one node and terminates at another (Wasserman and Faust, 1994). For example, asking for advice is often seen as a directed tie (or arc) because it refers to social interactions in which knowledge flows from one person to another in a specific direction (Lazega, 2001). Conversely, collaboration is typically seen as an undirected tie (or edge) between two people because it usually implies a two-way interaction (Newman, 2001b).
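One way to incorporate weights, sketched below under stated assumptions, is to give each triplet a value derived from the weights of its two ties and to divide the total value of closed triplets by the total value of all triplets. The sketch assumes an undirected network and, by default, takes a triplet's value to be the arithmetic mean of its two tie weights; other choices (e.g. geometric mean, minimum or maximum) are equally possible, and the function names are hypothetical. It is meant only to illustrate the idea, not to reproduce the exact definition given in Chapter 2.

```python
from itertools import combinations


def arithmetic_mean(w1, w2):
    # Assumed triplet value: the mean of the two tie weights at the centre.
    return (w1 + w2) / 2


def weighted_clustering(weighted_edges, triplet_value=arithmetic_mean):
    """Total value of closed triplets divided by total value of all triplets.

    weighted_edges: (i, j, w) tuples for an undirected network (assumption).
    """
    weight = {}
    for i, j, w in weighted_edges:
        if i == j:
            continue
        weight.setdefault(i, {})[j] = w
        weight.setdefault(j, {})[i] = w

    total = 0.0
    closed = 0.0
    for centre, nbrs in weight.items():
        # Each pair of neighbours of the centre defines one triplet.
        for a, b in combinations(nbrs, 2):
            value = triplet_value(nbrs[a], nbrs[b])
            total += value
            if b in weight[a]:
                closed += value
    return closed / total if total else 0.0


# Same topology as a triangle plus a pendant tie, but the pendant tie is strong:
edges = [("A", "B", 1), ("B", "C", 1), ("C", "A", 1), ("C", "D", 5)]
print(weighted_clustering(edges))  # open triplets now carry more value
```

With all weights equal, the measure reduces to the binary clustering coefficient; here the strong tie C-D sits only in open triplets, so the weighted value (1/3) falls below the binary one (0.6).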
The second project (Chapter 3) proposes a new measure for weighted networks, the weighted rich-club effect. Unlike the generalised clustering coefficient, this measure detects a feature found only in weighted networks: whether the strongest ties in a network are concentrated among a subset of “prominent” nodes, e.g. the nodes with the largest number of contacts. Numerous studies have shown that many properties are heterogeneously distributed across the nodes in a wide range of networks (Barabasi and Albert, 1999; Barrat et al., 2004; Pareto, 1897; Pastor-Satorras and Vespignani, 2004; Simon, 1955; Zipf, 1935). Investigating the interactions among the nodes with the highest levels of a given property (i.e., the prominent ones) can provide useful insights into a network’s organisation and functioning. Scholars have already studied interactions among prominent nodes by investigating whether highly connected nodes tend to form more ties with each other than expected at random (the topological rich-club phenomenon; Colizza et al., 2006; Zhou and Mondragon, 2004). By contrast, our proposed measure assesses whether the prominent nodes share the strongest ties in the network. For example, do prominent people attract and exchange among themselves the vast majority of the resources available in a social network, or do they tend to distribute resources homogeneously across the network? The measure helps answer this question by testing whether the ties among prominent nodes are stronger or weaker than expected by chance.
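The comparison described above can be sketched as follows, under several assumptions that are ours rather than the thesis’s: prominence is taken to be degree above a threshold r, ties are undirected with positive weights and listed once, and the null model randomly reshuffles weights while keeping the topology fixed. The ratio compares the total weight on ties among prominent nodes with the largest total that the same number of ties could carry anywhere in the network; dividing by its average under reshuffling gives a value above 1 when prominent nodes share stronger ties than expected by chance. Function names are hypothetical.

```python
import random


def rich_club_ratio(weighted_edges, r):
    """Weight on ties among prominent nodes relative to the strongest possible.

    Assumptions: undirected ties listed once, positive weights, and
    prominence defined as degree > r. Returns W_r / (sum of the E_r largest
    weights in the network), where W_r and E_r are the total weight and the
    number of ties among prominent nodes.
    """
    degree = {}
    for i, j, _ in weighted_edges:
        degree[i] = degree.get(i, 0) + 1
        degree[j] = degree.get(j, 0) + 1
    prominent = {n for n, k in degree.items() if k > r}

    club = [w for i, j, w in weighted_edges if i in prominent and j in prominent]
    if not club:
        return float("nan")
    best = sorted((w for _, _, w in weighted_edges), reverse=True)[:len(club)]
    return sum(club) / sum(best)


def normalised_rich_club(weighted_edges, r, n_null=100, seed=0):
    """Observed ratio divided by its mean under random weight reshuffling."""
    rng = random.Random(seed)
    observed = rich_club_ratio(weighted_edges, r)
    weights = [w for _, _, w in weighted_edges]
    null_values = []
    for _ in range(n_null):
        # Keep the topology, randomly reassign the existing weights to ties.
        rng.shuffle(weights)
        shuffled = [(i, j, w) for (i, j, _), w in zip(weighted_edges, weights)]
        null_values.append(rich_club_ratio(shuffled, r))
    return observed / (sum(null_values) / n_null)


# Hub A, two partners B and C, three leaves; the strongest ties link A, B, C.
edges = [("A", "B", 5), ("B", "C", 5), ("C", "A", 5),
         ("A", "D", 1), ("A", "E", 1), ("A", "F", 1)]
print(rich_club_ratio(edges, 1), normalised_rich_club(edges, 1))
```

Because the numerator can never exceed the denominator, the unnormalised ratio lies between 0 and 1; the reshuffling step is what turns it into a test against chance.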
The third project, presented in Chapter 4, provides a framework for studying the evolution of binary and weighted longitudinal networks (i.e., networks in which the exact sequence of node additions and removals, and of the creation, strengthening, weakening, and severing of ties, is known). Unlike in cross-sectional networks, where the dependency structure among ties is unknown, the dependency among ties is known in this type of dataset. This makes it possible to identify the other nodes in the network, and their properties, at the time a node decides to form a tie. Thus, we can test directly whether, and to what extent, different properties affect the likelihood of receiving a tie. We test this framework empirically on an online social network created from a virtual community, which allows us to study accurately the interplay among a host of mechanisms that guide online behaviour and interpersonal dynamics.
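The key step, reconstructing the network as it stood when each tie was formed, can be sketched by replaying the event sequence. The sketch below assumes a hypothetical input format (a dictionary of node arrival times and a list of time-stamped tie-creation events), uses in-degree as the only node property, and ignores tie weakening, severing and node removal. For every event it records each candidate recipient’s in-degree just before the tie was created and whether that candidate was chosen; rows of this kind could, for instance, feed a discrete-choice model, although that modelling step is our illustration and not necessarily the framework of Chapter 4.

```python
def choice_sets(arrivals, ties):
    """Replay tie-formation events, recording candidates' properties at decision time.

    arrivals: dict mapping node -> time the node joins (hypothetical format).
    ties: list of (t, sender, receiver) tuples, one per tie created.
    Returns rows (event, candidate, indegree_before, chosen).
    """
    indegree = {node: 0 for node in arrivals}
    rows = []
    for event, (t, sender, receiver) in enumerate(sorted(ties, key=lambda x: x[0])):
        # Candidates are the nodes that have already joined, except the sender.
        candidates = [n for n, joined in arrivals.items() if joined <= t and n != sender]
        for n in candidates:
            rows.append((event, n, indegree[n], n == receiver))
        # Update the network only after the choice has been recorded.
        indegree[receiver] += 1
    return rows


arrivals = {"a": 0, "b": 0, "c": 1, "d": 2}
ties = [(1, "a", "b"), (2, "c", "b"), (3, "d", "b")]
for row in choice_sets(arrivals, ties):
    print(row)
```

Recording properties before updating the network is the point of the exercise: it preserves the information available to the sender when the tie was formed, which a single cross-sectional snapshot would lose.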
During my PhD, I have used numerous software packages. For network analysis, these include UCINET (Borgatti et al., 2002), Pajek (Batagelj and Mrvar, 2007), Siena (Snijders et al., 2007), Pnet (Wang et al., 2005), and the sna (Butts, 2006) and statnet (Handcock et al., 2003) packages in R (R Development Core Team, 2008). For statistical and econometric analysis, I have relied on Stata (StataCorp, 2007) and general functions in R. In addition, I have programmed several functions in R and Matlab (MathWorks, Inc., 2007) to implement non-standard algorithms. My final project, presented in Chapter 5, is a software package named tnet, which collects most of the functions that I have written in R.
This software package was developed in response to the lack of open-source software that can handle weighted and longitudinal networks. For example, the widely used network package, on which many other open-source R packages (notably sna and statnet) depend, does not even have a data class for weighted networks (Butts, 2006; Butts et al., 2008; Handcock et al., 2003). A new platform was therefore needed to analyse weighted and/or longitudinal networks.
tnet is itself an open-source package, which allows others to add or modify functions. To this end, it contains two data structures, one for weighted networks and one for longitudinal ones, together with a number of support functions. Researchers aiming to develop new measures, or to generalise existing ones to weighted or longitudinal networks, can build on these. The package contains a set of functions for each of the two data structures. For the analysis of weighted networks, it includes a host of structural measures, among them functions to calculate the measures proposed in Chapters 2 and 3. For the study of longitudinal networks, it includes the framework proposed in Chapter 4. The goal of this project is to enable researchers to conduct structural analyses of weighted and longitudinal networks easily, using the measures proposed in this thesis and in the literature.
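The relationship between the two kinds of data can be illustrated with a small Python sketch; it is not tnet’s code, and tnet’s actual data structures are those of the R package presented in Chapter 5. Here, as a hypothetical format, a weighted network is an edge list of (i, j, w) tuples, while a longitudinal network is a list of (t, i, j, w) events in which w is the change in the weight of the directed tie from i to j at time t (positive for creation or strengthening, negative for weakening or severing). Aggregating the events up to a given time yields a weighted snapshot to which weighted measures can then be applied.

```python
from collections import defaultdict


def snapshot(events, until):
    """Aggregate a longitudinal event list into a weighted edge list at time `until`.

    events: (t, i, j, w) tuples, where w is the change in the weight of the
    directed tie i -> j at time t (hypothetical format for illustration).
    Returns a sorted list of (i, j, w) tuples for ties with positive weight.
    """
    weights = defaultdict(float)
    for t, i, j, w in events:
        if t <= until:
            weights[(i, j)] += w
    # Ties whose cumulative weight is zero or below are treated as severed.
    # With non-integer weights, a small tolerance may be preferable here.
    return sorted((i, j, w) for (i, j), w in weights.items() if w > 0)


events = [(1, 1, 2, 1), (2, 1, 2, 1), (3, 2, 3, 1), (4, 1, 2, -2), (5, 3, 1, 1)]
print(snapshot(events, 3))  # tie 1->2 strengthened twice, tie 2->3 created
print(snapshot(events, 5))  # tie 1->2 has since been severed
```

Keeping the event list as the primary record and deriving snapshots from it means that no information about the order of events is lost, whereas the reverse conversion is impossible.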