tnet: Software for Analysing Weighted Networks
June 12, 2009
tnet is an R package that can currently calculate weighted social network measures, analyse two-mode networks (from version 0.1.0), and detect the underlying principles that guide tie formation in datasets with time-stamped ties (from version 0.2.0). It forms part of a wider effort to analyse richer network datasets without transforming the networks into simple, static, binary, undirected one-mode ones. This post details tnet’s capabilities related to weighted networks.
Motivation
Almost all of the ideas posted on this blog are related to weighted networks as, I believe, taking tie weights into consideration enables us to uncover and study interesting network properties. Not only are few network measures applicable to weighted networks, but there is also a lack of software programmes that can analyse this type of network. To the best of my knowledge, there are no programmes that can both deal with weighted networks and allow users to create their own functions. On the one hand, programmes like UCINET and Pajek have a small set of functions for weighted networks, but they do not allow users to programme additional functions (Batagelj and Mrvar, 2007; Borgatti et al., 2002). Therefore, researchers proposing new measures must create stand-alone programmes to deal with a single aspect of weighted networks (e.g., Brandes, 2001; Newman, 2001; Opsahl et al., 2008; Opsahl and Panzarasa, 2009). On the other hand, a number of packages for analysing networks have been created within the open-source statistical programme R, notably the sna and statnet packages (Butts, 2006; Handcock et al., 2003). These packages allow researchers to create additional functions on top of existing ones. This ability greatly reduces the time spent on programming, and lets researchers focus on their contribution to the literature instead. For example, if someone has already written a function for identifying the shortest paths in a network, a researcher who would like to extend this measure can simply work on this code without programming the function from scratch.
However, the sna and statnet packages rely on the basic network package for data structures to represent networks (Butts et al., 2008). This basic package does not have a data class for weighted networks. Therefore, to ease the development of new functions for weighted networks, a new platform is needed. tnet represents an attempt to create such a platform. Although it is a user-written package in R similar to the sna and statnet packages, it does not rely on the network package. It utilises its own data structures, one of which can handle weighted networks.
Data Structures
Since most networks are sparse (i.e., the number of ties (A) is much lower than the squared number of nodes (N^2)), I opted for an edgelist format instead of a matrix one. A binary edgelist consists of two columns that represent the pairs of nodes that are tied together in a network (e.g., the edgelist1-format in UCINET’s dl files; Borgatti et al., 2002). When a directed network is represented, the first column represents the nodes that create the ties, whereas the second column represents the target nodes. Thus, an edgelist is an A × 2 table. This type of list has been extended to cover weighted networks by adding a third column representing the weight of the ties (an A × 3 table; Borgatti et al., 2002). While the matrix format records the weight of all possible ties (a non-established tie would get a weight of 0), this format records the sender, receiver, and weight of established ties only. Thus, the space needed to store a network is proportional to A instead of N^2. The main advantage of this format is that it can scale to networks with many nodes, as it is the number of ties, not nodes, that determines the size of the data object. Although many programmes can read edgelists, most network analysis programmes rely on an internal matrix representation, e.g., UCINET and the network package (Borgatti et al., 2002). Conversely, Pajek, which was designed to analyse large-scale sparse networks, specifically uses an internal edgelist representation (Batagelj and Mrvar, 2007). By adopting an edgelist representation as well, tnet can be applied efficiently to large-scale sparse networks.
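To make the difference between the two representations concrete, here is a minimal base R sketch (not tnet code) that converts a small dense weight matrix into a three-column edgelist. The five-node network and the column names i, j, and w are assumptions made for illustration.

```r
# Hypothetical 5-node directed weighted network stored as a dense matrix.
n_nodes <- 5L
w <- matrix(0, nrow = n_nodes, ncol = n_nodes)
w[1, 2] <- 4
w[1, 3] <- 2
w[4, 5] <- 1

# Keep only established ties (non-zero cells) as sender, receiver, weight.
idx <- which(w != 0, arr.ind = TRUE)
edgelist <- data.frame(
  i = as.integer(idx[, "row"]),  # node creating the tie
  j = as.integer(idx[, "col"]),  # node receiving the tie
  w = w[idx]                     # weight of the tie
)
edgelist <- edgelist[order(edgelist$i, edgelist$j), ]
rownames(edgelist) <- NULL

cells_in_matrix <- length(w)        # N^2 cells, whatever the number of ties
rows_in_edgelist <- nrow(edgelist)  # A rows, one per established tie
```

The matrix always holds N^2 cells, while the edgelist grows only with the number of established ties.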
In an effort to stay consistent with existing data structures, this three-column table is also the structure used by tnet. In R, an edgelist should be an object of class data.frame. This class allows the different columns of a table to be of different classes, such as integer and numeric (i.e., real numbers, which take more space than integers). Thus, the data.frame class can store an edgelist more efficiently than the matrix class, which requires all columns to be of the same type (numeric, if the weights are real numbers). The first two columns of the edgelist are assumed to be integers (i.e., the identification number of the node creating the tie and the identification number of the node receiving the tie, respectively). The third column can be integer or numeric and represents the weights attached to the ties.
To illustrate the edgelist structure, the figure below shows two networks with two ties each. The ties in the first network are directed, whereas in the second one they are undirected.

Example of a directed (a) and an undirected (b) network with weighted ties. This figure is based on Figure 16 in my thesis.
The directed network in the above figure should be represented by using the following table:
1 2 4
1 3 2
A network is deemed undirected if all ties are included twice, once in each direction. The undirected network in the above figure should be represented by the following table:
1 2 4
2 1 4
1 3 2
3 1 2
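As a rough illustration, the two tables can be written as data.frame objects in base R. The column names and the is_undirected helper are hypothetical and only serve to show the "each tie appears in both directions" convention; they are not part of tnet.

```r
# The directed network (a): two ties sent by node 1.
directed_net <- data.frame(i = c(1L, 1L), j = c(2L, 3L), w = c(4, 2))

# The undirected network (b): every tie is listed once in each direction.
undirected_net <- data.frame(
  i = c(1L, 2L, 1L, 3L),
  j = c(2L, 1L, 3L, 1L),
  w = c(4, 4, 2, 2)
)

# Hypothetical helper: TRUE if every tie also appears reversed
# with the same weight.
is_undirected <- function(net) {
  forward <- paste(net$i, net$j, net$w)
  backward <- paste(net$j, net$i, net$w)
  all(backward %in% forward)
}

is_undirected(directed_net)
is_undirected(undirected_net)
```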
There are a number of functions that help users convert data in other formats into the weighted edgelist format. For example, if a dataset is undirected, but there is only one entry for each tie in the edgelist, the symmetrise function adds a second entry for each tie with the identification numbers of the creator and target nodes reversed. Moreover, if a dataset is similar to an edgelist, but has only two columns (representing the identification numbers of the creator and target nodes) and the number of times a tie is listed gives its weight (e.g., a tie with a weight of 3 is included three times), then the shrink_to_weighted_network function converts the edgelist into the correct format. To allow for a comparison between weighted and binary network measures, the dichotomise function creates a binary network from a weighted one. It does so by removing the ties in a weighted edgelist that fall below a certain cut-off and setting the weight of the remaining ones to 1.
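The three conversions can be sketched in a few lines of base R. The functions below are hypothetical stand-ins written for this illustration (they are not tnet's implementations and their names and arguments are assumptions); they assume edgelists with columns named i, j, and w.

```r
# Add the reverse of every tie (the idea behind symmetrising).
add_reverse_ties <- function(net) {
  reversed <- data.frame(i = net$j, j = net$i, w = net$w)
  out <- rbind(net, reversed)
  out <- out[order(out$i, out$j), ]
  rownames(out) <- NULL
  out
}

# Turn a two-column list with repeated entries into a weighted edgelist:
# the weight of a tie is the number of times it is listed.
collapse_repeated_ties <- function(ties) {
  ties$w <- 1
  out <- aggregate(w ~ i + j, data = ties, FUN = sum)
  out <- out[order(out$i, out$j), ]
  rownames(out) <- NULL
  out
}

# Drop ties whose weight falls below the cut-off; set the rest to 1.
binarise_ties <- function(net, cutoff = 1) {
  out <- net[net$w >= cutoff, ]
  out$w <- rep(1, nrow(out))
  rownames(out) <- NULL
  out
}

one_way <- data.frame(i = c(1L, 1L), j = c(2L, 3L), w = c(4, 2))
both_ways <- add_reverse_ties(one_way)
binary <- binarise_ties(both_ways, cutoff = 3)
```

Whether a tie exactly at the cut-off is kept is a design choice; this sketch keeps it.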
Implemented Network Measures
The main measures that are implemented in tnet are generalised versions of the global clustering coefficient or transitivity (Opsahl and Panzarasa, 2009), local clustering coefficient (Barrat et al., 2004), and centrality measures (degree, closeness, and betweenness; Barrat et al., 2004; Brandes, 2001; Newman, 2001) as well as the weighted rich-club effect framework (Opsahl et al., 2008).
First, in Opsahl and Panzarasa (2009), we generalised the global clustering coefficient or transitivity, which measures the fraction of triplets that are part of triangles (or closed), to weighted networks. We generalised this measure by assigning a value to the triplets, and then taking the ratio between the total value of closed triplets and the total value of all triplets. This measure has been implemented in the function clustering_w.
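The ratio described above can be sketched in base R for a small undirected network. This is an illustration only, not the code behind clustering_w: it works on a weight matrix rather than an edgelist, and it assumes that the value of a triplet is the arithmetic mean of its two tie weights, which is one possible choice. The example network and the function name are hypothetical.

```r
# Hypothetical undirected network: a triangle 1-2-3 plus a tie 3-4.
w <- matrix(0, 4, 4)
w[1, 2] <- w[2, 1] <- 4
w[1, 3] <- w[3, 1] <- 2
w[2, 3] <- w[3, 2] <- 1
w[3, 4] <- w[4, 3] <- 3

weighted_transitivity <- function(w) {
  total <- 0   # total value of all triplets
  closed <- 0  # total value of closed triplets
  for (centre in seq_len(nrow(w))) {
    contacts <- which(w[centre, ] > 0)
    if (length(contacts) < 2) next  # no triplet centred on this node
    pairs <- combn(contacts, 2)
    for (p in seq_len(ncol(pairs))) {
      a <- pairs[1, p]
      b <- pairs[2, p]
      # Assumed triplet value: mean weight of the two ties of the triplet.
      value <- mean(c(w[centre, a], w[centre, b]))
      total <- total + value
      if (w[a, b] > 0) closed <- closed + value
    }
  }
  if (total == 0) NA_real_ else closed / total
}

weighted_coef <- weighted_transitivity(w)
binary_coef <- weighted_transitivity((w > 0) * 1)
```

On the binary version of the network every triplet has a value of 1, so the function reduces to the usual fraction of closed triplets (3 of 5 here), whereas the weighted version gives 7 / 11.5 because the closed triplets carry heavier ties.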
Second, the local clustering coefficient has also been generalised to weighted networks (Barrat et al., 2004). This coefficient measures the density of nodes’ ego networks by taking the ratio between the number of ties that exist among a node’s contacts and the total possible number. The number of possible ties is equal to the number of triplets where the focal node is the middle node. Barrat et al. (2004) generalised the measure by assigning a value to each triplet, defined as the mean weight of the two ties that make up the triplet, and then taking the ratio between the total value of closed triplets and the total value of all triplets. In a similar spirit to the global clustering coefficient, I have proposed three additional methods for defining triplet values. This coefficient, with all four methods for defining triplet values, is implemented in the function clustering_w_barrat.
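A minimal base R sketch of this local coefficient for an undirected network follows. It is not the code behind clustering_w_barrat: the example network, the function name, and the value_fun argument (a hook for swapping in another triplet value definition) are assumptions made for illustration, with the mean tie weight of Barrat et al. (2004) as the default.

```r
# Hypothetical undirected network: a triangle 1-2-3 plus a tie 3-4.
w <- matrix(0, 4, 4)
w[1, 2] <- w[2, 1] <- 4
w[1, 3] <- w[3, 1] <- 2
w[2, 3] <- w[3, 2] <- 1
w[3, 4] <- w[4, 3] <- 3

local_clustering_sketch <- function(w, value_fun = mean) {
  out <- rep(NA_real_, nrow(w))
  for (centre in seq_len(nrow(w))) {
    contacts <- which(w[centre, ] > 0)
    if (length(contacts) < 2) next  # undefined with fewer than two contacts
    pairs <- combn(contacts, 2)     # one column per triplet centred here
    values <- apply(pairs, 2, function(p) value_fun(w[centre, p]))
    is_closed <- apply(pairs, 2, function(p) w[p[1], p[2]] > 0)
    out[centre] <- sum(values[is_closed]) / sum(values)
  }
  out
}

local_coef <- local_clustering_sketch(w)
```

For node 3, only the triplet through nodes 1 and 2 is closed (value 1.5 out of a total of 6), giving 0.25, compared with 1/3 when all triplets count equally.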
Third, generalisations of Freeman’s (1978) centrality measures, namely degree, closeness, and betweenness, have been implemented as degree_w, closeness_w, and betweenness_w. The generalisations of degree and betweenness have previously been highlighted in posts on this blog, but closeness has yet to be mentioned. The original measure was formalised as the inverse of the distance to all other reachable nodes in the network. Since it relies on the calculation of shortest distances in the network, a first step towards extending it to weighted networks is to generalise how shortest distances are defined. I have covered one such method in the post average shortest distance in weighted networks, which was based on work by Dijkstra (1959) and Newman (2001). In fact, Newman (2001) has already extended the closeness measure using this method. He applied both the binary and weighted measures to a coauthorship network, and found that different authors had the highest scores. This highlights the importance of considering the weights, and suggests that a binary measure might not be a good proxy for a possible weighted one.
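The following base R sketch illustrates the idea on a small undirected network. It assumes that the length of a tie is the inverse of its weight, and that closeness is the inverse of the summed distances to all other reachable nodes. For brevity it computes all shortest distances with the Floyd-Warshall algorithm rather than Dijkstra's; both give the same distances. The function names and the example network are hypothetical and this is not the code behind closeness_w.

```r
# Hypothetical undirected network: a triangle 1-2-3 plus a tie 3-4.
w <- matrix(0, 4, 4)
w[1, 2] <- w[2, 1] <- 4
w[1, 3] <- w[3, 1] <- 2
w[2, 3] <- w[3, 2] <- 1
w[3, 4] <- w[4, 3] <- 3

# All-pairs shortest distances; tie length assumed to be 1 / weight.
shortest_distances_sketch <- function(w) {
  d <- ifelse(w > 0, 1 / w, Inf)
  diag(d) <- 0
  for (k in seq_len(nrow(d))) {
    # Allow paths that pass through node k if they are shorter.
    d <- pmin(d, outer(d[, k], d[k, ], "+"))
  }
  d
}

# Closeness: inverse of the summed distance to all other reachable nodes.
closeness_sketch <- function(d) {
  sapply(seq_len(nrow(d)), function(i) {
    reachable <- is.finite(d[i, ]) & seq_len(ncol(d)) != i
    if (!any(reachable)) return(NA_real_)
    1 / sum(d[i, reachable])
  })
}

d_weighted <- shortest_distances_sketch(w)
closeness_weighted <- closeness_sketch(d_weighted)
closeness_binary <- closeness_sketch(shortest_distances_sketch((w > 0) * 1))
```

In this toy network the strong ties 1-2 and 1-3 make the indirect path from node 2 to node 3 (length 0.75) shorter than the weak direct tie (length 1). In the binary version node 3 alone has the highest closeness, whereas with weights node 1 scores as high as node 3.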
Fourth, I have included the weighted rich-club effect framework (Opsahl et al., 2008). However, due to the very general nature of the framework, I have only implemented two prominence parameters (degree and node strength), and two reshuffling procedures (weight reshuffling and weight & link reshuffling) in the function weighted_richclub. For code with additional reshuffling procedures and significance (error bars), see details in the post: Weighted Rich-club Effect: A more appropriate null model for scientific collaboration networks.
In addition, there are two functions that create random weighted networks. First, rg_w takes a set of properties and produces a random network according to them. These properties include the number of nodes and ties, the range of weights, and whether the resulting network should be directed. Second, rg_reshuffling_w takes an observed network and randomises certain properties, such as the creator node, target node, or the location of the weights. In fact, the latter function is used by weighted_richclub when creating corresponding random networks.
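As an illustration of what randomising the location of the weights means, here is a minimal base R sketch for a directed edgelist with columns i, j, and w. The function name and the seed argument are hypothetical and this is not the code behind rg_reshuffling_w.

```r
# Keep the ties where they are, but permute the weights among them.
reshuffle_weights_sketch <- function(net, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  out <- net
  # Permute row indices rather than the weights themselves, so that a
  # network with a single tie is handled correctly.
  out$w <- net$w[sample.int(nrow(net))]
  out
}

observed <- data.frame(
  i = c(1L, 1L, 2L, 3L),
  j = c(2L, 3L, 3L, 4L),
  w = c(4, 2, 1, 3)
)
randomised <- reshuffle_weights_sketch(observed, seed = 1)
```

The randomised network has the same ties and the same set of weights as the observed one; only the assignment of weights to ties changes. For an undirected edgelist, where each tie is listed in both directions, the two entries of a tie would need to receive the same weight, which this sketch does not handle.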
Availability and Licensing
Compiled versions and the source code of tnet are available through the CRAN servers. If you are using the Windows version of R, you should be able to install tnet by going to the ‘Packages’ menu in R, opening ‘Install package(s)’, selecting a server close to you, and then choosing ‘tnet’ from the list. If you are unfamiliar with the CRAN system, see tnet’s supporting website for specific details on how to install the package.
tnet, like the information on this blog, is published under the Creative Commons Attribution-Noncommercial 3.0 licence. This means that you are free to:
· share (copy, distribute and transmit)
· remix (adapt)
under the following conditions:
· attribution (you must cite or link to it)
· noncommercial (you may not use it for commercial purposes).
The current citation for tnet is:
Opsahl, T., 2009. Structure and Evolution of Weighted Networks. University of London (Queen Mary College), London, UK, pp. 104-122. Available at http://toreopsahl.com/publications/thesis/.
References
Barrat, A., Barthelemy, M., Pastor-Satorras, R., Vespignani, A., 2004. The architecture of complex weighted networks. Proceedings of the National Academy of Sciences 101 (11), 3747-3752. arXiv:cond-mat/0311416
Batagelj, V., Mrvar, A., 2007. Pajek: Program for Large Network Analysis: version 1.20. http://pajek.imfm.si/.
Borgatti, S. P., Everett, M. G., Freeman, L. C., 2002. Ucinet for Windows: Software for Social Network Analysis. Analytic Technologies, Harvard, MA.
Brandes, U., 2001. A Faster Algorithm for Betweenness Centrality. Journal of Mathematical Sociology 25, 163-177.
Butts, C. T., 2006. sna-package: Package for Social Network Analysis. R package version 1.4.
Butts, C. T., Handcock, M. S., Hunter, D. R., 2008. network: Classes for Relational Data. http://statnet.org/.
Dijkstra, E. W., 1959. A note on two problems in connexion with graphs. Numerische Mathematik 1, 269-271.
Freeman, L. C., 1978. Centrality in social networks: Conceptual clarification. Social Networks 1, 215-239.
Handcock, M. S., Hunter, D. R., Butts, C. T., Goodreau, S. M., Morris, M., 2003. statnet: Software Tools for the Statistical Modeling of Network Data. http://statnetproject.org.
Newman, M. E. J., 2001. Scientific collaboration networks. II. Shortest paths, weighted networks, and centrality. Physical Review E 64, 016132.
Opsahl, T., Colizza, V., Panzarasa, P., Ramasco, J. J., 2008. Prominence and control: The weighted rich-club effect. Physical Review Letters 101 (168702). arXiv:0804.0417.
Opsahl, T., Panzarasa, P., 2009. Clustering in weighted networks. Social Networks 31 (2), 155-163.