One-mode Data Structure
Since most networks are sparse (i.e., the number of ties is much lower than the squared number of nodes), I opted for an edgelist format instead of a matrix one. A binary edgelist consists of two columns that represent the pairs of nodes that are tied together in a network (e.g., the edgelist1-format in UCINET’s dl files; Borgatti et al., 2002). When a directed network is represented, the first column contains the nodes that create the ties, whereas the second column contains the target nodes. This type of list has been extended to cover weighted networks by adding a third column representing the weight of the ties. While the matrix format records the weight of all possible ties (a non-established tie gets a weight of 0), the edgelist format records the sender, receiver, and weight of established ties only. The main advantage of this format is that it scales to networks with many nodes, as it is the number of ties, not nodes, that determines the size of the data object. Although many programmes can read edgelists, most network analysis programmes rely on an internal matrix representation, e.g., UCINET and the network-package in R (Butts, 2006). Conversely, Pajek, which was designed to analyse large-scale sparse networks, specifically uses an internal edgelist representation (Batagelj and Mrvar, 2007). By following Pajek, tnet can efficiently be applied to large-scale sparse networks.
In an effort to stay consistent with existing data formats, this three-column table is also the format used by tnet. The object class of an edgelist in R should be the standard data.frame-class. This class allows the different columns of a table to be of different classes, such as integer and numeric (i.e., real numbers, which take more space than integers). The first two columns of the edgelist are assumed to be integers (i.e., the identification number of the node creating the tie and the identification number of the node receiving the tie, respectively). The third column, which holds the weights attached to the ties, can be integer or numeric.
To illustrate this format, the sample network on the right should be represented as follows (if a tie has two arrow heads, there are two directed ties between the nodes; if there is a single number next to a two-arrow-head tie, this is the weight of both ties; if there are two numbers, the weight of each tie is the number closest to its arrow head, i.e., its receiver; the numbers in square brackets are column and row headings and should not be included when loading data):
      [,1] [,2] [,3]
 [1,]    1    2    2
 [2,]    1    3    2
 [3,]    2    1    4
 [4,]    2    3    4
 [5,]    2    4    1
 [6,]    2    5    2
 [7,]    3    1    2
 [8,]    3    2    4
 [9,]    5    2    2
[10,]    5    6    1
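As a minimal sketch, the edgelist above could be built by hand as a data.frame and compared with its matrix equivalent. The column names `i`, `j`, and `w` and the object name `adj` are labels chosen here for illustration; they do not come from the original text.

```r
# Build the sample directed network as a three-column edgelist.
# Column names i, j and w are illustrative labels (an assumption).
directed.net <- data.frame(
  i = c(1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 5L, 5L),  # node creating the tie
  j = c(2L, 3L, 1L, 3L, 4L, 5L, 1L, 2L, 2L, 6L),  # node receiving the tie
  w = c(2, 2, 4, 4, 1, 2, 2, 4, 2, 1)             # weight of the tie
)

# Columns of a data.frame can have different classes.
sapply(directed.net, class)

# The matrix format stores every possible tie, with 0 for absent ones.
n <- max(directed.net$i, directed.net$j)
adj <- matrix(0, nrow = n, ncol = n)
adj[cbind(directed.net$i, directed.net$j)] <- directed.net$w

nrow(directed.net)  # 10 rows: one per established tie
length(adj)         # 36 cells: one per possible tie
```

The edgelist grows with the number of ties, whereas the matrix grows with the squared number of nodes, which is why the difference matters for large sparse networks.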
All networks are assumed to be directed in tnet. To represent an undirected network, each tie must be listed twice, once in each direction. Currently, only the local clustering coefficient is defined solely for undirected networks. If the sample network above were undirected (symmetrised using the average tie weight), the network should be represented as follows:
      [,1] [,2] [,3]
 [1,]    1    2  3.0
 [2,]    1    3  2.0
 [3,]    2    1  3.0
 [4,]    2    3  4.0
 [5,]    2    4  0.5
 [6,]    2    5  2.0
 [7,]    3    1  2.0
 [8,]    3    2  4.0
 [9,]    4    2  0.5
[10,]    5    2  2.0
[11,]    5    6  0.5
[12,]    6    5  0.5
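The averaging used for the undirected table can be reproduced with a short base-R sketch. This is an illustration of the arithmetic only, not tnet's own implementation; the column names `i`, `j`, and `w` are assumptions, and a missing direction is treated as a weight of 0.

```r
# The directed sample network (column names are illustrative).
directed.net <- data.frame(
  i = c(1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 5L, 5L),
  j = c(2L, 3L, 1L, 3L, 4L, 5L, 1L, 2L, 2L, 6L),
  w = c(2, 2, 4, 4, 1, 2, 2, 4, 2, 1)
)

# Stack every tie together with its reversed copy.
both <- rbind(
  directed.net,
  data.frame(i = directed.net$j, j = directed.net$i, w = directed.net$w)
)

# Summing per ordered pair gives w(i,j) + w(j,i); halving gives the average.
undirected.net <- aggregate(w ~ i + j, data = both, FUN = sum)
undirected.net$w <- undirected.net$w / 2

# Sort by sender, then receiver, to match the table above.
undirected.net <- undirected.net[order(undirected.net$i, undirected.net$j), ]
rownames(undirected.net) <- NULL
undirected.net
```

For example, the tie from node 2 to node 4 has weight 1 and no reverse tie, so both directions receive (1 + 0) / 2 = 0.5.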
Loading Your Network
The most common way of loading a network is to read a text file containing the network. The read.table-function is the standard method for reading text files. It takes a filename or link, and a character that separates the values into columns (e.g., a tab). It is important not just to read the file, but also to assign the result to an object. To illustrate this procedure, the directed and undirected networks can be loaded into the objects directed.net and undirected.net using these commands (note that these files are on the web; hence the link instead of a filename).
# Read the directed network
directed.net <- read.table("http://opsahl.co.uk/tnet/datasets/one-mode-directed-network.txt", sep="\t")

# Read the undirected network
undirected.net <- read.table("http://opsahl.co.uk/tnet/datasets/one-mode-undirected-network.txt", sep="\t")
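The same call works for a local file. The sketch below writes a few tab-separated rows to a temporary file and reads them back, so it can be tried without web access; the temporary file and the object name `net` are illustrative and not part of the original example.

```r
# Write three ties of the sample network to a temporary tab-separated file.
path <- tempfile(fileext = ".txt")
writeLines(c("1\t2\t2", "1\t3\t2", "2\t1\t4"), path)

# Read the file and assign the result to an object.
net <- read.table(path, sep = "\t")

# Without a header row, read.table names the columns V1, V2 and V3.
str(net)

# Remove the temporary file again.
unlink(path)
```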
Ensure that the network conforms to the tnet standard
To ensure that the network conforms to the tnet standard, the as.tnet-function can be used. If it has not been run on the network manually, the other tnet functions run it automatically. The function takes two parameters: the network and a character string specifying the type of network. If the type parameter is not set, an object is assumed to be a one-mode edgelist if it has three columns or if it is a square matrix with more than 4 nodes. Below is the code for testing the directed network above.
# Load tnet
library(tnet)

# Read the directed network
directed.net <- read.table("http://opsahl.co.uk/tnet/datasets/one-mode-directed-network.txt", sep="\t")

# Check that it conforms to the tnet standard for weighted one-mode networks
directed.net <- as.tnet(directed.net, type="weighted one-mode tnet")
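To make the format requirements concrete, the sketch below checks by hand the properties stated earlier: a data.frame with three columns, whole-number node identifiers in the first two, and numeric weights in the third. The helper `check_edgelist` is hypothetical and written only for illustration; it is not part of tnet, and as.tnet may perform different or additional checks.

```r
# Hypothetical helper (not a tnet function): verify the stated edgelist format.
check_edgelist <- function(net) {
  # Must be a data.frame with exactly three columns.
  stopifnot(is.data.frame(net), ncol(net) == 3)
  # The first two columns hold node identification numbers (whole numbers).
  ids <- unlist(net[, 1:2])
  stopifnot(is.numeric(ids), all(ids == round(ids)))
  # The third column holds the tie weights (integer or numeric).
  stopifnot(is.numeric(net[[3]]))
  invisible(TRUE)
}

# A small edgelist that satisfies the checks.
good <- data.frame(i = c(1L, 2L), j = c(2L, 1L), w = c(2, 4))
check_edgelist(good)

# A two-column edgelist fails the column-count check.
bad <- good[, 1:2]
result <- try(check_edgelist(bad), silent = TRUE)
inherits(result, "try-error")
```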
There are a number of functions that help users convert data in other formats into the weighted edgelist format. For example, if a dataset is undirected, but there is only one entry for each tie in the edgelist, the symmetrise_w-function adds a second entry for the tie with the identification numbers of the creator and target nodes reversed. Moreover, if a dataset is similar to an edgelist, but has only two columns (representing the identification numbers of the creator and target nodes) and repeated entries of the same tie encode its weight (e.g., a tie with a weight of 3 is included three times), then the shrink_to_weighted_network-function converts the edgelist into the correct format. To allow for a comparison between weighted and binary network measures, the dichotomise_w-function creates a binary network from a weighted one. It does so by removing the ties in a weighted edgelist that fall below a certain cut-off and setting the weight of the remaining ones to 1.
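The last two conversions can be sketched in base R to show what they do to the data. This illustrates the idea only and is not tnet's implementation; the column names, the object names, and the choice to keep only ties with a weight strictly greater than the cut-off are assumptions.

```r
# A two-column edgelist in which repeated rows encode the tie weight.
repeated <- data.frame(
  i = c(1L, 1L, 1L, 2L, 2L, 3L),
  j = c(2L, 2L, 2L, 3L, 3L, 1L)
)

# Shrink: count how often each tie occurs and use the count as its weight.
repeated$w <- 1
weighted <- aggregate(w ~ i + j, data = repeated, FUN = sum)
weighted <- weighted[order(weighted$i, weighted$j), ]
rownames(weighted) <- NULL
weighted

# Dichotomise: drop ties at or below the cut-off, set the rest to 1.
cutoff <- 1
binary <- weighted[weighted$w > cutoff, ]
binary$w <- 1
binary
```

Here the tie from node 1 to node 2 appears three times and so gets a weight of 3, while the tie from node 3 to node 1 has a weight of 1 and is removed by the cut-off.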
Batagelj, V., Mrvar, A., 2007. Pajek: Program for Large Network Analysis: version 1.20. http://pajek.imfm.si/.
Borgatti, S.P., Everett, M.G., Freeman, L.C., 2002. UCINET for Windows: Software for Social Network Analysis. Analytic Technologies, Harvard, USA.
Butts, C. T., 2006. sna-package: Package for Social Network Analysis. R package version 1.4.
Opsahl, T., 2009. Structure and Evolution of Weighted Networks. University of London (Queen Mary College), London, UK, pp. 104-122. Available at http://toreopsahl.com/publications/thesis/.