One-mode Data Structure


Since most networks are sparse (i.e., the number of ties is much lower than the squared number of nodes), I opted for an edgelist format instead of a matrix one. A binary edgelist consists of two columns that represent the pairs of nodes that are tied together in a network (e.g., the edgelist1-format in UCINET’s dl files; Borgatti et al., 2002). When a directed network is represented, the first column holds the nodes that create the ties, whereas the second column holds the target nodes. This type of list has been extended to cover weighted networks by adding a third column representing the weight of the ties. While the matrix format records the weight of all possible ties (a non-established tie would get a weight of 0), this format records the sender, receiver, and weight of established ties only. The main advantage of this format is that it scales to networks with many nodes, as it is the number of ties, not nodes, that determines the size of the data object. Although many programmes can read edgelists, most network analysis programmes rely on an internal matrix representation, e.g., UCINET and the network-package in R (Butts, 2006). Conversely, Pajek, which was designed to analyse large-scale sparse networks, specifically uses an internal edgelist representation (Batagelj and Mrvar, 2007). By following Pajek, tnet can be applied efficiently to large-scale sparse networks.

A directed and weighted one-mode network.

In an effort to stay consistent with existing data formats, this three-column table is also the format used by tnet. The object class of an edgelist in R should be the standard data.frame-class. This class allows the different columns of a table to be of different classes, such as integer and numeric (i.e., real numbers, which take more space than integers). The first two columns of the edgelist are assumed to be integers (i.e., the identification number of the node creating the tie and the identification number of the node receiving the tie, respectively). The third column, which holds the weights attached to the ties, can be integer or numeric. To illustrate this format, the sample network on the right should be represented as follows (if a tie has two arrow heads, there are two directed ties between the nodes; if there is a single number next to a two-arrow-head tie, this is the weight of both ties; if there are two numbers, the weight of each tie is the number closest to its arrow head, i.e., the receiver; the numbers in the square brackets are column/row headings and should not be included when loading data):

      [,1] [,2] [,3]
 [1,]    1    2    2
 [2,]    1    3    2
 [3,]    2    1    4
 [4,]    2    3    4
 [5,]    2    4    1
 [6,]    2    5    2
 [7,]    3    1    2
 [8,]    3    2    4
 [9,]    5    2    2
[10,]    5    6    1
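As a minimal sketch, the same table can also be typed directly into R as a data.frame. The column names i, j, and w are an assumption made here for readability; the integer suffix L keeps the node identifiers as integers while the weights stay numeric.

```r
# Build the sample directed, weighted edgelist by hand.
# Column names i, j, w are illustrative assumptions.
directed.net <- data.frame(
  i = c(1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 5L, 5L),  # node creating the tie
  j = c(2L, 3L, 1L, 3L, 4L, 5L, 1L, 2L, 2L, 6L),  # node receiving the tie
  w = c(2, 2, 4, 4, 1, 2, 2, 4, 2, 1)             # weight of the tie
)

# Inspect the class of the object and of each column
class(directed.net)
sapply(directed.net, class)

# Only established ties are stored, so the object has one row per tie
nrow(directed.net)
```

Only the ten established ties are stored, whereas a matrix for the same six nodes would hold a cell for every possible pair.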

Undirected Networks

All networks are assumed directed in tnet. To represent an undirected network, each tie must be mentioned twice, once in each direction. Currently, only the local clustering coefficient is defined solely for undirected networks. If the sample network above were undirected (symmetrised using the average tie weight), the network should be represented as follows:

      [,1] [,2] [,3]
 [1,]    1    2  3.0
 [2,]    1    3  2.0
 [3,]    2    1  3.0
 [4,]    2    3  4.0
 [5,]    2    4  0.5
 [6,]    2    5  2.0
 [7,]    3    1  2.0
 [8,]    3    2  4.0
 [9,]    4    2  0.5
[10,]    5    2  2.0
[11,]    5    6  0.5
[12,]    6    5  0.5
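The averaging step can be reproduced in base R. The following is only a sketch of the arithmetic, not the way tnet itself symmetrises a network; it assumes that each directed tie appears at most once in the edgelist and that a missing reverse tie counts as weight 0. The column names i, j, and w are again illustrative.

```r
# The directed sample network (column names are illustrative assumptions)
directed.net <- data.frame(
  i = c(1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 5L, 5L),
  j = c(2L, 3L, 1L, 3L, 4L, 5L, 1L, 2L, 2L, 6L),
  w = c(2, 2, 4, 4, 1, 2, 2, 4, 2, 1)
)

# Stack every tie together with its reverse (sender and receiver swapped)
both <- rbind(
  directed.net,
  data.frame(i = directed.net$j, j = directed.net$i, w = directed.net$w)
)

# For each ordered pair, sum the weights of the two directions and halve the
# sum; a direction that is absent simply contributes nothing to the sum
undirected.net <- aggregate(w ~ i + j, data = both, FUN = function(x) sum(x) / 2)

# Sort by sender and then receiver to match the table above
undirected.net <- undirected.net[order(undirected.net$i, undirected.net$j), ]
rownames(undirected.net) <- NULL
undirected.net
```

For example, the ties 1 to 2 (weight 2) and 2 to 1 (weight 4) both become 3, while the one-directional tie 2 to 4 (weight 1) becomes 0.5 in each direction.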

Loading Your Network

The most common way of loading a network is to read a text file with the network. The read.table-function is the standard method for reading text files. This function works by giving it a filename or link, and a character for separating the values into columns (e.g., a tab). It is important not just to read the file, but also to assign the result to an object. To illustrate this procedure, the directed and undirected networks can be loaded into the objects directed.net and undirected.net using these commands (note that these files are on the web, hence the link instead of a filename).

# Read the directed network
directed.net <- read.table("http://opsahl.co.uk/tnet/datasets/one-mode-directed-network.txt", sep="\t")

# Read the undirected network
undirected.net <- read.table("http://opsahl.co.uk/tnet/datasets/one-mode-undirected-network.txt", sep="\t")
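To see what read.table expects without relying on a network connection, the sketch below reads the first three ties of the sample network from an inline string through the text argument. The string is an assumption about how a tab-separated edgelist file would look, not a copy of the files linked above.

```r
# Three tab-separated columns per line: sender, receiver, weight
raw <- "1\t2\t2\n1\t3\t2\n2\t1\t4"

# Read the text exactly as a file would be read, and assign it to an object
net <- read.table(text = raw, sep = "\t")

# Check the dimensions and the class of each column
dim(net)
sapply(net, class)
```

Without a header line, read.table names the columns V1, V2, and V3.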

Ensuring the Network Conforms to the tnet Standard

To ensure that the network conforms to the tnet standard, the as.tnet-function can be used. The other tnet functions run it automatically if it has not been run on the network manually. This function takes two parameters: the network and a character string specifying the type of network. If the type parameter is not set, an object will be assumed to be a one-mode edgelist if it has three columns or if it is a square matrix with more than 4 nodes. Below is the code for testing the directed network above.

# Load tnet
library(tnet)

# Read the directed network
directed.net <- read.table("http://opsahl.co.uk/tnet/datasets/one-mode-directed-network.txt", sep="\t")

# Check that it conforms to the tnet standard for weighted one-mode networks
directed.net <- as.tnet(directed.net, type="weighted one-mode tnet")
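Before handing an object to as.tnet, it can help to inspect it with plain R. The helper below, check_edgelist, is hypothetical and not part of tnet; it only sketches the basic expectations described on this page (three columns, numeric or integer classes, whole-number node identifiers, no missing values) and is not a description of what as.tnet checks internally.

```r
# Hypothetical helper (not part of tnet): basic sanity checks on an edgelist
check_edgelist <- function(net) {
  # The object should be a data.frame or a matrix
  stopifnot(is.data.frame(net) || is.matrix(net))
  # Sender, receiver, and weight columns
  stopifnot(ncol(net) == 3)
  # All three columns should be integer or numeric (not factor or character)
  stopifnot(is.numeric(net[, 1]), is.numeric(net[, 2]), is.numeric(net[, 3]))
  # Missing values would break later comparisons
  stopifnot(!any(is.na(net)))
  # Node identifiers should be whole numbers
  stopifnot(all(net[, 1] == round(net[, 1])), all(net[, 2] == round(net[, 2])))
  invisible(TRUE)
}

# A small well-formed edgelist passes silently
good <- data.frame(i = c(1L, 2L), j = c(2L, 1L), w = c(2, 4))
check_edgelist(good)

# Letter-based node identifiers are rejected with an error
bad <- data.frame(i = c("A", "LA"), j = c("LA", "A"), w = c(2, 4))
result <- try(check_edgelist(bad), silent = TRUE)
inherits(result, "try-error")
```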

There are a number of functions that help users to convert data in other formats into the weighted edgelist format. For example, if a dataset is undirected, but there is only one entry for each tie in the edgelist, the symmetrise_w-function adds a second entry for the tie with the identification numbers of the creator and target nodes reversed. Moreover, if a dataset is similar to an edgelist, but with only two columns (representing the identification numbers of the creator and target nodes) and multiple entries of the same tie refer to the weight of that tie (e.g., if a tie has a weight of 3, it is included three times), then the shrink_to_weighted_network-function converts the edgelist into the correct format. To allow for a comparison between weighted and binary network measures, the dichotomise_w-function creates a binary network from a weighted one. It does so by removing the ties in a weighted edgelist that fall below a certain cut-off and setting the weight to 1 for the remaining ones.
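To make the last two conversions concrete, the following base R sketch mimics them on a toy dataset. It illustrates the idea only and is not the code of shrink_to_weighted_network or dichotomise_w; the data, the column names, and the choice to keep only ties strictly above the cut-off are assumptions.

```r
# Two-column edgelist in which repeated rows encode the weight:
# 1 -> 2 appears three times, 2 -> 3 twice, 3 -> 1 once
dup <- data.frame(i = c(1L, 1L, 1L, 2L, 2L, 3L),
                  j = c(2L, 2L, 2L, 3L, 3L, 1L))

# Shrink: count the repeats of each (i, j) pair to obtain the weight
dup$w <- 1
weighted <- aggregate(w ~ i + j, data = dup, FUN = sum)
weighted <- weighted[order(weighted$i, weighted$j), ]
rownames(weighted) <- NULL
weighted

# Dichotomise: keep ties with a weight above the cut-off (assumed strict)
# and set the weight of the remaining ties to 1
cutoff <- 1
binary <- weighted[weighted$w > cutoff, ]
binary$w <- 1
binary
```

With a cut-off of 1, the tie from node 3 to node 1 is dropped and the two remaining ties both receive a weight of 1.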

References

Batagelj, V., Mrvar, A., 2007. Pajek: Program for Large Network Analysis: version 1.20. http://pajek.imfm.si/.

Borgatti, S.P., Everett, M.G., Freeman, L.C., 2002. UCINET for Windows: Software for Social Network Analysis. Analytic Technologies, Harvard, USA.

Butts, C. T., 2006. sna-package: Package for Social Network Analysis. R package version 1.4.

Opsahl, T., 2009. Structure and Evolution of Weighted Networks. University of London (Queen Mary College), London, UK, pp. 104-122. Available at http://toreopsahl.com/publications/thesis/.

If you use tnet, please cite: Opsahl, T., 2009. Structure and Evolution of Weighted Networks. University of London (Queen Mary College), London, UK, pp. 104-122. Available at http://toreopsahl.com/publications/thesis/

9 Comments

  • 1. David Fisher  |  November 15, 2012 at 3:33 pm

    Hi, love the blog. Unfortunately, when trying to shrink my 2-column edgelist (with duplicate interactions) into a weighted edgelist using the “shrink_to_weighted_network-function”, I get the following error from R: Error in Ops.factor(net[, "i"], net[, "j"]) :
    level sets of factors are different

    Any ideas as to why this might be? Apologies if this query is in the wrong place.

    David

    • 2. Tore Opsahl  |  November 15, 2012 at 3:53 pm

      Hi David,

      Thanks for your comment. Are you using integer values as node ids? If you write class(net[,"i"]) or class(net[,"j"]), the output should be integer or numeric. If you have more issues, send me an email with your code and data, and then I will have a look at it.

      Best,
      Tore

  • 3. David Fisher  |  November 15, 2012 at 4:00 pm

    The node IDs are all unique but are a mixture of numbers and letters (e.g. “LA”, “A”, “S1”, “U2”), could this be causing the problem?

    Entering “class(binters08[,"i"])” gives me this error: “Error in `[.data.frame`(binters08, , “i”) : undefined columns selected”

    Cheers

    David

  • 4. Carol Xu  |  February 28, 2013 at 5:12 am

    Hi,

    Don’t know if this issue ever got resolved but I’m having the same problem and some of my node IDs are also a mixture of numbers and letters separated by underscores, i.e. “GOOD_3” and “MISS_2”. Is there a way to work around this or would I have to change the names?

    Thanks,
    Carol

  • 6. xulace  |  March 1, 2013 at 7:30 pm

    Thank you!

  • 7. Timo Eckhardt  |  January 10, 2014 at 9:49 am

    Hi Tore,
    your blog is indeed a rich and useful resource – thanks a lot for all your work!

    I have a directed, weighted network and created an edge-list according to the instructions you gave above (textfile with three columns and tabs separating the numbers). As far as I can judge, I successfully loaded the network into tnet. However, when I try to check whether the network conforms to the tnet standard using the as.tnet-function, tnet gives me the following error:

    “Error in if (sum(net[, 1] net[, 2]) == :
    missing value where TRUE/FALSE needed”

    Can you help me?

    Best,
    Timo

    • 8. Tore Opsahl  |  January 10, 2014 at 2:37 pm

      Hi Timo,

      Please check the data classes of the object and columns by typing (assuming your network is loaded as net):
      class(net)
      class(net[,1])
      class(net[,2])
      class(net[,3])

      The first answer should be either data.frame or matrix, and the others should be either numeric or integer.

      If you have more problems, send me an email with the code and data you are using.

      Best,
      Tore

      • 9. Timo Eckhardt  |  January 15, 2014 at 8:50 am

        Hi Tore,
        now it works perfectly! Thanks for your help!
        Timo
