One-mode Data Structure

Since most networks are sparse (i.e., the number of ties is much lower than the squared number of nodes), I opted for an edgelist format instead of a matrix one. A binary edgelist consists of two columns that represent the pairs of nodes that are tied together in a network (e.g., the edgelist1-format in UCINET’s dl files; Borgatti et al., 2002). When a directed network is represented, the first column represents the nodes that create the ties, whereas the second column represents the target nodes. This type of list has been extended to cover weighted networks by adding a third column representing the weight of the ties. While the matrix format records the weight of all possible ties (a non-established tie would get a weight of 0), this format records the sender, receiver, and weight of established ties only. The main advantage of this format is that it scales to networks with many nodes, as it is the number of ties, not nodes, that determines the size of the data object. Although many programmes can read edgelists, most network analysis programmes rely on an internal matrix representation, e.g., UCINET and the network-package in R (Butts, 2006). Conversely, Pajek, which was designed to analyse large-scale sparse networks, specifically uses an internal edgelist representation (Batagelj and Mrvar, 2007). By following Pajek, tnet can efficiently be applied to large-scale sparse networks.
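
To make the difference between the two formats concrete, here is a minimal base-R sketch that converts a small weight matrix into a three-column edgelist. The matrix m and the column names i, j, and w are hypothetical and chosen for illustration only; this is not how tnet itself performs the conversion.

```r
# Hypothetical 4-node weight matrix; entry [i, j] is the weight of the tie i -> j
m <- matrix(c(0, 2, 0, 0,
              4, 0, 0, 1,
              0, 0, 0, 0,
              0, 0, 3, 0), nrow = 4, byrow = TRUE)

# Row/column positions of the established ties (non-zero cells)
idx <- which(m != 0, arr.ind = TRUE)

# Three-column edgelist: sender, receiver, weight
edgelist <- data.frame(i = idx[, "row"], j = idx[, "col"], w = m[idx])
edgelist <- edgelist[order(edgelist$i, edgelist$j), ]
rownames(edgelist) <- NULL

edgelist
```

The matrix stores all 16 cells, whereas the edgelist only stores the 4 established ties, which is why the edgelist grows with the number of ties rather than the number of nodes.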

A directed and weighted one-mode network.

In an effort to stay consistent with existing data formats, this three-column table is also the format used by tnet. The object class of an edgelist in R should be the standard data.frame-class. This class allows the different columns of a table to be of different classes, such as integer and numeric (i.e., real numbers, which take more space than integers). The first two columns of the edgelist are assumed to be integers (i.e., the identification number of the node creating the tie and the identification number of the node receiving the tie, respectively). The third column, which holds the weights attached to the ties, can be integer or numeric. To illustrate this format, the sample network in the figure should be represented as shown below. In the figure, a tie with two arrow heads stands for two directed ties between the nodes. If there is a single number next to a two-arrow-head tie, this is the weight of both ties; if there are two numbers, each weight is placed close to the arrow head (receiver) of the tie it belongs to. The numbers in square brackets below are column/row headings and should not be included when loading data:

      [,1] [,2] [,3]
 [1,]    1    2    2
 [2,]    1    3    2
 [3,]    2    1    4
 [4,]    2    3    4
 [5,]    2    4    1
 [6,]    2    5    2
 [7,]    3    1    2
 [8,]    3    2    4
 [9,]    5    2    2
[10,]    5    6    1
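
Because the first two columns are assumed to be integers, datasets that identify nodes by text labels need to be recoded before use. The following is a minimal base-R sketch of one way to do this; the data frame labelled, its column names, and the lookup vector labels are hypothetical, and this is an assumption about a reasonable workflow rather than a tnet function.

```r
# Hypothetical edgelist whose node ids are text labels
labelled <- data.frame(
  from = c("LA", "A", "S1", "LA"),
  to   = c("A", "S1", "U2", "U2"),
  w    = c(2, 1, 4, 3),
  stringsAsFactors = FALSE
)

# Lookup vector of all distinct labels; a label's position is its integer id
labels <- unique(c(labelled$from, labelled$to))

# Recode both node columns to integer ids
net <- data.frame(
  i = match(labelled$from, labels),
  j = match(labelled$to, labels),
  w = labelled$w
)

# The original labels can be recovered with labels[net$i] and labels[net$j]
net
```

Keeping the lookup vector around makes it possible to translate node-level results back to the original labels afterwards.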

Undirected Networks

All networks are assumed to be directed in tnet. To represent an undirected network, each tie must be listed twice, once in each direction. Currently, the local clustering coefficient is the only measure defined solely for undirected networks. If the sample network above were undirected (symmetrised using the average tie weight), the network should be represented as follows:

      [,1] [,2] [,3]
 [1,]    1    2  3.0
 [2,]    1    3  2.0
 [3,]    2    1  3.0
 [4,]    2    3  4.0
 [5,]    2    4  0.5
 [6,]    2    5  2.0
 [7,]    3    1  2.0
 [8,]    3    2  4.0
 [9,]    4    2  0.5
[10,]    5    2  2.0
[11,]    5    6  0.5
[12,]    6    5  0.5
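
The averaging behind this table can be reproduced with a few lines of base R. This is a sketch for illustration only, using the directed sample edgelist from above with hypothetical column names i, j, and w; it treats a missing reverse tie as having a weight of 0, which is what the values of 0.5 in the table imply. It is not the implementation of tnet's symmetrise_w-function.

```r
# Directed sample edgelist (sender, receiver, weight)
directed <- data.frame(
  i = c(1, 1, 2, 2, 2, 2, 3, 3, 5, 5),
  j = c(2, 3, 1, 3, 4, 5, 1, 2, 2, 6),
  w = c(2, 2, 4, 4, 1, 2, 2, 4, 2, 1)
)

# Stack every tie together with its reverse
both <- rbind(directed,
              data.frame(i = directed$j, j = directed$i, w = directed$w))

# Sum the two directions per ordered pair and halve to get the average;
# a missing reverse tie contributes nothing to the sum (weight 0)
undirected <- aggregate(w ~ i + j, data = both, FUN = sum)
undirected$w <- undirected$w / 2

# Sort by sender, then receiver
undirected <- undirected[order(undirected$i, undirected$j), ]
rownames(undirected) <- NULL

undirected
```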

Loading Your Network

The most common way of loading a network is to read a text file containing the network. The read.table-function is the standard method for reading text files. It takes a filename or link, and a character that separates the values into columns (e.g., a tab). It is important to not just read the file, but also to assign the result to an object. To illustrate this procedure, the directed and undirected networks can be loaded into the objects directed.net and undirected.net using these commands (note that these files are on the web, hence the link instead of a filename).

# Read the directed network
directed.net <- read.table("http://opsahl.co.uk/tnet/datasets/one-mode-directed-network.txt", sep="\t")

# Read the undirected network
undirected.net <- read.table("http://opsahl.co.uk/tnet/datasets/one-mode-undirected-network.txt", sep="\t")
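
After loading, it can be worth checking that the columns were read as integer or numeric, since text labels or stray characters would be read as character or factor instead. The sketch below is self-contained: it reads the first three ties of the sample network from an inline string through the text argument of read.table instead of from a file, and the object names raw and net are hypothetical.

```r
# Tab-separated contents that would normally sit in a text file
raw <- "1\t2\t2\n1\t3\t2\n2\t1\t4"

# Read the inline text exactly as a file would be read
net <- read.table(text = raw, sep = "\t")

# Class of the object and of each column
class(net)
sapply(net, class)
```

If any column is reported as character or factor, the node ids or weights need to be recoded before the network is used.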

Ensure that the network conforms to the tnet standard

To ensure that the network conforms to the tnet standard, the as.tnet-function can be used. If it has not been run on the network manually, it is run automatically by the other tnet functions. This function takes two parameters: the network and a character string specifying the type of network. If the type parameter is not set, an object will be assumed to be a one-mode edgelist if it has three columns or if it is a square matrix with more than 4 nodes. Below is the code for testing the directed network above.

# Load tnet
library(tnet)

# Read the directed network
directed.net <- read.table("http://opsahl.co.uk/tnet/datasets/one-mode-directed-network.txt", sep="\t")

# Check that it conforms to the tnet standard for weighted one-mode networks
directed.net <- as.tnet(directed.net, type="weighted one-mode tnet")

There are a number of functions that help users to convert data in other formats into the weighted edgelist format. For example, if a dataset is undirected, but there is only one entry for each tie in the edgelist, the symmetrise_w-function adds a second entry of the edge with the identification numbers of the creator and target nodes reversed. Moreover, if a dataset is similar to an edgelist, but with only two columns (representing the identification numbers of the creator and target nodes) and multiple entries of the same tie refer to the weight of that tie (e.g., if a tie has a weight of 3, it is included three times), then the shrink_to_weighted_network-function allows the users to convert the edgelist into the correct format. To allow for a comparison between weighted and binary network measures, the dichotomise_w-function creates a binary network from a weighted one. It does so by removing the ties in a weighted edgelist that fall below a certain cut-off and sets the weight to 1 for the remaining ones.
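
To show what two of these conversions amount to, here is a base-R sketch of the underlying logic. It does not call or reproduce tnet's shrink_to_weighted_network- or dichotomise_w-functions; the objects dup, weighted, binary, and cutoff are hypothetical, and keeping only ties with a weight strictly greater than the cut-off is an assumption made for this example.

```r
# Hypothetical two-column edgelist where repeated rows encode the weight
dup <- data.frame(i = c(1, 1, 1, 2, 2), j = c(2, 2, 2, 3, 3))

# Count how often each (sender, receiver) pair occurs to get the weight
dup$w <- 1
weighted <- aggregate(w ~ i + j, data = dup, FUN = sum)
weighted <- weighted[order(weighted$i, weighted$j), ]
rownames(weighted) <- NULL

# Dichotomise: keep ties above the cut-off and set their weight to 1
cutoff <- 2
binary <- weighted[weighted$w > cutoff, ]
binary$w <- 1

weighted
binary
```

Here the tie from node 1 to node 2 ends up with a weight of 3 and the tie from node 2 to node 3 with a weight of 2, so only the former survives a cut-off of 2.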

References

Batagelj, V., Mrvar, A., 2007. Pajek: Program for Large Network Analysis: version 1.20. http://pajek.imfm.si/.

Borgatti, S.P., Everett, M.G., Freeman, L.C., 2002. UCINET for Windows: Software for Social Network Analysis. Analytic Technologies, Harvard, USA.

Butts, C. T., 2006. sna-package: Package for Social Network Analysis. R package version 1.4.

Opsahl, T., 2009. Structure and Evolution of Weighted Networks. University of London (Queen Mary College), London, UK, pp. 104-122. Available at https://toreopsahl.com/publications/thesis/.

If you use tnet, please cite: Opsahl, T., 2009. Structure and Evolution of Weighted Networks. University of London (Queen Mary College), London, UK, pp. 104-122. Available at https://toreopsahl.com/publications/thesis/

19 Comments

  • 1. David Fisher  |  November 15, 2012 at 3:33 pm

    Hi, love the blog. Unfortunately, when trying to shrink my 2-column edgelist (with duplicate interactions) into a weighted edgelist using the shrink_to_weighted_network-function, I get the following error from R:
    Error in Ops.factor(net[, "i"], net[, "j"]) :
    level sets of factors are different

    Any ideas as to why this might be? Apologies if this query is in the wrong place.

    David

    • 2. Tore Opsahl  |  November 15, 2012 at 3:53 pm

      Hi David,

      Thanks for your comment. Are you using integer values as node ids? If you write class(net[,"i"]) or class(net[,"j"]), the output should be integer or numeric. If you have more issues, send me an email with your code and data, and then I will have a look at it.

      Best,
      Tore

  • 3. David Fisher  |  November 15, 2012 at 4:00 pm

    The node IDs are all unique but are a mixture of numbers and letters (e.g. "LA", "A", "S1", "U2"). Could this be causing the problem?

    Entering class(binters08[,"i"]) gives me this error: Error in `[.data.frame`(binters08, , "i") : undefined columns selected

    Cheers

    David

  • 4. Carol Xu  |  February 28, 2013 at 5:12 am

    Hi,

    Don’t know if this issue ever got resolved but I’m having the same problem and some of my node IDs are also a mixture of numbers and letters separated by underscores, i.e. “GOOD_3” and “MISS_2”. Is there a way to work around this or would I have to change the names?

    Thanks,
    Carol

  • 6. xulace  |  March 1, 2013 at 7:30 pm

    Thank you!

  • 7. Timo Eckhardt  |  January 10, 2014 at 9:49 am

    Hi Tore,
    your blog is indeed a rich and useful resource – thanks a lot for all your work!

    I have a directed, weighted network and created an edge-list according to the instructions you gave above (textfile with three columns and tabs separating the numbers). As far as I can judge, I successfully loaded the network into tnet. However, when I try to check whether the network conforms to the tnet standard using the as.tnet-function, tnet gives me the following error:

    “Error in if (sum(net[, 1] net[, 2]) == :
    missing value where TRUE/FALSE needed”

    Can you help me?

    Best,
    Timo

    • 8. Tore Opsahl  |  January 10, 2014 at 2:37 pm

      Hi Timo,

      Please check the data classes of the object and columns by typing (assuming your network is loaded as net):
      class(net)
      class(net[,1])
      class(net[,2])
      class(net[,3])

      The first answer should be either data.frame or matrix, and the others should be either numeric or integer.

      If you have more problems, send me an email with the code and data you are using.

      Best,
      Tore

      • 9. Timo Eckhardt  |  January 15, 2014 at 8:50 am

        Hi Tore,
        now it works perfectly! Thanks for your help!
        Timo

  • 10. Hermann Norpois  |  November 1, 2014 at 2:28 pm

    Hello,
    I have an undirected igraph object g and I want to measure the betweenness of g.
    I thought the following code should work to do so:

    # requires library (tnet) and library (igraph)
    betweenness.tnet <- function(g) {  # g is an igraph object
      # Get the weighted edgelist of the igraph object
      net <- cbind(get.edgelist(g, names=FALSE), E(g)$weight)
      # Symmetrise net
      net <- symmetrise_w(net)
      # Measure betweenness of net
      net <- betweenness_w(net)

      return(net)
    }

    But I still get an error message.

    In as.tnet(net, type = "weighted one-mode tnet") :
    The network might be undirected. If this is the case, each tie should be mention twice. The symmetrise-function can be used to include reverse version of each tie.

    Can you please give some hints how to solve the problem.
    Thanks
    Hermann

    • 11. Tore Opsahl  |  November 1, 2014 at 4:46 pm

      Hi Hermann,

      That message is not an error message, but a warning message. To get rid of it, you can manually duplicate the ties or use suppressWarnings. For example:

      # Load igraph and tnet
      library(tnet)
      
      # Create an igraph network (undirected)
      g <- igraph::erdos.renyi.game(n=10, p.or.m=0.25, type="gnp", directed=FALSE)
      # Add weights
      E(g)$weight <- sample.int(n=4, size=ecount(g), replace=TRUE)
      
      # Convert to tnet (and avoiding warning due to it being undirected)
      net <- cbind(get.edgelist(g, names=FALSE), E(g)$weight)
      net <- rbind(net, cbind(net[,c(2,1,3)]))
      
      # You can also do the following instead of the rbind line above
      #net <- suppressWarnings(symmetrise_w(net))

      # Compute weighted betweenness
      out <- betweenness_w(net)
      

      Best,
      Tore

  • 12. Julius Mayer  |  January 9, 2015 at 2:35 pm

    Hi Tore,

    I am currently constructing a weighted, directed one-mode network (adjacency matrix) where I use the delta of the weight between two nodes, which makes me end up with negative values. For example, if I have Node A > B with value 10 and B > A with value 2, then I ultimately get value 8 for the A > B connection and -8 for the B > A connection. My question is whether tnet can cope with those negative values, since other packages cannot.

    • 13. Tore Opsahl  |  January 10, 2015 at 3:51 pm

      Hi Julius,

      Unfortunately, none of the functions implemented in tnet are designed for negative weights. What metrics would you like to use?

      Tore

      • 14. Julius Mayer  |  January 10, 2015 at 5:55 pm

        I am trying to compute a network for hierarchies between regions. It is somewhat similar to Taylor's Global City network, where he computes the influence of cities from the subnodal company level, if you are familiar with that. I am not using a 2-mode network though, but a 1-mode adjacency matrix with regions x regions looking like this:

               1     2     3     4
        1      0     0     0     0
        2    118  4353   275   717
        3     14   315   968   161
        4      0   213   161  1064

        And after building the delta between 2 regions it looks like this:

               1     2     3     4
        1      0  -118   -14     0
        2    118     0   -40   504
        3     14    40     0     0
        4      0  -504     0     0

        I do this to illustrate the power of one region over another. This also gets rid of the problem that regions often have a huge power over themselves. However, I have not found a solution to the problem of negative weights.

        You probably also need to know that I am comparing different points in time and that the size of the network changes over time. However, I do not know how to normalize it because the weights would be messed up that way.

  • 15. Julius Mayer  |  January 10, 2015 at 7:34 pm

    Sorry, the matrix got out of place a little. I hope it's still readable.

  • 16. Tore Opsahl  |  January 10, 2015 at 7:42 pm

    Your best bet is probably to reach out to the people behind the method that you want to apply and ask whether they have a piece of code that does that analysis.

    Good luck,
    Tore

  • 17. Claudio de la O  |  October 20, 2016 at 11:08 pm

    Dear Tore,

    How should I enter my data for an undirected edgelist if I am simplifying from a directed one?
    For example:
    A B 4
    B A 2

    So, actually I’ve got 6 interactions between A and B. So the undirected edgelist must be:
    A B 6
    B A 6
    or
    A B 3
    B A 3

    Does this affect the network parameters?

    Kind regards.

    • 18. Tore Opsahl  |  October 26, 2016 at 5:05 pm

      Hi Claudio,

      First of all, I would recommend that you keep the directed nature of the network. Most network metrics are defined for directed networks and these are implemented in tnet.

      Second, the method used to symmetrize the tie weights matters for most metrics (unless normalized…). See the symmetrise_w-function for more details.

      More generally, an undirected tie in tnet is represented by two directed ties, one in each direction. For example, an undirected tie between node 1 and node 2 with a weight of 5 is represented as:
      1 2 5
      2 1 5

      If you do not have the second tie, you can use the symmetrise_w-function to create them.

      Good luck!
      Tore

      • 19. Claudio  |  December 2, 2016 at 12:55 am

        Thank you so much. I asked this because, for the moment, I am not interested in discriminating the direction of the interaction between A and B, but rather in analyzing how often they interact. Anyway, your answer was really informative. Thanks again!
