Datasets

For the convenience of tnet users, I have collected a number of weighted network datasets that were available on the Internet, and made them conform to the required standard. Please cite the source if you use any of the datasets.

If there are any mistakes or conflicts of interest, please contact me.

Weighted networks (the tnet and UCINET columns give the file size in each format)

# | Name | Directed | Nodes | Ties | tnet | UCINET
1 | Online Social Network | yes | 1899 | 20296 | 226kb | 226kb
2 | Freeman’s EIES network (time 1) | yes | 48 | 695 | 6kb | 6kb
3 | Freeman’s EIES network (time 2) | yes | 48 | 830 | 8kb | 8kb
4 | Freeman’s EIES network (messages) | yes | 32 | 460 | 5kb | 5kb
5 | Consulting (advice) | yes | 46 | 879 | 9kb | 9kb
6 | Consulting (value) | yes | 46 | 858 | 9kb | 9kb
7 | Research team (advice) | yes | 77 | 2228 | 22kb | 22kb
8 | Research team (awareness) | yes | 77 | 2326 | 23kb | 23kb
9 | C.elegans’ neural network | yes | 306 | 2345 | 22kb | 22kb
10 | US airport network | no | 500 | 2980 | 79kb | 79kb
11 | Newman’s scientific collaboration network | no | 16726 | 47594 | 1.98mb | 1.98mb
12 | Davis southern club women | no | 18 | 278 | 7kb | 7kb

How to load datasets

tnet

To use tnet, you first need to download and install R (information from R’s website) and then download and install tnet within R (information from tnet’s website). You only need to do these steps once. Every time you start R, however, you need to load tnet with the following command:

library(tnet)

A dataset can be loaded by writing a command similar to:

net <- read.table("<link to dataset>")

where <link to dataset> is the link to the dataset in the above table. For example, Freeman’s third EIES network can be loaded with the following command:

net <- read.table("http://opsahl.co.uk/tnet/datasets/Freemans_EIES-3_n32.txt")
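Once a dataset is loaded, it can help to check that it was read as expected. The following is a minimal sketch in base R, assuming the file is a plain-text edgelist with three columns (sender, receiver, weight). The column names i, j and w and the inline sample rows are illustrative assumptions, used here so that the example runs without an internet connection.

```r
# Illustrative rows standing in for a downloaded dataset file (not real data)
sample_file <- "1 2 4\n2 1 2\n1 3 1\n3 2 5"

# read.table() accepts a file path, a URL, or inline text via the text argument
net <- read.table(text = sample_file)

# Assumed column meaning: sender id, receiver id, tie weight
names(net) <- c("i", "j", "w")

head(net)  # first rows of the edgelist
str(net)   # column types; all three should be numeric (integer or double)
```

To inspect a real dataset, replace the text argument with the link to the dataset, as in the command above.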

UCINET

To use UCINET, you need to download and install UCINET (information on UCINET’s website). This programme is not free, but there is a 30-day trial period.

To load a dataset, you must download and save the dl-file of the dataset you wish to study from the above table to your computer. The network can be imported into UCINET by using the DL import function. You can find this function through the menu: Data > Import > DL. When the function’s dialog box opens, you must select the downloaded file containing the dataset by clicking on “…” after “Input text file in DL format”. The second box can be set to default, but do remember, and change if you wish, the name that appears in the third box as this will be the name of the internal UCINET file.

Network 1: Online Social Network

The first network is the Online Social Network dataset used in my Ph.D. thesis. This network has also been described in Patterns and Dynamics of Users’ Behaviour and Interaction: Network Analysis of an Online Community, and used in Prominence and control: The weighted rich-club effect and in Clustering in weighted networks. The network originates from an online communication network among students at the University of California, Irvine. The edgelist includes the 1,899 users that sent or received at least one message during the observation period. In total, 59,835 online messages were sent among these users over 20,296 directed ties.
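The three figures quoted above (users, ties and messages) can be recomputed from an edgelist. This is a minimal sketch on an invented toy edgelist, assuming columns named i (sender), j (receiver) and w (number of messages); these names are assumptions, not part of the dataset.

```r
# Toy directed, weighted edgelist (values invented for illustration)
net <- data.frame(i = c(1, 1, 2, 3), j = c(2, 3, 1, 1), w = c(5, 1, 2, 3))

n_nodes    <- length(unique(c(net$i, net$j)))  # users who sent or received a message
n_ties     <- nrow(net)                        # directed ties (one row per tie)
n_messages <- sum(net$w)                       # total messages across all ties

c(nodes = n_nodes, ties = n_ties, messages = n_messages)
```

If the Online Social Network file follows the assumed layout, the same three quantities should correspond to the 1,899 users, 20,296 ties and 59,835 messages mentioned above.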

Citation: Opsahl, T., Panzarasa, P., 2009. Clustering in weighted networks. Social Networks 31 (2), 155-163, doi: 10.1016/j.socnet.2009.02.002

Network 2-4: Freeman’s EIES network

The second dataset is Freeman’s EIES networks (Freeman and Freeman, 1979), also used in Wasserman and Faust (1994). This dataset was collected in 1978 and contains three networks of researchers working on social network analysis. The first network contains the personal relationships among 48 of the researchers at the beginning of the study (time 1). The second network contains the personal relationships at the end of the study (time 2). In these two networks, all ties have a weight between 0 and 4: 4 represents a close personal friend of the researcher’s; 3 represents a friend; 2 represents a person the researcher has met; 1 represents a person the researcher has heard of, but not met; and 0 represents a person unknown to the researcher. The third network is different. It is a matrix with the number of messages sent among 32 of the researchers that used an electronic communication tool (frequency matrix).

There are three pieces of information about each of the 32 researchers that were part of the third network (nodal attributes): their name, the main disciplinary affiliation (1: sociology; 2: anthropology; 3: mathematics or statistics; and 4: others), and the number of citations each researcher had in the Social Science Citation Index in 1978.
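Numeric attribute codes such as the disciplinary affiliation are easier to read once mapped to labels. The sketch below uses an invented attribute table; the column names id and discipline are assumptions, and the layout of the actual attribute file may differ.

```r
# Hypothetical attribute table for three researchers (values invented)
attrs <- data.frame(id = 1:3, discipline = c(1, 3, 4))

# Map the numeric codes to the labels listed above
attrs$discipline_label <- factor(
  attrs$discipline,
  levels = 1:4,
  labels = c("sociology", "anthropology", "mathematics or statistics", "others")
)

table(attrs$discipline_label)  # number of researchers per discipline
```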

Citation: Freeman, S.C., Freeman, L.C., 1979. The networkers network: A study of the impact of a new communications medium on sociometric structure. Social Science Research Reports 46. University of California, Irvine, CA.

Network 5-8: Intra-organisational networks

This dataset contains four intra-organisational networks. Two are from a consulting company (46 employees) and two are from a research team in a manufacturing company (77 employees). These networks were used by Cross and Parker (2004).

In the first network, the ties are differentiated on a scale from 0 to 5 in terms of frequency of information or advice requests (“Please indicate how often you have turned to this person for information or advice on work-related topics in the past three months”). 0: I Do Not Know This Person; 1: Never; 2: Seldom; 3: Sometimes; 4: Often; and 5: Very Often.

In the second network, ties are differentiated in terms of the value placed on the information or advice received (“For each person in the list below, please show how strongly you agree or disagree with the following statement: In general, this person has expertise in areas that are important in the kind of work I do.”). The weights in this network are also based on a scale from 0 to 5. 0: I Do Not Know This Person; 1: Strongly Disagree; 2: Disagree; 3: Neutral; 4: Agree; and 5: Strongly Agree.

In the third network, the ties among the researchers are differentiated in terms of advice (“Please indicate the extent to which the people listed below provide you with information you use to accomplish your work”). The weights are based on the following scale: 0: I Do Not Know This Person/I Have Never Met this Person; 1: Very Infrequently; 2: Infrequently; 3: Somewhat Infrequently; 4: Somewhat Frequently; 5: Frequently; and 6: Very Frequently.

The fourth network is based on the employees’ awareness of each others’ knowledge and skills (“I understand this person’s knowledge and skills. This does not necessarily mean that I have these skills or am knowledgeable in these domains but that I understand what skills this person has and domains they are knowledgeable in”). The weight scale in this network is: 0: I Do Not Know This Person/I Have Never Met this Person; 1: Strongly Disagree; 2: Disagree; 3: Somewhat Disagree; 4: Somewhat Agree; 5: Agree; and 6: Strongly Agree.

In addition to the relational data, the dataset also contains information about the people (nodal attributes). The following attributes are known for the consultancy firm: the organisational level (1: Research Assistant; 2: Junior Consultant; 3: Senior Consultant; 4: Managing Consultant; 5: Partner), gender (1: male; 2: female), region (1: Europe; 2: USA), and location (1: Boston; 2: London; 3: Paris; 4: Rome; 5: Madrid; 6: Oslo; 7: Copenhagen).

For the researchers in the manufacturing company, the following attributes are known: location (1: Paris; 2: Frankfurt; 3: Warsaw; 4: Geneva), tenure (1: 1-12 months; 2: 13-36 months; 3: 37-60 months; 4: 61+ months) and the organisational level (1: Global Dept Manager; 2: Local Dept Manager; 3: Project Leader; 4: Researcher).

Citation: Cross, R., Parker, A., 2004. The Hidden Power of Social Networks. Harvard Business School Press, Boston, MA.

Network 9: The Caenorhabditis elegans worm’s neural network

This dataset contains the neural network of the Caenorhabditis elegans worm (C.elegans). It was studied by Watts and Strogatz (1998). The network contains 306 nodes that represent neurons. Two neurons are connected if at least one synapse or gap junction exists between them. The weight is the number of synapses and gap junctions. This network was obtained from the Collective Dynamics Group’s website. Note: This network contained 14 duplicated ties (i.e., a tie was mentioned twice in the edgelist). In the files available here, the duplicated tie pairs are merged, and the weight is the sum of the two identical ties.
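The merging of duplicated ties described in the note can be sketched in base R as follows. This is an illustration on invented data, not the procedure actually used to prepare the files; the column names i, j and w are assumptions.

```r
# Toy edgelist in which the tie 1 -> 2 is listed twice (values invented)
raw <- data.frame(i = c(1, 1, 2, 3), j = c(2, 2, 3, 1), w = c(3, 2, 1, 4))

# Sum the weights of rows that share the same sender and receiver
merged <- aggregate(w ~ i + j, data = raw, FUN = sum)

# Sort by sender, then receiver, for readability
merged <- merged[order(merged$i, merged$j), ]
merged
```

The two rows for the tie from node 1 to node 2 collapse into a single row whose weight is their sum, while the total weight in the network is unchanged.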

Citation: Watts, D. J., Strogatz, S. H., 1998. Collective dynamics of “small-world” networks. Nature 393, 440-442.

Network 10: The network of the 500 busiest commercial airports in the United States

The nodes in this network are the 500 busiest commercial airports in the United States. A tie exists between two airports if a flight was scheduled between them in 2002. The weights correspond to the number of seats available on the scheduled flights. Even though this type of network is directed by nature, as a flight is scheduled from one airport to another, such networks are highly symmetric (Barrat et al., 2004). Therefore, the version of this network available here is undirected (i.e., the weight of the tie from one airport towards another is equal to the weight of the reciprocal tie). This network was obtained from the Complex Networks Collaboratory’s website.
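Whether an edgelist is undirected in this sense can be checked by comparing every tie with its reciprocal tie. The sketch below assumes that each undirected tie is listed in both directions and that the columns are named i, j and w; both are assumptions about the file layout, and the data are invented.

```r
# Toy edgelist listing each undirected tie in both directions (values invented)
net <- data.frame(i = c(1, 2, 1, 3), j = c(2, 1, 3, 1), w = c(100, 100, 40, 40))

# Look up, for every tie i -> j, the weight of the reverse tie j -> i
reverse <- merge(net, net, by.x = c("i", "j"), by.y = c("j", "i"),
                 suffixes = c("", "_rev"))

# TRUE if every tie has a reverse tie with the same weight
is_symmetric <- nrow(reverse) == nrow(net) && all(reverse$w == reverse$w_rev)
is_symmetric
```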

Citation: Colizza, V., Pastor-Satorras, R., Vespignani, A., 2007. Reaction-diffusion processes and metapopulation models in heterogeneous networks. Nature Physics 3, 276-282.

Network 11: Newman’s scientific collaboration network

This is the co-authorship network of scientists based on preprints posted to the Condensed Matter section of the arXiv E-Print Archive between 1995 and 1999.

[Figure: Two-mode to one-mode weighted projection]

This network can be classified as a two-mode or affiliation network since there are two types of “nodes” (authors and papers) and connections exist only between different types of nodes. An author (e.g., blue circle in the diagram) is connected to a paper (e.g., red circle) if her or his name appeared on it. The two-mode structure of the network is available in tnet two-mode format (659kb).

Few network measures exist for two-mode networks, and therefore, these networks are often projected onto a one-mode (only one type of nodes) network by selecting one of the types of nodes and linking two nodes if they were connected to the same node (of the other kind). This process is exemplified in the lower part of the diagram. The binary one-mode structure of the network is available in tnet format (1.21mb).

Traditionally, the ties in projected one-mode networks do not have weights. Recent empirical studies of two-mode networks have created weighted networks by defining the weights as the number of co-occurrences (e.g., the number of papers that two authors had collaborated on). The co-occurrence one-mode structure of the network is available in tnet format (1.21mb).
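Both the binary and the co-occurrence projection can be obtained from an author-by-paper incidence matrix. The following is a minimal base R sketch on an invented two-mode edgelist; the column names author and paper are assumptions, and this is not necessarily how the files on this page were produced.

```r
# Invented two-mode edgelist: one row per (author, paper) pair
two_mode <- data.frame(
  author = c("A", "B", "A", "B", "C", "B", "D", "B", "E", "B", "E"),
  paper  = c(1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 5)
)

# Author-by-paper incidence matrix (1 if the author's name is on the paper)
inc <- unclass(table(two_mode$author, two_mode$paper))

# Co-occurrence projection: number of papers shared by each pair of authors
cooc <- inc %*% t(inc)
diag(cooc) <- 0  # drop self-ties

# Binary projection: a tie exists if two authors share at least one paper
binary <- (cooc > 0) * 1

cooc
binary
```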

This method was refined by Newman (2001). He argued that smaller collaborations created stronger social bonds among scientists than larger ones. Therefore, he extended this procedure and proposed to define the weights among the nodes using the following formula:
w_{ij} = \sum_p \frac{1}{N_p - 1}
where w_{ij} is the weight between node i and node j, the sum runs over the papers p that they have collaborated on, and N_p is the number of authors on paper p. This implies that if two authors only write a single paper together with no other co-authors, they get a weight of 1 (e.g., node B and node D). However, if they have a co-author, the weight on the tie between them is 0.5 (e.g., node A and node C). If two authors have written two papers together without any co-author, the weight of their tie would be 2 (e.g., node B and node E). A more complicated example is the tie between node A and node B in the diagram. They have written two papers together: one without any other co-author and one with node C as a co-author. The first paper would give their tie a weight of 1, and the second paper would add 0.5 to the weight of this tie. Therefore, the weight is 1.5.

Note: This method has been explained in more detail in the following post: Projecting two-mode networks onto weighted one-mode networks.

The one-mode structure of the network using Newman’s method is available in tnet format (1.98mb) and UCINET dl-format (1.98mb).
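The formula can be sketched in base R as follows. Since the diagram is not reproduced here, the set of papers below is an assumption chosen to be consistent with the worked examples in the text (nodes A to E); the column names are also assumptions. This is an illustration of the formula, not the implementation used to create the files.

```r
# Assumed two-mode edgelist consistent with the worked examples above
two_mode <- data.frame(
  author = c("A", "B", "A", "B", "C", "B", "D", "B", "E", "B", "E"),
  paper  = c(1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 5)
)

# Author-by-paper incidence matrix
inc <- unclass(table(two_mode$author, two_mode$paper))

# Number of authors on each paper (N_p)
n_p <- colSums(inc)

# Single-author papers create no ties; drop them to avoid dividing by zero
keep <- n_p > 1
inc_k <- inc[, keep, drop = FALSE]

# Each paper adds 1 / (N_p - 1) to the tie between every pair of its authors
scaled <- sweep(inc_k, 2, n_p[keep] - 1, "/")
newman <- scaled %*% t(inc_k)
diag(newman) <- 0  # drop self-ties

newman
```

With these assumed papers, the resulting matrix gives 1 for the tie between B and D, 0.5 for A and C, 2 for B and E, and 1.5 for A and B, in line with the examples in the text.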

This network was obtained from Mark Newman’s website.

Citation: Newman, M. E. J., 2001. The structure of scientific collaboration networks. Proceedings of the National Academy of Sciences of the United States of America 98, 404-409.

Network 12: Davis southern club women

This dataset was collected by Davis and colleagues in the 1930s. It contains the observed attendance at 14 social events by 18 Southern women (two-mode network, 1kb). This has been projected onto a binary one-mode network (3kb), a co-occurrence one-mode network (3kb), and a one-mode network based on Newman’s (2001) method (tnet, 7kb; UCINET, 7kb). The first names of the women are also available (1kb).

Citation: Davis, A., Gardner, B. B., Gardner, M. R., 1941. Deep South. University of Chicago Press, Chicago, IL.

Network 13: Yours?

If you have a weighted network that you would like to have added to this page, just contact me.

3 Comments

  • 1. Navneet Aggarwal  |  June 13, 2009 at 5:13 am

    Hi Tore,
    First off, congratulations on the completion of your Ph.D. Now, I have a question for you. Have you ever come across a network dataset that includes geospatial data along with the usual weights and edge lists? I am working on infrastructural networks and this can provide some extra information about nodes that are not connected. However, I am having a hard time finding such a dataset.

    Also, this blog looks great!!

    Regards,
    Nav.

  • 2. Rasheed  |  November 23, 2009 at 8:15 am

    I just downloaded your Online Social Network dataset in UCINET format. It’s really nice, and thanks for the effort. It is in “edge list” format. Is there any tool which provides conversion from edge list format to “node list” format? Kindly let me know.

    • 3. Tore Opsahl  |  November 23, 2009 at 9:53 am

      Thanks for taking an interest in the dataset. To create a nodelist, I would:
      1) Download the dl file
      2) Import it in UCINET (Data > Import text file > DL…)
      3) Export it as a nodelist file (Data > Export > Raw…; Select Output format as Nodelist1, and Minimum tie val allowed as 1)
