Datasets

For the convenience of tnet users, I have collected a number of network datasets that were available on the Internet, and made them conform to the required standard. If you have a network that you would like to add to this page or if there are any mistakes or conflicts of interest, please contact me.

Note: Please do cite the mentioned reference if you use a dataset.

To make it easier for other researchers, the networks can be downloaded both in their native form and in transformed versions. For example, the Facebook-like Social Network is available as a longitudinal one-mode network (native form) and as a static one-mode network. Two-mode networks are transformed to weighted one-mode networks as described on the projecting two-mode networks onto weighted one-mode networks-page.

For instructions on how to load the datasets in tnet and UCINET, see the end of this page.

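| No. | Network | Primary nodes | Secondary nodes | Node attributes | Time-stamped ties | Time-stamped: weighted | Two-mode ties | Two-mode: weighted | One-mode ties | One-mode: directed | One-mode: weighted |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Facebook-like Social Network | 1899 | | 0 | 59835 | | | | 20296 | | |
| 2 | Facebook-like Forum Network | 899 | 522 | 0 | 33720 | | 7089 | | 71380 | × | |
| 3 | Freeman’s EIES Network (time 1) | 48 | | 0 | | | | | 695 | | |
| 4 | Freeman’s EIES Network (time 2) | 48 | | 0 | | | | | 830 | | |
| 5 | Freeman’s EIES Network (msgs) | 32 | | 3 | | | | | 460 | | |
| 6 | C.elegans Neural Network | 306 | | 0 | | | | | 2345 | | |
| 7 | Norwegian Boards (Aug’09) | 1495 | 367 | 4 | | | 1834 | × | 4065 | × | |
| 8 | Organisational (Consult; Advice) | 46 | | 4 | | | | | 879 | | |
| 9 | Organisational (Consult; Value) | 46 | | 4 | | | | | 858 | | |
| 10 | Organisational (R&D; Advice) | 77 | | 3 | | | | | 2228 | | |
| 11 | Organisational (R&D; Aware) | 77 | | 3 | | | | | 2326 | | |
| 12 | Scientific Collaboration | 16726 | 22016 | 1 | | | 58595 | × | 47594 | × | |
| 13 | Southern Women Network | 18 | 14 | 1 | | | 89 | × | 278 | × | |
| 14a | US Top-500 Airport Network | 500 | | 0 | | | | | 2980 | × | |
| 14b | US Airport Network | 1574 | | 0 | | | | | 28236 | | |
| 14c | Openflights | 7976 | | 10 | | | | | 30501 | | |
| 15 | US Power Grid | 4941 | | 0 | | | | | 6594 | × | × |

× marks a tie property that does not hold for the network (e.g., undirected or unweighted ties).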


Network 1: Facebook-like Social Network

The Facebook-like Social Network originates from an online community for students at the University of California, Irvine. The dataset includes the users who sent or received at least one message (1,899). A total of 59,835 online messages were sent over 20,296 directed ties among these users. This dataset was the main dataset used in my Ph.D. thesis. This network has also been described in Patterns and Dynamics of Users’ Behaviour and Interaction: Network Analysis of an Online Community and used in a number of articles, including Prominence and control: The weighted rich-club effect and Clustering in weighted networks. Although this dataset contains many nodal attributes (e.g., gender, age, and course attended), these are not made available, as they would make it possible to reverse engineer the anonymisation of users. Self-loops in the longitudinal edgelist signal the time that users registered on the site.
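The relationship between the longitudinal edgelist and the static one-mode network can be sketched in base R. This is a minimal illustration on made-up toy data. It assumes the longitudinal layout is time, sender, receiver, weight, with one row per message. The column names t, i, j and w are chosen here for illustration and are not necessarily those of the downloadable files.

```r
# Toy longitudinal one-mode edgelist (time, sender, receiver, weight);
# all values are hypothetical, one row per event
lnet <- data.frame(
  t = c("2004-04-15 10:00:00", "2004-04-15 11:00:00", "2004-04-16 09:30:00",
        "2004-04-16 12:00:00", "2004-04-17 08:00:00", "2004-04-17 08:05:00"),
  i = c(1, 2, 1, 1, 2, 1),
  j = c(1, 2, 2, 2, 1, 3),
  w = c(1, 1, 1, 1, 1, 1)
)

# Self-loops (sender == receiver) mark when a user registered on the site
registrations <- lnet[lnet$i == lnet$j, c("t", "i")]

# All other rows are messages
msgs <- lnet[lnet$i != lnet$j, ]

# Static directed network: weight = number of message rows sent from i to j
net <- aggregate(w ~ i + j, data = msgs, FUN = length)
net <- net[order(net$i, net$j), ]
rownames(net) <- NULL
print(net)
```

The messages are counted as rows rather than by summing w. This keeps the message count correct even if the weight column held something other than 1.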

Opsahl, T., Panzarasa, P., 2009. Clustering in weighted networks. Social Networks 31 (2), 155-163, doi: 10.1016/j.socnet.2009.02.002


Network 2: Facebook-like Forum Network

The Facebook-like Forum Network was obtained from the same online community as the online social network. However, this network does not focus on the private messages exchanged among users but on users’ activity in the forum. The forum is an interesting two-mode network among 899 users and 522 topics, because a weight can be assigned to each tie based on the number of messages or characters that a user posted to a topic. When transforming this weighted two-mode network into a one-mode network, I have retained the users, as I believe these are directly responsible for the tie generation. This network has fewer users than the online social network, as not all users who sent or received private messages participated in the forum. Note that the identification numbers do not match those of the online social network. The two-mode networks are projected onto one-mode networks using the procedure outlined on the projecting two-mode networks onto weighted one-mode networks-page.

  • Weighted longitudinal two-mode network (weighted by number of characters): tnet-format
  • Binary longitudinal two-mode network: tnet-format
  • Weighted static two-mode network (weighted by number of messages): tnet-format
  • Weighted static two-mode network (weighted by number of characters): tnet-format
  • Weighted static one-mode network (weighted by number of messages; sum): tnet-format; UCINET-format
  • Weighted static one-mode network (weighted by number of characters; sum): tnet-format; UCINET-format
  • Weighted static one-mode network (weighted by number of messages; Newman’s method): tnet-format; UCINET-format
  • Weighted static one-mode network (weighted by number of characters; Newman’s method): tnet-format; UCINET-format
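The difference between weighting by number of messages and weighting by number of characters can be sketched in base R. This is a hypothetical illustration. The toy posts and the column names t, i (user), p (topic) and w (characters in the post) are assumptions, not the actual file contents.

```r
# Toy longitudinal two-mode edgelist: time, user (i), topic (p), characters (w)
posts <- data.frame(
  t = c("2004-05-01 10:00", "2004-05-01 11:00", "2004-05-02 09:00", "2004-05-03 15:00"),
  i = c(1, 1, 2, 1),
  p = c(10, 10, 10, 11),
  w = c(120, 40, 300, 15)
)

# Static two-mode network weighted by number of characters: sum per user-topic pair
by_chars <- aggregate(w ~ i + p, data = posts, FUN = sum)

# Static two-mode network weighted by number of messages: count per user-topic pair
by_msgs <- aggregate(w ~ i + p, data = posts, FUN = length)

print(by_chars)
print(by_msgs)
```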
Opsahl, T., 2013. Triadic closure in two-mode networks: Redefining the global and local clustering coefficients. Social Networks 35 (2), 159-167, doi: 10.1016/j.socnet.2011.07.001


Network 3-5: Freeman’s EIES dataset

This dataset consists of Freeman’s EIES networks (Freeman and Freeman, 1979), also used in Wasserman and Faust (1994). It was collected in 1978 and contains three networks of researchers working on social network analysis. The first network contains the personal relationships among 48 of the researchers at the beginning of the study (time 1). The second network contains the personal relationships at the end of the study (time 2). In these two networks, all ties have a weight between 0 and 4:
- 4: a close personal friend of the researcher;
- 3: a friend;
- 2: a person the researcher has met;
- 1: a person the researcher has heard of but not met;
- 0: a person unknown to the researcher.

The third network is different. It is a matrix of the number of messages sent among 32 of the researchers who used an electronic communication tool (a frequency matrix).
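A minimal base-R sketch of working with the 0-4 scale of the first two EIES networks: attach the verbal labels and keep only friendship ties. The toy edgelist is made up. The threshold (a weight of 3 or more) is an illustrative choice and is not part of the dataset.

```r
# Labels for the EIES acquaintance scale; position k holds the label for weight k - 1
eies_scale <- c("unknown", "heard of", "met", "friend", "close personal friend")

# Hypothetical toy edgelist: sender (i), receiver (j), weight (w)
net <- data.frame(i = c(1, 1, 2, 3, 3), j = c(2, 3, 1, 1, 2), w = c(4, 1, 3, 2, 3))

# Attach the label for each weight (weight 0 maps to the first label)
net$label <- eies_scale[net$w + 1]

# Dichotomise: keep only ties rated as friend or close personal friend (illustrative threshold)
friends <- net[net$w >= 3, c("i", "j")]
print(net)
print(friends)
```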

There are three pieces of information about each of the 32 researchers that were part of the third network (nodal attributes): their name, the main disciplinary affiliation (1: sociology; 2: anthropology; 3: mathematics or statistics; and 4: others), and the number of citations each researcher had in the Social Science Citation Index in 1978.

Freeman, S.C., Freeman, L.C., 1979. The networkers network: A study of the impact of a new communications medium on sociometric structure. Social Science Research Reports 46. University of California, Irvine, CA.


Network 6: The Caenorhabditis elegans worm’s neural network

This dataset contains the neural network of the Caenorhabditis elegans worm (C. elegans), which was studied by Watts and Strogatz (1998). The network contains 306 nodes that represent neurons. Two neurons are connected if at least one synapse or gap junction exists between them, and the weight is the number of synapses and gap junctions. This network was obtained from the Collective Dynamics Group’s website. Note: the original network contained 14 duplicated ties (i.e., a tie was listed twice in the edgelist). In the files available here, each duplicated pair is merged into one tie whose weight is the sum of the two duplicated weights.
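The merging of duplicated ties described above can be sketched in base R. This illustration uses a hypothetical toy edgelist and is not the script used to prepare the files. It also shows a quick way to count duplicated sender-receiver pairs, which is useful when checking a downloaded file.

```r
# Hypothetical toy edgelist (sender i, receiver j, weight w); the tie 1 -> 2 appears twice
net <- data.frame(i = c(1, 1, 2, 3), j = c(2, 2, 3, 1), w = c(1, 3, 2, 5))

# Number of rows whose sender-receiver pair already appeared earlier
n_dup <- sum(duplicated(net[, c("i", "j")]))

# Merge duplicates: one row per ordered pair, weight = sum of the duplicated weights
merged <- aggregate(w ~ i + j, data = net, FUN = sum)
merged <- merged[order(merged$i, merged$j), ]
rownames(merged) <- NULL

print(n_dup)
print(merged)
```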

Watts, D. J., Strogatz, S. H., 1998. Collective dynamics of “small-world” networks. Nature 393, 440-442.


Network 7: Norwegian Interlocking Directorate (August 2009)

This is the interlocking directorate among 384 public limited companies in Norway (Allmennaksjeselskap or ASA). The list of companies is created by selecting all companies listed as public limited companies on the website of the Norwegian Business Register on August 5, 2009. For each company, we downloaded public announcements containing changes to the boards’ composition since November 1999. From these announcements, we extracted monthly affiliation (or two-mode) networks since May 2002 (see website for choice of cut-off). Corresponding one-mode projections are also available. We strive to keep the data updated by downloading new announcements around the middle of each month.

As we do not add new companies to the list but remove companies that file a bankruptcy notice, the dataset is shrinking. This was also the case with the data used in the original paper (Seierstad and Opsahl, 2011). Although the paper is based on data from August 1, 2009, 17 companies had filed a bankruptcy notice by this time. Thus, there were only 367 companies with 1,495 directors.

This dataset contains some nodal attributes. The directors’ and companies’ names are known. In addition, for the companies, the city and post code of their registered office are known, while for the directors, the gender is known.

The data files are available through www.boardsandgender.com along with a description of how the data is collected and directors’ gender determined.

Seierstad, C., Opsahl, T., 2011. For the few not the many? The effects of affirmative action on presence, prominence, and social capital of women directors in Norway. Scandinavian Journal of Management 27 (1), 44-54, doi: 10.1016/j.scaman.2010.10.002


Network 8-11: Intra-organisational networks

This dataset contains four intra-organisational networks. Two are from a consulting company (46 employees), and two are from a research team in a manufacturing company (77 employees). These networks were used by Cross and Parker (2004).

In the first network, the ties are differentiated on a scale from 0 to 5 in terms of frequency of information or advice requests (“Please indicate how often you have turned to this person for information or advice on work-related topics in the past three months”). 0: I Do Not Know This Person; 1: Never; 2: Seldom; 3: Sometimes; 4: Often; and 5: Very Often.

In the second network, ties are differentiated in terms of the value placed on the information or advice received (“For each person in the list below, please show how strongly you agree or disagree with the following statement: In general, this person has expertise in areas that are important in the kind of work I do.”). The weights in this network are also based on a scale from 0 to 5. 0: I Do Not Know This Person; 1: Strongly Disagree; 2: Disagree; 3: Neutral; 4: Agree; and 5: Strongly Agree.

In the third network, the ties among the researchers are differentiated in terms of advice (“Please indicate the extent to which the people listed below provide you with information you use to accomplish your work”). The weights are based on the following scale: 0: I Do Not Know This Person/I Have Never Met this Person; 1: Very Infrequently; 2: Infrequently; 3: Somewhat Infrequently; 4: Somewhat Frequently; 5: Frequently; and 6: Very Frequently.

The fourth network is based on the employees’ awareness of each other’s knowledge and skills (“I understand this person’s knowledge and skills. This does not necessarily mean that I have these skills or am knowledgeable in these domains but that I understand what skills this person has and domains they are knowledgeable in”). The weight scale in this network is: 0: I Do Not Know This Person/I Have Never Met this Person; 1: Strongly Disagree; 2: Disagree; 3: Somewhat Disagree; 4: Somewhat Agree; 5: Agree; and 6: Strongly Agree.
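On these scales, a weight of 0 means "I Do Not Know This Person" rather than a weak tie, so one option is to drop such rows before analysis. The sketch below does that on a hypothetical toy edgelist. It also divides by the scale maximum so that the 0-5 and 0-6 networks can be compared on a common footing. Both steps are assumptions about one possible analysis and are not part of the datasets themselves.

```r
# Hypothetical toy edgelist from a network rated on the 0-6 scale: sender, receiver, weight
net <- data.frame(i = c(1, 1, 2, 2, 3), j = c(2, 3, 1, 3, 1), w = c(0, 6, 3, 0, 5))

# Treat weight 0 ("I Do Not Know This Person") as an absent tie (an assumption)
net <- net[net$w > 0, ]

# Rescale to (0, 1] by the scale maximum (6 here; it would be 5 for the consulting networks)
scale_max <- 6
net$w_scaled <- net$w / scale_max
print(net)
```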

In addition to the relational data, the dataset also contains information about the people (nodal attributes). The following attributes are known for the consultancy firm: the organisational level (1: Research Assistant; 2: Junior Consultant; 3: Senior Consultant; 4: Managing Consultant; 5: Partner), gender (1: male; 2: female), region (1: Europe; 2: USA), and location (1: Boston; 2: London; 3: Paris; 4: Rome; 5: Madrid; 6: Oslo; 7: Copenhagen).

For the researchers in the manufacturing company, the following attributes are known: location (1: Paris; 2: Frankfurt; 3: Warsaw; 4: Geneva), tenure (1: 1-12 months; 2: 13-36 months; 3: 37-60 months; 4: 61+ months) and the organisational level (1: Global Dept Manager; 2: Local Dept Manager; 3: Project Leader; 4: Researcher).
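The numeric attribute codes can be decoded into labelled factors with base R. The attribute vectors below are hypothetical toy values for five consultants. Only the code-to-label mappings come from the description above.

```r
# Hypothetical attribute codes for five consultants
level  <- c(1, 3, 5, 2, 3)
gender <- c(1, 2, 2, 1, 1)

# Decode the codes using the mappings listed for the consultancy firm
level_f <- factor(level, levels = 1:5,
                  labels = c("Research Assistant", "Junior Consultant", "Senior Consultant",
                             "Managing Consultant", "Partner"))
gender_f <- factor(gender, levels = 1:2, labels = c("male", "female"))

# Cross-tabulate organisational level by gender (empty levels are kept as zero counts)
tab <- table(level_f, gender_f)
print(tab)
```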

Cross, R., Parker, A., 2004. The Hidden Power of Social Networks. Harvard Business School Press, Boston, MA.


Network 12: Newman’s scientific collaboration network

This is the co-authorship network based on preprints posted to the Condensed Matter section of the arXiv E-Print Archive between 1995 and 1999. This dataset can be classified as a two-mode or affiliation network, since there are two types of nodes (authors and papers) and ties exist only between nodes of different types. The two-mode network is projected onto one-mode networks using the procedure outlined on the projecting two-mode networks onto weighted one-mode networks-page. In addition to the network data, the names of the authors (369kb) are available.

  • Binary static two-mode network: tnet-format (659kb)
  • Binary static one-mode network: tnet-format (1.21mb)
  • Weighted static one-mode network (sum of joint papers): tnet-format (1.21mb)
  • Weighted static one-mode network (Newman’s projection method): tnet-format (1.98mb); UCINET-format (1.98mb)
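The two one-mode projections listed above can be sketched in base R for a binary two-mode network. This is a minimal illustration on a hypothetical toy author-paper edgelist, not tnet’s implementation. The sum method counts joint papers. Newman’s method lets each joint paper p contribute 1/(N_p - 1), where N_p is the number of authors on that paper.

```r
# Hypothetical binary two-mode edgelist: author (i) first, paper (p) second
two_mode <- data.frame(i = c(1, 2, 3, 1, 2, 2, 4),
                       p = c(1, 1, 1, 2, 2, 3, 3))

# Number of authors on each paper (N_p), indexed by paper id
n_p <- table(two_mode$p)

# All ordered pairs of distinct co-authors on the same paper
pairs <- merge(two_mode, two_mode, by = "p", suffixes = c(".from", ".to"))
pairs <- pairs[pairs$i.from != pairs$i.to, ]

# Sum method: each joint paper adds 1
pairs$w_sum <- 1
# Newman's method: each joint paper adds 1 / (N_p - 1)
pairs$w_newman <- 1 / (as.numeric(n_p[as.character(pairs$p)]) - 1)

# Collapse to one row per ordered author pair
one_mode <- aggregate(cbind(w_sum, w_newman) ~ i.from + i.to, data = pairs, FUN = sum)
print(one_mode)
```

In the toy data, authors 1 and 2 share a three-author paper and a two-author paper. Their Newman weight is therefore 1/2 + 1 = 1.5, while their sum weight is 2.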

This network was given by Mark Newman.

Newman, M. E. J., 2001. The structure of scientific collaboration networks. PNAS 98, 404-409.


Network 13: Davis’ Southern Women Club

This dataset was collected by Davis and colleagues in the 1930s. It contains the observed attendance of 18 Southern women at 14 social events. For a more detailed description, see Davis et al. (1941) or Wasserman and Faust (1994). The women’s first names are also available (1kb).
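The attendance data form a two-mode edgelist, with women first and events second. The minimal base-R sketch below uses a hypothetical toy subset rather than the real data. It builds the women-by-events incidence matrix and, from it, the number of events each pair of women attended together.

```r
# Hypothetical toy two-mode edgelist: woman (primary node), event (secondary node)
attend <- data.frame(woman = c(1, 1, 2, 2, 3), event = c(1, 2, 1, 2, 2))

# Incidence matrix: rows are women, columns are events, 1 = attended
B <- unclass(table(attend$woman, attend$event))

# Co-attendance matrix: entry [a, b] = number of events women a and b both attended
co <- B %*% t(B)
print(co)
```

The diagonal of co holds the number of events each woman attended. The off-diagonal entries are the co-attendance counts.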

Davis, A., Gardner, B. B., Gardner, M. R., 1941. Deep South. University of Chicago Press, Chicago, IL.


Network 14: The network of airports in the United States

There are three US airport networks. The first is the network of the 500 busiest commercial airports in the United States, which was used in Colizza et al. (2007). A tie exists between two airports if a flight was scheduled between them in 2002. The weights correspond to the number of seats available on the scheduled flights. This type of network is directed by nature, as a flight is scheduled from one airport to another. Even so, such networks are highly symmetric (Barrat et al., 2004). Therefore, this version of the network is undirected (i.e., the weight of the tie from one airport to another is equal to the weight of the reciprocal tie). This network was obtained from the Complex Networks Collaboratory’s website.
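In tnet’s edgelist format, an undirected network lists each tie twice, once in each direction, with the same weight. The following is a minimal base-R check of that property on a hypothetical toy edgelist. The column names i, j and w are assumptions.

```r
# Hypothetical toy undirected network: each tie is listed in both directions
net <- data.frame(i = c(1, 2, 1, 3, 2, 3), j = c(2, 1, 3, 1, 3, 2), w = c(10, 10, 5, 5, 7, 7))

# Reverse every tie (swap sender and receiver) and call its weight w_rev
rev_net <- setNames(net, c("j", "i", "w_rev"))

# Pair each tie with its reciprocal tie; missing reciprocals become NA
both <- merge(net, rev_net, by = c("i", "j"), all.x = TRUE)

# Undirected in this sense: every tie has a reciprocal tie of equal weight
is_undirected <- all(!is.na(both$w_rev) & both$w == both$w_rev)
print(is_undirected)
```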

Colizza, V., Pastor-Satorras, R., Vespignani, A., 2007. Reaction-diffusion processes and metapopulation models in heterogeneous networks. Nature Physics 3, 276-282.

The second dataset is the complete US airport network in 2010. This is the network used in the first part of the Why Anchorage is not (that) important: Binary ties and Sample selection-blog post. The data were downloaded from the Bureau of Transportation Statistics (BTS) Transtats site (Table T-100; id 292) with the following filters: Geography=all; Year=2010; Months=all; and columns: Passengers, Origin, Dest. Based on this table, the airport codes are converted into id numbers, and the weights of duplicated ties are summed. Ties with a weight of 0 (cargo-only flights) and self-loops are also removed.
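The processing steps described above can be sketched in base R. The rows below are hypothetical toy values. The column names PASSENGERS, ORIGIN and DEST are assumptions about the downloaded table’s layout. This is not the script used to build the dataset.

```r
# Hypothetical rows resembling a T-100 extract (passengers, origin, destination)
t100 <- data.frame(
  PASSENGERS = c(120, 80, 0, 50, 30, 200),
  ORIGIN     = c("ANC", "ANC", "SEA", "SEA", "JFK", "ANC"),
  DEST       = c("SEA", "SEA", "ANC", "SEA", "LAX", "SEA"),
  stringsAsFactors = FALSE
)

# Convert airport codes into id numbers (alphabetical order of codes)
codes  <- sort(unique(c(t100$ORIGIN, t100$DEST)))
t100$i <- match(t100$ORIGIN, codes)
t100$j <- match(t100$DEST, codes)

# Sum the weights of duplicated ties (same origin and destination)
net <- aggregate(PASSENGERS ~ i + j, data = t100, FUN = sum)
names(net)[3] <- "w"

# Remove zero-weight ties (cargo only) and self-loops
net <- net[net$w > 0 & net$i != net$j, ]
print(net)
```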

The third dataset was also used in the Why Anchorage is not (that) important: Binary ties and Sample selection-blog post. The data were downloaded from Openflights.org. Unlike the BTS data, this dataset also contains ties between pairs of non-US airports. As such, it gives a much more complete picture and avoids the sample-selection problem. The weights in this network refer to the number of routes between two airports. Airport attributes are available.


Network 15: The US power grid

This network is the high-voltage power grid of the Western States of the United States of America. The nodes are transformers, substations, and generators, and the ties are high-voltage transmission lines. This network was originally used in Watts and Strogatz (1998). Although the transmission lines can be directed and differentiated based on their capacity, this information is not available.

Watts, D. J., Strogatz, S. H., 1998. Collective dynamics of “small-world” networks. Nature 393, 440-442.


How to load datasets

tnet

To use tnet, you first need to download and install R, and then download and install tnet within R (see tnet’s website for instructions). You only need to do these steps once. Every time you start R, you need to load tnet, which you can do with the following command:

library(tnet)

A dataset can be loaded by writing a command similar to:

net <- read.table("<link to dataset>")

where <link to dataset> is the link to the dataset in the above table. For example, Freeman’s third EIES network can be loaded with the following command:

net <- read.table("http://opsahl.co.uk/tnet/datasets/Freemans_EIES-3_n32.txt")
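After loading, it can help to name the columns and run a few sanity checks. The sketch below reads a small hypothetical one-mode edgelist from a string so that it runs offline. With a real dataset, you would pass the file’s URL or path to read.table() instead of text =. The column names i, j and w are an assumption chosen to match the sender, receiver and weight layout of tnet-format one-mode files.

```r
# Stand-in for a downloaded tnet-format one-mode file (sender, receiver, weight)
txt <- "1 2 4
2 1 4
1 3 2
3 1 2"
net <- read.table(text = txt, col.names = c("i", "j", "w"))

# Basic checks: number of nodes and number of tie rows
n_nodes <- length(unique(c(net$i, net$j)))
n_ties  <- nrow(net)

# Out-strength: sum of the weights of each node's outgoing ties (named by node id)
strength <- sapply(split(net$w, net$i), sum)
print(strength)
```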

UCINET

To use UCINET, you need to download and install UCINET (information on UCINET’s website). This programme is not free, but there is a 30-day trial period.

To load a dataset, you must download and save the dl-file of the dataset you wish to study from the above table to your computer. The network can be imported into UCINET by using the DL import function, which you can find through the menu: Data > Import > DL. When the function’s dialog box opens, select the downloaded file containing the dataset by clicking on “…” after “Input text file in DL format”. The second box can be left at its default. Do note, and change if you wish, the name that appears in the third box, as this will be the name of the internal UCINET file.

tnet: Analysis of Weighted, Two-mode, and Longitudinal Networks

49 Comments

  • 1. Navneet Aggarwal  |  June 13, 2009 at 5:13 am

    Hi Tore,
    First off, congratulations on the completion of your Ph.D. Now, I have a question for you. Have you ever come across a network dataset that includes geospatial data along with the usual weights and edge lists? I am working on infrastructural networks, and this could provide some extra information about nodes that are not connected; however, I am having a hard time finding such a dataset.

    Also, this blog looks great!!

    Regards,
    Nav.

  • 2. Rasheed  |  November 23, 2009 at 8:15 am

    I just downloaded your Online Social Network dataset in UCINET format. It’s really nice, and thanks for the effort. It is in “edge list” format. Is there any tool that converts from edge list format to “node list” format? Kindly let me know.

    • 3. Tore Opsahl  |  November 23, 2009 at 9:53 am

      Thanks for taking an interest in the dataset. To create a nodelist, I would:
      1) Download the dl file
      2) Import it in UCINET (Data > Import text file > DL…)
      3) Export it as a nodelist file (Data > Export > Raw…; Select Output format as Nodelist1, and Minimum tie val allowed as 1)

  • 4. Nina  |  May 2, 2010 at 7:53 pm

    Dear Tore, thanks for sharing the data with us. I’m most interested into your bipartite network data. The one you compiled from Newman – how did you actually do that? I could only find a compiled, weighted, one-mode version from him. Do you have the raw data from him? Or could you tell me which author hides behind which ID? Is the first column the author or the paper?
    Both informations would be very valuable to me.
    The same with the women-event data: do you have a key which woman is what and which event is which? That would be very interesting! Thanks again for sharing,
    Nina

    • 5. Tore Opsahl  |  May 3, 2010 at 10:48 pm

      Nina,

      Glad you find this page useful.

      Mark Newman sent me the two-mode network, which did contain the names of the authors. I have uploaded this file now (see the network description). Also, in the description of the Southern Women dataset is a link to the women’s first name.

      All the two-mode network files list people ids first, and then the paper/event ids.

      Hope this helps,

      Tore

  • 6. Thor Sigfusson  |  June 9, 2010 at 12:26 pm

    Dear Tore,

    I find your site extremely helpful. In my PhD I am working on a study of 20 entrepreneurial high tech firms in Iceland and building a framework of their international ventures. As the interviews with the firms have proceeded I have noticed aspects which have drawn my attention to social networks. Especially interesting was that many of the entrepreneurs mentioned the same individuals who were connectors in their initial international ventures. I am also observing their relations on the new social networks on the web and the networks which they talk about in the interviews and the networks on the web are quite different! This would be interesting to observe with SNAS tools. I have now a list of 130 individuals/links (all numbered from 1 to 130). Now I am struggling to set it up in an R-Framework. Any suggestions?

    Best regards

    Thor
    University of Iceland

    • 7. Tore Opsahl  |  June 9, 2010 at 2:41 pm

      Hi Thor,

      The standard answer is to use an edgelist as most R-packages can easily load this format. An edgelist is a list of the ties between nodes. For example:

      1 2
      3 1

      would be the two ties from node 1 to node 2 and from node 3 to node 1. My package, tnet, uses a third column that differentiates the ties from one another:

      1 2 4
      3 1 2

      so here, the first tie is twice as strong as the second tie.

      To load a text file (.txt) with an edgelist, simply write in R:

      net <- read.table("filename_of_edgelist_file.txt")

      That said, the standard network analysis packages might not be appropriate for you, as they are generally cross-sectional and cannot distinguish between the first tie and the last tie. From your description, you might want to keep this temporal aspect in your analysis. Also, you might want to consider keeping the two-mode structure (i.e., people and companies) instead of simply creating a network among people (see the post on projecting networks).

      All the best,
      Tore

  • 8. jay  |  June 17, 2010 at 8:26 am

    Hi Tore,

    I am trying to use tnet for a behavioral dataset as a trial, I tried an undirected network .

    sampledata <- rbind(
    c(1,2,3),
    c(1,3,2),
    c(2,3,4),
    c(2,4,5),
    c(2,5,2),
    c(6,5,1))

    and used the weighted_richclub_w() function on the above. I find that for directed=FALSE and for rich="k" and reshuffle="weights", it gives the following error message:

    "Error in random.m[i, j + 1] <- random.phi[which(random.phi[, "x"] == j), :
    subscript out of bounds"

    The same error appears for rich="k" and reshuffle="links" with directed=FALSE

    With directed=NULL there is no error !

    1. I don't understand why it gave an error with directed=FALSE even though the input was actually an undirected network.

    2. For the case of directed weighted network, how is the generalization done for null models weight reshuffle and weights & links reshuffle of your PRL 101 paper ?

    3. Does the combination rich="k", reshuffle="links" correspond to weight reshuffle, and rich="k", reshuffle="weights" correspond to weights & links reshuffle and rich="s", reshuffle="weights.local" correspond to directed reshuffle null model of your PRL 101 paper ?

    Your comments on these will help me apply tnet to my data.

    ciao,
    jay

    • 9. Tore Opsahl  |  June 19, 2010 at 10:19 am

      For undirected networks, each tie should be mentioned twice, once in each direction. You can use the symmetrise-function to do this. Let me know if you have further issues after running this command.

      Tore

  • 10. Arash  |  July 26, 2010 at 3:40 am

    Dear Dr.Tore :
    How can I convert the raw data of a database into a structured form of dataset, like datasets which you have uploaded ?

    • 11. Tore Opsahl  |  July 26, 2010 at 5:31 pm

      Arash,

      That entirely depends on your raw data. R is great for manipulating data: have a look at Quick-R, http://www.statmethods.net/, for an introduction.

      Best,
      Tore

  • 12. Praveen  |  December 17, 2010 at 5:06 pm

    Hi,

    The datasets are great, but is there any way I can get their headers? For some of them, I couldn’t figure out the headers.

    Thank you
    + Praveen

    • 13. Tore Opsahl  |  December 17, 2010 at 5:37 pm

      Praveen,

      The datasets are in tnet format. For one-mode networks, this is “sender”, “receiver”, and “tie weight” (for undirected networks, each tie is mentioned twice, once in each direction). For two-mode networks, it is primary nodes, secondary nodes, and tie weight (optional). For longitudinal networks: “time”, “sender”, “receiver”, and “tie weight”.

      Best,
      Tore

  • 14. Hitechreview  |  February 21, 2011 at 10:20 am

    Hi Tore
    These datasets are really precious.
    Thanks for your efforts in centrality measures.

    Regards

  • 15. Tuğçe  |  January 2, 2012 at 3:25 pm

    Hello,
    Could you check the UCINET format links for all of the datasets? They are not working for me. Is it me, or are the links broken?

    • 16. Tore Opsahl  |  January 2, 2012 at 10:32 pm

      Tuğçe,

      Thanks for letting me know. I just tried all of them, and they seemed to work. As they are all hosted on the same server, they should all work or all not work (e.g., if the server is down). Please email me if you have further issues.

      Tore

  • 17. yanhaojie  |  May 3, 2012 at 2:46 pm

    hello,
    They are precise data, and thank you. But I can’t download the data, and I don’t know why; I tried many times. So I would appreciate it if you could send me these data in UCINET format. Thanks a lot!

    • 18. Tore Opsahl  |  May 3, 2012 at 3:19 pm

      Hi Yanhaojie,

      Which datasets do you have an issue with? They seem to be working fine for me. Have you tried to right-click and save as? If this doesn’t work, send me an email and I will send you the specific datasets.

      Best,
      Tore

  • 19. Payam  |  June 6, 2012 at 9:07 pm

    Dr. Tore,

    I am a Master’s student working on my thesis, which is about using social networks to promote healthy behaviours. I need adjacency matrices from Facebook users’ networks. Do you have datasets that include adjacency matrices from Facebook users’ networks?

    Thanks for your help

    • 20. Tore Opsahl  |  June 7, 2012 at 2:25 pm

      Hi Payam,

      It is not that straightforward to get Facebook users’ networks. Have a look at the work of Bernie Hogan at the Oxford Internet Institute.

      Best,
      Tore

  • 21. sarabjot  |  July 25, 2012 at 8:03 am

    Hi Dr. Tore.

    I am an M.Tech student. I need datasets for identifying power users in social networks.

  • 22. Gordon Govan  |  May 17, 2013 at 4:39 pm

    Dr. Tore,

    I’ve been trying to use the C. Elegans network that you have linked.
    In your description you say that it has 306 nodes, but only 297 are mentioned in the files you supply.

    The missing nodes are 283-290 and 304.

    Do you know where these nodes could have gone?

    • 23. Tore Opsahl  |  May 20, 2013 at 9:42 am

      Hi Gordon,

      Thanks for your comments, and for making sure that the data is correct. The C. elegans files on here are direct copies from Duncan Watts’ old group website. I would try to reach out to him if you have a question regarding the data.

      Let me know if you find the answer!

      Best,
      Tore

  • 24. roya  |  May 29, 2013 at 10:51 pm

    Dear tore,
    thanks for sharing the data sets with us. How can I access the node attributes (e.g., gender, age, and course attended) in Facebook-like Social Network data set?
    Best regards,
    roya

    • 25. Tore Opsahl  |  May 30, 2013 at 10:23 am

      Hi Roya,

      Great that you can use the datasets. I have not made the node attributes available as node identification could be possible then. Hope you are able to use it anyway.

      Best,
      Tore

  • 26. Xiaoge Zhang  |  June 7, 2013 at 1:27 pm

    Dear Tore:
    First of all, thanks for sharing the datasets with us. Can you provide detailed information for network 14a (e.g., the names or locations of these airports)? I cannot figure out which airport each number in the dataset represents.
    Best Regards,
    Xiaoge Zhang

    • 27. Tore Opsahl  |  June 10, 2013 at 12:43 pm

      Hi Xiaoge,

      Unfortunately, this information is not available for the US airport top 500. I would suggest that you use the full network (14b), calculate node strength scores, and then pick the top 500 if you are interested in the network among the top nodes.

      Best,
      Tore
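A minimal base-R sketch of the suggestion in this reply, using a hypothetical toy directed edgelist: compute each node’s out-strength, keep the k strongest nodes, and extract the ties among them. Using out-strength rather than in-strength or total strength is an assumption. With the full 14b network, k would be 500.

```r
# Hypothetical toy directed edgelist: sender (i), receiver (j), weight (w)
net <- data.frame(i = c(1, 1, 2, 3, 4, 4), j = c(2, 3, 3, 1, 1, 2), w = c(5, 2, 2, 8, 3, 1))

# Out-strength: total weight of each node's outgoing ties (named by node id)
strength <- sapply(split(net$w, net$i), sum)

# Ids of the k nodes with the highest strength (k = 500 in the suggestion; 2 here)
k <- 2
top <- as.numeric(names(sort(strength, decreasing = TRUE))[seq_len(k)])

# Subnetwork: ties whose sender and receiver are both among the top nodes
sub <- net[net$i %in% top & net$j %in% top, ]
print(sub)
```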

  • 28. roya  |  June 18, 2013 at 7:49 am

    Hi Dr. Tore

    First of all, thanks for sharing the datasets with us. What is the meaning of “weighted by number of characters” in the Facebook-like Social Network dataset?
    Please explain.

    Best Regards,
    Roya

    • 29. Tore Opsahl  |  June 18, 2013 at 4:22 pm

      Hi Roya,

      The tie weights in the “weighted by number of characters”-networks are the sum of characters across all messages sent from one person to another (or to a group). This differs from the “weighted by number of messages”-version, where the tie weight is the number of messages sent.

      Hope this helps,
      Tore

  • 30. Muad  |  October 24, 2013 at 6:24 pm

    Hi Dr Tore,

    As you mentioned above, the C. elegans Neural Network dataset contains 14 duplicated ties, and you merged each duplicated tie into one tie by summing their weights. Still, I see these duplicates in the files provided above. So I am wondering whether the duplicates have been removed from these files or not.

    Thanks a lot,

    • 31. Tore Opsahl  |  November 4, 2013 at 5:37 pm

      Hi Muad,

      Thanks for your comment. I do not believe there are any duplicated ties in the tnet-format edgelist, the dl-format edgelist, or the R-object packaged with the latest version of tnet. Do send me an email if you do find an issue.

      Tore

  • 32. Cris  |  May 22, 2014 at 12:28 pm

    Dear Dr Tore,
    I’m a PhD student working on models of network formation, and I find this website very useful; thank you very much for sharing your datasets. For my thesis, I would like to use a dataset of a directed, unweighted, large network composed of several small, separate subnetworks (each subnetwork should have at most 20 nodes), but I’m not able to find a dataset matching these requirements. Do you have any suggestions? Thank you

    • 33. Tore Opsahl  |  May 23, 2014 at 1:42 am

      Hi Cris,

      I am not aware of a real-world network like that. Of course, you can find multiple smaller networks, but they would have different organizing principles. You might want to look into intra-school class networks.

      Tore

  • 34. Hend Kareem  |  January 12, 2015 at 6:32 pm

    Hi, I wonder can one get information about who is friend with whom for the first data set, Facebook-like social network?

    BR.
    Hend

    • 35. Tore Opsahl  |  January 13, 2015 at 2:17 am

      Hi Hend,

      Only the message network is available for the Facebook-like social network.

      Best,
      Tore

  • 36. sara  |  August 27, 2015 at 5:01 am

    Hi. I want to use the Facebook-like social network, but it is directed, and I want to change it to an undirected network. Is it correct that, if there is
    1 3 32, I add 3 1 32 to the matrix, and for the row (3 1 35) in the tnet-format, I sum 32 and 35 and use (32+35) as the tie weight between nodes 1 and 3?

    • 37. Tore Opsahl  |  August 27, 2015 at 12:24 pm

      Hi Sara,

      You can use the symmetrise_w-function to achieve this.

      Best,
      Tore

      # Load tnet and dataset
      library(tnet)
      data(tnet)
      
      # Directed net
      netDirected <- OnlineSocialNetwork.n1899.net
      netDirected[netDirected[,"i"] %in% c(1,3) & netDirected[,"j"] %in% c(1,3),]
      
      # Undirected network
      netUndirected <- symmetrise_w(netDirected, method="SUM")
      netUndirected[netUndirected[,"i"] %in% c(1,3) & netUndirected[,"j"] %in% c(1,3),]
      
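To make the arithmetic in Sara's question concrete, here is a minimal base-R sketch that needs no package. It assumes the edgelist is a data frame with columns i, j and w, as in the tnet format. The helper name `symmetrise_sum` is hypothetical and not part of tnet; it is only meant to show what summing both directions does, and it should give the same weights as `symmetrise_w(net, method="SUM")` for networks without self-loops.

```r
# Hypothetical helper (not part of tnet): make a weighted directed edgelist
# undirected by summing the weights of i->j and j->i, keeping both directions.
# Note: a self-loop (i == j) would be counted twice by this sketch.
symmetrise_sum <- function(net) {
  # Stack the original ties on top of reversed copies of themselves
  both <- rbind(net[, c("i", "j", "w")],
                data.frame(i = net$j, j = net$i, w = net$w))
  # Sum the weights for each ordered pair (i, j)
  out <- aggregate(w ~ i + j, data = both, FUN = sum)
  # Sort by sender, then receiver
  out <- out[order(out$i, out$j), ]
  rownames(out) <- NULL
  out
}

# Sara's example: 1 -> 3 with weight 32 and 3 -> 1 with weight 35
net <- data.frame(i = c(1, 3), j = c(3, 1), w = c(32, 35))
symmetrise_sum(net)
```

Both ties (1 to 3 and 3 to 1) end up with weight 67 (32 + 35), so the answer to the question is yes. A tie present in only one direction is simply mirrored with its original weight.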
  • 38. Carmine  |  July 29, 2016 at 9:38 pm

    Hi Tore,

    is it possible to create a single text file that includes different teams, for instance by specifying the team number? I would like to calculate betweenness centrality for the nodes within a team. Should I create a file for each team, or is it possible to add a team number so that tnet calculates centrality for all individuals, taking into account the team to which they belong?

    I hope this is clear. Thank you very much in advance.

    Best

    Carmine

    • 39. Tore Opsahl  |  July 31, 2016 at 4:46 pm

      Hi Carmine,

      I am not entirely sure what you would like to do. If you’d like to compute betweenness but only consider a subgraph, you can extract the subgraph and then compute betweenness. If you’d like to compute betweenness when nodes are aggregated into a team, you need to do the aggregation first, and then use the betweenness function.

      If you have specific data questions, please drop me an email.

      Best,
      Tore

      • 40. Tore Opsahl  |  August 3, 2016 at 2:31 am

        For everyone’s benefit, below is the code to compute intra-group betweenness and inter-group betweenness.

        # Load tnet
        library(tnet) 
        
        # Load network and group identifier
        data(tnet)
        net <- Freemans.EIES.net.3.n32
        grp <- Freemans.EIES.node.Discipline.n32
        
        # Append group id to network
        net <- merge(net, data.frame(i = 1:length(grp), grpI = grp))
        net <- merge(net, data.frame(j = 1:length(grp), grpJ = grp))
        
        # Extract only intra-group ties
        netIntra <- net[net[,"grpI"]==net[,"grpJ"],]
        
        # Split net by group
        netIntra <- split(netIntra[,c("i","j","w")], netIntra[,"grpI"])
        
        # Apply betweenness to the intra-group ties
        lapply(netIntra, betweenness_w)
        
        # Create group network
        netGrp <- net[net[,"grpI"]!=net[,"grpJ"],]
        netGrp <- netGrp[order(netGrp[,"grpI"], netGrp[,"grpJ"]),c("grpI","grpJ","w")]
        ind <- !duplicated(netGrp[,c("grpI","grpJ")])
        netGrp <- data.frame(netGrp[ind,1:2], w=tapply(netGrp[,"w"], cumsum(ind), sum))
        
        # Apply betweenness to the inter-group ties
        betweenness_w(netGrp)
        
  • 41. Hongyi Jiang  |  January 20, 2017 at 1:37 am

    Hi Tore,

    Just want to thank you for making the above data sets available. They have proved very useful; I will be citing and employing them in some upcoming manuscripts. Thanks again.

    Best wishes,

    Hongyi

  • 42. Zehra  |  July 15, 2017 at 3:42 am

    Hi Tore,

    Can you please give examples of real datasets that are not yet in the form of a network (graph)?

    Regards,
    Zehra

    • 43. Tore Opsahl  |  July 16, 2017 at 5:58 pm

      Hi Zehra,

      I am not entirely sure how that is possible. Please extract the network ties from your dataset using standard R commands; you can then use the functions in the tnet package.
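As an illustration of "standard R commands", here is a minimal base-R sketch. It assumes the raw data is a table with one row per interaction between two named people; the column names `from` and `to` and the data itself are hypothetical. It maps names to integer ids and counts repeated interactions as tie weights, producing an edgelist in the i, j, w layout used on this page.

```r
# Hypothetical raw data: one row per interaction between two named people
records <- data.frame(from = c("Ann", "Bob", "Ann", "Cat", "Ann"),
                      to   = c("Bob", "Cat", "Bob", "Ann", "Cat"),
                      stringsAsFactors = FALSE)

# Map each distinct name to an integer id (1, 2, 3, ...)
labels <- sort(unique(c(records$from, records$to)))
records$i <- match(records$from, labels)
records$j <- match(records$to, labels)

# Count repeated interactions to get one weight per directed tie
records$w <- 1
net <- aggregate(w ~ i + j, data = records, FUN = sum)
net <- net[order(net$i, net$j), c("i", "j", "w")]
rownames(net) <- NULL
net
```

Keeping the `labels` vector lets you translate node ids back to names after running an analysis.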

      Best,
      Tore

  • 44. Marcus Aguiar  |  June 7, 2018 at 3:40 pm

    Dear Tore,

    I have been trying to access the website http://cdg.columbia.edu/cdg/datasets (Collective Dynamics Group), but it seems to have been down for days. Do you know if it moved or was discontinued? Thank you!
    Best,

    Marcus.

    • 45. Tore Opsahl  |  February 25, 2019 at 9:03 pm

      Hi Marcus,

      It seems that that link is dead now. I don’t know that group. Let me know if you find the site somewhere else.

      Best,
      Tore

  • 46. Giancarlo Bergamin (@GianBerg)  |  December 14, 2018 at 6:27 am

    Hey Tore!

    Thanks for the data sources!
    I am currently working with the Openflights network (14c), and I found that there are only 2939 nodes (not the stated 7976). The highest node ID is 7976, but not all IDs from 1 to 7976 are present. There are, however, 30501 edges. It took some time until I figured this out, so maybe other people have the same problem too.

    Greetings
    Giancs

    • 47. Tore Opsahl  |  December 27, 2018 at 2:58 pm

      Thanks for highlighting this, Giancs.

      The network has a number of isolates; hence the higher number of nodes than unique node ids in the edgelist.
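A quick way to see this in any edgelist is sketched below in base R, on a hypothetical toy network. It assumes nodes are numbered from 1 and compares the highest id with the number of distinct ids that take part in at least one tie. Isolates whose ids are above the highest id in the edgelist cannot be detected from the edgelist alone.

```r
# Toy edgelist in the i, j, w layout; nodes 3 and 4 never appear in a tie
net <- data.frame(i = c(1, 2, 5), j = c(2, 5, 1), w = c(1, 1, 1))

ids <- unique(c(net$i, net$j))
max(ids)       # highest id in the edgelist: 5
length(ids)    # nodes with at least one tie: 3
# Ids between 1 and max(ids) that never appear are isolates (here 3 and 4)
setdiff(seq_len(max(ids)), ids)
```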

      The network is now outdated, so it would be worth getting the data again from Openflights.org. I can see that the link doesn’t work anymore. If you re-collect this dataset, let me know and we can put it up here.

      Best,
      Tore

  • 48. AdamShaw  |  July 2, 2019 at 1:40 pm

    Hi Tore

    This is an awesome resource and I’ve found your other pages very useful, particularly your post on centrality in weighted networks.

    I was wondering if you could provide more details on Networks 8-11. I have Rob Cross’s book myself and would like to cross-reference the text with my experiments tinkering with the dataset. Would you happen to know which cases/pages in the book correspond to these datasets?

    Thanks

    A

    • 49. Tore Opsahl  |  August 20, 2019 at 9:18 pm

      Hi Adam,

      The best would be to check with Rob Cross himself regarding these networks.

      Best,
      Tore

