Shortest Paths in Weighted Networks

tnet » Weighted Networks » Shortest Paths

geodesic-n1Shortest paths or distances among nodes has long been a key element of network research. While the shortest paths often are not of interest in themselves, they are the key component of a number of measures. First, a popular question has been what is the average distance among people? The question became famous following Milgram’s six-degrees of separation experiment in 1967 that found that people in the US were on average 6-steps from any other person (see the introduction of my thesis for a more complete story and criticism). Second, the shortest distances have been used as an underlying metric in a number of measures, such as centrality ones (Freeman, 1978; Opsahl et al., 2010). One of these measures is betweenness, which is roughly equal to the number of shortest paths between others that pass through a node. This page is aimed at showing a generalisation of shortest path calculations for weighted networks that has the potential of being more accurate.

The shortest distance among nodes in a network is quite easy to calculate if you only have present or absent ties: you simply count the ties along the shortest path. If two nodes are directly connected: distance=1; and if they are not directly connected, but are connected through intermediaries, then it is the lowest number of intermediary nodes +1. For example, if the ties in this network did not have weights, the distance between node A and F would be 3. Below is a matrix showing the shortest distances among the nodes in the sample network.

   A  B  C  D  E  F
A NA  1  1  2  2  3
B  1 NA  1  1  1  2
C  1  1 NA  2  2  3
D  2  1  2 NA  2  3
E  2  1  2  2 NA  1
F  3  2  3  3  1 NA

The average shortest distance in this network is 1.8 (the mean of the matrix, diagonal excluded).

Difficulty occurs when ties are differentiated, as they are in a weighted network. For example, the tie between node A and B has twice the strength of the tie between node A and node C. This could mean that node A has more frequent contact with node B than with node C. In turn, this could imply that node A might give node B a piece of information (or a disease) twice as likely as node C. If we are looking at the diffusion of information or diseases in a network, then the speed that it travels, and routes that it takes, are clearly affected by the weights.

Let us look at the shortest path from node C to node B. The direct connection carries a weight of 1 only; however, the indirect connection through node A is composed of stronger ties. Therefore, a piece of information (or a disease) might be transmitted quicker through node A than directly.

Dijkstra (1959) proposed an algorithm that sum the cost of connections and find the path of least resistance. For example, GPS devices uses this algorithm by assigning a time-cost to each leg of the road, and then find the route that cost least in terms of time. This algorithm can also be used in social network analysis. Newman (2001) applied it to a collaboration network of scientists by inversing the tie weights (dividing 1 by the weight). This implies that a stronger tie gets a lower cost than a weaker tie. Below is the network from above with the inverted weights:
geodesic-m2

By applying Dijkstra’s algorithm, we find that the direct connection between node C and node B has a cost of 1, whereas the indirect connection via node A has a cost of 0.75 (1/2 + 1/4). Therefore, according to this algorithm, the information will travel faster through the indirect connection. Below is a matrix showing the cost of the least costly routes among the nodes in the sample network.

     A    B    C    D    E    F
A   NA 0.25 0.50 0.50 0.75 1.75
B 0.25   NA 0.75 0.25 0.50 1.50
C 0.50 0.75   NA 1.00 1.25 2.25
D 0.50 0.25 1.00   NA 0.75 1.75
E 0.75 0.50 1.25 0.75   NA 1.00
F 1.75 1.50 2.25 1.75 1.00   NA

The average of this matrix is 0.9833.

This application of Dijkstra’s algorithm is fine to use if we wish to identify the shortest paths; however, what does an average distance of 0.98 mean?

To this end, I suggest to “normalise” the weights by the average weight in the network. The weights would then be (before being inverted):
geodesic-navg1

By calculating Dijkstra’s algorithm following this normalisation, we get the following shortest distances among nodes:

       A      B      C      D      E      F
A     NA 0.5825 1.1650 1.1650 1.7475 4.0775
B 0.5825     NA 1.7475 0.5825 1.1650 3.4950
C 1.1650 1.7475     NA 2.3300 2.9125 5.2425
D 1.1650 0.5825 2.3300     NA 1.7475 4.0775
E 1.7475 1.1650 2.9125 1.7475     NA 2.3300
F 4.0775 3.4950 5.2425 4.0775 2.3300     NA

The average of this matrix is 2.29.

A unit of distance then refers to one step with the average weight in the network. The average of the matrix suggests that on average nodes are 2.29 steps with average tie weight away from each other. This measure would be comparable across networks with different ranges of tie weights.

If anyone has any suggestions for improvements, let me know!

Want to test it with your data?

This distance matrix can be calculated using tnet. First, you need to download and install tnet in R. Then, you need to create an edgelist of your network (see the data structures in tnet for weighted one-mode networks). The commands below show how the edgelist for the sample network here can manually be entered, and how to calculate the distance matrix.

# Load tnet
library(tnet)

# Load network
net  <- cbind(
i=c(1,1,2,2,2,2,3,3,4,5,5,6),
j=c(2,3,1,3,4,5,1,2,2,2,6,5),
w=c(4,2,4,1,4,2,2,1,4,2,1,1))

# Create a binary version of the network
bnet <- dichotomise_w(net)

# Calculate the average distance in the binary network
mean(distance_w(bnet),na.rm=T)

# Calculate the average distance in the weighted network
mean(distance_w(net),na.rm=T)

References

Dijkstra, E. W., 1959. A note on two problems in connexion with graphs. Numerische Mathematik 1, 269-271.

Freeman, L. C., 1978. Centrality in social networks: Conceptual clarification. Social Networks 1, 215-239.

Milgram, S., 1967. The small world problem. Psychology Today 2, 60-67.

Newman, M. E. J., 2001. Scientific collaboration networks. II. Shortest paths, weighted networks, and centrality. Physical Review E 64, 016132.

Opsahl, T., Agneessens, F., Skvoretz, J., 2010. Node centrality in weighted networks: Generalizing degree and shortest paths. Social Networks 32 (3), 245-251.

If you use any of the information in this post, please cite: Opsahl, T., Agneessens, F., Skvoretz, J., 2010. Node centrality in weighted networks: Generalizing degree and shortest paths. Social Networks 32 (3), 245-251

23 Comments Add your own

  • 1. Bhupendrasinh Thakre  |  June 18, 2012 at 4:47 pm

    Awesome articles. They are helping me a lot to build my knowledge in SNA.
    Thank you very much Dr. Opsahl.

    Reply
  • 2. Romana  |  November 1, 2013 at 2:46 pm

    First of all, thank you very much for providing all these helpful SNA resources on your blog!

    I have a question concerning the tnet package.
    While the command “closeness_w” returns the “raw” closeness score as well as the normalized score, the command “betweenness_w” only gives one kind of score (not normalized, I guess).
    Is there a specific reason why normalization is only available for closeness centrality?

    Thank you very much and best regards
    Romana

    Reply
    • 3. Tore Opsahl  |  November 4, 2013 at 5:12 pm

      Hi Romana,

      Thank you for your comment. I am not a big fan of normalized indices as the only purpose of them is to compare nodes’ scores across networks where the number of nodes and ties vary, and to do so, it makes a couple of assumptions (e.g., scaling with n*(n-1)). This might not be appropriate. If you would like to scale the network metrics, you can simply apply the method you would like.

      Best,
      Tore

      Reply
      • 4. romanariegler  |  November 4, 2013 at 5:24 pm

        Thank you for your reply! So I guess if I’m just looking at one single network (no comparisons) I can just go with the non-normalized scores of betweenness and closeness.
        BTW I’m sorry for my redunant comments. The first one wasn’t displayed to me until just now, so I thought something had gone wrong and posted a second time. Please feel free to delete one of them.

  • 5. Ali  |  February 16, 2014 at 4:42 pm

    very good article.
    Do you know any software/method can calculate the closeness centrality for a weighted directed network based on the Dijkstra’s algorithm? the commands you presented here is for unweighted network.

    Thanks,

    Reply
    • 6. Tore Opsahl  |  February 19, 2014 at 1:12 pm

      Hi Ali,

      The tnet R package can calculate this metric for you. See the code example.

      Best,
      Tore

      Reply
  • 7. Walter  |  January 6, 2016 at 6:47 am

    First, let me thank you for this very helpful blog; I found many of the entries to be highly informative and useful.
    I am using iGraph for my calculations and I find the weights issue to be a bit confusing, especially after reading this entry and trying to calculate average shortest path length for networks. I assumed that in igraph the weight of an edge is a measurement of tie strength. Therefore, higher values indicate stringer connection. However, is seems that when using Dijkstra’s algorithm in Igraph, the package treats the weights of the edges as distances, looking for paths that minimize the sum of all paths between two nodes. This seems to indicate that the algorithm sees lower weights as stronger ties (ie shorter distances). My solution will be, just for this calculation to convert every edge weight x in my edge weights list to 1/x (as with the Newman reference you provided).

    Did the problem I observed is right and do you think that solution a good approach?
    Additionally, is this a problem only in this case (and centrality measures) but not an issue in the calculation of transitivity and modularityin igraph (for example, when using random walks).

    Best,
    Walter

    Reply
    • 8. Tore Opsahl  |  January 10, 2016 at 9:13 pm

      Hi Walter!

      Glad that you find this site useful!

      Indeed the assumption that tie weights represent costs instead of strength is often leading to some incorrect findings. The distance_w-function in tnet takes the inverse of the tie weights and then utilize the igraph implementation of Dijkstra’s algorithm. I also scale the weights with the mean tie weight so that one can more easily interpret the values. I am not sure how igraph implements transitivity and modularity, so you might reach out to them directly. All the tnet functions work on the assumption of high values equal stronger connections.

      Good luck!
      Tore

      Reply
  • 9. Drahomir  |  February 28, 2016 at 8:44 pm

    Hi,

    First thanks for a lot of useful information provided here. I had 2 questions regarding the average shortest path in weighted graph, particluary if there’s a similar way to compute Diameter of graph and also to display a distribution of shortest paths (in a way like “what is the probability of choosing a path with a certain distance if picking randomly?”)

    With regards,
    Drahomir

    Reply
    • 10. Tore Opsahl  |  March 3, 2016 at 2:46 am

      Hi Drahomir,

      The diameter of a graph is simply the longest path in the network, and the probability is found by studying the non-diagonal values in the distance matrix. The code below shows how to compute both (including a plot of the probability of randomly picking a path with a particular length or less).

      Good luck!
      Tore

      # Load tnet
      library(tnet)
      
      # Load network
      net  <- cbind(
        i=c(1,1,2,2,2,2,3,3,4,5,5,6),
        j=c(2,3,1,3,4,5,1,2,2,2,6,5),
        w=c(4,2,4,1,4,2,2,1,4,2,1,1))
      
      # Compute distances
      dmat <- distance_w(net)
      
      # Diameter
      max(dmat, na.rm=TRUE)
      
      # Vectorize distance matrix and take out diagonal NAs
      dvec <- as.vector(dmat)
      dvec <- dvec[!is.na(dvec)]
      
      # Plot histogram / distribution
      hist(dvec)
      
      # Cumulative distribution
      dCumuDist <- hist(dvec)
      plot(dCumuDist$mids, cumsum(dCumuDist$density), type="l", xlab="distance", ylab="cumulative probability")
      
      Reply
      • 11. Drahomir  |  March 3, 2016 at 8:20 pm

        Thank you very much Dr. Opsahl, really appreciating your kind help.

  • 12. Drahomir  |  March 20, 2016 at 10:11 pm

    Hello again Dr. Opsahl

    I’ve recently come about 2 uncertaineties. I’ve analyzed a graph consisting of 157 nodes. When i compute a matrix of weighted distances (shorthest paths) between nodes my matrix is 141×141 not 157×157 (as I understand it, this is mainly because of existing separated dyads (pairs of 2 nodes connected only to each other and disconnected from all other nodes in graph). There are 7 of these separated pairs in my graph. Could this be a reason for recieving 141×141 matrix and not the 157×157 matrix?

    Secondly. If i’m looking at the specific value of distance in matrix lets say row = 28 and collumn = 31, is this value a distance between nodes (28-31) that were originaly inputed in the analyzed dataframe? In other words, is row number 28 representing node number 28 in original data?

    Thanks for any advice,
    With regards
    Drahomír

    Reply
    • 13. Tore Opsahl  |  March 25, 2016 at 12:53 am

      Hi Drahomír,

      Glad you are able to use tnet. To answer your questions in order:

      1) The distance matrix is only from the main / giant component of the network. If you would like it for all nodes, you must specify gconly=FALSE in the distance_w-function.

      2) The node ids of the rows and columns in the distance matrix when gconly=TRUE is saved as an attribute of the output from the distance_w-function. For example, if the output object is called dmat, the node list can be found by typing attributes(dmat)$nodes.

      Good luck!
      Tore

      Reply
      • 14. Drahomir  |  April 12, 2016 at 2:08 am

        Thanks Dr. Opsahl

        Unfortunately I’m still a bit confused.

        Here’s my problem. Nodes in my graph represent animal names, produced in category/semantic fluency test. 2 most strongly connected nodes are “cat” and “dog”. I cannot think of any other shortest path between these two nodes than the direct one, as this is the path with highest weight in graph. Node “cat” was numericaly labeled as 1 and node “dog” as 2. When i lookup shorthest path between 1 and 2 in dmat matrix the value is 2.74 and this doesn’t make any sense to me.

        I’m definitely missing something, would be glad for any further guidance.

      • 15. Tore Opsahl  |  April 13, 2016 at 5:04 am

        Hi Drahomir,

        Please send me an email with your data and code, and I will have a look at what is going on.

        Tore

      • 16. Drahomir  |  April 17, 2016 at 10:47 am

        Hi, I’m really appreciating your willingness to help, but I’ve already figured it out. Error was in my data input, which didn’t include raw weights but already normalized, so they were normalized once again by the package.

  • 17. Elio Bartos  |  April 1, 2016 at 2:33 pm

    Hello,
    thanks for this post, I’m just getting started with SNA and tnet package.
    1. Is there a way to calculate shortest distances like in example 1. (without normalisation) with tnet package?
    2. Does measures like closeness use normalised shortest distances or just shortest distances?

    Thank you!

    Reply
    • 18. Tore Opsahl  |  April 1, 2016 at 3:11 pm

      Hi Elio,

      1) The “normalization” of ties weights is a constant applied to all weights. You can undo this normalization with the mean time weight. Have a look at the function by typing distance_w without the (). You can then copy this code and comment out the normalization line.

      2) The functions rely on the normalized distances. But it’s key to remember that the same paths are identified as the shortest ones with and without normalization.

      Hope this helps!
      Tore

      Reply
  • 19. Prem  |  July 14, 2016 at 11:29 am

    Hello Tore
    I am using tnet package for my weighted network analysis. I am little bit confuse for average path length of disconnected graphs. After breakdown my network divided into 10 part, where i am considering the sub graph have more than 5 nodes, most of the algorithm is to calculate the mean path length by considering the number of nodes from giant component only. But in a power grid network , each sub graph is a part of the network even if there is no link for a certain period of time, so i wish to consider all the nodes present in disconnected sub graph, and wants to calculated average path length based on total nodes, am i right or wrong?

    Reply
  • 21. Prem  |  July 16, 2016 at 12:19 pm

    thank you Tore

    Reply
  • 22. Gab (@red_peque)  |  October 23, 2016 at 9:59 pm

    Hello Dr. Opsahl!

    First of all, many thanks for tnet and all the information in your blog.

    Sorry, I am having a problem when trying to get distances in a weighted directed network. I am inputting data that contains 304 nodes and 710 edges. 300 nodes are connected to the giant component and only 4 are not. I was expecting to get a matrix of 300*300 (or 304*304) but I am only getting 66*66 (when using “gconly=TRUE”). I tried to write the “distance_w” function with “gconly=FALSE” but I get a message saying that R has encountered a fatal error and it needs to shut down. Any clue of what I might be doing wrong? (My weights take the values of 1, 2, 2.5 and 3 – So not sure if the ‘.5’ is creating an issue)

    Many thanks!
    Gabriela

    Reply
    • 23. Tore Opsahl  |  October 26, 2016 at 5:12 pm

      Hi Gabriela,

      Glad you are finding the information useful.

      I am not entirely sure what is happening in your case. Please drop me a note with your network and code. Then I can better help you.

      The 0.5 weight should not be an issue.

      Best,
      Tore

      Reply

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Subscribe to the comments via RSS Feed


%d bloggers like this: