Shortest Paths in Weighted Networks
Shortest paths or distances among nodes has long been a key element of network research. While the shortest paths often are not of interest in themselves, they are the key component of a number of measures. First, a popular question has been what is the average distance among people? The question became famous following Milgram’s six-degrees of separation experiment in 1967 that found that people in the US were on average 6-steps from any other person (see the introduction of my thesis for a more complete story and criticism). Second, the shortest distances have been used as an underlying metric in a number of measures, such as centrality ones (Freeman, 1978; Opsahl et al., 2010). One of these measures is betweenness, which is roughly equal to the number of shortest paths between others that pass through a node. This page is aimed at showing a generalisation of shortest path calculations for weighted networks that has the potential of being more accurate.
The shortest distance among nodes in a network is quite easy to calculate if you only have present or absent ties: you simply count the ties along the shortest path. If two nodes are directly connected: distance=1; and if they are not directly connected, but are connected through intermediaries, then it is the lowest number of intermediary nodes +1. For example, if the ties in this network did not have weights, the distance between node A and F would be 3. Below is a matrix showing the shortest distances among the nodes in the sample network.
A B C D E F A NA 1 1 2 2 3 B 1 NA 1 1 1 2 C 1 1 NA 2 2 3 D 2 1 2 NA 2 3 E 2 1 2 2 NA 1 F 3 2 3 3 1 NA
The average shortest distance in this network is 1.8 (the mean of the matrix, diagonal excluded).
Difficulty occurs when ties are differentiated, as they are in a weighted network. For example, the tie between node A and B has twice the strength of the tie between node A and node C. This could mean that node A has more frequent contact with node B than with node C. In turn, this could imply that node A might give node B a piece of information (or a disease) twice as likely as node C. If we are looking at the diffusion of information or diseases in a network, then the speed that it travels, and routes that it takes, are clearly affected by the weights.
Let us look at the shortest path from node C to node B. The direct connection carries a weight of 1 only; however, the indirect connection through node A is composed of stronger ties. Therefore, a piece of information (or a disease) might be transmitted quicker through node A than directly.
Dijkstra (1959) proposed an algorithm that sum the cost of connections and find the path of least resistance. For example, GPS devices uses this algorithm by assigning a time-cost to each leg of the road, and then find the route that cost least in terms of time. This algorithm can also be used in social network analysis. Newman (2001) applied it to a collaboration network of scientists by inversing the tie weights (dividing 1 by the weight). This implies that a stronger tie gets a lower cost than a weaker tie. Below is the network from above with the inverted weights:
By applying Dijkstra’s algorithm, we find that the direct connection between node C and node B has a cost of 1, whereas the indirect connection via node A has a cost of 0.75 (1/2 + 1/4). Therefore, according to this algorithm, the information will travel faster through the indirect connection. Below is a matrix showing the cost of the least costly routes among the nodes in the sample network.
A B C D E F A NA 0.25 0.50 0.50 0.75 1.75 B 0.25 NA 0.75 0.25 0.50 1.50 C 0.50 0.75 NA 1.00 1.25 2.25 D 0.50 0.25 1.00 NA 0.75 1.75 E 0.75 0.50 1.25 0.75 NA 1.00 F 1.75 1.50 2.25 1.75 1.00 NA
The average of this matrix is 0.9833.
This application of Dijkstra’s algorithm is fine to use if we wish to identify the shortest paths; however, what does an average distance of 0.98 mean?
To this end, I suggest to “normalise” the weights by the average weight in the network. The weights would then be (before being inverted):
By calculating Dijkstra’s algorithm following this normalisation, we get the following shortest distances among nodes:
A B C D E F A NA 0.5825 1.1650 1.1650 1.7475 4.0775 B 0.5825 NA 1.7475 0.5825 1.1650 3.4950 C 1.1650 1.7475 NA 2.3300 2.9125 5.2425 D 1.1650 0.5825 2.3300 NA 1.7475 4.0775 E 1.7475 1.1650 2.9125 1.7475 NA 2.3300 F 4.0775 3.4950 5.2425 4.0775 2.3300 NA
The average of this matrix is 2.29.
A unit of distance then refers to one step with the average weight in the network. The average of the matrix suggests that on average nodes are 2.29 steps with average tie weight away from each other. This measure would be comparable across networks with different ranges of tie weights.
If anyone has any suggestions for improvements, let me know!
Want to test it with your data?
This distance matrix can be calculated using tnet. First, you need to download and install tnet in R. Then, you need to create an edgelist of your network (see the data structures in tnet for weighted one-mode networks). The commands below show how the edgelist for the sample network here can manually be entered, and how to calculate the distance matrix.
# Load tnet library(tnet) # Load network net <- cbind( i=c(1,1,2,2,2,2,3,3,4,5,5,6), j=c(2,3,1,3,4,5,1,2,2,2,6,5), w=c(4,2,4,1,4,2,2,1,4,2,1,1)) # Create a binary version of the network bnet <- dichotomise_w(net) # Calculate the average distance in the binary network mean(distance_w(bnet),na.rm=T) # Calculate the average distance in the weighted network mean(distance_w(net),na.rm=T)
Dijkstra, E. W., 1959. A note on two problems in connexion with graphs. Numerische Mathematik 1, 269-271.
Freeman, L. C., 1978. Centrality in social networks: Conceptual clarification. Social Networks 1, 215-239.
Milgram, S., 1967. The small world problem. Psychology Today 2, 60-67.
Newman, M. E. J., 2001. Scientific collaboration networks. II. Shortest paths, weighted networks, and centrality. Physical Review E 64, 016132.
Opsahl, T., Agneessens, F., Skvoretz, J., 2010. Node centrality in weighted networks: Generalizing degree and shortest paths. Social Networks 32 (3), 245-251.