Average shortest distance in weighted networks

January 9, 2009

Shortest distances has long been a key element of network research. First, it has been used a global measure: what is the average shortest distance among nodes in a network? The global perspective became famous following Milgram’s six-degrees of separation experiment in 1967 that found that people in the US were on average 6-steps from any other person (see the introduction of my thesis for a more complete story and criticism).

Second, the shortest distances have been used as an underlying metric in a number of measures, such as centrality ones (Freeman, 1978). One of these measures is closeness, which is inversely proportional to the distance from a node to all other nodes in the network. A number of researchers has found an association between the closeness to others and performance – but this post is not about applications, but about the global average shortest distances.

geodesic-n1The shortest distance among nodes in a network is quite easy to calculate if you only have present or absent ties: you simply count the ties along the shortest path. If two nodes are directly connected: distance=1; and if they are not directly connected, but are connected through intermediaries, then it is the lowest number of intermediary nodes +1. For example, if the ties in this network did not have weights, the distance between node A and F would be 3. Below is a matrix showing the shortest distances among the nodes in the sample network.

   A  B  C  D  E  F
A NA  1  1  2  2  3
B  1 NA  1  1  1  2
C  1  1 NA  2  2  3
D  2  1  2 NA  2  3
E  2  1  2  2 NA  1
F  3  2  3  3  1 NA

The average shortest distance in this network is 1.8 (the mean of the matrix, diagonal excluded).

Difficulty occurs when ties are differentiated, as they are in a weighted network. For example, the tie between node A and B has twice the strength of the tie between node A and node C. This could mean that node A has more frequent contact with node B than with node C. In turn, this could imply that node A might give node B a piece of information (or a disease) twice as likely as node C. If we are looking at the diffusion of information or diseases in a network, then the speed that it travels, and routes that it takes, are clearly affected by the weights.

Let us look at the shortest path from node C to node B. The direct connection carries a weight of 1 only; however, the indirect connection through node A is composed of stronger ties. Therefore, a piece of information (or a disease) might be transmitted quicker through node A than directly.

Dijkstra (1959) proposed an algorithm that sum the cost of connections and find the path of least resistance. For example, GPS devices uses this algorithm by assigning a time-cost to each leg of the road, and then find the route that cost least in terms of time. This algorithm can also be used in social network analysis. Newman (2001) applied it to a collaboration network of scientists by inversing the tie weights (dividing 1 by the weight). This implies that a stronger tie gets a lower cost than a weaker tie. Below is the network from above with the inverted weights:
geodesic-m2

By applying Dijkstra’s algorithm, we find that the direct connection between node C and node B has a cost of 1, whereas the indirect connection via node A has a cost of 0.75 (1/2 + 1/4). Therefore, according to this algorithm, the information will travel faster through the indirect connection. Below is a matrix showing the cost of the least costly routes among the nodes in the sample network.

     A    B    C    D    E    F
A   NA 0.25 0.50 0.50 0.75 1.75
B 0.25   NA 0.75 0.25 0.50 1.50
C 0.50 0.75   NA 1.00 1.25 2.25
D 0.50 0.25 1.00   NA 0.75 1.75
E 0.75 0.50 1.25 0.75   NA 1.00
F 1.75 1.50 2.25 1.75 1.00   NA

The average of this matrix is 0.9833.

This application of Dijkstra’s algorithm is fine to use if we wish to rank the nodes in terms of closeness to others (node A is closer to the others than node C because it has a stronger tie to node B); however, what does an average distance of 0.98 mean?

To this end, I suggest to “normalise” the weights by the average weight in the network. The weights would then be:
geodesic-navg1

By calculating Dijkstra’s algorithm using this normalisation, we get the following shortest distances among nodes:

       A      B      C      D      E      F
A     NA 0.5825 1.1650 1.1650 1.7475 4.0775
B 0.5825     NA 1.7475 0.5825 1.1650 3.4950
C 1.1650 1.7475     NA 2.3300 2.9125 5.2425
D 1.1650 0.5825 2.3300     NA 1.7475 4.0775
E 1.7475 1.1650 2.9125 1.7475     NA 2.3300
F 4.0775 3.4950 5.2425 4.0775 2.3300     NA

The average of this matrix is 2.29.

A unit of distance then refers to one step with the average weight in the network. The average of the matrix suggests that on average nodes are 2.29 steps with average tie weight away from each other. This measure would be comparable across networks with different ranges of tie weights.

If anyone has any suggestions for improvements, let me know!

Want to test it with your data?

First, you need to download tnet and install it in R.

You need to create an edgelist of your network (see the tnet documentation). The edgelist for the sample network here can be entered into R using the following command:

net  <- cbind(
i=c(1,1,2,2,2,2,3,3,4,5,5,6),
j=c(2,3,1,3,4,5,1,2,2,2,6,5),
w=c(4,2,4,1,4,2,2,1,4,2,1,1))

Then you can write the following commands:

# Load tnet
library(tnet)

# Create a binary version of the network
bnet <- dichotomise(net)

# Calculate the average distance in the binary network
mean(distance_w(bnet),na.rm=T)

# Calculate the average distance in the weighted network
mean(distance_w(net),na.rm=T)

References

Dijkstra, E. W., 1959. A note on two problems in connexion with graphs. Numerische Mathematik 1, 269-271.

Freeman, L. C., 1978. Centrality in social networks: Conceptual clarification. Social Networks 1, 215-239.

Milgram, S., 1967. The small world problem. Psychology Today 2, 60-67.

Newman, M. E. J., 2001. Scientific collaboration networks. II. Shortest paths, weighted networks, and centrality. Physical Review E 64, 016132.

Please cite or link to this post if you use it.

Entry Filed under: Network thoughts. Tags: , , , , , , , , , .

2 Comments Add your own

  • 1. esagor  |  January 9, 2009 at 2:46 pm

    Tore, this is great stuff. I’m also working on dissertation research (pretty early in the process) and love the way you’ve organized your site. I look forward to reading future posts and seeing how your work proceeds.

    Reply
  • 2. kaikuo  |  May 5, 2009 at 12:36 pm

    your posts are really helpful:) I am working on my dissertation too

    Reply

Leave a Comment

Required

Required, hidden

Some HTML allowed:
<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <pre> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>

Trackback this post  |  Subscribe to the comments via RSS Feed


Welcome

Tore OpsahlMy aim for this blog is to explore and throw out in the open some of the ideas about social network analysis that I have, but no time to implement. Many of my ideas stem from my interest in weighted networks and my belief that the weights are an enormous source of data. However, many social network measures require that the weights are discarded. In so doing, the richness of the data is considerably reduced. In turn, this limits the analysis.

Recent Posts

Upcoming Posts

Creating an ensemble of binary networks from a weighted one

Closeness in weighted networks

tnet: Software for Analysing Two-Mode Networks

Links

Feeds

Licensing

The information on this blog is published under the Creative Commons Attribution-Noncommercial 3.0-lisence.

This means that you are free to:
· share (copy, distribute and transmit)
· remix (adapt)
under the following conditions:
· attribution (you must cite this blog)
· noncommercial (you may not use it for
   commercial purposes).