Closeness centrality in networks with disconnected components
March 20, 2010 at 1:03 pm 16 comments
A key node centrality measure in networks is closeness centrality (Freeman, 1978; Opsahl et al., 2010; Wasserman and Faust, 1994). It is defined as the inverse of farness, which in turn, is the sum of distances to all other nodes. As the distance between nodes in disconnected components of a network is infinite, this measure cannot be applied to networks with disconnected components (Opsahl et al., 2010; Wasserman and Faust, 1994). This post highlights a possible work-around, which allows the measure to be applied to these networks and at the same time maintain the original idea behind the measure.
This network gives a concrete example of the closeness measure. The distance between node G and node H is infinite as a direct or indirect path does not exist between them (i.e., they belong to separate components). As long as at least one node is unreachable by the others, the sum of distances to all other nodes is infinite. As a consequence, researchers have limited the closeness measure to the largest component of nodes (i.e., measured intra-component). The distance matrix for the nodes in the sample network is:
| Nodes | All inclusive | Intra-component | ||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | B | C | D | E | F | G | H | I | J | K | Farness | Closeness | Farness | Closeness | ||
| A | … | 1 | 1 | 2 | 2 | 3 | 3 | Inf | Inf | Inf | Inf | Inf | Inf | 12 | 0.08 | |
| B | 1 | … | 1 | 2 | 1 | 2 | 3 | Inf | Inf | Inf | Inf | Inf | Inf | 10 | 0.10 | |
| C | 1 | 1 | … | 1 | 2 | 2 | 2 | Inf | Inf | Inf | Inf | Inf | Inf | 9 | 0.11 | |
| D | 2 | 2 | 1 | … | 2 | 1 | 1 | Inf | Inf | Inf | Inf | Inf | Inf | 9 | 0.11 | |
| E | 2 | 1 | 2 | 2 | … | 1 | 3 | Inf | Inf | Inf | Inf | Inf | Inf | 11 | 0.09 | |
| F | 3 | 2 | 2 | 1 | 1 | … | 2 | Inf | Inf | Inf | Inf | Inf | Inf | 11 | 0.09 | |
| G | 3 | 3 | 2 | 1 | 3 | 2 | … | Inf | Inf | Inf | Inf | Inf | Inf | 14 | 0.07 | |
| H | Inf | Inf | Inf | Inf | Inf | Inf | Inf | … | 1 | 2 | Inf | Inf | Inf | 3 | 0.33 | |
| I | Inf | Inf | Inf | Inf | Inf | Inf | Inf | 1 | … | 1 | Inf | Inf | Inf | 2 | 0.50 | |
| J | Inf | Inf | Inf | Inf | Inf | Inf | Inf | 2 | 1 | … | Inf | Inf | Inf | 3 | 0.33 | |
| K | Inf | Inf | Inf | Inf | Inf | Inf | Inf | Inf | Inf | Inf | … | Inf | Inf | 0 | … | |
Although the intra-component closeness scores are not infinite for all the nodes in the network, it would be inaccurate to use them as a closeness measure. This is due to the fact that the sum of distances would contain different number of paths (e.g., there are two distance from node H to other nodes in its component, while there are six distances from node G to other nodes in its component). In fact, nodes in smaller components would generally be seen as being closer to others than nodes in larger components. Thus, researchers has focused solely on the largest component. However, this leads to a number of methodological issues, including sample selection.
To develop this measure, I went back to the original equation:
where is the focal node,
is another node in the network, and
is the shortest distance between these two nodes. In this equation, the distances are inversed after they have been summed, and when summing an infinite number, the outcome is infinite. To overcome this issue while staying consistent with the existing measure of closeness, I took advantage of the fact that the limit of a number divided by infinity is zero. Although infinity is not an exact number, the inverse of a very high number is very close to 0. In fact, 0 is returned if you enter 1/Inf in the statistical programme R. By taking advantage of this feature, it is possible to rewrite the closeness equation as the sum of inversed distances to all other nodes instead of the inversed of the sum of distances to all other nodes. The equation would then be:
To exemplify this change, for the example network above, the inversed distances and closeness scores are:
| Nodes | Closeness | |||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | B | C | D | E | F | G | H | I | J | K | Sum | Normalized | ||
| A | … | 1.00 | 1.00 | 0.50 | 0.50 | 0.33 | 0.33 | 0 | 0 | 0 | 0 | 3.67 | 0.37 | |
| B | 1.00 | … | 1.00 | 0.50 | 1.00 | 0.50 | 0.33 | 0 | 0 | 0 | 0 | 4.33 | 0.43 | |
| C | 1.00 | 1.00 | … | 1.00 | 0.50 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 4.50 | 0.45 | |
| D | 0.50 | 0.50 | 1.00 | … | 0.50 | 1.00 | 1.00 | 0 | 0 | 0 | 0 | 4.50 | 0.45 | |
| E | 0.50 | 1.00 | 0.50 | 0.50 | … | 1.00 | 0.33 | 0 | 0 | 0 | 0 | 3.83 | 0.38 | |
| F | 0.33 | 0.50 | 0.50 | 1.00 | 1.00 | … | 0.50 | 0 | 0 | 0 | 0 | 3.83 | 0.38 | |
| G | 0.33 | 0.33 | 0.50 | 1.00 | 0.33 | 0.50 | … | 0 | 0 | 0 | 0 | 3.00 | 0.30 | |
| H | 0 | 0 | 0 | 0 | 0 | 0 | 0 | … | 1.00 | 0.50 | 0 | 1.50 | 0.15 | |
| I | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.00 | … | 1.00 | 0 | 2 | 0.20 | |
| J | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 1.00 | … | 0 | 1.50 | 0.15 | |
| K | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | … | 0 | 0 | |
As can be seen from this table, a closeness score is attained for all nodes taking into consideration an equal number of distances for each node irrespective of the size of the nodes’ component. Moreover, nodes belonging to a larger component generally attains a higher score. This is deliberate as these nodes can reach a greater number of others than nodes in smaller components. The normalized scores are bound between 0 and 1. It is 0 if a node is an isolate, and 1 if a node is directly connected all others.
This measure can easily be extended to weighted networks by introducing Dijkstra’s (1959) algorithm as proposed in Average shortest distance in weighted networks.
References
Dijkstra, E. W., 1959. A note on two problems in connexion with graphs. Numerische Mathematik 1, 269-271.
Freeman, L. C., 1978. Centrality in social networks: Conceptual clarification. Social Networks 1, 215-239.
Opsahl, T., Agneessens, F., Skvoretz, J. (2010). Node centrality in weighted networks: Generalizing degree and shortest paths. Social Networks 32, 245-251.
Wasserman, S., Faust, K., 1994. Social Network Analysis: Methods and Applications. Cambridge University Press, New York, NY.
What to try it with your data?
Below is the code to calculate the closeness measure on a binary version of Freeman’s third EIES network.
# Load tnet library(tnet) # Load network # Node K is assigned node id 8 instead of 10 as isolates at the end of id sequences are not recorded in edgelists net <- cbind( i=c(1,1,2,2,2,3,3,3,4,4,4,5,5,6,6,7,9,10,10,11), j=c(2,3,1,3,5,1,2,4,3,6,7,2,6,4,5,4,10,9,11,10), w=c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1)) # Calculate measures closeness_w(net, gconly=FALSE)
Entry filed under: Network thoughts. Tags: actors, arcs, centrality, closeness, complex networks, directed networks, edges, global, graphs, hubs, Links, local, network, nodes, shortest distance, shortest path, social network analysis, strength of ties, ties, undirected networks, valued networks, vertices, weighted networks.
RSS feed
1.
Wolfgang Weber | September 30, 2010 at 12:48 pm
Hi Tore,
i have a question about the definition of closeness. I thought the definition is
In your example for your new solution with inversed distances and the normalized closeness you seem to use this definition in an adapted way
quasi inversed distances and re-inversed closeness, but in the first example (intra-component) its simply 1/(sum of distances).
I’m only an amateur, so please don’t be too mathematically in your answers ;-)
Wolfgang
2.
Tore Opsahl | September 30, 2010 at 3:43 pm
Wolfgang,
What you are talking about is the normalisation of closeness scores. A normalisation procedure is simply ensuring that scores are bound between 0 and 1. If you divide positive scores by its theoretical maximum, you will achieve this.
I am not a fan of normalisation as (1) it does not increase the variance among scores if you only analyse one network or networks of similar size (i.e., multiplying all scores with a constant), and (2) it is questionable whether the sum of all distances scale linearly with the number of nodes (see the small-world literature on this topic). As a result, I have not used normalised scores.
Hope this helps,
Tore
3.
Manal Rayess | January 4, 2011 at 9:30 am
Hi Tore,
tnet outputs the normailzed closeness as well, however the tutorial mentions that the output is a data.frame with two columns, node ids and closeness scores. Can you please just indicate in the tutorial that a third column (n.closeness) is output as well?
Thanks and regards.
4.
Tore Opsahl | January 4, 2011 at 9:40 am
Manal,
The third column in the normalised closeness scores (i.e., the closeness scores divided by n-1). This column is only added when gconly=FALSE. But there is no reason why it is not computed when gconly=TRUE. Will add this in the upcoming version of tnet, and change the manual. Thanks for noticing.
Best,
Tore
5.
Elizabeth Hobson | November 9, 2011 at 8:08 pm
Hi Tore,
I am comparing two networks of slightly different sizes (n=21 & n=19) and would like to normalize the closeness scores to facilitate this comparison. Since the networks are very similar in size, I don’t think I have to worry about small world scaling issues. My question has to do with the normalized closeness data. When tnet outputs closeness alpha=0, the normalized values are bounded between 0 and 1 as expected. However, if I run closeness with alpha=0.5 or 1, the normalized values exceed 1 (I get values up to 1.29). This is driven by nonnormalized closeness values that exceed n-1. For example, in one case I have n-1=20 and one node with a closeness score of 24.5 (when alpha=0.5). Does your normalization procedure only apply to closeness when using alpha=0? Could you suggest a way to normalize closeness for alpha=0.5?
Thanks for your help,
Liz
6.
Tore Opsahl | November 9, 2011 at 10:37 pm
Hi Liz,
The non-alpha=0-measures do not have a fixed maximum. As such, it is difficult to normalise the measures. Unfortunately, I do not know of a way to normalize the non-binary scores. If you find one, do let me know!
Best,
Tore
7.
sadia shah | April 19, 2011 at 9:29 am
Tore,
I am using this approach for a directed network….and i come across cases where a node X cannot be reached by another node Z because although connections between intermediate nodes (say Y) exist but not in both directions…shall i consider that the distance X and Z will be infinity?
i m waiting for a quick reply :-)
Regards,
Sadia.
8.
Tore Opsahl | April 19, 2011 at 8:25 pm
Sadia,
Great that you are fining this method interesting and applicable!
The traditional closeness measure requires all nodes to be mutally reachable. The above procedure does not have this requirement.
The distance from one node to another in a directed network might be different from the distance from the latter to the former node. The distance calculation in a directed network generally assumes that paths follow ties direction (e.g., if a has a tie with b, and b has a tie with c, the there is a path from a to c, but not from c to a). The distance_w and closeness_w-functions in tnet use this procedure.
Hope this helps,
Tore
9.
sadia shah | April 20, 2011 at 10:05 am
Thank you for noticing this comment and replying to it so quickly:)…Yes it did help…..
I need one further guidance related to the dataset i am using. it is an email network which is weighted,directed and has disconnected components…….I have some email sender nodes but their recipients are missing………
for example node X send 2 or say 3 very important emails but i do not know who were the recipients……Of course i can not deny their existance………..what could be done?
Can u suggest something?
Regards,
Sadia.
10.
Tore Opsahl | April 24, 2011 at 11:10 pm
Sadia,
An always interesting, but sometimes forgotten concept in network analysis, is the boundary of the network. Unfortunately, few, if none, network measures are able to incorporate missing nodes. Let me know how you deal with this issue.
Best,
Tore
11.
sadia shah | May 31, 2011 at 6:35 am
Tore,
I have a small issue…….while calculating the average closeness of all nodes, can i remove nodes having 0 closeness with the rest of the network by considering them to as isolated nodes? e.g. from the above network, can i remove node K while finding average?
waiting for a reply.
Sadia.
12.
Tore Opsahl | May 31, 2011 at 11:06 am
Sadia,
If you save the output from the closeness_w-function as an object called out, then you can extract the rows of out where closeness is greater than 0, and calculate the mean of the closeness column. Below is some sample code that could replace the last line in the code in the blog post.
Best,
Tore
13.
sadia shah | June 8, 2011 at 2:07 pm
tore,
thank u for the help….can u explain:
what will be the possible effect of removing “0″ closeness nodes on the mean closeness of the network?
or can u recommend any other resource from where i can read or get some theoretical guidence?
ur replies always raise new questions in my mind:)
regards,
sadia.
14.
Tore Opsahl | June 9, 2011 at 11:33 am
Sadia,
By removing the nodes with a score of 0, you will increase the mean. However, this is more a question of the boundary of the analysis/network. Should isolates be included? If yes, then the 0 scores should be included. If not, then they should be removed.
Best,
Tore
15.
Chavdar Dangalchev | September 19, 2011 at 2:44 pm
Hi Tore,
How your definition is different from the definition used in:
“Latora V., Marchiori M., Efficient behavior of small-world networks,
Physical Review Letters, V. 87, p. 19, 2001.”
?
Shouldn’t you start quoting Latora and Marchiori?
Regards,
Chavdar
16.
Tore Opsahl | September 19, 2011 at 4:02 pm
Hi Chavdar,
Thank you for guiding me to this article. It is very interesting how they created a unifying small-world measure. This is something I have been thinking about for quite some time.
In this post, I focused on centrality, or more specifically, node closeness scores. You are absolutely right that the inverse of geodesic distances were also taken in Latora and Marchiori (2001); however, they did so from a different background (small-world literature) to reach a very different outcome (i.e., understanding the overall function of the network). The path of research that I was following originated with Freeman’s (1978) work on centrality. In fact, it is worth noting that the terms closeness and centrality are not even mentioned in Latora and Marchiori (2001).
The proposed measure by Latora and Marchiori (2001) enables an assessment of the connectedness of a network. Although I don’t think that the normalisation using n*(n-1) is appropriate as the small-world literature has told us that geodesic distance does not scale with n-squared, it does show how a measure to test for the existence of a backbone in networks could be created. In fact, it is exactly this where I believe the paper is contributing to the literature.
Thanks again for pointing me to this paper!
Tore