## Thesis: 2.4 Directed networks

An article based on this chapter has been published (Opsahl and Panzarasa, 2009). This article was written after this chapter and contains a number of changes.

Directed data increase the difficulty of calculating the clustering coefficient. In directed networks, two directed ties might exist within a dyad, one in each direction. A network can be represented by an adjacency matrix, x. If a directed tie from node i to node j is present, the cell $x_{ij}=1$. This matrix is a special case of the weighted adjacency matrix, w, in which the cell $w_{ij}$ is the weight of the tie $x_{ij}$. We define the triplet consisting of the two directed ties, $x_{ji}$ and $x_{ik}$, as $\tau_{ji,ik}$, and the value of this triplet as $\omega_{ji,ik}$.

The binary clustering coefficient as stated in Equation 2 cannot be applied to directed data. A more refined measure of clustering in directed networks is transitivity, T (for a review, see Wasserman and Faust, 1994, p. 243). Transitivity produces the same results as the binary clustering coefficient if applied to an undirected network (Feld, 1981; Newman, 2003) and shares the same properties. More specifically, $0 \leq T \leq 1$, in a completely connected network $T=1$, and in a classical random network $T\rightarrow 0$ as the network size grows if the average degree is constant. Transitivity takes the direction of the ties into consideration by using a more sophisticated definition of a triplet. A triplet $\tau$, centred on node i, must have one in-coming and one out-going tie, i.e., $x_{ki}=x_{ij}=1$ or $x_{ji}=x_{ik}=1$, as shown in Figures 3a and 3b, respectively.

Figure 3: Non-vacuous triplets centred around node i

Wasserman and Faust (1994) termed triplets that do not fulfil this condition vacuous. These triplets are part of neither the numerator nor the denominator of the fraction. More specifically, when we are dealing with directed data, there can be four basic configurations of a triplet around an individual node i: $\tau_{ij,ik}$, $\tau_{ij,ki}$, $\tau_{ji,ik}$, and $\tau_{ji,ki}$. The configurations $\tau_{ij,ik}$ and $\tau_{ji,ki}$ form an out-star and an in-star, respectively, and are therefore vacuous and not part of the fraction. Conversely, the configurations $\tau_{ij,ki}$ and $\tau_{ji,ik}$ are non-vacuous. These can be either transitive or intransitive.
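As an illustration only, the four basic configurations can be classified with a few lines of Python. The function name `classify_triplet` and the representation of a directed tie as a `(source, target)` tuple are assumptions made for this sketch; they are not notation from the chapter.

```python
def classify_triplet(tie_a, tie_b, centre):
    """Classify two directed ties that share the node `centre`.

    Each tie is a hypothetical (source, target) tuple. Returns
    'out-star' or 'in-star' (both vacuous) or '2-path' (non-vacuous).
    """
    for tie in (tie_a, tie_b):
        if centre not in tie:
            raise ValueError(f"tie {tie} does not involve node {centre!r}")

    a_leaves_centre = tie_a[0] == centre
    b_leaves_centre = tie_b[0] == centre

    if a_leaves_centre and b_leaves_centre:
        return "out-star"   # both ties originate from the centre
    if not a_leaves_centre and not b_leaves_centre:
        return "in-star"    # both ties terminate at the centre
    return "2-path"         # one in-coming and one out-going tie


def is_vacuous(tie_a, tie_b, centre):
    """A triplet is vacuous unless it forms a 2-path through the centre."""
    return classify_triplet(tie_a, tie_b, centre) != "2-path"


# The four basic configurations around node "i"
print(classify_triplet(("i", "j"), ("i", "k"), "i"))  # tau_{ij,ik}
print(classify_triplet(("i", "j"), ("k", "i"), "i"))  # tau_{ij,ki}
print(classify_triplet(("j", "i"), ("i", "k"), "i"))  # tau_{ji,ik}
print(classify_triplet(("j", "i"), ("k", "i"), "i"))  # tau_{ji,ki}
```

Only the two configurations reported as `2-path` would enter the fraction; the two stars are skipped.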

Triplets defined according to Wasserman and Faust (1994) form chains of nodes. These triplets have been termed 2-paths, as they form chains of two directed ties between three nodes (Luce and Perry, 1949). A triplet is transitive if a tie is present from the first node to the last node of the chain, that is, $x_{kj}=1$ for the triplet shown in Figure 3a and $x_{jk}=1$ for the triplet shown in Figure 3b. If some of the ties between the nodes in a triplet are reciprocated (i.e., there exist two directed ties within a single dyad), there might exist multiple triplets between the nodes.
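A minimal sketch of binary transitivity follows, assuming that T is the number of transitive 2-paths divided by the number of all 2-paths. The function name `transitivity` and the nested-list adjacency matrix are choices made for this example only.

```python
def transitivity(x):
    """Binary transitivity of a directed network.

    `x` is a square nested list with x[a][b] == 1 if a tie runs from
    node a to node b. Every 2-path j -> i -> k (three distinct nodes)
    is counted once; it is transitive if the tie j -> k is present.
    """
    n = len(x)
    transitive = 0
    total = 0
    for i in range(n):              # centre of the triplet
        for j in range(n):          # first node of the chain
            if j == i or not x[j][i]:
                continue
            for k in range(n):      # last node of the chain
                if k == i or k == j or not x[i][k]:
                    continue
                total += 1
                if x[j][k]:
                    transitive += 1
    return transitive / total if total else 0.0


# 0 -> 1 -> 2 closed by 0 -> 2: the single 2-path is transitive
closed_chain = [[0, 1, 1],
                [0, 0, 1],
                [0, 0, 0]]

# 0 -> 1 -> 2 -> 0: three 2-paths, none closed in the chain direction
cycle = [[0, 1, 0],
         [0, 0, 1],
         [1, 0, 0]]

print(transitivity(closed_chain), transitivity(cycle))
```

The cycle illustrates why direction matters: ignoring direction, the three nodes form a triangle, yet no 2-path is closed by a tie from its first node to its last node.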

Transitivity suffers from the same limitation as the binary clustering coefficient in that it cannot be applied to networks where weights are attached to ties. To overcome this shortcoming, we extend our proposed generalisation to directed and weighted networks by using the same definition of a triplet, $\tau$, as transitivity. The triplet value, $\omega$, is calculated using the same methods as stated in Section 2.2; however, we use the weights of the two directed ties that form the triplet instead of the two undirected ties. The generalised coefficient, $T_{\omega}$, produces the same results as transitivity if applied to binary and directed data, and the same results as the binary clustering coefficient if applied to binary and undirected data. Moreover, it still ranges between 0 and 1, and in a completely connected network we would still obtain $T_{\omega}=1$, whereas in a classical random network $T_{\omega}\rightarrow 0$ as the network grows. In particular, we found that $T_{\omega}$ approximates the probability of a directed tie in classical random networks, as shown by Table 4.
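The generalisation can be sketched as follows. Two assumptions are made here because the relevant passages lie outside this section: that the fraction is the total value of transitive triplets divided by the total value of all non-vacuous triplets, and that the four methods of Section 2.2 are the arithmetic mean, geometric mean, maximum, and minimum of the two tie weights, as in Opsahl and Panzarasa (2009). The names `METHODS` and `generalised_transitivity` are hypothetical.

```python
import math

# Assumed triplet-value methods (see the lead-in above)
METHODS = {
    "am": lambda a, b: (a + b) / 2,      # arithmetic mean
    "gm": lambda a, b: math.sqrt(a * b), # geometric mean
    "max": max,
    "min": min,
}


def generalised_transitivity(w, method="am"):
    """Weighted, directed transitivity on a nested-list weight matrix.

    w[a][b] > 0 is the weight of the tie from a to b; 0 means no tie.
    Each 2-path j -> i -> k contributes its value omega to the
    denominator, and to the numerator if the tie j -> k is present.
    """
    value = METHODS[method]
    n = len(w)
    transitive = 0.0
    total = 0.0
    for i in range(n):
        for j in range(n):
            if j == i or w[j][i] <= 0:
                continue
            for k in range(n):
                if k == i or k == j or w[i][k] <= 0:
                    continue
                omega = value(w[j][i], w[i][k])
                total += omega
                if w[j][k] > 0:
                    transitive += omega
    return transitive / total if total else 0.0


# Ties: 0->1 (4), 1->2 (2), 0->2 (1), 2->3 (6).
# 2-paths: 0->1->2 (transitive), 1->2->3 and 0->2->3 (intransitive).
w_example = [[0, 4, 1, 0],
             [0, 0, 2, 0],
             [0, 0, 0, 6],
             [0, 0, 0, 0]]

for name in METHODS:
    print(name, generalised_transitivity(w_example, name))
```

With the minimum method, for instance, the transitive triplet has value 2 and the three triplets together have value 2 + 2 + 1 = 5, so the coefficient is 0.4. Setting every weight to 1 makes all four methods return the binary transitivity of 1/3.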

Table 4: Simulations of the generalised clustering coefficient on ensembles of classical random directed networks with 50, 100, 200, 400, 800, and 1,600 nodes and an average out-degree (i.e., the number of directed ties originating from a node) of 10, where ties are assigned a random weight between 1 and 10. Each ensemble contains 1,000 scenarios. The results of the four methods for defining triplet values were not statistically significantly different (p>0.7).
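A much smaller version of such a simulation can be sketched as below. It is not the code used to produce Table 4. It assumes that each ordered pair of distinct nodes receives a tie independently with probability p = (average out-degree) / (n - 1), that weights are uniform integers from 1 to 10, and that triplet values use the arithmetic mean unless another function is passed. All function names are hypothetical.

```python
import random


def weighted_transitivity(w, value=lambda a, b: (a + b) / 2):
    """Total value of transitive 2-paths over total value of all 2-paths."""
    n = len(w)
    transitive = total = 0.0
    for i in range(n):
        sources = [j for j in range(n) if j != i and w[j][i] > 0]
        targets = [k for k in range(n) if k != i and w[i][k] > 0]
        for j in sources:
            for k in targets:
                if j == k:
                    continue            # reciprocated dyad, not a triplet
                omega = value(w[j][i], w[i][k])
                total += omega
                if w[j][k] > 0:         # tie from first to last node
                    transitive += omega
    return transitive / total if total else 0.0


def random_weighted_digraph(n, avg_out_degree, rng):
    """Classical random directed network with integer weights 1..10."""
    p = avg_out_degree / (n - 1)
    return [[rng.randint(1, 10) if i != j and rng.random() < p else 0
             for j in range(n)]
            for i in range(n)]


def simulate(n, avg_out_degree, scenarios, seed=0):
    """Return (tie probability, mean coefficient over the ensemble)."""
    rng = random.Random(seed)
    scores = [weighted_transitivity(random_weighted_digraph(n, avg_out_degree, rng))
              for _ in range(scenarios)]
    return avg_out_degree / (n - 1), sum(scores) / len(scores)


p, mean_t = simulate(n=50, avg_out_degree=10, scenarios=20)
print(f"tie probability {p:.3f}, mean coefficient {mean_t:.3f}")
```

Because weights are drawn independently of the network structure in this sketch, the mean coefficient should lie close to the tie probability whichever triplet-value function is passed as `value`.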

To clarify which triplets are transitive and non-vacuous, Table 5 illustrates configurations of triplets centred on node i. The first four rows show the basic configurations mentioned above. The remaining rows show configurations of triplets where ties are reciprocated. More specifically, each additional directed tie doubles the number of triplets. In addition, the table shows which triplets are transitive under different conditions and which triplet values should be included in the fraction of Equation 3.