Those measures should work on fragmented networks. The one metric which is sensitive to fragmentation is closeness (or at least the standard version of it: https://toreopsahl.com/2010/03/20/closeness-centrality-in-networks-with-disconnected-components/)

Good luck!

Tore

If these coefficients are not compatible with fragmented bipartite networks, are there any alternative measures that might be useful here?

If anyone knows, please share!

When I obtain the one-mode projection of the above csv file, I get an edge list with vertices V1 and V2 and a weight, such as v1 40, v2 57, w 2. This loses data because in some cases there are 3 vertices (for example, RefID 17613 has 3 attributes: 40, 57 and 85).

I am not entirely sure what your goal is. You could compute the distance among attributes using the distance function with gc_only=FALSE.

Best,

Tore

I have a bipartite network with about 2000 rows and 2 columns in a csv file (partially shown here):

RefID,Attributes

17562,24

17573,67

17574,82

17580,55

17613,40

17613,57

17613,85

17616,24

17630,75

17632,9

17643,13

17672,25

17711,40

17733,40

17733,57

17733,85

17791,43

17797,24

17807,41

17818,13

17901,32

17936,67

17941,78

17977,82

17977,21

18001,19

18011,23

18012,34

18050,81

18057,81

18070,79

18088,83

Would I be able to obtain the unique clusters of attributes from the second column, such as {40,57,85} and {82,21}, so that I can compare them and check whether they exist across other csv files using tnet? I have done one-mode projections of the bipartite matrix and have obtained edge lists for the 2 columns separately, but I seem to lose valuable information about clusters that have more than 2 attributes. Hence, working with a two-mode matrix seems more sensible. I would be grateful for any pointers in the right direction.
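One way to keep attribute sets with more than two members is to group the two-mode edge list by RefID before any projection. The following is a minimal base R sketch, not a tnet function: the helper name `attribute_sets`, the toy data frame, and the `other_sets` vector standing in for a second csv file are all assumptions made for illustration.

```r
# Toy two-mode edge list using a few rows from the question above.
edges <- data.frame(
  RefID      = c(17613, 17613, 17613, 17711, 17733, 17733, 17733, 17977, 17977),
  Attributes = c(40, 57, 85, 40, 40, 57, 85, 82, 21)
)

# Hypothetical helper: one sorted, comma-separated attribute set per RefID.
attribute_sets <- function(edges) {
  by_ref <- split(edges$Attributes, edges$RefID)
  vapply(by_ref,
         function(a) paste(sort(unique(a)), collapse = ","),
         character(1))
}

sets <- attribute_sets(edges)

# How many RefIDs carry each exact attribute set.
set_counts <- table(sets)

# Sets from this file that also occur in a second file
# (other_sets is a made-up stand-in for the result on another csv).
other_sets <- c("40,57,85", "9")
shared <- intersect(unique(sets), other_sets)
```

Because each set is stored as a sorted string, two files can be compared with ordinary set operations such as `intersect`, regardless of the order in which the attributes were listed.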

Thank you so much for your kind reply.

You answered my question.

Best regards,

Jinseok

Thank you for reaching out. I’m glad you find tnet helpful.

There are two aspects of all clustering coefficients: a numerator and a denominator. While most explanations focus solely on the numerator, the denominator is key when comparing the ratio. It is true that one-mode projections “over-count” the number of triangles, as a single secondary node connected to three or more primary nodes generates an “automatic triangle”. In the paper, I argued that these triangles do not represent triadic closure, as this notion relates to an existing triplet (e.g., A->B->C) that is responsible for a heightened probability of a closing tie (i.e., A->C). These automatic triangles increase the numerator as well as the denominator, as both open and closed triplets are counted there. The over-counting happens because all these triangles are closed.

Now, how does this compare to the two-mode clustering coefficient? First, there is a different number of 4-paths in these networks (i.e., the denominator is different). This number is not necessarily smaller than the denominator in the projected one-mode network, as multiple secondary nodes among primary nodes can create more 4-paths than projected triplets. For example, the ties:

A->1->B

A->2->B

B->3->C

create two 4-paths from A to C (whereas the projected one-mode network only would have a single triplet):

A->1->B->3->C

A->2->B->3->C

In essence, if the effect of multiple secondary nodes among closed 4-paths outweighs the bias from automatic triangles in the projected one-mode network, the two-mode clustering coefficient can be higher than the one computed on the projected one-mode network.

Hope this helps,

Tore
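The A/B/C example can be checked by brute force. The sketch below is plain base R and does not use tnet. It assumes a 4-path is a primary-secondary-primary-secondary-primary path with all five nodes distinct, counted once regardless of direction; the function names `count_4paths` and `count_projected_triplets` are made up for this illustration.

```r
# Two-mode edge list from the example: primary nodes A-C, secondary nodes 1-3.
edges <- data.frame(p = c("A", "B", "A", "B", "B", "C"),
                    s = c(1, 1, 2, 2, 3, 3),
                    stringsAsFactors = FALSE)

# Count 4-paths by looping over the middle primary node.
count_4paths <- function(edges) {
  total <- 0
  for (centre in unique(edges$p)) {
    sec <- unique(edges$s[edges$p == centre])
    if (length(sec) < 2) next
    pairs <- combn(sec, 2)  # unordered pairs of distinct secondary nodes
    for (k in seq_len(ncol(pairs))) {
      ends1 <- setdiff(edges$p[edges$s == pairs[1, k]], centre)
      ends2 <- setdiff(edges$p[edges$s == pairs[2, k]], centre)
      # All end-node combinations, minus those where both ends are the same node.
      total <- total + length(ends1) * length(ends2) -
        length(intersect(ends1, ends2))
    }
  }
  total
}

# Count triplets (2-paths) in the binary one-mode projection.
count_projected_triplets <- function(edges) {
  total <- 0
  for (centre in unique(edges$p)) {
    sec  <- edges$s[edges$p == centre]
    nbrs <- setdiff(unique(edges$p[edges$s %in% sec]), centre)
    total <- total + choose(length(nbrs), 2)
  }
  total
}

n_4paths   <- count_4paths(edges)              # 2
n_triplets <- count_projected_triplets(edges)  # 1
```

Both 4-paths run from A to C through B, whereas the projection keeps only the single triplet A-B-C, so the two coefficients are computed over different denominators.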

I am currently using tnet for a large dataset without any problem.

Thank you again for your kind help.

I have a question.

One of my collaboration networks shows a tm clustering coefficient (0.34) higher than Newman’s clustering coefficient (0.33).

In your paper (Triadic closure in two-mode networks: Redefining the global and local clustering coefficients – Social Networks), you reported tm clustering coefficients a little higher than Newman’s measure for random networks of Scientific Collaboration and Norwegian Directors networks (Table 1).

So, I think it is possible that a tm clustering coefficient can be higher than Newman’s coefficient in a network.

But the problem is that I cannot figure out what network structure or cases can cause this.

Could you please give me any clue or insight?

Best regards,

Jinseok Kim

Thank you very much for your kind concern and reply.

Kind regards,

A.W.Mahesar

If your tie weights are high, you might get integer overflow (especially on the 32-bit version of R). This is a limitation of R.

For an example of plotting degree distributions on log-log scales with regression line, see https://toreopsahl.com/2009/10/16/similarity-between-node-degree-and-node-strength/

Best,

Tore
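The overflow itself is easy to reproduce in base R. This is a minimal sketch with made-up weights; whether the geometric-mean code in tnet overflows in exactly this way is an assumption, and converting weights to doubles is one possible workaround rather than a confirmed fix.

```r
# Two integer tie weights whose product exceeds .Machine$integer.max (2147483647).
w <- c(50000L, 50000L)

# Integer arithmetic overflows to NA (R warns "NAs produced by integer overflow").
overflowed <- suppressWarnings(w[1] * w[2])

# Doubles hold the product without trouble.
safe <- as.numeric(w[1]) * as.numeric(w[2])

# Geometric mean of the two weights, computed on doubles.
gm <- sqrt(safe)
```

If the weight column of an edge list is stored as integer, converting it with `as.numeric` before running the measure may avoid the NA column.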

I have computed the local clustering coefficient for my network. With the GM method, I get NA for the whole column, whereas the remaining methods return values. R reports that the NAs were produced by integer overflow. What could be the reason for this? Further, is there any way to plot power-law behavior on a log-log scale in tnet? I would be thankful for your help with this.

Kind regards

A.W.Mahesar

This is a limitation of using R. I have traded memory usage for speed (large objects instead of loops). If you are using the clustering_tm function, there is a C++ version that is much faster and more memory efficient. Email me if you need these files.

Best,

Tore

I am using your tnet package in R. I would like to analyse two-mode data with about 1,700 vertices in the primary node set and about 1,800 vertices in the secondary node set. But when I use tnet, I often get an error message such as the one below.

Do you have any solution to this problem? Please teach me your methods.

from jung sung hoon in S. KOREA

Error: cannot allocate vector of size 1.8 Gb

In addition: Warning messages:

1: In `[.data.frame`(x, c(m$xi, if (all.x) m$x.alone), c(by.x, seq_len(ncx)[-by.x]), :

Reached total allocation of 8053Mb: see help(memory.size)

2: In `[.data.frame`(x, c(m$xi, if (all.x) m$x.alone), c(by.x, seq_len(ncx)[-by.x]), :

Reached total allocation of 8053Mb: see help(memory.size)

3: In `[.data.frame`(x, c(m$xi, if (all.x) m$x.alone), c(by.x, seq_len(ncx)[-by.x]), :

Reached total allocation of 8053Mb: see help(memory.size)

4: In `[.data.frame`(x, c(m$xi, if (all.x) m$x.alone), c(by.x, seq_len(ncx)[-by.x]), :

Reached total allocation of 8053Mb: see help(memory.size)

The local clustering coefficient is undefined (or NaN in R) for isolates and nodes with only one connection. This is similar to the one-mode version, where the denominator is n(n-1) and n is the node's number of connections. If n=1 (or n=0 for an isolate), the denominator is equal to 0, and it is not possible to divide by 0.

Hope this helps,

Tore
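The arithmetic behind the NaN values can be shown in a couple of lines of base R. This is only an illustration of the one-mode formula mentioned above; the helper `local_clustering` and the example counts are hypothetical and are not taken from tnet.

```r
# Hypothetical helper: binary one-mode local clustering from raw counts,
# where n is the node's number of connections.
local_clustering <- function(closed, n) {
  closed / (n * (n - 1))
}

lc <- c(
  local_clustering(0, 0),  # isolate: 0/0 gives NaN
  local_clustering(0, 1),  # one connection: 0/0 gives NaN
  local_clustering(2, 3)   # three connections: 2/6
)

# Average over the nodes for which the coefficient is defined.
mean_defined <- mean(lc, na.rm = TRUE)
```

`na.rm = TRUE` drops NaN as well as NA, so summaries can be computed over the nodes that have at least two connections.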

I was using clustering_local_tm(net) for my two-mode network with 700 vertices in the primary node set and 24 in the secondary node set. The resulting table contains a lot of values that are “Not a number”, e.g.,

Node lc

696 696 NaN

697 697 NaN

698 698 NaN

699 699 NaN

700 700 0.008620690

701 701 NaN

Is there any explanation for this?

Thanks!

You are running out of memory. The R functions were written to be fast rather than memory efficient, because loops are really slow in R. For larger networks, I have C++ functions that are very fast and very memory efficient; however, they require some knowledge to use. Email me and I will send them to you.

Best,

Tore
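To get a feel for why large objects appear, the warnings in the trace come from data-frame indexing of the kind used by a merge-style join. Purely as an illustration, and as an assumption rather than a description of tnet's internals, the sketch below estimates how many rows a self-join of a two-mode edge list on the secondary node would produce. The column names `i` and `p` and the toy data are made up.

```r
# Toy two-mode edge list: primary node i, secondary node p.
net <- data.frame(i = c(1, 2, 3, 1, 2, 4),
                  p = c(1, 1, 1, 2, 2, 3))

# Degree of each secondary node.
k <- table(net$p)

# A self-join on p pairs every tie of a secondary node with every other tie
# of that node, so the row count is the sum of squared degrees.
estimated_rows <- sum(as.numeric(k)^2)

# Check the estimate against an actual join on the toy data.
actual_rows <- nrow(merge(net, net, by = "p"))
```

The row count grows with the square of the secondary-node degrees, so a few very popular secondary nodes can dominate memory use even when the edge list itself is small.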

I am using the clustering_local_tm function for my two-mode data, which has 6029 vertices in the primary node set and 9313 vertices in the secondary node set, with 57912 edges connecting them. When I tried using the function, I got an error message stating:

Error: cannot allocate vector of size 1.4 Gb

In addition: Warning messages:

1: In `[.data.frame`(x, c(m$xi, if (all.x) m$x.alone), c(by.x, seq_len(ncx)[-by.x]), :

Reached total allocation of 4043Mb: see help(memory.size)

2: In `[.data.frame`(x, c(m$xi, if (all.x) m$x.alone), c(by.x, seq_len(ncx)[-by.x]), :

Reached total allocation of 4043Mb: see help(memory.size)

3: In `[.data.frame`(x, c(m$xi, if (all.x) m$x.alone), c(by.x, seq_len(ncx)[-by.x]), :

Reached total allocation of 4043Mb: see help(memory.size)

4: In `[.data.frame`(x, c(m$xi, if (all.x) m$x.alone), c(by.x, seq_len(ncx)[-by.x]), :

Reached total allocation of 4043Mb: see help(memory.size)

Can you please help me with this problem?

Thanks

Yaseswini


You can swap the columns to get local clustering scores for the secondary nodes. For example, if your network is in an object called net which has two columns, you can simply type `net <- net[,2:1]` to reverse the columns.

Best,

Tore
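As a small illustration of the column swap on a toy edge list (the column names `i`, `p` and `w` and the data are assumptions; the three-column variant for a weighted edge list is an extrapolation and is not part of the reply above):

```r
# Toy binary two-mode edge list: primary node i, secondary node p.
net <- data.frame(i = c(1, 1, 2, 3), p = c(1, 2, 2, 2))

# Reverse the two columns so the secondary nodes come first.
swapped <- net[, 2:1]

# Assumed variant for a weighted edge list: swap the first two columns
# and keep the weight column in place.
net_w <- data.frame(i = c(1, 1, 2), p = c(1, 2, 2), w = c(4, 2, 1))
swapped_w <- net_w[, c(2, 1, 3)]
```

The swapped object can then be passed to the same function, and the scores it returns refer to what were originally the secondary nodes.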

I am using the tnet package for bipartite graph analysis. I am particularly interested in calculating the clustering coefficient of all the vertices of the network. But the documentation says that the clustering coefficient function “clustering_local_tm(net)” returns the values for the primary node set. How do I calculate the clustering coefficient of the secondary node set? Could you please explain?

Thanks

Yaseswini

Thanks for spotting a missing piece of the documentation. The second column called lc is simply the binary local clustering coefficient for two-mode networks.

Let me know if there is anything else – I will incorporate this in the upcoming version of tnet!

Best,

Tore

node lc lc.am lc.gm lc.ma lc.mi

1 1 0.5789474 0.5756300 0.5568930 0.5759220 0.4851772

2 2 0.3787879 0.4117520 0.4336639 0.3799618 0.5025974

3 3 0.2736781 0.2912917 0.2849269 0.2914947 0.2684981

4 4 0.3240741 0.3355950 0.3482124 0.3313095 0.3813559

5 5 NaN NaN NaN NaN NaN

6 6 NaN NaN NaN NaN NaN

I assume that the columns with suffixes am, gm, ma, and mi are calculated with the arithmetic and geometric mean and the maximum and minimum value for the 4-paths, but how is the first lc column calculated? I could not find an explanation in the manual or online.

Thanks, John
