Comments on: Clustering in Two-mode Networks

By: Juan C. Correa

Juan C. Correa — Sun, 05 Jan 2025 17:54:36 +0000

In reply to Tore Opsahl.

Dear Tore,

Thank you very much for your comment. I agree with you regarding the robust decision on which node set is primary.

Thanks to your comments, I have the impression that clustering in bipartite networks might be a very good topic to contribute to (perhaps in the journal Social Networks or Applied Network Science). In a previous comment, you mentioned that "I never found a real-world directed two-mode network and therefore I never wrote it up."

I think that another example of real-world bipartite directed networks is available in the concept of "economic complexity" coined by Hausmann and Hidalgo (2009) (https://www.pnas.org/doi/full/10.1073/pnas.0900943106). These researchers applied bipartite directed networks to international trade data, in which countries import goods from exporting countries. Please note that in the context of international trade, the distinction between importers and exporters is pretty evident when it comes to deciding which node set should be primary: the exporter takes the action of selling goods to the importer. But this "who-takes-action-on-whom" is context-dependent because a country can play both roles for two different items. Thus, in the case of oil-related products, a country can be an importer (being part of the second partition), but in the case of electronics, that same country can be an exporter (being part of the first partition).

By: Tore Opsahl

Tore Opsahl — Sun, 05 Jan 2025 17:13:55 +0000

In reply to Juan C. Correa.

Agree Juan. However, I think we are talking about two different definitions of directionality in two-mode networks.

Conceptually, in my opinion, a directed tie starts at the node taking the action to form the tie and ends at the node subject to that action (e.g., a person sends a message to another).

Two-mode networks often have one node set that takes the action (e.g., a person attending an event), and this defines which node set is primary (e.g., people) and which is secondary (e.g., events). It is important to think about this directionality, as flipping the node sets changes the analysis completely (e.g., events being primary and people being secondary nodes). I believe this is what you are referring to, as the ties in your network only go from one node set to the other, but never in reverse. It’s important to make a robust decision about which node set is primary.

In directed one-mode networks, ties can go both ways (e.g., a person sends a message to another, and then that person can reply). I have yet to find a good real-world example of a two-mode network with this characteristic (i.e., nodes in both node sets could initiate a tie). The solution suggested above would require this.

By: Juan C. Correa

Juan C. Correa — Sat, 04 Jan 2025 21:23:31 +0000

Dear Tore,

Thank you for your insightful comment. As far as I know, clustering in bipartite directed networks is an ongoing debate. Another recent solution is this: https://arxiv.org/abs/2401.17887. I am going to study it to see if it works in my context. The direction of ties in my network cannot be ignored, as it has other implications for knowledge graph development in higher education research.

By: Tore Opsahl

Tore Opsahl — Sat, 04 Jan 2025 16:57:39 +0000

In reply to Juan C. Correa.

Hi Juan,

I would analyze your network as an undirected two-mode network. You might consider flipping the node sets (i.e., brochures/programs as primary set and skills as secondary set).

The method combining transitivity and four-paths would not work in your context, as there would be no directed four-paths given that there are no links from primary nodes (skills) to secondary nodes (brochures/programs).

By: Juan C. Correa

Juan C. Correa — Fri, 03 Jan 2025 17:39:38 +0000

In reply to Tore Opsahl.

Hi Tore! Sure. I am leading an ongoing research project on the academic offering in a sample of universities. Here, the academic offering (captured by university brochures intended to recruit students) is scrutinized with a keyword-in-context search to define a rectangular matrix where rows are skills (e.g., "critical thinking") and columns are brochures. I take this matrix as the input for modeling a bipartite network where skills are nodes in the first partition and brochures are nodes in the second partition. As brochures summarize the information about the skills and knowledge they aim to provide, the direction goes from brochures to skills (and not the other way around). In a previous article, we modeled the academic offering as a non-directed bipartite network (see here: https://www.tandfonline.com/doi/full/10.1080/03075079.2023.2254799). However, in the literature on job-related skills, there are some works that suggest modeling these as directed networks (e.g., https://journals.sagepub.com/doi/10.1177/2158244020915904).

By: Tore Opsahl

Tore Opsahl — Fri, 03 Jan 2025 15:55:39 +0000

In reply to Juan C. Correa.

Hi Juan: Great question. There is a way to combine transitivity (i.e., directed one-mode clustering) and the two-mode clustering proposed here; however, I never found a real-world directed two-mode network and therefore I never wrote it up. Could you elaborate about the network that you are analyzing?

Directed two-mode networks
I often think about two-mode networks as, for example, people attending events or people writing papers together. These are not directed in my mind if the direction of ties is always from one node set to another; in other words, always from primary nodes to secondary nodes and never from secondary nodes to primary nodes. For example, people go to events, but events never go to people.

Extension to directed two-mode networks
A four-path would be primary node 1 to secondary node A to primary node 2 to secondary node B to primary node 3, and this would be a closed four-path if primary node 1 is connected to another secondary node (e.g., C) which in turn is connected to primary node 3.
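The four-path definition above can be illustrated with a minimal Python sketch. This is not tnet code and not the unpublished directed extension: it treats ties as undirected, and the function names and the dictionary layout (primary node mapped to its set of secondary nodes) are assumptions made for illustration only.

```python
from itertools import permutations

def four_paths(net):
    """Yield paths (p1, sA, p2, sB, p3) over three distinct primary nodes.

    `net` maps each primary node to the set of secondary nodes it is tied to.
    Each path is found once in each direction, which does not affect the ratio.
    """
    for p1, p2, p3 in permutations(net, 3):
        for s_a in net[p1] & net[p2]:
            for s_b in net[p2] & net[p3]:
                if s_a != s_b:
                    yield (p1, s_a, p2, s_b, p3)

def is_closed(net, path):
    """A four-path is closed if p1 and p3 share a secondary node not on the path."""
    p1, s_a, _, s_b, p3 = path
    return bool((net[p1] & net[p3]) - {s_a, s_b})

def global_clustering(net):
    """Share of four-paths that are closed; None if there are no four-paths."""
    paths = list(four_paths(net))
    if not paths:
        return None
    return sum(is_closed(net, p) for p in paths) / len(paths)

# Primary nodes 1-3 and secondary nodes A-C, as in the description above.
closed_net = {1: {"A", "C"}, 2: {"A", "B"}, 3: {"B", "C"}}
open_net = {1: {"A"}, 2: {"A", "B"}, 3: {"B"}}  # no node C linking 1 and 3

print(global_clustering(closed_net))
print(global_clustering(open_net))
```

In the first network, secondary node C closes the path 1-A-2-B-3; removing C leaves the same path open.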

By: Juan C. Correa

Juan C. Correa — Fri, 03 Jan 2025 01:40:08 +0000

Interesting approach. I would love to know whether it is applicable to directed weighted bipartite networks. If so, I would love to share fresh data to see its implementation.

By: Tore Opsahl

Tore Opsahl — Sun, 18 Feb 2024 18:24:11 +0000

In reply to wp1310024334.

Hi Toshitaka,

The two-mode clustering coefficient is undefined for actors 4 to 6 in your example as their two-mode degree is less than 2, i.e., they do not have an opportunity to be connectors. If you do not want to apply this assumption, you could always compute the degree using the degree_tm function first.
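To make the degree condition concrete, here is a small pure-Python sketch of a local two-mode clustering coefficient applied to the six-actor example from the question. It is an illustration under assumptions, not tnet's implementation: the function name `local_clustering` and the data layout are hypothetical, and ties are treated as binary and undirected.

```python
# Actors (primary nodes) mapped to the events (secondary nodes) they attend.
net = {
    "actor1": {"A", "B"},
    "actor2": {"A", "C"},
    "actor3": {"B", "C"},
    "actor4": {"D"},
    "actor5": {"D"},
    "actor6": {"D"},
}

def local_clustering(net, node):
    """Share of closed four-paths centred on `node`; None when undefined."""
    if len(net[node]) < 2:
        # With fewer than two events the node cannot sit in the middle
        # of a four-path, so there is nothing to close.
        return None
    total = closed = 0
    others = [p for p in net if p != node]
    for p1 in others:
        for p3 in others:
            if p1 == p3:
                continue
            for s_a in net[p1] & net[node]:
                for s_b in net[node] & net[p3]:
                    if s_a == s_b:
                        continue
                    total += 1
                    # Closed if p1 and p3 share an event not on the path.
                    if (net[p1] & net[p3]) - {s_a, s_b}:
                        closed += 1
    return closed / total if total else None

for actor in sorted(net):
    print(actor, len(net[actor]), local_clustering(net, actor))
```

Actors 1 to 3 each attend two events and get a coefficient of 1, while actors 4 to 6 attend only event D and are reported as undefined (`None`).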

Tore

By: wp1310024334

wp1310024334 — Sat, 17 Feb 2024 04:19:36 +0000

Hi Tore,

I'm Toshitaka in Japan. Thank you for your excellent research and analysis program.

I am interested in the "clustering_local_tm" in "tnet".

In the following two-mode network, actors 1, 2, and 3 are closed by events A, B, and C. If I calculate the clustering coefficient by tnet, it is 1.

On the other hand, actors 4, 5, and 6 are closed by event D, but the clustering coefficient cannot be calculated by tnet.

Is there an option in the "clustering_local_tm" of "tnet" to assume that actors 4, 5, and 6 are also closed?

        eventA  eventB  eventC  eventD
actor1     1       1       0       0
actor2     1       0       1       0
actor3     0       1       1       0
actor4     0       0       0       1
actor5     0       0       0       1
actor6     0       0       0       1

Toshitaka

By: Elena Stasewitsch

Elena Stasewitsch — Sat, 21 Jul 2018 09:10:19 +0000

Hi Tore,
thank you for your quick reply, I already contacted him and I hope he will write back soon. Thank you so much for your help and this awesome website:)
Best,
Elena

By: Tore Opsahl

Tore Opsahl — Fri, 20 Jul 2018 00:30:12 +0000

In reply to Elena Stasewitsch.

Hi Elena,

I did indeed use tnet (and the c++ version of the clustering functions). To get the code, check with my coauthor Antoine as he continued the work after I left academia.

Best,
Tore

By: Elena Stasewitsch

Elena Stasewitsch — Thu, 19 Jul 2018 14:48:39 +0000

Hi Tore,

I just read your paper on the Small-World Phenomenon and was wondering if you provided the script for calculating the randomized networks somewhere and, in particular, how you created the figure for the distributions of values (from tie reshuffled networks and weight reshuffled networks, based on 1,000 randomizations)? Did you use tnet or any other program? I would be very happy about a few hints:)
Best,
Elena

By: mparke26

mparke26 — Thu, 06 Jul 2017 18:10:21 +0000

In reply to Tore Opsahl. Thanks Tore!

By: Tore Opsahl

Tore Opsahl — Thu, 06 Jul 2017 03:41:48 +0000

In reply to mparke26.

Hi Matt,

Those measures should work on fragmented networks. The one metric which is sensitive to fragmentation is closeness (or at least the standard version of it: https://toreopsahl.com/2010/03/20/closeness-centrality-in-networks-with-disconnected-components/)

Good luck!
Tore

By: mparke26

mparke26 — Wed, 05 Jul 2017 22:05:45 +0000

Quick question: are these global coefficients (reinforcement & clustering) usable for highly fragmented bipartite networks (numerous isolated components of varying sizes)? The more I think about it, the less I think they will provide meaningful measures of a highly fragmented, large real-world network. I am pretty new to network analysis, so I’d appreciate the opinion of someone who is more knowledgeable on the subject.

If these coefficients are not compatible with fragmented bipartite networks, are there any alternative measures that might be useful here?

If anyone knows, please share!

By: Fareena

Fareena — Sat, 18 Mar 2017 23:20:16 +0000

Thank you, Tore, for your response. I will try to work on this. Just to clarify: my aim is to extract attribute information from at least 30 tables similar to the one above and then analyse these values. I want to consider only those RefIDs which have 2 or more attributes and exclude those with single attribute values.
When I obtain the one-mode projection of the above csv file, I get an edge list with vertices V1 and V2 and weights, such as v1 40 v2 57 w 2. This results in a loss of data, as in some cases there are 3 vertices (for example, RefID 17613 has 3 attributes: 40, 57 and 85).

By: Tore Opsahl

Tore Opsahl — Sat, 18 Mar 2017 22:22:56 +0000

In reply to Fareena.

Hi Fareena,

I am not entirely sure what your goal is. You could compute the distance among attributes using the distance function with gc_only=FALSE.

Best,
Tore

By: Fareena

Fareena — Sat, 18 Mar 2017 11:23:01 +0000

Hi Tore
I have a bipartite network with about 2000 rows and 2 columns in a csv file (partially shown here):
RefID,Attributes
17562,24
17573,67
17574,82
17580,55
17613,40
17613,57
17613,85
17616,24
17630,75
17632,9
17643,13
17672,25
17711,40
17733,40
17733,57
17733,85
17791,43
17797,24
17807,41
17818,13
17901,32
17936,67
17941,78
17977,82
17977,21
18001,19
18011,23
18012,34
18050,81
18057,81
18070,79
18088,83
Would I be able to obtain the unique clusters of the attributes from the second column, such as {40,57,85} and {82,21}, using tnet, so that I can compare them and check whether they exist across other csv files? I have done one-mode projections of the bipartite matrix and have obtained edge lists for the 2 columns separately, but I seem to lose valuable information about clusters that have more than 2 attributes. Hence working with a two-mode matrix seems more sensible. I would be grateful for any pointers in the right direction.
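One way to keep the full attribute sets, rather than the pairwise ties of a one-mode projection, is to group the rows by RefID directly. The following is a minimal sketch in plain Python, not a tnet feature; the function name `attribute_sets` and its `min_size` parameter are hypothetical, and the sample is a subset of the rows listed above.

```python
import csv
import io
from collections import Counter, defaultdict

# A few rows from the file shown above.
sample = """RefID,Attributes
17613,40
17613,57
17613,85
17711,40
17733,40
17733,57
17733,85
17977,82
17977,21
"""

def attribute_sets(csv_text, min_size=2):
    """Count how many RefIDs carry each attribute set of at least `min_size`."""
    groups = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row["RefID"]].add(int(row["Attributes"]))
    return Counter(
        frozenset(attrs) for attrs in groups.values() if len(attrs) >= min_size
    )

counts = attribute_sets(sample)
for attrs, n in counts.items():
    print(sorted(attrs), n)
```

Because the sets are stored as `frozenset` keys, the results from two files can be compared with ordinary set operations on the keys, for example `set(counts_file1) & set(counts_file2)` for the attribute sets that appear in both.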

By: Jinseok Kim

Jinseok Kim — Sun, 27 Sep 2015 02:18:29 +0000

In reply to Tore Opsahl.

Tore,

Thank you so much for your kind reply.
You answered my question.

Best regards,

Jinseok

By: Tore Opsahl

Tore Opsahl — Sat, 26 Sep 2015 12:37:45 +0000

In reply to Jinseok Kim.

Hi Jinseok,

Thank you for reaching out. I’m glad you find tnet helpful.

There are two aspects of all clustering coefficients: a numerator and a denominator. While most explanations are solely focused on the numerator, the denominator is key when comparing the ratio. It is true that one-mode projections “over-count” the number of triangles, as a single secondary node tied to three or more primary nodes generates an “automatic triangle”. In the paper, I argued that these triangles do not represent triadic closure, as this notion relates to an existing triplet (e.g., A->B->C) that is responsible for a heightened probability of a closing tie (i.e., A->C). These automatic triangles increase the numerator as well as the denominator, as both open and closed triplets are counted there. The over-counting happens because all these triangles are closed.

Now, how does this compare to the two-mode clustering coefficient? First, there is a different number of 4-paths in these networks (i.e., the denominator is different). This number is not necessarily smaller than the denominator in the projected one-mode network, as multiple secondary nodes among primary nodes can create more 4-paths than projected triplets. For example, the ties:

A->1->B
A->2->B
B->3->C

create two 4-paths from A to C (whereas the projected one-mode network only would have a single triplet):

A->1->B->3->C
A->2->B->3->C

In essence, if it is more common among closed 4-paths to have multiple secondary nodes than the bias of automatic triangles in the projected one-mode network, the two-mode clustering coefficient could be higher than the one computed on the projected one-mode network.
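The counting argument above can be checked with a short Python sketch that enumerates 4-paths in the two-mode network and triplets in its one-mode projection. The function names and the data layout are assumptions for illustration, not tnet code; ties are treated as binary and undirected, and each path is counted once regardless of direction.

```python
from itertools import combinations

# The example ties: A-1-B, A-2-B and B-3-C (letters are primary nodes).
net = {"A": {1, 2}, "B": {1, 2, 3}, "C": {3}}

def four_paths(net):
    """List 4-paths (p1, sA, p2, sB, p3), one per unordered end-point pair."""
    paths = []
    for p1, p3 in combinations(sorted(net), 2):
        for p2 in sorted(net):
            if p2 in (p1, p3):
                continue
            for s_a in sorted(net[p1] & net[p2]):
                for s_b in sorted(net[p2] & net[p3]):
                    if s_a != s_b:
                        paths.append((p1, s_a, p2, s_b, p3))
    return paths

def projected_triplets(net):
    """List triplets (a, centre, b) in the binary one-mode projection."""
    nbrs = {p: {q for q in net if q != p and net[p] & net[q]} for p in net}
    return [
        (a, centre, b)
        for centre in sorted(net)
        for a, b in combinations(sorted(nbrs[centre]), 2)
    ]

print(four_paths(net))          # two 4-paths from A to C
print(projected_triplets(net))  # a single projected triplet A-B-C

# An "automatic triangle": one secondary node shared by three primary nodes.
auto = {"A": {1}, "B": {1}, "C": {1}}
print(len(four_paths(auto)), len(projected_triplets(auto)))
```

The last line shows the opposite case: a single shared secondary node produces three triplets in the projection (all of them closed) but no 4-paths at all, since a 4-path needs two distinct secondary nodes.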

Hope this helps,
Tore

By: Jinseok Kim

Jinseok Kim — Thu, 24 Sep 2015 14:43:50 +0000

Hello Tore,

I am currently using tnet for a large dataset without any problem.
Thank you again for your kind help.

I have a question.
One of my collaboration networks shows a tm clustering coefficient (0.34) higher than Newman’s clustering coefficient (0.33).

In your paper (Triadic closure in two-mode networks: Redefining the global and local clustering coefficients – Social Networks), you reported tm clustering coefficients a little higher than Newman’s measure for random networks of Scientific Collaboration and Norwegian Directors networks (Table 1).
So, I think it is possible that a tm clustering coefficient can be higher than Newman’s coefficient in a network.

But the problem is that I cannot figure out what network structures or cases can cause this.
Could you please provide any clue or insight to me?

Best regards,

Jinseok Kim

By: Abdul Waheed

Abdul Waheed — Thu, 25 Dec 2014 07:23:37 +0000

Dear Tore,

Thank you very much for your kind concern and reply.

Kind regards,

A.W.Mahesar

By: Tore Opsahl

Tore Opsahl — Wed, 24 Dec 2014 14:33:37 +0000

In reply to Abdul Waheed.

Hi A.W.,

If your tie weights are high, you might get integer overflow (especially on the 32-bit version of R). This is a limitation of R.
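As a rough illustration of the limit involved: R's integer type is 32-bit signed, so its largest value is 2147483647, and integer arithmetic beyond that yields NA with an overflow warning. The Python sketch below only checks whether a product of two tie weights would cross that limit and shows a log-based geometric mean that avoids forming the product. That the GM method multiplies tie weights in this way is an assumption here, and both function names are hypothetical.

```python
import math

R_INT_MAX = 2**31 - 1  # largest value of R's 32-bit signed integer type

def product_overflows(w1, w2):
    """True if w1 * w2 would not fit in a 32-bit signed integer."""
    return w1 * w2 > R_INT_MAX

def geometric_mean(w1, w2):
    """Geometric mean computed via logarithms, so no large product is formed."""
    return math.exp((math.log(w1) + math.log(w2)) / 2)

print(product_overflows(40000, 40000))  # 1.6e9 fits
print(product_overflows(50000, 50000))  # 2.5e9 does not
print(geometric_mean(50000, 50000))
```

If the weights in a network are of this magnitude, rescaling them or storing them as floating-point values before the analysis may be worth trying.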

For an example of plotting degree distributions on log-log scales with regression line, see https://toreopsahl.com/2009/10/16/similarity-between-node-degree-and-node-strength/

Best,
Tore

By: Abdul Waheed

Abdul Waheed — Wed, 24 Dec 2014 09:41:51 +0000

Dear Tore,

I have computed the local clustering coefficient for my network, and with the GM method I’m getting NA results in the whole column. On the other hand, I’m getting values for the remaining methods. R reports that the NAs were produced by integer overflow. What could be the reason for this? Further, is there any way to plot power-law behavior on a log-log scale in tnet? I’ll remain thankful for your kind concern on this.

Kind regards

A.W.Mahesar

By: Jung, Sung Hoon

Jung, Sung Hoon — Wed, 20 Nov 2013 10:15:24 +0000

In reply to Tore Opsahl. Thank you very much!!!