You're reading the public-facing archive of the Category Theory Zulip server.
To join the server you need an invite. Anybody can get an invite by contacting Matteo Capucci at name dot surname at gmail dot com.
For all things related to this archive refer to the same person.
I'm pleased to announce a new open source software project that we're developing at Topos Institute: CatColab. The tool is under heavy development but you can play with a demo at the link above.
In one sentence, the aim is to make modeling in category-theoretic domain-specific languages accessible to people who do not necessarily have any training in category theory or mathematics generally. For more, @Kevin Carlson has written a nice blog post explaining the project and where we hope to go with it: Introducing CatColab
Great work! I can't wait to play with it when I find some time
CatColab is software based on double categories (or more precisely double theories), created by @Evan Patterson, @Kevin Carlson and perhaps others at the Topos Institute.
v0.2 is out - read about it here:
New features include:
CatColab is now a usable, but not polished, tool for co-authoring and publishing both ologs and systems dynamics models.
Thanks for posting, John!
Thanks for making this! I feel like I've been looking for something like this and all its cool things for years. I have just started to play in it. Is this channel the best place to get updates, or is there a listserv that I can sign up for, or something else?
Thanks for this, JR! We haven't advertised it much yet, but we have a public Zulip instance where we post more regular updates: https://catcolab.zulipchat.com
@Evan Patterson - I was showing @Adittya Chaudhuri the feature where CatColab uses Kleisli morphisms between signed graphs to detect balancing loops and other motifs, and when I tried to do it for the Sustainable Peace signed graph, my browser emitted a warning that the program was taking a long time to run and was slowing things down. He replied:
I have one question: "The usual causal loop diagrams/regulatory networks that I have seen from biologists here are very very large (like disease pathways). Can this software find feedback loops/feedforward loops for them in a reasonable time?"
I mean the question is "if the graph size is very large, can this software still find the feedback loops in a reasonable time"?
Ack, caught! It runs fine on the cap and trade example but unfortunately sustainable peace currently produces a combinatorial explosion we haven’t set up to manage gracefully. Working on it.
The first thing we have to figure out is whether the algorithm or the Rust code itself is too inefficient or whether, after having compiled it to WebAssembly, the web browser is simply getting mad that we're asking it to do a big computation.
In general we haven’t put essentially any work into the algorithmics of morphism search in CatColab yet, but the theoretical complexity should be with the problem of finding subgraphs of a fixed shape in a given graph.
Hmm, do I mean that? I might not.
John Baez said:
the usual causal loop diagrams/regulatory networks that I have seen from biologists here are very very large (like disease pathways). Can this software find feedback loops/feedforward loops for them in a reasonable time?
It would be fun to get our hands on one of these big disease pathways.
At least for this particular problem I should say the complexity can be reduced to O((n + e)(c + 1)), where n, e, and c are the numbers of vertices, edges, and indecomposable loops-to-be-found. Hopefully there are analogous results for more general morphism domain shapes. There might be interesting algorithms work here, or maybe it all basically reduces down to loops and paths in graphs.
Kevin Carlson said:
In general we haven’t put essentially any work into the algorithmics of morphism search in CatColab yet, but the theoretical complexity should be with the problem of finding subgraphs of a fixed shape in a given graph.
But I think that complexity seems nonthreatening, in most cases.
unfortunately sustainable peace currently produces a combinatorial explosion
Ironic, ain't it? :upside_down:
Evan Patterson said:
Adittya Chaudhuri wrote:
the usual causal loop diagrams/regulatory networks that I have seen from biologists here are very very large (like disease pathways). Can this software find feedback loops/feedforward loops for them in a reasonable time?
It would be fun to get our hands on one of these big disease pathways.
Can you get one, @Adittya Chaudhuri?
Kevin Carlson said:
But I think that complexity seems nonthreatening, in most cases.
Just to finish the loudly echoing “but” there, if the graph is sufficiently dense then the number of circuits can be factorial in the number of edges, so we’re not going to be able to guarantee fast results in all cases. I suspect that in realistic cases the graphs are large but relatively sparse and so will probably not be asymptotically much bigger than , but that’s just vibes.
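To put a number on the dense case, here is a small Python sketch. It assumes the complete directed graph without self-loops as the dense example (an illustrative choice, not one made in the thread) and counts its elementary cycles with a closed formula: choose k of the n vertices, then one of the (k - 1)! cyclic orders on them.

```python
from math import comb, factorial


def cycles_in_complete_digraph(n: int) -> int:
    """Count elementary directed cycles in the complete digraph on n vertices.

    Assumes no self-loops and no parallel edges. A cycle on k >= 2 chosen
    vertices is one of (k - 1)! cyclic orderings of those vertices.
    """
    return sum(comb(n, k) * factorial(k - 1) for k in range(2, n + 1))


if __name__ == "__main__":
    for n in range(2, 9):
        print(n, cycles_in_complete_digraph(n))
```

The k = n term alone contributes (n - 1)! cycles, so no output-sensitive bound can make the dense case fast. A single directed cycle on n vertices, by contrast, contains exactly one loop.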
John Baez said:
Evan Patterson said:
Adittya Chaudhuri wrote:
the usual causal loop diagrams/regulatory networks that I have seen from biologists here are very very large (like disease pathways). Can this software find feedback loops/feedforward loops for them in a reasonable time?
It would be fun to get our hands on one of these big disease pathways.
Can you get one, Adittya Chaudhuri?
IMG_8675.HEIC
This is the photo of a pathway in my current office in Rostock. I am talking about these "sizes". Although the diagram is not exactly like signed graphs, it is somewhat informal (that's what I am trying to formalise using the "mathematics of graphs with generalised polarities" with @John Baez).
Wow! It'd be very interesting to get our hands fully on something like this as we consider what further features regnets need. Another optimistic thought on complexity is that presumably biological pathways will generally be largely decomposable into sub-pathways with no long loops jumping between these components.
https://www.kegg.jp/kegg-bin/show_pathway?hsa05200 this is a pathway in cancer
Thanks, this is great! (In case it helps anyone else, if you click the "Help" button on the top, it shows you a legend for what all the different graphical notations mean.)
Welcome :)
First of all, thanks for the discussion. I have a point to talk about. I work with some biologists who are not well equipped with abstract pure maths but are quite successful in their field. They often (I think in almost every meeting) ask me how "ACT-based frameworks" are better than their "hand-drawn simulation-based frameworks". I answer them by rewording what @John Baez said:
"So, people are already trying to systematize the use of diagrams. But mathematicians should join the fray. Why? Because mathematicians are especially good at soaring above the particulars and seeing general patterns. Also, they know ways to think of diagrams, not just as handy tools, but as rigorously defined structures that you can prove theorems about… with the help of category theory." in https://johncarlosbaez.wordpress.com/2011/03/04/network-theory-part-1/
Still, they are not very happy and want concrete proofs, which I cannot provide because of my inexperience in biology. Then I told them about software based on AlgebraicJulia, CatColab, etc. Now, of course, finding motifs/feedback loops/feed-forward loops is very important for systems biologists. If I can tell them that ACT provides a way, using compositionality, "to find important motifs in a very large scale network", then that would definitely be a good convincing point for them. Also, even if it takes 15 days to find all possible loops in a very, very big diagram, I think they would still be happy about ACT frameworks. Basically, I want an answer that I can give them the next time they ask.
Thanks again!!
Great! So getting motif search working on arbitrarily large-scale diagrams should be a high priority for us soon, then. It's really nice to know that people would find this impressive.
Thank you!! Then, I would tell them about it!!
Basically, to me, principles of compositionality should give us the liberty to "break down a very big complex diagram" into "small understandable parts", and then an analysis of all the small parts should give us an analysis of the big part. From this point of view, yes, ACT makes great sense for analysing large-scale diagrams. The problem is about "concrete demonstrations using very big diagrams", which they will not be able to do by using hand-drawn simulation models.
I'm very glad you're having this conversation. I think getting biologists to care about CatColab would be very beneficial to both biologists and Topos, and this might be a place to start.
Thank you very much! I completely agree with your point, and I am truly excited about this venture.
Thanks again, Adittya. The cancer pathway diagram is proving very helpful in thinking about how to refine the logics we implement relating to regulatory networks. It'll also provide a nice example of compositionality as we get some capabilities on colimits of notebooks. However, it's not going to be that interesting for finding motifs, is it? As far as I can tell there aren't any loops in the diagram. Do you have any advice on finding a large example with more interesting motifs to search for, other than digging around the KEGG repository at random?
Thanks Kevin. I will definitely search for an example of a large diagram with interesting motifs. However, biologists often use this site a lot https://www.ebi.ac.uk/biomodels/
This is Covid 19 https://www.kegg.jp/kegg-bin/show_pathway?hsa05171. Is it interesting?
Kevin Carlson said:
Thanks again, Adittya. The cancer pathway diagram is proving very helpful in thinking about how to refine the logics we implement relating to regulatory networks. It'll also provide a nice example of compositionality as we get some capabilities on colimits of notebooks. However, it's not going to be that interesting for finding motifs, is it? As far as I can tell there aren't any loops in the diagram. Do you have any advice on finding a large example with more interesting motifs to search for, other than digging around the KEGG repository at random?
Hi, can you please explain a bit on what you meant by "colimits of notebooks"?
John Baez said:
Evan Patterson - I was showing Adittya Chaudhuri the feature where CatColab uses Kleisli morphisms between signed graphs to detect balancing loops and other motifs, and when I tried to do it for the Sustainable Peace signed graph, my browser emitted a warning that the program was taking a long time to run and was slowing things down.
I've pushed an update that fixes, or rather sidesteps, this issue by putting a user-configurable upper bound on the length of the paths considered by the motif finder. By default, this number is now set to 5. On the example above, you can see that there are 366 (!) reinforcing feedback loops of length at most 5.
You can still blow up the app by setting this parameter unfavorably on a big example. There is a separate issue about performing such long-running computations in a separate thread to avoid hosing the browser tab, but that's a task for the future.
So now I'd like to get a large and interesting biological example into the tool. Besides the data entry, that might involve figuring out (1) what theory is appropriate for these signaling networks and (2) what motifs biologists might care to look for. I still need to look more closely at the examples you've posted.
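The effect of a length bound on loop search can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not CatColab's Rust implementation: the function name `bounded_loops`, the edge encoding, and the restriction to vertex-simple loops are all hypothetical choices made here. A loop's sign is the product of its edge signs, so +1 means reinforcing and -1 means balancing.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str, int]  # (source, target, sign), with sign +1 or -1


def bounded_loops(edges: List[Edge], max_length: int) -> List[Tuple[List[Edge], int]]:
    """Find vertex-simple directed loops that use at most max_length edges.

    Returns (loop, sign) pairs. Each loop is reported once, rooted at its
    smallest vertex. Hypothetical helper, not part of CatColab.
    """
    out: Dict[str, List[Edge]] = {}
    for edge in edges:
        out.setdefault(edge[0], []).append(edge)
    found: List[Tuple[List[Edge], int]] = []

    def extend(start: str, vertex: str, path: List[Edge], visited: Set[str], sign: int) -> None:
        if len(path) == max_length:
            return  # the bound cuts the search off here
        for edge in out.get(vertex, []):
            _, target, s = edge
            if target == start:
                found.append((path + [edge], sign * s))
            elif target > start and target not in visited:
                extend(start, target, path + [edge], visited | {target}, sign * s)

    for start in sorted(out):
        extend(start, start, [], {start}, 1)
    return found


if __name__ == "__main__":
    example = [("a", "b", 1), ("b", "a", -1), ("b", "c", 1), ("c", "a", 1), ("c", "c", -1)]
    for loop, sign in bounded_loops(example, max_length=3):
        kind = "reinforcing" if sign == 1 else "balancing"
        print(kind, [f"{s}->{t}" for s, t, _ in loop])
```

Raising `max_length` can only add loops, which is why an unfavorable setting on a large, dense example can still exhaust the browser tab.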
Thanks for all this, Evan!
Does this search for reinforcing feedback loops only report minimal-length ones, or also composite ones? If we don't exclude composite ones the number of reinforcing loops of length n will typically grow exponentially with n. E.g. if you have one minimal such loop of length 1 and one of length 2, both based at the same point, the number of loops of length n will be the (n-1)st Fibonacci number.
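The growth in this example is easy to check numerically. The Python sketch below counts the loops of length n based at the shared point that are concatenations of the length-1 loop and the length-2 loop, ignoring signs; the function name is a hypothetical one chosen here. Such a loop ends in either the short loop or the long one, which gives the Fibonacci recurrence.

```python
def composite_loops(n: int) -> int:
    """Count loops of length n built by concatenating copies of one
    length-1 loop and one length-2 loop based at the same point."""
    counts = [1, 1]  # length 0 (the empty loop) and length 1
    for _ in range(2, n + 1):
        # a loop ends with the length-1 loop or with the length-2 loop
        counts.append(counts[-1] + counts[-2])
    return counts[n]


if __name__ == "__main__":
    print([composite_loops(n) for n in range(1, 11)])
```

The counts for n = 1, 2, 3, ... are 1, 2, 3, 5, 8, ..., which is F(n+1) under the convention F(1) = F(2) = 1. The exact index depends on where one starts the sequence; the exponential growth does not.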
Thanks!! @Evan Patterson I just checked the sustainable peace example!! It's amazing!!
"what theory is appropriate for these signaling networks"
I was thinking actually about graphs with polarities whose labelling set has a monoid structure (as in most biological settings there is room for more kinds of influence than positive and negative).
So now I'd like to get a large and interesting biological example into the tool
My colleagues here built the Atlas of Inflammation Resolution https://air.bio.informatik.uni-rostock.de. It may have the kind of interesting biological example you are asking for, though I am not sure. But I can ask them about it.
(2) what motifs biologists might care to look for.
I am not fully sure. But I can ask my colleagues here this exact question.
Adittya, what are you thinking of as the monoid of polarities in something like the big example you sent? Is the composition of a “phosphorylation” edge and a “ubiquitination” edge just going to be taken in a free monoid?
Yes, in the free category of the associated graph. (For that, first, I think it is essential to "choose an appropriate monoid" suitable for the purpose)
John Baez said:
Does this search for reinforcing feedback loops only report minimal-length ones, or also composite ones? If we don't exclude composite ones the number of reinforcing loops of length n will typically grow exponentially with n. E.g. if you have one minimal such loop of length 1 and one of length 2, both based at the same point, the number of loops of length n will be the (n-1)st Fibonacci number.
As you can see by building your suggested example, we currently provide "simple" loops in the sense that edges aren't allowed to repeat in a returned loop. However, this definition does permit compound loops in a certain sense: you can build a loop by composing subloops without repeating edges. Thus John's example is currently counted as having 3 total reinforcing loops in it.
I personally suspect that the third loop, combining the length-1 and length-2 loops into one grand loop, shouldn't really be counted, but at least we definitely aren't getting Fibonacci!
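One reading of this notion of "simple" can be checked with a short Python sketch. It is an assumption-laden illustration rather than CatColab's code: loops are closed walks that never repeat an edge, two walks count as the same loop when they differ only in where they start, and signs are ignored (every edge is taken to be positive). The edge names `a`, `b`, `c` are hypothetical.

```python
from typing import List, Set, Tuple

Edge = Tuple[str, str, str]  # (edge name, source, target); names must be unique


def edge_simple_loops(edges: List[Edge]) -> List[Tuple[str, ...]]:
    """Enumerate closed walks that never repeat an edge, up to rotation."""
    seen: Set[Tuple[str, ...]] = set()

    def extend(start: str, vertex: str, used: Tuple[str, ...]) -> None:
        for name, source, target in edges:
            if source != vertex or name in used:
                continue
            walk = used + (name,)
            if target == start:
                # canonical form: rotate so the smallest edge name comes first
                i = walk.index(min(walk))
                seen.add(walk[i:] + walk[:i])
            # keep going even after closing up, so compound loops are found
            extend(start, target, walk)

    for _, source, _ in edges:
        extend(source, source, ())
    return sorted(seen)


if __name__ == "__main__":
    # John's example: a length-1 loop and a length-2 loop based at x
    johns_example = [("a", "x", "x"), ("b", "x", "y"), ("c", "y", "x")]
    print(edge_simple_loops(johns_example))
```

On this example the sketch reports the length-1 loop, the length-2 loop, and the grand loop that runs through both, and nothing else, since any longer walk would have to repeat an edge.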
I do not know. A free monoid can be a good choice. But in a way I think "we can lose some biological significance" with such generality!
I just don't know what you're supposed to call a path that phosphorylates, then ubiquitates, other than a "phosphorylation-then-ubiquitation" path, which is basically combining in the free monoid!
Unless, I suppose, there's an actual name for the biochemical process of phosphorylation followed by ubiquitation.
Yes, I agree with your point. It is the most natural choice. In that case the free monoid is probably the right choice. I was just thinking about "adding some semantics" while labelling by "finding a right monoid". But I am not sure whether such a thing is actually possible.
But, yes, I agree, of course, from the perspective of only syntax, Free monoid is the most natural choice
However, if we do refinements, can there be a problem with free monoids (like when we move from signed graphs to categories)? I think refinements usually come with semantics (for example, in usual regulatory networks, + · - = -, - · - = +, and so on), but for free monoids we may lose that kind of significance. Am I understanding correctly here?
The example of moving from signed graphs to categories seems both backwards and non-type-checked to me; are you thinking of moving from categories to signed categories? (Or, equivalently, for the free models, from graphs to signed graphs?) I wouldn't call the forgetful process from signed graphs to graphs a refinement!
Yes I meant signed categories.
And you're thinking of refining as in, for example, adding signs where there weren't any before?
Yes, "signs assigned to paths by composing the signs on edges by the monoid structure" . Actually, I am currently working on the "general theory of these kinds of structures" with @John Baez on a paper about graphs with polarities.
Well, I don't think I'm following what your concern is about the free monoids, all in all. If there's a monoid homomorphism M → N then you'll always be able to refine M-signed graphs to N-signed ones, and if M is free then that's easy to arrange.
Of course. Yes, I agree. Thanks. But I am talking about finding the right monoid here, suitable for a specific biological setting.
Sure; you could start with something free and then, if you learn there really is a good name for "phosphorylation then ubiquitation", use a map from the free guy to the guy in which that composite is given the new name to improve your theory.
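The reason a free monoid makes this easy to arrange is its universal property: a homomorphism out of it is fixed by where the generators go. Below is a minimal Python sketch, with words modelled as tuples of generator names. The labels `activation` and `inhibition` and the helper `extend_to_free_monoid` are hypothetical, and the target is the two-element sign monoid under multiplication.

```python
from functools import reduce
from typing import Callable, Dict, Sequence, TypeVar

M = TypeVar("M")


def extend_to_free_monoid(
    on_generators: Dict[str, M], unit: M, multiply: Callable[[M, M], M]
) -> Callable[[Sequence[str]], M]:
    """Extend a map on generators to the monoid homomorphism from words
    (the free monoid) into the target monoid given by unit and multiply."""

    def hom(word: Sequence[str]) -> M:
        return reduce(multiply, (on_generators[g] for g in word), unit)

    return hom


# Hypothetical edge labels, sent to signs under multiplication.
sign = extend_to_free_monoid(
    {"activation": 1, "inhibition": -1}, unit=1, multiply=lambda a, b: a * b
)

if __name__ == "__main__":
    path = ("inhibition", "activation", "inhibition")
    print(sign(path))
```

Relabelling every path of a graph signed by the free monoid through `sign` then produces an ordinary signed graph, and the same pattern works for any other target monoid once the images of the generators are chosen.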
Yes, that's a nice way to approach. Thanks!
I am curious about a situation where a loop in a very big diagram is "too big". Using ACT tools like structured cospans/decorated cospans, we can decompose the whole big diagram into small parts, each containing a small portion of the loop. But the loop is only identified when we compose all the small parts together. For example, the yellow loop in the attached hand-drawn diagram!
Composition of loop.png
Is such a thing possible in CatColab?
More precisely, is the "searching process for a loop in CatColab" compatible with the underlying compositional structure (e.g. symmetric monoidal double categories)?
This hits on a broad and general issue of compositionality that we’ll have to consider: how to decide when it’s algorithmically worthwhile to try to divide and conquer to find loops across components of a composed model via their pieces, versus search globally. There’s no general answer or magic bullet here; it just is the case that there can be loops in a composed model that don’t come from loops in any of the pieces, as you say.
For the moment we don’t support the composition of models in that manner anyway, so it’s not yet a live issue.
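A toy Python example shows how a loop can appear only after composition. Gluing is simplified here to taking the union of two edge lists whose shared vertex names play the role of the common boundary; that is an assumption made for illustration and not the structured-cospan machinery itself, and the vertex names are hypothetical.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (source, target)


def has_loop(edges: List[Edge]) -> bool:
    """Detect a directed cycle with a three-color depth-first search."""
    out: Dict[str, List[str]] = {}
    for source, target in edges:
        out.setdefault(source, []).append(target)
        out.setdefault(target, [])
    state = {v: 0 for v in out}  # 0 = unvisited, 1 = on the stack, 2 = finished

    def visit(v: str) -> bool:
        state[v] = 1
        for w in out[v]:
            if state[w] == 1 or (state[w] == 0 and visit(w)):
                return True
        state[v] = 2
        return False

    return any(state[v] == 0 and visit(v) for v in out)


# Two acyclic pieces that share the boundary vertices x and y.
piece_a = [("x", "m"), ("m", "y")]
piece_b = [("y", "n"), ("n", "x")]
glued = piece_a + piece_b

if __name__ == "__main__":
    print(has_loop(piece_a), has_loop(piece_b), has_loop(glued))
```

Neither piece contains a loop, yet the glued model contains the loop x, m, y, n, x, so searching the pieces separately cannot find it.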
I see! Thanks!!
Kevin Carlson said:
I just don't know what you're supposed to call a path that phosphorylates, then ubiquitates, other than a "phosphorylation-then-ubiquitation" path, which is basically combining in the free monoid!
Freely generating the composite is always an option, but I suspect that interactions will often come with a taxonomy that lets you do better. In this example, the legend taxonomizes the arrows into enzyme-enzyme relations, gene expression relations, protein-protein interactions, etc. And it turns out that phosphorylation and ubiquitination are both protein-protein interactions! So perhaps you should define "phosphorylation · ubiquitination = protein-protein interaction".
The idea here is that a sequence of protein-protein interactions is again a (compound) protein-protein interaction. What the formalism of a category sliced over a monoid of "types" can't quite capture is that "phosphorylation" is a "subtype" of "protein-protein interaction". Happily, you can capture this with the more general formalism of simple double theories, namely by introducing a cell like this into your theory. The cell acts as a coercion operation. CatColab doesn't yet let you do this, but it will!
Now I'm aware that this might not make much sense if you don't know about double theories, but the idea here is the same as that of a promonad. See, for example, this blog post: https://topos.institute/blog/2024-01-29-algebras-are-promonads/
@Evan Patterson Thanks a lot!! Yes, I agree that the formalism of category sliced over a "monoid of types" can not capture the idea of a subtype, and for that, we need a more general formalism like "simple double theories" as you explained. From your explanation, now it makes a lot more sense to use "such ideas" than "the natural interpretation as free monoids as @Kevin Carlson suggested previously" when we want to formalise diagrams (like in KEGG) as "labeled graphs like objects where the labels of the edges can be composed".
Also, thank you very much for sharing your article on promonads. It's very interesting!!
I talked to one of my colleagues, Shailendra Gupta (Systems Biologist), regarding "the large example" and about "finding out the motifs". Below, I am trying to summarise the discussion that I had with him regarding the topic:
He referred to this paper: https://www.nature.com/articles/s41467-017-00268-2#Sec22 (he is a coauthor of it). He said Figure 1 can be a good candidate for the case study (it contains more than a thousand nodes). Although "finding out the motifs" is interesting from the perspective of systems biology, he said there are already good software packages which can identify motifs (of size 3, 4, 5, maybe more) in large-scale directed signed graphs (more than 1000 nodes) in reasonable time. In fact, in the paper he referred to (https://www.nature.com/articles/s41467-017-00268-2#Sec22), he said they found the number of important motifs (like the ones in https://www.nature.com/articles/nrg2102) for the network shown in Figure 1 by converting the KEGG-like diagrams into regulatory networks, and then they did the feedback loop analysis in the software Cytoscape https://cytoscape.org using NetDS (version 2.8) (see Figure 9 in his paper https://www.nature.com/articles/s41467-017-00268-2#Sec22), although according to him version 2.8 may not be available now.
This discussion again made me think about "how software built on ACT" is better than "normal software available, which usual systems biologists use for network analysis" from the point of view of a biologist? Of course, as a mathematician, I am fascinated by ACT-based software, but I am still not able to ignore this question.
Unfortunately, almost everything about the supplementary information in that paper is now broken (their main model 404s and NetDS is long dead), so it's impossible to reproduce the workflow. That said, it's helpful to know that finding motifs in a large network won't in itself be particularly impressive.
"Converting the KEGG-like diagrams into regulatory networks" is something we can do in an automated and verifiable way, whereas I might guess that it was quite laborious for them. Shifting across logics in this way will be a particular strength of CatColab.
"Not having everything bitrot in less than a decade" is also a problem where we intend to improve on the state of the art, though of course it's hard to sell people on this as a high priority. If the big model had been built in CatColab, then rather than (or in addition to) posting a complicated, interactive diagram on a webpage, the kind of thing that classically rots quickly, they could have provided a simple JSON dump that could be revitalized by a live version of CatColab in a single click with no downloading, or in the worst case, would have been reasonably easy to parse in some other piece of software, as opposed to this visualization which would be essentially non-replicable even if we had the full version, without several days of manual work, and which is now seemingly entirely lost to the public. Even better than the JSON dump, the CatColab model could be directly shared at a link which would stay live as long as CatColab does, saving the researcher the responsibility to supervise their shared models, which is too much to ask.
It's likely our algorithms for finding feedback loops are no better than the much-lamented NetDS's, but the fact that we see this as a special case of a highly multi-disciplinary problem means that biologists don't have to shoulder the whole burden of building and maintaining software for this narrow problem, a burden which is apparently heavier than the level of need justifies. This is unfortunately another strength that doesn't instantly benefit biologists, of the "maintainability and reliability" kind; I'm just thinking out loud here to see whether I land on anything good.
All of the detailed points above are downstream of the fact that in CatColab, all models are well-defined mathematical objects that we can manipulate formally and uniformly, rather than all this highly manual passing around of complicated files through a long stack of independent tools. This sort of write-only, non-reproducible data analysis seems highly characteristic of scientific computing and we would really like to help improve the situation.
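To illustrate why a plain JSON dump is easy to revive or re-parse in other software, here is a minimal Python sketch of a round trip for a small signed graph. The schema (the keys `theory`, `vertices`, `edges` and the edge fields) is a hypothetical one invented for this example and is not CatColab's actual export format.

```python
import json
from typing import Any, Dict


def dump_model(model: Dict[str, Any]) -> str:
    """Serialize a model to a stable, human-readable JSON string."""
    return json.dumps(model, indent=2, sort_keys=True)


def load_model(text: str) -> Dict[str, Any]:
    """Parse a dump and check that it still describes a signed graph."""
    model = json.loads(text)
    names = set(model["vertices"])
    for edge in model["edges"]:
        if edge["source"] not in names or edge["target"] not in names:
            raise ValueError(f"edge {edge['name']!r} refers to an unknown vertex")
        if edge["sign"] not in (1, -1):
            raise ValueError(f"edge {edge['name']!r} has an invalid sign")
    return model


# Hypothetical toy model with one balancing loop between A and B.
toy_model = {
    "theory": "signed-graph",
    "vertices": ["A", "B"],
    "edges": [
        {"name": "f", "source": "A", "target": "B", "sign": 1},
        {"name": "g", "source": "B", "target": "A", "sign": -1},
    ],
}

if __name__ == "__main__":
    print(dump_model(toy_model))
```

Because the dump is plain data with a checkable shape, another tool needs only a JSON parser and a few validity checks to recover the model.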
Thank you so much, @Kevin Carlson, for explaining the vital differences between ACT-based software and others. I understand your point of view on it, and from your explanation, I am convinced that ACT-based software is a really exciting object, even for working biologists!!
I’m glad, and thanks for your attention! I hope we can keep working through you to figure out how to convince some working biologists of the same.
Thanks!! I am also very glad to learn so much about CatColab, ACT-based software in general, and the interesting ideas for labelling KEGG diagrams. I am very happy and looking forward to working with you to figure out how to convince regular biologists of the usefulness of both ACT and ACT-based software.
Adittya Chaudhuri said:
https://www.kegg.jp/kegg-bin/show_pathway?hsa05200 this is a pathway in cancer
nothing useful to add to the discussion but this caught my eye lol
image.png
A stairway to immortality!! lol
I believe cancer cells are immortal, and that's a big part of the problem.
Yes, I agree.
Btw this reminds me of how some people now are seeking immortality, in fact using the same method - extending the telomeres. If rich people succeed in getting this, it'll indeed be a form of social cancer.
Ohh!! I never imagined that. Of course, it would create a tremendous social issue as the number of rich people is just a handful compared to the whole world population.
John Baez said:
Btw this reminds me of how some people now are seeking immortality, in fact using the same method - extending the telomeres. If rich people succeed in getting this, it'll indeed be a form of social cancer.
I pray every day God doesn't take his best invention away from us---mortality.