Category Theory
Zulip Server
Archive

You're reading the public-facing archive of the Category Theory Zulip server.
To join the server you need an invite. Anybody can get an invite by contacting Matteo Capucci at name dot surname at gmail dot com.
For all things related to this archive refer to the same person.

Stream: learning: reading & references

Topic: Papers Where CT Is Essential For Theoretical Results in AI?

Pierre R (Oct 03 2025 at 20:56):

Hello Guys,

I approached the only Category Theory professor in my department to collaborate in applied category theory in AI as a general guiding research topic for part of my master thesis. She was very kind but asked me a very specific question that I wasn’t ready to answer with my intro-level CT:

Her question, summarized:
In computational/AI contexts, is Category Theory mainly being used just as a language to describe constructions more succinctly, or are there cases where theorems from particular categorical structures are essential to prove non-trivial results in neural networks, algorithms, or other AI subfield — results that could not be established (or would be much harder) without CT?

She compared this to Hopf algebras, where CT is necessary to reach classification theorems (I may not correct citing her here), and asked whether similar things exist in AI/ML.

Ask: I would appreciate references (titles only are fine) to recent papers (2023–present preferred) of this type — either in applied CT for AI, or in applied CT in other computational areas that may be relevant.

Thanks!

Ryan Wisnesky (Oct 04 2025 at 03:32):

my usual rejoinder to this kind of question is that, when you have a compositional system, category theory just applies, same as how when you have a symmetric system group theory just applies; whether or not you must use some underlying aspect of a system such as symmetry or compositionally is often 'no' simply because of human cleverness. Anyway, when database theorists try to figure out if various classes of logic programming language are closed under composition, they don't think of themselves as doing category theory, even though they are directly probing the compositionality of a given system, and this was a major community goal - 20 years ago. Others will have to speak for the more modern kind of AI (database theory being 'symbolic' AI / deductive logic).

James Deikun (Oct 04 2025 at 08:34):

From what I can tell, one of the major problems of AI as a field is that there are hardly any theoretical results that apply to the kind of AI/ML that people use in practice, and all this engineering of tremendous civilizational importance and which huge numbers of people and projects are directly exposed to is being done purely by vibe-guided experimentation, with a very short path to deployment. So probably there aren't any results of importance out there that essentially use category theory, but that says very little about whether category theory could be essential to important results.

John Baez (Oct 04 2025 at 13:29):

Since the person asking the question

are there cases where theorems from particular categorical structures are essential to prove non-trivial results in neural networks, algorithms, or other AI subfield

was not an AI expert but "the only category theory professor" in @Pierre R's department, this was probably not the usual question we all hear, where someone is doubting the usefulness of category theory to their field, and challenging us to change their mind. ("See if you can change my mind! I have it firmly set not to change!")

Instead, it could be a case of a category theorist wondering how their field could be useful to some other subject!

But personally, I find that the usefulness of categories in "applied category theory" is often less about theorems and more about using them as part of a software environment. The theorems exist, but often they merely show that the software makes sense.

Pierre R (Oct 04 2025 at 15:48):

Hello John,

You are completely right in your reading of the question. She is not doubting Category Theory—she works in theoretical CT—and neither am I.

I think what she is really asking me to do is look for AI papers where CT concepts are essential to reaching the main conclusions of the work.

As a beginner, I tend to think that any paper using CT in a meaningful way already makes CT essential. But from her point of view, simply reframing things in terms of Categories, Functors, or Natural Transformations is not enough to count as strong theoretical CT.

That’s probably what prompted my response, since I’m not sure whether my understanding is too basic to even classify work in that way.

Morgan Rogers (he/him) (Oct 05 2025 at 11:36):

Some relevant work appeared at ACT this year, I think. You could check there for slides.

John Baez (Oct 05 2025 at 12:38):

I think what she is really asking me to do is look for AI papers where CT concepts are essential to reaching the main conclusions of the work.

Okay. I think you should read the work of some people here:

Category theorists in AI.

Antonio Lorenzin (Oct 06 2025 at 10:08):

If I can add to the discussion, a paper that I think it's quite important to machine learning theory is the fact that d-separation can be proved in the categorical setting ([2207.05740] The d-separation criterion in Categorical Probability https://share.google/lHyRxqCZ3jC5aU8Ej).

Other recent works that have a more "proving theorems" approach are

[2401.14669] Hidden Markov Models and the Bayes Filter in Categorical Probability https://share.google/lQ0KkPEUh15rl8jhe

[2406.11814] Stochastic Neural Network Symmetrisation in Markov Categories https://share.google/1lR26Vux5fSGAGrsm

I should cite more people to be comprehensive, but these works may be what your professor is looking for.
The important thing here is that by proving theorems for Markov categories, they immediately hold in different probabilistic and possibilistic settings. Ironically, in some books the d-separation criterion is proven for finite sets, and then the proof for Gaussian probability is deemed analogous and basically left to the reader (see Kooler Friedman's book Probabilistic Graphical Models: Principles and Techniques https://share.google/p0h31upQzkfM29Bpu).

Antonio Lorenzin (Oct 06 2025 at 10:21):

Another important insight is how intractable the standard probability theory can be if we consider the setting beyond finite and gaussian probability, where one has to deal with multiple integrations. Without the categorical perspective, even understanding what is the right analogous statement of a theorem for finite sets can be challenging. Using categorical semantics, you even have the proof under standard assumptions (such as "having conditionals").

I remember talking to David Dalrymple of ARIA and he was interested in aciomatisations beyond finite sets and gaussian probability, so it is plausible that this ability of categorical semantics may be important for the future of AI.

Antonio Lorenzin (Oct 06 2025 at 10:23):

Last thing, I'm honestly amazed by the fact that the use of string diagrams have already emerged in machine learning independently from category theory under the name of factor graphs (Factor graph - Wikipedia https://share.google/vnySaqubG4ZNIaVGs).

David Michael Roberts (Oct 06 2025 at 22:55):

@Antonio Lorenzin is there any reason you are giving google share links rather than links to the arXiv? I think I would prefer to go to the source (or if that's where they go, not through Google's tracking)

Antonio Lorenzin (Oct 07 2025 at 06:01):

I have a silly reason, i.e. when I ask my phone to share the link from Google search, it generates a Google link :sweat_smile: Sorry for this!

Here are the correct links:

The d-separation criterion in Categorical Probability https://arxiv.org/abs/2207.05740
Hidden Markov Models and the Bayes Filter in Categorical Probability https://arxiv.org/abs/2401.14669
Stochastic Neural Network Symmetrisation in Markov Categories https://arxiv.org/abs/2406.11814
Koller Friedman book Probabilistic Graphical Models https://mitpress.mit.edu/9780262013192/probabilistic-graphical-models/
Factor graph https://en.m.wikipedia.org/wiki/Factor_graph

Matteo Capucci (he/him) (Oct 17 2025 at 15:49):

Pierre R said:

Her question, summarized:
In computational/AI contexts, is Category Theory mainly being used just as a language to describe constructions more succinctly, or are there cases where theorems from particular categorical structures are essential to prove non-trivial results in neural networks, algorithms, or other AI subfield — results that could not be established (or would be much harder) without CT?

You might want to check Sheaf Neural Networks. In general, the closest subfield of deep learning to CT is Geometric Deep Learning. Some people are also using algebro-geometric methods to study what is arguably the biggest problem in deep learning, namely why do NNs work so goddamn well---this is Singular Learning Theory.

Finally, there have been quite a few papers on compositionality for AI models, starting from https://arxiv.org/abs/1711.10455 and https://arxiv.org/abs/2103.01931, and related papers. Shiebler, Gavranovic and Wilson wrote a lit review some years ago, https://arxiv.org/abs/2106.07032.

However, I believe the jury is still out on whether CT can be applied to mainstream AI to the point that an AI scientist would feel the need to go and learn CT in the same way they eventually learn calculus or linear algebra. In fact, I would say this is true of almost any topic (programming language theory being the most notable exception, IMO). CT remains a great tool to enhance understanding, and to organize knowledge.

kourouklides (Oct 19 2025 at 00:05):

Hey @Pierre R, good for you on choosing such a challenging and interesting topic for your Master's thesis. I hope it all goes well.

Disclaimer: My background is mostly from ML, Complex Systems and Signal Processing, with a very basic understanding of specific subfields within CT that might potentially relate to these three disciplines. So, for advanced CT topics, you should better consult with the more experienced people in this community.

Crucially, for CT (and multidisciplinary CT and ML) papers, you might have to broaden your search before 2023. Unlike the ML literature, you might have to look at CT papers that are decades old. The situation is somewhat different.

You mentioned both ML and AI, so does your definition of AI in the context of your thesis also include non-ML approaches, such as GOFAI / 'Symbolic AI' (e.g., using hand-picked rules and traditional knowledge graphs), as mentioned by others above? I will focus on ML in this reply and use a numbered list for brevity reasons. Also, feel free to DM me at any time, but using the public channels might benefit others too.

There is a common misconception that ML mostly involves heuristics or severely lacks theoretical foundations, but this is far from the truth. The actual problem is that ML is very multidisciplinary with too many theoretical foundations (e.g., Information Theory, Learning Theory, etc.), which creates a major problem when trying to see how everything "fits together". Hopefully, CT can somehow help in this respect in the future.
At the risk of stating the obvious, CT is usually used as a “unifying language” or meta-theoretical framework, so directly using CT to process data or define models (e.g., ANNs) might not be a good angle to explore or even (mathematically) valid.
So far, I would argue that the most impactful work in ML related to CT is the Nonlinear Dimensionality Reduction (i.e. decomposition) method of UMAP, which uses Manifold Learning. Briefly, it builds on top of previous CT work by M. Barr, D. Spivak and others in order to propose a topology-aware method that tries to preserve certain important topological representations/features (but not all). It relates to the category of fuzzy simplicial sets and the category of some specific metric spaces. The authors also released a very stable software library that can be used both for development purposes and research experiments. Links below. In general, the connection between CT and ML is usually more apparent when it comes to Manifold Learning, Topological Machine Learning, etc. UMAP exemplifies this.

Original paper:

https://arxiv.org/abs/1802.03426

Further discussion and explanation of the Maths behind UMAP:

Peva Blanchard (Oct 19 2025 at 06:57):

@Tim Hosgood has written an article about UMAP on the Topos Institute blog. From what I remember from the article, it is not clear UMAP could count as an application of category theory.

kourouklides (Oct 19 2025 at 10:40):

In general, I wouldn’t take the specific blog post into serious consideration dues to a series of reasons (obviously this doesn’t apply to the whole blog by the institute).

As the author admits himself, he is not from an ML background.

What counts or does not count as “an application of CT” needs a definition. But for sure, UMAP has been influenced by CT.

Bruno Gavranović (Oct 19 2025 at 11:30):

I'll add one concrete example: this paper on filter-equivariant functions tells you how to fully extrapolate the behaviour of a certain class of list-functions.

That is, if you have a function f : [a] -> [a] which is natural in a (i.e. it's the component of some natural transformation $\text{List} \Rightarrow \text{List}$ ) and also filter-equivariant, then to determine this function on lists of any length all you need is an example of this function on one length 2 list.

For example, let's say I tell you I have a mysterious natural filter-equivariant function $f$ , and you need to guess what it is. If I tell you that

$\mathsf{f}([4, 7]) = [7, 4]$

then this paper argues that I've fully defined the function for you, and gives you an algorithm for computing the action of this function on any other input list.

While you theoretically could argue that this could've been done 'without CT or FP', or perhaps say that it's merely a 'first step' towards more general mathematical specifications of extrapolation, I think this is a pretty strong contender for 'CT was necessary to obtain this result'.

John Baez (Oct 19 2025 at 11:42):

kourouklides said:

In general, I wouldn’t take the specific blog post into serious consideration due to a series of reasons (obviously this doesn’t apply to the whole blog by the institute).

I don't see @Tim Hosgood arguing in this post that UMAP "doesn't count as an application of category theory". Instead he's trying to explain UMAP in a way that doesn't mention category theory. As he says, "can we unpack things in less technical language?"

Often an application of subject X to subject Y can be explained in a way that doesn't mention X.

Peva Blanchard (Oct 20 2025 at 10:47):

Sorry, I didn't mean to downplay the influence of CT on UMAP. I had Pierre's request in mind

Pierre R said:

are there cases where theorems from particular categorical structures are essential to prove non-trivial results in neural networks, algorithms, or other AI subfield — results that could not be established (or would be much harder) without CT?

And reading Tim Hosgood's article, I was unsure UMAP essentially relies on CT in this way. What seems to be true is that the creators of UMAP were essentially guided towards their invention by CT. So it's good anyway to study their perspective.

John Baez (Oct 20 2025 at 16:34):

Okay, that make sense. Finding theorems in category theory that are essential to proving results in other areas is fairly hard. I'd say that's quite a bit more restrictive than finding "applications" of category theory to other areas.