This is as good a time as any to make a thread here: I've published my PhD thesis!
The thesis title is
Fundamental Components of Deep Learning: A category-theoretic approach
You can find it here, together with the accompanying blog post. At some point I will give more context around this, but for now I'll just leave here the last paragraph of the thesis which summarises how I see this thesis fitting into a broader research programme: one that studies the process by which we find structure in data, and attempts to formalise it with concrete implementations of systems that do so via iterative updates.
Screenshot_20240314_120107.png
I'm looking forward to hearing your thoughts and comments.
And lastly, I want to promote here our latest position paper, Categorical Deep Learning: An Algebraic Theory of Architectures, which directly builds on the construction and provides a general theory of neural network architectures that directly subsumes Geometric Deep Learning: it formulates equivariant maps as homomorphisms of monad algebras, and shows how, by generalising these to homomorphisms of algebras for particular endofunctors, we can go beyond equivariance with respect to invertible transformations and start capturing transformations which aren't invertible, such as constructors for lists, trees, and all sorts of inductive and coinductive structures from theoretical computer science.
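Roughly, in the simplest case (my informal paraphrase, not a quote from the paper): for the monad $T = G \times ({-})$ of a group $G$, a $T$-algebra is a set equipped with a $G$-action, and an algebra homomorphism $f \colon A \to B$ is precisely a map satisfying
$$ f(g \cdot a) = g \cdot f(a), $$
i.e. an equivariant map. Swapping the monad for an endofunctor such as $F(X) = 1 + A \times X$, an algebra is a choice of "nil" and "cons" maps on $X$, and an algebra homomorphism is exactly a fold over lists: it still respects the structure, even though the structure maps are no longer invertible.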
This generalisation revealed some interesting correspondences, for instance between parametric Mealy machines and recurrent neural networks. We're still trying to figure out many parts of this, and I imagine lots of people in the community here might have interesting ideas about structured computation that can be captured in these ways.
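To make the Mealy machine correspondence concrete, here's a minimal sketch (my own illustration, assuming NumPy; this is not code from the paper): a Mealy machine is a map state × input → state × output, and an RNN cell with weights is exactly such a map, with the weights living in the parametric direction.

```python
import numpy as np

def rnn_cell(params, state, x):
    """One step of a (parametric) Mealy machine: (state, input) -> (state, output)."""
    W_h, W_x, W_y = params
    new_state = np.tanh(W_h @ state + W_x @ x)   # state transition
    output = W_y @ new_state                     # readout
    return new_state, output

def run(params, state, xs):
    """Iterating the Mealy machine over a sequence is exactly running the RNN."""
    outputs = []
    for x in xs:
        state, y = rnn_cell(params, state, x)
        outputs.append(y)
    return state, outputs

# tiny usage: 3-dim state, 2-dim inputs, 1-dim outputs
rng = np.random.default_rng(0)
params = (rng.normal(size=(3, 3)), rng.normal(size=(3, 2)), rng.normal(size=(1, 3)))
_, ys = run(params, np.zeros(3), [rng.normal(size=2) for _ in range(5)])
```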
Screenshot_20240314_121419.png
Bruno Gavranović said:
I'll just leave here the last paragraph of the thesis which summarises how I see this thesis fitting into a broader research programme [...]
I'm looking forward to hearing your thoughts and comments.
That resonates with a vague thought I had while watching this video yesterday. In it, they build an AI agent to play Trackmania (a car racing game); the agent is faster than humans, but very inconsistent due to some chaotic behaviour in the physics engine. Humans, on the other hand, can get fairly consistent (modulo human parameters like fatigue), but not at such high speeds. I was thinking about how to train for consistency in such a setting, and I believe using a compositional model might help: if the agent understands which situations are harder to navigate, as humans do, it will be more careful and try to mitigate the inconsistencies. Are there any results about compositionality as a means to achieve consistency?
Ah, absolutely!
I think compositional models are the key to achieving consistency. Take, for example, the universal approximation theorem: it tells you that with a wide enough network you can approximate any (continuous) function arbitrarily well. And while that's true, there's a very neat visual proof showing it tells you nothing about generalisation: the approximation works by fitting a bunch of squares under the function, which means that for any finite dataset there's always a part of the input space the network hasn't seen, and on which it has no means of generalising well.
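As a toy illustration of that last point (my own sketch, assuming NumPy, standing in for the "fitting squares" picture): a piecewise-constant fit built from finitely many samples does fine inside the sampled region and says nothing sensible outside it.

```python
import numpy as np

f = np.exp                               # the "true" function we only see samples of
xs = np.linspace(0.0, 3.0, 30)           # a finite dataset on [0, 3]
ys = f(xs)

def approx(x):
    """Piecewise-constant fit: the value of the nearest sampled point."""
    return ys[np.argmin(np.abs(xs - x))]

print(abs(approx(1.5) - f(1.5)))    # small: x lies inside the sampled region
print(abs(approx(10.0) - f(10.0)))  # huge: nothing constrains the fit out here
```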
So there is no 'consistency', as you call it: you can't learn a nonlinear function with a single linear layer, because all it can produce is a linear combination of its inputs, which a nonlinear function is not.
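For instance (a minimal sketch of my own, assuming NumPy; XOR is just the standard example, not one from this thread), the best affine fit to XOR is off by 0.5 on every single point:

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0.0, 1.0, 1.0, 0.0])             # XOR

A = np.hstack([X, np.ones((4, 1))])            # add a bias column
w, *_ = np.linalg.lstsq(A, y, rcond=None)      # best least-squares affine fit
print(A @ w)                                   # [0.5 0.5 0.5 0.5] -- stuck halfway
```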
I actually watched the other video from the same creator on YouTube, and I believe the same story applies here. The current Trackmania model seems to largely overfit, and can easily get confused by out-of-distribution examples.
For this particular case, some physics-based priors seem necessary to achieve good generalisation, but I'm not sure what algebraic structure could be used to encode them.
I think the same issue arises when dealing with, say, structural recursion. There is ample evidence (1, 2) that transformers perform an operation analogous to 'fitting squares' when learning complex algorithms, and have no architectural bias that would allow them to learn how to perform, say, structural recursion. They can learn it for 1, 3, 10, or perhaps 20 steps, but they eventually break down.
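To illustrate the distinction (my own sketch, not taken from the cited papers): genuine structural recursion handles inputs of any size, whereas a fixed-depth "unrolled" computation, which is the kind of thing a fixed-size model can imitate, silently breaks past the depth it was fit to.

```python
def length_rec(xs):
    """Structural recursion on lists: correct for inputs of any size."""
    return 0 if not xs else 1 + length_rec(xs[1:])

def length_unrolled(xs, depth=20):
    """A fixed-depth unrolling: only correct for lists with at most `depth` elements."""
    n = 0
    for _ in range(depth):
        if not xs:
            return n
        xs, n = xs[1:], n + 1
    return n  # silently wrong for longer inputs

print(length_rec(list(range(50))), length_unrolled(list(range(50))))  # 50 vs 20
```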
The deep learning community has indeed identified compositionality as a useful tool for describing the world around us --- take Yoshua Bengio's Turing Lecture, which states that 'Compositionality is useful to describe the world around us effectively'. But no specific connections to category theory have been made.
Perhaps the consistency problems can be dealt with by explicitly training for consistency: selecting for strategies that take longer to diverge when applied to very small perturbations of the input?
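One way to make that concrete (a rough sketch of my own, not an established recipe from this thread; `policy`, `task_loss` and `lam` are placeholder names): add a penalty measuring how much the outputs move under small input perturbations.

```python
import numpy as np

def consistency_penalty(policy, x, eps=1e-3, n_samples=8):
    """Average change in the policy's output under small random input perturbations."""
    base = policy(x)
    diffs = [
        np.linalg.norm(policy(x + eps * np.random.randn(*np.shape(x))) - base)
        for _ in range(n_samples)
    ]
    return float(np.mean(diffs))

# total_loss = task_loss + lam * consistency_penalty(policy, x)
```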
Bruno, can I ask about your involvement with Paul Lessard and Symbolica? He was just on Machine Learning Street Talk and they claimed to have raised $30M; the main paper they referenced was CDL: An Algebraic Theory of Architectures.
What is this $30m specifically for? Are you accruing compute to now train a model based on Para?
Congrats on publishing the thesis btw!
https://finance.yahoo.com/news/vinod-khosla-betting-former-tesla-130000001.html
Noah Chrein said:
What is this $30m specifically for?
And even more importantly, how much of this $30M are you giving to your fellow category theorists to fund their research? :hearts:
@Ryan Wisnesky please don't post links without accompanying text...
I clicked anyway. It made me sad.
Khosla admits he does not understand the math-filled paper—pointing out there are very few people in the world who fully understand category theory—“when these really smart people gravitate to an idea, it’s an important idea,” he said.
:man_facepalming: I really do not want CT to be beached by a tech hype tide... but at least if it's successful enough policymakers will put money into it for a while.
Noah Chrein said:
Bruno, can I ask about your involvement with Paul Lessard and Symbolica? He was just on Machine Learning Street Talk and they claimed to have raised $30M; the main paper they referenced was CDL: An Algebraic Theory of Architectures.
What is this $30m specifically for? Are you accruing compute to now train a model based on Para?
Absolutely, @Noah Chrein .
This is as good a time as any to announce a few different pieces of exciting news:
I'm incredibly excited about this, for a few different reasons.
Monoids, categories, universal properties and other concepts from category theory have been an indispensable tool for me and for many other scientists trying to understand the world. They have allowed us to find robust patterns in data, and also to communicate, verify and explain our reasoning to one another. In many ways, isn't this the goal of deep learning? The creation of models which understand the world in robust, generalisable, but also verifiable ways?
Noah Chrein said:
What is this $30m specifically for?
The funding is to be used for the development of this research programme, staffing, and compute.
fosco said:
how much of this $30M are you giving to your fellow category theorists to fund their research? :hearts:
We're hiring fellow category theorists! We're opening up offices in UK and AUS - check out the job ads.
Morgan Rogers (he/him) said:
Khosla admits he does not understand the math-filled paper—pointing out there are very few people in the world who fully understand category theory—“when these really smart people gravitate to an idea, it’s an important idea,” he said.
:man_facepalming: I really do not want CT to be beached by a tech hype tide... but at least if it's successful enough policymakers will put money into it for a while.
This is a justified concern, and one I've been thinking about myself. It's easy to overpromise and build hype! I'm trying hard - together with some other fantastic people we've hired - to keep us grounded.
Khosla indeed isn't privy to the insights behind CT, but our team is.
You should advertise those over in #community: positions !
I did it - easy enough to do while I was looking over the ads.
Thanks @John Baez !
Thanks for the info Bruno, this is wonderful.
Morgan Rogers (he/him) said:
I really do not want CT to be beached by a tech hype tide
This is just one of the first waves of a storm brewing in structured AI. The structure-first (i.e. Cat Theory) view of AI is probably correct, and I think more VC-type "whales" are about to beach themselves here. Let's hope we don't get too disrupted by whatever explodes out of the beached whales. This is what I was trying to allude to in this thread when I mentioned potential AI agents entering the Zulip (originally in response to a different thread, but moved by a mod). $30M is a "drop in the ocean" for AI VC funding, and I can imagine some poaching of category theorists.
I wonder @Bruno Gavranović if you have any thoughts about how this community and individual category theorists can navigate the structured-AI hype train that, I feel, is about to hit CT. Maybe we can continue this conversation in that other thread or start a new one.
What does "VC" stand for?
https://en.wikipedia.org/wiki/Venture_capital
Thanks @Ralph Sarkis