Category Theory
Zulip Server
Archive

You're reading the public-facing archive of the Category Theory Zulip server.
To join the server you need an invite. Anybody can get an invite by contacting Matteo Capucci at name dot surname at gmail dot com.
For all things related to this archive refer to the same person.


Stream: community: our work

Topic: Bruno Gavranović


view this post on Zulip Bruno Gavranović (Mar 14 2024 at 12:08):

This is as good a time as any to make a thread here: I've published my PhD thesis!

The thesis title is

Fundamental Components of Deep Learning: A category-theoretic approach

You can find it here, together with the accompanying blog post. At some point I will give more context around this, but for now I'll just leave here the last paragraph of the thesis, which summarises how I see it fitting into a broader research programme: one that studies the process by which we find structure in data, and attempts to formalise this with concrete implementations of systems which do so via iterative updates.

Screenshot_20240314_120107.png

I'm looking forward to hearing your thoughts and comments.

view this post on Zulip Bruno Gavranović (Mar 14 2024 at 12:13):

And lastly, I want to promote here our latest position paper, Categorical Deep Learning: An Algebraic Theory of Architectures, which builds directly on the $\mathbf{Para}$ construction and provides a general theory of neural network architectures that directly subsumes Geometric Deep Learning: it formulates equivariant maps as monad algebra homomorphisms, and shows how, by generalising these to homomorphisms of algebras for particular endofunctors, we can go beyond equivariance under invertible transformations and start capturing transformations which aren't invertible, such as constructors for lists, trees, and all sorts of inductive and coinductive structures in theoretical computer science.
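To make the endofunctor-algebra idea concrete, here is a minimal Haskell sketch (my own toy illustration, not code from the paper): list constructors form an algebra for the functor $F(X) = 1 + A \times X$, and the structure-preserving maps out of the initial such algebra are exactly folds.

```haskell
{-# LANGUAGE DeriveFunctor #-}

-- The base endofunctor whose algebras are "list-shaped":
-- F x = 1 + a * x   (NilF ~ nil, ConsF ~ cons).
data ListF a x = NilF | ConsF a x deriving Functor

-- An F-algebra: a carrier c together with a structure map F c -> c.
type Algebra f c = f c -> c

-- Lists themselves carry the initial ListF-algebra...
embed :: Algebra (ListF a) [a]
embed NilF         = []
embed (ConsF a as) = a : as

-- ...so every other algebra induces a unique homomorphism out of lists: a fold.
cata :: Algebra (ListF a) c -> [a] -> c
cata alg []       = alg NilF
cata alg (a : as) = alg (ConsF a (cata alg as))

-- Example: summation is the fold of this algebra on the carrier Int.
sumAlg :: Algebra (ListF Int) Int
sumAlg NilF        = 0
sumAlg (ConsF n m) = n + m

total :: Int
total = cata sumAlg [1, 2, 3]  -- 6
```

The rough point: a map constrained to be an algebra homomorphism gets this fold-like structure by construction, in the same way an equivariant layer is constrained to be a homomorphism of algebras for a group-action monad.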

view this post on Zulip Bruno Gavranović (Mar 14 2024 at 12:14):

This revealed some interesting correspondences, for instance the one between parametric Mealy machines and recurrent neural networks (sketched below). We're still trying to figure out many parts of this, and I imagine lots of people in the community here might have interesting ideas about structured computation that can be captured in these ways.

Screenshot_20240314_121419.png
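As a minimal sketch of the Mealy machine side of this correspondence (my own illustration; the toy scalar cell stands in for a real RNN): a Mealy machine is a transition map $s \to a \to (s, b)$, and fixing the weights of an RNN cell yields exactly such a machine, with the hidden state playing the role of $s$.

```haskell
-- A Mealy machine: from a state and an input, produce a new state and an output.
newtype Mealy s a b = Mealy { step :: s -> a -> (s, b) }

-- Run a machine over an input sequence, threading the state through.
runMealy :: Mealy s a b -> s -> [a] -> [b]
runMealy _ _ []       = []
runMealy m s (a : as) =
  let (s', b) = step m s a
  in  b : runMealy m s' as

-- A toy scalar "RNN cell": the weights (wx, wh, bias) are the parameter;
-- choosing them yields a Mealy machine with hidden state h :: Double.
rnnCell :: (Double, Double, Double) -> Mealy Double Double Double
rnnCell (wx, wh, bias) = Mealy $ \h x ->
  let h' = tanh (wx * x + wh * h + bias)
  in  (h', h')

-- Unroll the cell over a sequence, starting from h = 0.
outputs :: [Double]
outputs = runMealy (rnnCell (0.5, 0.9, 0.0)) 0 [1, 0, 1, 1]
```

The "parametric" part is the outer map from weights to machines, which is exactly the dependency that $\mathbf{Para}$ packages compositionally.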

view this post on Zulip Ralph Sarkis (Mar 14 2024 at 15:27):

Bruno Gavranović said:

I'll just leave here the last paragraph of the thesis, which summarises how I see it fitting into a broader research programme [...]
I'm looking forward to hearing your thoughts and comments.

That resonates with a vague thought I had while watching this video yesterday. In it, they build an AI agent to play Trackmania (a car racing game); the agent is faster than humans, but very inconsistent due to some chaotic behaviour in the physics engine. Humans, on the other hand, can get fairly consistent (modulo human parameters like fatigue), but not at such high speeds. I was thinking about how to train for consistency in such a setting, and I believe that using a compositional model might help (if the agent understands, as humans do, which situations are harder to navigate, it can be more careful and try to mitigate the inconsistencies). Are there some results about compositionality as a means to achieve consistency?

view this post on Zulip Bruno Gavranović (Mar 15 2024 at 22:32):

Ah, absolutely!

I think compositional models are the key to achieving consistency. Take, for example, the universal approximation theorem: it tells you that with an infinitely wide network you can approximate any function. And while that's true, there's a very neat visual proof that it does not tell you anything about generalisation: fitting, say, the function $f(x) = x^2$ is done by fitting a bunch of squares under it, meaning for any finite dataset there's always a part of the input space which the neural network hasn't seen, and on which it has no means of generalising well.
So there is no 'consistency', as you call it: you can't fully learn the function $x^2$ with a single linear layer, because you can only produce a linear combination of inputs, which $x^2$ is not.
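To see the 'fitting squares' picture computationally, here is a throwaway Haskell sketch (my own construction, not from the thesis): a piecewise-constant fit to $x^2$ built from finitely many samples is accurate inside the sampled interval and says nothing useful one step outside it.

```haskell
import Data.List (minimumBy)
import Data.Ord (comparing)

-- The target function we pretend to learn.
f :: Double -> Double
f x = x * x

-- A finite dataset: samples of f on [0, 1].
samples :: [Double]
samples = [0, 0.1 .. 1.0]

-- Piecewise-constant "bunch of squares" fit: at a query point,
-- reuse the target value at the nearest sampled x.
nearestFit :: [Double] -> Double -> Double
nearestFit xs q = f (minimumBy (comparing (\x -> abs (x - q))) xs)

-- Inside the sampled interval the error is bounded by the grid spacing...
errInside :: Double
errInside = abs (nearestFit samples 0.42 - f 0.42)  -- ~0.016

-- ...but outside it the fit just repeats f 1.0 forever, and the error grows
-- without bound: there is no inductive bias pointing at x^2 itself.
errOutside :: Double
errOutside = abs (nearestFit samples 3.0 - f 3.0)   -- 8.0
```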

I actually watched the other video from the same creator on YouTube, and I believe the same story applies here. The current Trackmania model seems to largely overfit, and can easily get confused on out-of-distribution examples.
For this particular case, some physics-based priors seem to be necessary to achieve good generalisation. But I'm not sure what algebraic structure could be used to encode them.

I think the same issue arises when dealing with, say, structural recursion. There is ample evidence (1, 2) that transformers perform an operation analogous to 'fitting squares' when learning complex algorithms, and have no architectural bias that would allow them to learn how to perform, say, structural recursion. They can learn it for 1, 3, 10, or perhaps 20 steps, but they eventually break down.
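For contrast, here is what structural recursion looks like when it is available as a primitive (a standard Haskell fold; my own illustrative example): the definition is correct at every depth by construction, rather than only at the depths that happened to appear in training.

```haskell
-- A binary tree and its structural recursion principle (a fold).
data Tree a = Leaf a | Node (Tree a) (Tree a)

foldTree :: (a -> r) -> (r -> r -> r) -> Tree a -> r
foldTree leaf _    (Leaf a)   = leaf a
foldTree leaf node (Node l r) = node (foldTree leaf node l) (foldTree leaf node r)

-- Two folds, each valid at depth 1, 20, or 20000 alike.
depth :: Tree a -> Int
depth = foldTree (const 1) (\l r -> 1 + max l r)

sumTree :: Tree Int -> Int
sumTree = foldTree id (+)
```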

The deep learning community has indeed identified compositionality as a useful tool for describing the world around us; take Yoshua Bengio's Turing Lecture, which states that 'Compositionality is useful to describe the world around us effectively'. But no specific connections to category theory have been made.

view this post on Zulip Graham Manuell (Mar 16 2024 at 06:38):

Perhaps the consistency problems can be dealt with by explicitly training for consistency: selecting for strategies that take longer to diverge when applied to very small perturbations of the input?
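One toy way to make that selection criterion concrete (the `Strategy` type, the placeholder `stepSim` dynamics, and the thresholds below are all hypothetical stand-ins, not anything from the video): score a strategy by how many simulation steps two runs stay close after a tiny perturbation of the start state, then keep the strategy that survives longest.

```haskell
import Data.List (maximumBy)
import Data.Ord (comparing)

type State    = [Double]
type Strategy = State -> Double  -- maps a state to a (scalar) action

-- Placeholder dynamics; a real physics simulator would go here.
stepSim :: Strategy -> State -> State
stepSim strat s = map (\x -> x + 0.1 * strat s) s

-- Max-norm distance between two states.
dist :: State -> State -> Double
dist a b = maximum (map abs (zipWith (-) a b))

-- How many steps two runs (one from a slightly perturbed start)
-- stay within eps of each other, capped at maxT steps.
stepsBeforeDivergence :: Double -> Int -> Strategy -> State -> Int
stepsBeforeDivergence eps maxT strat s0 =
  length . takeWhile (< eps) . take maxT $
    zipWith dist (iterate (stepSim strat) s0)
                 (iterate (stepSim strat) (perturb s0))
  where
    perturb (x : xs) = (x + 1e-6) : xs
    perturb []       = []

-- Select for consistency: prefer the strategy that diverges latest.
selectConsistent :: [Strategy] -> State -> Strategy
selectConsistent strats s0 =
  snd (maximumBy (comparing fst)
        [ (stepsBeforeDivergence 0.1 1000 p s0, p) | p <- strats ])
```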

view this post on Zulip Noah Chrein (Apr 03 2024 at 19:13):

Bruno, can I ask about your involvement with Paul Lessard and Symbolica? He was just on Machine Learning Street Talk, and they claimed to have raised $30M; the main paper they referenced was CDL: An Algebraic Theory of Architectures.

What is this $30M specifically for? Are you accruing compute to now train a model based on Para?

view this post on Zulip Noah Chrein (Apr 03 2024 at 19:15):

Congrats on publishing the thesis btw!

view this post on Zulip Ryan Wisnesky (Apr 09 2024 at 22:00):

https://finance.yahoo.com/news/vinod-khosla-betting-former-tesla-130000001.html

view this post on Zulip fosco (Apr 10 2024 at 07:58):

Noah Chrein said:

What is this $30M specifically for?

And even more importantly, how much of this $30M are you giving to your fellow category theorists to fund their research? :hearts:

view this post on Zulip Morgan Rogers (he/him) (Apr 10 2024 at 07:59):

@Ryan Wisnesky please don't post links without accompanying text...

view this post on Zulip Morgan Rogers (he/him) (Apr 10 2024 at 08:24):

I clicked anyway. It made me sad.

Khosla admits he does not understand the math-filled paper—pointing out there are very few people in the world who fully understand category theory—“when these really smart people gravitate to an idea, it’s an important idea,” he said.

:man_facepalming: I really do not want CT to be beached by a tech hype tide... but at least if it's successful enough policymakers will put money into it for a while.

view this post on Zulip Bruno Gavranović (Apr 10 2024 at 08:41):

Noah Chrein said:

Bruno, can I ask about your involvement with Paul Lessard and Symbolica? He was just on Machine Learning Street Talk, and they claimed to have raised $30M; the main paper they referenced was CDL: An Algebraic Theory of Architectures.

What is this $30M specifically for? Are you accruing compute to now train a model based on Para?

Absolutely, @Noah Chrein .

This is as good a time as any to announce a few different pieces of exciting news:

view this post on Zulip Bruno Gavranović (Apr 10 2024 at 08:42):

I'm incredibly excited about this, for a few different reasons.

Monoids, categories, universal properties, and other concepts from category theory have been an indispensable tool for me and many other scientists for understanding the world. They have allowed us to find robust patterns in data, and to communicate, verify, and explain our reasoning to one another. In many ways, isn't this the goal of deep learning? The creation of models which understand the world in robust, generalisable, but also verifiable ways?

view this post on Zulip Bruno Gavranović (Apr 10 2024 at 08:45):

Noah Chrein said:

What is this $30M specifically for?

The funding is to be used for the development of this research programme, staffing, and compute.

view this post on Zulip Bruno Gavranović (Apr 10 2024 at 08:46):

fosco said:

how much of this $30M are you giving to your fellow category theorists to fund their research? :hearts:

We're hiring fellow category theorists! We're opening up offices in the UK and Australia - check out the job ads.

view this post on Zulip Bruno Gavranović (Apr 10 2024 at 08:50):

Morgan Rogers (he/him) said:

Khosla admits he does not understand the math-filled paper—pointing out there are very few people in the world who fully understand category theory—“when these really smart people gravitate to an idea, it’s an important idea,” he said.

:man_facepalming: I really do not want CT to be beached by a tech hype tide... but at least if it's successful enough policymakers will put money into it for a while.

This is a justified concern, and one I've been thinking about myself. It's easy to overpromise and build hype! I'm trying hard - together with some other fantastic people we've hired - to keep us grounded.

Khosla indeed isn't privy to the insights behind CT, but our team is.

view this post on Zulip Morgan Rogers (he/him) (Apr 10 2024 at 09:15):

You should advertise those over in #community: positions!

view this post on Zulip John Baez (Apr 10 2024 at 14:22):

I did it - easy enough to do while I was looking over the ads.

view this post on Zulip Bruno Gavranović (Apr 10 2024 at 14:42):

Thanks @John Baez !

view this post on Zulip Noah Chrein (Apr 10 2024 at 15:19):

Thanks for the info Bruno, this is wonderful.

Morgan Rogers (he/him) said:

I really do not want CT to be beached by a tech hype tide

This is just one of the first waves of a storm brewing in structured AI. The structure-first (i.e. category-theoretic) view of AI is probably correct, and I think more VC-type "whales" are about to beach themselves here. Let's hope we don't get too disrupted by whatever explodes out of the beached whales. This is what I was trying to allude to in this thread when I mentioned potential AI agents entering the Zulip (originally in response to a different thread, but moved by a mod). $30M is a "drop in the ocean" for AI VC funding, and I can imagine some poaching of category theorists.

I wonder @Bruno Gavranović if you have any thoughts about how this community and individual category theorists can navigate the structured-AI hype train that, I feel, is about to hit CT. Maybe we can continue this conversation in that other thread or start a new one.

view this post on Zulip Morgan Rogers (he/him) (Apr 10 2024 at 15:36):

What does "VC" stand for?

view this post on Zulip Ralph Sarkis (Apr 10 2024 at 15:37):

https://en.wikipedia.org/wiki/Venture_capital

view this post on Zulip Morgan Rogers (he/him) (Apr 10 2024 at 15:44):

Thanks @Ralph Sarkis