Category Theory
Zulip Server
Archive

You're reading the public-facing archive of the Category Theory Zulip server.
To join the server you need an invite. Anybody can get an invite by contacting Matteo Capucci at name dot surname at gmail dot com.
For all things related to this archive refer to the same person.


Stream: event: Categories for AI

Topic: Week 5: Monads


view this post on Zulip Andrew Dudzik (Nov 06 2022 at 16:34):

Hi all,

For the final lesson of the course, I'll be going over monads and their algebras, and why they come in handy for typing data in neural networks.

The time is Monday November 7th, 4PM UTC. The Zoom link is here: https://uva-live.zoom.us/j/83816139841
Slides are here: https://docs.google.com/presentation/d/1TBEaz-S5Zq42nDlkJVxDUBIyJ16rK9sD

This lecture will help explain the use of monads in "Graph Neural Networks are Dynamic Programmers" (Dudzik and Veličković, NeurIPS 2022).

See you all soon!
Andrew

view this post on Zulip Pim de Haan (Nov 07 2022 at 12:46):

The youtube link is https://youtu.be/y16JDvRi8GU

view this post on Zulip Alberto Colombo (Nov 07 2022 at 18:03):

Does anyone have a good reference for the use of monads (or category theory in general) in functional programming?

view this post on Zulip Jethro Djan (Nov 07 2022 at 18:12):

someone has already recommended it, but in case you didn't catch it: you can get "Category Theory for Programmers" by Bartosz Milewski

view this post on Zulip Jethro Djan (Nov 07 2022 at 18:17):

I also like "Learning Functional Programming" by Jack Wedman. Chapter 3 has some info on how monads translate practically into FP
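For anyone following the book recommendations above, here is a minimal sketch of what monads buy you in practice in Haskell: chaining computations that may fail, without nested case analysis. The names `safeDiv` and `pipeline` are illustrative, not from the thread.

```haskell
-- A sketch of the Maybe monad in practice: (>>=) threads the success
-- value forward, and any Nothing short-circuits the whole chain.

safeDiv :: Int -> Int -> Maybe Int
safeDiv _ 0 = Nothing
safeDiv x y = Just (x `div` y)

-- Divide 100 by n, then halve the result; fails if either step fails.
pipeline :: Int -> Maybe Int
pipeline n = safeDiv 100 n >>= \q -> safeDiv q 2

main :: IO ()
main = do
  print (pipeline 10)  -- Just 5
  print (pipeline 0)   -- Nothing
```

The point is that the failure-handling plumbing lives entirely in the monad's `(>>=)`, not in the calling code.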

view this post on Zulip Sichu Lu (Nov 07 2022 at 22:54):

Hey I came across a cool workshop/conference on the use of monads in probability/statistics if anyone here wants to learn more stuff :P https://www.youtube.com/watch?v=3Da88Tgw_rM&list=PLaILTSnVfqtIebAXFOcee9MvAyBwhIMyr&index=1

view this post on Zulip Pim de Haan (Nov 08 2022 at 07:36):

Hi @Andrew Dudzik , I have a question regarding your talk. In the title you mention LSTMs, but they didn't really come up in the talk. I'm wondering how RNNs fit into the picture of list algebras.
It seems like an RNN is a general map $[R] \to R$, but for this to be a proper algebra, it needs to be composed of binary monoid multiplications $R \times R \to R$, which is not what an RNN does, as it tracks a hidden state while processing the list. Is there an elegant way of adding a hidden state to the setup, so we can see an RNN as a monoid? Thanks!

view this post on Zulip Pim de Haan (Nov 08 2022 at 08:06):

@Sichu Lu I believe you asked about higher dimensional generalizations of algebras of a monad. Perhaps actegories are that. Here's a survey on those by @Matteo Capucci (he/him) and @Bruno Gavranovic .

view this post on Zulip Andrew Dudzik (Nov 08 2022 at 08:48):

Pim de Haan said:

Hi Andrew Dudzik , I have a question regarding your talk. In the title you mention LSTMs, but they didn't really come up in the talk. I'm wondering how RNNs fit into the picture of list algebras.
It seems like an RNN is a general map $[R] \to R$, but for this to be a proper algebra, it needs to be composed of binary monoid multiplications $R \times R \to R$, which is not what an RNN does, as it tracks a hidden state while processing the list. Is there an elegant way of adding a hidden state to the setup, so we can see an RNN as a monoid? Thanks!

Thanks, I intended to bring this up but it got lost in the mix.

I think we can model a hidden state by a module for a monoid, i.e. an algebra $M$ for the writer monad $A \times -$, where $A$ is the monoid of updates. Then we have an action $A \times M \to M$.

So we think of the input to an RNN as a monoid given by a list type, and the hidden state as a representation of it. Let me know if I'm overlooking any details.
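Andrew's suggestion can be sketched in Haskell (the choice of `Sum Int` as the update monoid $A$ and `Int` as the state $M$ is an illustrative assumption, not from the talk):

```haskell
-- A monoid A of updates acting on a hidden state M: act :: A -> M -> M.
-- Processing a list of inputs then reduces to folding their monoid
-- product into the state, which is the "RNN as monoid action" picture.
import Data.Monoid (Sum(..))

-- The action A x M -> M: apply an additive update to an Int state.
act :: Sum Int -> Int -> Int
act (Sum a) m = m + a

-- Run the "RNN": combine all updates with mconcat, then act on s0.
runRNN :: [Sum Int] -> Int -> Int
runRNN xs s0 = act (mconcat xs) s0

main :: IO ()
main = print (runRNN [Sum 1, Sum 2, Sum 3] 10)  -- 16
```

Because `act` is a monoid action, folding the inputs first and acting once gives the same result as acting step by step, which is exactly the coherence a plain RNN cell need not satisfy.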

view this post on Zulip Bruno Gavranović (Nov 08 2022 at 10:18):

Indeed, I think that would be an encoding RNN (using the terminology of this blog post).
Getting a full encoding-decoding RNN takes a bit more work.

view this post on Zulip Pim de Haan (Nov 08 2022 at 10:45):

Thanks, that makes sense!
Then is there an elegant way of thinking about the fold map $[A] \times M \to M$?

view this post on Zulip Andrew Dudzik (Nov 08 2022 at 11:07):

So let's use $R$ as above for the type of inputs, and let $A$ be a monoid we use to handle updates. I think we should have a set map $R \to A$, which is equivalent to a monoid map $[R] \to A$, that interprets inputs as updates. This is basically giving a presentation of $A$ using $R$ as a set of generators.

This separates things into two concerns: 1) What is the structure of the "update algebra"? and 2) Which module/representation of that algebra are we using? So for example, in the special case of invertible updates, this is like giving a presentation of a group and asking about its representations. (though if we want to work over vector spaces we should expand our group to a Hopf algebra)
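The factorisation Andrew describes is exactly what `foldMap` does in Haskell: a set map $R \to A$ extends uniquely to a monoid map $[R] \to A$. In this sketch the inputs $R$ are `Char`, the update monoid $A$ is `Endo String` (state-update functions under composition), and `embed` is an illustrative choice of interpretation:

```haskell
-- A set map embed :: R -> A extends uniquely to the monoid map
-- [R] -> A via foldMap; the state is then acted on by appEndo.
import Data.Monoid (Endo(..))

-- Interpret an input character as an update: "push onto the state".
embed :: Char -> Endo String
embed c = Endo (c :)

-- The unique monoid homomorphism [Char] -> Endo String extending embed.
interpret :: [Char] -> Endo String
interpret = foldMap embed

main :: IO ()
main = putStrLn (appEndo (interpret "abc") "")  -- abc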

view this post on Zulip Pim de Haan (Nov 08 2022 at 12:17):

Thanks!

view this post on Zulip Bruno Gavranović (Nov 09 2022 at 11:57):

Indeed, I think there's a general pattern hiding in initial algebras / final coalgebras that describes the structurally recursive nature of many neural networks.
For the encoding RNN, as you say, we need a map $f : X \times S \to S$ that takes an input and a state and produces a new state. Together with an initial state $s_0 : 1 \to S$, this forms a map $[s_0, f] : 1 + X \times S \to S$, which can be thought of as an algebra for the endofunctor $1 + X \times - : \mathcal{C} \to \mathcal{C}$.
Then the initial algebra of it describes lists, i.e. a map that takes any number of $X$'s and recursively feeds them into $f$.

In a dual way, we can get decoding RNNs by using final coalgebras. Following the description in the blog post, this arises out of the endofunctor $Y \times - : \mathcal{C} \to \mathcal{C}$, giving us streams.

With general RNNs it gets a bit tricky, since they don't arise as a composition of these two concepts like the blog post suggests. But it can still be done using the Kleisli category of the state monad.
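Bruno's encoding-RNN picture can be sketched concretely: the endofunctor $1 + X \times -$ becomes a `ListF` datatype, its initial algebra is built with `Fix`, and the unique fold out of it is the catamorphism. The particular cell (`2*s + x`, reading a bit string) is an illustrative choice, not from the thread.

```haskell
{-# LANGUAGE DeriveFunctor #-}
-- Algebras for the endofunctor F s = 1 + X * s, and the generic fold
-- (catamorphism) out of its initial algebra, i.e. out of lists of X.

newtype Fix f = Fix (f (Fix f))

-- The endofunctor 1 + X * - (NilF is the 1, ConsF the X * s part).
data ListF x s = NilF | ConsF x s deriving Functor

type Algebra f a = f a -> a

-- The unique map from the initial algebra to any other algebra.
cata :: Functor f => Algebra f a -> Fix f -> a
cata alg (Fix t) = alg (fmap (cata alg) t)

-- An "RNN cell" algebra: s0 = 0, step (x, s) = 2*s + x.
cell :: Algebra (ListF Int) Int
cell NilF        = 0
cell (ConsF x s) = 2 * s + x

-- The list [1,0,1] as an element of the initial algebra.
input :: Fix (ListF Int)
input = Fix (ConsF 1 (Fix (ConsF 0 (Fix (ConsF 1 (Fix NilF))))))

main :: IO ()
main = print (cata cell input)  -- 5
```

`cata cell` is precisely the map $[s_0, f]$ extended over the whole list; decoding RNNs would dually unfold from a coalgebra (an anamorphism producing a stream).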

view this post on Zulip Bruno Gavranović (Nov 09 2022 at 12:00):

Of course, what we also want is for each of these individual maps to be parametric, i.e. take the same parameter at every time step. This takes special care in unpacking and making sure it does what we want, but can essentially be handled using the $\mathbf{Para}$ construction (paper).

view this post on Zulip Andrew Dudzik (Nov 09 2022 at 12:26):

It's worth adding for pedagogical purposes that "algebras for an endofunctor" and "algebras for a monad" mean slightly different things, as the latter has coherence axioms while the former doesn't. (this is one of many reasons I don't like the term "algebras" in this context)

But the two definitions are related: an algebra for an endofunctor $\mathtt{T}$ is equivalently an algebra for the free monad generated by $\mathtt{T}$.
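Andrew's distinction can be sketched in Haskell: an algebra for an endofunctor `f` is just a map `f a -> a` with no laws, while extending it over the free monad `Free f` yields a monad algebra that satisfies the coherence axioms automatically. The example functor `PairF` and the addition algebra are illustrative assumptions.

```haskell
{-# LANGUAGE DeriveFunctor #-}
-- Endofunctor algebras vs. monad algebras, via the free monad.

-- A hand-rolled free monad: Pure is return, Roll layers on f.
data Free f a = Pure a | Roll (f (Free f a)) deriving Functor

-- An f-algebra: no coherence axioms required.
type FAlgebra f a = f a -> a

-- Extend an f-algebra to an algebra for the free monad Free f;
-- the extension satisfies the monad-algebra laws by construction.
freeAlg :: Functor f => FAlgebra f a -> Free f a -> a
freeAlg _   (Pure a) = a
freeAlg alg (Roll t) = alg (fmap (freeAlg alg) t)

-- Example endofunctor: binary branching, with addition as algebra.
data PairF a = PairF a a deriving Functor

sumAlg :: FAlgebra PairF Int
sumAlg (PairF x y) = x + y

main :: IO ()
main = print (freeAlg sumAlg
  (Roll (PairF (Pure 1) (Roll (PairF (Pure 2) (Pure 3))))))  -- 6
```

This is the same `iter` function provided by the `free` package; the equivalence Andrew mentions is that `freeAlg` and restriction along `Roll . fmap Pure` are mutually inverse.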