A while ago I spotted something about learners; I'll record it here on the off chance someone finds it useful.
The 'Backprop as Functor' paper defines learners $A \to B$ as tuples $(P, I, U, r)$, where $P$ is a set of parameters, $I : P \times A \to B$ implements the learner, $U : P \times A \times B \to P$ updates the parameter, and $r : P \times A \times B \to A$ passes a request back, up to an equivalence: two learners are equivalent if there is a bijection $f : P \to P'$ so that the components of the two learners are equal up to this map. They mention that you might want to consider learners equal if there's just a surjection $f : P \to P'$, but if you go all the way to considering learners equivalent if there's *any* map $f : P \to P'$ that commutes with the components, something interesting happens.
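To make the discussion below easier to follow, here is that definition as a Haskell-style sketch (the type and field names are mine, not the paper's):

```haskell
-- A learner A -> B in the sense of 'Backprop as Functor',
-- with parameter set p. Field names are illustrative.
data Learner p a b = Learner
  { impl    :: p -> a -> b       -- I : P × A -> B, implement
  , update  :: p -> a -> b -> p  -- U : P × A × B -> P, update the parameter
  , request :: p -> a -> b -> a  -- r : P × A × B -> A, pass a request back
  }
```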
This relation is now exactly a coend over $\mathbf{Set}$ (EDIT: no it isn't, the coend relation is slightly different on the $U$ component), and the set of learners is given by
$$\mathbf{Learn}(A, B) \;=\; \int^{P \in \mathbf{Set}} \mathbf{Set}(P \times A,\, B) \times \mathbf{Set}(P \times A \times B,\, P \times A)$$
Learners and lenses are closely related, and we can use the same old moves to write the body of the coend in an 'optic' style:
$$\mathbf{Learn}(A, B) \;\cong\; \int^{P}\!\int^{M} \mathbf{Set}(P \times A,\, M \times B) \times \mathbf{Set}(M \times B,\, P \times A)$$
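Presumably the 'same old moves' are the product law plus Yoneda reduction, something like this (my reconstruction):

$$\begin{aligned}
\mathbf{Learn}(A, B) &\cong \int^{P} \mathbf{Set}(P \times A,\, B) \times \mathbf{Set}(P \times A \times B,\, P \times A) \\
&\cong \int^{P}\!\int^{M} \mathbf{Set}(P \times A,\, M) \times \mathbf{Set}(P \times A,\, B) \times \mathbf{Set}(M \times B,\, P \times A) \\
&\cong \int^{P}\!\int^{M} \mathbf{Set}(P \times A,\, M \times B) \times \mathbf{Set}(M \times B,\, P \times A)
\end{aligned}$$

where the middle step is Yoneda reduction on $M$ (which, read backwards, sets $M = P \times A$) and the last step is the universal property of the product.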
I think this symmetrical description is quite nice. There's now an obvious involution $\mathbf{Learn}(A, B) \cong \mathbf{Learn}(B, A)$: just switch the two functions in the coend! Written out using the original definition of a learner, the 'adjoint' of a learner $(P, I, U, r) : A \to B$ is a learner $B \to A$ with parameter set $P \times A$, and
$$I^{\dagger}((p, a), b) = r(p, a, b)$$
$$U^{\dagger}((p, a), b, a') = (U(p, a, b),\, a')$$
$$r^{\dagger}((p, a), b, a') = I(U(p, a, b),\, a')$$
If you do this twice, you end up with a learner with parameter set $P \times A \times B$, which is the original learner again, but running 'one training datum behind': when you give it a new bit of data $(a', b')$, it updates using the remembered data $(a, b)$ and stores the new $(a', b')$ for next time. To show this is equivalent to the original learner, we need a map $P \times A \times B \to P$ that commutes with the components, and for this we have $U$ itself.
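In the Haskell sketch from above, the involution comes out as follows (my reconstruction of the formulas just given):

```haskell
-- The 'adjoint' learner B -> A, with parameter set p × a:
-- switch the two functions in the coend.
dagger :: Learner p a b -> Learner (p, a) b a
dagger (Learner i u r) = Learner
  { impl    = \(p, a) b -> r p a b            -- infer with the request map
  , update  = \(p, a) b a' -> (u p a b, a')   -- update p, remember the new a'
  , request = \(p, a) b a' -> i (u p a b) a'  -- request with the implementation
  }
```

Then `dagger . dagger` has type `Learner p a b -> Learner ((p, a), b) a b`, which is the 'one datum behind' learner with parameter set $P \times A \times B$ described above.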
I haven't kept up with the latest on learners, does anyone have a sense of whether this notion of equivalence is too strong to be useful?
Is there an intuitive way to see the learner $\ell^{\dagger} : B \to A$? The parameter space $P \times A$ is a bit confusing. It's like having a neural net trained on MNIST that carries a picture around and uses it for inference.
Oh, I need to tag @Bruno Gavranovic here, pretty sure he's been working with this exact definition of 2-cells with the coend, and talked with David about it
I should think about this involution.... I asked almost exactly the same question about open games at some point but never was able to make it work out
I guess you need to use whatever intuition there is for the request function to understand what $I^{\dagger}$ is doing. In the backprop example, the original request function nudges the provided $a$ towards something that gives the output $b$. So right, the adjoint learner has a remembered $a$, and given any $b$, the 'inferred' $a$ is produced by nudging the remembered $a$ in the direction of that $b$. I don't have a good feel for this either...
Jules Hedges said:
I should think about this involution.... I asked almost exactly the same question about open games at some point but never was able to make it work out
What question were you thinking about? Whether there is an involution on Game? Or whether you need a stronger notion of Game equivalence?
Mitchell Riley said:
So right, the adjoint learner has a remembered $a$, and given any $b$, the 'inferred' $a$ is produced by nudging the remembered $a$ in the direction of that $b$. I don't have a good feel for this either...
This vaguely reminds me of style transfer. Say your learner is a neural net trained to identify artistic styles. Then the adjoint learner would keep a painting in memory and, when given a style, output that memorized painting in this new style.
I'm being deliberately imprecise about what constitutes a style here because my understanding is that this technique doesn't work with simple labels (ie you don't give the algorithm a painting and a tag like "impressionism") but with more complex features at intermediate layers.
Mitchell Riley said:
Jules Hedges said:
I should think about this involution.... I asked almost exactly the same question about open games at some point but never was able to make it work out
What question were you thinking about? Whether there is an involution on Game? Or whether you need a stronger notion of Game equivalence?
I wrote an entire paper, "The game semantics of game theory" (https://arxiv.org/abs/1904.11287), about an analogy between game theory and game semantics via open games where all of the small-p players in an open game collectively make up the capital-P Player, and the environment makes up the capital-O Opponent. So it seemed like a natural idea to try to get a duality switching the Player and Opponent. But I was never able to make it work
Game equivalence has always been something I've mostly ignored because it's far too difficult - equivalence even of ordinary normal form games is a gigantic can of worms, and adding openness to the mixture seems overwhelming. But, for what it's worth, recently I've convinced myself that a coend-based equivalence is probably a good idea
This involution definitely looks "right" to me, ie. part of the structure that ought to be considered as part of the category. Whether it transfers from learners also to games I'm not sure, but I'll definitely try to work it out
(The extra context here is that open learners and open games should be considered more or less different names for the same thing, nearly all distinctions between them should vanish when both of them are done right)
Mitchell Riley said:
I think this symmetrical description is quite nice. There's now an obvious involution $\mathbf{Learn}(A, B) \cong \mathbf{Learn}(B, A)$: just switch the two functions in the coend! Written out using the original definition of a learner, the 'adjoint' of a learner $(P, I, U, r) : A \to B$ is a learner $B \to A$ with parameter set $P \times A$ ...
It's surprising that we get a learner with parameter space $P \times A$ out of this, but this might be because I'm not sure what you mean by "switch the two functions in the coend"? Do you mean swap them in place, or rebind the variables, or something different?
I see in the elaboration that the new learner's implementation uses the request of the original learner, and the request is the only map that has $A$ in its codomain, so this is pretty neat
It means rebind the variables:
$$(f, g) \longmapsto (g, f)\;:\; \int^{P, M} \mathbf{Set}(P \times A,\, M \times B) \times \mathbf{Set}(M \times B,\, P \times A) \;\longrightarrow\; \int^{M, P} \mathbf{Set}(M \times B,\, P \times A) \times \mathbf{Set}(P \times A,\, M \times B)$$
Right, so in terms of a representative, $(f, g)^{\dagger} = (g, f)$.
Bruno Gavranovic said:
It's surprising that we get a learner with parameter space $P \times A$ out of this
The $A$ in the parameter set comes from the Yoneda reduction step in that string of isomorphisms. We do the reduction on $M$ to create the backwards map $(U, r) : M \times B \to P \times A$, which sets $M = P \times A$, and then when we do the swap, that $P \times A$ becomes the new '$P$'
I've been playing with this and I'm 90% sure that you can get more than a dagger structure with this trick: you can get a self-dual compact closed structure for which your dagger is the transpose
The counits $\varepsilon_A : A \otimes A \to I$ have $P = 1$ and are already present in the category of learners. The units $\eta_A : I \to A \otimes A$ have $P = A$. Since the transpose is built as a composite, and composites of learners multiply the parameters, this would "explain" why you multiply the parameter type by $A$
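If I'm reading this right, for a learner $\ell : A \to B$ with parameter set $P$ the transpose would be the usual compact-closed mate (my reconstruction):

$$\ell^{\dagger} \;:\; B \xrightarrow{\ \eta_A \otimes \mathrm{id}_B\ } A \otimes A \otimes B \xrightarrow{\ \mathrm{id}_A \otimes\, \ell\, \otimes \mathrm{id}_B\ } A \otimes B \otimes B \xrightarrow{\ \mathrm{id}_A \otimes \varepsilon_B\ } A$$

whose three factors have parameter sets $A$, $P$ and $1$, multiplying to $P \times A$ as in Mitchell's formula.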
Note that the map $\emptyset \to P$ always commutes with the structure for trivial reasons, so the coend
$$\int^{P} \mathbf{Set}(P \times A,\, B) \times \mathbf{Set}(P \times A \times B,\, P \times A)$$
is always a singleton, given by the unique learner with $P = \emptyset$
Oops, I screwed up in jumping from "function that commutes with the structure" to "coend relation". But we're not hosed: the coend relation for $(P, I, U, r)$ to be related to $(P', I', U', r')$ actually requires that there's a function $V : P' \times A \times B \to P$ (alongside the map $f : P \to P'$ relating the other components) such that $U = V \circ (f \times \mathrm{id})$ and $U' = f \circ V$. And then the empty set doesn't break things.
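Spelled out, I believe the generating coend relation is: given $f : P \to P'$ and maps $J : P' \times A \to B$, $V : P' \times A \times B \to P$, $s : P' \times A \times B \to A$, identify

$$\bigl(P,\ J \circ (f \times \mathrm{id}),\ V \circ (f \times \mathrm{id}),\ s \circ (f \times \mathrm{id})\bigr) \;\sim\; \bigl(P',\ J,\ f \circ V,\ s\bigr)$$

In particular the empty parameter set causes no collapse, since a map $V$ into $\emptyset$ generally doesn't exist.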
Jules Hedges said:
Game equivalence has always been something I've mostly ignored because it's far too difficult - equivalence even of ordinary normal form games is a gigantic can of worms, and adding openness to the mixture seems overwhelming. But, for what it's worth, recently I've convinced myself that a coend-based equivalence is probably a good idea
Right, I definitely didn't mean some complicated game-theoretic equivalence. Do you have thoughts on what that coend-y equivalence would look like? I find it confusing that open games are related to these other things that can be described using coends/profunctors, but then open games are only contravariant in the strategy set. Is it possible there is some missing covariant data?
Hi Mitchell, that's a very cool observation! I've been talking about this with @Bruno Gavranovic today and we have a nice diagrammatic picture of what's happening
The $(-)^{\dagger}$ operation can be seen as the following diagram manipulation:
We conjecture that the 'loop' you draw is a delayed trace, since when we slide the bottom part along it, it remembers A
In pictures, this is the lens you start with (plus the loop): lens-delayed-trace.jpg
Sliding the bottom part along the loop yields: lens-delayed-trace2.jpg
which one can rearrange as lens-delayed-trace3.jpg
which is the $\dagger$ of the first!
I think it'd be nice to write up something about this and the recent work we've been doing at Strathclyde. There seems to be a very useful structure on parametrized optics, which we take as the fundamental building blocks of cybernetic systems (e.g. learners and games)
Cool! I regret using the symbol $\dagger$, since this construction still works if the objects have distinct input and output ports (or whatever you call them), so a learner $(A, A') \to (B, B')$ is given by $\int^{P, M} \mathbf{Set}(P \times A,\, M \times B) \times \mathbf{Set}(M \times B',\, P \times A')$. But then the objects are not self-dual as Jules was describing, and the swap on a learner $(A, A') \to (B, B')$ gives a learner $(B', B) \to (A', A)$.
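Continuing the Haskell sketch, the two-port version and its swap might look like this (again my own reconstruction; names are made up):

```haskell
-- A learner (a, a') -> (b, b') with distinct input/output ports.
data Learner2 p a a' b b' = Learner2
  { impl2    :: p -> a -> b         -- P × A -> B
  , update2  :: p -> a -> b' -> p   -- P × A × B' -> P
  , request2 :: p -> a -> b' -> a'  -- P × A × B' -> A'
  }

-- The swap now changes the objects: (b', b) -> (a', a).
dagger2 :: Learner2 p a a' b b' -> Learner2 (p, a) b' b a' a
dagger2 (Learner2 i u r) = Learner2
  { impl2    = \(p, a) b' -> r p a b'
  , update2  = \(p, a) b' a'' -> (u p a b', a'')
  , request2 = \(p, a) b' a'' -> i (u p a b') a''
  }
```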
Absolutely
Also for parametrized mixed optics
Yeah, I noticed that when I tried to generalise it to open games, and then that was the hint to me that it's actually a compact closure
Matteo is referring to the generalisation I guessed/conjectured yesterday when I wrote it in terms of the machinery we're heavily working on at Strathclyde: if $\mathcal{C}$ is any symmetric monoidal category then $\mathbf{Para}(\mathbf{Optic}_{\mathcal{C}})$ is a compact closed bicategory
Is a 2-cell in that bicategory an "optic between the parameter sets"? I'm not seeing how you would get an invertible 2-cell between a learner and its double dual. Or maybe that's not the right thing to expect?
Yes. And the 2-cell you need isn't invertible, which is what makes this kind of subtle. At some point in the tower of abstractions you end up taking a coend over Optic, which quotients out arbitrary non-invertible optics in either direction. In the case of "classic" learners things are simpler
Is the definition of $\mathbf{Para}(\mathbf{Optic})$ you are using written down somewhere? I am definitely missing something.
Coends over the category of optics are something we only started thinking about recently, they're a bit mind-bending
Not written down yet. It's quite a simple definition - this is @Bruno Gavranovic's definition, although it's simple and natural enough that it's probably hiding somewhere in the literature. For any monoidal category $\mathcal{C}$ there is a bicategory $\mathbf{Para}(\mathcal{C})$ whose objects are the same as $\mathcal{C}$, morphisms $A \to B$ are pairs of an object $P$ and a morphism $P \otimes A \to B$, and 2-cells $(P, f) \to (Q, g)$ are morphisms $P \to Q$ making the evident triangle commute. That's the direct definition, and it can be characterised in some fun abstract ways too
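For what it's worth, here is that direct definition specialised to $(\mathbf{Set}, \times)$ in the Haskell notation from earlier (a sketch; the names are mine):

```haskell
-- Para over (Set, ×): a morphism a -> b is a parameter set p
-- together with a map p × a -> b.
data Para p a b = Para { runPara :: p -> a -> b }

-- Composition pairs up the parameters: composites multiply parameter
-- sets, as noted earlier in the thread.
composePara :: Para q b c -> Para p a b -> Para (q, p) a c
composePara (Para g) (Para f) = Para (\(q, p) a -> g q (f p a))

-- A 2-cell (p, f) -> (q, g) is a map h : p -> q such that
-- g (h p) a == f p a; the triangle is a property of h, not extra data.
type Reparam p q = p -> q
```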
So the category that's called Para in Backprop As Functor is Para of [Euclidean spaces, smooth maps] with invertible 2-cells quotiented out
I shouldn't say too much about this stuff in public, otherwise we'll have to scramble even faster to get a paper out...
Mitchell Riley said:
Is the definition of $\mathbf{Para}(\mathbf{Optic})$ you are using written down somewhere? I am definitely missing something.
Yes it is - it's written here https://arxiv.org/abs/2103.01931
And indeed, figuring out the right notion of 2-cells for Learners is a good question. The thing appearing in Backprop as Functor (and the "2-cells" in the coend you wrote) is not the thing you get when you construct $\mathbf{Para}(\mathbf{Optic})$. But the latter makes more sense coming from actual machine learning.
2-cells in $\mathbf{Para}(\mathcal{C})$ are maps in $\mathcal{C}$ and 2-cells in $\mathbf{Para}(\mathbf{Optic}_{\mathcal{C}})$ are maps in $\mathbf{Optic}_{\mathcal{C}}$. The former satisfy the triangle-commuting condition that Jules wrote and the latter satisfy a condition coming out of a coend.
It turns out that if you take optics as 2-cells, you get a lot of neat stuff pertaining to gradient descent. I still don't know of a good example in machine learning where 2-cells are useful
P.S. I still have to properly internalize this involution, it's a very interesting idea!
The way I've unpacked it for myself is that a learner $A \to B$ is a pair of two maps $P \times A \to B$ and $P \times A \times B \to P \times A$. Then what you're proposing is figuring out a learner going from $B$ to $A$. The only way we can do that is to use $P \times A$ as the parameter space (and, in lens terms, as the residual), so that's where the $A$ in the parameter space appears.
But somehow the idea of the double involution, i.e. "a learner should remember previous inputs in its parameter space" doesn't immediately sit right with me, for reasons I'm not sure how to articulate yet.
Right, that's the definition of Para I was thinking of. But Jules I don't see where a coend over Optic would come into it...
I should stress that I screwed up my description of the coend relation in my initial message, as pointed out by Eigil, and so the coend notion of 2-cell is not the same thing as the Learn notion of 2-cell.
(Also, I'm not sure we would actually want to use the coend relation as un-quotiented cells)
One more little observation: if $\mathcal{C}$ is a symmetric monoidal category and $\mathcal{D}$ is a compact closed category, then it looks like any symmetric monoidal functor $F : \mathcal{C} \to \mathcal{D}$ extends to a compact closed functor out of the category of learners over $\mathcal{C}$, by sending an object $A$ to $FA \otimes (FA)^{*}$ and a representative $(f, g)$ to Screen-Shot-2021-04-23-at-1.04.54-pm.png