Category Theory
Zulip Server
Archive

You're reading the public-facing archive of the Category Theory Zulip server.


Stream: theory: applied category theory

Topic: an involution on 'Learn'


view this post on Zulip Mitchell Riley (Apr 18 2021 at 01:57):

A while ago I spotted something about learners; I'll record it here in the chance someone finds it useful.

The 'Backprop as Functor' paper defines learners $A \nrightarrow B$ as tuples $(P, I : P \times A \to B, U : P \times A \times B \to P, r : P \times A \times B \to A)$, up to an equivalence: two learners are equivalent if there is a bijection $P \to P'$ so that the components of the two learners are equal up to this map. They mention that you might want to consider learners equal if there's just a surjection $P \to P'$, but if you go all the way to considering learners equivalent if there's any map $P \to P'$ that commutes with the components, something interesting happens.
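For concreteness, here is a minimal Haskell sketch of this tuple definition, with the parameter set $P$ hidden behind an existential; the equivalence relation itself is not represented, and the names are just for illustration.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

-- A learner A ↛ B: a hidden parameter type p together with
-- implement, update and request maps, as in 'Backprop as Functor'.
-- The quotient by the equivalence relation is not modelled here.
data Learner a b = forall p. Learner
  (p -> a -> b)       -- I : P × A → B   (implement)
  (p -> a -> b -> p)  -- U : P × A × B → P  (update)
  (p -> a -> b -> a)  -- r : P × A × B → A  (request)
```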

This relation is now exactly a coend over $Set$ (EDIT: no it isn't, the coend relation is slightly different on the $U$ component), and the set of learners $A \nrightarrow B$ is given by

$$\int^{P} Set(P \times A, B) \times Set(P \times A \times B, P) \times Set(P \times A \times B, A)$$

Learners and lenses are closely related, and we can use the same old moves to write the body of the coend in an 'optic' style:

$$\begin{aligned} &\int^{P} Set(P \times A, B) \times Set(P \times A \times B, P) \times Set(P \times A \times B, A) \\ &\cong \int^{P} Set(P \times A, B) \times Set(P \times A \times B, P \times A) \\ &\cong \int^{P, Q} Set(P \times A, Q) \times Set(P \times A, B) \times Set(Q \times B, P \times A) \\ &\cong \int^{P, Q} Set(P \times A, Q \times B) \times Set(Q \times B, P \times A) \end{aligned}$$

I think this symmetrical description is quite nice. There's now an obvious involution $Learn \to Learn^{op}$: just switch the two functions in the coend! Written out using the original definition of a learner, the 'adjoint' of an $A \nrightarrow B$ given by $(P, I, U, r)$ is a learner $B \nrightarrow A$ with parameter set $P \times A$, and

$$\begin{aligned} I^\dagger : (P \times A) \times B &\to A \\ I^\dagger(p, a, b) &= r(p, a, b) \\ U^\dagger : (P \times A) \times B \times A &\to (P \times A) \\ U^\dagger(p, a', b, a) &= (U(p, a', b), a) \\ r^\dagger : (P \times A) \times B \times A &\to B \\ r^\dagger(p, a', b, a) &= I(U(p, a', b), a) \end{aligned}$$
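A Haskell sketch of the same construction, assuming the Learner type from the sketch above (redeclared here so the snippet stands on its own):

```haskell
{-# LANGUAGE ExistentialQuantification #-}

-- Learner type as in the earlier sketch.
data Learner a b = forall p. Learner
  (p -> a -> b)       -- I (implement)
  (p -> a -> b -> p)  -- U (update)
  (p -> a -> b -> a)  -- r (request)

-- The 'adjoint' of (P, I, U, r) : A ↛ B is a learner B ↛ A whose parameter
-- set is P × A: the parameter carries the last input a' alongside p.
dagger :: Learner a b -> Learner b a
dagger (Learner i u r) = Learner i' u' r'
  where
    i' (p, a') b   = r p a' b          -- I†((p,a'), b)    = r(p, a', b)
    u' (p, a') b a = (u p a' b, a)     -- U†((p,a'), b, a) = (U(p, a', b), a)
    r' (p, a') b a = i (u p a' b) a    -- r†((p,a'), b, a) = I(U(p, a', b), a)
```

Composing `dagger . dagger` then gives a learner $A \nrightarrow B$ whose parameter type is $(P \times A) \times B$.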

If you do this twice, you end up with a learner $A \nrightarrow B$ with parameter set $P \times A \times B$, which is the original learner again, but running 'one training datum behind': when you give it a new bit of data, it updates $P$ using the remembered data and stores the new $A \times B$ for next time. To show this is equivalent to the original learner, we need a map $P \times A \times B \to P$ that commutes with the components, and for this we have $U$.
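Unpacking this claim (just the dagger formulas applied twice, with parameter $((p, a'), b') \in (P \times A) \times B$):

$$\begin{aligned} I^{\dagger\dagger}(((p, a'), b'), a) &= I(U(p, a', b'), a) \\ U^{\dagger\dagger}(((p, a'), b'), a, b) &= ((U(p, a', b'), a), b) \\ r^{\dagger\dagger}(((p, a'), b'), a, b) &= r(U(p, a', b'), a, b) \end{aligned}$$

Each component first updates $p$ on the remembered pair $(a', b')$ and then behaves like the original learner on the fresh data, which is exactly the 'one datum behind' behaviour.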

I haven't kept up with the latest on learners, does anyone have a sense of whether this notion of equivalence is too strong to be useful?

view this post on Zulip Javier Prieto (Apr 18 2021 at 13:29):

Is there an intuitive way to see the learner $B \nrightarrow A$? The parameter space is a bit confusing. It's like having a neural net trained on MNIST that carries a picture around and uses it for inference.

view this post on Zulip Jules Hedges (Apr 18 2021 at 13:30):

Oh, I need to tag @Bruno Gavranovic here, pretty sure he's been working with this exact definition of 2-cells with the coend, and talked with David about it

view this post on Zulip Jules Hedges (Apr 18 2021 at 13:32):

I should think about this involution.... I asked almost exactly the same question about open games at some point but never was able to make it work out

view this post on Zulip Mitchell Riley (Apr 21 2021 at 02:03):

I guess you need to use whatever intuition there is for the request function to understand what $B \nrightarrow A$ is doing. In the backprop example, the original request function nudges the provided $A$ towards something that gives the output $B$. So right, the adjoint learner has a remembered $A$, and given any $B$, the 'inferred' $A$ is produced by nudging the remembered $A$ in the direction of that $B$. I don't have a good feel for this either...

view this post on Zulip Mitchell Riley (Apr 21 2021 at 02:07):

Jules Hedges said:

I should think about this involution.... I asked almost exactly the same question about open games at some point but never was able to make it work out

What question were you thinking about? Whether there is an involution on Game? Or whether you need a stronger notion of Game equivalence?

view this post on Zulip Javier Prieto (Apr 21 2021 at 10:31):

Mitchell Riley said:

So right, the adjoint learner has a remembered $A$, and given any $B$, the 'inferred' $A$ is produced by nudging the remembered $A$ in the direction of that $B$. I don't have a good feel for this either...

This vaguely reminds me of style transfer. Say your learner is a neural net trained to identify artistic styles. Then the adjoint learner would keep a painting in memory and, when given a style, output that memorized painting in this new style.

I'm being deliberately imprecise about what constitutes a style here because my understanding is that this technique doesn't work with simple labels (ie you don't give the algorithm a painting and a tag like "impressionism") but with more complex features at intermediate layers.

view this post on Zulip Jules Hedges (Apr 21 2021 at 10:34):

Mitchell Riley said:

Jules Hedges said:

I should think about this involution.... I asked almost exactly the same question about open games at some point but never was able to make it work out

What question were you thinking about? Whether there is an involution on Game? Or whether you need a stronger notion of Game equivalence?

I wrote an entire paper, "The game semantics of game theory" (https://arxiv.org/abs/1904.11287), about an analogy between game theory and game semantics via open games where all of the small-p players in an open game collectively make up the capital-P Player, and the environment makes up the capital-O Opponent. So it seemed like a natural idea to try to get a duality switching the Player and Opponent. But I was never able to make it work

view this post on Zulip Jules Hedges (Apr 21 2021 at 10:36):

Game equivalence has always been something I've mostly ignored because it's far too difficult - equivalence even of ordinary normal form games is a gigantic can of worms, and adding openness to the mixture seems overwhelming. But, for what it's worth, recently I've convinced myself that a coend-based equivalence is probably a good idea

view this post on Zulip Jules Hedges (Apr 21 2021 at 10:42):

This involution definitely looks "right" to me, i.e. part of the structure that ought to be considered as part of the category. Whether it transfers from learners also to games I'm not sure, but I'll definitely try to work it out

view this post on Zulip Jules Hedges (Apr 21 2021 at 10:44):

(The extra context here is that open learners and open games should be considered more or less different names for the same thing, nearly all distinctions between them should vanish when both of them are done right)

view this post on Zulip Bruno Gavranović (Apr 21 2021 at 11:26):

Mitchell Riley said:

$$\int^{P, Q} Set(P \times A, Q \times B) \times Set(Q \times B, P \times A)$$

I think this symmetrical description is quite nice. There's now an obvious involution $Learn \to Learn^{op}$: just switch the two functions in the coend! Written out using the original definition of a learner, the 'adjoint' of an $A \nrightarrow B$ given by $(P, I, U, r)$ is a learner $B \nrightarrow A$ with parameter set $P \times A$...

It's surprising that we get a learner with parameter space $P \times A$ out of this, but this might be because I'm not sure what you mean by "switch the two functions in the coend"? Do you mean ${}^{op}$ them, or rebind the variables, or something different?
I see in the elaboration that the new learner's implementation uses the request of the original learner, and the request is the only map that has $B \to A$ in it, so this is pretty neat

view this post on Zulip Jules Hedges (Apr 21 2021 at 11:43):

It means rebind the variables: $\int^{P,Q} Set(P \times A, Q \times B) \times Set(Q \times B, P \times A) = \int^{Q,P} Set(Q \times B, P \times A) \times Set(P \times A, Q \times B)$

view this post on Zulip Mitchell Riley (Apr 21 2021 at 12:09):

Right, so in terms of a representative, $(f, g) \mapsto (g, f)$.

view this post on Zulip Mitchell Riley (Apr 21 2021 at 12:41):

Bruno Gavranovic said:

It's surprising that we get a learner with parameter space $P \times A$ out of this

The $P \times A$ as the parameter set comes from the Yoneda reduction step in that string of isomorphisms. We do the reduction on $P \times A$ to create $Q$, and then when we do the swap, the $P \times A$ becomes the new '$P$'
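Spelled out, the reduction that creates $Q$ is the co-Yoneda isomorphism at $P \times A$,

$$\int^{Q} Set(P \times A, Q) \times Set(Q \times B, P \times A) \;\cong\; Set(P \times A \times B, P \times A),$$

used from right to left in the chain above; after the swap, the roles of $P \times A$ and $Q \times B$ are exchanged.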

view this post on Zulip Jules Hedges (Apr 21 2021 at 12:51):

I've been playing with this and I'm 90% sure that you can get more than a dagger structure with this trick: you can get a self-dual compact closed structure for which your dagger is the transpose

view this post on Zulip Jules Hedges (Apr 21 2021 at 12:53):

The counits $\varepsilon_A : A \otimes A \to 1$ have $P = 1$ and are already present in the category of learners. The units $\eta_A : 1 \to A \otimes A$ have $P = A$. Since transpose is built as a composite, and composites of learners multiply the parameters, this would "explain" why you multiply the parameter type by $A$
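Unpacking the types only (without guessing which particular maps are intended): a counit $\varepsilon_A : A \otimes A \nrightarrow 1$ with $P = 1$ has components

$$I : 1 \times (A \times A) \to 1, \qquad U : 1 \times (A \times A) \times 1 \to 1, \qquad r : 1 \times (A \times A) \times 1 \to A \times A,$$

so the only non-trivial datum is a request map $A \times A \to A \times A$; a unit $\eta_A : 1 \nrightarrow A \otimes A$ with $P = A$ has components

$$I : A \times 1 \to A \times A, \qquad U : A \times 1 \times (A \times A) \to A, \qquad r : A \times 1 \times (A \times A) \to 1,$$

so the non-trivial data are an implement map $A \to A \times A$ and an update map $A \times (A \times A) \to A$.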

view this post on Zulip Eigil Rischel (Apr 21 2021 at 13:24):

Note that the map $\emptyset \to P$ always commutes with the structure for trivial reasons, so the coend

$$\int^{P} Set(P \times A, B) \times Set(P \times A \times B, P) \times Set(P \times A \times B, A)$$

is always a singleton given by the unique learner with $P = \emptyset$

view this post on Zulip Mitchell Riley (Apr 21 2021 at 13:45):

Oops, I screwed up in jumping from "function that commutes with the structure" to "coend relation". But we're not hosed, the coend relation for $U$ to be related to $U'$ actually requires that there's a function $V : P' \times A \times B \to P$ such that $U = V \circ (f \times A \times B)$ and $U' = f \circ V$. And then the empty set doesn't break things.
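For completeness, unpacking the standard coend relation on all three components: for $f : P \to P'$ and $V : P' \times A \times B \to P$, the generating relation identifies $(P, I, U, r)$ with $(P', I', U', r')$ whenever

$$I = I' \circ (f \times A), \qquad r = r' \circ (f \times A \times B), \qquad U = V \circ (f \times A \times B), \qquad U' = f \circ V.$$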

view this post on Zulip Mitchell Riley (Apr 21 2021 at 14:06):

Jules Hedges said:

Game equivalence has always been something I've mostly ignored because it's far too difficult - equivalence even of ordinary normal form games is a gigantic can of worms, and adding openness to the mixture seems overwhelming. But, for what it's worth, recently I've convinced myself that a coend-based equivalence is probably a good idea

Right, I definitely didn't mean some complicated game-theoretic equivalence. Do you have thoughts on what that coend-y equivalence would look like? I find it confusing that open games are related to these other things that can be described using coends/profunctors, but then open games are only contravariant in the strategy set. Is it possible there is some missing covariant data?

view this post on Zulip Matteo Capucci (he/him) (Apr 21 2021 at 16:39):

Hi Mitchell, that's a very cool observation! I've been talking about this with @Bruno Gavranovic today and we have a nice diagrammatic picture of what's happening

view this post on Zulip Matteo Capucci (he/him) (Apr 21 2021 at 16:40):

The $\dagger$ operation can be seen as the following diagram manipulation:

  1. Connect the bottom $P$ wire with the top $P$ wire
  2. Slide the bottom part of the lens along this

We conjecture that the 'loop' you draw is a delayed trace, since when we slide the bottom part along it, it remembers $A$

view this post on Zulip Matteo Capucci (he/him) (Apr 21 2021 at 16:40):

In pictures, this is the lens you start with (plus the loop): lens-delayed-trace.jpg

view this post on Zulip Matteo Capucci (he/him) (Apr 21 2021 at 16:41):

Sliding the bottom part along the loop yields: lens-delayed-trace2.jpg

view this post on Zulip Matteo Capucci (he/him) (Apr 21 2021 at 16:41):

which one can rearrange as lens-delayed-trace3.jpg

view this post on Zulip Matteo Capucci (he/him) (Apr 21 2021 at 16:41):

which is the $\dagger$ of the first!

view this post on Zulip Matteo Capucci (he/him) (Apr 21 2021 at 16:44):

I think it'd be nice to write up something about this and the recent work we've been doing at Strathclyde. There seems to be a very useful structure on parametrized optics, which we take as the fundamental building blocks of cybernetic systems (e.g. learners and games)

view this post on Zulip Mitchell Riley (Apr 22 2021 at 00:01):

Cool! I regret using the $\dagger$ symbol, this construction still works if the objects have distinct input and output ports (or whatever you call them), so a learner is given by $(P, I : P \times A \to B, U : P \times A \times B' \to P, r : P \times A \times B' \to A')$. But then the objects are not self-dual as Jules was describing, and the swap on $(A, A') \nrightarrow (B, B')$ gives a learner $(B', B) \nrightarrow (A', A)$.

view this post on Zulip Matteo Capucci (he/him) (Apr 22 2021 at 09:28):

Absolutely

view this post on Zulip Matteo Capucci (he/him) (Apr 22 2021 at 09:28):

Also for parametrized mixed optics

view this post on Zulip Jules Hedges (Apr 22 2021 at 09:34):

Yeah, I noticed that when I tried to generalise it to open games, and then that was the hint to me that it's actually a compact closure

view this post on Zulip Jules Hedges (Apr 22 2021 at 09:36):

Matteo is referring to the generalisation I guessed/conjectured yesterday when I wrote it in terms of the machinery we're heavily working on at Strathclyde: if $C$ is any symmetric monoidal category then $Para(Optic(C))$ is a compact closed bicategory

view this post on Zulip Mitchell Riley (Apr 22 2021 at 11:38):

Is a 2-cell in that bicategory an "optic between the parameter sets"? I'm not seeing how you would get an invertible 2-cell between a learner and its double dual. Or maybe that's not the right thing to expect?

view this post on Zulip Jules Hedges (Apr 22 2021 at 11:41):

Yes. And the 2-cell you need isn't invertible, which is what makes this kind of subtle. At some point in the tower of abstractions you end up taking a coend over Optic, which quotients out arbitrary non-invertible optics in either direction. In the case of "classic" learners things are simpler

view this post on Zulip Mitchell Riley (Apr 22 2021 at 11:44):

Is the definition of $Para$ you are using written down somewhere? I am definitely missing something.

view this post on Zulip Jules Hedges (Apr 22 2021 at 11:44):

Coends over the category of optics are something we only started thinking about recently, they're a bit mind-bending

view this post on Zulip Jules Hedges (Apr 22 2021 at 11:46):

Not written down yet. It's quite a simple definition - this is @Bruno Gavranovic's definition, although it's simple and natural enough that it's probably hiding somewhere in the literature. For any monoidal category $C$ there is a category $Para(C)$ whose objects are the same as $C$, morphisms $X \to Y$ are pairs of an object $M$ and a morphism $M \otimes X \to Y$, and 2-cells $(M, f) \to (N, g)$ are morphisms $M \to N$ making the triangle commute. That's the direct definition, and it can be characterised in some fun abstract ways too
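A small Haskell sketch of that direct definition, taking $C$ to be Haskell types with $(,)$ as the tensor (the names are just for illustration):

```haskell
{-# LANGUAGE ExistentialQuantification #-}

-- Para of (Hask, (,)): objects are types, and a morphism x -> y is a choice
-- of parameter object m together with a map m ⊗ x -> y (curried here).
data Para x y = forall m. Para (m -> x -> y)

-- Identity, with the monoidal unit () as the parameter object.
idPara :: Para x x
idPara = Para (\() x -> x)

-- Composition tensors the parameter objects: (n, m) parametrises g after f.
composePara :: Para y z -> Para x y -> Para x z
composePara (Para g) (Para f) = Para (\(n, m) x -> g n (f m x))

-- A 2-cell (m, f) -> (n, g) would be a map h : m -> n with g (h m') x = f m' x;
-- the commuting-triangle condition is a property the types alone cannot enforce.
```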

view this post on Zulip Jules Hedges (Apr 22 2021 at 11:47):

So the category that's called Para in Backprop As Functor is Para of [Euclidean spaces, smooth maps] with invertible 2-cells quotiented out

view this post on Zulip Jules Hedges (Apr 22 2021 at 11:48):

I shouldn't say too much about this stuff in public, otherwise we'll have to scramble even faster to get a paper out...

view this post on Zulip Bruno Gavranović (Apr 22 2021 at 12:08):

Mitchell Riley said:

Is the definition of $Para$ you are using written down somewhere? I am definitely missing something.

Yes it is - it's written here https://arxiv.org/abs/2103.01931

view this post on Zulip Bruno Gavranović (Apr 22 2021 at 12:13):

And indeed, figuring out the right notion of 2-cells for Learners is a good question. The thing appearing in Backprop as Functor (and the "2-cells" in the coend you wrote) is not the thing you get when you construct $\mathbf{Para}(\mathbf{Optic}(\mathcal{C}))$. But $\mathbf{Para}(\mathbf{Optic}(\mathcal{C}))$ makes more sense coming from actual machine learning.

2-cells in $\mathbf{Para}(\mathbf{Optic}(\mathcal{C}))$ are maps in $\mathbf{Optic}(\mathcal{C})$, and 2-cells in $\mathbf{Learn}$ are maps in $\mathcal{C}$. The former satisfy the triangle-commuting condition that Jules wrote, and the latter satisfy the condition coming out of a coend.

It turns out that if you take optics as 2-cells, you get a lot of neat stuff pertaining to gradient descent. I still don't know of a good example in machine learning where $\mathbf{Learn}$ 2-cells are useful

view this post on Zulip Bruno Gavranović (Apr 22 2021 at 12:27):

P.S. I still have to properly internalize this involution, it's a very interesting idea!

The way I've unpacked it for myself is that a learner is a pair of maps $P \otimes A \to M \otimes B$ and $M \otimes B \to P \otimes A$. Then what you're proposing is figuring out a learner going from $B \to A$. The only way we can do that is to use $M$ as the parameter space (and in lenses $M = P \times A$), so that's where $A$ appears in the parameter space.
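A Haskell sketch of this optic-form reading and of the swap (with both the parameter and the residual hidden; the names are just for illustration):

```haskell
{-# LANGUAGE ExistentialQuantification #-}

-- The 'optic form' of a learner: a forward pass p ⊗ a -> m ⊗ b and a
-- backward pass m ⊗ b -> p ⊗ a, with parameter p and residual m hidden.
data OpticLearner a b = forall p m. OpticLearner
  ((p, a) -> (m, b))   -- forward
  ((m, b) -> (p, a))   -- backward

-- The involution: exchange the two maps. The old residual m becomes the new
-- parameter and the old parameter p becomes the new residual, which is why
-- (for lenses, where m = p × a) an A shows up in the new parameter space.
swapLearner :: OpticLearner a b -> OpticLearner b a
swapLearner (OpticLearner fwd bwd) = OpticLearner bwd fwd
```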

view this post on Zulip Bruno Gavranović (Apr 22 2021 at 12:28):

But somehow the idea of the double involution, i.e. "a learner should remember previous inputs in its parameter space" doesn't immediately sit right with me, for reasons I'm not sure how to articulate yet.

view this post on Zulip Mitchell Riley (Apr 22 2021 at 12:34):

Right, that's the definition of Para I was thinking of. But Jules, I don't see where a coend over Optic would come into it...

view this post on Zulip Mitchell Riley (Apr 22 2021 at 12:41):

I should stress that I screwed up my description of the coend relation in my initial message, as pointed out by Eigil, and so the coend notion of 2-cell is not the same thing as the Learn notion of 2-cell.

view this post on Zulip Mitchell Riley (Apr 22 2021 at 12:46):

(Also, I'm not sure we would actually want to use the coend relation as un-quotiented cells)

view this post on Zulip Mitchell Riley (Apr 23 2021 at 03:27):

One more little observation: if $C$ is a symmetric monoidal category and $D$ is a compact closed category, then it looks like any symmetric monoidal functor $F : C \to D$ extends to a compact closed functor $\tilde F : CoendLearner(C) \to D$ by sending an object $(A, A')$ to $A \otimes A'^*$ and a representative $(f, g) : (A, A') \nrightarrow (B, B')$ to Screen-Shot-2021-04-23-at-1.04.54-pm.png