A while ago I spotted something about learners; I'll record it here on the off chance someone finds it useful.
The 'Backprop as Functor' paper defines learners $A \to B$ as tuples $(P, I, U, r)$, where $P$ is a set of parameters, $I : P \times A \to B$ implements the learner, $U : P \times A \times B \to P$ updates the parameter, and $r : P \times A \times B \to A$ passes a request back, up to an equivalence: two learners are equivalent if there is a bijection $f : P \to P'$ so that the components of the two learners are equal up to this map. They mention that you might want to consider learners equal if there's just a surjection $f : P \to P'$, but if you go all the way to considering learners equivalent if there's *any* map $f : P \to P'$ that commutes with the components, something interesting happens.
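To make the discussion below easier to follow, here is that definition as a Haskell-style sketch (the type and field names are mine, not the paper's):

```haskell
-- A learner A -> B in the sense of 'Backprop as Functor',
-- with parameter set p. Field names are illustrative.
data Learner p a b = Learner
  { impl    :: p -> a -> b       -- I : P × A -> B, implement
  , update  :: p -> a -> b -> p  -- U : P × A × B -> P, update the parameter
  , request :: p -> a -> b -> a  -- r : P × A × B -> A, pass a request back
  }
```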
This relation is now exactly a coend over $\mathbf{Set}$ (EDIT: no it isn't, the coend relation is slightly different on the $U$ component), and the set of learners is given by
$$\mathbf{Learn}(A, B) \;=\; \int^{P \in \mathbf{Set}} \mathbf{Set}(P \times A,\, B) \times \mathbf{Set}(P \times A \times B,\, P \times A)$$
Learners and lenses are closely related, and we can use the same old moves to write the body of the coend in an 'optic' style:
$$\mathbf{Learn}(A, B) \;\cong\; \int^{P}\!\int^{M} \mathbf{Set}(P \times A,\, M \times B) \times \mathbf{Set}(M \times B,\, P \times A)$$
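Presumably the 'same old moves' are the product law plus Yoneda reduction, something like this (my reconstruction):

$$\begin{aligned}
\mathbf{Learn}(A, B) &\cong \int^{P} \mathbf{Set}(P \times A,\, B) \times \mathbf{Set}(P \times A \times B,\, P \times A) \\
&\cong \int^{P}\!\int^{M} \mathbf{Set}(P \times A,\, M) \times \mathbf{Set}(P \times A,\, B) \times \mathbf{Set}(M \times B,\, P \times A) \\
&\cong \int^{P}\!\int^{M} \mathbf{Set}(P \times A,\, M \times B) \times \mathbf{Set}(M \times B,\, P \times A)
\end{aligned}$$

where the middle step is Yoneda reduction on $M$ (which, read backwards, sets $M = P \times A$) and the last step is the universal property of the product.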
I think this symmetrical description is quite nice. There's now an obvious involution $\mathbf{Learn}(A, B) \cong \mathbf{Learn}(B, A)$: just switch the two functions in the coend! Written out using the original definition of a learner, the 'adjoint' of a learner $(P, I, U, r) : A \to B$ is a learner $B \to A$ with parameter set $P \times A$, and
$$I^{\dagger}((p, a), b) = r(p, a, b)$$
$$U^{\dagger}((p, a), b, a') = (U(p, a, b),\, a')$$
$$r^{\dagger}((p, a), b, a') = I(U(p, a, b),\, a')$$
If you do this twice, you end up with a learner with parameter set $P \times A \times B$, which is the original learner again, but running 'one training datum behind': when you give it a new bit of data $(a', b')$, it updates using the remembered data $(a, b)$ and stores the new $(a', b')$ for next time. To show this is equivalent to the original learner, we need a map $P \times A \times B \to P$ that commutes with the components, and for this we have $U$ itself.
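In the Haskell sketch from above, the involution comes out as follows (my reconstruction of the formulas just given):

```haskell
-- The 'adjoint' learner B -> A, with parameter set p × a:
-- switch the two functions in the coend.
dagger :: Learner p a b -> Learner (p, a) b a
dagger (Learner i u r) = Learner
  { impl    = \(p, a) b -> r p a b            -- infer with the request map
  , update  = \(p, a) b a' -> (u p a b, a')   -- update p, remember the new a'
  , request = \(p, a) b a' -> i (u p a b) a'  -- request with the implementation
  }
```

Then `dagger . dagger` has type `Learner p a b -> Learner ((p, a), b) a b`, which is the 'one datum behind' learner with parameter set $P \times A \times B$ described above.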
I haven't kept up with the latest on learners, does anyone have a sense of whether this notion of equivalence is too strong to be useful?
Is there an intuitive way to see the learner $\ell^{\dagger} : B \to A$? The parameter space $P \times A$ is a bit confusing. It's like having a neural net trained on MNIST that carries a picture around and uses it for inference.
Oh, I need to tag @Bruno Gavranovic here, pretty sure he's been working with this exact definition of 2-cells with the coend, and talked with David about it
I should think about this involution.... I asked almost exactly the same question about open games at some point but never was able to make it work out
I guess you need to use whatever intuition there is for the request function to understand what $I^{\dagger}$ is doing. In the backprop example, the original request function nudges the provided $a$ towards something that gives the output $b$. So right, the adjoint learner has a remembered $a$, and given any $b$, the 'inferred' $a$ is produced by nudging the remembered $a$ in the direction of that $b$. I don't have a good feel for this either...
Jules Hedges said:
I should think about this involution.... I asked almost exactly the same question about open games at some point but never was able to make it work out
What question were you thinking about? Whether there is an involution on Game? Or whether you need a stronger notion of Game equivalence?
Mitchell Riley said:
So right, the adjoint learner has a remembered $a$, and given any $b$, the 'inferred' $a$ is produced by nudging the remembered $a$ in the direction of that $b$. I don't have a good feel for this either...
This vaguely reminds me of style transfer. Say your learner is a neural net trained to identify artistic styles. Then the adjoint learner would keep a painting in memory and, when given a style, output that memorized painting in this new style.
I'm being deliberately imprecise about what constitutes a style here because my understanding is that this technique doesn't work with simple labels (ie you don't give the algorithm a painting and a tag like "impressionism") but with more complex features at intermediate layers.
Mitchell Riley said:
Jules Hedges said:
I should think about this involution.... I asked almost exactly the same question about open games at some point but never was able to make it work out
What question were you thinking about? Whether there is an involution on Game? Or whether you need a stronger notion of Game equivalence?
I wrote an entire paper, "The game semantics of game theory" (https://arxiv.org/abs/1904.11287), about an analogy between game theory and game semantics via open games where all of the small-p players in an open game collectively make up the capital-P Player, and the environment makes up the capital-O Opponent. So it seemed like a natural idea to try to get a duality switching the Player and Opponent. But I was never able to make it work
Game equivalence has always been something I've mostly ignored because it's far too difficult - equivalence even of ordinary normal form games is a gigantic can of worms, and adding openness to the mixture seems overwhelming. But, for what it's worth, recently I've convinced myself that a coend-based equivalence is probably a good idea
This involution definitely looks "right" to me, ie. part of the structure that ought to be considered as part of the category. Whether it transfers from learners also to games I'm not sure, but I'll definitely try to work it out
(The extra context here is that open learners and open games should be considered more or less different names for the same thing, nearly all distinctions between them should vanish when both of them are done right)
Mitchell Riley said:
I think this symmetrical description is quite nice. There's now an obvious involution $\mathbf{Learn}(A, B) \cong \mathbf{Learn}(B, A)$: just switch the two functions in the coend! Written out using the original definition of a learner, the 'adjoint' of a learner $(P, I, U, r) : A \to B$ is a learner $B \to A$ with parameter set $P \times A$ ...
It's surprising that we get a learner with parameter space $P \times A$ out of this, but this might be because I'm not sure what you mean by "switch the two functions in the coend"? Do you mean swap them in place, or rebind the variables, or something different?
I see in the elaboration that the new learner's implementation uses the request of the original learner, and the request is the only map that has $A$ in its codomain, so this is pretty neat
It means rebind the variables:
$$(f, g) \longmapsto (g, f)\;:\; \int^{P, M} \mathbf{Set}(P \times A,\, M \times B) \times \mathbf{Set}(M \times B,\, P \times A) \;\longrightarrow\; \int^{M, P} \mathbf{Set}(M \times B,\, P \times A) \times \mathbf{Set}(P \times A,\, M \times B)$$
Right, so in terms of a representative, $(f, g)^{\dagger} = (g, f)$.
Bruno Gavranovic said:
It's surprising that we get a learner with parameter space $P \times A$ out of this
The $A$ in the parameter set comes from the Yoneda reduction step in that string of isomorphisms. We do the reduction on $M$ to create the backwards map $(U, r) : M \times B \to P \times A$, which sets $M = P \times A$, and then when we do the swap, that $P \times A$ becomes the new '$P$'
I've been playing with this and I'm 90% sure that you can get more than a dagger structure with this trick: you can get a self-dual compact closed structure for which your dagger is the transpose
The counits $\varepsilon_A : A \otimes A \to I$ have $P = 1$ and are already present in the category of learners. The units $\eta_A : I \to A \otimes A$ have $P = A$. Since the transpose is built as a composite, and composites of learners multiply the parameters, this would "explain" why you multiply the parameter type by $A$
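If I'm reading this right, for a learner $\ell : A \to B$ with parameter set $P$ the transpose would be the usual compact-closed mate (my reconstruction):

$$\ell^{\dagger} \;:\; B \xrightarrow{\ \eta_A \otimes \mathrm{id}_B\ } A \otimes A \otimes B \xrightarrow{\ \mathrm{id}_A \otimes\, \ell\, \otimes \mathrm{id}_B\ } A \otimes B \otimes B \xrightarrow{\ \mathrm{id}_A \otimes \varepsilon_B\ } A$$

whose three factors have parameter sets $A$, $P$ and $1$, multiplying to $P \times A$ as in Mitchell's formula.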
Note that the map $\emptyset \to P$ always commutes with the structure for trivial reasons, so the coend
$$\int^{P} \mathbf{Set}(P \times A,\, B) \times \mathbf{Set}(P \times A \times B,\, P \times A)$$
is always a singleton, given by the unique learner with $P = \emptyset$
Oops, I screwed up in jumping from "function that commutes with the structure" to "coend relation". But we're not hosed: the coend relation for $(P, I, U, r)$ to be related to $(P', I', U', r')$ actually requires that there's a function $V : P' \times A \times B \to P$ (alongside the map $f : P \to P'$ relating the other components) such that $U = V \circ (f \times \mathrm{id})$ and $U' = f \circ V$. And then the empty set doesn't break things.
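Spelled out, I believe the generating coend relation is: given $f : P \to P'$ and maps $J : P' \times A \to B$, $V : P' \times A \times B \to P$, $s : P' \times A \times B \to A$, identify

$$\bigl(P,\ J \circ (f \times \mathrm{id}),\ V \circ (f \times \mathrm{id}),\ s \circ (f \times \mathrm{id})\bigr) \;\sim\; \bigl(P',\ J,\ f \circ V,\ s\bigr)$$

In particular the empty parameter set causes no collapse, since a map $V$ into $\emptyset$ generally doesn't exist.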
Jules Hedges said:
Game equivalence has always been something I've mostly ignored because it's far too difficult - equivalence even of ordinary normal form games is a gigantic can of worms, and adding openness to the mixture seems overwhelming. But, for what it's worth, recently I've convinced myself that a coend-based equivalence is probably a good idea
Right, I definitely didn't mean some complicated game-theoretic equivalence. Do you have thoughts on what that coend-y equivalence would look like? I find it confusing that open games are related to these other things that can be described using coends/profunctors, but then open games are only contravariant in the strategy set. Is it possible there is some missing covariant data?
Hi Mitchell, that's a very cool observation! I've been talking about this with @Bruno Gavranovic today and we have a nice diagrammatic picture of what's happening
The $(-)^{\dagger}$ operation can be seen as the following diagram manipulation:
We conjecture that the 'loop' you draw is a delayed trace, since when we slide the bottom part along it, it remembers A
In pictures, this is the lens you start with (plus the loop): lens-delayed-trace.jpg
Sliding the bottom part along the loop yields: lens-delayed-trace2.jpg
which one can rearrange as lens-delayed-trace3.jpg
which is the $\dagger$ of the first!
I think it'd be nice to write up something about this and the recent work we've been doing at Strathclyde. There seems to be a very useful structure on parametrized optics, which we take as the fundamental building blocks of cybernetic systems (e.g. learners and games)
Cool! I regret using the symbol $\dagger$, since this construction still works if the objects have distinct input and output ports (or whatever you call them), so a learner $(A, A') \to (B, B')$ is given by $\int^{P, M} \mathbf{Set}(P \times A,\, M \times B) \times \mathbf{Set}(M \times B',\, P \times A')$. But then the objects are not self-dual as Jules was describing, and the swap on a learner $(A, A') \to (B, B')$ gives a learner $(B', B) \to (A', A)$.
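Continuing the Haskell sketch, the two-port version and its swap might look like this (again my own reconstruction; names are made up):

```haskell
-- A learner (a, a') -> (b, b') with distinct input/output ports.
data Learner2 p a a' b b' = Learner2
  { impl2    :: p -> a -> b         -- P × A -> B
  , update2  :: p -> a -> b' -> p   -- P × A × B' -> P
  , request2 :: p -> a -> b' -> a'  -- P × A × B' -> A'
  }

-- The swap now changes the objects: (b', b) -> (a', a).
dagger2 :: Learner2 p a a' b b' -> Learner2 (p, a) b' b a' a
dagger2 (Learner2 i u r) = Learner2
  { impl2    = \(p, a) b' -> r p a b'
  , update2  = \(p, a) b' a'' -> (u p a b', a'')
  , request2 = \(p, a) b' a'' -> i (u p a b') a''
  }
```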
Absolutely
Also for parametrized mixed optics
Yeah, I noticed that when I tried to generalise it to open games, and then that was the hint to me that it's actually a compact closure
Matteo is referring to the generalisation I guessed/conjectured yesterday when I wrote it in terms of the machinery we're heavily working on at Strathclyde: if $\mathcal{C}$ is any symmetric monoidal category then $\mathbf{Para}(\mathbf{Optic}_{\mathcal{C}})$ is a compact closed bicategory
Is a 2-cell in that bicategory an "optic between the parameter sets"? I'm not seeing how you would get an invertible 2-cell between a learner and its double dual. Or maybe that's not the right thing to expect?
Yes. And the 2-cell you need isn't invertible, which is what makes this kind of subtle. At some point in the tower of abstractions you end up taking a coend over Optic, which quotients out arbitrary non-invertible optics in either direction. In the case of "classic" learners things are simpler
Is the definition of $\mathbf{Para}(\mathbf{Optic})$ you are using written down somewhere? I am definitely missing something.
Coends over the category of optics are something we only started thinking about recently, they're a bit mind-bending
Not written down yet. It's quite a simple definition - this is @Bruno Gavranovic's definition, although it's simple and natural enough that it's probably hiding somewhere in the literature. For any monoidal category $\mathcal{C}$ there is a bicategory $\mathbf{Para}(\mathcal{C})$ whose objects are the same as $\mathcal{C}$, morphisms $A \to B$ are pairs of an object $P$ and a morphism $P \otimes A \to B$, and 2-cells $(P, f) \to (Q, g)$ are morphisms $P \to Q$ making the evident triangle commute. That's the direct definition, and it can be characterised in some fun abstract ways too
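For what it's worth, here is that direct definition specialised to $(\mathbf{Set}, \times)$ in the Haskell notation from earlier (a sketch; the names are mine):

```haskell
-- Para over (Set, ×): a morphism a -> b is a parameter set p
-- together with a map p × a -> b.
data Para p a b = Para { runPara :: p -> a -> b }

-- Composition pairs up the parameters: composites multiply parameter
-- sets, as noted earlier in the thread.
composePara :: Para q b c -> Para p a b -> Para (q, p) a c
composePara (Para g) (Para f) = Para (\(q, p) a -> g q (f p a))

-- A 2-cell (p, f) -> (q, g) is a map h : p -> q such that
-- g (h p) a == f p a; the triangle is a property of h, not extra data.
type Reparam p q = p -> q
```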
So the category that's called Para in Backprop As Functor is Para of [Euclidean spaces, smooth maps] with invertible 2-cells quotiented out
I shouldn't say too much about this stuff in public, otherwise we'll have to scramble even faster to get a paper out...
Mitchell Riley said:
Is the definition of $\mathbf{Para}(\mathbf{Optic})$ you are using written down somewhere? I am definitely missing something.
Yes it is - it's written here https://arxiv.org/abs/2103.01931
And indeed, figuring out the right notion of 2-cells for Learners is a good question. The thing appearing in Backprop as Functor (and the "2-cells" in the coend you wrote) is not the thing you get when you construct $\mathbf{Para}(\mathbf{Optic})$. But the latter makes more sense coming from actual machine learning.
2-cells in $\mathbf{Para}(\mathcal{C})$ are maps in $\mathcal{C}$ and 2-cells in $\mathbf{Para}(\mathbf{Optic}_{\mathcal{C}})$ are maps in $\mathbf{Optic}_{\mathcal{C}}$. The former satisfy the triangle-commuting condition that Jules wrote and the latter satisfy a condition coming out of a coend.
It turns out that if you take optics as 2-cells, you get a lot of neat stuff pertaining to gradient descent. I still don't know of a good example in machine learning where 2-cells are useful
P.S. I still have to properly internalize this involution, it's a very interesting idea!
The way I've unpacked it for myself is that a learner $A \to B$ is a pair of two maps $P \times A \to B$ and $P \times A \times B \to P \times A$. Then what you're proposing is figuring out a learner going from $B$ to $A$. The only way we can do that is to use $P \times A$ as the parameter space (and, in lens terms, as the residual), so that's where the $A$ in the parameter space appears.
But somehow the idea of the double involution, i.e. "a learner should remember previous inputs in its parameter space" doesn't immediately sit right with me, for reasons I'm not sure how to articulate yet.
Right, that's the definition of Para I was thinking of. But Jules I don't see where a coend over Optic would come into it...
I should stress that I screwed up my description of the coend relation in my initial message, as pointed out by Eigil, and so the coend notion of 2-cell is not the same thing as the Learn notion of 2-cell.
(Also, I'm not sure we would actually want to use the coend relation as un-quotiented cells)
One more little observation: if $\mathcal{C}$ is a symmetric monoidal category and $\mathcal{D}$ is a compact closed category, then it looks like any symmetric monoidal functor $F : \mathcal{C} \to \mathcal{D}$ extends to a compact closed functor out of the category of learners over $\mathcal{C}$, by sending an object $A$ to $FA \otimes (FA)^{*}$ and a representative $(f, g)$ to Screen-Shot-2021-04-23-at-1.04.54-pm.png