Hello! :wave: I have a question regarding The Simple Essence of Automatic Differentiation by Conal Elliott. While the paper is not part of the course's curriculum, it is quite related to the topics discussed here, so I hope it's not too off-topic to ask. To give some context: the paper generalizes automatic differentiation such that, when instantiated to a given category `k`, it recovers various flavours of automatic differentiation algorithms (e.g., setting `k` to `Cont` yields reverse-mode automatic differentiation).
My question is the following: the paper claims that `Cont` (Section 12) results in fully left-associative computations, but is that the case for `Dual` (Section 13) as well? The paper seems to suggest that yes:
[W]e can turn the `Cont` and `Dual` techniques around to yield category transformers that perform full right- instead of left-association.
But it appears to me that it's not the case since, based on the definition in Figure 10, `Dual` just reverses the composition of the original computation. For example, if we were given an expression of the form `(f . g) . ((h . i) . k)` and applied the `D` differentiation functor (which, I think, corresponds in the paper to the functor that maps linear maps to their adjoint), we would obtain:
D ((f . g) . ((h . i) . k))
= { how D acts on compositions: D (f . g) = D g . D f }
(D k . (D i . D h)) . (D g . D f)
which is not left-associative.
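For concreteness, here's a toy Haskell sketch of what I mean (my own rendering, not the paper's actual definitions): a wrapper whose composition reverses the underlying one, the way I read Figure 10's `Dual`:
```haskell
import Prelude hiding ((.), id)
import Control.Category

-- Toy sketch (not the paper's code): wrap a category and reverse the
-- direction of its morphisms; composing two wrapped morphisms composes
-- the underlying ones in the opposite order.
newtype Op k a b = Op { unOp :: k b a }

instance Category k => Category (Op k) where
  id = Op id
  Op g . Op f = Op (f . g)  -- the order flips, but the bracketing is only mirrored
```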
What am I missing?
Hi there! I also found this paper quite intriguing. I can't say I've understood all the details of it so far, but perhaps this helps:
In Section 12, the third paragraph seems crucial for this and what follows:
Given any category k, we can represent its morphisms by the intent to left-compose with some to-be-given morphism h. That is, represent f :: a `k` b by the function (◦ f) :: (b `k` r) → (a `k` r), where r is any object in k. The morphism h will be a continuation, finishing the journey from f all the way to the codomain of the overall function being assembled. Building a category around this idea results in converting all composition patterns into fully left-associated form. This trick is akin to conversion to continuation-passing style [Reynolds, 1972; Appel, 2007; Kennedy, 2007]. [...]
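For reference, here is roughly how that construction looks in code (from memory, so the details may differ from the paper); the flipped composition is where the left-association comes from:
```haskell
import Prelude hiding ((.), id)
import Control.Category

-- Roughly the paper's Section 12: represent f by (◦ f), which maps a
-- yet-to-be-given continuation h to h ◦ f.
newtype Cont r k a b = Cont { unCont :: k b r -> k a r }

cont :: Category k => k a b -> Cont r k a b
cont f = Cont (. f)

instance Category k => Category (Cont r k) where
  id = Cont id
  Cont g . Cont f = Cont (f . g)  -- flipped, precisely so that cont is a functor
```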
The arXiv version of the paper (https://arxiv.org/abs/1804.00746) also contains an appendix with proofs and calculations, which I found helpful for seeing the stated specifications and definitions in action, in particular when proving the functoriality of `cont` and `asDual`.
Niklas Schmitz said:
Hi there! I also found this paper quite intriguing. I can't say I've understood all the details of it so far, but perhaps this helps:
In Section 12, the third paragraph seems crucial for this and what follows:
Thanks, Niklas! But my confusion is about the left-associativity resulting from `Dual` (and not that of `Cont`, which is the subject of Section 12). I understand that `Cont` yields left-associative computations, and that `Dual` is obtained from `Cont` by applying the `asDual` functor; my question is whether `asDual` preserves left associativity.
By the way, regarding terminology, someone can correct me if I'm wrong, but I think that `cont` (with the scalar field `s` as result type) corresponds to the functor that carries a vector space to its dual; `asDual . cont` is the adjoint (or transpose) functor between Euclidean spaces. (And I believe that the adjoint functor doesn't result in left-associated compositions.)
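To spell out the terminology I have in mind (my notation, so take it with a grain of salt):
```latex
% The dual-space functor is Hom(-, s); on a linear map it acts by
% precomposition, which is exactly the transpose:
V^{*} = \mathrm{Hom}(V, s), \qquad
(f : V \to W) \;\mapsto\; (f^{*} : W^{*} \to V^{*}), \quad
f^{*}(\varphi) = \varphi \circ f
```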
You're right: I believe `Dual` just takes a transpose and doesn't change the association. In category-theoretic terms, that seems to be a dagger structure on a category.
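Something like this, as a sketch (the class name and laws are my phrasing, not the paper's):
```haskell
import Prelude hiding ((.), id)
import Control.Category

-- Sketch: a dagger structure is an identity-on-objects involution that
-- reverses morphisms and anti-commutes with composition.
class Category k => Dagger k where
  dagger :: k a b -> k b a
  -- intended laws:
  --   dagger id         == id
  --   dagger (g . f)    == dagger f . dagger g
  --   dagger (dagger f) == f
```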
Dan Oneață said:
applying the `D` differentiation functor (which, I think, corresponds in the paper to the functor that maps linear maps to their adjoint)
The `Dual` functor is the one that maps linear maps to their adjoint. The functor `D` maps a differentiable (potentially non-linear) map to the map pairing its value with its derivative (as per Figure 6). Since `Dual` transposes the computation, the previous expression is equal to the same composite with each map replaced by its adjoint (written with a superscript, as the paper does, since these are additive functions).
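In code, Figure 6 is roughly the following (simplified: I'm using plain functions where the paper uses a category of linear maps):
```haskell
-- Rough rendering of the paper's Figure 6: a differentiable map,
-- represented by the pairing of its value and its derivative at a point.
newtype D a b = D (a -> (b, a -> b))

-- Composition is the chain rule; note that the derivatives compose in
-- the opposite order to the data flow.
compD :: D b c -> D a b -> D a c
compD (D g) (D f) = D $ \a ->
  let (b, f') = f a
      (c, g') = g b
  in (c, g' . f')
```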
Interestingly, the language of categories doesn't capture the information of which way the functions are associated: the associativity law tells us that all of these expressions are equal. To fully capture the issue of "which way are we associating matrix multiplication" we need 2-categories, where the morphisms between morphisms (i.e., 2-cells) could describe rebracketings.
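(For instance, in a bicategory a rebracketing would be witnessed by the associator 2-cell:)
```latex
% The associator 2-cell witnessing a rebracketing:
\alpha_{f,g,h} \colon (f \circ g) \circ h \;\Rightarrow\; f \circ (g \circ h)
```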
Looking at the paper again, I'm also realising I don't understand what the objects in the `Cont` category are. It looks like objects are pairs (a, h), where a is an object in our category, and h is some map a `k` r (for some chosen r). But that's just a slice category k/r, and the paper's usage of the word "continuation" has me thinking otherwise.
Thanks a lot @Bruno Gavranovic for taking the time to understand my question and answer it!
So, if `Dual` doesn't change the association, is it precise to say that backpropagation is just `Dual`, as indicated on slide 24/44 of this presentation? As far as I understand, backpropagation is an instance of reverse-mode automatic differentiation, which in turn implies left-associated expressions.
Bruno Gavranovic said:
Looking at the paper again, I'm also realising I don't understand what the objects in the `Cont` category are. It looks like objects are pairs (a, h), where a is an object in our category, and h is some map a `k` r (for some chosen r). But that's just a slice category k/r, and the paper's usage of the word "continuation" has me thinking otherwise.
In my mind `cont` was corresponding to the Hom functor, so the target category was something like Set, with `cont` mapping an object a (from the source category k) to the set of maps from a to r. I'll think about your suggestion as well; thanks!
N.B.: I got confused about when to say `cont` or `Cont` :smile: The first one should be the functor, but in the paper it's defined only on functions (and cannot be applied to objects), and the notation is not capturing the extra information (source category and return type)...
Hmm, I might've been confused. I see what you mean. You're saying that the duality on vector spaces (the functor `Dual`) is defined by applying the continuation functor with the scalar field `s` as the result type.
I think that's right! From that it must follow that the vector space duality reverses the order of associativity (since `Cont` does). But I've never heard people describe it that way :thinking:
As a matter of fact, I have to revisit the idea that `Cont` somehow rebrackets the way functions are associated. Do you know of a simple example?
Dan Oneață said:
Bruno Gavranovic said:
Looking at the paper again, I'm also realising I don't understand what the objects in the `Cont` category are. It looks like objects are pairs (a, h), where a is an object in our category, and h is some map a `k` r (for some chosen r). But that's just a slice category k/r, and the paper's usage of the word "continuation" has me thinking otherwise.
In my mind `cont` was corresponding to the Hom functor, so the target category was something like Set, with `cont` mapping an object a (from the source category k) to the set of maps from a to r. I'll think about your suggestion as well; thanks!
N.B.: I got confused about when to say `cont` or `Cont` :smile: The first one should be the functor, but in the paper it's defined only on functions (and cannot be applied to objects), and the notation is not capturing the extra information (source category and return type)...
It's confusing because the paper treats `Cont` as a category, but really defines it as a functor.
Bruno Gavranovic said:
As a matter of fact, I have to revisit the idea that `Cont` somehow rebrackets the way functions are associated. Do you know of a simple example?
I can tell you how I was thinking about this, but now that I've written down my thoughts, I'm not sure anymore that it's correct :smile:
Let's take an expression in Set:
(f . g) . (h . i)
The `cont` functor replaces each function with its (post)composition and "mirrors" the abstract syntax tree:
((- . i) . (- . h)) . ((- . g) . (- . f))
To kick off the computation we apply this expression to the identity function:
((- . i) . (- . h)) . ((- . g) . (- . f)) $ id
To reduce the expression, we make repeated use of the following transformation:
(φ . ψ) $ x = φ $ ψ $ x
and get:
((- . i) . (- . h)) . ((- . g) . (- . f)) $ id
= ((- . i) . (- . h)) $ ((- . g) . (- . f)) $ id
= (- . i) $ (- . h) $ ((- . g) . (- . f)) $ id
= (- . i) $ (- . h) $ (- . g) $ (- . f) $ id
Finally, by substitution we obtain the fully left-associated expression:
(((id . f) . g) . h) . i
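Here is the same reduction as runnable Haskell, with some hypothetical example functions just to check that the two sides agree:
```haskell
-- Hypothetical example functions; any composable functions would do.
f, g, h, i :: Int -> Int
f = (+ 1); g = (* 2); h = subtract 3; i = (^ 2)

lhs, rhs :: Int -> Int
lhs = (f . g) . (h . i)
-- cont x = (. x); applying the mirrored composite to id recovers lhs,
-- now fully left-associated:
rhs = (((. i) . (. h)) . ((. g) . (. f))) $ id

-- e.g. map lhs [0 .. 5] == map rhs [0 .. 5]
```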
Now what I didn't realize at first (and what makes me circumspect about my reasoning) is that, no matter how a composition is associated in Set, the computation will always start from right to left (because composition is a lambda abstraction and it won't reduce?!). So, even for the expression above (which is fully left-associated), when given an argument we will start from the right: first apply the function `i`, then `h`, then `g`, and so on. I guess that discussing association matters only when we work on matrices (data) and not functions (lambda abstractions)?!
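To illustrate that last point with a toy example (naive code of mine, nothing from the paper): for matrices, computing `mul (mul a b) v` materialises an n×n product before touching `v`, while `mul a (mul b v)` only ever performs matrix-vector products.
```haskell
import Data.List (transpose)

type Matrix = [[Double]]

-- Naive dense multiplication: O(n^3) for two n×n matrices.
mul :: Matrix -> Matrix -> Matrix
mul a b = [[sum (zipWith (*) row col) | col <- transpose b] | row <- a]

-- With a, b :: n×n and v :: n×1:
--   mul (mul a b) v   -- O(n^3): builds a·b first
--   mul a (mul b v)   -- O(n^2): two matrix-vector products
```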
Thanks for unpacking this @Dan Oneață ! I just wanted to let you know that I've seen this, but haven't yet had the time to properly digest it. (It looks like there's a lot of nuance here)
It certainly feels like it's on to something, and it's making me take a closer look at continuation-passing style. People seem to acknowledge that CPS forces an order of computation, but this doesn't seem to have been stated in terms of (2-)category theory. The closest thing I found was the paper Continuation-Passing Style, Defunctionalization, Accumulations, and Associativity (which is at the 1-categorical level), but that too will take me a while to parse!
In general, if indeed we can force an order of computation, I'd expect to find a statement relating the morphisms `(f . g) . h` and `f . (g . h)` (perhaps in the image of the `cont` functor) using 2-cells.
Sure; do let me know if you reach a conclusion :smile: You are of course right: the discussion on the order of computation doesn't make much sense in the standard categorical language. Unfortunately, I don't yet have a more sophisticated mental model than this (like the 2-category machinery that you mention). And thanks for reminding me of Jeremy's paper; I've gone through his slides at some point, but I didn't make the connection to the current questions.