Hello! :wave: I have a question regarding The Simple Essence of Automatic Differentiation by Conal Elliott. While the paper is not part of the course's curriculum, it is quite related to the topics discussed here, so I hope it's not too off-topic to ask. To give some context: the paper generalizes automatic differentiation such that, when instantiated to a given category `k`, it recovers various flavours of automatic differentiation algorithms (e.g., setting `k` to `Cont` yields reverse-mode automatic differentiation).
My question is the following: the paper claims that `Cont` (Section 12) results in fully left-associative computations, but is that the case for `Dual` (Section 13) as well? The paper seems to suggest that yes:
[W]e can turn the `Cont` and `Dual` techniques around to yield category transformers that perform full right- instead of left-association.
But it appears to me that it's not the case since, based on the definition in Figure 10, `Dual` just reverses the composition of the original computation. For example, if we were given an expression of the form `(f . g) . ((h . i) . k)` and applied the `D` differentiation functor (which, I think, corresponds in the paper to the functor that maps linear maps to their adjoint), we would obtain:
D ((f . g) . ((h . i) . k))
= { how D acts on compositions: D (f . g) = D g . D f }
(D k . (D i . D h)) . (D g . D f)
which is not left-associative.
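For concreteness, here's a toy Haskell sketch of what I mean (my own rendering, not the paper's actual definitions): a wrapper whose composition reverses the underlying one, the way I read Figure 10's `Dual`:
```haskell
import Prelude hiding ((.), id)
import Control.Category

-- Toy sketch (not the paper's code): wrap a category and reverse the
-- direction of its morphisms; composing two wrapped morphisms composes
-- the underlying ones in the opposite order.
newtype Op k a b = Op { unOp :: k b a }

instance Category k => Category (Op k) where
  id = Op id
  Op g . Op f = Op (f . g)  -- the order flips, but the bracketing is only mirrored
```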
What am I missing?
Hi there! I also found this paper quite intriguing. I can't say I've understood all the details of it so far, but perhaps this helps:
In Section 12, the third paragraph seems crucial for this and what follows:
Given any category k, we can represent its morphisms by the intent to left-compose with some to-be-given morphism h. That is, represent f :: a `k` b by the function (◦ f) :: (b `k` r) → (a `k` r), where r is any object in k. The morphism h will be a continuation, finishing the journey from f all the way to the codomain of the overall function being assembled. Building a category around this idea results in converting all composition patterns into fully left-associated form. This trick is akin to conversion to continuation-passing style [Reynolds, 1972; Appel, 2007; Kennedy, 2007]. [...]
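For reference, here is roughly how that construction looks in code (from memory, so the details may differ from the paper); the flipped composition is where the left-association comes from:
```haskell
import Prelude hiding ((.), id)
import Control.Category

-- Roughly the paper's Section 12: represent f by (◦ f), which maps a
-- yet-to-be-given continuation h to h ◦ f.
newtype Cont r k a b = Cont { unCont :: k b r -> k a r }

cont :: Category k => k a b -> Cont r k a b
cont f = Cont (. f)

instance Category k => Category (Cont r k) where
  id = Cont id
  Cont g . Cont f = Cont (f . g)  -- flipped, precisely so that cont is a functor
```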
The arXiv version of the paper (https://arxiv.org/abs/1804.00746) also contains an appendix with proofs and calculations, which I found helpful for seeing the stated specifications and definitions in action, in particular when proving the functoriality of `cont` and `asDual`.
Niklas Schmitz said:
Hi there! I also found this paper quite intriguing. I can't say I've understood all the details of it so far, but perhaps this helps:
In Section 12, the third paragraph seems crucial for this and what follows:
Thanks, Niklas! But my confusion is about the left-associativity resulting from `Dual` (and not that of `Cont`, which is the subject of Section 12). I understand that `Cont` yields left-associative computations, and that `Dual` is obtained from `Cont` by applying the `asDual` functor; my question is whether `asDual` preserves left associativity.
By the way, regarding terminology, someone can correct me if I'm wrong, but I think that `cont` (with the scalar field `s` as result type) corresponds to the functor that carries a vector space to its dual; `asDual . cont` is the adjoint (or transpose) functor between Euclidean spaces. (And I believe that the adjoint functor doesn't result in left-associated compositions.)
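To spell out the terminology I have in mind (my notation, so take it with a grain of salt):
```latex
% The dual-space functor is Hom(-, s); on a linear map it acts by
% precomposition, which is exactly the transpose:
V^{*} = \mathrm{Hom}(V, s), \qquad
(f : V \to W) \;\mapsto\; (f^{*} : W^{*} \to V^{*}), \quad
f^{*}(\varphi) = \varphi \circ f
```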
You're right: I believe `Dual` just takes a transpose and doesn't change the association. In category-theoretic terms, that seems to be a dagger structure on a category.
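Something like this, as a sketch (the class name and laws are my phrasing, not the paper's):
```haskell
import Prelude hiding ((.), id)
import Control.Category

-- Sketch: a dagger structure is an identity-on-objects involution that
-- reverses morphisms and anti-commutes with composition.
class Category k => Dagger k where
  dagger :: k a b -> k b a
  -- intended laws:
  --   dagger id         == id
  --   dagger (g . f)    == dagger f . dagger g
  --   dagger (dagger f) == f
```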
Dan Oneață said:
applying the `D` differentiation functor (which, I think, corresponds in the paper to the functor that maps linear maps to their adjoint)
The `Dual` functor is the one that maps linear maps to their adjoint. The functor `D` maps a differentiable (potentially non-linear) map to the map pairing its value with its derivative (as per Figure 6). Since `Dual` transposes the computation, the previous expression is equal to the same composite with each map replaced by its adjoint (written with a superscript, as the paper does, since these are additive functions).
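In code, Figure 6 is roughly the following (simplified: I'm using plain functions where the paper uses a category of linear maps):
```haskell
-- Rough rendering of the paper's Figure 6: a differentiable map,
-- represented by the pairing of its value and its derivative at a point.
newtype D a b = D (a -> (b, a -> b))

-- Composition is the chain rule; note that the derivatives compose in
-- the opposite order to the data flow.
compD :: D b c -> D a b -> D a c
compD (D g) (D f) = D $ \a ->
  let (b, f') = f a
      (c, g') = g b
  in (c, g' . f')
```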
Interestingly, the language of categories doesn't capture the information of which way the functions are associated: the associativity law tells us that all of these expressions are equal. To fully capture the issue of "which way are we associating matrix multiplication" we need 2-categories, where the morphisms between morphisms (i.e., 2-cells) could describe rebracketings.
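(For instance, in a bicategory a rebracketing would be witnessed by the associator 2-cell:)
```latex
% The associator 2-cell witnessing a rebracketing:
\alpha_{f,g,h} \colon (f \circ g) \circ h \;\Rightarrow\; f \circ (g \circ h)
```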
Looking at the paper again, I'm also realising I don't understand what the objects in the `Cont` category are. It looks like objects are pairs (a, h), where a is an object in our category, and h is some map a `k` r (for some chosen r). But that's just a slice category k/r, and the paper's usage of the word "continuation" has me thinking otherwise.
Thanks a lot @Bruno Gavranovic for taking the time to understand my question and answer it!
So, if `Dual` doesn't change the association, is it precise to say that backpropagation is just `Dual`, as indicated on slide 24/44 of this presentation? As far as I understand, backpropagation is an instance of reverse-mode automatic differentiation, which in turn implies left-associated expressions.
Bruno Gavranovic said:
Looking at the paper again, I'm also realising I don't understand what the objects in the `Cont` category are. It looks like objects are pairs (a, h), where a is an object in our category, and h is some map a `k` r (for some chosen r). But that's just a slice category k/r, and the paper's usage of the word "continuation" has me thinking otherwise.
In my mind `cont` was corresponding to the Hom functor, so the target category was something like Set, with `cont` mapping an object a (from the source category k) to the set of maps from a to r. I'll think about your suggestion as well; thanks!
N.B.: I got confused about when to say `cont` or `Cont` :smile: The first one should be the functor, but in the paper it's defined only on functions (and cannot be applied to objects), and the notation is not capturing the extra information (source category and return type)...
Hmm, I might've been confused. I see what you mean. You're saying that the duality on vector spaces (the functor `Dual`) is defined by applying the continuation functor with the scalar field `s` as the result type.
I think that's right! From that it must follow that the vector space duality reverses the order of associativity (since `Cont` does). But I've never heard people describe it that way :thinking:
As a matter of fact, I have to revisit the idea that `Cont` somehow rebrackets the way functions are associated. Do you know of a simple example?
Dan Oneață said:
Bruno Gavranovic said:
Looking at the paper again, I'm also realising I don't understand what the objects in the `Cont` category are. It looks like objects are pairs (a, h), where a is an object in our category, and h is some map a `k` r (for some chosen r). But that's just a slice category k/r, and the paper's usage of the word "continuation" has me thinking otherwise.
In my mind `cont` was corresponding to the Hom functor, so the target category was something like Set, with `cont` mapping an object a (from the source category k) to the set of maps from a to r. I'll think about your suggestion as well; thanks!
N.B.: I got confused about when to say `cont` or `Cont` :smile: The first one should be the functor, but in the paper it's defined only on functions (and cannot be applied to objects), and the notation is not capturing the extra information (source category and return type)...
It's confusing because the paper treats `Cont` as a category, but really defines it as a functor.
Bruno Gavranovic said:
As a matter of fact, I have to revisit the idea that `Cont` somehow rebrackets the way functions are associated. Do you know of a simple example?
I can tell you how I was thinking about this, but now that I've written down my thoughts, I'm not sure anymore that it's correct :smile:
Let's take an expression in Set:
(f . g) . (h . i)
The `cont` functor replaces each function with its (post)composition and "mirrors" the abstract syntax tree:
((- . i) . (- . h)) . ((- . g) . (- . f))
To kick off the computation we apply this expression to the identity function:
((- . i) . (- . h)) . ((- . g) . (- . f)) $ id
To reduce the expression, we make repeated use of the following transformation:
(φ . ψ) $ x = φ $ ψ $ x
and get:
((- . i) . (- . h)) . ((- . g) . (- . f)) $ id
= ((- . i) . (- . h)) $ ((- . g) . (- . f)) $ id
= (- . i) $ (- . h) $ ((- . g) . (- . f)) $ id
= (- . i) $ (- . h) $ (- . g) $ (- . f) $ id
Finally, by substitution we obtain the fully left-associated expression:
(((id . f) . g) . h) . i
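Here is the same reduction as runnable Haskell, with some hypothetical example functions just to check that the two sides agree:
```haskell
-- Hypothetical example functions; any composable functions would do.
f, g, h, i :: Int -> Int
f = (+ 1); g = (* 2); h = subtract 3; i = (^ 2)

lhs, rhs :: Int -> Int
lhs = (f . g) . (h . i)
-- cont x = (. x); applying the mirrored composite to id recovers lhs,
-- now fully left-associated:
rhs = (((. i) . (. h)) . ((. g) . (. f))) $ id

-- e.g. map lhs [0 .. 5] == map rhs [0 .. 5]
```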
Now what I didn't realize at first (and what makes me circumspect about my reasoning) is that, no matter how a composition is associated in Set, the computation will always start from right to left (because composition is a lambda abstraction and it won't reduce?!). So, even for the expression above (which is fully left-associated), when given an argument we will start from the right: first apply the function `i`, then `h`, then `g`, and so on. I guess that discussing association matters only when we work on matrices (data) and not functions (lambda abstractions)?!
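To illustrate that last point with a toy example (naive code of mine, nothing from the paper): for matrices, computing `mul (mul a b) v` materialises an n×n product before touching `v`, while `mul a (mul b v)` only ever performs matrix-vector products.
```haskell
import Data.List (transpose)

type Matrix = [[Double]]

-- Naive dense multiplication: O(n^3) for two n×n matrices.
mul :: Matrix -> Matrix -> Matrix
mul a b = [[sum (zipWith (*) row col) | col <- transpose b] | row <- a]

-- With a, b :: n×n and v :: n×1:
--   mul (mul a b) v   -- O(n^3): builds a·b first
--   mul a (mul b v)   -- O(n^2): two matrix-vector products
```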
Thanks for unpacking this @Dan Oneață ! I just wanted to let you know that I've seen this, but haven't yet had the time to properly digest it. (It looks like there's a lot of nuance here)
It certainly feels like it's on to something, and it's making me take a closer look at continuation-passing style. People seem to acknowledge that CPS forces an order of computation, but this doesn't seem to have been stated in terms of (2-)category theory. The closest thing I found was the paper Continuation-Passing Style, Defunctionalization, Accumulations, and Associativity (which is at the 1-categorical level), but that too will take me a while to parse!
In general, if indeed we can force an order of computation, I'd expect to find a statement relating the morphisms `(f . g) . h` and `f . (g . h)` (perhaps in the image of the `cont` functor) using 2-cells.
Sure; do let me know if you reach a conclusion :smile: You are of course right: the discussion on the order of computation doesn't make much sense in the standard categorical language. Unfortunately, I don't yet have a more sophisticated mental model than this (like the 2-category machinery that you mention). And thanks for reminding me of Jeremy's paper; I've gone through his slides at some point, but I didn't make the connection to the current questions.