Category Theory
Zulip Server
Archive

You're reading the public-facing archive of the Category Theory Zulip server.
To join the server you need an invite. Anybody can get an invite by contacting Matteo Capucci at name dot surname at gmail dot com.
For all things related to this archive refer to the same person.

Stream: event: ACT20

Topic: July 9: Paul Wilson et al.'s talk

Paolo Perrone (Jul 01 2020 at 20:47):

Hello all! This is the thread of discussion for the talk of Paul Wilson and Fabio Zanasi, "Reverse Derivative Ascent: A Categorical Approach to Learning Boolean Circuits".
Date and time: Thursday July 9, 12:35 UTC.
Zoom meeting: https://mit.zoom.us/j/7055345747
YouTube live stream: https://www.youtube.com/watch?v=aXpjE82yy-8&list=PLCOXjXDLt3pZDHGYOIqtg1m1lLOURjl1Q

Paolo Perrone (Jul 09 2020 at 12:30):

This talk will start in 5 minutes!

David Jaz (Jul 09 2020 at 13:02):

Hi Paul, I enjoyed your talk! I'm wondering if you're going to be in a jitsi room today?

Filip Buric (Jul 09 2020 at 13:04):

Hi! Cool work! I was just confused about why you chose "0" and "1" from the MNIST set. I understand you need to flatten these images to binary, but isn't it the same for all digits? As shapes, 0 and 1 are far simpler than the others and subject to less human ambiguity, so it would be cool to see performance on more tricky digits :)

Bruno Gavranović (Jul 09 2020 at 13:05):

Thanks for the interesting talk Paul! I've been thinking about related ideas and I'd love to have a chat. I'll add some questions here in the meantime:

Why is this ascent, as opposed to descent? I'm not sure I understood it from the talk Edit: After giving it a bit more thought, I'm realizing we're in $\mathbb{Z}_2$ and the set of operations we can do were is pretty limited. As you said, addition makes a lot of sense!
Is there a corresponding notion of nonlinearities here, that is present in standard real-valued neural networks? To clarify: composition of two linear maps in usual smooth spaces is closed under composition. The addition of non-linearities is what allows us to get more complex operations. I'm wondering does this issue manifest here in $\mathbb{Z}_2$ in any way.

Geoff Cruttwell (Jul 09 2020 at 13:33):

Bruno Gavranovic said:

Is there a corresponding notion of nonlinearities here, that is present in standard real-valued neural networks? To clarify: composition of two linear maps in usual smooth spaces is closed under composition. The addition of non-linearities is what allows us to get more complex operations. I'm wondering does this issue manifest here in $\mathbb{Z}_2$ in any way.

Any reverse derivative category has an inherent notion of "linear map" (which one can prove are closed under composition) so since they've shown that Boolean circuits form a reverse derivative category, one gets for free some notion of "linear" Boolean circuits. I don't know what they look like, though!

Geoff Cruttwell (Jul 09 2020 at 13:54):

Just thinking out loud here...following up on the above, any reverse derivative category has 3 things one can derive from it:

A "forward derivative operation", which takes a $f: A \to B$ and gives a $D[f]: A \times A \to B$ (as opposed to the reverse derivative, which is of type $A \times B \to A$ )
As noted above, a notion of "linear map"
A dagger operation on the linear maps.
Paul, have you thought about what any of these look like for Boolean circuits?

Bruno Gavranović (Jul 09 2020 at 14:04):

Geoff Cruttwell said:

Any reverse derivative category has an inherent notion of "linear map" (which one can prove are closed under composition) so since they've shown that Boolean circuits form a reverse derivative category, one gets for free some notion of "linear" Boolean circuits. I don't know what they look like, though!

Perhaps I should have clarified my question a bit more. I mean that the notion of non-linearities is crucial to learning complex functions, since they're defined in terms of these simpler linear + nonlinear constituents. What I am not clear here is whether this is also the case for boolean circuits. What are the examples of maps that are linear? What are the examples of nonlinear maps?
I guess you're saying that you're not aware of the linear ones, while for me the whole space of $\mathbb{Z}_2$ is a bit strange to think about in these terms :grinning:

Geoff Cruttwell (Jul 09 2020 at 15:06):

Bruno Gavranovic said:

Geoff Cruttwell said:

Any reverse derivative category has an inherent notion of "linear map" (which one can prove are closed under composition) so since they've shown that Boolean circuits form a reverse derivative category, one gets for free some notion of "linear" Boolean circuits. I don't know what they look like, though!

Perhaps I should have clarified my question a bit more. I mean that the notion of non-linearities is crucial to learning complex functions, since they're defined in terms of these simpler linear + nonlinear constituents. What I am not clear here is whether this is also the case for boolean circuits. What are the examples of maps that are linear? What are the examples of nonlinear maps?

Sorry, yes, I should have been a bit clearer that my thought above didn't actually give an answer to your question of which boolean circuits are linear or non-linear :wink: ; the general theory just provides an abstract definition of linearity which needs to be worked out in this particular case.

Whatever they work out to be, though, from the general theory they will have the same properties as ordinary linear functions, eg., they will be very simple with respect to both the forward and reverse derivative, and hence should be simple with respect to reverse derivative ascent.

I guess you're saying that you're not aware of the linear ones, while for me the whole space of $\mathbb{Z}_2$ is a bit strange to think about in these terms :grinning:

Oh definitely, for me too! But that's part of what's neat about what they've done: by putting things in this abstract framework, one doesn't have to guess as to what the right notion of linearity in this setting should be; the abstraction provides a definition for you.

Bruno Gavranović (Jul 09 2020 at 17:04):

Ah, yes, I see what you mean. Certainly there's an option to sit down and work out what these functions are but I thought I could skip that step by asking here :grinning:

Paolo Perrone (Jul 09 2020 at 17:33):

Here's the video!
https://www.youtube.com/watch?v=gdinMKXW86g&list=PLCOXjXDLt3pYot9VNdLlZqGajHyZUywdI

Paul Wilson (Jul 09 2020 at 20:09):

Bruno Gavranovic said:

Ah, yes, I see what you mean. Certainly there's an option to sit down and work out what these functions are but I thought I could skip that step by asking here :grinning:

So I think Geoff answered this from the theoretical perspective- graphically for $\mathrm{Poly}_{\mathbb{Z}_2}$ these are just the maps without white generators (Geoff please correct me if I'm wrong!)

It sounds like you're more interested in whether the final 'learned function' (i.e., the map $f(\theta, -) : \mathbb{Z}^a \to \mathbb{Z}^b$ ) can be "doing something interesting", right?
If so, then yes: the circuit example on the second slide is actually a special case of a model which can learn any boolean map $\mathbb{Z}^a \to \mathbb{Z}^b$ , by simply remembering the set of all input/output pairs. IOW: You can construct a "lookup table" circuit where the parameters are the "data" of the lookup table, and the inputs are the "address" of each value in the lookup table. Hopefully that answers your question and makes sense!

JS PL (he/him) (Jul 09 2020 at 20:14):

This super cool, I'm happy to learn about Boolean differential calculus and great to see that it gives a CRDC!
Simple question since I'm new to Boolean differential calculus: what is the formula for the forward derivative? (which of course can be derived from the reverse derivative but I'm having a bit of trouble with the definition of the derivative in Boolean differential calculus) So say for an arbitrary function $f: \mathbb{Z}^n_2 \to \mathbb{Z}_2$ what is its derivative $D[f]: \mathbb{Z}^n_2 \times \mathbb{Z}^n_2 \to \mathbb{Z}_2$ ?

JS PL (he/him) (Jul 09 2020 at 20:17):

And as a follow up question, does this work for any finite field? Or only $\mathbb{Z}_2$ ?

JS PL (he/him) (Jul 10 2020 at 12:28):

Another simple question: could you explain how the chain rule works in terms of Boolean functions?

JS PL (he/him) (Jul 10 2020 at 13:05):

JS Pacaud Lemay said:

Another simple question: could you explain how the chain rule works in terms of Boolean functions?

Ah I think I've worked it out. It really depends on the fact that $f: \mathbb{Z}^n_2 \to \mathbb{Z}$ can only be either the zero element or unit element of the field (which seems like an obvious statement!) and also that if $f(x) \neq f(y)$ then $f(x) + 1 = f(y)$ . Which is probably super obvious for those who work with Boolean functions! So I guess this probably doesn't easily extend to other finite fields.

JS PL (he/him) (Jul 10 2020 at 14:26):

JS Pacaud Lemay said:

for an arbitrary function $f: \mathbb{Z}^n_2 \to \mathbb{Z}_2$ what is its derivative $D[f]: \mathbb{Z}^n_2 \times \mathbb{Z}^n_2 \to \mathbb{Z}_2$ ?

I believe I've worked this out to be: $Df = \sum \limits^n_i f(x)y_i + f(x + e_i)y_i$ , hopefully this is correct!