

Stream: event: Categories for AI

Topic: Guest Lecture 1: Neural network layers as parametric spans


Petar Veličković (Nov 09 2022 at 16:41):

Dear Attendees,
With the core series of five lectures finishing, it is now time to dive into our exciting guest lectures!

The first guest lecture will be given by @Pietro Vertechi and is titled "Neural network layers as parametric spans".

The abstract for Pietro's lecture is as follows:
Properties such as composability and automatic differentiation have made artificial neural networks a pervasive tool in applications. Tackling more challenging problems has caused neural networks to become progressively more complex and thus more difficult to define from a mathematical perspective. In this talk, we will discuss a general definition of linear layer arising from a categorical framework based on the notions of integration theory and parametric spans. This definition generalizes and encompasses classical layers (e.g., dense, convolutional), while guaranteeing the existence and computability of the layer's derivatives for backpropagation.

Pietro's guest lecture will take place in the usual slot next week (Monday 14 November, starting at 4PM UK time). The lecture will be given on Zoom and live-streamed on YouTube, just as before (the Zoom link should be the same as in previous weeks, but we will confirm the details in advance of the lecture).

This guest lecture will help explain key parts of "Neural network layers as parametric spans" (Bergomi and Vertechi, SYCO 9).

Lastly, on behalf of the entire organising team of Cats4AI :cat:, I'd like to thank you all for actively engaging with the course so far! :blush:
I'm sure I can speak for all five of us when I say that this was such a daunting but extremely valuable experience: for several of us it was the first time presenting these concepts to such a diverse audience, but seeing you all engaging with the content (whether it be on Zulip, Zoom, or otherwise) made it all the more worthwhile! :boom:
We will be sure to send out a feedback form in the future, to get a better feel for what could have been done better (for future years? :) )

Pim de Haan (Nov 10 2022 at 14:58):

The public link is https://uva-live.zoom.us/j/83816139841 (same as before)
The talk will be live-streamed to https://youtu.be/83a-MwlDy6s

Bruno Gavranović (Nov 14 2022 at 16:31):

Thoughts during the lecture: it looks like there's a correspondence between Propositions 1 and 2 in Pietro's talk and Definition 3.5.2.16 in the Categorical Systems Theory book.

Bruno Gavranović (Nov 14 2022 at 16:33):

And in fact, Pietro does say that this can be interpreted as a general lens :grothendieck:

Petar Veličković (Nov 14 2022 at 16:57):

Fantastic talk @Pietro Vertechi (I watched on YouTube as I was in the office :) )

Ieva Cepaite (Nov 14 2022 at 16:59):

Yes! It was very interesting @Pietro Vertechi, I especially enjoyed your animated illustrations - made everything instantly intuitive :)

Bruno Gavranović (Nov 14 2022 at 17:05):

The idea of permuting the legs of a span to compute the backward pass reminds me of how you'd implement this differentiation in terms of einsum. Turns out the derivative can be implemented by permuting the indices.
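For instance, here is a minimal NumPy sketch (the function names are just illustrative) of a single dense layer, where both gradients are the forward contraction with its index labels moved to different arguments:

import numpy as np

# Dense layer as an einsum: y[j] = sum_i x[i] * w[i, j]
def dense_forward(x, w):
    return np.einsum('i,ij->j', x, w)

# Backward pass: the same contraction with the indices permuted.
def dense_backward(x, w, dy):
    dx = np.einsum('j,ij->i', dy, w)   # gradient w.r.t. the input x
    dw = np.einsum('i,j->ij', x, dy)   # gradient w.r.t. the weights w
    return dx, dw

Both backward contractions reuse the same labels i, j as the forward pass; only which argument carries which label changes.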

Pietro Vertechi (Nov 25 2022 at 14:29):

@Bruno Gavranovic, I'm a bit late to the party, but I realized that this can be a very helpful comparison (thinking of discrete parametric spans as a generalized Einstein summation). Many layers can be implemented that way (well, I'm not sure the second one is accepted by einsum, but it's a useful notation):

y[j] += x[i] * w[i, j] # dense
y[j, k] += x[i, k-l] * w[i, j, l] # convolutional

The general discrete parametric span is of the form

y[t(p)] += x[s(p)] * w[π(p)]

where p lives in some generalized index space $E$.
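To make this concrete, here is a naive sketch (illustrative names, finite index sets assumed) that simply loops over the index space $E$ and applies the three maps s, t, π:

import numpy as np

# General discrete parametric span:
#   y[t(p)] += x[s(p)] * w[pi(p)]   for every p in the index space E.
def parametric_span_layer(x, w, E, s, t, pi, out_shape):
    y = np.zeros(out_shape)
    for p in E:
        y[t(p)] += x[s(p)] * w[pi(p)]
    return y

# Dense layer: E = {(i, j)}, s(i, j) = i, t(i, j) = j, pi(i, j) = (i, j).
def dense(x, w):
    I, J = w.shape
    E = [(i, j) for i in range(I) for j in range(J)]
    return parametric_span_layer(
        x, w, E,
        s=lambda p: p[0],
        t=lambda p: p[1],
        pi=lambda p: p,
        out_shape=(J,),
    )

# 1-D convolution: E = {(i, j, k, l)}, s = (i, k - l), t = (j, k), pi = (i, j, l),
# keeping only the points p where the shifted index k - l stays inside the input.
def conv1d(x, w, out_len):
    I, K = x.shape       # input: channels x length
    _, J, L = w.shape    # weights: in-channels x out-channels x kernel length
    E = [(i, j, k, l)
         for i in range(I) for j in range(J)
         for k in range(out_len) for l in range(L)
         if 0 <= k - l < K]
    return parametric_span_layer(
        x, w, E,
        s=lambda p: (p[0], p[2] - p[3]),
        t=lambda p: (p[1], p[2]),
        pi=lambda p: (p[0], p[1], p[3]),
        out_shape=(J, out_len),
    )

A real implementation would of course vectorise the loop, but the point is that choosing $E$ and the three maps s, t, π is enough to recover both examples above.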