Category Theory
Zulip Server
Archive

You're reading the public-facing archive of the Category Theory Zulip server.
To join the server you need an invite. Anybody can get an invite by contacting Matteo Capucci at name dot surname at gmail dot com.
For all things related to this archive refer to the same person.


Stream: event: Categorical Probability and Statistics 2020 workshop

Topic: Tutorial: Probabilistic programming (Sam Staton)


view this post on Zulip Paolo Perrone (Jun 01 2020 at 19:32):

Hello! Here's Sam Staton's tutorial video, a categorical perspective on probabilistic programming.
https://youtu.be/JimCpEG0nts
Any questions about the video go in this thread!

view this post on Zulip Tobias Fritz (Jun 01 2020 at 21:32):

That is a great talk, Sam! I have a few questions if you don't mind.

view this post on Zulip Sam Staton (Jun 02 2020 at 14:32):

Hi Tobias!

Tobias Fritz said:
Could you elaborate a bit on why you prefer thinking of the/a category of probability kernels as a multicategory rather than a symmetric monoidal category?

What are the main differences between different probabilistic programming languages? How much do they differ with respect to which kernels can be defined?

I'll just list some:

With the question about whether s-finite kernels form a Kleisli category, are you considering all measurable spaces as objects? Or only a subclass like Polish spaces?

view this post on Zulip Tobias Fritz (Jun 02 2020 at 16:31):

Very interesting, thanks! Yes, I see the point about the de Finetti theorem having more of a multicategorical flavour.

Out of curiosity: to what extent are you a user and/or developer of probabilistic programming languages, in addition to studying them at the theoretical level?

view this post on Zulip Paolo Perrone (Jun 02 2020 at 17:12):

David Myers once pointed out to me that lax monoidal functors, as opposed to strong ones, are naturally the morphisms not of monoidal categories, but of the underlying multicategories. If the structural functors that appear on probability kernels tend to be lax monoidal rather than strong (the functor underlying the probability monad certainly is), this could be an additional witness that "really Fubini is about multicategories".

view this post on Zulip Oscar Cunningham (Jun 02 2020 at 17:30):

Why is it more natural for a functor between multicategories rather than monoidal categories to be lax?

view this post on Zulip Paolo Perrone (Jun 02 2020 at 18:30):

Oscar Cunningham said:

Why is it more natural for a functor between multicategories rather than monoidal categories to be lax?

The idea is that if T is lax monoidal, then it canonically maps a morphism f: A x B -> C to a morphism TA x TB -> TC, by composing the laxator TA x TB -> T(A x B) with Tf.
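In symbols, the composite in question looks like this (a sketch, writing $\mu_{A,B}$ for the laxator of $T$):

```latex
% A lax monoidal functor T, with laxator \mu_{A,B} : TA \otimes TB \to T(A \otimes B),
% sends a multimorphism f : A \otimes B \to C to the composite
TA \otimes TB \xrightarrow{\ \mu_{A,B}\ } T(A \otimes B) \xrightarrow{\ Tf\ } TC
```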

view this post on Zulip Sam Staton (Jun 02 2020 at 18:45):

Tobias Fritz said:

to what extent are you a user and/or developer of probabilistic programming languages

I dabble a bit, mainly because I'm interested to know what could be useful. I'm involved in a project with some social scientists on analyzing hate events on Twitter, and I've been writing probabilistic programs for that.

view this post on Zulip Tomáš Gonda (Jun 03 2020 at 22:08):

Hi Sam, thanks for the tutorial! I was a bit confused by the first example (4 buses in an hour) of the weighted Monte Carlo, so let me rephrase it to check if I got it right:

I think what threw me off at first was the naive intuition that, in this example, somehow the most 'complicated' part of the calculation is computing the likelihood. Therefore, I was subconsciously expecting the simulation to be approximating that, but then the likelihoods just entered as an input to the algorithm.

view this post on Zulip Sam Staton (Jun 04 2020 at 05:33):

Hi! Good question.

Tomáš Gonda said:

The [weighted Monte Carlo] algorithm just samples from a uniform distribution and then scores each sample with the likelihood that on the given sampled day, one would see 4 buses. In the end, one then counts the weighted proportion of samples corresponding to a given hypothesis to get a posterior.

That's exactly right. (The example is very simple, and in practice you would be sampling from a more interesting prior.)

somehow the most 'complicated' part of the calculation is computing the likelihood

Indeed, there are various approaches to automatically converting a generative model into a density / likelihood function. But here I assume that the likelihood function is given to us (the Poisson density), and that is often the approach taken in probabilistic programming in practice.
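The algorithm as described above (sample a hypothesis uniformly, score it by the Poisson likelihood of seeing 4 buses, read the posterior off the normalized weights) can be sketched in a few lines of Python. The hypothesis space and the rates here are purely illustrative, not the numbers from the talk:

```python
import random
import math

# Illustrative hypothesis space: which kind of day it is, each with an
# assumed Poisson rate of buses per hour. We observe 4 buses.
rates = {"weekday": 5.0, "sunday": 2.0}
observed = 4

def poisson_pmf(k, lam):
    # Poisson likelihood: probability of seeing k events at rate lam
    return math.exp(-lam) * lam ** k / math.factorial(k)

def weighted_monte_carlo(n_samples=100_000):
    # Sample from the uniform prior, score each sample with the likelihood,
    # then take the weighted proportion per hypothesis to get the posterior.
    weights = {day: 0.0 for day in rates}
    days = list(rates)
    for _ in range(n_samples):
        day = random.choice(days)                          # sample the prior
        weights[day] += poisson_pmf(observed, rates[day])  # score by likelihood
    total = sum(weights.values())
    return {day: w / total for day, w in weights.items()}
```

With these made-up rates, seeing 4 buses favours the higher-rate hypothesis, exactly as the weighted proportions would suggest.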

view this post on Zulip Arthur Parzygnat (Jun 05 2020 at 11:26):

Thanks for your talk! I have some very basic questions. Let $X := A \times B$ be the space describing the values $a$ and $b$ in $y = a + bx$, so it's the space parametrizing affine maps from $\mathbb{R}$ to itself. Given a fixed $(a,b)$, it is not guaranteed that when an observation is made, the values will all lie along a straight line. If $O$ denotes the observation space, then this is described by a Markov kernel $X \rightsquigarrow O$. If we assume that there is a definite value of $(a,b)$, then this corresponds to a Dirac delta measure $\{\bullet\} \to X$. We can push forward this measure to get one on the observation space $O$, describing the probabilities of witnessing certain observations. But what is $O$ exactly in this example? If we have $n$ observed data points, is it just $\coprod_{i=1}^{n} \mathbb{R}$ (the disjoint union of $n$ copies of $\mathbb{R}$)? If so, to obtain the posterior that you plotted visually as a collection of straight lines, do we apply Bayesian inversion to produce the associated Markov kernel $O \rightsquigarrow A \times B$, having witnessed the specific observation of data?

view this post on Zulip Sam Staton (Jun 05 2020 at 13:15):

Hi Arthur Parzygnat. I think $O = (\mathbb{R}^2)^n$, if there are $n$ observations in the plane. But maybe I misunderstood your notation?
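A Python sketch of how the posterior over lines could be approximated with the same weighted-sampling scheme discussed earlier in the thread, taking the observation space to be $(\mathbb{R}^2)^n$. The Gaussian priors on $(a, b)$ and the observation noise are assumptions for illustration, not details from the talk:

```python
import random
import math

def gaussian_pdf(y, mean, sd):
    # Density of a normal distribution, used as the observation likelihood
    return math.exp(-((y - mean) ** 2) / (2 * sd ** 2)) / (sd * math.sqrt(2 * math.pi))

def posterior_lines(data, n_samples=20_000, noise_sd=1.0):
    """data: a list of n points (x, y) in the plane, i.e. an element of O = (R^2)^n."""
    samples = []
    for _ in range(n_samples):
        a = random.gauss(0.0, 2.0)  # assumed prior on the intercept
        b = random.gauss(0.0, 2.0)  # assumed prior on the slope
        weight = 1.0
        for x, y in data:
            # Likelihood of each observed point under the line y = a + b*x
            weight *= gaussian_pdf(y, a + b * x, noise_sd)
        samples.append(((a, b), weight))
    return samples  # weighted samples approximating the posterior over (a, b)
```

Plotting each sampled $(a, b)$ as a line with opacity proportional to its weight would recover the "collection of straight lines" picture of the posterior.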

view this post on Zulip Arthur Parzygnat (Jun 05 2020 at 15:01):

@Sam Staton Ah, I assumed that because the x values are only natural numbers, you get the disjoint union. But if you allow arbitrary x positions, then yes. Okay, good to know we agree.