Hello! Here's Sam Staton's tutorial video, a categorical introduction to probabilistic programming.
https://youtu.be/JimCpEG0nts
Any questions about the video go in this thread!
That is a great talk, Sam! I have a few questions if you don't mind.
Hi Tobias!
Tobias Fritz said:
Could you elaborate a bit on why you prefer thinking of the/a category of probability kernels as a multicategory rather than a symmetric monoidal category?
What are the main differences between different probabilistic programming languages? How much do they differ with respect to which kernels can be defined?
I'll just list some:
With the question about whether s-finite kernels form a Kleisli category, are you considering all measurable spaces as objects? Or only a subclass like Polish spaces?
Very interesting, thanks! Yes, I see the point about the de Finetti theorem having more of a multicategorical flavour.
Out of curiosity: to what extent are you a user and/or developer of probabilistic programming languages, in addition to studying them at the theoretical level?
David Myers once pointed out to me that lax monoidal functors, as opposed to strong ones, are naturally the morphisms not of monoidal categories but of the underlying multicategories. If the structural functors that appear on probability kernels tend to be lax monoidal rather than strong (the underlying functor of the probability monad is certainly lax monoidal in this form), this could be an additional witness that "really Fubini is about multicategories".
Why is it more natural for a functor between multicategories rather than monoidal categories to be lax?
Oscar Cunningham said:
Why is it more natural for a functor between multicategories rather than monoidal categories to be lax?
The idea is that if $T$ is lax monoidal, then it canonically maps a morphism $f \colon A \times B \to C$ to a morphism $TA \times TB \to TC$.
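Spelled out (a standard unfolding of the definition, not something specific to the talk): the lax structure map $\mu_{A,B} \colon TA \times TB \to T(A \times B)$ is exactly what makes the composite
$$TA \times TB \xrightarrow{\ \mu_{A,B}\ } T(A \times B) \xrightarrow{\ Tf\ } TC$$
available, and $\mu_{A,B}$ is not required to be invertible. Acting on binary (and, similarly, $n$-ary) maps in this way is precisely the data of a morphism of the underlying multicategories, whereas a strong monoidal functor would additionally demand that $\mu_{A,B}$ be an isomorphism.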
Tobias Fritz said:
to what extent are you a user and/or developer of probabilistic programming languages
I dabble a bit, mainly because I'm interested to know what could be useful. I'm involved in a project with some social scientists on analyzing hate events on twitter and I've been writing probabilistic programs for that.
Hi Sam, thanks for the tutorial! I was a bit confused by the first example of the weighted Monte Carlo method (4 buses in an hour), so let me rephrase it to check that I got it right:
I think what threw me off at first was the naive intuition that, in this example, somehow the most 'complicated' part of the calculation is computing the likelihood. Therefore, I was subconsciously expecting the simulation to be approximating that, but then the likelihoods just entered as an input to the algorithm.
Hi! Good question. Tomáš Gonda said:
The [weighted Monte Carlo] algorithm just samples from a uniform distribution and then scores each sample with the likelihood that on the given sampled day, one would see 4 buses. In the end, one then counts the weighted proportion of samples corresponding to a given hypothesis to get a posterior.
That's exactly right. (The example is very simple, and in practice you would be sampling from a more interesting prior.)
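In case a concrete version helps later readers, here is a minimal sketch of that weighted Monte Carlo loop in Python (not code from the talk). The two kinds of day and their bus rates are invented for illustration; the only thing taken from the example is that each sample gets scored by the Poisson likelihood of seeing 4 buses.

```python
import math
import random

def poisson_pmf(k, rate):
    """Probability of seeing k events under a Poisson distribution with the given rate."""
    return rate**k * math.exp(-rate) / math.factorial(k)

# Hypothetical prior: each sampled "day" is one of two kinds, with a made-up bus rate.
hypotheses = {"quiet day": 2.0, "busy day": 6.0}

samples = []
for _ in range(10_000):
    h = random.choice(list(hypotheses))       # sample from the (uniform) prior
    weight = poisson_pmf(4, hypotheses[h])    # score by the likelihood of seeing 4 buses
    samples.append((h, weight))

# Posterior over hypotheses: the weighted proportion of samples for each one.
total = sum(w for _, w in samples)
posterior = {h: sum(w for h2, w in samples if h2 == h) / total for h in hypotheses}
print(posterior)
```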
somehow the most 'complicated' part of the calculation is computing the likelihood
Indeed, there are various approaches to automatically converting a generative model into a density / likelihood function. But here I assume that the likelihood function is given to us (the Poisson density), and that is often the approach taken in probabilistic programming in practice.
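For reference, the Poisson density in question: if the hourly bus rate is $r$, the likelihood of observing $k = 4$ buses in an hour is
$$p(k \mid r) = \frac{r^k e^{-r}}{k!},$$
and it is this function of $r$, with $k$ fixed at the observed count, that supplies the scores/weights, whether it is written by hand or derived automatically from a generative model.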
Thanks for your talk! I have some very basic questions. Let $\Theta$ be the space describing the values $a$ and $b$ in $y = ax + b$, so it's the space parametrizing affine maps from $\mathbb{R}$ to itself. Given a fixed $(a, b)$, it is not guaranteed that when an observation is made, the values will all lie along a straight line. If $X$ denotes the observation space, then this is described by a Markov kernel $\Theta \to X$. If we assume that there is a definite value of $(a, b)$, then this corresponds to a Dirac delta measure $\delta_{(a,b)}$ on $\Theta$. We can push this measure forward to get one on the observation space, which describes the probabilities of witnessing certain observations. But what is $X$ exactly in this example? If we have observed $n$ data points, is it just the disjoint union of $n$ copies of $\mathbb{R}$? If so, to obtain the posterior that you plotted visually as a collection of straight lines, do we apply Bayesian inversion to produce the associated Markov kernel $X \to \Theta$ and evaluate it at the specific observed data?
Hi Arthur Parzygnat. I think $X = (\mathbb{R}^2)^n$, if there are $n$ observations in the plane. But maybe I misunderstood your notation?
@Sam Staton Ah, I assumed that because the $x$ values are only natural numbers, you get the disjoint union. But yes, if you allow arbitrary $x$ positions, then I agree. In any case, it's good to know we agree.
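To tie this back to the weighted Monte Carlo discussion above, here is a small hypothetical sketch of the regression example in the same style (assumed, since the thread doesn't specify them: a Gaussian prior on $(a, b)$ and Gaussian observation noise). The posterior "collection of straight lines" is then the weighted set of sampled $(a, b)$ pairs.

```python
import math
import random

def gaussian_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and standard deviation."""
    return math.exp(-((x - mean) ** 2) / (2 * std**2)) / (std * math.sqrt(2 * math.pi))

# Observed data points in the plane (made up for illustration).
data = [(0.0, 1.1), (1.0, 2.9), (2.0, 5.2), (3.0, 6.8)]

weighted_lines = []
for _ in range(10_000):
    a = random.gauss(0.0, 3.0)   # prior on the slope (assumed)
    b = random.gauss(0.0, 3.0)   # prior on the intercept (assumed)
    # Likelihood: each observed y is the line's value at x plus Gaussian noise.
    weight = 1.0
    for x, y in data:
        weight *= gaussian_pdf(y, a * x + b, 0.5)
    weighted_lines.append(((a, b), weight))

# The posterior over lines is represented by these weighted (a, b) samples;
# plotting the highest-weight lines gives the "collection of straight lines" picture.
weighted_lines.sort(key=lambda p: p[1], reverse=True)
print(weighted_lines[:5])
```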