Category Theory
Zulip Server
Archive

You're reading the public-facing archive of the Category Theory Zulip server.
To join the server you need an invite. Anybody can get one by contacting Matteo Capucci at name dot surname at gmail dot com.
For all things related to this archive, refer to the same person.


Stream: theory: categorical probability

Topic: probability measure on probability measures


view this post on Zulip Matteo Capucci (he/him) (May 14 2025 at 11:40):

Given a probability space $(X, dx)$, I'm trying to define a probability measure on $GX$, the measurable space given by the Giry monad on $X$.

My idea is the following:

  1. In measurable spaces, pull back $dx : 1 \to GX$ along the multiplication $\mu : GGX \to GX$
  2. Obtain a subobject $K = \{d\mathfrak{P} \in GGX \mid \forall A,\ \int_{GX} \mathrm{ev}_A \, d\mathfrak{P} = dx(A)\}$ (see the note after this message)
  3. Notice $K$ is in fact convex according to the (free) convex structure of $GGX$
  4. Take the expectation of $K$ to get a single point $\mathbb{E}[K]$ of $GGX$, i.e. a probability measure on $GX$

Has this been done before?
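For reference (a standard fact, not spelled out in the thread): the Giry multiplication acts by integration against the evaluation maps, so the subobject in step 2 is exactly the fibre of $\mu$ over the point $dx : 1 \to GX$ from step 1:

$$\mu_X(d\mathfrak{P})(A) = \int_{GX} \mathrm{ev}_A \, d\mathfrak{P} = \int_{GX} \rho(A) \, d\mathfrak{P}(\rho), \qquad K = \mu_X^{-1}(dx).$$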

view this post on Zulip John Baez (May 14 2025 at 11:46):

If $X$ is a 2-point space with the probability measure that assigns $1/2$ to each point, what probability measure does your construction put on the space of probability measures on $X$, which is $[0,1]$?

view this post on Zulip John Baez (May 14 2025 at 11:46):

Wild guess: the delta measure at $\frac{1}{2} \in [0,1]$.

view this post on Zulip Matteo Capucci (he/him) (May 14 2025 at 11:47):

Good question... That looks like a daunting computation but let me try

view this post on Zulip Matteo Capucci (he/him) (May 14 2025 at 12:08):

So the evaluation map $\mathrm{ev}_1 : G2 \to \mathbb{R}$ is simply the inclusion of $\Delta^1 = [0,1]$ in $\mathbb{R}$. So given $d\mathfrak{P} \in GG2 = G[0,1]$, the integral $\int_{[0,1]} x \, d\mathfrak{P}$ is the expectation of $d\mathfrak{P}$. Similarly, $\mathrm{ev}_2$ is $1 - x$, and thus we get

$$K = \{d\mathfrak{P} \in G[0,1] \mid \mathbb{E}[d\mathfrak{P}] = 1/2\}$$

Now what is the barycenter of this set? Intuitively, I expect it to be $\delta_{1/2}$, as @John Baez predicted, by considerations of symmetry, but my convex analysis is rusty so I don't know how to conclude this!

view this post on Zulip Matteo Capucci (he/him) (May 14 2025 at 12:10):

It'd also seem the way one obtains $\mathbb{E}[K]$ should give a rather explicit formula for the result of applying it to sets of probability measures, but I'm lost in convex analysis there too

view this post on Zulip Matteo Capucci (he/him) (May 14 2025 at 12:12):

Matteo Capucci (he/him) said:

Intuitively, I expect it to be $\delta_{1/2}$, as @John Baez predicted, by considerations of symmetry, but my convex analysis is rusty so I don't know how to conclude this!

Uhm actually the uniform measure on $[0,1]$ is also a strong contender... maybe even stronger?
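For what it's worth, both candidates do lie in $K$, and both are invariant under the symmetry $x \mapsto 1 - x$, so symmetry alone cannot decide between them:

$$\int_{[0,1]} x \, d\delta_{1/2}(x) = \tfrac{1}{2}, \qquad \int_0^1 x \, dx = \tfrac{1}{2}.$$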

view this post on Zulip Tobias Fritz (May 14 2025 at 12:13):

I think you'd need to specify a measure on $K$ in order to take the expectation. In other words, what exactly do you mean by "barycenter"?

view this post on Zulip Matteo Capucci (he/him) (May 14 2025 at 12:13):

Indeed, at least in this case, it seems the support of $\mathbb{E}[K]$ is the supremum of the supports of each $\mathfrak{P} \in K$ (because taking convex combinations moves stuff away from $0$), so it cannot be $\delta$

view this post on Zulip Matteo Capucci (he/him) (May 14 2025 at 12:14):

Tobias Fritz said:

I think you'd need to specify a measure on $K$ in order to take the expectation. In other words, what exactly do you mean by "barycenter"?

I need only for $GGX \supseteq K$ to have a convex structure and for $K$ to be convex, don't I?

view this post on Zulip Tobias Fritz (May 14 2025 at 12:15):

I don't know, what is your definition of barycenter?

view this post on Zulip Matteo Capucci (he/him) (May 14 2025 at 12:15):

The expectation, as in the result of applying the convex algebra structure map

view this post on Zulip Tobias Fritz (May 14 2025 at 12:16):

You mean the $G$-algebra map? For which algebra $A$, and applied to which element of $GA$?

view this post on Zulip Tobias Fritz (May 14 2025 at 12:16):

If you mean to take $A = K$, then you still need to specify an element of $GK$.

view this post on Zulip Matteo Capucci (he/him) (May 14 2025 at 12:16):

yes. $A = GGX$, applied to $K \in GGGX$

view this post on Zulip Tobias Fritz (May 14 2025 at 12:17):

But $K$ is not an element of $GGGX$

view this post on Zulip Matteo Capucci (he/him) (May 14 2025 at 12:17):

why not?

view this post on Zulip Tobias Fritz (May 14 2025 at 12:17):

$K$ is a set, not a probability measure

view this post on Zulip Matteo Capucci (he/him) (May 14 2025 at 12:17):

uhm ok it's not a probability measure

view this post on Zulip Matteo Capucci (he/him) (May 14 2025 at 12:17):

I got confused by convexity

view this post on Zulip Matteo Capucci (he/him) (May 14 2025 at 12:20):

Matteo Capucci (he/him) said:

Given a probability space $(X, dx)$, I'm trying to define a probability measure on $GX$, the measurable space given by the Giry monad on $X$.

Do you know if this can be done though, @Tobias Fritz?

view this post on Zulip John Baez (May 14 2025 at 12:20):

Here's one way to do it: given any probability space $(X, \mu)$ we get a probability measure on $GX$ which is the delta measure at $\mu \in GX$.

view this post on Zulip Matteo Capucci (he/him) (May 14 2025 at 12:21):

It seems like at least the spaces $Gn = \Delta^n$ come with a canonical probability measure
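Assuming the canonical measure meant here is the uniform (normalized Lebesgue) one, it can be phrased in the Dirichlet language that comes up below: the uniform measure on the $n$-simplex is the flat Dirichlet distribution,

$$\mathrm{Uniform}(\Delta^n) = \mathrm{Dir}(\underbrace{1, \dots, 1}_{n+1}).$$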

view this post on Zulip Matteo Capucci (he/him) (May 14 2025 at 12:21):

John Baez said:

Here's one way to do it: given any probability space $(X, \mu)$ we get a probability measure on $GX$ which is the delta measure at $\mu \in GX$.

well that makes sense

view this post on Zulip Tobias Fritz (May 14 2025 at 12:21):

I'm sure that there is no canonical construction, in the sense of a map $1 \to GGX$ that would be defined for every standard Borel measurable space $X$ (or even just finite $X$) and be natural in $X$

view this post on Zulip Matteo Capucci (he/him) (May 14 2025 at 12:22):

not even when $X$ comes with its own probability measure?

view this post on Zulip Matteo Capucci (he/him) (May 14 2025 at 12:23):

what's the catch?

view this post on Zulip Tobias Fritz (May 14 2025 at 12:23):

So the only natural things that you can do are along the lines of what John is suggesting, which is to use the monad unit $\delta : X \to GX$ to get $\delta_{GX} : GX \to GGX$ and $G\delta_X : GX \to GGX$.
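A minimal sketch of these two maps on a finite space (the dict representation of measures is hypothetical, purely for illustration):

```python
# Measures on a finite X as {point: weight}; measures on GX as lists of
# (weight, measure) pairs. Both maps send p in GX to a point of GGX.

def delta_GX(p):
    """delta_{GX} : GX -> GGX, the point mass at the measure p itself."""
    return [(1.0, p)]

def G_delta_X(p):
    """G(delta_X) : GX -> GGX, pushforward of p along x |-> delta_x."""
    return [(w, {x: 1.0}) for x, w in p.items()]

fair_coin = {"heads": 0.5, "tails": 0.5}
print(delta_GX(fair_coin))   # [(1.0, {'heads': 0.5, 'tails': 0.5})]
print(G_delta_X(fair_coin))  # [(0.5, {'heads': 1.0}), (0.5, {'tails': 1.0})]
```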

view this post on Zulip Tobias Fritz (May 14 2025 at 12:26):

I'm pretty sure that there are no other natural transformations $G \to GG$, although I wouldn't know how to prove it offhand :sweat_smile: Maybe one of the other categorical probability folks here could do this.

view this post on Zulip John Baez (May 14 2025 at 12:27):

Matteo was hoping for a god-like power of this sort: your knowledge is captured by some probability measure, and someone tells you "well, that probability measure could be wrong" and you bounce back saying "true, but based on what I know I can automatically cook up a probability measure on probability measures, to help guess how likely other probability measures are".

And my construction amounts to the bull-headed approach where you say "actually, based on what I know, my probability distribution has a 100% chance of being correct".

view this post on Zulip John Baez (May 14 2025 at 12:30):

And I think it's philosophically very important, if true, that there's generally no "natural" response other than this bull-headed one. (I don't really know a theorem to this effect, but there should be some.)

view this post on Zulip JR (May 14 2025 at 12:32):

John Baez said:

Matteo was hoping for a god-like power of this sort: your knowledge is captured by some probability measure, and someone tells you "well, that probability measure could be wrong" and you bounce back saying "true, but based on what I know I can automatically cook up a probability measure on probability measures, to help guess how likely other probability measures are".

Isn't this Bayesian statistics? Perhaps look at https://en.wikipedia.org/wiki/Dirichlet_distribution

view this post on Zulip Tobias Fritz (May 14 2025 at 12:34):

John Baez said:

And I think it's philosophically very important, if true, that there's generally no "natural" response other than this bull-headed one. (I don't really know a theorem to this effect, but there should be some.)

Well, as I said there is another natural construction of a probability measure on $GX$ from a measure on $X$, namely to apply $G\delta$. In your example, this gives $\frac{1}{2}\delta_0 + \frac{1}{2}\delta_1$. So I think there is some discrepancy between Matteo's requirement of the measure on $GX$ integrating to the given measure on $X$ and the intuitive story you've given. But perhaps I'm misunderstanding something.

view this post on Zulip Tobias Fritz (May 14 2025 at 12:44):

JR said:

Isn't this Bayesian statistics? Perhaps look at https://en.wikipedia.org/wiki/Dirichlet_distribution

Oh yes, great point! So perhaps my earlier claim that there are no natural transformations $G \to GG$ other than $\delta_G$ and $G\delta$ was premature. I don't really know this stuff, but the relevant paper is Dirichlet is natural.
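On finite spaces, one concrete face of this naturality is the aggregation property of Dirichlet distributions: pushing a Dirichlet forward along a surjection of finite sets yields the Dirichlet of the summed parameters. A quick Monte Carlo check of that property (a sketch, not taken from the paper):

```python
import numpy as np

# If p ~ Dir(a1, a2, a3), the pushforward of p along the merge
# {1,2,3} -> {{1,2},{3}} is distributed as Dir(a1 + a2, a3).
rng = np.random.default_rng(0)
alpha = np.array([2.0, 3.0, 5.0])

merged = rng.dirichlet(alpha, size=100_000)[:, :2].sum(axis=1)  # p1 + p2
direct = rng.dirichlet([alpha[0] + alpha[1], alpha[2]], size=100_000)[:, 0]

print(merged.mean(), direct.mean())  # both ~ 0.5 = (2 + 3) / 10
print(merged.var(), direct.var())    # both ~ 0.25 / 11 ~ 0.0227
```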

view this post on Zulip John Baez (May 14 2025 at 13:00):

Tobias Fritz said:

John Baez said:

And I think it's philosophically very important, if true, that there's generally no "natural" response other than this bull-headed one. (I don't really know a theorem to this effect, but there should be some.)

Well, as I said there is another natural construction of a probability measure on $GX$ from a measure on $X$, namely to apply $G\delta$. In your example, this gives $\frac{1}{2}\delta_0 + \frac{1}{2}\delta_1$.

I'm confused about what this second construction does in general, but in this particular case it seems to be that you say "the probability of the coin landing heads up is $1/2$", and someone says "that could be wrong, please give me a probability measure on probability measures" and you say "okay, with 50% chance it's landing heads up with 100% probability and with 50% chance it's landing tails up with 100% probability". Which is a nice comeback. :smirk:

But once you have two constructions you can also take mixtures.

view this post on Zulip Tobias Fritz (May 14 2025 at 13:05):

John Baez said:

I'm confused about what this second construction does in general, but in this particular case it seems to be that you say "the probability of the coin landing heads up is $1/2$", and someone says "that could be wrong, please give me a probability measure on probability measures" and you say "okay, with 50% chance it's landing heads up with 100% probability and with 50% chance it's landing tails up with 100% probability".

That's a good description of what it does in general! You just need to replace the outcomes and the 50% numbers by the outcomes and weights of whatever probability measure you start with.

But once you have two constructions you can also take mixtures.

Right, I noticed that too, but then realized that my earlier claim seems to be invalidated already by the Dirichlet distributions that @JR mentioned, and these are more important in Bayesian statistics.

view this post on Zulip John Baez (May 14 2025 at 13:16):

You just need to replace the outcomes and the 50% numbers by the outcomes and weights of whatever probability measure you start with.

Okay, right.

I don't understand the Dirichlet stuff, but I vaguely gather from the Wikipedia article that at least when $X$ is finite, so $GX$ is a finite-dimensional simplex, there are wads of "systematic" ways to get a probability distribution on $GX$ from a point in $GX$. Then the paper you cited seems to be boosting these up to the case where $X$ can be infinite.

view this post on Zulip Tobias Fritz (May 14 2025 at 13:19):

Yes, but the main reason to link to that paper is that they prove naturality of the Dirichlet distributions, starting with the case of finite $X$:
image.png

view this post on Zulip Tobias Fritz (May 14 2025 at 13:20):

Here, $M^*$ is the functor of nonnegative measures, so $G \subseteq M^*$ is the subfunctor of normalized (= probability) measures.

view this post on Zulip Benedikt Peterseim (May 14 2025 at 13:21):

John Baez said:

You just need to replace the outcomes and the 50% numbers by the outcomes and weights of whatever probability measure you start with.

Okay, right.

I don't understand the Dirichlet stuff, but I vaguely gather from the Wikipedia article that at least when $X$ is finite, so $GX$ is a finite-dimensional simplex, there are wads of "systematic" ways to get a probability distribution on $GX$ from a point in $GX$. Then the paper you cited seems to be boosting these up to the case where $X$ can be infinite.

In the infinite case, the Dirichlet distribution is also called the Dirichlet process, so the relevant Wikipedia link is: https://en.m.wikipedia.org/wiki/Dirichlet_process

It gives several nice interpretations, one having to do with East Asian cuisine, so while potentially spicy, I think this should make it quite intuitive.

view this post on Zulip James Deikun (May 14 2025 at 13:25):

It's important to note that Dirichlet gives a family of probability distributions concentrated at your original one, parameterized by a positive scaling factor which says how concentrated, i.e., how sure you are that your original i.i.d. distribution guess was close.
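A small numerical illustration of that scaling factor (a sketch; $\alpha$ times the base weights is the usual parameterization):

```python
import numpy as np

# Dir(alpha * p) has mean p for every alpha > 0, and concentrates around p
# as alpha grows: Var(p_i) = p_i * (1 - p_i) / (alpha + 1).
rng = np.random.default_rng(1)
p = np.array([0.5, 0.5])  # the fair-coin guess from the running example

for alpha in [1, 10, 1000]:
    heads = rng.dirichlet(alpha * p, size=50_000)[:, 0]
    print(alpha, round(heads.mean(), 3), round(heads.std(), 3))
# the mean stays ~ 0.5; the std shrinks like 0.5 / sqrt(alpha + 1)
```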

view this post on Zulip John Baez (May 14 2025 at 13:53):

That's very nice - in my terminology it would say how bull-headed you want to be.

I will study this stuff. I'll never be an expert on probability and statistics, but it's fun in bite-sized portions, and that "Dirichlet distribution" article looks enticing.

view this post on Zulip David Corfield (May 14 2025 at 14:56):

I haven't followed the details of the discussion, but when I used to think about this area, all the talk was around Edwin Jaynes and his ideas on selecting the maximum entropy distribution with respect to the information of a problem, articulated at length in 'Probability Theory: The Logic of Science', in particular this part of the book:

image.png

So, say, you'd look to give an ignorance prior on the bias of a coin.

Has anything of this been taken up in category-theoretic probability theory?

view this post on Zulip Tobias Fritz (May 15 2025 at 12:17):

I'm not sure how closely related active inference is to maximum entropy methods, but the two are at least similar in flavour, and thus Active Inference in String Diagrams: A Categorical Account of Predictive Processing and Free Energy may be interesting to know about.

Also @Paolo Perrone has worked on Markov Categories and Entropy and may be able to say more.

view this post on Zulip Paolo Perrone (May 15 2025 at 14:05):

In case this is relevant: the set $\mu^{-1}(dx) \subseteq GGX$ is naturally equipped with the structure of a partial order under partial evaluation. $\delta(dx)$ and $G\delta(dx)$ are the top and bottom elements in this partial order, and this order indeed comes up in Bayesian inference (as "Blackwell's order"). This is part of an active line of research; see the last paper on the topic as well as the references therein.
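In the running coin example this fibre is exactly the set $K$ computed earlier, with the two natural constructions discussed above as the extremes of the order:

$$\mu^{-1}(dx) = \{\rho \in G[0,1] \mid \mathbb{E}[\rho] = 1/2\} = K, \qquad \delta(dx) = \delta_{1/2}, \qquad G\delta(dx) = \tfrac{1}{2}\delta_0 + \tfrac{1}{2}\delta_1.$$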

view this post on Zulip Paolo Perrone (May 15 2025 at 14:14):

Also something that could be relevant: one of the most natural ways to come up with elements of $GGX$ is via de Finetti's theorem.
The theorem says that $GGX$ is in (natural, functorial, etc.) bijection with exchangeable sequences, which form the subspace of $G(X^\mathbb{N})$ of those measures on $X^\mathbb{N}$ which are invariant under finite permutations.
Now, given $dx \in GX$, the Dirichlet process (with base measure $dx$) is exchangeable, and so we can apply de Finetti, obtaining once again an element of $GGX$. A way to see this explicitly is via the stick-breaking picture.
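A truncated stick-breaking sketch of one draw from the Dirichlet process $\mathrm{DP}(\alpha, \text{base})$ (here Uniform$[0,1]$ stands in for the base measure $dx$, and the finite number of atoms is an approximation):

```python
import numpy as np

def stick_breaking_draw(alpha, base_sampler, n_atoms=1000, rng=None):
    """One truncated sample from DP(alpha, base): a random discrete
    probability measure, returned as (atom locations, atom weights)."""
    rng = rng or np.random.default_rng()
    betas = rng.beta(1.0, alpha, size=n_atoms)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas)[:-1]))
    weights = betas * remaining         # w_k = b_k * prod_{j<k} (1 - b_j)
    atoms = base_sampler(n_atoms, rng)  # locations drawn i.i.d. from the base
    return atoms, weights

atoms, weights = stick_breaking_draw(2.0, lambda n, r: r.uniform(size=n))
print(weights.sum())  # ~ 1, up to truncation error
```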

view this post on Zulip Paolo Perrone (May 15 2025 at 14:18):

By the way, since people in this field use names such as "Indian buffet process" and "Chinese restaurant process", as an Italian I feel left out - I would suggest we call the "stick breaking process" the "pizza slicing process" instead. (Or at least "grissini sharing".)

view this post on Zulip David Corfield (May 15 2025 at 14:30):

Don't you have enough mathematical food allusions already?
image.png
When will steak and kidney pie appear?

view this post on Zulip John Baez (May 15 2025 at 14:40):

British food is fairly low on the hierarchy.

view this post on Zulip Morgan Rogers (he/him) (May 15 2025 at 16:13):

That response can only mean that you've never had a good steak and kidney pie. That said, having become a vegetarian I will likely not have one again.

view this post on Zulip Alex Lew (May 15 2025 at 16:26):

Victor Blanchi and Hugo Paquet have a nice characterization of the natural transformations $G \to GG$: https://popl23.sigplan.org/details/lafi-2023-papers/6/Random-probability-distributions-as-natural-transformations

view this post on Zulip Hugo Paquet (May 15 2025 at 16:40):

Thanks Alex, we also tried to be a bit more precise in Section 8 of this paper: https://arxiv.org/abs/2405.17595.
The idea is roughly that the Dirichlet process is natural because the atom weights are sampled independently (in the probabilistic sense) from the atom locations. In this case the weights are generated by stick-breaking, but you will still get a natural transformation if you use any other distribution on the space of families of weights that sum to $1$. (We called these "element-free distributions".)

view this post on Zulip Hugo Paquet (May 15 2025 at 16:42):

There is quite a bit of work on random discrete distributions (by Kingman and Pitman and probably many others in probability theory).

view this post on Zulip Hugo Paquet (May 15 2025 at 16:45):

I don't know much about random continuous distributions. Naturality is quite strong: we showed that the continuous part of a natural random distribution must coincide with the base measure after renormalizing. (By "base measure" I mean the input to the function $GX \to GGX$.)

view this post on Zulip Hugo Paquet (May 15 2025 at 16:51):

Matteo Capucci (he/him) said:

Take the expectation of $K$ to get a single point $\mathbb{E}[K]$ of $GGX$, i.e. a probability measure on $GX$

Maybe I missed something in the discussion, but isn't the expectation of $K \in GGX$ exactly $\mu(K)$? For random distributions viewed as random variables, the expectation is pointwise: $(\mathbb{E}[d])(A) = \mathbb{E}[d(A)]$.

view this post on Zulip Paolo Perrone (May 15 2025 at 16:57):

Yes, but $K$ is a set.

view this post on Zulip Paolo Perrone (May 15 2025 at 18:52):

(But yes, in principle Hugo is correct: all the measures that are natural are going to have the same expectation, exactly $dx$.)
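A worked instance of Hugo's pointwise formula, assuming the standard fact that the Dirichlet process has mean measure equal to its normalized base measure: for every measurable $A$,

$$(\mathbb{E}[\mathrm{DP}(\alpha, dx)])(A) = \mathbb{E}[\rho(A)] = dx(A), \qquad \text{i.e.} \quad \mu(\mathrm{DP}(\alpha, dx)) = dx.$$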