Given a probability space $(X, \mu)$, I'm trying to define a probability measure on $PX$, the measurable space given by the Giry monad on $X$. That is, I'm trying to construct an element of $PPX$.
My idea is the following:
Has this been done before?
If $X$ is a 2-point space with the probability measure that assigns $1/2$ to each point, what probability measure does your construction put on the space of probability measures on $X$, which is $[0,1]$?
Wild guess: the delta measure at $1/2$.
Good question... That looks like a daunting computation, but let me try.
So the evaluation map is simply the inclusion of $PX \cong [0,1]$ in $\mathbb{R}$. So given $\xi \in PPX$, the integral $\int \mathrm{ev}\,d\xi$ is the expectation of $\xi$. Conversely, $K$ is $E^{-1}(1/2)$ and thus we get the set of probability measures on $[0,1]$ with barycenter $1/2$.
Now what is the barycenter of this set? Intuitively, I expect it to be $\delta_{1/2}$, as @John Baez predicted, by considerations of symmetry, but my convex analysis is rusty so I don't know how to conclude this!
It'd also seem the way one obtains it should give a rather explicit formula to compute the result of applying it to sets of probability measures, but I'm lost in convex analysis there too
Matteo Capucci (he/him) said:
Intuitively, I expect it to be $\delta_{1/2}$, as @John Baez predicted, by considerations of symmetry, but my convex analysis is rusty so I don't know how to conclude this!
Uhm actually the uniform measure on $[0,1]$ is also a strong contender... maybe even stronger?
I think you'd need to specify a measure on $K$ in order to take the expectation. In other words, what exactly do you mean by "barycenter"?
Indeed, at least in this case, it seems the support of the barycenter is the supremum of the supports of each element of $K$ (because taking convex combinations moves stuff away from $0$), so it cannot be $\delta_{1/2}$
Tobias Fritz said:
I think you'd need to specify a measure on $K$ in order to take the expectation. In other words, what exactly do you mean by "barycenter"?
I need only for $PPX$ to have a convex structure and for $K$ to be convex, don't I?
I don't know, what is your definition of barycenter?
The expectation, as in the result of applying the convex algebra structure map $E : P(PPX) \to PPX$
You mean the $P$-algebra map? For which algebra $A$, and applied to which element of $PA$?
If you mean to take $E : P(PPX) \to PPX$, then you still need to specify an element of $P(PPX)$.
yes. $E$, applied to $K$
But $K$ is not an element of $P(PPX)$
why not?
$K$ is a set, not a probability measure
uhm ok it's not a probability measure
I got confused by convexity
Matteo Capucci (he/him) said:
Given a probability space $(X, \mu)$, I'm trying to define a probability measure on $PX$, the measurable space given by the Giry monad on $X$. That is, I'm trying to construct an element of $PPX$.
Do you know if this can be done, though, @Tobias Fritz?
Here's one way to do it: given any probability space $(X, \mu)$ we get a probability measure on $PX$ which is the delta measure at $\mu$.
It seems like at least the spaces $PX$ come with a canonical probability measure
John Baez said:
Here's one way to do it: given any probability space $(X, \mu)$ we get a probability measure on $PX$ which is the delta measure at $\mu$.
well that makes sense
I'm sure that there is no canonical construction, in the sense of a map $PX \to PPX$ that would be defined for every standard Borel measurable space $X$ (or even just finite $X$) and be natural in $X$
not even when $X$ comes with its own probability measure?
what's the catch?
So the only natural things that you can do are along the lines of what John is suggesting, which is to use the monad unit $\delta$ to get $\delta_{PX} : PX \to PPX$ and $P\delta_X : PX \to PPX$.
I'm pretty sure that there are no other natural transformations $P \Rightarrow PP$, although I wouldn't know how to prove it offhand :sweat_smile: Maybe one of the other categorical probability folks here could do this.
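For concreteness, here is a minimal sketch of these two constructions for finite $X$, representing a distribution as a Python dict from outcomes to weights. This is our own illustration, not code from the thread, and the names `delta`, `delta_P`, and `P_delta` are made up for it.

```python
def delta(x):
    """Monad unit: the Dirac distribution at x."""
    return {x: 1.0}

def freeze(p):
    """Make a distribution hashable, so it can itself be an outcome."""
    return frozenset(p.items())

def delta_P(p):
    """delta_{PX} : PX -> PPX, the Dirac measure at p itself."""
    return {freeze(p): 1.0}

def P_delta(p):
    """P(delta_X) : PX -> PPX, the pushforward of p along x |-> delta(x)."""
    return {freeze(delta(x)): w for x, w in p.items()}

fair_coin = {"heads": 0.5, "tails": 0.5}
print(delta_P(fair_coin))  # 100% sure the distribution is the fair coin
print(P_delta(fair_coin))  # 50% "surely heads", 50% "surely tails"
```

On the fair coin, `delta_P` gives the "bull-headed" answer discussed below, and `P_delta` gives the second construction that comes up shortly.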
Matteo was hoping for a god-like power of this sort: your knowledge is captured by some probability measure, and someone tells you "well, that probability measure could be wrong" and you bounce back saying "true, but based on what I know I can automatically cook up a probability measure on probability measures, to help guess how likely other probability measures are".
And my construction amounts to the bull-headed approach where you say "actually, based on what I know, my probability distribution has a 100% chance of being correct".
And I think it's philosophically very important, if true, that there's generally no "natural" response other than this bull-headed one. (I don't really know a theorem to this effect, but there should be some.)
John Baez said:
Matteo was hoping for a god-like power of this sort: your knowledge is captured by some probability measure, and someone tells you "well, that probability measure could be wrong" and you bounce back saying "true, but based on what I know I can automatically cook up a probability measure on probability measures, to help guess how likely other probability measures are".
Isn't this Bayesian statistics? Perhaps look at https://en.wikipedia.org/wiki/Dirichlet_distribution
John Baez said:
And I think it's philosophically very important, if true, that there's generally no "natural" response other than this bull-headed one. (I don't really know a theorem to this effect, but there should be some.)
Well, as I said there is another natural construction of a probability measure on $PX$ from a measure on $X$, namely to apply $P\delta_X$. In your example, this gives $\tfrac{1}{2}\delta_{\delta_{\mathrm{heads}}} + \tfrac{1}{2}\delta_{\delta_{\mathrm{tails}}}$. So I think there is some discrepancy between Matteo's requirement of the measure on $PX$ integrating to the given measure on $X$ and the intuitive story you've given. But perhaps I'm misunderstanding something.
JR said:
Isn't this Bayesian statistics? Perhaps look at https://en.wikipedia.org/wiki/Dirichlet_distribution
Oh yes, great point! So perhaps my earlier claim about there being no other natural transformation than $\delta_{PX}$ and $P\delta_X$ was premature. I don't really know this stuff, but the relevant paper is Dirichlet is natural.
Tobias Fritz said:
John Baez said:
And I think it's philosophically very important, if true, that there's generally no "natural" response other than this bull-headed one. (I don't really know a theorem to this effect, but there should be some.)
Well, as I said there is another natural construction of a probability measure on $PX$ from a measure on $X$, namely to apply $P\delta_X$. In your example, this gives $\tfrac{1}{2}\delta_{\delta_{\mathrm{heads}}} + \tfrac{1}{2}\delta_{\delta_{\mathrm{tails}}}$.
I'm confused about what this second construction does in general, but in this particular case it seems to be that you say "the probability of the coin landing heads up is $1/2$", and someone says "that could be wrong, please give me a probability measure on probability measures" and you say "okay, with 50% chance it's landing heads up with 100% probability and with 50% chance it's landing tails up with 100% probability". Which is a nice comeback. :smirk:
But once you have two constructions you can also take mixtures.
John Baez said:
I'm confused about what this second construction does in general, but in this particular case it seems to be that you say "the probability of the coin landing heads up is $1/2$", and someone says "that could be wrong, please give me a probability measure on probability measures" and you say "okay, with 50% chance it's landing heads up with 100% probability and with 50% chance it's landing tails up with 100% probability".
That's a good description of what it does in general! You just need to replace the outcomes and the 50% numbers by the outcomes and weights of whatever probability measure you start with.
But once you have two constructions you can also take mixtures.
Right, I noticed that too, but then realized that my earlier claim seems to be invalidated already by the Dirichlet distributions that @JR mentioned, and these are more important in Bayesian statistics.
You just need to replace the outcomes and the 50% numbers by the outcomes and weights of whatever probability measure you start with.
Okay, right.
I don't understand the Dirichlet stuff, but I vaguely gather from the Wikipedia article that at least when $X$ is finite, so $PX$ is a finite-dimensional simplex, there are wads of "systematic" ways to get a probability distribution on $PX$ from a point in $PX$. Then the paper you cited seems to be boosting these up to the case where $X$ can be infinite.
Yes, but the main reason to link to that paper is that they prove naturality of the Dirichlet distributions, starting with the case of finite $X$:
image.png
Here, $M$ is the functor of nonnegative measures, so $P$ is the subfunctor of normalized (=probability) measures.
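For finite $X$, the naturality statement includes the classical aggregation property of the Dirichlet distribution: pushing a Dirichlet sample forward along $f : X \to Y$ is distributed as a Dirichlet sample for the pushforward parameter. A quick Monte Carlo sanity check of this (our illustration, assuming NumPy, not code from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = np.array([1.0, 2.0, 3.0])  # a nonnegative measure on X = {0, 1, 2}
n = 200_000

# f : X -> Y = {0, 1} with f(0) = f(1) = 0 and f(2) = 1
samples = rng.dirichlet(alpha, size=n)
pushed = np.stack([samples[:, 0] + samples[:, 1], samples[:, 2]], axis=1)

# Directly sample Dirichlet with the pushforward parameter M(f)(alpha) = (3, 3)
direct = rng.dirichlet(np.array([alpha[0] + alpha[1], alpha[2]]), size=n)

print(pushed.mean(axis=0), direct.mean(axis=0))  # both ~ [0.5, 0.5]
print(pushed.var(axis=0), direct.var(axis=0))    # both ~ 0.036
```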
John Baez said:
You just need to replace the outcomes and the 50% numbers by the outcomes and weights of whatever probability measure you start with.
Okay, right.
I don't understand the Dirichlet stuff, but I vaguely gather from the Wikipedia article that at least when $X$ is finite, so $PX$ is a finite-dimensional simplex, there are wads of "systematic" ways to get a probability distribution on $PX$ from a point in $PX$. Then the paper you cited seems to be boosting these up to the case where $X$ can be infinite.
In the infinite case, the Dirichlet distribution is also called the Dirichlet process, so the relevant Wikipedia link is: https://en.m.wikipedia.org/wiki/Dirichlet_process
It gives several nice interpretations, one having to do with East Asian cuisine, so while potentially spicy, I think this should make it quite intuitive.
It's important to note that Dirichlet gives a family of probability distributions concentrated at your original one, parameterized by a positive scaling factor which says how concentrated, i.e., how sure you are that your original i.i.d. distribution guess was close.
That's very nice - in my terminology it would say how bull-headed you want to be.
I will study this stuff. I'll never be an expert on probability and statistics, but it's fun in bite-sized portions, and that "Dirichlet distribution" article looks enticing.
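As a quick numerical illustration of that scaling factor (a sketch of ours for the finite case, assuming NumPy): for the Dirichlet distribution with parameter $c \cdot \mu$, each coordinate has variance $\mu_i(1-\mu_i)/(c+1)$, so samples concentrate around $\mu$ as $c$ grows.

```python
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([0.5, 0.5])  # the original guess
for c in [1.0, 10.0, 1000.0]:
    samples = rng.dirichlet(c * mu, size=100_000)
    # standard deviation of the heads-weight shrinks roughly like 1/sqrt(c)
    print(c, samples[:, 0].std())
```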
I haven't followed the details of the discussion, but when I used to think about this area, all the talk was around Edwin Jaynes and his ideas on selecting the maximum entropy distribution with respect to the information of a problem, articulated at length in 'Probability Theory: The Logic of Science', particularly this part of the book:
So, say, you'd look to give an ignorance prior on the bias of a coin.
Has any of this been taken up in category-theoretic probability theory?
I'm not sure how closely related active inference is to maximum entropy methods, but the two are at least similar in flavour, and thus Active Inference in String Diagrams: A Categorical Account of Predictive Processing and Free Energy may be interesting to know about.
Also @Paolo Perrone has worked on Markov Categories and Entropy and may be able to say more.
In case this is relevant: the set $PPX$ is naturally equipped with the structure of a partial order under partial evaluation. $P\delta_X(\mu)$ and $\delta_\mu$ are the top and bottom elements in this partial order, and this order indeed comes up in Bayesian inference (as "Blackwell's order"). This is part of an active line of research; see the last paper on the topic as well as the references therein.
Also something that could be relevant: one of the most natural ways to come up with elements of $PPX$ is via de Finetti's theorem.
The theorem says that $PPX$ is in (natural, functorial, etc.) bijection with exchangeable sequences, which form the subspace of $P(X^{\mathbb{N}})$ of those measures on $X^{\mathbb{N}}$ which are invariant under finite permutations.
Now, given ,
The Dirichlet process is exchangeable, and so we can apply de Finetti, obtaining once again an element of $PPX$. A way to see this explicitly is via the stick-breaking picture.
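For reference, here is a small truncated stick-breaking sampler for the Dirichlet process $\mathrm{DP}(c, \mu)$. This is our sketch with hypothetical names, assuming NumPy: weights come from repeatedly breaking off $\mathrm{Beta}(1, c)$ fractions of a unit stick, and atom locations are drawn i.i.d. from the base measure, independently of the weights.

```python
import numpy as np

rng = np.random.default_rng(2)

def stick_breaking_dp(c, base_sampler, n_atoms=1000):
    """One truncated sample from DP(c, mu): a random discrete distribution
    sum_k w_k * delta_{atoms[k]} on the underlying space."""
    betas = rng.beta(1.0, c, size=n_atoms)  # fraction broken off at each step
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - betas[:-1])])
    weights = betas * remaining             # w_k = beta_k * prod_{j<k} (1 - beta_j)
    atoms = base_sampler(n_atoms)           # i.i.d. from the base measure
    return atoms, weights

atoms, weights = stick_breaking_dp(c=5.0, base_sampler=lambda n: rng.normal(size=n))
print(weights.sum())  # close to 1 for a long enough truncation
```

Sampling a discrete $p$ this way and then drawing $x_1, x_2, \dots$ i.i.d. from $p$ yields exactly the exchangeable sequence that de Finetti's bijection attaches to the Dirichlet process.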
By the way, since people in this field use names such as "Indian buffet process" and "Chinese restaurant process", as an Italian I feel left out - I would suggest we call the "stick breaking process" the "pizza slicing process" instead. (Or at least "grissini sharing".)
Don't you have enough mathematical food allusions already?
image.png
When will steak and kidney pie appear?
British food is fairly low on the hierarchy.
That response can only mean that you've never had a good steak and kidney pie. That said, having become a vegetarian I will likely not have one again.
Victor Blanchi and Hugo Paquet have a nice characterization of the natural transformations G->GG: https://popl23.sigplan.org/details/lafi-2023-papers/6/Random-probability-distributions-as-natural-transformations
Thanks Alex, we also tried to be a bit more precise in Section 8 of this paper: https://arxiv.org/abs/2405.17595.
The idea is roughly that the Dirichlet process is natural because the atom weights are sampled independently (in the probabilistic sense) from the atom locations. In this case, the weights are generated by stick-breaking, but you will still get a natural transformation if you use any other distribution on the space of families of weights that sum to $1$. (We called these "element-free distributions".)
There is quite a bit of work on random discrete distributions (by Kingman and Pitman and probably many others in probability theory).
I don't know much about random continuous distributions. Naturality is quite strong: we showed that the continuous part of a natural random distribution must coincide with the base measure after renormalizing. (By "base measure" I mean the input to the function $PX \to PPX$.)
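That "element-free" recipe can be paraphrased in code (our sketch with made-up names, not the paper's API): sample a weight sequence with no reference to the underlying space, then attach i.i.d. atoms from the base measure. Pushing forward along any $f : X \to Y$ only relabels atoms and never touches weights, which is the intuition for naturality.

```python
import numpy as np

rng = np.random.default_rng(3)

def element_free_random_measure(weight_sampler, base_sampler, n_atoms=1000):
    """A random discrete distribution whose weights are sampled
    independently of its atom locations."""
    weights = weight_sampler(n_atoms)  # any distribution on weight sequences summing to 1
    atoms = base_sampler(n_atoms)      # i.i.d. from the base measure
    return atoms, weights

# e.g. flat Dirichlet weights over a uniform base measure on [0, 1]
flat_weights = lambda n: rng.dirichlet(np.ones(n))
uniform_base = lambda n: rng.uniform(size=n)
atoms, weights = element_free_random_measure(flat_weights, uniform_base)
print(weights.sum())  # exactly 1 by construction
```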
Matteo Capucci (he/him) said:
Take the expectation of $K$ to get a single point of $PPX$, i.e. a probability measure on $PX$
Maybe I missed something in the discussion but isn't the expectation of $K$ exactly $\mu$? For random distributions viewed as random variables the expectation is pointwise: $\mathbb{E}[\xi](A) = \mathbb{E}[\xi(A)]$.
Yes, but $K$ is a set.
(But yes, in principle Hugo is correct: all the measures that are natural are going to have the same expectation, exactly $\mu$.)
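That pointwise expectation claim is easy to sanity-check numerically on a finite space, where the Dirichlet process reduces to an ordinary Dirichlet distribution (our illustration, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(4)
mu = np.array([0.2, 0.3, 0.5])  # base measure on a 3-point space
c = 2.0
# On a finite space the Dirichlet process is just Dir(c * mu)
samples = rng.dirichlet(c * mu, size=200_000)
print(samples.mean(axis=0))     # ~ [0.2, 0.3, 0.5]: the expectation is mu, pointwise
```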