Given a probability space $(X, \mu)$, I'm trying to define a probability measure on $PX$, the measurable space given by the Giry monad on $X$. That is, I'm trying to construct an element of $PPX$.
My idea is the following:
Has this been done before?
If $X$ is a 2-point space with the probability measure that assigns $1/2$ to each point, what probability measure does your construction put on the space of probability measures on $X$, which is $[0,1]$?
Wild guess: the delta measure at $1/2$.
Good question... That looks like a daunting computation, but let me try.
So the evaluation map is simply the inclusion of $PX \cong [0,1]$ in $\mathbb{R}$. So given $\xi \in PPX$, the integral $\int \mathrm{ev}\,d\xi$ is the expectation of $\xi$. Conversely, $K$ is $E^{-1}(1/2)$ and thus we get the set of probability measures on $[0,1]$ with barycenter $1/2$.
Now what is the barycenter of this set? Intuitively, I expect it to be $\delta_{1/2}$, as @John Baez predicted, by considerations of symmetry, but my convex analysis is rusty so I don't know how to conclude this!
It'd also seem the way one obtains it should give a rather explicit formula to compute the result of applying it to sets of probability measures, but I'm lost in convex analysis there too
Matteo Capucci (he/him) said:
Intuitively, I expect it to be $\delta_{1/2}$, as @John Baez predicted, by considerations of symmetry, but my convex analysis is rusty so I don't know how to conclude this!
Uhm actually the uniform measure on $[0,1]$ is also a strong contender... maybe even stronger?
I think you'd need to specify a measure on $K$ in order to take the expectation. In other words, what exactly do you mean by "barycenter"?
Indeed, at least in this case, it seems the support of the barycenter is the supremum of the supports of each element of $K$ (because taking convex combinations moves stuff away from $0$), so it cannot be $\delta_{1/2}$
Tobias Fritz said:
I think you'd need to specify a measure on $K$ in order to take the expectation. In other words, what exactly do you mean by "barycenter"?
I need only for $PPX$ to have a convex structure and for $K$ to be convex, don't I?
I don't know, what is your definition of barycenter?
The expectation, as in the result of applying the convex algebra structure map $E : P(PPX) \to PPX$
You mean the $P$-algebra map? For which algebra $A$, and applied to which element of $PA$?
If you mean to take $E : P(PPX) \to PPX$, then you still need to specify an element of $P(PPX)$.
yes. $E$, applied to $K$
But $K$ is not an element of $P(PPX)$
why not?
$K$ is a set, not a probability measure
uhm ok it's not a probability measure
I got confused by convexity
Matteo Capucci (he/him) said:
Given a probability space $(X, \mu)$, I'm trying to define a probability measure on $PX$, the measurable space given by the Giry monad on $X$. That is, I'm trying to construct an element of $PPX$.
Do you know if this can be done, though, @Tobias Fritz?
Here's one way to do it: given any probability space $(X, \mu)$ we get a probability measure on $PX$ which is the delta measure at $\mu$.
It seems like at least the spaces $PX$ come with a canonical probability measure
John Baez said:
Here's one way to do it: given any probability space $(X, \mu)$ we get a probability measure on $PX$ which is the delta measure at $\mu$.
well that makes sense
I'm sure that there is no canonical construction, in the sense of a map $PX \to PPX$ that would be defined for every standard Borel measurable space $X$ (or even just finite $X$) and be natural in $X$
not even when $X$ comes with its own probability measure?
what's the catch?
So the only natural things that you can do are along the lines of what John is suggesting, which is to use the monad unit $\delta$ to get $\delta_{PX} : PX \to PPX$ and $P\delta_X : PX \to PPX$.
I'm pretty sure that there are no other natural transformations $P \Rightarrow PP$, although I wouldn't know how to prove it offhand :sweat_smile: Maybe one of the other categorical probability folks here could do this.
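For concreteness, here is a minimal sketch of these two constructions for finite $X$, representing a distribution as a Python dict from outcomes to weights. This is our own illustration, not code from the thread, and the names `delta`, `delta_P`, and `P_delta` are made up for it.

```python
def delta(x):
    """Monad unit: the Dirac distribution at x."""
    return {x: 1.0}

def freeze(p):
    """Make a distribution hashable, so it can itself be an outcome."""
    return frozenset(p.items())

def delta_P(p):
    """delta_{PX} : PX -> PPX, the Dirac measure at p itself."""
    return {freeze(p): 1.0}

def P_delta(p):
    """P(delta_X) : PX -> PPX, the pushforward of p along x |-> delta(x)."""
    return {freeze(delta(x)): w for x, w in p.items()}

fair_coin = {"heads": 0.5, "tails": 0.5}
print(delta_P(fair_coin))  # 100% sure the distribution is the fair coin
print(P_delta(fair_coin))  # 50% "surely heads", 50% "surely tails"
```

On the fair coin, `delta_P` gives the "bull-headed" answer discussed below, and `P_delta` gives the second construction that comes up shortly.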
Matteo was hoping for a god-like power of this sort: your knowledge is captured by some probability measure, and someone tells you "well, that probability measure could be wrong" and you bounce back saying "true, but based on what I know I can automatically cook up a probability measure on probability measures, to help guess how likely other probability measures are".
And my construction amounts to the bull-headed approach where you say "actually, based on what I know, my probability distribution has a 100% chance of being correct".
And I think it's philosophically very important, if true, that there's generally no "natural" response other than this bull-headed one. (I don't really know a theorem to this effect, but there should be some.)
John Baez said:
Matteo was hoping for a god-like power of this sort: your knowledge is captured by some probability measure, and someone tells you "well, that probability measure could be wrong" and you bounce back saying "true, but based on what I know I can automatically cook up a probability measure on probability measures, to help guess how likely other probability measures are".
Isn't this Bayesian statistics? Perhaps look at https://en.wikipedia.org/wiki/Dirichlet_distribution
John Baez said:
And I think it's philosophically very important, if true, that there's generally no "natural" response other than this bull-headed one. (I don't really know a theorem to this effect, but there should be some.)
Well, as I said there is another natural construction of a probability measure on $PX$ from a measure on $X$, namely to apply $P\delta_X$. In your example, this gives $\tfrac{1}{2}\delta_{\delta_{\mathrm{heads}}} + \tfrac{1}{2}\delta_{\delta_{\mathrm{tails}}}$. So I think there is some discrepancy between Matteo's requirement of the measure on $PX$ integrating to the given measure on $X$ and the intuitive story you've given. But perhaps I'm misunderstanding something.
JR said:
Isn't this Bayesian statistics? Perhaps look at https://en.wikipedia.org/wiki/Dirichlet_distribution
Oh yes, great point! So perhaps my earlier claim about there being no other natural transformation than $\delta_{PX}$ and $P\delta_X$ was premature. I don't really know this stuff, but the relevant paper is Dirichlet is natural.
Tobias Fritz said:
John Baez said:
And I think it's philosophically very important, if true, that there's generally no "natural" response other than this bull-headed one. (I don't really know a theorem to this effect, but there should be some.)
Well, as I said there is another natural construction of a probability measure on $PX$ from a measure on $X$, namely to apply $P\delta_X$. In your example, this gives $\tfrac{1}{2}\delta_{\delta_{\mathrm{heads}}} + \tfrac{1}{2}\delta_{\delta_{\mathrm{tails}}}$.
I'm confused about what this second construction does in general, but in this particular case it seems to be that you say "the probability of the coin landing heads up is $1/2$", and someone says "that could be wrong, please give me a probability measure on probability measures" and you say "okay, with 50% chance it's landing heads up with 100% probability and with 50% chance it's landing tails up with 100% probability". Which is a nice comeback. :smirk:
But once you have two constructions you can also take mixtures.
John Baez said:
I'm confused about what this second construction does in general, but in this particular case it seems to be that you say "the probability of the coin landing heads up is $1/2$", and someone says "that could be wrong, please give me a probability measure on probability measures" and you say "okay, with 50% chance it's landing heads up with 100% probability and with 50% chance it's landing tails up with 100% probability".
That's a good description of what it does in general! You just need to replace the outcomes and the 50% numbers by the outcomes and weights of whatever probability measure you start with.
But once you have two constructions you can also take mixtures.
Right, I noticed that too, but then realized that my earlier claim seems to be invalidated already by the Dirichlet distributions that @JR mentioned, and these are more important in Bayesian statistics.
You just need to replace the outcomes and the 50% numbers by the outcomes and weights of whatever probability measure you start with.
Okay, right.
I don't understand the Dirichlet stuff, but I vaguely gather from the Wikipedia article that at least when $X$ is finite, so $PX$ is a finite-dimensional simplex, there are wads of "systematic" ways to get a probability distribution on $PX$ from a point in $PX$. Then the paper you cited seems to be boosting these up to the case where $X$ can be infinite.
Yes, but the main reason to link to that paper is that they prove naturality of the Dirichlet distributions, starting with the case of finite $X$:
image.png
Here, $M$ is the functor of nonnegative measures, so $P$ is the subfunctor of normalized (=probability) measures.
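For finite $X$, the naturality statement includes the classical aggregation property of the Dirichlet distribution: pushing a Dirichlet sample forward along $f : X \to Y$ is distributed as a Dirichlet sample for the pushforward parameter. A quick Monte Carlo sanity check of this (our illustration, assuming NumPy, not code from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = np.array([1.0, 2.0, 3.0])  # a nonnegative measure on X = {0, 1, 2}
n = 200_000

# f : X -> Y = {0, 1} with f(0) = f(1) = 0 and f(2) = 1
samples = rng.dirichlet(alpha, size=n)
pushed = np.stack([samples[:, 0] + samples[:, 1], samples[:, 2]], axis=1)

# Directly sample Dirichlet with the pushforward parameter M(f)(alpha) = (3, 3)
direct = rng.dirichlet(np.array([alpha[0] + alpha[1], alpha[2]]), size=n)

print(pushed.mean(axis=0), direct.mean(axis=0))  # both ~ [0.5, 0.5]
print(pushed.var(axis=0), direct.var(axis=0))    # both ~ 0.036
```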
John Baez said:
You just need to replace the outcomes and the 50% numbers by the outcomes and weights of whatever probability measure you start with.
Okay, right.
I don't understand the Dirichlet stuff, but I vaguely gather from the Wikipedia article that at least when $X$ is finite, so $PX$ is a finite-dimensional simplex, there are wads of "systematic" ways to get a probability distribution on $PX$ from a point in $PX$. Then the paper you cited seems to be boosting these up to the case where $X$ can be infinite.
In the infinite case, the Dirichlet distribution is also called the Dirichlet process, so the relevant Wikipedia link is: https://en.m.wikipedia.org/wiki/Dirichlet_process
It gives several nice interpretations, one having to do with East Asian cuisine, so while potentially spicy, I think this should make it quite intuitive.
It's important to note that Dirichlet gives a family of probability distributions concentrated at your original one, parameterized by a positive scaling factor which says how concentrated, i.e., how sure you are that your original i.i.d. distribution guess was close.
That's very nice - in my terminology it would say how bull-headed you want to be.
I will study this stuff. I'll never be an expert on probability and statistics, but it's fun in bite-sized portions, and that "Dirichlet distribution" article looks enticing.
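As a quick numerical illustration of that scaling factor (a sketch of ours for the finite case, assuming NumPy): for the Dirichlet distribution with parameter $c \cdot \mu$, each coordinate has variance $\mu_i(1-\mu_i)/(c+1)$, so samples concentrate around $\mu$ as $c$ grows.

```python
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([0.5, 0.5])  # the original guess
for c in [1.0, 10.0, 1000.0]:
    samples = rng.dirichlet(c * mu, size=100_000)
    # standard deviation of the heads-weight shrinks roughly like 1/sqrt(c)
    print(c, samples[:, 0].std())
```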
I haven't followed the details of the discussion, but when I used to think about this area, all the talk was around Edwin Jaynes and his ideas on selecting the maximum entropy distribution with respect to the information of a problem, articulated at length in 'Probability Theory: The Logic of Science', particularly this part of the book:
So, say, you'd look to give an ignorance prior on the bias of a coin.
Has any of this been taken up in category-theoretic probability theory?
I'm not sure how closely related active inference is to maximum entropy methods, but the two are at least similar in flavour, and thus Active Inference in String Diagrams: A Categorical Account of Predictive Processing and Free Energy may be interesting to know about.
Also @Paolo Perrone has worked on Markov Categories and Entropy and may be able to say more.
In case this is relevant: the set $PPX$ is naturally equipped with the structure of a partial order under partial evaluation. $P\delta_X(\mu)$ and $\delta_\mu$ are the top and bottom elements in this partial order, and this order indeed comes up in Bayesian inference (as "Blackwell's order"). This is part of an active line of research; see the last paper on the topic as well as the references therein.
Also something that could be relevant: one of the most natural ways to come up with elements of $PPX$ is via de Finetti's theorem.
The theorem says that $PPX$ is in (natural, functorial, etc.) bijection with exchangeable sequences, which form the subspace of $P(X^{\mathbb{N}})$ of those measures on $X^{\mathbb{N}}$ which are invariant under finite permutations.
Now, given ,
The Dirichlet process is exchangeable, and so we can apply de Finetti, obtaining once again an element of $PPX$. A way to see this explicitly is via the stick-breaking picture.
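For reference, here is a small truncated stick-breaking sampler for the Dirichlet process $\mathrm{DP}(c, \mu)$. This is our sketch with hypothetical names, assuming NumPy: weights come from repeatedly breaking off $\mathrm{Beta}(1, c)$ fractions of a unit stick, and atom locations are drawn i.i.d. from the base measure, independently of the weights.

```python
import numpy as np

rng = np.random.default_rng(2)

def stick_breaking_dp(c, base_sampler, n_atoms=1000):
    """One truncated sample from DP(c, mu): a random discrete distribution
    sum_k w_k * delta_{atoms[k]} on the underlying space."""
    betas = rng.beta(1.0, c, size=n_atoms)  # fraction broken off at each step
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - betas[:-1])])
    weights = betas * remaining             # w_k = beta_k * prod_{j<k} (1 - beta_j)
    atoms = base_sampler(n_atoms)           # i.i.d. from the base measure
    return atoms, weights

atoms, weights = stick_breaking_dp(c=5.0, base_sampler=lambda n: rng.normal(size=n))
print(weights.sum())  # close to 1 for a long enough truncation
```

Sampling a discrete $p$ this way and then drawing $x_1, x_2, \dots$ i.i.d. from $p$ yields exactly the exchangeable sequence that de Finetti's bijection attaches to the Dirichlet process.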
By the way, since people in this field use names such as "Indian buffet process" and "Chinese restaurant process", as an Italian I feel left out - I would suggest we call the "stick breaking process" the "pizza slicing process" instead. (Or at least "grissini sharing".)
Don't you have enough mathematical food allusions already?
image.png
When will steak and kidney pie appear?
British food is fairly low on the hierarchy.
That response can only mean that you've never had a good steak and kidney pie. That said, having become a vegetarian I will likely not have one again.
Victor Blanchi and Hugo Paquet have a nice characterization of the natural transformations G->GG: https://popl23.sigplan.org/details/lafi-2023-papers/6/Random-probability-distributions-as-natural-transformations
Thanks Alex, we also tried to be a bit more precise in Section 8 of this paper: https://arxiv.org/abs/2405.17595.
The idea is roughly that the Dirichlet process is natural because the atom weights are sampled independently (in the probabilistic sense) from the atom locations. In this case, the weights are generated by stick-breaking, but you will still get a natural transformation if you use any other distribution on the space of families of weights that sum to $1$. (We called these "element-free distributions".)
There is quite a bit of work on random discrete distributions (by Kingman and Pitman and probably many others in probability theory).
I don't know much about random continuous distributions. Naturality is quite strong: we showed that the continuous part of a natural random distribution must coincide with the base measure after renormalizing. (By "base measure" I mean the input to the function $PX \to PPX$.)
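That "element-free" recipe can be paraphrased in code (our sketch with made-up names, not the paper's API): sample a weight sequence with no reference to the underlying space, then attach i.i.d. atoms from the base measure. Pushing forward along any $f : X \to Y$ only relabels atoms and never touches weights, which is the intuition for naturality.

```python
import numpy as np

rng = np.random.default_rng(3)

def element_free_random_measure(weight_sampler, base_sampler, n_atoms=1000):
    """A random discrete distribution whose weights are sampled
    independently of its atom locations."""
    weights = weight_sampler(n_atoms)  # any distribution on weight sequences summing to 1
    atoms = base_sampler(n_atoms)      # i.i.d. from the base measure
    return atoms, weights

# e.g. flat Dirichlet weights over a uniform base measure on [0, 1]
flat_weights = lambda n: rng.dirichlet(np.ones(n))
uniform_base = lambda n: rng.uniform(size=n)
atoms, weights = element_free_random_measure(flat_weights, uniform_base)
print(weights.sum())  # exactly 1 by construction
```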
Matteo Capucci (he/him) said:
Take the expectation of $K$ to get a single point of $PPX$, i.e. a probability measure on $PX$
Maybe I missed something in the discussion but isn't the expectation of $K$ exactly $\mu$? For random distributions viewed as random variables the expectation is pointwise: $\mathbb{E}[\xi](A) = \mathbb{E}[\xi(A)]$.
Yes, but $K$ is a set.
(But yes, in principle Hugo is correct: all the measures that are natural are going to have the same expectation, exactly $\mu$.)
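That pointwise expectation claim is easy to sanity-check numerically on a finite space, where the Dirichlet process reduces to an ordinary Dirichlet distribution (our illustration, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(4)
mu = np.array([0.2, 0.3, 0.5])  # base measure on a 3-point space
c = 2.0
# On a finite space the Dirichlet process is just Dir(c * mu)
samples = rng.dirichlet(c * mu, size=200_000)
print(samples.mean(axis=0))     # ~ [0.2, 0.3, 0.5]: the expectation is mu, pointwise
```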