Category Theory
Zulip Server
Archive

You're reading the public-facing archive of the Category Theory Zulip server.
To join the server you need an invite. Anybody can get an invite by contacting Matteo Capucci at name dot surname at gmail dot com.
For all things related to this archive refer to the same person.


Stream: theory: categorical probability

Topic: Universal property of the Giry monad


view this post on Zulip Nathaniel Virgo (Feb 04 2025 at 13:52):

This is idle wondering and I don't really expect the answer to be known, but maybe it will be an interesting conversation.

Chen's A universal characterisation of the category of standard Borel spaces characterises the category of standard Borel spaces as the universal one with certain nice properties. This is great because it means not only that we know a bunch of nice properties that this category has, but also that those are in some sense its only relevant ones when it comes to proving things about it.

However, that paper doesn't really mention Markov kernels or monads at all, so it doesn't immediately let us talk about probability.

So I'm wondering to what it extent it might be possible to construct the Giry monad, or BorelStoch\bf BorelStoch, in a similarly abstract way.

In other words, if someone hands me a category and tells me that it's the initial countably complete countably extensive category, how can I use that information to construct the notions of measure and Markov kernel?

view this post on Zulip Benedikt Peterseim (Feb 04 2025 at 14:10):

This is a very interesting question! I hadn’t seen that paper before.

view this post on Zulip Benedikt Peterseim (Feb 04 2025 at 14:15):

Of course, there are constructions of probability monads as Codensity monads, but they presuppose that you have the unit interval in your category already. On the other hand, there is also work on universal characterisations of the unit interval, so maybe using one of these, one might actually obtain a universal property for BorelStoch. This seems far from straightforward, though.

view this post on Zulip Tobias Fritz (Feb 04 2025 at 15:12):

The following is pretty speculative, but I think a characterization like the following could be true and conceptually meaningful:

Conjecture: BorelStoch\mathsf{BorelStoch} is the free Markov category with countable Kolmogorov products generated by the Markov category FinStoch\mathsf{FinStoch}.

For the morphisms, I imagine using strong symmetric monoidal functors presering copy. This would be in a bit of a different direction than Chen's result though.

view this post on Zulip Tobias Fritz (Feb 04 2025 at 15:16):

More along the lines of Chen's paper, perhaps one could consider something like the free representable Markov category that is countably distributive?

view this post on Zulip Tobias Fritz (Feb 04 2025 at 16:27):

Perhaps worth mentioning is also the idea of free affine commutative monads, which my student @Victor Bloch is working on. Similar to how the free monad generated by a binary operation is the monad of binary trees, Victor has constructed the free affine commutative monad generated by a binary operation on Set, and we seem to be able to show that it embeds into the distribution monad in a way that is topologically dense. (Very roughly, think of approximating a distribution by repeated coin flips.)

Moreover, we conjecture that the Radon monad, which is the standard probability monad on the category of compact Hausdorff spaces, literally is the free affine commutative monad generated by a binary operation. But so far we don't know how to approach this problem.

Binary operations arguably aren't special in this story, and I'd expect everything to work the same if one starts with an operation of any finite arity.

view this post on Zulip Benedikt Peterseim (Feb 04 2025 at 16:38):

Tobias Fritz said:

Perhaps worth mentioning is also the idea of free affine commutative monads, which my student Victor Bloch is working on. Similar to how the free monad generated by a binary operation is the monad of binary trees, Victor has constructed the free affine commutative monad generated by a binary operation on Set, and we seem to be able to show that it embeds into the distribution monad in a way that is topologically dense. (Very roughly, think of approximating a distribution by repeated coin flipping.)

This is interesting—so you get a construction of the dyadic rational unit interval as M(2) entirely for free? (where M is this free commutative affine monad) Also, if the conjecture about the Radon monad is true, I suppose that would give another (very natural, in my opinion) universal chracterisation of the unit interval?

view this post on Zulip Tobias Fritz (Feb 04 2025 at 16:41):

It's actually not the dyadic interval -- it's all those numbers that you can get by repeatedly flipping a biased coin with transcendental bias!

view this post on Zulip Benedikt Peterseim (Feb 04 2025 at 16:41):

By the way, are you using midpoint algebras (https://ncatlab.org/nlab/show/midpoint+algebra) or variations of this in any way? I think there are characterizations of the unit interval that use this iirc, so I’m thinking it might be relevant to the conjecture regarding the Radon monad.

view this post on Zulip Benedikt Peterseim (Feb 04 2025 at 16:41):

Tobias Fritz said:

It's actually not the dyadic interval -- it's all those numbers that you can get by repeatedly flipping a biased coin with transcendental bias!

Oh, I’m surprised and confused. :)

view this post on Zulip Tobias Fritz (Feb 04 2025 at 16:41):

The reason is that the defining operation of a midpoint algebra is commutative, so it cannot be the generating operation in a free monad, and not even in a free affine commutative monad :sweat_smile:

view this post on Zulip Tobias Fritz (Feb 04 2025 at 16:42):

This is clear once you note that not all binary operations in a commutative monad are commutative.

view this post on Zulip Benedikt Peterseim (Feb 04 2025 at 16:43):

Ah, I think I’m starting to see this, okay. :)

view this post on Zulip Tobias Fritz (Feb 04 2025 at 16:45):

The difficult logical part of this is from On convex linear forms, a 1974 paper by Fajtlowicz and Mycielski, who show that the theory of binary convex combinations with a transcendental coefficient is axiomatized by two equations:
image.png
Intuitively, these two equations say precisely that the monad is affine resp. commutative.

view this post on Zulip Tobias Fritz (Feb 04 2025 at 16:46):

A good way to understand the second equation is to note that it amounts to the operation distributing over itself, or what we would call commuting with itself. Like how in a commutative monad any two operations commute.

view this post on Zulip Tobias Fritz (Feb 04 2025 at 16:50):

So these algebraic structures are midpoint algebras without the commutativity. They seem to be called modes.

view this post on Zulip Tobias Fritz (Feb 04 2025 at 16:52):

Maybe you'd have some idea for how to approach the case of compact Hausdorff spaces @Benedikt Peterseim ? Happy to talk more in case you're interested.

view this post on Zulip Benedikt Peterseim (Feb 04 2025 at 18:05):

Not any specific idea right now, but it does seem of a similar flavour to some things I’ve thought about before. Would definitely be interested in some more details and in sharing some thoughts; feel free to DM me.

view this post on Zulip Morgan Rogers (he/him) (Feb 05 2025 at 09:14):

Benedikt Peterseim said:

Of course, there are constructions of probability monads as Codensity monads, but they presuppose that you have the unit interval in your category already. On the other hand, there is also work on universal characterisations of the unit interval, so maybe using one of these, one might actually obtain a universal property for BorelStoch.

You don't need a universal characterization of the unit interval, it's enough to have a construction of the unit interval using the limits and colimits available in the characterization of BorelStoch. The subtlety is just that while there is a unique uncountable measure space, there are non-equivalent ways of equipping it with this structure. We already have the trivial (two element) interval. Constructing the diadic interval is easy enough. Fortunately the operations involved in the standard unit interval are continuous (except division at 0, which is still okay) and hence measurable, so this should be possible.

view this post on Zulip Morgan Rogers (he/him) (Feb 05 2025 at 09:15):

A consequence would be that you can do probability theory over any countably complete Boolean countably lextensive category with the same suite of interval objects.

view this post on Zulip Nathaniel Virgo (Feb 08 2025 at 19:55):

Morgan Rogers (he/him) said:

A consequence would be that you can do probability theory over any countably complete Boolean countably lextensive category with the same suite of interval objects.

I'm kind of confused about this point actually. Assuming we can construct the Giry monad from the limits and colimits in the characterisation of StdBorel, however we do it, shouldn't that mean we can do it in any countably complete Boolean countably extensive category? Since Set is such a category (right?), this seems to imply we'd get for free a monad on Set that inherits all the nice properties of the Giry monad. That would be great if it's true, but it sounds like too much to hope for. Presumably something would prevent that - do you have an idea what it would be?

view this post on Zulip Nathaniel Virgo (Feb 08 2025 at 22:00):

With that caveat in mind, here's another conjecture about how to answer this version of my question:

Nathaniel Virgo said:

In other words, if someone hands me a category and tells me that it's the initial countably complete Boolean countably extensive category, how can I use that information to construct the notions of measure and Markov kernel?

Take an object XX and construct XNX^\mathbb{N} as a countable product. Consider the group of finite permutations of N\mathbb{N} and its action on XNX^\mathbb{N}. I think we have enough structure to construct the object of orbits of this group action. I conjecture that this is P(X)P(X), the Giry monad applied to XX.

This is because of de Finetti's theorem. Whatever the quotient of this group action is, measures over it ought to be basically the same as exchangeable measures over XNX^\mathbb{N}, which are the same as measures over P(X)P(X). So in particular, elements of it should just be elements of P(X)P(X). (Since over standard Borel spaces an element is the same as a measure concentrated on that element.)

If that's right then one can probably get the monad structure by considering a bijection between sequences of sequences of elements of XX and sequences of elements of XX.

(Although maybe there is some weird measure-theoretic reason why this doesn't work. Usually when I make a guess about measure theory it turns out to be wrong.)

view this post on Zulip Kevin Carlson (Feb 09 2025 at 07:32):

Doesn't the Giry monad restrict to a monad on Set along the inclusion of sets as discrete measurable spaces?

view this post on Zulip John Baez (Feb 09 2025 at 08:02):

I don't think so: the Giry monad maps a measurable space to the measurable space of probability measures on it, but the measurable space of probability measures on a discrete measurable space is not itself discrete, e.g. the probability measure space on 2 is isomorphic to [0,1] with its usual Borel sigma-algebra, which is different than the sigma-algebra of the discrete measurable space [0,1] (namely the sigma-algebra of all subsets of [0,1]).

view this post on Zulip Kevin Carlson (Feb 09 2025 at 08:52):

Ah, right.

view this post on Zulip Morgan Rogers (he/him) (Feb 09 2025 at 17:21):

Nathaniel Virgo said:

Morgan Rogers (he/him) said:

A consequence would be that you can do probability theory over any countably complete Boolean countably lextensive category with the same suite of interval objects.

I'm kind of confused about this point actually. ...shouldn't that mean we can do it in any countably complete Boolean countably extensive category? Since Set is such a category (right?), this seems to imply we'd get for free a monad on Set that inherits all the nice properties of the Giry monad.

I expect that the confusion comes from assuming that the monad thus constructed will inherit the properties of the Giry monad? We could figure this out explicitly, though: what is an example of a nice property that the Giry monad has that you expect to fail when the construction is transported to Set?

view this post on Zulip Nathaniel Virgo (Feb 10 2025 at 11:54):

If I try to be more precise I suppose that one of the following things must happen: either the monad needs initiality of StdBorel to construct it (hence you can't construct the same monad on Set), or some of its nice properties need initiality of StdBorel to prove them (hence its counterpart on Set doesn't have those properties), or the Set version does inherit all the nice properties but becomes trivial in some way, or at least isn't useful for probability in the same way that BorelStoch is. Or none of the above, which is the best possible outcome but seems unlikely. Or something else and I'm more confused than I thought, which is quite possible.

Specifically, the properties I'm thinking of are the following, which I regularly use to reason about its Kleisli category, BorelStoch:

These are basically the assumptions used in Fritz+Gonda+Perrone's abstract proof of de Finetti's theorem.

view this post on Zulip Tobias Fritz (Feb 11 2025 at 15:31):

Nathaniel Virgo said:

Take an object XX and construct XNX^\mathbb{N} as a countable product. Consider the group of finite permutations of N\mathbb{N} and its action on XNX^\mathbb{N}. I think we have enough structure to construct the object of orbits of this group action. I conjecture that this is P(X)P(X), the Giry monad applied to XX.

This is similar in spirit to what we're currently working on with the law of large numbers. This is at a sufficiently advanced stage that we could possibly share a draft if you're interested.

What I imagine that you have in mind is to consider a map XNPXX^{\mathbb{N}} \to PX which takes every sequence to the associated "empirical measure" defined by the relative frequencies of the elements in the sequence. The first problem now is that this doesn't make sense in general, because many sequences don't have well-defined relative frequencies. Therefore the transformation can at most be partially defined. Another issue is that you can have two sequences for which the relative frequencies do converge, and even be identical so that you get the same element of PXPX, even if the sequences do not differ by a finite permutation. (They don't even necessarily differ by an infinite permutation, because one may contain elements with relative frequency zero!) So although a partial transformation exists, there's no simple characterization of when two elements of its domain get mapped to the same element of the codomain.

(There are further analytical subtleties that we've had to struggle with, which is part of why it's taken us a few years to get to the point where we have a draft.)

view this post on Zulip Tobias Fritz (Feb 11 2025 at 15:31):

With that being said, I think that your construction could nevertheless give an affine commutative monad on Meas or BorelMeas, so that its Kleisli category is going to be a Markov category. Maybe this Markov category is of some interest even if it's not Stoch? For one thing, this construction clearly works on Set too.

view this post on Zulip Tobias Fritz (Feb 11 2025 at 15:36):

Nathaniel Virgo said:

This is because of de Finetti's theorem. Whatever the quotient of this group action is, measures over it ought to be basically the same as exchangeable measures over XNX^\mathbb{N}, which are the same as measures over P(X)P(X). So in particular, elements of it should just be elements of P(X)P(X). (Since over standard Borel spaces an element is the same as a measure concentrated on that element.)

This indeed gives a construction of PX, but it's different in two ways from the quotient that you've described:

view this post on Zulip Tobias Fritz (Feb 11 2025 at 15:40):

So although we can indeed characterize the Giry monad like this, doing so requires starting with the Markov category already. I still think that this is interesting, and this is what we're doing with the law of large numbers: we assume that we have a Markov category with additional structure that makes it have a law of large numbers, and then we derive representability from that.

view this post on Zulip Tobias Fritz (Feb 11 2025 at 15:41):

Conceptually, I think it's good practice to think of Markov categories as the primary structures and the monads as derived rather than the other way around.

view this post on Zulip Nathaniel Virgo (Feb 11 2025 at 18:31):

Ah, of course, that all makes complete sense. I know all these facts very well, I had just not realised I was ignoring them!

Still, I wonder if there's a way to fix it. It would be kind of neat to turn de Finetti's theorem into the definition of a probability measure, if it can be done.

I agree with the general principle of seeing Markov categories rather than monads as primary. It's just that it seems the work of characterising the deterministic subcategory of BorelStoch has already been done, so I was speculating about ways to get from there to an abstract characterisation of BorelStoch itself.