Category Theory
Zulip Server
Archive

You're reading the public-facing archive of the Category Theory Zulip server.


Stream: theory: categorical probability

Topic: de Finetti's Theorem


view this post on Zulip Tobias Fritz (May 07 2021 at 09:13):

De Finetti's Theorem is a classical result in probability theory. @Tomáš Gonda, @Paolo Perrone and I have now proven it in completely categorical terms: as it turns out, one only needs a Markov category that has conditionals and countable Kolmogorov products, and that is the Kleisli category of a monad in a way that respects almost sure equality.

The Kleisli category of the Giry monad on standard Borel measurable spaces satisfies these assumptions, and we therefore recover the usual de Finetti Theorem in this case. Although Markov categories exist in large supply, we do not know of any nontrivial Markov category outside of measure-theoretic probability that satisfies all three relevant assumptions! So unfortunately we don't get any new de Finetti theorems yet.
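[Editor's note: in concrete terms, the theorem says that every exchangeable distribution on infinite 0/1 sequences is a mixture of iid coin flips. A minimal sketch of the "mixture" side, with a made-up two-point mixing measure; all names are illustrative:]

```python
from fractions import Fraction

# Hypothetical mixing measure: a fair coin with weight 1/2
# and a 3/4-biased coin with weight 1/2.
prior = [(Fraction(1, 2), Fraction(1, 2)),
         (Fraction(1, 2), Fraction(3, 4))]

def seq_prob(outcomes):
    """Probability of a finite 0/1 sequence under the de Finetti mixture:
    integrate the iid probability p^k (1-p)^(n-k) against the prior."""
    k, n = sum(outcomes), len(outcomes)
    return sum(w * p**k * (1 - p)**(n - k) for w, p in prior)

# Exchangeability: the probability depends only on the number of heads,
# so every permutation of a sequence gets the same weight.
assert seq_prob((1, 0, 0)) == seq_prob((0, 1, 0)) == seq_prob((0, 0, 1))
```

De Finetti's theorem is the converse direction: any exchangeable assignment `seq_prob` arises from some `prior` in this way.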

view this post on Zulip Morgan Rogers (he/him) (May 07 2021 at 10:21):

Can you at least loosen the conditions in some directions to get weakenings of the theorem?

view this post on Zulip Tobias Fritz (May 07 2021 at 10:50):

Good question! I can't think of any weakening that would still preserve the spirit of the theorem, but perhaps the others can say more.

Some of our lemmas do hold more generally, though, and we expect them to be useful beyond this setting, in particular Lemma 5.2. But we use the existence of conditionals pretty much throughout, and this seems to be the "bottleneck": most Markov categories that I can think of do not have conditionals, and assuming conditionals already seems to get us quite close to measure-theoretic probability. This is only an intuition so far, but it would be nice to make it precise at some point by providing a categorical characterization theorem of measure-theoretic probability.

view this post on Zulip Spencer Breiner (May 07 2021 at 16:37):

I don't understand this stuff very well (hopefully this paper will help!), but an obvious question, just from looking at the theorem itself, is what happens if you remove the restriction to finite permutations? Can you generalize your Kolmogorov product definition to other families of subsets (maybe an ultrafilter?)?

view this post on Zulip Spencer Breiner (May 07 2021 at 16:39):

Alternatively/additionally, I would be interested in hearing more about why the Kolmogorov product is defined the way it is. Is there something going on here with the cofinite topology?

view this post on Zulip Spencer Breiner (May 07 2021 at 16:45):

Spencer Breiner said:

(maybe an ultrafilter?)

That doesn't make sense. How about the closed sets for a finer topology?

view this post on Zulip Spencer Breiner (May 07 2021 at 16:49):

I suppose this would be a strengthening of the theorem, so the question is really whether those extra "projections" would give you something more.

view this post on Zulip Tobias Fritz (May 07 2021 at 17:31):

I'm not sure what generalization of Kolmogorov products you have in mind, in the sense of where the closed sets come in. Could you spell that out a bit? What would its universal property be?

The definition of Kolmogorov products is based on the Kolmogorov extension theorem, which states that a probability measure is uniquely determined by its finite marginals; and conversely, every compatible family of finite marginals arises in this way. So it very much has the flavour of a sheaf condition. There's a more detailed discussion of Kolmogorov products and why they're defined that way in an earlier paper that I wrote together with @Eigil Rischel.
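[Editor's note: the compatibility condition in the Kolmogorov extension theorem can be checked concretely for iid coin flips. A small sketch with exact rational arithmetic; the helper names are ours:]

```python
from fractions import Fraction
from itertools import product

def marginal(p, n):
    """The n-th finite marginal of an iid Bernoulli(p) sequence: the joint
    distribution of its first n coordinates, as a dict seq -> probability."""
    return {seq: p**sum(seq) * (1 - p)**(n - sum(seq))
            for seq in product((0, 1), repeat=n)}

def restrict(dist, k):
    """Sum out all coordinates beyond the first k."""
    out = {}
    for seq, w in dist.items():
        out[seq[:k]] = out.get(seq[:k], 0) + w
    return out

# The compatibility ("sheaf-like") condition: restricting the 3-marginal
# to 2 coordinates recovers the 2-marginal exactly.
p = Fraction(1, 3)
assert restrict(marginal(p, 3), 2) == marginal(p, 2)
```

The extension theorem says that, conversely, any family of finite marginals satisfying this condition comes from a unique probability measure on the whole sequence space.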

view this post on Zulip Eigil Rischel (May 17 2021 at 15:37):

Tobias Fritz said:

Good question! I can't think of any weakening that would still preserve the spirit of the theorem, but perhaps the others can say more.

Some of our lemmas do hold more generally, though, and we expect them to be useful beyond this setting, in particular Lemma 5.2. But we use the existence of conditionals pretty much throughout, and this seems to be the "bottleneck": most Markov categories that I can think of do not have conditionals, and assuming conditionals already seems to get us quite close to measure-theoretic probability. This is only an intuition so far, but it would be nice to make it precise at some point by providing a categorical characterization theorem of measure-theoretic probability.

Doesn't $\mathsf{SetMulti}$ have conditionals? I think the really hard combination is conditionals + Kolmogorov products - there I don't know of any example other than $\mathsf{BorelStoch}$

view this post on Zulip Eigil Rischel (May 17 2021 at 15:54):

The basic pattern being:

view this post on Zulip Tobias Fritz (May 17 2021 at 17:43):

Eigil Rischel said:

Doesn't $\mathsf{SetMulti}$ have conditionals? I think the really hard combination is conditionals + Kolmogorov products - there I don't know of any example other than $\mathsf{BorelStoch}$

Yes, I completely agree with all of that. I guess my statement about conditionals getting us "close to measure-theoretic probability" was a bit unfortunate, and I indeed should have been saying that about conditionals plus (countable) Kolmogorov products.

Here's a bold question along these lines: Could it be that $\mathsf{BorelStoch}$ is the initial Markov category with conditionals and countable Kolmogorov products generated by $\mathsf{FinSet}$ as a deterministic subcategory, plus a single morphism $1 \to 2$ that is invariant under the swap map $2 \to 2$ (and corresponds to the fair coin)? For example, using countable Kolmogorov products and marginalization, we can at least obtain an arbitrarily biased coin from fair ones.
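[Editor's note: the biased-coin construction mentioned here can be made concrete. One standard way: compare the binary expansion of a uniform variate, generated one fair flip at a time, with the binary expansion of p. A sketch under that interpretation; the function name is illustrative:]

```python
def biased_from_fair(p, flip):
    """Sample a Bernoulli(p) outcome using only fair coin flips `flip()`:
    compare the fair-flip binary expansion of a uniform variate with the
    binary expansion of p, digit by digit."""
    while True:
        p *= 2
        digit = 1 if p >= 1 else 0   # next binary digit of p
        p -= digit
        bit = flip()                 # next binary digit of the uniform variate
        if bit < digit:
            return 1                 # uniform variate < p: heads
        if bit > digit:
            return 0                 # uniform variate > p: tails
        # equal digits: still undecided, consume the next flip
```

On average this consumes only two fair flips per biased sample, independently of p.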

view this post on Zulip Tobias Fritz (May 17 2021 at 17:44):

Eigil Rischel said:

The basic pattern being:

That's a really nice way to put it!

view this post on Zulip dusko (May 23 2021 at 02:09):

hi tobias. how is your categorical view of this related to the one proposed by bart jacobs and sam staton? i see that you cite their paper and say that the approaches are different. the formalisms obviously are, but the general goal of axiomatizing synthetic probability theory seems to be shared. and i guess one could say that both jaynes and de finetti were arguing for the same. what would you say to de finetti himself: what does CT bring to the table that he did not have? and why are there two different categorical approaches? ((i forgot to say: wonderful work! so no misunderstanding about why i ask: CT is all about the big picture...))

view this post on Zulip Tobias Fritz (May 23 2021 at 10:23):

Hi Dusko, those are great questions! Let me try to say something about each one in turn while hopefully not making this too lengthy.

I'm obviously biased though, and am a bit worried about not having done the Jacobs-Staton paper enough justice. Since @Sam Staton is here occasionally too, perhaps he can add whatever I may have missed.

In any case, I'm glad that you find this stuff interesting!

view this post on Zulip Sam Staton (May 24 2021 at 09:17):

Hi Tobias and Dusko. While writing the paper with Bart, I thought of trying to do something axiomatic, so it’s really great that it has worked out in the synthetic framework. My (unfinished) idea was possibly a bit different, though: I wanted to make a theorem from our paper an axiom. For example, to postulate that there exists a final exchangeable coalgebra, or that the sequence of multisets has a limit.

My idea was that finite probability spaces have a clear meaning, in terms of bets or whatever, but the infinite space [0,1] arguably only has a meaning because of de Finetti’s theorem. And so arguably we should postulate that there is a categorical limit of the finite exchangeable probabilities, and call it [0,1], and that is the only thing we should say about it in general. Do you see what I mean? Here the uniqueness is handy, because it uniquely determines [0,1]. Possibly this is also related to other coinductive approaches to the reals, outside probability.

Anyway, I was meaning to ask, Tobias, did you consider an axiom like this, a bit like a converse to your synthetically-proved theorem? For example, if a monoidal diagram FinSetsAndInjections^op --> C in a Markov category C has a limit P(X), does P(X) necessarily classify the probabilistic maps? I'm not sure whether that is in the same synthetic spirit or different.

view this post on Zulip Tobias Fritz (May 24 2021 at 10:11):

Hi Sam :smile: Turning the de Finetti theorem (in the form of your universal property or a variant of it) into the definition of $PX$ is a very nice idea. I like it also because it may well be part of a more general pattern: together with @Eigil Rischel, we have a general definition of "Kolmogorov limit", of which both Kolmogorov products and the "de Finetti limit" which exhibits $PX$ as the equalizer of all finite permutations on $X^{\mathbb{N}}$ are particular instances, while at the same time it's a very restricted class of limits. Although we currently don't know whether all (say countable) Kolmogorov limits actually exist in measure-theoretic probability (say in $\mathsf{BorelStoch}$), it's tempting to conjecture that they do. If this is indeed the case, then I think that postulating the existence of Kolmogorov limits could be a very useful candidate axiom for synthetic probability, much nicer than some of the axioms that we currently work with, especially from the categorical perspective (and would indeed at least partially replace those).

As you suggest, exhibiting $PX$ as a Kolmogorov limit in particular implies representability, i.e. that the Markov category under consideration is the Kleisli category of $P$ on the subcategory of deterministic morphisms. In our setting this holds because, as per one of the conditions in the definition of Kolmogorov limit, the deterministic morphisms $A \to PX$ must correspond to those general morphisms $A \to X^{\mathbb{N}}$ that are exchangeable and display conditional independence, and those are all of the form $f^{\mathbb{N}} \circ \mathsf{copy}$ for a unique $f : A \to X$.

view this post on Zulip Sam Staton (May 26 2021 at 20:03):

Thanks Tobias. This sounds like very interesting work. I suppose it’s not totally obvious how to write down a de Finetti-type axiom in the Markov setting, because the limiting cone isn’t formed of comonoid homomorphisms and yet you would want a canonical comonoid structure on the limiting object somehow. So I look forward to seeing your definition of Kolmogorov limit at some point.

In a similar vein, I started looking at the free limit completion of the Markov category of finite probabilities. (It's similar to the category of effect modules.) Possibly you've already thought about that.

view this post on Zulip dusko (May 28 2021 at 07:53):

Sam Staton said:

My idea was that finite probability spaces have a clear meaning, in terms of bets or whatever, but the infinite space [0,1] arguably only has a meaning because of de Finetti’s theorem.

FWIW, in the 70s grothendieck was thinking of all things along those lines. or at least of homotopy groups. in any case, he seems to have expected that all groups would have profinite representations. i think it was disproved relatively recently. from a completely different direction, levin also explains how all computable measures are profinite (in the random sequences part of his paper with zvonkin...)

view this post on Zulip dusko (May 28 2021 at 08:05):

hmm, it's a very natural message to draw from de finetti. i am almost surprised that it is not a well-developed direction. seems like it fits naturally into algorithmic information theory. martin-loef's notion of randomness is in a pro-measure space... (sorry, i am rambling. but will hit send anyway :)

view this post on Zulip Sam Staton (May 28 2021 at 14:53):

Thanks Dusko. I don’t know much about algorithmic randomness so I didn’t think of the connection to the Zvonkin-Levin direction, which seems interesting. I had been thinking a little bit about the Bernoulli factory problem, which perhaps you know. The idea, roughly, is to ask which functions (say) [0,1]->[0,1] can be constructed, if I think of the argument as a coin of unknown bias and I want to produce another source of randomness.
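[Editor's note: for readers unfamiliar with the Bernoulli factory problem, two of the simplest constructible functions are f(p) = 1/2 (von Neumann's trick) and f(p) = p²; famously, f(p) = min(2p, 1) is not constructible. A sketch of the two easy cases; the names are ours:]

```python
def fair_from_biased(flip):
    """von Neumann's trick, the simplest Bernoulli factory: turn a coin of
    unknown bias into a fair one. Flip twice; (1,0) -> heads, (0,1) -> tails,
    and equal pairs are discarded and retried."""
    while True:
        a, b = flip(), flip()
        if a != b:
            return a

def squared_coin(flip):
    """A Bernoulli factory for f(p) = p^2: heads iff both flips come up heads."""
    a, b = flip(), flip()
    return a * b
```

The general question, which functions f : [0,1] -> [0,1] admit such a procedure, is exactly the one raised above.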

view this post on Zulip Eigil Rischel (Jun 03 2021 at 10:28):

Gromov also semi-recently advanced the idea that one should work with "pro-(finite measure spaces)" as the basis for probability theory, which was discussed a bit here: https://categorytheory.zulipchat.com/#narrow/stream/253118-theory.3A-probability/topic/Gromov's.20functorial.20probability.20theory

view this post on Zulip John van de Wetering (Jun 03 2021 at 19:41):

Perhaps this is not at all relevant, but this discussion reminds me of this paper: https://arxiv.org/abs/1701.00662
They show that matrix algebras are dense in the category of von Neumann algebras. Von Neumann algebras are sometimes seen as a non-commutative generalisation of measure spaces, so perhaps the commutative version of this theorem gives a result like: every large measure space is a limit of finite ones?

view this post on Zulip Eigil Rischel (Jun 22 2021 at 13:58):

dusko said:

hmm, it's a very natural message to draw from de finetti. i am almost surprised that it is not a well-developed direction. seems like it fits naturally into algorithmic information theory. martin-loef's notion of randomness is in a pro-measure space... (sorry, i am rambling. but will hit send anyway :)

By the way, does anyone know of any work connecting algorithmic information theory to categorical probability? It seems like another view of nondeterminism, so maybe there's a Markov category where a Martin-Löf random sequence (in this sense: https://en.wikipedia.org/wiki/Algorithmically_random_sequence) is a nondeterministic state on the space $\{0,1\}$. On the face of it it seems like, if $\phi: I \to \{0,1\}$ is that state, $\phi \otimes \phi$ would end up being supported on the diagonal (exactly what we don't want), but maybe one can get around this with some sort of cleverly chosen tensor product?

view this post on Zulip Tobias Fritz (Jun 22 2021 at 16:16):

Very interesting! I don't have a definite answer, but after some thought my inclination is that algorithmic randomness can probably not be captured by a Markov category, since it behaves tropically: $\phi \otimes \phi$ contains the same amount of algorithmic randomness as both $\phi$ itself and $\mathsf{copy} \circ \phi$ do, indicating that $\phi$ should be considered deterministic in the Markov-categorical sense. Nevertheless, $\phi \otimes \psi$ may contain more algorithmic randomness than $\phi$ and $\psi$ do individually; it feels like taking the join of two elements in a poset.

view this post on Zulip Eigil Rischel (Jun 22 2021 at 16:48):

Yes, I've been coming to a similar conclusion. You can extract two "independent"* Martin-Löf sequences from a single one by taking the even and the odd indices, but it doesn't seem like you can extend this idea to build a Markov category. Actually, in some sense maybe that's exactly the problem: given a general sequence $\psi \in \{0,1\}^\infty$, it's possible that e.g. the "random variables" $\psi_{2n+1}$ and $\psi_{2n}$ are "dependent" (for example, if $\psi_{2n}$ is a Martin-Löf sequence and $\psi_{2n+1} = \psi_{2n}$), but also possible that they are independent. So it doesn't seem like Markov categories are well-suited to the situation where a state on $\{0,1\}$ is a binary sequence.
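[Editor's note: the even/odd extraction and the pathological "copied" sequence described here are easy to make concrete on finite prefixes; a small illustration with names of our own:]

```python
def interleave(a, b):
    """Merge two sequences, with a on the even indices and b on the odd ones."""
    out = []
    for x, y in zip(a, b):
        out += [x, y]
    return out

def split_even_odd(s):
    """Recover the even- and odd-indexed subsequences."""
    return s[0::2], s[1::2]

# Two distinct sources interleaved: the halves can look independent...
assert split_even_odd(interleave([1, 0, 1], [0, 0, 1])) == ([1, 0, 1], [0, 0, 1])

# ...versus the pathological case psi_{2n+1} = psi_{2n}: the two halves of
# interleave(psi, psi) are identical, hence maximally dependent.
psi = [1, 0, 0, 1, 1]
assert split_even_odd(interleave(psi, psi)) == (psi, psi)
```

Both situations are indistinguishable at the level of a single "state", which is the obstruction being described.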

view this post on Zulip Tobias Fritz (Jun 22 2021 at 16:54):

Right. Which still leaves open the possibility of giving a categorical treatment in a different way. For example, categorical probability still has a lot to offer for merely semicartesian categories, and I bet that there is a nice semicartesian category that describes algorithmic randomness!

Maybe something like finite sets as objects, morphisms $X \to Y$ being functions $X^\mathbb{N} \to Y^\mathbb{N}$ such that initial output segments only depend on initial input segments, and then perhaps identify two such functions whenever they differ by a computable endomorphism of $Y^\mathbb{N}$. The monoidal structure can then presumably be taken to be just the cartesian product. (This still looks like a Markov category with respect to the obvious copy morphisms... What do we make of this?)
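[Editor's note: a tiny example of the kind of morphism proposed here, i.e. a function on sequences whose initial output segments depend only on initial input segments; the example itself is ours, not from the thread:]

```python
def running_xor(prefix):
    """A causal map {0,1}^N -> {0,1}^N, shown on a finite prefix: the n-th
    output is the XOR of the inputs up to position n, so the first k outputs
    are determined by the first k inputs."""
    out, acc = [], 0
    for bit in prefix:
        acc ^= bit
        out.append(acc)
    return out

# Causality check: truncating the input and truncating the output agree.
assert running_xor([1, 0, 1, 1])[:2] == running_xor([1, 0])
```

Causality in this sense is exactly continuity with respect to the product topologies on the sequence spaces.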

view this post on Zulip Eigil Rischel (Jun 22 2021 at 16:55):

Yes, but in this Markov category the Martin-Löf sequences are deterministic

view this post on Zulip Tobias Fritz (Jun 22 2021 at 16:55):

I know, and I don't know how to resolve that. Maybe what I'm suggesting is to ignore the Markov category structure and try to do algorithmic randomness without it.

view this post on Zulip Tobias Fritz (Jun 22 2021 at 16:56):

For example, is there a different way to characterize the computable sequences in categorical terms?

view this post on Zulip Tobias Fritz (Jun 22 2021 at 17:03):

Maybe this touches upon categorical approaches to computability like realizability toposes and Turing categories, but I don't know enough about either of these to even write down the definition. So perhaps someone who knows can say whether either of these can provide a treatment of algorithmic randomness?

view this post on Zulip Eigil Rischel (Jun 22 2021 at 19:09):

Hmm, here is a thought: let's use the "leaking information" definition of deterministic (specialized to the case of a state for simplicity):

$\phi: I \to X$ is deterministic if, for any $\psi: I \to X \otimes Y$ such that $\psi \mathbin{;} (X \otimes \operatorname{del}_Y) = \phi$, we have $\psi = \rho \otimes \phi$ for some $\rho: I \to Y$.

Of course in the category you suggested, a Martin-Löf sequence is still deterministic.
But we can modify the definition as follows:

Say that $\phi$ is deterministic if, for any $\psi: I \to X \otimes Y$ which can be written as a composition of $\phi$ and computable morphisms, and such that $\psi \mathbin{;} (X \otimes \operatorname{del}_Y) = \phi$, there is a computable $\rho: I \to Y$ such that $\psi = \rho \otimes \phi$.

I'm not sure that I quite like this definition - once you unpack it it becomes sort of trivial that computable = deterministic - but maybe it does capture in some sense the intuition that a Martin-Löf sequence "looks random" if you only "test" this using computable morphisms...

view this post on Zulip Tobias Fritz (Jun 22 2021 at 20:41):

Well, to begin with I'm not even sure that the "category" I described is a category, since it's not clear to me whether composition is well-defined with respect to the quotienting given by applying computable endofunctions. Maybe you're thinking about the category before taking the quotient, so that a morphism $X \to Y$ is really just a function $X^{\mathbb{N}} \to Y^{\mathbb{N}}$, with initial output segments only depending on initial input segments (or equivalently, continuity with respect to the product topologies)?

In any case, it seems worth thinking about the question some more, but so far I feel like we don't have much.

view this post on Zulip Tobias Fritz (Jun 23 2021 at 06:54):

Here's a nice toy model for the problem. Fix a set $P$ ("parameters"). Consider the category with sets as objects and morphisms $X \to Y$ being functions $P \times X \to Y$, to be thought of as functions indexed by a parameter in $P$, and with the obvious composition and the cartesian monoidal structure. This is also the Kleisli category of the monad $(-)^P$ on Set. There is a natural subcategory consisting of those morphisms with trivial dependence on $P$. The absence of such a dependence makes them "deterministic", or "computable" in the sense that they can be applied without knowledge of $P$.
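[Editor's note: the toy model can be written out directly. The following sketch, with naming of our own, shows Kleisli composition for the reader monad and the inclusion of the "deterministic" morphisms:]

```python
# Toy model: the Kleisli category of the reader monad (-)^P on Set.
# A morphism X -> Y is a function of (p, x); "deterministic" morphisms
# ignore the parameter p.

def kleisli_compose(g, f):
    """Compose f: P x X -> Y with g: P x Y -> Z, threading the same
    (unknown) parameter through both stages."""
    return lambda p, x: g(p, f(p, x))

def lift(h):
    """Include an ordinary function X -> Y as a morphism with trivial
    dependence on the parameter: the deterministic subcategory."""
    return lambda p, x: h(x)

add_param = lambda p, x: x + p        # genuinely parameter-dependent
scale_param = lambda p, y: y * p

# Composition threads one shared parameter through both stages:
assert kleisli_compose(scale_param, add_param)(2, 3) == (3 + 2) * 2

# Deterministic morphisms compose to deterministic ones, ignoring p entirely:
assert kleisli_compose(lift(lambda y: y + 1), lift(lambda x: 2 * x))(99, 5) == 11
```

The question in the thread is whether this subcategory of parameter-independent morphisms can be characterized purely in terms of the ambient category.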

So what is this situation an example of? We seem to have two categories with finite products, one included in the other in a product-preserving way. In this particular situation, the supercategory is even a Kleisli category on the subcategory with respect to a product-preserving monad on the subcategory. Is there any general characterization of the subcategory in terms of the supercategory only?

In the case of algorithmic randomness, we could try to take the general continuous functions $X^\mathbb{N} \to Y^\mathbb{N}$ as the supercategory, and the computable ones as the subcategory. Then again both have finite products and the inclusion preserves these. But it now seems very unlikely that the supercategory would be a Kleisli category on the subcategory.

We could try to write down a categorical definition of Kolmogorov complexity which specializes to the standard one, also try to introduce Martin-Löf randomness for morphisms, and see what they amount to in the toy model. But I'm not sure how interesting this would be. Are there other things that could be done?

view this post on Zulip Spencer Breiner (Jun 23 2021 at 15:53):

Is the idea that the parameters encode the program being run? If so, it seems like dusko's work on monoidal computers should be relevant.

view this post on Zulip Tobias Fritz (Jun 23 2021 at 22:24):

That may be one possible interpretation, but it would be more in line with our earlier discussion here to think of them as something like the "state of the environment": you don't know the state of the environment while running your computation, but it may have an effect on its result. Of course that's still "parameters encoding the program being run" in some sense, but in a different way than one would think when seeing that phrase.

In any case, thanks for the pointer to dusko's paper! It's been an intriguing read (as may be expected with him being the author). The concept of data service is something that I discovered independently last year in the context of categorical probability, so it's nice to see that it has already been considered.

view this post on Zulip Spencer Breiner (Jun 23 2021 at 22:44):

Indeed, this is not what I intended by "encoding the program". I was definitely thinking of the program definition itself as the "parameter"; you know it exists because the map is computable, but not what it is.

view this post on Zulip dusko (Jun 29 2021 at 21:00):

Spencer Breiner said:

Is the idea that the parameters encode the program being run? If so, it seems like dusko's work on monoidal computers should be relevant.

we used the categorical view of computability to show how martin-loef randomness can be expressed in terms of strategies here:
https://arxiv.org/abs/1503.03185
i am not sure how clearly this particular presentation displays the CT view of AIT, but it is actually very direct in monoidal computer.

i wrote a book about monoidal computer, but since it actually wrote itself through 15 years of teaching computability and complexity courses in diagrams, it came out as a mixture of 3 books in one. another demonstration of my talents for writing unreadable stuff. so i have rewritten it completely, and have just the last chapter to go. thanks for the mention, @Spencer Breiner.