Category Theory
Zulip Server
Archive

You're reading the public-facing archive of the Category Theory Zulip server.


Stream: theory: categorical probability

Topic: de Finetti's Theorem


view this post on Zulip Tobias Fritz (May 07 2021 at 09:13):

De Finetti's Theorem is a classical result in probability theory. @Tomáš Gonda, @Paolo Perrone and I have now proven it in completely categorical terms: as it turns out, one only needs a Markov category that has conditionals and countable Kolmogorov products, and that is the Kleisli category of a monad in a way that respects almost sure equality.

The Kleisli category of the Giry monad on standard Borel measurable spaces satisfies these assumptions, and we therefore recover the usual de Finetti Theorem in this case. Although Markov categories exist in large supply, we do not know of any nontrivial Markov category outside of measure-theoretic probability that satisfies all three relevant assumptions! So unfortunately we don't get any new de Finetti theorems yet.
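[Editor's note: in concrete terms, the theorem says that every exchangeable distribution on infinite 0/1 sequences is a mixture of iid coin flips. A minimal sketch of the "mixture" side, with a made-up two-point mixing measure; all names are illustrative:]

```python
from fractions import Fraction

# Hypothetical mixing measure: a fair coin with weight 1/2
# and a 3/4-biased coin with weight 1/2.
prior = [(Fraction(1, 2), Fraction(1, 2)),
         (Fraction(1, 2), Fraction(3, 4))]

def seq_prob(outcomes):
    """Probability of a finite 0/1 sequence under the de Finetti mixture:
    integrate the iid probability p^k (1-p)^(n-k) against the prior."""
    k, n = sum(outcomes), len(outcomes)
    return sum(w * p**k * (1 - p)**(n - k) for w, p in prior)

# Exchangeability: the probability depends only on the number of heads,
# so every permutation of a sequence gets the same weight.
assert seq_prob((1, 0, 0)) == seq_prob((0, 1, 0)) == seq_prob((0, 0, 1))
```

De Finetti's theorem is the converse direction: any exchangeable assignment `seq_prob` arises from some `prior` in this way.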

view this post on Zulip Morgan Rogers (he/him) (May 07 2021 at 10:21):

Can you at least loosen the conditions in some directions to get weakenings of the theorem?

view this post on Zulip Tobias Fritz (May 07 2021 at 10:50):

Good question! I can't think of any weakening that would still preserve the spirit of the theorem, but perhaps the others can say more.

Some of our lemmas do hold more generally, though, and we expect them to be useful beyond this setting, in particular Lemma 5.2. But we use the existence of conditionals pretty much throughout, and this seems to be the "bottleneck": most Markov categories that I can think of do not have conditionals, and assuming conditionals already seems to get us quite close to measure-theoretic probability. This is only an intuition so far, but it would be nice to make it precise at some point by providing a categorical characterization theorem of measure-theoretic probability.

view this post on Zulip Spencer Breiner (May 07 2021 at 16:37):

I don't understand this stuff very well (hopefully this paper will help!), but an obvious question, just from looking at the theorem itself, is what happens if you remove the restriction to finite permutations? Can you generalize your Kolmogorov product definition to other families of subsets (maybe an ultrafilter?)?

view this post on Zulip Spencer Breiner (May 07 2021 at 16:39):

Alternatively/additionally, I would be interested in hearing more about why the Kolmogorov product is defined the way it is. Is there something going on here with the cofinite topology?

view this post on Zulip Spencer Breiner (May 07 2021 at 16:45):

Spencer Breiner said:

(maybe an ultrafilter?)

That doesn't make sense. How about the closed sets for a finer topology?

view this post on Zulip Spencer Breiner (May 07 2021 at 16:49):

I suppose this would be a strengthening of the theorem, so the question is really whether those extra "projections" would give you something more.

view this post on Zulip Tobias Fritz (May 07 2021 at 17:31):

I'm not sure what generalization of Kolmogorov products you have in mind, in the sense of where the closed sets come in. Could you spell that out a bit? What would its universal property be?

The definition of Kolmogorov products is based on the Kolmogorov extension theorem, which states that a probability measure is uniquely determined by its finite marginals; and conversely, every compatible family of finite marginals arises in this way. So it very much has the flavour of a sheaf condition. There's a more detailed discussion of Kolmogorov products and why they're defined that way in an earlier paper that I wrote together with @Eigil Rischel.
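[Editor's note: the compatibility condition in the Kolmogorov extension theorem can be checked concretely for iid coin flips. A small sketch with exact rational arithmetic; the helper names are ours:]

```python
from fractions import Fraction
from itertools import product

def marginal(p, n):
    """The n-th finite marginal of an iid Bernoulli(p) sequence: the joint
    distribution of its first n coordinates, as a dict seq -> probability."""
    return {seq: p**sum(seq) * (1 - p)**(n - sum(seq))
            for seq in product((0, 1), repeat=n)}

def restrict(dist, k):
    """Sum out all coordinates beyond the first k."""
    out = {}
    for seq, w in dist.items():
        out[seq[:k]] = out.get(seq[:k], 0) + w
    return out

# The compatibility ("sheaf-like") condition: restricting the 3-marginal
# to 2 coordinates recovers the 2-marginal exactly.
p = Fraction(1, 3)
assert restrict(marginal(p, 3), 2) == marginal(p, 2)
```

The extension theorem says that, conversely, any family of finite marginals satisfying this condition comes from a unique probability measure on the whole sequence space.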

view this post on Zulip Eigil Rischel (May 17 2021 at 15:37):

Tobias Fritz said:

Good question! I can't think of any weakening that would still preserve the spirit of the theorem, but perhaps the others can say more.

Some of our lemmas do hold more generally, though, and we expect them to be useful beyond this setting, in particular Lemma 5.2. But we use the existence of conditionals pretty much throughout, and this seems to be the "bottleneck": most Markov categories that I can think of do not have conditionals, and assuming conditionals already seems to get us quite close to measure-theoretic probability. This is only an intuition so far, but it would be nice to make it precise at some point by providing a categorical characterization theorem of measure-theoretic probability.

Doesn't $\mathsf{SetMulti}$ have conditionals? I think the really hard combination is conditionals + Kolmogorov products - there I don't know of any example other than $\mathsf{BorelStoch}$

view this post on Zulip Eigil Rischel (May 17 2021 at 15:54):

The basic pattern being:

view this post on Zulip Tobias Fritz (May 17 2021 at 17:43):

Eigil Rischel said:

Doesn't $\mathsf{SetMulti}$ have conditionals? I think the really hard combination is conditionals + Kolmogorov products - there I don't know of any example other than $\mathsf{BorelStoch}$

Yes, I completely agree with all of that. I guess my statement about conditionals getting us "close to measure-theoretic probability" was a bit unfortunate, and I indeed should have been saying that about conditionals plus (countable) Kolmogorov products.

Here's a bold question along these lines: Could it be that $\mathsf{BorelStoch}$ is the initial Markov category with conditionals and countable Kolmogorov products generated by $\mathsf{FinSet}$ as a deterministic subcategory, plus a single morphism $1 \to 2$ that is invariant under the swap map $2 \to 2$ (and corresponds to the fair coin)? For example, using countable Kolmogorov products and marginalization, we can at least obtain an arbitrarily biased coin from fair ones.
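[Editor's note: the biased-coin construction mentioned here can be made concrete. One standard way: compare the binary expansion of a uniform variate, generated one fair flip at a time, with the binary expansion of p. A sketch under that interpretation; the function name is illustrative:]

```python
def biased_from_fair(p, flip):
    """Sample a Bernoulli(p) outcome using only fair coin flips `flip()`:
    compare the fair-flip binary expansion of a uniform variate with the
    binary expansion of p, digit by digit."""
    while True:
        p *= 2
        digit = 1 if p >= 1 else 0   # next binary digit of p
        p -= digit
        bit = flip()                 # next binary digit of the uniform variate
        if bit < digit:
            return 1                 # uniform variate < p: heads
        if bit > digit:
            return 0                 # uniform variate > p: tails
        # equal digits: still undecided, consume the next flip
```

On average this consumes only two fair flips per biased sample, independently of p.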

view this post on Zulip Tobias Fritz (May 17 2021 at 17:44):

Eigil Rischel said:

The basic pattern being:

That's a really nice way to put it!

view this post on Zulip dusko (May 23 2021 at 02:09):

hi tobias. how is your categorical view of this related to the one proposed by bart jacobs and sam staton? i see that you cite their paper and say that the approaches are different. the formalisms obviously are, but the general goal of axiomatizing synthetic probability theory seems to be shared. and i guess one could say that both jaynes and de finetti were arguing for the same. what would you say to de finetti himself: what does CT bring to the table that he did not have? and why are there two different categorical approaches? ((i forgot to say: wonderful work! so no misunderstanding about why i ask: CT is all about the big picture...))

view this post on Zulip Tobias Fritz (May 23 2021 at 10:23):

Hi Dusko, those are great questions! Let me try to say something about each one in turn while hopefully not making this too lengthy.

I'm obviously biased though, and am a bit worried about not having done the Jacobs-Staton paper enough justice. Since @Sam Staton is here occasionally too, perhaps he can add whatever I may have missed.

In any case, I'm glad that you find this stuff interesting!

view this post on Zulip Sam Staton (May 24 2021 at 09:17):

Hi Tobias and Dusko. While writing the paper with Bart, I thought of trying to do something axiomatic, so it’s really great that it has worked out in the synthetic framework. My (unfinished) idea was possibly a bit different, though: I wanted to make a theorem from our paper an axiom. For example, to postulate that there exists a final exchangeable coalgebra, or that the sequence of multisets has a limit.

My idea was that finite probability spaces have a clear meaning, in terms of bets or whatever, but the infinite space [0,1] arguably only has a meaning because of de Finetti’s theorem. And so arguably we should postulate that there is a categorical limit of the finite exchangeable probabilities, and call it [0,1], and that is the only thing we should say about it in general. Do you see what I mean? Here the uniqueness is handy, because it uniquely determines [0,1]. Possibly this is also related to other coinductive approaches to the reals, outside probability.

Anyway, I was meaning to ask, Tobias, did you consider an axiom like this, a bit like a converse to your synthetically-proved theorem? For example, if a monoidal diagram FinSetsAndInjections^op --> C in a Markov category C has a limit P(X), does P(X) necessarily classify the probabilistic maps? I'm not sure whether that is in the same synthetic spirit or different.

view this post on Zulip Tobias Fritz (May 24 2021 at 10:11):

Hi Sam :smile: Turning the de Finetti theorem (in the form of your universal property or a variant of it) into the definition of $PX$ is a very nice idea. I like it also because it may well be part of a more general pattern: together with @Eigil Rischel, we have a general definition of "Kolmogorov limit", of which both Kolmogorov products and the "de Finetti limit" which exhibits $PX$ as the equalizer of all finite permutations on $X^{\mathbb{N}}$ are particular instances, while at the same time it's a very restricted class of limits. Although we currently don't know whether all (say countable) Kolmogorov limits actually exist in measure-theoretic probability (say in $\mathsf{BorelStoch}$), it's tempting to conjecture that they do. If this is indeed the case, then I think that postulating the existence of Kolmogorov limits could be a very useful candidate axiom for synthetic probability, much nicer than some of the axioms that we currently work with, especially from the categorical perspective (and would indeed at least partially replace those).

As you suggest, exhibiting $PX$ as a Kolmogorov limit in particular implies representability, i.e. that the Markov category under consideration is the Kleisli category of $P$ on the subcategory of deterministic morphisms. In our setting this holds because, as per one of the conditions in the definition of Kolmogorov limit, the deterministic morphisms $A \to PX$ must correspond to those general morphisms $A \to X^{\mathbb{N}}$ that are exchangeable and display conditional independence, and those are all of the form $f^{\mathbb{N}} \circ \mathsf{copy}$ for a unique $f : A \to X$.

view this post on Zulip Sam Staton (May 26 2021 at 20:03):

Thanks Tobias. This sounds like very interesting work. I suppose it’s not totally obvious how to write down a de Finetti-type axiom in the Markov setting, because the limiting cone isn’t formed of comonoid homomorphisms and yet you would want a canonical comonoid structure on the limiting object somehow. So I look forward to seeing your definition of Kolmogorov limit at some point.

In a similar vein, I started looking at the free limit completion of the Markov category of finite probabilities. (It's similar to the category of effect modules.) Possibly you've already thought about that.

view this post on Zulip dusko (May 28 2021 at 07:53):

Sam Staton said:

My idea was that finite probability spaces have a clear meaning, in terms of bets or whatever, but the infinite space [0,1] arguably only has a meaning because of de Finetti’s theorem.

FWIW, in the 70s grothendieck was thinking of all things along those lines. or at least of homotopy groups. in any case, he seems to have expected that all groups would have profinite representations. i think it was disproved relatively recently. from a completely different direction, levin also explains how all computable measures are profinite (in the random sequences part of his paper with zvonkin...)

view this post on Zulip dusko (May 28 2021 at 08:05):

hmm, it's a very natural message to draw from de finetti. i am almost surprised that it is not a well-developed direction. seems like it fits naturally into algorithmic information theory. martin-loef's notion of randomness is in a pro-measure space... (sorry, i am rambling. but will hit send anyway :)

view this post on Zulip Sam Staton (May 28 2021 at 14:53):

Thanks Dusko. I don’t know much about algorithmic randomness so I didn’t think of the connection to the Zvonkin-Levin direction, which seems interesting. I had been thinking a little bit about the Bernoulli factory problem, which perhaps you know. The idea, roughly, is to ask which functions (say) [0,1]->[0,1] can be constructed, if I think of the argument as a coin of unknown bias and I want to produce another source of randomness.
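[Editor's note: for readers unfamiliar with the Bernoulli factory problem, two of the simplest constructible functions are f(p) = 1/2 (von Neumann's trick) and f(p) = p²; famously, f(p) = min(2p, 1) is not constructible. A sketch of the two easy cases; the names are ours:]

```python
def fair_from_biased(flip):
    """von Neumann's trick, the simplest Bernoulli factory: turn a coin of
    unknown bias into a fair one. Flip twice; (1,0) -> heads, (0,1) -> tails,
    and equal pairs are discarded and retried."""
    while True:
        a, b = flip(), flip()
        if a != b:
            return a

def squared_coin(flip):
    """A Bernoulli factory for f(p) = p^2: heads iff both flips come up heads."""
    a, b = flip(), flip()
    return a * b
```

The general question, which functions f : [0,1] -> [0,1] admit such a procedure, is exactly the one raised above.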

view this post on Zulip Eigil Rischel (Jun 03 2021 at 10:28):

Gromov also semi-recently advanced the idea that one should work with "pro-(finite measure spaces)" as the basis for probability theory, which was discussed a bit here: https://categorytheory.zulipchat.com/#narrow/stream/253118-theory.3A-probability/topic/Gromov's.20functorial.20probability.20theory

view this post on Zulip John van de Wetering (Jun 03 2021 at 19:41):

Perhaps this is not at all relevant, but this discussion reminds me of this paper: https://arxiv.org/abs/1701.00662
They show that matrix algebras are dense in the category of von Neumann algebras. Von Neumann algebras are sometimes seen as a non-commutative generalisation of measure spaces, so perhaps the commutative version of this theorem gives a result like: every large measure space is a limit of finite ones?

view this post on Zulip Eigil Rischel (Jun 22 2021 at 13:58):

dusko said:

hmm, it's a very natural message to draw from de finetti. i am almost surprised that it is not a well-developed direction. seems like it fits naturally into algorithmic information theory. martin-loef's notion of randomness is in a pro-measure space... (sorry, i am rambling. but will hit send anyway :)

By the way, does anyone know of any work connecting algorithmic information theory to categorical probability? It seems like another view of nondeterminism, so maybe there's a Markov category where a Martin-Löf random sequence (in this sense: https://en.wikipedia.org/wiki/Algorithmically_random_sequence) is a nondeterministic state on the space $\{0,1\}$. On the face of it it seems like, if $\phi: I \to \{0,1\}$ is that state, $\phi \otimes \phi$ would end up being supported on the diagonal (exactly what we don't want), but maybe one can get around this with some sort of cleverly chosen tensor product?

view this post on Zulip Tobias Fritz (Jun 22 2021 at 16:16):

Very interesting! I don't have a definite answer, but after some thought my inclination is that algorithmic randomness can probably not be captured by a Markov category, since it behaves tropically: $\phi \otimes \phi$ contains the same amount of algorithmic randomness as both $\phi$ itself and $\mathsf{copy} \circ \phi$ do, indicating that $\phi$ should be considered deterministic in the Markov-categorical sense. Nevertheless, $\phi \otimes \psi$ may contain more algorithmic randomness than $\phi$ and $\psi$ do individually; it feels like taking the join of two elements in a poset.

view this post on Zulip Eigil Rischel (Jun 22 2021 at 16:48):

Yes, I've been coming to a similar conclusion. You can extract two "independent"* Martin-Löf sequences from a single one by taking the even and the odd indices, but it doesn't seem like you can extend this idea to build a Markov category. Actually, in some sense maybe that's exactly the problem: given a general sequence $\psi \in \{0,1\}^\infty$, it's possible that e.g. the "random variables" $\psi_{2n+1}$ and $\psi_{2n}$ are "dependent" (for example, if $\psi_{2n}$ is a Martin-Löf sequence and $\psi_{2n+1} = \psi_{2n}$), but also possible that they are independent. So it doesn't seem like Markov categories are well-suited to the situation where a state on $\{0,1\}$ is a binary sequence.
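[Editor's note: the even/odd extraction and the pathological "copied" sequence described here are easy to make concrete on finite prefixes; a small illustration with names of our own:]

```python
def interleave(a, b):
    """Merge two sequences, with a on the even indices and b on the odd ones."""
    out = []
    for x, y in zip(a, b):
        out += [x, y]
    return out

def split_even_odd(s):
    """Recover the even- and odd-indexed subsequences."""
    return s[0::2], s[1::2]

# Two distinct sources interleaved: the halves can look independent...
assert split_even_odd(interleave([1, 0, 1], [0, 0, 1])) == ([1, 0, 1], [0, 0, 1])

# ...versus the pathological case psi_{2n+1} = psi_{2n}: the two halves of
# interleave(psi, psi) are identical, hence maximally dependent.
psi = [1, 0, 0, 1, 1]
assert split_even_odd(interleave(psi, psi)) == (psi, psi)
```

Both situations are indistinguishable at the level of a single "state", which is the obstruction being described.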

view this post on Zulip Tobias Fritz (Jun 22 2021 at 16:54):

Right. Which still leaves open the possibility of giving a categorical treatment in a different way. For example, categorical probability still has a lot to offer for merely semicartesian categories, and I bet that there is a nice semicartesian category that describes algorithmic randomness!

Maybe something like finite sets as objects, morphisms $X \to Y$ being functions $X^\mathbb{N} \to Y^\mathbb{N}$ such that initial output segments only depend on initial input segments, and then perhaps identify two such functions whenever they differ by a computable endomorphism of $Y^\mathbb{N}$. The monoidal structure can then presumably be taken to be just the cartesian product. (This still looks like a Markov category with respect to the obvious copy morphisms... What do we make of this?)
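[Editor's note: a tiny example of the kind of morphism proposed here, i.e. a function on sequences whose initial output segments depend only on initial input segments; the example itself is ours, not from the thread:]

```python
def running_xor(prefix):
    """A causal map {0,1}^N -> {0,1}^N, shown on a finite prefix: the n-th
    output is the XOR of the inputs up to position n, so the first k outputs
    are determined by the first k inputs."""
    out, acc = [], 0
    for bit in prefix:
        acc ^= bit
        out.append(acc)
    return out

# Causality check: truncating the input and truncating the output agree.
assert running_xor([1, 0, 1, 1])[:2] == running_xor([1, 0])
```

Causality in this sense is exactly continuity with respect to the product topologies on the sequence spaces.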

view this post on Zulip Eigil Rischel (Jun 22 2021 at 16:55):

Yes, but in this Markov category the Martin-Löf sequences are deterministic

view this post on Zulip Tobias Fritz (Jun 22 2021 at 16:55):

I know, and I don't know how to resolve that. Maybe what I'm suggesting is to ignore the Markov category structure and try to do algorithmic randomness without it.

view this post on Zulip Tobias Fritz (Jun 22 2021 at 16:56):

For example, is there a different way to characterize the computable sequences in categorical terms?

view this post on Zulip Tobias Fritz (Jun 22 2021 at 17:03):

Maybe this touches upon categorical approaches to computability like realizability toposes and Turing categories, but I don't know enough about either of these to even write down the definition. So perhaps someone who knows can say whether either of these can provide a treatment of algorithmic randomness?

view this post on Zulip Eigil Rischel (Jun 22 2021 at 19:09):

Hmm, here is a thought: let's use the "leaking information" definition of deterministic (specialized to the case of a state for simplicity):

$\phi: I \to X$ is deterministic if, for any $\psi: I \to X \otimes Y$ such that $\psi \mathbin{;} (X \otimes \operatorname{del}_Y) = \phi$, we have $\psi = \rho \otimes \phi$ for some $\rho: I \to Y$.

Of course in the category you suggested, a Martin-Löf sequence is still deterministic.
But we can modify the definition as follows:

Say that $\phi$ is deterministic if, for any $\psi: I \to X \otimes Y$ which can be written as a composition of $\phi$ and computable morphisms, and such that $\psi \mathbin{;} (X \otimes \operatorname{del}_Y) = \phi$, there is a computable $\rho: I \to Y$ such that $\psi = \rho \otimes \phi$.

I'm not sure that I quite like this definition - once you unpack it it becomes sort of trivial that computable = deterministic - but maybe it does capture in some sense the intuition that a Martin-Löf sequence "looks random" if you only "test" this using computable morphisms...

view this post on Zulip Tobias Fritz (Jun 22 2021 at 20:41):

Well, to begin with I'm not even sure that the "category" I described is a category, since it's not clear to me whether composition is well-defined with respect to the quotienting given by applying computable endofunctions. Maybe you're thinking about the category before taking the quotient, so that a morphism $X \to Y$ is really just a function $X^{\mathbb{N}} \to Y^{\mathbb{N}}$, with initial output segments only depending on initial input segments (or equivalently, continuity with respect to the product topologies)?

In any case, it seems worth thinking about the question some more, but so far I feel like we don't have much.

view this post on Zulip Tobias Fritz (Jun 23 2021 at 06:54):

Here's a nice toy model for the problem. Fix a set $P$ ("parameters"). Consider the category with sets as objects and morphisms $X \to Y$ being functions $P \times X \to Y$, to be thought of as functions indexed by a parameter in $P$, and with the obvious composition and the cartesian monoidal structure. This is also the Kleisli category of the monad $(-)^P$ on Set. There is a natural subcategory consisting of those morphisms with trivial dependence on $P$. The absence of such a dependence makes them "deterministic", or "computable" in the sense that they can be applied without knowledge of $P$.
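[Editor's note: the toy model can be written out directly. The following sketch, with naming of our own, shows Kleisli composition for the reader monad and the inclusion of the "deterministic" morphisms:]

```python
# Toy model: the Kleisli category of the reader monad (-)^P on Set.
# A morphism X -> Y is a function of (p, x); "deterministic" morphisms
# ignore the parameter p.

def kleisli_compose(g, f):
    """Compose f: P x X -> Y with g: P x Y -> Z, threading the same
    (unknown) parameter through both stages."""
    return lambda p, x: g(p, f(p, x))

def lift(h):
    """Include an ordinary function X -> Y as a morphism with trivial
    dependence on the parameter: the deterministic subcategory."""
    return lambda p, x: h(x)

add_param = lambda p, x: x + p        # genuinely parameter-dependent
scale_param = lambda p, y: y * p

# Composition threads one shared parameter through both stages:
assert kleisli_compose(scale_param, add_param)(2, 3) == (3 + 2) * 2

# Deterministic morphisms compose to deterministic ones, ignoring p entirely:
assert kleisli_compose(lift(lambda y: y + 1), lift(lambda x: 2 * x))(99, 5) == 11
```

The question in the thread is whether this subcategory of parameter-independent morphisms can be characterized purely in terms of the ambient category.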

So what is this situation an example of? We seem to have two categories with finite products, one included in the other in a product-preserving way. In this particular situation, the supercategory is even a Kleisli category on the subcategory with respect to a product-preserving monad on the subcategory. Is there any general characterization of the subcategory in terms of the supercategory only?

In the case of algorithmic randomness, we could try to take the general continuous functions $X^\mathbb{N} \to Y^\mathbb{N}$ as the supercategory, and the computable ones as the subcategory. Then again both have finite products and the inclusion preserves these. But it now seems very unlikely that the supercategory would be a Kleisli category on the subcategory.

We could try to write down a categorical definition of Kolmogorov complexity which specializes to the standard one, also try to introduce Martin-Löf randomness for morphisms, and see what they amount to in the toy model. But I'm not sure how interesting this would be. Are there other things that could be done?

view this post on Zulip Spencer Breiner (Jun 23 2021 at 15:53):

Is the idea that the parameters encode the program being run? If so, it seems like dusko's work on monoidal computers should be relevant.

view this post on Zulip Tobias Fritz (Jun 23 2021 at 22:24):

That may be one possible interpretation, but it would be more in line with our earlier discussion here to think of them as something like the "state of the environment": you don't know the state of the environment while running your computation, but it may have an effect on its result. Of course that's still "parameters encoding the program being run" in some sense, but in a different way than one would think when seeing that phrase.

In any case, thanks for the pointer to dusko's paper! It's been an intriguing read (as may be expected with him being the author). The concept of data service is something that I discovered independently last year in the context of categorical probability, so it's nice to see that it has already been considered.

view this post on Zulip Spencer Breiner (Jun 23 2021 at 22:44):

Indeed, this is not what I intended by "encoding the program". I was definitely thinking of the program definition itself as the "parameter"; you know it exists because the map is computable, but not what it is.

view this post on Zulip dusko (Jun 29 2021 at 21:00):

Spencer Breiner said:

Is the idea that the parameters encode the program being run? If so, it seems like dusko's work on monoidal computers should be relevant.

we used the categorical view of computability to show how martin-loef randomness can be expressed in terms of strategies here:
https://arxiv.org/abs/1503.03185
i am not sure how clearly this particular presentation displays the CT view of AIT, but it is actually very direct in monoidal computer.

i wrote a book about monoidal computer, but since it actually wrote itself through 15 years of teaching computability and complexity courses in diagrams, it came out as a mixture of 3 books in one. another demonstration of my talents for writing unreadable stuff. so i have rewritten it completely, and have just the last chapter to go. thanks for the mention, @Spencer Breiner.