You're reading the public-facing archive of the Category Theory Zulip server.
To join the server you need an invite. Anybody can get an invite by contacting Matteo Capucci at name dot surname at gmail dot com.
For all things related to this archive refer to the same person.
I've long been a fan of Cox's 1946 paper on probability as logic {1}, which presents probability as being, fundamentally, about assigning numbers (conditional probabilities) to pairs of statements of propositional logic. The idea is that, under certain assumptions, probability theory is the uniquely correct way to extend Boolean logic to deal with uncertainty.
I like to think about probability this way, but unfortunately Cox's treatment isn't completely formal and doesn't address measure-theoretic issues, with the result that whenever I try to think about probability measures on infinite sets I start to feel like I'm on conceptually shaky ground. In particular, I don't know of any justification for the use of countable unions in the definition of a sigma algebra from this probability-as-logic perspective.
So, firstly, I'm wondering whether anything can be said about this from the perspective of topos theory or the connection between logic and topology. (I know nothing about these fields, so this is a basic question.) Do sigma algebras arise naturally from that perspective as the "right" kind of lattice of propositions that we should assign probabilities to?
Secondly, I'm wondering whether there exists a category-theoretic treatment of probability at this kind of foundational level. The approaches I know about tend to start from an existing definition of measurable spaces, but I'm wondering if it's possible instead to arrive at the definition of probability through category-theoretic reasoning. (As a loose analogy, consider the way Lawvere metric spaces arise from a category-theoretic construction - it seems like something similar ought to be possible for probability.) I'm dreaming of a more rigorous, category-flavoured version of Cox's argument, and it would be great to read about that if it does exist.
{1} Cox (1946), 'Probability, Frequency and Reasonable Expectation', American Journal of Physics 14(1). There's a pdf here.
M. Jackson's PhD thesis, A Sheaf Theoretic Approach To Measure Theory, should be a useful reference for you as far as building toposes from $\sigma$-algebras is concerned. I can't remember how much Jackson looks into the internal logic, but this could be a good place to start :heart_eyes:
@Nathaniel Virgo I don't know anything about this but I like this question a lot. It sounds like Cox's approach is overdue for a revision.
Do $\sigma$-algebras have to be involved? You can talk about the syntax and semantics of propositional and predicate logic without mentioning Boolean algebras, so analogously you should be able to do probabilistic logic without $\sigma$-algebras.
Right, I suspect one sort of answer will be that you should call what you're doing "synthetic probability theory", and then one class of models comes from classical probability theory with $\sigma$-algebras and all that jazz.
But that's kind of a non-answer so it would be worth digging deeper.
The thought I had about this was that the analogue of a sequent $a \vdash b$ would be the conditional probability $P(b \mid a)$. Then instead of deduction rules you would have rules for calculating conditional probabilities from other conditional probabilities.
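To illustrate what such a rule might look like (my own example, not anything established in the thread): the analogue of composing sequents $a \vdash b$ and $b \vdash c$ would be the chain bound

$$P(c \mid a) \;\geq\; P(c \wedge b \mid a) \;=\; P(c \mid a \wedge b)\,P(b \mid a),$$

which follows from monotonicity in the first argument together with the product rule. When both conditionals on the right equal $1$, so does $P(c \mid a)$, recovering transitivity of entailment as the deterministic special case.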
At one point I did a very small amount of digging into "finitely additive probability", where people relax the countable union requirement of a sigma algebra. The paper {2} seemed useful. It seems that you lose some useful theorems of conventional probability theory, but you gain the ability to define a uniform measure on the integers and a translation-invariant measure on the reals.
So it's plausible that the answer is "$\sigma$-algebras aren't really necessary after all", or "sometimes you do need a $\sigma$-algebra and sometimes you don't, depending on context." But still, it would be really nice to see what the presence or absence of countable union corresponds to on the logic side of the picture.
{2} Schervish, J., Seidenfeld, T. & Kadane, J. B. (1986). Statistical implications of finitely additive probability. Rethinking the Foundations of Statistics, 211-232.
https://www.cmu.edu/dietrich/philosophy/docs/seidenfeld/Stat%20implications%20FA.pdf
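To make the gains mentioned above concrete (a standard example from this literature, sketched from memory): the natural density

$$\mu(A) \;=\; \lim_{n \to \infty} \frac{\lvert A \cap \{1, \dots, n\} \rvert}{n}$$

is translation-invariant and finitely additive on the sets where the limit exists, and it can be extended (non-constructively, e.g. via Hahn-Banach or an ultrafilter) to a finitely additive probability on all subsets of $\mathbb{N}$. Every singleton then has measure $0$ while the whole space has measure $1$, so countable additivity necessarily fails - this is the "uniform measure on the integers".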
@Oscar Cunningham I guess in that picture, the $\sigma$-algebra corresponds to the lattice of propositions that we're allowed to assign conditional probabilities to. Then we can still ask the question of whether that lattice should have finite or countable or arbitrary unions. (It seems like that's the kind of question that could be considered purely from the perspective of logic, independently of probability, but I have very little knowledge of that.)
My first guess would be to try and find a Stone duality between the category of $\sigma$-algebras and a category of spaces. A quick search led me to this MO question, which says it was done for a specific type of $\sigma$-algebra in {3}.
{3} Sikorski, Roman. "Remarks on some topological spaces of high power." Fundamenta Mathematicae 37.1 (1950): 125-136
http://matwbn.icm.edu.pl/ksiazki/fm/fm37/fm37111.pdf
Apparently this does not work but it is completely understood how it does not work:
https://terrytao.wordpress.com/2009/01/12/245b-notes-1-the-stone-and-loomis-sikorski-representation-theorems-optional/
I think the notion of valuation (see https://ncatlab.org/nlab/show/valuation+(measure+theory) for example), which lives on a topology rather than on a sigma-algebra, is much better behaved categorically, and has nice links to constructive/intuitionistic logic.
Here (https://arxiv.org/abs/1910.03752) for example we prove that valuations behave a lot like closed sets, and closed sets can be considered a zero-one version of valuations. Mind though: closed sets, not all sets!
Rough idea: just like an open set is a property that's "easy to prove right", it's also a set which has "fat interior", and so it's "easy to find its measure".
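For reference (the standard definition, as on the nLab page linked above): a valuation on the open sets of a space $X$ is a map $\nu : \mathcal{O}(X) \to [0, \infty]$ satisfying

$$\nu(\emptyset) = 0, \qquad U \subseteq V \;\Rightarrow\; \nu(U) \leq \nu(V), \qquad \nu(U) + \nu(V) = \nu(U \cup V) + \nu(U \cap V),$$

i.e. strictness, monotonicity, and modularity. A continuous valuation additionally preserves directed suprema, $\nu\big(\bigcup_i U_i\big) = \sup_i \nu(U_i)$, which is where countable (indeed arbitrary directed) unions reappear in this picture.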
Right, and Cox's equation 18 is essentially the modularity axiom for a valuation, so it seems related there as well.
Does this mean we could/should rebuild probability theory in terms of valuations instead of measures if we wanted to relate it rigorously to logic? How different would it be? (This is a very vague and speculative question.)
Countability is a Jedi mental trick; it's not really there. I say this because...
It's going to take me some time to really digest some of this. In parallel, I wonder if we can attack the easier (for me) side of my question: can we justify probability through a category-theoretic construction? I've a feeling the following is probably already implicit in what some of you are saying, but here goes anyway:
Let's just worry about the finite case for the moment (unless it turns out we don't need that restriction), and consider a finite Boolean algebra $B$. It probably makes sense to think of this as a thin category (or a depleted one?). If we're doing things Cox's way then what we want to do is define a function from pairs of elements of the Boolean algebra to the reals between 0 and 1, $p : B \times B \to [0,1]$, which we write as $p(a \mid b)$ for $a, b \in B$. We want this function to obey the following axioms, which are the conclusions of Cox's argument:

1. $p(a \mid b) = 1$ whenever $b \leq a$

2. $p(a \wedge b \mid c) = p(a \mid b \wedge c)\, p(b \mid c)$ (the product rule)

3. $p(\neg a \mid b) = 1 - p(a \mid b)$ (the negation rule)
(Note that this is very slightly different from the conventional definition of probability, in that it allows $p(a \mid b)$ to be defined even when $p(b \mid \top) = 0$. I rather like this as a feature.)
The question is, is there an equivalent way to state this where we just say that $p$ is a functor from $B$ to some other category (presumably $[0,1]$ as an ordered set) with some natural conditions on the functor?
I guess one thing to note is that we can relax axiom 3 a bit to get

3b. $p(a \mid c) + p(b \mid c) = p(a \vee b \mid c) + p(a \wedge b \mid c)$ for all $a$ and $b$

which gives us something that looks like a valuation, and makes it seem like $p$ should be a functor that respects bimonoidal structures defined on both $B$ and $\mathbb{R}_{\geq 0}$. (This version doesn't mention 1, so it's really talking about unnormalised probabilities.)
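As a sanity check on these axioms, here is a throwaway script (my own sketch, not from anyone's paper - the model, with $p(a \mid b)$ induced by a positive weight function, is made up for illustration) that verifies axioms 1, 2, 3 and 3b numerically on the Boolean algebra of subsets of a small finite set. Note that this particular model leaves $p(a \mid b)$ undefined when $b$ has weight zero, so it doesn't exhibit the feature mentioned above.

```python
# Check Cox-style axioms on the Boolean algebra of subsets of a finite set,
# with p(a|b) = w(a ∧ b) / w(b) for a strictly positive weight function w.
from itertools import combinations
import random

OMEGA = frozenset(range(4))  # atoms of the Boolean algebra

def subsets(s):
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(sorted(s), r)]

weight = {x: random.uniform(0.1, 1.0) for x in OMEGA}  # arbitrary positive weights

def w(a):
    # additive extension of the weights to every element of the algebra
    return sum(weight[x] for x in a)

def p(a, b):
    # conditional probability p(a|b); only defined when w(b) > 0
    return w(a & b) / w(b)

B = subsets(OMEGA)
for a in B:
    for b in B:
        for c in B:
            if w(c) == 0:
                continue  # p(-|c) undefined in this model
            # axiom 1: p(a|c) = 1 whenever c ≤ a
            if c <= a:
                assert abs(p(a, c) - 1) < 1e-9
            # axiom 2 (product rule): p(a∧b|c) = p(a|b∧c) p(b|c)
            if w(b & c) > 0:
                assert abs(p(a & b, c) - p(a, b & c) * p(b, c)) < 1e-9
            # axiom 3 (negation rule): p(¬a|c) = 1 - p(a|c)
            assert abs(p(OMEGA - a, c) - (1 - p(a, c))) < 1e-9
            # axiom 3b (modularity): p(a|c) + p(b|c) = p(a∨b|c) + p(a∧b|c)
            assert abs(p(a, c) + p(b, c) - p(a | b, c) - p(a & b, c)) < 1e-9
print("axioms 1, 2, 3 and 3b hold for this model")
```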
I guess the questions are just whether there's a more elegant way to frame that in terms of category theory, and whether there's any sense in trying to get to probability this way - will something along these lines generalise nicely to reasoning about infinite sets, logics without LEM, etc.?
I've been out of chat for long enough that I'm skimming everything I missed and not really reading this thread in full, but from the glance I took, I just wanted to say that all of the times I've felt like probability and stats were things I wanted to learn, it's been when I was seeing edges of this framing :D
Nathaniel Virgo said:
[I]t seems like $p$ should be a functor that respects bimonoidal structures defined on both $B$ and $\mathbb{R}_{\geq 0}$. (This version doesn't mention 1, so it's really talking about unnormalised probabilities.)
This is interesting, but why would you want to turn a measure into a functor?
Looking into this stuff has convinced me probability theory is a chimera of many ideas which can't really work well together in a structural approach. That is, if you want to give it a more elegant look, you have to pay somewhere.
Probability measures as-is do not respect bimonoidal structures in any nice way. I mean, unions are sent to sums if and only if they are disjoint. Products are a mess: you need the events to be independent. I've yet to find a good account of this interaction which is not either (a) a complete makeover of the notion of probability or (b) a restatement of the usual laws in more fancy language, which I believe gains you very little.
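To spell out the mismatch (standard facts, just restated): by inclusion-exclusion,

$$P(A \cup B) = P(A) + P(B) \iff P(A \cap B) = 0,$$

so $\cup \mapsto +$ only holds on (almost) disjoint pairs; and $P(A \cap B) = P(A)\,P(B)$ is precisely the definition of independence, which is a property of the pair of events relative to $P$, not something the lattice structure of the events can see.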
Btw there's an extensive literature on "probability logic", i.e. $[0,1]$-valued semantics, but the above applies: you have to let go of something somewhere, in logic or in probability.
Matteo Capucci said:
This is interesting, but why would you want to turn a measure into a functor?
I have lots of reasons, although quite likely no good reasons. I guess the main one is probably that I haven't gone deep enough into it to become convinced that a more elegant picture isn't possible. I see probability as the calculus for reasoning in the face of uncertainty, and as such I don't like that it's a chimera of different ideas. It feels like it should be as clean and elegant as logic, and it should have deep interconnections with other fields in the same kind of way. If it can be formulated as a functor that might be a way into understanding them.
If that isn't possible then maybe it deserves to be reinvented. It could be that measures never were the right tool for reasoning under uncertainty except in the most well-behaved cases, and in that case replacing them with something else could shed a lot of light on things. Maybe in the end that's where this line of reasoning would have to go.
Another reason is to see how it generalises. In one direction there is quantum probability, and there's an obvious question of whether there's an analog of Cox's theorem for quantum theory. You would have to change some of the assumptions, but knowing which ones might tell us something. (There were some people trying to do that a few years ago, but I haven't kept up with it.) In another direction, which I'm somewhat more interested in, there are the Tsallis entropy and the $\alpha$-divergences and that little family of ideas, which seem like they're more at home in the world of unnormalised distributions than normalised ones. I feel like there's more work to do before I understand what they "really mean" - it feels somehow like probability theory is really a point in a bigger space, and I'd like to get more of an idea what that space looks like. In other words, I'm seeking a definition of "things that behave like probability theory," not just a definition of probability theory.
This does all make me think that turning a measure into a functor might not be the right approach though. The right abstraction might not involve measures at all.
Re: probability as logic, I'd be interested in people's take on David Lewis' "A Subjectivist's Guide to Objective Chance" from 1980. That starts from spaces of possibilities, and sees probabilities as partitions of those. I tend to prefer possibilities to come first. But then I am not very knowledgeable about the subject.
@Nathaniel Virgo There are papers that argue that quantum theory is an instance of the same sort of thing as probability (and logic). They satisfy some basic mathematical rules for 'plausible reasoning' which are similar to your above rules. Logic is where everything is certain (0 or 1). Probability is for when you can be certain that an event occurred, but not what its outcome will be. And quantum theory is for when there is uncertainty about whether events occurred and their outcome.
This explains why the mathematics of quantum mechanics is useful for statistical models of things outside of physics (which, I guess it is). The authors also argue (in quite a lot of papers) that this is also why it's useful in physics, and that the usual interpretations of quantum theory are incorrectly applying statistical statements to individual events. But I guess most people will have a hard time believing that.
Dan Doel said:
This explains why the mathematics of quantum mechanics is useful for statistical models of things outside of physics (which, I guess it is).
I've always wanted a good example of that. Do you know one?
(The thing I've always wanted a good example of is a case where it's natural to use a density matrix with off-diagonal entries in the process of calculating probabilities for a purely classical system. This may or may not be exactly what you mean.)
I think this might be the book they mention in one of their papers: https://www.amazon.com/Ubiquitous-Quantum-Structure-Psychology-Finance-ebook/dp/B00FC903UQ
Maybe you've seen that, though.
There are probably individual papers on stuff you can scrounge up based on the fields mentioned in the description of that book, too.
Oh, I also have a computer program that simulates something similar to one of their models for the EPRB experiment. That one's particularly easy to model in a way that gets similar counterintuitive results for the actual experimental procedure, although I don't know if it's what you're looking for.
I'm not any kind of expert on anything related to this, though.
Dan Doel said:
Probability is for when you can be certain that an event occurred, but not what its outcome will be. And quantum theory is for when there is uncertainty about whether events occurred and their outcome.
Nice insight!
Well, I can take no credit for it. :smiley:
If you do probability theory and don't make contact with measures, nobody will recognize it as probability theory. Maybe you won't found it on measures, but it's got to connect to measures somehow.
A measure gives a functor from the poset of measurable subsets of a measurable space to the poset $[0, \infty]$... but with some extra properties!
I like my advisor Irving Segal's approach to integration theory, based on algebra (see the theorem on page 14 here).
This connects integration theory and probability theory to quantum theory in a nice way.
I believe that doing probability theory in terms of measures is like doing set theory in terms of elements. As category theorists, we know how to do better: set theory "should" be formulated in terms of functions! Since the probability analogue of a function is a Markov kernel, what we really should be doing is to redevelop probability theory in terms of Markov kernels! So we need axioms that make a category look like a category of Markov kernels. I've proposed some candidate axioms in my paper on Markov categories, following earlier work of Golubtsov as well as Cho and Jacobs. One can develop quite a bit of probability theory within this framework, e.g. by reproving and generalizing zero-one laws, as I've done with Eigil Rischel. But there's a lot more to do!
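For anyone following along, the rough shape of the definition (paraphrased from the paper; see it for the precise version): a Markov category is a symmetric monoidal category $(\mathcal{C}, \otimes, I)$ in which every object $X$ carries a commutative comonoid structure $\mathrm{copy}_X : X \to X \otimes X$ and $\mathrm{del}_X : X \to I$, compatibly with $\otimes$, and such that $\mathrm{del}$ is natural, i.e. $I$ is terminal. The prototype is the category whose objects are measurable spaces and whose morphisms are Markov kernels, with $\mathrm{copy}$ duplicating a value and $\mathrm{del}$ discarding it; the deterministic morphisms are exactly those that commute with copying.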
For a quick introduction to Markov categories, perhaps these slides could be useful.
I'm partway through your Markov categories paper currently! I'll post more on this thread tomorrow.
Nathaniel Virgo said:
I'm partway through your Markov categories paper currently! I'll post more on this thread tomorrow.
Oh nice! (Comments or questions very welcome, either here or by email.) Do you have plans with it or just being curious?
BTW from the probability-as-logic perspective, countable additivity in some sense amounts to preservation of the existential quantifier: if $\phi$ is a predicate on $\mathbb{N}$, then $P(\exists n\, \phi(n)) = \sup_N P(\phi(0) \vee \cdots \vee \phi(N))$. Have you considered this way of thinking about countable additivity? It's called the Gaifman condition. (While it's a neat observation, it doesn't seem to explain what's special about countability. For that, the structure of the real numbers as an Archimedean ordered field seems to be crucial.)
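The connection with countable additivity, spelled out (a standard argument, stated in my own words): writing $A_N$ for the event $\phi(0) \vee \cdots \vee \phi(N)$, the $A_N$ form an increasing sequence whose countable join is $\exists n\, \phi(n)$, and for a finitely additive $P$, countable additivity is equivalent to continuity from below:

$$P\Big(\bigvee_N A_N\Big) = \sup_N P(A_N) \quad \text{for every increasing sequence } (A_N),$$

so the Gaifman condition is exactly this continuity applied to the joins that arise from quantifying over $\mathbb{N}$.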
John Baez said:
A measure gives a functor from the poset of measurable subsets of a measurable space to the poset $[0, \infty]$... but with some extra properties!
Well, any function is 'a functor with extra properties'. The point is: do the extra properties have a nice categorical meaning?
John Baez said:
If you do probability theory and don't make contact with measures, nobody will recognize it as probability theory. Maybe you won't found it on measures, but it's got to connect to measures somehow.
This is true of course, but at the same time I feel that in applications the measure is very often not the real object of interest. Even if you're using something like stochastic differential equations to model a physical process, you probably really want to compute expectations and variances, or calculate some kind of entropy-based measure, or generate good samples from a discrete approximation, or approximate a probability density function so you can plot it. Under the hood there's a measure on a space of continuous functions, but in some sense this is an implementation detail - it's an intermediate step in computing the statistics, which are the things you're really interested in. So from a somewhat applications-focused perspective I would be happy with something with the same kinds of "inputs" and "outputs" as measure theory, even if it looked a bit different on the inside.
One can also motivate probability theory from the perspective of decision theory. E.g. consider the "Dutch book" arguments that say an agent should use Bayes' rule because if it doesn't then you can bet against it in such a way that you always win. If you take this approach then the "implementation detail" point becomes more formal: the probability measures are "inside the agent's head" and only matter to the extent that they affect its behaviour. Anything else would be fine as long as it leads to the same decisions being made in the end. (But this might be meandering a bit away from where I was originally trying to go, I'm not sure.)
Tobias Fritz said:
Oh nice! (Comments or questions very welcome, either here or by email.) Do you have plans with it or just being curious?
I want to see how far we can get towards doing information theory and information geometry this way. (This is one of my motivations for learning category theory: I was going crazy filling my notebooks up with discrete sums and logarithms, and decided there had to be a better way.) I possibly have some nice ideas on how to do that, but it's maybe a topic for another time, once I've figured out more of the details.
Tobias Fritz said:
BTW from the probability-as-logic perspective, countable additivity in some sense amounts to preservation of the existential quantifier: if $\phi$ is a predicate on $\mathbb{N}$, then $P(\exists n\, \phi(n)) = \sup_N P(\phi(0) \vee \cdots \vee \phi(N))$. Have you considered this way of thinking about countable additivity? It's called the Gaifman condition. (While it's a neat observation, it doesn't seem to explain what's special about countability. For that, the structure of the real numbers as an Archimedean ordered field seems to be crucial.)
This looks really interesting, thank you! I haven't come across that. I will read this paper.
Nathaniel Virgo said:
I want to see how far we can get towards doing information theory and information geometry this way. (This is one of my motivations for learning category theory: I was going crazy filling my notebooks up with discrete sums and logarithms, and decided there had to be a better way.) I possibly have some nice ideas on how to do that, but it's maybe a topic for another time, once I've figured out more of the details.
That sounds exciting! I'm curious to hear about it when it's done.
Perhaps going in a similar direction: I have also been suspecting that there is a Markov category which describes information theory. Morally speaking, information theory is the limit of probability theory with respect to large powers. So we can try to define a category in which the objects are finite sets, and morphisms are defined in terms of sequences $(f_n)_{n \in \mathbb{N}}$, where $f_n : A^n \to X^n$ is a Markov kernel between powers of $A$ and $X$, the sequences satisfy some compatibility condition across different $n$, and two such sequences represent the same morphism from $A$ to $X$ if they are asymptotically equivalent as $n \to \infty$. My conjecture is that this results in a Markov category, and that this is the Markov category which secretly underlies information theory. But it's hard to find the time to work out the details.
That's quite different from what I was thinking about, but it makes a lot of sense. I did idly wonder at one point whether you can get probability and entropy from Set through a similar kind of large number limit, basically by codifying the counting arguments from physics.
I haven't thought very far along that line yet though - I was thinking more about how to define things like exponential families and the Kullback-Leibler divergence. I guess the connection between these things and large numbers is large deviations theory - maybe I should think more about how that fits in.
When it comes to the duality-theoretic view on probability, I would like to advertise our new paper (https://arxiv.org/abs/1907.04036). Although it is trying to answer a different question (what is the relationship between the theory of structural limits and Logic on Words?), as a byproduct we get another instance of the fact that adding a layer of quantifiers (existential, probabilistic, semiring, ...) corresponds dually to a measure space construction.
One warning: since we chose to work with classical logic, we can't describe the space of measures, because that space is not zero-dimensional and so there is no Boolean algebra dual to it. So we instead study the space of measures valued in a different space $\Gamma$, which is basically a zero-dimensional version of the unit interval. I believe that with not much work you can adapt this to the usual $[0,1]$-valued measures, but then you have to use geometric logic/frame theory to describe the duals.
A quick search suggests that the frame-theoretic version has been worked out by Vickers (A monad of valuation locales) and possibly also Coquand and Spitters (Integrals and Valuations)
On the relationship between probability theory and measure theory, I like these notes by Terry Tao: https://terrytao.wordpress.com/2010/01/01/254a-notes-0-a-review-of-probability-theory/
He argues that probability theory cannot be identified with the study of measures and gives a criterion for a concept to be properly probabilistic:
In order to have the freedom to perform extensions every time we need to introduce a new source of randomness, we will try to adhere to the following important dogma: probability theory is only “allowed” to study concepts and perform operations which are preserved with respect to extension of the underlying sample space. (This is analogous to how differential geometry is only “allowed” to study concepts and perform operations that are preserved with respect to coordinate change, or how graph theory is only “allowed” to study concepts and perform operations that are preserved with respect to relabeling of the vertices, etc..)
These and other considerations suggest that there is good reason to study probability independent of its standard measure-theoretic foundation.
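To make Tao's criterion precise (formalizing from the linked notes): an extension of a probability space $(\Omega, \mathcal{F}, P)$ is a probability space $(\Omega', \mathcal{F}', P')$ together with a measurable map $\pi : \Omega' \to \Omega$ such that $P = \pi_* P'$. A random variable $X$ on $\Omega$ lifts to $X \circ \pi$ on $\Omega'$, and

$$(X \circ \pi)_* P' = X_* (\pi_* P') = X_* P,$$

so the distribution of a random variable is preserved under extension and counts as a genuinely probabilistic concept, while the sample space $\Omega$ itself does not.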
Segal's approach to probability theory avoids using that "underlying sample space" that people like to call $\Omega$. The idea is to focus on the algebra of random variables, which should be an integration algebra.
Segal and Kunze's book Integrals and Operators lays out this approach in detail.
It would be nice to blend it with the stuff @Tobias Fritz is doing (which is more general, but it's also good to specialize).
Nice, thanks! I hadn't heard of this.
Evan Patterson said:
On the relationship between probability theory and measure theory, I like these notes by Terry Tao: https://terrytao.wordpress.com/2010/01/01/254a-notes-0-a-review-of-probability-theory/
He argues that probability theory cannot be identified with the study of measures and gives a criterion for a concept to be properly probabilistic:
In order to have the freedom to perform extensions every time we need to introduce a new source of randomness, we will try to adhere to the following important dogma: probability theory is only “allowed” to study concepts and perform operations which are preserved with respect to extension of the underlying sample space. (This is analogous to how differential geometry is only “allowed” to study concepts and perform operations that are preserved with respect to coordinate change, or how graph theory is only “allowed” to study concepts and perform operations that are preserved with respect to relabeling of the vertices, etc..)
Perhaps it's worth mentioning that this has also inspired Alex Simpson's work on probability sheaves.
John Baez said:
Segal's approach to probability theory avoids using that "underlying sample space" that people like to call $\Omega$. The idea is to focus on the algebra of random variables, which should be an integration algebra.
Segal and Kunze's book Integrals and Operators lays out this approach in detail.
I'm a fan of algebraic integration theory, but only learned of Segal's paper now -- thanks! However, I don't see what it has to do with the (in)convenience of having or not having an underlying sample space $\Omega$. In fact, it seems to me that both in measure theory and in algebraic integration theory, you can develop probability theory either in terms of an underlying sample space or in terms of the joint distribution of all variables involved. So, there are four possible combinations! Let me briefly sketch the two missing ones here.
1) In measure theory, start with all the "variables" that you want to consider, and figure out what the possible values of each variable are. This results in a family of measurable spaces $(X_i)_{i \in I}$. Now consider a probability measure on the product space $\prod_{i \in I} X_i$. This provides a canonical choice of sample space, and seems conceptually analogous to "focussing on the algebra of random variables".
2) In algebraic integration theory, start with an algebra together with an integration functional, and think of this algebra as playing the role of $\Omega$. Then a "random variable" on this algebra is an algebra homomorphism into it from another algebra; this is the formal dual of a measurable map out of $\Omega$. So the analog of "sample space with a bunch of random variables mapping out of it" is "sample algebra with a bunch of homomorphisms mapping into it".
There's more to say; for example, all of this neatly fits into the Markov categories setting. Namely there's a Markov category for measure-theoretic probability, and there's also one for algebraic integration theory, and when suitably defined they become equivalent. Moreover, one can state and prove the equivalence between the "underlying sample space" and the "joint distribution of all variables" approach for Markov categories in general. But this comment is already quite long, so I'll shut up now and say more only in case that someone wants more.
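A concrete instance of the duality in 2), under standard assumptions (my gloss, not a quote from the thread): a measurable map $f : \Omega \to X$ dualizes to the pullback homomorphism

$$f^* : L^\infty(X) \to L^\infty(\Omega), \qquad f^*(g) = g \circ f,$$

and the integration functional transports accordingly: $\int_\Omega f^*(g)\, dP = \int_X g \; d(f_* P)$. So probing the "sample algebra" $L^\infty(\Omega)$ by homomorphisms from algebras $L^\infty(X)$ recovers exactly the random variables together with their distributions.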
I have not really worked with probabilities since my S-Level Maths 30 years ago or so, sadly. But looking at co-constructive logic and having finally understood how co-implication works, that reminds me of conditional probability. If one reads the co-implication $A \setminus B$ as a refutation of $A$ that does not depend on one of $B$, this seems similar to thinking of the probability of $A$ given $B$, written $P(A \mid B)$. Especially as in probability any number can be thought of as a probability of a refutation as much as one of truth. Is there something to that thought?
@Henry Story that's quite a nice thought.
Thinking out loud, if we think of $P(A \mid B)$ as "the extent to which $B$ implies $A$", then $P(\neg A \mid \neg B)$ seems like "the extent to which $\neg B$ refutes $A$." It's interesting that these are, in general, numerically different (and neither is the same as the material implication $P(\neg B \vee A)$), when in bivalent (Boolean) logic they are the same.
That's nice!
Nathaniel Virgo said:
seems like "the extent to which refutes ."
Shouldn't that be $P(\neg B \mid \neg A)$?
Another interesting take on the foundations of probability theory is quasi-Borel spaces. My main takeaway from that paper is that there is something to gain in generalizing a $\sigma$-algebra (= those random variables which are characteristic functions of a measurable set) to the full algebra of random variables. It also reminds me of the way Grothendieck generalized coverings in topology, by including new morphisms.
Matteo Capucci said:
Shouldn't that be $P(\neg B \mid \neg A)$?
Yes, I suppose it should. (It's still different from $P(A \mid B)$.) But it's "$\neg A$ refutes $B$" that's the same as "$B$ implies $A$" in Boolean logic, so I guess $P(\neg B \mid \neg A)$ is the right one to consider.
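A quick numerical check that all of these genuinely differ (my own toy example): take $\Omega = \{1,2,3,4\}$ with the uniform measure, $A = \{1\}$ and $B = \{1,2\}$. Then

$$P(A \mid B) = \tfrac{1/4}{1/2} = \tfrac{1}{2}, \qquad P(\neg B \mid \neg A) = \frac{P(\{3,4\})}{P(\{2,3,4\})} = \tfrac{2}{3}, \qquad P(\neg B \vee A) = P(\{1,3,4\}) = \tfrac{3}{4},$$

so "implies", "contrapositively refutes", and material implication all take different values here, even though they coincide in the bivalent case.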
Which paper are you referring to in the next comment? The Segal one? (There's quite a few references flying around now!)
Nathaniel Virgo said:
Which paper are you referring to in the next comment? The Segal one? (There's quite a few references flying around now!)
Oh yeah I forgot to link :face_palm: https://arxiv.org/abs/1701.02547